Ever since I started using BOINC with Einstein on my ThinkPad E15 Gen 3, I've often seen bizarre numbers for task progress, elapsed time and remaining time. At the moment one task shows 4.5% progress, 4:09:43 elapsed, 5:33:55 remaining. That isn't 4.5%. I've also seen the remaining time suddenly jump up to several days, then gradually come back down over the next few hours to just hours remaining. Also, checkpointing doesn't seem reliable; I've seen a task with many hours of elapsed time restart at 0 after a boot or somesuch. I'm not sure what info is needed to help with this problem, but here's my machine profile:
Name: DESKTOP-CHRQR6Q
Created: 7 Mar 2022 20:06:09 UTC
Total credit: 468,988
Average credit: 17,093.59
CPU type: AuthenticAMD AMD Ryzen 7 5700U with Radeon Graphics [Family 23 Model 104 Stepping 1]
Number of processors: 16
Coprocessors: AMD AMD Radeon(TM) Graphics (6523MB)
Operating system: Microsoft Windows 11 Professional x64 Edition, (10.00.22000.00)
BOINC client version: 7.20.2
Memory: 15178.43 MiB
Cache: 512 KiB
Swap space: 18634.43 MiB
Total disk space: 952.62 GiB
Free disk space: 869.12 GiB
Measured floating point speed: 2384.07 million ops/sec
Measured integer speed: 18062.16 million ops/sec
Average upload rate: 31.02 KiB/sec
Average download rate: 528.58 KiB/sec
Average turnaround time: 0.13 days
Number of times client has contacted server: 967
% of time BOINC client is running: 21.0547 %
While BOINC running, % of time host has an Internet connection: 96.0673 %
While BOINC running, % of time work is allowed: 95.9168 %
Task duration correction factor: 0.419478
BOINC can only respond to how projects are configured. Unfortunately, Einstein uses old and proprietary task scheduling using a deprecated DCF mechanism. There can only be ONE DCF value applied to all tasks and sub-projects.
DCF = Task duration correction factor
This causes wild reporting of estimated time to complete values when the host switches from one type of task to a different type of task.
The DCF value is recalculated on every task return and validation but is very slow to respond in general and can take months to change significantly. What helps is to run only one type of task and no others so that the DCF value can converge faster to a stable value. That would allow the estimated time to completion to be more accurate.
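To make the "only ONE DCF" point concrete, here is a minimal sketch. It assumes the client's estimate is simply the server's raw estimate multiplied by the host's DCF. All runtimes are hypothetical numbers chosen for illustration, not measured Einstein values.

```python
# Illustrative only: one shared DCF scaling the estimates of two task types.
# Every runtime below is a hypothetical value, not real project data.

def estimated_runtime(raw_estimate_s: float, dcf: float) -> float:
    """Assumed model: client estimate = server's raw estimate * host DCF."""
    return raw_estimate_s * dcf

def ideal_dcf(actual_s: float, raw_estimate_s: float) -> float:
    """The DCF that would make the estimate exact for one task type."""
    return actual_s / raw_estimate_s

raw = {"gpu": 7200.0, "cpu": 36000.0}     # hypothetical server estimates (s)
actual = {"gpu": 3000.0, "cpu": 72000.0}  # hypothetical real runtimes (s)

ideal = {kind: ideal_dcf(actual[kind], raw[kind]) for kind in raw}

# If completed GPU tasks have tuned the single DCF, CPU tasks inherit it.
shared_dcf = ideal["gpu"]
cpu_estimate = estimated_runtime(raw["cpu"], shared_dcf)

print(f"ideal DCF per type: {ideal}")
print(f"CPU estimate with GPU-tuned DCF: {cpu_estimate:.0f} s "
      f"(actual {actual['cpu']:.0f} s)")
```

With these made-up numbers the two task types want very different correction factors, so whichever type set the DCF last leaves the other type's estimate badly off.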
Checkpointing is controlled by the scientific application. BOINC apps have no say in the matter.
Keith Myers wrote:...
Only one type of task is showing as completed so DCF is unlikely to be the most significant factor in the problem here.
The OP mentions "bizarre numbers for task progress" and the specific example was "4.5% progress, 4:09:43 elapsed, 5:33:55 remaining."
The machine is a laptop 8C/16T, quite low speed CPU with an integrated GPU. From the tasks list, only GRP tasks (both CPU and GPU) are being attempted. From the data supplied, BOINC is running about 20% of the time - say 5hrs per day. There are no completed CPU tasks, just 2 in progress and 2 that are 'timed out'. GPU tasks have completed (around 50mins each) so that would be controlling the DCF at a fairly steady value.
The quoted example is obviously the progress for one of the CPU tasks. I don't run CPU tasks but I believe checkpoints are quite widely spaced. Probably, the laptop is suspending itself quite regularly (when the user is inactive) and the option to keep tasks in memory when suspended may not be set. Under those circumstances it's possible that the CPU has indeed clocked up 4 hrs of run time and made very little real progress because of all the partial progress that has been thrown away each time computation is interrupted.
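As a rough illustration of that hypothesis only, the sketch below counts how much compute time survives a series of interrupted sessions. The 60-minute checkpoint spacing and 45-minute sessions are invented numbers, and the model assumes the elapsed counter keeps accumulating while uncheckpointed work is discarded, which is the guess above rather than confirmed client behaviour.

```python
# Hypothetical model of work lost between widely spaced checkpoints.
# Interval and session lengths are assumptions, not Einstein@Home values.

def simulate(session_minutes, checkpoint_interval_min, keep_in_memory):
    """Return (elapsed, useful) compute minutes after all sessions.

    keep_in_memory=False: work done since the last checkpoint is thrown
    away whenever a session ends. keep_in_memory=True: it carries over.
    """
    elapsed = 0
    checkpointed = 0
    pending = 0  # work done since the last checkpoint
    for session in session_minutes:
        elapsed += session
        work = pending + session
        completed = work // checkpoint_interval_min
        checkpointed += completed * checkpoint_interval_min
        leftover = work - completed * checkpoint_interval_min
        pending = leftover if keep_in_memory else 0
    return elapsed, checkpointed + pending

sessions = [45] * 6  # six 45-minute bursts before the laptop suspends
print(simulate(sessions, 60, keep_in_memory=False))
print(simulate(sessions, 60, keep_in_memory=True))
```

In this toy case every session is shorter than the checkpoint spacing, so without the keep-in-memory option none of the work is retained even though hours of run time have been logged.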
Since the completed GPU tasks are controlling DCF to a particular value, the CPU completion estimate is probably rather lower than what is actually needed. If a CPU task finally gets to write a checkpoint, this is where those "bizarre" changes might kick in since BOINC would jump the 'remaining time' estimate to a much larger value, based on the low actual progress and the large accumulated time for such low progress.
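For a sense of scale, here is the OP's example pushed through a plain linear extrapolation from progress and elapsed time. This is not BOINC's actual estimator (how the client blends progress with the DCF-scaled estimate is not shown here), just the arithmetic behind why the displayed figure looks wrong.

```python
# Plain linear extrapolation, NOT the BOINC client's real formula.

def hms_to_seconds(hms: str) -> int:
    """Convert 'H:MM:SS' to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def linear_remaining(elapsed_s: float, fraction_done: float) -> float:
    """Remaining seconds if the rest runs at the same rate as so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s * (1 - fraction_done) / fraction_done

elapsed = hms_to_seconds("4:09:43")        # from the OP's example
displayed = hms_to_seconds("5:33:55")      # what BOINC Manager showed
remaining = linear_remaining(elapsed, 0.045)
remaining_hours = remaining / 3600

print(f"displayed remaining: {displayed / 3600:.1f} h")
print(f"linear extrapolation: {remaining_hours:.1f} h")
```

At 4.5% after a little over four hours, a progress-based estimate lands at several days rather than the five and a half hours displayed, which fits the jump to "several days" the OP reported.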
Keeping CPU tasks in memory when suspended would probably make quite an improvement. This is only a guess - the OP needs to mention whether that setting is already activated or not. Other details about how the machine is used and what the non-BOINC load is would also help. Since there are no completed CPU tasks to look at, it's hard to know for sure.
Cheers,
Gary.
I didn't have "leave non-GPU tasks in memory" checked, so I set that and we'll see if it helps. Thanks.
OK, thanks for confirming.
Please realise that a 1.8GHz CPU is slow so that CPU tasks might take many hours. Also, their 'work content' is a lot less than that for GPU tasks so your 'most productive' avenue would be to concentrate on just GPU tasks and opt out of the other type. This would also solve sudden jumps in estimates when a slow CPU task makes a big increase in the DCF.
You need to understand that the faster GPU tasks will then progressively reduce the DCF (in relatively small steps) every time a new GPU task finishes. The estimates will shrink again (with each GPU task) until a CPU task finishes and bumps them straight back up.
This behaviour is of no real consequence - they are just estimates after all - so just keep your work cache size like it is at the moment and all should be OK.
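The sawtooth described above can be sketched with an asymmetric update rule: jump straight up when a task runs longer than estimated, drift down in small steps otherwise. The rule and the 0.1 weight are assumptions for illustration and are not taken from the BOINC source; the task ratios are hypothetical.

```python
# Assumed asymmetric DCF update, for illustration only.

def update_dcf(dcf: float, ratio: float, down_weight: float = 0.1) -> float:
    """ratio = actual runtime / raw estimate for the task just finished."""
    if ratio > dcf:
        return ratio                        # slow task: jump up at once
    return dcf + down_weight * (ratio - dcf)  # fast task: small step down

dcf = 0.42
history = []
# Hypothetical sequence: GPU tasks near 0.42, one slow CPU task at 2.0.
for ratio in [0.42, 0.42, 2.0, 0.42, 0.42, 0.42]:
    dcf = update_dcf(dcf, ratio)
    history.append(round(dcf, 3))

print(history)
```

Under these assumptions one slow CPU task lifts the DCF immediately, and several GPU tasks later it is still well above where it started.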
Cheers,
Gary.