Odd progress/elapsed/remaining reports in BOINC

n2xjk
Joined: 18 Mar 05
Posts: 3
Credit: 10028806
RAC: 5306
Topic 227927

Ever since I started using BOINC with Einstein on my ThinkPad E15 Gen 3, I've often seen bizarre numbers for task progress, elapsed time and remaining time. At the moment one task shows 4.5% progress, 4:09:43 elapsed, 5:33:55 remaining. That isn't 4.5%. I've also seen the remaining time suddenly jump to several days, then gradually come back down to just hours over the next few hours. Also, checkpointing doesn't seem reliable; I've seen a task with many hours of elapsed time restart at 0 after a boot or some such. I'm not sure what info is needed to diagnose this problem, so here's my machine profile:
 

Name: DESKTOP-CHRQR6Q

Created: 7 Mar 2022 20:06:09 UTC

Total credit: 468,988

Average credit: 17,093.59

CPU type: AuthenticAMD AMD Ryzen 7 5700U with Radeon Graphics [Family 23 Model 104 Stepping 1]

Number of processors: 16

Coprocessors: AMD AMD Radeon(TM) Graphics (6523MB)

Operating system: Microsoft Windows 11 Professional x64 Edition, (10.00.22000.00)

BOINC client version: 7.20.2

Memory: 15178.43 MiB

Cache: 512 KiB

Swap space: 18634.43 MiB

Total disk space: 952.62 GiB

Free disk space: 869.12 GiB

Measured floating point speed: 2384.07 million ops/sec

Measured integer speed: 18062.16 million ops/sec

Average upload rate: 31.02 KiB/sec

Average download rate: 528.58 KiB/sec

Average turnaround time: 0.13 days

Tasks: 60

Number of times client has contacted server: 967

Last time contacted server: 4 Aug 2022 14:00:10 UTC

% of time BOINC client is running: 21.0547 %

While BOINC running, % of time host has an Internet connection: 96.0673 %

While BOINC running, % of time work is allowed: 95.9168 %

Task duration correction factor: 0.419478

Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18751471839
RAC: 7102283

BOINC can only respond to how projects are configured. Unfortunately, Einstein uses old and proprietary task scheduling using a deprecated DCF mechanism.  There can only be ONE DCF value applied to all tasks and sub-projects.

DCF = Task duration correction factor

This causes wild reporting of estimated time to complete values when the host switches from one type of task to a different type of task.

The DCF value is recalculated on every task return and validation but is very slow to respond in general and can take months to change significantly. What helps is to run only one type of task and no others, so that the DCF value can converge faster to a stable value. That would allow the estimated time to completion to be more accurate.

Checkpointing is controlled by the scientific application.  BOINC apps have no say in the matter.

 

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117694775828
RAC: 35125214

Keith Myers wrote:
... Einstein uses old and proprietary task scheduling using a deprecated DCF mechanism.  There can only be ONE DCF value applied to all tasks and sub-projects.

Only one type of task is showing as completed so DCF is unlikely to be the most significant factor in the problem here.

The OP mentions "bizarre numbers for task progress" and the specific example was "4.5% progress, 4:09:43 elapsed, 5:33:55 remaining."
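To see why those three numbers look inconsistent, here is a minimal arithmetic sketch. It assumes progress accrues linearly with elapsed time, which is only a simplification and not necessarily how BOINC derives its estimate. The helper names `hms_to_seconds` and `format_duration` are made up for this illustration.

```python
def hms_to_seconds(text: str) -> int:
    """Convert an 'H:MM:SS' string, as quoted in the post, to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_duration(total_seconds: float) -> str:
    """Render a number of seconds as 'Dd HH:MM:SS'."""
    whole = int(round(total_seconds))
    days, rest = divmod(whole, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}d {hours:02d}:{minutes:02d}:{seconds:02d}"


# Values quoted by the OP.
progress = 0.045
elapsed = hms_to_seconds("4:09:43")
remaining_shown = hms_to_seconds("5:33:55")

# Progress that the displayed elapsed/remaining pair would imply.
implied_progress = elapsed / (elapsed + remaining_shown)

# Remaining time that the displayed progress would imply (linear assumption).
remaining_linear = elapsed / progress - elapsed

print(f"implied progress: {implied_progress:.1%}")
print(f"linear remaining: {format_duration(remaining_linear)}")
```

Under that linear assumption the displayed times would correspond to roughly 43% progress, while 4.5% progress after about four hours would point to well over three days still to go. The 'remaining' figure therefore cannot be coming from the progress percentage alone.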

The machine is a laptop 8C/16T, quite low speed CPU with an integrated GPU.  From the tasks list, only GRP tasks (both CPU and GPU) are being attempted.  From the data supplied, BOINC is running about 20% of the time - say 5hrs per day.  There are no completed CPU tasks, just 2 in progress and 2 that are 'timed out'.  GPU tasks have completed (around 50mins each) so that would be controlling the DCF at a fairly steady value.

The quoted example is obviously the progress for one of the CPU tasks.  I don't run CPU tasks but I believe checkpoints are quite widely spaced.  Probably, the laptop is suspending itself quite regularly (when the user is inactive) and the option to keep tasks in memory when suspended may not be set.  Under those circumstances it's possible that the CPU has indeed clocked up 4 hrs of run time and made very little real progress because of all the partial progress that has been thrown away each time computation is interrupted.
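As a rough illustration of that guess, the sketch below models a task that is evicted from memory at the end of every run session and so restarts from its last checkpoint. The session lengths and the 30-minute checkpoint spacing are invented numbers, not measurements from this host, and `retained_and_spent` is a hypothetical helper.

```python
def retained_and_spent(session_minutes, checkpoint_interval_minutes):
    """Return (minutes of work retained, minutes of compute spent).

    Assumes the task is removed from memory at the end of every session,
    so each new session starts from the most recent checkpoint.
    """
    retained = 0.0
    spent = 0.0
    for session in session_minutes:
        spent += session
        # Only whole checkpoint intervals completed in this session survive;
        # anything computed after the last checkpoint is thrown away.
        intervals_done = session // checkpoint_interval_minutes
        retained += intervals_done * checkpoint_interval_minutes
    return retained, spent


# Hypothetical usage pattern: eight short sessions between laptop suspends.
sessions = [25, 40, 15, 50, 20, 45, 30, 25]
retained, spent = retained_and_spent(sessions, checkpoint_interval_minutes=30)
print(f"spent {spent:.0f} min, retained {retained:.0f} min")
```

With these made-up numbers less than half of the compute time survives. If the task stays in memory while suspended, nothing needs to be recomputed, which is the point of the setting discussed here.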

Since the completed GPU tasks are controlling DCF to a particular value, the CPU completion estimate is probably rather lower than what is actually needed.  If a CPU task finally gets to write a checkpoint, this is where those "bizarre" changes might kick in since BOINC would jump the 'remaining time' estimate to a much larger value, based on the low actual progress and the large accumulated time for such low progress.

Keeping CPU tasks in memory when suspended would probably make quite an improvement. This is only a guess - the OP needs to mention whether that setting is already activated or not. Other details about how the machine is used and what the non-BOINC load is would also help. Since there are no completed CPU tasks to look at, it's hard to know for sure.

Cheers,
Gary.

n2xjk
Joined: 18 Mar 05
Posts: 3
Credit: 10028806
RAC: 5306

I didn't have "Leave non-GPU tasks in memory" checked, so I set that and we'll see if it helps. Thanks.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117694775828
RAC: 35125214

OK, thanks for confirming.

Please realise that a 1.8GHz CPU is slow so that CPU tasks might take many hours.  Also, their 'work content' is a lot less than that for GPU tasks so your 'most productive' avenue would be to concentrate on just GPU tasks and opt out of the other type.  This would also solve sudden jumps in estimates when a slow CPU task makes a big increase in the DCF.

You need to understand that the faster GPU tasks will then progressively reduce the DCF (in relatively small steps) every time a new GPU task finishes.  The estimates will shrink again (with each GPU task) until a CPU task finishes and bumps them straight back up.
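The sawtooth described above can be sketched with a simple asymmetric update rule: move a small step down when a task finishes at or under its estimate, and jump straight up when a task badly overruns. The threshold (1.1), the step weight (0.1), the runtimes and the function name `update_dcf` are all assumptions chosen for illustration; the client's actual rule and constants may differ.

```python
def update_dcf(dcf: float, estimated_uncorrected: float, actual: float,
               up_threshold: float = 1.1, down_weight: float = 0.1) -> float:
    """Return a new duration correction factor after one task finishes.

    The corrected estimate is the uncorrected estimate multiplied by the
    current DCF. All constants here are illustrative assumptions.
    """
    raw_ratio = actual / estimated_uncorrected
    adjusted_ratio = actual / (estimated_uncorrected * dcf)
    if adjusted_ratio > up_threshold:
        # Task ran much longer than predicted: jump straight to the new ratio.
        return raw_ratio
    # Task ran as predicted or faster: take only a small step toward the ratio.
    return (1 - down_weight) * dcf + down_weight * raw_ratio


# Hypothetical runtimes in minutes: (uncorrected estimate, actual).
gpu_task = (100.0, 40.0)    # finishes well under its estimate
cpu_task = (100.0, 200.0)   # takes twice its estimate

dcf = 1.0
history = []
for estimate, actual in [gpu_task] * 5 + [cpu_task] + [gpu_task] * 3:
    dcf = update_dcf(dcf, estimate, actual)
    history.append(round(dcf, 3))

print(history)
```

In this toy run the value drifts down slowly over the first five GPU results, jumps to 2.0 when the slow CPU task reports, and then starts creeping down again, which matches the shrinking and bumping of estimates described above.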

This behaviour is of no real consequence - they are just estimates after all - so just keep your work cache size like it is at the moment and all should be OK.

Cheers,
Gary.
