Today I tried to get additional info about BOINC issues with app_config and work fetch, so I looked closely at E@h tasks and their progress.
And I noticed this:
Those 3 tasks have been sitting at 89.979% for quite a long time already, without any change in the progress counter.
Perhaps it's known behavior, but what if a task is stopped at this point? Is it checkpointing? Or will all the time it has spent sitting at this % be lost on restart? That's much longer than a few minutes.
And they are CPU tasks, not GPU ones, and it's unusual for CPU tasks to "freeze" their progress...
Copyright © 2024 Einstein@Home. All rights reserved.
And 2 of them have a very small memory footprint while still consuming CPU...
What are they doing during this time?...
This is normal, routine behaviour. If you do a forum search for "follow-up AND stage" you will find many examples where this has been asked and explained over the last 10 years :-).
I've linked to a brief example of such an explanation.
If you check the stderr output on the website for one of your completed and returned tasks you will see where the checkpoints have been written - a capital "C" is used to denote a checkpoint.
The last 10 of those are spaced differently from all the earlier ones so my guess is that a checkpoint is written after each candidate is evaluated in double precision. In other words, you don't lose all in the follow-up stage if you stop running during that stage.
You might like to confirm that since I don't run CPU tasks on my GPU hosts.
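A minimal sketch of the check suggested above: find the checkpoint markers in a task's progress string and compare the spacing of the last 10 with the earlier ones. The input format (a run of characters with a capital "C" at each checkpoint) and the function names are assumptions for illustration, not the exact Einstein@Home stderr layout, so you would paste the relevant part of a real stderr in place of the made-up sample.

```python
# Hypothetical helper: assumes the progress output is a string in which
# each checkpoint is marked by a capital "C"; the real stderr may differ.


def checkpoint_positions(progress: str, marker: str = "C") -> list[int]:
    """Return the character offsets at which a checkpoint marker appears."""
    return [i for i, ch in enumerate(progress) if ch == marker]


def checkpoint_gaps(progress: str, marker: str = "C") -> list[int]:
    """Return the distance (in characters) between consecutive checkpoints."""
    pos = checkpoint_positions(progress, marker)
    return [b - a for a, b in zip(pos, pos[1:])]


def compare_tail(progress: str, tail: int = 10,
                 marker: str = "C") -> tuple[list[int], list[int]]:
    """Split gaps into those before and within the last `tail` checkpoints."""
    gaps = checkpoint_gaps(progress, marker)
    # The last `tail` markers are separated by `tail - 1` gaps.
    split = max(len(gaps) - (tail - 1), 0)
    return gaps[:split], gaps[split:]


if __name__ == "__main__":
    # Made-up sample: checkpoints every 5 characters, with the last 10
    # only 2 characters apart (standing in for the follow-up stage).
    sample = "....C" * 6 + ".C" * 9
    main_stage, follow_up = compare_tail(sample)
    print("main-stage gaps:", main_stage)
    print("follow-up gaps: ", follow_up)
```

If the last 10 gaps come out clearly different from the earlier ones, that would be consistent with one checkpoint being written per candidate in the follow-up stage.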
Cheers,
Gary.
Thanks!