FGRPSSE "pause"

Raistmer*
Joined: 20 Feb 05
Posts: 198
Credit: 62,013,777
RAC: 120,385
Topic 224930

Today I tried to get additional information about BOINC issues with app_config and work fetch, so I looked closely at E@h tasks and their progress.

And I noticed this:

Those 3 tasks have been sitting at 89.979% for quite a long time without any change in the progress counter.

Perhaps this is known behavior, but what happens if a task is stopped at this point? Is it checkpointing? Or will all the time spent sitting at this percentage be lost on restart? It has been much longer than a few minutes.

And they are CPU tasks, not GPU ones. It's not usual for CPU tasks to "freeze" their progress...

Raistmer*
Joined: 20 Feb 05
Posts: 198
Credit: 62,013,777
RAC: 120,385

And 2 of them have a very small memory footprint while still consuming CPU...

What are they doing during this time?...

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,495
Credit: 66,054,821,672
RAC: 54,923,058

This is normal, routine behaviour.  If you do a forum search for "follow-up AND stage" you will find many examples where this has been asked and explained over the last 10 years :-).

I've linked to a brief example of such an explanation.

If you check the stderr output on the website for one of your completed and returned tasks you will see where the checkpoints have been written - a capital "C" is used to denote a checkpoint.
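If you want to check the checkpoint spacing yourself, a minimal Python sketch follows. It assumes, hypothetically, that the relevant part of the stderr output can be copied as a single string where each capital `C` marks a checkpoint and every other character is an ordinary progress tick. The real E@h stderr layout may differ, and `checkpoint_positions` and `checkpoint_gaps` are made-up names for illustration only.

```python
def checkpoint_positions(progress: str) -> list[int]:
    """Return the character offsets of every 'C' (checkpoint) marker."""
    return [i for i, ch in enumerate(progress) if ch == "C"]


def checkpoint_gaps(positions: list[int]) -> list[int]:
    """Return the distance between each pair of consecutive checkpoint markers."""
    return [b - a for a, b in zip(positions, positions[1:])]


if __name__ == "__main__":
    # Made-up sample string, NOT real E@h output: evenly spaced checkpoints
    # early on, then a tail where the spacing changes.
    sample = "....C....C....C" + "..C..C..C"
    pos = checkpoint_positions(sample)
    print(pos)                   # [4, 9, 14, 17, 20, 23]
    print(checkpoint_gaps(pos))  # [5, 5, 3, 3, 3]
```

A change in the gap values near the end of the list is the kind of pattern described here for the last 10 checkpoints.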

The last 10 of those are spaced differently from all the earlier ones, so my guess is that a checkpoint is written after each candidate is evaluated in double precision. In other words, you don't lose all the follow-up stage work if you stop running during that stage.
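The following minimal Python sketch shows why per-candidate checkpointing would limit the loss, under the guess above. Everything in it is hypothetical: `run_followup`, the placeholder `evaluate_double_precision`, and the dict standing in for an on-disk checkpoint file are illustrative names, not Einstein@Home code. The count of 10 candidates only mirrors the 10 differently spaced checkpoints. With this scheme, a stop discards at most the partial work on the candidate being evaluated at that moment.

```python
import json


def evaluate_double_precision(candidate: float) -> float:
    """Hypothetical placeholder for the expensive double-precision re-evaluation."""
    return candidate * candidate


def run_followup(candidates, checkpoint, stop_after=None):
    """Evaluate candidates in order, writing a checkpoint after each one.

    checkpoint: dict used as a stand-in for an on-disk checkpoint file.
    stop_after: simulate the task being stopped after this many evaluations
                in the current run (None means run to completion).
    Returns the number of candidates evaluated in this run.
    """
    # Resume from the last checkpoint, or start fresh if none exists.
    state = json.loads(checkpoint.get("data", '{"next": 0, "results": []}'))
    done_this_run = 0
    for i in range(state["next"], len(candidates)):
        if stop_after is not None and done_this_run == stop_after:
            # Simulated stop: partial work on candidate i would be discarded,
            # but everything up to the last checkpoint survives.
            break
        state["results"].append(evaluate_double_precision(candidates[i]))
        state["next"] = i + 1
        checkpoint["data"] = json.dumps(state)  # checkpoint after each candidate
        done_this_run += 1
    return done_this_run


if __name__ == "__main__":
    candidates = [float(n) for n in range(1, 11)]  # 10 hypothetical candidates
    ckpt = {}
    print(run_followup(candidates, ckpt, stop_after=4))  # 4 (then "stopped")
    print(run_followup(candidates, ckpt))                # 6 (resumed, not restarted)
```

The second run picks up at candidate 5 instead of starting the follow-up stage over, which is the behavior the checkpoint markers suggest.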

You might like to confirm that since I don't run CPU tasks on my GPU hosts.

Cheers,
Gary.

Raistmer*
Joined: 20 Feb 05
Posts: 198
Credit: 62,013,777
RAC: 120,385

Thanks!