More checkpoints

floyd
Joined: 12 Sep 11
Posts: 133
Credit: 186610495
RAC: 0
Topic 197302

The Gravitational Wave Search tasks run for a long time without writing a checkpoint, almost 40 minutes on my computer, so a lot of work is lost at every shutdown. I'd like to be able to stop at any time without worrying about wasting too much CPU time. Please consider adding more checkpoints; the more frequent, the better.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

What is your setting in the Computing preferences for "Tasks checkpoint to disk at most every xx seconds"? If this is set too high, the app won't checkpoint as often.
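A rough model of how that preference interacts with an application: BOINC's C API provides `boinc_time_to_checkpoint()` and `boinc_checkpoint_completed()`, and the Python below only imitates that gating logic under simplifying assumptions. The `CheckpointThrottle` class and the `checkpoint_times` helper are hypothetical, and the fixed block length is an assumption made for illustration.

```python
import time


class CheckpointThrottle:
    """Hypothetical imitation of BOINC's "checkpoint at most every N seconds" gate."""

    def __init__(self, min_interval_s, clock=time.monotonic):
        self.min_interval_s = min_interval_s
        self.clock = clock
        self.last_checkpoint = clock()

    def time_to_checkpoint(self):
        # True only once the user's minimum interval has passed since the last save.
        return self.clock() - self.last_checkpoint >= self.min_interval_s

    def checkpoint_completed(self):
        # The app calls this after it has actually written its state to disk.
        self.last_checkpoint = self.clock()


def checkpoint_times(block_s, n_blocks, min_interval_s):
    """Elapsed seconds at which checkpoints get written, assuming the app can
    only stop to save its state after finishing a whole block of work."""
    now = 0.0
    throttle = CheckpointThrottle(min_interval_s, clock=lambda: now)
    written = []
    for _ in range(n_blocks):
        now += block_s  # one block of work finished
        if throttle.time_to_checkpoint():
            written.append(now)
            throttle.checkpoint_completed()
    return written


print(checkpoint_times(2400, 3, 60))    # [2400.0, 4800.0, 7200.0]
print(checkpoint_times(2400, 3, 3600))  # [4800.0]
```

With the default 60 seconds the gate never gets in the way. A very large value makes the app skip some of its own save points, while no value, however small, produces checkpoints more often than the app's blocks allow.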

floyd
Joined: 12 Sep 11
Posts: 133
Credit: 186610495
RAC: 0

That setting is unchanged from the default value of 60 seconds. In fact, I've never changed global preferences through any project; I always use BOINC Manager.

Other projects' checkpoints never seem to be much older than a minute, and I think other Einstein tasks checkpointed more frequently, too. But Gravitational Wave Search tasks are the only ones I get now, and those seem to be split into 13 blocks. The progress bar in BOINC Manager advances in steps of 7.7%, and a checkpoint is written only after each step.
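The numbers in this post fit together as simple arithmetic. The sketch below assumes the 13 blocks are of equal length and takes the roughly 40 minutes per checkpoint from the first post; the derived task length is an estimate, not a measured value.

```python
N_BLOCKS = 13           # blocks seen in the progress bar
MINUTES_PER_BLOCK = 40  # approximate gap between checkpoints on this host

step_percent = 100 / N_BLOCKS
task_minutes = N_BLOCKS * MINUTES_PER_BLOCK  # assumes equal-length blocks

print(f"progress step: {step_percent:.1f}%")                   # progress step: 7.7%
print(f"estimated task length: about {task_minutes} minutes")  # estimated task length: about 520 minutes
print(f"worst-case loss per shutdown: about {MINUTES_PER_BLOCK} minutes")
```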

Right now, the running Einstein task is 24 minutes past its latest checkpoint, and if I decided to shut down the computer now, all that work would be lost. I'd hate that. For the Malariacontrol task running in parallel, it would be 40 seconds, nothing even worth thinking about. I really wish I didn't have to worry about Einstein either.

Neil Newell
Joined: 20 Nov 12
Posts: 176
Credit: 169699457
RAC: 0

I've noticed this too; it seems to be a feature of S6CasA (and, to a lesser extent, GRP2). The speed of the host obviously plays a part, but even on reasonable hosts it can be tens of minutes between checkpoints.

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 586304742
RAC: 120119

The request for more checkpoints seems valid. As a workaround, though, you could use standby (costs some energy, but saves you some time on waking up) or suspend to disk (takes longer to sleep and wake, but should still be faster than a regular boot).

MrS

Scanning for our furry friends since Jan 2002

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4342
Credit: 252640035
RAC: 35394

In general we design our Apps to (potentially) checkpoint as often as possible / feasible, i.e. after each reasonably independent computation.

Feasibility limits here include the programming effort (parameters in data structures that are modified in nested loops must be saved and restored) and the data volume (storage space and time to write) of the necessary checkpoints. It doesn't make much sense to checkpoint every minute when writing the checkpoint takes several seconds (multiplied by the number of instances that may be running and checkpointing at once) and thus noticeably slows down computation, or when initializing the application and picking up from a checkpoint alone takes several minutes.
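What "saving and restoring loop parameters" involves can be shown with a minimal sketch. It assumes a hypothetical two-level loop whose entire state fits in a small JSON file; a real search application carries far more state, which is exactly what makes each checkpoint costly. `search`, `save_checkpoint` and `load_checkpoint` are illustrative names, not Einstein@Home code.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temporary file, then rename it over the old checkpoint.
    The rename replaces the file in one step (atomic on POSIX), so a crash while
    writing leaves the previous checkpoint in place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)


def search(n_outer, n_inner, path, stop_after=None):
    """Hypothetical two-level loop that checkpoints after each outer iteration.
    `stop_after` simulates a shutdown after that many outer iterations."""
    state = load_checkpoint(path) or {"outer": 0, "total": 0}
    finished = 0
    for i in range(state["outer"], n_outer):
        for j in range(n_inner):
            state["total"] += i * n_inner + j  # stand-in for the real computation
        state["outer"] = i + 1  # the loop position must be saved along with the results
        save_checkpoint(path, state)
        finished += 1
        if stop_after is not None and finished >= stop_after:
            return None
    return state["total"]
```

Work done inside the inner loop since the last save is repeated after a restart, so the gap between checkpoints equals the duration of one outer iteration; if the inner loop grows, that gap grows with it.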

BM

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 3000482033
RAC: 698802

Currently the GW S6 CasA app checkpoints about once an hour on this host. That host runs for about 8 hours on weekdays only, so it basically gets shut down every day.
That means that in the worst case 59 minutes of crunching time are lost, or 1/8 of the available crunching time.
I'd consider that suboptimal.
In an effort to mitigate the situation, the owner of said host has taken to monitoring the timestamp of the checkpoint file in the slot directory and does not shut down unless the checkpoint is fairly recent, waiting for the next checkpoint instead.
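That manual check can be scripted. A minimal sketch, assuming you pass it the path of the task's checkpoint file inside its slot directory (the file name depends on the application and isn't given here) and assuming that five minutes counts as "fairly recent"; both are placeholders to adjust.

```python
import os
import sys
import time


def checkpoint_age_minutes(path, now=None):
    """Minutes since the file at `path` was last modified."""
    if now is None:
        now = time.time()
    return (now - os.path.getmtime(path)) / 60.0


def safe_to_shut_down(path, max_age_minutes=5.0, now=None):
    """True if the checkpoint is recent enough that little work would be lost."""
    return checkpoint_age_minutes(path, now) <= max_age_minutes


if __name__ == "__main__" and len(sys.argv) > 1:
    checkpoint = sys.argv[1]  # placeholder: a checkpoint file in a BOINC slot directory
    age = checkpoint_age_minutes(checkpoint)
    if safe_to_shut_down(checkpoint):
        verdict = "OK to shut down"
    else:
        verdict = "better wait for the next checkpoint"
    print(f"checkpoint is {age:.1f} minutes old: {verdict}")
```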

Very frequent checkpoints (e.g. every minute or every 5 minutes) may not be feasible, but said owner feels that checkpointing at least every 15 minutes would be highly desirable; keeping an eye on the checkpoint file is tedious.
Losses of up to 15 minutes crunching time are more easily stomached. To potentially lose an hour is very annoying.
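The trade-off in numbers, as a rough sketch: it assumes checkpoints are evenly spaced and that a shutdown falls at a random moment between them, so on average half an interval is lost and at worst a whole one. The 8-hour day comes from the post.

```python
DAY_MINUTES = 8 * 60  # the host runs about 8 hours per weekday


def lost_minutes(checkpoint_interval):
    """(worst case, average) minutes lost at one shutdown, assuming evenly
    spaced checkpoints and a shutdown at a random moment between them."""
    return checkpoint_interval, checkpoint_interval / 2


for interval in (60, 15):
    worst, average = lost_minutes(interval)
    print(f"checkpoint every {interval} min: worst {worst} min "
          f"({worst / DAY_MINUTES:.1%} of the day), average {average} min "
          f"({average / DAY_MINUTES:.1%})")
```

At one checkpoint per hour the worst case is the 1/8 of the day mentioned above; at one every 15 minutes it drops to 1/32.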

So if you can make the app checkpoint a little more frequently, that would be nice.

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2793143
RAC: 2671

Another curve ball is that BOINC 7.2.38 introduced the following:

client: if app doesn't report fraction done, estimate it.
client: if app doesn't report fraction done, estimate fraction done in a way that converges to but never reaches 100%.

With BOINC 7.2.39 now the latest recommended version, the app will seemingly keep progressing, only to jump back when you restart BOINC,
or when you interrupt crunching while "suspend when in use" is selected and "Leave Tasks in Memory" is not.
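The effect can be pictured with a small sketch. The change log only says the estimate converges to, but never reaches, 100%; the formula `t / (t + T)` below is an assumption chosen to have that property, not the client's actual code. It further assumes that after a restart the task resumes from the elapsed time recorded at its last checkpoint.

```python
def estimated_fraction_done(elapsed_s, expected_runtime_s):
    """Hypothetical estimate: 0 at the start, 0.5 at the expected runtime,
    and approaching (but mathematically never reaching) 1.0 after that."""
    return elapsed_s / (elapsed_s + expected_runtime_s)


def displayed_progress(elapsed_s, checkpoint_elapsed_s, expected_runtime_s, restarted):
    """Progress shown to the user; after a restart only the work up to the
    last checkpoint counts, so the estimate falls back."""
    t = checkpoint_elapsed_s if restarted else elapsed_s
    return estimated_fraction_done(t, expected_runtime_s)


# One hour expected, 50 minutes crunched, last checkpoint at 20 minutes.
before = displayed_progress(3000, 1200, 3600, restarted=False)
after = displayed_progress(3000, 1200, 3600, restarted=True)
print(f"before restart: {before:.0%}, after restart: {after:.0%}")  # before restart: 45%, after restart: 25%
```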

Claggy

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4342
Credit: 252640035
RAC: 35394

Quote:
So if you can make the app checkpoint a little more frequently, that would be nice.

Thanks for the note.

Under different circumstances I would just advise not to run GW work on this computer and to run BRP4 instead. However, we currently don't have enough Arecibo data to produce enough BRP4 work for normal CPUs, so there is currently no alternative (I suspect FGRP3 won't run the last stage reasonably fast either).

It turns out that the setup of the current GW analysis run makes more extensive use of a feature ("second order spindown") that was negligible when we added checkpointing to the application; you may see this as a communication problem between scientists and (software) engineers.

I'll see what I can do about that, but given my current pile of work, I doubt that I can fix it within the remaining time of the S6CasA run. As we probably need to change a few things for the next GW run anyway, I am confident, though, that the next GW app will checkpoint more frequently.

BM

Eyrie
Joined: 20 Feb 14
Posts: 4
Credit: 4415
RAC: 0

Thanks, Bernd :)

I wasn't expecting an update to the current app; it would just be nice if this were taken into consideration for future app development.

Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.
