No checkpoint checkpoint.cpt found

Joe Caverly
Joined: 13 May 07
Posts: 4
Credit: 5716492
RAC: 0
Topic 197384

I recently upgraded to BOINC 7.2.39(x86) on a Vista machine.

Before I upgraded, I had disallowed any new tasks and completed all existing tasks. I then removed Einstein@Home from BOINC.

After installing the new release of BOINC, I added the Einstein@Home project. After BOINC downloaded everything for the project, number crunching began.

Under Tools->Computing Preferences, I have "Tasks checkpoint to disk at most every 120 seconds."

This is the same setting as I was using in 7.2.28.1 from which I upgraded.

I had to restart my system, and after the restart, I noticed that the running tasks had started over from the beginning.

I checked the slots 0 and 1 folders, but there is no checkpoint.cpt file in either folder.

The stderr.txt file in both slots has the line:

2014-02-17 06:39:18.7348 (2720) [normal]: INFO: No checkpoint checkpoint.cpt found - starting from scratch

After running for 50 minutes, I still find no checkpoint.cpt files.

I must be missing something simple, as I am pretty sure that in the previous version of BOINC that I was using, a checkpoint was created, so that when I restarted the system, tasks would not start from the beginning again.

Constructive input is appreciated.

Joe

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 739111499
RAC: 1261149

Hi!

Unfortunately, some of the CasA GW tasks checkpoint rather infrequently (sometimes at intervals > 50 min, depending on CPU performance). See Bernd's message on this here: http://einsteinathome.org/node/197302&nowrap=true#129184 and the discussion leading up to it. Please bear with us while we look into this.

Cheers
HB

Joe Caverly
Joined: 13 May 07
Posts: 4
Credit: 5716492
RAC: 0

Thank you for the quick response, and the link. It has answered my question.

I now have a checkpoint.cpt file in each slot. It created the files approx. one hour after Einstein@Home started running, according to the stderr.txt file.

I will monitor the timestamp of the checkpoint.cpt files to determine the best time to shut down my system, so that I do not lose all of that work.
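A minimal sketch of that kind of monitoring, in Python, in case it is useful to anyone else. It only reads the modification time of checkpoint.cpt in each slot folder. The function name and the default "slots" path are my own placeholders, not anything BOINC provides, so point it at the slots folder inside your own BOINC data directory.

```python
import sys
import time
from pathlib import Path


def checkpoint_ages(slots_dir, name="checkpoint.cpt", now=None):
    """Map each slot folder name to the age of its checkpoint in seconds.

    A slot without a checkpoint file maps to None.
    """
    now = time.time() if now is None else now
    ages = {}
    for slot in sorted(Path(slots_dir).iterdir()):
        if not slot.is_dir():
            continue
        cpt = slot / name
        # st_mtime is the time the file was last written
        ages[slot.name] = now - cpt.stat().st_mtime if cpt.is_file() else None
    return ages


if __name__ == "__main__":
    # Hypothetical default; pass the real slots folder as the first argument.
    slots = sys.argv[1] if len(sys.argv) > 1 else "slots"
    if not Path(slots).is_dir():
        print(f"No such folder: {slots}")
    else:
        for slot, age in checkpoint_ages(slots).items():
            if age is None:
                print(f"slot {slot}: no checkpoint yet")
            else:
                print(f"slot {slot}: last checkpoint {age / 60:.1f} min ago")
```

The smaller the reported age, the less work a shutdown at that moment would throw away.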

Joe

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 739111499
RAC: 1261149

Hi

I'm not quite sure how suspending (to disk/ to RAM) instead of power off might also help in this scenario but if that's available on the machine(s) in question, it's worth a try, I guess.

Cheers
HB

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2963965685
RAC: 711382

Quote:

Hi

I'm not quite sure how suspending (to disk/ to RAM) instead of power off might also help in this scenario but if that's available on the machine(s) in question, it's worth a try, I guess.

Cheers
HB


There is a line of discussion being followed by the third-party developers at SETI concerning a very different model of disk write caching which Microsoft implemented in (Vista?)/7/8, compared with XP and earlier.

http://support.microsoft.com/kb/148505

The BOINC API doesn't do any of that, so there's a possibility that data might be lost at shutdown. We've seen it mainly in truncated stderr.txt, but it might affect checkpoint files too - though I personally would be surprised if even Microsoft could delay-write files by as much as an hour.

Edit - I see I linked a very old MS KB article. But there's plenty of discussion on the web about how Windows 7 caching is more aggressive - e.g.

http://cboard.cprogramming.com/c-programming/129192-fflush-not-working-windows-7-a-2.html
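To make the distinction concrete: flushing a file only hands the data to the operating system's cache, while a separate commit call asks the OS to put it on the disk. Below is a general Python sketch of a writer that does both and then swaps the finished file into place. It is an illustration of the technique only, not code from BOINC or from the Einstein@Home application, and the `write_checkpoint` name and `.tmp` suffix are made up for the example.

```python
import os


def write_checkpoint(path, data):
    """Write bytes to path, committing them to disk before returning."""
    tmp = path + ".tmp"  # hypothetical temporary name
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()             # empty the program's own buffer into the OS cache
        os.fsync(f.fileno())  # ask the OS to commit its cache for this file to disk
    # Replace the old checkpoint only once the new one is fully written,
    # so an interrupted write never leaves a truncated checkpoint behind.
    os.replace(tmp, path)
```

Without the `os.fsync` step, a shutdown shortly after the write could in principle still catch the data sitting in the OS cache.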

mikey
Joined: 22 Jan 05
Posts: 12718
Credit: 1839121911
RAC: 3569

Quote:

Thank you for the quick response, and the link. It has answered my question.

I now have a checkpoint.cpt file in each slot. It created the files approx. one hour after Einstein@Home started running, according to the stderr.txt file.

I will monitor the timestamp of the checkpoint.cpt files to determine the best time to shut down my system, so that I do not lose all of that work.
Joe

Each project chooses whether its units save checkpoints and, if so, how often; that is part of the flexibility the projects have. So if you run multiple projects, it could be tricky.
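As far as I understand it, the "at most every 120 seconds" preference is only a minimum spacing between checkpoints; the application still decides at which points it is able to save its state. Here is a toy Python model of that interaction. It is a simplification based on that assumption, not BOINC code, and the function name and all the numbers are made up for illustration.

```python
def checkpoint_times(safe_points, min_interval):
    """Return the times (in seconds) at which checkpoints actually get written.

    safe_points:  times at which the application reaches a state it can save
    min_interval: the user's "at most every N seconds" preference
    """
    written = []
    last = 0.0  # task start
    for t in sorted(safe_points):
        # A checkpoint is written only if the application offers one
        # AND enough time has passed since the previous one.
        if t - last >= min_interval:
            written.append(t)
            last = t
    return written


# A task that can only save its state every 50 minutes (hypothetical)
slow_app = [3000 * k for k in range(1, 4)]
print(checkpoint_times(slow_app, 120))  # [3000, 6000, 9000]

# A task that could save every 30 seconds is held back to the 120 s preference
fast_app = [30 * k for k in range(1, 13)]
print(checkpoint_times(fast_app, 120))  # [120, 240, 360]
```

In this model the preference can slow a chatty application down, but it cannot make a slow one checkpoint any sooner.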
