'Fstat.out.ckp' not found after a workunit has been finished

Yin Gang
Joined: 23 Feb 05
Posts: 52
Credit: 120187750
RAC: 0

I've turned on the "remove from memory" option for a few public computers because I don't want those crunching programs to disturb other users.

Since this option was turned on by default, maybe some inherent protection could be done in the core client or in the science application?

YG

Welcome To Team China!

Pooh Bear 27
Joined: 20 Mar 05
Posts: 1376
Credit: 20312671
RAC: 0

Message 42611 in response to message 42610

Quote:
Since this option was turned on by default, maybe some inherent protection could be done in the core client or in the science application?


Join up with the BOINC Development list, and put your thoughts into the hat, and see what they can accomplish. I am not on the list, so I am unsure where it resides, but they are always looking for good ideas.

Erik
Joined: 14 Feb 06
Posts: 2815
Credit: 2645600
RAC: 0

Message 42612 in response to message 42611

Quote:
Quote:
Since this option was turned on by default, maybe some inherent protection could be done in the core client or in the science application?

Join up with the BOINC Development list, and put your thoughts into the hat, and see what they can accomplish. I am not on the list, so I am unsure where it resides, but they are always looking for good ideas.

Here is a link to various email lists for BOINC.

Yin Gang
Joined: 23 Feb 05
Posts: 52
Credit: 120187750
RAC: 0

@Pooh Bear 27 & nevermorestr:

Thanks for your suggestions. I will try there ;-)

YG

Welcome To Team China!

Yin Gang
Joined: 23 Feb 05
Posts: 52
Credit: 120187750
RAC: 0

I just found the following APIs in this page:

Quote:


Critical sections

void boinc_begin_critical_section();
void boinc_end_critical_section();

Call these around code segments during which you don't want to be suspended or killed by the core client. NOTE: this is done automatically while checkpointing.

So the problem, as I supposed, should lie in the science application?

YG

Welcome To Team China!

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4350
Credit: 253891707
RAC: 35524

Message 42615 in response to message 42614

Quote:

I just found the following APIs in this page:

Quote:


Critical sections

void boinc_begin_critical_section();
void boinc_end_critical_section();

Call these around code segments during which you don't want to be suspended or killed by the core client. NOTE: this is done automatically while checkpointing.

So the problem, as I supposed, should lie in the science application?

YG

We actually asked David Anderson about that, and he told us what the NOTE above says (it might even have been inserted after our question).

Anyway - I'll have another look into the code. It might be that some cleanup after removing the checkpoint takes long enough that it should be treated as a critical section.
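
For illustration only, here is a minimal sketch of what treating the cleanup after removing the checkpoint as a critical section could look like. It is not the actual Einstein@Home code. The two boinc_* functions are stubbed so the sketch compiles on its own, and remove_checkpoint_guarded() and critical_depth are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-ins for boinc_begin_critical_section() and
   boinc_end_critical_section() from the BOINC API, stubbed so this
   sketch compiles on its own. A real application would include the
   BOINC API header and link the library instead. */
int critical_depth = 0;
void boinc_begin_critical_section(void) { critical_depth++; }
void boinc_end_critical_section(void) { critical_depth--; }

/* Hypothetical helper: delete the checkpoint file inside a critical
   section so the client does not suspend or kill the task halfway
   through. Returns 0 on success, nonzero if remove() fails. */
int remove_checkpoint_guarded(const char *ckp_path)
{
    int rc;

    assert(ckp_path != NULL);
    boinc_begin_critical_section();
    rc = remove(ckp_path);
    /* any other end-of-run cleanup would go here, before the end call */
    boinc_end_critical_section();
    return rc;
}
```

The point of the sketch is only that the begin and end calls bracket the whole cleanup, not just the file removal. Whether this would actually cure the reported problem is an open question.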

BM

Chris Kojiro
Joined: 2 Mar 06
Posts: 4
Credit: 131915133
RAC: 0

Just to note that this has also been a recurring problem with my Sun Blade100. It has 1GB RAM so I don't think it involves a lack of memory. I don't watch things frequently enough to be sure, but there may be some correlation with the weekly benchmarking. Being a Sun there is no boinc screen saver, so video can probably be ruled out. But, I cannot rule out the boinc client, since I'm only running version 4.43. In this example (still running) after waiting 6 hours I had to reboot the system to get boinc to continue.

********

Here is the stderr.txt from slots/0/

>cat stderr.txt

2006-09-28 09:42:23.5455 [normal]: Start of BOINC application 'einstein_S5R1_4.08_sparc-sun-solaris2.7'.
2006-09-28 09:42:23.5618 [normal]: Started search at lalDebugLevel = 0
2006-09-28 09:42:27.4704 [normal]: Checkpoint-file 'Fstat.out.ckp' not found.
2006-09-28 09:42:27.4856 [normal]: No usable checkpoint found, starting from beginning.
2006-09-28 17:45:23.9974 [normal]: Search finished successfully.

2006-09-29 15:42:26.3965 [normal]: Start of BOINC application 'einstein_S5R1_4.08_sparc-sun-solaris2.7'.
2006-09-29 15:42:26.6490 [normal]: Started search at lalDebugLevel = 0
2006-09-29 15:42:32.5813 [normal]: Checkpoint-file 'Fstat.out.ckp' not found.
2006-09-29 15:42:32.6188 [normal]: No usable checkpoint found, starting from beginning.

*************
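
The restarts in a log like the one above could be tallied with a small helper. This is a rough sketch, assuming the whole stderr.txt has already been read into a string. count_fresh_starts() is a hypothetical name, and the marker text is copied from the log lines above.

```c
#include <assert.h>
#include <string.h>

/* Count how many times the application started from the beginning,
   given the full text of a stderr.txt. Each such start logs the
   marker message once. */
int count_fresh_starts(const char *log_text)
{
    static const char marker[] = "No usable checkpoint found";
    const char *p = log_text;
    int count = 0;

    assert(log_text != NULL);
    while ((p = strstr(p, marker)) != NULL) {
        count++;
        p += sizeof marker - 1; /* skip past this match */
    }
    return count;
}
```

The very first start of a task also logs the marker, so only a count above 1 for the same workunit points to work being redone from scratch.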

But I've had this happen over 20 times, and sometimes the same workunit will be processed more than twice before it finally kicks the finished result back to einstein. Here is the url for another recent example:

http://einsteinathome.org/task/44830758

This example, though, had an additional problem with signal 15. I've had previous results without any other problems, but always with double or triple the time to process a result (i.e. basically rerunning the work again and again).

Hope this adds some impetus to sorting out the problem. I've switched back to seti several times when the frustration becomes too much; although, I'd prefer to leave this system on einstein.

Czesc, Chris

Chris Kojiro
Joined: 2 Mar 06
Posts: 4
Credit: 131915133
RAC: 0

An update on my previous problem report on a Sun Blade100 running Solaris8. Part of the problem appears related to the old Boinc client version 4.43. I replaced it a couple of weeks ago with Boinc version 5.49, for Solaris8 UltraSPARC I/II, from Stefan Urbat. When using this version of Boinc I've had none of the "hangups" noted in my previous post. Einstein runs with no problems under Boinc 5.49. This doesn't clear Einstein of blame, it just doesn't hiccup when used with the newer Boinc; although, my problem could have been solely the old Boinc 4.43.

Einstein doesn't run out of the box with this new Boinc for Solaris/Sparc. This Boinc 5.49 reports itself as running under "sparc64-sun-solaris"; thus, Einstein doesn't recognize this platform and will provide no work. You have to explicitly tell Einstein to use its own standard application by setting up an app_info.xml. Create the file app_info.xml in the subdirectory projects/einstein.phys.uwm.edu/ under your Boinc directory. The following is my app_info.xml file:

 
     
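<app_info>
    <app>
        <name>einstein_S5R1</name>
    </app>
    <file_info>
        <name>einstein_S5R1_4.08_sparc-sun-solaris2.7</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>einstein_S5R1</app_name>
        <version_num>408</version_num>
        <file_ref>
            <file_name>einstein_S5R1_4.08_sparc-sun-solaris2.7</file_name>
            <main_program/>
        </file_ref>
    </app_version>
</app_info>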


This "fools" Einstein into seeing this "non-standard" platform as an anonymous platform. Obviously, if/when Einstein changes their Solaris/Sparc application this app_info.xml will need to be modified.

This new Boinc 5.49 does not include a "boincmgr" GUI. The old "boincmgr" from the previous Boinc 4.43 works with the new Boinc 5.49. Thus, using the command line I start the new Boinc, then I use the old boincmgr GUI to manage Boinc.

Sorry for the intrusion into this thread, but my initial problem seems related.

Chris

P.S. The version of Seti provided with the new Boinc by Stefan Urbat works well.
