I've turned on the "remove from memory" option for a few public computers because I don't want those crunching programs to disturb other users.
Since this option was turned on by default, maybe some inherent protection could be done in the core client or in the science application?
YG
Welcome To Team China!
RE: Since this option was
Join up with the BOINC Development list, and put your thoughts into the hat, and see what they can accomplish. I am not on the list, so I am unsure where it resides, but they are always looking for good ideas.
RE: RE: Since this option
Here is a link to various email lists for BOINC.
@Pooh Bear 27 & nevermorestr:
Thanks for your suggestions, I will try it there;-)
YG
I just found the following APIs in this page:
Call these around code segments during which you don't want to be suspended or killed by the core client. NOTE: this is done automatically while checkpointing.
So the problem, as I supposed, should lie in the science application?
YG
RE: I just found the
We actually asked David Anderson about that, and he told us what the NOTE above says (it might even have been inserted after our question).
Anyway - I'll have another look into the code. Might be some cleanup after removing the checkpoint takes long enough that it should be treated as a critical section.
BM
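For anyone following along, here is a minimal sketch of what "treating the cleanup as a critical section" could look like. It assumes the API pair quoted above is `boinc_begin_critical_section()` / `boinc_end_critical_section()`; the two functions in the sketch are stand-ins so it compiles without the BOINC library, and `CriticalSection`, `finish_and_cleanup`, the file names and the end marker are all hypothetical. This is not the actual Einstein@Home code.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Stand-ins for the BOINC calls, so this sketch builds on its own.
// A real application would get them from boinc_api.h instead.
static int g_critical_depth = 0;
void boinc_begin_critical_section() { ++g_critical_depth; }
void boinc_end_critical_section() {
    --g_critical_depth;
    assert(g_critical_depth >= 0);  // an unbalanced "end" is a programming error
}

// RAII guard: the critical section is closed on every exit path,
// including early returns and exceptions.
class CriticalSection {
public:
    CriticalSection() { boinc_begin_critical_section(); }
    ~CriticalSection() { boinc_end_critical_section(); }
    CriticalSection(const CriticalSection&) = delete;
    CriticalSection& operator=(const CriticalSection&) = delete;
};

// Hypothetical end-of-run cleanup: finalize the result file first and only
// then remove the checkpoint, all inside one critical section, so the client
// cannot suspend or kill the process between the two steps.
bool finish_and_cleanup(const std::string& result_path,
                        const std::string& checkpoint_path) {
    CriticalSection guard;
    {
        std::ofstream out(result_path, std::ios::app);
        if (!out) return false;   // guard still ends the critical section
        out << "%DONE\n";         // hypothetical end-of-result marker
    }                             // result file is flushed and closed here
    std::remove(checkpoint_path.c_str());  // a missing checkpoint is not an error
    return true;
}
```

The ordering is the point: if the process is removed from memory after the checkpoint is deleted but before the result is complete, the next start finds no checkpoint and begins from scratch, which would match the symptom reported in this thread.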
Just to note that this has also been a recurring problem with my Sun Blade100. It has 1GB RAM so I don't think it involves a lack of memory. I don't watch things frequently enough to be sure, but there may be some correlation with the weekly benchmarking. Being a Sun there is no boinc screen saver, so video can probably be ruled out. But, I cannot rule out the boinc client, since I'm only running version 4.43. In this example (still running) after waiting 6 hours I had to reboot the system to get boinc to continue.
********
Here is the stderr.txt from slots/0/
>cat stderr.txt
2006-09-28 09:42:23.5455 [normal]: Start of BOINC application 'einstein_S5R1_4.08_sparc-sun-solaris2.7'.
2006-09-28 09:42:23.5618 [normal]: Started search at lalDebugLevel = 0
2006-09-28 09:42:27.4704 [normal]: Checkpoint-file 'Fstat.out.ckp' not found.
2006-09-28 09:42:27.4856 [normal]: No usable checkpoint found, starting from beginning.
2006-09-28 17:45:23.9974 [normal]: Search finished successfully.
2006-09-29 15:42:26.3965 [normal]: Start of BOINC application 'einstein_S5R1_4.08_sparc-sun-solaris2.7'.
2006-09-29 15:42:26.6490 [normal]: Started search at lalDebugLevel = 0
2006-09-29 15:42:32.5813 [normal]: Checkpoint-file 'Fstat.out.ckp' not found.
2006-09-29 15:42:32.6188 [normal]: No usable checkpoint found, starting from beginning.
*************
But I've had this happen over 20 times, and sometimes the same workunit will be processed more than two times before it finally kicks the finished result back to einstein. Here is the url for another recent example:
http://einsteinathome.org/task/44830758
Although, this example had an additional problem with the signal 15. I've had previous results without any other problems, but always doubling, or tripling the time to process a result (i.e. basically rerunning the work again and again).
Hope this adds some impetus to sorting out the problem. I've switched back to seti several times when the frustration becomes too much; although, I'd prefer to leave this system on einstein.
Czesc, Chris
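A rough way to check a stderr.txt for this pattern without reading it by eye: the sketch below counts the three message types seen in the log above and flags any application start that comes after a "Search finished successfully" line. It matches only on the message texts quoted in this post; `RunSummary` and `summarize_stderr` are made-up names, and other application versions may word their messages differently.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Counts of the log messages that matter for the "rerun after finish" symptom.
struct RunSummary {
    int starts = 0;                 // "Start of BOINC application" lines
    int from_scratch = 0;           // "No usable checkpoint found" lines
    int finished = 0;               // "Search finished successfully" lines
    int restarts_after_finish = 0;  // starts that follow a finished search
};

// Scan a stderr.txt stream line by line and tally the messages.
RunSummary summarize_stderr(std::istream& in) {
    RunSummary s;
    std::string line;
    while (std::getline(in, line)) {
        if (line.find("Start of BOINC application") != std::string::npos) {
            ++s.starts;
            // A start after a successful finish means the work is being redone.
            if (s.finished > 0) ++s.restarts_after_finish;
        } else if (line.find("No usable checkpoint found") != std::string::npos) {
            ++s.from_scratch;
        } else if (line.find("Search finished successfully") != std::string::npos) {
            ++s.finished;
        }
    }
    return s;
}
```

To use it, open slots/0/stderr.txt with a `std::ifstream` and pass it to `summarize_stderr`; a non-zero `restarts_after_finish` means a finished search was started again.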
An update on my previous problem report on a Sun Blade100 running Solaris8. Part of the problem appears related to the old Boinc client version 4.43. I replaced it a couple of weeks ago with Boinc version 5.49, for Solaris8 UltraSPARC I/II, from Stefan Urbat. When using this version of Boinc I've had none of the "hangups" noted in my previous post. Einstein runs with no problems under Boinc 5.49. This doesn't clear Einstein of blame; it just doesn't hiccup when used with the newer Boinc, although my problem could have been solely the old Boinc 4.43.
Einstein doesn't run out of the box with this new Boinc for Solaris/Sparc. This Boinc 5.49 reports itself as running under "sparc64-sun-solaris"; thus, Einstein doesn't recognize this platform and will provide no work. You have to explicitly tell Einstein to use its own standard application by setting up an app_info.xml. Create the file app_info.xml in the subdirectory projects/einstein.phys.uwm.edu/ under your Boinc directory. The following is my app_info.xml file:
This "fools" Einstein into seeing this "non-standard" platform as an anonymous platform. Obviously, if/when Einstein changes their Solaris/Sparc application this app_info.xml will need to be modified.
This new Boinc 5.49 does not include a "boincmgr" GUI. The old "boincmgr" from the previous Boinc 4.43 works with the new Boinc 5.49. Thus, using the command line I start the new Boinc, then I use the old boincmgr GUI to manage Boinc.
Sorry for the intrusion into this thread, but my initial problem seems related.
Chris
P.S. The version of Seti provided with the new Boinc by Stefan Urbat works well.