Apparently some people get Client Errors when they turn on CPU throttling ("Use at most ... of CPU time"). These Client Errors have exit status -226 (TOO MANY EXITS) and lots of "Can't acquire lockfile - exiting" messages in stderr_out.
To track this problem down further, we made a Windows Application that produces some additional diagnostic output in stderr_out (actually quite a lot when you have throttling turned on).
If you encounter this problem, please run this Application first without changing your settings (anonymous platform - download the archive, open it, drop all the files into the project folder and restart BOINC), so we can figure out what's happening. For fast feedback, please report Tasks finished with this Application (in particular in case of a Client Error) to the server manually.
Thanks in advance,
BM
Client Errors when throttling
as requested
http://einsteinathome.org/task/122342951
Thank you very much! I'm discussing this with the BOINC developers.
BM
ok
resultid=122521832
Rudy
Good day!
I am working for Einstein@Home as a developer and currently investigating the problem described above.
To get some more debugging information, we kindly ask you to upgrade your BOINC client to the latest development version, 6.10.29, which can be found here for numerous platforms. This version includes a new feature that lets us see the first 64 kB of the application's stderr log instead of its last 64 kB. To activate that feature, please create a file named cc_config.xml in your BOINC data folder (on Windows that is usually C:\Documents and Settings\All Users\Application Data\BOINC) with the following contents:
If you encounter the problem (the symptom is numerous application restarts in your message log), we also kindly ask you to shut down the BOINC client completely and check in a task manager whether any Einstein@Home applications are still running. If so, please report that, and whether the process in question is consuming any CPU time.
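The file contents from the original post are not preserved in this copy of the thread. As a sketch only, assuming the feature described corresponds to the BOINC client's `stderr_head` option (an assumption, not something the post confirms), a minimal cc_config.xml might look like this:

```xml
<!-- Assumption: stderr_head is the option the post refers to. -->
<!-- When set to 1, the client keeps the first 64 kB of stderr instead of the last 64 kB. -->
<cc_config>
  <options>
    <stderr_head>1</stderr_head>
  </options>
</cc_config>
```

The client reads this file at startup, so restarting BOINC after saving it should be enough for the setting to take effect.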
Thanks in advance.
Apparently someone did this - see Too many exits (was: No finished file). He didn't stop BOINC, but he did at least suspend computing activity.
BM
I have gotten a rash of these errors since the site went back up after the 1/26/11 day of down time. I don't know if the timing is coincidental.
Most stderr output looks like:
too many exit(0)s
With additional errors like:
[17:59:12][26008][ERROR] Error allocating power spectrum device memory: 25166336 bytes (error: 2)
[17:59:12][26008][ERROR] Demodulation failed (error: 1006)!
[17:59:12][26008][WARN ] CUDA memory allocation problem encountered!
------> Returning control to BOINC, delaying restart for at least five minutes...
------> If this problem persists you should consider aborting this task.
[17:59:13][26010][INFO ] Starting data processing...
[17:59:14][26010][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 318 MB (194 MB free / 512 MB total) -> Used by this application: 0 MB
or
[16:50:59][23992][ERROR] Error creating CUDA FFT plan (error code: 2)
[16:50:59][23992][ERROR] Demodulation failed (error: 1011)!
[16:50:59][23992][WARN ] CUDA memory allocation problem encountered!
Here are a few of them:
http://einsteinathome.org/task/217030348
http://einsteinathome.org/task/217029719
http://einsteinathome.org/task/217029182
http://einsteinathome.org/task/217029179
http://einsteinathome.org/task/217028962
http://einsteinathome.org/task/217028916
I have a total of 40, most on Binary Radio Pulsar Search v1.06 (BRP3cuda32fullCPU).
I also had a few that look like:
WU download error: couldn't get input files:
h1_1329.00_S5R4
-119
MD5 check failed
But I think those were transfers that were interrupted when the server went down; they were clustered around 26 Jan 2011 14:41:19 UTC.
Joe
I made the links active, but where are the (TOO MANY EXITS) and "Can't acquire lockfile - exiting" messages for those tasks?
Gundolf
Computers aren't everything in life. (Just kidding)
Thank you for dealing with my laziness.
I'm not sure this is exactly the same problem but the tasks that I've examined report:
Exit status -226 (0xffffffffffffff1e)
and the stderr output starts with:
6.10.58
too many exit(0)s
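The hexadecimal value quoted with the exit status is just the same number, -226, written as an unsigned 64-bit two's-complement integer. A minimal Python sketch of that conversion (the helper name `as_unsigned_hex` is hypothetical and used only for illustration):

```python
def as_unsigned_hex(status: int, bits: int = 64) -> str:
    """Return a signed exit status as unsigned two's-complement hex."""
    mask = (1 << bits) - 1        # e.g. 0xFFFF...FFFF for 64 bits
    return hex(status & mask)     # masking wraps negative values around


# Prints the 64-bit form of -226, matching the value shown in the task report.
print(as_unsigned_hex(-226))
```

Going the other way, subtracting 2**64 from the unsigned value recovers -226, which is the status BOINC labels TOO MANY EXITS in the first post.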
I am a novice at looking at this kind of output, and mine do seem to be CUDA memory allocation errors. I guess I should have read more closely to see whether the others were problems obtaining locks.
I apologize if I'm asking about this problem in the wrong place but I would appreciate any suggestion on how to proceed. Leave it alone, disable GPU again, or what?
Thanks for looking into it.
Joe
Oh yes, and I have recently throttled back in an attempt to deal with temperature issues. I'm surprised how effective it is. Running at 50% CPU is 20 °C cooler on one system than running at 80%.
Joe
My problem seems to be resolved by rebooting. This is an Ubuntu 10.04 laptop. I now have a few CUDA BRP tasks completed and validated.