Exit Status 10 (0xa)

GrahamH
Joined: 7 May 05
Posts: 4
Credit: 407815
RAC: 0
Topic 192724

My last six E@H workunits have all finished Client Error - Exit Status 10 (0xa).

They are all claiming the same amount of credit as the successfully completed Results for the same WU.
On one of them there are two other completed results that have apparently been granted credit, but my result hasn't. If this is going to happen for all the others, I will have wasted the last 10 days of processing time (since LHC@H and S@H are not currently producing WUs, E@H had exclusive use of my PC).

I can't find anything about the exit code so I can't try to fix it! If I'm not going to get credit for results that fail after running for 30+ hours each I might as well wait until S@H come back and let them have the time - E@H looks like a waste of time.

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0

Exit Status 10 (0xa)

Don't know if this is overly simplistic, but just for grins and giggles, try rebooting...

Edit: Poking around some on the web, I found that TANPAKU had this issue in the past because of a bug in the checkpointing in their application. The forum thread describing the issue is here

GrahamH
Joined: 7 May 05
Posts: 4
Credit: 407815
RAC: 0

RE: RE: My last six E@H

Message 64176 in response to message 64175

As it happens I rebooted around 16:30(UTC) this afternoon. The machine has also been rebooted two or three times over the last ten days.

I'm going to bed now - will see how the latest WU gets on tomorrow (or Saturday!)

Udo
Joined: 19 May 05
Posts: 203
Credit: 8945570
RAC: 0

RE: My last six E@H

How do you run your BOINC client (as service, single installation? run always?)
I noticed that BOINC was restarted and could no longer read the checkpoint:

Quote:


2007-05-07 08:51:48.3609 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R2_4.17_windows_intelx86.exe'.
2007-05-07 08:51:53.0640 [debug]: Reading SFTs and setting up stacks ... done
2007-05-07 08:52:18.7202 [normal]: ERROR: Couldn't open existing checkpointing toplist file h1_0361.70_S5R2__127_S5R2c_0_0
2007-05-07 08:52:18.7202 [debug]: Couldn't open checkpoint - starting from beginning
2007-05-07 08:52:18.7202 [debug]: Total skypoints = 32152. Progress: 0, 1, 2, 3, c
4, 5, 6, 7, 8, 9, 10, c
...
1610, 1611, c
1612, No heartbeat from core client for 361 sec - exiting

2007-05-07 14:22:53.0946 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R2_4.17_windows_intelx86.exe'.
2007-05-07 14:22:55.6883 [debug]: Reading SFTs and setting up stacks ... done
2007-05-07 14:29:38.9696 [debug]: Found checkpoint - reading...
2007-05-07 14:29:38.9696 [debug]: Read checkpoint - reading previous output...
2007-05-07 14:29:40.3602 [debug]: Read exactly 785662 == maxbytes from Fstat-file, that's enough.
2007-05-07 14:29:40.3602 [debug]: DEBUG: read_fstat_toplist_from_fp() returned 785662
2007-05-07 14:29:40.3602 [debug]: Total skypoints = 32152. Progress: 1612, c
1613, 1614, 1615, 1616, c
...
2265, 2266, 2267, 2268, 2269, c
2270, 2271, 2272, 2273, 2274, 2275, c
2276, 2277, 2278, 2279, 2280, 2281, 2282, 2283, 2284, c
2285, 2286, 2287, 2288,
2007-05-07 20:26:22.0477 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R2_4.17_windows_intelx86.exe'.
2007-05-07 20:26:22.8602 [debug]: Reading SFTs and setting up stacks ... done
2007-05-07 20:38:06.8446 [debug]: Found checkpoint - reading...
2007-05-07 20:38:06.8446 [debug]: ERROR reading checkpoint
Could not resume from checkpoint2007-05-07 20:38:06.8446 [CRITICAL]: ERROR: MAIN() returned with error '10'

At 2007-05-07 14:22:53 Boinc is restarted and is able to read the checkpoint file.

At the end Boinc seems to have been started (a second time?) without having been stopped, and it can't read the checkpoint file (file in use?)

[Edit] just noticed that Boinc took 12 minutes between 'Reading SFTs and setting up stacks' (at 20:26:22) and 'Found checkpoint - reading' (at 20:38:06).
What was going on there?
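
For anyone who wants to look for gaps like this in their own stderr output, here is a minimal Python sketch. It only assumes the line layout visible in the excerpt above (timestamp, level in square brackets, message). The function names, the 300-second threshold and the regular expressions are my own choices for illustration; nothing here comes from BOINC or from the E@H application itself.

```python
import re
from datetime import datetime

# Timestamp, level and message as they appear in the stderr excerpt.
# search() is used instead of match() because one line in the excerpt
# has other text fused in front of its timestamp.
STAMP = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{1,6}) \[(\w+)\]: (.*)")
ERROR = re.compile(r"returned with error '(\d+)'")

# The last run from the excerpt, copied as posted.
SAMPLE = """\
2007-05-07 20:26:22.0477 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R2_4.17_windows_intelx86.exe'.
2007-05-07 20:26:22.8602 [debug]: Reading SFTs and setting up stacks ... done
2007-05-07 20:38:06.8446 [debug]: Found checkpoint - reading...
2007-05-07 20:38:06.8446 [debug]: ERROR reading checkpoint
Could not resume from checkpoint2007-05-07 20:38:06.8446 [CRITICAL]: ERROR: MAIN() returned with error '10'
""".splitlines()


def parse_events(lines):
    """Return (time, level, message) for every line that carries a timestamp."""
    events = []
    for line in lines:
        found = STAMP.search(line)
        if found:
            when = datetime.strptime(found.group(1), "%Y-%m-%d %H:%M:%S.%f")
            events.append((when, found.group(2), found.group(3)))
    return events


def find_gaps(events, min_seconds=300.0):
    """Return (seconds, message_before, message_after) for long silent periods."""
    gaps = []
    for (t0, _, before), (t1, _, after) in zip(events, events[1:]):
        seconds = (t1 - t0).total_seconds()
        if seconds >= min_seconds:
            gaps.append((seconds, before, after))
    return gaps


def exit_statuses(events):
    """Return the numeric codes from 'returned with error' messages."""
    codes = []
    for _, _, message in events:
        found = ERROR.search(message)
        if found:
            codes.append(int(found.group(1)))
    return codes


if __name__ == "__main__":
    events = parse_events(SAMPLE)
    for seconds, before, after in find_gaps(events):
        print(f"{seconds / 60:.1f} min between '{before}' and '{after}'")
    for code in exit_statuses(events):
        print(f"exit status {code} ({hex(code)})")
```

On the lines above it reports a single gap of roughly 704 seconds (a little under 12 minutes) and exit status 10 (0xa), which matches what the task page shows. It says nothing about why the gap happened, only where it is.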

Udo

GrahamH
Joined: 7 May 05
Posts: 4
Credit: 407815
RAC: 0

RE: How do you run your

Message 64178 in response to message 64177

Quote:

How do you run your BOINC client (as service, single installation? run always?)
I noticed that BOINC was restarted and could no longer read the checkpoint:

Boinc runs as a service - run always.

I reset E@H yesterday evening and, despite there being nothing else running on the PC (I've been in bed and at work since then!), I just found that the Result that started last night says it has only managed 2 1/2 hours of CPU time and reached 5.7%. Watching it now, it seems to be clocking up CPU time only slightly slower than real time. I also found 3 messages saying "... exited with zero status but no 'finished' file".

The other odd thing was that the machine was showing the E@H screen saver; the time was correct when I woke it up (around 17:15) but the display was frozen. It took 3 or 4 minutes before I could get any response (even to num lock/caps lock!), after which I got the traditional WinXP "PC Locked" login prompt. According to the Boinc release notes, the screen saver should not work at all when running as a service. In addition, I use the Windows XP Home login screen rather than the traditional Windows version (easier for the family!), so something strange is going on.

History:
I stopped running Boinc (all projects) for several months, but I recently installed the latest version, Boinc Client 5.8.16. At that time I also changed to the "run as service" option so that Boinc would still run (without me being logged in) if the machine rebooted unexpectedly (kids' games sometimes do that, or the kids forget I want it left on). I also recently attached to BAM.

Only other significant change was upgrading to latest Norton Internet Security (only choice was upgrade - renewal not available for version I was running before!) - this has significantly slowed the machine (using much more memory as well).

Quote:

[Edit] just noticed that Boinc took 12 minutes between 'Reading SFTs and setting up stacks' (at 20:26:22) and 'Found checkpoint - reading' (at 20:38:06).
What was going on there?

Not aware of anything unusual at that time.

While I have been typing this, the E@H Result has kept clocking up CPU time at almost real time (the only other thing going on is me typing into IE!). This is the way it used to work, so perhaps it has cured itself (got worried I was looking too hard ;-)

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 34

After you installed BOINC as

After you installed BOINC as a service, did you also use the work-around to re-enable the graphics and the screen saver?

And have you tried running without the screen saver? It's generally a resource hog, even on computers with external (AGP, PCI, PCI-e) video cards.
To run without the screen saver, go to Display properties, screen saver tab, set the screen saver to None and OK out. Then just turn off the monitor, or use power standby options.

GrahamH
Joined: 7 May 05
Posts: 4
Credit: 407815
RAC: 0

RE: After you installed

Message 64180 in response to message 64179

Wasn't aware of the work around - I wasn't too fussed about losing it anyway so I didn't look.

Now I have the info I may try it just so I can see the graphics occasionally (I quite like the LHC@H graphics)

I have just disabled the SCR - I should have done it before, as I always thought they looked quite expensive (I remember when the GL SCRs first came out in WinNT: if they popped up, anything you might have left running on the system didn't get a look in!). Thanks for the prompt!

Graham

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 34

It's taken a while (slightly

It's taken a while (slightly over a month ;-)), but I finally know what exit code 10 is. If you run into this problem... just reset the project. It's a problem with the application's checkpointing capabilities. And it will affect the whole work unit you are crunching. Getting a new work unit seems to fix it for most people.

I know Bernd is trying to get this problem fixed ASAP though.
