Client Errors of S5R2/S5R3 Apps

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250380657
RAC: 34906
Topic 193038

Exit code -1073741819 (0xC0000005): This is the famous "General Access Violation". There are numerous reasons for this error to occur, from hardware problems to graphics drivers and more. Ideally, when this happens, the "Windows Runtime Debugger" should start up and write a stack dump to stderr out that helps to diagnose the problem further. If the "Access Violation" is listed in the "*** Dump of the Graphics thread ***", it was almost certainly a problem with the graphics driver. The most common Access Violation listed in the "*** Dump of the Worker thread ***" shows "houghmap.c:" near the end of the first line of the Callstack. This is a problem that we might be able to do something about, and we are currently hunting it. Apparently it only happens on certain machines; it might be related to the data these hosts are processing, but it may also be a property of the hardware or other software on the system.
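
As a side note on the two notations used for these exit codes: the negative decimal number and the hex value in parentheses are the same 32-bit pattern, read once as signed and once as unsigned. A minimal sketch of the conversion (the function name is made up for illustration and is not part of BOINC or the App):

```python
def to_unsigned_hex(exit_code: int) -> str:
    """Render a signed 32-bit exit code in unsigned hex notation.

    Hypothetical helper for illustration only.
    """
    # Masking with 0xFFFFFFFF reinterprets the two's-complement value as unsigned.
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


print(to_unsigned_hex(-1073741819))  # 0xC0000005
print(to_unsigned_hex(-1073741502))  # 0xC0000142
```

This makes it easier to search for a code when one source quotes the decimal form and another the hex form.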

Exit code 10: This means that the App could not resume from a previously written checkpoint. Again, the output listed in stderr out of the result should give a hint why. Most of the errors we get of this type are apparently due to a broken hard disk sector or even a broken filesystem (e.g. some have the checkpoint file point to what looks like a portion of the client_state.xml). Again, there's one error of this type we are trying to understand better in order to do something about it: an empty checkpoint file, in which case there will be an "EOF encountered" listed at the bottom of stderr out.

Exit code 99: This means that the App terminated because an internal check failed. Again, there should be something at the end of stderr out that allows further diagnosis of the problem. If stderr out lists "file SFTfileIO.c" at the bottom, the check that failed was a sanity check of the data read from the input files. Resetting the project and thus downloading a fresh set of data files might help. Again, there is one type of error we are working to understand better in order to prevent it from happening again. In these cases the following lines are shown at the bottom of stderr out:
[CRITICAL]: Required frequency-bins [-8, 8] not covered by SFT-interval [...]
XLAL Error - LocalXLALComputeFaFb (LocalComputeFstat.c:536): Input domain error
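
For anyone who wants to check their own results against the patterns described so far, here is a rough sketch that scans a stderr out text for the marker strings quoted above. The marker strings come from this post; the names SIGNATURES and diagnose and the wording of the hints are assumptions made for illustration, not part of any BOINC tool.

```python
# Marker strings are quoted from the post; the hint texts only paraphrase it.
SIGNATURES = [
    ("houghmap.c:", "worker-thread access violation that is being hunted"),
    ("EOF encountered", "empty checkpoint file (exit code 10)"),
    ("Required frequency-bins", "frequency bins not covered by the SFT interval (exit code 99)"),
    ("file SFTfileIO.c", "input data failed a sanity check; a project reset might help"),
]


def diagnose(stderr_text: str) -> list[str]:
    """Return a hint for every known marker found in the stderr text."""
    return [hint for marker, hint in SIGNATURES if marker in stderr_text]


sample = "[CRITICAL]: Required frequency-bins [-8, 8] not covered by SFT-interval [...]"
print(diagnose(sample))
```

A plain substring search is deliberately crude: it only tells you which of the known cases a result resembles, not what actually went wrong.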

Exit code -1073741502 (0xC0000142): This means that a DLL failed to load properly. This error seems to happen more frequently on Windows Vista, but we also get it from machines running Windows XP. We would greatly appreciate any idea which DLL this might be. So far I haven't got a clue (I could try to delay-load a specific DLL, but for this I would need a "suspect").

Exit codes -1, 0, 1: These look as if a program other than the BOINC Client (such as a malware scanner) terminated the App in the middle of crunching. The stderr out doesn't show anything helpful in these cases. Again, I (and probably a lot of participants) would be thankful for a hint as to why these are happening.

A word on BOINC Client versions: The "Windows Runtime Debugger" is only available with newer Clients (5.6 & up I think). The newest BOINC Client 5.10 has the bad habit of reporting only the "head" of the stderr output, which means that in many cases the useful diagnostic output is cut off. Earlier Clients (such as the 5.8 series) reported the last lines of stderr out, which made sure that the useful information that is at the bottom of the output doesn't get lost. So if you want to help us to track and fix the problems, I recommend using a 5.8 BOINC Core Client.
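
To illustrate why the "head" versus "tail" difference matters, here is a small sketch. The function names and the sample log are made up; this is not how the BOINC Client truncates output internally, only a picture of the effect.

```python
def head_lines(text: str, n: int) -> list[str]:
    """Keep only the first n lines (roughly what a 'head' report preserves)."""
    return text.splitlines()[:n]


def tail_lines(text: str, n: int) -> list[str]:
    """Keep only the last n lines (roughly what a 'tail' report preserves)."""
    lines = text.splitlines()
    return lines[-n:] if n > 0 else []


# Invented sample log: the useful diagnostic line is at the very end.
log = "Start of BOINC application\nReading SFTs ... done\nEOF encountered"

print(head_lines(log, 2))  # the diagnostic line is cut off
print(tail_lines(log, 2))  # the diagnostic line survives
```

Since the useful diagnostics sit at the bottom of stderr out, a report that keeps only the head can lose exactly the line that matters.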

BM


Charles
Joined: 2 Sep 07
Posts: 2
Credit: 13530
RAC: 0


I have recently joined this project (3 days ago) and all of my projects seem to complete just fine, however, my account shows Server State as "over" and the Outcome as "Client error." My PC is as follows...
GenuineIntel
Intel(R) Core(TM)2 Duo CPU T7100 @ 1.80GHz [x86 Family 6 Model 15 Stepping 13] Microsoft Windows Vista
Home Edition, (06.00.6000.00)

Thanks in advance for any ideas/help.
Charles

KSMarksPsych
Moderator
Joined: 15 Oct 05
Posts: 2702
Credit: 4090227
RAC: 0

RE: I have recently joined

Message 71127 in response to message 71125

Quote:

I have recently joined this project (3 days ago) and all of my projects seem to complete just fine, however, my account shows Server State as "over" and the Outcome as "Client error." My PC is as follows...
GenuineIntel
Intel(R) Core(TM)2 Duo CPU T7100 @ 1.80GHz [x86 Family 6 Model 15 Stepping 13] Microsoft Windows Vista
Home Edition, (06.00.6000.00)

Thanks in advance for any ideas/help.
Charles

They're all erroring out with "exit code 1073807364 (0x40010004)" which the Unofficial BOINC wiki lists as a possible graphics error. Are you running with the screen saver? If so, disable it and switch to a 2-d Windows screen saver or let your screen go blank and power down. Graphics support for BOINC on Vista is hit and miss at best. You can also try upgrading your graphics drivers and possibly your version of DirectX.

Another thing about Vista is that (for most people) if BOINC is installed to the default location (c:\program files\boinc) it doesn't always play nice. Try installing it to somewhere outside of Program Files. I use c:\boinc and things work fine. To do this, shut down BOINC (and stop the service if you've installed it that way), then uninstall through Add/Remove Programs. Move the entire BOINC directory. Reinstall and point the installer to the now moved BOINC directory.

The other thing I see in stderr out is

2007-09-03 15:57:25.9857 [debug]: Couldn't open checkpoint (2) - starting from beginning

It's possible something like an AV is locking the file while it's scanning. Try excluding the BOINC directory from the AV scan.

Kathryn :o)

Einstein@Home Moderator

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 722309526
RAC: 1147488

RE: The other thing I see

Message 71128 in response to message 71127

Quote:


The other thing I see in stderr out is

2007-09-03 15:57:25.9857 [debug]: Couldn't open checkpoint (2) - starting from beginning

It's possible something like an AV is locking the file while it's scanning. Try excluding the BOINC directory from the AV scan.

The "Couldn't open checkpoint" message is normal when it appears at the start of the run, that is, when the app is first started with a new result. It will always try to open a checkpoint file, and when it doesn't find one, it assumes that it's a fresh result and starts from the beginning. So this message is nothing to worry about. Hopefully the graphics workarounds from Kathryn will do the trick.
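
Roughly, the start-up logic looks like the sketch below. This is only an illustration in Python, not the App's actual code: the function name and the one-integer checkpoint format are invented, and reading the "(2)" in the log line as the "no such file" error number is my assumption.

```python
def load_checkpoint(path: str) -> int:
    """Return the saved progress counter, or 0 if no checkpoint file exists.

    Hypothetical helper; the real checkpoint format is different.
    """
    try:
        with open(path) as f:
            return int(f.read().strip())
    except FileNotFoundError as err:
        # On a fresh result there is no file yet, so this branch is the normal case.
        print(f"Couldn't open checkpoint ({err.errno}) - starting from beginning")
        return 0


start_at = load_checkpoint("no_such_checkpoint_file.cpt")
print("starting at skypoint", start_at)
```

A missing file is handled quietly and the run starts at zero. A file that exists but cannot be parsed is a different situation and is not covered by this sketch.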

CU

H-BE

KSMarksPsych
Moderator
Joined: 15 Oct 05
Posts: 2702
Credit: 4090227
RAC: 0

RE: The "Couldn't open

Message 71129 in response to message 71128

Quote:

The "Couldn't open checkpoint" message is normal when it appears at the start of the run, that is, when the app is first started with a new result. It will always try to open a checkpoint file, and when it doesn't find one, it assumes that it's a fresh result and starts from the beginning. So this message is nothing to worry about. Hopefully the graphics workarounds from Kathryn will do the trick.

CU

H-BE

Thanks. I wasn't aware of it. Red herring... ignore me :)

Kathryn :o)

Einstein@Home Moderator

Guido Platteau
Joined: 9 Sep 06
Posts: 2
Credit: 858319
RAC: 0


When I run Einstein on http://einsteinathome.org/host/734164
Windows XP home edition
the work unit abends

http://einsteinathome.org/task/86844983

Level 0: $Id: HierarchicalSearch.c,v 1.179 2007/08/22 11:16:04 badri Exp $
Function call `ComputeFstatHoughMap ( &status, &semiCohCandList, &pgV, &semiCohPar)' failed.
file HierarchicalSearch.c, line 1132
2007-09-16 16:50:15.3593 [normal]:
Level 1: $Id: HierarchicalSearch.c,v 1.179 2007/08/22 11:16:04 badri Exp $
2007-09-16 16:50:15.4062 [normal]: Status code 5: Null pointer
2007-09-16 16:50:15.4062 [normal]: function ComputeFstatHoughMap, file HierarchicalSearch.c, line 1848
2007-09-16 16:50:15.4062 [CRITICAL]: BOINC_LAL_ErrHand(): now calling boinc_finish()

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 722309526
RAC: 1147488

RE: When I run Einstein on

Message 71131 in response to message 71130

Quote:

When I run Einstein on http://einsteinathome.org/host/734164
Windows XP home edition
the work unit abends

http://einsteinathome.org/task/86844983

Level 0: $Id: HierarchicalSearch.c,v 1.179 2007/08/22 11:16:04 badri Exp $
Function call `ComputeFstatHoughMap ( &status, &semiCohCandList, &pgV, &semiCohPar)' failed.
file HierarchicalSearch.c, line 1132
2007-09-16 16:50:15.3593 [normal]:
Level 1: $Id: HierarchicalSearch.c,v 1.179 2007/08/22 11:16:04 badri Exp $
2007-09-16 16:50:15.4062 [normal]: Status code 5: Null pointer
2007-09-16 16:50:15.4062 [normal]: function ComputeFstatHoughMap, file HierarchicalSearch.c, line 1848
2007-09-16 16:50:15.4062 [CRITICAL]: BOINC_LAL_ErrHand(): now calling boinc_finish()


There's also this in your task's output:

Het vorige eigenaarschap van deze semafoor is opgeheven. (0x69) - exit code 105 (0x69)

Whatever it means :-). Also, the computation on your machine is quite slow. Do you have the Einstein@Home screensaver active? Consider switching to another screensaver; the E@H screensaver is independent of the rest of the computation and is just eye candy. On some systems (depending on video drivers and hardware) E@H will run much faster and more stably without the screensaver.

CU

H-BE

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 112

RE: Het vorige

Message 71132 in response to message 71131

Quote:

Het vorige eigenaarschap van deze semafoor is opgeheven. (0x69) - exit code 105 (0x69)


That's broken Dutch... or Flemish. ;-)

Anyway, loosely translated that's:
The previous ownership of this semaphore has been discontinued. etc.

Still no clue what it means. :-)
And no, there's no flags in the [url=http://en.wikipedia.org/wiki/Semaphore_(programming)]semaphore[/url].

Odysseus
Joined: 17 Dec 05
Posts: 372
Credit: 20537819
RAC: 5737


My Mac G4/733 running OS 10.3.9 has been getting a lot of errors recently, usually with exit code 193 (0xc1). Here’s the start of a typical output file:
[pre]5.8.17

process exited with code 193 (0xc1)

2007-09-18 15:07:36.9956 [normal]: Built at: Jul 27 2007 14:59:06

2007-09-18 15:07:36.9971 [normal]: Start of BOINC application 'einstein_S5R2_4.34_powerpc-apple-darwin'.
2007-09-18 15:07:38.2449 [debug]: Reading SFTs and setting up stacks ... done
2007-09-18 15:07:51.8390 [normal]: ERROR: Couldn't open existing checkpointing toplist file h1_0543.60_S5R2__169_S5R2c_1_0
2007-09-18 15:07:51.8396 [debug]: Couldn't open checkpoint (2) - starting from beginning
2007-09-18 15:07:51.8397 [debug]: Total skypoints = 72544. Progress: 0,
$Revision: 1.45 $ OPT:0 SCV:2, SCTRIM:0
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, SIGBUS: bus error[/pre]
The errors occur anywhere from a couple of minutes to several hours into a task; I suspect they’re being triggered by the screen-saver, either when starting or stopping. E@h isn’t the only project that this host has been having trouble with since upgrading to BOINC v5.8.17 from v5.4.9—but the apps without graphics seem immune.

Udo
Joined: 19 May 05
Posts: 203
Credit: 8945570
RAC: 0


My first wingman got a computation error with 'exit code 10' on a S5R3 WU (appl. 4.03).

Bernd wrote that 'exit code 10' is mostly related to disk failures. But his result file has a line which looks very strange...

Udo

Knorr
Joined: 18 Feb 06
Posts: 16
Credit: 3129905
RAC: 0


My first S5R3 result failed with signal 11

http://einsteinathome.org/task/87088144
