Computational Error

Arvid
Joined: 6 Apr 05
Posts: 1
Credit: 4510
RAC: 0
Topic 194878

Has anyone else had their work units end in a computational error halfway through?

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 76

I see you have Input file problems with errors -119 and -120, which are problems with the encryption and md5 signatures on the actual program files and tasks.

You may have a corrupt public key. Try to reset your project to fix that.
If these problems persist, try to uninstall BOINC and reinstall it to another drive or partition, then run an extensive chkdsk scan on your hard drive to see if it has sector problems. (See the MS knowledgebase for more help on chkdsk)

Richard Schumacher
Joined: 8 Aug 06
Posts: 32
Credit: 14951639
RAC: 31521

Yes, I had this:

Sun Apr 18 10:20:45 2010|Einstein@Home|Unrecoverable error for result h1_0453.20_S5R4__74_S5GCEa_1 (process exited with code 4 (0x4))

When I looked at it this AM BOINC had halted with no project or job info displayed. After re-starting BOINC everything looks normal again...

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 76

Message 97731 in response to message 97730

Quote:
Sun Apr 18 10:20:45 2010|Einstein@Home|Unrecoverable error for result h1_0453.20_S5R4__74_S5GCEa_1 (process exited with code 4 (0x4))


Process exited with code 4 says:

Quote:

With thanks to Bernd Machenschalk of Einstein for this explanation.

In general there are two reasons for an exit code 4:

- a signal 4 (illegal instruction) happens when the application tries to execute commands the CPU isn't capable of (e.g. AltiVec code running on a G3 Mac)

- an error in the command line that is passed from the Client to the application. Typically something went wrong when the Core Client communicated with the server, the application or the file system of the host.
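
To find affected tasks in a saved message log, a minimal Python sketch along these lines could pull the task name and exit code out of lines like the one quoted above. The pattern is assumed from that single log line rather than from any BOINC documentation, and `parse_exit_line` is a hypothetical helper name.

```python
import re

# Pattern assumed from the one client message quoted in this thread;
# other client versions may word the message differently.
EXIT_RE = re.compile(
    r"Unrecoverable error for result (?P<task>\S+) "
    r"\(process exited with code (?P<code>\d+) \(0x[0-9a-fA-F]+\)\)"
)

def parse_exit_line(line):
    """Return (task_name, exit_code), or None if the line does not match."""
    m = EXIT_RE.search(line)
    if m is None:
        return None
    return m.group("task"), int(m.group("code"))

line = ("Sun Apr 18 10:20:45 2010|Einstein@Home|Unrecoverable error for result "
        "h1_0453.20_S5R4__74_S5GCEa_1 (process exited with code 4 (0x4))")
print(parse_exit_line(line))
```

The exit code alone does not say which of the two causes applies; the task's stderr output is still needed for that.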


I'll ask Bernd to pass by.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250509603
RAC: 34419

It is an error in the command-line parsing that arises from missing or inaccessible files:

2010-04-19 21:53:26.1182 (13718) [CRITICAL]: Failed to open 'skygrid_0460Hz_S5GC.dat' for reading: 2: No such file or directory
2010-04-19 21:53:26.1183 (13718) [debug]: ERROR: Couldn't open 'skygrid_0460Hz_S5GC.dat'
2010-04-19 21:53:26.1184 (13718) [CRITICAL]: Failed to open 'earth' for reading: 2: No such file or directory
2010-04-19 21:53:26.1184 (13718) [debug]: ERROR: Couldn't open 'earth'
2010-04-19 21:53:26.1185 (13718) [CRITICAL]: Failed to open 'sun' for reading: 2: No such file or directory
...
2010-04-19 21:53:26.1196 (13718) [CRITICAL]: ERROR: error 4 in command-line parsing

As this apparently doesn't happen at the very beginning (nonzero CPU time), it looks like the task was stopped (suspended), then something deleted the files, and when the App was restarted it couldn't find them anymore. Apparently whatever deleted the data files also deleted the stderr output of the first application start. Did you do a restore from a backup there? Check your filesystem. If that's ok, a project reset should help.

[edit]What was deleted is probably the 'slot' directory, not the actual data files in the project directory. You are using a pretty old BOINC Core Client (5.2.13). I remember that there have been some Clients that had a problem with managing their slot directories. Maybe upgrading the Client would help.[/edit]
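
For anyone wanting to check their own stderr output for the same pattern, here is a rough Python sketch that lists the files the application failed to open. It assumes the "[CRITICAL]: Failed to open '...' for reading" wording seen in the excerpt above; `missing_files` is a hypothetical helper, not part of BOINC or the Einstein@Home application.

```python
import re

# Matches the "[CRITICAL]: Failed to open '<name>' for reading" lines
# as they appear in the stderr excerpt in this thread (assumed format).
OPEN_FAIL_RE = re.compile(r"\[CRITICAL\]: Failed to open '([^']+)' for reading")

def missing_files(stderr_text):
    """Return the file names that could not be opened, in order, without repeats."""
    seen = []
    for name in OPEN_FAIL_RE.findall(stderr_text):
        if name not in seen:
            seen.append(name)
    return seen

sample = """\
2010-04-19 21:53:26.1182 (13718) [CRITICAL]: Failed to open 'skygrid_0460Hz_S5GC.dat' for reading: 2: No such file or directory
2010-04-19 21:53:26.1183 (13718) [debug]: ERROR: Couldn't open 'skygrid_0460Hz_S5GC.dat'
2010-04-19 21:53:26.1184 (13718) [CRITICAL]: Failed to open 'earth' for reading: 2: No such file or directory
2010-04-19 21:53:26.1185 (13718) [CRITICAL]: Failed to open 'sun' for reading: 2: No such file or directory
"""
print(missing_files(sample))
```

If every input file of a task shows up in that list, a vanished slot directory is a more likely culprit than a single corrupt download.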

BM

Odysseus
Joined: 17 Dec 05
Posts: 372
Credit: 20554451
RAC: 5431

Message 97733 in response to message 97732

Two of my G4s have now done exactly the same: see the output at h1_0538.35_S5R4__161_S5GCEa_1 & h1_0447.25_S5R4__60_S5GCEa_1. The other two are still crunching.

P.S. Looks like they both ran benchmarks about the same time. All clients set to keep apps in memory. One of the above tasks ran on a file-server machine that hasn’t been restarted for months.

Richard Schumacher
Joined: 8 Aug 06
Posts: 32
Credit: 14951639
RAC: 31521

I installed 6.6.36, which appears to be the most recent version that supports OS X 10.3.9. After starting, it fetched and completed 14 very short jobs (the daily quota), with results like this:

[...]
Thu Apr 22 00:06:07 2010 Einstein@Home Started download of l1_0454.05_S5R4
Thu Apr 22 00:06:09 2010 Einstein@Home Finished download of h1_0454.05_S5R4
Thu Apr 22 00:06:09 2010 Einstein@Home Started download of l1_0454.05_S5R7
Thu Apr 22 00:06:32 2010 Einstein@Home Finished download of l1_0454.05_S5R4
Thu Apr 22 00:06:33 2010 Einstein@Home Finished download of l1_0454.05_S5R7
Thu Apr 22 00:06:34 2010 Einstein@Home Starting h1_0453.75_S5R4__180_S5GCEa_0
Thu Apr 22 00:06:44 2010 Einstein@Home Starting task h1_0453.75_S5R4__180_S5GCEa_0 using einstein_S5GCE version 701
Thu Apr 22 00:06:53 2010 Einstein@Home Computation for task h1_0453.75_S5R4__180_S5GCEa_0 finished
Thu Apr 22 00:06:53 2010 Einstein@Home Output file h1_0453.75_S5R4__180_S5GCEa_0_0 for task h1_0453.75_S5R4__180_S5GCEa_0 absent
Thu Apr 22 00:07:57 2010 Einstein@Home Sending scheduler request: To fetch work.
Thu Apr 22 00:07:57 2010 Einstein@Home Reporting 1 completed tasks, requesting new tasks
Thu Apr 22 00:08:01 2010 Einstein@Home Scheduler request completed: got 0 new tasks
Thu Apr 22 00:08:01 2010 Einstein@Home Message from server: No work sent
Thu Apr 22 00:08:01 2010 Einstein@Home Message from server: (reached daily quota of 14 tasks)
Thu Apr 22 00:08:01 2010 Einstein@Home Message from server: Project has no jobs available

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

Message 97735 in response to message 97734

Quote:
Thu Apr 22 00:06:53 2010 Einstein@Home Computation for task h1_0453.75_S5R4__180_S5GCEa_0 finished
Thu Apr 22 00:06:53 2010 Einstein@Home Output file h1_0453.75_S5R4__180_S5GCEa_0_0 for task h1_0453.75_S5R4__180_S5GCEa_0 absent


The tasks I checked all got "process got signal 10" whatever that means :-)

Regards,
Gundolf

Computers aren't everything in life. (Just kidding)

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250509603
RAC: 34419

Message 97736 in response to message 97735

Quote:
The tasks I checked all got "process got signal 10" whatever that means


It means "Bus error", the equivalent of a Windows "General access violation".
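
As a quick reference, the signal numbers mentioned in this thread can be looked up with a small Python sketch like the one below. The table is hard-coded with the Mac OS X / BSD numbering because signal numbers differ between systems (on Linux, for instance, SIGBUS is 7); `describe_signal` is a hypothetical helper name.

```python
# Signal numbers as defined on Mac OS X / BSD. Deliberately hard-coded:
# the numbering is platform dependent, so this table applies to Macs only.
MACOS_SIGNALS = {
    4: ("SIGILL", "Illegal instruction"),
    10: ("SIGBUS", "Bus error"),
    11: ("SIGSEGV", "Segmentation fault"),
}

def describe_signal(number):
    """Return a readable description for a signal number in the table."""
    name, text = MACOS_SIGNALS.get(number, ("unknown", "not in this table"))
    return f"signal {number}: {name} ({text})"

print(describe_signal(10))
print(describe_signal(4))
```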

Looks like Task 171772121 was the last that was started at all, from then on all tasks errored out immediately.

FWIW: I'm running 6.6.29 on a G3 w. Mac OS 10.3.9 without apparent problems.

BM

Akos Fekete
Joined: 13 Nov 05
Posts: 561
Credit: 4527270
RAC: 0

I also got some computational errors with the S5GCE application on more than five computers.
( einstein_S5GCE_3.04_windows_intelx86__S5GCESSE2.exe )

I always found this error in the output:

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x7C90120E

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 76

Message 97738 in response to message 97737

Quote:
- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x7C90120E


Like this one?

It starts with:

Quote:


Maximum elapsed time exceeded

Maximum elapsed time exceeded is an error you will get when the CPU or GPU exceeds the amount of time allowed by the bound n given to the task.

Have you been manually editing the number? ;-)

The Breakpoint Encountered message is just a consequence of that error. You'll also see it with the other one, "Maximum disk space exceeded".
