Bernd wrote that 'exit code 10' is mostly related to disk failures. But his result file has a line which looks very strange...
This is definitely disk corruption, affecting even the file the stderr output is kept in.
BM
Thank you very much indeed for your error report. This confirms the suspicion that there's something wrong with the screensaver code.
Yeah, I know, the Linux app doesn't have a screensaver. Don't tell me, tell the app :-) :
Bikeman
I was recently sent 6 units to crunch.
2 were successful; 1 showed a compute error when it was sent in, and I don't know what the problem was.
3 were in the process of running, at various stages.
When BOINC switched over to those 3 units, at about the same time, they all went bad.
I am running 5.10.20 and have no problems with the other projects I run.
These are the messages I got:
9/24/2007 6:09:54 PM|Einstein@Home|Reason: Unrecoverable error for result h1_0535.50_S5R2__135_S5R2c_1 ( - exit code 99 (0x63))
9/24/2007 6:09:54 PM|Einstein@Home|Computation for task h1_0535.50_S5R2__135_S5R2c_1 finished
9/24/2007 6:09:54 PM|Einstein@Home|Output file h1_0535.50_S5R2__135_S5R2c_1_0 for task h1_0535.50_S5R2__135_S5R2c_1 absent
9/24/2007 6:09:54 PM|Einstein@Home|Restarting task h1_0535.50_S5R2__123_S5R2c_1 using einstein_S5R2 version 438
9/24/2007 6:09:57 PM|Einstein@Home|Deferring communication for 1 min 0 sec
9/24/2007 6:09:57 PM|Einstein@Home|Reason: Unrecoverable error for result h1_0535.50_S5R2__123_S5R2c_1 ( - exit code 99 (0x63))
9/24/2007 6:09:57 PM|Einstein@Home|Computation for task h1_0535.50_S5R2__123_S5R2c_1 finished
9/24/2007 6:09:57 PM|Einstein@Home|Output file h1_0535.50_S5R2__123_S5R2c_1_0 for task h1_0535.50_S5R2__123_S5R2c_1 absent
9/24/2007 6:09:57 PM|Einstein@Home|Restarting task h1_0535.50_S5R2__106_S5R2c_1 using einstein_S5R2 version 438
9/24/2007 6:10:03 PM|Einstein@Home|Deferring communication for 1 min 0 sec
9/24/2007 6:10:03 PM|Einstein@Home|Reason: Unrecoverable error for result h1_0535.50_S5R2__106_S5R2c_1 ( - exit code 99 (0x63))
9/24/2007 6:10:03 PM|Einstein@Home|Computation for task h1_0535.50_S5R2__106_S5R2c_1 finished
Hope this helps
Though they all failed with exit status 99, at least two of the four tasks failed with completely different symptoms: one with an error reading the data files (though your client should check their integrity (md5sum) before starting the App), and the other with what looks like a programming error (NULL pointer) that apparently nobody else has stumbled over yet. I couldn't get anything useful out of the stderr output of the other two, as the actual message has been truncated.
My guess is that your memory went faulty right at the moment when the first crash happened. I'd suggest running a memory checker.
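To illustrate what such an integrity check amounts to, here is a rough Python sketch that computes a file's MD5 digest and compares it with an expected value. This is not the BOINC client's own code, just the general idea; the function names `md5_of_file` and `file_is_intact` are made up for this example, and you would have to supply the path and the expected digest yourself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_is_intact(path, expected_md5):
    """Compare the file's digest with an expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A file that passes this check on disk can still be corrupted later on its way through faulty RAM, which is why a memory test is the next thing to try.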
BM
Hi Bernd,
Thanks for the reply, I will do a memory check ASAP.
Like I said, I've had no problems with my other projects, and I have run two S3 units without trouble. I'm thinking it might be that all three S2 units started at the same time and, with their large size, there was a FUBAR.
Mike F,
My first two S5R3 WUs completed successfully on my Duron 1600 running OpenSUSE Linux, with credit granted...
...but the third one finished with a compute error, exit status 11 (0xb). I had no client errors at all with S5R2.
Here are my results:
http://einsteinathome.org/account/tasks
Any idea what it means?
Before this failed WU was reported, my DSL internet connection dropped, so I needed to reset it manually. In BoincView's "Messages" screen I have lots of messages like this:
> Requesting 8640 seconds of new work, and reporting 1 completed tasks
> Sending scheduler request: To report completed tasks
> Reason: scheduler request failed
> Deferring communication for 1 min 0 sec
> Scheduler request failed: couldn't resolve host name
After resetting the DSL line, BOINC reported the task and downloaded new work, which is now running.
I suppose that a temporary lack of Internet connection should not cause a compute error...
A "process got signal 11" error is a segmentation fault, which can be caused by anything from a wrong bit in memory to a problem with the CPU. Yet seeing that you otherwise return flawless results, just consider this a bug in the result.
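If you want to check which signal a number stands for on your own machine, Python's standard `signal` module can look it up. The sketch below is only an illustration: `describe_exit` is a made-up helper, and whether a given BOINC exit status really came from a signal is an assumption you would have to confirm from the stderr output.

```python
import signal


def describe_exit(code):
    """Describe a status number such as 11 (0xb) or 99 (0x63)."""
    try:
        # signal.Signals raises ValueError for numbers that are not signals.
        name = signal.Signals(code).name
    except ValueError:
        name = None
    if name:
        return f"{code} ({hex(code)}): matches the signal number of {name}"
    return f"{code} ({hex(code)}): not a signal number on this platform"


if __name__ == "__main__":
    print(describe_exit(11))
    print(describe_exit(99))
```

On Linux, 11 maps to SIGSEGV, while 99 is not a signal number at all, which fits with exit code 99 being something the application reports itself.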
Thank you for your answer. A fourth result has now been completed and successfully validated, so everything seems OK.
I've got some "exit code 10" errors; see http://einsteinathome.org/task/87136558
Should this be considered disk errors, too?
Thank you for your help.
CCP
I also had one result with a signal 11 error, running Ubuntu Linux 6.06. It failed during a time when I was having internet issues. First, the xtremlab site was down, so I turned communications on and off a few times, then I lost my internet connection completely for a while.
I was wondering if it would be feasible to respond to some errors of this type by restarting at the last checkpoint. If that were done, there would need to be some way to ensure that it didn't restart repeatedly. Perhaps the task could be suspended, with an option for the user to restart or abort it. I don't think it would be a good idea to require input to make the decision, though. Perhaps it would be aborted automatically if it happened more than once, or more than once without significant progress from the last checkpoint.
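To make the idea a bit more concrete, here is a rough Python sketch of the "abort if it crashes more than once without significant progress" rule. Everything in it is hypothetical: `TaskState`, `on_crash`, and the `max_retries` and `min_progress` thresholds are invented for illustration and do not correspond to anything in the BOINC client.

```python
from dataclasses import dataclass


@dataclass
class TaskState:
    crashes_since_progress: int = 0        # crashes with no new checkpoint progress in between
    last_checkpoint_fraction: float = 0.0  # progress at the last checkpoint, 0.0 to 1.0


def on_crash(state, checkpoint_fraction, max_retries=1, min_progress=0.01):
    """Return 'restart' or 'abort' after a crash, updating state in place."""
    gained = checkpoint_fraction - state.last_checkpoint_fraction
    if gained >= min_progress:
        # The task got further than last time, so earlier crashes are forgiven.
        state.crashes_since_progress = 0
        state.last_checkpoint_fraction = checkpoint_fraction
    state.crashes_since_progress += 1
    if state.crashes_since_progress > max_retries:
        return "abort"
    return "restart"


if __name__ == "__main__":
    state = TaskState()
    print(on_crash(state, 0.30))  # first crash: restart from the checkpoint
    print(on_crash(state, 0.30))  # crashed again with no progress: abort
```

With the defaults, a task is restarted once per stretch of progress, so a one-off fluke is survived without user input while a task that keeps dying at the same point is given up on.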
If anyone has more to add about possible causes of the error, I would be interested in hearing that too. For now, I'm assuming it was just a fluke.
http://einsteinathome.org/task/87532936
5.4.9
process got signal 11