My primary host has now had 5 failed WUs returned as 'Compute Error' with exit status 139 (0x8b).
These are the failed WUs:
33482430
33450969
33440121
33364148
33360594
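For anyone decoding that exit status: on Linux, a status above 128 conventionally means the process was killed by signal number (status - 128), so 139 would correspond to signal 11. The sketch below assumes this convention applies to what the BOINC client reports here, which the thread does not confirm. The helper name `describe_exit_status` is made up for illustration.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret an exit status using the common 128 + signal convention.

    Assumption: the reported status follows the shell-style convention
    where values above 128 encode a terminating signal number.
    """
    if status > 128:
        signum = status - 128
        try:
            # Look up the platform's symbolic name for this signal number.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"exited normally with code {status}"


if __name__ == "__main__":
    # 139 is 0x8b, the status reported for the failed WUs above.
    print(describe_exit_status(139))
```

Signal 11 is SIGSEGV (a segmentation fault), which is a different signal from SIGABRT (signal 6 on Linux).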
This host is an A64 3700 overclocked to 2.86 GHz. This host has never had stability issues (running 24/7 for over a year) with previous Science Runs. To rule out the overclock causing the WUs to fail, I set the host back to stock speed at 2.2 GHz. No luck. The first WU I listed above failed at stock settings.
There is no obvious pattern to these failures: some WUs finish successfully, others do not. I don't care about the credit, but the wasted crunching hours (38+) are starting to bother me. Anyone have any ideas? Is this a problem with my host, or are they still ironing out some kinks in the new app?
There are 10^11 stars in the galaxy. That used to be a huge number. But it's only a hundred billion. It's less than the national deficit! We used to call them astronomical numbers. Now we should call them economical numbers. - Richard Feynman
5th Computing error for S5R2 on one host
Yep, unfortunately the new app is still not entirely reliable. I had a WU crash on me with the dreaded "SIGABRT" error with the new app, on an AMD, of course. I really hope, for the sake of the project and all of us crunchers, that the developers manage to put a stop to that soon. While I'm not complaining, it is, as you said, Dave, a pity about all those CPU hours.
Thanks for the quick reply. I hope the devs can get this worked out soon without too much headache (for them). Guess it's time to crank my overclock back up and hope for some "lucky" WUs!
Yes, the only sure way to avoid this would be running Windows, I think... I've only seen Linux hosts get that error. But Windows is just TOO slow on an AMD host, apart from not being many people's OS of choice ;-) so one has to take the odds... I wish you happy crunching and more luck with your next WUs.
I didn't realize this was just limited to the Linux app. I dual-boot with Windows for games and such, but couldn't imagine "going back" and losing Beryl and other features of Linux. I'll just wait patiently for a new app.
On a side note, wouldn't this bug be wreaking havoc with Bruce's cluster?
Cheers
IIRC I've seen win machines erroring out too.
Then I'm definitely sticking with Linux!
Not with SIGABRT... That seems to be Linux-specific...
It seems that every WU that was interrupted and resumed gets a compute error with signal 11/SIGABRT on a Linux machine.
Example: http://einsteinathome.org/task/83757575, and this host has more like it.
It's a pity, as I have to restart them quite often...
That host is using an old version of BOINC (4.43). Could it have something to do with that?
That couldn't have been the reason for my WU to crash. I'm completely sure I didn't pause/resume that. Maybe this can trigger SIGABRT errors, but it can't be the only thing that causes them...
Oh, and for Bruce's cluster... afaik some of his machines were indeed affected quite badly...