BOINC, Einstein and the weird SIGFPE.

Modi
Joined: 15 Jun 06
Posts: 9
Credit: 255613
RAC: 0
Topic 193400

Hello everyone!

As you can see here, Einstein has quite a problem on both of my PCs.

Most of the workunits quit with signal 8 (meaning SIGFPE).
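
For reference, the number-to-name mapping can be checked directly. The following is a minimal Python sketch, not part of BOINC or the Einstein@Home application. The helper name `describe_exit` is hypothetical, and it assumes the convention used by Python's `subprocess` module on POSIX systems, where a negative return code -N means the child process was terminated by signal N.

```python
import signal


def describe_exit(returncode: int) -> str:
    """Turn a subprocess-style return code into a readable description.

    Assumption: negative values follow the subprocess convention on POSIX,
    where -N means "terminated by signal N".
    """
    if returncode < 0:
        signum = -returncode
        try:
            # Look up the symbolic name for this signal number.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    print(describe_exit(-8))
```

On Linux this reports signal 8 as SIGFPE, which matches the error seen in the failed results.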

I haven't been able to track down what is happening, but it has been going on for some time now...

I even changed to the latest BOINC version without any success. :(

I'm using Linux and have heard that your Linux client often has problems, so... I guess this is a bug that should be fixed, or something like that.

Anyways, the problem is that I've been doing work for Einstein for MONTHS now with just a few valid WUs, and that is driving me nuts... it's just wasted CPU power, for you as well as for "me". :(

So do you have any clue how that happens/how it could be fixed?

-Ionic

Desti
Joined: 20 Aug 05
Posts: 117
Credit: 23762214
RAC: 0

What distribution do you use? I'm sure this is not an Einstein bug; it looks more like your configuration or hardware is broken.

Modi
Joined: 15 Jun 06
Posts: 9
Credit: 255613
RAC: 0

I'm using Gentoo GNU/Linux, but I don't think that my hardware or configuration is broken because Einstein is the only program that has such problems.
Even SETI is working just fine...

I ran memtest86+ for almost 48 hours when I got the PC, without any bad results.

Nor do I "rice": my CFLAGS were -O2 -march=pentium3 -mtune=pentium3 -mcpu=pentium3 -pipe -fomit-frame-pointer on one PC and -O2 -march=pentium-m -pipe -fomit-frame-pointer on the other one.

It seems much like an Einstein client issue, but I might be wrong as well.

-Ionic

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 730093247
RAC: 1187101

Hi!

The quoted error is quite rare, and it's strange that you have two PCs consistently suffering from it. Are your PCs overclocked? Are they co-located in the same room? You are not living in the vicinity of a nuclear reactor, right :-) :-) (just kidding)?

For the Pentium M, you might want to try the Linux / Intel "Power User App" discussed here. Unfortunately, your PIII-S doesn't support it.

This version does floating point calculations differently. Whatever is causing these errors, with some luck it won't occur in the Power User app.

CU
Bikeman

BTW: The fact that other programs are not crashing does not really prove anything. The E@H app contains special code to catch these error conditions; older versions of the E@H app would have continued as well for some time, unless they ran into some random secondary error condition.
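
The difference between continuing silently and stopping at the first bad value can be illustrated in a few lines. This is only a sketch of the general idea in Python; it is not how the E@H application is implemented, and the helper names (`checked`, `unchecked_sum`, `checked_sum`) are made up for the example. With default IEEE 754 arithmetic, an overflow produces `inf` rather than stopping the program, so code without checks simply keeps computing with a meaningless value.

```python
import math


def checked(value: float, what: str = "result") -> float:
    """Return value unchanged, or raise if it is inf or nan."""
    if not math.isfinite(value):
        raise FloatingPointError(f"{what} is not finite: {value!r}")
    return value


def unchecked_sum(values):
    """Sum without checks: an overflow silently turns the total into inf."""
    total = 0.0
    for v in values:
        total += v * 10.0
    return total


def checked_sum(values):
    """Same computation, but stop at the first non-finite intermediate."""
    total = 0.0
    for i, v in enumerate(values):
        term = checked(v * 10.0, f"term {i}")
        total = checked(total + term, "running total")
    return total
```

With an input such as `[1.0, 1e308, 2.0]`, `unchecked_sum` returns `inf` without complaint, while `checked_sum` raises `FloatingPointError` at the overflowing term. A program of the first kind looks healthy even when its arithmetic has gone wrong, which is why the absence of crashes elsewhere says little.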

Modi
Joined: 15 Jun 06
Posts: 9
Credit: 255613
RAC: 0

Hey,

no, I would never overclock any PC. :)

Heh, yeah, they are in the same room, but that room is always quite cool. I don't like hot temperatures. ;)

Thanks, I will have a look at that...

One more thing: I guess the Pentium III-S and Pentium M architectures are quite close to each other, aren't they? Well, anyways... I will try to upgrade to the latest glibc in a few weeks (I'm using 2.5, whereas 2.6 is current stable) and rebuild everything, in the hope that it might magically solve some of my problems. :)

And I will try that other binary on my laptop; I hope it'll solve the problem there, at least. :)

Thank you for your help,

-Ionic

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 730093247
RAC: 1187101

Message 76603 in response to message 76602

Quote:

One more thing: I guess the Pentium III-S and Pentium M architectures are quite close to each other, aren't they?

Indeed, the Pentium M is an enhanced Pentium III-S. Among other things, support for the SSE2 instruction set was added, and this is required by the current build of the Linux/Intel Power User app.
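
Whether a given Linux box has SSE2 can be read from the `flags` line in `/proc/cpuinfo`. Here is a small Python sketch of that check, assuming an x86 Linux system; the function names are hypothetical and this is not something shipped with BOINC or the Power User app.

```python
def parse_cpu_flags(cpuinfo_text: str) -> set:
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # On x86 Linux each core has one "flags : ..." line.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def supports_sse2(cpuinfo_text: str) -> bool:
    """True if the 'sse2' flag appears in the given cpuinfo text."""
    return "sse2" in parse_cpu_flags(cpuinfo_text)


def host_supports_sse2(path: str = "/proc/cpuinfo") -> bool:
    """Check the running machine (assumption: Linux with /proc mounted)."""
    with open(path) as f:
        return supports_sse2(f.read())
```

Calling `host_supports_sse2()` on each machine should show `sse2` present on the Pentium M and absent on the PIII-S, which only goes up to `sse`.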

CU
Bikeman

Modi
Joined: 15 Jun 06
Posts: 9
Credit: 255613
RAC: 0

Yes, I know that too well. :)

I made a full backup of my laptop before getting it repaired, mounted the hard disk via loop on my desktop, and nearly EVERY command produced SIGILL - that was my first contact with SSE2 vs. no SSE2. :)

Anyways, I hope I'll have some luck with that.
SIGFPE is indeed rather unusual. On my router (200MHz MIPS32) I could "produce" it by installing binutils-2.18 with uClibc 0.9.27... it was horrible: every binutils program exited with SIGFPE and did nothing at all. Later, I read that binutils-2.18 is no longer compatible with that old uClibc... bad luck. :/

Thus I thought that my "old" glibc could cause that problem and I hope that upgrading will help... we'll see. :)

By the way, dual PIII-S rocks, doesn't it? Unfortunately, I just have a single-CPU board, but two of those CPUs. Buying a dual board would cause astronomical costs, though, because I would have to replace nearly everything, even the RAM (ECC is a must on dual boards). :(

Hope you enjoy it anyways. :)

-Ionic

Modi
Joined: 15 Jun 06
Posts: 9
Credit: 255613
RAC: 0

Hi,

well, you are indeed right: with that new version I do not get SIGFPE anymore, but this time I get some other errors which look even MORE critical. Could you please have a look at them? (The last result on sui.lan.)

-Ionic

Desti
Joined: 20 Aug 05
Posts: 117
Credit: 23762214
RAC: 0

Do you have some unstable libs installed?

Modi
Joined: 15 Jun 06
Posts: 9
Credit: 255613
RAC: 0

Not that I would know. :(

-Ionic

Desti
Joined: 20 Aug 05
Posts: 117
Credit: 23762214
RAC: 0

Or did you do a major GCC upgrade without rebuilding the system libs?
