Client Errors of S5R2/S5R3 Apps

tapir
Joined: 19 Mar 05
Posts: 23
Credit: 462935446
RAC: 0

Task details

My first error with new app.

Knight * anday
Joined: 13 Dec 05
Posts: 18
Credit: 555739
RAC: 0

I have not had any new work loaded for about 5 days.
Every day I download 10 units and all of them have client errors.
I totally uninstalled BOINC and Einstein twice and deleted all the files I could find,
but I still get error messages when I reinstall.

Running Win XP

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 731706044
RAC: 1223085

Message 71199 in response to message 71198

This might be related to the Glasgow mirror being down. In theory the BOINC client should skip that one and try the others one after another; I don't get why this doesn't work.

Try UPDATE once more. The Glasgow mirror was removed from the list of available mirrors, and hopefully this will be synchronized with the client when you hit "UPDATE" in the BOINC Manager.

CU
Bikeman

Knight * anday
Joined: 13 Dec 05
Posts: 18
Credit: 555739
RAC: 0

Installed new BOINC.
Managed to get new work loaded just now.

Witold Baryluk
Joined: 20 Nov 06
Posts: 4
Credit: 12402880
RAC: 0

Message 71201 in response to message 71180

Quote:

Hi!

The last time I saw something similar was here.

The PC affected by this turned out to produce errors in another BOINC project (QMC) as well, so the most likely cause for this was a hardware failure.

This could well be the case for your PC as well. Is it overclocked, or aging?

CU

Bikeman

Quote:

Hi guys,

For the last couple of WUs I've repeatedly got the same compute error on one of my machines (running BOINC 5.10.27):

-------snip--------------------------------------------

APP DEBUG: Application caught signal 8.

FPU status word ffff80c1, flags: ERR_SUMM STACK_FAULT INVALID
Obtained 7 stack frames for this thread.
Use gdb command: 'info line *0xADDRESS' to print corresponding line numbers.
einstein_S5R3_4.20_i686-pc-linux-gnu[0x80a4b9e]
einstein_S5R3_4.20_i686-pc-linux-gnu(LocalComputeFStatFreqBand+0x1849)[0x80ace69]
einstein_S5R3_4.20_i686-pc-linux-gnu(MAIN+0x352d)[0x80a495d]
einstein_S5R3_4.20_i686-pc-linux-gnu[0x80a5b34]
../../projects/einstein.phys.uwm.edu/einstein_S5R3_4.20_i686-pc-linux-gnu.so(_Z6foobarPv+0x14)[0xb7cd9e24]
/lib/libpthread.so.0[0xb7ed4383]
/lib/libc.so.6(clone+0x5e)[0xb7e5863e]
Stack trace of LAL functions in worker thread:
LocalComputeFStatFreqBand at line 201 of file /home/bema/einsteinathome/HierarchicalSearch/EaH_build_release_einstein_S5R3_4.20/extra_sources/lalapps-CVS/src/pulsar/hough/src2/LocalComputeFstat.c
LocalComputeFStat at line 289 of file /home/bema/einsteinathome/HierarchicalSearch/EaH_build_release_einstein_S5R3_4.20/extra_sources/lalapps-CVS/src/pulsar/hough/src2/LocalComputeFstat.c
(null) at line 0 of file (null)
At lowest level status code = 0, description: NO LAL ERROR REGISTERED

-------snip--------------------------------------------

There seems to be some floating-point exception. Any idea?

Cheers, Oliver


Hi,

I have identical errors on one of my computers. It is an HP dc7100 SFF with a P4 2.8 GHz with HT and 4x256MB DDR1, running Debian/Linux unstable with a custom 2.4.25-rc6 kernel.

http://einsteinathome.org/task/93783856

They repeat many times. Some workunits finish without problems, but some (most) end with an error. Maybe this is because I reboot my computer quite often, or because the CPU is too hot? (I'm using cpufreqd for frequency scaling so that the cooling fans aren't noisy at night. Constant monitoring of the CPU temperature gives 38-54 C.)

The computer as a whole is very stable. No crashes since I bought it. I often run numerical and raytracing codes on it, and they produce no artifacts.
Is there any good and fast way to determine what is wrong? Running memtest86+ for hours finds nothing.

I discovered that when I don't use cpufreqd, run at the highest frequency, and don't reboot, most of the workunits are accepted. And they are computed quickly.

But I don't know whether this is really the cause. Lots of people use laptops with frequency scaling and don't have similar problems.

Any idea?
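
For reference, signal 8 on Linux is SIGFPE, and the "FPU status word" line in the quoted trace can be decoded by hand. Below is a minimal sketch that assumes the low 16 bits of the printed value are the x87 status word (the upper "ffff" half is ignored). The names INVALID, STACK_FAULT and ERR_SUMM are taken from the trace; the function name and the remaining flag names are my own labels for the other standard exception bits, not something the app prints.

```shell
#!/bin/sh
# decode_fpu_status HEX
# Print the names of the x87 status-word flag bits that are set.
# Assumption: only the low 16 bits of HEX are the status word.
decode_fpu_status() {
    sw=$(( 0x$1 & 0xFFFF ))
    out=""
    # bit:name pairs; bits 8-14 (condition codes, stack top) are skipped
    for pair in 0:INVALID 1:DENORMAL 2:ZERO_DIVIDE 3:OVERFLOW \
                4:UNDERFLOW 5:PRECISION 6:STACK_FAULT 7:ERR_SUMM 15:BUSY; do
        bit=${pair%%:*}
        name=${pair#*:}
        if [ $(( (sw >> bit) & 1 )) -eq 1 ]; then
            out="$out $name"
        fi
    done
    echo "${out# }"
}

# Example: decode_fpu_status ffff80c1
```

For ffff80c1 this reports the invalid-operation, stack-fault and error-summary bits, which matches the flags listed in the trace, plus the busy bit that the trace does not list. It only restates what the log already says and does not explain why the exception was raised.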

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 731706044
RAC: 1223085

Hi!

The vast majority of these errors appear with custom kernels/libs, so I wonder whether there might be anything in the compiler switches you used when building the libs and kernel that doesn't go well with the app. There are some options that are OK only if all the software using the libs is compiled with them as well.

CU
Bikeman

Venturini Dario[VENETO]
Joined: 1 Apr 05
Posts: 12
Credit: 152416
RAC: 0

I used to run Einstein@Home on my PC under Windows XP. Now I have tried switching to 64-bit Ubuntu (7.10).

Every single WU I run fails:

ven 28 mar 2008 16:44:23 CET|Einstein@Home|Finished download of einstein_S5R3_4.38_graphics_i686-pc-linux-gnu
ven 28 mar 2008 16:44:36 CET|Einstein@Home|Starting h1_0900.50_S5R3__108_S5R3b_0
ven 28 mar 2008 16:44:37 CET|Einstein@Home|Starting task h1_0900.50_S5R3__108_S5R3b_0 using einstein_S5R3 version 438
ven 28 mar 2008 16:44:39 CET|Einstein@Home|Computation for task h1_0900.50_S5R3__108_S5R3b_0 finished
ven 28 mar 2008 16:44:39 CET|Einstein@Home|Output file h1_0900.50_S5R3__108_S5R3b_0_0 for task h1_0900.50_S5R3__108_S5R3b_0 absent


5.10.45

process exited with code 255 (0xff, -1)

Detected CPU type 1
execv returned: -1

That's the error I get. The WUs fail immediately after starting.

One example is WU 38033925.

My computer is overclocked, but I also ran other projects and they work: LHC, Rosetta, ABC. Einstein and Spinhenge fail instead. Spinhenge fails with a different error (process exited with code 22 (0x16, -234)).
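
A side note on reading those exit messages: both "255 (0xff, -1)" and "22 (0x16, -234)" are consistent with the three numbers being the exit code in decimal, the same code in hex, and the code minus 256. The sketch below just reproduces that arithmetic; the function name is made up for illustration, and the reading of the third number is inferred from these two messages rather than taken from BOINC documentation.

```shell
#!/bin/sh
# exit_code_forms CODE
# Print CODE the way the two messages above show it:
# decimal, hex, and CODE minus 256 (inferred interpretation).
exit_code_forms() {
    printf '%d (0x%x, %d)\n' "$1" "$1" $(( $1 - 256 ))
}

# Example: exit_code_forms 255
#          exit_code_forms 22
```

So the two failures really do report different exit codes (255 and 22), and the negative numbers carry no extra information.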

I have BOINC version 5.10.45, 64-bit.

Any idea?

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 731706044
RAC: 1223085

Hm....strange...

But the Spin exit code 22 hints at a lack of 32-bit compatibility libraries, see here:

http://boincfaq.mundayweb.com/index.php?language=1&view=280

If that doesn't help:

The 4.38 app package has four executables in it: one for graphics, one that just performs a CPU capabilities check (basically detecting SSE support), and two others, one of which the checker then calls. For your machine it has to be the one with the "_1" suffix.

Make sure that *all* those executables are

a) present in the projects/einstein.phys.uwm.edu subdirectory of BOINC
b) executable for the user that is used to run BOINC

It can't hurt to execute

ldd einstein_S5R3_4.38_i686-pc-linux-gnu_1

in the projects/einstein.phys.uwm.edu subdirectory of BOINC to verify all required shared libraries for this executable are present.

You can even try to execute this file directly, e.g.

einstein_S5R3_4.38_i686-pc-linux-gnu_1 --help

to see if it actually executes.
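
The checks above could be scripted roughly as follows. This is only a sketch: the function names and the idea of matching files by a common name prefix are my own assumptions, and the project directory has to be adjusted to wherever BOINC is installed.

```shell
#!/bin/sh
# report_app_files DIR PREFIX
# List every file in DIR whose name starts with PREFIX and say
# whether the current user is allowed to execute it.
report_app_files() {
    dir=$1
    prefix=$2
    found=0
    for f in "$dir"/"$prefix"*; do
        [ -e "$f" ] || continue          # the glob matched nothing
        found=1
        if [ -x "$f" ]; then
            echo "ok: $(basename "$f")"
        else
            echo "NOT EXECUTABLE: $(basename "$f")"
        fi
    done
    [ "$found" -eq 1 ] || echo "no files matching $prefix* in $dir"
}

# missing_libs FILE
# Print the ldd lines for shared libraries that could not be resolved.
missing_libs() {
    ldd "$1" 2>&1 | grep 'not found'
}

# Example, run as the user that runs BOINC, from the BOINC directory:
#   report_app_files projects/einstein.phys.uwm.edu einstein_S5R3_4.38_
#   missing_libs projects/einstein.phys.uwm.edu/einstein_S5R3_4.38_i686-pc-linux-gnu_1
```

If missing_libs prints nothing, ldd did not flag any library as missing. On a 64-bit system with no 32-bit loader installed, ldd may instead complain that the file is not a dynamic executable, so it is worth looking at the full ldd output as well.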

Any results from this?

Bikeman

Venturini Dario[VENETO]
Joined: 1 Apr 05
Posts: 12
Credit: 152416
RAC: 0

Message 71205 in response to message 71204

Installed ia32 and everything works, including graphics. If you think I should try the rest, for everybody's benefit, I will. But I'm already happy. Thanks A LOT!

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 731706044
RAC: 1223085

Glad it works now. No, the other suggestions were just "Plan B" in case the compatibility lib trick did not work.

Happy crunching
Bikeman
