Lots of Client errors

tullio
Joined: 22 Jan 05
Posts: 2118
Credit: 61407735
RAC: 0

This is what I have on my SuSE 11.1, with no errors in any of the apps I run:
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
Tullio
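To compare preemption settings like the ones above across machines, a small parser can report which preemption model a kernel config enables. This is a minimal sketch, assuming the config is available as plain text (many distributions ship it as /boot/config-<release>, and some kernels expose it as /proc/config.gz; neither is guaranteed). The function name `preemption_model` is hypothetical, and only the three preemption options listed above are checked.

```python
# Hypothetical helper: report which preemption model a kernel .config selects.
PREEMPT_MODELS = (
    "CONFIG_PREEMPT_NONE",
    "CONFIG_PREEMPT_VOLUNTARY",
    "CONFIG_PREEMPT",
)


def preemption_model(config_text):
    """Return the preemption option set to 'y', or None if none is set."""
    for line in config_text.splitlines():
        line = line.strip()
        # Enabled options look like NAME=y; disabled ones are comments
        # of the form "# NAME is not set", which are skipped here.
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name in PREEMPT_MODELS and value == "y":
            return name
    return None


tullio_config = """\
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
"""
print(preemption_model(tullio_config))  # CONFIG_PREEMPT_NONE
```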

mickydl*
Joined: 7 Oct 08
Posts: 39
Credit: 200374822
RAC: 0

Hi everybody and a happy new year!

Since, according to the FAQ, the bug was fixed in kernel version 2.6.25.6 (or is it 2.6.27, as stated at the beginning of the FAQ?), this is likely a SUSE problem.
I have not yet tested the official (non-SUSE) kernel. On the other hand, if it were a general problem affecting all kernels, we should see more people complaining about this error.
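Whether a running kernel is at least as new as either FAQ version can be checked from its release string (the output of `uname -r`, or `platform.release()` in Python). This is a minimal sketch with hypothetical helper names; note that a version check cannot tell whether a distribution kernel carries extra patches or regressions, which is exactly the open question here.

```python
import re


def version_tuple(release):
    """Turn e.g. '2.6.32-gentoo-r1' into (2, 6, 32): numeric fields before any suffix."""
    match = re.match(r"\d+(?:\.\d+)*", release)
    if not match:
        raise ValueError(f"no version number in {release!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def kernel_at_least(release, minimum):
    """True if the release's numeric version is >= the minimum version."""
    return version_tuple(release) >= version_tuple(minimum)


print(kernel_at_least("2.6.32-gentoo-r1", "2.6.25.6"))  # True
print(kernel_at_least("2.6.25", "2.6.25.6"))            # False
```

Tuple comparison handles different lengths naturally, so (2, 6, 25) sorts before (2, 6, 25, 6).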

For me, re-compiling the kernel with

CONFIG_PREEMPT_VOLUNTARY=y

seems to do the trick.
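Switching a config to voluntary preemption before recompiling is normally done through the Preemption Model choice in `make menuconfig`. As a rough non-interactive alternative, the sketch below rewrites the three preemption lines in a config text so that exactly one is enabled. `select_preemption` is a hypothetical helper, not a kernel tool; after editing a real .config, running `make oldconfig` lets kconfig re-check dependent options.

```python
# Hypothetical helper: enable exactly one preemption model in .config text.
PREEMPT_MODELS = ("CONFIG_PREEMPT_NONE", "CONFIG_PREEMPT_VOLUNTARY", "CONFIG_PREEMPT")
NOT_SET = " is not set"


def select_preemption(config_text, wanted="CONFIG_PREEMPT_VOLUNTARY"):
    """Return config text with only `wanted` enabled among the preemption models."""
    if wanted not in PREEMPT_MODELS:
        raise ValueError(f"unknown preemption model: {wanted}")
    out, seen = [], False
    for line in config_text.splitlines():
        stripped = line.strip()
        name = None
        if stripped.startswith("# ") and stripped.endswith(NOT_SET):
            name = stripped[2:-len(NOT_SET)]      # "# NAME is not set"
        elif "=" in stripped and not stripped.startswith("#"):
            name = stripped.partition("=")[0]     # "NAME=value"
        if name in PREEMPT_MODELS:
            # Rewrite in kconfig's usual enabled/disabled notation.
            line = f"{name}=y" if name == wanted else f"# {name}{NOT_SET}"
            seen = seen or name == wanted
        out.append(line)
    if not seen:
        # The wanted option was absent entirely, so append it.
        out.append(f"{wanted}=y")
    return "\n".join(out) + "\n"


old = """\
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
"""
print(select_preemption(old), end="")
# CONFIG_PREEMPT_NOTIFIERS=y
# # CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY=y
# # CONFIG_PREEMPT is not set
```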

This kernel has been running since Dec. 30th, and apart from two "Validate errors" (probably due to a system crash on my side), all Einstein WUs behave normally, as you can see here.

If I can find the time, I'll set up a test machine during the next couple of weeks and do some testing.

Cheers,
Michael

exo
Joined: 11 Feb 06
Posts: 11
Credit: 133077998
RAC: 0

Hi,

I had the same problem, and CONFIG_PREEMPT_VOLUNTARY=y fixed it. I'm of the opinion that this is probably a kernel bug, because I'm using neither SuSE nor an AMD CPU. Computation SIGNAL 8 errors always occurred when 5 or more processes were computing in parallel; with fewer processes there were no problems.

These are my system specs:

sb@shima ~ $ uname -a
Linux shima 2.6.32-gentoo-r1 #2 SMP Sat Jan 9 13:39:34 CET 2010 x86_64 Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz GenuineIntel GNU/Linux

Maybe someone should report this upstream.
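For reference, the "SIGNAL 8" in these error reports is a numeric signal; on Linux, signal 8 is SIGFPE (floating-point/arithmetic exception). Python's standard `signal` module can translate such numbers; the wrapper name `describe_signal` is hypothetical.

```python
import signal


def describe_signal(signum):
    """Map a numeric signal (as in 'SIGNAL 8') to its symbolic name."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        return f"unknown signal {signum}"


print(describe_signal(8))  # SIGFPE (on Linux)
```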

mickydl*

Message 96009 in response to message 96008

Quote:

Hi,

I had the same problem, and CONFIG_PREEMPT_VOLUNTARY=y fixed it. I'm of the opinion that this is probably a kernel bug, because I'm using neither SuSE nor an AMD CPU. Computation SIGNAL 8 errors always occurred when 5 or more processes were computing in parallel; with fewer processes there were no problems.

These are my system specs:

sb@shima ~ $ uname -a
Linux shima 2.6.32-gentoo-r1 #2 SMP Sat Jan 9 13:39:34 CET 2010 x86_64 Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz GenuineIntel GNU/Linux

Maybe someone should report this upstream.

Are you running this kernel at the moment?

You have 3 compute errors today - so it doesn't look like it's fixed.

Cheers,
Michael

exo

Message 96010 in response to message 96009

Quote:

Are you running this kernel at the moment?

You have 3 compute errors today - so it doesn't look like it's fixed.

Cheers,
Michael

These 3 errors occurred with the fully preemptible kernel. Now I'm running the kernel I posted.

Pushkin
Joined: 12 Mar 07
Posts: 15
Credit: 33187685
RAC: 0

Message 96011 in response to message 96008

Quote:

Computation SIGNAL 8 errors always occurred when 5 or more processes were computing in parallel; with fewer processes there were no problems.

Hi,
man, you're right! You have just led me to this idea. As far as I remember, these compute errors occur occasionally during "normal" computer work (say, one per day). I get more computation errors while I run heavy computations (I sometimes work with FEM and compute large tasks). By far the most computation errors occurred (really lots, almost all units were bad) when my KDE4 got stuck: there was some problem with kwin4 and plasma, and both were using 100% of my CPU.

Is it possible that these computation errors occur when there are more applications than CPU cores, each trying to use 100% of a core?
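One way to check the situation described above (more CPU-hungry processes than cores) is to compare the load average with the number of CPUs. This is a minimal sketch, assuming Linux's /proc/loadavg format (three load averages, then runnable/total scheduling entities, then the last PID). The function names are hypothetical, and Linux load also counts tasks in uninterruptible sleep, so it is only a rough indicator.

```python
import os


def cpu_pressure(loadavg_text, cpus):
    """Parse /proc/loadavg text; return (1-min load per CPU, runnable > cpus)."""
    fields = loadavg_text.split()
    load_1min = float(fields[0])
    # The 4th field looks like "7/512": currently runnable / total entities.
    runnable = int(fields[3].split("/")[0])
    return load_1min / cpus, runnable > cpus


def current_pressure():
    """Same check on the live system (Linux only)."""
    with open("/proc/loadavg") as f:
        return cpu_pressure(f.read(), os.cpu_count() or 1)


# Illustrative input, not a measurement: load 6.0 on 4 CPUs, 7 runnable tasks.
print(cpu_pressure("6.00 5.00 4.00 7/512 12345", 4))  # (1.5, True)
```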

rroonnaalldd
Joined: 12 Dec 05
Posts: 116
Credit: 537221
RAC: 0

Message 96012 in response to message 96011

Quote:

By far the most computation errors occurred (really lots, almost all units were bad) when my KDE4 got stuck: there was some problem with kwin4 and plasma, and both were using 100% of my CPU.

Is it possible that these computation errors occur when there are more applications than CPU cores, each trying to use 100% of a core?

No, more applications than CPU cores are not a problem. This is multitasking. ;) The problem seems to be KDE4 and/or the plasma theme. IIRC, isn't plasma a theme that uses the GPU? Try an older desktop theme, or switch the whole desktop manager to a smaller one like Icebox or Fvwm2.
I had many problems (desktop freezes, mouse and/or keyboard dead) on an older host with KDE3 too. Really uncool, but you learn to love a second PC with ssh on board. ;)

exo

Message 96013 in response to message 96011

Quote:

Is it possible that these computation errors occur when there are more applications than CPU cores, each trying to use 100% of a core?

In my case, errors occurred when more processes than physical cores were running, but this is only a guess.

I'm running KDE4 with desktop effects on; maybe the problems only appear under high load while the GPU is in use? Are you using the proprietary NVIDIA drivers?
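The guess above distinguishes physical cores from logical CPUs. An i7 860 has 4 physical cores and, with Hyper-Threading enabled, 8 logical CPUs, so with 5 or more busy processes at least one physical core has to share its resources between two of them. This is a minimal sketch that counts both from /proc/cpuinfo text, assuming the x86 `processor`, `physical id` and `core id` fields; the helper names are hypothetical, and `os.cpu_count()` reports only the logical count.

```python
def count_cores(cpuinfo_text):
    """Return (physical_cores, logical_cpus) parsed from /proc/cpuinfo text."""
    logical = 0
    cores = set()
    physical_id = core_id = None
    # A blank line ends each processor block; the extra "" flushes the last one.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            if core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            physical_id = value
        elif key == "core id":
            core_id = value
    # Without core ids (e.g. some VMs), fall back to the logical count.
    return (len(cores) or logical), logical


def live_counts():
    """Same check on the running Linux system."""
    with open("/proc/cpuinfo") as f:
        return count_cores(f.read())


# Simulated 4-core / 8-thread layout (not copied from a real machine).
sample = "".join(
    f"processor\t: {i}\nphysical id\t: 0\ncore id\t\t: {i % 4}\n\n" for i in range(8)
)
print(count_cores(sample))  # (4, 8)
```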

Pushkin

Message 96014 in response to message 96013

Quote:
Quote:

Is it possible that these computation errors occur when there are more applications than CPU cores, each trying to use 100% of a core?

In my case, errors occurred when more processes than physical cores were running, but this is only a guess.

I'm running KDE4 with desktop effects on; maybe the problems only appear under high load while the GPU is in use? Are you using the proprietary NVIDIA drivers?


No, I am using the proprietary ATI drivers, and I have GPU computations disabled in a configuration file, so I think this bug is not related to the GPU.

Pushkin

Hello,
I have discussed this problem in one of our national forums, and someone suggested running an infinite loop to try to provoke client errors. I did so, and the result is that all workunits computed during the test finished with an error.

The test conditions were simple: I ran two infinite loops (for approx. 18 hours), which were processed by two of the four cores of my CPU. The remaining two cores were shared among four BOINC processes (Einstein and Rosetta).
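The test can be approximated with a few CPU-bound processes. This is a minimal sketch, assuming Python 3 is available; it uses a fixed duration instead of truly infinite loops so it stops on its own, and it leaves core placement to the scheduler (`taskset` from util-linux can pin processes if needed). The function names are hypothetical.

```python
import subprocess
import sys

# Each child spins on the CPU until its deadline passes.
BUSY_LOOP = (
    "import time\n"
    "deadline = time.monotonic() + {seconds}\n"
    "while time.monotonic() < deadline:\n"
    "    pass\n"
)


def start_busy_loops(count, seconds):
    """Start `count` CPU-bound Python processes, each running for `seconds`."""
    code = BUSY_LOOP.format(seconds=seconds)
    return [subprocess.Popen([sys.executable, "-c", code]) for _ in range(count)]


def wait_all(procs):
    """Wait for every process and return their exit codes."""
    return [proc.wait() for proc in procs]
```

The setup described above would correspond roughly to `wait_all(start_busy_loops(2, 18 * 3600))` on a 4-core machine while BOINC runs its tasks alongside.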

Can you please try it too, to check whether this error can be reproduced on different hardware/distros?

Pushkin
