E@H on linux kernel 2.6.9 and AMD64 3000

The Pirate
Joined: 11 Nov 04
Posts: 57
Credit: 23332769
RAC: 0
Topic 187186

I just re-attached my AMD64 3000. When I run top, it does not show E@H running, yet CPU utilization is at 99%. ps -Al lists E@H as one of the processes, but marks it as sleeping. If I kill E@H, the CPU goes to 100% idle, so it must be running. I am using the new E@H client, 4.68. The same thing happened with the older E@H client: after a while E@H quit running, the CPU went idle, and S@H did not start.
I'll let it run overnight to see what happens.

E@H does run correctly on the older Linux kernel, 2.4.x.


The Pirate
Joined: 11 Nov 04
Posts: 57
Credit: 23332769
RAC: 0

Update:
I had to detach the E@H project. While running E@H, the machine went idle about 16 hours ago and never came back, which also prevented S@H from starting. This is the only box on which I am running a Linux 2.6.x kernel.


Shaktai
Joined: 8 Nov 04
Posts: 183
Credit: 426451
RAC: 0

I wonder if this is due to the new client. The new Mac client seems to show similar (though not identical) behavior.

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

> I just re-attached my AMD64 3000. When I run top, it does not show E@H
> running, yet CPU utilization is at 99%. ps -Al lists E@H as one of the
> processes, but marks it as sleeping. If I kill E@H, the CPU goes to 100%
> idle, so it must be running. I am using the new E@H client, 4.68. The same
> thing happened with the older E@H client: after a while E@H quit running,
> the CPU went idle, and S@H did not start.
> I'll let it run overnight to see what happens.
>
> E@H does run correctly on the older Linux kernel, 2.4.x.

James, can you please use 'strace -p PID' to poke at the
'sleeping' process and see what it's up to.

Bruce

Director, Einstein@Home

The Pirate
Joined: 11 Nov 04
Posts: 57
Credit: 23332769
RAC: 0

Ok, I just re-attached E@H, but it probably will be a while before it happens again.


Steffen Grunewald, for Merlin/Morgane
Joined: 18 Oct 04
Posts: 39
Credit: 592286604
RAC: 0

Got some similar behaviour. Machine is running kernel 2.6.8.
After starting the boinc core client (4.13) and the usual benchmarks,
there will be two (dual-CPU machine!) einstein apps in Z state.
It is impossible to attach a strace to any of them.
Restarting from scratch will not help.
The last message I can see is "Restarting result..." using version 4.68.
Any ideas?
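
For anyone wanting to tell a sleeping process (S) from a zombie (Z) without relying on top, here is a minimal Python sketch that reads the state field from /proc/<pid>/stat. The function names and the STATES table are illustrative assumptions, not part of BOINC or any E@H tool.

```python
# Human-readable names for the common one-letter Linux process states.
STATES = {
    "R": "running",
    "S": "sleeping (interruptible)",
    "D": "uninterruptible sleep",
    "T": "stopped",
    "Z": "zombie",
}

def parse_state(stat_line):
    """Return (command, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the last ')'.
    left, right = stat_line.rsplit(")", 1)
    command = left.split("(", 1)[1]
    state = right.split()[0]
    return command, state

def process_state(pid):
    """Read and parse /proc/<pid>/stat for a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_state(f.read())
```

Calling process_state() with the PID reported by ps -Al returns a pair such as (command name, "Z"). A zombie has already exited and is only waiting for its parent to collect its exit status, which would explain why strace cannot attach to it.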

The Pirate
Joined: 11 Nov 04
Posts: 57
Credit: 23332769
RAC: 0

Ok, here are the results of three strace -p PID commands
--------------------------------------------------------------------------------
E@H is running at 99% nice but does not show up in "top" as an active process; it does show up in ps -Al as sleeping.

rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, {1, 0}) = 0
gettimeofday({1104457857, 918755}, NULL) = 0
time(NULL) = 1104457857
gettimeofday({1104457857, 918954}, NULL) = 0
open("/proc/apm", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/proc/acpi/ac_adapter/", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or directory)
time(NULL) = 1104457857
gettimeofday({1104457857, 919437}, NULL) = 0
time(NULL) = 1104457857
time(NULL) = 1104457857
wait4(0, 0xbffff75c, WNOHANG, 0xbffff760) = 0
time(NULL) = 1104457857
gettimeofday({1104457857, 919921}, NULL) = 0
time(NULL) = 1104457857
time(NULL) = 1104457857
time(NULL) = 1104457857
time(NULL) = 1104457857
time(NULL) = 1104457857
time(NULL) = 1104457857
select(1024, [4], NULL, [], {0, 0}) = 0 (Timeout)
time(NULL) = 1104457857
time(NULL) = 1104457857
gettimeofday({1104457857, 920955}, NULL) = 0
time(NULL) = 1104457857
time(NULL) = 1104457857
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, {1, 0}) = 0
gettimeofday({1104457858, 923611}, NULL) = 0
time(NULL) = 1104457858
gettimeofday({1104457858, 923808}, NULL) = 0
open("/proc/apm", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/proc/acpi/ac_adapter/", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or
----------------------------------------------------------------------------
Detached from E@H and re attached to E@H:

E@H is running at 99% nice but does not show up in "top" as an active process; it does show up in ps -Al as sleeping.

jimbob@XAN303000:~$ strace -p 5217
Process 5217 attached - interrupt to quit
setup() = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, {1, 0}) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, {1, 0}) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, {1, 0}) = 0
-------------------------------------------------------------------------

E@H had been running, but when I later checked with top I found the CPU 99% idle.
ps -Al did not show a process for E@H; I only found a process for BOINC (PID 5142).
BOINC should have been running S@H if it was not running E@H, but it was sleeping and running neither.

jimbob@XAN303000:~$ strace -p 5142
Process 5142 attached - interrupt to quit
setup() = 0
gettimeofday({1104636731, 572051}, NULL) = 0
time(NULL) = 1104636731
gettimeofday({1104636731, 572257}, NULL) = 0
open("/proc/apm", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/proc/acpi/ac_adapter/", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or directory)
time(NULL) = 1104636731
gettimeofday({1104636731, 572731}, NULL) = 0
time(NULL) = 1104636731
time(NULL) = 1104636731
wait4(0, 0xbffff75c, WNOHANG, 0xbffff760) = -1 ECHILD (No child processes)
gettimeofday({1104636731, 573138}, NULL) = 0
time(NULL) = 1104636731
time(NULL) = 1104636731
time(NULL) = 1104636731
time(NULL) = 1104636731
time(NULL) = 1104636731
time(NULL) = 1104636731
select(1024, [4], NULL, [], {0, 0}) = 0 (Timeout)
time(NULL) = 1104636731
time(NULL) = 1104636731
gettimeofday({1104636731, 574139}, NULL) = 0
time(NULL) = 1104636731
time(NULL) = 1104636731
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, {1, 0}) = 0
gettimeofday({1104636732, 575925}, NULL) = 0
time(NULL) = 1104636732
gettimeofday({1104636732, 576128}, NULL) = 0
open("/proc/apm", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/proc/acpi/ac_adapter/", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or directory)
time(NULL) = 1104636732
gettimeofday({1104636732, 576615}, NULL) = 0
time(NULL) = 1104636732
time(NULL) = 1104636732
wait4(0, 0xbffff75c, WNOHANG, 0xbffff760) = -1 ECHILD (No child processes)
gettimeofday({1104636732, 577032}, NULL) = 0
time(NULL) = 1104636732
time(NULL) = 1104636732
time(NULL) = 1104636732
time(NULL) = 1104636732
time(NULL) = 1104636732
time(NULL) = 1104636732
select(1024, [4], NULL, [], {0, 0}) = 0 (Timeout)
time(NULL) = 1104636732
time(NULL) = 1104636732
gettimeofday({1104636732, 578079}, NULL) = 0
time(NULL) = 1104636732
time(NULL) = 1104636732
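
Traces like the ones above are easier to compare when reduced to counts. The following is a rough Python sketch, assuming the strace output has been saved as text; the regular expression and the tally() helper are my own illustration, not part of strace.

```python
import re
from collections import Counter

# Matches lines such as:
#   open("/proc/apm", O_RDONLY) = -1 ENOENT (No such file or directory)
LINE = re.compile(r"^(\w+)\(.*\)\s+=\s+(-?\d+)(?:\s+([A-Z]+))?")

def tally(trace_text):
    """Count syscalls and failed syscalls in saved strace output."""
    calls = Counter()
    errors = Counter()
    for line in trace_text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # skip "Process N attached" and truncated lines
        name, ret, errno_name = m.groups()
        calls[name] += 1
        if ret == "-1" and errno_name:
            errors[(name, errno_name)] += 1
    return calls, errors
```

Run on the third trace, the errors counter would contain entries for ("open", "ENOENT") and ("wait4", "ECHILD"): the client polls once per second, fails to find the two power-status files, and finds no child process to wait for.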


Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

It looks like BOINC is trying to determine if your machine is running on batteries or not! Do you have 'battery/laptop' preferences set? If so, could you try changing them, please, and tell me if it does anything?

Cheers,
Bruce

Director, Einstein@Home
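
To illustrate the battery check mentioned above: the two failed open() calls in the trace are consistent with a power-source probe along the following lines. This is a hypothetical Python sketch, not BOINC's actual code. It assumes the classic /proc/apm layout (fourth field is the AC line status) and the old /proc/acpi/ac_adapter/*/state files; the function name on_ac_power is made up for this example.

```python
import glob
import os

def on_ac_power(apm_path="/proc/apm", acpi_dir="/proc/acpi/ac_adapter"):
    """Return True (mains), False (battery) or None (cannot tell)."""
    # APM: the fourth field of /proc/apm is the AC line status
    # (0x00 off-line, 0x01 on-line).
    try:
        with open(apm_path) as f:
            fields = f.read().split()
        if len(fields) > 3 and fields[3] in ("0x00", "0x01"):
            return fields[3] == "0x01"
    except OSError:
        pass  # a missing file corresponds to the first ENOENT in the trace

    # ACPI: each adapter directory holds a "state" file such as
    # "state:   on-line".
    for state_file in glob.glob(os.path.join(acpi_dir, "*", "state")):
        try:
            with open(state_file) as f:
                text = f.read()
        except OSError:
            continue
        if "on-line" in text:
            return True
        if "off-line" in text:
            return False
    return None
```

On a machine where neither path exists, as in the posted trace, this sketch returns None, meaning the power source cannot be determined.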

Steffen Grunewald, for Merlin/Morgane
Joined: 18 Oct 04
Posts: 39
Credit: 592286604
RAC: 0

Hi,

running kernel 2.6.8 here on a rackmount machine. The boinc client will go into
the same loop (but it would do so on any machine I suppose), and there are two
"zombie" processes (it's a dual Xeon machine) which would disappear if the boinc
client was killed. The zombies of course can't be straced...
The strange thing is that although no process seems to do real work, there's
still progress being made: the numbers in client_state.xml *do* change.
This is strange, and I only see it with a 2.6 machine (I'm running another one
with basically the same Debian installation but kernel 2.4.xy so don't blame
Debian).
Unfortunately it took some time to find out (I wrote a watcher script to keep
track of the machines), so I dropped a lot of workunits that would have been
useful otherwise.
Since the WU is a pt* one at the moment, it will take some time to reach 99%,
I'm a bit curious what would happen next...

Cheers,
Steffen
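
For reference, a watcher of the kind described above could compare two snapshots of client_state.xml. This Python sketch is not the script mentioned in the post. It assumes that progress is stored in <fraction_done> elements, and the helper names are invented for illustration.

```python
import re

# Assumption: per-task progress appears as <fraction_done>0.123</fraction_done>.
FRACTION = re.compile(r"<fraction_done>\s*([0-9.eE+-]+)\s*</fraction_done>")

def read_progress(xml_text):
    """Extract every fraction_done value, in document order."""
    return [float(v) for v in FRACTION.findall(xml_text)]

def made_progress(before, after):
    """True if any task's fraction_done increased between two snapshots."""
    old, new = read_progress(before), read_progress(after)
    return any(b > a for a, b in zip(old, new))
```

Reading the file twice a few minutes apart and passing both texts to made_progress() would show whether work is advancing even while the science apps appear as zombies.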

The Pirate
Joined: 11 Nov 04
Posts: 57
Credit: 23332769
RAC: 0

Bruce,

This is a desktop, not a laptop, so there is no battery. However, there appears to be a partial ACPI setup, and there are some battery options in the control panel. I just installed this on another desktop, using an ASUS A7M266-D mobo with dual AMD MP 2100s, and it has the same problem. It also shows some of the battery options.
I am using Xandros V3.0 Linux with a 2.6.9.x kernel. I am going to go over to the Xandros forums and post to see what can be found.

Jim

