GNU/Linux S5R3 App 4.16 available for Beta test

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494,410
RAC: 0

Just for info: I crashed another WU last night, error looks similar. Since the box returns completely valid results for two other projects I really think it's an application problem. The interesting part is that it killed two workunits at the same moment again... anyone willing to bet it was switching between applications at that moment? To find out, I might suspend the other two projects for a couple of days, see if Einstein is running stable. Should I?

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,522
Credit: 694,449,747
RAC: 124,225

Message 75212 in response to message 75211

Quote:
Just for info: I crashed another WU last night, error looks similar. Since the box returns completely valid results for two other projects I really think it's an application problem. The interesting part is that it killed two workunits at the same moment again... anyone willing to bet it was switching between applications at that moment? To find out, I might suspend the other two projects for a couple of days, see if Einstein is running stable. Should I?

Hi!

Or maybe do the opposite: force a switch (by manually suspending projects in BOINC Manager, pulling the mains plug so it runs on battery...) and see if this crashes the E@H app. I've tried this but could not reproduce the problem. I'll try some other kernel/glibc versions as well by booting from older Knoppix CDs.

Since your notebook is one of the few hosts that shows this error in 4.16, do you have DDD installed so that you could try to run the app in a debugger? See Bernd's instructions for this here.

Maybe it's also worthwhile to make a backup copy of the BOINC folder in its current state for later analysis. The worst thing that could happen now would be for your PC to suddenly stop crashing WUs before we understand what's going on. With a tarball of the BOINC directory in a state where it crashes the ongoing computation, it would be possible to verify a fix for your problem later by just re-running the WU (offline).
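
A minimal sketch of such a backup, assuming the BOINC data directory is ~/BOINC (adjust the path to wherever it lives on your system). The function name backup_boinc_dir and the archive naming are made up for illustration. Stop the BOINC client first so the files are not changing while tar reads them.

```shell
#!/bin/sh
# Sketch: snapshot a BOINC data directory into a dated tarball.
# Assumption: the directory defaults to ~/BOINC; pass another path if needed.
backup_boinc_dir() {
    src=${1:-"$HOME/BOINC"}     # directory to archive
    dest=${2:-"$HOME"}          # where the tarball is written
    if [ ! -d "$src" ]; then
        echo "no such directory: $src" >&2
        return 1
    fi
    archive="$dest/boinc-backup-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C makes the paths inside the archive relative to the parent directory
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"             # print the archive path for the caller
}
```

Calling backup_boinc_dir ~/BOINC /tmp would write the archive to /tmp and print its path. Keep the destination outside the BOINC directory itself so the tarball does not end up inside its own archive.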

CU

H-B

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494,410
RAC: 0

Message 75213 in response to message 75212

Quote:


Hi!

Or maybe do the opposite: force a switch (by manually suspending projects in BOINC Manager, pulling the mains plug so it runs on battery...) and see if this crashes the E@H app. I've tried this but could not reproduce the problem. I'll try some other kernel/glibc versions as well by booting from older Knoppix CDs.

Since your notebook is one of the few hosts that shows this error in 4.16, do you have DDD installed so that you could try to run the app in a debugger? See Bernd's instructions for this here.

Maybe it's also worthwhile to make a backup copy of the BOINC folder in its current state for later analysis. The worst thing that could happen now would be for your PC to suddenly stop crashing WUs before we understand what's going on. With a tarball of the BOINC directory in a state where it crashes the ongoing computation, it would be possible to verify a fix for your problem later by just re-running the WU (offline).

CU

H-B

Well, I created said tarball, installed ddd and created the file Bernd mentioned, but somehow it fails to trigger the debugger. Can anyone please help?

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,522
Credit: 694,449,747
RAC: 124,225

Message 75214 in response to message 75213

Quote:

Well, I created said tarball, installed ddd and created the file Bernd mentioned, but somehow it fails to trigger the debugger. Can anyone please help?

The EAH_DEBUG_DDD file is in your BOINC folder, where client_state.xml is as well?

Hmmm... DDD should start up whenever the Einstein app is physically loaded and started, not on re-activation, so shutting down BOINC and restarting is a must, but that goes without saying.
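
As a rough sketch, the flag file can be put in place like this. It assumes an empty file named EAH_DEBUG_DDD is enough (the app only reports having "found" the file) and that the BOINC directory is ~/BOINC. The function name enable_ddd_flag is made up for illustration.

```shell
#!/bin/sh
# Sketch: create the EAH_DEBUG_DDD flag file next to client_state.xml.
# Assumptions: an empty flag file is sufficient; the directory defaults to ~/BOINC.
enable_ddd_flag() {
    dir=${1:-"$HOME/BOINC"}
    # Refuse to create the flag anywhere that does not look like a BOINC directory
    if [ ! -f "$dir/client_state.xml" ]; then
        echo "client_state.xml not found in $dir" >&2
        return 1
    fi
    touch "$dir/EAH_DEBUG_DDD" || return 1
    echo "created $dir/EAH_DEBUG_DDD; now shut down and restart BOINC"
}
```

The check for client_state.xml is only there to catch the common mistake of dropping the file into the wrong directory.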

Any output from

tail -n 200 ~/BOINC/slots/*/stderr.txt

(or wherever the BOINC dir resides on your system)

that refers to the debugger?
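
If the output is long, something like the following narrows it down to the debugger-related lines. This is only a sketch: the function name find_debugger_lines and the search pattern are my own choices, and ~/BOINC is again just the assumed default location.

```shell
#!/bin/sh
# Sketch: show only the lines mentioning the debugger from the tail of each
# slot's stderr.txt, prefixed with the file they came from.
# Assumption: the BOINC directory defaults to ~/BOINC.
find_debugger_lines() {
    dir=${1:-"$HOME/BOINC"}
    for f in "$dir"/slots/*/stderr.txt; do
        [ -f "$f" ] || continue     # skip if the glob matched nothing
        tail -n 200 "$f" | grep -i -E 'ddd|debug' |
        while IFS= read -r line; do
            printf '%s: %s\n' "$f" "$line"
        done
    done
    return 0
}
```

No output at all would mean the app never noticed the flag file in the last 200 lines of any slot.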

Strange... works for me.

CU
H-B

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494,410
RAC: 0

Message 75215 in response to message 75214

Thanks, that was really very helpful! Looks like I simply didn't get a message that the debugger was used, but it was there after all.
I did everything as you said (of course I had also restarted the core client ;-) I'm not _that_ newbish, I was simply confused there was no output since Bernd's posting sounded to me like there should be).

Quote:


Any output from

tail -n 200 ~/BOINC/slots/*/stderr.txt

(or wherever the BOINC dir resides on your system)

that refers to the debugger?

I get the following:

2007-11-11 15:07:44.6704 [normal]: Start of BOINC application 'einstein_S5R3_4.16_i686-pc-linux-gnu'.
2007-11-11 15:07:44.6705 [normal]: Found 'EAH_DEBUG_DDD' file, trying debugging with 'ddd'
ddd: Symbol `_XmStrings' has different size in shared object, consider re-linking

Looks okay, doesn't it?

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,522
Credit: 694,449,747
RAC: 124,225

Message 75216 in response to message 75215

Quote:

Thanks, that was really very helpful! Looks like I simply didn't get a message that the debugger was used, but it was there after all.
I did everything as you said (of course I had also restarted the core client ;-) I'm not _that_ newbish, I was simply confused there was no output since Bernd's posting sounded to me like there should be).

Quote:


Any output from

tail -n 200 ~/BOINC/slots/*/stderr.txt

(or wherever the BOINC dir resides on your system)

that refers to the debugger?

I get the following:

2007-11-11 15:07:44.6704 [normal]: Start of BOINC application 'einstein_S5R3_4.16_i686-pc-linux-gnu'.
2007-11-11 15:07:44.6705 [normal]: Found 'EAH_DEBUG_DDD' file, trying debugging with 'ddd'
ddd: Symbol `_XmStrings' has different size in shared object, consider re-linking

Looks okay, doesn't it?

Well, this should be the moment where DDD actually pops up and you have to enter the "cont" command in its gdb prompt to resume the app.

Maybe DDD has problems starting on your system. What do you see when starting up DDD manually and attaching to any process?

CU

H-B

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494,410
RAC: 0

I only seem to be able to attach to some processes... PIDs up to 6000 and all running as root I think... but when I start the debugger with user rights I can't start BOINC since it's running as a daemon. So I can't really say anything. The debugger as such seems to start normally, though.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,522
Credit: 694,449,747
RAC: 124,225

Message 75218 in response to message 75217

Quote:
I only seem to be able to attach to some processes... PIDs up to 6000 and all running as root I think... but when I start the debugger with user rights I can't start BOINC since it's running as a daemon. So I can't really say anything. The debugger as such seems to start normally, though.

I start BOINC manually and not via some /etc/init.d/ magic; I wonder whether this makes a difference. Would the Einstein app (which is starting DDD) have an X DISPLAY variable to start with, if it's started by a BOINC daemon?
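
One way to check this on Linux is to look at the environment the process was started with, which the kernel exposes as a NUL-separated list in /proc/PID/environ. A minimal sketch, with display_from_environ as a made-up helper name:

```shell
#!/bin/sh
# Sketch: print the DISPLAY value recorded in a /proc/PID/environ style file
# (NUL-separated NAME=value entries). Prints nothing if DISPLAY is absent.
display_from_environ() {
    file=$1
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    # Turn NUL separators into newlines, then keep only the DISPLAY entry
    tr '\000' '\n' < "$file" | sed -n 's/^DISPLAY=//p'
}
```

For a running app with process ID PID this would be called as display_from_environ /proc/PID/environ. Reading that file normally requires being the same user as the process, or root. Empty output means the app has no DISPLAY, so DDD would have nowhere to open its window.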

CU

H-B

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494,410
RAC: 0

Message 75219 in response to message 75218

Quote:
I start BOINC manually and not via some /etc/init.d/ magic; I wonder whether this makes a difference.

Probably, but I'm too inexperienced with the debugger to make sense of it.

Quote:

Would the einstein app (which is starting DDD) have an X DISPLAY variable to start with, if it's started by a daemon boinc???

CU

H-B

Like this?

Xlib: connection to ":0.0" refused by server
Xlib: No protocol specified

Error: Can't open display: :0

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,522
Credit: 694,449,747
RAC: 124,225

Message 75220 in response to message 75219

Quote:
Quote:
I start BOINC manually and not via some /etc/init.d/ magic; I wonder whether this makes a difference.

Probably, but I'm too inexperienced with the debugger to make sense of it.

Quote:

Would the einstein app (which is starting DDD) have an X DISPLAY variable to start with, if it's started by a daemon boinc???

CU

H-B

Like this?

Xlib: connection to ":0.0" refused by server
Xlib: No protocol specified

Error: Can't open display: :0

Well, this looks like an X-related problem for sure.

I guess if you start BOINC from a normal desktop session, it should work. IIRC just starting the "boinc" executable in its base directory should do the trick.
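
Roughly like this, as a sketch only. It assumes the executable is called boinc and sits in the BOINC directory (~/BOINC by default), and that the daemon instance has already been stopped so two clients don't fight over the same directory. How to stop it depends on the distribution. The function name run_boinc_from_session is made up.

```shell
#!/bin/sh
# Sketch: start the BOINC client by hand from a terminal inside the desktop
# session, so the apps it launches inherit the session's DISPLAY.
# Assumptions: executable named "boinc" in the BOINC directory (~/BOINC by
# default); any daemon instance has been stopped beforehand.
run_boinc_from_session() {
    dir=${1:-"$HOME/BOINC"}
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set; run this from a terminal in your X session" >&2
        return 1
    fi
    if [ ! -x "$dir/boinc" ]; then
        echo "no executable 'boinc' in $dir" >&2
        return 1
    fi
    # Subshell: change into the base directory without affecting the caller
    ( cd "$dir" && ./boinc )
}
```

Started this way, the client runs in the foreground of that terminal under your own user, which should also take care of the "No protocol specified" refusal, since the X server accepts connections from the session's owner.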

CU
H-B
