Just for info: I crashed another WU last night, and the error looks similar. Since the box returns completely valid results for two other projects, I really think it's an application problem. The interesting part is that it killed two workunits at the same moment again... anyone willing to bet it was switching between applications at that moment? To find out, I might suspend the other two projects for a couple of days and see whether Einstein runs stably. Should I?
Hi!
Or maybe do the opposite: force a switch (by manually suspending projects in BOINC Manager, pulling the mains plug so it runs on battery...) and see if this crashes the E@H app. I've tried this but could not reproduce the problem. I'll try some other kernel/glibc versions as well by booting from older Knoppix CDs.
Since your notebook is one of the few hosts that shows this error in 4.16, do you have DDD installed so that you could try to run the app in a debugger? See Bernd's instructions for this here.
Maybe it's also worthwhile to make a backup copy of the BOINC folder in its current state for later analysis. The worst thing that could happen now would be for your PC to suddenly stop crashing WUs before we understand what's going on. With a tarball of the BOINC directory in a state where it would crash the ongoing computation, it would be possible to verify a fix for your problem later by just re-running the WU (offline).
CU
H-B
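For anyone following along, the backup step suggested above could look roughly like the sketch below. It is only an illustration: backup_boinc_dir is a made-up helper name, ~/BOINC is just an example location, and it is probably wise to stop the BOINC client first so the files are in a consistent state.

```shell
#!/bin/sh
# Sketch only: snapshot a BOINC directory into a timestamped tarball.
# backup_boinc_dir is a hypothetical helper, not part of BOINC.
backup_boinc_dir() {
    src=$1          # BOINC directory (the one holding client_state.xml)
    dest=${2:-.}    # directory that will receive the archive
    if [ ! -d "$src" ]; then
        echo "no such directory: $src" >&2
        return 1
    fi
    stamp=$(date +%Y%m%d-%H%M%S)
    out="$dest/boinc-backup-$stamp.tar.gz"
    # -C makes the archive paths relative to the parent of the BOINC directory
    tar -czf "$out" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # Print the archive name so the caller can record it
    echo "$out"
}
```

Called as backup_boinc_dir ~/BOINC /tmp, it prints the path of the archive it wrote. Unpacking that archive elsewhere later gives a copy of the directory as it was at backup time.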
Well, I created said tarball, installed ddd and created the file Bernd mentioned, but somehow it fails to trigger the debugger. Can anyone please help?
The EAH_DEBUG_DDD file is in your BOINC folder, where client_state.xml is as well?
Hmmm... DDD should start up whenever the einstein app is physically loaded and started, not on re-activation, so shutting down BOINC and restarting it is a must, but that goes without saying.
Any output from
tail -n 200 ~/BOINC/slots/*/stderr.txt
(or wherever the BOINC dir resides on your system)
that refers to the debugger?
Strange... works for me.
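The two checks in this post (is the trigger file next to client_state.xml, and does any slot's stderr.txt mention the debugger) can be bundled into one small function. This is a sketch under assumptions: check_ddd_trigger is a hypothetical name, and the grep patterns are simply guesses at what a debugger-related log line would contain.

```shell
#!/bin/sh
# Sketch only: check the debugger trigger file and scan the slot logs.
# check_ddd_trigger is a hypothetical helper, not part of BOINC or the E@H app.
check_ddd_trigger() {
    boinc_dir=$1
    # The trigger file must sit in the same directory as client_state.xml
    if [ ! -f "$boinc_dir/client_state.xml" ]; then
        echo "not a BOINC directory: $boinc_dir" >&2
        return 1
    fi
    if [ -f "$boinc_dir/EAH_DEBUG_DDD" ]; then
        echo "trigger file present"
    else
        echo "trigger file missing"
    fi
    # Show log lines that mention the trigger file or ddd, prefixed by file name;
    # finding nothing (or having no slots yet) is not treated as an error
    grep -H -i -e 'EAH_DEBUG_DDD' -e 'ddd' "$boinc_dir"/slots/*/stderr.txt 2>/dev/null || true
    return 0
}
```

Running check_ddd_trigger ~/BOINC after restarting the client should show whether the app noticed the file at all.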
Thanks, that was really very helpful! Looks like I simply didn't get a message that the debugger was used, but it was there after all.
I did everything as you said (of course I had also restarted the core client ;-) I'm not _that_ newbish, I was simply confused there was no output since Bernd's posting sounded to me like there should be).
Quote:
Any output from
tail -n 200 ~/BOINC/slots/*/stderr.txt
(or wherever the BOINC dir resides on your system)
that refers to the debugger?
I get the following:
2007-11-11 15:07:44.6704 [normal]: Start of BOINC application 'einstein_S5R3_4.16_i686-pc-linux-gnu'.
2007-11-11 15:07:44.6705 [normal]: Found 'EAH_DEBUG_DDD' file, trying debugging with 'ddd'
ddd: Symbol `_XmStrings' has different size in shared object, consider re-linking
Looks okay, doesn't it?
Well, this should be the moment where DDD actually pops up and you have to enter the "cont" command in its gdb prompt to resume the app.
Seems like ddd may have problems starting on your system. What do you see when starting up ddd manually and attaching to any process?
I only seem to be able to attach to some processes... PIDs up to 6000 and all running as root I think... but when I start the debugger with user rights I can't start BOINC since it's running as a daemon. So I can't really say anything. The debugger as such seems to start normally, though.
I start BOINC manually and not via some /etc/init.d/ magic, so I wonder whether this makes a difference. Would the einstein app (which is starting DDD) have an X DISPLAY variable to start with if it's started by a daemon BOINC?
Quote:
I start BOINC manually and not via some /etc/init.d/ magic , I wonder whether this makes a difference.
Probably, but I'm too inexperienced with the debugger to make sense of it.
Quote:
Would the einstein app (which is starting DDD) have an X DISPLAY variable to start with, if it's started by a daemon boinc???
CU
H-B
Like this?
Xlib: connection to ":0.0" refused by server
Xlib: No protocol specified
Error: Can't open display: :0
Well, this looks like an X-related problem for sure.
I guess if you start BOINC from a normal desktop session, it should work. IIRC just starting the "boinc" executable in its base directory should do the trick.
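A quick way to see whether a given shell (and therefore anything started from it) can reach the X server is sketched below. It is an illustration, not a fix: x_display_status is a hypothetical helper, and the connection test relies on xdpyinfo, which may not be installed everywhere and is skipped if missing.

```shell
#!/bin/sh
# Sketch only: report whether the current environment can open an X window.
# x_display_status is a hypothetical helper name.
x_display_status() {
    # Without DISPLAY, X clients such as ddd have no server to connect to
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set"
        return 1
    fi
    echo "DISPLAY is set to $DISPLAY"
    # xdpyinfo actually tries to connect, so it also catches refused connections
    if command -v xdpyinfo >/dev/null 2>&1; then
        if xdpyinfo >/dev/null 2>&1; then
            echo "X server accepts connections"
        else
            echo "X server refused the connection"
            return 2
        fi
    fi
    return 0
}
```

Run from a terminal inside the desktop session, it should report a usable display. Run in the environment the daemon uses, it would be expected to fail in the same way as the Xlib messages above, which is consistent with starting BOINC by hand from the desktop session instead.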