Work unit completion kills BOINC client.

Dirk Mittler
Joined: 15 Nov 09
Posts: 17
Credit: 387163
RAC: 0
Topic 194941

Hello. I'm somewhat new to BOINC, even though I've been running this client for several months now (less than one year). During my first few months of donating idle CPU time, all the work units just seemed to run, complete, and upload their results.

But in more recent times (for over a month now), I've been getting a consistent misbehavior of Einstein@Home projects. SETI@home is still running all its work units without a hitch. But every time an Einstein@Home work unit finishes, it seems to cause my BOINC client itself to exit. Worse, I can't seem to find any backlog or trace of error messages in dmesg or in /var/log/syslog.

Also, since my work units tend to complete overnight, or when I'm away, it's not that practical for me to keep a lookout on every dmesg debug line in the meantime.

In the hope that a bug in just a few work units might be responsible, I aborted a few of them, waited a week or so, and downloaded a new Einstein@Home work unit. For some time I've had to suspend your project temporarily, so that the other project can run.

Needless to say, when SETI@home finishes a work unit, the BOINC client itself keeps running.

Okay, so I'm Debian/Etch based, and running BOINC client version 5.4.11, which is the easiest thing to install right from the package manager. Some people might think that I absolutely need to compile and install the latest version. But I don't see it as practical to spend too much effort on running BOINC for now.

Also, even though GPU support is ignored by versions earlier than v1.6, I've turned GPU use off in the project settings. Alas, this did not solve the problem.

I'm wondering whether somebody might just be able to point something out to me, which I could be missing.

Dirk
AMD Athlon 64 X2 2.6GHz Dual Core 5000+
(Intentionally only running 32-bit Linux)
Debian-based system: Kanotix/Thorhammer
KDE
GeForce 6150SE On-Board GPU with 128MB of shared RAM
(Presumably not involved with my version of BOINC)

2x1GB 333MHz DDR2 RAM

KSMarksPsych
Moderator
Joined: 15 Oct 05
Posts: 2702
Credit: 4090227
RAC: 0

Work unit completion kills BOINC client.

I'm assuming your package manager installs BOINC to its own user and its own home directory. So the core client (probably called boinc or boinc-client or something like that) keeps running, but the manager crashes?

Can you check to see if anything is written out in an error file in ~/boinc (maybe boinc.log or error.log)?
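A minimal sketch of that check in Python, for anyone who would rather script it. The directory and file names are only the guesses from this post (~/boinc, boinc.log, error.log), and the helper name find_nonempty_logs is made up for illustration; a packaged install may keep its logs somewhere else, so treat both lists as assumptions to adjust.

```python
from pathlib import Path

# Assumed locations and names, taken from the guesses in this post.
# Adjust them to wherever your install actually writes its logs.
CANDIDATE_DIRS = [Path.home() / "boinc"]
CANDIDATE_NAMES = ["boinc.log", "error.log"]


def find_nonempty_logs(dirs, names):
    """Return the candidate log files that exist and are not empty."""
    found = []
    for directory in dirs:
        for name in names:
            path = Path(directory) / name
            if path.is_file() and path.stat().st_size > 0:
                found.append(path)
    return found


if __name__ == "__main__":
    for log in find_nonempty_logs(CANDIDATE_DIRS, CANDIDATE_NAMES):
        print(log)
```

If nothing is printed, either the files are empty or they live in a different directory.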

Also, if you wanted to upgrade, you shouldn't have to compile it yourself. Go to the main BOINC site and download the package. It's a .sh, so you'll just need to unpack it wherever you want it.

I can't give much more help than that. Sorry. I'm a Fedora (albeit slightly unhappy atm) person and I install BOINC myself to run as a daemon.

Kathryn :o)

Einstein@Home Moderator

Dirk Mittler
Joined: 15 Nov 09
Posts: 17
Credit: 387163
RAC: 0

Message 98174 in response to message 98173

Quote:
I'm assuming your package manager installs BOINC to its own user and its own home directory. So the core client (probably called boinc or boinc-client or something like that) keeps running, but the manager crashes?

Well, this seems like a reasonable assumption, but it just isn't true. My package manager sets things up so that the client runs under a user ID named "boinc", with the project files kept under /var/lib/boinc-client. But you see, the work unit itself also runs under user ID boinc, launched by the core client. And so what actually happens is that the core client dies along with it.

But now I have looked for an error log, and found that standard error output gets redirected to /usr/lib/boinc-client/stderrdae.txt . And I think that this may be the relevant output sent to the file:

SIGSEGV: segmentation violation
Stack trace (17 frames):
/usr/bin/boinc_client[0x8089505]
[0xffffe420]
/lib/tls/libc.so.6(wcslen+0x8)[0xb7a32c88]
/lib/tls/libc.so.6(wcsrtombs+0x19d)[0xb7a33e9d]
/lib/tls/libc.so.6(_IO_vfprintf+0x2e00)[0xb79ff480]
/lib/tls/libc.so.6(vsprintf+0x8b)[0xb7a193bb]
/usr/bin/boinc_client[0x808bd1c]
/usr/bin/boinc_client[0x808c1b9]
/usr/bin/boinc_client[0x805e416]
/usr/bin/boinc_client[0x806e909]
/usr/bin/boinc_client[0x806eb52]
/usr/bin/boinc_client[0x806ec6f]
/usr/bin/boinc_client[0x805a3d0]
/usr/bin/boinc_client[0x807b284]
/usr/bin/boinc_client[0x807b529]
/lib/tls/libc.so.6(__libc_start_main+0xc8)[0xb79d5ea8]
/usr/bin/boinc_client(__gxx_personality_v0+0xb9)[0x804b7c1]

Exiting...

It occurs twice.

This error output isn't managed by the core client anymore, for which reason the core client fails to associate it with the project. And yet, Einstein@Home is the only project which has been crashing my client program. Would you say that a stack trace such as this one might be useful?
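In case it helps anyone reading a trace like this, here is a rough Python sketch that splits such output into separate crashes and pulls out the frames that carry a symbol name. The frame format is inferred only from the lines posted above, and parse_crashes and FRAME_RE are hypothetical helper names, not part of BOINC.

```python
import re

# A frame looks like   /lib/tls/libc.so.6(wcslen+0x8)[0xb7a32c88]
# or, without symbols, /usr/bin/boinc_client[0x8089505]
# or, with no module,  [0xffffe420]
FRAME_RE = re.compile(
    r"^(?P<module>[^\s(\[]*)"                          # module path, may be empty
    r"(?:\((?P<symbol>[^+)]+)\+0x[0-9a-fA-F]+\))?"     # optional symbol+offset
    r"\[0x[0-9a-fA-F]+\]$"                             # return address
)


def parse_crashes(text):
    """Return one list of (module, symbol) frames per SIGSEGV found in text."""
    crashes = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if "SIGSEGV" in line:
            current = []          # a new crash report starts here
            crashes.append(current)
            continue
        match = FRAME_RE.match(line)
        if match and current is not None:
            current.append((match.group("module"), match.group("symbol")))
    return crashes


SAMPLE = """SIGSEGV: segmentation violation
Stack trace (17 frames):
/usr/bin/boinc_client[0x8089505]
[0xffffe420]
/lib/tls/libc.so.6(wcslen+0x8)[0xb7a32c88]
/lib/tls/libc.so.6(wcsrtombs+0x19d)[0xb7a33e9d]

Exiting...
"""

if __name__ == "__main__":
    for number, frames in enumerate(parse_crashes(SAMPLE), start=1):
        named = [symbol for _, symbol in frames if symbol]
        print(number, named)
```

Read from the innermost frame outward, the named libc frames in the full trace run wcslen, wcsrtombs, _IO_vfprintf, vsprintf. That would be consistent with the client crashing while it formats a string that has a wide-character argument, though the unnamed boinc_client frames would need debug symbols to confirm anything.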

Quote:


Can you check to see if anything is written out in an error file in ~/boinc (maybe boinc.log or error.log)?

Also, if you wanted to upgrade, you shouldn't have to compile it yourself. Go to the main BOINC site and download the package. It's a .sh, so you'll just need to unpack it wherever you want it.

I can't give much more help than that. Sorry. I'm a Fedora (albeit slightly unhappy atm) person and I install BOINC myself to run as a daemon.


Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5779100
RAC: 0

Message 98175 in response to message 98174

Quote:

SIGSEGV: segmentation violation
Stack trace (17 frames):
/usr/bin/boinc_client[0x8089505]
[0xffffe420]
/lib/tls/libc.so.6(wcslen+0x8)[0xb7a32c88]
/lib/tls/libc.so.6(wcsrtombs+0x19d)[0xb7a33e9d]
/lib/tls/libc.so.6(_IO_vfprintf+0x2e00)[0xb79ff480]
/lib/tls/libc.so.6(vsprintf+0x8b)[0xb7a193bb]
/usr/bin/boinc_client[0x808bd1c]
/usr/bin/boinc_client[0x808c1b9]
/usr/bin/boinc_client[0x805e416]
/usr/bin/boinc_client[0x806e909]
/usr/bin/boinc_client[0x806eb52]
/usr/bin/boinc_client[0x806ec6f]
/usr/bin/boinc_client[0x805a3d0]
/usr/bin/boinc_client[0x807b284]
/usr/bin/boinc_client[0x807b529]
/lib/tls/libc.so.6(__libc_start_main+0xc8)[0xb79d5ea8]
/usr/bin/boinc_client(__gxx_personality_v0+0xb9)[0x804b7c1]

Exiting...


I've seen that error before. See this thread on the BOINC forums for possible cause and solution.

Dirk Mittler
Joined: 15 Nov 09
Posts: 17
Credit: 387163
RAC: 0

Message 98176 in response to message 98175

Quote:
Quote:

SIGSEGV: segmentation violation
Stack trace (17 frames):
/usr/bin/boinc_client[0x8089505]
[0xffffe420]
/lib/tls/libc.so.6(wcslen+0x8)[0xb7a32c88]
/lib/tls/libc.so.6(wcsrtombs+0x19d)[0xb7a33e9d]
/lib/tls/libc.so.6(_IO_vfprintf+0x2e00)[0xb79ff480]
/lib/tls/libc.so.6(vsprintf+0x8b)[0xb7a193bb]
/usr/bin/boinc_client[0x808bd1c]
/usr/bin/boinc_client[0x808c1b9]
/usr/bin/boinc_client[0x805e416]
/usr/bin/boinc_client[0x806e909]
/usr/bin/boinc_client[0x806eb52]
/usr/bin/boinc_client[0x806ec6f]
/usr/bin/boinc_client[0x805a3d0]
/usr/bin/boinc_client[0x807b284]
/usr/bin/boinc_client[0x807b529]
/lib/tls/libc.so.6(__libc_start_main+0xc8)[0xb79d5ea8]
/usr/bin/boinc_client(__gxx_personality_v0+0xb9)[0x804b7c1]

Exiting...


I've seen that error before. See this thread on the BOINC forums for possible cause and solution.

Thank you for pointing out that thread to me. As I wrote, my system is a 32-bit system, not a 64-bit one. So in my case, there is really no issue with 64-bit packages. But in light of this response, what I just did was to upgrade my libc6 package from 2.3.6.ds1-13etch4 to 2.3.6.ds1-13etch10+b1. (And actually, that is one kind of upgrade after which I do reboot.)

This type of upgrade is usually a stability improvement, and may be just what I need. In any case I'll re-enable the Einstein@Home project as of tonight, to let that work until tomorrow morning, and at that time we'll see if this fixed the problem.

Just as with the other person's post, doing such an upgrade also reinstalled libc6 (and libc6-dev in my case).

Dirk

P.S. I don't foresee increasing my RAM soon, because on a 32-bit system, 2GB is plenty, while 14GB becomes an unthing. Also, I don't happen to have a bigmem kernel with extended addressing, to make 14GB happen any time soon that way, either.

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5779100
RAC: 0

Message 98177 in response to message 98176

Quote:
P.S. I don't foresee increasing my RAM soon, because on a 32-bit system, 2GB is plenty, while 14GB becomes an unthing. Also, I don't happen to have a bigmem kernel with extended addressing, to make 14GB happen any time soon that way, either.


The 14GB isn't RAM, but virtual memory, in other words your page file. Still, you'll be 'hampered' by the 32bit factor, so you can set up a maximum of 4GB (4096MB). You don't need to, though, if you don't want to. :-)
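For where the 4096MB figure comes from, a quick bit of arithmetic in Python. This only shows the size of a plain 32-bit address space (no PAE assumed); whether that is exactly the limit that applies to a given page file setting is a separate question.

```python
# Size of a plain 32-bit address space (no PAE assumed).
ADDRESS_BITS = 32

addressable_bytes = 2 ** ADDRESS_BITS
addressable_mb = addressable_bytes // (1024 * 1024)
addressable_gb = addressable_mb // 1024

print(addressable_mb, "MB =", addressable_gb, "GB")  # prints: 4096 MB = 4 GB
```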

tullio
Joined: 22 Jan 05
Posts: 2118
Credit: 61407735
RAC: 0

My 32 bit SuSE Linux kernel is pae, so I can use all of my 5 GB RAM. I could go up to 8 GB on my SUN WS. Most of the time the kernel uses about 3 GB as a disk cache.
Tullio

Dirk Mittler
Joined: 15 Nov 09
Posts: 17
Credit: 387163
RAC: 0

Well I'm afraid that you've posted an invalid solution to this problem.

I allowed the Einstein@Home project to run overnight, and the same error took place as did before (no change). Therefore, since you people are unwilling to consider the real problems behind your software, I see no choice but to suspend your project indefinitely.

Dirk

Dirk Mittler
Joined: 15 Nov 09
Posts: 17
Credit: 387163
RAC: 0

I apologize. Firstly, I don't want to edit out the loss of patience in my posting above, because for me to do so might not put your off-line responses to my impatience in fair light.

And secondly, I should not become so impatient. I understand that in reality, you might be willing to consider possible bugs in the core client, or in your work units, if enough information was given to you to enable you to do so.

But as it stands, there could be an error in the build of the core client which I've downloaded the quick and dirty way through my package manager, which makes it possible in the first case for the work unit to take out the client. This would *not* really be your fault at this time...

I have to face another reality however. Ultimately, my only reason for suspending Einstein@Home, is the fact that I want my BOINC client process to keep running. But if I consider upgrading such a critical core library as libc6, this begins to look less appetizing, when everything on my computer except for BOINC is extremely stable. I still don't know, whether my Linux box is still as stable today as it was two days ago, because some problems would take a long time to become noticeable.

The function of my Linux box, isn't to act as an expandable toy, but to act as my main computer, with my important documents, projects and data to be processed reliably. So I'm not the type of person who tinkers overly in ways which could compromise stability. This actually goes well with the fact that it's an Etch based system.

But then it becomes a bit of a nuisance, to have upgraded libc6, and to see that I've been barking up the wrong tree.

Whether you're still willing to talk to me after the above posting, is your decision, and I'll respect it either way. And yet, one observation of mine is also, the idea that none of us would know where to look next, for the specific problems which I'm experiencing. When and if we do, I'll also be happy to reactivate Einstein@Home.

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6542
Credit: 287070031
RAC: 97054

Message 98181 in response to message 98179

Quote:

Well I'm afraid that you've posted an invalid solution to this problem.

I allowed the Einstein@Home project to run overnight, and the same error took place as did before (no change). Therefore, since you people are unwilling to consider the real problems behind your software, I see no choice but to suspend your project indefinitely.

Dirk


As E@H doesn't write, maintain or distribute BOINC, may I recommend that you direct your concerns to those who do, e.g. the BOINC forums, as per the link given by Ageless? :-)

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ... Blaise Pascal

... and my other CPU is a Ryzen 5950X :-)

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5779100
RAC: 0

That doesn't help, Mike, as it's an application error. His latest was exit with code 4, which is in my FAQs. I did email Bernd about it already as he wants to know.

Dirk, I see you're still running BOINC 5.4.11.
Have you ever thought of upgrading? We are at 6.10.56 at this time...
