I have seen this mentioned in other threads in the past. I now have three machines sitting idle because of "no heartbeat", running BOINC versions 5.8... and 5.10...
All are running the 4.38 app and have been doing quite well until the last few days.
What to do?
Here is a transcript from one machine (file slots/0/stderr.txt):
Detected CPU type 1
2008-03-13 21:59:26.0190 [normal]: Built at: Feb 21 2008 15:57:05
2008-03-13 21:59:26.0190 [normal]: Start of BOINC application '../../projects/einstein.phys.uwm.edu/einstein_S5R3_4.38_i686-pc-linux-gnu_1'.
2008-03-13 21:59:26.0197 [debug]: Set up communication with graphics process.
2008-03-13 21:59:26.1184 [debug]: Reading SFTs and setting up stacks ... done
2008-03-13 21:59:35.1614 [debug]: Successfully read checkpoint
2008-03-13 21:59:35.1615 [debug]: Total skypoints = 1205. Progress: 356,
$Revision: 1.115 $ OPT:3 SCV:9, SCTRIM:2, HLV:3, HP:7
357, No heartbeat from core client for 30 sec - exiting
Detected CPU type 1
2008-03-13 22:00:24.2716 [normal]: Built at: Feb 21 2008 15:57:05
2008-03-13 22:00:24.2717 [normal]: Start of BOINC application '../../projects/einstein.phys.uwm.edu/einstein_S5R3_4.38_i686-pc-linux-gnu_1'.
2008-03-13 22:00:24.2725 [debug]: Set up communication with graphics process.
2008-03-13 22:00:24.3716 [debug]: Reading SFTs and setting up stacks ... done
2008-03-13 22:00:33.4476 [debug]: Successfully read checkpoint
2008-03-13 22:00:33.4477 [debug]: Total skypoints = 1205. Progress: 356,
$Revision: 1.115 $ OPT:3 SCV:9, SCTRIM:2, HLV:3, HP:7
357, No heartbeat from core client for 30 sec - exiting
No heartbeat - ??? - on three of my machines
My own mistake, I'm afraid; I changed ownership of the BOINC tree and that screwed things up. I changed ownership back (pretty much !!) to what it had been and it is at least happy from the "no heartbeat" point of view. Hopefully now it'll find a pulsar!
Never mind.
Why did I mess with ownership?
So I could get the "./boincmgr" (or "./run_manager") program to run. Why should those programs have problems that seem related to ownership? The only other thing I have done is create a script in "/etc/init.d" that starts the boinc process at boot. It runs as root from the init process, and that seems to make (almost) everything owned by root. And so it must be, once it gets started that way, I guess. Why?
Oh well.....
Thanks.
Probably because there really is an ownership problem. :-).
If you have BOINC files created and owned by root, a program run by an ordinary user is not going to be able to modify any of those files; client_state.xml, for example, would be an obvious problem. You shouldn't run BOINC with root privileges as it's a security risk. BOINC runs perfectly happily as an ordinary user.
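If you want to see how far the ownership mix-up goes before changing anything, something like the following should list every file in the BOINC tree that is not owned by the user you expect. This is only a rough sketch: the function name, the path and the user name in the example are placeholders I made up, not anything from your setup.

```shell
# List everything under a BOINC tree that is NOT owned by the expected user.
# list_foreign_files is a hypothetical helper name; the path and user in the
# example call below are placeholders.
list_foreign_files() {
    boinc_dir=$1
    expected_user=$2
    # "! -user" matches files whose owner differs from the given user
    find "$boinc_dir" ! -user "$expected_user" -print
}

# Example (hypothetical path and user):
#   list_foreign_files /path/to/BOINC someuser
```

An empty listing means the whole tree already belongs to that user; anything it prints (client_state.xml, the slots directories and so on) is a file an ordinary-user client could not modify.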
Processes started this way do not have to be owned by root. Somewhere in that script you would probably have a line like
./boinc --whatever_flags_you_like
All you need to do is tell your script to start BOINC as an ordinary user
sudo -u username ./boinc --whatever_flags_you_like
So why don't you (as root) stop BOINC, modify your startup script as suggested, recursively change the ownership of every file in your BOINC tree from root to whatever ordinary user you intend to use and then restart BOINC as that ordinary user?
It doesn't have to be that way at all. The above procedure will work to change you from running as root to running as an ordinary user. Just use chown with the recursive flag to make the ownership change quite painless.
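Roughly, the whole procedure could look like the sketch below. The function name, the directory and the user name are assumptions for illustration only, and how you stop the client depends on your own init script, so that step is left as a comment.

```shell
# Hand an entire BOINC tree to an ordinary user. Normally run as root.
# give_tree_to_user is a hypothetical helper name; the arguments in the
# sequence below are placeholders.
give_tree_to_user() {
    boinc_dir=$1
    run_as=$2
    # -R applies the ownership change to every file and subdirectory
    chown -R "$run_as" "$boinc_dir"
}

# Sketch of the full sequence (placeholder path and user):
#   1. stop the running client, however your init script does it
#   2. give_tree_to_user /path/to/BOINC someuser
#   3. cd /path/to/BOINC && sudo -u someuser ./boinc --whatever_flags_you_like
```

Once the tree belongs to the ordinary user and the client is started as that same user, the manager run from that account should no longer trip over root-owned files.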
Cheers,
Gary.
OK I will try this. I think there might be something other than ownerships going on, however. More investigation...
Thanks.