Looks like internet trouble... maybe that still plays a role?
Bernd, that's just what happens on my Core machine...
Maybe it's more likely to happen on multi-cores... I'll do some tests.
CU
Bikeman
My machine that had this error isn't multi-core, but it is multiprocessor. (It's a dual P-III 700 setup.) But then, the effect could still be the same.
I'm crunching here on an E6320 running virtualized Suse64-10.2 with the 32-bit 4.35 app and have had no problems so far. All web access is routed through a webserver/socks5 proxy with auth and a "Fritz!Box 7170" for DSL.
My problem has always been connecting right after a clean BOINC install: with the proxy -> not OK. I tried without the proxy -> not OK; closed BOINC, restarted BOINC, connect OK. Set up the proxy in BOINC, no connection; closed BOINC, restarted BOINC, ahhh =>> connect OK.
edit:
Today a "DL380 G2" arrived at my home and is still waiting to be set up for BOINC. Now I'm thinking about where to put or hang it, especially given the incredible noise from its 10 fans...
Pooh - looks like the same issue (after loss of heartbeat).
We should look into this again - Bikeman, do you have time for another debugger session?
BM
Hi!
I could not reproduce it so far. I suspect that this might be related to the wrong libs being installed. Note that the first signal 11 problem reported by Donald had this debug message printed before going down:
libgcc_s.so.1 must be installed for pthread_cancel to work
Donald, can you check whether this lib is accessible on the troubled PC?
CU
Bikeman
I just used "find" to double-check, and it's the first entry in the results list. Besides, this machine just has a standard installation of Ubuntu, and this is the only Einstein problem I've ever had with it.
The second machine has a standard installation of CentOS 5.1, so I wouldn't think that its problem would be library-related, either.
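A side note on the library check: "find" only shows that the file exists somewhere on disk, not that the dynamic loader can actually resolve it. Below is a minimal sketch of a loader-level check in Python. The helper name can_load is made up for illustration, and the check only covers libraries matching the architecture of the Python interpreter itself, so a 32-bit app on a 64-bit system would still need its 32-bit copy checked separately.

```python
import ctypes


def can_load(libname="libgcc_s.so.1"):
    """Ask the dynamic loader to open a shared library by name.

    Returns (True, None) if the loader resolved and opened it,
    otherwise (False, <loader error message>).
    """
    try:
        ctypes.CDLL(libname)
        return True, None
    except OSError as exc:
        return False, str(exc)


if __name__ == "__main__":
    ok, error = can_load()
    if ok:
        print("libgcc_s.so.1 can be loaded")
    else:
        print("libgcc_s.so.1 could NOT be loaded:", error)
```

If this reports a failure even though "find" lists the file, the library is probably outside the loader's search path or built for a different architecture.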
I'm still having this annoying problem: I get frequent "no heartbeat from core client" errors whenever my internet connection fails. Since I have serious problems with my ISP at the moment, I get about ten or more of those errors per WU. While this is way better than losing the whole WU to a signal 11, my impression is that the frequent restarts from checkpoint cost me several thousand seconds of crunch time per WU. Don't get me wrong, I'm not blaming the project devs (in fact, I think this is a problem with the BOINC core client rather than the app), but does anyone have an idea what could be done about it?
EDIT: This is what it looks like http://einsteinathome.org/task/92889425
other WUs not reported yet but stderr.txt is showing roughly the same.
Good news from the BOINC devs: 5.10.44 brings back synchronous DNS lookups to the community.
That's asynchronous / nonblocking DNS lookup, hopefully :-)
CU
Bikeman
Nope, synchronous is back, in the win64 version at least. Asynchronous was causing hangs of the BOINC client on the 64-bit versions. I'm sure we will try it again the next time libcurl is updated.
So why is this good news? Synchronous DNS lookup causes lots of "no heartbeat for 30 seconds, exiting" incidents (at least it does so under Linux), where all apps running under BOINC exit whenever there's a network problem. Looks more like a choice between two evils.
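To illustrate the trade-off, here is a toy sketch in Python. It is not BOINC's actual code: slow_lookup is a hypothetical stand-in for a DNS lookup that hangs while the network is down, and the timings are scaled down from the 30 seconds mentioned above. It assumes the heartbeat is sent from the same loop that performs the lookup, so a blocking call starves the heartbeat while a lookup on a worker thread does not.

```python
import threading
import time


def slow_lookup(delay):
    """Hypothetical stand-in for a DNS lookup that hangs for `delay` seconds."""
    time.sleep(delay)
    return "203.0.113.1"  # documentation-only address, not a real result


def longest_heartbeat_gap(lookup_delay, threaded, period=0.02, beats=10):
    """Send `beats` heartbeats and start one lookup after the third.

    Returns the largest gap (in seconds) between two consecutive heartbeats.
    """
    stamps = []
    worker = None
    for i in range(beats):
        stamps.append(time.monotonic())  # "send" a heartbeat
        if i == 2:
            if threaded:
                # Nonblocking variant: the lookup runs beside the main loop.
                worker = threading.Thread(target=slow_lookup, args=(lookup_delay,))
                worker.start()
            else:
                # Blocking variant: the loop stalls until the lookup returns.
                slow_lookup(lookup_delay)
        time.sleep(period)
    if worker is not None:
        worker.join()
    return max(b - a for a, b in zip(stamps, stamps[1:]))


if __name__ == "__main__":
    print("blocking gap:", longest_heartbeat_gap(0.3, threaded=False))
    print("threaded gap:", longest_heartbeat_gap(0.3, threaded=True))
```

In the blocking variant the largest gap is at least as long as the lookup itself, which is how a stalled lookup could push an app past its heartbeat timeout. The threaded variant keeps the gap near the normal period, at the cost of the extra complexity that apparently caused the hangs on 64-bit.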
The 4.35 App has become part of the new 4.38 beta App package.
BM