GNU/Linux S5R3 "power users" App 4.35 available

Donald A. Tevault
Joined: 17 Feb 06
Posts: 439
Credit: 73516529
RAC: 0

RE: RE: Looks like

Message 79537 in response to message 79536

Quote:
Quote:
Looks like internet trouble... maybe that still plays a role?
Bernd, that's just what happens on my Core machine...

Maybe it's more likely to happen on multi-cores.... I'll do some tests

CU
Bikeman

My machine that had this error isn't multi-core, but it is multiprocessor. (It's a dual P-III 700 setup.) But then, the effect could still be the same.

rroonnaalldd
Joined: 12 Dec 05
Posts: 116
Credit: 537221
RAC: 0

RE: RE: RE: Looks like

Message 79538 in response to message 79537

Quote:
Quote:
Quote:
Looks like internet trouble... maybe that still plays a role?
Bernd, that's just what happens on my Core machine...

Maybe it's more likely to happen on multi-cores.... I'll do some tests

CU
Bikeman

My machine that had this error isn't multi-core, but it is multiprocessor. (It's a dual P-III 700 setup.) But then, the effect could still be the same.

I'm crunching here on an E6320 running a virtualized 64-bit SUSE 10.2 with the 32-bit 4.35 app and have had no problems so far. All web access is routed through a web server/SOCKS5 proxy with authentication and a "Fritz!Box 7170" for DSL.
My problem has always been getting BOINC to connect after a clean install. With the proxy: not OK. Without the proxy: not OK, but after closing and restarting BOINC the connection worked. Then with the proxy set up in BOINC: no connection, but after closing and restarting BOINC again, ahhh, the connection was OK.

edit:
Today a "DL380 G2" arrived at my home and is still waiting to be set up for BOINC. Now I'm thinking about where to place or hang it, especially given the incredible noise from its 10 cooling fans...

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4267
Credit: 244932456
RAC: 16338

The 4.35 App has become part of the new 4.38 beta App package.

BM

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 686138939
RAC: 552243

RE: RE: It seems that the

Message 79540 in response to message 79531

Quote:
Quote:

It seems that the signal 11 problem is back. This is my second one since switching to the 4.35 app. (I had none at all with the 4.27 app.)

2nd Signal 11


Pooh - looks like the same issue (after loss of heartbeat).

We should look into this again - Bikeman, do you have time for another debugger session?

BM

Hi!

I could not reproduce it so far. I suspect that this might be related to the wrong libs being installed. Note that the first signal 11 problem reported by Donald had this debug message printed before going down:

libgcc_s.so.1 must be installed for pthread_cancel to work

Donald, can you check whether this lib is accessible on the troubled PC?

CU
Bikeman
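
One way to run the check asked for above is to ask the dynamic loader directly, because a file that exists on disk is not necessarily one the loader will find or accept (a 64-bit copy is no use to a 32-bit process, for example). The following is only a minimal sketch, not anything taken from the Einstein@Home app. It assumes a Python interpreter of the same bitness as the science app is available on the machine, and the helper name `can_load` is made up for illustration.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can find and load the library."""
    try:
        # CDLL hands the name to the system loader (dlopen on Linux),
        # so the normal library search path is used.
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    name = "libgcc_s.so.1"
    print(name, "is loadable" if can_load(name) else "is NOT loadable")
```

A "not loadable" result would point towards a library problem. A "loadable" result only shows that this particular interpreter could load the library, so it narrows things down rather than ruling a library problem out.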

Donald A. Tevault
Joined: 17 Feb 06
Posts: 439
Credit: 73516529
RAC: 0

RE: RE: RE: It seems

Message 79541 in response to message 79540

Quote:
Quote:
Quote:

It seems that the signal 11 problem is back. This is my second one since switching to the 4.35 app. (I had none at all with the 4.27 app.)

2nd Signal 11


Pooh - looks like the same issue (after loss of heartbeat).

We should look into this again - Bikeman, do you have time for another debugger session?

BM

Hi!

I could not reproduce it so far. I suspect that this might be related to the wrong libs being installed. Note that the first signal 11 problem reported by Donald had this debug message printed before going down:

libgcc_s.so.1 must be installed for pthread_cancel to work

Donald, can you check whether this lib is accessible on the troubled PC?

CU
Bikeman

I just used "find" to double-check, and it's the first entry in the results list. Besides, this machine just has a standard installation of Ubuntu, and this is the only Einstein problem I've ever had with it.

The second machine has a standard installation of CentOS 5.1, so I wouldn't think that its problem would be library-related, either.

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494410
RAC: 0

I'm still having the annoying problem of frequent "no heartbeat from core client" errors whenever my internet connection fails. Since I have serious problems with my ISP at the moment, I get about ten or more of those errors per WU. While this is way better than losing the whole WU to a signal 11, my impression is that the frequent restarts from checkpoint cost me several thousand seconds of crunch time per WU. Don't misunderstand me, I'm not blaming the project devs (in fact, I think this is a problem with the BOINC core client rather than the app), but does anyone have an idea what could be done about it?
EDIT: This is what it looks like
http://einsteinathome.org/task/92889425
other WUs not reported yet but stderr.txt is showing roughly the same.

rroonnaalldd
Joined: 12 Dec 05
Posts: 116
Credit: 537221
RAC: 0

Good news from the BOINC devs: 5.10.44 brings synchronous DNS lookups back to the community.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 686138939
RAC: 552243

RE: Good news from the

Message 79544 in response to message 79543

Quote:
Good news from the BOINC devs: 5.10.44 brings synchronous DNS lookups back to the community.

That's asynchronous / nonblocking DNS lookup, hopefully :-)

CU

Bikeman

Keck_Komputers
Joined: 18 Jan 05
Posts: 376
Credit: 5744955
RAC: 0

RE: RE: Good news from

Message 79545 in response to message 79544

Quote:
Quote:
Good news from the BOINC devs: 5.10.44 brings synchronous DNS lookups back to the community.

That's asynchronous / nonblocking DNS lookup, hopefully :-)

CU

Bikeman


Nope, synchronous is back, in the win64 version at least. Asynchronous was causing hangs of the BOINC client on the 64-bit versions. I'm sure we will try it again the next time libcurl is updated.

BOINC WIKI

BOINCing since 2002/12/8

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 686138939
RAC: 552243

Hi!

So why is this good news? Synchronous DNS lookup causes lots of "no heartbeat for 30 seconds, exiting" incidents (at least it does so under Linux), where all apps running under BOINC exit whenever there's a network problem. Looks more like a choice between two evils.

CU
Bikeman
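
The trade-off described above can be shown with a toy model. This is only a sketch and not BOINC's actual code. The "lookup" is a plain sleep standing in for a DNS query that hangs while the network is down, the times are scaled down from the 30 seconds mentioned in the post, and the names `slow_lookup` and `max_heartbeat_gap` are invented for this example. It measures the longest pause between heartbeats when the lookup runs in the heartbeat loop itself and when it runs in a separate thread.

```python
import threading
import time


def slow_lookup(seconds):
    """Stand-in for a DNS lookup that hangs while the network is down."""
    time.sleep(seconds)


def max_heartbeat_gap(use_thread, lookup_seconds=0.5, interval=0.02):
    """Return the longest pause between heartbeats while one lookup runs."""
    beats = [time.monotonic()]
    if use_thread:
        # Asynchronous style: the lookup runs elsewhere, the loop keeps beating.
        worker = threading.Thread(target=slow_lookup, args=(lookup_seconds,))
        worker.start()
        while worker.is_alive():
            time.sleep(interval)
            beats.append(time.monotonic())
        worker.join()
    else:
        # Synchronous style: the loop is stuck until the lookup returns.
        slow_lookup(lookup_seconds)
    beats.append(time.monotonic())
    return max(later - earlier for earlier, later in zip(beats, beats[1:]))


if __name__ == "__main__":
    print("blocking lookup, longest gap:", max_heartbeat_gap(use_thread=False))
    print("threaded lookup, longest gap:", max_heartbeat_gap(use_thread=True))
```

In the blocking case the gap is at least as long as the lookup itself, which is how an app waiting for a heartbeat could run into its timeout. In the threaded case the gap stays close to the heartbeat interval, at the price of the extra complexity that, according to the post above, caused hangs on the 64-bit clients.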
