GNU/Linux S5R3 "power users" App 4.35 available

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 690720723
RAC: 269614


Message 79527 in response to message 79524

Quote:
The advantage that the PIII has is that it's only got a 10-stage pipeline vs something in the 20s for the Core 2 Duo architecture.


I think the pipeline length of Core 2 family CPUs is more like 13 or 14; it was the Pentium 4 that started with a 20-stage pipeline.

I should have made this clearer: the reason for the poor performance of the Core 2 CPU here compared to the P-III is that a stock, non-SSE version is used here, as compared to a more optimized app under Linux on the P-III. This was to highlight how optimization can make older hardware competitive again :-)

CU
Bikeman

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494410
RAC: 0


First 4.35 WU crunched and reported. It certainly shows a speed increase, I'd say roughly 20% compared to 4.27... since the WUs were from the same frequency, it should be about comparable, shouldn't it? Of course, if someone with a deeper understanding of the maths involved wants to have a look, I'd appreciate...
I also noticed a somewhat higher average number of sky positions between checkpoints, btw, which I normally find a reliable sign of a speed increase.
I had one "no heartbeat from core client" error, but that occasionally happened with the predecessor apps as well on this box and there don't seem to be any significant consequences unless you count having to restart from the last checkpoint.
If this WU is indeed just as "long" as my others, I must say this app really rocks performance-wise :-D
Should be the same kind of CPU Bikeman mentioned earlier so it's no wonder it reacts about the same...

Donald A. Tevault
Joined: 17 Feb 06
Posts: 439
Credit: 73516529
RAC: 0


It seems that the signal 11 problem is back. This is my second one since switching to the 4.35 app. (I had none at all with the 4.27 app.)

2nd Signal 11

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494410
RAC: 0


Strange, I haven't experienced any of those yet and this box was very prone to this kind of error in the past. I did have a "no heartbeat from core client", which would probably have been a signal 11 with the pre-4.24 apps, but nothing worse... maybe it's not the same problem after all? Afaik, a signal 11 can be more or less anything.
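
For reference, on Linux signal number 11 is SIGSEGV (a segmentation fault, i.e. an invalid memory access), which is why the underlying cause can be almost anything. A quick way to look up a signal number in Python is sketched below; the helper name `describe_signal` is made up for illustration, and signal numbers can differ on other platforms.

```python
import signal

def describe_signal(signum: int) -> str:
    """Return the symbolic name for a signal number on this platform."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        # The number is not a known signal on this platform.
        return f"unknown signal {signum}"

if __name__ == "__main__":
    # On a typical Linux box this prints the name for signal 11.
    print(describe_signal(11))
```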

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245263884
RAC: 12585


Message 79531 in response to message 79529

Quote:

It seems that the signal 11 problem is back. This is my second one since switching to the 4.35 app. (I had none at all with the 4.27 app.)

2nd Signal 11


Pooh - looks like the same issue (after loss of heartbeat).

We should look into this again - Bikeman, do you have time for another debugger session?

BM

BM

Conan
Joined: 19 Jun 05
Posts: 172
Credit: 7178839
RAC: 1270


I have experienced a lot of the "no heartbeat" messages recently. They seem to come in phases; the Linux machine does not lose the result but gets a heartbeat sooner or later and then completes.
A Windows AMD host I have gets heaps more and sometimes trashes a WU as a result.

For me this happens on any project at any time and with different BOINC clients: 5.10.15, 5.10.21 and 5.10.38.

Sometimes on Hydrogen@home I can get a failure rate as high as 50% of WUs, and half of those failures have been "no heartbeat" messages.
But it does not happen all the time so can't track it down very easily.

I was under the impression that it was the BOINC client losing contact with the process for a while and not knowing what it is doing; it later catches up with itself and everything goes back to normal.

I have now on 3 occasions had my data reset in BOINC Manager on some CPDN work units for no apparent reason, with 2 different BOINC clients. This is why I suspect the BOINC client, not the project application.
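
To illustrate the kind of mechanism being described, here is a minimal sketch of a heartbeat watchdog in Python. It is not BOINC's actual code: the class name `HeartbeatWatchdog` and the 30-second timeout are assumptions chosen purely for illustration. The idea is that the worker remembers when it last heard from the client and treats the heartbeat as lost once the silence exceeds the timeout.

```python
import time

class HeartbeatWatchdog:
    """Hypothetical worker-side heartbeat check (not BOINC's API)."""

    def __init__(self, timeout_s: float = 30.0, clock=time.monotonic):
        # timeout_s is an assumed value; clock is injectable for testing.
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_beat = clock()

    def beat(self) -> None:
        """Record that a heartbeat from the client has just arrived."""
        self._last_beat = self._clock()

    def lost(self) -> bool:
        """Return True if no heartbeat arrived within the timeout."""
        return self._clock() - self._last_beat > self.timeout_s
```

A worker loop would call `beat()` whenever a heartbeat message arrives and check `lost()` periodically. If the client is merely slow to respond (for example on a heavily loaded host), the check can fire even though nothing is wrong with the science app itself.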

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245263884
RAC: 12585


Message 79533 in response to message 79531

Quote:
Quote:

It seems that the signal 11 problem is back. This is my second one since switching to the 4.35 app. (I had none at all with the 4.27 app.)

2nd Signal 11


Pooh - looks like the same issue (after loss of heartbeat).

We should look into this again - Bikeman, do you have time for another debugger session?


Actually it isn't quite the same. A loss of heartbeat (i.e. stopping the client) leads to a "normal" exit of the App and a restart after the client is back at my machine (Pentium-M (1 core), BOINC 5.10.21). No segfault. Not that easy...

BM

BM

Donald A. Tevault
Joined: 17 Feb 06
Posts: 439
Credit: 73516529
RAC: 0


Message 79534 in response to message 79533

Quote:
Quote:
Quote:

It seems that the signal 11 problem is back. This is my second one since switching to the 4.35 app. (I had none at all with the 4.27 app.)

2nd Signal 11


Pooh - looks like the same issue (after loss of heartbeat).

We should look into this again - Bikeman, do you have time for another debugger session?


Actually it isn't quite the same. A loss of heartbeat (i.e. stopping the client) leads to a "normal" exit of the App and a restart after the client is back at my machine (Pentium-M (1 core), BOINC 5.10.21). No segfault. Not that easy...

BM

If it helps, here are the error messages from when this happened.

Wed 27 Feb 2008 05:45:44 AM EST|Einstein@Home|Sending scheduler request: To fetch work. Requesting 81 seconds of work, reporting 0 completed tasks
Wed 27 Feb 2008 05:46:24 AM EST|Einstein@Home|Task h1_0847.50_S5R3__434_S5R3b_0 exited with zero status but no 'finished' file
Wed 27 Feb 2008 05:46:24 AM EST|Einstein@Home|If this happens repeatedly you may need to reset the project.
Wed 27 Feb 2008 05:46:24 AM EST||Project communication failed: attempting access to reference site
Wed 27 Feb 2008 05:46:28 AM EST|Einstein@Home|Restarting task h1_0847.50_S5R3__434_S5R3b_0 using einstein_S5R3 version 435
Wed 27 Feb 2008 05:46:28 AM EST|Einstein@Home|Scheduler request failed: Couldn't resolve host name
Wed 27 Feb 2008 05:46:28 AM EST|Einstein@Home|Computation for task h1_0847.50_S5R3__430_S5R3b_1 finished
Wed 27 Feb 2008 05:46:28 AM EST|Einstein@Home|Output file h1_0847.50_S5R3__430_S5R3b_1_0 for task h1_0847.50_S5R3__430_S5R3b_1 absent
Wed 27 Feb 2008 05:46:45 AM EST||Access to reference site succeeded - project servers may be temporarily down.
Wed 27 Feb 2008 05:47:29 AM EST|Einstein@Home|Sending scheduler request: To fetch work. Requesting 30284 seconds of work, reporting 1 completed tasks
Wed 27 Feb 2008 05:47:34 AM EST|Einstein@Home|Scheduler request succeeded: got 1 new tasks
Wed 27 Feb 2008 05:47:36 AM EST|Einstein@Home|Starting h1_0847.50_S5R3__417_S5R3b_1
Wed 27 Feb 2008 05:47:37 AM EST|Einstein@Home|Starting task h1_0847.50_S5R3__417_S5R3b_1 using einstein_S5R3 version 435
Wed 27 Feb 2008 05:48:35 AM EST|Einstein@Home|Sending scheduler request: To fetch work. Requesting 28 seconds of work, reporting 0 completed tasks
Wed 27 Feb 2008 05:48:40 AM EST|Einstein@Home|Scheduler request succeeded: got 1 new tasks
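
For anyone who wants to scan a longer client log for this pattern, a small Python sketch follows. It assumes the three-field `timestamp|project|message` layout seen in the messages above; the names `LogEntry`, `parse_line` and `abnormal_exits` are made up for this example and are not part of BOINC.

```python
from typing import Iterable, List, NamedTuple

class LogEntry(NamedTuple):
    timestamp: str
    project: str   # empty for client-wide messages
    message: str

# Message text taken from the log excerpt above.
MARKER = "exited with zero status but no 'finished' file"

def parse_line(line: str) -> LogEntry:
    """Split one 'timestamp|project|message' line into its fields."""
    timestamp, project, message = line.rstrip("\n").split("|", 2)
    return LogEntry(timestamp, project, message)

def abnormal_exits(lines: Iterable[str]) -> List[LogEntry]:
    """Return entries where a task exited without a 'finished' file."""
    hits = []
    for line in lines:
        if line.count("|") < 2:
            continue  # skip lines that do not match the assumed layout
        entry = parse_line(line)
        if MARKER in entry.message:
            hits.append(entry)
    return hits
```

Feeding the lines above through `abnormal_exits` would pick out the 05:46:24 entry for task h1_0847.50_S5R3__434_S5R3b_0, which makes it easier to line such exits up with the network errors around them.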

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494410
RAC: 0


Looks like internet trouble... maybe that still plays a role?
Bernd, that's just what happens on my Core machine...

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 690720723
RAC: 269614


Message 79536 in response to message 79535

Quote:
Looks like internet trouble... maybe that still plays a role?
Bernd, that's just what happens on my Core machine...

Maybe it's more likely to happen on multi-cores... I'll do some tests.

CU
Bikeman
