GNU/Linux S5R3 "power users" App 4.35 available

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 753262758

RAC: 1191838

RE: The advantage that the

25 Feb 2008 13:48:15 UTC

Message 79527 in response to message 79524

Quote:

The advantage that the PIII has is that it's only got a 10-stage pipeline vs something in the 20s for the Core 2 Duo architecture.

I think the pipeline length of Core 2 family CPUs is more like 13 or 14; it was the Pentium 4 that started with a 20-stage pipeline.

I should have made this clearer: the reason for the poor performance of the Core 2 CPU here compared to the P-III is that a stock, non-SSE version is used here, as compared to a more optimized app under Linux on the P-III. This was to highlight how optimization can make older hardware competitive again :-)

CU
Bikeman

Annika

Joined: 8 Aug 06

Posts: 720

Credit: 494410

RAC: 0

First 4.35 WU crunched and

25 Feb 2008 14:41:46 UTC

Message 79528

First 4.35 WU crunched and reported. It certainly shows a speed increase, I'd say roughly 20% compared to 4.27... since the WUs were from the same frequency, they should be roughly comparable, shouldn't they? Of course, if someone with a deeper understanding of the maths involved wants to have a look, I'd appreciate it...
I also noticed a somewhat higher average number of sky positions between checkpoints, btw, which I normally find a reliable sign of a speed increase.
I had one "no heartbeat from core client" error, but that occasionally happened with the predecessor apps as well on this box and there don't seem to be any significant consequences unless you count having to restart from the last checkpoint.
If this WU is indeed just as "long" as my others, I must say this app really rocks performance-wise :-D
Should be the same kind of CPU Bikeman mentioned earlier so it's no wonder it reacts about the same...

Donald A. Tevault

Joined: 17 Feb 06

Posts: 439

Credit: 73516529

RAC: 0

It seems that the signal 11

27 Feb 2008 10:55:31 UTC

Message 79529

It seems that the signal 11 problem is back. This is my second one since switching to the 4.35 app. (I had none at all with the 4.27 app.)

2nd Signal 11

Annika

Joined: 8 Aug 06

Posts: 720

Credit: 494410

RAC: 0

Strange, I haven't

27 Feb 2008 11:02:17 UTC

Message 79530

Strange, I haven't experienced any of those yet and this box was very prone to this kind of error in the past. I did have a "no heartbeat from core client", which would probably have been a signal 11 with the pre-4.24 apps, but nothing worse... maybe it's not the same problem after all? Afaik, a signal 11 can be more or less anything.
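
As a side note for readers unsure what "signal 11" refers to: on Linux it is SIGSEGV, a segmentation fault, which is why many unrelated bugs can end up reported this way. The following is only a minimal Python sketch, not part of the Einstein@Home app or of BOINC. It assumes a POSIX system, and the names CHILD_CODE and run_crashing_child are made up for illustration. It starts a child process that sends itself SIGSEGV and shows how that death looks from the parent's side.

```python
import signal
import subprocess
import sys

# Hypothetical child program: it delivers SIGSEGV to itself,
# standing in for a science app that crashes with "signal 11".
CHILD_CODE = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"

def run_crashing_child():
    """Run the child; return the name of the signal that killed it, or None."""
    result = subprocess.run([sys.executable, "-c", CHILD_CODE])
    if result.returncode < 0:
        # On POSIX, a negative return code -N means "terminated by signal N".
        return signal.Signals(-result.returncode).name
    return None

if __name__ == "__main__":
    print(run_crashing_child())
```

The signal name says nothing about the cause; it only tells you that the process touched memory it should not have (or, as here, was sent the signal explicitly).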

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4330

Credit: 251369047

RAC: 36364

RE: It seems that the

27 Feb 2008 11:17:30 UTC

Message 79531 in response to message 79529

Quote:

It seems that the signal 11 problem is back. This is my second one since switching to the 4.35 app. (I had none at all with the 4.27 app.)

2nd Signal 11

Pooh - looks like the same issue (after loss of heartbeat).

We should look into this again - Bikeman, do you have time for another debugger session?

Conan

Joined: 19 Jun 05

Posts: 172

Credit: 8431534

RAC: 4563

> I have experienced a lot of

27 Feb 2008 11:21:22 UTC

Message 79532

> I have experienced a lot of the "no heartbeat" messages recently. Seems to come in phases and the Linux machine does not lose the result but gets a heartbeat sooner or later and then completes.
A Windows AMD host I have gets heaps more and sometimes trashes a WU as a result.

For me this happens on any project at any time and with different Boinc Clients, from 5.10.15, 5.10.21 and 5.10.38.

Sometimes on Hydrogen@home I can get a failure rate as high as 50% of WUs, and half of those failures have been "no heartbeat" messages.
But it does not happen all the time so can't track it down very easily.

I was under the impression it was the Boinc client losing contact with the process for a while and not knowing what it is doing; it later catches up with itself and everything goes back to normal.

I have now on 3 occasions had my data reset in Boinc Manager on some CPDN work units for no apparent reason, with 2 different Boinc Clients. This is why I suspect the Boinc Client, not the project application.

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4330

Credit: 251369047

RAC: 36364

RE: RE: It seems that the

27 Feb 2008 14:18:36 UTC

Message 79533 in response to message 79531

Quote:

Quote:
It seems that the signal 11 problem is back. This is my second one since switching to the 4.35 app. (I had none at all with the 4.27 app.)

2nd Signal 11

Pooh - looks like the same issue (after loss of heartbeat).

We should look into this again - Bikeman, do you have time for another debugger session?

Actually it isn't quite the same. A loss of heartbeat (i.e. stopping the client) leads to a "normal" exit of the App and a restart after the client is back at my machine (Pentium-M (1 core), BOINC 5.10.21). No segfault. Not that easy...

Donald A. Tevault

Joined: 17 Feb 06

Posts: 439

Credit: 73516529

RAC: 0

RE: RE: RE: It seems

27 Feb 2008 15:33:00 UTC

Message 79534 in response to message 79533

Quote:

Quote:
Quote:
It seems that the signal 11 problem is back. This is my second one since switching to the 4.35 app. (I had none at all with the 4.27 app.)

2nd Signal 11

Pooh - looks like the same issue (after loss of heartbeat).

We should look into this again - Bikeman, do you have time for another debugger session?

Actually it isn't quite the same. A loss of heartbeat (i.e. stopping the client) leads to a "normal" exit of the App and a restart after the client is back at my machine (Pentium-M (1 core), BOINC 5.10.21). No segfault. Not that easy...

BM

If it helps, here are the error messages from when this happened.

Wed 27 Feb 2008 05:45:44 AM EST|Einstein@Home|Sending scheduler request: To fetch work. Requesting 81 seconds of work, reporting 0 completed tasks
Wed 27 Feb 2008 05:46:24 AM EST|Einstein@Home|Task h1_0847.50_S5R3__434_S5R3b_0 exited with zero status but no 'finished' file
Wed 27 Feb 2008 05:46:24 AM EST|Einstein@Home|If this happens repeatedly you may need to reset the project.
Wed 27 Feb 2008 05:46:24 AM EST||Project communication failed: attempting access to reference site
Wed 27 Feb 2008 05:46:28 AM EST|Einstein@Home|Restarting task h1_0847.50_S5R3__434_S5R3b_0 using einstein_S5R3 version 435
Wed 27 Feb 2008 05:46:28 AM EST|Einstein@Home|Scheduler request failed: Couldn't resolve host name
Wed 27 Feb 2008 05:46:28 AM EST|Einstein@Home|Computation for task h1_0847.50_S5R3__430_S5R3b_1 finished
Wed 27 Feb 2008 05:46:28 AM EST|Einstein@Home|Output file h1_0847.50_S5R3__430_S5R3b_1_0 for task h1_0847.50_S5R3__430_S5R3b_1 absent
Wed 27 Feb 2008 05:46:45 AM EST||Access to reference site succeeded - project servers may be temporarily down.
Wed 27 Feb 2008 05:47:29 AM EST|Einstein@Home|Sending scheduler request: To fetch work. Requesting 30284 seconds of work, reporting 1 completed tasks
Wed 27 Feb 2008 05:47:34 AM EST|Einstein@Home|Scheduler request succeeded: got 1 new tasks
Wed 27 Feb 2008 05:47:36 AM EST|Einstein@Home|Starting h1_0847.50_S5R3__417_S5R3b_1
Wed 27 Feb 2008 05:47:37 AM EST|Einstein@Home|Starting task h1_0847.50_S5R3__417_S5R3b_1 using einstein_S5R3 version 435
Wed 27 Feb 2008 05:48:35 AM EST|Einstein@Home|Sending scheduler request: To fetch work. Requesting 28 seconds of work, reporting 0 completed tasks
Wed 27 Feb 2008 05:48:40 AM EST|Einstein@Home|Scheduler request succeeded: got 1 new tasks
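
For anyone who wants to scan their own client messages for the same pattern, here is a rough Python sketch. It assumes the "timestamp|project|message" layout seen in the lines above and simply counts how often each task "exited with zero status but no 'finished' file". The function names and the sample list are made up for illustration; nothing here is a BOINC API. Any line without two "|" separators (for example a wrapped continuation line) is skipped.

```python
from collections import Counter

# Marker text copied from the client messages quoted above.
MARKER = "exited with zero status but no 'finished' file"

def parse_line(line):
    """Split a 'timestamp|project|message' line; return None if it does not fit."""
    parts = line.rstrip("\n").split("|", 2)
    if len(parts) != 3:
        return None
    return tuple(parts)

def count_unfinished_exits(lines):
    """Count marker events per task name (the word right after 'Task')."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # not a complete message line
        _timestamp, _project, message = parsed
        if message.startswith("Task ") and MARKER in message:
            counts[message.split()[1]] += 1
    return counts

# Two lines taken from the messages above, used as sample input.
sample = [
    "Wed 27 Feb 2008 05:46:24 AM EST|Einstein@Home|Task h1_0847.50_S5R3__434_S5R3b_0 "
    "exited with zero status but no 'finished' file",
    "Wed 27 Feb 2008 05:46:24 AM EST||Project communication failed: "
    "attempting access to reference site",
]

if __name__ == "__main__":
    for task, n in count_unfinished_exits(sample).items():
        print(task, n)
```

To run it over a saved copy of the messages, pass an open file object instead of the sample list; the file name and location depend on your own setup.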

Annika

Joined: 8 Aug 06

Posts: 720

Credit: 494410

RAC: 0

Looks like internet

27 Feb 2008 16:22:29 UTC

Message 79535

Looks like internet trouble... maybe that still plays a role?
Bernd, that's just what happens on my Core machine...

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 753262758

RAC: 1191838

RE: Looks like internet

27 Feb 2008 17:31:53 UTC

Message 79536 in response to message 79535

Quote:

Looks like internet trouble... maybe that still plays a role?
Bernd, that's just what happens on my Core machine...

Maybe it's more likely to happen on multi-cores... I'll do some tests.

CU
Bikeman
