S40.12 Observation thread

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7057494931
RAC: 1604227

RE: Wow, those results are

Message 28354 in response to message 28353

Quote:
Wow, those results are really consistent in time. Even my dedicated crunching box, a 1.5 GHz Athlon, has noise in the ±5 minute range.

It is my impression from observation on my own boxes that some sections of some datafiles are more consistent in required CPU from result to result than others. Even my Win98SE machines, with their notoriously inaccurate CPU time reporting, sometimes get a string of remarkably close CPU times when I'm not using them myself at all, and they get a lucky string of results.

And, yes, Ziegenmelker's latest report is again quite persuasive for an S40.12 time improvement on that specific machine. It starts to appear that S40.12 and the more modern big-cache hyper-threaded Intel P4s (both Gallatin and Prescott) are not a good match for some reason.

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7057494931
RAC: 1604227

RE: I did not do a

Message 28355 in response to message 28342

Quote:
I did not do a controlled rerun, but the first five results returned running S40.12 on my Gallatin (Northwood-descended P4 EE 8k L1, 512k L2, 2M L3 cache) are definitely slower than most recent results from the same two major datafiles.

During the outage, I did do a controlled run of a short Einstein result (r1_0265.5_2133_S4R2a_0) on my Gallatin (Northwood-descended 2M L3 cache--the first P4 EE).

Hyperthreaded, with the other BOINC job also an Einstein job:

S40.12 2345 seconds
S40.04 1933 seconds

So S40.12 took 1.21 times the execution time! A pretty high price.

(Though I did not rerun it, I had previously measured 1954 seconds for this result using S-40.)

To recapitulate: my two Pentium IIIs showed a slight improvement with S40.12 compared to S40.04, my Banias Pentium M a slight degradation, and my Gallatin Pentium 4 a large degradation. While it may be that the amount of degradation depends highly on the Work Unit, it seems more likely to me that it depends highly on the processor architecture, and perhaps memory speed. (My Gallatin is served by slower FSB memory than most; it would be good to hear from other Gallatin owners.) I've reverted the Banias and Gallatin machines to S40.04.
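As a quick check of the 1.21 figure, the ratio can be recomputed from the two reported times. This is only a minimal sketch: the function and variable names are illustrative, and the inputs are the two CPU times quoted above for r1_0265.5_2133_S4R2a_0.

```python
# CPU times in seconds, as reported in the post for the Gallatin P4 EE
# (hyperthreaded, second BOINC job also Einstein).
s4012_seconds = 2345
s4004_seconds = 1933


def slowdown_ratio(new_seconds: float, old_seconds: float) -> float:
    """Return new/old execution time; a value above 1.0 means the new app is slower."""
    return new_seconds / old_seconds


ratio = slowdown_ratio(s4012_seconds, s4004_seconds)
print(f"S40.12 / S40.04 = {ratio:.2f}")
```

A single result per version says nothing about run-to-run noise, so the ratio is best read as indicative rather than precise.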

szshell
Joined: 7 Mar 05
Posts: 4
Credit: 9426215
RAC: 0

RE: RE: agree, definitely

Message 28356 in response to message 28347

Quote:
Quote:
agree, definitely slower on my P4 2.8C HT enabled

Same with my P4 with HT on.

My other P4 without HT is faster with S40.12.

All work done so far with S40.12 is without errors.

Anders n


On my XP 1800+, the time dropped from 92 min (S40.04) to 71 min (S40.12), a very great improvement.

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 162

My a64x2's done enough work

My A64 X2 has done enough work that I'm seeing a ~10-20% speedup: about 10% on the big WUs and 20% on the short ones.

Knorr
Joined: 18 Feb 06
Posts: 16
Credit: 3129905
RAC: 0

My system also seems to

My system also seems to crunch S40.12 faster than S40.04

S40.04 ran about 4100 sec.
S40.12 is about 3900 sec.

The CPU is an Athlon XP 2200+.

I've had 2 out of 4 WUs error out, though.

The CPU is OC'ed a bit. Only the multiplier, hence no memory OC.
It hasn't been a problem with any BOINC project before.

Is S40.12 more stressful than S40.04?

- Knorr

Brickhead
Joined: 7 Mar 05
Posts: 4
Credit: 69397814
RAC: 4168

Could it be that although the

Could it be that although the double-size lookup table comes with a speed penalty, the reduced cache load more than outweighs this on some CPUs?

Nightbird
Joined: 17 Feb 05
Posts: 79
Credit: 561723
RAC: 0

My Athlon 64 3200+ is faster

My Athlon 64 3200+ is faster with S40.12 than with S40.03.

In short:
4044 sec to 4047 sec with S-39L and WUs r1_1220
3706 sec to 3877 sec with S40.03 and WUs z1_1174
now 3503 sec to 3521 sec with S40.12 and WUs z1_1174
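To put a rough number on this, the two z1_1174 ranges can be compared by their midpoints. This is a sketch under an assumption: the midpoint of each reported range is taken as a stand-in for the typical time, which the post itself does not claim. The S-39L figures are left out because they come from a different datafile (r1_1220).

```python
def midpoint(low: float, high: float) -> float:
    """Midpoint of a reported min-max range (an assumed proxy for the typical time)."""
    return (low + high) / 2.0


# Reported CPU-time ranges in seconds for z1_1174 work units.
s4003_mid = midpoint(3706, 3877)
s4012_mid = midpoint(3503, 3521)

# Percent reduction in CPU time going from S40.03 to S40.12.
reduction_pct = 100.0 * (1.0 - s4012_mid / s4003_mid)
print(f"S40.03 midpoint: {s4003_mid:.1f} s")
print(f"S40.12 midpoint: {s4012_mid:.1f} s")
print(f"Reduction: {reduction_pct:.1f}%")
```

Note that the S40.12 range is also much narrower (18 seconds against 171), which is consistent with the earlier remarks in this thread about result-to-result consistency.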


Richard Haselgrove
Joined: 10 Dec 05
Posts: 2140
Credit: 2770411051
RAC: 913259

I've now got a couple of

I've now got a couple of sequences, which both show good improvements for S40.12 over S40.04 (I'll leave you to work out the percentages...).

Both machines are stock Dell motherboards, PowerEdge / Dimension respectively.

W2K Server 475735 - P4 Northwood SSE2, 1.8 GHz, 512KB L2 cache

r1_1255.0__344_S4R2a_2 --- 7602 --- S40.04
r1_1255.0__343_S4R2a_2 --- 7592 --- S40.04
r1_1255.0__342_S4R2a_2 --- 7592 --- S40.04
r1_1255.0__341_S4R2a_2 --- 7596 --- S40.04
r1_1255.0__278_S4R2a_1 --- 7149 --- S40.04
r1_1255.0__266_S4R2a_2 --- 7136 --- S40.04
r1_1255.0__247_S4R2a_2 --- 7137 --- S40.04

r1_1255.0__228_S4R2a_2 --- 6796 --- S40.04/12 mixed

r1_1390.0__1417_S4R2a_3 --- 6563 --- S40.12
r1_1390.0__1401_S4R2a_1 --- 6570 --- S40.12
r1_1390.0__1397_S4R2a_0 --- 6568 --- S40.12
r1_1390.0__1393_S4R2a_1 --- 6569 --- S40.12
r1_1390.0__1389_S4R2a_1 --- 6570 --- S40.12
r1_1390.0__1386_S4R2a_3 --- 6567 --- S40.12
r1_1390.0__1381_S4R2a_1 --- 6570 --- S40.12
r1_1390.0__1378_S4R2a_2 --- 6568 --- S40.12
r1_1390.0__1377_S4R2a_2 --- 6568 --- S40.12

XP SP2 Workstation 475717 - P4 Northwood SSE2, 2.0 GHz, 512KB L2 cache

r1_1200.0__394_S4R2a_1 --- 6549 --- S40.04
r1_1200.0__393_S4R2a_1 --- 6738 --- S40.04
r1_1200.0__392_S4R2a_1 --- 6750 --- S40.04
r1_1200.0__391_S4R2a_0 --- 6614 --- S40.04
r1_1200.0__390_S4R2a_0 --- 6637 --- S40.04
r1_1200.0__389_S4R2a_0 --- 6649 --- S40.04
r1_1200.0__388_S4R2a_0 --- 6610 --- S40.04

r1_1200.0__387_S4R2a_0 --- 6671 --- S40.04/12 mixed

z1_0874.0__23_S4R2a_2 --- 5566 --- S40.12
z1_0874.0__20_S4R2a_1 --- 5466 --- S40.12
z1_0874.0__17_S4R2a_2 --- 5488 --- S40.12
r1_1200.0__324_S4R2a_1 --- 6129 --- S40.12
z1_0874.0__11_S4R2a_0 --- 5468 --- S40.12
z1_0874.0__9_S4R2a_0 --- 5530 --- S40.12
z1_0874.0__8_S4R2a_0 --- 5445 --- S40.12
r1_1200.0__284_S4R2a_0 --- 5762 --- S40.12
r1_1200.0__277_S4R2a_1 --- 5764 --- S40.12

So many thanks and congratulations to Akosf, yet again - keep up the good work!

(P.S. Ignore the timings shown on the results pages - I'm using Trux's tx36 calibrating client, and it 'tweaks' the timings.)
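For anyone who would rather not work out the percentages by hand, here is a minimal sketch that averages the CPU times listed above. The variable names are illustrative, the "mixed" results are excluded, and two caveats apply: on the server the S40.04 and S40.12 runs come from different datafiles (r1_1255 and r1_1390), so part of that difference may be due to the data rather than the app; on the workstation only the r1_1200 results are used, since that is the one datafile run under both versions.

```python
from statistics import mean

# W2K Server 475735: CPU seconds per result, copied from the list above.
server_s4004 = [7602, 7592, 7592, 7596, 7149, 7136, 7137]        # r1_1255
server_s4012 = [6563, 6570, 6568, 6569, 6570, 6567, 6570, 6568, 6568]  # r1_1390

# XP SP2 Workstation 475717: r1_1200 results only, for a like-for-like comparison.
ws_s4004_r1_1200 = [6549, 6738, 6750, 6614, 6637, 6649, 6610]
ws_s4012_r1_1200 = [6129, 5762, 5764]


def percent_reduction(old_times, new_times) -> float:
    """Percent drop in mean CPU time from old_times to new_times."""
    return 100.0 * (1.0 - mean(new_times) / mean(old_times))


server_pct = percent_reduction(server_s4004, server_s4012)
ws_pct = percent_reduction(ws_s4004_r1_1200, ws_s4012_r1_1200)

print(f"Server:      {mean(server_s4004):.0f} s -> {mean(server_s4012):.0f} s ({server_pct:.1f}% less)")
print(f"Workstation: {mean(ws_s4004_r1_1200):.0f} s -> {mean(ws_s4012_r1_1200):.0f} s ({ws_pct:.1f}% less)")
```

With only three r1_1200 results under S40.12 on the workstation, that second figure rests on a small sample.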

rbpeake
Joined: 18 Jan 05
Posts: 266
Credit: 979219760
RAC: 648528

RE: Could it be that

Message 28362 in response to message 28359

Quote:
Could it be that although the double-size lookup table comes with a speed penalty, the reduced cache load more than outweighs this on some CPUs?


I was thinking the same thing, but it seems to also be dependent on the chip design. My AMD 3500+ has sped up nicely, as have a number of related chips it seems.

Knorr
Joined: 18 Feb 06
Posts: 16
Credit: 3129905
RAC: 0

It doesn't seem to work

It doesn't seem to work properly on my system.

The CPU is overclocked a bit. Only by multiplier so no tweak of FSB/memory.

Out of a batch of 7 WUs:

These 3 errored:

26295992
26344584
26445945

1 turned out invalid:

26295998

And the last 3 are currently pending:

26416342
26416347
26423016

The speed is great though... ;)

But I'm gonna switch to D40 for a while.
