S38 Observation thread

Rudy
Joined: 12 Dec 05
Posts: 33
Credit: 3747406
RAC: 1355
Topic 190937

Looks good so far on my P4.

Very fast.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7211864931
RAC: 936591

S38 Observation thread

I replaced akosf's C-37 with S-38 on all four of my machines just a couple of hours ago.

wcpuid reports that all support SSE (though the two oldest don't support SSE2).

All four have completed one mixed result (started on C-37, finished on S-38), without any observed abnormal behavior.

Initial indications are of appreciable further speedup compared to C-37--I'll report numbers and validation here when I have them.

The machines which appear to be working and further sped up include:

P4 EE WinXPPro
Pentium M (of the initial Banias generation) WinXPPro
Pentium 3 Win98SE
Pentium II Win98SE

Extremely preliminary observation suggests that the Banias part may be getting the biggest benefit and the P4 the least, but all look well worth having, assuming validation and stability prove to be OK.

Stephen R
Joined: 10 Dec 05
Posts: 28
Credit: 7008154
RAC: 0

AMD XP 2600+ C37->S38

AMD XP 2600+ C37->S38 (approx. 25% increase) - no probs - validating fine
AMD 64 X2 3800+ S37a->S38 (approx. 26% increase) - no probs - validating fine

:-) you did it akosf! XP 2600+ S38(SSE) beat the X2 3800+ S37a(SSE2)
Looking forward to your next SSE2 opt. Hope it won't need more than 512K L2.
Well done again! Keep having fun!

Stef
Joined: 8 Mar 05
Posts: 206
Credit: 110568193
RAC: 0

P3T 1.26GHz (512kB)

P3T 1.26GHz (512kB) C37: 13250s; S38: 8510s -> -36%
P3mobile 1.0GHz (256kB) C37: 16230s; S38: 10720s -> -34%

Both are averages on the same WU size each.
Well done akosf! :-)
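
The percentages above can be re-derived from the posted averages. A minimal Python sketch, assuming the quoted times are average CPU seconds per result; the function name `percent_change` is made up for illustration:

```python
def percent_change(before_s, after_s):
    """Relative change in run time, in percent (negative means faster)."""
    return (after_s - before_s) / before_s * 100.0

# Average CPU times in seconds, copied from the post above: (C37, S38).
timings = {
    "P3T 1.26GHz (512kB)": (13250, 8510),
    "P3mobile 1.0GHz (256kB)": (16230, 10720),
}

for host, (c37, s38) in timings.items():
    print(f"{host}: {percent_change(c37, s38):+.0f}%")
```

Note that a drop in run time is not the same number as the gain in throughput: the same averages correspond to roughly 1.56x and 1.51x as many results per unit of CPU time.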

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7211864931
RAC: 936591

Pentium M Banias crunching

Pentium M Banias crunching major datafile 843.0
S-38 takes 61.6% of the CPU time of the 5 most recent C-37 results

Overall implied improvement compared to the official distributed Albert 4.37 is thus .438*.616= .270 of previous CPU time, or science output improvement on this machine by a factor of 3.71!

Initial indications are that this may be the best of my four machines of varying Intel architecture in speedup.

akosf's contribution to Einstein, if his work somehow ends up running on a noticeable fraction of the user base, is stunning.
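
The chained figure in this post is just a product of CPU-time ratios, and the throughput factor is its reciprocal. A small Python sketch of that arithmetic, using only the two ratios quoted above; the helper name `combined_ratio` is an assumption for illustration:

```python
def combined_ratio(*ratios):
    """Multiply successive CPU-time ratios into one overall ratio."""
    result = 1.0
    for r in ratios:
        result *= r
    return result

c37_vs_dist = 0.438  # C-37 time / distributed Albert 4.37 time (from the post)
s38_vs_c37 = 0.616   # S-38 time / C-37 time (from the post)

overall = combined_ratio(c37_vs_dist, s38_vs_c37)
speedup = 1.0 / overall  # results per unit of CPU time, relative to Albert 4.37

print(f"overall time ratio: {overall:.3f}")
print(f"throughput factor:  {speedup:.2f}")
```

The same helper takes any number of ratios, so a further optimized build could be appended to the chain once its ratio against S-38 is measured.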

Big Blue
Joined: 3 Mar 05
Posts: 3
Credit: 7176937
RAC: 0

RE: (approx. 26% increase) -

Quote:
(approx. 26% increase) - no probs - validating fine

Where can I get it?

Stef
Joined: 8 Mar 05
Posts: 206
Credit: 110568193
RAC: 0

RE: Where can I get it

Message 26124 in response to message 26123

Quote:
Where can I get it?


http://einsteinathome.org/node/190906

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7211864931
RAC: 936591

RE: Initial indications are

Message 26125 in response to message 26119

Quote:
Initial indications are of appreciable further speedup compared to C-37--I'll report numbers and validation here when I have them.


[pre]
CPU          S-38/C-37  C-37/Dist  S-38/Dist
Pentium M    0.616      0.448      0.276
Pentium III  0.676      0.383      0.259
[/pre]

So far my S-38/C-37 reports are based on a single "pure" S-38 result per CPU.
By "Dist" I mean performance on the unmodified Albert 4.37 science application as distributed by the project.

Ziran
Joined: 26 Nov 04
Posts: 194
Credit: 598458
RAC: 773

First result on my sempron

First result on my sempron 3000+

S-38/C-37
0.735

Edit
Interesting: so far Intel-based hosts are ~33% faster and AMD ~25% (S-38/C-37).

When you're really interested in a subject, there is no way to avoid it. You have to read the Manual.

Nightbird
Joined: 17 Feb 05
Posts: 79
Credit: 561723
RAC: 0

First Results on a Barton

First Results on a Barton 3000+ :
"before" with A36 : ~ 2100 - 2200 sec
with S38 : 1,520.57 sec ; 1,627.00 sec ; 1,661.00 sec

First Results on a Barton 2500+ :
"before" with C37 : 6734 sec -> 7027 sec
with S38 : 5,161.00 sec ; 5,132.00 sec

Pretty fast :)


Akos Fekete
Joined: 13 Nov 05
Posts: 561
Credit: 4527270
RAC: 0

RE: Looking forward to your

Message 26128 in response to message 26120

Quote:
Looking forward to your next SSE2 opt. Hope it won't need more than 512K L2.
Well done again! Keep having fun!

Bruce suggested Chebyshev polynomials to me instead of Taylor series a month ago. I did a quick test to compare the two methods, but found that the Chebyshev approximation produced a worse average with the same number of coefficients. I'm working on a program that will generate more precise values. I believe it has to be better. So, if it works, that means we won't need the 512kB look-up table (a very, very wild idea of mine).
I prefer polynomials to a look-up table, because my Durons have just 256kB of cache altogether. :-)
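
For anyone who wants to experiment with the Taylor vs. Chebyshev comparison mentioned above, here is a minimal Python sketch. It is not akosf's test: the post doesn't say which function, interval, degree or error measure he used, so sin on [-1, 1] at degree 5, the sample count, and all function names below are assumptions chosen purely for illustration.

```python
import math

def taylor_sin(x, degree):
    """Taylor polynomial of sin about 0, keeping terms up to x**degree."""
    total, term, k = 0.0, x, 1
    while k <= degree:
        total += term
        term *= -x * x / ((k + 1) * (k + 2))  # next odd-power term
        k += 2
    return total

def chebyshev_coeffs(f, degree):
    """Coefficients c_k of the Chebyshev interpolant of f on [-1, 1]."""
    n = degree + 1
    angles = [math.pi * (j + 0.5) / n for j in range(n)]
    values = [f(math.cos(a)) for a in angles]  # f sampled at Chebyshev nodes
    coeffs = [2.0 / n * sum(v * math.cos(k * a) for v, a in zip(values, angles))
              for k in range(n)]
    coeffs[0] /= 2.0
    return coeffs

def chebyshev_eval(coeffs, x):
    """Evaluate sum(c_k * T_k(x)) with the Clenshaw recurrence."""
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * x * b1 - b2 + c, b1
    return x * b1 - b2 + coeffs[0]

DEGREE = 5  # assumed; both approximations are degree-5 polynomials
coeffs = chebyshev_coeffs(math.sin, DEGREE)
xs = [-1.0 + 2.0 * i / 2000 for i in range(2001)]  # evenly spaced test points

taylor_err = [abs(taylor_sin(x, DEGREE) - math.sin(x)) for x in xs]
cheb_err = [abs(chebyshev_eval(coeffs, x) - math.sin(x)) for x in xs]

for name, errs in (("Taylor", taylor_err), ("Chebyshev", cheb_err)):
    print(f"{name:9s} max error {max(errs):.2e}  mean error {sum(errs) / len(errs):.2e}")
```

Changing `DEGREE`, the interval, or the error measure is the quickest way to see how sensitive the ranking is; since the quick test in the post used its own unstated setup, this sketch neither confirms nor contradicts that result.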
