If your CPU supports SSE, run S40; if it supports 3DNow, run D40. (If it supports both, run S40.)
...and for older processors supporting neither of them, run C40...
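The selection rule from the two posts above can be written as a small function. This is only a sketch: `choose_akosf_variant` is a hypothetical helper, and it assumes the CPU's feature flags are already available as a set of strings such as `"sse"` and `"3dnow"` (reading them from the CPU is outside its scope).

```python
def choose_akosf_variant(cpu_flags):
    """Pick an akosf build from a collection of CPU feature flags.

    Hypothetical helper. Rule from the thread: SSE -> S40,
    otherwise 3DNow -> D40, neither -> C40. A CPU that supports
    both SSE and 3DNow (e.g. Athlon XP) gets S40.
    """
    flags = {f.lower() for f in cpu_flags}  # accept "SSE" or "sse"
    if "sse" in flags:
        return "S40"
    if "3dnow" in flags:
        return "D40"
    return "C40"


print(choose_akosf_variant({"mmx", "sse", "3dnow"}))  # S40
print(choose_akosf_variant({"mmx", "3dnow"}))         # D40
print(choose_akosf_variant({"mmx"}))                  # C40
```

Checking SSE first is what makes "both" resolve to S40, which matches the Athlon XP numbers later in the thread, where s40 beat D40.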
...working on the next test with a P4 2.4 GHz; results will be ready soon (within a day).
This is a small comparison test of akosf's optimized applications -> albert_4.37_x86 for Windoze
BOINC 5.2.13 optimized by truX
CPU,RAM,Instruction set:
Intel P4 (Northwood) 2.4 GHz
512 MB RAM, PC133
MMX, SSE, SSE2
Tested WU:
(short) z1_0261.5_2539_S4R2a_1
----------------------------------------------
| version  | time[s] | curr./orig. | speedup |
|--------------------------------------------|
| original |    6496 |      1.0000 |  1.0000 |
|--------------------------------------------|
| 387      |    3237 |      0.4983 |  2.0068 |
| C-37     |    3227 |      0.4968 |  2.0130 |
| C-40     |    2545 |      0.3918 |  2.5525 |
|--------------------------------------------|
| S-37a    |    3402 |      0.5237 |  1.9095 |
|--------------------------------------------|
| S-38     |    2413 |      0.3715 |  2.6921 |
| S-39     |    1962 |      0.3020 |  3.3109 |
| S-39L    |    1806 |      0.2780 |  3.5969 |
| S-40     |    1798 |      0.2768 |  3.6129 |
----------------------------------------------
So S-40 (SSE) gives the best result: 3.6129x faster than the original.
However, the difference between S-39L and S-40 is small (1806 s vs. 1798 s).
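The curr./orig. and speedup columns are plain ratios of the measured times: curr./orig. = t_version / t_original and speedup = t_original / t_version. A short sketch that recomputes them from the times in the table; the `times` dictionary and the `compare` function are illustrative only, not part of any tool used in this thread.

```python
# Times in seconds for WU z1_0261.5_2539_S4R2a_1 on the P4 (Northwood) 2.4 GHz,
# copied from the table above.
times = {
    "original": 6496,
    "387": 3237,
    "C-37": 3227,
    "C-40": 2545,
    "S-37a": 3402,
    "S-38": 2413,
    "S-39": 1962,
    "S-39L": 1806,
    "S-40": 1798,
}


def compare(times, baseline="original"):
    """Return {version: (curr/orig, speedup)}, both rounded to 4 decimals."""
    t0 = times[baseline]
    return {v: (round(t / t0, 4), round(t0 / t, 4)) for v, t in times.items()}


for version, (ratio, speedup) in compare(times).items():
    print(f"{version:<9} {ratio:.4f} {speedup:.4f}")
```

For S-40 this gives 1798/6496 ≈ 0.2768 and 6496/1798 ≈ 3.6129, matching the table; the two columns are simply reciprocals of each other.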
Tested app: albert_4.50_windows_intelx86.zip
Tested WU:
(short) z1_0261.5_2539_S4R2a_1
Athlon XP
----------------------------------------------
| version  | time[s] | curr./orig. | speedup |
|--------------------------------------------|
| 4.37     |    4553 |      1.0000 |  1.0000 |
|--------------------------------------------|
| 4.50     |    2990 |      0.6567 |  1.5227 |
----------------------------------------------
P4 (Northwood)
----------------------------------------------
| version  | time[s] | curr./orig. | speedup |
|--------------------------------------------|
| 4.37     |    6496 |      1.0000 |  1.0000 |
|--------------------------------------------|
| 4.50     |    4173 |      0.6424 |  1.5567 |
----------------------------------------------
A comparison in the StoneLord style
albert_4.37_x86
Windows XP Pro
Intel P4 EE (Gallatin) 3.2 GHz
2 Gbyte 133MHz DDR SDRAM
Tested WU:
(short) r1_0265.5_2113_S4R2a_0
------------------------------------------
| version  | HT  | time[s] | curr./orig. |
|----------------------------------------|
| original | off |    4630 |      1.0000 |
|----------------------------------------|
| S-40     | off |    1159 |      0.2503 |
|----------------------------------------|
| original | on  |     TBD |      1.0000 |
|----------------------------------------|
| S-40     | on  |    1954 |         TBD |
------------------------------------------
Generating the "original, HT on" case will have to wait. Using StoneLord's method to re-measure the same result probably gives a more accurate measure of the Hyperthreading gain for this system on this application than the figure I posted in the HT thread some days back: 18.6% more science output per hour with HT on than with HT off, on Einstein using akosf S-40.
At face value, my Gallatin system (with slower RAM than many Gallatin systems) got a slightly better speedup over the stock distribution application than StoneLord's Northwood did, though the difference could be within the remaining measurement noise. It just barely missed the magic 4x speedup (4630/1159 ≈ 3.99x). My previous estimate, from multiplying five successive speedup ratios (in HT mode), was just slightly better than 4x, so the more precise measurement method does not change the picture much.
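Both figures quoted in this post can be reproduced from the HT table with a few lines of arithmetic. This sketch rests on one assumption the post does not spell out: with HT on, two results run at the same time, so the 1954 s in the table is the time per result while sharing the CPU. Under that assumption the S-40 times give about 18.6% more results per hour with HT on, and the HT-off speedup is about 3.99x. The constant and function names are illustrative only.

```python
# Times in seconds for WU r1_0265.5_2113_S4R2a_0 on the Gallatin 3.2 GHz,
# copied from the HT table above.
ORIGINAL_HT_OFF = 4630
S40_HT_OFF = 1159
S40_HT_ON = 1954  # assumed: per-result time while two results run concurrently


def speedup(t_orig, t_new):
    """How many times faster t_new is than t_orig (ratio of run times)."""
    return t_orig / t_new


def ht_throughput_gain(t_off, t_on, threads=2):
    """Relative change in results per hour with HT on vs. off.

    Assumes `threads` results run at once with HT on, each taking t_on
    seconds, versus one result at a time taking t_off seconds with HT off.
    """
    per_hour_off = 3600 / t_off
    per_hour_on = threads * 3600 / t_on
    return per_hour_on / per_hour_off - 1


print(f"S-40 speedup, HT off: {speedup(ORIGINAL_HT_OFF, S40_HT_OFF):.2f}x")      # 3.99x
print(f"HT throughput gain:   {ht_throughput_gain(S40_HT_OFF, S40_HT_ON):.1%}")  # 18.6%
```

The gain reduces to 2 × 1159 / 1954 − 1; comparing per-result times alone (1954 s vs. 1159 s) would wrongly make HT look slower.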
Test of Akosf's apps.
P4, 3.0 GHz, 1 GB RAM, using 1 logical processor of HT:
WU r1_1425 (average of 6 results)
app   | time [s] | times faster than orig.
Orig. |    17246 | 1
s39L  |     4770 | 3.616
s40   |     4496 | 3.835
Athlon XP 2600+ (Thoroughbred) (266 MHz), 512 MB RAM:
WU r1_1131 (average of 3 results)
app   | time [s] | times faster than orig.
Orig. |    16135 | 1
s39L  |     4543 | 3.552
D40   |     4332 | 3.724
s40   |     4156 | 3.882
These results apply only to my computers; on others they may be different...
Now testing S40.003 on both computers. A test on another P4, using both HT logical processors with S40, has not finished yet.
Thanks to Akosf for his very good work!
--------------------------
Sorry for my English.