I'm curious. How do you implement automatic SSE/SSE3 selection? Also, should the stderr.txt file state that it found CPU type 2 or something? Though I think that's more to do with the detection part of the app. Anyway, this is what I get on an SSE3-capable machine:
Quote:
2006-06-25 11:23:13.8593 [normal]: E@H S5R1 4.02 0711 TEST
2006-06-25 11:23:13.8750 [normal]: Started search at lalDebugLevel = 0
2006-06-25 11:23:14.8437 [normal]: Found checkpoint-file 'Fstat.out.ckp'
2006-06-25 11:23:14.8437 [normal]: Trying to read Fstat-file into toplist ...
2006-06-25 11:23:18.0156 [normal]: Checksum Ok. Successfully read_toplist_from_fp()
2006-06-25 11:23:18.0156 [normal]: Resuming computation at (18390/85660327/1714784).
Detected CPU type 1
Oh, thanks.
I should stop this app and go back to 0709 ;)
S5S0007 on Pentium M 755 (2 GHz):
long WU (granted credit 167.86): CPU time dropped from around 36000 secs to slightly less than 34000 secs, a 6% speedup
short WU (granted credit 19.72): CPU time dropped from around 4200 secs to 3900 secs, a 7% speedup
All WUs validate.
This CPU obviously doesn't benefit from Akos' magic as much as some other CPUs. On the other hand, the biggest magic comes from the SSE3 optimizations, and this CPU doesn't support SSE3.
Good work, Akos!
Metod ...
Hmm, should we link only the stable releases of Akos' optimised patch files (clients), so that new people don't download an "ugly" version?
I think that would be better for everyone.
S5T0711.dat
- eliminated double jumps
- reduced amount of FPU macro ops
- removed double loads on general purpose registers
- better SSE register usage
- reduced memory and integer register usage
- optimized branch structure
- faster FPU comparisons
- SSE3 truncation
- some reordered instructions
- automatic SSE/SSE3 usage
CPU: ALL
Hello Akos,
does this version include some improvements over S5S0007, or is it mainly a test of automatic SSE/SSE3 usage?
Do you do anything else, or do you only make BOINC project applications better? :-)
@ Akosf
I'm curious. How do you implement automatic SSE/SSE3 selection? Also, should the stderr.txt file state that it found CPU type 2 or something? Though I think that's more to do with the detection part of the app. Anyway, this is what I get on an SSE3-capable machine.
I'll let you know when it's finished.
I tested several different SSE3 versions, all with valid results. You can see an overview here.
Very nice to see the progress in speed. :-))
cu,
Michael
I compared the results produced by some of the optimised apps (I list only the first row here):
WU (output file): l1_0229.0_S5R1__2611_S5R1a_1_0
Original app:
229.499118078 0.010001 -0.843352 -1.7676e-009 32.6857
S5S007:
229.499118078 0.010001 -0.843352 -1.7676e-009 32.6857
S5T0308:
229.499115044 4.87049 0.871399 6.8924e-011 654.035
That isn't a problem of one digit: S5T0308 produces totally incompatible results! I don't know why.
I'm now running S5T0307 for SSE and S5T0709 for SSE3.
Those were the fastest of the versions I have tested so far.
And all results are valid.
Athlon
Stay tuned and keep crunching
standard app / 3140,11 sec / 52,3 min / result
S5T0709 SSE3 / 2385,27 sec / 39,8 min / result
S5T0709 SSE3 / 2415,67 sec / 40,3 min / result
S5T0711 SSE3 / 2277,33 sec / 38,0 min / result
S5T0711 SSE3 / 2334,70 sec / 38,9 min / result
All Results are valid.
EDIT: my gain calculation was wrong; here are the corrected gains:
S5T0709 = 23-24%
S5T0711 = 25-28%