Optimized S5 SSE3

Norman_RKN
Joined: 9 Feb 06
Posts: 27
Credit: 1368075
RAC: 0

Oh, thanks.
I should stop this app and go back to 0709 ;)

Metod, S56RKO
Joined: 11 Feb 05
Posts: 135
Credit: 826243590
RAC: 84775

S5S0007 on Pentium M 755 (2 GHz):

Long WU (granted credit 167.86): CPU time dropped from around 36000 secs to slightly less than 34000 secs - that's a 6% speedup.

Short WU (granted credit 19.72): CPU time dropped from around 4200 secs to 3900 secs - that's a 7% speedup.

All WUs validate.

This CPU obviously doesn't like Akos' magic as much as some other CPUs do. On the other hand, the biggest gains come from the SSE3 optimizations, and this CPU doesn't support SSE3.

Good work, Akos!

Metod ...

Norman_RKN
Joined: 9 Feb 06
Posts: 27
Credit: 1368075
RAC: 0

Hmm, should we link only the stable releases of Akos' optimised patch files (clients), so that new people don't download an "ugly" version, etc.?
I think that would be better for everyone.

Akos Fekete
Joined: 13 Nov 05
Posts: 561
Credit: 4527270
RAC: 0

Message 39162 in response to message 39099

S5T0711.dat

- eliminated double jumps
- reduced amount of FPU macro ops
- removed double loads on general purpose registers

- better SSE register usage
- reduced memory and integer register usage
- optimized branch structure
- faster FPU comparisons

- SSE3 truncation
- some reordered instructions
- automatic SSE/SSE3 usage

CPU: ALL

LiborA
Joined: 8 Dec 05
Posts: 74
Credit: 337135
RAC: 0

Message 39163 in response to message 39162

Quote:

S5T0711.dat

- eliminated double jumps
- reduced amount of FPU macro ops
- removed double loads on general purpose registers

- better SSE register usage
- reduced memory and integer register usage
- optimized branch structure
- faster FPU comparisons

- SSE3 truncation
- some reordered instructions
- automatic SSE/SSE3 usage

CPU: ALL

Hello Akos,
does this version include any improvements over S5S0007, or is its main purpose to test the automatic SSE/SSE3 usage?

Do you do anything else, or do you only make BOINC project applications better? :-)

Pepperammi
Joined: 20 Feb 05
Posts: 131
Credit: 437943
RAC: 0

@ Akosf

I'm curious. How do you implement the automatic SSE/SSE3 selection? Also, should the stderr.txt file state that it found CPU type 2 or something? Though I think that has more to do with the detection part of the app. Anyway, this is what I get on an SSE3-capable machine:

Quote:
2006-06-25 11:23:13.8593 [normal]: E@H S5R1 4.02 0711 TEST
2006-06-25 11:23:13.8750 [normal]: Started search at lalDebugLevel = 0
2006-06-25 11:23:14.8437 [normal]: Found checkpoint-file 'Fstat.out.ckp'
2006-06-25 11:23:14.8437 [normal]: Trying to read Fstat-file into toplist ...
2006-06-25 11:23:18.0156 [normal]: Checksum Ok. Successfully read_toplist_from_fp()
2006-06-25 11:23:18.0156 [normal]: Resuming computation at (18390/85660327/1714784).
Detected CPU type 1


I'll let you know when it's finished.
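
For anyone wondering what automatic SSE/SSE3 selection might look like, one common approach is to read the CPUID feature flags at startup and branch on them. The sketch below is not Akos' code. It only decodes the documented feature bits of CPUID leaf 1 (SSE is EDX bit 25, SSE2 is EDX bit 26, SSE3 is ECX bit 0) from register values passed in as plain integers. The function names and the returned path labels are hypothetical, and how the app maps features to its "CPU type" number is not stated in this thread.

```python
# Hypothetical sketch: choose a code path from CPUID leaf 1 feature bits.
# The bit positions come from the x86 CPUID definition; the names and
# path labels are assumptions made purely for illustration.

SSE_BIT_EDX = 25   # EDX bit 25: SSE
SSE2_BIT_EDX = 26  # EDX bit 26: SSE2 (listed for reference, not used below)
SSE3_BIT_ECX = 0   # ECX bit 0: SSE3


def has_bit(register: int, bit: int) -> bool:
    """Return True if the given bit is set in a 32-bit register value."""
    return (register >> bit) & 1 == 1


def select_code_path(ecx: int, edx: int) -> str:
    """Pick the most capable path the CPU supports: 'sse3', 'sse' or 'x87'."""
    if has_bit(edx, SSE_BIT_EDX) and has_bit(ecx, SSE3_BIT_ECX):
        return "sse3"
    if has_bit(edx, SSE_BIT_EDX):
        return "sse"
    return "x87"
```

With these assumptions, a CPU that reports SSE but not SSE3 (such as the Pentium M mentioned earlier in the thread) would take the "sse" path, while an SSE3-capable machine would take "sse3".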

M. Schmitt
Joined: 27 Jun 05
Posts: 478
Credit: 15872262
RAC: 0

I tested several different SSE3 versions, all with valid results. You can see an overview here.

Very fine to see the progress in speed. :-))

cu,
Michael

LiborA
Joined: 8 Dec 05
Posts: 74
Credit: 337135
RAC: 0

Message 39166 in response to message 39148

Quote:
I think if the validator doesn't accept a one-digit error, then it won't accept a seven-digit error either...

I compared the results produced by several optimized apps (I quote only the first row of each here):

WU (output file): l1_0229.0_S5R1__2611_S5R1a_1_0
Original app:
229.499118078 0.010001 -0.843352 -1.7676e-009 32.6857
S5S0007:
229.499118078 0.010001 -0.843352 -1.7676e-009 32.6857
S5T0308:
229.499115044 4.87049 0.871399 6.8924e-011 654.035

That isn't a one-digit problem. S5T0308 produces totally incompatible results! I don't know why.
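
To make the mismatch concrete, here is a minimal sketch that compares the first rows quoted above field by field. The relative tolerance of 1e-5 is an assumption chosen for illustration; the project validator's real acceptance rule is not given in this thread, and the function name is hypothetical.

```python
import math

# First rows of the output file, as quoted above.
original = "229.499118078 0.010001 -0.843352 -1.7676e-009 32.6857"
matching = "229.499118078 0.010001 -0.843352 -1.7676e-009 32.6857"
s5t0308 = "229.499115044 4.87049 0.871399 6.8924e-011 654.035"


def rows_agree(row_a: str, row_b: str, rel_tol: float = 1e-5) -> bool:
    """Return True if both rows have the same number of fields and every
    pair of fields agrees within rel_tol (an assumed tolerance)."""
    a = [float(x) for x in row_a.split()]
    b = [float(x) for x in row_b.split()]
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b))


print(rows_agree(original, matching))  # identical rows
print(rows_agree(original, s5t0308))   # differs in every field but the first
```

Under this assumed tolerance only the first field of the S5T0308 row is still close to the original; the other four differ in sign or by orders of magnitude, so no reasonable tolerance would accept them.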

Athlonheizer
Joined: 3 Jun 06
Posts: 33
Credit: 513937
RAC: 0

I am now running S5T0307 for SSE and S5T0709 for SSE3.
Those were the fastest I have tested so far,
and all results are valid.

Athlon

Stay tuned and keep crunching

Crunchers For More Power
Joined: 3 Aug 05
Posts: 69
Credit: 1071273
RAC: 0

standard app / 3140.11 sec / 52.3 min / result
S5T0709 SSE3 / 2385.27 sec / 39.8 min / result
S5T0709 SSE3 / 2415.67 sec / 40.3 min / result

S5T0711 SSE3 / 2277.33 sec / 38.0 min / result
S5T0711 SSE3 / 2334.70 sec / 38.9 min / result

All results are valid.

EDIT: my gain calculation was wrong; here is the correct one

gain:
S5T0709 = 23-24%
S5T0711 = 25-28%
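
The gain figures can be reproduced from the CPU times posted above. The sketch below assumes that "gain" means the reduction in CPU time relative to the standard app, which is consistent with the 23-24% quoted for S5T0709; the function names are hypothetical.

```python
from typing import List, Tuple

# CPU times in seconds, as posted above.
STANDARD = 3140.11
RUNS = {
    "S5T0709": [2385.27, 2415.67],
    "S5T0711": [2277.33, 2334.70],
}


def time_reduction_percent(baseline: float, optimized: float) -> float:
    """Percentage of CPU time saved relative to the baseline run."""
    return (baseline - optimized) / baseline * 100.0


def gain_range(baseline: float, times: List[float]) -> Tuple[float, float]:
    """Smallest and largest time reduction over several runs, in percent."""
    gains = [time_reduction_percent(baseline, t) for t in times]
    return min(gains), max(gains)


for app, times in RUNS.items():
    low, high = gain_range(STANDARD, times)
    print(f"{app}: {low:.1f}% to {high:.1f}%")
```

With the posted times this gives roughly 23.1% to 24.0% for S5T0709 and 25.6% to 27.5% for S5T0711, which falls inside the ranges listed above.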
