Optimized S5 SSE3

Ulrich Metzner
Joined: 22 Jan 05
Posts: 113
Credit: 963370
RAC: 0

Please don't take this the wrong way... ;)
I love the SSE optimizations, but I (...and I think I'm not alone...) would really love a 3DNow! optimization... :D

Aloha, Uli

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 0

Unless the validator is relaxed somewhat, it's not going to happen. 3DNow!'s calculations are slightly noisier than x86/SSE, and the differences in the final result are too large to pass validation against the standard app.

Ulrich Metzner
Joined: 22 Jan 05
Posts: 113
Credit: 963370
RAC: 0

Message 39141 in response to message 39140

Quote:
Unless the validator is relaxed somewhat, it's not going to happen. 3DNow!'s calculations are slightly noisier than x86/SSE, and the differences in the final result are too large to pass validation against the standard app.

Well, from the last 3DNow! optimizations akosf did, I know that the 3DNow! instructions were much more precise than the corresponding SSE instructions, so what's going on here? ;)

Aloha, Uli

ca_grufti
Joined: 9 Feb 05
Posts: 53
Credit: 4309237
RAC: 0

S5T0709: 2 long WUs, both valid

34473447
34473370

Athlon 64 X2 4800+: approx. 25% faster than the standard science application

LucaB76 - BOINC.Italy
Joined: 16 Jan 06
Posts: 14
Credit: 754232
RAC: 0

My results with long h1 WUs:

1st PC: P4 HT Disabled
40810s (11h 20m) with Standard application => Granted 139.69 credits;
35658s (09h 54m) with S5T0301 for SSE3 (13% saved) => Granted 155.91 credits;
29957s (08h 19m) with S5T0709 for SSE3 (27% saved) => Claimed 177.57 credits;

2nd PC: P4 3.0GHz HT Disabled
44327s (12h 19m) with Standard application => Granted 140.61 credits;
39332s (10h 55m) with S5T0301 for SSE3 (11% saved) => Granted 178.62 credits;
32065s (08h 54m) with S5T0709 for SSE3 (28% saved) => Granted 176.42 credits;
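The "% saved" figures above correspond to 1 - t_optimized / t_standard for each machine. A minimal Python sketch that redoes this arithmetic with the CPU times copied from the post (as Luca notes, each run used a different WU, so the percentages are only indicative):

```python
# CPU times in seconds copied from the post: (standard, S5T0301, S5T0709).
runs = {
    "1st PC: P4 HT disabled": (40810, 35658, 29957),
    "2nd PC: P4 3.0GHz HT disabled": (44327, 39332, 32065),
}

def percent_saved(t_standard: float, t_optimized: float) -> float:
    """Share of the standard app's run time saved by the optimized app, in percent."""
    return 100.0 * (1.0 - t_optimized / t_standard)

for pc, (std, s0301, s0709) in runs.items():
    print(f"{pc}: S5T0301 saves {percent_saved(std, s0301):.1f}%, "
          f"S5T0709 saves {percent_saved(std, s0709):.1f}%")
```

Rounded to whole percent, these reproduce the 13%/27% and 11%/28% quoted above.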

These results came from different WUs with different weights (the granted credits differ). Because of this, I think these applications could show even larger savings if they were compared on the same "reference" WU.

Hope this helps!
Luca B. from Italy
(mmm... since it's 3:00 AM... I'm going to sleep!)

miw
Joined: 18 Jan 05
Posts: 19
Credit: 46235552
RAC: 0

Message 39144 in response to message 39143


Windows XP Athlon X2 4800+ Running Einstein on Both Sides:

App         Result ID  Outcome  CPU time (s)  Claimed  Granted
S5T0301     34848688   Success  22,577.65     178.93   178.93
S5T0301     34848686   Success  22,831.80     178.93   pending
S5T0301     34848682   Success  23,418.51     178.93   pending
S5T0301     34848680   Success  24,094.41     178.93   pending
Hybrid      34848677   Success  26,407.12     178.93   pending
Stock app   34848671   Success  29,707.83     178.93   178.93
Stock app   34848667   Success  29,723.77     178.93   pending
Stock app   34842050   Success  29,260.13     178.93   178.93
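"Speed improvement" can be read two ways: the share of CPU time saved, 1 - t / t_stock, or the gain in throughput, t_stock / t - 1. A small sketch computing both from the CPU times in the table, assuming the mean of the three stock runs as the baseline (an assumption made here, not something stated in the post). The 23k- and 24k-second runs were slowed by reboots, so they understate the gain.

```python
from statistics import mean

# CPU times in seconds from the table above.
stock = [29707.83, 29723.77, 29260.13]
optimized = {
    "S5T0301 #34848688": 22577.65,
    "S5T0301 #34848686": 22831.80,
    "S5T0301 #34848682": 23418.51,
    "S5T0301 #34848680": 24094.41,
    "Hybrid #34848677": 26407.12,
}

# Assumed baseline: average of the three stock-app runs.
baseline = mean(stock)

for name, t in optimized.items():
    time_saved = 100.0 * (1.0 - t / baseline)  # fraction of stock CPU time saved
    speedup = 100.0 * (baseline / t - 1.0)     # extra WUs completed per unit of CPU time
    print(f"{name}: {time_saved:.1f}% less CPU time, {speedup:.1f}% higher throughput")
```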

At least a 20-22% speed improvement. No invalid WUs yet; all the ones where credit has not been granted are in the "Initial" validate state. Note that the 23k- and 24k-second runs were done while there was quite a bit of rebooting (once or twice an hour) going on for other reasons.

Mark

--miw

James
Joined: 13 Apr 06
Posts: 5
Credit: 37146
RAC: 0

S5S0007.dat
After using 304, this is a welcome change. The speed improvement is good: it went from ~13 percent over stock with 304 to ~17-18 percent with 0007. All 0007 results are validating.

Akos Fekete
Joined: 13 Nov 05
Posts: 561
Credit: 4527270
RAC: 0

Message 39146 in response to message 39141

Quote:
Quote:
Unless the validator is relaxed somewhat, it's not going to happen. 3DNow!'s calculations are slightly noisier than x86/SSE, and the differences in the final result are too large to pass validation against the standard app.
Well, from the last 3DNow! optimizations akosf did, I know that the 3DNow! instructions were much more precise than the corresponding SSE instructions, so what's going on here? ;)

The validator doesn't accept any differences at the moment...

a = 29524.343689092802
b = 0.00010640685824120094
c = a * b

after FPU calculation: c = 3.141592653589731
after SSE2 calculation: c = 3.141592653589736

Both units used the same rounding mode (rc=2).

The SSE2 result failed validation.
So I cannot replace any FPU instructions with SSE or 3DNow! ones.
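To illustrate what "doesn't accept any differences" means in practice, here is a sketch comparing the two results above as IEEE 754 doubles (Python floats). It assumes an exact check like the one Akos describes; `exact_match`, `tolerant_match` and the `rel_tol=1e-12` threshold are hypothetical and are not the project's actual validator. The two values are only around a dozen ULPs (units in the last place) apart, a relative difference of roughly 1.6e-15, yet an exact comparison rejects them.

```python
import math
import struct

# The two results reported above (Python floats are IEEE 754 doubles).
c_fpu = 3.141592653589731
c_sse2 = 3.141592653589736

def ulp_distance(x: float, y: float) -> int:
    """Distance in ULPs between two positive, finite doubles.

    For positive doubles the 64-bit patterns are ordered like the values,
    so the difference of the patterns counts the representable steps between them.
    """
    ix = struct.unpack("<q", struct.pack("<d", x))[0]
    iy = struct.unpack("<q", struct.pack("<d", y))[0]
    return abs(ix - iy)

def exact_match(x: float, y: float) -> bool:
    """Hypothetical exact check: any numeric difference fails."""
    return x == y

def tolerant_match(x: float, y: float, rel_tol: float = 1e-12) -> bool:
    """Hypothetical relaxed check: accept a small relative difference."""
    return math.isclose(x, y, rel_tol=rel_tol)

print("ULPs apart:    ", ulp_distance(c_fpu, c_sse2))
print("relative diff: ", abs(c_sse2 - c_fpu) / c_fpu)
print("exact match:   ", exact_match(c_fpu, c_sse2))
print("tolerant match:", tolerant_match(c_fpu, c_sse2))
```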

LiborA
Joined: 8 Dec 05
Posts: 74
Credit: 337135
RAC: 0

To Akos:

Hello Akos, I'm not sure whether you read my post about the different sizes of output files from some optimizations, so I'll repeat it (it's a question):
The original app and optimized apps with validated results (e.g. S5S0007) produce output files of the same size, but the output file from S5T0308 has a different size. I'm not a programmer, but couldn't the fault be in the implemented optimized algorithm?
In my view the problem may not be like this:
a = 29524.343689092802
b = 0.00010640685824120094
c = a * b

after FPU calculation: c = 3.141592653589731
after SSE2 calculation: c = 3.141592653589736

but maybe

after correct calculation: c = 3.141592653589731
after invalid calculation: c = 3.14159265

This is only my idea; I'm not a specialist on this.

Akos Fekete
Joined: 13 Nov 05
Posts: 561
Credit: 4527270
RAC: 0

Message 39148 in response to message 39147

Quote:
Hello Akos, I'm not sure whether you read my post about the different sizes of output files from some optimizations.

The output files are compressed, but compressing the same data with the same method always gives compressed files of the same size. So if the sizes differ under the same compression, the data must also differ.

Quote:
The original app and optimized apps with validated results (e.g. S5S0007) produce output files of the same size, but the output file from S5T0308 has a different size. I'm not a programmer, but couldn't the fault be in the implemented optimized algorithm?

As far as I know, S5T0308 gives invalid results while the original gives good ones. That means the results differ, so the data in the result files differ. Compressing different data gives slightly different sizes (and, as far as I know, the compression is applied to ASCII characters, which also introduces differences...).
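Akos's argument rests on the compressor being deterministic: the same bytes in always give the same bytes out, so a size mismatch proves the data differ. The sketch below uses zlib and a made-up ASCII number format purely as stand-ins (the real result-file format and compressor are assumptions here). The converse does not hold: two different inputs can compress to the same length, so equal sizes alone do not prove identical results.

```python
import zlib

def compress_result(values):
    """Format values as ASCII lines, then compress them.

    Stand-in only: zlib and this text format are assumptions,
    not the real result-file format.
    """
    text = "".join(f"{v:.15f}\n" for v in values).encode("ascii")
    return zlib.compress(text, 9)

reference = [3.141592653589731, 2.718281828459045, 1.414213562373095]
rerun = list(reference)  # same numbers produced again
other = [3.141592653589736, 2.718281828459045, 1.414213562373095]

# Deterministic: identical input gives identical compressed bytes, hence identical size.
print(compress_result(reference) == compress_result(rerun))
# Different input always gives different compressed bytes (decompression must recover it),
# but the length may or may not change, so only a size mismatch is conclusive.
print(len(compress_result(reference)), len(compress_result(other)))
```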

Quote:

after FPU calculation: c = 3.141592653589731
after SSE2 calculation: c = 3.141592653589736

but maybe

after correct calculation: c = 3.141592653589731
after invalid calculation: c = 3.14159265

Where are the other digits?

after correct calculation: c = 3.141592653589731
after invalid calculation: c = 3.14159265xxxxxxx

I think if the validator doesn't accept a one-digit difference, it will not accept a seven-digit difference either...
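Putting numbers on this point: the FPU/SSE2 discrepancy is in the 16th significant digit, whereas LiborA's hypothetical truncated output is already off in the 9th. A short sketch of the two relative differences (the values are the ones quoted in this thread; `rel_diff` is just a helper defined here):

```python
# Values quoted in this thread.
correct = 3.141592653589731    # FPU result
sse2 = 3.141592653589736       # SSE2 result that failed validation
truncated = 3.14159265         # LiborA's hypothetical "invalid calculation"

def rel_diff(x: float, ref: float) -> float:
    """Relative difference of x from the reference value ref."""
    return abs(x - ref) / abs(ref)

print(f"FPU vs SSE2:      {rel_diff(sse2, correct):.1e}")
print(f"FPU vs truncated: {rel_diff(truncated, correct):.1e}")
# The truncation error is several hundred thousand times larger, so a validator
# that rejects the first difference must also reject the second.
```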
