Please don't take this negatively... ;)
I love the SSE optimizations, but I (and I think I'm not alone) would really love a 3DNow! optimization... :D
Aloha, Uli
Unless the validator is relaxed somewhat, it's not going to happen. 3DNow!'s calculations are slightly noisier than x86/SSE, and the differences in the final result are too large to pass validation against the standard app.
Well, from the last 3DNow! optimizations akosf did, I know that the 3DNow! instructions were much more precise than the corresponding SSE instructions, so what's the case here? ;)
Aloha, Uli
S5T0709 - 2 long WUs - both valid
34473447
34473370
Athlon 64 X2 4800+ / approx. 25% faster than standard science application
My results with long h1 WUs:
1st PC: P4 HT Disabled
40810s (11h 20m) with Standard application => Granted 139.69 credits;
35658s (09h 54m) with S5T0301 for SSE3 (13% saved) => Granted 155.91 credits;
29957s (08h 19m) with S5T0709 for SSE3 (27% saved) => Claimed 177.57 credits;
2nd PC: P4 3.0GHz HT Disabled
44327s (12h 19m) with Standard application => Granted 140.61 credits;
39332s (10h 55m) with S5T0301 for SSE3 (11% saved) => Granted 178.62 credits;
32065s (08h 54m) with S5T0709 for SSE3 (28% saved) => Granted 176.42 credits;
These results came from different WUs with different weights (the credits granted differ). Given that, I think these applications might show even bigger savings if they were compared on the same "reference" WU.
Hope this helps!
Luca B. from Italy
(mmm... since it's 3:00 AM... I'm going to sleep!)
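For anyone who wants to check the "saved" percentages in the post above, here is a minimal sketch that recomputes them from the posted CPU times. The function name `percent_saved` and the layout of the `runs` dictionary are only illustrative; the numbers are the ones quoted above.

```python
def percent_saved(baseline_s: float, optimized_s: float) -> float:
    """Share of the baseline CPU time that the optimized app saved, in percent."""
    return 100.0 * (baseline_s - optimized_s) / baseline_s


# CPU times in seconds, copied from the post above.
runs = {
    "1st PC": {"standard": 40810, "S5T0301": 35658, "S5T0709": 29957},
    "2nd PC": {"standard": 44327, "S5T0301": 39332, "S5T0709": 32065},
}

for pc, times in runs.items():
    baseline = times["standard"]
    for app in ("S5T0301", "S5T0709"):
        saved = percent_saved(baseline, times[app])
        print(f"{pc}, {app}: {saved:.1f}% saved")
```

Because each run used a different WU, these figures compare unequal amounts of work, which is the caveat raised in the post.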
Windows XP, Athlon X2 4800+, running Einstein on both cores:
App        ID        Outcome  CPU time (s)  Claimed credit  Granted credit
S5T0301    34848688  Success  22,577.65     178.93          178.93
S5T0301    34848686  Success  22,831.80     178.93          pending
S5T0301    34848682  Success  23,418.51     178.93          pending
S5T0301    34848680  Success  24,094.41     178.93          pending
Hybrid     34848677  Success  26,407.12     178.93          pending
Stock app  34848671  Success  29,707.83     178.93          178.93
Stock app  34848667  Success  29,723.77     178.93          pending
Stock app  34842050  Success  29,260.13     178.93          178.93
At least a 20%-22% speed improvement. No invalid WUs yet; all the ones where credit has not been granted are in the "Initial" validate state. Note that the 23k- and 24k-second ones were done while there was quite a bit of rebooting (once or twice an hour) going on for other reasons.
Mark
--miw
S5S0007.dat
After using 304 this is a welcome change. The speed improvement is good: I went from a ~13 percent improvement over stock with 304 to a ~17-18 percent improvement with 0007. All 0007 results are validating.
The validator doesn't accept any differences at the moment...
a = 29524.343689092802
b = 0.00010640685824120094
c = a * b
after FPU calculation: c = 3.141592653589731
after SSE2 calculation: c = 3.141592653589736
Both units used the same rounding mode (rc=2).
The SSE2 calculation failed validation.
So I cannot replace any FPU instructions with SSE or 3DNow! ones.
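To put a number on how far apart the two results above are, here is a small sketch that measures their distance in ULPs (units in the last place of a double) and contrasts an exact comparison with a tolerant one. The helper names `ulp_distance`, `exact_match` and `tolerant_match`, and the tolerance of 16 ULPs, are assumptions made for illustration; this is not the project's validator.

```python
import struct


def ulp_distance(x: float, y: float) -> int:
    """Count the representable doubles between x and y.

    Assumes both values are finite and positive, so their bit patterns
    sort in the same order as the values themselves.
    """
    xi = struct.unpack("<q", struct.pack("<d", x))[0]
    yi = struct.unpack("<q", struct.pack("<d", y))[0]
    return abs(xi - yi)


def exact_match(x: float, y: float) -> bool:
    """Strict comparison: any difference at all counts as a mismatch."""
    return x == y


def tolerant_match(x: float, y: float, max_ulps: int) -> bool:
    """Hypothetical relaxed comparison: accept results within max_ulps."""
    return ulp_distance(x, y) <= max_ulps


# The two results quoted in the post above.
fpu_result = 3.141592653589731
sse2_result = 3.141592653589736

print("ULP distance:", ulp_distance(fpu_result, sse2_result))
print("exact match:", exact_match(fpu_result, sse2_result))
print("match within 16 ULPs:", tolerant_match(fpu_result, sse2_result, 16))
```

The two values differ only in the last printed digit, which is roughly a dozen ULPs at this magnitude, so a strict comparison rejects them while a comparison with a small tolerance would not.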
To Akos:
Hello Akos, I'm not sure if you read my post about the different output file sizes from some of the optimizations, so I'll repeat it (it's a question from me):
The original app and the optimized apps with validated results (e.g. S5S0007) produce output files of the same size, but the output file from S5T0308 has a different size. I'm not a programmer, but couldn't there be a fault in the implemented optimized algorithm?
In my view the problem might not be like this:
a = 29524.343689092802
b = 0.00010640685824120094
c = a * b
after FPU calculation: c = 3.141592653589731
after SSE2 calculation: c = 3.141592653589736
but maybe
after correct calculation: c = 3.141592653589731
after invalid calculation: c = 3.14159265
This is only my idea; I'm not a specialist in this.
The output files are compressed, and compressing the same data in the same way gives compressed files of the same size. If the sizes differ under the same compression, the data must differ too.
As far as I know, S5T0308 gives invalid results, while the original gives good results. That means the results are different, so the data in the result file is different. Compressing different data gives a slightly different size (and as far as I know the compression is applied to ASCII characters, which also introduces differences...).
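The compression argument above can be tried out with a short sketch. It uses `zlib` purely as a stand-in, since the thread does not say which compressor the application uses, and the result text is made up from the two values quoted earlier; `compressed_size`, `result_a` and `result_b` are illustrative names.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed ASCII encoding of text."""
    return len(zlib.compress(text.encode("ascii"), 9))


# Made-up result files: identical except for the last digit of every value.
result_a = "".join(f"{i} 3.141592653589731\n" for i in range(1000))
result_b = "".join(f"{i} 3.141592653589736\n" for i in range(1000))

# Same data, same compression: the size is always the same.
print(compressed_size(result_a) == compressed_size(result_a))

# Different data: the compressed bytes must differ, and the sizes may too.
print(compressed_size(result_a), compressed_size(result_b))
```

Note the direction of the inference: different sizes prove the data differs, but equal sizes alone would not prove the data is identical.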
Where are the other digits?
after correct calculation: c = 3.141592653589731
after invalid calculation: c = 3.14159265xxxxxxx
I think if the validator doesn't accept a one-digit error, then it won't accept a 7-digit error either...