Hi, I tried crunching the same WUs with the original application and with Akos's optimized one. For this test I downloaded 4 WUs. The times with the original app are:
h1_0318.0_S5R1__23088_S5R1a_1 - T: 4012.6 sec
h1_0081.5_S5R1__242_S5R1a_1 - T: 4718.7 sec
l1_0229.0_S5R1__2610_S5R1a_0 - T: 3908.6 sec
l1_0229.0_S5R1__2610_S5R1a_1 - T: 3949.9 sec
Now I'm crunching them with the S5T0003 version:
h1_0318.0_S5R1__23088_S5R1a_1 - T: not finished yet
h1_0081.5_S5R1__242_S5R1a_1 - T: 4773.2 sec
l1_0229.0_S5R1__2610_S5R1a_0 - T: 3916.4 sec
l1_0229.0_S5R1__2610_S5R1a_1 - T: 3916.1 sec
It seems the speed-up is ZERO.
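To put a number on "zero", here is a minimal sketch that computes the time saved per WU from the figures posted above. The function name `time_saved_percent` and the dictionary names are only illustrative, not part of any Einstein@Home tool. The unfinished WU is left out.

```python
# Run times in seconds, copied from the lists above.
original = {
    "h1_0081.5_S5R1__242_S5R1a_1": 4718.7,
    "l1_0229.0_S5R1__2610_S5R1a_0": 3908.6,
    "l1_0229.0_S5R1__2610_S5R1a_1": 3949.9,
}
s5t0003 = {
    "h1_0081.5_S5R1__242_S5R1a_1": 4773.2,
    "l1_0229.0_S5R1__2610_S5R1a_0": 3916.4,
    "l1_0229.0_S5R1__2610_S5R1a_1": 3916.1,
}


def time_saved_percent(t_original, t_patched):
    """Percentage of the original run time saved (negative means slower)."""
    return 100.0 * (t_original - t_patched) / t_original


for wu, t_orig in original.items():
    saved = time_saved_percent(t_orig, s5t0003[wu])
    print(f"{wu}: {saved:+.1f} %")
```

With these times the three WUs come out at roughly -1.2 %, -0.2 % and +0.9 %, so the differences are within about one percent either way.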
S5Txxxx.dat Patched App Tests - Speed up
Please tell us what kind of processor these were run on. Your machines are hidden.
Dead men don't get the baby washed. HTH
This is on my A64 2800+ (nonOC) machine with 1024 MB RAM (DDR400) and WinXP SP2.
Thanks LiborA!
Your results are a little bit surprising, but not too much.
I had to free up some space in the binary code first.
I'm optimizing the hot loop now, but the zero-level tolerance at validation is a very strong restriction (I cannot remove lots of slow rounding/memory operations).
Akos, I don't know if it helps, but the standard Linux app is around 20% faster than the Windows version.
Thanks for your great work.
My P4 HT 3.4 GHz was running S5T0001 and is now running S5T0301, and they cut about 1 hour (maybe 2) off per unit when running two at the same time. I don't know yet about running two different projects at the same time. I'll have another unit to compare times with in an hour (this is compared to one long unit the machine did on its own, so it's not conclusive, sorry).
My P D 830 3.21 GHz might have cut half an hour or more with the later S5T0301-S5T0304. I'm waiting for the other person to return their result so I can see whether it validates normally or needs a consensus.
I think the official app could be about 2 times faster than it currently is. There are lots of needless FPU -> memory -> FPU operations, but I could not remove them because these movements change the last bit of the numbers (only one bit!).
Without these movements the results would be better (the last bit would also be good) and the app would be faster, but... the current S5 validator doesn't accept these (better, faster) results.
edit: perhaps I will do a test with SSE2. Those registers probably don't need these corrections because their double-precision values are only 64 bits wide.
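The last-bit effect can be shown with a small sketch. It is only an analogy: Python has no 80-bit x87 registers, so this example assumes a 64-bit "register" and a 32-bit "memory" format in place of the 80-bit and 64-bit formats of the real app. The helper names are made up for illustration.

```python
import struct


def store_as_float32(x):
    """Round a 64-bit Python float to 32 bits and read it back,
    standing in for a store from a wide register to narrower memory."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sum_rounding_every_step(values):
    """Write the running total to 'memory' after every addition."""
    total = 0.0
    for v in values:
        total = store_as_float32(total + v)
    return total


def sum_rounding_once(values):
    """Keep the running total in the wide format and round only at the end."""
    total = 0.0
    for v in values:
        total += v
    return store_as_float32(total)


values = [store_as_float32(0.1)] * 10
every_step = sum_rounding_every_step(values)
once = sum_rounding_once(values)

print(every_step, once, every_step == once)
```

Here the two sums differ by exactly one unit in the last place of the 32-bit format: rounding once gives 1.0, rounding at every step gives 1.0 + 2**-23. The version with fewer roundings is the more accurate one, but a validator that demands bit-identical results would reject it.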
Now I'm testing S5T0307. At first glance it seems a little faster than the original app and S5T0003. The first of my test WUs will be finished in about 30 min :)
That's very interesting. What do the people from E@H, such as Bruce and the other project leaders, say about this?
OK, very fast, so we make the WUs bigger. :-(
Athlon
Stay tuned and keep crunching
The first test result of S5T0307:
WU: l1_0229.0_S5R1__2610_S5R1a_1 - T: 3404.0 sec
That is about a 14% speedup compared with the original app (3949.9 sec) :)