From the 4.25 App Thread:
I'll try to build an App with the old Visual Studio of 2003 (instead of VS2005). At least the /G7 optimization should work there. Let's see if it helps...
It can be found on the Power User's Apps page. This is definitely not a release candidate, just something to see in which direction to proceed.
BM
Copyright © 2024 Einstein@Home. All rights reserved.
Windows S5R3 "power users" App 4.26 available
Do you feel it is OK to switch with a result in progress? I just fired up the last one I had, since I'll be away from the computer for 8-10 hours later today...
RE: RE: From the 4.25 App
Eh, no pain no gain... I'm going to try it and post the results...
Edit: It has restarted with the new application without crashing, so that's a good sign, I guess...
Edit2: The "AuthenticAMD" string is back in the app. Does this mean that AMD processors may be at a disadvantage in certain segments of code?
Preliminary mixed-result
Preliminary mixed-result performance seems to be nothing short of amazing, but as others have pointed out, it is hard to get a feel for the actual performance without doing some sampling.
My current estimated runtime for h1_0712.50_S5R2__37_S5R3a is only 33,000 seconds
Switched two XP machines a
Switched two XP machines a couple of hours ago - running smoothly, but haven't been monitoring the speed.
More to the point, I've now switched the Vista32 box. Task 91267641 is mixed-mode (first 12% with 4.25, remainder with 4.26): anything later on host 831490 will be pure 4.26. This is the machine where I first reported the SETI optimised incompatibility with Vista, and did subsequent testing of what turned into viable apps. NB Vista didn't trash every WU, but when it did fail, it happened at the beginning of a run - so the current one is going to be OK (touch wood).
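For anyone who wants a rough number out of a mixed-mode task like this one, here is a minimal sketch in Python. It assumes progress is linear in CPU time under each app, which is only roughly true, since runtime also varies within a result. The function name and the example times are made up for illustration; only the 12% figure comes from the post above.

```python
def pure_runtime_estimate(mixed_seconds, old_fraction, old_full_seconds):
    """Estimate a full-result runtime for the new app from a mixed-app result.

    mixed_seconds    -- total CPU time of the mixed result
    old_fraction     -- fraction of the result done with the old app (0.12 = 12%)
    old_full_seconds -- expected CPU time for the whole result on the old app

    Assumption: progress is linear in CPU time within each app.
    """
    if not 0.0 <= old_fraction < 1.0:
        raise ValueError("old_fraction must be in [0, 1)")
    # CPU time the old app spent on its share of the result
    old_part = old_fraction * old_full_seconds
    # Scale the new app's share up to a full result
    return (mixed_seconds - old_part) / (1.0 - old_fraction)


# Hypothetical times, for illustration only (not measurements from this thread)
estimate = pure_runtime_estimate(
    mixed_seconds=30_000, old_fraction=0.12, old_full_seconds=35_000
)
print(f"estimated pure new-app runtime: {estimate:.0f} s")
```

With only 12% done on the old app the correction is small, so a pure 4.26 result is still the better data point once one is available.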
I just deleted the -lines
I just deleted the -lines from the app_info, just as was said in the Linux thread. Now the error message I got has gone away. So let's see what happens. ;)
RE: Preliminary
Up to 34,000 now, but I've seen this behavior before, where it runs slower during the middle portions of the result than it does at the beginning and end...
...that and I don't have some pow(x)/log(y)^2 - |(3.14159 - atan(x))| formula to guide me...
BTW, I still give props to you math nerds... ;-)
Hi! In theory (from
Hi!
In theory (from disassembly, profiling and early extrapolation of runtime), this version should be about 15 to 20% faster than the previous one (for workunits around the minimum runtime within a frequency range). There's only one store-forwarding stall left in the critical code (in the part that converts a double to a 64-bit int), and I think this one could be eliminated in future versions as well. Looks good to me. I don't think the AMD punishment stuff does any harm in this app, but I will try later to replace "AuthenticAMD" with "GenuineIntel" and see what happens to performance :-)
The output of the compiler looks so much better when compared to that of the newer (!) compiler version that I wonder what has happened to the MS compiler. Did they completely change the underlying compiler engine?? It is rather radical that MS dropped the CPU specific optimization switches in the newer compiler version.
CU
Bikeman
RE: this version should be
Is that a comparison to 4.25, or to 4.15?
My in-process results look pretty clearly faster than 4.15, so a fortiori faster than 4.25. I won't guess by how much, some real answers will be available in a few hours.
I do have a completion and validation on one mixed-ap result. It started on 4.15 for somewhat less than 2 CPU hours. Then it finished on 4.26. The total time 25,563 seconds is quite plainly faster than expected for this host on 4.15.
I should be able to post a pure 4.26 result, with an attempt at a speedup estimate that accounts for the periodicity effect, within about two hours.
RE: RE: this version
Compared to 4.25, on a Core 2. Will know more tomorrow. I happen to have some workunits which are quite close to the minimum runtime per frequency, where the slope of the runtime variation is quite small, so it's quite possible to make comparisons by comparing runtimes from consecutive results.
CU
Bikeman
EDIT: From what I read here, it seems that the poor performance of the VS 2005 compiler compared to VS 2003 in this particular case (generating code full of store-forwarding-stalls) might be related to a bug acknowledged by Microsoft and fixed only in Visual Studio 2008.
CU
BRM
My first pure 4.26 result is
My first pure 4.26 result is complete, but awaits quorum partner return for validation.
The execution time is very encouraging indeed:
23960 seconds, which is 86% of the value I'd expect for this host using the 4.15 ap for sequence number 69 at frequency 719.80.
As I lack samples from nearby sequence numbers, I've relied on the cycle period estimate to choose a comparable number from the next cycle higher. Plausible errors in that estimate and random variation from activity on the host put a little uncertainty on this number, but it is a big speedup beyond any doubt. About 27000 CPU seconds was the minimum for two higher cycles on this host, and sequence number 69 is not at a cycle minimum, nor even close.
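To spell out the arithmetic behind the 86% figure, here is a small Python sketch. The function name is just for illustration. The only inputs are the two numbers quoted above (23960 seconds and 86%), so the implied 4.15 runtime carries the same uncertainty as that estimate.

```python
def runtime_comparison(new_seconds, ratio_to_old):
    """Return (implied old runtime, fraction of time saved, throughput gain).

    ratio_to_old is new runtime divided by expected old runtime (0.86 = 86%).
    """
    old_seconds = new_seconds / ratio_to_old
    time_saved = 1.0 - ratio_to_old
    # Results per unit of CPU time go up by 1/ratio - 1
    throughput_gain = 1.0 / ratio_to_old - 1.0
    return old_seconds, time_saved, throughput_gain


old, saved, gain = runtime_comparison(23960, 0.86)
print(f"implied 4.15 runtime: {old:.0f} s")   # about 27860 s
print(f"time saved:           {saved:.0%}")   # 14%
print(f"throughput gain:      {gain:.1%}")    # about 16.3%
```

Note that 14% less runtime per result and roughly 16% more results per day describe the same speedup, which matters when comparing figures quoted in different ways.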
On the secondary indicator of power, I again forgot to get comparative readings, but the indirect die temperature indicator strongly hinted that stalling was much less prevalent on 4.26 than on 4.25. 4.26 matched 4.15 die temperatures on my Q6600 closely, while 4.25 ran appreciably cooler (3 or 4 degrees C).
My Q6600 has completed two mixed ap 4.15/4.26 results. Both are clearly faster than 4.15 expectation, but validation awaits quorum partner returns.