Windows S5R3 "power users" App 4.26 available

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4330
Credit: 251366088
RAC: 36406
Topic 193453

From the 4.25 App Thread:

Quote:
I'll try to build an App with the old Visual Studio of 2003 (instead of VS2005). At least the /G7 optimization should work there. Let's see if it helps...


It can be found on the Power User's Apps page. This is definitely not a release candidate, just something to see in which direction to proceed.

BM

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0


Quote:
From the 4.25 App Thread:
Quote:
I'll try to build an App with the old Visual Studio of 2003 (instead of VS2005). At least the /G7 optimization should work there. Let's see if it helps...

It can be found on the Power User's Apps page. This is definitely not a release candidate, just something to see in which direction to proceed.

BM

Do you feel it is ok to switch with a result in progress? I just fired up the last one I had due to being away from the computer for 8-10 hours later on today...

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0


Message 77480 in response to message 77479

Quote:
Quote:
From the 4.25 App Thread:
Quote:
I'll try to build an App with the old Visual Studio of 2003 (instead of VS2005). At least the /G7 optimization should work there. Let's see if it helps...

It can be found on the Power User's Apps page. This is definitely not a release candidate, just something to see in which direction to proceed.

BM

Do you feel it is ok to switch with a result in progress? I just fired up the last one I had due to being away from the computer for 8-10 hours later on today...

Eh, no pain no gain... I'm going to try it and post the results...

Edit: It has restarted with the new application without crashing, so that's a good sign, I guess...

Edit2: The "AuthenticAMD" string is back in the app. Does this mean that AMD processors may be at a disadvantage in certain segments of code?

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0


Preliminary mixed-result performance seems to be nothing short of amazing, but as others have pointed out, it is hard to get a feel for the actual performance without doing some sampling.

My current estimated runtime for h1_0712.50_S5R2__37_S5R3a is only 33,000 seconds

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2974621286
RAC: 795427


Switched two XP machines a couple of hours ago - running smoothly, but haven't been monitoring the speed.

More to the point, I've now switched the Vista32 box. Task 91267641 is mixed-mode (first 12% with 4.25, remainder with 4.26): anything later on host 831490 will be pure 4.26. This is the machine where I first reported the SETI optimised incompatibility with Vista, and did subsequent testing of what turned into viable apps. NB Vista didn't trash every WU, but when it did fail, it happened at the beginning of a run - so the current one is going to be OK (touch wood).

Svenie25
Joined: 21 Mar 05
Posts: 139
Credit: 2436862
RAC: 0


I just deleted the -lines from the app_info, just as was said in the Linux thread. Now the error message I got is gone. So let's see what happens. ;)

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0


Message 77484 in response to message 77481

Quote:

Preliminary mixed-result performance seems to be nothing short of amazing, but as others have pointed out, it is hard to get a feel for the actual performance without doing some sampling.

My current estimated runtime for h1_0712.50_S5R2__37_S5R3a is only 33,000 seconds

Up to 34,000 now, but I've seen this behavior before, where it runs slower during the middle portions of the result than it does at the beginning and end...

...that and I don't have some pow(x)/log(y)^2 - |(3.14159 - atan(x))| formula to guide me...

BTW, I still give props to you math nerds... ;-)

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 753218670
RAC: 1191991


Hi!

In theory (from disassembly, profiling and early extrapolation of runtime), this version should be about 15 to 20% faster than the previous one (for workunits around the minimum runtime within a frequency range). There's only one store-forwarding stall left in the critical code (in the part that converts a double to a 64-bit int), and I think this one could be eliminated in future versions as well. Looks good to me. I don't think the AMD punishment stuff does any harm in this app, but I will try later to replace "AuthenticAMD" with "GenuineIntel" and see what happens to performance :-)

The output of the compiler looks so much better when compared to that of the newer (!) compiler version that I wonder what has happened to the MS compiler. Did they completely change the underlying compiler engine?? It is rather radical that MS dropped the CPU specific optimization switches in the newer compiler version.

CU
Bikeman

archae86
Joined: 6 Dec 05
Posts: 3160
Credit: 7256082285
RAC: 1464685


Message 77486 in response to message 77485

Quote:
this version should be about 15% faster than the previous one...


Is that a comparison to 4.25, or to 4.15?

My in-process results look pretty clearly faster than 4.15, so a fortiori faster than 4.25. I won't guess by how much, some real answers will be available in a few hours.

I do have a completion and validation on one mixed-app result. It started on 4.15 for somewhat less than 2 CPU hours, then finished on 4.26. The total time of 25,563 seconds is quite plainly faster than expected for this host on 4.15.

I should be able to post a pure 4.26 result, with an attempt at a speedup estimate that accounts for the periodicity effect, within about two hours.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 753218670
RAC: 1191991


Message 77487 in response to message 77486

Quote:
Quote:
this version should be about 15% faster than the previous one...

Is that a comparison to 4.25, or to 4.15?

Compared to 4.25, on a Core 2. Will know more tomorrow. I happen to have some workunits which are very close to the minimum runtime per frequency, where the slope of the runtime variation is small, so it's quite possible to make comparisons by comparing runtimes from consecutive results.

CU
Bikeman

EDIT: From what I read here, it seems that the poor performance of the VS 2005 compiler compared to VS 2003 in this particular case (generating code full of store-forwarding stalls) might be related to a bug acknowledged by Microsoft and fixed only in Visual Studio 2008.

CU

BRM

archae86
Joined: 6 Dec 05
Posts: 3160
Credit: 7256082285
RAC: 1464685


My first pure 4.26 result is complete, but awaits quorum partner return for validation.

The execution time is very encouraging indeed:

23960 seconds, which is 86% of the value I'd expect for this host using the 4.15 app for sequence number 69 at frequency 719.80.

As I lack samples from nearby sequence numbers, I've relied on the cycle period estimate to choose a comparable number from the next cycle higher. Plausible errors in that estimate and random variation from activity on the host put a little uncertainty on this number, but it is a big speedup beyond any doubt. About 27000 CPU seconds was the minimum for two higher cycles on this host, and sequence number 69 is not at a cycle minimum, nor even close.
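
For readers who want to see what the "86%" figure implies, here is a minimal sketch of the arithmetic in Python. It uses only the two numbers quoted above (23960 seconds and 86%). The implied 4.15 runtime is derived from them, not measured, and the variable names are illustrative. The uncertainty mentioned above applies to every derived value.

```python
# Figures quoted in the post above
runtime_426 = 23960     # seconds, pure 4.26 result
ratio_to_415 = 0.86     # 4.26 runtime as a fraction of the expected 4.15 runtime

# Implied 4.15 expectation for the same workunit (derived, not measured)
expected_415 = runtime_426 / ratio_to_415

# Time saved per result, in seconds and as a fraction of the 4.15 expectation
saved_seconds = expected_415 - runtime_426
saved_fraction = 1.0 - ratio_to_415

# Throughput gain: extra results per unit of CPU time at the shorter runtime
throughput_gain = expected_415 / runtime_426 - 1.0

print(f"implied 4.15 runtime: {expected_415:.0f} s")
print(f"time saved per result: {saved_seconds:.0f} s ({saved_fraction:.0%})")
print(f"throughput gain: {throughput_gain:.1%}")
```

Note that a 14% shorter runtime corresponds to a somewhat larger gain in results per unit time, because the gain is measured against the shorter runtime.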

On the secondary indicator of power, I again forgot to get comparative readings, but the indirect die temperature indicator strongly hinted that stalling was much less prevalent on 4.26 than on 4.25. 4.26 matched 4.15 die temperatures on my Q6600 closely, while 4.25 ran appreciably cooler (3 or 4 degrees C).

My Q6600 has completed two mixed-app 4.15/4.26 results. Both are clearly faster than 4.15 expectation, but validation awaits quorum partner returns.
