MacOS PPC Beta Test App 4.56 available

Guido Waldenmeier
Joined: 18 Feb 05
Posts: 24
Credit: 2293
RAC: 0

Message 30332 in response to message 30331

Quote:
And I remember having that same effect when I was using MacNN's heavily optimised SETI client, which they claimed was the result of much more efficient AltiVec utilisation....

Yeah, I had to swap between three or four different versions of the SETI worker, but I eventually saw a 175% speed increase on my 15" G4 867MHz (I go through four SETI WUs in a half hour on the Quad).

None of the optimized clients took longer, though.

Quote:
Assuming that is the case here, why the slowdown....?

Your guess is as good as mine. I'm too tired to look into it right now, but I'd hazard that Xcode was used to compile the worker app without the gcc -fast flag (because it's G5-specific), and as a result whatever boosts there may have been from using 64-bit or SIMD ops were discarded in favour of making the worker app available to as large an audience as possible.

If someone can detail how I'd be able to throw `-fast` into the list of compiler flags during ./configure && make && make install, I'll set up the Quad and run a few WUs after I get some shut-eye and before I head off to class.

Just in case nobody knows what `-fast` does, here's the relevant part of man gcc:

Quote:

Optimize for maximum performance. -fast changes the overall optimization strategy of GCC in order to produce the fastest possible running code for PPC7450 and G5 architectures. By default, -fast optimizes for G5. Programs optimized for G5 will not run on PPC7450. To optimize for PPC7450, add -mcpu=7450 on command line.

-fast currently enables the following optimization flags (for G5 and PPC7450). These flags may change in the future. You cannot override any of these options if you use -fast except by setting -mcpu=7450.

To build shared libraries with -fast, specify -fPIC on command line.

-O3 -falign-loops-max-skip=15 -falign-jumps-max-skip=15 -falign-loops=16 -falign-jumps=16 -falign-functions=16 -malign-natural (except when -fastf is specified) -ffast-math -funroll-loops -ftree-loop-linear -ftree-loop-memset -mcpu=G5 -mpowerpc-gpopt -mtune=G5 (unless -mtune=G4 is specified) -fsched-interblock -fgcse-sm -mpowerpc64

Important notes: -ffast-math results in code that is not necessarily IEEE-compliant. -fstrict-aliasing is highly likely to break non-standard-compliant programs. -malign-natural only works properly if the entire program is compiled with it, and none of the standard headers/libraries contain any code that changes alignment when this option is used.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245274321
RAC: 12211

To the people observing a slowdown: Are these G4 or G5 machines? Is there a difference between them?

BM

Elphidieus
Joined: 20 Feb 05
Posts: 245
Credit: 20603702
RAC: 0

I'm on a single-core G5....

From 10100-10300 sec/WU on 4.39 up to 10500-10600 sec/WU on 4.56....

Rand
Joined: 11 Dec 05
Posts: 10
Credit: 601115
RAC: 0

I have also started using the Beta version on a PowerMac G5 Quad. I also don't see much difference in the number of CPU seconds that it takes to process a WU. It appears to save about 150 seconds out of 7,800 seconds looking at my results. None of the results have been granted credit yet so they have not been verified. See computer ID 476088, with the result ID beginning at 27904540 and later. Thanks for the work Benard, hopefully it can be improved, or perhaps you can build different versions for different processors.

Rand

Rand
Joined: 11 Dec 05
Posts: 10
Credit: 601115
RAC: 0

Message 30336 in response to message 30335

Quote:

I have also started using the Beta version on a PowerMac G5 Quad. I also don't see much difference in the number of CPU seconds that it takes to process a WU. It appears to save about 150 seconds out of 7,800 seconds looking at my results. None of the results have been granted credit yet so they have not been verified. See computer ID 476088, with the result ID beginning at 27904540 and later. Thanks for the work Benard, hopefully it can be improved, or perhaps you can build different versions for different processors.

Rand

By the way I'm running Mac OS X 10.4.6 (Tiger).

Rand

Rand
Joined: 11 Dec 05
Posts: 10
Credit: 601115
RAC: 0

Message 30337 in response to message 30336

Quote:
Quote:

I have also started using the Beta version on a PowerMac G5 Quad. I also don't see much difference in the number of CPU seconds that it takes to process a WU. It appears to save about 150 seconds out of 7,800 seconds looking at my results. None of the results have been granted credit yet so they have not been verified. See computer ID 476088, with the result ID beginning at 27904540 and later. Thanks for the work Bernd, hopefully it can be improved, or perhaps you can build different versions for different processors.

Rand

By the way I'm running Mac OS X 10.4.6 (Tiger).

Rand

I assume, Bernd, that you are aware of the various vectorized vector/matrix libraries that exist for Mac OS X, and perhaps you are already using them in the code, or maybe you've found that coding by hand is faster. Just in case, here is a link that describes them:

http://www.hmug.org/man/7/vMathLib.php

(Sorry for misspelling your name in my first couple of posts :-))

Rand

Christian Hoklas
Joined: 9 Feb 05
Posts: 11
Credit: 366364
RAC: 0

I have successfully finished 3 units with 4.56beta on Panther (10.3.9) / G4. Computation times went down from around 24000 secs to 20000 secs, which is highly appreciated by me! ;-)

Guido Waldenmeier
Joined: 18 Feb 05
Posts: 24
Credit: 2293
RAC: 0

Message 30339 in response to message 30333

Quote:
To the people observing a slowdown: Are these G4 or G5 machines? Is there a difference between them?


PowerMac11,2 (Quad G5) OS X.4.6 (8I127)

Elphidieus
Joined: 20 Feb 05
Posts: 245
Credit: 20603702
RAC: 0

My first result validation of Albert 4.56...

27873191

And an AMD Athlon system smoked my G5 on this WU, at slightly over 4 times the speed....

EDITED:

another WU validation...

27935842

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245274321
RAC: 12211

That's actually pretty weird - my machines here are all G4 and show the 20% speedup of the AltiVec code. I really wonder why the very same code doesn't have any (positive) effect on G5 machines. The only thing I can think of is that the G5 further optimizes the old AltiVec code by out-of-order execution so much that it doesn't gain anything from the new code. I'll have to consult some people more knowledgeable than me...

BM
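
To put the timings reported in this thread side by side, they can be converted into speedup percentages. This is only a rough sketch in Python, not anything from the project itself: the function name is made up for illustration, each reported range is collapsed to its midpoint, and Rand's 7,800 seconds is assumed to be the time with the old app.

```python
def speedup_percent(old_secs, new_secs):
    """Percent change in throughput when a WU goes from old_secs to new_secs.

    Positive means the new app is faster, negative means it is slower.
    """
    return (old_secs / new_secs - 1.0) * 100.0


# Seconds per WU as quoted in this thread (old app -> 4.56 beta).
# Collapsing ranges to midpoints is an assumption, not reported data.
reports = {
    "G4, 10.3.9 (Christian Hoklas)": (24000, 20000),
    # Midpoints of 10100-10300 and 10500-10600.
    "G5 single-core (Elphidieus)": (10200, 10550),
    # Assumes 7800 s is the old time and about 150 s were saved.
    "G5 Quad (Rand)": (7800, 7800 - 150),
}

for machine, (old, new) in reports.items():
    print(f"{machine}: {speedup_percent(old, new):+.1f}%")
```

Under those assumptions the G4 report works out to the 20% speedup mentioned above, while the two G5 reports come out at roughly -3% and +2%.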
