Bruce, a question about An Optimized Application

Paul D. Buck

Joined: 17 Jan 05

Posts: 754

Credit: 5385205

RAC: 0

7 Jan 2006 6:56:47 UTC

Topic 190543

Bruce,

Can you give us the latest on the possibilities of getting the Albert application in optimized forms? With the AltiVec version I see superb performance, and (based on SETI@Home experience) I know that this is potentially possible with PC-type CPUs as well. I know that decent coverage would take about 7 different "flavors":

1) Standard
2) AMD SSE2
3) AMD SSE3
4) Intel SSE2
5) Intel SSE3
6) I forget
7) I forget #2

Is it this complexity, and the difficulty of ensuring the download brings down the correct version?

Or something else?

Or, the check is in the mail?

Enquiring minds want to know! :)
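Picking the right flavor for a given host comes down to matching its reported CPU features against what each build needs. A minimal sketch, assuming a hypothetical flavor table keyed only on instruction-set features (vendor is ignored here, which a real scheduler might not do if a build used vendor-specific tuning); `FLAVORS` and `pick_flavor` are illustrative names, not BOINC or Einstein@Home code:

```python
# Hypothetical flavor table: each build lists the CPU features it requires.
# Ordered from most to least demanding; "Standard" requires nothing.
FLAVORS = [
    ("SSE3", {"sse", "sse2", "sse3"}),
    ("SSE2", {"sse", "sse2"}),
    ("SSE", {"sse"}),
    ("Standard", set()),
]


def pick_flavor(host_features):
    """Return the most demanding flavor whose requirements the host meets."""
    features = {f.lower() for f in host_features}
    for name, required in FLAVORS:
        if required <= features:  # every required feature is present
            return name
    return "Standard"  # not reached while "Standard" requires nothing
```

Ordering the table from most to least demanding means the first match is the fastest build the host can safely run, and the empty `Standard` requirement guarantees a fallback.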

Keck_Komputers

Joined: 18 Jan 05

Posts: 376

Credit: 5744955

RAC: 0

7 Jan 2006 10:47:32 UTC

Message 23541

I think I read somewhere that Albert was basically optimized automatically: when it detects that SSE3 or whatever is available, it automatically runs code better suited to that instruction set.
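Run-time detection of this kind usually means reading the CPU's feature flags once at start-up and choosing a code path. A minimal sketch, assuming Linux, where the kernel exposes the flags in `/proc/cpuinfo` and reports SSE3 under the name `pni`; `sum_generic`, `sum_sse` and `select_sum` are hypothetical stand-ins, since in a real application the SSE path would be separately compiled native code:

```python
def parse_cpu_flags(cpuinfo_text):
    """Collect the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags = set(value.split())
            # Linux lists SSE3 as "pni" (Prescott New Instructions).
            if "pni" in flags:
                flags.add("sse3")
            return flags
    return set()


def sum_generic(xs):
    """Plain code path that runs on any CPU."""
    return sum(xs)


def sum_sse(xs):
    """Hypothetical stand-in: a real SSE path would be compiled native code."""
    return sum(xs)


def select_sum(flags):
    """Choose an implementation once, at start-up, from the detected flags."""
    return sum_sse if "sse" in flags else sum_generic
```

On a Linux box you would feed `parse_cpu_flags` the contents of `/proc/cpuinfo`; the chosen function is then used for the rest of the run, so detection costs nothing per work unit.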

BOINC WIKI

BOINCing since 2002/12/8

Paul D. Buck

Joined: 17 Jan 05

Posts: 754

Credit: 5385205

RAC: 0

Hmmm, I don't think it is

7 Jan 2006 11:19:39 UTC

Message 23542

Hmmm,

I don't think it is doing a very good job then. If it were, I would expect closer agreement between the G5 and the Xeons, and I am not seeing that at all ...

Steve Cressman

Joined: 9 Feb 05

Posts: 104

Credit: 139654

RAC: 0

Even if it is too difficult

8 Jan 2006 0:29:59 UTC

Message 23543

Even if it is too difficult to have boinc d/l the appropriate app, it could be left as is. Then have a separate d/l page where we can d/l the one we need and manually install the app. A lot of us are quite familiar with this procedure because we have done so with our seti apps.
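The manual route works through BOINC's anonymous platform mechanism: an `app_info.xml` file in the project's directory tells the client which locally installed executable to use. A minimal sketch that writes such a file with Python's standard `xml.etree.ElementTree`, assuming a single executable; `APP_NAME`, `optimized_app.exe` and the version number are placeholders, and the real app name has to match the one the project uses:

```python
import xml.etree.ElementTree as ET


def build_app_info(app_name, exe_name, version_num):
    """Build an anonymous-platform app_info.xml tree for one executable."""
    root = ET.Element("app_info")

    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name

    file_info = ET.SubElement(root, "file_info")
    ET.SubElement(file_info, "name").text = exe_name
    ET.SubElement(file_info, "executable")  # marks the file as runnable

    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "version_num").text = str(version_num)
    file_ref = ET.SubElement(version, "file_ref")
    ET.SubElement(file_ref, "file_name").text = exe_name
    ET.SubElement(file_ref, "main_program")  # this file is the app's entry point
    return root


# Placeholder names: substitute the project's real app name and your file.
tree = build_app_info("APP_NAME", "optimized_app.exe", 1)
xml_text = ET.tostring(tree, encoding="unicode")
```

Calling `ET.ElementTree(tree).write("app_info.xml")` would write the file itself.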

98SE XP2500+ @ 2.1 GHz Boinc v5.8.8

tekwyzrd

Joined: 25 Feb 05

Posts: 49

Credit: 2922090

RAC: 0

@Paul: Make that 1)

8 Jan 2006 3:13:25 UTC

Message 23544

@Paul:

Make that

1) Standard
2) AMD SSE2
3) AMD SSE3
4) Intel SSE
5) Intel SSE2
6) Intel SSE3
7) I forget

Nothing travels faster than the speed of light with the possible exception of bad news, which obeys its own special laws.
Douglas Adams (1952 - 2001)

Ulrich Metzner

Joined: 22 Jan 05

Posts: 113

Credit: 963370

RAC: 0

If you're at this, make it

8 Jan 2006 4:23:24 UTC

Message 23545

If you're at this, make it that:

1) Standard
2) MMX
3) MMX + 3Dnow
4) MMX + SSE
5) MMX + 3Dnow2 + iSSE
6) MMX + SSE + SSE2
7) MMX + 3Dnow2 + SSE
8) MMX + 3Dnow2 + SSE + SSE2
9) MMX + SSE + SSE2 + SSE3
10) MMX + SSE + SSE2 + SSE3 + iA64
11) MMX + SSE + SSE2 + SSE3 + VT
...

... you see a complexity in this pattern? ;)
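Much of that complexity collapses if the builds only differ in the SSE level they target. A minimal sketch, assuming as a hypothetical simplification that 3DNow, iSSE, iA64 and VT make no difference to the numeric code, which buckets the eleven combinations above by their highest SSE level:

```python
from collections import Counter

# Feature sets transcribed from the list above.
COMBOS = [
    set(),
    {"mmx"},
    {"mmx", "3dnow"},
    {"mmx", "sse"},
    {"mmx", "3dnow2", "isse"},
    {"mmx", "sse", "sse2"},
    {"mmx", "3dnow2", "sse"},
    {"mmx", "3dnow2", "sse", "sse2"},
    {"mmx", "sse", "sse2", "sse3"},
    {"mmx", "sse", "sse2", "sse3", "ia64"},
    {"mmx", "sse", "sse2", "sse3", "vt"},
]

SSE_LEVELS = ["sse3", "sse2", "sse"]  # checked highest first


def sse_tier(features):
    """Return the highest SSE level present, or 'none'."""
    for level in SSE_LEVELS:
        if level in features:
            return level
    return "none"


def count_builds(combos):
    """Count how many combinations fall into each SSE tier."""
    return dict(Counter(sse_tier(c) for c in combos))
```

`count_builds(COMBOS)` returns `{"none": 4, "sse": 2, "sse2": 2, "sse3": 3}`, so under that assumption eleven combinations need only four builds.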

Aloha, Uli

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4352

Credit: 253933890

RAC: 34972

During the last weeks and

8 Jan 2006 7:59:42 UTC

Message 23546

During the last weeks and months we have mainly been busy getting the Albert setup working, so I haven't had much time to spend on further optimization.

- The AltiVec version of the code is hand-coded, explicitly using vector instructions where possible (at least in the very core of the program).
- On Linux, if SSE is detected the App switches to a part of the program that has been optimized for SSE by the compiler (gcc 3.4 or 4.0).
- On Windows we use the stock MSC compiler (7.1) on the generic version of the code.
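Hand-coding for a vector unit typically means restructuring a loop so that each step handles a full register's worth of data, with a scalar loop for the leftover elements. A minimal sketch of that shape in Python, where `add4` is a hypothetical stand-in for a single 4-wide vector instruction (a 128-bit AltiVec or SSE register holds four single-precision floats); this is not Albert's actual code:

```python
LANES = 4  # four 32-bit floats per 128-bit vector register


def add4(a, b):
    """Stand-in for one vector instruction: four element-wise adds at once."""
    return [x + y for x, y in zip(a, b)]


def vector_add(a, b):
    """Add two equal-length arrays LANES elements at a time, then finish the tail."""
    n = len(a)
    out = []
    main = n - n % LANES  # largest multiple of LANES that fits
    for i in range(0, main, LANES):
        out.extend(add4(a[i:i + LANES], b[i:i + LANES]))
    for i in range(main, n):  # scalar remainder loop
        out.append(a[i] + b[i])
    return out
```

`vector_add([1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60])` gives `[11, 22, 33, 44, 55, 66]`: one 4-wide step followed by two scalar steps.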

I played with compiler options, compiler versions and modifications to the code for quite some time, but found that none of the following measures gave any significant improvement in calculation times compared to the Apps we currently deliver:

- prefer SSE2 over SSE when available (Linux)
- use hand-coded vector code (for SSE2) instead of leaving the optimization to the compiler (Linux)
- use SSE(2) optimization of the MSC compiler (Windows)
- use icc (the Intel compiler, version 8) instead of gcc or MSC
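Deciding whether an option gives a "significant improvement" needs a repeatable timing comparison. A minimal sketch using Python's `timeit`, taking the best of several runs to reduce noise; the 5% `threshold` is a hypothetical cut-off, not one stated in this thread, and `baseline`/`candidate` stand for any two builds or code paths run on the same input:

```python
import timeit


def best_time(func, *args, repeat=5, number=100):
    """Best-of-`repeat` time per call; the minimum is least affected by noise."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number


def speedup(baseline, candidate, *args, threshold=1.05):
    """Return (ratio, significant); a ratio above 1 means `candidate` is faster."""
    ratio = best_time(baseline, *args) / best_time(candidate, *args)
    return ratio, ratio >= threshold
```

Timing both paths on the same input, with the same repeat counts, keeps the comparison fair; the threshold guards against declaring a win on measurement noise.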

So my preliminary conclusions are that
- The MSC compiler does a surprisingly good job, at least on our code
- The SSE optimization of gcc seems to give results that are (nearly) as good as hand-written code
- The AltiVec Unit is simply better (and somewhat easier to program) than the SSE stuff; that's why I desperately regret Apple's decision regarding CPUs.

I began to play with the auto-vectorization of gcc-4 and icc-9, but without a usable result yet. It's something I'm still working on.

Paul D. Buck

Joined: 17 Jan 05

Posts: 754

Credit: 5385205

RAC: 0

RE: - The AltiVec Unit is

8 Jan 2006 8:57:19 UTC

Message 23547 in response to message 23546

Quote:

- The AltiVec Unit is simply better (and somewhat easier to program) than the SSE stuff; that's why I desperately regret Apple's decision regarding CPUs.

Jobs did it to me with the Lisa, and now that I have a G5 he is at it again. Sorry, it is all my fault: I was thinking of going all PowerMac instead of Windows.

I guess I will have to rethink that one, though I would still like to get a Quad this year.

ExtraTerrestria...

Joined: 10 Nov 04

Posts: 770

Credit: 590438032

RAC: 122747

Hello Bernd, thx for

5 Feb 2006 20:36:09 UTC

Message 23548

Hello Bernd,

thx for sharing that information! It's good to hear that devs are looking into this. When I compare this to the optimization process of the seti application, several things come to my mind:

- I think the largest single contribution in s@h was the caching of FFT results. Is anything like that possible here?

- 2nd was the usage of a special FFT library, can't remember the name but it was hand coded for different CPUs and instruction sets
-> since e@h searches for periodic signals I suspect you're using FFT as the main algorithm as well?

- 3rd was the impact of using icc 8 or 9, with different flags for p3, p4 and p-m; with some tricks the p3 version worked on the AXP as well, and they made an A64 version
-> would it be useful to talk with the seti guys about their optimization experiences with the icc? (thinking of TMR, crunch3r, Harold Naparst)
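The FFT and the idea of caching its results can be sketched together. A minimal sketch, assuming "caching" means reusing the transform of an identical input block (what the s@h clients cached exactly is not described here, and whether Albert repeats identical transforms is unknown); it uses a textbook radix-2 Cooley-Tukey FFT in place of a tuned library such as FFTW, with `functools.lru_cache` as the cache:

```python
import cmath
from functools import lru_cache


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of even-indexed samples
    odd = fft(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


@lru_cache(maxsize=128)
def cached_fft(block):
    """Memoized FFT: `block` must be a tuple; identical blocks reuse the result."""
    return tuple(fft(list(block)))
```

Because `lru_cache` keys on the arguments, the input has to be a hashable tuple; a second call with the same block returns the stored result and shows up as a hit in `cached_fft.cache_info()`.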

MrS

Scanning for our furry friends since Jan 2002

tullio

Joined: 22 Jan 05

Posts: 2118

Credit: 61407735

RAC: 0

RE: Hello Bernd, thx for

6 Feb 2006 4:18:30 UTC

Message 23549 in response to message 23548

Quote:

Hello Bernd,

thx for sharing that information! It's good to hear that devs are looking into this. When I compare this to the optimization process of the seti application, several things come to my mind:

- I think the largest single contribution in s@h was the caching of FFT results. Is anything like that possible here?

- 2nd was the usage of a special FFT library, can't remember the name but it was hand coded for different CPUs and instruction sets
-> since e@h searches for periodic signals I suspect you're using FFT as the main algorithm as well?

- 3rd was the impact of using icc 8 or 9, with different flags for p3, p4 and p-m; with some tricks the p3 version worked on the AXP as well, and they made an A64 version
-> would it be useful to talk with the seti guys about their optimization experiences with the icc? (thinking of TMR, crunch3r, Harold Naparst)

MrS

Here is what I use on my Pentium II, SuSE Linux 9.3:
Optimized SETI client V4.07.3a for i686 with FFTW3 by Ned Slider
Tullio

Akos Fekete

Joined: 13 Nov 05

Posts: 561

Credit: 4527270

RAC: 0

Hi! I did a hand-optimized

6 Feb 2006 7:58:07 UTC

Message 23550

Hi!

I did a hand-optimized version of the albert code (Windows, no SSE).
It produces absolutely correct results, but runs at least twice as fast.
Can I use it without any repercussions?
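Hand optimization that reorders floating-point operations can change the last few bits of a result, so "correct" is usually checked against a tolerance rather than bit for bit, and on a BOINC project the project's validator decides what it will accept. A minimal sketch of such a check; the default tolerances are hypothetical values, not Einstein@Home's:

```python
import math


def results_agree(reference, candidate, rel_tol=1e-5, abs_tol=1e-12):
    """True if both result lists have the same length and every pair is close."""
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )
```

The small `abs_tol` keeps values near zero from failing a purely relative test.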

Forums › Cruncher's Corner