GNU/Linux S5R3 App 4.49 available for Beta test

Donald A. Tevault
Joined: 17 Feb 06
Posts: 439
Credit: 73516529
RAC: 0

RE: It indeed did contain

Message 81740 in response to message 81733

Quote:

It indeed did contain SSE2 instructions. I built a new SSE App and updated the archive (and the md5sum on the page).

Thanks for the report.

BM

@Bernd. . .

Would you like for some of us to test the SSE2 app as well? I have three SSE2-capable machines that run Linux, and I would be glad to help out.

M. Schmitt
Joined: 27 Jun 05
Posts: 478
Credit: 15872262
RAC: 0

RE: RE: The new beta runs

Message 81741 in response to message 81739

Quote:
Quote:
The new beta runs fine on my XP 3000.

It would be really interesting to know if the SSE2-capable version (first release) you have running on your X2 shows a bigger speedup than the "fixed" SSE version you now have running on your XP. In other words, is the speedup due mainly to the SSE2 instructions, or is there a speedup for SSE-only machines as well?

Don't lose that SSE2 version - it might be quite valuable :-).

You read my thoughts. :-))

th3
Joined: 24 Aug 06
Posts: 208
Credit: 2208434
RAC: 0

I'm using the SSE2 version on my E8400, and there is definitely a speedup (over 4.38).

One WU almost broke the 9000 sec barrier, and that's with the CPU clocked at only 3.6 GHz. Amazing work Bernd, you rule. Now someone please find out whether the SSE2 version is indeed faster than the SSE version =)
HOST: http://einsteinathome.org/host/1280362/tasks

M. Schmitt
Joined: 27 Jun 05
Posts: 478
Credit: 15872262
RAC: 0

I added Donald A. Tevault's X2 6000 to the hosts that I feed into my DB, since I also have an X2 6000, which is running the SSE2 version. In a few days there will be results to compare.
A comparison probably cannot be based on a single result or just a few, because the speedup, if there is any, might only show up for some WUs, depending on whether they sit close to a trough or to the peak.
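
A minimal sketch of such a comparison, assuming the run times (in seconds) of many finished tasks from each host have already been collected into two lists. The function name `compare_runtimes` is made up for illustration, and nothing here is taken from the actual DB.

```python
from math import sqrt
from statistics import mean, stdev


def compare_runtimes(sse_secs, sse2_secs):
    """Compare two samples of task run times.

    Returns (relative_speedup, std_error), where relative_speedup is the
    fraction by which the SSE2 mean is shorter than the SSE mean, and
    std_error is the standard error (in seconds) of the difference
    between the two sample means.
    """
    m_sse = mean(sse_secs)
    m_sse2 = mean(sse2_secs)
    # Spread of each sample, scaled by its size, gives the uncertainty
    # of the difference between the two means.
    std_error = sqrt(stdev(sse_secs) ** 2 / len(sse_secs)
                     + stdev(sse2_secs) ** 2 / len(sse2_secs))
    relative_speedup = (m_sse - m_sse2) / m_sse
    return relative_speedup, std_error
```

If the difference between the two means is not clearly larger than the standard error, the samples are too small (or too scattered) to tell the two versions apart. Because run time depends on where a WU sits between trough and peak, comparing only WUs at similar positions would be fairer than pooling everything.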

cu,
Michael

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109959643893
RAC: 31120765

RE: RE: 1. Is it

Message 81744 in response to message 81735

Quote:
Quote:
1. Is it possible, that the first version(with some SSE2 instructions) might be faster than the new one?

Yes, it is possible, but I simply don't know.

Hey Bernd,

How about you make that SSE2 version available as a "power user" app so that those of us who weren't quick enough off the mark can at least test it a little?

That'll save us having to bribe Michael or th3 who are the two who have so far admitted to having it :-).

Cheers,
Gary.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 689044278
RAC: 213147

Hi all!

I would be surprised to see a significant (if at all measurable) speedup for the initial app version that contains SSE2 instructions.

The app code consists of parts that are really important to performance, and those have now been converted to hand-optimized assembly code (SSE).

The rest of the code is in C but not that crucial for performance. Only in those parts of the code there will be a difference in the two app versions, mostly by scalar double precision code being compiled to x87 or SSE2 instructions, respectively. To make optimal use of SSE2, one would have to generate SSE2 versions of the handcoded sections.
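
The reasoning above can be sketched with Amdahl's law: if only a small share of the run time is spent in the compiled C parts, even a solid gain there barely moves the total. The function name and the example fractions below are assumptions for illustration, not measurements of the app.

```python
def overall_speedup(fraction_affected, local_speedup):
    """Amdahl's law.

    fraction_affected: share of total run time spent in the code that
                       gets faster (0.0 to 1.0).
    local_speedup:     how many times faster that code becomes.
    Returns the speedup factor of the whole run.
    """
    if not 0.0 <= fraction_affected <= 1.0:
        raise ValueError("fraction_affected must be between 0 and 1")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    # Unaffected time stays as it is; affected time shrinks by local_speedup.
    return 1.0 / ((1.0 - fraction_affected) + fraction_affected / local_speedup)
```

For instance, with a purely hypothetical 10% of the time in the scalar C code and that code running 1.5 times faster under SSE2, `overall_speedup(0.1, 1.5)` comes out at only about 1.03.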

So, I would not hold my breath wrt. the SSE2 app variant.

CU
Bikeman

tullio
Joined: 22 Jan 05
Posts: 2118
Credit: 61407735
RAC: 0

RE: Switched from 4.35 to

Message 81746 in response to message 81737

Quote:
Switched from 4.35 to 4.49 on my AMD Opteron 1210 cpu running SuSE Linux 10.3 and BOINC 5.10.45. Looks definitely faster but graphics is not working. It used to work in 4.35.
Tullio


Graphics did not work when I switched from 4.35 to 4.49 during a WU run. Now that I started one with 4.49 it works.

M. Schmitt
Joined: 27 Jun 05
Posts: 478
Credit: 15872262
RAC: 0

RE: Hi all! I would be

Message 81747 in response to message 81745

Quote:

Hi all!

I would be surprised to see a significant (if at all measurable) speedup for the initial app version that contains SSE2 instructions.

The app code consists of parts that are really important to performance, and those have now been converted to hand-optimized assembly code (SSE).

The rest of the code is in C but not that crucial for performance. Only in those parts of the code there will be a difference in the two app versions, mostly by scalar double precision code being compiled to x87 or SSE2 instructions, respectively. To make optimal use of SSE2, one would have to generate SSE2 versions of the handcoded sections.

So, I would not hold my breath wrt. the SSE2 app variant.

CU
Bikeman


In Bernd's initial posting I do not read anything about hand-coded SSE instructions, but about using a compiler switch.

I do not doubt what you are writing, but this app clearly is at least 10% faster, so there might be a chance that the SSE2 version is even a little faster.

Anyway, it will be fun to prove you are right. ;-)
Or in other words: Let's see if practice can prove theory. :-)

cu,
Michael

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 689044278
RAC: 213147

RE: In Bernd's initial

Message 81748 in response to message 81747

Quote:

In Bernd's initial posting I do not read anything about hand-coded SSE instructions, but about using a compiler switch.

Yes, that is exactly what I wanted to stress: any SSE2 instructions will be in the "compiled from C", scalar parts of the app. They are not in the "vectorized" hand-coded parts. So don't expect wonders :-).

Quote:


I do not doubt what you are writing, but this app clearly is at least 10% faster, so there might be a chance that the SSE2 version is even a little faster.

Anyway, it will be fun to prove you are right. ;-)
Or in other words: Let's see if practice can prove theory. :-)

cu,
Michael

Yes, that should be very interesting. Any speed improvement (SSE or SSE2 variant) compared with 4.38 has multiple causes:

-better "hardware prefetching" through the use of SSE prefetching instructions (more noticeable for the "slow" WUs)
-I think the new app uses the same "interleaved loop" variant as the latest MacOS Intel app (stuff transplanted from Akos' magic app :-) ) (more noticeable for the "fast" WUs)

CU
Bikeman

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245209538
RAC: 13147

RE: RE: Switched from

Message 81749 in response to message 81746

Quote:
Quote:
Switched from 4.35 to 4.49 on my AMD Opteron 1210 cpu running SuSE Linux 10.3 and BOINC 5.10.45. Looks definitely faster but graphics is not working. It used to work in 4.35.
Tullio

Graphics did not work when I switched from 4.35 to 4.49 during a WU run. Now that I started one with 4.49 it works.


Yep. Switching App versions in the middle of a Task is not supported in BOINC. In the case of the "separate graphics" Apps this means that the "graphics_app" link in the slot directory is not updated and points to a file that doesn't exist anymore after installing a new App version. It is only set up again when a new Task is started.
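
A small sketch for checking whether a slot directory contains such a stale link. It assumes the link is an ordinary filesystem symlink; if the client instead stores the link as a small file containing the target path, this check will not catch it. The function name is made up, and the slot directory path has to be supplied by the caller.

```python
import os


def dangling_links(slot_dir):
    """Return the names of symlinks in slot_dir whose targets are missing."""
    broken = []
    for name in sorted(os.listdir(slot_dir)):
        path = os.path.join(slot_dir, name)
        # islink() inspects the link itself; exists() follows it to the
        # target, so a link to a deleted file is a link that "does not exist".
        if os.path.islink(path) and not os.path.exists(path):
            broken.append(name)
    return broken
```

If "graphics_app" shows up in the returned list for a running Task, the graphics should work again once the next Task starts with the new App version.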

BM
