GNU/Linux S5R3 App 4.31 available for Beta test

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4265
Credit: 244921393
RAC: 16900
Topic 193497

A new Linux App is available from our Beta Test page.

This App looks a little faster than the previous 4.24 due to some hacking on the sin/cos routine, and it is a new "separate graphics" App (featuring the "extended information" mentioned in the "screensaver competition" thread).

It's probably not the fastest we can do w/o SSE, but in contrast to the quick-fix 4.24 it's an actual release candidate.

Please test and report!

BM


Mikie Tim T
Joined: 22 Jan 05
Posts: 105
Credit: 263777741
RAC: 0


I just finished my last 4.24 result and started my first 4.31 result from scratch. As soon as my old crusty Athlon 1200 finishes it and it validates, I will report. No issues installing this app, and no warnings in the messages this time.

Wedge009
Joined: 5 Mar 05
Posts: 117
Credit: 15648923808
RAC: 7420783


How does this relate to the Linux 4.27 'power' application? Is it the same but without the SSE optimisations, or are there other improvements involved with this release?

Soli Deo Gloria

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4265
Credit: 244921393
RAC: 16900


Message 78483 in response to message 78482

Quote:
How does this relate to the Linux 4.27 'power' application? Is it the same but without the SSE optimisations, or are there other improvements involved with this release?


There is some tuning of single instructions in the sin/cos approximation code, which should give a few % overall compared to 4.24. This won't bring it up to the speed of 4.27, though.

BM
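
For readers unfamiliar with the idea, a sin/cos approximation of the general kind discussed in this thread can be sketched as a lookup table with linear interpolation. This is only an illustration in Python and not the Einstein@Home code: the table size, the name `sincos_lut`, and the choice to take the argument in turns (fractions of a full circle) are all assumptions made for the example.

```python
import math

# Hypothetical table resolution; a real application would tune this.
TABLE_SIZE = 64

# One extra entry so that index i + 1 is always valid during interpolation.
SIN_TABLE = [math.sin(2.0 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE + 1)]
COS_TABLE = [math.cos(2.0 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE + 1)]


def sincos_lut(x):
    """Approximate (sin(2*pi*x), cos(2*pi*x)) for x given in turns."""
    frac = x - math.floor(x)               # reduce the argument to one period
    pos = frac * TABLE_SIZE                # position in table units
    i = min(int(pos), TABLE_SIZE - 1)      # guard against frac rounding up to 1.0
    t = pos - i                            # interpolation weight within the cell
    s = SIN_TABLE[i] + t * (SIN_TABLE[i + 1] - SIN_TABLE[i])
    c = COS_TABLE[i] + t * (COS_TABLE[i + 1] - COS_TABLE[i])
    return s, c


if __name__ == "__main__":
    approx_s, approx_c = sincos_lut(0.1)
    print(approx_s, math.sin(2.0 * math.pi * 0.1))
    print(approx_c, math.cos(2.0 * math.pi * 0.1))
```

The trade-off is the usual one: a larger table costs memory and cache space but reduces the interpolation error, while the per-call work stays a handful of multiplies and adds.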


Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 686042351
RAC: 598410


I've resurrected my Athlon 800 (w/o SSE) for this one; it can use all the speedup it can get :-).

I'm expecting a ca. 10-15% speedup over 4.20.
CU
Bikeman

Mikie Tim T
Joined: 22 Jan 05
Posts: 105
Credit: 263777741
RAC: 0


This looks noticeably faster than 4.24 was. It's a full 50% faster than a couple of the results on 4.24, so faster than the cyclical nature of the results can account for.

91934213

It hasn't had the chance to validate yet, which is unusual with this host as it's usually so much slower than what it's paired up with, that it usually validates after updating the scheduler.

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0


Message 78486 in response to message 78485

Quote:

This looks noticeably faster than 4.24 was. It's a full 50% faster than a couple of the results on 4.24, so faster than the cyclical nature of the results can account for.

91934213

It hasn't had the chance to validate yet, which is unusual with this host as it's usually so much slower than what it's paired up with, that it usually validates after updating the scheduler.

Based on the analysis tool that Mike Hewson created, the closest comparison result is 91702232.

91702232 completed in 97175.70 seconds.
91934213 completed in 84546.08 seconds.

A definite improvement...around 13-15%...
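
The "13-15%" range can be reproduced from the two run times above. The short sketch below (function names are made up for illustration) shows that the lower figure is the reduction in run time and the higher one is the gain in throughput:

```python
def time_saved(old_seconds, new_seconds):
    """Fraction of the old run time that the new run no longer needs."""
    return (old_seconds - new_seconds) / old_seconds


def throughput_gain(old_seconds, new_seconds):
    """Relative increase in results per unit time."""
    return old_seconds / new_seconds - 1.0


old_seconds = 97175.70   # result 91702232
new_seconds = 84546.08   # result 91934213

print(f"time saved:      {time_saved(old_seconds, new_seconds):.1%}")       # 13.0%
print(f"throughput gain: {throughput_gain(old_seconds, new_seconds):.1%}")  # 14.9%
```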

The unfortunate thing for us "selfish Windows users" (as someone else put it), is that if this beta app goes official, the Linux app will return to the 15-20% advantage over the Windows app running on the same hardware... :(

th3
Joined: 24 Aug 06
Posts: 208
Credit: 2208434
RAC: 0


Nice app, it's quite fast. The first column is the sequence number:

SEQ#  TASKID    CPU TIME   CREDIT
4.31
58    92029504  15,499.44  236.47
59    92028745  15,501.76  236.47
60    92027277  15,725.44  236.47
61    92025205  15,687.50  236.47
4.27
62    92024888  13,404.96  236.47
63    92024882  13,200.88  236.47
64    92023736  13,076.19  236.47

So it took only 16-17% more time to crunch with 4.31 compared to 4.27. That again tells me 4.27 isn't yet close to its "SSE powered" potential. You know what to do next for the penguin crunchers, BM =D

This is the host, running at 4GHz, 32bit Debian 4.
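
For anyone who wants to redo the comparison from the table above, here is a minimal sketch. The variable names are made up for the example, and the exact percentage depends on which rows are compared (neighbouring sequence numbers or the averages of each group).

```python
# CPU times copied from the table above, grouped by application version.
cpu_431 = [15499.44, 15501.76, 15725.44, 15687.50]   # sequence numbers 58-61
cpu_427 = [13404.96, 13200.88, 13076.19]             # sequence numbers 62-64


def mean(values):
    return sum(values) / len(values)


# Extra time needed by 4.31 relative to 4.27, comparing group averages.
extra_mean = mean(cpu_431) / mean(cpu_427) - 1.0

# Same comparison using only the two neighbouring results (61 vs. 62).
extra_adjacent = cpu_431[-1] / cpu_427[0] - 1.0

print(f"group averages:    {extra_mean:.1%}")
print(f"neighbouring rows: {extra_adjacent:.1%}")
```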

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4265
Credit: 244921393
RAC: 16900


Message 78488 in response to message 78487

Quote:
So it took only 16-17% more time to crunch with 4.31 compared to 4.27. That again tells me 4.27 isn't yet close to its "SSE powered" potential. You know what to do next for the penguin crunchers, BM =D


Actually the speedup is more than I expected. However, I recently learned that the fiddling with the sin/cos code led to the compiler handling other parts of the code differently. The speedup you see is largely "delayed" from the 4.20 -> 4.24 code changes (where a speedup was announced but not actually observed). It's not bound to the sin/cos stuff itself, and thus can't be ported to the SSE version; it's already included there.

BM


Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0


Message 78489 in response to message 78488

Quote:
Quote:
So it took only 16-17% more time to crunch with 4.31 compared to 4.27. That again tells me 4.27 isn't yet close to its "SSE powered" potential. You know what to do next for the penguin crunchers, BM =D

Actually the speedup is more than I expected. However, I recently learned that the fiddling with the sin/cos code led to the compiler handling other parts of the code differently. The speedup you see is largely "delayed" from the 4.20 -> 4.24 code changes (where a speedup was announced but not actually observed). It's not bound to the sin/cos stuff itself, and thus can't be ported to the SSE version; it's already included there.

BM

Do you have any idea why Windows 4.26 is roughly equivalent in speed to Linux 4.20, at least when viewed from the perspective of running on identical AMD hardware? IOW, why does the Windows app need the Linear sin/cos code and the compiler optimizations to get close to the performance of the Linux code-compiler combination that doesn't have the Linear sin/cos routines?

If you think this should be discussed in the Windows thread or the S5R3 general thread, feel free to move it... I'm not sure of where the discussion "should" go, since it is in regards to both platforms...

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4265
Credit: 244921393
RAC: 16900


Message 78490 in response to message 78489

Quote:
Do you have any idea why Windows 4.26 is roughly equivalent in speed to Linux 4.20, at least when viewed from the perspective of running on identical AMD hardware? IOW, why does the Windows app need the Linear sin/cos code and the compiler optimizations to get close to the performance of the Linux code-compiler combination that doesn't have the Linear sin/cos routines?


Both compilers (gcc and MSVC) produce inefficient code in the "hot loop" because they think they have too few FPU registers left for efficient code. With gcc you can get lucky and have it produce efficient code, depending on how you fiddle with the sin/cos routine, but the MSVC compiler seems to do it badly almost always, and its code is worse than gcc's.

I asked Akos to write an efficient implementation of the hot-loop in x87 assembler to be independent of the compiler; he agreed to do this, but I haven't received any code yet.

BM

