A new Linux App is available from our Beta Test page.
This App looks a little faster than the previous 4.24 due to some hacking with the sin/cos routine, and it is a new "separate graphics" App (featuring the "extended information" mentioned in the "screensaver competition" thread).
It's probably not the fastest we can do w/o SSE, but in contrast to the quick-fix 4.24 it's an actual release candidate.
Please test and report!
BM
Copyright © 2024 Einstein@Home. All rights reserved.
GNU/Linux S5R3 App 4.31 available for Beta test
I just finished my last 4.24 result and have started my first 4.31 result from scratch. As soon as my old crusty Athlon 1200 finishes it and it validates, I will report. No issues with installing this app, and no warnings in the messages this time.
How does this relate to the
How does this relate to the Linux 4.27 'power' application? Is it the same but without the SSE optimisations, or are there other improvements involved with this release?
Soli Deo Gloria
RE: How does this relate to
There is some tuning of single instructions in the sin/cos approximation code, which should give a few % overall compared to 4.24. This won't bring it up to the speed of 4.27, though.
BM
I've resurrected my Athlon
I've resurrected my Athlon 800 (w/o SSE) for this one, it can use all the speedup possible :-).
I'm expecting a ca. 10-15% speedup over 4.20.
CU
Bikeman
This looks noticeably faster
This looks noticeably faster than 4.24 was. It's a full 50% faster than a couple of the results on 4.24, which is more than the cyclical variation in run times can account for.
91934213
It hasn't had the chance to validate yet, which is unusual for this host: it's usually so much slower than the hosts it's paired with that its result validates right after it updates the scheduler.
RE: This looks noticeably
Based on the analysis tool that Mike Hewson created, the closest comparison result is 91702232.
91702232 completed in 97175.70 seconds.
91934213 completed in 84546.08 seconds.
A definite improvement...around 13-15%...
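The 13-15% range can be reproduced from the two CPU times quoted above. A minimal sketch in Python (the variable names are only illustrative): which end of the range you get depends on whether you measure the saving against the old run time or express it as a throughput gain.

```python
# CPU times in seconds for the two results quoted in the post.
old_time = 97175.70  # comparison result 91702232
new_time = 84546.08  # result 91934213 (4.31 beta)

# Reduction in run time, relative to the old run time.
time_saved_pct = (old_time - new_time) / old_time * 100

# The same change expressed as a throughput gain (work per unit of time).
throughput_gain_pct = (old_time / new_time - 1) * 100

print(f"run time reduced by {time_saved_pct:.1f}%")
print(f"throughput increased by {throughput_gain_pct:.1f}%")
```

The first figure lands at the low end of the quoted range and the second at the high end, so both numbers describe the same pair of results.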
The unfortunate thing for us "selfish Windows users" (as someone else put it) is that if this beta app goes official, the Linux app will return to its 15-20% advantage over the Windows app running on the same hardware... :(
Nice app, it's quite fast.
Nice app, it's quite fast. The first number is the sequence number:
SEQ#.. TASKID ........ CPU TIME ..... CREDIT
4.31
58 .... 92029504 .... 15,499.44 .... 236.47
59 .... 92028745 .... 15,501.76 .... 236.47
60 .... 92027277 .... 15,725.44 .... 236.47
61 .... 92025205 .... 15,687.50 .... 236.47
4.27
62 .... 92024888 .... 13,404.96 .... 236.47
63 .... 92024882 .... 13,200.88 .... 236.47
64 .... 92023736 .... 13,076.19 .... 236.47
So it took only 16-17% more time to crunch with 4.31 compared to 4.27. That again tells me 4.27 isn't yet close to its "SSE powered" potential. You know what to do next for the penguin crunchers, BM =D
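One way to check a figure like this is to average the CPU times per app version from the table and take the ratio. A rough sketch in Python, using only the seven results listed above; with so few tasks, and run times that vary from task to task, the result is only indicative.

```python
from statistics import mean

# CPU times in seconds, copied from the table above.
cpu_times = {
    "4.31": [15499.44, 15501.76, 15725.44, 15687.50],
    "4.27": [13404.96, 13200.88, 13076.19],
}

mean_431 = mean(cpu_times["4.31"])
mean_427 = mean(cpu_times["4.27"])

# Extra CPU time 4.31 needs relative to 4.27, in percent.
extra_time_pct = (mean_431 / mean_427 - 1) * 100

print(f"4.31 mean: {mean_431:.2f} s")
print(f"4.27 mean: {mean_427:.2f} s")
print(f"4.31 takes {extra_time_pct:.1f}% more CPU time than 4.27")
```

On these seven results the ratio of the means comes out in the high teens, in line with the rough figure quoted in the post.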
This is the host, running at 4GHz, 32bit Debian 4.
Team Philippines
RE: So it took only 16-17%
Actually the speedup is more than I expected. However, I recently learned that the fiddling with the sin/cos code led to the compiler handling other parts of the code differently. The speedup you see is largely "delayed" from the 4.20 -> 4.24 code changes (where a speedup was announced but not actually observed). It's not bound to the sin/cos stuff itself, and thus can't be ported to the SSE version - it's already included there.
BM
RE: RE: So it took only
Do you have any idea why Windows 4.26 is roughly equivalent in speed to Linux 4.20, at least when viewed from the perspective of running on identical AMD hardware? IOW, why does the Windows app need the Linear sin/cos code and the compiler optimizations to get close to the performance of the Linux code-compiler combination that doesn't have the Linear sin/cos routines?
If you think this should be discussed in the Windows thread or the S5R3 general thread, feel free to move it... I'm not sure of where the discussion "should" go, since it is in regards to both platforms...
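For readers unfamiliar with the idea, a table-based sin/cos with linear interpolation can be sketched as follows. This is a generic illustration in Python, not the Einstein@Home code: the table size, the function name `sin_cos_lut`, and the interpolation scheme are all assumptions made for the example, and the real routines are compiled code tuned at the instruction level.

```python
import math

TABLE_SIZE = 64  # hypothetical resolution; not taken from the app

# One extra entry so that index i + 1 is always valid.
SIN_TABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE + 1)]
COS_TABLE = [math.cos(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE + 1)]


def sin_cos_lut(x):
    """Approximate (sin x, cos x) by linear interpolation in a lookup table."""
    # Map x onto a position in [0, TABLE_SIZE] along one full period.
    pos = (x / (2 * math.pi)) % 1.0 * TABLE_SIZE
    # Clamp the index: float rounding can make the modulo return exactly 1.0.
    i = min(int(pos), TABLE_SIZE - 1)
    frac = pos - i
    s = SIN_TABLE[i] + frac * (SIN_TABLE[i + 1] - SIN_TABLE[i])
    c = COS_TABLE[i] + frac * (COS_TABLE[i + 1] - COS_TABLE[i])
    return s, c


print(sin_cos_lut(1.0))
print(math.sin(1.0), math.cos(1.0))
```

The trade-off is the usual one: a couple of table reads and multiply-adds replace a full library call, at the cost of a small approximation error that shrinks as the table grows.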
RE: Do you have any idea
Both compilers (gcc and MSVC) produce inefficient code in the "hot-loop" because they think they have too few FPU registers left for efficient code. With gcc you can get lucky and it produces efficient code, depending on how you fiddle with the sin/cos routine, but the MSVC compiler seems to do it badly almost always, and its code is worse than gcc's.
I asked Akos to write an efficient implementation of the hot-loop in x87 assembler to be independent of the compiler; he agreed to do this, but I haven't received any code yet.
BM