Einstein@Home for 64-bit Linux on AMD Athlon 64 X2

ebahapo
Joined: 22 Jan 05
Posts: 47
Credit: 750,425
RAC: 0

Message 43305 in response to message 43304

Correction: the x86-64 Linux client, version 5.8.11, can be downloaded from boinc_5.8.11_x86_64-pc-linux-gnu.tgz (make sure to copy both files to the BOINC working directory). The new x64 Windows client, version 5.8.11, by Crunch3r, can be found at boinc_5.8.11_windows_amd64.zip.

Update on project applications:

  * Native 64-bit application sent to AMD64 clients:
    * SIMAP (Linux)
    * Chess960 (Linux)
    * ABC (Linux)
    * ABC ß (Linux & Windows)
    * Predictor (Linux)
    * RieselSieve (Linux)
  * 32-bit application sent to AMD64 clients:
    * SETI & SETI ß (Linux)
    * HashClash (Linux & Windows)
    * Leiden (Linux)
    * Malaria (Linux)
    * Docking (Linux)
    * RieselSieve (Windows)
    * WCG (Linux)
    * Pirates (Linux)

For more information, see BoincStats Forum.

HTH

clownius
Joined: 16 Jun 06
Posts: 42
Credit: 2,164,665
RAC: 0

I would really, really like to see x86_64 supported by Einstein in some form. A native app would be best, but in the short term a 32-bit app issued to 64-bit clients would still be good.
I tried just about everything to get the app working with an app info on my C2D, with no luck, and it's stopping my fastest 2 cores from crunching Einstein during AA6.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,129
Credit: 229,823,240
RAC: 21,022

Message 43307 in response to message 43303

Quote:
I'm not sure if they made it into the official version, but Akos was experimenting with hotloops that used both SSE and 387 instructions to process data in parallel. If they were put into the deployed version, disabling the 387 would be a significant performance hit for an x86-64 native app.

The current Einstein App uses this method, i.e. doing "more contributing" parts of the calculation in high precision (80bit on FPU) while doing the rest in single precision (SSE). For the current setup doing everything in single precision isn't precise enough.

This complicated way of calculation, btw, is the reason why I couldn't simply compile a (native) 64bit App of the current code.

We are working on the code for S5R2, and it looks like it will become a lot cleaner, and probably everything in the "inner loop" can be done in single precision, so it will be a little faster and it should also be easier to build native 64bit Apps (yes, we do care).
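To illustrate why mixing precisions can matter, here is a minimal sketch, not the project's code. It emulates single precision by rounding through `struct`, and uses Python's 64-bit floats as a stand-in for the FPU's 80-bit format, so it is only an analogy. The names `to_f32`, `sum_single` and `sum_mixed` are made up for this example.

```python
import struct

def to_f32(x):
    """Round a Python float (IEEE double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_single(values):
    """Accumulate with every intermediate result rounded to single precision."""
    acc = 0.0
    for v in values:
        acc = to_f32(acc + to_f32(v))
    return acc

def sum_mixed(values):
    """Inputs are single precision, but the accumulator stays in higher precision."""
    acc = 0.0
    for v in values:
        acc += to_f32(v)
    return acc

values = [0.1] * 100_000
exact = 10_000.0
err_single = abs(sum_single(values) - exact)
err_mixed = abs(sum_mixed(values) - exact)
print("all single precision error:", err_single)
print("mixed precision error:     ", err_mixed)
```

With this input the all-single sum drifts visibly, while keeping only the accumulator in higher precision stays close to the exact value. The accumulator plays the role of the "more contributing" part here.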

BM

ebahapo
Joined: 22 Jan 05
Posts: 47
Credit: 750,425
RAC: 0

Message 43309 in response to message 43307

Quote:

The current Einstein App uses this method, i.e. doing "more contributing" parts of the calculation in high precision (80bit on FPU) while doing the rest in single precision (SSE). For the current setup doing everything in single precision isn't precise enough...

We are working on the code for S5R2, and it looks like it will become a lot cleaner, and probably everything in the "inner loop" can be done in single precision, so it will be a little faster and it should also be easier to build native 64bit Apps (yes, we do care).


Good to know!

But let me correct you in that although SSE supports only single-precision, SSE2 supports double-precision too. Of course, if Einstein really needs to use x87's extended-precision 80-bit, that's the only way to go.

And in case someone is wondering whether using SSE/SSE2 code side-by-side with x87 code is faster, it isn't, as both SSE/SSE2 and x87 share the same FPU, only through different interfaces.

HTH

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,129
Credit: 229,823,240
RAC: 21,022

Message 43310 in response to message 43309

Quote:
But let me correct you in that although SSE supports only single-precision, SSE2 supports double-precision too. Of course, if Einstein really needs to use x87's extended-precision 80-bit, that's the only way to go.


I know that there is SIMD support for double precision, but 1) there are (or at least were at the time of coding) many more machines that could do SSE but couldn't run SSE2 than machines that could run both, and 2) (re-)aligning the data for double precision SIMD calculation ate up all the speed we would gain from doing just the four FPU calculations in two double precision SSE2 calculations. It simply wasn't worth the effort. [Edit] Modern CPUs with their "virtually two FPUs" (another interface to the same physical unit) will combine the FPU calculations for us anyway.

BM

Webmaster Yoda
Joined: 15 Mar 05
Posts: 17
Credit: 608,427
RAC: 0

This is all a bit technical for me, but is it safe to assume that any CPU currently capable of 64 bit supports at least SSE2?

In other words, would the problem with SSE vs SSE2 support be irrelevant for a 64 bit app?

I too have a 64bit (Core 2 Duo) machine that normally runs 64 bit Ubuntu. It's temporarily running Windows so it can participate at Einstein but I would much rather run 64 bit Linux (so it can do a lot of work at projects that have a fast 64 bit app).

Metod, S56RKO
Joined: 11 Feb 05
Posts: 135
Credit: 795,508,352
RAC: 34,140

It's not too complicated to get the current Einstein app running under AMD64 Linux. Details are highlighted in this thread.

Metod ...

DanNeely
Joined: 4 Sep 05
Posts: 1,361
Credit: 3,278,658,646
RAC: 1,498,040

Message 43313 in response to message 43311

Quote:

This is all a bit technical for me, but is it safe to assume that any CPU currently capable of 64 bit supports at least SSE2?

In other words, would the problem with SSE vs SSE2 support be irrelevant for a 64 bit app?

Hardware support would be 100% for SSE2, but that wouldn't change the complexity of the software or of having to maintain more concurrent versions of it. The SSE to SSE2 port would still require just as much effort to carry out and, if sufficiently different, more work to maintain as well. AFAIK the only major difference across the codebase for different platforms is that the x86 versions have assembler hotloops instead of C++. Different alignment requirements would require more widespread changes, and from the Akos client days of S4 there was extremely little performance gained from the change.
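For anyone who wants to check a particular Linux box, here is a minimal sketch that parses the `flags` line of `/proc/cpuinfo`. On Linux, `lm` (long mode) marks a 64-bit capable x86 CPU and `sse2` marks SSE2 support. The helper names and the shortened sample text are made up for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "flags":
            return set(value.split())
    return set()

def is_64bit_capable(flags):
    """'lm' (long mode) is how Linux reports x86-64 capability."""
    return "lm" in flags

# Hypothetical, heavily shortened excerpt; a real flags line is much longer.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 lm\n"
flags = cpu_flags(sample)
print("64-bit capable:", is_64bit_capable(flags))
print("SSE2:", "sse2" in flags)
```

On a real machine the text would come from `open("/proc/cpuinfo").read()` instead of the sample string.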

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,129
Credit: 229,823,240
RAC: 21,022

I actually made an SSE2 version once, not modifying the "hot loop", but other parts of the program (sin/cos LUT). It didn't gain much on some CPUs and was much slower on others (Akos said there _might_ be some advantage on Woodcrests). And yes, it required rearranging the data for a larger part of the program. At that time, the hassle of maintaining (and deploying) yet another different version of the code wasn't worth the minimal speedup on only a few CPUs.
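For readers unfamiliar with the term, a sin/cos LUT replaces calls to the math library with table lookups. The following is a generic sketch of the idea using linear interpolation, not the Einstein implementation; the table size and all names are assumptions chosen for this example.

```python
import math

TABLE_SIZE = 1024  # hypothetical resolution, not taken from the real app
STEP = 2.0 * math.pi / TABLE_SIZE
# One extra entry so that index i + 1 is always valid.
SIN_TABLE = [math.sin(i * STEP) for i in range(TABLE_SIZE + 1)]
COS_TABLE = [math.cos(i * STEP) for i in range(TABLE_SIZE + 1)]

def sincos_lut(x):
    """Approximate (sin x, cos x) by interpolating between table entries."""
    pos = (x % (2.0 * math.pi)) / STEP
    # Clamp in case the modulo rounds up to a full period.
    i = min(int(pos), TABLE_SIZE - 1)
    frac = pos - i
    s = SIN_TABLE[i] + frac * (SIN_TABLE[i + 1] - SIN_TABLE[i])
    c = COS_TABLE[i] + frac * (COS_TABLE[i + 1] - COS_TABLE[i])
    return s, c

s, c = sincos_lut(1.0)
print(s - math.sin(1.0), c - math.cos(1.0))
```

A larger table reduces the interpolation error at the cost of memory and cache pressure, which is one reason such a change can help on some CPUs and hurt on others.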

For the techs: for the current Apps we maintain four ("production") versions of the source code for the central function (the BOINC and graphics code is C++, the rest is plain vanilla C):
- Hand-coded Assembler used for all x86 CPUs capable of SSE
- Hand-coded Assembler for x87 calculations (for x86 CPUs that can't do SSE)
- An AltiVec version using Motorola's C/C++-API to AltiVec instructions
- A generic C version that runs on all other CPUs such as G3, MIPS and SPARC
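The choice among the four versions listed above can be pictured as a simple selection on CPU architecture and features. This is only an illustrative sketch: the function, the architecture strings and the returned labels are hypothetical, and how the real apps are built and assigned to hosts is not described here.

```python
def select_hot_loop(arch, features):
    """Pick one of the four code paths for a given CPU (hypothetical labels)."""
    features = set(features)
    if arch == "x86":
        # SSE-capable x86 CPUs get the SSE assembler, the rest the x87 assembler.
        return "sse_asm" if "sse" in features else "x87_asm"
    if arch == "ppc" and "altivec" in features:
        return "altivec"
    # G3, MIPS, SPARC and everything else fall back to plain C.
    return "generic_c"

for cpu in (("x86", {"sse", "sse2"}), ("x86", set()),
            ("ppc", {"altivec"}), ("ppc", set()), ("sparc", set())):
    print(cpu[0], sorted(cpu[1]), "->", select_hot_loop(*cpu))
```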

BM

ebahapo
Joined: 22 Jan 05
Posts: 47
Credit: 750,425
RAC: 0

Thanks for the detailed explanation.

Then again, it shouldn't be too hard to send the 32-bit application to the 64-bit clients, as the number of projects already doing this confirms.
