Is EINSTEIN client OPTIMIZED for speed?

azazil
Joined: 25 Feb 05
Posts: 4
Credit: 240,475
RAC: 0
Topic 188110

This project is about number crunching. I wonder whether the programmers optimized the calculating part of the Einstein client. A good optimization can sometimes make a calculation 10 times faster. Any information about it?

I hope they wrote it in C (or better, assembly) and used the fastest compiler available (they say Intel's C compiler makes code about 25% faster than gcc or Microsoft C++, for example).

Kind Regards:

azazil

John McLeod VII
Moderator
Joined: 10 Nov 04
Posts: 547
Credit: 632,255
RAC: 0

> This project is about number crunching. I wonder if the programmers optimized
> the calculating part of Einstein client. A good optimization may make any
> calculation 10 times faster. Any information about it?
>
> I hope they wrote it in C (or better Assembly) and used the fastest compiler
> available (they say Intel's C compiler makes 25% faster code for example than
> gcc or microsoft C++).
>
>
> Kind Regards:
>
> azazil
>
Some optimizations will destroy the science being done, because they propagate round-off errors faster. So it is much more likely that the code has been optimized more for correctness than for speed.
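
To make the round-off point concrete, here is a minimal sketch (not Einstein@Home code; the function names are made up for illustration). It assumes IEEE 754 doubles and a compiler that is not allowed to reorder floating-point math (for example, gcc without -ffast-math). The same three numbers are added in two different orders, which is exactly the kind of rearrangement an aggressive optimizer might want to make:

```c
/* Hypothetical helpers: add the same three values in two different orders. */

/* Left to right: (a + b) + c */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

/* Regrouped: a + (b + c) */
double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

With a = 1e20, b = -1e20 and c = 1.0, sum_left returns 1.0, but sum_right returns 0.0, because the 1.0 is too small to survive being added to -1e20 first. Floating-point addition is not associative, so an optimization that regroups sums can change the answer.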

verstapp
Joined: 10 Nov 04
Posts: 43
Credit: 191,828
RAC: 0

... but we want wrong answers twice as fast! :-)


Cheers,
PeterV.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 3,861
Credit: 183,876,827
RAC: 35,822

It's optimized for speed as far as we could get as long as the results stay correct within tolerances we found acceptable. Some calculations still need to be done in double precision, which means we cannot make much use of e.g. SSE. It looks like the MSC compiler makes the most of our (C) code, so the Windows version is somewhat faster than the Linux & Mac versions (built with gcc). We didn't find a significant improvement with icc. When things have settled down a bit and become more stable, we may address this issue again. You may want to take a look at this old thread (http://einsteinathome.org/node/187125).

BM

Paul D. Buck
Joined: 17 Jan 05
Posts: 754
Credit: 5,385,205
RAC: 0

Message 6063 in response to message 6062

> It's optimized for speed as far as we could get as long as the results stay
> correct within tolerances we found acceptable. Some calculations still need to
> be done in double precision, which means we cannot make much use of e.g. SSE.
> It looks like the MSC compiler makes the most of our (C) code, so the Windows
> version is somewhat faster than the Linux & Mac versions (built with gcc).
> We didn't find a significant improvement with icc. When things have settled
> down a bit and become more stable, we may address this issue again. You may
> want to take a look at this old thread (http://einsteinathome.org/node/187125).
>
> BM

You also failed to point out that you have to "optimize" for the specific target hardware. In the old days, when we optimized by hand, we would select the instructions that took the fewest clock cycles but gave the same effect as the "correct" instructions. Since these instructions were part of a specific Instruction Set Architecture (ISA), you effectively crafted the programs by hand.

Right now, there is a debate "raging" on the boards (if you have the time to read all of them for the various projects) over which CPU is "fastest" for processing work. The difficulty is that even "identical" processors of different "steppings" can deliver different performance due to changes in the internal arrangements.

The fundamental problem is that several degrees of "freedom" govern the actual performance. As Bernd mentioned, you have the program itself, the compiler, the compiler "switches" used, the CPU ISA, and the CPU physical architecture.

The reason that I mention ALL of this is to point out that the AMD and Intel chips use an "identical" external ISA based on the 8086 ISA (with extensions that complicate things again, as Bernd stated), but they have vastly different internal ISAs and therefore different delivered performance.

One of my favorite things that I came across while researching ISAs for a hardware class was a program that HP developed to "model" one of their CPUs: they compiled the program, which emulated a CPU, and ran it on the same actual physical CPU. Now, the expectation is that when you emulate a hardware system in software, the delivered performance will be significantly slower at best. In fact, the emulator allowed them to do "on-the-fly" optimizations and delivered performance as much as 20% above the native hardware. When you consider the added software overhead, this is very interesting ...

Anyway, just more food for confusion ...

Follow the top link on my site to the lectures part and read the hardware lecture notes (well, that is all that is there at THIS time ... hard to miss).

ADDMP
Joined: 25 Feb 05
Posts: 104
Credit: 7,332,049
RAC: 0

Since the WIN version is said here to be better optimized than the Linux version, what is the experience of those here who run the WIN version under WINE under Linux as opposed to running the native Linux version under Linux?

Is WIN-under-WINE-under-Linux stable & error free?

How much faster is WIN-under-WINE-under-LINUX compared to native Linux?

(I am not asking about the screensaver but only the number-crunching.)

Thanks,
ADDMP

Divide Overflow
Joined: 9 Feb 05
Posts: 91
Credit: 183,220
RAC: 0

Message 6065 in response to message 6064

> How much faster is WIN-under-WINE-under-LINUX compared to native Linux?

Since this delta is going to vary from host to host, why don't you try it for yourself and see?

Biogenesis
Joined: 11 Nov 04
Posts: 24
Credit: 140,834
RAC: 0

Message 6066 in response to message 6064

> Since the WIN version is said here to be better optimized than the Linux
> version, what is the experience of those here who run the WIN version under
> WINE under Linux as opposed to running the native Linux version under Linux?
>
> Is WIN-under-WINE-under-Linux stable & error free?
>
> How much faster is WIN-under-WINE-under-LINUX compared to native Linux?
>
> (I am not asking about the screensaver but only the number-crunching.)
>
> Thanks,
> ADDMP
>

I tried it with mixed results. The first WU returned fine with a very fast completion time (21,926.15 s vs. 37,047.61 s for native Linux); however, the 2 WUs processed after that completed in ~10k seconds and had errors. I don't know why they had errors, but restarting the client didn't help :(. So now I'm back to native Linux only.

These are a couple of other threads that have been started on the subject; the devs definitely know about the problem.

http://einsteinathome.org/node/187846
http://einsteinathome.org/node/187471

Metod, S56RKO
Joined: 11 Feb 05
Posts: 135
Credit: 755,289,694
RAC: 52,880

Message 6067 in response to message 6065

> > How much faster is WIN-under-WINE-under-LINUX compared to native Linux?
>
> Since this delta is going to vary from host to host, why don't you try it for
> yourself and see?

I can't tell you about E@H, but I did play a bit with seti classic, running the binaries over exactly the same WU on the same hardware. The native Linux binary was slowest (by about 50%), and that's what I expected. What I didn't expect was that running the Windows binary in WINE was actually faster than the Windows binary in Windows (Win2k SP3 vs. RH7.3 with 2.4 series kernel). Not much, but consistently on the order of 5%.

Metod ...

Jordan Wilberding
Joined: 19 Feb 05
Posts: 162
Credit: 715,454
RAC: 0

Message 6068 in response to message 6067


> I can't tell you about E@H, but I did play a bit with seti classic, running
> the binaries over exactly the same WU on the same hardware. The native Linux
> binary was slowest (by about 50%), and that's what I expected. What I didn't
> expect was that running the Windows binary in WINE was actually faster than
> the Windows binary in Windows (Win2k SP3 vs. RH7.3 with 2.4 series kernel).
> Not much, but consistently on the order of 5%.
>

I am going to go ahead and try running E@H with WINE on Linux. I'll report the results once I get through testing it out.

such things just should not be writ so please destroy this if you wish to live 'tis better in ignorance to dwell than to go screaming into the abyss worse than hell

ben
Joined: 9 Feb 05
Posts: 36
Credit: 1,663
RAC: 0

If they want to borrow any of the code from seti optimization they are more than welcome.

https://sourceforge.net/projects/setiboinc/

Haven't seen the source for einstein but if it is all doubles, then only SSE2 and SSE3 would work. I'm not positive but I believe Mac's Altivec only uses single precision SIMD.

The real trick with most optimization these days isn't so much converting to SIMD, but arranging the code so that multiple execution units inside of the CPU can work on adjacent instructions at the same time.

Example:
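/* one accumulator: each add has to wait for the previous one */
for (i = 0; i < 100; i++) {
    a += buffer[i];
}

vs

/* four independent accumulators (b, c, d start at 0);
   starting at 0 keeps i+3 inside the same 100 elements */
for (i = 0; i < 100; i += 4) {
    a += buffer[i + 0];
    b += buffer[i + 1];
    c += buffer[i + 2];
    d += buffer[i + 3];
}
a += b + c + d;   /* combine the partial sums */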

Then come dependency chains and latency scheduling. These can be handled even in C without resorting to assembly language.
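
For anyone who wants to try the idea from the example above, here is a slightly fuller sketch in C that works for any array length. It is only an illustration under my own assumptions: the function names are made up, the buffer is assumed to hold doubles, and it is not taken from the Einstein or SETI sources.

```c
#include <stddef.h>

/* Plain sum: a single dependency chain through one accumulator. */
double sum_simple(const double *buffer, size_t n)
{
    double a = 0.0;
    for (size_t i = 0; i < n; i++) {
        a += buffer[i];
    }
    return a;
}

/* Four independent accumulators, so adjacent adds do not depend on
   each other and can overlap in the CPU's execution units. */
double sum_unrolled4(const double *buffer, size_t n)
{
    double a = 0.0, b = 0.0, c = 0.0, d = 0.0;
    size_t i = 0;

    /* Main loop: handles four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        a += buffer[i + 0];
        b += buffer[i + 1];
        c += buffer[i + 2];
        d += buffer[i + 3];
    }

    /* Tail loop: picks up the 0 to 3 elements left over. */
    for (; i < n; i++) {
        a += buffer[i];
    }

    return (a + b) + (c + d);
}
```

Note that sum_unrolled4 adds the elements in a different order than sum_simple. With integers that makes no difference, but with doubles the two results can differ in the last bits, which is the round-off concern raised earlier in this thread. Whether the unrolled version is actually faster depends on the CPU, the compiler and its switches, so measure before relying on it.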
