RE: But are you sure the
Yes. I'm sure.
The key: ABC@Home uses lots of operations on 64-bit integers.
These operations are 2-3 times faster in 64-bit mode.
RE: RE: But are you sure
Ah, I see! Not something E@H would benefit from (mostly floating point ops), tho. Number theory projects are a different story, I admit!
CU
Bikeman
I have to correct myself once
I have to correct myself once more: there is one area where even floating-point calculations would benefit from the 64-bit instruction set.
Some compilers tend to use a pair of 32-bit integer move instructions to copy a 64-bit double-precision float. This is very bad: it mixes 32-bit writes and a 64-bit read on the same data, which causes a so-called "store forwarding stall", and that is quite expensive. I see this a lot in the code the MS compiler generated for the latest Windows beta app. With 64-bit instructions, even the dumbest compiler would copy a 64-bit float in a single instruction, preventing the stall.
Even in 32-bit mode there are ways around this effect, but in 64-bit mode it's kind of foolproof :-)
CU
Bikeman
The biggest advantages for
The biggest advantages of 64-bit mode are that the register files hold twice as many registers, that SSE2 support is guaranteed, and that function arguments can usually be passed in registers.
Doubling the number of registers reduces wasted cycles for these reasons:
* The small register file of 32-bit x86 forces the compiler or assembly programmer to emit extra move, spill, and reload instructions, spending cycles and cache throughput that could otherwise go to computation. More instructions competing for the cache also raises the chance of a cache miss, and a miss usually stalls the processor until the data arrive from a larger, slower cache level or from main memory, unless the processor has some form of hardware multithreading.
* CPU pipelines have grown very deep, forcing the use of many techniques to keep them full and the execution units busy. Unfortunately, these techniques only get you so far. With few registers, dependent instructions end up close together and create bubbles: empty pipeline slots, or groups of them, that accomplish nothing except resolving dependencies. With more registers, dependent instructions can be spread farther apart, with independent work (whose data live in the extra registers) filling the gap, which shrinks the bubbles or eliminates them entirely, resulting in greater efficiency.
The guarantee of SSE2 speeds things up because SSE2 can do nearly anything the x87 FPU can do, but more efficiently. With the old x87 FPU used in 16- and 32-bit modes, the data you want to work on must sit at the top of the FPU stack, forcing the compiler or assembly programmer to emit register-exchange instructions that waste time. With SSE2, you can operate on any two registers you like, and can apply the same operation to small arrays of data if desired, raising efficiency. You also do not have to write code that checks whether SSE2 is present, which wastes both time and program space.
Passing values to functions in registers is much faster than passing them on the stack (a structure in memory), as is done in 32-bit and 16-bit mode. When a function is called in those modes, the arguments are usually written to memory and then read back from memory by the function that needs them. In 64-bit mode, the arguments stay in registers unless there are too many of them for the calling convention being used, in which case the compiler or assembly programmer must push the remaining arguments onto the stack as in 32-bit or 16-bit mode.
Remember, registers are much faster than memory and caches. However, registers take far more chip real estate per bit than caches or RAM, so CPU designers cannot add many of them and still keep the CPU inexpensive. Architecture design is therefore a compromise between speed and cost, and it is the compiler's and the assembly programmer's responsibility to use every register effectively.
And one more stone in this
And one more stone along this way: integer computations are significantly faster than floating-point ones. With large registers, why not replace some FP operations (where limited precision is acceptable) with fixed-point integer computation? Or maybe it is better to use SSE2 instead (I don't know SSE2 well enough yet)?
RE: And one more stone in
It was true about 5-10 years ago.
Let it be so. I'm familiar
Let it be so. I'm familiar with the new processor architectures, especially with the new instruction subsets like SSE, SSE2, SSE3, SSSE3 and so on. But the point of this thread is that we should ask Bernd to compile a 64-bit binary. Or maybe you can try it yourself, Akos?
RE: But the key of this
You should try to ask Bernd to compile a 64-bit binary.
I can't do it. I don't have access to the sources.
I would be glad to see x86-64 code...
I think that unless you know
I think that unless you know the code inside and out, there isn't much of a way to be sure whether it would benefit from a 64-bit recompile except trying it out. From what I understand, porting to x64 is pretty easy. The main challenge is that I don't believe there is a way to compile Win64 binaries with GCC (in that there is no 64-bit Cygwin). Do you guys compile the Win32 binary with Visual Studio? Something else? How hard would it be to at least test it and see whether it is worth bothering with?
The windows boinc code is all
The Windows BOINC code is all written for the MS compiler. Bernd has previously tried (and failed) to get it to compile with GCC.