Uptake of SSE4 and SSE5 in the E@H userbase ... what is it?

MP
Joined: 9 Feb 05
Posts: 29
Credit: 761,523
RAC: 0
Topic 195674

SSE4 is an incoherent mess:

http://en.wikipedia.org/wiki/SSE4

Intel SSE4 consists of 54 instructions. A subset consisting of 47 instructions, referred to as SSE4.1 in some Intel documentation, is available in Penryn.

Additionally, SSE4.2, a second subset consisting of the 7 remaining instructions, is first available in Nehalem-based Core i7. Intel credits feedback from developers as playing an important role in the development of the instruction set.

AMD currently supports only 4 instructions from the SSE4 instruction set, but has also added two new SSE instructions, together named SSE4a. These instructions are not found in Intel's processors supporting SSE4.1 and, conversely, AMD processors do not support Intel's SSE4.1.

Alongside SSE4a, support was added for unaligned SSE load-operation instructions (which formerly required 16-byte alignment).
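
To make the fragmentation above concrete, here is a minimal sketch of how software can tell these subsets apart from raw CPUID register values. The bit positions are the ones documented in the Intel and AMD manuals (CPUID leaf 1 and extended leaf 0x80000001); the struct and function names are hypothetical, made up for this illustration, and nothing here is taken from the E@H apps.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature flags decoded from raw CPUID register values.
 * Struct and function names are invented for this sketch. */
struct simd_features {
    bool sse2, sse3, ssse3, sse41, sse42, avx; /* CPUID leaf 1 */
    bool sse4a, xop, fma4;                     /* CPUID leaf 0x80000001 (AMD) */
};

/* Return bit n of a 32-bit register value. */
static bool bit(uint32_t reg, unsigned n)
{
    return ((reg >> n) & 1u) != 0;
}

struct simd_features decode_simd_features(uint32_t leaf1_ecx,
                                          uint32_t leaf1_edx,
                                          uint32_t ext_ecx)
{
    struct simd_features f;
    f.sse2  = bit(leaf1_edx, 26);
    f.sse3  = bit(leaf1_ecx, 0);
    f.ssse3 = bit(leaf1_ecx, 9);
    f.sse41 = bit(leaf1_ecx, 19);
    f.sse42 = bit(leaf1_ecx, 20);
    f.avx   = bit(leaf1_ecx, 28); /* hardware bit only; the OS must also save YMM state */
    f.sse4a = bit(ext_ecx, 6);
    f.xop   = bit(ext_ecx, 11);
    f.fma4  = bit(ext_ecx, 16);
    return f;
}
```

On a real machine the three register values would come from the CPUID instruction, for example via `__get_cpuid` in the `<cpuid.h>` header shipped with GCC and Clang. A Penryn-style CPU would report SSE4.1 without SSE4.2, while a Phenom-style CPU would report SSE4a without SSSE3 or SSE4.1.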

SSE5 is less of a mess, but still:
http://en.wikipedia.org/wiki/SSE5

The proposed SSE5 instruction set consisted of 170 instructions (including 46 base instructions), many of which are designed to improve single-threaded performance.

Some SSE5 instructions are 3-operand instructions, the use of which will increase the average number of instructions per cycle achievable by x86 code.

AMD claims SSE5 will provide dramatic performance improvements, particularly in high-performance computing (HPC), multimedia and computer security applications, including a 5x performance gain for Advanced Encryption Standard (AES) encryption and a 30% performance gain for discrete cosine transform (DCT) used to process video streams.

Tony DeBari
Joined: 29 Apr 05
Posts: 30
Credit: 38,576,823
RAC: 0

IIRC, SSE3 was found to offer minimal (if any) performance advantage over SSE2 in the Global Correlations HF app, and the Binary Radio Pulsar (BRP3) app only goes up to SSE. So, while I'm sure the number of SSE4x-capable hosts is growing, I doubt that there will be any movement toward incorporating instruction sets beyond SSE2 into the current science apps. I also don't see much traction for SSE5 (or more accurately, XOP+FMA4+CVT16), as it will only be available on AMD's Bulldozer CPUs which are not even out yet.

If I had to guess, I would say the next big development, at least on the CPU front, will be support for Intel's AVX instruction set. From what I have read, AVX has the potential for significant performance gains over SSEx. What remains to be seen is whether or not that potential can be realized here at E@H given the type of science that we're doing.

-- Tony D.

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 546,680,675
RAC: 195,533

Yeah, as I just said in another thread:

myself wrote:
With Win 7 SP1 just released we could start to use AVX. Devs.. any word on this?
So far AVX has shown impressive speed gains in synthetic benchmarks (80% to 90% gain), but doesn't have much real-world usage yet. At Einstein you could probably use it on 256-bit vectors anywhere you're now using SSE2 128-bit vectors. Another code path.. but should be worth a little recompile ;)
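
For what "another code path" could look like, here is a rough sketch assuming GCC or Clang on x86: the same element-wise float addition written once with 128-bit vectors and once with 256-bit AVX vectors, picked at run time. The function names (`add_floats`, `add_sse`, `add_avx`, `add_scalar`) are hypothetical and this is not how the E@H apps are actually structured; on other compilers or architectures the sketch just falls back to the scalar loop.

```c
#include <stddef.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <immintrin.h>
#define HAVE_X86_PATHS 1
#else
#define HAVE_X86_PATHS 0
#endif

/* Portable fallback: one float per iteration. */
static void add_scalar(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

#if HAVE_X86_PATHS
/* 128-bit path: four floats per iteration, unaligned loads and stores. */
__attribute__((target("sse2")))
static void add_sse(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(out + i,
                      _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    for (; i < n; ++i) /* leftover elements */
        out[i] = a[i] + b[i];
}

/* 256-bit path: eight floats per iteration, unaligned loads and stores. */
__attribute__((target("avx")))
static void add_avx(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        _mm256_storeu_ps(out + i,
                         _mm256_add_ps(_mm256_loadu_ps(a + i),
                                       _mm256_loadu_ps(b + i)));
    for (; i < n; ++i) /* leftover elements */
        out[i] = a[i] + b[i];
}
#endif

/* Pick the widest path the running CPU reports, else fall back to scalar. */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
#if HAVE_X86_PATHS
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx")) {
        add_avx(a, b, out, n);
        return;
    }
    if (__builtin_cpu_supports("sse2")) {
        add_sse(a, b, out, n);
        return;
    }
#endif
    add_scalar(a, b, out, n);
}
```

A caller would just use `add_floats(a, b, out, n)` and never see which path ran. Whether the wider path actually pays off depends on how memory-bound the loop is, so the synthetic benchmark gains may not carry over.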

MrS

Scanning for our furry friends since Jan 2002

Fred J. Verster
Joined: 27 Apr 08
Posts: 118
Credit: 22,451,438
RAC: 0

Hi, the differences between SSSE3x and SSE4.1 (and, on the i7 and others, SSE4.2) aren't that big, and are very dependent on the AR of the WU.
But AMD CPUs don't support SSSE3 or SSE4.1, except perhaps the latest models (if they do at all?).
And the performance gain from SSE3 and SSSE3x is quite big, though it also depends on the CPU and memory speed. The CPU that feeds the 480 is an X9650 @ 3.5GHz (3570FLOPS/core).
IMHO, they're talking about something else...?
Talking about speed gain: the GTX480 does a WU in 50 min, while a CPU (~2500FLOPS)
takes 16 hours to complete the same WU, so the GPU is roughly 19 times faster.
Example 1
CUDA vs CUDA.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,142
Credit: 2,797,982,585
RAC: 792,794

Quote:
Hi , difference between SSSE3x and SSE4.1 and on i7 e.a also SSE4.2, aren't that big, very dependable on the AR of the WU....


Fred, the concept of the Angle Range of a workunit is only meaningful for a SETI@home task. You're posting on the Einstein@home message board here... ;-)

Fred J. Verster
Joined: 27 Apr 08
Posts: 118
Credit: 22,451,438
RAC: 0

Quote:

Fred, the concept of the Angle Range of a workunit is only meaningful for a SETI@home task. You're posting on the Einstein@home message board here... ;-)

My bad, yep quite stupid. :-/
Thanks for correcting though.
