Uptake of SSE4 and SSE5 in the E@H userbase ... what is it?
SSE4 is an incoherent mess:
http://en.wikipedia.org/wiki/SSE4
Intel SSE4 consists of 54 instructions. A subset consisting of 47 instructions, referred to as SSE4.1 in some Intel documentation, is available in Penryn.
Additionally, SSE4.2, a second subset consisting of the 7 remaining instructions, is first available in Nehalem-based Core i7. Intel credits feedback from developers as playing an important role in the development of the instruction set.
AMD currently supports only 4 instructions from the SSE4 instruction set, but has also added two new SSE instructions, named SSE4a. These instructions are not found in Intel's processors supporting SSE4.1, and conversely AMD processors don't support Intel's SSE4.1.
Support was added for SSE4a for unaligned SSE load-operation instructions (which formerly required 16-byte alignment).
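To see where a given host falls in this patchwork, here is a minimal sketch, assuming a Linux machine where /proc/cpuinfo has a "flags" line. The helper names (`LINUX_FLAGS`, `supported_sets`, `read_cpu_flags`) are made up for illustration and are not part of any E@H or BOINC code. The flag spellings are the ones the Linux kernel uses (SSE3 shows up as "pni").

```python
# Instruction-set names mapped to the flag spellings Linux prints
# in the "flags" line of /proc/cpuinfo (SSE3 is reported as "pni").
LINUX_FLAGS = {
    "SSE2": "sse2",
    "SSE3": "pni",
    "SSSE3": "ssse3",
    "SSE4.1": "sse4_1",
    "SSE4.2": "sse4_2",
    "SSE4a": "sse4a",
    "AVX": "avx",
    "XOP": "xop",
    "FMA4": "fma4",
}


def supported_sets(flags_line):
    """Return the instruction sets from LINUX_FLAGS present in a flags string."""
    flags = set(flags_line.split())  # exact token match, so "sse4a" != "sse4_1"
    return [name for name, flag in LINUX_FLAGS.items() if flag in flags]


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the text after 'flags :' for the first CPU listed, or '' if absent."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return line.split(":", 1)[1]
    return ""
```

On a Linux host, `supported_sets(read_cpu_flags())` lists what that CPU reports. Going by the excerpt above, a Penryn would be expected to show SSE4.1 but not SSE4.2 or SSE4a, and an AMD CPU with SSE4a would not show SSE4.1.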
SSE5 is less of a mess, but still:
http://en.wikipedia.org/wiki/SSE5
The proposed SSE5 instruction set consisted of 170 instructions (including 46 base instructions), many of which are designed to improve single-threaded performance.
Some SSE5 instructions are 3-operand instructions, the use of which will increase the average number of instructions per cycle achievable by x86 code.
AMD claims SSE5 will provide dramatic performance improvements, particularly in high-performance computing (HPC), multimedia and computer security applications, including a 5x performance gain for Advanced Encryption Standard (AES) encryption and a 30% performance gain for discrete cosine transform (DCT) used to process video streams.
Copyright © 2024 Einstein@Home. All rights reserved.
IIRC, SSE3 was found to offer minimal (if any) performance advantage over SSE2 in the Global Correlations HF app, and the Binary Radio Pulsar (BRP3) app only goes up to SSE. So, while I'm sure the number of SSE4x-capable hosts is growing, I doubt that there will be any movement toward incorporating instruction sets beyond SSE2 into the current science apps. I also don't see much traction for SSE5 (or more accurately, XOP+FMA4+CVT16), as it will only be available on AMD's Bulldozer CPUs which are not even out yet.
If I had to guess, I would say the next big development - at least on the CPU front - will be support for Intel's AVX instruction set. From what I have read, AVX has the potential for significant performance gains over SSEx. What remains to be seen is whether or not that potential can be realized here at E@H given the type of science that we're doing.
-- Tony D.
Yeah, as I just said in another thread:
MrS
Scanning for our furry friends since Jan 2002
Hi, the differences between SSSE3x and SSE4.1 (and, on i7 and others, SSE4.2) aren't that big, and depend heavily on the AR of the WU.
But AMD CPUs don't support SSSE3 or SSE4.1, except perhaps the latest models (if they do at all?).
And the performance gain from SSE3 and SSSE3x is quite big, also depending on the CPU and memory speed. The CPU that feeds the 480 is an X9650 @ 3.5GHz (3570FLOPS/core)
IMHO, they're talking about something else.........?
Talking about speed gain: the GTX480 does a WU in 50 min, while a CPU (~2500FLOPS) takes 16 hours to complete the same WU.
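For scale, the ratio implied by those two runtimes can be worked out with a quick sketch. The function name `speedup` is made up for illustration, and the calculation assumes both figures refer to the same WU, as stated.

```python
def speedup(slow_minutes, fast_minutes):
    """Return how many times faster the quicker device finishes the same task."""
    return slow_minutes / fast_minutes


cpu_minutes = 16 * 60   # 16 hours on the CPU
gpu_minutes = 50        # 50 minutes on the GTX480
ratio = speedup(cpu_minutes, gpu_minutes)
print(f"GPU is about {ratio:.1f}x faster")
```

That puts the GPU at roughly 19 times the CPU's throughput on that task.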
Example 1
CUDA vs CUDA.
Fred, the concept of the Angle Range of a workunit is only meaningful for a SETI@home task. You're posting on the Einstein@home message board here... ;-)
My bad, yep quite stupid. :-/
Thanks for correcting though.