AMD reveals plans for "SSE5"

ebahapo
Joined: 22 Jan 05
Posts: 47
Credit: 755276
RAC: 0

RE: Im not paranoid enough

Message 72749 in response to message 72748

Quote:
I'm not paranoid enough to think Intel deliberately crippled the compiler for AMD. I mean, how much work would THAT be, and could it even be done without losing lots of optimization for their own processors?


Simple: the compiler checks for "GenuineIntel" and runs the best optimized code available. For anything else, it runs the same code it would for a Pentium II.

th3
Joined: 24 Aug 06
Posts: 208
Credit: 2208434
RAC: 0

At compile time? That should depend on the compiler flags set. What about at run-time? If you get an app with the binaries already compiled, then I doubt they could pull off a significant slowdown on AMDs without a trade-off where Intels would also run slower than optimal.

ebahapo
Joined: 22 Jan 05
Posts: 47
Credit: 755276
RAC: 0

RE: At compile time? Should

Message 72751 in response to message 72750

Quote:
At compile time? That should depend on the compiler flags set. What about at run-time? If you get an app with the binaries already compiled, then I doubt they could pull off a significant slowdown on AMDs without a trade-off where Intels would also run slower than optimal.


At run-time, when the program starts, the processor is probed. The program then sets up the entries in a jump table to select which key math routines are run.

It doesn't cost much, as it already probes the processor for support of MMX, SSE, SSE2, etc., and then uses the best routine for a given processor. Of course, Intel considers the "best" routine for non-Intel processors to be the worst one it can choose.
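To make the jump-table idea concrete, here is a minimal sketch in Python. It is not Intel's actual code: the routine names, the `build_jump_table` function and the `vendor_gated` switch are all hypothetical, and the vendor string and feature flags are passed in as plain values instead of being read via CPUID.

```python
# Hypothetical sketch of run-time CPU dispatch. A real program would obtain
# the vendor string and feature flags from the CPUID instruction at startup.

def dot_generic(a, b):
    """Baseline routine (stands in for plain, unvectorized code)."""
    return sum(x * y for x, y in zip(a, b))


def dot_sse(a, b):
    """Stands in for an SSE-optimized routine; same result as the baseline."""
    return sum(x * y for x, y in zip(a, b))


def dot_sse2(a, b):
    """Stands in for an SSE2-optimized routine; same result as the baseline."""
    return sum(x * y for x, y in zip(a, b))


# Candidate routines, ordered from best to worst. None means "no requirement".
ROUTINES = [("sse2", dot_sse2), ("sse", dot_sse), (None, dot_generic)]


def build_jump_table(vendor, features, vendor_gated=False):
    """Fill the jump table once, when the program starts.

    vendor_gated=False: pick the best routine the feature flags allow.
    vendor_gated=True:  any vendor other than "GenuineIntel" gets the
                        baseline routine, whatever its feature flags say.
    """
    if vendor_gated and vendor != "GenuineIntel":
        return {"dot": dot_generic}
    for required, routine in ROUTINES:
        if required is None or required in features:
            return {"dot": routine}
    return {"dot": dot_generic}


# Later calls go through the table and never probe the CPU again.
table = build_jump_table("AuthenticAMD", {"mmx", "sse", "sse2"}, vendor_gated=True)
result = table["dot"]([1, 2], [3, 4])
```

The probe happens once, so the per-call cost is a single table lookup. That is why gating on the vendor string costs an Intel processor next to nothing.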

Just read the write-up linked to above and it's crystal clear what Intel did. It's not news either, as you can see from its date. It has been around since version 8 of their compiler and is very well known in the industry.

th3
Joined: 24 Aug 06
Posts: 208
Credit: 2208434
RAC: 0

Look at newer stuff; "P4 1.7" sounds like Socket 423, not even the Northwood generation.

Intel's compiler is also faster for AMD in many cases:
http://techreport.com/articles.x/8369/10
http://techreport.com/articles.x/8369/13 (bottom of the page)
(Lower is better)

The part about the Fortran compiler is interesting enough, but considering how old that article is, I don't know how much weight to put on it for current-gen CPUs.

ebahapo
Joined: 22 Jan 05
Posts: 47
Credit: 755276
RAC: 0

RE: Intels compiler is

Message 72753 in response to message 72752

Quote:
Intel's compiler is also faster for AMD in many cases:
http://techreport.com/articles.x/8369/10
http://techreport.com/articles.x/8369/13 (bottom of the page)
(Lower is better)


Being faster than the MS compiler is easy, as it doesn't support vectorization. But you're right: sometimes using the Intel compiler, even though it cripples non-Intel processors, is better than nothing.

ML1
Joined: 20 Feb 05
Posts: 347
Credit: 86563414
RAC: 51

RE: At compile time? Should

Message 72754 in response to message 72750

Quote:
At compile time? That should depend on the compiler flags set. What about at run-time? If you get an app with the binaries already compiled, then I doubt they could pull off a significant slowdown on AMDs without a trade-off where Intels would also run slower than optimal.


As mentioned, it is trivial for the Intel compiler to add additional code (and not even code specified by the programmer) to check the CPU manufacturer's name at run-time and then run 'whatever code'.

Note that the compiler can insert any code the compiler writers wish, regardless of what the program being compiled might specify. You only get to find out what really has been coded if you look to see what machine code the compiler has generated...

I guess inserting a loop of NOPs would be a little too obvious if anyone ever took the trouble to look...

Meanwhile, a LOT of time and effort can be (and I'm sure has been) needlessly wasted.

All very 'silly'.

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)

Klimax
Joined: 27 Apr 07
Posts: 87
Credit: 1370205
RAC: 0

Somewhere here on this forum is a post explaining a certain slowdown for AMD CPUs with the Intel math library. It is caused by misdetection. The check was, I think, an attempt to rule out the first Athlons because of their wrong implementation of SSE. However, the misdetection took out every OTHER AMD and left the wrong ones. By the way, it affected at least some Celerons too... Part of the discussion is in the S2R2 thread.

Maybe Bikeman could know?

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 756963739
RAC: 1162337

RE: Somewhere here on this

Message 72756 in response to message 72755

Quote:

Somewhere here on this forum is a post explaining a certain slowdown for AMD CPUs with the Intel math library. It is caused by misdetection. The check was, I think, an attempt to rule out the first Athlons because of their wrong implementation of SSE. However, the misdetection took out every OTHER AMD and left the wrong ones. By the way, it affected at least some Celerons too... Part of the discussion is in the S2R2 thread.

Maybe Bikeman could know?

Yup, we (Annika, Ziegenmelker and I) detected this problem a few months ago. I think Akos discovered the same thing around that time.

On paper, this looked exactly like the "naughty" Intel mathlib runtime check:

At startup of the app, the math library linked in by the MS Visual C++ compiler version used for E@H at that time would use the CPUID instruction to find out whether the cruncher's CPU supports SSE2, in order to toggle between different code paths for some math library functions. But then the detection code went on to check whether the vendor string reported by the CPU was equal to "AuthenticAMD" and whether the "family" of the CPU was 15 (ALL (!) AMD K8). In that case, it would disable the SSE2 code paths even if CPUID reported that SSE2 was supported!

So if you hacked the app to change the comparison string constant "AuthenticAMD" into "AuthenticABC", the AMD detection would fail to recognize that it's an AMD K8, and voilà, the app at that time ran ca. 20% faster on SSE2-enabled AMDs (because the non-SSE code paths included a really slow implementation of a specific function, modf).

I guess the real intention behind this CPU detection code was to exclude only first generation, 130nm K8s because their SSE2 implementation was so slow that the standard x87 FPU codepaths would indeed have been faster. Instead, all K8s were slowed down :-(.
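For illustration, the check and the string hack described above can be sketched roughly like this in Python. This is not the actual math library code: the function names, the parameters and the toy binary image are hypothetical, and only the logic (CPUID SSE2 flag, vendor string "AuthenticAMD", family 15) comes from what we saw.

```python
K8_FAMILY = 15  # CPU "family" value reported by all AMD K8 processors


def sse2_path_enabled(cpu_vendor, cpu_family, cpuid_reports_sse2,
                      blocked_vendor=b"AuthenticAMD"):
    """Decide whether the SSE2 code paths are used (hypothetical sketch).

    blocked_vendor stands in for the string constant stored in the binary.
    """
    if not cpuid_reports_sse2:
        return False  # CPU has no SSE2: use the fallback paths
    if cpu_vendor == blocked_vendor and cpu_family == K8_FAMILY:
        return False  # SSE2 is reported, but disabled anyway
    return True


def patch_vendor_constant(image, old=b"AuthenticAMD", new=b"AuthenticABC"):
    """Overwrite the comparison constant in a binary image.

    Both strings must have the same length so no other offsets shift.
    """
    if len(old) != len(new):
        raise ValueError("replacement must keep the constant's length")
    return image.replace(old, new)


# An SSE2-capable K8 before the hack: SSE2 paths are switched off.
before = sse2_path_enabled(b"AuthenticAMD", 15, True)

# After the hack the constant no longer matches the real vendor string,
# so the same CPU gets the SSE2 paths.
after = sse2_path_enabled(b"AuthenticAMD", 15, True,
                          blocked_vendor=b"AuthenticABC")
```

The replacement string has to be exactly 12 bytes, like the original, because the constant is overwritten in place inside the compiled app.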

Since then, a newer version of the Visual C++ compiler was used by E@H so there's no need anymore for hacks like this.

CU

H-BE

ML1
Joined: 20 Feb 05
Posts: 347
Credit: 86563414
RAC: 51

RE: ... Since then, a newer

Message 72757 in response to message 72756

Quote:
... Since then, a newer version of the Visual C++ compiler was used by E@H so there's no need anymore for hacks like this.


'Fixed' as in you are no longer using the Intel compiler?

Regards,
Martin


Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 756963739
RAC: 1162337

RE: RE: ... Since then, a

Message 72758 in response to message 72757

Quote:
Quote:
... Since then, a newer version of the Visual C++ compiler was used by E@H so there's no need anymore for hacks like this.

'Fixed' as in you are no longer using the Intel compiler?

Regards,
Martin

The problem occurred with a Microsoft compiler, not the Intel compiler. Since then, a newer version of the Microsoft compiler has been used, which no longer checks for the "AuthenticAMD" CPU vendor string.

CU

Bikeman
