I'm not paranoid enough to think Intel deliberately crippled the compiler for AMD. I mean, how much work would THAT be, and could it even be done without losing a lot of optimization for their own processors?
Simple: the compiler checks for "GenuineIntel" and runs the best-optimized code available. For anything else, it runs the same code it would for a Pentium II.
At compile time? That should depend on the compiler flags set. How about at run-time? If you get an app with the binaries already compiled, I doubt they could pull off a significant slowdown on AMDs without a trade-off where Intels would also run slower than optimal.
At run-time, when the program starts, the processor is probed. Then it sets up the entries in a jump table to select which key math routines are run.
It doesn't cost much, as it already probes the processor for support of MMX, SSE, SSE2, etc., and then uses the best routine for a given processor. Of course, Intel considers the "best" routine for non-Intel processors to be the worst it can choose.
Just read the write-up linked to above and it's crystal clear what Intel did. And it's nothing new either, as you can see by its date. It's been around since version 8 of their compiler and is very well known in the industry.
Look at newer stuff; "P4 1.7" sounds like Socket 423, not even Northwood generation. Intel's compiler is faster for AMD in many cases too:
http://techreport.com/articles.x/8369/10
http://techreport.com/articles.x/8369/13 (bottom of the page)
(Lower is better)
The part about the Fortran compiler is interesting enough, but considering how old that article is, I don't know how much stock to put in it for current-gen CPUs.
Being faster than the MS compiler is easy, as it doesn't support vectorization. But you're right: sometimes using the Intel compiler, even with it crippling non-Intel processors, is better than nothing.
As mentioned, it is trivial for the Intel compiler to add additional code (not even code specified by the programmer) to check the CPU manufacturer's name at run-time and then run 'whatever code'.
Note that the compiler can insert any code the compiler writers wish, regardless of what the program being compiled might specify. You only get to find out what has really been coded if you look at the machine code the compiler has generated...
I guess inserting a NOP loop would be a little too obvious if anyone ever took the trouble to look...
Meanwhile, a LOT of time and effort can be (and I'm sure has been) needlessly wasted.
All very 'silly'.
Somewhere here on this forum is a post explaining a certain slowdown for AMD CPUs with the Intel math lib. It is caused by misdetection. The intent was, I think, to rule out the first Athlons because of a flawed SSE implementation. However, the misdetection took out every OTHER AMD and left the wrong ones in. By the way, it even affected at least some Celerons... Part of the discussion is in the S2R2 thread.
Maybe Bikeman would know?
Yup, we (Annika, Ziegenmelker and I) detected this problem a few months ago. I think Akos discovered the same thing around that time.
On paper, this looked exactly like the "naughty" Intel math lib runtime check:
At startup of the app, the math library linked in by the MS Visual C++ compiler version used for E@H at that time would use the CPUID instruction to find out whether the cruncher's CPU supports SSE2 or not, to toggle between different code paths for some math library functions. But then the detection code went on to check whether the vendor string reported by the CPU was equal to "AuthenticAMD" and whether the "family" of the CPU was 15 (ALL (!) AMD K8s). In that case, it would disable the SSE2 code paths even if CPUID reported that SSE2 was supported!
So if you hacked the app to change the comparison string constant "AuthenticAMD" into "AuthenticABC", the AMD detection would fail to recognize that it's an AMD K8, and voilà, the app at that time ran ca. 20% faster on SSE2-enabled AMDs (because the non-SSE code paths included a really slow implementation of a specific function, modf).
I guess the real intention behind this CPU detection code was to exclude only the first-generation, 130 nm K8s, because their SSE2 implementation was so slow that the standard x87 FPU code paths would indeed have been faster. Instead, all K8s were slowed down :-(.
Since then, a newer version of the Visual C++ compiler has been used by E@H, so there's no need for hacks like this anymore.
CU
H-BE
'Fixed' as in that you are no longer using the Intel compiler?
Regards,
Martin
The problem occurred with a Microsoft compiler, not the Intel compiler. Since then, a newer version of the Microsoft compiler has been used which no longer checks for the "AuthenticAMD" vendor CPU info.
CU
Bikeman