No, the Opterons seemed to get away unharmed, just like Michael's X2... and I think we had someone else here who complained about extreme runtimes on his Celeron. I checked and that box has 128 KB of cache... go figure... I mean, I'm of course not sure it depends on cache size but atm it seems more likely to me than "an AMD problem" based on the data we have...
Careful... I think you're confusing AMD/Windows with AMD/Linux. The problem that had been seen was AMD/Windows. I'm suggesting that now AMD may also be affected under Linux...
Mostly watching here, but if Annika's unit takes longer, it possibly indicates that yet again AMD systems take a hit...
The Linux app does not and did not punish AMD users. ;-)
It's just the Windows compiler that doesn't like AMD.
All my AMD/Linux systems experience about the same amount of credit reduction (4%-14%) as e.g. a Core 2 Duo.
Intel Macs run even better, maybe because there are some SSE instructions in the code.
Mostly watching here, but if Annika's unit takes longer, it possibly indicates that yet again AMD systems take a hit...
Yup, which is kind of sad. So I guess we need more beta-testers with AMD CPUs to corroborate this. Unfortunately the only AMDs I can offer are older Athlon XPs (one Palomino & one T-bird); so far the slowdown seemed to happen with the more modern Opterons and 64s, right?
CU
BRM
I think the performance hit involves anything AMD that is at least SSE-capable, so an Athlon XP should do for testing purposes...
The Linux app does not and did not punish AMD users. ;-)
It's just the Windows compiler that doesn't like AMD.
All my AMD/Linux systems experience about the same amount of credit reduction (4%-14%) as e.g. a Core 2 Duo.
Intel Macs run even better, maybe because there are some SSE instructions in the code.
cu,
Michael
You might want to discuss this with Annika and Kirsten :-).
And as far as I know there are no SSE instructions in the code at all. No multiple code paths for different architectures either.
Oh no, I'd better not. ;-)
Kirsten: AMD/Win problem (-40% speed) and maybe something else.
Annika: We will find it out!
Seriously, there must be some other reason; my Knoppix host is crunching with 4.21 now and the progress is just normal. :)
Quote:
And as far as I know there are no SSE instructions in the code at all. No multiple code paths for different architectures either.
CU
H-B
True for Windows and Linux. But a team member has debugged the Intel-Mac code and there are SSE instructions in it, probably generated by the compiler by default, because there is no Intel Mac without SSE capability.
cu,
Michael
Agreed. For the latest Mac compilers, you probably have to turn SSE support off explicitly to generate plain vanilla code, and I can't figure why anybody would want to. ;-)
and I think we had someone else here who complained about extreme runtimes on his Celeron. I checked and that box has 128 KB of cache...
I remember that, but the comment was based on the ETA forecast, which can be off by miles. That Celeron was a Coppermine; we'll only know in a few days what the real runtime & credit were.
Today I downloaded (for fun) a document from AMD about optimizing code in C, C++ and assembler for the Opteron & Athlon 64 CPUs. It's nice reading (400 pages!!!), and it's surprising what huge performance penalties you suffer for things you wouldn't expect.
For example, if you are working with 10-byte floating-point values and store them in an array without padding between elements, basically you are toast :-). The same goes for local variables that happen not to be quad-word aligned on the stack.
While many rules will hold for Intel CPUs as well, the sheer volume of this optimization guide gives an idea of how difficult it must be to optimize for AMD and Intel alike, as Intel will, no doubt, have their own 400-page documents on how to optimize for the C2D, the Pentium 4, Pentium M/CD etc.
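To make the padding point a bit more concrete, here is a rough back-of-the-envelope sketch (not taken from the AMD guide). It only does the address arithmetic for an array of 10-byte values, once packed back to back and once padded to 16 bytes per element. The 8-byte (quad-word) alignment target, the 64-byte cache line and the function name `layout_stats` are my own assumptions for illustration.

```python
def layout_stats(elem_size, stride, count, align=8, line=64):
    """Count aligned elements and elements split across cache lines.

    elem_size: bytes actually used by one value (10 for an x87 extended float)
    stride:    distance in bytes between consecutive elements in the array
    align:     alignment we would like each element to have (assumed: 8)
    line:      cache line size in bytes (assumed: 64)
    """
    aligned = 0
    split = 0
    for i in range(count):
        start = i * stride            # offset of the first byte
        end = start + elem_size - 1   # offset of the last byte
        if start % align == 0:
            aligned += 1
        if start // line != end // line:
            split += 1                # value straddles two cache lines
    return aligned, split


# 32 values, stored without padding (stride 10) and with padding (stride 16)
packed = layout_stats(10, 10, 32)
padded = layout_stats(10, 16, 32)
print("packed: aligned=%d split=%d" % packed)
print("padded: aligned=%d split=%d" % padded)
```

Under these assumptions only every fourth packed element starts on a quad-word boundary and a few of them straddle two cache lines, while the padded layout has neither problem. How much that costs in real runtime is a different question that this arithmetic does not answer.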
No, the Opterons seemed to
No, the Opterons seemed to get away unharmed, just like Michael's X2... and I think we had someone else here who complained about extreme runtimes on his Celeron. I checked and that box has 128 KB of cache... go figure... I mean, I'm of course not sure it depends on cache size but atm it seems more likely to me than "an AMD problem" based on the data we have...
RE: No, the Opterons seemed
Careful... I think you're confusing AMD/Windows with AMD/Linux. The problem that had been seen was AMD/Windows. I'm suggesting that now AMD may also be affected under Linux...
RE: RE: So my first beta
The Linux app does not and did not punish AMD users. ;-)
It's just the Windows compiler that doesn't like AMD.
All my AMD/Linux systems experience about the same amount of credit reduction (4%-14%) as e.g. a Core 2 Duo.
Intel Macs run even better, maybe because there are some SSE instructions in the code.
cu,
Michael
RE: RE: RE: So my first
I think the performance hit involves anything AMD that is at least SSE-capable, so an Athlon XP should do for testing purposes...
RE: The Linux app does not
Then I'm at a loss on why Annika's is running slowly... Are any of the "systems" really VMs?
RE: The Linux app does not
You might want to discuss this with Annika and Kirsten :-).
And as far as I know there are no SSE instructions in the code at all. No multiple code paths for different architectures either.
CU
H-B
RE: RE: The Linux app
Oh no, I'd better not. ;-)
Kirsten: AMD/Win problem (-40% speed) and maybe something else.
Annika: We will find it out!
Seriously, there must be some other reason; my Knoppix host is crunching with 4.21 now and the progress is just normal. :)
True for Windows and Linux. But a team member has debugged the Intel-Mac code and there are SSE instructions in it, probably generated by the compiler by default, because there is no Intel Mac without SSE capability.
cu,
Michael
**sorry, duplicated...hitting
**sorry, duplicated...hitting reply instead of edit :-( **
RE: True for Windows and
Agreed. For the latest Mac compilers, you probably have to turn SSE support off explicitly to generate plain vanilla code, and I can't figure why anybody would want to. ;-)
Alinator
RE: and I think we had
I remember that, but the comment was based on the ETA forecast, which can be off by miles. That Celeron was a Coppermine; we'll only know in a few days what the real runtime & credit were.
Today I downloaded (for fun) a document from AMD about optimizing code in C, C++ and assembler for the Opteron & Athlon 64 CPUs. It's nice reading (400 pages!!!), and it's surprising what huge performance penalties you suffer for things you wouldn't expect.
http://www.compsci.wm.edu/SciClone/documentation/hardware/AMD/Opteron/OptimizationGuide.pdf
For example, if you are working with 10-byte floating-point values and store them in an array without padding between elements, basically you are toast :-). The same goes for local variables that happen not to be quad-word aligned on the stack.
While many rules will hold for Intel CPUs as well, the sheer volume of this optimization guide gives an idea of how difficult it must be to optimize for AMD and Intel alike, as Intel will, no doubt, have their own 400-page documents on how to optimize for the C2D, the Pentium 4, Pentium M/CD etc.
CU
BRM