Linux S5R2 App 4.21 available for Beta test

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494,410
RAC: 0

No, the Opterons seemed to get away unharmed, just like Michael's X2... and I think we had someone else here who complained about extreme runtimes on his Celeron. I checked and that box has 128 KB of cache... go figure... I mean, I'm of course not sure it depends on cache size but atm it seems more likely to me than "an AMD problem" based on the data we have...

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282,700
RAC: 0

Message 63901 in response to message 63900

Quote:
No, the Opterons seemed to get away unharmed, just like Michael's X2... and I think we had someone else here who complained about extreme runtimes on his Celeron. I checked and that box has 128 KB of cache... go figure... I mean, I'm of course not sure it depends on cache size but atm it seems more likely to me than "an AMD problem" based on the data we have...

Careful... I think you're confusing AMD/Windows with AMD/Linux. The problem that had been seen was AMD/Windows. I'm suggesting that now AMD may also be affected under Linux...

M. Schmitt
Joined: 27 Jun 05
Posts: 478
Credit: 15,872,262
RAC: 0

Message 63902 in response to message 63898

Quote:
Quote:

So my first beta app result was just validated. Runtime was not longer than for WU with same credit for 4.18.

http://einsteinathome.org/task/83943971

CU

BRM

Mostly watching here, but if Annika's unit takes longer, it possibly indicates that yet again AMD systems take a hit...

The Linux app does not and did not punish AMD users. ;-)
It's just the Windows compiler that doesn't like AMD.
All my AMD/Linux systems experience about the same amount of credit reduction (4%-14%) as e.g. a Core 2 Duo.
Intel Macs run even better, maybe because there are some SSE instructions in the code.

cu,
Michael

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282,700
RAC: 0

Message 63903 in response to message 63899

Quote:
Quote:
Quote:

So my first beta app result was just validated. Runtime was not longer than for WU with same credit for 4.18.

http://einsteinathome.org/task/83943971

CU

BRM

Mostly watching here, but if Annika's unit takes longer, it possibly indicates that yet again AMD systems take a hit...

Yup, which is kind of sad. So I guess we need more beta testers with AMD CPUs to corroborate this. Unfortunately the only AMDs I can offer are older Athlon XPs (one Palomino and one T-bird). So far the slowdown seemed to happen with the more modern Opterons and Athlon 64s, right?

CU

BRM

I think the performance hit involves anything AMD that is at least SSE-capable, so an Athlon XP should do for testing purposes...
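
For anyone who wants to check whether a Linux box counts as SSE-capable before volunteering it for testing, here is a minimal sketch. It assumes the usual x86 Linux layout of /proc/cpuinfo, where a "flags" line lists CPU features such as "sse" and "sse2". The helper name cpu_flags and the sample text are made up for illustration and do not describe any host in this thread.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (e.g. non-x86 kernel)


# Illustrative sample only, not taken from any real host.
SAMPLE = "model name\t: Example CPU\nflags\t\t: fpu tsc mmx sse\n"

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the sample where /proc/cpuinfo is unavailable
    flags = cpu_flags(text)
    print("SSE:", "sse" in flags, "SSE2:", "sse2" in flags)
```

This only reports what the kernel says the CPU supports; it says nothing about whether the science app actually uses those instructions.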

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282,700
RAC: 0

Message 63904 in response to message 63902

Quote:

The Linux app does not and did not punish AMD users. ;-)

Then I'm at a loss as to why Annika's is running slowly... Are any of the "systems" really VMs?

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,515
Credit: 450,909,869
RAC: 101,442

Message 63905 in response to message 63902

Quote:


The Linux app does not and did not punish AMD users. ;-)
It's just the Windows compiler that doesn't like AMD.
All my AMD/Linux systems experience about the same amount of credit reduction (4%-14%) as e.g. a Core 2 Duo.
Intel Macs run even better, maybe because there are some SSE instructions in the code.

cu,
Michael

You might want to discuss this with Annika and Kirsten :-).

And as far as I know there are no SSE instructions in the code at all. No multiple code paths for different architectures either.

CU

H-B

M. Schmitt
Joined: 27 Jun 05
Posts: 478
Credit: 15,872,262
RAC: 0

Message 63906 in response to message 63905

Quote:
Quote:


The Linux app does not and did not punish AMD users. ;-)
It's just the Windows compiler that doesn't like AMD.
All my AMD/Linux systems experience about the same amount of credit reduction (4%-14%) as e.g. a Core 2 Duo.
Intel Macs run even better, maybe because there are some SSE instructions in the code.

cu,
Michael

You might want to discuss this with Annika and Kirsten :-).

Oh no, I'd better not. ;-)
Kirsten: AMD/Win problem (-40% speed) and maybe something else.
Annika: We will figure it out!
Seriously, there must be some other reason; my Knoppix host is crunching with 4.21 now and the progress is just normal. :)

Quote:

And as far as I know there are no SSE instructions in the code at all. No multiple code paths for different architectures either.
CU

H-B

True for Windows and Linux. But a team member has debugged the Intel-Mac code and there are SSE instructions in it, probably generated by the compiler by default, because there is no Intel Mac without SSE capability.

cu,
Michael

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,515
Credit: 450,909,869
RAC: 101,442

Message 63907 in response to message 63900

**sorry, duplicated...hitting reply instead of edit :-( **

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9,352,143
RAC: 0

Message 63908 in response to message 63906

Quote:


True for Windows and Linux. But a team member has debugged the Intel-Mac code and there are SSE instructions in it, probably generated by the compiler by default, because there is no Intel Mac without SSE capability.

cu,
Michael

Agreed. With the latest Mac compilers you probably have to turn SSE support off explicitly to generate plain vanilla code, and I figure, why would anybody want to? ;-)

Alinator

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,515
Credit: 450,909,869
RAC: 101,442

Message 63909 in response to message 63907

Quote:
and I think we had someone else here who complained about extreme runtimes on his Celeron. I checked and that box has 128 KB of cache...

I remember that, but the comment was based on the forecast ETA, which can be off by miles. That Celeron was a Coppermine; we'll only know in a few days what the real runtime and credit were.

Today I downloaded (for fun) a document from AMD about optimizing code in C, C++ and assembler for the Opteron and Athlon 64 CPUs. It makes for nice reading (400 pages!!!). And it's surprising what huge performance penalties you suffer for things you wouldn't expect.

http://www.compsci.wm.edu/SciClone/documentation/hardware/AMD/Opteron/OptimizationGuide.pdf

For example, if you are working with 10-byte floating-point values and store them in an array without padding between elements, basically you are toast :-). The same goes if you work with local variables and they happen not to be quad-word aligned on the stack.
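
To put rough numbers on the array case, here is a minimal sketch that counts how many 10-byte elements straddle a cache-line boundary when packed back to back versus padded to 16 bytes each. The 64-byte line size, the 16-byte padded stride, and the function name are assumptions for illustration, not figures quoted from the guide, and the cost of each straddling access is not modeled.

```python
def count_line_crossings(stride, count, size=10, line=64):
    """Count elements whose bytes span two cache lines.

    Element i occupies bytes [i*stride, i*stride + size - 1] of an array
    that is assumed to start on a cache-line boundary.
    """
    crossings = 0
    for i in range(count):
        first = i * stride
        last = first + size - 1
        if first // line != last // line:
            crossings += 1
    return crossings


if __name__ == "__main__":
    n = 32
    print("packed (stride 10):", count_line_crossings(10, n), "of", n)
    print("padded (stride 16):", count_line_crossings(16, n), "of", n)
```

Under these assumptions, 4 of 32 packed elements straddle a line and none of the padded ones do, at the price of 6 unused bytes per element.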

While many rules will hold for Intel CPUs as well, the sheer volume of this optimization guide gives an idea of how difficult it must be to optimize for AMD and Intel alike, as Intel will, no doubt, have their own 400-page documents on how to optimize for the C2D, the Pentium 4, Pentium M/CD etc.

CU

BRM
