Information about the new S5 workunits

Akos Fekete
Joined: 13 Nov 05
Posts: 561
Credit: 4527270
RAC: 0

RE: Does anybody still have

Message 37887 in response to message 37886

Quote:
Does anybody still have a copy of the S5R1 or S5RI science app for Windows??


Of course!

Quote:
Is the string "AuthenticAMD" also appearing in those apps?


Yes. I checked. ( S5RI 4.24 windows )

roadrunner_gs
Joined: 7 Mar 06
Posts: 94
Credit: 3369656
RAC: 0

Couldn't it be they now just

Couldn't it be that they now just link against the math library from ICC, whereas before they linked against the standard Microsoft Visual C++ math library?

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494410
RAC: 0

Will something be done about

Will something be done about the AMD penalty under Windows? What are the project devs planning? It can't be in their interest that a significant percentage of boxes run at 70% of their potential or less. Do you plan to change this part of the app in the next release of the Einstein science app?

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2987626652
RAC: 711560

RE: RE: Does anybody

Message 37890 in response to message 37887

Quote:
Quote:
Does anybody still have a copy of the S5R1 or S5RI science app for Windows??

Of course!
Quote:
Is the string "AuthenticAMD" also appearing in those apps?

Yes. I checked. ( S5RI 4.24 windows )


Do you have a test S5RI datapak so you could run an offline comparison of the patched 4.24 app - on an AMD SSE2, of course?

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 774560235
RAC: 1263435

RE: RE: Does anybody

Message 37891 in response to message 37887

Quote:
Quote:
Does anybody still have a copy of the S5R1 or S5RI science app for Windows??

Of course!
Quote:
Is the string "AuthenticAMD" also appearing in those apps?

Yes. I checked. ( S5RI 4.24 windows )

Hmmm, so maybe the modf function wasn't used as heavily in the old app. Bernd mentioned that the old app used an alternative to modf, which was later found to be numerically suboptimal in the context of the new run.

Akos, you will know what I mean, something along the lines of
frac = x - (UINT4) x instead of frac = modf(x, &dummy)

This explains why the old app didn't suffer.
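
A minimal C sketch of the two ways of taking a fractional part mentioned above. The function names are made up for illustration, and uint32_t stands in for UINT4 on the assumption that UINT4 is a 32-bit unsigned integer. This is not the Einstein@Home source, just the general idea:

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Fractional part via a truncating cast.
 * Assumption: UINT4 is a 32-bit unsigned type, so this is only
 * defined for 0 <= x < 2^32. No library call is involved. */
static double frac_by_cast(double x)
{
    assert(x >= 0.0 && x < 4294967296.0);
    return x - (double)(uint32_t)x;
}

/* Fractional part via the C math library.
 * Works for any finite x; the result keeps the sign of x. */
static double frac_by_modf(double x)
{
    double int_part;   /* integer part, unused here ("dummy" in the post) */
    return modf(x, &int_part);
}
```

The cast version never touches the math library, so a slow or mis-dispatched modf cannot hurt it, but it only covers non-negative values below 2^32. The modf version goes through the library and covers the whole double range, e.g. frac_by_modf(-3.25) gives -0.25.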

As to the compiler, I guess Microsoft may have licensed Intel's math library, or maybe they use the Intel compiler to build their math lib (just kidding, don't sue me, MS ...).

CU

BRM

M. Schmitt
Joined: 27 Jun 05
Posts: 478
Credit: 15872262
RAC: 0

RE: As to the compiler, I

Message 37892 in response to message 37891

Quote:

As to the compiler, I guess Microsoft may have licensed Intel's math library, or maybe they use the Intel compiler to build their math lib (just kidding, don't sue me, MS ...).

CU

BRM


Afaik you can download math libs from Intel and AMD for free. Don't know about the licence though.

cu,
Michael

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494410
RAC: 0

Update from the Opteron... a

Update from the Opteron... a full 50 percent increase! The WU isn't finished yet but it's more than half crunched so the estimate should be quite okay. Looks like that kind of box uses SSE2 a lot normally, therefore the huge 70% penalty and now the big performance increase. I think this kind of box will benefit most if a patch is applied on the large scale... okay, dunno how many people combine a server CPU and Windows, but still, it's a significant difference and 70% on a fast machine can have quite an effect even if there are not that many boxes of this kind around.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 774560235
RAC: 1263435

RE: Update from the

Message 37894 in response to message 37893

Quote:
Update from the Opteron... a full 50 percent increase! The WU isn't finished yet but it's more than half crunched so the estimate should be quite okay. Looks like that kind of box uses SSE2 a lot normally, therefore the huge 70% penalty and now the big performance increase. I think this kind of box will benefit most if a patch is applied on the large scale... okay, dunno how many people combine a server CPU and Windows, but still, it's a significant difference and 70% on a fast machine can have quite an effect even if there are not that many boxes of this kind around.

Hi all!

So cool!

I looked at that suspect code in the math lib again and I think that it is not "evil", just not correct. It could be that whoever wrote this didn't want to exclude all AMDs from SSE2 but only a certain processor family. After the comparison with the string "AuthenticAMD", the code does some more arithmetic with the CPU model and extended CPU model info. My assembly language knowledge isn't that good anymore, but it's possible that the intention was to exclude only the first generation of 130nm "Newcastle" K8s. I think what it actually does might be the opposite: enable SSE2 on the Newcastles and disable it for all others.

Was there something wrong with the Newcastle SSE2 implementation? I didn't find anything by googling. Maybe it was just plain slow??
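
For readers who haven't looked at CPUID before, here is a rough C sketch of the kind of check being discussed. It works on register values passed in as plain integers, so it doesn't depend on any compiler intrinsic. All function names are hypothetical, and use_sse2_vendor_gated is a deliberately simplified illustration of a vendor-string gate, not a reconstruction of what the math lib actually does:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build the 12-character vendor string from CPUID leaf 0.
 * The characters are stored in EBX, EDX, ECX (in that order),
 * lowest byte first. */
static void vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx,
                          char out[13])
{
    const uint32_t regs[3] = { ebx, edx, ecx };
    assert(out != NULL);
    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[4 * r + b] = (char)((regs[r] >> (8 * b)) & 0xFF);
    out[12] = '\0';
}

/* Vendor-independent test: SSE2 support is reported in
 * CPUID leaf 1, EDX bit 26. */
static int has_sse2(uint32_t leaf1_edx)
{
    return (int)((leaf1_edx >> 26) & 1u);
}

/* Hypothetical, simplified vendor gate: the SSE2 path is skipped
 * for "AuthenticAMD" even when the feature bit is set. */
static int use_sse2_vendor_gated(const char *vendor, uint32_t leaf1_edx)
{
    if (strcmp(vendor, "AuthenticAMD") == 0)
        return 0;
    return has_sse2(leaf1_edx);
}
```

With the feature bit set, has_sse2 says yes for any vendor, while the gated version says no as soon as the vendor string reads "AuthenticAMD". That difference is the kind of penalty people in this thread are measuring.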

CU

BRM

roadrunner_gs
Joined: 7 Mar 06
Posts: 94
Credit: 3369656
RAC: 0

RE: (...) I looked at that

Message 37895 in response to message 37894

Quote:
(...)
I looked at that suspect code in the math lib again and I think that it is not "evil", just not correct. It could be that whoever wrote this didn't want to exclude all AMDs from SSE2 but only a certain processor family. After the comparison with the string "AuthenticAMD", the code does some more arithmetic with the CPU model and extended CPU model info. My assembly language knowledge isn't that good anymore, but it's possible that the intention was to exclude only the first generation of 130nm "Newcastle" K8s. I think what it actually does might be the opposite: enable SSE2 on the Newcastles and disable it for all others.

look here

Quote:
Was there something wrong with the Newcastle SSE2 implementation? I didn't find anything by googling. Maybe it was just plain slow??
(...)

Not as far as I know, but I'll go searching.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 774560235
RAC: 1263435

RE: RE: (...) I looked at

Message 37896 in response to message 37895

Quote:
Quote:
(...)
I looked at that suspect code in the math lib again and I think that it is not "evil", just not correct. It could be that whoever wrote this didn't want to exclude all AMDs from SSE2 but only a certain processor family. After the comparison with the string "AuthenticAMD", the code does some more arithmetic with the CPU model and extended CPU model info. My assembly language knowledge isn't that good anymore, but it's possible that the intention was to exclude only the first generation of 130nm "Newcastle" K8s. I think what it actually does might be the opposite: enable SSE2 on the Newcastles and disable it for all others.

look here

Quote:
Was there something wrong with the Newcastle SSE2 implementation? I didn't find anything by googling. Maybe it was just plain slow??
(...)

Not as far as I know, but I'll go searching.


Clawhammer and Newcastle, that is. Everything that would report Family 15, extended family 0.
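
A small C sketch of how "Family 15, extended family 0" is read out of the CPUID leaf 1 EAX value, using the standard bit layout (model in bits 4-7, family in bits 8-11, extended model in bits 16-19, extended family in bits 20-27). The struct and function names are made up for illustration, and nothing here is taken from the math lib itself:

```c
#include <stdint.h>

/* Fields of the CPUID leaf 1 EAX "signature" value. */
typedef struct {
    unsigned base_model;   /* bits 4-7   */
    unsigned base_family;  /* bits 8-11  */
    unsigned ext_model;    /* bits 16-19 */
    unsigned ext_family;   /* bits 20-27 */
} cpu_signature;

static cpu_signature decode_signature(uint32_t eax)
{
    cpu_signature s;
    s.base_model  = (eax >> 4)  & 0xF;
    s.base_family = (eax >> 8)  & 0xF;
    s.ext_model   = (eax >> 16) & 0xF;
    s.ext_family  = (eax >> 20) & 0xFF;
    return s;
}

/* True for anything reporting family 15 (0xF) with extended family 0. */
static int is_family15_ext0(uint32_t eax)
{
    cpu_signature s = decode_signature(eax);
    return s.base_family == 0xF && s.ext_family == 0;
}
```

For example, an illustrative EAX value of 0x00000F48 decodes to family 15, extended family 0, model 4, so is_family15_ext0 returns 1. A value of 0x00100F42 has extended family 1 and returns 0.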

CU

BRM
