Why the short due date with long workunits?

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,516
Credit: 474,383,691
RAC: 111,993

Message 67006 in response to message 67004

Quote:
I think all PI-MMX's are going to be hard pressed to meet any but the lowest template frequency WU's at this point, based on what my K6's have done, even taking into account the stronger FPU's in them. Assuming the project team can roughly halve the runtime once they optimize, it should open them up to a wider range of frequencies, but if the deadline stays at two weeks, EAH will be a very tight deadline project for them.

The expected optimization may halve the runtime per workunit, but only for CPUs that support at least SSE, so anything below a P III or Athlon XP won't benefit from the SSE codepaths. However, the current Windows app is running quite slowly on those clients compared to the Linux app, so a ca. 30% performance increase might be possible for those older CPUs as well under Windows.

CU

BRM
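
The deadline arithmetic discussed above can be sketched roughly as follows. This is only an illustration, not anything taken from the project: the function names, the 300 CPU hours per workunit and the 16 hours of crunching per day are made-up values, and the "ca. 30%" figure is read here as a 1.3x gain in speed, which is an assumption. Only the two-week deadline and the "halve the runtime" case come from the thread.

```python
def wall_days(cpu_hours, speed_gain=1.0, hours_per_day=24.0):
    """Wall-clock days needed to finish one workunit.

    cpu_hours     -- CPU time needed with the current app (hypothetical value)
    speed_gain    -- speed multiplier of an optimized app; 2.0 halves the runtime
    hours_per_day -- hours per day the host actually crunches (hypothetical value)
    """
    if cpu_hours <= 0 or speed_gain <= 0 or hours_per_day <= 0:
        raise ValueError("all arguments must be positive")
    return cpu_hours / speed_gain / hours_per_day


def meets_deadline(cpu_hours, deadline_days=14.0, speed_gain=1.0, hours_per_day=24.0):
    """True if the workunit finishes within the deadline (two weeks by default)."""
    return wall_days(cpu_hours, speed_gain, hours_per_day) <= deadline_days


if __name__ == "__main__":
    # Hypothetical old host: 300 CPU hours per workunit, crunching 16 h/day.
    for label, gain in [("current app", 1.0), ("~30% faster", 1.3), ("2x faster", 2.0)]:
        days = wall_days(300, gain, 16)
        ok = meets_deadline(300, speed_gain=gain, hours_per_day=16)
        print(f"{label}: {days:.1f} days, meets 14-day deadline: {ok}")
```

With these made-up numbers the host needs about 18.8 days today and about 14.4 days with a 1.3x gain, so it still misses a two-week deadline; only the 2x case (about 9.4 days) fits. Real hosts will differ, so treat this as a way to plug in your own figures.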

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9,352,143
RAC: 0

Message 67007 in response to message 67006

Quote:
Quote:
I think all PI-MMX's are going to be hard pressed to meet any but the lowest template frequency WU's at this point, based on what my K6's have done, even taking into account the stronger FPU's in them. Assuming the project team can roughly halve the runtime once they optimize, it should open them up to a wider range of frequencies, but if the deadline stays at two weeks, EAH will be a very tight deadline project for them.

The expected optimization may halve the runtime per workunit, but only for CPUs that support at least SSE, so anything below a P III or Athlon XP won't benefit from the SSE codepaths. However, the current Windows app is running quite slowly on those clients compared to the Linux app, so a ca. 30% performance increase might be possible for those older CPUs as well under Windows.

CU

BRM

Well, I don't know where you got that from. When Akos worked over the S5R1 apps, he didn't leave the old timers (i.e. non-SSE) out; performance improved by a factor of 2 for them. I haven't seen anything said about not doing anything for them this time around, so I would expect comparable gains, all other things being equal.

Alinator

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,516
Credit: 474,383,691
RAC: 111,993

Message 67008 in response to message 67007

Quote:
Well, I don't know where you got that from. When Akos worked over the S5R1 apps, he didn't leave the old timers (i.e. non-SSE) out; performance improved by a factor of 2 for them. I haven't seen anything said about not doing anything for them this time around, so I would expect comparable gains, all other things being equal.

Alinator

Akos already helped to improve the C source code of the current app's hot loop, so it's not completely unoptimized ;-). Any further improvements (short of using SSE(n) instructions) would have to come from hand-coding the algorithm in assembly language, and modern compilers are not so bad that you can expect a speedup by a factor of 2 when using the same instruction set.

I've taken a look at the compiler output and it's not all that bad, actually, except for one thing that will be corrected soon and will hopefully bring performance parity between Windows and Linux.

Maybe Akos can do magic again; I just think it's unfair to expect that a factor of 2 can be achieved with every iteration of optimization.

CU

BRM
