curious performance enhancement or punishment?

Astro
Joined: 18 Jan 05
Posts: 257
Credit: 1000560
RAC: 0

I've made up a chart which

I've made up a chart which shows the "Avg cpu seconds" for each host, plus another column which shows that processor's cpu seconds divided by the Q6600's cpu seconds, i.e. the ratio of how much faster the Q6600 is compared to the AMD processors on all those projects. You can see the disparity is much wider (so far) on Einstein than on the other projects.

simap / ratio
Q6600 2065 1
9950one 2243 1.09
9950two 2287 1.11
9600 2492 1.21
x2 6000 2115 1.02
x2 5200 2253 1.07
x2 4800 2484 1.2

Primegrid / ratio
Q6600 817 1
9950one 924 1.13
9950two 921 1.13
9600 1016 1.24
x2 6000 827 1.01
x2 5200 915 1.12
x2 4800 1022 1.25

Docking / ratio
Q6600 3008 1
9950one 3568 1.19
9950two 3622 1.2
9600 4280 1.42
x2 6000 3292 1.09
x2 5200 3580 1.19
x2 4800 4029 1.34

Einstein / ratio
Q6600 21666 1
9950one 33308 1.54
9950two 32556 1.5
9600 47567 2.2
x2 6000 30275 1.4
x2 5200 35385 1.63
x2 4800 38489 1.78
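
For anyone who wants to redo the arithmetic, here is a minimal sketch of how the ratio column can be recomputed from the "Avg cpu seconds" figures above. The dictionary layout and the function name `ratios_vs_baseline` are just illustrative choices, not anything from BOINC, and the Einstein baseline row is assumed to be the same Q6600 host as in the other tables.

```python
# Average CPU seconds per work unit, copied from the chart above.
# Assumption: the Einstein baseline row refers to the same Q6600 host.
AVG_CPU_SECONDS = {
    "simap": {"Q6600": 2065, "9950one": 2243, "9950two": 2287, "9600": 2492,
              "x2 6000": 2115, "x2 5200": 2253, "x2 4800": 2484},
    "Primegrid": {"Q6600": 817, "9950one": 924, "9950two": 921, "9600": 1016,
                  "x2 6000": 827, "x2 5200": 915, "x2 4800": 1022},
    "Docking": {"Q6600": 3008, "9950one": 3568, "9950two": 3622, "9600": 4280,
                "x2 6000": 3292, "x2 5200": 3580, "x2 4800": 4029},
    "Einstein": {"Q6600": 21666, "9950one": 33308, "9950two": 32556,
                 "9600": 47567, "x2 6000": 30275, "x2 5200": 35385,
                 "x2 4800": 38489},
}


def ratios_vs_baseline(times, baseline="Q6600"):
    """Divide each host's average CPU seconds by the baseline host's."""
    base = times[baseline]
    return {host: round(seconds / base, 2) for host, seconds in times.items()}


for project, times in AVG_CPU_SECONDS.items():
    print(f"{project} / ratio")
    for host, ratio in ratios_vs_baseline(times).items():
        print(f"  {host:8s} {times[host]:6d} {ratio:.2f}")
```

A ratio above 1 means the host needs that many times the Q6600's CPU seconds per work unit. Recomputed this way, the posted ratios all check out except simap on the x2 5200, where 2253/2065 comes to about 1.09 rather than 1.07, so one of those two figures may be a typo.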

John Clark
Joined: 4 May 07
Posts: 1087
Credit: 3143193
RAC: 0

I know this has been stated

I know this has been stated before, but is it not better to express the CPU output per hour times the number of CPUs present? This value is much easier to understand.
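
One rough way to express that, as a sketch only: work units per hour for the whole host is cores times 3600 divided by the average CPU seconds per work unit. The core counts below are an assumption (quad for the Q6600 and the Phenom 9950/9600 hosts, dual for the X2s), the Einstein figures are the ones posted in the chart earlier in the thread, and the calculation assumes every core is busy the whole time with CPU time roughly equal to wall-clock time.

```python
# Assumed core counts per host (not stated in the thread).
CORES = {"Q6600": 4, "9950one": 4, "9950two": 4, "9600": 4,
         "x2 6000": 2, "x2 5200": 2, "x2 4800": 2}

# Einstein average CPU seconds per work unit, from the chart in this thread.
EINSTEIN_AVG_CPU_SECONDS = {"Q6600": 21666, "9950one": 33308, "9950two": 32556,
                            "9600": 47567, "x2 6000": 30275, "x2 5200": 35385,
                            "x2 4800": 38489}


def tasks_per_hour(avg_cpu_seconds, cores):
    """Whole-host output: each core finishes 3600/avg_cpu_seconds tasks an hour."""
    return cores * 3600 / avg_cpu_seconds


for host, seconds in EINSTEIN_AVG_CPU_SECONDS.items():
    output = tasks_per_hour(seconds, CORES[host])
    print(f"{host:8s} {output:.3f} work units/hour")
```

This folds the core count into the number, so a quad and a dual with the same per-core speed no longer look identical.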

Shih-Tzu are clever, cuddly, playful and rule!! Jack Russell are feisty!

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 764717862
RAC: 1063064

RE: I've made up a chart

Message 85912 in response to message 85910

Quote:
I've made up a chart which shows the "Avg cpu seconds" for each host, plus another column which shows that processor's cpu seconds divided by the Q6600's cpu seconds, i.e. the ratio of how much faster the Q6600 is compared to the AMD processors on all those projects. You can see the disparity is much wider (so far) on Einstein than on the other projects.
...

Impressive!! I must admit I'm surprised how big the difference is. Even more so as the E@H SSE2 Linux app is definitely not even compiled specifically for the Core2! AFAIK the target architecture that the compiler will optimize for is a Pentium-M (you have to choose some target to get decent performance).

Well, some of the assembly code that speeds up the app was contributed or inspired by Akos and I think he mainly used a Core2 for testing, so I guess a bit of an Intel bias might be in that code. But remember, we are not talking about an AMD penalty here, it's just that AMD CPUs might get less of a benefit from the handcrafted code. But that's just a theory at the moment.

CU
Bikeman

Astro
Joined: 18 Jan 05
Posts: 257
Credit: 1000560
RAC: 0

EEEE GAds, I just saw a chart

EEEE GAds, I just saw a chart of Archae86's Q6600 using Windows in the other thread. My average of 21666 seconds with Linux must really tweak him a bit. Heck, the new power app he's using still isn't getting him close.

Like I said, I have more data collection for Einstein as I had a 1.5 day cache, and by tonight/tomorrow morning my hosts should be out of work for the other projects and have AT LEAST 1.5 days' worth of Einstein to fight through (probably closer to 3-4 days). I don't really think the "averages" will change enough to bring the ratios back in line, but I have the work and will run them.

If nothing else, I shed a light on something, and my total Einstein credit should be closer to that of boincsimap.

tony

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 764717862
RAC: 1063064

RE: EEEE GAds, I just saw a

Message 85914 in response to message 85913

Quote:
EEEE GAds, I just saw a chart of Archae86's Q6600 using Windows in the other thread. My average of 21666 seconds with Linux must really tweak him a bit. Heck, the new power app he's using still isn't getting him close.

Well, if you consider that your Quad is running at a 7% higher clock rate, it's beginning to look interesting. But yes, the Linux app will still be faster, even though the gap is much narrower now.

CU
Bikeman

archae86
Joined: 6 Dec 05
Posts: 3161
Credit: 7281128377
RAC: 2030914

RE: EEEE GAds, I just saw a

Message 85915 in response to message 85913

Quote:
EEEE GAds, I just saw a chart of Archae86's Q6600 using Windows in the other thread. My average of 21666 seconds with Linux must really tweak him a bit. Heck, the new power app he's using still isn't getting him close.


Nope, I don't get tweaked by that. It does indicate a possible opportunity for the project. There is a lot of Conroe-class Windows horsepower contributing, so, as practically everyone has observed (we used to call this "violent agreement" at Ye Olde Microprocessor Works), if the project can find means to deliver a noticeable fraction of the advantage of the current Linux ap to the Windows side, the project will get a nice jump in output, much more than ATLAS is contributing at the moment.

I'd score the 6.05 Windows ap as delivering a noticeable fraction, but with plenty more opportunity left. But a common code base and known build path is a good thing, even if it continues to leave us benighted slaves of Redmond in the rear.

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0

RE: even if it continues to

Message 85916 in response to message 85915

Quote:
even if it continues to leave us benighted slaves of Redmond in the rear.

Ni!

Anyway, I suspected the gap between the Windows and Linux apps had widened, but I lack the machines and the knowledge / time to prove it. As for your specific example, is that memory contention thing an architecture problem, meaning it happens on both Linux and Windows, or is it a flaw in the Windows memory management subsystem?

If I just spoke "Greek" to you, I may not have named the problem right (I think I did), but I mean the contention that happens on multi-core...

archae86
Joined: 6 Dec 05
Posts: 3161
Credit: 7281128377
RAC: 2030914

RE: but I'm meaning the

Message 85917 in response to message 85916

Quote:
but I'm meaning the contention that happens on multi-core...


I've not seen people compare the Einstein output loss per core of a Q6600 matched to an E6600 (2 vs. 4-core) on Windows vs. Linux running the same ap, which might be a way to address your question observationally.

But it won't get done--someone might build a test case in sufficiently pure assembler that one could really run the same ap both places, but I doubt anyone would sign up to a full-fledged science ap, just for the experiment. Without that, one is comparing the applications, not the OS, primarily, I think. I don't think either the Einstein or the SETI science ap sub-contracts much of its work to system calls.

I think the people who really know much about the (many) sorts of contention don't speak much in public. Lots of folks who "know" something they read on an enthusiasts' web site speak all too much.

I happen to own an almost perfectly matched E6600 and Q6600 (same motherboard, with same RAM, same timing settings ...). The Q6600 indeed gets a bit less than twice the work done on either SETI or Einstein, but not by a large enough ratio to keep me up nights with worry, or make it a significant target of opportunity for improvement. If running the current Linux ap showed a smaller degradation ratio, it would still rank several rungs down on the opportunity for improvement list, and I doubt that is even true.

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0

RE: I think the people who

Message 85918 in response to message 85917

Quote:


I think the people who really know much about the (many) sorts of contention don't speak much in public. Lots of folks who "know" something they read on an enthusiasts' web site speak all too much.

...

The Q6600 indeed gets a bit less than twice the work done on either SETI or Einstein, but not by a large enough ratio to keep me up nights with worry, or make it a significant target of opportunity for improvement.

No need to get testy. From what I recall, the discussion happened here about the contention, probably in one of the threads where you all were trying to determine the cycles... As I don't have a multi-core (I'm too poor and debt-laden), it is not something I really care that much about either, just thought I'd mention it...

/me goes back to my corner

Astro
Joined: 18 Jan 05
Posts: 257
Credit: 1000560
RAC: 0

OK, I've run it for several

OK, I've run it for several days. What follows is the ratio of avg cpu time per wu for each processor divided by the avg cpu time for the Q6600. For example, compared to my X2 6000, the Q6600 is 2% faster per core on Boincsimap, 1% faster on Primegrid, 9% faster on Docking, but 40% faster on Einstein. This disparity is what I was asking about in my first post and why I wasn't running all my AMDs at Einstein.

simap / ratio
Q6600 1
9950one 1.09
9950two 1.11
9600 1.21
x2 6000 1.02
x2 5200 1.07
x2 4800 1.2

Primegrid / ratio
Q6600 1
9950one 1.13
9950two 1.13
9600 1.24
x2 6000 1.01
x2 5200 1.12
x2 4800 1.25

Docking / ratio
Q6600 1
9950one 1.19
9950two 1.2
9600 1.42
x2 6000 1.09
x2 5200 1.19
x2 4800 1.34

Einstein / ratio
Q6600 1
9950one 1.52
9950two 1.45
9600 2.19
x2 6000 1.40
x2 5200 1.57
x2 4800 1.74
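
To put a number on the disparity, here is a minimal sketch that divides each host's Einstein ratio by the mean of its ratios on the other three projects, using the figures above. The function name `einstein_excess` and the choice of a plain mean are illustrative assumptions, not an established metric.

```python
from statistics import mean

# Ratios to the Q6600 (avg cpu time per wu / Q6600 avg cpu time), from above.
RATIOS = {
    "simap": {"9950one": 1.09, "9950two": 1.11, "9600": 1.21,
              "x2 6000": 1.02, "x2 5200": 1.07, "x2 4800": 1.20},
    "Primegrid": {"9950one": 1.13, "9950two": 1.13, "9600": 1.24,
                  "x2 6000": 1.01, "x2 5200": 1.12, "x2 4800": 1.25},
    "Docking": {"9950one": 1.19, "9950two": 1.20, "9600": 1.42,
                "x2 6000": 1.09, "x2 5200": 1.19, "x2 4800": 1.34},
    "Einstein": {"9950one": 1.52, "9950two": 1.45, "9600": 2.19,
                 "x2 6000": 1.40, "x2 5200": 1.57, "x2 4800": 1.74},
}
OTHER_PROJECTS = ("simap", "Primegrid", "Docking")


def einstein_excess(host):
    """Einstein ratio relative to the host's mean ratio on the other projects."""
    typical = mean(RATIOS[project][host] for project in OTHER_PROJECTS)
    return RATIOS["Einstein"][host] / typical


for host in RATIOS["Einstein"]:
    extra_time = (RATIOS["Einstein"][host] - 1) * 100
    print(f"{host:8s} needs {extra_time:3.0f}% more cpu time than the Q6600 "
          f"on Einstein, {einstein_excess(host):.2f}x its usual gap")
```

A value near 1.0 would mean Einstein treats the host like the other projects do; anything well above 1.0 is the extra gap being asked about.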
