Times (Elapsed / CPU) for BRP5/6/6-Beta on various CPU/GPU combos - DISCUSSION Thread

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 330,141,191
RAC: 645,623

Yep, same stepping says it all. The Refresh K models are the only ones with known differences (the soldered heat spreader).

MrS

Scanning for our furry friends since Jan 2002

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,494
Credit: 65,668,648,729
RAC: 54,138,400

Quote:
I suspect there is some other reason lurking deep in the system.


Perhaps, and perhaps not very deep either :-).

The Haswell is an i3-4130 (2 cores / 4 threads) @ 3.4 GHz with 2 free threads.
The Haswell Refresh is a G3258 (2 cores / 2 threads) @ 3.9 GHz with 1 free thread.

I had expected that supporting 4 GPU tasks with 1 free core would be a penalty (even though the 3.9 GHz is an obvious bonus) such that there might be a detrimental effect on the CPU component of the crunch time.

When I saw the G3258 averaging 618 secs of CPU time and the i3-4130 averaging more than 50% higher at 960 secs, I wondered if there was something beneficial about Haswell Refresh. Perhaps the benefit comes from 3.9 GHz versus 3.4 GHz, although that seems too big a difference for a fairly modest frequency increase. Perhaps part of the difference is not having the use of HT.
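A quick check of the numbers quoted above, as a sketch: it assumes (only as a first approximation) that CPU time scales with 1/clock, and compares the CPU time gap with what the clock difference alone would predict.

```python
# Figures quoted in the post above.
cpu_time_g3258 = 618.0   # s, Haswell Refresh G3258 @ 3.9 GHz
cpu_time_i3 = 960.0      # s, Haswell i3-4130 @ 3.4 GHz
clock_g3258 = 3.9        # GHz
clock_i3 = 3.4           # GHz


def percent_higher(a: float, b: float) -> float:
    """How much larger a is than b, in percent."""
    return (a / b - 1.0) * 100.0


time_gap = percent_higher(cpu_time_i3, cpu_time_g3258)
clock_gap = percent_higher(clock_g3258, clock_i3)

# Assumption: CPU time scales purely with 1/clock. Under that assumption the
# i3 would be expected to need only this long.
expected_i3 = cpu_time_g3258 * clock_g3258 / clock_i3

print(f"CPU time gap: {time_gap:.1f}%")
print(f"Clock gap:    {clock_gap:.1f}%")
print(f"i3 CPU time expected from clock alone: {expected_i3:.0f} s")
```

Under this simple model the clock accounts for roughly 15% of the roughly 55% gap, which is why something beyond frequency seems to be involved.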

Cheers,
Gary.

Jim1348
Joined: 19 Jan 06
Posts: 449
Credit: 208,736,569
RAC: 21,387

Quote:
Perhaps part of the difference is not having the use of HT.


It depends on the project, of course, but I run the WCG/CEP2 work units on both an Ivy Bridge i5-3550 (4 cores, no hyperthreading) and an i7-3770 (4 cores / 8 threads, hyperthreaded). Accounting for the difference in clock rates, I see only about a 12% to 15% improvement in overall throughput from hyperthreading. With other projects it is a bit more, but probably not over 25% in most cases.
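One way to "account for the difference in clock rates" is to compare throughput per GHz. The sketch below assumes throughput scales linearly with clock speed (an approximation), and the function name and input numbers are hypothetical, for illustration only; they are not measurements from this thread.

```python
def ht_gain_percent(tput_ht: float, clock_ht: float,
                    tput_no_ht: float, clock_no_ht: float) -> float:
    """Hypothetical helper: throughput gain from Hyper-Threading after
    normalising both machines to the same clock speed.

    Throughput can be any consistent unit (e.g. work units per day).
    Assumes throughput scales linearly with clock, which is only approximate.
    """
    per_ghz_ht = tput_ht / clock_ht
    per_ghz_no_ht = tput_no_ht / clock_no_ht
    return (per_ghz_ht / per_ghz_no_ht - 1.0) * 100.0


# Hypothetical inputs, for illustration only.
gain = ht_gain_percent(tput_ht=23.0, clock_ht=3.4, tput_no_ht=20.0, clock_no_ht=3.3)
print(f"HT gain after clock normalisation: {gain:.1f}%")
```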

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,516
Credit: 470,912,882
RAC: 54,659

We have now gathered enough confidence in the Beta app to release it as the official one. All your work summarizing the performance characteristics of the new app helped a great deal, so thank you very much indeed. Special thanks to Gary, who suggested and initiated this very structured and focused discussion in its current form.

Next steps:

As promised earlier, we will try to put up an additional CUDA app version that will use a newer CUDA version, at least 5.5.

HB

archae86
Joined: 6 Dec 05
Posts: 3,021
Credit: 4,990,428,990
RAC: 3,039,263

Quote:
a newer CUDA version, at least 5.5.


Great news. I hope your effort to try this is rewarded with higher throughput and not too painful a set of difficulties. In my dreams, I hope you will try CUDA 7, which seems more likely than earlier versions to find better ways to use Maxwell GPUs, but I'll loyally test whatever you find good enough to have us try. I've started shortening my queues in anticipation.

Jim1348
Joined: 19 Jan 06
Posts: 449
Credit: 208,736,569
RAC: 21,387

Quote:
In my dreams I hope you will try CUDA 7, which seems more likely to find better ways to use Maxwell GPUs than earlier ones, but I'll loyally test whatever you find good enough to have us try it. I've started shortening my queues in anticipation.


Me too. It should be pointed out, though it is probably obvious, that crunchers tend to go where their hardware can best be used, other things being equal. So in deciding on versions, you should consider not just the present user population, but also those who will be attracted by new applications, or who might leave if the grass is greener elsewhere. In other words, you need to lead your target a little.

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513,211,304
RAC: 0

I generally don't pay much attention to CPU times, as they are fairly meaningless outside the context of the system in question.

Elapsed times are more comparable between systems, and in some sense more easily verified (stopwatch for example!).
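To illustrate the difference between the two measures, here is a small Python sketch using the standard `time.perf_counter()` (wall-clock) and `time.process_time()` (CPU time of the current process). The GPU task is only simulated by sleeping, an assumption standing in for a process that mostly waits on the GPU; the `measure` helper is hypothetical.

```python
import time


def measure(task) -> tuple[float, float]:
    """Hypothetical helper: return (elapsed seconds, CPU seconds) for one run of task()."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    task()
    return time.perf_counter() - wall0, time.process_time() - cpu0


def mostly_waiting():
    # Stand-in for a GPU task: the CPU thread mostly waits for the GPU.
    time.sleep(0.2)


elapsed, cpu = measure(mostly_waiting)
print(f"elapsed {elapsed:.2f} s, CPU {cpu:.3f} s")
```

Elapsed time is what a stopwatch sees; CPU time counts only the time the process was actually executing on a core, so it also depends on how fast that core was clocked.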

However, this thread has piqued my interest, and I noticed the following.

If I run only BRP6 1.52 (x2) on both GPUs, the CPU times average around 980 s.

If I then run additional CPU tasks on the i3 CPU 530, namely GWS S6 Bucket 1.06 (X64) follow-ups, the CPU times for the BRP6 tasks DROP to around 720 s! That's a drop of over 25%.

Elapsed times do not appear to change either way.
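The size of that drop, computed from the two averages quoted above:

```python
cpu_gpu_only = 980.0    # s, average BRP6 CPU time with no CPU tasks running
cpu_with_load = 720.0   # s, average BRP6 CPU time with GW CPU tasks also running

drop = (cpu_gpu_only - cpu_with_load) / cpu_gpu_only * 100.0
print(f"CPU time drop: {drop:.1f}%")
```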

The system also feels more responsive for browsing when these extra tasks are running.

Not what I expected. Has anyone else noticed this, or can anyone explain it?

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 330,141,191
RAC: 645,623

Gary wrote:

The Haswell is an i3-4130 (2 cores / 4 threads) @ 3.4 GHz with 2 free threads.
The Haswell Refresh is a G3258 (2 cores / 2 threads) @ 3.9 GHz with 1 free thread.

Perhaps part of the difference is not having the use of HT.


Ahh, that's quite a difference! The 2 CPU threads on the i3 each run on a separate physical core (that's how OS schedulers handle HT CPUs). Whenever the Einstein app's CPU thread joins those 2 threads, it's guaranteed to share a physical core with one of them. That's OK, but it makes the Einstein thread take longer. On the Pentium, on the other hand, there's always a physical core free, so the CPU portion of the Einstein tasks completes more quickly.
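A toy model of the placement described above, as a sketch only: it assumes the scheduler always puts a new thread on the physical core with the fewest running threads. Real Windows and Linux schedulers are more complex, and both function names are hypothetical.

```python
def place_threads(physical_cores: int, threads_per_core: int,
                  busy_threads: int) -> list[int]:
    """Toy HT-aware scheduler: each new thread goes to the physical core
    with the fewest running threads. Returns the thread count per core."""
    load = [0] * physical_cores
    for _ in range(busy_threads):
        core = min(range(physical_cores), key=lambda c: load[c])
        if load[core] >= threads_per_core:
            raise ValueError("more threads than logical CPUs")
        load[core] += 1
    return load


def extra_thread_shares_core(physical_cores: int, threads_per_core: int,
                             busy_threads: int) -> bool:
    """Would one more thread (e.g. the GPU app's CPU work) have to share a
    physical core with an already running thread?"""
    load = place_threads(physical_cores, threads_per_core, busy_threads)
    return min(load) > 0


# i3-4130: 2 cores / 4 threads, 2 CPU tasks running.
print(extra_thread_shares_core(2, 2, 2))
# G3258: 2 cores / 2 threads, 1 CPU task running.
print(extra_thread_shares_core(2, 1, 1))
```

In this model the i3 leaves no physical core idle, while the Pentium keeps one completely free for the Einstein thread.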

@AgentB: I've also got an explanation for you. The Einstein tasks of the optimized 1.52 app use little CPU time. If you're not running any CPU tasks alongside them, the CPU will be at its idle / base frequency (1600 MHz, I think) when the Einstein tasks start. Ramping it up to full speed takes some time. If the Einstein task's CPU work is already finished, or mostly finished, by then, the average CPU clock speed will be well below the maximum.

If you run CPU tasks alongside the GPU tasks, the continuous load keeps the CPU clock up and reduces execution times. This effect is probably amplified by your CPU being a bit older, so it doesn't switch power states as quickly as newer hardware. It doesn't matter, though, as either way is fast enough to keep your GPUs fed (same elapsed times) :)
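A simple model of the ramp-up effect, as a sketch under stated assumptions: the clock is assumed to rise linearly from an idle frequency to full speed over a fixed ramp time, and a short burst of CPU work needs a fixed number of cycles. All numbers and function names are hypothetical, chosen only to illustrate why the same work takes more CPU time when the core starts from idle.

```python
import math


def cycles_done(t: float, f_idle: float, f_max: float, t_ramp: float) -> float:
    """Gigacycles executed after t seconds if the clock (GHz) ramps linearly
    from f_idle to f_max over t_ramp seconds and then stays at f_max."""
    if t <= t_ramp:
        return f_idle * t + (f_max - f_idle) * t * t / (2 * t_ramp)
    ramp = (f_idle + f_max) / 2 * t_ramp
    return ramp + f_max * (t - t_ramp)


def time_to_finish(work: float, f_idle: float, f_max: float, t_ramp: float) -> float:
    """Seconds needed to execute `work` gigacycles, starting at f_idle."""
    if work <= 0:
        return 0.0
    ramp = (f_idle + f_max) / 2 * t_ramp   # cycles executed during the ramp
    if work > ramp:
        return t_ramp + (work - ramp) / f_max
    a = (f_max - f_idle) / (2 * t_ramp)
    if a == 0:
        return work / f_idle
    # Work finishes during the ramp: solve a*t^2 + f_idle*t - work = 0.
    return (-f_idle + math.sqrt(f_idle ** 2 + 4 * a * work)) / (2 * a)


# Hypothetical values, for illustration only.
F_IDLE, F_MAX, T_RAMP = 1.6, 2.93, 0.05   # GHz, GHz, seconds
BURST = 0.1                               # gigacycles of CPU work per burst

cold = time_to_finish(BURST, F_IDLE, F_MAX, T_RAMP)   # core was idle
hot = time_to_finish(BURST, F_MAX, F_MAX, T_RAMP)     # other tasks kept clock up
print(f"from idle: {cold * 1000:.1f} ms, already at full clock: {hot * 1000:.1f} ms")
```

The same burst of work is reported as more CPU time when it starts from idle, while the GPU, if it is the bottleneck, finishes in the same elapsed time either way.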

MrS

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,494
Credit: 65,668,648,729
RAC: 54,138,400

Quote:
... If you're not running any CPU tasks along with them, the CPU will be at idle / base frequency (1600 MHz I think) when the Einstein tasks start. Ramping it up to full speed takes some time. If Einstein is already finished, or at least most of it, the average CPU clock speed will be well below the maximum clock speed.


Thanks very much for pointing this out! I've sometimes seen people say that they run multiple GPU tasks and leave ALL the cores free. I'm sure that helps with both power consumption and temperature but may hinder GPU performance if all the CPU cores are likely to be running at idle frequency most of the time. I remember looking at such a host some time ago and expecting to find the fastest crunch times but actually seeing what seemed to be slightly worse performance. That all makes sense now. Thanks for the explanation.

Cheers,
Gary.

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513,211,304
RAC: 0

Quote:
I've sometimes seen people say that they run multiple GPU tasks and leave ALL the cores free. I'm sure that helps with both power consumption and temperature but may hinder GPU performance if all the CPU cores are likely to be running at idle frequency most of the time.

Thanks also, MrS. Every day at E@H is a school day. So the CPU isn't doing more work when no CPU tasks are running, it's just doing it more slowly.

With the old BRP4 tasks, the CPU load on my system was much higher, and running two GPUs would keep all four CPU threads busy running 6 tasks at around the 25% mark, which explains why I did not see this before. With BRP4 I noticed that any CPU load had a negative effect on GPU elapsed time, but thinking about it now, a single card with a better processor would probably benefit from some CPU load to keep the CPU clocked up and feeding the GPU.

BRP6 is a totally different GPU app, so much so that I'm toying with the idea of running a third GPU in a PCIe x1 slot. Second-hand GTX 460s are getting cheap on eBay, and I have the power and cooling capacity.
