Times (Elapsed / CPU) for BRP5/6/6-Beta on various CPU/GPU combos - DISCUSSION Thread

tapir
Joined: 19 Mar 05
Posts: 23
Credit: 462935446
RAC: 0

Improvement on GeForce GTX 570:

HOST: AMD Phenom II X6 1090T + 2 GTX 570
PCIe 2.0 2 x 16, 2 x 2 GB 1600, Windows XP Professional x86
running 0.2 CPUs + 0.5 NVIDIA GPUs
running GW Follow-up and Climate on CPU

BRP6 v1.39: 3 CPU cores free (CPU usage 78%)
2 x BRP6 v1.39: 15,050 s + 5,700 s CPU time
GPU temp 80C

BRP6 v1.52: 2 CPU cores free (CPU usage 71%)
2 x BRP6 v1.52: 8,250 s + 500 s CPU time
GPU temp 97C
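For scale, the figures above imply roughly a 1.8x throughput gain per card. A quick back-of-the-envelope sketch (reading the first figure of each pair as elapsed time for the two-task batch — an assumption, since the post doesn't label them):

```python
# Rough throughput comparison from tapir's reported times.
# Assumes 2 concurrent BRP6 tasks per batch, per GPU.
elapsed_139 = 15050  # seconds per 2-task batch, app v1.39
elapsed_152 = 8250   # seconds per 2-task batch, app v1.52

tasks_per_day_139 = 2 * 86400 / elapsed_139
tasks_per_day_152 = 2 * 86400 / elapsed_152
speedup = elapsed_139 / elapsed_152

print(f"v1.39: {tasks_per_day_139:.1f} tasks/day per GPU")
print(f"v1.52: {tasks_per_day_152:.1f} tasks/day per GPU")
print(f"speedup: {speedup:.2f}x")
```

That extra throughput is, of course, exactly where the extra heat in the 97C reading comes from.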

Mumak
Joined: 26 Feb 13
Posts: 325
Credit: 3529874050
RAC: 1455509

Quote:
GPU temp 97C

This is certainly something you should avoid.

-----

tapir
Joined: 19 Mar 05
Posts: 23
Credit: 462935446
RAC: 0

Yes I did

2 ULTRA KAZE fans (3000 rpm, 133 CFM) push air onto the GPUs and temps drop to 82C.

I just want to show the temp difference (under the same conditions) between the two app versions.

1.52 runs 17C warmer on the GTX 570.

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 579346861
RAC: 197894

Quote:
2. Why is the Sandy Bridge host significantly better than the Westmere? My impression is that the CPU is less of a factor so what is hampering the Xeon??


He's reporting improvements, not absolute performance. So either the Westmere is bad, or it was so good to begin with that there was less room for improvement. From his older posts it seems to be the latter. The most probable reason is more PCIe bandwidth (dedicated 16x PCIe 2 lanes on the Westmere vs. 2 x 8x PCIe 2 on the Sandy), but a larger L3 cache and more main memory bandwidth don't hurt either.

archae86 wrote:
As the season is warming, I may soon throw away some of this performance by throttling to reduce room heating in the sun-afflicted hours, but it is available to me at will.


Side note: I recall you're using TThrottle. Do you know how it throttles? Does it pause the CUDA tasks, or does it reduce the GPU power and/or temperature target? If it's the former, it's inefficient: the GPU runs at full steam for a brief period, at inefficiently high voltage, because it doesn't know that it should temper itself a bit, and then gets paused to cool down. It would be more efficient to run continuously at moderate voltage and slightly reduced clocks.

MrS

Scanning for our furry friends since Jan 2002

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7231104755
RAC: 1160164

On the Sandy Bridge vs. Westmere matter: very tentatively, I think that as I have them configured, the Westmere may well have superior memory/IO performance, with the Sandy Bridge having superior CPU throughput, or possibly task-switch latency.

As the 1.39 application version put a lot of stress on memory/IO, the Westmere won; but with the change to 1.52 that need was wonderfully reduced, so that the latency or actual computational capacity of the host CPU executing the service task assumed greater relative importance.

(A long time ago, we joked that certain people made hand-waving arguments with such vigor as nearly to achieve lift-off. I'll confess that in this post I'm in that domain.)

Regarding TThrottle: I'm pretty sure it does not use GPU power or temperature targeting. I agree that voltage reduction, where attainable, rather than task interruption would be an energetically much more efficient means. I don't know whether the requisite control interface is readily available to the developer, nor whether he could readily sort out precisely which GPU installations would respond appropriately to that form of direction. Maybe on reflection I'll post an inquiry along those lines on his user board.

Meanwhile, I may want to re-think my goals of shifting thermal load to desired times of day vs. simple overall power efficiency. If I decide my household power consumption is higher than I'd like, but discard interest in heating impact, I should probably pursue interventions to lower GPU operating voltage, by whatever name they are styled. If I want to shift heat away from the most inconvenient parts of the day, an alternative to TThrottle is BOINC's operating-hours-of-the-day facility, though I recall not liking it much the time I tried it. Even here in New Mexico the sun does not shine every day, and, as the year rolls by, the overheated time period shifts anyway.

For those looking at my hosts, currently I have TThrottle running (the overhead is small), but the limits set high enough that it is not intervening. The room housing two of the hosts is, however, rapidly heading into the part of the year when it is too hot in the morning sun, while the third is in the room where it is slowly getting too hot in the afternoon. So once my RAC gets close to catching up to my 1.52 raised production, I'll probably resume some form of power conservation. I have little doubt that the otherwise wonderful beta applications have raised my household monthly energy consumption above my informal target.

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 579346861
RAC: 197894

Quote:
Regarding TThrottle... Maybe on reflection I'll post an inquiry along those lines on his user board.


That would be nice :)
I had thought of doing so myself, but if you find the time I'd be happy to have delegated the task to competent hands. And to extend my previous post: he would not necessarily have to interface with the video driver himself (I think those settings are adjusted via NVAPI) but might be able to cooperate with the author of one of the regular tweak utilities. Bundle them with TThrottle and let them take care of interfacing with the driver properly.

Compared to simply modifying the temperature target in those tools, TThrottle could offer further options, like only throttling during specific times of day (as seems useful for you). On the other hand, if you configure your GPUs primarily via a temperature target, things should adjust themselves nicely whenever it gets warm - no further input would be needed.

I'm running my nVidia at around 1.1 V to stay at a fairly efficient operating point. I can't set this directly on Kepler or Maxwell, so I adjust the power target for a given software load so that the result is roughly what I want.
Actually, I had thought about setting it directly. It should be possible if I modify the GPU BIOS to only support boost states up to the desired voltage and then remove the power-target limitation. This way I'd lose some performance under light loads, but would gain overall efficiency.

MrS

Scanning for our furry friends since Jan 2002

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

@Gary regarding your notes on GPU time variability in the Results Only thread here http://einsteinathome.org/node/198004&nowrap=true#139604

I do not see much variability in recent tasks.

I did see some variability in early tasks, though.

Instead of using the last 100 tasks - how do the numbers look using the last 50?
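AgentB's suggestion amounts to comparing the elapsed-time spread over two trailing windows. A minimal sketch of that comparison — the task times below are made-up placeholders, not real data from any of the hosts discussed here; real values would come from the host's task list on the project site:

```python
# Compare elapsed-time statistics over the last 100 vs the last 50 tasks.
from statistics import mean, stdev

# Placeholder elapsed times in seconds (100 fake data points with some
# deliberately "unfavourable" 5600 s outliers mixed in).
elapsed = [4830, 4910, 4788, 5600, 4855, 4902, 4799, 4810] * 12 + [4820] * 4

def summarise(times, n):
    """Mean and standard deviation over the most recent n tasks."""
    window = times[-n:]
    return mean(window), stdev(window)

for n in (100, 50):
    m, s = summarise(elapsed, n)
    print(f"last {n:3d}: mean {m:7.1f} s, stdev {s:6.1f} s "
          f"({100 * s / m:.1f}% of mean)")
```

If the early variability has settled down, the 50-task window should show a noticeably smaller relative spread than the 100-task one.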

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117815235133
RAC: 34700528

I think (from what HB has been trying to drum into us) that the variability is a function of data 'favourableness' (if that's even a word), in that lots of high-scoring 'toplist' candidates found very early will save a lot of time later on by preventing lots of expensive GPU↔CPU memory transfers.

This seems to show in the following graphic, where you can see early 'unfavourable data' tasks, followed by a string of 'good' data before relapsing back to more unfavourable data. Notice (at around task 25) that the two 'worst' tasks are immediately bracketed by the two 'best' tasks - this hints at some interaction going on there. You can see evidence of this later on too.

Perhaps this is being exacerbated by running more concurrent tasks than is appropriate for the hardware combination being used. I intend to test this next with the same model of GPU in a very similar host (a G645 rather than a G640), but running 2x rather than 3x. I should be able to find enough data points to give a decent comparison.

Cheers,
Gary.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7231104755
RAC: 1160164

Gary,

It is worth remembering that we only report elapsed time, not what fraction of that time a particular WU (assuming one is running at multiplicity greater than 1X) actually enjoys the services of the GPU.

Imagine two simultaneously resident WUs, one of which demands CPU service substantially more frequently than the other, but for which that CPU service is in practice accomplished in an infinitesimal amount of time (some combination of a negligible amount of computation or transfer actually requested, low latency, and high CPU throughput). If task management for the GPU tends to leave it servicing the current task until that task is set aside awaiting service, then in the hypothesized case the more frequently demanding task will be reported with a much longer elapsed time on the GPU than the other, even though, in the case I've made up, the two consumed a negligibly different amount of resource.

Something at least partially akin to this oversimplified picture seems to be going on: multiple careful observers have noted a sort of base population of ET/CPU times, but WUs running on the GPU at the same time as one of the less-favored WUs actually reported materially shorter ETs than that base population - presumably through no special virtue of their own.
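The oversimplified picture above can be turned into a toy round-robin simulation. The chunk sizes and scheduling policy here are pure assumptions for illustration - this is not how the real BRP application or driver actually schedules work:

```python
# Toy simulation: two WUs share a GPU round-robin. Each runs one chunk
# of GPU work, then yields for CPU service, which we assume completes
# instantaneously. Both tasks consume identical total GPU time, yet
# report very different elapsed times.
from collections import deque

def simulate(chunks_a, chunks_b):
    """Return the finishing (elapsed) time of each task."""
    queues = {"A": deque(chunks_a), "B": deque(chunks_b)}
    order = deque(["A", "B"])  # ready queue, round-robin at yields
    clock = 0.0
    finished = {}
    while order:
        name = order.popleft()
        clock += queues[name].popleft()  # GPU runs one chunk
        if queues[name]:
            order.append(name)  # serviced instantly, rejoins the queue
        else:
            finished[name] = clock
    return finished

# Both tasks need 100 units of GPU work in total.
# A yields every 1 unit (frequent CPU service); B yields every 25 units.
times = simulate([1.0] * 100, [25.0] * 4)
print(times)
```

In this run the frequently-yielding task A reports an elapsed time nearly twice task B's, despite identical GPU consumption - and B looks "shorter than the base population" purely because it shared the GPU with A.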

Aside from that, I've noticed in my many adventures of fiddling with process priorities, core affinities, and the number of allowed CPU jobs, that for some applications on some hosts, certain values of these adjustments would make the ETs for a stream of work far more equal than other settings did.

Lots of words, and not much of an answer buried in them - but maybe more food for thought on the ET-variation issue. On 1.47/1.50, I'm convinced that a lot of the variability was driven by a fundamental WU interaction with the code, and the considerable correlation of CPU and ET variation helped give this credibility. I still think there are real "somewhat worse" units for 1.52, but the strong host-to-host variation in 1.52 variability suggests there is also some simple "bag squeezing" of the type I sketched above going on.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117815235133
RAC: 34700528

Quote:
It is worth remembering that we only report elapsed time, not what fraction of that time a particular WU (assuming one is running at multiplicity greater than 1X) actually enjoys the services of the GPU.


Hi Peter,

Thanks for taking the time to contribute. I've seen all the points you refer to, but I don't claim any real understanding of the issues involved. I've seen all your previous posts about process priorities, core affinities, and the use of Process Lasso, and I certainly respect your expertise with tweaking things the way you have previously documented.

I regard myself as very much a complete novice when it comes to understanding even the most basic things about the inner workings of the kernel and how it handles process scheduling in the complex CPU/GPU hardware/software environment we are using. You use Windows, I use Linux. I imagine just that difference alone could be having quite an effect on the numbers we see when we start studying the variations in some detail.

I see my role as one of presenting data. I hope there will be others with the computer science background I don't have, who can step up and comment on what it all means. I'm sure you will have noticed that I've started rolling out some data on my NVIDIA GPUs. I intend to keep doing that for a while yet. It seems that something new pops up each time I process the next host. Take a look at Host 08, the data for which I've just posted.

Cheers,
Gary.
