Nvidia Pascal and AMD Polaris, starting with GTX 1080/1070, and the AMD 480

Todderbert
Joined: 3 Jun 15
Posts: 1,285
Credit: 645,963,019
RAC: 0

Mumak wrote:

I too increased the mem clock to ~7408 (QDR) and times are now ~2730 s. That's only 25% slower than my Tesla K20, which consumes 110W.

Now trying 7608 QDR mem...
EDIT: Well, this is odd: running at this clock, run times are actually ~40 s longer. Not sure why, as all reported parameters seem to be OK.

 

That's interesting.  My 1050Ti is now set at 1877 MHz (7508 effective), and is turning in units just under 2700 s.

Mumak
Joined: 26 Feb 13
Posts: 325
Credit: 1,932,011,719
RAC: 0

Don't know what exactly happened, but even going back to 7408 still produces run times of 2800 s, while at the beginning they were ~2730 s. Might be some driver glitch; will try to reboot the machine.

-----

Todderbert
Joined: 3 Jun 15
Posts: 1,285
Credit: 645,963,019
RAC: 0

Mumak wrote:
Don't know what exactly happened, but even going back to 7408 still produces run times of 2800 s, while at the beginning they were ~2730 s. Might be some driver glitch; will try to reboot the machine.

When I increased to 7600, the same slowdown happened to me.  Maybe a clock issue?  I'm back to +250, which I'm going to leave for now.

archae86
Joined: 6 Dec 05
Posts: 3,021
Credit: 5,013,588,041
RAC: 3,028,886

I've made multiplicity comparison runs of my 1050 (not Ti) card running BRP4G/cuda55 work on my Westmere Windows 7 host.  All runs were at a requested core clock of +250 and memory clock of +500, with the request made via MSI Afterburner.  These gave a usual displayed core clock of 1911 and memory clock of 2002 on the GPU-Z scale.  All runs were with zero BOINC CPU tasks running, and I avoided personal use of the machine.

I give the results in theoretical credit/day, using the average observed Elapsed Time, and the credit rate of 693.  I make no deduction for expected system downtime, nor for the (currently considerable) rate of bad-beam invalid losses.

nX    ET      Credit/day  IncWatts
1X  0:28:04    51,306      53.9
2X  0:47:19    60,867      59.6
3X  1:10:21    61,407      59.8
4X  1:34:55    60,685      59.3

The incremental power consumption is the system increase above idle, going to the card, the CPU, and other motherboard components, plus conversion inefficiency in the main box power supply.  I guess that just over 50 watts is going to the actual 1050 card at the 2X/3X/4X conditions.  I was unable to get the card to run at 5X, though that seems unlikely to have been a desirable condition anyway.

These results are likely considerably dependent on application, and quite possibly somewhat dependent on host characteristics, OS, and driver.  The take-away message is that one gets a nice productivity bump of over 18% in going from 1X to 2X, and on my setup a small additional increase of about 1% in going on up to 3X, with a gentle turndown to lower production at 4X.

My advice remains what it has been for a long time: unless your card cannot run multiple GPU tasks, it is best to experiment to find the best result for your application, host, driver, etc.  But if you are short on time or interest to experiment much, 2X is the place to start. If it works at all, it is almost certain to be more productive than 1X, and it seldom gives up much to possibly slightly better settings at higher multiplicity.

Todderbert
Joined: 3 Jun 15
Posts: 1,285
Credit: 645,963,019
RAC: 0

Nice write-up, Archae86.

My current EVGA 1050Ti SC setup yields:

GPU-Z reporting:

Memory: 1877 MHz Samsung, +250 offset in MSI Afterburner.

GPU core: 1784 MHz factory boost, with zero offset in MSI Afterburner.

2X: 2690 s (1345 s per task)

Wattage consumed at the wall: 55 W

Temp: 53 °C, fan set to 50%. Still very quiet.

Ambient room temp: 23.3 °C

archae86
Joined: 6 Dec 05
Posts: 3,021
Credit: 5,013,588,041
RAC: 3,028,886

I've tabulated a partially populated power and purchase-cost efficiency table for the Nvidia cards currently running on my hosts.  All are overclocked, at levels I believe to be long-term stable.  Both credit rates and power are measured running BRP4G/cuda55.  The power measurements are for system incremental power above idle to run the GPU task load (some are at 2X, some at 3X).  Sadly, I don't have current power measurements for most of them.

Card     Cred/day IncWatts price DayCred/$ DayCred/IncWatts
750       46,652   50.7     90     518         920
750Ti     50,052           100     501
1050      61,407   59.8    110     558       1,027
970       88,389           230     384
1060_3GB  96,375           199     484
1060_6GB 104,180           240     434
1070     150,785   135.8   400     377       1,111

The prices I have used here are meant to be currently available new purchase prices in US$, usually from NewEgg, for the cheapest card of the model.
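The efficiency columns are simple ratios of the credit/day figure to price and to incremental watts. A small sketch, using only the rows above that have complete data (card names and values copied from the table; rounding may differ by a point from the post's figures):

```python
# Sketch of how the efficiency columns are derived: credit/day divided by
# purchase price (DayCred/$) and by incremental watts (DayCred/IncWatts).
# Values are copied from the table above; only fully populated rows included.

cards = {
    # name:    (cred_day, inc_watts, price_usd)
    "750":     (46652,   50.7,   90),
    "1050":    (61407,   59.8,  110),
    "1070":    (150785, 135.8,  400),
}

def day_cred_per_dollar(cred_day, price_usd):
    """Purchase-cost efficiency: daily credit per dollar of card price."""
    return cred_day / price_usd

def day_cred_per_watt(cred_day, inc_watts):
    """Power efficiency: daily credit per incremental watt above idle."""
    return cred_day / inc_watts

for name, (cred, watts, price) in cards.items():
    print(f"{name:5s}  {day_cred_per_dollar(cred, price):5.0f}"
          f"  {day_cred_per_watt(cred, watts):6.0f}")
```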

While I don't have a power measurement to show, I am confident that the 970 is the big loser on this list regarding power efficiency. I don't think it was a bad card for its time, but here it is compared with a Pascal card and with the 750 cards, which were remarkably good in power efficiency for their generation.

In the hypothetical case of someone seeking to add a little Einstein-directed GPU capacity to a large existing PC fleet, the 1050 is an obvious choice. It will likely fit in most systems with an available PCIe slot without needing a power supply upgrade, and the power and purchase price efficiency are very good. However purpose-built systems aimed at Einstein system efficiency have good reason to aim higher in the Einstein product line, as amortizing the non-GPU overhead in host system cost, assembly and configuration effort, and power consumption will give better return from a more capable card.

mmonnin
Joined: 29 May 16
Posts: 284
Credit: 2,016,797,643
RAC: 907,493

Can clocks be given in MHz instead of offsets? Everyone's starting point is different due to Nvidia's automatic boost. Running the same offset on my cards results in different clocks in MHz, because projects differ in power draw and thus in the heat they generate. Offsets can be raised and raised while actual clocks remain the same. An offset number is meaningless without a starting point.
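To illustrate the point, here is a hypothetical helper showing both conversions that come up in this thread: an offset only yields an absolute clock if you know the clock it is applied to, and GDDR5's reported memory clock is one quarter of the effective (QDR) data rate, e.g. the 1877 MHz / 7508 figures quoted earlier. Function names and the example base clock are illustrative, not from any tool:

```python
# Hypothetical helpers for relating offsets, absolute clocks, and QDR rates.
# GDDR5 is quad-pumped: four transfers per clock per pin, so the effective
# data rate is 4x the reported memory clock.

def effective_data_rate(memory_clock_mhz, pump_factor=4):
    """Effective GDDR5 data rate in MT/s from the reported clock in MHz."""
    return memory_clock_mhz * pump_factor

def absolute_clock(observed_stock_mhz, offset_mhz):
    """Absolute clock = whatever the card actually runs at stock, plus offset.

    The stock value varies per card and per load (automatic boost), which is
    exactly why an offset alone is not comparable across systems.
    """
    return observed_stock_mhz + offset_mhz

print(effective_data_rate(1877))       # 7508, matching the GPU-Z reading above
print(absolute_clock(1627, 250))       # 1877, assuming a 1627 MHz stock clock
```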
