Nvidia Pascal and AMD Polaris, starting with GTX 1080/1070, and the AMD 480

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7225614931
RAC: 1052139

I join Gamboleer and AgentB in having an RX 480 card on order. Mine is a NewEgg order for a Sapphire 4G RX 480. Its primary favorable aspects were that it actually carries the $199.95 price which got much publicity before launch but has not been much available just yet, and that NewEgg was taking backorders for it. Before this I three times clicked on NewEgg order buttons for other RX 480 cards represented as being in stock, only to have a dreaded message, something like "our web site is having problems just now", show up when I tried to view the cart for more than a couple of seconds.

As I doubt the card will show up for a couple of weeks if not more, I obviously won't be competitive with the earliest owners for first posted information, but at this price this may be a very fine value of a card, and the clearest way for me to compare it with the GTX 1070 (which ain't bad) is to own and operate one. My current guess is that this model of 480 will clobber any offered GTX 1070 or 1080 for performance per unit purchase price. Past history suggests it may fall somewhat behind the 1070 on power consumption per unit performance. Testing will tell. Comparison to the expected GTX 1060 or 1050 is highly speculative at this point. So too is the question as to whether other Polaris models below the 480 may offer even more compelling performance for price.

floyd
Joined: 12 Sep 11
Posts: 133
Credit: 186610495
RAC: 0

Beware if you plan to go for a 4GB RX 480. They usually seem to have slower memory than the 8GB versions, possibly resulting in a noticeable performance loss here at Einstein. The Sapphire could be an exception.

Mumak
Joined: 26 Feb 13
Posts: 325
Credit: 3524817889
RAC: 1530417

I was promised an RX 480 a few weeks ahead of launch. But apparently they didn't have enough review units, so now I have to wait a few more weeks for another batch. Very disappointed...

-----

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7225614931
RAC: 1052139

Earlier in this thread I speculated, with no data beyond historic tendencies, that Polaris might not be fully power competitive with Pascal. While we don't yet have Einstein-specific data on this point, there begins to be some actual data on the initial Polaris offering, the RX 480, and power.

There is rather a lot of chatter about concerning power and the RX 480. People planning to place an order should probably look around and see whether they are troubled. I, for one, don't plan to cancel my pre-order.

The concern has two basic parts:
1. The board is widely observed to consume quite a lot of power per unit performance considering the generation.
2. Multiple respected reviewers have published measurements appearing to show the card drawing more than the connection specification maximum from the motherboard power connection and, less widely reported, also from the 6-pin connector. This is true in default settings, and of course considerably worsened in some overclocking configurations.

While I have a really extensive background in electronic design, I lack specific competence in the practical issues involved in specific current levels through the types of connections involved, so have no independent opinion on how likely the specification transgression is to have practical impact to a particular user.

A likely mitigating factor for this community may be that Einstein crunching may use rather less power than many games and benchmarks. I am nearly certain this has been true for the recent Nvidia offerings I've brought into service (750, 970, and 1070) and would not be surprised if it tends to be true of the AMD products as well.

On the other hand, total power consumption is a very real interest of mine, and unless the Polaris products are fairly near to power efficiency parity with the Pascal products, I'm not likely to buy a second one.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7225614931
RAC: 1052139

RE: RE: Any wisdom out

Quote:
Quote:
Any wisdom out there on "beyond 4X".

Thanks for the various suggestions. I've decided to kill two birds with one stone by revising my already started run at 3X GPU + 1 CPU job at two increments below maximum core and memory clock to run with zero CPU jobs, while setting the fraction of allowed CPUs to 100%, and the "GPU utilization factor of BRP apps" to 0.19, but putting my requested queue at 3 days.


This worked.

Just now I raised my queue request from 3.15 days to just under 4 days intending to trigger the first downloads in a day, having yesterday set the utilization factor to .19 and allowed CPUs to 100%. I'd forgotten I had all CPU tasks suspended, so needed another minute to fix that. But on releasing all the CPU tasks it restarted one, and started three new ones. As soon as the first fresh GPU tasks carrying the message of my desire to run 5X were downloaded, one of the four running CPU tasks paused, and two new GPU tasks started so a full 5 GPU tasks were running as requested. I've suspended all my CPU tasks again, and started a 5X run.

Meanwhile my 3X GPU run with zero CPU tasks, at 2 increments less than the maximum core and memory clocks observed to succeed, averaged 1:34:01 elapsed time, for a calculated 202,177 credits/day for the system, all from the GPU.
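The credits/day figure can be reproduced from the elapsed time and the number of tasks run at once. This is a minimal sketch, not the poster's own spreadsheet: the function names are made up for illustration, and the 4,400 credits per BRP6 task is an assumption (it is not stated in the post, but it reproduces the quoted number).

```python
def elapsed_to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' elapsed time string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def daily_credit(elapsed: str, tasks_at_once: int, credit_per_task: float) -> float:
    """Credits per day when `tasks_at_once` tasks each take `elapsed` to finish."""
    tasks_per_day = tasks_at_once * 86400 / elapsed_to_seconds(elapsed)
    return tasks_per_day * credit_per_task


# ASSUMPTION: credit granted per BRP6 task, inferred from the quoted figures.
ASSUMED_BRP6_CREDIT = 4400

print(round(daily_credit("1:34:01", 3, ASSUMED_BRP6_CREDIT)))  # 202177
```

Running several tasks at once lengthens each task's elapsed time, so the useful comparison between multiplicities is this daily rate rather than the elapsed time alone.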

I may finally understand my previous failure to get to 5X. I had CPU tasks in queue, so stopping them all would obviously put me in calculated deadline trouble, but in only allowing 40% of CPUs to be "used" (and later 48%) the 5*.2 = 1.0 CPUs calculated as needed to support 5 GPU tasks would not "leave" a full CPU to run a CPU task. I suspect had I set 52% instead of 48% all would have been well. Or maybe I fumbled in a simpler way.
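The arithmetic behind that conjecture can be sketched as follows. Everything here is a simplification under stated assumptions: the host is taken to have 4 logical CPUs (not stated in the post, but it is the count for which 48% versus 52% makes a difference), the allowed-CPU percentage is assumed to round down to whole CPUs, and the real BOINC scheduler is considerably more involved. The function names are hypothetical.

```python
import math


def gpu_tasks_at_once(utilization_factor: float) -> int:
    """Whole number of tasks that fit on one GPU if each claims this fraction."""
    return math.floor(1.0 / utilization_factor + 1e-9)


def cpus_left_for_cpu_tasks(n_cpus: int, allowed_percent: float,
                            gpu_tasks: int, cpu_per_gpu_task: float = 0.2) -> float:
    """CPUs remaining for CPU tasks after reserving support for the GPU tasks.

    ASSUMPTION: the allowed percentage is rounded down to whole CPUs.
    """
    usable_cpus = math.floor(n_cpus * allowed_percent / 100 + 1e-9)
    return usable_cpus - gpu_tasks * cpu_per_gpu_task


tasks = gpu_tasks_at_once(0.19)                 # factor 0.19 -> 5 tasks per GPU
for percent in (40, 48, 52):
    left = cpus_left_for_cpu_tasks(4, percent, tasks)   # ASSUMPTION: 4 CPUs
    print(f"{percent}% allowed: {left:.1f} CPUs left for CPU tasks")
```

Under these assumptions 40% and 48% both leave no whole CPU for a CPU task once five GPU tasks are supported, while 52% leaves exactly one, which matches the guess in the post.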

Mumak
Joined: 26 Feb 13
Posts: 325
Credit: 3524817889
RAC: 1530417
Anonymous

RE: A bit more about

A lot to digest. It will be interesting to see some of the numbers here on the various cards because it is here that matters since most here do not seem to be gamesters. Here, Here!!!!!! :>P

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7225614931
RAC: 1052139

I consider this an attempt at a fair first-draft comparison of the RX 480 and the GTX 1070 running Einstein BRP6/CUDA55, using for the RX 480 data reported by Gamboleer and for the GTX 1070 data I have observed on the system I call Stoll9.
I've recycled a format I used earlier in this thread. The two columns of 1070 data I show here are, first, one running stock clocks, 3 GPU tasks, and one CPU task, as the closest condition I have available to the way Gamboleer ran the data I am showing, and second, one at 5X GPU plus 3 CPU GW tasks, at what I believe to be a sustainable overclock, which is the most productive 1070 configuration I have demonstrated on my system.
[pre]
  RX 480      1070       1070
   stock     stock  stable OC
      3X        3X         5X
       3         3          5  Number of BRP6 GPU tasks at once
       0         1          3  Number of 1.04 G Wave F tasks running at once
 2:03:00   1:54:05    2:35:32  Average elapsed time for GPU tasks
 -:--:--   6:46:44    8:04:17  Average elapsed time for CPU tasks
 161,085   166,615    203,686  Daily credit rate, GPU tasks
       0     3,540      8,920  Daily credit rate, CPU tasks
 161,085   170,155    212,607  System total daily credit rate
     315     185.6      225.6  System power draw at the wall (watts)
    ????    1860.5     2062.5  average core clock rate
    ????    1901.2       2304  memory clock rate
    ????      64.4       65.2  degrees C average GPU temperature
    ????        68         69  average fan speed percentage
     ~95        94         96  average GPU load percentage
     ~60        87         89  average memory controller load percentage
     185     122.8      162.7  incremental watts attributable to Einstein work
     833      1386       1307  credit/day per incremental watt
[/pre]
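The last row of the table is the system total daily credit divided by the incremental watts. A minimal sketch of that division for the two 1070 columns, using only numbers printed in the table (the function and dictionary names are illustrative):

```python
def credit_per_incremental_watt(total_daily_credit: float,
                                incremental_watts: float) -> float:
    """Daily credit earned per watt of extra wall power drawn by Einstein work."""
    return total_daily_credit / incremental_watts


# (system total daily credit, incremental watts) as printed in the table
columns = {
    "1070 stock 3X": (170_155, 122.8),
    "1070 stable OC 5X": (212_607, 162.7),
}

for label, (credit, watts) in columns.items():
    print(f"{label}: {credit_per_incremental_watt(credit, watts):.0f}")
# 1070 stock 3X: 1386
# 1070 stable OC 5X: 1307
```

Applied to the RX 480 column as printed (161,085 credits and 185 watts), the same division gives about 871 rather than 833, so that column may rest on slightly different inputs. Either value leaves the 1070 well ahead on this measure.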

I've neglected to show a price, but the Founders Edition 1070 I used carries a list price of about $450, and although the claimed baseline MSRP for 1070s is about $380, there has been little availability so far below about $420. While the claimed baseline MSRP for the RX 480 is announced as about $200, very little product has been available so far below $250, and there is some concern that the RAM speed on the 4G cards slated to be cheapest may hurt their Einstein productivity. But I was actually able to place a backorder for a $200 Sapphire 4G RX 480 card a few days ago.

Observations:

1. The RX 480 wins big time over the 1070 on purchase price specific Einstein BRP6 productivity at default clock rates.

2. The GTX1070 has a big power productivity advantage over the RX 480 any way you slice it.

3. We don't have much basis for guessing the relative overclock response of the two cards. My GTX 1070 got a lot of benefit out of memory clock well beyond stock, and appreciable benefit out of core clock overclock. It also liked to have the fan curve lifted. While the details will be important, it seems unlikely that relative overclock ability will differ sufficiently to invalidate my first two observations.

4. With the current applications and drivers, the AMD card consumes more CPU support than does the Nvidia card. This will probably cause more host-dependent output variability. In this test the AMD card was probably at a disadvantage in depending on a host with lower per-core computational oomph than the Nvidia card, but I doubt this was a big enough effect to change the first two observations greatly.

5. The Nvidia card may be suffering an opportunity loss in using an application built using the CUDA55 environment. This is old enough that it lacks not only Pascal optimizations, but also Maxwell optimizations. When we progressed from a CUDA32 build to a CUDA55 build at Einstein the two then most recent Nvidia generations got a really substantial benefit both in increased GPU output, and in decreased required CPU support, while older generations got none. So we have an historic example of generation-specific CUDA support making a big difference on Einstein. Whether that history has predictive value for Maxwell and Pascal is speculation. Whether the project will consider Nvidia platform productivity high enough on the priority list to merit a trial build is also speculation.

6. The excess power through connector issue for the initial RX 480 cards may trouble some users at stock, and may lead others to be more restrained in overclocking than they otherwise might. While I assess that the claimed exceedance is real for the current product, I have no strong opinion on the practical impact, nor do I know whether other released board designs of this series of parts may find ways to mitigate it. Putting on another PCIe connector, or the 8-pin variant, would need to be supported by a revised power delivery circuit design, but is certainly feasible, if the partner designers either think this is a serious practical issue or a perception issue worth addressing.

I've tried to proofread this, but suspect that typos and other errors persist. I'd be grateful for useful criticisms.

I do recognize that I am jumping the gun a bit in using very briefly generated RX 480 data, and if the differences of interest were smaller, I'd have delayed saying anything. But I think the big price advantage and the big power consumption disadvantage seem pretty clear. For my own part I am cancelling my RX 480 backorder, as I have an irrationally high priority for power efficiency, and the differences are large enough not to need the extra resolution of comparison on the exact same system. I plan to await the 1060 announcement before deciding on a next card purchase. I think it likely that one will keep much of the power efficiency of the 1070, and perhaps improve on it, and that it will be somewhat closer to competing with the RX 480 on purchase price productivity.

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

I've been a bit busy on this host

Repaired the hd7990 and had a test run with the new fans, and service.

Disabled boinc and stopped crunching.

Removed the HD7990

Removed fglrx (took a little effort)

Upgraded to ubuntu 16.04

Added RX-480, and drivers.

Rebooted...

To my surprise, it started crunching BRP6 immediately.

We still have work to do, a black screen is not enough to work with - and some executables are seg faulting (on exit).

Coprocessors	AMD Ellesmere (8084MB)
Operating System	Linux 4.4.0-28-generic
BOINC version	7.6.32
Memory	15761.28 MB
Cache	6144 KB
Measured floating point speed	5983.62 million ops/sec
Measured integer speed	100713.33 million ops/sec

boinc is reporting a silly value (still) on the integer speed.

Running 4x at the moment (need boinctui / remote boincmgr to view) and the power draw at the wall is about 235-242W.

Too early to make good comment on times, probably will have some reporting in about two hours, looking like 2:40:00 (40 minutes approximately 25%)

I will now go into the land of dmesg in search of black screen fixes. I may be some time.

Meanwhile boinc crunches on.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7225614931
RAC: 1052139

RE: Otherwise I plan to put

Quote:
Otherwise I plan to put a GTX 750Ti in the box and see what fraction of their individual productivities results when they run together.


For about a day and a half my main PC, which had been the host for a single GTX 1070 for a couple of weeks, has been running with two unlike GPUs: that 1070, still overclocked, and the GTX 750Ti SC which was the previous card on the machine, continuing to run on as-shipped clocks.

Generally the results are quite good, with very little apparent productivity loss considering that the two cards must share host system support in general, and in particular the motherboard downgrades the PCI-E from 16 lanes to 8 when both available x16 slots are populated.

I did have two specific problems:

1. On initial boot BOINC noticed I had two cards, but did not employ both of them. This was easily fixed by adding a cc_config.xml file in the BOINC directory including the line:
<use_all_gpus>1</use_all_gpus>
2. After about half a day's running, all three tasks currently running on the GTX 1070 failed simultaneously. They showed in the manager status as "computation error".
My initial reaction was to lower both the core clock and memory clock overclocks by one increment each, but on reflection I decided that the symptoms seemed more likely to indicate too high a core clock. So in the end I lowered the core clock two increments from what I had formerly thought safe, and put the memory clock back where it had been. The twelve hour run I am documenting here (and my current configuration) relied on a command-line Nvidia Inspector intervention to raise the P0 core clock rate by 160, and the P0 memory clock rate by 800, giving average observed (as reported by GPU-Z) core clock rate of 2012.6 and memory clock 2304.
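For the first problem above, the BOINC client option that enables every detected GPU is `use_all_gpus`, which sits inside the `options` section of cc_config.xml. The sketch below only builds a minimal file body as a string; the function name is made up for illustration, and where the file must be saved depends on the installation, so no path is assumed here.

```python
import xml.etree.ElementTree as ET


def build_cc_config(use_all_gpus: bool = True) -> str:
    """Return a minimal cc_config.xml body with the use_all_gpus option set."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "use_all_gpus").text = "1" if use_all_gpus else "0"
    return ET.tostring(root, encoding="unicode")


print(build_cc_config())
# <cc_config><options><use_all_gpus>1</use_all_gpus></options></cc_config>
```

The client reads this file at startup, so a restart (or asking the client to re-read its config files) is needed before the second card is put to work.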

Comparisons are somewhat compromised by the fact I've chosen 3X multiplicity for this configuration, and that the availability of the GW CPU work I had run during previous tests dried up, so I've switched to GRP#1 work, which seems to have somewhat different operating characteristics, and also a misleading progress indication progression.

Still, I'm happy to report that a thought-to-be-stable 12 hour run with consistent conditions suggests:

3X 1070 BRP6 elapsed time 1:34:01, vs. 1:33:32 seen running alone
3X 750 Ti BRP6 elapsed time 5:01:43 vs. 5:01:00 running alone
3X GRP1 elapsed time 6:11:24

These combine to give a calculated daily credit production rate for the system of 273,237, of which 202,177 comes from 1070 tasks, 63,000 from 750 Ti tasks, and 8,061 from CPU tasks.
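To put a number on "very little apparent productivity loss", the paired elapsed times above can be turned into a percentage slowdown. A minimal sketch that takes the quoted times at face value (the runs may differ in clock settings as well as in sharing the host, and the function names are illustrative):

```python
def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' elapsed time string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def slowdown_percent(shared: str, alone: str) -> float:
    """Percentage increase in elapsed time when the card shares the host."""
    return (to_seconds(shared) / to_seconds(alone) - 1.0) * 100.0


print(f"GTX 1070:   {slowdown_percent('1:34:01', '1:33:32'):.2f}%")  # 0.52%
print(f"GTX 750 Ti: {slowdown_percent('5:01:43', '5:01:00'):.2f}%")  # 0.24%
```

Both cards lose well under one percent, which is small compared with the run-to-run variation one would expect from different work units.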

My 1070 runs perhaps a couple of degrees C warmer (and the fan a bit faster) even after I moved it to the bottom slot thinking that would get it fresher cooling air (my case has a bottom mounted 140mm fan). When it was on the bottom the 750Ti ran about the same temperature it ran in the top slot alone, but when moved to the top slot above the 1070 the temperature rose about 7C (from 55 to 62, so not a problem).

I tentatively think this PC can easily support two cards of the GTX 1070 or 1060 class, but that the card not in the bottom slot may by preference be one with more capable cooling fans than the one provided on the FE.

My near term plan is to watch the GTX 1060 announcement, currently rumored to be this week, possibly with initial shipments as soon as next week. I am pretty sure the 1070 offers somewhat better computation per purchase dollar than the 1080 (or will once the initial buying panic subsides and lower priced partner cards turn up), and if the 1060 fits Einstein well, it is possible that will be yet more true of the 1060. On the other hand, Nvidia may intentionally cripple some Pascal feature on the 1060 to keep their product differentiation for games clean, in a way that by bad luck hurts Einstein especially. So a direct trial on Einstein is probably the way to tell. Quite likely I shall supply such a trial.

By the way, my cheery results on running two cards on this PC should come with a strong "your mileage may vary" warning. In the older days, with other cards, other drivers, and other applications on other hosts, I've seen quite severe loss from individual performance in combined setups. I think the current Einstein BRP6/CUDA55 application is quite light in host requirements by historic standards (and even current standards in comparison with SETI, or AMD use on BRP6). I also think my host PC, while not bleeding edge, is quite capable. Further, there is no reason to think the overclock successful on my sample of this card in this case, will be the same seen by others.
