Nvidia Pascal and AMD Polaris, starting with GTX 1080/1070, and the AMD 480

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

No HBM :(

Anonymous

RE: It's time to

Quote:

It's time to Decommission those (already obsolete) 1070 - 1080's
and get a pascal Titan X on Aug. 2nd : )

Bill

http://www.tomshardware.com/news/nvidia-pascal-titan-x-details,32323.html

Don't throw out those 1070s/80s/60s just yet! There was something technical in Bill's link that I caught. Something about "deep pockets".

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

RE: RE: It's time to

Quote:
Quote:
It's time to Decommission those (already obsolete) 1070 - 1080's
and get a pascal Titan X on Aug. 2nd : )

I think the real architectural changes are coming with Volta. At least I am hoping that some projects that can't use GPUs now might be able to, if Nvidia keeps on developing CUDA.
And I am max(well)ed out at the moment anyway.
ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 579220198
RAC: 203098

I don't think Volta can bring what you wish. A strong APU could do it, with a low-latency connection and switch between GPU and CPU and a good framework for heterogeneous compute programming. This could allow apps to be ported successfully to GPUs which so far can't be, because either the CPU is needed too often or the communication to the GPU is too slow.

MrS

Scanning for our furry friends since Jan 2002

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

I know it is more of a long-range vision, but I think it is inevitable. Chips (whether GPU or CPU) can't speed up much due to increased clock rates, so all they can do is add more functionality with more transistors on the IC. And the GPU is on the same card as the high-speed memory (soon to be HBM hopefully), so they can have a lot of memory conductors, and also avoid the extra capacitance and inductance that the socketed CPU and memories are subject to now on the motherboards. It is just a question of time, and I am sure Nvidia is planning on more convergence in each generation.

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

Stretching the thread further...

RX-460 and RX-470 announced, and here is the AMD RX card list

Spoilt for choices...

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 579220198
RAC: 203098

RE: Chips (whether GPU or

Quote:
Chips (whether GPU or CPU) can't speed up much due to increased clock rates, so all they can do is add more functionality with more transistors on the IC.


Adding more ALUs is a fine way to increase performance for tasks typically addressed by GPUs. The other option is to make each SM / EU / CU more complex and capable, at the cost of providing fewer of them. The extreme case would be to make them as capable as a CPU - with the side effect of not providing any benefit over CPUs anymore. Or in the other direction, more, dumber ("specialized") cores lead to DSPs, which can be a lot more energy efficient than GPUs - if they can be used at all.

Steps taken towards more convergence are the APU from AMD, which allows tighter coupling between CPU and GPU as coprocessor, and the NVlink on enterprise Pascal, which allows the same thing for them but is not yet compatible with our desktop infrastructure. And then there's this rumor of nVidia integrating a few ARM cores into their GPUs to feed them better with work. It's a crude approach ("if we can't get NVlink into Intel CPUs, let's include our own CPU") but could indeed allow more algorithms to perform well on the GPU.

MrS

Scanning for our furry friends since Jan 2002

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7229961520
RAC: 1155271

My MSRP GTX 1060 arrived yesterday, and I installed it this morning, "going for broke" in a two-card configuration with my 1070, intending to go to 3/4 of the established cruise overclock I have been using on the 1060, as I have a concert this weekend and am not in a position to spend hours in detailed configuration tweaking and measurement.

The first awful surprise was that I got the same unacceptable behavior I saw with my (recently returned) Founders Edition 1060--the installer left the 1060 as a generic graphics device (and since I had the 1060 in the primary slot, I got a kiddie-cartoon-scaled desktop).

Not thinking it likely I had received two sequential defective cards, I cast about for something to try, and tried the 368.95 DPC latency improvement hotfix driver, and during that driver install selected the Nvidia clean install option. I should explain that while in the last couple of weeks I have been using the sequence of first uninstalling the existing Nvidia driver, then rebooting to safe mode, then running DDU, I have not on the subsequent Nvidia driver install been selecting "Clean".

So I don't know whether the key improvement was 368.95 or "clean install", but suspect clean install. The implication is that neither Nvidia's installer nor DDU tidied up something that kept Nvidia's installer from properly installing either of my 1060's until now. Grumble.

Anyway, very quick first look is that the cheap PNY 1060 left to default clocking was a very close match to the FE 1070 on default clocks so far as clock rates are concerned (slightly higher on core clock, exactly the same on memory clock). So I suspended BOINC GPU tasks, and used Nvidia inspector command line commands to re-establish my "cruise clock" preferred 1070 overclock rates, and to set the 1060 to about 3/4 of the difference between default and those 1070 rates. So for the near term I intend to run the 1060 at NVI offsets of +110 for core clock and +600 for memory clock on the NVI scale of clock rates. On the GPU-Z readout scale this, after temperatures settle, is giving me 1961.5 core clock and 2202.2 memory clock on the 1060, with GPU-Z indicated GPU loading of 95% and memory controller loading of 86%, with reported temperature of 68C at a 68% fan speed (not default--I have an existing afterburner fan curve I'll probably tweak after a while).

Any hope that the 368.95 driver would by luck banish the memory leak is gone. Early indications suggest both the pool paged bytes and the Vi12 pool tag symptoms of the problem remain strongly present on BRP6 work on this machine.

I don't have good power numbers yet, but the early hints are very good--the box was burning 250 watts average at the wall with my 1070 plus a 750Ti running BRP6 at 3x, and appears to be burning about 291 with essentially a swap of an overclocked 1060 for the stock clocked 750Ti.

Extremely early performance indications suggest the 1060 at the current conservative overclock is somewhere between 2/3 and 3/4 of the 1070 performance on BRP6 at my cruise overclock.

I probably should wait to do more detailed evaluations until my concert is over, by which time the web site transition blackout will almost be upon us, so other than a better initial performance estimate at my first-guess conservative overclock, more details are probably a week or so out.

Other than my shamefaced regret at returning what was very likely a perfectly functional card, and the continuing serious memory leak issue, I consider this largely good news regarding the 1060 for Einstein use. This card cost me $249.99 in total.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7229961520
RAC: 1155271

I have a first validated BRP6 result from the PNY 1060 cheap card.

I'm running both the 1070 and the 1060 on this host at 3X currently, with the 1070 at full cruise overclock (a couple of ticks below maximum observed successful) and the 1060 at a somewhat more conservative initial overclock.

Initial results have the 1070 average BRP6 elapsed times at 1:34:00, and the 1060 at 2:16:00. With zero BOINC CPU jobs running this yields an estimated 1070 contribution of 202,213 cobblestones/day, with the 1060 contribution of 139,765, giving a system total of 341,977. I expect there is room for a little higher cruise clock on the 1060, and turning on some CPU work might slightly raise the system total, but these numbers are not far off. System power average for the first period of observation is 294.05 watts, but included a fair amount of my morning browser usage, so that will drop probably to about 290.
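For anyone wanting to redo this estimate for their own card, here is a rough sketch of the arithmetic that appears to sit behind the figures above. It assumes each BRP6 task is credited 4400 cobblestones; that value is not stated in the post, but it reproduces the quoted daily numbers. The function and variable names are made up for illustration.

```python
# Sketch of the daily-credit estimate from elapsed time and task multiplicity.
# ASSUMPTION: 4400 credits per BRP6 task (not stated in the post; chosen
# because it reproduces the posted cobblestones/day figures).
CREDIT_PER_TASK = 4400
SECONDS_PER_DAY = 24 * 60 * 60


def daily_credit(elapsed_hms: str, concurrency: int,
                 credit_per_task: float = CREDIT_PER_TASK) -> float:
    """Estimate credit/day for a GPU running `concurrency` tasks at once,
    each taking `elapsed_hms` (H:MM:SS) of wall-clock time."""
    hours, minutes, seconds = (int(part) for part in elapsed_hms.split(":"))
    elapsed_seconds = hours * 3600 + minutes * 60 + seconds
    tasks_per_day = concurrency * SECONDS_PER_DAY / elapsed_seconds
    return tasks_per_day * credit_per_task


# Elapsed times and 3X multiplicity as reported in the post.
gtx1070 = daily_credit("1:34:00", 3)
gtx1060 = daily_credit("2:16:00", 3)
total = gtx1070 + gtx1060

# Relative throughput of the 1060 versus the 1070 at these clocks.
ratio = gtx1060 / gtx1070

# Rough efficiency, using the ~290 W wall-power estimate from the post.
credit_per_watt_day = total / 290

print(f"1070: {gtx1070:,.0f}  1060: {gtx1060:,.0f}  total: {total:,.0f}")
print(f"1060/1070 ratio: {ratio:.3f}")
print(f"credit per day per watt: {credit_per_watt_day:,.0f}")
```

With those inputs the ratio lands between 2/3 and 3/4, which agrees with the early performance impression reported before the first validated result.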

I've got a couple of additional observations on the specific cheap 1060 card I'm using. It is a PNY VCGGTX10606PB carrying the MSRP price of $249.99. The PC card extends all the way out to the 6-pin power connector, unlike the FE card which uses a shorter PC card and carries the power back from the connector to the card. The PCIe connector is reversed in orientation from that on my FE 1070 or the FE 1060 I tried. There is no back plate. The fan sound may be very slightly more tonal than my 1070 FE, but so far is not a problem to me. The fans look a bit thin-bladed and cheap, but seem to be performing adequately. The 1060 currently occupies the primary slot which in my case is less thermally advantaged than the secondary slot, but is having no trouble at this overclock staying under 70C with fan speed under 70%.

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

RE: RE: Chips (whether

Quote:
Quote:
Chips (whether GPU or CPU) can't speed up much due to increased clock rates, so all they can do is add more functionality with more transistors on the IC.

Adding more ALUs is a fine way to increase performance for tasks typically addressed by GPUs.


Yes, but I consider adding more ALUs to be adding more functionality without speeding up the clock rate, as are all the other methods you mention. They probably can't compete with the DSP makers, and certainly not with Intel head-on, but in graphics-related areas they are the experts themselves. So it is just a question of intelligently expanding their space, and it appears to me that Jen-Hsun Huang is a master at that. The fact that Nvidia GPUs are used in various supercomputers shows what can be done for a start.

It appears to me that they can implement more functionality at a lower cost starting from a card-based approach than Intel can from a motherboard approach, because of the increased memory bandwidth and avoidance of sockets, at least for the computing-intensive work that we do here. And they don't have to worry so much about backwards-compatibility with old code, etc, so they have more of a blank slate.
