CUDA Performance Disparity

Jonathan Jeckell
Joined: 11 Nov 04
Posts: 114
Credit: 1341945207
RAC: 0
Topic 201921

Something weird is going on with CUDA.  I have three machines running NVIDIA cards, and their performance running CUDA apps does not match up with their specs.

I have an i7-3720QM (laptop) running Mac OS Sierra with a GTX650M card that benches 1,249.  It averages 14,708 seconds for BRP6, and 4,581 seconds for BRP4G tasks.

An i3-4160 running Windows 10 with a GTX 950 card that benches 5,340 runs BRP6 in about 8,066 seconds, and BRP4G in 2,504 seconds. 

As you would expect, the GTX 950 does the tasks in about half the time, although on paper it's about 4x more powerful by the G3Dmark benchmark.  But this is where it gets weird:

Then there's my i7-5820K running Ubuntu Linux with a GTX 960, which benches 5,987 but averages *37,611* seconds for BRP6 and *3,465* seconds for BRP4G--slower than the GTX 950 and the GTX 650M. Way slower.

These cards are stock, and none of them have been overclocked or tweaked.  The GTX 950 and 960 allegedly have the same memory and everything.  

System RAM can't be an issue. The Windows box has 16 GB and the 5820K has 32 GB with quad channel in effect.

I have nouveau blacklisted and turned off in Ubuntu, and have also killed the GUI. 

Does CUDA performance vary this much across various platforms, or what is going on with this?  Any ideas how to speed up that GTX 960? 

mmonnin
Joined: 29 May 16
Posts: 291
Credit: 3232277016
RAC: 186736

BRP6 is done, so there's no use comparing with that app anymore. BRP6 WUs are worth 4.4x the credit of BRP4G WUs, but they don't take 4.4x as long to run.
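
As a rough check using the numbers from your first post: on the GTX 950, 8,066 s for BRP6 versus 2,504 s for BRP4G is only about a 3.2x time ratio for 4.4x the credit, so run times and credit simply don't line up across the two apps.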

There are several 4g app versions out there right now. Make sure to compare the same version.

Gaming benchmarks don't always track GPU compute performance.

GPU memory clocks can affect completion times.

Jonathan Jeckell
Joined: 11 Nov 04
Posts: 114
Credit: 1341945207
RAC: 0

I know there are no more BRP6 tasks left (actually, I am still running some), but I have a mountain of data on these, which is useful for comparison.

I also know the comparisons have flaws, not least because of the gaming benchmarks, but also because all three cards are running on completely different operating systems! So it's also impossible to compare the exact same CUDA app.

That said, there is no damned way the GTX 960 should be slower than the GTX 950, or so close to the GTX 650M. No way. Something is wrong. Instead of being about 12% faster, it's almost 60% slower. All of the clock speeds on the 960 are higher than the 950's in the stock configuration, and it has a third more CUDA cores.

Sebastian M. Bobrecki
Joined: 20 Feb 05
Posts: 63
Credit: 1529583410
RAC: 92

From what I can see, the host with the 960 is also doing CPU tasks, while the one with the 950 isn't. Try freeing up some CPU cores so the GPU can be fed properly.

WhiteWulfe
Joined: 3 Mar 15
Posts: 31
Credit: 62249506
RAC: 626

One thing I'm curious about is how many work units are being run at once. Some would say that for a GTX 980 Ti running at a 1341 MHz core clock, a time of 2,700 or so seconds is bad... until one realizes that in that time frame my card has crunched four work units, as I run four at a time to keep the GPU's load steady at 92-94%.
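
In case it helps, there are two ways to set this up: the project's "GPU utilization factor" web preference, or a local app_config.xml in the Einstein project directory. A minimal sketch for four tasks per GPU follows; the app <name> below is a guess, so check it against client_state.xml or the project's applications page before using it:

    <app_config>
      <app>
        <!-- use the exact app name from client_state.xml; this one is assumed -->
        <name>einsteinbinary_BRP4G</name>
        <gpu_versions>
          <gpu_usage>0.25</gpu_usage>  <!-- 1/4 of a GPU per task = 4 tasks at once -->
          <cpu_usage>0.25</cpu_usage>  <!-- CPU budget per GPU task -->
        </gpu_versions>
      </app>
    </app_config>

After saving it, re-read the config files from BOINC Manager (or restart the client) for it to take effect.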

Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 57

Since you have a single GPU, you would want to keep a single physical CPU core free for the GPU application. I noticed that you have an Intel 5820K, which is a 6-core CPU with HT turned on. In this case, I would run no more than 10 CPU tasks and leave the remaining 2 threads for the GPU application. I think the setting in the BOINC computing preferences would be 84% of the CPUs. I would suggest starting with just one task per GPU, with the GPU utilization factor of the BRP apps set to 1.00 in the project preferences.
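
For what it's worth, the 84% figure just comes from the thread count: with HT on, the 5820K exposes 12 threads, and leaving 2 free means using 10 of them. 10 / 12 is roughly 83.3%, and assuming BOINC rounds the percentage down, 83% would only allow 9 threads (0.83 x 12 = 9.96) while 84% allows 10 (0.84 x 12 = 10.08).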

mmonnin
Joined: 29 May 16
Posts: 291
Credit: 3232277016
RAC: 186736

Setting the GPU application's .exe to a slightly higher priority than the CPU applications' .exes is perfectly fine. I ran 6 GPU threads and 8 CPU threads on a 3770K and there was no slowdown on the GPUs; they took whatever CPU cycles they needed to keep the GPUs fed.
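
On the Linux box that can be tried from a terminal. A sketch, assuming the GPU app's process name contains "einsteinbinary" (check the real name in top first):

    # lower nice value = higher scheduling priority; -1 is a mild boost
    sudo renice -n -1 -p $(pgrep -f einsteinbinary)

Note that this only affects the task currently running, so it has to be repeated (or scripted) when a new task starts.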

Jonathan Jeckell
Joined: 11 Nov 04
Posts: 114
Credit: 1341945207
RAC: 0

The machine with the 950 is doing CPU tasks--for other projects.  Until recently it was also using its Intel HD4400 GPU for SETI, which really killed CPU performance.

But even though the 950 machine is using the CPU, it's only doing 4 threads on its 2 cores, while the 960 machine is doing 12 threads on 6 cores. Then again, the 960 machine also has quad-channel memory and a huge amount of RAM.

Jonathan Jeckell
Joined: 11 Nov 04
Posts: 114
Credit: 1341945207
RAC: 0

Ok, I kind of feel stupid... I didn't know you could do that. Mine is only running one GPU task at a time. When I run top on the command line, it doesn't even look like the task is running all of the time (the activity I do see is probably just the times the GPU checks in with the CPU). The temp on the 960 card is only about 39°C, so it's clearly not working very hard.
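
Top only shows the CPU side anyway, so I'll try watching the card itself with nvidia-smi, which as far as I know reports utilization and temperature directly:

    # refresh the full status readout every 2 seconds
    watch -n 2 nvidia-smi

    # or stream one utilization sample per second (sm = compute, mem = memory controller)
    nvidia-smi dmon -s u

If the sm column stays well below 90% while a task is running, the card is being starved rather than being slow.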

Jonathan Jeckell
Joined: 11 Nov 04
Posts: 114
Credit: 1341945207
RAC: 0

Thanks!  I'll give that a shot.  As I said above, it doesn't look like the GPU is working very hard at all, so there must be some kind of bottleneck somewhere.  Maybe that's it.

Jonathan Jeckell
Joined: 11 Nov 04
Posts: 114
Credit: 1341945207
RAC: 0

Again, I feel like a moron. I did not know you could do that. Is that the GPU utilization factor setting in the project preferences? The one with all of the nasty warnings about how you will end the world if you change it?
