Maxwell 2

archae86
Joined: 6 Dec 05
Posts: 3146
Credit: 7059784931
RAC: 1136247

Robert, thanks for the data. It will be interesting to see what impact the switch to three jobs makes. I run my GTX 750 at 2x, but my GTX 660s at 3x. On the other hand, your high reported utilization may limit hopes for improvement.

While the performance seems a bit less than I hoped, the power consumption is less than I feared. Not wishing to raise power consumption much over my 660s, I had been thinking of the 970, but it seems my power budget may even allow a 980.

As you were using Afterburner, what clock frequencies did it report during 2x job operation?

One last tidbit: I've done some careful test series over the last couple of years, and I distinctly recall that in one of them the GPU job output was actually BETTER with one CPU Einstein task running than with zero. Not by much, and yes, it was a surprise, so I checked carefully; for that test set it was true.

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 540304128
RAC: 131416

This may be a stupid question, but can the nVidia cards run the OpenCL app here? With the CUDA app being built against CUDA 3.2, the OpenCL version might be better despite nVidia's lackluster OpenCL drivers.

Quote:
While the performance seems a bit less than I hoped


I agree, at least for the current Einstein CUDA app. It seems to run into some bottleneck other than shading power. What about memory bandwidth? The 970 achieves almost double the throughput of the 750 Ti, with about 2.6 times as much shader power and twice the bandwidth. The L2 cache size is also the same for both cards, so if the cache is helping the 750 Ti a lot, it won't be quite as useful for the larger card, since naturally more work is in flight there.

Quote:
One last tidbit: I've done some careful test series over the last couple of years, and I distinctly recall that in one of them the GPU job output was actually BETTER with one CPU Einstein task running than with zero. Not by much, and yes, it was a surprise, so I checked carefully, and for that test set it was true.


I could imagine this being due to the delay associated with ramping an idle core up to full clock and voltage. On the first K10 Athlons this was so bad that the dynamic scaling was removed entirely via BIOS updates. Your old Nehalem was certainly better, but it's from the same generation.

MrS

Scanning for our furry friends since Jan 2002

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

Quote:
This may be a stupid question, but can the nVidia cards run the OpenCL app here? With the CUDA app being built against CUDA 3.2, the OpenCL version might be better despite nVidia's lackluster OpenCL drivers.


I guess you are referring to OpenCL on S6CasA and FGRP3. The short answer is yes, but in general AMD is a little better on comparable cards. I am about to build a new Haswell machine, and will be putting in a pair of HD 7790s rather than a pair of GTX 660s, for example. My past experience indicates that will be better overall, though I don't remember the numbers.

Also, the OpenCL versions of BRP4G and BRP5 run a little better on the HD 7790s than the CUDA versions do on the GTX 660s, as I recall, though it is not a large difference. I keep hoping for improved CUDA support, but until then I will go with AMD.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2142
Credit: 2774264737
RAC: 852234

Repeating a request I first posted over at GPUGrid, but which didn't get an answer.

Could somebody running a Maxwell-aware version of BOINC check and report this [the 'GFLOPS peak' value shown at startup for a GTX 970/980], please, and do a sanity check of whether BOINC's figure is correct from what you know of the card's SM count, cores per SM, shader clock, flops per clock, etc.? We got the figures for the 'baby Maxwell' 750/Ti into BOINC on 24 February (3edb124ab4b16492d58ce5a6f6e40c2244c97ed6), but I think that was just too late to catch v7.2.42.

We're in a similar position this time, with v7.4.22 at release-candidate stage - I'd say that one is safe to test with, if nobody here has upgraded yet. TIA.
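
For anyone attempting that sanity check, here is a minimal Python sketch of the usual formula: peak GFLOPS = cores * clock * flops-per-clock, with cores = SMs * cores-per-SM (128 per SM on Maxwell). The SM counts and boost clocks below are assumptions taken from public spec sheets; BOINC itself works from the clock the driver reports, so its figure may differ slightly.

# Cores per SM depend on compute capability: 192 for Kepler, 128 for Maxwell.
CORES_PER_SM = {(3, 0): 192, (3, 5): 192, (5, 0): 128, (5, 2): 128}

def peak_gflops(sm_count, compute_capability, clock_mhz, flops_per_clock=2):
    # 2 flops per clock per core, counting a fused multiply-add as two.
    cores = sm_count * CORES_PER_SM[compute_capability]
    return cores * clock_mhz * flops_per_clock / 1000.0

print(peak_gflops(13, (5, 2), 1178))  # GTX 970: 1664 cores, ~3920 GFLOPS
print(peak_gflops(16, (5, 2), 1216))  # GTX 980: 2048 cores, ~4981 GFLOPS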

Robert
Joined: 5 Nov 05
Posts: 47
Credit: 318759030
RAC: 19575

Here are the results for running 3 jobs at a time on the GTX 970.

970 = 12,926 secs; GPU Usage = 96%; temp = 62 C; watts = 125
970 Daily Credits = 3333 * 86,400 / (12,926 / 3) = 66,835

Effectively no change over running 2 jobs at a time.
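
For reference, the arithmetic behind that daily-credit figure is just tasks-per-day times credit per task; a one-line Python check (the 3,333 credits per task is taken from the figures above):

def daily_credit(credit_per_task, elapsed_secs, tasks_at_once):
    # With N tasks in flight, a batch of N finishes every elapsed_secs,
    # so the effective time per task is elapsed_secs / N.
    return credit_per_task * 86400 / (elapsed_secs / tasks_at_once)

print(daily_credit(3333, 12926, 3))  # ~66,835, matching the figure above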

970 core clock for both runs = 1342 MHz. This must be a built-in boost, because I made no adjustments to have it run at that speed.

Richard, I'll try to get to your request tonight.

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 540304128
RAC: 131416

Quote:
Quote:
This may be a stupid question, but can the nVidia cards run the OpenCL app here? With the CUDA app being built against CUDA 3.2, the OpenCL version might be better despite nVidia's lackluster OpenCL drivers.

I guess you are referring to OpenCL on S6CasA and FGRP3.


No, I was referring to the OpenCL BRP binaries the AMD GPUs are running. They are compiled with AMD in mind, but OpenCL should be device-agnostic; at Collatz, for example, the actual binary is the same for AMD, Intel and nVidia.
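
As an illustration of that device-agnosticism, the same few lines of host code enumerate whatever OpenCL implementations happen to be installed; nothing in the source is vendor-specific. A sketch assuming the third-party pyopencl package:

import pyopencl as cl

# Identical host code runs against AMD, Intel or nVidia drivers;
# it simply lists whichever platforms and devices are present.
for platform in cl.get_platforms():
    for device in platform.get_devices():
        print(platform.name, '->', device.name)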

MrS

Scanning for our furry friends since Jan 2002

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

Quote:
No, I was referring to the OpenCL BRP binaries the AMD GPUs are running. They are compiled with AMD in mind, but OpenCL should be device-agnostic; at Collatz, for example, the actual binary is the same for AMD, Intel and nVidia.


In that case, I don't know how you would get them to run on Nvidia. They always pick up the CUDA 3.2 work units on BRP for me (GTX 650 Ti, 660, 660 Ti and 750 Ti).

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2142
Credit: 2774264737
RAC: 852234

Quote:
Quote:
No, I was referring to the OpenCL BRP binaries the AMD GPUs are running. They are compiled with AMD in mind, but OpenCL should be device-agnostic; at Collatz, for example, the actual binary is the same for AMD, Intel and nVidia.

In that case, I don't know how you would get them to run on Nvidia. They always pick up the CUDA 3.2 work units on BRP for me (GTX 650 Ti, 660, 660 Ti and 750 Ti).


Anonymous platform?
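
For anyone unfamiliar with the term: "anonymous platform" means describing the applications yourself in an app_info.xml in the project directory, after which the scheduler sends work for whatever binary you declare there. A minimal sketch of the general shape of such a file - the app name, file name, version number and plan class below are hypothetical placeholders, not Einstein@Home's real ones:

<app_info>
    <app>
        <name>einsteinbinary_BRP5</name>
    </app>
    <file_info>
        <name>BRP5_opencl_app.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>einsteinbinary_BRP5</app_name>
        <version_num>133</version_num>
        <plan_class>opencl-nvidia</plan_class>
        <coproc>
            <type>NVIDIA</type>
            <count>1</count>
        </coproc>
        <file_ref>
            <file_name>BRP5_opencl_app.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
</app_info>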

cliff
Joined: 15 Feb 12
Posts: 176
Credit: 283452444
RAC: 0

Hi Folks,
I've just been having a look at the output report for a BRP5 WU, and I've noticed something odd.

No CUDA cores!! On a GPU with over a thousand of them? Any ideas why a GTX 970 is reported this way?

7.2.42

Activated exception handling...
[04:03:10][4920][INFO ] Starting data processing...
[04:03:10][4920][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 194 MB (3904 MB free / 4098 MB total) -> Used by this application (assuming a single GPU task): 0 MB
[04:03:10][4920][INFO ] Using CUDA device #0 "GeForce GTX 970" (0 CUDA cores / 0.00 GFLOPS)
[04:03:10][4920][INFO ] Version of installed CUDA driver: 6050
[04:03:10][4920][INFO ] Version of CUDA driver API used: 3020
[04:03:11][4920][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[04:03:11][4920][INFO ] Header contents:
------> Original WAPP file: ./PB0048_01241_DM1604.00

Regards,

Cliff,

Been there, Done that, Still no damn T Shirt.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2142
Credit: 2774264737
RAC: 852234

Yes. The CUDA core enumeration (and the peak speed enumeration, for that matter) is done by a piece of software called an API (Application Programming Interface) built into the project's - any project's - science application at the time it was compiled.

The API can process the reply from any card already in production at the time it was designed, but for newer cards it just throws up its hands and says, Huh? Wassat?

In fact, the newest cards the BRP5 API knows about are

GeForce GT 555M (144 CUDA cores / 374.40 GFLOPS)
GeForce GTX 570 (480 CUDA cores / 1440.00 GFLOPS)

After that, we just get

GeForce GT 640 (0 CUDA cores / 0.00 GFLOPS)
GeForce GTX 650 Ti (0 CUDA cores / 0.00 GFLOPS)

(data from Albert; there may be more exotic cards here that Albert hasn't seen yet)
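
The mechanism is easy to sketch: the app carries a build-time table of cores per SM, keyed by compute capability (the CUDA SDK samples' ConvertSMVer2Cores helper does essentially this in C). A minimal Python sketch of that logic, with table values illustrating what a CUDA 3.2-era build would know about:

# A CUDA 3.2-era build only knows Tesla and Fermi architectures.
CORES_PER_SM = {
    (1, 0): 8, (1, 1): 8, (1, 2): 8, (1, 3): 8,  # Tesla (G80/GT200)
    (2, 0): 32, (2, 1): 48,                      # Fermi
}

def cuda_cores(sm_count, compute_capability):
    # Unknown architectures fall through to zero - hence the
    # "0 CUDA cores / 0.00 GFLOPS" lines in the log above.
    return sm_count * CORES_PER_SM.get(compute_capability, 0)

print(cuda_cores(15, (2, 0)))  # GTX 570: 480 cores, as reported
print(cuda_cores(13, (5, 2)))  # GTX 970: unknown to this build -> 0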
