Maxwell 2

archae86
Joined: 6 Dec 05
Posts: 3146
Credit: 7059784931
RAC: 1136247

Robert, thanks for the data. It will be interesting to see what impact the switch to three jobs makes. I run my GTX 750 at 2x, but my GTX 660s at 3x. On the other hand, your high reported utilization may limit hopes for improvement.

While the performance seems a bit less than I hoped, the power consumption is less than I feared. Not wishing to raise power consumption much over my 660s, I had been thinking of the 970, but it seems my power budget may even allow a 980.

As you were using Afterburner, what clock frequencies did it report during 2x job operation?

One last tidbit: I've done some careful test series over the last couple of years, and I distinctly recall that in one of them the GPU job output was actually BETTER with one CPU Einstein task running than with zero. Not by much, and yes, it was a surprise, so I checked carefully; for that test set it was true.

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 540304128
RAC: 131416

This may be a stupid question, but can the nVidia cards run the OpenCL app here? With the CUDA app being built against CUDA 3.2, the OpenCL version might be better despite nVidia's lackluster OpenCL drivers.

Quote:
While the performance seems a bit less than I hoped


I agree, at least for the current Einstein CUDA app. It seems to run into some bottleneck other than shading power. What about memory bandwidth? The 970 achieves almost double the throughput of the 750 Ti, with about 2.6 times as much shader power and twice the bandwidth. The L2 cache size is also the same for both cards, so if the cache is helping the 750 Ti a lot, it won't be quite as useful for the larger card, since naturally more work is in flight there.

Quote:
One last tidbit: I've done some careful test series over the last couple of years, and I distinctly recall that in one of them the GPU job output was actually BETTER with one CPU Einstein task running than with zero. Not by much, and yes, it was a surprise, so I checked carefully, and for that test set it was true.


I could imagine this being due to the delay associated with ramping an idle core up to full clock and voltage. On the first K10 Athlons this was so bad that the dynamic scaling was removed entirely via BIOS updates. Your old Nehalem was certainly better, but it's from the same generation.

MrS

Scanning for our furry friends since Jan 2002

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

Quote:
This may be a stupid question, but can the nVidia cards run the OpenCL app here? With the CUDA app being built against CUDA 3.2, the OpenCL version might be better despite nVidia's lackluster OpenCL drivers.


I guess you are referring to OpenCL on S6CasA and FGRP3. The short answer is yes, but in general AMD is a little better on comparable cards. I am about to build a new Haswell machine, and will be putting in a pair of HD 7790s rather than a pair of GTX 660s, for example. My past experience indicates that will be better overall, though I don't remember the numbers.

Also, the OpenCL versions of BRP4G and BRP5 run a little better on the HD 7790s than the CUDA versions do on the GTX 660s, as I recall, though it is not a large difference. I keep hoping for improved CUDA support, but until then I will go with AMD.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2142
Credit: 2774264737
RAC: 852234

Repeating a request I first posted over at GPUGrid, but which didn't get an answer.

Could somebody running a Maxwell-aware version of BOINC check and report this [the 'GFLOPS peak' value shown at startup for a GTX 970/980], please, and do a sanity check of whether BOINC's figure is correct from what you know of the card's SM count, cores per SM, shader clock, flops per clock, etc.? We got the figures for the 'baby Maxwell' 750/Ti into BOINC on 24 February (3edb124ab4b16492d58ce5a6f6e40c2244c97ed6), but I think that was just too late to catch v7.2.42.

We're in a similar position this time, with v7.4.22 at release-candidate stage - I'd say that one is safe to test with, if nobody here has upgraded yet. TIA.
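
For anyone attempting that sanity check, here is a minimal Python sketch of the usual formula: peak GFLOPS = cores * clock * flops-per-clock, with cores = SMs * cores-per-SM (128 per SM on Maxwell). The SM counts and boost clocks below are assumptions taken from public spec sheets; BOINC itself works from the clock the driver reports, so its figure may differ slightly.

# Cores per SM depend on compute capability: 192 for Kepler, 128 for Maxwell.
CORES_PER_SM = {(3, 0): 192, (3, 5): 192, (5, 0): 128, (5, 2): 128}

def peak_gflops(sm_count, compute_capability, clock_mhz, flops_per_clock=2):
    # 2 flops per clock per core, counting a fused multiply-add as two.
    cores = sm_count * CORES_PER_SM[compute_capability]
    return cores * clock_mhz * flops_per_clock / 1000.0

print(peak_gflops(13, (5, 2), 1178))  # GTX 970: 1664 cores, ~3920 GFLOPS
print(peak_gflops(16, (5, 2), 1216))  # GTX 980: 2048 cores, ~4981 GFLOPS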

Robert
Joined: 5 Nov 05
Posts: 47
Credit: 318759030
RAC: 19575

Here are the results for running 3 jobs at a time on the GTX 970.

970 = 12,926 secs; GPU Usage = 96%; temp = 62 C; watts = 125
970 Daily Credits = 3333 * 86,400 / (12,926 / 3) = 66,835

Effectively no change over running 2 jobs at a time.
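
For reference, the arithmetic behind that daily-credit figure is just tasks-per-day times credit per task; a one-line Python check (the 3,333 credits per task is taken from the figures above):

def daily_credit(credit_per_task, elapsed_secs, tasks_at_once):
    # With N tasks in flight, a batch of N finishes every elapsed_secs,
    # so the effective time per task is elapsed_secs / N.
    return credit_per_task * 86400 / (elapsed_secs / tasks_at_once)

print(daily_credit(3333, 12926, 3))  # ~66,835, matching the figure above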

970 core clock for both runs = 1342 MHz. This must be a built-in boost, because I made no adjustments to have it run at that speed.

Richard, I'll try to get to your request tonight.

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 540304128
RAC: 131416

Quote:
Quote:
This may be a stupid question, but can the nVidia cards run the OpenCL app here? With the CUDA app being built against CUDA 3.2, the OpenCL version might be better despite nVidia's lackluster OpenCL drivers.

I guess you are referring to OpenCL on S6CasA and FGRP3.


No, I was referring to the OpenCL BRP binaries the AMD GPUs are running. They are compiled with AMD in mind, but OpenCL should be device-agnostic; at Collatz, for example, the actual binary is the same for AMD, Intel and nVidia.
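
As an illustration of that device-agnosticism, the same few lines of host code enumerate whatever OpenCL implementations happen to be installed; nothing in the source is vendor-specific. A sketch assuming the third-party pyopencl package:

import pyopencl as cl

# Identical host code runs against AMD, Intel or nVidia drivers;
# it simply lists whichever platforms and devices are present.
for platform in cl.get_platforms():
    for device in platform.get_devices():
        print(platform.name, '->', device.name)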

MrS

Scanning for our furry friends since Jan 2002

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

Quote:
No, I was referring to the OpenCL BRP binaries the AMD GPUs are running. They are compiled with AMD in mind, but OpenCL should be device-agnostic; at Collatz, for example, the actual binary is the same for AMD, Intel and nVidia.


In that case, I don't know how you would get them to run on Nvidia. They always pick up the CUDA 3.2 work units on BRP for me (GTX 650 Ti, 660, 660 Ti and 750 Ti).

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2142
Credit: 2774264737
RAC: 852234

Quote:
Quote:
No, I was referring to the OpenCL BRP binaries the AMD GPUs are running. They are compiled with AMD in mind, but OpenCL should be device-agnostic; at Collatz, for example, the actual binary is the same for AMD, Intel and nVidia.

In that case, I don't know how you would get them to run on Nvidia. They always pick up the CUDA 3.2 work units on BRP for me (GTX 650 Ti, 660, 660 Ti and 750 Ti).


Anonymous platform?
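
For anyone unfamiliar with the term: "anonymous platform" means describing the applications yourself in an app_info.xml in the project directory, after which the scheduler sends work for whatever binary you declare there. A minimal sketch of the general shape of such a file - the app name, file name, version number and plan class below are hypothetical placeholders, not Einstein@Home's real ones:

<app_info>
    <app>
        <name>einsteinbinary_BRP5</name>
    </app>
    <file_info>
        <name>BRP5_opencl_app.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>einsteinbinary_BRP5</app_name>
        <version_num>133</version_num>
        <plan_class>opencl-nvidia</plan_class>
        <coproc>
            <type>NVIDIA</type>
            <count>1</count>
        </coproc>
        <file_ref>
            <file_name>BRP5_opencl_app.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
</app_info>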

cliff
Joined: 15 Feb 12
Posts: 176
Credit: 283452444
RAC: 0

Hi Folks,
I've just been having a look at the output report for a BRP5 WU, and I've noticed something odd.

No CUDA cores!! On a GPU with over a thousand of them? Any ideas why a GTX 970 is reported this way?

7.2.42

Activated exception handling...
[04:03:10][4920][INFO ] Starting data processing...
[04:03:10][4920][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 194 MB (3904 MB free / 4098 MB total) -> Used by this application (assuming a single GPU task): 0 MB
[04:03:10][4920][INFO ] Using CUDA device #0 "GeForce GTX 970" (0 CUDA cores / 0.00 GFLOPS)
[04:03:10][4920][INFO ] Version of installed CUDA driver: 6050
[04:03:10][4920][INFO ] Version of CUDA driver API used: 3020
[04:03:11][4920][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[04:03:11][4920][INFO ] Header contents:
------> Original WAPP file: ./PB0048_01241_DM1604.00

Regards,

Cliff,

Been there, Done that, Still no damn T Shirt.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2142
Credit: 2774264737
RAC: 852234

Yes. The CUDA core enumeration (and the peak speed enumeration, for that matter) is done by a piece of software called an API (Application Programming Interface) built into the project's - any project's - science application at the time it was compiled.

The API can process the reply from any card already in production at the time it was designed, but for newer cards it just throws up its hands and says, Huh? Wassat?

In fact, the newest cards the BRP5 API knows about are

GeForce GT 555M (144 CUDA cores / 374.40 GFLOPS)
GeForce GTX 570 (480 CUDA cores / 1440.00 GFLOPS)

After that, we just get

GeForce GT 640 (0 CUDA cores / 0.00 GFLOPS)
GeForce GTX 650 Ti (0 CUDA cores / 0.00 GFLOPS)

(data from Albert; there may be more exotic cards here that Albert hasn't seen yet)
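
The mechanism is easy to sketch: the app carries a build-time table of cores per SM, keyed by compute capability (the CUDA SDK samples' ConvertSMVer2Cores helper does essentially this in C). A minimal Python sketch of that logic, with table values illustrating what a CUDA 3.2-era build would know about:

# A CUDA 3.2-era build only knows Tesla and Fermi architectures.
CORES_PER_SM = {
    (1, 0): 8, (1, 1): 8, (1, 2): 8, (1, 3): 8,  # Tesla (G80/GT200)
    (2, 0): 32, (2, 1): 48,                      # Fermi
}

def cuda_cores(sm_count, compute_capability):
    # Unknown architectures fall through to zero - hence the
    # "0 CUDA cores / 0.00 GFLOPS" lines in the log above.
    return sm_count * CORES_PER_SM.get(compute_capability, 0)

print(cuda_cores(15, (2, 0)))  # GTX 570: 480 cores, as reported
print(cuda_cores(13, (5, 2)))  # GTX 970: unknown to this build -> 0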
