Low memory clock on Maxwell2 cards (960/970/980, probably Titan X)

archae86
Joined: 6 Dec 05
Posts: 2,605
Credit: 2,080,496,078
RAC: 2,254,015

I've been applying the GTX 970 memory clock overclocking procedure from skgiven as posted by MrS. Without change it was 3005, and I've so far tried 3100, 3200, 3300, 3400, 3500, and 3600 and am currently trying 3700.

To get to 3700 I had to click the unlock max button on the P2 settings page, without which the slider was limited to 3600. After clicking unlock the displayed maximum is 4000.

I've generally just waited long enough to get three Perseus WUs completed (I'm running 3X), so my timing comparisons are not accurate for each individual level. The overall impression, though, is that each step has reduced average elapsed time: under comparable conditions, average ET has come down from perhaps 3:57 at stock to about 3:37:40 at the 3600 step.

I make no claim that any of these steps is long-term stable, nor compatible with full normal use of the machine. It has been very little used save for the interventions to change the overclock setting during this period.

At the moment I am just cautiously feeling for the memory clock ceiling for this application on my rig. If I find it, or max out at 4000, I'll consider backing down a notch and nudging up core clock in much smaller increments.

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 758
Credit: 142,329,089
RAC: 48,720

You can use "memtestCL" to test GPU memory for errors, although I have yet to find an error with limited testing (e.g. 1024 MB, 1000 runs, which takes a few minutes). Just make sure not to use the older CUDA based "memtestG80", as it's got a bug which always throws errors in one subtest.

And you're right, I should have added the "unlock max" step, but at that point I hadn't noticed it, since my first OC was from 3.5 to 3.6 GHz :p

Running 3.75 GHz now. It seems fine, but I got a few strangely colored surfaces while playing some Tomb Raider. Not sure if it was the memory or something else.

Edit: upon reboot with the option "create startup task" ticked in nVidia Inspector it failed to set the memory OC. At that point BOINC had already started GPU crunching, which likely blocked the clock change in the same way it's blocking it upon manual trials.

Edit2: I added a trigger "at system start" but will wait until the next regular reboot to test if this works (otherwise my Einstein@iGPU tasks take 1-2 days again to become asynchronous, performance suffers a bit before that).

MrS

Scanning for our furry friends since Jan 2002

archae86

My GTX 970 system ran the night at a set GPU memory clock of 3900. The three Perseus WUs which ran uninterrupted to completion after I made the change averaged just under 3:27 elapsed time (compared to about 3:57 running stock). I've just started a run at 4000.
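As a rough illustration of what those elapsed times imply, here is a minimal Python sketch. It assumes the times quoted are hours:minutes (with optional seconds) and takes "just under 3:27" as 3:27. The function names are made up for this example.

```python
def to_seconds(elapsed: str) -> int:
    """Convert 'h:mm' or 'h:mm:ss' to seconds (assumed format of the quoted times)."""
    parts = [int(p) for p in elapsed.split(":")]
    while len(parts) < 3:
        parts.append(0)  # missing seconds are treated as zero
    hours, minutes, seconds = parts
    return hours * 3600 + minutes * 60 + seconds


def throughput_gain(before: str, after: str) -> float:
    """Ratio of tasks completed per unit time, after versus before."""
    return to_seconds(before) / to_seconds(after)


stock_et = "3:57"   # about 3:57 at stock memory clock
oc_et = "3:27"      # just under 3:27 at memory clock 3900

gain = throughput_gain(stock_et, oc_et)
saved = 1 - to_seconds(oc_et) / to_seconds(stock_et)
print(f"throughput gain: {(gain - 1) * 100:.1f}%")
print(f"elapsed time reduction: {saved * 100:.1f}%")
```

On those assumptions the gain works out to roughly 14 to 15 percent more Perseus work per day, before allowing for any invalid results.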

Some observations on nVidia Inspector's behavior when requesting the higher settings:

1. If the P0 memory clock rate is not set high enough to allow the newly requested P2 rate, then on clicking the "apply" button the displayed request reverts to the previous request, not to the limiting P0 rate, and with no error message.
2. The P0 display shows only the offset, not the rate, yet the maximum resulting rate is limited to 4000, so it took me a bit of cut and try to find the right offset to apply to P0, which ended up at 95. Trying too high a number brings up a slightly alarming exception popup box.

At the moment I am not clear on how to use nVidia Inspector to change the P2 core clock rate (assuming I might want to try this). In that case, does BOINC really need to be inactive? The rate NVI displays before making requests in that case seems not to be the actual rate observed during BOINC operation.

archae86

My GTX 970 system ran the day at a set GPU memory clock of 4000. Perseus WU average elapsed times appear to be slightly under 3:26.

Adding a modest contribution from CPU jobs running on the dual-core Haswell host, overall credit production/day appears to be about 73,400, up from about 64,200 at stock memory and core clock rates.

As the reported core clock rate while running Perseus is already 1367, I don't know how much headroom there may be on that side. But I hope to try core clock overclocking soon.

I'm out of the overclocking habit, and not sure I shall try to leave this one in place. But it does seem that the stock memory clock setting is remarkably conservative when running Perseus on this box.

archae86

Oops,

My claim that my GTX 970 ran OK for a day at 4000 memclock was incorrect.

My GTX 970 host has returned seven Perseus WUs which generated validate errors in the period when I was running among my highest GPU memory clock rates. It appears almost certain that my setup is not satisfactory at stock core clock with memory clock commanded to 4000 (the highest available).

At the moment I am crawling up in core clock in very small increments of 20 MHz, holding memory clock at 3800. I think I should stop my ascent where I am to see whether 24 hours of results suggest I am actually at a tolerable setting, or need to back down on one or both.

For all seven invalid results, the task page outcome field reads "Validate error (8:00001000)", but the stderr output does not seem to add useful information.

By the way, for the exercise of reducing the memory clock from 4000 to 3800 and bumping up core clock by 20 at a time, I did not stop BOINC. The nVidia Inspector field whose changes I found to propagate to the running BOINC state was the P0 delta. I'm currently running at core clock +80 (which gives 1447) and memclock 3800.
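The crawl described in this post (raise the offset one small step at a time, stop at the first sign of trouble, then back off) can be sketched in Python as below. This is only an outline of the procedure, not a tool that touches the card: `is_clean` is a hypothetical callback standing in for "run at this offset for a while and see whether any results come back invalid", and the threshold used in the usage lines is invented purely to exercise the loop.

```python
from typing import Callable, Optional


def crawl_offset(is_clean: Callable[[int], bool],
                 step: int = 20,
                 limit: int = 200,
                 margin_steps: int = 1) -> Optional[int]:
    """Step the clock offset up until a trial fails, then back off.

    Returns the suggested offset (last clean offset minus a safety margin),
    or None if even the starting offset of 0 was not clean.
    """
    last_clean = None
    offset = 0
    while offset <= limit:
        if not is_clean(offset):
            break               # first observed trouble: stop climbing
        last_clean = offset
        offset += step
    if last_clean is None:
        return None
    return max(0, last_clean - margin_steps * step)


# Hypothetical card that starts producing invalid results at +140 MHz.
def simulated_trial(offset: int) -> bool:
    return offset < 140


suggested = crawl_offset(simulated_trial, step=20, limit=200)
print("suggested offset:", suggested)
```

The safety margin is the important design choice here: a short trial at each step can easily miss a low error rate, so the last offset that merely looked clean is not necessarily a safe place to stop.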

ExtraTerrestrial Apes

It's OK for your card to fail at 4.0 GHz, because that's just about the highest typical reported OC for these cards. There are quite a few which can "only" take 3.8 or 3.9 GHz. And when seeing such reports you never know how thoroughly they tested stability.

Regarding the core clock: I ran Heaven in a loop and increased the core clock every few minutes. That's a rough test, and I got trouble at a clock offset of 240 MHz, so I initially tried 220 MHz. I reduced this over time to make things stable; now I'm at +170 MHz and things look very good. This yields 1366 MHz at 1.08 V on my card. Your card will be different, but not by all that much.

MrS

archae86

I cut short my stability test at less than 24 hours as I realized I'd prefer to make changes early in the day with more time to detect any substantial invalid result problem. A wedding trip out of town just after Christmas motivates me to search for the ceiling and back down to a safe setting before December 26.

The good news is that I got no observed invalids in about 3/4 day running at core clock reported 1447, memclock reported 3802. So I've just now nudged core clock up to 1467.

The PC survived my wife playing some solitaire, and perhaps some Firefox web surfing, without observed anomalies.

MrS--is your reported 1366 MHz while playing games? Or is it your current rate while actually running BOINC processing? It seems somewhat unlikely that my 1447 is directly comparable to your 1366, especially as it was the response to my specification of +80. While my +80 was specified on NVI's P0 page, the resulting 1447 was observed while running Perseus.

My current plan is to inch the core clock up 20 at a time, while holding memclock to 3800. When I run out of core clock adjustment room or detect trouble, I'll back down core clock a little and try nudging memclock up a little at a time. That may take a while, as the invalid result generation cliff on the memclock slope may not be really crisp. I think I had approximately 1/3 of WUs process at 4000 with invalid results--none of which terminated early or gave any other sign of distress that I noticed.
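Because the cliff may not be crisp, it helps to put numbers on how easily a short trial can miss a problem. The Python sketch below assumes each WU fails independently with the same probability, which is a simplification and not something established in this thread. The function names are made up for illustration.

```python
import math


def chance_all_clean(invalid_rate: float, n_wus: int) -> float:
    """Probability that n_wus results all validate despite a given invalid rate."""
    return (1.0 - invalid_rate) ** n_wus


def wus_needed(invalid_rate: float, confidence: float = 0.95) -> int:
    """Smallest number of WUs giving at least `confidence` chance of seeing one invalid."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - invalid_rate))


rate_at_4000 = 1 / 3   # roughly the invalid fraction seen at memclock 4000
print(f"chance 3 WUs all look clean: {chance_all_clean(rate_at_4000, 3):.2f}")
print(f"WUs needed for 95% detection: {wus_needed(rate_at_4000)}")
```

With an invalid rate near 1/3, a three-WU trial looks clean about 30% of the time, and it takes about 8 WUs to have a 95% chance of seeing at least one invalid. Lower error rates closer to the edge of the cliff need far more.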

I've not yet decided whether I want to keep running an overclocked configuration after this is all over, nor how. For historical reasons (launch-order interactions I have suspected among a few applications), I already use a delayed-launch command file to start those applications at controlled intervals, so most likely I'll use that mechanism. To do so I'll need to find a proper command-line invocation of nVidia Inspector, or else learn that, just possibly, I can actually use Afterburner if it is configured properly.

archae86

In my crawl up in core clock rate, I've been pausing at each 20 MHz step only long enough to report one or two WUs. I've just seen my first validate error, which came on a WU which ran about the last half of its processing at reported rates of core clock 1507 and memclock 3802.

Interestingly enough, it reported a different error signature from the uniform (8:00001000) reported for all of my memclock=4000 validate errors. The new signature is reported as outcome: Validate error (58:00111010).

It remains to be seen whether this signature is a useful means to distinguish between excessive memory clock and core clock rate failures, or just a happenstance.
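One way to keep track of this is to tally signatures against the condition they occurred under. The Python sketch below does that for the eight validate errors reported so far. The thread does not say what the two fields mean, so the code makes no claim about that; it only checks one thing that can be verified by arithmetic, namely that in both signatures the second field read as a binary number equals the first field. The condition labels and helper name are made up for this example.

```python
import re
from collections import Counter

# Matches outcome text such as "Validate error (8:00001000)".
SIGNATURE = re.compile(r"Validate error \((\d+):([01]+)\)")


def parse_signature(outcome: str):
    """Return (code, bits) from an outcome string, or None if it has no signature."""
    match = SIGNATURE.search(outcome)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


# The eight validate errors reported in this thread so far.
observations = (
    [("memclock 4000", "Validate error (8:00001000)")] * 7
    + [("core clock 1507", "Validate error (58:00111010)")]
)

tally = Counter((cond, parse_signature(outcome)) for cond, outcome in observations)
for (cond, (code, bits)), count in tally.items():
    same_value = int(bits, 2) == code  # is the bit string just the code in binary?
    print(f"{cond}: code={code} bits={bits} count={count} bits==code: {same_value}")
```

With only one core-clock failure recorded, the tally cannot yet say whether the signature separates the two failure modes; it just makes the comparison easy to repeat as more invalid results arrive.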

As MrS mentioned 1.08 Volts, I should mention that reporting software asserts my GTX 970 is running at 1.20V during BOINC processing. I've not knowingly done anything to alter this.

archae86

So, having found, at least to first order, memclock and core clock ceilings for Perseus work on this rig at 4000 and 1507, I backed down a bit. A run at 3899/1486 gave me an invalid matching the core clock syndrome, so I backed down core clock 20 more.

I have over 24 hours of running at core clock reported at 1466 (+100 from stock) and memclock of 3899. So far no validate errors. I won't claim this is safe or stable, but with over 20 completions, many of which have validated against quorum partners, it is currently not a high error rate condition.
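To make "not a high error rate condition" a little more concrete, a run with no failures still only bounds the error rate from above. The Python sketch below computes the one-sided upper bound for zero failures in n trials. It assumes all 20 completions end up validating (many are still pending against quorum partners) and that WUs fail independently. The function name is made up for illustration.

```python
def zero_failure_upper_bound(n_clean: int, confidence: float = 0.95) -> float:
    """Largest per-WU invalid rate still consistent with n_clean clean results.

    Solves (1 - p) ** n_clean = 1 - confidence for p.
    """
    if n_clean <= 0:
        raise ValueError("need at least one clean result")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_clean)


for n in (20, 50, 100):
    bound = zero_failure_upper_bound(n)
    print(f"{n} clean results: invalid rate below {bound * 100:.1f}% at 95% confidence")
```

With 20 clean results the bound is still around 14%, so a day of running rules out a cliff like the one at memclock 4000 but says little about an occasional invalid.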

With a modest 3634 in indicated credit/day from two CPU GW jobs added to an indicated 69800 from GPU Perseus jobs, this currently indicates a host credit rate of 73434/day. Though considerably helped by the overclock (especially of memory clock), that is still not a very satisfactory return on the price of a GTX 970 card, nor in line with the performance such cards get in other applications. My two hosts which each have a GTX 660 plus a 750 handily beat it in credit, despite not being overclocked by me at all.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 4,792
Credit: 27,201,955,310
RAC: 34,906,247

Quote:
With a modest 3634 in indicated credit/day from two CPU GW jobs added to an indicated 69800 from GPU Perseus jobs, this currently indicates a host credit rate of 73434/day. Though considerably helped by the overclock (especially of memory clock), that is still not a very satisfactory return on the price of a GTX 970 card, nor in line with the performance such cards get in other applications. My two hosts which each have a GTX 660 plus a 750 handily beat it in credit, despite not being overclocked by me at all.


I've been following your experiences with overclocking the GTX 970 with interest. It's good that you've gained a nice performance boost.

From your earlier posts, I understand that you are using two CPU cores only for CPU tasks on your i3-4130 and you are running 3 concurrent BRP5 tasks. Have you tried running more than 3?

The reason I ask is that I find there is an improvement going from 3 to 4 on a 2GB AMD HD7850, which I would assume should be an inferior card to yours. The other two things that puzzle me are the run time taken by your FGRP4 CPU tasks and the CPU time needed for each GPU task - around 34000 secs and 4300 secs respectively. Also, there seems to be a big difference between the run time and the CPU time for FGRP4 tasks on your host, even though you have two free virtual cores for other duties.

On my host, validated FGRP4 tasks take around 20-22 ksecs (and run time and CPU time are quite close) and the CPU time component of a GPU task is only around 1700 secs. That particular host has a current RAC around 77K. The GPU is not overclocked at all so maybe it's capable of even a little more.

Cheers,
Gary.
