Overclocking Pascal

archae86
archae86
Joined: 6 Dec 05
Posts: 3161
Credit: 7264248437
RAC: 1574270
Topic 201227

The Opportunity

The Pascal family graphics cards released so far (GTX 1080, GTX 1070, GTX 1060 6GB and 3GB) have given users a considerable improvement in power efficiency over previous Nvidia cards (except the lower-end 750) or any AMD cards in carrying out Einstein@home work.  While the automatic provisions on the cards typically raise their core clock well above the advertised levels, leaving only modest additional potential from user overclocking, their memory clocks at stock condition have significantly more headroom.  Users may see something on the order of 10-25% output improvement, and thus gain both purchase price efficiency and power consumption efficiency (at the system level) by engaging in overclocking.

Some Risks

All overclocking raises operating temperatures.  Since heat is the universal enemy for most failure mechanisms, all of us who overclock are degrading the reliability of our equipment.  But that is on a statistical average expected outcome basis.  In real life we experience a particular outcome, not all points of a distribution, and I believe most of us who practice overclocking endure no equipment harm at all.

As overclocking that starts by trying to find a limit necessarily generates failures, we generally impose some extra communication burden on the project during commissioning work.  Some of us do not set our final operating point conservatively enough to avoid long-term meaningful error rates, and thus impose continuing extra communication burden.  We may also incur some risk that once in a while an erroneous result may be accepted in spite of quorum checking.  Error rates elevated by overclocking may dissuade the project from implementing single-result schemes in cases where they otherwise would, and thus lower project productivity.

As some combinations of application, card, driver, and overclock rates give system crashes, all the usual risks of system crashes apply, including file corruption, loss of non-Einstein work in process, etc.

My particular method of P2 state overclocking for Einstein works by a dodge through the P0 state offsets.  It is quite possible that people using their cards for gaming will find that settings chosen for Einstein success create problems in their games.

Hopes for this thread

I've divided up a rather long amount of material intended for this thread into about eight posts.  I hope it will encourage some people to try and succeed at Pascal overclocking.  I hope other participants will point out my errors, and will extend the use of my input by adding their own techniques and experiences.

Special Problems of Pascal Similar to those of Maxwell2

Many of us first became aware of the P2-state distributed computing mode memory overclock opportunity on Maxwell2 cards when we learned that cards such as the GTX970, as delivered, ran memory clock rates well below those employed by the same card for games.  Once the method was found, many of these cards were actually able to produce correct Einstein work at higher memory clock rates giving much higher productivity.  Finding the method was tricky, as multiple obvious ways of trying to set the memory clock rate in the state which runs Einstein work did not have the desired effect.  An entire sticky thread had been devoted to the Maxwell2 overclocking subject, with considerable complexity, controversy, and internal contradiction.  Pascal with current drivers and tools appears to differ somewhat from Maxwell2 in this regard, so beware of carrying over lessons learned without checking.

I believe the Pascal cards so far don't drop memory clock below game level as much as the 970, but many of them have substantial memory overclock headroom, so a significant opportunity still exists.  The sticky bit is that when a Pascal (or Maxwell2) card is actually processing Einstein work, it is generally in the P2 state.  But the methods for altering card clock rates while running Einstein work that I have found successful specify offsets to the default P0 state behavior.  For the Pascal cards tried (a 1070, a 1060 6GB, and a 1060 3GB) these offsets specified as applying to P0 promptly take effect in the P2 state (no need to suspend running tasks for this action).

My preferred monitoring and control programs

GPU-Z gives a compact view of current values for most of the numbers you want when monitoring GPU behavior, plus a short-term low-resolution graph for each.  I like this combination particularly for monitoring behavior during changes made while looking for the overclocking ceiling.  It also has an averaging capability for each parameter.  There are two gotchas on the averaging.  The first is that the average is taken since you started the program, so if you make a change and want an average in the new condition, you need to quit and start a new copy.  The second is that if there is more than one GPU in the system, you need to start a separate copy of GPU-Z to average each one, and leave it configured to the GPU you expect it to monitor the whole time.  If you change GPUs, you lose the accumulated averaging information for the previous one.  I download GPU-Z directly from TechPowerUp.

MSIAfterburner supports quite a few monitoring and control functions.  The particular role it plays on all my systems is GPU fan speed vs. GPU temperature.  Afterburner lets you specify a multi-point curve, which can help to manage the tradeoff between room noise and card coolness.  As to my taste the Pascal cards all use too low a GPU fan speed by default in the crucial 60-80C temperature band, this is an important function.  Afterburner has heavily configurable graphing of some merit, but I find myself using GPU-Z instead for that function.  It also offers some overclocking control, but I currently use Nvidia Inspector for that function.  I prefer to download Afterburner directly from MSI.
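To make the idea of a multi-point fan curve concrete, here is a minimal Python sketch of how a fan speed could be read off such a curve.  The curve points are hypothetical values chosen only for illustration (they are not my settings), and straight-line interpolation between points is an assumption about how the curve is evaluated, not something taken from Afterburner documentation.

```python
# Hypothetical (temperature C, fan speed %) points; temperatures must be distinct.
EXAMPLE_CURVE = [(40, 30), (60, 50), (80, 90), (90, 100)]


def fan_speed_percent(temp_c, curve):
    """Return the fan speed for temp_c by linear interpolation along the curve.

    Below the first point or above the last point the curve is held flat.
    """
    points = sorted(curve)
    if temp_c <= points[0][0]:
        return points[0][1]
    if temp_c >= points[-1][0]:
        return points[-1][1]
    # Find the segment that brackets temp_c and interpolate within it.
    for (t0, s0), (t1, s1) in zip(points, points[1:]):
        if t0 <= temp_c <= t1:
            return s0 + (s1 - s0) * (temp_c - t0) / (t1 - t0)


if __name__ == "__main__":
    for temp in (55, 65, 75):
        print(temp, "C ->", fan_speed_percent(temp, EXAMPLE_CURVE), "%")
```

Steepening the segment between the 60 and 80 degree points is the sort of change I mean when I say the default behavior in that band is too lazy for my taste.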

NVidia Inspector completes my trio of preferred monitoring and control programs.  While occasionally I start the GUI to get an overall look at card status, Pstate status and transitions, and do overclocking by slider rather than by typing parameters for switches, my primary use is the command line interface.  I use this both in batch delayed startup files, to set operating overclocks automatically on reboot, and in a cmd window, to make adjustments to overclocking when I am searching for the ceiling.  I've found it comforting to find a link for NVI download at the Major Geeks site.

Symptoms of Failure

When one has pushed a clock rate too high, something bad happens, but the particular presenting symptom varies from card to card, application to application, clock rate, and quite likely with host system characteristics.  While I have often seen the same symptom repeat for the same card running the same application with the same clock near limit, one should not assume the experience reported by another will apply.  Here is a list of failure symptoms I have seen:

1. "Safe mode" forced downclocking:

While I have never seen this on Pascal cards yet, multiple times on Nvidia 970, 750, 660, and 460 cards I have observed the card suddenly to drop to a drastically lower clock rate, then continue processing.  These are not small changes, but more like an order of magnitude.  The condition generally does not heal without a reboot, and sometimes that does not do the trick on first try.

2. Involuntary reboot:

Both for Pascal cards and some earlier models, certain excess overclocking conditions have triggered unrequested system rebooting.  Often, but not always, the overclock that triggers this is somewhat higher than that required to trigger another detectable symptom.  Avoiding these reboots is one reason to use small increments while increasing clocks, pausing at each step long enough for the subtler symptoms to present themselves, and checking carefully enough to notice.

3. Abnormal early termination of a specific task, with a Computation error reported visible at the BoincMgr level:

Occasionally this will happen almost immediately (I've seen less than six seconds), so if you have a large number of tasks in the ready to run state, the system can rip through and ruin them all in short order.  This is a reason to consider suspending most of your queue for some tests, at least at risky transitions.  On the web site task list these show "Error while computing" in the status column.  The Stderr for that task will contain lines like:

Outcome: Computation error
Client state: Compute error
Exit status: 1007 (0x000003EF) Unknown error code

But of course there is variety in the specific error codes in different cases.

In the one case in which I got rapidly repeated errors of this type on a Pascal card, I was just slightly over the maximum useful core clock.  

4. Task completes, and passes the sanity test immediately performed on return (if there is any), but flunks a broader sanity test performed only when your quorum partner has also returned a result:

This differs from a miscompare.  There is a long-running thread on this topic started by Gary Roberts.  According to his initial post, the displayed symptom of this exact problem is the status of "Validate error"--that exact text, accept no substitutes.

5. Task completes and makes it through any sanity checks, but is found in insufficient agreement with a quorum partner for immediate validation:

This case results in a temporary status of "Completed, Validation inconclusive" with the task Validation state posted as "Checked, but no consensus yet" and the sending out of a tie-breaker result to a third host.  When that host returns their result, if it matches your partner and not your result, your status is changed to "Completed, marked as invalid" with the task-level validation status shown as "Invalid".  Awkwardly, these cases can take quite a while to resolve, as your quorum partner may not even be available for the first comparison for some days, sending out the tie-breaker is sometimes delayed by hours, return of the tie-breaker can easily take more days, and sometimes a time-out means yet another tie-breaker must be sent.  However, failing overclocks over a non-trivial range may sometimes ONLY manifest this way, so watching carefully enough to notice (and exercising patience) is important.
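As a compact summary of the web-site status texts quoted in symptoms 3 to 5 above, here is a small Python lookup sketch.  The function name and the idea of returning the symptom number are my own illustration.  Only the status strings come from the descriptions above, and an inconclusive status by itself does not prove the fault is yours.

```python
# Status text (lower-cased) -> symptom number from the list above.
SYMPTOM_BY_STATUS = {
    "error while computing": 3,               # task aborted with a computation error
    "validate error": 4,                      # failed the broader sanity test
    "completed, validation inconclusive": 5,  # pending a tie-breaker result
    "completed, marked as invalid": 5,        # tie-breaker sided with the partner
}


def symptom_for_status(status):
    """Return the symptom number for a task status text, or None if not listed."""
    return SYMPTOM_BY_STATUS.get(status.strip().lower())


if __name__ == "__main__":
    for text in ("Error while computing", "Validate error",
                 "Completed, Validation inconclusive"):
        print(text, "-> symptom", symptom_for_status(text))
```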

On Resolving Inconclusive Validation Results

Nothing in this discussion of the assessment of inconclusive results is specific to Pascal overclock, but it is a useful skill.

When both you and your quorum partner have returned a result, the Validator starts up, does some sanity checks on both results, then compares the actual computational results and decides whether they are close enough to be declared a match.  If they are not close enough a temporary status of "Completed, validation inconclusive" is posted in the status column for the task, which remains on the Pending page of the task lists at the web site.  Judging the risk posed by listings of this type is key to timely decision making, but tedious, labor-intensive and lacking in certainty.  It is useful to distinguish among four cases.

1. The only problem is that your quorum partner flunked the Validator sanity check:

This has zero concerning consequence to your situation, but means the final judgment is delayed pending receipt of a tie-breaker result.  When this case applies, the simple pending task list status is not distinguishable from the other cases, but a click on the WU link will show you that the result for your computer is shown with status "Completed, validation inconclusive", while your guilty quorum partner result shows status "Validate error".  Just wait, and fear not.

2. The comparison between your result and the quorum partner result failed to match, but your quorum partner has a terribly flawed track record:

In this case clicking on the WU link will show both initial results with the "Completed, validation inconclusive" status, and clicking on the computer link and then the task list for your partner will allow you to review his recent results with special attention to errors, and the balance of invalid and valid results, especially on the application your task ran.  If your partner has multiple error and invalid results for the application, and zero valid results in the retention interval of this display, you are entitled to almost as much comfort as in case 1.  Sadly, such computers do exist here.

3. A graded case between case 2 and case 4 has you matched to a quorum partner with a moderately flawed track record--several invalid results against a background of mostly valid results might be typical:

Since you are engaged in testing with an elevated a priori probability of error, you must regard such a case as concerning and suspicious, but not by itself conclusive.  Keep watching--if there are more of these (from different partners) in a short time, the evidence is building against you.  If no more show up, this may have been the other guy's fault, or, if the fault was yours, your error rate may be low.

4. If your quorum partner has many recent returns, flawlessly reaching validation status with zero listings for errors or invalid results, you should be seriously concerned:

I'd score just two such outcomes as very strong evidence that your result was the one at fault.
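The four cases above can be sketched as a rough triage function in Python.  This is only an illustration of the reasoning.  The function name, the argument names, and the threshold for "many recent returns" (`MIN_CLEAN_HISTORY`) are hypothetical choices of mine, and the counts would have to be read by hand from the partner's task list.

```python
MIN_CLEAN_HISTORY = 10  # hypothetical stand-in for "many recent returns"


def classify_inconclusive(partner_status, partner_valid, partner_invalid, partner_errors):
    """Return the case number (1-4) for one inconclusive result.

    partner_status: status text shown for the partner's result on the WU page.
    The three counts summarize the partner's recent tasks for the same application.
    """
    if partner_status.strip().lower() == "validate error":
        return 1  # partner flunked the sanity check: wait, and fear not
    bad = partner_invalid + partner_errors
    if bad >= 2 and partner_valid == 0:
        return 2  # terribly flawed partner: almost as comforting as case 1
    if bad == 0 and partner_valid >= MIN_CLEAN_HISTORY:
        return 4  # flawless partner: be seriously concerned
    return 3      # graded case: suspicious, keep watching


if __name__ == "__main__":
    print(classify_inconclusive("Completed, validation inconclusive", 40, 0, 0))
```

Any result of 3 or 4 is a reason to keep a running count across different partners rather than to act on a single listing.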

Usage of NvidiaInspector for Pascal Overclocking--One way

I'll describe one method, using one tool.  This is not to claim no other ways exist.  First I'll sketch, then give pesky details.

Download a copy of the program Nvidia Inspector.  For trial use, run the nvidia inspector program from a command prompt, supplying command-line arguments to make desired clock settings.  For production use, create a desktop shortcut allowing you to institute your desired clock settings by hand shortly after reboot, or create a batch file containing the required nvidia inspector commands.

Program source:

I've found it comforting to find a link for NVI download at the Major Geeks site.  For some time now the newest version available seems to have been 1.9.7.6

Program installation

This is one of those programs which don't actually have an installation process.  After unzipping the provided zip files, I made a directory in the program files directory and just copied all four of the offered files there.  To get a desktop icon I selected the exe file in File Explorer, dragged to desktop and selected the option to make a shortcut.  I only use this when I want the GUI interface.  That is seldom, and normally I use the command line interface.  As there is no installation, this executable is not on your path, so the easy way to run it from the command line is first to change to the directory where you put it.

Setting clock offsets--meaning and specific commands

The specific NVI switches I have used successfully for Pascal overclocking are those which specify an offset to the default behavior for core clock and memory clock.  For example, the exact command I use to set my GTX1070 currently is:

nvidiainspector.exe -setBaseClockOffset:1,0,180 -setMemoryClockOffset:1,0,800

Each of these switches takes three arguments.  The first is the index number of the GPU, the second is the pstate you wish to configure, and the third the actual numeric value you wish applied.  To figure out what NVI thinks the index number of the GPU that interests you is, it is best to open the NVI GUI and look. The various programs that monitor and talk to your GPUs are not consistent in their reference IDs.
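To make the three-argument layout concrete, here is a small Python sketch that assembles the same command string from its parts.  The function and parameter names are hypothetical.  Only the two switch names and the argument order (GPU index, pstate, value) come from the command shown above, and the sketch just builds text without running anything.

```python
def nvi_offset_command(gpu_index, core_offset_mhz, mem_offset_mhz,
                       pstate=0, exe="nvidiainspector.exe"):
    """Build the NVI command line that applies core and memory clock offsets.

    Each switch takes: GPU index, pstate to configure, offset value.
    pstate defaults to 0 because the offsets are specified against P0.
    """
    core = f"-setBaseClockOffset:{gpu_index},{pstate},{core_offset_mhz}"
    mem = f"-setMemoryClockOffset:{gpu_index},{pstate},{mem_offset_mhz}"
    return f"{exe} {core} {mem}"


if __name__ == "__main__":
    # Reproduces the GTX1070 command quoted above (GPU index 1 on my system).
    print(nvi_offset_command(1, 180, 800))
```

Remember that the GPU index is whatever NVI itself reports for the card, which may not match the numbering used by other tools.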

To get a full list of the various switches available for the nvidia inspector command line, just run it with the /? following.  This is actually unsupported, but causes the full switch list (including parameter designations) to appear.  

You will find in the popup list switches for direct setting of clock rates.  The straightforward thing to do for a Pascal card would be to use these, directed to the P2 state, in order to set the rates directly at your actual operating condition.  This has not worked for me on repeated trials, but I've not exhausted all possibilities, so maybe you can find a way.

An example of a delayed startup batch file

Years ago I had some problems I eventually diagnosed as being affected by the starting sequence and time interval of launch of some programs.  I've not rechecked for the problems in a long time, so my practice may be overconstrained.  But the technique is a convenient way to launch a controlled sequence of programs.  For this purpose I compose a batch file in a plain text editor (I use Textpad, but even Word can work if you take care to save as using text format).  I use six types of lines:

1. @echo off:  stops each command line (and the prompt) from being displayed as the batch file runs

2. REM:  any line starting with these characters is just a REMark, added to aid later understanding on review of the file.

3. echo:  any line starting with these characters appears in the command line window in which the batch file is running, at the moment execution reaches that line.  I use it to advise myself of progress through the file.

4. timeout:  a line of the form timeout /t n imposes a delay of n seconds before execution of the next line.

5. start:  a line of the form start /d "location" executable [switches] causes the execution of the specified program

6. exit

With the above explanations of the format and function of lines, this subset of my current standard delayed startup launch file for my main daily use machine (which has both a GTX 1070 and GTX 1060) may be useful:

@echo off

REM This is a batch file intended to launch Stoll9 startup files with a controlled sequence and a controlled delay after my account logon
REM The programs in question should have autolaunch disabled (for example by removal of registry keys in CurrentVersion/Run folders, deselecting in Task Manager, removing links in startup directories, or disabling Task Scheduler entries) if they are not commented out here with a REM
echo this batch file will launch background applications desired upon Peter user login
timeout /t 5
echo *********************************** Setting GPU overclock for GTX 1070 MSI Founders Edition card
start /d "c:\Program Files\Monitoring\Nvidia Inspector" nvidiaInspector.exe -setBaseClockOffset:1,0,180 -setMemoryClockOffset:1,0,800
echo *********************************** Setting GPU overclock for GTX 1060 6GB PNY card
start /d "c:\Program Files\Monitoring\Nvidia Inspector" nvidiaInspector.exe -setBaseClockOffset:0,0,170 -setMemoryClockOffset:0,0,550
timeout /t 2
echo ******************************** MSI Afterburner
start /d "C:\Program Files (x86)\MSI Afterburner" MSIAfterburner.exe
timeout /t 5
echo ******************************** start Performance monitor with values for memory leak monitoring
start /d "C:\Users\Peter\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Administrative Tools" MemLeak1.msc
echo exiting delayed start in 5 seconds
timeout /t 5
exit
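If you would rather generate such a file than type it, the pattern above (echo a label, start a program, wait) can be sketched in Python.  This is only an illustration.  The `LaunchStep` class and `render_startup_batch` function are hypothetical names, the output uses just the line types listed earlier, and labels should avoid characters such as & or | that the command interpreter treats specially.

```python
from dataclasses import dataclass


@dataclass
class LaunchStep:
    label: str      # text echoed to the window before the launch
    directory: str  # directory handed to start /d
    command: str    # executable plus any switches
    delay_s: int    # seconds to wait after the launch


def render_startup_batch(steps):
    """Return batch file text that runs each step in order with its delay."""
    lines = ["@echo off"]
    for step in steps:
        lines.append(f"echo {step.label}")
        lines.append(f'start /d "{step.directory}" {step.command}')
        lines.append(f"timeout /t {step.delay_s}")
    lines.append("exit")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    steps = [
        LaunchStep(
            label="Setting GPU overclock for GTX 1070",
            directory=r"c:\Program Files\Monitoring\Nvidia Inspector",
            command="nvidiaInspector.exe -setBaseClockOffset:1,0,180 "
                    "-setMemoryClockOffset:1,0,800",
            delay_s=2,
        ),
    ]
    print(render_startup_batch(steps))
```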

Scheduling the delayed startup tasks

I use an entry created in Task Scheduler.  I'll mention possibly relevant settings I currently use.  Many other combinations may work fine, and some better.  Reviewing properties for my currently successful task entry, under the General tab I've selected the option to run using my account, only when my userid is logged on, and to run with highest privileges, configured for Windows 10.  At the Triggers tab, I opt to begin the task at logon of my personal account, with the advanced setting to delay by 30 seconds.  At the Actions tab I list my startup batch file in the "program/script" box, and give the directory that file lives in in the optional "Start in" box.

These settings work for me, but I award my daily user account Administrator status, and have my UAC dialled down to the lowest setting.  If nvidia inspector commands you place in such a delayed launch file fail to reach the desired effect, you may find it helpful, while still using logon to your user account as the trigger, to select in the security section of the general tab the Administrator account to run the batch file.

Some Pascal Overclocked results on Einstein BRP6/CUDA55

Attribute     1070      1060_6GB  1060_3GB
Paid          $450      $250      $200
core_offset   180       170       205
core_MHz      2012      2025      2006
mem_offset    800       550       550
mem_MHz       2304      2177      2177      (on the GPU-Z scale)
credit/day    203,657   143,097   132,306   (for BRP6/CUDA55, computed from average elapsed time)

All of these results are for "cruise overclocks", typically set 100 MHz below the highest memory overclock observed to succeed and 40 or 50 MHz below the highest core clock observed to succeed.  I believe these settings give under 1% error rate, possibly very far under.  The overclocking gain was 20% for the 1070, 14% for the 1060 6GB card, and 13% for the 1060 3GB card.
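For anyone who wants to turn the figures above into purchase-price efficiency, here is a short Python sketch of the arithmetic.  The prices, credit/day values and gains are the ones reported above.  The function names are hypothetical, and the back-calculated stock output assumes each quoted gain is measured against the same card at stock clocks on the same application.

```python
CARDS = {
    "1070":     {"paid_usd": 450, "credit_per_day": 203_657, "gain": 0.20},
    "1060_6GB": {"paid_usd": 250, "credit_per_day": 143_097, "gain": 0.14},
    "1060_3GB": {"paid_usd": 200, "credit_per_day": 132_306, "gain": 0.13},
}


def credit_per_day_per_dollar(card):
    """Overclocked daily credit divided by the purchase price."""
    return card["credit_per_day"] / card["paid_usd"]


def implied_stock_credit_per_day(card):
    """Daily credit at stock clocks implied by the quoted overclocking gain."""
    return card["credit_per_day"] / (1 + card["gain"])


if __name__ == "__main__":
    for name, card in CARDS.items():
        print(f"{name}: {credit_per_day_per_dollar(card):.1f} credit/day per dollar, "
              f"about {implied_stock_credit_per_day(card):,.0f} credit/day at stock")
```

This counts only what I paid for each card and ignores power cost and the rest of the system, so treat it as a rough comparison.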
