All things Nvidia GPU

Boca Raton Community HS
Joined: 4 Nov 15
Posts: 232
Credit: 9484668920
RAC: 22927377

Ian&Steve C. wrote:

have you tried swapping the GPUs between these systems to see if the clock speed anomalies or performance issues follow the card or stay with the system?

This is a good idea. I will actually have an interesting comparison within the next few days: we are getting our other two new TR systems, which are identical to the TR workstation with the issue except that they have the 24-core TR CPU and half the memory (all DIMM slots are still populated). If these ALSO have the same issue, then I think we will be able to say it is not the GPU itself. If they are faster, then it might be the GPU, and I would try the swap idea to confirm.

 

Tom M wrote:

Different gpu driver versions?

Sometimes AMD has motherboard drivers for Windows.

I like Ian&Steve's ideas: either do a clean reset of the new TR GPUs or do a clean swap of the GPUs.

Are the PCIe slots set to "auto" or to Gen4 or Gen3?

Are the cards showing how hard they are loaded? Are there any differences between the systems?

Is the TR motherboard BIOS current?

Hth,

Tom M

Same driver version but I am wondering if there is some odd Dell driver activity happening. I have seen that Dell likes to release specific Nvidia drivers via their distribution platform in the past. I downloaded the newest Nvidia drivers directly from the Nvidia website so maybe there is an odd conflict going on. I know there is software out there that completely wipes GPU drivers (anyone remember the name?), so I might start fresh with drivers, as you all recommend. 

I will check the BIOS to see what the PCIe slots are set to.

There IS something odd about the Windows Task Manager: CUDA does not show up. My Windows 11 laptop also does this, so I was not surprised, but all of the computation is shown under "3D", sustained at 80% with a few spikes to 100% here and there. The Nvidia control panel, however, shows CUDA usage at 100%.

I will be back in front of the workstation tomorrow, but I am curious whether "hardware-accelerated GPU scheduling" is on or off. Either way, I will play around with this.

Thank you all for your suggestions!
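
For the system-to-system comparison discussed above (driver version, PCIe generation, clocks, load), a query along these lines might make the differences easier to spot. This is only a sketch: the field list is an illustrative choice, the helper names `run_or_show` and `compare_report` are made up for the example, and the script falls back to printing the command when nvidia-smi is not installed. The nvidia-smi line itself should also work when typed at a Windows command prompt, since the tool ships with the driver.

```shell
#!/bin/sh
# Sketch: print one CSV row per GPU so two hosts can be compared side by side.
# The field list is an illustrative choice; `nvidia-smi --help-query-gpu`
# lists the fields supported by the installed driver.
FIELDS="name,driver_version,pcie.link.gen.current,pcie.link.gen.max,clocks.sm,clocks.mem,utilization.gpu,power.draw"

# Run the command if its tool is installed, otherwise just show it.
# (Hypothetical helper name.)
run_or_show() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@" || echo "command failed: $*"
    else
        echo "would run: $*"
    fi
}

# Query every GPU in the host for the fields listed above.
compare_report() {
    run_or_show nvidia-smi "--query-gpu=${FIELDS}" --format=csv
}

compare_report
```

Running it on both workstations while a task is crunching and comparing the rows would show whether the PCIe link or the clocks differ. The current PCIe generation can drop when a card is idle, so the numbers are most meaningful under load.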

mikey
Joined: 22 Jan 05
Posts: 12544
Credit: 1838609621
RAC: 5260

Boca Raton Community HS wrote:

 

Same driver version but I am wondering if there is some odd Dell driver activity happening. I have seen that Dell likes to release specific Nvidia drivers via their distribution platform in the past. I downloaded the newest Nvidia drivers directly from the Nvidia website so maybe there is an odd conflict going on. I know there is software out there that completely wipes GPU drivers (anyone remember the name?), so I might start fresh with drivers, as you all recommend.

DDU (Display Driver Uninstaller).

Tom M
Joined: 2 Feb 06
Posts: 6258
Credit: 8901603658
RAC: 10032314

From guru3d I think.

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)

mikey
Joined: 22 Jan 05
Posts: 12544
Credit: 1838609621
RAC: 5260

Tom M wrote:

From guru3d I think. 

Yes, I keep them on a tab in my browser because I trust their downloads to be clean and virus-free.

catavalon21
Joined: 5 Nov 11
Posts: 22
Credit: 179982630
RAC: 31379

Installed the 4070 Ti and ran a few tasks under the Win10 stock app and several Linux FGRP tasks on the custom app, one task at a time.

The W10 tasks take a little less than half the time they took on the 1660 Ti.

https://einsteinathome.org/host/12799118/tasks/2/0

 

On the Linux side they also take a little less than half the time, around 2 minutes 25 seconds.

https://einsteinathome.org/host/12880857/tasks/2/0?sort=desc&order=Sent

Sorry for the clunky way of linking. 

 

A small data sample but FWIW.

Fred

 

Tom M
Joined: 2 Feb 06
Posts: 6258
Credit: 8901603658
RAC: 10032314

What I am looking at is command-line scripting under Linux for Nvidia.

I want to set all GPUs to a power limit of, say, 300 watts.

I want to set the graphics memory overclock to 1500.

I don't want to change the graphics clock.

I have set "coolbits" and can do this in the Nvidia X Server Settings GUI.

So how do you set up sudo in the script?

And why doesn't the Nvidia X Server Settings GUI display the changes I made on the command line?

sudo nvidia-smi -pl 300

sudo nvidia-settings  -a '[gpu:0]/GPUGraphicsClockOffset[3]=0' -a '[gpu:0]/GPUMemoryTransferRateOffset[3]=1500'

sudo nvidia-settings  -a '[gpu:1]/GPUGraphicsClockOffset[3]=0' -a '[gpu:1]/GPUMemoryTransferRateOffset[3]=1500'

========================

tommiller@Ryzen-Charon:~$ nvidia-smi
Sat Jan 28 22:27:03 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.13    Driver Version: 525.60.13    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:09:00.0  On |                  N/A |
|100%   72C    P2   299W / 300W |   4036MiB / 12288MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:0A:00.0 Off |                  N/A |
| 78%   63C    P2   298W / 300W |   3767MiB / 12288MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       982      G   /usr/lib/xorg/Xorg                 21MiB |
|    0   N/A  N/A      1773      G   /usr/lib/xorg/Xorg                 88MiB |
|    0   N/A  N/A      1977      G   /usr/bin/gnome-shell               37MiB |
|    0   N/A  N/A   1347777      G   /usr/lib/firefox/firefox          121MiB |
|    0   N/A  N/A   1368585      C   ...-pc-linux-gnu-opencl_v1.0     1874MiB |
|    0   N/A  N/A   1369086      C   ...-pc-linux-gnu-opencl_v1.0     1874MiB |
|    1   N/A  N/A       982      G   /usr/lib/xorg/Xorg                  5MiB |
|    1   N/A  N/A      1773      G   /usr/lib/xorg/Xorg                  6MiB |
|    1   N/A  N/A   1368618      C   ...-pc-linux-gnu-opencl_v1.0     1874MiB |
|    1   N/A  N/A   1369320      C   ...-pc-linux-gnu-opencl_v1.0     1874MiB |
+-----------------------------------------------------------------------------+
tommiller@Ryzen-Charon:~$

=========================

I have spent an hour plunking around with this.  And I doubt I will ever become a highly-skilled Linux professional.  I seem to have gotten beyond "novice" and reached "beginner", where I am likely to stay.

Thank you. 

Tom M


Ian&Steve C.
Joined: 19 Jan 20
Posts: 3911
Credit: 43685315976
RAC: 63086814

Put all the commands in a bash script, without sudo in the individual commands, then run the script with sudo.
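
As a rough illustration of that advice, the commands posted earlier in the thread could be wrapped like this. It is a sketch, not a tested recipe: the file name `set_gpus.sh`, the `--apply` switch and the `run` helper are invented for the example, and the second nvidia-settings line targets gpu:1 for both offsets. By default the script only prints what it would do.

```shell
#!/bin/sh
# set_gpus.sh (hypothetical name). No sudo inside the script.
# Preview:  ./set_gpus.sh
# Apply:    sudo ./set_gpus.sh --apply

APPLY=0
if [ "${1:-}" = "--apply" ]; then
    APPLY=1
fi

# Print each command; execute it only when --apply was given.
run() {
    echo "+ $*"
    if [ "$APPLY" = "1" ]; then
        "$@"
    fi
}

apply_settings() {
    # 300 W power limit; without -i this applies to every GPU.
    run nvidia-smi -pl 300
    # Leave the graphics clock alone and raise the memory transfer rate offset.
    # Performance level [3] is taken from the earlier post and varies by card.
    run nvidia-settings -a "[gpu:0]/GPUGraphicsClockOffset[3]=0" -a "[gpu:0]/GPUMemoryTransferRateOffset[3]=1500"
    run nvidia-settings -a "[gpu:1]/GPUGraphicsClockOffset[3]=0" -a "[gpu:1]/GPUMemoryTransferRateOffset[3]=1500"
}

# Refuse to apply anything unless the script was started as root.
if [ "$APPLY" = "1" ] && [ "$(id -u)" -ne 0 ]; then
    echo "Run with sudo to apply settings." >&2
    exit 1
fi

apply_settings
```

Keeping sudo out of the individual lines means the password is asked for once, when the script starts, rather than once per command.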
 

 

_________________________________________________________________________

Keith Myers
Joined: 11 Feb 11
Posts: 4894
Credit: 18432501084
RAC: 5725987

Here is my script I named gpuoverclock.sh that sits on the Desktop.  It is one of the first things I run when I first boot the PC.  It sets up the cards for each host.  The only difference in the script is the number of cards, the power levels I set for them and the clocks depending on what kind of cards are in the host.

#!/bin/bash

# Enable persistence mode on all GPUs (needs root).
/usr/bin/nvidia-smi -pm 1

# Power limit in watts for each card.
nvidia-smi -i 0 -pl 200
nvidia-smi -i 1 -pl 200
nvidia-smi -i 2 -pl 200

# PowerMizer mode 1 ("Prefer Maximum Performance") on each card.
/usr/bin/nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1"
/usr/bin/nvidia-settings -a "[gpu:1]/GPUPowerMizerMode=1"
/usr/bin/nvidia-settings -a "[gpu:2]/GPUPowerMizerMode=1"

# Take manual control of the fans and set target speeds (percent); two fans per card.
/usr/bin/nvidia-settings -a "[gpu:0]/GPUFanControlState=1"
/usr/bin/nvidia-settings -a "[fan:0]/GPUTargetFanSpeed=85"
/usr/bin/nvidia-settings -a "[fan:1]/GPUTargetFanSpeed=90"
/usr/bin/nvidia-settings -a "[gpu:1]/GPUFanControlState=1"
/usr/bin/nvidia-settings -a "[fan:2]/GPUTargetFanSpeed=85"
/usr/bin/nvidia-settings -a "[fan:3]/GPUTargetFanSpeed=90"
/usr/bin/nvidia-settings -a "[gpu:2]/GPUFanControlState=1"
/usr/bin/nvidia-settings -a "[fan:4]/GPUTargetFanSpeed=85"
/usr/bin/nvidia-settings -a "[fan:5]/GPUTargetFanSpeed=90"

# Memory and graphics clock offsets at performance level 4.
/usr/bin/nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[4]=800" -a "[gpu:0]/GPUGraphicsClockOffset[4]=60"
/usr/bin/nvidia-settings -a "[gpu:1]/GPUMemoryTransferRateOffset[4]=800" -a "[gpu:1]/GPUGraphicsClockOffset[4]=60"
/usr/bin/nvidia-settings -a "[gpu:2]/GPUMemoryTransferRateOffset[4]=800" -a "[gpu:2]/GPUGraphicsClockOffset[4]=60"

Note that the script is meant to be run as root, since you need root access to set the persistence mode.
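
Since the only things that change between hosts are the number of cards, the power level and the clocks, the same settings could also be driven by a loop. The following is a sketch under assumptions, not a replacement for the script above: the `--apply` switch, the `run` and `configure_gpus` helpers and the variable names are invented for the example, it assumes two fan controllers per card numbered as in the script above, and it groups the commands per card rather than per setting. Without `--apply` it only prints the commands.

```shell
#!/bin/sh
# Loop-based sketch of the settings above. Prints commands by default;
# run as root with --apply to execute them.

APPLY=0
if [ "${1:-}" = "--apply" ]; then
    APPLY=1
fi

NUM_GPUS="${NUM_GPUS:-3}"         # number of cards in this host
POWER_LIMIT="${POWER_LIMIT:-200}" # watts per card
MEM_OFFSET=800                    # memory transfer rate offset
CLOCK_OFFSET=60                   # graphics clock offset
PERF_LEVEL=4                      # performance level index used by the offsets

# Print each command; execute it only when --apply was given.
run() {
    echo "+ $*"
    if [ "$APPLY" = "1" ]; then
        "$@"
    fi
}

# Configure GPUs 0 .. n-1, where n is the first argument.
configure_gpus() {
    n="$1"
    run nvidia-smi -pm 1
    i=0
    while [ "$i" -lt "$n" ]; do
        run nvidia-smi -i "$i" -pl "$POWER_LIMIT"
        run nvidia-settings -a "[gpu:$i]/GPUPowerMizerMode=1"
        run nvidia-settings -a "[gpu:$i]/GPUFanControlState=1"
        # Assumption: two fans per card, numbered 2i and 2i+1.
        run nvidia-settings -a "[fan:$((2 * i))]/GPUTargetFanSpeed=85"
        run nvidia-settings -a "[fan:$((2 * i + 1))]/GPUTargetFanSpeed=90"
        run nvidia-settings -a "[gpu:$i]/GPUMemoryTransferRateOffset[$PERF_LEVEL]=$MEM_OFFSET" -a "[gpu:$i]/GPUGraphicsClockOffset[$PERF_LEVEL]=$CLOCK_OFFSET"
        i=$((i + 1))
    done
}

configure_gpus "$NUM_GPUS"
```

For a two-card host, something like `NUM_GPUS=2 ./script.sh` would preview the commands before anything is changed.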


 

 

Tom M
Joined: 2 Feb 06
Posts: 6258
Credit: 8901603658
RAC: 10032314

Thank you.


Tom M
Joined: 2 Feb 06
Posts: 6258
Credit: 8901603658
RAC: 10032314

Thank you. I am certain I will have some questions after I edit it for two-GPU and one-GPU setups.

For example, what performance gain does persistence mode provide?

If the GPUs seem to be cooling fine, do I need manual fan settings?

PowerMizer mode "automatic" seems to be the most productive for my systems. Is that mode zero?

Tom M
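
One way to check what a host is currently using, rather than relying on memory for the mode numbers, is to read the values back. This is a sketch only: the helper names `run_or_show` and `show_current_modes` are invented for the example, and the script just prints the commands when the tools are not installed.

```shell
#!/bin/sh
# Sketch: read back persistence mode and the PowerMizer mode of one GPU.

# Run the command if its tool is installed, otherwise just show it.
# (Hypothetical helper name.)
run_or_show() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@" || echo "command failed: $*"
    else
        echo "would run: $*"
    fi
}

# $1 is the GPU index to query with nvidia-settings.
show_current_modes() {
    # Persistence mode for every GPU, one CSV row per card.
    run_or_show nvidia-smi --query-gpu=index,persistence_mode --format=csv
    # Current PowerMizer mode of the chosen GPU (needs a running X session).
    run_or_show nvidia-settings -q "[gpu:$1]/GPUPowerMizerMode"
}

show_current_modes 0
```

Setting the mode in the GUI first and then running the query shows which number corresponds to the setting that works best on a given system.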

 

