have you tried swapping the GPUs between these systems to see if the clock speed anomalies or performance issues follow the card or stay with the system?
Tom M wrote:
Different GPU driver versions?
Sometimes AMD has motherboard drivers for Windows.
I like Ian&Steve's suggestions: either clean-reset the new TR GPUs, or clean-swap the GPUs.
Are the PCIe slots set to "auto", or to Gen4 or Gen3?
Are the cards showing how hard they are loaded? Any differences between systems?
Is the TR motherboard BIOS current?
Hth,
Tom M
This is a good idea. I will actually have an interesting comparison within the next few days: we will have our other two new TR systems (identical to the TR workstation with the issue, except they have the 24-core TR CPU and half the memory, though all DIMM slots are still populated). If these ALSO have the same issue, then I think we can say it is not the actual GPU. If they are faster, then it might be the GPU, and I would try the swap idea to confirm.
Same driver version, but I am wondering if there is some odd Dell driver activity happening. In the past I have seen that Dell likes to release specific Nvidia drivers via its own distribution platform. I downloaded the newest Nvidia drivers directly from the Nvidia website, so maybe there is an odd conflict going on. I know there is software out there that completely wipes GPU drivers (anyone remember the name?), so I might start fresh with drivers, as you all recommend.
I will check the BIOS to see what the PCIe slots are set to.
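For what it is worth, the negotiated PCIe link generation (and how hard the cards are being loaded) can also be read from the driver itself, without rebooting into the BIOS. A minimal sketch using nvidia-smi's standard query fields (nvidia-smi ships with the driver on both Windows and Linux):
# current vs. maximum PCIe generation, link width, and GPU load for each card
nvidia-smi --query-gpu=index,name,pcie.link.gen.current,pcie.link.gen.max,pcie.link.width.current,utilization.gpu --format=csv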
There IS something odd about the Windows Task Manager: CUDA does not show up. My Windows 11 laptop also does this, so I was not surprised, but all of the computation is shown under "3D", sustained at 80% with a few spikes to 100% here and there. The Nvidia control panel, on the other hand, shows CUDA usage at 100%.
I will be back in front of the workstation tomorrow, but I am curious whether "hardware-accelerated GPU scheduling" is on or off. Either way, I will play around with it.
Thank you all for your suggestions!
Boca Raton Community HS
DDU (Display Driver Uninstaller).
From guru3d, I think.
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
Tom M wrote:
From guru3d I think.
YES, I have them on a tab in my browser because I trust their downloads to be clean and virus-free.
Installed the 4070 Ti and ran a few tasks under the Win10 stock app, plus several Linux FGRP tasks on the custom app. Single tasks at a time.
The W10 tasks take a little better than half the time of the 1660 Ti:
https://einsteinathome.org/host/12799118/tasks/2/0
On the Linux side they are also a little quicker than half the time, around 2 minutes 25 seconds:
https://einsteinathome.org/host/12880857/tasks/2/0?sort=desc&order=Sent
Sorry for the clunky way of linking.
A small data sample but FWIW.
Fred
What I am looking at is command-line scripting under Linux for Nvidia.
I want to set all GPUs to a power limit of, say, 300 watts.
I want to set the graphics memory overclock to 1500.
I don't want to change the graphics clock.
I have enabled "coolbits" and can do this in the Nvidia X Server Settings GUI.
So how do you set up sudo in the script?
And why doesn't the Nvidia X Server Settings GUI display the changes I made on the command line?
sudo nvidia-smi -pl 300
sudo nvidia-settings -a '[gpu:0]/GPUGraphicsClockOffset[3]=0' -a '[gpu:0]/GPUMemoryTransferRateOffset[3]=1500'
sudo nvidia-settings -a '[gpu:1]/GPUGraphicsClockOffset[3]=0' -a '[gpu:1]/GPUMemoryTransferRateOffset[3]=1500'
========================
tommiller@Ryzen-Charon:~$ nvidia-smi
Sat Jan 28 22:27:03 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.13 Driver Version: 525.60.13 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:09:00.0 On | N/A |
|100% 72C P2 299W / 300W | 4036MiB / 12288MiB | 100% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:0A:00.0 Off | N/A |
| 78% 63C P2 298W / 300W | 3767MiB / 12288MiB | 100% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 982 G /usr/lib/xorg/Xorg 21MiB |
| 0 N/A N/A 1773 G /usr/lib/xorg/Xorg 88MiB |
| 0 N/A N/A 1977 G /usr/bin/gnome-shell 37MiB |
| 0 N/A N/A 1347777 G /usr/lib/firefox/firefox 121MiB |
| 0 N/A N/A 1368585 C ...-pc-linux-gnu-opencl_v1.0 1874MiB |
| 0 N/A N/A 1369086 C ...-pc-linux-gnu-opencl_v1.0 1874MiB |
| 1 N/A N/A 982 G /usr/lib/xorg/Xorg 5MiB |
| 1 N/A N/A 1773 G /usr/lib/xorg/Xorg 6MiB |
| 1 N/A N/A 1368618 C ...-pc-linux-gnu-opencl_v1.0 1874MiB |
| 1 N/A N/A 1369320 C ...-pc-linux-gnu-opencl_v1.0 1874MiB |
+-----------------------------------------------------------------------------+
tommiller@Ryzen-Charon:~$
=========================
I have spent an hour plunking around with this, and I doubt I will ever become a highly skilled Linux professional. I seem to have gotten beyond "novice" and reached "beginner", where I am likely to stay.
Thank you.
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
Put all the commands in a bash script, without sudo in the individual commands,
then run the script itself with sudo.
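A minimal sketch of what that looks like, using the commands from the earlier post (the file name set_gpu.sh is just an example):
#!/bin/bash
# set_gpu.sh - the same commands as above, with sudo removed from each line
nvidia-smi -pl 300
nvidia-settings -a '[gpu:0]/GPUGraphicsClockOffset[3]=0' -a '[gpu:0]/GPUMemoryTransferRateOffset[3]=1500'
nvidia-settings -a '[gpu:1]/GPUGraphicsClockOffset[3]=0' -a '[gpu:1]/GPUMemoryTransferRateOffset[3]=1500'
Then make it executable and run the whole thing under sudo:
chmod +x set_gpu.sh
sudo ./set_gpu.sh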
_________________________________________________________________________
Here is my script, named gpuoverclock.sh, that sits on the Desktop. It is one of the first things I run when I first boot the PC. It sets up the cards for each host. The only differences between hosts are the number of cards, the power levels I set for them, and the clocks, depending on what kind of cards are in the host.
#!/bin/bash
# Enable persistence mode (requires root)
/usr/bin/nvidia-smi -pm 1
# Set a 200 W power limit on each card
nvidia-smi -i 0 -pl 200
nvidia-smi -i 1 -pl 200
nvidia-smi -i 2 -pl 200
# PowerMizer mode 1 = prefer maximum performance
/usr/bin/nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1"
/usr/bin/nvidia-settings -a "[gpu:1]/GPUPowerMizerMode=1"
/usr/bin/nvidia-settings -a "[gpu:2]/GPUPowerMizerMode=1"
# Take manual control of the fans and set target fan speeds
/usr/bin/nvidia-settings -a "[gpu:0]/GPUFanControlState=1"
/usr/bin/nvidia-settings -a "[fan:0]/GPUTargetFanSpeed=85"
/usr/bin/nvidia-settings -a "[fan:1]/GPUTargetFanSpeed=90"
/usr/bin/nvidia-settings -a "[gpu:1]/GPUFanControlState=1"
/usr/bin/nvidia-settings -a "[fan:2]/GPUTargetFanSpeed=85"
/usr/bin/nvidia-settings -a "[fan:3]/GPUTargetFanSpeed=90"
/usr/bin/nvidia-settings -a "[gpu:2]/GPUFanControlState=1"
/usr/bin/nvidia-settings -a "[fan:4]/GPUTargetFanSpeed=85"
/usr/bin/nvidia-settings -a "[fan:5]/GPUTargetFanSpeed=90"
# Apply memory transfer rate and graphics clock offsets at performance level 4
/usr/bin/nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[4]=800" -a "[gpu:0]/GPUGraphicsClockOffset[4]=60"
/usr/bin/nvidia-settings -a "[gpu:1]/GPUMemoryTransferRateOffset[4]=800" -a "[gpu:1]/GPUGraphicsClockOffset[4]=60"
/usr/bin/nvidia-settings -a "[gpu:2]/GPUMemoryTransferRateOffset[4]=800" -a "[gpu:2]/GPUGraphicsClockOffset[4]=60"
As you can see from the shebang at the top, it is a bash script. It is meant to be run as root, since you need root access to set the persistence mode.
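One way to confirm the settings actually took effect (and to see them even if the nvidia-settings GUI has not refreshed) is to query the driver afterwards; a small sketch using standard nvidia-smi query fields:
# persistence mode, power limit, and current clocks for each card
nvidia-smi --query-gpu=index,persistence_mode,power.limit,clocks.sm,clocks.mem --format=csv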
Thank you.
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
Thank you. I am certain I will have some questions after I edit it for two- and one-GPU setups.
For example: what performance gain does persistence mode provide?
If the GPUs seem to be cooling fine, do I need the manual fan settings?
PowerMizer mode "automatic" seems to be most productive for my systems. Is that mode zero?
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
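For what it is worth: persistence mode mainly keeps the driver initialized between tasks rather than raising clocks, so any gain is in avoiding re-initialization delays, not raw compute speed. And rather than guessing which number "automatic" corresponds to, the current mode can be queried directly (gpu:0 here is just an example):
# print the numeric PowerMizer mode currently set on the first GPU
nvidia-settings -q "[gpu:0]/GPUPowerMizerMode"
# check whether persistence mode is enabled on each card
nvidia-smi --query-gpu=index,persistence_mode --format=csv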