All things Nvidia GPU

Skip Da Shu
Joined: 18 Jan 05
Posts: 150
Credit: 997883709
RAC: 799251

Which is close enough to my current problem for me to reply to you in hopes there's a solution someplace...

This is a Linux Mint v20.3 box with two PCIe card slots.  Before I ever got 'coolbits' / xorg working, one of the cards' fans gave up.  While that card was out, I got xorg & coolbits working with the box running a single card.  The replacement card arrived today, so I adjusted my xorg device entries, installed the card, and after one minor BusID change (I still had the prior one in) I got to my screen and ran my script to set things:

# card on bottom w/ monitor
    /usr/bin/nvidia-smi -i 0 -pm 1
    /usr/bin/nvidia-smi -i 0 -pl 222
    
# card on top w/o monitor
    /usr/bin/nvidia-smi -i 1 -pm 1
    /usr/bin/nvidia-smi -i 1 -pl 190
 
# added 2nd Zotac 3070 5/24/2023, coolbits only functional on gpu0(Device0)
#
    /usr/bin/nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1"
    /usr/bin/nvidia-settings -a "[gpu:0]/GPUGraphicsClockOffset[4]=90"
    /usr/bin/nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[4]=138"
    /usr/bin/nvidia-settings -a "[gpu:1]/GPUPowerMizerMode=1"
#    /usr/bin/nvidia-settings -a "[gpu:1]/GPUGraphicsClockOffset[4]=90"
#    /usr/bin/nvidia-settings -a "[gpu:1]/GPUMemoryTransferRateOffset[4]=138"

The last two lines were simply ignored, so I commented them out for now as a reminder.

My clock checker at the moment shows:

  Attribute 'GPUCurrentClockFreqs' (skip-MS7C91:0.0): 2085,6870.
  Attribute 'GPUCurrentClockFreqs' (skip-MS7C91:0[gpu:0]): 2085,6870.
  Attribute 'GPUCurrentClockFreqs' (skip-MS7C91:0[gpu:1]): 1920,6800.
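
A note on reading this output: as far as I know, the two numbers reported by GPUCurrentClockFreqs are the graphics clock and the memory clock in MHz. Below is a minimal shell sketch for pulling them out of lines shaped like the ones above. The function name `parse_clock_line` is made up for illustration, and the line format is assumed from this output only.

```shell
#!/bin/sh
# Hypothetical helper: turn one GPUCurrentClockFreqs line into
# "gpu:N graphics=MHz memory=MHz". Lines without a [gpu:N] target
# (such as the plain X screen line) produce no output.
parse_clock_line() {
    printf '%s\n' "$1" | sed -n \
        's/.*\[\(gpu:[0-9][0-9]*\)]): \([0-9][0-9]*\),\([0-9][0-9]*\)\.$/\1 graphics=\2 memory=\3/p'
}

parse_clock_line "  Attribute 'GPUCurrentClockFreqs' (skip-MS7C91:0[gpu:1]): 1920,6800."
```

Keeping one line per GPU in this shape makes it easier to compare the two cards side by side after each xorg change.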

 

nvidia-smi:

Wed May 24 17:27:24 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.41.03              Driver Version: 530.41.03    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3070         On | 00000000:04:00.0  On |                  N/A |
| 60%   64C    P2              139W / 222W|   1370MiB /  8192MiB |     99%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 3070         On | 00000000:2B:00.0 Off |                  N/A |
| 97%   86C    P2              184W / 189W|    722MiB /  8192MiB |     97%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1226      G   /usr/lib/xorg/Xorg                          146MiB |
|    0   N/A  N/A      1755      G   cinnamon                                     32MiB |
|    0   N/A  N/A      5171      G   ...gnu/webkit2gtk-4.0/WebKitWebProcess       26MiB |
|    0   N/A  N/A     10050      G   /usr/lib/firefox/firefox                     97MiB |
|    0   N/A  N/A     17585      C   ...6_x86_64-pc-linux-gnu__BRP7-cuda102      714MiB |
|    0   N/A  N/A     18953      C   ..._64-pc-linux-gnu__opencl_nvidia_101      348MiB |
|    1   N/A  N/A      1226      G   /usr/lib/xorg/Xorg                            4MiB |
|    1   N/A  N/A     17663      C   ...6_x86_64-pc-linux-gnu__BRP7-cuda102      714MiB |
+---------------------------------------------------------------------------------------+

All to show that coolbits is only applying to GPU0 (Device0).  In xorg it is coded in the single Screen Section.  I will run off now and try putting it on each device but I think I've been down this road w/o success.  Any suggestions?

Thanx, Skip

Skip Da Shu

With Coolbits and ThermalConfigurationCheck applied to the Device1 section:

Section "Device"
    Identifier     "Device1"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:43:0:0"
    Option         "Coolbits" "12"
    Option         "ThermalConfigurationCheck" "True"
EndSection

# this will become device 0, lower card w/ monitor
Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:04:0:0"
EndSection

Section "Screen"
        Identifier     "Screen0"
        Device         "Device0"
        Monitor        "Monitor0"
        Option         "Coolbits" "12"
        Option         "ThermalConfigurationCheck" "True"
        DefaultDepth    24
    SubSection     "Display"
            Depth       24
    EndSubSection
EndSection
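
One thing worth double-checking in a two-card setup like this is the BusID: nvidia-smi prints the PCI bus in hexadecimal (2B:00.0 for the top card in the nvidia-smi output earlier), while xorg.conf wants decimal (PCI:43:0:0). Here is a small sketch of the conversion, assuming the domain:bus:device.function layout that nvidia-smi printed. The function name `busid_to_xorg` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert an nvidia-smi Bus-Id such as
# 00000000:2B:00.0 (hex) into an xorg.conf BusID such as PCI:43:0:0 (decimal).
busid_to_xorg() {
    id=${1#*:}          # drop the PCI domain  -> 2B:00.0
    bus=${id%%:*}       # bus number (hex)     -> 2B
    devfn=${id#*:}      # device.function      -> 00.0
    dev=${devfn%%.*}    # device (hex)         -> 00
    fn=${devfn#*.}      # function             -> 0
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}

busid_to_xorg 00000000:2B:00.0   # top card, no monitor
busid_to_xorg 00000000:04:00.0   # bottom card, with monitor
```

The two calls correspond to the BusID values used in the Device1 and Device0 sections.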

 

Xorg.0.log shows:

[     6.866] (II) Applying OutputClass "nvidia" options to /dev/dri/card0
[     6.866] (**) NVIDIA(0): Option "AllowEmptyInitialConfiguration"
[     6.866] (**) NVIDIA(0): Option "ThermalConfigurationCheck" "True"
[     6.866] (**) NVIDIA(0): Option "Coolbits" "12"

[     6.866] (**) NVIDIA(0): Enabling 2D acceleration
[     6.866] (II) Loading sub module "glxserver_nvidia"
[     6.866] (II) LoadModule: "glxserver_nvidia"
[     6.866] (II) Loading /usr/lib/x86_64-linux-gnu/nvidia/xorg/libglxserver_nvidia.so
[     6.879] (II) Module glxserver_nvidia: vendor="NVIDIA Corporation"
[     6.879]     compiled for 1.6.99.901, module version = 1.0.0
[     6.879]     Module class: X.Org Server Extension
[     6.879] (II) NVIDIA GLX Module  530.41.03  Thu Mar 16 19:27:05 UTC 2023
[     6.879] (II) NVIDIA: The X server supports PRIME Render Offload.
[     8.129] (--) NVIDIA(0): Valid display device(s) on GPU-0 at PCI:4:0:0

.

[     8.318] (II) Applying OutputClass "nvidia" options to /dev/dri/card1
[     8.318] (**) NVIDIA(G0): Option "AllowEmptyInitialConfiguration"
[     8.318] (**) NVIDIA(G0): Enabling 2D acceleration
[     8.318] (II) NVIDIA: The X server supports PRIME Render Offload.
[     9.193] (--) NVIDIA(0): Valid display device(s) on GPU-1 at PCI:43:0:0
[     9.193] (--) NVIDIA(0):     DFP-0
[     9.193] (--) NVIDIA(0):     DFP-1
[     9.193] (--) NVIDIA(0):     DFP-2
[     9.193] (--) NVIDIA(0):     DFP-3
[     9.193] (--) NVIDIA(0):     DFP-4
[     9.193] (--) NVIDIA(0):     DFP-5
[     9.193] (--) NVIDIA(0):     DFP-6
[     9.194] (II) NVIDIA(G0): NVIDIA GPU NVIDIA GeForce RTX 3070 (GA104-A) at PCI:43:0:0
[     9.194] (II) NVIDIA(G0):     (GPU-1)

.

[     9.197] (II) NVIDIA(G0): Virtual screen size determined to be 640 x 480
[     9.197] (WW) NVIDIA(G0): Unable to get display device for DPI computation.
[     9.197] (==) NVIDIA(G0): DPI set to (75, 75); computed from built-in default
[     9.199] (II) NVIDIA: Reserving 24576.00 MB of virtual memory for indirect memory
[     9.199] (II) NVIDIA:     access.
[     9.213] (II) NVIDIA(0): Setting mode "DFP-0:nvidia-auto-select"
[     9.292] (==) NVIDIA(0): Disabling shared memory pixmaps
[     9.292] (==) NVIDIA(0): Backing store enabled
[     9.292] (==) NVIDIA(0): Silken mouse enabled
[     9.292] (**) NVIDIA(0): DPMS enabled
[     9.292] (II) Loading sub module "dri2"
[     9.292] (II) LoadModule: "dri2"
[     9.292] (II) Module "dri2" already built-in
[     9.292] (II) NVIDIA(0): [DRI2] Setup complete
[     9.292] (II) NVIDIA(0): [DRI2]   VDPAU driver: nvidia
[     9.304] (II) NVIDIA(G0): Setting mode "NULL"
[     9.313] (==) NVIDIA(G0): Disabling shared memory pixmaps
[     9.313] (==) NVIDIA(G0): Backing store enabled
[     9.313] (==) NVIDIA(G0): Silken mouse enabled
[     9.313] (**) NVIDIA(G0): DPMS enabled
[     9.313] (WW) NVIDIA(G0): Option "Coolbits" is not used
[     9.313] (WW) NVIDIA(G0): Option "ThermalConfigurationCheck" is not used

[     9.313] (II) Loading sub module "dri2"
[     9.313] (II) LoadModule: "dri2"
[     9.313] (II) Module "dri2" already built-in
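
The lines that matter in this log are the two (WW) "is not used" warnings on NVIDIA(G0): the options were set but nothing consumed them for the second GPU. Here is a quick sketch for pulling those warnings out of a log file. The helper name `unused_nvidia_options` is made up, and /var/log/Xorg.0.log is only the usual default location, which is an assumption for this box.

```shell
#!/bin/sh
# Hypothetical helper: print "<screen> <option>" for every
# 'Option "..." is not used' warning the NVIDIA driver logged.
unused_nvidia_options() {
    sed -n 's/.*(WW) NVIDIA(\([^)]*\)): Option "\([^"]*\)" is not used.*/\1 \2/p' "$1"
}

# Usage (the path is an assumption; pass whichever log your X server writes):
#   unused_nvidia_options /var/log/Xorg.0.log
```

Running it after each xorg change gives a short yes/no answer on whether Coolbits was picked up for G0.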

Keith Myers
Joined: 11 Feb 11
Posts: 4895
Credit: 18434715897
RAC: 5708155

What was your coolbits command line?

This is the one I use to have thermal and fan control set for all cards.

Quote:
sudo nvidia-xconfig --thermal-configuration-check --cool-bits=28 --enable-all-gpus

If you installed a card after you ran the coolbits tweak, you need to rerun it.  You should then see thermal control sections for each card in the xorg.conf file in /etc/X11.

I'd clear out the existing xorg.conf by copying the xorg.conf.backup file from the original bare installation back over it, and then rerun the coolbits tweak.
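
For reference, Coolbits is a bitmask. Going by my reading of the NVIDIA driver README, 4 enables manual fan control, 8 enables clock offsets, and 16 enables overvoltage, so 12 = 4 + 8 and 28 = 4 + 8 + 16. Both values include fan control and clock offsets; the only difference is the overvoltage bit. Here is a small sketch that decodes a value. The function name and the short labels are mine, not NVIDIA's, and values 1 and 2 are left out because, as far as I know, they are not relevant to these cards.

```shell
#!/bin/sh
# Hypothetical helper: list which features a Coolbits value turns on.
# Labels are informal; values 1 and 2 are deliberately not decoded.
decode_coolbits() {
    v=$1
    out=""
    [ $((v & 4)) -ne 0 ]  && out="$out fan-control"
    [ $((v & 8)) -ne 0 ]  && out="$out clock-offsets"
    [ $((v & 16)) -ne 0 ] && out="$out overvoltage"
    printf '%s\n' "${out# }"
}

decode_coolbits 12   # value used in the hand-edited Device/Screen sections
decode_coolbits 28   # value passed to nvidia-xconfig --cool-bits
```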

 

 

Skip Da Shu

1) Saw a post of yours from long ago and ordered an HDMI dummy plug.  Arrives tomorrow.

2) There is a lot of good info in this thread.  I've gotta make time to read the 50 pages I haven't yet.

3) Mint v20.3 doesn't come with an /etc/X11/xorg.conf file, and we could never get it to start the X server with one placed there.  It builds it on the fly from parts and pieces in /usr/share/X11/xorg.conf.d.  I'll call it a virtual xorg.conf until I can figure out where it writes it out.

I hand built a 20-nvidia.conf there:

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "AOC"
    ModelName      "32G2WG3"
    Option         "DPMS"
EndSection


Section "Device"
    Identifier     "Device1"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:43:0:0"
    Option         "Coolbits" "12"
    Option         "ThermalConfigurationCheck" "True"
EndSection

# this will become device 0, lower card w/ monitor
Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:04:0:0"
EndSection

Section "Screen"
        Identifier     "Screen0"
        Device         "Device0"
        Monitor        "Monitor0"
        Option         "Coolbits" "12"
        Option         "ThermalConfigurationCheck" "True"
        DefaultDepth    24
    SubSection     "Display"
            Depth       24
    EndSubSection
EndSection

 

4) I took a backup so I can see what "sudo nvidia-xconfig --thermal-configuration-check --cool-bits=28 --enable-all-gpus" does to me on this box.

RESULTS for 4) above:  Surprisingly, I didn't get a black screen, and it built a legit /etc/X11/xorg.conf.  It didn't get me coolbits control on GPU1, but I'm going to try adding the virtual display thing to the screen it hooked GPU1 (Device1) to, and try again tomorrow with the HDMI dummy plug.

    SubSection     "Display"
        Depth       24
        Virtual     1920 1080
    EndSubSection

 

Xorg.0.log:

[     9.433] (WW) NVIDIA(G0): Option "Coolbits" is not used
[     9.433] (WW) NVIDIA(G0): Option "ThermalConfigurationCheck" is not used

Keith Myers

Mint is weird.  Should behave exactly like Ubuntu and Debian which it is derived from.

But it doesn't a lot it seems.

I've never had any card not be enabled by the coolbits tweak.  I've run as many as four at a time on hosts before with only one card connected to the monitor. And all got fan and thermal control.

And none of them needed a dummy plug.  The only time I've needed a dummy plug was on headless SoC's to get video output for VNC or RDP sessions and to keep crunching on the embedded gpu in the SoC.

 

mikey
Joined: 22 Jan 05
Posts: 12547
Credit: 1838630996
RAC: 7097

Keith Myers wrote:

Mint is weird.  Should behave exactly like Ubuntu and Debian which it is derived from.

But it doesn't a lot it seems.

I've never had any card not be enabled by the coolbits tweak.  I've run as many as four at a time on hosts before with only one card connected to the monitor. And all got fan and thermal control.

And none of them needed a dummy plug.  The only time I've needed a dummy plug was on headless SoC's to get video output for VNC or RDP sessions and to keep crunching on the embedded gpu in the SoC.

I have some homemade ones in a drawer because Windows required them for a while, but I'm not sure it does anymore. It's kinda like having to load the GPU drivers twice, once for each GPU, but now one installation takes care of both cards as long as they are both Nvidia. And for me it even works with a miner GPU and a fairly cheap 1 GB card I use just for the display.

Skip Da Shu

Keith Myers wrote:

Mint is weird.  Should behave exactly like Ubuntu and Debian which it is derived from.

But it doesn't a lot it seems.

I've never had any card not be enabled by the coolbits tweak.  I've run as many as four at a time on hosts before with only one card connected to the monitor. And all got fan and thermal control.

And none of them needed a dummy plug.  The only time I've needed a dummy plug was on headless SoC's to get video output for VNC or RDP sessions and to keep crunching on the embedded gpu in the SoC.

Well, I'll try the dummy plug with the

SubSection     "Display"
        Depth       24
        Virtual     1920 1080
EndSubSection

and if it doesn't get me there, so be it.  The top card (in a standard ATX mobo) is the one running a bit slower, and it's the hotter one, so I'll just live with it.

With about 135~150 W of current load...

Every 11.0s: NV_Clocks.sh          skip-MS7C91: Thu May 25 06:44:53 2023

  Attribute 'GPUCurrentClockFreqs' (skip-MS7C91:0[gpu:0]): 2115,6870.
  Attribute 'GPUCurrentClockFreqs' (skip-MS7C91:0[gpu:1]): 2040,6800.

Skip

Tom M
Joined: 2 Feb 06
Posts: 6258
Credit: 8907163658
RAC: 10163641

So far, any price pressure the RTX 4090 has provided doesn't seem to have driven the RTX 3090 Ti price on eBay down much at all.

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)

yummycheese
Joined: 12 Aug 16
Posts: 8
Credit: 204951385
RAC: 0

I bought an EVGA 3080 Ti FTW3 for $650 on eBay, which seems ~ reasonable ~ in this day and age. Seems like a sweet spot for compute/value with its 10,240 CUDA cores.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3911
Credit: 43711745976
RAC: 63087289

3080Ti is certainly a sweet spot for Einstein.
