All things Nvidia GPU

Tom M
Joined: 2 Feb 06
Posts: 6511
Credit: 9619473207
RAC: 3330074

I have a pair of RTX 3080 Ti Founders Edition GPUs.

The one that is not driving the monitor is drawing 100+ watts less than the other.

Overclocking doesn't seem to make a difference, and I am seeing more variability in processing times than I used to. Some tasks take 156 s, some take 163 s.

Any ideas?

Tom M

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)  I want some more patience. RIGHT NOW!

Ian&Steve C.
Joined: 19 Jan 20
Posts: 4016
Credit: 47634818295
RAC: 43800249

need more information.

are they both power limited to the same power? what value?
when you say one is "100W more" than the other, what are the specific power values being pulled? how is it being measured?
what are the clocks observed on each card? please specify the clocks along with the power draw for each card.

_________________________________________________________________________

Tom M

Ian&Steve C. wrote:

need more information.

are they both power limited to the same power? what value?
when you say one is "100W more" than the other, what are the specific power values being pulled? how is it being measured?
what are the clocks observed on each card? please specify the clocks along with the power draw for each card.

Tried to post a reply before I did a cold boot. Apparently I didn't save the message.

After the cold boot, both GPUs were drawing close to 400 watts on a power limit (PL) of 400 W.

I have just overclocked the memory transfer rate back up to +900.

Just ran nvidia-smi again, and one GPU is drawing roughly 100 watts less than the other.

tommiller@Ryzen-Charon:~$ nvidia-smi
Sat Feb 18 08:58:21 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.13    Driver Version: 525.60.13    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:08:00.0 Off |                  N/A |
|100%   76C    P2   301W / 400W |   2927MiB / 12288MiB |     89%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:09:00.0  On |                  N/A |
|100%   70C    P2   396W / 400W |   4042MiB / 12288MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1212      G   /usr/lib/xorg/Xorg                  5MiB |
|    0   N/A  N/A      1752      G   /usr/lib/xorg/Xorg                  6MiB |
|    0   N/A  N/A      3354      C   ...-pc-linux-gnu-opencl_v1.0     1034MiB |
|    0   N/A  N/A      3416      G   /usr/bin/nvidia-settings            0MiB |
|    0   N/A  N/A      3426      C   ...-pc-linux-gnu-opencl_v1.0     1874MiB |
|    1   N/A  N/A      1212      G   /usr/lib/xorg/Xorg                 17MiB |
|    1   N/A  N/A      1752      G   /usr/lib/xorg/Xorg                 60MiB |
|    1   N/A  N/A      1956      G   /usr/bin/gnome-shell               79MiB |
|    1   N/A  N/A      2942      G   /usr/lib/firefox/firefox          119MiB |
|    1   N/A  N/A      3370      C   ...-pc-linux-gnu-opencl_v1.0     1874MiB |
|    1   N/A  N/A      3396      C   ...-pc-linux-gnu-opencl_v1.0     1874MiB |
|    1   N/A  N/A      3416      G   /usr/bin/nvidia-settings            0MiB |
+-----------------------------------------------------------------------------+
tommiller@Ryzen-Charon:~$

Ian&Steve C.

That does not tell me what clock speed the cards are running.

But I do see that the lower-power GPU is only running at 89% GPU utilization, while the full-power card is at 99%.

_________________________________________________________________________

Ian&Steve C.

Run this command instead and post the output. Make sure you run it a few times so that you get output representative of the cards running at steady state (and not an outlier reading, like if you were to run it at the exact moment a task stopped). This is a single one-line command, not multiple lines/commands:

nvidia-smi --query-gpu=name,pci.bus_id,utilization.gpu,utilization.memory,clocks.current.sm,clocks.current.memory,power.draw,memory.used,pcie.link.gen.current,pcie.link.width.current --format=csv
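
A minimal sketch of how the repeated sampling could be scripted, assuming Python 3 and `nvidia-smi` on the PATH. The function names (`parse_gpu_csv`, `sample_gpus`), the sample count, and the delay are illustrative choices, not part of the advice above; the query fields are the ones from the command itself.

```python
import csv
import io
import subprocess
import time

# Same fields as the command above.
FIELDS = (
    "name,pci.bus_id,utilization.gpu,utilization.memory,"
    "clocks.current.sm,clocks.current.memory,power.draw,"
    "memory.used,pcie.link.gen.current,pcie.link.width.current"
)


def parse_gpu_csv(text):
    """Turn nvidia-smi --format=csv output into a list of dicts, one per GPU."""
    reader = csv.reader(io.StringIO(text.strip()), skipinitialspace=True)
    # Header cells look like "power.draw [W]"; keep only the field name.
    header = [cell.split(" [")[0].strip() for cell in next(reader)]
    return [dict(zip(header, (c.strip() for c in row))) for row in reader if row]


def sample_gpus(samples=5, delay_s=10.0):
    """Hypothetical helper: run the query several times, delay_s seconds apart."""
    readings = []
    for i in range(samples):
        out = subprocess.run(
            ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv"],
            capture_output=True, text=True, check=True,
        ).stdout
        readings.append(parse_gpu_csv(out))
        if i < samples - 1:
            time.sleep(delay_s)
    return readings
```

Spacing the readings out by a few seconds makes it less likely that every sample lands on a task boundary.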

_________________________________________________________________________

Tom M

It looks like it was a connectivity issue. I got an error that said nvidia-smi could not talk to a card.

So I blew the dust out of the cards, pulled the problem child, did the Keith M cleaning routine, and re-seated it.

After I booted and started it up, the OTHER GPU quit completely (lights out).

Wiggled that card's power wiring and the card itself.

Booted again, and both were drawing close to 400 watts.

Nope. The card that also has a label on its HDMI port claiming the port doesn't work has dropped back to a lower draw again.

Going to run the requested command-line diagnostic next.

Test results follow:

tommiller@Ryzen-Charon:~$ nvidia-smi --query-gpu=name,pci.bus_id,utilization.gpu,utilization.memory,clocks.current.sm,clocks.current.memory,power.draw,memory.used,pcie.link.gen.current,pcie.link.width.current --format=csv
name, pci.bus_id, utilization.gpu [%], utilization.memory [%], clocks.current.sm [MHz], clocks.current.memory [MHz], power.draw [W], memory.used [MiB], pcie.link.gen.current, pcie.link.width.current
NVIDIA GeForce RTX 3080 Ti, 00000000:08:00.0, 100 %, 93 %, 1530 MHz, 9801 MHz, 265.14 W, 3767 MiB, 4, 8
NVIDIA GeForce RTX 3080 Ti, 00000000:09:00.0, 90 %, 97 %, 1875 MHz, 9801 MHz, 364.05 W, 2885 MiB, 4, 8
tommiller@Ryzen-Charon:~$ nvidia-smi --query-gpu=name,pci.bus_id,utilization.gpu,utilization.memory,clocks.current.sm,clocks.current.memory,power.draw,memory.used,pcie.link.gen.current,pcie.link.width.current --format=csv
name, pci.bus_id, utilization.gpu [%], utilization.memory [%], clocks.current.sm [MHz], clocks.current.memory [MHz], power.draw [W], memory.used [MiB], pcie.link.gen.current, pcie.link.width.current
NVIDIA GeForce RTX 3080 Ti, 00000000:08:00.0, 99 %, 93 %, 1530 MHz, 9801 MHz, 262.40 W, 3767 MiB, 4, 8
NVIDIA GeForce RTX 3080 Ti, 00000000:09:00.0, 100 %, 100 %, 1875 MHz, 9801 MHz, 372.43 W, 4053 MiB, 4, 8
tommiller@Ryzen-Charon:~$ nvidia-smi --query-gpu=name,pci.bus_id,utilization.gpu,utilization.memory,clocks.current.sm,clocks.current.memory,power.draw,memory.used,pcie.link.gen.current,pcie.link.width.current --format=csv
name, pci.bus_id, utilization.gpu [%], utilization.memory [%], clocks.current.sm [MHz], clocks.current.memory [MHz], power.draw [W], memory.used [MiB], pcie.link.gen.current, pcie.link.width.current
NVIDIA GeForce RTX 3080 Ti, 00000000:08:00.0, 100 %, 93 %, 1530 MHz, 9801 MHz, 264.97 W, 3767 MiB, 4, 8
NVIDIA GeForce RTX 3080 Ti, 00000000:09:00.0, 100 %, 100 %, 1875 MHz, 9801 MHz, 373.02 W, 4056 MiB, 4, 8
tommiller@Ryzen-Charon:~$
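
The three runs above can be condensed into per-card figures. This is a small sketch in Python; the tuples are copied by hand from the output above (bus IDs shortened), and the helper name `summarize` is only illustrative.

```python
from statistics import mean

# (bus id, SM clock in MHz, power draw in W), copied from the three runs above.
SAMPLES = [
    ("08:00.0", 1530, 265.14), ("09:00.0", 1875, 364.05),
    ("08:00.0", 1530, 262.40), ("09:00.0", 1875, 372.43),
    ("08:00.0", 1530, 264.97), ("09:00.0", 1875, 373.02),
]


def summarize(samples):
    """Return {bus_id: (set of SM clocks seen, mean power in W)}."""
    by_gpu = {}
    for bus_id, sm_mhz, watts in samples:
        clocks, powers = by_gpu.setdefault(bus_id, (set(), []))
        clocks.add(sm_mhz)
        powers.append(watts)
    return {bus: (clocks, mean(powers)) for bus, (clocks, powers) in by_gpu.items()}


summary = summarize(SAMPLES)
for bus, (clocks, avg_w) in sorted(summary.items()):
    print(f"{bus}: SM clocks seen {sorted(clocks)} MHz, mean power {avg_w:.1f} W")

gap_w = summary["09:00.0"][1] - summary["08:00.0"][1]
print(f"mean power gap: {gap_w:.1f} W")
```

On these readings the card at 08:00.0 never leaves 1530 MHz and averages about 264 W, roughly 106 W below the card at 09:00.0, which sits at 1875 MHz in every sample.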

Tom M

If nothing else, I am going to swap the cards between slots and see whether the problem follows the card or stays with the slot.
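
One way to keep track of a swap like this is to record each card's UUID alongside its bus ID before and after, since the UUID travels with the card while the bus ID stays with the slot. A minimal sketch, assuming the readings come from something like `nvidia-smi --query-gpu=uuid,pci.bus_id,clocks.current.sm --format=csv`. The UUIDs, the post-swap readings, and the 1700 MHz threshold below are made-up placeholders, not measurements from this system.

```python
def locate_fault(before, after, slow_below_mhz):
    """Classify a swap test.

    before/after map GPU UUID -> (bus_id, sm_clock_mhz). The UUID identifies
    the physical card; the bus ID identifies the slot it sits in.
    """
    def slow(readings):
        return {(uuid, bus) for uuid, (bus, mhz) in readings.items()
                if mhz < slow_below_mhz}

    slow_before, slow_after = slow(before), slow(after)
    if not slow_before or not slow_after:
        return "inconclusive"
    cards_before = {u for u, _ in slow_before}
    cards_after = {u for u, _ in slow_after}
    slots_before = {b for _, b in slow_before}
    slots_after = {b for _, b in slow_after}
    if cards_before == cards_after and slots_before != slots_after:
        return "follows the card"
    if slots_before == slots_after and cards_before != cards_after:
        return "stays with the slot"
    return "inconclusive"


# Placeholder data: the UUIDs and the "after" readings are hypothetical.
before = {"GPU-aaaa": ("08:00.0", 1530), "GPU-bbbb": ("09:00.0", 1875)}
after = {"GPU-aaaa": ("09:00.0", 1530), "GPU-bbbb": ("08:00.0", 1875)}
print(locate_fault(before, after, slow_below_mhz=1700))
```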

===edit=== Even though the reported temperature doesn't look like it has hit the temperature limit, the way the card slows down as the temperature goes up makes me wonder.

Tom M

Ian&Steve C.

Looks like GPU0 is locked at 1530MHz. That’s probably why it’s running with less power draw. 

_________________________________________________________________________

Tom M

Ian&Steve C. wrote:

Looks like GPU0 is locked at 1530MHz. That’s probably why it’s running with less power draw. 

Two questions. Should I use some kind of reset? And it draws full power when it starts up, then apparently heats up and slows down?

I will look for a command-line GPU reset.
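
Before reaching for a reset, it may help to ask the driver why the clocks are being held down. A minimal sketch, assuming the bitmask is read with `nvidia-smi --query-gpu=clocks_throttle_reasons.active --format=csv,noheader` and that the bit values match the `nvmlClocksThrottleReason*` constants in NVML's header; both should be verified against the installed driver. The function name and the example mask are illustrative.

```python
# Bit values as I understand them from NVML's nvml.h (nvmlClocksThrottleReason*).
# Treat this table as an assumption and check it against your driver's header.
THROTTLE_REASONS = {
    0x001: "gpu_idle",
    0x002: "applications_clocks_setting",
    0x004: "sw_power_cap",
    0x008: "hw_slowdown",
    0x010: "sync_boost",
    0x020: "sw_thermal_slowdown",
    0x040: "hw_thermal_slowdown",
    0x080: "hw_power_brake_slowdown",
    0x100: "display_clock_setting",
}


def decode_throttle_mask(mask_text):
    """Decode a hex bitmask such as '0x0000000000000024' into reason names."""
    mask = int(mask_text.strip(), 16)
    return [name for bit, name in THROTTLE_REASONS.items() if mask & bit]


# Example mask chosen for illustration, not read from the cards in this thread.
print(decode_throttle_mask("0x0000000000000024"))
```

A thermal bit set on the slow card would support the heat theory; a power-cap bit would point at the power limit instead.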

Tom M

Ian&Steve C.

What is the performance setting for this GPU in Nvidia Settings: “Auto”, “Adaptive”, or “Prefer Maximum Performance”?
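
The same setting can usually be read from a terminal. A minimal sketch, assuming `nvidia-settings -q "[gpu:0]/GPUPowerMizerMode" -t` works in this X session and that the returned integers map to modes as 0 Adaptive, 1 Prefer Maximum Performance, 2 Auto. That mapping is an assumption to confirm locally, and the function names are illustrative.

```python
import subprocess

# Assumed mapping for the GPUPowerMizerMode attribute; confirm on your driver.
POWERMIZER_MODES = {
    0: "Adaptive",
    1: "Prefer Maximum Performance",
    2: "Auto",
}


def describe_mode(raw_value):
    """Map the terse output of nvidia-settings to a mode name."""
    value = int(raw_value.strip())
    return POWERMIZER_MODES.get(value, f"unknown ({value})")


def query_mode(gpu_index):
    """Ask nvidia-settings for one GPU's PowerMizer mode (needs a running X server)."""
    out = subprocess.run(
        ["nvidia-settings", "-q", f"[gpu:{gpu_index}]/GPUPowerMizerMode", "-t"],
        capture_output=True, text=True, check=True,
    ).stdout
    return describe_mode(out)
```

Calling `query_mode(0)` and `query_mode(1)` would show whether the two cards are configured differently.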

_________________________________________________________________________
