I see Ian's fingers are faster.
You can always add the run_manager script, or just point to the client directly, by adding the application to the Startup Applications list. That way the client will run automatically at boot.
BRILLIANT!
Success!
https://einsteinathome.org/host/13125618/tasks/0/0
That did it. Thank you all for getting us up and running with this (these) new system(s).
Do you think we should be running 2 or 3 concurrently on this hardware? I know you all suggest two, but wanted to know your thoughts.
I'd do at least 2 on the 4090s; likely 3 will give better production. Look at the GPU utilization in nvidia-smi to see how many fit, and watch for when the total time per task starts going higher than the time of a 1x run.
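The stopping rule above can be made concrete with made-up numbers (these timings are hypothetical, just to illustrate the comparison, not measurements from this host):

```shell
# Hypothetical timings: one task alone finishes in 600 s wall-clock.
# Running 2x, each task takes 1080 s wall-clock, so two tasks complete
# every 1080 s. Effective seconds per task at 2x:
echo $((1080 / 2))   # 540 s/task, which beats 600 s/task at 1x, so 2x wins
```

If adding a third task pushed the effective time per task above the 1x figure, you'd back off to the lower concurrency.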
And the way to do this is with the app config file? I have it set to run 3 concurrent on the project settings, but I know that is probably trumped by the modifications done with this special app?
you have more control using an app_config.xml file.
when you have tasks running, can you run a command for me and report the output? I've always had a hunch that the 4090's performance is limited by its memory bandwidth: it has too many cores and not enough memory bandwidth to use them all to their full potential.
run this command while the tasks are about halfway through:
nvidia-smi --query-gpu=name,pci.bus_id,utilization.gpu,utilization.memory,clocks.current.sm,power.draw,memory.used,pcie.link.gen.current,pcie.link.width.current --format=csv
Done. I tried it under a few settings, all running 1 work unit.
Baseline (not running E@H, using “adaptive” power setting)
$ nvidia-smi --query-gpu=name,pci.bus_id,utilization.gpu,utilization.memory,clocks.current.sm,power.draw,memory.used,pcie.link.gen.current,pcie.link.width.current --format=csv
name, pci.bus_id, utilization.gpu [%], utilization.memory [%], clocks.current.sm [MHz], power.draw [W], memory.used [MiB], pcie.link.gen.current, pcie.link.width.current
NVIDIA GeForce RTX 4090, 00000000:41:00.0, 0 %, 1 %, 210 MHz, 33.28 W, 175 MiB, 1, 16
While running (1 work unit, using “adaptive” power setting)
$ nvidia-smi --query-gpu=name,pci.bus_id,utilization.gpu,utilization.memory,clocks.current.sm,power.draw,memory.used,pcie.link.gen.current,pcie.link.width.current --format=csv
name, pci.bus_id, utilization.gpu [%], utilization.memory [%], clocks.current.sm [MHz], power.draw [W], memory.used [MiB], pcie.link.gen.current, pcie.link.width.current
NVIDIA GeForce RTX 4090, 00000000:41:00.0, 84 %, 91 %, 2850 MHz, 281.55 W, 2282 MiB, 3, 16
While running (1 work unit, using “prefer maximum performance” power setting)
$ nvidia-smi --query-gpu=name,pci.bus_id,utilization.gpu,utilization.memory,clocks.current.sm,power.draw,memory.used,pcie.link.gen.current,pcie.link.width.current --format=csv
name, pci.bus_id, utilization.gpu [%], utilization.memory [%], clocks.current.sm [MHz], power.draw [W], memory.used [MiB], pcie.link.gen.current, pcie.link.width.current
NVIDIA GeForce RTX 4090, 00000000:41:00.0, 87 %, 94 %, 2850 MHz, 286.23 W, 2282 MiB, 3, 16
My brain is all jumbled up from today- what would be the name of the special version of the app for the app_config.xml file? Would the rest of this be correct? I want to try two concurrent first:
Load up 2x tasks per GPU and repeat please.
but this is supporting my theory. Look how high the memory controller load is compared to the GPU load. Running 1 task already has the memory bus almost maxed at only ~85% GPU utilization. Running 2x will probably put the memory at 100% with the core still short of 100%.
what is your power limit set to? You can see that with the default “nvidia-smi” command.
name should be ‘hsgamma_FGRPB1G’
set cpu_usage to 1.0. That's more in line with what is actually used.
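Putting those two corrections together, a minimal app_config.xml for two concurrent tasks might look like the sketch below (gpu_usage of 0.5 means each task claims half a GPU, so two run at once; 0.33 would allow three). It goes in the Einstein@Home project folder under the BOINC data directory, and the Manager's "Read config files" option applies it without a restart:

```xml
<!-- app_config.xml sketch: run 2 Einstein@Home FGRPB1G tasks per GPU.
     gpu_usage 0.5 = half a GPU per task; use 0.33 for 3 concurrent. -->
<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```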
Updated to run two:
nvidia-smi --query-gpu=name,pci.bus_id,utilization.gpu,utilization.memory,clocks.current.sm,power.draw,memory.used,pcie.link.gen.current,pcie.link.width.current --format=csv
name, pci.bus_id, utilization.gpu [%], utilization.memory [%], clocks.current.sm [MHz], power.draw [W], memory.used [MiB], pcie.link.gen.current, pcie.link.width.current
NVIDIA GeForce RTX 4090, 00000000:41:00.0, 98 %, 97 %, 2850 MHz, 295.30 W, 4238 MiB, 3, 16
Power setting when not running two tasks (baseline):
$ nvidia-smi
Fri Feb 17 14:29:27 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.78.01 Driver Version: 525.78.01 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:41:00.0 On | Off |
| 0% 37C P8 42W / 480W | 213MiB / 24564MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1522 G /usr/lib/xorg/Xorg 77MiB |
| 0 N/A N/A 3825 G /usr/lib/firefox/firefox 133MiB |
+-----------------------------------------------------------------------------+
Power setting when running two:
nvidia-smi
Fri Feb 17 14:31:01 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.78.01 Driver Version: 525.78.01 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:41:00.0 On | Off |
| 0% 51C P2 298W / 480W | 4238MiB / 24564MiB | 100% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1522 G /usr/lib/xorg/Xorg 77MiB |
| 0 N/A N/A 3825 G /usr/lib/firefox/firefox 133MiB |
| 0 N/A N/A 4740 C ...-pc-linux-gnu-opencl_v1.0 2010MiB |
| 0 N/A N/A 4744 C ...-pc-linux-gnu-opencl_v1.0 2010MiB |
+-----------------------------------------------------------------------------+
So, basically, there is still A LOT of compute power left on the table that cannot even be used? Even the wattage under load is way less than I would have thought. Can all of it be traced back to the memory bus?
Running three concurrently:
nvidia-smi --query-gpu=name,pci.bus_id,utilization.gpu,utilization.memory,clocks.current.sm,power.draw,memory.used,pcie.link.gen.current,pcie.link.width.current --format=csv
name, pci.bus_id, utilization.gpu [%], utilization.memory [%], clocks.current.sm [MHz], power.draw [W], memory.used [MiB], pcie.link.gen.current, pcie.link.width.current
NVIDIA GeForce RTX 4090, 00000000:41:00.0, 100 %, 100 %, 2850 MHz, 300.80 W, 6260 MiB, 3, 16
nvidia-smi
Fri Feb 17 14:49:11 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.78.01 Driver Version: 525.78.01 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:41:00.0 On | Off |
| 0% 55C P2 303W / 480W | 6260MiB / 24564MiB | 100% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1522 G /usr/lib/xorg/Xorg 89MiB |
| 0 N/A N/A 3825 G /usr/lib/firefox/firefox 133MiB |
| 0 N/A N/A 4959 G /usr/bin/nvidia-settings 0MiB |
| 0 N/A N/A 5400 C ...-pc-linux-gnu-opencl_v1.0 2010MiB |
| 0 N/A N/A 5404 C ...-pc-linux-gnu-opencl_v1.0 2010MiB |
| 0 N/A N/A 5436 C ...-pc-linux-gnu-opencl_v1.0 2010MiB |