it's built into the driver. you don't *need* to do anything beyond setting the CUDA_VISIBLE_DEVICES environment variable and running the command to start the MPS daemon. when it's running you will see nvidia-cuda-mps-server listed as a process on your GPU in nvidia-smi. it's best to start the daemon before you start BOINC, as it won't take effect on any in-progress tasks; it will only pick up subsequent tasks as they spin up.
you can also play with active_thread_percentage to tune things further.
beware of the major caveat with MPS: it is CUDA ONLY. you cannot run OpenCL tasks while MPS is running. that means you won't be able to run OpenCL work from other projects, or even the GW app from Einstein. you will need to stop the MPS server before you can run OpenCL tasks again.
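for anyone who wants the steps above in script form, here is a minimal sketch in Python. assumptions, not anything from the post above: the helper names (mps_environment, start_mps, stop_mps) are made up for illustration, the GPU ids and the 40% value are placeholders, and I'm assuming "active_thread_percentage" refers to the CUDA_MPS_ACTIVE_THREAD_PERCENTAGE environment variable. the helpers default to a dry run, so nothing is started unless you pass dry_run=False.

```python
import os
import subprocess

# Control binary that ships with the NVIDIA driver.
MPS_CONTROL = "nvidia-cuda-mps-control"


def mps_environment(gpu_ids, thread_pct=None):
    """Return a copy of the current environment prepared for the MPS daemon.

    gpu_ids    -- GPU indices to expose through CUDA_VISIBLE_DEVICES
    thread_pct -- optional value for CUDA_MPS_ACTIVE_THREAD_PERCENTAGE
                  (assumed here to be the knob the post calls
                  "active_thread_percentage")
    """
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    if thread_pct is not None:
        if not 0 < thread_pct <= 100:
            raise ValueError("thread_pct must be in (0, 100]")
        env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = str(thread_pct)
    return env


def start_mps(env, dry_run=True):
    """Start the MPS control daemon (-d = run as a background daemon)."""
    cmd = [MPS_CONTROL, "-d"]
    if not dry_run:
        subprocess.run(cmd, env=env, check=True)
    return cmd


def stop_mps(dry_run=True):
    """Stop MPS by sending 'quit' to the control program on stdin.

    Needed before OpenCL tasks can run again.
    """
    cmd = [MPS_CONTROL]
    if not dry_run:
        subprocess.run(cmd, input="quit\n", text=True, check=True)
    return cmd


# Dry run only: show what would be launched for two hypothetical GPUs.
planned_env = mps_environment([0, 1], thread_pct=40)
print(start_mps(planned_env), planned_env["CUDA_VISIBLE_DEVICES"])
```

order of operations would be: call start_mps(env, dry_run=False), then start BOINC; call stop_mps(dry_run=False) before switching back to OpenCL work.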
I have a question about the interaction between a non-overclocked RTX 3080 Ti, a Gen1 x1 PCIe bus, a 2-core CPU, and Invalids.
I am wondering if an "over-committed" cpu would tend to lead to more Invalids?
My Mining-Rig is trying to run 4 RTX 3080 Ti's under the above description. More often than not it is not only processing slower than "average" (e.g. taking more than 240s, or 4 minutes), but it is also "throwing" enough Invalids to stop or slow down any increase in RAC towards the nominal 4M it should end up at.
Some days it is running 20+% Invalids with a flat RAC. Dec 22 it appears to have run about 18% Invalids.
This is ALL in the context of BRP7/MeerKAT (optimized/Anonymous platform) tasks running 1 task per GPU.
If this supposition is correct, one way to test it is to run a single RTX 3080 Ti on that machine for at least a week. If the machine's % of Invalids drops to near or below 5%, then I think I may have a case for the "over-committed" CPU. Possibly.
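A rough way to judge whether a week of single-GPU results really is "near 5%" is to put an interval around the observed rate. The sketch below is only an illustration: the function names are hypothetical, the task counts in the example are invented placeholders rather than real results from this host, and the Wilson score interval is just one reasonable choice of method.

```python
import math


def invalid_rate(invalid, total):
    """Fraction of validated tasks that came back Invalid."""
    if total <= 0:
        raise ValueError("total must be positive")
    return invalid / total


def wilson_interval(invalid, total, z=1.96):
    """Wilson score interval for the invalid rate (z=1.96 is roughly 95%)."""
    p = invalid_rate(invalid, total)
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


def consistent_with(invalid, total, target=0.05):
    """True if the target rate lies inside the interval for the observed rate."""
    low, high = wilson_interval(invalid, total)
    return low <= target <= high


# Placeholder counts only: 18 Invalids out of 100 tasks vs. 5 out of 100.
for invalid, total in [(18, 100), (5, 100)]:
    low, high = wilson_interval(invalid, total)
    print(f"{invalid}/{total}: {low:.3f} to {high:.3f}, "
          f"consistent with 5%: {consistent_with(invalid, total)}")
```

The more tasks the single-GPU run completes, the narrower the interval gets, which is the practical reason for running the test for at least a week.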
I know that I am getting quite high bandwidth utilization reported per GPU, up to 27% on one GPU. So the alternative supposition of not enough bandwidth cannot be ruled out.
If I can confirm that 1 gpu on this system runs with the typical Invalid rate of near 5% then the only way I see to test the over-committed cpu would be to upgrade the cpu and try running 4-5 gpus on it again?
Or just stop "nibbling around the edges" and give up this particular configuration?
So I suppose the issue is how much "good money" I want to potentially throw after the bad money. To date I have spent less than the cost of a new ASRock EPYCD8 motherboard on this experiment.
And I have an EPYCD8 waiting in the wings if/when this experiment dies.
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!