first, I just want to say that there is no guarantee that slowing the memory clock will reduce the invalids. it was only a suggestion to try to see if it helped. same with slowing the core clock. i just wanted that to be more clear.
second, yes you need to edit the coolbits to unlock the overclocking ability (as well as thermal control). I do it by running this command:

sudo nvidia-xconfig --thermal-configuration-check --cool-bits=28 --enable-all-gpus
then reboot. and you should have the ability to adjust the clocks in the Nvidia Settings app.
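For reference, the Coolbits value is a bitmask (per the NVIDIA driver README): 4 unlocks manual fan control, 8 unlocks the clock offset controls, and 16 unlocks overvoltage, so 28 enables all three. A small sketch to decode a value (the `decode_coolbits` name is mine, not anything from the driver):

```shell
#!/bin/bash
# decode_coolbits: print which controls a given Coolbits bitmask unlocks.
# Bit meanings per the NVIDIA driver README: 4 = manual fan control,
# 8 = clock offsets (overclocking), 16 = overvoltage.
decode_coolbits() {
    local bits="$1" out=""
    [ $((bits & 4))  -ne 0 ] && out="$out fan-control"
    [ $((bits & 8))  -ne 0 ] && out="$out clock-offsets"
    [ $((bits & 16)) -ne 0 ] && out="$out overvoltage"
    echo "${out# }"
}

decode_coolbits 28   # the value used above: unlocks all three
```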
For sure- I am really just playing around with it to see the impact.
I can edit the clock settings but they will not save when I exit the app and re-open it. I am not even sure they update the clock speeds after I change the numbers. What is odd is that if I change the preferred mode, that DOES save when I exit the app and re-open it.
EDIT: Nevermind- I think I figured it out... I think it saved this time?
the command i posted is the command i use on all my systems. changing the clock speed persists, but it does not persist after a reboot. you will have to re-set the clocks to what you want after reboots. i just do this with a script:
/usr/bin/nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1"
/usr/bin/nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[4]=1000" -a "[gpu:0]/GPUGraphicsClockOffset[4]=50"
/usr/bin/nvidia-settings -a '[gpu:0]/GPUFanControlState=1' -a '[fan:0]/GPUTargetFanSpeed=100'
save this, adjust the values as needed. and make sure it's executable. (either run chmod +x on it, or go into the file properties and click the checkbox to 'run as a program')
if you want to reduce clocks, use a negative offset.
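As a sketch, the three commands above can be wrapped into one script with the offsets as parameters. The `apply_clocks` name, the defaults, and the `DRY_RUN` switch are my additions; with `DRY_RUN=1` it only prints the commands, which is a safe way to sanity-check the values before applying them:

```shell
#!/bin/bash
# apply_clocks: set PowerMizer mode, clock offsets, and fan speed on gpu:0.
# Usage: apply_clocks <mem_offset> <core_offset> <fan_speed>
# With DRY_RUN=1 the nvidia-settings commands are printed, not executed.
apply_clocks() {
    local mem_offset="${1:--500}"    # negative offsets reduce clocks
    local core_offset="${2:--100}"
    local fan_speed="${3:-100}"
    run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }
    run /usr/bin/nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1"
    run /usr/bin/nvidia-settings \
        -a "[gpu:0]/GPUMemoryTransferRateOffset[4]=${mem_offset}" \
        -a "[gpu:0]/GPUGraphicsClockOffset[4]=${core_offset}"
    run /usr/bin/nvidia-settings \
        -a "[gpu:0]/GPUFanControlState=1" \
        -a "[fan:0]/GPUTargetFanSpeed=${fan_speed}"
}

DRY_RUN=1 apply_clocks -500 -100 100   # preview only; drop DRY_RUN to apply
```

Dropped into a startup script, this also solves the "re-set the clocks after every reboot" chore.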
Okay, thank you. I will play around with this. Initial observation- changing the clocks (up or down by even 100) is causing the work units to fail at about ~2%. That is even with a reduction in clock and/or memory speed. It seems to be very... finicky. I will keep messing around with it.
I see you are getting hit by a couple of different errors.
First is you are getting the "flushing" errors that I get on my 2080 cards using a 5950X. I seemed to be the only one with this issue. I extensively tried troubleshooting against all possible variables and never could pin down the cause on that host. I had other cards in other hosts not afflicted. I thought that only the older 2080 cards caused the issue as none of my 3000 series cards ever had the problem.
Interesting to see a new 4000 series card afflicted also.
The other errors I see are something about Petri's clfft files not being liked by the 4090 card.
This is an interesting snippet from an errored task.
Quote:
Using alternate fft kernel file: ../../clfft.kernel.Transpose2.cl.alt
Using alternate fft kernel file: ../../clfft.kernel.Stockham3.cl.alt
FFTGeneratedStockhamAction::compileKernels failed
Error in OpenCL context: CL_OUT_OF_RESOURCES error executing CL_COMMAND_WRITE_BUFFER on NVIDIA GeForce RTX 4090 (Device 0).
And here again:
Quote:
Using alternate fft kernel file: ../../clfft.kernel.Transpose4.cl.alt
FFTGeneratedTransposeGCNAction::compileKernels failed
ERROR: plan generation("baking") failed: -5
09:05:52 (5558): [CRITICAL]: ERROR: MAIN() returned with error '-5'
Never knew that a "baking" plan generation was in the AIO setup.
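For what it's worth, the -5 in that log is just a raw OpenCL status code: -5 is CL_OUT_OF_RESOURCES, the same error as in the first snippet, so both failures point at the card running out of resources while clFFT "bakes" (compiles) its FFT plans. A quick lookup sketch for the codes that tend to show up in these logs (the `cl_errname` helper is mine; the values come from the standard CL/cl.h header):

```shell
#!/bin/bash
# cl_errname: map a few common OpenCL status codes (from CL/cl.h) to names.
cl_errname() {
    case "$1" in
        0)   echo "CL_SUCCESS" ;;
        -4)  echo "CL_MEM_OBJECT_ALLOCATION_FAILURE" ;;
        -5)  echo "CL_OUT_OF_RESOURCES" ;;
        -6)  echo "CL_OUT_OF_HOST_MEMORY" ;;
        -11) echo "CL_BUILD_PROGRAM_FAILURE" ;;
        *)   echo "unknown ($1)" ;;
    esac
}

cl_errname -5   # the code from the failed "baking" step
```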
Boca Raton Community HS wrote:
Okay, thank you. I will play around with this. Initial observation- changing the clocks (up or down by even 100) is causing the work units to fail at about ~2%. That is even with a reduction in clock and/or memory speed. It seems to be very... finicky. I will keep messing around with it.
another question of curiosity.
how much VRAM is each gamma ray task using?
you can check with the 'nvidia-smi' command, in the listed processes at the bottom of the output.
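If you'd rather script it than eyeball the process table, `nvidia-smi` can emit just the per-process memory as CSV via `--query-compute-apps=pid,used_memory --format=csv,noheader`. The summing helper below is my own sketch, and the sample CSV is a made-up illustration of that output's shape:

```shell
#!/bin/bash
# sum_vram_mib: total the used_memory column of
#   nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader
# read from stdin, printed as a plain MiB count.
sum_vram_mib() {
    awk -F', *' '{ gsub(/ *MiB/, "", $2); total += $2 } END { print total+0 }'
}

# hypothetical sample of the CSV output with two tasks running:
sample="12345, 2010 MiB
12346, 2010 MiB"
printf '%s\n' "$sample" | sum_vram_mib   # prints 4020
```

On a live host you would pipe the real command into it instead: `nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader | sum_vram_mib`.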
Keith Myers wrote:
I seemed to be the only one with this issue. I extensively tried troubleshooting against all possible variables and never could pin down the cause on that host. I had other cards in other hosts not afflicted. I thought that only the older 2080 cards caused the issue as none of my 3000 series cards ever had the problem.
Interesting to see a new 4000 series card afflicted also.
Glad I could join you with this error!
Ian&Steve C. wrote:
another question of curiosity.
how much VRAM is each gamma ray task using?
you can check with the 'nvidia-smi' command, in the listed processes at the bottom of the output.

Each one is using 2010MiB.

ok that's not too much and pretty normal for this app.
I had a thought during the weekend- what if I enable ECC on the GPU? Do you think this could decrease the invalids? I do not know enough about why a result gets "invalidated" and whether that relates to GPU memory errors. I can easily test this and let it go for a while, but wanted to know some thoughts on this.
I understand this will slow the work unit down, but if it increases the valid rate, then it might be worth it.
you can give it a shot. but it also will likely slow down the computation. you'd have to test if the decrease in invalids (if at all) offsets the slower crunch times.
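One way to make that call concrete is to compare valid results per day rather than raw task times (ECC itself is toggled with `nvidia-smi -e 1` as root, followed by a reboot). A sketch of the arithmetic; the `valid_per_day` helper and all the numbers are placeholders to fill in from real test runs:

```shell
#!/bin/bash
# valid_per_day: tasks per day scaled by the fraction that validate.
# Usage: valid_per_day <seconds_per_task> <invalid_fraction>
valid_per_day() {
    awk -v t="$1" -v inv="$2" 'BEGIN { printf "%.0f\n", (86400 / t) * (1 - inv) }'
}

# hypothetical example: ECC off = 480 s/task with 4% invalid,
# ECC on = 520 s/task with 1% invalid
valid_per_day 480 0.04   # 86400/480 * 0.96 = 172.8 -> prints 173
valid_per_day 520 0.01   # 86400/520 * 0.99 = 164.5 -> prints 164
```

In this made-up example the slower ECC runs do not pay off, but real numbers could go either way; the point is just to compare the two products, not the two crunch times.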