All things Nvidia GPU

Tom M
Joined: 2 Feb 06
Posts: 6258
Credit: 8904993658
RAC: 10120773

Ian&Steve C. wrote:

What is the performance setting for this GPU in Nvidia Settings? “Auto” “Adaptive” “Prefer Maximum Performance” 

"Auto".

Ian&SteveC,

It looks like the MB slot next to the CPU is bad.

I (finally) got the two GPUs hooked up in the two slots furthest from the CPU, and both are registering again.

But the "upper" card is running a little hot (85 °C), and its power draw under load has dropped by 100+ watts.
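For anyone wanting to watch for this kind of heat and power drop, here is a minimal Python sketch that reads temperature, power draw, and clocks per card through `nvidia-smi --query-gpu`. The field selection and the helper names (`parse_row`, `query_gpus`) are my own choices for illustration, and the sample row is made up, not a reading from the cards above.

```python
import subprocess

# Fields requested from nvidia-smi; the selection is an illustrative choice.
FIELDS = ["index", "temperature.gpu", "power.draw", "power.limit",
          "clocks.sm", "clocks.mem"]


def parse_row(row):
    """Turn one CSV line of nvidia-smi output into a {field: value} dict."""
    values = [value.strip() for value in row.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def query_gpus():
    """Run nvidia-smi once and return one dict per GPU (needs the driver installed)."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return [parse_row(line) for line in result.stdout.splitlines() if line.strip()]


# Hypothetical sample line, only to show the parsing without a GPU present.
SAMPLE_ROW = "0, 85, 250.00, 350.00, 1700, 9500"
print(parse_row(SAMPLE_ROW))
```

On a machine with the Nvidia driver installed, calling `query_gpus()` in a loop every few seconds should show whether the hot card's power draw and clocks sag as the temperature climbs.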

I will test the first slot again with just that card installed later.

Time for a rest break.

Yes, I have a spare B450-F that I will likely swap back in....

Tom M

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)

Tom M
Joined: 2 Feb 06
Posts: 6258
Credit: 8904993658
RAC: 10120773

It was too much to hope for.  The #1 GPU slot refusing to send out a video signal has followed me to the other copy of the same MB that I have.

At least for the RTX 3080 Ti FEs.

Apparently, as soon as I OC the memory transfer rate on the card in the PCIe slot closest to the CPU, it drops out, just like it did on the other board of the same make and model.

So I guess the next test is trying it with a single GPU in the #1 slot.

===it's late=== Going to do the next test tomorrow.

Tom M

Tom M
Joined: 2 Feb 06
Posts: 6258
Credit: 8904993658
RAC: 10120773

Tom M wrote:

So I guess the next test is trying it with a single GPU in the #1 slot.

===it's late=== Going to do the next test tomorrow.

It would not boot with the card and video in the first slot.  The OS started loading and then stalled.

So it FINALLY occurred to me that the other thing the two Asus X570 Prime MBs had in common is that they were booting off the same M.2 SSD.

Backed up the BOINC stuff.  Re-installed the OS et al.  And now Ryzen-Charon is happily crunching and displaying video in the #1 slot (closest to the CPU).

After I get the 2nd GPU back online, the next step will be to move back to the Asus Prime MB with the 3950X CPU installed.

Tom M

GWGeorge007
Joined: 8 Jan 18
Posts: 2997
Credit: 4926034438
RAC: 161522

Tom M wrote:

It would not boot with the card and video in the first slot.  The OS started loading and then stalled.

So it FINALLY occurred to me that the other thing the two Asus X570 Prime MBs had in common is that they were booting off the same M.2 SSD.

Backed up the BOINC stuff.  Re-installed the OS et al.  And now Ryzen-Charon is happily crunching and displaying video in the #1 slot (closest to the CPU).

After I get the 2nd GPU back online, the next step will be to move back to the Asus Prime MB with the 3950X CPU installed.

Tom M

Try OC'ing the memory first.  ...just to be sure...

George

Proud member of the Old Farts Association

Tom M
Joined: 2 Feb 06
Posts: 6258
Credit: 8904993658
RAC: 10120773

GWGeorge007 wrote:

Try OC'ing the memory first.  ...just to be sure...

Both GPUs are on the test bench.  The memory transfer rate offsets are at +1100 right now.

Tom M

Tom M
Joined: 2 Feb 06
Posts: 6258
Credit: 8904993658
RAC: 10120773

I remember Mean Time To Failure (MTTF) numbers being published for mechanical HDDs.

Are there any such numbers for GPUs?

Tom M

Tom M
Joined: 2 Feb 06
Posts: 6258
Credit: 8904993658
RAC: 10120773

Holding things like shader counts etc. equal, I have a couple of questions.

Is the Memory Transfer Rate of a GPU the limiting factor on how fast the numeric calculations are?  Or does it actually push the calculations faster?

I think I have a feel for graphics clock rates.  They resemble the behavior of CPU clock rates.

When you crank up the Memory Transfer Rate, what is the next "choke" point?  Absolute available memory bandwidth?

What would be the realistic upper limit to the Memory Transfer Rate on the 384-bit memory path of an RTX 3080 Ti GPU?

Right now E@H is reporting much lower processing times for each individual FGRP#1 task on the GPU I have Memory Transfer OCed to +3000.  Is there some other choke point(s), besides "bad calculations" (i.e., tasks marked invalid), that I have missed?

E.g., anything else that could slow the total processing down as recorded by RAC?

Tom M

GWGeorge007
Joined: 8 Jan 18
Posts: 2997
Credit: 4926034438
RAC: 161522

Tom M wrote:

Right now E@H is reporting much lower processing times for each individual FGRP#1 task on the GPU I have Memory Transfer OCed to +3000.  Is there some other choke point(s), besides "bad calculations" (i.e., tasks marked invalid), that I have missed?

E.g., anything else that could slow the total processing down as recorded by RAC?

If you could lower the temperature of the GPU, that would increase production output, but to what degree?  I don't know.  Nor do I know what else is truly involved.

Keith had mentioned that the Graphics Clock settings move in blocks of 13 (I think I have that right), but raising the GC too high, like 100 / 200, isn't beneficial to BOINC.  The higher you go in MTR, the worse your invalids get.  The one thing that isn't mentioned is the actual GPU temps.  They have a "high" limit, but nobody says how low you can go.  That's why guys like Steve at Gamers Nexus (and others) do liquid nitrogen testing to see what their ultimate scores are.

I seriously doubt that you are going into liquid nitrogen mode, and even if you did, it would only be short-lived.

But as we have told you in the past, liquid cooling would benefit your BOINC scores.

The only other thing that I know of that could possibly be a choke point (as you say) would be your CPU use.  Remember, we're using the CPU for doing CPU tasks, as well as for the browser which you and I like to use to wander through the internet.  How many CPU threads you use can affect how much GPU use you get for scores, to a point.  Of course, when you lower the CPU threads you lower the number of tasks accomplished by the CPU.

It is all a balancing act of doing or not doing more than one thing to get your system to the highest BOINC output possible.  It's all about how comfortable you are with your settings.

...and that's my two cents' worth...

George

Keith Myers
Joined: 11 Feb 11
Posts: 4895
Credit: 18433839079
RAC: 5715769

The achievable graphics clocks are not incremented decimally.  They jump in "bins" of 13 MHz, or at least they did up to the Turing generation.  Not sure what the bin increment is on Ampere or Ada.  The bins are a result of the internal clock generator's resolution in the binary realm.

Each "bin" is a position along the VTP, or voltage-temperature-power, curve.  When the temperature and power values allow, the GPU Boost mechanism in the firmware boosts the graphics clocks up into the next bin and holds them there as long as the temperature is still under the throttling limit, the power limit isn't exceeded, and the graphics loading is held constant.

The bins are why you don't get an immediate 1:1 jump in the clocks when you enter an increase in offset.  You can keep adding a few MHz at a time and not see any change in the clocks.  Then one more increase makes the graphics clock jump up by 13 MHz.
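As a rough illustration of the bin behavior, here is a small Python sketch that snaps a requested offset down to a whole number of 13 MHz bins. It is a deliberate simplification: the function name `applied_offset` is made up for this example, it only considers non-negative offsets, and real cards also weigh voltage, temperature, and power before stepping up a bin.

```python
BIN_MHZ = 13  # bin size quoted above for cards up to Turing; assumed here for illustration


def applied_offset(requested_mhz, bin_mhz=BIN_MHZ):
    """Snap a non-negative requested offset down to a whole number of bins."""
    if requested_mhz < 0:
        raise ValueError("this sketch only models non-negative offsets")
    return (requested_mhz // bin_mhz) * bin_mhz


# Step the requested offset up 4 MHz at a time and show what would actually apply.
for requested in range(0, 32, 4):
    print(f"requested +{requested:>2} MHz -> applied +{applied_offset(requested):>2} MHz")
```

In this model the applied clock only moves when the cumulative offset crosses a multiple of the bin size, which is why several small increases in a row can appear to do nothing.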

If you try to get too aggressive, irrespective of causing actual image or compute errors, the clocks will jump up to a higher value for a quick burst and then immediately drop back, likely below the original clocks, because you tried to jump up the VT curve too far and the firmware pulled the reins back and said whoa.

You should try to use smaller incremental offsets until you can keep the card consistently in an elevated bin under load and it doesn't jump around.

When you increase the memory clocks you are speeding up the data transfer between the GPU and its on-board memory.  The higher the clocks and the wider the memory bus, the more data can be transferred back and forth in a given time, which speeds up the calculations.  But modern Nvidia GPUs use on-board error checking to validate memory transfers.  If a memory block does not transfer correctly, the transfer is repeated, twice or more if necessary, until the data received is the same as what was sent.  Too many retries of the same data packets slow the actual calculation down.  So you can push the memory data transfer rate too far, and that is counter-productive.
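To put rough numbers on the bus width and retry trade-off, here is a minimal Python sketch. Peak bandwidth is bus width in bytes times the per-pin data rate, so a 384-bit bus at 19 Gbps per pin (the stock figure for the RTX 3080 Ti, as far as I know) works out to 912 GB/s. The retry model, the function names, and the retry fractions in the table are assumptions for illustration only, not measurements from any card.

```python
def peak_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Theoretical peak memory bandwidth in GB/s: bytes per transfer times rate per pin."""
    return bus_width_bits / 8 * data_rate_gbps


def effective_bandwidth_gb_s(peak_gb_s, retry_fraction):
    """Useful bandwidth if a fraction of transfers fails and has to be resent.

    Assumed model: each attempt fails independently with probability
    retry_fraction, so only (1 - retry_fraction) of the raw traffic is useful.
    """
    if not 0.0 <= retry_fraction < 1.0:
        raise ValueError("retry_fraction must be in [0, 1)")
    return peak_gb_s * (1.0 - retry_fraction)


# Hypothetical (data rate in Gbps, retry fraction) pairs; the fractions are invented.
scenarios = [(19.0, 0.00), (20.0, 0.01), (21.0, 0.15)]

for rate, retries in scenarios:
    peak = peak_bandwidth_gb_s(384, rate)
    useful = effective_bandwidth_gb_s(peak, retries)
    print(f"{rate:4.1f} Gbps: peak {peak:7.1f} GB/s, useful {useful:7.1f} GB/s")
```

With these made-up fractions, the highest data rate ends up with less useful bandwidth than stock, which is the "pushed too far" case described above.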

You always need to keep at least one or two cores free to service the background processes and to handle the data transfers between main memory and GPU memory.  Remember, modern CPU architecture is all about time-slicing service requests from processes.  And that operates in a round-robin fashion for the most part.

If your data isn't ready when the CPU tells the GPU task "OK, it's your turn for my attention", it just cycles your request to the back of the queue until your turn comes around again.
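If you want to translate "keep one or two cores free" into a number for BOINC's "Use at most N % of the CPUs" computing preference, the arithmetic can be sketched like this. The function name is made up for the example, and the 32-thread figure assumes a 16-core/32-thread CPU such as the 3950X mentioned earlier in the thread; how BOINC rounds a fractional result is not covered here.

```python
def cpu_percent_leaving_free(total_threads, free_threads):
    """Percentage of CPU threads left for compute tasks after reserving some."""
    if total_threads <= 0:
        raise ValueError("total_threads must be positive")
    if not 0 <= free_threads < total_threads:
        raise ValueError("free_threads must be between 0 and total_threads - 1")
    return (total_threads - free_threads) / total_threads * 100


# Assumed example: a 32-thread CPU with one or two threads held back for GPU support.
for free in (1, 2):
    percent = cpu_percent_leaving_free(32, free)
    print(f"{free} thread(s) free of 32 -> use at most {percent:.3f} % of the CPUs")
```

Rounding the result down rather than up is the safer choice if the goal is to guarantee the free threads.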

GWGeorge007
Joined: 8 Jan 18
Posts: 2997
Credit: 4926034438
RAC: 161522

Okay Keith!  Now that's what I'm talking about!  You obviously know much, much more than I do about GPUs and how they work.  And you have a much better way of saying it than I do.  Thank you!

Tom, +1 to what he said!

George
