Observations on AMD GPU memory clocks

cecht
Joined: 7 Mar 18
Posts: 426
Credit: 398,968,805
RAC: 868,012
Topic 218258

During the recent E@H work outages, I switched to Milkyway@Home GPU tasks and found some interesting differences in GPU performance between the projects. The difference that first caught my eye was noticeably lower GPU temperatures when running M@H tasks compared to E@H tasks. I then realized that both my AMD RX 460 and RX 570 run E@H at the top memory clock (Mclk) speed (1750 MHz), but run M@H at the lowest Mclk speed (300 MHz).
I also noticed that the cards were running at higher shader (core) clock speeds. The cards had carried over the power caps I was using for E@H, so I tweaked them and ended up running M@H on the RX 460 at its max shader clock speed (1172 MHz) using a power cap of 44 W (vs. a 48 W max). I set the RX 570 near its max shader clock speed (1250 MHz at state 6, vs. a max of 1286 MHz at state 7) using a power cap of 105 W (vs. a 125 W max). I settled on that particular power cap for the RX 570 because I didn't notice any decrease in M@H task times at the higher power setting that allowed the max clock speed.
The upshot is that I learned, as some folks perhaps already knew, that a sizeable chunk of the power used for running E@H GPU tasks comes from running the card's memory clock at top speed. I haven't tried underclocking the memory.
I have an Ubuntu system with the open-source AMDGPU drivers and did all GPU performance monitoring and power capping with the amdgpu-utils utility (https://github.com/Ricks-Lab/amdgpu-utils). I assume similar results could be seen under Windows with AMD Wattman.
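For anyone who wants to spot-check the memory clock without installing amdgpu-utils, the AMDGPU driver also exposes the clock states directly in sysfs. Here's a minimal Python sketch that picks the active Mclk state out of the pp_dpm_mclk file; the line format ("index: freqMhz", with an asterisk marking the active state) is an assumption based on how the amdgpu driver typically prints it, so check your own card's output first.

```python
def active_mclk_mhz(pp_dpm_mclk_text):
    """Return the active memory clock in MHz from pp_dpm_mclk contents, or None.

    Assumed line format from the amdgpu driver, e.g.:
        0: 300Mhz *
        1: 1000Mhz
        2: 1750Mhz
    where '*' marks the currently selected state.
    """
    for line in pp_dpm_mclk_text.splitlines():
        line = line.strip()
        if line.endswith('*'):
            # "0: 300Mhz *" -> "300Mhz" -> 300
            freq_token = line.split(':', 1)[1].split()[0]
            return int(freq_token.rstrip('MHhz'))
    return None

# Example with a captured snippet; on a live Linux system you would read
# open('/sys/class/drm/card0/device/pp_dpm_mclk').read() instead.
sample = "0: 300Mhz *\n1: 1000Mhz\n2: 1750Mhz\n"
print(active_mclk_mhz(sample))  # 300
```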

Ideas are not fixed, nor should they be; we live in model-dependent reality.

Keith Myers
Joined: 11 Feb 11
Posts: 430
Credit: 333,494,714
RAC: 596,043

I think it's been known for a while that Einstein tasks are one of the few GPU project workloads that are actually sensitive to both GPU memory clocks and PCIe bus speeds.  For Einstein you should try to run Gen 3 bus speeds and as high a GPU memory clock as possible, along with high GPU core clocks.

BoincStats

archae86
Joined: 6 Dec 05
Posts: 2,657
Credit: 2,196,385,825
RAC: 2,078,165

Keith Myers wrote:
I think it's been known for a while that Einstein tasks are one of the few GPU project workloads that are actually sensitive to both GPU memory clocks and PCIe bus speeds.  For Einstein you should try to run Gen 3 bus speeds and as high a GPU memory clock as possible, along with high GPU core clocks.

I think this has changed over time.  Quite a long time ago, when I first became aware that people were putting multiple high-end cards in a single box, I noticed that the then-current Einstein application showed a severe degradation in output relative to what you'd expect from simply adding up single-card performance.  On the current application, with current cards and motherboards, sharing resources does remarkably little harm, and motherboards with far less than up-to-date bus capabilities have a scarcely discernible harmful impact.  Gary Roberts posted specific real data on this second point quite recently.

Regarding memory clock speed, I was one of those who gained a lot on a GTX 970 with memory clock boosting (that was my first encounter with GPU overclocking of any kind).  My experience with later-generation Nvidia cards running current Einstein applications has been far less beneficial.  My RTX 2080 gives correct results with a remarkably high memory overclock, but gets a rather small benefit from it.

Keith Myers
Joined: 11 Feb 11
Posts: 430
Credit: 333,494,714
RAC: 596,043

Gary's test was informative.  I guess things have changed, and the current work and app are not as sensitive as they once were.

I run my 2080 at its stock 14,000 MHz memory clock; I actually haven't tried to increase it.  It seems such a ludicrous number compared to Maxwell and Pascal.  I only run very small bumps in core clocks (30–60 MHz) and just let GPU Boost take care of the actual clocks based on temperature and power targets.

I also just go for generalized performance tuning, since I run multiple projects and they all have different requirements, so optimizing for one project may negatively impact another.

BoincStats

kb9skw
Joined: 25 Feb 05
Posts: 20
Credit: 207,651,452
RAC: 8,009

I replaced an RX 570 with an R9 Fury that uses HBM clocked at 500 MHz. It's so odd to see such a low number, but it has over twice the bandwidth of the GDDR5 on the RX 570.

cecht
Joined: 7 Mar 18
Posts: 426
Credit: 398,968,805
RAC: 868,012

kb9skw wrote:
I replaced an RX 570 with an R9 Fury that uses HBM clocked at 500 MHz. It's so odd to see such a low number, but it has over twice the bandwidth of the GDDR5 on the RX 570.

It seems like HBM would give better power efficiency than GDDR5 for crunching E@H, but I don't know how you'd tease memory watts apart from total GPU watts.
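For what it's worth, the "over twice the bandwidth" figure checks out on paper. A quick back-of-the-envelope calculation, assuming the commonly quoted specs (7 Gbps effective GDDR5 on a 256-bit bus for the RX 570; 500 MHz double-data-rate HBM, i.e. 1 Gbps effective, on a 4096-bit bus for the R9 Fury):

```python
def peak_bandwidth_gb_s(effective_gbps_per_pin, bus_width_bits):
    """Peak memory bandwidth in GB/s: per-pin rate (Gbps) * bus width / 8 bits per byte."""
    return effective_gbps_per_pin * bus_width_bits / 8

# RX 570: 1750 MHz GDDR5, quad-pumped -> 7 Gbps effective, 256-bit bus
rx570 = peak_bandwidth_gb_s(7.0, 256)    # 224.0 GB/s
# R9 Fury: 500 MHz HBM, double data rate -> 1 Gbps effective, 4096-bit bus
fury = peak_bandwidth_gb_s(1.0, 4096)    # 512.0 GB/s

print(f"{rx570:.0f} GB/s vs {fury:.0f} GB/s ({fury / rx570:.2f}x)")  # 224 GB/s vs 512 GB/s (2.29x)
```

So the wide bus more than makes up for the low clock, which is why that 500 MHz number is misleading at first glance.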

Ideas are not fixed, nor should they be; we live in model-dependent reality.
