Observations on AMD GPU memory clocks

cecht
Joined: 7 Mar 18
Posts: 426
Credit: 398,968,805
RAC: 868,012
Topic 218258

During the recent E@H work outages, I switched to Milkyway@Home GPU tasks and found some interesting differences in GPU performance between the projects. The difference that first caught my eye was noticeably lower GPU temperatures when running M@H tasks compared to E@H tasks. I then realized that both my AMD RX 460 and RX 570 run E@H at the top memory clock (Mclk) speed (1750 MHz), but run M@H at the lowest Mclk speed (300 MHz).
I also noticed that the cards were running at higher shader (core) clock speeds. The cards had carried over the power caps I was using for E@H, so I tweaked them and ended up running M@H on the RX 460 at its max shader clock speed (1172 MHz) using a power cap of 44 W (vs. a 48 W max). I set the RX 570 near its max shader clock speed (1250 MHz at state 6, vs. a max of 1286 MHz at state 7) using a power cap of 105 W (vs. a 125 W max). I settled on that particular power cap for the RX 570 because I didn't notice any decrease in M@H task times at the higher power setting that allowed the max clock speed.
The upshot is that I learned, as some folks perhaps already knew, that a sizeable chunk of the power used for running E@H GPU tasks comes from running the card's memory clock at top speed. I haven't tried underclocking the memory.
I have an Ubuntu system with the open-source AMDGPU drivers and did all GPU performance monitoring and power capping with the amdgpu-utils utility (https://github.com/Ricks-Lab/amdgpu-utils). I assume similar results could be seen under Windows with AMD Wattman.
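For anyone who wants to spot-check the memory clock without installing amdgpu-utils, the AMDGPU driver also exposes the clock states directly in sysfs. Here's a minimal Python sketch that picks the active Mclk state out of the pp_dpm_mclk file; the line format ("index: freqMhz", with an asterisk marking the active state) is an assumption based on how the amdgpu driver typically prints it, so check your own card's output first.

```python
def active_mclk_mhz(pp_dpm_mclk_text):
    """Return the active memory clock in MHz from pp_dpm_mclk contents, or None.

    Assumed line format from the amdgpu driver, e.g.:
        0: 300Mhz *
        1: 1000Mhz
        2: 1750Mhz
    where '*' marks the currently selected state.
    """
    for line in pp_dpm_mclk_text.splitlines():
        line = line.strip()
        if line.endswith('*'):
            # "0: 300Mhz *" -> "300Mhz" -> 300
            freq_token = line.split(':', 1)[1].split()[0]
            return int(freq_token.rstrip('MHhz'))
    return None

# Example with a captured snippet; on a live Linux system you would read
# open('/sys/class/drm/card0/device/pp_dpm_mclk').read() instead.
sample = "0: 300Mhz *\n1: 1000Mhz\n2: 1750Mhz\n"
print(active_mclk_mhz(sample))  # 300
```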

Ideas are not fixed, nor should they be; we live in model-dependent reality.

Keith Myers
Joined: 11 Feb 11
Posts: 430
Credit: 333,494,714
RAC: 596,043

I think it's been known for a while that Einstein tasks are one of the few GPU project workloads that are actually sensitive to both GPU memory clocks and PCIe bus speeds.  For Einstein you should try to run Gen 3 bus speeds and as high a GPU memory clock as possible, along with high GPU core clocks.

BoincStats

archae86
Joined: 6 Dec 05
Posts: 2,657
Credit: 2,196,385,825
RAC: 2,078,165

Keith Myers wrote:
I think it's been known for a while that Einstein tasks are one of the few GPU project workloads that are actually sensitive to both GPU memory clocks and PCIe bus speeds.  For Einstein you should try to run Gen 3 bus speeds and as high a GPU memory clock as possible, along with high GPU core clocks.

I think this has changed over time.  Quite a long time ago, when I first became aware that people were putting multiple high-end cards in a single box, I noticed that the then-current Einstein application showed a severe degradation in output relative to what you'd expect from simply adding up single-card performance.  On the current application, with current cards and motherboards, sharing resources does remarkably little harm, and motherboards with far less than up-to-date bus capabilities have a scarcely discernible harmful impact.  Gary Roberts posted specific real data on this second point quite recently.

Regarding memory clock speed, I was one of those who gained a lot on a GTX 970 with memory clock boosting (that was my first encounter with GPU overclocking of any kind).  My experience with later-generation Nvidia cards running current Einstein applications has been far less beneficial.  My RTX 2080 gives correct results with a remarkably high memory overclock, but gets a rather small benefit from it.

Keith Myers
Joined: 11 Feb 11
Posts: 430
Credit: 333,494,714
RAC: 596,043

Gary's test was informative.  I guess things have changed, and the current work and app are not as sensitive as they once were.

I run my 2080 at its stock 14,000 MHz memory clock; I actually haven't tried to increase it.  It seems such a ludicrous number compared to Maxwell and Pascal.  I only run very small bumps in core clocks (30–60 MHz) and just let GPU Boost take care of the actual clocks based on temperature and power targets.

I also just go for generalized performance tuning, since I run multiple projects and they all have different requirements, so optimizing for one project may negatively impact another.

BoincStats

kb9skw
Joined: 25 Feb 05
Posts: 20
Credit: 207,651,452
RAC: 8,009

I replaced an RX 570 with an R9 Fury that uses HBM clocked at 500 MHz. It's so odd to see such a low number, but it has over twice the bandwidth of the GDDR5 on the RX 570.

cecht
Joined: 7 Mar 18
Posts: 426
Credit: 398,968,805
RAC: 868,012

kb9skw wrote:
I replaced an RX 570 with an R9 Fury that uses HBM clocked at 500 MHz. It's so odd to see such a low number, but it has over twice the bandwidth of the GDDR5 on the RX 570.

It seems like HBM would give better power efficiency than GDDR5 for crunching E@H, but I don't know how you'd tease memory watts apart from total GPU watts.
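For what it's worth, the "over twice the bandwidth" figure checks out on paper. A quick back-of-the-envelope calculation, assuming the commonly quoted specs (7 Gbps effective GDDR5 on a 256-bit bus for the RX 570; 500 MHz double-data-rate HBM, i.e. 1 Gbps effective, on a 4096-bit bus for the R9 Fury):

```python
def peak_bandwidth_gb_s(effective_gbps_per_pin, bus_width_bits):
    """Peak memory bandwidth in GB/s: per-pin rate (Gbps) * bus width / 8 bits per byte."""
    return effective_gbps_per_pin * bus_width_bits / 8

# RX 570: 1750 MHz GDDR5, quad-pumped -> 7 Gbps effective, 256-bit bus
rx570 = peak_bandwidth_gb_s(7.0, 256)    # 224.0 GB/s
# R9 Fury: 500 MHz HBM, double data rate -> 1 Gbps effective, 4096-bit bus
fury = peak_bandwidth_gb_s(1.0, 4096)    # 512.0 GB/s

print(f"{rx570:.0f} GB/s vs {fury:.0f} GB/s ({fury / rx570:.2f}x)")  # 224 GB/s vs 512 GB/s (2.29x)
```

So the wide bus more than makes up for the low clock, which is why that 500 MHz number is misleading at first glance.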

Ideas are not fixed, nor should they be; we live in model-dependent reality.
