A walk to the AMD side

cecht
Joined: 7 Mar 18
Posts: 1537
Credit: 2915405298
RAC: 2114380

I too have built a new PC! My first! It's a Linux/Ubuntu host dedicated to running two RX 570 cards. I'm pleased as punch with how the build turned out.
I'm now working on fine-tuning card performance using the amdgpu-utils utility (a Linux version of AMD WattMan). What I've learned is that while fan speeds and power caps can be set on the fly while the cards are crunching, all other settings (voltages, clocks, state masks) need the card to be in a resting state, i.e. with BOINC suspended. I learned previously on my other (old) host that some settings just don't take if a monitor is connected to the card. But that's not a problem with my new host because I have the monitor connected to the on-board Intel GPU (Pentium G5600).
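For anyone who would rather poke the driver directly than use amdgpu-utils, the on-the-fly knobs boil down to a couple of sysfs files. Here's a minimal Python sketch of the idea; it assumes the card shows up as card0, that the amdgpu driver is loaded, and that it runs as root, so treat it as an illustration rather than a tested tool:

#!/usr/bin/env python3
# Minimal sketch of the sysfs knobs that tools like amdgpu-utils drive.
# Assumptions: the RX 570 is card0, the amdgpu driver is loaded, run as root.
from pathlib import Path

hwmon = next(Path("/sys/class/drm/card0/device/hwmon").glob("hwmon*"))

def set_power_cap(watts: int) -> None:
    # power1_cap is in microwatts; safe to change while the card is crunching.
    (hwmon / "power1_cap").write_text(str(watts * 1_000_000))

def set_fan_percent(percent: int) -> None:
    # pwm1_enable: 1 = manual fan control, 2 = automatic; pwm1 is 0-255.
    (hwmon / "pwm1_enable").write_text("1")
    (hwmon / "pwm1").write_text(str(round(percent * 255 / 100)))

if __name__ == "__main__":
    set_power_cap(110)   # example: cap the board power at 110 W
    set_fan_percent(50)  # example: hold the fan at ~50%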
What has me puzzled, however, is that while both cards performed very similarly when run singly in my old host, they now run quite differently in the new host.  The card in the top PCIe slot runs about 10W higher and 10 C hotter than the card in the bottom PCIe slot. And that's with the top card's fan running at 60%, while the bottom card runs at 40%.  Even though clock speeds are the same, task times in the top card are about 10-15 sec faster over a ~10 minute run time.
So is this usual for dual AMD card systems? Any ideas how I might reduce power consumption through the top PCIe slot? The high temps and fan speeds are what I'm trying to bring down.

Ideas are not fixed, nor should they be; we live in model-dependent reality.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7230164840
RAC: 1157405

cecht wrote:
 The card in the top PCIe slot runs about 10W higher and 10 C hotter than the card in the bottom PCIe slot. And that's with the top card's fan running at 60%, while the bottom card runs at 40%.  Even though clock speeds are the same, task times in the top card are about 10-15 sec faster over a ~10 minute run time.
So is this usual for dual AMD card systems? Any ideas how I might reduce power consumption through the top PCIe slot? The high temps and fan speeds are what I'm trying to bring down.

I have no experience with AMD dual-card systems.  My experience with a few Nvidia dual-card systems was an apparent indifference to card position regarding performance, but an appreciable temperature increase for the upper card.  I run closed cases, so if you are running with the sides off and a big industrial fan blowing, my experience is irrelevant, but I think in most cases the intake air is biased low, and the output air is biased high, so the upper card gets input air appreciably hotter than does the lower card.

Not all motherboards treat all slots the same.  The one I just bought would, I think, provide only 4 PCI-E lanes to the lower slot in your situation.  Another board I run would provide 8 PCI-E lanes to each of the two "x16" slots if both were in use.  I'm not sure that would matter for the current Einstein GRP application, which seems not to require much from the host.  Still, it might be interesting to check your motherboard documentation regarding lane provision vs. usage.
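If you would rather see what each slot actually negotiated than dig through the manual, the kernel reports it per device on your Linux box.  A rough Python sketch (untested here; it assumes the GPUs show up under /sys/class/drm and it skips devices without PCIe link attributes):

#!/usr/bin/env python3
# Print the negotiated PCIe link width and speed for each GPU the kernel sees.
# Assumption: cards appear as /sys/class/drm/card0, card1, ... (amdgpu, i915).
from pathlib import Path

for card in sorted(Path("/sys/class/drm").glob("card[0-9]")):
    dev = card / "device"
    try:
        width = (dev / "current_link_width").read_text().strip()
        speed = (dev / "current_link_speed").read_text().strip()
    except OSError:
        continue  # e.g. an integrated GPU with no PCIe link attributes
    print(f"{card.name}: x{width} @ {speed}")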

I've seen my two 570 cards respond quite nicely to use of the "power limitation" parameter.  One of my two is currently running with -40%, the other with -20%.  I'm still fiddling with my fan curves.  For no reason known to me, my newer box suddenly started running the 570 fan at 100% recently.  I heard it in another room and ran to the rescue.  On the other hand, at 40% the fans are scarcely noticeable. 

cecht
Joined: 7 Mar 18
Posts: 1537
Credit: 2915405298
RAC: 2114380

Archae86 wrote:
My experience with a few Nvidia dual-card systems was an apparent indifference to card position regarding performance, but an appreciable temperature increase for the upper card.  I run closed cases, so if you are running with the sides off and a big industrial fan blowing, my experience is irrelevant, but I think in most cases the intake air is biased low, and the output air is biased high, so the upper card gets input air appreciably hotter than does the lower card.

Okay, yep, that did it.  I ran the cards singly in different positions, took measurements, and ended up simply reversing their slot positions. First, I was wrong about the cards being equivalent.  Don't know how I missed it on my other host, but one card does run ~10 W higher and has slightly shorter run times. But it turns out that it wasn't the wattage that made that card run hotter; it was that it sat above the other card, and those metal backplates get stinking hot. And, um, oh yeah, heat rises. That, and the air flow through my case not being as effective as I had assumed. What made the biggest difference was opening up the case.  That brought temps, and fan speed, way down. So I guess the only downside of an open case is faster accumulation of dust and cat hair on fan blades?

I have the host now running ~254 W at-the-wall with both cards running x2 tasks. The top card is ~70 C with its fan at 47% and the bottom card is ~65 C with its fan on automatic. Crunch times are ~615s and ~595s per task, respectively. I didn't keep good data, but I got the impression from all my trials that higher temps give longer run times and higher GPU wattage.

Ideas are not fixed, nor should they be; we live in model-dependent reality.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7230164840
RAC: 1157405

cecht wrote:
First, I was wrong about the cards being equivalent.

I currently am running two of the exact same model of XFX brand RX 570 cards in two different systems.  While I have bragged here that the first of these two cards responds wonderfully well to power limitation, at least as far down as -40%, the second has a decidedly different response, and loses a lot of performance to power limitation compared to the first.  When all is said and done I'll probably wind up running the second card at appreciably higher power consumption and lower Einstein productivity than the first.

I've still not tried out the magic "cecht/mining BIOS" switch on either card.  I feel I need to establish clearly the performance parameters at a decent operating point before flipping the switch and observing differences.  I have good hope it will help me.  I doubt it will eliminate what appears to be an appreciable difference between two manufacturing samples of the same card.

cecht
Joined: 7 Mar 18
Posts: 1537
Credit: 2915405298
RAC: 2114380

Archae86 wrote:
I've still not tried out the magic "cecht/mining BIOS" switch on either card.

DO IT! DO IT! DO IT! CHUG! CHUG! CHUG! (I thought maybe a bit of peer pressure appropriate here. :)

The mining BIOS has its own magical built-in effective power limits, so you can just leave the power setting at the max.  My "mining" RX 570s run about 80 W for the GPU core, compared to the 120 W gaming default. What I found gives an additional performance-efficiency increase is to use p-state masking, which can lock the card into specific preset MHz-mV settings. On my Linux system, for example, I have my RX 570s set to a p-state mask of 0, 6, which means they will only run at state 0, the no-load state, or state 6 (out of 7), which gives me the best E@H performance. On an RX 460 or RX 560 card, I use a 0,4 mask, which is equivalent to a 44 W power limit (from the 48 W default max). Both methods will run the 460/560 card at 1100 MHz, but masking seems to give better task performance. Rick, from Rick's Lab on GitHub, put me on to using masks for performance increases.
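Under the hood (on Linux, at least) the masking is just a couple of writes to the stock amdgpu sysfs interface, which is what Rick's utilities wrap.  A bare-bones Python sketch, assuming the target card is card1, has no monitor attached, BOINC is suspended, and it runs as root:

#!/usr/bin/env python3
# Bare-bones p-state masking via the stock amdgpu sysfs interface.
# Assumptions: the target RX 570 is card1, no monitor attached, run as root.
from pathlib import Path

dev = Path("/sys/class/drm/card1/device")

# Show the core-clock p-state table: "index: MHz", with '*' on the active state.
print((dev / "pp_dpm_sclk").read_text())

# Take manual control of the power states, then allow only states 0 and 6.
(dev / "power_dpm_force_performance_level").write_text("manual")
(dev / "pp_dpm_sclk").write_text("0 6")

Writing "auto" back to power_dpm_force_performance_level hands control back to the driver and clears the mask.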

In a Linux system, masking will only work on a card not connected to a monitor. (It can be done, but it doesn't stick.) Masking should also work in Windows. From what I remember tooling around with AMD WattMan, I could manually set several states to the same clock speed and voltage, and that was with a monitor connected. But I didn't take note of efficiency comparisons against power limiting.

Ideas are not fixed, nor should they be; we live in model-dependent reality.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7230164840
RAC: 1157405

I flipped the BIOS switch on my second (less capable) RX 570 shortly before cecht issued clanging encouragement.  So far I've only run it at zero power limitation (and no other modifications).  The only clear impact of the switch position is that on first boot after the change there was a five or ten second extra pause before the BIOS offered to let me go into setup.  I expect this to go away on second boot.  This was about the same extra delay I saw at this stage when I added two hard drives that were not part of the original build.

However, core clock speed, memory clock speed, VDDC,  power consumption and task elapsed time did not change by enough to be sure it was outside the noise level.

I'll try a couple of levels of power limitation next.  In particular, I'll look to see whether it does better than this sample did at the gaming switch position for power limitation deeper than -20%.

I don't have clear guesses as to why my observation differs so drastically from that of the estimable cecht.

Some candidates:

I run Windows

I have MSI Afterburner turned on, and currently controlling the RX 570 fan speed

I've not rebooted since first boot with the switch flipped.

I don't look nearly so good as the picture cecht uses (but maybe he does not either).

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117787875256
RAC: 34672018

archae86 wrote:
.... I don't look nearly so good as the picture cecht uses (but maybe he does not either).

Now that I've picked myself up off the floor from laughing so much ...

     .... and regained a modicum of composure .....

          .... I'm waiting with bated breath ....

               .... to see the awe inspiring image you must choose ....

                    .... to really put him back in his place ;-) :-).

 

Cheers,
Gary.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7230164840
RAC: 1157405

Gary Roberts wrote:
the awe inspiring image you must choose

A rather ratty copy of a painting of a Doge of Venice was at my desk throughout the design of the 8086.  I think the painting was Giovanni Bellini's portrait of Doge Leonardo Loredan.

Jim McKevitt, one of the four of us, was wont at critical points in our design debates to gesture to the print and say "What would the Doge do?".

I had a little trouble finding the avatar upload point.  It was at Account|Preferences|Community.  The other key point was to click the Save link at the bottom of the page.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

cecht wrote:
So I guess the only downside of an open case is faster accumulation of dust and cat hair on fan blades?

Unwanted visitors sticking their noses where they don't belong?

Cat on keyboards

(not my cat)

cecht
Joined: 7 Mar 18
Posts: 1537
Credit: 2915405298
RAC: 2114380

Archae86 wrote:
However, core clock speed, memory clock speed, VDDC,  power consumption and task elapsed time did not change by enough to be sure it was outside the noise level.

Hmm.  If the mining BIOS has kicked in, Afterburner should show a top memory clock of 1850 MHz and a top core clock of ~1040 MHz.  Maybe set Afterburner to default card settings? Or temporarily replace it with AMD WattMan? (WattMan will also let you see the p-state table, in the graph where you can adjust core speeds and voltages.) I can't think why the BIOS change wouldn't take hold in Windows, though I've only flipped the switch under Linux.
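On the Linux side, the equivalent check is just reading the clock tables the driver exposes; the last line of each table is the top state the active VBIOS offers.  A small sketch, assuming the card is card1 under the stock amdgpu driver:

#!/usr/bin/env python3
# Check the top core and memory p-states the currently active VBIOS exposes.
# Assumption: the RX 570 is card1 under the stock amdgpu driver.
from pathlib import Path

dev = Path("/sys/class/drm/card1/device")
for table in ("pp_dpm_sclk", "pp_dpm_mclk"):
    top = (dev / table).read_text().splitlines()[-1].strip()
    print(f"{table} top state: {top}")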

If all else fails, try imparting a smug smirk to The Doge's stern visage. See my avatar for guidance... ;)  I will say though, The Doge is intimidating. If I were a BIOS, I would not want to disappoint him.

 

Ideas are not fixed, nor should they be; we live in model-dependent reality.
