All things Radeon VII / Vega 20

shuhui1990
Joined: 16 Sep 06
Posts: 27
Credit: 3,631,456,971
RAC: 0

archae86 wrote:
Peter van Kalleveen wrote:

This configuration provides an efficiency improvement of 21% over the stock 319-watt power consumption.

CONFIGURATION  CORE VOLTAGE  CORE CLOCK  MEMORY CLOCK  POWER LIMIT  POWER CONSUMPTION
Stock          1136 mV       1801 MHz    1000 MHz      +0%          319 W
Optimized      950 mV        1750 MHz    1100 MHz      +0%          251 W

Interesting. This source advocates power reduction by turning the core voltage knob, not the power limit knob. My personal observations of my card in my system on Einstein work have been of bad behavior in response to reduced core voltage imposed with that knob, and much better behavior in response to the power limit knob.

The real mechanism of power reduction must primarily depend on the core voltage and core clock rate, regardless of which knob one turns to get the result.  But the card internal controls continue to make very frequent adjustments of both voltage and frequency.  The key question is what user input configuration causes one's card to make the best choices.

My Radeon VII's current configuration is controlled with Afterburner (I like the smooth fan speed I get from it much better than the motorboating I frequently got under Wattman control), with the following user settings:

Core voltage -- default (reads max 1123 mV)
Power limit -- -16%
Core clock -- default (reads max 1801 MHz)
Memory clock -- default (reads max 1000 MHz)
Fan speed -- on a user map

As to the average actual operating condition, GPU-Z reports these averages:

GPU clock: 1605 MHz
Memory clock: 1000 MHz
GPU voltage: 1.07 V
GPU only Power draw: 207 W
GPU Temperature: 79C
GPU Temperature (hot spot): 98C

I've seen multiple reports of people seeing quite severe errors when adjusting down the voltage limit by surprisingly small amounts on the Radeon VII.  I speculate that the current cards are shipping with controls which are not very smart about setting the other parameters to workable levels when user voltage limitation is dialed down.

Quite likely these matters are very workload dependent.

I think you should try to undervolt and underclock instead of using the power limit. Changing the power limit doesn't change the frequency-voltage curve; the card reduces power by underclocking, which reduces performance. Undervolting, by contrast, reduces power at the same frequency.

My card, which defaults to 1050 mV, works fine at 960 mV and 1750 MHz. My other card, which defaults to 1110 mV, works fine at 990 mV and 1750 MHz.

You can try 1000 mV at 1750 MHz. Average frequency will be higher. Power should be about 190 W.
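The point that undervolting saves power at a fixed clock matches the usual CMOS approximation for dynamic power, P ≈ C·V²·f. The Python sketch below rescales archae86's GPU-Z averages (207 W at 1.07 V and 1605 MHz) with that rule. It assumes the switched capacitance C stays constant and that all of the measured draw is dynamic core power (leakage and HBM2 power are ignored), so treat the results as rough estimates; the function name is hypothetical.

```python
def scaled_dynamic_power(p_ref_w: float, v_ref: float, f_ref_mhz: float,
                         v_new: float, f_new_mhz: float) -> float:
    """Rescale a measured power reading using the dynamic-power rule P ~ C * V^2 * f.

    Rough estimate only: assumes C is unchanged and that every watt of the
    reference reading is dynamic core power (leakage and memory power ignored).
    """
    return p_ref_w * (v_new / v_ref) ** 2 * (f_new_mhz / f_ref_mhz)


# Reference point: archae86's GPU-Z averages with a -16% power limit.
P_REF_W, V_REF, F_REF_MHZ = 207.0, 1.07, 1605

# Undervolt to 1.00 V while keeping the same average clock.
same_clock_w = scaled_dynamic_power(P_REF_W, V_REF, F_REF_MHZ, 1.00, F_REF_MHZ)
# Undervolt to 1.00 V and let the clock rise to 1750 MHz.
suggested_w = scaled_dynamic_power(P_REF_W, V_REF, F_REF_MHZ, 1.00, 1750)

print(f"1.00 V @ {F_REF_MHZ} MHz: ~{same_clock_w:.0f} W")
print(f"1.00 V @ 1750 MHz: ~{suggested_w:.0f} W")
```

Under these assumptions the model gives roughly 181 W at the same clock and roughly 197 W at 1750 MHz, in the same range as the ~190 W suggested above. Keep in mind that 1000 mV is a setting while 1.07 V is a measured average, so the card's real operating voltage at that setting may differ.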

Chooka
Joined: 11 Feb 13
Posts: 117
Credit: 3,230,260,814
RAC: 99

Yep. I get quite a lot of errors even with power limit set to -10%. Not sure about you guys but sometimes the card seems to stop altogether.  As in I got home today and the WU's had been crunching for 7hrs. Virtually no output from the card although my monitor works etc. When I go to restart my system, there's an error box that appears regarding the Radeon card but not sure what it's about. After a reboot all's good again.

I'm not even sure the reduction in power limit is the issue. 

Sadly the Radeon VII isn't as stable as say my Vega 56's or R9 280x's which just crunch away without issue. In fact my 2 x Vega 56's have a higher average than 1 x Radeon VII.

I'll try your suggestion of 998 mV and 1750 MHz and see how that goes, Shuhui1990.

Did you boost the memory clock at all? Or just undervolt only? Leave the power limit at 0%?

shuhui1990
Joined: 16 Sep 06
Posts: 27
Credit: 3,631,456,971
RAC: 0

Chooka wrote:

Did you boost the memory clock at all? Or just undervolt only? Leave the power limit at 0%?


I only undervolt. Running 2x WU gives no issue.

archae86
Joined: 6 Dec 05
Posts: 3,140
Credit: 6,971,664,931
RAC: 1,853,144

shuhui1990 wrote:
I only undervolt. Running 2x WU gives no issue.

I only power limit. Currently mine is at a -16% power limit. Also, I run a manual fan curve using MSI Afterburner, not Wattman.

Over the last couple of months my Radeon VII has actually been a bit more inclined to just keep working on Einstein than my two RX 570s, each of which has lapsed either into sloth mode (about a 3X reduction in productivity) or catatonic mode (zero production), on average after a couple of weeks of uptime.

Reading elsewhere I've come across multiple observations of poor Radeon VII response to explicit voltage control.  As reports are inconsistent, this may vary both with specific samples of the card and with work load. 

I'm not currently tinkering with it.  My worst current complaint is that several times a day my monitor and the PC fall out of communication with each other, but that recovers substantially every time simply by powering off the monitor and immediately powering it back up again.  It is an annoying nuisance, but I live with it.

Chooka
Joined: 11 Feb 13
Posts: 117
Credit: 3,230,260,814
RAC: 99

Thanks Shuhui1990.

I certainly don't have that monitor issue, Archae86; only the issue I mentioned before and lots of errors.

I've got to say (touch wood) that I tried undervolting to 998 mV and 1750 MHz and have noticed no difference in run times, which is great! The fan noise has dropped right off (down to 1486 rpm), and the errors are dropping away too. All really great!

I'll say it's a bit early to get toooo excited but so far all good!

Wattman shows these averages:

* GPU: 1670 MHz
* Memory: 1000 MHz
* Temp: 84 °C
* Fan: 1574 rpm

GPU-Z shows average power draw now 192W. A huge improvement.


mesman21
Joined: 30 Jan 07
Posts: 15
Credit: 3,337,606,947
RAC: 0

Chooka wrote:
Not sure about you guys but sometimes the card seems to stop altogether.  As in I got home today and the WU's had been crunching for 7hrs. Virtually no output from the card although my monitor works etc. When I go to restart my system, there's an error box that appears regarding the Radeon card but not sure what it's about. After a reboot all's good again.

I'm glad that I'm not the only one dealing with this strange behavior of hanging tasks. This issue has been haunting me for months, and it has been very frustrating, especially since I'm traveling for work and have limited access to my rigs. I've tried everything: about 5 different driver versions, undervolting, underclocking, manual/auto memory clocks, power limit, and various combinations of each. Sometimes I'll find settings that work for a week straight, then I restart the computer to do a Windows update, load the same profile, and find a stuck WU an hour later. Infuriating.

The whole time I was running 3x WU per card. I asked Gavin about it, and he suggested that I go from 3x WU down to 2x WU. This was about 3 weeks ago, and I haven't had a hanging-task issue since on either machine. No idea why this is, but it's been working for me.

archae86
Joined: 6 Dec 05
Posts: 3,140
Credit: 6,971,664,931
RAC: 1,853,144

mesman21 wrote:
The whole time I was running 3x WU per card. I asked Gavin about it, and he suggested that I go from 3x WU down to 2x WU. This was about 3 weeks ago, and I haven't had a hanging-task issue since on either machine.

It seems we can add you to the list of people who have decided it better not to run 3X for Einstein on Radeon VII.  

In my case the killer problem was an appreciable rate of "error while computing" of type 65.  At 2X I don't get these at all.

Is anyone here enjoying current success running Einstein GRP tasks at 3X on a Radeon VII?

Chooka
Joined: 11 Feb 13
Posts: 117
Credit: 3,230,260,814
RAC: 99

Well, I've been running x3 the whole time, but as mentioned, my error rate has been falling since I undervolted.

I think the error message that pops up (it only pops up when I go to shut down, so it quickly disappears) says something about a memory issue, so perhaps running 3 WUs puts a strain on the RVII's memory?

I'm not sure. If it does it again, I'll take a photo.

It's one of those red and white error messages with the x0000 etc etc issue.

Anyway, so far so good with the new settings.


Gavin
Joined: 21 Sep 10
Posts: 191
Credit: 40,583,023,870
RAC: 2,773,032

Hello all,

I've been away on holiday and then had two weeks away with work so will chime in now.

First off, congrats to Mesman21, glad I have helped :-) and I'm pleased to see you rocketing back through the ranks!

In the early days of owning the Radeon VIIs I did manage to run tasks x3 for a few days error-free, but decided (before seeing signs of trouble) that the very marginal gain over running tasks x2 was not worth the extra fan noise or power consumption, so x2 per card is where I'm staying. I run the cards using only power limit and memory clock adjustments. My tinkering with real undervolting and core clock settings has always ended in tears, because the behaviour of any given setting differs with each reboot of the machine. There appears to me to be some sort of floating dynamic between Windows 10 and the AMD drivers: sometimes they interact well, other times not so much.

My advice to owners of Radeon VII, Vega and even RX 4xx/5xx cards would be to run a maximum of 2 tasks per card in a Windows environment to give the best chance of avoiding trouble. Don't jump into undervolts or over/underclocks because others on the internet have had success with 'X' settings; just start by lowering your card's power limit and creeping up the memory clock ;-)
I still get invalid task results, but at such a low rate that it isn't worth my time investigating: at the time of writing I have 62 invalid vs. 4798 valid tasks (about 1.3% invalid), so I think you will agree my invalid rate is really a drop in the ocean.
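A per-card task limit like this is normally set on the BOINC side. One common way is an app_config.xml file in the project's directory, where a gpu_usage of 1/N lets the client run N tasks on each GPU. Below is a minimal Python sketch that builds such a file. The application name hsgamma_FGRPB1G is an assumption for the Einstein gamma-ray pulsar GPU app (check the name your own client reports), and the cpu_usage of 1 is an arbitrary placeholder; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, tasks_per_gpu: int, cpu_per_task: float = 1.0) -> str:
    """Return app_config.xml text that lets BOINC run `tasks_per_gpu` tasks on each GPU."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    # BOINC only co-schedules tasks whose gpu_usage fractions on one GPU sum to at most 1.
    ET.SubElement(gpu, "gpu_usage").text = f"{1 / tasks_per_gpu:g}"
    ET.SubElement(gpu, "cpu_usage").text = f"{cpu_per_task:g}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Assumed app name for Einstein@Home gamma-ray pulsar (GRP) GPU tasks.
    print(build_app_config("hsgamma_FGRPB1G", tasks_per_gpu=2))
```

With tasks_per_gpu=2 this writes a gpu_usage of 0.5; with 3 it would write 0.333333, i.e. the 3x setup several posters here moved away from.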

Gav.

cecht
Joined: 7 Mar 18
Posts: 1,387
Credit: 2,400,133,076
RAC: 2,258,673

Here's a bit of news: the new Radeon RDNA (Navi) cards are due out on 7 July. From the TechPowerUp GPU Database, I put together a basic spec comparison with the Radeon VII. While the new cards won't give any performance boost over the Radeon VII, they may offer better power efficiency per task. Who's going to be first to let us know?

Radeon                     VII        RX 5700        RX 5700 XT
Price, USD                 699        379            449
TDP, W                     295        180            225
7 nm chip                  Vega 20    Navi 10        Navi 10
Compute units              60         36             40
GPU clock (boost), MHz     1750       1625           1750
Memory clock (eff.), MHz   2000       14000          14000
Memory size, GB            16         8              8
Memory type                HBM2       GDDR6          GDDR6
Power connectors           2x 8-pin   6-pin + 8-pin  6-pin + 8-pin
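To put the spec table in terms of raw compute, theoretical peak FP32 throughput can be estimated as compute units × 64 lanes × 2 FLOPs per fused multiply-add × clock. This is a paper figure only: it assumes the listed boost clocks are sustained and says nothing about actual Einstein task times, which also depend on memory bandwidth and drivers. A short Python sketch (names are illustrative):

```python
# Rows copied from the spec table above:
# (card, compute units, boost clock in MHz, TDP in W, price in USD)
CARDS = [
    ("Radeon VII", 60, 1750, 295, 699),
    ("RX 5700", 36, 1625, 180, 379),
    ("RX 5700 XT", 40, 1750, 225, 449),
]

LANES_PER_CU = 64   # FP32 ALUs per compute unit on both Vega 20 (GCN) and Navi 10 (RDNA)
FLOPS_PER_LANE = 2  # a fused multiply-add counts as two floating-point operations


def peak_fp32_tflops(compute_units: int, clock_mhz: float) -> float:
    """Theoretical peak single-precision throughput in TFLOPS at the given clock."""
    return compute_units * LANES_PER_CU * FLOPS_PER_LANE * clock_mhz * 1e6 / 1e12


for name, cus, mhz, tdp_w, usd in CARDS:
    tflops = peak_fp32_tflops(cus, mhz)
    gflops_per_w = 1000 * tflops / tdp_w
    usd_per_tflops = usd / tflops
    print(f"{name:<11}{tflops:6.2f} TFLOPS{gflops_per_w:7.1f} GFLOPS/W  ${usd_per_tflops:.1f} per TFLOPS")
```

On these paper numbers the three cards land within about 15% of each other per TDP watt (roughly 46, 42 and 40 GFLOPS/W), so measured run times will be the real test.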

Ideas are not fixed, nor should they be; we live in model-dependent reality.
