All things Radeon VII / Vega 20

DF1DX
Joined: 14 Aug 10
Posts: 51
Credit: 853,847,015
RAC: 747,814

With my VII under Windows 7, I also observe frequent driver resets, at intervals of a few minutes to a few hours.

Currently I'm testing the card with MilkyWay only. There I have no driver resets, but about 5% invalids. Strange.

Driver 19.3.1, power limit -20%, max GPU clock 1650 MHz.

archae86
Joined: 6 Dec 05
Posts: 2,666
Credit: 2,264,929,548
RAC: 2,863,455

Gary Roberts wrote:
In my case, the crunch rate would slow to a crawl.   

I did not check it at the time, but the history log (and the temperature graph from TThrottle) strongly suggests it was sailing along at full speed.

I'm happy to report that when just now I got up in the middle of the New Mexico night, the machine rewarded my mouse and keyboard attentions promptly.  I also noticed that GPU-Z reports my average VDDC at the moment as 1.121V.  As people are reporting good operation a bit below 1.0V, that may indicate that power limitation has some real room for beneficial effect when I try it soon.

I also realized as I lay awake that I have room to ventilate this case better.  While it has lots of fans, many of them were leftovers from years before I built the case.  Some were fans I had earlier set aside as being quiet but moving too little air to suit in a lower fan count older case.  I also may be running some of them under motherboard control at well under full speed.  There is probably room to trade some increased fan noise and air flow from the very quiet case fans for a bit less fan noise from the Radeon VII fans, which GPU-Z currently reports (in the middle of the night) are averaging 2903 rpm and 57%.  I should emphasize that compared to howling banshee cases with small diameter fans of my distant past this current situation is not so very loud, and of a remarkably agreeable character.   Many of you might not mind it at all.  I look forward to improving it--I hope with not very much loss of Einstein output.

Filipe
Joined: 10 Mar 05
Posts: 148
Credit: 240,617,530
RAC: 2

Will we be able to crunch FGRPB1G in real time with all these high-performance Radeon VIIs around?

archae86
Joined: 6 Dec 05
Posts: 2,666
Credit: 2,264,929,548
RAC: 2,863,455

Before I tinkered with using Afterburner for power limitation, the temperature reported by TThrottle was remarkably stable at just under 110C.  I think TThrottle may be receiving a temperature which is the one actually used by the card to regulate--perhaps not just the fan speed, but also to some degree core clock rate and GPU voltage.

Once I launched Afterburner I realized that I still don't understand the fan controls.  I fiddled with the user curve to get fan speed very similar to what the card had done for itself at the initial operating condition.

That done I tried power limitation, dropping from the initial 100% to 95%, then 90, then 85, then 80.

Remarkably little happened to elapsed times.  I relied on GPU-Z averaging over a half hour interval to see other changes.

Parameter (GPU-Z half-hour average)  Power limit 100%  Power limit 80%
GPU Clock (MHz)                      1605              1546.5
Memory Clock (MHz)                   1000              998.1
GPU Temperature (C)                  81.0              76.9
Memory Temperature (C)               85.0              80.3
Fan Speed (%)                        58                56
Fan Speed (rpm)                      2904              2831
GPU-only power draw (W)              218.5             193.0
GPU voltage (V)                      1.115             1.055

TThrottle temp (C)                   109               98
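
To put the table in relative terms, here is a minimal sketch that computes the percentage change of three of the GPU-Z averages between the 100% and 80% power limit settings. The numbers are copied from the table. The units in the labels (MHz, W, V) are assumed to be GPU-Z's usual ones, and the names `readings` and `percent_change` are made up for this illustration.

```python
# GPU-Z half-hour averages from the table: (at 100% power limit, at 80% power limit).
# Units in the labels are an assumption (GPU-Z's usual MHz, W and V).
readings = {
    "GPU clock (MHz)": (1605.0, 1546.5),
    "GPU-only power draw (W)": (218.5, 193.0),
    "GPU voltage (V)": (1.115, 1.055),
}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent (negative = reduction)."""
    return (after - before) / before * 100.0


# Percentage change for each parameter when going from 100% to 80%.
changes = {name: percent_change(a, b) for name, (a, b) in readings.items()}

if __name__ == "__main__":
    for name, pct in changes.items():
        print(f"{name}: {pct:+.1f}%")
```

On these figures the GPU-only power draw falls by roughly 12% while the average core clock falls by under 4%, which fits the observation that elapsed times barely moved.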

I need to do a new long-term average power measurement at the box input, but am pretty sure that -20% power limitation instructed by Afterburner intervention gave substantial power reduction with little adverse performance impact.

I think lower fan noise is achievable but will wait for the moment while I lick my chops some more.  I suspect some more power reduction may be available by directly specifying a slightly lower maximum GPU voltage.

archae86
Joined: 6 Dec 05
Posts: 2,666
Credit: 2,264,929,548
RAC: 2,863,455

With -20% power limitation as specified at the Afterburner interface engaged, as previously described, I did a 3-hour measurement run (i.e. I was away at church and eating lunch, so no competing usage).  That gave an average wall plug power measurement of 306 watts, and average elapsed times of 6:04.  Assuming away the details of real-world downtime and competing usage, that computes to a nominal daily credit of 1,644,923 and a system level power efficiency of 5,379 credits/day/watt.  
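
The arithmetic behind those two figures can be sketched as follows. The post does not state the credit per task or how many tasks ran at once. The values below (3,465 credits per task, two tasks running concurrently) are assumptions, chosen because together with the 6:04 elapsed time they reproduce the quoted daily credit. The function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumptions, not stated in the post: picked because they reproduce the quoted figure.
CREDIT_PER_TASK = 3465   # assumed credit granted per task
CONCURRENT_TASKS = 2     # assumed number of tasks running at once on the card


def daily_credit(elapsed_s, credit_per_task=CREDIT_PER_TASK, concurrent=CONCURRENT_TASKS):
    """Nominal credit per day, ignoring downtime and competing usage."""
    tasks_per_day = SECONDS_PER_DAY / elapsed_s * concurrent
    return tasks_per_day * credit_per_task


def credits_per_day_per_watt(credit_per_day, wall_watts):
    """System-level efficiency based on wall-plug power."""
    return credit_per_day / wall_watts


if __name__ == "__main__":
    elapsed = 6 * 60 + 4                 # 6:04 average elapsed time, in seconds
    credit = daily_credit(elapsed)
    print(f"nominal daily credit: {credit:,.0f}")
    print(f"efficiency at 306 W: {credits_per_day_per_watt(credit, 306):,.0f} credits/day/watt")
```

With the rounded 306 W figure the efficiency comes out near 5,376 credits/day/watt, slightly below the quoted 5,379. That would be consistent with the measured average having been a little under 306 W before rounding.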

I, personally, may give up a little in credit rate to buy lower fan noise or lower power consumption on this system, but I think this is a realistic operating point to consider for people who are willing to tweak just a little.

archae86
Joined: 6 Dec 05
Posts: 2,666
Credit: 2,264,929,548
RAC: 2,863,455

archae86 wrote:
I suspect some more power reduction may be available by directly specifying a slightly lower maximum GPU voltage.

I had a try at specifying lower maximum core voltage in Afterburner.  The GPU voltage graph displayed by Afterburner did indeed show a response for a few trial values:

1.100 slightly lowered the range of values

1.050 again lowered the range

1.010 greatly reduced the variation, in addition to removing the higher values

But a couple of minutes after I specified 1.010, the machine did a spontaneous reboot.

During these tests I still had a -20% power limit slider position in Afterburner.

I don't know that the spontaneous reboot had anything to do with either my power limit or my voltage maximum settings, but suspect the voltage maximum setting.  I intend to put that one back to default.

Again this morning I have twice had the PC give me black screen on wake up.

I paid more attention than the first time:

1. It is responding to mouse movement.  I can see the monitor status light change, and the extremely faint backlighting comes on, but nary a single non-black pixel.  Attempting to provide my Windows password blind has no effect.

2. Twice now, I've suspended all Einstein tasks from a remote connection, and about ten minutes later got a normal desktop and was able to log in on next wiggling the mouse.

A little Internet searching on Windows 10 black screens got me something to try:

Supposedly Windows key+Ctrl+Shift+B forces a graphics driver restart, and some users have found it to be a way out of some black screen issues.

I plaintively hope this issue on my machine is somehow a matter of the current AMD driver interacting with my machine, the Radeon VII, and Einstein, and will go away with a new driver.  It is actually pretty dismaying.

Other than that, the beast roars on.  

Chooka
Joined: 11 Feb 13
Posts: 95
Credit: 1,003,719,693
RAC: 671

Thank you very much for sharing all this info Archae86! All very interesting.

I hadn't heard of Windows key+Ctrl+Shift+B before.


Gavin
Joined: 21 Sep 10
Posts: 175
Credit: 24,268,744,275
RAC: 19,884,471

Archae86, I think it would be fair to say that the biggest stumbling block thus far with the VII is the driver(s). These cards can sing, but chasing the dragon of lower power consumption whilst retaining problem-free, quiet, fast crunching is going to be an issue this early in the driver development. Hang in there as things can only get better!

I've not experienced any black screen issues, but I have avoided using Afterburner after the often counterproductive results it gave with my Vega 64s. For now getting your head round Wattman is the best foot forward...

Take some time to get the fan curve to your ears' liking and within your self-imposed temperature limits, but accept the fans ramping up loudly (yet briefly) for no apparent reason. Raise the memory clock in increments and lower the power limit. Currently I wouldn't advise playing with adjusting core clock or trying to undervolt (other than using the power limit).

My dual VII machine 827 is chugging along very nicely, running x2 per card and drawing a peak of 610 Watts... 3.5 million+ credits per day here I come!!!

archae86
Joined: 6 Dec 05
Posts: 2,666
Credit: 2,264,929,548
RAC: 2,863,455

Gavin wrote:
For now getting your head round Wattman is the best foot forward...

I've looked at Wattman more than once in the last few weeks, but until just now, prodded by your post to look again, I never found the power slider.  That is because I neither noticed the scroll bar at the right, nor happened to stretch the window vertically.  Now I have.  So I'll disable the "overclock at boot" setting I just imposed in Afterburner, and have a try at Wattman next boot up.  Thanks--sometimes one just has to look again to see the obvious answer.  It reminds me of the weeks I spent not understanding that in order to point Afterburner at the "other" card in a dual-card machine one had to click on Settings and use a scroll box there.

Quote:
3.5 million+ credits per day here I come!!!

Goodness me.  That is fantastic for a single dual-card machine.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 4,912
Credit: 30,212,956,632
RAC: 35,941,439

Gavin wrote:
My dual VII machine 827 is chugging along very nicely, running x2 per card and drawing a peak of 610 Watts... 3.5 million+ credits per day here I come!!!

"... dual VII ..."  That's right - rub it in! :-).  I think I'll have to put you two guys on ignore!! :-)  This is a bit like what I imagine waterboarding could be like!! :-).

Seriously, that's pretty impressive!  It looks like, within a very short time span on the late afternoon of March 5th, your machine transitioned from the old times to the new without missing a beat.

I was interested to do a very quick browse through your invalid list.  Due to the 'strictness' of the validator and the variety of hardware crunching these tasks, a certain level of invalids is to be expected.  There are both 'validate errors', where the result has gross discrepancies, and 'completed, marked as invalid', where the discrepancies are much less obvious and possibly due to hardware differences producing slightly different answers.

It seemed to me that you had quite a few validate errors prior to and immediately after the transition.  Now there are very few of these but some of the latter type.  Given that more of your 'older' (prior to the transition) results will have now disappeared from the online database (so we don't see the full number of them), it seems to me that your new hardware is producing 'better' results (with fewer invalids of any type) than perhaps your previous cards were.

This is a very encouraging outcome, considering the newness of the hardware/drivers.  As you say, with AMD, performance and reliability always seem to improve with time.  That's been my experience as well.

 

Cheers,
Gary.
