All things Radeon VII / Vega 20

Chooka
Joined: 11 Feb 13
Posts: 134
Credit: 3,605,345,759
RAC: 2,443,727

"That's right - rub it in! :-).  I think I'll have to put you two guys on ignore!! :-)  This is a bit like what I imagine waterboarding could be like!! :-)"

 

HAHAAH Gary. I agree.

You can't even get ONE of these cards in Aus. Well...barely. Only if you want to pay way above the RRP. Even then, they are as rare as hen's teeth.


cecht
Joined: 7 Mar 18
Posts: 1,511
Credit: 2,813,143,663
RAC: 2,135,359

I have my sights set on a Radeon VII and am building a PC just to house it. I'm planning to wait though, in the hope that when AMD ramps up production and card vendors begin offering something other than the reference card, some will have dual BIOS with a mining BIOS preloaded.  A fellow can hope...

Ideas are not fixed, nor should they be; we live in model-dependent reality.

archae86
Joined: 6 Dec 05
Posts: 3,157
Credit: 7,181,784,931
RAC: 738,106

This post is about fans--if that is not interesting, skip it.

Gavin wrote:
Take some time to get the fan curve to your ears liking and self imposed temperature limits but accept the fans ramping up loudly (yet briefly) for no apparent reason

Spurred on by Gavin's advice, on my next reboot I did not start Afterburner but went direct to AMD Wattman.  Now that I knew to stretch or scroll down the window it was easy to find and adjust the power limit slider.  I clicked the toggle to "manual" for fan speed, and tried putting in the points I'd use in Afterburner.

That did not work well at all.  Apparently Afterburner "hears" the temperature that Wattman refers to as "current temperature", while the curve you supply for manual fan control in Wattman appears to refer to the temperature reported as "junction temperature".  These differ by over 20C, and have different dynamics, so this matters.  So my first action was a shape-preserving tug to the right so that I could match the fan speed I had been getting at typical conditions using my curve in Afterburner.
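A minimal sketch of that "shape-preserving tug to the right", assuming it amounts to adding one constant temperature offset to every curve point.  The example points and the 24C offset are made up for illustration (the post only says the two readings differ by over 20C), and `shift_curve` is a hypothetical helper, not part of Wattman or Afterburner.

```python
def shift_curve(points, offset_c):
    """Move every (temperature C, fan %) point right by offset_c.

    Fan percentages and the spacing between points are unchanged,
    so the shape of the curve is preserved.
    """
    return [(temp_c + offset_c, fan_pct) for temp_c, fan_pct in points]


# Hypothetical Afterburner-style points, for illustration only (not from the post).
afterburner_points = [(40, 30), (60, 45), (80, 70)]

# Assumed offset between the two temperature readings; an example value only.
OFFSET_C = 24

wattman_points = shift_curve(afterburner_points, OFFSET_C)
print(wattman_points)
```

Since the two temperatures also have different dynamics, a constant offset can only match the fan speed at one typical operating point, not everywhere.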

It averaged OK, but was a nervous Nellie.  Quite frequently (say once a minute or more), it would suddenly boost fan speed from about 3000 rpm to about 3500, then slowly coast down.  It looked like a series of ski slopes on the Wattman graph, and was distracting to hear.

I resolved to attack the problem by altering the set points to reduce the slope of fan speed vs. temperature, expecting a rather slight moderation.  To my surprise, the "ski slopes" went away completely under unattended operation of 2X Einstein, and the variations in fan speed when I was, for example, typing a post to Einstein dropped to a couple of percent.

Your case, your ears, your card, your fan noise objections, your desire for performance, and your concerns about card temperature will all probably differ from mine, but for the record here is my current set of fan curve points.

39C 30%
66C 38%
82C 49%
95C 59%
105C 78%
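To make the curve easier to reason about, here is a small sketch that evaluates these five points at a given junction temperature and lists the slope of each segment.  It assumes straight-line interpolation between points and a flat response outside them; how Wattman actually fills in the curve is not stated in this thread, and `fan_percent` and `segment_slopes` are hypothetical helper names.

```python
from bisect import bisect_right

# Curve points as posted above: (junction temperature in C, fan duty in %).
CURVE = [(39, 30), (66, 38), (82, 49), (95, 59), (105, 78)]


def fan_percent(temp_c, curve=CURVE):
    """Fan duty at temp_c, linearly interpolated between curve points.

    Assumption: below the first point and above the last point the
    duty is held flat at the end values.
    """
    temps = [t for t, _ in curve]
    if temp_c <= temps[0]:
        return float(curve[0][1])
    if temp_c >= temps[-1]:
        return float(curve[-1][1])
    i = bisect_right(temps, temp_c)  # index of first point hotter than temp_c
    (t0, p0), (t1, p1) = curve[i - 1], curve[i]
    return p0 + (p1 - p0) * (temp_c - t0) / (t1 - t0)


def segment_slopes(curve=CURVE):
    """Slope of each segment in fan % per degree C."""
    return [(p1 - p0) / (t1 - t0) for (t0, p0), (t1, p1) in zip(curve, curve[1:])]


print(fan_percent(98))    # about 64.7 under the linear assumption
print(segment_slopes())
```

Under these assumptions the last segment (95C to 105C) is by far the steepest at 1.9% per degree, so small junction temperature swings in that range move the fan the most.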

Wattman reports my average fan speed as 2957 rpm, "temperature" as 74C, and "junction temperature" as 98C.  In the many minutes since I last did a "clear data" the peak fan speed is reported as only 3027 and peak junction temperature as 102.

I've revised my motherboard case fan controls to reference CPU temperature (which varies a lot) instead of Motherboard temperature (which varies little).  The case fans are cranking full steam during normal Einstein crunching, and are still quieter than the 3000 rpm Radeon VII fans.  I have an audio hobby, and this computer is my workbench for that.  So it matters that if I suspend Einstein processing, both the card and the case fans drop greatly in noise within a couple of minutes.

Fan noise was one of my biggest concerns in buying this card.  I'm feeling much better about it.

Now if only the black screen on returning after lock would go away...

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,870
Credit: 116,030,406,180
RAC: 35,667,348

archae86 wrote:
Fan noise was one of my biggest concerns in buying this card.  I'm feeling much better about it.

I'm really pleased you've found a way to get an acceptable level of fan noise.  So now it's just the black screen issue.

I'm not really sure about exactly what you mean by "on returning after lock"?  Are you just referring to a power saving mode where the screen blanks out if there is no user activity (keyboard/mouse) or video activity (nothing happening on the display) for some set period?  I presume so but correct me if I'm wrong.

I'm just guessing, but I'm wondering if, after the set time interval, the GPU puts itself in some sort of different state which shuts off display functions whilst allowing compute functions to continue, and (perhaps due to some driver/firmware bug) it can't resurrect the display function when there is further user activity.  I'm also wondering if you might be able to test this by setting a time interval of 'never' (if that's available) and then conserving power differently by turning off the display yourself - e.g. overnight, or for any other longish period where you won't otherwise be using the machine.

If the screen then always lights as soon as you switch it on (you'd probably need to test quite a few times) you could probably safely assume that it's the action of returning from that different state that's causing the problem.  If so, that would seem to be a driver/firmware issue that hopefully should get fixed fairly promptly.

 

Cheers,
Gary.

archae86
Joined: 6 Dec 05
Posts: 3,157
Credit: 7,181,784,931
RAC: 738,106

Gary Roberts wrote:
I'm not really sure about exactly what you mean by "on returning after lock"?   

The reference to lock was to a Windows function, most swiftly invoked by holding down the Windows key and pressing the L key, which "locks" the machine.  That means it keeps on running but the active desktop is no longer displayed, and in order for a user to use the machine again they must first attract the attention of the machine (most commonly by moving the mouse or tapping the space bar on the keyboard) to get a login box, then supply their password.

On initially entering this state, generally a full-screen image is displayed for a short interval, but, presumably as a power-saving measure, in a little while the graphics card stops sending pixels to the monitor, and a little later the monitor goes to a much lower power state.

A second way to enter the condition of interest is just to walk away.  After a little while (I think I have mine set to 5 minutes), the system locks on its own, and the same sequence follows: a simple desktop picture, then no pixels, then the monitor going to sleep state.

In my current problem, which is new since I installed the Radeon VII, when I walk back into the room and move the mouse or tap the space bar, the system clearly notices my motion--I see and hear a tiny bit of added disk activity, the monitor status indicator shows it has left sleep state, and the very low backlight comes on, which is the way this sort of monitor does black.  But not a pixel lights up, and after a bit the monitor announces it is going to go to sleep shortly as it is receiving no input.

I'll hazard a guess I get a black screen instead of a login screen very roughly 50% of the time.  If I just wait (instead of doing a force reboot), and try now and again, eventually I've always gotten in.  I'm not sure whether the couple of times I remotely suspended Einstein work and got in were really assisted by the suspension or just early successes of the "wait a bit" solution.

In Windows, the place to configure the power-saving portion of this is located at Control Panel|Power options.

Opening that up to see where I am, I find I currently am using the "balanced" plan, and that specifically I have selected to never go to sleep (the computer, not the monitor), and to turn off the display after 10 minutes.

Control of on/off, and darn near everything else on this monitor, is by a sort of single-button joystick hidden at the bottom.  As it happens I am experienced at touching it, as about a dozen actions are required each time I switch the monitor input source from a DisplayPort coming from my PC to an HDMI input coming from my DVD player.

So I should be able to try your suggestion.  Just now I have configured the Windows Power Plan "monitor off" setting which was at 10 minutes to "never".

If I am able to log in at my first try the next ten times, I'll figure that probably addressed the matter.  By then I should have the control button motions learned.

As my searches have failed to turn up other Radeon VII users complaining of this particular symptom, I suspect there is something in particular about my system configuration that gives this by bad luck.  So it might not get a driver fix any time soon.

In other news, I had a spontaneous reboot this evening.  This system never had those, so likely it is related to the Radeon VII, AMD driver, my settings of same, or the latest Windows Update.  My personal guess is that maybe Power Limitation -20% is a bit too much for highest reliability operation of this particular sample of the card with the current firmware, driver, and Einstein workload.  But I don't plan to back off just yet.  This is the second spontaneous reboot I've seen.  The first was during my brief efforts at Voltage limitation.

Chooka
Joined: 11 Feb 13
Posts: 134
Credit: 3,605,345,759
RAC: 2,443,727

I was going to suggest the power options in Windows.

Something I've had to change multiple times with all my fresh Windows installs.


archae86
Joined: 6 Dec 05
Posts: 3,157
Credit: 7,181,784,931
RAC: 738,106

I was premature in saying I had found a fan curve for Wattman that avoided surging.  It did at one operating point, but with variation in room temperature and so on I've given up on finding Wattman settings, whether a manual fan curve or relying on automatic, that do not often surge.  As I did not recall surging when I had a user fan curve active with a running copy of MSI Afterburner, I tried that again.

With a day and a half of renewed Afterburner usage, and several tweaks in the curve and over a pretty broad range of room temperatures, I've never heard fan surging.  Something seems fundamentally different about fan control for the Radeon VII card in these two cases, and for the moment I greatly prefer the Afterburner style.

As the only other intervention I'm using is power limit, and I imagine that is just a matter of the selected UI passing a number on to the driver, at the moment I don't know what might be wrong with Afterburner for my case.  I'm eager to hear, however.

Regarding my black screen problem, I tried Gary's suggestion by tampering with the Control Panel Power Options, and it seemed to work.  But that relied on my remembering to turn off the monitor each time I left, and after years of relying on the automatic, that could take a while.  I stumbled on another way, which may be an additional clue as to what is really happening.

If the monitor is in blackout mode (i.e. very weakly backlit, but every pixel black, when it should be showing something), if I turn the monitor off, then back on again, Presto! I immediately see what I should.  This has held true so far for four out of four trials, two each of two different scenarios.

1. the original problem: I try to wake up the machine after I've walked away, and get monitor active indication, but black screen.

2. a newly observed problem: I switched the monitor active input source from the DisplayPort cable connected to my PC to an HDMI cable connected to my DVD player in order to watch a movie.  That works fine, but when I switch back to the PC I get the dreaded black screen.

If you think cables from graphics cards to monitors are one-way, this makes no sense.  But DisplayPort has a low-bandwidth reverse channel (AUX) which I suppose is how your PC figures out what monitor you have attached, and what resolution options it supports.

So maybe part of the answer to why I see the black-screen problem and other Radeon VII users do not is that it may only arise for a small subset of monitors.

Again I've been verbose.  The bottom line is that I again think I have a good resolution of the fan-noise problem, and I have two much better workarounds for the black-screen problem, which have reduced it from a serious objection to a substantial nuisance.

Chooka
Joined: 11 Feb 13
Posts: 134
Credit: 3,605,345,759
RAC: 2,443,727

So now that things have settled, is the general consensus to run 2 or 3 WUs concurrently and use Wattman to set the power limit to -20%?

That's it?


DF1DX
Joined: 14 Aug 10
Posts: 105
Credit: 3,666,070,617
RAC: 4,372,187

WattMan (19.3.2, Win 7) on host 6742381:

Power limit: -20%

Max GPU: 1750 MHz @ 1025 mV

Max Mem: 1000 MHz

Fan:

50° - 8%

72° - 32%

94° - 44%

104° - 50%

1 CPU task (Universe BHspin) with 2 FGRP WUs running concurrently, t ~ 375-380 s.

One or two driver resets per day.  Each needs a BOINC restart.

 

archae86
Joined: 6 Dec 05
Posts: 3,157
Credit: 7,181,784,931
RAC: 738,106

I backed down my Power limit from -20 to -15% a few days ago after my second unrequested system reboot (aka crash).

It may be coincidence, but I have not had another crash since.  I am burning more power, and in this range there is not a productivity improvement to speak of, so some day I may explore the power limit question again.  I suspect there may be some sample-to-sample variation in the actual behavior of cards of this type in this respect.  It seems likely there is a reason that AMD limited the power limit reduction to -20%--quite likely they saw trouble somewhere not far below that point, as I think for many other cards they support more reduction.

I've not been seeing driver resets.

I've not tried any interventions on either clock.
