All things Navi 10

Tom M
Joined: 2 Feb 06
Posts: 5658
Credit: 7740359726
RAC: 2562876

Richie wrote:

Have you already tried AMD driver version 20.11.1 that was released a couple of days ago?

No. I think I have the latest enterprise content creator driver installed. I am also testing a different #3 GPU card with riser hardware. Depending on the result, I may revert to two cards on the motherboard or start troubleshooting card #4.

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)

Tom M

Richie wrote:

Have you already tried AMD driver version 20.11.1 that was released a couple of days ago?

No, I haven't. I am running the latest release of the Enterprise Content Creator driver (I think). So, as I work through the list of possible problems, that should certainly go on my TTT (Things To Try) list.

I am currently trying a different #3 R5700 GPU with different riser card hardware. If that doesn't fix it, I will drop back to two GPUs plugged directly into the motherboard and see if that clears things up.

The motherboard may not want to run three or more GPUs. I also have a GPU that was previously flashed to the XT version of the BIOS (I have since flashed it back) that might be unhappy. And/or it could be something to do with my riser hardware.

Testing, one, two, three...  ;)

Tom M

----edit-------

Just got another invalid. Going to try running two GPUs, and if that's clean, swap out the riser hardware on one card and add it back in.

---------edit----

Tom M

Tom M wrote:

Richie wrote:

Have you already tried AMD driver version 20.11.1 that was released a couple of days ago?

---edit-------

Just got another invalid. Going to try running two GPUs, and if that's clean, swap out the riser hardware on one card and add it back in.

---------edit----

Even with just the GPUs plugged directly into the motherboard, I am getting a very small number of "precision" invalids, which makes it seem inherent :( Maybe it isn't possible to avoid them?

Tom M

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7059464931
RAC: 1206156

Tom M wrote:
I am getting a very small number of "precision" invalids, which makes it seem inherent :( Maybe it isn't possible to avoid them?

In recent years my natural rate of invalids on Einstein GRP GPU tasks has been broadly around 1%. It seems to ebb and flow a bit, perhaps as low as 0.5% and perhaps as high as 3%. I don't think these are actual "mistakes" by the GPU. Rather, different implementations handle format conversions internally in slightly different ways, which produces slight numerical differences in the results, and the validator sometimes judges those differences too big.
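To illustrate the kind of discrepancy described above, here is a minimal Python sketch, assuming two hosts that add up the same numbers in different ways. `naive_sum`, `results_agree`, and the tolerance values are hypothetical and are not Einstein@Home's actual validator logic; real GPU apps use their own precision and the validator its own thresholds. It only shows that mathematically identical work can differ in the last bits, and that whether a validator accepts the pair depends on how tight its tolerance is.

```python
import math

def naive_sum(values):
    """Left-to-right accumulation, like a simple sequential loop."""
    total = 0.0
    for v in values:
        total += v  # each addition rounds to the nearest double
    return total

def results_agree(a, b, rel_tol):
    """Hypothetical validator check: accept if the relative difference is within rel_tol."""
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=0.0)

values = [0.1] * 10

host_a = naive_sum(values)   # one "implementation": sequential accumulation
host_b = math.fsum(values)   # another: exactly rounded summation

print(host_a == host_b)                      # False: results differ in the last bit
print(results_agree(host_a, host_b, 1e-12))  # True: a loose tolerance accepts them
print(results_agree(host_a, host_b, 1e-17))  # False: an overly strict tolerance rejects them
```

Neither result is a "mistake"; they are just rounded differently, which is the situation a tolerance-based comparison has to allow for.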

However, on Einstein GW tasks I don't see this. The 5700 card I currently run on GW shows 696 valid and 0 invalid for the tasks currently retained in the task list, and I don't think I've seen more than one invalid in the time I've been running it.

At this particular moment, my single-5700 machine running GRP displays 16 invalid against 1318 valid, so a bit over 1%, and my dual-5700 machine running GRP displays 27 invalid against 1978 valid.
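For anyone who wants to track their own rate, here is a small Python sketch of the arithmetic using the counts quoted above. It assumes the rate is invalid / (invalid + valid); dividing by the valid count alone (16/1318) gives about 1.21% instead of 1.20%, so either way it is "a bit over 1%". The counts are a snapshot of the retained task lists, not a long-run average, and `invalid_rate` is just an illustrative helper.

```python
def invalid_rate(invalid, valid):
    """Share of invalid results among all validated results (invalid + valid)."""
    return invalid / (invalid + valid)

# Counts quoted in this thread (a snapshot of the task lists at one moment).
machines = {
    "single 5700, GRP": (16, 1318),
    "dual 5700, GRP": (27, 1978),
    "5700, GW": (0, 696),
}

for name, (invalid, valid) in machines.items():
    print(f"{name}: {invalid_rate(invalid, valid):.2%}")
```

This prints 1.20%, 1.35% and 0.00% for the three machines.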

The worst I can recall was my Radeon VII, which I think ran up close to 3% invalid on GRP. Whatever the actual number, it was quite clearly higher than that of my RX 570 cards running the same work at the same time.

Tom M

Thank you. I was just looking at your Windows box with two R5700s to see whether it was having comparable "precision" errors, since we are running similar GPUs and operating systems, and I had concluded that I would not be able to drive my rate any lower.

It was good of you to quantify my guess.

Thank you.

Tom M

Tom M

Note to self: when running R5700s under Windows 10, do NOT use the latest GPU drivers from AMD. Go back to a driver from around January 2020. It will save you weeks of trying to figure out why the system keeps crashing.

Tom M

archae86

I run three systems with 5700 graphics cards running under Windows. All spend most of their time running Einstein tasks. I am rather haphazard about when I install updated drivers for the cards. At the moment the system that I am typing this note on has an August 2020 driver version 20.8.1 installed.  That one seems to run OK on this machine for my workload.

It might help someone else if you mention specific driver versions for which you have had trouble and for which you have had success.

Tom M

archae86 wrote:

I run three systems with 5700 graphics cards running under Windows. All spend most of their time running Einstein tasks. I am rather haphazard about when I install updated drivers for the cards. At the moment the system that I am typing this note on has an August 2020 driver version 20.8.1 installed.  That one seems to run OK on this machine for my workload.

It might help someone else if you mention specific driver versions for which you have had trouble and for which you have had success.

My failures all seem to revolve around using either the Windows-based AMD program that figures out which driver to install (Auto-Detect and Install) and then offers to install it (which usually means the latest, or nearly the latest, available driver), or the enterprise content driver, both the version that was the latest and the version before it.

The current gaming release is: Adrenalin 2020 Edition 20.9.1 (kernel crashes, reboot)

The current enterprise content release is: 20.Q4 (untested; the previous version had kernel crashes and system reboots)

The one I am running successfully is: Adrenalin 2020 Edition 20.1.3 which is dated 1/29/2020

So it is likely that if you stay with the 20.8.x or somewhat earlier gaming drivers, you will not experience the crash/reboot issue I did.
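Since comparisons like "20.8.x or earlier" come up a lot, here is a minimal Python sketch of ordering Adrenalin-style version strings numerically rather than as text. `parse_version` and the `reported` table are hypothetical; the table only records the outcomes mentioned in this thread for these particular machines and workloads, not a general compatibility list. Enterprise releases named like 20.Q4 use a different scheme and would not parse with this helper.

```python
def parse_version(text):
    """Turn a dotted driver version such as '20.9.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

# Outcomes as reported in this thread (one user's machines and workload each).
reported = {
    "20.1.3": "ok (Tom M)",
    "20.8.1": "ok (archae86)",
    "20.9.1": "kernel crash / reboot (Tom M)",
}

# Plain string comparison mis-orders versions once a component reaches two digits.
print("20.10.1" < "20.9.1")                                # True (wrong order)
print(parse_version("20.10.1") < parse_version("20.9.1"))  # False (correct order)

# Newest version in the table that was reported as working.
newest_ok = max(
    (v for v, outcome in reported.items() if outcome.startswith("ok")),
    key=parse_version,
)
print(newest_ok)  # 20.8.1
```

Splitting on dots and comparing integer tuples is what keeps 20.10.1 after 20.9.1.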

Tom M
Tom M

It now looks like there is a broken dependency across "all" AMD GPU driver installs for both Linux 20 and 18 (I tested installs for the RX 580 and R5700, and all versions of Linux 20 for the R5700).

I keep getting a message like this: "WARNING: amdgpu dkms failed for running kernel.."

Tom M
Ian&Steve C.
Joined: 19 Jan 20
Posts: 3714
Credit: 34661999747
RAC: 27450189

You must be doing something wrong. There are several AMD systems in the top 50 that are running Ubuntu 18/19/20.

It might be a good idea to list your EXACT install process (start to finish, without omitting any steps) so that others who have successful installs can try to see what you're doing wrong.
