Troubleshooting Ubuntu 20 and a fresh install of AMD drivers

Keith Myers
Joined: 11 Feb 11
Posts: 4700
Credit: 17544086481
RAC: 6398591

Tom M wrote:

Ian&Steve C. wrote:

I don’t know what you mean by “install with 5.4 kernel”. That’s what it comes with. You have to enable extra stuff to get a later kernel in there. Look at all the 20.04 systems on the leaderboard. They all have 5.4

My version installs 5.8, which doesn't get along with the AMD driver for that version of Ubuntu.

Tom M

It can't if you install with the standard Ubuntu 20.04 LTS. Don't allow updating during installation. Start with a blank slate, no previous installation.

 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3681
Credit: 33825827084
RAC: 37777767

As Keith said, you had to have enabled HWE kernels to get 5.8 on there. I like to install Ubuntu as a Minimal install and do not do updates while installing. This should leave you with 5.4. The HWE kernel usually doesn't get pushed to LTS until the second point release (20.04.2), but it looks like the updates during install are pushing it while still on 20.04.1. I just confirmed this by installing 20.04.1 in a brand new VM: with updates allowed during the install, it left me with 5.8.
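To see which kernel series a finished install actually landed on, something like the following could be run in a terminal. It is only a minimal sketch: `kernel_series` is a hypothetical helper name, and it assumes `uname -r` prints a release string of the usual Ubuntu form, such as 5.4.0-58-generic.

```shell
#!/bin/sh
# Print the major.minor series of a kernel release string such as
# "5.4.0-58-generic" (the form `uname -r` prints on Ubuntu).
kernel_series() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Report whether the running kernel is the stock 20.04 series (5.4).
running="$(uname -r)"
if [ "$(kernel_series "$running")" = "5.4" ]; then
    echo "Running kernel $running: stock 5.4 series"
else
    echo "Running kernel $running: not 5.4 (possibly an HWE kernel)"
fi
```

If this reports anything other than 5.4 on a fresh 20.04 install, updates were most likely applied during installation.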

 

Also, make sure you're not getting bitten by some Secure Boot nonsense. Go into your BIOS, look for the Secure Boot settings, and make sure the OS selection is set to "Other OS" or whatever option is not Windows. Or turn Secure Boot OFF altogether. Do this before installing the OS.
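On a system that is already installed, the Secure Boot state can usually be read back without rebooting into the BIOS. The sketch below assumes the `mokutil` tool is present (it normally is on Ubuntu) and that `mokutil --sb-state` prints a line containing "SecureBoot enabled" or "SecureBoot disabled"; `secure_boot_status` is a hypothetical helper name.

```shell
#!/bin/sh
# Interpret the status line that `mokutil --sb-state` prints,
# e.g. "SecureBoot enabled" or "SecureBoot disabled".
secure_boot_status() {
    case "$1" in
        *"SecureBoot enabled"*)  echo "on" ;;
        *"SecureBoot disabled"*) echo "off" ;;
        *)                       echo "unknown" ;;
    esac
}

if command -v mokutil >/dev/null 2>&1; then
    # mokutil can fail on non-UEFI boots; treat that as "unknown".
    state="$(mokutil --sb-state 2>/dev/null || true)"
    echo "Secure Boot is $(secure_boot_status "$state")"
else
    echo "mokutil not found; check the firmware setup screen instead"
fi
```

An "unknown" result typically just means the machine was booted in legacy (non-UEFI) mode, where Secure Boot does not apply.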

_________________________________________________________________________

Keith Myers
Joined: 11 Feb 11
Posts: 4700
Credit: 17544086481
RAC: 6398591

Everything Ian said.

Double second on turning off Secure Boot in the BIOS.  Usually named "Other OS" or Legacy.

 

Tom M
Joined: 2 Feb 06
Posts: 5586
Credit: 7673572846
RAC: 1764075

Keith Myers wrote:

Everything Ian said.

Double second on turning off Secure Boot in the BIOS.  Usually named "Other OS" or Legacy.

BINGO!

Also, even if you turn off everything but the security updates, it still tries to add kernel 5.8, which defeats the whole purpose of installing without any upgrades.

1) I have tried any number of workarounds, and even outright command lines, for deleting kernel 5.8. One of my last attempts deleted 5.8 and then reinstalled a reduced version of the same 5.8.

2) In Ubuntu version 18.x I had discovered I needed to do a full update before the manual install of the Nvidia drivers would work.  This carried over to Ubuntu 20 where it promptly shot me in the foot!

Thank you Ian&Steve and Keith for your attention to detail and clues that got us this far.

RECAP!

If you don't allow Ubuntu 20 to perform ANY updates during a clean install, it will only install kernel 5.4.

Then don't allow any updates before you get the AMD GPU drivers installed! I am uncertain if anything will break with a kernel update afterwards.

===edit===

Right now I am not allowing any kernel updates. This requires manual intervention during each update cycle.  You have to uncheck the 5.8 kernel updates.

---added--

I just tried unchecking all the OS image and kernel 5.8 check boxes, and it started to download kernel 5.8 anyway, whereupon I canceled the entire update. While I will revisit trying to update anything but the OS, right now I am a bit gun-shy.

===end edit==============

Go forth and conquer your higher RAC goal(s).

Tom M

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)

Keith Myers
Joined: 11 Feb 11
Posts: 4700
Credit: 17544086481
RAC: 6398591

Just run:

sudo apt-mark hold linux-generic linux-image-generic linux-headers-generic

This will prevent you from updating your kernel any more after the initial install. You can also put any kernel updates in the blacklist section of the /etc/apt/apt.conf.d/50unattended-upgrades file.

 

// List of packages to not update (regexp are supported)
Unattended-Upgrade::Package-Blacklist {
    "linux-generic";
    "linux-image-generic";
    "linux-headers-generic";
};
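After placing the holds, it may be worth confirming that they took effect. The sketch below reads the output of `apt-mark showhold` and reports any of the three metapackages named above that is not held; `missing_holds` and `WANTED` are hypothetical names used only for this illustration. One assumption to check on your own system: an install that is already tracking the HWE stack may use differently named metapackages (for example linux-generic-hwe-20.04), in which case those would need to be held instead.

```shell
#!/bin/sh
# Metapackages named in the post above.
WANTED="linux-generic linux-image-generic linux-headers-generic"

# Read a list of held packages on stdin (one per line, as
# `apt-mark showhold` prints them) and print every wanted
# package that is missing from that list.
missing_holds() {
    held="$(cat)"
    for pkg in $WANTED; do
        printf '%s\n' "$held" | grep -qxF "$pkg" || printf '%s\n' "$pkg"
    done
}

if command -v apt-mark >/dev/null 2>&1; then
    missing="$(apt-mark showhold | missing_holds)"
    if [ -z "$missing" ]; then
        echo "All kernel metapackages are held"
    else
        echo "Not held:"
        echo "$missing"
    fi
fi
```

A hold can be reversed later with `sudo apt-mark unhold` followed by the same package names, once the drivers are known to work with a newer kernel.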

 

robl
Joined: 2 Jan 13
Posts: 1709
Credit: 1454481533
RAC: 8778

I used this procedure several years ago and it was successful.  I will probably be counting on it again as soon as I get my Ubuntu tower back.  Here is the link:

Be sure that your "location" has GPU WUs selected, and note whether GPU WUs are identified in BOINC Manager. Also note whether your GPU is properly identified in the GPU column on the E@H website where your computers are listed.

Keith Myers
Joined: 11 Feb 11
Posts: 4700
Credit: 17544086481
RAC: 6398591

That post is completely outdated and incorrect for the current drivers. In fact, it will get you into further trouble.

And you need current drivers to support the current hardware and operating systems.

 

Tom M
Joined: 2 Feb 06
Posts: 5586
Credit: 7673572846
RAC: 1764075

Keith Myers wrote:

Just run:

sudo apt-mark hold linux-generic linux-image-generic linux-headers-generic

This will prevent you from updating your kernel any more after the initial install. You can also put any kernel updates in the blacklist section of the /etc/apt/apt.conf.d/50unattended-upgrades file.

 

// List of packages to not update (regexp are supported)
Unattended-Upgrade::Package-Blacklist {
    "linux-generic";
    "linux-image-generic";
    "linux-headers-generic";
};

Thank you. After I re-installed Ubuntu 20 (again), I need to report that the "native" install of the Nvidia drivers produced a non-booting system.

However, prior to that experimental cycle, I had successfully installed the AMD drivers for the RX 580, followed by a successful install of the Nvidia drivers. Even so, my ability to boot 7 GPUs went away due to "out of memory" and failure-to-load-the-kernel issues, until I got down to 3 Nvidia cards and 2 RX 580 cards.

I expect to reinstall the OS and apply the changes listed above, then try running an update and installing the Nvidia drivers to see if that works. Otherwise, I may resort to the launchpad install process.

I am now pretty sure that the BOINC toolset is not adaptable to running heterogeneous collections of GPUs on the same instance of the same project unless every card is running a single task per GPU (lowest common denominator). You pretty much need GPUs that can run the same number of tasks per GPU.

I have acquired through a private sale 4 RX 570s and a problem child RX 570. They are "in the mail". I expect to swap out all or nearly all of my Nvidia cards for these cards. This will give me a chance at a 6 or 7 RX 570/580 rig, which should end up in the top 50 too.

Tom M


robl
Joined: 2 Jan 13
Posts: 1709
Credit: 1454481533
RAC: 8778

Keith Myers wrote:

That post is completely outdated and incorrect for the current drivers. Will get you into further troubles in fact.

And you need current drivers to support the current hardware and operating systems.

Like I said, I had not used this procedure in several years. But I am downloading drivers specific to my AMD GPU, an RX550 card. You might be correct about this procedure being outdated, but I am not clear on what you mean by "current drivers to support current hardware". The RX550 drivers are still available on the AMD website along with many others, and the RX550 has been in my PC for at least 10 years. Hopefully I will get my tower back with the RX550 GPU soon. Then I will give driver installation a go.

Instead of saying the procedure is "incorrect" maybe you could provide the correct steps for installing AMD drivers for a given GPU.  

Currently posted on the AMD website for the install of drivers for my AMD GPU RX550:  See here

Keith Myers
Joined: 11 Feb 11
Posts: 4700
Credit: 17544086481
RAC: 6398591

If you want to crunch, you need the OpenCL component of the drivers.  The PAL version of OpenCL is deprecated and replaced by the ROCr runtime version of OpenCL now.

So the command line parameter you posted will fail. In fact, look at Tom's install log at the beginning of the thread: he tried to use your command line parameters, and the driver installer warned that the parameter was deprecated and attempted to install the ROCr components instead.

It doesn't matter about the hardware. The current drivers handle all past and current hardware, even your older RX550.
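Whichever driver package ends up installed, a rough way to confirm that the OpenCL component is actually visible is to list platforms and devices with the `clinfo` utility. The sketch below is only an illustration: `count_opencl_devices` is a hypothetical helper, and it assumes that each device line in the `clinfo -l` listing contains the text "Device #".

```shell
#!/bin/sh
# Count device lines in a `clinfo -l` listing supplied on stdin.
# Assumes each device line contains the text "Device #".
count_opencl_devices() {
    # grep -c exits non-zero when the count is 0; keep going anyway.
    grep -c 'Device #' || true
}

if command -v clinfo >/dev/null 2>&1; then
    n="$(clinfo -l 2>/dev/null | count_opencl_devices)"
    echo "OpenCL devices visible: $n"
else
    echo "clinfo not installed (on Ubuntu: sudo apt install clinfo)"
fi
```

If the count is 0 after the driver install, BOINC will not see a usable GPU either, so this is worth checking before looking at project settings.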

 
