Troubleshooting Ubuntu 20 and a fresh install of AMD drivers

Keith Myers
Joined: 11 Feb 11
Posts: 4704
Credit: 17548446266
RAC: 6434271


I've never had a Linux host suspend on me unasked, unless provoked by a power outage.

I run 24/7 with the performance governor.

I have the Suspend & Power Button option in Power Settings set to suspend after 20 minutes when on battery (UPS), to keep from discharging the battery too far, which would damage it.

I have not enabled suspend when plugged in. Simple setup.
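For anyone who wants to confirm which governor a host is actually running, here is a minimal sketch (an illustration, not Keith's own setup) that reads the cpufreq files in sysfs. It assumes the usual /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor layout; the function name report_governors is made up for this example.

```shell
#!/bin/sh
# Print "cpuN <governor>" for every CPU that exposes a cpufreq governor.
# Assumes the standard sysfs layout; the base directory can be passed as $1
# (handy for testing), otherwise /sys/devices/system/cpu is used.
report_governors() {
    base="${1:-/sys/devices/system/cpu}"
    for gov_file in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip the literal pattern if nothing matched, or unreadable files.
        [ -r "$gov_file" ] || continue
        cpu=$(basename "$(dirname "$(dirname "$gov_file")")")
        printf '%s %s\n' "$cpu" "$(cat "$gov_file")"
    done
}

report_governors "$@"
```

On Ubuntu the governor can be changed with 'sudo cpupower frequency-set -g performance' (cpupower ships in the linux-tools packages), though that setting may not persist across reboots on its own.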

 

cecht
Joined: 7 Mar 18
Posts: 1421
Credit: 2445716274
RAC: 1497909


I just came out of a rabbit hole that I dropped into after a botched upgrade of Ubuntu and AMDGPU drivers. (Basically, I failed to follow the good advice in this thread.) None of the discussions and fixes described here, in "Linux kernel 5.10 + AMDGPU + Radeon 20.45 = Frequent Gnome Crashes", or in "A quick guide: How to install OpenCL for AMD GPUs on Linux Kubuntu 18.04 (and similar distro)" covered the problem I had. Regardless of the combination of kernels, AMDGPU versions, and successful AMDGPU removal and installation options, the drivers would not recognize my GPUs. I finally found a solution on a cryptominer's forum. It turns out that somewhere in my fumbled upgrade and recovery attempts, the amdgpu kernel module had been blacklisted, which prevented amdgpu from loading.

The fix: check whether amdgpu is blacklisted; if so, delete the file and reboot.

$ ls /etc/modprobe.d/
alsa-base.conf blacklist-ath_pci.conf blacklist-framebuffer.conf blacklist-rare-network.conf iwlwifi.conf amd64-microcode-blacklist.conf blacklist.conf blacklist-modem.conf blacklist-amdgpu.conf dkms.conf blacklist-firewire.conf blacklist-oss.conf intel-microcode-blacklist.conf
$ cat /etc/modprobe.d/blacklist-amdgpu.conf
blacklist amdgpu
$ sudo rm /etc/modprobe.d/blacklist-amdgpu.conf
$ reboot
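Because the blacklist entry could sit in a file with a different name, a slightly broader check is to search every .conf file in /etc/modprobe.d. The sketch below does that with grep; find_amdgpu_blacklist is a hypothetical helper name, and it only catches plain 'blacklist amdgpu' lines, not other ways of disabling the module (for example a modprobe.blacklist=amdgpu entry on the kernel command line, which would show up in /proc/cmdline).

```shell
#!/bin/sh
# List modprobe config files that blacklist the amdgpu kernel module.
# Matches uncommented "blacklist amdgpu" lines only (not e.g. amdgpu_foo).
# The directory defaults to /etc/modprobe.d but can be passed as $1.
find_amdgpu_blacklist() {
    dir="${1:-/etc/modprobe.d}"
    grep -lE '^[[:space:]]*blacklist[[:space:]]+amdgpu([[:space:]]|$)' \
        "$dir"/*.conf 2>/dev/null
}

if files=$(find_amdgpu_blacklist "$@"); then
    echo "amdgpu is blacklisted in:"
    echo "$files"
else
    echo "No amdgpu blacklist entry found."
fi
```

After deleting the offending file, it may also be worth running 'sudo update-initramfs -u' before rebooting, since the initramfs can carry its own copy of the modprobe configuration. Once rebooted, 'lsmod | grep amdgpu' should list the module if it loaded.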

I'm now running Ubuntu 20.04.3 with a 5.4 kernel and just the opencl=legacy component of AMDGPU 20.10, which is where I had (desperately) wound things back to before discovering the blacklist fix. Now to decide whether to upgrade things once again or let sleeping dogs lie. My other host is running fine with Ubuntu 20.04.2, a 5.11 kernel, and the opencl=rocr component of AMDGPU 20.10.

Ideas are not fixed, nor should they be; we live in model-dependent reality.

cecht
Joined: 7 Mar 18
Posts: 1421
Credit: 2445716274
RAC: 1497909


After an unfortunate excursion trying to upgrade to Ubuntu 22.04 with the latest AMDGPU 22.1 drivers (and everything in between), I followed the suggestions here, with the following differences, and succeeded in getting my RX 5600 XT crunching again.

I installed Ubuntu 20.04.1 from the UEFI USB drive boot option (not directly from USB), without opting to install updates during installation (Ethernet was left plugged in).

The AMDGPU 21.10 package wouldn't install without Ubuntu updates (because of broken packages), so I first held back kernel upgrades with:

sudo apt-mark hold linux-generic-hwe-20.04 linux-image-generic-hwe-20.04 linux-headers-generic-hwe-20.04

...THEN updated Ubuntu packages:
sudo apt update && sudo apt upgrade
(Note that this will update Ubuntu to 20.04.5, but not touch the kernel image.)
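Before running the upgrade, it may be worth confirming that the holds actually took. 'apt-mark showhold' prints held packages one per line; the hypothetical missing_holds function below compares that list against the three metapackages held above and prints any that are missing.

```shell
#!/bin/sh
# The HWE kernel metapackages held in the step above.
REQUIRED_HOLDS="linux-generic-hwe-20.04 linux-image-generic-hwe-20.04 linux-headers-generic-hwe-20.04"

# Reads a held-package list (one name per line, as printed by
# "apt-mark showhold") on stdin and prints each required package that is
# not in it. Returns 0 only if all of them are held.
missing_holds() {
    held=$(cat)
    status=0
    for pkg in $REQUIRED_HOLDS; do
        if ! printf '%s\n' "$held" | grep -qxF "$pkg"; then
            echo "$pkg"
            status=1
        fi
    done
    return $status
}

if command -v apt-mark >/dev/null 2>&1; then
    if apt-mark showhold | missing_holds; then
        echo "All kernel metapackages are held."
    else
        echo "The packages listed above are NOT held."
    fi
fi
```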

If a prior attempt to install AMD drivers failed, uninstall them by running the script directly from /usr/bin:
/usr/bin/amdgpu-uninstall

...then from within the amdgpu-pro-21.10-1247438-ubuntu-20.04 directory, run the installation:

./amdgpu-install -y --opencl=rocr,legacy

    This ends by building the initial module for the original kernel, 5.4.0-42-generic.
    Note that the --headless and --no-dkms options were not used.

...then add the user to the video and render groups:
~$ sudo usermod -a -G video $LOGNAME
~$ sudo usermod -a -G render $LOGNAME
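To confirm the group change, 'id -nG' lists the groups of the current process. The sketch below (missing_groups is a made-up helper name) reports whether video or render is absent. Note that id without a username reflects the current login session, so new memberships only show up after logging out and back in, or after rebooting.

```shell
#!/bin/sh
# Prints each of "video" and "render" that is missing from a
# space-separated group list; returns 0 only if both are present.
missing_groups() {
    have=" $1 "
    status=0
    for grp in video render; do
        case "$have" in
            *" $grp "*) ;;                   # group present
            *) echo "$grp"; status=1 ;;      # group missing
        esac
    done
    return $status
}

if missing=$(missing_groups "$(id -nG)"); then
    echo "Current session is in the video and render groups."
else
    echo "Missing groups (log out and back in if usermod was just run):" $missing
fi
```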

I installed BOINC 7.16.6+dfsg-1 from the Ubuntu Software app (or you could use '$ snap install boinc').

Then reboot.

Crunching went fine for a few hours, but then threw a couple of computation errors and stalled on a couple of FGRPBG1 tasks, so I rebooted the system. I don't know whether the next step was necessary, but before resuming BOINC, I installed rickslab gpu-utils from GitHub, followed its documentation for setup, then used 'gpu-pac --execute' to underclock the card's sclk to 1750 MHz (from the default 1780 MHz) and set the power profile mode (PPM) to Compute. Everything is running smoothly so far...

One thing I noticed in this round of driver installation hell, which I missed in all prior attempts over the years, is that the amdgpu installation builds its module and new image against the newest installed Linux kernel, NOT the currently running kernel or the GRUB default kernel. This is why one of the external links referenced earlier says to remove newer (incompatible) kernel versions before doing an installation.
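Given that behavior, a quick pre-install check is to compare the running kernel with the newest one installed. This sketch assumes Ubuntu's layout of one /lib/modules/<version> directory per installed kernel and relies on GNU 'sort -V' for version ordering; newest_kernel is a hypothetical helper name. Leftover directories from removed kernels can also appear under /lib/modules, so the package list from dpkg -l "linux-image-*" is a useful cross-check.

```shell
#!/bin/sh
# Print the highest kernel version that has a module tree installed.
# The directory defaults to /lib/modules but can be passed as $1.
newest_kernel() {
    ls -1 "${1:-/lib/modules}" 2>/dev/null | sort -V | tail -n 1
}

running=$(uname -r)
newest=$(newest_kernel)
echo "Running kernel:          $running"
echo "Newest installed kernel: $newest"
if [ "$newest" != "$running" ]; then
    # The installer was observed to target the newest kernel, not the running one.
    echo "Warning: the AMD installer may build for $newest, not $running."
fi
```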

 


GWGeorge007
Joined: 8 Jan 18
Posts: 2769
Credit: 4542317617
RAC: 2175804


CECHT,

I'm a bit befuddled... why didn't you want to upgrade to Ubuntu 22.04 with the newest Linux kernel module?

I was a bit perplexed following your line of reasoning.  Maybe I missed something.

George

Proud member of the Old Farts Association

cecht
Joined: 7 Mar 18
Posts: 1421
Credit: 2445716274
RAC: 1497909


Sorry for the confusion. I did want to, and actually did, do the upgrade (and quite liked the new Ubuntu interface). Crunching worked for a short while, but after a system reboot I got nothing but computation errors with my RX 5600 on gamma-ray tasks. I subsequently tried many different combinations of driver and system install options, but nothing worked until I rolled everything all the way back to the earlier kernel and driver package.

