All things Linux

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3939
Credit: 46500742642
RAC: 64032564

FYI, just out of curiosity, I manually installed the 6.0.0 kernel on one of my Ubuntu 22.04 systems (the one mainly running GPUGRID).

But even though it booted the OS fine with the 6.0 kernel, the major roadblock was that the Nvidia drivers refused to install with the new kernel. So that was a showstopper for me, and likely for most others.

_________________________________________________________________________

GWGeorge007
Joined: 8 Jan 18
Posts: 3060
Credit: 4959054353
RAC: 1348103

Thanks for trying it, Ian. I was wondering whether it would work.

George

Proud member of the Old Farts Association

JohnDK
Joined: 25 Jun 10
Posts: 116
Credit: 2550940478
RAC: 2317485

Ian&Steve C. wrote:

FYI, just out of curiosity, I manually installed the 6.0.0 kernel on one of my Ubuntu 22.04 systems (the one mainly running GPUGRID).

But even though it booted the OS fine with the 6.0 kernel, the major roadblock was that the Nvidia drivers refused to install with the new kernel. So that was a showstopper for me, and likely for most others.

Was it with the download drivers or with the repository drivers (or both)?

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3939
Credit: 46500742642
RAC: 64032564

JohnDK wrote:

Ian&Steve C. wrote:

FYI, just out of curiosity, I manually installed the 6.0.0 kernel on one of my Ubuntu 22.04 systems (the one mainly running GPUGRID).

But even though it booted the OS fine with the 6.0 kernel, the major roadblock was that the Nvidia drivers refused to install with the new kernel. So that was a showstopper for me, and likely for most others.

Was it with the download drivers or with the repository drivers (or both)?

Both.

_________________________________________________________________________

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3939
Credit: 46500742642
RAC: 64032564

Also, a warning: beware of Nvidia driver 515.76 (currently the latest). It causes a black screen at boot and leaves the system unresponsive (can't SSH in, can't switch to an alternate TTY shell). It seems like many others have had the same issue.

After my attempts to install Nvidia drivers on kernel 6.0, and after I reverted to the normal 5.15 kernel, I figured I would install the latest Nvidia driver, 515.76, from Nvidia's .run file. But it booted to a black screen, and the system was unresponsive and had to be hard reset with the power button. I recovered the system by booting into recovery mode from the GRUB menu, manually removing the 515.76 driver, and then reinstalling the 515.65.01 driver, which worked fine and brought the system back to normal.

Some reports say it works fine, but there are enough reports that it doesn't, and Nvidia devs have replicated the issue and created a bug report for it, so I would just avoid this version until they release a new one.
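If you script driver installs across several hosts, a small guard can refuse a release that has been reported as bad. This is a minimal Python sketch, assuming Nvidia's usual dotted version strings (e.g. `515.65.01`); `KNOWN_BAD`, `parse_version`, and `is_known_bad` are hypothetical names, and the set contains only the 515.76 release discussed here.

```python
# Hypothetical guard against installing a driver release reported as bad.
# KNOWN_BAD holds only what this thread reports; extend it as needed.
KNOWN_BAD = {(515, 76)}


def parse_version(version: str) -> tuple[int, ...]:
    """Convert a dotted version like '515.65.01' into (515, 65, 1)."""
    return tuple(int(part) for part in version.strip().split("."))


def is_known_bad(version: str) -> bool:
    """True if the version exactly matches a release listed in KNOWN_BAD."""
    return parse_version(version) in KNOWN_BAD


if __name__ == "__main__":
    for candidate in ("515.76", "515.65.01"):
        verdict = "avoid" if is_known_bad(candidate) else "no known issue"
        print(f"{candidate}: {verdict}")
```

Comparing parsed integer tuples rather than raw strings keeps `515.65.01` and `515.65.1` equal and avoids string-ordering surprises.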

_________________________________________________________________________

JohnDK
Joined: 25 Jun 10
Posts: 116
Credit: 2550940478
RAC: 2317485

I will just stay with 510.* until there's a reason to update.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3939
Credit: 46500742642
RAC: 64032564

I got the Linux 6.0 kernel and Nvidia issue sorted, kinda.

It's a multifaceted problem with the drivers. You NEED Nvidia driver 515.76 for Linux kernel 6.0+ (515.65.01 won't compile without editing some stuff and doing some hacky workarounds that I'm not comfortable doing at the moment).

BUT 515.76 also has some serious bugs in certain cases: the combination of this driver with an RTX 30-series card and an HDMI monitor (maybe some monitors are OK, but these are the most common cases from my research). So you're stuck between a rock and a hard place if you want to run kernel 6.0 with Nvidia.

You will get a black screen with:
Nvidia driver 515.76 installed (which I have)
an RTX 30-series GPU (which I'm using)
HDMI for the main display (which I'm using)

If you boot this way, the system is unresponsive after the OS boots, and the problem seems to center on initializing the HDMI connection on RTX 30-series cards. An easy-enough workaround I found is to boot the OS with the HDMI cable disconnected, then reconnect it after the OS loads.

I don't have a spare DP monitor or DP-to-DVI adapter (the monitor on this system is old and only has VGA/DVI/HDMI). I think if you use DP or anything other than HDMI, you probably won't have this issue. It may also be OK if you're not using an RTX 30-series card, but that's based on other comments I saw online; I never tried it.

Nvidia is aware of the problem and has apparently replicated it, so I guess the next stable driver will fix this issue.

Right now this system only runs GPUGRID Python tasks, and my main motivation for trying kernel 6.0 was to see whether the patch for the AMD CPU issue made any difference to the higher CPU utilization I've seen on AMD Zen vs. Intel CPUs with this app. Spoiler alert: it doesn't. CPU use is exactly the same, but I'll keep an eye out to see whether the tasks run any faster.

See that system here: https://gpugrid.net/show_host_detail.php?hostid=582493
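The kernel, driver, GPU, and display conditions described in this post can be condensed into a small checklist. This is a minimal Python sketch, not a diagnostic tool: the `Setup` fields and labels such as `"RTX 30"` and `"HDMI"` are illustrative assumptions, and treating every release from 515.76 up as buildable on kernel 6.0 goes beyond what was actually tested here.

```python
from dataclasses import dataclass


@dataclass
class Setup:
    kernel: tuple[int, int]  # (major, minor), e.g. (6, 0) or (5, 15)
    driver: str              # Nvidia driver version, e.g. "515.76"
    gpu_series: str          # illustrative label, e.g. "RTX 30"
    main_display: str        # connector for the main display: "HDMI", "DP", ...


def parse_version(version: str) -> tuple[int, ...]:
    """Convert '515.65.01' into (515, 65, 1) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def driver_builds(setup: Setup) -> bool:
    """Kernel 6.0+ needs 515.76 per the reports above.

    Assumption: later releases also build; older kernels are not modeled.
    """
    if setup.kernel >= (6, 0):
        return parse_version(setup.driver) >= (515, 76)
    return True


def black_screen_risk(setup: Setup) -> bool:
    """All three reported conditions must hold at the same time."""
    return (
        setup.driver == "515.76"
        and setup.gpu_series == "RTX 30"
        and setup.main_display == "HDMI"
    )


def advice(setup: Setup) -> str:
    if not driver_builds(setup):
        return "driver will not build on this kernel"
    if black_screen_risk(setup):
        return "black-screen risk: boot with HDMI unplugged, reconnect after the OS loads"
    return "no issue reported in this thread"


if __name__ == "__main__":
    print(advice(Setup((6, 0), "515.65.01", "RTX 30", "HDMI")))
    print(advice(Setup((6, 0), "515.76", "RTX 30", "HDMI")))
    print(advice(Setup((6, 0), "515.76", "RTX 30", "DP")))
```

The build check runs first because a driver that won't compile makes the display question moot.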

_________________________________________________________________________

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3939
Credit: 46500742642
RAC: 64032564

Oh, and Zenmonitor doesn't work on kernel 6.0, so there's that too. That means you won't get the nicer temp monitoring for AMD.

_________________________________________________________________________

Keith Myers
Joined: 11 Feb 11
Posts: 4963
Credit: 18687463644
RAC: 6044329

Ian&Steve C. wrote:

Oh, and Zenmonitor doesn't work on kernel 6.0, so there's that too. That means you won't get the nicer temp monitoring for AMD.

That's not unexpected, I guess. But you should still be able to get temp monitoring working with the stock k10temp driver, which was blacklisted for the zenpower installation.

If you un-blacklist the k10temp driver and get it active again, it should provide some basic temp monitoring on Zen CPUs.
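Blacklisting is normally done with a `blacklist <module>` line in a file under `/etc/modprobe.d/`. Which file the zenpower setup wrote isn't stated here, so this minimal Python sketch (hypothetical helper names) only strips the matching line from a config file's text. Writing the file needs root; afterwards `sudo modprobe k10temp` or a reboot should load the driver, and you may need to unload zenpower first with `sudo modprobe -r zenpower`, since it was installed in k10temp's place.

```python
import re
from pathlib import Path


def unblacklist(conf_text: str, module: str = "k10temp") -> str:
    """Return conf_text without any 'blacklist <module>' lines.

    Other lines (comments, other blacklisted modules) are kept unchanged.
    """
    pattern = re.compile(rf"^\s*blacklist\s+{re.escape(module)}\s*$")
    lines = conf_text.splitlines(keepends=True)
    return "".join(line for line in lines if not pattern.match(line))


def unblacklist_file(path: Path, module: str = "k10temp") -> bool:
    """Rewrite a modprobe.d file in place; return True if anything was removed.

    The path is up to you: whichever file under /etc/modprobe.d/
    contains the blacklist line (writing it needs root).
    """
    original = path.read_text()
    updated = unblacklist(original, module)
    if updated != original:
        path.write_text(updated)
        return True
    return False
```

For example, `unblacklist("blacklist k10temp\nblacklist nouveau\n")` returns `"blacklist nouveau\n"`, leaving unrelated blacklist entries alone.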

You also lose power monitoring with zenpower inactive, and the stock Linux drivers no longer provide power monitoring through the RAPL interface either.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3939
Credit: 46500742642
RAC: 64032564

I saw your post on the zenpower GitHub and thought you had installed 6.0 yourself. Did you just post it on my behalf?

But yeah, I know I should be able to get some basic temp monitoring back with k10temp. I just haven't bothered with it yet. Losing power measurements isn't a big loss; it's not something you really have to keep an eye on, since it's all managed by the hardware/BIOS anyway.
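Once k10temp is loaded, its readings show up under the kernel's hwmon sysfs interface, so even without Zenmonitor a few lines of Python can read them. A minimal sketch, assuming the standard hwmon layout (`/sys/class/hwmon/hwmonN/name`, `tempN_input` in millidegrees Celsius, optional `tempN_label`); the function name is hypothetical. The `sensors` command from lm-sensors reads the same data if you'd rather not script it.

```python
from pathlib import Path


def read_hwmon_temps(driver: str = "k10temp",
                     root: Path = Path("/sys/class/hwmon")) -> dict[str, float]:
    """Return {label: degrees C} for every hwmon device whose name matches driver."""
    temps: dict[str, float] = {}
    for dev in sorted(root.glob("hwmon*")):
        name_file = dev / "name"
        if not name_file.is_file() or name_file.read_text().strip() != driver:
            continue
        for inp in sorted(dev.glob("temp*_input")):
            prefix = inp.name[: -len("_input")]  # e.g. "temp1"
            label_file = dev / f"{prefix}_label"
            label = label_file.read_text().strip() if label_file.is_file() else prefix
            # hwmon reports temperatures in millidegrees Celsius
            temps[label] = int(inp.read_text().strip()) / 1000.0
    return temps


if __name__ == "__main__":
    print(read_hwmon_temps() or "no k10temp sensor found (module not loaded?)")
```

Matching on the `name` file rather than a fixed `hwmonN` index matters because the numbering can change between boots.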

_________________________________________________________________________
