Radeon RX 7900 XTX Linux (Fedora) Boinc Doesn't Detect Usable Driver

Paul
Joined: 3 May 07
Posts: 123
Credit: 1780975087
RAC: 1304830
Topic 229373

Here we are again.  I must have been through this a few times before myself; I even see my comments about it in previous threads, but I still don't quite remember how to fix it.  I'll keep looking but wanted to post this here right away:

My RX 6800XT was working fine, then I pulled it out and replaced it with a 7900XTX, and now BOINC only detects the Clover/Mesa OpenCL implementation.  clinfo shows three platforms at the top, including the HSA/ROCm one I think it should use, but *doesn't* show the more detailed info for it below, which I expected to see.  And clpeak crashes.  BOINC tries to use the Mesa OpenCL 1.1, but all jobs finish immediately with an error.

I saw the thread about Ubuntu, but that turned out to be a problem that I'm sure is not my case, anyway.

It doesn't make sense that the driver was working for the 6800XT but doesn't for the 7900XTX, so I'm guessing it's something else.  I don't like what clinfo and clpeak are doing.

Does anyone know of problems with AMDGPU + OpenCL for the new 7900 series?

Mike
Joined: 26 Dec 20
Posts: 45
Credit: 5752463975
RAC: 7740708

For Ubuntu I'd reinstall the amdgpu driver.  For Fedora?

/usr/bin$ sudo ./amdgpu-uninstall

followed by a clean install of the correct version for your flavour of Fedora.

This is for the drivers on the amdgpu firmware support site, and it assumes your Fedora installation scheme isn't so far away from Ubuntu / Debian that the /usr/bin path is wrong.

Maybe you've been there and done that.

 

mountkidd
Joined: 14 Jun 12
Posts: 176
Credit: 12419912555
RAC: 8064590

Clover/Mesa OpenCL seems to be getting in the way and most likely won't work with the 7900XTX.  Check the contents of /etc/OpenCL/vendors/.  Files ending in .icd are candidates for OpenCL, and you may be able to guess the source from the filename.  You can cat each file to see which lib it points to, and it should be clear which file you probably should be using.  The amdgpu file is usually named amdocl64.icd; I don't know/remember if the rocr OpenCL one is named the same.  You can prevent an .icd file from being used by renaming it so it no longer ends in .icd.  After renaming, reboot to reset the driver.
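
The check described above (look at each .icd file and see which library it names) can be sketched in a few lines of Python. This is only an illustration: the function name list_icd_targets is made up for the example, and it assumes each .icd file holds the library name on its first non-empty line.

```python
from pathlib import Path


def list_icd_targets(vendors_dir="/etc/OpenCL/vendors"):
    """Map each *.icd file name to the library name or path it contains.

    Assumption: the first non-empty line of an .icd file names the library.
    Files renamed to something that no longer ends in .icd are skipped,
    just as the OpenCL loader would skip them.
    """
    targets = {}
    directory = Path(vendors_dir)
    if not directory.is_dir():
        return targets
    for icd in sorted(directory.glob("*.icd")):
        content = icd.read_text(errors="replace").strip()
        targets[icd.name] = content.splitlines()[0].strip() if content else ""
    return targets


if __name__ == "__main__":
    for name, library in list_icd_targets().items():
        print(f"{name} -> {library}")
```

Run as a script, it prints one "file -> library" line per active .icd file; an empty result means the directory is missing or holds no .icd files.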

If you need to reinstall the amdgpu OpenCL driver, download it, install the first downloaded part, then run  sudo amdgpu-install -y --usecase=opencl --opencl=rocr [--accept-eula].  The EULA shouldn't be needed for rocr, but you never know...

Paul
Joined: 3 May 07
Posts: 123
Credit: 1780975087
RAC: 1304830

Thanks all, but by AMDGPU I mean NOT AMDGPU-PRO, so that means no (extra) drivers and I didn't use amdgpu-install.  I can't.  But, it's not necessary and I'm sure of that.  I've been around this block many times.

But, good thinking; something like that is the right approach.  I think I should try to remove and reinstall AMD packages. I'll try that and get back to you.

It's true that the MESA OpenCL path doesn't work, and that is what BOINC is trying to use.  BUT!  That's not why it doesn't work.  It doesn't work because the ROCm driver is NOT working, even though it's installed and worked with my old card.

I like what you said about /etc/OpenCL/vendors.  But, it's not that I need to remove MESA OCL 1.1 from the list of usable ones so that it doesn't try that one.  It's that I need to figure out why the OpenCL provider I want to use isn't on the list, isn't detected, or isn't working.  clinfo shows this:

Platform Name                                   AMD Accelerated Parallel Processing
Platform Vendor                                 Advanced Micro Devices, Inc.
Platform Version                                OpenCL 2.1 AMD-APP (3513.0)
Platform Profile                                FULL_PROFILE
Platform Extensions                             cl_khr_icd cl_amd_event_callback
Platform Extensions function suffix             AMD
Platform Host timer resolution                  1ns

That is the correct driver and it's detected.  But, after that, in the details that follow, it is NOT shown.  And, as I say, clpeak fails.  So, I think there is a problem with the driver, I just don't know what.
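
The symptom here (a platform listed at the top of clinfo but with no device details further down) can be checked mechanically. The sketch below is an assumption-laden illustration, not a clinfo feature: it assumes the plain-text layout where each "Platform Name" line in the device section is followed by a "Number of devices" line, and the function name devices_per_platform is made up.

```python
import re

# Assumed layout of clinfo's plain-text output: a key, two or more spaces, a value.
PLATFORM_RE = re.compile(r"\s*Platform Name\s{2,}(.+?)\s*$")
DEVICES_RE = re.compile(r"\s*Number of devices\s+(\d+)\s*$")


def devices_per_platform(clinfo_text):
    """Return {platform name: device count} parsed from clinfo text output.

    A platform that is listed but never followed by a "Number of devices"
    line is reported with 0 devices.
    """
    counts = {}
    current = None
    for line in clinfo_text.splitlines():
        name = PLATFORM_RE.match(line)
        if name:
            current = name.group(1)
            counts.setdefault(current, 0)
            continue
        devices = DEVICES_RE.match(line)
        if devices and current is not None:
            counts[current] = int(devices.group(1))
    return counts
```

Feeding it the saved output of clinfo would show at a glance whether the AMD-APP platform reports zero devices while Clover reports one.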

Paul
Joined: 3 May 07
Posts: 123
Credit: 1780975087
RAC: 1304830

Okay, so, I was able to get it working by installing some ROCm packages from AMD.  This is troublesome because I don't really know which ones I need to have or should have installed.

It also means I might be wrong about not needing other packages.  I would need to do more testing, and I plan to, but this is very disturbing.  I really thought I had it figured out and had lots of evidence to back that up.

However, the system was horribly unstable, crashing frequently.  So, I'm not sure I have everything correct.  I went back to my 6800XT and it seems better, more stable, but it has only been an hour, so, I'm not confident in that assessment either.

mountkidd
Joined: 14 Jun 12
Posts: 176
Credit: 12419912555
RAC: 8064590

AMD supports RHEL 7/8/9 - where does Fedora 37 fit in?   One of the RHEL bundles would provide the most direct install for OpenCL, but it doesn't seem that this is an option for you.   A search for Fedora/amdgpu/opencl brings up a number of possible solutions that I expect you are aware of.  You only need OpenCL from amdgpu; all other graphics support can come via Mesa.

What file(s) does your /etc/OpenCL/vendors/ directory contain and what lib files are listed?  Since you are able to easily flip back to your 6800XT and you do get results that pass validation for both the 6800 & 7900 cards, it seems that AMD rocr OpenCL is in fact installed.  Is the firmware file for the 7900 current?  You could check /var/log/kern.log when the 7900 is plugged in to see if there is some startup issue...

 

Paul
Joined: 3 May 07
Posts: 123
Credit: 1780975087
RAC: 1304830

Thanks for the help!

Fedora is not supported.  But, it doesn't much matter because AMD has promised that the support I need will be incorporated into the OSS stack. It's just a question of how much has been put out there now, and how much has been integrated into the distribution. ROCm, or at least some parts of it, is in Fedora proper.  For 5700XT & 6800XT, I had good evidence that these extra pkgs from AMD were not required.  But, I confess, my validation could have been better.  I know for sure that it worked once without any extra packages, but I have since added some back trying to get HSA working for things beyond OpenCL.

I uninstalled MESA OpenCL, so now the only thing in OpenCL/vendors is the amdocl64 file, but it's not owned by anyone, which is upsetting.  It points to a library that is part of an AMD RPM, but that was an RPM I replaced, and the official Fedora RPM of the same name provides the same lib.  So, I had that installed before.

I completely agree with your troubleshooting approach, but it's not that something is missing.  The ROCm OpenCL stack was present, it just wasn't working correctly for that card, but did work correctly for the previous gen card.  If it is, in fact, a software issue, then it's the wrong software or conflicting software, not missing software.

Now I don't know much about the firmware, and it sounds a bit more suspicious.  This is a part I haven't looked into before. ...Okay, I think I found some firmware information about the card in the journal--no more SYSLOG ;-)--so if we know what to look for, I think we can find it.  Nearby, I see "SMU driver if version not matched" then "SMU is initialized successfully!"
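
One way to narrow down what to look for in the journal is to filter the kernel messages for lines that mention amdgpu and also contain a warning-like phrase. A minimal sketch, assuming the kernel log has been saved as text (for example from journalctl -k); the keyword list and the function name suspicious_amdgpu_lines are hypothetical choices for illustration, not an official list of amdgpu error strings.

```python
import re

# Hypothetical keyword list; extend it as needed.
SUSPECT = re.compile(r"error|fail|timeout|not matched", re.IGNORECASE)


def suspicious_amdgpu_lines(log_text):
    """Return log lines that mention amdgpu and contain a suspect keyword."""
    return [
        line
        for line in log_text.splitlines()
        if "amdgpu" in line.lower() and SUSPECT.search(line)
    ]
```

With this filter, a line containing "SMU driver if version not matched" would be flagged, while "SMU is initialized successfully!" would not. Whether a flagged line is actually a problem still needs a human to judge.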

What's the name of that pkg? ... Yeah, okay amd-gpu-firmware.  Looks current.  last updated 30 days ago.

Hmm, what am I looking for with the firmware, just errors?  No errors regarding firmware, and no errors, generally, during boot that I can see.  Think I need to check FW versions or something?

I still have some messing I can think of to do with the pkgs, so I'll try to make some time to play with that.

In the meantime, this is helping!  Any other ideas?

 

mikey
Joined: 22 Jan 05
Posts: 12636
Credit: 1839020911
RAC: 5803

Paul wrote:

I still have some messing I can think of to do with the pkgs, so I'll try to make some time to play with that.

In the meantime, this is helping!  Any other ideas?

Have you tried loading up a copy of Ubuntu in a VM box and see if it works there? It might give you some ideas for libraries you aren't using right now.

ahorek's team
Joined: 16 Dec 05
Posts: 17
Credit: 249119257
RAC: 3458

Clover/Mesa OpenCL doesn't work for anything except card name detection :) really useless

if you have troubles with ROCm, try an alternative https://github.com/pocl/pocl it works well with AMD cards (including Einstein@Home apps)

and there's also a brand new Rusticl/Mesa https://docs.mesa3d.org/rusticl , but I haven't had a chance to test it yet.

Paul
Joined: 3 May 07
Posts: 123
Credit: 1780975087
RAC: 1304830

ahorek's team wrote:

Clover/Mesa OpenCL doesn't work for anything except card name detection :) really useless

Yes, correct.  I've long been aware of this.  I even did some research and posted on the Fedora community about it.  AMD documentation explains it pretty clearly, it's just buried really deep. But, they state explicitly that they are no longer supporting Clover/Mesa OpenCL -- gosh, that must have been 6 years ago or more -- and that, instead, they are supporting OSS OCL via ROCm. That is their official position. Which is how I know this is *supposed* to work, it's just a matter of how much and how well and how that code gets out to end users.

ahorek's team wrote:

if you have troubles with ROCm, try an alternative https://github.com/pocl/pocl it works well with AMD cards (including Einstein@Home apps)

and there's also a brand new Rusticl/Mesa https://docs.mesa3d.org/rusticl , but I haven't had a chance to test it yet.

Oh, now that's very interesting.  POCL was a problem in the past.  It was just like Clover in that BOINC would detect it and not be able to use it.  I guess I could try it again.

Rusticl sounds very interesting too.  Not sure how that could help BOINC, but, good to know.  Keep meaning to teach myself Rust.  It's on the TODO list.

Paul
Joined: 3 May 07
Posts: 123
Credit: 1780975087
RAC: 1304830

mikey wrote:

Have you tried loading up a copy of Ubuntu in a VM box and see if it works there? It might give you some ideas for libraries you aren't using right now.

I've been thinking about this over the day.  I thought of a couple of problems with this idea, but I also thought of solutions for all of them.  The question is, how do I get what you are saying I could get out of it.  So, trying Ubuntu means I can use the amdgpu-install. But, it installs a bunch of packages, silently, I think, so then I need to figure out how to get the apt log, which I assume I can do, but I forget how.  Then I need to look up the 'provides' list, and filter that for libs.  I mean, I think that would help, a little.  I guess I could compare that to the same list on my system.  Seems like a lot of work, but I don't see anything wrong with that approach, in theory.
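
The comparison step described above boils down to a set difference between two lists of library names, one from the Ubuntu test install and one from the local Fedora system. A minimal sketch, assuming each list has already been collected one name per entry by whatever means; the function name missing_libraries is made up for the example.

```python
def missing_libraries(reference_libs, local_libs):
    """Return library names found in the reference list but not locally.

    Names are stripped of surrounding whitespace and blank entries are
    ignored; the result is sorted so it is stable between runs.
    """
    def normalise(names):
        return {name.strip() for name in names if name.strip()}

    return sorted(normalise(reference_libs) - normalise(local_libs))
```

The result is only a list of candidates: a library can be present under a different soname or path on another distribution, so each hit would still need checking by hand.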

It could also be useful just to stress test it that way.  At this point, I assume it's not a bad card on delivery, but I also cannot be sure it's not.  I suppose this would be one way to test.  Certainly a better test bed than, say, Windows, for me.

I think what I'm going to do is run down a couple other ideas I have, first.  But, I might return to this one.  Thanks.

I also have the "opportunity" to talk to the manufacturer's support.  I do feel like they owe me a little help, considering the price I paid.

So, is anyone using the 7900XT/XTX on Linux, now, for OpenCL crunching? I would love to connect with someone who actually is doing that in this project.  I didn't see any on the top 50 machines list, which is where I expected to find them.
