And there really wasn't anything wrong to begin with. So sorry, everyone. Thank you for all your excellent advice, as usual. I learned a lot:
Paul wrote: Success! And
Maybe you could lay out the exact correct steps for a newbie, should they buy an AMD card and try to get it running under Fedora Linux, so it's saved for posterity here in the forums. That way they also won't have to go through what you did.
Have you asked for help with the fan control over at the Ricks-Lab forum and issues?
I bet he would have great knowledge about fan control since he figured it out for the older cards.
You're probably not probing or prodding the correct register address for the variable that controls the fans.
Or it could possibly be a system permission issue not allowing user access to /sys elements.
mikey wrote: Maybe you could
Sure. First, I'll refer to the thread on the Fedora forums about AMDGPU and Fedora (and, really, any other "unsupported" Linux distro) that details everything:
https://discussion.fedoraproject.org/t/how-to-deal-with-amds-new-amdgpu-installer-20-40/59499/17
The only supported distributions are RHEL & Ubuntu LTS (& descendants?). But, as I say, what is "officially" supported is basically the pro cards, so we're all living on luck and OSS greatness.
What happened differently for me this time isn't clear. I thought I must not have done it correctly, though I had; I got confused and panicked.
That thread's solution shows the minimal working configuration. Although I didn't start over from a clean install this time, I believe the only package you really need to install is rocm-opencl, which should pull in all necessary dependencies. I also recommend the rocm-clinfo and rocminfo pkgs for checking things. If rocm-clinfo and rocminfo show good information w/o any errors, then the system is functional.
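For anyone who wants to script that last check, here is a minimal Python sketch. It assumes rocminfo and rocm-clinfo are on your PATH, and it treats a non-zero exit status or anything written to stderr as "a problem". That rule is my own strictness choice, not something the tools document, so loosen it if your setup prints harmless warnings.

```python
import subprocess
import sys


def run_check(cmd, timeout=60):
    """Run cmd and return (ok, detail).

    ok is True only when the command exits with status 0 and writes
    nothing to stderr (an assumed, deliberately strict rule).
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError:
        return False, "command not found"
    except subprocess.TimeoutExpired:
        return False, "timed out"
    if proc.returncode != 0:
        return False, f"exit status {proc.returncode}"
    if proc.stderr.strip():
        return False, "wrote to stderr: " + proc.stderr.strip().splitlines()[0]
    return True, "ok"


if __name__ == "__main__":
    all_ok = True
    for tool in ("rocminfo", "rocm-clinfo"):
        ok, detail = run_check([tool])
        all_ok = all_ok and ok
        print(f"{tool}: {'OK' if ok else 'PROBLEM'} ({detail})")
    sys.exit(0 if all_ok else 1)
```

A clean run only tells you the tools did not complain; you still want to eyeball the output and confirm your card is actually listed.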
If clinfo (or rocm-clinfo) shows any "Clover" or MESA "platforms", it's possible your application will get confused, but that's technically not the OS's fault. Just search pkgs for opencl and remove any mesa- ones. That's the easiest solution.
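If you'd rather not read through the whole clinfo dump, something like this rough sketch can pick out the platform names and flag the Mesa ones. It assumes the output contains lines starting with "Platform Name" (with or without a trailing colon), which is what I'd expect from clinfo and rocm-clinfo but may differ between versions. The function names and the sample text in the comments are made up for illustration.

```python
import re

# Matches lines such as "  Platform Name    Clover" or "  Platform Name:\t Clover".
PLATFORM_RE = re.compile(r"^[ \t]*Platform Name:?[ \t]+(\S.*?)[ \t]*$", re.MULTILINE)

# Substrings that suggest a Mesa-provided OpenCL platform (assumed list).
MESA_HINTS = ("clover", "mesa")


def platform_names(clinfo_text):
    """Return the distinct platform names in clinfo-style output, in order."""
    names = []
    for name in PLATFORM_RE.findall(clinfo_text):
        if name not in names:
            names.append(name)
    return names


def mesa_platforms(clinfo_text):
    """Return the platform names that look like Mesa/Clover platforms."""
    return [n for n in platform_names(clinfo_text)
            if any(hint in n.lower() for hint in MESA_HINTS)]
```

To use it, capture the tool output first, for example with subprocess.run(["clinfo"], capture_output=True, text=True).stdout, and pass that string in. An empty list from mesa_platforms() means nothing Mesa-like was found.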
Here's what I have now. It's a slightly larger list than the one documented in that Fedora community thread, and I don't think all of it is necessary, but it is also safe to have all this installed; there's no downside to these few extra pkgs:
hsakmt-1.0.6-27.rocm5.4.1.fc37.x86_64
rocm-runtime-5.4.1-1.fc37.x86_64
rocminfo-5.4.1-1.fc37.x86_64
rocm-device-libs-5.4.1-1.fc37.x86_64
rocm-smi-4.0.0-6.fc37.noarch
rocm-clinfo-5.4.3-1.fc37.x86_64
opencl-headers-3.0-12.20220510gitdef8be9.fc37.noarch
ocl-icd-2.3.1-2.fc37.x86_64
opencl-filesystem-1.0-16.fc37.noarch
ocl-icd-2.3.1-2.fc37.i686
rocm-comgr-5.4.1-2.fc37.x86_64
rocm-opencl-5.4.3-1.fc37.x86_64
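If you want to produce the same kind of list on your own machine to compare, here is a small sketch. It assumes an RPM-based system where `rpm -qa` prints one package per line; the keyword list is just my guess at what catches the packages above, not an official set.

```python
import subprocess

# Substrings used to pick out the ROCm/OpenCL packages (assumed list).
KEYWORDS = ("rocm", "opencl", "ocl-icd", "hsakmt")


def opencl_related(package_names, keywords=KEYWORDS):
    """Return the sorted package names containing any of the keywords."""
    return sorted(p for p in package_names
                  if any(k in p.lower() for k in keywords))


def installed_packages():
    """Return every installed package as reported by `rpm -qa`."""
    out = subprocess.run(["rpm", "-qa"], capture_output=True,
                         text=True, check=True).stdout
    return out.split()
```

Calling opencl_related(installed_packages()) on a Fedora box should give a list you can hold up against mine.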
Keith Myers wrote: Have you
I believe someone has already asked or started an issue... I'll double check... oh, maybe not. Okay, I'll ask there, thanks. There is also an issue open about it on Freedesktop.org's gitlab space.
Yeah, I checked permissions. Not SELinux either. Writing "1" to pwm1_enable succeeds, but there's no effect; it doesn't change value. Writing to pwm1 throws a perms error. /sys is weird, though, so maybe I'm just not seeing it. It behaves as if it's just unimplemented stub code. The inodes in /sys are there, but, the interface is not working. Well, that's not completely true, either. The information in /sys *is* correct; it reports fan speed and mode accurately, AFAICT. You just can't interact as before with all the other cards from the past 6-7 years or however long it's been.
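For anyone who wants to reproduce the "write succeeds but nothing changes" test, here is a minimal sketch of the write-then-read-back check. The helper name is made up, and the demo runs against a scratch file so it works anywhere. On a real system you would point it at the pwm1_enable file under your card's hwmon directory (something like /sys/class/drm/card0/device/hwmon/hwmon*/pwm1_enable; the card and hwmon numbers are assumptions and vary per machine) and run it as root.

```python
import tempfile
from pathlib import Path


def write_and_verify(path, value):
    """Write value to path, read it back, and describe what happened."""
    p = Path(path)
    before = p.read_text().strip()
    try:
        p.write_text(f"{value}\n")
    except OSError as exc:  # PermissionError is a subclass of OSError
        return f"write failed: {exc.strerror or exc}"
    after = p.read_text().strip()
    if after == str(value):
        return "write took effect"
    return f"write accepted but value is still {after!r} (was {before!r})"


if __name__ == "__main__":
    # Demo on a scratch file standing in for a sysfs attribute.
    with tempfile.TemporaryDirectory() as tmp:
        scratch = Path(tmp) / "pwm1_enable"
        scratch.write_text("2\n")  # as I understand it, 2 means automatic fan control
        print(write_and_verify(scratch, 1))  # and 1 means manual control
```

What I described above corresponds to the last return value for pwm1_enable (write accepted, value unchanged) and to the "write failed" branch for pwm1.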
I even tried playing around with the "feature mask". When I run gpu-pac, it says "no writable gpus" or something, then "set kernel boot option amdgpu.ppfeaturemask=0xfffb7fff". I tried that, but it didn't have any effect either. I saw some other people tried using mask 0xfffffff instead, but I didn't try that one. I understand how masks work, so that seems like it could help, but I didn't see any documentation on which features correspond to which bits, so I figure I'll leave it and hope it gets fixed soon. For now, I'm using a script to do the temperature control by throttling BOINC. Pretty effective. I'm losing about 20% of available compute time + some overhead. Will be nice to get it fixed, but I can live with it for now. Have too much other work to do.
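Just to show the mask arithmetic, here is a small sketch that lists which bit positions each value leaves cleared. It assumes the mask is a 32-bit value and says nothing about which feature each bit controls, since I don't have that mapping.

```python
def cleared_bits(mask, width=32):
    """Return the bit positions (0 = least significant) that are 0 in mask."""
    return [bit for bit in range(width) if not (mask >> bit) & 1]


def describe(mask, width=32):
    """Format a mask as zero-padded hex plus its cleared bit positions."""
    bits = cleared_bits(mask, width)
    return f"0x{mask:0{width // 4}x}: cleared bits {bits if bits else 'none'}"


if __name__ == "__main__":
    for mask in (0xfffb7fff, 0xfffffff, 0xffffffff):
        print(describe(mask))
```

With these inputs, 0xfffb7fff has bits 15 and 18 cleared. The mask 0xfffffff as written above has only seven f's, so bits 28 through 31 are cleared, while an eight-f 0xffffffff clears nothing. So if you try the "all ones" mask, count the f's.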
corectrl might be a future option for your fan control. Issue 344 describes problems with the 7900xtx and has some current responses from two AMD developers. Most recent response is a week old with the expectation of a driver solution in April.
I'm returning here to just give some follow-up. After much effort, I wasn't able to get E@H to crunch without system crashes, so I actually gave up on E@H. That was April 2023.
Since then, there have been many updates to AMDGPU and ROCm. I was told ROCm 6.1 was something like a major rewrite, and that hit my distro (Fedora) recently. Remember, I'm running stock Fedora; no special pkgs or tweaks.
I thought everything was good and crunched a bunch of O3MD1 GW tasks. But when it hit the BRP7 (MeerKAT) tasks, it crashed my system. It then crashed two more times almost instantly after starting BRP7 tasks. So, it still doesn't work like it did for previous cards.
I think I said this before, but some delays in OCL support in the OSS stack for new GPU architectures are common with AMD, and I usually don't buy the newest cards. But, we are close to two years since this card was released, and it still doesn't work right. I think this is extremely strange, and obviously disappointing.