Can't get the AMD Radeon RX 7900 XTX to crunch for Einstein@Home on Ubuntu 22.04 LTS

magic_sam
Joined: 30 Dec 21
Posts: 23
Credit: 355485973
RAC: 1286869
Topic 229100

Dear all,

I have a new computer dedicated to BOINC, featuring an AMD Ryzen 9 7950X CPU and an AMD Radeon RX 7900 XTX GPU.

I'm getting good results with projects like Universe@Home and Asteroids@Home on the CPU front, but so far I've been unable to get the GPU to work with BOINC.

The computer is running Ubuntu 22.04 LTS x86_64 with BOINC 7.20.5, and I followed the official procedure to install the AMD drivers:

https://amdgpu-install.readthedocs.io/en/latest/

I tried two GPU projects: Einstein@Home and Milkyway@Home, and both failed in similar ways:

Error at Einstein@Home:

<core_client_version>7.20.5</core_client_version>
<![CDATA[
<message>
process exited with code 69 (0x45, -187)</message>
<stderr_txt>
20:14:43 (2080): [normal]: This Einstein@home App was built at: Jan 16 2017 08:09:16

20:14:43 (2080): [normal]: Start of BOINC application '../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati'.
20:14:43 (2080): [debug]: 1e+16 fp, 1e+09 fp/s, 10500000 s, 2916h40m00s00
command line: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati --inputfile ../../projects/einstein.phys.uwm.edu/LATeah4021L00.dat --alpha 0.943218186562 --delta 1.30995332125 --skyRadius 8.726650e-08 --ldiBins 30 --f0start 1180.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 1.413729381e-15 --ephemdir ../../projects/einstein.phys.uwm.edu/JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah4021L00_1188_9484839.dat --debug 0 --device 0 -o LATeah4021L00_1188.0_0_0.0_9484839_1_0.out
output files: 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out' '../../projects/einstein.phys.uwm.edu/LATeah4021L00_1188.0_0_0.0_9484839_1_0' 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah4021L00_1188.0_0_0.0_9484839_1_1'
20:14:43 (2080): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
20:14:43 (2080): [debug]: glibc version/release: 2.35/stable
20:14:43 (2080): [debug]: Set up communication with graphics process.
boinc_get_opencl_ids returned [0x2804a50 , 0x7f0017225eb0]
Using OpenCL platform provided by: Advanced Micro Devices, Inc.
Using OpenCL device "gfx1100" by: Advanced Micro Devices, Inc.
Max allocation limit: 21890072576
Global mem size: 25753026560
Couldn't create OpenCL command queue (error: -6)!
OpenCL shutdown complete!
initialize_ocl returned error [2013]
OCL context null
OCL queue null
Error generating generic FFT context object [5]
20:14:43 (2080): [CRITICAL]: ERROR: MAIN() returned with error '5'
FPU status flags:
mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out.cohfu': No such file or directory
20:14:55 (2080): [normal]: done. calling boinc_finish(69).
20:14:55 (2080): called boinc_finish

</stderr_txt>
]]>

I believe the drivers are correctly installed; at least "clinfo" is working just fine. I also installed both OpenCL implementations, ROCr and "legacy".

I know BOINC complains about the lack of memory in the logs, but this is nonsense IMHO: the GPU has 24GB of RAM, and the host itself has over 10GB of RAM available, out of 16GB.
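For reference, here is a small sketch that decodes the numbers in the log above. As far as I know, -6 is the value of CL_OUT_OF_HOST_MEMORY in the standard OpenCL header (CL/cl.h), which refers to host-side allocations by the OpenCL implementation rather than to video memory. The helper names (describe_error, to_gib) are made up for this example, and only a handful of status codes are listed.

```python
# A few OpenCL status codes as defined in CL/cl.h (not the full list).
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}

GIB = 1024 ** 3  # bytes per GiB


def describe_error(code: int) -> str:
    """Return the symbolic name of an OpenCL status code, if listed above."""
    return OPENCL_ERRORS.get(code, f"unknown OpenCL error {code}")


def to_gib(num_bytes: int) -> float:
    """Convert a byte count, as printed in the task log, to GiB."""
    return num_bytes / GIB


if __name__ == "__main__":
    print(describe_error(-6))                                     # CL_OUT_OF_HOST_MEMORY
    print(f"Global mem size: {to_gib(25753026560):.2f} GiB")       # 23.98 GiB
    print(f"Max allocation limit: {to_gib(21890072576):.2f} GiB")  # 20.39 GiB
```

So the card does report its full 24GB, and the error name points at the host side, not at the GPU memory.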

What am I doing wrong?

Should I report a bug over at AMD, as per https://amdgpu-install.readthedocs.io/en/latest/install-bugrep.html?

Thank you all in advance for your help :)

Best regards,

Samuel

P.S: please have a look here for more details: https://boinc.berkeley.edu/forum_thread.php?id=14916

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3713
Credit: 34660169759
RAC: 29986272

Try the full ROCm install for the drivers:

https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.4.3/page/How_to_Install_ROCm.html


magic_sam
Joined: 30 Dec 21
Posts: 23
Credit: 355485973
RAC: 1286869

Hi,

I uninstalled the previous drivers and reinstalled the full ROCm stack by following your procedure:

https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.4.3/page/How_to_Install_ROCm.html

Still no luck: https://einsteinathome.org/fr/task/1430174600

<core_client_version>7.20.5</core_client_version>
<![CDATA[
<message>
process exited with code 69 (0x45, -187)</message>
<stderr_txt>
11:33:08 (1809): [normal]: This Einstein@home App was built at: Jan 16 2017 08:09:16

11:33:08 (1809): [normal]: Start of BOINC application '../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati'.
11:33:08 (1809): [debug]: 1e+16 fp, 1e+09 fp/s, 10500000 s, 2916h40m00s00
command line: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati --inputfile ../../projects/einstein.phys.uwm.edu/LATeah4021L04.dat --alpha 0.943218186562 --delta 1.30995332125 --skyRadius 8.726650e-08 --ldiBins 30 --f0start 692.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 1.413729381e-15 --ephemdir ../../projects/einstein.phys.uwm.edu/JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah4021L04_0700_1548561.dat --debug 0 --device 0 -o LATeah4021L04_700.0_0_0.0_1548561_1_0.out
output files: 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out' '../../projects/einstein.phys.uwm.edu/LATeah4021L04_700.0_0_0.0_1548561_1_0' 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah4021L04_700.0_0_0.0_1548561_1_1'
11:33:08 (1809): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
11:33:08 (1809): [debug]: glibc version/release: 2.35/stable
11:33:08 (1809): [debug]: Set up communication with graphics process.
boinc_get_opencl_ids returned [0x148aa40 , 0x7fadd59d7eb0]
Using OpenCL platform provided by: Advanced Micro Devices, Inc.
Using OpenCL device "gfx1100" by: Advanced Micro Devices, Inc.
Max allocation limit: 21890072576
Global mem size: 25753026560
Couldn't create OpenCL command queue (error: -6)!
OpenCL shutdown complete!
initialize_ocl returned error [2013]
OCL context null
OCL queue null
Error generating generic FFT context object [5]
11:33:08 (1809): [CRITICAL]: ERROR: MAIN() returned with error '5'
FPU status flags:
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out.cohfu': No such file or directory
11:33:19 (1809): [normal]: done. calling boinc_finish(69).
11:33:19 (1809): called boinc_finish

</stderr_txt>
]]>

What am I doing wrong?

Thanks, Samuel

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3713
Credit: 34660169759
RAC: 29986272

I suspect that the driver support for OpenCL on these cards is poor or not implemented properly. I saw DA's comments that it's a problem with the Einstein application, but that seems unlikely to me (not impossible though). Have you been able to run ANY OpenCL application or load?

Can you load Windows on this system and try there, just to see if it will work with the Windows drivers?


mountkidd
Joined: 14 Jun 12
Posts: 175
Credit: 11029787069
RAC: 5288468

This has been a problem for all AMD cards with BOINC > 7.16.x, not just the latest round of cards.

Find the boinc-client.service file (in /usr/lib/systemd/system/ on Ubuntu) and change ProtectSystem=strict to ProtectSystem=off.

This just might clear up your problem - it did for me.
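An alternative to editing the packaged unit file in place (which a package update may overwrite) is a systemd drop-in override. The following is only a sketch of that variant, not the exact steps I took: it assumes the service is named boinc-client.service as above, and the function name write_override and the file name override.conf are arbitrary choices for this example. Without arguments it only prints what it would write.

```python
import sys
from pathlib import Path

# Assumption: the unit is boinc-client.service, so its drop-in directory is:
DROPIN_DIR = Path("/etc/systemd/system/boinc-client.service.d")

# Settings in a drop-in are read after the main unit file, so this value
# replaces ProtectSystem=strict from the packaged unit.
OVERRIDE_TEXT = "[Service]\nProtectSystem=off\n"


def write_override(dropin_dir: Path = DROPIN_DIR, name: str = "override.conf") -> Path:
    """Create the drop-in directory if needed and write the override file."""
    dropin_dir.mkdir(parents=True, exist_ok=True)
    target = dropin_dir / name
    target.write_text(OVERRIDE_TEXT)
    return target


if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "--apply":
        # Writing under /etc requires root.
        print(f"Wrote {write_override()}")
    else:
        # Dry run: show the target path and the content only.
        print(f"Would write to {DROPIN_DIR / 'override.conf'}:")
        print(OVERRIDE_TEXT, end="")
```

Either way, systemd has to pick up the change afterwards: run "systemctl daemon-reload" and then "systemctl restart boinc-client" as root.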

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3713
Credit: 34660169759
RAC: 29986272

mountkidd wrote:

This has been a problem for all AMD cards with BOINC > 7.16.x, not just the latest round of cards.

Find the boinc-client.service file (in /usr/lib/systemd/system/ on Ubuntu) and change ProtectSystem=strict to ProtectSystem=off.

This just might clear up your problem - it did for me.

This is a good thing to try before he tries Windows. I'm aware of this issue, but wasn't sure if it caused this specific error with AMD cards.

Sam, please let us know if this works.


magic_sam
Joined: 30 Dec 21
Posts: 23
Credit: 355485973
RAC: 1286869

Hi all,

Setting ProtectSystem to "off" did the trick (at least for Einstein@Home). Thanks for your help :)

Milkyway@Home GPU tasks still refuse to run; I'll post another message on their boards.

Cheers, Sam

alex
Joined: 8 Apr 21
Posts: 6
Credit: 1273293677
RAC: 4909837

Hey Sam,

I have a bunch of AMD GPUs (an RX 570, two RX 5700 XTs and an RX 6600), and I can tell you that Milkyway won't run on them. I (and other people) posted about this problem on their forum, but the Milkyway guys don't seem to be doing anything to fix it. Asteroids simply requires CUDA. I can recommend running Einstein, PrimeGrid and Amicable Numbers. You may also try Folding, but I haven't yet. I feel like those three projects are enough for me.

You take care, man.

Mike
Joined: 26 Dec 20
Posts: 39
Credit: 4851986708
RAC: 9517211

Wow, great find, and thanks for this! I just went back to an AMD mobo and a Ryzen 7800X3D after getting sucked back in (from Intel iron for the last 8 years or so). With the Radeon 7900XT GPU I couldn't get it running, and I couldn't figure out what to blame first, the new AMD CPU or the GPU. I've been finding the new gee-whiz 7800X3D CPU to be a bit picky, shall we say, with BIOS setup. For plug and pray, Intel maybe still has the edge for mass consumers. And the Windows OS, of course. Not my cup of tea.

Cheers and happy Memorial Day to all the vets and their families.

Paul
Joined: 3 May 07
Posts: 121
Credit: 1654597150
RAC: 21915

Hmm, so, I looked at @MAGIC_SAM's system, and it seems like it's crunching fine.  Wow, this is great news for everyone...else.

From what I can see and what Sam reported, Sam is using Ubuntu LTS and possibly the proprietary drivers. Frustrating, but seems like it is a driver issue with the OSS stack. 

Technik007[CZ]
Joined: 10 Dec 11
Posts: 1
Credit: 147035738
RAC: 126597

"I tried two GPU projects: Einstein@Home and Milkyway@Home"

I am using a Radeon VII on Mint 21.1 MATE, and Einstein@Home runs GPU tasks fine, as do PrimeGrid, NumberFields and SRBase.

Milkyway was showing similar errors on my system before, but it recently stopped sending GPU tasks due to project changes and now sends CPU tasks only.

I installed drivers using:

amdgpu-install_5.4.50403-1_all.deb

amdgpu-install --opencl=rocr,legacy --vulkan=amdvlk,pro --accept-eula

Some info from my PC:

dkms status
amdgpu/6.0.5-1581431.22.04, 5.15.0-75-generic, amd64: installed
amdgpu/6.0.5-1581431.22.04, 5.15.0-76-generic, x86_64: installed
ryzen_smu/0.1.2, 5.15.0-75-generic, x86_64: installed
ryzen_smu/0.1.2, 5.15.0-76-generic, x86_64: installed
virtualbox/6.1.38, 5.15.0-75-generic, x86_64: installed
virtualbox/6.1.38, 5.15.0-76-generic, x86_64: installed

Note:
You can manually upgrade the AMD repositories by editing the files amdgpu.list, amdgpu-proprietary.list and rocm.list in /etc/apt/sources.list.d, up to version 5.6 I think.

When you then run the upgrade, though, it will get stuck on some packages, so remove them with dpkg -P --force-all and run the upgrade again.
