Dear all,
I have a new computer dedicated to BOINC, featuring an AMD Ryzen 9 7950X CPU and an AMD Radeon RX 7900 XTX GPU.
I'm getting good results with projects like Universe@Home and Asteroids@Home on the CPU front, but so far I've been unable to get the GPU to work with BOINC.
The computer is running Ubuntu 22.04 LTS x86_64 with BOINC 7.20.5, and I followed the official procedure to install the AMD drivers:
https://amdgpu-install.readthedocs.io/en/latest/
I tried two GPU projects: Einstein@Home and Milkyway@Home, and both failed in similar ways:
Error at Einstein@Home:
<core_client_version>7.20.5</core_client_version> <![CDATA[ <message> process exited with code 69 (0x45, -187)</message> <stderr_txt> 20:14:43 (2080): [normal]: This Einstein@home App was built at: Jan 16 2017 08:09:16
20:14:43 (2080): [normal]: Start of BOINC application '../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati'.
20:14:43 (2080): [debug]: 1e+16 fp, 1e+09 fp/s, 10500000 s, 2916h40m00s00
command line: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati --inputfile ../../projects/einstein.phys.uwm.edu/LATeah4021L00.dat --alpha 0.943218186562 --delta 1.30995332125 --skyRadius 8.726650e-08 --ldiBins 30 --f0start 1180.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 1.413729381e-15 --ephemdir ../../projects/einstein.phys.uwm.edu/JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah4021L00_1188_9484839.dat --debug 0 --device 0 -o LATeah4021L00_1188.0_0_0.0_9484839_1_0.out
output files: 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out' '../../projects/einstein.phys.uwm.edu/LATeah4021L00_1188.0_0_0.0_9484839_1_0' 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah4021L00_1188.0_0_0.0_9484839_1_1'
20:14:43 (2080): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
20:14:43 (2080): [debug]: glibc version/release: 2.35/stable
20:14:43 (2080): [debug]: Set up communication with graphics process.
boinc_get_opencl_ids returned [0x2804a50 , 0x7f0017225eb0]
Using OpenCL platform provided by: Advanced Micro Devices, Inc.
Using OpenCL device "gfx1100" by: Advanced Micro Devices, Inc.
Max allocation limit: 21890072576
Global mem size: 25753026560
Couldn't create OpenCL command queue (error: -6)!
OpenCL shutdown complete!
initialize_ocl returned error [2013]
OCL context null
OCL queue null
Error generating generic FFT context object [5]
20:14:43 (2080): [CRITICAL]: ERROR: MAIN() returned with error '5'
FPU status flags:
mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out.cohfu': No such file or directory
20:14:55 (2080): [normal]: done. calling boinc_finish(69).
20:14:55 (2080): called boinc_finish
</stderr_txt>
]]>
I believe the drivers are correctly installed; at least "clinfo" works just fine. I also installed both OpenCL implementations, ROCr and "legacy".
I know the log complains about a lack of memory (OpenCL error -6), but this makes no sense to me: the GPU has 24 GB of RAM, and the host itself has over 10 GB of RAM available out of 16 GB.
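For reference, the error number and the two memory figures in the log can be decoded with a few lines of Python. This is a minimal sketch: the table is a small subset of the error codes defined in the Khronos OpenCL header (CL/cl.h), and the helper names `opencl_error_name` and `bytes_to_gib` are made up for illustration.

```python
# Subset of the error codes defined in the Khronos OpenCL header CL/cl.h.
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}


def opencl_error_name(code: int) -> str:
    """Translate a numeric OpenCL error code into its symbolic name."""
    return OPENCL_ERRORS.get(code, f"unknown OpenCL error {code}")


def bytes_to_gib(n: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return n / 2**30


# Values copied from the stderr output above.
print(opencl_error_name(-6))      # CL_OUT_OF_HOST_MEMORY
print(bytes_to_gib(25753026560))  # 23.984375  (Global mem size)
print(bytes_to_gib(21890072576))  # 20.38671875  (Max allocation limit)
```

So the device reports its full 24 GB, and the failure is reported as a host-side allocation problem. The code only tells us what the driver returned when the command queue was created; it does not by itself prove that system RAM was actually exhausted.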
What am I doing wrong?
Should I report a bug over at AMD, as per https://amdgpu-install.readthedocs.io/en/latest/install-bugrep.html ?
Thank you all in advance for your help :)
Best regards,
Samuel
P.S: please have a look here for more details: https://boinc.berkeley.edu/forum_thread.php?id=14916
Copyright © 2024 Einstein@Home. All rights reserved.
Try the full ROCm install for the drivers:
https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.4.3/page/How_to_Install_ROCm.html
_________________________________________________________________________
Hi,
I uninstalled the previous drivers and reinstalled the full ROCm stack by following your procedure:
https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.4.3/page/How_to_Install_ROCm.html
Still no luck: https://einsteinathome.org/fr/task/1430174600
11:33:08 (1809): [normal]: Start of BOINC application '../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati'.
11:33:08 (1809): [debug]: 1e+16 fp, 1e+09 fp/s, 10500000 s, 2916h40m00s00
command line: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati --inputfile ../../projects/einstein.phys.uwm.edu/LATeah4021L04.dat --alpha 0.943218186562 --delta 1.30995332125 --skyRadius 8.726650e-08 --ldiBins 30 --f0start 692.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 1.413729381e-15 --ephemdir ../../projects/einstein.phys.uwm.edu/JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah4021L04_0700_1548561.dat --debug 0 --device 0 -o LATeah4021L04_700.0_0_0.0_1548561_1_0.out
output files: 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out' '../../projects/einstein.phys.uwm.edu/LATeah4021L04_700.0_0_0.0_1548561_1_0' 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah4021L04_700.0_0_0.0_1548561_1_1'
11:33:08 (1809): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
11:33:08 (1809): [debug]: glibc version/release: 2.35/stable
11:33:08 (1809): [debug]: Set up communication with graphics process.
boinc_get_opencl_ids returned [0x148aa40 , 0x7fadd59d7eb0]
Using OpenCL platform provided by: Advanced Micro Devices, Inc.
Using OpenCL device "gfx1100" by: Advanced Micro Devices, Inc.
Max allocation limit: 21890072576
Global mem size: 25753026560
Couldn't create OpenCL command queue (error: -6)!
OpenCL shutdown complete!
initialize_ocl returned error [2013]
OCL context null
OCL queue null
Error generating generic FFT context object [5]
11:33:08 (1809): [CRITICAL]: ERROR: MAIN() returned with error '5'
FPU status flags:
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah4021L04_700.0_0_0.0_1548561_1_0.out.cohfu': No such file or directory
11:33:19 (1809): [normal]: done. calling boinc_finish(69).
11:33:19 (1809): called boinc_finish
</stderr_txt>
]]>
What am I doing wrong?
Thanks, Samuel
I suspect that the driver support for OpenCL on these cards is poor or not implemented properly. I saw DA's comments that it's a problem with the Einstein application, but that seems unlikely to me (though not impossible). Have you been able to run ANY OpenCL application or load?
Can you install Windows on this system and try there, just to see whether it works with the Windows drivers?
_________________________________________________________________________
This has been a problem for all AMD cards with BOINC versions newer than 7.16.x, not just the latest round of cards.
Find the boinc-client.service file (in /usr/lib/systemd/system/ on Ubuntu) and change ProtectSystem=strict to ProtectSystem=off.
This just might clear up your problem - it did for me.
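To check what a unit file currently sets before editing it, something like the following could be used. It is a rough sketch, not a full systemd parser: the function names and the `SAMPLE` unit text are hypothetical, and the real boinc-client.service shipped by your distribution will contain more directives.

```python
from typing import Optional

# Hypothetical, shortened unit file used only to demonstrate the helpers.
SAMPLE = """[Unit]
Description=hypothetical sample unit

[Service]
ProtectSystem=strict
ExecStart=/usr/bin/boinc
"""


def protect_system_value(unit_text: str) -> Optional[str]:
    """Return the last ProtectSystem= value found in [Service], or None."""
    section = None
    value = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        key, sep, val = line.partition("=")
        if section == "Service" and sep and key.strip() == "ProtectSystem":
            value = val.strip()
    return value


def with_protect_system(unit_text: str, new_value: str = "off") -> str:
    """Return a copy of unit_text with ProtectSystem= in [Service] replaced."""
    section = None
    out = []
    for raw in unit_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
        elif section == "Service" and line.partition("=")[0].strip() == "ProtectSystem":
            raw = f"ProtectSystem={new_value}"
        out.append(raw)
    return "\n".join(out) + "\n"


print(protect_system_value(SAMPLE))                       # strict
print(protect_system_value(with_protect_system(SAMPLE)))  # off
```

The helpers only work on text and never write to disk. To inspect the real file, read /usr/lib/systemd/system/boinc-client.service and pass its contents in. After changing the file by hand, systemd needs `systemctl daemon-reload` and a restart of the boinc-client service to pick up the change. Note that files under /usr/lib may be overwritten by package upgrades; a drop-in created with `systemctl edit boinc-client` is the usual way to make such an override persistent.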
This is a good thing to try before he tries Windows. I was aware of this issue, but wasn't sure whether it caused this specific error with AMD cards.
Sam, please let us know if this works.
_________________________________________________________________________
Hi all,
Setting ProtectSystem to "off" did the trick (at least for Einstein@Home), thanks for your help :)
Milkyway@Home GPU tasks still refuse to run; I'll post another message on their boards.
Cheers, Sam
Hey Sam,
I've a bunch of AMD GPUs (RX 570, 2 x RX 5700 XT, RX 6600) and I can tell you that Milkyway won't run on them. I (and other people) posted about this problem on their forum, but the Milkyway guys don't seem to be doing anything to fix it. Asteroids simply requires CUDA. I can recommend that you run Einstein, PrimeGrid and Amicable Numbers. You could also try Folding, but I haven't yet. I feel like those three projects are enough for me.
You take care, man.
Wow, great find, and thanks for this! I just went back to an AMD mobo and a Ryzen 7800X3D after getting sucked back in (from Intel iron for the last 8 years or so). With the Radeon 7900 XT GPU I couldn't get it running, and I couldn't figure out what to blame first, the new AMD CPU or the GPU. I've been finding the new gee-whizz 7800X3D CPU to be a bit picky, shall we say, with BIOS setup. For plug and pray, Intel maybe still has the edge for mass consumers. And the Windows OS, of course. Not my cup of tea.
Cheers and happy Memorial Day to all the vets and their families.
Hmm, so, I looked at @MAGIC_SAM's system, and it seems like it's crunching fine. Wow, this is great news for everyone...else.
From what I can see and what Sam reported, Sam is using Ubuntu LTS and possibly the proprietary drivers. Frustrating, but it seems to be a driver issue with the OSS stack.
"I tried two GPU projects: Einstein@Home and Milkyway@Home"
I am using a Radeon VII on Mint 21.1 MATE, and Einstein@Home runs GPU tasks fine, as do PrimeGrid, NumberFields and SRBase.
Milkyway was showing similar errors on my system before, but it recently stopped sending GPU tasks due to project changes and now sends CPU tasks only.
I installed drivers using:
amdgpu-install_5.4.50403-1_all.deb
amdgpu-install --opencl=rocr,legacy --vulkan=amdvlk,pro --accept-eula
Some info from my PC:
dkms status
amdgpu/6.0.5-1581431.22.04, 5.15.0-75-generic, amd64: installed
amdgpu/6.0.5-1581431.22.04, 5.15.0-76-generic, x86_64: installed
ryzen_smu/0.1.2, 5.15.0-75-generic, x86_64: installed
ryzen_smu/0.1.2, 5.15.0-76-generic, x86_64: installed
virtualbox/6.1.38, 5.15.0-75-generic, x86_64: installed
virtualbox/6.1.38, 5.15.0-76-generic, x86_64: installed
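To verify that the amdgpu DKMS module is built for the kernel you actually boot, the listing above can be parsed. This is a small sketch that assumes the `module/version, kernel, arch: status` layout shown here (other dkms versions print a different layout); `DkmsEntry`, `parse_dkms_status` and `kernels_with` are illustrative names, not part of dkms.

```python
from typing import List, NamedTuple


class DkmsEntry(NamedTuple):
    module: str
    version: str
    kernel: str
    arch: str
    status: str


def parse_dkms_status(text: str) -> List[DkmsEntry]:
    """Parse lines of the form 'module/version, kernel, arch: status'."""
    entries = []
    for line in text.splitlines():
        left, sep, status = line.strip().rpartition(":")
        if not sep:
            continue  # no status separator, not a dkms entry
        parts = [p.strip() for p in left.split(",")]
        if len(parts) != 3 or "/" not in parts[0]:
            continue  # layout differs from the one assumed here
        module, _, version = parts[0].partition("/")
        entries.append(DkmsEntry(module, version, parts[1], parts[2], status.strip()))
    return entries


def kernels_with(entries: List[DkmsEntry], module: str) -> List[str]:
    """Kernels for which the given module is reported as installed."""
    return [e.kernel for e in entries if e.module == module and e.status == "installed"]


# Two lines copied from the listing above.
OUTPUT = """amdgpu/6.0.5-1581431.22.04, 5.15.0-75-generic, amd64: installed
amdgpu/6.0.5-1581431.22.04, 5.15.0-76-generic, x86_64: installed
"""

print(kernels_with(parse_dkms_status(OUTPUT), "amdgpu"))
# ['5.15.0-75-generic', '5.15.0-76-generic']
```

Comparing that list with the output of `uname -r` shows whether the running kernel has the module.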
note:
You can manually upgrade the AMD repositories by editing the files amdgpu.list, amdgpu-proprietary.list and rocm.list in /etc/apt/sources.list.d, up to version 5.6 I think.
When you then run the upgrade, it will get stuck on some files, so remove them with dpkg -P --force-all and repeat the upgrade.