Hi, I am the BOINC maintainer for Fedora / RHEL / CentOS.
Before talking about computation errors with OpenCL on Radeon, I need to walk you through the whole procedure I followed.
Recently, I investigated why boinc-client, when running as a service, could not detect the video card for GPU computation. To fix this, I had to add the video group to the boinc-client systemd unit file (SupplementaryGroups=video in the unit file below). I haven't pushed the change to the stable repositories yet, but I will do so soon.
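To confirm that the running service process really carries the video group after such a change, one option is to read the `Groups:` line of `/proc/<pid>/status` for boinc_client. The Python below is a minimal sketch under that assumption. The helper names are hypothetical, and you would obtain the PID yourself (for example with `pidof boinc_client`).

```python
from __future__ import annotations

import grp


def proc_supplementary_gids(status_text: str) -> set[int]:
    """Return the GIDs on the 'Groups:' line of a /proc/<pid>/status dump."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            # The line looks like "Groups:\t39 977": numeric GIDs separated by whitespace.
            return {int(gid) for gid in line.split()[1:]}
    return set()  # no Groups: line found


def process_has_group(status_text: str, group_name: str) -> bool:
    """True if the process whose status text is given carries group_name."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:  # the group does not exist on this system
        return False
    return gid in proc_supplementary_gids(status_text)


# Hypothetical usage, with PID being the boinc_client process ID:
#   with open(f"/proc/{PID}/status") as f:
#       print(process_has_group(f.read(), "video"))
```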
AMDGPU-Pro drivers are not available for Fedora, so to use the proprietary OpenCL driver you have to download the AMDGPU-Pro drivers for CentOS 7, unpack the libdrm, libAMDOpenCL and libkms RPMs, and put their contents under
/opt/amdgpu-pro/
That way the driver files end up under
/opt/amdgpu-pro/lib64
Don't forget that the file amdocl64.icd has to be copied to
/etc/OpenCL/vendors/
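The copy into /etc/OpenCL/vendors/ matters because the OpenCL ICD loader (libOpenCL.so) enumerates platforms by reading every `*.icd` file in that directory, each of which names one vendor library. A quick way to list which implementations the loader will try, including any Mesa ICD sitting next to the AMD one, is sketched below in Python. The function name is hypothetical, and the example file contents (`libamdocl64.so`, `libMesaOpenCL.so.1`) are assumptions about what these files typically contain.

```python
from __future__ import annotations

from pathlib import Path


def read_icd_files(vendors_dir: str = "/etc/OpenCL/vendors") -> dict[str, str]:
    """Map each *.icd file name to the library name/path written inside it."""
    icds: dict[str, str] = {}
    for icd in sorted(Path(vendors_dir).glob("*.icd")):
        # Each ICD file contains a single line naming the vendor library.
        icds[icd.name] = icd.read_text().strip()
    return icds


if __name__ == "__main__":
    for name, lib in read_icd_files().items():
        print(f"{name}: {lib}")
```

If a file holds a bare library name rather than an absolute path, the loader resolves it through the normal dynamic-linker search path, which is why /opt/amdgpu-pro/lib64 has to be on LD_LIBRARY_PATH.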
Once you have done that, simply add
Environment=LD_LIBRARY_PATH=/opt/amdgpu-pro/lib64
to the boinc-client unit file. Set SELinux to permissive, or create a custom SELinux rule that allows boinc-client to load the OpenCL drivers from that path. Then restart the boinc-client service.
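Before restarting the service, it can help to check that the library named in the ICD file is actually reachable through the LD_LIBRARY_PATH value set in the unit file. The Python sketch below walks the colon-separated directories in order and returns the first match. It is a simplification: the real dynamic loader also consults RPATH/RUNPATH, /etc/ld.so.cache and the default directories, and treats an empty entry as the current directory, none of which this sketch does. The function name is hypothetical, and the library name `libamdocl64.so` in the usage line is an assumption.

```python
from __future__ import annotations

import os
from pathlib import Path


def find_in_ld_library_path(libname: str, ld_library_path: str | None = None) -> Path | None:
    """Return the first LD_LIBRARY_PATH directory entry that contains libname."""
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    for entry in ld_library_path.split(":"):
        if not entry:
            continue  # ld.so would treat this as ".", this sketch skips it
        candidate = Path(entry) / libname
        if candidate.is_file():
            return candidate
    return None


# Hypothetical usage, mirroring the unit file setting:
#   print(find_in_ld_library_path("libamdocl64.so", "/opt/amdgpu-pro/lib64"))
```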
Now boinc-client is able to see the AMDGPU-Pro OpenCL device and use it. However, all Einstein@Home work units fail, while Milkyway work units run perfectly.
Below are some boinc-client log entries and the boinc-client systemd unit file:
mar 06 giu 2017 10:39:48 CEST | Einstein@Home | project resumed by user
mar 06 giu 2017 10:39:52 CEST | Einstein@Home | Sending scheduler request: To report completed tasks.
mar 06 giu 2017 10:39:52 CEST | Einstein@Home | Reporting 2 completed tasks
mar 06 giu 2017 10:39:52 CEST | Einstein@Home | Requesting new tasks for AMD/ATI GPU
mar 06 giu 2017 10:39:53 CEST | Einstein@Home | Computation for task LATeah0030L_1052.0_0_0.0_14738720_0 finished
mar 06 giu 2017 10:39:53 CEST | Einstein@Home | Output file LATeah0030L_1052.0_0_0.0_14738720_0_0 for task LATeah0030L_1052.0_0_0.0_14738720_0 absent
mar 06 giu 2017 10:39:53 CEST | Einstein@Home | Output file LATeah0030L_1052.0_0_0.0_14738720_0_1 for task LATeah0030L_1052.0_0_0.0_14738720_0 absent
mar 06 giu 2017 10:39:53 CEST | Einstein@Home | Starting task LATeah0030L_1052.0_0_0.0_14846650_0
mar 06 giu 2017 10:39:54 CEST | Einstein@Home | Scheduler request completed: got 5 new tasks
mar 06 giu 2017 10:39:55 CEST | Einstein@Home | Computation for task LATeah0030L_1052.0_0_0.0_14846650_0 finished
mar 06 giu 2017 10:39:55 CEST | Einstein@Home | Output file LATeah0030L_1052.0_0_0.0_14846650_0_0 for task LATeah0030L_1052.0_0_0.0_14846650_0 absent
mar 06 giu 2017 10:39:55 CEST | Einstein@Home | Output file LATeah0030L_1052.0_0_0.0_14846650_0_1 for task LATeah0030L_1052.0_0_0.0_14846650_0 absent
mar 06 giu 2017 10:39:56 CEST | Einstein@Home | Started download of templates_LATeah0030L_1156_19851590.dat
mar 06 giu 2017 10:39:56 CEST | Einstein@Home | Started download of templates_LATeah0030L_1156_19431165.dat
mar 06 giu 2017 10:39:58 CEST | Einstein@Home | Finished download of templates_LATeah0030L_1156_19431165.dat
mar 06 giu 2017 10:39:58 CEST | Einstein@Home | Started download of templates_LATeah0030L_1156_19432420.dat
mar 06 giu 2017 10:39:58 CEST | Einstein@Home | Starting task LATeah0030L_1156.0_0_0.0_19431165_1
mar 06 giu 2017 10:39:59 CEST | Einstein@Home | Finished download of templates_LATeah0030L_1156_19851590.dat
mar 06 giu 2017 10:39:59 CEST | Einstein@Home | Finished download of templates_LATeah0030L_1156_19432420.dat
mar 06 giu 2017 10:39:59 CEST | Einstein@Home | Started download of templates_LATeah0030L_1156_21105335.dat
mar 06 giu 2017 10:39:59 CEST | Einstein@Home | Started download of templates_LATeah0030L_1156_21134200.dat
mar 06 giu 2017 10:40:00 CEST | Einstein@Home | Finished download of templates_LATeah0030L_1156_21105335.dat
mar 06 giu 2017 10:40:00 CEST | Einstein@Home | Finished download of templates_LATeah0030L_1156_21134200.dat
mar 06 giu 2017 10:40:01 CEST | Einstein@Home | Computation for task LATeah0030L_1156.0_0_0.0_19431165_1 finished
mar 06 giu 2017 10:40:01 CEST | Einstein@Home | Output file LATeah0030L_1156.0_0_0.0_19431165_1_0 for task LATeah0030L_1156.0_0_0.0_19431165_1 absent
mar 06 giu 2017 10:40:01 CEST | Einstein@Home | Output file LATeah0030L_1156.0_0_0.0_19431165_1_1 for task LATeah0030L_1156.0_0_0.0_19431165_1 absent
mar 06 giu 2017 10:40:01 CEST | Einstein@Home | Starting task LATeah0030L_1156.0_0_0.0_19851590_0
mar 06 giu 2017 10:40:03 CEST | Einstein@Home | project suspended by user
mar 06 giu 2017 10:40:04 CEST | Einstein@Home | Computation for task LATeah0030L_1156.0_0_0.0_19851590_0 finished
mar 06 giu 2017 10:40:04 CEST | Einstein@Home | Output file LATeah0030L_1156.0_0_0.0_19851590_0_0 for task LATeah0030L_1156.0_0_0.0_19851590_0 absent
mar 06 giu 2017 10:40:04 CEST | Einstein@Home | Output file LATeah0030L_1156.0_0_0.0_19851590_0_1 for task LATeah0030L_1156.0_0_0.0_19851590_0 absent
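In that log, each task reaches "Computation for task ... finished" only two or three seconds after "Starting task", with both output files absent. That suggests the science app dies at startup rather than computing and producing bad results. Below is a small Python sketch that pulls those start-to-finish intervals out of a pasted event log. It assumes the `date time TZ | project | message` layout shown above, keys only on the HH:MM:SS field, and ignores runs that cross midnight. The function and pattern names are hypothetical.

```python
from __future__ import annotations

import re
from datetime import datetime

# "<weekday> <day> <month> <year> HH:MM:SS TZ | project | message"
LINE_RE = re.compile(r"(?P<time>\d{2}:\d{2}:\d{2}) \S+ \| [^|]+ \| (?P<msg>.+)$")
START_RE = re.compile(r"Starting task (\S+)$")
END_RE = re.compile(r"Computation for task (\S+) finished$")


def task_runtimes(log_text: str) -> dict[str, int]:
    """Seconds from 'Starting task' to 'Computation ... finished' per task.

    Tasks whose start line is not in the excerpt are left out.
    """
    started: dict[str, datetime] = {}
    runtimes: dict[str, int] = {}
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue
        when = datetime.strptime(m["time"], "%H:%M:%S")
        msg = m["msg"].strip()
        start = START_RE.match(msg)
        if start:
            started[start.group(1)] = when
            continue
        end = END_RE.match(msg)
        if end and end.group(1) in started:
            task = end.group(1)
            runtimes[task] = int((when - started[task]).total_seconds())
    return runtimes
```

Fed the excerpt above, this reports 2 s for LATeah0030L_1052.0_0_0.0_14846650_0 and 3 s each for LATeah0030L_1156.0_0_0.0_19431165_1 and LATeah0030L_1156.0_0_0.0_19851590_0.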
[Unit]
Description=Berkeley Open Infrastructure Network Computing Client
Documentation=man:boinc(1)
After=network-online.target

[Service]
Type=forking
Nice=10
User=boinc
WorkingDirectory=/var/lib/boinc
ExecStart=/usr/bin/boinc_client --daemon --start_delay 1
ExecStop=/usr/bin/boinccmd --quit
ExecReload=/usr/bin/boinccmd --read_cc_config
ExecStopPost=/bin/rm -f /var/lib/boinc/lockfile
IOSchedulingClass=idle
Environment=LD_LIBRARY_PATH=/opt/amdgpu-pro/lib64
SupplementaryGroups=video

[Install]
WantedBy=multi-user.target
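Since the fix depends on two lines in that unit file, a static check can catch a typo before restarting the service. The Python sketch below parses the unit with configparser and reports whether SupplementaryGroups includes video and whether Environment sets LD_LIBRARY_PATH to /opt/amdgpu-pro/lib64. It is an approximation, not a systemd parser: configparser in its default strict mode rejects repeated keys (which systemd allows for Environment=), and it understands neither systemd quoting nor drop-in overrides. The function name is hypothetical.

```python
import configparser


def check_boinc_unit(unit_text: str) -> list:
    """Return a list of problems found in the [Service] section (empty = OK)."""
    # Only "=" separates keys from values in systemd units; "%" is not interpolation here.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep systemd's CamelCase keys as written
    parser.read_string(unit_text)
    if not parser.has_section("Service"):
        return ["no [Service] section"]
    service = parser["Service"]
    problems = []
    if "video" not in service.get("SupplementaryGroups", "").split():
        problems.append("SupplementaryGroups does not include video")
    env_assignments = service.get("Environment", "").split()
    if "LD_LIBRARY_PATH=/opt/amdgpu-pro/lib64" not in env_assignments:
        problems.append("Environment does not set LD_LIBRARY_PATH=/opt/amdgpu-pro/lib64")
    return problems
```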
Copyright © 2024 Einstein@Home. All rights reserved.
Because your computers are hidden, there is no way to look at the "stderr output" of the failed tasks to get a hint as to why they failed. BOINC's event log doesn't say why a task failed, only that it did.
Please either unhide your computers or post a link to one of the failed tasks, like this:
https://einsteinathome.org/task/653309030
Here are some of them:
https://einsteinathome.org/task/653178780
https://einsteinathome.org/task/653176693
https://einsteinathome.org/task/653178730
Thank you
Don't thank me yet, I'm not a Linux user so I can't give any specific help, but maybe someone else can help out now that we have more info to go on.
It seems all of the linked tasks fail with the following errors (all copied from different parts of the task page and the stderr output):
Exit status: 6 (0x00000006) Unknown error code
Maybe one of the devs can shed some light on what "signal 6" means?
Since you've been working on getting the GPU to run tasks while BOINC is installed in service mode, I suspect there's still something amiss with the driver installation or with the science application's ability to use OpenCL.
https://stackoverflow.com/questions/3413166/when-does-a-process-get-sigabrt-signal-6
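To make the "signal 6" question concrete: on Linux, signal 6 is SIGABRT, which a process receives when it, or a library it has loaded, calls abort(), for instance after a failed assertion or a fatal error detected inside a runtime. The Python sketch below reproduces that outcome with a child process that aborts itself and shows how the termination is reported; in Python's subprocess module a child killed by signal N has returncode -N. The helper name is hypothetical, and the sketch does not model how BOINC itself reports the status.

```python
import signal
import subprocess
import sys


def run_and_describe(cmd: list) -> str:
    """Run cmd and describe how it terminated."""
    proc = subprocess.run(cmd, capture_output=True)
    if proc.returncode < 0:
        # subprocess reports "killed by signal N" as returncode -N
        sig = signal.Signals(-proc.returncode)
        return f"killed by {sig.name} (signal {-proc.returncode})"
    return f"exited with status {proc.returncode}"


if __name__ == "__main__":
    # Stand-in for a science app whose OpenCL runtime calls abort()
    print(run_and_describe([sys.executable, "-c", "import os; os.abort()"]))
```

On Linux this prints `killed by SIGABRT (signal 6)`.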
Something does not look right in the backtrace, at least to my untrained eye. Whenever I see LD= being needed, something is wrong with the Force....
Here I see this
suggesting the library calls from the Mesa OpenCL libraries but later
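One way to test the suspicion that calls go to the Mesa OpenCL libraries rather than the AMDGPU-Pro ones is to look at which shared objects a running science app has actually mapped, via /proc/<pid>/maps. The Python sketch below filters those lines for OpenCL-related libraries. The name patterns (libOpenCL, libamdocl, libMesaOpenCL) are assumptions about the file names involved, the function name is hypothetical, and since the tasks here die within seconds you would need to catch the process quickly.

```python
from __future__ import annotations

import re

# Assumed name patterns for the ICD loader, AMD's runtime and Mesa's OpenCL.
OPENCL_LIB_RE = re.compile(r"(libOpenCL|libamdocl|libMesaOpenCL)[^/]*\.so")


def mapped_opencl_libs(maps_text: str) -> list[str]:
    """Return sorted, de-duplicated paths of OpenCL libraries in a maps dump."""
    libs: set[str] = set()
    for line in maps_text.splitlines():
        # address perms offset dev inode [pathname]; the path may contain spaces
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        path = fields[5]
        if OPENCL_LIB_RE.match(path.rsplit("/", 1)[-1]):
            libs.add(path)
    return sorted(libs)


# Hypothetical usage, with PID being the science app's process ID:
#   with open(f"/proc/{PID}/maps") as f:
#       print("\n".join(mapped_opencl_libs(f.read())))
```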
AgentB
The reason is in the first message.
I use Linux (PCLinuxOS) on a large number of machines, virtually all with usable GPUs these days. I'm not a programmer, just a self taught user and I wouldn't regard my knowledge of how things really work as being all that great. Often by trial and error I get things to work without really understanding the finer details of how and why.
About 4 months ago I decided to investigate what was needed to get an RX460 GPU running. My distro has the amdgpu free driver (not -pro) so the card is detected and usable for producing a GUI but of course not detected by BOINC. I downloaded the -pro driver from AMD and unpacked it all and perused the install script. I identified what I thought would be needed from the whole package to provide OpenCL functionality and installed the bits pretty much as you describe above. I was pleasantly surprised to find that BOINC was able to detect the card and use it and the results were validated just fine.
I now have a total of 12 RX460 cards installed mainly in older machines and all working fine. I put all the bits on a file share and wrote a small install script to automate the whole setup process. There was some instability with the original amdgpu driver (the graphics would crash after about a week of running) but an update to the driver seems to have solved that. Here is a link to one of the oldies that I upgraded with the RX460s. It has a Q6600 quad core CPU and it has been crunching (CPU only until the upgrade) 24/7 for almost 9 years. I had intended retiring a bunch of similar vintage machines but the RX460 upgrade has given them a new lease on life. I've restricted these hosts to just one CPU task alongside 2 concurrent GPU tasks to keep the power use down a bit. I'm still using the same PSUs as before when the machines crunched 4 CPU tasks. The Asus RX460 cards I'm using don't have a PCIe power connector.
I don't know why tasks are failing with your setup. The main reason I'm posting is to suggest that what you are doing should work, since it does for me. There must be something with your setup that is interfering. My distro doesn't use systemd (and never will according to the developers). I use the Berkeley shell archive for BOINC since my distro doesn't (and won't) maintain it in their repo. All bar one of my machines run 7.2.42. I have downloaded and built 7.6.33 from source and it runs fine too (on one machine). One of these days I might get around to deploying it over the entire fleet. The user interfaces of the two versions are sufficiently and annoyingly different to discourage a piecemeal approach, so I don't have much enthusiasm to get started. I'm a firm believer in, "If it ain't broke, don't fix it" :-).
Cheers,
Gary.
ping
Hey CATERPILLAR, thanks for all your work on Fedora. I've been crunching on Fedora since 1999.
I'm having the same problem (for over a year), but with the OSS AMDGPU drivers. I would love to help troubleshoot from my end. If E@H folks or you can point me in the right direction, I'd be happy to try something.