All hsgamma Tasks Fail after Upgrade to Fedora 31 F31 (Linux,AMDGPU)

Paul
Joined: 3 May 07
Posts: 121
Credit: 1654517150
RAC: 20367
Topic 219891

My WU completion rate has been pretty consistent for the last couple of years, but I upgraded the OS to F31 and E@H tasks immediately started failing.  All outputs look the same, failing in LLVM-9.  Can someone interpret this for me?

I found a similar thread from three weeks ago that mentioned a LIBC215 option.  I hadn't heard of it, but I tried toggling it, updating, and fetching new WUs, with no joy.  I don't have output from the latest failed WU, but this is one from before I tried LIBC215:

<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
process exited with code 11 (0xb, -245)</message>
<stderr_txt>
17:16:23 (4170162): [normal]: This Einstein@home App was built at: Jan 16 2017 08:09:16

17:16:23 (4170162): [normal]: Start of BOINC application '../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati'.
17:16:23 (4170162): [debug]: 1e+16 fp, 7.5e+09 fp/s, 1401846 s, 389h24m05s78
command line: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati --inputfile ../../projects/einstein.phys.uwm.edu/LATeah1062L12.dat --alpha 1.41058464281 --delta -0.444366280137 --skyRadius 5.526880e-07 --ldiBins 30 --f0start 340.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 2.512676418e-15 --ephemdir ../../projects/einstein.phys.uwm.edu/JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah1062L12_0348_37827783.dat --debug 1 --device 0 -o LATeah1062L12_348.0_0_0.0_37827783_0_0.out
output files: 'LATeah1062L12_348.0_0_0.0_37827783_0_0.out' '../../projects/einstein.phys.uwm.edu/LATeah1062L12_348.0_0_0.0_37827783_0_0' 'LATeah1062L12_348.0_0_0.0_37827783_0_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah1062L12_348.0_0_0.0_37827783_0_1'
17:16:23 (4170162): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
17:16:23 (4170162): [debug]: glibc version/release: 2.30/stable
17:16:23 (4170162): [debug]: Set up communication with graphics process.

-- signal handler called: signal 1
3 stack frames obtained for this thread:
Frame 3:
Binary file: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati (0x48b101)
Source file: hs_boinc_extras.c (Function: sighandler / Line: 291)
Frame 2:
Binary file: /lib64/libLLVM-9.so (0x7ff1bf0beca3)
Offset info: +0x3785ca3
Frame 1:
Binary file: /lib64/libLLVM-9.so (0x7ff1bf0beca3)
Offset info: +0x3785ca3

End of stcaktrace
17:16:23 (4170162): called boinc_finish
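
For anyone trying to read a trace like the one above, here is a minimal Python sketch that pulls the frames out of the stderr text and lists which shared libraries appear in them. The function names (`parse_frames`, `libraries_in_trace`) are made up for illustration, and the parsing assumes the "Frame N:" / "Binary file: path (0xaddr)" layout seen in this particular log; other app versions may print something different.

```python
import re

# Patterns assume the layout printed in the stderr excerpt above.
FRAME_RE = re.compile(r"^Frame (\d+):$")
BINARY_RE = re.compile(r"^Binary file: (?P<path>.+) \((?P<addr>0x[0-9a-fA-F]+)\)$")


def parse_frames(stderr_text):
    """Return one dict per 'Frame N:' entry, in the order they appear."""
    frames = []
    current = None
    for raw in stderr_text.splitlines():
        line = raw.strip()
        m = FRAME_RE.match(line)
        if m:
            current = {"frame": int(m.group(1)), "binary": None, "address": None}
            frames.append(current)
            continue
        m = BINARY_RE.match(line)
        if m and current is not None:
            current["binary"] = m.group("path")
            current["address"] = int(m.group("addr"), 16)
    return frames


def libraries_in_trace(frames):
    """Return the distinct shared libraries (paths containing '.so') in trace order."""
    seen = []
    for frame in frames:
        binary = frame["binary"]
        if binary and ".so" in binary and binary not in seen:
            seen.append(binary)
    return seen


# Excerpt copied from the failed task's stderr above.
SAMPLE = """\
Frame 3:
Binary file: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati (0x48b101)
Source file: hs_boinc_extras.c (Function: sighandler / Line: 291)
Frame 2:
Binary file: /lib64/libLLVM-9.so (0x7ff1bf0beca3)
Offset info: +0x3785ca3
Frame 1:
Binary file: /lib64/libLLVM-9.so (0x7ff1bf0beca3)
Offset info: +0x3785ca3
"""

if __name__ == "__main__":
    for lib in libraries_in_trace(parse_frames(SAMPLE)):
        print(lib)
```

Run against the excerpt, the only library it reports is the system LLVM, which matches the "failing in LLVM-9" reading; it says nothing about why LLVM faulted.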

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Please unhide your computers or post a link to the troubled host.

Paul
Joined: 3 May 07
Posts: 121
Credit: 1654517150
RAC: 20367

What are you looking for? Is this sufficient?

 

Created: 6 Feb 2016 5:57:42 UTC
Total credit: 89,263,749
Average credit: 51,378.48
CPU type: AuthenticAMD AMD Ryzen 7 3700X 8-Core Processor [Family 23 Model 113 Stepping 0]
Number of processors: 16
Coprocessors: AMD AMD Radeon (TM) RX 480 Graphics (POLARIS10, DRM 3.33.0, 5.3.6-300.fc31.x86_64, LLVM 9.0.0) (8192MB)
Operating system: Linux Fedora Fedora release 31 (Thirty One) [5.3.6-300.fc31.x86_64|libc 2.30 (GNU libc)]
BOINC client version: 7.14.2
Memory: 16023.15 MiB
Cache: 512 KiB
Swap space: 5120 MiB
Total disk space: 811.93 GiB
Free disk space: 167.05 GiB
Measured floating point speed: 7490.12 million ops/sec
Measured integer speed: 28631.96 million ops/sec
Average upload rate: 21.67 KiB/sec
Average download rate: 81.77 KiB/sec
Average turnaround time: 0.85 days
 
Number of times client has contacted server: 14851
 
% of time BOINC client is running: 98.6165 %
While BOINC running, % of time work is allowed: 99.4494 %
Task duration correction factor: 1.252552

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

I'm not really sure what I'm looking for, but going through the latest tasks (failed or not), scheduler logs and system specs might point to something. And if I can't spot it, as a mostly Windows user, then others might see something amiss and have advice for you to try.
If you're concerned about what will be shown, click on my alias and browse my computers; I think it's harmless to show my computers here on the site.

As for solving your problem, start by making sure that the graphics driver, and especially OpenCL, is installed correctly. Other than that I'll have to defer to one of the resident Linux specialists.

Paul
Joined: 3 May 07
Posts: 121
Credit: 1654517150
RAC: 20367

Thanks Holmis, much appreciated. Sent you my host link.

I am using the OSS driver and OpenCL stack, so I didn't have to install anything.  That's not to say it's not broken, only that I didn't knowingly break it.  SETI@Home has been broken for 5 years now, after working for 3, so who knows.  I'm just hoping that someone has an idea why this is happening all of a sudden.  It's very likely that the stack was updated, but the main problem remains that when there are problems in the stack it's impossible, AFAIK, to troubleshoot.  I just rely on folks recognizing the error, or knowing from some upstream source that there is a bug, so I can check whether that bug exists on my system.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109974520313
RAC: 29659463

Paul wrote:
My WU completion rate has been pretty consistent for the last couple of years, but I upgraded the OS to F31 and E@H tasks immediately started failing.  All outputs look the same, failing in LLVM-9.  Can someone interpret this for me?

Nope, I can't :-).

Let's summarise things.  Correct me if I get it wrong.  You had an older version of the OS (presumably Fedora) and you upgraded to the latest.  Do you remember what you had to do with the older install to get OpenCL libs installed and everything working properly?  Did you repeat this procedure for the latest install, using the latest (presumably compatible) versions of OpenCL?  You don't mention what GPU you are using and your computers are hidden so I can't look to see.  Could it be that your GPU isn't supported any more by the latest drivers?  What exactly is your GPU?

My guess is that there is some sort of incompatibility between the OS and/or the OpenCL libs you have installed and/or your hardware.

I'm running lots of AMD GPUs on a Linux distribution (PCLinuxOS) which is not supported directly by the stuff that AMD supplies in the AMDGPU-PRO package.  The supported systems are Red Hat, Ubuntu, and OpenSUSE.  Does the Red Hat version also support Fedora?  Do the Fedora maintainers provide their own version of that package?  Have you asked about this on Fedora Forums?

Here is a summary of what I do with relatively modern GPUs, seeing as I have a totally unsupported OS.  PCLOS is RPM based so I download the Red Hat version of AMDGPU-PRO and extract the contents (approx 50 separate RPMs).  I select about 5 of those that contain libs that I need - mainly pertaining to OpenCL.  I install these bits under two main paths - /opt/amdgpu/ and /opt/amdgpu-pro/.   I could give you a list of the RPMs I use and the filenames of everything I install.  It's not that large a list.

There are a couple of other bits that go elsewhere (eg. under /etc/).  I make sure that BOINC can find what it needs by setting the LD_LIBRARY_PATH environment variable

LD_LIBRARY_PATH=/opt/amdgpu-pro/lib64:/opt/amdgpu/lib64

in the script I use to launch BOINC.  If you have OpenCL properly installed, you should be able to run clinfo and have it report the OpenCL capabilities of your GPU.  Have you tried running clinfo to see what it says?  Here is a small extract from a terminal session where I run clinfo (it's not in my $PATH) with LD_LIBRARY_PATH set and grab just the first 15 lines.  I do this after a new install to make sure everything looks OK.

[gary@i3-9100-01 ~]$ cd /opt/amdgpu-pro/bin

[gary@i3-9100-01 bin]$ LD_LIBRARY_PATH=/opt/amdgpu-pro/lib64:/opt/amdgpu/lib64 ./clinfo | head -15

Number of platforms:                             1
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 2.1 AMD-APP (2671.3)
  Platform Name:                                 AMD Accelerated Parallel Processing
  Platform Vendor:                               Advanced Micro Devices, Inc.
  Platform Extensions:                           cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

  Platform Name:                                 AMD Accelerated Parallel Processing
Number of devices:                               1
  Device Type:                                   CL_DEVICE_TYPE_GPU
  Vendor ID:                                     1002h
  Board name:                                    Radeon RX 570 Series
  Device Topology:                               PCI[ B#1, D#0, F#0 ]
  Max compute units:                             32

[gary@i3-9100-01 bin]$
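
The same check can be scripted. Below is a rough Python sketch, not a tested recipe: `with_library_path` and `clinfo_head` are hypothetical helper names, and the two library directories are simply the ones used in the terminal session above. Unlike the one-line command, the helper keeps any LD_LIBRARY_PATH that is already set by appending it after the new directories; that is a design choice of this sketch, not something the launch script is known to do.

```python
import os
import subprocess


def with_library_path(lib_dirs, base_env=None):
    """Return a copy of the environment with lib_dirs placed first in LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    parts = list(lib_dirs)
    if existing:
        parts.append(existing)  # keep whatever was already configured, at lower priority
    env["LD_LIBRARY_PATH"] = ":".join(parts)
    return env


def clinfo_head(clinfo_path, lib_dirs, lines=15):
    """Run clinfo with the adjusted library path and return its first few output lines."""
    result = subprocess.run(
        [clinfo_path],
        env=with_library_path(lib_dirs),
        capture_output=True,
        text=True,
        check=False,  # a crashing clinfo is useful information too, so do not raise
    )
    return result.stdout.splitlines()[:lines]


if __name__ == "__main__":
    # Paths taken from the session above; adjust for a different install layout.
    dirs = ["/opt/amdgpu-pro/lib64", "/opt/amdgpu/lib64"]
    for line in clinfo_head("/opt/amdgpu-pro/bin/clinfo", dirs):
        print(line)
```

The environment returned by `with_library_path` could also be passed to whatever starts the BOINC client, so the client and clinfo see the same libraries.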

I've been installing the OpenCL libs this way since the 16.60 version of the amdgpu-pro package in late 2016.  There have been changes along the way but I've been able to work out what to do to handle the differences.  The latest version I've tried is 19.10.  There doesn't seem to be any real difference in crunching performance with the different versions of the libs.  There have been reliability benefits by keeping up with the latest versions of the amdgpu graphics driver that's built in to the kernel.  To get these benefits, you need to be running relatively recent kernels.

One thing you could do to start with is post a copy of all the event log messages you get when you launch BOINC.  That should provide some information about what BOINC thinks of the OpenCL capabilities of your GPU.  Here is an example of what I see on one of mine.

03-Nov-2019 20:12:28 [---] cc_config.xml not found - using defaults
03-Nov-2019 20:12:28 [---] Starting BOINC client version 7.15.0 for x86_64-pc-linux-gnu
03-Nov-2019 20:12:28 [---] This a development version of BOINC and may not function properly
03-Nov-2019 20:12:28 [---] log flags: file_xfer, sched_ops, task
03-Nov-2019 20:12:28 [---] Libraries: libcurl/7.66.0 OpenSSL/1.1.0k zlib/1.2.11 brotli/1.0.7 libidn2/2.0.5 libpsl/0.20.2 (+libidn2/2.0.5) librtmp/2.3
03-Nov-2019 20:12:28 [---] Running as a daemon
03-Nov-2019 20:12:28 [---] Data directory: /home/gary/BOINC
03-Nov-2019 20:12:29 [---] OpenCL: AMD/ATI GPU 0: Radeon RX 570 Series (driver version 2671.3, device version OpenCL 1.2 AMD-APP (2671.3), 3980MB, 3980MB available, 5095 GFLOPS peak)
03-Nov-2019 20:12:29 [---] [libc detection] gathered: 2.30, GNU libc
03-Nov-2019 20:12:29 [---] Host name: i3-9100-01
03-Nov-2019 20:12:29 [---] Processor: 4 GenuineIntel Intel(R) Core(TM) i3-9100F CPU @ 3.60GHz [Family 6 Model 158 Stepping 10]
03-Nov-2019 20:12:29 [---] Processor features:  ### < edited out to save space> ###
03-Nov-2019 20:12:29 [---] OS: Linux PCLinuxOS: PCLinuxOS 2019 [5.3.6-pclos1|libc 2.30 (GNU libc)]
03-Nov-2019 20:12:29 [---] Memory: 7.72 GB physical, 2.00 GB virtual
03-Nov-2019 20:12:29 [---] Disk: 100.14 GB total, 94.75 GB free
03-Nov-2019 20:12:29 [---] Local time is UTC +10 hours
03-Nov-2019 20:12:29 [Einstein@Home] Found app_config.xml
03-Nov-2019 20:12:29 [---] Config: GUI RPCs allowed from:
03-Nov-2019 20:12:29 [---]     192.168.0.2
03-Nov-2019 20:12:29 [---]     192.168.0.3
03-Nov-2019 20:12:29 [---]     192.168.0.4
03-Nov-2019 20:12:29 [Einstein@Home] General prefs: from Einstein@Home (last modified ---)
03-Nov-2019 20:12:29 [Einstein@Home] Computer location: school
03-Nov-2019 20:12:29 [---] General prefs: using separate prefs for school
03-Nov-2019 20:12:29 [---] Reading preferences override file
03-Nov-2019 20:12:29 [---] Preferences:
03-Nov-2019 20:12:29 [---]    max memory usage when active: 7509.97 MB
03-Nov-2019 20:12:29 [---]    max memory usage when idle: 7905.23 MB
03-Nov-2019 20:12:29 [---]    max disk usage: 20.00 GB
03-Nov-2019 20:12:29 [---]    (to change preferences, visit a project web site or select Preferences in the Manager)
03-Nov-2019 20:12:29 [---] Setting up project and slot directories
03-Nov-2019 20:12:29 [---] Checking active tasks
03-Nov-2019 20:12:29 [Einstein@Home] URL http://einstein.phys.uwm.edu/; Computer ID 541492; resource share 900
03-Nov-2019 20:12:29 [Einstein@Home] Your settings do not allow fetching tasks for CPU.  To fix this, you can change Project Preferences on the project's web site.
03-Nov-2019 20:12:29 [---] Setting up GUI RPC socket
03-Nov-2019 20:12:29 [---] Checking presence of 908 project files

 

Cheers,
Gary.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

I'd started composing a few thoughts on this problem but see that Gary has given a much better reply than I ever could when it comes to Linux, so I'll leave you in his capable care. ;-)

Paul
Joined: 3 May 07
Posts: 121
Credit: 1654517150
RAC: 20367

Hey Gary!

I think you and I have been around this block once before.  I'm using an all-OSS stack, and I have not been able to get the hybrid system you described working on my system.  My computers are no longer hidden, if you want to look at that stuff.

The OSS stack only seems to get better. Now even clpeak and clinfo work all the time, when they used to give errors or show missing pieces.  So, I'm more reluctant than before to go to the -PRO.  I keep up with the kernels every week and I know Fedora stays a bit ahead of Debian/Ubuntu on that front.

BOINC log:

 03-Nov-2019 17:57:54 [---] Starting BOINC client version 7.14.2 for x86_64-pc-linux-gnu
 03-Nov-2019 17:57:54 [---] log flags: file_xfer, sched_ops, task, coproc_debug
 03-Nov-2019 17:57:54 [---] Libraries: libcurl/7.66.0 OpenSSL/1.1.1d-fips zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.0/openssl/zlib nghttp2/1.39.2
 03-Nov-2019 17:57:54 [---] Data directory: /home/pdestefa/local/BOINC
 03-Nov-2019 17:57:55 [---] [coproc] launching child process at /home/pdestefa/local/BOINC/boinc
 03-Nov-2019 17:57:55 [---] [coproc] with data directory /home/pdestefa/local/BOINC
 03-Nov-2019 17:57:55 [---] OpenCL: AMD/ATI GPU 0: AMD Radeon (TM) RX 480 Graphics (POLARIS10, DRM 3.33.0, 5.3.7-301.fc31.x86_64, LLVM 9.0.0) (driver version 19.2.2, device version OpenCL 1.1 Mesa 19.2.2, 8192MB, 8192MB available, 3709 GFLOPS peak)
 03-Nov-2019 17:57:55 [---] OpenCL CPU: pthread-AMD Ryzen 7 3700X 8-Core Processor (OpenCL driver vendor: The pocl project, driver version 1.5-pre, device version OpenCL 1.2 pocl HSTR: pthread-x86_64-unknown-linux-gnu-znver1)
 03-Nov-2019 17:57:55 [---] [coproc] NVIDIA: libcuda.so: cannot open shared object file: No such file or directory
 03-Nov-2019 17:57:55 [---] [coproc] ATI: libaticalrt.so: cannot open shared object file: No such file or directory
 03-Nov-2019 17:57:55 [---] [libc detection] gathered: 2.30, GNU libc
 03-Nov-2019 17:57:55 [---] Host name: wrangler
 03-Nov-2019 17:57:55 [---] Processor: 16 AuthenticAMD AMD Ryzen 7 3700X 8-Core Processor [Family 23 Model 113 Stepping 0]
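
To compare what BOINC reports on the two hosts, the "OpenCL:" startup line can be split into its parts. This is a minimal Python sketch; `opencl_gpus` is a made-up name, and the regular expression is only fitted to the two GPU lines quoted in this thread, so other client versions or vendors may need a different pattern.

```python
import re

# Fitted to lines like:
#   OpenCL: AMD/ATI GPU 0: <name> (driver version X, device version Y, ...)
OPENCL_GPU_RE = re.compile(
    r"OpenCL: (?P<vendor>\S+) GPU (?P<index>\d+): (?P<name>.+) "
    r"\(driver version (?P<driver>[^,]+), device version (?P<device_version>[^,]+),"
)


def opencl_gpus(log_text):
    """Return one dict per OpenCL GPU line found in a BOINC event log."""
    gpus = []
    for line in log_text.splitlines():
        m = OPENCL_GPU_RE.search(line)
        if m:
            gpus.append(m.groupdict())
    return gpus


# The two GPU lines from the event logs posted in this thread, plus one
# CPU line that should be ignored.
SAMPLE = """\
03-Nov-2019 20:12:29 [---] OpenCL: AMD/ATI GPU 0: Radeon RX 570 Series (driver version 2671.3, device version OpenCL 1.2 AMD-APP (2671.3), 3980MB, 3980MB available, 5095 GFLOPS peak)
03-Nov-2019 17:57:55 [---] OpenCL: AMD/ATI GPU 0: AMD Radeon (TM) RX 480 Graphics (POLARIS10, DRM 3.33.0, 5.3.7-301.fc31.x86_64, LLVM 9.0.0) (driver version 19.2.2, device version OpenCL 1.1 Mesa 19.2.2, 8192MB, 8192MB available, 3709 GFLOPS peak)
03-Nov-2019 17:57:55 [---] OpenCL CPU: pthread-AMD Ryzen 7 3700X 8-Core Processor (OpenCL driver vendor: The pocl project, driver version 1.5-pre, device version OpenCL 1.2 pocl HSTR: pthread-x86_64-unknown-linux-gnu-znver1)
"""

if __name__ == "__main__":
    for gpu in opencl_gpus(SAMPLE):
        print(gpu["name"], "|", gpu["device_version"])
```

Side by side, the device versions make the difference between the two setups visible: one host reports an AMD-APP device version, the other a Mesa one.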

clinfo:

 Number of platforms                               2
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 19.2.2
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

Platform Name Portable Computing Language
Platform Vendor The pocl project
Platform Version OpenCL 1.2 pocl 1.5-pre, RelWithDebInfo, LLVM 9.0.0, RELOC, SLEEF, DISTRO, POCL_DEBUG
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd
Platform Extensions function suffix POCL

Platform Name Clover
Number of devices 1
Device Name AMD Radeon (TM) RX 480 Graphics (POLARIS10, DRM 3.33.0, 5.3.7-301.fc31.x86_64, LLVM 9.0.0)
Device Vendor AMD
Device Vendor ID 0x1002
Device Version OpenCL 1.1 Mesa 19.2.2
Driver Version 19.2.2
Device OpenCL C Version OpenCL C 1.1
Device Type GPU
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Max compute units 36
Max clock frequency 1288MHz
Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Preferred work group size multiple 64
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 2 / 2
half 8 / 8 (cl_khr_fp16)
float 4 / 4
double 2 / 2 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 8589934592 (8GiB)
Error Correction support No
Max memory allocation 6871947673 (6.4GiB)
Unified memory for Host and Device No
Minimum alignment for any data type 128 bytes
Alignment of base address 32768 bits (4096 bytes)
Global Memory cache type None
Image support No
Local memory type Local
Local memory size 32768 (32KiB)
Max number of constant args 16
Max constant buffer size 2147483647 (2GiB)
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
Profiling Yes
Profiling timer resolution 0ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Device Extensions cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_fp16

Platform Name Portable Computing Language
Number of devices 1
Device Name pthread-AMD Ryzen 7 3700X 8-Core Processor
Device Vendor AuthenticAMD
Device Vendor ID 0x6c636f70
Device Version OpenCL 1.2 pocl HSTR: pthread-x86_64-unknown-linux-gnu-znver1
Driver Version 1.5-pre
Device OpenCL C Version OpenCL C 1.2 pocl
Device Type CPU
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 16
Max clock frequency 3600MHz
Device Partition (core)
Max number of sub-devices 16
Supported partition types equally, by counts
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 4096x4096x4096
Max work group size 4096
Preferred work group size multiple 8
Preferred / native vector sizes
char 16 / 16
short 16 / 16
int 8 / 8
long 4 / 4
half 0 / 0 (n/a)
float 8 / 8
double 4 / 4 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 14654013440 (13.65GiB)
Error Correction support No
Max memory allocation 4294967296 (4GiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Global Memory cache type Read/Write
Global Memory cache size 16777216 (16MiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 268435456 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 16384x16384 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 128
Local memory type Global
Local memory size 8388608 (8MiB)
Max number of constant args 8
Max constant buffer size 8388608 (8MiB)
Max size of kernel argument 1024
Queue properties
Out-of-order execution Yes
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 1ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
printf() buffer size 16777216 (16MiB)
Built-in kernels (n/a)
Device Extensions cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64

NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) Clover
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [MESA]
clCreateContext(NULL, ...) [default] Success [MESA]
clCreateContext(NULL, ...) [other] Success [POCL]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
Platform Name Clover
Device Name AMD Radeon (TM) RX 480 Graphics (POLARIS10, DRM 3.33.0, 5.3.7-301.fc31.x86_64, LLVM 9.0.0)
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name Clover
Device Name AMD Radeon (TM) RX 480 Graphics (POLARIS10, DRM 3.33.0, 5.3.7-301.fc31.x86_64, LLVM 9.0.0)
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name Clover
Device Name AMD Radeon (TM) RX 480 Graphics (POLARIS10, DRM 3.33.0, 5.3.7-301.fc31.x86_64, LLVM 9.0.0)

ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.12
ICD loader Profile OpenCL 2.2
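
Since the clinfo output is long, here is a small Python sketch that reduces it to the one thing that differs between the setups in this thread: which OpenCL implementation each platform comes from. The helper names and the three labels are illustrative assumptions; the substrings it looks for ("AMD-APP", "Mesa", "pocl") are just the ones that appear in the Platform Version lines posted here.

```python
import re

# Accepts both the aligned form ("Platform Version:   OpenCL ...") and the
# flattened form ("Platform Version OpenCL ...") seen in this thread.
PLATFORM_VERSION_RE = re.compile(r"^\s*Platform Version:?\s+(OpenCL\s.+?)\s*$")


def platform_versions(clinfo_text):
    """Return the Platform Version strings in the order clinfo printed them."""
    versions = []
    for line in clinfo_text.splitlines():
        m = PLATFORM_VERSION_RE.match(line)
        if m:
            versions.append(m.group(1))
    return versions


def classify(version):
    """Map a Platform Version string to a short label (labels are this sketch's own)."""
    if "AMD-APP" in version:
        return "AMD-APP"
    if "Mesa" in version:
        return "Mesa Clover"
    if "pocl" in version:
        return "pocl"
    return "unknown"


# Lines copied from the clinfo outputs posted in this thread.
SAMPLE = """\
  Platform Name                                   Clover
  Platform Version                                OpenCL 1.1 Mesa 19.2.2
Platform Name Portable Computing Language
Platform Version OpenCL 1.2 pocl 1.5-pre, RelWithDebInfo, LLVM 9.0.0, RELOC, SLEEF, DISTRO, POCL_DEBUG
  Platform Version:                              OpenCL 2.1 AMD-APP (2671.3)
Device Version OpenCL 1.1 Mesa 19.2.2
"""

if __name__ == "__main__":
    for version in platform_versions(SAMPLE):
        print(classify(version), "<-", version)
```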

 

Sorry, I couldn't seem to get preformatted text to work for the whole post.

Paul
Joined: 3 May 07
Posts: 121
Credit: 1654517150
RAC: 20367

Hey Gary!  I tried your method again, and now it seems all OpenCL is broken on my system.  clinfo crashes, whereas before it worked fine.  BOINC cannot find any OpenCL devices.

Here is what I did. I got all the AMD driver stuff and ran amd-install, which installs just the non-PRO stuff.  That worked for all the packages...except two.

amdgpu-core-19.30-934563

amdgpu-dkms

Now, as far as I can tell, these are not the important packages in the "hybrid" system you described. Is that right?

Second, the install did not install any -PRO things, but I see that you are adding -pro/ files to your LD path. So, I'm not sure I installed all the same things you did.  You said 50 pkgs?!  I just want to double check: you installed both non-PRO and PRO libraries, but *not* the actual driver or dkms packages.  Is that right?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109974520313
RAC: 29659463

To answer your specific questions, I studied the install scripts provided by AMD to work out the absolute minimum to install to provide OpenCL (eg. things pointed to by --headless or --compute type options).  I found what I needed in a very limited set of rpms - from memory just 5 - and some of that may not have been required.

It's probably going to take me a while to document everything I have done so it will be in a form useful to you.  This is just a preliminary response to let you know that I'm working on it.  I do understand quite a bit more about what I'm doing now, so hopefully I can explain things a bit better this time on the merry-go-round :-).

As a teaser, you might be interested to know that I'm now able to choose any version of the OpenCL compute libs right up to the very latest 19.30 package that AMD released late last year.  With just a few components from that 19.30 package, and with the latest kernels/amdgpu modules (I'm using a 5.4.6 kernel), I can now run on *any* GCN GPU including my swag of GCN 1st Gen (Southern Islands) GPUs.  The hosts with these GPUs were running a mid-2016 version of PCLOS which had the last version of Xorg that supported the proprietary fglrx/OpenCL components from the final Catalyst package that AMD released before deprecating fglrx.

I've been documenting my attempts to get valid results on SI series GPUs in this thread.  I started the story around the middle of last year when I was able to start using the 19.10 version of the former AMDGPU-PRO package for testing.  Just recently, I've updated that thread with the success story, now that I've moved to the latest of everything, including the 19.30 version of AMDGPU-PRO (Radeon Software for Linux).  Links, particularly in the opening post, are not likely to show much since, once confirmed working well, machines are being shut down again until the autumn.  Tasks referred to in earlier times are no longer going to exist in the current online database.  Current tasks will tend to disappear quite quickly after a machine is shut down.

Cheers,
Gary.

Paul
Joined: 3 May 07
Posts: 121
Credit: 1654517150
RAC: 20367

Okay, I follow that.  I can read the scripts, too.  Thanks for explaining.
