new version of Ubuntu Mate (19.10). Old problem of loading to GPU (I think)

jay
Joined: 25 Jan 07
Posts: 99
Credit: 84044023
RAC: 0
Topic 219759

Greetings

I have an AMD 7750 GPU and was successfully running the GPU work units on Ubuntu MATE 18.04 and then 19.04 (Linux, not Windows).

I just installed the BETA version of Mate 19.10. I hit a bump with BOINC not seeing the GPU.

In the past, I have been able to use the 'standard' release packages - without trying to compile AMD code.

This time, BOINC 'saw' the GPU (found the opencl library file) when I installed: mesa-opencl-icd.
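For anyone who wants to check which OpenCL drivers are registered on their system, here is a minimal Python sketch. It assumes the usual Linux ICD loader convention of one .icd file per driver under /etc/OpenCL/vendors; the function name list_opencl_icds is made up for illustration, and the directory may differ on other distributions.

```python
from pathlib import Path


def list_opencl_icds(vendors_dir="/etc/OpenCL/vendors"):
    """Map each .icd file name to the library name or path it contains.

    The default directory is an assumption based on the common Linux
    ICD loader layout; pass another path if your system differs.
    """
    root = Path(vendors_dir)
    if not root.is_dir():
        # No ICD directory at all: no OpenCL drivers are registered here.
        return {}
    return {p.name: p.read_text().strip() for p in sorted(root.glob("*.icd"))}


if __name__ == "__main__":
    for name, library in list_opencl_icds().items():
        print(f"{name}: {library}")
```

If the result is empty, BOINC would probably have no OpenCL platform to detect.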

The BOINC log now looks like this:


Fri 11 Oct 2019 02:47:56 AM EDT |  | Starting BOINC client version 7.16.3 for x86_64-pc-linux-gnu
Fri 11 Oct 2019 02:47:56 AM EDT |  | log flags: file_xfer, sched_ops, task
Fri 11 Oct 2019 02:47:56 AM EDT |  | Libraries: libcurl/7.65.3 OpenSSL/1.1.1c zlib/1.2.11 libidn2/2.2.0 libpsl/0.20.2 (+libidn2/2.0.5) libssh/0.9.0/openssl/zlib nghttp2/1.39.2 librtmp/2.3
Fri 11 Oct 2019 02:47:56 AM EDT |  | Data directory: /var/lib/boinc-client
Fri 11 Oct 2019 02:47:56 AM EDT |  | OpenCL: AMD/ATI GPU 0: AMD VERDE (DRM 2.50.0, 5.3.0-17-generic, LLVM 9.0.0) (driver version 19.2.0, device version OpenCL 1.1 Mesa 19.2.0, 2048MB, 2048MB available, 512 GFLOPS peak)
Fri 11 Oct 2019 02:47:56 AM EDT |  | [libc detection] gathered: 2.30, Ubuntu GLIBC 2.30-0ubuntu2
Fri 11 Oct 2019 02:47:56 AM EDT |  | Host name: pc-14
Fri 11 Oct 2019 02:47:56 AM EDT |  | Processor: 8 AuthenticAMD AMD FX(tm)-8150 Eight-Core Processor [Family 21 Model 1 Stepping 2]
Fri 11 Oct 2019 02:47:56 AM EDT |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt fma4 nodeid_msr topoext perfctr_core perfctr_nb cpb hw_pstate ssbd ibpb vmmcall arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
Fri 11 Oct 2019 02:47:56 AM EDT |  | OS: Linux Ubuntu: Ubuntu Eoan Ermine (development branch) [5.3.0-17-generic|libc 2.30 (Ubuntu GLIBC 2.30-0ubuntu2)]
Fri 11 Oct 2019 02:47:56 AM EDT |  | Memory: 11.61 GB physical, 48.83 GB virtual
Fri 11 Oct 2019 02:47:56 AM EDT |  | Disk: 133.57 GB total, 125.44 GB free
Fri 11 Oct 2019 02:47:56 AM EDT |  | Local time is UTC -4 hours

 

But, alas, I get run-time errors after about 30 seconds.

So I assume it is a loading problem.

Here is a sample of a failed task result:


<core_client_version>7.16.3</core_client_version>
<![CDATA[
<message> process exited with code 11 (0xb, -245)</message>
<stderr_txt>

02:10:01 (5107): [normal]: This Einstein@home App was built at: Jan 16 2017 08:09:16
02:10:01 (5107): [normal]: Start of BOINC application '../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati'.

02:10:01 (5107): [debug]: 1e+16 fp, 3.4e+09 fp/s, 3097269 s, 860h21m09s42
command line: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati --inputfile ../../projects/einstein.phys.uwm.edu/LATeah1062L07.dat --alpha 1.41058464281 --delta -0.444366280137 --skyRadius 5.526880e-07 --ldiBins 30 --f0start 324.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 2.512676418e-15 --ephemdir ../../projects/einstein.phys.uwm.edu/JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah1062L07_0332_15029665.dat --debug 1 --device 0 -o LATeah1062L07_332.0_0_0.0_15029665_1_0.out
output files: 'LATeah1062L07_332.0_0_0.0_15029665_1_0.out' '../../projects/einstein.phys.uwm.edu/LATeah1062L07_332.0_0_0.0_15029665_1_0' 'LATeah1062L07_332.0_0_0.0_15029665_1_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah1062L07_332.0_0_0.0_15029665_1_1'

02:10:01 (5107): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
02:10:01 (5107): [debug]: glibc version/release: 2.30/stable

02:10:01 (5107): [debug]: Set up communication with graphics process.
boinc_get_opencl_ids returned [0x2c54d28 , 0x7f1b05d401e0]
Using OpenCL platform provided by: Mesa
Using OpenCL device "AMD VERDE (DRM 2.50.0, 5.3.0-17-generic, LLVM 9.0.0)" by: AMD
Max allocation limit: 1503238553
Global mem size: 2147483648
OpenCL device has FP64 support
% Opening inputfile: ../../projects/einstein.phys.uwm.edu/LATeah1062L07.dat

% Total amount of photon times: 8950
% Preparing toplist of length: 10
% Read 1631 binary points
read_checkpoint(): Couldn't open file 'LATeah1062L07_332.0_0_0.0_15029665_1_0.out.cpt': No such file or directory (2)

% fft_size: 16777216 (0x1000000); alloc: 67108872
% Sky point 1/1

% Binary point 1/1631
% Creating FFT plan.
% fft length: 16777216 (0x1000000)

% Scratch buffer size: 136314880
% Starting semicoherent search over f0 and f1.

% nf1dots: 41 df1dot: 2.512676418e-15 f1dot_start: -1e-13 f1dot_band: 1e-13

% Filling array of photon pairs
ac_rtld error: shdr->sh_size & 3
ELF error: invalid section index
-- signal handler called: signal 1 4 stack frames obtained for this thread:

Frame 14:

Binary file: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati (0x48b101)

Source file: hs_boinc_extras.c (Function: sighandler / Line: 291)

Frame 13:

Binary file: /usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_radeonsi.so (0x7f1affad4e21)

Offset info: +0x14ce21

Frame 12:

Binary file: /usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_radeonsi.so (0x7f1affad4e21)

Offset info: +0x14ce21

Frame 11:

Binary file: /lib/x86_64-linux-gnu/libMesaOpenCL.so.1 (0x7f1b049b319b)

Offset info: +0x37319b

Frame 10:

Binary file: /lib/x86_64-linux-gnu/libMesaOpenCL.so.1 (0x7f1b049b3bbf)

Offset info: +0x373bbf

Frame 9:

Binary file: /lib/x86_64-linux-gnu/libMesaOpenCL.so.1 (0x7f1b049b0815)

Offset info: +0x370815

Frame 8:

Binary file: /lib/x86_64-linux-gnu/libMesaOpenCL.so.1 (0x7f1b049b0fa3)

Offset info: +0x370fa3

Frame 7:

Binary file: /lib/x86_64-linux-gnu/libMesaOpenCL.so.1 (0x7f1b0499f39d)

Offset info: +0x35f39d

Frame 6:

Binary file: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati (0x48fe01)

Offset info: opencl_setup_photon_pairs_array+0x4c1

Source file: unknown (Function: opencl_setup_photon_pairs_array / Line: 0)

Frame 5:

Binary file: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati (0x47480d)

Offset info: setup_photon_pairs_array+0x36d

Source file: HSgammaPulsar.c (Function: setup_photon_pairs_array / Line: 2107)

Frame 4:

Binary file: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati (0x47e28e)

Offset info: MAIN+0x4dee

Source file: HSgammaPulsar.c (Function: MAIN / Line: 4866)

Frame 3:

Binary file: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati (0x46c06f)

Offset info: main+0x5ff

Source file: hs_boinc_extras.c (Function: worker / Line: 833)

Source file: hs_boinc_extras.c (Function: main / Line: 1039)

Frame 2:

Binary file: /lib/x86_64-linux-gnu/libc.so.6 (0x7f1b05d871e3)

Offset info: __libc_start_main+0xf3

Frame 1:

Binary file: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati (0x46e569)

Source file: unknown (Function: _start / Line: 0)

End of stcaktrace

02:10:13 (5107): called boinc_finish
Warning: Program terminating, but clFFT resources not freed. Please consider explicitly calling clfftTeardown( ).
</stderr_txt>
]]>

(See https://einsteinathome.org/task/888883349)

 

Any suggestions?

Thanks in advance,

Jay

Aaron Puchert
Joined: 30 May 14
Posts: 13
Credit: 2651954
RAC: 0

I have the same issue on openSUSE Tumbleweed. It's an error in the new "ac_rtld" runtime linker in Mesa that came with https://patchwork.freedesktop.org/patch/303185/. This linker takes the machine code generated from possibly multiple source code files and turns it into one program.

The linker seems to stumble over a section that contains executable code and whose size isn't a multiple of four bytes. (See https://gitlab.freedesktop.org/mesa/mesa/blob/mesa-19.2.3/src/amd/common/ac_rtld.c#L369. I can only guess that instructions are always four bytes long.) We also get an error message from libelf, "invalid section index", probably from elf_strptr, though I'm just guessing here.
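To make the "multiple of four bytes" point concrete, here is a small Python sketch of the bit test. It assumes that the log line "ac_rtld error: shdr->sh_size & 3" quotes the failing condition verbatim; the function name is_word_aligned is made up for illustration and is not part of Mesa.

```python
def is_word_aligned(size: int) -> bool:
    """Return True when size is a multiple of four bytes."""
    # A multiple of four always has its two lowest bits clear, so
    # `size & 3` is non-zero exactly when `size % 4 != 0`.
    return (size & 3) == 0


if __name__ == "__main__":
    # 64 is a multiple of four, 66 is not.
    for size in (64, 66):
        print(size, is_word_aligned(size))
```

Under that assumption, a code section 66 bytes long would trip the check, while one of 64 bytes would pass.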

If you're not a programmer, the best way to proceed is probably to ask on the mesa-users mailing list or IRC channel. I don't really have the time to investigate this now, otherwise I might do it myself.

Aaron Puchert

It seems to me that Mesa 20 solves the problem; at least the tasks no longer error out immediately.
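To see which Mesa version BOINC is actually using, the version can be read from the "OpenCL:" line of the BOINC startup log. Below is a minimal Python sketch using the log line posted earlier in this thread. The function name mesa_version is made up for illustration, and the threshold of 20 is only the observation above, not a confirmed fix.

```python
import re

# The "OpenCL:" line from the BOINC startup log posted in this thread.
LOG_LINE = (
    "OpenCL: AMD/ATI GPU 0: AMD VERDE (DRM 2.50.0, 5.3.0-17-generic, LLVM 9.0.0) "
    "(driver version 19.2.0, device version OpenCL 1.1 Mesa 19.2.0, "
    "2048MB, 2048MB available, 512 GFLOPS peak)"
)


def mesa_version(line):
    """Return the Mesa version in a log line as (major, minor, patch), or None."""
    match = re.search(r"Mesa (\d+)\.(\d+)\.(\d+)", line)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    version = mesa_version(LOG_LINE)
    # Assumption: major version 20 or later is the one reported to work.
    print(version, version is not None and version[0] >= 20)
```

For the log above this gives (19, 2, 0), i.e. a Mesa older than 20.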
