Greetings
I have an AMD 7750 GPU. I was successfully running the GPU work units on Ubuntu MATE 18.04 and then 19.04 (Linux, not Windows).
I just installed the beta version of MATE 19.10 and hit a bump: BOINC did not see the GPU.
In the past, I have been able to use the 'standard' release packages without having to compile any AMD code.
This time, BOINC 'saw' the GPU (found the OpenCL library file) once I installed the mesa-opencl-icd package.
The BOINC log now looks like this:
Fri 11 Oct 2019 02:47:56 AM EDT | | Starting BOINC client version 7.16.3 for x86_64-pc-linux-gnu
Fri 11 Oct 2019 02:47:56 AM EDT | | log flags: file_xfer, sched_ops, task
Fri 11 Oct 2019 02:47:56 AM EDT | | Libraries: libcurl/7.65.3 OpenSSL/1.1.1c zlib/1.2.11 libidn2/2.2.0 libpsl/0.20.2 (+libidn2/2.0.5) libssh/0.9.0/openssl/zlib nghttp2/1.39.2 librtmp/2.3
Fri 11 Oct 2019 02:47:56 AM EDT | | Data directory: /var/lib/boinc-client
Fri 11 Oct 2019 02:47:56 AM EDT | | OpenCL: AMD/ATI GPU 0: AMD VERDE (DRM 2.50.0, 5.3.0-17-generic, LLVM 9.0.0) (driver version 19.2.0, device version OpenCL 1.1 Mesa 19.2.0, 2048MB, 2048MB available, 512 GFLOPS peak)
Fri 11 Oct 2019 02:47:56 AM EDT | | [libc detection] gathered: 2.30, Ubuntu GLIBC 2.30-0ubuntu2
Fri 11 Oct 2019 02:47:56 AM EDT | | Host name: pc-14
Fri 11 Oct 2019 02:47:56 AM EDT | | Processor: 8 AuthenticAMD AMD FX(tm)-8150 Eight-Core Processor [Family 21 Model 1 Stepping 2]
Fri 11 Oct 2019 02:47:56 AM EDT | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt fma4 nodeid_msr topoext perfctr_core perfctr_nb cpb hw_pstate ssbd ibpb vmmcall arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
Fri 11 Oct 2019 02:47:56 AM EDT | | OS: Linux Ubuntu: Ubuntu Eoan Ermine (development branch) [5.3.0-17-generic|libc 2.30 (Ubuntu GLIBC 2.30-0ubuntu2)]
Fri 11 Oct 2019 02:47:56 AM EDT | | Memory: 11.61 GB physical, 48.83 GB virtual
Fri 11 Oct 2019 02:47:56 AM EDT | | Disk: 133.57 GB total, 125.44 GB free
Fri 11 Oct 2019 02:47:56 AM EDT | | Local time is UTC -4 hours
But, alas, the tasks fail with run-time errors after about 30 seconds, so I assume it is a loading problem.
Here is a sample of a failed task result:
<core_client_version>7.16.3</core_client_version>
<![CDATA[
<message>
process exited with code 11 (0xb, -245)</message>
<stderr_txt>
02:10:01 (5107): [normal]: This Einstein@home App was built at: Jan 16 2017 08:09:16
02:10:01 (5107): [normal]: Start of BOINC application '../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati'.
02:10:01 (5107): [debug]: 1e+16 fp, 3.4e+09 fp/s, 3097269 s, 860h21m09s42
command line: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati --inputfile ../../projects/einstein.phys.uwm.edu/LATeah1062L07.dat --alpha 1.41058464281 --delta -0.444366280137 --skyRadius 5.526880e-07 --ldiBins 30 --f0start 324.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 2.512676418e-15 --ephemdir ../../projects/einstein.phys.uwm.edu/JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah1062L07_0332_15029665.dat --debug 1 --device 0 -o LATeah1062L07_332.0_0_0.0_15029665_1_0.out
output files: 'LATeah1062L07_332.0_0_0.0_15029665_1_0.out' '../../projects/einstein.phys.uwm.edu/LATeah1062L07_332.0_0_0.0_15029665_1_0' 'LATeah1062L07_332.0_0_0.0_15029665_1_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah1062L07_332.0_0_0.0_15029665_1_1'
02:10:01 (5107): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
02:10:01 (5107): [debug]: glibc version/release: 2.30/stable
02:10:01 (5107): [debug]: Set up communication with graphics process.
boinc_get_opencl_ids returned [0x2c54d28 , 0x7f1b05d401e0]
Using OpenCL platform provided by: Mesa
Using OpenCL device "AMD VERDE (DRM 2.50.0, 5.3.0-17-generic, LLVM 9.0.0)" by: AMD
Max allocation limit: 1503238553
Global mem size: 2147483648
OpenCL device has FP64 support
% Opening inputfile: ../../projects/einstein.phys.uwm.edu/LATeah1062L07.dat
% Total amount of photon times: 8950
% Preparing toplist of length: 10
% Read 1631 binary points
read_checkpoint(): Couldn't open file 'LATeah1062L07_332.0_0_0.0_15029665_1_0.out.cpt': No such file or directory (2)
% fft_size: 16777216 (0x1000000); alloc: 67108872
% Sky point 1/1
% Binary point 1/1631
% Creating FFT plan.
% fft length: 16777216 (0x1000000)
% Scratch buffer size: 136314880
% Starting semicoherent search over f0 and f1.
% nf1dots: 41 df1dot: 2.512676418e-15 f1dot_start: -1e-13 f1dot_band: 1e-13
% Filling array of photon pairs
ac_rtld error: shdr->sh_size & 3
ELF error: invalid section index
-- signal handler called: signal 1 4 stack frames obtained for this thread:
Frame 14:
Binary file: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati (0x48b101)
Source file: hs_boinc_extras.c (Function: sighandler / Line: 291)
Frame 13:
Binary file: /usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_radeonsi.so (0x7f1affad4e21)
Offset info: +0x14ce21
Frame 12:
Binary file: /usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_radeonsi.so (0x7f1affad4e21)
Offset info: +0x14ce21
Frame 11:
Binary file: /lib/x86_64-linux-gnu/libMesaOpenCL.so.1 (0x7f1b049b319b)
Offset info: +0x37319b
Frame 10:
Binary file: /lib/x86_64-linux-gnu/libMesaOpenCL.so.1 (0x7f1b049b3bbf)
Offset info: +0x373bbf
Frame 9:
Binary file: /lib/x86_64-linux-gnu/libMesaOpenCL.so.1 (0x7f1b049b0815)
Offset info: +0x370815
Frame 8:
Binary file: /lib/x86_64-linux-gnu/libMesaOpenCL.so.1 (0x7f1b049b0fa3)
Offset info: +0x370fa3
Frame 7:
Binary file: /lib/x86_64-linux-gnu/libMesaOpenCL.so.1 (0x7f1b0499f39d)
Offset info: +0x35f39d
Frame 6:
Binary file: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati (0x48fe01)
Offset info: opencl_setup_photon_pairs_array+0x4c1
Source file: unknown (Function: opencl_setup_photon_pairs_array / Line: 0)
Frame 5:
Binary file: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati (0x47480d)
Offset info: setup_photon_pairs_array+0x36d
Source file: HSgammaPulsar.c (Function: setup_photon_pairs_array / Line: 2107)
Frame 4:
Binary file: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati (0x47e28e)
Offset info: MAIN+0x4dee
Source file: HSgammaPulsar.c (Function: MAIN / Line: 4866)
Frame 3:
Binary file: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati (0x46c06f)
Offset info: main+0x5ff
Source file: hs_boinc_extras.c (Function: worker / Line: 833)
Source file: hs_boinc_extras.c (Function: main / Line: 1039)
Frame 2:
Binary file: /lib/x86_64-linux-gnu/libc.so.6 (0x7f1b05d871e3)
Offset info: __libc_start_main+0xf3
Frame 1:
Binary file: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati (0x46e569)
Source file: unknown (Function: _start / Line: 0)
End of stcaktrace
02:10:13 (5107): called boinc_finish
Warning: Program terminating, but clFFT resources not freed.
Please consider explicitly calling clfftTeardown( ).
</stderr_txt>
]]>
(see https://einsteinathome.org/task/888883349 )
Any suggestions?
Thanks in advance,
Jay
I have the same issue on openSUSE Tumbleweed. It's an error in the new "ac_rtld" runtime linker in Mesa, which was introduced with https://patchwork.freedesktop.org/patch/303185/. This linker takes the machine code generated from possibly multiple source code files and combines it into one program.
The linker seems to stumble over a section that contains executable code and whose size isn't a multiple of four bytes. (See https://gitlab.freedesktop.org/mesa/mesa/blob/mesa-19.2.3/src/amd/common/ac_rtld.c#L369. I can only guess that instructions are always four bytes long.) We also get the error message "invalid section index" from libelf, probably from elf_strptr, but I'm just guessing here.
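To illustrate what the "shdr->sh_size & 3" message in the log appears to test: masking a size with 3 keeps its two lowest bits, which are zero exactly when the size is a multiple of four. The snippet below is only a sketch of that bit test in Python, not Mesa's code; the function name and the sample sizes are made up for illustration.

```python
def is_multiple_of_four(size: int) -> bool:
    """Return True when size is a multiple of 4, using the low-two-bits test (size & 3)."""
    return (size & 3) == 0


# Hypothetical section sizes, chosen only to show both outcomes.
for size in (64, 66, 67):
    status = "ok" if is_multiple_of_four(size) else "would trip a 'size & 3' check"
    print(f"section size {size}: {status}")
```

A size such as 66 leaves a remainder of 2 when divided by four, so the masked value is non-zero and a check of this form would reject it.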
If you're not a programmer, the best way to proceed is probably to ask on the mesa-users mailing list or IRC channel. I don't really have the time to investigate this now, otherwise I might do it myself.
It seems to me as if Mesa 20 solves the problem; at least the tasks no longer error out immediately.
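For anyone who wants to check which Mesa version BOINC is actually using, here is a rough Python sketch that pulls the version out of the "OpenCL:" line of the BOINC startup log quoted earlier in the thread. The function name is made up, and the threshold of 20 reflects only the observation above that tasks stopped failing immediately; it is not a confirmed fix version.

```python
import re
from typing import Optional, Tuple

# The OpenCL line as it appears in the BOINC startup log quoted in this thread.
LOG_LINE = (
    "OpenCL: AMD/ATI GPU 0: AMD VERDE (DRM 2.50.0, 5.3.0-17-generic, LLVM 9.0.0) "
    "(driver version 19.2.0, device version OpenCL 1.1 Mesa 19.2.0, "
    "2048MB, 2048MB available, 512 GFLOPS peak)"
)


def mesa_version(line: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from a 'Mesa X.Y.Z' substring, or None if absent."""
    match = re.search(r"Mesa (\d+)\.(\d+)\.(\d+)", line)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


version = mesa_version(LOG_LINE)
if version is None:
    print("No Mesa version found in this line.")
elif version[0] >= 20:  # assumption: based only on the report in this thread
    print(f"Mesa {version[0]}.{version[1]}.{version[2]}: reported to work here.")
else:
    print(f"Mesa {version[0]}.{version[1]}.{version[2]}: may hit the ac_rtld error.")
```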