GPU workload causes computer to freeze after GPU upgrade

derek
Joined: 20 Dec 15
Posts: 5
Credit: 56994638
RAC: 0
Topic 218302

Hey Guys,

I recently picked up a new AMD Radeon VII as an upgrade from my previous two RX480s. As soon as Einstein starts work on the GPU, it locks up the whole machine. I think it may be a driver issue, as the GPU does fine under other workloads like gaming, which use a different driver. I'm not new to Einstein@Home, but I am new to posting on the forums, so let me know what information I can provide. Below is some information I hope someone will find helpful.

OS:  Arch Linux
Kernel:  4.20.13
CPU:  AMD Ryzen 7 1700X
RAM:  32GB
GPU:  AMD Radeon VII
Drivers:  Mesa 18.3.4

Occasionally, I'll see an error that mentions the GPU is not configured to be reset, or something like that. I've been doing some research online, and it sounds like this reset option will be enabled by default in kernel 4.21 or 5.0. Here is the end of the system log.

Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [Einstein@Home] General prefs: from Einstein@Home (last modified 13->
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [Einstein@Home] Host location: none
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [Einstein@Home] General prefs: using your defaults
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---] Reading preferences override file
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---] Preferences:
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---]    max memory usage when active: 16081.74 MB
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---]    max memory usage when idle: 25730.79 MB
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---]    max disk usage: 32.00 GB
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---] Number of usable CPUs has changed from 16 to 13.
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---]    max CPUs used: 13
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---]    don't use GPU while active
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---]    suspend work if non-BOINC CPU load exceeds 25%
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---]    (to change preferences, visit a project web site or select >
Mar 02 19:44:16 kludge boinc[1836]: No protocol specified
Mar 02 19:44:16 kludge boinc[1836]: 02-Mar-2019 19:44:16 [---] Resuming GPU computation
Mar 02 19:44:16 kludge boinc[1836]: No protocol specified
Mar 02 19:44:17 kludge boinc[1836]: No protocol specified
Mar 02 19:44:18 kludge boinc[1836]: No protocol specified
Mar 02 19:44:19 kludge boinc[1836]: No protocol specified
Mar 02 19:44:21 kludge boinc[1836]: No protocol specified
Mar 02 19:44:22 kludge boinc[1836]: No protocol specified
Mar 02 19:44:23 kludge boinc[1836]: No protocol specified
Mar 02 19:44:24 kludge boinc[1836]: No protocol specified
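
If the reset option mentioned above is the amdgpu driver's gpu_recovery module parameter (an assumption; the error message isn't quoted exactly), the value the running kernel uses can be read from sysfs, which is handy for comparing kernels before and after an upgrade. A minimal Python sketch, assuming the parameter is exposed at /sys/module/amdgpu/parameters/gpu_recovery with the values 1 = enable, 0 = disable, -1 = auto:

```python
from pathlib import Path

# Assumption: the "reset option" is amdgpu's gpu_recovery module parameter.
# sysfs exposes loaded-module parameters as /sys/module/<module>/parameters/<name>.
GPU_RECOVERY_PARAM = Path("/sys/module/amdgpu/parameters/gpu_recovery")

# Assumed value meanings: 1 = enable, 0 = disable, -1 = auto, where the
# driver picks a default per GPU family.
MEANINGS = {1: "enabled", 0: "disabled", -1: "auto (driver default for this GPU)"}


def gpu_recovery_setting(param_file: Path = GPU_RECOVERY_PARAM) -> str:
    """Describe the current gpu_recovery setting, or report that it can't be read."""
    try:
        raw = param_file.read_text().strip()
    except OSError:
        # amdgpu not loaded, parameter missing on this kernel, or unreadable.
        return "unavailable"
    try:
        value = int(raw)
    except ValueError:
        return f"unrecognised value {raw!r}"
    return MEANINGS.get(value, f"unknown value {value}")


if __name__ == "__main__":
    print("amdgpu gpu_recovery:", gpu_recovery_setting())
```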

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109400029981
RAC: 35657092

derek wrote:

Hey Guys,

I recently picked up a new AMD Radeon VII as an upgrade from my previous two RX480s. As soon as Einstein starts work on the GPU, it locks up the whole machine. I think it may be a driver issue, as the GPU does fine under other workloads like gaming, which use a different driver. I'm not new to Einstein@Home, but I am new to posting on the forums, so let me know what information I can provide. Below is some information I hope someone will find helpful.

OS:  Arch Linux
Kernel:  4.20.13
CPU:  AMD Ryzen 7 1700X
RAM:  32GB
GPU:  AMD Radeon VII
Drivers:  Mesa 18.3.4

Occasionally, I'll see an error that mentions the GPU is not configured to be reset, or something like that. I've been doing some research online, and it sounds like this reset option will be enabled by default in kernel 4.21 or 5.0. Here is the end of the system log.


Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [Einstein@Home] General prefs: from Einstein@Home (last modified 13->
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [Einstein@Home] Host location: none
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [Einstein@Home] General prefs: using your defaults
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---] Reading preferences override file
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---] Preferences:
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---]    max memory usage when active: 16081.74 MB
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---]    max memory usage when idle: 25730.79 MB
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---]    max disk usage: 32.00 GB
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---] Number of usable CPUs has changed from 16 to 13.
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---]    max CPUs used: 13
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---]    don't use GPU while active
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---]    suspend work if non-BOINC CPU load exceeds 25%
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---]    (to change preferences, visit a project web site or select >
Mar 02 19:44:16 kludge boinc[1836]: No protocol specified
Mar 02 19:44:16 kludge boinc[1836]: 02-Mar-2019 19:44:16 [---] Resuming GPU computation
Mar 02 19:44:16 kludge boinc[1836]: No protocol specified
Mar 02 19:44:17 kludge boinc[1836]: No protocol specified
Mar 02 19:44:18 kludge boinc[1836]: No protocol specified
Mar 02 19:44:19 kludge boinc[1836]: No protocol specified
Mar 02 19:44:21 kludge boinc[1836]: No protocol specified
Mar 02 19:44:22 kludge boinc[1836]: No protocol specified
Mar 02 19:44:23 kludge boinc[1836]: No protocol specified
Mar 02 19:44:24 kludge boinc[1836]: No protocol specified

Hi Derek,

The best way to post log snips is to enclose them in BBCode code tags.  You can also use font and size tags to control what the log excerpt looks like - particularly useful for longer lines of data in columns.  As an example, I've added the tags to your original message and have reproduced it in full so you can see how much easier it is to read.
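
For anyone pasting logs often, the wrapping can be scripted. A minimal Python sketch that wraps a log excerpt in BBCode code tags; the optional [size=N] wrapper is an assumption, so check that this forum accepts it (and what N means here) before relying on it:

```python
from typing import Optional


def wrap_log_for_forum(log_text: str, size: Optional[int] = None) -> str:
    """Wrap a log excerpt in BBCode [code] tags so line breaks and columns survive.

    `size` optionally adds a [size=N] wrapper; forum support for that tag is
    an assumption to verify first.
    """
    # Drop trailing newlines so the code box doesn't end with empty lines.
    body = "[code]\n" + log_text.rstrip("\n") + "\n[/code]"
    if size is not None:
        body = f"[size={size}]{body}[/size]"
    return body


if __name__ == "__main__":
    excerpt = (
        "Mar 02 19:44:00 kludge boinc[1836]: Preferences:\n"
        "Mar 02 19:44:16 kludge boinc[1836]: Resuming GPU computation\n"
    )
    print(wrap_log_for_forum(excerpt))
```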

 I presume that the above is part of the startup messages you see in the event log when you launch the client.  The more useful information will be the very start of the log where the GPU detection is performed and the OpenCL detection confirms that usable OpenCL libs are installed.  Perhaps you could post everything that comes before the above in the log.

The file stdoutdae.txt in the client directory should contain all those lines.  The lines from any startup of the client will do.  I've never seen "No protocol specified" messages before.  I don't know if those are part of 'standard' BOINC or if perhaps it's something added to the Arch version of the client or perhaps it's something else entirely.  Have you tried running clinfo to see what that says about the GPU?  How does Arch handle the installation of the OpenCL libs?  Am I correct in presuming that you had work from when you were running the RX 480s and it's this same work that is failing after you have swapped to the new card?  If so, that would suggest there is a missing driver component that you might have to do some research on.
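
Pulling just the GPU detection lines out of a long startup log (stdoutdae.txt, the Manager's event log, or text saved from journalctl) can be sketched in Python as below. The "OpenCL:" marker comes from the logs in this thread; "CUDA:" and "No usable GPUs found" are assumed from other BOINC clients' output and may need adjusting:

```python
import re
from typing import List

# Markers for GPU detection lines in the BOINC client's startup messages.
# "OpenCL:" is taken from the logs in this thread; the other two are assumptions.
DETECTION = re.compile(r"OpenCL:|CUDA:|No usable GPUs found")


def gpu_detection_lines(log_text: str) -> List[str]:
    """Return the log lines that report GPU / OpenCL detection."""
    return [line for line in log_text.splitlines() if DETECTION.search(line)]
```

For example, gpu_detection_lines(Path("/var/lib/boinc/stdoutdae.txt").read_text()) would apply it to the client's stdout file, if that file exists in the data directory.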

I just want to confirm that the OpenCL capabilities of the card are being properly detected.  Perhaps the problem might be related to that.  I run a lot of AMD GPUs on Linux (not Arch) so I'm quite interested to see how the new Radeon VII goes.  It seems to be quite a hit here under Windows :-) - (based only on a very early report).  Maybe you might get better help on the Arch forums.  I don't think anyone here has got this card crunching under Linux.

 

Cheers,
Gary.

derek
Joined: 20 Dec 15
Posts: 5
Credit: 56994638
RAC: 0

Thanks Gary,

 

Here is the output from clinfo.

Number of platforms                               1
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 18.3.4
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   Clover
Number of devices                                 1
  Device Name                                     AMD VEGA20 (DRM 3.27.0, 4.20.13-arch1-1-ARCH, LLVM 7.0.1)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.1 Mesa 18.3.4
  Driver Version                                  18.3.4
  Device OpenCL C Version                         OpenCL C 1.1
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Max compute units                               60
  Max clock frequency                             1802MHz
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              64
  Preferred / native vector sizes
    char                                                16 / 16
    short                                                8 / 8
    int                                                  4 / 4
    long                                                 2 / 2
    half                                                 8 / 8        (cl_khr_fp16)
    float                                                4 / 4
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              17163091968 (15.98GiB)
  Error Correction support                        No
  Max memory allocation                           13730473574 (12.79GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       32768 bits (4096 bytes)
  Global Memory cache type                        None
  Image support                                   No
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Max number of constant args                     16
  Max constant buffer size                        2147483647 (2GiB)
  Max size of kernel argument                     1024
  Queue properties
    Out-of-order execution                        No
    Profiling                                     Yes
  Profiling timer resolution                      0ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_fp16

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Clover
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [MESA]
  clCreateContext(NULL, ...) [default]            Success [MESA]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD VEGA20 (DRM 3.27.0, 4.20.13-arch1-1-ARCH, LLVM 7.0.1)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD VEGA20 (DRM 3.27.0, 4.20.13-arch1-1-ARCH, LLVM 7.0.1)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD VEGA20 (DRM 3.27.0, 4.20.13-arch1-1-ARCH, LLVM 7.0.1)

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.12
  ICD loader Profile                              OpenCL 2.2

 

The contents of stdoutgpudetect.txt are as follows:

cc_config.xml not found - using defaults

Here is the journal log of the client starting up, showing that it recognizes the Radeon VII (Vega 20) GPU.

Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Starting BOINC client version 7.12.1 for x86_64-pc-linux-gnu
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] log flags: file_xfer, sched_ops, task
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Libraries: libcurl/7.64.0 OpenSSL/1.1.1b zlib/1.2.11 libidn2/2>
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Data directory: /var/lib/boinc
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] OpenCL: AMD/ATI GPU 0: AMD VEGA20 (DRM 3.27.0, 4.20.13-arch1-1>
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] [libc detection] gathered: 2.28, GNU libc
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Host name: kludge
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Processor: 16 AuthenticAMD AMD Ryzen 7 1700X Eight-Core Proces>
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic se>
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] OS: Linux Arch Linux: Arch Linux [4.20.13-arch1-1-ARCH|libc 2.>
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Memory: 31.41 GB physical, 0 bytes virtual
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Disk: 227.75 GB total, 198.72 GB free
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Local time is UTC -6 hours
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [Einstein@Home] URL http://einstein.phys.uwm.edu/; Computer ID 12763>
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [Einstein@Home] General prefs: from Einstein@Home (last modified 13->
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [Einstein@Home] Host location: none
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [Einstein@Home] General prefs: using your defaults
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Reading preferences override file
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Preferences:
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---]    max memory usage when active: 16081.74 MB
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---]    max memory usage when idle: 25730.78 MB
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---]    max disk usage: 32.00 GB
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---]    max CPUs used: 13
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---]    don't use GPU while active
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---]    suspend work if non-BOINC CPU load exceeds 25%
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---]    (to change preferences, visit a project web site or select >
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Setting up project and slot directories
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Checking active tasks
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Setting up GUI RPC socket
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Checking presence of 114 project files
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 Initialization completed
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Suspending GPU computation - computer is in use
Mar 03 09:14:17 kludge org.gnome.Shell.desktop[5729]: Window manager warning: Buggy client sent a _NET_ACTIVE_WINDOW message >
Mar 03 09:14:17 kludge org.gnome.Shell.desktop[5729]: Window manager warning: Buggy client sent a _NET_ACTIVE_WINDOW message >
Mar 03 09:14:19 kludge systemd-timesyncd[719]: Synchronized to time server for the first time 107.155.79.108:123 (3.arch.pool>
Mar 03 09:14:19 kludge gnome-software[5969]: libostree pull from 'flathub' for appstream2/x86_64 complete
                                             security: GPG: summary+commit http: TLS
                                             non-delta: meta: 2 content: 0
                                             transfer: secs: 0 size: 791 bytes
Mar 03 09:14:20 kludge gnome-software[5969]: libostree pull from 'flathub' for appstream2/x86_64 complete
                                             security: GPG: summary+commit http: TLS
                                             non-delta: meta: 5 content: 5
                                             transfer: secs: 0 size: 1.7 MB
Mar 03 09:14:20 kludge gnome-software[5969]: /var/tmp/flatpak-cache-DGGOXZ/repo-r5UPIy: Pulled appstream2/x86_64 from flathub
Mar 03 09:14:20 kludge dbus-daemon[724]: [system] Activating via systemd: service name='org.freedesktop.Flatpak.SystemHelper'>
Mar 03 09:14:20 kludge systemd[1]: Starting flatpak system helper...
Mar 03 09:14:20 kludge dbus-daemon[724]: [system] Successfully activated service 'org.freedesktop.Flatpak.SystemHelper'
Mar 03 09:14:20 kludge systemd[1]: Started flatpak system helper.
Mar 03 09:14:20 kludge audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=flatpak-system-helper com>
Mar 03 09:14:20 kludge kernel: audit: type=1130 audit(1551626060.371:67): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='uni>
Mar 03 09:14:20 kludge flatpak-system-helper[10613]: system: Pulled appstream2/x86_64 from /var/tmp/flatpak-cache-DGGOXZ/repo>
Mar 03 09:14:42 kludge boinc[6486]: No protocol specified
Mar 03 09:14:42 kludge boinc[6486]: 03-Mar-2019 09:14:42 [---] Resuming GPU computation
Mar 03 09:14:43 kludge boinc[6486]: No protocol specified

 

 

 

I've always read about the BOINC client's standard output file like you mentioned, but I've never actually seen it in real life... I listed the contents of /var/lib/boinc, and as you can see, I don't have that file.

-rw-r--r-- 1 boinc boinc   3667 Feb 20 21:10 account_einstein.phys.uwm.edu.xml
-rw-r--r-- 1 boinc boinc  57883 Feb 20 20:49 all_projects_list.xml
-rw-r--r-- 1 boinc boinc 184089 Mar  3 09:13 client_state_prev.xml
-rw-r--r-- 1 boinc boinc 184089 Mar  3 09:13 client_state.xml
-rw-r--r-- 1 boinc boinc   1897 Mar  3 09:13 coproc_info.xml
-rw-r--r-- 1 boinc boinc    364 Mar  3 09:11 daily_xfer_history.xml
-rw-r--r-- 1 boinc boinc  12592 Feb 20 21:03 get_current_version.xml
-rw-r--r-- 1 boinc boinc  14064 Feb 20 21:02 get_project_config.xml
-rw-r--r-- 1 boinc boinc   1497 Mar  2 19:44 global_prefs_override.xml
-rw-r--r-- 1 boinc boinc   1649 Feb 20 21:03 global_prefs.xml
-rw-r----- 1 boinc boinc     32 Feb 20 20:49 gui_rpc_auth.cfg
-rw-r--r-- 1 boinc boinc      0 Mar  3 09:12 lockfile
-rw-r--r-- 1 boinc boinc    138 Feb 20 21:02 lookup_account.xml
-rw-r--r-- 1 boinc boinc  14460 Feb 20 21:03 master_einstein.phys.uwm.edu.xml
drwxrwx--x 2 boinc boinc   4096 Mar  3 09:13 notices
drwxrwx--x 3 boinc boinc   4096 Feb 20 21:02 projects
-rw-r--r-- 1 boinc boinc  90694 Feb 20 21:10 sched_reply_einstein.phys.uwm.edu.xml
-rw-r--r-- 1 boinc boinc  13811 Feb 20 21:10 sched_request_einstein.phys.uwm.edu.xml
drwxrwx--x 2 boinc boinc   4096 Mar  3 09:11 slots
-rw-r--r-- 1 boinc boinc    435 Feb 20 21:10 statistics_einstein.phys.uwm.edu.xml
-rw-r--r-- 1 boinc boinc      0 Feb 20 20:49 stderrgpudetect.txt
-rw-r--r-- 1 boinc boinc    748 Mar  3 09:13 stdoutgpudetect.txt
-rw-r--r-- 1 boinc boinc   2403 Mar  3 09:13 time_stats_log

 

 

This is the event log from the BOINC Manager GUI, though; generally I find it mirrors what I can find with journalctl.

Sun 03 Mar 2019 09:12:46 AM CST |  | cc_config.xml not found - using defaults
Sun 03 Mar 2019 09:13:10 AM CST |  | Starting BOINC client version 7.12.1 for x86_64-pc-linux-gnu
Sun 03 Mar 2019 09:13:10 AM CST |  | log flags: file_xfer, sched_ops, task
Sun 03 Mar 2019 09:13:10 AM CST |  | Libraries: libcurl/7.64.0 OpenSSL/1.1.1b zlib/1.2.11 libidn2/2.1.1 libpsl/0.20.2 (+libidn2/2.1.1) libssh2/1.8.0 nghttp2/1.36.0
Sun 03 Mar 2019 09:13:10 AM CST |  | Data directory: /var/lib/boinc
Sun 03 Mar 2019 09:13:10 AM CST |  | OpenCL: AMD/ATI GPU 0: AMD VEGA20 (DRM 3.27.0, 4.20.13-arch1-1-ARCH, LLVM 7.0.1) (driver version 18.3.4, device version OpenCL 1.1 Mesa 18.3.4, 16368MB, 16368MB available, 8650 GFLOPS peak)
Sun 03 Mar 2019 09:13:10 AM CST |  | [libc detection] gathered: 2.28, GNU libc
Sun 03 Mar 2019 09:13:10 AM CST |  | Host name: kludge
Sun 03 Mar 2019 09:13:10 AM CST |  | Processor: 16 AuthenticAMD AMD Ryzen 7 1700X Eight-Core Processor [Family 23 Model 1 Stepping 1]
Sun 03 Mar 2019 09:13:10 AM CST |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate sme ssbd sev ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
Sun 03 Mar 2019 09:13:10 AM CST |  | OS: Linux Arch Linux: Arch Linux [4.20.13-arch1-1-ARCH|libc 2.28 (GNU libc)]
Sun 03 Mar 2019 09:13:10 AM CST |  | Memory: 31.41 GB physical, 0 bytes virtual
Sun 03 Mar 2019 09:13:10 AM CST |  | Disk: 227.75 GB total, 198.72 GB free
Sun 03 Mar 2019 09:13:10 AM CST |  | Local time is UTC -6 hours
Sun 03 Mar 2019 09:13:10 AM CST | Einstein@Home | URL http://einstein.phys.uwm.edu/; Computer ID 12763354; resource share 100
Sun 03 Mar 2019 09:13:10 AM CST | Einstein@Home | General prefs: from Einstein@Home (last modified 13-Oct-2016 10:37:51)
Sun 03 Mar 2019 09:13:10 AM CST | Einstein@Home | Host location: none
Sun 03 Mar 2019 09:13:10 AM CST | Einstein@Home | General prefs: using your defaults
Sun 03 Mar 2019 09:13:10 AM CST |  | Reading preferences override file
Sun 03 Mar 2019 09:13:10 AM CST |  | Preferences:
Sun 03 Mar 2019 09:13:10 AM CST |  | max memory usage when active: 16081.74 MB
Sun 03 Mar 2019 09:13:10 AM CST |  | max memory usage when idle: 25730.78 MB
Sun 03 Mar 2019 09:13:10 AM CST |  | max disk usage: 32.00 GB
Sun 03 Mar 2019 09:13:10 AM CST |  | max CPUs used: 13
Sun 03 Mar 2019 09:13:10 AM CST |  | don't use GPU while active
Sun 03 Mar 2019 09:13:10 AM CST |  | suspend work if non-BOINC CPU load exceeds 25%
Sun 03 Mar 2019 09:13:10 AM CST |  | (to change preferences, visit a project web site or select Preferences in the Manager)
Sun 03 Mar 2019 09:13:10 AM CST |  | Setting up project and slot directories
Sun 03 Mar 2019 09:13:10 AM CST |  | Checking active tasks
Sun 03 Mar 2019 09:13:10 AM CST |  | Setting up GUI RPC socket
Sun 03 Mar 2019 09:13:10 AM CST |  | Checking presence of 114 project files
Sun 03 Mar 2019 09:13:10 AM CST |  | Suspending GPU computation - computer is in use
Sun 03 Mar 2019 09:14:42 AM CST |  | Resuming GPU computation
Sun 03 Mar 2019 09:16:50 AM CST |  | Suspending GPU computation - computer is in use
Sun 03 Mar 2019 09:17:51 AM CST |  | Resuming GPU computation
Sun 03 Mar 2019 09:18:57 AM CST |  | Suspending GPU computation - computer is in use
Sun 03 Mar 2019 09:20:37 AM CST |  | Resuming GPU computation
Sun 03 Mar 2019 09:20:39 AM CST |  | Suspending GPU computation - computer is in use
Sun 03 Mar 2019 09:22:45 AM CST |  | Resuming GPU computation
Sun 03 Mar 2019 09:24:04 AM CST |  | Suspending GPU computation - computer is in use
Sun 03 Mar 2019 09:25:09 AM CST |  | Resuming GPU computation
Sun 03 Mar 2019 09:25:16 AM CST |  | Suspending GPU computation - computer is in use
Sun 03 Mar 2019 09:26:17 AM CST |  | Resuming GPU computation
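
With "don't use GPU while active" set, the client resumes and suspends GPU work each time the desktop goes idle or becomes busy again, as the event log above shows. If the freeze always follows a "Resuming GPU computation" line, listing the GPU run windows makes that easy to check. A minimal Python sketch, assuming the "timestamp | project | message" layout shown above and English day and month names:

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

# Event-log timestamp without its trailing zone name ("CST"), which strptime
# can't parse reliably; the layout is taken from the log above.
STAMP_FORMAT = "%a %d %b %Y %I:%M:%S %p"


def parse_event(line: str) -> Optional[Tuple[datetime, str]]:
    """Split 'timestamp | project | message' into (time, message), or None."""
    parts = line.split("|", 2)  # maxsplit keeps any '|' inside the message
    if len(parts) != 3:
        return None
    stamp = parts[0].strip().rsplit(" ", 1)[0]  # drop the zone name
    try:
        return datetime.strptime(stamp, STAMP_FORMAT), parts[2].strip()
    except ValueError:
        return None


def gpu_runs(lines: Iterable[str]) -> Tuple[List[Tuple[datetime, timedelta]], Optional[datetime]]:
    """Pair each 'Resuming GPU computation' with the next 'Suspending GPU computation'.

    Returns (completed runs as (start, duration), start of a run still open
    at the end of the log). An open run at the very end is the case to look
    at if the machine froze shortly after GPU work resumed.
    """
    runs: List[Tuple[datetime, timedelta]] = []
    started: Optional[datetime] = None
    for line in lines:
        event = parse_event(line)
        if event is None:
            continue
        when, message = event
        if message.startswith("Resuming GPU computation"):
            started = when
        elif message.startswith("Suspending GPU computation") and started is not None:
            runs.append((started, when - started))
            started = None
    return runs, started
```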

 

To answer your question, yes, I was working previously with my RX480s. I am genuinely excited to see what it can do for computation, considering it is basically a consumer-grade MI-50. I have seen the "No protocol specified" message for a very long time. We're kind of snowed in today, so maybe I'll put my Polaris cards back in and confirm I was getting that message with those GPUs as well. In Arch, the OpenCL libraries are installed right from the Arch repos. I too have thought it was a driver error; from the system logs it looks like the driver may be crashing. I have updated all of my drivers. As I mentioned, other workloads like games using the standard Mesa or the Mesa-Vulkan drivers have no problems, but apps using the Mesa-OpenCL drivers (BOINC and a few others, like LibreOffice of all things) have had some issues.

Thank you for your help Gary.  I really do appreciate you taking the time to help me.

Derek

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109400029981
RAC: 35657092

Derek,

I'm rather busy right now.  After a substantial outage, this project now has fresh work and I need to get my fleet back to work and under control again :-).  This may take me a while.

I'm wondering if (from the clinfo output) the platform name of 'Clover' and the OpenCL version of 1.1 might be the problem.  Whilst Mesa drivers are installed on my machines, I don't use the Clover implementation of OpenCL.  I found that to get my Polaris GPUs to work I needed to install OpenCL components from the AMDGPU-PRO package available from AMD.  As my distro is RPM based, I used the Red Hat versions of that package, starting with version 16.60 in early 2017 and currently on 18.30 from late in 2018.  I was able to work out a small subset of files from the full package which allowed the Einstein app to work without problems.

If you're interested, and when I have things under control, I'll go through exactly what I did.  I have no idea if this will work for you.  I don't know anything about Arch or the packaging format it uses.

 

Cheers,
Gary.

derek
Joined: 20 Dec 15
Posts: 5
Credit: 56994638
RAC: 0

Thanks Gary, I understand that you're busy. This isn't an urgent issue.

I too have the proprietary OpenCL bits from AMDGPU-PRO installed, currently on 18.50.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109400029981
RAC: 35657092

derek wrote:
... I too have the proprietary OpenCL bits from AMDGPU-PRO installed, currently on 18.50.

But you're not using those bits. The clinfo output only identifies a single platform - Clover. The first line of output clearly says, "Number of platforms    1".

I know nothing about Clover, other than it is older and seemingly not well supported.  I've seen comments quite a while ago that crunching didn't work using Clover.  For all I know, it could be different now.

My guess is that, with the AMDGPU-PRO 18.50 bits installed, you need to investigate why that platform is not being shown by clinfo. You can have multiple platforms installed and clinfo should show them all; that's my understanding. You should be able to select the platform to use from all those installed.
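
That check can be repeated quickly after each change to the OpenCL setup. A minimal Python sketch that reads saved clinfo output in the default human-readable layout shown earlier in this thread and reports the declared platform count and the distinct platform names (the parsing patterns are assumptions based on that layout):

```python
import re
from typing import List, Optional, Tuple

COUNT_RE = re.compile(r"^\s*Number of platforms\s+(\d+)\s*$")
NAME_RE = re.compile(r"^\s*Platform Name\s+(\S.*?)\s*$")


def clinfo_platforms(clinfo_text: str) -> Tuple[Optional[int], List[str]]:
    """Return (reported platform count, distinct platform names) from clinfo output.

    clinfo repeats "Platform Name" in several sections, so names are
    de-duplicated while keeping first-seen order.
    """
    count: Optional[int] = None
    names: List[str] = []
    for line in clinfo_text.splitlines():
        m = COUNT_RE.match(line)
        if m:
            count = int(m.group(1))
            continue
        m = NAME_RE.match(line)
        if m and m.group(1) not in names:
            names.append(m.group(1))
    return count, names
```

On the output posted above this would report one platform, Clover; a working AMDGPU-PRO install should add a second name.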

Perhaps you should ask on the Arch forums why clinfo doesn't show anything except Clover.  Maybe it's just a matter of tweaking an environment variable like LD_LIBRARY_PATH so the libs can be found.  Where did your AMDGPU-PRO stuff get installed?  The standard place is under /opt/amdgpu/ for some things and /opt/amdgpu-pro/ for others.  The OpenCL libs are under /opt/amdgpu-pro/lib64/.
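
One thing worth checking along these lines: the ICD loader shown in the clinfo output finds platforms through small .icd registration files, conventionally in /etc/OpenCL/vendors/, each naming a vendor library. If no file there names the AMDGPU-PRO library, or the library it names can't be found, only Clover will appear. A minimal Python sketch, assuming that directory layout; /opt/amdgpu-pro/lib64 is the location given above, and the library lookup is only a rough stand-in for the dynamic linker's real search:

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Conventional directory the OpenCL ICD loader scans for vendor registrations.
DEFAULT_VENDORS_DIR = "/etc/OpenCL/vendors"


def icd_registrations(vendors_dir: str = DEFAULT_VENDORS_DIR) -> Dict[str, str]:
    """Map each *.icd file to the library it names (its first non-blank line)."""
    registrations: Dict[str, str] = {}
    for icd in sorted(Path(vendors_dir).glob("*.icd")):
        lines = [ln.strip() for ln in icd.read_text().splitlines() if ln.strip()]
        registrations[icd.name] = lines[0] if lines else ""
    return registrations


def unresolved(registrations: Dict[str, str], search_dirs: Iterable[str]) -> List[str]:
    """Return .icd files whose library can't be found.

    Absolute paths are checked directly; bare names are looked up only in
    `search_dirs` (e.g. /opt/amdgpu-pro/lib64), which approximates but does
    not reproduce the dynamic linker's search (LD_LIBRARY_PATH, ld.so.cache).
    """
    dirs = [Path(d) for d in search_dirs]
    missing: List[str] = []
    for icd_name, library in registrations.items():
        path = Path(library)
        if path.is_absolute():
            found = path.exists()
        else:
            found = any((d / library).exists() for d in dirs)
        if not library or not found:
            missing.append(icd_name)
    return missing
```

For example, unresolved(icd_registrations(), ["/opt/amdgpu-pro/lib64"]) lists registrations that point at libraries outside that directory or that don't exist at all.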

 

Cheers,
Gary.

derek
Joined: 20 Dec 15
Posts: 5
Credit: 56994638
RAC: 0

Thanks for the lead Gary!

 

derek
Joined: 20 Dec 15
Posts: 5
Credit: 56994638
RAC: 0

Interesting development... If I don't start my desktop environment and just run boinc from the CLI console, then everything seems to work fine. So I think it is definitely a driver issue. Hopefully kernel 5.0 and Mesa 19 will fix my issues. In the meantime, I'll compute with my system running headless for a few days until Arch provides kernel and Mesa updates.
