AMD OpenCL stopped working after a power outage

bonze82
Joined: 9 Feb 05
Posts: 6
Credit: 31894535
RAC: 50082
Topic 225356

I had a power outage the other day and my UPS failed to engage (bad battery). Now my GPU won't crunch data. This is related, I'm just not sure how: 2 days later all projects stopped crunching with a shared memory error. I reinstalled the BOINC packages (sudo apt install --reinstall boinc-client boinc-client-opencl boinc-manager libboinc7) and everything started working again. I tried the same with firmware, drivers and OpenCL, with no luck.

stdoutdae: (before issue)

05-May-2021 01:26:36 [---] Starting BOINC client version 7.14.2 for x86_64-pc-linux-gnu
05-May-2021 01:26:39 [---] log flags: file_xfer, sched_ops, task
05-May-2021 01:26:39 [---] Libraries: libcurl/7.64.0 OpenSSL/1.1.1d zlib/1.2.11 libidn2/2.0.5 libpsl/0.20.2 (+libidn2/2.0.5) libssh2/1.8.0 nghttp2/1.36.0 librtmp/2.3
05-May-2021 01:26:39 [---] Data directory: /var/lib/boinc-client
05-May-2021 01:26:41 [---] OpenCL: AMD/ATI GPU 0: AMD OLAND (DRM 2.50.0, 4.19.0-16-amd64, LLVM 7.0.1) (driver version 18.3.6, device version OpenCL 1.1 Mesa 18.3.6, 2048MB, 2048MB available, 432 GFLOPS peak)
05-May-2021 01:26:41 [---] OpenCL: Intel GPU 0: Intel(R) HD Graphics Skylake Desktop GT2 (driver version 1.3, device version OpenCL 2.0 beignet 1.3, 3940MB, 3940MB available, 184 GFLOPS peak)
05-May-2021 01:26:41 [---] [libc detection] gathered: 2.28, Debian GLIBC 2.28-10
05-May-2021 01:26:41 [---] Host name: test
05-May-2021 01:26:41 [---] Processor: 4 GenuineIntel Intel(R) Core(TM) i3-6100 CPU @ 3.70GHz [Family 6 Model 94 Stepping 3]
05-May-2021 01:26:41 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
05-May-2021 01:26:41 [---] OS: Linux Debian: Debian GNU/Linux 10 (buster) [4.19.0-16-amd64|libc 2.28 (Debian GLIBC 2.28-10)]
05-May-2021 01:26:41 [---] Memory: 7.70 GB physical, 0 bytes virtual
05-May-2021 01:26:41 [---] Disk: 213.26 GB total, 46.75 GB free
05-May-2021 01:26:41 [---] Local time is UTC -5 hours

stderrgpudetect:

/dev/dri/card1 not authenticated
/dev/dri/card1 not authenticated

coproc_info.xml

<coprocs>
    <warning>NVIDIA: libcuda.so: cannot open shared object file: No such file or directory</warning>
    <warning>ATI: libaticalrt.so: cannot open shared object file: No such file or directory</warning>
    <warning>OpenCL library present but no OpenCL-capable devices found</warning>
</coprocs>

mikey
Joined: 22 Jan 05
Posts: 11948
Credit: 1832707143
RAC: 219099

bonze82 wrote:

....

Can you just reload the drivers?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109971603036
RAC: 30018896

bonze82 wrote:

....

stdoutdae: (before issue)

Your computers are hidden so nobody can check for all the details needed for a proper assessment of what the problem might be.  A link to the host in question would be useful.

You say the event log snippet was before the issue.  How about providing the same sort of details for after the issue for comparison?  The 'before' info shows the Mesa OpenCL (Clover) for your AMD device and Intel's Beignet for the Intel GPU.  Which GPU were you using and for what sort of tasks?

If there is now a different set of messages from the ones you showed (e.g. no OpenCL lines for either or both of the GPUs, whichever you were using), then that will probably be the problem.

Cheers,
Gary.
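
One way to make the before/after comparison asked for above is to pull only the GPU-detection lines out of the client's event log. The following is a minimal sketch, assuming the log is the stdoutdae.txt file inside the data directory reported in the startup messages (/var/lib/boinc-client). The function name gpu_detect_lines is made up for illustration and is not part of BOINC.

```shell
#!/bin/sh
# Print only the lines of a BOINC event log that report GPU/OpenCL detection.
# gpu_detect_lines is a hypothetical helper name, not a BOINC command.
gpu_detect_lines() {
    # $1: path to the log file (the default assumes the Debian data directory)
    log="${1:-/var/lib/boinc-client/stdoutdae.txt}"
    grep -E 'OpenCL:|No usable GPUs found' "$log"
}

# Example:
#   gpu_detect_lines /var/lib/boinc-client/stdoutdae.txt
```

Running it against a log from a working session and one from a failing session should show the "OpenCL:" lines in the first and "No usable GPUs found" in the second, if the problem is at the detection stage.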

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109971603036
RAC: 30018896

mikey wrote:

bonze82 wrote:

....

I tried the same with firmware, drivers and opencl with no luck.

....

Can you just reload the drivers?

He said he had already done all that.  However, some actual log output for confirmation would be nice.

Cheers,
Gary.

bonze82
Joined: 9 Feb 05
Posts: 6
Credit: 31894535
RAC: 50082

Sorry for the delay, here is the current log for that PC:

Sat 08 May 2021 12:14:44 AM CDT |  | Starting BOINC client version 7.14.2 for x86_64-pc-linux-gnu
Sat 08 May 2021 12:14:44 AM CDT |  | log flags: file_xfer, sched_ops, task
Sat 08 May 2021 12:14:44 AM CDT |  | Libraries: libcurl/7.64.0 OpenSSL/1.1.1d zlib/1.2.11 libidn2/2.0.5 libpsl/0.20.2 (+libidn2/2.0.5) libssh2/1.8.0 nghttp2/1.36.0 librtmp/2.3
Sat 08 May 2021 12:14:44 AM CDT |  | Data directory: /var/lib/boinc-client
Sat 08 May 2021 12:14:44 AM CDT |  | No usable GPUs found
Sat 08 May 2021 12:14:44 AM CDT |  | app version refers to missing GPU type ATI
Sat 08 May 2021 12:14:44 AM CDT | Einstein@Home | Application uses missing ATI GPU
Sat 08 May 2021 12:14:44 AM CDT | Einstein@Home | Missing coprocessor for task LATeah3011L00_284.0_0_0.0_8108271_1
Sat 08 May 2021 12:14:44 AM CDT | Einstein@Home | Missing coprocessor for task LATeah3011L00_284.0_0_0.0_12913551_1
Sat 08 May 2021 12:14:44 AM CDT | Einstein@Home | Missing coprocessor for task LATeah3011L00_284.0_0_0.0_16374375_0
Sat 08 May 2021 12:14:44 AM CDT | Einstein@Home | Missing coprocessor for task LATeah3011L00_284.0_0_0.0_17691354_1
Sat 08 May 2021 12:14:44 AM CDT | Einstein@Home | Missing coprocessor for task LATeah3011L00_220.0_0_0.0_1359153_2
Sat 08 May 2021 12:14:44 AM CDT | Einstein@Home | Missing coprocessor for task LATeah3011L00_292.0_0_0.0_5297949_1
Sat 08 May 2021 12:14:44 AM CDT | Einstein@Home | Missing coprocessor for task LATeah3011L00_292.0_0_0.0_9667431_1
Sat 08 May 2021 12:14:44 AM CDT | Einstein@Home | Missing coprocessor for task LATeah3011L00_292.0_0_0.0_12583188_0
Sat 08 May 2021 12:14:44 AM CDT | Einstein@Home | Missing coprocessor for task LATeah3011L00_292.0_0_0.0_16144335_0
Sat 08 May 2021 12:14:44 AM CDT | Einstein@Home | Missing coprocessor for task LATeah3011L00_292.0_0_0.0_22181607_1
Sat 08 May 2021 12:14:44 AM CDT | Einstein@Home | Missing coprocessor for task LATeah3011L00_300.0_0_0.0_1835847_1
Sat 08 May 2021 12:14:44 AM CDT | Einstein@Home | Missing coprocessor for task LATeah3011L00_292.0_0_0.0_11705202_2
Sat 08 May 2021 12:14:44 AM CDT | Einstein@Home | Missing coprocessor for task LATeah3011L00_300.0_0_0.0_15626106_1
Sat 08 May 2021 12:14:44 AM CDT | Einstein@Home | Missing coprocessor for task LATeah3011L00_300.0_0_0.0_21684465_1
Sat 08 May 2021 12:14:44 AM CDT |  | [libc detection] gathered: 2.28, Debian GLIBC 2.28-10
Sat 08 May 2021 12:14:44 AM CDT |  | Host name: test
Sat 08 May 2021 12:14:44 AM CDT |  | Processor: 4 GenuineIntel Intel(R) Core(TM) i3-6100 CPU @ 3.70GHz [Family 6 Model 94 Stepping 3]
Sat 08 May 2021 12:14:44 AM CDT |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
Sat 08 May 2021 12:14:44 AM CDT |  | OS: Linux Debian: Debian GNU/Linux 10 (buster) [4.19.0-16-amd64|libc 2.28 (Debian GLIBC 2.28-10)]
Sat 08 May 2021 12:14:44 AM CDT |  | Memory: 7.70 GB physical, 0 bytes virtual
Sat 08 May 2021 12:14:44 AM CDT |  | Disk: 213.26 GB total, 46.05 GB free
Sat 08 May 2021 12:14:44 AM CDT |  | Local time is UTC -5 hours
Sat 08 May 2021 12:14:44 AM CDT |  | Config: GUI RPCs allowed from:
Sat 08 May 2021 12:14:44 AM CDT | climateprediction.net | URL https://climateprediction.net/; Computer ID 1518442; resource share 100
Sat 08 May 2021 12:14:44 AM CDT | Einstein@Home | URL http://einstein.phys.uwm.edu/; Computer ID 12880585; resource share 100
Sat 08 May 2021 12:14:44 AM CDT |  | General prefs: from http://setiathome.berkeley.edu/ (last modified 15-Feb-2020 10:30:58)
Sat 08 May 2021 12:14:44 AM CDT |  | Host location: none
Sat 08 May 2021 12:14:44 AM CDT |  | General prefs: using your defaults
Sat 08 May 2021 12:14:44 AM CDT |  | Reading preferences override file
Sat 08 May 2021 12:14:44 AM CDT |  | Preferences:
Sat 08 May 2021 12:14:44 AM CDT |  | max memory usage when active: 6304.08 MB
Sat 08 May 2021 12:14:44 AM CDT |  | max memory usage when idle: 6304.08 MB
Sat 08 May 2021 12:14:45 AM CDT |  | max disk usage: 15.00 GB
Sat 08 May 2021 12:14:45 AM CDT |  | max CPUs used: 3
Sat 08 May 2021 12:14:45 AM CDT |  | suspend work if non-BOINC CPU load exceeds 50%
Sat 08 May 2021 12:14:45 AM CDT |  | (to change preferences, visit a project web site or select Preferences in the Manager)
Sat 08 May 2021 12:14:45 AM CDT |  | Setting up project and slot directories
Sat 08 May 2021 12:14:45 AM CDT |  | Checking active tasks
Sat 08 May 2021 12:14:45 AM CDT |  | Setting up GUI RPC socket
Sat 08 May 2021 12:14:45 AM CDT |  | gui_rpc_auth.cfg is empty - no GUI RPC password protection
Sat 08 May 2021 12:14:45 AM CDT |  | Checking presence of 213 project files

The computer in question: https://einsteinathome.org/host/12880585

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2140
Credit: 2770458276
RAC: 912504

Sometimes, Linux machines launch BOINC too quickly during restart, before the video drivers have loaded. Try shutting down the BOINC service (only - not the whole machine) for a couple of seconds, and then restarting it.
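
As a rough sketch of that stop-wait-start sequence, assuming the service is managed by systemd under the name boinc-client (the name used with systemctl elsewhere in this thread): the helper name restart_boinc and the 5-second default pause are arbitrary choices for illustration.

```shell
#!/bin/sh
# Restart only the BOINC service, pausing between stop and start.
# restart_boinc is a hypothetical helper name.
# Set DRY_RUN=1 to print the commands instead of running them.
restart_boinc() {
    pause="${1:-5}"                      # seconds to wait; 5 is arbitrary
    run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
    fi
    $run sudo systemctl stop boinc-client
    $run sleep "$pause"
    $run sudo systemctl start boinc-client
}

# Example:
#   restart_boinc 5
```

After the restart, the first lines of the event log should show whether the "OpenCL:" detection lines have come back.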

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109971603036
RAC: 30018896

And if what Richard suggests doesn't result in BOINC detecting the OpenCL capabilities of your machine (see the lines in your first log snip), then the current message of "No usable GPUs found" most likely means that you don't have OpenCL libraries installed.

You just need to properly install OpenCL - by whatever method you used when you had a working setup.

Cheers,
Gary.
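
A quick check of whether any OpenCL implementation is registered at all could look like the sketch below. It assumes the usual Linux ICD loader convention of one .icd file per implementation under /etc/OpenCL/vendors; the helper name list_opencl_icds is hypothetical. The clinfo tool, if installed, gives a fuller picture of what the loader can actually open.

```shell
#!/bin/sh
# List the OpenCL ICD registration files in a vendors directory.
# list_opencl_icds is a hypothetical helper name.
list_opencl_icds() {
    dir="${1:-/etc/OpenCL/vendors}"
    found=0
    for f in "$dir"/*.icd; do
        [ -e "$f" ] || continue          # glob did not match anything
        found=1
        # Each .icd file names the vendor library the loader should open
        printf '%s -> %s\n' "$(basename "$f")" "$(head -n 1 "$f")"
    done
    if [ "$found" -eq 0 ]; then
        echo "no .icd files in $dir" >&2
        return 1
    fi
}

# Example:
#   list_opencl_icds
```

An empty directory would point to missing OpenCL packages. Registered files alongside "No usable GPUs found" would point elsewhere, for example to the account BOINC runs under.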

bonze82
Joined: 9 Feb 05
Posts: 6
Credit: 31894535
RAC: 50082

Done and done many times. The only step I haven't taken is reinstalling Debian, but I will try reinstalling OpenCL and the drivers one more time.

bonze82
Joined: 9 Feb 05
Posts: 6
Credit: 31894535
RAC: 50082

Ok I saw on a very old thread somewhere:

sudo systemctl stop boinc-client

sudo boinc

I ran those commands in a terminal and it showed both GPUs as usable, but no projects. Any ideas? By the way, I stopped the service again and set BOINC_USER=root in /etc/init.d/boinc-client, with no success; however, starting with sudo boinc does show the GPUs.
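
GPUs showing up under sudo boinc but not under the service hints at a permissions difference between root and the account the service runs as, which would also fit the "/dev/dri/card1 not authenticated" messages in the first post. That is only a guess from the symptoms. The sketch below assumes the service account is called boinc and that the device nodes under /dev/dri are restricted to the video and render groups; both assumptions should be checked on the machine itself (ls -l /dev/dri). The helper name missing_gpu_groups is hypothetical.

```shell
#!/bin/sh
# Report which GPU-related groups an account is missing.
# missing_gpu_groups is a hypothetical helper; "video" and "render" are the
# groups ASSUMED to own /dev/dri/* (verify with: ls -l /dev/dri).
missing_gpu_groups() {
    # $1: space-separated group list, e.g. the output of `id -nG boinc`
    for want in video render; do
        case " $1 " in
            *" $want "*) ;;              # already a member
            *) echo "$want" ;;
        esac
    done
}

# Example (assumes the service account is called "boinc"):
#   missing_gpu_groups "$(id -nG boinc)"
# For each group printed, something like
#   sudo usermod -aG video boinc
# followed by a restart of the service would add the account to it.
```

The missing projects are probably a separate matter: a client started by hand uses the current directory as its data directory unless --dir is given, so it would not see the projects stored in /var/lib/boinc-client.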

cecht
Joined: 7 Mar 18
Posts: 1432
Credit: 2468175260
RAC: 752545

Richard Haselgrove wrote:

Sometimes, Linux machines launch BOINC too quickly during restart, before the video drivers have loaded. Try shutting down the BOINC service (only - not the whole machine) for a couple of seconds, and then restarting it.

BOINC can be delayed at startup by configuring cc_config.xml (/etc/boinc-client/cc_config.xml) with a start_delay option.  For example, here is my cc_config content using a 20-second delay:

<cc_config>
  <log_flags>
    <task>1</task>
    <file_xfer>1</file_xfer>
    <sched_ops>1</sched_ops>
  </log_flags>
  <options>
    <start_delay>20</start_delay>
    <rec_half_life_days>1.000000</rec_half_life_days>
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>

Ideas are not fixed, nor should they be; we live in model-dependent reality.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2140
Credit: 2770458276
RAC: 912504

cecht wrote:

Richard Haselgrove wrote:

Sometimes, Linux machines launch BOINC too quickly during restart, before the video drivers have loaded. Try shutting down the BOINC service (only - not the whole machine) for a couple of seconds, and then restarting it.

BOINC can be delayed at startup by configuring cc_config.xml (/etc/boinc-client/cc_config.xml) with a start_delay option.  For example, here is my cc_config content using a 20-second delay:

Unfortunately, that doesn't delay BOINC itself (which I was suggesting might be the culprit here). The <start_delay> only applies to project science apps: BOINC itself has to have started first in order to read the config file, and apply that delay to the next stage in the process.
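
If the aim is to delay the client itself rather than the science apps, one option on a systemd machine is a drop-in for the service unit that sleeps before the client is launched. The sketch below assumes the unit is named boinc-client.service and is actually started by systemd; the helper name write_boinc_delay and the file name delay.conf are made up for illustration.

```shell
#!/bin/sh
# Write a systemd drop-in that sleeps before the BOINC client itself starts.
# write_boinc_delay is a hypothetical helper name.
write_boinc_delay() {
    # $1: delay in seconds; $2: drop-in directory for the unit
    delay="${1:-20}"
    dir="${2:-/etc/systemd/system/boinc-client.service.d}"
    mkdir -p "$dir" || return 1
    printf '[Service]\nExecStartPre=/bin/sleep %s\n' "$delay" > "$dir/delay.conf"
}

# Example (needs root; afterwards run: systemctl daemon-reload):
#   write_boinc_delay 20
```

ExecStartPre runs before the main process, so the sleep happens before the client does its GPU detection, which start_delay cannot achieve.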
