GPU giving Mutex lock error while processing Einstein@Home WU

jay
Joined: 25 Jan 07
Posts: 99
Credit: 84044023
RAC: 0
Topic 197601

Hi there,
I'm running a Linux system with a Radeon/ATI 7750 card.

Just in the last few weeks, I have been seeing this error in /var/log/syslog (from the GPU running the OpenCL driver):

Jun  6 14:01:24 pc-14-large kernel: [13330.559474] [fglrx:KAS_Mutex_Release] *ERROR* Mutex released without holding it.
Jun  6 14:01:24 pc-14-large kernel: [13330.746082] [fglrx:KAS_Mutex_Release] *ERROR* Mutex released without holding it.
Jun  6 14:05:22 pc-14-large kernel: [13568.659181] [fglrx:KAS_Mutex_Release] *ERROR* Mutex released without holding it.
Jun  6 14:15:35 pc-14-large kernel: [14179.673419] [fglrx:KAS_Mutex_Release] *ERROR* Mutex released without holding it.
Jun  6 14:16:07 pc-14-large kernel: [14212.460345] [fglrx:KAS_Mutex_Release] *ERROR* Mutex released without holding it.

The WUs complete. I don't see any error reported to Einstein.

The time interval between errors decreases until the system crashes.
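
To check whether the gaps really are shrinking, the intervals can be measured from the kernel uptime stamps in the log. This is only a minimal sketch: the sample lines are the five from the excerpt above, and the names `SYSLOG`, `PATTERN` and `mutex_error_gaps` are made up for illustration. For a real check, the text of /var/log/syslog would be read in instead.

```python
import re

# Sample lines copied from the syslog excerpt above (not a full log).
SYSLOG = """\
Jun  6 14:01:24 pc-14-large kernel: [13330.559474] [fglrx:KAS_Mutex_Release] *ERROR* Mutex released without holding it.
Jun  6 14:01:24 pc-14-large kernel: [13330.746082] [fglrx:KAS_Mutex_Release] *ERROR* Mutex released without holding it.
Jun  6 14:05:22 pc-14-large kernel: [13568.659181] [fglrx:KAS_Mutex_Release] *ERROR* Mutex released without holding it.
Jun  6 14:15:35 pc-14-large kernel: [14179.673419] [fglrx:KAS_Mutex_Release] *ERROR* Mutex released without holding it.
Jun  6 14:16:07 pc-14-large kernel: [14212.460345] [fglrx:KAS_Mutex_Release] *ERROR* Mutex released without holding it.
"""

# Kernel uptime stamp in square brackets, followed by the fglrx mutex tag.
PATTERN = re.compile(r"\[\s*(\d+\.\d+)\]\s+\[fglrx:KAS_Mutex_Release\]")


def mutex_error_gaps(text):
    """Return the seconds elapsed between consecutive mutex errors."""
    stamps = [float(m.group(1)) for m in PATTERN.finditer(text)]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


gaps = mutex_error_gaps(SYSLOG)
for gap in gaps:
    print(f"{gap:10.3f} s since previous error")
```

The uptime stamp is used rather than the wall-clock time because it has sub-second resolution, so two errors logged in the same second still give a measurable gap.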

Question: Is anyone else seeing this?

Here are my tasks:
http://einsteinathome.org/account/tasks

I set "no new tasks" just to verify that the problem exists only when running GPU WUs.

Here is what BOINC says about the system:

Fri 06 Jun 2014 10:45:42 AM EDT |  | Starting BOINC client version 7.2.42 for x86_64-pc-linux-gnu
Fri 06 Jun 2014 10:45:42 AM EDT |  | log flags: file_xfer, sched_ops, task
Fri 06 Jun 2014 10:45:42 AM EDT |  | Libraries: libcurl/7.35.0 OpenSSL/1.0.1f zlib/1.2.8 libidn/1.28 librtmp/2.3
Fri 06 Jun 2014 10:45:42 AM EDT |  | Data directory: /var/lib/boinc-client
Fri 06 Jun 2014 10:45:42 AM EDT |  | CAL: ATI GPU 0: AMD Radeon HD 7700 series (Capeverde) (CAL version 1.4.1848, 2048MB, 1906MB available, 2048 GFLOPS peak)
Fri 06 Jun 2014 10:45:42 AM EDT |  | OpenCL: AMD/ATI GPU 0: AMD Radeon HD 7700 series (Capeverde) (driver version 1411.4 (VM), device version OpenCL 1.2 AMD-APP (1411.4), 2048MB, 1906MB available, 2048 GFLOPS peak)
Fri 06 Jun 2014 10:45:42 AM EDT |  | OpenCL CPU: AMD FX(tm)-8150 Eight-Core Processor (OpenCL driver vendor: Advanced Micro Devices, Inc., driver version 1411.4 (sse2,avx,fma4), device version OpenCL 1.2 AMD-APP (1411.4))
Fri 06 Jun 2014 10:45:42 AM EDT |  | Host name: pc-14-large
Fri 06 Jun 2014 10:45:42 AM EDT |  | Processor: 8 AuthenticAMD AMD FX(tm)-8150 Eight-Core Processor [Family 21 Model 1 Stepping 2]
Fri 06 Jun 2014 10:45:42 AM EDT |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 nodeid_msr topoext perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
Fri 06 Jun 2014 10:45:42 AM EDT |  | OS: Linux: 3.13.0-29-lowlatency
Fri 06 Jun 2014 10:45:42 AM EDT |  | Memory: 7.70 GB physical, 13.67 GB virtual
Fri 06 Jun 2014 10:45:42 AM EDT |  | Disk: 19.10 GB total, 16.50 GB free
Fri 06 Jun 2014 10:45:42 AM EDT |  | Local time is UTC -4 hours
Fri 06 Jun 2014 10:45:42 AM EDT | Einstein@Home | Found app_config.xml
Fri 06 Jun 2014 10:45:42 AM EDT | Einstein@Home | Your app_config.xml file refers to an unknown application 'einsteinbinary_BRP4G'.  Known applications: 'einstein_S6CasA', 'hsgamma_FGRP3', 'einsteinbinary_BRP5'

Here is the result of the WU with the mutex error above.
(I don't see any problem.):
http://einsteinathome.org/task/439570628

Here is the WU:
http://einsteinathome.org/workunit/191675693

I have a Radeon (ATI) card with 2 GB of memory.
My config is set to 1/1 - 1 whole CPU and only 1 WU on the GPU.
I only run GPU work for Einstein - no CPU tasks.

The CPU has 8 cores. They work on WCG and POGS.
CPU utilization is set to 75%:
2 idle
1 to handle the load/unload of the Einstein GPU work
5 to handle WCG and POGS WUs.

I use Ubuntu Studio (Ubuntu 14.04) with the 'restricted' fglrx-updates driver provided by Ubuntu.

Today, I installed the fglrx-updates-dev package to see if there was any difference.
No difference. Still errors.

i A fglrx-amdcccle-updates          - Catalyst Control Center for the AMD graphi
i   fglrx-updates                   - Video driver for the AMD graphics accelera
i   fglrx-updates-dev               - Video driver for the AMD graphics accelera

Yes, this is a lot to read.
Sorry about that.
I wanted to cut down on "Can you get this info?" replies.

:)

That said, is there any other info I can get to help debug?

One other thing: looking at the output, I did not see any checkpoints, so maybe that can be ruled out...

I'm stumped. I didn't see "mutex" in the forum search tool.

Is more logging needed?

Suggestions appreciated.

Thanks,
Jay

PS.
Just in case, I also posted this problem as an Ubuntu question on Launchpad Answers:
https://answers.launchpad.net/ubuntu/+question/249882
No suggestions there - yet.

Thanks again,
Jay

jay
Joined: 25 Jan 07
Posts: 99
Credit: 84044023
RAC: 0

GPU giving Mutex lock error while processing Einstein@Home WU

I stopped Einstein and tried a SETI GPU task for comparison.
I get the same mutex errors - on only one occasion so far,
but five times within the same second.
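
Bursts like that can be counted by grouping the mutex errors on their syslog wall-clock second. Again only a rough sketch: the sample lines are the Einstein ones from the first post (the SETI log lines are not reproduced here), and `SYSLOG`, `STAMP` and `errors_per_second` are names invented for the example.

```python
import re
from collections import Counter

# Sample lines from the Einstein excerpt in the first post, used as stand-in data.
SYSLOG = """\
Jun  6 14:01:24 pc-14-large kernel: [13330.559474] [fglrx:KAS_Mutex_Release] *ERROR* Mutex released without holding it.
Jun  6 14:01:24 pc-14-large kernel: [13330.746082] [fglrx:KAS_Mutex_Release] *ERROR* Mutex released without holding it.
Jun  6 14:05:22 pc-14-large kernel: [13568.659181] [fglrx:KAS_Mutex_Release] *ERROR* Mutex released without holding it.
Jun  6 14:15:35 pc-14-large kernel: [14179.673419] [fglrx:KAS_Mutex_Release] *ERROR* Mutex released without holding it.
Jun  6 14:16:07 pc-14-large kernel: [14212.460345] [fglrx:KAS_Mutex_Release] *ERROR* Mutex released without holding it.
"""

# Capture the "Mon DD HH:MM:SS" prefix of every line that carries the mutex tag.
STAMP = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s.*KAS_Mutex_Release", re.MULTILINE
)


def errors_per_second(text):
    """Count mutex errors for each wall-clock second in the log text."""
    return Counter(m.group(1) for m in STAMP.finditer(text))


counts = errors_per_second(SYSLOG)
for stamp, n in counts.items():
    if n > 1:
        print(f"{stamp}: {n} errors in the same second")
```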

I got a 'new' error about 2 minutes after the start of the first SETI GPU WU:
Jun 8 01:21:59 pc-14-large kernel: [12372.687644] waiting module removal not supported: please upgrade
I will also search/post in the SETI forum.
Jay
