Hi there,
I'm running a Linux system with a Radeon/ATI HD 7750 card.
Just in the last few weeks, I have been seeing this error in /var/log/syslog (from the GPU running the OpenCL driver):
Jun 6 14:01:24 pc-14-large kernel: [13330.559474] [fglrx:KAS_Mutex_Release] *ERROR* Mutex released without holding it.
Jun 6 14:01:24 pc-14-large kernel: [13330.746082] [fglrx:KAS_Mutex_Release] *ERROR* Mutex released without holding it.
Jun 6 14:05:22 pc-14-large kernel: [13568.659181] [fglrx:KAS_Mutex_Release] *ERROR* Mutex released without holding it.
Jun 6 14:15:35 pc-14-large kernel: [14179.673419] [fglrx:KAS_Mutex_Release] *ERROR* Mutex released without holding it.
Jun 6 14:16:07 pc-14-large kernel: [14212.460345] [fglrx:KAS_Mutex_Release] *ERROR* Mutex released without holding it.
The WUs complete, and I don't see any error reported to Einstein.
The time interval between errors decreases until the system crashes.
Question: Is anyone else seeing this?
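To check whether the gap between errors really is shrinking, something like the following could pull the fglrx mutex errors out of the syslog and print the seconds between consecutive ones. This is a minimal sketch, assuming the syslog line layout shown above. It uses the bracketed kernel uptime stamp rather than the wall-clock time, so it should only be fed one boot's worth of log, because the stamp resets after a crash and reboot. The helper name mutex_error_intervals is hypothetical.

```python
import re

# Matches kernel syslog lines such as:
#   Jun 6 14:01:24 pc-14-large kernel: [13330.559474] [fglrx:KAS_Mutex_Release] *ERROR* ...
# and captures the bracketed kernel uptime (seconds since boot).
MUTEX_ERROR = re.compile(
    r"kernel:\s+\[\s*(?P<uptime>\d+\.\d+)\]\s+\[fglrx:KAS_Mutex_Release\]"
)


def mutex_error_intervals(lines):
    """Hypothetical helper: seconds between consecutive fglrx mutex errors.

    Uses the kernel uptime stamp, which resets on reboot, so pass in
    lines from a single boot only. Non-matching lines are ignored.
    """
    stamps = [
        float(m.group("uptime"))
        for m in (MUTEX_ERROR.search(line) for line in lines)
        if m
    ]
    # Pair each error with the next one and take the difference.
    return [round(later - earlier, 3) for earlier, later in zip(stamps, stamps[1:])]


# The five errors quoted in the post.
SAMPLE = [
    "Jun 6 14:01:24 pc-14-large kernel: [13330.559474] [fglrx:KAS_Mutex_Release] *ERROR* Mutex released without holding it.",
    "Jun 6 14:01:24 pc-14-large kernel: [13330.746082] [fglrx:KAS_Mutex_Release] *ERROR* Mutex released without holding it.",
    "Jun 6 14:05:22 pc-14-large kernel: [13568.659181] [fglrx:KAS_Mutex_Release] *ERROR* Mutex released without holding it.",
    "Jun 6 14:15:35 pc-14-large kernel: [14179.673419] [fglrx:KAS_Mutex_Release] *ERROR* Mutex released without holding it.",
    "Jun 6 14:16:07 pc-14-large kernel: [14212.460345] [fglrx:KAS_Mutex_Release] *ERROR* Mutex released without holding it.",
]

print(mutex_error_intervals(SAMPLE))  # [0.187, 237.913, 611.014, 32.787]
```

On the live log you would pass an open file instead of SAMPLE, e.g. with open("/var/log/syslog") as f: print(mutex_error_intervals(f)). On Ubuntu, reading syslog may require root or membership in the adm group.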
Here are my tasks:
http://einsteinathome.org/account/tasks
I set "no new tasks" just to verify that the problem exists only when running GPU WUs.
Here is what BOINC says about the system:
Fri 06 Jun 2014 10:45:42 AM EDT | | Starting BOINC client version 7.2.42 for x86_64-pc-linux-gnu
Fri 06 Jun 2014 10:45:42 AM EDT | | log flags: file_xfer, sched_ops, task
Fri 06 Jun 2014 10:45:42 AM EDT | | Libraries: libcurl/7.35.0 OpenSSL/1.0.1f zlib/1.2.8 libidn/1.28 librtmp/2.3
Fri 06 Jun 2014 10:45:42 AM EDT | | Data directory: /var/lib/boinc-client
Fri 06 Jun 2014 10:45:42 AM EDT | | CAL: ATI GPU 0: AMD Radeon HD 7700 series (Capeverde) (CAL version 1.4.1848, 2048MB, 1906MB available, 2048 GFLOPS peak)
Fri 06 Jun 2014 10:45:42 AM EDT | | OpenCL: AMD/ATI GPU 0: AMD Radeon HD 7700 series (Capeverde) (driver version 1411.4 (VM), device version OpenCL 1.2 AMD-APP (1411.4), 2048MB, 1906MB available, 2048 GFLOPS peak)
Fri 06 Jun 2014 10:45:42 AM EDT | | OpenCL CPU: AMD FX(tm)-8150 Eight-Core Processor (OpenCL driver vendor: Advanced Micro Devices, Inc., driver version 1411.4 (sse2,avx,fma4), device version OpenCL 1.2 AMD-APP (1411.4))
Fri 06 Jun 2014 10:45:42 AM EDT | | Host name: pc-14-large
Fri 06 Jun 2014 10:45:42 AM EDT | | Processor: 8 AuthenticAMD AMD FX(tm)-8150 Eight-Core Processor [Family 21 Model 1 Stepping 2]
Fri 06 Jun 2014 10:45:42 AM EDT | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 nodeid_msr topoext perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
Fri 06 Jun 2014 10:45:42 AM EDT | | OS: Linux: 3.13.0-29-lowlatency
Fri 06 Jun 2014 10:45:42 AM EDT | | Memory: 7.70 GB physical, 13.67 GB virtual
Fri 06 Jun 2014 10:45:42 AM EDT | | Disk: 19.10 GB total, 16.50 GB free
Fri 06 Jun 2014 10:45:42 AM EDT | | Local time is UTC -4 hours
Fri 06 Jun 2014 10:45:42 AM EDT | Einstein@Home | Found app_config.xml
Fri 06 Jun 2014 10:45:42 AM EDT | Einstein@Home | Your app_config.xml file refers to an unknown application 'einsteinbinary_BRP4G'. Known applications: 'einstein_S6CasA', 'hsgamma_FGRP3', 'einsteinbinary_BRP5'
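The startup log above also warns that app_config.xml names 'einsteinbinary_BRP4G', which this client doesn't know. That may or may not be related to the mutex errors, but it is easy to cross-check the <name> entries against the known applications the log lists. The sketch below assumes the standard app_config.xml layout (<app_config>, <app>, <name>, <gpu_versions>); the EXAMPLE file content and the unknown_apps helper are hypothetical, since the actual file isn't posted here. The real file should be in the Einstein@Home project folder under /var/lib/boinc-client/projects/.

```python
import xml.etree.ElementTree as ET

# Application names the BOINC log reported as known for Einstein@Home on this host.
KNOWN_APPS = {"einstein_S6CasA", "hsgamma_FGRP3", "einsteinbinary_BRP5"}


def unknown_apps(app_config_xml, known=KNOWN_APPS):
    """Hypothetical helper: return <app><name> entries the client won't recognize."""
    root = ET.fromstring(app_config_xml)
    names = [app.findtext("name", default="").strip() for app in root.findall("app")]
    return [name for name in names if name not in known]


# Hypothetical app_config.xml: one stale entry plus one valid 1 GPU / 1 CPU entry.
EXAMPLE = """<app_config>
  <app>
    <name>einsteinbinary_BRP4G</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>einsteinbinary_BRP5</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>"""

print(unknown_apps(EXAMPLE))  # ['einsteinbinary_BRP4G']
```

Any name it reports could be removed or renamed in the file; the client only reads app_config.xml at startup or when told to re-read config files.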
Here is the result of a WU that logged the mutex errors above (I don't see any problem with it):
http://einsteinathome.org/task/439570628
Here is the WU:
http://einsteinathome.org/workunit/191675693
I have a Radeon ATI card with 2 GB of memory.
My config is set to 1/1: 1 whole CPU and only 1 WU on the GPU.
I only run GPU work for Einstein, no CPU tasks.
The CPU has 8 processors, which work on WCG and POGS.
CPU utilization is set to 75%, which splits the 8 processors as:
2 idle
1 to handle the load/unload of the Einstein GPU work
5 to handle WCG and POGS WUs
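The split above is simple arithmetic, shown here as a rough sketch. It assumes the client rounds the 75% limit down to whole cores and that the Einstein GPU task reserves one full core (cpu_usage 1.0, matching the 1/1 setting). The cpu_split helper is hypothetical and ignores finer BOINC scheduler behavior such as overcommitting CPUs.

```python
import math


def cpu_split(n_cpus, usage_pct, gpu_tasks=1, cpu_per_gpu_task=1.0):
    """Hypothetical helper: rough core budget under a BOINC CPU-usage limit.

    Assumes the limit is rounded down to whole cores (at least one) and that
    each GPU task reserves `cpu_per_gpu_task` cores for feeding the GPU.
    """
    usable = max(1, math.floor(n_cpus * usage_pct / 100))
    gpu_feeder = gpu_tasks * cpu_per_gpu_task
    return {
        "idle": n_cpus - usable,          # cores left unused by the 75% limit
        "gpu_feeder": gpu_feeder,         # cores reserved for the GPU task
        "cpu_tasks": math.floor(usable - gpu_feeder),  # cores left for WCG/POGS
    }


print(cpu_split(8, 75))  # {'idle': 2, 'gpu_feeder': 1.0, 'cpu_tasks': 5}
```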
I use Ubuntu Studio (Ubuntu 14.04) with the restricted fglrx-updates driver provided by Ubuntu.
Today, I installed the fglrx-updates-dev package to see if it made any difference.
No difference. Still errors.
i A fglrx-amdcccle-updates - Catalyst Control Center for the AMD graphi
i   fglrx-updates          - Video driver for the AMD graphics accelera
i   fglrx-updates-dev      - Video driver for the AMD graphics accelera
Yes, this is a lot to read.
Sorry about that.
I wanted to cut down on "Can you get this info?" replies.
:)
That said, is there any other info I can gather to help debug this?
Other: looking at the output, I did not see any checkpoints, so maybe that can be ruled out...
I'm stumped. Searching the forum for "mutex" turned up nothing.
Is more logging needed?
Suggestions appreciated.
Thanks,
Jay
PS.
Just in case, I also posted this problem on Ubuntu's Launchpad Answers:
https://answers.launchpad.net/ubuntu/+question/249882
No suggestions there - yet.
Thanks again,
Jay
GPU giving mutex lock error while processing Einstein@Home WU
I stopped Einstein and tried a SETI GPU task for comparison.
I get the same mutex errors, but all in one burst: five times within the same second.
I got a 'new' error about 2 minutes after the start of the first SETI GPU WU:
Jun 8 01:21:59 pc-14-large kernel: [12372.687644] waiting module removal not supported: please upgrade
I will also search/post in the SETI forum.
Jay