FGRPopencl1K-ati POLARIS10 AMDGPU LLVM3.9.1 Mesa 17 = Crash

Paul

Joined: 3 May 07

Posts: 123

Credit: 1785298453

RAC: 310565

24 Sep 2017 21:06:14 UTC

Message 161930

Thanks again, Gary.

Yes, I'm certain that is exactly the problem. You have AMD-APP platform, and I do not. That's exactly what I need to fix.

This is very frustrating. I don't understand what is so different between our systems, nor do I really understand what is happening on either system.

On the other hand, and I'm sorry I didn't notice this before, "AMD-APP" makes me suspicious. There used to be AMD software for computing in /opt/AMD-APP. It didn't come with the drivers, but the driver docs said you needed to get it and pointed users to a separate download site on AMD.com. It was separate API stuff...oh, I can't remember...wait, it was Stream or something; like CUDA, but for AMD. You don't have anything like that installed, do you?

Here is my /opt directory tree:

/opt/amdgpu-pro/bin:
total 780
-rwxr-xr-x. 1 root root 798464 Aug 10 11:00 clinfo*

/opt/amdgpu-pro/lib64:
total 99980
-rwxr-xr-x. 1 root root 39131760 Aug 10 11:00 libamdocl12cl64.so*
-rwxr-xr-x. 1 root root 62971352 Aug 10 11:00 libamdocl64.so*
lrwxrwxrwx. 1 root root       22 Aug 10 09:24 libdrm_amdgpu.so.1 -> libdrm_amdgpu.so.1.0.0*
-rwxr-xr-x. 1 root root    66936 Aug 10 09:24 libdrm_amdgpu.so.1.0.0*
lrwxrwxrwx. 1 root root       22 Aug 10 09:24 libdrm_radeon.so.1 -> libdrm_radeon.so.1.0.1*
-rwxr-xr-x. 1 root root    67536 Aug 10 09:24 libdrm_radeon.so.1.0.1*
lrwxrwxrwx. 1 root root       15 Aug 10 09:24 libdrm.so.2 -> libdrm.so.2.4.0*
-rwxr-xr-x. 1 root root    81040 Aug 10 09:24 libdrm.so.2.4.0*
lrwxrwxrwx. 1 root root       15 Aug 10 09:24 libkms.so.1 -> libkms.so.1.0.0*
-rwxr-xr-x. 1 root root    18736 Aug 10 09:24 libkms.so.1.0.0*
lrwxrwxrwx. 1 root root       14 Aug 10 11:00 libOpenCL.so -> libOpenCL.so.1*
-rwxr-xr-x. 1 root root    27336 Aug 10 11:00 libOpenCL.so.1*

/opt/amdgpu-pro/share:
total 4
drwxr-xr-x. 3 root root 4096 Sep 23 15:57 doc/

/opt/amdgpu-pro/share/doc:
total 4
drwxr-xr-x. 2 root root 4096 Sep 23 15:57 libdrm-amdgpu-pro-2.4.70/

/opt/amdgpu-pro/share/doc/libdrm-amdgpu-pro-2.4.70:
total 4
-rw-r--r--. 1 root root 1627 Jul 24 01:42 README

One difference I see is that you do not have the libOpenCL provided by AMD, but I do. Since this file came with libamdocl64 and libamdocl12cl64, I'm not sure what to make of it. Perhaps this is the key, but I would have expected more problems with your situation than mine.

Paul

Joined: 3 May 07

Posts: 123

Credit: 1785298453

RAC: 310565

24 Sep 2017 21:54:29 UTC

Message 161931

I moved libOpenCL out of the way, but it didn't help.

$ LD_LIBRARY_PATH=/opt/amdgpu-pro/lib64 /opt/amdgpu-pro/bin/clinfo
/opt/amdgpu-pro/share/libdrm/amdgpu.ids: No such file or directory
Segmentation fault (core dumped)

I don't know what /opt/amdgpu-pro/share/libdrm/amdgpu.ids is, but I don't think either of us has it. That error doesn't occur with the AMD version of libOpenCL, though.

Paul

Joined: 3 May 07

Posts: 123

Credit: 1785298453

RAC: 310565

25 Sep 2017 4:38:06 UTC

Message 161937

Yikes. I rebooted and GDM wouldn't start. Dang, this is worse than expected. I removed everything I installed and then GDM would start.

That doesn't even make sense. GDM wouldn't have been looking at any of the libraries, at least not directly. The only component installed globally would be the ICD stuff.
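For anyone checking the same thing, here is a minimal sketch for seeing what the globally installed ICD registrations point at. It assumes the conventional Linux loader directory /etc/OpenCL/vendors, where each .icd file names one vendor library (for example libamdocl64.so from the listing above). The function name list_icds is made up for illustration, and your loader may be configured to look elsewhere.

```shell
#!/bin/sh
# List OpenCL ICD registrations: each *.icd file names one vendor library.
# ASSUMPTION: /etc/OpenCL/vendors is the loader's search directory here;
# pass another directory as the first argument if yours differs.
list_icds() {
    dir="${1:-/etc/OpenCL/vendors}"
    for f in "$dir"/*.icd; do
        [ -e "$f" ] || continue      # the glob matched nothing
        # The first line holds a library name or an absolute path.
        lib=$(head -n 1 "$f")
        printf '%s -> %s\n' "$(basename "$f")" "$lib"
    done
}

list_icds "$@"
```

If a registration names a bare library (no path), the dynamic linker still has to find it, which is where LD_LIBRARY_PATH or the ld.so configuration comes in.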

So, I'm still missing a piece.

Gordon Haverland

Joined: 28 Oct 16

Posts: 20

Credit: 428489605

RAC: 0

7 Oct 2017 23:54:46 UTC

Message 162154

I don't know if this is relevant, but somewhere around the 4.8/4.9 kernel, the vsyscall default changed from emulate to none for most distributions of Linux. If you look in /var/log/messages (or similar) and see a bunch of errors about vsyscall, this could be the source. You can change the Linux kernel boot line to include vsyscall=emulate in the options, and this seems to help. I gather the reason this option was changed to none is that there are security concerns. So hopefully at some point BOINC quits requiring this to be set to emulate.
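A quick way to see whether the running kernel was booted with the option is to look at /proc/cmdline. The sketch below is only an illustration: vsyscall_mode is a made-up helper name, and "unset" just means the option was not given, in which case the kernel falls back to whatever default the distribution built in.

```shell
#!/bin/sh
# Print the vsyscall= value from a kernel command line string, or "unset"
# when the option is absent. If it is given more than once, the last
# occurrence is reported.
vsyscall_mode() {
    mode=$(printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^vsyscall=//p' | tail -n 1)
    printf '%s\n' "${mode:-unset}"
}

# /proc/cmdline holds the options the running kernel was booted with.
vsyscall_mode "$(cat /proc/cmdline 2>/dev/null)"
```

How the option gets onto the boot line depends on the boot loader and distribution, so that step is left out here.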

Paul

Joined: 3 May 07

Posts: 123

Credit: 1785298453

RAC: 310565

8 Oct 2017 1:07:44 UTC

Message 162158

Wow, thanks Gordon. I noticed your posts elsewhere, but didn't think that was related. After your post here, I checked, and I do see vsyscall at the very bottom of my backtrace. Hmm. I'll give it a try on my next reboot.

So, I'm trying to understand this issue. Does this mean that the E@H GPU app uses vsyscall directly? There isn't any problem with CPU apps, only GPU apps. So, it must not be BOINC, right? All GPU apps on my system fail, though I don't have logs from any other app but E@H to check for this particular indicator. It may not be the same problem. Or, this may only be one problem and a new one will pop up after I fix it.

Paul

Joined: 3 May 07

Posts: 123

Credit: 1785298453

RAC: 310565

9 Oct 2017 16:05:16 UTC

Message 162179

The E@H account service was down yesterday, so I haven't been able to enable GPU computing to test. However, I tried MilkyWay@H and got only failures. Also, I'm still getting SETI@H errors, too. So, I don't think vsyscall has anything to do with my root problem.

Paul

Joined: 3 May 07

Posts: 123

Credit: 1785298453

RAC: 310565

9 Oct 2017 16:16:32 UTC

Message 162180

I have confirmed that vsyscall=emulate made no improvement to E@H. I'm running kernel 4.13, now. It's a very interesting suggestion, however, since my crashes end with:

7ffe849b9000-7ffe849bc000 r--p 00000000 00:00 0 [vvar]
7ffe849bc000-7ffe849be000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

but, changing to 'emulate' had no effect.
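A side note on reading those lines: a process memory map is listed in ascending address order, and the vsyscall page sits at a fixed, very high address, so it is normally the last entry of any such dump whether or not vsyscall has anything to do with the crash. The sketch below only reports whether the mapping is present and with which permissions. The helper name vsyscall_perms is made up, and the input is assumed to be in /proc/PID/maps format (permissions in the second field, the name in the last).

```shell
#!/bin/sh
# Report the permissions of the [vsyscall] mapping in a maps-style dump,
# or "absent" if there is no such line.
vsyscall_perms() {
    perms=$(awk '$NF == "[vsyscall]" { print $2 }' "$1")
    printf '%s\n' "${perms:-absent}"
}

# Default to the map of the process doing the reading; pass a saved dump
# as the first argument instead.
maps="${1:-/proc/self/maps}"
if [ -r "$maps" ]; then
    vsyscall_perms "$maps"
fi
```

Run against the lines quoted above, this would report r-xp, meaning the page was mapped in that process.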

Paul

Joined: 3 May 07

Posts: 123

Credit: 1785298453

RAC: 310565

24 Feb 2018 19:55:08 UTC

Message 164400

Problems persist 6 months later after more upgrades.

Same problem reported here https://einsteinathome.org/content/computation-errors-ubuntu17-rx460-opencl-mesa

See these bugs.

https://bugs.freedesktop.org/show_bug.cgi?id=104182

https://bugs.freedesktop.org/show_bug.cgi?id=104681

Paul

Joined: 3 May 07

Posts: 123

Credit: 1785298453

RAC: 310565

10 Dec 2018 16:53:46 UTC

Message 168177

OMG, it's working. I noticed some updates to Mesa, and just decided to try Einstein@Home. Yay!

I now have DRM 3.27.0 and LLVM 7.0.0, driver 18.2.6 and OpenCL 1.1, Mesa 18.2.6.

Fingers crossed, this keeps working for a long time. It's been down for 2.5 years (well, less than that, but I'm not sure how much less).

Anybody else see this?
