Rejoin and everything bombs!

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

I'm happy to hear that you managed to get this solved and are up and running!

I think the key difference between Milkyway and Einstein might be the OpenCL version (and maybe how it's implemented). You had 1.1 and I think Einstein needs 1.2, though I might be wrong on that...

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117711462383
RAC: 35090400

James L. Neill wrote:
I do find it interesting that Milkyway could recognise the mesa drivers and Einstein not. I suspect that any pre-existing opencl package should be removed before installing the amdgpu-pro stack.

Hi James, I'm very happy to see that you sorted it out.  I don't use Ubuntu so didn't feel comfortable with trying to offer advice about procedures I've never used myself.

One thing to realise, which relates to the above quote, is that the OpenCL compute libs are quite separate from the video drivers.  You still would have Mesa (with the amdgpu kernel module) on top of which you now have OpenCL libs that are compliant with the OpenCL 1.2 specification rather than the 1.1 compliant libs that you previously had.  It seems there are extra (or different) functions in 1.2 that the Einstein App needs whereas Milkyway only needs functions that were available in 1.1.

It is supposed to be OK to have multiple OpenCL implementations installed (according to what I've read - maybe I've misinterpreted) and the app is supposed to choose the implementation that it needs.  Since it's the BOINC client that detects OpenCL, perhaps the fact that BOINC previously found 4 GPUs points to some weakness in the methods BOINC uses to detect your hardware and its compute capabilities.  When I looked yesterday, BOINC had listed 4 GPUs for your host.  Today it only shows 2.

Whenever I set up a new machine, I always make sure that 'No new tasks' (NNT) is set so that I have a chance to look (in the event log) at how BOINC has detected the OpenCL capabilities of the GPU before any work fetch occurs.  If the number of GPUs or the OpenCL version isn't correctly reported in the startup messages, there's no use downloading or attempting to crunch any work.  Even before that, I run the clinfo utility that comes with the amdgpu-pro package to make sure the OS is properly detecting the correct GPU details and its compute capabilities (correct OpenCL platform and version info).
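
For anyone who wants to automate that clinfo check, here is a rough Python sketch.  It assumes that clinfo prints its version information on lines containing the word "Version" followed by a string such as "OpenCL 1.2 ..." (the exact layout differs between clinfo builds, so treat the pattern as an assumption).  The function names and the sample text are made up for illustration, and the 1.2 minimum is only the figure suggested in this thread, not a confirmed requirement.

```python
import re

# Matches e.g. "OpenCL 1.2" and captures major/minor. It deliberately does
# not match "OpenCL C 1.2" (the OpenCL C language version lines).
_VERSION_RE = re.compile(r"OpenCL\s+(\d+)\.(\d+)")

# Illustrative sample only; real clinfo output is longer and varies by build.
SAMPLE = """\
  Platform Version    OpenCL 2.1 AMD-APP (0000.0)
  Device Version      OpenCL 1.2 AMD-APP (0000.0)
"""

def opencl_versions(clinfo_text):
    """Return (major, minor) tuples found on lines that mention 'Version'."""
    versions = []
    for line in clinfo_text.splitlines():
        if "Version" not in line:
            continue
        match = _VERSION_RE.search(line)
        if match:
            versions.append((int(match.group(1)), int(match.group(2))))
    return versions

def meets_minimum(clinfo_text, minimum=(1, 2)):
    """True if at least one reported OpenCL version is >= minimum."""
    return any(v >= minimum for v in opencl_versions(clinfo_text))

print(opencl_versions(SAMPLE))
print(meets_minimum(SAMPLE))
```

To try it against a real machine, you could feed it the text from subprocess.run(["clinfo"], capture_output=True, text=True).stdout.  If it returns False (or finds no versions at all), that's a hint to sort out the OpenCL install before allowing any work fetch.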

I don't use any of the 'big' distros - they seem too 'Microsoft-like' in their attitude.  There's a lot of extra bloat, and they hide all the knobs you really need away somewhere in case the user decides to play with them. :-)

Mine is RPM based and uses Mesa on top of the amdgpu kernel module for video.  I just extract the OpenCL libs from the Red Hat version (RPM style packages) of amdgpu-pro.  It's just a couple of RPMs from over 50 in the full set of packages.  It all works perfectly.  I install both the legacy and PAL versions of the OpenCL libs in case I ever want to swap a Polaris (or earlier) GPU that needs legacy for, say, a Vega-based GPU that uses PAL.  I have a couple of recent AMD APUs that have internal Vega graphics and the same install method works fine on them, so it's obviously OK to have two separate OpenCL implementations, at least under my circumstances.

Cheers,
Gary.
