GPU crunching on Linux - Fun with AMD fglrx drivers

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109386730141
RAC: 35925423
Topic 197409

I've recently set up a bunch of HD7850s in a variety of hosts with both true quad core and dual core + HT CPUs. There are even a couple of six-core hosts in the mix. In the main, these have turned out for me to be a good compromise between purchase price, power consumption and productivity.

But that's another story. This one will be way too long just on its own :-).

These hosts are all running Linux and it's been quite a learning curve to arrive at where I wanted to be. The distro I prefer to use (PCLinuxOS) still doesn't have packages in the repository which allow a simple, properly working, AMD driver install or upgrade. It does have them for NVIDIA GPUs. When I bought my first AMD card, I actually switched to openSUSE to get a properly working system. However, for various arcane reasons, I don't want openSUSE for everything. I really want PCLinuxOS. For EVERYTHING!! :-). Being pig-headed, I was determined to make AMD GPUs work on PCL.

I achieved that quite a while ago by being persistent. With NVIDIA, all I had to do was install the driver packages and a separate CUDA package and everything just worked. With AMD, there were catalyst driver packages but no separate OpenCL package. The driver packages worked fine for having a properly functioning fglrx driver but of course no crunching capability.

So back at the time of catalyst 13.4, I decided to completely remove the repo-supplied packages and download catalyst 13.4 for Linux from the AMD website. I'm pretty sure I read that the OpenCL GPU runtime was included in the driver package. I had no problem running the installer in the package and everything completed without error. I was able to reboot and get the fglrx driver running so the display was just peachy. However, still no go for crunching. In desperation, I downloaded the AMD APP SDK (accelerated parallel processing software development kit is what that stands for, I think). I chose the version that went with the catalyst 13.4 drivers. I didn't think I should need it since I wasn't developing for OpenCL but I had read something that caused me to wonder if it might help.

Once again it was quite easy to install and it did so without significant complaint. On rebooting, lo and behold, GPU crunching just took off at a great rate, pretty much the same as it had been with openSUSE. This was quite a while ago - last year, so why am I bringing it up now, you may well ask :-). Well, last Friday, I suffered a couple of power outages that played havoc with things - as I mentioned in this message. One of my HD7850 endowed hosts actually had the root partition corrupted so a complete reinstall was required. I was able to make the partition readable using the standard Linux tools but it was too damaged to boot. I was able to copy off the BOINC tree.

I had noted that, quite recently, a fglrx OpenCL package had appeared in the repo, so I decided that as part of the OS reinstall I would give it a try. This time the catalyst driver version was 13.12 (December 2013) so I thought it might be a good time to try a more recent set of drivers. So after the OS reinstall (which included the catalyst drivers anyway) I just added the extra package for the OpenCL libs. No problems with the installation and no problems with putting the recovered BOINC stuff back in place and having it start properly. But... missing GPU messages everywhere. So I did a complete reinstall of the OS and this time I uninstalled the fglrx packages in preparation for using the AMD stuff as I had used previously. I got the catalyst 13.12 drivers and the appropriate AMD APP SDK from the AMD website. The install procedure seemed to be pretty much identical to that for 13.4 but this time it complained of an error exit while trying to build the kernel module. Nevertheless, it seemed to run through to completion, even announcing that it had successfully finished. I wasn't having a bar of that so before scrapping everything and going back to 13.4 (which I knew would work) I decided to try a further experiment.

I figured that there wouldn't be a workable module but I may as well try getting one by installing the 13.12 driver packages from the repo once again. I wondered what might happen if I did that 'over-the-top' of whatever got installed from the AMD installer (seeing as it said it finished successfully). I was expecting it to fail but I was going to trash things anyway, so why not try. I was surprised to see the packages install without complaint. On a hunch, I didn't try installing the OpenCL libs package. Instead, I installed the AMD APP SDK that went with 13.4. That too installed without complaint. So I tried a reboot, fully expecting something to blow up - but it didn't. Then I restarted BOINC and everything just started working. Not any ordinary old working but a visibly faster form of working!!!

This is the host in question. If you look through its list of tasks you will find a single solitary compute error which would have been a task in progress at the time of the power problem. There were other tasks returned after the restart which had been sitting ready to report at the time of the outage. Tasks returned on the 21st or earlier are prior to the outage. The most recent tasks returned after the 24th are done with the upgraded drivers. The tasks are being done 4x and used to take over 19Ksecs. The most recent ones are now taking just over 17Ksecs. I was really annoyed and frustrated with the power outage but at least something good came out of it.

Now that I know I want 13.12 drivers on all hosts, I guess I need to go and start a conversation on the PCL website and see if I can get them to fix whatever is not working properly with their repo packages. I probably should have done that quite a while ago, rather than being pig-headedly self-reliant :-).

Cheers,
Gary.

robl
Joined: 2 Jan 13
Posts: 1709
Credit: 1454481158
RAC: 8804

Gary,

I read your post with interest because I just went through the same thing with a CAL AMD Radeon HD 7850/7870 series (Pitcairn) GPU on Ubuntu 12.04. My experience was anything but "FUN". Your post was about a day late for me. After many hours of googling to find a solution to this problem with the driver 13.12 install:

...

cd /var/lib/dkms/fglrx/13.251/build; sh make.sh --nohints --uname_r=3.2.0-4-686-pae --norootcheck.......(bad exit status: 1)
[Error] Kernel Module : Failed to build fglrx-13.251 with DKMS
[Error] Kernel Module : Removing fglrx-13.251 from DKMS

...

I was finally able to get it working by reading this post (scroll down to the user: snus addict). Following his instructions I was able to get a clean install of the 13.12 drivers. I had also tried the code patches mentioned in this thread but they did not seem to work. I thought I had it made. I rebooted. Everything came up and BOINC requested new work from E@H but got no GPU work. The logs were saying that I had no GPU, BUT the AMD Catalyst software correctly identified the device. Had I read your post I could have saved this effort by installing the SDK stuff. Unlike you I had had enough, so I fell back to Win7 and I am now crunching. I will revisit the Linux/AMD effort at a later date. It seems that many vendors who say they support the Linux environment do so only half-way. Their focus seems to be more on the Windows world, probably because of the larger user base. Too bad. There is no excuse for putting out "broken" driver packages. My experience with NVIDIA on Linux was a great deal more "FUN".

[EDIT] FYI: The page I linked to above reports that this is still a problem in the 14.1 beta drivers.

Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 556

Thanks for sharing your experiences.

The SDK is an important part of the installation as this will allow for OpenCL support so that the BOINC client can detect the cards as having OpenCL support.

I am currently running Slackware64 on my systems using vanilla kernel source. I installed the SDK version 2.7 once some time back and now I can easily compile and install different driver versions using the compiled kernel source. Once built, Slackware's pkgtool makes it simple to swap driver package versions back and forth. In a multi-GPU configuration, I have seen the best performance from driver 13.10 Beta 2. For some reason, the newer drivers do not perform as well including the latest 14.1 Beta driver.

I do see the occasional system lockup with the more recent drivers. This happens anywhere from 1 to 21 days into a run. I have been able to reproduce the lockup on two different systems, one of which is based on a newer CPU and motherboard. Each system has AMD cards from a different vendor. Based on that, I do not believe this to be a hardware issue. Each day, I SSH into my systems to see if they are still operational and, if not, I reboot and start BOINC again. This gets tiring after a while, but otherwise the tasks have been running normally.
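The daily SSH round could be scripted. This is a minimal sketch, assuming passwordless ssh to each box and that the BOINC client process is named `boinc`; the host names in `HOSTS` are hypothetical, and `SSH` is a hypothetical override hook (handy for a dry run). A locked-up machine simply shows up as not answering.

```shell
#!/bin/sh
# Sketch only: hypothetical host names; override with HOSTS="..." before calling.
HOSTS=${HOSTS:-"cruncher1 cruncher2"}

# Print one line per host: "ok" if ssh connects within 10 s and a process
# named exactly "boinc" is running there, otherwise "needs attention".
# SSH is a hypothetical override hook (defaults to the real ssh command).
check_fleet() {
    for h in $HOSTS; do
        if ${SSH:-ssh} -o ConnectTimeout=10 "$h" pgrep -x boinc >/dev/null 2>&1; then
            echo "$h: ok"
        else
            echo "$h: needs attention"
        fi
    done
}
```

Running `check_fleet` each morning (by hand or from cron) narrows the manual work down to the hosts it flags.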

One point to mention with AMD drivers: if you run multiple cards, make sure to enter the command below before starting the BOINC client, so that BOINC properly detects all installed cards.

export DISPLAY=:0

The aticonfig command in Linux has some nice features. You can overclock the core and memory frequencies, set fan control, and monitor the temperatures and GPU load for each card. These changes can be made through an SSH session as long as you run the above export command before running aticonfig.

Here is a simple script I made so I can remotely check temperature and fan speed for each GPU installed.

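#!/bin/sh
# Check temperature and fan speed of each GPU, one X screen per card.
# aticonfig needs DISPLAY set, even when run over ssh.
export DISPLAY=:0
aticonfig --odgt --adapter=0               # GPU 0 temperature
aticonfig --pplib-cmd "get fanspeed 0"     # fan 0 of the card on screen :0
export DISPLAY=:0.1
aticonfig --odgt --adapter=1               # GPU 1 temperature
aticonfig --pplib-cmd "get fanspeed 0"     # fan 0 of the card on screen :0.1
export DISPLAY=:0.2
aticonfig --odgt --adapter=2               # GPU 2 temperature
aticonfig --pplib-cmd "get fanspeed 0"     # fan 0 of the card on screen :0.2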
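The same check can be written as a loop so it scales with the number of cards. This is a sketch under the same assumption the script above makes (X screen `:0.N` drives adapter N); `gpu_status` and the `ATICONFIG` override are hypothetical names, the latter only there so the loop can be dry-run with `echo`.

```shell
#!/bin/sh
# Loop form of the per-GPU script: for adapters 0..N-1, point DISPLAY at the
# matching X screen (:0.0 is the same screen as :0) and query temperature and
# fan 0. ATICONFIG is a hypothetical override; it defaults to aticonfig.
gpu_status() {
    num_gpus=$1
    i=0
    while [ "$i" -lt "$num_gpus" ]; do
        echo "== GPU $i (DISPLAY=:0.$i) =="
        DISPLAY=":0.$i" ${ATICONFIG:-aticonfig} --odgt --adapter="$i"
        DISPLAY=":0.$i" ${ATICONFIG:-aticonfig} --pplib-cmd "get fanspeed 0"
        i=$((i + 1))
    done
}
```

`gpu_status 3` corresponds to the three-card script; `ATICONFIG=echo gpu_status 3` prints the commands instead of running them.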

robl
Joined: 2 Jan 13
Posts: 1709
Credit: 1454481158
RAC: 8804

I see where AMD has just released its latest Beta driver 14.2. You can get it here.

Does someone have the link to the AMD SDK download?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109386730141
RAC: 35925423

Scroll down to the bottom of this page. There's also a link to an archive page for older versions.

Cheers,
Gary.

robl
Joined: 2 Jan 13
Posts: 1709
Credit: 1454481158
RAC: 8804

I am waving a white flag.

I revisited the Linux AMD effort and have had no success. I did a fresh install of Ubuntu 12.04 and installed the new AMD beta driver from their site. I was optimistic. This driver went in cleanly without any errors. I rebooted, then installed the SDK and rebooted again. Everything looked great but again the GPU was not recognized. I have googled until the cows come home - I am still waiting for them. I am done for the day so if anyone has any suggestions/ideas give me a shout. It can't be this hard! - can it?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109386730141
RAC: 35925423

Quote:
I have googled until the cows come home - I am still waiting for them.


Maybe this is one of them. I just found it now wandering around in the paddock behind the milking shed. Looks like it's got plenty to give :-). This seems like a very useful document for explaining the ins and outs of setting up OpenCL in Linux. I found it a few minutes ago. I haven't properly digested it yet but it looks very promising.

I think I've worked out why installing the SDK on top of a repo-supplied catalyst driver didn't work for me. I decided to study the installation script from inside the SDK package. There is a very short bash script that runs a perl script to do the work. I've never used perl so I haven't a clue really but some things seem to be self-explanatory :-). Here is a code snippet from the script.

#Copy the Runtime files to System#
print OUTPUT_LOG "$steps )Copying the OpenCL runtime files to System...  \n";  $steps=$steps+1;

#Checking for Catalyst OpenCL runtime files in /usr/lib
$Cat_OCL_RT_files = '/usr/lib/libamdocl32.so';
$lib = "/opt/AMDAPP/lib";
if (-e $Cat_OCL_RT_files) {
    print "AMD Catalyst OpenCL Runtime is available hence skipping OpenCL CPU Runtime Installation Installation \n";
    $cplibCL32 = "rm -f $lib/x86/libOpenCL*";
    $result = system ($cplibCL32);
    $cplibamd32 = "rm -f $lib/x86/libamdocl32.so";
    $result = system ($cplibamd32);
    $bin = "/opt/AMDAPP/bin/";
    $clinfo = "cp -f $bin/x86/clinfo /usr/bin/";
    $result = system ($clinfo);
    $rmbin = "rm -rf $bin";
    $result = system ($rmbin);
    $symlink32 = "ln -s /usr/lib/libOpenCL.so.1 /usr/lib/libOpenCL.so";
    $result = system ($symlink32);
}
else {
    print "Installing AMD APP CPU runtime under /opt/AMDAPP/lib \n";
    $bin = "/opt/AMDAPP/bin/";
    $clinfo = "cp -f $bin/x86/clinfo /usr/bin/";
    $result = system ($clinfo);
    $rmbin = "rm -rf $bin";
    $result = system ($rmbin);
}

The above snippet seems to assume that catalyst drivers have been installed only if /usr/lib/libamdocl32.so exists. If it finds this file, it says that the "Catalyst OpenCL Runtime is available" and so deletes the shared libs from the expanded SDK (/opt/AMDAPP/lib). In the case of my repo supplied fglrx packages, they don't install things in /usr/lib. I found these shared libs in /usr/lib/fglrx-current/. So it seems to me, for my distro, I need to modify the above script to test for /usr/lib/fglrx-current/libamdocl32.so.
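The proposed change amounts to widening the installer's `-e` test from one directory to several. Here is a shell sketch of that test (not the AMD perl script itself; `find_catalyst_ocl` is a hypothetical name), with the two candidate directories taken from the paragraph above.

```shell
#!/bin/sh
# Sketch of the SDK installer's "-e /usr/lib/libamdocl32.so" check, widened
# to search several directories. Prints the first match and returns 0, or
# returns 1 if the library is in none of them.
find_catalyst_ocl() {
    lib=$1
    shift
    for dir in "$@"; do
        if [ -e "$dir/$lib" ]; then
            echo "$dir/$lib"
            return 0
        fi
    done
    return 1
}
```

For example, `find_catalyst_ocl libamdocl32.so /usr/lib /usr/lib/fglrx-current` would be expected to report the second location with the PCLinuxOS repo packages, whereas the shipped installer only ever looks at the first.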

When I get a moment, I'll take one of my other working hosts with catalyst 13.4 and upgrade it to 13.12 from the repo. I'll also install the fglrx-opencl package that has newly appeared in the repo. I'll modify the above script as indicated and then install the SDK. Maybe that'll be all I need to do :-).

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109386730141
RAC: 35925423

Thanks very much for responding.

Quote:
The SDK is an important part of the installation as this will allow for OpenCL support so that the BOINC client can detect the cards as having OpenCL support.


I'm slowly starting to find out things about this. Browsing the perl script mentioned in my previous message, plus other things I've read make me think that I don't really need the SDK but rather the 'Installable Client Driver' (ICD) stuff that ends up in /etc/OpenCL/vendors/. It seems like I need a /etc/OpenCL/vendors/amdocl32.icd, a text file containing the string "libamdocl32.so". Maybe I just need to change the string to "fglrx-current/libamdocl32.so" for the catalyst OpenCL runtime to be found in my version of Linux. Do you happen to know exactly what BOINC looks for in checking for OpenCL support? Maybe it just looks at this 'ICD registry'.
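If the ICD registry does turn out to be the missing piece, writing the `.icd` file is a one-liner. One caution, assuming the loader passes that string straight to `dlopen()` as the Khronos ICD loader does: a bare name like `libamdocl32.so` is looked up on the normal library search path, but a name containing a slash is treated as a path, so a relative `fglrx-current/libamdocl32.so` would be resolved against the current working directory rather than `/usr/lib`. The sketch therefore only accepts absolute paths. `write_icd` is a hypothetical helper, and whether BOINC's detection depends only on this registry remains the open question above.

```shell
#!/bin/sh
# Hypothetical helper: register an OpenCL runtime library with the ICD loader
# by writing a one-line .icd file (the real directory is usually
# /etc/OpenCL/vendors, which needs root). Only absolute paths are accepted:
# a bare name would also work if the library is on the search path, but a
# relative path such as fglrx-current/... would be resolved from the cwd.
write_icd() {
    vendors_dir=$1   # e.g. /etc/OpenCL/vendors
    icd_name=$2      # e.g. amdocl32.icd
    lib_path=$3      # e.g. /usr/lib/fglrx-current/libamdocl32.so
    case $lib_path in
        /*) ;;
        *)
            echo "write_icd: need an absolute path, got: $lib_path" >&2
            return 1
            ;;
    esac
    mkdir -p "$vendors_dir" && printf '%s\n' "$lib_path" > "$vendors_dir/$icd_name"
}
```

As root, that would be something like `write_icd /etc/OpenCL/vendors amdocl32.icd /usr/lib/fglrx-current/libamdocl32.so`.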

Quote:
In a multi-GPU configuration, I have seen the best performance from driver 13.10 Beta 2. For some reason, the newer drivers do not perform as well including the latest 14.1 Beta driver.


Thanks for the info. I'm only ever using a single GPU per host and the performance gain from 13.4 to 13.12 makes me keen to see if I can repeat this. It was a Haswell based host (i3 4130) so I'll try out some Ivy Bridge hosts next that have exactly the same GPU.

Quote:
The aticonfig command in Linux has some nice features....


Yes, indeed. I use it on first setup to check things but tend not to go back since these new machines seem to run for months without issue. Apart from power failures, I only seem to get problems with them if the ambient temperature gets too hot. They are not in aircon but there is forced ventilation. During normal summer days, the air inlet gets up to around 30C and the exit is about 36C.

During a heat wave, the ambient increase is enough to cause some machines to crash. I have a script that can run around the entire fleet and stop BOINC on each one. I run it on days predicted to be 35C or above. When things cool down enough, I run the script to restart all BOINCs. I only lose a few hours in the heat of the day. It's amazing what that does to the room temperature :-). It's got me through this summer with very few problems, even though there have been days that have got to around 40C or so.
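For anyone wanting to copy the heat-wave routine, here is one way such a script could look. It is a sketch, not the script described above: host names are hypothetical, it assumes passwordless ssh, and rather than stopping the client it uses `boinccmd --set_run_mode` (`never` suspends all computing, `auto` goes back to the normal preferences). Depending on setup, `boinccmd` may also need `--passwd` to authenticate to each local client. `SSH` is a hypothetical override hook for dry runs.

```shell
#!/bin/sh
# Sketch only: hypothetical host names; override with HOSTS="..." before calling.
HOSTS=${HOSTS:-"cruncher1 cruncher2 cruncher3"}

# Set BOINC's run mode on every host: "never" suspends computing, "auto"
# resumes according to preferences. Failures are reported, not fatal.
# SSH is a hypothetical override hook (defaults to the real ssh command).
fleet_run_mode() {
    mode=$1
    for h in $HOSTS; do
        ${SSH:-ssh} "$h" boinccmd --set_run_mode "$mode" ||
            echo "fleet_run_mode: $h failed" >&2
    done
}
```

`fleet_run_mode never` on the morning of a hot day and `fleet_run_mode auto` once it cools down would mirror the routine described above.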

Thanks for the tip about export DISPLAY=:0. I used to go to individual machines but now I'm doing it from the comfort of my office desk :-). Gotta luv ssh for lots of things. Very useful inside scripts.

Cheers,
Gary.

robl
Joined: 2 Jan 13
Posts: 1709
Credit: 1454481158
RAC: 8804

Gary,

I just rolled out of bed and decided on one more approach and my cow came home. With all of the reading I had done from googling others (Ageless over at SETI) with this problem it seemed to point to an issue with libraries. The confusion seems to be with "do we need the SDK?" now. Some of the reading at AMD seems to imply that the functionality provided by the SDK is now provided in the "catalyst driver". I am still not clear about this. Anyway I "UNINSTALLED" the Ubuntu distro's version of BOINC. I downloaded the Linux version from Berkeley and installed it. It complained about a missing library so I installed the library, fired up this new BOINC and immediately I downloaded AMD GPU work. I still do not know the magic formula. The install of the SDK modifies /etc/profile to include:

AMDAPPSDKROOT="/opt/AMDAPP"
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:"/opt/AMDAPP/lib/x86_64":"/opt/AMDAPP/lib/x86"
export AMDAPPSDKROOT
export LD_LIBRARY_PATH

So my question is: does the BOINC download from Berkeley look for these "exports" to resolve libraries for GPU crunching? Or is the Ubuntu version of BOINC hosed such that it cannot find the "magic" that says we have a real GPU?
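One way to start answering that is to look at the environment the running client actually received. `/etc/profile` is read by login shells, so a client started at boot by an init script may never see those exports. On Linux, `/proc/<pid>/environ` holds the environment a process was started with, as NUL-separated `NAME=value` entries. A small sketch (the helper name is hypothetical):

```shell
#!/bin/sh
# Print the value of variable $2 from an environ-style file $1
# (NUL-separated NAME=value entries, as in /proc/<pid>/environ).
# Prints nothing if the variable is absent.
env_var_from_file() {
    tr '\0' '\n' < "$1" | grep "^$2=" | cut -d= -f2-
}
```

Assuming the client process is called `boinc`, `env_var_from_file "/proc/$(pgrep -x boinc | head -n 1)/environ" LD_LIBRARY_PATH` (run as root if the client belongs to another user) shows whether the SDK's `LD_LIBRARY_PATH` additions reached it; comparing the distro client with the Berkeley one would show whether that is the difference.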

I have made so many mods to this install that I really am clueless as to what the requirements are so I am going to blow it away, reinstall Ubuntu, the new AMD beta driver and the Berkeley BOINC. If this is successful we will know that SDK is not required and I will provide a cut and paste of the procedure.

robl
Joined: 2 Jan 13
Posts: 1709
Credit: 1454481158
RAC: 8804

Quote:
Quote:
I have googled until the cows come home - I am still waiting for them.

Maybe this is one of them. I just found it now wandering around in the paddock behind the milking shed. Looks like it's got plenty to give :-). This seems like a very useful document for explaining the ins and outs of setting up OpenCL in Linux. I found it a few minutes ago. I haven't properly digested it yet but it looks very promising.

I think I've worked out why installing the SDK on top of a repo supplied catalyst driver didn't work for me. I decided to study the installation script from inside the SDK package. There is a very short bash script that runs a perl script to do the work. I've never used perl so I haven't a clue really but some things seem to be self explanatory :-). Here is a code snippet from the script.

#Copy the Runtime files to System#
print OUTPUT_LOG "$steps )Copying the OpenCL runtime files to System...  \n";  $steps=$steps+1;

#Checking for Catalyst OpenCL runtime files in /usr/lib
$Cat_OCL_RT_files = '/usr/lib/libamdocl32.so';
$lib = "/opt/AMDAPP/lib";
if (-e $Cat_OCL_RT_files) {
print "AMD Catalyst OpenCL Runtime is available hence skipping OpenCL CPU Runtime Installation Installation \n";
$cplibCL32 = "rm -f $lib/x86/libOpenCL*";
$result = system ($cplibCL32);
$cplibamd32 = "rm -f $lib/x86/libamdocl32.so";
$result = system ($cplibamd32);
$bin = "/opt/AMDAPP/bin/";
$clinfo = "cp -f $bin/x86/clinfo /usr/bin/";
$result = system ($clinfo);
$rmbin = "rm -rf $bin";
$result = system ($rmbin);
$symlink32 = "ln -s /usr/lib/libOpenCL.so.1 /usr/lib/libOpenCL.so";
$result = system ($symlink32);
}
else { print "Installing AMD APP CPU runtime under /opt/AMDAPP/lib \n";
$bin = "/opt/AMDAPP/bin/";
$clinfo = "cp -f $bin/x86/clinfo /usr/bin/";
$result = system ($clinfo);
$rmbin = "rm -rf $bin";
$result = system ($rmbin);
}

The above snippet seems to assume that catalyst drivers have been installed only if /usr/lib/libamdocl32.so exists. If it finds this file, it says that the "Catalyst OpenCL Runtime is available" and so deletes the shared libs from the expanded SDK (/opt/AMDAPP/lib). In the case of my repo supplied fglrx packages, they don't install things in /usr/lib. I found these shared libs in /usr/lib/fglrx-current/. So it seems to me, for my distro, I need to modify the above script to test for /usr/lib/fglrx-current/libamdocl32.so.


I too had noticed this script. I thought that maybe the problem was my installation sequence: catalyst first then SDK. So I blew it all away and tried SDK then catalyst. There is even a blurb on the driver/SDK AMD page that says do "this" to avoid the install of "????". So I did the driver first then the SDK. But with my earlier post about the Berkeley BOINC, none of this really makes sense, or it just confuses the issue. As I stated earlier I will do a complete rebuild of Ubuntu, with Catalyst and Berkeley BOINC, and see if that results in a clean install without the SDK (and all of its components). It just seems so strange that the Berkeley BOINC install was able to inform E@H that I now have a GPU, whereas the distro's version of BOINC could not do this. Someone is not finding the correct libraries or whatever is needed.

Quote:

When I get a moment, I'll take one of my other working hosts with catalyst 13.4 and upgrade it to 13.12 from the repo. I'll also install the fglrx-opencl package that has newly appeared in the repo. I'll modify the above script as indicated and then install the SDK. Maybe that'll be all I need to do :-).
Richard Haselgrove
Joined: 10 Dec 05
Posts: 2139
Credit: 2752669592
RAC: 1465799

Quote:

Gary,

I just rolled out of bed and decided on one more approach and my cow came home. With all of the reading I had done from googling others (ageless over at seti) with this problem it seemed to point to an issue with libraries. The confusion seems to be with "do we need the SDK?" now. Some of the reading at AMD seems to imply that the functionality provided by SDK is now provided in "catalyst driver". I am still not clear about this. Anyway I "UNINSTALLED" the Ubuntu distro's version of BOINC. I downloaded the Linux version from Berkeley and installed it. It complained about a missing library so I installed the library, fired up this new BOINC and immediately I downloaded AMD GPU work. I still do not know the magic formula. The install of SDK modifies /etc/profile to include:

AMDAPPSDKROOT="/opt/AMDAPP"
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:"/opt/AMDAPP/lib/x86_64":"/opt/AMDAPP/lib/x86"
export AMDAPPSDKROOT
export LD_LIBRARY_PATH

so my question is: does the BOINC download from Berkeley look for these "exports" to resolve libraries for GPU crunching? Or is the Ubuntu version of BOINC hosed such that it cannot find the "magic" that says we have a real GPU.

I have made so many mods to this install that I really am clueless as to what the requirements are so I am going to blow it away, reinstall Ubuntu, the new AMD beta driver and the Berkeley BOINC. If this is successful we will know that SDK is not required and I will provide a cut and paste of the procedure.


One of the major sources of confusion - which applies just as much to Windows as to Linux - is AMD's use of terminology.

There are two quite separate bits of software. One is the full, mega, toolset that is needed by application developers and programmers. The other is a small, cut-down, driver component that is distributed with the consumer-level driver, to enable end-users to run the programs created by the developers.

Most companies would call these, respectively, a Software Development Kit, and Redistributable Runtime Components (or something like that).

AMD, on the other hand, uses 'SDK' to describe both. Or used to - they now use 'APP' to describe the consumer runtime support, as in the first line of your script snippet. You certainly need that, and you need to be assured that it's included in any Catalyst driver package that you install (the download pages have been known to lie about that before now - check that it's really there); but you won't need the full developer version SDK.
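Following the advice to check that the runtime is really there: every `.icd` file a driver registers should name a library that actually exists. A sketch, assuming the usual `/etc/OpenCL/vendors` location; absolute names are checked directly and bare names against the `ldconfig -p` cache (the loader would also honour `LD_LIBRARY_PATH`, which this ignores). `check_icd_libs` is a hypothetical helper; `clinfo`, which the SDK installer copies to /usr/bin, remains the more direct test.

```shell
#!/bin/sh
# Hypothetical helper: for each *.icd in the vendors directory, print
# "<file>: <library>: yes|no" depending on whether the named library exists.
# Returns 0 only if at least one .icd was found and every library was found.
check_icd_libs() {
    vendors_dir=${1:-/etc/OpenCL/vendors}
    seen=0
    missing=0
    for f in "$vendors_dir"/*.icd; do
        [ -e "$f" ] || continue   # glob matched nothing
        seen=$((seen + 1))
        lib=$(head -n 1 "$f")
        case $lib in
            /*) if [ -e "$lib" ]; then found=yes; else found=no; fi ;;
            *)  if ldconfig -p 2>/dev/null | grep -q "/$lib\$"; then found=yes; else found=no; fi ;;
        esac
        echo "${f##*/}: $lib: $found"
        [ "$found" = yes ] || missing=$((missing + 1))
    done
    [ "$seen" -gt 0 ] && [ "$missing" -eq 0 ]
}
```

Running `check_icd_libs` after a driver install gives a quick yes/no per registered runtime before BOINC is even started.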
