GPU crunching on Linux - Fun with AMD fglrx drivers

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5849
Credit: 110009299906
RAC: 24161499

RE: What is interesting is

Quote:
What is interesting is that a search for CUDA in Ubuntu's "Software Center" produces NO hits. Hmm. Maybe they have a reason for hiding it????


I wouldn't think so. Different distros probably have quite different packaging philosophies. PCLinuxOS seems to want to keep things small. For a long time the aim was to fit the full KDE version on a CD. Because of that my impression is that big packages were split into smaller pieces so that 'essentials' could go into the CD image and 'extras' could be downloaded from the repo just by those who really wanted the stuff. I really appreciate the minimalist 'MiniMe' version which is still less than 500MB I think. Other distros create 'meta packages' and throw everything in.

Quote:
This thread you started has been most useful to me and to others I would imagine.


I'm glad it was useful for someone. I have an ulterior motive, though. I find that writing things down helps to clarify thinking. I often work things out by trying to explain them in great detail. That has worked again for me, because I've now managed to come up with a procedure for a repo-based driver installation/upgrade that doesn't require the installation of the SDK.

Over the weekend, I went through the Perl installation script for the SDK until I felt I understood what was going on. I had a look at the stuff that ended up in /opt/AMDAPP/... and I think I can answer a question you asked earlier.

Quote:

... The install of SDK modifies /etc/profile to include:

AMDAPPSDKROOT="/opt/AMDAPP"
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:"/opt/AMDAPP/lib/x86_64":"/opt/AMDAPP/lib/x86"
export AMDAPPSDKROOT
export LD_LIBRARY_PATH

so my question is: does the BOINC download from Berkeley look for these "exports" to resolve libraries for GPU crunching?


The answer is, I reckon, no. To me this looks like setting up a development environment if you want to compile the examples or write your own code. Nothing to do with BOINC or science apps. So I formed the opinion that all this was of no interest to us. The libraries we need are already in place from the catalyst package install and the reason BOINC/science apps can't find them is that they are not in the 'right' place.

I've done a couple more upgrades to the 13.12 catalyst version and one in particular provided the answer (I think). A couple of my 'early' HD7850 hosts were still on openSUSE from before I worked out how to make it work with the drivers from the AMD website. They have always 'just worked' so I hadn't touched them previously. Now was the perfect time. I decided to save the BOINC tree, wipe the disk clean and do a brand new install with PCLinuxOS. I know exactly how to transfer a BOINC installation to a fresh OS install without risking anything so I wasn't worried about trashing anything. Below is a summary of the steps I took.

1. Stop BOINC and save the complete tree to a file share.
2. Reboot the machine to the live USB that I maintain and do a full OS installation.
3. Do initial configuration so that the host has the same hostname, IP address, customised .bashrc, etc as in its previous incarnation. Configure ssh and make sure both the new host and the fileserver are happy with each other.
4. Do a full upgrade with Synaptic. Install WxWidgets (needed by BOINC Manager) and all the fglrx-current stuff, including fglrx-current-opencl. Install a PAE kernel.
6. Reboot and watch the various kernel modules get installed on the new kernel. Go into the control centre and check the video configuration to make sure the new proprietary driver is being used. Browse /etc/X11/xorg.conf.
7. Copy the saved BOINC tree from the fileshare. The old machine had been running 7.1.3 so at this point I just copied the 7.2.42 executables 'over the top', thus performing a BOINC upgrade at the same time.
8. Load the desktop with all the icons I use for controlling BOINC, browsing project directories, pulling up different file shares, etc.
9. Check for things that the SDK would have installed or checked if it had been run. There are 4 things in particular: /usr/bin/clinfo, /etc/OpenCL/vendors/libamdocl32.icd, /usr/lib/libamdocl32.so and /usr/lib/libOpenCL.so. The first two were present but the two libs weren't - they were in /usr/lib/fglrx-current/. So I created symlinks to them in /usr/lib/.
10. Check the new BOINC executables with ldd to make sure there are no missing libs. All clear. Check the Einstein BRP5 GPU app with ldd. LibOpenCL.so was not found.
11. Run the ldconfig -v command (shown near the end of the perl script) to rebuild the cache. I know it produces a lot of output so I directed that to ldconfig.out. Check the science app again with ldd. What do you know - no missing libs any more :-).
12. Launch BOINC and watch the science app devour the GPU tasks :-).
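The symlink-and-ldconfig part of steps 9-11 can be sketched as shell commands. The fglrx paths are the ones from my install and need root to touch, so they're shown as comments; the live command just demonstrates the ldd check against /bin/sh, which exists on any Linux box - point it at the BOINC client or the science app for the real thing.

```shell
# Symlink the OpenCL libs into /usr/lib (as root; paths per my
# PCLinuxOS install - adjust for your driver package):
#   ln -s /usr/lib/fglrx-current/libamdocl32.so /usr/lib/libamdocl32.so
#   ln -s /usr/lib/fglrx-current/libOpenCL.so.1 /usr/lib/libOpenCL.so
# Rebuild the runtime linker cache, keeping the verbose output:
#   ldconfig -v > ldconfig.out 2>&1
# Check a binary for unresolved libraries, as in steps 10-11
# (/bin/sh used here so the command runs anywhere; a healthy result
# is a count of 0 "not found" entries):
ldd /bin/sh | grep -c 'not found' || true
```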

I'm not sure I needed the two symlinks or whether ldconfig would have fixed things on its own. I'll work that out on the other openSUSE machine when I upgrade it. In any case the SDK is not required.

I finished this upgrade around 10 hours ago and so far there are about 6 tasks that have completed fully under the new setup. It's a quad core running 4 GPU tasks and 2 CPU tasks concurrently. GPU tasks are BRP5 only.

Previously, for 13.4 drivers, average times were Elapsed=18.05Ksecs CPU=3.15Ksecs.
After upgrading to 13.12 drivers, average times are Elapsed=15.75Ksecs CPU=2.25Ksecs.

That's actually quite an amazing improvement.

Cheers,
Gary.

robl
Joined: 2 Jan 13
Posts: 1709
Credit: 1454560846
RAC: 3056

RE: 9. Check for things

Quote:

9. Check for things that the SDK would have installed or checked if it had been run. There are 4 things in particular, /usr/bin/clinfo, /etc/OpenCL/vendors/libamdocl32.icd, /usr/lib/libamdocl32.so and /usr/lib/libOpenCL.so. The first two were present but the two libs weren't. They were in /usr/lib/fglrx-current/. So I created symlinks to these files and put them in /usr/lib/.

Interesting. Had you not created the soft links to these two libraries and if you had not run ldconfig I wonder if you would have had an issue? If you had then the question beckons: which one, soft links or ldconfig, would have fixed it? I know that you use the Berkeley download. I prefer a distro's version of BOINC - it makes for easier management for me. So if this is a library issue then it's quite possible that distro BOINC or Berkeley BOINC might be fine. And it becomes a matter of AMD's logic correctly installing or generating soft links to the requisite libraries. For clarity would you mind posting the softlinks that you generated?

My CAL AMD Radeon HD 7850/7870 series (Pitcairn) (2048MB) driver: 1.4.1848 is performing like this:

13000 for the BRP5 tasks
3800 for the BRP4 tasks
5800 for the Gamma-ray pulsar tasks.

I am set for 3 GPU units. Of course the Gamma-ray pulsar #3 tasks cause a CPU reassignment when they are scheduled which puts some CPU tasks into a "Wait" state.


And by the way: nice progress.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5849
Credit: 110009299906
RAC: 24161499

RE: ... Had you not

Quote:
... Had you not created the soft links to these two libraries and if you had not run ldconfig I wonder if you would have had an issue?


Yes, I would have had an issue. I know this for two reasons. As I mentioned in the opening message of this thread, when I had to do a complete reinstall of the OS after a power glitch, I installed all the fglrx-current packages, including the OpenCL one, but still had ".. Missing GPU .." messages for all GPU tasks in the cache when restarting BOINC. The second reason is in point 10 of the list: I ran ldd on the BRP5 GPU app in the project directory and it said that OpenCL was not found.

Quote:
If you had then the question beckons, which one, soft links or ldconfig, would have fixed it.


I'll pin that down shortly when I upgrade another openSUSE box with a HD7850. This time I'll just run ldconfig without any extra symlinks. I'm not a programmer, so I'm oblivious to the finer points of determining "run time link bindings" - which is what the manpage says ldconfig does. I have read the manpage a couple of times and I'm getting the message that (amongst other things) ldconfig should be run when new shared libs are installed, although rebooting is supposed to do this anyway. I'm wondering if the problem is due to some missing configuration that would tell ldconfig to process stuff in /usr/lib/fglrx-current/. I'll experiment with ldconfig -p, which prints ld.so.cache, and try to work out what exactly causes the OpenCL libs to get added to the cache.
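If that hunch is right, the orthodox fix would be a drop-in file telling the dynamic linker about the driver directory, rather than symlinks. This is only a sketch - the filename is my invention and I haven't tested it on PCLinuxOS:

```
# /etc/ld.so.conf.d/fglrx-current.conf  (hypothetical filename)
/usr/lib/fglrx-current
```

followed by running ldconfig as root to rebuild ld.so.cache.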

Quote:
I know that you use the Berkeley download. I prefer a distro's version of BOINC - it makes for easier management for me.


I actually find the reverse - but I'm a nut case :-). Distro versions (I think) stick things under /var/lib/boinc and create a boinc user and group. Unless you put /var on a separate partition, an OS reinstall would force you to save the boinc tree and restore it later. For me, with /home on its own partition, all my personal configuration is protected across OS reinstalls.

On my large fleet of crunching boxes, I want a single user apart from root. I don't want to be worried about permissions when I want to create/edit/delete boinc files. I want everything I do living under /home/gary - a single point to backup if I need to. I put quite a bit of configuration in .bashrc and I don't want to maintain that for multiple users. I realise I'm not typical of the average volunteer, who would want to keep his/her personal stuff well clear of boinc.

In any case a distro version of boinc is a moot point for PCLinuxOS at the moment. They used to have it in the repo but it was always well behind the current version. I checked recently and there is no current package. There is still a source rpm but it is 7.0.65. I'm guessing that the person/volunteer who used to package boinc is no longer doing so and nobody else has taken over. A quick search of their forums shows very few hits for 'BOINC'. I'm happy to keep using it straight from Berkeley.

Quote:
For clarity would you mind posting the softlinks that you generated?


 # cd /usr/lib
 # ln -s fglrx-current/libamdocl32.so libamdocl32.so
 # ln -s fglrx-current/libOpenCL.so.1 libOpenCL.so

Quote:

My CAL AMD Radeon HD 7850/7870 series (Pitcairn) (2048MB) driver: 1.4.1848 is performing like this:

13000 for the BRP5 tasks
3800 for the BRP4 tasks
5800 for the Gamma-ray pulsar tasks.

I am set for 3 GPU units. Of course the Gamma-ray pulsar #3 tasks cause a CPU reassignment when they are scheduled which puts some CPU tasks into a "Wait" state.


Looks pretty good. That's 4300secs per task for BRP5. I'm getting around 3950secs per task by running 4x and running only BRP5. I'm tempted to try 5x and see what happens :-).

Quote:
And by the way: nice progress.


Thanks! :-).

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5849
Credit: 110009299906
RAC: 24161499

RE: RE: If you had then

Quote:
Quote:
If you had then the question beckons, which one, soft links or ldconfig, would have fixed it.

I'll pin that down shortly when I upgrade another openSUSE box with a HD7850. This time I'll just run ldconfig without any extra symlinks ...


Well things get even more curious.

I didn't upgrade a box with a 7850 but rather one with a 7770, still running openSUSE. Everything went smoothly, as per the previous report, down to step 10. This time the BRP5 science app showed no missing libs when I tested it with ldd. The ldd output showed an entry for libOpenCL.so.1 which was pointing to exactly where the lib was in /usr/lib/fglrx-current/. So if the Einstein app could find this lib, I presumed that BOINC would be able to detect the GPU on startup. I had done an extra reboot or two compared with previously, so I wondered if that was part of the magic incantation. I decided not to run ldconfig -v immediately but rather to use ldconfig -p to print out a copy of the existing ld.so.cache. Sure enough, the two entries below were present:

    libamdocl32.so (libc6) => /usr/lib/fglrx-current/libamdocl32.so
    libOpenCL.so.1 (libc6) => /usr/lib/fglrx-current/libOpenCL.so.1


There was nothing in /usr/lib/ itself, so I figured that the system did indeed know about the contents of fglrx-current/ without needing symlinks. Encouraged by this, I fired up boinc, only to be greeted with all the ... missing GPU ... messages. I shut down boinc, figuring that boinc itself must be looking for these libs to be in /usr/lib/.

So I created the symlinks but using slightly shorter commands compared with what I posted previously, as shown below

 # cd /usr/lib
 # ln -s fglrx-current/libamdocl32.so .
 # ln -s fglrx-current/libOpenCL.so.1 .


By using the dot instead of an actual name, the names of the symlinks in /usr/lib/ would be derived from the final components of the source paths, i.e. libamdocl32.so and libOpenCL.so.1 respectively. I actually wondered if this might be a mistake, since the previous working procedure had created libOpenCL.so as the name of the symlink and this time I was creating libOpenCL.so.1. So, without using ldconfig -v, I tried boinc again, but still no joy.
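The naming difference between the two forms of ln is easy to demonstrate in a scratch directory (nothing below touches /usr/lib):

```shell
cd "$(mktemp -d)"
mkdir fglrx-current
touch fglrx-current/libOpenCL.so.1
# 'ln -s target .' names the link after the last path component:
ln -s fglrx-current/libOpenCL.so.1 .
# naming the link explicitly gives the unversioned name:
ln -s fglrx-current/libOpenCL.so.1 libOpenCL.so
ls -1 libOpenCL.so*
# prints libOpenCL.so and libOpenCL.so.1
```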

I was determined to persist so I removed the offending symlink and recreated it with the name libOpenCL.so, just as it had been on the previous machine. This was the only change I made. I didn't reboot or run any utility. That would have been my next step, had I needed one. This time on firing up boinc, everything was working properly. At last!!!! :-).

Obviously, I'm a novice with all this, so don't assume the following conclusion is correct. Caveat emptor!! :-). I can't help but feel that this looks like a BOINC problem and not an AMD problem or a distro problem. All I can immediately think of is that boinc must require precisely /usr/lib/libOpenCL.so to be present in order to be convinced that there is an OpenCL capable GPU available. So I guess my next step is to pester Jord to see if he has any information on how GPU detection under Linux works and why it seems to need precisely /usr/lib/libOpenCL.so.

I have one more openSUSE machine to go. When I upgrade that to PCLinuxOS, I'll have a further chance to confirm the fully working procedure.

Cheers,
Gary.

robl
Joined: 2 Jan 13
Posts: 1709
Credit: 1454560846
RAC: 3056

RE: For clarity would you

Quote:
For clarity would you mind posting the softlinks that you generated?
 # cd /usr/lib
 # ln -s fglrx-current/libamdocl32.so libamdocl32.so
 # ln -s fglrx-current/libOpenCL.so.1 libOpenCL.so

I have read your last two posts and I don't want to confuse any readers, since you are using a different Linux distro than I am (PCLinuxOS vs. Ubuntu). As you have stated, it seems to be very much a library issue.

On Ubuntu
my libamdocl32.so library is in:

/usr/lib/i386-linux-gnu/libamdocl32.so <-- a 32 bit library

my libOpenCL.so.1 is in:

/usr/lib/libOpenCL.so.1
/usr/lib/i386-linux-gnu/libOpenCL.so.1

Neither of the two above are soft links.

I noticed on Ubuntu I have the following exported:

LIBGL_DRIVERS_PATH=/usr/lib/i386-linux-gnu/dri:/usr/lib/x86_64-linux-gnu/dri

It seems that fglrx_dri.so gets installed in both of the paths defined in /etc/profile.d/ati-fglrx.sh, which states in its header that it is modified by the ATI proprietary driver scripts. I am not sure how important the environment variable is, but I thought I would mention it.

I hope this library issue is not down to the Linux distro but rather to which packages get installed. I believe, for example, that you installed fglrx-current whereas I did not. This might account for the different paths above for the two noted libraries. What effect this has on BOINC I am not certain of, but I would think BOINC expects some standardization of library paths so that it can find what it requires for GPU support.

robl
Joined: 2 Jan 13
Posts: 1709
Credit: 1454560846
RAC: 3056

I was just looking in the

I was just looking in the boinc code and found in {my_location}/client that there are 4 files referencing hardware drivers. They are gpu_amd.cpp, gpu_nvidia.cpp, gpu_opencl.cpp, and hostinfo_unix.cpp. The first 3 are of interest.

gpu_amd.cpp references libaticalrt.so

gpu_nvidia.cpp references /usr/local/cuda/lib/libcuda.dylib
and libcuda.so

gpu_opencl.cpp references libOpenCL.so and OpenCL.dll (not of interest - windows).

I did not have a chance to determine where the BOINC code expects these libraries to reside. I probably should do that, but it's late here, so I am checking in and will look around more tomorrow.

robl
Joined: 2 Jan 13
Posts: 1709
Credit: 1454560846
RAC: 3056

I downloaded the BOINC code

I downloaded the BOINC code from the Berkeley site and compiled it. As noted in my earlier post, 3 C++ files were looking for libraries, but I had not identified the paths for those libraries. Now, based upon this compile, I feel relatively confident that BOINC from Berkeley expects them in the directories noted below.

Quote:

I was just looking in the boinc code and found in {my_location}/client that there are 4 files referencing hardware drivers. They are gpu_amd.cpp, gpu_nvidia.cpp, gpu_opencl.cpp, and hostinfo_unix.cpp. The first 3 are of interest.

gpu_amd.cpp references libaticalrt.so

[EDIT] I feel that these are the libraries that BOINC Berkeley "looks for" in these paths in order to determine if there is a valid GPU based upon vendor type. To be certain I would have to scrub a PC, rebuild and install GPU drivers, paying close attention to where these files get installed. Different packages can install in different directory paths - it's a matter of knowing which packages to install. Since I do not have an available PC I cannot just scrub a working unit to test this. If someone looking at this finds it in error then fix or delete it.


/usr/lib/libaticalrt.so

Quote:

gpu_nvidia.cpp references /usr/local/cuda/lib/libcuda.dylib
and libcuda.so


Depending on which NVIDIA driver release you have installed, you might have something like the following:

    libcuda.so.1 (libc6,x86-64) => /usr/lib/nvidia-331/libcuda.so.1
    libcuda.so.1 (libc6) => /usr/lib32/nvidia-331/libcuda.so.1
    libcuda.so (libc6,x86-64) => /usr/lib/nvidia-331/libcuda.so
    libcuda.so (libc6,x86-64) => /usr/lib/libcuda.so
    libcuda.so (libc6) => /usr/lib32/nvidia-331/libcuda.so
    libcuda.so (libc6) => /usr/lib32/libcuda.so

My NVIDIA node has had different NVIDIA drivers installed using a PPA, but I feel relatively sure that BOINC in this case would look to the standard library path (/usr/lib/libcuda.so) above.

[EDIT] I am running 64 bit Ubuntu.

Quote:

gpu_opencl.cpp references libOpenCL.so and OpenCL.dll (not of interest - windows).

/usr/lib/libOpenCL.so -> /usr/lib/libOpenCL.so.1

[Edit] For me to be certain that these are the 3 libraries needed by BOINC to find a GPU, I would need to rebuild a node from scratch, paying attention to these 3 libraries. Different packages for driver support can have an effect on these library paths. Since I do not have a "free" PC, I can't confirm that successful GPU detection depends on just these 3 libraries. If someone reading this finds it to be in error then delete or fix it. I just can't cannibalize a working unit.
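A quick way to check all 3 candidate paths at once - just a sketch, and on a box with no GPU drivers installed everything will simply report as missing:

```shell
# Probe for the three libraries BOINC's client sources appear to
# reference, at the standard /usr/lib locations discussed above:
for lib in /usr/lib/libaticalrt.so /usr/lib/libcuda.so /usr/lib/libOpenCL.so; do
    if [ -e "$lib" ]; then
        echo "present: $lib"
    else
        echo "MISSING: $lib"
    fi
done
```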

robl
Joined: 2 Jan 13
Posts: 1709
Credit: 1454560846
RAC: 3056

I just updated/upgraded this

I just updated/upgraded this node using the update manager. This upgraded various packages, including new Linux headers and a new kernel. I rebooted the node and started boincmgr from the command line (not using the Ubuntu boinc package). All CPU jobs that were in progress started to run, BUT NOT the GPU jobs - they were indicating no usable GPU. I exited boinc manager and reinstalled the AMD driver (I am using the beta version). It complained about an existing fglrx install, so I did "cd /usr/share/ati", ran ./fglrx-uninstall.sh, and followed that with an install of the beta ATI driver. It was a clean install of this driver. I restarted boincmgr from the command line and all CPU and GPU jobs started up.

Normally on a box running NVIDIA I have seen a problem after a kernel upgrade where "No Usable GPUs" is reported; however, this can almost always be fixed with "sudo service boinc-client restart". I looked/googled but could not find a way to restart the boinc-client for the Berkeley download.

I am expecting this to be "the procedure" moving forward. It was also my experience with NVIDIA when downloading their driver: any kernel change required a reinstall of their driver. This is why I chose the PPA path for NVIDIA, which eliminates the need to reinstall the driver - it is handled in the process of upgrading. I have yet to find such an option for AMD.

[EDIT] After running into the problem with "No usable GPUs" I checked to see if any of the 3 libraries discussed above had been deleted/relocated. They had not.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2140
Credit: 2773273336
RAC: 873833

RE: I rebooted the node.

Quote:
I rebooted the node. Started boincmgr from the command line (not using the Ubuntu boinc package) All CPU jobs that were in progress started to run BUT NOT the GPU jobs - they were indicating no usable GPU.


There have been many reports over the years about a timing problem during BOINC startup under Linux. If the BOINC client is started "too soon" after reboot (a deliberately vague term), and the X server isn't fully initialised, then the GPU drivers won't have finished initialising either, and BOINC - which relies on functioning drivers for GPU detection - won't be able to see the GPUs.

Could you try repeating that sequence, with either a longer pause before starting BOINC, or alternatively trying a *BOINC* closedown/restart if you see 'no usable GPU'?

Apropos, on your system, does launching BOINC Manager trigger the startup of the client, or does the client start by itself from a daemon launch script, much earlier in the boot process?
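One way to act on that timing theory is to gate the client start on a readiness condition rather than a fixed sleep. A minimal sketch - the flag file is a stand-in of my own for a real test, such as checking that the X server answers or that a driver device node exists:

```shell
# Wait up to 60s for the driver to come up before starting boinc.
ready_flag=$(mktemp)            # stand-in: condition is already true here
for i in $(seq 1 60); do
    if [ -e "$ready_flag" ]; then
        echo "driver ready - safe to start boinc"
        break
    fi
    sleep 1
done
# prints: driver ready - safe to start boinc
```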
