Impossible GPU tasks (floating point) received

rickvanderzwet
Joined: 9 Sep 18
Posts: 12
Credit: 11248898
RAC: 0
Topic 216337

My host has a Radeon HD 5770 GPU, which gets detected as:
AMD AMD JUNIPER (DRM 2.50.0 / 4.15.0-33-generic, LLVM 6.0.0) (1024MB)

This GPU, however, does not support double-precision floating-point operations, causing the computation to fail:

21:59:46 (1499): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
21:59:46 (1499): [debug]: glibc version/release: 2.27/stable
21:59:46 (1499): [debug]: Set up communication with graphics process.
boinc_get_opencl_ids returned [0x25ef148 , 0x7f5c854b2120]
Using OpenCL platform provided by: Mesa
Using OpenCL device "AMD JUNIPER (DRM 2.50.0 / 4.15.0-33-generic, LLVM 6.0.0)" by: AMD
Max allocation limit: 751619276
Global mem size: 1073741824
OpenCL compiling FAILED! : -11 . Error message: input.cl:7:26: error: unsupported OpenCL extension 'cl_khr_fp64' - ignoring
input.cl:10:30: error: unknown type name 'double2'; did you mean 'double'?
input.cl:10:30: error: use of type 'double' requires cl_khr_fp64 extension to be enabled
.
OpenCL device has no FP64 support

However, I keep getting GPU tasks assigned unless I explicitly disable them in the web configuration.

I would be happy to write patches, but have no clue where to get started. Is this a platform (BOINC) problem, or should it be handled by individual projects?

Gavin
Joined: 21 Sep 10
Posts: 191
Credit: 40643361169
RAC: 1590305

I may be wrong (and have been many times before!) but I suspect your GPU is too old for use here. According to AMD, the HD 5770 supports OpenCL 1.0, and the minimum requirement for current GPU tasks at Einstein is a card supporting OpenCL 1.2 or above.

 

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109377992872
RAC: 35987556

rickvanderzwet wrote:
... This GPU, however, does not support double-precision floating-point operations, causing the computation to fail:

Hi Rick,
Welcome to Einstein@Home.  I'm sorry you've had some setbacks whilst getting started.

I don't think the problem is the lack of FP64.  My gut feeling is that the real issue is that the driver you are using doesn't expose the full OpenCL capabilities of your card.  According to this entry on Wikipedia, your card does support OpenCL 1.2. I believe that capability is not being properly recognised.

A task is processed in two stages.  The initial stage (which does about 90% of the work) only uses single precision.  During this stage, potential candidate signals are identified.  There is a follow-up stage (the final 10% or so) which does need DP.  This second stage re-evaluates the 'toplist', the 10 most likely candidates.  If the GPU doesn't have any DP capability, the follow-up stage is supposed to be performed using a CPU core.  Of course, this will slow down that last part of the calculations, but it's not supposed to prevent a task from starting correctly in the first place.
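
As a rough illustration of that fallback decision, here is a minimal sketch that checks a device's extension list for cl_khr_fp64. The helper name has_fp64 and both sample extension strings are hypothetical, made up for illustration; this is not the application's actual code, only the shape of the check.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a space-separated OpenCL extension
# list contains cl_khr_fp64 (double-precision support).
has_fp64() {
    case " $1 " in
        *" cl_khr_fp64 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Sample extension strings, invented for this example.
no_dp_ext="cl_khr_global_int32_base_atomics cl_khr_byte_addressable_store"
dp_ext="cl_khr_fp64 cl_khr_global_int32_base_atomics"

for ext in "$no_dp_ext" "$dp_ext"; do
    if has_fp64 "$ext"; then
        echo "FP64 available: follow-up stage can run on the GPU"
    else
        echo "no FP64: follow-up stage falls back to a CPU core"
    fi
done
```

On a real system the extension list could be taken from the device section of clinfo instead of a hard-coded string.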

I don't have anything as old as a 5770 but I do have a 7770 and a number of 7850s.  I run all mine using PCLinuxOS.  I've never used Ubuntu so have no experience there.  I maintain older versions of PCLOS in order to use the deprecated fglrx driver.  I suspect that if you were able to run fglrx on an older version of Ubuntu, your card may be usable.  I don't know at what point fglrx was deprecated on Ubuntu.

 At the end of the day, I think your card could work but the performance won't be great.  The 7770 was the first AMD GPU I used at Einstein.  I continue to run it because it's still quite productive.  In the pre-GPU days, I had a lot of CPU only crunchers (2008/2010 vintage).  They were due for retirement quite a while ago.  I stuck an RX 460 (or better) in a lot of them and cut out most of the CPU crunching.  I think it's a good way to rejuvenate a machine that's too slow for CPU tasks but still running fine.  GPU crunch times are largely independent of the speed of the CPU.

 

Cheers,
Gary.

rickvanderzwet

Gary Roberts wrote:

I don't think the problem is the lack of FP64.  My gut feeling is that the real issue is that the driver you are using doesn't expose the full OpenCL capabilities of your card.  According to this entry on Wikipedia, your card does support OpenCL 1.2. I believe that capability is not being properly recognised.

Thanks, spot on!


rick@uheat:~$ lspci -k -s 01:00.0
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Juniper XT [Radeon HD 5770]
    Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Juniper XT [Radeon HD 5770]
    Kernel driver in use: radeon
    Kernel modules: radeon
rick@uheat:~$ clinfo
Number of platforms                               1
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 18.0.5
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   Clover
Number of devices                                 0

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Clover
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Clover
  clCreateContext(NULL, ...) [default]            No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No devices found in platform

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.11
  ICD loader Profile                              OpenCL 2.1

Gary Roberts wrote:

I don't have anything as old as a 5770 but I do have a 7770 and a number of 7850s.  I run all mine using PCLinuxOS.  I've never used Ubuntu so have no experience there.  I maintain older versions of PCLOS in order to use the deprecated fglrx driver.  I suspect that if you were able to run fglrx on an older version of Ubuntu, your card may be usable.  I don't know at what point fglrx was deprecated on Ubuntu.

That would be 14.04, which is still supported as an LTS release. I will give it a shot.

 

Gary Roberts wrote:

 At the end of the day, I think your card could work but the performance won't be great.  The 7770 was the first AMD GPU I used at Einstein.  I continue to run it because it's still quite productive.  In the pre-GPU days, I had a lot of CPU only crunchers (2008/2010 vintage).  They were due for retirement quite a while ago.  I stuck an RX 460 (or better) in a lot of them and cut out most of the CPU crunching.  I think it's a good way to rejuvenate a machine that's too slow for CPU tasks but still running fine.  GPU crunch times are largely independent of the speed of the CPU.

 

Performance will indeed not be great. The system, however, is used as an alternative to electric heating, which allows reuse of the hardware instead of disposal; that is better in terms of the waste hierarchy.  Helping an interesting project is a nice bonus :-).

rickvanderzwet

After quite some tinkering, I got the proper fglrx driver installed under Ubuntu 14.04.5 LTS (amd64):

$ clinfo | head -6
 Number of platforms: 1
 Platform Profile: FULL_PROFILE
 Platform Version: OpenCL 2.0 AMD-APP (1800.11)
 Platform Name: AMD Accelerated Parallel Processing
 Platform Vendor: Advanced Micro Devices, Inc.
 Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

Brief install guide

  • Install the standard Ubuntu 14.04.5 LTS amd64 Server edition

  • Download the fglrx *.deb package for Ubuntu 14.04 (the file used in the steps below) from:

https://www.amd.com/en/support/graphics/amd-radeon-hd/ati-radeon-hd-5000-series/ati-radeon-hd-5770

  •  [optional] Bring the system up to date

$ sudo apt-get update && sudo apt-get dist-upgrade

  •  Install and load compatible kernel

 $ sudo apt-get install linux-headers-3.19.0-80-generic linux-image-3.19.0-80-generic linux-image-extra-3.19.0-80-generic
 $ sudo sed -i 's/GRUB_DEFAULT=0/GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 3.19.0-80-generic"/' /etc/default/grub
 $ sudo update-grub
 $ sudo reboot

  •  Install the BOINC software with the AMD OpenCL drivers

$ sudo apt-get install boinc-amd-opencl

  •  Fix for segfault with 2nd call to GPU (e.g. on 2nd invocation of clinfo)

$ ar p fglrx_15.201-0ubuntu1_i386_UB_14.01.deb data.tar.gz | sudo tar -C / -xzf - ./etc/ati/amdpcsdb.default

  •  Fix the missing OpenCL symlink that causes BOINC not to find the coprocessor

$ (cd /tmp; ln -s libOpenCL.so.1 libOpenCL.so)

The irony

All the hard work of getting the GPU to work has proven useless. The scheduler log shows no work assigned to my Radeon HD 5770 GPU, because it reports only 512 MB of memory and at least about 800 MB is required  :-)


2018-09-19 21:16:49.2198 [PID=7518 ] [version] Checking plan class 'FGRPopencl-ati'
2018-09-19 21:16:49.2229 [PID=7518 ] [version] reading plan classes from file '/BOINC/projects/EinsteinAtHome/plan_class_spec.xml'
2018-09-19 21:16:49.2230 [PID=7518 ] [version] parsed project prefs setting 'gpu_util_fgrp': 1.000000
2018-09-19 21:16:49.2230 [PID=7518 ] [version] OpenCL GPU RAM required min: 803209216.000000, supplied: 536870912
2018-09-19 21:16:49.2230 [PID=7518 ] [version] Checking plan class 'FGRPopencl-nvidia'
2018-09-19 21:16:49.2230 [PID=7518 ] [version] parsed project prefs setting 'gpu_util_fgrp': 1.000000
2018-09-19 21:16:49.2230 [PID=7518 ] [version] No CUDA devices found
2018-09-19 21:16:49.2230 [PID=7518 ] [version] Checking plan class 'FGRPopencl1K-ati'
2018-09-19 21:16:49.2230 [PID=7518 ] [version] parsed project prefs setting 'gpu_util_fgrp': 1.000000
2018-09-19 21:16:49.2230 [PID=7518 ] [version] OpenCL GPU RAM required min: 1048576000.000000, supplied: 536870912
2018-09-19 21:16:49.2230 [PID=7518 ] [version] Checking plan class 'FGRPopencl1K-nvidia'
2018-09-19 21:16:49.2230 [PID=7518 ] [version] parsed project prefs setting 'gpu_util_fgrp': 1.000000
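
The check the scheduler reports in those lines can be reproduced by hand. The following is a small sketch, assuming the log line format shown above; the function name check_gpu_ram is hypothetical and the comparison only mirrors what the log message implies, not the scheduler's real code.

```shell
#!/bin/sh
# Hypothetical helper: pull the "required min" and "supplied" byte counts
# out of a scheduler log line and compare them.
check_gpu_ram() {
    required=$(printf '%s\n' "$1" | sed -n 's/.*required min: \([0-9]*\).*/\1/p')
    supplied=$(printf '%s\n' "$1" | sed -n 's/.*supplied: \([0-9]*\).*/\1/p')
    if [ -z "$required" ] || [ -z "$supplied" ]; then
        echo "no RAM figures in line"
    elif [ "$supplied" -ge "$required" ]; then
        echo "ok: supplied $supplied >= required $required"
    else
        echo "rejected: supplied $supplied < required $required"
    fi
}

# Log line copied from the scheduler output above.
line='2018-09-19 21:16:49.2230 [PID=7518 ] [version] OpenCL GPU RAM required min: 803209216.000000, supplied: 536870912'
check_gpu_ram "$line"
```

Feeding it the FGRPopencl1K-ati line instead gives the same verdict against the larger 1048576000-byte requirement.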

The conclusion

I need to find myself a more recent GPU with at least 1 GB of onboard memory, or find GPU tasks with smaller memory requirements.

Gary Roberts

rickvanderzwet wrote:
... I need to find myself a more recent GPU with at least 1 GB of onboard memory, or find GPU tasks with smaller memory requirements.

I'm confused!

In the second line of your opening post, the GPU was detected as having 1GB (1024MB).   That should certainly be sufficient.  I run a HD7770 with that amount of VRAM with no problem and the machine is running a full KDE desktop.

I imagine there is something else to tweak to stop the supplied memory figure being set at only 536870912, which is sufficiently more than 512MB to indicate you do indeed already have a 1GB card.

 

Cheers,
Gary.

rickvanderzwet

Gary Roberts wrote:
rickvanderzwet wrote:
... I need to find myself a more recent GPU with at least 1 GB of onboard memory, or find GPU tasks with smaller memory requirements.

I'm confused!

In the second line of your opening post, the GPU was detected as having 1GB (1024MB).   That should certainly be sufficient.  I run a HD7770 with that amount of VRAM with no problem and the machine is running a full KDE desktop.

You are right (as usual). The card comes in two variants, 512 MB and 1 GB. On visual inspection, the marking on my card clearly shows 1 GB, so something is wrong here.

Gary Roberts wrote:

I imagine there is something else to tweak to stop the supplied memory figure being set at only 536870912, which is sufficiently more than 512MB to indicate you do indeed already have a 1GB card.

 

The calculation matched up: 536870912 bytes / 1024 / 1024 = 512 MB, which tricked me into thinking this was correct. The search continues...
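
The same arithmetic can be run on the other byte counts from the scheduler log. This is only a quick sketch; the helper name to_mib is made up for this example.

```shell
#!/bin/sh
# Convert a byte count to whole MiB (1 MiB = 1024 * 1024 bytes).
to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

supplied=536870912       # "supplied" value from the scheduler log
required=803209216       # "required min" for plan class FGRPopencl-ati
required_1k=1048576000   # "required min" for plan class FGRPopencl1K-ati

echo "supplied:    $(to_mib "$supplied") MiB"
echo "required:    $(to_mib "$required") MiB"
echo "required 1K: $(to_mib "$required_1k") MiB"
```

This prints 512, 766 and 1000 MiB respectively, so a card that really exposes 1 GB clears both thresholds.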

rickvanderzwet

It turns out that setting two hidden environment variables exposes the full memory.

$ clinfo | grep -e 'Max memory' -e 'Global memory size' | head -2
  Max memory allocation:             134217728
  Global memory size:                 536870912

$ GPU_MAX_ALLOC_PERCENT=95 GPU_MAX_HEAP_SIZE=95 clinfo | grep -e 'Max memory' -e 'Global memory size' | head -2
  Max memory allocation:             254803968
  Global memory size:                 1019215872

To make them apply to all applications, I have put them in /etc/default/boinc-client, which gets loaded on startup:

$ grep GPU_ /etc/default/boinc-client
GPU_MAX_ALLOC_PERCENT=95
GPU_MAX_HEAP_SIZE=95
export GPU_MAX_ALLOC_PERCENT
export GPU_MAX_HEAP_SIZE

However, I end up with roughly the same message as before:

/einstein.phys.uwm.edu/LATeah1021L_172.0_0_0.0_12612523_0_1'
13:58:45 (2427): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
13:58:45 (2427): [debug]: glibc version/release: 2.19/stable
13:58:45 (2427): [debug]: Set up communication with graphics process.
boinc_get_opencl_ids returned [0x1a01b60 , 0x7fed72eac430]
Using OpenCL platform provided by: Advanced Micro Devices, Inc.
Using OpenCL device "Juniper" by: Advanced Micro Devices, Inc.
Max allocation limit: 254803968
Global mem size: 1019215872
OpenCL compiling FAILED! : -11 . Error message: "/tmp/OCLhykF21.cl", line 10: error: identifier "double2" is undefined
  __kernel void test( __global double2 *vec) {
                               ^

1 error detected in the compilation of "/tmp/OCLhykF21.cl".
Frontend phase failed compilation.

OpenCL device has no FP64 support
% Opening inputfile: ../../projects/einstein.phys.uwm.edu/LATeah1021L.dat

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

The error about not supporting FP64 has nothing to do with the task crashing!

A few lines further down in the "stderr output", when the task has gotten to the real starting point, you will find:

Error allocating device memory: 268435456 bytes (error: -61)
13:59:20 (2459): [CRITICAL]: ERROR: MAIN() returned with error '1'

This is the real cause of the task crashing.
As to what's causing the error I've no clue. *Sorry*

Gary Roberts

rickvanderzwet wrote:

1 error detected in the compilation of "/tmp/OCLhykF21.cl". Frontend phase failed compilation.

 

Holmis is quite correct.  The above failure of a small test routine just confirms that your card has no FP64 support.  All this should do is ensure that the follow-up stage at the end is performed using a CPU core.

As to why you get the subsequent error about allocating device memory, I have no idea either.

I did a quick google search and found this question and answers.  I don't know if any of the comments there about 14.04.4 and 14.04.5 have any bearing on the matter.  As I mentioned, I've never used Ubuntu so have no idea about how things are done in Ubuntu-land.

There was a final version of the catalyst package that (from memory) came out around mid 2015.  In any case it was just before fglrx was deprecated due to the incompatible upgrade of the xorg package.  I remember making sure that my machines using fglrx stayed with the previous version of xorg when they got that final version of fglrx.  It actually gave a nice little performance improvement at the time.

 

Cheers,
Gary.

rickvanderzwet

Holmis wrote:

The error about not supporting FP64 has nothing to do with the task crashing!

Thanks, that's a really good suggestion! I was assuming the first error was the dominant one.

 

Holmis wrote:

A few lines further down in the "stderr output", when the task has gotten to the real starting point, you will find:

Error allocating device memory: 268435456 bytes (error: -61)
13:59:20 (2459): [CRITICAL]: ERROR: MAIN() returned with error '1'

This is the real cause of the task crashing.
As to what's causing the error I've no clue. *Sorry*


I found the cause of the memory allocation error. The application is trying to allocate everything that would normally be available (268435456 bytes), whereas through the environment variables I had said that only 95% may be used (Max allocation limit: 254803968). The code (quite rightfully) has no knowledge of that limit and thus does not honour it. I set the limit to 100% and the task now runs for a short while.
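
The mismatch is easy to see when the numbers from the logs are put side by side. The sketch below uses only values quoted in this thread; the helper name fits is hypothetical, and the "request must not exceed the per-allocation limit" rule is the assumption being illustrated.

```shell
#!/bin/sh
# Values reported by the driver and by the failing task (bytes).
global_mem=1019215872   # "Global mem size" with GPU_MAX_HEAP_SIZE=95
max_alloc=254803968     # "Max allocation limit" with GPU_MAX_ALLOC_PERCENT=95
requested=268435456     # size from "Error allocating device memory"

# Succeed when a request fits within the per-allocation limit.
fits() {
    [ "$1" -le "$2" ]
}

echo "max_alloc is $(( max_alloc * 100 / global_mem ))% of global memory"
echo "requested: $(( requested / 1024 / 1024 )) MiB, limit: $(( max_alloc / 1024 / 1024 )) MiB"
if fits "$requested" "$max_alloc"; then
    echo "allocation fits"
else
    echo "allocation exceeds the limit by $(( requested - max_alloc )) bytes"
fi
```

The request is 256 MiB against a 243 MiB limit, which is consistent with error -61 (CL_INVALID_BUFFER_SIZE in the OpenCL headers).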

However, it is failing with another error:

OpenCL device has no FP64 support
% Opening inputfile: ../../projects/einstein.phys.uwm.edu/LATeah1021L.dat
% Total amount of photon times: 8950
% Preparing toplist of length: 10
% Read 1631 binary points
read_checkpoint(): Couldn't open file 'LATeah1021L_188.0_0_0.0_16164841_0_0.out.cpt': No such file or directory (2)
% fft_size: 16777216 (0x1000000); alloc: 67108872
% Sky point 1/1
% Binary point 1/1631
% Creating FFT plan.
% fft length: 16777216 (0x1000000)
% Scratch buffer size: 136314880
% Starting semicoherent search over f0 and f1.
% nf1dots: 41  df1dot: 2.512676418e-15  f1dot_start: -1e-13  f1dot_band: 1e-13
% Filling array of photon pairs
ERROR: /home/bema/fermilat/src/bridge_fft_clfft.c:1150: kernel kernel_sortedPhoton failed. status=-4
error in opencl_qsort
12:44:02 (4056): [CRITICAL]: ERROR: MAIN() returned with error '1'
FPU status flags:  PRECISION
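
For reference, the numeric OpenCL status codes that appear in the logs in this thread can be translated to their names from the standard CL/cl.h header. A minimal lookup sketch follows; the function name cl_error_name is made up, and only the three codes seen here are covered.

```shell
#!/bin/sh
# Map the OpenCL status codes seen in this thread to their CL/cl.h names.
cl_error_name() {
    case "$1" in
        -4)  echo "CL_MEM_OBJECT_ALLOCATION_FAILURE" ;;
        -11) echo "CL_BUILD_PROGRAM_FAILURE" ;;
        -61) echo "CL_INVALID_BUFFER_SIZE" ;;
        *)   echo "unknown ($1)" ;;
    esac
}

for code in -11 -61 -4; do
    printf '%s: %s\n' "$code" "$(cl_error_name "$code")"
done
```

Read this way, status=-4 from kernel_sortedPhoton points at a failed memory object allocation, which suggests (without proving) that the card is still short of usable memory for this task.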

 
