Improvements in the code of the clients

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7023364931
RAC: 1813968

Bernd Machenschalk wrote:

I added app (Beta Test) versions for AMD/ATI w. OpenCL 2.0.

I've run a few tasks of 1.28 Gamma-ray pulsar binary search #1 on GPUs (FGRPopencl2-ati) on three machines with four AMD GPUs.  The GPUs include two 5700s, one 6800, and one 6800 XT.

All the tasks ran to completion.  I have at least one validation from each.  I saw modest throughput improvement on three of the GPUs, but none on the 6800 XT machine.  It could be relevant that it was the only one running 4X, while the other three cards were running 3X or 2X.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109389236782
RAC: 35906440

Bernd Machenschalk wrote:
I added app (Beta Test) versions for AMD/ATI w. OpenCL 2.0.

Thank you for providing an AMD/ATI version for testing.

I have a few 6C/12T Ryzen 5 machines (and others with Intel i3s) which have RX 570 GPUs so I set one up to allow beta work.  On that machine, clinfo reports for the "Platform Version:"  OpenCL 2.1 AMD-APP (3180.7).  I'm using the 20.40 version of the Red Hat AMDGPU-PRO package for the OpenCL libs.

At the end of the clinfo output, there are a few lines under a sub-heading of "Platform ID:" which give information about the device, including a line which  simply states "Device OpenCL C version:   OpenCL C 1.2".  I have no idea what the "C" character is supposed to represent but I assume the following "1.2" is supposed to indicate that RX 570s don't support OpenCL 2.0.
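A note on the two strings quoted above: as far as I know, "OpenCL C" names the kernel language version, which clinfo reports separately from the platform and device versions. The following is only a minimal Python sketch of pulling the numbers out of both kinds of string so they can be compared against 2.0. The function name is made up for illustration, and this is not a description of how BOINC or the project scheduler parses versions.

```python
import re

def parse_opencl_version(text):
    """Return (major, minor) from strings such as 'OpenCL 2.1 AMD-APP (3180.7)'
    or 'OpenCL C 1.2'. Returns None if no version number is found."""
    # The optional 'C' covers the OpenCL C (kernel language) version line.
    match = re.search(r"OpenCL\s+(?:C\s+)?(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

# The two strings reported by clinfo in the post above.
platform_version = parse_opencl_version("OpenCL 2.1 AMD-APP (3180.7)")
opencl_c_version = parse_opencl_version("OpenCL C 1.2")

# Tuples compare element by element, so (2, 1) >= (2, 0) but (1, 2) < (2, 0).
print("platform >= 2.0:", platform_version >= (2, 0))
print("OpenCL C >= 2.0:", opencl_c_version >= (2, 0))
```

Which of the two values a given server-side check looks at is not something this sketch can tell you.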

I googled for what version is supported and saw a number of hits (including AMD itself) that mentioned that 2.0 was supported.  So I decided to give it a try to see what would happen.  In light of the clinfo output, I wasn't all that hopeful.

After not seeing any test tasks following a work request, I decided to look at the last contact logs on the server to see what the response was.  Yes, the server noted that beta test work was allowed, but only normal work was sent.  I'm guessing that the reason was the OpenCL 2 test, although the wording used in the scheduler response seems rather odd.  There was a single line which said:-

.... [CRITICAL] Unknown plan class: FGRPopencl2ati

When other non-supported plan classes (eg for nvidia devices) are being checked, the wording is much clearer - eg.:-

.... [version] No CUDA devices found

I'm not sure what is [CRITICAL] about this -- the tag seems to imply some sort of server issue.  Wouldn't it be better if the message just reported a simple "[version] No ATI/AMD devices supporting OpenCL 2.0 found."?

As I imagine there will probably be others with similar devices trying to take advantage of a more efficient app, it would be useful if the message told them very clearly what the problem was.

My next step will be to try to determine whether or not an RX 570 does support OpenCL 2.0.  My guess is that it may do so but that the libs in the AMDGPU-PRO package don't expose it.  Maybe I need to come to grips with ROCm.  If anyone happens to know the true status of OpenCL on RX 570s, I'd certainly appreciate being 'educated' :-).

Cheers,
Gary.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3681
Credit: 33825923742
RAC: 37778635

On Linux, the only way to get OpenCL 2.0 support with an RX 570 is a full ROCm install. The ROCr component in the AMDGPU-Pro package does not support OpenCL 2.0 on GPUs older than Vega.

Remove your AMDGPU-Pro drivers and install the ROCm package using the instructions here:

https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html

From my testing, though, an RX 570 sees little benefit from the code changes made; I saw almost no difference.

Additionally, the project checks the OpenCL version reported by BOINC, and even with the ROCm drivers BOINC still reported 1.2. So the project might not send you the new app anyway, even if you have the right drivers. It is possible, however, to "trick" BOINC into reporting OpenCL 2.0 by editing and locking down your coproc_info.xml file.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109389236782
RAC: 35906440

Ian&Steve C. wrote:
On Linux, the only way to get OpenCL 2.0 support with an RX570 is with the ROCm full install. The ROCr package in the AMDGPU-Pro package does not support OpenCL 2.0 on GPUs older than Vega.

Thanks very much for the comprehensive reply.  Very helpful and much appreciated.

My distro of choice (PCLinuxOS) is neither Debian nor Red Hat based so I've been waiting (right from ROCm version 1 days) for it to reach some sort of stability and maturity before trying to work out what to do for a non-supported distro.  I haven't looked at rocmdocs for quite a while so I spent some time just now going through the latest guide that you linked to.  Things still appear to be very much in transition and not a fully polished and stable product.

I really appreciate the advice that RX 570s don't enjoy a benefit.  I don't have anything more recent than Polaris since I'm not prepared to throw away working gear and pay ridiculous prices to replace it.  I'm guessing it might take a year or two for some sort of normality to return so I reckon I'll be staying with what I've got for a little while yet :-).

I had picked up (and noted down) the trick for editing and then setting the immutable bit on coproc_info.xml to get BOINC to report version 2.0 to the scheduler.  You had pointed that out in a post some time ago and I recorded it for potential future use (if needed) :-).  If I hadn't received any reply to my earlier post in this thread, I had intended to give it a shot later on to see what would happen.  I won't bother now since (with your information) it's just going to result in compute errors for any test tasks received - and no ultimate benefit in any case.

All my current GPUs work well with OpenCL from the 20.40 PRO package.  I was going to try a 21.xx download and try to figure out how to use any ROC stuff from that but in light of your comment that the ROCr bits won't work, I won't bother.  You've saved me quite a bit of time and effort.  Thanks for that as well.

Cheers,
Gary.

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

Bernd Machenschalk wrote:
I added app (Beta Test) versions for AMD/ATI w. OpenCL 2.0.

I tried a couple of v1.28 tasks on this fresh old host.

Windows 10,  AMD Radeon RX 580 Series (8192MB), GPU driver is newest version 21.8.2

Boinc event log shows this info about this GPU: "... device version OpenCL 2.0 AMD-APP ..."

 

Still it seems this card isn't able to run those v1.28 tasks.

https://einsteinathome.org/task/1160528286

https://einsteinathome.org/task/1160562958

Boinc event log shows both tasks went like this:

Starting task X ... Computation for task X finished ... Output file Y for task X absent ... Output file Z for task X absent

 

A regular v1.22 task ran just fine: https://einsteinathome.org/task/1160538043

But it looks like a wingman on that one hit the identical game-breaking problem when running v1.28 on an AMD Radeon VII (gfx906):

https://einsteinathome.org/task/1160448033

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3681
Credit: 33825923742
RAC: 37778635

It's very possible that something else in the app still needs to be changed, though we did at least have one report from arch that it worked on his RX 5700 and RX 6800 cards.

As I've mentioned, I never saw an improvement with Polaris (RX 570, but no errors) and I don't have any Vega cards to test, but they may be in the same boat. I know that petri, Tom, and I went through a few code iterations to get it working on AMD without errors. Once it was working, it worked on both my RX 570 and Tom's RX 5700, and we never saw the error being shown here now.

My 570 saw no improvement, and Tom's 5700 saw about 20% improvement.

I just fired up my test bench to re-test. It's an Ubuntu 20.04.2 LTS desktop install with the 5.4.0-42 kernel and the ROCm 4.2 drivers. Using our code injection method (with code changes), the v1.18 app tasks are currently running without issue, as they did before. So I know the drivers and the code are working properly; otherwise they would be throwing errors. I will test the new app on this same platform, both with and without code injection.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3681
Credit: 33825923742
RAC: 37778635

Well, I've been unsuccessful in getting any v1.28 beta tasks on my Linux RX 570. I've tried fooling BOINC, but the server still only ever sends me 1.18, even with the following set in coproc_info.xml:

<opencl_device_version>OpenCL 2.0 AMD-APP</opencl_device_version>
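For anyone wanting to try the same edit, here is a rough Python sketch of swapping the value inside that element. Only the `<opencl_device_version>` tag name comes from this thread; the function name and the surrounding `<coproc>` wrapper in the sample are invented for illustration, and a real coproc_info.xml will contain much more than this. Locking the file afterwards (the immutable bit mentioned earlier in the thread) is a separate step not shown here.

```python
import re

def set_opencl_device_version(xml_text, new_value):
    """Return xml_text with the content of every <opencl_device_version>
    element replaced by new_value. Everything else is left untouched."""
    pattern = re.compile(
        r"(<opencl_device_version>)(.*?)(</opencl_device_version>)",
        re.DOTALL,
    )
    # A function is used for the replacement so that new_value is inserted
    # literally, without any backslash or group-reference processing.
    return pattern.sub(lambda m: m.group(1) + new_value + m.group(3), xml_text)

# Made-up sample; the wrapper element is an assumption, not the real file layout.
sample = (
    "<coproc>\n"
    "  <opencl_device_version>OpenCL 1.2 AMD-APP</opencl_device_version>\n"
    "</coproc>"
)

print(set_opencl_device_version(sample, "OpenCL 2.0 AMD-APP"))
```

Working on the text rather than re-serialising the XML keeps the rest of the file byte-for-byte the same, which seemed the safer choice for a file BOINC wrote itself.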

 

Either something is broken in the scheduler for Linux/AMD, or maybe there's some additional gatekeeping for Linux preventing the new app from going to Polaris cards. Or I just don't have the right trickery set.

Does the lack of a hyphen '-' in the plan class matter? I notice that the plan class is "FGRPopencl2-ati" for Windows but "FGRPopencl2ati" on Linux. Maybe that's clogging things up in the scheduler and is the reason for the following error:

[CRITICAL]   Unknown plan class: FGRPopencl2ati
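To illustrate why a single missing hyphen could produce that message, here is a small Python sketch of an exact-match lookup. It is purely an assumption about how such a check might behave, not the actual scheduler code; the list of known classes and the function name are hypothetical, and only the two plan class strings come from this thread.

```python
import difflib

# Hypothetical list of plan classes a server knows about.
KNOWN_PLAN_CLASSES = ["FGRPopencl2-ati"]

def check_plan_class(name, known):
    """Return 'ok' for an exact match, otherwise an 'Unknown plan class'
    message with the closest known name as a hint, if there is one."""
    if name in known:
        return "ok"
    close = difflib.get_close_matches(name, known, n=1)
    hint = f" (did you mean {close[0]}?)" if close else ""
    return f"Unknown plan class: {name}{hint}"

print(check_plan_class("FGRPopencl2-ati", KNOWN_PLAN_CLASSES))
print(check_plan_class("FGRPopencl2ati", KNOWN_PLAN_CLASSES))
```

With an exact string comparison the two spellings are simply different names, so the unhyphenated one would never match.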

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4266
Credit: 244924081
RAC: 16749

Sorry, indeed there is a typo in the plan class of the Linux version. I'll fix this.

BM

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

Ian&Steve C. wrote:
never saw an improvement with Polaris (RX570, but no errors)

I tested another host, running an RX 570 on Windows.

Same result as I got earlier. Regular v1.22 tasks ran fine but v1.28 tasks crashed.

Error during OpenCL FFT (error: -5)
ERROR: gen_fft_execute()
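If the -5 above is a raw OpenCL status code (an assumption; the app could be using its own numbering), it would correspond to CL_OUT_OF_RESOURCES in the standard OpenCL headers. A minimal Python lookup for the first few standard codes, with a made-up helper name:

```python
# First few status codes as defined in the standard OpenCL headers (CL/cl.h).
OPENCL_STATUS_NAMES = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}

def status_name(code):
    """Map a numeric OpenCL status to its symbolic name, if it is in the table."""
    return OPENCL_STATUS_NAMES.get(code, f"unknown status {code}")

print(status_name(-5))
```

The name alone does not say why the FFT step ran out of resources on this card.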

It would be interesting to know whether the Linux app is immune to this on Polaris / Ellesmere GPUs.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4266
Credit: 244924081
RAC: 16749

Richie wrote:

Same result as I got earlier. Regular v1.22 run fine but v1.28 tasks crashed.

Yep, that's why we put new app versions in Beta test first.

BM
