Improvements in the code of the clients

DF1DX
Joined: 14 Aug 10
Posts: 105
Credit: 3840066854
RAC: 4868834

Interesting. No crash here but about 10% slower on my Radeon VII with 3 WUs:

1.28: ~522 s,
1.18: ~475 s

clinfo shows:
Platform Version OpenCL 2.1 AMD-APP (3075.10)
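
As a quick check of the "about 10% slower" figure, the two run times above can be compared directly. This is only a sketch of the arithmetic: the function name is made up for illustration, and it assumes both times were measured under the same conditions (3 WUs at once on the same card).

```python
def slowdown_percent(new_seconds: float, old_seconds: float) -> float:
    """Percentage by which new_seconds exceeds old_seconds."""
    return (new_seconds - old_seconds) / old_seconds * 100.0

# Run times quoted above (Radeon VII, 3 WUs at once).
t_128 = 522.0  # app version 1.28
t_118 = 475.0  # app version 1.18

pct = slowdown_percent(t_128, t_118)
print(f"1.28 is {pct:.1f}% slower than 1.18")
```

With these numbers the result comes out just under 10%, which matches the estimate in the post.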

bozz4science
Joined: 4 May 20
Posts: 15
Credit: 67643923
RAC: 3894

Thanks Ian for the detailed reply. Much appreciated :) I applaud your effort in helping to bring these changes to fruition and thank all other volunteers that cooperated in testing & development, especially petri33, the GPU Users team and Bernd for working with you on the deployment of the incremental changes on the platform here! 

Petri sounds like a hell of a guy to be able to test code on the fly in real-time!!

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7218704931
RAC: 974809

On my 6800 running on a Windows 10 system with the latest AMD driver at 2X with moderate clock limitation:

The beta test 1.28 FGRP gave 4% higher production than the production release 1.22.  I've seen no abnormal terminations on 1.28, and at this point already have several validations.

 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3945
Credit: 46607972642
RAC: 64209488

DF1DX wrote:

Interesting. No crash here but about 10% slower on my Radeon VII with 3 WUs:

1.28: ~522 s,
1.18: ~475 s

clinfo shows:
Platform Version OpenCL 2.1 AMD-APP (3075.10)

hmm I wonder how you got a proper app on linux. my scheduler requests are still trying for the incorrectly named FGRPopencl2ati, while you were able to get the right FGRPopencl2-ati.

_________________________________________________________________________

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117468520582
RAC: 35484092

Ian&Steve C. wrote:
hmm I wonder how you got a proper app on linux. my scheduler requests are still trying for the incorrectly named FGRPopencl2ati, while you were able to get the right FGRPopencl2-ati.

It looks like Bernd has not removed/disabled (or whatever) the incorrect plan class but simply added the correct one as well.

For an RX 570 with no 'tricks' applied, I get entries for both in the scheduler log.  The old one still gives the [CRITICAL] response.  For the correct one, the message is quite clear now as to why no work is being sent:-

[version] OpenCL device version required min: 200, supplied: 102

Obviously, if I were to fudge the device version in coproc_info.xml, my setup would pass this test.  The platform version is already noted there as 2.1 so it would be a very simple edit.  Not much point in watching a bunch of test tasks fail though, so I won't be doing that :-).  In DF1DX's case, the platform version is 2.1 so if the device version were similar, the test app and tasks for it would be sent.
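
The numbers in that log line look like the OpenCL version packed as major * 100 + minor, so 1.2 becomes 102 and 2.0 becomes 200. A minimal sketch of such a check follows. The encoding is inferred from the log values rather than taken from the scheduler source, the helper names are hypothetical, and the device version string format is assumed to resemble the platform version string that clinfo prints.

```python
import re

def opencl_version_int(version_string: str) -> int:
    """Turn a string such as 'OpenCL 2.1 AMD-APP (3075.10)' into 201.

    Assumes the encoding is major * 100 + minor, which is consistent with
    the 'min: 200, supplied: 102' values in the scheduler log line.
    """
    match = re.search(r"OpenCL\s+(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"no OpenCL version found in {version_string!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return major * 100 + minor

def passes_version_check(device_version: str, required_min: int = 200) -> bool:
    """True if the device version meets the plan class minimum."""
    return opencl_version_int(device_version) >= required_min

# A device reporting OpenCL 1.2 is rejected by a plan class that needs 2.0.
print(opencl_version_int("OpenCL 1.2"), passes_version_check("OpenCL 1.2"))
# A device reporting 2.1 would pass the same check.
print(passes_version_check("OpenCL 2.1 AMD-APP (3075.10)"))
```

Note that the check applies to the device version, not the platform version, which is why a host can show platform 2.1 and still be refused.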

A bit strange that there seems to be a loss of performance though.

Cheers,
Gary.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3945
Credit: 46607972642
RAC: 64209488

mine never checks for the "good" one. only the bad one, nvidia ones, and normal ones.

<pre>
2021-08-28 23:43:24.7156 [PID=14422] [send] [HOST#12830576] will accept beta work. Scanning for beta work.
2021-08-28 23:43:24.7256 [PID=14422]    [version] Checking plan class 'FGRPopencl-ati'
2021-08-28 23:43:24.7285 [PID=14422]    [version] reading plan classes from file '/BOINC/projects/EinsteinAtHome/plan_class_spec.xml'
2021-08-28 23:43:24.7285 [PID=14422]    [version] parsed project prefs setting 'gpu_util_fgrp': 1.000000
2021-08-28 23:43:24.7286 [PID=14422]    [version] GPU RAM calculated: min: 766 MB, use: 600 MB, WU#570813978 CPU: 429 MB
2021-08-28 23:43:24.7286 [PID=14422]    [version] Peak flops supplied: 5e+10
2021-08-28 23:43:24.7286 [PID=14422]    [version] plan class ok
2021-08-28 23:43:24.7286 [PID=14422]    [version] Checking plan class 'FGRPopencl-nvidia'
2021-08-28 23:43:24.7286 [PID=14422]    [version] parsed project prefs setting 'gpu_util_fgrp': 1.000000
2021-08-28 23:43:24.7286 [PID=14422]    [version] No CUDA devices found
2021-08-28 23:43:24.7286 [PID=14422]    [version] Checking plan class 'FGRPopencl1K-ati'
2021-08-28 23:43:24.7286 [PID=14422]    [version] parsed project prefs setting 'gpu_util_fgrp': 1.000000
2021-08-28 23:43:24.7286 [PID=14422]    [version] GPU RAM calculated: min: 1000 MB, use: 600 MB, WU#570813978 CPU: 429 MB
2021-08-28 23:43:24.7286 [PID=14422]    [version] Peak flops supplied: 5e+10
2021-08-28 23:43:24.7286 [PID=14422]    [version] plan class ok
2021-08-28 23:43:24.7286 [PID=14422]    [version] Checking plan class 'FGRPopencl1K-nvidia'
2021-08-28 23:43:24.7286 [PID=14422]    [version] parsed project prefs setting 'gpu_util_fgrp': 1.000000
2021-08-28 23:43:24.7286 [PID=14422]    [version] No CUDA devices found
2021-08-28 23:43:24.7287 [PID=14422]    [version] Checking plan class 'FGRPopenclTV-nvidia'
2021-08-28 23:43:24.7287 [PID=14422]    [version] parsed project prefs setting 'gpu_util_fgrp': 1.000000
2021-08-28 23:43:24.7287 [PID=14422]    [version] No CUDA devices found
2021-08-28 23:43:24.7287 [PID=14422]    [version] Checking plan class 'FGRPopencl2ati'
2021-08-28 23:43:24.7287 [PID=14422] [CRITICAL]   Unknown plan class: FGRPopencl2ati
2021-08-28 23:43:24.7287 [PID=14422]    [version] Checking plan class 'FGRPopencl2Pup-nvidia'
2021-08-28 23:43:24.7287 [PID=14422]    [version] parsed project prefs setting 'gpu_util_fgrp': 1.000000
2021-08-28 23:43:24.7287 [PID=14422]    [version] No CUDA devices found
2021-08-28 23:43:24.7288 [PID=14422]    [version] Best version of app hsgamma_FGRPB1G is 1.18 ID 945 FGRPopencl1K-ati (150.76 GFLOPS)</pre>

_________________________________________________________________________

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117468520582
RAC: 35484092

That's weird!  Here's the same section of the scheduler log that I noticed fairly soon after Bernd made his mea culpa announcement :-).  I had a feeling that a fix might occur fairly quickly so was on the lookout for it.

<pre>
2021-08-28 09:57:59.0606 [PID=29432]    [send] [HOST#506163] will accept beta work.  Scanning for beta work.
2021-08-28 09:57:59.0708 [PID=29432]    [version] Checking plan class 'FGRPopencl-ati'
2021-08-28 09:57:59.0736 [PID=29432]    [version] reading plan classes from file '/BOINC/projects/EinsteinAtHome/plan_class_spec.xml'
2021-08-28 09:57:59.0736 [PID=29432]    [version] parsed project prefs setting 'gpu_util_fgrp': 0.500000
2021-08-28 09:57:59.0736 [PID=29432]    [version] GPU RAM calculated: min: 766 MB, use: 600 MB, WU#568320566 CPU: 429 MB
2021-08-28 09:57:59.0736 [PID=29432]    [version] Peak flops supplied: 5e+10
2021-08-28 09:57:59.0736 [PID=29432]    [version] plan class ok
2021-08-28 09:57:59.0736 [PID=29432]    [version] Checking plan class 'FGRPopencl-nvidia'
2021-08-28 09:57:59.0736 [PID=29432]    [version] parsed project prefs setting 'gpu_util_fgrp': 0.500000
2021-08-28 09:57:59.0736 [PID=29432]    [version] No CUDA devices found
2021-08-28 09:57:59.0737 [PID=29432]    [version] Checking plan class 'FGRPopencl1K-ati'
2021-08-28 09:57:59.0737 [PID=29432]    [version] parsed project prefs setting 'gpu_util_fgrp': 0.500000
2021-08-28 09:57:59.0737 [PID=29432]    [version] GPU RAM calculated: min: 1000 MB, use: 600 MB, WU#568320566 CPU: 429 MB
2021-08-28 09:57:59.0737 [PID=29432]    [version] Peak flops supplied: 5e+10
2021-08-28 09:57:59.0737 [PID=29432]    [version] plan class ok
2021-08-28 09:57:59.0737 [PID=29432]    [version] Checking plan class 'FGRPopencl1K-nvidia'
2021-08-28 09:57:59.0737 [PID=29432]    [version] parsed project prefs setting 'gpu_util_fgrp': 0.500000
2021-08-28 09:57:59.0737 [PID=29432]    [version] No CUDA devices found
2021-08-28 09:57:59.0737 [PID=29432]    [version] Checking plan class 'FGRPopenclTV-nvidia'
2021-08-28 09:57:59.0737 [PID=29432]    [version] parsed project prefs setting 'gpu_util_fgrp': 0.500000
2021-08-28 09:57:59.0737 [PID=29432]    [version] No CUDA devices found
2021-08-28 09:57:59.0738 [PID=29432]    [version] Checking plan class 'FGRPopencl2-ati'
2021-08-28 09:57:59.0738 [PID=29432]    [version] parsed project prefs setting 'gpu_util_fgrp': 0.500000
2021-08-28 09:57:59.0738 [PID=29432]    [version] GPU RAM calculated: min: 1000 MB, use: 750 MB, WU#568320566 CPU: 429 MB
2021-08-28 09:57:59.0738 [PID=29432]    [version] OpenCL device version required min: 200, supplied: 102
2021-08-28 09:57:59.0738 [PID=29432]    [version] Checking plan class 'FGRPopencl2ati'
2021-08-28 09:57:59.0738 [PID=29432] [CRITICAL]   Unknown plan class: FGRPopencl2ati
2021-08-28 09:57:59.0738 [PID=29432]    [version] Checking plan class 'FGRPopencl2Pup-nvidia'
2021-08-28 09:57:59.0738 [PID=29432]    [version] parsed project prefs setting 'gpu_util_fgrp': 0.500000
2021-08-28 09:57:59.0738 [PID=29432]    [version] No CUDA devices found
2021-08-28 09:57:59.0739 [PID=29432]    [version] Best version of app hsgamma_FGRPB1G is 1.18 ID 945 FGRPopencl1K-ati (195.12 GFLOPS)</pre>

Notice that 2-ati is checked and disallowed for OpenCL version reasons before it gets to the wrong 2ati check that gives the error response.
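
To compare two scheduler logs like these more easily, the outcome of each plan class check can be pulled out with a short script. This is a rough sketch only: the function and variable names are hypothetical, the sample is a few lines copied from the log above, and it simply treats the last line logged after each "Checking plan class" entry as the outcome.

```python
import re

# A few lines copied from the scheduler log above.
LOG = """\
2021-08-28 09:57:59.0737 [PID=29432]    [version] Checking plan class 'FGRPopencl1K-ati'
2021-08-28 09:57:59.0737 [PID=29432]    [version] plan class ok
2021-08-28 09:57:59.0737 [PID=29432]    [version] Checking plan class 'FGRPopenclTV-nvidia'
2021-08-28 09:57:59.0737 [PID=29432]    [version] No CUDA devices found
2021-08-28 09:57:59.0738 [PID=29432]    [version] Checking plan class 'FGRPopencl2-ati'
2021-08-28 09:57:59.0738 [PID=29432]    [version] OpenCL device version required min: 200, supplied: 102
2021-08-28 09:57:59.0738 [PID=29432]    [version] Checking plan class 'FGRPopencl2ati'
2021-08-28 09:57:59.0738 [PID=29432] [CRITICAL]   Unknown plan class: FGRPopencl2ati
"""

CHECK = re.compile(r"Checking plan class '([^']+)'")
# Matches the date, time, [PID=...] and the [tag] in front of each message.
PREFIX = re.compile(r"^\S+ \S+ \[PID=\d+\]\s+\[[^\]]+\]\s+")

def summarize(log_text: str) -> dict:
    """Map each plan class to the last message logged while it was checked."""
    results = {}
    current = None
    for line in log_text.splitlines():
        found = CHECK.search(line)
        if found:
            current = found.group(1)
            results[current] = None
            continue
        if "Best version of app" in line:
            # The final summary line does not belong to any one plan class.
            current = None
            continue
        if current is not None:
            results[current] = PREFIX.sub("", line).strip()
    return results

for plan_class, outcome in summarize(LOG).items():
    print(f"{plan_class}: {outcome}")
```

Run against a full log, a plan class that never shows up in the output (such as FGRPopencl2-ati in the later log) was never checked at all, which is different from being checked and rejected.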

You can see the timestamp - just before 10am UTC - 8PM my time.  Your timestamp is quite a bit later so I'm wondering if further changes might have been made during that interval which have caused the correct plan class to somehow be overlooked/non-functional for you.   Hopefully Bernd will notice these comments and check the situation.

My host no longer has beta work allowed and I'm not going to revert it yet again.  I'm always short of locations (venues) and I had to jump through a few hoops to get beta enabled without risking other hosts as well during the exercise.  I tend to avoid beta like the plague unless I can easily set up a single machine.  I have no appetite for risking a whole bunch.  Fortunately, the group was small enough and I was able to suspend network access on other members for the duration without too much effort.

Cheers,
Gary.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3945
Credit: 46607972642
RAC: 64209488

yeah I guess I'll have to wait for Bernd to address whatever is happening there. maybe he just needs to remove the bad plan class so it stops checking it?

I also noticed that the nvidia 1.28 app is now out of beta. it's a production app now. everyone should be able to get it.

_________________________________________________________________________

Wedge009
Joined: 5 Mar 05
Posts: 122
Credit: 17364054280
RAC: 7178117

I was told this is where the beta applications for FGRPB1G are primarily being discussed. I'm reporting zero success on v1.28 with my older AMD hardware. I have only received the beta applications on Windows, none for Linux.

https://einsteinathome.org/content/fgrpopencl2-ati-beta-test-application-broken

I'll see if I can get anything for my old Pascal-based GPUs, since I've been advised the application changes are supposed to benefit NV GPUs more.

Edit: Looks like I already had some from the weekend - it's completing in around 5 minutes and my recollection is they used to complete at close to 7 minutes, so that's a good improvement. Still requires a lot of wait time on the CPU, by the looks of it.

Soli Deo Gloria

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3945
Credit: 46607972642
RAC: 64209488

looks like Bernd got around to cleaning up the scheduler issue with the incorrect plan class. I wanted to try this app out because I know this system has the right drivers, and I wanted to see if the problem is indeed with the app itself.

RX 570 4GB (Polaris)

Ubuntu 20.04.3 LTS, 5.11.0-27 kernel

ROCm 4.2 drivers

my Linux/RX570 picked up a handful of 1.28 tasks now. they are processing normally. maybe 2-3% slower than the 1.18 tasks, but no errors.

https://einsteinathome.org/task/1161439011

host: https://einsteinathome.org/host/12830576

will re-test with my code applied over top of this. It does seem likely that the issues people are having come down to the drivers and not the app itself. I would highly recommend that anyone having issues on Linux at least try the latest ROCm drivers instead of the AMDGPU-Pro drivers. The AMDGPU-Pro package has a more limited ROCr or PAL implementation, which I've never found to work with these new features on older GPUs (I never got the PAL drivers to work properly on newer kernels, and ROCr doesn't have proper 2.0 support for old GPUs). Vega "should" work with ROCr in the AMDGPU-Pro package based on the information I've been given, but as always with AMD drivers, what should work and what actually works can often be two totally different things.

_________________________________________________________________________
