Interesting. No crash here but about 10% slower on my Radeon VII with 3 WUs:
1.28: ~522 s,
1.18: ~475 s
clinfo shows:
Platform Version OpenCL 2.1 AMD-APP (3075.10)
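For reference, the "about 10% slower" figure follows directly from the two run times quoted above. A minimal Python sketch of the arithmetic (the function name is only for illustration and is not part of any BOINC tool):

```python
def percent_slower(new_seconds: float, old_seconds: float) -> float:
    """Return how much longer new_seconds is than old_seconds, in percent."""
    return (new_seconds - old_seconds) / old_seconds * 100.0


# Run times reported in the post for the Radeon VII.
slowdown = percent_slower(522.0, 475.0)
print(f"1.28 is about {slowdown:.1f}% slower than 1.18")
```

With the posted numbers this works out to roughly 9.9%.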
Thanks Ian for the detailed reply. Much appreciated :) I applaud your effort in helping to bring these changes to fruition and thank all other volunteers that cooperated in testing & development, especially petri33, the GPU Users team and Bernd for working with you on the deployment of the incremental changes on the platform here!
Petri sounds like a hell of a guy to be able to test code on the fly in real-time!!
On my 6800 running on a Windows 10 system with the latest AMD driver at 2X with moderate clock limitation:
The beta test 1.28 FGRP gave 4% higher production than the production release 1.22. I've seen no abnormal terminations on 1.28, and at this point already have several validations.
DF1DX wrote: Interesting. No ...
hmm I wonder how you got a proper app on linux. my scheduler requests are still trying for the incorrectly named FGRPopencl2ati, while you were able to get the right FGRPopencl2-ati.
_________________________________________________________________________
Ian&Steve C. wrote: hmm I ...
It looks like Bernd has not removed/disabled (or whatever) the incorrect plan class but simply added the correct one as well.
For an RX 570 with no 'tricks' applied, I get entries for both in the scheduler log. The old one still gives the [CRITICAL] response. For the correct one, the message is quite clear now as to why no work is being sent:-
[version] OpenCL device version required min: 200, supplied: 102
Obviously, if I were to fudge the device version in coproc_info.xml, my setup would pass this test. The platform version is already noted there as 2.1 so it would be a very simple edit. Not much point in watching a bunch of test tasks fail though, so I won't be doing that :-). In DF1DX's case, the platform version is 2.1 so if there was a similar device version, the test app and tasks for it would be sent.
A bit strange that there seems to be a loss of performance though.
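To illustrate the version test in that log message, here is a rough Python sketch. It assumes the integer is encoded as major * 100 + minor, which is inferred from the "200" and "102" values in the log and is not taken from the scheduler source. The function names are hypothetical, and the bare "OpenCL 1.2" string merely stands in for whatever the RX 570 actually reports.

```python
import re


def opencl_version_int(version_string: str) -> int:
    """Encode 'OpenCL <major>.<minor> ...' as major * 100 + minor (assumed scheme)."""
    match = re.search(r"OpenCL (\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"not an OpenCL version string: {version_string!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return major * 100 + minor


def passes_min_version(device_version: str, required_min: int = 200) -> bool:
    """Mimic the 'required min: 200, supplied: N' comparison from the log."""
    return opencl_version_int(device_version) >= required_min


# A device reporting OpenCL 1.2 fails the check; one reporting 2.1 would pass.
print(opencl_version_int("OpenCL 1.2"), passes_min_version("OpenCL 1.2"))
print(passes_min_version("OpenCL 2.1 AMD-APP (3075.10)"))
```

Under that assumption, a 1.2 device yields 102 and is rejected, while a 2.1 device yields 201 and would be accepted.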
Cheers,
Gary.
mine never checks for the "good" one. only the bad one, nvidia ones, and normal ones.
_________________________________________________________________________
That's weird! Here's the same section of the scheduler log that I noticed fairly soon after Bernd made his mea culpa announcement :-). I had a feeling that a fix might occur fairly quickly so was on the lookout for it.
</p>
<pre>
2021-08-28 09:57:59.0606 [PID=29432] [send] [HOST#506163] will accept beta work. Scanning for beta work.
2021-08-28 09:57:59.0708 [PID=29432] [version] Checking plan class 'FGRPopencl-ati'
2021-08-28 09:57:59.0736 [PID=29432] [version] reading plan classes from file '/BOINC/projects/EinsteinAtHome/plan_class_spec.xml'
2021-08-28 09:57:59.0736 [PID=29432] [version] parsed project prefs setting 'gpu_util_fgrp': 0.500000
2021-08-28 09:57:59.0736 [PID=29432] [version] GPU RAM calculated: min: 766 MB, use: 600 MB, WU#568320566 CPU: 429 MB
2021-08-28 09:57:59.0736 [PID=29432] [version] Peak flops supplied: 5e+10
2021-08-28 09:57:59.0736 [PID=29432] [version] plan class ok
2021-08-28 09:57:59.0736 [PID=29432] [version] Checking plan class 'FGRPopencl-nvidia'
2021-08-28 09:57:59.0736 [PID=29432] [version] parsed project prefs setting 'gpu_util_fgrp': 0.500000
2021-08-28 09:57:59.0736 [PID=29432] [version] No CUDA devices found
2021-08-28 09:57:59.0737 [PID=29432] [version] Checking plan class 'FGRPopencl1K-ati'
2021-08-28 09:57:59.0737 [PID=29432] [version] parsed project prefs setting 'gpu_util_fgrp': 0.500000
2021-08-28 09:57:59.0737 [PID=29432] [version] GPU RAM calculated: min: 1000 MB, use: 600 MB, WU#568320566 CPU: 429 MB
2021-08-28 09:57:59.0737 [PID=29432] [version] Peak flops supplied: 5e+10
2021-08-28 09:57:59.0737 [PID=29432] [version] plan class ok
2021-08-28 09:57:59.0737 [PID=29432] [version] Checking plan class 'FGRPopencl1K-nvidia'
2021-08-28 09:57:59.0737 [PID=29432] [version] parsed project prefs setting 'gpu_util_fgrp': 0.500000
2021-08-28 09:57:59.0737 [PID=29432] [version] No CUDA devices found
2021-08-28 09:57:59.0737 [PID=29432] [version] Checking plan class 'FGRPopenclTV-nvidia'
2021-08-28 09:57:59.0737 [PID=29432] [version] parsed project prefs setting 'gpu_util_fgrp': 0.500000
2021-08-28 09:57:59.0737 [PID=29432] [version] No CUDA devices found
2021-08-28 09:57:59.0738 [PID=29432] [version] Checking plan class 'FGRPopencl2-ati'
2021-08-28 09:57:59.0738 [PID=29432] [version] parsed project prefs setting 'gpu_util_fgrp': 0.500000
2021-08-28 09:57:59.0738 [PID=29432] [version] GPU RAM calculated: min: 1000 MB, use: 750 MB, WU#568320566 CPU: 429 MB
2021-08-28 09:57:59.0738 [PID=29432] [version] OpenCL device version required min: 200, supplied: 102
2021-08-28 09:57:59.0738 [PID=29432] [version] Checking plan class 'FGRPopencl2ati'
2021-08-28 09:57:59.0738 [PID=29432] [CRITICAL] Unknown plan class: FGRPopencl2ati
2021-08-28 09:57:59.0738 [PID=29432] [version] Checking plan class 'FGRPopencl2Pup-nvidia'
2021-08-28 09:57:59.0738 [PID=29432] [version] parsed project prefs setting 'gpu_util_fgrp': 0.500000
2021-08-28 09:57:59.0738 [PID=29432] [version] No CUDA devices found
2021-08-28 09:57:59.0739 [PID=29432] [version] Best version of app hsgamma_FGRPB1G is 1.18 ID 945 FGRPopencl1K-ati (195.12 GFLOPS)</pre>
<p>
Notice that 2-ati is checked and disallowed for OpenCL version reasons before it gets to the wrong 2ati check that gives the error response.
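For anyone who wants to compare their own scheduler log against this one, a small Python sketch along these lines could list each plan class with the last message logged for it. The helper name and regular expressions are my own assumptions about the line layout seen in this excerpt; they are not part of BOINC.

```python
import re

CHECK_RE = re.compile(r"Checking plan class '([^']+)'")
# Strip "date time [PID=n] [tag] " so that only the message text remains.
PREFIX_RE = re.compile(r"^\S+ \S+ \[PID=\d+\]\s+\[[^\]]+\]\s*")


def plan_class_outcomes(log_text):
    """Return (plan_class, last_message) pairs in the order they were checked."""
    outcomes = []
    for line in log_text.splitlines():
        line = line.strip()
        check = CHECK_RE.search(line)
        if check:
            outcomes.append([check.group(1), ""])
        elif outcomes and line and "Best version of app" not in line:
            # Later lines overwrite earlier ones, so the final message wins.
            outcomes[-1][1] = PREFIX_RE.sub("", line)
    return [tuple(item) for item in outcomes]


sample = """\
2021-08-28 09:57:59.0738 [PID=29432] [version] Checking plan class 'FGRPopencl2-ati'
2021-08-28 09:57:59.0738 [PID=29432] [version] parsed project prefs setting 'gpu_util_fgrp': 0.500000
2021-08-28 09:57:59.0738 [PID=29432] [version] OpenCL device version required min: 200, supplied: 102
2021-08-28 09:57:59.0738 [PID=29432] [version] Checking plan class 'FGRPopencl2ati'
2021-08-28 09:57:59.0738 [PID=29432] [CRITICAL] Unknown plan class: FGRPopencl2ati
"""

for plan_class, outcome in plan_class_outcomes(sample):
    print(f"{plan_class}: {outcome}")
```

On the sample lines, this reports the OpenCL version rejection for FGRPopencl2-ati and the "Unknown plan class" error for FGRPopencl2ati. A log that never mentions FGRPopencl2-ati at all would show up as a missing entry.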
You can see the timestamp - just before 10am UTC - 8PM my time. Your timestamp is quite a bit later so I'm wondering if further changes might have been made during that interval which have caused the correct plan class to somehow be overlooked/non-functional for you. Hopefully Bernd will notice these comments and check the situation.
My host no longer has beta work allowed and I'm not going to revert it yet again. I'm always short of locations (venues) and I had to jump through a few hoops to get beta enabled without risking other hosts as well during the exercise. I tend to avoid beta like the plague unless I can easily set up a single machine. I have no appetite for risking a whole bunch. Fortunately, the group was small enough and I was able to suspend network access on other members for the duration without too much effort.
Cheers,
Gary.
yeah I guess I'll have to wait for Bernd to address whatever is happening there. maybe he just needs to remove the bad plan class so it stops checking it?
I also noticed that the nvidia 1.28 app is now out of beta. it's a production app now. everyone should be able to get it.
_________________________________________________________________________
I was told this is where the beta applications for FGRPB1G are primarily being discussed. I'm reporting zero success on v1.28 with my older AMD hardware; I have only received the beta applications on Windows, none for Linux.
https://einsteinathome.org/content/fgrpopencl2-ati-beta-test-application-broken
I'll see if I can get anything for my old Pascal-based GPUs, since I've been advised the application changes are supposed to benefit NV GPUs more.
Edit: Looks like I already had some from the weekend - it's completing in around 5 minutes and my recollection is they used to complete at close to 7 minutes, so that's a good improvement. Still requires a lot of wait time on the CPU, by the looks of it.
Soli Deo Gloria
looks like Bernd got around to cleaning up the scheduler issue with the incorrect plan class. I wanted to try this app out because I know this system has the right drivers, and I wanted to see if it is indeed a problem with the app itself.
RX 570 4GB (Polaris)
Ubuntu 20.04.3 LTS, 5.11.0-27 kernel
ROCm 4.2 drivers
my Linux/RX570 picked up a handful of 1.28 tasks now. they are processing normally. maybe 2-3% slower than the 1.18 tasks, but no errors.
https://einsteinathome.org/task/1161439011
host: https://einsteinathome.org/host/12830576
will re-test with my code applied over top of this. It does seem likely that the issues people are having come down to the drivers and not the app itself. I would highly recommend that anyone having issues on Linux at least try the latest ROCm drivers instead of the AMDGPU-Pro drivers, which have a more limited ROCr or PAL implementation that I've never found to work with these new features on older GPUs (I never got PAL drivers to work properly in newer kernels, and ROCr doesn't have proper 2.0 support for old GPUs). Vega "should" work with ROCr in the AMDGPU-Pro package based on the information I've been given, but as always with AMD drivers, what should work and what actually works can often be two totally different things.
_________________________________________________________________________