Hm. twiddle_dee is not in our own code; there is possibly a twiddle_dee in the clFFT library. That library is linked, but it should not be used in the GW App, at least not on NVIDIA.
BM
In clFFT there is
I'll patch that.
BM
Just be aware of the consequences of this change with regard to OpenCL supported features. Defining a table in this way is only supported in OpenCL 2.0 and greater, and I think it's safe to assume that a large portion of the user base is using drivers that only have OpenCL 1.2.
You might need to gatekeep the application from hosts that have incompatible drivers to avoid mass errors. This applies to both AMD and NVIDIA.
_________________________________________________________________________
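To illustrate the kind of gatekeeping suggested above, here is a minimal sketch of a version check. It assumes the host's device version string is available in the layout the OpenCL specification defines for CL_DEVICE_VERSION ("OpenCL <major>.<minor> <vendor-specific information>"). The function names are hypothetical and not taken from the app or from BOINC.

```python
import re

# Layout defined by the OpenCL specification for CL_DEVICE_VERSION:
# "OpenCL <major>.<minor> <vendor-specific information>"
_VERSION_RE = re.compile(r"^OpenCL (\d+)\.(\d+)(?: |$)")


def parse_opencl_version(version_string):
    """Return (major, minor) parsed from a CL_DEVICE_VERSION string."""
    match = _VERSION_RE.match(version_string)
    if match is None:
        raise ValueError(f"unrecognized OpenCL version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def reports_opencl_2_or_newer(version_string):
    """True if the device reports OpenCL 2.0 or newer (hypothetical gate)."""
    return parse_opencl_version(version_string) >= (2, 0)


if __name__ == "__main__":
    for s in ("OpenCL 1.2 CUDA", "OpenCL 2.0 AMD-APP (3075.10)"):
        print(s, "->", reports_opencl_2_or_newer(s))
```

This only looks at the reported version. OpenCL 3.0 makes several 2.0 features optional, so a real gate for such devices would probably need a feature query as well.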
There is now a 1.25 FGRP App (Beta test) that should have the clFFT patched in the suggested way. For now this is restricted to NVIDIA Pascal and up (compute capability >= 6.0) and OpenCL 2.0 in the respective plan class. The app is available for all three major platforms (Windows, Linux, OSX).
BM
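The restriction described in the post above can be read as a simple eligibility rule. The sketch below is only an illustration of that rule: the `Host` type and its field names are hypothetical, and it assumes "OpenCL 2.0" means "at least 2.0". It is not how the project's plan class is actually configured.

```python
from typing import NamedTuple, Tuple


class Host(NamedTuple):
    # Hypothetical host description; field names are illustrative only.
    vendor: str
    compute_capability: Tuple[int, int]
    opencl_version: Tuple[int, int]


MIN_COMPUTE_CAPABILITY = (6, 0)  # Pascal and up, as stated in the post
MIN_OPENCL_VERSION = (2, 0)      # assumption: treated as a minimum


def eligible_for_beta(host):
    """True if the host meets the restrictions named for the 1.25 beta app."""
    return (
        host.vendor == "NVIDIA"
        and host.compute_capability >= MIN_COMPUTE_CAPABILITY
        and host.opencl_version >= MIN_OPENCL_VERSION
    )
```

Comparing versions as tuples keeps the check correct for values like (10, 0), which a string comparison would get wrong.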
I tested out the app, but it’s not really any different in run speed or behavior. I can see __global in the hex editor now with twiddle_dee. Applying my patch on top of this new 1.25 app brings the runtime back to being fast again.
There may be some other changes that Petri has made besides this that are complementary.
_________________________________________________________________________
I would be happy to receive this patch. As long as it's OpenCL, we should be able to incorporate it in the App.
BM
Bernd Machenschalk wrote: I would be happy to receive this patch.
I've PMed you a link to the code and instructions/info.
_________________________________________________________________________
Thanks, got it. Regarding the GPU code, this only changes the type of twiddle_dee and adds some options to the OpenCL compiler, in particular to use OpenCL 2.0 (*). I built app version 1.26 with that; please give it a try. It's the same plan class as 1.25, so the same restrictions apply.
(*) The other stuff in there just puts the CPU to sleep while the GPU is running; there is already some other method for that implemented in our app.
BM
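As a rough illustration of "adds some options to the OpenCL compiler", the sketch below assembles an options string of the kind passed to clBuildProgram. `-cl-std=CL2.0` and `-cl-mad-enable` are standard OpenCL build options; the helper itself and the choice of base options are assumptions, since the thread does not list the options actually used.

```python
def build_options(base_options, use_cl20):
    """Return a clBuildProgram-style options string.

    Hypothetical helper: appends -cl-std=CL2.0 when requested, unless the
    caller has already chosen a language standard.
    """
    options = list(base_options)
    has_std = any(opt.startswith("-cl-std=") for opt in options)
    if use_cl20 and not has_std:
        options.append("-cl-std=CL2.0")
    return " ".join(options)


if __name__ == "__main__":
    print(build_options(["-cl-mad-enable"], use_cl20=True))
```

Requesting CL2.0 on a device that only supports OpenCL 1.2 makes the build fail, which is why the plan class restriction matters.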
Hi Bernd, I tried 1.26 and I see the same performance as 1.25. Did you add the other conditions from my follow-up PM last night?
_________________________________________________________________________
Thanks, probably missed that. I'll take another look next week.
BM