Very evident in the CUDA apps over at GPUGrid, which by default run BLOCKING sync on the kernel thread. If you set SWAN_SYNC=1 as an environment variable, the app switches to SPIN sync and speeds up by 30-50%.
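To make the switch described above concrete, here is a minimal sketch of how an application could pick its sync mode from an environment variable. The function name and the parsing rule are assumptions for illustration only; this is not GPUGrid's actual code, and only the "1 means SPIN, otherwise BLOCKING" behaviour is taken from the post.

```python
import os

def choose_sync_mode(environ=os.environ):
    """Return "spin" or "blocking" from a SWAN_SYNC-style variable.

    Hypothetical helper: the real application's parsing is not documented here.
    """
    value = environ.get("SWAN_SYNC", "0").strip()
    # Per the post: SWAN_SYNC=1 selects SPIN sync, the default is BLOCKING.
    return "spin" if value == "1" else "blocking"

print(choose_sync_mode({"SWAN_SYNC": "1"}))  # spin
print(choose_sync_mode({}))                  # blocking
```

Passing a plain dict instead of os.environ keeps the sketch easy to try without touching the real environment.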
Keith Myers wrote: Very evident in the CUDA apps over at GPUGrid, which by default run BLOCKING sync on the kernel thread. If you set SWAN_SYNC=1 as an environment variable, the app switches to SPIN sync and speeds up by 30-50%.
Yup. It's a design decision to run the kernels synchronously or asynchronously with the host, with the best solution often determined by testing. What the kernels actually do and the time they take is a key point here ( plus any dependencies between kernels ). In general one has no ( easy ) control over thread affinity to a given core. What would be really nice for hyperthreaded processors ( hardware optimised multi-threading ) is if the spinning OpenCL loop ran on the same core as the CUDA non-blocking thread. That way the same core could do as much as possible of the host-side work for the CPU/GPU platform, i.e. host code + OpenCL runtime.
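The trade-off between the two waiting styles can be tried without a GPU. What follows is a toy model only: a sleeping thread stands in for the kernel, and all the function names are made up for the illustration. It shows the general idea that a spinning host burns CPU time while it waits and a blocking host does not. In the CUDA runtime the analogous choice is exposed through cudaSetDeviceFlags with cudaDeviceScheduleSpin or cudaDeviceScheduleBlockingSync; whether any particular project app uses that is not something this sketch claims.

```python
import threading
import time

def run_fake_kernel(done, seconds):
    """Stand-in for a GPU kernel: sleeps, then signals completion."""
    time.sleep(seconds)
    done.set()

def wait_spin(done):
    """Poll the completion flag in a tight loop (keeps a CPU core busy)."""
    while not done.is_set():
        pass

def wait_blocking(done):
    """Sleep until signalled (CPU stays free, wake-up may be slower)."""
    done.wait()

def measure(wait_fn, seconds=0.2):
    """Return (wall_time, cpu_time) the host spends waiting on one fake kernel."""
    done = threading.Event()
    worker = threading.Thread(target=run_fake_kernel, args=(done, seconds))
    wall0, cpu0 = time.perf_counter(), time.process_time()
    worker.start()
    wait_fn(done)
    wall, cpu = time.perf_counter() - wall0, time.process_time() - cpu0
    worker.join()
    return wall, cpu

for name, fn in (("spin", wait_spin), ("blocking", wait_blocking)):
    wall, cpu = measure(fn)
    print(f"{name:8s} wall={wall:.3f}s cpu={cpu:.3f}s")
```

On most machines the spin variant should report CPU time close to its wall time, while the blocking variant reports almost none. A real driver's spin wait also reacts faster to kernel completion, which this model does not capture.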
Cheers, Mike.
( edit ) The other way to view this issue is : which device do you want to optimise the time of ? The CPU or the GPU .... both ? Neither : the total, thanks.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
Mike Hewson wrote: ( edit ) The other way to view this issue is : which device do you want to optimise the time of ? The CPU or the GPU .... both ? Neither : the total, thanks.
Ha ha, . . . .LOL. No getting your cake and eating it too it seems.
Keith Myers wrote: Very evident in the CUDA apps over at GPUGrid, which by default run BLOCKING sync on the kernel thread. If you set SWAN_SYNC=1 as an environment variable, the app switches to SPIN sync and speeds up by 30-50%.
So far as I know, SWAN_SYNC is a specific (local) feature of the GPUGrid science application, and not a feature of the generic OpenCL programming environment.
If anyone has specific programming documentation to the contrary, please post it here to stop that incipient urban myth in its tracks.
Richard Haselgrove wrote: If anyone has specific programming documentation to the contrary, please post it here to stop that incipient urban myth in its tracks.
Putting aside for the moment the persnickety burden of proving non-existence, I have at least a terminating algorithm : I would look at, say, the OpenCL 2.2 Reference Guide, the OpenCL C 2.0 Specification or the OpenCL API Specification and note that a search for "SWAN_SYNC" produces no match. Mutatis mutandis one could search the entire Khronos domain .... :-)
Cheers, Mike.
( edit ) Conversely, one could request any naysayers of the above to quote its mention within the Khronos domain.
( edit ) Even better, use Google to search the domain by inserting the following ( between the single quotes ) in the search field :
'www.khronos.org "SWAN_SYNC"'
and as I have done, find no matches in the Khronos domain.
( edit ) I'm assuming, or it appears to be, that SWAN_SYNC is used within a tool called Swan. From the Khronos POV that is a 3rd party. It assists in porting CUDA to OpenCL ......
For starters, GPUGrid uses CUDA and E@H uses OpenCL. The NV drivers seem to basically have a SWAN_SYNC type of setting enabled when computing with OpenCL, as there is a constant 100% of 1 CPU core being used. AMD cards using OpenCL have low CPU usage.
Maybe if there was a SWAN_SYNC type of setting at E@H we wouldn't need to run 2x/3x at once to achieve high GPU util, but some of that might just be waiting on CPU calculations.
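A back-of-envelope way to see why running 2x/3x can raise GPU utilisation when tasks spend part of their time on the CPU. This is a toy model with made-up numbers, not a measurement of any E@H app: it assumes each task alternates a CPU-only phase and a GPU phase, and that the tasks overlap perfectly.

```python
def gpu_utilisation(n_tasks, cpu_time, gpu_time):
    """Idealised GPU busy fraction for n_tasks sharing one GPU.

    Assumes each task needs cpu_time seconds of host-only work followed by
    gpu_time seconds on the GPU, with perfect overlap between tasks.
    """
    return min(1.0, n_tasks * gpu_time / (cpu_time + gpu_time))

# Illustrative numbers only: 2 s of CPU work per 1 s of GPU work.
for n in (1, 2, 3):
    print(n, round(gpu_utilisation(n, cpu_time=2.0, gpu_time=1.0), 2))
# 1 0.33
# 2 0.67
# 3 1.0
```

If the idle time comes from the host sleeping in a blocking wait rather than from genuine CPU work, a spin-style setting would attack the same gap from the other side, which is the point being made above.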
Aurum wrote: Does Einstein benefit from having SWAN_SYNC=1 ?
Do AMD GPUs benefit from enabling SWAN_SYNC ?
It all depends on whether the app is written to allow it. According to this document, the OpenCL API allows both blocking and non-blocking synchronization.
https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_API.html#execution-model-sync
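For anyone who wants to see the difference in miniature, below is a pure-Python model of a blocking versus a non-blocking enqueue. It is only an analogy: ToyQueue and fake_kernel are hypothetical names, and a single worker thread stands in for an in-order command queue. In the real OpenCL API the same choice shows up, for example, as the blocking_read argument of clEnqueueReadBuffer, with clWaitForEvents and clFinish available for waiting later.

```python
from concurrent.futures import ThreadPoolExecutor
import time

class ToyQueue:
    """In-order command queue modelled with a single worker thread."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)

    def enqueue(self, fn, *args, blocking=False):
        """Queue a command; optionally stall the host until it completes."""
        future = self._pool.submit(fn, *args)
        if blocking:
            future.result()  # host waits here, like a blocking enqueue
        return future        # plays the role of an event handle

    def finish(self):
        """Wait for everything queued so far (queue-wide barrier)."""
        self._pool.submit(lambda: None).result()

    def close(self):
        self._pool.shutdown(wait=True)

def fake_kernel(x):
    """Stand-in for device work."""
    time.sleep(0.05)
    return x * x

queue = ToyQueue()

a = queue.enqueue(fake_kernel, 3, blocking=True)
print(a.done())    # True: the call returned only after the work finished

b = queue.enqueue(fake_kernel, 4)   # returns immediately
host_side = sum(range(1000))        # host is free to do other work meanwhile
print(b.result())  # 16: wait on the "event" only when the value is needed

queue.close()
```

Which style is faster overall depends on what the host has to do in the meantime, so the app author has to pick and test.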