FGRPB1G: CPU usage factor

Keith Myers
Joined: 11 Feb 11
Posts: 4969
Credit: 18768133422
RAC: 7089507

Very evident in the CUDA apps at GPUGrid, which by default use BLOCKING sync on the kernel thread. If you set SWAN_SYNC=1 as an environment variable, you can change to SPIN sync and the app speeds up by 30-50%.
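To make the blocking-versus-spin distinction concrete, here is a minimal Python sketch of a host thread waiting for a GPU kernel. It is not GPUGrid's code: a background thread stands in for the kernel, `wait_for_kernel`, `fake_kernel` and `run_once` are hypothetical helpers, and reading `SWAN_SYNC` here only imitates the switch described above. In the CUDA runtime itself, the equivalent choice is made with `cudaSetDeviceFlags()` and the `cudaDeviceScheduleSpin` or `cudaDeviceScheduleBlockingSync` flags.

```python
import os
import threading
import time


def use_spin_sync() -> bool:
    """Hypothetical switch: spin only when SWAN_SYNC=1 is set in the environment."""
    return os.environ.get("SWAN_SYNC") == "1"


def wait_for_kernel(done: threading.Event, spin: bool) -> int:
    """Wait until `done` is set; return how many times the flag was polled."""
    polls = 0
    if spin:
        # SPIN sync: poll the flag in a tight loop. Reacts quickly,
        # but keeps one CPU core busy for the whole kernel duration.
        while not done.is_set():
            polls += 1
    else:
        # BLOCKING sync: the thread sleeps inside wait() until it is
        # signalled, so it uses almost no CPU while the "kernel" runs.
        done.wait()
        polls = 1
    return polls


def fake_kernel(done: threading.Event, seconds: float) -> None:
    """Stand-in for a GPU kernel: sleep, then signal completion."""
    time.sleep(seconds)
    done.set()


def run_once(spin: bool, seconds: float = 0.05) -> int:
    done = threading.Event()
    worker = threading.Thread(target=fake_kernel, args=(done, seconds))
    worker.start()
    polls = wait_for_kernel(done, spin)
    worker.join()
    return polls


if __name__ == "__main__":
    mode = "SPIN" if use_spin_sync() else "BLOCKING"
    print(f"{mode} sync, polls: {run_once(use_spin_sync())}")
```

The spinning branch typically polls thousands of times during the simulated kernel, which is where the extra CPU load comes from.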

 

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6591
Credit: 319516388
RAC: 425190

Keith Myers wrote:
Very evident in the CUDA apps at GPUGrid, which by default use BLOCKING sync on the kernel thread. If you set SWAN_SYNC=1 as an environment variable, you can change to SPIN sync and the app speeds up by 30-50%.

Yup. Whether to run the kernels synchronously or asynchronously with the host is a design decision, and the best choice is often found by testing. What the kernels actually do and how long they take is a key point here (plus any dependencies between kernels). In general one has no (easy) control over a thread's affinity to a given core. What would be really nice for hyperthreaded processors (hardware-optimised multi-threading) is if the spinning OpenCL loop ran on the same physical core as the CUDA non-blocking thread. That way one core could handle as much as possible of the host side of the CPU/GPU platform, i.e. the host code plus the OpenCL runtime.

Cheers, Mike.

(edit) The other way to view this issue is: which device do you want to optimise the time of? The CPU or the GPU... both? Neither: the total, thanks.
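One way to put numbers on that question is a toy cost model. The sketch below is illustrative only: it assumes each kernel launch needs `host_ms` of CPU work (during which the GPU idles), the kernel itself runs for `kernel_ms`, and blocking sync adds a wake-up delay of `wake_ms` before the host notices completion. The function names, parameters and example values are all made up for the illustration, not measured from any real app.

```python
from dataclasses import dataclass


@dataclass
class SyncCost:
    gpu_busy: float   # fraction of wall time the GPU spends running kernels
    cpu_busy: float   # fraction of one CPU core the host thread occupies


def spin_cost(kernel_ms: float, host_ms: float) -> SyncCost:
    # Spinning notices completion immediately, but the core is busy
    # during the host work *and* while polling through the kernel.
    cycle = host_ms + kernel_ms
    return SyncCost(gpu_busy=kernel_ms / cycle, cpu_busy=1.0)


def blocking_cost(kernel_ms: float, host_ms: float, wake_ms: float) -> SyncCost:
    # Blocking frees the core during the kernel, but every launch pays
    # the wake-up delay, during which the GPU sits idle.
    # (The CPU cost of the wake-up itself is ignored in this toy model.)
    cycle = host_ms + kernel_ms + wake_ms
    return SyncCost(gpu_busy=kernel_ms / cycle, cpu_busy=host_ms / cycle)


if __name__ == "__main__":
    # Made-up illustrative numbers, not measurements of any real app.
    print(spin_cost(kernel_ms=9.0, host_ms=1.0))
    print(blocking_cost(kernel_ms=9.0, host_ms=1.0, wake_ms=2.0))
```

With these assumed numbers, spinning keeps the GPU busier (0.90 vs 0.75) at the price of a whole core (1.0 vs about 0.08); optimising "the total" means weighing one against the other.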

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Keith Myers
Joined: 11 Feb 11
Posts: 4969
Credit: 18768133422
RAC: 7089507

Mike Hewson wrote:
Keith Myers wrote:

Cheers, Mike.

(edit) The other way to view this issue is: which device do you want to optimise the time of? The CPU or the GPU... both? Neither: the total, thanks.

 

Ha ha... LOL. No having your cake and eating it too, it seems.

 

Aurum
Joined: 12 Jul 17
Posts: 77
Credit: 3412397040
RAC: 436

Keith Myers wrote:
Very evident in the CUDA apps at GPUGrid, which by default use BLOCKING sync on the kernel thread. If you set SWAN_SYNC=1 as an environment variable, you can change to SPIN sync and the app speeds up by 30-50%.

Does Einstein benefit from having SWAN_SYNC=1 ???

Do AMD GPUs benefit from enabling SWAN_SYNC ???

Keith Myers
Joined: 11 Feb 11
Posts: 4969
Credit: 18768133422
RAC: 7089507

It all depends on whether the app is written to allow it. According to this document, the OpenCL API allows both blocking and non-blocking synchronization:

https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_API.html#execution-model-sync
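In the OpenCL host API the choice shows up per command: for example, `clEnqueueReadBuffer` takes a `blocking_read` argument (`CL_TRUE` or `CL_FALSE`), and a non-blocking call can return an event that the host later waits on with `clWaitForEvents`. The Python sketch below only models that pattern; `ToyQueue` and `enqueue_read` are hypothetical names, and a single worker thread stands in for the device.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


class ToyQueue:
    """Hypothetical stand-in for an in-order OpenCL command queue."""

    def __init__(self) -> None:
        # One worker thread keeps commands in submission order.
        self._device = ThreadPoolExecutor(max_workers=1)

    def enqueue_read(self, data: list, blocking: bool) -> Future:
        def read() -> list:
            time.sleep(0.02)          # pretend the transfer takes a while
            return list(data)

        event = self._device.submit(read)   # Future plays the role of a cl_event
        if blocking:
            event.result()            # like blocking_read=CL_TRUE: return only when done
        return event

    def close(self) -> None:
        self._device.shutdown(wait=True)


if __name__ == "__main__":
    q = ToyQueue()
    ev = q.enqueue_read([1, 2, 3], blocking=False)   # like CL_FALSE
    other_host_work = sum(range(1000))              # host keeps working meanwhile
    print(ev.result(), other_host_work)              # like clWaitForEvents
    q.close()
```

As far as I know, the specification only defines when such calls return; whether the driver spins or sleeps while the host is waiting is left to the implementation.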

 

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2960835998
RAC: 702349

Aurum wrote:
Does Einstein benefit from having SWAN_SYNC=1 ???

So far as I know, SWAN_SYNC is a specific (local) feature of the GPUGrid science application, and not a feature of the generic OpenCL programming environment.

If anyone has specific programming documentation to the contrary, please post it here to stop that incipient urban myth in its tracks.

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6591
Credit: 319516388
RAC: 425190

Richard Haselgrove wrote:
If anyone has specific programming documentation to the contrary, please post it here to stop that incipient urban myth in its tracks.

Putting aside for the moment the persnickety burden of proving non-existence, I have at least a terminating algorithm: I would look at, say, the OpenCL 2.2 Reference Guide, the OpenCL C 2.0 Specification or the OpenCL API Specification and note that a search for "SWAN_SYNC" produces no match. Mutatis mutandis, one could search the entire Khronos domain... :-)

Cheers, Mike. 

(edit) Conversely, one could ask any naysayers of the above to quote a mention of it within the Khronos domain.

(edit) Even better, use Google to search the domain by entering the following (between the single quotes) in the search field:

'www.khronos.org "SWAN_SYNC"'

and as I have done, find no matches in the Khronos domain.

(edit) I'm assuming, or so it appears, that SWAN_SYNC is used within a tool called Swan. From the Khronos point of view that is a third party. It assists in porting CUDA to OpenCL...

mmonnin
Joined: 29 May 16
Posts: 291
Credit: 3425156540
RAC: 3817271

For starters, GPUGrid uses CUDA and E@H uses OpenCL. The NV drivers seem to effectively have a SWAN_SYNC-type spin setting enabled when computing with OpenCL, as there is a constant 100% load on one CPU core. AMD cards using OpenCL have low CPU usage.
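To check an observation like "a constant 100% of one CPU core", one can compare CPU time against wall-clock time while a thread waits. The Python sketch below does that for a spinning wait and a blocking wait on a simulated kernel; `core_fraction` is a hypothetical helper, and `time.process_time()` counts CPU time for the whole process, so the figure is only meaningful while nothing else in the process is busy.

```python
import threading
import time


def core_fraction(spin: bool, kernel_s: float = 0.2) -> float:
    """Fraction of one core used while waiting for a simulated kernel."""
    done = threading.Event()
    # Simulated kernel: a timer thread sets the flag after kernel_s seconds.
    timer = threading.Timer(kernel_s, done.set)

    cpu0, wall0 = time.process_time(), time.perf_counter()
    timer.start()
    if spin:
        while not done.is_set():   # busy-poll, like SPIN sync
            pass
    else:
        done.wait()                # sleep until signalled, like BLOCKING sync
    cpu1, wall1 = time.process_time(), time.perf_counter()
    timer.join()
    return (cpu1 - cpu0) / (wall1 - wall0)


if __name__ == "__main__":
    print(f"spin:     {core_fraction(spin=True):.2f} of a core")
    print(f"blocking: {core_fraction(spin=False):.2f} of a core")
```

On an otherwise idle machine the spinning wait should come out near 1.0 and the blocking wait near 0.0; the exact values depend on the OS scheduler.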

Maybe if there were a SWAN_SYNC-type setting at E@H we wouldn't need to run 2x/3x tasks at once to achieve high GPU utilization, but some of that might just be waiting on CPU calculations.
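The effect of running 2x/3x can be estimated with an idealised bound. The sketch assumes each task alternates `gpu_s` seconds of kernel work with `cpu_s` seconds of host-side time (CPU calculations plus any wait for the host to notice completion), that the GPU serves one task at a time, and that host-side phases of different tasks never contend. Under those assumptions N tasks keep the GPU at most min(1, N·gpu_s/(gpu_s + cpu_s)) busy. `gpu_utilization` is a hypothetical helper and the numbers are made up; this is not a model of the actual E@H app.

```python
def gpu_utilization(n_tasks: int, gpu_s: float, cpu_s: float) -> float:
    """Upper bound on GPU busy fraction for n_tasks running concurrently.

    Idealised model (an assumption): each task alternates gpu_s of kernel
    time with cpu_s of host-side time, the GPU serves one task at a time,
    and host-side phases of different tasks never contend with each other.
    """
    if n_tasks < 1:
        raise ValueError("need at least one task")
    cycle = gpu_s + cpu_s
    return min(1.0, n_tasks * gpu_s / cycle)


if __name__ == "__main__":
    # Made-up example: 60% of each task's cycle on the GPU, 40% on the host.
    for n in (1, 2, 3):
        print(n, gpu_utilization(n, gpu_s=0.6, cpu_s=0.4))
```

In this toy picture, a faster wake-up (as with spin sync) shrinks `cpu_s` and raises the single-task figure, which is one way to read the suggestion that such a setting could reduce the need to run several tasks at once.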