FGRPB1G: CPU usage factor

Keith Myers

Joined: 11 Feb 11

Posts: 4964

Credit: 18747567785

RAC: 7062554

30 Nov 2018 1:31:52 UTC

Message 167967 in response to message 167956

Very evident in the CUDA apps over at GPUGrid, which by default run BLOCKING sync on the kernel thread. If you set the environment variable SWAN_SYNC=1, you can change to SPIN sync and the app speeds up by 30-50%.
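
For anyone unsure what "setting the environment variable" means in practice, here is a minimal Python sketch. It only shows the general mechanism: a child process sees whatever variables were in the environment it was launched with. The helper name run_with_swan_sync and the stand-in child command are made up for illustration; nothing here is taken from GPUGrid or BOINC. Since the BOINC client is what starts the science app, the variable would normally have to be present in the client's own environment, and how to arrange that depends on the operating system.

```python
import os
import subprocess
import sys


def run_with_swan_sync(cmd, value="1"):
    """Run cmd with SWAN_SYNC set in the child's environment only (hypothetical helper)."""
    env = dict(os.environ)        # copy, so the parent's environment is untouched
    env["SWAN_SYNC"] = value
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


# Stand-in child process that simply reports the value it sees.
child = [sys.executable, "-c", "import os; print(os.environ.get('SWAN_SYNC'))"]
result = run_with_swan_sync(child)
print("child sees SWAN_SYNC =", result.stdout.strip())
```

Whether the variable has any effect is entirely up to the application reading it.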

Mike Hewson

Moderator

Joined: 1 Dec 05

Posts: 6588

Credit: 318096427

RAC: 399640

30 Nov 2018 2:29:00 UTC

Message 167969 in response to message 167967

Keith Myers wrote:

Very evident in the CUDA apps over at GPUGrid, which by default run BLOCKING sync on the kernel thread. If you set the environment variable SWAN_SYNC=1, you can change to SPIN sync and the app speeds up by 30-50%.

Yup. It's a design decision to run the kernels synchronously or asynchronously with the host, with the best solution often determined by testing. What the kernels actually do and the time they take is a key point here ( plus any dependencies between kernels ). In general one has no ( easy ) control over thread affinity to a given core. What would be really nice for hyperthreaded processors ( hardware optimised multi-threading ) is if the spinning OpenCL loop were running on the same core as the CUDA non-blocking thread. That way the same core could do as much as possible of the CPU/GPU platform work, i.e. host code + OpenCL runtime.
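
To make the blocking versus spinning trade-off concrete, here is a rough CPU-only analogy in Python. It is not how GPUGrid or any GPU runtime implements synchronisation: a timer thread stands in for the kernel, and the function names are invented for this sketch. The CUDA runtime does expose a comparable choice via cudaSetDeviceFlags ( cudaDeviceScheduleSpin versus cudaDeviceScheduleBlockingSync ), though whether a given app uses those flags is not stated in this thread.

```python
import threading
import time


def wait_blocking(done: threading.Event) -> None:
    """Sleep in the OS until signalled; uses almost no CPU while waiting."""
    done.wait()


def wait_spinning(done: threading.Event) -> None:
    """Poll the flag in a tight loop; reacts quickly but keeps a core busy."""
    while not done.is_set():
        pass


def cpu_seconds_while_waiting(wait_fn, work_seconds: float = 0.2) -> float:
    """Return the CPU time the calling thread burns while waiting for the 'kernel'."""
    done = threading.Event()
    # Stand-in for GPU work: a timer that signals completion after a delay.
    worker = threading.Timer(work_seconds, done.set)
    start = time.thread_time()
    worker.start()
    wait_fn(done)
    used = time.thread_time() - start
    worker.join()
    return used


blocking_cpu = cpu_seconds_while_waiting(wait_blocking)
spinning_cpu = cpu_seconds_while_waiting(wait_spinning)
print(f"blocking wait: {blocking_cpu:.3f} s of CPU time")
print(f"spinning wait: {spinning_cpu:.3f} s of CPU time")
```

The spinning version notices completion sooner but pays for it with a busy core, which is the same trade described above.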

Cheers, Mike.

( edit ) The other way to view this issue is : which device do you want to optimise the time of ? The CPU or the GPU .... both ? Neither : the total thanks.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Keith Myers

Joined: 11 Feb 11

Posts: 4964

Credit: 18747567785

RAC: 7062554

30 Nov 2018 4:20:36 UTC

Message 167973 in response to message 167969

Mike Hewson wrote:

Keith Myers wrote:
Cheers, Mike.

( edit ) The other way to view this issue is : which device do you want to optimise the time of ? The CPU or the GPU .... both ? Neither : the total thanks.

Ha ha ... LOL. No having your cake and eating it too, it seems.

Aurum

Joined: 12 Jul 17

Posts: 77

Credit: 3412397040

RAC: 531

30 Dec 2018 17:52:23 UTC

Message 168589 in response to message 167967

Keith Myers wrote:

Very evident in the CUDA apps over at GPUGrid, which by default run BLOCKING sync on the kernel thread. If you set the environment variable SWAN_SYNC=1, you can change to SPIN sync and the app speeds up by 30-50%.

Does Einstein benefit from having SWAN_SYNC=1 ???

Do AMD GPUs benefit from enabling SWAN_SYNC ???

Keith Myers

Joined: 11 Feb 11

Posts: 4964

Credit: 18747567785

RAC: 7062554

30 Dec 2018 20:48:45 UTC

Message 168591 in response to message 168589

It all depends on whether the app is written to allow it. According to this document, the OpenCL API allows both blocking and non-blocking synchronization.

https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_API.html#execution-model-sync
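
As a rough illustration of the two styles, here is a plain Python analogy using a one-worker thread pool as a stand-in for a command queue. No OpenCL is involved, and fake_kernel is a made-up placeholder. In real OpenCL the corresponding pieces are the blocking flag on calls such as clEnqueueReadBuffer, plus clWaitForEvents and clFinish.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fake_kernel(x: int) -> int:
    """Stand-in for device work; sleeps instead of computing on a GPU."""
    time.sleep(0.1)
    return x * x


with ThreadPoolExecutor(max_workers=1) as queue:
    # Non-blocking style: submit() returns at once with a handle, much like an event.
    handle = queue.submit(fake_kernel, 7)
    still_running = not handle.done()   # the host is free to do other work here
    # Blocking style: the host stops here until the work has finished.
    value = handle.result()

print("work still running right after submit:", still_running)
print("result after blocking wait:", value)
```

Which style an application uses is the app author's choice, which is why it depends on how the app is written.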

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 2958922856

RAC: 711872

30 Dec 2018 21:29:33 UTC

Message 168595 in response to message 168589

Aurum wrote:

Does Einstein benefit from having SWAN_SYNC=1 ???

So far as I know, SWAN_SYNC is a specific (local) feature of the GPUGrid science application, and not a feature of the generic OpenCL programming environment.

If anyone has specific programming documentation to the contrary, please post it here to stop that incipient urban myth in its tracks.

Mike Hewson

Moderator

Joined: 1 Dec 05

Posts: 6588

Credit: 318096427

RAC: 399640

31 Dec 2018 4:01:00 UTC

Message 168600 in response to message 168595

Richard Haselgrove wrote:

If anyone has specific programming documentation to the contrary, please post it here to stop that incipient urban myth in its tracks.

Putting aside for the moment the persnickety burden of proving non-existence, I have at the least a terminating algorithm : I would look at, say, the OpenCL 2.2 Reference Guide or the OpenCL C 2.0 Specification or the OpenCL API Specification and note that a search for "SWAN_SYNC" produces no match. Mutatis mutandis one could search the entire Khronos domain .... :-)

Cheers, Mike.

( edit ) Conversely, one could request any naysayers of the above to quote its mention within the Khronos domain.

( edit ) Even better, use Google to search the domain by inserting the following ( between the single quotes ) in the search field :

'www.khronos.org "SWAN_SYNC"'

and as I have done, find no matches in the Khronos domain.

( edit ) I'm assuming, or it appears to be, that SWAN_SYNC is used within a tool called Swan. From the Khronos POV that is a 3rd party. It assists in porting CUDA to OpenCL ......

mmonnin

Joined: 29 May 16

Posts: 291

Credit: 3410516540

RAC: 3379649

31 Dec 2018 13:19:47 UTC

Message 168605

For starters, GPUGrid uses CUDA and E@H uses OpenCL. The NVIDIA drivers seem to effectively have a SWAN_SYNC-type setting enabled when computing with OpenCL, as one CPU core is constantly at 100%. AMD cards using OpenCL have low CPU usage.

Maybe if there were a SWAN_SYNC-type setting at E@H we wouldn't need to run 2 or 3 tasks at once to achieve high GPU utilization, but some of that might just be waiting on CPU calculations.
