Gamma-ray pulsar binary search #1 on GPUs

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4267
Credit: 244933831
RAC: 16263

Regarding Mac OS X: I'll get back to the OpenCL problems when I'm done with the CUDA version. Sorry, there's currently more computing power to gain at that end. And it's not that OS X isn't doing any GPU work at all, it's just a little less efficient.

BM

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109410894502
RAC: 34979752

Bernd Machenschalk wrote:
... would it help if we reduce that of the ATI one? If so, how much? 50%? 33%?

I have mixed feelings about changing the current defaults.  The experienced users can take care of themselves.  My concern would be the much larger number of 'set and forget' volunteers who may have older and/or lower end GPUs that can't run 2x anyway.  If the default were changed to less than one CPU core per GPU task, the users I'm thinking of would be running a single GPU task with all CPU cores also crunching.  I don't know how that would go on less capable equipment.  My gut feeling is they would probably be better off with the current default behaviour of one less CPU core being used and that way enjoy better GPU performance.

Now if the GPU utilization factor mechanism could be modified so that just the first GPU task would 'reserve' a CPU core and then a configurable number of additional tasks would 'reserve' a second core, etc., it might allow the power users to really optimise the performance without causing any harm to the bulk of 'standard' users who just run a single GPU task.

For FGRPB1G, I could imagine two settings, something like these (example values shown in square brackets):-

  1. Number of concurrent GPU tasks per GPU (NB: each task will require 1GB of GPU RAM) -- [3] (default is 1)
  2. Reserve an additional CPU core to support each extra [2] concurrent GPU task(s) after the first. (default is 1)

So for the values shown, 2 CPU cores would be reserved, one for the first GPU task and an additional one for the 2nd and 3rd GPU tasks.  This type of mechanism would give users with high end GPUs the ability to fine tune just how many 'reserved' cores were needed.  If the 2nd setting were 1, you would have the current default behaviour - 1 CPU core reserved for each GPU task.  If the 2nd setting were 3, you would run 3 GPU tasks with just 1 core reserved and the 2nd reserved core would only kick in if you tried to run 4 GPU tasks (ie 3 additional).

I'm certainly not requesting any of the above.  It's all just 'off the top of my head' with no real forethought about the degree of difficulty for implementation or about possible unintended consequences.  I just thought I might explore ways of allowing control without hurting the bulk of ordinary users who just run one GPU task on a single (possibly lower grade) GPU.  More advanced users can achieve all of the above through the app_config.xml mechanism anyway so perhaps nothing needs to change.
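
For reference, a sketch of what that app_config.xml approach looks like for the example values above (3 GPU tasks, roughly 2 CPU cores budgeted). The app name is an assumption for the FGRPB1G search, so check the <name> entries in your client_state.xml before using it:

  <app_config>
    <app>
      <name>hsgamma_FGRPB1G</name>
      <gpu_versions>
        <!-- 1/3 of a GPU per task, i.e. 3 concurrent GPU tasks -->
        <gpu_usage>0.33</gpu_usage>
        <!-- ~0.67 CPUs per task, i.e. about 2 cores budgeted across the 3 tasks -->
        <cpu_usage>0.67</cpu_usage>
      </gpu_versions>
    </app>
  </app_config>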

 

Cheers,
Gary.

Mumak
Joined: 26 Feb 13
Posts: 325
Credit: 3293291044
RAC: 1512537

I can see quite different CPU usage of ATI GPU tasks on my hosts: from 20% (HD7950) and 35% (RX 480) up to 57% (Fury X). So IMO it's better to reserve a full core by default.


choks
Joined: 24 Feb 05
Posts: 16
Credit: 88825648
RAC: 202400

The Nvidia 100% CPU load on Linux is related to the sched_yield implementation.

The cgminer folks fixed it like this:

- grab https://github.com/jonpry/primecl/blob/master/libsleep.c
- compile it according to the included doc
- export LD_PRELOAD="/where_file_is/libsleep.so"
- adjust YIELD_SLEEP_TIME for the lowest CPU usage
- ./run_client or ./run_manager

Anyone on Linux/Nvidia want to give it a try?

EDIT: if it works on Linux, it might also work on Windows, provided Nvidia uses the same POSIX call in their driver.
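
For anyone curious, the whole trick in libsleep.c is to LD_PRELOAD a library whose sched_yield() actually sleeps instead of spinning. A minimal sketch of that idea (not the file from the repo; the sleep value is just a placeholder to tune, like YIELD_SLEEP_TIME above):

  /* libsleep_sketch.c -- illustrative sketch of the LD_PRELOAD trick
     build:  gcc -shared -fPIC -o libsleep.so libsleep_sketch.c
     use:    export LD_PRELOAD="/where_file_is/libsleep.so"        */
  #include <sched.h>
  #include <time.h>

  /* nanoseconds to sleep per yield; tune for the lowest CPU usage
     that doesn't slow the GPU task down */
  #define SLEEP_NS 100000L

  /* The driver calls sched_yield() in a tight loop while waiting for
     the GPU, which burns a full core. Preloading this version makes
     each "yield" actually give the CPU back for a short while. */
  int sched_yield(void)
  {
      struct timespec ts = { 0, SLEEP_NS };
      nanosleep(&ts, NULL);
      return 0;
  }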

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

Mumak wrote:
I can see quite different CPU usage of ATI GPU tasks on my hosts: from 20% (HD7950) and 35% (RX 480) up to 57% (Fury X). So IMO it's better to reserve a full core by default.

On Linux with the RX 480, I see a svelte 6%.

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

choks wrote:

Anyone on Linux/Nvidia want to give it a try?

I would be happy to, but I only have (two) physical GTX 460s with 768 MB of memory; still, this looks pretty simple and easy to try.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109410894502
RAC: 34979752

AgentB wrote:
Mumak wrote:
I can see quite different CPU usage of ATI GPU tasks on my hosts: from 20% (HD7950) and 35% (RX 480) up to 57% (Fury X). So IMO it's better to reserve a full core by default.

On Linux with the RX 480, I see a svelte 6%.

I also see much lower CPU usage under Linux, for example (all single cards per host):-

    GPU Model    Concurrency    Elapsed/CPU (s)    CPU Type & Year         CPU Usage %
    HD7770       x1             2060/195           AMD PhenomII - 2009      9.5
    HD7850       x2             2270/106           Intel G3258 - 2015       4.7
    HD7950       x3             2125/186           AMD FX-6300 - 2015       8.8
    R7 260X      x2             2510/255           Intel Q8400 - 2009      10.2
    R7 370       x2             1930/105           Intel G3258 - 2015       5.4
    R9 380       x2             1430/243           AMD PhenomII - 2009     17.0
    RX 460       x2             2120/175           Intel Q6600 - 2008       8.3

As you would expect, older and/or slower CPUs use a bit more CPU time but the RX 460 supported by a Q6600 still seems to do reasonably well.  That's the only one in the list using the amdgpu FOSS driver with the OpenCL libs from the amdgpu-pro package from AMD.  The others are all using the old fglrx (Catalyst) drivers from 2014.

Cheers,
Gary.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4267
Credit: 244933831
RAC: 16263

choks wrote:

The Nvidia 100% CPU load on Linux is related to the sched_yield implementation.

The cgminer folks fixed it like this:

- grab https://github.com/jonpry/primecl/blob/master/libsleep.c
- compile it according to the included doc
- export LD_PRELOAD="/where_file_is/libsleep.so"
- adjust YIELD_SLEEP_TIME for the lowest CPU usage
- ./run_client or ./run_manager

Anyone on Linux/Nvidia want to give it a try?

EDIT: if it works on Linux, it might also work on Windows, provided Nvidia uses the same POSIX call in their driver.

Didn't help in our experiments. Might depend on the driver version, though.

Anyway, I built and issued a new application version (1.19 Linux FGRPopencl-Beta-nvidia). It has two additional parameters that can be set on the command line: --sleepTimeKern and --sleepTimeFFT. These put the CPU to sleep for as many microseconds as the respective argument says, specified separately for the execution of the FFT and of the application's own kernels. The values to use depend highly on your CPU and GPU: they need to be small enough not to hold up the application, but large enough to reduce the CPU utilization. You may start by adding "<cmdline> --sleepTimeKern 2000 --sleepTimeFFT 10000 </cmdline>" to your app_config.xml, then play around with it.
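
For reference, the <cmdline> element goes inside an <app_version> block in app_config.xml. A minimal sketch (the app name below is an assumption, so check client_state.xml; the plan class is the Beta one named above):

  <app_config>
    <app_version>
      <app_name>hsgamma_FGRPB1G</app_name>
      <plan_class>FGRPopencl-Beta-nvidia</plan_class>
      <cmdline>--sleepTimeKern 2000 --sleepTimeFFT 10000</cmdline>
    </app_version>
  </app_config>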

BM

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4267
Credit: 244933831
RAC: 16263

There is now such an app version 1.19 for NVidia on Windows as well.

I don't really have a chance to test this myself, so I'm not sure it works as it should.

BM

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4267
Credit: 244933831
RAC: 16263

I'm publishing 1.20 for Windows and Linux (FGRPopencl-Beta-nvidia). This features some automatic measurement and adjustment of the optimal sleep time. Just add "<cmdline> --sleepTimeFFT -20 </cmdline>" to your app_config.xml.
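
If you already have an <app_version> block like the 1.19 sketch above, switching to the auto-tuning mode should just mean replacing the cmdline there:

  <cmdline>--sleepTimeFFT -20</cmdline>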

BM
