Gamma-ray pulsar binary search #1 on GPUs

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4330
Credit: 251180315
RAC: 41915

Regarding Mac OSX: I'll get back to the OpenCL problems when I'm done with the CUDA version. Sorry, there's currently more computing power to gain at that end. And it's not like OSX isn't doing any GPU work at all; it's just a little less efficient.

BM

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 117990258139
RAC: 21145443

Bernd Machenschalk wrote:
... would it help if we reduce that of the ATI one? If so, how much? 50%? 33%?

I have mixed feelings about changing the current defaults.  The experienced users can take care of themselves.  My concern would be the much larger number of 'set and forget' volunteers who may have older and/or lower end GPUs that can't run 2x anyway.  If the default were changed to less than one CPU core per GPU task, the users I'm thinking of would be running a single GPU task with all CPU cores also crunching.  I don't know how that would go on less capable equipment.  My gut feeling is they would probably be better off with the current default behaviour of one less CPU core being used and that way enjoy better GPU performance.

Now if the GPU utilization factor mechanism could be modified so that just the first GPU task would 'reserve' a CPU core and then a configurable number of additional tasks would 'reserve' a second core, etc., it might allow the power users to really optimise the performance without causing any harm to the bulk of 'standard' users who just run a single GPU task.

For FGRPB1G, I could imagine two settings, something like these (example values shown in brackets):-

  1. Number of concurrent GPU tasks per GPU (NB: each task will require 1GB of GPU RAM) -- [example: 3] (default is 1)
  2. Reserve an additional CPU core to support each [example: 2] extra concurrent GPU task(s) after the first (default is 1)

So for the example values shown, 2 CPU cores would be reserved: one for the first GPU task and an additional one for the 2nd and 3rd GPU tasks.  This type of mechanism would give users with high-end GPUs the ability to fine-tune just how many 'reserved' cores were needed.  If the 2nd setting were 1, you would have the current default behaviour - 1 CPU core reserved for each GPU task.  If the 2nd setting were 3, you would run 3 GPU tasks with just 1 core reserved, and the 2nd reserved core would only kick in if you tried to run 4 GPU tasks (i.e. 3 additional).
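
A minimal C sketch of that rule (the function and parameter names are mine, purely for illustration):

    /* setting 1: n_tasks  = concurrent GPU tasks per GPU
       setting 2: per_core = additional GPU tasks covered by each extra core */
    int reserved_cores(int n_tasks, int per_core)
    {
        /* one core for the first task, plus one more for every
           full group of 'per_core' additional tasks */
        return 1 + (n_tasks - 1) / per_core;
    }

So reserved_cores(3, 2) gives the 2 cores in the example above, reserved_cores(4, 3) gives 2, and with per_core = 1 it reduces to the current one-core-per-GPU-task behaviour.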

I'm certainly not requesting any of the above.  It's all just 'off the top of my head' with no real forethought about the degree of difficulty for implementation or about possible unintended consequences.  I just thought I might explore ways of allowing control without hurting the bulk of ordinary users who just run one GPU task on a single (possibly lower grade) GPU.  More advanced users can achieve all of the above through the app_config.xml mechanism anyway so perhaps nothing needs to change.

 

Cheers,
Gary.

Mumak
Joined: 26 Feb 13
Posts: 333
Credit: 3540436327
RAC: 1129460

I see quite different CPU usage for ATI GPU tasks across my hosts: from 20% (HD7950) and 35% (RX 480) up to 57% (Fury X). So IMO it's better to reserve a full core by default.

choks
Joined: 24 Feb 05
Posts: 16
Credit: 146831055
RAC: 42042

The Nvidia 100% CPU load on Linux is related to the sched_yield implementation.

The cgminer folks fixed it by:

- grabbing https://github.com/jonpry/primecl/blob/master/libsleep.c
- compiling it according to the included doc
- exporting LD_PRELOAD="/where_file_is/libsleep.so"
- adjusting YIELD_SLEEP_TIME for the lowest CPU usage
- running ./run_client or ./run_manager
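
For context, the trick boils down to roughly this (the sketch below is mine, not the actual libsleep.c, though YIELD_SLEEP_TIME is the tunable it exposes). LD_PRELOAD makes the dynamic linker resolve sched_yield to this library instead of libc, so the driver's spin-wait sleeps rather than burning a core:

    /* Rough sketch of the libsleep idea -- not the actual file.
     * Build: gcc -O2 -shared -fPIC -o libsleep.so libsleep.c
     * Use:   export LD_PRELOAD=/where_file_is/libsleep.so
     */
    #include <unistd.h>

    #ifndef YIELD_SLEEP_TIME
    #define YIELD_SLEEP_TIME 100   /* microseconds; tune for lowest CPU usage */
    #endif

    /* Overrides libc's sched_yield(); the driver's busy-wait loop
       then sleeps instead of spinning at 100% CPU. */
    int sched_yield(void)
    {
        usleep(YIELD_SLEEP_TIME);
        return 0;
    }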

Anyone on Linux/Nvidia want to give it a try?

EDIT: if it works on Linux, it might also work on Windows, assuming Nvidia uses the same POSIX call in their Windows driver.

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

Mumak wrote:
I see quite different CPU usage for ATI GPU tasks across my hosts: from 20% (HD7950) and 35% (RX 480) up to 57% (Fury X). So IMO it's better to reserve a full core by default.

On Linux with the RX 480, I see a svelte 6%.

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

choks wrote:

Anyone on Linux/Nvidia want to give it a try?

I would be happy to, but I only have two GTX 460s (the 768 MB version). Still, this looks pretty simple and easy to try.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 117990258139
RAC: 21145443

AgentB wrote:
Mumak wrote:
I see quite different CPU usage for ATI GPU tasks across my hosts: from 20% (HD7950) and 35% (RX 480) up to 57% (Fury X). So IMO it's better to reserve a full core by default.

On Linux with the RX 480, I see a svelte 6%.

I also see much lower CPU usage under Linux, for example (all single-card hosts):-

    GPU Model   Concurrency   Elapsed/CPU (s)   CPU Type & Year        Usage %
    HD7770      x1            2060/195          AMD PhenomII - 2009        9.5
    HD7850      x2            2270/106          Intel G3258 - 2015         4.7
    HD7950      x3            2125/186          AMD FX-6300 - 2015         8.8
    R7 260X     x2            2510/255          Intel Q8400 - 2009        10.2
    R7 370      x2            1930/105          Intel G3258 - 2015         5.4
    R9 380      x2            1430/243          AMD PhenomII - 2009       17.0
    RX 460      x2            2120/175          Intel Q6600 - 2008         8.3

    (Usage % = CPU time / elapsed time x 100; times are seconds per task.)

As you would expect, older and/or slower CPUs use a bit more CPU time, but the RX 460 supported by a Q6600 still seems to do reasonably well.  That's the only one in the list using the amdgpu FOSS driver with the OpenCL libs from AMD's amdgpu-pro package.  The others are all using the old fglrx (Catalyst) drivers from 2014.

Cheers,
Gary.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4330
Credit: 251180315
RAC: 41915

choks wrote:

The Nvidia 100% CPU load on Linux is related to the sched_yield implementation.

The cgminer folks fixed it by:

- grabbing https://github.com/jonpry/primecl/blob/master/libsleep.c
- compiling it according to the included doc
- exporting LD_PRELOAD="/where_file_is/libsleep.so"
- adjusting YIELD_SLEEP_TIME for the lowest CPU usage
- running ./run_client or ./run_manager

Anyone on Linux/Nvidia want to give it a try?

EDIT: if it works on Linux, it might also work on Windows, assuming Nvidia uses the same POSIX call in their Windows driver.

Didn't help in our experiments. Might depend on the driver version, though.

Anyway, I built and issued a new application version (1.19 Linux FGRPopencl-Beta-nvidia). This has two additional parameters that can be set on the command line: --sleepTimeKern and --sleepTimeFFT. Each puts the CPU to sleep for the given number of microseconds, specified separately for the app's own kernels and for the FFT execution. The right values depend highly on your CPU and GPU: they need to be small enough not to hold up the application, but large enough to reduce the CPU utilization. You may start by adding "<cmdline> --sleepTimeKern 2000 --sleepTimeFFT 10000 </cmdline>" to your app_config.xml, then play around with it.
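
For reference, here's how that fragment sits in a complete app_config.xml. The app_name below is an assumption (check client_state.xml on your host for the exact name); the plan class is the one named above:

    <app_config>
      <app_version>
        <!-- app_name is an assumption; verify it in client_state.xml -->
        <app_name>hsgamma_FGRPB1G</app_name>
        <plan_class>FGRPopencl-Beta-nvidia</plan_class>
        <cmdline>--sleepTimeKern 2000 --sleepTimeFFT 10000</cmdline>
      </app_version>
    </app_config>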

BM

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4330
Credit: 251180315
RAC: 41915

There is now such an app version 1.19 for NVidia on Windows as well.

I don't really have a chance to test this myself, so I'm not sure it works as it should.

BM

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4330
Credit: 251180315
RAC: 41915

I'm publishing 1.20 for Windows and Linux (FGRPopencl-Beta-nvidia). This features some automatic measurement and adjustment of the optimal sleep time. Just add "<cmdline> --sleepTimeFFT -20 </cmdline>" to your app_config.xml.
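
Using the same (assumed) app_config.xml skeleton as in the 1.19 example above, only the cmdline line changes:

    <cmdline>--sleepTimeFFT -20</cmdline>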

BM
