Regarding Mac OSX: I'll get back to the OpenCL problems when I'm done with the CUDA version. Sorry, there's currently more computing power to gain at that end. And it's not that OSX isn't doing any GPU work at all; it's just a little less efficient.
BM
Bernd Machenschalk wrote: ... would it help if we reduce that of the ATI one? If so, how much? 50%? 33%?
I have mixed feelings about changing the current defaults. The experienced users can take care of themselves. My concern would be the much larger number of 'set and forget' volunteers who may have older and/or lower end GPUs that can't run 2x anyway. If the default were changed to less than one CPU core per GPU task, the users I'm thinking of would be running a single GPU task with all CPU cores also crunching. I don't know how that would go on less capable equipment. My gut feeling is they would probably be better off with the current default behaviour of one less CPU core being used and that way enjoy better GPU performance.
Now if the GPU utilization factor mechanism could be modified so that just the first GPU task would 'reserve' a CPU core and then a configurable number of additional tasks would 'reserve' a second core, etc., it might allow the power users to really optimise the performance without causing any harm to the bulk of 'standard' users who just run a single GPU task.
For FGRPB1G, I could imagine two settings, something like these (with example settings shown in bold):-
Number of concurrent GPU tasks per GPU (NB: each task will require 1GB of GPU RAM) -- 3 (default is 1)
Reserve an additional CPU core to support each extra 2 concurrent GPU task(s) after the first. (default is 1)
So for the values shown, 2 CPU cores would be reserved: one for the first GPU task and an additional one for the 2nd and 3rd GPU tasks. This type of mechanism would give users with high end GPUs the ability to fine-tune just how many 'reserved' cores were needed. If the 2nd setting were 1, you would have the current default behaviour - 1 CPU core reserved for each GPU task. If the 2nd setting were 3, you would run 3 GPU tasks with just 1 core reserved and the 2nd reserved core would only kick in if you tried to run 4 GPU tasks (i.e. 3 additional).
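Expressed as arithmetic, the proposed scheme (purely a sketch; neither setting exists yet, and the names below are just placeholders) would reserve:

reserved_cores = 1 + floor((concurrent_gpu_tasks - 1) / tasks_per_extra_core)

so the (3, 2) example gives 2 reserved cores, a 2nd setting of 3 gives 1 core for up to 3 GPU tasks and 2 cores at 4 tasks, and a 2nd setting of 1 reproduces the current one-core-per-task default.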
I'm certainly not requesting any of the above. It's all just 'off the top of my head' with no real forethought about the degree of difficulty for implementation or about possible unintended consequences. I just thought I might explore ways of allowing control without hurting the bulk of ordinary users who just run one GPU task on a single (possibly lower grade) GPU. More advanced users can achieve all of the above through the app_config.xml mechanism anyway so perhaps nothing needs to change.
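For reference, a minimal app_config.xml along those lines using the existing mechanism could look like the sketch below (it assumes the FGRPB1G GPU application is named hsgamma_FGRPB1G; check client_state.xml for the exact name on your host). It runs 3 tasks per GPU and budgets roughly one CPU core in total for them:

<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>
    <gpu_versions>
      <!-- 3 concurrent tasks per GPU -->
      <gpu_usage>0.33</gpu_usage>
      <!-- ~1/3 core budgeted per task, so ~1 core for all three -->
      <cpu_usage>0.33</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

The file goes in the Einstein@Home project directory, and cpu_usage is only a scheduling budget for the BOINC client; it doesn't limit how much CPU the tasks actually consume.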
Cheers,
Gary.
I can see a quite different CPU usage of ATI GPU tasks on my hosts. From 20% (HD7950), 35% (RX 480) to 57% (Fury X). So IMO it's better to reserve a full core by default.
The Nvidia 100% CPU load on Linux is related to the sched_yield implementation.
cgminer users fixed it as follows:
- grab https://github.com/jonpry/primecl/blob/master/libsleep.c
- compile it according to the included doc
- then export LD_PRELOAD="/where_file_is/libsleep.so"
- adjust YIELD_SLEEP_TIME for the lowest CPU usage
- ./run_client or ./run_manager
Someone on Linux/Nvidia willing to give it a try?
EDIT: if it works on Linux, it might also work on Windows if Nvidia used the same POSIX call in their driver.
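For anyone wondering what the libsleep trick actually does: it uses LD_PRELOAD to override sched_yield(), which the driver apparently calls in a tight loop, with a short sleep. A minimal sketch in C of the same idea (the real libsleep.c at the URL above may differ in detail):

/* libsleep-style shim: sleep instead of spinning when sched_yield() is called */
#define _GNU_SOURCE
#include <sched.h>
#include <time.h>

#ifndef YIELD_SLEEP_TIME
#define YIELD_SLEEP_TIME 100000   /* sleep length in nanoseconds; tune this */
#endif

int sched_yield(void)
{
    struct timespec ts = { 0, YIELD_SLEEP_TIME };
    nanosleep(&ts, NULL);
    return 0;
}

Build it with "gcc -shared -fPIC -o libsleep.so libsleep.c", then export LD_PRELOAD as described above. Too large a YIELD_SLEEP_TIME slows the GPU tasks down; too small barely reduces the CPU load.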
Mumak wrote: I can see a quite different CPU usage of ATI GPU tasks on my hosts ...
On Linux with the RX 480, I see a svelte 6%.
choks wrote: Someone on Linux/Nvidia willing to give it a try?
I would be happy to, but I only have (two) GTX-460s with 768 MB of physical memory. Still, this looks pretty simple and easy to try.
AgentB wrote: On Linux with the RX 480, I see a svelte 6%.
I also see much lower CPU usage under Linux, for example (all single cards per host) :-
GPU Model   Concurrency   Elapsed/CPU (s)   CPU Type & Year        CPU Usage %
HD7770      x1            2060/195          AMD PhenomII - 2009        9.5
HD7850      x2            2270/106          Intel G3258 - 2015         4.7
HD7950      x3            2125/186          AMD FX-6300 - 2015         8.8
R7 260X     x2            2510/255          Intel Q8400 - 2009        10.2
R7 370      x2            1930/105          Intel G3258 - 2015         5.4
R9 380      x2            1430/243          AMD PhenomII - 2009       17.0
RX 460      x2            2120/175          Intel Q6600 - 2008         8.3
As you would expect, older and/or slower CPUs use a bit more CPU time but the RX 460 supported by a Q6600 still seems to do reasonably well. That's the only one in the list using the amdgpu FOSS driver with the OpenCL libs from the amdgpu-pro package from AMD. The others are all using the old fglrx (Catalyst) drivers from 2014.
Cheers,
Gary.
choks wrote: The Nvidia 100% CPU load on Linux is related to the sched_yield implementation ...
Didn't help in our experiments. Might depend on the driver version, though.
Anyway, I built and issued a new application version (1.19 Linux FGRPopencl-Beta-nvidia). This has two additional parameters that can be set on the command line: --sleepTimeKern and --sleepTimeFFT. These put the CPU to sleep for as many microseconds as the respective argument specifies, separately for the execution of the FFT and of the application's own kernels. The values to use depend highly on your CPU and GPU. They need to be small enough not to hold up the application, but large enough to reduce the CPU utilization. You may start by adding "<cmdline> --sleepTimeKern 2000 --sleepTimeFFT 10000 </cmdline>" to your app_config.xml, then play around with it.
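In case it helps, the <cmdline> goes inside an <app_version> block in app_config.xml, roughly like the sketch below (the app_name and plan_class shown are assumptions taken from the version label above; check client_state.xml for the exact strings on your host):

<app_config>
  <app_version>
    <app_name>hsgamma_FGRPB1G</app_name>
    <plan_class>FGRPopencl-Beta-nvidia</plan_class>
    <!-- starting values from above; tune for your own CPU/GPU -->
    <cmdline>--sleepTimeKern 2000 --sleepTimeFFT 10000</cmdline>
  </app_version>
</app_config>

After editing the file, use the BOINC Manager's "Read config files" option or restart the client for the change to take effect.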
BM
There is now such an app version 1.19 for NVidia on Windows as well.
I don't really have a chance to test this myself, so I'm not sure it works as it should.
BM
I'm publishing 1.20 for Windows and Linux (FGRPopencl-Beta-nvidia). This features some automatic measurement and adjustment of the optimal sleep time. Just add "<cmdline> --sleepTimeFFT -20 </cmdline>" to your app_config.xml.
BM