GPU Tasks using a lot of CPU

Prototype
Joined: 9 Feb 05
Posts: 2
Credit: 320355
RAC: 0
Topic 198607

While running GPU tasks on my system, I was surprised to find they were taking huge amounts of time to run, especially compared to other, less powerful machines crunching the same units, or to units run solely on CPU power.

I've been getting units taking way over 24 hours to crunch when other people have been doing them in less than 6 hours.

The problem seems to be how much CPU power E@H GPU tasks require. Most other projects (that I've tried) that crunch on the GPU barely affect (or are affected by) whatever else is running.

If I set CPU usage to 100%, I get 8 CPU tasks running fine and E@H GPU tasks taking 24+ hours.
If I set CPU usage to <100%, I get 7 CPU tasks running fine and E@H GPU tasks taking <6 hours.

Is there any other way of making E@H GPU tasks "play nice" with other projects?
I just want to make full use of all 8 (logical) cores when possible, instead of only 7 plus whatever fraction the E@H (or other) GPU project needs. Otherwise it seems like an almost total waste of a core, especially with projects like SETI@home, which need almost no CPU power per GPU task.

System spec.
Intel i7-4790k
16GB RAM
RADEON HD-7970
Windows 10 Pro

archae86
Joined: 6 Dec 05
Posts: 3146
Credit: 7060514931
RAC: 1154941

GPU Tasks using a lot of CPU

You don't directly mention whether your 24-hour vs 6-hour comparisons involve a situation in which you are running more than one GPU task at the same time, with at least one task from Einstein and at least one from another project.

I surmise the answer is "yes". When multiple tasks are running "simultaneously" on a GPU, it is more nearly true that they are running in rapid sequence, with very rapid swapping from one to the next facilitated by the massive on-chip task state storage of the GPU. When the currently active task needs an external resource, control will pass to another active task which is ready to run.

If you pair two tasks, one of which has this type of "break for resource" much more often than the other, then the one which asks less often will get a higher fraction of the GPU resource.

If this point interests you, you can investigate the behavior of matched pairs (or triplets, or quads...) by suspending all available tasks of another sort.

Many of us here restrict BOINC CPU tasks to no more than the available CPU count minus one, and some of us to far less.
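For anyone who prefers a file-based approach over the Manager UI, a rough sketch of that restriction (the percentage here is illustrative; tune it to your own machine): BOINC's local preferences override file, global_prefs_override.xml in the BOINC data directory, has a max_ncpus_pct element corresponding to the "Use at most X% of the CPUs" preference.

```xml
<!-- global_prefs_override.xml (in the BOINC data directory).
     Illustrative sketch: 87.5% of 8 logical cores = 7 cores
     for CPU tasks, leaving one core free to feed the GPU. -->
<global_preferences>
    <max_ncpus_pct>87.5</max_ncpus_pct>
</global_preferences>
```

After saving the file, use Options -> Read local prefs file in BOINC Manager (or restart the client) for it to take effect.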

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110034220040
RAC: 22396279

RE: ...running GPU tasks

Quote:
...running GPU tasks ... ... way over 24 hours to crunch ...


I'm not sure where you get that figure from. Here is a list of your BRP6 GPU tasks. The longest-running one so far took about 19 hours, but all the rest have completed much faster - two of them in well under 2 hours.

Unfortunately, the problem is that high-end AMD GPUs like the 7970 will do exactly what you see - run atrociously if you try to fully load all CPU cores. The problem is compounded if they are HT cores rather than real ones. Also, there is no hard and fast 'rule' as to how many cores should not be loaded with a CPU task; it varies with lots of factors. The only way to optimise things is to experiment with different numbers of concurrent GPU tasks and with how many CPU cores you 'free up' from running CPU tasks.

I bought a 7950 quite cheaply on ebay recently. I'm still experimenting but at the moment it's running 6 concurrent GPU tasks and 2 CPU tasks with 4 CPU cores not crunching. The CPU is an AMD FX-6300 hexa-core. The GPU is completing 6 tasks every ~4.6hrs - ~46 mins per task. Here is a link to the recently validated BRP6 tasks that this GPU has completed. Notice how relatively uniform the crunch times are in that list. They should be pretty uniform if everything is working efficiently.

I know this doesn't really help you much, and, from your full list of projects and where Einstein is on that list, I assume Einstein isn't very high on your agenda of favourite projects to support :-). Unfortunately, when you try to support multiple projects on the one machine, it can be very hard for all the projects to coexist efficiently. There are bound to be some interactions that cause problems. In some ways, you may get better efficiency if you run A, B, C this month and D, E, F next month, (or something like that) rather than all concurrently.

I notice you joined this project the very same day I did over 11 years ago, so welcome back!! You've rejoined at a quite exciting time, with the availability of advanced LIGO data and the recent confirmation (via the black hole merger event) that gravitational waves really do exist as predicted. The prospect of detecting continuous emissions from massive objects like rapidly spinning pulsars is quite appealing.

While the advanced LIGO detectors were undergoing their upgrade, E@H was kept busy searching for pulsars. I don't know off hand exactly how many previously undiscovered radio pulsars have been found by sifting through data from radio telescopes but it's probably of the order of 50 or more. There are links to the precise details on the front page of this site so you can easily find out. Your GPU is eminently suited to the BRP6 search so I hope you will continue to do those tasks. Your GPU should be faster than mine and so should be able to better the performance you can see in the above link.

Please ask specific questions if you need more information. All the best with whatever you decide to do.

Cheers,
Gary.

Engagex BOINC-SETI
Joined: 7 Oct 16
Posts: 5
Credit: 1062792
RAC: 0

I'm finding I have the same

I'm finding I have the same issue. BOINC doesn't seem to let me specify the number of GPUs and CPUs in app_config.xml for E@H, though. It always uses 1 whole CPU core and 1 whole GPU, and still takes about 20 hours per E@H task. My GeForce can crank out most projects' tasks in about 45 minutes; some take about 2-3, maybe 4-5 hours.

Here is my machine and stats:

Total credit: 121,138
Average credit: 1,160.05
Cross project credit:
CPU type: AuthenticAMD AMD Phenom(tm) II X4 B95 Processor [Family 16 Model 4 Stepping 2]
Number of processors: 4
Coprocessors: NVIDIA GeForce GT 420 (1983MB) driver: 367.57
Operating system: Linux 4.8.0-39-generic
BOINC client version: 7.6.33
Memory: 7981.77 MiB
Cache: 512 KiB
Swap space: 15255.99 MiB
Total disk space: 443.43 GiB
Free disk space: 414.19 GiB
Measured floating point speed: 3272.48 million ops/sec
Measured integer speed: 94562.91 million ops/sec
Average upload rate: 131.5 KiB/sec
Average download rate: 12480.19 KiB/sec
Average turnaround time: 2.09 days

I know this is almost a year old but if you have any insight please reply.
archae86
Joined: 6 Dec 05
Posts: 3146
Credit: 7060514931
RAC: 1154941

Engagex BOINC-SETI

Engagex BOINC-SETI wrote:
Coprocessors:NVIDIA GeForce GT 420 (1983MB) driver: 367.57
if you have any insight please reply.


It is always tricky comparing GPU units, but here is a web site comparison of your GT 420 to the GTX 750 Ti, which is well regarded as an economy unit of relatively recent vintage, with still almost competitive performance for the price and power consumption per unit of productivity.

Your 420 on those comparisons looks likely to be an order of magnitude slower than the GTX 750 Ti, or perhaps worse. It is also practically certain to be greatly inferior to modern cards on power consumption vs. productivity, and thus not cheap even if the acquisition cost is zero.

So it is not obvious to me that your reported results suggest you have a problem readily fixable by any means other than procuring a more up-to-date and competitive graphics card.  Possibly you might consider a GTX 1050, which is very modern, pretty low-priced, and very power efficient. 
Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110034220040
RAC: 22396279

Engagex BOINC-SETI wrote:I'm

Engagex BOINC-SETI wrote:
I'm finding I have the same issue. BOINC doesn't seem to let me specify the number of GPUs and CPUs in app_config.xml for E@H, though. It always uses 1 whole CPU core and 1 whole GPU, and still takes about 20 hours per E@H task. My GeForce can crank out most projects' tasks in about 45 minutes; some take about 2-3, maybe 4-5 hours.

You don't really have the same issue.  This project is a lot different now from what it was then.

I agree with the advice given by archae86.  Unfortunately, the card you have is quite unsuited to the current series of GPU tasks at Einstein.

A year ago, there was no FGRPB1G GPU OpenCL-based app.  For your GPU, we were crunching Radio Pulsar data (BRP6 and/or BRP4G) with a CUDA app which was very efficient in its use of CPU support.  NVIDIA's implementation of OpenCL (there is currently no CUDA app) is such that each of the current GPU tasks requires the support of a full CPU core.  There is no easy way to avoid this.  I have GTX 650s that I've shut down for the moment because the tasks take so long to crunch (around 3.5 hrs) and make the machine quite sluggish.  Your times are even longer (the ones currently in your list took around 8-9 hours), but a GT 420 is a much less capable GPU than a GTX 650, so I'm not at all surprised by your times.

A GTX 1050 would be a very decent upgrade.  It still requires the support of a full CPU core per GPU task.  I'm trying out some AMD RX 460s that do two tasks each in about 36-38 mins.  They are quite economical with power and were cheaper for me to purchase than 1050s.

Cheers,
Gary.

Engagex BOINC-SETI
Joined: 7 Oct 16
Posts: 5
Credit: 1062792
RAC: 0

I finally got it to work!

I finally got it to work! When done the total should be about 8.75 hours per half (which is slightly better than what I've been doing): https://einsteinathome.org/workunit/276348872

 

<app_config>
    <app>
        <name>hsgamma_FGRPB1G</name>
        <max_concurrent>1</max_concurrent>
        <gpu_versions>
            <gpu_usage>.5</gpu_usage>
            <cpu_usage>.5</cpu_usage>
        </gpu_versions>
    </app>

    <app>
        <name>hsgamma_FGRPB1</name>
        <max_concurrent>1</max_concurrent>
        <gpu_versions>
            <gpu_usage>.5</gpu_usage>
            <cpu_usage>.5</cpu_usage>
        </gpu_versions>
    </app>

    <app>
        <name>einsteinbinary_BRP4</name>
<max_concurrent>1</max_concurrent>
        <gpu_versions>
            <gpu_usage>.5</gpu_usage>
            <cpu_usage>.5</cpu_usage>
        </gpu_versions>
    </app>

    <app>
        <name>einsteinbinary_BRP5</name>
<max_concurrent>1</max_concurrent>
        <gpu_versions>
            <gpu_usage>.5</gpu_usage>
            <cpu_usage>.5</cpu_usage>
        </gpu_versions>
    </app>

    <app>
        <name>einsteinbinary_BRP5G</name>
<max_concurrent>1</max_concurrent>
        <gpu_versions>
            <gpu_usage>.5</gpu_usage>
            <cpu_usage>.5</cpu_usage>
        </gpu_versions>
    </app>
</app_config>

I still need to add in the <avg_ncpus> and <ngpus> to the app_version portion of the file mainly for the CPU only apps, as well as <project_max_concurrent>

I only paid $40 for the GT 420 (a little much, but not bad for hardly doing any research first). I actually bought a Quadro FX 1400 for $12 first, not knowing I needed 256 MB of VRAM and CUDA cores. :(

I just bought a GT 710 for $25 to help out the team (machine), so we'll see how that goes. :D

It's a dedicated machine. I just overclocked the FSB to its max, from 200 MHz to 300, and PCIe from 100 to 145, and it's running cooler than ever: 135-138 °F average under 100% load.

Engagex BOINC-SETI
Joined: 7 Oct 16
Posts: 5
Credit: 1062792
RAC: 0

How would I go about

How would I go about specifying this setting in the config files?:
"Run CPU versions of applications for which GPU versions are available"

I ask because I have computers on my grid that only crunch CPU tasks. Or would it be better all around to say no?

Also are CPU versions of apps any different than the GPU counterparts? Will the GPU versions return any additional data?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110034220040
RAC: 22396279

Engagex BOINC-SETI wrote:I

Engagex BOINC-SETI wrote:
I finally got it to work! When done the total should be about 8.75 hours per half (which is slightly better than what I've been doing): https://einsteinathome.org/workunit/276348872

I don't understand what you mean by "8.75 hours per half".  Ignoring the compute errors for the moment, your last 5 successfully completed tasks (most recent first) have taken 11.23, 11.15, 11.13, 8.73, 8.41 hours respectively.  The last two were returned sufficiently long ago to have been done before you started using app_config.xml.  The first three would appear to show the results of using app_config.xml - a rather worse performance.

I've fixed the formatting in the full example you posted so we can talk about it more easily.  You have a whole bunch of irrelevant sections in what you are using so you should get rid of most of them.  For example, at the moment there are no GPU tasks for things like BRP4, BRP5, BRP5G so you need to remove those three.  FGRPB1 is for CPUs only so has no place in an app_config.xml file for the purpose of controlling GPU tasks.  The FGRPB1G entry is the only one you should even be considering.
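Pared down to just that entry, a minimal app_config.xml along those lines might look like the sketch below (the .5 values simply follow the file under discussion; whether two concurrent GPU tasks actually help depends on the card):

```xml
<!-- Minimal app_config.xml keeping only the FGRPB1G (GPU) entry.
     gpu_usage .5 lets BOINC schedule two tasks per GPU;
     cpu_usage only affects BOINC's scheduling bookkeeping,
     not how much CPU the app actually uses. -->
<app_config>
    <app>
        <name>hsgamma_FGRPB1G</name>
        <gpu_versions>
            <gpu_usage>.5</gpu_usage>
            <cpu_usage>.5</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
```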

Engagex BOINC-SETI wrote:

<app_config>   
    <app>
        <name>hsgamma_FGRPB1G</name>
        <max_concurrent>1</max_concurrent>
        <gpu_versions>
            <gpu_usage>.5</gpu_usage>
            <cpu_usage>.5</cpu_usage>
        </gpu_versions>
    </app>

    <app>
        <name>hsgamma_FGRPB1</name>
        <max_concurrent>1</max_concurrent>
        <gpu_versions>
            <gpu_usage>.5</gpu_usage>
            <cpu_usage>.5</cpu_usage>
        </gpu_versions>
    </app>

    <app>
        <name>einsteinbinary_BRP4</name>
        <max_concurrent>1</max_concurrent>
        <gpu_versions>
            <gpu_usage>.5</gpu_usage>
            <cpu_usage>.5</cpu_usage>
        </gpu_versions>
    </app>

    <app>
        <name>einsteinbinary_BRP5</name>
        <max_concurrent>1</max_concurrent>
        <gpu_versions>
            <gpu_usage>.5</gpu_usage>
            <cpu_usage>.5</cpu_usage>
        </gpu_versions>
    </app>

    <app>
       <name>einsteinbinary_BRP5G</name>
       <max_concurrent>1</max_concurrent>
       <gpu_versions>
          <gpu_usage>.5</gpu_usage>
          <cpu_usage>.5</cpu_usage>
       </gpu_versions>
     </app>
</app_config>

You've said nothing about other projects you support but I see you do support a whole bunch of them.  Are you trying to run GPU tasks from other projects concurrently with FGRPB1G tasks from here?  If you are, please be aware that at any instant, only one of the tasks will be executing and that the full GPU will be switching from one to the other rather than each one having access to just a fraction of a GPU.  You may well get poorer performance by using app_config.xml rather than just letting BOINC control things and give each project access to the full GPU according to your resource share preferences.

Engagex BOINC-SETI wrote:
I still need to add in the <avg_ncpus> and <ngpus> to the app_version portion of the file mainly for the CPU only apps, as well as <project_max_concurrent>

I don't believe you need to do any of these things.  Perhaps if you spell out in a lot more detail what you wish to achieve, including details about other projects that need to coexist on this particular machine, we will be able to help you achieve that outcome in the most efficient manner and without all the compute errors you currently seem to be getting.

Engagex BOINC-SETI wrote:
It's a dedicated machine that I just overclocked FSB to max from 200Mhz to 300 and PCIe from 100 to 145 and it's running cooler than ever 135-138F avg under 100% load.

If you have just dramatically increased the overclocking, the power used and the heat output will have risen. It can't be running "cooler than ever" unless it's not doing as much work or you have dramatically improved the cooling efficiency in some way. You don't mention anything about improving cooling, so are you really sure it's still at 100% load?

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110034220040
RAC: 22396279

Engagex BOINC-SETI wrote:How

Engagex BOINC-SETI wrote:
How would I go about specifying this setting in the config files?:
"Run CPU versions of applications for which GPU versions are available"

You don't specify the setting in app_config.xml if that's the config file you are talking about.  If you have a host with a working GPU, that setting (in your website project prefs) can control whether or not the host crunches CPU tasks of the same type as well as the GPU tasks.  I don't believe it has any effect if the host doesn't have a working GPU (but I could be wrong).  If you wanted to have different answers for different machines, use different 'locations' (generic, home, work, school) and put the 'Yes' ones and the 'No' ones in two different locations.

Engagex BOINC-SETI wrote:

Also are CPU versions of apps any different than the GPU counterparts? Will the GPU versions return any additional data?

The two apps do the same analysis.  To make GPU tasks take a bit longer on modern GPUs, a GPU task contains 5 times the work content of a CPU task.  The data comes from the same source, the Large Area Telescope (LAT) on board the Fermi satellite; it's just the packaging of the data that's different.

Cheers,
Gary.

Engagex BOINC-SETI
Joined: 7 Oct 16
Posts: 5
Credit: 1062792
RAC: 0

Thank you for your thorough

Thank you for your thorough reply!

I am signed up for all projects that my computer can participate in. ;)

Since it's an AMD, I was trying to simulate a "hyperthreaded" environment with the app_config.xml files, especially for my GPU. I temporarily had a computer with an AMD A6-3620 APU, and it automatically ran 2 GPU tasks at something like .025 or .05 CPUs per GPU for most tasks.

Since you say it will be more efficient, and given what I've read about the client over the weekend, I've removed all my app_configs.

Why make the GPU task take longer? That seems like the same bullshit that bitcoin does, not to mention bass ackwards. Unless it's to create a bottleneck so the CPU can catch up.

I had to remove all my overclocks. :( It wasn't running as stable as it was the first two days. I guess I did add a fan in front of my HDDs...

On that note, how much extra computing power would I gain from some PC3-1600 DIMMs with CL7 timings, versus the PC3-1600 CL11 DIMMs I have now, which I underclock to 1066 to get better overall speed from the tighter timings?

I haven't received my PCIe x1-to-x16 adapter yet, so I don't have my GT 710 running yet. I'm also planning on getting an AMD HD 5450, depending on how much room I have after the adapter (I'll have to use another one), only because I've read that AMD GPUs can be faster, especially with some projects. Would that still be the case even though the 710 is better on paper? http://gpuboss.com/gpus/Radeon-HD-5450-vs-GeForce-GT-710
