I am having trouble setting the number of tasks on my GPU

Cherokee150
Cherokee150
Joined: 13 May 11
Posts: 24
Credit: 885939975
RAC: 323507
Topic 221958

I have been testing how to optimize my computer's processing performance on Einstein tasks.  I am having a problem with my computer 4213062.

The computer is:

Intel Core 2 Quad CPU Q8400 @ 2.66GHz [Family 6 Model 23 Stepping 10]
RAM:  8190 MB
GPU: one NVIDIA GeForce GTX 1070 (4095MB) driver: 390.65
Note: Einstein shows the GPU has 4095 MB, but it is supposed to have 8192 MB, and GPU-Z shows 8192 MB.
Windows 10 Professional x64 Edition, (10.00.18362.00)
BOINC Ver: 7.14.2

I have a problem using app_config.xml to change the number of simultaneous tasks on the GPU.
It works fine if I try to run one or two tasks at a time, but attempts to run three or more at once only run two tasks simultaneously.

I do save the changes to my app_config.xml and restart BOINC each time I make a change.

Here is an example of the code I am using to attempt to run two O2MDF tasks or three FGRP tasks.


<app_config>
    <app>
        <name>hsgamma_FGRPB1G</name>
        <gpu_versions>
            <gpu_usage>0.33</gpu_usage>
            <cpu_usage>1</cpu_usage>
        </gpu_versions>
    </app>
    <app>
        <name>einstein_O2MDF</name>
        <gpu_versions>
            <gpu_usage>0.5</gpu_usage>
            <cpu_usage>1</cpu_usage>
        </gpu_versions>
    </app>
</app_config>


Am I doing something wrong?  If so, what should I change?

Also, are there any other app names I should add to ensure I am running all of Einstein's task types?

Thanks!

San-Fernando-Valley
San-Fernando-Valley
Joined: 16 Mar 16
Posts: 411
Credit: 10239863455
RAC: 19932294

Hoping to have understood

Hoping to have understood your question ...

This is how I work it:

<app_config>
    <project_max_concurrent>8</project_max_concurrent>
    <app>
        <name>hsgamma_FGRPB1G</name>
        <max_concurrent>2</max_concurrent>
        <gpu_versions>
            <gpu_usage>1</gpu_usage>
            <cpu_usage>1</cpu_usage>
        </gpu_versions>
    </app>
    <app>
        <name>einstein_O2MDF</name>
        <max_concurrent>1</max_concurrent>
        <gpu_versions>
            <gpu_usage>1</gpu_usage>
            <cpu_usage>1</cpu_usage>
        </gpu_versions>
    </app>
</app_config>


The above lets me run a maximum of 8 WUs, consisting of:

   max 2 FGRPB1G gpu WUs

   max 1 O2MDF gpu WU

   plus cpu WUs: the limit of 8 minus the 3 gpu WUs leaves a max of 5 cpu WUs

The max_concurrent part might be of interest to you.

I would also update the NVIDIA driver, perhaps to 445.87; yours is very old.

Cherokee150
Cherokee150
Joined: 13 May 11
Posts: 24
Credit: 885939975
RAC: 323507

Thank you, San Fernando. I

Thank you, San Fernando.

I tried your suggestions.  Unfortunately, they didn't work on my computer.  Nothing would run.

I think I didn't do a good job of explaining what I have, and what I am trying to do.

The computer in question has one processor with four CPU cores built into the chip - an old Intel Core 2 Quad.  I have only one GPU, the GTX 1070 I mentioned.  At this time I wish to run only GPU tasks.

The app_config file I pasted into my original post will only run two tasks at a time on the one GPU, regardless of whether they are two gamma-ray tasks, two gravitational-wave tasks, or one of each.  I would like to be able to run up to three tasks at a time on my one GPU, but for some reason I can't get that third task to run.

This approach worked fine in SETI: I could run up to four tasks simultaneously on my one GPU by setting the GPU usage to 0.25 and the CPU usage to 1.  In Einstein, setting both the GPU and CPU usage to 1 allows one GPU task to run, and setting the GPU usage to 0.5 and the CPU usage to 1 allows two Einstein GPU tasks to run simultaneously.  However, setting the GPU usage to 0.33 or smaller with the CPU usage at 1 does not allow any additional Einstein tasks to run on my one GPU.

That is my problem.

Is it possible to get that one GPU to run three or more Einstein GPU tasks simultaneously, and, if it is possible, how do I set it up?

San-Fernando-Valley
San-Fernando-Valley
Joined: 16 Mar 16
Posts: 411
Credit: 10239863455
RAC: 19932294

Well, I guess I misunderstood

Well, I guess I misunderstood you.

Sorry.

Have you checked the Event Log and the stderr log of the tasks?

I guess you know about these.

Did you update your gpu driver?

One gpu WU uses about 900 MB of RAM on my GTX 1650.

So running three might be too much if the system thinks you only have a 4000 MB gpu card.

If it is a RAM problem, you should be able to see the appropriate message in the stderr log.

That is about all I can help you with.  Good luck!

Zalster
Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

Cherokee150 wrote:Thank you,

Cherokee150 wrote:

Thank you, San Fernando.

I tried your suggestions.  Unfortunately, they didn't work on my computer.  Nothing would run.

I think I didn't do a good job of explaining what I have, and what I am trying to do.

The computer in question has one processor with four CPU cores built into the chip - an old Intel Core 2 Quad.  I have only one GPU, the GTX 1070 I mentioned.  At this time I wish to run only GPU tasks.

The app_config file I pasted into my original post will only run two tasks at a time on the one GPU, regardless of whether they are two gamma-ray tasks, two gravitational-wave tasks, or one of each.  I would like to be able to run up to three tasks at a time on my one GPU, but for some reason I can't get that third task to run.

This approach worked fine in SETI: I could run up to four tasks simultaneously on my one GPU by setting the GPU usage to 0.25 and the CPU usage to 1.  In Einstein, setting both the GPU and CPU usage to 1 allows one GPU task to run, and setting the GPU usage to 0.5 and the CPU usage to 1 allows two Einstein GPU tasks to run simultaneously.  However, setting the GPU usage to 0.33 or smaller with the CPU usage at 1 does not allow any additional Einstein tasks to run on my one GPU.

That is my problem.

Is it possible to get that one GPU to run three or more Einstein GPU tasks simultaneously, and, if it is possible, how do I set it up?


What were your times with 1 WU on the GPU vs 2 WUs on the GPU?  Was there any significant gain in throughput, or did each task simply take twice as long?  What was your GPU utilization with 2 work units?  I'm trying to remember, but I don't think there was any benefit to running more than 1 WU per GPU.  Unlike SETI, the OpenCL apps here are rooted in fairly old programming.  SETI had the benefit of Petri's special CUDA app and Raistmer's OpenCL app, both of which had been highly modified over the years to run leaner and quicker.

Z

Gary Roberts
Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117766795363
RAC: 34804096

Cherokee150 wrote:I have been

Cherokee150 wrote:
I have been testing how to optimize my computer's processing performance on Einstein tasks.  I am having a problem with my computer 4213062.

Thanks for documenting everything so well. I don't use nvidia GPUs at all but I do have lots of experience with the GPU apps on a whole variety of AMD GPUs.  Also, since I have about 20 machines with Q8400 CPUs, all with discrete GPUs, I have some experience that might be relevant to your CPU.

There is nothing wrong with the syntax of your app_config.xml file.  However, trying to run 3 GRP tasks concurrently is most likely to be of no benefit to you.  The GRP app puts enough load on the GPU that running just 2 of these pretty much will be the best you can do.  I get about a 7-10% improvement in throughput from running 2x.  When I try to run 3x, there is no gain in throughput and crunch times suddenly become variable.  I interpret this as some resources being over-committed.  Tasks have to wait for resources to be freed.  Sometimes tasks even crash.

In earlier times, there was a very nice benefit for me in running GW GPU tasks at 2x, 3x, and even 4x.  This is because the algorithm is apparently not amenable to running everything in parallel.  There are sections of code that need the sequential processing capabilities of a CPU and would probably run much more slowly if forced onto a GPU.  By running multiple concurrent tasks, with enough CPU cores available to handle the CPU-only bits, there was considerable extra throughput to be gained.

The biggest problem with that was that the CPU architecture turned out to be extremely important.  I spent a lot of time testing older architectures like the Q6600 and Q8400 quads and found that both crunch times and error rates were much higher with the older CPUs than with the same GPU supported by a more modern CPU.  Admittedly, that was in much earlier times when there were more bugs in the app, so I don't really know if that has since changed.  What I do know is that the most recent GW GPU tasks seem to be a lot more memory hungry, and people are now reporting problems running even single tasks on 3GB nvidia GPUs.  I'm still running 3x on 4GB AMD RX 570 GPUs, but I'm about to reduce to 2x to make sure I avoid problems with the very latest tasks.

Cherokee150 wrote:
Note: Einstein shows the GPU has 4095 MB, but it is supposed to have 8192 MB, and GPU-Z shows 8192 MB.

BOINC detects your hardware during startup.  Check your event log to see what BOINC says about your GPU.  If BOINC is mis-detecting, you could try upgrading to the newly released 7.16.5 to see if that fixes things.  If not, perhaps report it on the BOINC boards.  Einstein just uses what BOINC says.

If BOINC is already detecting the 8192 MB, let us know and I'll report it to the Einstein staff to see why it's being listed incorrectly.  Your GPU should be able to handle 2x.  Because of the nature of the app, that should give a reasonable improvement.  You still have the problem of CPU architecture, though.

Cherokee150 wrote:
I have a problem using app_config.xml to change the number of simultaneous tasks on the GPU.
It works fine if I try to run one or two tasks at a time, but attempts to run three or more at once only run two tasks simultaneously.

I don't know why that would be.  The only thing I can think of is the setting for how many cores BOINC is allowed to use.  You don't have that set to 50% by any chance?
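If that limit is what's biting you, either raise the '% of CPUs' setting, or reserve less than a full core per GPU task so that three tasks fit within the cores BOINC is allowed to use.  Something along these lines (just a sketch - the values are yours to tune, and I haven't tested this on nvidia hardware) would budget only half a core per GRP task:

<app_config>
    <app>
        <name>hsgamma_FGRPB1G</name>
        <gpu_versions>
            <gpu_usage>0.33</gpu_usage>
            <!-- 3 tasks x 0.5 = 1.5 cores budgeted, instead of 3 full cores -->
            <cpu_usage>0.5</cpu_usage>
        </gpu_versions>
    </app>
</app_config>

Just be aware that the GRP app really does use a fair amount of CPU time to feed the GPU, so under-reserving can slow the GPU tasks down even if it lets the third one start.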

Another point here that you need to consider is the way nvidia implements OpenCL in its drivers.  nvidia derives a lot of revenue from professional series cards running CUDA for compute.  Whilst it supports OpenCL, it stands to reason that it's not going to damage its revenue stream by allowing OpenCL to compete on a level playing field.  You should draw your own conclusions about that; I'm not technically qualified to comment in detail.  I will note that nvidia GPUs consistently require more than a full CPU core to support GW GPU tasks, and this is quite different from the way AMD GPUs behave.

Cherokee150 wrote:

Am I doing something wrong?  If so, what should I change?

Also, are there any other app names I should add to ensure I am running all of Einstein's task types?

No, you're not doing anything 'wrong' but I'll make some suggestions about some 'difficulties' you might like to avoid.

For your hardware (both machines) there are just the two GPU searches you are currently discussing.  The GRP search has estimates that are significantly longer than the true crunch time; it's the exact opposite for the GW search.  Einstein needs to use Locality Scheduling to handle GW tasks (both CPU and GPU), since that is its main 'business' - the first detection of continuous GW.  To support Locality Scheduling it runs its own highly customised version of the BOINC server code, complete with the older DCF (duration correction factor) mechanism for adjusting task estimates.

Since there is a single DCF for all searches, badly calibrated task estimates can play havoc with large work caches when alternating types of tasks try to impose radically different 'corrections' to the estimates.  This is not really a problem at all if you keep the cache size small and ignore the incorrect estimates.  BOINC will be able to manage if you don't exceed something like 0.5 - 1.0 days of cache size.

A better way (which could work for you) is to run just one search on each host.  To use different preferences for each host you would need to use two separate 'locations'.  For example you could put one in 'home' and one in 'work' and set the preferences for each separately.

Each host would have vastly different DCF values stored in the state file but the estimates would be correct and the work on hand would agree with what the cache size was set to.  You have 2 machines.  You could use the one with the GTX 960 to run just the GRP tasks - you could try 2x - and the one with the 1070 to run just the GW tasks.  2x should be fine on that.  You might also get a benefit from 3x.  The only way to find out is to try :-).

Please ask further questions if there is anything that's not clear.  I hope some of the above will be useful to you.

Cheers,
Gary.

Keith Myers
Keith Myers
Joined: 11 Feb 11
Posts: 4968
Credit: 18767098873
RAC: 7125166

Quote: BOINC detects your

Quote:

BOINC detects your hardware during startup.  Check your event log to see what BOINC says about your GPU.  If BOINC is mis-detecting, you could try upgrading to the newly released 7.16.5 to see if that fixes things.  If not, perhaps report it on the BOINC boards.  Einstein just uses what BOINC says.

In the case of Nvidia cards, BOINC is unable to correctly read or determine the amount of VRAM on the card because it is using a 32-bit API from Nvidia.  The maximum reported will always be 4096MB, the most you can enumerate with 32 bits.  You can't do anything about that until the BOINC developers correct the problem.

The GPUUG developers figured out where the BOINC code is wrong and developed our own client, which reads Nvidia card RAM capacities correctly.  We informed the BOINC developers what code changes need to be made to fix the issue.  It is just a simple matter of changing which API function is used to probe the card.  The client needs to be updated to incorporate the change.

You can look at my hosts and see that my card RAM capacities are read correctly.


Gary Roberts
Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117766795363
RAC: 34804096

Keith Myers wrote:You can

Keith Myers wrote:
You can look at my hosts and see that my card RAM capacities are read correctly.

OK, thanks very much for the explanation.  I don't have nvidia cards so I didn't realise that BOINC had that problem.

Have the BOINC Devs given any reason why they don't adopt the fix you mention?  Seems very strange that they wouldn't immediately adopt something handed to them on a plate.

Cheers,
Gary.

DanNeely
DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 0

Gary Roberts

Gary Roberts wrote:
Cherokee150 wrote:
I have been testing how to optimize my computer's processing performance on Einstein tasks.  I am having a problem with my computer 4213062.

Thanks for documenting everything so well. I don't use nvidia GPUs at all but I do have lots of experience with the GPU apps on a whole variety of AMD GPUs.  Also, since I have about 20 machines with Q8400 CPUs, all with discrete GPUs, I have some experience that might be relevant to your CPU.

There is nothing wrong with the syntax of your app_config.xml file.  However, trying to run 3 GRP tasks concurrently is most likely to be of no benefit to you.  The GRP app puts enough load on the GPU that running just 2 of these pretty much will be the best you can do.  I get about a 7-10% improvement in throughput from running 2x.  When I try to run 3x, there is no gain in throughput and crunch times suddenly become variable.  I interpret this as some resources being over-committed.  Tasks have to wait for resources to be freed.  Sometimes tasks even crash.


My most recent testing was a few years ago, but I saw a several-percent net RAC gain running 3x GRP tasks on 1070/1080 GPUs under Windows.  The gain was much smaller than the one from going from 1 to 2 tasks, because during normal operation 2 tasks were enough to fully max out the GPU.  However, each task had a startup (shutdown?) period that put minimal load on the GPU, and a single task in its main crunching phase couldn't max the card out.  That let 3x give a modest gain over 2x - modest, but still larger than what running another CPU task would have provided.

Keith Myers
Keith Myers
Joined: 11 Feb 11
Posts: 4968
Credit: 18767098873
RAC: 7125166

Gary Roberts wrote:Keith

Gary Roberts wrote:
Keith Myers wrote:
You can look at my hosts and see that my card RAM capacities are read correctly.

OK, thanks very much for the explanation.  I don't have nvidia cards so I didn't realise that BOINC had that problem.

Have the BOINC Devs given any reason why they don't adopt the fix you mention?  Seems very strange that they wouldn't immediately adopt something handed to them on a plate.

Haha, LOL.  You obviously have never interacted with the devs or followed the GitHub conversations.  They will take their time studying the issue IF they even move it to the top of their TODO list.  Then they have to pass the code through their validators and code profilers and finally submit the change for merge consensus.

This bug has existed since the beginning of GPU use, and they never considered it worthy of attention.  This was the comment from our GPUUG developer:

Quote:

Boinc coders had blamed Nvidia for the problem, but it really was a Boinc problem.

The library has the old 32-bit version in it to stay binary compatible with code compiled against the old version of the library.  Any new 64-bit code normally linked with the library would be compiled using 64-bit header files that add that _v2 to the symbol name resolved from the library when the code calls the function, and everything works correctly.

But Boinc isn't using the library with normal linking.  It wants to avoid a dependency on the Cuda development stuff, so it doesn't use any headers; it accesses the library 'the hard way', with the running code finding the library file, extracting individual symbol names from it and casting them to function pointers.  If you do 'manual' linking this way, then it is your responsibility to handle the different function versions too.  Boinc didn't do this, so the bug was entirely on their end.
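To make that concrete, here is a simplified sketch (my own illustration, not the actual BOINC source) of the kind of 'manual' linking he describes, showing why resolving the legacy symbol instead of the _v2 one caps the reported VRAM:

/* Simplified illustration of manual linking against libcuda.
 * The symbol names are the real CUDA driver API entry points;
 * everything else (types, error handling) is pared down.
 * Build with: gcc vram.c -ldl */
#include <dlfcn.h>
#include <stdio.h>
#include <stddef.h>

typedef int CUresult;   /* stand-ins for the real CUDA typedefs */
typedef int CUdevice;

int main(void) {
    void *lib = dlopen("libcuda.so", RTLD_NOW);
    if (!lib) return 1;

    CUresult (*cu_init)(unsigned int) =
        (CUresult (*)(unsigned int)) dlsym(lib, "cuInit");
    if (!cu_init || cu_init(0) != 0) return 1;

    /* The buggy way: the legacy symbol returns the size through an
     * unsigned int, so anything at or above 4 GiB can't be represented. */
    CUresult (*total_mem_old)(unsigned int *, CUdevice) =
        (CUresult (*)(unsigned int *, CUdevice))
            dlsym(lib, "cuDeviceTotalMem");

    /* The fix: ask for the _v2 symbol, which takes a size_t.  Code
     * compiled against cuda.h gets this automatically, because the
     * header #defines cuDeviceTotalMem to cuDeviceTotalMem_v2;
     * manual linking has to request it explicitly. */
    CUresult (*total_mem_v2)(size_t *, CUdevice) =
        (CUresult (*)(size_t *, CUdevice))
            dlsym(lib, "cuDeviceTotalMem_v2");

    size_t bytes = 0;
    if (total_mem_v2 && total_mem_v2(&bytes, 0) == 0)
        printf("VRAM: %zu MB\n", bytes / (1024 * 1024));

    (void) total_mem_old;  /* shown only for contrast */
    dlclose(lib);
    return 0;
}

The driver keeps both entry points exported for binary compatibility; the fix our developer made is essentially to look up the second one.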


mikey
mikey
Joined: 22 Jan 05
Posts: 12702
Credit: 1839105974
RAC: 3624

Keith Myers wrote: The

Keith Myers wrote:

The library has the old 32-bit version in it to stay binary compatible with code compiled against the old version of the library.  Any new 64-bit code normally linked with the library would be compiled using 64-bit header files that add that _v2 to the symbol name resolved from the library when the code calls the function, and everything works correctly.

Supposedly one of the upcoming versions will be 64-bit only and will stay that way from then on, with the current version still staying around to support 32-bit stuff.  On the email list I follow, they haven't said which version it will be or how long it will take.
