Running multiple tasks concurrently per GPU - How to revert to just one task.

Joseph Stateson
Joined: 7 May 07
Posts: 174
Credit: 3085308626
RAC: 651435
Topic 218701

I tried running 2 WUs per GPU but would like to go back to one per GPU.  Something is wrong; I cannot go back.  I set all my preferences to 1.0, even changed the venue to make sure the system saw the change, and even tried the following:

<app_config>
<app>
<name>hsgamma_FGRPB1G</name>
<max_concurrent>2</max_concurrent>
<gpu_versions>
<gpu_usage>1</gpu_usage>
<cpu_usage>1</cpu_usage>
</gpu_versions>
</app>
</app_config>

 

Not sure if hsgamma_FGRPB1G is correct, as I could not find it anywhere under applications, but Google found it.

This is a quad-core system and there are not enough cores to go around.  There are 3 GPUs and each task uses about 70% of a CPU (with 5 or 6 tasks running at times), and I only want 3 tasks, one on each GPU.  This project seems to ignore the max_concurrent and the cpu_usage.

I tried 1, 2 and 3 in max_concurrent; it didn't make a difference.

 

[EDIT] Had to detach and re-attach to fix this.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118229403187
RAC: 24422512


BeemerBiker wrote:
[EDIT] Had to detach and re-attach to fix this.

For the benefit of anyone else who has a similar problem and finds this 'solution', it really isn't necessary to do that.  There are better solutions.

There are two ways to set up for running more than one GPU task concurrently on a discrete GPU.  Both have advantages and disadvantages.  The important thing to understand is that they work in different ways and it's not a good idea to have a mixture of both unless you really do understand the properties of each and how to 'undo' a configuration if necessary.

First, the simplest method is to go to your account page and select your project preferences (account -> preferences -> project).  Scroll down to find the section on GPU Utilization Factor.  There are 3 different settings - BRP (Binary Radio Pulsar) apps (no longer available for discrete GPUs), FGRP (Fermi Gamma Ray Pulsar) apps (the current GPU app), and GW (Gravitational Wave) apps (tested recently but not yet available, possibly for quite a while).  Only change the factor for the type of app you wish to use.  As of now that would be FGRP.

The number you enter is the fraction of a GPU that a task will use.  To run 2 concurrent tasks, each needs 0.5 of a GPU.  Make sure of two things.  (1) You change a value that applies to the 'location' you have assigned your computer to (default, home, work, school) and (2) you 'save' any changes you make.  If you haven't set up to use locations, you won't need to worry about (1).

This setting cannot act immediately.  The ONLY way your computer will know about it is when it receives a new task with the new settings encoded.  Even if you click 'update' in BOINC Manager, your computer won't be told until it receives new work.  The same applies if you want to reverse the change at a later stage: your computer won't revert to the previous settings until new work is received that contains those previous settings.  This 'time delay' for changed settings to apply is one of the disadvantages of the simple method.  Another is that there is no ability to also configure the fraction of a CPU to be budgeted for GPU support duties.  Please note that "budgeting" is definitely not the same as the fraction of a CPU core that a task will actually use.

The second method involves you setting up a suitable app_config.xml file as documented here.  There are two particular advantages.  The changes are handled locally and don't require contact with the project.  In fact they will override any changes subsequently made on the website.  Local changes are applied immediately by just clicking 'reread config files' in BOINC Manager.  The second advantage is that you can customise both the fraction of a GPU and the fraction of a CPU that are budgeted for supporting a GPU task.
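To make that concrete, here is a sketch only (the 0.5 and 0.9 fractions below are illustrative choices, not project-mandated values) of a file that would budget two concurrent FGRP tasks per GPU, each at half a GPU and 0.9 of a CPU core:

<app_config>
    <app>
        <name>hsgamma_FGRPB1G</name>
        <gpu_versions>
            <gpu_usage>0.5</gpu_usage>
            <cpu_usage>0.9</cpu_usage>
        </gpu_versions>
    </app>
</app_config>

Save it and click 'reread config files' and it applies straight away, without waiting for new work from the project.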

When you set up an app_config.xml file, there are disadvantages you need to be aware of - quite apart from understanding the syntax of the file itself.  The contents of the file are installed inside the state file (client_state.xml) which is the major piece of infrastructure that allows BOINC to do its work.  When you change the config file locally, the changed values will also cause the values in the state file to be updated.  If you decide not to use a config file any more, the stuff in the state file doesn't get removed, even if you delete the config file.

To go back to the original default values, you need to edit the config file to contain those values and then 'reread config files'.  To stop having to edit this file when things change, you can do what the OP did, or you could reset the project, both of which are like using a sledge hammer to crack an egg.  It should also be possible to manually remove the insertions in the state file with the client stopped first.  Not recommended unless you really know what you are doing.

Because an app_config.xml file always overrides GPU utilization settings supplied by the project, the OP could have reverted to default settings with the example he showed.  The max_concurrent is not needed because it will never come into play when the cpu_usage and gpu_usage are both set to 1.  With default settings, there can only be a single GPU task per GPU.  Here is the file with proper formatting to make it easy to see what includes what.

<app_config>
    <app>
        <name>hsgamma_FGRPB1G</name>
        <gpu_versions>
            <gpu_usage>1</gpu_usage>
            <cpu_usage>1</cpu_usage>
        </gpu_versions>
    </app>
</app_config>

One question often asked about these files is, "Where do I find the correct <name> to use?"  The documentation linked above shows a couple of different ways to find that.

If the OP had installed the file he showed in the proper place and clicked the 'reread config files' option, the crunching of GPU tasks should have reverted to the default of 1 task per discrete GPU.  If that didn't happen, there must have been something wrong with the file, or perhaps it was in the wrong place.  When clicking 'reread config files', always look in the event log to confirm that the file was found and that no syntax errors were reported.

 

Cheers,
Gary.

Mad_Max
Joined: 2 Jan 10
Posts: 154
Credit: 2226732709
RAC: 535268


The <max_concurrent>N</max_concurrent> option affects only the total number of tasks running on the client - per computer, not per GPU.
It also applies to only one app, so if a project has several different GPU apps (as E@H now does), it can still run more tasks from that one project.
<gpu_usage>x</gpu_usage>, on the other hand, works on a per-GPU basis.

That makes a difference on multi-GPU setups (see the example config below).

"hsgamma_FGRPB1G" is the correct app name for the Gamma Ray Binary Search, and "einstein_O1OD1E" is the short name for the current Gravitational Wave engineering run on GPUs to use in the config.

Both options work fine with E@H.
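As an illustration only (the numbers are made up, not a recommendation), here is a file combining both options. On a 3-GPU host, the gpu_usage of 0.5 would allow up to 2 tasks on each GPU (6 in total), while the max_concurrent of 4 would cap the app at 4 running tasks across the whole computer:

<app_config>
    <app>
        <name>hsgamma_FGRPB1G</name>
        <max_concurrent>4</max_concurrent>
        <gpu_versions>
            <gpu_usage>0.5</gpu_usage>
            <cpu_usage>0.5</cpu_usage>
        </gpu_versions>
    </app>
</app_config>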

You just need to make sure the config file has the correct name (app_config.xml) and is in the correct place (the \projects\einstein.phys.uwm.edu\ folder inside the BOINC data folder),
then apply the settings with the "reread config files" command from the menu, or just restart the BOINC client.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118229403187
RAC: 24422512


Mad_Max wrote:
That makes a difference on multi-GPU setups.

That may well be, but the purpose for creating and pinning my response wasn't for addressing multi-GPU cases.  The OP had already 'solved' his problem.  I took advantage of the opportunity to point out the details of the different methods for having a single GPU crunch more than one task concurrently.

More importantly, I wanted to draw attention to the methods for removing the concurrent tasks capability if there was a change of heart.  I was just trying to provide basic information so that ordinary users could assess the pros and cons and choose the best option for their circumstances.

I felt it best to avoid the complexities of multiple GPUs, multiple projects and multiple apps for the one project.  That tends to be the domain of power users who will need to work out the details for their own particular requirements.  Any such setup is likely to require quite a bit of 'trial and error' before it works the way the user wants.

Mad_Max wrote:

"hsgamma_FGRPB1G" is the correct app name for the Gamma Ray Binary Search, and "einstein_O1OD1E" is the short name for the current Gravitational Wave engineering run on GPUs to use in the config.

Both options work fine with E@H.

With Bernd's recent announcement, the engineering run is about to finish.  Already generated tasks will be sent out but no new work units will be produced.  Eventually, there will be an "Injection" run but I imagine it will have a different short name.  There was no time frame mentioned so we have no clue as to when this could start.  It may be a while.  For the time being, the only thing that will remain for GPUs is the FGRPB1G short name since there won't be a GPU app for O2AS when it restarts.

I mentioned that suggestions for finding the short names are given in the documentation.  Since these names change with each new search, it's best to know how to find them when you need to.  I always do a search for "app_name" in the state file (client_state.xml) and that immediately shows all the currently defined names.
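For example (trimmed, and with other fields omitted since the exact contents differ from host to host), the matching entries in client_state.xml look something like this:

<app_version>
    <app_name>hsgamma_FGRPB1G</app_name>
    ...
</app_version>

Whatever appears inside <app_name> is the short name to copy into the <name> element of app_config.xml.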

If people choose to use the app_config.xml method, they really should make sure they understand how it works (and if anything has changed) by reading the current instructions.  Some of these details have changed over time so it's always best not just to rely on stuff written on message boards that could easily become outdated.

 

Cheers,
Gary.

Joseph Stateson
Joined: 7 May 07
Posts: 174
Credit: 3085308626
RAC: 651435


Gary Roberts wrote:

This setting cannot act immediately.  The ONLY way your computer will know about it is when it receives a new task with the new settings encoded.  Even if you click 'update' in BOINC Manager, your computer won't be told until it receives new work. ...

 

 

That was the problem: I had queued-up tasks, so I never got an additional work unit. So maybe the following:

1.  stop new work

2.  suspend existing work

3. allow new work

If I am correct, the new work would come in under a 1 CPU and 1 task rule.  I had the config file in the correct place.

Alternatively, detach or abort existing work, which I rarely do.

This system had only 4 cores, but clearly throughput would be slightly better running 2 tasks on each RX 570.

I calculated an average of about 710 seconds each for single tasks and about 1200 seconds (600 equivalent) each for multiple tasks.  The pic here shows a mix of both setups, but you can see the difference.  I can calculate the throughput accurately using BoincTasks' history, but I did not have it running when I was looking at the problem.

================

Strange, the image does not show up in preview, but it looks perfect in edit mode. Stats info is at the URL below:

stateson.net\images\ein_rx570.png

================


archae86
Joined: 6 Dec 05
Posts: 3160
Credit: 7261171908
RAC: 1544125


BeemerBiker wrote:

1.  stop new work

2.  suspend existing work

3. allow new work

If I am correct the new work would come in under a 1 cpu and 1 task rule. 

As written, this will not give the desired result.  BOINC (on your machine) does not get new tasks when any tasks are suspended.

You can, however, have your cake and eat it by watching the progression of running tasks on your machine, then unsuspending all and forcing an update, then waiting out the 1-minute delay, since BOINC on your machine likely won't learn of any revised settings until the first update.  As soon as you see a task beginning to download, you can re-assert any task suspensions.  For some preference changes this may be important to avoid gross over-fetching (especially when raising multiplicity, say from 1X to 2X).

Matt White
Joined: 9 Jul 19
Posts: 120
Credit: 280798376
RAC: 0


Hello all,

I'm new at this so bear with me.

What is the advantage of running concurrent tasks on the GPU? It seems to me doing so will only increase the amount of time for each task (including the use of an additional core). Yes, two tasks are completed at once; however, they take more than twice as long. In addition, the second core is tied up and not available to do other work.

My configuration is an Intel Core 2 Duo @ 3 GHz and a GeForce GT 1030. I have only changed the preference setting to 0.5 on the E@H site. I have not made any changes to the app_config.xml file.

Thanks,

Matt

Clear skies,
Matt
Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0


ka1bqp wrote:

Hello all,

I'm new at this so bear with me.

What is the advantage of running concurrent tasks on the GPU? It seems to me doing so will only increase the amount of time for each task (including the use of an additional core). Yes, two tasks are completed at once; however, they take more than twice as long. In addition, the second core is tied up and not available to do other work.

My configuration is an Intel Core 2 Duo @ 3 GHz and a GeForce GT 1030. I have only changed the preference setting to 0.5 on the E@H site. I have not made any changes to the app_config.xml file.

Thanks,

Matt

 

Hello Matt, 

 

And welcome to the forums.  Actually, there's some data missing from these discussions when talking about running concurrent work units on a graphics card.  As you can tell from your setup, doing more than 1 work unit per card really doesn't benefit you at all. It comes down to the GPU, the motherboard, the PCIe speed and the CPU.

The setups that benefit most from running multiple work units are high-end GPUs, 10X0 or 20X0 series (in the form of 70, 70Ti, 80, 80Ti, Titans), on motherboards with at least 2 PCIe slots all running at 16x, and a CPU with at least 16 PCIe lanes (but preferably 40).

Einstein is very dependent on PCIe slot bandwidth. Anything less than 16x will process more slowly. The same goes for low-end CPUs (fewer PCIe lanes that can be used).

With high-end GPUs, doing more than 1 work unit at a time will result in faster times overall. I believe I was doing 3 at a time with a total reduction of 2-3 minutes overall for each work unit under Windows.

Hope this clears things up.

 

Z

Matt White
Joined: 9 Jul 19
Posts: 120
Credit: 280798376
RAC: 0


Thanks Z!

This certainly clears things up. Unfortunately, the two PCIe x16 slots on my motherboard are separated by another interface; otherwise I would have bought a dual-interface card. Oh well. As configured, the machine can run the current FGRP task in about 40 minutes. I'll switch future tasks back to 1.00 on the GPU runs.

The box I am using is somewhat long in the tooth (HP XW4600), but it is dedicated to running BOINC.

Thanks.

Matt

Clear skies,
Matt
Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0


Hey Matt, 

I'm sorry, I didn't mean to give the impression that you need a dual-interface GPU.  A single PCIe slot running at 16x is really the best option. I hadn't had my morning coffee yet...

 

Z

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118229403187
RAC: 24422512


ka1bqp wrote:

... run the current FGRP task in about 40 minutes. I'll switch future tasks back to 1.00 on the GPU runs.

The box I am using is somewhat long in the tooth (HP XW4600), but it is dedicated to running BOINC.

Hi Matt,
Welcome to the forums from me as well!

I had a look at your computer and it shows as having a GT 1030.  That is a low end GPU so it's not surprising that single tasks take so long and that you could lose by trying to run two concurrently.  If you had something closer to mid-range like a 1050 or 1050Ti (or better) then your crunch times would be lower and you would see some (probably modest) further gain from running more than a single task at a time.

I had a quick look through your tasks list and saw essentially 2 groups of times - around 4000s and around 7900s or around 67 mins and 132 mins.  I imagine those represent single (x1) and dual (x2) tasks running concurrently.  The x2 did seem to gain very slightly on x1 but not enough to be worth it.

When running GPU tasks, the 'oldness' of the CPU is relatively unimportant, so you don't need to worry too much about that.  You could produce a lot more (and a lot faster) output from a 'better' GPU but at the cost of higher power bills and probably the need for a better PSU as well.  That would depend on the ratings of your current PSU.  It could also depend on available space inside your case.

Once again, welcome to Einstein@Home and if you need help with anything, don't hesitate to ask.

Cheers,
Gary.
