Running multiple tasks concurrently per GPU - How to revert to just one task.

BeemerBiker
Joined: 7 May 07
Posts: 26
Credit: 403,285,925
RAC: 48,667
Topic 218701

I tried running 2 WUs per GPU but would like to go back to one per GPU.  Something is wrong; I cannot go back.  I set all my preferences to 1.0, even changed the venue to make sure the system saw the change, and even tried the following:

<app_config>
<app>
<name>hsgamma_FGRPB1G</name>
<max_concurrent>2</max_concurrent>
<gpu_versions>
<gpu_usage>1</gpu_usage>
<cpu_usage>1</cpu_usage>
</gpu_versions>
</app>
</app_config>

 

Not sure if hsgamma_FGRPB1G is correct, as I could not find it anywhere under Applications, but Google found it.

This is a quad-core system and there are not enough cores to go around.  There are 3 GPUs and each task uses about 70% of a CPU core.  Sometimes 5 or 6 tasks run, but I only want 3 tasks, one per GPU.  This project seems to ignore max_concurrent and cpu_usage.

I tried 1, 2 and 3 for max_concurrent; it didn't make a difference.

 

[EDIT] Had to detach and re-attach to fix this.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 4,681
Credit: 24,001,009,084
RAC: 33,164,719

BeemerBiker wrote:
[EDIT] Had to detach and re-attach to fix this.

For the benefit of anyone else who has a similar problem and finds this 'solution', it really isn't necessary to do that.  There are better solutions.

There are two ways to set up for running more than one GPU task concurrently on a discrete GPU.  Both have advantages and disadvantages.  The important thing to understand is that they work in different ways and it's not a good idea to have a mixture of both unless you really do understand the properties of each and how to 'undo' a configuration if necessary.

First, the simplest method is to go to your account page and select your project preferences (account -> preferences -> project).  Scroll down to find the section on GPU Utilization Factor.  There are 3 different settings: BRP (Binary Radio Pulsar) apps (no longer available for discrete GPUs), FGRP (Fermi Gamma Ray Pulsar) apps (the current GPU app), and GW (Gravitational Wave) apps (tested recently but not yet available, possibly for quite a while).  Only change the factor for the type of app you wish to use.  As of now that would be FGRP.

The number you enter is the fraction of a GPU that a task will use.  To run 2 concurrent tasks, each needs 0.5 of a GPU.  Make sure of two things.  (1) You change a value that applies to the 'location' you have assigned your computer to (default, home, work, school) and (2) you 'save' any changes you make.  If you haven't set up to use locations, you won't need to worry about (1).
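The arithmetic is simply how many whole tasks fit into one GPU.  Here is a minimal Python sketch, assuming the simplified rule that tasks keep being started on a GPU only while their combined fractions stay at or below 1.0 (CPU budgeting and any other limits are ignored).  The helper name tasks_per_gpu is hypothetical, not part of BOINC.

```python
import math


def tasks_per_gpu(utilization_factor: float) -> int:
    """Hypothetical helper: how many tasks fit on one GPU for a given
    GPU utilization factor (the fraction of a GPU each task uses)."""
    if not 0 < utilization_factor <= 1:
        raise ValueError("factor must be greater than 0 and at most 1")
    # Small epsilon guards against floating-point error (e.g. 1 / 0.33)
    return math.floor(1 / utilization_factor + 1e-9)


for factor in (1.0, 0.5, 0.33):
    print(f"factor {factor}: {tasks_per_gpu(factor)} task(s) per GPU")
```

So 1.0 means one task per GPU, 0.5 means two, and 0.33 means three.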

This setting cannot act immediately.  The ONLY way your computer will know about it is when it receives a new task with the new settings encoded.  Even if you click 'update' in BOINC Manager, your computer won't be told until it receives new work.  The same applies if you want to reverse the change at a later stage.  Your computer won't revert to the previous settings until new work is received that contains those previous settings.  This 'time delay' for changed settings to apply is one of the disadvantages of the simple method.  Another is that there is no ability to also configure the fraction of a CPU to be budgeted for GPU support duties.  Please note that "budgeting" is definitely not the same as the fraction of a CPU core that a task will actually use.

The second method involves you setting up a suitable app_config.xml file as documented here.  There are two particular advantages.  The changes are handled locally and don't require contact with the project.  In fact they will override any changes subsequently made on the website.  Local changes are applied immediately by just clicking 'reread config files' in BOINC Manager.  The second advantage is that you can customise both the fraction of a GPU and the fraction of a CPU that are budgeted for supporting a GPU task.

When you set up an app_config.xml file, there are disadvantages you need to be aware of - quite apart from understanding the syntax of the file itself.  The contents of the file are installed inside the state file (client_state.xml) which is the major piece of infrastructure that allows BOINC to do its work.  When you change the config file locally, the changed values will also cause the values in the state file to be updated.  If you decide not to use a config file any more, the stuff in the state file doesn't get removed, even if you delete the config file.

To go back to the original default values, you need to edit the config file to contain those values and then 'reread config files'.  To get rid of the file's effects entirely, so that you don't have to keep editing it when things change, you can do what the OP did, or you could reset the project, both of which are like using a sledgehammer to crack an egg.  It should also be possible to manually remove the insertions from the state file, with the client stopped first.  This is not recommended unless you really know what you are doing.

Because an app_config.xml file always overrides GPU utilization settings supplied by the project, the OP could have reverted to default settings with the example he showed.  The max_concurrent is not needed because it will never be used when the cpu_usage and gpu_usage are both set to 1.  There can only be a single GPU task with default settings.  Here is the file with proper formatting to make it easy to see what includes what.

<app_config>
    <app>
        <name>hsgamma_FGRPB1G</name>
        <gpu_versions>
            <gpu_usage>1</gpu_usage>
            <cpu_usage>1</cpu_usage>
        </gpu_versions>
    </app>
</app_config>

One question often asked about these files is, "Where do I find the correct <name> to use?"  The documentation linked above shows a couple of different ways to find that.

If the OP had installed the file he showed in the proper place and clicked the 'reread config files' option, the crunching of GPU tasks should have reverted to the default of 1 task per discrete GPU.  If that didn't happen, there must have been something wrong with the file, or perhaps it was in the wrong place.  When clicking 'reread config files', always look in the event log to confirm that the file was found and that no syntax errors were reported.
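The event log remains the authoritative check.  As an optional local pre-check, here is a minimal Python sketch (not a BOINC tool; the script and the function name check_app_config are hypothetical) that confirms the file exists at the path you give it, is well-formed XML, and has numeric gpu_usage/cpu_usage values.  It does not know BOINC's full set of options, so passing it does not guarantee the client will accept the file.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path


def check_app_config(path: Path) -> list[str]:
    """Return a list of problems found in an app_config.xml file (empty if none found)."""
    if not path.is_file():
        return [f"{path} not found (is it in the project folder?)"]
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError as err:
        return [f"XML syntax error: {err}"]

    problems = []
    if root.tag != "app_config":
        problems.append(f"root element is <{root.tag}>, expected <app_config>")
    for app in root.findall("app"):
        name = (app.findtext("name") or "").strip()
        if not name:
            problems.append("an <app> block has no <name>")
        # gpu_usage and cpu_usage must be numbers if they are present
        for tag in ("gpu_versions/gpu_usage", "gpu_versions/cpu_usage"):
            value = app.findtext(tag)
            if value is None:
                continue
            try:
                float(value)
            except ValueError:
                problems.append(f"{name or '?'}: {tag} = {value!r} is not a number")
    return problems


if __name__ == "__main__":
    # Pass the full path to app_config.xml inside the project folder
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("app_config.xml")
    issues = check_app_config(target)
    print("\n".join(issues) if issues else f"{target}: no problems found")
```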

 

Cheers,
Gary.

Mad_Max
Joined: 2 Jan 10
Posts: 124
Credit: 1,180,775,237
RAC: 969,479

The <max_concurrent>N</max_concurrent> option affects only the total number of tasks running on the client: per computer, not per GPU.
It also applies to only one app, so if a project has several different GPU apps (as E@H now does), more tasks from that project can run.
<gpu_usage>x</gpu_usage>, by contrast, works on a per-GPU basis.

This makes a difference on multi-GPU setups.

"hsgamma_FGRPB1G" is the correct app name for the Gamma Ray Binary Search, and "einstein_O1OD1E" is the short name to use in the config for the current Gravitational Wave Engineering run on GPUs.

Both options work fine with E@H.
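To see why the difference matters with several GPUs, here is a simplified Python model (a sketch, not BOINC's actual scheduler; max_running_tasks is a hypothetical name): gpu_usage limits the tasks on each GPU, while max_concurrent caps the app's total across the whole computer.  CPU budgets, other apps and other projects are ignored.

```python
import math
from typing import Optional


def max_running_tasks(n_gpus: int, gpu_usage: float,
                      max_concurrent: Optional[int] = None) -> int:
    """Simplified upper bound on running GPU tasks of one app on one computer."""
    # gpu_usage is applied per GPU: each GPU takes as many tasks as fit into 1.0
    per_gpu = math.floor(1 / gpu_usage + 1e-9)
    total = n_gpus * per_gpu
    # max_concurrent caps the app's running tasks across the whole computer
    if max_concurrent is not None:
        total = min(total, max_concurrent)
    return total


print(max_running_tasks(3, 1.0))     # 3: one task on each of 3 GPUs
print(max_running_tasks(3, 0.5))     # 6: two tasks on each GPU
print(max_running_tasks(3, 0.5, 2))  # 2: capped for the whole computer
```

On a single GPU with gpu_usage of 1, any max_concurrent of 1 or more changes nothing; on a 3-GPU machine, max_concurrent of 2 would leave one GPU idle.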

You just need to ensure the correct name for the config file (app_config.xml) and the correct location (the \projects\einstein.phys.uwm.edu\ folder in the BOINC data folder).
Then apply these settings with the "reread config files" command from the menu, or just restart the BOINC client.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 4,681
Credit: 24,001,009,084
RAC: 33,164,719

Mad_Max wrote:
This makes a difference on multi-GPU setups.

That may well be, but the purpose of creating and pinning my response wasn't to address multi-GPU cases.  The OP had already 'solved' his problem.  I took advantage of the opportunity to point out the details of the different methods for having a single GPU crunch more than one task concurrently.

More importantly, I wanted to draw attention to the methods for removing the concurrent tasks capability if there was a change of heart.  I was just trying to provide basic information so that ordinary users could assess the pros and cons and choose the best option for their circumstances.

I felt it best to avoid the complexities of multiple GPUs, multiple projects and multiple apps for the one project.  That tends to be the domain of power users who will need to work out the details for their own particular requirements.  Any such setup is likely to require quite a bit of 'trial and error' before it works the way the user wants.

Mad_Max wrote:

"hsgamma_FGRPB1G" is the correct app name for the Gamma Ray Binary Search, and "einstein_O1OD1E" is the short name to use in the config for the current Gravitational Wave Engineering run on GPUs.

Both options work fine with E@H.

With Bernd's recent announcement, the engineering run is about to finish.  Already generated tasks will be sent out but no new work units will be produced.  Eventually, there will be an "Injection" run but I imagine it will have a different short name.  There was no time frame mentioned so we have no clue as to when this could start.  It may be a while.  For the time being, the only thing that will remain for GPUs is the FGRPB1G short name since there won't be a GPU app for O2AS when it restarts.

I mentioned that suggestions for finding the short names are given in the documentation.  Since these names change with each new search, it's best to know how to find them when you need to.  I always do a search for "app_name" in the state file (client_state.xml) and that immediately shows all the currently defined names.
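The same search can be scripted.  A minimal Python sketch, assuming client_state.xml is in the current folder or its path is passed as an argument (the script and list_app_names are hypothetical).  It only reads the file with a plain text search, mirroring the manual search for "app_name"; it never writes to the state file.

```python
import re
import sys
from pathlib import Path

# Matches <app_name>...</app_name> and captures the name, trimming whitespace
APP_NAME = re.compile(r"<app_name>\s*([^<]+?)\s*</app_name>")


def list_app_names(state_xml: str) -> list[str]:
    """Return the distinct <app_name> values found in client_state.xml text, sorted."""
    return sorted(set(APP_NAME.findall(state_xml)))


if __name__ == "__main__":
    state_file = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("client_state.xml")
    if state_file.is_file():
        text = state_file.read_text(encoding="utf-8", errors="replace")
        for name in list_app_names(text):
            print(name)
    else:
        print(f"{state_file} not found; pass the path to client_state.xml")
```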

If people choose to use the app_config.xml method, they really should make sure they understand how it works (and if anything has changed) by reading the current instructions.  Some of these details have changed over time so it's always best not just to rely on stuff written on message boards that could easily become outdated.

 

Cheers,
Gary.

BeemerBiker
Joined: 7 May 07
Posts: 26
Credit: 403,285,925
RAC: 48,667

Gary Roberts wrote:

This setting cannot act immediately.  The ONLY way your computer will know about it is when it receives a new task with the new settings encoded.  Even if you click 'update' in BOINC Manager, your computer won't be told until it receives new work.

 

 

That was the problem: I had tasks queued up, so I never got an additional work unit.  So maybe the following would work:

1.  stop new work

2.  suspend existing work

3. allow new work

If I am correct, the new work would come in under a 1 CPU and 1 task rule.  I had the config file in the correct place.

Alternatively, detach or abort existing work, which I rarely do.

This system had only 4 cores, but throughput would clearly be slightly better running 2 tasks on each RX-570.

I calculated an average of about 710 seconds per task when running one at a time and about 1200 seconds per task (600 seconds equivalent) when running 2 at a time.  The pic here shows a mix of both setups, but you can see the difference.  I can calculate the throughput accurately using BoincTasks' history, but I did not have it on when I was looking at the problem.
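The "600 equivalent" figure follows from dividing the elapsed time by the number of tasks running together.  A small Python sketch using only the timings quoted above (per GPU; it assumes those averages hold and ignores CPU contention; tasks_per_hour is a hypothetical name):

```python
SECONDS_PER_HOUR = 3600


def tasks_per_hour(seconds_per_task: float, concurrent: int) -> float:
    """Tasks completed per hour on one GPU when `concurrent` tasks run
    side by side, each taking `seconds_per_task` of elapsed time."""
    return concurrent * SECONDS_PER_HOUR / seconds_per_task


single = tasks_per_hour(710, 1)   # about 5.07 tasks per hour
double = tasks_per_hour(1200, 2)  # 6.0 tasks per hour (600 s effective per task)
print(f"1x: {single:.2f}/h, 2x: {double:.2f}/h, gain {100 * (double / single - 1):.0f}%")
# → 1x: 5.07/h, 2x: 6.00/h, gain 18%
```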

================

Strange: the image does not show up in preview, but looks perfect in edit mode.  The stats info is at the URL below:

stateson.net\images\ein_rx570.png

================


archae86
Joined: 6 Dec 05
Posts: 2,556
Credit: 1,850,734,981
RAC: 2,572,879

BeemerBiker wrote:

1.  stop new work

2.  suspend existing work

3. allow new work

If I am correct, the new work would come in under a 1 CPU and 1 task rule.

As written, this will not give the desired result.  BOINC (on your machine) does not get new tasks when any tasks are suspended.

You can, however, have your cake and eat it too.  Watch the progress of running tasks on your machine, then unsuspend all tasks and force an update, then wait out the 1-minute delay, as BOINC on your machine likely won't learn of any revised settings until the first update.  As soon as you see a task beginning to download, you can re-assert any task suspensions.  For some preference changes, this may be important to avoid gross over-fetching (especially when raising multiplicity, say from 1X to 2X).
