How to control the FGRP3 / BRP4G / BRP5 mix for GPU endowed hosts

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5885
Credit: 119092644413
RAC: 23373918
Topic 197431

You can ignore all of this if any of these are true:

  * You don't have a host with a GPU capable of crunching.
  * You have a GPU but it is one of the recent integrated Intel HD4x00 ones. Sorry, I'm not addressing those here.
  * You have a GPU but are quite happy to take what the project sends.
  * You have no desire to run multiple concurrent tasks on your GPU.
  * You have a GPU but don't want to prevent FGRP3 GPU tasks.

If you are still reading, you are probably interested in making the most efficient use of a discrete GPU. Please note that 'efficient' doesn't mean maximum RAC. If you think it does, you shouldn't be crunching this project. These notes are all about choosing the combination of apps that will most effectively use the resources you are donating.

Running multiple GPU tasks concurrently is a good way to improve your host's output but at the expense of additional power draw and heat. As this note is not about the mechanics of running multiple concurrent tasks, I am going to assume you already know about the 'GPU utilization factor' pref setting. I am also going to assume you understand that the new FGRP3 GPU app is still in development and that it needs so much CPU support that GPU loading will be a lot lower than what you are used to with previous BRP4/5 apps. If you don't understand this, please read the announcement thread in Technical News.

The particular scenario I want to address is how to run FGRP3 CPU tasks only, whilst preventing FGRP3 GPU tasks from ever being downloaded in the first place. I assume there will be volunteers who wish to keep the current and more efficient BRP4/5 type tasks for their GPUs whilst choosing exactly what to run on any spare CPU cores.

I have a lot of GPU endowed hosts. I've spent quite a bit of time tuning each one to crunch efficiently. I want to run the most efficient combination of apps and for me (until FGRP3 GPU improves) that means BRP4/BRP5 on the GPU and FGRP3/S6CasA on the CPU cores. I've used all venues (default, home, school, work) to set up different GPU utilization factors of 1, 0.5, 0.33, 0.25, respectively, which covers the most likely range. I have other specific factors in choosing which apps I run personally, but I'm attempting to make these notes as general as possible.

There have been a couple of recent developments that make it possible to run FGRP3 CPU tasks without having to run FGRP3 GPU. The latest apps have the <plan_class> as part of their name and this is crucial. The latest recommended BOINC (7.2.42) is also crucial. You need to use the most recent incarnation of app_config.xml which is documented (sketchily) at the bottom of this page.

In the last couple of hours I found some time to upgrade some of my linux boxes to 7.2.42 and to compose the app_config.xml file I thought might work. It would have been nice to have more detail in the documentation :-). I chose to try this out on a machine with an AMD HD7850 GPU and an i3-2120 CPU (Sandy Bridge) - 4 HT cores. The machine was in the work venue and was running BRP5 4x on the GPU. This automatically reserves two CPU cores. I was running an entirely separate project (POGS) on the 2 spare CPU cores. Below is the app_config.xml file that I have written to allow me to run FGRP3 CPU tasks.

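[pre]
<app_config>
   <app>
      <name>hsgamma_FGRP3</name>
      <max_concurrent>2</max_concurrent>
      <gpu_versions>
         <gpu_usage>0.25</gpu_usage>
         <cpu_usage>0.5</cpu_usage>
      </gpu_versions>
   </app>
   <app_version>
      <app_name>hsgamma_FGRP3</app_name>
      <plan_class>FGRPSSE</plan_class>
      <avg_ncpus>1</avg_ncpus>
   </app_version>
   <app>
      <name>einsteinbinary_BRP5</name>
      <max_concurrent>4</max_concurrent>
      <gpu_versions>
         <gpu_usage>0.25</gpu_usage>
         <cpu_usage>0.5</cpu_usage>
      </gpu_versions>
   </app>
</app_config>
[/pre]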
The documentation says that overrides but it still was uncomfortable to leave in stuff about . None of this was shown as optional ([]) so I left it all in. I figured I might need the set to 2 so I was happy to have that in. I made the assumption that not setting a for the GPU app would be sufficient to preclude GPU tasks but it would have been nice to read something like that in the documentation :-).

So I placed the above file in the EAH project directory and used the 'read config files' control in the manager. The event log didn't complain about syntax errors but it did complain that "Entry in app_config.xml for app 'hsgamma_FGRP3', plan_class 'FGRPSSE' doesn't match any app versions". I wasn't too concerned because how would the current BOINC client know about this app when it hadn't been running it before. It had previously run FGRP3 but before the plan_class stuff had come along.

So I set the cache size to a low value to prevent over fetch, changed the venue from 'work' (FGRP3 disabled) to '---' (default) (FGRP3 enabled) and hit 'update'. The client grabbed the new venue but again complained about the app/plan_class. I increased the work cache setting to trigger work fetch and was rewarded with 7 CPU tasks and no GPU tasks. Looks like everything is working the way I had hoped.

I promoted one of the new CPU tasks and have been watching it occasionally, noting progress. The tasks were estimated to take around 24 hours which is around 3x what they should take for this CPU. The estimates are always way too high anyway and I presume this is an artifact of running GPU tasks 4x. I presume BOINC is not smart enough to make allowances for the 'slowness' of an individual GPU task :-). A CPU task finishing always lowers estimates on all EAH tasks by the normal amount you would expect and then along comes a GPU task and cranks them back up again.

The first task was 0.3% complete after 6 mins, 4.2% after 60 mins and 10.3% after 90 mins, ie 6.1% progress in the last 30 minutes. This translates to a full run time of just over 8 hours - probably at least 8.5 hours when the extra time for followup calculations (now allowed for in the very slow start) is added. It's worthy of note that a lot of the first hour was taken up with slow but continuous (changing every second) progress. By the time I looked again at the hour mark, progress wasn't continuous but the interval between progress steps wasn't inordinately long. I don't think too many people will think things are stalled :-).
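As a rough check on the extrapolation above, assuming progress continues at the rate seen between the 60 and 90 minute marks:

```latex
\[
\text{rate} = \frac{10.3\% - 4.2\%}{30\ \text{min}} = \frac{6.1\%}{30\ \text{min}},
\qquad
T \approx \frac{100\%}{6.1\%} \times 30\ \text{min} \approx 492\ \text{min} \approx 8.2\ \text{h}
\]
```

This treats the whole run as if it progressed at the later, steady rate, so it ignores the slow first hour. That is why the real figure is likely to be somewhat higher.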

One other point I should mention. I made the assumption that app_config.xml would override project pref settings. This host is now in a venue where the GPU utilization factor is 1. However there are still 4 GPU tasks crunching. I haven't downloaded new GPU work and won't be until my ISP's next 'off-peak' period comes around. I can easily wait till then to confirm (or otherwise) the assumption.

Everything continues to look good so I hope these notes might be useful to others. If you are running a different number of CPUs/concurrent GPU tasks, you will need some changes to the details.

EDIT:
I guess I should point out that, in order to keep things simple, I chose to have only one GPU app (BRP5) for this host. It should be easy to substitute BRP4 for BRP5 or even to have both if you so desired.

Cheers,
Gary.

Gary Roberts

Quote:
One other point I should mention. I made the assumption that app_config.xml would override project pref settings. This host is now in a venue where the GPU utilization factor is 1. However there are still 4 GPU tasks crunching. I haven't downloaded new GPU work and won't be until my ISP's next 'off-peak' period comes around. I can easily wait till then to confirm (or otherwise) the assumption.


No problems with the assumption. GPU tasks are still being crunched 4x so settings in app_config.xml do override the project setting for GPU utilization factor.

However, there is a problem in that the overnight cache fillup did get a couple of FGRP3 GPU tasks so it is not sufficient just to leave out the and associated tags for the GPU version. So I've come up with an extra thing to try and that is to get the client to try to tell the scheduler that there aren't enough resources on the host to allow FGRP3 GPU tasks to run. I don't know that the scheduler will pay any notice but I figure it's worth a try :-).

I've tried to do this by modifying the first <app> ... </app> section (for hsgamma_FGRP3) to change <gpu_usage> from 0.25 to 1.1. By examining the scheduler request message that is sent to the server, I can see that the 1.1 value is being passed on. After a couple of small work requests with no FGRP3 GPU tasks being returned, I was wondering if it might be working, but the next request returned an FGRP3 GPU task. So, unfortunately that's not the answer.
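For illustration, the change described above would look something like the fragment below. It assumes the 0.25 being altered is the gpu_usage value in the first app block, and the tag names follow the standard app_config.xml layout rather than a copy of the actual file. As noted, this did not stop FGRP3 GPU tasks from arriving.

```xml
<app_config>
   <app>
      <name>hsgamma_FGRP3</name>
      <max_concurrent>2</max_concurrent>
      <gpu_versions>
         <!-- Assumed to be the value that was changed from 0.25.
              Anything above 1 claims more than one whole GPU per task. -->
         <gpu_usage>1.1</gpu_usage>
         <cpu_usage>0.5</cpu_usage>
      </gpu_versions>
   </app>
   <!-- The remaining app_version and einsteinbinary_BRP5 blocks stay unchanged. -->
</app_config>
```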

The next thought might be to extend the ... block by specifically adding a new set of tags for things like the GPU and for example. If was set to a value higher than the actual number of GPUs in the host, I wonder if that could 'turn off' tasks for that ? I'll try this later when I get some time.
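One way to read the idea above is sketched below. It assumes the tags meant are an extra app_version block carrying the GPU plan class and an ngpus value. The plan class name is a placeholder, not the real one, and the avg_ncpus value is just illustrative. This is untested and there is no guarantee the scheduler takes any notice of it.

```xml
<app_config>
   <app_version>
      <app_name>hsgamma_FGRP3</app_name>
      <!-- Placeholder: substitute the actual plan class of the FGRP3 GPU app version. -->
      <plan_class>GPU_PLAN_CLASS</plan_class>
      <!-- Illustrative CPU share only. -->
      <avg_ncpus>0.5</avg_ncpus>
      <!-- Deliberately more GPUs than the host has (assumes a single-GPU host). -->
      <ngpus>2</ngpus>
   </app_version>
</app_config>
```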

If anyone reading this has any bright ideas, I'd be very interested! In case you are wondering, I know it can be done with app_info.xml. Because of the work required to handle things when app versions change and the difficulty of removing it when you need to, I'm not at all inclined to go down that path.

Cheers,
Gary.

Gary Roberts

I couldn't find a way to allow the CPU version of the FGRP3 app to run without also getting FGRP3 GPU tasks with just the app_config.xml procedure. However, it seems to be working now by using one of the options in cc_config.xml. I won't know for sure for about another 24 hours.

As mentioned previously, I use all four available 'venues' to allow easy control of the number of concurrent GPU tasks on GPU endowed hosts. The venues 'default', 'home', 'school', 'work' have GPU utilization factors of 1, 0.5, 0.33, 0.25 respectively. All hosts that don't have a usable GPU are in the default venue. GPU hosts are spread over the other three according to where they run most efficiently. FGRP3 is enabled for the default venue. It is disabled for all the others. BRP5 is enabled on all venues. Before the FGRP3 GPU app was released, FGRP3 was enabled for all venues.

Over the weekend I read again all the stuff on client configuration and not just the stuff at the end on app_config.xml. I haven't really felt any need to use cc_config.xml previously so wasn't really familiar with all the options. I found the following option and description - I've highlighted the bit that caught my attention :-

Quote:


Don't use the given GPU for the given project. If <device_num> is not specified, exclude all GPUs of the given type. <type> is required if your computer has more than one type of GPU; otherwise it can be omitted. <app> specifies the short name of an application (i.e. the <name> element within the <app> element in client_state.xml). If specified, only tasks for that app are excluded. You may include multiple <exclude_gpu> elements. If you change GPU exclusions, you must restart the BOINC client for these changes to take effect. New in 6.13

[pre]
<exclude_gpu>
   <url>project_URL</url>
   [<device_num>N</device_num>]
   [<type>NVIDIA|ATI|intel_gpu</type>]
   [<app>appname</app>]
</exclude_gpu>
[/pre]

I can recall the <exclude_gpu> option and the very first sentence of the description which implied that the GPU would be excluded for the entire project. That's what you get if you skim things too quickly. The later sentence seems to indicate that you can apply it 'per app'. Whilst the app is the same for both CPU and GPU, I'm hoping this option will just stop GPU tasks and still allow CPU tasks.

So I've set up three of my HD7850 hosts to use this option. Because the default location is the only one where FGRP3 is allowed, I shifted these hosts to 'default'. The GPU utilization factor is 1 for that venue so I've continued to use app_config.xml to force the required 4 concurrent tasks. So far, everything seems to be working as expected with just one small glitch.
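A minimal sketch of what such a cc_config.xml might look like follows. The project URL is an assumption and must match the master URL the client shows for the project. The type is set to ATI for an AMD card, and the device number is omitted so that all GPUs of that type are covered. Whether this actually stops the server sending FGRP3 GPU tasks is exactly what is being tested here.

```xml
<cc_config>
   <options>
      <exclude_gpu>
         <!-- Assumed project URL; use the one shown by your own client. -->
         <url>http://einstein.phys.uwm.edu/</url>
         <!-- No device_num, so every GPU of this type is excluded. -->
         <type>ATI</type>
         <!-- Short app name, the same one used in app_config.xml. -->
         <app>hsgamma_FGRP3</app>
      </exclude_gpu>
   </options>
</cc_config>
```

The file goes in the BOINC data directory, not the project directory. As the documentation quoted above says, the client has to be restarted for GPU exclusions to take effect.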

From experiments over the weekend, these three hosts already had some FGRP3 CPU tasks. So today when I placed the config files and stopped and restarted BOINC, the expected 4 GPU tasks and 2 CPU tasks were running just fine. An extra FGRP3 CPU task was downloaded (before any crunching was restarted) which was extremely encouraging. However the 4 GPU tasks were immediately in high priority mode. Because of this, I can't just increase the cache size to trigger a GPU work fetch to really check if FGRP3 GPU tasks have actually been excluded. So why are the GPU tasks in panic mode? To explain why I think they are, I have to explain my work fetch strategy.

The plan with my ISP has a modest 'peak' monthly allowance and a very much larger 'off-peak' allowance. I do all my downloading off-peak and it works very well. The details don't matter but during off-peak time, the cache size on all hosts is increased automatically to 4.5 days and then reverted to 2 days some time before the end of off-peak. During the day when I'm trying out things, hosts will have close to 4.5 days of work on board and will be well above the cache level where new work could be fetched.

Because the above three hosts are now in a venue where the GPU utilization factor is 1, perhaps BOINC is regarding the ~4.5 day cache as a ~18 day cache (4 x 4.5), despite the fact that tasks are running 4x due to app_config.xml. I can get out of panic mode by suspending GPU tasks - it seems just about enough so that those left standing add up to not far below 14 days if running singly. By this time tomorrow, the actual cache will be below 3.5 days and I shouldn't need to suspend any GPU tasks. At that time I'll be able to trigger a work fetch and so test whether or not I get any FGRP3 GPU tasks.

If all goes well, I'll then be able to do the mod to all hosts running 4x and return them all to the work venue where there should be no problem with properly calculating cache sizes. I'll then add FGRP3 to the work venue and hopefully that will be the end of it (for HD7850 endowed hosts anyway :-) ). I'll post again when I have confirmed what happens.

Cheers,
Gary.

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2808646
RAC: 3020


Quote:

Over the weekend I read again all the stuff on client configuration and not just the stuff at the end on app_config.xml. I haven't really felt any need to use cc_config.xml previously so wasn't really familiar with all the options. I found the following option and description - I've highlighted the bit that caught my attention :-

Quote:


Don't use the given GPU for the given project. If <device_num> is not specified, exclude all GPUs of the given type. <type> is required if your computer has more than one type of GPU; otherwise it can be omitted. <app> specifies the short name of an application (i.e. the <name> element within the <app> element in client_state.xml). If specified, only tasks for that app are excluded. You may include multiple <exclude_gpu> elements. If you change GPU exclusions, you must restart the BOINC client for these changes to take effect. New in 6.13

[pre]
<exclude_gpu>
   <url>project_URL</url>
   [<device_num>N</device_num>]
   [<type>NVIDIA|ATI|intel_gpu</type>]
   [<app>appname</app>]
</exclude_gpu>
[/pre]

I can recall the <exclude_gpu> option and the very first sentence of the description which implied that the GPU would be excluded for the entire project. That's what you get if you skim things too quickly. The later sentence seems to indicate that you can apply it 'per app'. Whilst the app is the same for both CPU and GPU, I'm hoping this option will just stop GPU tasks and still allow CPU tasks.


This option is meant for use when you have multiple GPUs, and you don't want BOINC to allocate work to one of them for a particular project or a particular project's app.
I don't think it'll stop the project sending work for that app though.

Bernd's suggestion is interesting though:

http://einsteinathome.org/node/197343&nowrap=true#129792

Claggy

Gary Roberts

Hi Claggy, thanks for the response.

Quote:
This option is meant for use when you have multiple GPUs ...


But it does say you can leave out <device_num> which will cause it to apply to all devices. Some of mine have two devices (AMD and Intel) but I'm running Linux so it's really only one :-). Nevertheless one is still all I have :-).

Quote:
I don't think it'll stop the project sending work for that app though.


Fair enough. I'm still going to finish the test though. I'd like to see what happens.

Quote:
Bernd's suggestion is interesting though: ...


It's not a suggestion - he called it a quick and dirty hack :-). I often wonder if there's any potential for unintended consequences with things like this. It was great when Bernd came up with the whole GPU utilization factor - something that was lacking in BOINC at the time. Now that there is stuff in BOINC to do the same job (probably not as conveniently for the average user) do we have to worry about which takes precedence? My experiments seem to indicate that app_config.xml takes precedence over GPU utilization factor but I'd want to see more evidence before I'm sure that's the whole story.

Cheers,
Gary.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4347
Credit: 252851102
RAC: 40560


Quote:
Now that there is stuff in BOINC to do the same job (probably not as conveniently for the average user) do we have to worry about which takes precedence? My experiments seem to indicate that app_config.xml takes precedence over GPU utilization factor but I'd want to see more evidence before I'm sure that's the whole story.

The "GPU utilization factor" is purely used on the server side and affects only the value of n_gpus that's communicated to the client. As always in BOINC, configuration made locally on the client side overrides information from the server.

hth,
BM


Sunny129
Joined: 5 Dec 05
Posts: 162
Credit: 160342159
RAC: 0


Quote:

This option is meant for use when you have multiple GPUs, and you don't want BOINC to allocate work to one of them for a particular project or a particular project's app.
I don't think it'll stop the project sending work for that app though.

Claggy


what makes you think that it won't work, particularly when it is designed to prevent a project's server from sending work for a specific application? despite originally being created to help multi-GPU hosts, i see no reason why it wouldn't work even in a host w/ only one GPU.
