ATI/AMD + NVIDIA GPUs on same machines, app_config question

Aron
Joined: 29 Sep 06
Posts: 29
Credit: 1,154,829,667
RAC: 0
Topic 220085

Hi,

I have a few GTX 1080s and some Radeon cards lying around. I got them working in one machine, but the NVIDIA GTX 1080 cards seem to error out if I run more than one task simultaneously per GPU. The Radeon VII, on the other hand, is much slower when running only one task per GPU. My question is:

Can someone help me with an app_config file so that my Radeon (ATI/AMD) runs two tasks at the same time, while the NVIDIA cards only do one?

I guess I'd set the preferences to 1 task per GPU as the default, and then add or edit the app_config file for any ATI/AMD cards. Any ideas on how to distinguish the manufacturer?

Thanks a bunch!

 

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,870
Credit: 115,209,444,842
RAC: 32,522,896

Aron wrote:
... Can someone help me with an app_config file so that my Radeon (ATI/AMD) runs two tasks at the same time, while the NVIDIA cards only do one?

The documentation shows examples of using "plan class" to distinguish between NVIDIA and AMD (ATI). An easy way to see exactly which plan classes are being used on your machine is to look at your completed tasks list on the website and click on the TaskID link for tasks crunched on each GPU. On the page that opens, you will see a list of data which includes the name of the app used. The bit in brackets at the end of the name is the plan class. You would need to use precisely that in any app_config.xml you construct. Just follow the format as set out in the documentation, leaving out all the optional bits you don't need.

I use app_config.xml just to control the budgeting of fractional CPU and GPU resources.  I only use the <app> .... </app> section.  I've never needed to use the <app_version> section or things like <plan_class> or <max_concurrent>.   Read the documentation carefully, create and deploy what you need and then click the "read config files" option in BOINC Manager.  Look in the event log for any complaints.  It's also a good idea to set a low work cache size and run off excess tasks before you first try to use it.  This is just in case something causes the whole work cache to get trashed if you get something wrong with the options/syntax :-).  Shouldn't happen but you just never know :-).
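A minimal sketch of an <app>-only file of that kind might look like the following. The short app name is the one that appears later in this thread, and the usage values are illustrative assumptions rather than recommended settings.

```xml
<app_config>
  <app>
    <!-- Short application name; hsgamma_FGRPB1G is the one used later in this thread -->
    <name>hsgamma_FGRPB1G</name>
    <gpu_versions>
      <!-- Each task claims half a GPU, so two tasks share one GPU -->
      <gpu_usage>0.5</gpu_usage>
      <!-- CPU budgeted per GPU task; 0.5 is an arbitrary illustrative value -->
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

Settings in <gpu_versions> apply to every GPU version of the app, whatever the vendor, so this form on its own cannot give NVIDIA and AMD cards different task counts.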

Aron wrote:
I guess I'd set the preferences to 1 task per GPU as the default, and then add or edit the app_config file for any ATI/AMD cards. Any ideas on how to distinguish the manufacturer?

Since app_config.xml overrides normal default settings, I would suggest including separate <app_version> sections for each GPU type.  If you don't specify the other GPU type, the default settings may well work for it but if you are going to install the file you may as well provide the settings for both types.  This also gives you an easy way to adjust both GPUs independently.

When I looked at the scheduler log for one of the contacts your host made, I saw the following lines:

2019-11-30 23:27:46.4705 [PID=6255 ] [version] NVidia compute capability: 601
2019-11-30 23:27:46.4705 [PID=6255 ] [version] CUDA compute capability required min: 700, supplied: 601


CUDA is not being used (so it's not important) but I imagine an old version of CUDA might mean an old version of the OpenCL libs.  I really don't know anything about this but there's got to be some reason why you have trouble running multiple tasks on that GPU.  I have no experience but I see plenty of others who run multiple tasks without errors.  You should try to solve that issue since it should be solvable.

Cheers,
Gary.

Aron

Dear Gary,

Thanks for your detailed reply! Let me add my comments to your paragraphs:

Gary Roberts wrote:
Since app_config.xml overrides normal default settings, I would suggest including separate <app_version> sections for each GPU type.  If you don't specify the other GPU type, the default settings may well work for it but if you are going to install the file you may as well provide the settings for both types.  This also gives you an easy way to adjust both GPUs independently.

So the way to distinguish GPU types is by app version? So the FGRP apps actually have different app names? I thought it was all the same app. If so, then I'll just use the app name to distinguish the GPU type.

Gary Roberts wrote:
When I looked at the contents of the scheduler logs for one of the contacts your host made, I saw the following comment...

It seems that only the 20xx series of NVIDIA cards (and the TITAN V) have a compute capability of 7+: https://developer.nvidia.com/cuda-gpus

GPU                    Compute Capability
NVIDIA TITAN RTX       7.5
GeForce RTX 2080 Ti    7.5
GeForce RTX 2080       7.5
GeForce RTX 2070       7.5
GeForce RTX 2060       7.5
NVIDIA TITAN V         7.0
NVIDIA TITAN Xp        6.1
NVIDIA TITAN X         6.1
GeForce GTX 1080 Ti    6.1
GeForce GTX 1080       6.1
GeForce GTX 1070       6.1
GeForce GTX 1060       6.1
GeForce GTX 1050       6.1

Are native CUDA applications actually faster than OpenCL? The 7+ compute capability requirement seems a bit high; my GTX 1080s are not that old. As for my NVIDIA cards generating errors, I think I've narrowed it down to a memory socket issue, so now it should be fine.

Thanks! Best,
Aron

 

 

Aron

By the way, does the app name include .exe? I.e. is this correct?

<app_config>
<app>
<name>hsgamma_FGRPB1G_1.22_windows_x86_64__FGRPopencl1K-ati.exe</name>
<max_concurrent>2</max_concurrent>
<gpu_versions>
<gpu_usage>.5</gpu_usage>
<cpu_usage>.5</cpu_usage>
</gpu_versions>
</app>
</app_config>

 

Thanks! Best,
Aron

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1,702,989,778
RAC: 0

Aron wrote:
By the way, does the app name include .exe? I.e. is this correct?

No exe. I believe it should look something like this...

<app_config>

<app>
<name>hsgamma_FGRPB1G</name>
<max_concurrent>3</max_concurrent>
<gpu_versions>
<gpu_usage>.5</gpu_usage>
<cpu_usage>1</cpu_usage>
</gpu_versions>
</app>

<app_version>
<app_name>hsgamma_FGRPB1G</app_name>
<plan_class>FGRPopencl1K-nvidia</plan_class>
<avg_ncpus>1</avg_ncpus>
<ngpus>1</ngpus>
</app_version>

<app_version>
<app_name>hsgamma_FGRPB1G</app_name>
<plan_class>FGRPopencl-ati</plan_class>
<avg_ncpus>1</avg_ncpus>
<ngpus>0.5</ngpus>
</app_version>

</app_config>

You can find info on all the different app versions here:

https://einsteinathome.org/apps.php?xml=1

Aron

Hi Richie,

Thanks! That’s why I asked, since it’s a bit tricky to figure out without any experience. By the way, is the “<gpu_usage>.5</gpu_usage>” in the top section overridden by the two app_version sections that follow? I want the NVIDIA app to run one task at a time and the ATI one to run two (or perhaps three). Also, if I add another 1080, how would I change it then? I guess “<max_concurrent>3</max_concurrent>” is some global maximum?

Thanks again! Best,

Aron

Richie

Let's keep it secret, but you're dealing with a total amateur here. I don't have any experience with that plan class stuff :D But now that I've thought about that example again, I don't think the whole app part, or the gpu_versions part in it, is needed at all. I would replace it with just <app><name>hsgamma_FGRPB1G</name></app> and then the two app_version parts.

I guess those <ngpus> lines should be enough to limit the number of concurrent tasks on both cards (1 for NVIDIA, 2 for AMD). That should automatically limit the max concurrent as well. If you'd like 3 tasks on the AMD card, use .33 for its ngpus (instead of .5).
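As a sketch of the three-task variant, only the AMD section would change. The plan class below is taken from the executable name posted earlier in the thread and the avg_ncpus value is an assumption, so both should be checked against your own host. The fragment belongs inside the <app_config> element.

```xml
<app_version>
  <app_name>hsgamma_FGRPB1G</app_name>
  <!-- Plan class taken from the executable name posted earlier in the thread; verify it on your own host -->
  <plan_class>FGRPopencl1K-ati</plan_class>
  <!-- CPU budgeted per task; 1 is an assumed value -->
  <avg_ncpus>1</avg_ncpus>
  <!-- Each task claims about a third of a GPU, so three tasks fit on one card -->
  <ngpus>0.33</ngpus>
</app_version>
```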

I don't know the answer about the overriding... but without the gpu_versions stuff, that problem would go away.

Aron

Hi!

Awesome, thanks! I tried:

<app_config>
<app>
<name>hsgamma_FGRPB1G</name>
</app>

<app_version>
<app_name>hsgamma_FGRPB1G</app_name>
<plan_class>FGRPopencl1K-nvidia</plan_class>
<avg_ncpus>1</avg_ncpus>
<ngpus>1</ngpus>
</app_version>

<app_version>
<app_name>hsgamma_FGRPB1G</app_name>
<plan_class>FGRPopencl1K-ati</plan_class>
<avg_ncpus>0.5</avg_ncpus>
<ngpus>0.5</ngpus>
</app_version>

</app_config>

 

And that seemed to work just fine. Thanks again! Best,

Aron

Richie

Feels good to hear that it works! Thanks too. I also learned something and saved those 'code' lines for the future, in case I need a similar app_config some day.

Gary Roberts

Aron wrote:
So the way to distinguish GPU types is by app version?

Well actually it's the <plan_class> that does the distinguishing so that's the important bit.  To get that, you need to use <app_version> rather than <app>.

Did you read the documentation?  It's vital that you do and that you do it thoroughly.  Here are a couple of key quotes from it that would have helped answer your subsequent questions.

Quote:
name
short name of the application as found in the corresponding <name>xxxxx</name> tags in your client_state.xml file.
The application name can also be found using the <sched_op_debug> logging flag: a new task shows "Starting task task_name using Application_Name ..."


Quote:
app_name
the short name of the application.


Quote:
Note: The sections in square brackets '[foo/]' are optional. When you want to use any, remove the square brackets.


Quote:
Each <app_version> element specifies parameters for a given app version; it overrides <app>.


I've highlighted some key words in the above quotes.  For example you need to use the correct short name, a lot of stuff is optional (in other words leave it out unless you really need it) and some stuff overrides other stuff, so if you're using 'some stuff' leave out the 'other stuff'.  To use <plan_class> you must use <app_version> to contain it so you should throw away all parts of the <app> clause.
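Put together, a file that follows this advice would contain only <app_version> sections. The sketch below reuses the app name, plan classes and values already posted in this thread. It has not been tested separately, so treat it as an illustration rather than a verified configuration.

```xml
<app_config>
  <!-- No <app> element: only per-plan-class settings are given -->
  <app_version>
    <app_name>hsgamma_FGRPB1G</app_name>
    <plan_class>FGRPopencl1K-nvidia</plan_class>
    <avg_ncpus>1</avg_ncpus>
    <!-- One task per NVIDIA GPU -->
    <ngpus>1</ngpus>
  </app_version>
  <app_version>
    <app_name>hsgamma_FGRPB1G</app_name>
    <plan_class>FGRPopencl1K-ati</plan_class>
    <avg_ncpus>0.5</avg_ncpus>
    <!-- Two tasks per AMD GPU -->
    <ngpus>0.5</ngpus>
  </app_version>
</app_config>
```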

Regarding the short name, I find the easiest way to find it quickly is to take a copy (to protect the original) of the state file (client_state.xml), open it with any plain text editor, and search for "</project>" - the closing project tag for the Einstein project.  There will be a closing tag for each project you support so make sure you find the one for Einstein.  Immediately following that particular tag will be all the <app> ... </app> clauses for every app that you've ever used on that machine.  The first line in each <app> clause is the short name of that particular app.  Throw away the state file copy when you are finished.
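For orientation, the relevant part of client_state.xml looks roughly like the abridged sketch below. The project URL and the omitted fields are assumptions, and the exact set of child elements varies between client versions. Only the <name> line matters here.

```xml
<project>
    <!-- Assumed URL for the Einstein project; check the value in your own file -->
    <master_url>https://einsteinathome.org/</master_url>
    <!-- many other project fields omitted -->
</project>
<app>
    <!-- This is the short name to use in app_config.xml -->
    <name>hsgamma_FGRPB1G</name>
    <!-- other fields omitted -->
</app>
```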

Aron wrote:
It seems that only 20xx versions of NVIDIA cards have compute capability of 7+

As far as I know, there are no plans for a CUDA app for either of the current GPU searches.  I just found it very curious as to why the scheduler would be checking that capability under the current circumstances and wondered if it had any bearing on OpenCL and why you were unable to run multiple tasks on a 1080.  If those GPUs were mine, I would want to sort that issue out.

Aron wrote:
As for my NVIDIA cards generating errors, I think I've narrowed it down to a memory socket issue, so now it should be fine.

That's great!  I hope it does solve the issue for you.

Cheers,
Gary.

Aron

Hi Gary,

Thanks for the detailed reply! I finally got things working the way I intended!

Best,
Tomas
