True. I have, in fact, already selected the GPU I would be getting. Not only does it come well within my anticipated budget, but it also falls well within the card size limitation imposed by my system layout -- it isn't a huge case, and I tend to let cables fall wherever they will, which puts about a 9" limit on card size.
By "points", are you referring to credits awarded, or something else? If it's credits, those are of little to no concern to me -- I am simply interested in increasing the number of tasks my system can get done in a single day.
As for all those other constraints on my pocketbook, I am getting close on those, so it might not be all that long before I can go ahead with the GPU purchase -- wish me luck :)
Quote:
your current CPU is capable of maybe <100,000 ppd. a single GPU can easily be 10-50x that much, depending what you choose and what subproject you're interested in running.
I would figure out what GPU you're interested in that's in your price range, then look through the leaderboards for folks running that same GPU to get an idea of its projected performance.
I'm running LHC, Rosetta and Einstein, with no intention of adding any others. Of those, only Einstein offers GPU tasks, and of those, I have decided I am interested in O3AS and MeerKat -- the former because I am a gravitation theorist and the latter because I read the code for that one is highly optimized :)
So I read what you say as meaning the GPU tasks are at least 10 times as productive as CPU-only tasks, give or take a fiddle factor for things like coding efficiency, total work which can be done, etc. If so, that alone probably will be enough for me to justify getting that GPU.
As for checking leaderboards, how would I go about locating those? I haven't found any; there don't seem to be any obvious links to such a thing.
PS, must be nice to have all those 64-core Epyc processors to play with :D
points = credits. same thing.
credits earned are directly correlated to tasks completed, within the same project. different kinds of tasks get different points/credits awarded according to the project's own weighting of how important and/or computationally intensive they are.

the CPU tasks you are completing now (FGRP#5) are awarded 693 points each. the MeerKat (BRP7) tasks are awarded 3333 points each. they have more (and different) data in them to be processed. but a GPU can blast through these rather quickly. a lot faster than a CPU can.

the O3AS tasks are awarded 10000 points each. they have very different data to be processed, and require a lot of GPU VRAM (~2GB per task) and need significant help from the CPU as well. these are the most computationally intensive tasks, requiring the most resources, and so they are awarded the most points for completion. the high credit reward is also an incentive from the project to encourage more people to run this project as they want to prioritize these results.
what GPU are you considering? I don't think you ever said which one. In general, and from my experience, Nvidia GPUs outperform AMD for Einstein tasks. especially with the additional tuning features on Linux and custom applications available in Linux. certain older AMD models do very well also (like the Radeon VII).
Leaderboards and Statistics are synonymous within BOINC projects. Top of the Einstein page, click "Community" then -> "Statistics" then -> "View more" button under the "Top Computers" section, here: https://einsteinathome.org/community/stats/hosts
For Linux systems (maybe the equivalent can be said for Windows systems), Boinc will run all cpu tasks at NICE level 19.
For gpu tasks, Boinc will run all gpu tasks at NICE level 10.
There is an override setting for app_config that is supposedly capable of bumping gpu priority higher for specific brand cards, but I have never seen it work correctly -- in fact it does nothing.
With NICE levels, higher numbers mean lower priority; the range runs from -20 to +19, with negative values being the highest priorities.
I don't see anything in the current Boinc documentation for such an app_config setting. If it actually does nothing, perhaps they simply got rid of it?
it's actually a setting for the cc_config.xml file, talked about here (which also covers app_config): https://boinc.berkeley.edu/wiki/Client_configuration
The OS process priority at which tasks are run. Values are 0 (lowest priority, the default), 1 (below normal), 2 (normal), 3 (high) and 4 (highest). 'Special' process priority is used for coprocessor (GPU) applications, wrapper applications, and non-compute-intensive applications, 'process priority' for all others. The two options can be used independently.
FYI, you can't set any better priority than "normal" unless you run BOINC with elevated privileges. I've played around with these settings, but didn't find conclusively that it helps much, so i don't really mess with it anymore. I just always leave a couple spare threads free to manage the background OS functions and that seems to help things run more smoothly.
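For anyone who wants to experiment with those priority options, here is a minimal cc_config.xml sketch (the values are purely illustrative, not a recommendation). The file lives in the BOINC data directory, and the client picks it up after a restart or after re-reading config files from the Manager:

<cc_config>
  <options>
    <!-- 0 = lowest (the default), 1 = below normal, 2 = normal, 3 = high, 4 = highest -->
    <process_priority>1</process_priority>
    <!-- "special" applies to coprocessor (GPU), wrapper, and non-compute-intensive apps -->
    <process_priority_special>2</process_priority_special>
  </options>
</cc_config>

Per the caveat above, values above "normal" generally require running the client with elevated privileges.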
That is the case. I have extensive experience with BOINC and what all the settings actually do from empirical data. Your interpretation of DA's words in a paper to showcase BOINC (it doesn't go too far into the weeds about what actually happens in the code) isn't correct. and frankly, I'm not sure what DA is even talking about in terms of a GPU at 0.5 using "half the cores" since not a single app works this way and BOINC has no way to enforce this. DA is often wrong. I think he just used a poor choice of words here that's making it easily misinterpreted by the reader.
I got the impression that he wrote at the lowest common denominator of reader understanding, but who am I to argue with you?
I will take issue with your statement that "not a single app works this way and BOINC has no way to enforce this." That is not at all what Anderson means here. He says, I thought quite clearly, exactly what the manual says: setting <gpu_usage> to 0.5 causes BOINC to load 2 tasks into the GPU.
So, what exactly does he mean by GPU "core"? I dug around a fair bit, and couldn't find anything much better than what I find on Wikipedia (not to say that better doesn't exist, I just never found it). To me, it is clear that, by GPU core, Anderson means what AMD calls a Compute Unit. These consist of 64 shaders, 5 texture mappers, an ALU, a scheduler, and I believe, at least for more recent GPUs, a ray tracer. I haven't found any info on NVidia, so I don't know what they call their equivalent unit, nor what hardware those include.
So, these objects/devices clearly do the equivalent of what the CPU does for general data -- they crunch graphical data, and by design, are also optimized to crunch scientific data. Doesn't it seem logical, then, to call them GPU cores? It does to me.
Ian&Steve C. wrote:
the <gpu_versions><cpu_usage> setting tells the BOINC client how much CPU is used by the GPU app. nothing else. note, I am only talking about the cpu_usage setting as it pertains to GPU applications, since that's what you've been asking about.
Translation: "... 0.5 means that (a task) has 1 thread that (is in use by that task) half the time."
Ian&Steve C. wrote:
you won't have any threads sitting idle unless you configure it to do so. my suggestion to avoid this is to never reduce the CPU time setting (in compute preferences) from 100%. and if you set CPU use % to something like 95+% it will use all except 1 thread. if you set it to 100% it will use all threads. resource share between projects is a whole other can of worms that can impact what tasks get run from what projects.
Been there, done that, and this is nowhere near what I am talking about. You seem to have missed what I have been driving at, but that was based on a few assumptions/guesses that turn out to be wrong.
My point here was that, if you allot one thread to the exclusive use of a task that uses the CPU only 10% of the time, then that thread will be sitting idle 90% of the time. I thought this was clear in what I said; evidently I was wrong.
However, that all turns out to be moot, because that is not the default way of doing things, at least not in Linux. Without a Windows machine, I have no way to know if the same is true with that.
It is possible in Linux, both at runtime and from the command line, to assign any process exclusive use of a specific CPU/thread, but this does not seem to be the method used in real life. Rather, any process may be run on any thread at all, though once set, that will not change unless something forces the process onto a different thread. The default operation is to leave a process on the same thread throughout, unless there is a very good reason to move it. The reason for this is that switching a process to a different thread is time-consuming.
Of course, I don't know if running tasks like BRP7, which use the CPU only 10% of the time, makes a significant difference to all that. It shouldn't, unless you allow BOINC to run so many tasks as to completely overwhelm the hardware and operating system -- and BOINC itself won't let you do that (Options/Computing preferences in the manager).
Bottom line in my mind is now that <cpu_usage>0.5 means that two such tasks can run simultaneously, and that will count as only one thread in use by the system -- meaning, if you have n threads allotted to BOINC in Options/..., with 2 such tasks running, BOINC still has n-1 threads left to run other tasks.
Anyway, enough of that. I have already delved far deeper into the inner workings of the system than I ever wanted to, and I don't intend to dive any deeper.
Ian&Steve C. wrote:
where it doles out resources is in the jobs in the queue that are waiting to run. Here BOINC will use the bookkeeping of how many resources are available, and how many resources are needed, both CPU and GPU, by each task. if you lie to BOINC about how many resources are needed, then it will act on that lie.
Why would anyone do anything so colossally stupid as to tell a task that clearly needs the CPU throughout its entire runtime that it only needs the CPU half the time, or indeed, anything less than 100% of the time?
Ian&Steve C. wrote:
hypothetically, if you were to have a 20-thread CPU, and 8x GPUs and 100% CPU use/time setting, assuming that the app ACTUALLY needs a full core support (like it does for O3AS):
with 1x task per GPU, cpu_usage for the GPU app set to 1.0, BOINC would allocate 8x CPU threads for the GPU tasks, leaving 12 available to run CPU tasks, in whatever configurations you've set. 12x 1-thread tasks, 3x 4-thread tasks, whatever.
with 1x task per GPU, cpu_usage for the GPU app set to 0.5, BOINC would allocate 4x CPU threads for the GPU tasks, leaving 16 available threads for the CPU tasks. so 16 threads would be allowed to run CPU tasks, but you would be overcommitted because those 8x GPU tasks are still trying to also use a full core. the CPU would be trying to juggle processes trying to use 24 threads on a 20-thread system. the scheduler will timeslice based on the process priority (i think GPU tasks are generally given higher priority in BOINC). runtimes of the CPU tasks are likely to suffer, and probably the GPU tasks too to some extent, especially in the case of O3AS.
What is the CPU time for those 8 tasks? Half the runtime? Then in reality, those tasks are consuming the resources of only 4 threads, and so are not over-committing the system. Oh wait, initially you said to assume 100% CPU use per task -- then see above, re: "colossally stupid."
If you really want to talk about "over-committed" systems, first you should open a console, run htop, and sort on CPU. Find out just how many processes are active at the time -- it might surprise you.
Keith Myers wrote:
... Boinc will run all cpu tasks at NICE level 19.
For gpu tasks, Boinc will run all gpu tasks at NICE level 10.
There is an override setting for app_config that is supposedly capable of bumping gpu priority higher for specific brand cards, but I have never seen it work correctly -- in fact it does nothing.

hadron wrote:
I don't see anything in the current Boinc documentation for such an app_config setting. If it actually does nothing, perhaps they simply got rid of it?

Ian&Steve C. wrote:
it's actually a setting for the cc_config.xml file.
Grazie. I've never had reason to dig deep into cc_config, because the system seems to be working well without any meddling on my part. app_config settings are another matter.
By "points", are you referring to credits awarded, or something else? If it's credits, those are of little to no concern to me -- I am simply interested in increasing the number of tasks my system can get done in a single day.
<snip>
points = credits. same thing.
credits earned are directly correlated to tasks completed, within the same project. different kinds of tasks get different points/credits awarded according to the project's own weighting of how important and/or computationally intensive they are. the CPU tasks you are completing now (FGRP#5) are awarded 693 points each. the MeerKat (BRP7) tasks are awarded 3333 points each. they have more (and different) data in them to be processed. but a GPU can blast through these rather quickly. a lot faster than a CPU can. the O3AS tasks are awarded 10000 points each. they have very different data to be processed, and require a lot of GPU VRAM (~2GB per task) and need significant help from the CPU as well. these are the most computationally intensive tasks, requiring the most resources, and so they are awarded the most points for completion. the high credit reward is also an incentive from the project to encourage more people to run this project as they want to prioritize these results.
Those credits are IMO outrageous. They equate to 22 to 30 thousand per hour for O3AS and about 35K per hour for BRP7. FGRP5 tasks are worth about 220/hr, while on LHC, Atlas earns me ~170/hr, Theory about 67. Rosetta tasks yield around the same as Theory. I cannot even fathom any project that does so much work as to be worth what O3AS and BRP7 are being awarded, and I really don't care too much about "how much more valuable" they might be to the scientific community.
But whatever, it is what it is.
Ian&Steve C. wrote:
what GPU are you considering? I don't think you ever said which one. In general, and from my experience, Nvidia GPUs outperform AMD for Einstein tasks. especially with the additional tuning features on Linux and custom applications available in Linux. certain older AMD models do very well also (like the Radeon VII).
It's probably not surprising that NVidia outperforms AMD most of the time, since NVidia GPUs have twice as many shaders per "core" as do AMD chips. However, NVidia GPUs are also way more costly than AMD; I have found only one NVidia card on Newegg that is within my price range (there may be cards from manufacturers I won't consider), with 12GB VRAM. (Note: I am also limited by case size/card length.) However, this card would only allow me to run at most 2 BRP7 tasks in addition to the 4 O3AS tasks I wish to run.
I would thus prefer to have a 16GB card, so I can run 4 O3AS tasks and still have a decent amount of VRAM left over to run a decent number of BRP7. I found only one card on Newegg, an ASUS Dual Radeon RX 7600 XT, that fits the bill for affordability and card length. So that is my preferred choice.
(Why not search on Amazon, you may ask? Because Amazon keeps throwing irrelevant garbage at me that I didn't ask for; if I am looking for an Asus product, why do they insist on showing me stuff from manufacturers I've never even heard of -- likely imported stuff from you-know-where -- and am not interested in?)
You mention the Radeon VII. Those went out of production in August 2019, barely 6 months after introduction, and I am simply not interested in getting a used card. They may be wonders at crunching these tasks, but if I can't get a new one, I'm not interested. Besides, they were rather pricey when new, so I have no idea if even a used one would be affordable to me.
Ian&Steve C. wrote:
Leaderboards and Statistics are synonymous within BOINC projects. Top of the Einstein page, click "Community" then -> "Statistics" then -> "View more" button under the "Top Computers" section, here: https://einsteinathome.org/community/stats/hosts
OK, thanks. I had looked at that but no further than the top of the list. I never realized that so much more information would be there by clicking on "show more" :)
hadron wrote:
I will take issue with your statement that "not a single app works this way and BOINC has no way to enforce this." That is not at all what Anderson means here. He says, I thought quite clearly, exactly what the manual says: setting <gpu_usage> to 0.5 causes BOINC to load 2 tasks into the GPU.
So, what exactly does he mean by GPU "core"? I dug around a fair bit, and couldn't find anything much better than what I find on Wikipedia (not to say that better doesn't exist, I just never found it). To me, it is clear that, by GPU core, Anderson means what AMD calls a Compute Unit. These consist of 64 shaders, 5 texture mappers, an ALU, a scheduler, and I believe, at least for more recent GPUs, a ray tracer. I haven't found any info on NVidia, so I don't know what they call their equivalent unit, nor what hardware those include.
So, these objects/devices clearly do the equivalent of what the CPU does for general data -- they crunch graphical data, and by design, are also optimized to crunch scientific data. Doesn't it seem logical, then, to call them GPU cores? It does to me.
Translation: "... 0.5 means that (a task) has 1 thread that (is in use by that task) half the time."
not sure how you got all that from what I said, it's not what i was talking about at all.
DA said "0.5 means that J uses half the GPU’s cores" and made it clear that he thinks that the application of this parameter is different between CPUs and GPUs (CPUs = time, GPUs = # of core used). note, he's talking about the gpu_usage parameter here, not the cpu_usage parameter, and that's what I was referring to.
"GPU's", possessive, belonging to
"cores", plural, more than 1. including all.
say you have a GPU with 1000 cores. that means DA thinks that this setting means that an app is using 500 cores of the GPU. this is not how GPU applications work. no GPU application uses half the cores like this (by default of course, this kind of tuning IS possible with special uses like CUDA MPS, but that's another topic). all BOINC applications have access to all resources of the GPU and can use all the cores. and BOINC cannot enforce this upon the application to make it use half the cores. I was arguing the wording of how DA described it.

The logic in BOINC is simplistic here, it says "oh the task only needs half the GPU, i can fit another half-GPU task in here" and runs another task. just because that's the logic of BOINC, does not mean that's how the app actually works. various apps have all kinds of bottlenecks that make the practice of running multiple tasks on one GPU worth it or not, not simply core utilization. for einstein GPU tasks the issue is usually memory access/bandwidth.

O3AS is even more tricky since overall runtime depends on many things, including the CPU speed. the same exact GPU can be faster or slower depending on what CPU it's connected to, by a fairly large margin, because a significant portion of the computation happens exclusively on the CPU while the GPU basically does nothing, so it becomes as much about timing the various run stages as resource utilization during the GPU stage. there's uniqueness to every app and situation and you need to evaluate on a case by case or app by app basis for what is best.
Nvidia's analog to AMD's "compute unit" is their SM or Streaming Multiprocessor, the specific contents of which can vary based on core architecture.
hadron wrote:
It is possible in Linux, both at runtime and from the command line, to assign any process exclusive use of a specific CPU/thread, but this does not seem to be the method used in real life. Rather, any process may be run on any thread at all, though once set, that will not change unless something forces the process onto a different thread. The default operation is to leave a process on the same thread throughout, unless there is a very good reason to move it. The reason for this is that switching a process to a different thread is time-consuming.
you're talking about process affinity, or task pinning, and yes there is little reason to do this anymore since task core switching is so fast these days. most modern processors will "distribute the load" so to speak to even out the thermal load across the cores. The only time I've seen this to be massively helpful is with certain Primegrid tasks with set/known L3 cache utilization when run on AMD Zen architectures with several chiplets and segregated L3 cache areas. in those cases you don't want the task moving from one L3 cache to another. tasks are noticeably faster when pinned to static core ranges that all share the same L3. but this is an edge case in the BOINC world, and most other projects won't benefit from this. and this idea doesn't really have anything to do with this discussion about setting up the app_config file.
hadron wrote:
Why would anyone do anything so colossally stupid as to tell a task that clearly needs the CPU throughout its entire runtime that it only needs the CPU half the time, or indeed, anything less than 100% of the time?
this is exactly what I was trying to get across. that you should tailor your settings to reality. if the task uses 10% of the CPU core, set it to 0.1. if the task uses 100% of the CPU core, set it to 1.0. there is no one-size-fits-all if you want to optimize tightly.
side note: these kinds of settings can be fractional, and are counted as fractional when summing up resources, but the resulting value for the number of cores reserved cannot be -- the remainder gets truncated. for example:
0.1 for 1 task, really = 0 to BOINC
0.5 for 1 task, really = 0 to BOINC
0.1 for 10 tasks, really = 1 to BOINC
0.5 for 5 tasks, really = 2 to BOINC
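To make that concrete, here is a rough sketch of an Einstein@Home app_config.xml along the lines discussed in this thread -- 4 O3AS tasks per GPU each reserving a full CPU core, and 2 BRP7 tasks per GPU each using ~10% of a core. The <name> values below are placeholders, not verified: check client_state.xml (or the application details on the project website) for the exact app names on your host, and adjust the numbers to what your own hardware can actually sustain. The file goes in the project's folder under the BOINC data directory.

<app_config>
  <app>
    <name>einstein_O3AS</name>           <!-- placeholder name: O3AS gravitational-wave search -->
    <gpu_versions>
      <gpu_usage>0.25</gpu_usage>        <!-- 4 tasks share one GPU -->
      <cpu_usage>1.0</cpu_usage>         <!-- each task reserves a full CPU thread -->
    </gpu_versions>
  </app>
  <app>
    <name>einsteinbinary_BRP7</name>     <!-- placeholder name: MeerKat BRP7 search -->
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>         <!-- 2 tasks share one GPU -->
      <cpu_usage>0.1</cpu_usage>         <!-- ~10% of a CPU thread per task -->
    </gpu_versions>
  </app>
</app_config>

With those numbers, BOINC's bookkeeping would reserve 4 x 1.0 = 4 threads for the O3AS tasks, while 2 x 0.1 = 0.2 truncates to 0 threads for the BRP7 tasks, exactly as in the examples above.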
hadron wrote:
If you really want to talk about "over-committed" systems, first you should open a console, run htop, and sort on CPU. Find out just how many processes are active at the time -- it might surprise you
i do this on the regular, it doesn't surprise me. most of those processes are not actively doing anything. My comments have been in regards to BOINC active tasks.
I think there's been enough shop-talk and speculation about what can or will happen. i think you should just buy whatever GPU fits you and play with the settings to reach your desired outcome. BOINC has a lot of knobs to turn.
hadron wrote:
PS, just how much VRAM does a BRP7 task require?

It slightly scales to however many SMs/cores you have, stronger GPUs use more VRAM than weaker ones. less than 1GB per task. I want to say it's somewhere around 600-800MB.
but I haven't run BRP7 for a while to get more specific numbers. and my setup and tuning is very non-standard anyway and wouldn't really translate to what you'd see.
Ian&Steve C. wrote:
It slightly scales to however many SMs/cores you have, stronger GPUs use more VRAM than weaker ones. less than 1GB per task. I want to say it's somewhere around 600-800MB.
but I haven't run BRP7 for a while to get more specific numbers. and my setup and tuning is very non-standard anyway and wouldn't really translate to what you'd see.
That gives me a good estimate of the upper and lower bounds on the number of simultaneous BRP7 tasks I will be able to run, which will save me time in figuring it out. With 6GB available (assuming the 2.5GB figure for an O3AS task is correct), I can start at 6 BRP7 tasks, and just keep adding one more until the GPU freaks out from Boinc trying to overload it.
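For what it's worth, a rough back-of-the-envelope check of those bounds (using my assumed 2.5GB per O3AS task and Ian's 600-800MB per BRP7 task on a 16GB card):

16 GB - 4 x 2.5 GB = 6 GB left over for BRP7
6 GB / 0.8 GB = ~7 tasks if they land at the heavy end of the estimate
6 GB / 0.6 GB = 10 tasks if they land at the light end

so somewhere between 7 and 10 BRP7 tasks ought to fit alongside the 4 O3AS tasks, in theory -- starting at 6 leaves a little headroom.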
After plowing through the waterfall of posts here and getting a headache, I am about ready to donate a nice GPU ....
Regards SFV
This one looks pretty good: https://www.newegg.ca/gigabyte-geforce-rtx-4090-gv-n4090aorusx-w-24gd/p/N82E16814932556
:D