app_config settings for multiple GPU apps

hadron

Joined: 27 Jan 23

Posts: 62

Credit: 98585697

RAC: 538884

21 Jun 2024 8:53:20 UTC

Message 226171 in response to message 226159

Ian&Steve C. wrote:

Say you have a GPU with 1000 cores. That means DA thinks this setting means an app is using 500 cores of the GPU. This is not how GPU applications work. No GPU application uses half the cores ...

Nvidia's analog to AMD's "compute unit" is their SM or Streaming Multiprocessor, the specific contents of which can vary based on core architecture.

So I'll keep referring to such things as (GPU) cores, and we'll all know what I'm talking about. On the NVidia website, there are references in spec sheets to things like CUDA cores, shader cores, tensor cores, etc, but personally, I feel those are misleading names; the Compute Unit/Streaming Multiprocessor seem to me to be the only things worthy of being called a "core".

On the Wikipedia lists of NVidia and AMD GPUs, I have seen only 2 units, both from NVidia, with over 100 cores; 60 to 80 is common from both manufacturers.

Ian&Steve C. wrote:

hadron wrote:
Why would anyone do anything so colossally stupid as to tell a task that clearly needs the CPU throughout its entire runtime that it only needs the CPU half the time, or indeed, anything less than 100% of the time?

This is exactly what I was trying to get across: you should tailor your settings to reality. If the task uses 10% of a CPU core, set it to 0.1; if the task uses 100% of a CPU core, set it to 1.0. There is no one-size-fits-all if you want to optimize tightly.

I figured that out long ago, just by reading the manual. I was trying to understand the general rules on which both <gpu_usage> and <cpu_usage> operate. After that, tailoring the settings to specific cases depends only on knowing how much memory and how much of a CPU each app needs.
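For readers following along: these settings live in an `app_config.xml` file in the project's folder under the BOINC data directory, with `<gpu_usage>` and `<cpu_usage>` set per app inside `<gpu_versions>`. The Python sketch below just writes such a file from a small table of per-app numbers. The app names and values are hypothetical placeholders (the real short app names appear in `client_state.xml`), so plug in whatever you measure for your own apps.

```python
import xml.etree.ElementTree as ET

# Hypothetical app names and usage values: placeholders, not recommendations.
# cpu_usage should be the fraction of a CPU core each task really uses
# (e.g. 1.0 if it keeps a core busy all run long, 0.1 if it uses ~10%).
APPS = {
    "example_gpu_app_a": {"gpu_usage": 0.5, "cpu_usage": 1.0},
    "example_gpu_app_b": {"gpu_usage": 1.0, "cpu_usage": 0.1},
}


def build_app_config(apps: dict) -> ET.Element:
    """Build an <app_config> tree with one <app>/<gpu_versions> entry per app."""
    root = ET.Element("app_config")
    for name, usage in apps.items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = name
        gpu_versions = ET.SubElement(app, "gpu_versions")
        ET.SubElement(gpu_versions, "gpu_usage").text = str(usage["gpu_usage"])
        ET.SubElement(gpu_versions, "cpu_usage").text = str(usage["cpu_usage"])
    return root


if __name__ == "__main__":
    root = build_app_config(APPS)
    ET.indent(root)  # pretty-print (Python 3.9+)
    # Print the XML; save it as app_config.xml in the project's folder.
    print(ET.tostring(root, encoding="unicode"))
```

After saving the file, the client picks up the change via "Read config files" in BOINC Manager or a client restart.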

Ian&Steve C. wrote:

Side note: these kinds of settings can be fractional, and they are counted as fractional when summing up resources, but the resulting number of cores used cannot be; the remainder gets truncated.

0.1 for 1 task, really = 0 to BOINC
0.5 for 1 task, really = 0 to BOINC
0.1 for 10 tasks, really = 1 to BOINC
0.5 for 5 tasks, really = 2 to BOINC
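Ian's four examples can be reproduced with a tiny model: multiply the per-task setting by the number of tasks, then truncate. This is a sketch of the behaviour as he describes it, not BOINC's actual code, and the function name is made up for illustration.

```python
import math


def whole_cores_counted(cpu_usage: float, n_tasks: int) -> int:
    """Hypothetical model of the post: per-task fractional cpu_usage values
    are summed, then the total is truncated to a whole number of cores."""
    total = cpu_usage * n_tasks
    return math.floor(total)


if __name__ == "__main__":
    for usage, n in [(0.1, 1), (0.5, 1), (0.1, 10), (0.5, 5)]:
        print(f"{usage} for {n} task(s) -> {whole_cores_counted(usage, n)}")
```

Under the same model, 9 tasks at 0.1111111111 sum to 0.9999999999 and truncate to 0.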

Say what?!!! That makes absolutely no sense to me. This means, for example, that for 9 tasks set to 0.1111111111, BOINC will round 0.9999999999 down to 0.

Of course, I suppose it doesn't matter, given the efficiency of the task scheduler to keep things running smoothly.

But what about <gpu_usage>? Is it similarly rounded down? That becomes important if one wishes to run n tasks where 1/n is a non-terminating decimal, e.g. 1/7 = 0.14287... recurring. If <gpu_usage> is set to 0.14287, will that still result in 7 running tasks, or will (0.14287)^-1 = 6.99937005669489746... be rounded down to 6?

Well, duh, that was brilliant -- really. First off, 1/7 = 0.142857... recurring, not as I wrote above; then 0.1428570 < 0.142857... recurring = 1/7, so (0.142857)^-1 > 7.
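The question can be checked numerically under one simple assumption, which the thread does not confirm: that BOINC keeps starting GPU tasks only while the summed `<gpu_usage>` of the running tasks stays at or below one GPU. The helper below is hypothetical and is not BOINC's scheduler; it only shows how the rounding of 1/7 matters under that assumption.

```python
def tasks_per_gpu(gpu_usage: float, capacity: float = 1.0) -> int:
    """Count tasks of size gpu_usage that fit before their sum exceeds capacity.

    Simplified, hypothetical model; the real client may handle this differently.
    """
    if gpu_usage <= 0:
        raise ValueError("gpu_usage must be positive")
    n = 0
    while (n + 1) * gpu_usage <= capacity:
        n += 1
    return n


if __name__ == "__main__":
    for u in (0.14287, 0.142857):
        print(f"gpu_usage={u}: 1/u = {1 / u:.6f}, tasks that fit = {tasks_per_gpu(u)}")
```

With 0.14287 a seventh task would push the sum to 1.00009, so only 6 fit; with 0.142857 seven tasks sum to 0.999999, so all 7 fit, in line with the correction above.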

Ian&Steve C. wrote:

hadron wrote:
If you really want to talk about "over-committed" systems, first you should open a console, run htop, and sort on CPU. Find out just how many processes are active at the time -- it might surprise you.

I do this on the regular; it doesn't surprise me. Most of those processes are not actively doing anything. My comments have been in regard to BOINC active tasks.

I was actually referring to the number of active processes vs. the number of threads in the CPU. With your Epyc CPUs, perhaps it wouldn't be as noticeable to you, but with my 24-thread 5900X, it definitely is. 16 single-thread BOINC tasks plus 2x 2-core ATLAS tasks will actually be running in 22 processes, because each ATLAS task spawns one process for each of the threads it's using. All of those run at upward of 90% CPU usage. The X server likewise runs at or near 90%.
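One way to reconcile the count of 22: if each 2-thread ATLAS task shows up as a parent process plus one worker per thread (an assumption about how "spawns" should be read here), then 16 + 2 x (1 + 2) = 22. The snippet just spells that arithmetic out; the function name is illustrative.

```python
def busy_processes(single_thread_tasks: int, atlas_tasks: int, threads_per_atlas: int) -> int:
    """Assumed model: each ATLAS task = 1 parent process + 1 worker per thread."""
    return single_thread_tasks + atlas_tasks * (1 + threads_per_atlas)


if __name__ == "__main__":
    cpu_threads = 24  # Ryzen 9 5900X: 12 cores / 24 threads
    n = busy_processes(16, 2, 2)
    print(f"{n} busy BOINC processes on {cpu_threads} hardware threads")
```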

Usually, I have 26 to 30 active processes, each using a significant CPU percentage (this is a rather quiet day so far, with only 24 processes up in the stratosphere). Anyway, moot point -- and just more confirmation that the task scheduler is doing what it should.

Ian&Steve C. wrote:

I think there's been enough shop-talk and speculation about what can or will happen. I think you should just buy whatever GPU fits you and play with the settings to reach your desired outcome. BOINC has a lot of knobs to turn.

Agreed, but do know this has all been quite productive for me, despite the fact that, at times, we may have been talking at cross-purposes.

Also, my apologies if at times I got overly harsh in my wording. I tried to keep things at a simmer throughout, but occasionally my frustration may have got the better of me.

San-Fernando-Valley

Joined: 16 Mar 16

Posts: 401

Credit: 10139713455

RAC: 25976194

22 Jun 2024 6:27:19 UTC

Message 226221 in response to message 226171

Ian&Steve C. wrote:

I think there's been enough shop-talk and speculation about what can or will happen. I think you should just buy whatever GPU fits you and play with the settings to reach your desired outcome. BOINC has a lot of knobs to turn.

hadron

Joined: 27 Jan 23

Posts: 62

Credit: 98585697

RAC: 538884

26 Jun 2024 3:48:49 UTC

Message 226338

Well, blah. I went ahead last Friday, and got the RX7600XT; it arrived yesterday, so I went to work to get it up.

The only OpenCL package that comes with SUSE is from Mesa. It recognizes the RX560, but not the 7600XT -- totally useless with BOINC.

So off I went, looking for something else from the open-source community -- but I am not about to turn this machine into a developer's haven, so there was only one thing left: off to talk to the gods themselves.

AMD has a very nice-looking set of packages that have a lot more features than just OpenCL. Unfortunately, AMD clearly states that the software supports only the following GPUs:

AMD Radeon RX 7900 XTX
AMD Radeon RX 7900 XT
AMD Radeon RX 7900 GRE
AMD Radeon PRO W7900
AMD Radeon PRO W7900DS
AMD Radeon PRO W7800

I found out the hard way that that means just what it says: the new card still isn't recognized, nor is the old one.

Furthermore, when I came back after a reboot, BOINC wasn't even able to run -- it seems that somehow, it wasn't able to locate its own home directory.

OK, time to go back to the start -- bye-bye AMD stuff, reboot to clear out all the pesky kernel modules it loaded, and I am back to where I was before. The only thing I can say right now is, thank the gawds I didn't have to do a complete version rollback to get things running again.

So now I have two choices:

I can post in the AMD forums and wait for who knows how long for someone to get back to me -- and I suspect that any answer there will be "at present, there is no solution"

or, I can just bite the bullet, spend the extra money, and get an NVidia card. There's a nice looking one, an RTX 4060 Ti, that's less than $300 more than what I've already spent. What's a while longer eating more PB sandwiches, and less steak? lol

GWGeorge007

Joined: 8 Jan 18

Posts: 3060

Credit: 4964497686

RAC: 1411113

26 Jun 2024 17:05:27 UTC

Message 226352 in response to message 226338

hadron wrote:

From: 26 Jun 2024 3:48:49 UTC MESSAGE 226338

Well, blah. I went ahead last Friday, and got the RX7600XT; it arrived yesterday, so I went to work to get it up.

.....snip.....

AMD has a very nice-looking set of packages that have a lot more features than just OpenCL. Unfortunately, AMD clearly states that the software supports only the following GPUs:

I found out the hard way that that means just what it says: the new card still isn't recognized, nor is the old one.

.....snip.....

So now I have two choices:

I can post in the AMD forums and wait for who knows how long for someone to get back to me -- and I suspect that any answer there will be "at present, there is no solution"

or, I can just bite the bullet, spend the extra money, and get an NVidia card. There's a nice looking one, an RTX 4060 Ti, that's less than $300 more than what I've already spent. What's a while longer eating more PB sandwiches, and less steak? lol

Hadron,

If you haven't found out by now, I suggest you look at "Ian&Steve C."'s profile / computer log. He is a very influential 'volunteer' cruncher with BOINC, and especially with Einstein and our GPU Users Group, and so is Keith Myers for that matter.

Ian had at one time said, "I have extensive experience with BOINC", and it is my impression that a person such as yourself should really listen to (read?) him and not argue with him about the complexities of BOINC. He is, and was, really trying to help you.

I know you have been with BOINC/Einstein for many years (19, to be exact, going by a search), and yet you have few credits to show for that experience. If I'm wrong, and you have only been with BOINC/Einstein for less than a year and a half, my comments above should mean even more to you.

I know the budget is tight, as it is for many of us, but please follow Ian's advice if you want to be successful with Einstein... and BOINC. It will likely save us all from being overwhelmed by the long diatribes of endless back-and-forth arguing that went on in the past.

George

Proud member of the Old Farts Association

hadron

Joined: 27 Jan 23

Posts: 62

Credit: 98585697

RAC: 538884

26 Jun 2024 18:40:39 UTC

Message 226353 in response to message 226352

GWGeorge007 wrote:

Hadron,

If you haven't found out by now, I suggest you look at "Ian&Steve C."'s profile / computer log. He is a very influential 'volunteer' cruncher with BOINC, and especially with Einstein and our GPU Users Group, and so is Keith Myers for that matter.

Ian had at one time said, "I have extensive experience with BOINC", and it is my impression that a person such as yourself should really listen to (read?) him and not argue with him about the complexities of BOINC. He is, and was, really trying to help you.

All of that was quite obvious to me right from the start, which is why I went on for so long -- I was trying to pick his brain. I didn't regard any of it as argumentative, but in the end it was clear he was speaking from one vantage point, while I was speaking from another. Apologies that it did go on for so long, but in the end I did find out what I wanted to know.

GWGeorge007 wrote:

I know you have been with BOINC/Einstein for many years (19, to be exact, going by a search), and yet you have few credits to show for that experience. If I'm wrong, and you have only been with BOINC/Einstein for less than a year and a half, my comments above should mean even more to you.

Actually, when I look at the other places I'm involved with (LHC and Rosetta), I see I joined those in early Sept. 2022 -- and this would be when I first started running BOINC. I've also crunched for WCG and Cosmology@H, but I'm pretty sure I joined those after I joined LHC and Rosetta. For reasons I think are obvious, I left those some time ago.

So I don't know how you came up with 19 years.

Ian&Steve C.

Joined: 19 Jan 20

Posts: 3945

Credit: 46725482642

RAC: 64344990

26 Jun 2024 19:09:06 UTC

Message 226355

Yeah, I think we're on the same page now. No worries. I like an occasional rap battle :)

But it does look like you ran into my biggest gripe with AMD (at least on Linux): their terrible driver support. If someone were to google AMD Linux drivers without any further context, they might think support is great and Linux already includes the drivers. But most of those online comments and statements are about the Mesa drivers, which are basically useless for most BOINC projects, and in the cases where they work at all, they are many times slower than the ROCm or AMDGPU proprietary drivers. Those options, however, come with the caveat of supporting only certain configurations/platforms/hardware/environments, unless you want to go through the trouble of compiling things specifically for your environment and/or hacking up the driver packages to extract and manually install the bits you need.

I don't have any experience with openSUSE specifically, but I did a 5-minute Google search and found some interesting discussions about how to install ROCm drivers on openSUSE, so it might be possible. I would suggest doing your own research on that, as you can probably parse the search results and more easily determine what is or is not applicable to you.

Nvidia drivers, on the other hand, by and large "just work" without all these edge-case caveats and hoops to jump through. Nvidia suits me better personally, even at the (relatively) higher price: I'm comfortable with it; I run a few applications/projects that are Nvidia-only (GPUGRID has only CUDA apps, so AMD can't contribute, and there are some other special/custom optimized CUDA apps for other projects); and at this point I prefer to use CUDA apps almost exclusively, since I can further optimize with Nvidia's MPS, which also only works with CUDA apps (and is Linux-only as well). I also don't have many qualms about buying used hardware: the price:performance ratio becomes more favorable, and I have rarely had a used GPU fail.

GWGeorge007

Joined: 8 Jan 18

Posts: 3060

Credit: 4964497686

RAC: 1411113

26 Jun 2024 20:27:03 UTC

Message 226358 in response to message 226353

hadron wrote:

All of that was quite obvious to me right from the start, which is why I went on for so long -- I was trying to pick his brain. I didn't regard any of it as argumentative, but in the end it was clear he was speaking from one vantage point, while I was speaking from another. Apologies that it did go on for so long, but in the end I did find out what I wanted to know.

Actually, when I look at the other places I'm involved with (LHC and Rosetta), I see I joined those in early Sept. 2022 -- and this would be when I first started running BOINC. I've also crunched for WCG and Cosmology@H, but I'm pretty sure I joined those after I joined LHC and Rosetta. For reasons I think are obvious, I left those some time ago.

So I don't know how you came up with 19 years.

I found your NAME twice using the BOINC/Einstein search function:

I don't know if you and someone else had or have the same name or not, but it is what it is.

Regardless, I'm glad you are getting to use Ian's advice and did find out what you wanted to know.

George

hadron

Joined: 27 Jan 23

Posts: 62

Credit: 98585697

RAC: 538884

26 Jun 2024 20:58:57 UTC

Message 226360 in response to message 226358

GWGeorge007 wrote:

hadron wrote:

So I don't know how you came up with 19 years.

I found your NAME twice using the BOINC/Einstein search function:

On the other hand, if you look at my profile here, you will find this:

Member since: 27 January 2023

Country:

BOINC ID: 1045591

GWGeorge007

Joined: 8 Jan 18

Posts: 3060

Credit: 4964497686

RAC: 1411113

26 Jun 2024 21:20:10 UTC

Message 226364 in response to message 226360

hadron wrote:

On the other hand, if you look at my profile here, you will find this:

Member since: 27 January 2023

Country:

BOINC ID: 1045591

Like I said before:

GWGeorge007 wrote:

I don't know if you and someone else had or have the same name or not, but it is what it is.

Regardless, I'm glad you are getting to use Ian's advice and did find out what you wanted to know.

Please, let's not get too wound up over this. For all I know, you were at one time into Einstein, then for some reason dropped out and restarted Einstein again with a different BOINC ID #. Let's drop it now, okay?

George

hadron

Joined: 27 Jan 23

Posts: 62

Credit: 98585697

RAC: 538884

3 Jul 2024 5:06:46 UTC

Message 226561

OK, the NVidia card arrived yesterday (Jul 01) and I installed it today. I had a few tense moments where nothing seemed to be going right (details on request if anyone's at all interested), but in the end, once the other stuff was sorted, it turned out I had forgotten to add user boinc to the render and video groups. Once that was out of the way and the client restarted, BOINC was able to recognize the card, and down came a few GW tasks and a couple of BRP7.
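For anyone hitting the same snag: on many Linux distributions the account the BOINC client runs as typically needs to be in the `render` and `video` groups to reach the GPU device nodes. Adding the user is done as root with `usermod -aG render,video boinc`, followed by a client restart so the new membership takes effect. The Python sketch below only checks membership; it assumes the service account is named `boinc`, as in the post, and `missing_groups` is a made-up helper name.

```python
import grp
import pwd

REQUIRED_GROUPS = ("render", "video")


def missing_groups(user: str, primary_gid: int, groups) -> list:
    """Return the required groups `user` is not in.

    `groups` is an iterable of (name, gid, members) tuples, as built from
    grp.getgrall(); a group also counts if it is the user's primary group.
    """
    have = {name for name, gid, members in groups
            if user in members or gid == primary_gid}
    return [g for g in REQUIRED_GROUPS if g not in have]


def main(user: str = "boinc") -> None:
    try:
        primary_gid = pwd.getpwnam(user).pw_gid
    except KeyError:
        print(f"no such user: {user}")
        return
    groups = [(g.gr_name, g.gr_gid, g.gr_mem) for g in grp.getgrall()]
    missing = missing_groups(user, primary_gid, groups)
    print("missing groups:", ", ".join(missing) if missing else "none")


if __name__ == "__main__":
    main()
```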

The first 4 O3AS tasks have now been reported, total run time of just over an hour. Personally, I think that compares reasonably favourably with the half-hour times Ian&Steve's monster crunchers require.
