CPU speed impact on GPU processing

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3952
Credit: 46784242642
RAC: 64192323

archae86 wrote:
This is an application difference.

Well, yes, but the Nvidia card can't run the AMD app, or vice versa ;-). I didn't mean to imply that it was any kind of hardware difference in the cards themselves.

 

But are you saying that the best performance comes from a less-utilized CPU, in the sense of how many spare CPU resources are available to service other tasks? My systems are on crunching duty only and have minimal other work to do besides the background OS functions. If so, it seems to fit my observed behavior thus far: the 10x 2070 system has 40 threads at its disposal and only uses 10 of them to service GPU work, leaving 30 threads free. This system seems to handle the GW tasks best in terms of GPU utilization.

By contrast, the 7x 2080 watercooled system has 12 total threads and uses 7 for GPU work, leaving 5 free. It can't seem to load the GPUs beyond about 50-60%.

And the 7x 2070 system has 8 total threads and uses 7 for GPU support, leaving only 1 free. Performance on the GW tasks is not great (even lower GPU use, ~20-30%), and even the GR tasks run 4% slower than on the 2070s in the 10-GPU system.


rromanchuk
Joined: 4 May 18
Posts: 7
Credit: 9902647
RAC: 0

I've been doing some sanity tests just so I can better understand this specific (GW) application's utilization. I'm new to BOINC and its underlying execution architecture, so I'm very prone to user error. It would be nice to run this directly from the command line against a single WU and pass my own flags, which I assume is what app_config.xml is doing.

After suspending everything, I tried a handful of extreme configurations just so I could easily see the changes in system utilization, but regardless of configuration, GPU/CPU usage remains oddly consistent across runs.

I have to slow down and RTFM first. 

rromanchuk
Joined: 4 May 18
Posts: 7
Credit: 9902647
RAC: 0

One more interesting thing I noticed from just tailing the PID:

 

Process: einstein_O2MDF_2.02_x86_64-apple-darwin__GW-opencl-ati
CPU Utilization: 25%
Threads: 10
GPU: 6%
Idle wake-ups: >2000

 

That 25%, in combination with so many wake-ups, feels like I have a configuration error somewhere. I would understand 6% GPU if the CPU were bottlenecked, but maybe I'm confused about how the CPU and GPU interact in a multicore environment. It just seems suspect that I can't break 25% for that process regardless of all the permutations I have tried, as if there is an overriding config hiding somewhere.

 

<app_config>
   <app_version>
       <app_name>einstein_O2MDF</app_name>
       <plan_class>GW-opencl-ati</plan_class>
       <avg_ncpus>1</avg_ncpus>
       <ngpus>1</ngpus>
   </app_version>
</app_config>
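For what it's worth, if I wanted the scheduler to run two of these per GPU, my understanding (unverified, so treat this as a sketch) is that a fractional <ngpus> does it; <avg_ncpus> is only a scheduling budget hint, not a hard cap on what the process can use:

<app_config>
   <app_version>
       <app_name>einstein_O2MDF</app_name>
       <plan_class>GW-opencl-ati</plan_class>
       <avg_ncpus>1</avg_ncpus>
       <!-- 0.5 GPUs per task lets the scheduler start 2 tasks per GPU -->
       <ngpus>0.5</ngpus>
   </app_version>
</app_config>

After editing, the client has to re-read the config files (boinccmd --read_cc_config, or Options -> Read config files in the manager) before the change takes effect.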

mikey
Joined: 22 Jan 05
Posts: 12682
Credit: 1839086161
RAC: 3857

rromanchuk wrote:

I've been doing some sanity tests just so I can better understand this specific (GW) application's utilization. I'm new to BOINC and its underlying execution architecture, so I'm very prone to user error. It would be nice to run this directly from the command line against a single WU and pass my own flags, which I assume is what app_config.xml is doing.

After suspending everything, I tried a handful of extreme configurations just so I could easily see the changes in system utilization, but regardless of configuration, GPU/CPU usage remains oddly consistent across runs.

I have to slow down and RTFM first. 

Just remember that most apps at a Project are, for the most part, written by the same person or group of people. BUT other Projects rarely share that person or group, so it could be a long learning curve if you try different Projects and then compare them. Strictly as a user, I find there is little to no acknowledged interaction between Projects even at that level, which to me would be a no-brainer for keeping our contributions at the highest level possible.

And I do believe you can run BOINC from a command prompt in Windows... a lot of Linux users do it now. I personally am both a Windows user and a very novice Linux user, but I use the GUI version on both platforms. I have crunched from a command prompt in Windows, but it was an app that used the GPU, not BOINC itself.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3952
Credit: 46784242642
RAC: 64192323

Here's a sample of what it looks like running 2x Gravity Wave tasks on each of the 10x RTX 2070s (20 tasks running), with 40 cores running at about 3.0 GHz.

GPU utilization is good, in the mid-90s for the most part.

CPU utilization is fine, but on average each GPU WU uses more than one whole core: consistent 60-75% CPU use while all jobs are running, with 28-30 threads needed to feed 20 GPU jobs.

Average throughput improved with 2x on this system: over 6 minutes per WU at 1x, versus 10.5-11 minutes for each WU at 2x, so two concurrent WUs finish faster than two run back to back.
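Rough throughput math from those numbers (estimates only, using ~6 min per WU at 1x and ~10.75 min at 2x):

1x: 10 GPUs x 1 WU / 6 min   = ~100 WU/hour
2x: 20 WUs / 10.75 min       = ~112 WU/hour

so running 2x buys roughly a 10-12% throughput gain on this system.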

 

It's a shame the GW WUs "pay" less when they require so much in the way of system resources.


Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Just a note to take into consideration: the Gravity Wave GPU tasks do need a lot of CPU support (a substantial part of the work has not yet been ported to the GPU, or can't be ported), so take care when talking about cores versus threads. My experience is that each WU requires a full physical CPU core to support it. If hyperthreading is in use and 2 GPU WUs share one physical core, performance will suffer. Likewise, AMD CPUs that share FPUs between cores will probably suffer as well.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3952
Credit: 46784242642
RAC: 64192323

Not sure if you read my post in its entirety. I did mention that it's using more than an entire thread for each GPU WU. I don't think there's any way to limit CPU use by an app if you have available CPU; running 2x WUs won't limit the 2 jobs to a single core. It's obviously overflowing a bit into unused threads. My guess is the app was coded for some level of multithreading capability.

I have plenty of threads to spare. Performance is better overall (perhaps you prefer the term "efficiency") running 2x if you look at the run times of 1x vs. 2x. But this is probably only possible because I have so many threads available.


Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

Holmis wrote:
Or using AMD CPUs that share FPUs between cores will probably suffer as well.

It appears that the Ryzen 3000 series is better in that respect?

https://www.anandtech.com/show/14525/amd-zen-2-microarchitecture-analysis-ryzen-3000-and-epyc-rome/9

Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18718249325
RAC: 6371585

Yes, Zen does not share FPU resources between cores the way old Bulldozer did (Bulldozer shared one FPU between the two cores of a module). Zen 2 also improves on the original Zen by doubling the FPU width to 256-bit.

 

rromanchuk
Joined: 4 May 18
Posts: 7
Credit: 9902647
RAC: 0

[just saw the new posts catching up]

One last follow-up. I ran hsgamma_FGRPB1G_1.17_x86_64-apple-darwin__FGRPopencl-ati-mav, which anecdotally runs well. Turns out I don't know what I'm talking about, because tailing that process shows about 20% CPU / 80% GPU utilization, ~5 threads, and similar wake-up reports. Same goes for SETI's app.

Obviously we're talking about different software, but even from a software architecture/OpenCL standpoint it bothers me that I don't understand what the software is bound by. It's not bound by any of the obvious culprits: I/O (paging/swapping), CPU, or GPU. Seeing the CPU pegged at the same level regardless of settings makes me believe it actually is at 100% of some core/thread confinement or driver constraint, and I'm just seeing an obfuscated view of the utilization the OS reports.

I'm kind of curious now, though; I might dust off my C/OpenCL and build from source so I can isolate it from BOINC. The nice thing is that BOINC has a debug mode where it will pull the WU and create the slot but won't execute, so it should be pretty easy to run my own binary against real sample data.
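Something like this workflow is what I have in mind (a sketch only: --exit_before_start is the client's documented debugging flag for exiting just before a job launches, the slot number is a placeholder, and the task's actual flags should be recoverable from the <command_line> element in client_state.xml):

boinc --exit_before_start    # fetch the WU and stage the slot directory, then exit before exec
cd slots/0                   # the staged input files live here
# run my own build of the app binary against these inputs, using the flags from client_state.xml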

Man, science software projects are crazy (https://git.ligo.org/lscsoft/lalsuite/tree/master/lalapps/src/pulsar); it will be a miracle if I can even find up-to-date information on how E@H packages the GW binaries. The link to FGRPB1G's source code is literally a link to a zip file.

 
