All things AMD GPU

mountkidd
Joined: 14 Jun 12
Posts: 176
Credit: 12546412555
RAC: 8031721

Hi Martin,

Looking at the task log for one of your GR tasks from earlier today, I see a huge problem:

03:32:25 (24556): [debug]: Set up communication with graphics process.
...
% C 0 67
% C 0 135
% C 0 203
...         checkpoints ~70s apart
% C 0 611
% C 0 679
% C 0 747   (12:27 -> 03:44:52)
...
04:36:14 (24556): [normal]: done. calling boinc_finish(0).

From the start to the last checkpoint took 12m27s. From the last checkpoint to 'done' took another 51m22s. That final stretch is (mostly) the remaining 11% that gets processed on the CPU, which typically needs 15-20s of CPU time.
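
The two intervals can be checked directly from the log timestamps. A minimal sketch in Python, assuming all three timestamps fall on the same day (the log excerpt shows no date); the helper name `elapsed` is only illustrative:

```python
from datetime import datetime

def elapsed(start: str, end: str) -> int:
    """Return the seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())

# Timestamps copied from the task log above.
start, last_checkpoint, done = "03:32:25", "03:44:52", "04:36:14"

to_checkpoint = elapsed(start, last_checkpoint)   # start -> last checkpoint
to_done = elapsed(last_checkpoint, done)          # last checkpoint -> 'done'

for label, secs in (("start -> last checkpoint", to_checkpoint),
                    ("last checkpoint -> done", to_done),
                    ("total", to_checkpoint + to_done)):
    print(f"{label}: {secs} s = {secs // 60}m{secs % 60:02d}s")
```

The first interval comes out at 747 s, which is the same number as the last checkpoint counter in the log if that counter is read as elapsed seconds.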

What is the CPU doing that prevents it from dealing with this?  Too much CPU work from BRP/GW?  Priorities favoring CPU work?

Scrooge McDuck
Joined: 2 May 07
Posts: 1052
Credit: 17867359
RAC: 12395

mountkidd wrote:

% C 0 203

...         checkpoints ~70s apart
% C 0 611
% C 0 679
% C 0 747   (12:27 -> 03:44:52)

These aren't seconds but some kind of subroutine or progress counter, so 747 isn't 12m27s. The progress between checkpoints varies with the CPU speed (e.g. for an iGPU), the GPU speed, BOINC's CPU throttle configuration, and the CPU load (other science apps consuming memory bandwidth). There are 11 checkpoints within this task, and end time minus start time is ~64 minutes. I assume a checkpoint interval of 300 seconds (BOINC's client configuration), so maybe 55-60 minutes passed until the last checkpoint was written. After that the final toplist calculation starts, which is often done on the CPU because many/most GPUs don't support FP64 (64-bit 'double precision' floating-point arithmetic).

There's no obvious problem here. The task runs for 64 minutes on the iGPU (Radeon) of this Ryzen 7 desktop CPU. Is that too slow? I don't know. On my old Core i7 such FGRPB1G tasks take 5 to 12 hours on the iGPU, depending on CPU load (memory bandwidth limits).

Scrooge McDuck
Joined: 2 May 07
Posts: 1052
Credit: 17867359
RAC: 12395

mountkidd wrote:

[...] This is (mostly) the remaining 11% part that gets processed on the cpu, typically 15-20s cpu processing. 

What is the cpu doing that prevents it from dealing with this?  Too much cpu work from BRP/GW?  Priorities favoring cpu work?

According to the task log, the AMD Ryzen 7 7700X iGPU (AMD Radeon(TM) Graphics (12284MB), device ID "gfx1036") is FP64 capable. So there's no need to use the CPU for the final toplist computation.

Scrooge McDuck wrote:
...After that the final toplist calculation starts, which is often done on the CPU because many/most GPUs don't support FP64 (64-bit 'double precision' floating-point arithmetic).

So, I was wrong too...

But Martin (astro-marwil) doesn't want to use this iGPU for BOINC. So how do you activate the discrete AMD GPU card for BOINC and disable the iGPU for BOINC without being forced to switch it off in the BIOS? I have no idea, but as Ian&Steve C., Mikey and Skip Da Shu already suggested, the solution is a customized app_info.xml and/or cc_config.xml file.

Project-specific exclusion:

<exclude_gpu>
   <url>project_URL</url>
   [<device_num>N</device_num>]
   [<type>NVIDIA|ATI|intel_gpu</type>]
   [<app>appname</app>]
</exclude_gpu>

or for all projects...

<ignore_ati_dev>N</ignore_ati_dev>

astro-marwil
Joined: 28 May 05
Posts: 531
Credit: 641776543
RAC: 1110146

Hello!

Many, many thanks to all of you, for all of your assistance!

Unfortunately, I've had no luck with cc_config.xml. It now looks like this:

<cc_config>
    <exclude_gpu>
        <url>https://einstein.phys.uwm.edu/</url>
        <device_num>0</device_num>
    </exclude_gpu>
</cc_config>

and the first line of the notifications says: unrecognized tag in cc_config.xml: <exclude_gpu>. I tried several variants, also with square brackets, without success. Is the URL correct? Should it be the address of the server from which the tasks are downloaded, which I couldn't find? What else could it be? I assume it's a minor mistake, like a missing space or so, but I have tested so much.

Also with

<cc_config>
    <use_all_gpus>1</use_all_gpus>
</cc_config>

I get a corresponding message in the notifications. Besides, it doesn't work the way I want.

Kind regards and happy crunching

Martin

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3945
Credit: 46591812642
RAC: 64178958

You need to put these tags inside an <options> element:


<cc_config>
   <options>
      <use_all_gpus>1</use_all_gpus>
      <exclude_gpu>
         <url>https://einstein.phys.uwm.edu/</url>
         <device_num>0</device_num>
      </exclude_gpu>
   </options>
</cc_config>
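
The "unrecognized tag" message reported above is what a misplaced tag produces. A rough pre-flight check can be sketched in Python with the standard library: it flags any direct child of <cc_config> other than <log_flags> and <options>. That allowed set is an assumption based on the configs posted in this thread, and `find_misplaced_tags` is a made-up helper name, not part of BOINC.

```python
import xml.etree.ElementTree as ET

# Assumption: only these sections sit directly under <cc_config>.
TOP_LEVEL_SECTIONS = {"log_flags", "options"}

def find_misplaced_tags(xml_text: str) -> list[str]:
    """Return names of tags placed directly under <cc_config> instead of in a section."""
    root = ET.fromstring(xml_text)
    if root.tag != "cc_config":
        raise ValueError(f"expected <cc_config> root, got <{root.tag}>")
    return [child.tag for child in root if child.tag not in TOP_LEVEL_SECTIONS]

# The config that triggered the notification.
broken = """<cc_config>
    <exclude_gpu>
        <url>https://einstein.phys.uwm.edu/</url>
        <device_num>0</device_num>
    </exclude_gpu>
</cc_config>"""

# The same settings wrapped in <options>.
fixed = """<cc_config>
   <options>
      <use_all_gpus>1</use_all_gpus>
      <exclude_gpu>
         <url>https://einstein.phys.uwm.edu/</url>
         <device_num>0</device_num>
      </exclude_gpu>
   </options>
</cc_config>"""

print(find_misplaced_tags(broken))  # tags that need to move into <options>
print(find_misplaced_tags(fixed))   # empty list when nothing is misplaced
```

As a side effect, `ET.fromstring` raises `xml.etree.ElementTree.ParseError` when the file is not well-formed, for example when a closing tag is missing.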

Skip Da Shu
Joined: 18 Jan 05
Posts: 151
Credit: 1039976322
RAC: 760684

Here's a working cc_config.xml from a Ryzen 7 5700G. The iGPU is excluded from all BOINC projects but still drives my monitor on that box.  The discrete card (RX 6600) is platform "0".  The iGPU is platform "1".

<cc_config>
    <log_flags>
        <file_xfer>1</file_xfer>
        <sched_ops>1</sched_ops>
        <task>1</task>
    </log_flags>
    <options>
        <ignore_ati_dev>1</ignore_ati_dev>
        <max_file_xfers>8</max_file_xfers>
        <max_file_xfers_per_project>5</max_file_xfers_per_project>
        <ncpus>-1</ncpus>
    </options>
</cc_config>

Skip

PS:  I'd remove the max_file_xfers lines... left over from trying to do something in the past.

mountkidd
Joined: 14 Jun 12
Posts: 176
Credit: 12546412555
RAC: 8031721

Scrooge McDuck wrote:

There's no obvious problem here. The task runs for 64 minutes on the iGPU (Radeon) of this Ryzen 7 desktop CPU. Is that too slow? I don't know.

The obvious problem here is that the task was running on the iGPU/APU when the expectation was for it to run on the RX6600.  A 6600 will crunch a GR task in 800-1200s.  Taking 3800s is a big red flag.

We can revisit the timing of the checkpoints once the GR tasks are running on the 6600.

Scrooge McDuck
Joined: 2 May 07
Posts: 1052
Credit: 17867359
RAC: 12395

So... Martin (astro-marwil) now runs FGRPB1G tasks on iGPU and discrete GPU card in parallel:

discrete GPU task

Received: 29 Apr 2023 10:02:47 UTC
Run time (sec): 665.24
CPU time (sec): 74.00
Using OpenCL device "gfx1032" by: Advanced Micro Devices, Inc.

iGPU task

Received: 29 Apr 2023 12:05:50 UTC
Run time (sec): 8,340.26
CPU time (sec): 160.02
Using OpenCL device "gfx1036" by: Advanced Micro Devices, Inc.

The iGPU seems to be slowed down by the many parallel O3MD1 CPU tasks consuming memory bandwidth. Maybe Martin hasn't yet had the time to try out different client configurations to disable the iGPU. But I'd assume that the discrete GPU runs fast enough now: about 11 minutes for this task, and there are tasks that finish in less than 9 minutes too.
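
For a sense of scale, the two run times quoted above can be compared directly. A small Python sketch; note that these are two single tasks, not averages, so the ratio is only a rough indication:

```python
# Run times (seconds) copied from the two task reports above.
discrete_gpu_s = 665.24   # OpenCL device gfx1032
igpu_s = 8340.26          # OpenCL device gfx1036

slowdown = igpu_s / discrete_gpu_s
print(f"iGPU task took {slowdown:.1f}x as long as the discrete GPU task")
print(f"discrete GPU: {discrete_gpu_s / 60:.1f} min, iGPU: {igpu_s / 60:.1f} min")
```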

Happy crunching...

Exard3k
Joined: 25 Jul 21
Posts: 66
Credit: 56155179
RAC: 0

Scrooge McDuck wrote:

So... Martin (astro-marwil) now runs FGRPB1G tasks on iGPU and discrete GPU card in parallel:

iGPU seems to be slowed down by many parallel O3MD1 CPU tasks consuming memory bandwidth.

That's a big problem with consumer boards. I see memory bottlenecks even without using the iGPU. Even though it's DDR5, two channels just don't cut it, especially not if an iGPU wants its share too. And there is no nice value or rate limit for memory :)

And you don't get an iGPU on EPYC or W-3400.

astro-marwil
Joined: 28 May 05
Posts: 531
Credit: 641776543
RAC: 1110146

Hello!

It does work now as wanted!!!

Last night I found the error in the code that I had copied directly from the end of Ian&Steve's post (Message 211606): the closing tag in the next-to-last line was missing. I hadn't seen it, even after looking a hundred times. Before that, I had deleted the <exclude_gpu>. That also produced error messages, but it worked with both GPUs for a prolonged time, until yesterday, when an update for the .NET Framework was installed. From then on, only the iGPU was in use, as before.

Nevertheless, it was great fun for me to write a few lines of code again after 30 to 40 years.

From mid-May on, I'll start to optimize the workload for maximum Cobblestones/Wh. Right now 7 threads of O3 (CPU) and one of FGRPB1G are running in parallel. That gives 85 to 90% CPU load and 75% RAM load, averaged over 1 minute. All cores are about equally loaded. Don't forget, I'm using Process Lasso, with the priority set to below normal for O3 (CPU) and to real-time for FGRPB1G. Crunching more threads in parallel increases everything: CPU load, RAM load and the running times of the threads.
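
The Cobblestones/Wh figure is simply credit divided by the energy used over the same period. A minimal Python sketch of that metric; the function name `credits_per_wh` and all input numbers are hypothetical placeholders, not measurements from this host:

```python
def credits_per_wh(credit: float, avg_power_w: float, hours: float) -> float:
    """Credit earned per watt-hour of energy consumed over the same period."""
    energy_wh = avg_power_w * hours
    if energy_wh <= 0:
        raise ValueError("energy must be positive")
    return credit / energy_wh

# Hypothetical numbers for illustration only: 100,000 credits earned in
# 24 hours at an average wall power of 200 W.
example = credits_per_wh(credit=100_000, avg_power_w=200, hours=24)
print(f"{example:.2f} Cobblestones/Wh")
```

Comparing this value across configurations (number of CPU threads, GPU tasks in parallel) only makes sense if the power is measured at the wall over the same interval in which the credit was earned.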

Many thanks to all of you!

Kind regards and happy crunching

Martin
