03:32:25 (24556): [debug]: Set up communication with graphics process.
...
% C 0 67
% C 0 135
% C 0 203
... checkpoints ~70s apart
% C 0 611
% C 0 679
% C 0 747 (12:27 -> 03:44:52)
...
04:36:14 (24556): [normal]: done. calling boinc_finish(0).
Task details for one of the iGPU tasks (OpenCL device "gfx1036"):
Received: 29 Apr 2023 12:05:50 UTC
Run time (sec): 8,340.26
CPU time (sec): 160.02
Using OpenCL device "gfx1036" by: Advanced Micro Devices, Inc.
Hi Martin,
Looking at the task log for one of your GR tasks from earlier today, I see a huge problem:
The time from start to the last checkpoint was 12m27s, but 'done' came 51m22s after that last checkpoint. This gap is (mostly) the remaining 11% that gets processed on the CPU, which typically takes 15-20s of CPU time.
What is the CPU doing that prevents it from dealing with this? Too much CPU work from BRP/GW? Priorities favoring CPU work?
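For anyone who wants to retrace those two durations, here is a minimal sketch in Python. It uses only the timestamps from the log excerpt and treats the last checkpoint counter (747) as seconds since start, which is an assumption that is questioned elsewhere in this thread. The helper name `mmss` is made up for this example.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S"
start = datetime.strptime("03:32:25", FMT)  # first log line
done = datetime.strptime("04:36:14", FMT)   # "done. calling boinc_finish(0)"

# ASSUMPTION: the value in "% C 0 747" is elapsed seconds.
# The log itself does not say so; it may be a plain progress counter.
last_counter = 747
last_checkpoint = start + timedelta(seconds=last_counter)

def mmss(delta):
    """Format a timedelta as e.g. '12m27s' (hypothetical helper)."""
    minutes, seconds = divmod(int(delta.total_seconds()), 60)
    return f"{minutes}m{seconds:02d}s"

start_to_checkpoint = mmss(last_checkpoint - start)
checkpoint_to_done = mmss(done - last_checkpoint)
total_runtime = mmss(done - start)

print("last checkpoint at", last_checkpoint.strftime(FMT))  # 03:44:52
print("start -> last checkpoint:", start_to_checkpoint)     # 12m27s
print("last checkpoint -> done:", checkpoint_to_done)       # 51m22s
print("start -> done:", total_runtime)                      # 63m49s
```

Only the total of roughly 64 minutes follows from the logged timestamps alone. The 12m27s / 51m22s split stands or falls with the seconds assumption.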
mountkidd wrote: % C 0 747 (12:27 -> 03:44:52)
These aren't seconds but some kind of subroutine or progress counter, so 747 isn't 12m27s. The progress between checkpoints differs depending on the CPU speed (e.g. with an iGPU), the GPU speed, BOINC's CPU throttle configuration, or the CPU load (other science apps consuming memory bandwidth). There are 11 checkpoints within this task. End time minus start time is ~64 minutes. I assume a checkpoint interval of 300 seconds (BOINC's client configuration), so maybe 55-60 minutes had passed in total when the last checkpoint was written. Afterwards the final toplist calculation starts, which is often done on the CPU as many/most GPUs don't support FP64 (64-bit 'double precision' floating-point arithmetic).
There's no obvious problem here. The task runs for 64 minutes on the iGPU (Radeon) of this Ryzen 7 desktop CPU. Is that too slow? I don't know. On my old Core i7 such FGRPB1G tasks take 5 to 12 hours on the iGPU, depending on CPU load (memory bandwidth limits).
mountkidd wrote: [...] This is (mostly) the remaining 11% part that gets processed on the cpu, typically 15-20s cpu processing.
The AMD Ryzen 7 7700X iGPU (AMD Radeon(TM) Graphics (12284MB)), DeviceID "gfx1036" (see task log), is FP64 capable (see task log). There's no need to use the CPU for the final toplist computation.
So, I was wrong too...
But Martin (astro-marwil) doesn't want to use this iGPU for BOINC. So... how to activate the discrete AMD GPU card for BOINC, and how to disable the iGPU for BOINC without being forced to switch it off in the BIOS? I have no idea, but as Ian&SteveC, Mikey and SkipDaShu already suggested: the solution is a customized app_info.xml and/or cc_config.xml file.
project specific exclusion:
or for all projects...
Hello!
Many, many thanks to all of you, for all of your assistance!
Unfortunately, I'm having no luck with cc_config.xml. It currently looks like this:
<cc_config>
<exclude_gpu>
<url>https://einstein.phys.uwm.edu/</url>
<device_num>0</device_num>
</exclude_gpu>
</cc_config>
and in the notifications, the first line reads: unrecognized tag in cc_config.xml: <exclude_gpu>. I tried several variants, also with square brackets, without success. Is the URL correct? Should it be the address of the server from which the tasks are downloaded, which I couldn't find? What else could it be? I assume it is a minor mistake, like a missing space or so. But I have tested so much.
Also with
<cc_config>
<use_all_gpus>1</use_all_gpus>
</cc_config>
I get a corresponding message in the notifications. And besides, it does not work as I want.
Kind regards and happy crunching
Martin
astro-marwil
You need to put these inside an <options> element.
<cc_config>
<options>
<use_all_gpus>1</use_all_gpus>
<exclude_gpu>
<url>https://einstein.phys.uwm.edu/</url>
<device_num>0</device_num>
</exclude_gpu>
</options>
</cc_config>
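If you want to check the nesting before restarting the client, here is a rough sketch in Python. It only tests whether a few option tags sit directly under <cc_config> instead of inside <options>. The function name `misplaced_option_tags` and the short tag list are assumptions made up for this example; it is not part of BOINC and does not cover every option the client knows.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: only the option tags discussed in this thread are checked.
OPTION_TAGS = {"use_all_gpus", "exclude_gpu", "ignore_ati_dev"}

def misplaced_option_tags(xml_text):
    """Return option tags that are direct children of the root element."""
    root = ET.fromstring(xml_text)
    return sorted(child.tag for child in root if child.tag in OPTION_TAGS)

# The file as first posted: <exclude_gpu> directly under <cc_config>.
BROKEN = """<cc_config>
<exclude_gpu>
<url>https://einstein.phys.uwm.edu/</url>
<device_num>0</device_num>
</exclude_gpu>
</cc_config>"""

# The corrected layout: everything wrapped in <options>.
FIXED = """<cc_config>
<options>
<use_all_gpus>1</use_all_gpus>
<exclude_gpu>
<url>https://einstein.phys.uwm.edu/</url>
<device_num>0</device_num>
</exclude_gpu>
</options>
</cc_config>"""

print(misplaced_option_tags(BROKEN))  # ['exclude_gpu']
print(misplaced_option_tags(FIXED))   # []
```

An empty list only means the nesting looks right. Whether device 0 really is the GPU you want to exclude still has to be checked against the client's startup messages.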
astro-marwil
Here's a working cc_config.xml from a Ryzen 7 5700G with the iGPU excluded from all BOINC projects but still running my monitor on that box. The discrete card (RX 6600) is platform "0". The iGPU is platform "1".
<cc_config>
<log_flags>
<file_xfer>1</file_xfer>
<sched_ops>1</sched_ops>
<task>1</task>
</log_flags>
<options>
<ignore_ati_dev>1</ignore_ati_dev>
<max_file_xfers>8</max_file_xfers>
<max_file_xfers_per_project>5</max_file_xfers_per_project>
<ncpus>-1</ncpus>
</options>
</cc_config>
Skip
PS: I'd remove the max_file_xfers lines... left over from trying to do something in the past.
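Scrooge McDuck wrote: There's no obvious problem here. Task runs for 64 minutes on the CPU's iGPU (Radeon) of this Ryzen 7 Desktop CPU. That's too slow? I don't know.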
The obvious problem here is that the task was running on the iGPU/APU when the expectation was for it to run on the RX6600. A 6600 will crunch a GR task in 800-1200s. Taking 3800s is a big red flag.
We can revisit the timing of the checkpoints once the GR tasks are running on the 6600.
So... Martin (astro-marwil) now runs FGRPB1G tasks on iGPU and discrete GPU card in parallel:
discrete GPU task
iGPU task
The iGPU seems to be slowed down by many parallel O3MD1 CPU tasks consuming memory bandwidth. Maybe Martin hasn't yet had the time to try out different client configurations to disable the iGPU... But I'd assume that the discrete GPU runs fast enough now: 11 minutes. There are tasks running in less than 9 minutes too.
Happy crunching...
Scrooge McDuck wrote: iGPU seems to be slowed down by many parallel O3MD1 CPU tasks consuming memory bandwidth.
Big problem of consumer boards. I see memory bottlenecks even without using iGPU. Even though it's DDR5, two channels just don't cut it. Especially not if some iGPU wants their share too. And there is no nice value or rate limit for memory :)
And you don't get an iGPU on EPYC or W-3400.
Hello!
It now works as wanted!!!
Last night I found the error in the code that I had copied directly from the end of Ian&Steve's thread (Message 211606): a missing closing mark in the next-to-last line. I hadn't seen it, even after looking a hundred times. Before that, I had deleted the <exclude-gpu>. That also produced error messages, but it worked for a prolonged time with both GPUs, until yesterday, when an update for .NET Framework was installed. From then on, only the iGPU was in use, as before.
Nevertheless, it was great fun for me to write some lines of code again after 30 to 40 years.
From mid-May on, I'll start to optimize the workload for maximum Cobblestones/Wh. Currently 7 threads of O3 (CPU) and one of FGRP1G are running in parallel. That gives 85 to 90% CPU load and 75% RAM load, averaged over 1 minute. All cores are about equally loaded. Don't forget, I'm using Process Lasso, with priority set to below normal for O3 (CPU) and real-time for FGRP1G. Crunching more threads in parallel increases everything: CPU load, RAM load and the running times of the threads.
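A slip like the one described above (something not closed properly near the end of the file) can be caught before BOINC ever reads the file by running it through any XML parser. Below is a minimal sketch in Python. The broken sample is hypothetical, since the faulty file itself was not posted here, and `xml_error` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

def xml_error(xml_text):
    """Return None if the text is well-formed XML, else the parser message."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        return str(err)

# HYPOTHETICAL example: the slash is missing in the next-to-last line,
# so <options> is opened a second time instead of being closed.
SAMPLE_BROKEN = """<cc_config>
<options>
<use_all_gpus>1</use_all_gpus>
<options>
</cc_config>"""

SAMPLE_OK = SAMPLE_BROKEN.replace("<options>\n</cc_config>",
                                  "</options>\n</cc_config>")

print(xml_error(SAMPLE_BROKEN))  # parser message with line and column
print(xml_error(SAMPLE_OK))      # None
```

This only checks well-formedness. A file can pass here and still contain tags the BOINC client does not recognize.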