Multidirectional 2.07 doesn't obey CPU limits

Pavel Hanak
Joined: 27 Jul 06
Posts: 9
Credit: 35,337,085
RAC: 16,843
Topic 220570

Hi all, I have two AMD Threadripper machines (1950X and 2920X), and during idle times I have BOINC Manager (version 7.14.2) set to use around 90% of the CPUs. With other apps this works pretty well: real CPU utilization stays between 85 and 95%. But the new Multidirectional 2.07 app doesn't obey that; it consumes around 15% more CPU than the number of active CPU cores would suggest. For example, when 13 WUs are running, I have to lower the CPU limit below 80%, and I still see CPU utilization near 100%. The system (Win10 64-bit Pro) also becomes really sluggish and unresponsive. Is there some way to fix this (apart from limiting the number of concurrent Multi 2.07 WUs via app_config.xml)?

Pavel Hanak
Joined: 27 Jul 06
Posts: 9
Credit: 35,337,085
RAC: 16,843

Just to clarify, the executable name is "einstein_O2MD1_2.07_windows_x86_64__GWnew.exe".

Zalster
Joined: 26 Nov 13
Posts: 3,054
Credit: 3,341,604,897
RAC: 3,194

Pavel Hanak wrote:
Hi all, I have two AMD Threadripper machines (1950X and 2920X), and during idle times I have BOINC Manager (version 7.14.2) set to use around 90% of the CPUs. With other apps this works pretty well: real CPU utilization stays between 85 and 95%. But the new Multidirectional 2.07 app doesn't obey that; it consumes around 15% more CPU than the number of active CPU cores would suggest. For example, when 13 WUs are running, I have to lower the CPU limit below 80%, and I still see CPU utilization near 100%. The system (Win10 64-bit Pro) also becomes really sluggish and unresponsive. Is there some way to fix this (apart from limiting the number of concurrent Multi 2.07 WUs via app_config.xml)?

 

That is not what "Use at most N% of the CPUs" means. It limits how many threads BOINC will use, not how busy each one is: on an 8-thread CPU, 90% means use 7 of 8 threads, and 75% means use 6 of 8. It does not mean use 90% of one thread. The way to prevent sluggishness is to limit the total number of work units running on your system, and an app_config.xml with <max_concurrent> is probably the best way to do it. GPU work units have priority over CPU ones, so work out how many work units in total you want to run and put that value in <max_concurrent> in the app_config.xml.
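To make the thread arithmetic concrete, here is a minimal Python sketch of how the percentage setting maps to a thread count. The function name is made up for illustration, and the rounding rule (truncate, but never go below one thread) is an assumption that happens to match the 7-of-8 and 6-of-8 figures above; it is not taken from the BOINC source.

```python
def usable_threads(total_threads: int, percent: int) -> int:
    """Threads BOINC may schedule under a 'use at most N% of the CPUs' limit.

    Assumption: the product is truncated, with a minimum of one thread.
    Integer arithmetic avoids floating-point surprises.
    """
    return max(1, total_threads * percent // 100)


# The 8-thread examples from the post, plus a 24-thread machine at 50%.
for threads, pct in ((8, 90), (8, 75), (24, 50)):
    print(f"{threads} threads at {pct}% -> {usable_threads(threads, pct)} usable")
```

Note that this only counts threads; it says nothing about how much CPU each running task actually consumes.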

 

Pavel Hanak
Joined: 27 Jul 06
Posts: 9
Credit: 35,337,085
RAC: 16,843

I know the %CPUs setting limits the number of threads, not actual CPU utilization. But for all the other BOINC apps I run (SETI and LHC), the two are more or less equal. With Multidirectional 2.07, however, 50% CPUs translates to roughly 60% actual utilization. Since my machines have 32 and 24 threads respectively, this... inaccuracy can quickly swamp the CPU to a full 100% load. Is this a "feature" (and not a bug) of the new app, then? Or is it just some quirk on Threadripper machines?

Edit: I prepared an example - I suspended all other apps and set %CPUs to 50% (12 threads running the new 2.07 app). But as you can see from the screenshot, the actual utilization oscillates between 60 and 65%, not around 50% as would be expected. So the inaccuracy is actually even worse than I thought in the first post.

https://imgur.com/a/KL62DgM
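The numbers in the edit above can be turned into a rough per-task figure. This is a back-of-the-envelope sketch in Python, assuming the test ran on the 24-thread machine (12 threads at 50%) and that all of the observed load comes from the 12 running tasks; the function name is hypothetical.

```python
def threads_per_task(total_threads: int, observed_util_percent: float,
                     running_tasks: int) -> float:
    """Average number of hardware threads kept busy by each running task."""
    busy_threads = total_threads * observed_util_percent / 100
    return busy_threads / running_tasks


# Observed utilization oscillated between 60% and 65% with 12 tasks running.
low = threads_per_task(24, 60, 12)
high = threads_per_task(24, 65, 12)
print(f"{low:.2f} to {high:.2f} threads per task")
```

Under those assumptions each task keeps roughly 1.2 to 1.3 threads busy rather than one.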

Ian&Steve C.
Joined: 19 Jan 20
Posts: 183
Credit: 319,858,204
RAC: 135,928

I'm not sure whether the CPU tasks behave the same way as the GPU tasks, but the GPU task seems to request more than 100% of a CPU thread, and the excess spills over instead of being capped at one thread. If it requests 150% CPU, it ends up taking 1.5 threads.

However, the BOINC scheduler assumes each job is limited to 100% and doesn't account for it taking more, so it allows, say, 6 jobs to run that are actually using 8 or more threads. Just a guess. Are you seeing more threads/tasks running than there should be, or are you only going by the CPU%?
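As a quick illustration of that guess, this Python sketch compares what the scheduler budgets with what the tasks would really take. The 150% figure is the hypothetical one from the post, not a measurement, and the function name is invented for the example.

```python
def threads_in_use(tasks: int, cpu_percent_per_task: int) -> float:
    """Total threads consumed when every task uses the given share of a thread."""
    return tasks * cpu_percent_per_task / 100


tasks = 6
print("budgeted by scheduler:", threads_in_use(tasks, 100))  # one thread per task
print("actually consumed:    ", threads_in_use(tasks, 150))  # 1.5 threads per task
```

With six tasks the gap is already three threads, which is in line with the "8 or more" estimate.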

 

 



Zalster
Joined: 26 Nov 13
Posts: 3,054
Credit: 3,341,604,897
RAC: 3,194

Pavel Hanak wrote:

I know the %CPUs setting limits the number of threads, not actual CPU utilization. But for all the other BOINC apps I run (SETI and LHC), the two are more or less equal. With Multidirectional 2.07, however, 50% CPUs translates to roughly 60% actual utilization. Since my machines have 32 and 24 threads respectively, this... inaccuracy can quickly swamp the CPU to a full 100% load. Is this a "feature" (and not a bug) of the new app, then? Or is it just some quirk on Threadripper machines?

Edit: I prepared an example - I suspended all other apps and set %CPUs to 50% (12 threads running the new 2.07 app). But as you can see from the screenshot, the actual utilization oscillates between 60 and 65%, not around 50% as would be expected. So the inaccuracy is actually even worse than I thought in the first post.

https://imgur.com/a/KL62DgM

This is probably best answered by the developer of the app; any response we give is a guess at best. Assuming the figure of 120% of a CPU per task is correct, you would need to use max_concurrent to limit the number of tasks so that their combined CPU usage stays within the number of threads you want to allow. In this case, with 120% CPU per task and a limit of 12 CPUs, you would need to set the maximum to 10 work units. This isn't the only project where that has happened: I've seen other projects gobble up anywhere from 4 down to 2 CPU threads per work unit and then, in the last few minutes, drop significantly below 1 thread. I think it depends on how they wrote the app.
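The 12-thread, 120% case works out as follows. This Python sketch computes the limit and prints a matching app_config.xml; the helper names are hypothetical, and the app name "einstein_O2MD1" is only a guess based on the executable name, so check the real short name in client_state.xml before using it.

```python
def max_concurrent(thread_budget: int, cpu_percent_per_task: int) -> int:
    """Largest task count whose combined CPU use fits the thread budget."""
    return max(1, thread_budget * 100 // cpu_percent_per_task)


def render_app_config(app_name: str, limit: int) -> str:
    """Build a minimal app_config.xml that caps concurrent tasks for one app."""
    return (
        "<app_config>\n"
        "  <app>\n"
        f"    <name>{app_name}</name>\n"
        f"    <max_concurrent>{limit}</max_concurrent>\n"
        "  </app>\n"
        "</app_config>\n"
    )


limit = max_concurrent(12, 120)  # 12 threads allowed, 120% CPU per task
# "einstein_O2MD1" is an assumed app name, taken from the executable name.
print(render_app_config("einstein_O2MD1", limit))
```

The file goes in the project's folder under the BOINC data directory, and the client picks it up after "Read config files" or a restart.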

Pavel Hanak
Joined: 27 Jul 06
Posts: 9
Credit: 35,337,085
RAC: 16,843

And... do the developers read these forums?

 

In any case, I double-checked whether there are any extra "zombie" apps running, like Ian&Steve suggested (there were none). It is true that every einstein_O2MD1 app runs 3 threads, but that shouldn't matter: some LHC apps run over 30 threads, yet still consume 100% CPU, not 120%. And yes, one LHC app consumes >100% CPU when it initializes, but einstein_O2MD1 consumes that 120% permanently. In the past, I also encountered problems with thread scheduling: some apps weren't programmed with 16-core/32-thread CPUs in mind, and their threads jumped very rapidly between physical CPUs, causing slowdowns and lockups. But that is not the case with einstein_O2MD1 either: if I manually set the process affinity to a single physical CPU, the app still consumes 120% of CPU time.

I also tried to limit the "excess" CPU time in app_config with the <avg_ncpus>, <max_ncpus> and <cpu_usage> options, but none of them helped.

Oh well, I will just limit the number of concurrent einstein_O2MD1 apps for the time being. Thanks for your input, though.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 183
Credit: 319,858,204
RAC: 135,928

Pavel Hanak wrote:
I double-checked there are no extra "zombie" apps running like Ian&Steve suggested (there were none).

That's not really what I was suggesting. I was just asking for a spot check that you were running the expected number of tasks.

As I said, on the GPU apps I've observed the task requesting >100% CPU (which is a per-thread value, not a total). Over 100% means more than one thread is being used.

You can see that in this screenshot: https://i.imgur.com/bBO3z1B.png

BOINC assumes each task uses 100% of one thread. So when you set a limit of 50% total CPU in the compute settings, it counts each running task as one full thread and runs tasks until that count reaches your limit. BOINC can't "see" how much CPU is actually being used; it only knows what it is told by the user or the config files. So with 50% as your limit and 100% of a thread per task, BOINC decides to allow 12 out of 24 threads to run, but in reality more is being used.

You might be able to get the CPU% to work how you want if you use an app_config file to tell BOINC that the app is using 120% of a thread per task, with a value of 1.20. I don't know whether this will work, though, or whether BOINC allows values >1.0. Just a guess.
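For anyone who wants to try that, this Python sketch prints what such an app_config.xml might look like. Treat it as untested: the app name and the "GWnew" plan class are guesses read off the executable name, the helper is hypothetical, and Pavel reports earlier in the thread that <avg_ncpus> made no difference for him.

```python
def render_app_version(app_name: str, plan_class: str, avg_ncpus: float) -> str:
    """Build an app_config.xml that declares per-task CPU usage for one app version."""
    return (
        "<app_config>\n"
        "  <app_version>\n"
        f"    <app_name>{app_name}</app_name>\n"
        f"    <plan_class>{plan_class}</plan_class>\n"
        f"    <avg_ncpus>{avg_ncpus:.2f}</avg_ncpus>\n"
        "  </app_version>\n"
        "</app_config>\n"
    )


# Assumed values: app name and plan class are inferred from
# "einstein_O2MD1_2.07_windows_x86_64__GWnew.exe", not confirmed.
print(render_app_version("einstein_O2MD1", "GWnew", 1.2))
```

If BOINC honored the value, a 12-thread budget would fit 10 tasks instead of 12.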
