More processors used than allowed

alintope
Joined: 27 Jan 12
Posts: 52
Credit: 295551180
RAC: 0
Topic 210141

I restricted BOINC to use only 75% of my machine's (11659637) processors. This is a Core i7 with 4 cores; hyperthreading is turned off. A simple calculation says one processor should be unused. That is exactly what I intended. But the BOINC GUI (and Windows Task Manager as well) shows all cores in full action, at 100% processor usage. What has gone wrong?

As a workaround I set processor usage to 70%. Now E@H is using 3 cores (whereas it should use only 2).

Does anybody have an explanation?

Heinrich

P.S. My other machine with processor XEON E3 is working fine.


Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

If you open Boinc Manager, how many tasks (both CPU & GPU) have the status "Running" in the Tasks tab under both settings?

Jonathan
Joined: 4 Oct 17
Posts: 15
Credit: 16912307
RAC: 5332

I think it may be rounding errors at 70%. Are you giving Boinc Manager time to update or doing a manual update of the project preferences? 
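To see how a percentage turns into a whole number of cores, here is a small Python sketch. It assumes the client truncates the product of core count and percentage and never goes below one core. That rounding rule is my assumption, not something confirmed in this thread, but it matches the expectation that 70% of 4 cores gives 2.

```python
def usable_cores(ncpus: int, max_ncpus_pct: float) -> int:
    """Cores BOINC may use for CPU tasks.

    Assumption: the client truncates ncpus * pct / 100 toward zero and
    never allows fewer than one core.
    """
    cores = int(ncpus * max_ncpus_pct / 100)
    return max(cores, 1)


if __name__ == "__main__":
    # A 4-core i7 with hyperthreading off, as in the opening post.
    for pct in (100, 75, 70, 60, 50, 25, 10):
        print(f"{pct:3d}% of 4 cores -> {usable_cores(4, pct)} core(s)")
```

Under this assumption, any setting from 50% to 74% gives 2 cores on a 4-core machine, so 70% and 75% are not as close as they look.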

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7023644931
RAC: 1808012

Neither BOINC nor Einstein has direct control over how much CPU the tasks consume. Your setting influences the thing that BOINC actually does control, which is how many tasks of which types run under BOINC supervision.

The code assumes that a CPU task uses a whole CPU, and I think it assumes that a GPU task uses a fraction of a CPU, which is shown alongside the task's status in BOINC Manager. That assumption can be wrong, sometimes very wrong.

That fraction is a prediction for scheduling purposes, not an enforced allocation.
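The bookkeeping described above can be sketched as follows. This is a simplified model, not BOINC's actual scheduler code: it assumes GPU tasks are counted first with their declared CPU fraction, and CPU tasks (budgeted at 1 CPU each) are then started while the committed total is still below the allowed core count. Nothing in it limits what the processes really consume, which is exactly the point: the numbers are predictions.

```python
def cpu_tasks_started(allowed_cores: int, gpu_tasks: int, cpu_per_gpu_task: float,
                      cpu_per_cpu_task: float = 1.0, queued_cpu_tasks: int = 100) -> int:
    """Number of CPU tasks a simplified scheduler would start.

    Assumed rule: GPU tasks are committed first; CPU tasks are then added
    one at a time while the committed CPU total is below allowed_cores.
    """
    committed = gpu_tasks * cpu_per_gpu_task
    started = 0
    while started < queued_cpu_tasks and committed < allowed_cores:
        committed += cpu_per_cpu_task
        started += 1
    return started


if __name__ == "__main__":
    # 75% of 4 cores = 3 allowed; one GPU task budgeted at a whole CPU.
    print(cpu_tasks_started(3, gpu_tasks=1, cpu_per_gpu_task=1.0))  # 2
    # 50% of 4 cores = 2 allowed; the GPU task budgeted at half a CPU.
    print(cpu_tasks_started(2, gpu_tasks=1, cpu_per_gpu_task=0.5))  # 2
```

Under this model both settings plan for two CPU tasks alongside the GPU task; only the declared budget differs.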

What task types from BOINC are running on your machine, and in what numbers?


alintope
Joined: 27 Jan 12
Posts: 52
Credit: 295551180
RAC: 0

@Archae86

I planned:

2 Contin. Grav. Wave searches Gal. Cent. highfreq 1.00 (AVX)

1 Gamma ray pulsar binary search #1 on GPUs 1.20

but 3 gravitational wave searches were running.

The gamma ray search needs 1 CPU core and each gravitational wave search 1 core also.

But: Since yesterday some miracle happened and BOINC is doing well without me having changed anything.


WB8ILI
Joined: 20 Feb 05
Posts: 45
Credit: 924045877
RAC: 231748

Alintope -

This may be of no help but I am posting because I am not sure what settings you are changing.

1) To limit the number of "cores" BOINC will use, are you going to Options -> Computing preferences -> Usage limits? For example, to use 3 out of 4 cores, set it to 75%. I am not sure what happens if you set it to 70%, 65%, 60% or 55%, but with 4 cores BOINC will use 2 or 3 depending on how it "rounds" the number of cores to an integer.

2) As far as I know, it is important to set the CPU and GPU usage values (cpu_usage and gpu_usage) correctly in app_config.xml if you want the expected results. There are default values (from the project) for these, but sometimes it is necessary to modify them. For example, if you correctly set the number of cores to 3 in step 1 (above) and then set the CPU usage in app_config.xml to 50% (0.50), BOINC will assume it can run 6 tasks simultaneously. However, if the tasks actually need 100% of a core each, your computer will be thrashing around trying to run 6 tasks.
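The arithmetic behind that warning, as a rough Python sketch. It assumes the operating system shares the physical cores evenly among runnable tasks and ignores context-switch and cache overhead, so real thrashing is worse than this estimate. The function name and parameters are illustrative only.

```python
def oversubscription(allowed_cores: int, physical_cores: int,
                     declared_cpu_usage: float, actual_cpu_usage: float = 1.0):
    """Return (tasks BOINC plans to run, cores actually demanded, slowdown factor).

    BOINC plans tasks from the declared per-task CPU usage; the slowdown is
    the ratio of real demand to physical cores, assuming fair time-slicing.
    """
    tasks = int(allowed_cores / declared_cpu_usage)
    demand = tasks * actual_cpu_usage
    slowdown = max(demand / physical_cores, 1.0)
    return tasks, demand, slowdown


if __name__ == "__main__":
    # 3 allowed cores, cpu_usage set to 0.5, but each task really needs a full core.
    tasks, demand, slowdown = oversubscription(3, 4, 0.5)
    print(f"{tasks} tasks demanding {demand} cores -> each runs ~{slowdown}x slower")
```

Even with all four physical cores shared fairly, each of the 6 tasks would take about 1.5 times as long as it would with a core to itself.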

Maybe you knew this. If so, sorry for the post.


Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

alintope wrote:

@Archae86

I planned:

2 Contin. Grav. Wave searches Gal. Cent. highfreq 1.00 (AVX)

1 Gamma ray pulsar binary search #1 on GPUs 1.20

but 3 gravitational wave searches were running.

The gamma ray search needs 1 CPU core and each gravitational wave search 1 core also.

But: Since yesterday some miracle happened and BOINC is doing well without me having changed anything. 

I checked how many tasks you had "In progress" and found that you now (at the time of this message) have 72 gravity wave tasks in your cache. Based on previous run times these take about 7.8 hours each, so that's about 11.7 days' worth of work when running them on 2 cores. On the 8th you reported 2 gravity wave tasks that took almost double that time to complete, so I suspect BOINC thought it was in deadline trouble (14-day deadline) and prioritized the gravity wave tasks.
When enough tasks had been completed and the estimates had returned to more correct values, BOINC started following your preferences again.
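Holmis's estimate can be checked with a few lines of Python. The sketch treats the whole cache as if it faced a single 14-day deadline and assumes run-time estimates doubled. That is a simplification: real tasks were downloaded on different days, and the client runs its own deadline simulation.

```python
DEADLINE_DAYS = 14  # Einstein@Home deadline quoted in the thread


def days_of_work(tasks: int, hours_per_task: float, cores: int) -> float:
    """Wall-clock days needed to finish `tasks` on `cores` cores."""
    return tasks * hours_per_task / cores / 24


if __name__ == "__main__":
    normal = days_of_work(72, 7.8, 2)
    inflated = days_of_work(72, 2 * 7.8, 2)  # assumed: estimates doubled
    print(f"normal estimate:  {normal:.1f} days")
    print(f"doubled estimate: {inflated:.1f} days "
          f"(past {DEADLINE_DAYS}-day deadline: {inflated > DEADLINE_DAYS})")
```

With normal estimates the cache already fills most of the deadline window; once the estimates double, it no longer fits, which is the situation in which the client starts prioritizing those tasks.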

To avoid this in the future you might consider lowering your cache settings a bit.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109394616719
RAC: 35811408

Holmis wrote:

... I suspect Boinc thought it was in deadline trouble (14 day deadline) so prioritized the Gravity wave tasks.
When enough tasks had been completed and the estimates had returned to a more correct value Boinc started following your preferences again.

To avoid this in the future you might consider lowering your cache settings a bit.

That's a very good analysis of the problem.  I'd be most surprised if it turned out to be incorrect.  If (rightly or wrongly) the boinc client thinks there's a deadline problem with CPU tasks and goes into panic mode, it will use all available CPU cores, even the one supposedly 'reserved' for GPU support.  The GPU task may suffer but the deadline issue may be averted by doing this.  The solution is to avoid panic mode in the first place.  Reducing the work cache setting to a small enough value is the simplest thing to do.

There are other aspects the OP needs to be aware of.  Firstly, BOINC fetches CPU work based on the allowed cores, 3 out of 4 in this case.  In normal circumstances, only 2 cores will be allowed to crunch CPU tasks when there is a GPU task running.  So there will always be a tendency to over-fetch CPU work (fetch for 3 but crunch with 2).
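A rough sketch of the over-fetch effect, assuming the client sizes the CPU cache as (cache days x cores it fetches for) core-days of work. The function and its parameters are hypothetical, for illustration only.

```python
def real_buffer_days(cache_days: float, fetch_cores: int, crunch_cores: int) -> float:
    """How long the fetched CPU work actually lasts.

    Assumption: work is fetched for `fetch_cores` cores but only
    `crunch_cores` cores process CPU tasks.
    """
    core_days_fetched = cache_days * fetch_cores
    return core_days_fetched / crunch_cores


if __name__ == "__main__":
    # Fetch for 3 cores, crunch with 2 (one core busy supporting the GPU).
    for cache in (1.0, 2.0, 5.0):
        print(f"cache setting {cache} d -> work lasts {real_buffer_days(cache, 3, 2)} d")
```

Under this assumption every cache setting quietly becomes 1.5 times longer, leaving less slack before the deadline.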

Secondly, because this project uses a single (per-project) duration correction factor (DCF), run time estimates for all types of E@H tasks will be affected if any single task is faster or slower than its own particular estimate. If a particular task happens to take a lot longer than its estimate, the estimates of ALL tasks in the cache will be raised to the new, higher level in one single step. This can very easily push BOINC into panic mode the moment the task completes.
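The effect of a single shared DCF can be illustrated with a toy model. The one-step upward jump follows the description above; the slow downward drift (10% of the gap per completed task) is an assumption for illustration, not BOINC's exact rule. The app names and raw estimates are hypothetical.

```python
def update_dcf(dcf: float, actual_hours: float, raw_estimate_hours: float,
               decay: float = 0.1) -> float:
    """Update a per-project duration correction factor after one task.

    Upward: jump straight to the new ratio (one step).
    Downward: move only a fraction `decay` of the way (assumed rule).
    """
    ratio = actual_hours / raw_estimate_hours
    if ratio > dcf:
        return ratio
    return dcf + decay * (ratio - dcf)


def estimates(dcf: float, raw: dict) -> dict:
    """Every app's estimate is scaled by the same project-wide DCF."""
    return {app: hours * dcf for app, hours in raw.items()}


if __name__ == "__main__":
    raw = {"gravitational_wave": 8.0, "gamma_ray_gpu": 0.5}  # hypothetical hours
    dcf = 1.0
    # One slow gravitational-wave task takes twice its estimate.
    dcf = update_dcf(dcf, actual_hours=16.0, raw_estimate_hours=8.0)
    print(dcf, estimates(dcf, raw))  # both apps' estimates double at once
```

The GPU tasks did nothing wrong, yet their estimates double too, because they share the same factor.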

Both of the above can be protected against by keeping the work cache small.  The over-fetching tendency can be removed completely by different preferences.  It involves setting the allowed cores to 50% and changing the default reservation of CPU cores required for GPU support to less than the default whole core.  It's not a problem to change the CPU reservation for GPU support since the 50% cores setting has already guaranteed there will be support available when needed.

An app_config.xml file is needed to do this.  If the OP is not sure what to do, I could easily post a file that would work correctly.  By using this mechanism, BOINC will only fetch work for 2 cores and will be less likely to go into panic mode at some point in the future.  If the OP doesn't want to use the app_config.xml mechanism, it's very important to use a cache size that will most likely avoid the possibility of panic mode, e.g.  1-2 days should be good.  As mentioned previously, this is the simplest action to take.


Cheers,
Gary.

alintope
Joined: 27 Jan 12
Posts: 52
Credit: 295551180
RAC: 0

Thank you very much for your offer to write an appropriate app_config.xml file for me, Gary. I would greatly appreciate it. Could you please send it via private message?

I didn't know about BOINC's "panic mode" before. Thank you for this information. As a first step I reduced the work cache as you recommended.

Heinrich


Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109394616719
RAC: 35811408

Sorry about not getting back to you sooner - busy day :-).

alintope wrote:
Thank you very much for your offer writing an appropriate app_config.xml file for me, Gary. I would greatly appreciate it. Could you please send it via private message?

I have two reasons for not using PMs for giving assistance.  Forum help is of general interest and we all can get things wrong.  Help should be subject to peer review so that incorrect stuff can immediately be challenged.  It's much better for corrections to be made before the person being helped gets stuffed around with something incorrect or incomplete.

The other reason is that many people pick up stuff they weren't aware of by simply browsing without asking specific questions.  I want to encourage that as much as possible.

Here is an app_config.xml that should achieve your requirements - 3 out of 4 cores actually involved with crunching and two CPU tasks plus 1 GPU task running concurrently.  You should copy and paste the following code into a plain text editor of your choice and save it in the Einstein project directory using the precise name app_config.xml.   You can do this any time, even while BOINC is running and then use the 'reread config files' option in BOINC Manager - advanced view to have the client read the new file.  It will always be found on any subsequent BOINC startup.

<app_config>
    <app>
        <name>hsgamma_FGRPB1G</name>
        <gpu_versions>
            <gpu_usage>1</gpu_usage>
            <cpu_usage>0.5</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
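If you want to double-check the file before telling the client to re-read it, a short Python script using the standard library's XML parser can confirm it is well-formed and show the values it sets. This is only a convenience check: it handles just the gpu_versions form used here and knows nothing about which elements BOINC itself accepts.

```python
import xml.etree.ElementTree as ET

APP_CONFIG = """<app_config>
    <app>
        <name>hsgamma_FGRPB1G</name>
        <gpu_versions>
            <gpu_usage>1</gpu_usage>
            <cpu_usage>0.5</cpu_usage>
        </gpu_versions>
    </app>
</app_config>"""


def gpu_settings(xml_text: str) -> dict:
    """Map each app name to its (gpu_usage, cpu_usage) pair.

    Raises ET.ParseError if the text is not well-formed XML, and
    ValueError if an expected element is missing.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "app_config":
        raise ValueError(f"root element is <{root.tag}>, expected <app_config>")
    result = {}
    for app in root.findall("app"):
        name = app.findtext("name")
        gpu = app.findtext("gpu_versions/gpu_usage")
        cpu = app.findtext("gpu_versions/cpu_usage")
        if name is None or gpu is None or cpu is None:
            raise ValueError("an <app> is missing name, gpu_usage or cpu_usage")
        result[name.strip()] = (float(gpu), float(cpu))
    return result


if __name__ == "__main__":
    print(gpu_settings(APP_CONFIG))
```

To check the real file instead, pass in its contents, e.g. the text read from app_config.xml in the Einstein project directory.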

Before you tell the client to re-read this file, make sure you have set the cores BOINC is allowed to use to 50% (from 75%).  This will temporarily cause one CPU task to suspend because BOINC can only use two cores and one is already tied up with GPU support.  As soon as you click 'reread' it will resume because GPU support no longer demands a complete core.  Everything will be the same as before except that you probably won't download CPU tasks for a bit.  BOINC will now know that it is only fetching enough CPU tasks for two cores to crunch.

At any time in the future, if you wanted to see what crunching two GPU tasks would be like, you could just change <gpu_usage> from 1 to 0.5 (and click 'reread') without changing anything else.  You would see a 2nd GPU task immediately start up and one of the two CPU tasks would be suspended.  That CPU task would continue from where it was suspended once the other CPU task had completed.  Your host would immediately download more GPU tasks to fill the cache so don't have too large a cache setting at the time if you decide to do that :-).  The run time estimate doesn't change until a task is completed under the new conditions so there is a temporary GPU over-fetch condition likely until the new estimate is produced.
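The effect of changing <gpu_usage> can be sketched with simplified bookkeeping: the number of GPU tasks per card is 1 / gpu_usage, each commits its cpu_usage, and CPU tasks are started while the committed total is below the allowed cores. That start rule is my assumption about the client's behaviour, chosen because it reproduces the outcomes described above.

```python
def concurrency(allowed_cores: int, gpu_usage: float, cpu_usage: float,
                gpus: int = 1) -> tuple:
    """Return (GPU tasks, CPU tasks) under a simplified scheduling model."""
    gpu_tasks = int(gpus / gpu_usage)      # e.g. 0.5 -> two tasks per card
    committed = gpu_tasks * cpu_usage      # CPU budgeted for GPU support
    cpu_tasks = 0
    while committed < allowed_cores:       # assumed start rule, see lead-in
        committed += 1.0                   # each CPU task is budgeted one core
        cpu_tasks += 1
    return gpu_tasks, cpu_tasks


if __name__ == "__main__":
    # 50% of 4 cores = 2 allowed cores, cpu_usage 0.5 as in the app_config.xml above.
    print(concurrency(2, gpu_usage=1.0, cpu_usage=0.5))  # (1, 2)
    print(concurrency(2, gpu_usage=0.5, cpu_usage=0.5))  # (2, 1)
```

Going from one to two GPU tasks doubles the CPU budgeted for GPU support from 0.5 to 1.0, which is why one CPU task gets suspended.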

I'm not advocating you do any of these further changes.  I'm just completing the story for anyone else who might be reading and interested in controlling things this way.  If you are happy with your current arrangements and don't intend to do future experiments or optimisations, it should be quite fine not to use app_config.xml now that you have lowered your work cache setting.

alintope wrote:
I didn't know about BOINC's "panic mode" before. Thank you for this information. As a first step I reduced the work cache as you recommended.

The official name (I think) is high priority mode.   The problem is that it often does considerable collateral damage to an otherwise stable crunching environment when stupidly large crunch times sometimes get returned for reasons other than a true change in the correct crunch time.   One of the problems here is that different apps sometimes have crunch time estimate errors in completely opposite directions which can make the single DCF (duration correction factor) oscillate in a rather unstable manner.  A small cache setting seems to be the best way to cope with this.
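To see why opposite estimate errors upset a single DCF, here is a toy simulation. It assumes a simple update rule: the DCF jumps straight up to a task's actual/estimated ratio when that ratio is higher, and otherwise drifts 10% of the way down towards it. That rule is an illustration, not BOINC's exact algorithm; the ratios 2.0 (an app that runs slow) and 0.5 (an app that runs fast) are hypothetical.

```python
def update_dcf(dcf: float, ratio: float, decay: float = 0.1) -> float:
    """Assumed rule: jump up immediately, drift down slowly."""
    if ratio > dcf:
        return ratio
    return dcf + decay * (ratio - dcf)


def simulate(ratios, dcf: float = 1.0) -> list:
    """DCF after each completed task, given each task's actual/estimate ratio."""
    history = []
    for r in ratios:
        dcf = update_dcf(dcf, r)
        history.append(round(dcf, 3))
    return history


if __name__ == "__main__":
    # A slow app (ratio 2.0) finishing now and then among a fast app (ratio 0.5).
    pattern = [2.0, 0.5, 0.5, 0.5, 2.0, 0.5, 0.5, 0.5]
    print(simulate(pattern))
```

Each time the slow app finishes, the factor snaps back to 2.0, so under this model the fast app's tasks are estimated at four times their real run time before slowly drifting back, and the cached work looks far larger than it really is.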


Cheers,
Gary.

alintope
Joined: 27 Jan 12
Posts: 52
Credit: 295551180
RAC: 0

Thank you, Gary. With the help of your app_config file and the very comprehensive explanations my machines are crunching now without using the fourth core any more.
