Avoid clogging BOINC by suspending instead of "Waiting for memory"

Robert
Joined: 28 Jan 16
Posts: 1
Credit: 70925
RAC: 0
Topic 198481

My suggestion is simple yet serious: suspend tasks instead of leaving them "Waiting for memory", or reduce the RAM needed per task.

The E@H project is creating tasks that reserve a large amount of RAM (400MB for more than 24h). It's highly likely that during this time the BOINC user will start using the computer and the tasks will suspend. After unsuspending (e.g. when CPU usage of non-BOINC processes drops to normal), BOINC will try to reserve RAM for those unfinished tasks. If that fails, BOINC will set the tasks to "Waiting for memory" and will not use the cores assigned to them. This state can go on indefinitely, because the necessary amount of RAM never frees up. As a result, BOINC loses a core for every such task.

Here's an example: my computer has 8GB RAM, 8 virtual cores and 2 GPUs, so I usually run 10 tasks simultaneously. Because I don't keep my computer on when I'm not using it, I set BOINC to work while the computer is in use and allow it to use 25% of RAM during that time (2GB). For the last week I have had only 6 tasks progressing at any given time, while the other 4 had the "Waiting for memory" status. I had to abort those 4 E@H tasks, but soon 2 more E@H tasks started "Waiting for memory".
I understand if the tasks can't be broken up into smaller pieces, but they should be suspended if no memory is available for them.

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7055284931
RAC: 1619631

Why do you restrict BOINC to such a small portion of your available RAM? Have you done any actual comparison of running with a lesser restriction to confirm this gives you a valued improvement?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109938589467
RAC: 31390507

I agree with archae86.

If you set the RAM use to say 75%, things would probably work better. Also, it shouldn't be necessary to use the 'suspend work when computer is in use' setting or 'suspend work when non-BOINC CPU use exceeds xx%' setting because your computer should work well without them if you allow it to use more RAM. I could be wrong but from what you say you would appear to be using one or both of those. You should try without and see how it performs with more RAM.

Remember that you do have virtual memory, so if higher priority jobs need memory your tasks can be swapped out. BOINC is designed to run in the background and give way to your normal work. If you use the setting to 'leave jobs in memory when suspended', you won't lose any part of the accumulated crunching progress.

You mention 2 GPUs but you seem to have work only for the 960M at the moment. If you do end up with work for the Intel GPU as well, please realise that others find there really needs to be a 'free' CPU core to support those particular tasks. Otherwise your machine might really struggle. Maybe you have seen that before. The best way to protect from this would probably be to set BOINC to use only 87% (or even 75%) of the available cores. That can make a tremendous difference to the performance of the machine when running tasks on the Intel GPU.

Cheers,
Gary.

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

Robert hi, I see this post in the BOINC forum too.

Can you be specific about what you see happening?

Does the behaviour change if you suspend/abort a "waiting for memory" task?

The memory required for FGRBP1 is 450MB, which is much larger than for the other E@H apps (BRP4G, BRP6 and OAS1), which all need under 128MB.

Perhaps just change the preferences to avoid this app?
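
For a rough sense of scale, here is a back-of-the-envelope sketch using figures quoted in this thread (8GB RAM, a 25% or 75% limit, 450MB per FGRBP1 task, 128MB as an upper bound for the other apps). It only does the division. The function name is made up for illustration, and the way the BOINC client actually accounts for memory is more involved and is not modelled here.

```python
def tasks_that_fit(total_ram_mb, boinc_limit_pct, task_ram_mb):
    """How many tasks of a given size fit in BOINC's share of RAM.

    Hypothetical helper for illustration only; this is plain integer
    division, not the BOINC client's scheduling logic.
    """
    budget_mb = total_ram_mb * boinc_limit_pct // 100
    return budget_mb // task_ram_mb


TOTAL_RAM_MB = 8 * 1024  # 8GB, as on Robert's machine

for limit_pct in (25, 75):
    big = tasks_that_fit(TOTAL_RAM_MB, limit_pct, 450)
    small = tasks_that_fit(TOTAL_RAM_MB, limit_pct, 128)
    print(f"{limit_pct}% limit: {big} tasks at 450MB, {small} tasks at 128MB")
```

Under these assumptions a 25% limit leaves room for only 4 tasks of 450MB, while a 75% limit leaves room for 13, which is more than the 10 tasks the machine is trying to run.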

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109938589467
RAC: 31390507

Robert,

I'd like to respond to what you said over on the BOINC boards and I'll do so here because your request is really about the resources needed by Einstein tasks. I'll try to give you an answer for each of your points by reproducing them here and then commenting. I declare right up front that I regard my knowledge of the deeper workings of modern multi-tasking operating systems as being meagre at best but I think I can give an explanation in basic terms that might make some sense. If there are more knowledgeable people reading this, I invite them to correct anything that needs correcting.

Quote:
Problem:
BOINC task status "Waiting for memory" does not free CPU cores, so they can sit idle indefinitely on systems where there is always less free RAM than the task demands (e.g. 400MB for E@H) and smaller tasks don't get computed because of this.


This is not a BOINC problem or one that BOINC could address. BOINC schedules the order in which tasks may be run and how many it would like to run, and it is restricted by the preferences you set. BOINC does not take over the job of the OS, i.e. deciding which of the available 'ready to run' tasks actually get to run, and when they start and stop. So BOINC doesn't have any ability to 'free CPU cores' because it's the OS's job to handle all that. If BOINC has been overly restricted it can't even pass a job to the OS, even if the OS has available cores.

Because the OS is quite clever, you don't really need to worry about trying to manage resource allocation for it. If you tell BOINC it can use sufficient memory, it will be able to present all the jobs you want run and then the OS can decide how best to run them. If a non-BOINC job with higher priority needs to run, the OS will handle all that too.

Quote:
Cause:
The E@H project is creating big tasks with a high amount of reserved RAM (400MB for more than 24h). It's highly likely that during this time BOINC will get suspended at least once and after unsuspending there will be not enough free/shared RAM for them and they will start "Waiting for memory". Instead of reassigning CPU-s for other tasks BOINC will not use the cores.


The OS can allocate and free memory quite dynamically when needed. Nothing is ever reserved "for more than 24h." If there is sufficient RAM, the images of running tasks may remain 'memory resident' and the OS will switch CPU resources very quickly to service those tasks as it sees fit. If there is insufficient physical memory, task images can be swapped to virtual memory. It doesn't matter when/if BOINC is suspended. Even if all physical/virtual memory were exhausted and the OS had to kill tasks, very little would be lost because tasks can restart from 'checkpoints' saved on disk.
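
To illustrate the checkpoint idea in general terms, here is a minimal sketch. It is not Einstein@Home or BOINC code: the function name, the JSON file and the `stop_after` parameter are all invented for the example, and a real science app would checkpoint periodically rather than after every step. The point is only that a task killed part-way through can pick up from what was last saved to disk.

```python
import json
import os
import tempfile


def run_task(total_steps, checkpoint_path, stop_after=None):
    """Sum the integers 0..total_steps-1, saving progress after every step.

    stop_after simulates the task being killed after that many steps in
    this session. Returns the final sum, or None if interrupted.
    """
    # Resume from the checkpoint if one exists, otherwise start fresh.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    else:
        state = {"next_step": 0, "partial_sum": 0}

    done_this_session = 0
    while state["next_step"] < total_steps:
        if stop_after is not None and done_this_session >= stop_after:
            return None  # "killed"; progress so far is already on disk
        state["partial_sum"] += state["next_step"]
        state["next_step"] += 1
        done_this_session += 1
        with open(checkpoint_path, "w") as f:
            json.dump(state, f)
    return state["partial_sum"]


checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
first = run_task(10, checkpoint, stop_after=4)  # interrupted after 4 steps
second = run_task(10, checkpoint)               # resumes at step 4
print(first, second)
```

The second call does not redo the first four steps; it reads the saved state and carries on from there.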

The "cause" of the unacceptable behaviour you point to is most likely the severe memory restriction you have imposed. To test this, why not ease the restriction and see what happens? Also try allowing BOINC to use only 75% of cores. If that works well, you could experiment with other values to find the best setting.

Quote:
Example:
This state has been going on for about a week on my laptop until I aborted the tasks. I usually run 10 tasks simultaneously (8 cores + 2 GPU, 8GB RAM with 25% limit for BOINC during use) but only 6 progressed at any point during that week. The other 4 were under E@H tasks. Soon after aborting them 2 more E@H tasks started "Waiting for memory". I'll suspend E@H until it gets fixed.


Some things for you to consider

    * The behaviour you see as problematic is not something either BOINC or the project can 'fix', only you.
    * Your machine is really a quad core. It has 8 virtual cores due to HT. It will run more slowly with greater chances of unstable behaviour if you run ~100% loading on all 8 cores.
    * GPU tasks need some CPU cycles for 'support'.
    * Gamma-ray pulsar tasks need to be the 'size' they are for the run to be feasible to attempt in the first place. They can't/won't be made 'smaller'.
    * If BOINC interferes with non-BOINC use, it would be far better to restrict BOINC to using only 50% of cores rather than just 2GB RAM.
    * If you stop over-restricting BOINC and allow the OS to do its job, a much better (possibly a compromise) solution could be found.

Cheers,
Gary.

Pete(r) van der Spoel
Joined: 10 Sep 06
Posts: 1
Credit: 106529016
RAC: 137

Hi Robert,

Is it possible for you (or a moderator) to change the misleading title of this thread? It's all over Google with generic searches like "boinc waiting for memory".

Thanks, Pete
