processor sharing

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109410027856
RAC: 35089717

Thanks for all the details.  I just need to clarify what you mean by the following.  Once I understand that, I'll give a proper response.

Quote:
Concurrency settings: I've played with them a lot. For now, I see a lot of value in limiting all projects to a maximum of four threads.

Does that mean you are using an app_config.xml file at every single project?  If not, how are you limiting all projects to 4 threads?  By imposing limits, you are actually fighting against BOINC's internal mechanism for honouring your resource shares.  For example, after an outage, Seti would need a lot of 'catching up'.  If you impose a limit, you are preventing BOINC from allowing Seti to catch up as quickly as possible.
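
For reference, a whole-project limit would look something like this - a minimal sketch only, placed as app_config.xml in the relevant project directory under your BOINC data folder (and it needs a recent enough client to support <project_max_concurrent>):

    <app_config>
       <project_max_concurrent>4</project_max_concurrent>
    </app_config>

Or, to limit a single application rather than the whole project - the <name> value below is just a placeholder; the real application names can be read from client_state.xml:

    <app_config>
       <app>
          <name>example_app_name</name>
          <max_concurrent>4</max_concurrent>
       </app>
    </app_config>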

For a second example, think about what happens when Seti can't supply work.  If you have <max_concurrent> of 4 for both Rosetta and Einstein, guess what will happen?  Yep, 4 running tasks for each.  And in your comments you seemed dismayed that 4 Einstein tasks were running.  You must recognise that you forced BOINC to do that.  BOINC is designed to keep all available threads occupied if possible.  That's just what it is doing.

Without the restrictions, BOINC would have had more Rosetta and less Einstein running and eventually, when Seti supplied work, BOINC would have launched 8 Seti threads to allow the catch-up to happen as quickly as possible.  From what you describe, you seem to be hindering rather than helping BOINC to reach the state you wish to achieve.

Before seeing the above quote, I thought you were only limiting Einstein.  App_config files can be tricky.  If you are using this mechanism (and if you intend to keep using it), I would really like to be sure of what you have inserted in each one.  Can you paste a copy of each into your next reply so I'm sure of what is going on, thanks.

If you are willing to accept that the best way forward is to remove all these restrictions and to stop trying to force BOINC to conform, then there's no need to post any copies of files.  My recommended course of action will be to get rid of all app_config files and just use resource shares to guide BOINC.  There will be times when you think BOINC is doing the wrong thing.  That won't be the case and you will just need to trust that BOINC will deliver in the longer term.

Cheers,
Gary.

Garry
Joined: 20 Feb 05
Posts: 20
Credit: 1011953
RAC: 0

Thanks again, Gary.

Right now, my concurrency limits are 6 for SETI and 5 for Rosetta and Einstein. BOINC is giving SETI 5 threads; the other projects are sharing the remaining 3 threads.

As with the "at least" limits in "computing preferences", perhaps you and I interpret the concurrency limits differently. I perceive I'm setting maximums; I'm not letting the scheduler use more threads, but it's welcome to use fewer.

Past experience seems to indicate the scheduler's data doesn't account for concurrency limits. If that's true, the following process will be useful.

I think the long-term goal allocation is 3:2:2 among the three projects, with each of them getting three processors some of the time. That's the closest I can get to 2.67:2.67:2.67, right? I could set the limits to 3:3:3. That's the maximum any project needs, presuming a continuous supply of work. It would force the scheduler to assign two threads to each project and let it assign the two remaining threads to the projects most needing to catch up. (Ah. We already know something about the scheduler data!) Sharing during the transition will be better this way than without the limits, and each project will get a steadier supply of work. When two projects start varying between 2 and 3 threads, those two are in balance. When all three projects join that pattern, we've achieved balance. We can test that by raising all limits to 4. If the scheduler continues making the same decisions, that's an indication we've achieved balance.
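
To spell out the arithmetic behind that: caps of 3:3:3 on 8 threads mean any one project can be crowded down to at most 8 - 3 - 3 = 2 threads, so every project always holds at least 2 and the remaining 2 float. If the floaters rotate evenly, each project averages 2 + 2/3 ≈ 2.67 threads, which is the 8/3 equal split.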

I can set the limits to 4:4:4, which better (probably best) accounts for the possibility of one project not contributing data for a while; the other two could keep all threads full. With smooth arrival of data, the scheduler would achieve balance earlier than with 3:3:3, at the expense of more variability in workflows. With unreliable arrival of data, and presuming the other two projects were reliable, 4:4:4 would tend strongly toward the balance of those two, but at the unavoidable expense of the balance with the project not sending data. On the other hand, when that project sends data, the scheduler should first allocate 4:2:2 and later achieve full balance.

I can set no limits to create the fastest progress toward balance and also the greatest fluctuations in work to the projects. My current settings are higher than 4:4:4; the longer the scheduler makes all the decisions right, the more likely I am to release the constraints.

Or maybe my process won't work. If you know it won't, I look forward to knowing it; if you've made that argument, I haven't recognized it yet (sorry).

This would be so much easier if the tasks would come in steadily! :) I have a couple more days this week.

Thanks again for your help.

Garry
Joined: 20 Feb 05
Posts: 20
Credit: 1011953
RAC: 0

I got one batch of 6 SETI tasks today (24 hours of work). The scheduler burned them off in a flash.

Rosetta inventory has been 6 tasks throughout the day; once or twice it was at 7 if another task was close to complete.

The Einstein inventory continues to burn down. Only two tasks remain. Maybe I won't get more tasks until completing both of them.

Data arrival is not consistent enough to achieve any balance. I give up. I'll give the scheduler a full chance.

As discussed, my resource ratios are 30:10:3 SETI:Rosetta:Einstein. I've removed all concurrency constraints. It'll be nice if that changes task arrival behavior.
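
(To make the target concrete - and assuming resource shares map directly to CPU-time fractions, which is my reading rather than official doctrine - the split is SETI 30/43 ≈ 70%, Rosetta 10/43 ≈ 23%, Einstein 3/43 ≈ 7%. On 8 threads that's roughly 5.6, 1.9, and 0.6 threads on long-term average.)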

The current inventory is 6 Rosetta and 2 Einstein. Predictably, the first scheduler decision was to run all of them. Three tasks (all Rosetta) will complete in the first 5 hours of work.

Garry
Joined: 20 Feb 05
Posts: 20
Credit: 1011953
RAC: 0

Interesting discovery, though I'm not confident it's authoritative.

https://boinc.berkeley.edu/wiki/How_BOINC_works

At the bottom of the page are wording and a diagram that seem fairly explicit that the scheduling function happens on servers, not clients. "In the cloud", in modern vocabulary.

The page hasn't been updated since mid-2013. So, maybe that's outdated.

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6534
Credit: 284730859
RAC: 105773

That page is a good summary. Bear in mind that scheduling normally occurs with regard to the user-specified preferences in your account, as known to the project - e.g. for E@H one may exclude certain applications from running on your client(s). It is the client that requests work from any given project(s)/server(s) though, and that request may take note of the contents of <HERE BE DRAGONS> various config files which may be created/manipulated by the user on the client side </HERE BE DRAGONS>.

{ Personally I leave the settings at the default values so as to allow as much latitude by the project as possible, as I only crunch E@H. }

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Garry
Joined: 20 Feb 05
Posts: 20
Credit: 1011953
RAC: 0

Mike:   Hiya!

Yup. And that's a great decision for someone supporting only one project. Good on ya!

I found an 8-year-old scheduler description at https://boinc.berkeley.edu/trac/wiki/ClientSched. Maybe they haven't changed the code much since then. Dunno. I can't poke holes in the design; it looks good AFAIK.

It says the scheduler tries to balance work credit. That's not consistent with my experience.

New perception; check my thinking: If the scheduler balances work credit and Einstein grants 10 times the work credit, it seems like Einstein would get 1/10 the time.

In my experience, Einstein gets more like 10 times the time. A puzzle.

Here's a 3-year-old reference to my problem. https://setiathome.berkeley.edu/forum_thread.php?id=81592&postid=1873558. Briefly, he says some projects send very large amounts of work without regard to the scheduler request.

I may be experiencing that with Einstein. All projects tend to send in batches of 6 tasks; there are exceptions. SETI tends to send tasks of 4 hours. Rosetta, 8 hours (because they have a setting that let me set it to 6 hours). Einstein, 28 hours (a week of processing per batch) with a 3-week deadline! That would require more than half of the time I contribute to BOINC in that period.

The scheduler could accept such a batch periodically with appropriate periods between, but I haven't witnessed that behavior. It finishes one batch, gets another, and Einstein gets far more than I want to contribute. Sigh.

Maybe the half-life degradation of scheduler data is a good technique. The 10-day setting may be too small for the scheduler to accurately compensate for the environment I'm experiencing. It needs to remember excesses and deficits longer. (At that setting, it has forgotten half of any excess or deficit within 10 days. Geez, my big Einstein batches are still processing at that point!) That's an available setting in the config files; the default is 10.
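
If I have the option name right, the knob is <rec_half_life_days> inside cc_config.xml in the BOINC data folder. A sketch of what I might try (30 days is just my guess at a longer memory, not a recommendation):

    <cc_config>
       <options>
          <rec_half_life_days>30</rec_half_life_days>
       </options>
    </cc_config>

The decay is exponential, so the weight given to work done t days ago is 0.5^(t/half_life).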

I could <HERE BE DRAGONS>manually intervene</HERE BE DRAGONS>. I dowanna. I've gotten a great deal of accurate advice that it is ill-advised.

I found a reference to a scheduler simulator (it uses the scheduler code to simulate the scheduler) that researchers can use to study the scheduler's decisions. Clearly someone has looked carefully. I doubt the scheduler is a problem.

Maybe there's a project setting somewhere that establishes the minimum size of a task. It's in a project's self-interest in one way to set it high, because that reduces their use of bandwidth. Too high, though, and they're discouraging the lesser-contributing participants from contributing (or they're effectively commandeering those participants' BOINC usage; that's an unreasonably negative spin). If those participants represent a small enough part of the work a project gets, perhaps the project is willing to lose them. Or should be!

For the last month or so, I have been watching scheduler decisions carefully. Absent a smooth flow of work from the projects (unreasonable to expect from SETI right now, if only because of their weekly day of maintenance; Einstein's flow is smooth but the tasks are the wrong size), there is no smooth allocation to achieve. SETI can't get the time I want to contribute because I don't get enough work, not because of scheduler decisions. The other projects I contribute to benefit.

Maybe the best I can do is set the resource shares such that the projects balance on average. That's taking some experimentation. Maybe it requires the collection of long-term data sets, which I'm unlikely to do. Sigh.

Please forgive any remaining typos. I hope they won't excessively obstruct meaning.

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6534
Credit: 284730859
RAC: 105773

Ah well <glimpse the full horror now / burn after reading> I guess it may be time to reveal to you that E@H has its own version of scheduling called 'locality scheduling'. E@H has its very own fork of the BOINC server code/behaviour tree. This does not affect long term user decisions of project balance, nor depend upon BOINC client code version. This roughly means that the decision of what work to send to a client depends on what data the client already has, implying that the server keeps a Ruddy Great Data Base of who was given what. The intent is to keep network bandwidth down, because many places in the world still have only, say, a dial-up ISP @ 56K. So if there is work yet to be done on a data set already held by a client then that is to be preferred to a whole new download of data. That makes data downloads lumpy but infrequent, while the work done on the lump is a steady progression.
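
In boringly literal toy code - invented names, nothing like the real server source, just the shape of the idea:

    # Toy sketch of locality scheduling: prefer tasks whose big data file
    # the host already holds, so fresh downloads are rare but lumpy.
    from dataclasses import dataclass

    @dataclass
    class Task:
        name: str
        data_file: str

    def pick_work(host_files: set, available: list) -> Task:
        local = [t for t in available if t.data_file in host_files]
        if local:
            return local[0]              # more work on data already downloaded
        task = available[0]              # otherwise pay for a new download ...
        host_files.add(task.data_file)   # ... noted in the Ruddy Great Data Base
        return task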

Also the project can match clients of similar mojo to be placed in the same quora ....... <further down that rabbit hole> .... fully leveraging all that is known by the project about all of its current client set.

There is no balancing of work credits b/w projects : E@H management simply assigns credit per work unit and that is that. A fiat currency no less. There was a time long ago - no one alive remembers the meaning of the mystic runes - when the awarded credit took account of what the client said it did. This became food for a new race of trolls who had written their own client code in order to ( ahem ) 'optimise' the claimed credit. Yes indeedy. All that effort to obtain credit that you can't even swap for a hamburger. They're not Frequent Flyer Points or anything. We still whisper about the Era Of Optimised Clients even today.

{ If you look at the BOINC Manager Event Log ( Tools->Event Log dropdown menu option ) then you will see filenames sent and received with portions of those names in the form XXXX.XX, these being physical frequencies of relevance to the type of signal being analysed. The detail is arcane but is dependent upon which application program is being used for said data. }

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109410027856
RAC: 35089717

Hi Garry,
Right back in your opening post, you said:

Quote:
Some projects are more "generous" with "credit" than others. Einstein may be more generous than the other two.  I tried to balance among projects by setting resource share in inverse proportion to credit.

You also said that you wanted to balance "time".  You didn't say "equal time" so I understood that "balance" meant "adjust" so that the other projects would have more time and Einstein would have less time, and so the credit awards would be closer to each other.  Your use of the term "inverse proportion to credit" implied that you wanted the 3 projects to end up having approximately equal RACs.  So can you please clarify for me which of the following best describes what you want to achieve.

A.  All three projects should get as close to an equal share of the available time as possible.

B.  All three projects should have as close to the same RAC (credit per day) as possible.

I need to ask this because you now have made the comment,

Quote:
I think the long-term goal allocation is 3:2:2 among the three projects, with each of them getting three processors some of the time. That's the closest I can get to 2.67:2.67:2.67, right?

which appears to be saying that you now want equal times (i.e. a third of your host's resources per project) rather than equal RACs.  If that's what you really want, you need to get rid of all concurrency limitations completely and simply set three equal resource shares.  BOINC will be able to do this quite easily as long as you fully understand a key element in the BOINC design - to honour resource shares in the longer term whilst dealing with outages/work shortages in the short term.

Whichever option you decide to go with, the best course of action is to set appropriate resource shares and stop restricting BOINC, i.e. remove concurrency limits.  Either set equal resource shares for equal time shares or set the 3000/1000/300 values (mentioned in an earlier message) if you want equal RACs.  Just set the values and then stop changing things - for at least a couple of weeks - when you will be astounded as to how well it's doing :-).
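
To see why those particular values target equal RACs, suppose - purely for illustration, since the real rates vary - that credit per CPU-hour is granted in the ratio 1 : 3 : 10 for Seti : Rosetta : Einstein.  Crunch time is shared in proportion to resource share, so credit per day is proportional to share times rate: 3000 x 1 = 3000, 1000 x 3 = 3000, 300 x 10 = 3000 - all equal.  The shares are just the inverses of the assumed credit rates, scaled up.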

BOINC doesn't control based on estimated times, which can be quite wrong.  It tracks completed times and reacts to inequalities there.  It doesn't try to keep equal time allocations on short time scales.  A day or two is a short time scale.  BOINC is designed to deal with outages in individual projects, which can be protracted.  BOINC takes a slow but careful approach to fixing an imbalance created by things like this.  You need to allow time for BOINC to achieve the desired outcome.  Trying to force things will just confuse BOINC and make things worse.  Each of your previous messages has mentioned drastic changes you have made.  It's not surprising BOINC is confused.

BOINC keeps records of what each project gets and works towards making sure that any deficiencies or excesses are corrected in a controlled manner.  In other words, continuing adverse circumstances might cause the correction to take weeks/months to achieve.  Every time you intervene in that process by adjusting parameters, you are highly likely to just make things considerably worse.
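
If it helps to see the shape of that bookkeeping, here is a toy sketch (invented names and simplified maths - not the actual client code): work done is aged with a half-life decay, and the project sitting furthest below its fair share is the next one asked for work.

    HALF_LIFE_DAYS = 10.0   # the client default mentioned earlier in the thread

    def decayed(work, days_ago):
        # Older work counts for less: its weight halves every HALF_LIFE_DAYS.
        return work * 0.5 ** (days_ago / HALF_LIFE_DAYS)

    def most_deserving(projects):
        # projects: list of (name, resource_share, decayed_work_done)
        total_share = sum(s for _, s, _ in projects)
        total_work = sum(w for _, _, w in projects) or 1.0
        def deficit(p):
            _, share, work = p
            return share / total_share - work / total_work
        return max(projects, key=deficit)[0]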

I started this reply yesterday.  Other things intervened and I wasn't able to finish it then.  I notice there have been further messages which I'll look at in detail and respond to later.  In the meantime, I strongly advise you to remove limits on concurrent tasks, set the resource shares you want and let BOINC get on with the job.

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109410027856
RAC: 35089717

Garry wrote:

Interesting discovery, though I'm not confident it's authoritative.

https://boinc.berkeley.edu/wiki/How_BOINC_works

At the bottom of the page are wording and a diagram that seem fairly explicit that the scheduling function happens on servers, not clients. "In the cloud", in modern vocabulary.

The page hasn't been updated since mid-2013. So, maybe that's outdated.

It's a simplified overview and it's not really "outdated", if for no other reason than that BOINC is still trying to meet the same basic goals it always had, so the information is still relevant.

The section you linked is headed "General Information" and you're already way beyond that in the way you are attempting to run BOINC.  There are subsequent sections headed "Running BOINC: Basic" followed by "Running BOINC: Advanced" and you're already beyond those as well!! ;-).  You should start at this page which lists those headings I've just mentioned and progressively follow all the links there that deal with aspects of running BOINC that you don't fully understand.  Once you have that under your belt, please pick any single aspect that still puzzles you and ask a question about anything that is unclear.

You seem surprised that "scheduling function happens on servers, not clients."  Your BOINC client only knows what is on your machine already and cannot know about task availability on particular servers until after a response to a request.  The scheduler at each project decides what to send in accord with the preferences you have set.  If you change preferences on a website, your client will not know about the change until an exchange between client and server results in that changed information reaching the client.  With multiple projects, a change made at one website may take a while to be fully propagated to all projects via the client/server interactions that happen during work requests.

A client will request work when it sees a shortfall in estimated run times compared to what your work cache size specifies.  Under routine circumstances, that shortfall is likely to be quite small, perhaps of the order of seconds to minutes (say 90 secs - just an example).  The client would then make a work request for 90 secs to the 'most deserving' project according to its view of your current resource shares.  In your case the client will ask for CPU work from one project but not for any specific search.  The server at that project will know which searches you have selected and what tasks it has available and will choose something to send that has at least 90 seconds of estimated content for an 'allowed' search.  It has to send a full task, which could be a zillion times greater than 90 secs if tasks have a large time estimate - hence the 'at least' terminology.
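
In toy form - a sketch of the idea only, not the real client logic:

    def work_request_seconds(cache_days, remaining_estimates):
        # remaining_estimates: estimated runtimes (secs) of tasks already on hand
        buffered = sum(remaining_estimates)
        target = cache_days * 24 * 3600
        # The shortfall is what gets asked of the 'most deserving' project.
        return max(0.0, target - buffered)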

The server does NOT deliberately send multiple tasks in this situation unless there is a precise reason for it.  Having multiple 'out-of-work' threads with a preferred project unable to supply tasks would be a common precise reason.  Another precise reason would be having an artificial limit on the number of threads a preferred project can use.  In that case, the client might download extra tasks from a non-preferred project just to ensure all threads remain active.

BOINC is designed to handle all these situations.  It is completely unnecessary and quite counter-productive to set limits or to try to force BOINC to behave in some restricted way.  Hence the recommendation to keep a low cache size, remove artificial restrictions and allow BOINC to do its job.  There is nothing particularly special about your project mix that would require extra intervention.  Despite an unreliable project, BOINC will be able to cope quite adequately on its own.

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109410027856
RAC: 35089717

Garry wrote:
It says the scheduler tries to balance work credit.

No it doesn't.  If you search that page, 'credit' is used once and the context is "if you miss the deadline you might not get credit".  On the other hand, 'time' is used 9 times and if you check the context then it's clear that the client tries to allocate crunch time in accord with your resource shares.

Garry wrote:

New perception; check my thinking: If the scheduler balances work credit and Einstein grants 10 times the work credit, it seems like Einstein would get 1/10 the time.

In my experience, Einstein gets more like 10 times the time. A puzzle.

You need to ditch all of that thinking.  If not otherwise constrained, your client attempts to request work from a specific scheduler simply to keep the work cache full.  It chooses the actual project to ask based on which project needs more work to make up for any deficiency in crunch time.  It's nothing to do with credit.  Things will start to go 'off-track' if the selected project can't supply when asked.  BOINC is quite able to get 'back on track' if allowed.  That's the key thinking you should focus on.

Garry wrote:
Here's a 3-year-old reference to my problem. https://setiathome.berkeley.edu/forum_thread.php?id=81592&postid=1873558. Briefly, he says some projects send very large amounts of work without regard to the scheduler request.

I'm glad you brought this up.  It's a classic example of how to misinterpret a situation.  Without knowing anything about the true details (there is just a blanket statement about a project forcing non-requested work - no attempt to provide proper details) I'd be willing to wager that the true situation was that the work cache size was large, Seti was out of work so the client (in order to keep the cache full) was forced to download a lot of work from the non-preferred project - or something along those lines.

Projects do not and cannot force extra non-requested work on the client.  The client always has to ask first.  Work out why the client is asking and the problem can be addressed.  The 'blame' is almost entirely two-fold.  (1) The preferred project is unreliable in its ability to supply work when asked.  (2) The work cache size has been set at a large value in the belief that this will help extract more work from the preferred (but unreliable) project.  It should not be a surprise when the non-favoured project is approached to fill the gap left by an unreliable project that fails to supply when asked.  BOINC is compelled by design to do just that.

Garry wrote:

I may be experiencing that with Einstein. All projects tend to send in batches of 6 tasks; there are exceptions. SETI tends to send tasks of 4 hours. Rosetta, 8 hours (because they have a setting that let me set it to 6 hours). Einstein, 28 hours (a week of processing per batch) with a 3-week deadline! That would require more than half of the time I contribute to BOINC in that period.

The scheduler could accept such a batch periodically with appropriate periods between, but I haven't witnessed that behavior. It finishes one batch, gets another, and Einstein gets far more than I want to contribute. Sigh.

Firstly, the deadline is 2 weeks and not 3.  Secondly, since you have 8 threads, BOINC could get rid of 6 tasks in 28 hrs (your figures - not a week) if it really needed to.  The key thing for you to realise is that by setting unneeded restrictions, you have probably contributed heavily to this behaviour.  You are always going to have 8 non-Seti threads in some combination when Seti is out of work.  You have to trust that BOINC (on its own) will be able to rectify the imbalance by running 8 Seti threads when Seti has work.  Effectively, you are preventing Seti from recovering when it has work, so the situation is likely to get worse each cycle.

This is not the same as the previous example you cite.  The cause is completely different because you have a very small work cache size.  That means there are strict limits (a maximum of 1 task per thread) on the number of Einstein tasks that can be requested.  It has to be due to inappropriate use of concurrency limits that you keep adjusting before having time to see if there has been any benefit from the previous change.  You need to get rid of these completely and allow BOINC to work properly.

Follow the advice I've given you multiple times now.  You will be able to achieve what you want by allowing BOINC to decide how best to get there.  It may take a week or two to see that you are firmly heading in the right direction but it will happen if you allow it to.

Cheers,
Gary.
