backup project: resource share problem

John McLeod VII

Moderator

30 Jul 2005 1:07:05 UTC

Message 14692

The key issue is that the deferral took place instead of downloading more work. The more recent server code looks at the client version and does not do the resource share check for 4.45 clients (but not all projects have this in place). It will not download a WU if there is a WU on the system from that project and the new WU would not complete running full time in EDF.

BOINC WIKI

Jim Baize

30 Jul 2005 4:06:18 UTC

Message 14693

Ok, Gravy, I understand your point of view. You are saying that using resource share to make project "A" a main project and project "B" a backup project is not a good practice. You want an option to define project A as primary and B as backup.

Using your system, whose definition of a backup project are we going to use? How much time would a backup project be allowed? What about those who REALLY like project A and want to run it as a primary, and sort of like project B, so they want to run it also, but not as much as "A", and don't want to relegate it to a low-level backup project?

@JM7
I didn't follow what you were trying to say. The information just seemed to hit a brick wall and didn't want to filter in as a comprehensible idea. I'm sure it is just me not properly interpreting what you were saying. :( Sorry.

Jim

gravywavy

30 Jul 2005 12:55:50 UTC

Message 14694 in response to message 14693

Quote:

Using your system, whose definition of a backup project are we going to use? How much time would a backup project be allowed? What about those who REALLY like project A and want to run it as a primary, and sort of like project B, so they want to run it also, but not as much as "A", and don't want to relegate it to a low-level backup project?

Please look at my suggestion again; I'd hoped it was clear. It is in this wish list thread, which I have already cross-linked from this thread, together with comments from other users. To rephrase what I said there:

The user interface and databases for project-specific preferences are augmented to include one more option, project priority, default value = 1. The notes next to this option make it clear that the user leaves a value of 1 for one or more main projects, and uses values greater than 1 for backup projects.

How it works for users is that they put in only priority 1 work (ie leave the default setting) if they do not want backup projects. If they do want backup projects they decide which project is to be the backup, by putting a '2' in the setting for the first level of backup, '3' for the next level, and so on till there is no project left that they want to support. They can have many projects at a level, or just one project at that level.

For myself, by way of example, I'd choose

Pri 1 = CPDN & Orbit, resource shares 70 and 30

Pri 2 = Einstein & LHC, resource shares 50 and 50

Pri 3 = Predictor, resource share 100

Pri 4 = SETI, resource share 100

meaning that CPDN & Orbit are run concurrently when both are up, according to resource share, and that when just one is up, it runs alone. When either of these two is providing work, my client will not look at the pri 2+ projects at all.

When neither CPDN nor Orbit are up, both Einstein & LHC run.

When none of those four is up, Predictor downloads wu. But often, in practice, Predictor refuses work as it is committed to faster machines than mine, in which case my client would go to SETI to get the work.
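The level structure in this example can be sketched in Python. This is a hypothetical model of the proposed setting, not BOINC code: the `PREFS` table and the `levels` and `shares_within_level` helpers are illustrative names, and resource shares are assumed to be divided only among those projects at a level that are currently up.

```python
# Hypothetical encoding of the proposed per-project settings:
# project name -> (download priority, resource share). Priority 1 is the default.
PREFS = {
    "CPDN": (1, 70),
    "Orbit": (1, 30),
    "Einstein": (2, 50),
    "LHC": (2, 50),
    "Predictor": (3, 100),
    "SETI": (4, 100),
}


def levels(prefs):
    """Group projects by priority level, most preferred (lowest number) first."""
    grouped = {}
    for name, (priority, _share) in prefs.items():
        grouped.setdefault(priority, []).append(name)
    return [grouped[p] for p in sorted(grouped)]


def shares_within_level(prefs, level, up):
    """Fraction of time each project at one level gets, counting only projects that are up."""
    active = [name for name in level if name in up]
    total = sum(prefs[name][1] for name in active)
    return {name: prefs[name][1] / total for name in active}


print(levels(PREFS))
# [['CPDN', 'Orbit'], ['Einstein', 'LHC'], ['Predictor'], ['SETI']]
print(shares_within_level(PREFS, ["CPDN", "Orbit"], {"CPDN", "Orbit"}))
# {'CPDN': 0.7, 'Orbit': 0.3}
print(shares_within_level(PREFS, ["CPDN", "Orbit"], {"Orbit"}))
# {'Orbit': 1.0}
```

When only Orbit is up it gets the whole level, which matches "when just one is up, it runs alone".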

ONCE DOWNLOADED, work is crunched according to resource share, as at present, without consideration of priorities. Priorities are download priorities only.

If, for example, Einstein and LHC have been running, and then Orbit comes back online and downloads work, then for a short while (until the pri 2 work is completed) Einstein would get 50/130 of the time, LHC 50/130, and Orbit 30/130. But once the pri 2 work was complete, Orbit would run alone until CPDN came back.
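The time split in this example is plain resource-share arithmetic over the projects that currently have work on the machine, with priority ignored. A small Python check using exact fractions (the function name `crunch_fractions` is hypothetical):

```python
from fractions import Fraction


def crunch_fractions(shares_in_hand):
    """Split CPU time among projects that have work on the machine.

    Priorities play no part here: once work is downloaded, only resource shares count.
    """
    total = sum(shares_in_hand.values())
    return {name: Fraction(share, total) for name, share in shares_in_hand.items()}


# Orbit (pri 1) is back while Einstein and LHC (pri 2) still have work in hand.
for name, frac in crunch_fractions({"Einstein": 50, "LHC": 50, "Orbit": 30}).items():
    print(name, frac)
# Einstein 5/13   (= 50/130)
# LHC 5/13        (= 50/130)
# Orbit 3/13      (= 30/130)

# Once the pri 2 work is finished, Orbit has the machine to itself.
print(crunch_fractions({"Orbit": 30}))  # {'Orbit': Fraction(1, 1)}
```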

Who decides all those questions of priorities and shares between projects is the donor, who is donating his/her machine time. BOINC simply provides the interface to allow them to do that, and there does seem to be a user demand for the ability to do this.

From the client's point of view, this is how it works:

The client attempts to download work for projects of priority 1, and if there are several of these the time is divided up according to resource share, as at present.

When there is less than the connect interval's worth of work left in hand, the client attempts to download work from some or all priority 1 projects. If successful, no attempt is made to download from priority 2+ projects, and no adjustment is made to the long- or short-term debt of those projects.

Only if it gets no work from all priority 1 projects does it go on to try to get work from priority 2 projects. Only if all these fail does it go on to priority 3 projects. Once it hits a priority level that has work, it fills up with the desired amount of work at that priority level, and goes no further.

Work is next requested when the work held locally is below the connect interval, as at present. There is no change to the code that decides when to ask for work, only to the code that decides which project to ask.

Once again, even though work is being crunched for level 2 projects, when the next requests for work are made they start at priority 1. If work is now forthcoming at pri 1, no more work is asked for at other levels.
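A minimal sketch of the request cycle described above, assuming each scheduler contact can be reduced to a yes/no "did this project supply work" answer. `needs_work`, `projects_to_ask` and the `has_work` callback are hypothetical names, and the connect interval is taken in days.

```python
def needs_work(days_of_work_in_hand, connect_interval_days):
    """Unchanged trigger: ask for work once less than one connect interval remains."""
    return days_of_work_in_hand < connect_interval_days


def projects_to_ask(levels, has_work):
    """Walk priority levels from 1 downward; stop at the first level that supplies work.

    levels   -- lists of project names, most preferred level first
    has_work -- callable(name) -> bool, standing in for a scheduler request
    """
    for level in levels:
        supplied = [name for name in level if has_work(name)]
        if supplied:
            # Stop here: lower levels are not contacted on this request.
            return supplied
    return []  # no project at any level had work


LEVELS = [["CPDN", "Orbit"], ["Einstein", "LHC"], ["Predictor"], ["SETI"]]

if needs_work(days_of_work_in_hand=0.05, connect_interval_days=0.1):
    up = {"Einstein", "LHC", "SETI"}  # CPDN and Orbit both down
    print(projects_to_ask(LEVELS, lambda name: name in up))  # ['Einstein', 'LHC']
```

Because `projects_to_ask` always begins at `LEVELS[0]`, the next request after an outage goes back to the priority 1 projects first, which is the behaviour the proposal asks for.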

Work already in hand continues to be crunched as per resource shares, not by priorities. Resource shares would all be in a sensible ball park, but even if they were not, EDF mode could step in to prevent work being lost. Once downloaded the client is committed to trying to complete that work. All of this is just as at present. No change then to the code that controls pre-emption/resumption of work.

From the point of view of coding, there is a small addition to the user interface page, an extra column in a database table, and a small amount of new code in just one place within the client, ie once the client is seeking work it now needs to decide which projects to ask on the basis of priority settings. As far as I can see (without having inside knowledge of the code) there would be no need for any new code elsewhere.

I hope I have made all the angles clear - if anything still remains unclear please keep asking, I appreciate your continued interest Jim

~~gravywavy

gravywavy

30 Jul 2005 14:17:17 UTC

Message 14695 in response to message 14692

Quote:

The key issue is that the deferral took place instead of downloading more work. The more recent server code looks at the client version and does not do the resource share check for 4.45 clients (but not all projects have this in place).

Which projects are you suggesting have this code, John? Because Einstein, LHC and Predictor all go into deferral mode.

And has this method of delivering backup project capability been tested on any project, or is it just a good guess at what the coders hope will happen? Which projects can I test it on so I can see it working for myself?

Quote:

It will not download a WU if there is a WU on the system from that project and the new WU would not complete running full time in EDF.

That would certainly avoid the issue I have raised here. It is still, I'd respectfully suggest, a weaker solution than mine.

If I understand you correctly, that code creates the opposite danger that two or more projects will download work on the assumption that both/all will get 100% of the time for the next week, resulting in missed deadlines.

My solution avoids that, so long as the resource shares add up to something that is still possible. This illustrates well my basic point that trying to deliver functionality through the back door is inherently more prone to unwanted side effects than coding it explicitly.

Coding it explicitly is easier for the user to understand, and easier for the next round of programmers to understand and therefore less likely to get broken again in the future.
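The risk of each project assuming it gets 100% of the CPU can be shown with made-up numbers. The sketch below is a hypothetical model, not project server code: it assumes a single CPU, that every task has the same deadline, and that the per-project rule reduces to "this project's queued work plus the new WU fits before the deadline at full-time CPU".

```python
def fits_alone(queued_days, new_wu_days, deadline_days):
    """Per-project test: could this project's work finish by the deadline
    if it had the CPU full time?"""
    return queued_days + new_wu_days <= deadline_days


def fits_together(queued_by_project, new_by_project, deadline_days):
    """The same test applied to all projects at once on one CPU,
    assuming (for simplicity) that every task shares the same deadline."""
    total = sum(queued_by_project.values()) + sum(new_by_project.values())
    return total <= deadline_days


# Hypothetical numbers: one CPU, 7-day deadlines, two projects A and B.
queued = {"A": 4.0, "B": 3.0}
new = {"A": 2.0, "B": 2.0}
print([fits_alone(queued[p], new[p], 7.0) for p in queued])  # [True, True]
print(fits_together(queued, new, 7.0))                        # False: 11 days of work
```

Each project passes its own check, yet together they hold 11 days of work against a 7-day deadline, which is the "missed deadlines" outcome described above.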

~~gravywavy

Ziran

30 Jul 2005 17:03:30 UTC

Message 14696

~~gravywavy I like your proposal, but there are a couple of problems that I can see with it. I am with you all the way until the client has found work and downloaded it. Keeping in mind that all downloaded work must be returned before the deadline, here are some thoughts.

We would need to have separate LTD and STD for all priority levels so we can keep track of work within the priority level.

If we were to crunch per resource shares without priority we would also need a LTD and STD to balance the priority levels. This would be very problematic and a fair way to do this isn't possible because it is conflicting with how the priority works. A much simpler way to do this is to crunch work according to priority and let the EDF mode flush out work with lower priority.

When you're really interested in a subject, there is no way to avoid it. You have to read the Manual.

John McLeod VII

Moderator

30 Jul 2005 18:22:55 UTC

Message 14697 in response to message 14695

Quote:

Quote:
The key issue is that the deferral took place instead of downloading more work. The more recent server code looks at the client version and does not do the resource share check for 4.45 clients (but not all projects have this in place).

Which projects are you suggesting have this code, John? Because Einstein, LHC and Predictor all go into deferral mode.

And has this method of delivering backup project capability been tested on any project, or is it just a good guess at what the coders hope will happen? Which projects can I test it on so I can see it working for myself?

Quote:

It will not download a WU if there is a WU on the system from that project and the new WU would not complete running full time in EDF.

That would certainly avoid the issue I have raised here. It is still, I'd respectfully suggest, a weaker solution than mine.

If I understand you correctly, that code creates the opposite danger that two or more projects will download work on the assumption that both/all will get 100% of the time for the next week, resulting in missed deadlines.

My solution avoids that, so long as the resource shares add up to something that is still possible. This illustrates well my basic point that trying to deliver functionality through the back door is inherently more prone to unwanted side effects than coding it explicitly.

Coding it explicitly is easier for the user to understand, and easier for the next round of programmers to understand and therefore less likely to get broken again in the future.

IMHO the priority option that you want will be impossible to get past Berkeley. They want to simplify if they can instead of adding options.

The client side will not request work from anywhere if there is a CPU scheduler problem. The exception is if there is not enough work to fill your queue (this is to allow modem users to get enough work). In order to get this to work, any request for work should not be filled unless the deadline is at least twice the "connect every X days" setting. The server should also check to see that you have enough slack time to get the work done.
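The two server-side checks suggested here can be written as one predicate. This is a hedged sketch, not code from any project: the function name is hypothetical, all quantities are in days, and the "slack time" test is assumed to mean that the queued work plus the new WU must fit before the deadline with the CPU running full time.

```python
def server_should_send(deadline_days, connect_interval_days,
                       queued_work_days, new_wu_days):
    """Hypothetical server test before filling a work request.

    1. The deadline must be at least twice the "connect every X days" setting.
    2. Assumed slack check: queued work plus the new WU must finish
       before the deadline at full-time CPU.
    """
    if deadline_days < 2 * connect_interval_days:
        return False
    return queued_work_days + new_wu_days <= deadline_days


# 7-day deadline, 0.7-day connect interval: 7 >= 1.4, and 3 days of work fits.
print(server_should_send(7.0, 0.7, queued_work_days=2.0, new_wu_days=1.0))  # True
# A 1-day deadline with a 0.7-day connect interval fails the first check (1.0 < 1.4).
print(server_should_send(1.0, 0.7, queued_work_days=0.0, new_wu_days=0.5))  # False
```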

BOINC WIKI

gravywavy

30 Jul 2005 18:24:01 UTC

Message 14698 in response to message 14696

Quote:

...We would need to have separate LTD and STD for all priority levels so we can keep track of work within the priority level.

One STD figure per project, as now, and one LTD figure per project, as now. It does not need to separate these figures as the project determines the priority.

You may well be right that work done on pri 2 projects should only affect the debt levels of pri 2 projects, so that it is not so much a matter of keeping more figures but of processing them slightly differently. I don't claim to understand the debt system in enough detail to comment.

Quote:

If we were to crunch per resource shares without priority we would also need a LTD and STD to balance the priority levels. This would be very problematic and a fair way to do this isn't possible because it is conflicting with how the priority works. A much simpler way to do this is to crunch work according to priority and let the EDF mode flush out work with lower priority.

Do I understand correctly that under the current system, STD is used for crunch control (pre-emption, resumption, etc), and LTD controls which project next gets work downloaded?

If I understand correctly, then under my suggestion, STD would control exactly the same as it does now. If this is unworkable then I'd certainly accept EDF.
Perhaps (for reasons I don't understand) it would be easier to code if the client always runs in EDF mode when there is mixed priority work held locally. This would not be a problem, as the situation only arises during recovery from an outage on the preferred project.

The decision of what work to download would be a two-stage one, projects being sorted by priority and then by LTD. If work was available at pri 1, only the LTD for pri 1 projects would be taken into account, and the work from the most deserving of these (in LTD terms) would be preferred. This seems fair to me.

If no work was available at pri 1, then pri 2 projects would be considered. If there were more than one of these, LTD would also determine which level 2 project was asked for work first. This means that after a large number of shortish breaks in the pri 1 work the shares of work given to the backup projects would come out as per the user's request.
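The two-stage choice (priority first, then long-term debt) amounts to sorting on the key (priority, -LTD) and walking down the list until a project supplies work. A Python sketch; `Project`, `request_order` and `pick_project` are hypothetical names, and the debt values are invented for illustration.

```python
from typing import NamedTuple


class Project(NamedTuple):
    name: str
    priority: int          # 1 = main project, 2+ = backup levels
    long_term_debt: float  # larger = more deserving of work (invented values)


def request_order(projects):
    """Priority level first; within a level, largest long-term debt first."""
    return sorted(projects, key=lambda p: (p.priority, -p.long_term_debt))


def pick_project(projects, has_work):
    """First project in request order that can supply work, or None."""
    for p in request_order(projects):
        if has_work(p.name):
            return p.name
    return None


PROJECTS = [
    Project("CPDN", 1, 300.0),
    Project("Orbit", 1, 500.0),
    Project("Einstein", 2, 200.0),
    Project("LHC", 2, 900.0),
]
print([p.name for p in request_order(PROJECTS)])  # ['Orbit', 'CPDN', 'LHC', 'Einstein']
print(pick_project(PROJECTS, lambda n: True))     # Orbit, although LHC has the largest debt
print(pick_project(PROJECTS, lambda n: n in {"Einstein", "LHC"}))  # LHC
```

LTD only breaks ties inside a level, so a backup project with a large debt never jumps ahead of a main project that has work.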

~~gravywavy

gravywavy

30 Jul 2005 18:42:57 UTC

Message 14699 in response to message 14697

Quote:

IMHO the priority option that you want will be impossible to get past Berkeley. They want to simplify if they can instead of adding options.

I'd challenge the equation reduce=simplify.

When you want options to do two distinct things, it is, in my use of language, simpler to provide two options rather than one.

There is less confusion, both for maintenance of the code and for users if we avoid trying to make a few options do a lot of work.

Longer code can often be simpler, ie easier to understand, quicker to maintain, and easier to use.

I accept that you know Berkeley and I don't. If you say we can't get this idea past them I trust your judgment on that, but I think that is a pity.

Quote:

The client side will not request work from anywhere if there is a CPU scheduler problem. The exception is if there is not enough work to fill your queue (this is to allow modem users to get enough work). In order to get this to work, any request for work should not be filled unless the deadline is at least twice the "connect every X days" setting. The server should also check to see that you have enough slack time to get the work done.

Again, please point me to a project that currently runs the code as you describe it. The Einstein, LHC, and Predictor schedulers all refuse work if one of their WUs is present, unless the project share is large enough to keep them happy. The deadline for all of these is 7 or 14 days, and my connect interval for some of these tests has been 0.7 days, for others the default 0.1 days. You will see that this is well under half the deadline.

~~gravywavy

Ziran

30 Jul 2005 19:56:55 UTC

Message 14700 in response to message 14698

Quote:

Quote:
...We would need to have separate LTD and STD for all priority levels so we can keep track of work within the priority level.

One STD figure per project, as now, and one LTD figure per project, as now. It does not need to separate these figures as the project determines the priority.

You may well be right that work done on pri 2 projects should only affect the debt levels of pri 2 projects, so that it is not so much a matter of keeping more figures but of processing them slightly differently. I don't claim to understand the debt system in enough detail to comment.

Yes, that is exactly what I meant.

Quote:

Quote:

If we were to crunch per resource shares without priority we would also need a LTD and STD to balance the priority levels. This would be very problematic and a fair way to do this isn't possible because it is conflicting with how the priority works. A much simpler way to do this is to crunch work according to priority and let the EDF mode flush out work with lower priority.

Do I understand correctly that under the current system, STD is used for crunch control (pre-emption, resumption, etc), and LTD controls which project next gets work downloaded?

If I understand correctly, then under my suggestion, STD would control exactly the same as it does now. If this is unworkable then I'd certainly accept EDF.
Perhaps (for reasons I don't understand) it would be easier to code if the client always runs in EDF mode when there is mixed priority work held locally. This would not be a problem, as the situation only arises during recovery from an outage on the preferred project.

The decision of what work to download would be a two-stage one, projects being sorted by priority and then by LTD. If work was available at pri 1, only the LTD for pri 1 projects would be taken into account, and the work from the most deserving of these (in LTD terms) would be preferred. This seems fair to me.

If no work was available at pri 1, then pri 2 projects would be considered. If there were more than one of these, LTD would also determine which level 2 project was asked for work first. This means that after a large number of shortish breaks in the pri 1 work the shares of work given to the backup projects would come out as per the user's request.

I don't know exactly how LTD and STD work, but I understand the principle behind them, so a general discussion isn't a problem.

Since we are discussing a priority system here, we must keep the LTD and STD separate for every level of priority. The debt numbers would be balanced separately on every level of priority. Comparing debt numbers across different levels would then be like comparing apples to oranges.
You could set up a debt system between the levels of priority, but the problem would then be deciding whether crunching a WU should affect the level debt or not.
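A simplified model of per-level debt bookkeeping, written as a hypothetical Python sketch rather than BOINC's actual debt formula: when a project runs for some seconds, every project at its priority level is credited its resource-share fraction of that time and the project that ran is charged all of it. Debts at that level therefore sum to zero, and other levels are untouched. `LevelDebts`, `record_run` and `level_total` are illustrative names.

```python
class LevelDebts:
    """Hypothetical debt bookkeeping kept separately for each priority level."""

    def __init__(self, prefs):
        self.prefs = prefs  # name -> (priority, resource share)
        self.debt = {name: 0.0 for name in prefs}

    def record_run(self, ran, seconds):
        """Account for `seconds` of CPU time used by project `ran`."""
        level = self.prefs[ran][0]
        peers = [n for n, (pri, _share) in self.prefs.items() if pri == level]
        total_share = sum(self.prefs[n][1] for n in peers)
        for n in peers:
            # Each project at this level is owed its share of the elapsed time...
            self.debt[n] += seconds * self.prefs[n][1] / total_share
        # ...and the project that actually ran is charged for all of it.
        self.debt[ran] -= seconds

    def level_total(self, level):
        """Sum of debts at one level; stays at zero under this model."""
        return sum(d for n, d in self.debt.items() if self.prefs[n][0] == level)


debts = LevelDebts({"CPDN": (1, 70), "Orbit": (1, 30), "Einstein": (2, 50), "LHC": (2, 50)})
debts.record_run("Einstein", 3600)  # backup work during a CPDN/Orbit outage
print(debts.debt)
# {'CPDN': 0.0, 'Orbit': 0.0, 'Einstein': -1800.0, 'LHC': 1800.0}
print(debts.level_total(1), debts.level_total(2))  # 0.0 0.0
```

Under this model the pri 1 debts never move while backup work runs, which sidesteps the question of how a pri 2 WU should affect debts at other levels.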

When you're really interested in a subject, there is no way to avoid it. You have to read the Manual.

gravywavy

31 Jul 2005 7:00:18 UTC

Message 14701 in response to message 14700

Quote:

Since we are discussing a priority system here, we must keep the LTD and STD separate for every level of priority. The debt numbers would be balanced separately on every level of priority.

Certainly it is true that after long operation of the main project, and with pri 2 projects excluded by the priority system, the result would be to make the pri 1 projects all be in debt collectively to the pri 2 projects. How much does this matter?

I was assuming, perhaps wrongly, that the algorithm simply picked the project with the greatest debt each time it needed to make a choice, that the right project would still have the largest number, and that it would not actually matter if the choice was made between (say) several very large positive numbers.

After some more thought I now agree with you, Ziran, and for two reasons.

Firstly, balanced debt makes the numbers more understandable during debugging. Simplicity means easy to understand, even if the code is slightly longer.

Secondly, separate balancing of priority levels also means that we don't have to worry about whether the imbalance would get so big that floating point arithmetic becomes an issue.
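Both points can be checked in a few lines of Python. The sketch assumes the simplest rule, "ask the project with the largest debt"; `most_deserving` and `balance` are hypothetical helpers, and the 1e16 figure only illustrates where double-precision rounding starts to bite, not a debt level BOINC is known to reach.

```python
def most_deserving(debts):
    """Name of the project with the largest debt."""
    return max(debts, key=debts.get)


def balance(debts):
    """Shift debts so they sum to zero; the ranking of projects is unchanged."""
    mean = sum(debts.values()) / len(debts)
    return {name: d - mean for name, d in debts.items()}


debts = {"CPDN": 120.0, "Orbit": 80.0}
drifted = {name: d + 1e6 for name, d in debts.items()}  # same debts plus a large common offset

print(most_deserving(debts), most_deserving(drifted))  # CPDN CPDN
print(balance(drifted))                                # {'CPDN': 20.0, 'Orbit': -20.0}

# At a large enough magnitude, small increments vanish in double precision:
print(1e16 + 1.0 == 1e16)  # True (adjacent doubles near 1e16 are 2 apart)
```

A common offset does not change the choice, as assumed above, but balanced numbers are easier to read while debugging and keep well away from the range where rounding could hide small debt changes.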

~~gravywavy
