Report deadline too short

Bruce Allen

Moderator

Joined: 15 Oct 04

Posts: 1125

Credit: 172127663

RAC: 0

4 Mar 2005 23:17:34 UTC

Message 5495 in response to message 5479

> Einstein@Home and SETI@Home are having performance problems with the database.

E@H is not having any performance problems with the database. However we ARE seeing some bugs in BOINC, which cause some WU sent to clients to be lost (meaning that they are sent out of the server, over the network, but never arrive at the host machine).

Bruce

Director, Einstein@Home

$STE\/E$

STE\/E

Joined: 18 Jan 05

Posts: 135

Credit: 145821500

RAC: 20143

11 Mar 2005 1:37:03 UTC

Message 5496

Crap, I forgot about the 7 Day Deadline over here at Einstein. Oh well, I'll run out what I can and reset the Project when I hit the Deadlines ...

Paul D. Buck

Joined: 17 Jan 05

Posts: 754

Credit: 5385205

RAC: 0

11 Mar 2005 1:59:55 UTC

Message 5497 in response to message 5494

> There are logic problems with all 3 of these approaches though. The one with
> the short term scheduler is obvious from this thread, you cannot attach to an
> unlimited number of projects. The long term scheduler had 2 major problems.
> With the right combination of queue size, work speed and deadline a project
> could tie up a computer indefinitely. The other was a long workunit project
> would be the only thing on the computer for extended periods (2 workunits
> before a project change). The combination scheduler should eliminate the
> problem with a project locking the other projects out. It should make it
> possible to attach to unlimited projects but it still may miss deadlines
> sometimes depending on how many projects manage to squeeze work in at one
> time. But only reduce the effect of long workunit projects being the only
> thing on the computer.

If the developers would give up some of their "tunnel-vision" this could be realized with small alterations within the current framework. One of the things that would have to change is the Project-Server-Centric approach. Though someone could probably still do this on the client side, it would require some footwork on the part of the client application ... by monitoring the farm's completion rate (something that BOINC View in theory could do, because it does log the completed work), and then playing with the XML files to "force" changes ...

The other piece, of course, is to monitor deadlines and to suspend and resume projects and work. And by "lying" to the servers, changes in the settings could be back-propagated to the project servers...

Just something else to think about ...

Razorirr

Joined: 18 Jan 05

Posts: 5

Credit: 43658

RAC: 0

11 Mar 2005 5:36:50 UTC

Message 5498

I have my system set to 0.5 days. That keeps SETI and Einstein from overfilling my cache, but there are disadvantages. The main issue is this little cherry: "dl refused 'have wus but you wont finish in time'". My cache is small because if I have a large one, SETI and Einstein flood it. One half day of work is two WUs (one of each), and that is actually 2 days by the time they finish. One full day is 3 and 3, which equals six days. The smaller projects deny work because they won't get it back on time.

I propose this:

Phase out the non-BOINC Manager CCs. I figure most people will start messing around and work out how many WUs their machine can handle. Then have the projects reconfigure things so that, instead of "connect every .5 days" for all the projects, each project has its own little "download this many WUs per week" box. Then people average it out to however much they want to do of each per week. The one problem is that it requires everyone who wants to run 24/7/365 to sign up for CPDN.
Take mine, for example. In a week it can handle:
cpdn = 1 WU whenever
seti = 1 WU
einstein = 1 WU
alpha 1 = 12 WUs
alpha 2 = 12 WUs
predictor = 2 WUs
lhc = 1 1mil rotation or 12 100k rotation
Plus I have alpha 3 & 4, which run constantly at one WU a year.
That's a week's worth of work that I could do with no missed deadlines, but with the system we use right now I can get one E@H, S@H and Predictor, and all the other projects give me a "have wus but you wont finish in time" because alpha 1 has a three day deadline and alpha 2 has a 2 day deadline.

Jord

Joined: 26 Jan 05

Posts: 2952

Credit: 5893653

RAC: 0

11 Mar 2005 6:26:28 UTC

Message 5499

Or, teach (try) yourself the art of using two project prefs per PC, Razorirr. If I can do it on one PC, I bet you can do it on your seven. :)

$STE\/E$

STE\/E

Joined: 18 Jan 05

Posts: 135

Credit: 145821500

RAC: 20143

11 Mar 2005 11:37:08 UTC

Message 5500

Some of us prefer to run only 1 Project at a time, Ageless. I like to have the flavor of the month: I run 1 Project for a month & then go to another Project for a month and so on ... Plus I've found that trying to run 2 or 3 Projects on my Computers just leads to mass confusion for them.

I sit there and watch the GUI or Manager just switch back and forth between the Projects every 2 or 3 minutes, & each time it switches it just goes back to the last Check Point, so I end up getting nothing Crunched.

Why it does that on my Computers I don't know. I have my Preferences set to switch projects every 60 minutes but it wants to do it every 2 or 3 minutes ... Goofy

Vid Vidmar*

Joined: 22 Jan 05

Posts: 25

Credit: 191816

RAC: 0

11 Mar 2005 12:48:43 UTC

Message 5501 in response to message 5495

> E@H is not having any performance problems with the database. However we ARE
> seeing some bugs in BOINC, which cause some WU sent to clients to be lost
> (meaning that they are sent out of the server, over the network, but never
> arrive at the host machine).
>
> Bruce

Yes, that's just what happened with 1 WU that was "sent" to me: I never got it.

Razorirr

Joined: 18 Jan 05

Posts: 5

Credit: 43658

RAC: 0

12 Mar 2005 5:22:35 UTC

Message 5502

Six ghost computers is nothing; in 5 months I picked up 23 at SETI. I had the 11 ghosts, then I updated to SP2 and all the ghosts doubled and made SP2 versions of themselves. The real computer ghosted again at that point too. Weird thing: some of them had credit. Strange for a machine that doesn't exist.

John McLeod VII

Moderator

Joined: 10 Nov 04

Posts: 547

Credit: 632255

RAC: 0

14 Mar 2005 3:06:30 UTC

Message 5503 in response to message 5498

> I have my system set to 0.5 days. That keeps SETI and Einstein from
> overfilling my cache, but there are disadvantages. The main issue is this
> little cherry: "dl refused 'have wus but you wont finish in time'". My cache
> is small because if I have a large one, SETI and Einstein flood it. One half
> day of work is two WUs (one of each), and that is actually 2 days by the time
> they finish. One full day is 3 and 3, which equals six days. The smaller
> projects deny work because they won't get it back on time.
>
> I propose this:
>
> Phase out the non-BOINC Manager CCs. I figure most people will start messing
> around and work out how many WUs their machine can handle. Then have the
> projects reconfigure things so that, instead of "connect every .5 days" for
> all the projects, each project has its own little "download this many WUs per
> week" box. Then people average it out to however much they want to do of each
> per week. The one problem is that it requires everyone who wants to run
> 24/7/365 to sign up for CPDN.
> Take mine, for example. In a week it can handle:
> cpdn = 1 WU whenever
> seti = 1 WU
> einstein = 1 WU
> alpha 1 = 12 WUs
> alpha 2 = 12 WUs
> predictor = 2 WUs
> lhc = 1 1mil rotation or 12 100k rotation
> Plus I have alpha 3 & 4, which run constantly at one WU a year.
> That's a week's worth of work that I could do with no missed deadlines, but
> with the system we use right now I can get one E@H, S@H and Predictor, and
> all the other projects give me a "have wus but you wont finish in time"
> because alpha 1 has a three day deadline and alpha 2 has a 2 day deadline.
>
There is a major flaw with WU counts. Even within the same project, different WUs are known to take vastly different times. For S@H, AstroPulse WUs will take about double the time that SETI WUs take. There are also southern hemisphere WUs (Parkes) coming sometime; these will take yet a different amount of time. Re-observation work will take more time. Predictor has two different WU sizes, MFold and CharMM. CharMM takes about half the time that MFold takes. In general, setting a time period is going to work much better.
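To put rough numbers on that, here is a small Python sketch. The runtimes are made-up placeholders that only preserve the ratios mentioned above (AstroPulse about twice SETI, CharMM about half of MFold); the function name `wus_in_budget` is likewise just for illustration.

```python
def wus_in_budget(budget_hours, hours_per_wu):
    """How many whole WUs of each type fit into one fixed time budget."""
    return {name: int(budget_hours // hours)
            for name, hours in hours_per_wu.items()}

# Hypothetical runtimes in hours on one machine; only the ratios come
# from the discussion, the absolute values are invented.
runtimes = {"SETI": 4.0, "AstroPulse": 8.0, "MFold": 6.0, "CharMM": 3.0}

counts = wus_in_budget(24, runtimes)
for name, n in counts.items():
    print(f"{name}: {n} WUs per day")
```

The same 24 hours holds anywhere from 3 to 8 WUs depending on the type, which is why a fixed "N WUs per week" setting cannot stand in for a time period.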

BOINC WIKI

John McLeod VII

Moderator

Joined: 10 Nov 04

Posts: 547

Credit: 632255

RAC: 0

14 Mar 2005 3:31:16 UTC

Message 5504 in response to message 5494

> This problem is definitely known to the BOINC developers. There have been
> several discussions about it on the developers mailing list. I expect (hope) it
> to be the next thing worked on after the current development version is
> released, right now getting that out is priority 1.
>
> The current short term scheduler attempts to divide CPU time according to the
> resource shares in a day. All projects start at zero at the beginning of the
> day and do not remember if they were the only project the day before or
> anything like that.
>
> The older long term scheduler attempted to divide CPU time over about 1 to 2
> weeks. It did this by downloading work from projects that were behind, which
> would then be crunched starting with the earliest deadline; the only time it
> changed projects was between workunits.
>
> Currently the best looking idea is to try to combine the 2 schedulers. The
> short term scheduler would still mostly work the same way but instead of
> directly getting more work it would pass the request to the long term
> scheduler. The long term scheduler would keep the queue down to a manageable
> length and remember which projects were worked on recently and try to work on
> other projects next.
>
> There are logic problems with all 3 of these approaches though. The one with
> the short term scheduler is obvious from this thread, you cannot attach to an
> unlimited number of projects. The long term scheduler had 2 major problems.
> With the right combination of queue size, work speed and deadline a project
> could tie up a computer indefinitely. The other was a long workunit project
> would be the only thing on the computer for extended periods (2 workunits
> before a project change). The combination scheduler should eliminate the
> problem with a project locking the other projects out. It should make it
> possible to attach to unlimited projects but it still may miss deadlines
> sometimes depending on how many projects manage to squeeze work in at one
> time. But only reduce the effect of long workunit projects being the only
> thing on the computer.
>
I think that I have thought of a decent solution (in the shower, of course).

The debt is calculated at the times it is needed (not at the end of a WU as it was done in the original - this prevents 2 CPDN WUs from running back to back). There are two debts maintained. The longterm debt is allowed to float to whatever number it needs. It is reduced somewhat if a project fails to deliver any work when asked (to prevent projects that are down for long periods from dominating too badly when they come back). However, it is not reduced based on time (so that CPDN does not get run again too soon after it is completed). The short term debt would work about the same way that the current short term debt works.
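To make the long-term debt rules above concrete, here is a minimal Python sketch. The post does not spell out how debt accrues, so the accrual rule (CPU time the project was entitled to minus CPU time it actually received), the class name `ProjectDebt`, and the 10% reduction after a failed work request are all assumptions for illustration, not BOINC's actual code.

```python
from dataclasses import dataclass

@dataclass
class ProjectDebt:
    resource_fraction: float      # this project's share of the CPU, 0..1
    longterm_debt: float = 0.0    # seconds owed; allowed to float freely

    def accrue(self, wall_seconds, cpu_seconds_used):
        # Assumed rule: debt grows by what the project was entitled to
        # over the interval, minus what it actually got. Called whenever
        # the debt is needed, not only when a WU finishes.
        self.longterm_debt += (wall_seconds * self.resource_fraction
                               - cpu_seconds_used)

    def no_work_delivered(self, keep=0.9):
        # The project was asked for work and sent none: shrink what it is
        # owed "somewhat" (the factor is a guess). There is deliberately
        # no time-based decay anywhere in this class.
        if self.longterm_debt > 0:
            self.longterm_debt *= keep

cpdn = ProjectDebt(resource_fraction=0.25)
cpdn.accrue(wall_seconds=3600, cpu_seconds_used=0)   # starved for an hour
owed_before = cpdn.longterm_debt
cpdn.no_work_delivered()                             # server had no work
owed_after = cpdn.longterm_debt
```

A project that hogs the CPU ends up with negative debt under this rule, which is what would keep a finished CPDN WU from being followed immediately by another one.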

When the client gets to the minimum queue size, it requests work from the project with the highest debt. The calculation would be min( maxqueue - current work, (maxqueue - current work) * resource fraction + longterm debt). After the work is retrieved from the first project, the second project is contacted with the same equation based on current work. Since the queues are overfilled (the work request is satisfied, and some fraction of the last WU slops over the request), the last project may or may not be contacted, because the queue may already be full. If a user does not have something like CPDN, a very short queue length, or a slow machine, it will behave very much as it does today. If CPDN is added to the mix, CPDN will be the only WU running for some time. After the CPDN WU completes, the sharing will tend to be by queue fill. With a slow machine or a short queue, the sharing will tend to be by queue fill rather than by the hour. However, it will tend to get less work for slow machines. It will have shorter turnaround times with short queues even if there are a large number of projects on that machine (this is better for the science).
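The request calculation can be sketched in Python as follows. The formula is the one given above; the function names, the clamp at zero for projects with negative debt, and the simplification that the server returns exactly what was asked for (no overfill) are assumptions made for the example.

```python
def work_request(max_queue, current_work, resource_fraction, longterm_debt):
    """Seconds of work to ask one project for.

    min(maxqueue - current work,
        (maxqueue - current work) * resource fraction + longterm debt)
    """
    room = max_queue - current_work
    if room <= 0:
        return 0.0          # queue already full, do not contact the project
    request = min(room, room * resource_fraction + longterm_debt)
    return max(0.0, request)  # assumption: never ask for negative work

def fill_queue(max_queue, current_work, projects):
    """projects: list of (name, resource_fraction, longterm_debt).

    Contacts projects in order of highest debt first and returns the
    list of (name, seconds_requested).
    """
    requests = []
    for name, fraction, debt in sorted(projects, key=lambda p: p[2],
                                       reverse=True):
        asked = work_request(max_queue, current_work, fraction, debt)
        if asked <= 0:
            continue
        requests.append((name, asked))
        current_work += asked   # simplification: no slop over the request
    return requests

ONE_DAY = 86400
plan = fill_queue(ONE_DAY, 0, [("B", 0.5, -10000), ("A", 0.5, 10000)])
```

With a one-day queue and equal shares, project A (owed 10000 s) is asked for 53200 s and project B (10000 s ahead) for only 6600 s. In practice the last WU of each reply would overshoot the request, so later projects would see even less room.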

The client should still implement a safety that starts crunching the nearest deadline if there are WUs that have tight deadlines. This could be facilitated by the client remembering what deadline durations the server has requested, and reporting slack time available for each of these deadline durations (S@H would have one entry returned at 14 days, Predictor would have several at 1 day, 3 days and 14 days if I recall correctly, other projects would have different reports.) This list is not that hard to build. Having this information sent back to the server would mean that the server would know not to send a 1 day deadline WU to a client that needs all of that day to crunch for other short deadline WUs (possibly from other projects). The server would not have to know what projects are running on the client, just the slack time.
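A sketch of how such a slack list might be built, in Python. It assumes a single CPU that crunches earliest-deadline-first and that the client has a CPU-time estimate for every queued WU; the function names and the report layout are invented for the example and are not part of any BOINC protocol.

```python
DAY = 86400  # seconds

def slack_for(duration, queued):
    """Spare CPU seconds before a deadline `duration` seconds from now.

    queued: list of (seconds_until_deadline, estimated_cpu_seconds).
    """
    # Check the requested deadline and every later queued deadline:
    # work that is due later can still eat into the time available now.
    checkpoints = [duration] + [due for due, _ in queued if due > duration]
    spare = min(t - sum(cpu for due, cpu in queued if due <= t)
                for t in checkpoints)
    return max(0, spare)

def slack_report(queued, deadline_durations):
    """Client side: one slack figure per deadline duration seen so far."""
    return {d: slack_for(d, queued) for d in sorted(deadline_durations)}

def server_can_send(report, deadline_duration, wu_cpu_seconds):
    """Server side: only slack is needed, not the list of projects."""
    return report.get(deadline_duration, 0) >= wu_cpu_seconds

# 20 h of work due within 1 day, plus 10 h due within 3 days.
queued = [(1 * DAY, 20 * 3600), (3 * DAY, 10 * 3600)]
report = slack_report(queued, [1 * DAY, 3 * DAY, 14 * DAY])
```

Here the client reports 4 hours of slack at the 1 day deadline, so a server would hold back a 6 hour WU with a 1 day deadline but could still send the same WU with a 3 day deadline.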

The only real problem I can see with this is the loss in some instances of the share by hour. However, it was mostly those instances that are causing the problem with share by hour (the exception being CPDN, and anyone running CPDN should be taking the long view anyway). Can anyone else see any problems with this?

BOINC WIKI
