Report deadline too short

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

Message 5495 in response to message 5479

> Einstein@Home and SETI@Home are having performance problems with the database.

E@H is not having any performance problems with the database. However we ARE seeing some bugs in BOINC, which cause some WU sent to clients to be lost (meaning that they are sent out of the server, over the network, but never arrive at the host machine).

Bruce

Director, Einstein@Home

STE\/E
Joined: 18 Jan 05
Posts: 135
Credit: 144301535
RAC: 16214

Crap, I forgot about the 7-day deadline over here at Einstein. Oh well, I'll run out what I can and reset the project when I hit the deadlines ...

Paul D. Buck
Joined: 17 Jan 05
Posts: 754
Credit: 5385205
RAC: 0

Message 5497 in response to message 5494

> There are logic problems with all 3 of these approaches though. The one with
> the short term scheduler is obvious from this thread, you cannot attach to an
> unlimited number of projects. The long term scheduler had 2 major problems.
> With the right combination of queue size, work speed and deadline a project
> could tie up a computer indefinitely. The other was a long workunit project
> would be the only thing on the computer for extended periods (2 workunits
> before a project change). The combination scheduler should eliminate the
> problem with a project locking the other projects out. It should make it
> possible to attach to unlimited projects but it still may miss deadlines
> sometimes depending on how many projects manage to squeeze work in at one
> time. But it will only reduce the effect of long workunit projects being the
> only thing on the computer.

If the developers would give up some of their "tunnel vision", this could be realized with small alterations within the current framework. One of the things that would have to change is the project-server-centric approach. Though someone could probably still do this on the client side, it would require some footwork on the part of the client application: monitoring the farm's completion rate (something BOINC View could in theory do, because it does log the completed work) and then playing with the XML files to "force" changes ...

The other piece, of course, is to monitor deadlines and to suspend and resume projects and work. And by "lying" to the servers, changes in the settings could be back-propagated to the project servers ...

Just something else to think about ...
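A minimal sketch of the deadline-monitoring half of this idea, assuming a single CPU, WUs crunched one at a time, and per-WU runtime estimates the client already has. The `WU` class and the `late_wus` / `projects_to_suspend` functions are hypothetical names, not BOINC or BOINC View APIs; the sketch only decides which projects a watchdog would suspend and does not read or write any real BOINC XML file.

```python
from dataclasses import dataclass


@dataclass
class WU:
    name: str
    project: str
    hours: float     # estimated CPU hours still needed (hypothetical estimate)
    deadline: float  # hours from now until the report deadline


def late_wus(order):
    """Names of WUs that would miss their deadline if crunched one after
    another on a single CPU, in the given order."""
    elapsed, late = 0.0, []
    for wu in order:
        elapsed += wu.hours
        if elapsed > wu.deadline:
            late.append(wu.name)
    return late


def projects_to_suspend(queue):
    """If the current order misses deadlines, keep only the projects that own
    an at-risk WU running and suspend the rest until those are done."""
    late = set(late_wus(queue))
    if not late:
        return set()
    at_risk = {wu.project for wu in queue if wu.name in late}
    return {wu.project for wu in queue} - at_risk


if __name__ == "__main__":
    queue = [
        WU("cpdn-1", "cpdn", hours=100, deadline=2000),
        WU("alpha-1", "alpha", hours=10, deadline=48),
        WU("seti-1", "seti", hours=5, deadline=336),
    ]
    print(late_wus(queue))                                    # ['alpha-1']
    print(late_wus(sorted(queue, key=lambda w: w.deadline)))  # []
    print(sorted(projects_to_suspend(queue)))                 # ['cpdn', 'seti']
```

With the long CPDN WU first, the short-deadline alpha WU is late; suspending the other projects (which amounts to running earliest deadline first here) clears the miss.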

Razorirr
Razorirr
Joined: 18 Jan 05
Posts: 5
Credit: 43658
RAC: 0

I have my system at 0.5 days. It keeps SETI and Einstein from overfilling my cache, but there are disadvantages to this. The main issue is this little cherry: DL refused, "have WUs but you won't finish in time". My cache is small because if I have a large one, SETI and Einstein flood it. Half a day of work is two WUs (one of each), and that's actually 2 days by the time they finish. One full day is 3 and 3, which equals six days. The smaller projects deny work because they won't get it back on time.

I propose this:

Phase out non-BOINC Manager core clients (CCs). I figure most people will start messing around and figure out how many WUs their machine can handle. Then have the projects reconfigure things so that instead of "connect every 0.5 days" for all the projects, each project has its own little "download this many WUs per week" box. Then people set it to however much of each they want to do per week. The one problem is that it requires everyone who wants to run 24/7/365 to sign up for CPDN.
For example, in a week mine can handle:
CPDN = 1 WU whenever
SETI = 1 WU
Einstein = 1 WU
alpha 1 = 12 WUs
alpha 2 = 12 WUs
Predictor = 2 WUs
LHC = one 1-million-rotation WU or twelve 100k-rotation WUs
Plus I have alpha 3 & 4, which run constantly at one WU a year.
That's a week's worth of work that I could do with no missed deadlines, but with the system we use right now I can get one each of SAH and Predictor, and all the other projects give me "have WUs but you won't finish in time" because alpha 1 has a three-day deadline and alpha 2 has a two-day deadline.
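The arithmetic behind these refusals can be sketched roughly as follows. This is a deliberately simplified model, not the actual BOINC scheduler check: it assumes a new WU waits behind everything already cached (first in, first out) and uses the roughly one-day-per-WU rate implied by the numbers above. `DAYS_PER_WU`, `finish_estimate_days` and `would_send` are hypothetical names.

```python
# Inferred from the post, not measured: on this machine each WU took about
# one day of wall-clock time (2 WUs -> ~2 days, 3 + 3 WUs -> ~6 days).
DAYS_PER_WU = 1.0


def finish_estimate_days(queued_wus, new_wus=1, days_per_wu=DAYS_PER_WU):
    """Days until the new WUs finish if they are crunched after everything
    already in the cache (simplified first-in, first-out model)."""
    return (queued_wus + new_wus) * days_per_wu


def would_send(queued_wus, deadline_days, new_wus=1):
    """Simplified "won't finish in time" test: send only if the estimate
    fits inside the deadline."""
    return finish_estimate_days(queued_wus, new_wus) <= deadline_days


if __name__ == "__main__":
    one_day_cache = 6  # 3 SETI + 3 Einstein WUs, per the post
    print(would_send(one_day_cache, deadline_days=3))  # False (alpha 1)
    print(would_send(one_day_cache, deadline_days=2))  # False (alpha 2)
    print(would_send(0, deadline_days=2))              # True  (empty cache)
```

Under this model a full cache of long-deadline work pushes any short-deadline WU past its deadline, which is the behavior described above.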

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 62

Or teach yourself (try) the art of using two sets of project prefs per PC, Razorirr. If I can do it on one PC, I bet you can do it on your seven. :)

STE\/E
Joined: 18 Jan 05
Posts: 135
Credit: 144301535
RAC: 16214

Some of us prefer to run only 1 project at a time, Ageless. I like to have a flavor of the month: I run 1 project for a month, then go to another project for a month, and so on ... Plus, I've found that trying to run 2 or 3 projects on my computers just leads to mass confusion for them.

I sit there and watch the GUI or Manager switch back and forth between the projects every 2 or 3 minutes, and each time it switches it goes back to the last checkpoint, so I end up getting nothing crunched.

Why it does that on my computers I don't know; I have my preferences set to switch projects every 60 minutes, but it wants to do it every 2 or 3 minutes ... Goofy.

Vid Vidmar*
Joined: 22 Jan 05
Posts: 25
Credit: 191816
RAC: 0

Message 5501 in response to message 5495

> E@H is not having any performance problems with the database. However we ARE
> seeing some bugs in BOINC, which cause some WU sent to clients to be lost
> (meaning that they are sent out of the server, over the network, but never
> arrive at the host machine).
>
> Bruce

Yes, that's just what happened with 1 WU that was "sent" to me, but I never got it.

Razorirr
Joined: 18 Jan 05
Posts: 5
Credit: 43658
RAC: 0

Six ghost computers is nothing; in 5 months I picked up 23 at SETI. I had the 11 ghosts, then I updated to SP2 and all the ghosts doubled and made SP2 versions of themselves. The real computer ghosted again at that point too. Weird thing: some of them had credit. Strange for a machine that doesn't exist.

John McLeod VII
Moderator
Joined: 10 Nov 04
Posts: 547
Credit: 632255
RAC: 0

Message 5503 in response to message 5498

> I have my system at 0.5 days. It keeps SETI and Einstein from overfilling my
> cache, but there are disadvantages to this. The main issue is this little
> cherry: DL refused, "have WUs but you won't finish in time". My cache is
> small because if I have a large one, SETI and Einstein flood it. Half a day of
> work is two WUs (one of each), and that's actually 2 days by the time they
> finish. One full day is 3 and 3, which equals six days. The smaller projects
> deny work because they won't get it back on time.
>
> I propose this:
>
> Phase out non-BOINC Manager core clients (CCs). I figure most people will
> start messing around and figure out how many WUs their machine can handle.
> Then have the projects reconfigure things so that instead of "connect every
> 0.5 days" for all the projects, each project has its own little "download
> this many WUs per week" box. Then people set it to however much of each they
> want to do per week. The one problem is that it requires everyone who wants
> to run 24/7/365 to sign up for CPDN.
> For example, in a week mine can handle:
> CPDN = 1 WU whenever
> SETI = 1 WU
> Einstein = 1 WU
> alpha 1 = 12 WUs
> alpha 2 = 12 WUs
> Predictor = 2 WUs
> LHC = one 1-million-rotation WU or twelve 100k-rotation WUs
> Plus I have alpha 3 & 4, which run constantly at one WU a year.
> That's a week's worth of work that I could do with no missed deadlines, but
> with the system we use right now I can get one each of SAH and Predictor, and
> all the other projects give me "have WUs but you won't finish in time"
> because alpha 1 has a three-day deadline and alpha 2 has a two-day deadline.
>
There is a major flaw with WU counts: even within the same project, different WUs are known to take vastly different times. For S@H, AstroPulse WUs will take about double the time that standard SETI WUs take. Southern-hemisphere WUs (from Parkes) are also coming at some point, and these will take yet a different amount of time. Re-observation work will take more time. Predictor has two different WU sizes, MFold and CharMM; CharMM takes about half the time that MFold takes. In general, setting a time period is going to work much better.
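To make the point concrete, here is a minimal sketch comparing a WU-count quota with a time budget. Only the ratios come from the post (AstroPulse about 2× a SETI WU, CharMM about 0.5× an MFold WU); the absolute hours `SETI_HOURS` and `MFOLD_HOURS` are made-up placeholders for one hypothetical host, and the function names are illustrative only.

```python
import math

# Hypothetical per-host runtimes; only the ratios come from the post.
SETI_HOURS = 5.0
MFOLD_HOURS = 4.0

WU_HOURS = {
    "seti": SETI_HOURS,
    "astropulse": 2.0 * SETI_HOURS,  # "about double" a SETI WU
    "mfold": MFOLD_HOURS,
    "charmm": 0.5 * MFOLD_HOURS,     # "about half" of an MFold WU
}


def hours_for_count(kind, count):
    """CPU hours actually committed by a fixed WU-count quota."""
    return count * WU_HOURS[kind]


def count_for_hours(kind, budget_hours):
    """Whole WUs of this kind that fit in a fixed time budget."""
    return math.floor(budget_hours / WU_HOURS[kind])


if __name__ == "__main__":
    # "2 Predictor WUs per week" is 8 h or 4 h depending on which WUs arrive.
    print(hours_for_count("mfold", 2), hours_for_count("charmm", 2))  # 8.0 4.0
    # An 8-hour budget adapts the count instead.
    print(count_for_hours("mfold", 8), count_for_hours("charmm", 8))  # 2 4
```

A count quota silently doubles or halves the real commitment when the WU mix changes, while a time budget keeps the commitment fixed and lets the count vary.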

John McLeod VII
Moderator
Joined: 10 Nov 04
Posts: 547
Credit: 632255
RAC: 0

Message 5504 in response to message 5494

> This problem is definitely known to the BOINC developers. There have been
> several discussions about it on the developers mailing list. I expect (hope) it
> to be the next thing worked on after the current development version is
> released, right now getting that out is priority 1.
>
> The current short term scheduler attempts to divide CPU time according to the
> resource shares in a day. All projects start at zero at the beginning of the
> day and do not remember if they were the only project the day before or
> anything like that.
>
> The older long term scheduler attempted to divide CPU time over about 1 to 2
> weeks. It did this by downloading work from projects that were behind, which
> would then be crunched starting with the earliest deadline; the only time it
> changed projects was between workunits.
>
> Currently the best looking idea is to try to combine the 2 schedulers. The
> short term scheduler would still mostly work the same way but instead of
> directly getting more work it would pass the request to the long term
> scheduler. The long term scheduler would keep the queue down to a manageable
> length and remember which projects were worked on recently and try to work on
> other projects next.
>
> There are logic problems with all 3 of these approaches though. The one with
> the short term scheduler is obvious from this thread, you cannot attach to an
> unlimited number of projects. The long term scheduler had 2 major problems.
> With the right combination of queue size, work speed and deadline a project
> could tie up a computer indefinitely. The other was a long workunit project
> would be the only thing on the computer for extended periods (2 workunits
> before a project change). The combination scheduler should eliminate the
> problem with a project locking the other projects out. It should make it
> possible to attach to unlimited projects but it still may miss deadlines
> sometimes depending on how many projects manage to squeeze work in at one
> time. But it will only reduce the effect of long workunit projects being the
> only thing on the computer.
>
I think I have come up with a decent solution (in the shower, of course).

The debt is calculated at the times it is needed (not at the end of a WU, as was done in the original; this prevents 2 CPDN WUs from running back to back). Two debts are maintained. The long-term debt is allowed to float to whatever number it needs. It is reduced somewhat if a project fails to deliver any work when asked (to prevent projects that are down for long periods from dominating too badly when they come back). However, it is not reduced based on time (so that CPDN does not get run again too soon after it is completed). The short-term debt would work about the same way that the current short-term debt works.
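A minimal sketch of the two-debt bookkeeping described above, assuming debts are kept in CPU hours and that a project's debt grows by its resource fraction of elapsed time minus the CPU time it actually received. The post only says the long-term debt is "reduced somewhat" on a failed fetch, so `FETCH_FAILURE_FACTOR = 0.75` is an invented placeholder; `ProjectDebt`, `accrue`, `start_of_day` and `fetch_failed` are hypothetical names, not BOINC client code.

```python
from dataclasses import dataclass

# Hypothetical: the post only says "reduced somewhat" on a failed fetch.
FETCH_FAILURE_FACTOR = 0.75


@dataclass
class ProjectDebt:
    name: str
    resource_fraction: float  # this project's share / total of all shares
    long_term: float = 0.0    # hours; floats freely, never decays with time
    short_term: float = 0.0   # hours; behaves like the current daily debt


def accrue(projects, elapsed_hours, cpu_used):
    """Bring both debts up to date whenever a scheduling decision is needed,
    rather than only at the end of a WU. cpu_used maps project name to the
    CPU hours it actually got during the interval."""
    for p in projects:
        owed = p.resource_fraction * elapsed_hours - cpu_used.get(p.name, 0.0)
        p.long_term += owed
        p.short_term += owed


def start_of_day(projects):
    """Short-term debt starts from zero each day, as in the current scheduler."""
    for p in projects:
        p.short_term = 0.0


def fetch_failed(p):
    """A project asked for work delivered none (e.g. it is down): shrink its
    positive long-term debt so it cannot dominate too badly when it returns."""
    if p.long_term > 0:
        p.long_term *= FETCH_FAILURE_FACTOR


if __name__ == "__main__":
    seti = ProjectDebt("seti", 0.5)
    cpdn = ProjectDebt("cpdn", 0.5)
    # Two days in which only the CPDN WU ran.
    accrue([seti, cpdn], 48.0, {"cpdn": 48.0})
    start_of_day([seti, cpdn])
    print(seti.long_term, cpdn.long_term)  # 24.0 -24.0
    fetch_failed(seti)
    print(seti.long_term)                  # 18.0
```

Because the long-term debt never decays with time, CPDN's negative balance persists after its WU finishes, which is what keeps it from being picked again too soon.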

When the client gets down to the minimum queue size, it requests work from the project with the highest debt. The calculation would be min(maxqueue - current work, (maxqueue - current work) * resource fraction + longterm debt). After the work is retrieved from the first project, the second project is contacted using the same equation, based on the new current work. Since the queues are overfilled (the work request is satisfied, and some fraction of the last WU slops over the request), the last project may or may not be contacted, because the queue is already full. If a user does not have something like CPDN, a very short queue length, or a slow machine, it will behave very much as it does today. If CPDN is added to the mix, CPDN will be the only WU running for some time. After the CPDN WU completes, the sharing will tend to be by queue fill. With a slow machine or a short queue, the sharing will tend to be by queue fill rather than by the hour. However, slow machines will tend to get less work. It will have shorter turnaround times with short queues even if there are a large number of projects on that machine (this is better for the science).
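The work-fetch rule can be written down almost directly. A minimal sketch, assuming queue sizes and debts are all in CPU hours on this host and that each project answers with whole WUs of a fixed, hypothetical size `wu_hours` (which is what makes the queue overfill). Skipping a project whose request comes out zero or negative is my assumption, not something the post specifies; `Project` and `fetch_work` are illustrative names, not BOINC client code.

```python
import math
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    resource_fraction: float  # share / total share
    long_term_debt: float     # hours; may be negative
    wu_hours: float           # hypothetical runtime of one WU on this host


def fetch_work(projects, current_work, min_queue, max_queue):
    """Return ([(project, WUs fetched), ...], queued hours afterwards)."""
    fetched = []
    if current_work > min_queue:
        return fetched, current_work  # not down to the minimum yet: no fetch
    # Highest debt first; each request is recomputed from the current queue.
    for proj in sorted(projects, key=lambda pr: pr.long_term_debt, reverse=True):
        room = max_queue - current_work
        if room <= 0:
            break  # queue already full: remaining projects are not contacted
        request = min(room, room * proj.resource_fraction + proj.long_term_debt)
        if request <= 0:
            continue  # assumption: a project owed nothing is skipped
        wus = math.ceil(request / proj.wu_hours)  # whole WUs, so it slops over
        current_work += wus * proj.wu_hours
        fetched.append((proj.name, wus))
    return fetched, current_work


if __name__ == "__main__":
    projects = [
        Project("A", 0.5, 4.0, 3.0),
        Project("B", 0.3, 1.0, 2.0),
        Project("C", 0.2, -2.0, 5.0),
    ]
    print(fetch_work(projects, current_work=2.0, min_queue=4.0, max_queue=24.0))
    # ([('A', 5), ('B', 2)], 21.0)
```

Here B's 3.1-hour request is rounded up to two 2-hour WUs, and by the time C is considered its request is negative, so it gets nothing this round.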

The client should still implement a safety that starts crunching the nearest deadline first if there are WUs with tight deadlines. This could be facilitated by the client remembering which deadline durations the servers have requested, and reporting the slack time available for each of these deadline durations (S@H would have one entry, at 14 days; Predictor would have several, at 1 day, 3 days and 14 days if I recall correctly; other projects would have different reports). This list is not that hard to build. Having this information sent back to the server would mean that the server would know not to send a 1-day-deadline WU to a client that needs all of that day to crunch other short-deadline WUs (possibly from other projects). The server would not have to know what projects are running on the client, just the slack time.
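A minimal sketch of the slack report and the matching server-side check, assuming slack for a duration D is the crunch time available in the next D hours minus the remaining work of queued WUs due within D. `crunch_fraction`, `QueuedWU`, `slack_report` and `server_should_send` are hypothetical names, and the model is simplified: it ignores work from later-deadline WUs that might run inside the window.

```python
from dataclasses import dataclass


@dataclass
class QueuedWU:
    remaining_hours: float  # estimated CPU hours still needed
    due_in_hours: float     # hours until its report deadline


def slack_report(queue, durations_hours, crunch_fraction=1.0):
    """For each deadline duration the servers have used, CPU hours still free
    within that window after the queued WUs due inside it are done.
    crunch_fraction (hypothetical) is the share of wall time BOINC can crunch."""
    report = {}
    for d in durations_hours:
        available = d * crunch_fraction
        committed = sum(wu.remaining_hours for wu in queue if wu.due_in_hours <= d)
        report[d] = available - committed
    return report


def server_should_send(report, deadline_hours, wu_hours):
    """Server side: send only if the WU fits in the slack reported for its
    deadline duration. No knowledge of the host's other projects is needed."""
    return wu_hours <= report[deadline_hours]


if __name__ == "__main__":
    queue = [QueuedWU(10, 24), QueuedWU(20, 72), QueuedWU(30, 336)]
    report = slack_report(queue, [24, 72, 336])  # 1, 3 and 14 days
    print(report)                                # {24: 14.0, 72: 42.0, 336: 276.0}
    print(server_should_send(report, 24, 20))    # False
    print(server_should_send(report, 24, 12))    # True
```

A fuller version would also check that the new WU fits in every longer window, since accepting it uses up slack there as well.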

The only real problem I can see with this is the loss, in some instances, of sharing by the hour. However, it was mostly those instances that were causing the problem with sharing by the hour (the exception being CPDN, and anyone running CPDN should be taking the long view anyway). Can anyone else see any problems with this?
