Quote:
The deadline (actually its "length") is a property of the workunit and is thus inserted by the workunit generator when the workunit is created; nothing is known (or needs to be known) at that point about the host it will later be assigned to. The workunit generator does, however, know the "size" of a workunit, which is reflected in the number of credits that will finally be granted for it. A variable deadline would be derived from this "size" of the workunit, not from any information about any host.
Would this concept of a variable deadline be desirable?
BM
I think so. I believe SETI uses the same system: there, too, the deadline varies with the size of the WU.
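The idea that the deadline length belongs to the workunit, and is fixed from its size alone when it is generated, can be sketched as below. BOINC calls this length the workunit's `delay_bound`; the class, the helper names and both constants (`CREDITS_PER_DEADLINE_DAY`, `MIN_DEADLINE_DAYS`) are hypothetical illustrations, not Einstein@Home's actual values or code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Illustrative assumptions, not Einstein@Home values.
CREDITS_PER_DEADLINE_DAY = 20.0  # hypothetical: 1 day of deadline per 20 credits
MIN_DEADLINE_DAYS = 10.0         # hypothetical floor for small workunits


@dataclass
class Workunit:
    name: str
    expected_credit: float   # the WU's "size", known when it is generated
    delay_bound: timedelta   # deadline *length*, fixed at creation time


def make_workunit(name: str, expected_credit: float) -> Workunit:
    """Workunit generator side: only the size is used, no host information."""
    days = max(MIN_DEADLINE_DAYS, expected_credit / CREDITS_PER_DEADLINE_DAY)
    return Workunit(name, expected_credit, timedelta(days=days))


def report_deadline(wu: Workunit, sent_time: datetime) -> datetime:
    """Scheduler side: the absolute deadline is set when a result is sent out."""
    return sent_time + wu.delay_bound


small = make_workunit("small", 100)  # 100 / 20 = 5 days, raised to the 10-day floor
big = make_workunit("big", 400)      # 400 / 20 = 20 days
print(small.delay_bound.days, big.delay_bound.days)  # 10 20
print(report_deadline(big, datetime(2007, 1, 1)))    # 2007-01-21 00:00:00
```

The host only enters the picture in `report_deadline`, where the send time turns the fixed length into a calendar date.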
Quote:
The deadline (actually its "length") is a property of the workunit [...] A variable deadline would be derived from this "size" of the workunit, not from any information about any host.
Would this concept of a variable deadline be desirable?
BM
Agreed, and when the scheduler decides whether or not to send a given WU to a host, it has the host's reported performance and other metrics to use for that purpose.
However, keep in mind that you would only 'need' variable deadlines if you intended to send the whole spectrum of template frequencies to all hosts, so that the tightness factor stays constant over that range.
If you were to stick with the 'slowhost/fasthost' method used in S5R1/I, then the main factor would be the trigger points for the ranges. Of course, the downside of that with a fixed two-week deadline is that you progressively raise the bar for who can participate as the work gets 'tougher'. IOW, slow hosts would only be able to run a smaller fraction of the work, all other things being equal.
So it seems to me that the choice of strategies boils down to how much the extra load from either method would impact the DB backend. Although I can't say for sure, I would think that just bumping the deadline by a week would have less effect than variable deadlines at a fixed tightness factor, since my data indicates it could easily take a host at the low end of the speed spectrum a month to run the 'toughies'.
Thinking about it, since we're in Beta right now anyway, wouldn't it be easier to test the 3-week deadline theory at this point than variable deadlines with regard to DB load?
Alinator
Quote:
I guess that if the project goes for variable deadlines, one would base the decision on RAC instead of benchmark results? That should do the trick.
Quote:
The deadline (actually its "length") is a property of the workunit [...] A variable deadline would be derived from this "size" of the workunit, not from any information about any host.
Would this concept of a variable deadline be desirable?
BM
That seems to be the way that SETI does it.
BTW: I just got a SETI unit with a "To completion" time of a little more than 5 hours. Its deadline is 3 weeks away. The most current Einstein unit (with a 2-week deadline) on the same host will take about 50 hours. While I continue to think that longer deadlines are in order, I realize that BOINC "takes care" of the issue by making sure my deadlines are met and then, in the case of Einstein, putting off requests for new work until its debt is paid off.
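The RAC-based decision suggested in the quoted post could look roughly like the check below. This is a sketch under stated assumptions: `can_meet_deadline` and its `safety` margin are hypothetical, and it is not how the BOINC scheduler actually decides. It relies only on RAC being credit per day, so that expected credit divided by RAC approximates the days a host needs.

```python
from typing import Optional


def can_meet_deadline(expected_credit: float, host_rac: float,
                      deadline_days: float, safety: float = 0.8) -> Optional[bool]:
    """Rough send/no-send check based on a host's RAC for this project.

    RAC is credit per day, so expected_credit / host_rac approximates how many
    wall-clock days the host needs, already reflecting its on-time and resource
    share. `safety` (an assumed margin) keeps the estimate below the deadline.
    """
    if host_rac <= 0:
        return None  # no RAC history yet: caller must fall back, e.g. to benchmarks
    days_needed = expected_credit / host_rac
    return days_needed <= safety * deadline_days


print(can_meet_deadline(300, 50, 14))  # 6 days needed vs 11.2 allowed -> True
print(can_meet_deadline(300, 20, 14))  # 15 days needed -> False
```

One design caveat: RAC lags behind changes in a host's speed or resource share, so a host whose circumstances changed recently may be misjudged for a while.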
Quote:
BTW: I just got a SETI unit with a "To completion" time of a little more than 5 hours. Its deadline is 3 weeks away. The most current Einstein unit (with a 2-week deadline) on the same host will take about 50 hours. [...]
You've just observed what I'm talking about when I speak of 'tightness factor'.
FWIW, historically speaking, EAH has normally been a tighter project than SAH.
Alinator
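One way to put a number on the 'tightness factor' for the two units mentioned above is to divide the estimated run time by the deadline length. That definition is an assumption (the thread never defines the term precisely), and `tightness` is a hypothetical helper.

```python
def tightness(cpu_hours: float, deadline_days: float) -> float:
    """Assumed definition: estimated run time as a fraction of the deadline window."""
    return cpu_hours / (deadline_days * 24.0)


seti = tightness(5, 21)   # SETI unit: ~5 h with a 3-week deadline
eah = tightness(50, 14)   # Einstein unit: ~50 h with a 2-week deadline
print(f"SETI {seti:.4f}, EAH {eah:.4f}, ratio {eah / seti:.1f}")
# SETI 0.0099, EAH 0.1488, ratio 15.0
```

By this measure, the Einstein unit on that host is about 15 times tighter than the SETI one.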
The other thing that SETI does that would help here is the initial replication of 3 with a quorum of 2. With the new BOINC software (5.8.x and up), the software will send 3 results, wait for the first 2 to come back and then cancel the 3rd if its host has not started it yet. We could eliminate the 45 to 60 day waits some people have had for credit that way.
Off topic: We could also send smaller, more reasonable WUs that don't scare people off.
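The replication-3/quorum-2 scheme described above can be sketched as follows. `time_to_quorum` and `can_cancel` are hypothetical helpers that model the behaviour as described in the post, not BOINC's server code. A replica that never returns is `None`.

```python
from typing import Optional, Sequence


def time_to_quorum(return_days: Sequence[Optional[float]], quorum: int) -> Optional[float]:
    """Days until `quorum` results are back (None = replica never returned).

    Returns None when these replicas alone never reach the quorum, i.e. the project
    must wait for a deadline to pass and reissue.
    """
    returned = sorted(t for t in return_days if t is not None)
    return returned[quorum - 1] if len(returned) >= quorum else None


def can_cancel(start_day: float, quorum_day: Optional[float]) -> bool:
    """A surplus replica can be cancelled only if its host had not started it
    by the time the quorum was reached."""
    return quorum_day is not None and start_day > quorum_day


print(time_to_quorum([3, None], quorum=2))     # None: wait for deadline, then reissue
print(time_to_quorum([3, None, 5], quorum=2))  # 5
print(can_cancel(start_day=6, quorum_day=4))   # True: unstarted 3rd replica dropped
```

With only two replicas, a single non-returning host forces a wait for the deadline. The third replica turns that wait into the turnaround time of the second-fastest host.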
Quote:
The other thing that SETI does that would help here is the initial replication of 3 with a quorum of 2. [...]
No offense meant, but issuing trailers by default is the last thing that should be considered, especially while we're in this beta phase and other possible scheduling issues have been observed.
My reasons:
1.) Any host not running a 5.10 client will end up wasting some, up to most, of its time running scientifically useless results, and therefore wasting the participant's money spent on power.
2.) With a tight-deadline project, your host might find itself running a little late and end up getting unconditionally aborted after having crunched most of the result, because the third result came in and validated. There are other twists to this scenario, and it applies to 5.5 CCs and up (IIRC).
3.) The amount of time a result stays pending has zero long term impact on any of your performance metrics, regardless of the reason for it.
4.) The large host cache scenario, where 221 functionality works to mitigate the wasted-time issue for always-connected fast hosts, is really intended for people who are not always connected (i.e. notebooks and DU participants). Issuing trailers by default just to placate instant gratification unduly penalizes them due to Items 1 and 2. One only needs to look at Dr. Anderson's comments on this to see how the 'head man' feels about it. The cache decoupling feature was only recently released, and the jury is still out on whether it's a good or bad thing in the context of being available to the whole spectrum of participants. My guess is that 221 functionality was also added to prevent wholesale deadline blowing in extreme-cache, short-CI scenarios when running multiple projects.
Alinator
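The cost that Items 1 and 2 attribute to trailers can be expressed per replica. This is a simplified model of the two scenarios as described in the post, not BOINC's exact abort rules, and `wasted_cpu_hours` with its flags is hypothetical.

```python
def wasted_cpu_hours(quorum_before_start: bool, client_can_cancel: bool,
                     late_and_aborted: bool, hours_crunched: float,
                     total_hours: float) -> float:
    """CPU hours a surplus ("trailer") replica costs its host in the two
    scenarios described above (a simplified model, not BOINC's exact rules)."""
    if quorum_before_start:
        # Item 1: a client that can be told to drop unstarted work loses nothing;
        # an older client crunches the whole, now redundant, result.
        return 0.0 if client_can_cancel else total_hours
    if late_and_aborted:
        # Item 2: a late result already in progress is aborted once the other
        # replicas validate, and the work done so far is lost.
        return hours_crunched
    return 0.0  # started and returned normally: treated as not wasted here


print(wasted_cpu_hours(True, True, False, 0.0, 50.0))   # 0.0
print(wasted_cpu_hours(True, False, False, 0.0, 50.0))  # 50.0
print(wasted_cpu_hours(False, True, True, 40.0, 50.0))  # 40.0
```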
Quote:
The other thing that SETI does that would help here is the initial replication of 3 with a quorum of 2. [...]
But doesn't that take a backend update as well (which is needed here)? I don't run SETI, and I haven't been paying very close attention to all of the server-side stuff that's come up in the last few weeks. I've had enough trouble keeping up with the client-side stuff.
On the topic of the thread, I personally am not running into deadline issues. But variable deadlines do seem to fit the bill here.
Kathryn :o)
Einstein@Home Moderator
Quote:
Doubling the deadline means basically doubling the size of our database, and it means that people have to wait for their results to be validated, and thus for credit to be granted, potentially twice as long.
If I understand things correctly (and that's a big if), I don't think this statement is necessarily true.
I would think that many of the result pairs (perhaps even the majority) get fully completed in less than 10 days with a 14-day deadline. I base this on observing many of my own results over time. So I ask this question: if the deadline had been 28 days instead of 14, would all those people who are taking 10 days or less suddenly start taking 20 days? I wouldn't have thought so. In fact, isn't it true to say that a simple increase in deadline would have no effect on those who are currently meeting the deadline, unless they suddenly started running their machines fewer hours per day, suddenly reduced the resource share they were prepared to allocate to EAH, or suddenly did something silly like drastically increasing their cache size? My gut feeling is that, whilst some may take one of these three actions, most wouldn't.
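The database-size argument can be made concrete with Little's law: the number of results in flight equals the issue rate times the average time a result stays in flight. The turnaround sample below is hypothetical, and the sketch assumes, as argued above, that hosts already meeting the deadline keep their current turnaround. Only results that are never returned are held until the deadline expires.

```python
from typing import Optional, Sequence


def mean_days_in_flight(turnarounds: Sequence[Optional[float]], deadline_days: float) -> float:
    """Average days a result occupies the database before it is returned or times out.

    A result returned in time stays for its actual turnaround; one never returned
    (None) or later than the deadline is counted as staying until the deadline.
    """
    times = [deadline_days if t is None or t > deadline_days else t for t in turnarounds]
    return sum(times) / len(times)


def results_in_flight(issue_rate_per_day: float, mean_days: float) -> float:
    """Little's law: items in the system = arrival rate x mean time in the system."""
    return issue_rate_per_day * mean_days


sample = [3, 5, 8, 10, None]  # hypothetical turnarounds in days; one host never returns
print(mean_days_in_flight(sample, 14))  # 8.0
print(mean_days_in_flight(sample, 28))  # 10.8
```

With this sample, doubling the deadline raises the average time in flight, and with it the database load at a fixed issue rate, by 35% rather than 100%. The real figure depends on what fraction of results is abandoned.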
As far as waiting for validation and credits, I don't think there would be much change. The anecdotal evidence suggests a significant drift of machines away from the project, because their owners perceive that they can't abide the long crunch times and the strict deadlines. Many simply leave without completing what they have, which means that work has to be reissued. In other words, the results so affected are going to take a long time to validate anyway. A longer deadline would encourage many of those people to "stick it out", which may actually reduce the total time to validation for quite a few results. Someone sticking to the job for 20 days is going to be faster than two people successively failing a 14-day deadline.
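The closing comparison is simple arithmetic, sketched below under the assumption that a result is reissued immediately after each missed deadline. The hosts' day counts are hypothetical, and `days_to_return` is an illustrative helper.

```python
from typing import Sequence


def days_to_return(days_needed_per_host: Sequence[float], deadline_days: float) -> float:
    """Days until a result is returned when it is tried on hosts one after another.

    A host that would need longer than the deadline holds the result for the full
    deadline before it is reissued (immediately, an assumed simplification).
    """
    elapsed = 0.0
    for need in days_needed_per_host:
        if need <= deadline_days:
            return elapsed + need
        elapsed += deadline_days
    raise ValueError("no host in the sequence finishes within the deadline")


print(days_to_return([20], 28))          # 20.0: one host sticks it out
print(days_to_return([20, 20, 10], 14))  # 38.0: two misses, then a third host
```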
Quote:
A deadline that depends on the "size" (i.e. expected run-time, credit etc.) of the workunit would be an interesting idea. I'll discuss that with the team.
I absolutely think this is the way to go. As the workunit generator knows the "size" and could calculate and insert an appropriate deadline, this course of action obeys the KISS principle. On your PDF graph you showed "size" in terms of crunch hours: 10, 20, 30, etc. It almost seems appropriate to turn those into deadline days: 10 days, 20 days, 30 days. It wouldn't need to be a continuous function; you could put certain frequencies into "speed bins" and have a single deadline for each bin, whatever is easiest for the WU generator to do.
Cheers,
Gary.
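The "speed bin" variant suggested above amounts to a table lookup in the workunit generator. In the sketch below, the bin edges and deadlines are hypothetical values that follow the "crunch hours to deadline days" idea. Bins are keyed by estimated run time, although a lookup keyed by frequency would work the same way.

```python
import bisect

# Hypothetical bins following the "crunch hours -> deadline days" idea above.
BIN_EDGES_HOURS = [10, 20]         # upper edges of the first two bins (inclusive)
BIN_DEADLINES_DAYS = [10, 20, 30]  # one deadline per bin: <=10 h, <=20 h, longer


def deadline_for(est_hours: float) -> int:
    """Look up the deadline of the speed bin that an estimated run time falls into."""
    return BIN_DEADLINES_DAYS[bisect.bisect_left(BIN_EDGES_HOURS, est_hours)]


print([deadline_for(h) for h in (8, 10, 15, 28)])  # [10, 10, 20, 30]
```

Keeping the number of distinct deadlines small is what makes this easy for the generator: changing policy means editing two short lists.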
Quote:
A deadline that depends on the "size" (i.e. expected run-time, credit etc.) of the workunit would be an interesting idea. I'll discuss that with the team.
BM
One comment: I've observed an undesirable side-effect on the short end of the current variable SETI deadlines.
For users running more than one project, a project with a low resource share can very easily trip into EDF processing when a new result is downloaded with a low predicted runtime and thus a very early deadline. On my machines, this effect is annoying. I run some machines at a 2% SETI share, but when SETI has a server hiccup, my machine overfetches during recovery (a long-known bug), and if some short units are among the overfetch, I get into Earliest Deadline First. If I run a queue of more than trivial length, some Einstein units soon end up in EDF as well. I won't go down the path of arguing whether anyone should care about this; the simple fact is that a fair number of people do.
I'd think most of us would like the behavior of the variable-deadline-by-size approach, so long as the low end did not dip below something like ten days.
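Why a short deadline on a low-share project trips EDF can be seen with a crude model. Assume a result gets only its project's resource share of one CPU while the machine is on, and the client switches to EDF if that pace would finish within a day of the deadline. This is a deliberate simplification of BOINC's round-robin simulation, and the helper names, the one-day slack and the 2-hour example unit are all assumptions.

```python
def predicted_finish_days(cpu_hours_left: float, share_fraction: float,
                          on_hours_per_day: float = 24.0) -> float:
    """Wall-clock days to finish a result if its project only gets its resource
    share of one CPU while the machine is on (crude model)."""
    return cpu_hours_left / (share_fraction * on_hours_per_day)


def would_enter_edf(cpu_hours_left: float, share_fraction: float, deadline_days: float,
                    on_hours_per_day: float = 24.0, slack_days: float = 1.0) -> bool:
    """True if, at its normal share, the result would finish too close to its deadline."""
    finish = predicted_finish_days(cpu_hours_left, share_fraction, on_hours_per_day)
    return finish > deadline_days - slack_days


# A hypothetical short 2-hour result at a 2% share on a 24/7 host needs ~4.2 days.
print(would_enter_edf(2.0, 0.02, deadline_days=4))   # True
print(would_enter_edf(2.0, 0.02, deadline_days=10))  # False
```

In this model, a ten-day floor keeps such a short unit out of EDF, while a deadline scaled strictly down with size does not.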
On the long end, the biggest project risk I can see is that an unlucky WU which gets downloaded to a sequence of machines that quit or return invalid results could take even longer to finally get resolved than now. So the tail at the end of the current campaign could take even longer and lead to even more massive multiple issuing in the end game.
On balance I think it is a good idea. As a first guess, I'd suggest the smallest units currently issued get ten days and the largest currently issued get double the current deadline, with a linear scale in between. No magic bullet this, but possibly a decent compromise among the considerations.
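The "first guess" above (ten days for the smallest units, double the current 14-day deadline for the largest, linear in between) can be written as one interpolation. In this sketch, `linear_deadline_days` is a hypothetical helper, `size` can be any size measure as long as `smallest` and `largest` use the same unit, and the 10 to 30 hour range in the example is illustrative.

```python
def linear_deadline_days(size: float, smallest: float, largest: float,
                         min_days: float = 10.0, max_days: float = 28.0) -> float:
    """Scale a WU size linearly onto [min_days, max_days]."""
    if largest <= smallest:
        raise ValueError("largest must be greater than smallest")
    frac = (size - smallest) / (largest - smallest)
    frac = min(max(frac, 0.0), 1.0)  # clamp sizes outside the current range
    return min_days + frac * (max_days - min_days)


# Hypothetical size range: 10 to 30 expected crunch hours.
print([linear_deadline_days(s, 10, 30) for s in (10, 20, 30)])  # [10.0, 19.0, 28.0]
```

The clamp keeps the ten-day floor even if a future unit is smaller than anything issued today.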
When you look at the basic assumptions behind BOINC, the deadlines MUST be extended.
You cannot expect hosts to be on 24/7; most are probably only on during office hours, or at home outside sleeping hours, so don't expect more than 8 hrs/day as an absolute max.
You must assume the host is attached to more than one project, so divide that time by 2 or 3.
The host probably runs Windows with the standard app and is on average two years old. Therefore the crunch time for a mid-range Einstein unit is 20+ hours.
BOINC is only expected to use spare CPU cycles.
And in reality a 14-day deadline is probably closer to 12 days of crunching, as each unit is probably downloaded a day before it starts and the scheduler tries to return it 24 hrs before the actual deadline.
From this I would guess the average attached computer could do one unit per CPU at most, and probably at some point is in EDF for the Einstein units.
Andy
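Andy's back-of-envelope estimate can be checked with a few lines. The sketch assumes an even CPU split between projects and about 2 days lost per deadline, as stated in the post. `units_per_deadline` is a hypothetical helper.

```python
def units_per_deadline(deadline_days: float, on_hours_per_day: float,
                       n_projects: int, hours_per_unit: float,
                       lost_days: float = 2.0) -> int:
    """Whole units one CPU can finish within a deadline under the assumptions above:
    limited on-time, CPU split evenly between projects, about 2 days lost
    (download ahead of start, return a day early)."""
    crunch_days = deadline_days - lost_days
    hours_for_project = crunch_days * on_hours_per_day / n_projects
    return int(hours_for_project // hours_per_unit)


print(units_per_deadline(14, 8, 3, 20))  # 1  (32 h available)
print(units_per_deadline(14, 8, 2, 20))  # 2  (48 h available)
```

The "one unit per CPU at most" figure matches the three-project case. With two projects, the same assumptions leave room for two 20-hour units.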