Optimal distribution of work

A-D
Joined: 22 Jan 05
Posts: 22
Credit: 48640312
RAC: 0
Topic 188692

I apologize if this has been discussed here before, but I couldn't find anything.

I notice that over at Predictor they send a WU to three people (initially) and grant credit when two results match, whereas here results are sent to four people and credit is given when three results match.

Is it necessary to get three matching results? What are the chances of two bad results matching? (This is a real question - I have no clue what the chances are.) And even if there is some non-negligible chance of this, wouldn't it be offset by the substantial increase in work accomplished by streamlining the distribution of work?
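One rough way to put numbers on the "two bad results matching" question is sketched below. It is only a toy model, and nobody in the thread supplied these figures: it assumes each result is wrong independently with probability `p_bad`, and that one wrong result happens to be identical to another wrong result with probability `p_collide`. Both values in the example are made-up placeholders, not measurements from any project.

```python
def false_validation_probability(p_bad: float, p_collide: float, quorum: int) -> float:
    """Chance that `quorum` results are all wrong AND all identical.

    p_bad:     assumed probability that a single returned result is wrong
    p_collide: assumed probability that a wrong result equals a given other wrong result
    """
    if quorum < 1:
        raise ValueError("quorum must be at least 1")
    # All `quorum` results must be wrong, and each one after the first
    # must also coincide with the first wrong answer.
    return (p_bad ** quorum) * (p_collide ** (quorum - 1))


# Placeholder inputs: 1% bad results, 0.1% chance that two bad results agree.
for quorum in (2, 3):
    print(quorum, false_validation_probability(0.01, 0.001, quorum))
```

With these placeholder inputs the model gives roughly one false validation in ten million for a quorum of two, and far less for a quorum of three. The real answer depends entirely on the true error rates and on whether errors are correlated (for example, a bug shared by many hosts), which this sketch ignores.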

So wouldn't the optimal distribution of work be to give a WU to two computers and only send it to others when the deadline expires, an error is reported, or the results don't match? Think of all the redundant results the project could avoid, and thus all the extra work it could accomplish. The only downside I can see to this is that credit would stay "pending" for longer on average. But who cares? I don't crunch to get gold stars to put on the refrigerator - I want to see the work get done.

Note that these are two separate questions, and that the second applies to Predictor and other projects, as well. _However_ many matching results are required for scientific reliability, why send the work to more than that many computers unless you have to?
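The trade-off described above (send only as many copies as the quorum needs, and reissue on a missed deadline, an error, or a mismatch) can be compared with the "issue one extra up front" approach using a small simulation. This is a sketch under stated assumptions, not how any project's scheduler works: every copy is assumed to come back usable with the same independent probability `p_ok`, and the value 0.9 is a placeholder. The function names are hypothetical.

```python
import random


def results_issued(quorum: int, initial: int, p_ok: float, rng: random.Random) -> int:
    """Count copies of one WU sent out before `quorum` usable results are back.

    `initial` copies always go out up front; after that, one replacement is
    sent for each copy still missing (deadline passed, error, or mismatch).
    """
    issued = 0
    good = 0
    while good < quorum or issued < initial:
        issued += 1
        if rng.random() < p_ok:  # this copy came back usable
            good += 1
    return issued


def average_issued(quorum: int, initial: int, p_ok: float,
                   trials: int = 100_000, seed: int = 1) -> float:
    """Average number of copies issued per WU over many simulated WUs."""
    rng = random.Random(seed)
    total = sum(results_issued(quorum, initial, p_ok, rng) for _ in range(trials))
    return total / trials


# Placeholder success rate of 90% per copy.
eager = average_issued(quorum=3, initial=4, p_ok=0.9)
lazy = average_issued(quorum=3, initial=3, p_ok=0.9)
print(f"issue 4, need 3: {eager:.2f} copies per WU")
print(f"issue 3, need 3: {lazy:.2f} copies per WU")
```

In this model the lazy policy averages close to `quorum / p_ok` copies per WU, while the eager policy can never go below `initial`. What the sketch does not capture is the cost the thread goes on to discuss: under the lazy policy, credit stays pending longer whenever a copy has to be reissued.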

ghstwolf
Joined: 9 Feb 05
Posts: 24
Credit: 59103
RAC: 0

I think part of it is a numbers game: Seti and Einstein have many more users (Einstein about 2.5X, Seti nearly 6X). This does make a difference, since the extra runs are easily absorbed.

Another difference is the data sets. Seti and Einstein use recordings from observatories that are not always available (and even when they are, the recordings don't always yield usable results). Predictor and CPDN are models: they design their WUs (set variables and send), so they can always create more. That isn't the case with Seti and Einstein; they have what they have (at any given time, counting the caches of tapes). So in the case of Seti and Einstein, why let the data collection be outpaced by the processing? If the pool grows big enough, maybe 6 matching runs will be required. At that point it is not about scientific accuracy, but about keeping interest (NO work kills projects fast).


John McLeod VII
Moderator
Joined: 10 Nov 04
Posts: 547
Credit: 632255
RAC: 0

Message 9195 in response to message 9194

> I think part of it is a numbers game, Seti and Einstein have many more users
> (Einstein about 2.5X, Seti nearly 6X). This does make a difference, the extra
> runs are easily absorbed.
>
> Another difference, is the data sets. Seti and Einstein use recordings,
> observatories that are not always available (and even when they are it doesn't
> always yield usable results). Predictor and CPDN are models, that is they
> design their WUs (set variables and send), they can always create more. That
> isn't the case with Seti and Einstein, they have what they have (at any given
> time, we can include the caches of tapes). So in the case of Seti and
> Einstein, why let the data collection be outpaced by the processing? If the
> pool grows big enough, maybe 6 matching runs will be required. At that point
> it is not about Scientific accuracy, but keeping interest (NO work kills
> projects fast).
>
Actually, no. If S@H has too many crunchers dedicating too much of their CPU power, it will simply run out of work. The same is true of all of the projects. The assumption is that if there are enough projects, everybody can get work from someplace.

ghstwolf
Joined: 9 Feb 05
Posts: 24
Credit: 59103
RAC: 0

Message 9196 in response to message 9195

> Actually, no. If S@H runs out of work because there are too many crunchers
> dedicating too much of their CPU power, they will run out of work. The same
> is true of all of the projects. The assumption is that if there are enough
> projects, everybody can get work from someplace.
>

True enough, but it is a position they have yet to run into (in what, 7 years? And they have been able to increase their detection to keep ahead). Yes, they have their bad days getting the work out, but they've yet to run out of a work pool. I'm not sure about the in/out, but I do wonder if we are catching up, maintaining, or falling behind. I wonder if something would change on the day they are splitting tapes the same day they come in.


Keck_Komputers
Joined: 18 Jan 05
Posts: 376
Credit: 5744955
RAC: 0

Message 9197 in response to message 9196

> > Actually, no. If S@H runs out of work because there are too many crunchers
> > dedicating too much of their CPU power, they will run out of work. The same
> > is true of all of the projects. The assumption is that if there are enough
> > projects, everybody can get work from someplace.
> >
>
> True enough, but it would be a position they have yet to run into (in what, 7
> years and they have been able to increase their detection to keep ahead). Yes
> they have their bad days getting the work out, but they've yet to run out of a
> work pool. I'm not sure about the in/out, but I do wonder if we are catching
> up, maintaining, or falling behind. The day they are splitting tapes the day
> that they come in, I wonder if something would change.
>
This day will come for seti and will come soon. The BOINC version of seti already does more science per day than classic. How long will it take them to rerun every tape from the old client (classic v2.xx) and all the tapes with the astropulse program? After that the only way seti will be able to keep up with demand is more data coming in.

BOINC WIKI

BOINCing since 2002/12/8

A-D
Joined: 22 Jan 05
Posts: 22
Credit: 48640312
RAC: 0

Well, obviously if there really is more processing power than is strictly needed, it may as well be used to increase corroboration and decrease time-to-credit. But I was under the impression that projects like LIGO and LHC would be producing far more data than the present BOINC community could hope to crunch.

I realize that both of these systems are still in initial phases, and that there may be a bottleneck with the preparation and distribution of work units rather than the generation of raw data. Still, my suggested scheme does seem to be optimal (whether with two or with three matching results), and so I hope things can move that way as the data bottleneck (whatever exactly it is) is addressed. Anyone with any inside knowledge on this reading?

ben
Joined: 9 Feb 05
Posts: 36
Credit: 1663
RAC: 0

> How long will it take them to rerun every tape from the old client (classic v2.xx) and all the tapes with the astropulse program?

Not certain they will re-run every tape...could you post a link to where project heads stated that goal?

In any case, yes, once work is used up, seti will not re-run it.
They have 3 projects in the wings to produce even more data though, and these were the primary reasons they created BOINC.

Project 1 - Astropulse: Relook at each WU with an entirely different crunching program to look for evaporation of possible black holes. I haven't ever run this, but I suspect the time for each WU would be much less.

Project 2 - Multi-beam receiver at Arecibo: A new antenna is being/has been mounted on the Arecibo gimbal arm, along with the original single beam. Along with this will go a new faster/better recorder to make the tapes. This signal will have much more information than current tapes/WUs, and will use a wholly different crunching application to process.

Project 3 - Extending seti detection to other dishes. Either getting data that is already collected by other dishes, or adding antennas to other dishes. (Example: Right now Arecibo does southern hemisphere and Northern isn't covered)
-----

A little WU result issuing history:
Originally in BOINC seti issued 3 WUs...there was much complaining in the forums that "2 people have returned the result, guy #3's machine has a history of not returning WUs (going over)...so my machine won't get credited until the WU is re-issued and a 4th guy sends it back (unless he too goes over)".

Various solutions were proposed, like:
1. generating numbers to see who usually returns WUs and who goes over (can be caused by resetting or detaching)
2. or how fast WUs are usually returned (Average turnaround time...that was mine). Once you know this, if you have to re-issue a WU to a 4th machine, choose a fast turnaround machine.
3. Issuing to 4 hosts (as often happened anyway), and when 3 return it grant credit then.

They chose #3.
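For comparison, option 2 above (reissue to a host with a fast average turnaround) could look something like the sketch below. This is purely illustrative and is not the BOINC scheduler's code or data model: the `Host` record, its `turnaround_days` history, and `pick_reissue_host` are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional, Set


@dataclass
class Host:
    """Hypothetical per-host record of past return times, in days."""
    name: str
    turnaround_days: List[float] = field(default_factory=list)


def pick_reissue_host(hosts: List[Host], already_assigned: Set[str]) -> Optional[Host]:
    """Pick the host with the lowest average turnaround that has not
    already been sent this WU. Hosts with no history are skipped."""
    candidates = [
        h for h in hosts
        if h.name not in already_assigned and h.turnaround_days
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda h: mean(h.turnaround_days))


hosts = [
    Host("alpha", [6.0, 7.5]),
    Host("beta", [1.0, 2.0]),
    Host("gamma", [0.5, 0.7]),
]
# "gamma" already holds a copy of this WU, so it cannot be chosen again.
chosen = pick_reissue_host(hosts, already_assigned={"gamma"})
print(chosen.name)  # beta
```

A scheme like this shortens the wait for credit without issuing an extra copy of every WU, at the price of tracking per-host history on the server.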

A-D
Joined: 22 Jan 05
Posts: 22
Credit: 48640312
RAC: 0

Message 9200 in response to message 9199

> Originally in BOINC seti issued 3 WUs...there was much complaining in the
> forums that "2 people have returned the result, guy #3's machine has a history
> of not returning WUs (going over)...so my machine won't get credited until the
> WU is re-issued and a 4th guy sends it back (unless he too goes over)".

What on Earth does it matter? Even if you do care about "credit," you'll get it all eventually, and the rate of return will settle down after you've been running it for a while. These people would rather see massive superfluous crunching than wait a bit longer for their precious "credit"? What's wrong with them? This is infantile.

Thanks for the update on the future of SETI. I've been trying to figure out what's happening with Astropulse, and whether SETI per se will end once they get through these tapes. I like the idea of putting the system to astronomical uses, but I hope that SETI does continue its primary mission. Is there more info available on the multi-beam receiver?

By the way, you know about PlanetQuest? http://www.planetquest.org

ghstwolf
Joined: 9 Feb 05
Posts: 24
Credit: 59103
RAC: 0

Message 9201 in response to message 9199

> Not certain they will re-run every tape...could you post a link to where
> project heads stated that goal?

Sorry for not finding a link, but it had been discussed a while back. They were talking about some of the new things that were checked for in the BOINC version (tests not available on Classic), but nothing solid about actually doing it.

> Project 2 - Multi-beam receiver at Arecibo: A new antenna is being/has been mounted
> on the Arecibo gimbal arm, along with the original single beam. Along with
> this will go a new faster/better recorder to make the tapes. This signal will
> have much more information than current tapes/WUs, and will use a wholly
> different crunching application to process.

Any other details on this? The geek in me wants to know. It seems to me any changes from this could be handled back-end, requiring no client side change. Maybe there is something to it I'm missing? IIRC, the multi-beam unit is installed (maybe not calibrated, but physically there), and part of its installation freed Arecibo for more recording time.


ben
Joined: 9 Feb 05
Posts: 36
Credit: 1663
RAC: 0

Message 9202 in response to message 9201

> Any other details on this, the geek in me wants to know. It seems to me any
> changes from this could be handled back-end, requiring no client side change.
> Maybe there is something to it I'm missing??? IIRC, the multi-beam unit is
> installed (maybe not calibrated, but physically there), and part of its
> installation freed Arecibo for more recording time.

I saw the recorder unit when I visited seti, but beyond that I don't know if the actual antenna is connected yet. It had kind of a low priority setting vs getting BOINC completely ready for other projects.

The code to handle data from the new antenna will/would either be a separate project or another science app that is downloaded by BOINC under the seti project.

It does pick up considerably more information per time unit than the single beam one, and as such allows more refinement of incoming signals.

ghstwolf
Joined: 9 Feb 05
Posts: 24
Credit: 59103
RAC: 0

Message 9203 in response to message 9202

>
> The code to handle data from the new antenna will/would either be a separate
> project or another science app that is downloaded by BOINC under the seti
> project.
>
> It does pick up considerably more information per time unit than the single
> beam one, and as such more refinement of incoming signals.
>

OK, I can see that. I was under the impression that it would be easy to separate the antennas in the recording (then just minor tweaks for the sensitivity of the receiver in the client). You've made it sound far more complicated, and I can believe that. Thanks for the correction.

Anyway, to get back to the board's topic: Where are we on this run? And when will S4 be ready to go (at least as raw data)?

