Far Far Too Much Redundancy!

Betting Slip
Joined: 22 Jan 05
Posts: 15
Credit: 308,376
RAC: 0
Topic 188744

Please reduce the amount of redundancy that is employed, as it is depriving other projects of computer time while we are needlessly crunching a quorum of 4 when the absolute max needed is 3. SETI is guilty and so is Einstein. I know it's so that they can grant credit faster and delete the unit from their disks, but it's not justified. In effect, for every 100 computers only 25 are doing any real work; the others are only duplicating. You need redundancy, but not that much.

Michael Karlinsky
Joined: 22 Jan 05
Posts: 888
Credit: 22,245,334
RAC: 0

Yes I agree.

Seti started with a quorum of three, but then all the
"When will I get my credit?" - whiners came.

If you send out three results, chances are good that at least
one is not reported in time and has to be resent. That is perfectly normal,
but validation and credit calculation are delayed until a third
valid result is returned.

So seti decided to send four. So it is more likely to get three
valid results. And Einstein has adopted this strategy.
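
To put rough numbers on that reasoning, here is a minimal Python sketch. It assumes each result comes back valid and on time independently with the same probability; the function name `prob_quorum_met` and the value 0.9 are purely illustrative, not figures from SETI or Einstein.

```python
from math import comb


def prob_quorum_met(copies_sent: int, quorum: int, p_return: float) -> float:
    """Probability that at least `quorum` of `copies_sent` results return in time.

    Assumes each result returns independently with probability p_return,
    which is a simplifying assumption made only for illustration.
    """
    return sum(
        comb(copies_sent, k) * p_return**k * (1 - p_return) ** (copies_sent - k)
        for k in range(quorum, copies_sent + 1)
    )


p = 0.9  # hypothetical per-result on-time return rate
print(f"send 3, need 3: {prob_quorum_met(3, 3, p):.3f}")
print(f"send 4, need 3: {prob_quorum_met(4, 3, p):.3f}")
```

Under that assumed rate, sending three copies meets the quorum without a resend about 73% of the time, while sending four meets it about 95% of the time. The price is a fourth computation for every workunit.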

BTW, LHC sends far more, six, if I remember correctly, but
this is because CPU calculation is screwed over there...

(All spelling mistakes are intended for lysdexias pleasure only.)

Paul D. Buck
Joined: 17 Jan 05
Posts: 754
Credit: 5,385,205
RAC: 0

Message 9435 in response to message 9434

> Yes I agree.
>
> Seti started with a quorum of three, but then all the
> "When will I get my credit?" - whiners came.
>
> If you send out three results, chances are good that at least
> one is not reported in time and has to be resent. That is perfectly normal,
> but validation and credit calculation are delayed until a third
> valid result is returned.
>
> So seti decided to send four. So it is more likely to get three
> valid results. And Einstein has adopted this strategy.
>
> BTW, LHC sends far more, six, if I remember correctly, but
> this is because CPU calculation is screwed over there...
>
> (All spelling mistakes are intended for lysdexias pleasure only.)

There were, and are, other reasons for the change; one was to improve the speed at which the work could be "retired" from the on-line database.

LHC@Home has different issues; one of those is that the calculations are unstable enough that they can "crash" (if you pardon the simile), and that is not what is needed. I know it is hard to believe, but I spent a decade working with a mathematician on a program with similar instabilities, and it is shocking how quickly chaotic behavior can arise ...

I am Paul D. Buck, and not only did I approve this message, I injected spelling errors for the spelling police ...

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 4,753
Credit: 26,159,174,730
RAC: 35,006,103

> Please reduce the amount of redundancy that is employed as it is depriving
> other projects of computer time while we are needlessly crunching a quorum of
> 4 when the absolute max needed is 3.

I'm sorry, but this is not correct. I think you mean that the absolute minimum required is three. The purpose of the third is to adjudicate if the first two do not quite agree. Your thread title says, "Far too much...". Aren't you being a little melodramatic?

There is a modicum of redundancy but that's a good thing from the science perspective. The people running the project are scientists pure and simple and the science objective is to get publishable results as efficiently as possible. If you think very carefully about that objective, I reckon you would come to the conclusion that they've got it about right. After all, they would have put a lot of thought into those aspects.

As far as depriving other projects is concerned, aren't all projects rather flush with donated computing resources anyway?? Don't they have some difficulty in keeping up with the apparently insatiable demand for workunits?

Cheers,
Gary.

The Pirate
Joined: 11 Nov 04
Posts: 57
Credit: 23,332,769
RAC: 0

And speaking of redundancy, the title of this thread is... well, rife with redundancy.


gravywavy
Joined: 22 Jan 05
Posts: 392
Credit: 68,962
RAC: 0

I agree with the basic point made here.

Several of my results have been accepted with only three of the four results having been returned. If it is acceptable to the project to take a result as valid on the basis of 3 results, when one is not returned or comes in late, then it should be acceptable anyway with three, and IMHO the project should only send out three copies of the WU in the first place.

As has been rightly said, the reason this is done is to cut down on the additional waiting time needed when only two results come back and the WU cannot be marked as completed. However, it is wrong to say this is because of people wanting their credit quickly (tho we all know that this pressure exists). Reading other threads, it appears that the real motive, from the project viewpoint, is to keep the number of outstanding WU as low as possible in order to keep down the size of the database.

I think this is a technical mistake. Firstly, while it does keep down the size of the database, it also means that for a given amount of useful work done there is almost 33% more database traffic than there would be with only 3 results given out per WU. And connections to the database are also said to be an important bottleneck.

Notice that a given client connects according to a combination of the progress of local work and the user's preferences, neither of which depend on how many other machines crunch the same WU.

Secondly, I think this is a human-resources mistake. It says to the users that the project is not willing to invest in a little more infrastructure to get an extra 33% out of the volunteers they have got. Even if, for technical reasons, that is incorrect, it is still a human resources mistake. Most people contributing want to feel that their donation is being used to best effect. When they feel it is not, some of them will leave.

I understand that the project has started to identify machines which habitually turn round work fast. My suggestion is that, once that list is sufficiently well populated, the following two changes are made together:

1. Cut number of copies of each WU to 3
2. Pass re-allocated WU only to machines known to have a fast turn-around.

(2) means that the extra time taken when someone does not return a WU will be reduced; (1) means there will be an overnight increase in throughput of almost 33%. And all without compromising the level of redundancy already considered acceptable, in that every WU will still be done at least 3x.
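
The "almost 33%" figure can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: `useful_wu_per_day` is a hypothetical helper, the daily result count is a made-up number, and re-issues of unreturned results are ignored.

```python
def useful_wu_per_day(results_per_day: float, copies_per_wu: int) -> float:
    """Distinct work units finished per day if each WU is crunched copies_per_wu times."""
    return results_per_day / copies_per_wu


results_per_day = 12_000  # hypothetical number of results volunteers return daily
with_four = useful_wu_per_day(results_per_day, 4)
with_three = useful_wu_per_day(results_per_day, 3)
gain = with_three / with_four - 1  # relative increase in distinct WU completed

print(f"4 copies: {with_four:.0f} WU/day")
print(f"3 copies: {with_three:.0f} WU/day")
print(f"gain: {gain:.1%}")
```

The gain does not depend on the assumed result count: it is always 4/3 - 1, or one third. It is "almost" 33% in practice because some workunits would still need a re-issued copy.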

~~gravywavy

gravywavy
Joined: 22 Jan 05
Posts: 392
Credit: 68,962
RAC: 0

Message 9439 in response to message 9436


> I think you mean that the absolute
> minimum required is three. The purpose of the third is to adjudicate
> if the first two do not quite agree.

Depends how the checking is done.

If you have a two-stage process where you wait for a quorum and then do the comparison, then yes, three is the minimum. As you correctly say, you need three so that if two disagree the third can arbitrate. That is why there are three chronos on a ship, for example.

But, if you let the voting process feedback into the assignment process, then you only need two at first. Then, only if the first two disagree would you ask for a third.

So the simple design decision to separate the voting from the assignment of WU has already increased redundancy from 2 to 3.
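
The feedback scheme described above can be sketched as a small decision function. This is only an illustration: `next_action` and its return strings are hypothetical names, and results are compared as exact strings, whereas a real validator would presumably compare numerical output within some tolerance.

```python
from collections import Counter


def next_action(results: list[str]) -> str:
    """Decide what to do with a work unit, given the results returned so far.

    Two copies are issued first; a further copy is requested only when
    no two returned results agree.
    """
    if len(results) < 2:
        return "wait"
    value, votes = Counter(results).most_common(1)[0]
    if votes >= 2:
        return f"validated:{value}"
    return "issue_one_more"


print(next_action(["a"]))            # still waiting for the second copy
print(next_action(["a", "a"]))       # two agree, no third copy needed
print(next_action(["a", "b"]))       # disagreement, ask for an arbiter
print(next_action(["a", "b", "a"]))  # the third result settles it
```

If the first two results disagree with probability q, the average cost under this scheme is about 2 + q computations per WU, against a fixed 3 when voting is kept separate from assignment.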

Another question is whether on a screening process like E@H or SETI you really need total redundancy at all. On the ZetaGrid project (www.zetagrid.net), looking for counter-examples to an unproven mathematical conjecture, the project went for limited redundancy. Newbies' work was double-checked, but as you returned more and more WU that checked out OK your machines got awarded a trust factor, and your work was only double-calculated on a random basis. About 10% of all WU were double-crunched overall.

This sort of approach has been acceptable in physics in the past. A long time ago I was involved in particle physics, and we'd have women (they were usually women) looking at thousands of photos taken in bubble chambers, trying to filter out the one-in-a-million runs that had some new physics in it. We had to inject 'false positives' into their work to keep them amused (humans can't cope well if they never find what they are looking for). But then providing a particular woman (pun intended) spotted the deliberate positives we'd injected into her work, we did not tend to issue the real work more than once. So only one in every ten or twenty pieces of work was taken up with error checking, not the majority.

So I'd say the minimum plausible redundancy is to calculate everything just 1.1 times, on average; that is enough to spot rogue machines (whether through malice or malfunction). A more comfortable value is to calculate everything twice, and only call for a third result when needed to arbitrate. Based on either of these minima there is a redundant amount of redundancy in most BOINC projects, built in by the basic BOINC infrastructure. While E@H works within BOINC it is not realistic to ask for anything less than 3x.
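
A toy simulation shows how a trust-based scheme of the ZetaGrid kind lands near 1.1x. Everything here is assumed for illustration: the names `needs_double_check` and `average_redundancy`, the trust threshold of 10 validated WU, the 10% spot-check rate, and the simplification that every result checks out OK.

```python
import random


def needs_double_check(validated_so_far: int, rng: random.Random,
                       trust_threshold: int = 10,
                       spot_check_rate: float = 0.1) -> bool:
    """Newcomers are always double-checked; trusted hosts only at random."""
    if validated_so_far < trust_threshold:
        return True
    return rng.random() < spot_check_rate


def average_redundancy(n_wu: int, seed: int = 0) -> float:
    """Average computations per WU for one host crunching n_wu work units."""
    rng = random.Random(seed)
    computations = 0
    validated = 0
    for _ in range(n_wu):
        computations += 2 if needs_double_check(validated, rng) else 1
        validated += 1  # simplification: every result is assumed to be good
    return computations / n_wu


print(f"average redundancy: {average_redundancy(10_000):.2f}")
```

Over a long run the initial probation period barely matters and the average settles close to 1 plus the spot-check rate.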

The change from 3x to 4x is, however, something that E@H could choose to do something about. And if you accept that 3x is already overly redundant, the initial comment looks very plausible.

~~gravywavy

RandyC
Joined: 18 Jan 05
Posts: 2,189
Credit: 102,702,729
RAC: 13,507

Message 9440 in response to message 9438

> I agree with the basic point made here.
>
That is your right. I happen to disagree with it, because the logic is flawed.
o The poster states that too much redundancy (in project A) is depriving other projects (B, C, etc) of computer time. BOINC allocates time among multiple projects by having them take turns using the CPU. The amount of time project A crunches on the CPU and what it crunches (redundant or not) is totally independent of what projects B, C, etc crunch on that same CPU. They each get their own slice of the CPU in their own turn.
o The only way project A could be considered to be taking time away from the other projects is if it would have run out of useful work for users to do. In the case of SETI, this is highly unlikely and they have plans to expand the type and quality of the science crunched as future input is enhanced (more antennas come online, etc). For Einstein, the project is just barely starting (compared to SETI) and I'm sure they have plans to perform more and deeper science as they ramp up and the user community upgrades their systems as well.
o The project administrators are the ones closest to the science of their projects. THEY determine the level of redundancy needed to process their work. Not individual users within the cruncher community. WE (the users) can question it, but only THEY (the administrators and scientists of the project) have all the necessary facts available to determine what level is needed.
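
The first point can be illustrated with a short sketch. It assumes a host's CPU time is divided in proportion to per-project shares set by the user; `cpu_hours_per_project` and the share values are hypothetical and are not taken from the BOINC client.

```python
def cpu_hours_per_project(shares: dict[str, float], host_hours: float) -> dict[str, float]:
    """Split a host's CPU time between projects in proportion to their shares."""
    total = sum(shares.values())
    return {name: host_hours * share / total for name, share in shares.items()}


shares = {"A": 100, "B": 100, "C": 200}  # hypothetical user-set shares
hours = cpu_hours_per_project(shares, host_hours=24)
print(hours)
```

Nothing in the function takes project A's redundancy as an input. Redundancy changes how much distinct science A gets out of its own hours, not how many hours B and C receive on that host.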
>
> Several of my results have been accepted with only three of the four results
> having been returned. If it is acceptable to the project to take a result as
> acceptable on the basis of 3 results, when one is not returned or comes in
> late, then it should be acceptable anyway with three, and IMHO the project
> should only send out three copies of the WU in the first place.
>
You are entitled to your opinion.
>
> As has been rightly said, the reason this is done is to cut down on the
> additional waiting time needed when only two results come back and the WU
> cannot be marked as completed. However it is wrong to say this is because of
> people wanting their credit quickly (tho we all know that this pressure
> exists). Reading other threads it appears that the real motive, from the
> project viewpoint, is to keep the number of outstanding WU as low as possible
> in order to keep down the size of the database.
>
A reasonable guess, but you and I are not privy to all the facts, so it can only be a guess.
>
> I think this is a technical mistake. Firstly, while it does keep down the
> size of the database, it also means that for a given amount of useful work
> done there is almost 33% more database traffic than there would be with only 3
> results given out per WU. And connections to the database are also said to
> be an important bottleneck.
>
Again a reasonable guess, but you don't/can't know unless you're on the project team itself.
>
> Notice that a given client connects according to a combination of the progress
> of local work and the user's preferences, neither of which depend on how
> many other machines crunch the same WU.
>
This is by design, and is a good thing.
>
> Secondly, I think this is a human-resources mistake. It says to the users
> that the project is not willing to invest in a little more infrastructure to
> get an extra 33% out of the volunteers they have got. Even if, for technical
> reasons, that is incorrect, it is still a human resources mistake. Most
> people contributing want to feel that their donation is being used to best
> effect. When they feel it is not, some of them will leave.
>
Maybe they will, maybe they won't. There are some crunchers who will refuse to work any other project than this one (I've seen threads on SETI where they state that in no uncertain terms). I'm afraid I flunked mind-reading 101, so I can't predict how any particular user will view this issue.
>
> I understand that the project has started to identify machines which
> habitually turn round work fast.
Yes, this is good and there are ways that could be made even more accurate by tweaking the BOINC client.
> My suggestion is that, once that list is
> sufficiently well populated, the following two changes are made together:
>
> 1. Cut number of copies of each WU to 3
> 2. Pass re-allocated WU only to machines known to have a fast turn-around.
>
> (2) means that the extra time taken when someone does not return a WU will be
> reduced; (1) means there will be an overnight increase in throughput of
> almost 33%. And all without compromising the level of redundancy already
> considered acceptable, in that every WU will still be done at least 3x.
>
You're entitled to suggest whatever you think is best, but the decisions are made by the project admins and scientists and they will/should do what they think is best. Who knows, your suggestion(s) may be accepted.

Seti Classic Final Total: 11446 WU.

Ned Ludd
Joined: 9 Feb 05
Posts: 23
Credit: 56,045
RAC: 0

> Please reduce the amount of redundancy that is employed as it is depriving
> other projects of computer time while we are needlessly crunching a quorum of
> 4 when the absolute max needed is 3. SETI is guilty and so is Einstein. I know
> it's so that they can grant credit faster and delete the unit from their disks
> but it's not justified. In effect for every 100 computers only 25 are doing any
> real work the others are only duplicating. You need redundancy but not that
> much.

This thread is also on the SETI forum, with similar comments.

The basic assumption here is that people don't have a big affinity for any one project -- that crunching is crunching, and we as a group could care less about gravity waves, or extra-terrestrials, or whether we'll need a jacket on New Year's Day 2081 in Reykjavik.

... and if that was true, you'd see roughly the same number of hosts for every project -- because almost every host would be crunching every project.

We don't see that reflected in the various statistics sites.



Shaktai
Joined: 8 Nov 04
Posts: 183
Credit: 426,451
RAC: 0

How about we do what we do best, which is crunching, and let the scientists do what they do best, which is decide the best way to achieve the results.

There is no benefit to the project or the science to waste computer or human resources. The scientists and project team make decisions based upon the best possible utilization of the resources they have (hardware & human), within the budgets they have.

Nobody on this project is being cheap, inconsiderate of the participants, or inefficient. They are working within the real-world constraints they have, manpower-, hardware- and money-wise. I think the Predictor team has demonstrated their willingness to seriously consider the desires of the participants, but in the end the science and resource limitations determine the final approach. As progress is made, the methods may change.

This is Science, it is always a work in progress, learning, improving and changing every step of the way. When it is no longer a work in progress, then there will be no further need for our assistance.

While I will make suggestions for consideration by the project team when appropriate, I won't presume to second-guess the project team's decisions. It is always in their best interest, for the science and their jobs, to operate with maximum efficiency, but only they can make the determination of what is the best compromise between all factors involved.

Paul D. Buck
Joined: 17 Jan 05
Posts: 754
Credit: 5,385,205
RAC: 0

Message 9443 in response to message 9441

> The basic assumption here is that people don't have a big affinity for any one
> project -- that crunching is crunching, and we as a group could care less
> about gravity waves, or extra-terrestrials, or whether we'll need a jacket on
> New Year's Day 2081 in Reykjavik.
>
> ... and if that was true, you'd see roughly the same number of hosts for every
> project -- because almost every host would be crunching every project.
>
> We don't see that reflected in the various statistics sites.

Reasons for that include:

1) Many of the Projects have limited the number of Participants for longer or shorter time frames. LHC@Home is currently at its cap, but plans to open again later ... Einstein@Home and Predictor@Home also have had caps ...

2) People may not know of the other projects. SETI@Home has a large installed base that it can and will call on to migrate to BOINC, so it will have the most Participants for some time to come. Many of these participants are install-and-forget types who have mostly just installed the software and don't pay attention ...

3) For most of the remaining projects the value of the science is a little harder to explain, and they are therefore lagging a little.

I expect over time this will "correct" itself as people become more aware. I mean, how many people do we direct to other projects from SETI@Home when there is an outage?
