Quorum replication

They are beta testing single replication at SETI. Could it be possible at Einstein? It could effectively double our processing power.
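Rough arithmetic behind that claim (the daily result count below is invented, just for illustration): with a quorum of 2, every workunit is crunched at least twice, so single replication would roughly double the number of distinct workunits processed by the same fleet.

# Illustrative arithmetic only -- the daily result count is made up.
results_per_day = 100_000          # hypothetical number of results returned per day
quorum = 2                         # each workunit currently crunched at least twice

workunits_done_now = results_per_day / quorum        # 50,000 workunits of science
workunits_done_single = results_per_day / 1          # 100,000 with single replication
print(workunits_done_now, workunits_done_single)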
Just my personal opinion, but given that E@H (well, like any BOINC project, I guess) is often used for overclocking calibration, I don't see this happening.
Apart from overclocked or failing hardware, we've seen computational problems caused by bugs in compilers and operating systems (e.g. the Linux kernel bug that causes "signal 8" errors if you are lucky, and possibly just randomly wrong results if you are unlucky) that IMHO discourage single replication.
E@H is a "needle in a haystack" type of search, and if you happen not to find any needles, you at least want to be able to make a statement like "we can be 95% sure that any needle in that haystack must be below size X, because otherwise we would have found it". If you are not sufficiently confident in the validity of the returned results (either because of computational errors or because someone manipulated the app to run faster but possibly return bad results), you can't make that sort of statement.
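To put that in concrete terms, here is a toy sketch (my own illustration with invented numbers, not the project's actual upper-limit statistics): if some fraction of returned results are silently wrong and would miss a real signal, the confidence of any "no needle above size X" statement drops accordingly.

# Toy model only -- not Einstein@Home's real upper-limit statistics.
def exclusion_confidence(detection_prob, bad_fraction):
    # detection_prob: chance a *correct* computation finds a signal of size >= X
    # bad_fraction:   fraction of results that are silently wrong and would miss it
    return detection_prob * (1.0 - bad_fraction)

print(exclusion_confidence(0.95, 0.00))   # trustworthy results -> 95% exclusion
print(exclusion_confidence(0.95, 0.05))   # 5% silently bad results -> only ~90%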
I wonder how this risk will be handled at S@H? But I guess they care less about "upper limits" (like 'if an alien civilization were using this and that technology to transmit a signal omnidirectionally, then they are at least X light-years away').
You could probably make certain OS / build / application version combinations trigger a second issue of the WU. I think some project is already doing that, or has done it, but I'm not sure which. The problem with that is that you probably wouldn't notice in time if a certain OS / build / application combination is causing problems. Also, wouldn't it theoretically be possible that if two people use the same problem-causing setup, their results would still validate against each other? If that's true, then there will always be a risk of the needle being overlooked.
@ Bikeman: Agreed, this is a much more rigorous project than S@H in terms of evaluating the data returned from the hosts.
I don't care how long it takes to get through a science run (within reason), as long as we don't miss a wave or 'find' one that's hokey. ;-)
@ BR: That's the whole point of the multiple replication. It gives 'natural' warning indicators that are relatively easy to spot in the returning data stream and can be addressed before they turn the whole run into 'junk'.
Given how long science runs take even under the best conditions, you want to minimize the number of times you have to start over six months in. ;-)

Alinator
CU
Bikeman
Analysis of the tasks at SETI shows that very few completed tasks are corrupt.
It is being trialled on SETI Beta, with each host earning a "reliability" score. The most reliable hosts will have 90% of their work go unvalidated, with the other 10% of tasks chosen at random for validation.
The tasks for a host are hidden until completed, so a user will not know which tasks are to be validated. The replication suffix _N is also randomised.
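As I understand the idea (the sketch below is my own guess at the general mechanism, not the actual BOINC/SETI Beta scheduler code), it amounts to something like this:

import random

# Sketch of the idea as described above -- not the real scheduler logic.
SPOT_CHECK_RATE = 0.10           # ~10% of a reliable host's tasks get a second copy
RELIABILITY_THRESHOLD = 0.95     # hosts below this always get validated

def needs_wingman(host_reliability):
    if host_reliability < RELIABILITY_THRESHOLD:
        return True                                # unproven host: always validate
    return random.random() < SPOT_CHECK_RATE       # trusted host: random spot check

print(needs_wingman(0.99), needs_wingman(0.50))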
Also, they are now using the ALFA (Arecibo L-band Feed Array) receiver, which has seven beams, each with horizontal and vertical polarization, so there are now 14 times as many workunits as with the old line feed receiver. As the beams have some overlap, and any received signal will probably not be purely horizontal or vertical, any signal from ET will probably appear in more than one task. So presumably some cross-checking can be done if a significant signal is reported in one workunit.
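Purely to illustrate that cross-checking idea (the candidate format, tolerances and numbers below are all made up), one could flag candidates that show up in more than one beam at roughly the same frequency and time:

# Toy cross-beam coincidence check -- candidate format and tolerances are invented.
candidates = [                    # (beam, frequency in Hz, time in s)
    (0, 1420.40e6, 1000.0),
    (1, 1420.40e6, 1000.2),       # near-identical signal in an overlapping beam
    (3, 1418.00e6, 5000.0),
]
FREQ_TOL, TIME_TOL = 1.0e3, 1.0   # Hz, seconds

def coincident(a, b):
    return a[0] != b[0] and abs(a[1] - b[1]) <= FREQ_TOL and abs(a[2] - b[2]) <= TIME_TOL

confirmed = [c for c in candidates if any(coincident(c, other) for other in candidates)]
print(confirmed)                  # only the 1420.40 MHz candidate is seen in two beams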
WCG is already using "single replication" for some of their applications (sub-projects, so to speak). They keep a "reliability score" for each host, which basically determines the ratio in which that host gets sent tasks for "single" or "validated" workunits (both are available). Even a highly "reliable" host is sent a task that's compared against another task of the same workunit from time to time, so the reliability records are occasionally re-verified.
Yes, there is still a chance that if two people use the same "broken" application on Einstein@Home, the results of these two are compared against each other and wrong results end up as canonical. However, with the increased number of compute-cluster machines that only run "official" apps, this case has become rather unlikely.
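To put rough numbers on that risk (the rates below are invented, not project statistics): with a quorum of 2, a wrong result only becomes canonical if two hosts independently produce matching wrong answers, which is far less likely than a single host being wrong.

# Back-of-the-envelope comparison with invented rates -- not project statistics.
p_bad_host   = 0.01    # assumed fraction of hosts returning silently wrong results
p_same_error = 0.1     # assumed chance two bad results agree closely enough to validate

print(f"quorum 1: ~{p_bad_host:.2%} of workunits could end up wrong")
print(f"quorum 2: ~{p_bad_host**2 * p_same_error:.4%} of workunits could end up wrong")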
The techs (like me) discussed "single replication" with the scientists behind Einstein@home when we switched to "server assigned credit", which allowed us to reduce the initial replication from 3 to 2 (I think S4 was the first run to feature this). The scientists insisted on at least a quorum of 2 for verification. I don't think they feel differently about that nowadays.
BM
I am running QMC@home, where the quorum is 1. I run stock apps on both QMC and Einstein, and optimized apps on SETI MB and Astropulse, because the stock apps are very slow on my Opteron 1210. I am also running CPDN and CPDN Beta on the same CPU, always under Linux, and LHC when work is available. Its FP unit seems very reliable; I never get a "compute error".
Tullio
Quote:
I wonder how this risk will be handled at S@H? But I guess they care less about "upper limits" ...
Quote:
Analysis of the tasks at SETI shows that very few completed tasks are corrupt.
A small percentage it is, but a few bad hosts could still cause a lot of possibly localised pollution.
Quote:
It is being trialled on SETI Beta, with each host earning a "reliability" score. The most reliable hosts will have 90% of their work go unvalidated, with the other 10% of tasks chosen at random for validation.
The tasks for a host are hidden until completed, so a user will not know which tasks are to be validated. The replication suffix _N is also randomised.
Good stuff. That eliminates one of my 'cheats danger' arguments.
Quote:
... then any signal from ET will probably appear in more than one task. So presumably some cross-checking can be done if a significant signal is reported in one workunit.
Matt on S@H also mentioned that 'sanity checks' were to be included server-side for each workunit, to catch cheater hosts that might try to quickly return 'incompletely' processed units.
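I don't know what those checks actually look like; as a purely hypothetical example (thresholds and fields invented), a server-side validator could reject results that come back implausibly fast or report an impossible number of signals:

# Hypothetical server-side sanity check -- thresholds and fields are invented.
MIN_CPU_SECONDS = 600          # faster than this is suspicious for a full workunit
MAX_REPORTED_SIGNALS = 30      # more than this suggests a noisy or manipulated result

def passes_sanity_check(cpu_seconds, num_signals):
    if cpu_seconds < MIN_CPU_SECONDS:
        return False           # returned too quickly to have done the full search
    if num_signals > MAX_REPORTED_SIGNALS:
        return False           # implausibly many candidate signals
    return True

print(passes_sanity_check(4800, 7))    # plausible result
print(passes_sanity_check(45, 0))      # "instant" return -> send for validation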
All good for improved performance, provided that cheaters or dubious hosts cannot lever the system for their own abominable or ignorant gains.

Happy crunching,
Martin