Quorum of one?

svincent

Joined: 24 Oct 05

Posts: 5

Credit: 167049

RAC: 0

10 Nov 2011 2:02:06 UTC

Topic 196051

The Clean Energy Project, a subproject of World Community Grid (they're looking for next-generation organic photovoltaics), is essentially going from a quorum of 2 to a quorum of 1: results from hosts deemed reliable will get no redundancy checking, apart from occasional workunits selected at random for double-checking. The thread is at https://secure.worldcommunitygrid.org/forums/wcg/viewthread_thread,31985 (unfortunately you need a WCG account to read it).

Is there a reason why a similar approach wouldn't work for Einstein@home or one of the pulsar-searching spinoff projects? It must be a very rare event that two reliable hosts report the same workunit as valid but the results are in fact different.

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4312

Credit: 250360321

RAC: 35693

Quorum of one?

10 Nov 2011 12:13:05 UTC

Message 107529

AFAIK most projects by now use what is known as "adaptive replication", which means accepting results from "reliable" hosts without further validation.

For the GW search this was discussed in the LVC (LIGO-Virgo Collaboration, the scientific community behind Einstein@home) at least twice that I remember, and each time it was strongly voted against.

The BRP search is doing a lot of computation on GPUs, which are numerically less reliable than e.g. CPUs (note that the invalid result rate is 20x as high as that of the other searches). As the results from Einstein@home are used directly for targeting re-observations, the requirements on the correctness of the results are somewhat higher.

Finally, our youngest application, the one for the FGRP search, hasn't yet reached a level of reliability at which we would dare to accept results without comparison.
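
For readers unfamiliar with the idea, here is a minimal sketch of how an adaptive replication decision might look. It is not BOINC's or any project's actual implementation: the `Host` record, the `TRUST_THRESHOLD` and `SPOT_CHECK_RATE` values, and both function names are assumptions made up for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical per-host bookkeeping; not a real BOINC structure."""
    consecutive_valid: int = 0


# Assumed tuning knobs, chosen only for illustration.
TRUST_THRESHOLD = 10    # valid results in a row before a host counts as "reliable"
SPOT_CHECK_RATE = 0.1   # fraction of trusted results that are still double-checked


def replicas_needed(host: Host, rng: random.Random) -> int:
    """Return how many copies of a workunit to issue when `host` gets the first one."""
    if host.consecutive_valid < TRUST_THRESHOLD:
        return 2  # untrusted host: keep the usual quorum of 2
    if rng.random() < SPOT_CHECK_RATE:
        return 2  # trusted host, but randomly selected for a double check
    return 1      # trusted host: accept the single result


def record_result(host: Host, valid: bool) -> None:
    """Update trust after validation; a single invalid result resets it."""
    host.consecutive_valid = host.consecutive_valid + 1 if valid else 0
```

With these assumed numbers a host has to return 10 valid results in a row before any of its work is accepted unchecked, and roughly one in ten of its results is still compared against a second host afterwards.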

Filipe

Joined: 10 Mar 05

Posts: 186

Credit: 405129789

RAC: 417958

RE: AFAIK by now most

16 Apr 2015 12:07:29 UTC

Message 107530

Quote:

AFAIK most projects by now use what is known as "adaptive replication", which means accepting results from "reliable" hosts without further validation.

For the GW search this was discussed in the LVC (LIGO-Virgo Collaboration, the scientific community behind Einstein@home) at least twice that I remember, and each time it was strongly voted against.

The BRP search is doing a lot of computation on GPUs, which are numerically less reliable than e.g. CPUs (note that the invalid result rate is 20x as high as that of the other searches). As the results from Einstein@home are used directly for targeting re-observations, the requirements on the correctness of the results are somewhat higher.

Finally, our youngest application, the one for the FGRP search, hasn't yet reached a level of reliability at which we would dare to accept results without comparison.

BM

Is this still not viable?
Are we still seeing a high rate of invalids from GPUs?

AgentB

Joined: 17 Mar 12

Posts: 915

Credit: 513211304

RAC: 0

RE: Are we still

16 Apr 2015 19:12:16 UTC

Message 107531 in response to message 107530

Quote:

Are we still seeing a high rate of invalids from GPUs?

I would say yes, very high at times.

See http://einstein6.aei.uni-hannover.de/EinsteinAtHome/download/BRP6-progress/ for one example. BRP4 is similar.

Filipe

Joined: 10 Mar 05

Posts: 186

Credit: 405129789

RAC: 417958

And for CPU dedicated

17 Apr 2015 10:40:49 UTC

Message 107532

And for CPU-only searches, as is now the case for S6GW and FGRP?

Would it be possible?

AgentB

Joined: 17 Mar 12

Posts: 915

Credit: 513211304

RAC: 0

RE: And for CPU dedicated

17 Apr 2015 17:53:31 UTC

Message 107533 in response to message 107532

Quote:

And for CPU-only searches, as is now the case for S6GW and FGRP?

http://einstein.phys.uwm.edu/server_status.html

It certainly shows that invalid rates are high even for CPU: S6BucketFU1UB is around 20%, which was higher than I expected.

Jim1348

Joined: 19 Jan 06

Posts: 463

Credit: 257957147

RAC: 0

Insofar as I can remember, I

23 May 2015 20:31:37 UTC

Message 107534 in response to message 107533

Insofar as I can remember, I have never had an S6Bucket Follow-up #2 or a Parkes PMPS XT v1.52 (BRP6-cuda32-nv301) task fail.
http://einsteinathome.org/host/11368189/tasks
http://einsteinathome.org/host/11671653/tasks

The only exceptions might be when the machine crashes for other reasons, but that is quite rare now. So it would seem to me that some machines are more susceptible to failures than others.

In the case of GPUs, this is easy to understand: gamers frequently overclock their cards and assume that because their games don't crash, the cards are good for Einstein too. It happens on every project; they don't realize that scientific calculations are a different story, and it takes some education to get the majority up to speed.

In the case of CPUs, overclocking problems are also possible, but I suspect chip or OS incompatibilities are the more likely cause. Some projects just prefer one type over another.

But isn't the real question whether the machines that DO complete get it right often enough? The outright failures are easy to spot and are eliminated anyway. If the successful runs are always "good" at the scientific level, then the scheme mentioned by svincent above should work here too. It might be worth a study to compare the machines giving successful results to see if the quorum is really necessary.

Therefore, some machines may be more reliable than others and could be "trusted" if their results are good often enough, however you choose to define that. I believe that on CEP2 there are periodic re-evaluations too, to ensure that machines are still providing good results.
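
As a rough illustration of the kind of study suggested above, the sketch below takes a validation history for completed tasks and lists the hosts whose valid fraction clears a chosen bar. The input format (pairs of host ID and validation outcome), the function name `trusted_hosts` and both default thresholds are assumptions for the example, not anything the project publishes.

```python
from collections import defaultdict


def trusted_hosts(history, min_results=20, min_valid_fraction=0.99):
    """Return {host_id: valid_fraction} for hosts that look reliable.

    `history` is an iterable of (host_id, was_valid) pairs covering only
    tasks that completed and went through validation; outright failures
    are assumed to have been filtered out already.
    """
    counts = defaultdict(lambda: [0, 0])  # host_id -> [valid, total]
    for host_id, was_valid in history:
        counts[host_id][1] += 1
        if was_valid:
            counts[host_id][0] += 1

    return {
        host_id: valid / total
        for host_id, (valid, total) in counts.items()
        # Require enough samples before trusting the fraction at all.
        if total >= min_results and valid / total >= min_valid_fraction
    }
```

The minimum sample size matters as much as the fraction: a host with five valid results out of five has a perfect record but not enough history to justify skipping the comparison.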
