Quorum of one.

tito
Joined: 10 Jun 06
Posts: 23
Credit: 897,248,226
RAC: 1,704,510
Topic 227721

For a day or so, all my Gamma-ray pulsar binary search tasks have had a quorum of one.

Is it supposed to be like that? Sorry if this is mentioned somewhere; I couldn't find it.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 2,382
Credit: 19,298,171,437
RAC: 37,509,417

Oh wow. That explains why there was a sudden uptick in credit recently.

I’ll forward this on to Bernd.

_________________________________________________________________________

GWGeorge007
Joined: 8 Jan 18
Posts: 1,297
Credit: 2,393,372,972
RAC: 5,941,821

tito wrote:

For a day or so, all my Gamma-ray pulsar binary search tasks have had a quorum of one.

Is it supposed to be like that? Sorry if this is mentioned somewhere; I couldn't find it.

Would this be what you're looking for?


From: 10 Nov 2011 12:13:05 UTC

Bernd Machenschalk wrote:

AFAIK by now most projects use what is known as "adaptive replication", which means accepting results from "reliable" hosts without further validation.

For the GW search this was discussed in the LVC (LIGO-Virgo collaboration, the scientific community behind Einstein@home) at least two times that I remember, and each time it was strongly voted against.

The BRP search is doing a lot of computation on GPUs, which are numerically less reliable than e.g. CPUs (note that the invalid result rate is 20x as high as that of the other searches). As the results from Einstein@home are used directly for targeting re-observations, the requirements on the correctness of the results are somewhat higher.

Finally, our youngest application, for the FGRP search, hasn't yet reached the level of reliability at which we would dare to accept results without comparison.

BM


 

George

A proud member of the O.F.A. (Old Farts Association)

GWGeorge007
Joined: 8 Jan 18
Posts: 1,297
Credit: 2,393,372,972
RAC: 5,941,821

Ian&Steve C. wrote:

Oh wow. That explains why there was a sudden uptick in credit recently.

I’ll forward this on to Bernd.

Would this be why? FWIW, "Not Running" is shown in RED on the server status page, rather than the grey it normally appears in.

FGRPB1G assimilator einstein4 Not Running

George

A proud member of the O.F.A. (Old Farts Association)

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,625
Credit: 89,768,693,117
RAC: 57,576,013

GWGeorge007 wrote:

tito wrote:

For a day or so, all my Gamma-ray pulsar binary search tasks have had a quorum of one.

Is it supposed to be like that? Sorry if this is mentioned somewhere; I couldn't find it.

Would this be what you're looking for?

Yes, that's exactly what's happening - adaptive replication.  That particular Bernd quote was from 2011 but adaptive replication has been used later than that.

Here is a 2019 comment describing adaptive replication. I remember something more recent about current searches but couldn't quickly find it. I think that it's used for "trusted" hosts (i.e. hosts that reliably return valid work) and that around 10% of their tasks are still 'paired' just to ensure the host should continue to be trusted :-).
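
To make that idea concrete, here is a minimal Python sketch of the kind of decision being described. It is only an illustration under assumptions: the function name `quorum_for`, the `host_trusted` flag and the `spot_check_rate` parameter are hypothetical, the 10% figure is just the rough number recalled above, and none of this is actual BOINC or Einstein@Home server code.

```python
import random


def quorum_for(host_trusted: bool,
               spot_check_rate: float = 0.10,
               rng=random.random) -> int:
    """Return how many matching results a task needs before it is accepted.

    Hypothetical sketch of adaptive replication:
    - an untrusted host always gets a wingman (quorum of two);
    - a trusted host is usually accepted alone (quorum of one), but a
      fraction of its tasks (spot_check_rate, an assumed value) is still
      paired so that its trusted status keeps being re-checked.
    """
    if not host_trusted:
        return 2
    # rng() is expected to return a float in [0, 1).
    return 2 if rng() < spot_check_rate else 1


if __name__ == "__main__":
    random.seed(0)
    sample = [quorum_for(host_trusted=True) for _ in range(10_000)]
    paired = sum(1 for q in sample if q == 2)
    print(f"{paired} of {len(sample)} tasks from a trusted host were paired")
```

The `rng` argument is only there so the choice can be made deterministic when experimenting; a real scheduler would also have to track how a host earns and loses its trusted status, which this sketch leaves out.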

Cheers,
Gary.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 2,382
Credit: 19,298,171,437
RAC: 37,509,417

Gary Roberts wrote:

GWGeorge007 wrote:

tito wrote:

For a day or so, all my Gamma-ray pulsar binary search tasks have had a quorum of one.

Is it supposed to be like that? Sorry if this is mentioned somewhere; I couldn't find it.

Would this be what you're looking for?

Yes, that's exactly what's happening - adaptive replication.  That particular Bernd quote was from 2011 but adaptive replication has been used later than that.

Here is a 2019 comment describing adaptive replication. I remember something more recent about current searches but couldn't quickly find it. I think that it's used for "trusted" hosts (i.e. hosts that reliably return valid work) and that around 10% of their tasks are still 'paired' just to ensure the host should continue to be trusted :-).

No, that’s not what’s happening. Bernd told me in a PM that the new work unit generator built for their new OS behaves differently, and that the quorum of one was not intended. They’ve never used this on Einstein. Bernd’s old post makes it clear that it’s been considered but not implemented for FGRPB1G.

It will be corrected.

_________________________________________________________________________

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,625
Credit: 89,768,693,117
RAC: 57,576,013

Ian&Steve C. wrote:

.... Bernd’s old post makes it clear that it’s been considered but not implemented for FGRPB1G

Yes, thanks for the correction.  For some reason it was stuck in my head that the OP was talking about BRP4G rather than FGRPB1G and I think adaptive replication is being used with that new Arecibo, large search.

The problem with FGRPB1G must have started some days ago, while the previous data file, LATeah3012L12.dat, was still current. A couple of hours after posting, I noticed that my hosts were suddenly receiving lots of _1 tasks for 3012L12 (i.e. the missing second half of each quorum) along with the regular 3012L13 tasks for the current data file.

If the _0 task hasn't yet been returned, it might be an easy fix. It might be a bit more work to identify those already returned and validated, and to revert their validation state. Those tasks will need to wait for the _1 task to be crunched and returned before a second validation pass. Might be a bit of a nightmare to sort all that out :-).

Cheers,
Gary.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 2,382
Credit: 19,298,171,437
RAC: 37,509,417

My solution would be to issue wingmen for any _0s not returned yet. Then issue a wingman for every _0 that has already been returned and validated as well. If the comparison comes back inconclusive, issue another task and find the valid pair. If the invalid result turns out to be the original _0, just let that host keep the points. Not a huge deal; this was only going on for a day or two before the project fixed it.
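
The steps above can be sketched roughly in Python. This is a hypothetical illustration, not the project's validator: the function name `next_action`, the action labels and the equality-based `agree` check are all made up for the example, and real validation compares result files with tolerances rather than testing simple equality.

```python
from itertools import combinations


def next_action(returned, agree=lambda a, b: a == b):
    """Decide what to do with one workunit.

    `returned` lists the results that have come back so far, in issue
    order (_0 first). Returns a tuple (action, pair):
    - ("issue_wingman", None): fewer than two results are back, so a
      wingman is needed (covers both an unreturned _0 and a _0 that was
      validated alone);
    - ("valid_pair", (i, j)): results i and j agree and form the quorum;
    - ("issue_another", None): nothing agrees yet (inconclusive), so one
      more task is sent out.
    """
    if len(returned) < 2:
        return ("issue_wingman", None)
    for i, j in combinations(range(len(returned)), 2):
        if agree(returned[i], returned[j]):
            return ("valid_pair", (i, j))
    return ("issue_another", None)


if __name__ == "__main__":
    # The original _0 (index 0) disagrees with two later, matching results.
    action, pair = next_action([1.0, 2.0, 2.0])
    print(action, pair)
    # If index 0 is not in the valid pair, the suggestion above is simply
    # to let that host keep the credit it was already granted.
```

Crediting is deliberately left out of the function; under the suggestion above, credit already granted for a solo-validated _0 would simply not be revoked.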

_________________________________________________________________________

K_PL
Joined: 12 Oct 08
Posts: 1
Credit: 711,161,331
RAC: 1,025,173


Hello. The problem still exists in the CPU tasks. https://einsteinathome.org/pl/workunit/652155691
