Quorum of one.

tito
Joined: 10 Jun 06
Posts: 25
Credit: 1,063,420,272
RAC: 1,410,240
Topic 227721

For a day or so, all my Gamma-ray pulsar binary search tasks have had a quorum of one.

Is it supposed to be like that? Sorry if this is mentioned somewhere; I couldn't find it.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3,760
Credit: 36,152,656,060
RAC: 44,741,563

Oh wow. That explains why there was a sudden uptick in credit recently.

I’ll forward this to Bernd.

_________________________________________________________________________

GWGeorge007
Joined: 8 Jan 18
Posts: 2,863
Credit: 4,751,104,853
RAC: 3,421,349

tito wrote:

For a day or so, all my Gamma-ray pulsar binary search tasks have had a quorum of one.

Is it supposed to be like that? Sorry if this is mentioned somewhere; I couldn't find it.

Would this be what you're looking for?


From: 10 Nov 2011 12:13:05 UTC

Bernd Machenschalk wrote:

AFAIK by now most projects use what is known as "adaptive replication", which means accepting results from "reliable" hosts without further validation.

For the GW search this was discussed in the LVC (LIGO-Virgo collaboration, the scientific community behind Einstein@home) at least two times that I remember, and each time strongly voted against.

The BRP search is doing a lot of computation on GPUs, which are numerically less reliable than e.g. CPUs (note that the invalid result rate is 20x as high as that of the other searches). As the results from Einstein@home are used directly for targeting re-observations, the requirements on the correctness of the results are somewhat higher.

Finally our youngest application for the FGRP search hasn't yet reached the reliability that we would dare to take the results without comparison.

BM


 

George

Proud member of the Old Farts Association

GWGeorge007
Joined: 8 Jan 18
Posts: 2,863
Credit: 4,751,104,853
RAC: 3,421,349

Ian&Steve C. wrote:

Oh wow. That explains why there was a sudden uptick in credit recently.

I’ll forward this to Bernd.

Would this be why? FWIW, the "Not Running" is shown in RED, rather than the grey that Not Running normally appears in.

FGRPB1G assimilator einstein4 Not Running

George

Proud member of the Old Farts Association

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,853
Credit: 111,254,564,452
RAC: 34,857,045

GWGeorge007 wrote:

tito wrote:

For a day or so, all my Gamma-ray pulsar binary search tasks have had a quorum of one.

Is it supposed to be like that? Sorry if this is mentioned somewhere; I couldn't find it.

Would this be what you're looking for?

Yes, that's exactly what's happening: adaptive replication. That particular Bernd quote was from 2011, but adaptive replication has been used since then.

Here is a 2019 comment describing adaptive replication. I remember something more recent about current searches but couldn't quickly find it. I think that it's used for "trusted" hosts (i.e. those returning known valid work) and that around 10% of tasks are 'paired' just to ensure the host should continue to be trusted :-).
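
As a rough illustration of the scheme as described in this post, here is a minimal Python sketch. It is not BOINC's or Einstein@Home's actual scheduler code; the function name, the `rng` parameter and the fixed 10% rate are all assumptions made for the example, and how a host becomes "trusted" is left out entirely.

```python
import random

# Assumption: the "around 10%" spot-check figure mentioned in the post.
SPOT_CHECK_RATE = 0.10


def required_quorum(host_is_trusted, rng=random.random,
                    spot_check_rate=SPOT_CHECK_RATE):
    """Return how many agreeing results a task needs before it validates.

    Hypothetical helper: untrusted hosts always get a wingman (quorum 2);
    trusted hosts usually validate alone (quorum 1), but a random fraction
    of their tasks is still paired so that trust keeps being re-checked.
    """
    if not host_is_trusted:
        return 2
    if rng() < spot_check_rate:
        return 2  # spot check: pair this task anyway
    return 1


if __name__ == "__main__":
    sample = [required_quorum(True) for _ in range(10_000)]
    print("fraction of trusted-host tasks that were paired:",
          sample.count(2) / len(sample))
```

Passing `rng` in explicitly is only there so the decision can be checked with a fixed value instead of a real random draw.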

Cheers,
Gary.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3,760
Credit: 36,152,656,060
RAC: 44,741,563

Gary Roberts wrote:

GWGeorge007 wrote:

tito wrote:

For a day or so, all my Gamma-ray pulsar binary search tasks have had a quorum of one.

Is it supposed to be like that? Sorry if this is mentioned somewhere; I couldn't find it.

Would this be what you're looking for?

Yes, that's exactly what's happening: adaptive replication. That particular Bernd quote was from 2011, but adaptive replication has been used since then.

Here is a 2019 comment describing adaptive replication. I remember something more recent about current searches but couldn't quickly find it. I think that it's used for "trusted" hosts (i.e. those returning known valid work) and that around 10% of tasks are 'paired' just to ensure the host should continue to be trusted :-).

No, that’s not what’s happening. Bernd told me by PM that the new work unit generator built for their new OS behaves differently, and that the quorum of one was not intended. They’ve never used adaptive replication on Einstein. Bernd’s old post makes it clear that it’s been considered but not implemented for FGRPB1G.

It will be corrected.

_________________________________________________________________________

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,853
Credit: 111,254,564,452
RAC: 34,857,045

Ian&Steve C. wrote:

.... Bernd’s old post makes it clear that it’s been considered but not implemented for FGRPB1G

Yes, thanks for the correction. For some reason it was stuck in my head that the OP was talking about BRP4G rather than FGRPB1G, and I think adaptive replication is being used with that new, large Arecibo search.

The unintended quorum of one for FGRPB1G must have started some days ago, while the previous data file, LATeah3012L12.dat, was still current. A couple of hours after posting, I noticed that my hosts were suddenly receiving lots of _1 tasks for 3012L12 (i.e. the missing second half of the quorum), along with the regular 3012L13 tasks for the current data file.

If the _0 task hasn't yet been returned, it might be an easy fix. It might be a bit more work to identify those already returned and validated and revert their validation state. Those tasks will need to wait for the _1 task to be crunched and returned before a second validation pass. Might be a bit of a nightmare to sort all that out :-).

Cheers,
Gary.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3,760
Credit: 36,152,656,060
RAC: 44,741,563

My solution would be to issue wingmen for any _0s not yet returned. Then, for all the _0s that have been returned and validated, issue a wingman as well. If the result comes back inconclusive, issue another and find the valid pair. If the invalid result was the original _0, just let them keep the points. Not a huge deal; this was only going on for a day or two before the project fixed it.
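
The repair procedure suggested here could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's validator: results are compared with plain equality (a real validator would presumably compare with some tolerance), and the names `resolve` and `credited` are hypothetical.

```python
from collections import defaultdict


def resolve(results, quorum=2):
    """Decide what to do with one workunit.

    `results` holds the returned outputs in task order (_0 first).
    Returns ("valid", indices_of_agreeing_tasks) once `quorum` results
    agree, otherwise ("issue_another", []) to request one more wingman.
    """
    groups = defaultdict(list)
    for index, output in enumerate(results):
        groups[output].append(index)
    for indices in groups.values():
        if len(indices) >= quorum:
            return "valid", indices
    return "issue_another", []


def credited(results, already_credited=(0,)):
    """Tasks that end up with credit.

    Assumption taken from the suggestion above: a _0 that was already
    validated alone keeps its points even if it turns out to be the odd
    one out.
    """
    status, agreeing = resolve(results)
    return status, sorted(set(agreeing) | set(already_credited))


if __name__ == "__main__":
    print(resolve(["a"]))             # only the _0 so far
    print(resolve(["a", "b"]))        # inconclusive pair
    print(credited(["a", "b", "b"]))  # _0 was the invalid one
```

With a single returned _0 the sketch always asks for a wingman, which covers both the "not yet returned" and the "already validated alone" cases in the same way.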

_________________________________________________________________________

Krzysiek_mil(K_PL)
Joined: 12 Oct 08
Posts: 5
Credit: 1,105,212,619
RAC: 730,186

Hello. The problem still exists in the CPU tasks. https://einsteinathome.org/pl/workunit/652155691
