Quorum of one.

tito

Joined: 10 Jun 06

Posts: 28

Credit: 1235057830

RAC: 720571

25 Jun 2022 12:38:51 UTC

Topic 227721


For a day or so, all my Gamma-ray pulsar binary search tasks have had a quorum of one.

Should it be like that? Sorry if this is mentioned somewhere; I couldn't find it.

Ian&Steve C.

Joined: 19 Jan 20

Posts: 3928

Credit: 45773362642

RAC: 63838235


25 Jun 2022 14:06:13 UTC

Message 198128


Oh wow. That explains why there was a sudden uptick in credit recently.

I’ll forward off to Bernd.

_________________________________________________________________________

GWGeorge007

Joined: 8 Jan 18

Posts: 3037

Credit: 4941231023

RAC: 898748


25 Jun 2022 15:47:33 UTC

Message 198130


tito wrote:

For a day or so, all my Gamma-ray pulsar binary search tasks have had a quorum of one.

Should it be like that? Sorry if this is mentioned somewhere; I couldn't find it.

Would this be what you're looking for?

From: 10 Nov 2011 12:13:05 UTC

Message 107529

Bernd Machenschalk wrote:

AFAIK by now most projects use what is known as "adaptive replication", which would mean to accept results without further validation from "reliable" hosts.

For the GW search this was discussed in the LVC (LIGO-Virgo collaboration, the scientific community behind Einstein@home) at least two times that I remember, and each time strongly voted against.

The BRP search is doing a lot of computation on GPUs, which are numerically less reliable than e.g. CPUs (note that the invalid result rate is 20x as high as that of the other searches). As the results from Einstein@home are used directly for targeting re-observations, the requirements on the correctness of the results are somewhat higher.

Finally our youngest application for the FGRP search hasn't yet reached the reliability that we would dare to take the results without comparison.

BM

George

Proud member of the Old Farts Association

GWGeorge007

Joined: 8 Jan 18

Posts: 3037

Credit: 4941231023

RAC: 898748


25 Jun 2022 18:17:43 UTC

Message 198134 in response to message 198128


Ian&Steve C. wrote:

Oh wow. That explains why there was a sudden uptick in credit recently.

I’ll forward off to Bernd.

Would this be why? FWIW, on the server status page "Not Running" is shown in RED for this entry, rather than the grey it normally appears in.

FGRPB1G assimilator | einstein4 | Not Running

George

Proud member of the Old Farts Association

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117010264332

RAC: 36417401


26 Jun 2022 8:07:12 UTC

Message 198141 in response to message 198130


GWGeorge007 wrote:

tito wrote:

For a day or so, all my Gamma-ray pulsar binary search tasks have had a quorum of one.

Should it be like that? Sorry if this is mentioned somewhere; I couldn't find it.

Would this be what you're looking for?

Yes, that's exactly what's happening - adaptive replication. That particular Bernd quote was from 2011 but adaptive replication has been used later than that.

Here is a 2019 comment describing adaptive replication. I remember something more recent about current searches but couldn't quickly find it. I think that it's used for "trusted" hosts (ie. returning known valid work) and that around 10% of tasks are 'paired' just to ensure the host should continue to be trusted :-).

Cheers,
Gary.
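
As a rough illustration of the adaptive replication idea described in the post above, here is a minimal Python sketch. It is not the project's scheduler code: the `Host` class, the `TRUST_THRESHOLD` value and the function names are all hypothetical, and the 10% spot-check rate is simply the approximate figure recalled in the post.

```python
import random
from dataclasses import dataclass

# Hypothetical values for illustration only.
TRUST_THRESHOLD = 10     # assumed: consecutive valid results before a host is trusted
SPOT_CHECK_RATE = 0.10   # "around 10%" of trusted-host tasks still get a wingman


@dataclass
class Host:
    consecutive_valid: int = 0


def is_trusted(host: Host) -> bool:
    """A host counts as trusted once it has returned enough valid results in a row."""
    return host.consecutive_valid >= TRUST_THRESHOLD


def replicas_needed(host: Host, rng=random) -> int:
    """Return how many copies of a task to issue when it is first sent to this host."""
    if not is_trusted(host):
        return 2  # normal quorum: the task and one wingman
    # Trusted host: usually a quorum of one, with an occasional paired spot check.
    return 2 if rng.random() < SPOT_CHECK_RATE else 1


def record_result(host: Host, valid: bool) -> None:
    """Update the trust counter; a single invalid result resets it."""
    host.consecutive_valid = host.consecutive_valid + 1 if valid else 0
```

The point of the spot check is that a trusted host which starts returning bad results is caught on a paired task, loses its trust, and goes back to a full quorum.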

Ian&Steve C.

Joined: 19 Jan 20

Posts: 3928

Credit: 45773362642

RAC: 63838235


26 Jun 2022 12:47:25 UTC

Message 198145 in response to message 198141


Gary Roberts wrote:

GWGeorge007 wrote:

tito wrote:

For a day or so, all my Gamma-ray pulsar binary search tasks have had a quorum of one.

Should it be like that? Sorry if this is mentioned somewhere; I couldn't find it.

Would this be what you're looking for?

Yes, that's exactly what's happening - adaptive replication. That particular Bernd quote was from 2011 but adaptive replication has been used later than that.

Here is a 2019 comment describing adaptive replication. I remember something more recent about current searches but couldn't quickly find it. I think that it's used for "trusted" hosts (ie. returning known valid work) and that around 10% of tasks are 'paired' just to ensure the host should continue to be trusted :-).

No, that’s not what’s happening. Bernd told me by PM that the new work unit generator built for their new OS behaves differently, and that the quorum of one was not intended. They’ve never used adaptive replication for this search on Einstein. Bernd’s old post makes it clear that it’s been considered but not implemented for FGRPB1G.

It will be corrected.

_________________________________________________________________________

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117010264332

RAC: 36417401


27 Jun 2022 0:10:14 UTC

Message 198161 in response to message 198145


Ian&Steve C. wrote:

.... Bernd’s old post makes it clear that it’s been considered but not implemented for FGRPB1G

Yes, thanks for the correction. For some reason it was stuck in my head that the OP was talking about BRP4G rather than FGRPB1G, and I think adaptive replication is being used with that new, large Arecibo search.

The wrong quorum for FGRPB1G must have started some days ago, while the previous data file, LATeah3012L12.dat, was still current. A couple of hours after posting, I noticed that my hosts were suddenly receiving lots of _1 branded tasks for 3012L12 (i.e. the missing 2nd half of the quorum), along with the regular 3012L13 tasks for the current data file.

If the _0 task hasn't yet been returned, it might be an easy fix. It might be a bit more work to identify those already returned and validated and revert their validation state. Those tasks will need to wait for the _1 task to be crunched and returned before a second validation pass. Might be a bit of a nightmare to sort all that out :-).

Cheers,
Gary.

Ian&Steve C.

Joined: 19 Jan 20

Posts: 3928

Credit: 45773362642

RAC: 63838235


27 Jun 2022 1:15:00 UTC

Message 198163


My solution would be to issue wingmen to any _0s not returned yet. Then, for all the _0s that have already been returned and validated, issue a wingman as well. If the result comes back inconclusive, issue another and find the valid pair. If the invalid one was the original _0, just let them keep the points. Not a huge deal; it was only a day or two that this was going on before the project fixed it.

_________________________________________________________________________
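
The procedure proposed in the post above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `outputs_agree` is a hypothetical stand-in for the project's validator (a real one compares result files, not a single number), `next_step` is a made-up name, and there is no cap on how many extra tasks get issued.

```python
from itertools import combinations


def outputs_agree(a: float, b: float, tol: float = 1e-6) -> bool:
    """Hypothetical stand-in for the validator: two results agree within a tolerance."""
    return abs(a - b) <= tol


def next_step(outputs: list[float]):
    """Decide what to do for one work unit.

    outputs[k] is the returned result of task _k (only returned tasks are listed).
    Returns ("valid_pair", i, j) once two results agree, otherwise
    ("issue_task", n), meaning task _n should be sent out as the next wingman.
    """
    for i, j in combinations(range(len(outputs)), 2):
        if outputs_agree(outputs[i], outputs[j]):
            return ("valid_pair", i, j)
    # No agreeing pair yet: either only _0 is back, or the results so far
    # are inconclusive, so another task is needed.
    return ("issue_task", len(outputs))
```

For example, a work unit where only _0 has been returned yields `("issue_task", 1)`. If _0 and _1 disagree, the result is `("issue_task", 2)`, and if _2 then matches _1 the valid pair is `(1, 2)`, which is the case where the original _0 was the invalid one.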

Krzysiek_mil(K_PL)

Joined: 12 Oct 08

Posts: 5

Credit: 1184251000

RAC: 311425


2 Jul 2022 13:03:09 UTC

Message 198381


Hello. The problem still exists in the CPU tasks. https://einsteinathome.org/pl/workunit/652155691
