AFAIK, by now most projects use what is known as "adaptive replication", which means accepting results from "reliable" hosts without further validation.
For the GW search this was discussed in the LVC (LIGO-Virgo Collaboration, the scientific community behind Einstein@home) at least twice that I remember, and each time it was strongly voted against.
The BRP search is doing a lot of computation on GPUs, which are numerically less reliable than e.g. CPUs (note that the invalid result rate is 20x as high as that of the other searches). As the results from Einstein@home are used directly for targeting re-observations, the requirements on the correctness of the results are somewhat higher.
Finally, our youngest application, the one for the FGRP search, hasn't yet reached a level of reliability at which we would dare to accept its results without comparison.
For a day or so, all my Gamma-ray pulsar binary search tasks have had a quorum of one.
Is it supposed to be like that? Sorry if this is mentioned somewhere that I couldn't find.
Would this be what you're looking for?
Yes, that's exactly what's happening: adaptive replication. That particular Bernd quote was from 2011, but adaptive replication has been used since then.
Here is a 2019 comment describing adaptive replication. I remember something more recent about current searches but couldn't quickly find it. I think it's used for "trusted" hosts (i.e. hosts with a record of returning valid work) and that around 10% of tasks are still 'paired' just to confirm that the host should continue to be trusted :-).
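The scheme described above can be sketched in a few lines. This is a hypothetical illustration, not BOINC's or Einstein@home's actual scheduler code: the `Host` record, the trust threshold of 10 consecutive valid results, and the 10% spot-check rate are assumptions chosen only to match the description of "trusted" hosts with roughly one task in ten still paired.

```python
import random
from dataclasses import dataclass


@dataclass
class Host:
    # Hypothetical per-host record: number of consecutive results that validated.
    consecutive_valid: int = 0


# Assumed values for illustration only; not Einstein@home's real settings.
TRUST_THRESHOLD = 10
SPOT_CHECK_RATE = 0.10


def is_trusted(host: Host) -> bool:
    return host.consecutive_valid >= TRUST_THRESHOLD


def initial_replication(host: Host, rng: random.Random) -> int:
    """Return how many copies of a workunit to send out when this host gets it."""
    if is_trusted(host) and rng.random() >= SPOT_CHECK_RATE:
        return 1  # quorum of one: accept the result without a wingman
    return 2  # untrusted host, or a random spot check: pair with a wingman


def record_validation(host: Host, valid: bool) -> None:
    # A valid result extends the streak; an invalid one resets trust entirely.
    host.consecutive_valid = host.consecutive_valid + 1 if valid else 0
```

With this policy, a host that returns a single invalid result drops back to being paired on every task until it rebuilds its record.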
No, that’s not what’s happening. Bernd told me in a PM that the new work unit generator built for their new OS behaves differently and that the quorum of one was not intended. They’ve never used this on Einstein. Bernd’s old post makes it clear that it has been considered but not implemented for FGRPB1G.
Yes, thanks for the correction. For some reason it was stuck in my head that the OP was talking about BRP4G rather than FGRPB1G and I think adaptive replication is being used with that new Arecibo, large search.
The unintended quorum of one for FGRPB1G must have started some days ago, while the previous data file, LATeah3012L12.dat, was still current. A couple of hours after posting, I noticed that my hosts were suddenly receiving lots of _1 tasks for 3012L12 (i.e. the missing second half of each quorum) along with the regular 3012L13 tasks for the current data file.
If the _0 task hasn't been returned yet, it might be an easy fix. It might be a bit more work to identify the tasks already returned and validated and revert their validation state. Those tasks will need to wait for the _1 task to be crunched and returned before a second validation pass. Might be a bit of a nightmare to sort all that out :-).
My solution would be to issue wingmen to any _0s not returned yet, and also to issue a wingman for every _0 that has already been returned and validated. If the result comes back inconclusive, issue another task and find the valid pair. If the invalid one turns out to be the original _0, just let its host keep the points. It's not a huge deal; it was only a day or two that this was going on before the project fixed it.
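The re-validation step proposed above (keep issuing wingmen until two results agree) might look roughly like this. It is a hypothetical sketch: `find_valid_pair`, the `issue_task` callback that sends out one more copy and returns its result, the `agree` comparison, and the cap of five extra tasks are all assumptions for illustration, not part of the Einstein@home server code.

```python
from typing import Callable, List, Optional, Tuple


def find_valid_pair(
    results: List[float],
    issue_task: Callable[[], float],
    agree: Callable[[float, float], bool],
    max_extra: int = 5,
) -> Tuple[Optional[float], List[float]]:
    """Issue wingmen for a workunit until two results agree.

    results: results already returned for the workunit (e.g. just the _0).
    issue_task: hypothetical callback that sends out one more copy (_1, _2, ...)
        and returns its result once it has been crunched.
    agree: hypothetical validator comparing two results.
    Returns the agreed (canonical) result, or None if no pair was found
    within max_extra extra tasks, together with every result seen.
    """
    seen = list(results)
    for _ in range(max_extra):
        new = issue_task()  # send out one more wingman and wait for its result
        for earlier in seen:
            if agree(earlier, new):
                # Valid pair found. Under the proposal, a disagreeing _0
                # keeps its credit, so nothing is revoked here.
                seen.append(new)
                return new, seen
        seen.append(new)  # inconclusive: keep it and try another wingman
    return None, seen
```

Passing an already-validated _0 result as `results` covers the second case in the proposal.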
Oh wow. That explains why there was a sudden uptick in credit recently.
I’ll forward this to Bernd.
_________________________________________________________________________
The quoted post from Bernd is dated 10 Nov 2011 12:13:05 UTC.
Proud member of the Old Farts Association
Would this be why? FWIW, the "Not Running" status is shown in RED rather than the usual grey.
Cheers,
Gary.
Gary Roberts
It will be corrected.
Hello. The problem still exists in the CPU tasks. https://einsteinathome.org/pl/workunit/652155691