Arecibo Binary Pulsar Search (STSP) min quorum 21

{CurlY BracketS}

Joined: 9 Feb 05

Posts: 4

Credit: 899111

RAC: 0

12 Oct 2010 9:39:01 UTC

Topic 195378

Can anybody explain to me why there are WUs with a minimum quorum of 21 that have been sent to only 2 clients, e.g. p2030_53925_58521_0173_G201.47+00.42.C_6.dm_270?

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4312

Credit: 250226353

RAC: 35937

Arecibo Binary Pulsar Search (STSP) min quorum 21

12 Oct 2010 11:04:48 UTC

Message 99998

To inhibit validation until the results have arrived on the correct server. See this news and this thread.

{CurlY BracketS}

Joined: 9 Feb 05

Posts: 4

Credit: 899111

RAC: 0

Well, I don't understand all

12 Oct 2010 11:35:12 UTC

Message 99999

Well, I don't understand all of this, but thanks anyway...

mikey

Joined: 22 Jan 05

Posts: 12663

Credit: 1839063099

RAC: 4270

RE: Well, I don't

13 Oct 2010 11:43:10 UTC

Message 100000 in response to message 99999

Quote:

Well, I don't understand all of this, but thanks anyway...

I have to agree with you. Bernd, I don't understand that either. In the news thread you said "tasks that were issued with the wrong upload URL" were the problem, but now you seem to be saying that you did it on purpose. I think I am misunderstanding what you are saying. The basic question was: if you need 21 results to validate a unit, why are you sending it out to only 2 people initially? The answer "tasks that were issued with the wrong upload URL" doesn't seem to match the question.

Also doesn't this cause those units to be 'in the system' FOREVER? And make us users wait forever for a unit to be validated and credits granted? I mean, if it takes 21 valid returns and you are only sending 2 units out once every 2 weeks, we are talking FOREVER before the credits are granted. Isn't this going to force your database to hold these units essentially in limbo until all 21 VALID results are returned to you? If so, I hope you have a lot of hard drives and a very fast server!

archae86

Joined: 6 Dec 05

Posts: 3157

Credit: 7214814931

RAC: 978164

RE: Also doesn't this cause

13 Oct 2010 13:51:01 UTC

Message 100001 in response to message 100000

Quote:

Also doesn't this cause those units to be 'in the system' FOREVER?

No.

Let me try to summarize--though I'm just a participant.

Point one: In the conversion from 4-fold to 10-fold Work Units for ABP2, an error was made (the "wrong URL").

Point two: A consequence of that error was that this work failed validation (falsely) on return. The validation failure in turn caused the work to be reissued, so if left unfixed we not only had people not getting credit for good work (possibly fixable later), but also had people repeating work already done to no useful purpose (pure waste). This was a large-scale episode, involving at least many tens of thousands of results.

Point three: The 21/2 condition of some WUs is an interim control measure--and is neither the original condition of the WUs in question nor their intended final state. While active, it "freezes" returned work--so it is not found falsely invalid and inappropriate reissue does not happen. The 21 part is an intentionally unmeetable but temporary condition--so validation is not attempted.

Point four: (this part I understand least). In some sense, appropriate information sent to the right place after results are already returned allows correct validation to occur. This is currently a batch process (I believe an initial attempt to cover the full incident proved too large for the infrastructure to tolerate). You, as a user, may see this as many returned 21/2 results that were sitting in Pending limbo suddenly converting to Valid, credited results over a period of a few hours. I think this happened most recently a day or so ago, and that Bernd plans one final round in about a week, after which all the original problem work will have gone past deadline expiry.
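The gating described in point three can be illustrated with a small model. This is a hypothetical Python sketch, not BOINC's actual transitioner or validator code: it assumes validation is attempted only once the number of returned results reaches `min_quorum`, and that new tasks are only generated to top a workunit up to `target_nresults` copies. The field names are borrowed from BOINC's workunit settings; the `Workunit` class, the function names and the workunit name are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workunit:
    # Hypothetical stand-in for a BOINC workunit (not BOINC's own data structure).
    name: str
    min_quorum: int        # results needed before validation is attempted
    target_nresults: int   # copies the scheduler tries to keep in circulation
    returned: int = 0      # successful results received so far
    in_progress: int = 0   # tasks sent out but not yet returned


def ready_to_validate(wu: Workunit) -> bool:
    # Validation is only attempted once enough results exist to form a quorum.
    return wu.returned >= wu.min_quorum


def new_tasks_needed(wu: Workunit) -> int:
    # Replication only tops the workunit up to target_nresults copies.
    return max(0, wu.target_nresults - (wu.returned + wu.in_progress))


# "21/2": both results are back, but a quorum of 21 can never be reached,
# so validation is never attempted and no further tasks are issued.
held = Workunit("hypothetical_wu", min_quorum=21, target_nresults=2, returned=2)
print(ready_to_validate(held), new_tasks_needed(held))  # False 0

# After the original quorum is restored (2 is an assumed value), the same
# two results are eligible for normal validation.
released = Workunit("hypothetical_wu", min_quorum=2, target_nresults=2, returned=2)
print(ready_to_validate(released), new_tasks_needed(released))  # True 0
```

In this model the hold costs nothing but database space: the workunit just sits there until its quorum is lowered again, which matches the "frozen, not lost" behaviour described above.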

To mods and officials: I was just trying to summarize for those happening to read this thread. I'd welcome any correction, or simple deletion of my post if you see a better way of helping understanding.

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4312

Credit: 250226353

RAC: 35937

archae86 is correct. The

13 Oct 2010 14:21:53 UTC

Message 100002

archae86 is correct.

The main point is that I intentionally modified the minimum quorum (and, for that matter, initial replication) outside a reasonable range to put certain workunits on hold, i.e., to inhibit validation and further replication.

It is true that while in this state the tasks will not be credited and will stay in the system. But I'll restore the original values for both parameters at some point, and let BOINC continue finishing these workunits the normal way.

The reason for putting these workunits on hold is that their first tasks were sent out with the wrong "upload URL", so after crunching, the clients would upload the files to the wrong server. We kept the affected workunits on hold until we had transferred their results to the correct server. After that, we released the workunits again by restoring the original settings.

Originally ~70,000 workunits were put on hold that way, and by now ~45,000 have been released again, and their tasks correctly validated and credited.
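The hold-and-release procedure Bernd describes, combined with archae86's note that releases happen in batches, can be sketched as follows. This is a hypothetical Python illustration, not the project's actual tooling: the function names, the `files_moved` check and the batch size are all assumptions. The thread only says that the original quorum and replication values were saved and later restored once the results had reached the correct server.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Workunit:
    # Hypothetical stand-in for a BOINC workunit.
    name: str
    min_quorum: int
    target_nresults: int


def put_on_hold(wus: List[Workunit], hold_quorum: int,
                hold_nresults: int) -> Dict[str, Tuple[int, int]]:
    # Remember the original settings so they can be restored exactly later,
    # then apply the "on hold" values (e.g. 21/2 as seen in this thread).
    saved = {wu.name: (wu.min_quorum, wu.target_nresults) for wu in wus}
    for wu in wus:
        wu.min_quorum = hold_quorum
        wu.target_nresults = hold_nresults
    return saved


def release_in_batches(wus: List[Workunit], saved: Dict[str, Tuple[int, int]],
                       files_moved: Callable[[Workunit], bool],
                       batch_size: int) -> List[Workunit]:
    # Restore the original settings for at most batch_size held workunits
    # whose result files are already on the correct server. Released
    # workunits are removed from `saved`, so it always lists what is still held.
    released: List[Workunit] = []
    for wu in wus:
        if len(released) >= batch_size:
            break
        if wu.name in saved and files_moved(wu):
            wu.min_quorum, wu.target_nresults = saved.pop(wu.name)
            released.append(wu)
    return released


# Usage with made-up names and an assumed original setting of 2/2.
wus = [Workunit(f"wu{i}", min_quorum=2, target_nresults=2) for i in range(5)]
saved = put_on_hold(wus, hold_quorum=21, hold_nresults=2)
moved = {"wu0", "wu1", "wu3"}
batch = release_in_batches(wus, saved, lambda wu: wu.name in moved, batch_size=2)
print([wu.name for wu in batch], sorted(saved))
```

Keeping the saved settings keyed by workunit name is what makes the release safe to repeat: each pass only touches workunits that are still held. Applied to the figures above, about 70,000 - 45,000 = 25,000 workunits would still be in `saved` at the time of this post.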

{CurlY BracketS}

Joined: 9 Feb 05

Posts: 4

Credit: 899111

RAC: 0

Now it starts to make sense,

13 Oct 2010 15:45:33 UTC

Message 100003

Now it starts to make sense, thanx archae86 for clarifying things.

mikey

Joined: 22 Jan 05

Posts: 12663

Credit: 1839063099

RAC: 4270

RE: Now it starts to make

14 Oct 2010 11:22:53 UTC

Message 100004 in response to message 100003

Quote:

Now it starts to make sense, thanx archae86 for clarifying things.

I totally agree, thanks from me too!!

Forums › Cruncher's Corner
