Slow wingmen & growing pending accounts

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0

Message 85778 in response to message 85777

Quote:

What I have, from your clear comment, is that given time (18 days) things will clear up. So, I am not advocating a return to the shorter reporting period.

Apparently I failed to convey things properly. "Things" (a perceived or real increase in pending credit here) will not "clear up" in 18 days. 18 days is the length of the deadline of any replication of any task on this project.

Example:

Task 123456 (yes, I know they're not named that way, but being simple), is assigned to your host at 0:00:00 UTC on October 10th. Your specific replication of that task is due to be reported by 0:00:00 UTC on October 28th. For this project, a quorum of 2 is needed. The second replication does not get assigned until 18:00:00 UTC on October 13th. The workunit as a whole now has a deadline of 18:00:00 UTC on October 31st. Even if you submit your replication as complete at 10:00:00 UTC on October 10th, the other person still has until 18:00:00 UTC on October 31st to get their replication back in.

That said, you could have several scenarios:

  * The second replication is reported at 23:30:00 UTC on October 13th and it validates with your replication and you get credit as of 23:30:00 UTC (plus whatever validator backlog there may be).
  * The second replication is reported 10 days later at 10:00:00 UTC and validates with your replication.
  * The second replication is reported 10 days later at 10:00:00 UTC and fails to validate with your replication. A new replication is generated, but another host doesn't pick it up for the same gap as before (3 days), so the new deadline becomes November 10th.
  * A possible rinse and repeat of the failure that happened with the second replication with the new third replication in the prior bullet point.
  * The second replication completely exhausts their deadline and a new replication has to be sent, but isn't picked up until November 2nd, giving a new deadline of November 20th.
  * Etc, etc, etc, with a number of other combinations.
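
For anyone who wants to check the dates in the example and the scenarios above, here is a minimal sketch of the arithmetic. It assumes only the 18-day per-replication deadline stated in the post. The year (2008), the helper names `utc` and `due`, and the reading of "10 days later" as ten days after your own report on October 10th are assumptions made for illustration, not anything taken from the project's server code.

```python
from datetime import datetime, timedelta, timezone

# 18-day per-replication deadline, as stated in the post.
DEADLINE = timedelta(days=18)

# Assumption: the post gives no year; 2008 is used only to make the dates concrete.
YEAR = 2008


def utc(month, day, hour=0):
    """Build a UTC timestamp in the assumed year (hypothetical helper)."""
    return datetime(YEAR, month, day, hour, tzinfo=timezone.utc)


def due(assigned):
    """A replication is due 18 days after it is assigned to a host."""
    return assigned + DEADLINE


# Your replication, and the second replication that was assigned late.
first_due = due(utc(10, 10))        # October 28th, 00:00 UTC
second_due = due(utc(10, 13, 18))   # October 31st, 18:00 UTC

# Second replication fails validation 10 days after your report (assumed reading),
# and the third replication is picked up 3 days after that.
second_reported = utc(10, 10, 10) + timedelta(days=10)
third_due_after_failure = due(second_reported + timedelta(days=3))  # November 10th

# Second replication times out; the third is picked up on November 2nd.
third_due_after_timeout = due(utc(11, 2))  # November 20th

for label, when in [
    ("your replication due", first_due),
    ("second replication due", second_due),
    ("third due (validation failure)", third_due_after_failure),
    ("third due (timeout)", third_due_after_timeout),
]:
    print(f"{label}: {when:%B %d %H:%M} UTC")
```

The point of the sketch is that each new replication starts its own 18-day clock from the moment a host picks it up, so every pickup gap is added on top of the deadline.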

If the speedup is fairly substantial with the Windows app and it can be made the app that Windows users use by default, then it will naturally help lower the time needed to process a task. This could be enough to drop the deadline back down to 14 days without running the risk of getting people that have slower computers or less of a resource share allocated to Einstein all upset again like they were during portions of S5R2.

As for your P3, it doesn't have any pending credit... :-) It is just like my 2.4GHz P4 right now, usually the last to report in the initial replication. I do have 2 tasks that are pending, but they have only been pending for 7 days at the most. I have that system running 100% dedicated to Einstein though and it is on all the time, so my average turnaround is only 4.35 days with a 3-day cache setting...

Bottom line: I think anyone raising issues about pending credit and even the gap between first and second replications being picked up needs to wait and see what the impact of the faster Windows app has upon those two figures...

Odd-Rod
Joined: 15 Mar 05
Posts: 38
Credit: 8007735
RAC: 617

Message 85779 in response to message 85778

Brian
Thanks for the great description of how replication works. I have a couple of questions about it, if you can help?

Quote:
* The second replication completely exhausts their deadline and a new replication has to be sent, but isn't picked up until November 2nd, giving a new deadline of November 20th.

Would there be a 3rd, 4th, 5th, etc. replication? I'm sure there must be a limit, but is it based on these successive replications, or is there an ultimate deadline for a WU? And if there is, does the cruncher who returned the 1st result then lose the credit? Oops, maybe I shouldn't ask that last question, because the answer might not be good for those who are really concerned about credits ;)

Thanks
Rod

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0

Message 85780 in response to message 85779

Quote:

Would there be a 3rd, 4th, 5th, etc. replication? I'm sure there must be a limit, but is it based on these successive replications, or is there an ultimate deadline for a WU?

Yes, the additional replications would happen. When you look at the workunit details, where you can see the other systems that are processing the same workunit as your host is, on that screen you will see:

    max # of error/total/success tasks    20, 20, 20

That says there can be at most 20 replications in total. Out of those 20, at most 20 can be error tasks and at most 20 can be success tasks. The 20 "success" tasks is very uncommon here, but over at Cosmology they have it set for 3,6,3 (I think). I know that they had quite a few tasks over there that got denied credit because there were "too many success" tasks. That is really silly.
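
To make the three numbers concrete, here is a rough sketch of how such limits could be checked against a workunit's task counts. It is a simplified model and not the actual server logic: the class and function names are hypothetical, the counts in the usage lines are made up, and treating a limit as hit only once the count goes above it is an assumption. The Cosmology values are the "3,6,3 (I think)" figures from the post, so they are unconfirmed as well.

```python
from dataclasses import dataclass


@dataclass
class WorkunitLimits:
    """The 'max # of error/total/success tasks' triple (hypothetical model)."""
    max_error: int
    max_total: int
    max_success: int


def exceeded_limits(limits, n_error, n_total, n_success):
    """Return the names of the limits that the given task counts exceed.

    Assumption: a limit is exceeded only when the count is strictly greater.
    """
    hits = []
    if n_error > limits.max_error:
        hits.append("too many error tasks")
    if n_total > limits.max_total:
        hits.append("too many total tasks")
    if n_success > limits.max_success:
        hits.append("too many success tasks")
    return hits


einstein = WorkunitLimits(max_error=20, max_total=20, max_success=20)
cosmology = WorkunitLimits(max_error=3, max_total=6, max_success=3)  # unconfirmed values

# Made-up counts: the same workunit history under two different settings.
print(exceeded_limits(einstein, n_error=1, n_total=8, n_success=4))
print(exceeded_limits(cosmology, n_error=1, n_total=8, n_success=4))
```

With limits as loose as 20, 20, 20 a workunit has a lot of room before anything trips, while a tight success limit like 3 can be hit by just a handful of returned results that don't agree with each other.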

Quote:
And if there is, does the cruncher who returned the 1st result then lose the credit?

As it stands at Einstein, if a workunit reached that limit without any two results matching (validating), nobody would get credit. I really seriously doubt that is a concern for this project though. I don't think I've ever seen higher than 7 replications, and that was a rare event from what I remember. In other words, could happen, but extremely unlikely.

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7023364931
RAC: 1813968

Message 85781 in response to message 85780

Quote:
I don't think I've ever seen higher than 7 replications, and that was a rare event from what I remember. In other words, could happen, but extremely unlikely.


Here is one of mine for which eight results were sent out (and a ninth put on deck, but never sent).

But it is a special case. I was slow to return, taking ten days after the August 1 issue. As this was back in the good old S5R3 days of nearly simultaneous issue, my first quorum partner also got it August 1, and took three days to return a result which did not validate against mine.

To resolve who was right, a tie-breaker was issued August 11, eleven minutes after I turned mine in. That one failed to return, so on its expiry on August 29, another tie-breaker was issued. Before it timed out Einstein Central Control changed some settings to hustle S5R3 to completion, and two additional potential tie-breakers were sent out September 12. When the third guy also timed out on September 16, yet another tie-breaker was sent out (so three tie-breakers in flight continuously from September 12 through September 17). The next day, yet another tie-breaker went out (four in flight now). Finally the third returned result showed up on September 17, and two more on September 23 and 24. Somewhere along there adequate similarity was found to establish a validating condition, and eventually all the returns except the second were awarded credit. That one still shows as "checked, but no consensus yet", and is presumably not similar enough to the validating reference.

This is an extreme case as regards numbers (and not meant as disagreement with Brian over the rarity of such high replication), as they were impatiently increased by the end-of-S5R3 speedup. It is not, however, an extreme case as regards elapsed time. Quite the contrary: our current state makes it likely that a similar sequence combining validation failure and failure to return results would take considerably longer. In this case results were generally sent out within minutes to a few hours of the first time policy would allow, whereas on S5R4 it is now quite common for many days' delay to be incurred at that step (a subject of this thread). Also, the lack of impatience (that is, of multiple tie-breakers in simultaneous flight) will stretch resolution times in such cases.

Odd-Rod
Joined: 15 Mar 05
Posts: 38
Credit: 8007735
RAC: 617

Message 85782 in response to message 85780

Quote:
I know that they had quite a few tasks over there that got denied credit because there were "too many success" tasks. That is really silly.


Indeed.

Quote:
I don't think I've ever seen higher than 7 replications, and that was a rare event from what I remember. In other words, could happen, but extremely unlikely.


And as archae86 posted about an hour after you, it has in fact happened! But I agree, it would be very rare.

To Brian, many thanks for helping me learn a bit more about BOINC's workings.

To archae86, I enjoyed reading the detailed story of the replications! It helped my understanding of what happens.

To my wingmen (or should that be wingpersons?), I'm afraid I have 2 WUs that I didn't finish - my son's PC was reformatted and it had 2, so my apologies.

Regards
Rod
(Off to bed soon - it's 00:32 here)
