user's invalid WU accepted with quorum of 2

Ken Vogt
Joined: 18 Jan 05
Posts: 41
Credit: 321783
RAC: 0
Topic 190709

Hello again,

One of our team members has had a problem with a few Albert WUs, for example, 4082650.

His result is marked invalid, with zero credit. However, only two other computers received the WU, and the validator accepted their results and granted credit to those two.

Has this been seen before? I did search for some likely terms & skimmed the whole Albert thread; sorry if I missed it here.

There are about 10 other results just like this, for computers 467939 and 83035. They are all Albert WUs, from several different master files, mostly under 5.2.6, but the most recent one is after he upgraded to 5.2.13.

All the recent results from these two machines that list 0.00 credit are examples of this phenomenon, as best I could check. Both of them show a history of many successfully completed WUs.

My first guess was that there is some difference between the info returned to the Validator and to the Scheduler, which leads one to believe the result is valid and the other not.

467939 in particular is a very fast machine, and I suppose it's possible that some WUs are in fact invalid, but the same thing also happens with 83035.

And in any case, the crux is:

If the results are invalid, why are they not sent to a 4th machine?

If the results are in fact valid, why is no credit granted?

In the second case, it would be great for the user to be able to get credit for the WUs, but it is more important to us both to understand what is going on.

We would be grateful for any help you can give.

Ken

Tom B
Joined: 23 Jul 05
Posts: 9
Credit: 3982841
RAC: 0

When three results are returned without errors, they are validated against each other. If one doesn't agree with the other two within a reasonable tolerance, it gets marked as invalid and no credit is given.

This usually happens when one is from a different OS than the other two. I've been on the receiving end of that with my Power Mac several times. It rarely happens with three Windows machines.

You might look for some possible problem with the machine (excessive heat, overclocking, etc.).

Pooh Bear 27
Joined: 20 Mar 05
Posts: 1376
Credit: 20312671
RAC: 0

This same thing happens on at least one other project. Success of a result means that it was downloaded, crunched, uploaded and reported without issues. It does not mean it's valid. Once 3 successes happen, the validator looks at the results and decides if the data is valid. If one is invalid, it sets it to 0, and if the other two are close enough it validates the WU, giving the "middle" result (so it throws out the 0 and the high). The two must be close enough; otherwise it will say that there is no consensus and send a 4th result to be crunched.

So, yes, 2 good ones can get validated.
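
The steps above can be sketched roughly in Python. This is only an illustration of the description in this post, not the project's actual validator code. The `Result` class, the `agrees` tolerance rule, the scalar `value` field, and the host IDs 1, 2, 3 are all hypothetical, made up for the example.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Result:
    host_id: int
    value: float           # hypothetical scalar summary of the science output
    claimed_credit: float


def agrees(a, b, tol=1e-5):
    """Assumed comparison rule: relative difference within tol."""
    scale = max(abs(a.value), abs(b.value), 1.0)
    return abs(a.value - b.value) <= tol * scale


def validate(results, min_quorum=3, tol=1e-5):
    """Return (status, {host_id: granted_credit})."""
    # Validation is not attempted until min_quorum successes are in.
    if len(results) < min_quorum:
        return "pending", {}

    # Find the largest group of results that agree with one candidate.
    best = []
    for cand in results:
        group = [r for r in results if agrees(cand, r, tol)]
        if len(group) > len(best):
            best = group

    # Fewer than two agreeing results: no consensus, another copy is needed.
    if len(best) < 2:
        return "no_consensus", {}

    valid_hosts = {r.host_id for r in best}
    # An invalid result's claim is set to 0, then the middle claim is granted.
    claims = [r.claimed_credit if r.host_id in valid_hosts else 0.0
              for r in results]
    granted = median(claims)
    credit = {r.host_id: (granted if r.host_id in valid_hosts else 0.0)
              for r in results}
    return "validated", credit


status, credit = validate([
    Result(1, 1.0, 10.0),
    Result(2, 1.0000001, 12.0),
    Result(3, 5.0, 11.0),      # disagrees with the other two
])
print(status, credit)  # validated {1: 10.0, 2: 10.0, 3: 0.0}
```

In this sketch the third host is marked invalid and gets nothing, while the other two close the WU between them, so no 4th copy is sent out. A 4th copy would only be needed in the `no_consensus` case.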

Paul D. Buck
Joined: 17 Jan 05
Posts: 754
Credit: 5385205
RAC: 0

I think we are getting a little hung up on the wording.

As stated, Min Quorum is the minimum number of results needed before validation will be attempted. But it does not require three validated results for the quorum to be closed and a canonical result chosen.

It is hoped that all three will agree, of course ...

Ken Vogt
Joined: 18 Jan 05
Posts: 41
Credit: 321783
RAC: 0

Message 24755 in response to message 24754

Paul and Pooh Bear,

Thanks very much for these explanations; I'll pass them on. :)

Ken
