Some Observations on Cross Platform Validation

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109380652842
RAC: 35967500
Topic 192955

Three weeks ago I posted some info in the Unlucky Validation error thread concerning the rate of validation errors being experienced by my Linux boxes. Here is a quote:-

Quote:
I have now had time to do a survey of a number of my Linux boxes to get an idea of how bad the validation problem really is (for me anyway). I've been through the results lists of about 15 linux boxes picked at random. I've examined 103 total results from those boxes and found 15 marked as "invalid" or "checked but no consensus yet" (which almost invariably eventually become invalid).

With the announcement of the new validator, I was interested in gathering some fresh numbers for the period immediately before the new validator was introduced. I examined a total of 113 results from 18 different boxes. Of these only 4 were either "invalid" or "checked but no consensus yet" compared with 15 from the last time I did this, as reported in the above quote.

The period examined covered the range from about June 20 to July 9. Interestingly, not only was the rate of invalids much lower than in the earlier survey, but there were also zero invalids where the Windows box was running the version 4.24 science app. The large majority of observed results came after the introduction of 4.24, so there seems to have been quite an improvement in validation performance just from the introduction of 4.24. There were approximately 25 of my results where the Windows app was 4.17, of which 4 were eventually marked invalid.

I realise that the sample size may not be statistically significant but the trend seems interesting and encouraging nevertheless.
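
For anyone who wants to check the arithmetic, here is a minimal sketch (in Python, with SciPy assumed to be available; the figures are just the ones quoted above) that compares the two invalid rates and runs Fisher's exact test to see whether a drop like this could plausibly be put down to chance:

Code:
from scipy.stats import fisher_exact

# Figures quoted above: 15 of 103 results suspect in the earlier survey,
# 4 of 113 in the more recent one.
earlier = (15, 103 - 15)   # (invalid or CBNC, remainder)
recent = (4, 113 - 4)

print(f"earlier rate: {earlier[0] / sum(earlier):.1%}")   # roughly 14.6%
print(f"recent rate:  {recent[0] / sum(recent):.1%}")     # roughly 3.5%

# Two-sided Fisher's exact test on the 2x2 table.
odds_ratio, p_value = fisher_exact([earlier, recent])
print(f"p-value: {p_value:.3f}")

A small p-value would suggest the drop is more than noise, subject of course to the sample-size caveat just mentioned.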

This particular WU is one of the 4 potential invalids just mentioned and it is particularly interesting as it has not yet been subjected to final validation. My Linux result was originally paired with one being crunched by 4.17, giving the "no consensus" outcome. The result has been recently reissued to a box running 4.24 and has not yet been returned. When it does get returned it will be subjected to the new validator which will have to decide between Linux, Win 4.17 and Win 4.24. Normally you would put your money on the two Win boxes, but ...

My thinking is that if the validator is better now, perhaps all three will pass the test.

What do you think?? :).

Cheers,
Gary.

Martin P.
Joined: 17 Feb 05
Posts: 162
Credit: 40156217
RAC: 0

Some Observations on Cross Platform Validation

Quote:

Three weeks ago I posted some info in the Unlucky Validation error thread concerning the rate of validation errors being experienced by my Linux boxes. Here is a quote:-

Quote:
I have now had time to do a survey of a number of my Linux boxes to get an idea of how bad the validation problem really is (for me anyway). I've been through the results lists of about 15 linux boxes picked at random. I've examined 103 total results from those boxes and found 15 marked as "invalid" or "checked but no consensus yet" (which almost invariably eventually become invalid).

With the announcement of the new validator, I was interested in gathering some fresh numbers for the period immediately before the new validator was introduced. I examined a total of 113 results from 18 different boxes. Of these only 4 were either "invalid" or "checked but no consensus yet" compared with 15 from the last time I did this, as reported in the above quote.

The period examined covered the range from about June 20 to July 9. Interestingly, not only was the rate of invalids much lower than in the earlier survey, but there were also zero invalids where the Windows box was running the version 4.24 science app. The large majority of observed results came after the introduction of 4.24, so there seems to have been quite an improvement in validation performance just from the introduction of 4.24. There were approximately 25 of my results where the Windows app was 4.17, of which 4 were eventually marked invalid.

Gary,

did you take into account that the servers delete invalid results much quicker now? In the past you could look back through your results for at least 4-6 weeks before they were deleted. Nowadays you only see the last 2-3 weeks (and many invalid results get deleted almost immediately), which makes monitoring invalid results much more difficult. So, if you don't monitor your result pages at least twice a day you have a good chance of missing some invalid results, which will bias your stats towards an improving situation.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109380652842
RAC: 35967500

RE: did you take into

Message 69642 in response to message 69641

Quote:

did you take into account that the servers delete invalid results much quicker now?

Actually, I don't think they do. All invalids go through the CBNC (checked but no consensus) stage before being declared invalid. This should almost inevitably delay proceedings while the decider result gets done. Sure, once the jury comes back, the result will get deleted quickly because it has already been hanging around as a pending for a relatively long period. I actually looked at quite a high number of pendings and was surprised to see so few CBNCs amongst them.

However you make a good point about possible shortcomings in the experimental technique :). I didn't spend much time on this so I didn't observe results throughout their life. I was just interested in getting a quick and dirty estimate of how things were going now compared to how they were the last time I looked. So I just used exactly the same technique as last time, expecting to get a similar picture to that of last time. I was surprised to see the apparent difference. My intention was to establish a baseline so that I could repeat the procedure in a week or two in order to gauge the performance of the new validator. I'll still do that but I think it might be difficult to see another similar improvement :).

Quote:
So, if you don't monitor your result pages at least twice a day you have a good chance of missing some invalid results...

Not at all. Potential invalids are very visible by looking at all the pendings and noting how many of them are "double pendings" :).

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109380652842
RAC: 35967500

Talking about "double

Talking about "double pendings", here is a further example of exactly the sort of thing I'm talking about. This is the exact reverse of the double pending example I gave in my original message. There my machine was the Linux box whereas this time it's the windows box. The Linux box is in the Merlin/Morgane dual cluster at AEI. I seem to be in rather top notch company :).

I'm guessing that these two pendings became CBNCs just before the new validator was installed, judging by the dates. They will hang about until the decider returns, and at that stage I'll probably have to be paying attention if I want to see how the new validator handles it. Once again it would be nice to see all three get the nod :).

Cheers,
Gary.

Martin P.
Joined: 17 Feb 05
Posts: 162
Credit: 40156217
RAC: 0

RE: RE: did you take

Message 69644 in response to message 69642

Quote:
Quote:

did you take into account that the servers delete invalid results much quicker now?

Actually, I don't think they do. All invalids go through the CBNC (checked but no consensus) stage before being declared invalid. This should almost inevitably delay proceedings while the decider result gets done. Sure, once the jury comes back, the result will get deleted quickly because it has already been hanging around as a pending for a relatively long period. I actually looked at quite a high number of pendings and was surprised to see so few CBNCs amongst them.

However you make a good point about possible shortcomings in the experimental technique :). I didn't spend much time on this so I didn't observe results throughout their life. I was just interested in getting a quick and dirty estimate of how things were going now compared to how they were the last time I looked. So I just used exactly the same technique as last time, expecting to get a similar picture to that of last time. I was surprised to see the apparent difference. My intention was to establish a baseline so that I could repeat the procedure in a week or two in order to gauge the performance of the new validator. I'll still do that but I think it might be difficult to see another similar improvement :).

Quote:
So, if you don't monitor your result pages at least twice a day you have a good chance of missing some invalid results...

Not at all. Potential invalids are very visible by looking at all the pendings and noting how many of them are "double pendings" :).

Gary,

this is the problem: Not all pending work-units appear in the pending list. I currently see 2 work-units in my pending list, but checking the results page I see 5 work-units showing the status "pending".
Example: The following work-units (just the first 3 are listed) do NOT appear in the pending list but do show as pending in the result pages. They also do NOT show the status "checked but no consensus":
http://einsteinathome.org/task/85345232
http://einsteinathome.org/task/85534343
http://einsteinathome.org/task/85493212
It is these work-units that disappear very quickly after they are marked "Invalid" and are therefore very hard to track.

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7023084931
RAC: 1831799

RE: With the announcement

Quote:


With the announcement of the new validator, I was interested in gathering some fresh numbers for the period immediately before the new validator was introduced. I examined a total of 113 results from 18 different boxes. Of these only 4 were either "invalid" or "checked but no consensus yet" compared with 15 from the last time I did this, as reported in the above quote.

The period examined covered the range from about June 20 to July 9. Interestingly, not only was the rate of invalids much lower than in the earlier survey, but there were also zero invalids where the Windows box was running the version 4.24 science app. The large majority of observed results came after the introduction of 4.24, so there seems to have been quite an improvement in validation performance just from the introduction of 4.24. There were approximately 25 of my results where the Windows app was 4.17, of which 4 were eventually marked invalid.

I realise that the sample size may not be statistically significant but the trend seems interesting and encouraging nevertheless.
....
What do you think?? :).


Gary, I think that a grave statistical barrier is the extreme non-randomness of one's selection of quorum partners. This, of course, arises from the method of result assignment. I notice this as major shifts in my pending situation, depending on whether my fastest machine has recently been in a pool with fast or slow responders. I imagine the validation-error picture is similarly systematically biased. This would stop those of us with very small fleets from concluding much from our own results. Your 18 boxes may be enough for your result to mean something.
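
A toy simulation makes the point concrete. To be clear, this is purely illustrative: the host count, the proportion of "bad hosts", their error rates and the partner pool size of 10 are all invented, and it is not how the scheduler actually assigns work. It just shows that when quorum partners are recycled from a small pool rather than drawn from the whole host population, the invalid rate seen by one small fleet swings around far more from survey to survey, even though the long-run average is the same.

Code:
import random

random.seed(42)
N_HOSTS = 1000
# Invented population: the first 50 hosts are "bad" and cause an invalid 30%
# of the time; the rest only 1% of the time.
bad_rate = {h: (0.30 if h < 50 else 0.01) for h in range(N_HOSTS)}

def observed_invalid_rate(n_results, clustered):
    if clustered:
        # Partners recycled from one small pool of 10 hosts (invented figure).
        pool = random.sample(range(N_HOSTS), 10)
        partners = [random.choice(pool) for _ in range(n_results)]
    else:
        # Partners drawn independently from the whole population.
        partners = [random.randrange(N_HOSTS) for _ in range(n_results)]
    return sum(random.random() < bad_rate[p] for p in partners) / n_results

for label, clustered in (("random partners", False), ("clustered partners", True)):
    rates = [observed_invalid_rate(113, clustered) for _ in range(2000)]
    mean = sum(rates) / len(rates)
    spread = (sum((r - mean) ** 2 for r in rates) / len(rates)) ** 0.5
    print(f"{label}: mean {mean:.3f}, survey-to-survey spread {spread:.3f}")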

My big hope rests on the fact that Bernd has clearly indicated the validation problem is of high concern, is being worked on, and that a new validator has been started. My smaller hope is that a portion of the problem that is stemming in some sense from "bad hosts" may get better as their owners take them off the project. This will help whether they are motivated by public virtue or by pique at lost credit.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109380652842
RAC: 35967500

RE: this is the problem:

Message 69646 in response to message 69644

Quote:

this is the problem: Not all pending work-units appear in the pending list. I currently see 2 work-units in my pending list, but checking the results page I see 5 work-units showing the status "pending".

I'm not 100% sure, but I believe this is normal behaviour. I know that the results lists for individual computers are always up to date. However, I believe that because it takes a lot of server resources to construct the full per-participant pending list, that operation is done relatively infrequently. Therefore your full pending list is always going to be missing the most recent pendings.

In my case, with well in excess of 100 boxes running, I never consult my full pending list as it's just too large. As mentioned in my original post, I checked the results lists of 18 separate Linux boxes and examined every pending that I found in those lists. I could only find 4 out of 113 results that were either already invalid or CBNC, which means that someone (most likely the Linux box) is going to miss out.

Quote:
Example: The following work-units (just the first 3 are listed) do NOT appear in the pending list but do show as pending in the result pages. They also do NOT show the status "checked but no consensus":
http://einsteinathome.org/task/85345232
http://einsteinathome.org/task/85534343
http://einsteinathome.org/task/85493212

Here I'm not quite understanding you. You say they are not CBNCs, but the first one definitely is. So it will become invalid for somebody. The second and third ones are not CBNCs simply because the validator has not looked at them yet, as only one result has been returned in each case. There is no indication at this point that they will ever become CBNCs and hence eventually invalid for somebody.

When I did my survey, I deliberately excluded all pendings where only one result had been returned, simply because you can't predict the final outcome. So in my case I actually looked at quite a lot more than 113 in total.
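
To make that counting rule explicit, here is a minimal sketch of it. The record layout is entirely made up, standing in for what you read by hand off the per-host result pages; there is no documented export of these fields.

Code:
# Hand-entered stand-ins for rows read off a per-host results page.
results = [
    {"quorum_returned": 1, "status": "pending"},
    {"quorum_returned": 2, "status": "checked but no consensus"},
    {"quorum_returned": 3, "status": "invalid"},
    {"quorum_returned": 2, "status": "valid"},
]

# Pendings with only one returned result are excluded: their fate can't be predicted.
examined = [r for r in results if r["quorum_returned"] >= 2]

# "Double pendings" (CBNC) and outright invalids are the ones where somebody will miss out.
suspect = [r for r in examined
           if r["status"] in ("invalid", "checked but no consensus")]

print(f"examined {len(examined)}, of which {len(suspect)} invalid or CBNC")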

Using the data you listed above, I've had a look at the full results lists of those three machines and have noted your continuing problem with invalids. Compared to you, I seem to be having a lucky run at the moment.

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109380652842
RAC: 35967500

RE: Gary, I think that a

Message 69647 in response to message 69645

Quote:

Gary, I think that a grave statistical barrier is the extreme non-randomness of one's selection of quorum partners....

Thanks for the comments. I think you have identified the significant factor that contributes to the systematic bias you refer to. That's why I kept adding extra hosts to the list I was surveying in order to try to cancel out this effect.

Quote:
... a portion of the problem that is stemming in some sense from "bad hosts" may get better as their owners take them off the project. This will help whether they are motivated by public virtue or by pique at lost credit.

Hopefully the fact that there really is a "bad host syndrome" will become more evident to the owners of "bad hosts" when the "good hosts" stop being so regularly "invalidated". :).

Cheers,
Gary.

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

Keep in mind that a while

Keep in mind that a while back EAH rolled back from using the latest builds of the server backend, when it turned out to be causing trouble with the MSD.

As a result, I'm going to assume they don't have the fix that keeps CBNCs on the pendings summary page implemented. This can make it easy to miss some, since they don't hang around for too long as pending (or even on the host summary anymore) once the third one comes back. I think that was the point Martin was driving at in his comment about being careful not to miss them when looking at invalidation rates.

In fact, the only reason I noticed the quirk was that I started doing detailed logging of results for my hosts a while back and discovered the CBNCs weren't showing up when I went to reconcile the pendings for the hosts.

Interestingly, the fix they implemented (which is in place over at SAH currently) has a quirk. Normally their pendings are listed in ascending RID order by default, but once a result gets transitioned to CBNC, it ends up relisted after the ones still marked pending (again in ascending RID order).

Alinator

Regarding the frequency of observation needed to be reasonably sure of not missing them: given the increased runtimes of the results, once a day is still more than adequate for EAH, even with later-model CPUs for the most part. SAH, with their ultra-aggressive purging, is a different story though. ;-)

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109380652842
RAC: 35967500

In the opening message of

In the opening message of this thread I wrote:-

Quote:


This particular WU is one of the 4 potential invalids just mentioned and it is particularly interesting as it has not yet been subjected to final validation. My Linux result was originally paired with one being crunched by 4.17, giving the "no consensus" outcome. The result has been recently reissued to a box running 4.24 and has not yet been returned. When it does get returned it will be subjected to the new validator which will have to decide between Linux, Win 4.17 and Win 4.24. Normally you would put your money on the two Win boxes, but ...

My thinking is that if the validator is better now, perhaps all three will pass the test.

Well, the decider for this particular WU has now been returned and the new validator has been called to do its thing. I'm very pleased to report that all three results have been declared valid and been assigned their full credit.

Congratulations to the team for this significant advance in solving the cross-platform validation issue.

Cheers,
Gary.

tullio
Joined: 22 Jan 05
Posts: 2118
Credit: 61407735
RAC: 0

It may be just luck, but all

It may be just luck, but all my results on SETI, Einstein and QMC obtained with a PII running Linux have been validated. It may be a slow CPU, but its floating-point arithmetic must be OK.
Tullio
