Validate Errors

Dana
Joined: 11 Sep 06
Posts: 44
Credit: 4303113
RAC: 0
Topic 193756

In the past two days I have received 11 "validate errors" for outcomes to my work units. I have never seen a validate error in any other work unit I have done in the past almost two years. Is this a problem on my end? How can I investigate the trouble? I'd be grateful for any help anyone can offer.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117650542777
RAC: 35186668

Quote:
In the past two days I have received 11 "validate errors" ....

If you click the "explain" link under the column heading of the column in which the validate error is reported on the website, you will get the definitions of all the possible outcomes that can be reported. You will see that "validate error" refers to actions happening on the server and therefore not something happening on your computer.

I've seen this happen before in situations where an uploaded result is reported too quickly after the upload. The validator can be called before the uploaded files have been moved to the location where the validator expects to find them. If the files aren't there, the validator reports this particular error.
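
The timing problem described above can be illustrated with a small simulation. This is only a sketch: the directory names `incoming` and `final`, the result name, and the function names are hypothetical and are not taken from the project's server code.

```python
import os
import tempfile


def validator_check(final_dir, result_name):
    """Mimic a validator that only looks in one fixed location (hypothetical)."""
    if os.path.exists(os.path.join(final_dir, result_name)):
        return "files found"
    return "validate error"


def simulate(report_before_move):
    """Upload a result, then report it either before or after the server moves it."""
    with tempfile.TemporaryDirectory() as root:
        incoming = os.path.join(root, "incoming")  # where uploads first land (assumed)
        final = os.path.join(root, "final")        # where the validator looks (assumed)
        os.mkdir(incoming)
        os.mkdir(final)

        result_name = "result_0"
        with open(os.path.join(incoming, result_name), "w") as handle:
            handle.write("output data")

        if report_before_move:
            # The report arrives first, so the validator runs against an empty folder.
            return validator_check(final, result_name)

        # Normal case: the file is moved before the report triggers validation.
        os.replace(os.path.join(incoming, result_name),
                   os.path.join(final, result_name))
        return validator_check(final, result_name)


print(simulate(report_before_move=True))
print(simulate(report_before_move=False))
```

In this toy model the uploaded file is identical in both runs; only the order of "move" and "report" changes, which is why the error says nothing about the quality of the result itself.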

Some time ago there were "third party" BOINC clients which were attempting to report results immediately upon uploading and this was giving rise to this issue. You seem to be using the standard BOINC client so this shouldn't be affecting you. Are you perhaps manually updating immediately the upload finishes? If you were, this might be the cause of the problem.

Apart from that, it otherwise must be a server issue which seems to have singled you out for unfair treatment. There have been some indications that there may be some issues with the servers at the moment. You could send a PM to Bernd asking him if someone could look into your results list to see what is going on with these errors.

Cheers,
Gary.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 728255554
RAC: 1178769

I'm not so sure this is server related. It seems that the results get uploaded fine and pass the first validation stage (the result, taken by itself, does not show corruption) but fail the test that tries to match the result with the wingman's, so that a second wingman is consulted. So there is a chance that there really is a hardware problem. Given the runtimes of the WUs, I would not rule out that the PC is overclocked a bit. If the PC is overclocked, undervolted, or in any other way operating beyond spec, I'd try setting it back to the stock settings to see if the problems go away. Checking for adequate cooling is always a good idea as well.

CU
Bikeman

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

Also, the problem of reporting too soon after the upload of the output files isn't limited to third party Core Clients which have Return Tasks Immediately functionality.

It's possible to set 5.10.x series CCs so that you get the same effect. In addition, not all project backends have the timing problem. SAH can be sensitive to it, for example, but I don't recall losing one on EAH when I tried it here. However, it's generally not wise to push your luck when you know in advance there might be a problem. I usually tell people not to set the Connect to Network Interval (CI) to less than about 0.01 days (about 15 minutes).
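
For reference, the day-to-minute conversion behind that figure can be checked with a few lines of Python. The function names are illustrative only.

```python
MINUTES_PER_DAY = 24 * 60  # 1440 minutes in a day


def days_to_minutes(days):
    """Convert an interval given in days (as the preference expects) to minutes."""
    return days * MINUTES_PER_DAY


def minutes_to_days(minutes):
    """Convert an interval in minutes to the fractional days the preference expects."""
    return minutes / MINUTES_PER_DAY


# 0.01 days is a little under a quarter of an hour.
print(round(days_to_minutes(0.01), 1))

# Going the other way: exactly 15 minutes expressed in days.
print(round(minutes_to_days(15), 4))
```

So 0.01 days works out to roughly 14.4 minutes, which is where the "about 15 minutes" rule of thumb comes from.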

Alinator

Dana
Joined: 11 Sep 06
Posts: 44
Credit: 4303113
RAC: 0

Yes, no doubt my system is overclocked. The processor is, however, liquid cooled and temps spike into the lower 60’s. All other components are actively cooled. Besides, this is nothing new in the last few days. What is new is I have replaced my router. I’ve installed a Linksys wrt350 in which I have yet to update the firmware. I should have mentioned this in my original post but did not want to lead someone to a conclusion. Since the router is new to the system it is the obvious choice as the culprit if, in fact, the trouble is on my end. I am thankful to everyone who has tried to help me as this is a frustrating situation and your suggestions are appreciated.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117650542777
RAC: 35186668

Message 82678 in response to message 82675

Quote:
I'm not so sure this is server related. It seems that the results get uploaded fine and pass the first validation stage (the result, taken by itself, does not show corruption) but fail the test that tries to match the result with the wingman's, so that a second wingman is consulted.

A second wingman will be used irrespective of the cause of the validate error. If the situation were as you propose, both the completed results would be CBNC (checked but no consensus) and both would be still "pending" the return of the third and deciding result.

However this is not the case (for the one I looked into) and that fact really points to something being missing on the server for the "validate error" result. I think it's well worth getting someone project side to check out the situation. If it were an overclocking or faulty hardware issue, I would expect to see results directly marked as invalid, scattered throughout his results list.

At a quick glance, I didn't see any "invalids" at all in his list. Also, if you look at the quorum of one of the older "validate errors" where the 2nd wingman has reported, you will notice that the validate state of the "validate error" result is still listed as CBNC. It would be listed as "invalid" if it really were an invalid result caused by a problem on his machine. To me, the fact that the validate state has not been updated seems to imply that there really is stuff lost on the server that prevents this result from participating in the validation exercise.

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117650542777
RAC: 35186668

Message 82679 in response to message 82676

Quote:
Also, the problem of reporting too soon after the upload of the output files isn't limited to third party Core Clients which have Return Tasks Immediately functionality.

Yes, exactly - which was why I asked if he was doing a manual update too quickly.

Quote:
... I don't recall losing one on EAH when I tried it here....

I've seen a validate error happen on E@H when the report followed the upload too quickly. Basically it was a request for new work that occurred almost simultaneously with the upload of a just-completed result. The request for work also tried to report the result that was still uploading and sure enough .... I actually just happened to be watching the whole thing and had this premonition that something nasty was going to happen. When I then checked - sure enough, validate error!!@#$%!!

You tend to remember those sorts of things :-).

Cheers,
Gary.

Dana
Joined: 11 Sep 06
Posts: 44
Credit: 4303113
RAC: 0

Quote:
I usually tell people to not set the Connect to Network Interval (CI) to less than about 0.01 days (about 15 minutes).

Well, the trouble seems to have abated for now but I'm still holding my breath waiting for the other shoe to drop. I made a cursory search for the connect to network interval in my preferences and in the manager interface but did not find anything I could adjust. The consensus seems to be that the trouble lay with reporting the result while making a request for new work if I'm understanding correctly. Mostly, the program just works in the background. At the end of the day I'll check my results to see how things are going and from time to time, check my stats but that is usually the limit of my interaction with the program. I guess I'll simply be thankful the problem has gone from me for now. Nice to see there is help out there when someone like me has some unusual happenstance.

KSMarksPsych
Moderator
Joined: 15 Oct 05
Posts: 2702
Credit: 4090227
RAC: 0

Message 82681 in response to message 82680

Quote:

I usually tell people to not set the Connect to Network Interval (CI) to less than about 0.01 days (about 15 minutes).

Well, the trouble seems to have abated for now but I'm still holding my breath waiting for the other shoe to drop. I made a cursory search for the connect to network interval in my preferences and in the manager interface but did not find anything I could adjust. The consensus seems to be that the trouble lay with reporting the result while making a request for new work if I'm understanding correctly. Mostly, the program just works in the background. At the end of the day I'll check my results to see how things are going and from time to time, check my stats but that is usually the limit of my interaction with the program. I guess I'll simply be thankful the problem has gone from me for now. Nice to see there is help out there when someone like me has some unusual happenstance.

On the website (in your General preferences) it's under "Network usage" and the exact field is "Computer is connected to the Internet about every". Set that to 0.01 or so. If you use the local preferences in the manager, it's under tabbed view - Advanced... - Preferences... - Network tab - "Connect about every". If you do set it in the manager, note that it will override anything set on the website, so make sure to tweak the other settings to match what's on the web.
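
For anyone who prefers a file to the dialog, the same local override can be sketched as a small XML fragment. This is an assumption-laden illustration: it assumes the client reads local overrides from a file named `global_prefs_override.xml` in its data directory and that the "Connect about every" value is stored in a `work_buf_min_days` element. Check both names against the documentation for your client version before relying on them.

```python
import xml.etree.ElementTree as ET


def build_override(connect_interval_days):
    """Build an override fragment that sets only the connect interval.

    The element names are assumptions about the client's preference format.
    """
    root = ET.Element("global_preferences")
    interval = ET.SubElement(root, "work_buf_min_days")
    interval.text = f"{connect_interval_days:.2f}"
    return ET.tostring(root, encoding="unicode")


# Print the fragment instead of writing it, so nothing on disk is touched.
print(build_override(0.01))
```

As with the manager dialog, a local override of this kind would take precedence over the website values, so any other preferences you care about need to be repeated alongside it.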

Kathryn :o)

Einstein@Home Moderator

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117650542777
RAC: 35186668

Message 82682 in response to message 82680

Quote:
... The consensus seems to be that the trouble lay with reporting the result while making a request for new work if I'm understanding correctly.

For your errors - no, not at all. I was simply pointing out that validate errors can occur in a variety of ways. In your particular case, since you are not micro-managing, I still think the problem is at the server end. There is something funny going on with the validator right now (see other recent threads in this forum and the large validate queue on the server status page), and I have asked one of the Admins to have a look at your validate error results when he has a chance.

Cheers,
Gary.

Dana
Joined: 11 Sep 06
Posts: 44
Credit: 4303113
RAC: 0

Just as I thought the storm had passed, I got nailed again. A whole new slew of validate errors just cropped up. Since the problem is perhaps not on my end, does anyone know how I can get the attention of someone who can look into the other end of this business?
