Validate error - What this really means!

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117829201739
RAC: 34754396

RE: Nobody likes some


It would appear so. You do get occasional bad WUs which give repeated errors until the limit of 20 error results is reached and the WU is then automatically abandoned. You seem to be copping rather a lot. Unfortunately, they have to be weeded out manually if they are identified before the limit of 20 failures is reached.
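The error limit described above can be pictured as a simple per-workunit counter. This is a minimal sketch, not BOINC server code: the class and method names are hypothetical, and the limit of 20 is the figure quoted in this post (on a BOINC server the limit is a per-workunit setting, `max_error_results`). Manual cancellation is modeled as a separate call, because it is what retires a bad WU before the limit is reached.

```python
from dataclasses import dataclass

# Assumed limit, taken from the post: a WU is abandoned after 20 error results.
MAX_ERROR_RESULTS = 20


@dataclass
class Workunit:
    name: str
    error_results: int = 0
    abandoned: bool = False

    def record_error(self) -> None:
        """Count one failed result; abandon the WU once the limit is reached."""
        if self.abandoned:
            return  # no more copies are sent out, so no more errors are counted
        self.error_results += 1
        if self.error_results >= MAX_ERROR_RESULTS:
            self.abandoned = True

    def cancel(self) -> None:
        """Manual cancellation by project staff, before the limit is reached."""
        self.abandoned = True


# Usage: a bad WU keeps failing until the 20th error result retires it.
wu = Workunit("hypothetical_bad_wu")
for _ in range(MAX_ERROR_RESULTS - 1):
    wu.record_error()
print(wu.abandoned)  # False
wu.record_error()
print(wu.abandoned)  # True
```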

I'll send Bernd an email and ask him to look into it.

Cheers,
Gary.

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 984
Credit: 25171438
RAC: 19

Hi,

We have identified a corrupt dataset and cancelled all workunits/tasks analyzing it. They all have names like "p2030.20111103.G44.50-01.64.N.b*". We also identified some more stored data that hasn't been sent out yet and removed it from the data pool. Furthermore, we came up with additional tests to prevent such corrupt data from entering our pre-processing chain.
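To check whether one of your own tasks belonged to the cancelled dataset, the name pattern above can be matched as a shell-style wildcard. This is a minimal sketch using Python's standard `fnmatch` module; the function name is hypothetical and the example task names are made up for illustration, not real Einstein@Home task names.

```python
from fnmatch import fnmatchcase

# Pattern quoted above for the cancelled workunits/tasks.
CANCELLED_PATTERN = "p2030.20111103.G44.50-01.64.N.b*"


def is_cancelled(task_name: str) -> bool:
    """True if the task name matches the pattern ('*' matches any suffix)."""
    # fnmatchcase treats '.' and '-' literally; only '*', '?' and '[...]' are special.
    return fnmatchcase(task_name, CANCELLED_PATTERN)


# Hypothetical task names, made up for illustration only.
for name in [
    "p2030.20111103.G44.50-01.64.N.b2s0g0.00000_0",
    "p2030.20111103.G44.50-01.64.C.b2s0g0.00000_0",
]:
    print(name, "->", "cancelled" if is_cancelled(name) else "not affected")
```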

Sorry for the inconvenience!

Oliver

Einstein@Home Project

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 984
Credit: 25171438
RAC: 19

Update: I just checked for any other suspicious datasets that had already made it into our workunit/task pool. I found none, so the situation should be back to normal now. Occasional validate errors might still occur, of course.

Best,
Oliver

Einstein@Home Project

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117829201739
RAC: 34754396

Thanks for attending to this promptly.

Here is an example of somebody noticing a canceled WU. I've wondered why there always seems to be one apparently 'good' result in a long list of 'validate errors' when a WU is bad. I guess the answer is obvious: the validator is called when the second result arrives and checks that one first. When it discards the second result, it never gets around to actually checking the contents of the first result. Something like that, I imagine.
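The guess above can be sketched as a function with an early return. Everything here is hypothetical: the names, the states, and the ordering itself, which is speculation in this post rather than documented behavior of the Einstein@Home validator. The sketch only shows how one never-examined result could be left looking 'good' among a run of validate errors.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    is_bad: bool               # hypothetical stand-in for the real content check
    state: str = "completed"   # state before the validator has looked at it


def validate_pair(first: Result, newcomer: Result) -> None:
    """Guessed order of work: check the newly arrived result first and stop
    as soon as it fails, so the first result is never examined."""
    if newcomer.is_bad:
        newcomer.state = "validate error"
        return  # early exit: first.state is left unchanged
    # Only reached when the newcomer passes (cross-comparison omitted for brevity).
    first.state = "validate error" if first.is_bad else "valid"
    newcomer.state = "valid"


# A bad WU: every result is bad, but only the newcomers ever get checked.
first = Result("result_1", is_bad=True)
later = [Result(f"result_{i}", is_bad=True) for i in range(2, 6)]
for r in later:
    validate_pair(first, r)
print(first.state)               # completed
print([r.state for r in later])  # four 'validate error' entries
```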

Cheers,
Gary.

Billy
Joined: 2 Jun 06
Posts: 30
Credit: 3514004
RAC: 0

Some errors on my Intel Core Duo iMac 269029992 and 269030200.

Both have the same error code.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117829201739
RAC: 34754396

RE: Some errors on my Intel

Quote:

Some errors on my Intel Core Duo iMac 269029992 and 269030200.

Both have the same error code.


Both have the same reason why the validator is unhappy:

Validate error [6] (00000010)
- result file has entries that aren't numbers

I have quite a few iMacs (not to mention all my Linux hosts) getting a number of these too. The Devs are aware of this and are trying to work out why this is happening. Some validate errors are the result of hardware problems at the client end. There are too many of these on Linux and OS X to be just due to hardware problems.

Unfortunately, a lot of Dev time is going into the test project so that the new apps (OpenCL and the new GW app) will be ready when needed. Our little problem is probably on the back burner. I'm sure it won't be forgotten, and I'm sure we'll all be informed when progress is ultimately made. We just need to be patient.

Cheers,
Gary.

Nigel Garvey
Joined: 4 Oct 10
Posts: 51
Credit: 33238871
RAC: 91697

RE: Just like some previous

Quote:

Just like some previous ones of yours, the more detailed info about that one is

Validate error [6] (00000010)
- result file has entries that aren't numbers

Thanks, Gary. But what does not being a number mean? Accompanied by the wrong class code? Not recognised because of wrong precision or wrong-endedness? Not representing digits from a particular number base in a particular text encoding?

NG

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117829201739
RAC: 34754396

RE: ... what does not being

Quote:
... what does not being a number mean?


There is a separate error message when a value is outside an acceptable range so it's not just a matter of precision.

I would assume that certain fields in the result file simply contained values that were not numeric. I'm sure that I was told that the result file isn't human readable, although I don't think I've actually gone to the trouble to check this for myself. I have no information about what a bad value looks like but I assume that it could be just garbage as a consequence of a bug deep in the app code or in a math library (or wherever) that gets triggered with certain data values being processed. It's obviously not something that can be diagnosed easily.
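To make the distinction between the two error messages concrete, here is a minimal sketch of a check that separates "not a number" from "out of range". It assumes a plain-text result file already split into string entries, which may not match the real (possibly not human-readable) format; the bounds and function names are made up for illustration and are not the project's validator.

```python
import math

# Hypothetical bounds for illustration; the real limits are not given in the thread.
LOWER, UPPER = -1.0e6, 1.0e6


def classify_entry(text: str) -> str:
    """Classify one entry as 'ok', 'not a number' or 'out of range'."""
    try:
        value = float(text)
    except ValueError:
        return "not a number"
    if math.isnan(value):
        return "not a number"   # float() accepts 'nan', so reject it explicitly
    if not LOWER <= value <= UPPER:
        return "out of range"   # also catches 'inf' and '-inf'
    return "ok"


def check_result(entries: list[str]) -> list[tuple[int, str]]:
    """Return (index, problem) for every entry that fails either check."""
    problems = []
    for i, text in enumerate(entries):
        status = classify_entry(text)
        if status != "ok":
            problems.append((i, status))
    return problems


print(check_result(["0.5", "1.2e3", "3.1x7", "nan", "1e9"]))
# [(2, 'not a number'), (3, 'not a number'), (4, 'out of range')]
```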

Cheers,
Gary.

bahndamm_net-boincers
Joined: 11 Feb 05
Posts: 14
Credit: 11968816
RAC: 788

Out of the last 8 Gamma ray WU on my Linux boxes (1 C7 and 1 Athlon X2 4850e), only one (on the AMD) has been positively validated.

The next one that's just failed is: 268539068
with "Validate error (2:00000010)"

The C7 has another one in the pipe to be finished tomorrow. 2 Grav wave S6 tasks were successful in the last 5 days.

I'll finish the last Gamma ray on the C7 and the one in work on the Athlon, and then I'll postpone Gamma ray until there's an update available that's supposed to improve this.

Sorry, but work that nearly always comes out with an error doesn't move the project forward, I think.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117829201739
RAC: 34754396

RE: ... The next one that's

Quote:
... The next one that's just failed is: 268539068
with "Validate error (2:00000010)"


The full error message is

Validate error [6] (00000010)
- result file has entries that aren't numbers

which is quite common amongst these errors.

Quote:
I'll finish the last Gamma ray on the C7 and the one in work on the Athlon, and then I'll postpone Gamma ray until there's an update available that's supposed to improve this.


For anyone particularly troubled by this ongoing problem, turning off FGRP tasks in your preferences would be a sensible course of action. There's no similar issue with GW tasks.

Cheers,
Gary.
