Validate error - What this really means!

Nigel Garvey
Joined: 4 Oct 10
Posts: 50
Credit: 18025640
RAC: 56543

Two today:

262704444
262862245

NG


Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109965389174
RAC: 30700438

RE: Two

Quote:

Two today:

262704444
262862245

NG


Both of them have the same error message

Validate error [6] (00000010)
- result file has entries that aren't numbers
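For anyone who wants to look at a suspect result file by hand, the check that message describes can be sketched roughly as below. This is only an illustration, not the project's actual validator: the thread does not show the result file format, so the sketch assumes plain text with whitespace-separated values, and the function name `non_numeric_entries` is made up here. It also treats NaN and infinity as "not numbers", which is an assumption about what the validator rejects.

```python
import math


def non_numeric_entries(text):
    """Return (line_number, token) pairs for tokens that are not finite numbers.

    Assumes whitespace-separated plain-text values; the real result
    file layout is not given in this thread.
    """
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for token in line.split():
            try:
                value = float(token)
            except ValueError:
                # Token cannot be parsed as a number at all.
                bad.append((lineno, token))
                continue
            if not math.isfinite(value):
                # Parses as a float, but is NaN or +/- infinity.
                bad.append((lineno, token))
    return bad


if __name__ == "__main__":
    sample = "1.0 2.5\n3 nan\nabc 4"  # hypothetical file contents
    for lineno, token in non_numeric_entries(sample):
        print(f"line {lineno}: {token!r} is not a finite number")
```

An empty return value would mean every entry parsed as a finite number under these assumptions.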


Cheers,
Gary.

[AF>FAH-Addict.net]toTOW
Joined: 9 Oct 10
Posts: 6
Credit: 10596151
RAC: 0

I hope someone will be able to enlighten me about what is happening: I have two identical GPUs. One fails to validate every WU, while the other is perfectly fine.

I've tested the failing card with other DC projects as well as MemtestG80 and MemtestCL, and I didn't find any errors.

So here is the list of the tasks with validate error:

266671780
266684326
266690957
266704182
266707576
266723178
266728665
266732827
266744893
266747992
266837405

joe areeda
Joined: 13 Dec 10
Posts: 285
Credit: 320378898
RAC: 0

RE: I hope someone will be

Quote:

I hope someone will be able to enlighten me about what is happening: I have two identical GPUs. One fails to validate every WU, while the other is perfectly fine.

I've tested the failing card with other DC projects as well as MemtestG80 and MemtestCL, and I didn't find any errors.

So here is the list of the tasks with validate error:

266671780
266684326
266690957
266704182
266707576
266723178
266728665
266732827
266744893
266747992
266837405


I took a quick look, and by my count that computer has 11 validate errors and 13 validated results. Are both GPUs in the same system?

I am not an expert here, just curious.

Joe

The Xorcist
Joined: 16 Aug 11
Posts: 16
Credit: 464281554
RAC: 0

I've noticed a lot more of these validate errors in the last 3 months and, like Joe, I'm no expert, just curious. They don't seem to be CPU- or GPU-specific, or am I wrong?

X

[AF>FAH-Addict.net]toTOW
Joined: 9 Oct 10
Posts: 6
Credit: 10596151
RAC: 0

RE: I took a quick look and

Quote:

I took a quick look, and by my count that computer has 11 validate errors and 13 validated results. Are both GPUs in the same system?

I am not an expert here, just curious.

Joe

Indeed, both GPUs are in the same system. The valid results are from GPU 1 and the CPU, and the invalid results are all from GPU 0.
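The per-device breakdown can be reproduced with a small tally, sketched here. This is an illustration only: the helper names `tally_by_device` and `suspect_devices`, the 0.5 threshold, and the `(device, outcome)` record layout are all made up, and the device for each task has to be read off the task pages by hand since nothing in this thread exposes it programmatically. In the usage lines, the 11 errors on GPU 0 match the counts above, but the 8/5 split of the 13 valid results between GPU 1 and the CPU is invented for the example.

```python
from collections import Counter, defaultdict


def tally_by_device(records):
    """Count outcomes per device from (device, outcome) pairs."""
    tally = defaultdict(Counter)
    for device, outcome in records:
        tally[device][outcome] += 1
    return {device: dict(counts) for device, counts in tally.items()}


def suspect_devices(tally, threshold=0.5):
    """Return devices whose share of 'validate error' outcomes exceeds threshold."""
    suspects = []
    for device, counts in tally.items():
        total = sum(counts.values())
        if total and counts.get("validate error", 0) / total > threshold:
            suspects.append(device)
    return sorted(suspects)


if __name__ == "__main__":
    # Hypothetical records: only the totals (11 errors, 13 valid) come from the thread.
    records = (
        [("GPU 0", "validate error")] * 11
        + [("GPU 1", "valid")] * 8
        + [("CPU", "valid")] * 5
    )
    tally = tally_by_device(records)
    print(tally)
    print(suspect_devices(tally))
```

A device that collects all of the errors while its twin collects none stands out immediately in such a tally.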

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109965389174
RAC: 30700438

RE: Ive noticed a lot more

Quote:
I've noticed a lot more of these validate errors in the last 3 months and, like Joe, I'm no expert, just curious. They don't seem to be CPU- or GPU-specific, or am I wrong?


There's quite a lot of detailed information on error rates for the various platforms and science apps in the opening posts of this thread. These errors have been around for quite some time (much longer than 3 months) and are being investigated. They seem to be OS-specific: Windows hosts have much lower error rates.

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109965389174
RAC: 30700438

RE: Indeed, valid results

Quote:
Indeed, valid results are from GPU 1 and the CPU, and the invalid results are all from GPU 0.


Which fairly strongly implies that your GPU 0 has recently developed some sort of hardware problem and needs to be checked out.

Cheers,
Gary.

[AF>FAH-Addict.net]toTOW
Joined: 9 Oct 10
Posts: 6
Credit: 10596151
RAC: 0

Unfortunately, it doesn't fail in any other application I tried (MemtestCL, MemtestG80, SETI, Folding).

And there is no output in E@H stderr that could help to find out what is wrong :(
