Validate Errors

Firebird
Joined: 10 Mar 05
Posts: 7
Credit: 90397577
RAC: 0
Topic 194734

I just changed my video card from a 220gt to a 9800gt, and now every workunit is showing a validate error. The driver is version 195.62. Any ideas on how to fix it?

rroonnaalldd
Joined: 12 Dec 05
Posts: 116
Credit: 537221
RAC: 0

Validate Errors

Someone at Folding@home has released an NVIDIA GPU memory checker. The results were (and are) an eye-opener for NVIDIA, and Fermi-based cards come with ECC memory...
This problem is not limited to NVIDIA; AMD/ATI is affected too.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 690848072
RAC: 270706

Most likely, the validation problems are not caused by a hardware problem, but by problems that surfaced in the transition phase from the ABP1 to the ABP2 searches.

Details can be found here, for example.

New GPU apps have been distributed automatically, so the problem should be resolved in a few days, after all new results are computed with the new apps.

CU
HB

Firebird
Joined: 10 Mar 05
Posts: 7
Credit: 90397577
RAC: 0

RE: Most likely, the

Message 96638 in response to message 96637

Quote:

Most likely, the validation problems are not caused by a hardware problem, but by problems that surfaced in the transition phase from the ABP1 to the ABP2 searches.

Details can be found here, for example.

New GPU apps have been distributed automatically, so the problem should be resolved in a few days, after all new results are computed with the new apps.

CU
HB

Well, that got my hopes up, so I put the card in a different computer. Now it's erroring everything it runs. The workunits run normally and everything looks fine until they validate, or rather don't.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 690848072
RAC: 270706

I think you were unlucky enough to still catch some of those transition workunits that had some of the flawed initial ABP2 apps as "wingmen".

All the recent results (lots of them) here http://einsteinathome.org/host/2263082/tasks seem to validate just fine, and with good performance.

CU
HB

Firebird
Joined: 10 Mar 05
Posts: 7
Credit: 90397577
RAC: 0

Those are after I switched the card back to the 220gt. The 9800 gt went in a 3.0 gig pentium 4 box, then that one started running junk. I detached it from einstein, it's trying seti now. We'll see how that works for it. The card for sure doesn't run einstein though.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 690848072
RAC: 270706

RE: Those are after I

Message 96641 in response to message 96640

Quote:
Those are after I switched the card back to the 220gt. The 9800 gt went in a 3.0 gig pentium 4 box, then that one started running junk. I detached it from einstein, it's trying seti now. We'll see how that works for it. The card for sure doesn't run einstein though.

It might be the individual card that has a problem; there's no fundamental problem with the 9800 GT, as I have one myself and it's doing fine.

Again, the validation errors I could find for your host were with workunits that had earlier ABP2 hosts as wingmen, and you can see that even some of your 220GT crunched units had a validation problem with them:

e.g. http://einsteinathome.org/task/155515818

I suspect it was a coincidence that you switched cards at almost the exact moment when your host ran out of the "problematic" workunits. I'm quite sure your 9800 GT will be doing great on E@H.

CU
HB

Firebird
Joined: 10 Mar 05
Posts: 7
Credit: 90397577
RAC: 0

RE: RE: Those are after I

Message 96642 in response to message 96641

Quote:
Quote:
Those are after I switched the card back to the 220gt. The 9800 gt went in a 3.0 gig pentium 4 box, then that one started running junk. I detached it from einstein, it's trying seti now. We'll see how that works for it. The card for sure doesn't run einstein though.

It might be the individual card that has a problem; there's no fundamental problem with the 9800 GT, as I have one myself and it's doing fine.

Again, the validation errors I could find for your host were with workunits that had earlier ABP2 hosts as wingmen, and you can see that even some of your 220GT crunched units had a validation problem with them:

e.g. http://einsteinathome.org/task/155515818

I suspect it was a coincidence that you switched cards at almost the exact moment when your host ran out of the "problematic" workunits. I'm quite sure your 9800 GT will be doing great on E@H.

That one started with the 220, then switched to 9800 in the middle.

[19:43:43][4552][INFO ] Using CUDA device #0 "GeForce 9800 GT" (462.00 GFLOPS)

CU
HB
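For anyone who wants to check which card actually handled a task, the "Using CUDA device" line quoted above can be pulled out of a task's stderr text automatically. The following is only a minimal sketch: the pattern is inferred from that single quoted line, and the function name `parse_device_line` is made up for illustration, so other app versions may print the line differently.

```python
import re

# Pattern inferred from the one log line quoted in this thread (an assumption,
# not a documented format): [time][number][LEVEL ] Using CUDA device #N "name" (X GFLOPS)
DEVICE_LINE = re.compile(
    r'\[(?P<time>\d{2}:\d{2}:\d{2})\]'
    r'\[\d+\]'
    r'\[(?P<level>\w+)\s*\]\s+'
    r'Using CUDA device #(?P<index>\d+) '
    r'"(?P<name>[^"]+)" '
    r'\((?P<gflops>[\d.]+) GFLOPS\)'
)


def parse_device_line(line):
    """Return a dict describing the GPU named in the line, or None if no match."""
    match = DEVICE_LINE.search(line)
    if match is None:
        return None
    return {
        "time": match.group("time"),
        "index": int(match.group("index")),
        "name": match.group("name"),
        "gflops": float(match.group("gflops")),
    }


sample = '[19:43:43][4552][INFO ] Using CUDA device #0 "GeForce 9800 GT" (462.00 GFLOPS)'
info = parse_device_line(sample)
print(info["name"], info["gflops"])
```

Running this over each line of a task's stderr output and keeping the last match shows the card in use when the task finished, which is useful for a task that was started on one card and completed on another.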


Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 690848072
RAC: 270706

RE: RE: RE: Those are

Message 96643 in response to message 96642

Quote:
Quote:
Quote:
Those are after I switched the card back to the 220gt. The 9800 gt went in a 3.0 gig pentium 4 box, then that one started running junk. I detached it from einstein, it's trying seti now. We'll see how that works for it. The card for sure doesn't run einstein though.

It might be the individual card that has a problem; there's no fundamental problem with the 9800 GT, as I have one myself and it's doing fine.

Again, the validation errors I could find for your host were with workunits that had earlier ABP2 hosts as wingmen, and you can see that even some of your 220GT crunched units had a validation problem with them:

e.g. http://einsteinathome.org/task/155515818

I suspect it was a coincidence that you switched cards at almost the exact moment when your host ran out of the "problematic" workunits. I'm quite sure your 9800 GT will be doing great on E@H.

That one started with the 220, then switched to 9800 in the middle.

[19:43:43][4552][INFO ] Using CUDA device #0 "GeForce 9800 GT" (462.00 GFLOPS)

CU
HB


OK, I see, but it's still one of those units that had a high initial replication and were practically bound to fail for some wingmen:

http://einsteinathome.org/workunit/65820105

CU
HB

Firebird
Joined: 10 Mar 05
Posts: 7
Credit: 90397577
RAC: 0

I'm sure it's just that particular card, not 9800 in general. For some reason though it REALLY doesn't like this project. It's running Seti fine, everything validating OK there.
FWIW,
http://www.bfgtech.com/bfgr981024gtge.aspx
That's the exact card in question.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5849
Credit: 110016054341
RAC: 23137230

RE: The 9800 gt went in a

Message 96645 in response to message 96640

Quote:
The 9800 gt went in a 3.0 gig pentium 4 box, then that one started running junk.


I'm not sure what you mean by "running junk" but I've had a good look at your 3.0GHz Pentium 4 host with the 9800GT and it is showing 6 recently completed tasks, 2 with the outcome of 'success' and 4 with the outcome of 'validate error'. These 4 are not necessarily 'invalid' - 'validate error' means that there was a problem on the server when the validator tried to perform a validation on the 2 members of the quorum. It is possible for this error to be caused by the files not being available on the server in the expected location when the validator tried to do the validation. Check out the explanatory link under the 'outcome' heading.

I'm only guessing here but the ABP2 task load is having a noticeable effect on server performance. In a day there are many short periods where the server response is woeful. The thought occurred to me that you might be trying to 'return results immediately'. If so, the 'reporting' phase might be happening too quickly after the 'uploading' phase and this with a bogged down server might explain what you are seeing. So, are you trying to 'return results immediately'?

Quote:
I detached it from einstein, it's trying seti now. We'll see how that works for it. The card for sure doesn't run einstein though.


You say you have detached from E@H. That host's task list shows a swag of tasks which are still assigned to it. Why don't you give it another try with NNT (no new tasks) set, so the host will crunch and upload the results but won't be asking for new work and so won't be trying to report immediately? Once you have a few done, try reporting them well after they upload and see if that gets rid of the 'validate error'.

Cheers,
Gary.
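If you prefer the command line to the BOINC Manager buttons, the NNT-then-report-later routine above can be sketched with `boinccmd`, whose per-project operations include `nomorework`, `allowmorework` and `update`. This is only a rough sketch under assumptions: `PROJECT_URL` is a placeholder and must be replaced with the exact URL your client shows for the project, the helper names are made up for illustration, and `run_boinccmd` needs `boinccmd` on the PATH with a running client.

```python
import subprocess

# Placeholder (assumption): use the exact project URL your client is attached to.
PROJECT_URL = "http://einsteinathome.org/"

# Per-project boinccmd operations used in this sketch.
ALLOWED_OPERATIONS = {"nomorework", "allowmorework", "update"}


def boinccmd_args(operation, project_url=PROJECT_URL):
    """Build the argument list for one per-project boinccmd operation."""
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError("unsupported operation: " + operation)
    return ["boinccmd", "--project", project_url, operation]


def run_boinccmd(operation, project_url=PROJECT_URL):
    """Run the command for real; not called below, so nothing is changed by this sketch."""
    return subprocess.run(boinccmd_args(operation, project_url), check=True)


# Step 1: set NNT so the host finishes and uploads what it has.
print(" ".join(boinccmd_args("nomorework")))
# Step 2: well after the uploads are done, contact the project to report.
print(" ".join(boinccmd_args("update")))
```

The script only prints the two commands. Call `run_boinccmd("nomorework")` first, and `run_boinccmd("update")` some time after the uploads have completed, to actually apply them.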
