In the last 2 days computer 12193582 has had 23 GPU WUs with validate errors. It is mostly the WUs from one video card that are failing. I can't find anything wrong with the computer setup or the StdErr file. Could someone look and see why all these WUs won't validate?
Copyright © 2024 Einstein@Home. All rights reserved.
23 Validate Errors in last 2 days -- New O/S installation on a r
You're getting "Validate errors" on all the failed tasks.
When a result is uploaded to the servers, the validator does a preliminary check on the file to make sure it's formatted in the right way and contains valid data. If not, it throws a validate error.
Usually a "validate error" is a good indicator of a problem with the hardware or the settings for the affected machine.
First of all, reboot the machine, preferably powering it down fully to clear the RAM and GPU RAM. Then check temps and voltages under load to make sure they are within specs. If overclocked, go back to stock settings. If running multiple GPU work units at once, go back to one at a time.
For further help post more info about temps, voltages, frequencies, number of tasks run at a time and so on.
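If it helps with collecting those numbers, here is a rough Python sketch for logging per-card readings while the GPUs are under load. It assumes NVIDIA cards with `nvidia-smi` on the PATH, which nobody in this thread has confirmed. The 80 °C limit is only a placeholder, so check the spec for your own cards. Note that `nvidia-smi` does not report voltages, so those need another tool.

```python
import subprocess

# Fields requested from nvidia-smi (assumes NVIDIA hardware and drivers).
QUERY = "index,temperature.gpu,clocks.sm,power.draw"


def to_float(field):
    """Convert a CSV field to float; return None for values like '[N/A]'."""
    try:
        return float(field)
    except ValueError:
        return None


def parse(csv_text):
    """Parse 'csv,noheader,nounits' output into one dict per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        idx, temp, clock, power = [f.strip() for f in line.split(",")]
        rows.append({
            "index": int(idx),
            "temp_c": to_float(temp),
            "sm_mhz": to_float(clock),
            "power_w": to_float(power),
        })
    return rows


def read_gpus():
    """Run nvidia-smi once and return the parsed readings."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse(out)


def too_hot(rows, limit_c=80.0):
    """Return indices of GPUs above limit_c (placeholder threshold)."""
    return [r["index"] for r in rows
            if r["temp_c"] is not None and r["temp_c"] > limit_c]
```

Calling `read_gpus()` every few seconds while tasks are crunching, and posting the readings for each card index, would give people here something concrete to compare.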
RE: ... It is mostly the
I had a quick look at some of the stderr.txt outputs for some of these errors. There's a line that gives device number and I saw regular errors for both device #0 and #1 but not #2. I looked at some valid results and saw examples of all three devices giving acceptable answers.
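That sort of eyeballing can be turned into a tally. Below is a minimal Python sketch that counts valid and invalid results per device from saved stderr text. The regular expression is an assumption about how the device line looks (the word "device" followed by a number); adjust it to match the real output. The `(stderr_text, is_valid)` pairs are a hypothetical input format that you would have to assemble yourself from the task pages.

```python
import re
from collections import Counter

# Assumed pattern: "device" followed by an optional '#' and a number.
# The real stderr.txt wording may differ, so adjust as needed.
DEVICE_RE = re.compile(r"device\s*#?\s*(\d+)", re.IGNORECASE)


def device_of(stderr_text):
    """Return the first device number found in the text, or None."""
    match = DEVICE_RE.search(stderr_text)
    return int(match.group(1)) if match else None


def tally(results):
    """Count outcomes per device.

    results: iterable of (stderr_text, is_valid) pairs (hypothetical format).
    Returns {device_number: Counter({'valid': n, 'invalid': m})}.
    """
    counts = {}
    for text, is_valid in results:
        bucket = counts.setdefault(device_of(text), Counter())
        bucket["valid" if is_valid else "invalid"] += 1
    return counts
```

A device whose "invalid" count stands well above the others is the obvious first candidate to pull or swap.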
When you say "refurbished", what does that actually mean? If you have replaced failed components (e.g. swollen capacitors), maybe you have missed one. If it's just a 'cleanup and dustoff', I wouldn't imagine that would be of concern.
That's not really possible since, as Holmis explains, all you would see is something that doesn't comply with the proper format of a result that is able to be tested against another properly formatted result. There's no way to diagnose why your result can't pass a basic sanity check. All that can really be said is that it's most likely there is something not quite right with your machine. Something is a little outside its comfort zone.
The classic things to check are operating frequencies, temperatures, sufficiency and quality of power and quality of RAM. Swapping components (if possible) is a good way to narrow things down. Since you are running three similar GPUs, try removing one temporarily and see if it makes any difference.
Good luck with hunting the problem down.
Cheers,
Gary.