G'day all
Recently I swapped out one of my older machines for a newer 975XBX2 with a quad core and a newer (but still oldish) GPU. I've noticed, however, that my RAC has dropped off quite a lot.
Am I missing something? That machine, like all the others, is still happily crunching away. None of the temps are excessive, and core utilisation is fine.
I did update the Nvidia drivers, so that's the only thing I changed other than adding faster hardware.
Has Nvidia borked their drivers again? Or has E@H changed credit amounts?
Any ideas?
Cheers
Darren
Copyright © 2024 Einstein@Home. All rights reserved.
RAC drop?
No
I have two GTX460 hosts. They will run happily for weeks at a time, then suddenly find a way to drop to much lower productivity. The only consistent symptoms are lower power consumption (the model of UPS I use on both hosts happens to support a decent onscreen power monitor), lower than expected GPU temperature, and suddenly much longer completion times.
Sometimes GPU clock speed monitoring tools report a major downclock (i.e. to 100 MHz), but other times they report normal clock rate and yet the GPU productivity, as given directly by completion times, and indirectly by clock rate and temperature, is impaired.
Once when I had a particularly bad case of this it got drastically better when I updated to the Nvidia driver version I'm currently on (306.97), but I have had at least a case or two since. Sometimes when I'm in that state rebooting the PC fixes things right up, sometimes not. Sometimes just starting up a different monitoring program seems to have tipped it back to normal behavior (I think I've seen cases of this both from OC Guru II and from SIV64X), but for me nothing has been a consistent fix, and the different monitoring tools don't always report the system to be in a problem state.
None of which may be the trouble you have, but if you have not already done so, it might be good to occasionally review your completion time distribution, and watch for changes.
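One rough way to watch for such a change is to compare the median run time of the most recent tasks against the earlier ones. This is only a sketch: the function name, the `recent_n` window and the sample numbers are all made up for illustration, and the run times would have to be copied from the host's task list by hand or by some other means.

```python
from statistics import median

def completion_time_shift(times_sec, recent_n=20):
    """Ratio of the recent median run time to the earlier median run time.

    times_sec: task run times in seconds, oldest first (hypothetical input).
    A ratio well above 1.0 suggests tasks have started taking longer.
    """
    if len(times_sec) < 2 * recent_n:
        raise ValueError("need at least 2 * recent_n samples")
    earlier = times_sec[:-recent_n]   # everything before the recent window
    recent = times_sec[-recent_n:]    # the last recent_n tasks
    return median(recent) / median(earlier)

# Made-up history: 40 tasks near 3000 s, then 20 tasks near 6000 s.
history = [3000 + (i % 5) * 10 for i in range(40)]
history += [6000 + (i % 5) * 10 for i in range(20)]
ratio = completion_time_shift(history)
print(f"recent/earlier median ratio: {ratio:.2f}")
```

The median is used rather than the mean so that one or two odd tasks don't swamp the comparison.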
Good luck, and let us know if you figure out something.
Decent thought there.
I went through them and checked, but completion times haven't really changed.
Might revert to older drivers and see what happens...
I've dropped about 20% off RAC over the last couple of weeks, even though the new hardware is a fair bit more powerful.
Have you noticed all the validate errors on your i7-860 box? Over time, this would have quite a negative effect on your RAC.
I would guess that something in that setup is going past its limits. I wouldn't think it was anything to do with drivers. Time to check all the usual stuff :-). Good luck with the hunt!
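As a back-of-envelope illustration of how failed validation eats into credit, the sketch below counts the share of finished tasks that validated. It assumes every task is worth the same credit and that anything that fails validation earns nothing; the outcome labels and the sample counts are invented for the example, not taken from any host.

```python
from collections import Counter

def credit_retained(outcomes):
    """Fraction of finished tasks that validated.

    outcomes: iterable of labels "valid", "invalid" or "validate_error"
    (hypothetical labels chosen for this sketch). Returns None if empty.
    """
    counts = Counter(outcomes)
    finished = counts["valid"] + counts["invalid"] + counts["validate_error"]
    if finished == 0:
        return None
    return counts["valid"] / finished

# Made-up sample: 80 good tasks, 15 validate errors, 5 invalids.
sample = ["valid"] * 80 + ["validate_error"] * 15 + ["invalid"] * 5
print(f"share of tasks earning credit: {credit_retained(sample):.0%}")
```

Because RAC is a decaying average, a sustained loss like this only shows up gradually.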
Cheers,
Gary.
Hmm, didn't notice that...
Overheating? Temps say they're OK, but would that cause this issue?
Edit: temps are all in the low to mid 70s for both GPUs and the CPU. The only recent change to that machine was swapping one of the 560s for a 660 Ti.
Edit 2: One of the other machines is also getting lots of invalidate errors http://einsteinathome.org/host/5022192/tasks&offset=0&show_names=1&state=4&appid=0
Temps on that one are waaaaay too high, in the mid to high 80s. Will have to add some fans or repaste the GPUs tonight...
Oh oops, you'd seen that already. ;-)
Anyway, validate errors are different from tasks not validating. For validate errors, see Gary's first post of this thread.
Tasks not validating can happen due to dust, heat, overclocking, bad memory, bad capacitors, bad karma. ;-)
Yes, it seemed that whole machine's output, and a lot of the WUs on the other one, were throwing those errors.
I repasted both GPUs on the worst machine and added another 12cm fan in a suitable spot to try for more airflow, and the temps have come down to more acceptable levels.
Will let it run a bit and see how it goes. If it keeps erroring the work units I'll try to find a case with better airflow (or have some waterblocks machined up lol).
Thanks for the replies and for pointing me in the right place to look for solutions.
If you don't already, think about using a program like EVGA Precision or MSI Afterburner to raise the fan speed at lower temperatures. That will do a lot to keep your GPUs cooler, even more than any extra case fan (which is always needed on a crunching host anyway).
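To show what "more fan at lower temperatures" means in practice, here is a small sketch of a piecewise-linear fan curve. It is not how either tool works internally; the function name and the curve points (40% at 40 °C, 70% at 60 °C, 100% at 75 °C) are assumptions picked purely for illustration.

```python
def fan_percent(temp_c, curve=((40, 40), (60, 70), (75, 100))):
    """Fan duty (%) for a GPU temperature using a piecewise-linear curve.

    curve: (temperature in deg C, fan %) points; hypothetical values.
    Below the first point or above the last, the end value is held.
    """
    points = sorted(curve)
    if temp_c <= points[0][0]:
        return points[0][1]
    if temp_c >= points[-1][0]:
        return points[-1][1]
    # Find the segment containing temp_c and interpolate linearly.
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if t0 <= temp_c <= t1:
            return f0 + (f1 - f0) * (temp_c - t0) / (t1 - t0)

for t in (35, 50, 70, 85):
    print(t, "C ->", fan_percent(t), "%")
```

Shifting the points to the left makes the fan ramp up earlier, which is the effect being suggested.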
Well, seems it might be a driver issue with the 660 Ti.
Temps are fine but the WUs all seem to error.
Found this thread about it on SETI, but I don't know if E@H has a similar issue.
http://setiathome.berkeley.edu/forum_thread.php?id=69735
Anyone else with the same card having issues?
No, that particular driver bug doesn't affect Einstein. The GTX 670 on which I researched that workaround for the SETI application runs just fine here (host 5744895).
Hmm lol square one...
I can't see how it's buggered RAM or a cap, since everything else works 100% fine: the temps are fine, games all work no worries, and it hasn't crashed for weeks...
Might have to take that machine off E@H until I can get time to swap cards around a bit and see if it's the 660 or something else. The bulk of the errors seem to have appeared after changing that GPU.
Edit: I did find a guy here using the 660 Ti without problems...
http://einsteinathome.org/host/5807458/tasks&offset=0&show_names=1&state=4&appid=0