I recently replaced an NVIDIA GTX 650 Ti with an NVIDIA GK104 and am now noticing several invalidates on computer #9715636.
On one job I am seeing: Outcome: Validate error (2:00000010)
[EDIT] I noticed the following at the end of many jobs:
[02:42:21][12845][INFO ] Statistics: count dirty SumSpec pages 3944 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 1100505
[02:42:21][12845][INFO ] Data processing finished successfully!
02:42:21 (12845): called boinc_finish
I had noticed that my daily performance was not what I expected it to be, so I started to look around.
Any ideas?
Copyright © 2024 Einstein@Home. All rights reserved.
noticed a large number of invalidates after replacing a GTX 650T
I believe I found MY problem. I was running with the NVIDIA 304 drivers on this node. I upgraded to the 331 drivers, and now the E@H "computers view" no longer IDs the GPU as a GK104 but as a GTX 760. When I installed this card I checked which card E@H saw, found a GK104, and thought that was correct. I guess not.
I will follow the jobs and see if the invalidates cease.
[EDIT] Looks like I have lost 5~6 days of work. All jobs since the swapout seem to have been "invalid". Ouch!
I'm glad you found the
I'm glad you found the problem! A few of your WUs from the time frame in question validated nevertheless, but it's surely not an exercise to repeat.
It's a bit strange that your GTX 650 Ti with GK106 would work with the old driver, while the new card would work as far as "not giving straight errors" but not quite correctly. The new one is based on GK104, which was released before GK106. It's the exact same old chip!
Which is not to say your new card wouldn't be nice for what it is, just that I can't see a reason for the driver to behave like that. Oh well, 304 is really old by now.
MrS
Scanning for our furry friends since Jan 2002
RE: I'm glad you found the
Agree. I noticed this morning that two more jobs "invalidated". What is interesting is that the "output" on both of these jobs indicates at the start that the GPU is a GK104 with driver version xxxxx, and then at the mid point in processing the GPU is a GTX 760 with driver version yyyyy. I am assuming/hoping that these two jobs got "caught in the middle" of the driver change out and this is why they invalidated.
It was/is strange that E@H/BOINC would see the new unit as a GK104 with the old driver, but with the newer driver see it as a GTX 760. Not sure how that works.
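A quick way to spot a task that got "caught in the middle" like this is to check whether its output names more than one GPU. The snippet below is only a rough sketch: the function names are made up for illustration, it assumes the task output has been saved as plain text, and the sample text in the usage note is invented rather than copied from a real task.

```python
def labels_in_order(text, labels):
    """Return the labels that occur in text, ordered by first appearance."""
    found = []
    for label in labels:
        idx = text.find(label)
        if idx != -1:
            found.append((idx, label))
    # Sort by position so the result reflects the order seen in the output.
    return [label for _, label in sorted(found)]


def changed_mid_task(text, labels):
    """True if the output mentions more than one of the given GPU labels."""
    return len(labels_in_order(text, labels)) > 1
```

For example, `changed_mid_task(output_text, ["GK104", "GTX 760"])` would be true for a task whose output starts with one name and later shows the other, and false for a task that only ever reports a single name.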
It seems as though a driver
It seems as though a driver upgrade has not fixed my "invalidate" problems. More jobs from today are showing up in the "invalidate" queue.
Should I do a "reset" on this project on this node? I am sure that something got twisted in upgrading the GPU hardware but have not got a clue as to what might be causing this problem.
RE: RE: I'm glad you
Well, BOINC asks the driver what cards it can find, and the driver asks the cards what sort of card they are. Then BOINC gets the answer it needs from the driver and passes that on to Einstein.
With the 304 driver dating back to June/July 2012 (depending on sub-version), and the GTX 760 not released until May/June 2013, I'm not surprised the driver couldn't properly interpret the answers it was getting from the card.
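To see what the driver itself reports, independent of BOINC, one option is to ask `nvidia-smi` for the card name and driver version. This is a minimal sketch, assuming `nvidia-smi` is on the PATH and that the installed driver supports the `--query-gpu` fields for this card (older drivers on GeForce cards may not). The helper names `parse_gpu_csv` and `query_gpus` are made up for this example.

```python
import subprocess


def parse_gpu_csv(output):
    """Parse 'name, driver_version' CSV lines into (name, driver) tuples."""
    gpus = []
    for line in output.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so the two fields separate cleanly.
        name, _, driver = line.rpartition(",")
        gpus.append((name.strip(), driver.strip()))
    return gpus


def query_gpus():
    """Ask nvidia-smi which GPUs the driver sees (assumes it is installed)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Calling `query_gpus()` before and after a driver change would show whether the name the driver hands to BOINC has changed.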
RE: It seems as though a
You can try a project reset, but I doubt it's going to work. The same executables for nVidia will be downloaded again.
I think these possible reasons are the most likely:
1. Your driver installation might somehow be broken. I guess you know better than I do how to try to fix this under Linux.
2. The card could be defective. You could test some other GPU project, preferably starting with a reliable and simple one like Collatz. If this works, try GPU-Grid. It can be picky, but it is also quite a stress test and has a Linux app for sure.
3. There's a chance the PSU might not be sufficient for the new GPU (not knowing the model). Then I wouldn't expect silent calculation errors but rather the PC shutting down. Or, if the PSU is old and pushed borderline hard, it could drop voltages too low under heavy load. You could test for this by suspending CPU work, which should save about 100 W of power draw.
MrS
Scanning for our furry friends since Jan 2002
RE: RE: It seems as
I removed the 331 drivers and reinstalled. Again, more invalidated WUs. I shut the unit down, pulled the 760, and replaced it with a 650 Ti. I reinstalled the drivers once more and am waiting on the first 3 jobs to complete.
If the 650 begins processing WUs cleanly, then I will suspect the card. I have a Win7 box that I could move the 760 to and test it there.
I checked the specs on the power supply and it seems more than adequate. I had run two 650 Tis on this box many months ago, before reconfiguring to my current setup, and it did not have a problem. Of course that was a few months ago and....
I really hope it's not the card but a driver issue. If the 650 fails, then it most definitely indicates a driver issue. I would then install the driver from the NVIDIA site, just in case my PPA setup has an issue. If that were to fix the problem, I would then reinstall the 760 and try again.
It seems the "invalidate"
It seems the "invalidate" problem experienced with both the GTX 760 and GTX 650 is gone. I have completed 4~5 GPU WUs with the GTX 650 and none so far have "invalidated". I will burn off these WUs and replug the 760, on the off chance that my reinstall of the NVIDIA drivers fixed the original problem.