Another "Computation Error" thread

CharlesC
Joined: 6 Nov 09
Posts: 3
Credit: 191507759
RAC: 243
Topic 195638

Hi,

I am getting a lot of failed WUs with the following stderr:

6.10.58

Cannot create a symbolic link in a registry key that already has subkeys or values. (0x3fc) - exit code 1020 (0x3fc)

Activated exception handling...
[11:00:50][6552][INFO ] Starting data processing...
[11:00:50][6552][ERROR] Couldn't initialize CUDA driver API (error: 100)!
[11:00:50][6552][ERROR] Demodulation failed (error: 1020)!
11:00:50 (6552): called boinc_finish

This does not happen with all CUDA WUs, but I seem to suddenly get batches which all fail.

Any idea what might be causing this?

My computer specs are as follows:

OS: Windows 7 Ultimate
CPU: Intel C2D E6850 3GHz
Motherboard: MSI P35 Platinum
GPU: nVidia GTX470 (1280MB RAM) - v266.58 driver
RAM: 4GB

I am not overclocking anything, and all temperatures are well within tolerances...

mikey
Joined: 22 Jan 05
Posts: 12945
Credit: 1884496578
RAC: 22457

Bad batches of units?

CharlesC
Joined: 6 Nov 09
Posts: 3
Credit: 191507759
RAC: 243

Well, possibly, I suppose... I am getting a lot of them!

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5888
Credit: 119756223408
RAC: 25676119

You can very quickly rule out problems with the tasks themselves by looking at how many of your wingmen (quorum partners) are also having problems. Nobody else seems to be having a problem with the tasks, but I didn't look at all that many. Also, it's rather strange that a small percentage of tasks are completing and validating whilst the bulk are failing.

I can't recall ever seeing that particular message before but the mention of "registry keys" seems to suggest some sort of installation issue, either in BOINC itself or the CUDA driver. If it were my machine I would stop BOINC and uninstall it. Your BOINC Data directory will be preserved and can be picked up again when you eventually reinstall. At each major step I'd be rebooting to make sure the system is "refreshed" and also to give it a chance to "complain" if it wants to. Any such complaints might give you a better handle on what the problem actually is.
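On the registry wording: 1020 in decimal is 0x3fc, and Win32 system error 1020 (ERROR_KEY_HAS_CHILDREN) has exactly the text shown in the log. So that line appears to be Windows' generic message lookup for the app's exit code (the same 1020 the app reports as "Demodulation failed (error: 1020)"), not evidence of an actual registry fault. The more telling line is the CUDA driver error 100, which in the CUDA driver API is CUDA_ERROR_NO_DEVICE. The Python sketch below decodes both from the stderr text. Its lookup tables are hand-written and cover only the two codes from this thread, and the regular expressions assume the exact log wording quoted here.

```python
import re

# Hand-written lookup tables covering only the two codes seen in this thread;
# they are illustrative, not complete references.
WIN32_MESSAGES = {
    # ERROR_KEY_HAS_CHILDREN in the Windows system error list
    1020: "Cannot create a symbolic link in a registry key that already has subkeys or values.",
}
CUDA_DRIVER_ERRORS = {
    # CUresult value returned by the CUDA driver API
    100: "CUDA_ERROR_NO_DEVICE (no CUDA-capable device detected)",
}

# Patterns assume the exact stderr wording quoted in this thread.
EXIT_RE = re.compile(r"exit code (\d+) \((0x[0-9a-fA-F]+)\)")
CUDA_RE = re.compile(r"Couldn't initialize CUDA driver API \(error: (\d+)\)")

SAMPLE_STDERR = (
    "Cannot create a symbolic link in a registry key that already has subkeys "
    "or values. (0x3fc) - exit code 1020 (0x3fc)\n"
    "[11:00:50][6552][ERROR] Couldn't initialize CUDA driver API (error: 100)!\n"
    "[11:00:50][6552][ERROR] Demodulation failed (error: 1020)!\n"
)


def decode_stderr(stderr: str) -> dict:
    """Extract the exit code and CUDA driver error from a task's stderr text."""
    info = {}
    exit_match = EXIT_RE.search(stderr)
    if exit_match:
        code = int(exit_match.group(1))
        info["exit_code"] = code
        # The decimal and hex forms in the log should describe the same number.
        info["hex_agrees"] = code == int(exit_match.group(2), 16)
        info["win32_text"] = WIN32_MESSAGES.get(code, "not in table")
    cuda_match = CUDA_RE.search(stderr)
    if cuda_match:
        cuda_code = int(cuda_match.group(1))
        info["cuda_error"] = CUDA_DRIVER_ERRORS.get(
            cuda_code, f"CUresult {cuda_code} not in table"
        )
    return info


if __name__ == "__main__":
    for key, value in decode_stderr(SAMPLE_STDERR).items():
        print(f"{key}: {value}")
```

Read this way, the log says the CUDA driver could not see the GPU at all, which fits a GPU that had been handed away from the session rather than a broken installation.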

The next step would be to remove the current CUDA driver. You might need to do some research on whether or not there are any particular tricks to fully removing the old driver. After rebooting, try reinstalling the latest driver from nVidia (you mentioned 266.58 which I think is the latest anyway), once again keeping a close eye on anything unusual during the installation. If all seems normal, reboot for good measure and then reinstall BOINC, making sure your previous installation is detected. This should happen automatically but it doesn't hurt to use the available options to check for yourself. Then enable work fetch when you are ready (I always set NNT (no new tasks) before shutting down so that I have some control over exactly what happens during a restart) and see if the new work you get can be crunched without error with the fresh installation.

If tasks still fail after that, I'd be looking at testing and replacing hardware, probably starting with your GPU, PSU and system RAM to see if you can find something that changes things.

Good luck with your troubleshooting.

Cheers,
Gary.

Fred E.
Joined: 26 Oct 10
Posts: 4
Credit: 2576811
RAC: 0

I also got this error message about registry keys on most of the CUDA jobs I tried last week. My GT240 card is new, so I stopped downloading them when the errors started. I tried again yesterday, and the only one that has tried to run so far failed with this error message. Sorry, I don't have the stderr.txt to include here. My information on the error message comes from BoincLogX, which logs those errors, but I can't paste from it. I noticed that the jobs correctly identified the card model before failing. The exit code was 1020. If we both have a problem with our computers like you described, why do CUDA jobs from other projects run okay?

My computer's setup is available to view. I'm running XP SP3 with the latest updates and the latest NVIDIA drivers, and I don't overclock.

Fred

Fred E.
Joined: 26 Oct 10
Posts: 4
Credit: 2576811
RAC: 0

Gary, Charles: I have a clue. All of my errors occurred while I was away from the computer. I noticed that one occurred after my daughter used the computer under her own user ID, so I got curious. I tested this morning, and indeed, the CUDA task that was running bombed when I logged off her ID and returned to mine. If I suspend GPU processing before switching to her ID, all is okay when I return to mine. Charles, do you have multiple users set up on your machine? And have you tried Gary's suggestion to reinstall BOINC?

Just found this comment by Mikey in the Getting Started forum:

http://einsteinathome.org/node/195462&nowrap=true#109745
*****(He was talking about setting up all users on a machine to run BOINC. I haven't tried that, so this item may not apply.)
*****
Be aware that if you crunch with a gpu this will NOT work! Yes Boinc itself will work as you described but the gpu will not crunch when switching from user to user. This is a Windows thing and it is what it is, so if you are crunching with a gpu too, all units will error out when you switch from one user to another. In short do not plan to use a gpu for crunching on a machine with multiple users.
*****

All of my other CUDA tasks have run to completion since my earlier post because I've suspended GPU processing whenever I'm away from the machine. I forgot to mention in my first post that I reinstalled the Nvidia drivers when these problems started, using the installer's menu option for what they call a "clean" install (which removes all old files first). That didn't help me.

Fred

mikey
Joined: 22 Jan 05
Posts: 12945
Credit: 1884496578
RAC: 22457

Quote:

I also got this error message about registry keys on most of the CUDA jobs I tried last week. My GT240 card is new, so I stopped downloading them when the errors started. I tried again yesterday, and the only one that has tried to run so far failed with this error message. Sorry, I don't have the stderr.txt to include here. My information on the error message comes from BoincLogX, which logs those errors, but I can't paste from it. I noticed that the jobs correctly identified the card model before failing. The exit code was 1020. If we both have a problem with our computers like you described, why do CUDA jobs from other projects run okay?

My computer's setup is available to view. I'm running XP SP3 with the latest updates and the latest NVIDIA drivers, and I don't overclock.

Since each project does a lot of tweaking to make BOINC run for them, you really can't compare one project to another that way. If you do any kind of overclocking, that throws another huge wrench into the problem too; some projects tolerate a lot of overclocking and some tolerate NONE at all! I don't know whether you overclock or not; it was just an example of the differences from project to project.

And yes, switching users will cause those problems in Windows. If you are going to crunch with your GPU, you must either do as Fred suggests or just get each user their own PC. When I first started out, many years ago, my wife and I shared a PC; it kept crashing and we blamed each other! I got her her own PC and hers started crashing but mine didn't, so I had found the problem!! Then the kids shared her PC and it crashed even more often. I finally got them their own, and theirs only crashed when they would load and unload games 20 times a day. Now we each have our own PCs, and they have all learned how not to make them crash, for the most part anyway. It also gives me more CPUs for BOINC and places to put GPUs for crunching too!!

CharlesC
Joined: 6 Nov 09
Posts: 3
Credit: 191507759
RAC: 243

Thanks for your thoughts chaps, there's definitely some ideas worth further investigation...

I do think Fred might have hit on something, though: my PC is used by my fiancée during the day, and she uses the Windows 7 "Switch User" function, which leaves my account logged in with the BOINC app still running while she logs on with her own account.
I'll try to correlate the times that she has been logged on with WU failure...

Thanks again!

mikey
Joined: 22 Jan 05
Posts: 12945
Credit: 1884496578
RAC: 22457

Apparently, when you switch users, Windows uses its own display drivers to show what is on screen, NOT the Nvidia or ATI drivers, and this causes the Nvidia or ATI drivers to crash. The only way to make them work again is to reboot the machine. However, IF you suspend crunching of the GPU units in your account and THEN switch users, it will supposedly work just fine without a reboot when your account is logged back in.

Fred E.
Joined: 26 Oct 10
Posts: 4
Credit: 2576811
RAC: 0

Mikey and Gary, thanks for the help.

I have taught my daughter and grandson how to suspend GPU work before they switch users, but they'll forget from time to time. I might have my grandson log me off instead; that's easier for him to remember, but it stops CPU work too. I just posted a wish-list item on the BOINC site asking whether BOINC Manager could detect the user switch and suspend GPU processing for us. I'm not very hopeful that they can or will, but thought I'd try. If they show interest, I'll refer them to your explanation; it helps me. I thought it was interesting that PrimeGrid has built in a 10-minute delay after an error to help avoid emptying the cache (they say it's to reduce load on their server).
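A minimal Python sketch of the decision logic such a detector might use, under stated assumptions: some helper already receives Windows session-change codes (the WTS_* values that accompany WM_WTSSESSION_CHANGE after a WTSRegisterSessionNotification call), and the installed BOINC client's boinccmd supports `--set_gpu_mode never|auto`. The function names are hypothetical, the Windows message-loop wiring is left out, and whether the notification arrives early enough to beat the display-driver handover is untested.

```python
import subprocess
from typing import List, Optional

# Session-change codes delivered with WM_WTSSESSION_CHANGE (WtsApi32.h).
WTS_CONSOLE_CONNECT = 0x1
WTS_CONSOLE_DISCONNECT = 0x2
WTS_REMOTE_CONNECT = 0x3


def gpu_command_for_event(event: int) -> Optional[List[str]]:
    """Hypothetical helper: map a session event to a boinccmd call, or None."""
    if event in (WTS_CONSOLE_DISCONNECT, WTS_REMOTE_CONNECT):
        # Our session is losing the physical console: stop GPU crunching.
        return ["boinccmd", "--set_gpu_mode", "never"]
    if event == WTS_CONSOLE_CONNECT:
        # Back on the console: return GPU work to the normal preferences.
        return ["boinccmd", "--set_gpu_mode", "auto"]
    # Lock/unlock, logon/logoff, etc. are left alone in this sketch.
    return None


def handle_session_event(event: int, dry_run: bool = True) -> Optional[List[str]]:
    """Run the command for an event, or only report it when dry_run is True."""
    cmd = gpu_command_for_event(event)
    if cmd is not None and not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    for code in (WTS_CONSOLE_DISCONNECT, WTS_CONSOLE_CONNECT, 0x7):
        print(hex(code), handle_session_event(code))
```

Keeping the mapping in a pure function means it can be checked without Windows or a running client. Using "auto" on reconnect hands GPU scheduling back to the user's normal computing preferences rather than forcing GPU work on.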

I was wrong when I said SETI didn't fail this way. I thought it was okay because in my first week of CUDA processing I ran 72 CUDA tasks from them without error. I didn't have one to test when I wrote that note, due to their extended outage. I have some tasks now and sacrificed one for a test. Indeed, their tasks fail when I switch users. Mikey, I like the idea of more computers in the house; this may be my excuse to build that dream cruncher and avoid the pain of a Win 7 upgrade on this one! But it would be cheaper to find a new battery for her old notebook.

P.S. Sorry about the typos in these posts; that's a combination of trying to type while the GPU is crunching and putting off eye surgery too long.

Fred

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 0

As I said at the BOINC forums as well, this was already on the developers' to-do list. Worse, it should already work, even in 6.10.58, as it uses the same code as the Windows Remote Desktop function. I just tested with BOINC 6.12.15 and Windows 7 while I was talking to one of the developers and found a snag... my BOINC continues to run, even on the GPU, when I switch to the other user. No crashes.

So uh... either something changed in Windows 7, or the fix is in 6.12, or there's something else wrong now as no matter what, it should suspend the work done on the GPU when we switch to the other user, or log in remotely. ;-)

It's being looked at.

Btw, I also answered your other points. ;)