Error: Application uses missing NVIDIA GPU - Bobrr

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5870
Credit: 116948961787
RAC: 36705146
Topic 195467

I'm not sure if this is the right place for this, so here goes:

This message is from the BOINC manager messages:

'Sat 13 Nov 2010 08:00:09 PM EST Einstein@Home Application uses missing NVIDIA GPU
Sat 13 Nov 2010 08:00:09 PM EST Einstein@Home Missing coprocessor for task p2030_53487_34711_0066_G42.78-01.15.N_1.dm_140_0'

Also:
'Sat 13 Nov 2010 08:00:09 PM EST Processor: 1 AuthenticAMD AMD Athlon(tm) 64 Processor 3200+ [Family 15 Model 47 Stepping 0]
Sat 13 Nov 2010 08:00:09 PM EST Processor: 512.00 KB cache
Sat 13 Nov 2010 08:00:09 PM EST Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow up rep_good pni lahf_lm
Sat 13 Nov 2010 08:00:09 PM EST OS: Linux: 2.6.35-22-generic
Sat 13 Nov 2010 08:00:09 PM EST Memory: 1.96 GB physical, 5.74 GB virtual
Sat 13 Nov 2010 08:00:09 PM EST Disk: 682.02 GB total, 639.43 GB free'

My video card is an NVIDIA GeForce GTS 450 with 1024 MB of GDDR5 RAM.

Is the video card not supported? It is quite new. Or is the system (CPU, motherboard) too old?

Thank you.

bobrr

Cheers,
Gary.

ZoSo
Joined: 2 Apr 10
Posts: 14
Credit: 6182189
RAC: 0

Quote:
I'm not sure if this is the right place for this, [snip]

I don't think it is... the Getting Started forum would be my recommendation for further posts about it, since the problem doesn't appear to be a bug per se...
Did you follow all the steps at
http://boinc.berkeley.edu/wiki/GPU_computing ?

When you get to the 'latest drivers' step, choose GeForce, 400 series, GTS 450, and your OS. It's not clear from what you posted whether you're running 32-bit or 64-bit Linux. It's also a good idea to mention which version of BOINC you're using when you ask how to do things with it; you might not have noticed, but there are sometimes a couple of new versions per week.
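For anyone unsure what to quote in a post like this, here is a minimal sketch that prints the machine architecture and kernel release using Python's standard `platform` module. The function name `describe_host` is made up for illustration. It does not report the BOINC version; that is shown in the client's own startup messages.

```python
import platform


def describe_host():
    """Collect the basic host details worth quoting in a support post."""
    return {
        "machine": platform.machine(),  # e.g. 'x86_64' on a 64-bit kernel
        "system": platform.system(),    # e.g. 'Linux'
        "kernel": platform.release(),   # e.g. '2.6.35-22-generic'
    }


for key, value in describe_host().items():
    print(f"{key}: {value}")
```

A `machine` value of `x86_64` points to the 64-bit driver download; `i686` or similar points to the 32-bit one.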

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4307
Credit: 249827479
RAC: 32288

Well, "Einstein@Home Missing coprocessor for task p2030_53487_34711_0066_G42.78-01.15.N_1.dm_140_0" means that your computer already got a task for NVIDIA GPUs, so at some point a BOINC Client was running that detected an NVIDIA GPU and fetched work for it. But at the next (re)start of the BOINC Client it couldn't find this GPU (as usable) anymore. The question is: what has changed? System restore? Graphics driver up/downgrade? BOINC Client? User logged in? System update?
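The two signs described above (a task that needs a GPU, and no GPU detected at startup) can both be read off the client messages. The following is only a sketch: the regular expressions assume the exact wording of the messages quoted in this thread, which may differ in other client versions, and `summarize_gpu_state` is a made-up helper name.

```python
import re

# Wording copied from the messages quoted in this thread; other BOINC
# client versions may phrase these lines differently (assumption).
MISSING_TASK_RE = re.compile(r"Missing coprocessor for task (\S+)")
GPU_FOUND_RE = re.compile(r"NVIDIA GPU \d+:")


def summarize_gpu_state(log_text):
    """Report whether a GPU was detected and which tasks are waiting for one."""
    tasks = [m.group(1).rstrip("'\"") for m in MISSING_TASK_RE.finditer(log_text)]
    return {
        "gpu_detected": bool(GPU_FOUND_RE.search(log_text)),
        "tasks_missing_gpu": tasks,
    }


sample = (
    "Sat 13 Nov 2010 08:00:09 PM EST Einstein@Home Application uses missing NVIDIA GPU\n"
    "Sat 13 Nov 2010 08:00:09 PM EST Einstein@Home Missing coprocessor for task "
    "p2030_53487_34711_0066_G42.78-01.15.N_1.dm_140_0\n"
)
print(summarize_gpu_state(sample))
```

If `gpu_detected` is false while `tasks_missing_gpu` is non-empty, the client fetched GPU work earlier and has since lost sight of the card.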

BM

Michael Karlinsky
Joined: 22 Jan 05
Posts: 888
Credit: 23502182
RAC: 0

Quote:

Well, "Einstein@Home Missing coprocessor for task p2030_53487_34711_0066_G42.78-01.15.N_1.dm_140_0" means that your computer already got a task for NVIDIA GPUs, so at some point a BOINC Client was running that detected an NVIDIA GPU and fetched work for it. But at the next (re)start of the BOINC Client it couldn't find this GPU (as usable) anymore. The question is: what has changed? System restore? Graphics driver up/downgrade? BOINC Client? User logged in? System update?

BM

This is highly plausible. You have to reinstall/recompile the NVIDIA drivers after a kernel update.
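A quick way to tell whether the driver survived a kernel update is to check if the `nvidia` kernel module is loaded at all. The sketch below reads `/proc/modules`, where each line begins with a module name; the helper names are made up for illustration, and a loaded module does not by itself prove that BOINC can use the card.

```python
from pathlib import Path


def loaded_modules(proc_modules_text):
    """Return the module names found in text in /proc/modules format."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def nvidia_module_loaded(proc_modules_text):
    """True if the proprietary 'nvidia' module appears in the listing."""
    return "nvidia" in loaded_modules(proc_modules_text)


def check_this_machine(path="/proc/modules"):
    """Return True/False on Linux, or None if the file cannot be read."""
    try:
        return nvidia_module_loaded(Path(path).read_text())
    except OSError:
        return None


print("nvidia module loaded:", check_this_machine())
```

If this prints `False` right after a kernel update, the module was most likely not rebuilt for the new kernel.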

Michael

Bobrr
Joined: 7 Jul 10
Posts: 2
Credit: 38497661
RAC: 506

Here is the BOINC information:

'Thu 18 Nov 2010 11:46:25 AM EST Starting BOINC client version 6.10.58 for x86_64-pc-linux-gnu
Thu 18 Nov 2010 11:46:25 AM EST Config: GUI RPC allowed from:
Thu 18 Nov 2010 11:46:25 AM EST log flags: file_xfer, sched_ops, task
Thu 18 Nov 2010 11:46:25 AM EST Libraries: libcurl/7.21.0 OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.18
Thu 18 Nov 2010 11:46:25 AM EST Data directory: /var/lib/boinc-client
Thu 18 Nov 2010 11:46:28 AM EST Processor: 1 AuthenticAMD AMD Athlon(tm) 64 Processor 3200+ [Family 15 Model 47 Stepping 0]
Thu 18 Nov 2010 11:46:28 AM EST Processor: 512.00 KB cache
Thu 18 Nov 2010 11:46:28 AM EST Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow up rep_good pni lahf_lm
Thu 18 Nov 2010 11:46:28 AM EST OS: Linux: 2.6.35-22-generic
Thu 18 Nov 2010 11:46:28 AM EST Memory: 1.96 GB physical, 5.74 GB virtual
Thu 18 Nov 2010 11:46:28 AM EST Disk: 682.02 GB total, 621.77 GB free
Thu 18 Nov 2010 11:46:28 AM EST Local time is UTC -5 hours
Thu 18 Nov 2010 11:46:28 AM EST NVIDIA GPU 0: GeForce GTS 450 (driver version unknown, CUDA version 3020, compute capability 2.1, 1023MB, 421 GFLOPS peak)
'

I upgraded the NVIDIA driver from 260.19.06 to 260.19.21 around the first week of November. This resulted in 'No usable GPUs found'. I didn't notice this in BOINC until later (13 November), as the system, including the video, was working fine. I reinstalled 260.19.06 and BOINC found the GPU again. I had a similar issue with GPUGRID, but there it resulted in tasks failing after anywhere from a few minutes to hours. MilkyWay does not seem to recognise the card at all.
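The GPU line in the startup messages above carries most of what matters for this kind of problem. Here is a minimal sketch that pulls the fields out of it; the pattern is based only on the single line quoted in this post, so other client versions may format it differently, and `parse_gpu_line` is a made-up name. The CUDA number is decoded on the assumption that it follows the usual 1000 × major + 10 × minor encoding, which makes 3020 mean CUDA 3.2.

```python
import re

# Pattern derived from the one GPU line quoted in this thread (assumption:
# other BOINC client versions may lay the line out differently).
GPU_LINE_RE = re.compile(
    r"NVIDIA GPU (?P<index>\d+): (?P<name>.+?) "
    r"\(driver version (?P<driver>[^,]+), "
    r"CUDA version (?P<cuda>\d+), "
    r"compute capability (?P<cc>[\d.]+), "
    r"(?P<mem_mb>\d+)MB, "
    r"(?P<gflops>\d+) GFLOPS peak\)"
)


def parse_gpu_line(line):
    """Return the GPU fields as a dict, or None if the line does not match."""
    m = GPU_LINE_RE.search(line)
    if m is None:
        return None
    cuda = int(m.group("cuda"))
    return {
        "index": int(m.group("index")),
        "name": m.group("name"),
        "driver": m.group("driver"),
        # 1000 * major + 10 * minor, so 3020 decodes to (3, 2)
        "cuda": (cuda // 1000, (cuda % 1000) // 10),
        "compute_capability": m.group("cc"),
        "mem_mb": int(m.group("mem_mb")),
        "gflops": int(m.group("gflops")),
    }


line = (
    "Thu 18 Nov 2010 11:46:28 AM EST NVIDIA GPU 0: GeForce GTS 450 "
    "(driver version unknown, CUDA version 3020, compute capability 2.1, "
    "1023MB, 421 GFLOPS peak)"
)
print(parse_gpu_line(line))
```

Note that the client reports the driver version as `unknown` here even though the card itself is detected.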

ZoSo (message 107861) believes 'Getting started' is the better venue for this post. You can guide me on this.

I believe it is a bug, but not necessarily in Einstein, or in any of the other BOINC projects. From other posts in this and other projects regarding GPUs, it may be that all the parts haven't meshed yet. My humble opinion.

If there is any other information you need me to add, let me know.

By the way, 'task p2030_53490_36389_0107_G50.32-00.37.N_2.dm_70_1' has just run into 'computation error' after 5h 43m 50s.
Same thing for 'p2030_53464_37749_0017_G52.63+01.84.N_4_460.binary' after 4s.

Here are the BOINC messages
'Fri 19 Nov 2010 04:56:04 PM EST Einstein@Home Temporarily failed upload of p2030_53490_36389_0107_G50.32-00.37.N_2.dm_70_1_2: can't resolve hostname
Fri 19 Nov 2010 04:56:04 PM EST Einstein@Home Backing off 1 min 0 sec on upload of p2030_53490_36389_0107_G50.32-00.37.N_2.dm_70_1_2
Fri 19 Nov 2010 04:57:16 PM EST BOINC can't access Internet - check network connection or proxy configuration.
Fri 19 Nov 2010 04:57:17 PM EST Einstein@Home Started upload of p2030_53490_36389_0107_G50.32-00.37.N_2.dm_70_1_0
Fri 19 Nov 2010 04:57:17 PM EST Einstein@Home Started upload of p2030_53490_36389_0107_G50.32-00.37.N_2.dm_70_1_1
Fri 19 Nov 2010 04:58:40 PM EST Einstein@Home Temporarily failed upload of p2030_53490_36389_0107_G50.32-00.37.N_2.dm_70_1_0: can't resolve hostname
Fri 19 Nov 2010 04:58:40 PM EST Einstein@Home Backing off 1 min 0 sec on upload of p2030_53490_36389_0107_G50.32-00.37.N_2.dm_70_1_0
Fri 19 Nov 2010 04:58:40 PM EST Einstein@Home Temporarily failed upload of p2030_53490_36389_0107_G50.32-00.37.N_2.dm_70_1_1: can't resolve hostname
Fri 19 Nov 2010 04:58:42 PM EST Einstein@Home Backing off 1 min 0 sec on upload of p2030_53490_36389_0107_G50.32-00.37.N_2.dm_70_1_1
Fri 19 Nov 2010 04:59:26 PM EST Project communication failed: attempting access to reference site
Fri 19 Nov 2010 05:00:07 PM EST BOINC can't access Internet - check network connection or proxy configuration.
Fri 19 Nov 2010 05:00:53 PM EST Einstein@Home Started upload of p2030_53490_36389_0107_G50.32-00.37.N_2.dm_70_1_0
Fri 19 Nov 2010 05:00:53 PM EST Einstein@Home Started upload of p2030_53490_36389_0107_G50.32-00.37.N_2.dm_70_1_1
Fri 19 Nov 2010 05:02:14 PM EST Einstein@Home Temporarily failed upload of p2030_53490_36389_0107_G50.32-00.37.N_2.dm_70_1_0: can't resolve hostname
Fri 19 Nov 2010 05:02:14 PM EST Einstein@Home Backing off 1 min 0 sec on upload of p2030_53490_36389_0107_G50.32-00.37.N_2.dm_70_1_0
Fri 19 Nov 2010 05:02:14 PM EST Einstein@Home Temporarily failed upload of p2030_53490_36389_0107_G50.32-00.37.N_2.dm_70_1_1: can't resolve hostname
Fri 19 Nov 2010 05:02:14 PM EST Einstein@Home Backing off 1 min 0 sec on upload of p2030_53490_36389_0107_G50.32-00.37.N_2.dm_70_1_1
Fri 19 Nov 2010 05:05:03 PM EST Project communication failed: attempting access to reference site
Fri 19 Nov 2010 05:05:49 PM EST BOINC can't access Internet - check network connection or proxy configuration.
Fri 19 Nov 2010 05:08:29 PM EST Project communication failed: attempting access to reference site
Fri 19 Nov 2010 05:09:14 PM EST BOINC can't access Internet - check network connection or proxy configuration.
'

I am connected through DSL, no proxy, so there should not be any disruptions to the Internet as I am always connected. Other projects are communicating OK.

Thank you.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5870
Credit: 116948961787
RAC: 36705146

Quote:

ZoSo (message 107861) believes 'Getting started' is the better venue for this post. You can guide me on this.

I believe it is a bug, but not necessarily in Einstein, or in any of the other BOINC projects. From other posts in this and other projects regarding GPUs, it may be that all the parts haven't meshed yet. My humble opinion.


I'm sorry, but I don't understand your logic here.

Your original message concerned BOINC not recognising your GPU. Bernd explained the message and even predicted the possible cause (driver upgrade) and now you tell us that you actually did upgrade and that you solved the problem by reverting to the previous driver. So why are you also saying that "all the parts haven't meshed yet" and what does that actually mean? How does this become an Einstein problem report and why choose the sticky thread about validate errors? Zoso is quite correct to suggest that you should have chosen somewhere else more appropriate to post your message.

I'm not having a go at you - just making a few suggestions and explanations that might help you to get more targeted responses to any problems you might be having. You have actually raised several different issues and it would be much easier for people to respond if you had started a new thread for each separate issue. A good general rule to follow is that if you are not sure your problem is the same as what is being discussed in an existing thread then it probably isn't. Also, choose the title of your new thread to be as informative as possible and only deal with one problem per thread unless the problems are clearly related.

Your first issue was BOINC not recognising your GPU after a driver upgrade. You've solved that by reverting to the previous driver.

Your second issue was the following:-

Quote:
By the way, 'task p2030_53490_36389_0107_G50.32-00.37.N_2.dm_70_1' has just run into 'computation error' after 5h 43m 50s.
Same thing for 'p2030_53464_37749_0017_G52.63+01.84.N_4_460.binary' after 4s.


If you look through your tasks list for the host in question, you will see quite a few errors dating back to the start of the list and not just the latest two. If you click on any of the taskIDs for those tasks you will get to see the error messages that were returned to the server. These are usually more informative than messages you may see in BOINC Manager.

The computation errors are all listed as

process got signal 11


There is another sticky thread in this forum that relates to signal 11 problems with the S5GC1 application, which Bernd has been trying to deal with for some time. However, your problem is likely to be different since it relates to the ABP2 app and NOT the S5GC1 app. It's also interesting to note that all your tasks, whether successful or comp error, are plagued with large numbers of 'Error writing shared memory data (size limit exceeded)!' messages. These are actually not preventing the successful completion of some tasks (those that don't get signal 11), and there is a thread, which you can easily find by searching the forums for the error message, that gives some advice on what to do. If you follow the links as you read, you can find a page which gives more information on what shared memory is all about. You could try running the command

sysctl -a | grep shm

to get the shared memory settings that apply on your machine and see if that gives any clue as to what is going on. I have no experience (or answers) here - I'm merely giving you pointers to what can be found with a couple of quick searches. I have no justification for the comment, but perhaps the teaming up of a modern GPU with such an old CPU might have something to do with your problems. Also, you might like to observe the large difference between CPU time and elapsed (Run) time for all tasks in your tasks list. Unless your machine is heavily involved in doing other things at the same time, those differences would seem to be unusually large. Perhaps this is in some way associated with the shared mem messages?
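The output of that command is a handful of `kernel.shm... = value` lines. Here is a minimal sketch for turning them into numbers that are easier to compare; the sample values are purely illustrative and are not taken from Bobrr's machine, and `parse_shm_settings` is a made-up name. On Linux, `shmmax` is the largest single segment in bytes, `shmall` is the system-wide total in pages, and `shmmni` is the maximum number of segments.

```python
import re

# Matches lines such as "kernel.shmmax = 33554432" from `sysctl -a`.
SYSCTL_LINE_RE = re.compile(r"^kernel\.(shm\w+)\s*=\s*(\d+)$")


def parse_shm_settings(sysctl_output):
    """Map shared memory setting names (e.g. 'shmmax') to integer values."""
    settings = {}
    for line in sysctl_output.splitlines():
        m = SYSCTL_LINE_RE.match(line.strip())
        if m:
            settings[m.group(1)] = int(m.group(2))
    return settings


# Illustrative values only (assumption), not from the host in this thread.
sample = """\
kernel.shmmax = 33554432
kernel.shmall = 2097152
kernel.shmmni = 4096
"""

settings = parse_shm_settings(sample)
print(f"largest single segment: {settings['shmmax'] / 2**20:.0f} MiB")
```

Whether any of these limits is actually behind the 'size limit exceeded' messages is not established here; the numbers are just a starting point for comparison with the advice in the earlier thread.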

Your third issue is completely different again.

Quote:
'Fri 19 Nov 2010 04:56:04 PM EST Einstein@Home Temporarily failed upload of p2030_53490_36389_0107_G50.32-00.37.N_2.dm_70_1_2: can't resolve hostname
.....


Obviously the upload eventually succeeded because that particular task is now sitting there on the server and is listed as being reported at 22:38:09 UTC, i.e. later on the same day. The 'error' part of the message is specifically "can't resolve hostname", which seems to suggest that DNS lookups were failing at your end for some reason at that point in time. I'm not sure why you think that this is something the project could sort out. Perhaps your ISP's DNS server was not responding for some reason. BOINC is designed to work over unreliable networks, so this is just a transient problem that can usually be ignored. BOINC will just keep retrying until it eventually succeeds. Of course, if the problem was caused because someone tripped over and damaged your network cable, BOINC might have a little difficulty in working around that one :-).
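To illustrate the "keep retrying" idea, here is a rough sketch of a DNS lookup that waits and tries again after each failure. This is not BOINC's actual back-off logic: the doubling delay, the attempt count and the name `resolve_with_retry` are all assumptions for illustration; the quoted log only shows a back-off of 1 minute. The resolver and sleep function are parameters so the behaviour can be tried out without a real network or real waiting.

```python
import socket
import time


def resolve_with_retry(host, attempts=4, first_delay=60.0,
                       resolver=socket.gethostbyname, sleep=time.sleep):
    """Try to resolve host, doubling the wait after each failed attempt.

    Returns the address string, or None if every attempt fails.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            if attempt == attempts:
                return None
            sleep(delay)
            delay *= 2
    return None
```

Calling `resolve_with_retry("example.org", attempts=2, first_delay=1.0)` on a machine with a working resolver should return an address straight away; with a dead DNS server it would wait once and then return `None`.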

As I said earlier, I'm not having a go at you but I am trying to give some pointers on how best to use the message board system both to report problems in the appropriate way and to seek assistance in a way that is easy to respond to and that doesn't hijack a completely different discussion. I'm also putting in a plug for doing searches. In this case because someone was smart enough to properly name the previous thread, it was very easy to find it. Then there were the helpful people that had provided the Spy-Hill.net and other useful links.

In a day or two when you've had a chance to peruse this response, I'll probably shift all discussion about this into a separate thread so that those who want to follow 'Validate errors' can do so without all the extra noise. Hopefully there will be others who can contribute to the 'shared mem' problem who can give you more advice about that. I've never had the problem and I'd forgotten about the previous thread from the beginning of the year.

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5870
Credit: 116948961787
RAC: 36705146

@ Bobrr

As mentioned previously, I've created this new thread as a repository for your various problems using as the title, the error message you first reported.

I had a look at your task list just now and was pleased to see that you at last have a recent successfully completed task. I was also pleased to see that the output for this task didn't contain any 'error writing shared memory' messages and that the run time was quite a bit closer to the CPU time as well. What did you have to do to get things working properly like this?

Cheers,
Gary.

Bobrr
Joined: 7 Jul 10
Posts: 2
Credit: 38497661
RAC: 506

Thank you for taking the time to explain all of the issues involved. The reason I posted is to get this kind of guidance. I will consider the issues more closely in the future. And I don't take issue with any of your comments.

Keep up the good work.

mikey
Joined: 22 Jan 05
Posts: 12639
Credit: 1839024911
RAC: 5517

Quote:
Quote:
I believe it is a bug, but not necessarily in Einstein, or in any of the other BOINC projects. From other posts in this and other projects regarding GPUs, it may be that all the parts haven't meshed yet. My humble opinion.

I'm sorry, but I don't understand your logic here.

I think he is trying to say that the new NVIDIA drivers do not work in BOINC for him, and he is wondering if it is a BOINC thing. I say this because just prior to the above statement he said: "I upgraded the NVIDIA driver from 260.19.06 to 260.19.21 around the first week of November. This resulted in 'No usable GPUs found'. I didn't notice this in BOINC until later (13 November), as the system, including the video, was working fine. I reinstalled 260.19.06 and BOINC found the GPU again. I had a similar issue with GPUGRID, but there it resulted in tasks failing after anywhere from a few minutes to hours. MilkyWay does not seem to recognise the card at all."
