BOINC says it's using 2 GPUs ... but it's not? [SOLVED]

Phillip Oakes
Joined: 8 Aug 19
Posts: 4
Credit: 3,177,328
RAC: 10,973
Topic 219428

Hey, so I started crunching here after GPUgrid ran out of WUs and noticed something really weird.

With GPUgrid I ran 2 WUs per GPU and all was well, but with Einstein, even though BOINC says that it's running 1 task per GPU (device 0 and device 1) and the progress bar of each is advancing, MSI Afterburner shows that only device 0 (gpu1) is running at 99% and the other is at about 2%.

I thought, hey, that's probably an error with Afterburner, but the temps are down to almost ambient on device 1 and the exhaust from my case is noticeably cooler.

Lasso shows that the last 2 CPU threads are maxed, which is what I have set Einstein to use. The rest of the cores are reserved for WCG.

Yes, I have cc_config.xml set with use_all_gpus set to 1.
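For anyone checking the same setting: the option lives in cc_config.xml in the BOINC data directory, under the options element. Below is a minimal Python sketch for confirming it, assuming that a value of 1 means enabled. The helper name `use_all_gpus_enabled` and the `example` string are made up for illustration and are not part of BOINC.

```python
import xml.etree.ElementTree as ET


def use_all_gpus_enabled(cc_config_xml):
    """Return True if <options><use_all_gpus> is set to 1 (assumed to mean enabled)."""
    root = ET.fromstring(cc_config_xml)  # root element is <cc_config>
    value = root.findtext("options/use_all_gpus", default="0")
    return value.strip() == "1"


# Hypothetical file contents, shown inline so the sketch is self-contained.
example = """<cc_config>
  <options>
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>"""

print(use_all_gpus_enabled(example))
```

When the client has picked the option up, its startup Event Log contains the line "Config: use all coprocessors".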

Device 0 is a 1070 and device 1 is a 970, both running at 8x. The CPU is a 6700K with 32GB of RAM, and I'm running Win 10.

Any ideas on getting E@H to crunch correctly on my second GPU?

 

Richard Haselgrove
Joined: 10 Dec 05
Posts: 1,918
Credit: 137,225,449
RAC: 115,088

Please post the startup Event Log messages from BOINC, showing the lines where BOINC lists the GPU detection results for your machine.

There have been cases where driver problems have led to duplicate GPU detections.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 4,888
Credit: 29,418,171,284
RAC: 35,193,997

Phillip Oakes wrote:
Lasso shows that the last 2 CPU threads are maxed, which is what I have set Einstein to use. The rest of the cores are reserved for WCG.

I presume you are trying to run the FGRPB1G GPU app?  Your computers are hidden so I can't check.  With hidden computers you need to give more details about your setup.

Unfortunately, with FGRPB1G each NVIDIA GPU task requires a full CPU core for support duties. If you only allow 2 cores for this, BOINC can only have 2 GPU tasks running concurrently. You could use an app_config.xml to change the default CPU allocation, but if you did, you would probably seriously affect crunch times.

Cheers,
Gary.

Phillip Oakes
Joined: 8 Aug 19
Posts: 4
Credit: 3,177,328
RAC: 10,973

Richard Haselgrove wrote:
There have been cases where driver problems have led to duplicate GPU detections.

I think you've hit the nail on the head. Should I just do a clean install or roll back the driver?

21/08/2019 9:03:58 PM | | Starting BOINC client version 7.14.2 for windows_x86_64
21/08/2019 9:03:58 PM | | log flags: file_xfer, sched_ops, task
21/08/2019 9:03:58 PM | | Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8
21/08/2019 9:03:58 PM | | Data directory: C:\ProgramData\BOINC
21/08/2019 9:03:58 PM | | Running under account Phil
21/08/2019 9:03:58 PM | | CUDA: NVIDIA GPU 0: GeForce GTX 1070 (driver version 431.36, CUDA version 10.1, compute capability 6.1, 4096MB, 3556MB available, 6803 GFLOPS peak)
21/08/2019 9:03:58 PM | | CUDA: NVIDIA GPU 1: GeForce GTX 970 (driver version 431.36, CUDA version 10.1, compute capability 5.2, 4096MB, 3381MB available, 4381 GFLOPS peak)
21/08/2019 9:03:58 PM | | OpenCL: NVIDIA GPU 0: GeForce GTX 1070 (driver version 431.36, device version OpenCL 1.2 CUDA, 8192MB, 3556MB available, 6803 GFLOPS peak)
21/08/2019 9:03:58 PM | | OpenCL: NVIDIA GPU 0: GeForce GTX 1070 (driver version 431.36, device version OpenCL 1.2 CUDA, 8192MB, 3556MB available, 6803 GFLOPS peak)
21/08/2019 9:03:58 PM | | OpenCL: NVIDIA GPU 1: GeForce GTX 970 (driver version 431.36, device version OpenCL 1.2 CUDA, 4096MB, 3381MB available, 4381 GFLOPS peak)
21/08/2019 9:03:58 PM | | OpenCL: NVIDIA GPU 1: GeForce GTX 970 (driver version 431.36, device version OpenCL 1.2 CUDA, 4096MB, 3381MB available, 4381 GFLOPS peak)
21/08/2019 9:03:58 PM | | OpenCL: Intel GPU 0: Intel(R) HD Graphics 530 (driver version 21.20.16.4542, device version OpenCL 2.0, 12970MB, 12970MB available, 221 GFLOPS peak)
21/08/2019 9:03:58 PM | | OpenCL CPU: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz (OpenCL driver vendor: Intel(R) Corporation, driver version 6.6.0.370, device version OpenCL 2.0 (Build 370))
21/08/2019 9:03:59 PM | | Host name: DESKTOP-763SLQQ
21/08/2019 9:03:59 PM | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz [Family 6 Model 94 Stepping 3]
21/08/2019 9:03:59 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx tm2 pbe fsgsbase bmi1 hle smep bmi2
21/08/2019 9:03:59 PM | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.18362.00)
21/08/2019 9:03:59 PM | | Memory: 31.69 GB physical, 36.44 GB virtual
21/08/2019 9:03:59 PM | | Disk: 418.63 GB total, 188.35 GB free
21/08/2019 9:03:59 PM | | Local time is UTC +10 hours
21/08/2019 9:03:59 PM | | No WSL found.
21/08/2019 9:03:59 PM | | VirtualBox version: 5.2.8
21/08/2019 9:03:59 PM | GPUGRID | Found app_config.xml
21/08/2019 9:03:59 PM | GPUGRID | Your app_config.xml file refers to an unknown application 'acemdbeta'. Known applications: 'acemdlong', 'acemdshort'
21/08/2019 9:03:59 PM | | Config: use all coprocessors
21/08/2019 9:03:59 PM | Einstein@Home | URL http://einstein.phys.uwm.edu/; Computer ID 12786409; resource share 0
21/08/2019 9:03:59 PM | GPUGRID | URL http://www.gpugrid.net/; Computer ID 509469; resource share 100
21/08/2019 9:03:59 PM | World Community Grid | URL http://www.worldcommunitygrid.org/; Computer ID 5498529; resource share 100
21/08/2019 9:03:59 PM | Einstein@Home | General prefs: from Einstein@Home (last modified 18-Aug-2019 11:02:32)
21/08/2019 9:03:59 PM | Einstein@Home | Computer location: home
21/08/2019 9:03:59 PM | Einstein@Home | General prefs: no separate prefs for home; using your defaults
21/08/2019 9:03:59 PM | | Reading preferences override file
21/08/2019 9:03:59 PM | | Preferences:
21/08/2019 9:03:59 PM | | max memory usage when active: 24334.24 MB
21/08/2019 9:03:59 PM | | max memory usage when idle: 32445.65 MB
21/08/2019 9:03:59 PM | | max disk usage: 10.00 GB
21/08/2019 9:03:59 PM | | (to change preferences, visit a project web site or select Preferences in the Manager)
21/08/2019 9:03:59 PM | | Setting up project and slot directories
21/08/2019 9:03:59 PM | | Checking active tasks
21/08/2019 9:03:59 PM | | Setting up GUI RPC socket
21/08/2019 9:03:59 PM | | Checking presence of 1959 project files

Gary Roberts wrote:
If you only allow 2 cores for this, BOINC can only have 2 GPU tasks running concurrently.

I could have been clearer: I am running one E@H task per GPU.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 1,918
Credit: 137,225,449
RAC: 115,088

Phillip Oakes wrote:
Richard Haselgrove wrote:
There have been cases where driver problems have led to duplicate GPU detections.

I think you've hit the nail on the head. Should I just do a clean install or roll back the driver?

21/08/2019 9:03:58 PM | | Starting BOINC client version 7.14.2 for windows_x86_64
21/08/2019 9:03:58 PM | | log flags: file_xfer, sched_ops, task
21/08/2019 9:03:58 PM | | Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8
21/08/2019 9:03:58 PM | | Data directory: C:\ProgramData\BOINC
21/08/2019 9:03:58 PM | | Running under account Phil
21/08/2019 9:03:58 PM | | CUDA: NVIDIA GPU 0: GeForce GTX 1070 (driver version 431.36, CUDA version 10.1, compute capability 6.1, 4096MB, 3556MB available, 6803 GFLOPS peak)
21/08/2019 9:03:58 PM | | CUDA: NVIDIA GPU 1: GeForce GTX 970 (driver version 431.36, CUDA version 10.1, compute capability 5.2, 4096MB, 3381MB available, 4381 GFLOPS peak)
21/08/2019 9:03:58 PM | | OpenCL: NVIDIA GPU 0: GeForce GTX 1070 (driver version 431.36, device version OpenCL 1.2 CUDA, 8192MB, 3556MB available, 6803 GFLOPS peak)
21/08/2019 9:03:58 PM | | OpenCL: NVIDIA GPU 0: GeForce GTX 1070 (driver version 431.36, device version OpenCL 1.2 CUDA, 8192MB, 3556MB available, 6803 GFLOPS peak)
21/08/2019 9:03:58 PM | | OpenCL: NVIDIA GPU 1: GeForce GTX 970 (driver version 431.36, device version OpenCL 1.2 CUDA, 4096MB, 3381MB available, 4381 GFLOPS peak)
21/08/2019 9:03:58 PM | | OpenCL: NVIDIA GPU 1: GeForce GTX 970 (driver version 431.36, device version OpenCL 1.2 CUDA, 4096MB, 3381MB available, 4381 GFLOPS peak)
21/08/2019 9:03:58 PM | | OpenCL: Intel GPU 0: Intel(R) HD Graphics 530 (driver version 21.20.16.4542, device version OpenCL 2.0, 12970MB, 12970MB available, 221 GFLOPS peak)
21/08/2019 9:03:58 PM | | OpenCL CPU: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz (OpenCL driver vendor: Intel(R)

...
21/08/2019 9:03:59 PM | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.18362.00)

Yes, that would explain your symptoms. Note that the multi-detection problem only affects the OPENCL drivers for your NVidia GPUs. The CUDA drivers properly detect each of the GPUs, one time each: that explains why GPUGrid worked properly.
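The check described above can be automated. Here is a minimal Python sketch that counts how often each (API, vendor, device number, model) combination appears in the GPU detection lines of a startup log. The regular expression is an assumption based only on the line format in the log quoted above, the function name `duplicate_detections` is made up for illustration, and the sample lines are copied from that log.

```python
import re
from collections import Counter

# Matches detection lines of the form seen in the log above, e.g.
# "... | | OpenCL: NVIDIA GPU 0: GeForce GTX 1070 (driver version ..."
# The "OpenCL CPU:" line is deliberately not matched.
DETECT = re.compile(r"\|\s*(CUDA|OpenCL): (\w+) GPU (\d+): (.+?) \(driver version")


def duplicate_detections(log_text):
    """Return {(api, vendor, index, model): count} for entries listed more than once."""
    counts = Counter()
    for line in log_text.splitlines():
        match = DETECT.search(line)
        if match:
            api, vendor, index, model = match.groups()
            counts[(api, vendor, int(index), model)] += 1
    return {key: n for key, n in counts.items() if n > 1}


# Detection lines copied from the startup log posted in this thread.
sample = """\
21/08/2019 9:03:58 PM | | CUDA: NVIDIA GPU 0: GeForce GTX 1070 (driver version 431.36, CUDA version 10.1, compute capability 6.1, 4096MB, 3556MB available, 6803 GFLOPS peak)
21/08/2019 9:03:58 PM | | CUDA: NVIDIA GPU 1: GeForce GTX 970 (driver version 431.36, CUDA version 10.1, compute capability 5.2, 4096MB, 3381MB available, 4381 GFLOPS peak)
21/08/2019 9:03:58 PM | | OpenCL: NVIDIA GPU 0: GeForce GTX 1070 (driver version 431.36, device version OpenCL 1.2 CUDA, 8192MB, 3556MB available, 6803 GFLOPS peak)
21/08/2019 9:03:58 PM | | OpenCL: NVIDIA GPU 0: GeForce GTX 1070 (driver version 431.36, device version OpenCL 1.2 CUDA, 8192MB, 3556MB available, 6803 GFLOPS peak)
21/08/2019 9:03:58 PM | | OpenCL: NVIDIA GPU 1: GeForce GTX 970 (driver version 431.36, device version OpenCL 1.2 CUDA, 4096MB, 3381MB available, 4381 GFLOPS peak)
21/08/2019 9:03:58 PM | | OpenCL: NVIDIA GPU 1: GeForce GTX 970 (driver version 431.36, device version OpenCL 1.2 CUDA, 4096MB, 3381MB available, 4381 GFLOPS peak)
21/08/2019 9:03:58 PM | | OpenCL: Intel GPU 0: Intel(R) HD Graphics 530 (driver version 21.20.16.4542, device version OpenCL 2.0, 12970MB, 12970MB available, 221 GFLOPS peak)
"""

for (api, vendor, index, model), n in sorted(duplicate_detections(sample).items()):
    print(f"{api} {vendor} GPU {index} ({model}) listed {n} times")
```

On these lines only the two OpenCL NVIDIA entries are flagged. The CUDA entries and the Intel GPU each appear once.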

Notice also that you have two more separate OPENCL drivers: one for the embedded Intel GPU (HD 530), and one for the Intel CPU. The CPU driver isn't important and you could manage without it, but it tends - especially with Windows 10 - to come along for the ride automatically.

I don't think there's anything definitive to offer by way of advice - the underlying cause of the problem isn't well understood yet. In your circumstances, I think I would try:

1) Installing a new NVidia GPU driver, downloaded from the NVidia website. Make sure that you choose the 'advanced' install mode and check the box for a 'clean install', which removes the previous driver as part of the process.

But I'm not sure that this will cure the problem completely. If it persists, go on to:

2) Remove all GPU drivers, for both the NVidia cards AND for the Intel GPU, using Display Driver Uninstaller. Allow the machine to restart, and then check Windows Update. This will detect that drivers are missing and, if all follows the same path as last time I tried it, supply a CUDA driver for NVidia and an OPENCL driver for Intel - which has the happy side-effect of working on the NVidia card as well.

But Microsoft has a habit of changing the rules without notice where video drivers are concerned, and you may find yourself falling back on:

3) Remove GPU drivers using DDU, as before. Disable Windows' automatic driver management. Install each manufacturer's driver from that manufacturer's own website.

Sorry, that's the best I can offer.

Phillip Oakes
Joined: 8 Aug 19
Posts: 4
Credit: 3,177,328
RAC: 10,973

Thanks a lot for taking the time to diagnose the issue. It's late here, so I'll give it a go tomorrow and report back!

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 4,888
Credit: 29,418,171,284
RAC: 35,193,997

Phillip Oakes wrote:
Gary Roberts wrote:
If you only allow 2 cores for this, BOINC can only have 2 GPU tasks running concurrently.

I could have been clearer: I am running one E@H task per GPU.

It's my fault - you were clear enough. Somehow in my mind the fact that you had two tasks per GPU at GPUgrid translated into two per GPU at Einstein as well.  The clincher for my mental picture was that you mentioned that two CPU cores were maxed out as well, which seems a bit odd since you were only running 1 task on 1 GPU.

Sorry about that.

Cheers,
Gary.

Phillip Oakes
Joined: 8 Aug 19
Posts: 4
Credit: 3,177,328
RAC: 10,973

Sorry I'm late, it was a big weekend.

In the end, to fix this I used DDU in safe mode to remove all of my graphics drivers, then let Windows install the drivers. This caused an error saying no drivers could be installed, so I reset the machine, and after that everything was working perfectly. I re-ran Windows Update and Windows was trying to install a second graphics driver for my 1070, so I paused that and turned off Windows Update. Seems to be working well so far.

Thanks again for the help guys!
