NVIDIA GPU problems since Windows 11 update

mcz
Joined: 18 Jul 11
Posts: 6
Credit: 70857082
RAC: 97376
Topic 226517

I've been having problems with all BOINC GPU tasks that use the NVIDIA driver since an unfortunate update to Windows 11 a couple of weeks ago. So this probably isn't Einstein-specific, but I'll ask for help here anyway. I'll mention that I've installed fresh drivers from NVIDIA (a couple of times already), on the theory that the Windows update might have replaced the NVIDIA drivers with its own, but that hasn't helped.

The Einstein jobs that use the Intel GPU driver work fine, but I realize now that all of the NVIDIA GPU tasks have failed since that update. For example, here's the section with the error from a recent failure (from https://einsteinathome.org/task/1198531885):

2021-12-01 20:37:16.9056 (11072) [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_O3AS_1.01_windows_x86_64__GW-opencl-nvidia.exe'.
Activated exception handling...
[DEBUG} GPU type: 1
[ERROR] Couldn't get OpenCL device from BOINC (-1)!
2021-12-01 20:37:17.8223 (11072) [debug]: Flags: LAL_DEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2021-12-01 20:37:17.8359 (11072) [debug]: Set up communication with graphics process.

einstein_O3AS_1.01_windows_x86_64__GW-opencl-nvidia.exe: unrecognized option `--device'


Here's what that part of the log normally looks like, from the last time one of the NVIDIA GPU jobs ran successfully (from https://einsteinathome.org/task/1192872781):

2021-11-18 11:47:54.1040 (3620) [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_O3AS_1.01_windows_x86_64__GW-opencl-nvidia.exe'.
Activated exception handling...
[DEBUG} GPU type: 1
[DEBUG} got GPU info from BOINC
[DEBUG} got VendorID 4318
2021-11-18 11:48:04.2059 (3620) [debug]: Flags: LAL_DEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2021-11-18 11:48:04.2089 (3620) [debug]: Set up communication with graphics process.
2021-11-18 11:48:04.3864 (3620) [normal]: Parsed user input successfully

While some of the other NVIDIA GPU projects (Milkyway, MLC) sometimes still run with the NVIDIA driver, they don't run reliably anymore. Eventually something happens, usually within a few hours, and then all NVIDIA jobs hang or fail until the next reboot. Then the cycle repeats.

Any suggestions for something else to try? Or just wait until the next Windows update? Apologies if I missed an explanation in an earlier thread.

--Martin

San-Fernando-Valley
Joined: 16 Mar 16
Posts: 264
Credit: 7185948261
RAC: 9802396

Driver Problem?

Try 497.09, released Dec 1st.

Maybe that is the solution!

Please remember that O3AS WUs usually need more than 2GB, but less than 4GB, of GPU memory.

Not always; it depends on the WU.

Have a nice day

mcz
Joined: 18 Jul 11
Posts: 6
Credit: 70857082
RAC: 97376

Thanks for the info on that NVIDIA update; I'd missed it. Unfortunately, it doesn't seem to have helped. Even after installing the 1Dec21 drivers and rebooting (multiple times now), Einstein isn't working. Same error message:

2021-12-03 13:49:22.4161 (11132) [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_O3AS_1.01_windows_x86_64__GW-opencl-nvidia.exe'.
Activated exception handling...
[DEBUG} GPU type: 1
[ERROR] Couldn't get OpenCL device from BOINC (-1)!
2021-12-03 13:49:23.3036 (11132) [debug]: Flags: LAL_DEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2021-12-03 13:49:23.3192 (11132) [debug]: Set up communication with graphics process.

(from https://einsteinathome.org/task/1198531885).

I don't think it's a lack of memory; if it were, I'd expect the "Can't allocate memory" error message instead. But maybe something's changed.

Milkyway does sometimes work after a reboot, but not always.

Jonathan
Joined: 4 Oct 17
Posts: 15
Credit: 16939851
RAC: 821

Have you tried uninstalling the graphics driver completely and rebooting?

Keith Myers
Joined: 11 Feb 11
Posts: 4753
Credit: 17702320413
RAC: 5400358

When graphics driver problems crop up, it is usually best to clean the drivers out completely with the DDU uninstaller, to get rid of any lingering remnants that would otherwise keep fouling up new installations.

Wagnard DDU Uninstaller

mikey
Joined: 22 Jan 05
Posts: 11968
Credit: 1833776442
RAC: 224751

mcz wrote:

Thanks for the info on that NVIDIA update; I'd missed that. Unfortunately, it doesn't seem to have helped. Even after installing the 1Dec21 drivers and rebooting (multiple times now), Einstein isn't working. Same error message

2021-12-03 13:49:22.4161 (11132) [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_O3AS_1.01_windows_x86_64__GW-opencl-nvidia.exe'.
Activated exception handling...
[DEBUG} GPU type: 1
[ERROR] Couldn't get OpenCL device from BOINC (-1)!
2021-12-03 13:49:23.3036 (11132) [debug]: Flags: LAL_DEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2021-12-03 13:49:23.3192 (11132) [debug]: Set up communication with graphics process.

(from https://einsteinathome.org/task/1198531885).

I don't think it's lack of memory; if it was I'd expect the error message to be the one about: Can't allocate memory. But maybe something's changed.

Milkyway does work sometimes after a reboot. Not always. 

Your graphics card only has 2GB of memory on it; don't the O3AS tasks require a GPU with at least 4GB?

GWGeorge007
Joined: 8 Jan 18
Posts: 2823
Credit: 4633382195
RAC: 3668793

Keith Myers wrote:

When graphics driver problems crop up, it is usually best to clean them out completely with the DDU uninstaller to get rid of any lingering remnants that will continue to foul up new reinstallations.

Wagnard DDU Uninstaller

Good info to know. I was unaware that a program existed to wipe the video driver clean for a fresh install of a new one. Does this also work for Linux? The website doesn't say anything about Linux, so I expect not.

George

Proud member of the Old Farts Association

Keith Myers
Joined: 11 Feb 11
Posts: 4753
Credit: 17702320413
RAC: 5400358

No, the DDU uninstaller is for Windows only. On Linux (Debian/Ubuntu-based distributions), you need to use apt purge.

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

mikey wrote:
Your graphics card only has 2gb of memory on it don't the O3AS tasks require a gpu with at least 4gb?

They often did (earlier this year), but not anymore. NVIDIA cards with 2GB work well these days.

https://einsteinathome.org/host/12768123/tasks/0/56

NVIDIA GeForce GTX 960 (2047MB)

peak VRAM usage: 869.2 MB

mikey
mikey
Joined: 22 Jan 05
Posts: 11968
Credit: 1833776442
RAC: 224751

Richie wrote:

mikey wrote:
Your graphics card only has 2gb of memory on it don't the O3AS tasks require a gpu with at least 4gb?

Many times they used to (earlier this year)... but not anymore. Nvidia 2GB works well these days.

https://einsteinathome.org/host/12768123/tasks/0/56

NVIDIA GeForce GTX 960 (2047MB)

peak VRAM usage: 869.2 MB 

Thanks, that should help a lot of people, and cut down on the constant questions about why it didn't work.

mcz
Joined: 18 Jul 11
Posts: 6
Credit: 70857082
RAC: 97376

As a follow-up: NVIDIA released another driver update (on 20Dec21, but I only learned about it today). It probably doesn't really fix the problem, but I did learn something.

I did a clean install of the new driver, uninstalling the old one first (not with the Wagnard uninstaller yet, just NVIDIA's own installer software). Then I changed the BOINC client computing preferences so that I wouldn't suddenly get 24 Einstein jobs that all fail immediately, which would cause Einstein to stop sending workunits of any kind for a day.

The first couple of NVIDIA tasks were GRPB, and they actually finished without problems. (I haven't received credit yet, but I'm hopeful once they're validated.) Then I got a GWO3AS task, which hit a computation error after about 3 minutes, as in previous attempts. This time, though, I could see that the error was a memory allocation failure:

XLAL Error - XLALOpenCLMemcpy (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/lib/GPUUtils/OpenCLUtils.c:312): Transferring host memory to GPU failed with OpenCL error: CL_MEM_OBJECT_ALLOCATION_FAILURE

(from https://einsteinathome.org/task/1212200757)

After that, every job trying to use the NVIDIA driver failed (not just Einstein jobs, but everything else as well) until I rebooted. The other GWO3AS jobs that tried to run hit a different error, not a memory allocation one:

einstein_O3AS_1.01_windows_x86_64__GW-opencl-nvidia.exe: unrecognized option `--device'

So I currently suspect that 1) I don't have enough GPU memory to run (some) GWO3AS jobs, and 2) the bad part is that when this happens, it leaves the laptop in a bad state, essentially unusable for any NVIDIA GPU job until a reboot. So I'm holding off on running Einstein jobs on the NVIDIA GPU for now (I'm still running them on the Intel GPU, though). Excluding GWO3AS tasks in the future is one possible alternative.
