NVidia driver 515.48.07 gives OpenCL errors for 12GB RTX 3060 on Linux

Keith Myers
Joined: 11 Feb 11
Posts: 4,921
Credit: 18,475,583,469
RAC: 5,894,924

Quote:
( edit ) The multimeter's probe head is too fat to reach down the back of the connector.

FYI, if you don't have needle tips for your probe leads, then a short croc-clipped test lead and a T-pin change the probing lead to a needle tip. Handy to get down into the backside of a leaded connector.

 

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,581
Credit: 308,317,784
RAC: 206,107

After a good month of behaving, the same error has arisen ( same host ), again with only the gamma ray pulsar binary work on the GPU. For example :

<core_client_version>7.20.2</core_client_version>
<![CDATA[
<message>
process exited with code 69 (0x45, -187)</message>
<stderr_txt>
17:14:46 (1232): [normal]: This Einstein@home App was built at: Aug 17 2021 16:19:40

17:14:46 (1232): [normal]: Start of BOINC application '../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.28_x86_64-pc-linux-gnu__FGRPopencl2Pup-nvidia'.
17:14:46 (1232): [debug]: 1e+16 fp, 7.3e+09 fp/s, 1434329 s, 398h25m28s76
17:14:46 (1232): [normal]: % CPU usage: 1.000000, GPU usage: 1.000000
command line: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.28_x86_64-pc-linux-gnu__FGRPopencl2Pup-nvidia --inputfile ../../projects/einstein.phys.uwm.edu/LATeah3012L12220730.dat --alpha 2.59819959601 --delta -0.694603692878 --skyRadius 1.890770e-06 --ldiBins 15 --f0start 836.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 1.69860773e-15 --ephemdir ../../projects/einstein.phys.uwm.edu/JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah3012L12220730_0844_27158139.dat --debug 0 -o LATeah3012L12220730_844.0_0_0.0_27158139_0_0.out
output files: 'LATeah3012L12220730_844.0_0_0.0_27158139_0_0.out' '../../projects/einstein.phys.uwm.edu/LATeah3012L12220730_844.0_0_0.0_27158139_0_0' 'LATeah3012L12220730_844.0_0_0.0_27158139_0_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah3012L12220730_844.0_0_0.0_27158139_0_1'
17:14:46 (1232): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
17:14:46 (1232): [debug]: glibc version/release: 2.33/release
17:14:46 (1232): [debug]: Set up communication with graphics process.
GPU not found: type=NVIDIA, opencl_device_index=0, device_num=0
boinc_get_opencl_ids returned [(nil) , (nil)]
Failed to get OpenCL platform/device info from BOINC (error: -1)!
initialize_ocl(): Got no suitable OpenCL device information from BOINC - boincPlatformId is NULL - boincDeviceId is NULL
initialize_ocl returned error [2004]
OCL context null
OCL queue null
Error generating generic FFT context object [5]
17:14:46 (1232): [CRITICAL]: ERROR: MAIN() returned with error '5'
FPU status flags:
mv: cannot stat 'LATeah3012L12220730_844.0_0_0.0_27158139_0_0.out': No such file or directory
mv: cannot stat 'LATeah3012L12220730_844.0_0_0.0_27158139_0_0.out.cohfu': No such file or directory
17:14:46 (1232): [normal]: done. calling boinc_finish(69).
17:14:46 (1232): called boinc_finish(69)

</stderr_txt>
]]>

.... which blew off 193 units before I caught the problem. Notably it started just after a reboot required for new/major flatpak updates ( but not the video drivers ) for the operating system. Another reboot has solved the problem, i.e. the units are completing rather than aborting after a few seconds, and one has validated. Go figure !
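A failure like this could in principle be caught after the first bad task rather than the 193rd, since the stderr output above carries a distinctive signature. The following is only a minimal sketch: the marker strings are copied from that output, while the function name `diagnose` and the way the stderr text gets fed in are assumptions for illustration, not anything BOINC provides.

```python
import re

# Marker strings copied verbatim from the stderr output quoted above.
# The function name and return shape are illustrative assumptions.
FAILURE_MARKERS = (
    "GPU not found",
    "boinc_get_opencl_ids returned [(nil) , (nil)]",
    "Failed to get OpenCL platform/device info from BOINC",
)

# Matches the final "called boinc_finish(69)" style line.
EXIT_RE = re.compile(r"called boinc_finish\((\d+)\)")


def diagnose(stderr_text):
    """Return (missing_device, exit_code) for one task's stderr text.

    missing_device is True if any known "no OpenCL device" marker appears;
    exit_code is the boinc_finish() argument, or None if not present.
    """
    missing = any(marker in stderr_text for marker in FAILURE_MARKERS)
    match = EXIT_RE.search(stderr_text)
    code = int(match.group(1)) if match else None
    return missing, code


sample = """GPU not found: type=NVIDIA, opencl_device_index=0, device_num=0
boinc_get_opencl_ids returned [(nil) , (nil)]
17:14:46 (1232): called boinc_finish(69)
"""
print(diagnose(sample))
```

If the first value comes back `True`, suspending GPU work until the host has been rebooted would probably save the rest of the queue.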

Cheers, Mike.

( edit ) Actually I think that may imply that the process of updating, say, the ICD loader runtime (libOpenCL.so) could be the issue here. I now recall that I didn't shut down BOINC before rebooting after the flatpak downloads, and IIRC the update is not complete until after the reboot. Subtle, see this diagram : [diagram not preserved]
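One way to probe the ICD loader theory after an update is to ask the loader directly how many OpenCL platforms it can see. This is a rough sketch, assuming the loader is installed under the usual Linux name `libOpenCL.so.1`; the helper name `count_opencl_platforms` is made up for illustration. `clGetPlatformIDs` is the standard OpenCL entry point, called here through `ctypes`.

```python
import ctypes


def count_opencl_platforms(lib_name="libOpenCL.so.1"):
    """Return the number of OpenCL platforms the ICD loader reports.

    Returns None if the loader library cannot be loaded, lacks the
    symbol, or the call does not return CL_SUCCESS (0).
    lib_name is an assumption about where the loader lives.
    """
    try:
        lib = ctypes.CDLL(lib_name)
        fn = lib.clGetPlatformIDs
    except (OSError, AttributeError):
        return None

    # cl_int clGetPlatformIDs(cl_uint num_entries,
    #                         cl_platform_id *platforms,
    #                         cl_uint *num_platforms)
    fn.restype = ctypes.c_int
    fn.argtypes = [ctypes.c_uint, ctypes.c_void_p,
                   ctypes.POINTER(ctypes.c_uint)]

    count = ctypes.c_uint(0)
    err = fn(0, None, ctypes.byref(count))
    if err != 0:  # anything other than CL_SUCCESS
        return None
    return count.value


result = count_opencl_platforms()
if result is None:
    print("OpenCL loader missing or reported an error")
else:
    print("OpenCL platforms visible:", result)
```

A result of `None` or `0` would point at the loader or driver side. A positive count does not prove the BOINC client will find the GPU, since the client does its own detection, but it would at least rule out a broken libOpenCL.so.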

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Tom M
Joined: 2 Feb 06
Posts: 6,279
Credit: 8,997,827,987
RAC: 12,021,434

It's wonderful how "cold boots" and re-boots seem to "fix" things :)

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)
