NVidia driver 515.48.07 gives OpenCL errors for 12GB RTX 3060 on Linux

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3,745
Credit: 35,590,059,437
RAC: 36,515,937

Yes that’s true. 
 

3060Ti - December 2nd 2020

3060 - February 25th 2021 

 

 

_________________________________________________________________________

Keith Myers
Joined: 11 Feb 11
Posts: 4,777
Credit: 17,791,857,302
RAC: 3,814,624

So my original idea of backleveling to the 470 series drivers was a good one for Mike.

 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3,745
Credit: 35,590,059,437
RAC: 36,515,937

I don’t think his issue is the drivers. It looks like the GPU is dropping off the PCIe bus (OpenCL device missing). A reboot brings it back, I bet.

How’s the power and thermal situation? Using risers? What motherboard? Which slot on the motherboard? What PCIe gen is the slot/card running?
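The questions above (power, temperature, PCIe generation) can be answered from the command line. Below is a minimal sketch, assuming `nvidia-smi` is installed and on the PATH; the helper names `parse_gpu_line` and `query_gpus` are made up for illustration. If the card has dropped off the bus, `nvidia-smi` would typically fail or list no device, and `query_gpus` would raise or return an empty list.

```python
import subprocess

# Fields passed to `nvidia-smi --query-gpu=...`
FIELDS = [
    "pcie.link.gen.current",    # PCIe generation currently negotiated
    "pcie.link.width.current",  # number of PCIe lanes in use
    "temperature.gpu",          # core temperature in degrees C
    "power.draw",               # board power draw in watts
    "utilization.gpu",          # GPU load in percent
]


def parse_gpu_line(line):
    """Turn one CSV line of nvidia-smi output into a field -> value dict.

    Values are kept as strings because some fields may read "[N/A]".
    """
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def query_gpus():
    """Run nvidia-smi once and return one dict per detected GPU."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return [parse_gpu_line(l) for l in out.splitlines() if l.strip()]
```

Calling `query_gpus()` every few seconds while a task is crunching would show whether the link generation, temperature or power draw changes shortly before the device disappears.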

_________________________________________________________________________

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,550
Credit: 288,499,208
RAC: 71,615

Keith Myers wrote:

So my original idea of backleveling to the 470 series drivers was a good one for Mike.

Indeed! There have now been 16 consecutive successful FGRPB1G WUs finishing, validating and being awarded credit since I reverted the driver. Looking good, e.g.:

12:30:25 (747): [debug]: Set up communication with graphics process.
boinc_get_opencl_ids returned [0x1659b30 , 0x16598f0] 
Using OpenCL platform provided by: NVIDIA Corporation
Using OpenCL device "NVIDIA GeForce RTX 3060" by: NVIDIA Corporation
Max allocation limit: 3159015424
Global mem size: 12636061696
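As a side note on the two memory figures in that log: a quick arithmetic check (a sketch using only the numbers printed above) suggests the maximum single allocation is exactly one quarter of the reported global memory, which appears to match the usual minimum an OpenCL device reports for `CL_DEVICE_MAX_MEM_ALLOC_SIZE`.

```python
# Values copied from the task log above (bytes)
MAX_ALLOC_BYTES = 3159015424
GLOBAL_MEM_BYTES = 12636061696

GIB = 1024 ** 3  # bytes per GiB


def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / GIB


# Fraction of global memory that a single buffer may occupy
ratio = MAX_ALLOC_BYTES / GLOBAL_MEM_BYTES

print(f"Global memory:         {to_gib(GLOBAL_MEM_BYTES):.2f} GiB")
print(f"Max single allocation: {to_gib(MAX_ALLOC_BYTES):.2f} GiB")
print(f"Ratio:                 {ratio:.2f}")
```

So the full 12 GB card is visible to the application under the 470 driver, at roughly 11.77 GiB.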

Thanks everyone for all your input.

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,550
Credit: 288,499,208
RAC: 71,615

Ian&Steve C. wrote:

I don’t think his issue is the drivers. It looks like the GPU is dropping off the PCIe bus (OpenCL device missing). A reboot brings it back, I bet.

How’s the power and thermal situation?

About 60 - 70 °C at 80 - 90% load. Power I don't know.

Quote:
Using risers?

Nope.

Quote:
What motherboard?

Gigabyte B550 Aorus Pro AX ( rev 1.0 )

Quote:
Which slot on the motherboard?

The first ( nearest to CPU ) of three, other two are empty. But there are M.2 slots ( both occupied ) either side of that first slot.

Quote:
What PCIe gen is the slot/card running?

4.0

Hmm, I am rebooting after driver changes .....

I'll watch and wait.

Cheers, Mike.

( edit ) I drop the 3060 temp to ~ 50 °C by setting the fans x2 to max speed.

Keith Myers
Joined: 11 Feb 11
Posts: 4,777
Credit: 17,791,857,302
RAC: 3,814,624

Sounds like the 470 backlevel was the ticket.

The 515 drivers are showing up in a lot of problem tickets across numerous forums.

They did a major rewrite to shift focus to AI and cloud computing clusters for the series.

That appears to be antithetical to standard BOINC computing.

 

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,550
Credit: 288,499,208
RAC: 71,615

Yep it's working fine, now 32 consecutive successful FGRPB1G WUs finishing, validating and being awarded credit since I reverted the driver.

It's actually fascinating to watch the GPU load and temperature in one ( Coolero ) window vs the work unit progress in another ( BOINC Task pane ). The first ( notionally ) ~ 90% of the workunit goes at 80 - 90% load @ 60 °C, the remaining 10% of the workunit at 20 - 60% load and temp down to 45 °C. Check this out :

Do I see exponential rise and decay in the GPU temps ? You can even see a little downward notch in load & temp during the reload to a new WU.

Cheers, Mike.

Keith Myers
Joined: 11 Feb 11
Posts: 4,777
Credit: 17,791,857,302
RAC: 3,814,624

The last 10% of the task shifts computation to the CPU to calculate the toplist.

So utilization on the GPU drops to almost nothing.

 

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,550
Credit: 288,499,208
RAC: 71,615

Yeah the first 90% is essentially an FFT calculation, but in the toplist phase the GPU load varies widely, maybe from shuffling data back to general memory space ?

Also it clearly shows that the last notional 10% of the WU is actually over 20% of the time.

Cheers, Mike.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,851
Credit: 110,837,673,197
RAC: 33,736,632

A very impressive visualisation of the overall crunching behaviour!

For the rates of rise/fall in GPU temps, I think the word you're looking for is asymptotic rather than exponential :-).
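For what it's worth, the two words need not be in conflict. A minimal sketch, assuming a simple lumped thermal model (Newton's law of cooling) in which the GPU dissipates a constant power $P$ through a fixed thermal resistance $R$ with time constant $\tau$; all of these symbols are assumptions for illustration, not measured values:

```latex
T(t) = T_\infty + \left(T_0 - T_\infty\right) e^{-t/\tau},
\qquad
T_\infty = T_{\mathrm{amb}} + P R,
\qquad
\lim_{t \to \infty} T(t) = T_\infty .
```

Under that model the temperature changes exponentially in time while approaching the steady-state value $T_\infty$ as an asymptote, so each load change would start a new curve of this shape from the current temperature $T_0$.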

Since the followup stage (last 10%) involves recalculating (on the GPU, not the CPU) the top 10 candidates (toplist) using double precision, the lower GPU load (and temperature) is a function of the relative paucity of the double precision hardware being used.  The single precision hardware would be idling during this time so the load being measured (on average) drops.

I was interested to count the peak/trough cycles that show so nicely.  Exactly 1 for each of the ten candidates.  The bigger drop to almost zero load marks the transition between tasks.

There are similar cycles in load for the main (90%) stage.  I reckon if you counted those, they would add up to the number of skypoints in the task being analysed :-).
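Counting those cycles could be automated if the load samples were exported from the monitoring tool. A minimal sketch, assuming the trace is available as a plain list of load percentages; the function name `count_dips`, the threshold values and the sample data are all hypothetical:

```python
def count_dips(samples, threshold):
    """Count how often the trace drops below `threshold`.

    A dip is counted once, at the first sample below the threshold,
    and is not counted again until the trace has risen back to or
    above the threshold. A trace that starts below the threshold
    counts as one dip.
    """
    dips = 0
    was_below = False
    for value in samples:
        is_below = value < threshold
        if is_below and not was_below:
            dips += 1
        was_below = is_below
    return dips


# Made-up GPU load trace in percent, for illustration only
trace = [85, 88, 40, 35, 86, 87, 30, 84, 5, 86]

print(count_dips(trace, 50))  # shallow and deep dips together
print(count_dips(trace, 10))  # only the near-zero drop between tasks
```

With two thresholds, the shallow dips (one per candidate or skypoint, if the reading above is right) can be separated from the deep drop that marks a task transition.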

That software seems to be doing a very nice job of following exactly what is happening.  Thanks very much for sharing.

Cheers,
Gary.
