My original idea of backleveling to the 470 series drivers was a good one for Mike, then.
Indeed! There have now been 16 consecutive FGRPB1G WUs finishing, validating and being awarded credit since I reverted the driver. Looking good, i.e.:
12:30:25 (747): [debug]: Set up communication with graphics process.
boinc_get_opencl_ids returned [0x1659b30 , 0x16598f0]
Using OpenCL platform provided by: NVIDIA Corporation
Using OpenCL device "NVIDIA GeForce RTX 3060" by: NVIDIA Corporation
Max allocation limit: 3159015424
Global mem size: 12636061696
Thanks everyone for all your input.
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
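Incidentally, the two memory figures in that log are linked: 3159015424 is exactly a quarter of 12636061696. That is consistent with the OpenCL 1.2 specification, where the minimum allowed value of CL_DEVICE_MAX_MEM_ALLOC_SIZE is max(1/4 of CL_DEVICE_GLOBAL_MEM_SIZE, 128 MB). Here the driver appears to report exactly that minimum. Below is a quick Python check of the arithmetic, using the log lines above. The regexes and the `parse_mem` name are illustrative only and are not part of BOINC or the Einstein@Home app.

```python
import re

# The two OpenCL memory lines from the BOINC stderr output quoted above.
LOG = """\
Max allocation limit: 3159015424
Global mem size: 12636061696
"""


def parse_mem(log: str) -> dict:
    """Extract the max single-allocation size and global memory size (bytes)."""
    max_alloc = int(re.search(r"Max allocation limit:\s*(\d+)", log).group(1))
    global_mem = int(re.search(r"Global mem size:\s*(\d+)", log).group(1))
    return {"max_alloc": max_alloc, "global_mem": global_mem}


mem = parse_mem(LOG)
GIB = 1024 ** 3
ratio = mem["max_alloc"] / mem["global_mem"]

print(f"global memory : {mem['global_mem'] / GIB:.2f} GiB")
print(f"max allocation: {mem['max_alloc'] / GIB:.2f} GiB")
print(f"ratio         : {ratio}")
```

It prints about 11.77 GiB of global memory, a 2.94 GiB allocation cap and a ratio of 0.25.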
Yes that’s true.
3060Ti - December 2nd 2020
3060 - February 25th 2021
_________________________________________________________________________
My original idea of backleveling to the 470 series drivers was a good one for Mike, then.
I don’t think his issue is the drivers. It looks like the GPU is dropping off the PCIe bus (OpenCL device missing). A reboot brings it back, I bet.
How’s the power and thermal situation? Using risers? What motherboard? Which slot on the motherboard? What PCIe gen is the slot/card running?
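Most of those questions can be answered from the command line with nvidia-smi's query mode. The following is a minimal Python sketch that assumes an NVIDIA driver with `nvidia-smi` on the PATH. The field names are standard `--query-gpu` properties (`nvidia-smi --help-query-gpu` lists what your driver supports). The helper names `parse_query_output` and `query_gpus` are hypothetical.

```python
import shutil
import subprocess

# Standard nvidia-smi --query-gpu property names; order defines the CSV columns.
FIELDS = [
    "name",
    "driver_version",
    "temperature.gpu",
    "power.draw",
    "utilization.gpu",
    "pcie.link.gen.current",
    "pcie.link.width.current",
]


def parse_query_output(text: str) -> list[dict]:
    """Turn 'csv,noheader,nounits' output into one dict per GPU.

    Assumes no field contains a comma (true for typical GeForce names).
    """
    gpus = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        gpus.append(dict(zip(FIELDS, values)))
    return gpus


def query_gpus() -> list[dict]:
    """Run nvidia-smi once and return parsed readings for every GPU it sees."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(FIELDS)}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_query_output(out)


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; is the NVIDIA driver installed?")
    else:
        for gpu in query_gpus():
            print(gpu)
```

The current PCIe link generation is often lowered at idle to save power, so read `pcie.link.gen.current` while a task is running (`pcie.link.gen.max` shows what can be negotiated). If the card has dropped off the bus, nvidia-smi typically exits with an error, which `check=True` turns into a `CalledProcessError`.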
Keith Myers wrote: My ...
Indeed! There have now been 16 consecutive FGRPB1G WUs finishing, validating and being awarded credit since I reverted the driver. Looking good.
Thanks everyone for all your input.
Cheers, Mike.
Ian&Steve C. wrote: I don’t ...
Thermals: about 60 - 70 °C at 80 - 90% load. Power I don't know.
Risers: nope.
Motherboard: Gigabyte B550 Aorus Pro AX (rev 1.0).
Slot: the first (nearest to the CPU) of three; the other two are empty. But there are M.2 slots (both occupied) either side of that first slot.
PCIe gen: 4.0.
Reboot: hmm, I do reboot after driver changes ...
I'll watch and wait.
Cheers, Mike.
(Edit) I can drop the 3060 temp to ~50 °C by setting both fans to max speed.
Sounds like the 470 backlevel was the ticket.
The 515 drivers are showing up in a lot of problem tickets across numerous forums.
For that series NVIDIA did a major rewrite to shift the focus toward AI and cloud-computing clusters.
That appears to be antithetical to standard BOINC computing.
Yep, it's working fine: now 32 consecutive FGRPB1G WUs finishing, validating and being awarded credit since I reverted the driver.
It's actually fascinating to watch the GPU load and temperature in one (Coolero) window vs the work unit progress in another (BOINC Task pane). The first (notionally) ~90% of the workunit runs at 80 - 90% load @ 60 °C, the remaining 10% at 20 - 60% load with the temp down to 45 °C. Check this out:
Do I see exponential rise and decay in the GPU temps? You can even see a little downward notch in load & temp during the reload to a new WU.
Cheers, Mike.
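A common first-order way to model this treats the card as a single thermal mass, Newton's-law-of-cooling style. This is an assumption on my part, not something Coolero reports. After a step change in load, the temperature relaxes toward a new steady value as T(t) = T_final + (T_start - T_final)·e^(-t/τ). That curve is exponential in form and approaches T_final asymptotically. The Python sketch below uses the ~60 °C and ~45 °C plateaus mentioned above; the time constant τ is purely hypothetical.

```python
import math


def step_response(t: float, t_start: float, t_final: float, tau: float) -> float:
    """Temperature t seconds after a load step, single-thermal-mass model."""
    return t_final + (t_start - t_final) * math.exp(-t / tau)


# Plateau temps from the post; TAU is an illustrative guess, not measured.
HOT, COOL, TAU = 60.0, 45.0, 30.0  # °C, °C, seconds

# Rise when a WU's FFT phase starts, decay when the toplist phase begins.
for t in (0, 30, 60, 90, 150):
    rise = step_response(t, COOL, HOT, TAU)
    fall = step_response(t, HOT, COOL, TAU)
    print(f"t={t:>3}s  rise={rise:5.1f} °C  fall={fall:5.1f} °C")
```

After one τ the gap to the new plateau has closed by about 63%, and after five τ by over 99%. One way to check the model against a logged trace is to plot ln|T - T_final| against time. If the model fits, the points lie on a straight line with slope -1/τ.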
The last 10% of the task shifts computation to the CPU to calculate the toplist.
So utilization on the GPU drops to almost nothing.
Yeah, the first 90% is essentially an FFT calculation, but in the toplist phase the GPU load varies widely; maybe it's shuffling data back to general memory space?
It also clearly shows that the notional last 10% of the WU actually takes over 20% of the time.
Cheers, Mike.
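That time share can be worked out from progress readings. The sketch below assumes you have logged (elapsed seconds, fraction done) pairs, for example by noting the BOINC task pane at intervals. The sample numbers are invented to show the arithmetic and are not Mike's data. The crossing point is found by linear interpolation between readings.

```python
def time_share_after(samples: list[tuple[float, float]], threshold: float = 0.9) -> float:
    """Fraction of total run time spent after progress first reaches `threshold`.

    `samples` are (elapsed_seconds, fraction_done) pairs in time order, starting
    below the threshold and ending when the task finishes. The time at which
    progress crosses the threshold is linearly interpolated.
    """
    total = samples[-1][0]
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if p0 < threshold <= p1:
            t_cross = t0 + (threshold - p0) * (t1 - t0) / (p1 - p0)
            return (total - t_cross) / total
    raise ValueError("progress never reached the threshold")


# Hypothetical readings: 90% reported at 480 s, task finished at 620 s.
readings = [(0, 0.0), (240, 0.45), (480, 0.90), (550, 0.95), (620, 1.0)]
print(f"{time_share_after(readings):.1%} of the run is the last 10% of progress")
```

With these made-up readings the last 10% of progress takes 140 of 620 seconds, roughly 22.6% of the run. That is the kind of mismatch the Coolero trace shows.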
A very impressive visualisation of the overall crunching behaviour!
For the rates of rise/fall in GPU temps, I think the word you're looking for is asymptotic rather than exponential :-).
Since the follow-up stage (last 10%) involves recalculating (on the GPU, not the CPU) the top 10 candidates (toplist) using double precision, the lower GPU load (and temperature) is a function of the relative paucity of the double-precision hardware being used. The single-precision hardware would be idling during this time, so the measured load (on average) drops.
I was interested to count the peak/trough cycles that show so nicely. Exactly 1 for each of the ten candidates. The bigger drop to almost zero load marks the transition between tasks.
There are similar cycles in load for the main (90%) stage. I reckon if you counted those, they would add up to the number of skypoints in the task being analysed :-).
That software seems to be doing a very nice job of following exactly what is happening. Thanks very much for sharing.
Cheers,
Gary.
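Counting those cycles by eye works, but a script can do the same on a logged load trace. This is a minimal sketch using a simple hysteresis counter: a cycle is counted each time load falls below a low threshold after having been above a high one. The thresholds and the sample trace are hypothetical and would need tuning against a real trace. No particular Coolero export format is assumed; any list of load percentages will do.

```python
def count_dips(load: list[float], high: float = 50.0, low: float = 30.0) -> int:
    """Count peak-to-trough cycles in a GPU load trace using hysteresis.

    A dip is counted when load drops below `low` after having been at or
    above `high`. The gap between the two thresholds keeps small sensor
    wobbles from being counted as extra cycles.
    """
    dips = 0
    armed = False  # True once load has climbed to `high` since the last dip
    for value in load:
        if value >= high:
            armed = True
        elif value < low and armed:
            dips += 1
            armed = False
    return dips


# Hypothetical toplist-phase trace: ten bursts of ~60% load separated by troughs.
trace = [20.0, 60.0, 58.0, 22.0] * 10
print(count_dips(trace))  # prints 10
```

If the explanation above is right, running this over just the toplist phase of a real trace should give 10, one per candidate.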