Wow, it really was that easy! Got the symlinks, restarted WSL2 and bam, BOINC sees the GPU. Many thanks for that find!
Successfully completed a task with all 4 of the Linux CUDA GPU apps, including the custom BRP7 one. One even validated already. Times were unimpressive but I also had work briefly running or paused on the Windows side at the time. Once I drain the Windows cache, I'll try again with better resource allocation. But the fact that it works is great in itself.
One interesting observation is that BRP7 CPU usage is much higher in the WSL2 setup than on regular Linux; see here.
To improve the performance under WSL, I'm going to try the things mentioned Here
That link is to the project's home page. What did you mean it to be?
Leveling up CUDA Performance on WSL2 with New Enhancements | NVIDIA Technical Blog Here
That'd be cool if you could improve WSL2 performance. If I understood things correctly, it sounds like WSL2 overhead is where the biggest slowdowns occur. I believe hardware-accelerated GPU scheduling is on by default in Windows. Mine is on.
Here's what I found trying out BRP7 1x on WSL2 Ubuntu 22.04 on a Windows 10 PC. Using a sample of 10, your latest app was fastest, 244 sec. avg. Official Linux CUDA (v.16) - 285 sec. Windows CUDA (v.19) - 290 sec. I'd need a larger sample size to be confident that Linux (v.16) is faster than Windows (v.19). The interesting thing is that both Linux apps show a lot more CPU usage than on Windows or regular Linux.
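For anyone wanting to check whether a gap as small as 285 vs. 290 sec. is real, here is a minimal sketch of a Welch's t statistic over per-task run times. It assumes you have the individual task times, which were not posted in this thread; the two lists in the usage lines are made-up placeholders, not measurements, and `welch_t` is just an illustrative helper name.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples with unequal variance."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # statistics.variance is the sample variance (n - 1 in the denominator)
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


# Placeholder run times in seconds (hypothetical, NOT taken from the posts above)
linux_v16_times = [280, 288, 291, 283, 286]
windows_v19_times = [287, 294, 289, 292, 288]

t_value = welch_t(linux_v16_times, windows_v19_times)
print(f"Welch t = {t_value:.2f}")
```

A |t| well below about 2 suggests the difference could easily be noise at that sample size, which is why a larger sample would help.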
Running 2x is much worse in all cases. Because it was obviously way worse, I only did 6 tasks at 2x, staggered at 50%. Your app was 454 sec. avg. per task (effective, not actual), v.16 - 414 sec. I believe Windows v.19 was ~420 sec.
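The 1x and 2x numbers above can be compared with a bit of arithmetic. This is a rough sketch that assumes "effective" time means wall-clock time per task divided by the number of tasks running concurrently; the function and dictionary names are only illustrative, and the Windows 2x figure is the approximate ~420 sec. quoted above.

```python
def effective_time(wall_seconds, concurrency):
    """Per-task time once concurrent tasks are accounted for (assumed definition)."""
    return wall_seconds / concurrency


def percent_faster(baseline, candidate):
    """How much faster candidate is than baseline, as a percentage of baseline."""
    return (baseline - candidate) / baseline * 100.0


# Averages in seconds, as reported in the posts above
one_x = {"custom": 244, "linux_v16": 285, "windows_v19": 290}
two_x_effective = {"custom": 454, "linux_v16": 414, "windows_v19": 420}

for app, t1 in one_x.items():
    t2 = two_x_effective[app]
    # A ratio above 1.0 means running 2x costs throughput instead of gaining it
    print(f"{app}: 1x {t1}s, 2x effective {t2}s, ratio {t2 / t1:.2f}")

print(f"custom vs linux_v16 at 1x: {percent_faster(285, 244):.1f}% faster")
```

By this measure every app loses throughput at 2x under WSL2, and the custom app loses the most.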
Trying out the O3AS app now to see how that one does under WSL2.
Got some O3AS tasks processed under WSL2 Ubuntu, and both CUDA versions, 1.16 & 1.17, fared worse than the Windows OpenCL app. 1.17 seems to be 1-2% faster than 1.16, which I think is too small a difference to be certain of without a large sample size. Running 2x is advantageous but is also worse than 2x on Windows.
The Windows app was ~22% faster than those two.
In general, O3AS is worse and BRP7 is about even on WSL2 vs. Windows. I think this is due to WSL2 overhead when it comes to GPU processing, which seems too much to overcome for stock apps. Not impossible though, as Petri's BRP7 app runs much faster on WSL2; it shaved off at least 30 sec. for me. 2x is still much worse with any BRP7 app.
Since I like to run both projects, I'll have to switch manually every few days.