"Faster" probably isn't the best description and might be a source of confusion.
"More productive" is probably a better way to describe the benefit. The tasks are slower in raw crunch time per task, but you're running them concurrently, so you end up getting more work done, which results in more points per day.
Keith Myers wrote: When you run 2X or 3X, the elapsed times shown for a task need to be divided by the number of concurrent tasks to get the 'effective' elapsed time. So your 420+ second tasks are actually completing in 210 seconds each, IOW faster ("more productive") than your 270 second tasks at 1X.
The 420+ sec. figure is already the halved one; my explanation seems to have been only half as good, though.
At 1X I get 4.5-5 min. actual run time per task. When doubling up, the actual run time per task goes to ~15+ min., so it about triples. Half of that makes ~7.5 min. (450 sec.) effective time per task.
The other users who posted their times seem to be running singles at ~7 min. per task, which seems peculiar to me.
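The arithmetic in the two posts above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the inputs are the rounded figures quoted in the thread (420 s at 2X and 270 s at 1X in the first case, ~15 min at 2X and 4.5 min at 1X in the second).

```python
SECONDS_PER_DAY = 86400


def effective_seconds(wall_seconds: float, concurrent_tasks: int) -> float:
    """Wall-clock time of one task divided by how many tasks run at once."""
    if concurrent_tasks < 1:
        raise ValueError("concurrent_tasks must be at least 1")
    return wall_seconds / concurrent_tasks


def tasks_per_day(wall_seconds: float, concurrent_tasks: int) -> float:
    """Tasks one GPU would finish per day at a steady effective time."""
    return SECONDS_PER_DAY / effective_seconds(wall_seconds, concurrent_tasks)


# (label, wall-clock seconds per task, tasks run concurrently)
cases = [
    ("first example, 2X", 420, 2),
    ("first example, 1X", 270, 1),
    ("second example, 2X", 15 * 60, 2),
    ("second example, 1X", 4.5 * 60, 1),
]

for label, wall, n in cases:
    eff = effective_seconds(wall, n)
    print(f"{label}: {eff:.0f} s effective, {tasks_per_day(wall, n):.0f} tasks/day")
```

In the first example doubling up wins (210 s effective against 270 s). In the second it loses (450 s effective against 270 s), which is why running 2X only pays off when the per-task wall time grows by less than a factor of two.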
I've got a host https://einsteinathome.org/fi/host/13193216 that has three kinds of GPUs.
They are all running three BRP7 tasks simultaneously. Air cooled. Power limited. CUDA MPS 45%.
Wall clock times (about):
TITAN V about 10 min (600 s), 140 W
3080 Ti about 7 min 40 s (460 s), 338 W
2080 Ti under 12 min (700 s), 220 W
Effective run times (about), i.e. wall clock time divided by three:
TITAN V 200 s (3 min 20 s)
3080 Ti 150-160 s (about 2 min 35 s)
2080 Ti 240 s (4 min)
--
1) The high number of invalids and errors comes from the fact that I do development work and I need to run a lot of test runs. Sometimes an error is revealed too late and it does not manifest itself in the off-line test runs.
2) Oh how I'd like to have a separate tab for inconclusive tasks and that all lists were correctly sorted. Something like "https://einsteinathome.org/fi/host/13193216/tasks/4/0?sort=desc&order=Reported" but with correct ordering.
It would be nice to get similar production on Windows. Overclocking the TITAN V RAM a bit shaved off 30 sec. for me, but that's all the improvement I know how to get. I tried Linux via WSL2, but BOINC doesn't recognize the GPU in that setup. Do you know if it can be made to?
I agree that sorting by Reported would be helpful sometimes.
GPU detection in WSL2 with BOINC is an issue, but it's really a problem with the symlinks and can be fixed easily.
Credit to the user who posted the fix: https://github.com/microsoft/WSL/issues/5663#issuecomment-1068499676
I haven't tested our app specifically, but I'd be surprised if it doesn't work. I have seen some apps that won't work in WSL, though, and require a native Linux environment. The way WSL interacts with the GPU is a little weird: it shares the GPU driver from Windows, so you are limited to the capabilities of the Windows driver. Some features available in the Linux driver are not available in Windows. WSL translates Linux driver calls into the equivalent Windows driver calls, but in some rare cases there is no equivalent.
Feel free to try it. Here's the link to the latest custom Linux BRP7 build: https://drive.google.com/file/d/10fDUDuJulctG_gaqMemyD950QAIRyYVI/view?usp=sharing
You will need to create an app_info.xml file and run Anonymous Platform for this to work.
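Before and after applying the symlink fix, it can help to see what the CUDA library entries inside WSL actually look like. The following is a read-only sketch in Python, not the fix itself: it assumes the WSL GPU libraries live in /usr/lib/wsl/lib (the usual location, but check your own install), and the helper name `describe_libcuda` is made up for this example.

```python
import os


def describe_libcuda(lib_dir: str) -> dict:
    """Report each libcuda.so* entry in lib_dir as a symlink or a regular file.

    Nothing is modified; this only inspects the directory.
    """
    report = {}
    if not os.path.isdir(lib_dir):
        return report  # e.g. not running inside WSL
    for name in sorted(os.listdir(lib_dir)):
        if not name.startswith("libcuda.so"):
            continue
        path = os.path.join(lib_dir, name)
        if os.path.islink(path):
            report[name] = "symlink -> " + os.readlink(path)
        else:
            report[name] = "regular file"
    return report


if __name__ == "__main__":
    # Assumed location of the WSL GPU libraries.
    for name, kind in describe_libcuda("/usr/lib/wsl/lib").items():
        print(f"{name}: {kind}")
```

If the shorter names (libcuda.so, libcuda.so.1) show up as regular files rather than symlinks, that matches the situation the linked fix addresses. Remember that WSL has to be restarted after the fix before the output changes.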
Thank you, I'll try it out. Even if stock CUDA app works, it should still be an improvement from what I've read. If all goes really well, the custom app with the RAM overclock might produce record setting times. :-)
MPS, is that part of the custom app or is it something I'll have to learn about and install and run separately?
I've got a feeling that WSL(2) does not allow access to CUDA compute, at least not in a way that can be used by E@h programs.
I cannot use RAM overclocking because of the heat issues :(
Hi,
I can imagine the heat issues with all those GPUs. :-)
Since you do development, do you know why overclocking GPU clock didn't make a difference but overclocking GPU RAM clock made a significant one? Just curious.
I'll try out the WSL2 GPU fix posted by another user above, hopefully it'll work.
There are times when ...
- the limiting factor is RAM speed: either many sequential reads (bandwidth) or random access (latency). This is what you experienced.
And sometimes ...
- the limiting factor is GPU floating point operations (sin, cos, sqrt, div), especially double precision math.
- divergent execution caused by if/switch/loop statements hurts performance.
- performance is hurt by dependency latency, where the next instruction has to wait for the result of the previous one. Instruction-level parallelism can on occasion nearly double the performance.
- the problem being calculated (the dataset) is too small to be well suited to parallel computation.
- results are needed back from the GPU so the CPU can decide what to do next (transfer latency and bandwidth, plus starting new work after communicating back to the GPU).
- the chosen algorithm is not well suited to parallel execution.
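The first case (memory speed as the limit) is the one that explains why a RAM overclock helped while a core overclock did not. A simple roofline-style estimate shows the effect. This is a sketch under stated assumptions: the peak rate, bandwidth and arithmetic intensity below are hypothetical placeholders, not measured values for any GPU or for the BRP7 app.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float,
                      flops_per_byte: float) -> float:
    """Roofline estimate: limited by compute peak or by memory traffic."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


# Hypothetical placeholder figures, chosen only for illustration.
peak = 14000.0      # GFLOP/s the cores could deliver
bandwidth = 650.0   # GB/s the memory could deliver
intensity = 4.0     # FLOPs performed per byte moved (a memory-bound kernel)

base = attainable_gflops(peak, bandwidth, intensity)
core_oc = attainable_gflops(peak * 1.05, bandwidth, intensity)   # +5% core clock
mem_oc = attainable_gflops(peak, bandwidth * 1.05, intensity)    # +5% memory clock

print(f"baseline:          {base:.0f} GFLOP/s")
print(f"core overclock:    {core_oc:.0f} GFLOP/s ({core_oc / base - 1:+.1%})")
print(f"memory overclock:  {mem_oc:.0f} GFLOP/s ({mem_oc / base - 1:+.1%})")
```

As long as bandwidth times intensity stays below the compute peak, raising the core clock changes nothing in this model, while raising the memory clock raises throughput proportionally. The other items in the list (divergence, dependency latency, CPU round trips) are not captured by such a simple estimate.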
MPS is part of the Linux Nvidia driver. It's not available on Windows. Since WSL still uses the Windows driver underneath, I don't think MPS will work in WSL.
BTW, after running the commands to fix the symlinks, you need to restart WSL (or just restart the whole system) for the changes to take effect.