I've been aware of the CPU restrictions on GW tasks for several months now and have played around with them on my high-thread-count systems. Running 2x tasks per GPU did push GPU utilization north of 90% and gave faster AVERAGE WU run times, but I don't think it's worth tying up as many as 3 threads per GPU to do it. It's a lot of extra power burned for just a little gain.

What we need is a better-optimized Nvidia app that doesn't need so much CPU to run: no more than 1 full thread, and >95% GPU utilization for the entire run. If they can move all of the work to the GPU and stop relying on the CPU for a good bit of it, they will see improved power efficiency from the host and faster run times as well.

I have faith that petri can work his magic, when he finds the time.
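(Editor's aside, not part of the post above: if you want to check those two numbers on your own host, a rough sketch follows. It assumes an Nvidia GPU with nvidia-smi on the PATH and the third-party psutil package installed; the "einstein" process-name match is only a guess and would need to be adjusted to the actual GW binary name.)

# Rough sketch (not from the post): sample GPU utilization via nvidia-smi and
# the CPU use of a running GW task, to see how far a host is from the
# ">95% GPU, no more than 1 full thread" target described above.
# Assumes nvidia-smi is on PATH and psutil is installed; the "einstein"
# process-name match is a guess -- adjust it to your actual GW binary name.
import subprocess
import time

import psutil


def gpu_utilization_pct(gpu_index=0):
    """Instantaneous GPU utilization (%) as reported by nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", f"--id={gpu_index}",
         "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"],
        text=True,
    )
    return float(out.strip())


def find_gw_task():
    """Return the first process whose name contains 'einstein', if any."""
    for proc in psutil.process_iter(["name"]):
        if "einstein" in (proc.info["name"] or "").lower():
            return proc
    return None


if __name__ == "__main__":
    task = find_gw_task()
    for _ in range(10):  # ten ~5-second samples
        if task:
            # 100% == one fully busy thread, 200% == two threads, and so on.
            cpu = task.cpu_percent(interval=5.0)
        else:
            cpu = float("nan")
            time.sleep(5.0)
        print(f"GPU {gpu_utilization_pct():5.1f}%   task CPU {cpu:6.1f}% of one core")

On a host where the GW app really is CPU-bound, you would expect the second number to sit well above 100% and the first to dip well below 95% during parts of the run.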
_________________________________________________________________________
Tom M wrote: I still have two other ideas I want to try: 2 CPU threads for 1 GPU task, and 1 CPU per 0.33 GPU tasks.
Nothing you do to that CPU value will change how much CPU is actually used. It is ONLY used by BOINC's resource management to account for how much each device needs, so just making sure you know that: setting 2 CPUs per 1 GPU will not make the task actually USE 2 CPU threads. It will only reserve them and keep other tasks from using them. That's fine if you just want to make sure the GPU task has enough resources available, but don't expect to see that much actually being used.

And on the flip side, you can change it to 0.5 if you want, and the GPU task will still use whatever it needs, even if that's greater than 0.5, which for GW tasks it usually is. The catch is that BOINC is not accounting for that, and if you are running other CPU work it will over-allocate resources that can and will step on those GPU tasks once you run out of CPU threads. A classic "overcommitted" situation.

Hypothetical situation, for simplicity of understanding: 1 GPU on a 4-thread system, running only GPU work.

With the setting 2 CPU : 1 GPU, BOINC sees that the GPU task needs 2 CPUs, leaving 2 CPUs free to service other tasks, if allowed.

With the setting 1 CPU : 1 GPU, BOINC sees that the GPU task needs 1 CPU, leaving 3 CPUs free to service other tasks, if allowed.

With the setting 0.5 CPU : 0.5 GPU, running 2 tasks per GPU, BOINC thinks only a single core is required for both tasks (0.5 + 0.5 = 1), leaving 3 CPUs free to service other tasks.

The problem with the last scenario is that we know from experience that GW tasks will in reality use more than 1.0 CPUs per GPU task; call it 1.25. So the GPU work actually has 2.5 CPUs in use and only 1.5 "free", but BOINC thinks you have 3 CPUs free and will hand them out to other tasks if you allow it.

This is what most people get wrong about the cpu:gpu usage numbers.
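(Editor's illustration, not Ian&Steve's: the same arithmetic as the three scenarios above, in a few lines of Python. The ~1.25 CPUs per GW task is the rough estimate quoted in the post, not a measured constant.)

# What BOINC *thinks* is free vs. what is actually free, for a 4-thread host
# with 1 GPU. The 1.25 CPUs/task figure is the post's rough estimate.
def free_cpus(total_threads, tasks_per_gpu, cpu_usage_setting, actual_cpu_per_task):
    reserved = tasks_per_gpu * cpu_usage_setting   # what BOINC budgets for the GPU work
    used = tasks_per_gpu * actual_cpu_per_task     # what the GPU work really consumes
    return total_threads - reserved, total_threads - used

for setting, tasks in [(2.0, 1), (1.0, 1), (0.5, 2)]:
    boinc_view, reality = free_cpus(4, tasks, setting, actual_cpu_per_task=1.25)
    print(f"{setting} CPU/task, {tasks} task(s) per GPU: "
          f"BOINC thinks {boinc_view:.2f} CPUs are free, really ~{reality:.2f}")

The last line is the overcommit case: BOINC believes 3 threads are free while only about 1.5 really are, so letting it start 3 more CPU tasks would starve the GPU tasks of CPU support.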
_________________________________________________________________________
Ian&Steve C. wrote: I have faith that petri can work his magic, when he finds the time.
+1
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
Tom M wrote: +1
+1
All GW GPU tasks on my GTX 1060 with 3 GB of video RAM failed on April 29. Today all GW GPU tasks completed successfully but, except for one that validated, they are all still pending because their wingman has a GTX 750 Ti with 2 GB. I haven't changed a single thing on my Windows 10 PC apart from a cumulative update to Windows Home edition 1903, which Microsoft sent me last night and which required a reboot.
Tullio
Ian&Steve C. wrote: I've been aware of the CPU restrictions on GW tasks for several months now. I played around with it on my high thread count systems. I played around with 2x tasks per GPU, and did see increased GPU utilization north of 90%, and faster AVERAGE WU run times...
I thought I was seeing run times more than double at 2x, and 3x started throwing a few computation errors. I am currently back down to 1x, but the processing time is still significantly higher than it used to be and my RAC graph has been dead flat.

--edit--
I forgot to re-set my CPU threads to keep 4 idle. As soon as I did, GPU processing times dropped back under 600 seconds.
--edit--

--another edit--
It is preliminary, but it looks like with 6 threads idling the time a task takes to run is trending lower than when I only had 4 idling. I know I have been seeing CPU times exceeding the wall-clock processing time, but it really sounds like we need some elementary optimization too, not just Petri-level.
--another edit--
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
Here is the latest!
I was given access to the source code (thanks Bernd!!!).
The code compiled just fine after I set up my environment to be compatible with E@H.
I had some difficulty with crypt.cpp in BOINC, but I got it resolved (a different version).
"
Successfully built and installed HSgammaPulsar [Application]!
************************************
Build finished successfully!
ma 1.6.2020 12.24.40 +0300
************************************
root@Linux1:~/fermilat-master# ls dist/
HSgammaPulsar_i686-pc-linux-gnu-cuda HSgammaPulsar_x86_64-pc-linux-gnu-opencl
HSgammaPulsar_i686-pc-linux-gnu-cuda-noboinc HSgammaPulsar_x86_64-pc-linux-gnu-opencl-noboinc
root@Linux1:~/fermilat-master#
"
Now I need to make myself an app_info.xml to run the anonymous platform...
I'll try to get some time in the forthcoming months to read the code and fully understand it. Then I'll try some "things" and if and if and if and only after that then ...
I'll be back.
Woooo. Looking forward to it!
_________________________________________________________________________
Is that 4 or 6 real CPU cores, or is it both real and virtual cores? A lot of projects run much better when not using the virtual cores, due to the overhead of the virtual cores and the work units not fitting into their more limited memory.
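(Editor's aside on the real-vs-virtual distinction, hedged: a quick way to see how many of a host's "cores" are physical and how many are SMT/hyperthreaded siblings. It needs the third-party psutil package; nothing here is specific to the hosts in this thread.)

# Report physical cores vs. logical (SMT) CPUs, so "4 or 6 idle threads" can
# be related to real cores. Requires the psutil package.
import os

import psutil

logical = os.cpu_count()                    # CPUs the OS schedules on
physical = psutil.cpu_count(logical=False)  # actual cores (may be None if unknown)

smt = "SMT/HT on" if logical and physical and logical > physical else "SMT/HT off or unknown"
print(f"{physical} physical cores, {logical} logical CPUs ({smt})")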
Quote: Is that 4 or 6 real CPU cores, or is it both real and virtual cores?
A terminology question here: if by "real and virtual cores" you are referring to with/without hyperthreading/SMT, it would have been a lot clearer, to me at least, if you had used the standard naming convention(s).

In answer to your question, SMT was running, and I was only varying the "% of cores used" setting in the BOINC Manager.

Recently I have been testing under Windows 10 with a different system ID. I expect to move back to Linux after I run the Windows 10 setup "dry". I have made comments about the results in the "All things Radeon" thread.
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
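(One last editorial footnote, hedged: translating "keep N threads idle" into the BOINC Manager "% of cores used" value is just the arithmetic below. I have not verified exactly how the client rounds the result, so check the usable-CPU count BOINC reports in its event log after changing the setting; the 16-thread host is a made-up example, not one of the machines in this thread.)

# What "% of cores used" keeps N threads idle on a host with `total` logical CPUs.
# BOINC's exact rounding is not verified here -- confirm against the event log.
def pct_for_idle(total_logical_cpus, threads_to_keep_idle):
    used = total_logical_cpus - threads_to_keep_idle
    return 100.0 * used / total_logical_cpus

for idle in (4, 6):
    print(f"keep {idle} idle on a 16-thread host -> set about {pct_for_idle(16, idle):.1f}%")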