As both power consumption and GPU load at 3x were down from their BRP4 levels on this system, and I am currently running only a single CPU job on this 4-core host, I thought I might get an appreciable speedup at 4x over 3x. But the improvement in throughput was very small, and came at the cost of degraded system-level power efficiency, so I've reverted to 3x. For this system the benefit of 3x over 2x is moderate but definite, and includes a power-efficiency improvement.
Thanks for the detail; so it looks like 3x is still optimum (for NVIDIA) with 2x not far behind?
I'll probably stick with 2x; it keeps the GPUs that little bit cooler (even if the science per watt isn't quite as good).
Neil Newell wrote: so it looks like 3x is still optimum (for NVIDIA) with 2x not far behind?
Yes, for my host configuration with the GTX 660. I don't feel like plugging my GTX 460 back in, but I suspect it would prefer 2x to 3x.
I think the rough-and-ready rule of thumb of running 2x and restricting the pure CPU jobs to n-1 cores is still a fast way to get pretty close to optimum, without testing, for Einstein work on NVIDIA cards. It seems almost universal that 2x beats 1x appreciably if it works at all, while gains above 2x vary from modest to negative. The optimum CPU core count varies more, but n-1 won't often be far off the performance optimum.
I'm pretty confident my rig would get higher total throughput with one or more CPU jobs in addition to the single one I'm running, but I am on summer power conservation and already throttling a fair number of hours per day in normal service (not during these tests), so the poor incremental power efficiency of adding CPU jobs keeps me from going that way until the heating season returns around November.
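For anyone wanting to try that rule of thumb, here is a minimal Python sketch that writes a BOINC app_config.xml asking the client to run two Einstein GPU tasks per card and to reserve a fraction of a CPU core for each. The app name and the 0.2 CPU reservation are assumptions rather than project-confirmed values (check client_state.xml for the names on your host), and the project's "GPU utilization factor" preference, where offered, does the same job from the server side.

```python
# Sketch only: write a BOINC app_config.xml that runs two Einstein GPU tasks
# per card (gpu_usage 0.5) and reserves a fraction of a CPU core per GPU task.
# The app name and cpu_usage value below are assumptions, not project-supplied.
APP_CONFIG = """\
<app_config>
  <app>
    <name>einsteinbinary_BRP5</name>  <!-- check client_state.xml for the real name -->
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>      <!-- 0.5 GPU per task => 2 tasks per GPU -->
      <cpu_usage>0.2</cpu_usage>      <!-- CPU fraction reserved per GPU task -->
    </gpu_versions>
  </app>
</app_config>
"""

# The file belongs in the Einstein@Home project directory (e.g.
# projects/einstein.phys.uwm.edu/); afterwards tell the client to re-read
# config files or restart it. The n-1 CPU-job limit itself is set separately,
# e.g. "Use at most 75% of the CPUs" on a 4-core host.
with open("app_config.xml", "w", encoding="utf-8") as f:
    f.write(APP_CONFIG)
```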
I've been testing 1x and 2x on a Linux PCIe 2.0 system with a GTX 580.
1x: ~12,000s
2x: ~19,000s (9,500s/task)
As this is quite a difference, I'm wondering if even higher utilisation would be better (compared to BRP4, where NVIDIA cards at least don't seem to improve much beyond 2x).
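A quick back-of-the-envelope check, using only the times quoted above, of what those per-task times mean for throughput (the gain at 2x is just 12,000/9,500, roughly 26%):

```python
# Back-of-the-envelope throughput comparison from the GTX 580 times above:
# ~12,000 s per task at 1x, and ~19,000 s elapsed for two concurrent tasks
# at 2x, i.e. an effective ~9,500 s of GPU time per task.
t_1x = 12_000.0        # seconds per task, running one at a time
t_2x = 19_000.0 / 2    # effective seconds per task, running two at a time

print(f"1x: {86_400 / t_1x:.1f} tasks/day")           # ~7.2
print(f"2x: {86_400 / t_2x:.1f} tasks/day")           # ~9.1
print(f"gain at 2x: {(t_1x / t_2x - 1) * 100:.0f}%")  # ~26%
```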
I think higher utilization may be a possibility. I have not checked power consumption on my NVIDIA systems yet, but I noticed that one of my AMD systems is now drawing 100 W less at the wall while running BRP5 tasks. I suspect there may be some additional headroom for running more tasks, but I have not tried it so far.
Due to high electricity costs in the summer, I think I may just keep my systems as they are with the reduced power consumption of the BRP5 tasks rather than trying to increase the utilization.
As far as I remember, we discovered this and reported it to the developers while CUDA 4.0 was in alpha testing, I think about two and a half years ago. We have tested and reported it with every new CUDA version that has come out since.
BM
Two and a half years means that NVIDIA is not going to fix this bug anytime soon. Maybe it's time to move on and find another solution or workaround in order to make use of CUDA 4.2 and CUDA 5.0. CUDA 4.2 works really well at GPUGRID compared to CUDA 3.1.
BRP5 credit definitely requires some more investigation and thinking, more than we have time for now. For the time being I raised the credit to 5000 for newly generated BRP5 workunits.
Generation of BRP4 workunits has been disabled altogether; what's already there will be sent out and processed by GPUs. (In a few days BRP4 workunit generation will be reconfigured and the remaining Arecibo data will be processed by slower CPUs.)
Hello!
I'm missing a data field called "BRP5 progress" in the lower right corner of the SERVER STATUS page.
Kind regards and happy crunching.
Martin
Me too. But for various technical reasons this will take a while. For one thing, the first ~30k WUs that were sent out last weekend have to be completely processed. You will see that this has happened when the number of "BRP5 Workunits waiting for assimilation" starts dropping again.
BM
I got a notice a few days ago that CUDA 5.5 is available. Maybe check that?
With a really small sample on my GTX 580 I got:
2x BRP5 = 7,790 s, GPU @ 90%
3x BRP5 = 7,570 s, GPU @ 94%
So only a small (~3%) gain for running 3 tasks.
Great news about the credit increase. Thank you!