On my host with the 560: ~35 mins (1.28) vs ~46 mins (1.25)
On my host with the 550 Ti: ~110 mins (1.28) vs ~135 mins (1.25)
Doing 2 WUs per GPU with 1 CPU core free for each GPU in both hosts.
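To make the setup described above concrete, here is a minimal sketch of an app_config.xml (placed in the Einstein@Home project directory; only sufficiently recent BOINC clients honour this file) that budgets two tasks per GPU and a full CPU core per GPU task. The app name is a placeholder, so check the task names or client_state.xml on your own host for the real one:

  <app_config>
    <app>
      <name>einsteinbinary_BRP4</name>   <!-- placeholder: use the app name your client actually reports -->
      <gpu_versions>
        <gpu_usage>0.5</gpu_usage>       <!-- 0.5 GPUs per task, i.e. 2 WUs per GPU -->
        <cpu_usage>1.0</cpu_usage>       <!-- budget a full CPU core for each GPU task -->
      </gpu_versions>
    </app>
  </app_config>

On clients that predate app_config.xml, the project's "GPU utilization factor" web preference is the usual way to run more than one task per GPU.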
One task at a time on my 560 Ti:
1.24: about 1720 seconds
1.28: about 1100 seconds
So it only takes 64% of the time now. A fine performance improvement!
Horacio wrote: As was said, the speed up is much more awesome on faster GPUs.
I'd guess that a host with a fast GPU and a CPU with slow single-core performance might show the biggest improvement.
I'd also guess that, for many hosts, the optimal number of simultaneous GPU tasks and the optimal number of CPU cores to hold back from pure CPU processing have both probably changed. As I am running on summer power throttling and preparing for some time away from home, it is not a good time for me to rerun tests, but even without them I can see clear signs of major improvement from the new application on two hosts with GTX 460 GPUs.
I think there was some discussion that it is good to free up one core (i.e. on a quad core use only three) to let it feed the GPU.
Otherwise the GPU spends most of its time waiting for data coming from the CPU.
Greetings, Christoph
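To make that advice concrete: a blunt, host-wide way to keep one core of a quad free is the "use at most N% of the processors" computing preference, which in a local global_prefs_override.xml looks roughly like the sketch below (the client has to re-read its preferences, or be restarted, before the change takes effect):

  <global_preferences>
    <max_ncpus_pct>75.0</max_ncpus_pct>  <!-- 75% of 4 cores = 3 cores left for CPU tasks -->
  </global_preferences>

The cpu_usage entry in an app_config.xml, as in the earlier sketch, is the more targeted alternative, since it should only set the core aside while a GPU task is actually running.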
None of us have much experience of the GTX 660 Ti or driver 304.87 yet.
As reported in the other v1.28 feedback thread, my GTX 670 with driver 304.79 showed a 25% reduction in runtime.
I think it's been reported that the 660 Ti has a narrower, and hence lower-bandwidth, memory bus. We handle a lot of data (large data files) here at Einstein, so that might explain why your card is showing less benefit than mine; or the difference might be the CPUs (my 670 is hosted by an overclocked i7).
MAGIC wrote: So far that doesn't seem to be making any difference as far as time per task, Bikeman.
I wonder how you are looking at it to see it that way.
Looking at the valid tasks list for your host 4109993, I think I see an impressive transition. Without doing a formal summary, it appears that for 1.25 this host was posting elapsed times of about 4200 seconds and CPU times of about 1420 seconds, which seem to have come down to about 2800 and 640 for version 1.28: an impressive improvement by most standards.
Am I missing something?
Yeah archae,
What we were talking about is that the times didn't change after I set the "task checkpoint to disk" interval in the BOINC manager to 60 seconds. For some reason it was at zero, and after I changed it and took a look at the times, they were still the same.
Bikeman was saying he thought it should be even faster if I did that.
But as mentioned, it could be because I run these quads using 2 cores for T4T and an LHC task at the same time, while running 2x CUDA tasks here.
You can take a look at the first batch of 1.28s that I did compared to the ones after I changed the "task checkpoint to disk" interval to 60 seconds.
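For reference, the setting discussed above is the "Tasks checkpoint to disk at most every N seconds" computing preference; outside the Manager dialog it corresponds, if I'm not mistaken, to disk_interval in a local global_prefs_override.xml, roughly as in this sketch (again, the client needs to re-read local prefs before the new value is used):

  <global_preferences>
    <disk_interval>60.0</disk_interval>  <!-- checkpoint to disk at most every 60 seconds -->
  </global_preferences>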
I botched the context, sorry about that.
Hey no problem archae
Holy shit, the new app is absolutely fantastic!!
My main (24/7) cruncher for Einstein is a 9800 GTX running in a slow PCIe x4 slot on a heavily loaded mainboard, which also runs three concurrent GPU POEM tasks on a second 9800 GTX in the other PCIe x16 slot. So the runtime was ~7000 secs for one Einstein task with a reserved CPU core (Q6600), due to the limited PCIe bandwidth. Now it has dropped to 4200(!!) secs. That's the value, if I remember right, that I had with this card in a PCIe x16 slot some time ago :D That's so great for cards that have only 512 MB of memory and can only run one task at a time, and/or sit in slow PCIe slots.
Great!! Thx!! *working from 6.5k back up to a >10k RAC again* :)
DSKAG Austria Research Team: http://www.research.dskag.at