I have replaced my GTX 560Ti with a GTX 760 now. Unfortunately there are almost no improvements for BRP5 WUs. The run time is about the same :-(
tell me about it...i made the same upgrade (GTX 560 Ti --> GTX 670) and noticed only a minimal improvement in run times as well.
now take into account that a GTX 560 Ti is capable of 1263.4 GFLOPs, while the GTX 670 is capable of 2460 GFLOPs, which is almost double the FP32 (single-precision) performance of the former. now granted, i understand that these are maximum theoretical throughput figures, and that they are not representative of the everyday performance we'll see from our GPUs. but even if our GPUs come nowhere near the maximum theoretical performance, a GTX 670 should still be substantially more powerful than a GTX 560 Ti. so not only is the Kepler architecture (GTX 6xx series) no better than the Fermi architecture (GTX 5xx series) when it comes to Einstein@Home, but Kepler is actually substantially worse clock for clock at Einstein@Home than Fermi. of course you can't really blame it on the GPU architecture...ultimately it comes down to the Einstein@Home code not being as well optimized for Kepler as it is for Fermi.
i ended up selling the GTX 670 and "downgrading" to a GTX 580 (i actually have 3 of them now), which blows the GTX 670 away in terms of Einstein@Home performance...even though it consumes slightly more power than a GTX 670.
For better performance on those GPUs with many CUDA cores with current BRP applications, you really need to run more than one WU at a time (2 or 3).

just to be clear, i was running 3 simultaneously on each of my GTX 560 Ti's when i had them, and i was running 4 simultaneously on the GTX 670 when i had it. the GTX 670 still only performed marginally better than the GTX 560 Ti's did.
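A minimal sketch of how you might set that up, for anyone who hasn't tried it: BOINC reads an app_config.xml from the project directory, and a gpu_usage of 0.5 means two tasks per GPU, 0.33 means three. The app name "einsteinbinary_BRP5" and the directory path below are assumptions from memory - check the <app> names in your own client_state.xml and your actual BOINC data directory before using it.

# Sketch: write an app_config.xml that runs 2 BRP WUs per GPU.
# Assumptions (verify against client_state.xml): app name and project dir.
import os

APP_CONFIG = """<app_config>
  <app>
    <name>einsteinbinary_BRP5</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.2</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
"""

# Typical Linux location; adjust for your install (on Windows it is usually
# under C:\ProgramData\BOINC\projects\...).
project_dir = "/var/lib/boinc-client/projects/einstein.phys.uwm.edu"

with open(os.path.join(project_dir, "app_config.xml"), "w") as f:
    f.write(APP_CONFIG)

# Afterwards, restart the BOINC client (or use the Manager's
# "Read config files" option) so it picks up the new limits.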
If you compare the data sheets you will find that the GTX 5xx series runs its shaders (the internal core clock) at twice the external core clock, while the GTX 6xx series does not double the clock. A very basic calculation is: twice the cores gives the same performance as the 5xx series.
The GTX 7xx is a fine-tuned GTX 6xx series.
The advantage of the new cards is lower power consumption.
Since CUDA development continues, you cannot be sure that the older cards will be usable forever. It is much like with the AMD cards: the AMD HD 4xxx is no longer supported here.
There are threads here where the devs explain why they are still using an older CUDA version, but new versions will come for sure.
So keep the new cards and you are on the safe side.
Alex
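Alex's rule of thumb can be put roughly into numbers (core counts are the usual spec-sheet figures; this is only meant to illustrate the hot-clock point, not a benchmark):

# Fermi runs its shaders at twice the core clock ("hot clock"); Kepler runs
# them at the core clock itself. Counting each hot-clocked Fermi core as two
# core-clock "lanes" gives a crude per-core-clock comparison.
fermi_lanes = 384 * 2     # GTX 560 Ti: 384 cores, shaders at 2x core clock
kepler_lanes = 1344 * 1   # GTX 670: 1344 cores, no hot clock
print(fermi_lanes, kepler_lanes, kepler_lanes / fermi_lanes)  # 768 1344 1.75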
this is exactly the reason i spoke in terms of GFLOPs only...b/c you can't compare the Kepler and Fermi architectures on either core count or shader clock alone - both must be taken into consideration in order to do an apples-to-apples comparison (GFLOPs vs GFLOPs), where GFLOPs = shader clock x core count x 2 (one fused multiply-add, i.e. two floating-point ops, per core per clock).
my point is that the GTX 670's shader clock (the "internal core clock", whatever you want to call it) is greater than half of the GTX 560 Ti's, AND it has 3.5 times as many cores as the GTX 560 Ti. hence the GTX 670 has twice the theoretical FP32 (single-precision) throughput of the GTX 560 Ti. so the reason the GTX 670 isn't substantially faster at Einstein@Home is a lack of optimization for the Kepler architecture as compared with the older Fermi architecture.
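To make the arithmetic explicit, here is the same comparison worked out from the published core counts and shader clocks (spec-sheet values; the x2 is the fused multiply-add each core can issue per clock):

# Theoretical FP32 throughput = cores x shader clock x 2 (FMA = 2 flops).
cards = {
    #             cores, shader clock in GHz
    "GTX 560 Ti": (384, 1.645),   # core clock ~822 MHz, shaders at 2x (Fermi)
    "GTX 670":    (1344, 0.915),  # base clock, no hot clock (Kepler)
}
for name, (cores, ghz) in cards.items():
    print(f"{name}: {cores * ghz * 2:.1f} GFLOPS")
# GTX 560 Ti: 1263.4 GFLOPS
# GTX 670:    2459.5 GFLOPS  -> about 1.95x the 560 Ti, on paper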
You are right, but as long as we only have a CUDA 3.2 app, my calculation is much closer to reality.
It might be very different, and closer to your calculation, when the CUDA 5.x apps become available.
If you did the same comparison at GPUGrid, you'd find the GTX 670 to be almost twice as fast as the 560 Ti, but that app is CUDA 4.2.
exactly - it's an optimization issue. and your point lends further credence to what i said earlier - Einstein@Home just isn't optimized as well for Kepler as it is for Fermi. if it were (i.e. if Einstein@Home were taking advantage of CUDA 4.2, and not just CUDA 3.2), then we would actually see substantial differences in performance between Kepler GPUs and Fermi GPUs on the Einstein@Home project. but until then...
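A rough way to see why the toolkit version matters so much (the toolkit-to-architecture mapping below is from memory and approximate, not an official support table): CUDA 3.2 predates Kepler, so a cuda32 app simply cannot contain Kepler-tuned kernels - the driver has to JIT-compile its old PTX for the 6xx/7xx cards with no sm_30-specific tuning.

# Newest compute capability each CUDA toolkit can compile for natively
# (approximate, from memory).
newest_sm = {"CUDA 3.2": 21, "CUDA 4.2": 30, "CUDA 5.0": 35}

# Compute capability of the cards discussed in this thread.
cards = {"GTX 560 Ti": 21, "GTX 580": 20, "GTX 670": 30, "GTX 760": 30}

toolkit = "CUDA 3.2"  # what the current BRP apps are built with
for card, sm in cards.items():
    native = sm <= newest_sm[toolkit]
    print(f"{card} (sm_{sm}): native {toolkit} kernels: {native}")
# The Fermi cards get code built for their architecture; the Kepler cards
# do not, which fits the run times people are reporting here.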
This has already been discussed a few posts above:
http://einsteinathome.org/node/196873&nowrap=true#123837
It seems we're out of luck...
I have several different GeForce cards, but the 660 Ti runs these tasks the fastest for me, and the 550 Ti is faster than the 650 Ti.
But I've OC'd all of them as much as I can while running 2X (and all of my processors are a couple of years old, so none of them are OC'd).