Binary Radio Pulsar Search (Perseus Arm Survey) "BRP5"

Sunny129

Joined: 5 Dec 05

Posts: 162

Credit: 160342159

RAC: 0

RE: I have replaced my GTX

11 Jul 2013 17:58:31 UTC

Message 115717 in response to message 115716

Quote:

I have replaced my GTX 560Ti with a GTX 760 now. Unfortunately there are almost no improvements for BRP5 WUs. The run time is about the same :-(

tell me about it...i made the same upgrade (GTX 560 Ti --> GTX 670) and noticed only a minimal improvement in run times as well.

now take into account that a GTX 560 Ti is capable of 1263.4 GFLOPs, while the GTX 670 is capable of 2460 GFLOPs, which is almost double the FP32 (single-precision) performance of the former. now granted, i understand that these are maximum theoretical throughput figures, and that they are not representative of the everyday performance we'll see from our GPUs. but even if our GPUs come nowhere near the maximum theoretical performance, a GTX 670 should still be substantially more powerful than a GTX 560 Ti. so not only is the Kepler architecture (GTX 6xx series) no better than the Fermi architecture (GTX 5xx series) when it comes to Einstein@Home, but Kepler is actually substantially worse clock for clock at Einstein@Home than Fermi. of course you really can't blame it on the GPU's architecture...ultimately it comes down to the Einstein@Home code not being as optimized for Kepler as it is for Fermi.

i ended up selling the GTX 670 and "downgrading" to a GTX 580 (i actually have 3 of them now), which blows the GTX 670 away in terms of Einstein@Home performance...even if it consumes slightly more power than a GTX 670.

Khangollo

Joined: 17 Feb 11

Posts: 42

Credit: 928047659

RAC: 0

RE: GTX560Ti: 9900

11 Jul 2013 18:18:12 UTC

Message 115718 in response to message 115716

Quote:

GTX560Ti: 9900 sec
GTX760: 9500 sec
(both with 1 WU at a time)

For better performance on those GPUs with many CUDA cores with current BRP applications, you really need to run more than one WU at a time (2 or 3).
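
One rough way to compare cards, or to compare 1 WU at a time against 2 or 3, is to turn run times into tasks per day. The sketch below is only an illustration: the function name `tasks_per_day` is made up for this example, and the only real numbers in it are the two single-WU run times quoted above.

```python
SECONDS_PER_DAY = 86_400


def tasks_per_day(wall_seconds_per_task: float, concurrent_tasks: int = 1) -> float:
    """Tasks finished per day when `concurrent_tasks` WUs run side by side,
    each taking `wall_seconds_per_task` of wall-clock time."""
    return concurrent_tasks * SECONDS_PER_DAY / wall_seconds_per_task


# Single-WU run times quoted in this thread.
gtx_560_ti = tasks_per_day(9900)
gtx_760 = tasks_per_day(9500)
speedup = gtx_760 / gtx_560_ti

print(f"GTX 560 Ti: {gtx_560_ti:.2f} tasks/day")
print(f"GTX 760:    {gtx_760:.2f} tasks/day")
print(f"speedup:    {speedup:.3f}x")
```

To judge whether 2 or 3 at a time helps, pass the wall-clock time per task measured while running concurrently together with the concurrency; those timings are not given in this post, so they are left out here.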

Sunny129

Joined: 5 Dec 05

Posts: 162

Credit: 160342159

RAC: 0

just to be clear, i was

11 Jul 2013 18:23:55 UTC

Message 115719

just to be clear, i was running 3 simultaneously on each of my GTX 560 Ti's when i had them, and i was running 4 simultaneously on the GTX 670 when i had it. the GTX 670 still only performed marginally better than the GTX 560 Ti's did.

Alex

Joined: 1 Mar 05

Posts: 451

Credit: 507068263

RAC: 78843

If you compare the data

11 Jul 2013 18:53:07 UTC

Message 115720 in response to message 115719

If you compare the data sheets, you will find that the GTX 5xx series runs its shaders at an internal clock twice the external core clock. The GTX 6xx series does not double the core clock. A very basic calculation: a 6xx card needs twice the cores to give the same performance as a 5xx-series card.
The GTX 7xx series is a fine-tuned GTX 6xx series.

The advantage of the new cards is lower power consumption.
Since CUDA development continues, you cannot be sure that the older cards will be usable forever. It is much the same as with the AMD cards: the AMD HD4xxx is no longer supported here.

There are threads here where the devs explain why they are still using an older CUDA version, but new versions will come for sure.

So keep the new cards and you are on the safe side.

Alex

Sunny129

Joined: 5 Dec 05

Posts: 162

Credit: 160342159

RAC: 0

RE: If you compare the data

11 Jul 2013 19:32:18 UTC

Message 115721 in response to message 115720

Quote:

If you compare the data sheets, you will find that the GTX 5xx series runs its shaders at an internal clock twice the external core clock. The GTX 6xx series does not double the core clock. A very basic calculation: a 6xx card needs twice the cores to give the same performance as a 5xx-series card.

this is exactly the reason i spoke in terms of GFLOPs only...b/c you can't compare the Kepler and Fermi architectures on either core count or shader clock alone - both must be taken into consideration in order to do an apples to apples comparison (GFLOPs vs GFLOPs), where GFLOPs = 2 X shader clock X core count (the 2 being the two floating-point operations of one fused multiply-add per core per clock).

my point is that the GTX 670's internal core clock is greater than half of the GTX 560 Ti's internal core clock (or shader clock...whatever you want to call it) AND it has 3.5 times as many cores as the GTX 560 Ti. hence the reason the GTX 670 has twice the theoretical FP32 (single-precision) throughput that the GTX 560 Ti has. and so the reason the GTX 670 isn't substantially more efficient at Einstein@Home is a lack of optimization for the Kepler architecture as compared with the older Fermi architecture.
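
For anyone who wants to check the arithmetic, here is a minimal sketch of the theoretical peak calculation. The core counts and clocks are assumptions taken from NVIDIA's reference specifications, not figures stated in this thread, and the function name `peak_gflops` is made up for the example. Peak figures say nothing about how well a given application actually uses the card.

```python
def peak_gflops(cores: int, shader_clock_mhz: float) -> float:
    """Theoretical peak FP32 GFLOPS: 2 FLOPs (one fused multiply-add)
    per core per shader clock cycle."""
    return 2 * cores * shader_clock_mhz / 1000.0


# Assumed reference specs (not from this thread):
# Fermi GTX 560 Ti: 384 cores, shader clock 1645 MHz (twice the 822.5 MHz core clock)
# Kepler GTX 670:  1344 cores, shader clock equals the 915 MHz base core clock
gtx_560_ti = peak_gflops(cores=384, shader_clock_mhz=1645.0)
gtx_670 = peak_gflops(cores=1344, shader_clock_mhz=915.0)

print(f"GTX 560 Ti: {gtx_560_ti:.1f} GFLOPS")      # 1263.4
print(f"GTX 670:    {gtx_670:.1f} GFLOPS")         # 2459.5
print(f"cores:      {1344 / 384:.1f}x")            # 3.5x
print(f"throughput: {gtx_670 / gtx_560_ti:.2f}x")  # 1.95x
```

With those assumed specs the results line up with the 1263.4 and roughly 2460 GFLOPs figures quoted earlier in the thread.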

Alex

Joined: 1 Mar 05

Posts: 451

Credit: 507068263

RAC: 78843

RE: this is exactly the

11 Jul 2013 19:53:17 UTC

Message 115722 in response to message 115721

Quote:

this is exactly the reason i spoke in terms of GFLOPs only...b/c you can't compare the Kepler and Fermi architectures on either core count or shader clock alone - both must be taken into consideration in order to do an apples to apples comparison (GFLOPs vs GFLOPs), where GFLOPs = 2 X shader clock X core count (the 2 being the two floating-point operations of one fused multiply-add per core per clock).

my point is that the GTX 670's internal core clock is greater than half of the GTX 560 Ti's internal core clock (or shader clock...whatever you want to call it) AND it has 3.5 times as many cores as the GTX 560 Ti. hence the reason the GTX 670 has twice the theoretical FP32 (single-precision) throughput that the GTX 560 Ti has. and so the reason the GTX 670 isn't substantially more efficient at Einstein@Home is a lack of optimization for the Kepler architecture as compared with the older Fermi architecture.

You are right, but as long as we only have a cuda32 app, my calculation is much closer to reality.
It might be very different, and closer to your calculation, when the cuda 5x apps become available.

Beyond

Joined: 28 Feb 05

Posts: 121

Credit: 2357366212

RAC: 5583249

RE: RE: I have replaced

12 Jul 2013 14:12:30 UTC

Message 115723 in response to message 115717

Quote:

Quote:
I have replaced my GTX 560Ti with a GTX 760 now. Unfortunately there are almost no improvements for BRP5 WUs. The run time is about the same :-(

tell me about it...i made the same upgrade (GTX 560 Ti --> GTX 670) and noticed only a minimal improvement in run times as well.

now take into account that a GTX 560 Ti is capable of 1263.4 GFLOPs, while the GTX 670 is capable of 2460 GFLOPs, which is almost double the FP32 (single-precision) performance of the former.

If you did the same comparison at GPUGrid, you'd find the GTX 670 to be almost twice as fast as the 560Ti, but that's CUDA42.

Sunny129

Joined: 5 Dec 05

Posts: 162

Credit: 160342159

RAC: 0

RE: If you did the same

12 Jul 2013 14:36:43 UTC

Message 115724 in response to message 115723

Quote:

If you did the same comparison at GPUGrid, you'd find the GTX 670 to be almost twice as fast as the 560Ti, but that's CUDA42.

exactly - it's an optimization issue. and your point lends further credence to what i said earlier - Einstein@Home just isn't optimized as well for Kepler as it is for Fermi. if it were (i.e. if Einstein@Home were taking advantage of CUDA42, and not just CUDA32), then we would actually see substantial differences in performance between Kepler GPUs and Fermi GPUs on the Einstein@Home project. but until then...

Mumak

Joined: 26 Feb 13

Posts: 325

Credit: 3522721432

RAC: 1571596

This has already been

12 Jul 2013 20:17:32 UTC

Message 115725

This has already been discussed a few posts above:
http://einsteinathome.org/node/196873&nowrap=true#123837
It seems we're out of luck...

-----

MAGIC Quantum M...

Joined: 18 Jan 05

Posts: 1886

Credit: 1409611210

RAC: 1172912

I have several different

12 Jul 2013 20:59:28 UTC

Message 115726

I have several different GeForce cards, but the 660Ti is running these tasks the fastest for me, and the 550Ti is faster than the 650Ti.

But I OC'd all of them as much as I can, running 2X (and all of my processors are a couple of years old, so I don't have any of those OC'd).
