CUDA and OpenCL Benchmarks

MAGIC Quantum Mechanic
Joined: 18 Jan 05
Posts: 1,313
Credit: 427,708,912
RAC: 109,131


My EVGA GeForce GTX 660 Ti 2GB is the superclocked version.

I also have the 550 Ti OC running here, and it just started running the 1.28s.

Funny thing is, the one that does best with 1.28 for me is the GeForce 610M in my laptop.

It is around 30 minutes faster now (as long as I am not using the laptop at the same time, like right now).

And it is running two Einstein CUDA tasks at once along with the 2-core T4T task and 5 LHC tasks.

I just added the 660 Ti about 10 days ago, so its RAC isn't yet where it will be in a few more days.

But I have had that laptop running non-stop for about 80 days.

I also had the 550 Ti OC sitting here in its box for months, but finally got around to upgrading the PSU and adding RAM. The 1.28s run about 15 minutes faster on the 660 Ti SC than on the 550 Ti OC.

So, for the money, I just may upgrade a couple of my 3-core machines with a 550 Ti OC.

http://einsteinathome.org/account/82814/computers

 

joe areeda
Joined: 13 Dec 10
Posts: 285
Credit: 319,541,037
RAC: 47,875


Quote:
Quote:


New, at least the GTX 670, also the GT445. But I'd choose the 670.

GTX 670.

If I were to build a new system: a mobo with at least two PCIe (2.0/3.0) x16/x8
slots, a 6-core AMD or a Sandy Bridge/Ivy Bridge i7-2600(K)/3770, and a GTX 670 with room
for another one.
A suitable case and a 1 kW PSU ;-)


The 670 is in the $400 range while the 660 is in the $300 range; both seem to come standard with 2 GB. I'm not sure how their throughput compares.

My latest systems are pretty similar to your specs, but I have a word of warning on choosing motherboards. For my i7-2600K system I chose the ASUS P8Z68-V PRO, which was a mistake. If you look closely, the PCIe 3 x16 slot is the last one, right next to the edge connectors and the onboard power/reset buttons, so there's no way to install a dual-width card in it. The bandwidth of the other slot is severely limited.

Also, I think a 1 kW PSU is overkill and therefore a bit wasteful. I measure the power draw in the 200-300 W range.

Thanks for the recommendation, I appreciate it.

Joe

Petrion
Joined: 30 Apr 08
Posts: 53
Credit: 1,243,186
RAC: 0


Quote:
Quote:
I run Arecibo 1.24 in roughly 3,800 s on my 6850 and am wondering how other people compare.

With BRP now being v1.28, and times having halved, I suppose you should start a new thread. ;-)


Yeah, thank you all for the interesting and friendly discussion! But this thread is getting a little long and hard to navigate...

I was thinking of starting a Benchmarks 2.0 thread and having the moderators lock this one... maybe?

I'm still thinking about how to post the old and new times in the same chart; I'm leaning toward colors so the two are easy to tell apart.

Let me know your thoughts and I'll make a decision this week as well as post an overdue updated list.

Petrion
Joined: 30 Apr 08
Posts: 53
Credit: 1,243,186
RAC: 0


Quote:
Hmm, they didn't improve the DP performance at all. I still don't understand how Nvidia and ATI end up so close to each other in games when the 7900 series has 3.7 teraflops of SP and 900 gigaflops of DP computing power.


Because the software never ends up using the full capabilities of the hardware; you're lucky if you can get 80% efficiency. Looking at the huge difference between the 1.24 and 1.28 app versions, you can see how much room for optimization there is, and this is with fairly streamlined code.

Computer games are at best optimized for either AMD/ATI or Nvidia, which gives very skewed results from game to game, and they are at the mercy of how data flows through the GPU and gets crunched; anti-aliasing is a perfect example.

HERE is a link to an article that describes in good detail what happens in the GPU and the two separate methods that AMD and Nvidia use. If you can stand the dry jargon, the information is very interesting.

P.S. The first page of the article is about "PCIe Bandwidth: When Do You Have Enough?" and makes some interesting points; video games don't have nearly the bandwidth requirements of general compute apps.

Petrion
Joined: 30 Apr 08
Posts: 53
Credit: 1,243,186
RAC: 0


Quote:
Let me know your thoughts and I'll make a decision this week as well as post an overdue updated list.


Sorry for the delay, life gets a little weird sometimes.

Since nobody had an opinion one way or the other, and looking at other, longer threads in the forums, I guess we'll just keep this one running a little longer...

Here is the much overdue updated list:

Old times are plain text.

New BRP 1.28 times are in GREEN for Nvidia cards and in RED for AMD cards, the classic colors. :)

Enjoy!

HD 7970 ------> 2x 2,300sec
HD 7950 ------> 1x 1,145
HD 7950 ------> 2x 3,400, 3x 4,500
GTX 690 (2 GPU)
GTX 590 (2 GPU)
HD 7870
GTX 680 ------> 1x~0,750
GTX 680 ------> 3x 3,100(Win7)
*GTX 680 -----> 2x 1,945(Linux)
GTX 580 ------> 1x 0,834, 3x~2,500
GTX 580 ------> 3x 3,350(Windows)
*GTX 580 -----> 3x 3,050(Linux)
GTX 670 ------> 3x~4,300(vista)
GTX 660Ti ----> 1x~1,180, 2x~2,170
GTX 660Ti ----> 1x~1,700, 2x~2,900, 3x~4,500, 4x~6,030, 5x~8,660, 6x~12,760
GTX 570
HD 7850
GTX 670
GTX 690 (1 GPU)
GTX 570
GTX 480 ------> 2x~2,200
GTX 470 ------> 2x~3,000, 3x 3,800
GTX 590 (1 GPU)
GTX 560 [448] -> 1x 1,550, 2x 2,500
GTX 560 Ti ----> 1x~1,100, 2x 2,654, 6x 6,400
GTX 560 Ti ----> 1x~1,900, 2x~3,094, 3x~3,961, 4x~6,000, 5x~6,840, 6x 7,800
*GTX 560 Ti ---> 1x 1,583 (OC'd)
GT 440
GTX 460
HD 7770 ------> 2x~8,500
HD 7750 ------> 2x~11,000
GTX 560 ------> 2x 2,300
GTX 560 ------> 1x 3,300, 2x 4,800
GTX 465
HD 5870 ------> 2x~3,105
HD 5870
HD 5850 ------> 1x 1,800, 2x 6,085
HD 5850
HD 5830 ------> 1x 2,916
HD 6970
*HD 6950(1536)-> 2x 6,700
HD 6950 ------> 2x 3,500
HD 6950
HD 6990 (1 GPU)
HD 6870
GTX 460 SE
HD 5970 (1 GPU)
HD 6850 ------> 1x~2,300
HD 6850 ------> 1x 3,800
GTX 550 Ti ---> 1x 1,793, 2x 2,961
GTX 550 Ti ---> 1x 3,065, 2x 5,600
GT 640 -------> 1x~5,700
GTS 450 ------> 1x~2,850, 2x~4,660
HD 6790
AMD A8 3870 -> 1x 6,489
HD 5770 ------> 1x 7,750+
HD 6770
GF 610M ------> 1x~7,800
GT 430 -------> 2x 9,100
GT 520 -------> 1x~9,600(Linux)
FirePro V4800-> 1x 10,620
HD 5670 ------> 1x 11,100
*HD 5670 -----> 1x 11,480(Win XP32)
HD 5570 ------> 1x~15,000
HD 5450 ------> 1x~36,500!

Older cards (not OpenCL 1.1 capable), but still an interesting comparison:
GT 295 -------> 1x 2,000(Linux)
8800GT G92 ---> 1x 2,940(Linux)
8800GT G92 ---> 1x 3,600(Linux)
8800GTS G80 --> 1x 4,020(Linux)
GTS 250 ------> 2x~5,484
*GT 240 ------> 1x 4,035(OC'd)
GT 240 -------> 1x~4,500
GT 220 -------> 2x 19,400
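Each entry in the list reads as "Nx T": N tasks running side by side on one GPU, each taking about T seconds, so T/N is the effective GPU time per finished task, which is the fairer number when comparing a 1x time with a 3x time. A minimal Python sketch, assuming the line format used above (card name, a dashed arrow, then comma-separated "Nx T" entries with an optional "~", leading zeros such as "0,750", and notes like "(Linux)"); `parse_line` is a hypothetical helper, not a tool that was used to build the list:

```python
import re

# Hypothetical parser for lines such as "GTX 560 Ti ----> 1x~1,900, 2x~3,094".
# Assumed format: optional "*", card name, dashed arrow ("->" or "------>"), entries.
LINE_RE = re.compile(r"^\*?\s*(?P<card>.*?)\s*-+>\s*(?P<times>.*)$")
# "Nx" followed by an optional "~" and a run time that may contain thousands commas.
ENTRY_RE = re.compile(r"(\d+)x\s*~?\s*(\d[\d,]*)")


def parse_line(line):
    """Return (card, [(tasks_in_parallel, seconds_each, seconds_per_task), ...])."""
    text = line.strip()
    m = LINE_RE.match(text)
    if m is None:
        # Card listed without any times yet, e.g. "GTX 590 (2 GPU)".
        return text.lstrip("*").strip(), []
    entries = []
    for n, t in ENTRY_RE.findall(m.group("times")):
        tasks = int(n)
        seconds = int(t.replace(",", ""))  # "0,750" -> 750, "2,300" -> 2300
        entries.append((tasks, seconds, seconds / tasks))
    return m.group("card"), entries


if __name__ == "__main__":
    for row in [
        "GTX 680 ------> 1x~0,750",
        "GTX 560 Ti ----> 1x~1,900, 2x~3,094, 3x~3,961",
        "*HD 6950(1536)-> 2x 6,700",
        "GTX 590 (2 GPU)",
    ]:
        card, entries = parse_line(row)
        for tasks, seconds, per_task in entries:
            print(f"{card}: {tasks}x {seconds} s -> {per_task:.0f} s per task")
        if not entries:
            print(f"{card}: no times yet")
```

Notes such as "(Win7)" or "+" after a time are simply ignored, since only the "Nx T" pairs are extracted.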

Petrion
Joined: 30 Apr 08
Posts: 53
Credit: 1,243,186
RAC: 0


I've picked through other threads to get some of the data not posted here, but sadly I'm missing old times for some cards that people have posted new times for, so there are a few holes.

Hopefully we can fill in the gaps soon.

Oh, and some people might notice that I added a leading zero to a few times, like GTX 680 ------> 1x~0,750. I did this to make it clear that it wasn't a typo and so that the sub-1,000 times stand out.

Sunny129
Joined: 5 Dec 05
Posts: 162
Credit: 160,342,159
RAC: 0


That v1.28 run time of 2x 2,654 for the GTX 560 Ti seems a bit funny, considering an individual task took only 1,100 s. Conventional wisdom says that 2 tasks in parallel should take < 2,200 s. Or perhaps the 1x and 2x run times were documented on two different hosts?

At any rate, to help fill in the gaps in the GTX 560 Ti v1.28 run times, I'm running 3 tasks in parallel on each of two GTX 560 Tis. I'm averaging ~3,600 s per batch of three tasks (~1,200 s per task). These are not the only tasks running: I'm also running 1 POGS task and a two-threaded Test4Theory@Home task (all on the CPU). CPU usage averages about 80% and peaks around 90%. If I try to add another single-core CPU task of any kind, Einstein@Home performance goes down the drain. I would imagine my run times would be more in line with (or perhaps even better than) the 1x 1,100 s run time if I weren't running other projects at the same time.
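The reasoning in the two paragraphs above comes down to one division: when N tasks run in parallel and each takes T seconds, the GPU finishes a task every T/N seconds on average, and running in parallel only pays off if T/N beats the single-task time (so 2 tasks must each finish in under 2 x 1,100 = 2,200 s). A small Python sketch of that check, using the figures quoted in this thread, which may come from different hosts; the function names are illustrative:

```python
def seconds_per_task(elapsed_s, tasks_in_parallel):
    """Effective GPU seconds per finished task when `tasks_in_parallel` tasks
    run side by side and each one takes `elapsed_s` seconds of wall-clock time."""
    if tasks_in_parallel < 1:
        raise ValueError("tasks_in_parallel must be at least 1")
    return elapsed_s / tasks_in_parallel


def parallel_beats_serial(single_s, elapsed_s, tasks_in_parallel):
    """True if N tasks at once finish sooner than N tasks one after another,
    i.e. elapsed_s < tasks_in_parallel * single_s."""
    return seconds_per_task(elapsed_s, tasks_in_parallel) < single_s


if __name__ == "__main__":
    single = 1100  # 1x time quoted for the GTX 560 Ti
    for n, elapsed in [(2, 2654), (3, 3600)]:
        per_task = seconds_per_task(elapsed, n)
        verdict = parallel_beats_serial(single, elapsed, n)
        print(f"{n}x {elapsed} s -> {per_task:.0f} s per task; beats 1x {single} s: {verdict}")
```

With these numbers, neither parallel figure (1,327 s and 1,200 s per task) beats the quoted 1x time of 1,100 s, which fits the suspicion that the 1x time came from a different host or a less loaded CPU.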

Patrick
Joined: 2 Aug 12
Posts: 70
Credit: 2,358,155
RAC: 0


Hello everyone

I have a GTS 250. My best time for one task was 3,200 seconds and the worst 4,500, and my CPU is an FX-6100 running at 3,600 MHz. I'm running CPU tasks too. How can I run more than one task?
Just by setting the GPU utilization factor of BRP apps from 1 to 0.5? I saw in the list that someone runs 2 at 5,400 seconds!?

Sunny129
Joined: 5 Dec 05
Posts: 162
Credit: 160,342,159
RAC: 0


Quote:

Hello everyone

I have a GTS 250. My best time for one task was 3,200 seconds and the worst 4,500, and my CPU is an FX-6100 running at 3,600 MHz. I'm running CPU tasks too. How can I run more than one task?
Just by setting the GPU utilization factor of BRP apps from 1 to 0.5?
I saw in the list that someone runs 2 at 5,400 seconds!?


Yes. If your Einstein BRP4 cache is empty, you'll see the change as soon as new tasks start downloading and crunching. If you already have BRP4 tasks in the queue, however, don't expect to see the change right away: any tasks that were already in the queue when you changed the factor to 0.5 will continue to crunch one at a time, and only tasks downloaded after the change will crunch 2 at a time.
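The behaviour described above can be pictured with a deliberately simplified model: each task keeps the GPU fraction it was downloaded with (1.0 before the change, 0.5 after), and tasks start in queue order as long as their fractions fit on one GPU. This is a hypothetical Python sketch, not BOINC's actual client scheduler, which is more involved; `Task`, `tasks_per_gpu` and `start_in_order` are illustrative names:

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    gpu_usage: float  # GPU fraction, assumed fixed when the task was downloaded


def tasks_per_gpu(utilization_factor):
    """How many tasks with this factor fit on one GPU (0.5 -> 2, 0.33 -> 3)."""
    # Small epsilon so float rounding (e.g. 1 / 0.2) can't drop a whole task.
    return int(1.0 / utilization_factor + 1e-9)


def start_in_order(queue):
    """Start tasks first-come-first-served while their fractions fit on one GPU."""
    running, used = [], 0.0
    for task in queue:
        if used + task.gpu_usage > 1.0 + 1e-9:
            break  # the next task in line waits for a running one to finish
        running.append(task)
        used += task.gpu_usage
    return running


if __name__ == "__main__":
    queue = [Task("old-1", 1.0), Task("old-2", 1.0), Task("new-1", 0.5), Task("new-2", 0.5)]
    print([t.name for t in start_in_order(queue)])      # old tasks still run alone
    print([t.name for t in start_in_order(queue[2:])])  # once they finish, two at a time
```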

Petrion
Joined: 30 Apr 08
Posts: 53
Credit: 1,243,186
RAC: 0


Quote:
That v1.28 run time of 2x 2,654 for the GTX 560 Ti seems a bit funny, considering an individual task took only 1,100 s. Conventional wisdom says that 2 tasks in parallel should take < 2,200 s. Or perhaps the 1x and 2x run times were documented on two different hosts?


Yes, most of the places where the times seem a little off are due to different hosts. It's a combination of some people posting only, say, a 2x or 3x time and someone else posting a 1x time, or someone posting a faster 1x time than another person. In all cases I take the fastest time someone posts or PMs me and use that.
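The "take the fastest report" rule amounts to keeping a running minimum per card and per number of parallel tasks. A minimal Python sketch, assuming reports arrive as (card, tasks in parallel, seconds) tuples; `best_times` is a hypothetical helper, and it deliberately ignores host details such as the CPU, free cores or PCIe version:

```python
def best_times(reports):
    """Keep the fastest reported time for each (card, tasks-in-parallel) pair.

    `reports` is an iterable of (card, tasks_in_parallel, seconds) tuples.
    Host details (CPU, free cores, PCIe version, other projects) are ignored.
    """
    best = {}
    for card, tasks, seconds in reports:
        key = (card, tasks)
        if key not in best or seconds < best[key]:
            best[key] = seconds
    return best


if __name__ == "__main__":
    reports = [
        ("GTX 560 Ti", 1, 1900),
        ("GTX 560 Ti", 1, 1100),  # a faster 1x report, possibly from another host, wins
        ("GTX 560 Ti", 2, 3094),
        ("GTX 560 Ti", 2, 2654),
    ]
    for (card, tasks), seconds in sorted(best_times(reports).items()):
        print(f"{card} {tasks}x: {seconds} s")
```

Because the 1x and 2x minimums can come from different machines, this is exactly how a 2x time can end up looking slow next to its 1x neighbour.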

Unfortunately, the CPU, the number of free cores to feed the GPU, mixing with other BOINC projects, the PCIe version, the phase of the moon... all play a part in the end result. My own CPU is OC'd, and that alone shaves off 300-400 seconds.

But sorting the list by system specs, assuming we even have all the relevant info, would be too complicated (cool, but messy), so the simplest way is to just take the best times available, take a swig of whiskey, and after a while all the numbers will blur together... beer goggles can make anything look good if you squint hard enough. :)
