CUDA and OpenCL Benchmarks

Sunny129
Sunny129
Joined: 5 Dec 05
Posts: 162
Credit: 160342159
RAC: 0

I just finished some fairly

I just finished some fairly extensive testing to determine how best to distribute 6 GPUs across 3 individual machines. Since I documented it all, I figured some of the data could be used here.

Specifically, I documented the run times of Einstein@Home BRP tasks on a Gigabyte WindForce GTX 560 Ti, a Zotac GTX 580 3GB, and a Gigabyte WindForce GTX 670. Not only did I test each card individually at full PCIe 2.0 x16 bandwidth, but I also tested two dual-GPU configurations (GTX 580 + GTX 560 Ti, and GTX 580 + GTX 670), both at PCIe 2.0 x8 bandwidth (due to the limited number of PCIe lanes the 790FX and 890GX chipsets on my motherboards have available).

BTW, these machines are all Win7 x64 platforms...

...so without further ado, here are the run times. Let's start with the GTX 560 Ti at full PCIe 2.0 x16 bandwidth:

Here are the run times for the GTX 580 at full PCIe 2.0 x16 bandwidth:

Here are the run times for the GTX 670 at full PCIe 2.0 x16 bandwidth:

Here is the dual-GPU setup with the GTX 580 and the GTX 560 Ti, both at PCIe 2.0 x8 bandwidth:

And finally, here is the dual-GPU setup with the GTX 580 and the GTX 670, both at PCIe 2.0 x8 bandwidth:

Obviously, these last two tables are here just so people can reference run times of tasks crunched on GPUs limited to PCIe 2.0 x8 bandwidth; their x16 counterparts are obviously going to run faster/finish in less time. Just in case it isn't perfectly clear, you want the numbers from the "run time for N simultaneous tasks" column of each table. Aside from making only a minor contribution to the GTX 560 Ti row of your spreadsheet, my data can fill in the entire GTX 670 row, as well as the remaining missing values from the GTX 580 row...
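To make the "run time for N simultaneous tasks" column easier to interpret, here is a minimal sketch of the throughput math. The run times below are made-up placeholders, not values from my tables:

```python
# Hedged sketch: comparing single-task vs multi-task GPU throughput.
# The run times used here are illustrative placeholders only.

def tasks_per_day(run_time_s: float, n_simultaneous: int) -> float:
    """Tasks completed per day when n tasks share the GPU.

    Each task takes run_time_s wall-clock seconds, but n of them
    finish together, so daily throughput scales with n."""
    return 86400.0 / run_time_s * n_simultaneous

# Example: a card finishing 1 task in 3000 s vs 3 tasks in 8000 s.
single = tasks_per_day(3000.0, 1)   # 28.8 tasks/day
triple = tasks_per_day(8000.0, 3)   # 32.4 tasks/day
# Running 3x is a net win here even though each task takes longer.
```

The point is that a longer per-task run time can still mean higher overall output, which is why the N-simultaneous column is the one to read.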

...my apologies for not making this data viewable before you updated the spreadsheet. While I did finish testing a few days before your update, I only just got done organizing the data I collected.

Eric

dskagcommunity
dskagcommunity
Joined: 16 Mar 11
Posts: 89
Credit: 1219701683
RAC: 65713

RE: RE: Updated list

Quote:
Quote:

Updated list after a very long time ^^

[LINK]http://www.dskag.at/images/Research/EinsteinGPUperformancelist.pdf[/LINK]

Thx for the new values & for pointing out some typos :)


Very useful, thanks! Here's an example of the Titan running 5x:

http://einsteinathome.org/host/6889477/tasks&offset=0&show_names=1&state=3&appid=0

Thx, added. I'm very impressed by ~2950 secs for 5x. Nice, nice.

DSKAG Austria Research Team: [LINK]http://www.research.dskag.at[/LINK]

John Jamulla
John Jamulla
Joined: 26 Feb 05
Posts: 32
Credit: 1187979911
RAC: 641877

RE: NVIDIA just recently

Quote:
NVIDIA just recently announced a new consumer grade card, the Titan:

http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-titan/specifications

This will be the first consumer grade card with similar FP64 performance to its Tesla counterpart, the K20x while costing a fraction of the K20x. There is an option added to the NVIDIA control panel to enable the full FP64 performance at 1/3 of FP32 performance at the expense of running lower clock frequency. Granted, the FP64 improvements will not help Einstein@home but should come in handy for a project like Milkyway@home.

Hi - I'm just wondering if you would elaborate on why the Titan FP64 performance won't help einstein@home.

I only crunch einstein@home, and was seriously thinking of using this card, specifically becuase of the FP64 performace.

Don't want to drop that much cash if it's not going to be worth it.

Please elaborate.

Sunny129
Sunny129
Joined: 5 Dec 05
Posts: 162
Credit: 160342159
RAC: 0

RE: Hi - I'm just wondering

Quote:

Hi - I'm just wondering if you would elaborate on why the Titan's FP64 performance won't help Einstein@home.

I only crunch Einstein@home, and was seriously thinking of using this card, specifically because of the FP64 performance.

Please elaborate.


Because Einstein@Home only requires single-precision (FP32) calculations, not double-precision (FP64).

If you really want to take advantage of the Titan's FP64 performance, you'll have to put it to work on one of the handful of projects out there that require double precision, like Milkyway@Home for example. But even then it probably won't be worth the initial investment. While the Titan's FP64 performance is some 37% greater than the HD 7970's, two HD 7970s would far outpace a single Titan in Milkyway@Home at only 70% of the cost of a Titan, maybe less. And while their power consumption (and thus electricity costs) might be double the Titan's (though in reality I doubt it), you'd have to crunch on the dual HD 7970s flat out 24/7 for several years before the extra electricity cost would offset the ~$300 you'd save on the initial investment.
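The payback argument above can be sketched in a few lines. The wattage gap and electricity price below are assumptions for illustration, not measurements:

```python
# Hedged sketch of the price-vs-electricity payback arithmetic.
# extra_watts and usd_per_kwh are assumed values, not measured ones.

def payback_years(price_gap_usd: float, extra_watts: float,
                  usd_per_kwh: float) -> float:
    """Years of 24/7 crunching before the extra electricity cost
    of the cheaper setup erases its up-front price saving."""
    extra_kwh_per_year = extra_watts / 1000.0 * 24 * 365
    return price_gap_usd / (extra_kwh_per_year * usd_per_kwh)

# e.g. $300 saved up front, dual HD 7970s drawing an assumed ~100 W
# more than a single Titan, at an assumed $0.12/kWh:
years = payback_years(300.0, 100.0, 0.12)  # close to 3 years
```

With a larger power gap or pricier electricity the crossover comes sooner, which is why the actual wattage difference matters so much to the conclusion.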

MAGIC Quantum Mechanic
MAGIC Quantum M...
Joined: 18 Jan 05
Posts: 1907
Credit: 1439266145
RAC: 1206429

Keep up the good work DSKAG


Keep up the good work DSKAG

And nice job with the testing and the chart Eric!

John Jamulla
John Jamulla
Joined: 26 Feb 05
Posts: 32
Credit: 1187979911
RAC: 641877

I thought for sure that

I thought for sure that einstein@home used double-precision math? I guess I was wrong...

Beyond
Beyond
Joined: 28 Feb 05
Posts: 123
Credit: 2480181007
RAC: 4830506

RE: I thought for sure that

Quote:
I thought for sure that einstein@home used double-precision math? I guess I was wrong...


You are right, it does not. AFAIK only MW uses DP.
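For anyone unsure what the FP32/FP64 distinction actually means, here is a minimal illustration: rounding a double-precision value down to single precision and back shows the digits that FP32 gives up.

```python
# Hedged illustration of single (FP32) vs double (FP64) precision.
import struct

def to_fp32(x: float) -> float:
    """Round a Python float (FP64) to single precision (FP32)
    by packing it into a 4-byte IEEE 754 float and unpacking it."""
    return struct.unpack('f', struct.pack('f', x))[0]

x = 1.0 / 3.0
print(f"FP64: {x:.17f}")
print(f"FP32: {to_fp32(x):.17f}")
# The FP32 value diverges after about 7 significant digits;
# DP projects like Milkyway@home need the extra FP64 digits.
```

Einstein@Home's math tolerates the FP32 rounding, which is why the Titan's fast FP64 mode buys nothing here.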

Wingless Wonder
Wingless Wonder
Joined: 15 Jan 06
Posts: 4
Credit: 3101932
RAC: 0

RE: RE: I thought for

Quote:
Quote:
I thought for sure that einstein@home used double-precision math? I guess I was wrong...

You are right, it does not. AFAIK only MW uses DP.


PrimeGrid's GeneferCUDA also requires a double-precision GPU and is very sensitive to overclocked chips, even factory GPU overclocks. GeneferCUDA doesn't tolerate even the slightest of errors.

Beyond
Beyond
Joined: 28 Feb 05
Posts: 123
Credit: 2480181007
RAC: 4830506

RE: RE: RE: I thought

Quote:
Quote:
Quote:
I thought for sure that einstein@home used double-precision math? I guess I was wrong...

You are right, it does not. AFAIK only MW uses DP.

PrimeGrid's GeneferCUDA also requires a double-precision GPU and is very sensitive to overclocked chips, even factory GPU overclocks. GeneferCUDA doesn't tolerate even the slightest of errors.


Are there any other FP projects for ATI/AMD, or just the NV one at PrimeGrid?

Wingless Wonder
Wingless Wonder
Joined: 15 Jan 06
Posts: 4
Credit: 3101932
RAC: 0

RE: Are there any other FP

Quote:
Are there any other FP projects for ATI/AMD, or just the NV one at PrimeGrid?


There is one for both NVIDIA and AMD cards, as well as a CPU app, on PrimeGrid: the Proth Prime Search (Sieve). It also has a much shorter completion time than the GeneferCUDA work units, which I've had problems with.

I don't know how to upload a screenshot, so I'll copy'n'paste the requirements for AMD cards and the PPS (Sieve) app:

Proth Prime Search (Sieve)
Supported platforms:

Windows: 32bit, 64bit (+CUDA23, AMD OpenCL1)
Linux: 32bit, 64bit (+CUDA23, AMD OpenCL1)
Mac: 32bit, 64bit (+CUDA32, AMD OpenCL1 - 64 bit only)

1 Requires AMD Accelerated Parallel Processing (APP) drivers. If the APP driver is not available for your card, then the ATI Stream SDK is needed.
Nvidia Windows drivers 295.xx and 296.xx should not be used.

Recent average CPU time: 28:54:11
Recent average GPU time: 39:11
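Those two averages imply a large GPU speedup; a minimal sketch of the conversion, using the times quoted above:

```python
# Hedged sketch: turning the quoted PPS (Sieve) average times
# into a CPU-vs-GPU speedup ratio.

def to_seconds(t: str) -> int:
    """Parse 'hh:mm:ss' or 'mm:ss' into total seconds."""
    secs = 0
    for part in t.split(':'):
        secs = secs * 60 + int(part)
    return secs

cpu = to_seconds('28:54:11')  # 104051 s
gpu = to_seconds('39:11')     # 2351 s
speedup = cpu / gpu           # roughly 44x faster on the GPU
```

This assumes the GPU figure is mm:ss as printed; either way the sieve app clearly favors GPUs by a wide margin.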

EDIT: I just realized that I don't know what you mean by 'FP' projects, so I don't know if my reply is valid. Floating-point?
