This will be the first consumer-grade card with FP64 performance similar to its Tesla counterpart, the K20X, while costing a fraction of the K20X's price. There is an option in the NVIDIA Control Panel to enable full FP64 performance (1/3 of FP32 performance) at the expense of running lower clock frequencies. Granted, the FP64 improvement will not help Einstein@Home, but it should come in handy for a project like Milkyway@home.
i just finished some fairly extensive testing to determine how best to distribute 6 GPUs across 3 individual machines. since i documented it all, i figured some of the data could be used here.
specifically, i documented the run times of Einstein@Home BRP tasks on a Gigabyte WindForce GTX 560 Ti, a Zotac GTX 580 3GB, and a Gigabyte WindForce GTX 670. not only did i test each card individually at full PCIe 2.0 x16 bandwidth, but i also tested 2 dual GPU configurations (GTX 580 + GTX 560 Ti, and GTX 580 + GTX 670), both at PCIe 2.0 x8 bandwidth (due to the limited number of PCIe lanes the 790FX and 890GX chipsets on my motherboards have available to them).
btw, these machines are all Win7 x64 platforms...
...so without further ado, here are the run times - let's start w/ the GTX 560 Ti at full PCIe 2.0 x16 bandwidth:
here are the run times for the GTX 580 at full PCIe 2.0 x16 bandwidth:
here are the run times for the GTX 670 at full PCIe 2.0 x16 bandwidth:
here is the dual GPU setup w/ the GTX 580 and the GTX 560 Ti, both at PCIe 2.0 x8 bandwidth:
and finally, here is the dual GPU setup w/ the GTX 580 and the GTX 670, both at PCIe 2.0 x8 bandwidth:
these last two tables are here just so people can reference run times of tasks crunched on GPUs limited to PCIe 2.0 x8 bandwidth...their x16 counterparts are obviously going to finish faster. just in case it isn't perfectly clear, you want the numbers from the "run time for N simultaneous tasks" column of each table. aside from making only a minor contribution to the GTX 560 Ti row of your spreadsheet, my data can fill in the entire GTX 670 row, as well as the remaining missing values in the GTX 580 row...
...my apologies for not making this data viewable before you updated the spreadsheet. while i did finish testing a few days before your update, i only just got done organizing the data i collected.
Eric
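for anyone converting the "run time for N simultaneous tasks" numbers into throughput: running N tasks at once finishes N tasks per wall-clock runtime, so tasks per hour is just N * 3600 / runtime. a quick sketch of the math (the runtimes below are placeholders for illustration, not values from my tables):

```python
# convert "run time for N simultaneous tasks" into throughput (tasks/hour):
# N tasks finish together after runtime_seconds, so
# throughput = N * 3600 / runtime_seconds

def tasks_per_hour(n_simultaneous: int, runtime_seconds: float) -> float:
    """tasks completed per hour when n tasks finish together."""
    return n_simultaneous * 3600.0 / runtime_seconds

# placeholder runtimes, just to show the arithmetic:
print(tasks_per_hour(1, 2000))  # 1.8 tasks/hour
print(tasks_per_hour(5, 2950))  # ~6.1 tasks/hour
```

this is why running several tasks simultaneously can pay off even though each individual task takes longer.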
RE: RE: Updated list
Thx, added. I'm very impressed by the ~2950 secs for 5x. Nice!
DSKAG Austria Research Team: [LINK]http://www.research.dskag.at[/LINK]
RE: NVIDIA just recently
Hi - I'm just wondering if you would elaborate on why the Titan FP64 performance won't help einstein@home.
I only crunch einstein@home, and was seriously thinking of using this card, specifically because of the FP64 performance.
Don't want to drop that much cash if it's not going to be worth it.
Please elaborate.
RE: Hi - I'm just wondering
b/c Einstein@Home only requires single precision (FP32) calculations, not double precision (FP64).
if you really want to take advantage of the Titan's FP64 performance, you'll have to put it to work on one of the handful of projects out there that actually require double precision, like Milkyway@Home for example. but even then it probably won't be worth the initial investment. while the Titan's FP64 performance is some 37% greater than the HD 7970's, two HD 7970s would far outpace a single Titan in Milkyway@Home at only 70% of the cost of a Titan, maybe less. and while their power consumption (and thus electricity cost) might be double the Titan's (though in reality i doubt it), you'd have to crunch on the dual HD 7970s flat out 24/7 for several years before the extra electricity offset the ~$300 you'd save on the initial investment.
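to put rough numbers on that break-even claim, here's a sketch of the electricity math. the savings, extra wattage, and electricity price below are all assumptions for illustration, not measurements:

```python
# rough break-even: how long must dual HD 7970s crunch 24/7 before their
# extra electricity cost eats the ~$300 saved on the purchase price?
# every input below is an assumption for illustration, not a measurement.

savings_usd = 300.0      # assumed price gap: one Titan vs two HD 7970s
extra_watts = 100.0      # assumed extra draw of two 7970s over one Titan
price_per_kwh = 0.12     # assumed electricity price in USD/kWh

extra_cost_per_hour = (extra_watts / 1000.0) * price_per_kwh  # USD/hour
hours_to_break_even = savings_usd / extra_cost_per_hour
years = hours_to_break_even / (24.0 * 365.0)
print(f"~{years:.1f} years of 24/7 crunching to offset ${savings_usd:.0f}")
# prints ~2.9 years with these assumed inputs
```

plug in your own wattage and local electricity price; the conclusion is sensitive to both.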
Keep up the good work DSKAG
And nice job with the testing and the chart Eric!
I thought for sure that einstein@home used double-precision math? I guess I was wrong...
RE: I thought for sure that
You are right, it does not. AFAIK only MW uses DP.
RE: RE: I thought for
PrimeGrid's GeneferCUDA also requires a double-precision GPU and is very sensitive to overclocked chips, even factory GPU overclocks. GeneferCUDA doesn't tolerate even the slightest error.
RE: RE: RE: I thought
Are there any other FP projects for ATI/AMD, or just the NV one at PrimeGrid?
RE: Are there any other FP
There is one for both NVIDIA and AMD cards, as well as a CPU app, on PrimeGrid: the Proth Prime Search (Sieve). It also has a much shorter completion time than the GeneferCUDA work units, which I've had problems with.
I don't know how to upload a screenshot, so I'll copy'n'paste the requirements for AMD cards and the PPS (Sieve) app:
Proth Prime Search (Sieve)
Supported platforms:
Windows: 32bit, 64bit (+CUDA23, AMD OpenCL1)
Linux: 32bit, 64bit (+CUDA23, AMD OpenCL1)
Mac: 32bit, 64bit (+CUDA32, AMD OpenCL1 - 64 bit only)
1 Requires AMD Accelerated Parallel Processing (APP) drivers. If the APP driver is not available for your card, then the ATI Stream SDK is needed.
Nvidia Windows drivers 295.xx and 296.xx should not be used.
Recent average CPU time: 28:54:11
Recent average GPU time: 39:11
EDIT: I just realized that I don't know what you mean by 'FP' projects, so I don't know if my reply is valid. Floating-point?
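for a sense of scale, the two averages quoted above imply the GPU app is roughly 44x faster than the CPU app. a quick sketch of that arithmetic, parsing the times as H:MM:SS and MM:SS:

```python
# compare the recent average CPU and GPU times quoted above

def to_seconds(t: str) -> int:
    """parse a colon-separated time string (H:MM:SS or MM:SS) into seconds."""
    seconds = 0
    for part in t.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

cpu = to_seconds("28:54:11")  # 104051 s
gpu = to_seconds("39:11")     # 2351 s
print(f"GPU is roughly {cpu / gpu:.0f}x faster")  # roughly 44x
```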