This will be the first consumer-grade card with FP64 performance similar to its Tesla counterpart, the K20X, while costing a fraction of the K20X's price. There is an option in the NVIDIA Control Panel to enable full FP64 performance (1/3 of FP32 performance) at the expense of running lower clock frequencies. Granted, the FP64 improvement will not help Einstein@Home, but it should come in handy for a project like Milkyway@home.
i just finished some fairly extensive testing to determine how best to distribute 6 GPUs across 3 individual machines. since i documented it all, i figured some of the data could be used here.
specifically, i documented the run times of Einstein@Home BRP tasks on a Gigabyte WindForce GTX 560 Ti, a Zotac GTX 580 3GB, and a Gigabyte WindForce GTX 670. not only did i test each card individually at full PCIe 2.0 x16 bandwidth, but i also tested 2 dual GPU configurations (GTX 580 + GTX 560 Ti, and GTX 580 + GTX 670), both at PCIe 2.0 x8 bandwidth (due to the limited number of PCIe lanes the 790FX and 890GX chipsets on my motherboards have available to them).
btw, these machines are all Win7 x64 platforms...
...so without further ado, here are the run times - let's start w/ the GTX 560 Ti at full PCIe 2.0 x16 bandwidth:
here are the run times for the GTX 580 at full PCIe 2.0 x16 bandwidth:
here are the run times for the GTX 670 at full PCIe 2.0 x16 bandwidth:
here is the dual GPU setup w/ the GTX 580 and the GTX 560 Ti, both at PCIe 2.0 x8 bandwidth:
and finally, here is the dual GPU setup w/ the GTX 580 and the GTX 670, both at PCIe 2.0 x8 bandwidth:
these last two tables are here just so people can reference run times of tasks crunched on GPUs limited to PCIe 2.0 x8 bandwidth...their x16 counterparts are obviously going to finish faster. just in case it isn't perfectly clear, you want the numbers from the "run time for N simultaneous tasks" column of each table. aside from making only a minor contribution to the GTX 560 Ti row of your spreadsheet, my data can fill in the entire GTX 670 row, as well as the remaining missing values in the GTX 580 row...
...my apologies for not making this data viewable before you updated the spreadsheet. while i did finish testing a few days before your update, i only just got done organizing the data i collected.
Eric
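for anyone converting the "run time for N simultaneous tasks" numbers into throughput: running N tasks at once finishes N tasks per wall-clock runtime, so tasks per hour is just N * 3600 / runtime. a quick sketch of the math (the runtimes below are placeholders for illustration, not values from my tables):

```python
# convert "run time for N simultaneous tasks" into throughput (tasks/hour):
# N tasks finish together after runtime_seconds, so
# throughput = N * 3600 / runtime_seconds

def tasks_per_hour(n_simultaneous: int, runtime_seconds: float) -> float:
    """tasks completed per hour when n tasks finish together."""
    return n_simultaneous * 3600.0 / runtime_seconds

# placeholder runtimes, just to show the arithmetic:
print(tasks_per_hour(1, 2000))  # 1.8 tasks/hour
print(tasks_per_hour(5, 2950))  # ~6.1 tasks/hour
```

this is why running several tasks simultaneously can pay off even though each individual task takes longer.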
RE: RE: Updated list
Thx, added. I'm very impressed by the ~2950 secs for 5x. Nice!
DSKAG Austria Research Team: [LINK]http://www.research.dskag.at[/LINK]
RE: NVIDIA just recently
Hi - I'm just wondering if you would elaborate on why the Titan FP64 performance won't help einstein@home.
I only crunch einstein@home, and was seriously thinking of using this card, specifically because of the FP64 performance.
Don't want to drop that much cash if it's not going to be worth it.
Please elaborate.
RE: Hi - I'm just wondering
b/c Einstein@Home only requires single precision (FP32) calculations, not double precision (FP64).
if you really want to take advantage of the Titan's FP64 performance, you'll have to put it to work on one of the handful of projects out there that actually require double precision, like Milkyway@Home for example. but even then it probably won't be worth the initial investment. while the Titan's FP64 performance is some 37% greater than the HD 7970's, two HD 7970s would far outpace a single Titan in Milkyway@Home at only 70% of the cost of a Titan, maybe less. and while their power consumption (and thus electricity cost) might be double the Titan's (though in reality i doubt it), you'd have to crunch on the dual HD 7970s flat out 24/7 for several years before the extra electricity offset the ~$300 you'd save on the initial investment.
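to put rough numbers on that break-even claim, here's a sketch of the electricity math. the savings, extra wattage, and electricity price below are all assumptions for illustration, not measurements:

```python
# rough break-even: how long must dual HD 7970s crunch 24/7 before their
# extra electricity cost eats the ~$300 saved on the purchase price?
# every input below is an assumption for illustration, not a measurement.

savings_usd = 300.0      # assumed price gap: one Titan vs two HD 7970s
extra_watts = 100.0      # assumed extra draw of two 7970s over one Titan
price_per_kwh = 0.12     # assumed electricity price in USD/kWh

extra_cost_per_hour = (extra_watts / 1000.0) * price_per_kwh  # USD/hour
hours_to_break_even = savings_usd / extra_cost_per_hour
years = hours_to_break_even / (24.0 * 365.0)
print(f"~{years:.1f} years of 24/7 crunching to offset ${savings_usd:.0f}")
# prints ~2.9 years with these assumed inputs
```

plug in your own wattage and local electricity price; the conclusion is sensitive to both.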
Keep up the good work DSKAG
And nice job with the testing and the chart Eric!
I thought for sure that einstein@home used double-precision math? I guess I was wrong...
RE: I thought for sure that
You are right, it does not. AFAIK only MW uses DP.
RE: RE: I thought for
PrimeGrid's GeneferCUDA also requires a double-precision GPU and is very sensitive to overclocked chips, even factory GPU overclocks. GeneferCUDA doesn't tolerate even the slightest error.
RE: RE: RE: I thought
Are there any other FP projects for ATI/AMD, or just the NV one at PrimeGrid?
RE: Are there any other FP
There is one for both NVIDIA and AMD cards, as well as a CPU app, on PrimeGrid: the Proth Prime Search (Sieve). It also has a much shorter completion time than the GeneferCUDA work units, which I've had problems with.
I don't know how to upload a screenshot, so I'll copy'n'paste the requirements for AMD cards and the PPS (Sieve) app:
Proth Prime Search (Sieve)
Supported platforms:
Windows: 32bit, 64bit (+CUDA23, AMD OpenCL1)
Linux: 32bit, 64bit (+CUDA23, AMD OpenCL1)
Mac: 32bit, 64bit (+CUDA32, AMD OpenCL1 - 64 bit only)
1 Requires AMD Accelerated Parallel Processing (APP) drivers. If the APP driver is not available for your card, then the ATI Stream SDK is needed.
Nvidia Windows drivers 295.xx and 296.xx should not be used.
Recent average CPU time: 28:54:11
Recent average GPU time: 39:11
EDIT: I just realized that I don't know what you mean by 'FP' projects, so I don't know if my reply is valid. Floating-point?
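for a sense of scale, the two averages quoted above imply the GPU app is roughly 44x faster than the CPU app. a quick sketch of that arithmetic, parsing the times as H:MM:SS and MM:SS:

```python
# compare the recent average CPU and GPU times quoted above

def to_seconds(t: str) -> int:
    """parse a colon-separated time string (H:MM:SS or MM:SS) into seconds."""
    seconds = 0
    for part in t.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

cpu = to_seconds("28:54:11")  # 104051 s
gpu = to_seconds("39:11")     # 2351 s
print(f"GPU is roughly {cpu / gpu:.0f}x faster")  # roughly 44x
```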