But, as experience on GPUGrid shows, higher-end cards are sometimes the only way to get the work done. That said, the run time of the current tasks on Einstein is about 6-20 hours on typical systems; with a 9800GT card or above, I would expect this to drop to roughly 30 minutes to an hour...
Now that's a head start, since I'm planning a small investment in a 9800GTX+, hoping to kext it on a Mac Pro. Then again, I can use Boot Camp with it in case the Mac GPU client fails to surface...
Paul: The danger with that calculation is that GPUGrid is a single-precision app, while Einstein is double precision. GeForce cards are 8x slower in double precision (I've no idea about ATI). This means that if the Einstein team can't rework their app to run in single precision without losing the needed level of output quality, it will run much slower.
I would not qualify E@H as "double precision". It is true that E@H uses double precision arithmetic in some places, but it also does a whole lot of computation in single precision.
If you compare the performance of the variant of the E@H app that is optimized for SSE (whose SIMD instructions are single precision only) to the one optimized for SSE2 and higher (capable of double-precision SIMD), you'll see that the difference in performance is not that great. You should be able to get quite a lot of boost even from single-precision GPU code.
That's for the S5R5 gravitational wave pulsar search. Note that there's now also the Arecibo EM binary pulsar search here at E@H. This app could benefit from GPUs as well, I guess.
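For anyone wondering what "single vs. double" actually buys you numerically, here is a tiny Python sketch (`f32` is a helper I am defining here to round a value to IEEE-754 single precision; treat this as an illustration, not anything from the E@H code):

```python
import struct

def f32(x):
    # round a Python float (double) to the nearest IEEE-754 single-precision value
    return struct.unpack("f", struct.pack("f", x))[0]

# machine epsilon: the smallest eps with 1 + eps != 1
eps32 = 2.0 ** -23   # single precision (24-bit mantissa), ~1.19e-7
eps64 = 2.0 ** -52   # double precision (53-bit mantissa), ~2.22e-16

# single precision resolves ~7 decimal digits, double ~16:
assert f32(1.0 + eps32) != 1.0 and f32(1.0 + eps32 / 2) == 1.0
assert (1.0 + eps64) != 1.0 and (1.0 + eps64 / 2) == 1.0
```

So single precision carries roughly 7 significant decimal digits against double's 16, which is why only a few error-accumulating spots in an app genuinely need double.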
That's a pity, as the ATI HD 38xx and 48xx series seem to be good at double precision, which is why they are so heavily used at MW. That project is heavily reliant on double precision, as I understand.
Certainly the processing speed-up is well worth the graphics card upgrade (assuming WU feed is not an issue).
Using the old AGP HD38xx GPU will bring a six-year-old P4 PC up to the output of a 25% overclocked, unlocked Penryn extreme quad.
Shih-Tzu are clever, cuddly, playful and rule!! Jack Russell are feisty!
If the code at E@H is "heavy" with double-precision code then, well, the number of CUDA cards that will be usable drops like a rock...
But we are all speculating in a vacuum, with no actual information or application to pick apart. Which is why I am patiently waiting until there is an actual application before I get that interested in making a change to the GPUs I have on hand ... :)
In the "NVIDIA CUDA Programming Guide 2.0", Appendix A "Technical Specifications", section A.1 "General Specifications", I found that only the following devices have compute capability 1.3:
GeForce GTX 280
GeForce GTX 260
Tesla S1070
Tesla C1060
And in section A.1.4 "Specifications for Compute Capability 1.3" I read:
Quote:
Support for double-precision floating-point numbers.
If I understand right, devices "smaller" than the GTX 260/280 and Tesla S1070/C1060 cannot operate on double-precision numbers and "truncate" operands to single-precision floats.
P.S. But for ATI Radeons we have a different situation. Or not?
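That reading matches the guide: double-precision hardware starts at compute capability 1.3. As a sketch of the check (the table below is hand-filled from Appendix A plus a couple of lower-end cards mentioned in this thread; real code would query `cudaGetDeviceProperties()` instead of a hard-coded table):

```python
# Hypothetical lookup table: device name -> compute capability (major, minor).
# Hand-filled for illustration; a real app asks the CUDA runtime at startup.
COMPUTE_CAPABILITY = {
    "GeForce GTX 280":   (1, 3),
    "GeForce GTX 260":   (1, 3),
    "Tesla S1070":       (1, 3),
    "Tesla C1060":       (1, 3),
    "GeForce 9800 GTX+": (1, 1),
    "GeForce 8800 GT":   (1, 1),
}

def supports_double(device: str) -> bool:
    # double-precision hardware arrived with compute capability 1.3
    return COMPUTE_CAPABILITY.get(device, (0, 0)) >= (1, 3)
```

By this check, the 9800-class cards discussed above would have to fall back to single precision (or software emulation) for any double-precision work.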
Paul: The danger with that calculation is that GPUGrid is a single-precision app. Einstein is double precision. GeForce cards are 8x slower in double precision. (I've no idea about ATI.)
Have a look at Milkyway@Home, where an ATI-optimized application is available. A work unit that takes 20 minutes on a 2.4 GHz Core2 with an SSSE3-optimized application is crunched within 26 seconds using an ATI Radeon HD4850; see my statistics page for this. If you wonder about the drop in WUs being processed: the server at M@H is simply not able to create enough WUs for the high demand caused by all the ATI users flooding that project.
M@H is using double precision as well, so assuming the same factor of optimization with Einstein@Home, crunching a WU with the above GPU should be finished within 5 to 10 minutes.
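Just to sanity-check that factor with a back-of-the-envelope script (a rough sketch, not a measurement; the 6-20 hour range is the one quoted earlier in this thread, and E@H has no GPU app yet):

```python
# Scaling sketch using the Milkyway@Home numbers quoted above.
cpu_seconds = 20 * 60        # 20 min per WU on a 2.4 GHz Core2 (SSSE3 app)
gpu_seconds = 26             # same WU on a Radeon HD4850
speedup = cpu_seconds / gpu_seconds          # roughly 46x

# Applying the same factor to the 6-20 hour Einstein task range
# mentioned earlier in the thread (purely hypothetical):
low_minutes  = 6 * 3600 / speedup / 60       # about 8 minutes
high_minutes = 20 * 3600 / speedup / 60      # about 26 minutes
```

So a ~46x factor puts the shorter tasks in the 5-10 minute ballpark; the exact figure obviously depends on which CPU time you start from.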
P.S. But for ATI Radeons we have a different situation. Or not?
Well, NOT for the 38x0 and 48x0 cards ... but yes, likely for all others ...
In other words, if you need lots of double precision you need a 38xx- or 48xx-class card, and all others need not apply ...
This is probably one of the more common questions on GPUGrid with regard to Nvidia cards, and at MW for ATI ...
The truth is that to do GPU computing you need a card that, in general, costs more than $100 just to get into the game at the low end. If you want to do serious work, you need to start thinking of a card in the $200+ range ... Domination requires a true commitment of cash ... :)
Sure, but you replace 60 regular computers with one GPU, so even $200+ is not too much.
If I understand right, devices "smaller" than the GTX 260/280 and Tesla S1070/C1060 cannot operate on double-precision numbers and "truncate" operands to single-precision floats.
Nope. For these, double-precision operations will be emulated in software using multiple single-precision operations, which is much slower than on GPUs that support double precision in hardware, but will still give correct double-precision results.
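For the curious, the core trick behind such emulation can be sketched in a few lines: an error-free transformation (Knuth's two-sum) recovers, using only single-precision operations, the part of a sum that a lone float drops. (Python sketch; `f32` is my helper that rounds to single precision. The real CUDA emulation is considerably more involved.)

```python
import struct

def f32(x):
    # round a Python float (double) to the nearest IEEE-754 single-precision value
    return struct.unpack("f", struct.pack("f", x))[0]

def two_sum(a, b):
    """Knuth's error-free addition: returns (s, err) with s + err == a + b
    exactly, using only single-precision operations."""
    s   = f32(a + b)
    bb  = f32(s - a)
    err = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, err

# 16777217 = 2^24 + 1 is NOT representable in single precision ...
a, b = f32(16777216.0), f32(1.0)
s, err = two_sum(a, b)
# ... a lone float32 addition loses the 1.0 (s == 16777216.0),
# but the (s, err) pair still carries the exact result:
assert s == 16777216.0 and err == 1.0
assert s + err == 16777217.0
```

Chaining such pairs ("double-single" arithmetic) is what makes software-emulated double precision correct, at the cost of several single-precision operations per emulated one.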
The core functions of the "HierarchicalSearch" are all single precision. There are a few variables that accumulate many small numeric values and would add up to a large error if simply switched to single precision, but in the code we have used since S5R2 the use of such double-precision variables has been deliberately reduced to a minimum; I don't think they will be the limiting factor.
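To see why those accumulator variables matter, here is a small Python sketch (`f32` rounds to single precision; `naive_sum`/`kahan_sum` are my own names for illustration): once an accumulator passes 2^24, plain single precision silently drops +1.0 increments, while compensated (Kahan) summation keeps them.

```python
import struct

def f32(x):
    # round a Python float (double) to the nearest IEEE-754 single-precision value
    return struct.unpack("f", struct.pack("f", x))[0]

def naive_sum(start, x, n):
    s = f32(start)
    for _ in range(n):
        s = f32(s + x)           # every add rounds to single precision
    return s

def kahan_sum(start, x, n):
    s, c = f32(start), 0.0       # c carries the rounding error so far
    for _ in range(n):
        y = f32(x - c)
        t = f32(s + y)
        c = f32(f32(t - s) - y)
        s = t
    return s

# Past 2^24, a single-precision accumulator can no longer see a +1.0 increment:
assert naive_sum(2.0 ** 24, 1.0, 100) == 2.0 ** 24         # stuck at 16777216
assert kahan_sum(2.0 ** 24, 1.0, 100) == 2.0 ** 24 + 100   # compensation recovers it
```

This is the kind of error such tricks (or a handful of double-precision variables) guard against, which is why only the accumulators, not the bulk of the math, need extra precision.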