CUDA, Stream Computing and Ct

bloed_brot
Joined: 5 Apr 05
Posts: 70
Credit: 84,225,211
RAC: 14,828
Topic 193727

With the latest generation of graphics boards hitting the teraflop mark, I wonder whether, and if so when, Einstein can carry out the parallel calculations these bad boys crunch so effectively?

Folding@home has shown that they can be up to 60(!) times faster than a quad-core 3.4 GHz CPU. So please give me a clue: should we expect GPGPU support from the Einstein team, and if so, which framework is likely to be supported?

Thanks and keep up the good work

:
your thoughts - the ways :: the knowledge - your space
:

Alexander W. Janssen
Joined: 20 Feb 05
Posts: 56
Credit: 1,303,604
RAC: 0

CUDA, Stream Computing and Ct

I second that. There's only one question left... CUDA or Brook? Can the project afford to maintain two GPU platforms?

Alex.

P.S.: No, I don't want to kick off an NVIDIA vs. ATI discussion ;-)

"I am tired of all this sort of thing called science here... We have spent
millions in that sort of thing for the last few years, and it is time it
should be stopped."
-- Simon Cameron, U.S. Senator, on the Smithsonian Institute, 1901.

bloed_brot
Joined: 5 Apr 05
Posts: 70
Credit: 84,225,211
RAC: 14,828

RE: I second that. There's

Message 82438 in response to message 82437

Quote:

I second that. There's only one question left... CUDA or Brook? Can the project afford to maintain two GPU platforms?

Alex.

P.S.: No, I don't want to kick off an NVIDIA vs. ATI discussion ;-)

No, me neither, no fanboy flame war, please ;-)

I too think that for starters it is only possible to support one platform/framework. Since I am no programmer, I cannot comment on the claims that NVIDIA's CUDA is easier to program than ATI's Stream Computing. Yet ATI has made its framework open source, so in the future it may be more likely to be widely adopted. Despite Brook having been around for years (5?), it is still early doors for GPGPU; but I think the GPU manufacturers have picked up the pace recently, and maybe Einstein can develop an app that benefits from these developments. After all, the SSE2 version of the Linux power app is a beast. :)

:
your thoughts - the ways :: the knowledge - your space
:

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 3,915
Credit: 193,278,869
RAC: 47,904

I got some code from a

I got some code from a masters student who worked on porting the FStat engine to CUDA. Looks like a factor of 7 speedup, but he's still struggling with the few calculations in there that require double precision. There might be an App some time during S5R4.

BM

BM

DanNeely
Joined: 4 Sep 05
Posts: 1,301
Credit: 1,577,948,742
RAC: 990,777

RE: I got some code from a

Message 82440 in response to message 82439

Quote:

I got some code from a masters student who worked on porting the FStat engine to CUDA. Looks like a factor of 7 speedup, but he's still struggling with the few calculations in there that require double precision. There might be an App some time during S5R4.

BM

If double precision is a major requirement, you could just limit CUDA to the GT200 series of cards. Unlike their predecessors, they have 64-bit FPUs.

[AF>Futura Sciences]click
Joined: 12 Apr 05
Posts: 34
Credit: 1,923,040
RAC: 0

RE: RE: I got some code

Message 82441 in response to message 82440

Quote:
Quote:

I got some code from a masters student who worked on porting the FStat engine to CUDA. Looks like a factor of 7 speedup, but he's still struggling with the few calculations in there that require double precision. There might be an App some time during S5R4.

BM

If double precision is a major requirement, you could just limit CUDA to the GT200 series of cards. Unlike their predecessors, they have 64-bit FPUs.

Yes, they comply with IEEE 754R.
But... only 30 of the 240 ALUs support double precision (one double-precision unit per multiprocessor).

God created a few good looking guys.. and for the rest he put hairs on top..

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 3,915
Credit: 193,278,869
RAC: 47,904

RE: RE: I got some code

Message 82442 in response to message 82440

Quote:
Quote:

I got some code from a masters student who worked on porting the FStat engine to CUDA. Looks like a factor of 7 speedup, but he's still struggling with the few calculations in there that require double precision. There might be an App some time during S5R4.

BM

If double precision is a major requirement, you could just limit CUDA to the GT200 series of cards. Unlike their predecessors, they have 64-bit FPUs.


There are rather few of them. Right now I'm not sure that supporting GPU is worth the effort at all. Anyway, I'm pretty sure that the remaining issues can be resolved by emulating double precision, e.g. with two floats or a float and an int. But first you'll have to find out what precisely goes wrong, and that's where we're stuck atm.

BM

BM

bloed_brot
Joined: 5 Apr 05
Posts: 70
Credit: 84,225,211
RAC: 14,828

RE: Right now I'm not sure

Message 82443 in response to message 82442

Quote:
Right now I'm not sure that supporting GPU is worth the effort at all.
BM

How come? Won't CUDA become more sophisticated, and won't GPU processing power continue to outpace serialised computing on the x86 architecture? Or does that imply that parallelising the E@H app is rather tricky, even though parallelisation is the key to unlocking shedloads of processing power?

I am sorry that I do not understand the precise problems (e.g. double float vs. int+float); however, from a long-term perspective I would say that harnessing the power of GPUs holds more potential than harnessing the power of CPUs.

BM or somebody else equally knowledgeable (AkosF, etc.): could you explain why Folding@home has managed to get a GPU client working whereas it proves difficult for E@H? Please explain it from the point of view of the apps' architecture :)

Thank you ever so much!

:
your thoughts - the ways :: the knowledge - your space
:

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 3,915
Credit: 193,278,869
RAC: 47,904

RE: RE: Right now I'm not

Message 82444 in response to message 82443

Quote:
Quote:
Right now I'm not sure that supporting GPU is worth the effort at all.
BM

How come? Won't CUDA become more sophisticated, and won't GPU processing power continue to outpace serialised computing on the x86 architecture? Or does that imply that parallelising the E@H app is rather tricky, even though parallelisation is the key to unlocking shedloads of processing power?

I am sorry that I do not understand the precise problems (e.g. double float vs. int+float); however, from a long-term perspective I would say that harnessing the power of GPUs holds more potential than harnessing the power of CPUs.

BM or somebody else equally knowledgeable (AkosF, etc.): could you explain why Folding@home has managed to get a GPU client working whereas it proves difficult for E@H? Please explain it from the point of view of the apps' architecture :)

Thank you ever so much!


There is no standard for GPU computing (yet). Picking one particular model: how many Einstein@home participants actually have an NVIDIA Quadro card that they want to use for crunching? Remember that displaying anything is not (yet) possible while the GPU is being used for numerical calculations.

As far as I understand, the Folding@home application is based on Brook or some similar higher-level language; the Einstein@home application is (currently) not. Our "Fstat engine" could be thought of as an FFT for narrow frequency bands. It's actually possible to use standard FFT implementations to calculate it, but in the current framework this would be rather inefficient. The current code was chosen for Einstein@home because it allows us to split the frequency bands into many small pieces (workunits), keeping computing time and data transfer volume within the bounds of a volunteer computing project.

Pinkesh Patel (an LSC member) is working on a program that actually uses standard FFT algorithms (I think with minor modifications) to calculate the F-statistic, but his code isn't ready to be used yet (at least not on E@H); using it would require a completely different search and workunit design, and it would be much more demanding on machines and their connections to the servers than what we currently expect of our participants.

I definitely think that using high-level languages / libraries like Brook that have efficient implementations for every platform is the way to go in the future, but for the moment (i.e. S5R4) we need to stick to what we have.

BM

BM

bloed_brot
Joined: 5 Apr 05
Posts: 70
Credit: 84,225,211
RAC: 14,828

Thank you BM, for making it

Thank you, BM, for making it clearer.
So the design and generation of workunits limits the processing methods the app can be built on, and since GPUs and CPUs process differently, the workunit generation can only target and support one type (GPU or CPU), correct?

Well, I am sure that sooner or later you guys will come up with a solution! :)

:
your thoughts - the ways :: the knowledge - your space
:

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,515
Credit: 449,930,242
RAC: 93,100

Hi! I understand that even

Hi!

I understand that even for Folding@home, the workunits crunched by the GPU beta clients are different from those for the other platforms. But they have now managed to do visualization and GPU processing at the same time, so you can still use your PC's video capabilities while crunching, which should improve acceptance.

Bikeman
