Is there a GPU version of the app in the works?

Winterknight
Joined: 4 Jun 05
Posts: 1252
Credit: 322858992
RAC: 375068

RE: Even given these

Message 87140 in response to message 87138

Quote:

Even given these figures (4-fold performance for an additional 200 W), the work done per watt-hour ratio would be better with than without the GPU, right?

CU
Bikeman


The graphics card can draw over 250 watts: 200 W from the power connectors and up to 75 W from the motherboard slot. I usually use my computer while BOINC is running, but people have reported BSODs when running CUDA alongside other graphics tasks, such as watching a DVD, and it must be stopped for gaming.

@Klimax,
A CPU is needed to feed data to the task running on the GPU. If you also try to run a task on that CPU, the GPU task does not have enough priority to interrupt the CPU task. Reported times for a GPU task have gone from 20 minutes (wall clock) with no CPU task running to over two hours when the CPU is also doing a BOINC task.
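
To illustrate what that "feeding" involves: a GPU task is driven by a host-side loop on the CPU that copies data to the card, launches a kernel, and collects the results. Below is a minimal CUDA sketch of that pattern; the kernel name and chunk sizes are invented for illustration and have nothing to do with the real SETI@home or Einstein@Home code. If the thread running this loop is starved by a full-priority CPU task, the GPU simply sits idle between launches.

Code:

// Minimal, hypothetical sketch of a host-side "feeder" loop for a GPU task.
// Not actual project code; kernel and buffer names are invented. The point:
// the CPU thread below must be scheduled promptly to keep the GPU supplied
// with work, otherwise the card idles and wall-clock time balloons.
#include <cuda_runtime.h>
#include <vector>
#include <cstdio>

__global__ void crunch_chunk(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * in[i];          // stand-in for the real science kernel
}

int main()
{
    const int n = 1 << 20;               // samples per chunk (made up)
    const int chunks = 64;               // chunks per workunit (made up)

    std::vector<float> h_in(n, 1.0f), h_out(n);
    float *d_in, *d_out;
    cudaMalloc((void **)&d_in,  n * sizeof(float));
    cudaMalloc((void **)&d_out, n * sizeof(float));

    for (int c = 0; c < chunks; ++c) {
        // 1) The CPU copies the next chunk of input data to the card ...
        cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
        // 2) ... launches the kernel ...
        crunch_chunk<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
        // 3) ... and waits for the results before it can feed the next chunk.
        cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    }

    cudaFree(d_in);
    cudaFree(d_out);
    printf("all chunks processed\n");
    return 0;
}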

Also, at the moment you can only use one application per project, so if you use the GPU app then the CPUs must run a different application, which usually means a different project.

ulenz
Joined: 22 Jan 05
Posts: 27
Credit: 17897764
RAC: 0

My testing hardware included

My testing hardware included an AMD X64 dual-core CPU (about 2100 MHz) and an ATI Radeon HD 3850 graphics card. Folding@home itself needed one complete core (about 50% of the CPU power) in addition to the GPU. Of course, how much extra CPU power is needed depends on the application itself. But indeed, the GPU by itself is not enough to get the job done.
Moreover, I spend 20-25 euros a month on electricity for running distributed computing projects (Einstein@Home + SETI@home), and I don't want to pay much more, apart from my annual donations to SETI@home and the Southern Seti project.

Intel Q9300 Quadcore, 2500 MHz, 4096 MB RAM, GeForce 9800 GT, Vista Ultimate 64-bit, Ubuntu 10.10 64-bit

Gerry Rough
Joined: 1 Mar 05
Posts: 102
Credit: 1847066
RAC: 0

I think there are about to be

I think there are about to be a lot of changes to BOINC in the next year. The devs are no doubt going to be up late trying to perfect BOINC for GPU efficiency, and many of the projects are going to want to get in on the act soon. I know Lattice is working on a GPU app as well, and that's just for starters: I suspect that most other projects will either have GPU apps ready by year's end or at least have them in the works. With a limited BOINC universe vying for increases in FLOPS, this could get interesting, to say the least.

All of these issues will get worked out, especially since we already know that Linux can run 4+1 apps (4 cores + 1 on the GPU). No doubt someone at BOINC HQ will put the pieces of the puzzle together for Windows boxes too. Ulenz and others are probably going to want some throttling options, and I would think that options for when the GPU is allowed to run will be among them. After all, GPUs will likely have to throttle back or stop the app outright when things need to be more, well, peaceful or indeed wintry inside.

By this time next year, I expect many who have posted to this thread will be singing Kumbaya. :-)



alpina
Joined: 27 Aug 05
Posts: 2
Credit: 150754
RAC: 0

CUDA definitely has a lot of

CUDA definitely has a lot of potential. My 8800 GTS is at least six times faster on a workunit than my Q6700 clocked at 3 GHz. The CPU needs to assist the GPU, so you lose some CPU cycles, but that really is minimal: about 2 to 3% of the CPU power of one core.
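
One plausible reason (my assumption, not anything from the SETI developers) why some GPU apps need only 2 to 3% of a core while others appear to tie up a whole one is how the host thread waits for the GPU: it can spin in a busy loop, burning a core, or ask the driver to block it until the card finishes. A minimal CUDA sketch of the blocking variant, with the kernel and sizes invented for illustration:

Code:

// Hypothetical illustration of why the "feeder" CPU cost differs so much
// between GPU apps: the host thread can either spin-wait on the GPU (burning
// a whole core) or block until the driver wakes it up (a few percent of a
// core). Not taken from any project's real code.
#include <cuda_runtime.h>

__global__ void long_running_kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        for (int k = 0; k < 1000; ++k)
            data[i] = data[i] * 1.0000001f + 0.5f;   // artificial work
}

int main()
{
    // Ask the driver to block the calling CPU thread while it waits for the
    // GPU instead of spin-waiting. Must be set before the context is created.
    cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);

    const int n = 1 << 22;
    float *d;
    cudaMalloc((void **)&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    long_running_kernel<<<(n + 255) / 256, 256>>>(d, n);
    cudaDeviceSynchronize();   // with blocking sync, this wait costs almost no CPU

    cudaFree(d);
    return 0;
}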

There are still some problems, though, and I don't know what the guys at SETI@home were thinking, but the CUDA application should never have left beta in the state it is in today:
1. The application itself crashes from time to time, taking the display driver down with it.
2. BOINC does not use all of the CPU cores and the GPU concurrently. Although this can be fixed (also on Windows; I have had 4+1 tasks running concurrently without any problem), it should be a standard feature of BOINC.

But it should be clear that the potential of GPU computing is so promising that Einstein@Home just can't ignore it. It might not be possible to port Einstein to CUDA, but they should at least investigate it, and I'm glad they do.

MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 137731140
RAC: 16621

RE: But it should be clear

Message 87144 in response to message 87143

Quote:
But it should be clear that the potential of GPU computing is so promising that Einstein@Home just can't ignore it. It might not be possible to port Einstein to CUDA, but they should at least investigate it, and I'm glad they do.

From the NVIDIA press releases there is already a CUDA-based app in the works, and Bernd has confirmed it too. The first one I quoted sounds as if the app is complete; the second press release refers to "potential", which implies it is coming but not yet released.

Either way we should be expecting one fairly soon for Einstein, although Bernd may hold off for a while so the BOINC dev team can sort out issues with BOINC first.

It will be interesting to see whether the PALFA search application will also be CUDA-capable or CPU-only. I suspect it will be CPU-only to start with and that they will have to port it later.

j2satx
Joined: 22 Jan 05
Posts: 46
Credit: 1650297
RAC: 0

RE: RE: My understanding

Message 87145 in response to message 87123

Quote:
Quote:

My understanding is that it would be 2*0.9 CPU and 2 GPU. I believe each GPU requires a core, whether or not it is fully utilized.

Not sure what you mean here. Do you mean that on my quad core host, three of my cores will belong to other projects, and use 100% of those CPUs, and that one core will be used by the GPU at 90% usage?

Yes, although the one core feeding the GPU will probably only run at less than 10%.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 691072864
RAC: 264439

RE: RE: RE: My

Message 87146 in response to message 87145

Quote:
Quote:
Quote:

My understanding is that it would be 2*0.9 CPU and 2 GPU. I believe each GPU requires a core, whether or not it is fully utilized.

Not sure what you mean here. Do you mean that on my quad core host, three of my cores will belong to other projects, and use 100% of those CPUs, and that one core will be used by the GPU at 90% usage?

Yes, although the one core feeding the GPU will probably only run at less than 10%.

This refers to SETI@Home's GPU app, right? No doubt the CPU load for the "feeding" part of the app will be highly project-specific.

Over-simplifying a bit, one can say that for E@H the current "CPU-only" app spends about half its time in "signal processing" code (the "F-Statistic") and half in pattern recognition (the "Hough Transform"). The first is probably much easier to implement in a highly parallel way on the GPU; I'm not sure the pattern recognition part has been ported to GPU code yet. I think Bernd's statements here have always referred to the F-stat part only. Unless both parts of the algorithm are GPU-enabled (or pattern recognition is somehow moved out of the client app), the total speedup potential for GPUs on E@H is indeed limited, and the feeder part of the app will need a considerable share of CPU time. We'll see.
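
To put a rough number on that limit (back-of-the-envelope arithmetic with assumed figures, not project measurements): if about half the runtime is F-stat and only that half is accelerated, Amdahl's law caps the overall speedup at 2x no matter how fast the GPU is. A tiny sketch:

Code:

// Back-of-the-envelope Amdahl's-law check: if only the F-stat half of a task
// is moved to the GPU, the overall speedup saturates at 2x however fast the
// GPU is. The 50/50 split and per-stage speedups are illustrative numbers,
// not measured project figures.
#include <cstdio>

int main()
{
    const double fstat_fraction = 0.5;                  // assumed share of runtime
    const double gpu_speedup[] = { 5.0, 10.0, 50.0 };   // hypothetical F-stat-only gains

    for (int i = 0; i < 3; ++i) {
        double s = gpu_speedup[i];
        double total = 1.0 / ((1.0 - fstat_fraction) + fstat_fraction / s);
        printf("F-stat %4.0fx faster on the GPU -> whole task only %.2fx faster\n",
               s, total);
    }
    return 0;
}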

CU
Bikeman

ML1
Joined: 20 Feb 05
Posts: 347
Credit: 86314215
RAC: 213

RE: ... Over-simplifying a

Message 87147 in response to message 87146

Quote:
... Over-simplifying a bit, one can say that for E@H the current "CPU-only" app spends about half its time in "signal processing" code (the "F-Statistic") and half in pattern recognition (the "Hough Transform"). The first is probably much easier to implement in a highly parallel way on the GPU; I'm not sure the pattern recognition part has been ported to GPU code yet. ...


Can hybrid solutions be considered?...

Do whatever work on whichever hardware does that work best. Hence, why not split the WUs into consecutive pairs so that the GPU does its best half and the CPU does its best half?

Hence you could have the GPU working on WU #1 F, then the CPU working on WU #1 H whilst the GPU works on WU #2 F, then WU #2 H on the CPU with WU #3 F on the GPU, and so on...

It would be futile to lose the GPU speedup by having the GPU slog the long way round something that a CPU could stroll through! Or is there indeed some clever programming being done?...
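
That kind of pipelining is quite natural to express in CUDA, because a kernel launch returns control to the CPU immediately. Below is a hypothetical sketch of the scheme just described, with stage names and sizes invented; it is not Einstein@Home code, just the overlap pattern.

Code:

// Hypothetical sketch of the pipeline described above: while the GPU runs the
// F-statistic stage for workunit n, the CPU runs the Hough stage for workunit
// n-1. Function names and stages are invented placeholders.
#include <cuda_runtime.h>
#include <vector>

__global__ void fstat_kernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * in[i];          // placeholder for the F-stat stage
}

// Placeholder for the CPU-side Hough / pattern-recognition stage.
static double hough_on_cpu(const std::vector<float> &fstat)
{
    double peak = 0.0;
    for (float v : fstat)
        if (v > peak) peak = v;
    return peak;
}

int main()
{
    const int n = 1 << 18;
    const int num_workunits = 8;
    float *d_in, *d_out;
    cudaMalloc((void **)&d_in,  n * sizeof(float));
    cudaMalloc((void **)&d_out, n * sizeof(float));

    std::vector<float> host_in(n, 1.0f), fstat_prev;

    for (int wu = 0; wu < num_workunits; ++wu) {
        // Start the F-stat stage for this workunit on the GPU; the kernel
        // launch returns immediately, so the CPU is free in the meantime.
        cudaMemcpy(d_in, host_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
        fstat_kernel<<<(n + 255) / 256, 256>>>(d_in, d_out, n);

        // While the GPU crunches workunit wu, the CPU does the Hough stage
        // for the previous workunit.
        if (!fstat_prev.empty())
            hough_on_cpu(fstat_prev);

        // Collect this workunit's F-stat results; they become "previous"
        // for the next loop iteration.
        fstat_prev.resize(n);
        cudaMemcpy(fstat_prev.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    }
    // Drain the pipeline: Hough for the last workunit.
    hough_on_cpu(fstat_prev);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}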

Happy fast crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)

th3
Joined: 24 Aug 06
Posts: 208
Credit: 2208434
RAC: 0

RE: Yes, although the one

Message 87148 in response to message 87145

Quote:

Yes, although the one core feeding the GPU will probably only run at less than 10%.


I doubt it's possible to estimate CPU usage at this time, for two reasons: 50% on a Pentium D is much less processing power than 40% on a 45 nm Core 2, plus it will be x times harder to feed a GTX 280 than an 8600 GT.

It would be interesting to see a somewhat direct comparison between ATI and NVIDIA for some project. The closest thing I've seen is a benchmark for video encoding on the GPU: the ATI variant almost maxed out four CPU cores, while the NVIDIA one used two cores that were not nearly as maxed out, IIRC. (ATI was considerably faster but had some quality flaws I would consider critical, so it is debatable whether it actually did the job it was supposed to do.)

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 691072864
RAC: 264439

RE: RE: ...

Message 87149 in response to message 87147

Quote:
Quote:
... Over-simplifying a bit, one can say that for E@H the current "CPU-only" app spends about half its time in "signal processing" code (the "F-Statistic") and half in pattern recognition (the "Hough Transform"). The first is probably much easier to implement in a highly parallel way on the GPU; I'm not sure the pattern recognition part has been ported to GPU code yet. ...

Can hybrid solutions be considered?...

Do whatever work on whichever hardware does that work best. Hence, why not split the WUs into consecutive pairs so that the GPU does its best half and the CPU does its best half?

Hence you could have the GPU working on WU #1 F, then the CPU working on WU #1 H whilst the GPU works on WU #2 F, then WU #2 H on the CPU with WU #3 F on the GPU, and so on...

Not a bad idea at all. In fact, up until S5R2, the science app would only do the "signal processing" F-stat part, and the "coincidence search" / pattern-recognition part was done on the server side. But as the bandwidth between the project server and individual clients is rather limited, it's better to do both parts of the search on the clients (see this article by project scientist Reinhard Prix for details).

There are highly parallel implementations of the Hough transform, even for GPUs, described in the literature; it's just a big effort to rewrite and test the code. Despite the enthusiastic NVIDIA press release (probably a product of the marketing rather than the engineering department :-) ), I think a GPU app for E@H will need considerable time to develop its full potential, but it should eventually be very powerful indeed.
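
For the curious: the usual way to parallelise a Hough transform on a GPU is to give each input point its own thread and have the threads "vote" into an accumulator with atomic adds. Below is a hypothetical sketch of the textbook line-detecting variant, just to show the voting pattern; the Hough search E@H runs over frequency and spin-down parameters is a different beast, and all names and sizes here are invented.

Code:

// Hypothetical sketch of the "voting" pattern used by parallel Hough
// transforms on GPUs: one thread per input point, atomic increments into a
// shared accumulator. This is the textbook line-detecting Hough transform,
// not the specific Hough search used by Einstein@Home.
#include <cuda_runtime.h>
#include <math.h>

#define N_THETA 180          // angular resolution of the accumulator (made up)
#define N_RHO   1024         // radial resolution of the accumulator (made up)
#define PI_F    3.14159265f

__global__ void hough_vote(const float2 *points, int n_points,
                           unsigned int *accumulator, float rho_max)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_points)
        return;

    float2 p = points[i];
    for (int t = 0; t < N_THETA; ++t) {
        float theta = t * PI_F / N_THETA;
        float rho   = p.x * cosf(theta) + p.y * sinf(theta);
        int   r_bin = (int)((rho + rho_max) / (2.0f * rho_max) * N_RHO);
        if (r_bin >= 0 && r_bin < N_RHO)
            atomicAdd(&accumulator[t * N_RHO + r_bin], 1u);   // the vote
    }
}

int main()
{
    const int n_points = 10000;
    float2 *d_points;
    unsigned int *d_acc;
    cudaMalloc((void **)&d_points, n_points * sizeof(float2));
    cudaMalloc((void **)&d_acc, N_THETA * N_RHO * sizeof(unsigned int));
    cudaMemset(d_points, 0, n_points * sizeof(float2));              // dummy input
    cudaMemset(d_acc, 0, N_THETA * N_RHO * sizeof(unsigned int));

    hough_vote<<<(n_points + 255) / 256, 256>>>(d_points, n_points, d_acc, 512.0f);
    cudaDeviceSynchronize();

    // A real app would now scan the accumulator for peaks (the detected lines).
    cudaFree(d_points);
    cudaFree(d_acc);
    return 0;
}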

Who knows...maybe we will eventually see some top crunchers buying one of those NVidia Tesla "personal supercomputers" for about 10k$??? It's an expensive hobby anyway...

CU
Bikeman
