More CUDA Budha

Gerry Rough
Joined: 1 Mar 05
Posts: 102
Credit: 1847066
RAC: 0
Topic 194275

Steven Pletsch had an interesting post over on Lattice that I thought would get more attention here since TLP doesn't do CUDA discussions much. Here is a copy and paste of that post:

Another thought on this subject.

Since most of the consumer-level video cards being used most efficiently with CUDA come with 896MB up to 1.7GB of memory, and most workstation-grade CUDA-enabled video cards typically have in the range of 1 to 4GB of memory, is it at all possible to load the application, or at least the majority of the calculation data, into the memory of the video card?

It seems that this would free up resources on the computer for handling other BOINC applications, and might increase the efficiency of the application. I don't know whether something like this is feasible, but if the option is there it may well be worth trying.
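In CUDA terms, the suggestion above would look roughly like the sketch below: upload the work-unit data to VRAM once, then reuse it across many kernel launches so only the final result crosses the PCIe bus. The kernel name process_chunk and the doubling operation are placeholders for illustration, not any project's real science code.

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

__global__ void process_chunk(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;  /* stand-in for the real science code */
}

int main(void)
{
    const int n = 1 << 20;              /* ~4MB of floats */
    size_t bytes = n * sizeof(float);

    float *h_data = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i)
        h_data[i] = (float)i;

    /* Upload once; the data then stays resident in VRAM across launches.
       In principle the host copy could even be freed here to save RAM. */
    float *d_data;
    cudaMalloc((void **)&d_data, bytes);
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

    for (int pass = 0; pass < 100; ++pass)
        process_chunk<<<(n + 255) / 256, 256>>>(d_data, n);

    /* Only the final result crosses back over the PCIe bus. */
    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    free(h_data);
    return 0;
}

The saving here is less about host RAM than about transfer time: the data lives on the card between launches instead of being copied back and forth every pass.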



John Clark
Joined: 4 May 07
Posts: 1087
Credit: 3143193
RAC: 0

More CUDA Budha

There's a similar discussion over at Milkyway, but about using the ATI HD38xx and 48xx cards for their double-precision maths and extra pipelines.

Shih-Tzu are clever, cuddly, playful and rule!! Jack Russell are feisty!

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 167

The applications themselves

The applications themselves cannot be loaded into the GPU's memory and started there, because the OS doesn't treat the GPU as a processor. That's because it isn't one: it's a coprocessor. It assists the CPU; it won't take over from the CPU (yet).

The actual application will always run in the computer's main memory (RAM); it can't run in the GPU's memory (VRAM), because the OS doesn't recognize the GPU as something that can execute a program on its own. Code in the video card's drivers and in the science application reports that a coprocessor is available and dispatches the work to it.
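As a small illustration of that host/coprocessor split, here is a minimal sketch using the standard CUDA runtime API: the program itself runs on the CPU, and the driver merely reports what coprocessors are present for it to dispatch work to.

#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);  /* the driver reports available coprocessors */

    for (int i = 0; i < count; ++i) {
        struct cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("GPU %d: %s, %d multiprocessors, %lu MB of VRAM\n",
               i, prop.name, prop.multiProcessorCount,
               (unsigned long)(prop.totalGlobalMem >> 20));
    }
    return 0;
}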

The work isn't run as raw "data" either, but through kernels: every piece of the computation has to be translated into a kernel that the GPU's multiprocessors can execute. This takes up quite a chunk of memory.
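A minimal sketch of what "translated into a kernel" means: an ordinary CPU loop becomes a __global__ function that thousands of GPU threads execute in parallel, one element each. The buffers are left uninitialized here; the point is only the shape of the launch.

/* CPU version of the same work:
 *     for (int i = 0; i < n; ++i) out[i] = a[i] + b[i];
 */
#include <cuda_runtime.h>

__global__ void add_kernel(const float *a, const float *b, float *out, int n)
{
    /* Each GPU thread handles exactly one iteration of the loop. */
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = a[i] + b[i];
}

int main(void)
{
    int n = 4096;
    float *a, *b, *out;  /* device buffers, contents uninitialized */
    cudaMalloc((void **)&a, n * sizeof(float));
    cudaMalloc((void **)&b, n * sizeof(float));
    cudaMalloc((void **)&out, n * sizeof(float));

    /* 256 threads per block, enough blocks to cover all n elements. */
    add_kernel<<<(n + 255) / 256, 256>>>(a, b, out, n);
    cudaDeviceSynchronize();

    cudaFree(a); cudaFree(b); cudaFree(out);
    return 0;
}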

For example, the run-of-the-mill Seti Enhanced Multibeam task, which takes about 32MB of RAM when run on the CPU, uses over 200MB of memory on the GPU.
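Those exact figures can't be verified here, but a card's memory use can be watched with the runtime call cudaMemGetInfo (available in later CUDA releases):

#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    size_t free_b = 0, total_b = 0;
    cudaMemGetInfo(&free_b, &total_b);  /* VRAM currently free vs. total */
    printf("VRAM: %lu MB free of %lu MB total\n",
           (unsigned long)(free_b >> 20), (unsigned long)(total_b >> 20));
    return 0;
}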

Then there's the trouble that all applications have to be ported from whatever language they're written in now (C++ for the majority) to C before they can work. C isn't as sophisticated as C++, so some code may be difficult to translate, or you'll be adding lots of lines to do what C++ could do in fewer. Future versions of CUDA are expected to support Fortran, C++ and OpenCL. Not that this makes it any easier to simply port code over to the CUDA platform, but it's a start.
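As a toy illustration of that porting chore (all names invented): a one-line std::transform over a std::vector has to become an explicit loop over a raw pointer before it can be turned into a C kernel like the one shown earlier in the thread.

/* Original C++ (fine on the CPU, but early CUDA kernels had to be C):
 *     std::vector<float> v(n);
 *     std::transform(v.begin(), v.end(), v.begin(), scale_functor(f));
 */
#include <stdio.h>

/* C translation: explicit loop, raw pointer, no STL. */
void scale_in_place(float *v, int n, float f)
{
    for (int i = 0; i < n; ++i)
        v[i] *= f;
}

int main(void)
{
    float v[4] = {1, 2, 3, 4};
    scale_in_place(v, 4, 2.0f);
    printf("%g %g %g %g\n", v[0], v[1], v[2], v[3]);
    return 0;
}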

A lot more information can be found in the Nvidia CUDA Programming Guide (2.0).

Ver Greeneyes
Joined: 26 Mar 09
Posts: 140
Credit: 9562235
RAC: 0

RE: The actual application

Message 92313 in response to message 92312

Quote:
The actual application will always run in the computer's main memory (RAM); it can't run in the GPU's memory (VRAM), because the OS doesn't recognize the GPU as something that can execute a program on its own. Code in the video card's drivers and in the science application reports that a coprocessor is available and dispatches the work to it.


This might speed things up a bit though: Zero Copy in CUDA 2.2
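For reference, the zero-copy path in CUDA 2.2 works roughly like this: pin a host buffer with cudaHostAlloc, map it into the GPU's address space, and let the kernel read and write it directly, with no cudaMemcpy at all. A minimal sketch (the scale kernel is a placeholder, and the device must support host-mapped memory):

#include <cuda_runtime.h>
#include <stdio.h>

__global__ void scale(float *v, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= 2.0f;
}

int main(void)
{
    int n = 4096;

    /* Must be set before any CUDA context is created; assumes the
       device reports canMapHostMemory in its properties. */
    cudaSetDeviceFlags(cudaDeviceMapHost);

    float *h_v;  /* pinned host memory, mapped into the GPU's address space */
    cudaHostAlloc((void **)&h_v, n * sizeof(float), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) h_v[i] = 1.0f;

    float *d_v;  /* device-side alias for the same memory: no cudaMemcpy */
    cudaHostGetDevicePointer((void **)&d_v, h_v, 0);

    scale<<<(n + 255) / 256, 256>>>(d_v, n);
    cudaDeviceSynchronize();

    printf("h_v[0] after kernel: %g\n", h_v[0]);  /* result visible directly */

    cudaFreeHost(h_v);
    return 0;
}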
