Steven Pletsch had an interesting post over on Lattice that I thought would get more attention here, since TLP doesn't do much CUDA discussion. Here is a copy-and-paste of that post:
Another thought on this subject.
Since most of the consumer-level video cards being used most efficiently with CUDA come with 896 MB to 1.7 GB of memory, and most of the workstation-grade CUDA-enabled video cards typically have 1 to 4 GB, is it at all possible to load the application, or at least the majority of the calculation data, into the memory of the video card?
It seems that this would free up resources on the computer for handling other BOINC applications, and might well increase the efficiency of the application. I don't know how feasible something like this is, but if the option is there it may well be worth trying.
More CUDA Buddha
There is a similar discussion over at Milkyway, but about using the ATI HD38xx and 48xx cards for their double-precision maths and their extra pipelines.
The applications themselves
The applications themselves cannot be loaded into the GPU's memory and started there, as the OS doesn't know that the GPU is a processor. That's because it isn't: it's a coprocessor. It helps the CPU; it won't take over from the CPU (yet).
The actual application will always run in the computer's main memory (RAM). It can't run from the GPU's memory (VRAM), because the OS doesn't recognize the GPU as something that can execute a program on its own. Code in the video card's drivers and in the science application detects that a coprocessor is available and arranges for the data to be processed on it.
The work isn't executed as plain "data" either, but through kernels: each piece of the computation has to be translated into a kernel that the GPU's multiprocessors can run, and this takes up quite a chunk of memory.
For example, a run-of-the-mill Seti Enhanced Multibeam task that takes about 32 MB of RAM when run on the CPU uses 200 MB+ of memory when run on the GPU.
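To make that split concrete, here is a minimal sketch of how a CUDA-enabled app is structured; the kernel and buffer below are invented for illustration, not taken from any project's science app. The program itself runs on the CPU out of ordinary RAM; only the kernel launch executes on the GPU, on data that first has to be copied into VRAM:

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

/* A kernel: the only part of the program that actually executes
   on the GPU's multiprocessors. */
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main(void)
{
    const int n = 1 << 20;              /* 1M floats, about 4 MB */
    size_t bytes = n * sizeof(float);

    /* The application itself lives in ordinary RAM, managed by the OS. */
    float *host = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i)
        host[i] = 1.0f;

    /* The driver exposes the coprocessor; the data has to be copied
       into VRAM before the GPU can touch it... */
    float *dev;
    cudaMalloc((void **)&dev, bytes);
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);

    /* ...and only this launch actually runs on the GPU. */
    scale<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);
    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);

    printf("host[0] = %f\n", host[0]);  /* prints 2.000000 */
    cudaFree(dev);
    free(host);
    return 0;
}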
Then there's the trouble that all applications have to be ported from whatever language they're written in now (C++ for the majority) to C to get them to work. C isn't as sophisticated as C++, so some code may be difficult to translate, or you'll be adding lots of lines of code to do what C++ could do in fewer. Future versions of CUDA are expected to add Fortran, C++ and OpenCL support. Not that this makes it any easier to just port code over to the CUDA platform, but it's a start.
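To give a feel for the kind of rewriting involved, here is a toy before-and-after, not taken from any real application: a one-liner using the C++ STL, and the flat C kernel a CUDA port would need instead.

/* C++ original, leaning on the STL (not available in CUDA C):

       std::transform(v.begin(), v.end(), v.begin(),
                      std::negate<float>());

   The CUDA C port spells the same element-wise operation out as a
   kernel, one thread per element, plus launch plumbing on the host. */

__global__ void negate(float *v, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        v[i] = -v[i];
}

/* Host side, assuming dev_v already holds n floats in VRAM:
       negate<<<(n + 255) / 256, 256>>>(dev_v, n);               */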
A lot more information can be found in the Nvidia CUDA Programming Guide (2.0).
RE: The actual application
This might speed things up a bit though: Zero Copy in CUDA 2.2
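For anyone curious how that helps: zero copy (new in CUDA 2.2) maps page-locked host RAM straight into the GPU's address space, so a kernel reads the buffer over the bus instead of waiting for an explicit cudaMemcpy into VRAM. Here is a minimal sketch using the CUDA 2.2 runtime calls; the kernel and sizes are again just illustrative:

#include <cuda_runtime.h>
#include <stdio.h>

__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main(void)
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    /* Must be set before the CUDA context is created. */
    cudaSetDeviceFlags(cudaDeviceMapHost);

    /* Page-locked host memory, mapped into the GPU's address space. */
    float *host;
    cudaHostAlloc((void **)&host, bytes, cudaHostAllocMapped);
    for (int i = 0; i < n; ++i)
        host[i] = 1.0f;

    /* Device-side alias of the same buffer; no cudaMemcpy needed. */
    float *dev;
    cudaHostGetDevicePointer((void **)&dev, host, 0);

    scale<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);
    cudaThreadSynchronize();  /* the kernel wrote host RAM directly */

    printf("host[0] = %f\n", host[0]);
    cudaFreeHost(host);
    return 0;
}

Whether this is actually faster depends on the access pattern: it removes the copies, but every access crosses the bus, so it mainly pays off for data touched once, or on integrated GPUs that share the host's RAM.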