ABP2 CUDA applications

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 723014008

RAC: 1153325

RE: How is it possible to

13 Jun 2010 21:56:39 UTC

Message 96404 in response to message 96403

Quote:

How is it possible to have much longer crunch times when the crunching power was greatly increased?

My GPU isn't the fastest either, but if it's not faster with it than without, why bother sending WUs at all? Why does an increase of power make the crunching so much more wasteful?

Hi,

Well, you can't judge the whole concept from your particular configuration. You have quite a fast CPU (ca. 9000 sec per ABP2 unit is excellent) and a low-end graphics board. The 8600 GT has just 32 shaders, if I'm not mistaken. This means it is perhaps 4 or 5 times (!) slower than a modern mid-range graphics board, and 10 times slower than a top-end board. This is not just true for Einstein@Home. If you look at the GPUgrid project, for example, the 8600 GT isn't supported there at all because it is "too slow".

CU
HB

Saenger

Joined: 15 Feb 05

Posts: 403

Credit: 33009522

RAC: 0

RE: RE: How is it

14 Jun 2010 4:44:00 UTC

Message 96405 in response to message 96404

Quote:

Quote:
How is it possible to have much longer crunch times when the crunching power was greatly increased?

My GPU isn't the fastest either, but if it's not faster with it than without, why bother sending WUs at all? Why does an increase of power make the crunching so much more wasteful?

Hi,

Well, you can't judge the whole concept from your particular configuration. You have quite a fast CPU (ca. 9000 sec per ABP2 unit is excellent) and a low-end graphics board. The 8600 GT has just 32 shaders, if I'm not mistaken. This means it is perhaps 4 or 5 times (!) slower than a modern mid-range graphics board, and 10 times slower than a top-end board. This is not just true for Einstein@Home. If you look at the GPUgrid project, for example, the 8600 GT isn't supported there at all because it is "too slow".

CU
HB

My card is supported, I just don't get any bonus as it's not fast enough for that.

But I crunch the same WUs (at least I think they are the same, since others sometimes crunch them in parallel without a GPU), and with only my CPU they take two-thirds of the time they need with the added crunching power.

It's irrelevant how fast my CPU and my GPU are, as long as it's the same CPU and I add more processors to the same work, they should finish faster. Of course it should be much faster on a Fermi, and my 8600 should accelerate just a wee bit, but it's slowed down very much instead.

The formula here is: More crunching power ---> more crunching time.
The expected formula is: More crunching power ---> less crunching time.

Greetings from Sänger

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 2955479867

RAC: 719731

RE: My card is supported, I

14 Jun 2010 8:29:45 UTC

Message 96406 in response to message 96405

Quote:

My card is supported, I just don't get any bonus as it's not fast enough for that.

But I crunch the same WUs (at least I think they are the same, since others sometimes crunch them in parallel without a GPU), and with only my CPU they take two-thirds of the time they need with the added crunching power.

It's irrelevant how fast my CPU and my GPU are, as long as it's the same CPU and I add more processors to the same work, they should finish faster. Of course it should be much faster on a Fermi, and my 8600 should accelerate just a wee bit, but it's slowed down very much instead.

The formula here is: More crunching power ---> more crunching time.
The expected formula is: More crunching power ---> less crunching time.

Which goes to prove, as so often in life, that brute force isn't enough: you need cunning and guile as well. I think that the Einstein project know very well that their application doesn't yet use CUDA cards efficiently, but it is there and can be used by people who have hardware with enough brute force to overcome its limitations.

For the rest of us, there's an option on the Einstein@Home preferences page for

Quote:

Use NVIDIA GPU
Enforced by version 6.10+ no

Mine, as you can see, is set to 'no': having done some initial testing (and I'm quite happy to come back and do some more testing when the next version is out), I decided to deploy my CUDA cards on projects where the applications make fuller use of them. It's one solution.

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 2955479867

RAC: 719731

That reminds me: I've got a

14 Jun 2010 8:49:58 UTC

Message 96407

That reminds me: I've got a Fermi (GTX 470) to test with now.

Keep an eye on host 1226365: it's just downloaded the standard CUDA application and the cuda23 runtime DLLs.

That combination caused problems on Fermis at both GPUGrid and SETI: my understanding is that, at the very least, applications have to be compiled against the 3.0 SDK. There's also the 'volatile' bug to consider.

It'll be about 2 hours before it finishes its current GPUGrid task and has a go at Einstein: more then.

Saenger

Joined: 15 Feb 05

Posts: 403

Credit: 33009522

RAC: 0

I'm still totally lost

14 Jun 2010 9:35:46 UTC

Message 96408

I'm still totally lost here:
If my computer crunches the same WU either on 100% CPU or on 100% CPU + x% GPU, why is it so much slower?
It would be OK if it didn't increase the speed because the card is too slow, but it takes 50% longer with additional resources working on it.
WTF is it doing on my CPU the whole time? It uses 50% more of this resource than without additional GPU assistance.

Greetings from Sänger

Ver Greeneyes

Joined: 26 Mar 09

Posts: 140

Credit: 9562235

RAC: 0

Most likely, the GPU is doing

14 Jun 2010 10:42:45 UTC

Message 96409 in response to message 96408

Most likely, the GPU is doing part of the work -instead- of the CPU, not alongside it - although the CPU still has to tell it what to do and so probably won't be entirely idle. Since your GPU is slower than your CPU at the same work, it takes longer. It might be an idea to let the CPU still do some of the work (in your case, over 50%), but the program would have to break it up into little chunks so the CPU doesn't end up doing too much when a faster GPU is present. Whether that is possible is something the project devs will have to tell you :)
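To put rough numbers on the explanation above, here is a minimal Python sketch of a timing model. It is not how the Einstein@Home application works internally. The 50% FFT share, the GPU being half as fast as the CPU at that part, the chunk count, and the function names `offload_time` and `shared_chunks_time` are all assumptions chosen for illustration. Only the ca. 9000 s CPU-only figure and the roughly 50% slowdown come from this thread.

```python
def offload_time(cpu_only_time, offload_fraction, gpu_speed_ratio):
    """Run time when one part of the work moves from the CPU to the GPU.

    cpu_only_time    -- seconds for the whole task on the CPU alone
    offload_fraction -- share of that time spent in the offloaded part (assumed)
    gpu_speed_ratio  -- GPU speed relative to the CPU for that part (assumed)
    """
    remaining_on_cpu = cpu_only_time * (1.0 - offload_fraction)
    offloaded_on_cpu = cpu_only_time * offload_fraction
    # The GPU replaces the CPU for this part, it does not work alongside it.
    return remaining_on_cpu + offloaded_on_cpu / gpu_speed_ratio


def shared_chunks_time(n_chunks, cpu_chunk_time, gpu_chunk_time):
    """Run time when the offloadable part is split into equal chunks and each
    chunk goes to whichever device would finish it first (hypothetical scheme)."""
    cpu_busy_until = 0.0
    gpu_busy_until = 0.0
    for _ in range(n_chunks):
        if cpu_busy_until + cpu_chunk_time <= gpu_busy_until + gpu_chunk_time:
            cpu_busy_until += cpu_chunk_time
        else:
            gpu_busy_until += gpu_chunk_time
    return max(cpu_busy_until, gpu_busy_until)


if __name__ == "__main__":
    cpu_only = 9000.0  # ca. 9000 s per ABP2 unit, as quoted in the thread
    # Assumed: half the time is offloadable, GPU is half as fast at it.
    print("GPU instead of CPU:", offload_time(cpu_only, 0.5, 0.5), "s")
    # Assumed: the offloadable 4500 s is cut into 90 chunks.
    shared = shared_chunks_time(90, 4500.0 / 90, 9000.0 / 90)
    print("Offloadable part, shared in chunks:", shared, "s")
```

With these assumed values the first model gives a task that takes 50% longer than on the CPU alone, which matches what Saenger reports. The second shows why sharing in small chunks would help: a slow GPU simply ends up taking fewer chunks, so it can't make things worse than the CPU working alone.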

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 2955479867

RAC: 719731

RE: That reminds me: I've

14 Jun 2010 15:04:29 UTC

Message 96410 in response to message 96407

Quote:

That reminds me: I've got a Fermi (GTX 470) to test with now: host 1226365.

Finished the first task in a shade over 2 hours (7,660 seconds), and validated against an X5355 @ 2.66GHz which took twice as long. The Fermi barely broke sweat: maximum GPU utilisation was 20%, and the average about 12.5%.

For those familiar with the GPUGrid project, the card is currently in host 43404, where it can churn through their standard tasks in 11,000 - 12,000 seconds.

But congratulations to Einstein: I think you're the first project I've come across who made a Fermi-compatible app right from the start, without needing a special build.

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 723014008

RAC: 1153325

Hi! @Richard, thanks for

14 Jun 2010 16:36:03 UTC

Message 96411

Hi!

@Richard, thanks for giving ABP2 a try on your Fermi. Needless to say, it would be real nice to have you on board for a beta test of the next app, if there will be a beta test (but I guess so).

@Saenger: The explanation given by Ver Greeneyes is correct. There is a certain part of the computation (here: Fast Fourier Transform, FFT) that is executed either exclusively on the GPU (CUDA version) or exclusively on the CPU (conventional app). No matter how you arrange the other work, the number of FFTs per second that your GPU can do will be the bottleneck if the GPU is sufficiently slow. Usually it's impractical to have the GPU and CPU collaborate closely on the same algorithm (e.g. FFT) at the same time, because between CPU and GPU there is a bottleneck called the PCIe bus. You want to push some data onto the card, then have the GPU crunch on it (using its ultra-fast on-board RAM but not the PCIe bus), and only at the end transfer the data (results) back from the board over PCIe to main RAM.
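The PCIe argument can be sketched as a simple cost model in Python. This is only an illustration under stated assumptions: the buffer size, bus bandwidth, latency, step count and per-step compute time are made-up example values, and the function names are hypothetical. None of them are measurements of the ABP2 application.

```python
def transfer_time(n_bytes, bandwidth_bytes_per_s, latency_s):
    """Time to move one buffer across the bus: fixed latency plus size/bandwidth."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


def batch_offload_time(n_bytes, n_steps, step_compute_s, bandwidth, latency):
    """Upload once, run every step on the card, download the results once."""
    one_way = transfer_time(n_bytes, bandwidth, latency)
    return 2 * one_way + n_steps * step_compute_s


def fine_grained_time(n_bytes, n_steps, step_compute_s, bandwidth, latency):
    """CPU and GPU cooperate on every step, so the data crosses the bus
    in both directions each time."""
    one_way = transfer_time(n_bytes, bandwidth, latency)
    return n_steps * (2 * one_way + step_compute_s)


if __name__ == "__main__":
    # All values below are assumptions for illustration only.
    n_bytes = 64 * 1024 * 1024   # 64 MiB working buffer
    bandwidth = 4e9              # 4 GB/s effective bus bandwidth
    latency = 1e-5               # 10 microseconds per transfer
    n_steps = 1000               # number of FFT-like steps
    step_compute = 1e-3          # 1 ms of GPU work per step

    print("batch offload:", batch_offload_time(n_bytes, n_steps, step_compute, bandwidth, latency), "s")
    print("fine-grained :", fine_grained_time(n_bytes, n_steps, step_compute, bandwidth, latency), "s")
```

In this model the fine-grained variant pays the transfer cost on every step, so once there is more than one step it is always slower than uploading once and keeping the data in on-board RAM.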

Greg

Joined: 10 Mar 05

Posts: 9

Credit: 116663922

RAC: 0

I tried the CUDA client on a

17 Jun 2010 8:46:09 UTC

Message 96412

I tried the CUDA client on a GTX 480 using 64-bit Linux, and it failed. It works fine with GPUGrid and Collatz. Can I be of help fixing the issue?

Gundolf Jahn

Joined: 1 Mar 05

Posts: 1079

Credit: 341280

RAC: 0

RE: I tried the CUDA client

17 Jun 2010 9:51:31 UTC

Message 96413 in response to message 96412

Quote:

I tried the CUDA client on a GTX 480 using 64-bit Linux, and it failed. It works fine with GPUGrid and Collatz. Can I be of help fixing the issue?

Are you sure that this

[01:23:28][6844][ERROR] Error acquiring "real" CUDA device!
------> The acquired device is a "Device Emulation (CPU)"
[01:23:28][6844][ERROR] Demodulation failed (error: 1014)!

is the application's fault and not that of your system?

I'm just interpreting the message, since I don't have a CUDA device available.

Regards,
Gundolf

Computers aren't everything in life. (Just kidding)
