ABP2 CUDA applications

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,522
Credit: 694,429,247
RAC: 128,405

RE: How is it possible to

Message 96404 in response to message 96403

Quote:

How is it possible to have much longer crunch times when the crunching power was much increased?

My GPU isn't the fastest either, but if it's not faster with it than without, why bother sending WUs at all? Why does an increase in power make the crunching so much more wasteful?

Hi,

Well, you can't judge the whole concept from your particular configuration. You have a quite fast CPU (ca. 9000 sec per ABP2 unit is excellent) and a low-end graphics board. The 8600 GT has just 32 shaders if I'm not mistaken. This means it is perhaps 4 or 5 times (!) slower than a modern, mid-range graphics board, and 10 times slower than a top-end board. This is not just true for Einstein@Home: at the GPUGrid project, for example, the 8600 GT isn't supported at all because it is "too slow".

CU
HB

Saenger
Joined: 15 Feb 05
Posts: 403
Credit: 33,009,522
RAC: 0

RE: RE: How is it

Message 96405 in response to message 96404

Quote:
Quote:

How is it possible to have much longer crunch times when the crunching power was much increased?

My GPU isn't the fastest either, but if it's not faster with it than without, why bother sending WUs at all? Why does an increase in power make the crunching so much more wasteful?

Hi,

Well, you can't judge the whole concept from your particular configuration. You have a quite fast CPU (ca. 9000 sec per ABP2 unit is excellent) and a low-end graphics board. The 8600 GT has just 32 shaders if I'm not mistaken. This means it is perhaps 4 or 5 times (!) slower than a modern, mid-range graphics board, and 10 times slower than a top-end board. This is not just true for Einstein@Home: at the GPUGrid project, for example, the 8600 GT isn't supported at all because it is "too slow".

CU
HB


My card is supported, I just don't get any bonus as it's not fast enough for that.

But I crunch the same WUs (at least I think they are the same, since others sometimes crunch them in parallel without a GPU), and they finish with CPU only in two thirds of the time it takes me with the extra crunching power.

It's irrelevant how fast my CPU and my GPU are: as long as it's the same CPU and I add more processors to the same work, it should finish faster. Of course it would be much faster on a Fermi, and my 8600 should accelerate things just a wee bit, but instead it's slowed down very much.

The formula here is: More crunching power ---> more crunching time.
The expected formula is: More crunching power ---> less crunching time.

Greetings from Sänger

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,924,397,508
RAC: 890,366

RE: My card is supported, I

Message 96406 in response to message 96405

Quote:

My card is supported, I just don't get any bonus as it's not fast enough for that.

But I crunch the same WUs (at least I think they are the same, since others sometimes crunch them in parallel without a GPU), and they finish with CPU only in two thirds of the time it takes me with the extra crunching power.

It's irrelevant how fast my CPU and my GPU are: as long as it's the same CPU and I add more processors to the same work, it should finish faster. Of course it would be much faster on a Fermi, and my 8600 should accelerate things just a wee bit, but instead it's slowed down very much.

The formula here is: More crunching power ---> more crunching time.
The expected formula is: More crunching power ---> less crunching time.


Which goes to prove, as so often in life, that brute force isn't enough: you need cunning and guile as well. I think that the Einstein project know very well that their application doesn't yet use CUDA cards efficiently, but it is there and can be used by people who have hardware with enough brute force to overcome its limitations.

For the rest of us, there's an option on the Einstein@Home preferences page for

Quote:
Use NVIDIA GPU
Enforced by version 6.10+ no


Mine, as you can see, is set to 'no': having done some initial testing (and I'm quite happy to come back and do some more testing when the next version is out), I decided to deploy my CUDA cards on projects where the applications make fuller use of them. It's one solution.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,924,397,508
RAC: 890,366

That reminds me: I've got a

That reminds me: I've got a Fermi (GTX 470) to test with now.

Keep an eye on host 1226365: it's just downloaded the standard CUDA application and the cuda23 runtime DLLs.

That combination caused problems on Fermis at both GPUGrid and SETI: my understanding is that, at the very least, applications have to be compiled against the 3.0 SDK. There's also the 'volatile' bug to consider.

It'll be about 2 hours before it finishes its current GPUGrid task and has a go at Einstein: more then.

Saenger
Joined: 15 Feb 05
Posts: 403
Credit: 33,009,522
RAC: 0

I'm still totally lost

I'm still totally lost here:
If my computer crunches the same WU either on 100% CPU or on 100% CPU + x% GPU, why is it so much slower?
It would be OK if it didn't increase the speed, because the card is too slow, but it takes 50% longer with additional resources working on it.
WTF is it doing on my CPU the whole time? It uses 50% more of this resource than without the additional GPU assistance.

Greetings from Sänger

Ver Greeneyes
Joined: 26 Mar 09
Posts: 140
Credit: 9,562,235
RAC: 0

Most likely, the GPU is doing

Message 96409 in response to message 96408

Most likely, the GPU is doing part of the work -instead- of the CPU, not alongside it - although the CPU still has to tell it what to do and so probably won't be entirely idle. Since your GPU is slower than your CPU at the same work, it takes longer. It might be an idea to let the CPU still do some of the work (in your case, over 50%), but the program would have to break it up into little chunks so the CPU doesn't end up doing too much when a faster GPU is present. Whether that is possible is something the project devs will have to tell you :)
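The chunking idea described above can be sketched with a toy simulation (all rates here are made-up numbers for illustration, not measurements from any real app): if both processors pull small chunks from a shared work queue, each automatically ends up doing work in proportion to its speed, so adding even a slow GPU never makes the job slower than the faster processor alone.

```python
# Toy simulation of dynamic work-splitting between a CPU and a GPU.
# Rates are hypothetical; the point is that with small chunks, each
# processor takes work in proportion to its speed.

def split_run(total_chunks, cpu_rate, gpu_rate):
    """Simulate two workers pulling chunks from a shared queue.
    Rates are in chunks per second; returns the wall-clock time."""
    cpu_busy_until = 0.0
    gpu_busy_until = 0.0
    for _ in range(total_chunks):
        # The worker that frees up first takes the next chunk.
        if cpu_busy_until <= gpu_busy_until:
            cpu_busy_until += 1.0 / cpu_rate
        else:
            gpu_busy_until += 1.0 / gpu_rate
    return max(cpu_busy_until, gpu_busy_until)

fast_cpu = 1.0   # chunks/s (hypothetical)
slow_gpu = 0.5   # chunks/s (hypothetical, e.g. an 8600 GT)
chunks = 1000

cpu_only = chunks / fast_cpu
combined = split_run(chunks, fast_cpu, slow_gpu)
print(f"CPU alone: {cpu_only:.0f} s, CPU+GPU split: {combined:.0f} s")
```

With these made-up rates the CPU takes roughly two chunks for every one the GPU takes, so the combined run approaches the sum of the two rates rather than being dragged down to the GPU's speed. Whether the real app can be restructured this way is a separate question, as noted above.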

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,924,397,508
RAC: 890,366

RE: That reminds me: I've

Message 96410 in response to message 96407

Quote:
That reminds me: I've got a Fermi (GTX 470) to test with now: host 1226365.


Finished the first task in a shade over 2 hours (7,660 seconds), and validated against an X5355 @ 2.66GHz which took twice as long. The Fermi barely broke a sweat: maximum GPU utilisation was 20%, and the average about 12.5%.

For those familiar with the GPUGrid project, the card is currently in host 43404, where it can churn through their standard tasks in 11,000 - 12,000 seconds.

But congratulations to Einstein: I think you're the first project I've come across that made a Fermi-compatible app right from the start, without needing a special build.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,522
Credit: 694,429,247
RAC: 128,405

Hi! @Richard, thanks for

Hi!

@Richard, thanks for giving ABP2 a try on your Fermi. Needless to say, it would be really nice to have you on board for a beta test of the next app, if there is one (and I guess there will be).

@Saenger: The explanation given by Ver Greeneyes is correct. There is a certain part of the computation (here: the Fast Fourier Transform, FFT) that is executed either exclusively on the GPU (CUDA version) or exclusively on the CPU (conventional app). No matter how you arrange the other work, the number of FFTs per second that your GPU can do will be the bottleneck if the GPU is sufficiently slow. Usually it's impractical to have the GPU and CPU collaborate closely on the same algorithm (e.g. the FFT) at the same time, because between CPU and GPU there is a bottleneck called the PCIe bus. You want to push some data onto the card, have the GPU crunch on it (using its ultra-fast on-board RAM, not the PCIe bus), and only at the end transfer the results back from the board over PCIe to main RAM.
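The push-crunch-pull pattern described above can be put into a toy cost model (all figures below are assumptions for illustration, not measured from the ABP2 app): the PCIe transfers at either end are cheap, but because the whole FFT stage runs on one processor or the other, a GPU slower than the CPU makes that stage slower overall, while a fast GPU makes it much faster.

```python
# Toy cost model for offloading an FFT stage to a GPU.
# All numbers are assumed for illustration, not measured.

def stage_time_cpu(n_ffts, cpu_ffts_per_s):
    """Time to run the whole FFT stage on the CPU."""
    return n_ffts / cpu_ffts_per_s

def stage_time_gpu(n_ffts, gpu_ffts_per_s, bytes_moved, pcie_bytes_per_s):
    """One transfer onto the card, crunch entirely in on-board RAM,
    one transfer of the results back over PCIe."""
    transfer = 2 * bytes_moved / pcie_bytes_per_s
    return transfer + n_ffts / gpu_ffts_per_s

n_ffts = 100_000
data = 256 * 1024**2   # 256 MiB pushed each way (assumed)
pcie = 4 * 1024**3     # ~4 GiB/s, a PCIe 2.0 x16 ballpark (assumed)

fast_cpu = 50.0        # FFTs/s (hypothetical)
slow_gpu = 30.0        # FFTs/s, a low-end card (hypothetical)
fast_gpu = 500.0       # FFTs/s, a Fermi-class card (hypothetical)

print(f"CPU only : {stage_time_cpu(n_ffts, fast_cpu):8.0f} s")
print(f"slow GPU : {stage_time_gpu(n_ffts, slow_gpu, data, pcie):8.0f} s")
print(f"fast GPU : {stage_time_gpu(n_ffts, fast_gpu, data, pcie):8.0f} s")
```

Note that the transfer term is a fraction of a second while the crunching terms are thousands of seconds: the slow-GPU run loses not because of the PCIe bus but simply because the offloaded stage runs on the slower processor, which matches what Saenger observes.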

HB

Greg
Joined: 10 Mar 05
Posts: 9
Credit: 116,663,922
RAC: 0

I tried the CUDA client on a

I tried the CUDA client on a GTX 480 using 64-bit linux, and it failed. It works fine with GPUGrid and Collatz. Can I be of help fixing the issue?

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1,079
Credit: 341,280
RAC: 0

RE: I tried the CUDA client

Message 96413 in response to message 96412

Quote:
I tried the CUDA client on a GTX 480 using 64-bit linux, and it failed. It works fine with GPUGrid and Collatz. Can I be of help fixing the issue?


Are you sure that this

[01:23:28][6844][ERROR] Error acquiring "real" CUDA device!
------> The acquired device is a "Device Emulation (CPU)"
[01:23:28][6844][ERROR] Demodulation failed (error: 1014)!

is the application's fault and not that of your system?

I'm just interpreting the message, since I don't have a CUDA device available.

Regards,
Gundolf

Computers aren't everything in life. (Just a little joke)
