How is it possible to have much longer crunch times when the crunching power was much increased?
My GPU isn't the fastest either, but if the app isn't faster with it than without, why bother sending WUs at all? Why does an increase in power make the crunching so much more wasteful?
Hi,
Well, you can't judge the whole concept from your particular configuration. You have a quite fast CPU (ca. 9000 sec per ABP2 unit is excellent) and a low-end graphics board. The 8600 GT has just 32 shaders if I'm not mistaken. This means it is perhaps 4 or 5 times (!) slower than a modern, mid-range graphics board, and 10 times slower than a top-end board. This is not just true for Einstein@Home: if you look at the GPUGrid project, for example, the 8600 GT isn't supported there at all because it is "too slow".
CU
HB
My card is supported, I just don't get any bonus because it's not fast enough for that.
But I crunch the same WUs - at least I think they are the same, since others sometimes crunch them in parallel without a GPU - and they finish in 2/3 of the time with only a CPU that it takes me with more crunching power.
It shouldn't matter how fast my CPU and GPU are: as long as it's the same CPU and I add more processors to the same work, it should finish faster. Of course it would be much faster on a Fermi, and my 8600 should accelerate things just a wee bit, but instead it slows them down considerably.
The formula here is: more crunching power ---> more crunching time.
The expected formula is: more crunching power ---> less crunching time.
Which goes to prove, as so often in life, that brute force isn't enough: you need cunning and guile as well. I think the Einstein project knows very well that their application doesn't yet use CUDA cards efficiently, but it is there and can be used by people who have hardware with enough brute force to overcome its limitations.
For the rest of us, there's an option on the Einstein@Home preferences page controlling GPU use. Mine, as you can see, is set to 'no': having done some initial testing (and I'm quite happy to come back and do some more testing when the next version is out), I decided to deploy my CUDA cards on projects where the applications make fuller use of them. It's one solution.
That reminds me: I've got a Fermi (GTX 470) to test with now.
Keep an eye on host 1226365: it's just downloaded the standard CUDA application and the cuda23 runtime DLLs.
That combination caused problems on Fermis at both GPUGrid and SETI: my understanding is that, at the very least, applications have to be compiled against the 3.0 SDK. There's also the 'volatile' bug to consider.
It'll be about 2 hours before it finishes its current GPUGrid task and has a go at Einstein: more then.
I'm still totally lost here:
If my computer crunches the same WU either on 100% CPU or on 100% CPU + x% GPU, why is the latter so much slower?
It would be OK if it didn't increase the speed because the card is too slow, but it takes 50% longer with additional resources working on it.
WTF is it doing on my CPU the whole time? It uses 50% more of that resource than without the GPU assistance.
Most likely, the GPU is doing part of the work -instead- of the CPU, not alongside it - although the CPU still has to tell it what to do and so probably won't be entirely idle. Since your GPU is slower than your CPU at the same work, it takes longer. It might be an idea to let the CPU still do some of the work (in your case, over 50%), but the program would have to break it up into little chunks so the CPU doesn't end up doing too much when a faster GPU is present. Whether that is possible is something the project devs will have to tell you :)
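To make that explanation concrete, here is a toy serial model of the effect. All numbers are invented for illustration, not measured from Einstein@Home: if a fixed fraction of the work is handed to a device that is slower than the CPU, the total runtime goes up, exactly as Saenger observes; with a fast device it goes down.

```python
# Toy model: total runtime when a fixed fraction of the work is offloaded
# to a GPU and the rest stays on the CPU (parts run one after the other).
# Speeds are in arbitrary "work units per second"; all values are illustrative.

def runtime(total_work, offload_fraction, cpu_speed, gpu_speed):
    """Serial model: the offloaded part runs on the GPU, the rest on the CPU."""
    cpu_part = total_work * (1 - offload_fraction) / cpu_speed
    gpu_part = total_work * offload_fraction / gpu_speed
    return cpu_part + gpu_part

cpu_only = runtime(100.0, 0.0, cpu_speed=1.0, gpu_speed=1.0)   # 100 time units
slow_gpu = runtime(100.0, 0.5, cpu_speed=1.0, gpu_speed=0.25)  # 50 + 200 = 250
fast_gpu = runtime(100.0, 0.5, cpu_speed=1.0, gpu_speed=10.0)  # 50 + 5  = 55

print(cpu_only, slow_gpu, fast_gpu)
```

With a GPU four times slower than the CPU, offloading half the work makes the task 2.5 times slower overall; with a GPU ten times faster, it nearly halves the runtime. That is the whole asymmetry between an 8600 GT and a Fermi in one sum.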
That reminds me: I've got a Fermi (GTX 470) to test with now: host 1226365.
Finished the first task in a shade over 2 hours (7,660 seconds), and validated against an X5355 @ 2.66GHz which took twice as long. The Fermi barely broke a sweat: maximum GPU utilisation was 20%, and the average about 12.5%.
For those familiar with the GPUGrid project, the card is currently in host 43404, where it can churn through their standard tasks in 11,000 - 12,000 seconds.
But congratulations to Einstein: I think you're the first project I've come across who made a Fermi-compatible app right from the start, without needing a special build.
@Richard, thanks for giving ABP2 a try on your Fermi. Needless to say, it would be really nice to have you on board for a beta test of the next app, if there will be one (and I guess there will).
@Saenger: The explanation given by Ver Greeneyes is correct. There is a certain part of the computation (here: the Fast Fourier Transform, FFT) that is executed either exclusively on the GPU (CUDA version) or on the CPU (conventional app). No matter how you arrange the other work, the number of FFTs per second that your GPU can do will be the bottleneck if the GPU is sufficiently slow. Usually it's impractical to have the GPU and CPU collaborate closely on the same algorithm (e.g. the FFT) at the same time, because between CPU and GPU there is a bottleneck called the PCIe bus. You want to push some data onto the card, then have the GPU crunch on it (using its ultra-fast on-board RAM but not the PCIe bus), and only at the end transfer the data (results) back from the board over PCIe to main RAM.
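The push-once / crunch / pull-once pattern described above can be sketched with a small cost model. The latency and bandwidth figures below are assumptions picked for illustration (roughly PCIe 2.0 x16 class), not measurements; the point is only that per-transfer overhead makes many small CPU-GPU exchanges far more expensive than one bulk transfer each way.

```python
# Toy comparison: one bulk PCIe transfer each way vs. many small exchanges.
# Latency and bandwidth figures are assumed for illustration, not measured.

PCIE_LATENCY = 10e-6     # seconds of fixed overhead per transfer (assumed)
PCIE_BANDWIDTH = 8e9     # bytes per second (assumed, ~PCIe 2.0 x16)

def transfer_time(total_bytes, num_transfers):
    """Time to move total_bytes split across num_transfers PCIe transactions."""
    return num_transfers * PCIE_LATENCY + total_bytes / PCIE_BANDWIDTH

data = 64 * 1024 * 1024  # 64 MiB of samples (illustrative workload size)

bulk = transfer_time(data, 2)        # push input once, pull results once
chatty = transfer_time(data, 20000)  # CPU and GPU trading small chunks

print(f"bulk: {bulk * 1e3:.2f} ms, chatty: {chatty * 1e3:.2f} ms")
```

Under these assumptions the chatty scheme spends over twenty times longer on the bus than the bulk scheme for the same amount of data, which is why the app keeps the whole FFT on one device instead of splitting it between CPU and GPU.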
I tried the CUDA client on a GTX 480 using 64-bit Linux, and it failed. It works fine with GPUGrid and Collatz. Can I be of help fixing the issue?
Are you sure that this
[01:23:28][6844][ERROR] Error acquiring "real" CUDA device!
------> The acquired device is a "Device Emulation (CPU)"
[01:23:28][6844][ERROR] Demodulation failed (error: 1014)!
is the application's fault and not that of your system?
I'm just interpreting the message, since I don't have a CUDA device available.
Regards,
Gundolf
Computers aren't everything in life. (Just a little joke.)