BRP4cuda32 vs. BRP4cuda32nv301 performance

Bjarke
Joined: 10 Feb 06
Posts: 5
Credit: 4936739
RAC: 0
Topic 196505

In the past weeks I have been using the 276.52 driver (released 2012.02.09), which lets me run the BRP4cuda32 application. With this driver/application it took my Nvidia Quadro FX1800 around 6200 seconds to complete a WU and get the 500 credits.

After updating to the most recent 305.93 driver (released 2012.08.28) I am able to run the BRP4cuda32nv301 application. This seems to put more load on the GPU, since my system becomes less responsive. Therefore I would expect the runtimes to be shorter than with the old driver. However, a WU now takes around 19000 seconds to complete - and I still get 500 credits.

My questions are:
- Is my GPU performing better with the old driver and the BRP4cuda32 application?
- Or am I doing better with the new driver (e.g. crunching more numbers), and the credits are just not adjusted properly?

The "credit/time" might not be a very accurate measurement of performance.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2980400708
RAC: 765410

The actual applications used for BRP4cuda32 and BRP4cuda32nv301 are identical:

Quote:
Comparing files einsteinbinary_BRP4_1.25_windows_intelx86__BRP4cuda32nv301.exe and EINSTEINBINARY_BRP4_1.25_WINDOWS_INTELX86__BRP4CUDA32.EXE
FC: no differences encountered
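The same byte-for-byte check can be repeated on systems without the Windows FC tool. A minimal sketch in Python using only the standard library; the helper names `files_identical` and `sha256_of` are made up for illustration and are not part of any project tooling:

```python
import filecmp
import hashlib


def files_identical(path_a: str, path_b: str) -> bool:
    """Return True if both files contain exactly the same bytes."""
    # shallow=False forces a comparison of the file contents instead of
    # trusting os.stat() metadata such as size and modification time.
    return filecmp.cmp(path_a, path_b, shallow=False)


def sha256_of(path: str) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `files_identical()` with the paths of the two executables named in the quote should return `True` if they really are the same build. Matching `sha256_of()` digests are a second way to confirm it without having both files on the same machine.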


Reading the - huge and complicated - Comparison of Nvidia graphics processing units table at Wikipedia, it looks as if your FX1800 has a variant of the G94 GPU chip, which makes it comparable to a 9600 GS or GT with Compute Capability 1.1.

That's quite an old technology. It's highly unlikely that NVidia is targeting driver improvements on those old chips: new features will be designed for the Fermi (GTX 4xx and 5xx) and Kepler (GTX 6xx) ranges. It's even possible that some new features, designed to make computing more robust and reliable, may reduce the raw speed of these older cards.

I suspect that any change in credit per hour is more likely to come from the driver change, and perhaps a slight variability in task runtime, than anything else.

Bjarke
Joined: 10 Feb 06
Posts: 5
Credit: 4936739
RAC: 0

Quote:
It's even possible that some new features, designed to make computing more robust and reliable, may reduce the raw speed of these older cards.

Interesting view. I hadn't thought of that, but it does make sense.

Quote:
I suspect that any change in credit per hour is more likely to come from the driver change, and perhaps a slight variability in task runtime, than anything else.

However, since my compute-time with the new 305.93 driver is now 3 times as long as with the old 276.52 driver - and I still get the same credit - I might consider downgrading to the old 276.52 driver.

It would be nice if someone could clarify whether the credit truly is an accurate measure of the amount of work done. Put another way:
Am I doing the same amount of work in 6200 seconds using the 276.52 driver as I am doing in 19000 seconds using the 305.93 driver?

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2980400708
RAC: 765410

The tasks and the application are the same. The credits are fixed at 500 per task by the project.

The 3xx series drivers are the first to preview cuda5: we're running cuda3(.2) here. That may make a difference - it does sound as if for that card, and this project, the older drivers are more suitable.

(304 and higher drivers for most other NVidia cards are still in Beta)

Bjarke
Joined: 10 Feb 06
Posts: 5
Credit: 4936739
RAC: 0

Thanks. I will definitely downgrade to the 276.52 driver then.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 761337078
RAC: 1116180

Hi!

Thanks for reporting this. As Richard has already mentioned, the two apps in question are, in fact, identical, byte-by-byte.

We want to support a wide range of users, even those with older drivers and hardware. However, if the penalty for using an older runtime and older libraries gets too severe with newer drivers, we will be forced to add versions targeting newer CUDA versions soon. Note that with all the variations in operating system (Win, Linux, OSX), word length (64 bit, 32 bit) and CPU/GPU type (SSE, NVIDIA, ATI/AMD), we now have a dozen or so different executables for the BRP4 search alone.

Oliver has done some marvelous work on automating our continuous integration environment, so we are in principle prepared to support even more variants if really needed.

Stay tuned.

HB

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2725261
RAC: 1164

My testing of the Lunatics x41z Cuda apps for Setiathome on legacy hardware also showed a slowdown on Cuda 5 preview drivers, (OpenCL Astropulse times also increased):

x41z Cuda22, Cuda23, Cuda32, Cuda41 and Cuda42 v6 Bench on the 9800GTX+ (Win Vista x64, 301.42 (Cuda42 drivers)):

WU : PG1327.wu
Lunatics_x41z_win32_cuda22.exe -verb -nog :
Elapsed 33.418 secs, speedup: 87.71% ratio: 8.14x
CPU 8.315 secs, speedup: 96.95% ratio: 32.81x
Lunatics_x41z_win32_cuda23.exe -verb -nog :
Elapsed 33.467 secs, speedup: 87.69% ratio: 8.12x
CPU 8.486 secs, speedup: 96.89% ratio: 32.15x
Lunatics_x41z_win32_cuda32.exe -verb -nog :
Elapsed 34.870 secs, speedup: 87.17% ratio: 7.80x
CPU 8.908 secs, speedup: 96.74% ratio: 30.63x
Lunatics_x41z_win32_cuda41.exe -verb -nog :
Elapsed 34.253 secs, speedup: 87.40% ratio: 7.94x
CPU 8.268 secs, speedup: 96.97% ratio: 33.00x
Lunatics_x41z_win32_cuda42.exe -verb -nog :
Elapsed 34.351 secs, speedup: 87.37% ratio: 7.91x
CPU 8.658 secs, speedup: 96.83% ratio: 31.51x

v6 MB Bench of x41z Cuda22, Cuda23, Cuda32, Cuda41 and Cuda42 on the 9800GTX+ (Win Vista x64, 304.48 Cuda 5 preview drivers):

WU : PG1327.wu
Lunatics_x41z_win32_cuda22.exe -verb -nog :
Elapsed 39.856 secs, speedup: 85.34% ratio: 6.82x
CPU 8.798 secs, speedup: 96.78% ratio: 31.01x
Lunatics_x41z_win32_cuda23.exe -verb -nog :
Elapsed 39.870 secs, speedup: 85.34% ratio: 6.82x
CPU 8.564 secs, speedup: 96.86% ratio: 31.86x
Lunatics_x41z_win32_cuda32.exe -verb -nog :
Elapsed 50.052 secs, speedup: 81.59% ratio: 5.43x
CPU 9.485 secs, speedup: 96.52% ratio: 28.77x
Lunatics_x41z_win32_cuda41.exe -verb -nog :
Elapsed 47.394 secs, speedup: 82.57% ratio: 5.74x
CPU 8.034 secs, speedup: 97.06% ratio: 33.96x
Lunatics_x41z_win32_cuda42.exe -verb -nog :
Elapsed 47.363 secs, speedup: 82.58% ratio: 5.74x
CPU 8.190 secs, speedup: 97.00% ratio: 33.31x
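
To put a number on the slowdown between the two driver versions, the elapsed times above can be compared build by build. A rough sketch in Python; the figures are copied from the two runs above, and the dictionary and function names are only illustrative:

```python
# Elapsed times in seconds for PG1327.wu, copied from the two benchmark runs.
elapsed_301_42 = {  # 301.42 (Cuda42 drivers)
    "cuda22": 33.418, "cuda23": 33.467, "cuda32": 34.870,
    "cuda41": 34.253, "cuda42": 34.351,
}
elapsed_304_48 = {  # 304.48 (Cuda 5 preview drivers)
    "cuda22": 39.856, "cuda23": 39.870, "cuda32": 50.052,
    "cuda41": 47.394, "cuda42": 47.363,
}


def slowdown_percent(before: float, after: float) -> float:
    """Percentage increase in elapsed time going from `before` to `after`."""
    return (after / before - 1.0) * 100.0


slowdowns = {
    build: slowdown_percent(elapsed_301_42[build], elapsed_304_48[build])
    for build in elapsed_301_42
}

for build, pct in slowdowns.items():
    print(f"{build}: {pct:.1f}% slower on 304.48")
```

On these figures the Cuda22 and Cuda23 builds lose roughly 19%, while the Cuda32, Cuda41 and Cuda42 builds lose roughly 38% to 44%, with Cuda32 hit hardest. This is a single workunit on a single card, so treat it as an indication rather than a measurement.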

Claggy

Bjarke
Joined: 10 Feb 06
Posts: 5
Credit: 4936739
RAC: 0

I have now downgraded from the 305.93 driver (released 2012.08.28) to 267.66 (released 2011.03.21) which has only CUDA 3.2 as required by this project. I have summarised the results below.

  • Driver 267.66, CUDA 3.2: average runtime 5,200 sec. Task 306415821
  • Driver 276.52, CUDA 4.0: average runtime 6,100 sec. Task 305177621
  • Driver 305.93, CUDA 5.0: average runtime 19,000 sec. Task 305660963

Clearly using the lowest required CUDA version seems to speed up computations significantly.
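
Since every task earns the same fixed 500 credits, throughput is simply inversely proportional to runtime. A small sketch in Python of what the three average runtimes mean in credits per day, assuming the GPU crunches one task at a time around the clock (that assumption, and the names used, are mine):

```python
CREDITS_PER_TASK = 500      # fixed per task by the project
SECONDS_PER_DAY = 86_400

# Average runtimes in seconds for the three drivers compared in this post.
avg_runtime_sec = {"267.66": 5_200, "276.52": 6_100, "305.93": 19_000}


def credits_per_day(runtime_sec: float) -> float:
    """Credits earned per day if tasks run back to back, one at a time."""
    return CREDITS_PER_TASK * SECONDS_PER_DAY / runtime_sec


slowest = avg_runtime_sec["305.93"]
for driver, runtime in avg_runtime_sec.items():
    print(
        f"driver {driver}: {credits_per_day(runtime):,.0f} credits/day, "
        f"{slowest / runtime:.2f}x the throughput of 305.93"
    )
```

That works out to roughly 8,300, 7,100 and 2,300 credits per day respectively, so the two older drivers deliver a bit more than three times the work of 305.93 on this card.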

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 761337078
RAC: 1116180

Quote:

I have now downgraded from the 305.93 driver (released 2012.08.28) to 267.66 (released 2011.03.21) which has only CUDA 3.2 as required by this project. I have summarised the results below.

  • Driver 267.66, CUDA 3.2: average runtime 5,200 sec. Task 306415821
  • Driver 276.52, CUDA 4.0: average runtime 6,100 sec. Task 305177621
  • Driver 305.93, CUDA 5.0: average runtime 19,000 sec. Task 305660963

Clearly using the lowest required CUDA version seems to speed up computations significantly.

Note however that the tasks were computed using different versions of BRP4. That should not explain the huge slowdown for the 305.93 driver, but I'm not sure that the 267.66 driver is really faster than the 276.52 driver. Because newer drivers also mean bug fixes (ok, sometimes new bugs, too ... :-) ), I would be reluctant to go back too far in time.

Cheers
HB

Bjarke
Joined: 10 Feb 06
Posts: 5
Credit: 4936739
RAC: 0

Quote:

Note however that the tasks were computed using different versions of BRP4. That should not explain the huge slowdown for the 305.93 driver, but I'm not sure that the 267.66 driver is really faster than the 276.52 driver. Because newer drivers also mean bug fixes (ok, sometimes new bugs, too ... :-) ), I would be reluctant to go back too far in time.

You're right. But that does make it hard to decide which driver to use. Not too old and flawed, not too new and slow...
