GA-880A-UD3H vs. M4N75TD with GTX580's - Major Speed Differences

Tex1954
Joined: 15 Mar 11
Posts: 28
Credit: 723154606
RAC: 1273
Topic 196618

I have 3x M4N75TD mobos and 1x Gigabyte GA-880A-UD3H, all with AMD 1090T/1100T CPUs. (I also have 990FX Sabertooth and P6X58D setups that run fine.)

I have two NVIDIA GTX580 cards, both identical.

I put one GTX580 in the GA-880A-UD3H and the other GTX580 in two of the M4N75TD mobos in turn.

GPU utilization (using MSI Afterburner) looks the same. All setups are the same: Win7 64-bit, same-speed memory, etc. All have AMD 1100T CPUs at the stock 3.3GHz and the same BIOS settings.

ALL setups run GPU tasks at the same speed EXCEPT Einstein!!!!

All run the 1.28 Binary Radio Pulsar Search (Arecibo) (BRP4cuda32nv301) task.

The M4N75TD boards take 26 minutes and the GA-880A-UD3H takes 16 minutes.

In one case, running Linux, I got the correct 16-minute speed with 1.28 Binary Radio Pulsar Search (Arecibo) (BRP4cuda32nv270) on the M4N75TD, but then it did an update and now it is slow too.

So, it seems there is something stupid about the ASUS M4N75TD boards... maybe the 750a chipset???

HELP!!

8-)

Win7-M4N75TD Mobos:
DistrRTgen 3.49 Distributed Rainbow Table Generator (distrrtgen) (cuda40) 00:18:55 (00:00:10) 11/11/2012 10:43:24 AM 11/11/2012 10:45:36 AM 0.869C + 1NV 0.88 Reported: OK * Win7-75TD
PrimeGrid 1.39 PPS (Sieve) (cuda23) pps_sr2sieve_53363569_1 00:13:42 (00:02:23) 11/11/2012 6:47:40 AM 11/11/2012 6:49:45 AM 0.375C + 1NV 17.40 Reported: OK * Win7-75TD
Einstein@Home 1.28 Binary Radio Pulsar Search (Arecibo) (BRP4cuda32nv301) 00:26:02 (00:05:43) 11/11/2012 10:06:47 AM 11/11/2012 10:07:20 AM 0.2C + 1NV 21.96 Reported: OK * Win7-75TD
Einstein@Home 1.28 Binary Radio Pulsar Search (Arecibo) (BRP4cuda32nv301) 00:26:00 (00:05:45) 11/11/2012 8:54:28 AM 11/11/2012 9:00:51 AM 0.2C + 1NV 22.12 Reported: OK * Win7-75TD
Einstein@Home 1.28 Binary Radio Pulsar Search (Arecibo) (BRP4cuda32nv301) 00:26:00 (00:05:47) 11/11/2012 8:23:41 AM 11/11/2012 8:25:55 AM 0.2C + 1NV 22.24 Reported: OK * Win7-75TD

Win7-GA880A-UD3H Mobo:
DistrRTgen 3.48 Distributed Rainbow Table Generator (distrrtgen) (cuda23) 00:18:53 (00:00:11) 11/11/2012 10:42:56 AM 11/11/2012 10:45:35 AM 0.868C + 1NV 0.97 Reported: OK * Win7-UD3H
PrimeGrid 1.39 PPS (Sieve) (cuda23) pps_sr2sieve_53369105_0 00:13:20 (00:02:27) 11/11/2012 10:14:40 AM 11/11/2012 10:16:45 AM 0.392C + 1NV 18.38 Reported: OK * Win7-UD3H
Einstein@Home 1.28 Binary Radio Pulsar Search (Arecibo) (BRP4cuda32nv301) 00:16:13 (00:04:01) 11/11/2012 10:01:20 AM 11/11/2012 10:03:25 AM 0.2C + 1NV 24.77 Reported: OK * Win7-UD3H
Einstein@Home 1.28 Binary Radio Pulsar Search (Arecibo) (BRP4cuda32nv301) 00:16:10 (00:03:59) 11/11/2012 9:42:41 AM 11/11/2012 9:44:46 AM 0.2C + 1NV 24.64 Reported: OK * Win7-UD3H
Einstein@Home 1.28 Binary Radio Pulsar Search (Arecibo) (BRP4cuda32nv301) 00:16:09 (00:03:59) 11/11/2012 9:24:51 AM 11/11/2012 9:26:56 AM 0.2C + 1NV 24.66 Reported: OK * Win7-UD3H
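Averaging the three Einstein@Home BRP4 elapsed times on each board puts a number on the gap. A minimal Python sketch that uses only the elapsed times copied from the logs above (the helper names are illustrative, not part of BOINC):

```python
def to_seconds(hms):
    """Convert a BOINC elapsed time 'HH:MM:SS' into seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

def mean_seconds(times):
    """Average elapsed time of several tasks, in seconds."""
    return sum(to_seconds(t) for t in times) / len(times)

# Elapsed times of the BRP4cuda32nv301 tasks, copied from the logs
m4n75td = ["00:26:02", "00:26:00", "00:26:00"]
ga880a = ["00:16:13", "00:16:10", "00:16:09"]

slow = mean_seconds(m4n75td)
fast = mean_seconds(ga880a)
ratio = slow / fast
print(f"M4N75TD: {slow:.0f} s, GA-880A-UD3H: {fast:.0f} s, ratio {ratio:.2f}")
```

This prints `M4N75TD: 1561 s, GA-880A-UD3H: 971 s, ratio 1.61`, so each BRP4 task takes roughly 1.6 times as long on the M4N75TD, while the DistrRTgen and PrimeGrid runtimes are nearly identical on both boards.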

Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 38

Hello,

The ASUS M4N75TD comes equipped with the NVIDIA nForce 750a chipset. This chipset supports two x16 slots running in x8 mode. The Gigabyte GA-880A-UD3H comes equipped with the AMD 880A chipset, which supports a single x16 slot running in x16 mode.

The Einstein application is sensitive to available PCIe bandwidth. Since your GTX 580 cards are running in an x8-wired slot on the ASUS board, this could account for the slower performance.

The optimal configuration for the GTX 580 is to run the card in an x16-wired slot, connected to a chipset that can carry that slot's full bandwidth to the CPU, for the fastest data transfer between CPU and GPU. With some tweaking, you should be able to get the run time down to a little under 14 minutes with the GTX 580. CPU overclocking can help if you have a good-quality CPU cooling setup.
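For reference, the theoretical one-direction ceiling of a PCIe 2.0 link follows from its 5 GT/s per-lane signalling rate and 8b/10b encoding (8 data bits in every 10 bits on the wire), i.e. 500 MB/s per lane. A minimal Python sketch of that arithmetic; these are spec ceilings, not measurements, and real transfers land somewhat lower:

```python
def pcie2_bandwidth_mb_s(lanes):
    """Theoretical one-direction PCIe 2.0 bandwidth in MB/s (1 MB = 10**6 bytes)."""
    transfers_per_s = 5.0e9   # 5 GT/s per lane
    encoding = 8 / 10         # 8b/10b line code: 8 data bits per 10 bits sent
    bytes_per_s = transfers_per_s * encoding / 8  # 8 bits per byte
    return lanes * bytes_per_s / 1e6

for lanes in (4, 8, 16):
    print(f"x{lanes}: {pcie2_bandwidth_mb_s(lanes):.0f} MB/s")
```

An x8 link thus carries half the x16 ceiling (4000 vs. 8000 MB/s), and an x4 link a quarter.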

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 690796378
RAC: 271670

In addition, I wonder whether both boards that are compared really had the same drivers installed at the time of the test. What is suspicious, for example, is the fact that DistRT seems to have run different app versions on those boards, if I read the output correctly (even though their runtimes were identical, this may hint at different drivers being installed):

Quote:


Win7-M4N75TD Mobos:
DistrRTgen 3.49 Distributed Rainbow Table Generator (distrrtgen) (cuda40) 00:18:55 (00:00:10) 11/11/2012 10:43:24 AM 11/11/2012

[...]

Win7-GA880A-UD3H Mobo:
DistrRTgen 3.48 Distributed Rainbow Table Generator (distrrtgen) (cuda23) 00:18:53 (00:00:11) 11/11/2012 10:42:56

Cheers
HBE

Tex1954
Joined: 15 Mar 11
Posts: 28
Credit: 723154606
RAC: 1273

Quote:
In addition, I wonder whether both boards that are compared really had the same drivers installed at the time of the test.

Yes, Dirt mixes things up... all systems have 306.97 Nvidia drivers...

Quote:

Hello,

The ASUS M4N75TD comes equipped with the NVIDIA nForce 750a chipset. This chipset supports two x16 slots running in x8 mode. The Gigabyte GA-880A-UD3H comes equipped with the AMD 880A chipset, which supports a single x16 slot running in x16 mode.

The Einstein application is sensitive to available PCIe bandwidth. Since your GTX 580 cards are running in an x8-wired slot on the ASUS board, this could account for the slower performance.

I know...

These are 3 separate systems... each has its own SSD for everything (only one SSD each), and they run all other tasks fine.

As of now, I have BOTH GTX580 cards installed in the Gigabyte mobo... one slot is x16 and the other is x4...

Guess what? Einstein runs FAST in the x16 slot... and SLOW in the x4 slot!!!!!!

So it would seem the 580s really like the full 16 lanes for Einstein...

I never knew or had a way to check the bandwidth used by BOINC project tasks of various types; now I know a good way... run on an x4 slot and see what happens...

THANKS ALL! Problem solved and verified... I need to replace 3 mobos now...

8-)

Einstein@Home 1.28 Binary Radio Pulsar Search (Arecibo) (BRP4cuda32nv301) 00:16:41 (00:04:20) 11/11/2012 01:01:10 PM 0.2C + 1NV (d1) 25.97 Reported: OK Win7-UD3H
Einstein@Home 1.28 Binary Radio Pulsar Search (Arecibo) (BRP4cuda32nv301) 00:26:02 (00:05:58) 11/11/2012 12:54:23 PM 0.2C + 1NV (d0) 22.92 Reported: OK Win7-UD3H

Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 38

Quote:
I never knew or had a way to check the bandwidth used by BOINC project tasks of various types; now I know a good way... run on an x4 slot and see what happens...

On Linux, the v5 and v4 NVIDIA CUDA toolkit + SDK include a sample program called bandwidthTest. This handy program lets you measure bandwidth between the GPU and CPU over the PCI-E bus.

With this application, you can measure bandwidth for individual GPUs or for all installed GPUs at once. The all-GPU test is handy because it shows whether there is a bottleneck when several graphics cards push data across the bus to the CPU or vice versa.
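Once the sample is built, it can be driven from a script to test each card and then all cards together. A minimal Python sketch, assuming the compiled binary sits in the current directory as `./bandwidthTest` and accepts the sample's `--device` (a GPU index, or `all` for cumulative bandwidth) and `--memory` (`pinned` or `pageable`) options; check the sample's `--help` for your toolkit version. The helper names are hypothetical:

```python
import shutil
import subprocess

def bandwidth_test_cmd(binary="./bandwidthTest", device="all", memory="pinned"):
    """Build the command line for the CUDA SDK bandwidthTest sample.

    device: "all" for cumulative bandwidth over every GPU, or a GPU index.
    memory: "pinned" (page-locked) or "pageable" host memory.
    """
    return [binary, f"--device={device}", f"--memory={memory}"]

def run_bandwidth_test(device="all", binary="./bandwidthTest"):
    """Run bandwidthTest and return its text report."""
    cmd = bandwidth_test_cmd(binary, device)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{binary} not found or not executable")
    # stdin is closed in case the sample waits for Enter before exiting
    result = subprocess.run(cmd, stdin=subprocess.DEVNULL,
                            capture_output=True, text=True, check=True)
    return result.stdout

if __name__ == "__main__":
    # Each GPU on its own first, then all together, to spot a shared bottleneck
    for dev in ("0", "1", "all"):
        try:
            print(run_bandwidth_test(dev))
        except (FileNotFoundError, subprocess.CalledProcessError) as err:
            print(f"device {dev}: {err}")
```

If the per-card figures add up to noticeably more than the `all` figure, the cards are likely competing for the same link to the CPU.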

dmike
Joined: 11 Oct 12
Posts: 76
Credit: 31369048
RAC: 0

Quote:

As of now, I have BOTH GTX580 cards installed in the Gigabyte mobo... one slot is x16 and the other is x4...

Guess what? Einstein runs FAST in the x16 slot... and SLOW in the x4 slot!!!!!!

So it would seem the 580s really like the full 16 lanes for Einstein...

I never knew or had a way to check the bandwidth used by BOINC project tasks of various types; now I know a good way... run on an x4 slot and see what happens...

THANKS ALL! Problem solved and verified... I need to replace 3 mobos now...

Well... maybe.

Let me share my experience with that. It could be the AMD CPUs. Any time I have tried to run 2 cards in an AMD box, the second one chugs while the first one flies. Also, it had nothing to do with 8x or 16x, at least not with the cards I was using.

You see, in both boxes I've tried this with, I've used 8x and 16x 2.0 slots, and there was zero difference between the two. None. But if I added another card, that secondary card would chug whether at 16x or 8x.

Now consider my i7. Both cards run identically. I have a P67 motherboard, so I have PCIe 3.0 8x, 8x, and a PCIe 2.0 x4!

Guess what... the cards run identically no matter what slot they're in, whether they're alone, paired, or otherwise. Of course, I have a limited number of boxes to test, but you're experiencing exactly the same thing on three boards with AMD chips as I did with my two.

But I'd be completely willing to reconsider my conjecture if someone can show me a Phenom II or earlier chip running 2 identical cards at the same speed.

Now consider one more thing. I took one of my cards out of a PCIe 2.0 8x slot and put it in a PCIe 1.0 16x slot. The performance should have been the same: both use the same encoding and have identical bandwidth.
But the 1.0 16x slot chugged so slowly that it wasn't even worth having the computer on. The difference? I had a Pentium 4 2.8GHz in that box, vs. a Phenom II 940 in the other. The CPU made the difference in speed.
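The "identical bandwidth" claim can be checked with the same arithmetic used for any PCIe link: PCIe 1.x signals at 2.5 GT/s and PCIe 2.0 at 5 GT/s per lane, both with 8b/10b encoding, so halving the lanes while doubling the rate gives the same ceiling. PCIe 3.0 (8 GT/s, 128b/130b encoding) is included for the 3.0 8x slots mentioned above. A minimal Python sketch of theoretical one-direction figures only; real throughput is lower:

```python
# Per-lane signalling rate (transfers/s) and line-code efficiency per generation
PCIE_GENERATIONS = {
    "1.x": (2.5e9, 8 / 10),
    "2.0": (5.0e9, 8 / 10),
    "3.0": (8.0e9, 128 / 130),
}

def link_bandwidth_mb_s(gen, lanes):
    """Theoretical one-direction bandwidth of a PCIe link in MB/s (10**6 bytes)."""
    rate, efficiency = PCIE_GENERATIONS[gen]
    return lanes * rate * efficiency / 8 / 1e6  # 8 bits per byte

for gen, lanes in (("1.x", 16), ("2.0", 8), ("3.0", 8), ("2.0", 4)):
    print(f"PCIe {gen} x{lanes}: {link_bandwidth_mb_s(gen, lanes):.0f} MB/s")
```

Both the 1.0 x16 and 2.0 x8 links come out at 4000 MB/s, so the bus width alone does not explain the slowdown described above.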

astrocrab
Joined: 28 Jan 08
Posts: 208
Credit: 429202534
RAC: 0

Quote:

On Linux, the v5 and v4 NVIDIA CUDA toolkit + SDK include a sample program called bandwidthTest. This handy program lets you measure bandwidth between the GPU and CPU over the PCI-E bus.

Where can we get this utility? It seems very useful for tuning our PCs to their best.

dmike
Joined: 11 Oct 12
Posts: 76
Credit: 31369048
RAC: 0

Quote:
Quote:

On Linux, the v5 and v4 NVIDIA CUDA toolkit + SDK include a sample program called bandwidthTest. This handy program lets you measure bandwidth between the GPU and CPU over the PCI-E bus.

Where can we get this utility? It seems very useful for tuning our PCs to their best.

From Nvidia.

https://developer.nvidia.com/cuda-downloads

Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 38

Quote:
Quote:

On Linux, the v5 and v4 NVIDIA CUDA toolkit + SDK include a sample program called bandwidthTest. This handy program lets you measure bandwidth between the GPU and CPU over the PCI-E bus.

Where can we get this utility? It seems very useful for tuning our PCs to their best.

You will want to download the 4.2.9 CUDA toolkit, as version 5 requires a newer driver (at least 304). I was able to get a build with version 4.2.9 to work with the 295 drivers. You will also want to download the SDK, which contains the samples.

https://developer.nvidia.com/cuda-toolkit-42-archive

To compile, you will need GCC installed, in a version prior to 4.7. The toolkit did not work for me with GCC 4.7.
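The two version constraints in this post (CUDA 5 needs driver 304 or newer; the 4.2.9 toolkit did not build with GCC 4.7) can be checked before downloading anything. A minimal Python sketch, assuming the driver version is typed in by hand and that `gcc -dumpversion` works on your system; the helper names are hypothetical and the thresholds come only from this post:

```python
import shutil
import subprocess

def parse_version(text):
    """Turn a version string such as '4.6.3' into a tuple of ints: (4, 6, 3)."""
    return tuple(int(part) for part in text.strip().split("."))

def pick_toolkit(driver_version):
    """CUDA 5 needs driver 304 or newer; otherwise use the 4.2.9 toolkit."""
    return "5.0" if parse_version(driver_version) >= (304,) else "4.2.9"

def gcc_ok_for_cuda42(gcc_version):
    """The 4.2.9 toolkit did not work with GCC 4.7, so require an older GCC."""
    return parse_version(gcc_version) < (4, 7)

def installed_gcc_version():
    """Ask the local gcc for its version, or return None if gcc is missing."""
    if shutil.which("gcc") is None:
        return None
    out = subprocess.run(["gcc", "-dumpversion"], capture_output=True, text=True)
    return out.stdout.strip() or None
```

For example, `pick_toolkit("306.97")` returns `"5.0"` for the 306.97 drivers mentioned earlier in the thread, while `pick_toolkit("295")` returns `"4.2.9"`.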

I have already compiled the application, but I need to check the license agreement to see whether it is okay to distribute a built version of it.

Gamboleer
Joined: 5 Dec 10
Posts: 173
Credit: 168389195
RAC: 0

Quote:

But I'd be completely willing to reconsider my conjecture if someone can show me a Phenom II or earlier chip running 2 identical cards at the same speed.

Hi dmike,

I have an Athlon 64 X2 5400+ on an M3A78-T mobo (one x16 2.0 slot or two x8 1.1 slots; my most recent Craigslist find) running two ATI 7750s with two simultaneous BRP tasks each, and all four tasks are coming in at 95 minutes +/- 2-3 minutes. However, I suspect I may be so badly CPU-throttled that I'm not hitting a PCI bus bandwidth limitation. I'm considering swapping the CPU for a Phenom II, which the mobo supports, and that should make for a nice comparison.

astrocrab
Joined: 28 Jan 08
Posts: 208
Credit: 429202534
RAC: 0

Yesterday I tried running an HD 7770 in a PCIe 3.0 slot paired with a [...], so there is no lack of PCIe bandwidth or CPU power, but the result is the same: about 3600 for a single WU. So I think such low performance is due to the HD 77xx itself. Maybe it's the memory bandwidth.
