Improve 2 x 660 Ti Performance?

Matthew Evans
Joined: 25 Apr 06
Posts: 8
Credit: 3313007
RAC: 0
Topic 196771

I ran across an interesting thread (http://einsteinathome.org/node/196762), and it made me wonder whether I am getting the best performance out of my crunching box.

Here are the stats:

Intel i7-2600 (non-K, no CPU crunching for heat reasons)
2 x Nvidia 660 Ti (non-SLI mode, PCI-E 2.0 8x)
Intel P67 Motherboard

I looked at the PDF in the linked thread and saw mention of running multiple WUs. Is that on a single card? (I found a thread for this. Much higher utilization on my cards now, nice!)

I also see something mentioned about an optimized executable? I'm running the default BOINC installation and E@H application. Is there a 3rd party optimized application I can install that will provide better performance?

Lastly, in my main desktop machine I'd like to install the best card (up to $400 or so), dedicate it to E@H, and disable CPU crunching. Any suggestions? It looks like a 7970 has the best performance in that price range? It's a Z77 chipset board with an Ivy Bridge CPU and PCI-E 3.0.

Thanks!

Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 0

Improve 2 x 660 Ti Performance?

In my experience, the NVIDIA BRP4 applications can benefit from extra PCI-E bandwidth. If you were to have a board with two x16 2.0 slots or two x8 3.0 slots, that should be of some benefit. I have not tested the 660 cards myself and there may be some difference regarding bandwidth requirements from one card model to another.
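To put the slot options in perspective, here is a rough Python calculation of per-direction link bandwidth from the nominal PCI-E transfer rates (5 GT/s with 8b/10b encoding for 2.0, 8 GT/s with 128b/130b encoding for 3.0). It ignores packet and protocol overhead, so real throughput is somewhat lower, but it shows why two x16 2.0 slots and two x8 3.0 slots land in the same range, roughly double the x8 2.0 link each 660 Ti currently has.

```python
# Nominal transfer rate (GT/s) and line-encoding efficiency per PCI-E generation.
GT_PER_S = {"2.0": 5.0, "3.0": 8.0}
ENCODING = {"2.0": 8 / 10, "3.0": 128 / 130}


def slot_bandwidth_gb_s(gen: str, lanes: int) -> float:
    """Approximate one-direction bandwidth of a slot in GB/s,
    before packet/protocol overhead."""
    gbit_per_lane = GT_PER_S[gen] * ENCODING[gen]
    return gbit_per_lane * lanes / 8  # bits -> bytes


for gen, lanes in [("2.0", 8), ("2.0", 16), ("3.0", 8), ("3.0", 16)]:
    print(f"PCI-E {gen} x{lanes}: {slot_bandwidth_gb_s(gen, lanes):.2f} GB/s")
```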

I find that with HT-enabled processors it helps to set the maximum CPU usage to 50% or to disable HT. Leaving at least one CPU core free for the BRP4 application is also beneficial. On Windows, there is also the option of running Process Lasso, which lets you give the BRP4 application a higher priority and set its affinity so that only physical cores are used. This could be an alternative to disabling HT.
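If you try the affinity route, Windows represents the set of allowed CPUs as a bitmask. Below is a small Python sketch of a mask that keeps one logical CPU per physical core, assuming hyper-threaded siblings are numbered consecutively (0/1, 2/3, ...). That numbering is typical on Intel CPUs but is an assumption here, so check it against your own system before relying on it. The function name is only for illustration.

```python
def physical_core_mask(logical_cpus: int, threads_per_core: int = 2) -> int:
    """Bitmask with one bit set per physical core.

    Assumes HT siblings are numbered consecutively (0/1, 2/3, ...),
    so the first logical CPU of each core is every `threads_per_core`-th one.
    """
    mask = 0
    for cpu in range(0, logical_cpus, threads_per_core):
        mask |= 1 << cpu
    return mask


# i7-2600: 4 cores / 8 threads -> logical CPUs 0, 2, 4, 6
print(hex(physical_core_mask(8)))  # 0x55
```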

The 7970 would be the way to go for upgrading your main system for BRP4. With the Ivy Bridge CPU and a single 7970, the card will have all 16 PCI-E 3.0 lanes available. With the 7970, I found only a 2% performance difference between a 3.0 x16 slot and a 3.0 x8 slot, so you could add a second 7970 down the road and run each card with 8 lanes.

If you go with the 7970, make sure to run at least Catalyst driver 12.11 for improved OpenCL performance.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

RE: I also see something

Quote:
I also see something mentioned about an optimized executable? I'm running the default BOINC installation and E@H application. Is there a 3rd party optimized application I can install that will provide better performance?

No, there is no 3rd party optimized app available for Einstein. The stock apps are updated fairly regularly, sometimes with the help of outside volunteers, and have improved a lot over the years.

Matthew Evans
Joined: 25 Apr 06
Posts: 8
Credit: 3313007
RAC: 0

RE: In my experience, the

Quote:

In my experience, the NVIDIA BRP4 applications can benefit from extra PCI-E bandwidth. If you were to have a board with two x16 2.0 slots or two x8 3.0 slots, that should be of some benefit. I have not tested the 660 cards myself and there may be some difference regarding bandwidth requirements from one card model to another.

I find that with HT-enabled processors it helps to set the maximum CPU usage to 50% or to disable HT. Leaving at least one CPU core free for the BRP4 application is also beneficial. On Windows, there is also the option of running Process Lasso, which lets you give the BRP4 application a higher priority and set its affinity so that only physical cores are used. This could be an alternative to disabling HT.

The 7970 would be the way to go for upgrading your main system for BRP4. With the Ivy Bridge CPU and a single 7970, the card will have all 16 PCI-E 3.0 lanes available. With the 7970, I found only a 2% performance difference between a 3.0 x16 slot and a 3.0 x8 slot, so you could add a second 7970 down the road and run each card with 8 lanes.

If you go with the 7970, make sure to run at least Catalyst driver 12.11 for improved OpenCL performance.

Wow, thank you for the excellent post. Lots of great information.

One final question about CPU crunching. I'm having a bit of trouble trying to figure out how to compare the performance of a CPU to CUDA. I have Googled and found lots of comparisons based upon BOINC points, but that doesn't seem to be relevant to E@H?

What I'm trying to decide is if it is even worthwhile to crunch on my CPUs. If each 660 Ti can crank out 2000 PPD and my CPUs can only manage 70 or so, I'm apt to not even crunch on the processors for heat, noise, and power reasons.

Can you provide any guidance there?

Matthew Evans
Joined: 25 Apr 06
Posts: 8
Credit: 3313007
RAC: 0

RE: RE: I also see

Quote:
Quote:
I also see something mentioned about an optimized executable? I'm running the default BOINC installation and E@H application. Is there a 3rd party optimized application I can install that will provide better performance?

No, there is no 3rd party optimized app available for Einstein. The stock apps are updated fairly regularly, sometimes with the help of outside volunteers, and have improved a lot over the years.

Excellent - thank you for the reply, Holmis.

Matthew Evans
Joined: 25 Apr 06
Posts: 8
Credit: 3313007
RAC: 0

Oh, I just noticed something

Oh, I just noticed something based upon another post. (Lots of great info in this forum, wish I had found it before upgrading all my equipment a month ago.)

It looks like my 660 Ti tasks are taking about 400 seconds longer in the P67 machine than they did in the Z77 machine: ~1280 seconds (Z77) versus ~1680 seconds (P67).
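To put that gap in perspective, here is a quick Python calculation, assuming both timings are per-task elapsed times measured with the same number of tasks running per card:

```python
def slowdown(fast_s: float, slow_s: float) -> tuple[float, float]:
    """Return (extra time per task, lost tasks per day), both as fractions."""
    extra_time = slow_s / fast_s - 1       # how much longer each task takes
    lost_throughput = 1 - fast_s / slow_s  # share of daily output given up
    return extra_time, lost_throughput


extra, lost = slowdown(1280, 1680)
print(f"{extra:.1%} longer per task, {lost:.1%} fewer tasks per day")
```

With these numbers each task takes about 31% longer, which works out to roughly 24% fewer tasks per card per day.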

I need to buy a new case for the P67 anyway since it's a Micro-ATX case/MB. Heat is an issue unless I keep the GPU fans ramped up to 65%. Maybe I'll get a better case and stick a Z77 SLI board in there. If I stick with my i7-2600 in a Z77 board, will I gain back some of those lost 400 seconds? IIRC, PCI-E 3.0 support requires an Ivy Bridge processor?

Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 0

RE: Wow, thank you for the

Quote:


Wow, thank you for the excellent post. Lots of great information.

One final question about CPU crunching. I'm having a bit of trouble trying to figure out how to compare the performance of a CPU to CUDA. I have Googled and found lots of comparisons based upon BOINC points, but that doesn't seem to be relevant to E@H?

What I'm trying to decide is if it is even worthwhile to crunch on my CPUs. If each 660 Ti can crank out 2000 PPD and my CPUs can only manage 70 or so, I'm apt to not even crunch on the processors for heat, noise, and power reasons.

Can you provide any guidance there?

In general, GPUs can process BRP4 tasks much faster than even the highest-end processors. To give an example, my 3930K can process a single BRP4 task in about 19,500-20,500 seconds, which comes out to approximately 26 tasks per day if I have BOINC run BRP4 tasks on all 6 cores. My GTX 680 can process a single BRP4 task in 950-1050 seconds under Windows and about 700 seconds under Linux, which comes out to 82-123 tasks per day.
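The tasks-per-day figures follow from dividing the seconds in a day by the time per task and multiplying by the number of tasks running at once. Here is a minimal Python version of that arithmetic, using the timings from this post (20,000 s is taken as the midpoint of the 3930K range):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tasks_per_day(seconds_per_task: float, concurrent: int = 1) -> float:
    """Tasks finished per day by one device running `concurrent` tasks at once,
    each taking `seconds_per_task` of wall-clock time."""
    return concurrent * SECONDS_PER_DAY / seconds_per_task


cpu = tasks_per_day(20_000, concurrent=6)  # 3930K, one task per core
gpu_windows = tasks_per_day(1050)          # GTX 680, slow end under Windows
gpu_linux = tasks_per_day(700)             # GTX 680 under Linux
print(round(cpu), round(gpu_windows), round(gpu_linux))  # 26 82 123
```

The same function works for your own CPU versus 660 Ti decision: plug in your measured task times to see how small a share of total output the CPU would add.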

One benefit to the project of running BRP4 on your CPU is that it helps with cross-hardware validation of completed tasks, which helps make sure that the results of the GPU application match those of the CPU application.

From looking around at other hosts, a 660 Ti appears to be able to process 72 tasks a day under Windows, which is equivalent to 36,000 credits per day. Running Linux may allow even higher production per day. In addition, in many cases running two or three tasks at once per GPU results in higher overall production.
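Two quick checks on these numbers, as a Python sketch. The 500 credits per task is simply 36,000 / 72 from the figures here, not an official value. The concurrency check compares one task at a time against N tasks at once: running N together only pays off if each task finishes in less than N times the single-task time. The 2,000 s and 2,600 s timings in the example are hypothetical, not measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Implied by "72 tasks a day ... 36,000 credits per day"; not an official figure.
CREDITS_PER_TASK = 36_000 / 72  # 500.0


def credits_per_day(tasks_per_day: float) -> float:
    """Daily credit for a given task rate at CREDITS_PER_TASK."""
    return tasks_per_day * CREDITS_PER_TASK


def concurrency_helps(single_s: float, concurrent: int, each_s: float) -> bool:
    """True if `concurrent` simultaneous tasks, each taking `each_s` seconds,
    finish more tasks per day than one task at a time at `single_s` seconds."""
    together = concurrent * SECONDS_PER_DAY / each_s
    one_at_a_time = SECONDS_PER_DAY / single_s
    return together > one_at_a_time


print(credits_per_day(72))               # 36000.0
# Hypothetical timings: 1,200 s alone (about 72 tasks/day)
print(concurrency_helps(1200, 2, 2000))  # True: ~86 tasks/day vs 72
print(concurrency_helps(1200, 2, 2600))  # False: ~66 tasks/day vs 72
```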

Einstein also has two other searches that you can run on your CPU: the Gravitational Wave Search and the Gamma Ray Search. You could run these on your CPU while dedicating your GPUs to BRP4.
