GPU Upgrade Shows No Improvement in Work Unit Completion

Ace Casino
Joined: 25 Feb 05
Posts: 36
Credit: 1259252797
RAC: 903128

Rancher,

It looks like we have (almost) the same computer... but very different RACs.

Your Computer: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz [Family 6 Model 94 Stepping 3]

My Computer: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz [Family 6 Model 94 Stepping 3]

Yours: Measured floating point speed 4219.67 million ops/sec
Measured integer speed 14584.55 million ops/sec

Mine: Measured floating point speed 4748.19 million ops/sec
Measured integer speed 18410.21 million ops/sec

You have 2 GTX 970s; I have 1 GTX 970.

Both of us using: Windows 10

I'm running 3 WUs at once on the GTX 970, and they take 9,400 seconds, or 2.6 hours.
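
To put that on a per-task footing, here is a rough Python sketch of the arithmetic. It assumes the three concurrent tasks start and finish together and that the card crunches around the clock; the function names are only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def effective_seconds_per_task(elapsed_seconds, concurrent_tasks):
    """Average GPU time per task when several tasks share one GPU."""
    return elapsed_seconds / concurrent_tasks


def tasks_per_day(elapsed_seconds, concurrent_tasks):
    """Tasks completed per day, assuming the GPU is never idle."""
    per_task = effective_seconds_per_task(elapsed_seconds, concurrent_tasks)
    return SECONDS_PER_DAY / per_task


# Figures from the post: 3 WUs at once, 9,400 seconds elapsed for each.
per_task = effective_seconds_per_task(9400, 3)
daily = tasks_per_day(9400, 3)
print(f"{per_task:.0f} s per task, about {daily:.1f} tasks per day")
```

Dividing elapsed time by the number of concurrent tasks is what makes times comparable between hosts that run different numbers of WUs at once.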

I'm only running 1 CPU core right now, with a RAC of 130,000; when I was running 8 cores my RAC was 140,000.

Everything I'm running is stock...no overclocking. You can see a picture of my rig in my profile.

I'm only showing you this because maybe you have something else going on with your computer?

Good luck

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7024274931
RAC: 1806762

Right up front, Ace Casino, you are running CUDA55 BRP6, while Florida Rancher's recent change to enable that has not yet reached output.

Second, he does not in fact have two 970s. If a computer has more than one GPU, BOINC reports the count but only one type--the one with the highest CUDA capability level (even if it is slower). His other GPU is a much slower model.

cliff
Joined: 15 Feb 12
Posts: 176
Credit: 283452444
RAC: 0

Hi Phil,

I 'think' what he's on about is altering the core speed. You can't alter the core speed in P2, but you can in P0.
I also 'think' it is done by adjusting the 'offset' to give a higher speed.

Never tried it; the only thing I o/c is the memory, and that only to restore the memory speed to what it was before Nvidia decided to cripple it via its drivers and the use of P2.

Regards,

Cliff,

Been there, Done that, Still no damm T Shirt.

Keith Myers
Joined: 11 Feb 11
Posts: 4704
Credit: 17547836507
RAC: 6425458

You just have to pass the desired offset parameters to the NVI links created by the tool in NVI. You put the offset parameters into the "Target" line on the Properties page of the desktop link icon. If you have multiple cards, you have to enumerate each card. For example, I have two GTX 970s, and these are my offset settings for a mild 40 kHz overclock of the P0 state for each card and a 100 kHz overclock of the P2 state for the memory. That gives me an effective 7200 MHz memory clock speed. Einstein responds best to higher memory speeds, which enable faster task completion. Core clock boosts really have minimal effect at Einstein but do help other projects.

NVI_0 GTX970 card target "nvidiaInspector.exe -setBaseClockOffset:0,0,40 -setMemoryClockOffset:0,0,100 -setMemoryClock:0,2,3605"

NVI_1 GTX970 card target "nvidiaInspector.exe -setBaseClockOffset:1,0,40 -setMemoryClockOffset:1,0,100 -setMemoryClock:1,2,3605"

The first number after the colon is the card enumeration number. The second number after the colon in the -setBaseClockOffset parameter is the power state. The third number after the colon in -setMemoryClockOffset is the desired offset in kHz. The second number after the colon in the -setMemoryClock parameter is the power state. The third number in the -setMemoryClock parameter is the desired target memory speed.
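
With more than one card, the Target strings differ only in the card number, so they can be generated rather than typed by hand. A small Python sketch, assuming the same three switches and the same values as the two links above; the helper name `nvi_target` is made up for illustration and is not part of NVIDIA Inspector.

```python
def nvi_target(card, core_offset, mem_offset, p2_mem_clock,
               exe="nvidiaInspector.exe"):
    """Build the Target string for one card, mirroring the links above.

    card          -- card enumeration number (first value after each colon)
    core_offset   -- value passed to -setBaseClockOffset for power state 0
    mem_offset    -- value passed to -setMemoryClockOffset for power state 0
    p2_mem_clock  -- target memory clock passed to -setMemoryClock for state 2
    """
    return (
        f"{exe} "
        f"-setBaseClockOffset:{card},0,{core_offset} "
        f"-setMemoryClockOffset:{card},0,{mem_offset} "
        f"-setMemoryClock:{card},2,{p2_mem_clock}"
    )


# Two cards, same settings as in the post.
targets = [nvi_target(card, 40, 100, 3605) for card in range(2)]
for target in targets:
    print(target)
```

Keeping the card number as the only thing that varies makes it harder to mix up the enumeration when a third card is added.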

 

Florida Rancher
Joined: 4 Oct 13
Posts: 31
Credit: 23998436
RAC: 0

Thank you Ace. Actually, I'm only running one 970 card, and the second is a Dell GeForce GTX 745. The 745 is a poor quality card.

Like you, I once had a RAC of about 145,000, but things have slipped significantly. My CPU tasks are taking longer. Running 2 cores, my 970 takes 145 minutes to complete a WU. Also running 2 cores, my 745 takes significantly longer at 690 minutes.

Now my RAC is about 108,000 per day. Neither card is overclocked; both are running their default configurations.

Regards,
Phil

Florida Rancher
Joined: 4 Oct 13
Posts: 31
Credit: 23998436
RAC: 0

Thanks Cliff. That makes sense to me. I set both my cards to their default configurations until I get a better understanding of overclocking.

I had so much work piled up that I've since set my "Store at least" to 0.5 days. Consequently, I'm not seeing any CUDA55 tasks yet.

Regards,
Phil

Florida Rancher
Joined: 4 Oct 13
Posts: 31
Credit: 23998436
RAC: 0

Hey Keith. Thank you for this valuable information. So what are your current clock and GPU clock settings in MHz? And why are you using kHz instead of MHz for the memory clock offset?

Regards,
Phil

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7024274931
RAC: 1806762

Quote:
I'm not seeing any CUDA55 tasks yet.


You currently have 91 BRP6 tasks "in progress", which means they have been downloaded to your host but not yet returned as complete (nor aborted, expired...).

Of those 91 tasks, 89 are CUDA55, so your wait is almost over.

Of your most recently returned work, four are already CUDA55 units. As you still have two un-returned CUDA32 units, most likely these first CUDA55 units did not run "pure", that is to say, with the other unit on the GPU also being CUDA55 for the entire time they were running. I mention this to advise that you pay very little attention to the elapsed times until you are getting a stream of pure CUDA55 work.

One of the oddities of running multiple work units on GPUs "simultaneously" is that they really are not running simultaneously at all, but are in fact swapping on and off at a very rapid rate. This can and does lead to "unfair" sharing when dissimilar work is running. Rather than go into more detail, I'll just say your first four 55 returns took less time than your typical 32 work, but that you should wait a few hours, or better yet until tomorrow, to see what the real results will be.

Keith Myers
Joined: 11 Feb 11
Posts: 4704
Credit: 17547836507
RAC: 6425458

I goofed on the kHz descriptor. My clock rate for GPU-1 is 1390 MHz and for GPU-2 it is 1370 MHz. The memory clock rate is 3605 MHz for both cards. Here is the current GPU Info from SIV64:

[Image: GPU Info]

I also goofed on the descriptor of the third sentence in the callout. It should have been setBaseClockOffset.

When you make the Desktop links, it is a simple matter of editing the Target line information to adjust just how much of a bump in frequency you want for the core clock and memory. It only gets complicated if you have multiple cards in the system and you have to pay attention to the card enumeration.

You can also call the links from a batch file if you want to automate it. I just click the links before I start up BOINC Manager, since I get in and out of the Manager often during the day and shut down the systems during peak electrical rates.

A warning: you have to have exited BOINC Manager and NOT be actively crunching for the card adjustments to be put in place. Nothing will happen if you are crunching when you click an NVI link.
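
For anyone scripting this instead of clicking links, a hedged Python sketch of the same order of operations follows. It assumes you already know whether BOINC is crunching and pass that in as `boinc_is_running`; how to detect that is left out. The `apply_offsets` name and the injectable `runner` argument are illustrative choices, not anything provided by NVI or BOINC.

```python
import subprocess


def apply_offsets(targets, boinc_is_running, runner=subprocess.run):
    """Run each NVI command line in order, but only if BOINC is not crunching.

    targets          -- command lines such as the Target strings of the links
    boinc_is_running -- caller-supplied flag (detection is out of scope here)
    runner           -- callable used to launch each command; defaults to
                        subprocess.run so a fake can be passed for a dry run
    Returns the list of command lines that were actually launched.
    """
    if boinc_is_running:
        # The adjustments would not take effect while crunching, so skip them.
        return []
    applied = []
    for target in targets:
        # The switches contain no spaces, so a plain split yields the argv list.
        runner(target.split(), check=True)
        applied.append(target)
    return applied


# Dry run with a fake runner that only records what would be launched.
launched = []
result = apply_offsets(
    ["nvidiaInspector.exe -setBaseClockOffset:0,0,40"],
    boinc_is_running=False,
    runner=lambda args, check=True: launched.append(args),
)
print(launched)
```

Passing the runner in keeps the sketch testable on a machine without the tool installed.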

 

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7024274931
RAC: 1806762

Florida Rancher,

The reason your last two CUDA32 units did not report until well after a few CUDA55 units is that they ran on your slower GPU together. They finished and were reported recently, and you have no CUDA32 tasks left.

Therefore they were not contaminating your most recent 970 results with impure mixed running. So it appears that at your current operating conditions CUDA55 on your 970 uses about 6,350 elapsed seconds, compared to about 8,700 for recent CUDA32, a rather nice improvement--unless you changed something else at a confounding time.
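
For a sense of scale, the two elapsed times quoted above work out as follows. This is a rough Python sketch; it assumes both figures were measured under the same conditions (same number of concurrent tasks, same clocks), which is exactly the caveat already noted. The function names are illustrative.

```python
def speedup(old_seconds, new_seconds):
    """How many times faster the new elapsed time is than the old one."""
    return old_seconds / new_seconds


def time_saved_percent(old_seconds, new_seconds):
    """Reduction in elapsed time, as a percentage of the old time."""
    return 100 * (old_seconds - new_seconds) / old_seconds


# Approximate elapsed seconds on the 970 quoted in the post.
cuda32_seconds = 8700
cuda55_seconds = 6350

ratio = speedup(cuda32_seconds, cuda55_seconds)
saved = time_saved_percent(cuda32_seconds, cuda55_seconds)
print(f"speedup x{ratio:.2f}, elapsed time down {saved:.0f}%")
```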

Improvement on your 745 is yet to be seen. However, as to architectural generation, I believe it to be a first-generation Maxwell, as are my 750 and 750 Ti cards, and I expect it to see a very substantial improvement as well. I don't see any good reason for you to remove it from the box, unless you very greatly value a reduction in the swings in estimated completion time that happen depending on whether the most recent results have been from the 970 or the 745. It does not use much power, and with the current CUDA55 BRP6 application I think it slows down your 970 very little (this was not always true with other applications, and may again not be true for an application of interest to you in the future).

I myself run a box with one (overclocked) 970 and one (slightly overclocked) 750 Ti, and believe based on separate testing that they get along together nicely in running CUDA55 BRP6 for Einstein, which is all they do these days, except for providing my wife's desktop, Word doc display, and Solitaire gameplay.
