Low memory clock on Maxwell2 cards (960/970/980, probably Titan X)

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7023304931
RAC: 1827262

Gary,

The current Einstein code seems to use a surprisingly large amount of CPU time on Perseus support tasks for Maxwell cards, at least compared to my GTX 660, and apparently compared to your AMD cards.

The improvement from 2X to 3X was small. I'm not sure if I tried 4X, but for some reason concluded it was unlikely to help.
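For reference, running tasks at 2X or 3X multiplicity is normally set with an app_config.xml file in the Einstein project directory. A sketch is below; the app name is an assumption on my part, so check the <name> entries in client_state.xml on your own host before using it.

```xml
<!-- BOINC app_config.xml sketch: 3 Perseus (BRP) GPU tasks at once,
     each reserving roughly half a CPU core for the support task.
     The app name below is an assumption; verify it in client_state.xml. -->
<app_config>
  <app>
    <name>einsteinbinary_BRP5</name>
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage>
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```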

I use Process Lasso to "herd" both the pure CPU BOINC tasks and as much as I can of other routine work on the PC to one "side" of the HT cores. The Perseus CPU support task gets a shot at either side of either core. The goal is minimum support task latency, and I am aware that I sacrifice some CPU task productivity by this configuration. Another variable is whether my wife is actually using the machine--primarily for Solitaire, Firefox, and MS Word.
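To illustrate the "one side of HT" idea (this is a sketch, not archae86's actual Process Lasso rules): on a typical 4-core/8-thread Intel chip, Windows enumerates hyper-thread siblings as pairs (0,1), (2,3), (4,5), (6,7), so herding work to one side means pinning it to the even-numbered logical CPUs. The helper below just computes the affinity bitmask such a rule would use.

```python
# Sketch: computing a CPU affinity mask that pins tasks to one logical
# core of each hyper-threaded pair. The sibling layout (even/odd pairs)
# is the usual Windows enumeration, but verify it on your own machine.

def one_side_of_ht(n_logical):
    """Logical CPU indices for one sibling of each HT pair."""
    return list(range(0, n_logical, 2))

def affinity_mask(cores):
    """Bitmask in the form used by SetProcessAffinityMask / Process Lasso."""
    mask = 0
    for c in cores:
        mask |= 1 << c
    return mask

cores = one_side_of_ht(8)         # [0, 2, 4, 6] on a 4C/8T CPU
print(hex(affinity_mask(cores)))  # 0x55
```

Process Lasso applies masks like this automatically via its rules; the point of computing it by hand is just to see which logical CPUs a rule covers.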

My take continues to be that the bigger Maxwells and the current Windows Perseus code don't like each other very much. One view is just that the Maxwell2 architecture shortchanges I/O and memory bandwidth, and that Perseus is hungry for both--so a design misfit. It could be that there is an optimization opportunity in there somewhere, but unless by happy chance a higher-level CUDA compile helps a lot, I doubt the coding resources would view Maxwell2 improvement as much of a priority.

Just to be clear, I personally would not suggest people with a primary Einstein interest purchase either the 970 or 980 as matters stand. I still think the plain vanilla 750 is a pretty tasty combination of low price and excellent power efficiency.

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 536517664
RAC: 186694

Regarding the CPU usage: this can be expected to differ between AMD and nVidia GPUs. I have no current numbers, though, to say whether this is normal.

And we know Einstein is somewhat limited by PCIe bandwidth. This means it's better to spread the load over multiple weaker GPUs than a single fast one (your GTX 970 vs. GTX 660 + GTX 750 comparison). The overall bandwidth from the CPU may be the same (one 16x link vs. two 8x links), but the load would be spread more evenly over time.

We also know Einstein is limited by GPU memory bandwidth: here the 2 smaller GPUs with 192-bit + 128-bit buses actually have an advantage over the GTX 970 with a single 256-bit bus, which should be mitigated somewhat by the lower memory clocks on the older cards, yielding an approximate tie for bandwidth.

MrS

Scanning for our furry friends since Jan 2002

disturber
Joined: 26 Oct 14
Posts: 30
Credit: 57155818
RAC: 0

Quote:

So, having found, at least to first order, the memclock and core clock ceilings for Perseus work on this rig at 4000 and 1507, I backed down a bit. A run at 3899/1486 gave me an invalid matching the core clock syndrome, so I backed the core clock down 20 more.

I have over 24 hours of running at core clock reported at 1466 (+100 from stock) and memclock of 3899. So far no validate errors. I won't claim this is safe or stable, but with over 20 completions, many of which have validated against quorum partners, it is currently not a high error rate condition.

With a modest 3634 in indicated credit/day from two CPU GW jobs added to an indicated 69800 from GPU Perseus jobs, this host currently indicates a credit rate of 73434/day. Though considerably helped by the overclock (especially of the memory clock), that still is not a very satisfactory return on the price of a GTX 970 card, nor in line with the performance such cards get in other applications. My two hosts which each have a GTX 660 plus a 750 handily beat it in credit, despite not being overclocked by me at all.

I settled for a GPU clock of 1472, which is an 80 MHz overclock, and a memclock of 3745 MHz. This gives me no invalid tasks on my computer. I had it set 20 MHz higher and ended up with 2 to 3 invalids per day. I could raise the memclock higher, but I am leaving it at that right now.
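For anyone wanting to script offsets like these on Windows, NVIDIA Inspector has a command-line mode. The sketch below is from memory and unverified, so treat the flag names, GPU index 0, and P-state 0 as assumptions; note that stock GTX 970 memclock is about 3505 MHz, so +240 lands near 3745.

```
:: Unverified sketch of NVIDIA Inspector's CLI (check your version's help output).
:: GPU 0, P-state P0: +80 MHz core, +240 MHz memory (assuming ~3505 MHz stock).
nvidiaInspector.exe -setBaseClockOffset:0,0,80 -setMemoryClockOffset:0,0,240
```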

I run 2 BRP4G jobs on the 970 and 3 FGRP4 jobs on my 3770K, giving me about 72,500 calculated credits per day. The 970 is giving about 62k per day, and I suppose this is lower than your score because I have my Intel GPU activated to run a second monitor, instead of using the 970 for that. This could cut into the PCIe bandwidth. I tried 3 BRP4G with no discernible improvement. I also cut the FGRP4 down to 2 tasks, which also was of no benefit.

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7023304931
RAC: 1827262

Since I made my December 26 post I have not touched my 970 settings, and believe it to still be running 1466 core clock, 3899 memclock. The host task list currently shows no invalids (meaning the ones generated during my tests are too old to show), and I've not spotted any invalids at this setting. RAC has crept up to 71,200, so considering non-infinite settling time, and my wife's usage for games, editing, and browsing, I think my computed base productivity was reasonably accurate.

As both core and mem are only slightly below levels at which I got invalid results, I think there is little room for me to push the card itself harder save perhaps by a voltage boost. There also might be a bit of improvement available from going to 4 simultaneous GPU jobs, or more fiddling with the CPU task affinities and priorities using Process Lasso.

I have the integrated Intel graphics disabled on this host. My early work with them was puzzling and frustrating, and I wound up doubting they added much if any net output to the host when enabled. I used Claggy's suggestion of extending the desktop as a means to get both graphics running BOINC with only one actually driving a monitor, but my wife and I both disliked the resulting "lost mouse pointer" issue.

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 536517664
RAC: 186694

Quote:
but my wife and I both disliked the resulting "lost mouse pointer" issue.


Side note: this was troubling me as well. I solved it by going to the screen control panel in Win 8.1 and dragging the fake display upwards with the mouse. That way the only point where the cursor switches is a single corner connection at the top right of my actual screen, which I trigger seldom enough. I'm happy with the RAC of ~9k from the HD4000, with the CPU running WCG tasks and GPU-Grid support for the GTX970.

MrS

Scanning for our furry friends since Jan 2002

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7023304931
RAC: 1827262

Quote:
Quote:
but my wife and I both disliked the resulting "lost mouse pointer" issue.

I solved it by going to the screen control panel in Win 8.1 and dragging the fake display upwards with the mouse. That way the only point where the cursor switches is a single corner connection at the top right of my actual screen, which I trigger seldom enough.


Thanks for the tip--if I have another try with the Intel graphics I'll want to try that (hope it works in Win7).

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Quote:
Quote:
Quote:
but my wife and I both disliked the resulting "lost mouse pointer" issue.

I solved it by going to the screen control panel in Win 8.1 and dragging the fake display upwards with the mouse. That way the only point where the cursor switches is a single corner connection at the top right of my actual screen, which I trigger seldom enough.

Thanks for the tip--if I have another try with the Intel graphics I'll want to try that (hope it works in Win7).


I can confirm that it works in Win7 Home Premium x64.
And thanks for this tip! =)

disturber
Joined: 26 Oct 14
Posts: 30
Credit: 57155818
RAC: 0

If you look at this computer, a standard locked i5, I have 2 tasks running on the i5-4590 CPU and HD4600 iGPU, and 2 on the Nvidia GTX 660 Ti. Nothing is overclocked and I still get several invalid iGPU tasks per day. I rolled back the Intel driver a while back because I was getting 90% invalids. What I saw on that machine was an increase in times for the 660 Ti tasks when I started to run jobs on the iGPU. I still gained more than I lost from the longer times on the 660 Ti.

http://einsteinathome.org/host/11686793

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 536517664
RAC: 186694

Yes, everyone should run driver 10.18.10.3621 on Intel GPUs. All newer ones are broken and cause many invalids. Even with the proper driver one gets invalids, because sometimes 2 hosts with the newer driver validate each other.

And yes, since the iGPU has no dedicated memory, other tasks may suffer. This can be reduced by
- running fewer CPU tasks
- running CPU tasks which don't need as much main memory bandwidth (e.g. WCG instead of Einstein)
- running more than 1 task on the discrete GPU
- increasing the priority of the other GPU's feeding process via Process Lasso
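To illustrate the last point, here is a hedged sketch of raising feeder-process priority programmatically, using the third-party psutil package instead of Process Lasso. The process-name hint is an assumption; check Task Manager for the actual Einstein app's executable name.

```python
# Sketch only: bumping the priority of GPU "feeder" processes, as Process
# Lasso would. Uses psutil (third-party). The name hint is an assumption --
# check Task Manager for the real Einstein executable name on your host.
import psutil

FEEDER_HINT = "einstein"  # illustrative substring of the GPU app's process name

def boost_feeders(hint=FEEDER_HINT):
    """Raise the priority of matching processes; return the names touched."""
    boosted = []
    for p in psutil.process_iter(["name"]):
        name = (p.info["name"] or "").lower()
        if hint in name:
            try:
                # ABOVE_NORMAL on Windows; a modest negative nice elsewhere
                p.nice(psutil.ABOVE_NORMAL_PRIORITY_CLASS if psutil.WINDOWS else -5)
                boosted.append(name)
            except psutil.Error:
                pass  # e.g. access denied -- skip rather than fail
    return boosted
```

Negative nice values on Unix and priority-class changes on other users' processes on Windows both need elevated rights, which is why failures are skipped silently here.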

MrS

Scanning for our furry friends since Jan 2002

Alex
Joined: 1 Mar 05
Posts: 451
Credit: 500121244
RAC: 214900

Quote:

Yes, everyone should run driver 10.18.10.3621 on Intel GPUs. All newer ones are broken and cause many invalids. Even with the proper driver one gets invalids, because sometimes 2 hosts with the newer driver validate each other.

MrS

Sorry, MrS, this statement needs an addendum.
Ten days ago I installed a new low-budget system, an Intel J1900-based PC ('Rocky1'). The latest available driver shows up in the BOINC log as:

09.01.2015 10:05:24 | | OpenCL: Intel GPU 0: Intel(R) HD Graphics (driver version 10.18.10.3408, device version OpenCL 1.2, 1496MB, 1496MB available, 6 GFLOPS peak)

Most of the ~185 crunched WUs validate; only 3 failed to validate, all against two HD4600 GPUs.
But I want to make clear: this holds for the J1900 (and most likely the J2900) only, not for the HD4400 / HD4600 units.
Cross-checking failed WUs by driver version alone may lead to misinterpretations.
