Anyone running a GTX 980?

Dave
Joined: 7 Dec 12
Posts: 14
Credit: 7,276,378
RAC: 0
Topic 197736

Hi,

I just got my brand new GTX 980 and I'm wondering if anyone else already has one.
I'm not quite sure what the best settings are for the 980 with an i7 4790K supporting it.

Currently I have 4 simultaneous BRP4G-cuda32-nv301 tasks running, but I'm not sure if that's appropriate.

The 980 seems to be a great performer per watt!

archae86
Joined: 6 Dec 05
Posts: 3,152
Credit: 7,131,344,931
RAC: 524,614

Personally, I think a good default starting point for most capable graphics cards doing Einstein processing is to run 2 simultaneous GPU jobs and to set the CPU core usage limit in BOINC to one less than the number of physical cores present.

If one instruments carefully, one can then compare that configuration, which is unlikely to be much less productive than the optimal one, with test configurations that adjust the number of simultaneous GPU jobs, the number of "free cores", and, if one is more adventurous, CPU affinity, task CPU priority, and task I/O priority.

Your 4x may be more productive than 2x, but it may also be less productive. None of us can tell you, though many of us are eager to learn what you observe on your machine.

I believe the major Einstein work types have highly consistent work content per task in the cases where Einstein awards the same credit. But, especially in heavily loaded configurations, individual task elapsed times can vary widely. In those cases one must average measurements over an adequate number of tasks per configuration to get a reliable comparison.
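
A minimal sketch of that averaging step, assuming the elapsed times have been copied by hand from the task list into a Python list. The function name `summarize` and the sample values are placeholders for illustration, not measurements from any host. The standard error is one rough way to judge whether enough tasks have been averaged.

```python
import math
import statistics

def summarize(elapsed_seconds):
    """Return (mean, standard error of the mean) for a list of elapsed times."""
    n = len(elapsed_seconds)
    mean = statistics.mean(elapsed_seconds)
    # The sample standard deviation needs at least two measurements.
    stdev = statistics.stdev(elapsed_seconds) if n > 1 else 0.0
    return mean, stdev / math.sqrt(n)

# Placeholder values only, not real results.
config_a = [100.0, 110.0, 90.0, 105.0, 95.0]
mean_a, sem_a = summarize(config_a)
print(f"mean {mean_a:.1f} s, standard error {sem_a:.1f} s")
```

If the means of two configurations differ by much less than their standard errors, more tasks are probably needed before drawing a conclusion.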

archae86
Joined: 6 Dec 05
Posts: 3,152
Credit: 7,131,344,931
RAC: 524,614

If you have an interest in comparing the performance of different configurations (such as adjusting the number of simultaneous GPU jobs, or the fraction of CPU cores allocated for use by BOINC), reducing sources of task-to-task variability is helpful.

One possibility along those lines is to adjust your Einstein preferences to accept only one type of GPU work and only one type of CPU work on this host (and not to suspend other projects--if any--during these tests). Given the current work mix, I specifically suggest that the Perseus arm survey GPU work and the FGRP4 CPU work be the only types during comparison trials.

Dave
Joined: 7 Dec 12
Posts: 14
Credit: 7,276,378
RAC: 0

Thanks for your quick reply.
I made some changes in the Einstein preferences window and in BOINC:

I've set the GPU utilization factor for BRP and GW to 0.5, and in BOINC I changed multicore usage to 75% (I treat my 4790K as a quad core, so this should free up one core). I also set Einstein to run only Perseus Arm Survey and Gamma-ray pulsar search #4.
It's running two BRP5 and six FGRP4 tasks simultaneously atm. Should I leave it at that?
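
As a rough check of those settings, here is a small sketch. It assumes that a GPU utilization factor of 0.5 means two tasks share one GPU, and that BOINC applies the CPU percentage to logical processors (the 4790K presents 8 with Hyper-Threading enabled) and rounds down. Both function names are made up for illustration.

```python
import math

def gpu_tasks_per_card(utilization_factor):
    # Each task claims `utilization_factor` of a GPU, so 0.5 gives 2 tasks.
    # The small epsilon guards against floating-point results like 2.9999.
    return math.floor(1.0 / utilization_factor + 1e-9)

def usable_cpus(logical_cpus, percent):
    # Assumption: the limit applies to logical CPUs and is rounded down,
    # with a minimum of one.
    return max(1, (logical_cpus * percent) // 100)

print(gpu_tasks_per_card(0.5))
print(usable_cpus(8, 75))
```

Under those assumptions, 75% of 8 logical processors is 6, which would account for six FGRP4 tasks running rather than three.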

My plan is to let BOINC run for some time, note the time it takes to complete a task, and compare this to the time it took with my old settings. Is that how I can compare the performance?

So far I've picked some values for BRP4G-cuda32-nv301. It took around 4124s (about 69 min) to complete one task.

All I want is to use my machine as efficiently as possible :)

Phil
Joined: 8 Jun 14
Posts: 579
Credit: 228,493,502
RAC: 0

Dave,

You have a good start on finding the "sweet spot" for your machine, the setup that will produce the most throughput. I personally don't run CPU tasks on machines with GPUs in them, but lots of people do.

Something to keep in mind: your RAM is shared among all tasks, as are the different levels of cache on the CPU itself. Any CPU tasks running will take part of the memory and bandwidth for themselves. At a certain point, which is different for each machine, they will use enough bandwidth to start affecting your GPU tasks.

For example, a friend of mine is running a fairly nice AMD GPU. He is running 3 GPU tasks concurrently. The CPU is a fairly fast one, also overclocked and water cooled.

Although he currently chooses to not run CPU tasks, he is able to run up to 4 of 8 threads for CPU tasks without affecting GPU task times.

So basically, 4 threads for CPU tasks, 3 threads to support 3 GPU tasks, and 1 thread left over for computer overhead and housekeeping.

You are going about this the correct way. Make a change, let it run for a bit, then check times. Rinse and repeat until you find what works for you and your computer. Just keep in mind the next computer you fire up for crunching will probably end up a bit different on the setup.

Phil

archae86
Joined: 6 Dec 05
Posts: 3,152
Credit: 7,131,344,931
RAC: 524,614

Dave, thanks for sharing your results.
Monitoring the "elapsed times" is the quick way to assess the effect of trial configuration changes. Monitoring RAC works in the long run, but it can take weeks to get an accuracy of comparison you can often get within hours by elapsed time comparison.

For the record, before your results disappear, it appears that after you changed from 4x to 2x GPU jobs with 75% of cores allowed, and before you ran out of your supply of previously downloaded Perseus work, you got quite reproducible Perseus elapsed times near 11150 seconds average (eyeball by me--not a formal calculation). That provides a good basis of comparison for the Perseus productivity of other platforms.

Dave
Joined: 7 Dec 12
Posts: 14
Credit: 7,276,378
RAC: 0

Ok, here are my first results. I picked 5 values for BRP4G.
Before I made the changes (that means 4 tasks simultaneously, no CPU tasks) I got:

BRP4G-cuda32-nv301 - 3,853.98s
BRP4G-cuda32-nv301 - 4,384.10s
BRP5-cuda32-nv301 - 13,701.12s
BRP4G-cuda32-nv301 - 4,393.49s
BRP4G-cuda32-nv301 - 3,869.23s

AVERAGE: 6040s

After the changes (2 simultaneous GPU tasks, 6 CPU tasks) it takes significantly more time to complete a task:

BRP5-cuda32-nv301 - 11,049.50s
BRP5-cuda32-nv301 - 11,201.54s
BRP5-cuda32-nv301 - 11,149.78s
BRP5-cuda32-nv301 - 11,091.71s
BRP5-cuda32-nv301 - 11,201.54s

AVERAGE: 11128s

I will let it run another 24h or so with 2 tasks running simultaneously and no CPU tasks at all this time.

Holmis
Joined: 4 Jan 05
Posts: 1,118
Credit: 1,055,935,564
RAC: 0

Be careful to separate BRP4G and BRP5, as they are different searches that take different amounts of time to process.

So before the change:
BRP4G averaged 4,125.2s and the one BRP5 took 13,701.12s.
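
A minimal sketch of that separation, using the five "before" values posted above. It assumes the search name is the part of the application name before the first hyphen; the variable names are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# (application name, elapsed seconds) as posted earlier in the thread
runs = [
    ("BRP4G-cuda32-nv301", 3853.98),
    ("BRP4G-cuda32-nv301", 4384.10),
    ("BRP5-cuda32-nv301", 13701.12),
    ("BRP4G-cuda32-nv301", 4393.49),
    ("BRP4G-cuda32-nv301", 3869.23),
]

by_search = defaultdict(list)
for app, seconds in runs:
    # Group by search name, e.g. "BRP4G" or "BRP5".
    by_search[app.split("-")[0]].append(seconds)

# Average each search on its own instead of mixing them.
averages = {search: mean(times) for search, times in by_search.items()}
for search, avg in averages.items():
    print(f"{search}: {avg:,.2f} s over {len(by_search[search])} task(s)")
```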

Dave
Joined: 7 Dec 12
Posts: 14
Credit: 7,276,378
RAC: 0

Quote:

Be careful to separate BRP4G and BRP5, as they are different searches that take different amounts of time to process.

So before the change:
BRP4G averaged 4,125.2s and the one BRP5 took 13,701.12s.

Ah dammit, totally missed that. Gotta redo my testing. Stay tuned ;)

Dave
Joined: 7 Dec 12
Posts: 14
Credit: 7,276,378
RAC: 0

Got some new numbers:

BRP5-cuda32-nv301 - 2 GPU tasks, no CPU tasks
30 Sep 2014 8:24:09 8,941.96
30 Sep 2014 0:38:25 9,006.95
30 Sep 2014 3:07:54 9,006.67
30 Sep 2014 10:53:28 9,129.37
30 Sep 2014 5:53:59 8,942.53

AVG 9004s (two rounds of 2 tasks, to compare with 4 at once: 9004s * 2 = 18008s)

BRP5-cuda32-nv301 - 4 GPU tasks, no CPU tasks
2 Oct 2014 11:39:08 17,760.79
2 Oct 2014 5:17:08 17,682.45
1 Oct 2014 20:18:36 17,749.65
2 Oct 2014 2:46:54 17,733.19
1 Oct 2014 1:24:26 17,316.42

AVG 17648s

FGRP4-SSE2 - 4 GPU tasks, 50% CPU (4 tasks)
3 Oct 2014 22:06:32 25,527.31
3 Oct 2014 14:07:35 26,452.37
3 Oct 2014 7:41:55 26,845.52
3 Oct 2014 5:55:48 26,591.45
3 Oct 2014 0:37:52 25,410.29

AVG 26165s

BRP5-cuda32-nv301 - 4 GPU tasks, 50% CPU (4 tasks)
3 Oct 2014 19:03:46 19,231.83
3 Oct 2014 14:06:32 19,584.11
3 Oct 2014 14:07:35 19,595.12
3 Oct 2014 11:15:23 19,557.84
3 Oct 2014 16:40:37 19,237.85

AVG 19440s
Power draw (system, worst case): 240W

Trying 5 GPU tasks at the moment, but it seems like 4 is the sweet spot. Running CPU tasks does affect GPU performance (GPU tasks take about half an hour longer to complete), but on the other hand I get 4 CPU tasks done in around 26165s. Not sure if I should run GPU only.
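
To put the runs above on one scale, here is a rough sketch that converts each average into tasks per day. It assumes every slot stays busy around the clock and that the five-task averages are representative; `tasks_per_day` is a made-up helper name.

```python
SECONDS_PER_DAY = 86_400

def tasks_per_day(concurrent_tasks, avg_elapsed_seconds):
    # N tasks finish every avg_elapsed_seconds, so scale that to one day.
    return concurrent_tasks * SECONDS_PER_DAY / avg_elapsed_seconds

# (label, concurrent BRP5 tasks, average elapsed seconds) from the runs above
brp5_configs = [
    ("2 GPU tasks, no CPU", 2, 9004),
    ("4 GPU tasks, no CPU", 4, 17648),
    ("4 GPU tasks, 4 CPU tasks", 4, 19440),
]
for label, n, avg in brp5_configs:
    print(f"{label}: {tasks_per_day(n, avg):.2f} BRP5 tasks/day")

# CPU side of the last configuration: 4 FGRP4 tasks at 26165 s each
print(f"FGRP4: {tasks_per_day(4, 26165):.2f} tasks/day")
```

On these figures, adding the CPU tasks costs about 1.8 BRP5 tasks per day and gains about 13.2 FGRP4 tasks per day. Which is worth more depends on the credit per task, which this sketch does not model.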

Dave
Joined: 7 Dec 12
Posts: 14
Credit: 7,276,378
RAC: 0

BRP5-cuda32-nv301 - 5 GPU tasks, no CPU
5 Oct 2014 0:14:20 21,438.22
5 Oct 2014 0:14:20 21,437.92
4 Oct 2014 21:19:52 21,661.69
4 Oct 2014 21:19:52 21,665.37
4 Oct 2014 21:19:52 21,653.86

AVG 21570s
Power draw (system, worst case): 185W

Seems like 5 tasks run just fine. CPU tasks, however, seem to be quite inefficient.
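
A rough sketch of the throughput and energy side for the two configurations where I noted power draw. It treats the worst-case system draw as constant all day, so the energy figures are upper bounds; the helper names are made up for illustration.

```python
SECONDS_PER_DAY = 86_400

def tasks_per_day(concurrent_tasks, avg_elapsed_seconds):
    # N tasks finish every avg_elapsed_seconds, so scale that to one day.
    return concurrent_tasks * SECONDS_PER_DAY / avg_elapsed_seconds

def kwh_per_day(watts):
    # Assumes the worst-case draw is sustained for 24 hours.
    return watts * 24 / 1000

# (label, concurrent BRP5 tasks, average elapsed seconds, system watts)
measured = [
    ("5 GPU tasks, no CPU", 5, 21570, 185),
    ("4 GPU tasks, 4 CPU tasks", 4, 19440, 240),
]
for label, n, avg, watts in measured:
    print(f"{label}: {tasks_per_day(n, avg):.2f} BRP5 tasks/day, "
          f"up to {kwh_per_day(watts):.2f} kWh/day")
```

The second configuration also produces FGRP4 work with the extra power, so the two rows are not a like-for-like efficiency comparison.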
