Hi,
I just got my brand-new GTX 980 and I'm wondering whether anyone else already has one.
I'm not quite sure what the best settings are for the 980 with an i7 4790K supporting it.
Currently I have 4 simultaneous BRP4G-cuda32-nv301 tasks running, but I'm not sure that's appropriate.
The 980 seems to be a great performer per watt!
Personally, I think a good default starting point for most capable graphics cards doing Einstein processing is to run 2 simultaneous GPU jobs and to set BOINC's CPU core usage limit to one less than the number of physical cores present.
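As a rough sketch, the Python below turns that starting point into the two numbers one actually enters for an i7 4790K (4 physical cores, 8 threads). It assumes the Einstein "GPU utilization factor" is 1 divided by the number of jobs per GPU (0.5 for two jobs, as used later in this thread) and that BOINC applies its CPU percentage to the logical CPUs it detects and rounds down; `starting_point` is a made-up name for illustration.

```python
def starting_point(physical_cores: int, logical_cpus: int, gpu_jobs: int = 2) -> tuple[float, int]:
    """Return (GPU utilization factor, BOINC CPU percentage) for a first trial.

    Assumption: the utilization factor is 1 / jobs per GPU, and BOINC applies
    the percentage to the logical CPUs it detects, rounding down.
    """
    # Leave one physical core free, but never ask for fewer than one CPU.
    target_cpus = max(1, physical_cores - 1)
    gpu_factor = 1.0 / gpu_jobs
    # Integer ceiling division, so that BOINC rounding down still yields target_cpus.
    cpu_percent = (100 * target_cpus + logical_cpus - 1) // logical_cpus
    return gpu_factor, cpu_percent


# i7 4790K: 4 physical cores, 8 logical CPUs with Hyper-Threading.
print(starting_point(4, 8))  # (0.5, 38)
```

The percentage is rounded up deliberately: 37% of 8 CPUs would round down to only 2.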
If one instruments carefully, one can compare that configuration (which is likely not much less productive than the optimal one) with test configurations that adjust the number of simultaneous GPU jobs, the number of "free cores", and, if one is more adventurous, CPU affinity, task CPU priority, and task I/O priority.
Your 4x may be more productive than 2x, but it may also be less productive. None of us can tell you, though many of us are eager to learn your observations on your machine.
I believe the major Einstein work types have highly consistent work content per task in the cases where Einstein awards the same credit. But, especially in heavily loaded configurations, individual task elapsed times can vary widely. In those cases one must average the measurements of an adequate number of tasks per configuration to get a reliable comparison.
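One way to judge whether an adequate number of tasks has been averaged is to report the standard error of the mean alongside the average. A minimal Python sketch, using the five 2x BRP5 elapsed times Dave posts later in this thread; the "about two combined standard errors" threshold in `clearly_different` is a common rule of thumb, not something stated here, and both function names are hypothetical.

```python
import statistics


def summarize(elapsed_s: list[float]) -> tuple[float, float]:
    """Return (mean, standard error of the mean) of task elapsed times in seconds."""
    if len(elapsed_s) < 2:
        raise ValueError("need at least two tasks to estimate the spread")
    mean = statistics.mean(elapsed_s)
    sem = statistics.stdev(elapsed_s) / len(elapsed_s) ** 0.5
    return mean, sem


def clearly_different(a: tuple[float, float], b: tuple[float, float], k: float = 2.0) -> bool:
    """Rough check: do two (mean, sem) summaries differ by more than k combined SEs?"""
    (m1, s1), (m2, s2) = a, b
    return abs(m1 - m2) > k * (s1 ** 2 + s2 ** 2) ** 0.5


# BRP5 elapsed times, 2 GPU tasks, no CPU tasks (from later in this thread).
two_x = [8941.96, 9006.95, 9006.67, 9129.37, 8942.53]
mean, sem = summarize(two_x)
print(f"{mean:.1f} s +/- {sem:.1f} s")
```

Means from runs with different numbers of simultaneous tasks should be converted to a per-task or per-day basis before being compared this way.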
If you have an interest in comparing the performance of different configurations (such as adjusting the number of simultaneous GPU jobs, or the fraction of CPU cores allocated for use by BOINC), reducing sources of task-to-task variability is helpful.
One possibility along those lines is to adjust your Einstein preferences to accept only one type of GPU work and only one type of CPU work on this host (and not to suspend other projects, if any, during these tests). Given the current work mix, I specifically suggest making the Perseus Arm Survey GPU work and the FGRP4 CPU work the only types during comparison trials.
Thanks for your quick reply.
I made some changes in the Einstein preferences window and in BOINC:
I've set the GPU utilization factor of BRP and GW to 0.5. In BOINC I changed the multicore usage to 75% (I treat my 4790K as a quad core, so this should free up one core). I also set Einstein to run only the Perseus Arm Survey and Gamma-ray pulsar search #4.
It's currently running two BRP5 and six FGRP4 tasks simultaneously. Should I leave it at that?
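Six FGRP4 tasks at 75% is consistent with BOINC applying the percentage to all 8 logical CPUs of the Hyper-Threaded 4790K rather than to its 4 physical cores: 75% of 8 is 6, not the 3 intended. A small Python sketch of that arithmetic, assuming BOINC rounds down and never uses fewer than one CPU; `boinc_cpus` is a hypothetical helper, not a BOINC function.

```python
def boinc_cpus(logical_cpus: int, cpu_percent: float) -> int:
    """CPUs BOINC would use for CPU tasks, assuming it rounds the percentage down."""
    # Never drop below one CPU, even for very small percentages.
    return max(1, int(logical_cpus * cpu_percent / 100))


print(boinc_cpus(8, 75))  # 6: matches the six FGRP4 tasks observed
print(boinc_cpus(4, 75))  # 3: what "one core free on a quad core" would give
```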
My plan is to let BOINC run for some time, note how long it takes to complete a task, and compare this with the time it took under my old settings. Is that how I should compare performance?
So far I've collected some values for BRP4G-cuda32-nv301: it took around 4124s (about 69 min) to complete one task.
All I want is to use my machine as efficiently as possible :)
Dave,
You have a good start on finding the "sweet spot" for your machine, the setup that will produce the most throughput. I personally don't run CPU tasks on machines with GPUs in them, but lots of people do.
Something to keep in mind: your RAM is shared among all tasks, as are the different levels of cache on the CPU itself. Any CPU tasks running will take part of the memory bandwidth and cache for themselves. At a certain point, which is different for each machine, enough bandwidth will be used to start reducing the bandwidth available to your GPU tasks.
For example, a friend of mine runs a fairly nice AMD GPU with 3 GPU tasks concurrently. His CPU is a fairly fast one, also overclocked and water-cooled.
Although he currently chooses not to run CPU tasks, he can use up to 4 of his 8 threads for CPU tasks without affecting GPU task times.
So basically, 4 threads for CPU tasks, 3 threads to support 3 GPU tasks, and 1 thread left over for computer overhead and housekeeping.
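Phil's budget can be written as a one-line rule. The Python sketch below assumes, as in his example, one thread to support each GPU task plus one for housekeeping; the real headroom for CPU tasks is whatever leaves GPU task times unchanged, so treat the result as a first guess to test rather than a limit. `cpu_task_threads` is a made-up name.

```python
def cpu_task_threads(total_threads: int, gpu_tasks: int, housekeeping: int = 1) -> int:
    """Threads left for CPU tasks, assuming one support thread per GPU task
    plus `housekeeping` threads for the operating system."""
    return max(0, total_threads - gpu_tasks - housekeeping)


print(cpu_task_threads(8, 3))  # 4, matching the example above
```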
You are going about this the correct way. Make a change, let it run for a bit, then check times. Rinse and repeat until you find what works for you and your computer. Just keep in mind that the next computer you fire up for crunching will probably need a somewhat different setup.
Phil
I thought I was wrong once, but I was mistaken.
Dave, thanks for sharing your results.
Monitoring the "elapsed times" is the quick way to assess the effect of trial configuration changes. Monitoring RAC (Recent Average Credit) works in the long run, but it can take weeks to reach the accuracy of comparison that elapsed-time comparison can often give you within hours.
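Elapsed times only compare fairly once they are converted to throughput, because a task that takes twice as long while twice as many run side by side produces the same output. A minimal Python sketch, assuming all concurrent tasks are of the same type; the example inputs are the means of the five BRP5 tasks Dave lists later for 2 and 4 simultaneous GPU tasks with no CPU tasks.

```python
SECONDS_PER_DAY = 86_400


def tasks_per_day(mean_elapsed_s: float, concurrent: int) -> float:
    """Tasks completed per day when `concurrent` tasks of one type run side by side."""
    return concurrent * SECONDS_PER_DAY / mean_elapsed_s


print(f"2x: {tasks_per_day(9005.5, 2):.2f} BRP5/day")   # 19.19
print(f"4x: {tasks_per_day(17648.5, 4):.2f} BRP5/day")  # 19.58
```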
For the record, before your results disappear: it appears that after you changed from 4x to 2x GPU jobs with 75% of cores allowed, and before you ran out of your supply of previously downloaded Perseus work, you got quite reproducible Perseus elapsed times averaging near 11150 seconds (my eyeball estimate, not a formal calculation). That provides a good basis for comparing the Perseus productivity of other platforms.
Ok, here are my first results. I picked 5 values for BRP4G.
Before I made the changes (4 tasks simultaneously, no CPU tasks) I got:
BRP4G-cuda32-nv301 - 3,853.98s
BRP4G-cuda32-nv301 - 4,384.10s
BRP5-cuda32-nv301 - 13,701.12s
BRP4G-cuda32-nv301 - 4,393.49s
BRP4G-cuda32-nv301 - 3,869.23s
AVERAGE: 6040s
After the changes (2 simultaneous GPU tasks, 6 CPU tasks) it takes significantly more time to complete a task:
BRP5-cuda32-nv301 - 11,049.50s
BRP5-cuda32-nv301 - 11,201.54s
BRP5-cuda32-nv301 - 11,149.78s
BRP5-cuda32-nv301 - 11,091.71s
BRP5-cuda32-nv301 - 11,201.54s
AVERAGE: 11139s
I will let it run another 24h or so with 2 tasks running simultaneously and no CPU tasks at all this time.
Be careful to separate BRP4G and BRP5, as they are different searches that take different amounts of time to process.
So before the change:
BRP4G averaged 4,125.2s and the one BRP5 took 13,701.12s.
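The same split can be done mechanically by grouping results on the application name before averaging. A short Python sketch using Dave's "before" results; `mean_by_app` is an illustrative helper, and in practice the (app, elapsed) pairs would be copied from the host's task list.

```python
from collections import defaultdict
from statistics import mean

# Dave's "before" results (4 GPU tasks at once, no CPU tasks).
results = [
    ("BRP4G-cuda32-nv301", 3853.98),
    ("BRP4G-cuda32-nv301", 4384.10),
    ("BRP5-cuda32-nv301", 13701.12),
    ("BRP4G-cuda32-nv301", 4393.49),
    ("BRP4G-cuda32-nv301", 3869.23),
]


def mean_by_app(rows):
    """Average elapsed time separately for each application (search type)."""
    groups = defaultdict(list)
    for app, elapsed in rows:
        groups[app].append(elapsed)
    return {app: mean(times) for app, times in groups.items()}


for app, avg in mean_by_app(results).items():
    print(f"{app}: {avg:.1f} s")
# BRP4G-cuda32-nv301: 4125.2 s
# BRP5-cuda32-nv301: 13701.1 s
```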
Ah, damn it, totally missed that. Gotta redo my testing. Stay tuned ;)
Got some new numbers:
BRP5-cuda32-nv301 - 2 GPU tasks, no CPU tasks
30 Sep 2014 8:24:09 8,941.96
30 Sep 2014 0:38:25 9,006.95
30 Sep 2014 3:07:54 9,006.67
30 Sep 2014 10:53:28 9,129.37
30 Sep 2014 5:53:59 8,942.53
AVG 9005.5s * 2 = 18011s (time to finish four tasks, two at a time)
BRP5-cuda32-nv301 - 4 GPU tasks, no CPU tasks
2 Oct 2014 11:39:08 17,760.79
2 Oct 2014 5:17:08 17,682.45
1 Oct 2014 20:18:36 17,749.65
2 Oct 2014 2:46:54 17,733.19
1 Oct 2014 1:24:26 17,316.42
AVG 17648s
FGRP4-SSE2 - 4 GPU tasks, 50% CPU (4 tasks)
3 Oct 2014 22:06:32 25,527.31
3 Oct 2014 14:07:35 26,452.37
3 Oct 2014 7:41:55 26,845.52
3 Oct 2014 5:55:48 26,591.45
3 Oct 2014 0:37:52 25,410.29
AVG 26165s
BRP5-cuda32-nv301 - 4 GPU tasks, 50% CPU (4 tasks)
3 Oct 2014 19:03:46 19,231.83
3 Oct 2014 14:06:32 19,584.11
3 Oct 2014 14:07:35 19,595.12
3 Oct 2014 11:15:23 19,557.84
3 Oct 2014 16:40:37 19,237.85
AVG 19441s
Power draw (system, worst case): 240W
Trying 5 GPU tasks at the moment, but it seems like 4 is the sweet spot. Running CPU tasks does affect performance (GPU tasks take about half an hour longer to complete), but on the other hand I get 4 CPU tasks done in around 26165s. Not sure whether I should run GPU only.
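Whether the CPU tasks are worth it depends on how much GPU output they cost. A sketch of that trade-off in Python, using the means of the five tasks listed above for each 4-GPU-task configuration and assuming those five are representative; the "value" of a task (for example its credit) is deliberately left as the unknown, so the result is a break-even ratio rather than a verdict.

```python
SECONDS_PER_DAY = 86_400


def tasks_per_day(mean_elapsed_s: float, concurrent: int) -> float:
    """Tasks of one type completed per day at a given concurrency."""
    return concurrent * SECONDS_PER_DAY / mean_elapsed_s


# Means of the five tasks listed for each configuration in this thread.
brp5_gpu_only = tasks_per_day(17648.5, 4)     # 4 GPU tasks, no CPU tasks
brp5_with_cpu = tasks_per_day(19441.35, 4)    # 4 GPU tasks + 4 FGRP4 CPU tasks
fgrp4_with_cpu = tasks_per_day(26165.388, 4)  # the 4 FGRP4 CPU tasks themselves

brp5_lost = brp5_gpu_only - brp5_with_cpu
# CPU work pays off only if one FGRP4 task is worth more than this fraction
# of a BRP5 task, by whatever yardstick you use (e.g. credit).
break_even = brp5_lost / fgrp4_with_cpu

print(f"BRP5 lost per day:      {brp5_lost:.2f}")
print(f"FGRP4 gained per day:   {fgrp4_with_cpu:.2f}")
print(f"break-even FGRP4 value: {break_even:.3f} BRP5 tasks")
```

This ignores the extra power the CPU tasks draw, which matters if efficiency per watt is the goal.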
BRP5-cuda32-nv301 - 5 GPU tasks, no CPU
5 Oct 2014 0:14:20 21,438.22
5 Oct 2014 0:14:20 21,437.92
4 Oct 2014 21:19:52 21,661.69
4 Oct 2014 21:19:52 21,665.37
4 Oct 2014 21:19:52 21,653.86
AVG 21571s
Power draw (system, worst case): 185W
It seems like 5 tasks run just fine. CPU tasks, however, seem to be quite inefficient.
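To put "per watt" into numbers, the energy per task can be estimated from the system draw and the per-task share of elapsed time. A minimal Python sketch for the 5x GPU-only run, assuming the whole system draw is charged to the GPU tasks; since 185 W is a worst-case reading, this overstates the true figure, and the 240 W run cannot be split between BRP5 and FGRP4 without measuring each separately. `wh_per_task` is a hypothetical helper.

```python
def wh_per_task(mean_elapsed_s: float, concurrent: int, system_watts: float) -> float:
    """Energy per completed task in watt-hours, charging the whole system draw to these tasks."""
    seconds_per_task = mean_elapsed_s / concurrent
    return system_watts * seconds_per_task / 3600


# 5 BRP5 GPU tasks, no CPU tasks, 185 W worst-case system draw (from this thread).
print(f"{wh_per_task(21571.412, 5, 185):.0f} Wh per BRP5 task")  # 222
```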