CPU type AuthenticAMD AMD FX(tm)-9370 Eight-Core Processor [Family 21 Model 2 Stepping 0]
....
Work units with high CPU utilization (78-84% CPU per work unit) were removed from this list. Those had excessive run and CPU times as well.
I moved your post to this thread because you haven't specified GPU concurrency, and only 6 tasks reported per host is too few from which to calculate a meaningful average. It's fine to report run time only if you wish, but please report an average over a much larger number of tasks. If you don't report concurrency, the assumption is that you are only running one task at a time. 03:38:00 seems way too slow for an HD7970 running only a single task.
If you'd like to repost with a larger sample size and clearly stated concurrency, it would be much appreciated.
Thank you for your interest in posting performance data for the beta app.
Hi Zalster,
I appreciate it that you are willing to contribute results but when you say you are deliberately leaving out results because they had excessive times, that makes your report rather useless. For that reason, I've moved both posts here to the DISCUSSION thread.
The whole point is that we expect the beta app to have quite a range of crunch times for exactly the reasons that Heinz-Bernd explained so clearly to us.
It would be great if you repost your results with a sample size of around 30 and without leaving out any 'inconvenient' values. The whole point is to document the variability and give people a clear idea of the highs and lows over a decent sample size and not just some relatively meaningless idea of the 'better' values. Also don't chop and change your GPU utilization factor without clearly indicating this. The best thing would be to make a separate post of the results each time you change.
A final point. Reporting a CPU % utilization is probably not really useful. I think it's better to see run time / cpu time as a pair of values. That pretty much shows you on average how much of the CPU the task is consuming.
Thank you for your cooperation!
Cheers,
Gary.
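To illustrate the run time / CPU time pairing suggested above, here is a minimal Python sketch. The sample values are invented for illustration and are not results from this thread; the function name `summarize` is likewise just an assumption. It keeps the slow task in the sample and reports the range along with the average share of a CPU core consumed.

```python
# Hypothetical (run_time_s, cpu_time_s) pairs; NOT real results from this thread.
samples = [
    (5200.0, 830.0),
    (4900.0, 760.0),
    (9800.0, 7900.0),  # a slow, CPU-heavy task stays in the sample
    (5100.0, 800.0),
]


def summarize(pairs):
    """Return sample size, run-time range, and the average CPU share."""
    run = [r for r, _ in pairs]
    cpu = [c for _, c in pairs]
    return {
        "n": len(pairs),
        "run_min_s": min(run),
        "run_max_s": max(run),
        # Total CPU time over total run time: average fraction of a core used.
        "cpu_share": sum(cpu) / sum(run),
    }


summary = summarize(samples)
print(summary)
```

Reporting the pair (or the totals) lets readers derive the CPU share themselves, which is why a separate CPU % figure adds little.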
Another thing that would interest me:
What happened?
Thanks
HB
Sorry for not providing extensive details, just some basic results.
All machines are also running CPU tasks (WCG and vLHC).
-----------------
CPU: Intel Core i7 860
GPU: NVIDIA GeForce GTX 660 Ti (2048MB)
BRP5:
1WU/GPU (0.5 CPU + 1.0 GPU): 8500 s
2WU/GPU (0.5 CPU + 0.5 GPU): 13800 s
BRP6:
2WU/GPU (0.5 CPU + 0.5 GPU): 18100 s
-----------------
CPU: 2xIntel Xeon E5-2687W
GPU: NVIDIA Quadro K4000 (3072MB)
BRP5:
1WU/GPU (1.0 CPU + 1.0 GPU): 10100 s
BRP6:
1WU/GPU (1.0 CPU + 1.0 GPU): 13200 s
-----------------
CPU: Intel Core i7-3820
GPU: 2xAMD RADEON HD7950 (GPU 980 MHz, MemClk 1250 MHz)
BRP5:
1WU/GPU (1.0 CPU + 1.0 GPU): 4700 s
-----------------
CPU: AMD FX-8350
GPU: AMD RADEON R9 280X (GPU 1100 MHz, MemClk reduced to 1250 MHz)
BRP5:
1WU/GPU (1.0 CPU + 1.0 GPU): 5000 s
2WU/GPU (1.0 CPU + 0.5 GPU): 7400 s
* Times here are higher than for the i7-3820 host with slower GPUs; I think this can be explained by the lower-performing CPU.
* this host also produces lots of errors on BRP, even with reduced GPU MemClk. Milkyway@Home (DPFP) runs stable there.
-----------------
I should soon be able to provide stats from Quadro K4200 and Tesla K20c.
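When comparing 1WU/GPU against 2WU/GPU figures like those above, it can help to divide the elapsed time by the number of concurrent tasks. The sketch below does that for the two hosts that posted both BRP5 values; the helper name `effective_seconds_per_task` is an assumption made for illustration, and the numbers are simply copied from the lines above.

```python
# Elapsed seconds per task at each concurrency, copied from the BRP5 lines above.
posted = {
    "R9 280X": {1: 5000, 2: 7400},
    "GTX 660 Ti": {1: 8500, 2: 13800},
}


def effective_seconds_per_task(elapsed_s, concurrency):
    """Elapsed time divided by the number of tasks sharing the GPU."""
    return elapsed_s / concurrency


for gpu, times in posted.items():
    single = effective_seconds_per_task(times[1], 1)
    double = effective_seconds_per_task(times[2], 2)
    gain = single / double  # > 1 means running two at once gives more throughput
    print(f"{gpu}: {single:.0f} s vs {double:.0f} s per task, gain x{gain:.2f}")
```

These are single approximate values per configuration, so the resulting gain is only indicative.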
These are my results:
i7 GHz (also runs 4-5 WU)
Matrix R9 280X Platinum with Catalyst 13.12 (slightly faster than Omega)
2XWU/GPU: 4600s - 4700s
At the rate of 60 beams/day, it is going to take almost 2 years to complete this search...
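The arithmetic behind an estimate like this can be sketched as follows. The total number of beams is not stated in this thread, so the sketch only derives an upper bound implied by "60 beams/day for almost 2 years"; the faster rate used at the end is a made-up value to show how the duration scales, not a measured figure.

```python
def days_to_finish(total_beams, beams_per_day):
    """Days needed to process total_beams at a constant daily rate."""
    return total_beams / beams_per_day


rate = 60  # beams/day, as quoted in the post above

# Upper bound implied by "almost 2 years" at that rate (not an official figure).
implied_beams = rate * 2 * 365
print(implied_beams)

# Hypothetical: if throughput rose by 50%, the same workload would take
# two thirds of the time.
faster_rate = rate * 1.5
print(days_to_finish(implied_beams, faster_rate) / 365)
```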
That should drop nicely when the Beta app is proven worthy of general release.
Actually it was designed to last roughly 3 years given current throughput; see this message: http://einsteinathome.org/node/196604&nowrap=true#138207
The new app might shorten this period, other developments could prolong it (e.g. if we'll do other searches in parallel on GPUs in the future) or shorten it further (net increase in active members with GPUs, and GPUs getting faster over the next years, perhaps....perhaps not).
So it's hard to predict the actual duration, but it's a rather long lasting run, that's for sure. It was one of the motivations to optimize it again because this search setup will be around for some time.
HB
@HB
The computer is a stack of three 780s. Precision X is set to throttle down at 76C, as these are air cooled only. As a result, 2 of the 3 GPUs began to throttle down in order to maintain these temps.
I've already lost 3 GPUs on this rig to high temps from another project, so that is why I didn't pursue this setting.
@Gary. The results excluded were those work units discussed in the technical thread, where excessive CPU usage results in an extremely prolonged time to completion. How excessive? Normally each work unit uses between 14-18%. These individual work units would use up to 80% of a core. As you can see from the times of the 4 listed above, they are way out of proportion compared to the other work units. When paired with a normal work unit, they caused the normal unit to proceed much faster while they themselves slowed down even further. It is something very similar to what we have seen on another project. Since my results aren't useful, I guess I'll stop the test on these then.
Have a good weekend
Sorry, I didn't provide a clear report.
Actually, I meant that 6 WUs are running simultaneously on each mentioned GPU and the time for each is about the same, so I took an approximate value.
So 03:38:00 is the time for 6 WUs in parallel on the HD7970, and I guess it is not too slow at all.
Hope it helps.
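To make a figure like this comparable with single-task results, the elapsed time can be divided by the stated concurrency. A small sketch, assuming the time is given as hh:mm:ss; the helper name `hms_to_seconds` is just an illustration:

```python
def hms_to_seconds(hms):
    """Convert an 'hh:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


elapsed = hms_to_seconds("03:38:00")  # elapsed time quoted above
per_task = elapsed / 6                # 6 WUs running in parallel
print(elapsed, per_task)              # 13080 2180.0
```

This is why stating the concurrency matters: the same 03:38:00 reads very differently at 1 task per GPU than at 6.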
Hi Mumak,
I moved your post to this thread because you don't seem to have supplied any BRP6-beta results, just the standard BRP6 app. Since BRP6 (non-beta) is just likely to be the same as BRP5 plus approximately 33% for the larger task size, results for the standard app can be pretty much anticipated. If the information you have reported is actually for BRP6-beta rather than BRP6, it's not useful without at least the sample size and some indication of the variability of the results.
It would be much appreciated if you would care to repost in the RESULTS thread when you have enough BRP6-beta data to report. You need to have a significant sample size (eg 30+, which you should clearly state) so that the values you give are likely to represent the expected variability. Please be advised that providing a mean and standard deviation of a sufficiently large sample size is pretty much exactly what the Devs are hoping for.
Thank you.
Cheers,
Gary.
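For anyone unsure how to produce the mean and standard deviation requested above, here is a minimal sketch using Python's standard `statistics` module. The run times are invented placeholders, not real BRP6-beta results, and the `report` helper with its `minimum_n` parameter is an assumption based on the "30+" guideline in the post.

```python
import statistics


def report(run_times_s, minimum_n=30):
    """Summarize a sample: size, whether it meets the guideline, mean, stdev."""
    n = len(run_times_s)
    return {
        "n": n,
        "enough": n >= minimum_n,
        "mean_s": statistics.mean(run_times_s),
        # Sample standard deviation (n - 1 in the denominator).
        "stdev_s": statistics.stdev(run_times_s),
    }


# Hypothetical run times in seconds; replace with your own, unfiltered values.
run_times = [4600 + 10 * i for i in range(30)]

result = report(run_times)
print(result)
```

Stating `n` alongside the mean and standard deviation, and the concurrency used, gives readers what they need to judge the variability.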