Times (Elapsed/CPU) for BRP6-Beta-cuda55 compared to BRP6-cuda32 - Results and Discussion Thread

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109967225799
RAC: 30557664

At the time of the transition from BRP5 -> BRP6 -> BRP6-Beta, back in March, I posted detailed information about a variety of hosts/GPUs, including this particular one: a host with a GTX650Ti GPU. The full details can be seen in the first host documented in the link, but here is a small excerpt showing the stats for 153 V1.52 tasks done at 2x. The machine has soldiered on unchanged at the same settings, so now that there is a change to cuda55, it's interesting to revisit that host and see what is happening. The following table has the old data plus two new entries for comparison.
[pre]
                     Elapsed Time Statistics           CPU Time Statistics
                 --------------------------------  ----------------------------  Sample
Search               Min     Mean      Max   S.D.     Min    Mean     Max  S.D.    Size  Notes / Comments
===============  =======  =======  =======  =====  ======  ======  ======  ====  ======  ================
BRP6-beta V1.52   17,250   18,657   20,814    319     673     753     969    39     153  From March 2015 (@ 2x), see above link.
BRP6 V1.52        16,902   18,678   21,403    508     613     770   1,107    59     300  June 2015 (@ 2x), last 300 results on cuda32.
BRP6 V1.54        14,031   15,215   17,925    378     694     736     918    37     127  July 2015 (@ 2x), after the change to cuda55.
[/pre]
The 300 results in June are very consistent with the 153 results from back in March when the 1.52 app had beta status. The host is obviously operating quite consistently. There are enough results with the cuda55 app to be reasonably confident of a significant performance improvement: the mean elapsed time dropped from 18,678 to 15,215, close to 19% faster on average. I haven't yet analysed a GTX650 host properly (still acquiring data) but there will be a (perhaps smaller?) improvement there as well. I also have a GTX550Ti that I'm following and, once again, I haven't yet properly analysed the data. At this stage, I don't think there will be much (if any) improvement with that one. So it seems that there is a worthwhile improvement to be had from Kepler and later series GPUs, but probably not from earlier series.
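For anyone who wants to check the arithmetic, here is a minimal Python sketch that works only from the summary figures in the table above (mean, S.D. and sample size for the June cuda32 and July cuda55 rows). The dictionary names and helper functions are made up for illustration. The Welch-style statistic is a rough guide only, since it assumes the results are independent, which is not strictly true for tasks run in pairs at 2x.

```python
import math

# Elapsed-time summary statistics copied from the table above (tasks run at 2x).
# The variable and function names are illustrative only.
cuda32 = {"mean": 18678, "sd": 508, "n": 300}  # BRP6 V1.52, June 2015
cuda55 = {"mean": 15215, "sd": 378, "n": 127}  # BRP6 V1.54, July 2015


def percent_reduction(old_mean, new_mean):
    """Percentage drop in mean elapsed time going from old to new."""
    return 100.0 * (old_mean - new_mean) / old_mean


def welch_t(a, b):
    """Welch t statistic for the difference of two means from summary stats.

    Assumes independent samples, which is only approximately true here.
    """
    standard_error = math.sqrt(a["sd"] ** 2 / a["n"] + b["sd"] ** 2 / b["n"])
    return (a["mean"] - b["mean"]) / standard_error


reduction = percent_reduction(cuda32["mean"], cuda55["mean"])
# The same change expressed as extra tasks completed per unit time.
throughput_gain = 100.0 * (cuda32["mean"] / cuda55["mean"] - 1.0)
t_stat = welch_t(cuda32, cuda55)

print(f"Mean elapsed time reduction: {reduction:.1f}%")
print(f"Throughput gain:             {throughput_gain:.1f}%")
print(f"Welch t statistic:           {t_stat:.1f}")
```

Note that a reduction of this size in elapsed time corresponds to a somewhat larger percentage gain in tasks completed per day, because the two figures use different baselines.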

The above two screenshots show the data in the form of an XY scatter plot and as the elapsed times for sequences of contiguous results. As was found previously, a lot of the results lie close to the mean with a few (sometimes strong) outliers. A very high time is often immediately adjacent to a low time, indicating that one task in a pair was profiting and the other suffering from the relationship. This happened only sporadically, so perhaps it is related in some way to how each task in a pair was started, i.e. whether the two start too close to the same time.

The next two plots show the shapes of the distributions and they certainly look like bell curves. These plots also give a good indication of how far away the outliers are from the mean in each case. If you disregard the outliers, what remains is a pretty tight bunch. I quite like the look of these histograms.

At the end of the day, I would like any host/GPU to operate efficiently on the science runs I choose. Maybe I'd get better GPU efficiency by eliminating CPU tasks altogether but that's not a price I'm willing to pay. I want to do FGRP4 and I will want to do advanced LIGO when the data becomes available. It will be very interesting to see if there is a GPU app when advanced LIGO data starts flowing.

Cheers,
Gary.

floyd
Joined: 12 Sep 11
Posts: 133
Credit: 186610495
RAC: 0

Not exactly the topic of this thread, but I'd like to mention that my GTX 750 now does two BRP6 tasks faster than my R7 260. Judging by the numbers, the R7 should be equivalent or superior in every aspect.
