Times (Elapsed / CPU) for BRP5/6/6-Beta on various CPU/GPU combos - DISCUSSION Thread

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,522
Credit: 692,291,622
RAC: 2,833

In case this isn't clear already: your hosts will ONLY get work crunched with a Beta app if you actively opt in to the Beta test. You can check your settings here:

http://einstein.phys.uwm.edu/prefs.php?subset=project&cols=1

Scroll down to "Run beta/test application versions?". If you want to participate in a beta test, make sure "Yes" is selected for all the "venues" of the PCs you want to have in the beta test.

HB

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,869
Credit: 115,074,496,151
RAC: 31,758,566

Quote:

These are my results:
i7 GHz (also runs 4-5 WU)
Matrix R9 280X Platinum with Catalyst 13.12 (slightly faster than Omega)

2XWU/GPU: 4600s - 4700s


Hi Sale,

I moved your post to this discussion thread because you haven't given any indication of the sample size on which you base your estimate. Without that, and without some sort of indication of the full range of values, the numbers you give are not really sufficient to be of use.

I really don't have time to go trawling through other people's results, but I did decide to have a quick look at yours. Sure, there are quite a few around the values you quote but I did see a number around 5200, and some below 4000 and even one at 2672. Why don't you take, say, 60 consecutive results and feed them into the online stats calculator to come up with the actual mean and standard deviation? You have a very nice host there and I'm sure the information would be appreciated by the Devs. Then you could try progressively higher numbers of concurrent GPU tasks (3, 4, 5, ...), letting each new value run for say 60 tasks. It would be very interesting to see exactly where the 'sweet spot' was.
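For anyone who would rather script this than use an online calculator, here is a minimal Python sketch of the calculation suggested above. The function name `summarize` and the sample values are illustrative assumptions, not real task data. The per-task figure simply divides the mean elapsed time by the number of concurrent GPU tasks, which is one plausible way to compare different concurrency settings.

```python
from statistics import mean, stdev


def summarize(elapsed_seconds, concurrent_tasks):
    """Summarize elapsed times for one host at one concurrency setting.

    elapsed_seconds: elapsed times of consecutive results, in seconds.
    concurrent_tasks: number of tasks sharing the GPU (2 for "2X").
    """
    avg = mean(elapsed_seconds)
    return {
        "n": len(elapsed_seconds),
        "mean": avg,
        "stdev": stdev(elapsed_seconds),  # sample standard deviation (n - 1)
        "min": min(elapsed_seconds),
        "max": max(elapsed_seconds),
        # Effective GPU time per task when several tasks run side by side.
        "per_task": avg / concurrent_tasks,
    }


# Placeholder values for illustration only, not measured results.
sample = [4600, 4700, 5200, 3900, 2672, 4650]
print(summarize(sample, concurrent_tasks=2))
```

Running the same function on, say, 60 consecutive results at each concurrency setting and comparing `per_task` would give a rough indication of where the 'sweet spot' lies, subject to the sample-size caveats discussed in this thread.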

Cheers,
Gary.

archae86
Joined: 6 Dec 05
Posts: 3,153
Credit: 7,162,074,931
RAC: 591,575

Quote:
letting each new value run for say 60 tasks

Quote:
sample size

Quote:
The results excluded were those work units discussed in the technical thread, where excessive CPU usage results in an extremely prolonged time to completion.

This variability question, at least to the degree presented by the 1.47/1.50 beta applications on BRP6 Parkes PMPS XT data, is new to us at Einstein in recent times, and I think we are a bit at sea in dealing with it usefully. One extremely serious problem is that even with seemingly appreciable sample sizes (such as the 30 or 60 suggested at various places in this thread), we have very little assurance that the various samples will have seen an equivalent distribution of this behavior. Host comparisons with non-equivalent input WU distributions can be seriously in error.

To add a bit of data and facilitate discussion, I'll show a plot of Elapsed time vs. CPU time for early 1.47 work returned by Stoll7, my sole host which was able to work with the 1.47 version of the application.

While these two GPUs live on the same host, and were being assigned work from the same stock of downloaded work, it appears that the GTX 750 Ti may have been more "unlucky" in getting a distribution of work more heavily slowed by the extra CPU effort. Alternatively, it may be that the application treats Maxwell-architecture GPUs of the 750 sort more unfavorably on a wider set of WUs than it does the 660. I have labelled each point as having been run by a single GPU (the one for which it was logged in the BoincTasks history screen from which I copied the data), but probably a few of them ran partly on one and partly on the other. As I was running 2X, the distribution is probably broadened by non-equivalent execution partners sharing the GPU at run-time and, to a much lesser degree, by non-equivalent loading of the host system by the other GPU.

Just to complicate matters, my other two hosts, which can now run this stuff since version 1.50 does not error out in 3 seconds, have in the last 24 hours run an almost non-stop succession of "fortunate" WUs. If summarized by mean and stdev, they would give a much rosier picture of their beta 1.50 performance than I think is likely representative across the true WU distribution.

In any case, characterizing the performance or other behavior just with a mean and standard deviation of the elapsed time likely won't converge with samples of any size that are not drawn from a sufficiently long time period of work distribution to smooth out any locally systematic variability in the "fortunate" behavior of work provided.
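One rough way to see whether a sample suffers from this kind of locally systematic variability is to split the results, in the order they were returned, into consecutive blocks and compare the block means. The sketch below is only an illustration; the function names, block size and values are hypothetical, and a large spread between blocks merely hints that the overall mean has not settled.

```python
from statistics import mean


def block_means(values, block_size):
    """Mean of each consecutive full block; a trailing partial block is dropped."""
    last_start = len(values) - block_size
    return [
        mean(values[start:start + block_size])
        for start in range(0, last_start + 1, block_size)
    ]


def block_spread(values, block_size):
    """Difference between the largest and smallest block mean."""
    means = block_means(values, block_size)
    return max(means) - min(means)


# Placeholder elapsed times in return order: a "fortunate" run, then slower work.
elapsed = [4000, 4050, 3980, 4020, 5200, 5300, 5150, 5250]
print(block_means(elapsed, block_size=4))
print(block_spread(elapsed, block_size=4))
```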

I had an old boss years ago who said we should never point out a problem without proposing a solution. I'll throw out one possible approach just as a discussion starter.

It appears from my graphs that for a given GPU/multiplicity/host configuration the prime distribution of CPU time vs. elapsed time is not badly approximated by two numbers--an intercept approximating the hypothetical elapsed time for an (admittedly non-existent) WU requiring zero CPU time, plus a slope describing on average how much the elapsed time increases for each increment of CPU time above zero.

These two parameters may usefully be estimated by examining distribution plots such as the one I posted here, if and only if the WUs span a reasonable range of CPU requirement. Possibly, if we later come to understand the long-term distribution of WU characteristics, we might be able to convert these two parameters to a reasonable estimate of productivity.
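As a discussion aid, the two-number description can be estimated with an ordinary least-squares line through the (CPU time, elapsed time) points. This is a minimal sketch of one possible fitting method, not necessarily how the graphs in this thread were produced; the function name `fit_line` and the sample numbers are assumptions for illustration.

```python
def fit_line(cpu_seconds, elapsed_seconds):
    """Least-squares fit of elapsed = intercept + slope * cpu.

    Returns (intercept, slope). The intercept approximates the elapsed time
    of a hypothetical WU needing zero CPU time; the slope is the average
    increase in elapsed time per extra second of CPU time.
    """
    n = len(cpu_seconds)
    if n != len(elapsed_seconds) or n < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x = sum(cpu_seconds) / n
    mean_y = sum(elapsed_seconds) / n
    sxx = sum((x - mean_x) ** 2 for x in cpu_seconds)
    if sxx == 0:
        # All WUs used the same CPU time, so the slope cannot be determined.
        raise ValueError("CPU times must span a range of values")
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(cpu_seconds, elapsed_seconds)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Placeholder points for illustration only, not measured results.
cpu = [300, 600, 1500, 2400]
elapsed = [4100, 4300, 5000, 5600]
print(fit_line(cpu, elapsed))
```

The explicit check on the spread of CPU times mirrors the condition above: with only "fortunate" WUs in the sample, the slope is poorly determined or undefined.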

I'll make this type of graph for my other two hosts (with three total GPUs) when they have processed more units--and, I hope, seen a greater variety of WU good fortune. Possibly I should include on the graphs data for pre-Beta (1.39) Parkes work, which would facilitate the original interest in documenting performance improvement.

Stef
Joined: 8 Mar 05
Posts: 206
Credit: 110,568,193
RAC: 0

I don't seem to have as much variation. Could this be OS-dependent?
These are twenty v1.50 workunits, i5-4690K and 750 Ti running two at a time.

archae86
Joined: 6 Dec 05
Posts: 3,153
Credit: 7,162,074,931
RAC: 591,575

stef wrote:
I don't seem to have as much variation.


I'd say instead that your sample illustrates the perils of not getting enough work from the less fortunate population to characterize the behavior. Just two of your twenty units are not crowded down at the most fortunate end, and that 2/20 ratio likely has nothing to do with OS and is just luck of the draw on the work you have run so far.

Perhaps time will tell.

I'll hazard a guess that the single outlier low elapsed time base population WU probably ran paired with one of the two slow units for a significant fraction of run time.

Stef
Joined: 8 Mar 05
Posts: 206
Credit: 110,568,193
RAC: 0

Ok, but mine look a bit more "grouped": there are just slow ones and fast ones.
I'll make a new graph in a few days when the number of samples is higher.

Sasa Jovicic
Joined: 17 Feb 09
Posts: 75
Credit: 82,250,661
RAC: 22,487

Hi Gary!

No problem.

Thank you!

archae86
Joined: 6 Dec 05
Posts: 3,153
Credit: 7,162,074,931
RAC: 591,575

Similar graphs to my first for my other two hosts. Arguably these are premature, as the amount of Beta data is small and the captured range of WU behavior is poor for characterizing overall behavior--but I have included a representative sample of pre-Beta (1.39 application) behavior in the same representation for comparison.

Stoll8, the first new host, has a single GTX 970 running 3X and appreciably overclocked (most of the improvement was from the memory clock, which had more headroom than the GPU clock). The overclocking did not change between pre-beta and beta. I consider the beta population of work highly biased toward fortunate WUs, so any performance improvement estimate based on means for the data on this graph is likely substantially overstated.

Stoll6, the last host, has a pair of GPUs: the same GTX 660 model as Stoll5, but a lower-grade 750 base model rather than the 2 GB Ti on Stoll7.

Stef
Joined: 8 Mar 05
Posts: 206
Credit: 110,568,193
RAC: 0

Quote:
I'll hazard a guess that the single outlier low elapsed time base population WU probably ran paired with one of the two slow units for a significant fraction of run time.


BTW, you were right. They started at the same time.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,869
Credit: 115,074,496,151
RAC: 31,758,566

Quote:
Quote:
letting each new value run for say 60 tasks

Quote:
sample size

Quote:
The results excluded were those work units discussed in the technical thread, where excessive CPU usage results in an extremely prolonged time to completion.

This variability question, at least to the degree presented by the 1.47/1.50 beta applications on BRP6 Parkes PMPS XT data, is new to us at Einstein in recent times, and I think we are a bit at sea in dealing with it usefully. One extremely serious problem is that even with seemingly appreciable sample sizes (such as the 30 or 60 suggested at various places in this thread), we have very little assurance that the various samples will have seen an equivalent distribution of this behavior. Host comparisons with non-equivalent input WU distributions can be seriously in error.

....

Peter,

Thank you so much for contributing here. I know that more than 30 or 60 data points are really needed but I was rather afraid that if I told people that a sample size of several hundred would probably be necessary, then ..... :-).

When I think about it a bit more, I should have been smart enough to trust people's desire to provide good data. I should have said that this needs a bit of effort and then spelt out the details fairly explicitly on just how much work might be involved.

Depending on what tools you have, your skill level with those tools and how much time you have available, there are bound to be many different ways to deal with the problem. I'm an absolute dummy when it comes to statistical presentation and how to get the nice graphs that show the problem at a glance - a picture really is worth 1000's of words - so I'll just lay out exactly what I've done to get my own results. I suggest that if you, the general reader - not Peter - think you're a dummy like me, this is a fairly painless way to contribute your information in a useful way.

1. Find a suitable stats calculator. I just googled "online statistics calculator" or something like that and pretty much started using the first hit I tried. There's a nice text box with a big red calculate button underneath it and the simple instructions of something like "add your numbers here, separated by commas ',' " - so I did :-).

2. Get some available host data. What I found useful was to open the 'tasks list' view of my computer of interest on the website and select just the Parkes PMPS XT tasks. I then clicked through a page at a time until I found the start of returned results for the BRP6-Beta app - whatever version - just as long as it said "Beta".

3. Input the numbers - ALL the numbers - no cherry picking!! I opened the stats calculator in a suitably narrowed new window so I could position it over the EAH data page in such a way as to clearly see the two time data columns to the side of the stats calculator. I opened a second instance of the calculator in a second tab of the narrowed window. I typed all the 'Elapsed' times into one tab and hit "calculate". I changed tabs and typed all the 'CPU' times into the second instance and hit "calculate". You get 'sample size', 'mean', 'standard deviation', and other values but you also get to note that the 'sample size' should be the same in both tabs - a good check that you didn't miss anything.

4. Input more numbers. It's quite easy to then click on 'next page' to get the next 20 data values and to then feed them into the appropriate stats calculator instances and hit the 'calculate' button again. You get to see what sort of a change this makes to the mean and standard deviation - usually quite big changes - giving you a clear 'heads-up' of just how variable the data really is.

5. Rinse and repeat step 4 until you run out of numbers. The more data you have, the better your numbers will be and the more useful your contribution will be. If you are really serious about getting a true picture, expect to have to process a lot more than is currently available at this early stage. You could open two files (Elapsed.txt and CPU.txt, say) with a basic text editor and (each day, say) store the new numbers there as a comma-separated list. When you have finally saved (over many days perhaps) a big enough sample, copy and paste your full string of numbers into the stats calculator. It should only take a few minutes per day per host to build up a very useful set of data with many hundreds of data points.
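If you are comfortable running a small script, the steps above can also be sketched in Python instead of an online calculator. This is only an illustration under stated assumptions: the file names Elapsed.txt and CPU.txt follow the suggestion in step 5, while the function names and the page size of 20 are hypothetical choices. It checks that both lists have the same sample size, as in step 3, and shows how the mean and standard deviation shift as each page of values is added, as in step 4.

```python
from pathlib import Path
from statistics import mean, stdev


def parse_times(text):
    """Turn a comma-separated (possibly multi-line) list into floats."""
    tokens = text.replace("\n", ",").split(",")
    return [float(tok) for tok in tokens if tok.strip()]


def describe(values):
    """Sample size, mean and sample standard deviation (None if n < 2)."""
    return {
        "n": len(values),
        "mean": mean(values),
        "stdev": stdev(values) if len(values) > 1 else None,
    }


def summarize_host(elapsed_text, cpu_text):
    """Describe both columns, insisting the sample sizes match."""
    elapsed = parse_times(elapsed_text)
    cpu = parse_times(cpu_text)
    if len(elapsed) != len(cpu):
        raise ValueError("Elapsed and CPU lists differ in length; a value was missed")
    return describe(elapsed), describe(cpu)


def running_summary(values, page_size=20):
    """Statistics after each additional page of values, to show the drift."""
    ends = range(page_size, len(values) + page_size, page_size)
    return [describe(values[:end]) for end in ends]


def summarize_files(elapsed_path="Elapsed.txt", cpu_path="CPU.txt"):
    """Read the two text files suggested in step 5 and summarize them."""
    return summarize_host(Path(elapsed_path).read_text(), Path(cpu_path).read_text())
```

Calling `summarize_files()` in the folder holding the two files returns one summary for the elapsed times and one for the CPU times. Remember to note the GPU model and the number of concurrent tasks alongside the numbers, since that is the information the Devs cannot see for themselves.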

If you are prepared to do something like this, you will be doing a great service to the Devs who would just love to have this kind of information. They can't get it from their own online database because they have no way of knowing exactly what GPU concurrency was being used for each host and how this might have changed over time, or the number and type of GPUs per host for that matter.

Cheers,
Gary.
