[S5R3/R4] How to check Performance when Testing new Apps

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,206
Credit: 43,319,822,437
RAC: 44,569,638
Topic 193468

One unfortunate characteristic of crunch times for tasks in the current S5R3 run is that they show a considerable cyclic variation of perhaps 20 - 30% or so. Because of this the casual observer may easily make the mistake of dismissing a new app as being worthless since the first few tasks from the new app may appear to be no better or perhaps even worse than the last tasks crunched with the old app, depending on what part of the cycle they come from.

Because of the skillful work done by people like archae86 and Richard Haselgrove, the cyclic nature is now thoroughly documented, with the necessary information and equations to allow interested participants to calculate accurately the true performance difference when a new app is deployed.

This is brilliant for people who take the trouble to collect the necessary "before" and "after" performance data and who set up something like an Excel spreadsheet (with appropriate cell formulae) to apply the cyclic crunch time equation to the processing of that data. But what about us "lesser mortals" :) who just want to crunch a couple of "new app" tasks and come up with a quick (but reliable) ball park figure for the true performance change?

The key is to take archae86's cycle period formula which first appeared (I think) in this post immediately after the idea of a log/log plot was suggested by Mike Hewson. I'm sure archae86 would have discovered the relationship himself, even if not prompted, and he certainly deserves the credit for all the hard work of data collection and analysis. A classic example of good scientific principles at work.

The key formula is that the cycle period (P) for the variation in crunch time for data of frequency (F) is:-

P = 0.000206 * F^2

The frequency to 2 decimal places is contained within the task filename, eg the frequency for the task h1_0737.50_S5R2__70_S5R3a_1 is 737.50Hz. If you just happened to be crunching this task then you would expect to see cyclic crunch times whose calculated period would be:-

P = 0.000206 * 737.5^2 = 112.04

Each task has a sequence number, eg 70 in the above example. Tasks which have a sequence number that is an integer multiple of the period (including zero) have the maximum crunch time, eg 0, 112, 224, 336, ... in the above example. Let's call these the peak tasks. Tasks whose sequence number falls half way between any two adjacent peaks will have the minimum possible crunch time, ie the fastest speed of crunching. In the above example such tasks would be numbered 56, 168, 280, 392, ... and we could call them trough tasks.
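For anyone who would rather let a script do the arithmetic, here is a minimal Python sketch of the calculation above. The function names and the regular expression are my own assumptions, based only on the single task name shown in this post; other task name layouts may need a different pattern.

```python
import re

ARCHAE86_CONSTANT = 0.000206  # empirical constant from the cycle period formula


def parse_task_name(name):
    """Return (frequency_hz, sequence_number) from a task name.

    Assumes the layout of the example h1_0737.50_S5R2__70_S5R3a_1.
    """
    match = re.match(r"^[a-z]\d_(\d+\.\d+)_S5R2__(\d+)_S5R3", name)
    if match is None:
        raise ValueError(f"unrecognised task name: {name}")
    return float(match.group(1)), int(match.group(2))


def cycle_period(frequency_hz):
    """Cycle period P = 0.000206 * F^2, in units of sequence numbers."""
    return ARCHAE86_CONSTANT * frequency_hz ** 2


def peaks_and_troughs(frequency_hz, count=4):
    """First few peak and trough sequence numbers, rounded to integers."""
    period = cycle_period(frequency_hz)
    peaks = [round(k * period) for k in range(count)]
    troughs = [round((k + 0.5) * period) for k in range(count)]
    return peaks, troughs


if __name__ == "__main__":
    freq, seq = parse_task_name("h1_0737.50_S5R2__70_S5R3a_1")
    print(freq, seq, round(cycle_period(freq), 2))
    print(peaks_and_troughs(freq))
```

The sketch rounds multiples of the unrounded period, whereas the worked example uses the rounded period of 112; for the first few cycles at this frequency the two agree.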

So if we wanted to reliably compare "new app" tasks with "old app" tasks, we just need to ensure two things:-

  * The tasks need to have a close enough frequency so that the calculated period does not vary significantly.
  * The tasks chosen for comparison should have as close as possible to the same offset from any peak or trough in the cycle.

Because of the relatively steep slope of the cycle near to the peaks, it is best if possible to compare tasks which are close to a trough where the slope is much smaller.

As an example from the above data, if you had a "new" task whose sequence number was 48, you could compare it precisely with any "old" task whose sequence number just happened to be any of 48, 64, 160, 176, 272, 288, 384, 400, etc. Because the new task is fairly close to a trough, you would get a reasonable comparison if the old task was within +/- several sequence numbers of those listed for the precise comparison. So the new 48 task could be reasonably matched against an old 164 task if you just happened to have such a beast in your old results list.
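The matching rule can be sketched in a few lines of Python. This assumes the period has been rounded to a whole number of sequence numbers (112 in the example) and that the cycle is symmetric about each peak and trough, as the list of matching tasks above implies; the function names are hypothetical.

```python
def offset_from_peak(sequence, period):
    """Distance in sequence numbers to the nearest peak.

    0 means the task sits on a peak; period / 2 means it sits on a trough.
    """
    phase = sequence % period
    return min(phase, period - phase)


def exact_matches(sequence, period, upto):
    """All sequence numbers up to `upto` at the same offset as `sequence`."""
    target = offset_from_peak(sequence, period)
    return [s for s in range(upto + 1) if offset_from_peak(s, period) == target]


if __name__ == "__main__":
    period = 112
    print(exact_matches(48, period, 400))
    # How far apart are the new 48 task and the old 164 task within the cycle?
    print(offset_from_peak(48, period), offset_from_peak(164, period))
```

Two tasks are a precise match when their offsets are equal. How large a difference in offset is tolerable depends on where in the cycle they sit: more near a trough, less near a peak.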

So, if you think the above is all a load of crap, just read it all again more slowly and carefully and it should become understandable. If you reckon you thoroughly understand it all then here are three little tests for you.

TEST 1.
Using the above task sequence whose frequency is 737.5, you have been sent more data files whose frequencies are 737.75, 737.80, 737.85, etc. Could you still do reliable comparisons between appropriate new frequency tasks and old tasks from the 737.5 sequence?

TEST 2.
You have an old task whose sequence number was 120 and a new task whose number was 104. Could these two be reliably compared? What about if the new task was 216? What about if the new task was 108?

TEST 3.
Your old task was 109 and your new task was 106. Would you get an accurate assessment by comparing those two?

ANSWERS.
1. Y
2. Y, Y, N
3. N

If you don't agree with the answers, please explain why :).

Cheers,
Gary.

Astro
Joined: 18 Jan 05
Posts: 257
Credit: 1,000,560
RAC: 0


I used that formula to chart it for frequencies from 1-1000. A visual aid might be nice.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,516
Credit: 455,676,958
RAC: 41,467

RE: I used that formula to

Message 77725 in response to message 77724

Quote:

I used that formula to chart it for frequencies from 1-1000. A visual aid might be nice.

Hi!

A crucial factor (possibly the only major one) for the period should be the size of the sky-grid used for a range of frequencies. This is something rather easy to find out, I guess (for Linux, the HOWTO for Windows is left as an exercise to the reader):

1) Go to the ~/BOINC/projects/einstein.phys.uwm.edu/ directory and copy the file(s) skygrid_XXXXHz_S5R3.dat to some other directory, say /tmp. XXXX denotes the frequency range.

2) Go to /tmp. DO NOT execute the following steps in the BOINC directory, do them in the /tmp folder!!!!!

3) Rename skygrid_XXXXHz_S5R3.dat to skygrid_XXXXHz_S5R3.dat.zip

4) Unzip skygrid_*.dat.zip; this will create a new (text) file skygrid_XXXXHz_S5R3.dat

5) Count the number of lines in the text file, e.g.
wc -l skygrid_XXXXHz_S5R3.dat
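The same count can be done on any platform without renaming anything. This is a minimal Python sketch that takes the steps above at face value, i.e. it assumes the skygrid .dat file really is a zip archive holding a single text member; the function name is hypothetical. As above, run it on a copy, not on the file in the BOINC directory.

```python
import zipfile


def count_skygrid_lines(path):
    """Count the text lines inside a skygrid file that is really a zip archive."""
    with zipfile.ZipFile(path) as archive:
        members = archive.namelist()
        # Assumption: the archive holds exactly one text file.
        with archive.open(members[0]) as handle:
            return sum(1 for _ in handle)
```

`zipfile.ZipFile` looks at the file contents, not the extension, so the copied file can keep its .dat name.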

Do this for all the E@H hosts you have, then try to correlate the frequency ranges to the number of lines. This should produce a graph similar to the one above, I guess, and should follow a nice quadratic function. Hmmm... let's see:

Here are a few of my results:

700 ==> 120198
740 ==> 134316
760 ==> 141675

So gridSize ~ freq^2 * 0.2453

The workunits I've seen so far contain about 1200 sky points, so the period would be roughly

p ~ gridsize/1200 ~ freq^2 / 4892 ~ freq^2 * 0.0002044

At least for WU around 700...760 Hz, but it's the same formula noted above, and it turns out the empirical analysis got the constant right to within ca 1% !!!! Good work, guys!!!!!!!

The results I get:

700 => 100.2
740 => 111.9
760 => 118.1
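The arithmetic above can be reproduced with a short Python sketch. The least-squares fit through the origin is my own choice of method (the post does not say how the 0.2453 was obtained), and the figure of 1200 sky points per workunit is the approximate value quoted above.

```python
# Line counts of the skygrid files quoted above, keyed by frequency in Hz.
GRID_SIZES = {700: 120198, 740: 134316, 760: 141675}
SKYPOINTS_PER_WU = 1200  # approximate, as stated in the post


def fit_grid_constant(grid_sizes):
    """Least-squares fit of gridSize = c * freq^2 (no intercept)."""
    numerator = sum(size * freq ** 2 for freq, size in grid_sizes.items())
    denominator = sum(freq ** 4 for freq in grid_sizes)
    return numerator / denominator


grid_constant = fit_grid_constant(GRID_SIZES)
period_constant = grid_constant / SKYPOINTS_PER_WU

if __name__ == "__main__":
    print(f"grid constant   ~ {grid_constant:.4f}")
    print(f"period constant ~ {period_constant:.7f}")
    for freq in sorted(GRID_SIZES):
        print(freq, round(period_constant * freq ** 2, 1))
```

With only three closely spaced frequencies the fit is barely constrained, so treat the constant as a consistency check on the empirical formula, not as an independent measurement.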

Which of course matches perfectly Astro's curve from empirical data.

Isn't this just wonderful? Math can be fun. Two approaches ending up at the same result. I love it when a formula works :-).

CU

Bikeman

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,125
Credit: 126,914,071
RAC: 17,292

RE: ..... So gridSize ~

Message 77726 in response to message 77725

Quote:

.....
So gridSize ~ freq^2 * 0.2453

The workunits I've seen so far contain about 1200 sky points, so the period would be roughly

p ~ gridsize/1200 ~ freq^2 / 4892 ~ freq^2 * 0.0002044


Yo! Way to go Bikeman! :-)

Well that makes my prior explanation of quadratic behaviour so much horse-rubbish. :-)

That is, it's not the template matching ( correlation/Doppler ) behaviour in the time domain per grid position, but simple 2D grid size/granularity ( latitude and longitude ) varying. Hmmm ... that cuts the phase space somewhat orthogonally to my guess.

Now if only I can work out the generator for the grid [ without looking at the source code ] .... :-)

I'm gonna look in those skygrid files ....

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter. Blaise Pascal

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,206
Credit: 43,319,822,437
RAC: 44,569,638

RE: Here are a few of my

Message 77727 in response to message 77725

Quote:


Here are a few of my results:

700 ==> 120198
740 ==> 134316
760 ==> 141675

Here are some more to expand the range:

380 ==> 35458
450 ==> 49712
500 ==> 61356
540 ==> 71564
580 ==> 82548
640 ==> 100488
800 ==> 156967

Quote:

So gridSize ~ freq^2 * 0.2453

The workunits I've seen so far contain about 1200 sky points, so the period would be roughly

p ~ gridsize/1200 ~ freq^2 / 4892 ~ freq^2 * 0.0002044

At least for WU around 700...760 Hz, but it's the same formula noted above, and it turns out the empirical analysis got the constant right to within ca 1% !!!! Good work, guys!!!!!!!

Actually, good work archae86!! :).

I've looked at the expanded range of frequencies from 380 to 800 as listed above and in all cases used your procedure (assuming 1200 skypoints) to calculate the archae86 constant and the range of values is from 0.0002044 to 0.0002046.

I've actually seen values for skypoints between about 1197 and 1203 but I don't recall if the lower frequencies had slightly higher skypoints or not. Assuming a median value of 1200 seems to be a good idea.
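Here is a small Python sketch of that per-file check, using the line counts listed in this thread so far. The names are hypothetical, and the 1200 skypoints figure is the assumed median just mentioned.

```python
# Skygrid line counts quoted in this thread, keyed by frequency in Hz.
LINE_COUNTS = {
    380: 35458, 450: 49712, 500: 61356, 540: 71564, 580: 82548,
    640: 100488, 700: 120198, 740: 134316, 760: 141675, 800: 156967,
}
SKYPOINTS = 1200  # assumed median number of skypoints per task


def period_constant(freq_hz, line_count, skypoints=SKYPOINTS):
    """Constant k in P = k * F^2, derived from a single skygrid file."""
    return line_count / (skypoints * freq_hz ** 2)


constants = {f: period_constant(f, n) for f, n in LINE_COUNTS.items()}
lowest, highest = min(constants.values()), max(constants.values())

if __name__ == "__main__":
    for freq in sorted(constants):
        print(freq, f"{constants[freq]:.7f}")
    print(f"range: {lowest:.7f} to {highest:.7f}")
```

Passing `skypoints=1197` or `skypoints=1203` to `period_constant` shows how much the choice of skypoints value moves the constant, which is about a quarter of a percent either way.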

Quote:
Isn't this just wonderful? Math can be fun. Two approaches ending up at the same result. I love it when a formula works :-).

Thanks for your enthusiasm! It's quite infectious :).

Cheers,
Gary.

archae86
Joined: 6 Dec 05
Posts: 2,820
Credit: 3,287,287,356
RAC: 2,538,002

RE: RE: HOWTO for Windows

Message 77728 in response to message 77725

Quote:

Quote:
HOWTO for Windows is left as an exercise to the reader):

Luckily, I use weekly network backup to disks on peer machines. I don't scrub files that are no longer present, so I was able to find a fair few skygrid files.

As to the Windows-specific HOWTO:

Having collected four machines' worth of historic skygrid files in one temporary directory, I did a bulk rename to .zip by:
ren *.dat *.zip
I did the line counting in Textpad, a nice third party text editor available in a Windows version.

The line count shows in the message area at the bottom of the Textpad window on open. You can get it back later with ctl+F1.

Quote:

700 ==> 120198
740 ==> 134316
760 ==> 141675

here are some from my inventory:

freq  lines   Bikeman_estimate
140   4830    4808
230   13008   12976
340   28398   28357
540   71564   71529
720   127158  127164
730   130718  130720
760   141675  141685 (same in Windows as in Linux--comforting, that)
800   156967  156992

I don't know where you found the 1200 sky points number--was that by looking at stderr out? I spot-checked a few of mine, and found the skypoints entry on about the tenth line and the last of the monotonically increasing numbers to be one less. For the explicit line I spotted:

1199
1200
1204
1207

So are you suggesting that the real period should be discontinuous, with an increment interval of 10 Hz, which seems to be the increment interval of the skygrid files?

Are you also suggesting that if the project wanted to adjust for this variation in effort required that all they need consider is the line count in the skygrid file, and the skypoints number, both of which should be known at Work Unit issue time? Admittedly the magnitude of the variation is probably mildly architecture dependent, but if that is the rest of the story, they could get considerably closer than they are now.

I think I have 24 skygrid files for which I can provide line counts if anyone is interested. I've shown all I have above 700, which seems likely to be the interesting zone for current estimation work. Below that I just chose a few to backcheck the estimate.


Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,516
Credit: 455,676,958
RAC: 41,467

RE: I don't know where

Message 77729 in response to message 77728

Quote:

I don't know where you found the 1200 sky points number--was that by looking at stderr out?

Yes, as you noted that's right at the top, here's an example for all:

2008-01-25 09:09:08.7510 [normal]: Start of BOINC application 'einstein_S5R3_4.20_i686-pc-linux-gnu'.
2008-01-25 09:09:09.0761 [debug]: Reading SFTs and setting up stacks ... done
2008-01-25 09:09:26.5733 [normal]: INFO: Couldn't open checkpoint h1_0699.40_S5R2__27_S5R3a_0_0.cpt
2008-01-25 09:09:26.5734 [debug]: Total skypoints = 1202. Progress: 0, 

Quote:

For the explicit line I spotted:

1199
1200
1204
1207

I guess the count varies a bit so that the grid fits exactly within n workunits that are of almost equal size but close to the target size of 1200.
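That guess can be tried out with a few lines of Python. This is only a sketch of one plausible way to slice a grid: it assumes the generator rounds to the nearest whole number of workunits and spreads the leftover points one at a time, which may not be what the real workunit generator does. The function name is hypothetical.

```python
def split_grid(grid_size, target=1200):
    """Split grid_size sky points into nearly equal slices close to target.

    Assumption: the number of slices is grid_size / target rounded to the
    nearest integer, and any remainder is spread one point per slice.
    """
    slices = max(1, round(grid_size / target))
    base, extra = divmod(grid_size, slices)
    return [base + 1] * extra + [base] * (slices - extra)


if __name__ == "__main__":
    sizes = split_grid(120198)  # line count of the 700 Hz skygrid file
    print(len(sizes), min(sizes), max(sizes))
```

Under these assumptions the 700 Hz grid of 120198 points splits into 100 slices of 1201 or 1202 points, which sits nicely with the 1202 in the log excerpt above for a 699.40 Hz task, provided that task uses the 700 Hz skygrid file. The number of slices is also the cycle period in sequence numbers.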

Quote:

So are you suggesting that the real period should be discontinuous, with an increment interval of 10 Hz, which seems to be the increment interval of the skygrid files?

Yes, I think so.

Quote:

Are you also suggesting that if the project wanted to adjust for this variation in effort required that all they need consider is the line count in the skygrid file, and the skypoints number, both of which should be known at Work Unit issue time? Admittedly the magnitude of the variation is probably mildly architecture dependent, but if that is the rest of the story, they could get considerably closer than they are now.

The period is one thing, the amplitude another. I think the amplitude of the variation is more than only mildly architecture dependent. SSE vectorized versions should in theory show a much higher (relative) variation compared to non SSE version on the same platform & compiled with the same compiler (but an almost constant absolute variation). Systems with relatively fast SSE units but slow memory should show the greatest relative variation (again, in theory). Athlon XPs with high clock rates might be candidates.

Results calculated under heavy CPU load should also show higher variations even in CPU seconds, as other processes will "steal" CPU cache from the E@H app, which is not so bad for the first hotloop (the one that runs at almost constant speed for any WU) but should have a bigger impact on the second hotloop which produces the biggest part of the runtime variation. So even on a single machine, I expect the amplitude will vary quite a lot with system load.

CU
Bikeman

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,516
Credit: 455,676,958
RAC: 41,467

RE: Now if only I can work

Message 77730 in response to message 77726

Quote:


Now if only I can work out the generator for the grid [ without looking at the source code ] .... :-)

I'm gonna look in those skygrid files ....

Cheers, Mike.

Hi!

From what I got from the visualization and, IIRC, some info from Bernd, the grid is an isotropic grid; that means that all the skypoints have approximately the same distance from each other, regardless of whether they are at the poles or the equator.

So if the workunit generator is a knife that is cutting the skysphere into pieces, the task is the same as slicing an orange using cuts parallel to the equatorial plane of the orange (well...you get the idea..), but with the constraint that the surface area of the peel is almost the same for all of the slices :-).

Note for those who joined the discussion later: The workunit generator does this perfectly, all the WU slices have the same "surface area" : ~1200 sky points. What causes the variation is the unanticipated effect that slices near the poles take longer to digest per skypoint than those near the equator.

CU
Bikeman

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,125
Credit: 126,914,071
RAC: 17,292

Surface area of the peel eh?

Surface area of the peel eh? Well, I'm not going to die wondering! I would have guessed that one ... NOT! :-)

Now I've had a quiet day at home, so I've thrown together a stand-alone web page. It deliberately uses very vanilla Javascript/CSS/HTML and probably will work fine with all recent browsers. Hopefully it may lead to the less well equipped ( or mathophobic ) members having a play with the numbers and algorithms as it applies to their own machines. I suggest you just download ( 20 KB ) a copy to your hard drive and open in your browser from there. Yes, you will need to enable Javascript for it to work.

As it no doubt has some faults, I would be very pleased to receive feedback on any aspect ( forum or PM ). :-)

I am finalising a method to yield estimates of peak runtime and variance, for a given work unit sequence, based upon supplied data ( sequence number & running time ) - taken from two or more completed work units using the same sky search frequency ( the more -> the better the fit ). I hope to obviate a lot of curve fitting angst for you. While it's a doddle with MATLAB/Mathematica, I'll have to jam it in to Javascript's basic maths semantics.

Cheers, Mike.


Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,516
Credit: 455,676,958
RAC: 41,467

RE: Now I've had a quiet

Message 77732 in response to message 77731

Quote:

Now I've had a quiet day at home, so I've thrown together a stand-alone web page.

Wow!!! That's cool, and works like a charm! Thanks a lot!

CU

H-B

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,206
Credit: 43,319,822,437
RAC: 44,569,638

RE: here are some from my

Message 77733 in response to message 77728

Quote:

here are some from my inventory:
freq  lines   Bikeman_estimate
140   4830    4808
230   13008   12976
340   28398   28357
540   71564   71529
720   127158  127164
730   130718  130720
760   141675  141685 (same in Windows as in Linux--comforting, that)
800   156967  156992

I've saved all my skygrid files as well and in my previous message I also listed the number of lines in a selection of these files. I have over 70 of these files saved so I can easily count the lines in any of them if necessary.

In Bikeman's message he came up with the formula for the number of lines in a skygrid file (which he called gridsize) as being

gridsize = Freq^2 * 0.2453

I'll call the constant above the Bikeman constant to distinguish it from your constant in your cycle period formula.

Over a period of looking at many machines, I've noticed that a skygrid file is used for task frequencies over a range of 10Hz below the frequency in the skygrid filename. So you would presume that skygrid_0800 would apply for tasks from about 790 to 800. Well, not quite. I didn't pay much attention at the time but I'm reasonably certain that I saw an example of a task frequency that was just above the skygrid filename frequency.

I believe (but I'm not sure) that the range can actually go slightly above the value in the skygrid filename because a 2 decimal place frequency of 800.45 will actually round to 800 when expressed as an integer. So I think that the 800 skygrid file is used for frequencies between 790.46 and 800.45, ie 790 to 800 if the frequency is rounded to an integer.
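If that belief is right, the mapping from task frequency to skygrid file can be sketched as follows. This is purely my reading of the hypothesis above (each file covers the 10 Hz ending at xxx.45), not something confirmed from the project's code; the function name is hypothetical, and the arithmetic is done in integer hundredths of a Hz to avoid floating point trouble right at the boundaries.

```python
def skygrid_label(task_frequency_hz):
    """Frequency label of the skygrid file assumed to cover a task frequency.

    Hypothesis: skygrid_0800 covers 790.46 to 800.45 Hz, and so on in 10 Hz steps.
    """
    hundredths = round(task_frequency_hz * 100)
    # Ceiling division of (hundredths - 45) by 1000, then back to Hz.
    return ((hundredths - 45 + 999) // 1000) * 10


if __name__ == "__main__":
    for freq in (790.45, 790.46, 800.45, 800.46, 737.50):
        print(freq, "->", f"skygrid_{skygrid_label(freq):04d}")
```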

If you then think again about the Bikeman formula for gridsize, and if you use the very topmost frequency in the range (eg 800.45 for skygrid_800) the Bikeman constant comes out to be pretty much exactly 0.245 for all skygrid files. I've done quite a bit of checking of several frequencies in a 10Hz range and it is the using of the topmost frequency (ie xxx.45 - actually xxx.433) that seems to give the very closest fit over a wide range of different skygrid files.

Below are some examples which are just a few chosen at random from the 70+ skygrid files I have in my collection. For each skygrid file shown, I've counted the lines to get the gridsize and then used the top frequency^2 times a Bikeman constant to get a calculated value as near as possible to the measured gridsize. The constraint was to make sure the constant was really constant for a wide range of skygrid frequencies. I assessed the fit by keeping the delta in the two gridsizes as small as possible:-

SkyGrid_File GridSz = Top Frequency^2 * Const = Calc gridsize (and Delta)

skygrid_0300 022119 = 300.45 * 300.45 * 0.245 = 22116.2 ( Delta = -2.8 )
skygrid_0380 035458 = 380.45 * 380.45 * 0.245 = 35461.8 ( Delta = +3.8 )
skygrid_0450 049712 = 450.45 * 450.45 * 0.245 = 49711.8 ( Delta = -0.2 )
skygrid_0540 071564 = 540.45 * 540.45 * 0.245 = 71561.1 ( Delta = -2.9 )
skygrid_0640 100488 = 640.45 * 640.45 * 0.245 = 100493.2 ( Delta = +5.1 )
skygrid_0700 120198 = 700.45 * 700.45 * 0.245 = 120204.4 ( Delta = +6.4 )
skygrid_0730 130718 = 730.45 * 730.45 * 0.245 = 130721.5 ( Delta = +3.5 )
skygrid_0800 156967 = 800.45 * 800.45 * 0.245 = 156976.4 ( Delta = +9.4 )

So if the Bikeman constant is 0.245 and there are 1200 skypoints then the archae86 constant in your cycle period formula would be 0.0002042 (0.245/1200).
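The table and the resulting constant can be recomputed with a short Python sketch. The measured line counts are the ones listed above; the 0.45 Hz offset for the top frequency and the 1200 skypoints are the assumptions already stated, and the names are hypothetical.

```python
BIKEMAN_CONSTANT = 0.245
SKYPOINTS = 1200  # assumed skypoints per task

# Measured line counts keyed by the frequency in the skygrid filename.
MEASURED = {
    300: 22119, 380: 35458, 450: 49712, 540: 71564,
    640: 100488, 700: 120198, 730: 130718, 800: 156967,
}


def calc_grid_size(label_hz, constant=BIKEMAN_CONSTANT):
    """Calculated grid size using the top frequency of the file's range."""
    top_frequency = label_hz + 0.45
    return constant * top_frequency ** 2


deltas = {label: calc_grid_size(label) - size for label, size in MEASURED.items()}
archae86_constant = BIKEMAN_CONSTANT / SKYPOINTS

if __name__ == "__main__":
    for label, size in MEASURED.items():
        calc = calc_grid_size(label)
        print(f"skygrid_{label:04d} {size:6d} calc={calc:9.1f} delta={deltas[label]:+.1f}")
    print(f"archae86 constant ~ {archae86_constant:.7f}")
```

The last digit of a delta can differ by 0.1 from the table, depending on whether the calculated grid size is rounded before or after subtracting.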

Quote:
I don't know where you found the 1200 sky points number--was that by looking at stderr out?

That's exactly where I found it. A selection of recent ones seem to be pretty close to 1200.

Quote:
So are you suggesting that the real period should be discontinuous, with an increment interval of 10 Hz, which seems to be the increment interval of the skygrid files?

In view of the examples above, I'm guessing that the cycle period will be constant for the frequency range covered by each skygrid file rather than a continuous function. This would be very difficult to prove by observation since even at high frequencies like 800 -> 810 -> 820, the cycle period only changes from 132 -> 135 -> 138 and I imagine you would need very careful observation to spot that tasks of frequency 809.95 were on a cycle period of 135 whereas tasks of an almost identical frequency of 810.55 were on a 138 cycle period. Now there's an exercise for you when Bernd rereleases the 800+ tasks in a few days time :). Grabbing a swag of tasks when the new frequencies are first on the menu should allow you to get some long continuous runs :).

Of course there is every possibility that Bernd may have worked out a way to pay a more appropriate amount of credit in which case there might not be as much interest in the variable crunch times in a given sequence of tasks.

Quote:
Are you also suggesting that if the project wanted to adjust for this variation in effort required that all they need consider is the line count in the skygrid file, and the skypoints number, both of which should be known at Work Unit issue time? Admittedly the magnitude of the variation is probably mildly architecture dependent, but if that is the rest of the story, they could get considerably closer than they are now.

I certainly hadn't thought of the possibilities until you mentioned it but why not? You seem to have done all the work to allow the position in the sequence to be used to calculate some sort of correction to the server assigned credit. Even if this varies with platform, some sort of reasonable average would be far better than the current status quo of no correction at all. So I'd think it just requires a bit of extra code to do the job.

As always, thanks for your well thought out contributions. It's taken me a while to compose this (weekend interruptions) so I wonder who else has replied in the meantime :).

Cheers,
Gary.
