One unfortunate characteristic of crunch times for tasks in the current S5R3 run is that they show a considerable cyclic variation of perhaps 20 - 30% or so. Because of this the casual observer may easily make the mistake of dismissing a new app as being worthless since the first few tasks from the new app may appear to be no better or perhaps even worse than the last tasks crunched with the old app, depending on what part of the cycle they come from.
Because of the skillful work done by people like archae86 and Richard Haselgrove, the cyclic nature is now thoroughly documented, with the necessary information and equations to allow interested participants to calculate accurately the true performance difference when a new app is deployed.
This is brilliant for people who take the trouble to collect the necessary "before" and "after" performance data and who set up something like an Excel spreadsheet (with appropriate cell formulae) to apply the cyclic crunch time equation to the processing of that data. But what about us "lesser mortals" :) who just want to crunch a couple of "new app" tasks and come up with a quick (but reliable) ball park figure for the true performance change?
The key is to take archae86's cycle period formula which first appeared (I think) in this post immediately after the idea of a log/log plot was suggested by Mike Hewson. I'm sure archae86 would have discovered the relationship himself, even if not prompted, and he certainly deserves the credit for all the hard work of data collection and analysis. A classic example of good scientific principles at work.
The key formula is that the cycle period (P) for the variation in crunch time for data of frequency (F) is:-
P = 0.000206 * F^2
The frequency to 2 decimal places is contained within the task filename, eg the frequency for the task h1_0737.50_S5R2__70_S5R3a_1 is 737.50Hz. If you just happened to be crunching this task then you would expect to see cyclic crunch times whose calculated period would be:-
P = 0.000206 * 737.5^2 = 112.04
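For anyone who would rather not do this by hand, here is a minimal Python sketch of the same calculation. It assumes task names always follow the layout of the example above (frequency after the first underscore, sequence number after the double underscore); the function names are made up for illustration and are not part of any Einstein@Home tool.

```python
import re

PERIOD_CONSTANT = 0.000206  # archae86's empirical constant quoted above


def task_frequency(task_name: str) -> float:
    """Extract the frequency in Hz from a name like 'h1_0737.50_S5R2__70_S5R3a_1'."""
    match = re.match(r"^[a-z]\d_(\d+\.\d+)_", task_name)
    if match is None:
        raise ValueError(f"unrecognised task name: {task_name!r}")
    return float(match.group(1))


def task_sequence(task_name: str) -> int:
    """Extract the sequence number (the digits after the double underscore)."""
    match = re.search(r"__(\d+)_", task_name)
    if match is None:
        raise ValueError(f"unrecognised task name: {task_name!r}")
    return int(match.group(1))


def cycle_period(frequency_hz: float) -> float:
    """Cycle period P = 0.000206 * F^2, in units of sequence numbers."""
    return PERIOD_CONSTANT * frequency_hz ** 2


name = "h1_0737.50_S5R2__70_S5R3a_1"
freq = task_frequency(name)
print(freq, task_sequence(name), round(cycle_period(freq), 2))  # 737.5 70 112.04
```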
Each task has a sequence number, eg 70 in the above example. Tasks which have a sequence number that is an integer multiple of the period (including zero) have the maximum crunch time, eg 0, 112, 224, 336, ... in the above example. Let's call these the peak tasks. Tasks whose sequence number falls half way between any two adjacent peaks will have the minimum possible crunch time, ie the fastest speed of crunching. In the above example such tasks would be numbered 56, 168, 280, 392, ... and we could call them trough tasks.
So if we wanted to reliably compare "new app" tasks with "old app" tasks, we just need to ensure two things:-
Because of the relatively steep slope of the cycle near to the peaks, it is best if possible to compare tasks which are close to a trough where the slope is much smaller.
As an example from the above data, if you had a "new" task whose sequence number was 48, you could compare it precisely with any "old" task whose sequence number just happened to be any of 48, 64, 160, 176, 272, 288, 384, 400, etc. Because the new task is fairly close to a trough, you would get a reasonable comparison if the old task was within +/- several sequence numbers of those listed for the precise comparison. So the new 48 task could be reasonably matched against an old 164 task if you just happened to have such a beast in your old results list.
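The matching rule in the example above can be sketched in a few lines of Python. This is only an illustration: it rounds the period to 112 as in the worked example, and the helper names are invented here.

```python
def distance_from_peak(seq, period):
    """Distance (in sequence numbers) from seq to the nearest peak task."""
    offset = seq % period
    return min(offset, period - offset)


def matching_sequences(seq, period, upto):
    """Sequence numbers in 0..upto lying at the same distance from a peak as seq."""
    d = distance_from_peak(seq, period)
    matches = set()
    peak = 0
    while peak - d <= upto:
        # Each peak has one matching task on either side of it.
        for candidate in (peak - d, peak + d):
            if 0 <= candidate <= upto:
                matches.add(candidate)
        peak += period
    return sorted(matches)


print(matching_sequences(48, 112, 400))
# [48, 64, 160, 176, 272, 288, 384, 400]
```

A distance close to half the period means the task sits near a trough, which is where comparisons are most forgiving.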
So, if you think the above is all a load of crap, just read it all again more slowly and carefully and it should become understandable. If you reckon you thoroughly understand it all then here are three little tests for you.
TEST 1.
Using the above task sequence whose frequency is 737.5, you have been sent more data files whose frequencies are 737.75, 737.80, 737.85, etc. Could you still do reliable comparisons between appropriate new frequency tasks and old tasks from the 737.5 sequence?
TEST 2.
You have an old task whose sequence number was 120 and a new task whose number was 104. Could these two be reliably compared? What about if the new task was 216? What about if the new task was 108?
TEST 3.
Your old task was 109 and your new task was 106. Would you get an accurate assessment by comparing those two?
ANSWERS.
1. Y
2. Y, Y, N
3. N
If you don't agree with the answers, please explain why :).
Cheers,
Gary.
[S5R3/R4] How to check Performance when Testing new Apps
I used that formula to chart it for frequencies from 1-1000. A visual aid might be nice.
RE: I used that formula to
Hi!
A crucial factor (possibly the only major one) for the period should be the size of the sky-grid used for a range of frequencies. This is something rather easy to find out, I guess (for Linux; the HOWTO for Windows is left as an exercise to the reader):
1) Go to the ~/BOINC/projects/einstein.phys.uwm.edu/ directory and copy the file(s) skygrid_XXXXHz_S5R3.dat to some other directory, say /tmp . XXXX denotes the frequency range.
2) Go to /tmp . DO NOT execute the following steps in the BOINC directory, do it in the /tmp folder!!!!!
3) Rename skygrid_XXXXHz_S5R3.dat to skygrid_XXXXHz_S5R3.dat.zip
4) Unzip skygrid_*.dat.zip; this will create a new (text) file skygrid_XXXXHz_S5R3.dat
5) Count the number of lines in the text file, e.g.
wc -l skygrid_XXXXHz_S5R3.dat
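The same count can be done on any platform with Python, which may cover the Windows case. This is a rough sketch that assumes the skygrid file really is an ordinary zip archive holding a text file, as the rename-and-unzip steps above imply; the path in the usage comment is a placeholder. Python's zipfile module does not care about the file extension, so no rename is needed, but it is still wise to work on a copy outside the BOINC directory.

```python
import zipfile


def count_skygrid_lines(path: str) -> int:
    """Count the lines of text stored inside a zip-compressed skygrid file."""
    total = 0
    with zipfile.ZipFile(path) as archive:
        for member in archive.namelist():
            # Iterating over an open archive member yields one line at a time.
            with archive.open(member) as handle:
                total += sum(1 for _ in handle)
    return total


# Usage (placeholder path, point it at your own copy of the file):
# print(count_skygrid_lines("/tmp/skygrid_XXXXHz_S5R3.dat"))
```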
Do this for all the E@H hosts you have, then try to correlate the frequency ranges to the number of lines. It should produce a graph similar to the one above, I guess, and follow a nice quadratic function. Hmmm... let's see:
Here are a few of my results:
700 ==> 120198
740 ==> 134316
760 ==> 141675
So gridSize ~ freq^2 * 0.2453
The workunits I've seen so far contain about 1200 sky points, so the period would be roughly
p ~ gridsize/1200 ~ freq^2 / 4892 ~ freq^2 * 0.0002044
At least for WU around 700...760 Hz, but it's the same formula noted above, and it turns out the empirical analysis got the constant right to within ca 1% !!!! Good work, guys!!!!!!!
The results I get:
700 => 100.2
740 => 111.9
760 => 118.1
Which of course matches perfectly Astro's curve from empirical data.
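For readers who want to redo the arithmetic, here is a small Python sketch that fits gridSize = c * freq^2 through the three line counts listed above and then divides by 1200 skypoints per task. The least-squares fit through the origin is just one reasonable way to get the constant; it is not necessarily how the figure above was obtained.

```python
# Line counts reported above, keyed by skygrid frequency in Hz.
grid_sizes = {700: 120198, 740: 134316, 760: 141675}
SKYPOINTS_PER_TASK = 1200  # approximate value seen in workunits

# Least-squares fit of gridsize = c * f^2 (a line through the origin in f^2).
numerator = sum(size * f ** 2 for f, size in grid_sizes.items())
denominator = sum(f ** 4 for f in grid_sizes)
grid_constant = numerator / denominator
period_constant = grid_constant / SKYPOINTS_PER_TASK

print(f"grid constant   ~ {grid_constant:.4f}")    # 0.2453
print(f"period constant ~ {period_constant:.7f}")  # 0.0002044
for f in grid_sizes:
    print(f, "=>", round(period_constant * f ** 2, 1))
```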
Isn't this just wonderful? Math can be fun. Two approaches ending up at the same result. I love it when a formula works :-).
CU
Bikeman
RE: ..... So gridSize ~
Yo! Way to go Bikeman! :-)
Well that makes my prior explanation of quadratic behaviour so much horse-rubbish. :-)
That is, it's not the template matching ( correlation/Doppler ) behaviour in the time domain per grid position, but simple 2D grid size/granularity ( latitude and longitude ) varying. Hmmm ... that cuts the phase space somewhat orthogonally to my guess.
Now if only I can work out the generator for the grid [ without looking at the source code ] .... :-)
I'm gonna look in those skygrid files ....
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
RE: Here are a few of my
Here are some more to expand the range:
380 ==> 35458
450 ==> 49712
500 ==> 61356
540 ==> 71564
580 ==> 82548
640 ==> 100488
800 ==> 156967
Actually, good work archae86!! :).
I've looked at the expanded range of frequencies from 380 to 800 as listed above and in all cases used your procedure (assuming 1200 skypoints) to calculate the archae86 constant. The values range between 0.0002044 and 0.0002046.
I've actually seen values for skypoints between about 1197 and 1203 but I don't recall if the lower frequencies had slightly higher skypoints or not. Assuming a median value of 1200 seems to be a good idea.
Thanks for your enthusiasm! It's quite infectious :).
Cheers,
Gary.
RE: RE: HOWTO for Windows
RE: I don't know where
Yes, as you noted that's right at the top, here's an example for all:
2008-01-25 09:09:08.7510 [normal]: Start of BOINC application 'einstein_S5R3_4.20_i686-pc-linux-gnu'.
2008-01-25 09:09:09.0761 [debug]: Reading SFTs and setting up stacks ... done
2008-01-25 09:09:26.5733 [normal]: INFO: Couldn't open checkpoint h1_0699.40_S5R2__27_S5R3a_0_0.cpt
2008-01-25 09:09:26.5734 [debug]: Total skypoints = 1202. Progress: 0,
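If you have a lot of these logs, the skypoint counts can be pulled out with a short Python sketch like the one below. It assumes only that the log contains the literal text "Total skypoints = N" as in the example above; the function name is invented for illustration.

```python
import re

SKYPOINTS_RE = re.compile(r"Total skypoints = (\d+)")


def skypoints_from_log(text):
    """Return every 'Total skypoints' value found in a block of log text."""
    return [int(n) for n in SKYPOINTS_RE.findall(text)]


sample = "2008-01-25 09:09:26.5734 [debug]: Total skypoints = 1202. Progress: 0,"
print(skypoints_from_log(sample))  # [1202]
```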
I guess the count varies a bit so that the grid fits exactly within n workunits that are of almost equal size but close to the target size of 1200.
Yes, I think so.
The period is one thing, the amplitude another. I think the amplitude of the variation is more than only mildly architecture dependent. SSE vectorized versions should in theory show a much higher (relative) variation compared to non SSE version on the same platform & compiled with the same compiler (but an almost constant absolute variation). Systems with relatively fast SSE units but slow memory should show the greatest relative variation (again, in theory). Athlon XPs with high clock rates might be candidates.
Results calculated under heavy CPU load should also show higher variations even in CPU seconds, as other processes will "steal" CPU cache from the E@H app. That is not so bad for the first hotloop (the one that runs at almost constant speed for any WU) but should have a bigger impact on the second hotloop, which produces the biggest part of the runtime variation. So even on a single machine, I expect the amplitude will vary quite a lot with system load.
CU
Bikeman
RE: Now if only I can work
Hi!
From what I got from the visualization and, IIRC, some info from Bernd, the grid is an isotropic grid. That means that all the skypoints are approximately the same distance from their neighbours, regardless of whether they are at the poles or the equator.
So if the workunit generator is a knife that is cutting the skysphere into pieces, the task is the same as slicing an orange using cuts parallel to the equatorial plane of the orange (well...you get the idea..), but with the constraint that the surface area of the peel is almost the same for all of the slices :-).
Note for those who joined the discussion later: The workunit generator does this perfectly, all the WU slices have the same "surface area" : ~1200 sky points. What causes the variation is the unanticipated effect that slices near the poles take longer to digest per skypoint than those near the equator.
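To put numbers on the orange picture, here is a small Python sketch of the geometry only. It is not the workunit generator, which works on actual grid points; it simply assumes that on an isotropic grid the number of skypoints is proportional to surface area. It relies on the standard result that a band of a unit sphere between heights z1 and z2 has area 2*pi*(z2 - z1), so equal-area slices are equally spaced in z = sin(latitude).

```python
import math


def equal_area_latitudes(n_slices):
    """Latitude boundaries in degrees that cut a sphere into n equal-area bands."""
    return [math.degrees(math.asin(-1 + 2 * k / n_slices))
            for k in range(n_slices + 1)]


def band_area(lat_lo_deg, lat_hi_deg):
    """Area of the band between two latitudes on a unit sphere."""
    z_lo = math.sin(math.radians(lat_lo_deg))
    z_hi = math.sin(math.radians(lat_hi_deg))
    return 2 * math.pi * (z_hi - z_lo)


bounds = equal_area_latitudes(6)
for lo, hi in zip(bounds, bounds[1:]):
    print(f"{lo:7.2f} to {hi:7.2f} deg  width {hi - lo:6.2f}  area {band_area(lo, hi):.4f}")
```

With six slices, the two polar slices each span about 48 degrees of latitude while the two nearest the equator span about 19 degrees, although every slice has the same area.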
CU
Bikeman
Surface area of the peel eh?
Surface area of the peel eh? Well, I'm not going to die wondering! I would have guessed that one ... NOT! :-)
Now I've had a quiet day at home, so I've thrown together a stand-alone web page. It deliberately uses very vanilla Javascript/CSS/HTML and probably will work fine with all recent browsers. Hopefully it may lead to the less well equipped ( or mathophobic ) members having a play with the numbers and algorithms as it applies to their own machines. I suggest you just download ( 20 KB ) a copy to your hard drive and open in your browser from there. Yes, you will need to enable Javascript for it to work.
As it no doubt has some faults, I would be very pleased to receive feedback on any aspect ( forum or PM ). :-)
I am finalising a method to yield estimates of peak runtime and variance, for a given work unit sequence, based upon supplied data ( sequence number & running time ) - taken from two or more completed work units using the same sky search frequency ( the more -> the better the fit ). I hope to obviate a lot of curve fitting angst for you. While it's a doddle with MATLAB/Mathematica, I'll have to jam it in to Javascript's basic maths semantics.
Cheers, Mike.
RE: Now I've had a quiet
Wow!!! That's cool, and works like a charm! Thanks a lot!
CU
H-B
RE: here are some from my
I've saved all my skygrid files as well and in my previous message I also listed the number of lines in a selection of these files. I have over 70 of these files saved so I can easily count the lines in any of them if necessary.
In Bikeman's message he came up with the formula for the number of lines in a skygrid file (which he called gridsize) as being
gridsize = Freq^2 * 0.2453
I'll call the constant above the Bikeman constant to distinguish it from your constant in your cycle period formula.
Over a period of looking at many machines, I've noticed that a skygrid file is used for task frequencies over a range of 10Hz below the frequency in the skygrid filename. So you would presume that skygrid_0800 would apply for tasks from about 790 to 800. Well, not quite. I didn't pay much attention at the time but I'm reasonably certain that I saw an example of a task frequency that was just above the skygrid filename frequency.
I believe (but I'm not sure) that the range can actually go slightly above the value in the skygrid filename because a 2 decimal place frequency of 800.45 will actually round to 800 when expressed as an integer. So I think that the 800 skygrid file is used for frequencies between 790.46 and 800.45, ie 790 to 800 if the frequency is rounded to an integer.
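As a sketch of that guess (and it is only a guess, not a confirmed rule), the mapping from task frequency to skygrid file could be written like this in Python. The 0.45 Hz upper edge and the function name are assumptions taken from the paragraph above.

```python
def skygrid_for(frequency_hz):
    """Guessed skygrid frequency for a task: each file covers (top - 10 + 0.45, top + 0.45]."""
    # Work in integer hundredths of a Hz to avoid floating point edge effects.
    hundredths = round(frequency_hz * 100)
    # Ceiling division of (hundredths - 45) by 1000, then back to Hz in steps of 10.
    return 10 * -((45 - hundredths) // 1000)


for f in (790.45, 790.46, 800.00, 800.45, 800.46):
    print(f"{f:.2f} Hz -> skygrid_{skygrid_for(f):04d}")
# 790.45 -> 0790, 790.46 -> 0800, 800.00 -> 0800, 800.45 -> 0800, 800.46 -> 0810
```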
If you then think again about the Bikeman formula for gridsize, and if you use the very topmost frequency in the range (eg 800.45 for skygrid_800), the Bikeman constant comes out to be pretty much exactly 0.245 for all skygrid files. I've done quite a bit of checking of several frequencies in a 10Hz range and it is using the topmost frequency (ie xxx.45 - actually xxx.433) that seems to give the very closest fit over a wide range of different skygrid files.
Below are some examples which are just a few chosen at random from the 70+ skygrid files I have in my collection. For each skygrid file shown, I've counted the lines to get the gridsize and then used the top frequency^2 times a Bikeman constant to get a calculated value as near as possible to the measured gridsize. The constraint was to make sure the constant was really constant for a wide range of skygrid frequencies. I assessed the fit by keeping the delta in the two gridsizes as small as possible:-
skygrid_0300 022119 = 300.45 * 300.45 * 0.245 = 22116.2 ( Delta = -2.8 )
skygrid_0380 035458 = 380.45 * 380.45 * 0.245 = 35461.8 ( Delta = +3.8 )
skygrid_0450 049712 = 450.45 * 450.45 * 0.245 = 49711.8 ( Delta = -0.2 )
skygrid_0540 071564 = 540.45 * 540.45 * 0.245 = 71561.1 ( Delta = -2.9 )
skygrid_0640 100488 = 640.45 * 640.45 * 0.245 = 100493.2 ( Delta = +5.2 )
skygrid_0700 120198 = 700.45 * 700.45 * 0.245 = 120204.4 ( Delta = +6.4 )
skygrid_0730 130718 = 730.45 * 730.45 * 0.245 = 130721.5 ( Delta = +3.5 )
skygrid_0800 156967 = 800.45 * 800.45 * 0.245 = 156976.4 ( Delta = +9.4 )
So if the Bikeman constant is 0.245 and there are 1200 skypoints then the archae86 constant in your cycle period formula would be 0.0002042 (0.245/1200).
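Anyone who wants to repeat the check with their own skygrid files could use a Python sketch along these lines. The measured line counts are the ones listed above; the 0.45 Hz offset, the 0.245 constant and the 1200 skypoints are the assumptions already stated in this post, and the function name is made up.

```python
# Measured line counts from the list above, keyed by skygrid filename frequency.
measured = {300: 22119, 380: 35458, 450: 49712, 540: 71564,
            640: 100488, 700: 120198, 730: 130718, 800: 156967}

BIKEMAN_CONSTANT = 0.245   # gridsize = constant * (top frequency)^2
TOP_OFFSET_HZ = 0.45       # assumed top of each 10 Hz range
SKYPOINTS_PER_TASK = 1200  # assumed typical value


def predicted_gridsize(nominal_hz):
    """Predicted number of lines for the skygrid file with this nominal frequency."""
    return BIKEMAN_CONSTANT * (nominal_hz + TOP_OFFSET_HZ) ** 2


for nominal, lines in measured.items():
    predicted = predicted_gridsize(nominal)
    print(f"skygrid_{nominal:04d}  measured {lines:6d}  "
          f"predicted {predicted:9.1f}  delta {predicted - lines:+5.1f}")

print(f"period constant = {BIKEMAN_CONSTANT / SKYPOINTS_PER_TASK:.7f}")  # 0.0002042
```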
That's exactly where I found it. A selection of recent ones seem to be pretty close to 1200.
In view of the examples above, I'm guessing that the cycle period will be constant for the frequency range covered by each skygrid file rather than a continuous function of frequency. This would be very difficult to prove by observation since even at high frequencies like 800 -> 810 -> 820, the cycle period only changes from 132 -> 135 -> 138, and I imagine you would need very careful observation to spot that tasks of frequency 809.95 were on a cycle period of 135 whereas tasks of an almost identical frequency of 810.55 were on a 138 cycle period. Now there's an exercise for you when Bernd rereleases the 800+ tasks in a few days' time :). Grabbing a swag of tasks when the new frequencies are first on the menu should allow you to get some long continuous runs :).
Of course there is every possibility that Bernd may have worked out a way to pay a more appropriate amount of credit in which case there might not be as much interest in the variable crunch times in a given sequence of tasks.
I certainly hadn't thought of the possibilities until you mentioned it but why not? You seem to have done all the work to allow the position in the sequence to be used to calculate some sort of correction to the server assigned credit. Even if this varies with platform, some sort of reasonable average would be far better than the current status quo of no correction at all. So I'd think it just requires a bit of extra code to do the job.
As always, thanks for your well thought out contributions. It's taken me a while to compose this (weekend interruptions) so I wonder who else has replied in the meantime :).
Cheers,
Gary.