23 Jan 2008 18:49:36 UTC

Topic 193461

(moderation:

The odd waveshape of the cyclic CPU time dependency on sequence (task) number within a frequency which Richard Haselgrove noted many months ago has long suggested to me a sin wave with a fixed portion of each 2 Pi cycle omitted.

I have no algorithmic or physical basis for that, so this is just a case of trying out a parameterized model and twisting knobs until the curve fit looked good. Happily one of my hosts, a Conroe machine which has little other current use, gives a very tidy cycle, and processed enough results over a couple of cycles of one frequency for the model match to be pretty tight.

Before implementing the fraction cycle omission part with a mod function, the basic model is:

CPU time = a(1 + b*sin(c + d*2Pi*seq_no/period))

where:

a is a fixed offset, which would be the average CPU time over the cycle were it not the case that part of the cycle is omitted.

b is the amplitude the variation would take, were a full cycle present, expressed as a fraction of the "average" amplitude

c is an offset angle--to get the cycle to start in the right place

d is the fraction of a full cycle actually present (how much of the sin wave period is omitted).

period is the number of sequence numbers per cycle, as modeled by an estimate I posted previously:

period = .000206*frequency^2

With knob twisting, I eventually got this fit for one frequency on my pet host:

The real data had a mix of many frequencies from 643.50 to 644.95. For the model plot, I used 643.65 for all points.

My first attempt to use the model for application performance improvement estimation was for the Windows 4.26 ap running on my Coppermine Windows 98 SE host. I only had five 4.15 ap CPU times for that host, fortunately all on one frequency, and fortunately spread out over a substantial part of a cycle. I held fixed the c and d parameters, and the period estimate, and adjusted a and b to fit my five points (as others have suggested, there is good reason to expect b to be appreciably architecture dependent).

My current estimates for c and d are:

c (offset angle) 2.8 radians

d (fraction of cycle actually present) 0.6

Both these numbers must have precise correct values arising from the algorithm. I think these estimates are pretty close. These two parameters and the period calculation should be the same for all hosts. The a and b parameters are the ones to tweak for host-to-host fit.

One warning--on my Conroe Quad which gets far more use for non-Einstein purposes, the cycles are not nearly so tidy. If your host has inconsistent timings model matching will need more data than the five points I hope were enough for my Coppermine.

My Excel cell formulas to implement this are not very pretty, so I'll make another post in this thread sharing them in case any of you want to try this out.

Language

Copyright © 2022 Einstein@Home. All rights reserved.

## Cycle model (and new ap performance estimation)

)

This post provides Excel details of an implementation of the model I described in the first post of the thread.

I'd be really pleased if someone more adept at either math or Excel countered with a tidier implementation.

For ease in tweaking, I've allocated four spreadsheet cells to the current values of the model parameters called a,b,c,d in the first post. For this version, they are in the second row of columns I, J, K and L.

With Frequency in column E, and Sequence number (aka Task number) in column F, the cell formula for all cells in the CPU time estimate by model column is:

For my personal use, the heading names I've given the I,J,K,L parameters are:

I2--average

J2--amplitude

K2--offset_angle

L2--period_fraction

For those who speak Excel no more than I do, here are a few notes:

PI() is just what you say in Excel to get 3.14159...

The SIN function presumes an argument in radians

The Excel MOD returns the remainder resulting from dividing its first argument by its second argument.

Come to think of it, I probably should have put .000206 in a fifth parameter box. Although it is not system dependent, it came out of a curve fit that may need slight revision.

## Well if it's curve fitting,

)

Well if it's curve fitting, you're fit is good .... :-)

The computation ought to depend on search signal frequency, and two sky angles - longitude/azimuth and latitude/declination. I think the longitude is fixed for a given work unit series, which then traverses the latitudes for units within the series - the sequence number tracks that. Each individual unit has fixed latitude and longitude though. You expect the density of work units per ( solid ) angle to be less at the poles ( as discussed elsewhere ) and be accordingly about sinusoidal.

Now if you flip the plot over it's a |sin| shape ie. absolute value [ similiar to the initial diode rectification in AC to DC transformers ].

There's a base amount of computational overhead you can't escape which keeps the whole curve ( well ) above the horizontal axis.

Next you add in an amplitude to get the excursion ( max min ) right.

Then you add in a period component for the sine,

and finally bung in a phase to align it.

t = x - y|sin(c + 2*PI*z)|

so the base offset is x - y ( when the sine is unity ) = ~ 27K

the peaks are x ( when the sine is null ) = ~ 32K

with these definitions then z ( ~ the period of the graph's cycle ) is dependent upon the signal frequency we are searching for - and thus has quadratic character as discussed. So you could go for

t = x - y|sin(c + 2*PI*(e*(f^2)))|

where the e = .000206

and f = the signal frequency

x and y ought to be system dependent actually, as they scale/link the algorithmic complexity ( the sine part ) to the time domain. If that's so then a more slower machine will have a higher x value and probably the y too ( in proportion ). So your first equation, modulus aside :

CPU time = a(1 + b*sin(c + d*2Pi*seq_no/period))

reflects that better.

( NB the only RHS variable that has a common definition between mine and yours is the offset c. Also x = a and y = a*b ):

Cheers, Mike.

( edit ) I think the work units begin their sequence number at a declination of ~ zero, then 'lighten' toward the poles and get 'heavier' again as you come to the opposite equatorial area. I'm not sure why that give us ~ 2.75 cycles per sequence though. Interestingly 2.8 radians ~ 0.9*PI for what it's worth ... :-)

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

## 0.9*Pie == full belly

)

0.9*Pie == full belly ;-)

Again, props to you math nerds... Good job... I remember just enough of this to follow the general principle of it all, but without going back through higher math classes, I couldn't hope to do something similar... To give you an idea, I had Calculus w/ Analytical Geometry back in 1990. I have not used any higher math since then, confirming the "if you don't use it, you'll lose it" saying...

## I'd like to warmly

)

I'd like to warmly congratulate both Richard Haselgrove and Peter (archae86) for their efforts over many months in tracking down and so thoroughly documenting the periodicity in crunch times for the S5R3 run. The simple formula for calculating the sequence numbers per cycle has been very useful to me for trying to quickly compare tasks "before" and "after" an app change. Because I'm running many older machines (Tualatin PIII and AMD Athlon XP) with crunch times around +/- 1 day, it's difficult to get a comparison quickly.

What I've found useful is to take the first result with the new app and calculate exactly where it falls in the cycle of sequence numbers. Then by looking back through the list of tasks (same frequency) crunched with the old app, I can usually find an example with the same offset from the peak or trough.

For example, consider the frequency of 696.75. The period calculates precisely to 100. This means that the slowest crunch times are observed for tasks whose sequence numbers are 0, 100, 200, 300 ... and the fastest crunch times are for sequence numbers 50, 150, 250, 350 .... So if my "new app" task just happened to have a sequence number of 127 for example, I could compare it with any "old app" tasks that just happened to have a sequence number from the series 27, 73, 127, 173, 227, 273, 327, ... or something pretty close to one of those. It's amazing how often you can find a suitable match.

Because the peaks of the cycle are very sharp and the troughs are well rounded, it wouldn't be very smart to do a comparison right near the peak. On the other hand, picking a value near the bottom of the trough (eg say 40 to 60 in the above example gives crunch times that are pretty consistent. So a "new app" whose sequence was 43 should be pretty comparable with an "old app" whose sequence was 52, even though a 43 should really only match up with a 57. In this way I find I can get a pretty good idea of the improvement of the new app after collecting perhaps just a couple of "new app" results.

This quick and dirty method works well for me with lots of slow hosts and absolutely zero excel skills. All I really want to know is the best app to use for a particular platform. I just want to maximise my output for the machines I have crunching.

Cheers,

Gary.

## RE: I'd like to warmly

)

Absolutely! Beautifully reverse engineered! :-)

Cheers, Mike.

( edit ) Was it Karl Marx who said - "Give them pie in the sky, bye and bye ... " ? :-)

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

## As a testament to the

)

As a testament to the importance and usefulness of the work documented in this thread, I've just used the quick and dirty technique explained in what I wrote here just two posts earlier to examine this post which stated that the new app was giving no apparent speedup. After calculating the cycle period, I was able to find two tasks in the poster's tasks list that were pretty much at identical points on the cycle, one crunched with the new app and one crunched with the old. The comparison showed about a 10% speedup for the new app.

You will find my response to the original poster immediately following the above-linked post which gives the details of the two tasks which should be compared. If you're interested, check out the details whilst the two tasks are still online. The older one may disappear in the not too distant future.

EDIT:

If you do check out the two tasks, you will find that the "new app" one is 8 sequence numbers after the initial "0" peak whilst the "old app" one is 7 sequence numbers before the "112" peak and I'm deeming this to be "close enough" for ball park purposes. Of course, it's not ideal to be doing the comparison too close to a peak because of the steep slope of the curve there. However, I feel it's important to dispel myths about app performance and in this case it's pretty certain that the OP's assessment of no improvement is wrong.

Cheers,

Gary.

## RE: The computation ought

)

You've given a physical interpretation for an absolute value of sine wave component. I've done more curve fitting and am able to get just as good a fit that as before. In fact it is so good I'm going to save a few electrons and not post graphic evidence.

I left in a parameter to adjust the cycle rate, which wound up at 0.5, cancelling your 2, and the phase angle is not needed as it starts up at the right place (I had it set to 3.13 radians as I started typing this note--an alarm bell went off in my head), so my final preferred algebraic form is:

t = a(1 - b$|sin(PI*seq/period)|)

So with that, an Excel cell formula is:

=Q$2*(1 - R$2*ABS(SIN(PI()*F16/(S$2*E16^2))))

where these three parameters reside in the designated cells:

Q2 is the peak number of CPU seconds (System dependent, 32100 for my 3.006 GHz E6600 running 4.15 under Windows XP)

R2 is the fraction below peak at the valley (much less system dependent, but appreciably so, 0.148 for the same system)

S2 is the constant in the period expression, currently estimated at .000206

Thanks, Mike. Your quadratic suggestion for the period function a while back was quite crucial. You absolute value of sin suggestion here cleans up the function into a form which is much more physically plausible. And your general good cheer and encouragement is a pleasure.

Gary Robert's accounts of successful use of the period estimate are very gratifying. Helping people make better performance change estimates was my key goal here.

## RE: You've given a physical

)

I've had some more ideas on that, but it needs work, so I'll get back ...

ROFL!! :-)

yeah, that's nice and compact isn't it? I reckon it's perfect for our purposes here.

I guess one issue is the detail of the type of operations performed in the 'base' overhead vs. what's used in the fluctuating bit. Both are done in the same given processor so you'd think it ought to correlate ( ie. if y is proportional to x in my formula => that is achieved in yours by factoring the 'a' out in front ). [ Note please that I'm not implying the fixed vs variable timings are a result of separately compiled source code units - indeed I'm wondering about the contribution of different subsets of a processor's instruction set. ]. Hmmm .... that suggests running a work unit through a runtime profiler, actually ...

It would be interesting if anyone could comment on say, to what extent floating point code vs integral code has such general character ( one being proportional to the other for a given processor model ). Or maybe some other processor features that impact upon timing like pipelines or whatever. Bound to be complex in detail I suppose, but maybe there's some overall expectation. So could we think that the optimised hot loop ( discussed recently elsewhere ) could be measurable via the variable portion of the runtime ( modelled using 'b' as above )?

Back atchya ! - it's all good productive fun... :-)

Yup. Well, I've been fishing about to see if I can collect enough points for a curve fit or two myself ( plot the sequence number and runtime for a given machine and frequency choice ). I'll try to derive an 'a' and 'b' for that processor and then stump out a prediction for a work unit in progress ( preferably at another search frequency to really test it ) ... suck it and see! :-)

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

## RE: t = a(1 -

)

Grumph. If you are reading this and are interested, you doubtless would not be misled by the typo, but that of course should have read:

t = a(1 - b*|sin(PI*seq/period)|)

## RE: RE: t = a(1 -

)

So what you're saying then, is that you've taken all the money out of the equation? :-)

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal