Solving Several Equations

OK. Recall that multiplying both sides of an equation by the same non-zero constant preserves LHS = RHS, ditto for adding/subtracting two equations. Combining those behaviours I can add a multiple of one equation to another and still preserve the truth/equality of both. This is the key procedure in what follows. Let's look at two equations in two unknowns :

[pre] 2x + 3y = 4 ..... eq_1

3x + y = 3 ..... eq_2[/pre]

We want to find the particular values of x and y, if any, that will make both of these equations true simultaneously. Subtract 3/2 * eq_1 from eq_2 :

[pre] 3x - (3/2)2x + y - (3/2)3y = 3 - (3/2)4

then simplify :

3x - 3x + y - (9/2)y = 3 - 6

0x + (-7/2)y = -3

y = (-3)(-2/7) = 6/7[/pre]
thus by a suitable choice of multiplier for eq_1 ( you can relate the factor of 3/2 to the coefficients of x in each equation ) I have converted the pair to a system of one equation in one unknown and - since I didn't come across either 0 = 0 or something like 0 = 5 along the way - I can solve for that remaining variable. With that value in hand I go back to eq_1 and substitute ( place the value of the variable where the symbol is ) to get the other one :

[pre] 2x + 3(6/7) = 4

simplify :

2x = 4 - 18/7 = 28/7 - 18/7 = 10/7

x = 5/7[/pre]
Then to be pure of heart one ought to ( consistency ) check by putting both x = 5/7 and y = 6/7 into eq_2 :

[pre]3(5/7) + 6/7 = (15 + 6)/7 = 21/7 = 3 OK!![/pre]
[ I've deliberately chosen non-neat number solutions to emphasise that it is the indexing which is discrete not the values - the count of the number of variables is not 1.5, nor do we have PI equations. Integers greater than zero only. ]
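For the programmatically inclined, the elimination and back-substitution above can be sketched in a few lines of Python. I use exact fractions so the 5/7 and 6/7 pop out exactly; the layout and names here are my own, nothing standard :

```python
from fractions import Fraction as Fr

# The two equations, as coefficient rows [a, b, c] meaning a*x + b*y = c.
eq1 = [Fr(2), Fr(3), Fr(4)]   # 2x + 3y = 4
eq2 = [Fr(3), Fr(1), Fr(3)]   # 3x +  y = 3

# Forward elimination : subtract (3/2) * eq_1 from eq_2 to kill the x term.
m = eq2[0] / eq1[0]                        # the multiplier 3/2
eq2 = [b - m * a for a, b in zip(eq1, eq2)]
assert eq2[0] == 0                         # x is gone : 0x - (7/2)y = -3

# Solve the one remaining unknown, then back-substitute into eq_1.
y = eq2[2] / eq2[1]
x = (eq1[2] - eq1[1] * y) / eq1[0]

print(x, y)   # 5/7 6/7

# Consistency check against the original eq_2 : 3x + y = 3.
assert 3 * x + y == 3
```

The multiplier m is computed from the two x coefficients, exactly as noted in the text.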

The general process is forward elimination followed by back substitution. If I had three equations in three unknowns I would, say, suitably combine eq_1 and eq_2 to eliminate a variable then appropriately combine eq_1 and eq_3 to likewise eliminate that same variable : now I have two equations in two unknowns ( the remaining two variables that I didn't eliminate ). With that reduced system of two equations I then .... etc.

If one equation is actually a combination of the others 'in disguise' or dependent - there is some linear combination that connects the lot - then I won't succeed. In the more general case where I have more equations than unknowns or vice-versa then I'll usually have to cope with the likes of 0 = 4 ( whoops, I have no solutions whatsoever ) or 0 = 0 ( thus infinitely many solutions ) more often. We are taking it as read that the variables are fundamentally independent ( x and y axes are orthogonal ), or that ab initio I can choose x independently of my choice of y ( prior to combining/restricting them within equations ). The issue of equations is then whether I have too many or not enough with respect to that variable set, taking into account whether or not some of the original equations are really morphed versions of others. For instance :

[pre] x - 2y = 6

-3x + 6y = -18[/pre]
actually represent the same relationship between the variables x and y. Any (x, y) pair that makes one equation true also matches the other ( ie. there are NO (x, y) pairs that fit one but not the other ).

Anyway, the key point is that I am recursing the same general method but with progressively smaller systems. So if I start with some large system, I will either eventually get to one solvable equation with one unknown OR fall over somewhere/sometime because of 0 = 0, 0 = 1 type of stuff. If successful I would then 'wind back out' of the recursion to get the full set of variable values which, by the mode of derivation, will satisfy ( make true ) each and every equation that I originally began with. Mind you, is anyone up for one second of LIGO photodetector data ie. recursion-inwards/winding-outwards of some 16384 levels ?!? :-)

Err .... probably not ..... but for the moment let's frame the problem differently. With matrices. My 2 by 2 system above can be expressed in matrix language without any loss of validity :

this is one 2 x 1 matrix on the LHS equal to another 2 x 1 matrix on the RHS. So that means the corresponding entries of each must be equal. Hence the two original equations have been internalised into this matrix equation but nothing has really changed otherwise. Are we cool with that? It's what you might call the 'row viewpoint' : two lines in the x-y plane intersecting. Or not. No solution corresponds to parallel separated lines. One solution - the case I am also calling 'solvable' - is two lines intersecting at a point. The third case is one line completely overlying the other - infinitely many solutions or if you like two equations describing the very same line :
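That three-way classification - one solution, none, or infinitely many - can be sketched with nothing more than the 2 x 2 determinant test. The function name is mine, purely for illustration :

```python
def classify(a, b, c, d, e, f):
    """Classify the system a*x + b*y = e, c*x + d*y = f by the row picture :
    two lines that cross once, never, or lie on top of each other."""
    det = a * d - b * c
    if det != 0:
        return "one solution"        # lines cross at a single point
    # det == 0 : parallel lines; the same line iff the RHS scales along too.
    if a * f == c * e and b * f == d * e:
        return "infinitely many"
    return "no solution"

print(classify(2, 3, 3, 1, 4, 3))      # our worked example: one solution
print(classify(1, -2, -3, 6, 6, -18))  # the 'disguised' pair: infinitely many
print(classify(1, -2, -3, 6, 6, 0))    # parallel, separated: no solution
```

The second call is exactly the x - 2y = 6 / -3x + 6y = -18 pair from above.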

From the above matrix representation there are more equally valid ways to proceed. A matter of perspective really.

Here is the 'transform viewpoint' :

where I have taken the matrix equation and expressed the LHS as a product of two matrices. I have factorised one matrix into two, much like 12 = 3 x 4. So now I have a 2 x 2 matrix multiplying a 2 x 1 matrix and so this can legitimately equal ( at least on matrix size grounds ) the single 2 x 1 matrix on the RHS. Frequently matrix stuff is expressed as the geometry of multi-dimensional vector spaces. In this mode of description we could say the 2 x 2 matrix is a transform or function acting on the vector [x y] to give the result [4 3]. The beginning and ending vector sizes are the same here, but generally we'd be going from an n dimensional space to an m dimensional space by using an m x n matrix ( remember an m x n matrix multiplied by an n x 1 matrix gives an m x 1 matrix ). So the question/issue becomes 'what vector should I start with such that it would be transformed to the specified RHS by the given matrix?' This is in the vein of simpler algebra : I could ask 'what number multiplied by 4 gives the result of 12?'

Now the 'column viewpoint' :

here again I have left the RHS alone, but split up the LHS 2 x 1 matrix into the sum of two 2 x 1 matrices and then for each of those factored out the x or y variable. In geometrical language I have a linear combination of two vectors summing to a third vector - all within the same vector space. I want to find those specific values of the coefficients x and y that correctly achieve the desired RHS sum :

which fails if the first two vectors are really co-linear or otherwise dependent ( including situations where one or both is the zero vector ) and the third is not along that common line. Now these three approaches/viewpoints - row, transform and column - are mathematically identical in that the total (x, y) solution set is unchanged regardless.

One advantage of the column approach ( thank you Gilbert Strang ) is that the visualising is easier to extend to higher dimensions. So if I have 17 equations in 17 unknowns then row-wise I have to envision the intersection of ..... err, what exactly? Hyper-planes? Hyper-hemi-cubes? Hyper-fizzwidgets? Whereas with the column view, increasing the dimensions just gives me more terms in the linear combination, with each term being a vector with more components. That's not as hard to get one's head around, and the lessons from lower dimensions translate.

Now the full technique for arbitrary dimensions/variables/equations is known as Gaussian ( see? The guy was into everything ) or row echelon reduction via elimination, and Gauss-Jordan or reduced row echelon form for elimination plus back substitution. Plus other monikers no doubt. They are mechanisations of the above principles. I'm not going to describe them in detail as we don't use them per se. Just remember that these algorithms give the N^2 computational cost for the DFT that we are avoiding by using the FFT.
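Though I'm not walking through the algorithm in prose, a bare-bones sketch of Gauss-Jordan may still be handy. Exact fractions again, and only the bare minimum of pivoting ( my own toy code, not a library routine ) :

```python
from fractions import Fraction as Fr

def gauss_jordan_solve(A, b):
    """Reduce the augmented matrix [A | b] to reduced row echelon form and
    read off the solution. A is n x n, b has length n. A singular system
    ( the 0 = 0 / 0 = 5 situations from the text ) raises StopIteration."""
    n = len(A)
    M = [[Fr(x) for x in row] + [Fr(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a non-zero pivot and swap it into place.
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        # Scale the pivot row so the pivot becomes 1 ( the 'Jordan' touch ).
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        # Clear the whole column : below ( Gauss ) and above ( Jordan ).
        for r in range(n):
            if r != col:
                m = M[r][col]
                M[r] = [x - m * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

print(gauss_jordan_solve([[2, 3], [3, 1]], [4, 3]))
# [Fraction(5, 7), Fraction(6, 7)] -- the same x = 5/7, y = 6/7 as before
```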

Cheers, Mike.

@Martin : [Quite an alternative ..... behind the mathematical hieroglyphs!]

That is the major rub with learning maths : to see beyond/behind the concise but precise notations that are the necessary evil of the topics.

I have made this letter longer than usual because I lack the time to make it shorter. - Blaise Pascal

The Plan Of Attack

So what do we want from the DFT/FFT when I express the problem in matrix terms? I want to start from this equation (A) :

and go to this equation (B), if all goes well :

Looking first at the LHS in (A) : the steps in the ( Gauss ) elimination phase have the effect of clearing to zero the elements in the lower left side of the F matrix. Then the ( Jordan ) back-substitution phase has the effect of clearing to zero the elements in the upper right side of the matrix. But the leading diagonal is preserved with non-zero values ( that Jordan converts to ones ):

Most importantly each and every individual manipulation of the matrix/system ( the usual 'atomic' computation is a multiple of one row subtracted from another ) in either phase can be represented as the product of some 'conversion' matrix times a matrix already representing the existing system state, which thus transitions to the new state of the system and ..... lather, rinse & repeat. Examples of 'conversion matrices' with 3 x 3 are :

but they will only do those specified row operations if they multiply on the left side of our current/target matrix ( they would alter entire columns at a time if done on the right ). The principle to observe in constructing these conversion matrices for some operation of interest is : take the identity matrix of whatever size and apply the desired operation to it.
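A quick numerical check of that principle : build the conversion matrix by applying the desired row operation to the identity, then watch it act on rows when multiplied on the left but on columns when multiplied on the right. Hand-rolled helpers, nothing standard assumed :

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Desired operation : subtract 2 * (row 0) from row 1. Apply it to I_3.
E = identity(3)
E[1] = [E[1][j] - 2 * E[0][j] for j in range(3)]   # [[1,0,0],[-2,1,0],[0,0,1]]

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

print(matmul(E, A))   # left multiply : row 1 becomes row 1 - 2 * row 0
print(matmul(A, E))   # right multiply : column 0 becomes col 0 - 2 * col 1
```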

Anyhows the whole Gauss-Jordan rigamarole ( the algorithm I didn't describe in detail ) can be written as :

where you see I will eventually get the identity matrix : which then multiplies my vector x of as yet unknown co-efficients to give just that vector alone on the LHS. There could be millions of steps on the LHS !! However if I faithfully repeat each and every step in exactly the same order but acting on the RHS I convert the vector b ( originally our set of physical measurement values ) to the desired Fourier co-efficient values which are appropriate for that series of measurements. As I have not broken my pledge to maintain equality ( LHS = RHS ) then whatever set of numbers constitute the components of the vector x in equation (A) are precisely the ones constituting x in equation (B).

The FFT will take a similar tactic ( but different actual steps ) .... to find a legitimate sequence of LHS conversions ultimately giving the identity, apply precisely the same to the RHS and out pops the transform vector with a fraction of the effort. SO FINALLY we will next begin to construct the Fourier matrix and study the patterns within. :-) !!

Cheers, Mike.


Meet The Fouriers

Here are the first few Fourier matrices with sizes being some power-of-two ....

... which are involved in the matrix versions of the following transform equations :

where I apologise for shifting the start of the matrix indices to zero. The formulae turn out nicer and thus it will be easier to explain certain patterns. I've deliberately used matrix/array notation for f and C, using square brackets [] that is, to emphasise their discrete nature. Examining F_N for N = 2^(something) turns out to be crucial for the FFT. We will find that the F_8 matrix can be expressed in terms of F_4 which in turn can be written as involving F_2 .... etc. So if I have say F_1024, that is 1024 = 2^10 data points, then I will progressively recurse the problem to successively lower powers : 1024 -> 512 -> 256 -> 128 -> 64 -> 32 -> 16 -> 8 -> 4 -> 2 . So here is where the (N/2)*log_base2_N performance comes in, as to get from 1024 to 512 via Gauss-Jordan I have to do 1024 - 512 = 512 steps. Now I can do that in one step!! :-) :-)
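To make the recursion concrete, here is a minimal radix-2 sketch set against the direct N^2 summation. I follow the text's +i sign convention and set the 1/SQRT(N) factors aside; the names are my own :

```python
import cmath

def dft(x):
    """Direct evaluation : N^2 multiply-adds. Sign convention follows the
    text ( +i in the exponent ); the 1/SQRT(N) factor is set aside."""
    N = len(x)
    return [sum(x[t] * cmath.exp(2j * cmath.pi * s * t / N) for t in range(N))
            for s in range(N)]

def fft(x):
    """Radix-2 Cooley-Tukey : recurse on the even- and odd-indexed halves,
    roughly (N/2) * log2(N) butterfly steps in total."""
    N = len(x)
    if N == 1:
        return x[:]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0] * N
    for s in range(N // 2):
        twiddle = cmath.exp(2j * cmath.pi * s / N) * odd[s]
        out[s] = even[s] + twiddle            # the 'butterfly' pair
        out[s + N // 2] = even[s] - twiddle
    return out

data = [1, 2, 3, 4, 0, 0, 0, 0]
assert all(abs(a - b) < 1e-9 for a, b in zip(dft(data), fft(data)))
print("direct and recursive answers agree")
```

The even/odd split of the input is exactly the permutation business discussed further below.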

If you want to use the FFT for some prime number like N = 37 then you're out of luck. You can involve the FFT where N has some factors not equal to two, but I'm only going to describe the generic Cooley-Tukey main case. Please note the convention I have used for the sign in the exponential terms - positive for the frequency domain -> time domain conversion - so that in strict terms the matrices above refer to the 'back-transform' and not the determination of Fourier transform co-efficients. I do that to be able to write i^some_power rather than writing (-i)^some_power in my demonstration .... relax, it's one of those convention things with the logic of the FFT being the same either way. :-)

So why do these matrices represent the Fourier equations? Let's check it out for N = 4. Firstly :

so i is the factor that will be raised to the power of s * t as we go along the rows and down the columns of F_4. Let's evaluate the Fourier transform equation for that case :

so I've looked at the signal function f for t = time = 0, 1, 2 and 3 : then evaluated what each means over the sum from s = 0, 1, 2 and 3. Obviously if s or t is zero then their product is zero and exp(0) = 1. That's why there are ones along the top and left edges of the Fourier matrices. Thus specifically :

Thus many of you may now see the reason for the possibly quirky seeming definition of matrix multiplication. Any row of F_4 times ( inner product with ) the C matrix is equal to the corresponding entry of the f matrix :
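As a quick numerical illustration of that 'row times column' reading ( 1/SQRT(N) factor set aside, and the coefficient vector C made up purely for the demonstration ) :

```python
# Row s of F_4 has entries i^(s*t); powers of i cycle with period 4, so a
# lookup table keeps everything exact. 1/SQRT(N) factor set aside here.
ipow = [1, 1j, -1, -1j]
N = 4
F4 = [[ipow[(s * t) % N] for t in range(N)] for s in range(N)]

# The top row and left column really are all ones ( s*t = 0 there ).
assert all(F4[0][t] == 1 for t in range(N))
assert all(F4[s][0] == 1 for s in range(N))

C = [1, 2, 0, -1]   # a made-up coefficient vector, purely for show

# Entry s of f is the inner product of row s of F_4 with C : that IS the
# definition of the matrix product F_4 * C.
f = [sum(F4[s][t] * C[t] for t in range(N)) for s in range(N)]
print(f)    # [2, (1+3j), 0, (1-3j)]
```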

You can think of matrix stuff as a way of abstracting the numbers from the specific context, while retaining the core relationships between the various quantities. This is why simultaneous equations just beg for a matrix approach, or if you like that matrices were invented to solve that set of problems.

Now that annoying looking factor of 1/SQRT(N) out front is there for a special reason. I'm using what is known as the unitary representation of the F matrices. If I do the inner product of the first row with the first column of F_4 then I get :

But if I do the inner product of the first row with the second column of F_4 then I get :

... in other words these unitary matrices have orthonormality applying within them. Inner product with the same row/column index is 1, but with different row/column indices is 0. Remembering that when we do the inner product for complex matrices we use the Hermitian conjugate, with the 1/SQRT(N) in this case scaling matters to 1 ( because SQRT(N)*SQRT(N) = N ). It turns out that the inverse of a Fourier matrix is its Hermitian conjugate, and you will also note that a Fourier matrix is symmetric ( swapping rows for columns gives the same matrix, or the matrix is its own mirror image along the leading diagonal ) :
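Those orthonormality and symmetry claims are easy to check numerically for F_4. The helpers here are my own sketch, not any standard library :

```python
# F_4 in its unitary dress : entries i^(s*t) / SQRT(4) = i^(s*t) / 2.
ipow = [1, 1j, -1, -1j]
N = 4
F = [[ipow[(s * t) % N] / 2 for t in range(N)] for s in range(N)]

def inner(u, v):
    # Hermitian inner product : conjugate one side before summing.
    return sum(a.conjugate() * b for a, b in zip(u, v))

cols = [[F[s][t] for s in range(N)] for t in range(N)]
assert abs(inner(cols[0], cols[0]) - 1) < 1e-12   # same index -> 1
assert abs(inner(cols[0], cols[1])) < 1e-12       # different index -> 0

# Symmetric : equal to its own transpose.
assert all(F[s][t] == F[t][s] for s in range(N) for t in range(N))

# Inverse is the Hermitian conjugate : F * F^H = I.
FH = [[F[t][s].conjugate() for t in range(N)] for s in range(N)]
P = [[sum(F[i][k] * FH[k][j] for k in range(N)) for j in range(N)]
     for i in range(N)]
assert all(abs(P[i][j] - (1 if i == j else 0)) < 1e-12
           for i in range(N) for j in range(N))
print("F_4 checks out as unitary and symmetric")
```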

Beware : there is a related and potentially confusing usage of the word 'Hermitian' as in : 'the matrix W is Hermitian'. This means that W is equal to its Hermitian conjugate. The Fourier matrices are symmetric - equal to their own transposes - but are generally not equal to their Hermitian conjugates. Thus F_N has an Hermitian conjugate. But F_N is not equal to that conjugate, and is thus not an Hermitian matrix.

OK. That'll do for today. Having shown you how the Fourier matrices are built, and explained why, I still have some further justifying to do. Next time .... discretising data records ..... the problem of signal aliasing ..... and The Roots of Unity. :-) :-)

Cheers, Mike.


Slice N Dice

For a given/chosen data record, within which lies some waveform of interest, we subdivide the time axis into N equal time intervals. Hypothetically for example, a LIGO photodetector setup could give you 16384 of such equal intervals per second, thus with each interval being 1/16384 th of a second long. If I'm looking for a 512Hz GW signal in that data set then that would refer to 16384/512 = 32 data points per signal cycle. Fortunately this turns out to be way more than one needs for that signal frequency !! As a general rule of thumb* the required number of samples N to maintain/establish fidelity goes like :

N = 2 * B * L

or

sampling rate = N/L = 2 * B

where B is the desired 'bandwidth' or frequency-range-width you want to examine and L is the length of the time interval that you have available data for. Clearly if you have a longer interval of time then to keep the fidelity constant one has to have proportionally more data points. For our purposes one can think of the bandwidth as from zero Hertz to the frequency of the highest pitch signal that you're after. Naturally you're welcome to have more than N, but if you have less then you can't confidently reconstruct the signal from the transform while also wanting certain frequencies preserved ( there's a long-ish mathematical path showing that if you start with say M < N samples then you can't uniquely pin down harmonic 1, plus however many consecutive higher harmonics are needed to make up the set of N ). This is why I have been mostly talking of square matrices, those having some hope of being ( left and right ) invertible.
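A small numerical illustration of the rule of thumb, and of what goes wrong just past the bandwidth edge : a tone above the covered band lands, sample for sample, exactly on top of a low tone. The specific numbers are made up for the demonstration :

```python
import math

B = 512.0           # bandwidth of interest, Hz ( up to the highest pitch sought )
L = 1.0             # length of the record, seconds
N = int(2 * B * L)  # rule-of-thumb sample count : 1024 samples

rate = N / L        # 1024 Hz sampling rate

# A tone above the covered band ( rate - 100 = 924 Hz ) is indistinguishable,
# at these sample instants, from a 100 Hz tone : classic aliasing.
hi, lo = rate - 100.0, 100.0
for t in range(N):
    s_hi = math.sin(2 * math.pi * hi * t / rate)
    s_lo = math.sin(-2 * math.pi * lo * t / rate)
    assert abs(s_hi - s_lo) < 1e-6
print("a", hi, "Hz tone aliases onto", lo, "Hz at", rate, "Hz sampling")
```

The identity behind it : sin(2*PI*(rate - lo)*t/rate) = sin(2*PI*t - 2*PI*lo*t/rate), and whole turns of 2*PI*t vanish at integer sample indices.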

The Nth Roots of Unity

Let's have a look at this lad that gets relentlessly multiplied within the Fourier matrix :

which for conversation I've given the name of q with an N subscript to remind us that it depends on N. You can immediately see that it is a complex number with magnitude of 1 and phase 2*PI/N. As 2*PI is a full circuit of arc angle measure then this phase is 1/N th of that. Following through on the algebra you see that q_N is the Nth root of 1, or :

where I apologise for using N as both super- and sub- script. The upper is a power to be taken, the lower is a reminder. Graphically :

so it's no more mysterious than saying 'if I rotate 1/N th of a circle at a time, then I have to do N such rotations to achieve one circuit'. However this is also true :

meaning that if I take a multiple of the above fundamental Nth root = 2*PI/N then that will also be an Nth root. So 'if I rotate m/N th of a circle at a time, then when I do N such rotations I will achieve m such circuits, which is the same as one circuit'.

So how many distinct Nth roots of 1 are there? N of them of course ..... and I choose to index them from 0 through to N - 1 ... :-)
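Checking those claims about the Nth roots takes only a few lines; N = 8 is chosen purely for illustration :

```python
import cmath

N = 8
q = cmath.exp(2j * cmath.pi / N)      # magnitude 1, phase 2*PI/N

# Powers q^m are ALL Nth roots of 1 : rotating m/N th of a circle, N times
# over, makes m whole circuits. Exactly N distinct ones, m = 0 .. N - 1.
roots = [q ** m for m in range(N)]
assert all(abs(r ** N - 1) < 1e-9 for r in roots)
assert len({(round(r.real, 6), round(r.imag, 6)) for r in roots}) == N

# Magnitude 1 and phase 2*PI/N, just as claimed.
assert abs(abs(q) - 1) < 1e-12
assert abs(cmath.phase(q) - 2 * cmath.pi / N) < 1e-12

# The 'duplicity' to come : squaring a 2N-th root lands on an N-th root.
q16 = cmath.exp(2j * cmath.pi / (2 * N))
assert abs(q16 ** 2 - q) < 1e-12
print("all", N, "roots verified")
```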

I know this all sounds a bit suspect/obtuse, but it is quite valid, and indeed the FFT is going to hinge/pivot on there being a spot of duplicity here !! :-) :-)

Cheers, Mike.

* If you look closely at this, it is ( yet another version of ) The Uncertainty Principle in disguise. Indeed there are terrific and deep correspondences b/w quantum and information theories ...

** The exact rate I think is 44.1 KHz which is more of a technical point or historical precedent related to initial analogue to digital music conversions. Under-sampling of music with respect to playback intentions will cause rotten distortions, as high frequencies are returned that weren't there in the original : being inserted as an aliasing artifact of lower -> higher frequency components.

Also your favorite video game - Need For Greedy Auto Theft - has to cope with equivalent problems as the regularities/granularities differ between the hardware viewing device ( generally discrete/raster ) and the software 'model'.

Plus on these simple calculations/grounds the 16384Hz sampling rate at LIGO well covers signals up to 8192Hz. I don't think we at E@H are going anywhere near that. There are LIGO groups which deal with sharper/transient stuff, thus not our continuous waves, and they will certainly need the extra detail in the records to examine higher derivatives/frequencies.

*** Well 'who aliases who' is a question of perspective - what do we consider as 'original' or 'real' - but the math is there either way.

**** This is one of those leaps in the explanation because (a) the theorems are fiddly (b) you'll be fine without it and (c) life is too short !! :-)


Cooley-Tukey Magic

A factorisation of the Fourier matrix was outlined in the 1960's ( but thought up in the Bell labs in the 50's ?? ) where the factors could be expressed in terms of one power of 2 lower* :

.... here I've used a 'block matrix' notation which is a way of outlining a sub-matrix within a bigger matrix. You don't actually change any matrix entries but simply mark out suburbs within the larger city. It's a complicated looking sucker for sure, but when fully recursed it ultimately achieves the separation of variables that we seek! The reason why this factorisation works is simply that a double rotation ( on the Argand plane ) of a higher frequency component, say exp[i*PI/4], is equal to a single rotation of one which is half the frequency eg. exp[i*PI/2] - this being the 'duplicity' that I referred to earlier.

The above product of three matrices on the right actually contains the same information as the single one on the left. But it's more spread out, hence all the extra zeroes popping up in the identity I_N, the diagonal D_N, the zero matrix O_N and the permutation matrix. For all that apparent extra mess you do get to see clearly where the recursive elements lie : the exact relationship b/w F_2N and F_N. Think of it like the mathematical version of a frog dissection in biology class !! :-)

Here is the worked out case for F_4 to F_2 :

What is quite important are all those areas in the matrices which contain zeroes. In effect these somewhat de-couple what is happening between the 'burbs' eg. in block matrix notation :

Essentially it's the D_N times the F_N which recovers the correct odd powers of the exponential factors for F_2N. F_N already had the even powers required for F_2N ( look at the roots of unity circle again : 'PI/2 is not only a fourth root of 1, it is also an eighth root too ...' ). The permutation matrix applied on the right-most side sorts the columns into the correct order for equality with F_2N.
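Here is a numerical check of the factorisation for the F_4 to F_2 case : the three factors written out as full 4 x 4 matrices, with the +i convention and the 1/SQRT(N) factors set aside as per the footnote. My own toy helpers again :

```python
# Check of the Cooley-Tukey block factorisation
#   F_4 = [[I, D], [I, -D]] * diag(F_2, F_2) * P
ipow = [1, 1j, -1, -1j]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

F4 = [[ipow[(s * t) % 4] for t in range(4)] for s in range(4)]

left = [[1, 0, 1, 0],       # [[I_2,  D_2],
        [0, 1, 0, 1j],      #  [I_2, -D_2]]  with D_2 = diag(1, i)
        [1, 0, -1, 0],
        [0, 1, 0, -1j]]
middle = [[1, 1, 0, 0],     # diag(F_2, F_2), with F_2 = [[1,1],[1,-1]]
          [1, -1, 0, 0],
          [0, 0, 1, 1],
          [0, 0, 1, -1]]
perm = [[1, 0, 0, 0],       # P : even-indexed entries first, then odd
        [0, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1]]

assert matmul(matmul(left, middle), perm) == F4
print("F_4 factorises through F_2")
```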

Let's look at what happens to these three main matrix factors as we descend from some large N ( a power of 2 ) down to 1. Firstly the left-most matrix :

next the middle matrix :

in matrix terminology they become more sparse - a greater fraction of contained zeroes - which certainly simplifies their handling. The middle matrix goes towards the identity -> because F_1 = [1] ! The right most matrix doesn't become sparser - it 'weaves' or permutes the rows of whatever matrix lies to its right :

but since this factorisation exists within a situation like this :

then one can also view the permutations as acting on the columns of any matrix lying to its left. But the FFT is usually described as first even/odd permuting the input vector ( b in this instance ) before applying the bulk of the transform. Therein lies an interesting pattern of behaviour** :

In the real world of application programming one doesn't have to use a matrix representation internal to the code. However an 'as if' scenario*** would apply whereby the output of a given ( FFT ) library call reflects what I've roughly outlined here - the Mickey Mouse Version. I really don't know the FFT/API syntax but I could well imagine a high level call involving two pointers ( input and output array/buffer/vector locations ) and a size ( what value of N applies ). There would be tremendous scope for hard-coding/pre-computation if one was confident in advance what size transform was applicable. What I should have emphasised earlier is that an unfactored Fourier matrix never contains zeroes, and more so is always invertible ( another theorem states that each matrix factor of that must therefore be invertible too ... )

However I could imagine some difficulties for a given length/type of number representation within an implementing device - specifically the upper & lower limits upon the chosen floating point representation. For instance : would we want 2*PI/N for some large N ? If so, when would that 'bottom out' and/or lose sufficient accuracy due to the representation's bit length? I'm sure there's a bunch of some library code that manages the exceptions on that .....

So flick-flacking through 16384 data points doesn't seem quite so bad now ?? :-) :-)

We are pretty well at the 'heart of the maze' that I've been aiming at for some months. I haven't finished yet, as there's some other interesting stuff to put on display ... generally rather easier material by the way, as we've done some pretty hard core topics ... and you can tell by now that I could go on indefinitely! If you've followed my intellectual pathway ( to any degree !! ) then you've seen my interpretation of/detail upon the raison d'etre for E@H - crunching of numbers on a massive/supercomputer scale, a humongous mathematical mill if you like, to assist the scientists in teasing out any patterns within the real world of scientific observations. In this case either gravitational or electromagnetic signals.

Cheers, Mike.

( edit ) A particular reason why the Fourier matrices are invertible is that the columns ( or rows for that matter ) are linearly independent. This goes way back to our discussion about an orthonormal basis for a vector space.

( edit ) I didn't want to imply that a matrix without zeroes was invertible for that reason alone. For instance the identity matrix has plenty of zeroes with columns ( rows ) that are an orthonormal basis. I've been reading about other choices of basis functions - Haar wavelets - that give quicker convergence onto sharp features than Fourier. Haar wavelets split the shapes/functions into what is average vs what is different and hence home in better on steps and jaggy bits. Sharp punters may have noted under/overshoot ( the Gibbs phenomenon ) in the convergence to the corners of the box shape of the square wave. That ~ 17% under/overshoot simply moves from any given position as you add in more series terms .... presumably it dies away in the vast reaches of infinity. It's related primarily to the discontinuity ( the cliff ) that we are trying to model with a smooth function ( sinusoids ) approximation. For the curious the key analytic feature is pointwise vs uniform convergence.

( edit ) On re-reading I've also realised that we may have a "Who's On First Base ?" type situation with my wording about permutation matrices and sides. If

A * P

is the case then P permutes the columns of A. If

P * A

is the scenario then P permutes the rows of A.

* putting aside those pesky 1/SQRT(N) factors for today ... :-)

** What is doubly interesting is that the very same re-arrangement of the identity matrix gives a column permutation pattern during multiplication on one side which is the reverse of the row permutation pattern that occurs with multiplication on the other side. So I could have written a fairly similar table but using column indices instead. For example : if I have a 4 x 4 permutation matrix that shuffles rows-on-the-right in the order 1 -> 2 -> 3 -> 4 -> 1, then it will shuffle columns-on-the-left in the order 1 -> 4 -> 3 -> 2 -> 1 ..... ah patterns, patterns and patterns. :-)
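That reversal is easy to watch happen with a 4 x 4 cyclic permutation matrix. Hand-rolled helper, and note the indices run from zero here rather than the one-based counting in the footnote :

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# A cyclic permutation matrix : row i of (P * A) is row (i+1) mod 4 of A.
P = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [1, 0, 0, 0]]

A = [[10 * i + j for j in range(4)] for i in range(4)]   # distinct entries

PA = matmul(P, A)
assert PA == [A[1], A[2], A[3], A[0]]   # rows cycled one way ...

AP = matmul(A, P)
cols = [[AP[i][j] for i in range(4)] for j in range(4)]
orig = [[A[i][j] for i in range(4)] for j in range(4)]
assert cols == [orig[3], orig[0], orig[1], orig[2]]   # ... columns the OTHER way
print("row cycle one way, column cycle the reverse")
```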

*** For instance I couldn't seriously imagine anyone actually/earnestly using a matrix multiplication routine to swap two matrix rows - surely you'd just exchange a pointer pair or somesuch.


* There's nearly always an identity function/operator in a system, which has the effect of turning something into itself, for every something, thus everything is an eigen-something with eigenvalue one. You might want to guess what the eigenvalues would be for a system within which signals die away, or amplify without bound, or oscillate. I've assumed/implied the eigenfunctions/vectors here are mutually orthogonal too .... that doesn't have to generally hold ie. eigen-whatever doesn't necessarily imply either orthogonal or basis behaviour. But very frequently they are. A linear combination of eigen-somethings ( eg. Fourier transforms ) doesn't have to be an eigen-something.

In my career, I used some calculations related to signals that increased or died away at an exponential rate, by setting the frequencies in the more common calculation methods to complex numbers instead of real numbers.


The area of solution methods for all manner of differential equations is huge; some equations look deceptively simple to write but their solutions may only have numeric ( as opposed to analytic ) representations. Or one can 'discover' new functions relying on the defining property that the given function solves a certain equation. The eigenvalue/eigenfunction approach may be used to find 'key' solutions from which one can construct others. As an extremely general hand-waving comment : the solutions may change character as the eigenvalues (a) change sign, (b) their magnitude is greater-than/less-than/equal-to unity, and (c) their multiplicity changes ( how many times a specific eigenvalue is repeated -> rather complicating matters ). The Fourier basis set ( complex exponential functions upon suitable domains ) has the lovely property of orthonormality, and thus a neat correspondence/abstraction with vector spaces becomes available.

Cheers, Mike.


## Solving Several Equations OK.

)

Solving Several Equations OK. Recall that multiplying an equation by the same constant preserves LHS = RHS, ditto for adding/subtracting two equations. Combining that behaviour I can add a multiple of one equation to another and still preserve the truth/equality of both. This is the key procedure in what follows. Lets look at two equations in two unknowns :

[pre] 2x + 3y = 4 ..... eq_1

3x + y = 3 ..... eq_2[/pre]

Thus we want to find the particular values of x and y, if any, that will make both of these equations true simultaneously. Subtract 3/2 * eq_1 from eq_2 :

[pre] 3x - (3/2)2x + y - (3/2)3y = 3 - (3/2)4

then simplify :

3x - 3x + y - (9/2)y = 3 - 6

0x + (-7/2)y = -3

y = (-3)(-2/7) = 6/7[/pre]

thus by a suitable choice of multiplier ( you can relate the factor of 3/2 to the coefficients of x in each equation ) for eq_1 I have converted to a system of one equation + one unknown and - since I didn't come across either 0 = 0 or something like 0 = 5 along the way - I can solve for that remaining variable. With that variable I go back to eq_1 and substitute ( place the value of the variable where the symbol is ) to get the other one :

[pre] 2x + 3(6/7) = 4

simplify :

2x = 4 - 18/7 = 28/7 - 18/7 = 10/7

x = 5/7[/pre]

Then to be pure of heart one ought to ( consistency ) check by putting both x = 5/7 and y = 6/7 into eq_2 :

[pre]3(5/7) + 6/7 = (15 + 6)/7 = 21/7 = 3 OK!![/pre]

[ I've deliberately chosen non-neat number solutions to emphasise that it is the indexing which is discrete not the values - the count of the number of variables is not 1.5, nor do we have PI equations. Integers greater than zero only. ]

The general process is forward elimination followed by back substitution. If I had three equations in three unknowns I would, say, suitably combine eq_1 and eq_2 to eliminate a variable then appropriately combine eq_1 and eq_3 to likewise eliminate that same variable : now I have two equations in two unknowns ( the remaining two variables that I didn't eliminate ). With that reduced system of two equations I then .... etc.

If one equation is actually a combination of the others 'in disguise', or dependent - there is some linear combination that connects the lot - then I won't succeed. In the more general case where I have more equations than unknowns, or vice-versa, I'll usually have to cope with the likes of 0 = 4 ( whoops, I have no solutions whatsoever ) or 0 = 0 ( thus infinitely many solutions ) more often. We are taking it as read that the variables are fundamentally independent ( x and y axes are orthogonal ), or that ab initio I can choose x independently of my choice of y ( prior to combining/restricting them within equations ). The issue with the equations is then whether I have too many or not enough with respect to that variable set, taking into account whether or not some of the original equations are really morphed versions of others. For instance :

[pre] x - 2y = 6

-3x + 6y = -18[/pre]

actually represent the same relationship between the variables x and y. Any (x, y) pair that makes one equation true also matches the other ( ie. there are NO (x, y) pairs that fit one but not the other ).

Anyway, the key point is that I am recursing the same general method but with progressively smaller systems. So if I start with some large system, I will either eventually get to one solvable equation with one unknown OR fall over somewhere/sometime because of 0 = 0, 0 = 1 type of stuff. If successful I would then 'wind back out' of the recursion to get the full set of variable values which, by the mode of derivation, will satisfy ( make true ) each and every equation that I originally began with. Mind you, is anyone up for one second of LIGO photodetector data ie. recursion-inwards/winding-outwards of some 16384 levels ?!? :-)

Err .... probably not ..... but for the moment let's frame the problem differently. With matrices. My 2 by 2 system above can be expressed in matrix language without any loss of validity :

[pre] [ 2x + 3y ]   [ 4 ]
 [ 3x +  y ] = [ 3 ][/pre]

this is one 2 x 1 matrix on the LHS equal to another 2 x 1 matrix on the RHS. So that means the corresponding entries of each must be equal. Hence the two original equations have been internalised into this matrix equation but nothing has really changed otherwise. Are we cool with that? It's what you might call the 'row viewpoint' : two lines in the x-y plane intersecting. Or not. No solution corresponds to parallel separated lines. One solution - the case I am also calling 'solvable' - is two lines intersecting at a point. The third case is one line completely overlying the other - infinitely many solutions, or if you like two equations describing the very same line.

From the above matrix representation there are more equally valid ways to proceed. A matter of perspective really.

Here is the 'transform viewpoint' :

[pre] [ 2  3 ] [ x ]   [ 4 ]
 [ 3  1 ] [ y ] = [ 3 ][/pre]

where I have taken the matrix equation and expressed the LHS as a product of two matrices. I have factorised one matrix into two, much like 12 = 3 x 4. So now I have a 2 x 2 matrix multiplying a 2 x 1 matrix and so this can legitimately equal ( at least on matrix size grounds ) the single 2 x 1 matrix on the RHS. Frequently matrix stuff is expressed as the geometry of multi-dimensional vector spaces. In this mode of description we could say the 2 x 2 matrix is a transform or function acting on the vector [x y] to give the result [4 3]. The beginning and ending vector sizes are the same here, but generally we'd be going from an n dimensional space to an m dimensional space by using an m x n matrix ( remember an m x n matrix multiplied by an n x 1 matrix gives an m x 1 matrix ). So the question/issue becomes 'what vector should I start with such that it would be transformed to the specified RHS by the given matrix?' This is in the vein of simpler algebra : I could ask 'what number multiplied by 4 gives the result of 12?'

Now the 'column viewpoint' :

[pre]   [ 2 ]     [ 3 ]   [ 4 ]
 x [ 3 ] + y [ 1 ] = [ 3 ][/pre]

here again I have left the RHS alone, but split up the LHS 2 x 1 matrix into the sum of two 2 x 1 matrices and then for each of those factored out the x or y variable. In geometrical language I have a linear combination of two vectors summing to a third vector - all within the same vector space. I want to find those specific values of the coefficients x and y that correctly achieve the desired RHS sum :

which fails if the first two vectors are really co-linear or otherwise dependent ( including situations where one or both is the zero vector ) and the third is not along that common line. Now these three approaches/viewpoints - row, transform and column - are mathematically identical in that the total (x, y) solution set is unchanged regardless.
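To see the transform and column viewpoints agree numerically - a throwaway Python check on our little system, nothing more :

```python
from fractions import Fraction as F

# The solution found earlier:
x, y = F(5, 7), F(6, 7)

A = [[2, 3], [3, 1]]        # the coefficient matrix
b = [F(4), F(3)]            # the RHS vector

# Transform viewpoint: the matrix A maps the vector (x, y) onto b.
Av = [A[0][0] * x + A[0][1] * y,
      A[1][0] * x + A[1][1] * y]
assert Av == b

# Column viewpoint: x times column one plus y times column two sums to b.
col1, col2 = [F(2), F(3)], [F(3), F(1)]
combo = [x * col1[i] + y * col2[i] for i in range(2)]
assert combo == b           # the very same numbers, seen two ways
```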

One advantage of the column approach ( thank you Gilbert Strang ) is that visualising is easier to extend to higher dimensions. So if I have 17 equations in 17 unknowns then row-wise I have to envision the intersection of ..... err, what exactly? Hyper-planes? Hyper-hemi-cubes? Hyper-fizzwidgets? Whereas with the column view then increasing the dimensions gives me more terms in the linear combination, and each term is a vector with more components. Not as hard to get one's head around and translate the lessons from lower dimensions.

Now the full technique for arbitrary dimensions/variables/equations is known as Gaussian ( see? The guy was into everything ) or row echelon reduction via elimination, and Gauss-Jordan or reduced row echelon form for elimination plus back substitution. Plus other monikers no doubt. They are mechanisations of the above principles. I'm not going to describe them in detail as we don't use them per se. Just remember that these algorithms give the N^2 computational cost for the DFT that we are avoiding by using the FFT.

Cheers, Mike.

@Martin : [Quite an alternative ..... behind the mathematical hieroglyphs!]

That is the major rub with learning maths : to see beyond/behind the concise but precise notations that are the necessary evil of the topics.

I have made this letter longer than usual because I lack the time to make it shorter. Blaise Pascal

## The Plan Of Attack

So what do we want from the DFT/FFT when I express the problem in matrix terms? I want to start from this equation (A) :

and go to this equation (B), if all goes well :

Looking first at the LHS in (A) : the steps in the ( Gauss ) elimination phase have the effect of clearing to zero the elements in the lower left side of the F matrix. Then the ( Jordan ) back-substitution phase has the effect of clearing to zero the elements in the upper right side of the matrix. But the leading diagonal is preserved with non-zero values ( that Jordan converts to ones ):

Most importantly each and every individual manipulation of the matrix/system ( the usual 'atomic' computation is a multiple of one row subtracted from another ) in either phase can be represented as the product of some 'conversion' matrix times a matrix already representing the existing system state, which thus transitions to the new state of the system and ..... lather, rinse & repeat. Examples of 'conversion matrices' with 3 x 3 are :

but they will only do those specified row operations if they multiply on the left side of our current/target matrix ( they would alter entire columns at a time if done on the right ). The principle to observe in constructing these conversion matrices for some operation of interest is : take the identity matrix of whatever size and apply the desired operation to it.
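A quick Python illustration of that principle - build the conversion matrix by doing the desired row operation to the identity, then left-multiply ( my toy example, with made-up numbers ) :

```python
# 'Take the identity matrix and apply the desired operation to it.'
# Here: the 3 x 3 conversion matrix for row_2 <- row_2 - 2 * row_0.

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(inner))
             for j in range(cols)] for i in range(rows)]

E = identity(3)
E[2][0] = -2          # apply the row operation to I itself

A = [[1, 2, 3],
     [4, 5, 6],
     [2, 4, 7]]

print(matmul(E, A))   # left-multiplying performs that row operation on A
```

Left-multiplying by E leaves rows 0 and 1 alone and replaces row 2 with row 2 minus twice row 0 - exactly the 'atomic' elimination step.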

Anyhows the whole Gauss-Jordan rigamarole ( the algorithm I didn't describe in detail ) can be written as :

where you see I will eventually get the identity matrix : which then multiplies my vector x of as yet unknown co-efficients to give just that vector alone on the LHS. There could be millions of steps on the LHS !! However if I faithfully repeat each and every step in exactly the same order but acting on the RHS I convert the vector b ( originally our set of physical measurement values ) to the desired Fourier co-efficient values which are appropriate for that series of measurements. As I have not broken my pledge to maintain equality ( LHS = RHS ) then whatever set of numbers constitute the components of the vector x in equation (A) are precisely the ones constituting x in equation (B).
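Here is that lockstep idea in Python ( a toy run on our little 2 x 2 system, not how real libraries do it ) - every row operation hits the matrix and the RHS together, and once the LHS has become the identity the solution is just sitting in the RHS :

```python
from fractions import Fraction as F

A = [[F(2), F(3)], [F(3), F(1)]]   # the 2 x 2 system from earlier
b = [F(4), F(3)]

def rowop(M, v, i, j, m):
    """row_i <- row_i - m * row_j, applied to matrix AND RHS in lockstep."""
    M[i] = [a - m * c for a, c in zip(M[i], M[j])]
    v[i] = v[i] - m * v[j]

def rowscale(M, v, i, s):
    """row_i <- s * row_i, again on both sides."""
    M[i] = [s * a for a in M[i]]
    v[i] = s * v[i]

rowop(A, b, 1, 0, F(3, 2))     # Gauss: eliminate x from eq_2
rowscale(A, b, 1, F(-2, 7))    # scale the pivot to 1
rowop(A, b, 0, 1, F(3))        # Jordan: eliminate y from eq_1
rowscale(A, b, 0, F(1, 2))     # scale the pivot to 1

print(A)   # the identity matrix
print(b)   # [Fraction(5, 7), Fraction(6, 7)] -- the solution pops out
```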

The FFT will take a similar tactic ( but different actual steps ) .... find a legitimate sequence of LHS conversions ultimately giving the identity, apply precisely the same to the RHS, and out pops the transform vector with a fraction of the effort. SO FINALLY we will next begin to construct the Fourier matrix and study the patterns within. :-) !!

Cheers, Mike.


## Meet The Fouriers

Here are the first few Fourier matrices with sizes being some power-of-two ....

... which are involved in the matrix versions of the following transform equations :

where I apologise for shifting the start of the matrix indices to zero. The formulae turn out nicer and thus it will be easier to explain certain patterns. I've deliberately used matrix/array notation for f and C, using square brackets [] that is, to emphasise their discrete nature. Examining F_N for N = 2^(something) turns out to be crucial for the FFT. We will find that the F_8 matrix can be expressed in terms of F_4 which in turn can be written as involving F_2 .... etc. So if I have say F_1024, that is 1024 = 2^10 data points, then I will progressively recurse the problem to successively lower powers : 1024 -> 512 -> 256 -> 128 -> 64 -> 32 -> 16 -> 8 -> 4 -> 2 . So here is where the (N/2)*log_base2_N performance comes in, as to get from 1024 to 512 via Gauss-Jordan I have to do 1024 - 512 = 512 steps. Now I can do that in one step!! :-) :-)
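For the concrete-minded, here is the F_N recipe in a few lines of Python ( using the positive-sign, unitary convention adopted here - my sketch, not any library's API ) :

```python
import cmath
import math

def fourier(N):
    """The N x N unitary Fourier matrix as used in these posts:
    entry (s, t) = q^(s*t) / sqrt(N), with q = exp(2*pi*i/N)
    (positive-sign convention, so strictly the back-transform)."""
    q = cmath.exp(2j * math.pi / N)
    return [[q ** (s * t) / math.sqrt(N) for t in range(N)]
            for s in range(N)]

F4 = fourier(4)
# q^0 = 1, so the top row and left column are all 1/sqrt(N) = 0.5 here
assert all(abs(F4[0][t] - 0.5) < 1e-12 for t in range(4))
assert all(abs(F4[s][0] - 0.5) < 1e-12 for s in range(4))
```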

If you want to use the FFT for some prime number like N = 37 then you're out of luck. You can involve the FFT where N has some factors not equal to two, but I'm only going to describe the generic Cooley-Tukey main case. Please note the convention I have used for the sign in the exponential terms - positive for the frequency domain -> time domain conversion - so that in strict terms the matrices above refer to the 'back-transform' and not the determination of Fourier transform co-efficients. I do that to be able to write i^some_power rather than writing (-i)^some_power in my demonstration .... relax, it's one of those convention things with the logic of the FFT being the same either way. :-)

So why do these matrices represent the Fourier equations? Let's check it out for N = 4. Firstly :

so i is the factor that will be raised to the power of s * t as we go along the rows and down the columns of F_4. Let's evaluate the Fourier transform equation for that case :

so I've looked at the signal function f for t = time = 0, 1, 2 and 3 : then evaluated what each means over the sum from s = 0, 1, 2 and 3. Obviously if s or t is zero then their product is zero and exp(0) = 1. That's why there are ones along the top and left edges of the Fourier matrices. Thus specifically :

Thus many of you may now see the reason for the possibly quirky seeming definition of matrix multiplication. Any row of F_4 times ( inner product with ) the C matrix is equal to the corresponding entry of the f matrix :
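A throwaway numerical check that row-times-column multiplication really is the Fourier sum ( the coefficient values C below are made up purely for the test ) :

```python
import cmath
import math

N = 4
q = cmath.exp(2j * math.pi / N)
F4 = [[q ** (s * t) / math.sqrt(N) for t in range(N)] for s in range(N)]

C = [1 + 0j, 2 - 1j, 0.5j, -1 + 0j]   # invented coefficients for the demo

# Matrix-vector product: each row of F_4 inner-producted with C ...
f = [sum(F4[s][t] * C[t] for t in range(N)) for s in range(N)]

# ... matches the explicit Fourier sum, entry by entry.
for s in range(N):
    direct = sum(C[t] * cmath.exp(2j * math.pi * s * t / N)
                 for t in range(N)) / math.sqrt(N)
    assert abs(f[s] - direct) < 1e-12
```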

You can think of matrix stuff as a way of abstracting the numbers from the specific context, while retaining the core relationships between the various quantities. This is why simultaneous equations just beg for a matrix approach, or if you like that matrices were invented to solve that set of problems.

Now that annoying looking factor of 1/SQRT(N) out front is there for a special reason. I'm using what is known as the unitary representation of the F matrices. If I do the inner product of the first row with the first column of F_4 then I get :

But if I do the inner product of the first row with the second column of F_4 then I get :

... in other words these unitary matrices have orthonormality applying within them. Inner product with the same row/column index is 1, but with different row/column indices is 0. Remember that when we do the inner product for complex matrices we use the Hermitian conjugate, with the 1/SQRT(N) in this case scaling matters to 1 ( because SQRT(N)*SQRT(N) = N ). It turns out that the inverse of a Fourier matrix is its Hermitian conjugate, and you will also note that a Fourier matrix is symmetric ( swapping rows for columns gives the same matrix, or the matrix is its own mirror image along the leading diagonal ) :

Beware : there is a related and potentially confusing usage of the word 'Hermitian', as in : 'the matrix W is Hermitian'. This means that W is equal to its Hermitian conjugate. The Fourier matrices are symmetric - equal to their own transposes - but are generally not equal to their Hermitian conjugates. Thus F_N has an Hermitian conjugate, but F_N is not equal to that conjugate, and is thus not an Hermitian matrix.
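Both claims - orthonormality, and symmetric-but-not-Hermitian - can be checked numerically in a few lines ( again my throwaway Python, for N = 4 ) :

```python
import cmath
import math

N = 4
q = cmath.exp(2j * math.pi / N)
FN = [[q ** (s * t) / math.sqrt(N) for t in range(N)] for s in range(N)]

# Hermitian conjugate: transpose, then complex-conjugate every entry.
FH = [[FN[t][s].conjugate() for t in range(N)] for s in range(N)]

# FN times its Hermitian conjugate is the identity (orthonormality).
for i in range(N):
    for j in range(N):
        v = sum(FN[i][k] * FH[k][j] for k in range(N))
        assert abs(v - (1 if i == j else 0)) < 1e-12

# FN is symmetric ...
assert all(abs(FN[i][j] - FN[j][i]) < 1e-12
           for i in range(N) for j in range(N))
# ... but NOT Hermitian: it differs from its own conjugate transpose.
assert any(abs(FN[i][j] - FH[i][j]) > 0.5
           for i in range(N) for j in range(N))
```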

OK. That'll do for today. Having shown to you and explained how Fourier matrices are built I still have some further justifying to do. Next time .... discretising data records ..... the problem of signal aliasing ..... and The Roots of Unity. :-) :-)

Cheers, Mike.


## Slice N Dice

For a given/chosen data record, within which lies some waveform of interest, we subdivide the time axis into N equal time intervals. Hypothetically for example, a LIGO photodetector setup could give you 16384 such equal intervals per second, with each interval thus being 1/16384 th of a second long. If I'm looking for a 512Hz GW signal in that data set then that would refer to 16384/512 = 32 data points per signal cycle. Fortunately this turns out to be way more than one needs for that signal frequency !! As a general rule of thumb* the required number of samples N to maintain/establish fidelity goes like :

N = 2 * B * L

or

sampling rate = N/L = 2 * B

where B is the desired 'bandwidth' or frequency-range-width you want to examine and L is the length of the time interval that you have available data for. Clearly if you have a longer interval of time then to keep the fidelity constant one has to have proportionally more data points. For our purposes one can think of the bandwidth as from zero Hertz to the frequency of the highest pitch signal that you're after. Naturally you're welcome to have more than N, but if you have less then you can't confidently reconstruct the signal from the transform while also wanting certain frequencies preserved ( there's a long-ish mathematical path showing that if you start with say M 1, plus however many consecutive higher harmonics needed to make up the set of N. This is why I have been mostly talking of square matrices, those having some hope of being ( left and right ) invertible.
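The headline numbers above amount to trivial arithmetic, which one can spell out ( in Python, purely for the record ) :

```python
# The rule-of-thumb N = 2 * B * L, with the LIGO-flavoured numbers
# quoted in the post.
sample_rate = 16384          # photodetector samples per second
L = 1                        # one second of data
B = sample_rate // 2         # bandwidth covered: 0 up to 8192 Hz
N = 2 * B * L                # samples required for that bandwidth

signal_hz = 512              # the hypothetical GW signal of interest
samples_per_cycle = sample_rate // signal_hz

print(N)                     # 16384
print(samples_per_cycle)     # 32 data points per signal cycle
```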

The Nth Roots of Unity Let's have a look at this lad that gets relentlessly multiplied within the Fourier matrix :

[pre] q_N = exp( i * 2*PI/N )[/pre]

which for conversation I've given the name of q with an N subscript to remind us that it depends on N. You can immediately see that it is a complex number with magnitude of 1 and phase 2*PI/N. As 2*PI is a full circuit of arc angle measure then this phase is 1/N th of that. Following through on the algebra you see that q_N is the Nth root of 1, or :

[pre] (q_N)^N = exp( i * 2*PI/N )^N = exp( i * 2*PI ) = 1[/pre]

where I apologise for using N as both super- and sub- script. The upper is a power to be taken, the lower is a reminder. Graphically it's no more mysterious than saying 'if I rotate 1/N th of a circle at a time, then I have to do N such rotations to achieve one circuit'. However this is also true :

[pre] ( (q_N)^m )^N = ( (q_N)^N )^m = 1[/pre]

meaning that if I take a multiple of the above fundamental Nth root = 2*PI/N then that will also be an Nth root. So 'if I rotate m/N th of a circle at a time, then when I do N such rotations I will achieve m such circuits, which is the same as one circuit'.

So how many distinct Nth roots of 1 are there? N of them of course ..... and I choose to index them from 0 through to N - 1 ... :-)
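The N distinct roots can be verified in Python ( for N = 8, say ) :

```python
import cmath
import math

N = 8
q = cmath.exp(2j * math.pi / N)          # the fundamental Nth root, q_N

roots = [q ** m for m in range(N)]       # indexed 0 through N - 1

# every one of them, raised to the Nth power, lands back on 1 ...
for r in roots:
    assert abs(r ** N - 1) < 1e-9

# ... and they are N genuinely distinct points on the unit circle.
for a in range(N):
    for b in range(a + 1, N):
        assert abs(roots[a] - roots[b]) > 1e-9
```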

I know this all sounds a bit suspect/obtuse, but it is quite valid, and indeed the FFT is going to hinge/pivot on there being a spot of duplicity here !! :-) :-)

Cheers, Mike.

* If you look closely at this, it is ( yet another version of ) The Uncertainty Principle in disguise. Indeed there are terrific and deep correspondences b/w quantum and information theories ...

** The exact rate I think is 44.1 KHz which is more of a technical point or historical precedent related to initial analogue to digital music conversions. Under-sampling of music with respect to playback intentions will cause rotten distortions, as high frequencies are returned that weren't there in the original : being inserted as an aliasing artifact of lower -> higher frequency components.

Also your favorite video game - Need For Greedy Auto Theft - has to cope with equivalent problems as the regularities/granularities differ between the hardware viewing device ( generally discrete/raster ) and the software 'model'.

Plus on these simple calculations/grounds the 16384Hz sampling rate at LIGO well covers signals up to 8192Hz. I don't think we at E@H are going anywhere near that. There are LIGO groups which deal with sharper/transient stuff, thus not our continuous waves, and they will certainly need the extra detail in the records to examine higher derivatives/frequencies.

*** Well 'who aliases who' is a question of perspective - what do we consider as 'original' or 'real' - but the math is there either way.

**** This is one of those leaps in the explanation because (a) the theorems are fiddly (b) you'll be fine without it and (c) life is too short !! :-)


## Cooley-Tukey Magic

A factorisation of the Fourier matrix was outlined in the 1960's ( but thought up at Bell Labs in the 50's ?? ) where the factors could be expressed in terms of one power of 2 lower* :

[pre] F_2N = [ I_N   D_N ] [ F_N  O_N ]
        [ I_N  -D_N ] [ O_N  F_N ] P[/pre]

.... here I've used a 'block matrix' notation, which is a way of outlining a sub-matrix within a bigger matrix. You don't actually change any matrix entries but simply mark out suburbs within the larger city. It's a complicated looking sucker for sure, but when fully recursed it ultimately achieves the separation of variables that we seek! The reason why this factorisation works is simply that a double rotation ( on the Argand plane ) of a higher frequency component, say exp[i*PI/4], is equal to a single rotation of one at half the frequency eg. exp[i*PI/2] - this being the 'duplicity' that I referred to earlier.

The above product of three matrices on the right actually contains the same information as the single one on the left. But it's more spread out, hence all the extra zeroes popping up in the identity I_N, the diagonal D_N, the zero matrix O_N and the permutation matrix. For all that apparent extra mess you do get to see clearly where the recursive elements lie : the exact relationship b/w F_2N and F_N. Think of it like the mathematical version of a frog dissection in biology class !! :-)

Here is the worked out case for F_4 to F_2 :

What is quite important are all those areas in the matrices which contain zeroes. In effect these somewhat de-couple what is happening between the 'burbs' eg. in block matrix notation :

Essentially it's the D_N times the F_N which recovers the correct odd powers of the exponential factors for F_2N. F_N already had the even powers required for F_2N ( look at the roots of unity circle again : 'PI/2 is not only a fourth root of 1, it is also an eighth root too ...' ). The permutation matrix applied on the right-most side sorts the columns into the correct order for equality with F_2N.
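Here is the even/odd recursion written out as a toy Python FFT - my sketch of the Cooley-Tukey idea, leaving aside the 1/SQRT(N) factors and keeping the positive-sign convention used above ( real libraries are far more clever ) - checked against the direct N^2 sum :

```python
import cmath

def dft(x):
    """Direct transform, positive-sign convention, no 1/sqrt(N):
    the plain N^2 sum we are trying to beat."""
    N = len(x)
    return [sum(x[t] * cmath.exp(2j * cmath.pi * s * t / N)
                for t in range(N)) for s in range(N)]

def fft(x):
    """Radix-2 Cooley-Tukey: permute into evens/odds, recurse, then
    glue the halves back together with the diagonal 'twiddle' factors
    (the D_N block). Assumes len(x) is a power of two."""
    N = len(x)
    if N == 1:
        return x[:]                        # F_1 = [1]
    even = fft(x[0::2])                    # the permutation step ...
    odd = fft(x[1::2])                     # ... then recurse on each half
    out = [0j] * N
    for s in range(N // 2):
        w = cmath.exp(2j * cmath.pi * s / N) * odd[s]   # D_N entry times odd part
        out[s] = even[s] + w               # the [ I_N   D_N ] block row
        out[s + N // 2] = even[s] - w      # the [ I_N  -D_N ] block row
    return out

x = [complex(t) for t in range(8)]
assert all(abs(a - b) < 1e-9 for a, b in zip(fft(x), dft(x)))
```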

Let's look at what happens to these three main matrix factors as we descend from some large N ( a power of 2 ) down to 1. Firstly the left-most matrix :

next the middle matrix :

in matrix terminology they become more sparse - a greater fraction of contained zeroes - which certainly simplifies their handling. The middle matrix goes towards the identity - because F_1 = [1] ! The right-most matrix doesn't become sparser - it 'weaves' or permutes the rows of whatever matrix lies to its right :

but since this factorisation exists within a situation like this :

then one can also view the permutations as acting on the columns of any matrix lying to its left. But the FFT is usually described as first even/odd permuting the input vector ( b in this instance ) before applying the bulk of the transform. Therein lies an interesting pattern of behaviour** :

In the real world of application programming one doesn't have to use a matrix representation internal to the code. However an 'as if' scenario*** would apply whereby the output of a given ( FFT ) library call reflects what I've roughly outlined here - the Mickey Mouse Version. I really don't know the FFT/API syntax but I could well imagine a high level call involving two pointers ( input and output array/buffer/vector locations ) and a size ( what value of N applies ). There would be tremendous scope for hard-coding/pre-computation if one was confident in advance what size transform was applicable. What I should have emphasised earlier is that an unfactored Fourier matrix never contains zeroes, and more so is always invertible ( another theorem states that each matrix factor of that must therefore be invertible too ... )

However I could imagine some difficulties for a given length/type of number representation within an implementing device - specifically the upper & lower limits upon the chosen floating point representation. For instance : would we want 2*PI/N for some large N ? If so, when would that 'bottom out' and/or lose sufficient accuracy due to the representation's bit length? I'm sure there's a bunch of library code that manages the exceptions on that .....

So flick-flacking through 16384 data points doesn't seem quite so bad now ?? :-) :-)

We are pretty well at the 'heart of the maze' that I've been aiming at for some months. I haven't finished yet, as there's some other interesting stuff to put on display ... generally rather easier material by the way, as we've done some pretty hard core topics ... and you can tell by now that I could go on indefinitely! If you've followed my intellectual pathway ( to any degree !! ) then you've seen my interpretation of/detail upon the raison d'etre for E@H - crunching of numbers on a massive/supercomputer scale, an humongous mathematical mill if you like, to assist the scientists in teasing out any patterns within the real world of scientific observations. In this case either gravitational or electromagnetic signals.

Cheers, Mike.

( edit ) A particular reason why the Fourier matrices are invertible is that the columns ( or rows for that matter ) are linearly independent. This goes way back to our discussion about an orthonormal basis for a vector space.

( edit ) I didn't want to imply that a matrix without zeroes was invertible for that reason alone. For instance the identity matrix has plenty of zeroes with columns ( rows ) that are an orthonormal basis. I've been reading about other choices of basis functions - Haar wavelets - that give quicker convergence onto sharp features than Fourier. Haar wavelets split the shapes/functions into what is average vs what is different and hence hone in better on steps and jaggy bits. Sharp punters may have noted under/overshoot ( Gibbs' phenomenon ) in the convergence to the corners of the box shape of the square wave. That ~ 17% under/overshoot simply moves from any given position as you add in more series terms .... presumably it dies away in the vast reaches of infinity. It's related primarily to the discontinuity ( the cliff ) that we are trying to model with a smooth function ( sinusoids ) approximation. For the curious the key analytic feature is pointwise vs uniform convergence.

( edit ) On re-reading I've also realised that we may have a "Who's On First Base ?" type situation with my wording about permutation matrices and sides. If

A * P

is the case then P permutes the columns of A. If

P * A

is the scenario then P permutes the rows of A.
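That "Who's On First Base ?" point can be made concrete with a throwaway 3 x 3 demonstration in Python :

```python
def matmul(A, B):
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(inner))
             for j in range(cols)] for i in range(rows)]

# A permutation matrix: one cyclic re-arrangement of the identity.
P = [[0, 0, 1],
     [1, 0, 0],
     [0, 1, 0]]

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

print(matmul(P, A))   # P * A : the ROWS of A come out permuted
print(matmul(A, P))   # A * P : the COLUMNS of A come out permuted
```

And, as per the footnote below the dividing line, the row cycle on one side runs in the reverse order to the column cycle on the other.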

* putting aside those pesky 1/SQRT(N) factors for today ... :-)

** What is doubly interesting is that the very same re-arrangement of the identity matrix gives a column permutation pattern during multiplication on one side which is the reverse of the row permutation pattern that occurs with multiplication on the other side. So I could have written a fairly similar table but using column indices instead. For example : if I have a 4 x 4 permutation matrix that shuffles rows-on-the-right in the order 1 -> 2 -> 3 -> 4 -> 1, then it will shuffle columns-on-the-left in the order 1 -> 4 -> 3 -> 2 -> 1 ..... ah patterns, patterns and patterns. :-)

*** For instance I couldn't seriously imagine anyone actually/earnestly using a matrix multiplication routine to swap two matrix rows - surely you'd just exchange a pointer pair or somesuch.


## Time to warp to a new thread

Time to warp to a new thread ....

Cheers, Mike.


## RE: * There's nearly always

In my career, I used some calculations related to signals that increased or died away at an exponential rate, by setting the frequencies in the more common calculation methods to complex numbers instead of real numbers.

## RE: RE: * There's nearly

The area of solution methods to all manner of differential equations is huge; some equations look deceptively simple to write but their solutions may only have numeric ( as opposed to analytic ) representations. Or one can 'discover' new functions relying on the defining property that the given function solves a certain equation. The eigenvalue/eigenfunction approach may be used to find 'key' solutions from which one can construct others. As an extremely general hand-waving comment : the solutions may change character as the eigenvalues (a) change sign, (b) have magnitude greater-than/less-than/equal-to unity, and (c) vary in multiplicity ( how many times a specific eigenvalue is repeated -> rather complicating matters ). The Fourier basis set ( complex exponential functions upon suitable domains ) has the lovely property of orthonormality, and thus a neat correspondence/abstraction with vector spaces becomes available.

Cheers, Mike.
