Fast Fourier Transform optimization?

cecht

Joined: 7 Mar 18

Posts: 1533

Credit: 2901855541

RAC: 2166673

23 Feb 2020 17:33:20 UTC

Topic 220779

Do any E@H apps allow command-line optimization of the FFT, in either the AMD or the NVidia GPU apps? More generally, is there specific information from the Linux command 'clinfo' that would be useful for folks wanting to optimize GPU run parameters? I'm asking to help with upgrades to the program amdgpu-utils, available from GitHub.

Ideas are not fixed, nor should they be; we live in model-dependent reality.

steffen_moeller

Joined: 9 Feb 05

Posts: 78

Credit: 1773655132

RAC: 0

23 Mar 2020 0:57:38 UTC

Message 176153

Hello Cecht,
the tool you mentioned (https://github.com/Ricks-Lab/amdgpu-utils) is also available for Debian and Ubuntu (https://packages.ubuntu.com/focal/ricks-amdgpu-utils). I admit I have not understood your question, though :o/

Mike Hewson

Moderator

Joined: 1 Dec 05

Posts: 6588

Credit: 316295580

RAC: 340084

23 Mar 2020 4:41:00 UTC

Message 176156

I think that's a no to both questions I'm afraid.

[This may not be relevant/helpful]

IIRC the bottleneck in the FFT is the evaluation of the sine and cosine of theta, where theta is the (radian-measured) argument, with the precision/granularity set by the base transform, i.e. however many data points we are transforming. Evaluation by power-series approximation converges far too slowly, but I think it was solved by looking up precalculated values in a table and applying the angle-addition formulae, e.g.

sin(A + B) = sinA * cosB + cosA * sinB ; cos(A + B) = cosA * cosB - sinA * sinB

making this amenable to a fused multiply/add scheme, if available.
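A minimal Python sketch of the table-plus-angle-addition idea, as one possible way to do it rather than what any E@H app actually does. The transform length N = 4096, the split into a 64-entry coarse table and a 64-entry fine table, and the name twiddle are illustrative assumptions. Any index k = i*64 + j splits the angle into A + B, so 128 stored values per function stand in for 4096:

```python
import math

# Hypothetical sizes chosen for illustration only.
N = 4096      # transform length, 2**12
SPLIT = 64    # fine-table length; the coarse table has N // SPLIT entries

# Fine table: angles 2*pi*j/N for j = 0 .. SPLIT-1
FINE_SIN = [math.sin(2 * math.pi * j / N) for j in range(SPLIT)]
FINE_COS = [math.cos(2 * math.pi * j / N) for j in range(SPLIT)]
# Coarse table: angles 2*pi*(i*SPLIT)/N for i = 0 .. N//SPLIT - 1
COARSE_SIN = [math.sin(2 * math.pi * i * SPLIT / N) for i in range(N // SPLIT)]
COARSE_COS = [math.cos(2 * math.pi * i * SPLIT / N) for i in range(N // SPLIT)]


def twiddle(k):
    """Return (cos, sin) of 2*pi*k/N from two small tables plus angle addition."""
    i, j = divmod(k % N, SPLIT)          # k = i*SPLIT + j, so the angle is A + B
    sin_a, cos_a = COARSE_SIN[i], COARSE_COS[i]
    sin_b, cos_b = FINE_SIN[j], FINE_COS[j]
    # sin(A + B) = sinA*cosB + cosA*sinB ; cos(A + B) = cosA*cosB - sinA*sinB
    # Each is a product plus (or minus) a product: the shape an FMA unit can fuse.
    s = sin_a * cos_b + cos_a * sin_b
    c = cos_a * cos_b - sin_a * sin_b
    return c, s
```

Python evaluates each update as separate multiplies and adds; on hardware with a fused multiply/add, those two lines are where it could be applied.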

[/This may not be relevant/helpful]

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 2956606421

RAC: 714990

23 Mar 2020 6:02:44 UTC

Message 176157 in response to message 176156

Mike Hewson wrote:

making this amenable to a fused multiply/add scheme, if available.

Isn't there a problem with loss of precision with fused multiply/add on some hardware, such as Intel GPUs?

Mike Hewson

Moderator

Joined: 1 Dec 05

Posts: 6588

Credit: 316295580

RAC: 340084

23 Mar 2020 8:55:00 UTC

Message 176159 in response to message 176157

Richard Haselgrove wrote:

Mike Hewson wrote:
making this amenable to a fused multiply/add scheme, if available.

Isn't there a problem with loss of precision with fused multiply/add on some hardware, such as Intel GPUs?

Is there? That's sad. I guess the question is equivalent to "does Intel/whoever comply with IEEE 754-2008?", the revision of the standard that added fused multiply-add. As the prime market for GPUs is not computing, it may be better for them to be fast than right .....
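Under IEEE 754-2008 a fused multiply-add rounds only once, so a conforming FMA can legitimately give a different answer from a separate multiply followed by an add, usually a more accurate one. The Python sketch below models that on the host with exact rational arithmetic; fma_reference is a hypothetical helper, not a library call, and the sketch says nothing about how any particular Intel GPU behaves:

```python
from fractions import Fraction


def fma_reference(a, b, c):
    """Software model of a binary64 fused multiply-add: the exact value of
    a*b + c is formed with rationals and rounded to float only once.
    Finite inputs only; Fraction rejects inf and nan."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


x = 1.0 + 2.0**-27         # exactly representable in binary64
c = -(1.0 + 2.0**-26)      # minus the *rounded* value of x*x

separate = x * x + c            # x*x is rounded first, losing its 2**-54 term
fused = fma_reference(x, x, c)  # one rounding keeps that term

print(separate, fused)
```

The separate version prints 0.0, while the fused version keeps the 2^-54 that the intermediate rounding threw away. Differences of this kind are why results from FMA and non-FMA hardware need not match bit for bit.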

[dull, dirty and dangerous]

So you are right to worry, as with E@H the FFTs are of order 2²² data points IIRC. So the representation needs a precision equal to or deeper than that. Of interest is that 32-bit IEEE-754 floats store 23 bits for the fractional part of an operand's significand (24 bits of precision counting the implicit leading 1).
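To make the 23-bit point concrete, here is a small Python sketch that rounds values to IEEE-754 binary32 using the standard struct module. The helper name to_f32 is hypothetical, and N = 2**22 is taken from the "of order 2²²" estimate above rather than from any E@H source:

```python
import math
import struct


def to_f32(x):
    """Round a Python float (binary64) to the nearest IEEE-754 binary32 value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]


# binary32 keeps 23 stored fraction bits plus an implicit leading 1.
print(to_f32(1.0 + 2.0**-23))   # 1 + 2**-23 survives: one unit in the last place
print(to_f32(1.0 + 2.0**-24))   # 1.0: exact halfway case, rounds to even
print(to_f32(2.0**24 + 1))      # 16777216.0: integers are exact only up to 2**24

# Angle step of a 2**22-point FFT versus binary32 spacing just below 2*pi.
N = 2**22
step = 2 * math.pi / N
spacing_near_2pi = 2.0**-21      # binary32 ulp for values in [4, 8)
print(step / spacing_near_2pi)   # pi: adjacent angles are only ~3 ulps apart
```

So a 22-bit index fits comfortably in a float, but once it is turned into a radian angle near 2 * PI, neighbouring angles are separated by only about three binary32 steps.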

This is a clue to calculating the trig functions via the angle-addition formulae. Pre-calculate the sine and cosine of 2^-n for each n from 1 through 23. Put these into respective tables/vectors, indexed suitably, giving two tables of 23 entries each at whatever precision you choose. Now any 23-bit theta argument, from zero up to 1 - 2^-23, is some sum of certain of these powers of 2, i.e.

$$\theta = a_1 \, 2^{-1} + a_2 \, 2^{-2} + \dots + a_k \, 2^{-k} + \dots + a_{23} \, 2^{-23}, \qquad \text{each } a_k \in \{0, 1\}$$

Set up two accumulators, call them Asin and Acos. Initialise the first to sin(0.0) = 0.0 and the second to cos(0.0) = 1.0.

Now, in a for loop say, shift the operand right by one bit and inspect the bit that falls off the right end (first you get a₂₃, another shift gets a₂₂, and so on down to a₁). After each shift, test the bit's value:

- if a_k is 0, forget it: 2^-k does not contribute to theta's value, so neither sin(2^-k) nor cos(2^-k) contributes to our task here. Skip to the next loop iteration.

- if a_k is 1, then 2^-k does contribute to theta's value. So take sin(2^-k) and cos(2^-k) from your tables and plug them into the angle-addition formulae:

Asin_old <---- Asin

Acos_old <---- Acos

Asin <---- Asin_old * cos(2^-k) + Acos_old * sin(2^-k)

Acos <---- Acos_old * cos(2^-k) - Asin_old * sin(2^-k)

... where Asin_old and Acos_old are temporary variables and <---- indicates assignment. So actually I don't need FMA at all, now that I come to think it through in detail ....

When the loop has looked at all the bits of theta, you will have Asin = sin(theta) and Acos = cos(theta), to that level of precision anyway. For that matter, the precision of the trig values doesn't have to equal the precision of the argument, though the sine must have the same precision as the cosine. But you need a full circle, i.e. arguments from 0 up to just below 2 * PI: mutatis mutandis to reach that range.
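The bit-by-bit procedure described above can be sketched in Python as follows. It assumes theta is held as a 23-bit unsigned integer operand, with theta = operand * 2^-23 radians, and it fills its tables with math.sin and math.cos; the names sincos_bits, SIN_T and COS_T are hypothetical:

```python
import math

BITS = 23  # argument precision: theta = sum of a_k * 2**-k for k = 1..23

# Tables of sin(2**-k) and cos(2**-k) for k = 1..BITS (index 0 unused).
SIN_T = [0.0] + [math.sin(2.0**-k) for k in range(1, BITS + 1)]
COS_T = [1.0] + [math.cos(2.0**-k) for k in range(1, BITS + 1)]


def sincos_bits(operand):
    """Return (sin(theta), cos(theta)) for theta = operand * 2**-BITS.

    Bits are consumed from the least significant end, so the first bit
    that falls off is a_23 and the last is a_1."""
    if not 0 <= operand < (1 << BITS):
        raise ValueError("operand must fit in BITS bits")
    a_sin, a_cos = 0.0, 1.0          # sin(0), cos(0)
    for k in range(BITS, 0, -1):     # k = 23, 22, ..., 1
        bit = operand & 1
        operand >>= 1
        if bit == 0:
            continue                 # 2**-k is not part of theta: skip it
        s_old, c_old = a_sin, a_cos
        a_sin = s_old * COS_T[k] + c_old * SIN_T[k]   # sin(X + 2**-k)
        a_cos = c_old * COS_T[k] - s_old * SIN_T[k]   # cos(X + 2**-k)
    return a_sin, a_cos
```

If the tables held sin(2 * PI * 2^-k) and cos(2 * PI * 2^-k) instead, the same loop would treat the operand as a fraction of a full turn and cover 0 to just below 2 * PI, which is one way to do the "mutatis mutandis" step.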

Plus, to be truly general, you want to reduce the argument modulo 2 * PI, bringing it back into some principal range. Why not go from -PI to +PI, a lovely (anti-)symmetric range? Know that:

sin(-theta) = - sin(theta )

cos(-theta) = cos(theta)

{ Ultimately you're actually after cos(theta) + i * sin(theta) = e^(i*theta) }
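A minimal Python sketch of range reduction plus the symmetry identities above, assuming the core evaluator only needs to handle [0, PI]. Here math.sin and math.cos stand in for that core routine, and the names reduce_angle and cis are hypothetical:

```python
import math


def reduce_angle(theta):
    """Map theta into the principal range [-pi, pi] by removing whole turns.

    math.remainder is exact with respect to the float 2*math.pi, which is
    itself only an approximation of 2*pi, so very large arguments drift."""
    return math.remainder(theta, 2 * math.pi)


def cis(theta):
    """Return cos(theta) + i*sin(theta) = e^(i*theta), evaluating only on [0, pi].

    math.sin/math.cos stand in for whatever core routine covers [0, pi]."""
    x = reduce_angle(theta)
    sign = -1.0 if x < 0 else 1.0   # sin(-x) = -sin(x)
    x = abs(x)                      # cos(-x) = cos(x), so only |x| is needed
    return complex(math.cos(x), sign * math.sin(x))
```

Halving the range this way halves any table the core routine needs, at the cost of one comparison and a sign flip per call.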

Alternatively : to be utterly brutal you could just pre-calculate every sine and cosine for every argument in your principal range. That unwraps all the above gobbledygook to simple lookups from two arrays each of 2²³ entries. What a luxury that would be ...
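The cost of that luxury is easy to put a number on. This Python sketch assumes two tables (sine and cosine) of 2²³ entries, stored either as 4-byte floats or as 8-byte doubles, and shows the brute-force table for FFT-style arguments 2*PI*k/n. The helper names table_bytes and build_full_tables are hypothetical:

```python
import math


def table_bytes(entries, bytes_per_entry=4, tables=2):
    """Memory needed for `tables` lookup arrays of `entries` values each."""
    return entries * bytes_per_entry * tables


print(table_bytes(2**23) / 2**20)      # 64.0  (MiB: float32 sine + cosine)
print(table_bytes(2**23, 8) / 2**20)   # 128.0 (MiB: float64 sine + cosine)


def build_full_tables(n):
    """Brute force: store cos and sin of 2*pi*k/n for every k, so each
    evaluation is two array reads and no arithmetic."""
    cos_t = [math.cos(2 * math.pi * k / n) for k in range(n)]
    sin_t = [math.sin(2 * math.pi * k / n) for k in range(n)]
    return cos_t, sin_t
```

So the fully unwrapped version costs 64 MiB even in single precision, versus two 23-entry tables for the bit-by-bit scheme; the sine/cosine symmetries above would trim the brute-force tables further.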

[/dull, dirty and dangerous]

Cheers, Mike.

cecht

Joined: 7 Mar 18

Posts: 1533

Credit: 2901855541

RAC: 2166673

23 Mar 2020 14:16:06 UTC

Message 176162 in response to message 176153

steffen_moeller wrote:

Hello Cecht,
the tool you mentioned (https://github.com/Ricks-Lab/amdgpu-utils) is also available for Debian and Ubuntu (https://packages.ubuntu.com/focal/ricks-amdgpu-utils). I admit I have not understood your question, though :o/

Right, though amdgpu-utils 3.0.0 has been released, so the deb package should be updated.

My question related to a rather minor point: what information should be listed by the amdgpu-ls --clinfo command. Rick, the dev, set that option to include parameters that are relevant for setting command-line arguments of SETI apps. He wasn't sure whether E@H apps used similar or additional information (related to FFT), so I offered to ask here.

cecht

Joined: 7 Mar 18

Posts: 1533

Credit: 2901855541

RAC: 2166673

23 Mar 2020 14:22:12 UTC

Message 176163 in response to message 176156

Mike Hewson wrote:

I think that's a no to both questions I'm afraid.

Thank you. The rest of what you and Richard posted is fascinating (though above my head!) and hopefully will be helpful to someone down the road.

Forums › Cruncher's Corner
