23 Feb 2020 17:33:20 UTC

Topic 220779


Are there any E@H apps that allow for command line optimization of FFT in either AMD or NVidia GPU apps? More generally, is there specific information from the Linux command 'clinfo' that would be useful for folks wanting to optimize GPU run parameters? I'm asking to help with upgrades to the program amdgpu-utils available from GitHub.

*Ideas are not fixed, nor should they be; we live in model-dependent reality.*


## Hello Cecht, the tool you


Hello Cecht,

the tool you mentioned (https://github.com/Ricks-Lab/amdgpu-utils) is also available for Debian and Ubuntu (https://packages.ubuntu.com/focal/ricks-amdgpu-utils). I admit I have not understood your question, though :o/

## I think that's a no to both


I think that's a no to both questions I'm afraid.

[This may not be relevant/helpful]

IIRC the bottleneck in the FFT is the evaluation of the sine and cosine of theta, where theta is the (radian-measured) argument with the precision/granularity of the base transform, i.e. however many data points we are transforming. Evaluation by power series approximation converges far too slowly, but I think it was solved using lookup of precalculated values in a table and the double angle formulae, e.g.

sin(A + B) = sinA * cosB + cosA * sinB ; cos(A + B) = cosA * cosB - sinA * sinB

making this amenable to a fused multiply/add scheme, if available.
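As a small illustration (my own sketch, not code from any E@H app, and the function names are invented), the identities above can be arranged so that each output is a pair of multiplies feeding one add or subtract, which is exactly the a*b + c*d shape a fused multiply/add unit can exploit:

```python
import math

# The double angle formulae from the post, written as the a*b + c*d
# multiply-add pairs an FMA unit could fuse. Names are illustrative.
def sin_sum(sin_a, cos_a, sin_b, cos_b):
    # sin(A + B) = sinA * cosB + cosA * sinB
    return sin_a * cos_b + cos_a * sin_b

def cos_sum(sin_a, cos_a, sin_b, cos_b):
    # cos(A + B) = cosA * cosB - sinA * sinB
    return cos_a * cos_b - sin_a * sin_b
```

Each function is one multiply plus one multiply-accumulate, so a hardware FMA saves an intermediate rounding step per call.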

[/This may not be relevant/helpful]

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ... Blaise Pascal

... and my other CPU is a Ryzen 5950X :-)

## Mike Hewson wrote:making this


Mike Hewson wrote: making this amenable to a fused multiply/add scheme, if available.

Isn't there a problem with loss of precision with fused multiply/add on some hardware, such as Intel GPUs?

## Richard Haselgrove wrote:Mike


Richard Haselgrove wrote: Isn't there a problem with loss of precision with fused multiply/add on some hardware, such as Intel GPUs?

Is there? That's sad. I guess the question is equivalent to "does Intel/whoever comply with IEEE-754-2008?" As the prime market for GPUs is not computing, it may be better to be fast than right .....

[dull, dirty and dangerous]

So you are right to worry, as with E@H the FFTs are of order 2^22 data points IIRC. So you have to have a precision equal to or deeper than that in the representation. Of interest is that 32-bit IEEE-754 floats have 23 significant bits for the fractional part of an operand. This is a clue to calculating the trig functions via the double angle formulae. Pre-calculate the sine and cosine of each 2^-n from 2^-1 through to 2^-23. Put these into respective tables/vectors indexed suitably, making two tables each of 23 entries of some precision. Now any 23-bit theta argument, from zero up to 1 - 2^-23, is going to be some sum of certain such powers of 2, i.e.

a_1 * 2^-1 + a_2 * 2^-2 + ..... + a_k * 2^-k + ..... + a_23 * 2^-23

where each a_k is either 0 or 1. Set the value of two accumulators, call them Asin and Acos. The first is set to sin(0.0) = 0.0 and the second is set to cos(0.0) = 1.0.

Now, by looping say, shift the operand to the right by one bit and inspect the bit that falls off the right end (first you will get a_23, then another shift gets a_22, etc. ..... all along to a_1). At each shift test the bit's value:

- if a_k is 0 then forget it, as 2^-k does not contribute to theta's value, so neither will sin(2^-k) nor cos(2^-k) contribute to our task here. Exit that loop iteration.

- if a_k is 1 then 2^-k does contribute to theta's value. So using sin(2^-k) and cos(2^-k) from your tables, plug them into the double angle formulae:

Asin_old <---- Asin
Acos_old <---- Acos
Asin <---- Asin_old * cos(2^-k) + Acos_old * sin(2^-k)
Acos <---- Acos_old * cos(2^-k) - Asin_old * sin(2^-k)

... where Asin_old and Acos_old are temporary variables, with <---- indicating assignment. So actually I don't need FMA at all, now that I come to think in detail ....

When the loop finishes looking at all the bits of theta then you will have Asin = sin(theta) and Acos = cos(theta), to that level of precision anyway. For that matter the precision of the trigs doesn't have to equal the precision of the argument, though the precision of sine must be the same as that of cosine. But a full circle is sought, i.e. from 0 to just below 2 * PI: mutatis mutandis to reach that range. Plus, to be truly general, you want to reduce the argument modulo 2 * PI, bringing it back to some principal range. Why not go from -PI to +PI, a lovely (anti-)symmetric range? Know that:

sin(-theta) = -sin(theta)
cos(-theta) = cos(theta)

{ Ultimately you're actually after cos(theta) + i * sin(theta) = e^(i*theta) }

Alternatively: to be utterly brutal you could just pre-calculate every sine and cosine for every argument in your principal range. That unwraps all the above gobbledygook into simple lookups from two arrays each of 2^23 entries. What a luxury that would be ...

[/dull, dirty and dangerous]
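To make the loop concrete, here is a minimal Python sketch of that bit-by-bit table scheme (my own illustrative code, not anything from E@H). It assumes theta is passed as the 23-bit integer frac with theta = frac * 2^-23 radians, the tables hold sin(2^-k) and cos(2^-k), and the two accumulators are updated with the double angle formulae as each bit falls off the right end:

```python
import math

BITS = 23
# Tables of sin(2^-k) and cos(2^-k) for k = 1 .. 23, as described above.
SIN_TAB = [math.sin(2.0 ** -k) for k in range(1, BITS + 1)]
COS_TAB = [math.cos(2.0 ** -k) for k in range(1, BITS + 1)]

def sincos_fixed(frac):
    """Return (sin(theta), cos(theta)) for theta = frac * 2^-23 radians,
    where 0 <= frac < 2^23 holds the bits a_1 .. a_23 of theta."""
    asin, acos = 0.0, 1.0            # sin(0.0), cos(0.0)
    for k in range(BITS, 0, -1):     # a_23 falls off the right end first
        if frac & 1:                 # a_k = 1: the angle 2^-k contributes
            s, c = SIN_TAB[k - 1], COS_TAB[k - 1]
            # Double angle update; the RHS uses the old accumulator values.
            asin, acos = asin * c + acos * s, acos * c - asin * s
        frac >>= 1
    return asin, acos
```

Extending this to a full circle is just the argument reduction described above, and the "utterly brutal" alternative replaces the whole loop with a single lookup into two 2^23-entry arrays.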

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ... Blaise Pascal

... and my other CPU is a Ryzen 5950X :-)

## steffen_moeller wrote:Hello


steffen_moeller wrote: the tool you mentioned (https://github.com/Ricks-Lab/amdgpu-utils) is also available for Debian and Ubuntu (https://packages.ubuntu.com/focal/ricks-amdgpu-utils).

Right, though amdgpu-utils 3.0.0 has been released, so the deb package should be updated.

My question relates to a rather minor point: what information should be listed by the amdgpu-ls --clinfo command. Rick, the dev, set that option to include parameters that are relevant for setting command line arguments of SETI apps. He wasn't sure whether E@H apps used similar or additional information (related to FFT), so I offered to ask here.

*Ideas are not fixed, nor should they be; we live in model-dependent reality.*

## Mike Hewson wrote:I think


Mike Hewson wrote: I think that's a no to both questions I'm afraid.

Thank you. The rest of what you and Richard posted is fascinating (though above my head!) and hopefully will be helpful to someone down the road.

*Ideas are not fixed, nor should they be; we live in model-dependent reality.*