
## @BackGroundMAN : Yup, you can


@BackGroundMAN : Yup, you can use double angle formulae and the like to generate (co)sines from other (co)sines. As you say, higher precision is needed to contain errors there. The Epiphany has fused instructions which (allegedly) mitigate that by not rounding intermediates. So you either start with sufficiently high precision values and/or derive from them while maintaining precision. I'm having another look at the FloPoCo VHDL generator (for FPGAs), which (still) looks promising in that arbitrary operator precision is a main design input, such that:

Quote:

... fully parameterized in precision, so that your application may use just the precision it needs, and accurate to the last bit, so that your wires don't carry meaningless noise. Internally, FloPoCo operators are carefully designed to ensure that no bit is computed that is not useful to the final result.

... and so an FPGA block could punt out a suitable precision for the outside world by rounding off as the last thing.

Partly this can also depend upon where you want to stop the factorisation/recursion, i.e. what the base case is that triggers the winding back out of the recursion - and, for that matter, what factorisations of N are performed on the way in to the base case. Or if you like: how many powers of the Nth root of unity do you need at each step? That doesn't have to be the same number for each recursive step; mathematically you only have to adhere to the prime factors of N, or products of those. For E@H at least we have simple power-of-two choices.
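For the power-of-two case the recursion can be sketched as a radix-2 Cooley-Tukey FFT whose base case is a single sample. This toy Python version (purely illustrative, not E@H code) makes the base case and the per-level powers of the Nth root of unity explicit:

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        # Base case that triggers the winding back out of the recursion:
        # the DFT of a single sample is the sample itself.
        return list(x)
    even = fft(x[0::2])   # two N/2-point sub-problems from the
    odd = fft(x[1::2])    # even/odd factorisation of N
    # This combining step needs the first N/2 powers of the Nth root of unity.
    tw = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]
    return ([even[k] + tw[k] * odd[k] for k in range(n // 2)] +
            [even[k] - tw[k] * odd[k] for k in range(n // 2)])
```

A radix-4 or split-radix variant would simply consume a different number of root-of-unity powers per step, as noted above.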

Some 'simplicities' do emerge with the powers-of-two double angle formulae, e.g. sin[4*A] in terms of (co)sin[A], say. But simplicity is a relative term here, as you get mixed powers of sines and cosines :

sin[4A] = 4[sin[A]cos^3[A] - sin^3[A]cos[A]]

... so this may be only an arguable advantage. As a generality, (co)sin[M*A] can be expressed as sums/differences of product terms, each with a total of M (co)sin[A] factors.
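Both the sin[4A] expansion above and the general M-factor pattern (e.g. sin[3A] = 3sin[A]cos^2[A] - sin^3[A]) are easy to sanity-check numerically; a quick Python check with an arbitrary angle:

```python
import math

A = 0.6180339887  # any angle in radians will do
s, c = math.sin(A), math.cos(A)

# sin[4A] = 4[ sin[A]cos^3[A] - sin^3[A]cos[A] ]: four factors per term
assert abs(math.sin(4 * A) - 4 * (s * c**3 - s**3 * c)) < 1e-12

# sin[3A] = 3sin[A]cos^2[A] - sin^3[A]: three factors per term
assert abs(math.sin(3 * A) - (3 * s * c**2 - s**3)) < 1e-12
```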

Another point of merit is that if you can do SQRT[1 - sin^2] really fast then you get the cosines from the sines efficiently. Again you swap time for space.
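A sketch of that trick (plain Python, for illustration). The one caveat is that the square root only yields the magnitude, so the sign of the cosine has to be supplied from knowledge of the angle's quadrant:

```python
import math

def cos_from_sin(s, cos_is_positive=True):
    """Recover cos(A) from sin(A) via sqrt(1 - sin^2(A)).

    The sqrt gives only |cos(A)|; the caller must know the quadrant
    (here reduced to a single boolean for illustration).
    """
    c = math.sqrt(1.0 - s * s)
    return c if cos_is_positive else -c

# First quadrant: cosine positive.
assert abs(cos_from_sin(math.sin(0.3)) - math.cos(0.3)) < 1e-12
# Second quadrant: cosine negative.
assert abs(cos_from_sin(math.sin(2.0), cos_is_positive=False) - math.cos(2.0)) < 1e-12
```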

One especial feature is that all the sines and cosines - and by extension their powers - are bounded in magnitude below by zero and above by one. So you can do much in a fixed-point format and not have to worry excessively about the relative absolute size of operands, and whatever bits you might have put aside for an exponent can be used for extra mantissa. That means fewer operations for normalising, and hence fewer shifters and leading-zero counters ..... :-)
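As a toy illustration of that point: a Q15-style format (1 sign bit, 15 fraction bits) covers the whole [-1, 1) range with no exponent field and no normalisation step. Names and rounding choices here are illustrative only:

```python
import math

SCALE = 1 << 15  # Q15: values in [-1, 1) stored as 16-bit signed integers

def to_q15(x):
    """Encode a real value in [-1, 1) as a Q15 integer."""
    return int(round(x * SCALE))

def q15_mul(a, b):
    """Multiply two Q15 values: the raw product is Q30, so shift it
    back down to Q15 with round-to-nearest; no exponent handling,
    no shifters or leading-zero counters needed."""
    return (a * b + (1 << 14)) >> 15

a = to_q15(math.sin(0.5))
b = to_q15(math.cos(0.5))
# Result agrees with the floating-point product to within Q15 quantisation.
assert abs(q15_mul(a, b) / SCALE - math.sin(0.5) * math.cos(0.5)) < 1e-4
```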

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter. - Blaise Pascal

## RE: It's got a couple of


Cool, thx!

For perspective, I assume this is still at stock CPU clock?

I wonder if you can still get the CPU temperature by

cat /sys/devices/virtual/thermal/thermal_zone*/temp

It would be interesting to see if it's getting any hotter than the old one.

HB

P.S.: still busy downloading Raspbian image ...

## RE: RE: It's got a


It's stock clock at present; I haven't got my spare heatsinks with me (they're at my work address) and didn't want to push it too hard yet.

This is what I got as a temp for the Pi 2:

pi@raspberrypi ~ $ cat /sys/devices/virtual/thermal/thermal_zone*/temp

51920

and my original Model B, with a pair of heatsinks and at stock clock, gives (running my self-compiled armv6 stock Seti 7.0 app):

pi@raspberrypi ~ $ cat /sys/devices/virtual/thermal/thermal_zone*/temp

48692
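Those raw sysfs values are in millidegrees Celsius (the standard unit for Linux thermal zones), so the readings above are 51.92 deg C and 48.692 deg C. A trivial conversion helper, for illustration only:

```python
def millideg_to_celsius(raw):
    """Convert a sysfs thermal_zone reading (millidegrees C) to degrees C."""
    return raw / 1000.0

assert abs(millideg_to_celsius(51920) - 51.92) < 1e-9
assert abs(millideg_to_celsius(48692) - 48.692) < 1e-9
```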

Claggy

## Hi! It's a shame that the


Hi!

It's a shame that the BOINC version supplied by Raspbian wheezy is so old that it doesn't get the CPU features right for the ARM, so it will not allow you to get NEON tasks. I see why you compiled your own version now.

I selected the overclocking profile "PI2" from the raspi-config menu (CPU freq. at 1 GHz), and with 4 Einstein@Home tasks running in parallel I could easily get my PI2's CPU to over 75 deg C (!!) while it was enclosed in a PI B+ case. With the case open, the temperature fell to more reasonable values below 70 deg C, but then again it's not exactly summer in Germany now... I wonder what would happen if the ambient temperature were more like 30 deg C or so.

I suspect many of the PI cases that do not have ventilation vents over the CPU will not work well with a Raspi 2 under full load. It's also time to buy those tiny heat sinks, I guess. Not as bad as the Parallella though, which even needs a little fan.

The other oddity is that Raspbian currently only allows access to about 3/4 of the 1 GB total RAM :-( . This is a known problem and is expected to be fixed soon.

Cheers

HB

## Those who are using a


Those who are using a self-compiled BOINC version on the PI2 (not the 7.0.x version that comes with Raspbian wheezy, which will not correctly detect the presence of the NEON CPU feature) can experiment with providing optimized "wisdom" for the FFTW library. A wisdom file contains hints about the performance, on the particular hardware, of the individual building blocks of the FFT implementation that the library can choose from. When generating an FFT "plan" (assembling full FFT code from those building blocks), those hints guide the library and help it arrive at a better plan.

I'm currently experimenting with this wisdom file:

To use it, save it to a file

`wisdomf`

and copy it to the predefined system-wide location. Let's see if this does any good.

Cheers

HB

EDIT:

this might work better:

## I've come across a PI 2


I've come across a PI 2 scheduler problem at Albert: the scheduler won't send NEON work. I've reported it there with the scheduler log:

http://albertathome.org/content/pi-2-scheduler-problem

Claggy

## RE: I've come across a PI 2


Indeed a scheduler problem; Bernd already has a fix for this, which will be committed shortly.

Cheers

HB

## Just to inform the Pi-2


Just to inform the Pi-2 fans:

Raspberry Pi 2 schaltet sich bei Blitzlicht ab ("Raspberry Pi 2 shuts itself down when hit by a camera flash")

http://www.tomshardware.de/raspberry-pi2-blitzlicht-bug,news-252128.html

## RE: Just to inform the Pi-2


Indeed, the Raspberry Pi 2 has a photo-sensitive component that can be triggered into a transient failure by extremely bright light (a Xenon flash at short distance, or a laser pointer aimed directly at the chip in question):

http://www.raspberrypi.org/xenon-death-flash-a-free-physics-lesson/

It's the oddest case of an electromagnetic compatibility issue in electronics I've ever seen or heard of (I couldn't resist trying this (successfully) on my own Pi2... it's almost like that flashy-thing in MIB ;-) ).

Now this little component will hardly be built exclusively for the Pi2, and I wonder what other pieces of electronics might turn out to be susceptible to the 'Xenon flash of death'.

But there's also good news: an update of the firmware and kernel via rpi-update now fixes the problem that almost 250 MB of the RAM was not usable. Finally you get the full 1 GB (minus GPU memory). Even running 4 E@H tasks in parallel will leave plenty of free RAM ;-)

Cheers

HB

## Only for comparison, a Odroid


Only for comparison, an Odroid C1 features:

(I have no intention to run any BOINC project on this, but some numbers are always interesting, imo)

Processor: 4 ARM ARMv7 Processor rev 1 (v7l)
Processor features: swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4
Number of CPUs: 4
623 floating point MIPS (Whetstone) per CPU
2489 integer MIPS (Dhrystone) per CPU