Parallella, Raspberry Pi, FPGA & All That Stuff

KeithSloan

Joined: 13 Aug 10

Posts: 35

Credit: 1180559

RAC: 0

Looks like both sides flunked

15 Sep 2014 9:16:37 UTC

Message 111864 in response to message 111850

Looks like both sides flunked the bet. There is no sign of any software for the Raspberry Pi or Parallella, and today, the 15th of September, is the deadline (and a birthday).

BackGroundMAN

Joined: 25 Feb 05

Posts: 58

Credit: 246736656

RAC: 0

Hello, I have a raspberryPi

9 Oct 2014 17:44:21 UTC

Message 111865

Hello, I have a Raspberry Pi B+ and would like to ask whether there is a way to build the EaH client for ARM.

I want to test whether the GPU FFT API could be used for EaH's 3*2^22-point FFT... :)

Thank you,

KeithSloan

Joined: 13 Aug 10

Posts: 35

Credit: 1180559

RAC: 0

I would pm BikeMan he was

10 Oct 2014 12:10:15 UTC

Message 111866 in response to message 111865

I would PM BikeMan; he was supposed to be working on this, but I suspect he has got distracted. The source should be available from http://einstein.phys.uwm.edu/license.php

BackGroundMAN

Joined: 25 Feb 05

Posts: 58

Credit: 246736656

RAC: 0

Hello, The good news is

19 Oct 2014 13:34:07 UTC

Message 111867

Hello,

The good news is that I managed to build the EaH client for RaspberryPi.

I downloaded a test workunit from here and ran the following command on both the Raspberry Pi and an x86_64 PC:

./einsteinbinary_XXXX-pc-linux-gnu -t ./test/templates_400Hz_2_short2.bank -l zaplist_232.txt -A 0.04 -P 3.0 -W -z -i ./test/J1907+0740_dm_482.binary -c status_profile.cpt -o results_profile.cand

with a short template bank (because of the Raspberry Pi) [templates_400Hz_2_short2.bank].

And now the bad news.
The Raspberry Pi @ 1GHz is almost 16 times slower than an AMD FX-8350 @ 4GHz.

The really bad news is that the results [results_profile.cand] from the Raspberry Pi differ in some numbers (at the 4th fractional digit) from those of the FX-8350. I built the x86_64 client on two other PCs (AMD and Intel), and their results are the same as the FX-8350's.
Is this normal for the Raspberry Pi client?
Could the compiler flags I added for the cross-compile be causing these differences?

I used these flags for the compilation:

CFLAGS="-march=armv6zk -mcpu=arm1176jzf-s -mtune=arm1176jzf-s -mfpu=vfp -mfloat-abi=hard"

BackGroundMAN

Joined: 25 Feb 05

Posts: 58

Credit: 246736656

RAC: 0

After running the same test

21 Oct 2014 17:49:25 UTC

Message 111868

After running the same test workunit with several official EaH clients on different architectures (x86_64, ARM, CUDA), I get different results. I guess this is normal for EaH (???).

The main problem now is that the ARM client I built produces different results from the official ARM client. This is most likely due to the different CFLAGS I used.

KeithSloan

Joined: 13 Aug 10

Posts: 35

Credit: 1180559

RAC: 0

Hope you have some success. I

3 Nov 2014 7:25:54 UTC

Message 111869 in response to message 111868

Hope you have some success. I have two Pis running Einstein@Home 24x7 and would love for somebody to exploit the FFT library so that they run at a more reasonable rate.

BackGroundMAN

Joined: 25 Feb 05

Posts: 58

Credit: 246736656

RAC: 0

Hello, I try to measure the

18 Dec 2014 14:06:28 UTC

Message 111870

Hello, I am trying to measure the FFT processing time on the Raspberry Pi, and I see (in demod_binary_fft_fftw.c) that the FFT inputs are nsamples == 6M and fft_size == 3M. The FFTW plan is an R2C FFT of 6M points (nsamples), which outputs 3M complex points (half of the FFT output).

My question: is the R2C FFT size 3*2^22 or 3*2^21?

I am trying to implement this FFT on the Raspberry Pi's GPU, which supports C2C FFTs of up to 1M points. I have implemented the radix-2, radix-3, and C2C-to-R2C stages, and I am trying to measure the potential speedup from GPU-FFT, so the fft_size is crucial to the measurements...

Is the fft_size fixed, or does it vary with the WU?

Thank you,

KeithSloan

Joined: 13 Aug 10

Posts: 35

Credit: 1180559

RAC: 0

Check the following message

19 Dec 2014 21:26:50 UTC

Message 111871 in response to message 111870

Check the following message http://einsteinathome.org/node/196560&nowrap=true#128680

It suggests to me that it computes a 3*2^22-point real-to-complex DFT.

BackGroundMAN

Joined: 25 Feb 05

Posts: 58

Credit: 246736656

RAC: 0

I take a WU from a running

20 Dec 2014 10:58:21 UTC

Message 111872

I took a WU from a running (ARM) system, and its fft_size is a 12M-point R2C.
Most likely some units need 12M-point FFTs and others need 6M-point FFTs (or the units I used for testing were very old).

I took some timing measurements and found that the FFT (FFTW) takes about 58% of the total template-loop processing time on a Raspberry Pi @ 1GHz and about 63% on a Parallella board. This difference is most likely due to the NEON engine in the Parallella's ARM processor.
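Those percentages bound what any faster FFT can buy for the template loop. A back-of-the-envelope estimate using Amdahl's law, where f is the FFT's share of template-loop time and s is a hypothetical FFT speedup (s = 4 is purely illustrative, not a measurement, and transfer overheads to and from the GPU are ignored):

```latex
S(s) = \frac{1}{(1 - f) + f/s}, \qquad
S_{\max} = \lim_{s \to \infty} S(s) = \frac{1}{1 - f}

\text{Raspberry Pi } (f = 0.58):\quad
S_{\max} = \frac{1}{0.42} \approx 2.38, \qquad
S(4) = \frac{1}{0.42 + 0.145} \approx 1.77

\text{Parallella } (f = 0.63):\quad
S_{\max} = \frac{1}{0.37} \approx 2.70, \qquad
S(4) = \frac{1}{0.37 + 0.1575} \approx 1.90
```

So even an infinitely fast FFT would make the template loop at most about 2.4 times faster on the Pi, since the remaining 42% of the time is untouched.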

Thank you,

KeithSloan

Joined: 13 Aug 10

Posts: 35

Credit: 1180559

RAC: 0

I see the new version of the

2 Jan 2015 20:50:39 UTC

Message 111873 in response to message 111872

I see the new version of the Pi's FFT library now supports 2^21 points. Does that mean it's now closer, but still no cigar? See https://github.com/raspberrypi/firmware/tree/master/opt/vc/src/hello_pi/hello_fft
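For reference, the size arithmetic behind that question, assuming M means 2^20 (consistent with nsamples = 6M being 3*2^21) and assuming, as one reading of the radix-3 and C2C-to-R2C stages mentioned earlier, that the R2C is done by packing the real samples into a half-length complex FFT and splitting off one radix-3 stage:

```latex
\begin{aligned}
6\text{M}  &= 6 \cdot 2^{20}  = 3 \cdot 2^{21} \text{ real}
  \;\to\; 3 \cdot 2^{20} \text{ complex}
  \;\to\; 3 \times 2^{20}\text{-point C2C} \\
12\text{M} &= 12 \cdot 2^{20} = 3 \cdot 2^{22} \text{ real}
  \;\to\; 3 \cdot 2^{21} \text{ complex}
  \;\to\; 3 \times 2^{21}\text{-point C2C}
\end{aligned}
```

Under those assumptions, the 6M-point units need three 1M-point (2^20) C2C transforms and the 12M-point units need three 2^21-point ones, so 2^21 support would at least cover the sub-transform size for the larger units; whether that beats FFTW on the Pi is a separate question.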
