Parallella, Raspberry Pi, FPGA & All That Stuff

KeithSloan
Joined: 13 Aug 10
Posts: 35
Credit: 1180559
RAC: 0

Looks like both sides flunked the bet. There is no sign of any software for the Raspberry Pi or the Parallella, and today, the 15th of September, is the deadline and a birthday.

BackGroundMAN
Joined: 25 Feb 05
Posts: 58
Credit: 246736656
RAC: 0

Hello, I have a Raspberry Pi B+ and I want to ask whether there is a way to build the EaH client for ARM.

I want to test whether the GPU FFT API could be used for EaH's 3*2^22-point FFT... :)

Thank you,

KeithSloan

I would PM BikeMan; he was supposed to be working on this, but I suspect he has got distracted. The source should be available from http://einstein.phys.uwm.edu/license.php

BackGroundMAN

Hello,

The good news is that I managed to build the EaH client for the Raspberry Pi.

I downloaded a test workunit from here and ran the following command on both the Raspberry Pi and an x86_64 PC:

./einsteinbinary_XXXX-pc-linux-gnu -t ./test/templates_400Hz_2_short2.bank -l zaplist_232.txt -A 0.04 -P 3.0 -W -z -i ./test/J1907+0740_dm_482.binary -c status_profile.cpt -o results_profile.cand

with a short template bank [templates_400Hz_2_short2.bank], because the Raspberry Pi is slow.

And now the bad news.
The Raspberry Pi @ 1 GHz is almost 16 times slower than an AMD FX-8350 @ 4 GHz.

The really bad news is that the results [results_profile.cand] from the Raspberry Pi differ in some values (at the 4th decimal place) from the FX-8350 results. I built the x86_64 client on two other PCs (AMD and Intel), and their results match the FX-8350's.
Is this normal for the Raspberry Pi client?
Could the compilation flags I added for the cross-compile be causing these differences?

I used these flags for the compilation:

CFLAGS="-march=armv6zk -mcpu=arm1176jzf-s -mtune=arm1176jzf-s -mfpu=vfp -mfloat-abi=hard"
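Last-digit differences like these are often caused by floating-point rounding rather than by a broken build (an assumption here; the thread does not confirm the cause). Floating-point addition is not associative, so when a different compiler, different flags, or a different FPU (VFP, SSE, CUDA) changes the order or precision of operations, results can drift in the low digits. A minimal C illustration, summing the same three floats in two orders:

```c
#include <assert.h>
#include <stddef.h>

/* Sum the array front to back, rounding to float after every addition. */
float sum_forward(const float *x, size_t n)
{
    assert(x != NULL || n == 0);
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Same numbers, opposite order: mathematically equal, but rounded differently. */
float sum_backward(const float *x, size_t n)
{
    assert(x != NULL || n == 0);
    float acc = 0.0f;
    for (size_t i = n; i-- > 0; )
        acc += x[i];
    return acc;
}
```

On a typical IEEE-754 single-precision target, {1.0f, 1e8f, -1e8f} sums to 0 front to back (the 1.0f is absorbed by 1e8f, whose neighbouring floats are 8 apart) and to 1 back to front.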

BackGroundMAN

After running the same test workunit with several official EaH clients on different architectures (x86_64, ARM, CUDA), I get different results. I guess this is normal for EaH (???).

The main problem now is that the ARM client I built produces different results from the official ARM client. This is most probably due to the different CFLAGS I used.

KeithSloan

Hope you have some success. I have two Pis running Einstein@Home 24x7 and would love somebody to exploit the FFT library so that they run at a more reasonable rate.

BackGroundMAN

Hello, I tried to measure the processing time of the FFT on the Raspberry Pi, and I see (file: demod_binary_fft_fftw.c) that the FFT inputs are nsamples == 6M and fft_size == 3M. The FFTW plan is an R2C FFT of 6M points (nsamples), which outputs 3M complex points (half of the FFT output).

My question is: is the R2C FFT size 3*2^22 or 3*2^21?

I am trying to implement this FFT on the Raspberry Pi's GPU, which supports C2C FFTs of up to 1M points. I have implemented the radix-2, radix-3 and C2C-to-R2C stages, and I am trying to measure the potential speedup from GPU-FFT, so the fft_size is crucial to the measurements...

Is fft_size fixed, or does it vary from WU to WU?

Thank you,

KeithSloan

Check the following message http://einsteinathome.org/node/196560&nowrap=true#128680

It suggests to me that it computes a 3*2^22-point real-to-complex DFT.

BackGroundMAN

I took a WU from a running (ARM) system, and its fft_size is a 12M-point R2C.
Most probably some units need 12M-point FFTs and others need 6M-point FFTs (or the units I used for testing were very old).

I took some timing measurements and found that the FFT (FFTW) takes about 58% of the total template-loop processing time on a Raspberry Pi @ 1 GHz and about 63% on a Parallella board. This difference is most probably due to the NEON engine in the Parallella's ARM processor.
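Those percentages bound what a faster FFT alone can buy (Amdahl's law; the model assumes everything outside the FFT keeps its current speed): with f = 0.58 the best possible overall gain is 1/(1 - 0.58), about 2.4x on the Pi, and with f = 0.63 it is 1/0.37, about 2.7x on the Parallella. A small C helper for trying other FFT speedups (function names are illustrative):

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction f of the run time
   is made s times faster and the remaining (1 - f) is unchanged. */
double amdahl_speedup(double f, double s)
{
    assert(f >= 0.0 && f <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}

/* Upper bound as s grows without limit (the accelerated part costs nothing). */
double amdahl_limit(double f)
{
    assert(f >= 0.0 && f < 1.0);
    return 1.0 / (1.0 - f);
}
```

For example, a hypothetical GPU FFT that ran 4x faster than FFTW would give 1/(0.42 + 0.58/4) = 1/0.565, roughly 1.77x overall on the Pi.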

Thank you,

KeithSloan

I see the new version of the Pi's FFT library now supports 2^21 points. Does that mean it is now closer, but still no cigar? See https://github.com/raspberrypi/firmware/tree/master/opt/vc/src/hello_pi/hello_fft
