Parallella, Raspberry Pi, FPGA & All That Stuff

BackGroundMAN
Joined: 25 Feb 05
Posts: 58
Credit: 246736656
RAC: 0

Yes, I sent an email to Andrew Holme (the RPi GPU_FFT developer) and he sent me the code for the 2M-point GPU_FFT to test with the EaH client. He released this code publicly a few days later. It is faster to do three 2M-point FFTs and a radix-3 stage than six 1M-point FFTs, three radix-2 stages and a radix-3 stage.
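
For readers unfamiliar with the split, here is a minimal sketch of how one radix-3 stage can combine three length-N transforms into a single length-3N transform. It is not taken from GPU_FFT or the EaH client: the function names are made up for illustration, and a naive DFT stands in for the real sub-FFTs so that the example stays self-contained.

```python
import cmath

def dft(x):
    """Naive O(n^2) DFT, standing in for a real FFT library call."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def fft_via_radix3(x):
    """Length-3N DFT built from three length-N DFTs plus one radix-3 stage."""
    total = len(x)
    assert total % 3 == 0
    n = total // 3
    # Decimation in time: sub-sequence r holds x[r], x[r+3], x[r+6], ...
    subs = [dft(x[r::3]) for r in range(3)]
    out = [0j] * total
    for k in range(n):
        for q in range(3):
            idx = k + q * n
            # Radix-3 butterfly: twiddle each sub-result and sum.
            out[idx] = sum(
                cmath.exp(-2j * cmath.pi * r * idx / total) * subs[r][k]
                for r in range(3))
    return out

if __name__ == "__main__":
    data = [complex(i % 5, -i) for i in range(12)]
    diff = max(abs(a - b) for a, b in zip(dft(data), fft_via_radix3(data)))
    print("max difference vs. direct DFT:", diff)
```

In the case discussed here N would be 2M points and the three sub-transforms would be the GPU_FFT calls; the combining stage is the extra work done outside them.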

I measured the speedup on the main loop of the EaH client: 37% with the 6x1M-point GPU_FFTs and 42% with the 3x2M-point GPU_FFTs. These numbers would reduce the total execution time (6662 loop iterations) of one WU from 31 hours to 19.5 and 17.9 hours, respectively.
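
The run-time estimate can be reproduced with a few lines, assuming the percentages mean a reduction of the total run time of a WU (that reading is an assumption; the helper name is made up for illustration). The results come out close to the quoted figures; the small difference for the second case presumably comes from rounding of the percentages.

```python
def hours_after_speedup(baseline_hours, reduction_percent):
    """Run time if the whole WU gets shorter by the given percentage."""
    return baseline_hours * (1.0 - reduction_percent / 100.0)

if __name__ == "__main__":
    for label, pct in [("6 x 1M-point", 37), ("3 x 2M-point", 42)]:
        print(f"{label}: {hours_after_speedup(31, pct):.2f} h")
```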

The problem is that the RMS error of the GPU_FFT implementation is very high compared to that of the FFTW library, so it is very unlikely to produce valid results. I am working on changing the twiddle calculation of the GPU_FFT and I am going to inform Andrew about this RMS error problem. I hope there is an easy solution...

Thank you,

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 686042851
RAC: 592401

Hi!

I'm glad to see that someone is looking into this; I haven't had much success so far in my experiments with the Raspi GPU FFT. The newly released FFT lib improvements (including a GPU-accelerated transpose and a macro assembler for the GPU) should be quite useful. I was planning to try a radix-6 FFT on the GPU, but this is only a private side project for me and I haven't yet mastered the details of the Raspi's GPU.

We use different FFT lengths for the Arecibo and Parkes BRP searches, but only the Arecibo search runs on the Raspi. That one uses a 3*2^22 real-to-complex, single-precision FFT for all of its WUs (for ARM Linux (e.g. Raspi), Android, Intel HD GPUs, and sometimes WUs for x86 CPUs as well).
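
As a quick sanity check on the lengths mentioned in this thread, the sketch below works through the arithmetic. It assumes the usual trick of computing a real FFT of length 2N as a complex FFT of length N followed by an R2C stage; the thread suggests this but does not spell it out, so treat it as an assumption.

```python
REAL_POINTS = 3 * 2**22      # length of the real-to-complex FFT

# Assumption: the real transform is packed into a complex FFT of half
# the length, followed by an R2C post-processing stage.
COMPLEX_POINTS = REAL_POINTS // 2

M = 2**20                    # "1M" points

if __name__ == "__main__":
    print("real points:", REAL_POINTS)                      # 12M
    print("complex points:", COMPLEX_POINTS)                # 6M
    print("3 x 2M split fits:", COMPLEX_POINTS == 3 * 2 * M)
    print("6 x 1M split fits:", COMPLEX_POINTS == 6 * M)
    # Single precision: 4 bytes per real input sample.
    print("input buffer:", REAL_POINTS * 4 / 2**20, "MiB")
```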

As for the RMS error: did you check the error yourself, or is this taken from the output of the "hello-fft" demo test program for the Pi? I think that demo code actually has a bug in its RMS error calculation for long transforms, so I would not rely on that output.

Cheers
HB

BackGroundMAN
Joined: 25 Feb 05
Posts: 58
Credit: 246736656
RAC: 0

Hi,

I wrote some custom code to verify several splits of the 3*2^22 R2C FFT.
The verification can use random data (uniform or normal distribution with several scaling factors)
or data from an EaH WU (dumped from the EaH client).

For the case of FFT input data from a WU I have the following results:

1. 3xGPU_FFT 2M + radix-3 + R2C stage -> RMS = 5.38*10^-4
2. 6xGPU_FFT 1M + 3xradix-2 + radix-3 + R2C stage -> RMS = 2.69*10^-4
3. 6xFFTW 1M + 3xradix-2 + radix-3 + R2C stage -> RMS = 1.23*10^-7
4. 3xFFTW 2M + radix-3 + R2C stage -> RMS = 2.05*10^-7

All the results are compared with the R2C FFTW 12M-point FFT (used by the default EaH client).
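
The post does not say exactly how the RMS figure is defined (absolute or relative to the reference), so the following is only a sketch of one plausible way to compute it against a reference spectrum. The function name and the `relative` switch are hypothetical.

```python
import math

def rms_error(result, reference, relative=False):
    """RMS of the element-wise difference between two complex sequences."""
    assert len(result) == len(reference) and len(reference) > 0
    n = len(reference)
    err = math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(result, reference)) / n)
    if relative:
        # Normalise by the RMS magnitude of the reference spectrum.
        ref_rms = math.sqrt(sum(abs(b) ** 2 for b in reference) / n)
        return err / ref_rms
    return err

if __name__ == "__main__":
    reference = [3 + 0j, 4j, -1 + 2j]
    perturbed = [v + 1e-4 for v in reference]
    print("absolute:", rms_error(perturbed, reference))
    print("relative:", rms_error(perturbed, reference, relative=True))
```

Whichever definition is used, it has to be the same for all four cases for the comparison to be meaningful.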

Furthermore, I implemented cases 1, 2 and 4 inside the EaH client and ran tests with a WU and 100/500 templates.
The default client and case 4 produce the same results, while cases 1 and 2 produce different results (fewer/different candidates).
I will run more tests, but I think that an RMS error of about 3*10^-4 is too high for the EaH client.

Thanks,

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 686042851
RAC: 592401

Nice testing!!

Yeah indeed, the error for the GPU variants looks rather high :-(.

Cheers
HB

KeithSloan
Joined: 13 Aug 10
Posts: 35
Credit: 1180559
RAC: 0

So has there been any reaction from Andrew, the author of the Pi's GPU FFT library?

BackGroundMAN
Joined: 25 Feb 05
Posts: 58
Credit: 246736656
RAC: 0

Hi,

We are discussing with Andrew the possibility of replacing the twiddle generation functions with LUT-based twiddles to improve the accuracy of the GPU_FFT.
This might solve the accuracy problems for the EaH client.
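
To illustrate why a lookup table can help, here is a small sketch comparing twiddle factors generated by repeated multiplication with twiddles taken from a precomputed table, both stored in single precision. How GPU_FFT actually generates its twiddles is not described in this thread, so the recurrence below is only an assumed stand-in, and float32 rounding is emulated with `struct` rather than run on the GPU.

```python
import cmath
import struct

def f32(x):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def twiddles_recurrence(n):
    """Twiddles by repeated multiplication, each operation rounded to float32."""
    step = cmath.exp(-2j * cmath.pi / n)
    sr, si = f32(step.real), f32(step.imag)
    wr, wi = 1.0, 0.0
    out = []
    for _ in range(n):
        out.append(complex(wr, wi))
        nr = f32(f32(wr * sr) - f32(wi * si))
        ni = f32(f32(wr * si) + f32(wi * sr))
        wr, wi = nr, ni
    return out

def twiddles_lut(n):
    """Twiddles computed one by one in double precision, stored as float32."""
    out = []
    for k in range(n):
        w = cmath.exp(-2j * cmath.pi * k / n)
        out.append(complex(f32(w.real), f32(w.imag)))
    return out

def max_error(table):
    """Largest deviation from the double-precision twiddle values."""
    n = len(table)
    return max(abs(w - cmath.exp(-2j * cmath.pi * k / n))
               for k, w in enumerate(table))

if __name__ == "__main__":
    n = 4096
    print("recurrence  :", max_error(twiddles_recurrence(n)))
    print("lookup table:", max_error(twiddles_lut(n)))
```

With the recurrence, rounding errors accumulate from one twiddle to the next, while each table entry is rounded only once. The price of the table is extra memory and memory traffic.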

Thank you,

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 686042851
RAC: 592401

In somewhat of a surprise move, the Raspberry Pi Foundation has today announced the availability of the "Raspberry Pi 2 Model B".

http://www.raspberrypi.org/raspberry-pi-2-on-sale/

I think this is pretty close to being exactly what people had hoped for as the next Raspberry Pi, only sooner than expected. Some highlights:

* same price point, 35 US$
* more RAM (1GB instead of 512 MB for the model B)
* more cores : now a quad-core
* more up-to-date instruction set (ARMv7), which means that a wider range of software will be usable out of the box, including ...Windows 10 ...(ehhh...???!??!? who needs that... well anyway....)
* this also means NEON vector arithmetic units.
* 100% backwards compatibility to Raspi "1" B+ (software, GPIO stuff, also the GPU (!))

The only, inevitable drawback is the higher power requirements under full load,
EDIT: that is, compared to the B+, it's reported to be about the same as the original B \EDIT
so after upgrading you might need a better power supply.

Can't wait to get mine....

Einstein@Home already has an ARMv7 NEON capable BRP4 app for Linux, and the 1GB RAM should in theory be enough to run 3 or 4 tasks in parallel, so E@H is more than ready for this new board. All this would probably boost productivity in E@H compared to a Raspi "1" by a factor of 4 or more if (!!) the RAM can provide the necessary throughput.

Cheers
HB

KeithSloan
Joined: 13 Aug 10
Posts: 35
Credit: 1180559
RAC: 0

Quote:

In somewhat of a surprise move, the Raspberry Pi Foundation has today announced the availability of the "Raspberry Pi 2 Model B".

http://www.raspberrypi.org/raspberry-pi-2-on-sale/

I think this is pretty close to being exactly what people had hoped for as the next Raspberry Pi, only sooner than expected. Some highlights:

E@H is more than ready for this new board. All this would probably boost productivity in E@H compared to a Raspi "1" by a factor of 4 or more if (!!) the RAM can provide the necessary throughput.

Cheers
HB

Hope this is not going to stop the effort to exploit the GPU with the GPU fft library, as I have two of the old Raspberry B's that run E@H 24 x 7 that could do with a speed boost.

I have one that is not overclocked, with an average credit just over 24.
One that is overclocked to 900 MHz, whose average credit is just over 34.
I also have a Cubieboard 2, which is like the new Raspberry Pi but only dual-core rather than quad-core, and which averages over 120. If I got a new Pi 2 and got over 240, that would be close to my oldie desktop 2.5 GHz Intel Core 2 Duo that averages 279.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 686042851
RAC: 592401

Hi!

Quote:


Hope this is not going to stop the effort to exploit the GPU with the GPU fft library, as I have two of the old Raspberry B's that run E@H 24 x 7 that could do with a speed boost.

I have one that is not overclocked, with an average credit just over 24.
One that is overclocked to 900 MHz, whose average credit is just over 34.
I also have a Cubieboard 2, which is like the new Raspberry Pi but only dual-core rather than quad-core, and which averages over 120. If I got a new Pi 2 and got over 240, that would be close to my oldie desktop 2.5 GHz Intel Core 2 Duo that averages 279.

Hmm...that is a bit strange: running 24/7 overclocked, the performance should be similar to this Raspi of mine:

http://einsteinathome.org/host/11456696/tasks&offset=0&show_names=1&state=3&appid=0

so more like RAC of 60.

Good point about GPU acceleration: yes, the Raspi2 has the same GPU so the code will (should) still work, but with the more efficient and NEON-enabled new CPU, the incentive to go this way is weaker. OTOH, the Cortex-A7 cores of the Raspi2 do not use NEON to its full potential (64-bit execution in parallel, so just 2 single-precision ops, not 4 as in the Cortex-A15, for example). Still... a RAC of 240 might be feasible, beating my OUYA under Android (doing 3 tasks in parallel 24/7).

HB

BackGroundMAN
Joined: 25 Feb 05
Posts: 58
Credit: 246736656
RAC: 0

The new Rpi2 has a custom SoC from Broadcom (BCM2836) that is the same as the old one in the Rpi1B (BCM2835), but with a new quad-core ARM Cortex-A7. The GPU (VideoCore IV 3D) is the same, and I think that the current GPU_FFT API would run out of the box (or with small changes in the source).

I am not so sure that using the GPU_FFT in the EaH client would give a significant speedup over the Cortex-A7. Furthermore, I think that it is very difficult to support multiple clients (multi-core) using the GPU_FFT at the same time.

I have been trying since yesterday to buy some Rpi2 boards, but I can't find a shop that ships to my country yet.

I will run some tests with the EaH client on the new Rpi2 and post the results in the forum.

Thank you,
