Parallella, Raspberry Pi, FPGA & All That Stuff

KeithSloan
Joined: 13 Aug 10
Posts: 35
Credit: 1180559
RAC: 0

RE: Hmm...that is a bit

Quote:

Hmm...that is a bit strange: running 24/7 overclocked, the performance should be similar to this Raspi of mine:

Mine is at http://einsteinathome.org/account/tasks
Maybe the "validate inconclusive" results are the problem. I hadn't spotted them before.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 731085150
RAC: 1201698

That link won't work as your

That link won't work as your hosts are hidden. Even then, links to individual hosts will work, tho.

Cheers
HB

Tom Rinehart
Joined: 17 Jun 09
Posts: 9
Credit: 6591748
RAC: 0

RE: Hi, We are discussing

Quote:

Hi,

We are discussing with Andrew the possibility of replacing the twiddle generation functions with LUT-based twiddle generation to improve the accuracy of GPU_FFT.
This might solve the accuracy problems for the E@H client.

Thank you,

Have you heard back from Andrew, and if so, have you had any success creating a client that uses GPU_FFT? I run a solar-powered RPi model B and it would be great to have a faster client. I'm using the Turbo setting (1 GHz) and have added heat sinks to my RPi. It completes a work unit in 25 hours.

KeithSloan
Joined: 13 Aug 10
Posts: 35
Credit: 1180559
RAC: 0

RE: That link won't work as

Quote:

That link won't work as your hosts are hidden. Even then, links to individual hosts will work, tho.

Cheers
HB

Okay try these

http://einsteinathome.org/host/10457609/tasks
http://einsteinathome.org/host/11678121/tasks

I see these are both about 62 points per WU.

The numbers I was quoting were from
http://einstein.phys.uwm.edu/hosts_user.php

Which I assume is per day

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 731085150
RAC: 1201698

Hi! I see. The per task

Hi!

I see. The per task CPU run time is consistently higher than on my fastest Raspi which needs less than 90k sec per task (the tasks are pretty much all equal, run time wise).
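To compare hosts, a per-task run time can be turned into a rough credits-per-day figure. The sketch below assumes a host that crunches 24/7 on equal-length tasks. The 62 credits per task and 90k sec per task are the values quoted in this thread; the function name is made up for illustration. RAC is a recent average, so it only approaches this figure over time.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def credits_per_day(credit_per_task, seconds_per_task, parallel_tasks=1):
    """Steady-state daily credit for a host running tasks back to back."""
    tasks_per_day = SECONDS_PER_DAY / seconds_per_task
    return credit_per_task * tasks_per_day * parallel_tasks


if __name__ == "__main__":
    # 62 credits per task, 90,000 s (25 h) per task, one task at a time.
    print(round(credits_per_day(62.0, 90_000.0), 2))
```

A slower host simply plugs in its own per-task CPU time; the ratio of two hosts' daily figures is the inverse ratio of their run times.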

I guess they might be less aggressively overclocked. Mine (a model B) has these parameters in /boot/config.txt:

arm_freq=1000
core_freq=500
sdram_freq=600
over_voltage=6

Not all Raspis will overclock to this level, tho, try at your own risk :-).
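To compare settings between two Pis, it can help to pull these values out of the file programmatically. This is a minimal sketch that only handles plain key=value lines and # comments; it ignores anything else (such as section filters), and the helper name parse_config is made up for illustration. The sample text is just the four lines above; on a real Pi one would read /boot/config.txt instead.

```python
def parse_config(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue  # blank line, comment, or something this sketch ignores
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


SAMPLE = """\
arm_freq=1000
core_freq=500
sdram_freq=600
over_voltage=6
"""

if __name__ == "__main__":
    settings = parse_config(SAMPLE)
    for key in ("arm_freq", "core_freq", "sdram_freq", "over_voltage"):
        # A missing key means the firmware default applies.
        print(key, "=", settings.get(key, "(not set)"))
```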

HB

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6590
Credit: 318749181
RAC: 408119

The twiddles* are a challenge

The twiddles* are a challenge indeed, they are the piece of the FFT which doesn't factorise like the rest. Producing them efficiently is governed by the classic memory vs. speed conundrum. If they are not accurate then one effectively gets a lower resolution transform.

What's the memory on a Pi and what are the float/precision operand lengths?

Cheers, Mike.

* Essentially one needs the sine and cosine of every angle from 0 to 2PI in increments of 2PI/N where N is the transform size.
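Written out, and assuming the sign convention of a forward DFT (the thread does not state one), the twiddle factors for a transform of size N are:

```latex
W_N^{k} \;=\; e^{-2\pi i k / N}
        \;=\; \cos\!\left(\frac{2\pi k}{N}\right) - i\,\sin\!\left(\frac{2\pi k}{N}\right),
\qquad k = 0, 1, \dots, N-1 .
```

A table of all N values in single precision would take 8N bytes (two 32-bit floats per entry), which is where the memory vs. speed trade-off comes from.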

( edit ) Perversely : on the Parallella platform you could probably just totally ignore the Epiphany chip and use the dual-core ARM, BUT configure the FPGA to do the FFT heavy lifting. Even if the FPGA simply generated the twiddles on demand that would still be one heck of a speed and memory advantage. I'll have another look at that when/if I get a chance ...

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 731085150
RAC: 1201698

RE: What's the memory on a

Quote:

What's the memory on a Pi and what are the float/precision operand lengths?

The "Raspberry Pi Model B" comes with 512 MB RAM, shared by CPU and GPU, with a configurable memory-split. We are talking about single precision FFTs, real to complex.

The new quad-core "Raspberry Pi 2 model B" has 1 GB RAM. The smaller 256 MB RAM "model A" is probably not worth exploring for this.

Cheers
HB

BackGroundMAN
Joined: 25 Feb 05
Posts: 58
Credit: 246736656
RAC: 0

Unfortunately, I do not have

Unfortunately, I do not have any update on GPU_FFT yet. I haven’t had the chance to change the twiddle generation procedure.

@Mike Hewson: Actually, for an N-point C2C FFT you can reduce the memory needed by storing the cos/sin values for only N/8 angles and exploiting the twiddle symmetries (at the cost of more calculations). The GPU of the RPi supports single-precision floats (32 bit), but I do not know whether there is an extended mode for intermediate results (e.g. 40 bits). In the current implementation the twiddles are pre-calculated on the ARM in double precision and then stored (cast) in single precision in GPU memory. The accuracy problem is most probably a result of the pre-calculation procedure, which is step-based rather than LUT-based. The step-based procedure calculates “higher” twiddles (smaller angles) from previously calculated twiddle values, so errors accumulate in the “higher” twiddles.
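The error accumulation can be illustrated with a small experiment. This is only a sketch of the general effect, not the GPU_FFT code: it assumes a simple recurrence w[k+1] = w[k] * w[1] with each step's result rounded to single precision, and compares it with computing every entry directly in double precision and rounding once. All function names are made up for illustration.

```python
import math
import struct


def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def twiddles_step(n):
    """Step-based: each twiddle is the previous one rotated by 2*pi/n."""
    c1 = f32(math.cos(2 * math.pi / n))
    s1 = f32(math.sin(2 * math.pi / n))
    c, s = 1.0, 0.0
    out = [(c, s)]
    for _ in range(1, n):
        # Complex multiply by w[1]; the result is rounded to float32 each step.
        c, s = f32(c * c1 - s * s1), f32(s * c1 + c * s1)
        out.append((c, s))
    return out


def twiddles_direct(n):
    """Table-style: every entry computed in double precision, rounded once."""
    return [
        (f32(math.cos(2 * math.pi * k / n)), f32(math.sin(2 * math.pi * k / n)))
        for k in range(n)
    ]


def max_error(tw):
    """Largest distance between a stored twiddle and the double-precision value."""
    n = len(tw)
    return max(
        math.hypot(c - math.cos(2 * math.pi * k / n),
                   s - math.sin(2 * math.pi * k / n))
        for k, (c, s) in enumerate(tw)
    )


if __name__ == "__main__":
    n = 4096
    print("step-based max error:", max_error(twiddles_step(n)))
    print("direct     max error:", max_error(twiddles_direct(n)))
```

The direct table stays within single-precision rounding of the true values, while the recurrence carries the rounding error of every earlier step forward, so its worst-case error grows with the table length.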

Thank you,

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 731085150
RAC: 1201698

That sounds reasonably easy

That sounds reasonably easy to fix. Actually, we at E@H had a pretty similar problem some years ago when we used an OpenCL FFT lib that computed the twiddle factors with faster but reduced-precision trig functions (native_sin, native_cos). We replaced this with a LUT-based method.
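For illustration, here is one way a LUT could be combined with the N/8 symmetry trick mentioned earlier in the thread. It is a sketch under assumptions, not the E@H or GPU_FFT implementation: N is assumed divisible by 8, the table holds (cos, sin) for the first octant only, and the helper names are made up. The table is kept in double precision here; a real implementation would cast it to single precision before uploading it.

```python
import math


def build_octant_table(n):
    """(cos, sin) of 2*pi*k/n for k = 0 .. n/8, i.e. angles in [0, pi/4]."""
    if n % 8:
        raise ValueError("n must be divisible by 8")
    return [
        (math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
        for k in range(n // 8 + 1)
    ]


def twiddle(table, n, k):
    """Return (cos, sin) of 2*pi*k/n using only the first-octant table."""
    k %= n
    quadrant, r = divmod(k, n // 4)  # r indexes the angle inside its quadrant
    if r <= n // 8:
        c, s = table[r]
    else:
        # cos(x) = sin(pi/2 - x) and sin(x) = cos(pi/2 - x): swap the pair.
        s, c = table[n // 4 - r]
    for _ in range(quadrant):
        # Rotate by 90 degrees: (cos, sin) -> (-sin, cos).
        c, s = -s, c
    return c, s


if __name__ == "__main__":
    n = 64
    table = build_octant_table(n)
    print("table entries:", len(table), "instead of", n)
    print("k = 20:", twiddle(table, n, 20))
```

The lookup costs an index calculation plus a swap and sign flips, which is the "more calculations" side of the trade-off, in exchange for a table roughly one eighth the size.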

HB

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2699403
RAC: 0

RE: Can't wait to get

Quote:

Can't wait to get mine....

Einstein@Home already has an ARMv7 NEON capable BRP4 app for Linux, and the 1GB RAM should in theory be enough to run 3 or 4 tasks in parallel, so E@H is more than ready for this new board. All this would probably boost productivity in E@H compared to a Raspi "1" by a factor of 4 or more if (!!) the RAM can provide the necessary throughput.


Mine arrived yesterday and the micro SD card this morning. I've just spent the last couple of hours putting Raspbian Wheezy on it and booting it (wow, that was quick),
then getting the BOINC source, building BOINC 7.2.47 and attaching it to SETI, SETI Beta, Einstein and Albert.
It's got a couple of NEON tasks from here and a couple of non-NEON tasks from Albert, and it's just running two at a time at present:

Computer 11741356 at Einstein

COMPUTER 12650 at Albert

Claggy
