We are discussing with Andrew the possibility of replacing the twiddle generation functions with a LUT-based implementation, to improve the accuracy of GPU_FFT.
This might solve the accuracy problems for the E@H client.
Thank you,
Einstein@Home already has an ARMv7 NEON capable BRP4 app for Linux, and the 1GB RAM should in theory be enough to run 3 or 4 tasks in parallel, so E@H is more than ready for this new board. All this would probably boost productivity in E@H compared to a Raspi "1" by a factor of 4 or more if (!!) the RAM can provide the necessary throughput.
RE: Hmm...that is a bit
)
Mine is at http://einsteinathome.org/account/tasks
Maybe the validate-inconclusives are the problem. Not spotted them before.
That link won't work as your
)
That link won't work as your hosts are hidden. Even then, links to individual hosts will work, tho.
Cheers
HB
RE: Hi, We are discussing
)
Have you heard back from Andrew, and if so, have you had any success creating a client that uses the GPU_FFT? I run a solar-powered RPi model B and it would be great to have a faster client. I'm using the Turbo setting (1 GHz) and added heat sinks to my RPi. It completes a work unit in 25 hours.
RE: That link won't work as
)
Okay, try these:
http://einsteinathome.org/host/10457609/tasks
http://einsteinathome.org/host/11678121/tasks
I see these are both about 62 points per WU.
The numbers I was quoting were from
http://einstein.phys.uwm.edu/hosts_user.php
which I assume is per day.
Hi! I see. The per task
)
Hi!
I see. The per task CPU run time is consistently higher than on my fastest Raspi which needs less than 90k sec per task (the tasks are pretty much all equal, run time wise).
I guess they might be less aggressively overclocked, mine (a model B) has these parameters in /boot/config.txt :
Not all Raspis will overclock to this level, tho, try at your own risk :-).
HB
The twiddles* are a challenge
)
The twiddles* are a challenge indeed, they are the piece of the FFT which doesn't factorise like the rest. Producing them efficiently is governed by the classic memory vs. speed conundrum. If they are not accurate then one effectively gets a lower resolution transform.
What's the memory on a Pi and what's the float/precision operand lengths ?
Cheers, Mike.
* Essentially one needs the sine and cosine of every angle from 0 to 2PI in increments of 2PI/N where N is the transform size.
( edit ) Perversely : on the Parallella platform you could probably just totally ignore the Epiphany chip and use the dual-core ARM, BUT configure the FPGA to do the FFT heavy lifting. Even if the FPGA simply generated the twiddles on demand that would still be one heck of a speed and memory advantage. I'll have another look at that when/if I get a chance ...
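To make the footnote concrete, here is a minimal sketch (mine, not GPU_FFT's actual code) of the brute-force end of that memory vs. speed trade-off: precompute every twiddle once and just look them up.

```python
import math

def make_twiddles(n):
    # one (cos, sin) pair per angle 2*pi*k/n for k = 0..n-1,
    # i.e. the real/imaginary parts of e^(-2*pi*i*k/n)
    step = -2.0 * math.pi / n
    return [(math.cos(step * k), math.sin(step * k)) for k in range(n)]

table = make_twiddles(1024)
```

For a 4096-point single-precision transform that table costs 2 x 4096 x 4 bytes = 32 KB, which is exactly the kind of memory bill the speed buys you.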
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
RE: What's the memory on a
)
The "Raspberry Pi Model B" comes with 512 MB RAM, shared by CPU and GPU, with a configurable memory-split. We are talking about single precision FFTs, real to complex.
The new quad-core "Raspberry Pi 2 model B" has 1 GB RAM. The smaller 256 MB RAM "model A" is probably not worth exploring for this.
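For reference, that memory-split is set with the gpu_mem option in /boot/config.txt (the value below is just an illustrative choice, not a recommendation for E@H):

```
# /boot/config.txt: give the VideoCore GPU 128 MB of the shared RAM
gpu_mem=128
```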
Cheers
HB
Unfortunately, I do not have
)
Unfortunately, I do not have any update yet on the GPU_FFT. I didn't have the chance to change the twiddle generation procedure.
@Mike Hewson: Actually, for an N-point C2C FFT you can reduce the memory needed by storing the cos/sin values for only N/8 angles and taking advantage of the twiddle symmetries (at the cost of more calculations). The GPU of the RPi supports single precision floats (32 bit), but I do not know if there is any extended mode for the intermediate results (e.g. 40 bits). In the current implementation the twiddles are pre-calculated on the ARM in double precision and then stored (cast) in single precision in GPU memory. The accuracy problem is most probably a result of the pre-calculation procedure, which is step-based rather than LUT-based. The step-based procedure calculates the “higher” twiddles (smaller angles) from previously calculated twiddle values, so errors accumulate in the “higher” twiddles.
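A minimal sketch of that N/8 trick (my own illustration; the table layout and names are not GPU_FFT's): store cos/sin only for the first octant, angles in [0, pi/4], and map every other angle onto it with the symmetries of the unit circle.

```python
import math

def octant_tables(n):
    # cos/sin only for angles 2*pi*k/n in [0, pi/4]: n/8 + 1 entries each
    m = n // 8
    return ([math.cos(2 * math.pi * k / n) for k in range(m + 1)],
            [math.sin(2 * math.pi * k / n) for k in range(m + 1)])

def twiddle(k, n, ct, st):
    # reconstruct (cos, sin) of 2*pi*k/n from the first-octant tables
    m = n // 8
    o, r = divmod(k % n, m)          # octant index and offset within it
    if o == 0: return  ct[r],      st[r]
    if o == 1: return  st[m - r],  ct[m - r]
    if o == 2: return -st[r],      ct[r]
    if o == 3: return -ct[m - r],  st[m - r]
    if o == 4: return -ct[r],     -st[r]
    if o == 5: return -st[m - r], -ct[m - r]
    if o == 6: return  st[r],     -ct[r]
    return ct[m - r], -st[m - r]     # o == 7

# check the reconstruction against direct evaluation for a small n
n = 64
ct, st = octant_tables(n)
max_err = max(
    abs(twiddle(k, n, ct, st)[0] - math.cos(2 * math.pi * k / n)) +
    abs(twiddle(k, n, ct, st)[1] - math.sin(2 * math.pi * k / n))
    for k in range(n)
)
```

The storage drops by a factor of about eight, paid for with the index arithmetic and sign flips in the lookup, which is the "more calculations" part.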
Thank you,
That sounds reasonably easy
)
That sounds reasonably easy to fix. Actually, we at E@H had a pretty similar problem some years ago, when we used an OpenCL FFT lib that computed the twiddle factors with faster but reduced-precision trig functions (native_sin, native_cos). We replaced this with a LUT-based method.
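As a sanity check on that diagnosis, here is a small sketch (assumptions mine: N = 4096, and single precision emulated by rounding doubles through a 32-bit float) comparing the step-based recurrence w[k+1] = w[k] * w[1] against a LUT filled directly from double-precision cos/sin:

```python
import math
import struct

def f32(x):
    # round a double to the nearest IEEE-754 single, emulating a float store
    return struct.unpack('<f', struct.pack('<f', x))[0]

N = 4096
step = -2.0 * math.pi / N

# Step-based: derive each twiddle from the previous one, with results
# rounded to single precision, starting from the unit step w[1].
cr, ci = f32(math.cos(step)), f32(math.sin(step))
wr, wi = 1.0, 0.0
step_err = 0.0
for k in range(N):
    ref_r, ref_i = math.cos(step * k), math.sin(step * k)  # double reference
    step_err = max(step_err, abs(wr - ref_r), abs(wi - ref_i))
    wr, wi = f32(wr * cr - wi * ci), f32(wr * ci + wi * cr)

# LUT-based: compute every twiddle directly in double, cast once to single.
lut_err = max(
    max(abs(f32(math.cos(step * k)) - math.cos(step * k)),
        abs(f32(math.sin(step * k)) - math.sin(step * k)))
    for k in range(N)
)
```

The recurrence's worst-case error comes out far larger than the half-ulp you get from the cast LUT values, which matches the accumulation described above.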
HB
RE: Can't wait to get
)
Mine arrived yesterday, the Micro-SD card this morning. Just spent the last couple of hours putting Raspbian Wheezy on it and booting it. Wow, that was quick!
Then getting the Boinc source, building Boinc 7.2.47 and attaching it to Seti, Seti Beta, Einstein and Albert.
It's got a couple of Neon tasks from here, and a couple of non-Neon tasks from Albert, just running two up at present:
Computer 11741356 at Einstein
Computer 12650 at Albert
Claggy