Parallella, Raspberry Pi, FPGA & All That Stuff

MarkJ

Joined: 28 Feb 08

Posts: 437

Credit: 139002861

RAC: 0

RE: But I'm confused, which

22 Feb 2015 8:25:09 UTC

Message 111934 in response to message 111929

Quote:

But I'm confused, which one of your ARM hosts is a Pi2? This one here:

http://einsteinathome.org/host/11751336/tasks
seems to show exactly the kind of speed-up that my Pi2 got from the wisdom file. Is that another host?

Cheers
HB

Parallellas
Hostid=11389405 -> Using this to test
Hostid=11389404

Pi2s
Hostid=11751336 -> Using this to test
Hostid=11750595

Pi B+
Hostid=11660429

The fftw-wisdom file on the Parallella took around 45 minutes to generate. It looks like it has been picked up (only one task completed with it so far), even though the Parallella got FFTW v3.3.3 from the (Ubuntu) repo.

I tried generating one on the B+ but it generated an empty wisdom file. It picked up fftw v3.3.4 from the (Debian) repo. I removed it for the time being.

Looking through results for the Pi2, looks like a speed-up. Not sure why they weren't showing. I did note the cut-over date/time as being 18 Feb at 10:36 UTC. The first task is now showing (sent column) at 19 Feb at 10:45 UTC. Looks like the results have changed since I looked yesterday.

Can you clarify what the file name should be, please? The FFTW docs refer to wisdom (i.e. without the f), but you mention a wisdomf in the /etc/fftw directory.

BOINC blog

MarkJ

Joined: 28 Feb 08

Posts: 437

Credit: 139002861

RAC: 0

Just had a light-bulb moment.

22 Feb 2015 10:25:18 UTC

Message 111935

Just had a light-bulb moment. Are we using single precision? In which case the fftw-wisdom command should be replaced with fftwf-wisdom. I presume that's why you refer to a wisdomf file instead of the default wisdom file.

BOINC blog

BackGroundMAN

Joined: 25 Feb 05

Posts: 58

Credit: 246736656

RAC: 0

The release sources use

22 Feb 2015 10:52:43 UTC

Message 111936 in response to message 111935

The release sources use the single-precision FFTW library (built with the --enable-float flag). With the single-precision library we must use the fftwf_import_system_wisdom function and the "wisdomf" file (in the /etc/fftw directory).

The "wisdomf" file should be produced with the fftwf-wisdom executable (same FFTW version as used to compile the client, 3.3.2 by default).
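As a quick reference for the naming discussed here, the following is a minimal Python sketch of how FFTW 3 names its tools, system wisdom files and import functions per precision. The dictionary layout and the helper name `wisdom_names` are made up for illustration; only the FFTW names themselves come from the library's conventions.

```python
# Naming conventions used by FFTW 3 for its three precisions.
# The table layout and helper name are illustrative, not part of FFTW.
FFTW_PRECISIONS = {
    "double": {
        "prefix": "fftw_",
        "tool": "fftw-wisdom",
        "system_file": "/etc/fftw/wisdom",
    },
    "single": {
        "prefix": "fftwf_",
        "tool": "fftwf-wisdom",
        "system_file": "/etc/fftw/wisdomf",
    },
    "long double": {
        "prefix": "fftwl_",
        "tool": "fftwl-wisdom",
        "system_file": "/etc/fftw/wisdoml",
    },
}


def wisdom_names(precision):
    """Return the tool, system wisdom file and import function for a precision."""
    entry = FFTW_PRECISIONS[precision]
    return {
        "tool": entry["tool"],
        "system_file": entry["system_file"],
        "import_function": entry["prefix"] + "import_system_wisdom",
    }


if __name__ == "__main__":
    for precision in FFTW_PRECISIONS:
        print(precision, wisdom_names(precision))
```

For the single-precision build described in this post, `wisdom_names("single")` gives the fftwf-wisdom tool, the /etc/fftw/wisdomf file and the fftwf_import_system_wisdom function.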

I am running some tests with the wisdomf file and I will post the results here.

Thank you,

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 728016747

RAC: 1220808

RE: Just had a light-bulb

22 Feb 2015 10:59:21 UTC

Message 111937 in response to message 111935

Quote:

Just had a light-bulb moment. Are we using single precision? In which case the fftw-wisdom command should be replaced with fftwf-wisdom. I presume that's why you refer to a wisdomf file instead of the default wisdom file.

Indeed, that's the explanation.

Cheers
HB

MarkJ

Joined: 28 Feb 08

Posts: 437

Credit: 139002861

RAC: 0

You'll need to regen the

22 Feb 2015 12:26:50 UTC

Message 111938

You'll need to regen the wisdomf if you get to FFTW 3.3.3 or later...

Quote:

FFTW 3.3.3
Nov 25, 2012
• Fix deadlock bug in MPI transforms (thanks to Michael Pippig for the bug report and patch, and to Graham Dennis for the bug report).
• Use 128-bit ARM NEON instructions instead of 64-bit instructions. This change appears to speed up even ARM processors with a 64-bit NEON pipe.
• Speed improvements for single-precision AVX.
• Speed up planner on machines without "official" cycle counters, such as ARM.

And just to make life interesting, Debian Wheezy seems to have 3.3.2, Debian Jessie 3.3.4 and Ubuntu Trusty 3.3.3.

I generated a wisdomf on the Parallella in patient mode (it was still going after 3 hours in exhaustive mode and I was impatient). Let's see if it makes any difference.

BOINC blog

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 728016747

RAC: 1220808

For me, the wisdomf file that

22 Feb 2015 12:47:23 UTC

Message 111939 in response to message 111938

For me, the wisdomf file that I generated on the Pi2 and posted here worked quite well on the Parallella too:

http://einsteinathome.org/host/11381212/tasks.

BackGroundMAN

Joined: 25 Feb 05

Posts: 58

Credit: 246736656

RAC: 0

RE: I generated a wisdomf

22 Feb 2015 14:12:17 UTC

Message 111940 in response to message 111938

Quote:

I generated a wisdomf on the Parallella in patient mode (it was still going after 3 hours in exhaustive mode and I was impatient). Let's see if it makes any difference.

3 hours for rof/rif12M or for all modes?
The EaH client only runs 12M R2C FFTs so you need only rof/rif 12M in the wisdomf file.

MarkJ

Joined: 28 Feb 08

Posts: 437

Credit: 139002861

RAC: 0

RE: 3 hours for rof/rif12M

22 Feb 2015 20:57:59 UTC

Message 111941 in response to message 111940

Quote:

3 hours for rof/rif12M or for all modes?
The EaH client only runs 12M R2C FFTs so you need only rof/rif 12M in the wisdomf file.

fftwf-wisdom -v -x -o wisdom rif12582912

That was while it was running 2 work units at the same time. If I run it when the machine is idle then it's around 30-45 minutes. Patient mode is a lot quicker (about 5 minutes).
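For anyone wondering how the command above is put together, here is a rough Python sketch that decodes a problem spec such as rif12582912 and assembles the fftwf-wisdom command line for a given planning rigor. The helper names `parse_problem` and `build_command` are hypothetical, the parser only handles one-dimensional complex and real specs, and the output name "wisdom" simply mirrors the command in this post.

```python
# Planner rigor flags accepted by the fftw-wisdom family of tools.
RIGOR_FLAGS = {
    "estimate": "-e",
    "measure": "-m",
    "patient": "-p",
    "exhaustive": "-x",
}


def parse_problem(spec):
    """Decode a 1-D spec like 'rif12582912' (illustrative helper, c/r types only)."""
    kinds = {"c": "complex", "r": "real"}
    placements = {"i": "in-place", "o": "out-of-place"}
    directions = {"f": "forward", "b": "backward"}
    return {
        "type": kinds[spec[0]],
        "placement": placements[spec[1]],
        "direction": directions[spec[2]],
        "size": int(spec[3:]),
    }


def build_command(spec, rigor="exhaustive", output="wisdom"):
    """Assemble the argument list for a verbose single-precision wisdom run."""
    return ["fftwf-wisdom", "-v", RIGOR_FLAGS[rigor], "-o", output, spec]


if __name__ == "__main__":
    problem = parse_problem("rif12582912")
    print(problem)
    # "12M" in this thread means 12 * 2**20 points.
    print(problem["size"] == 12 * 2**20)
    print(" ".join(build_command("rif12582912", rigor="patient")))
```

Switching `rigor` from "exhaustive" to "patient" only swaps -x for -p, which is the difference between the long run and the roughly 5 minute run described above.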

BOINC blog

BackGroundMAN

Joined: 25 Feb 05

Posts: 58

Credit: 246736656

RAC: 0

RE: fftwf-wisdom -v -x -o

23 Feb 2015 16:01:46 UTC

Message 111942 in response to message 111941

Quote:

fftwf-wisdom -v -x -o wisdom rif12582912

That was while it was running 2 work units at the same time. If I run it when the machine is idle then it's around 30-45 minutes. Patient mode is a lot quicker (about 5 minutes).

I think the most efficient approach is to generate the wisdom with one core loaded on a dual-core machine, since each EaH client will be running with one core loaded.

I ran "fftwf-wisdom -v -x -o wisdom rof12582912" on an idle quad-core Cortex-A9 (1 GHz) in ~6 min, but rif12582912 took ~33 min. I think rif is far more complex to plan than rof.
On the Parallella, producing the rof wisdom file took about 19 min on an idle CPU.

In the rof case I didn't see any differences between the wisdom files generated with and without CPU load.
Furthermore, in the rof case (custom build from release sources) I didn't see any performance speedup from using the wisdom file (TBS2910, Cortex-A9 @ 1 GHz).

I tested FFTW 3.3.2, 3.3.3 and 3.3.4 with and without wisdom files.
I ran 100 templates (100 main loops) for every case (each FFTW version, with and without a wisdom file), and the results show a very small speedup (<0.5%) from using FFTW 3.3.3.
Using a wisdom file with FFTW 3.3.3 also gives a very small speedup (<0.5%).
FFTW 3.3.3 with a wisdom file has a speedup of <1% compared to FFTW 3.3.2 (the default EaH client) without a wisdom file.

All the clients in the comparison are custom builds of the release sources (with the addition of import_system_wisdom_file and different FFTW versions), compiled with extra-aggressive optimization flags for the ARM Cortex-A9.

I will leave the FFTW 3.3.3 client with the wisdom file to crunch some WUs to see if there is any speedup.
With the FFTW 3.3.2 client and a wisdom file I didn't see any difference in WU crunching (you can see the results from this host here).

I will run some tests with the parallella cpu also.

Thank you,

MarkJ

Joined: 28 Feb 08

Posts: 437

Credit: 139002861

RAC: 0

RE: I will run some tests

23 Feb 2015 20:22:19 UTC

Message 111943 in response to message 111942

Quote:

I will run some tests with the parallella cpu also.

Thank you,

Did you try some timing between rif and rof?

It would be interesting to know if it would be worthwhile building such an ARM app for those that have the free memory. It might benefit your TBS2910 and the Parallella. We're definitely seeing a speedup on the Pi2 with the wisdom.

BOINC blog
