Parallella, Raspberry Pi, FPGA & All That Stuff

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 722925263
RAC: 1154816

Quote:

I presume this fftw plan would also work on the Parallella although probably not as optimal for the cortex A9 cpu (the Pi2 has cortex A7).

[It's actually just wisdom, not a plan as such. I understand that FFTW still needs to assemble a plan from the performance hints of FFT building blocks given in the wisdom.]

Trying it on a Parallella would be an interesting experiment. Apart from the CPU cores, the cost of memory transfers should also differ between the two platforms (if you run 2 tasks in parallel on the Parallella compared to 4 in parallel on the Raspi2).

HB

Highlander
Joined: 1 Jul 05
Posts: 24
Credit: 141580701
RAC: 7710

Out of curiosity, I forced myself to do a test run on the Odroid C1 at stock frequency, running 2 tasks at a time:

http://einsteinathome.org/host/11753871/tasks

Around 37,500 sec for one workunit.
Running with only boinc-client (Ubuntu repo version 7.2.42), managed with Boinc-Tasks from another computer.
Max temp of 59 C with an ambient temp of 20 C, but using the original heatsink in a case.

MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 139002861
RAC: 0

Quote:
I've put it onto one of my Pi2's.


I put the run times for 24 tasks into a spreadsheet before the change. The average was 82,238 seconds. It's completed 8 runs since the change and the average seems to have increased to 83,451 seconds.
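For scale, here is a quick sketch of the relative change between those two averages. The variable and function names are just for illustration, and with only 8 runs after the change the sample is small, so treat the result as a rough indication rather than a measurement.

```python
# Average run times in seconds, taken from the figures in this post.
before_avg = 82_238  # mean over 24 tasks before installing the wisdom file
after_avg = 83_451   # mean over 8 tasks after installing it


def relative_change(old: float, new: float) -> float:
    """Return the change from old to new as a fraction of old."""
    return (new - old) / old


change = relative_change(before_avg, after_avg)
print(f"absolute change: {after_avg - before_avg} s")
print(f"relative change: {change:+.2%}")
```

A difference of roughly one and a half percent is small compared to the speed-up a working wisdom file is supposed to give, which fits the suspicion that the hints are not being picked up.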

I suspect it's not using the hints. Does user boinc need to be the owner of /etc/fftw/wisdomf? Does the file need some attributes set, or do we have to install the full libfftw3 package? This is what I've got at the moment:
[pre]
pi@xxx /etc/fftw $ ls -lh
total 4.0K
-rw-r--r-- 1 root root 2.1K Feb 19 21:43 wisdomf
pi@xxx /etc/fftw $
[/pre]

MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 139002861
RAC: 0

Quote:

I think that the EaH client only uses 12M R2C FFTW functions, so you can generate exhaustive wisdom for this type only and save a lot of time:

fftw-wisdom -v -x -o wisdom_eah rof12582912

Running this on a Parallella at the moment. Had to install libfftw3-dev package.

On the issue above: the doco I've read refers to a system-wide fftw default file in /etc/fftw/wisdom (without the f).

MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 139002861
RAC: 0

Correction: Running fftw-wisdom -v -x -o wisdom rif12582912 on a Parallella at the moment.
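To make the difference between rof12582912 and rif12582912 explicit, here is a small sketch that decodes the problem string. It assumes the `<type><placement><direction><size>` layout described in the fftw-wisdom man page and only handles the simple 1-D real/complex case used in this thread; `describe_problem` is a made-up helper name, not part of FFTW.

```python
import re

# Letter codes as described in the fftw-wisdom man page (1-D subset only).
TYPES = {"c": "complex", "r": "real"}
PLACEMENT = {"i": "in-place", "o": "out-of-place"}
DIRECTION = {"f": "forward", "b": "backward"}


def describe_problem(spec: str) -> dict:
    """Split a problem string such as 'rif12582912' into its fields."""
    match = re.fullmatch(r"([cr])([io])([fb])(\d+)", spec)
    if match is None:
        raise ValueError(f"unsupported problem string: {spec!r}")
    kind, placement, direction, size = match.groups()
    return {
        "type": TYPES[kind],
        "placement": PLACEMENT[placement],
        "direction": DIRECTION[direction],
        "size": int(size),
    }


for spec in ("rof12582912", "rif12582912"):
    print(spec, describe_problem(spec))

# The "12M" in this thread is 12 * 2**20 points.
print(12 * 2**20)
```

So the two commands differ only in the placement field: the first asks for wisdom for an out-of-place transform, the corrected one for an in-place transform of the same size.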

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 722925263
RAC: 1154816

Quote:
Quote:
I've put it onto one of my Pi2's.

I put the run times for 24 tasks into a spreadsheet before the change. The average was 82,238 seconds. It's completed 8 runs since the change and the average seems to have increased to 83,451 seconds.

I suspect it's not using the hints. Does user boinc need to be the owner of /etc/fftw/wisdomf? Does the file need some attributes set, or do we have to install the full libfftw3 package? This is what I've got at the moment:
[pre]
pi@xxx /etc/fftw $ ls -lh
total 4.0K
-rw-r--r-- 1 root root 2.1K Feb 19 21:43 wisdomf
pi@xxx /etc/fftw $
[/pre]

That is all perfect. No need to install libfftw3, and the wisdom file just needs to be readable by boinc. The /etc/fftw directory itself has to be readable and 'executable' for user boinc as well, though! You might want to check that.
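A minimal sketch of that check, assuming (as in the listing above, where the file belongs to root:root) that user boinc is neither the owner nor in the owning group, so the "other" permission bits are the ones that matter. It only looks at the file and its immediate parent directory; `readable_by_others` is a hypothetical helper name.

```python
import os
import stat


def readable_by_others(wisdom_path: str) -> bool:
    """True if the 'other' permission bits allow reading the file and
    reading plus traversing ('executing') its parent directory."""
    parent = os.path.dirname(os.path.abspath(wisdom_path))
    file_mode = os.stat(wisdom_path).st_mode
    dir_mode = os.stat(parent).st_mode
    file_ok = bool(file_mode & stat.S_IROTH)
    dir_ok = bool(dir_mode & stat.S_IROTH) and bool(dir_mode & stat.S_IXOTH)
    return file_ok and dir_ok


if __name__ == "__main__":
    path = "/etc/fftw/wisdomf"
    if os.path.exists(path):
        print(path, "readable by others:", readable_by_others(path))
    else:
        print(path, "does not exist on this machine")
```

With the -rw-r--r-- file from the listing, this comes out true as long as /etc/fftw itself has r-x set for others (for example mode 755).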

But I'm confused, which one of your ARM hosts is a Pi2? This one here:

http://einsteinathome.org/host/11751336/tasks
seems to show exactly the kind of speed-up that my Pi2 got from the wisdom file. Is that another host?

Cheers
HB

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 722925263
RAC: 1154816

Quote:
Correction: Running fftw-wisdom -v -x -o wisdom rif12582912 on a Parallella at the moment.

Great! Prepare to wait for quite some time (days) for this to finish.

Notes (I mentioned the following points before, but just to be sure):

a) The version of fftw-wisdom must match the version of libfftw3 that will use the wisdom file. For the BRP4 stock NEON app from E@H, this is version 3.3.2 (!). Wisdom generated by 3.3.3 or later versions won't help! I just wanted to stress this before you waste days of computing time on that little fella.

b) The stock BRP ARM/Linux NEON version of the app that E@H distributes uses in-place FFTs. If you take the official source code tarball (which doesn't include our Android and ARM-Linux patches) and adapt it to ARM yourself, you will get an app that uses out-of-place FFTs.

c) When experimenting with this, I found that it is better to run fftw-wisdom while the CPU is under a load level similar to the one it will see when the wisdom file is used. If you stop BOINC and run just one instance of fftw-wisdom, the timings it collects might be very different, because it can use the full bandwidth to the RAM, whereas under full E@H load libfftw3 has to share that bandwidth with the other BRP instances running.
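For point a), a rough way to see which FFTW version produced a wisdom file is to look at its first line. The sketch below assumes the file starts with a header of the form `(fftw-<version> fftwf_wisdom ...`; that layout is an assumption worth checking against your own file, and the sample string and helper names are made up for illustration.

```python
import re

REQUIRED_VERSION = "3.3.2"  # version named in note a) for the stock BRP4 NEON app


def wisdom_version(text: str):
    """Return the version string from an assumed '(fftw-X.Y.Z ...' header,
    or None if the text does not start that way."""
    match = re.match(r"\s*\(fftw-(\d+(?:\.\d+)*)\b", text)
    return match.group(1) if match else None


def matches_stock_app(text: str) -> bool:
    """True if the wisdom header names the version the stock app needs."""
    return wisdom_version(text) == REQUIRED_VERSION


# Hypothetical header, with the real hash fields replaced by placeholders.
sample = "(fftw-3.3.2 fftwf_wisdom #x0 #x0 #x0 #x0\n)"
print(wisdom_version(sample), matches_stock_app(sample))
```

To check a real file, read it with `open("/etc/fftw/wisdomf").read()` and pass the text to `matches_stock_app`.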

Good luck getting good wisdom for the Parallella!

HB

BackGroundMAN
Joined: 25 Feb 05
Posts: 58
Credit: 246736656
RAC: 0

Quote:


Great! Prepare to wait for quite some time (days) for this to finish.

Notes (I mentioned the following points before, but just to be sure):

a) The version of fftw-wisdom must match the version of libfftw3 that will use the wisdom file. For the BRP4 stock NEON app from E@H, this is version 3.3.2 (!). Wisdom generated by 3.3.3 or later versions won't help! I just wanted to stress this before you waste days of computing time on that little fella.

b) The stock BRP ARM/Linux NEON version of the app that E@H distributes uses in-place FFTs. If you take the official source code tarball (which doesn't include our Android and ARM-Linux patches) and adapt it to ARM yourself, you will get an app that uses out-of-place FFTs.

c) When experimenting with this, I found that it is better to run fftw-wisdom while the CPU is under a load level similar to the one it will see when the wisdom file is used. If you stop BOINC and run just one instance of fftw-wisdom, the timings it collects might be very different, because it can use the full bandwidth to the RAM, whereas under full E@H load libfftw3 has to share that bandwidth with the other BRP instances running.

Good luck getting good wisdom for the Parallella!

HB

I compiled fftw-3.3.2 for the board (TBS2910) and ran fftwf-wisdom for rof12M with 3 other EaH clients running for CPU load (quad-core CPU).

Are there any other changes in the source of the official ARM client?
When will the ARM patches be available in the source releases?

Thank you,

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 722925263
RAC: 1154816

Quote:

Are there any other changes in the source of the official ARM client?
When will the ARM patches be available in the source releases?

ARM-changes:
* the backtrace-dumping functionality is turned off because I couldn't get it to work on ARM,

* the build.sh script is heavily modified to allow also for cross-compilation

* FFT is in-place,

* support for pre-canned wisdom read from a string compiled into the code itself

* support for reading the /etc/fftw/wisdomf file. See http://www.fftw.org/doc/Caveats-in-Using-Wisdom.html on how to include this in your code, it's just a single line.
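The "single line" in the last item is, as far as I can tell from the FFTW documentation, a call to `fftwf_import_system_wisdom()` for the single-precision library. The app itself is C, so the following is not its code; it is only a sketch that pokes at the same library call from Python via ctypes, assuming a shared single-precision FFTW library is installed and findable under the name `fftw3f`. The wrapper name `import_system_wisdom` is made up.

```python
import ctypes
import ctypes.util


def import_system_wisdom():
    """Call fftwf_import_system_wisdom() if the single-precision FFTW
    shared library can be loaded.

    Returns True or False for the library's own result, or None if the
    library (or the symbol) is not available on this machine.
    """
    name = ctypes.util.find_library("fftw3f")
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
        func = lib.fftwf_import_system_wisdom
    except (OSError, AttributeError):
        return None
    func.restype = ctypes.c_int
    func.argtypes = []
    # Non-zero means system wisdom was found and imported.
    return bool(func())


if __name__ == "__main__":
    print("system wisdom imported:", import_system_wisdom())
```

Running this as user boinc is one way to see whether the library can actually read the system wisdom file with the permissions currently set, keeping in mind that the installed library version may differ from the 3.3.2 built into the stock app.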

We are currently working on performance fixes and improvements. Once that is finished and tested, a new source tarball should be created.

Cheers
HB

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 722925263
RAC: 1154816

Is this the future?

http://www.computerworld.com/article/2885164/lenovo-building-its-first-prototype-arm-server.html

I think this is an interesting approach. Unlike the accelerator chip on the Parallella, this solution uses plain ARM cores in a multi-core SoC (up to 48 cores per chip). Should be much easier to write software for, and is similar to the current Xeon Phis but w/o the complication of having a traditional host system and an accelerator card with x86 multicores connected over PCIe.

I wonder how many years it will take for the first ARM-based supercomputer to appear in the TOP500 list. Any bets?

Cheers
HB
