Parallella, Raspberry Pi, FPGA & All That Stuff

KF7IJZ
Joined: 27 Feb 15
Posts: 110
Credit: 6108311
RAC: 0

I did use your script.  I received the success message on both.  I did not already have a wisdom file in place.  I have put it in place, restarted the app (though I'm running 1.47 beta) - will see what happens!

If you are curious, this is the tinker board - https://einsteinathome.org/host/12523666

 

My YouTube Channel: https://www.youtube.com/user/KF7IJZ
Follow me on Twitter: https://twitter.com/KF7IJZ

N30dG
Joined: 29 Feb 16
Posts: 89
Credit: 4805610
RAC: 0

KF7IJZ wrote:
When you say "use the 1.42_NEON", do you mean the non beta version of the BRP4 app?  If so, I will need to create another account solely for my Tinker Board as the "use beta" is an account wide setting

No, you can download the 1.42 manually from here http://einstein-dl3.phys.uwm.edu/download/einsteinbinary_BRP4_1.42_arm-unknown-linux-gnueabihf__NEON

Then run it under the anonymous platform mechanism, using an app_info.xml.
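For reference, a minimal sketch of what such an app_info.xml could look like, written out by a small shell function. The app name einsteinbinary_BRP4 and the version_num 142 are assumptions inferred from the file name of the download; the function name write_app_info is made up for illustration.

```shell
#!/bin/sh
# Sketch: write a minimal BOINC anonymous-platform app_info.xml.
# Assumptions: the app name "einsteinbinary_BRP4" and version_num 142
# are inferred from the binary's file name and may need adjusting.
write_app_info() {
    dest_dir=$1   # project directory that already holds the binary
    binary=$2     # file name of the manually downloaded app
    cat > "$dest_dir/app_info.xml" <<EOF
<app_info>
  <app>
    <name>einsteinbinary_BRP4</name>
  </app>
  <file_info>
    <name>$binary</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>einsteinbinary_BRP4</app_name>
    <version_num>142</version_num>
    <file_ref>
      <file_name>$binary</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
EOF
}
```

Called as write_app_info /var/lib/boinc-client/projects/einstein.phys.uwm.edu einsteinbinary_BRP4_1.42_arm-unknown-linux-gnueabihf__NEON, this would place the file next to the binary. The binary has to be executable, and the client reads app_info.xml at startup, so it needs a restart afterwards.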

 

N30dG
Joined: 29 Feb 16
Posts: 89
Credit: 4805610
RAC: 0

Sorry, there is yet another problem. I forgot that the official BRP apps for ARM use an in-place transformation, so the generated wisdom would never be used (it is incompatible).

You have to build the app manually from our repository. I'm on holiday over the weekend, but I'm sure Steffen can help you with that.

steffen_moeller
Joined: 9 Feb 05
Posts: 78
Credit: 1773655132
RAC: 0

KF7IJZ wrote:

I did use your script.  I received the success message on both.  I did not already have a wisdom file in place.  I have put it in place, restarted the app (though I'm running 1.47 beta) - will see what happens!

If you are curious, this is the tinker board - https://einsteinathome.org/host/12523666

 

Ah, ok, so with the official client you are between 18000 and 19500 seconds, from how I read it all. No upload yet with the self-built bits. From what I understand so far, this is not bad and is likely to be attributed to the ARM6 wisdom file that is hard-wired into the official client, right? Ok. Let's see ...

Somewhat thrilling, isn't it?

I just received some news by email from the traveling N30dG, who ran into an illegal instruction error on his ARM platform while toying with the 3.3.5 fftw wisdomf generation. I actually like hearing about such problems. By addressing them we are hardening the ARM platform for everyone, and E@H gives back more than the scientific insights.

KF7IJZ
Joined: 27 Feb 15
Posts: 110
Credit: 6108311
RAC: 0

This is exciting stuff indeed.  It would be amazing if this work leads to unlocking other types of work such as the gravitational wave stuff which I know the E@H team has played around with.

I've noticed that with the wisdom in place, I'm getting results back with slightly faster CPU times, but not faster overall times.  I've also noticed that at least one has been validated.

In other news, I am in the process of converting my Pi3 stack to be completely free of uSD cards.  I have my node 0 booting from a USB mSATA drive and the remaining nodes booting via tftpboot from node 0.  I have 5/8 total nodes online.  I'm considering purchasing a 16 port version of this switch (which was the key to getting netbooting to work) and seeing if I can successfully netboot a total of 15 Pi3s from a single Pi3.  All of my current Pi3s are running N30dG's optimized application.


steffen_moeller
Joined: 9 Feb 05
Posts: 78
Credit: 1773655132
RAC: 0

I very much like the net boot approach. Congrats!

The BRP app just found its way to the Debian upload ("new") queue
  https://ftp-master.debian.org/new.html
where it will be checked for compliance with the Debian Free Software Guidelines and for general quality. Once accepted, it is passed to the build daemons, which build the package on many different platforms, among them 64-bit ARM
  https://buildd.debian.org/
Subsequent uploads will be immediate.

You may be aware of the BOINC package Debian already offers
  https://tracker.debian.org/pkg/boinc
which is also available on all those platforms, so nobody should need to recompile anything. This package also provides dynamic libraries, i.e. the code that the client and the app use to communicate with each other. The same code serves every running BRP app, so there is little need to keep several copies of it in memory; you want those apps to share it. The fftw3 and gsl libraries are dynamically linked as well (http://stackoverflow.com/questions/1993390/static-linking-vs-dynamic-linking).  Here we go:

Quote:
$ ldd /usr/lib/boinc-app-eah-brp/einsteinbinary_BRP4
        linux-vdso.so.1 (0x00007ffe7b7cf000)
        libfftw3f.so.3 => /usr/lib/x86_64-linux-gnu/libfftw3f.so.3 (0x00007f4112359000)
        libxml2.so.2 => /usr/lib/x86_64-linux-gnu/libxml2.so.2 (0x00007f4111f9e000)
        libboinc_api.so.7 => /usr/lib/x86_64-linux-gnu/libboinc_api.so.7 (0x00007f4111d7b000)
        libboinc.so.7 => /usr/lib/x86_64-linux-gnu/libboinc.so.7 (0x00007f4111ad4000)
        libbfd-2.28-system.so => /usr/lib/x86_64-linux-gnu/libbfd-2.28-system.so (0x00007f411178c000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f4111408000)
        libgsl.so.19 => /usr/lib/x86_64-linux-gnu/libgsl.so.19 (0x00007f4110fa3000)
        libgslcblas.so.0 => /usr/lib/x86_64-linux-gnu/libgslcblas.so.0 (0x00007f4110d66000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f4110b49000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f411092f000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f411062b000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f411028b000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f4110074000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f410fe70000)
        libicui18n.so.57 => /usr/lib/x86_64-linux-gnu/libicui18n.so.57 (0x00007f410f9f6000)
        libicuuc.so.57 => /usr/lib/x86_64-linux-gnu/libicuuc.so.57 (0x00007f410f64e000)
        libicudata.so.57 => /usr/lib/x86_64-linux-gnu/libicudata.so.57 (0x00007f410dbd1000)
        liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f410d9a9000)
        /lib64/ld-linux-x86-64.so.2 (0x000055b34a22a000)

We share a bit of memory this way, but only when different applications running at the same time (like boinc-client and the BRP app) share a library. When the same binary is started multiple times, the OS should notice that, too :)  Anyway, as N30dG indicated, we are likely to run out of memory when running the presumably faster out-of-place invocation of FFTW, where the output is kept separate from the input, so extra memory is required for storing the results. On this virtual Intel machine I need 219800 kB (214 MB) of resident memory per task. Times four, that is about 858 MB, which leaves little headroom below the 1 GB limit, and that matters a lot if we want to avoid swapping.
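The memory budget can be checked with a few lines of shell arithmetic. This is only a sketch: the function name mem_budget is made up, and the 1 GB total is taken as 1048576 kB with nothing reserved for the OS, which is optimistic.

```shell
#!/bin/sh
# Sketch: how much RAM is left when N tasks each need a given
# resident set size. All values are in kB; integer division rounds down.
mem_budget() {
    rss_kb=$1     # resident memory per task
    tasks=$2      # number of concurrent tasks
    total_kb=$3   # physical memory of the board
    used_kb=$((rss_kb * tasks))
    free_kb=$((total_kb - used_kb))
    echo "used: $((used_kb / 1024)) MB, left: $((free_kb / 1024)) MB"
}

# Figures from the post: 219800 kB per task, four tasks, 1 GB board.
mem_budget 219800 4 1048576
# prints "used: 858 MB, left: 165 MB"
```

The remaining 165 MB would have to hold the kernel, the BOINC client and everything else, which is why swapping is a real concern.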

I have just ordered some USB power supplies and hope to start up some ARM devices myself now. The next weekend should bring some news.

steffen_moeller
Joined: 9 Feb 05
Posts: 78
Credit: 1773655132
RAC: 0

Good news: The BRP package was just accepted by Debian. As outlined above, it is now available on many platforms, among them ARM64:

https://packages.debian.org/unstable/boinc-app-eah-brp

If you already accept packages from the sid ("unstable") distribution of Debian and are curious, please try

apt-get update
apt-get install boinc-app-eah-brp

Because the new Debian release is expected any moment now, and this package will miss it, it will take a while until we can provide backports. But there is the following shortcut:

To compile for your local machine, first install the build dependencies

sudo apt-get install debhelper build-essential libboinc-app-dev libxml2-dev libiberty-dev libgsl-dev binutils-dev libfftw3-dev pkg-config

If the package libgsl-dev is not available, try libgsl0-dev instead. Then run

sudo apt-get install devscripts # which provides the 'dget' tool
dget --build http://httpredir.debian.org/debian/pool/main/b/boinc-app-eah-brp/boinc-app-eah-brp_0.20170426+dfsg-2.dsc

Install with (the file name ends in your architecture, amd64 in this example)

sudo dpkg -i boinc-app-eah-brp_0.20170426+dfsg-2_amd64.deb

Any self-compiled version on ARM platforms will depend on the /etc/fftw/wisdomf file, so please do not forget about it if you want to be competitive. Also, you are likely to see more invalid work units than before; we cannot tell how many. Do not feel too disappointed, it is part of the development. Just go back to the official clients if it is still too bad. So the first aim of this "-2" version of the package is simply to get a first assessment of the speed gain on the better-than-Raspberry ARM machines. We will then know whether it is worth the effort, and we will learn how often deviations from the reference client exceed the tolerance.

 

 

KF7IJZ
Joined: 27 Feb 15
Posts: 110
Credit: 6108311
RAC: 0

Thanks for the update Steffen - Some questions:

What is the difference between the brp4 app in boinc-app-eah-brp and the version N30dG had made available through apt-get?

Does the self compiled version rely on the fftw installed on the system, or is it included in the build?

If it is included in the build, do we just run your script (create_wisdomf_eah_brp.sh) to generate wisdom?

Should the machine be "burdened" while generating wisdom (should we leave E@H running)?

 

In other news, I finished rebuilding my hardware "cluster" this weekend.  All 8 Pis were physically moved to the Netgear GS108Ev3 switch, 20mm standoffs between Pis were replaced with 18mm, and everything was net booted.  I have two machines that are consistently returning signal 11s and erring out a work unit about 1/day or so.  Funny thing is that both of these machines were Allied Electronics builds rather than Element 14 builds.  Picture is here:  https://twitter.com/KF7IJZ/status/861361430079180800

 


steffen_moeller
Joined: 9 Feb 05
Posts: 78
Credit: 1773655132
RAC: 0

KF7IJZ wrote:
Thanks for the update Steffen - Some questions: What is the difference between the brp4 app in boinc-app-eah-brp and the version N30dG had made available through apt-get?

Not all of N30dG's optimisations have made it into the Debian package yet. But you can easily recompile everything yourself, and if the system fftw library on ARM were properly configured (see below), you would be more competitive.

KF7IJZ wrote:
Does the self compiled version rely on the fftw installed on the system, or is it included in the build?

It fully depends on the system's one.

KF7IJZ wrote:
If it is included in the build, do we just run your script (create_wisdomf_eah_brp.sh) to generate wisdom?

Because it is the system's library you can use the system's binaries to create the wisdom.

KF7IJZ wrote:
Should the machine be "burdened" while generating wisdom (should we leave E@H running)?

Uh, I truly don't know. Either would make sense. I do not see why having all the caches busy would harm since this is how you will be using the library.

Today N30dG informed me that he found a bug in the package distributed with Debian ARM64: it fails to activate the NEON extension, which is only activated for armhf. So, for mere performance reasons we first need to fix that. But we would nonetheless be curious about how it all works for you as of today, without FFTW at full speed.

KF7IJZ wrote:
In other news, I finished rebuilding my hardware "cluster" this weekend.  All 8 Pis were physically moved to the Netgear GS108Ev3 switch, 20mm standoffs between Pis were replaced with 18mm, and everything was net booted.  I have two machines that are consistently returning signal 11s and erring out a work unit about 1/day or so.  Funny thing is that both of these machines were Allied Electronics builds rather than Element 14 builds.  Picture is here:  https://twitter.com/KF7IJZ/status/861361430079180800

Is that with the Debian-distributed app, the official one, or the N30dG one? I do not have any immediate idea yet, I am afraid.

Since the best-performing wisdom file was one of the core reasons to use the Debian package on ARM64, the issue just identified basically renders it pointless, as of today, to install the boinc-app-eah-brp package on that platform; otherwise it seems to work fine. We will chase that up and confirm the effect of a presumably trivial patch against FFTW on a couple of work units. Once that has been re-uploaded to Debian, it will definitely be worth trying everywhere again.

steffen_moeller
Joined: 9 Feb 05
Posts: 78
Credit: 1773655132
RAC: 0

My power supplies have arrived and I restarted the Banana Pi R1. I decided to start off with the official clients. These are

official_E@H_app wrote:
root@lamobo-r1:/var/lib/boinc-client/projects/einstein.phys.uwm.edu# ldd einsteinbinary_BRP4_1.42_arm-unknown-linux-gnueabihf__NEON
    linux-vdso.so.1 (0xbeebb000)
    libpthread.so.0 => /lib/arm-linux-gnueabihf/libpthread.so.0 (0xb6f6a000)
    libm.so.6 => /lib/arm-linux-gnueabihf/libm.so.6 (0xb6ef5000)
    libstdc++.so.6 => /usr/lib/arm-linux-gnueabihf/libstdc++.so.6 (0xb6e3e000)
    libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0xb6d4e000)
    /lib/ld-linux-armhf.so.3 (0x7f586000)
    libgcc_s.so.1 => /lib/arm-linux-gnueabihf/libgcc_s.so.1 (0xb6d25000)
root@lamobo-r1:/var/lib/boinc-client/projects/einstein.phys.uwm.edu# ldd einsteinbinary_BRP4_1.47_arm-unknown-linux-gnueabihf__NEON_Beta
    linux-vdso.so.1 (0xbeee3000)
    libpthread.so.0 => /lib/arm-linux-gnueabihf/libpthread.so.0 (0xb6fb3000)
    libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0xb6ec2000)
    /lib/ld-linux-armhf.so.3 (0x7f5e2000)
root@lamobo-r1:/var/lib/boinc-client/projects/einstein.phys.uwm.edu# ls -l einsteinbinary_BRP4_1.4*
-rwxr-xr-x 1 boinc boinc 8060402 May  9 15:49 einsteinbinary_BRP4_1.42_arm-unknown-linux-gnueabihf__NEON
-rwxr-xr-x 1 boinc boinc 4981612 Apr 23 00:43 einsteinbinary_BRP4_1.47_arm-unknown-linux-gnueabihf__NEON_Beta

It is a Debian Jessie based distribution called armbian, i.e. the current stable one:

cat_/etc/armbian-release wrote:
BOARD=lamobo-r1
BOARD_NAME="Lamobo R1"
VERSION=5.25
LINUXFAMILY=sunxi
BRANCH=next
ARCH=arm
IMAGE_TYPE=stable

The Debian platform is armhf, i.e. a platform for which Debian's fftw3 library already has NEON support activated as of today. So, the Debian package should be competitive.

dpkg-architecture|grep_HOST wrote:

DEB_HOST_ARCH=armhf
DEB_HOST_ARCH_BITS=32
DEB_HOST_ARCH_CPU=arm
DEB_HOST_ARCH_ENDIAN=little
DEB_HOST_ARCH_OS=linux
DEB_HOST_GNU_CPU=arm
DEB_HOST_GNU_SYSTEM=linux-gnueabihf
DEB_HOST_GNU_TYPE=arm-linux-gnueabihf
DEB_HOST_MULTIARCH=arm-linux-gnueabihf

/proc/cpuinfo wrote:

#1st not shown
processor    : 1
model name    : ARMv7 Processor rev 4 (v7l)
BogoMIPS    : 50.52
Features    : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU implementer    : 0x41
CPU architecture: 7
CPU variant    : 0x0
CPU part    : 0xc07
CPU revision    : 4

Hardware    : Allwinner sun7i (A20) Family
Revision    : 0000
Serial        : 165166990581d5fe
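Whether the kernel reports NEON can also be checked from a script instead of reading the Features line by eye. A minimal sketch follows; the function name has_neon is made up. As far as I know, 64-bit ARM kernels list asimd rather than neon in this line, so the check as written only makes sense on 32-bit ARM.

```shell
#!/bin/sh
# Sketch: succeed if the "Features" line of a cpuinfo-style file
# contains the word "neon". Defaults to /proc/cpuinfo.
has_neon() {
    grep -i '^Features' "${1:-/proc/cpuinfo}" 2>/dev/null | grep -qw neon
}

if has_neon; then
    echo "NEON reported by the kernel"
else
    echo "no NEON flag found"
fi
```

This only tells you what the CPU offers. Whether the installed fftw3 library was built to use it is a separate question.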

After 20 minutes, the official client estimates some 18:33 hours, not minutes, to complete a work unit. And that estimate basically does not change, although it should drop by a minute with every minute. The laptop from which I am accessing the device would have completed the first one already. Sorry, I have to stop this. That is a factor of 60 slower. With 127856 kB of resident memory per task we are just above half of what I observed before on the amd64 platform. Presumably this is because of the "inner" (in-place) memory operation.

I added the line

deb-src http://httpredir.debian.org/debian sid main contrib non-free

to /etc/apt/sources.list and ran "apt-get update". The build is then started with "apt-get -b source boinc-app-eah-brp", once the build dependencies described above are installed. Just ensure you are in a directory you have write access to. Caveat: with default settings you get

# apt-get install libboinc-app-dev libxml2-dev libiberty-dev libgsl-dev binutils-dev libfftw3-dev pkg-config libboinc7
Reading package lists... Done
Building dependency tree
Reading state information... Done
libboinc7 is already the newest version.
libboinc7 set to manually installed.
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 libboinc-app-dev : Depends: libboinc7 (= 7.4.23+dfsg-1) but 7.6.33+dfsg-5~bpo8+1 is to be installed
E: Unable to correct problems, you have held broken packages.

Although this is Debian stable, the version we need is in jessie-backports, which has to be requested explicitly:

# apt-get install -t jessie-backports libboinc-app-dev libxml2-dev libiberty-dev libgsl-dev binutils-dev libfftw3-dev pkg-config libboinc7
Reading package lists... Done
Building dependency tree
Reading state information... Done
libboinc7 is already the newest version.
libboinc7 set to manually installed.
The following packages were automatically installed and are no longer required:
  autotools-dev libftdi1 liblircclient0 libsigsegv2 m4
Use 'apt-get autoremove' to remove them.
The following extra packages will be installed:
  libboinc-app7 libfftw3-bin libfftw3-double3 libfftw3-single3 libgsl0ldbl libgsl2 libmysqlclient-dev libssl-dev libssl1.0.0 zlib1g-dev
Suggested packages:
  libfftw3-doc gsl-ref-psdoc gsl-doc-pdf gsl-doc-info gsl-ref-html
Recommended packages:
  libssl-doc
The following NEW packages will be installed:
  binutils-dev libboinc-app-dev libboinc-app7 libfftw3-bin libfftw3-dev libfftw3-double3 libfftw3-single3 libgsl-dev libgsl0ldbl libgsl2
  libiberty-dev libmysqlclient-dev libssl-dev libxml2-dev pkg-config zlib1g-dev
The following packages will be upgraded:
  libssl1.0.0
1 upgraded, 16 newly installed, 0 to remove and 48 not upgraded.
Need to get 10.2 MB of archives.
After this operation, 35.9 MB of additional disk space will be used.
Do you want to continue? [Y/n]

Yes, please! And then please build. Uh, I had forgotten about debhelper from backports....fixed.

After installation and a fresh attachment to Einstein@Home, the situation is much like before, except that the WU is now expected to last a mostly invariant 11 hours. That is also more than I would want to accept - 22 times slower - no! The memory requirement is 212420 kB per task:

# ls -l /usr/lib/boinc-app-eah-brp/einsteinbinary_BRP4
-rwxr-xr-x 1 root root 68404 May  9 16:37 /usr/lib/boinc-app-eah-brp/einsteinbinary_BRP4
# ldd /usr/lib/boinc-app-eah-brp/einsteinbinary_BRP4
    linux-vdso.so.1 (0xbe9b5000)
    libfftw3f.so.3 => /usr/lib/arm-linux-gnueabihf/libfftw3f.so.3 (0xb6db2000)
    libxml2.so.2 => /usr/lib/arm-linux-gnueabihf/libxml2.so.2 (0xb6cc6000)
    libboinc_api.so.7 => /usr/lib/arm-linux-gnueabihf/libboinc_api.so.7 (0xb6c9a000)
    libboinc.so.7 => /usr/lib/arm-linux-gnueabihf/libboinc.so.7 (0xb6c05000)
    libbfd-2.25-system.so => /usr/lib/libbfd-2.25-system.so (0xb6b46000)
    libstdc++.so.6 => /usr/lib/arm-linux-gnueabihf/libstdc++.so.6 (0xb6a8f000)
    libgsl.so.19 => /usr/lib/arm-linux-gnueabihf/libgsl.so.19 (0xb68f4000)
    libgslcblas.so.0 => /usr/lib/arm-linux-gnueabihf/libgslcblas.so.0 (0xb68c5000)
    libpthread.so.0 => /lib/arm-linux-gnueabihf/libpthread.so.0 (0xb68a2000)
    libz.so.1 => /lib/arm-linux-gnueabihf/libz.so.1 (0xb6880000)
    libm.so.6 => /lib/arm-linux-gnueabihf/libm.so.6 (0xb680c000)
    libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0xb671b000)
    /lib/ld-linux-armhf.so.3 (0x7f630000)
    libgcc_s.so.1 => /lib/arm-linux-gnueabihf/libgcc_s.so.1 (0xb66f2000)
    libdl.so.2 => /lib/arm-linux-gnueabihf/libdl.so.2 (0xb66df000)
    liblzma.so.5 => /lib/arm-linux-gnueabihf/liblzma.so.5 (0xb66b7000)

Now the wisdomf file is created:

mkdir /etc/fftw # fails if the directory already exists, which you want to know so that you can save the existing files
/usr/share/boinc-app-eah-brp/create_wisdomf_eah_brp.sh /etc/fftw/wisdomf

Contrary to what was reported, creating the wisdom file took only a few minutes:

# cat /etc/fftw/wisdomf
(fftw-3.3.4 fftwf_wisdom #x4a633eef #xb5a95564 #x91014bdd #x9c85ce5f
  (fftwf_dft_vrank_geq1_register 0 #x11048 #x11048 #x0 #x3ccfdb1a #xf8b7fb16 #xb777192b #xa86989c1)
  (fftwf_codelet_t1fv_32_neon 0 #x11048 #x11048 #x0 #x13a350ff #xccbd0d68 #x1df44d80 #x16ee0d41)
  (fftwf_codelet_hc2cfdftv_16_neon 0 #x11048 #x11048 #x0 #xead3fd9e #xffec487c #x5b618d5e #x8f8f3b12)
  (fftwf_codelet_r2cfII_16 2 #x11048 #x11048 #x0 #xf4d971ab #x381e69c1 #xc4398fe0 #x3f2135b1)
  (fftwf_codelet_t2fv_16_neon 0 #x11048 #x11048 #x0 #x3d0b62a7 #x15d0e0a0 #xd8a2423f #xa9a6da1c)
  (fftwf_dft_vrank_geq1_register 0 #x11048 #x11048 #x0 #xbddeb44e #xfd7343e7 #x3c8fc850 #x6888d042)
  (fftwf_codelet_t1fv_12_neon 0 #x11048 #x11048 #x0 #x821ed100 #xa1017c4b #x40993259 #x7860b2a1)
  (fftwf_codelet_r2cf_16 2 #x11048 #x11048 #x0 #x8f3ef9f7 #xe67e11ab #xe25a4700 #x8eed687a)
  (fftwf_dft_vrank_geq1_register 0 #x11048 #x11048 #x0 #xa65ca367 #xee5c44cb #x0578eeed #x986cea5e)
  (fftwf_dft_vrank_geq1_register 1 #x11048 #x11048 #x0 #xd97dcac9 #xd6110c1d #x25bf8814 #xe9a1ed91)
  (fftwf_codelet_n1fv_128_neon 0 #x11048 #x11048 #x0 #xdcd9ab89 #x9279272f #x45725e3d #xb22380a2)
)
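A quick way to see whether the planner picked NEON kernels is to count the codelet entries whose names end in _neon. This is a sketch only; the function name neon_codelets is made up, and the pattern simply relies on the naming visible in the listing above.

```shell
#!/bin/sh
# Sketch: count wisdom entries that name a NEON codelet.
# Relies on codelet names of the form fftwf_codelet_..._neon.
neon_codelets() {
    grep -c 'codelet_.*_neon' "$1"
}

wisdom=/etc/fftw/wisdomf
if [ -r "$wisdom" ]; then
    echo "NEON codelets in $wisdom: $(neon_codelets "$wisdom")"
else
    echo "$wisdom not found"
fi
```

For the file shown above this would count five entries, so NEON codelets were selected. That alone says nothing about whether the app actually picks the wisdom up at run time.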

Unfortunately, the wisdomf file has no immediate effect. One task was started without the wisdom and the other is new, and both seem equally slow. Projecting from the elapsed time and the progress percentage, the complete runtime should be about 1000 minutes. I did not do this projection for the official client; I will do so at some later point.
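The projection is a simple proportion: total runtime = elapsed time * 100 / progress in percent. A small sketch follows; the function name project_minutes is made up and the sample values are hypothetical, since the actual elapsed time and percentage were not posted.

```shell
#!/bin/sh
# Sketch: linear projection of the total runtime of a work unit
# from the elapsed minutes and the progress shown by the client.
project_minutes() {
    # $1: elapsed minutes, $2: progress in percent (may be fractional)
    awk -v e="$1" -v p="$2" 'BEGIN { printf "%.0f\n", e * 100 / p }'
}

# Hypothetical reading: 50 minutes elapsed at 5 % progress.
project_minutes 50 5
# prints 1000
```

This assumes progress is linear in time, which holds only as well as the progress figure reported by the app does.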

Summary:

  • The Debian package apparently dramatically outperforms the official one for the RP1.
  • It is still dead slow on this armhf.
  • Memory use almost doubled; no swapping, with each core running only one task.
  • The wisdomf file had no effect.
  • The N30dG app was not yet tried.
