Parallella, Raspberry Pi, FPGA & All That Stuff

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2699403
RAC: 0

RE: So how long are tasks

Quote:
So how long are tasks taking on the new Raspberry Pi 2 as opposed to the old Pis?
Obviously there can be four at a time, but how much faster is the NEON ARMv7 versus the ARMv6 for Einstein@Home?


I have, I think, 20 ARM cores crunching across various projects. Here is what they get:

My Pi2 running 2 up is coming in at around 68k secs:

http://einsteinathome.org/host/11741356/tasks

One of my Parallellas running 2 up comes in with similar times:

http://einsteinathome.org/host/11381548/tasks

My Samsung S5 mini on Android 4.4.2 (clocked at 1400MHz) comes in with similar times:

http://einsteinathome.org/host/11667599/tasks

My 2012 Nexus 7 on Android 4.4.4 (clocked at 1200MHz) comes in 50k to 65k secs:
(only running three up; running three BRP tasks makes it very laggy, so I try to keep at least one task from another project running)

http://einsteinathome.org/host/11543282/tasks

My 2012 HTC One S running Android 4.1.1 (clocked at 1500MHz) is fastest at around 27k secs, but produces either a validate error or "Error while computing":
(I still think there's an API problem hanging around to do with power saving and critical sections)

http://einsteinathome.org/host/9721755/tasks

My single-core HTC Desire S at Albert on Android 2.2.2 (clocked at 998MHz) comes in at 82k secs:

http://albertathome.org/host/12645/tasks

I don't have times for my Model B Pi at present.

Claggy

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 728085742
RAC: 1205743

RE: I think that EaH client

Quote:

I think that the EaH client only uses 12M R2C FFTW functions, so you can generate exhaustive wisdom for this type only, which saves a lot of time:

fftw-wisdom -v -x -o wisdom_eah rof12582912

Furthermore, this host has a FreeScale iMX6Q (4 x Cortex-A9 @ 1GHz - this board).

Thank you,


Yup, only that the ARM version of the BRP app is using in-place transforms to get a small memory footprint. rof --> rif
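
For anyone following along, here is a minimal sketch of how the problem string passed to fftw-wisdom in the quoted command is put together. It assumes the usual fftw-wisdom size syntax (type r/c, then i/o for in-place/out-of-place, then f/b for direction, then the size). The helper name wisdom_problem is made up for illustration and is not part of FFTW or the E@H app.

```python
def wisdom_problem(size, real=True, in_place=True, forward=True):
    """Build an fftw-wisdom problem string such as 'rif12582912'.

    Hypothetical helper for illustration only.
    """
    kind = "r" if real else "c"           # r = real-data (r2c/c2r), c = complex
    place = "i" if in_place else "o"      # i = in-place, o = out-of-place
    direction = "f" if forward else "b"   # f = forward, b = backward
    return f"{kind}{place}{direction}{size}"


# "12M" in the quoted post means 12 * 2**20 points.
N = 12 * 1024 * 1024

print(wisdom_problem(N, in_place=False))  # spec used in the quoted command
print(wisdom_problem(N, in_place=True))   # spec matching the in-place ARM app
```

So the only change to the quoted command would be the second letter of the problem string.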

Nice performance from that board!

HB

BackGroundMAN
Joined: 25 Feb 05
Posts: 58
Credit: 246736656
RAC: 0

RE: Yup, only that the ARM

Quote:


Yup, only that the ARM version of the BRP app is using in-place transforms to get a small memory footprint. rof --> rif

Nice performance from that board!

HB

Yes, you are right it is in-place.
The EaH client running on the board is a custom build with specific optimization flags for this ARM.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 728085742
RAC: 1205743

RE: The EaH client running

Quote:

The EaH client running on the board is a custom build with specific optimization flags for this ARM.

Interesting. How much faster is that version than the stock E@H version on that board? We know we are losing some performance by doing more stuff in-place in our ARM variants (Linux and Android), so that the RAM used is max ca 125 MB, which matters a lot on those tiny devices.

HB

BackGroundMAN
Joined: 25 Feb 05
Posts: 58
Credit: 246736656
RAC: 0

RE: Interesting. How much

Quote:


Interesting. How much faster is that version than the stock E@H version on that board? We know we are losing some performance by doing more stuff in-place in our ARM variants (Linux and Android), so that the RAM used is max ca 125 MB, which matters a lot on those tiny devices.

HB

With the default BOINC (Ubuntu) version 7.0.27, this board is not supported by EaH.
To be clear, I did not change the source code. I only changed the optimization flags for GCC in the build script and Makefiles.

MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 139002861
RAC: 0

RE: Yup, only that the ARM

Quote:
Yup, only that the ARM version of the BRP app is using in-place transforms to get a small memory footprint. rof --> rif


Would it be worthwhile using a "normal" version app to see if it gains anything for the extra memory usage?

[pre]
KiB Mem: 998096 total, 728616 used, 269480 free, 83984 buffers
KiB Swap: 102396 total, 0 used, 102396 free. 102236 cached Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24229 boinc 39 19 130304 128188 5024 R 99.5 12.8 886:47.71 einsteinbi+
24366 boinc 39 19 130304 128188 5024 R 99.5 12.8 875:44.73 einsteinbi+
24066 boinc 39 19 130304 128268 5104 R 98.8 12.9 899:39.99 einsteinbi+
24134 boinc 39 19 130304 128184 5020 R 98.5 12.8 894:21.25 einsteinbi+
[/pre]
As you can see mine has a bit of free memory that it could utilise.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 728085742
RAC: 1205743

RE: RE: Yup, only that

Quote:
Quote:
Yup, only that the ARM version of the BRP app is using in-place transforms to get a small memory footprint. rof --> rif

Would it be worthwhile using a "normal" version app to see if it gains anything for the extra memory usage?

[pre]
KiB Mem: 998096 total, 728616 used, 269480 free, 83984 buffers
KiB Swap: 102396 total, 0 used, 102396 free. 102236 cached Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24229 boinc 39 19 130304 128188 5024 R 99.5 12.8 886:47.71 einsteinbi+
24366 boinc 39 19 130304 128188 5024 R 99.5 12.8 875:44.73 einsteinbi+
24066 boinc 39 19 130304 128268 5104 R 98.8 12.9 899:39.99 einsteinbi+
24134 boinc 39 19 130304 128184 5020 R 98.5 12.8 894:21.25 einsteinbi+
[/pre]
As you can see mine has a bit of free memory that it could utilise.

Probably it would gain something. But with memory requirements close to 200 MB or even a bit higher per task (IIRC) for the out-of-place version, it would be really tight to run 4 of them in parallel even on a device like yours with 1GB RAM, so it's not an option for our official ARM apps.
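
To put rough numbers on that, here is a small back-of-the-envelope sketch using the figures from the top output quoted above. The 200 MB per-task figure for the out-of-place version is only the approximate value mentioned in this post, and MB is treated as MiB for simplicity; both are assumptions.

```python
# Figures taken from the quoted top output (KiB).
TOTAL_KIB = 998096          # "KiB Mem: ... total"
RES_IN_PLACE_KIB = 128188   # RES of one einsteinbi+ process (in-place app)

# Assumption: "close to 200 MB" per task for an out-of-place build,
# treated here as 200 MiB.
OUT_OF_PLACE_MIB = 200
TASKS = 4


def kib_to_mib(kib):
    return kib / 1024


total_mib = kib_to_mib(TOTAL_KIB)
in_place_total = TASKS * kib_to_mib(RES_IN_PLACE_KIB)
out_of_place_total = TASKS * OUT_OF_PLACE_MIB

headroom_in_place = total_mib - in_place_total
headroom_out_of_place = total_mib - out_of_place_total

print(f"total RAM:          {total_mib:7.1f} MiB")
print(f"4 x in-place:       {in_place_total:7.1f} MiB, leaves {headroom_in_place:6.1f} MiB")
print(f"4 x out-of-place:   {out_of_place_total:7.1f} MiB, leaves {headroom_out_of_place:6.1f} MiB")
```

Whatever is left over has to cover the OS, buffers, cache and the BOINC client itself, which is why four out-of-place tasks would be tight.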

We are currently re-unifying our code branches for the regular x86 CPU&GPU apps with the (up until now) somewhat experimental ARM code line, and after that we'll put a new BRP source-code package online so everyone can do experiments on their own.

Cheers
HB

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 728085742
RAC: 1205743

Time to share some wisdom:

Time to share some wisdom: next attempt, tailored to the Raspberry Pi 2 running the NEON version of the BRP app.

(fftw-3.3.2 fftwf_wisdom #x4a633eef #xb5a95564 #x91014bdd #x9c85ce5f
  (fftwf_codelet_n2fv_6_neon 0 #x10048 #x10048 #x0 #xb9cbc22e #xf75981ec #xfe7fc97d #xa7ef9237)
  (fftwf_dft_vrank_geq1_register 0 #x10048 #x10048 #x0 #x09d6f37d #x36ae1044 #x6551932c #x0ec9837f)
  (fftwf_codelet_q1_2 0 #x11048 #x11048 #x0 #x460f2bdc #x4aa37cb4 #x5c9974cb #x6f00dfca)
  (fftwf_codelet_hc2cfdftv_4_neon 0 #x11048 #x11048 #x0 #xc338dbbd #x81477318 #xc96aed6b #xb15ea60a)
  (fftwf_dft_vrank_geq1_register 0 #x10048 #x10048 #x0 #x67d50d3b #x1369bee7 #x0bbec497 #x32eabb65)
  (fftwf_codelet_t2_8 0 #x10048 #x10048 #x0 #xf837784a #xe72939cb #x379e76e3 #x8e126882)
  (fftwf_dft_vrank_geq1_register 0 #x10048 #x10048 #x0 #x1571fa10 #x389121a2 #xcbdf20c7 #x758fc9be)
  (fftwf_dft_nop_register 0 #x11048 #x11048 #x0 #xe1547730 #xce0f0276 #x1f492e5e #xa455fbfa)
  (fftwf_codelet_t2_8 0 #x10048 #x10048 #x0 #x8ea619df #xab3fb47d #x8f464445 #x0f6cea27)
  (fftwf_rdft_rank0_register 3 #x11048 #x11048 #x0 #xa3218bf8 #x1e4e02e5 #xf3ad505f #xc8d6e15d)
  (fftwf_dft_buffered_register 0 #x11048 #x11048 #x0 #x617ea872 #x4f8387c0 #xc0e3f3b1 #x32b873cd)
  (fftwf_dft_vrank_geq1_register 0 #x11048 #x11048 #x0 #x70b600d6 #xe07ee625 #xbdfc11e2 #x38581e93)
  (fftwf_dft_r2hc_register 0 #x11048 #x11048 #x0 #x92778231 #xf2c5be82 #xbf854e1f #xcdce7520)
  (fftwf_codelet_r2cfII_4 2 #x11048 #x11048 #x0 #x583c6dad #xcad0b14f #xd60d8871 #x3c3e732b)
  (fftwf_dft_vrank_geq1_register 0 #x10048 #x10048 #x0 #x15de8f80 #xf5ad0971 #xfb949337 #x44106823)
  (fftwf_codelet_t3fv_4_neon 0 #x10048 #x10048 #x0 #x463dc2ec #xe48ba2db #x8a49b157 #x2a8a8635)
  (fftwf_codelet_t2fv_16_neon 0 #x10048 #x10048 #x0 #x6beedaf2 #x6ed72333 #x36accb1e #xaee780f5)
  (fftwf_dft_vrank_geq1_register 0 #x11048 #x11048 #x0 #x1f032d84 #x8c4d1b96 #xdb1f2c30 #xb7dd028c)
  (fftwf_codelet_q1_8 0 #x11048 #x11048 #x0 #xd1bb3633 #x91bc40c2 #x20e3bbdc #x4f21b78b)
  (fftwf_codelet_r2cf_4 2 #x11048 #x11048 #x0 #x1ccbb87b #xe43cf57c #xeb78f271 #x2bc4f22f)
  (fftwf_codelet_t1fv_8_neon 0 #x11048 #x11048 #x0 #xa6d492a8 #x769e621f #x709716dc #x6920b4a0)
  (fftwf_dft_vrank_geq1_register 0 #x11048 #x11048 #x0 #x38767c90 #x01ee70b5 #xb6e53cd8 #x51a820b2)
)

Put this into /etc/fftw/wisdomf
and restart BOINC to force a restart of the science apps.
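
Since a copy-and-paste from a forum page can easily lose a line or a parenthesis, here is a rough sanity check you could run on the text before installing it. It is only a sketch: it assumes the file looks like the listing above (one outer parenthesised list whose first line names fftwf_wisdom, with one nested list per plan entry), and check_wisdom is a made-up helper, not an FFTW tool. It does not verify that FFTW will actually accept the wisdom.

```python
def check_wisdom(text, expected_tag="fftwf_wisdom"):
    """Return the number of plan entries, or raise ValueError.

    Hypothetical helper: checks the header line and parenthesis balance only.
    """
    stripped = text.strip()
    first_line = stripped.split("\n", 1)[0]
    if not stripped.startswith("(fftw-") or expected_tag not in first_line:
        raise ValueError("header line does not look like single-precision wisdom")

    depth = 0
    entries = 0
    for ch in stripped:
        if ch == "(":
            depth += 1
            if depth == 2:      # each nested list is one plan entry
                entries += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses: too many ')'")
    if depth != 0:
        raise ValueError("unbalanced parentheses: file looks truncated")
    return entries


# Shortened sample built from two entries of the listing above.
sample = """(fftw-3.3.2 fftwf_wisdom #x4a633eef #xb5a95564 #x91014bdd #x9c85ce5f
  (fftwf_codelet_n2fv_6_neon 0 #x10048 #x10048 #x0 #xb9cbc22e #xf75981ec #xfe7fc97d #xa7ef9237)
  (fftwf_dft_nop_register 0 #x11048 #x11048 #x0 #xe1547730 #xce0f0276 #x1f492e5e #xa455fbfa)
)
"""

print(check_wisdom(sample))
```

In real use you would pass in the contents of the file you are about to copy to /etc/fftw/wisdomf; the full listing above should report 22 entries.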

This might stress the SoC quite a bit; you can monitor the CPU temperature like this:

cat /sys/devices/virtual/thermal/thermal_zone0/temp

This gives the CPU temperature in degrees Celsius x 1000. My RasPi2 is running at around 68 deg C with a tiny heat sink installed. [Thermal self-protection of the SoC is triggered at around 85 deg C or some such, IIRC.]
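
If you would rather see plain degrees, a minimal sketch like the following does the conversion. It assumes the same sysfs path as the cat command above; the function names are made up for illustration.

```python
from pathlib import Path

# Same sysfs node as in the cat command above.
THERMAL_PATH = Path("/sys/devices/virtual/thermal/thermal_zone0/temp")


def millidegrees_to_celsius(raw):
    """Convert the raw sysfs string (degrees C x 1000) to degrees C."""
    return int(raw.strip()) / 1000.0


def read_cpu_temp(path=THERMAL_PATH):
    """Read the current temperature in degrees C from sysfs."""
    return millidegrees_to_celsius(path.read_text())


if __name__ == "__main__":
    if THERMAL_PATH.exists():
        print(f"CPU temperature: {read_cpu_temp():.1f} deg C")
    else:
        print("thermal zone not found on this machine")
```

A raw reading of 68000 therefore corresponds to 68.0 deg C.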

HB

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 728085742
RAC: 1205743

YES, this wisdom file does

YES, this wisdom file does help quite a bit:
http://einsteinathome.org/task/484845661

ca 50k sec per task, or close to 14 h, at 1GHz, running 4 in parallel. Shortens the time to earn 1Mio credits to 6 years 5 months :-).
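
The arithmetic behind those figures, as a small sketch. It only uses the run time and the number of parallel tasks from this post; the credit granted per task is not stated in this thread, so it is left out.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86400


def hours_per_task(secs_per_task):
    """Wall-clock hours one task takes."""
    return secs_per_task / SECONDS_PER_HOUR


def tasks_per_day(secs_per_task, parallel):
    """Tasks finished per day when `parallel` tasks run side by side."""
    return parallel * SECONDS_PER_DAY / secs_per_task


secs = 50_000      # "ca 50k sec per task"
parallel = 4       # "running 4 in parallel"

print(f"{hours_per_task(secs):.1f} h per task")
print(f"{tasks_per_day(secs, parallel):.2f} tasks per day")
```

Multiplying the tasks-per-day figure by the credit per task would give the daily credit for the host.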

Note that for this to work, your RasPi2 needs to execute the NEON-enabled app version, and you won't get that one if you are using the outdated BOINC 7.0.x client that is part of Raspbian "wheezy". You can, however, build your own from recent sources (I haven't tried the Raspbian "jessie" repo yet, but that should work as well).

HB

MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 139002861
RAC: 0

RE: YES, this wisdom file

Quote:

YES, this wisdom file does help quite a bit:
http://einsteinathome.org/task/484845661

ca 50k sec per task, or close to 14 h, at 1GHz, running 4 in parallel. Shortens the time to earn 1Mio credits to 6 years 5 months :-).

Note that for this to work, your RasPi2 needs to execute the NEON-enabled app version, and you won't get that one if you are using the outdated BOINC 7.0.x client that is part of Raspbian "wheezy". You can, however, build your own from recent sources (I haven't tried the Raspbian "jessie" repo yet, but that should work as well).

HB


I've put it onto one of my Pi2s. I have two with copper heat sinks. The one in a case is doing 62-63 degrees (stock speed). The other is still waiting for its case to arrive (it got back-ordered), so it's sitting on top of the box it came in. It was doing 57-58 degrees before I put the fftw plan onto it, and 40 degrees at idle (room temp is 30 degrees). After a few minutes running, it's up to 55 degrees.

As for the BOINC versions, Jessie has 7.4.23, which reports CPU features, so it picks up the NEON app. Both of mine are running Jessie.
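
If you want to check for yourself whether the kernel reports NEON, here is a minimal sketch that looks at the "Features" line of /proc/cpuinfo. This is not necessarily how the BOINC client detects it, and the sample text below is only an illustration of what an ARMv7 board typically prints, not output copied from my Pi.

```python
def cpu_features(cpuinfo_text):
    """Collect the flags listed on the 'Features' lines of /proc/cpuinfo."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            features.update(value.split())
    return features


def has_neon(cpuinfo_text):
    return "neon" in cpu_features(cpuinfo_text)


# Illustrative sample only (assumption, not real output from this thread).
sample = (
    "processor\t: 0\n"
    "model name\t: ARMv7 Processor rev 5 (v7l)\n"
    "Features\t: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt\n"
)

print(has_neon(sample))
```

On a real board you would pass in the contents of /proc/cpuinfo instead of the sample string.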

I presume this fftw plan would also work on the Parallella, although it is probably not as optimal for the Cortex-A9 CPU (the Pi2 has a Cortex-A7).
