Parallella, Raspberry Pi, FPGA & All That Stuff

N30dG-ARM
Joined: 20 Oct 17
Posts: 23
Credit: 22094059
RAC: 0

koschi wrote:
Another round of miracles by you :-D How is the performance of the T860, does it beat a fast x86 core? How much faster than one of the A72 cores of that board is it?

Runtime was about 1h 20min on my Firefly-RK3399. Runtime on the A72 cores (@1.8 GHz) of this board was around 2h, so the GPU is roughly 1.5x faster per task.

Potentially the runtime could be much better (<1h). But some things in the OpenCL code are not optimal for a system like this, with slow shared memory. The BRP OpenCL code makes heavy use of LUTs (lookup tables), which speeds things up on "real" GPUs with fast GDDR5X/HBM memory but hurts performance on systems with slow memory. (The Firefly uses only DDR3-1333.) I changed some things and was able to squeeze out runtimes slightly below 1h, but this increased the invalid rate a little. Even without these modifications the invalid rate was high, around 20%. I know what caused this, but there was no (easy) way around it.
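As a rough check on that trade-off, the runtime and invalid rate can be combined into valid results per day. This is only a sketch in Python: it assumes invalid results earn nothing, that the GPU runs one task at a time around the clock, and it takes 60 minutes as a stand-in for "slightly below 1h". The function name is made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60

def valid_results_per_day(runtime_min: float, invalid_rate: float) -> float:
    """Results per day that validate, for one task running at a time."""
    if runtime_min <= 0:
        raise ValueError("runtime_min must be positive")
    if not 0.0 <= invalid_rate <= 1.0:
        raise ValueError("invalid_rate must be between 0 and 1")
    return MINUTES_PER_DAY / runtime_min * (1.0 - invalid_rate)

# Unmodified code: about 80 minutes per task, about 20% invalid.
baseline = valid_results_per_day(80, 0.20)

# Invalid rate at which a 60-minute runtime gives the same valid throughput.
break_even = 1.0 - baseline * 60 / MINUTES_PER_DAY

print(f"baseline: {baseline:.1f} valid results per day")
print(f"break-even invalid rate at 60 min: {break_even:.0%}")
```

Under these assumptions the faster build stays ahead as long as its invalid rate remains below the break-even value, although every invalid result still wastes a work unit for the project.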

Anyway, the Mali Midgard series (T-xxx) shows bad OpenCL performance due to its weird architecture. Midgard GPUs are vector-based, similar to NEON/SSE/AVX on the CPU side. You have to rely on the vectorisation done by the OpenCL compiler, and vectorisation is not always possible, so OpenCL code has to be written with a Midgard GPU in mind to get decent performance. The auto-vectorisation also seems to be what caused my invalids: setting it to be more aggressive increased the invalid rate.

And I wasn't able to create an app_info that enables Boinc to use GPU and CPU for the same app at the same time. -.- Sometimes I really hate Boinc ;).

Using only the GPU and with around 20% invalids, the RAC of this board was slightly better than an RPi 3 with my optimized app: around 1100 to 1200.

 

koschi wrote:
I'll be receiving an Odroid N2 tomorrow, featuring 4 x A73 (1.8 GHz) and 2 x A53 (1.9GHz) and 4GB RAM. The Amlogic S922X SoC comes with a Mali G52, which on paper is faster than the T860. So all in all a promising package. Would be awesome if its GPU could also be used as well

I was able to grab an N2 from the first batch :) But I haven't tested Boinc performance yet.

In general its Bifrost GPU (Gxx) should be way better for OpenCL tasks, as ARM dropped the vector-based approach :).

 

At the moment, my N2 is dedicated to some monitoring tasks for my mini ARM cluster.

You can take a look here: http://31.17.18.50/nagUI/

Development is at a really early stage. At the moment I'm working on some Boinc-related readouts. But this is only a small side project, so things go slowly as I don't put too much time into it.

koschi
Joined: 17 Mar 05
Posts: 86
Credit: 1691457555
RAC: 852263

Wow, thanks for the insight!

So potentially an N2 with DDR4 (Samsung DDR4-2666, K4A8G165WC-BCTD, https://www.samsung.com/semiconductor/global.semi/file/resource/2017/12/x16%20only_8G_C_DDR4_Samsung_Spec_Rev1.5_Apr.17.pdf) if running at full speed and paired with the faster Mali G52 could actually be interesting! 

I have done several tests with the N2. On Einstein BRP (your optimized aarch64 executable), Asteroids and TN-Grid it beats the C2 by a factor of 2 for the whole board. It loves Universe@home though, beating the C2 by a factor of 3 and reaching 80000 credits per day. That compares quite well against an R7 1700 @ 3.2 GHz that pumps out 260k per day.

Under worst-case Asteroids conditions the board draws 8W from the wall; Universe is closer to 6W. I didn't measure other projects.
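For a rough sense of efficiency, these figures can be turned into credits per day per watt. A minimal Python sketch, assuming the roughly 6W wall reading holds for the whole 80000-credit Universe@home day; the function name is made up for illustration, and since no wattage was measured for the R7 1700, only its raw output is compared:

```python
def credits_per_watt(credits_per_day: float, watts: float) -> float:
    """Daily credits earned per watt drawn at the wall."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return credits_per_day / watts

# N2 on Universe@home: about 80000 credits per day at about 6W.
n2 = credits_per_watt(80_000, 6.0)

# Raw output only: how many N2 boards match one R7 1700 at 260k per day.
boards_per_r7 = 260_000 / 80_000

print(f"N2: about {n2:.0f} credits per day per watt")
print(f"One R7 1700 equals about {boards_per_r7:.2f} N2 boards in daily credit")
```

So a little over three N2 boards, drawing somewhere around 20W together, would match the desktop CPU's daily credit on this project, if the numbers above hold.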

I was also thinking about some monitoring: Nagios, Icinga, maybe others. Judging by the name, I guess you use some form or fork of Nagios?

N30dG-ARM
Joined: 20 Oct 17
Posts: 23
Credit: 22094059
RAC: 0

koschi wrote:
So potentially an N2 with DDR4 (Samsung DDR4-2666, K4A8G165WC-BCTD, https://www.samsung.com/semiconductor/global.semi/file/resource/2017/12/x16%20only_8G_C_DDR4_Samsung_Spec_Rev1.5_Apr.17.pdf) if running at full speed and paired with the faster Mali G52 could actually be interesting!

Yes, it really could deliver great performance. When I can find the time I want to try FGRP. But maybe I'll start with BRP, as I already have the code adjusted for Mali GPUs. Some changes are required, as memory allocation / copying is a little bit different.

 

koschi wrote:
I have done several tests with the N2. On Einstein BRP (your optimized aarch64 executable), Asteroids and TN-Grid it beats the C2 by a factor of 2 for the whole board. It loves Universe@home though, beating the C2 by a factor of 3.

I haven't tested mine with a Boinc load yet, as it's meant to be my main ARM dev system and therefore I don't want any unnecessary load on it. But your results are very promising.

 

koschi wrote:
I was also thinking about some monitoring: Nagios, Icinga, maybe others. Judging by the name, I guess you use some form or fork of Nagios?

Yes, it's NEMS Linux, which is basically a Debian with Nagios Core preinstalled and preconfigured for some SBCs.

I've written some plugins for Nagios NRPE (Nagios Remote Plugin Executor), which are simple shell scripts. The UI is written in PHP. Yes, I know I'm not a great web designer ;). I could turn this into open source if someone is interested. But it's only a few lines at this point, everything is hardcoded to my needs, and my PHP code is really ugly.

poppageek
Joined: 13 Aug 10
Posts: 259
Credit: 2473733872
RAC: 0

I used the instructions earlier in this thread to compile the app for the Odroid XU4. Was there a reason to disable checkpointing?

 

Thanks!

N30dG-ARM
Joined: 20 Oct 17
Posts: 23
Credit: 22094059
RAC: 0

Yes, there was some reason to disable it... But I don't remember anymore...

I think it was because many people complained about killing the SD cards on their SBCs. The main reason for that is checkpointing, as Boinc's default setting is to checkpoint every minute. To avoid this, especially for inexperienced users, we decided to disable it.

 

To re-enable it, simply remove the line CFLAGS_APPEND += -DNOCHECKPOINTING from boinc-app-eah-brp/debian/rules.
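To put the SD-card concern in numbers, here is a minimal Python sketch of how many checkpoint writes a board running 24/7 would issue per day. It assumes every task checkpoints exactly once per interval, which is a simplification; the function name and the example of 4 parallel tasks are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def checkpoint_writes_per_day(interval_s: int, tasks: int = 1) -> int:
    """Checkpoint writes per day if each task writes once per interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if tasks < 0:
        raise ValueError("tasks must not be negative")
    return SECONDS_PER_DAY // interval_s * tasks

print(checkpoint_writes_per_day(60))       # one task, every minute
print(checkpoint_writes_per_day(60, 4))    # four tasks, every minute
print(checkpoint_writes_per_day(600, 4))   # four tasks, every ten minutes
```

If you re-enable checkpointing, raising the checkpoint interval in the Boinc computing preferences cuts the number of writes proportionally.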

poppageek
Joined: 13 Aug 10
Posts: 259
Credit: 2473733872
RAC: 0
MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 139002861
RAC: 0

poppageek wrote:New Raspberry

Finally you can get more memory, and it has true GbE networking. Interesting that they still run it as 32-bit even though it’s got an ARMv8. Probably for software compatibility with the older apps.

The other thing is that it wants a 15 watt power supply (3 amps at 5 V), so I expect it produces more heat as a result. The power connector is USB Type-C, so you can’t use your old power cables or power bricks.

Anonymous

adafruit.com has most of the Pi 4s but not the 4 GB RAM one, which is promised "soon". If you put a Pi 4 in a tight case, would it not have a serious heating problem? My 3s are all open, with an 80mm fan mounted on top to ensure adequate cooling.

steffen_moeller
Joined: 9 Feb 05
Posts: 78
Credit: 1773655132
RAC: 0

N30dG-ARM wrote:

Yes, there was some reason to disable [checkpointing]... But I don't remember anymore...

I think it was because many people complained about killing the SD cards on their SBCs. The main reason for that is checkpointing, as Boinc's default setting is to checkpoint every minute. To avoid this, especially for inexperienced users, we decided to disable it.

I think the idea was that the small machines run 24/7, and there are too many of them to monitor manually while staring at the progress bar, so there is no need to invest in checkpointing.

PorkyPies
Joined: 27 Apr 16
Posts: 199
Credit: 33740629
RAC: 945

I managed to order 2 Pi4’s (4GB) but they have been back ordered along with the official power supply. I also need to sort out a multi-Pi power supply to use to power four at a time.

On another note, I installed Buster on a B+ which I use as an NTP server, as it’s too slow for anything else. It appears to be the same as before, but that might be because it’s headless.
