Parallella, Raspberry Pi, FPGA & All That Stuff

Rod
Joined: 3 Jan 06
Posts: 4,396
Credit: 811,266
RAC: 0

My risk is $300.00. I will always learn something. I give better than even odds on delivery.

Really I guess I have no risk. I just enjoy the journey.

There are some who can live without wild things and some who cannot. - Aldo Leopold

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,126
Credit: 128,363,463
RAC: 33,462

For what it's worth, while reading stuff I have produced a series of diagrams that demonstrate the Epiphany addressing scheme. A key point is that while 1MB per core is available in address space, currently only 32KB is physically provided for application programmer use. I assume this beckons a pathway for later development ... :-)

The first demonstrates the global unprotected flat addressing, the second is the intra-core map, and the third is the relationship of core addresses (decimal/binary/hex) in a small sample array. I have also thrown in, last, the Ping/Pong double buffering trick - as used in the matrix multiplication application example, but which I think has generality within the Single Program Multiple Data paradigm. This efficiently passes around data slabs within a 'circle' of cores; I have omitted the specific signalling details that co-ordinate it.

Cheers, Mike.

( edit ) Note that their definition of a machine word is 32 bits ( 4 bytes ), whereas one may be used to an Intel word being 16 bits ( 2 bytes ) and 'double word' being 32 bits.

( edit ) Also one may get easily confused ( I did! ) by the strong vs weak memory ordering models. For memory locations referenced by code executing on the same core, you get the usual expected sequence. That's the strong part. Weak refers to scenarios where the memory is on a different core ( or other memory-mapped device, for that matter ) than the one the code is executing upon. There is a table on page 20 of the Epiphany Architecture Manual that looks at cases labelled 'CoreX' and 'CoreY'. Neither CoreX nor CoreY is the one the code is executing on. So a CoreX/CoreX case refers to memory on a particular single core, but other than the one the code is running on. A CoreX/CoreY case refers to two distinct cores, again neither being the one the code is running on.

I have made this letter longer than usual because I lack the time to make it shorter. Blaise Pascal

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,516
Credit: 460,268,506
RAC: 18,657

Interesting news from this

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,126
Credit: 128,363,463
RAC: 33,462


Sigh .... I've been waiting at the gate for the postman every day. :-)

I've been thinking - obviously in generalities at this stage - of how to parallelise ( further ) the GW work. Maybe as the WUs currently stand there's probably not much scope - maybe a bit quicker FFTs, say. But then I thought that we have already parallelised, in that E@H by its distributed nature is effectively doing that already. Soooo ....

[vaulting ambition overleaping itself]
To what extent can work currently labelled as 'pre-' and 'post-' ( with respect to distributed WU allocation ) be shifted from server side to the volunteer milieu? Could there be scope for creating new classes of work units that do that stuff ? Or is there little to no advantage, accounting for outlay to setup and maintain, compared to status quo ? Is it the type of work that needs a level of validation beyond our present criteria for volunteer submitted results? Or on the 'post-' side is it the case that the results go many different ways for further ( shall I say 'ad-hoc' ) analysis? Or is it none of my business ? Will the implementing developers groan collectively and then group-fund a hitman to shut me up? ;-) :=)
[/vaulting ambition overleaping itself]

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter. Blaise Pascal

mikey
Joined: 22 Jan 05
Posts: 6,604
Credit: 605,839,410
RAC: 827,789

Quote:

Sigh .... I've been waiting at the gate for the postman every day. :-)

I've been thinking - obviously in generalities at this stage - of how to parallelise ( further ) the GW work. Maybe as the WUs currently stand there's probably not much scope - maybe a bit quicker FFTs, say. But then I thought that we have already parallelised, in that E@H by its distributed nature is effectively doing that already. Soooo ....

[vaulting ambition overleaping itself]
To what extent can work currently labelled as 'pre-' and 'post-' ( with respect to distributed WU allocation ) be shifted from server side to the volunteer milieu? Could there be scope for creating new classes of work units that do that stuff ? Or is there little to no advantage, accounting for outlay to setup and maintain, compared to status quo ? Is it the type of work that needs a level of validation beyond our present criteria for volunteer submitted results? Or on the 'post-' side is it the case that the results go many different ways for further ( shall I say 'ad-hoc' ) analysis? Or is it none of my business ? Will the implementing developers groan collectively and then group-fund a hitman to shut me up? ;-) :=)
[/vaulting ambition overleaping itself]

Cheers, Mike.

By "parallelised" do you mean like BOINC does, many different PCs each working on the same workunit, or more like Cray does, lots of CPU cores working on the same workunit but within a single PC? Because part of the problem with the latter is the validation of a unit done within a single PC that could be producing junk. Sure, using a single PC to process a workunit can be very fast, especially if it can use multiple CPU and GPU cores within that same machine, but if there is an error somewhere it must be found or the whole process is worthless. If you are sending the same unit to multiple PCs anyway, then what's the advantage of using more of a single PC to crunch the unit? UNLESS the unit is more comprehensive, then it makes a lot of sense, IMHO.

Say, for the sake of my point, a unit takes 2 hours to finish using a single GPU and CPU core; if you were to use all available GPUs and all available CPU cores too, perhaps a unit 3 or 4 times the current size could be processed in that same 2 hours. This would give the ability to wring even more data out of a unit than you can now. The PROBLEM comes in when that PC has an 'issue', and the whole 2 hours' worth of intensive work is junked because of it.

In short if you go that route I would like to see 'validation points' along the way, at periodic intervals, that say yea or nay to keep crunching that particular unit. What if it was an 8 hour unit and there was an error in the first 10 minutes and it continued on crunching basing all future calculations on that error? What a waste of time that would have been.

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,126
Credit: 128,363,463
RAC: 33,462

Exactly. BOINC essentially parallelises, and the server side validates ( currently at E@H using specific validation per returned result, but also via quorums ). BOINC is the framework for distribution but doesn't determine content. I'm just musing about whether more work - out of the entire data processing pipeline - can be shifted to volunteer clients, and whether that is useful/acceptable/pragmatic etc. A basic BOINC design 'rule' is that only server-side work can be 'trusted', so with our current workflow there are already mechanisms in place to determine if client work can be trusted ( in the scientific/cognitive sense ).

I'm just shooting the breeze here. The thought came up when I was thinking of parallel computing specifically with Adapteva, i.e. how to apply that ( if at all ) to E@H. I do fully anticipate developer wrath here .... :-)

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter. Blaise Pascal

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,516
Credit: 460,268,506
RAC: 18,657

Quote:

I'm just musing about whether more work - out of the entire data processing pipeline - can be shifted to volunteer clients, and whether that is useful/acceptable/pragmatic etc. A basic BOINC design 'rule' is that only server-side work can be 'trusted', so with our current workflow there are already mechanisms in place to determine if client work can be trusted ( in the scientific/cognitive sense ).

I'm just shooting the breeze here. The thought came up when I was thinking of parallel computing in the specific with Adapteva ie. how to apply that ( if at all ) to E@H. I do fully anticipate developer wrath here .... :-)

Good question. I think if we consider the whole range of computational tasks that are associated with our current GW search, we currently already have the part that is best suited for volunteer computing out in the field. The main factor is the ratio of computational cost in terms of CPU hours to the volume of data that needs to get in and out of the computation, and the amount of latency that we can afford to get back the result (is it blocking the next stage?). So (currently) I cannot think of a better way to make use of the volunteers' resources, but that's just me of course ;-).


Meanwhile....there is a concrete date (more or less) for the first shipments of Parallella boards to Kickstarter supporters: June (2013 I guess ;-) ) ==>

http://www.kickstarter.com/projects/adapteva/parallella-a-supercomputer-for-everyone/posts/478271

I haven't looked deep enough into the docs so far, so I'm not really sure how difficult it would be to port (say) BRP4 to this platform. Yes, it supports OpenCL, but that doesn't mean it's as easy as recompiling....

I can see a challenge for volunteers here...get BRP running on the Parallella using all available cores of the Epiphany chip.

Cheers
HB

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,126
Credit: 128,363,463
RAC: 33,462

Quote:
Good question. I think if we consider the whole range of computational tasks that are associated with our current GW search, we currently already have the part that is best suited for volunteer computing out in the field. The main factor is the ratio of computational cost in terms of CPU hours to the volume of data that needs to get in and out of the computation, and the amount of latency that we can afford to get back the result (is it blocking the next stage?). So (currently) I cannot think of a better way to make use of the volunteers' resources, but that's just me of course ;-).


Thanks HB, I thought it would be something like that: otherwise it would probably have been done already.

Quote:
... so I'm not really sure how difficult it would be to port (say) BRP4 to this platform. Yes, it supports OpenCL, but that doesn't mean it's as easy as recompiling....


Yes, one can support any language; the question is whether that will yield some efficiency advantage on that hardware. My initial goal - with the board yet to be seen or touched, mind you - is to get some FFT going with, say, a CUDA-FFT library style interface ( setup/plan/pre-calculate then execute ). Learn from there ....

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter. Blaise Pascal

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,516
Credit: 460,268,506
RAC: 18,657

Quote:
My initial goal - with the board yet to be seen or touched, mind you - is to get some FFT going with, say, a CUDA-FFT library style interface ( setup/plan/pre-calculate then execute ).

Even that first step would be interesting, E@H-wise; see the current Fermi-search beta version here, which also just does FFTs on the GPU.
Of course the few Parallella boards would not make a significant contribution to E@H as a whole, but it would be a cool outreach / technology demonstration for this new platform, and for E@H.

Cheers
HB

Alex
Joined: 1 Mar 05
Posts: 449
Credit: 342,169,514
RAC: 136,544

Quote:

So (currently) I cannot think of a better way to make use of the volunteers' resources, but that's just me of course ;-).

Referring to this, I had a short PM exchange with HB on the subject of SW development.
A short part of the answer was (in German): "Worldwide, perhaps just a few hundred qualified people would come into consideration for that ...."

Reading this, I remembered an article I read some time ago about programming by scientists:

http://www.nature.com/news/2010/101013/pdf/467775a.pdf

Just for info, in case someone wants to compare what's going on here with what happens elsewhere. I would say we can be happy to participate in this project!
