Parallella, Raspberry Pi, FPGA & All That Stuff

Rod
Joined: 3 Jan 06
Posts: 4,396
Credit: 811,266
RAC: 0

My risk is $300.00. I will always learn something. I give better than even odds on delivery.

Really I guess I have no risk. I just enjoy the journey.

There are some who can live without wild things and some who cannot. - Aldo Leopold

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,126
Credit: 128,363,463
RAC: 33,462

For what it's worth, while reading stuff I have produced a series of diagrams that demonstrate the Epiphany addressing scheme. A key point is that while 1MB per core is available in address space, currently only 32KB is physically provided for application programmer use. I assume this beckons a pathway for later development ... :-)

The first demonstrates the global unprotected flat addressing, the second is the intra-core map, and the third is the relationship of core addresses (decimal/binary/hex) in a small sample array. I have also thrown in, last, the Ping/Pong double buffering trick - as used in the matrix multiplication application example, but which I think has generality within the Single Program Multiple Data paradigm. This efficiently passes around data slabs within a 'circle' of cores; I have omitted the specific signalling details that co-ordinate it.

Cheers, Mike.

( edit ) Note that their definition of a machine word is 32 bits ( 4 bytes ), whereas one may be used to an Intel word being 16 bits ( 2 bytes ) and 'double word' being 32 bits.

( edit ) Also one may get easily confused ( I did! ) by the strong vs weak memory ordering models. For memory locations referenced by code executing on the same core, you get the usual expected sequence. That's the strong part. Weak refers to scenarios where the memory is on a different core ( or other memory-mapped device, for that matter ) than the one the code is executing upon. There is a table on page 20 of the Epiphany Architecture Manual that looks at cases labelled 'CoreX' and 'CoreY'. Neither CoreX nor CoreY is the one the code is executing on. So a CoreX/CoreX case refers to memory on a particular single core, but other than the one the code is running on. A CoreX/CoreY case refers to two distinct cores, again neither being the one the code is running on.

I have made this letter longer than usual because I lack the time to make it shorter. Blaise Pascal

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,516
Credit: 460,268,506
RAC: 18,657

Interesting news from this

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,126
Credit: 128,363,463
RAC: 33,462


Sigh .... I've been waiting at the gate for the postman every day. :-)

I've been thinking - obviously in generalities at this stage - of how to parallelise ( further ) the GW work. Maybe as the WUs currently stand there's probably not much scope - maybe a bit quicker FFTs, say. But then I thought that we have already parallelised, in that E@H by its distributed nature is effectively doing that already. Soooo ....

[vaulting ambition overleaping itself]
To what extent can work currently labelled as 'pre-' and 'post-' ( with respect to distributed WU allocation ) be shifted from server side to the volunteer milieu? Could there be scope for creating new classes of work units that do that stuff ? Or is there little to no advantage, accounting for outlay to setup and maintain, compared to status quo ? Is it the type of work that needs a level of validation beyond our present criteria for volunteer submitted results? Or on the 'post-' side is it the case that the results go many different ways for further ( shall I say 'ad-hoc' ) analysis? Or is it none of my business ? Will the implementing developers groan collectively and then group-fund a hitman to shut me up? ;-) :=)
[/vaulting ambition overleaping itself]

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter. Blaise Pascal

mikey
Joined: 22 Jan 05
Posts: 6,604
Credit: 605,839,410
RAC: 827,789

Quote:

Sigh .... I've been waiting at the gate for the postman every day. :-)

I've been thinking - obviously in generalities at this stage - of how to parallelise ( further ) the GW work. Maybe as the WUs currently stand there's probably not much scope - maybe a bit quicker FFTs, say. But then I thought that we have already parallelised, in that E@H by its distributed nature is effectively doing that already. Soooo ....

[vaulting ambition overleaping itself]
To what extent can work currently labelled as 'pre-' and 'post-' ( with respect to distributed WU allocation ) be shifted from server side to the volunteer milieu? Could there be scope for creating new classes of work units that do that stuff ? Or is there little to no advantage, accounting for outlay to setup and maintain, compared to status quo ? Is it the type of work that needs a level of validation beyond our present criteria for volunteer submitted results? Or on the 'post-' side is it the case that the results go many different ways for further ( shall I say 'ad-hoc' ) analysis? Or is it none of my business ? Will the implementing developers groan collectively and then group-fund a hitman to shut me up? ;-) :=)
[/vaulting ambition overleaping itself]

Cheers, Mike.

By "parallelised" do you mean like BOINC does, many different PCs each working on the same workunit, or more like Cray does, lots of CPU cores working on the same workunit but within a single PC? Because part of the problem with the latter is the validation of a unit done within a single PC that could be producing junk. Sure, using a single PC to process a workunit can be very fast, especially if it can use multiple CPU and GPU cores within that same machine, but if there is an error somewhere it must be found or the whole process is worthless. If you are sending the same unit to multiple PCs anyway, then what's the advantage of using more of a single PC to crunch the unit? UNLESS the unit is more comprehensive, then it makes a lot of sense, IMHO.

Say, for the sake of my point, a unit takes 2 hours to finish using a single GPU and CPU core; if you were to use all available GPUs and all available CPU cores too, perhaps a unit 3 or 4 times the current size could be processed in that same 2 hours. This would give the ability to wring even more data out of a unit than you can now. The PROBLEM comes in when that PC has an 'issue', and the whole 2 hours' worth of intensive work is junked because of it.

In short if you go that route I would like to see 'validation points' along the way, at periodic intervals, that say yea or nay to keep crunching that particular unit. What if it was an 8 hour unit and there was an error in the first 10 minutes and it continued on crunching basing all future calculations on that error? What a waste of time that would have been.

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,126
Credit: 128,363,463
RAC: 33,462

Exactly. BOINC essentially parallelises, and the server side validates ( currently at E@H using specific validation per returned result, but also via quorums ). BOINC is the framework for distribution but doesn't determine content. I'm just musing about whether more work - out of the entire data processing pipeline - can be shifted to volunteer clients, and whether that is useful/acceptable/pragmatic etc. A basic BOINC design 'rule' is that only server-side work can be 'trusted', so with our current workflow there are already mechanisms in place to determine if client work can be trusted ( in the scientific/cognitive sense ).

I'm just shooting the breeze here. The thought came up when I was thinking of parallel computing specifically with Adapteva, i.e. how to apply that ( if at all ) to E@H. I do fully anticipate developer wrath here .... :-)

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter. Blaise Pascal

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,516
Credit: 460,268,506
RAC: 18,657

Quote:

I'm just musing about whether more work - out of the entire data processing pipeline - can be shifted to volunteer clients, and whether that is useful/acceptable/pragmatic etc. A basic BOINC design 'rule' is that only server-side work can be 'trusted', so with our current workflow there are already mechanisms in place to determine if client work can be trusted ( in the scientific/cognitive sense ).

I'm just shooting the breeze here. The thought came up when I was thinking of parallel computing in the specific with Adapteva ie. how to apply that ( if at all ) to E@H. I do fully anticipate developer wrath here .... :-)

Good question. I think if we consider the whole range of computational tasks that are associated with our current GW search, we currently already have the part that is best suited for volunteer computing out in the field. The main factor is the ratio of computational cost in terms of CPU hours to the volume of data that needs to get in and out of the computation, and the amount of latency that we can afford to get back the result (is it blocking the next stage?). So (currently) I cannot think of a better way to make use of the volunteers' resources, but that's just me of course ;-).


Meanwhile....there is a concrete date (more or less) for the first shipments of Parallella boards to Kickstarter supporters: June (2013 I guess ;-) ) ==>

http://www.kickstarter.com/projects/adapteva/parallella-a-supercomputer-for-everyone/posts/478271

I haven't looked deep enough into the docs so far, so I'm not really sure how difficult it would be to port (say) BRP4 to this platform. Yes, it supports OpenCL, but that doesn't mean it's as easy as recompiling....

I can see a challenge for volunteers here...get BRP running on the Parallella using all available cores of the Epiphany chip.

Cheers
HB

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,126
Credit: 128,363,463
RAC: 33,462

Quote:
Good question. I think if we consider the whole range of computational tasks that are associated with our current GW search, we currently already have the part that is best suited for volunteer computing out in the field. The main factor is the ratio of computational cost in terms of CPU hours to the volume of data that needs to get in and out of the computation, and the amount of latency that we can afford to get back the result (is it blocking the next stage?). So (currently) I cannot think of a better way to make use of the volunteers' resources, but that's just me of course ;-).


Thanks HB, I thought it would be something like that: otherwise it would probably have been done already.

Quote:
... so I'm not really sure how difficult it would be to port (say) BRP4 to this platform. Yes, it supports OpenCL, but that doesn't mean it's as easy as recompiling....


Yes, one can support any language; the question is whether that will yield some efficiency advantage on that hardware. My initial goal - with the board yet to be seen or touched, mind you - is to get some FFT going with, say, a CUDA-FFT library style interface ( setup/plan/pre-calculate then execute ). Learn from there ....

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter. Blaise Pascal

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,516
Credit: 460,268,506
RAC: 18,657

Quote:
My initial goal - with the board yet to be seen or touched, mind you - is to get some FFT going with, say, a CUDA-FFT library style interface ( setup/plan/pre-calculate then execute ).

Even that first step would be interesting, E@H-wise; see the current Fermi-search beta version here, which also just does FFTs on the GPU.
Of course the few Parallella boards would not make a significant contribution to E@H as a whole, but it would be a cool outreach / technology demonstration for this new platform, and for E@H.

Cheers
HB

Alex
Joined: 1 Mar 05
Posts: 449
Credit: 342,169,514
RAC: 136,544

Quote:

So (currently) I cannot think of a better way to make use of the volunteers' resources, but that's just me of course ;-).

Referring to this, I had a short PM exchange with HB on the subject of SW development.
A short part of the answer was (in German): "Worldwide, perhaps just a few hundred qualified people would come into consideration for that ...."

Reading this, I remembered an article I read some time ago about programming by scientists:

http://www.nature.com/news/2010/101013/pdf/467775a.pdf

Just for info, in case someone wants to compare what's going on here with what happens elsewhere. I would say we can be happy to participate in this project!
