Parallella, Raspberry Pi, FPGA & All That Stuff

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6588
Credit: 314958263
RAC: 303292

Yeah, I think there's quite a back story here. From the 21/08/2013 update:

Quote:
After 5 years of having to constantly “do more with less” it finally looks like our ship has come in! I can’t say more than that for now, but I will say that the stronger Adapteva is financially the more likely it is that the Parallella platform will be a long term success!


but from the 27/09/2013 forum post:

Quote:
Sorry for the lack of communication!! We have been in a pretty delicate position (nothing related to the board or the chips). Hopefully some day I can tell everyone the whole horrific story...


FWIW my guess is that they've been occupied at a business, not technical, level with a potential big backer or contract or somesuch that didn't go as well as hoped .....

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ... (Blaise Pascal)

... and my other CPU is a Ryzen 5950X :-)

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 577240218
RAC: 192177

Quote:

OK, it's here ( my coloring ) :

Quote:
On every clock cycle, the following operations can occur:
- 64 bits of instructions can be fetched from memory to the program sequencer.
- 64 bits of data can be passed between the local memory and the CPU’s register file.
- 64 bits can be written into the local memory from the network interface.
- 64 bits can be transferred from the local memory to the network using the local DMA

Oh, I see. But being able to perform one such action per cycle only gives you the throughput. It doesn't tell you how long it will take to finish these actions, i.e. the latency. From your other post:

Quote:
Every router in the mesh is connected to the north, east, west, south, and to a mesh node. Write transactions move through the network, with a latency of 1.5 clock cycles per routing hop. A transaction traversing from the left edge to right edge of a 64- core chip would thus take 12 clock cycles.

That's what I was getting at: the latency to finish that write depends on the distance between the two cores on the chip and can be much more than 5 clocks (still fast, though!). I hope we didn't just talk about different "how long"s all the time: "How long does it take the sender to send the write?" versus "How long does it take for the write to arrive?". Actually... your initial statement was "I can write into the memory (or was it register?) of another core in 5 cycles". So it's actually the total latency to finish the write.
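To put numbers on "depends on the distance", here is a toy latency model in Python. It is only a sketch under loudly stated assumptions that are not taken from Adapteva's documentation: writes follow dimension-ordered (XY) routing, so the path length is the Manhattan distance, and 1.5 cycles are charged for every router traversed, counting the sender's own router. That counting is chosen because it reproduces the quoted 12 cycles for crossing the 8 columns of a 64-core (8x8) chip.

```python
# Toy latency model for a write crossing an 8x8 (64-core) mesh.
# ASSUMPTIONS (illustrative, not from the Epiphany docs): XY routing, so the
# path length is the Manhattan distance, and 1.5 cycles per router traversed,
# including the sending core's own router.
CYCLES_PER_ROUTER = 1.5

def routers_traversed(src, dst):
    """Routers a write passes through from core src=(row, col) to core dst."""
    (r0, c0), (r1, c1) = src, dst
    return abs(r1 - r0) + abs(c1 - c0) + 1

def write_latency(src, dst):
    """Estimated cycles until a write from src arrives at remote core dst."""
    return CYCLES_PER_ROUTER * routers_traversed(src, dst)

print(write_latency((0, 0), (0, 7)))  # left edge to right edge: 12.0
print(write_latency((0, 0), (7, 7)))  # opposite corners: 22.5
print(write_latency((0, 0), (0, 1)))  # nearest neighbour: 3.0
```

Under this model a corner-to-corner write takes almost twice as long as an edge-to-edge one, which is why placing cores that talk to each other close together would matter.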

Quote:
In theory at least, one might 'unroll' a loop to perform the same essential calculations on several cores, with each core doing what might have been done for a single loop iteration ie. accounting for different values of whatever loop variable(s) would have otherwise been updated per round of the loop.

That's what I occasionally use in MATLAB with a parfor loop. The overhead there is significant, though: individual loop iterations have to run for tens, or better hundreds, of ms for this to provide any benefit, which greatly limits its applicability. So I'm a bit jealous of what you could do at a low level. On the other hand I'm not all that keen on spending the time to hand-tweak such details ;)
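When a loop has more iterations than there are cores, one common way to "unroll" it across cores is a static block partition: each core gets a contiguous slice of the iteration space and the partial results are combined at the end. The Python sketch below only simulates the cores with an ordinary loop; `iteration_range`, `body` and `run_partitioned` are hypothetical names, and no Parallella or Epiphany API is implied.

```python
# Hypothetical sketch: split the iterations of `for i in range(n)` across
# `num_cores` cores, each core handling one contiguous block.
# Cores are simulated sequentially here; no Epiphany API is used.
def iteration_range(core_id, num_cores, n):
    """Half-open [start, stop) block of iterations owned by core_id."""
    base, extra = divmod(n, num_cores)
    # The first `extra` cores each take one extra iteration.
    start = core_id * base + min(core_id, extra)
    stop = start + base + (1 if core_id < extra else 0)
    return start, stop

def body(i):
    """Stand-in for the work done in one iteration of the original loop."""
    return i * i

def run_partitioned(n, num_cores):
    partial_sums = []
    for core_id in range(num_cores):        # on real hardware: one core each
        start, stop = iteration_range(core_id, num_cores, n)
        partial_sums.append(sum(body(i) for i in range(start, stop)))
    return sum(partial_sums)                # final reduction step

print([iteration_range(k, 16, 1000) for k in range(3)])
# [(0, 63), (63, 126), (126, 189)]
print(run_partitioned(1000, 16) == sum(body(i) for i in range(1000)))  # True
```

The parfor experience above is the catch: the split only pays off when each block's work clearly outweighs the cost of dispatching it and collecting the partial result.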

Quote:
A subtle bit here is we are using RISC processors which by definition will/may/could have an expanded code memory footprint for a given task(s) c/w their CISC cousins ( but not necessarily ).

Considering Parallella is starting from scratch here and that the individual cores are fairly simple, I'd actually expect their instruction footprint to be less than x86. Especially if 16 bit instructions can sometimes be used.

MrS

Scanning for our furry friends since Jan 2002

Rod
Joined: 3 Jan 06
Posts: 4396
Credit: 811266
RAC: 0

Quote:

Quote:
Sorry for the lack of communication!! We have been in a pretty delicate position (nothing related to the board or the chips). Hopefully some day I can tell everyone the whole horrific story...
my guess is that they've been occupied at a business, not technical, level with a potential big backer or contract or somesuch that didn't go as well as hoped .....

I suspect a challenge to their intellectual property.

There are some who can live without wild things and some who cannot. - Aldo Leopold

MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 139002861
RAC: 0

Quote:

Quote:

Quote:
Sorry for the lack of communication!! We have been in a pretty delicate position (nothing related to the board or the chips). Hopefully some day I can tell everyone the whole horrific story...
my guess is that they've been occupied at a business, not technical, level with a potential big backer or contract or somesuch that didn't go as well as hoped .....

I suspect a challenge to their intellectual property.


Well they have a new logo but I don't think that was it.

Mike Hewson

@Rod + @MarkJ : Intellectual property challenge, yeah there's a thought. It'd be the sort of thing a big player might do to squash a start-up, but I speculate. IMHO ( FWIW ) I reckon their design is brilliant so certainly well worth a patent, which they have.

@MrS :

Quote:
I hope we didn't just talk about different "how long"s all the time ...


Oooops, I think there may have been a tad of that. Sorry :-O :-)
Yup, throughput of one per cycle with latency of five cycles.
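The throughput/latency distinction can be made concrete with a toy pipeline model: one operation can be issued per cycle, but each one takes `latency` cycles to complete. The five-cycle figure is the one discussed in this thread and is used here only as an illustrative parameter, not a measured Epiphany value.

```python
# Toy pipelined-operation model: issue one operation every `issue_interval`
# cycles (throughput); each completes `latency` cycles after it was issued.
# latency=5 mirrors the figure discussed in the thread (illustrative only).
def completion_cycle(n_ops, latency=5, issue_interval=1):
    """Cycles from the first issue until the last of n_ops operations completes."""
    if n_ops == 0:
        return 0
    last_issue = (n_ops - 1) * issue_interval
    return last_issue + latency

for n in (1, 2, 10, 100):
    print(n, completion_cycle(n))
# 1 5
# 2 6
# 10 14
# 100 104
```

A single write pays the full 5 cycles, but a long stream of writes approaches one per cycle, because the latency is only paid once at the start of the pipeline.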

Quote:
On the other hand I'm not all that keen on spending the time to hand-tweak such details ;)


One early task for me, when the card arrives, is to create a wide set of suitably parameterised assembler macros. Their implementation of the superscalar aspect is intriguing, I think, and with a bit of clever ordering the CPU can really hand off a lot of stuff simultaneously. Here the dependencies within the pipeline can be mitigated by attention to the parallel scheduling rules and cycle separations, i.e. avoid stalls.
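Here is a rough illustration of why instruction ordering matters for stalls: a toy in-order, single-issue pipeline in which an instruction cannot issue until its source registers are ready, and each result becomes ready `latency` cycles after issue. This is a deliberate simplification (the real Epiphany is superscalar and has its own scheduling rules), and the register names and the 4-cycle latency are hypothetical.

```python
# Toy in-order, single-issue pipeline (a simplifying ASSUMPTION, not the
# Epiphany's actual rules): an instruction issues at the earliest cycle after
# the previous issue when all its source registers are ready; its result is
# ready `latency` cycles after it issues. Unwritten inputs are ready at 0.
def count_cycles(program, latency):
    """Return (cycles needed to issue everything, stall cycles)."""
    ready = {}        # register -> cycle its value becomes available
    last_issue = -1
    stalls = 0
    for dest, srcs in program:
        earliest = last_issue + 1
        issue = max([earliest] + [ready.get(r, 0) for r in srcs])
        stalls += issue - earliest
        ready[dest] = issue + latency
        last_issue = issue
    return last_issue + 1, stalls

LAT = 4  # hypothetical result latency in cycles
# Two independent multiply-then-add pairs: r1 = a*b + c, r3 = d*e + f
naive = [("r0", ["a", "b"]), ("r1", ["r0", "c"]),
         ("r2", ["d", "e"]), ("r3", ["r2", "f"])]
reordered = [naive[0], naive[2], naive[1], naive[3]]  # interleave the pairs
print(count_cycles(naive, LAT))      # (10, 6)
print(count_cycles(reordered, LAT))  # (6, 2)
```

Interleaving the two independent chains lets the second multiply fill a slot the dependent add would otherwise spend stalled, which is the kind of reordering a scheduling-aware macro could do automatically.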

Quote:
Especially if 16 bit instructions can sometimes be used.


Yup, using the general registers 0 through 7 with short immediates is now on my list of features to ruthlessly exploit at assembler level. Within 16 bits you would, at most, get room for a signed immediate of 3 bits (simm3), i.e. -4 to +3.
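The -4 to +3 range follows from two's-complement encoding: an n-bit signed field holds -2^(n-1) through 2^(n-1) - 1. The Python sketch below computes the range and the bit pattern for a given width; the helper names are hypothetical, and the idea of using `fits_simm` to pick between a 16-bit and 32-bit encoding is an assumption about how a macro might use it, not the actual Epiphany assembler logic.

```python
# Range and encoding of a two's-complement signed immediate field,
# e.g. the 3-bit simm3 field discussed above.
def simm_range(bits):
    """(min, max) values representable in a signed `bits`-bit field."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

def fits_simm(value, bits):
    lo, hi = simm_range(bits)
    return lo <= value <= hi

def encode_simm(value, bits):
    """Two's-complement bit pattern of value in a `bits`-wide field."""
    if not fits_simm(value, bits):
        raise ValueError(f"{value} does not fit in simm{bits}")
    return value & ((1 << bits) - 1)

print(simm_range(3))  # (-4, 3)
print([format(encode_simm(v, 3), "03b") for v in range(-4, 4)])
# ['100', '101', '110', '111', '000', '001', '010', '011']
```

A macro could call `fits_simm(v, 3)` to decide whether an immediate qualifies for the short form before falling back to a longer instruction.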

Cheers, Mike.

( edit ) More info from Andreas : here and here .... :-)

ExtraTerrestrial Apes

Quote:
On the other hand I'm not all that keen on spending the time to hand-tweak such details ;)


... and I'm glad that there are others who are keen to do so :D
(in a clever way, without wasting their time, of course)

MrS

Mike Hewson

Well I must say : I am getting itchy fingers for the alleged imminent Parallella delivery ! :-)

Anyway, while waiting I have produced these musings upon possible approaches to the Parallella for FFT, keeping the horrible mathematical mud in the appendices. :-)

Cheers, Mike.

ExtraTerrestrial Apes

... just be careful with your pets and such ;)

MrS

Mike Hewson

Quote:

... just be careful with your pets and such ;)

MrS


I do so love XKCD, it fills the hole left by Gary Larson when he retired. :-)

Cheers, Mike.

( edit ) Subtitle/hover is 'That cat has some serious periodic components' ....

ExtraTerrestrial Apes

He already had over 300 posts when I discovered XKCD some time ago, and I went through all of them :)
I even made myself an A0 poster with some old favorites; it's still happily hanging on the bathroom door. Too bad most guests have trouble getting the (slightly) nerdy jokes in English!

MrS