Many-core DSP architectures

Gerard Rauwerda, CTO & co-founder
[email protected]
Recore Systems BV
P.O. Box 77, 7500 AB,
Enschede, The Netherlands
℡ +31 53 4753 000
Fax +31 53 4753 009
[email protected]
www.recoresystems.com
Recore Systems
Dutch fabless semiconductor company
Intellectual Property (IP) licensing
Embedded many-core sub-system design
SoC integration and embedded SW solutions
Focus on professional market
Space, defense, security, …
Customized mix of processing, connectivity, and reliability
Digital beamforming (e.g. advanced radar systems)
On-board payload processing for space applications
…
© 2014 Recore Systems BV
Confidential
System-on-Chip design
Complexity, challenges
Super-exponentially increasing design process complexity
Silicon complexity increases due to scaling of technology
System complexity increases due to larger designs
Software development is very complex and cumbersome
Overall design technology challenges
Design productivity & reuse productivity
Power consumption
Manufacturability
Reliability
System-on-Chip design
Complexity, challenges
Complex systems nowadays require the integration of many processors of various types (MCU, GPU, DSP, …)
Software development on these heterogeneous many-core systems is very complex and cumbersome.
[Figure: Reconfigurable functionality of a SoC, as a percentage (0% to 100%) per year from 2009 to 2024; based on ITRS 2009]
References
General Stream Processor: heterogeneous many-core SoC with a General Purpose Processor, 45 Xentium DSP cores, 10 smart memory tiles and a Network-on-Chip (NoC)
Reliable processor for space: scalable DSP subsystem (new) with 2 Xentiums and a Network-on-Chip, an existing GPP subsystem, and a NoC bridge
Xentium®: Powerful DSP
Suitable for scalable many-core SoC design or as an accelerator core
Competitive combination of small silicon footprint, computational power and low power consumption
Customizable core
Predictable and deterministic behaviour
Straightforward Xentium integration for customized SoCs
Optimize memory size (small, large)
Choose interfaces
Optimize number of accelerators
Xentium Architecture - Highlights
Programmable high-performance DSP
High instruction-level parallelism
Data precision
32/40-bit fixed-point
16-bit SIMD
Features
Single-cycle data memory latency
Single-cycle instruction cache latency
Short 3-cycle pipeline
Efficient complex MAC execution
Register bypassing
(latency, energy efficiency, code size)
Loop buffer
(energy efficiency, code size)
[Figure: Xentium core block diagram: datapath, control logic, instruction cache, tightly coupled data memory, and common bus interface to the bus]
CMOS process | Clock | Performance
65 nm | 400 MHz | 1.6 GMAC/s
90 nm | 220 MHz | 0.88 GMAC/s
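Both rows of the table imply the same number of multiply-accumulates per clock cycle (1.6 GMAC/s / 400 MHz = 0.88 GMAC/s / 220 MHz = 4). The Python sketch below shows that arithmetic; the figure of 4 MACs per cycle is inferred from the table, not taken from a Xentium datasheet, and the function names are hypothetical.

```python
# Peak throughput = MACs issued per cycle x clock rate.
# Assumption: 4 MACs per cycle, inferred from the table above.

def peak_gmacs(clock_mhz: float, macs_per_cycle: int) -> float:
    """Return peak multiply-accumulate throughput in GMAC/s."""
    return clock_mhz * 1e6 * macs_per_cycle / 1e9

def macs_per_cycle(gmacs: float, clock_mhz: float) -> float:
    """Invert the relation: MACs per cycle implied by a GMAC/s figure."""
    return gmacs * 1e9 / (clock_mhz * 1e6)

for node, clock_mhz, gmacs in [("65 nm", 400, 1.6), ("90 nm", 220, 0.88)]:
    print(f"{node}: {macs_per_cycle(gmacs, clock_mhz):.1f} MAC/cycle")
```

The same relation shows why the performance scales linearly with clock frequency between the two process nodes: the datapath width stays the same.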
Xentium - datapath
parallel execution units
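The highlights list efficient complex MAC execution on 16-bit data with 32/40-bit fixed-point precision. The Python sketch below models only the arithmetic of a complex multiply-accumulate (dot product) on 16-bit I/Q samples with a 40-bit accumulator; the saturation policy and the helper names are assumptions, not a description of how the Xentium parallel execution units actually schedule or round these operations.

```python
# Fixed-point complex multiply-accumulate over 16-bit I/Q samples.
# Assumptions: 40-bit signed accumulator, saturation after every step.

ACC_BITS = 40
ACC_MAX = (1 << (ACC_BITS - 1)) - 1
ACC_MIN = -(1 << (ACC_BITS - 1))

def saturate(x: int) -> int:
    """Clamp x to the assumed 40-bit accumulator range."""
    return max(ACC_MIN, min(ACC_MAX, x))

def complex_mac(a, b):
    """Sum of a[k] * b[k] for lists of (re, im) tuples of 16-bit integers."""
    acc_re = acc_im = 0
    for (ar, ai), (br, bi) in zip(a, b):
        for v in (ar, ai, br, bi):
            if not -32768 <= v <= 32767:
                raise ValueError("operand does not fit in 16 bits")
        # One complex MAC = 4 real multiplies plus additions/subtractions.
        acc_re = saturate(acc_re + ar * br - ai * bi)
        acc_im = saturate(acc_im + ar * bi + ai * br)
    return acc_re, acc_im
```

For example, `complex_mac([(1, 2), (3, -1)], [(4, 0), (0, 5)])` returns `(9, 23)`.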
Xentium tool chain
[Figure: tool flow from C source files through the Xentium C compiler, assembler source, Xentium assembler, object files, archiver (object file library) and linker to an executable object file, which runs on the Xentium instruction set simulator or the Xentium core; object file utilities, profiler info and a debugger complete the flow]
Xentium C compiler: ANSI/ISO-standard C; built-in functions for Xentium-specific operations; mix C and assembly function calls; compile, assemble & link a program in a single step
Xentium assembler: clean and readable; extensive built-in preprocessor; standard assembler directives
Simulator: trace program execution; program execution cycle count; runs on 32- and 64-bit Linux
Profiler
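When a DSP kernel is written in C and run on an instruction set simulator, a bit-exact host reference model is a common way to check its output. The Python sketch below is such a reference for a Q15 FIR filter. The Q15 format, the round-half-up rule and the saturation choices are assumptions; they are not taken from the Xentium compiler or simulator documentation.

```python
# Bit-exact host reference for a Q15 FIR filter.
# Assumptions: Q15 inputs/taps, full-precision accumulation,
# round half up, saturation to 16 bits on output.

def q15(x: float) -> int:
    """Convert a float in [-1, 1) to Q15, saturating at the range limits."""
    return max(-32768, min(32767, int(round(x * 32768))))

def fir_q15(samples, taps):
    """y[n] = sum_k taps[k] * x[n-k], with x[n] = 0 for n < 0. Returns Q15."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]   # Q15 * Q15 = Q30 product
        y = (acc + (1 << 14)) >> 15          # round half up, back to Q15
        out.append(max(-32768, min(32767, y)))
    return out
```

For example, a two-tap moving average with taps `[q15(0.5), q15(0.5)]` on a constant input of `q15(0.5)` yields `[8192, 16384, 16384]`: the first output only sees one nonzero sample.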
Xentium Eclipse Plug-in
Overview
Uses features provided by the Eclipse IDE for C/C++
Integrates the command-line Xentium tool chain
Diagnostics support for the Xentium compiler
Tomorrow’s General Stream Processor
A radically scalable many-core architecture
Run-time resource management for dynamically reconfiguring cores and network-on-chip
On-chip dependability infrastructure for hardware test, diagnosis and repair
Low-cost and super-high-performance application demonstrators
Prototype includes 45 Xentium® DSP cores
The GSP X-45 prototype
(includes 45 Xentium cores)
[Figure: GSP X-45 prototype with Xentium and Memtium tiles]
The GSP X-45 prototype PCB
[Figure: GSP X-45 prototype PCB with 5x RFD, GPD, MCP, 1 GigE and Virtex-4 FPGA]
Run-time mapping
Dynamically determine the assignment of streaming application kernels to available resources
Configure tile processors
Configure Network-on-Chip
Objectives
Improve dependability by dynamically circumventing faulty hardware
Dramatically improve the utilization of available cores
Simplify compilers by separating communication from computation
A streaming application is modeled as a series of communicating parallel processes to be applied for each element in a stream of data
[Figure: application processes P1 to P7 mapped onto a heterogeneous platform of a “main” GPP, another GPP, FPGA, ASIC and Xentium tiles interconnected by routers (R)]
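The slide does not say which mapping algorithm is used. As a minimal sketch under stated assumptions, the Python code below assigns each process to the first free, non-faulty tile of the required type, which illustrates how run-time mapping can circumvent faulty hardware. The `Tile` class, the type names and the first-fit policy are hypothetical.

```python
# First-fit run-time mapping sketch (hypothetical policy and names):
# assign each process to a free, non-faulty tile of the required kind.

from dataclasses import dataclass

@dataclass
class Tile:
    name: str
    kind: str            # e.g. "Xentium", "FPGA", "ASIC", "GPP"
    faulty: bool = False
    busy: bool = False

def map_processes(processes, tiles):
    """processes: list of (process_name, required_kind). Returns {process: tile}."""
    mapping = {}
    for proc, kind in processes:
        tile = next((t for t in tiles
                     if t.kind == kind and not t.faulty and not t.busy), None)
        if tile is None:
            raise RuntimeError(f"no healthy {kind} tile available for {proc}")
        tile.busy = True
        mapping[proc] = tile.name
    return mapping
```

With tiles `X0` (faulty), `X1`, `X2` (Xentium) and `F0` (FPGA), processes `P1` (Xentium), `P2` (FPGA) and `P3` (Xentium) map to `X1`, `F0` and `X2`: the faulty tile is skipped. First-fit ignores Network-on-Chip distance; a practical mapper would also route the inter-process communication over available links.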
Hide (software) complexity
WWW.CRISP-PROJECT.EU
Dynamically determine mapping
Automatically reconfigure processors and Network-on-Chip
Adapt to platform changes
Running on control processor
Managing HW resources in a many-core system
Remapping on faults, anticipating power constraints
Run-time resource management
Maps tasks to processing cores and inter-task communication to network resources
Reconfigures system resources
Handles heterogeneous platform resources: cores differ in access to memory and IO, and adding or removing applications changes the platform
Requires annotated task graphs specifying processes and communication
Construction of annotated task graphs
Fairly straightforward for applications with natural task-level parallelism (e.g. radar beamforming)
Difficult in general
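An annotated task graph attaches costs to tasks (computation) and to edges (communication), so that a resource manager can evaluate a candidate mapping. The Python sketch below assumes a compute annotation in cycles per block and a communication annotation in words per block; these units, the names and the example numbers (loosely shaped like per-channel beamforming tasks feeding a summation task) are hypothetical.

```python
# Annotated task graph sketch: tasks carry a compute annotation, edges carry
# a communication annotation. Units and names are hypothetical.

from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    words_per_block: int   # communication annotation

def noc_traffic(edges, mapping):
    """Words per block that must travel between different cores."""
    return sum(e.words_per_block for e in edges
               if mapping[e.src] != mapping[e.dst])

def core_load(task_cycles, mapping):
    """Cycles per block on each core, from the compute annotations."""
    load = {}
    for task, cycles in task_cycles.items():
        core = mapping[task]
        load[core] = load.get(core, 0) + cycles
    return load
```

For example, with tasks `ch0` and `ch1` (1000 cycles each) feeding `sum` (300 cycles) over 256-word edges, and `sum` co-located with `ch1`, only the `ch0` edge crosses the NoC (256 words per block), while the loads are 1000 and 1300 cycles. This trade-off between core load and network traffic is what the annotations let the run-time manager reason about.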
Run-time reliability management
Reduce cost for fault tolerance
Small fault-free area manages unreliable resources
Fault-prone resources for flexibility and reconfiguration
System reliability managed at run time, based on application requirements, system constraints, and fault types and density
www.desyre.eu
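The slide does not describe how the fault-free area manages the fault-prone resources. One simple policy, sketched below in Python under stated assumptions, is a health table that counts detected errors per resource and retires a resource once its count reaches a threshold. The `HealthTable` class, the threshold value and the resource names are hypothetical.

```python
# Health-table sketch for a reliability manager running in the fault-free
# area (hypothetical policy): count errors per fault-prone resource and
# retire a resource once its error count reaches a threshold.

class HealthTable:
    def __init__(self, resources, threshold=3):
        self.errors = {r: 0 for r in resources}
        self.threshold = threshold
        self.retired = set()

    def report_error(self, resource):
        """Record an error; return True if the resource has just been retired."""
        if resource in self.retired:
            return False
        self.errors[resource] += 1
        if self.errors[resource] >= self.threshold:
            self.retired.add(resource)
            return True
        return False

    def usable(self):
        """Resources that are still available for mapping."""
        return [r for r in self.errors if r not in self.retired]
```

A run-time mapper could then exclude retired resources when remapping. The threshold is a knob for the trade-off the slide mentions: a lower value reacts faster to high fault density, a higher one tolerates transient faults.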
Hide (hardware) complexity
Keep it simple for the programmer
Platform-independent application code performance estimation and optimization
Identification of possible partitions, and placement & routing on different underlying architectures
Fields of expertise
Heterogeneous many-core architecture design
Programming of heterogeneous many-core systems
On-chip interconnect (NoC) IP & Processor IP (Xentium DSP)
Run-time mapping within a heterogeneous many-core system
Parallel programming models
System analysis, generation, simulation and verification tools
Customer’s software application (in C): definition of desired behaviour (what)
Multi-core operating system with run-time resource scheduling: management of hardware resources (how)
On-chip interconnect: data exchange and on-chip communication
Xentium DSP, processor / memory: hardware blocks selected to efficiently perform the application’s tasks
FP7 European Research Projects
ALMA: ALgorithm parallelization for Multicore Architectures [2011-2014], www.alma-project.eu
DeSyRe: on-Demand System Reliability [2011-2014], www.desyre.eu
Sensation: Self Energy-Supporting Autonomous Computation [2012-2015], www.sensation-project.eu
Polca: Programming Large Scale Heterogeneous Infrastructures [2013–2016], www.polca-project.eu