Many-core DSP architectures

Gerard Rauwerda, CTO & co-founder
[email protected]

Recore Systems BV
P.O. Box 77, 7500 AB, Enschede, The Netherlands
℡ +31 53 4753 000 +31 53 4753 009
[email protected]
www.recoresystems.com

Recore Systems
- Dutch fabless semiconductor company
  - Intellectual Property (IP) licensing
  - Embedded many-core sub-system design
  - SoC integration and embedded SW solutions
- Focus on professional market
  - Space, defense, security, …
- Customized mix of processing, connectivity, and reliability
  - Digital beamforming (e.g. advanced radar systems)
  - On-board payload processing for space applications
  - …

© 2014 Recore Systems BV, Confidential

System-on-Chip design: complexity, challenges
- Super-exponentially increasing design process complexity
  - Silicon complexity increases due to scaling of technology
  - System complexity increases due to larger designs
  - Software development is very complex and cumbersome
- Overall design technology challenges
  - Design productivity & reuse productivity
  - Power consumption
  - Manufacturability
  - Reliability

System-on-Chip design: complexity, challenges
- Complex systems nowadays require the integration of many processors of various types (MCU, GPU, DSP, …)
- Software development on these heterogeneous many-core systems is very complex and cumbersome.

Reconfigurable functionality of a SoC
[Figure: reconfigurable functionality of a SoC, in percent (0% to 100%), per year from 2009 to 2024. Based on ITRS 2009.]

References
- General Stream Processor
  - Heterogeneous many-core SoC
  - General Purpose Processor
  - 45 Xentium DSP cores
  - 10 smart memory tiles
  - Network-on-Chip (NoC)
- Reliable processor for space
  - Scalable DSP subsystem (new)
    - 2 Xentiums
    - Network-on-Chip
  - GPP subsystem (existing)
  - NoC bridge

Xentium®: Powerful DSP
- Suitable for scalable many-core SoC design or as accelerator core
- Competitive combination of small silicon footprint, computational power and low power consumption
- Customizable core
- Predictable and deterministic behaviour
- Straightforward Xentium integration for customized SoC
  - Optimize memory size (small, large)
  - Choose interfaces
  - Optimize # of accelerators

Xentium Architecture - Highlights
- Programmable high-performance DSP
- High instruction-level parallelism
- Data precision
  - 32/40-bit fixed-point
  - 16-bit SIMD
- Features
  - Single-cycle latency data memory
  - Single-cycle instruction cache latency
  - Short 3-cycle pipeline
  - Efficient complex MAC execution
  - Register bypassing (latency, energy efficiency, code size)
  - Loop buffer (energy efficiency, code size)
[Block diagram: Xentium core with datapath, control logic, instruction cache and tightly coupled data memory, connected to the bus through a common bus interface]

CMOS     GMAC/s         Clock
65 nm    1.6 GMAC/s     400 MHz
90 nm    0.88 GMAC/s    220 MHz

Xentium - datapath
[Figure: parallel execution units]

Xentium Tool chain
[Diagram: tool chain flow. C source files, Xentium C compiler, assembler source, Xentium assembler, object files, archiver, object file library, linker, executable object file, object file utilities, Xentium instruction set simulator, debugger, profiler info]
Tools: C compiler, assembler, simulator, profiler
- Clean and readable
- Extensive built-in preprocessor
- Standard assembler directives
- Compile, assemble & link a program in a single step
- ANSI/ISO-standard C
- Built-in functions for Xentium specific operations
- Mix C and assembly function calls
- Trace program execution
- Program execution cycle count
- Runs on 32- and 64-bit Linux

Xentium Eclipse Plug-in: Overview
- Use features provided by the Eclipse IDE for C/C++
- Integrates the command line Xentium tool chain
- Diagnostics support for Xentium compiler

Tomorrow's General Stream Processor
- A radically scalable many-core architecture
- Run-time resource management for dynamically reconfiguring cores and network-on-chip
- On-chip dependability infrastructure for hardware test, diagnosis and repair
- Low-cost and super high-performance application demonstrators
- Prototype includes 45 Xentium® DSP cores

The GSP X-45 prototype (includes 45 Xentium cores)
[Figure: X-45 prototype layout of Xentium and Memtium tiles]

The GSP X-45 prototype PCB
[Figure: PCB with 5x RFD, GPD, MCP, 1 GigE and Virtex-4 FPGA]

Run-time mapping
- Dynamically determine the assignment of streaming application kernels to available resources
  - Configure tile processors
  - Configure Network-on-Chip
- Objectives
  - Improve dependability by dynamically circumventing faulty hardware
  - Dramatically improve the utilization of available cores
  - Simplify compilers by separating communication from computation
- A streaming application is modeled as a series of communicating parallel processes to be applied for each element in a stream of data
[Figure: process graph P1 to P7 mapped onto a network of routers (R) connecting a "main" GPP, a GPP, FPGA, ASIC and Xentium tiles]
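
The run-time mapping idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Recore's mapper: the greedy first-fit policy, the function name `map_kernels`, the tile names, and the limit of one kernel per tile are all invented for the example. It shows kernels being assigned to available tiles, and the assignment being recomputed when a tile is reported faulty.

```python
def map_kernels(kernels, tiles, faulty=frozenset()):
    """Assign each kernel to the first healthy, unused tile (greedy first-fit).

    kernels: list of kernel names, e.g. ["P1", "P2"]
    tiles:   list of tile names in order of preference
    faulty:  set of tile names that must be circumvented
    Returns a dict kernel -> tile; raises RuntimeError if tiles run out.
    """
    # Drop faulty tiles before assigning anything.
    healthy = [t for t in tiles if t not in faulty]
    if len(kernels) > len(healthy):
        raise RuntimeError("not enough healthy tiles for all kernels")
    # One kernel per tile is an assumption of this sketch.
    return dict(zip(kernels, healthy))


# Hypothetical kernel and tile names, for illustration only.
kernels = ["P1", "P2", "P3"]
tiles = ["xentium0", "xentium1", "xentium2", "xentium3"]

initial = map_kernels(kernels, tiles)

# A fault is detected on xentium1: recompute the mapping without it.
remapped = map_kernels(kernels, tiles, faulty={"xentium1"})
```

A real mapper would also have to configure the Network-on-Chip routes between the kernels and weigh utilization, which this sketch leaves out.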

Hide (software) complexity
WWW.CRISP-PROJECT.EU
- Dynamically determine mapping
- Automatically reconfigure
- Adapt to platform changes
- Processors / Network-on-Chip
- Running on control processor
- Managing HW resources in manycore system
- Remapping on faults, anticipating on power

Run-time resource management
- Requires annotated task graphs
  - Specifying process and communication
- Maps
  - tasks to processing cores
  - inter-task communication to network resources
- Heterogeneous platform resources
  - Cores differ on access to memory and IO
  - Adding and removing applications changes platform
- Reconfiguring system resources
- Construction of annotated task graphs
  - Fairly straightforward for applications with natural task-level parallelism
    - Radar
    - Beamforming
  - Difficult in general

Run-time reliability management
www.desyre.eu
- Reduce cost for fault tolerance
  - Small fault-free area manages unreliable resources
  - Fault-prone resources for flexibility, reconfiguration
- System reliability managed at runtime:
  - Application requirements
  - System constraints
  - Fault types and density

Hide (hardware) complexity
- Keep it simple for the programmer
- Platform-independent application code performance estimation and optimization
- Identification of possible partitions and placing & routing on different underlying architectures

Fields of expertise
- Heterogeneous many-core architecture design
- Programming of heterogeneous many-core systems
- On-chip interconnect (NoC) IP & Processor IP (Xentium DSP)
- Run time mapping within a heterogeneous many-core system
- Parallel programming models
- System analysis, generation, simulation and verification tools
[Diagram: layered system view, top to bottom]
- Customer's software application (in C): definition of desired behaviour (what)
- Multi-core operating system with runtime resource scheduling: management of hardware resources (how)
- On-chip interconnect: data exchange and on-chip communication
- Xentium DSP, Processor / Memory, Xentium DSP: hardware blocks selected to efficiently perform the application's tasks

FP7 European Research Projects
- ALMA: ALgorithm parallelization for Multicore Architectures [2011-2014], www.alma-project.eu
- DeSyRe: on-Demand System Reliability [2011-2014], www.desyre.eu
- Sensation: Self Energy-Supporting Autonomous Computation [2012-2015], www.sensation-project.eu
- Polca: Programming Large Scale Heterogeneous Infrastructures [2013-2016], www.polca-project.eu
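
As a rough illustration of the annotated task graphs from the run-time resource management slide, the Python sketch below maps tasks to processing cores and inter-task communication to pairs of cores. Everything in it is an assumption made for the example: the `Task` and `Core` types, the `needs` and `features` annotations, the best-fit policy, and the task and core names are hypothetical and do not describe Recore's tooling.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    needs: frozenset = frozenset()     # required core features, e.g. {"io"}


@dataclass
class Core:
    name: str
    features: frozenset = frozenset()  # what this core can access


def map_task_graph(tasks, channels, cores):
    """Place each task on a free core that offers the features it needs.

    channels is a list of (source_task, destination_task) name pairs.
    Returns (placement, routes): task -> core, and channel -> (core, core).
    """
    placement = {}
    free = list(cores)
    for task in tasks:
        candidates = [c for c in free if task.needs <= c.features]
        if not candidates:
            raise RuntimeError(f"no free core satisfies task {task.name}")
        # Best fit: take the least capable core that still qualifies,
        # so that cores with IO access stay free for tasks that need them.
        core = min(candidates, key=lambda c: len(c.features))
        free.remove(core)
        placement[task.name] = core.name
    # Inter-task communication becomes a connection between two cores.
    routes = {(s, d): (placement[s], placement[d]) for s, d in channels}
    return placement, routes


# Hypothetical two-task graph: an input task that needs IO feeds a filter.
tasks = [Task("fir_filter"), Task("adc_in", needs=frozenset({"io"}))]
channels = [("adc_in", "fir_filter")]
cores = [Core("xentium0", frozenset({"io"})), Core("xentium1")]

placement, routes = map_task_graph(tasks, channels, cores)
```

The best-fit choice reflects the slide's point that cores differ in their access to memory and IO: a task without special needs should not occupy the only core that has IO access.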