Ch.10 SoC Design
TAIST ICTES Program
VLSI Design Methodology
Hiroaki Kunieda
Tokyo Institute of Technology

System Design
SoC Specification
Algorithm Design
• Behavior Description
• Behavior Partitioning
• Behavior Design (Algorithm design)
• Discrete Time, Quantization, Precision
• Power, Cost, Interface
High Level Synthesis
Architecture Design
• Structure Design
• Parallel/Pipeline Processing
• High Level Synthesis (Scheduling/Allocation)
• SW/HW
• Processor, ASIC, ASIP, IP
RTL
Optimal Realization for required System Performance

System Specification
• Clarification of any possible ambiguity
• Careful definition of the project scope
• Approximate costs for development
• Identification of competition and subsequent improvement on their capabilities

System Design
• Target Platform / Computation model: ASIP, Processor with or without RTOS, ASIC, FPGA
• Fixed Point Arithmetic
• Multiplication / Division
• Memory
• Pin Assignment

10.1 Algorithm Design
a. System Development Tool
b. System Description Language
c. Hardware Oriented Algorithm

a. System Development Tool
MATLAB/SIMULINK
MATLAB® is a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numeric computation. Using the MATLAB product, you can solve technical computing problems faster than with traditional programming languages, such as C, C++, and Fortran.
Control, DSP, Image Processing, Communication, Neural Network, Statistics, Optimization, Differential Equations

Key Features
• High-level language for technical computing
• Development environment for managing code, files, and data
• Interactive tools for iterative exploration, design, and problem solving
• Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration
• 2-D and 3-D graphics functions for visualizing data
• Tools for building custom graphical user interfaces
• Functions for integrating MATLAB based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel

b. System Description Language
Up-down Counter by SystemC

    #include "systemc.h"

    SC_MODULE(up_down_counter) {
        // Input ports
        sc_in<bool> clk;
        sc_in<bool> reset;
        sc_in<bool> enable;
        sc_in<bool> up_down;

        // Output ports
        sc_out<sc_uint<8> > out;

        // Internal variables
        sc_uint<8> count;

        // Process declaration: synchronous reset, then count up or down when enabled
        void counter() {
            if (reset.read()) {
                count = 0;
            } else if (enable.read()) {
                if (up_down.read()) {
                    count = count + 1;
                } else {
                    count = count - 1;
                }
            }
            out.write(count);
        }

        // Process registration: run counter() on every rising clock edge
        SC_CTOR(up_down_counter) {
            SC_METHOD(counter);
            sensitive << clk.pos();
        }
    };

SystemC
sc_prim_channel is the base class for all primitive channels, and provides such channels with unique access to the update phase of the scheduler. This standard provides a number of predefined primitive channels to model common communication mechanisms. Some of them are:
• sc_mutex
• sc_fifo
• sc_semaphore

c. Hardware Oriented Algorithm
Inner Product: Distributed Arithmetic
• Hardware algorithm for inner product
• DFT, DCT, Digital Filters

Basic Idea
Input data is decomposed into a group of bits.
The order of calculation is changed so that the products of the coefficients with each input bit are computed and accumulated first, and these partial results are then accumulated with shifting.

Bit Manipulation
• Inner product
• Bit (2's complement) representation of x_n
• Substituting this representation and changing the order of operations gives a form realized by a ROM with an N-bit address and NBc-bit data.

Inner Product Circuit
8 point DCT Implementation

10.2 Architecture Design
a. High Level Synthesis
b. ASIP Design
c. FPGA Design

a. High Level Synthesis
• Software language (C/C++, SystemC): simulation is available on a PC.
• CDFG: a Control and Data Flow Graph is constructed.
• Scheduling: decide the time and the operation unit for each operation.
• Allocation: decide the registers or memory that store the data.
• Output: the data path and its control logic circuit are derived.

Example: Directed acyclic graph
Example: Scheduling
Example: Binding (allocation)
Example: Final Data-Path
Example: Controller

Hardware Cost
1. Area or hardware
2. Delay or speed
3. Power

Critical path
Schedule Methods

b. ASIP Design with LISA

c. FPGA Design
[Figure: FPGA design flow. Labels: RTL, RTL Simulation, Logic Synthesis, LSI Tool, FPGA Tool, Synthesis, Netlist, Gate Assignment, LE, Place and Route, Configuration Data, Functional Verification, END]

Systolic Algorithm Alternative Method

Systolic Array
• Low parallel efficiency
• Inefficient data memory
• Bottleneck for I/O
• PE number uniquely decided
• Restricted to local communication between PEs

Memory Sharing Processor Array (MSPA)
• High parallel efficiency
• Minimization of data memory
• Restriction on I/O ports
• Restriction on PE number
• Not restricted to local communication

MSPA Theory
[Figure: labels A(0,0) to A(5,5), X(0) to X(5), PE0 to PE2, time axis]
Find appropriate coordinates of time and space that do not violate the precedence relations between operations.

Comparisons: Matrix Product Case

          Systolic                               MSPA
Size      #PE      time    Parallel Efficiency   #PE      time    Parallel Efficiency
4         19       10      34%                   34       2       94%
10        109      28      33%                   208      5       96%
50        1,717    49      33%                   8,425    18      98%
201       15,001   1401    39%                   80,801   101     100%

Window-MSPA I
• MSPA for window operation
• Minimize I/O ports and decide #PE
• Fast parallel operation
• Flexible #PE and processing time
• Applicable to various image processing tasks such as motion vector search
[Figure: labels Image, Window, WINDOW-MSPA, Output]

Window-MSPA II
[Figure: WINDOW-MSPA ARCHITECTURE. Labels: External Image Data Memory, Input Network, Internal data parallel distribution network, WPE, Output Network, External Output Data Memory]

Window-MSPA Theory 3 (1996-1998)
[Figure: image data A(0,0) to A(6,6), processing elements PE00 to PE24, and steps numbered ① to ⑩]
① PE00 receives data over 9 clocks and processes it (row 1, port 1, ---)
② PE01 receives data over 9 clocks and processes it (1 clock later than ①)
③ PE02 receives data over 9 clocks and processes it (1 clock later than ②)
④ PE03 receives data over 9 clocks and processes it (1 clock later than ③)
⑤ PE04 receives data over 9 clocks and processes it (1 clock later than ④)
⑥ PE11 receives data over 9 clocks and processes it (3 clocks later than ①)

Operation of Window-MSPA
[Figure: timing chart over clocks 0 to 24 listing the pixel (i,j) present on Global/IO_0 to Global/IO_2 and on Local/IO.00 to Local/IO.22, with rows for PE(0,0) to PE(2,2)]

Window-MSPA Theory 1 (1996-1998)
Image size is 22x22 in all cases. "Units (ports)" is the number of arithmetic units, with the number of input ports in parentheses.

              Systolic array                                        Window-MSPA
Window size   Processing time   Units (ports)   Parallel efficiency   Processing time   Units (ports)   Parallel efficiency
3x3           809               9 (2)           49%                   66                63 (9)          87%
8x8           514               64 (7)          44%                   176               120 (3)         68%
16x16         354               256 (15)        14%                   352               49 (2)          73%

Example 1. Line Direction
Problem Description
[Figure: direction types 0, 1, 2, 3 and the block of interest; panels (a) and (b)]

[Problem] Find the 3x3 direction type that occurs most often among the line patterns in the target image.
[Software Solution] Count the number of occurrences of each direction type among the line patterns in the target image.

Hardware Solution
1. Describe each direction pattern by a 9-bit signal, such as (000111111) for the first direction type.
2. Use each direction pattern as the 9-bit address of a ROM whose data are the increment signals of the corresponding accumulation registers.

[Figure: ROM with 9-bit address (direction of line pattern) and 24-bit data; the data bits are the increment signals for Direction Type 0 to Direction Type 23; the largest number in the registers is selected]
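The ROM-based hardware solution above can be mimicked in software to check the idea before writing RTL. The following is only a minimal C++ sketch under stated assumptions: the names `block_address`, `make_rom`, `count_directions`, and `dominant_direction` are hypothetical, the 3x3 block is assumed to be packed row by row with the top-left pixel as the most significant address bit, and the pattern-to-type table must be supplied by the caller, because the slides give only (000111111) for the first direction type.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

constexpr std::size_t kNumTypes = 24;      // Direction Type 0..23 (24-bit ROM data)
constexpr std::size_t kRomSize = 1u << 9;  // 9-bit ROM address

using Rom = std::array<std::uint32_t, kRomSize>;        // low 24 bits used
using Registers = std::array<std::uint32_t, kNumTypes>;  // accumulation registers
using Image = std::vector<std::vector<int>>;             // binary pixels, row-major

// Pack the 3x3 block whose top-left pixel is (r, c) into a 9-bit address.
// Assumption: row-by-row order, top-left pixel is the most significant bit.
std::uint32_t block_address(const Image& img, std::size_t r, std::size_t c) {
    std::uint32_t addr = 0;
    for (std::size_t dr = 0; dr < 3; ++dr)
        for (std::size_t dc = 0; dc < 3; ++dc)
            addr = (addr << 1) | (img[r + dr][c + dc] ? 1u : 0u);
    return addr;
}

// Build the ROM: each (address, type) pair sets the increment bit of that
// direction type at that address. Unlisted addresses keep all bits zero.
Rom make_rom(const std::vector<std::pair<std::uint32_t, std::size_t>>& patterns) {
    Rom rom{};
    for (const auto& p : patterns) {
        assert(p.first < kRomSize && p.second < kNumTypes);
        rom[p.first] |= (std::uint32_t{1} << p.second);
    }
    return rom;
}

// Slide the 3x3 window over the image, read the ROM, and add each
// increment bit to its accumulation register.
Registers count_directions(const Image& img, const Rom& rom) {
    Registers regs{};
    for (const auto& row : img) assert(row.size() == img[0].size());
    if (img.size() < 3 || img[0].size() < 3) return regs;
    for (std::size_t r = 0; r + 3 <= img.size(); ++r)
        for (std::size_t c = 0; c + 3 <= img[0].size(); ++c) {
            const std::uint32_t inc = rom[block_address(img, r, c)];
            for (std::size_t t = 0; t < kNumTypes; ++t)
                regs[t] += (inc >> t) & 1u;
        }
    return regs;
}

// Select the register with the largest count (first one wins on ties).
std::size_t dominant_direction(const Registers& regs) {
    return static_cast<std::size_t>(
        std::max_element(regs.begin(), regs.end()) - regs.begin());
}
```

A caller would build the ROM once, for example `make_rom({{63u, 0}, {504u, 1}})`, where 63 is (000111111) under the assumed bit order and the second entry is purely illustrative, then pass the result of `count_directions` to `dominant_direction`. In hardware the inner loop over `t` disappears, since all 24 registers are incremented in parallel from one ROM read.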