Ch.10 SoC Design

TAIST ICTES Program
VLSI Design Methodology
Hiroaki Kunieda
Tokyo Institute of Technology
System Design
SoC Specification
Algorithm Design
• Behavior Description
• Behavior Partitioning
• Behavior Design (Algorithm design)
• Discrete Time, Quantization, Precision
• Power, Cost, Interface
High Level Synthesis
Architecture Design
• Structure Design
• Parallel/Pipeline Processing
• High Level Synthesis (Scheduling/Allocation)
• SW/HW, Processor, ASIC, ASIP, IP
• Optimal Realization for required System Performance
RTL
System Specification
• Clarification of any possible ambiguity
• Careful definition of the project scope
• Approximate costs for development
• Identification of competition
• Subsequent improvement on their capabilities
System Design
Target Platform / Computation model
• ASIP, Processor with or without RTOS, ASIC, FPGA
• Fixed Point Arithmetic
• Multiplication / Division
• Memory
• Pin Assignment
10.1 Algorithm Design
a. System Development Tool
b. System Description Language
c. Hardware Oriented Algorithm
a. System Development Tool
MATLAB/SIMULINK
MATLAB® is a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numeric computation. Using the MATLAB product, you can solve technical computing problems faster than with traditional programming languages, such as C, C++, and Fortran.
Control, DSP, Image Processing, Communication, Neural Network, Statistics, Optimization, Differential Equations
Key Features
• High-level language for technical computing
• Development environment for managing code, files, and data
• Interactive tools for iterative exploration, design, and problem solving
• Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration
• 2-D and 3-D graphics functions for visualizing data
• Tools for building custom graphical user interfaces
• Functions for integrating MATLAB based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel
b. System Description Language
Up-down Counter in SystemC
#include "systemc.h"

SC_MODULE (up_down_counter) {
  //-----------input ports---------------
  sc_in <bool> clk;
  sc_in <bool> reset;
  sc_in <bool> enable;
  sc_in <bool> up_down;
  //------------output ports--------------
  sc_out <sc_uint<8> > out;
  //------------internal variables--------
  sc_uint<8> count;
  //-------------process declaration------
  void counter () {
    if (reset.read()) {
      count = 0;              // synchronous reset
    } else if (enable.read()) {
      if (up_down.read()) {
        count = count + 1;    // count up
      } else {
        count = count - 1;    // count down
      }
    }
    out.write(count);
  }
  //-------------process registration-----
  SC_CTOR(up_down_counter) {
    SC_METHOD (counter);
    sensitive << clk.pos();   // evaluate on every rising clock edge
  }
};
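To check the counter's expected values without a SystemC simulator, the same clocked behaviour can be mirrored in plain C++. The following is a minimal sketch only; the names `UpDownCounterModel` and `tick` are hypothetical and are not part of SystemC or of the original slides.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference model of the up-down counter.
// One call to tick() stands for one rising edge of clk.
struct UpDownCounterModel {
    std::uint8_t count = 0;  // 8 bits wide, wraps modulo 256 like sc_uint<8>

    // Returns the value that would be written to the `out` port.
    std::uint8_t tick(bool reset, bool enable, bool up_down) {
        if (reset) {
            count = 0;
        } else if (enable) {
            // Arithmetic is done in int, then truncated back to 8 bits.
            count = static_cast<std::uint8_t>(up_down ? count + 1 : count - 1);
        }
        return count;
    }
};
```

Counting down from 0 wraps to 255, which is the behaviour to expect from the 8-bit `count` in the SystemC module.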
SystemC
sc_prim_channel is the base class for all primitive channels, and provides such channels with unique access to the update phase of the scheduler.
The SystemC standard provides a number of predefined primitive channels to model common communication mechanisms. Some of them are
• sc_mutex
• sc_fifo
• sc_semaphore
c. Hardware Oriented Algorithm
Inner Product
–Distributed Arithmetic-

Hardware Algorithm for Inner Product

DFT, DCT, Digital Filters

Basic Idea

Each input word is decomposed into its individual bits. The order of calculation is rearranged so that the products of the coefficients with one bit of every input are summed first, and these partial sums are then accumulated with shifting.
Bit Manipulation

Inner Product

2's complement bit representation of xn

By substituting and changing the order of operations,
Realized by a ROM with N-bit address and NBc-bit data
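The basic idea can be tried out in software. The sketch below assumes integer (not fractional) 2's complement inputs and integer coefficients, and models the ROM as a table of 2^N precomputed coefficient sums; the function names `build_da_rom` and `da_inner_product` are hypothetical. It is one possible illustration of distributed arithmetic, not the circuit from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// ROM contents: word `addr` holds the sum of all coefficients whose
// index bit is set in `addr` (2^N words for N coefficients).
std::vector<std::int64_t> build_da_rom(const std::vector<std::int32_t>& coeff) {
    assert(coeff.size() < 20);  // keep the table small in this sketch
    const std::size_t n = coeff.size();
    std::vector<std::int64_t> rom(std::size_t{1} << n, 0);
    for (std::size_t addr = 0; addr < rom.size(); ++addr)
        for (std::size_t i = 0; i < n; ++i)
            if ((addr >> i) & 1u) rom[addr] += coeff[i];
    return rom;
}

// Inner product sum(coeff[i] * x[i]) using only ROM look-ups, shifts and adds.
// Each x[i] is treated as a `bits`-bit 2's complement integer.
std::int64_t da_inner_product(const std::vector<std::int64_t>& rom,
                              const std::vector<std::int32_t>& x,
                              unsigned bits) {
    assert(bits >= 2 && bits <= 32);
    assert((std::size_t{1} << x.size()) == rom.size());
    std::int64_t acc = 0;
    for (unsigned k = 0; k < bits; ++k) {
        // Gather bit k of every input into one ROM address.
        std::size_t addr = 0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            const std::uint32_t u = static_cast<std::uint32_t>(x[i]);
            addr |= static_cast<std::size_t>((u >> k) & 1u) << i;
        }
        // Weight 2^k; the sign bit (k == bits-1) carries a negative weight.
        const std::int64_t term = rom[addr] * (std::int64_t{1} << k);
        if (k == bits - 1) acc -= term; else acc += term;
    }
    return acc;
}
```

No multiplier is needed at run time: the loop performs one ROM access and one shifted accumulation per input bit position, so the number of cycles depends on the word length rather than on the number of coefficients.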
Inner Product Circuit
8 point DCT Implementation
10.2 Architecture Design
a. High Level Synthesis
b. ASIP Design
c. FPGA Design
a. High Level Synthesis
Software Language (C/C++, SystemC): simulation is available on a PC.
CDFG: a Control and Data Flow Graph is constructed.
Scheduling: decides the time step and the operation unit for each operation.
Allocation: decides the registers or memory that store the data.
Output: the data path and its control logic circuit are derived.
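The scheduling step can be illustrated with ASAP (as soon as possible) scheduling, one common method; the slides do not state which method they use. This sketch assumes every operation takes one control step and that operation units are unlimited. The names `Op` and `asap_schedule` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One operation node of the data flow graph; `preds` lists the indices
// of the operations whose results it consumes.
struct Op {
    std::vector<int> preds;
};

// ASAP scheduling: each operation starts in the first control step after
// all of its predecessors have finished. Operations must be listed in
// topological order (every predecessor index is smaller than the node's own).
std::vector<int> asap_schedule(const std::vector<Op>& dfg) {
    std::vector<int> step(dfg.size(), 0);
    for (std::size_t i = 0; i < dfg.size(); ++i) {
        for (int p : dfg[i].preds) {
            assert(p >= 0 && static_cast<std::size_t>(p) < i);
            step[i] = std::max(step[i], step[p] + 1);
        }
    }
    return step;
}
```

For y = (a*b + c*d) * e, the two inner multiplications have no predecessors and share step 0, the addition lands in step 1 and the final multiplication in step 2. The largest step plus one gives the schedule length along the critical path.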
Example: Directed acyclic graph
Example: Scheduling
Example: Binding(allocation)
Example: Final Data-Path
Example: Controller
Hardware Cost
1. Area or hardware
2. Delay or Speed
3. Power
Critical path
Schedule
Methods
b. ASIP Design with LISA
c. FPGA Design
RTL
RTL Simulation
Logic Synthesis
LSI Tool
FPGA Tool
Synthesis
Netlist
Gate Assignment
LE Place and Route
Configuration Data
Functional Verification
END
Systolic Algorithm
Alternative Method

Systolic Array
• Low parallel efficiency
• Inefficient data memory
• I/O bottleneck
• PE number uniquely decided
• Restricted to local communication between PEs
Memory Sharing Processor Array (MSPA)
• High parallel efficiency
• Minimization of data memory
• Restriction to I/O ports
• Restriction to PE number
• Not restricted to local communication
MSPA Theory
[Figure: the operations A(i,j), i, j = 0 to 5, and the inputs X(0) to X(5) placed on PE0, PE1 and PE2 along the time axis]
Find appropriate coordinates in time and space that do not violate the precedence relations between operations.
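A candidate assignment of time and space coordinates can be checked mechanically. The sketch below is an illustration only and does not reproduce the mapping shown on the slide: it assumes the operations are the products A(i,j)*X(j) of an n x n matrix-vector product, and that the partial sum of row i is accumulated in order of j, so that (i,j) must run after (i,j-1). The names `Slot` and `mapping_is_valid` are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <utility>

// A space-time coordinate: which PE executes an operation, and when.
struct Slot {
    int pe;
    int time;
};

// Returns true if the mapping places at most one operation on each
// (PE, time) pair and respects the assumed precedence (i,j-1) -> (i,j).
bool mapping_is_valid(int n, const std::function<Slot(int, int)>& map) {
    std::set<std::pair<int, int>> used;  // occupied (pe, time) pairs
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            const Slot s = map(i, j);
            if (!used.insert({s.pe, s.time}).second) return false;    // PE already busy
            if (j > 0 && map(i, j - 1).time >= s.time) return false;  // order violated
        }
    }
    return true;
}
```

With n = 6 and three PEs, the mapping pe = i % 3, time = (i / 3) * 6 + j passes the check, while pe = i % 3, time = j fails because rows 0 and 3 collide on the same PE at the same time.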
Comparisons: Matrix Product Case

        Systolic                                 MSPA
Size    #PE      time    Parallel Efficiency     #PE      time    Parallel Efficiency
4       19       10      34%                     34       2       94%
10      109      28      33%                     208      5       96%
50      1,717    49      33%                     8,425    18      98%
201     15,001   1,401   39%                     80,801   101     100%
Window-MSPA I
• MSPA for Window Operation
• Minimize I/O ports and decide #PE
• Fast Parallel Operation
• Flexible #PE and processing time
• Applicable to various image processing tasks, such as motion vector search
[Figure: Image, Window, WINDOW-MSPA, Output]
Window-MSPA II
WINDOW-MSPA ARCHITECTURE
[Figure: External Image Data Memory, Input Network, rows of WPEs each served by an internal data parallel distribution network, Output Network, External Output Data Memory]
Window-MSPA Theory 3
(1996-1998)
[Figure: 7x7 image A(0,0) to A(6,6) and a 3x5 PE array, PE00 to PE24, annotated with the circled numbers ① to ⑩]
① PE00 receives and processes its data in 9 clocks (row 1, port 1, ---)
② PE01 receives and processes its data in 9 clocks (1 clock later than ①)
③ PE02 receives and processes its data in 9 clocks (1 clock later than ②)
④ PE03 receives and processes its data in 9 clocks (1 clock later than ③)
⑤ PE04 receives and processes its data in 9 clocks (1 clock later than ④)
⑥ PE11 receives and processes its data in 9 clocks (3 clocks later than ①)
Operation of Window-MSPA
[Timing chart: for clocks 0 to 24, the pixel coordinates (row, column) on Global/IO_0 to Global/IO_2 and on Local/IO.00 to Local/IO.22, with rows for PE(0,0) to PE(2,2)]
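Window-MSPA Theory 1
(1996-1998)
Image size is 22x22 in all cases. The number of input ports is given in parentheses.

Window   Systolic array                                               Window-MSPA
size     Processing time   Operators (ports)   Parallel efficiency   Processing time   Operators (ports)   Parallel efficiency
3x3      809               9 (2)               49%                   66                63 (9)              87%
8x8      514               64 (7)              44%                   176               120 (3)             68%
16x16    354               256 (15)            14%                   352               49 (2)              73%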
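The parallel efficiency figures in this table can be reproduced with a short calculation. The definition used here is an assumption, since the slides do not state it: useful operations (number of window positions times window area) divided by processing time times number of operators. The function name `window_parallel_efficiency` is hypothetical.

```cpp
#include <cassert>

// Parallel efficiency of a window operation on a square image, under the
// assumed definition
//   (window positions * window area) / (processing time * number of operators).
double window_parallel_efficiency(int image_size, int window_size,
                                  long time, long num_operators) {
    assert(image_size >= window_size && window_size > 0);
    assert(time > 0 && num_operators > 0);
    const long positions_per_axis = image_size - window_size + 1;
    const long useful_ops = positions_per_axis * positions_per_axis
                          * window_size * window_size;
    return static_cast<double>(useful_ops)
         / (static_cast<double>(time) * static_cast<double>(num_operators));
}
```

For the 3x3 window on the 22x22 image there are 20 x 20 window positions of 9 operations each, that is 3600 operations. The systolic array gives 3600 / (809 x 9), about 0.49, and Window-MSPA gives 3600 / (66 x 63), about 0.87, in agreement with the table.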
Example 1. Line Direction
Problem Description
[Figure: direction types 0 to 3 and the block of interest, panels (a) and (b)]
[Problem]
Find the 3x3 direction type that occurs most often among the line patterns in the target image.
[Software Solution]
Count the occurrences of each direction type among the line patterns in the target image.
Hardware Solution
1. Describe each direction pattern by a 9-bit signal, such as (000111111) for the first direction type.
2. Use each direction pattern as the 9-bit address of a ROM, whose data word is the increment signal for the corresponding accumulation registers.
[Figure: the 9-bit line pattern addresses the ROM (address 9 bits, data 24 bits); the ROM data are increment signals for the registers of Direction Type 0 to Direction Type 23; the largest number in the registers is selected as the direction of the line pattern]
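The hardware solution can be modelled in software as a table look-up followed by 24 counters. This is a sketch under assumptions: the ROM contents must be supplied by the caller, because the slides give only the pattern (000111111) for the first direction type, and the names `Rom` and `dominant_direction` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kNumTypes = 24;      // direction types 0..23, one ROM data bit each
constexpr int kRomWords = 1 << 9;  // 9-bit address: one word per 3x3 binary pattern

// ROM word: bit t is the increment signal for the register of direction type t.
using Rom = std::array<std::uint32_t, kRomWords>;

// Pass every 3x3 block pattern through the ROM, accumulate the increment
// signals, and return the direction type with the largest count
// (the lowest type number wins a tie).
int dominant_direction(const Rom& rom, const std::vector<std::uint16_t>& patterns) {
    std::array<std::uint32_t, kNumTypes> count{};
    for (std::uint16_t p : patterns) {
        const std::uint32_t inc = rom[p & 0x1FF];  // keep only the 9 address bits
        for (int t = 0; t < kNumTypes; ++t) count[t] += (inc >> t) & 1u;
    }
    int best = 0;
    for (int t = 1; t < kNumTypes; ++t)
        if (count[t] > count[best]) best = t;
    return best;
}
```

In hardware all 24 registers are incremented in parallel from a single ROM read, so each 3x3 block costs one cycle; the inner loop over `t` only exists because software is sequential.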