VLSI Programming: Lecture 2

VLSI Programming: Lecture 2
Course 2IN35
Course:
Lab:
Kees van Berkel
[email protected]
Rudolf Mak
[email protected]
Kees van Berkel
Rudolf Mak
Alok Lele, Hrishikesh Salunkhe
www:
http://www.win.tue.nl/~cberkel/2IN35/
Lecture 2
pipelining, retiming, J-slow, parallel
1
4/23/2014
VLSI Programming: time table 2014
2014
in
Tuesday: h5-h6
introduction,
22-Apr
DSP representations, bounds
unfolding (cntd), look-ahead,
29-Apr T1 + T2 strength reduction
6-May
T3 + T4 folding
13-May
DSP processors
20-May
systolic computation
FPGA lab/L3:
27-May T5
out
Thursday: h1-h4
out
pipelining, retiming, transposition,
J-slow, unfolding
T1 + T2
(have FPGA tools installed)
T3 + T4 FPGA + Verilog intros
L1
FPGA lab/L1:
audio filter simulation
L2
FPGA lab/L2:
audio filter on XUP board
L3
FPGA lab/L3:
sequential FIR, strength-reduced
T5
FIR
L4
FPGA lab/L4:
10-Jun L4
FPGA lab/L4:
audio sample rate convertor
FPGA lab/L5:
audio sample rate convertor
"1024x"
17-Jun
FPGA lab/L5:
deadline report L5
3-Jun
2
L3
4/23/2014
L5
FPGA IC on a Xilinx XUP Board (Atlys)
FPGA
Xilinx
Spartan 6
3
4/23/2014
Atlys board, based on Xilinx Spartan 6
FPGA
Xilinx
Spartan 6
4
4/23/2014
Preparation for Lab work
•
Prepare your notebook for lab work
•
See preparation link on 2IN35 web-site
•
Install the required tools and test them
2 weeks from now (May 7):
Hrishikesh and Alok will be around for Q&A
•
First Lab exercises: Thursday May 1
•
Find a partner (team size is max 2)
5
4/23/2014
Note on course literature
Lectures VLSI programming are loosely based on:
•
•
Keshab K. Parhi. VLSI Digital Signal Processing Systems, Design and
Implementation. Wiley Inter-Science 1999.
This book is recommended, but not mandatory
Accompanying slides can be found on:
•
•
http://www.ece.umn.edu/users/parhi/slides.html
http://www.win.tue.nl/~cberkel/2IN35/
Mandatory reading:
6
•
Edward A. Lee and David G. Messerschmitt. Synchronous Data
Flow. Proc. of the IEEE, Vol. 75, No. 9, Sept 1987, pp 1235-1245.
•
Keshab K. Parhi. High-Level Algorithm and Architecture
Transformations for DSP Synthesis. Journal of VLSI Signal
Processing, 9, 121-143 (1995), Kluwer Academic Publishers.
4/23/2014
Outline Lecture 2
Transformations of DFGs and SFGs:
•
(commuting of an SFG)
lecture 1
•
pipelining of a DFG
Parhi3.pdf
•
transposition of an SFG
Parhi3.pdf
•
retiming of a DFG
Parhi4.pdf
•
K-slow transformation of a DFG
Parhi4.pdf
•
unfolding of a DFG
Parhi5.pdf
•
assignments
7
Parhi3.pdf
4/23/2014
8
4/23/2014
• car assembly line; Henry Ford [1908]
• 1914: Ts = 3min; latency = 93 min
9
4/23/2014
10
4/23/2014
11
4/23/2014
12
4/23/2014
13
4/23/2014
every
by ≥ 0
14
4/23/2014
15
4/23/2014
16
4/23/2014
17
4/23/2014
18
4/23/2014
19
4/23/2014
20
4/23/2014
Retiming and pipelining
•
•
•
•
•
21
Review slides Parhi3.pdf
Parhi follows a graph-theoretic approach to compute
optimal pipelining/retiming
For our purposes “moving delays around” is sufficient:
Node retiming (Parhi4.pdf, slide 2)
Introduction of a delay at all inputs (or all outputs)
4/23/2014
Parhi ’95, Fig 3a
1
2
2
1
2
Critical path is 10 time units long
(transposed version: 8 time units)
22
4/23/2014
1
1
Parhi ’95, Fig 3a / retiming step 1
Critical path is 10 time units long
23
4/23/2014
Parhi ’95, Fig 3a / retiming step 2
Critical path is 10 time units long
24
4/23/2014
Parhi ’95, Fig 3a / retiming step 3
Critical path is 7 time units long
25
4/23/2014
Parhi ’95, Fig 3a / retiming step 4
Critical path is 7 time units long
26
4/23/2014
Parhi ’95, Fig 3a / retiming step 5
Critical path is 4 time units long
27
4/23/2014
Parhi ’95, Fig 3a / retiming step 6
Critical path is 4 time units long
28
4/23/2014
Parhi ’95, Fig 3a / retiming step 7
3
Critical path is 3 time units long
29
4/23/2014
3
Parhi ’95, Fig 3a / retiming step 8
4
Critical path is 3 time units long
30
4/23/2014
3
Parhi ’95, Fig 3a / retiming step 9
4
Critical path is 2 time units long
31
4/23/2014
3
Parhi ’95, Fig 3a after retiming = Fig 3b
Critical path is 2 time units long
32
4/23/2014
33
4/23/2014
34
4/23/2014
35
4/23/2014
36
4/23/2014
37
4/23/2014
38
4/23/2014
39
4/23/2014
x(2(k-1))
x(10(k-1))
40
4/23/2014
Unfolding, L=2
•Parhi’s paper, Fig 1/2, paper p123/124
•y(n) = ax(n) + bx(n-1) + cx(n-2)
•y(2k) = ax(2k) + bx(2k-1) + cx(2k-2)
•y(2k+1) = ax(2k+1) + bx(2k) + cx(2k-1)
•Rewrite all indices in equations to the form
•(L(k - i) + j), with 0 ≤ j < L
•y(2k) = ax(2k) + bx(2(k-1)+1) + cx(2(k-1))
•y(2k+1) = ax(2k+1) + bx(2k) + cx(2(k-1)+1)
41
4/23/2014
= Fig 2
Unfolding, L=3
•Same FIR
•y(3k) = ax(3k )
•y(3k+1) = ax(3k+1)
•y(3k+2) = ax(3k+2)
+ bx(3k-1)
+ cx(3k-2)
+ bx(3k )
+ cx(3k-1)
+ bx(3k+1)
+ cx(3k )
•Rewrite all indices in equations to the form
•(L(k - i) + j), with 0 ≤ j < L
•y(3k) = ax(3k )
•y(3k+1) = ax(3k+1)
•y(3k+2) = ax(3k+2)
42
+ bx(3(k-1)+2)+ cx(3(k-1)+1)
4/23/2014
+ bx(3k )
+ cx(3(k-1)+2)
+ bx(3k+1)
+ cx(3k
)
43
4/23/2014
44
4/23/2014
45
4/23/2014
46
4/23/2014
47
4/23/2014
48
4/23/2014
Parhi 5, slide 2
•Original program:
y(n)
= a x(n) + b y(n-2)
•2-unfolded version y(2k)
= a x(2k) + b y(2k-2)
y(2k+1) = a x(2k+1) + b y(2k-1)
•
•Rewrite all indices in equations to the form
•(L(k - i) + j), with 0 ≤ j < L
•2-unfolded version y(2k)
= a x(2k) + b y(2(k-1))
y(2k+1) = a x(2k+1) + b y(2(k-1)+1)
49
4/23/2014
Parhi 5, slide 3 (Fig 5.3, pp 123)
•Original program:
v(n)
•4-unfolded version v(4k)
•
v(4k+1)
•
v(4k+2)
•
v(4k+3)
•4-unfolded,
•
•
•
50
= u(n-37)
= u(4k-37)
= u(4k-36)
= u(4k-35)
= u(4k-34)
v(4k)
= u(4(k-10) +3)
v(4k+1)
= u(4(k-9))
v(4k+2)
= u(4(k-9)+1)
v(4k+3)
= u(4(k-9)+2)
4/23/2014
51
4/23/2014
Parhi5, slide 4 (Fig 5.4, pp 123)
•v(n)
•v(3k)
•v(3k+1)
•v(3k+2)
•v(3k)
•v(3k+1)
•v(3k+2)
= u(n-1)
+ t(n-6)
+ v(n-12)
= u(3k-1)
+ t(3k-6)
+ v(3k-12)
= u(3k)
+ t(3k-5)
+ v(3k-11)
= u(3k+1)
+ t(3k-4)
+ v(3k-10)
= u(3(k-1)+2) + t(3(k-2))
= u(3k)
+ t(3(k-2)+1) + v(3(k-4)+1)
= u(3k+1)
+ t(3(k-2)+2) + v(3(k-4)+2)
•= Fig 5.4b
52
+ v(3(k-4))
4/23/2014
53
4/23/2014
Parhi5, slide 6 (Fig 5.6, pp 129)
•u(n)
= p(n)
+ (s*u(n-3) + t*u(n-2))
•u(2k) = p(2k) + (s*u(2k-3) + t*u(2k-2))
•u(2k+1) = p(2k+1) + (s*u(2k-2) + t*u(2k-1))
•u(2k) = p(2k) + (s*u(2(k-2)+1) + t*u(2(k-1))
•u(2k+1) = p(2k+1) + (s*u(2(k-1)) + t*u(2(k-1)+1)
54
4/23/2014
55
4/23/2014
56
4/23/2014
FIR assignment
•
•
Consider FIR: y(n) = a*x(n) + b*x(n-1) + c*x(n-5)
Assume add and multiply times: 2 and 5 nsec resp.
1. Draw DFG of FIR, calculate throughput.
2. Pipeline and retime FIR for maximal throughput.
3. Unfold FIR J=2; draw the unfolded DFG. Throughput?
4. pipeline and retime unfolded FIR; draw DFG. Throughput?
5. Same for J=3 (draw DFG), and J=16 (no need to draw DFGs).
Throughput?
•
57
Return deadline: Tuesday April 29, 13:45
4/23/2014
IIR assignment
•
•
Consider IIR: y(n) = x(n) + a*y(n-3)
Assume add and multiply time: 2 and 5 nsec resp.
1. Draw DFG of IIR, calculate throughput.
2. Pipeline and retime IIR for maximal throughput.
3. Unfold IIR J=2; draw the unfolded DFG. Throughput?
4. pipeline and retime unfolded IIR; draw DFG. Throughput?
5. Same for J=3 (draw DFG), and J=16 (no need to draw DFGs).
Throughput?
•
58
Return deadline: Tuesday April 29, 13:45
4/23/2014
VLSI Programming: April 29
•
•
•
59
Parhi,
More unfolding, parallelism
Strength reduction
4/23/2014
THANK YOU