VLSI Programming: Lecture 2 Course 2IN35 Course: Lab: Kees van Berkel [email protected] Rudolf Mak [email protected] Kees van Berkel Rudolf Mak Alok Lele, Hrishikesh Salunkhe www: http://www.win.tue.nl/~cberkel/2IN35/ Lecture 2 pipelining, retiming, J-slow, parallel 1 4/23/2014 VLSI Programming: time table 2014 2014 in Tuesday: h5-h6 introduction, 22-Apr DSP representations, bounds unfolding (cntd), look-ahead, 29-Apr T1 + T2 strength reduction 6-May T3 + T4 folding 13-May DSP processors 20-May systolic computation FPGA lab/L3: 27-May T5 out Thursday: h1-h4 out pipelining, retiming, transposition, J-slow, unfolding T1 + T2 (have FPGA tools installed) T3 + T4 FPGA + Verilog intros L1 FPGA lab/L1: audio filter simulation L2 FPGA lab/L2: audio filter on XUP board L3 FPGA lab/L3: sequential FIR, strength-reduced T5 FIR L4 FPGA lab/L4: 10-Jun L4 FPGA lab/L4: audio sample rate convertor FPGA lab/L5: audio sample rate convertor "1024x" 17-Jun FPGA lab/L5: deadline report L5 3-Jun 2 L3 4/23/2014 L5 FPGA IC on a Xilinx XUP Board (Atlys) FPGA Xilinx Spartan 6 3 4/23/2014 Atlys board, based on Xilinx Spartan 6 FPGA Xilinx Spartan 6 4 4/23/2014 Preparation for Lab work • Prepare your notebook for lab work • See preparation link on 2IN35 web-site • Install the required tools and test them 2 weeks from now (May 7): Hrishikesh and Alok will be around for Q&A • First Lab exercises: Thursday May 1 • Find a partner (team size is max 2) 5 4/23/2014 Note on course literature Lectures VLSI programming are loosely based on: • • Keshab K. Parhi. VLSI Digital Signal Processing Systems, Design and Implementation. Wiley Inter-Science 1999. This book is recommended, but not mandatory Accompanying slides can be found on: • • http://www.ece.umn.edu/users/parhi/slides.html http://www.win.tue.nl/~cberkel/2IN35/ Mandatory reading: 6 • Edward A. Lee and David G. Messerschmitt. Synchronous Data Flow. Proc. of the IEEE, Vol. 75, No. 9, Sept 1987, pp 1235-1245. • Keshab K. Parhi. High-Level Algorithm and Architecture Transformations for DSP Synthesis. Journal of VLSI Signal Processing, 9, 121-143 (1995), Kluwer Academic Publishers. 4/23/2014 Outline Lecture 2 Transformations of DFGs and SFGs: • (commuting of an SFG) lecture 1 • pipelining of a DFG Parhi3.pdf • transposition of an SFG Parhi3.pdf • retiming of a DFG Parhi4.pdf • K-slow transformation of a DFG Parhi4.pdf • unfolding of a DFG Parhi5.pdf • assignments 7 Parhi3.pdf 4/23/2014 8 4/23/2014 • car assembly line; Henry Ford [1908] • 1914: Ts = 3min; latency = 93 min 9 4/23/2014 10 4/23/2014 11 4/23/2014 12 4/23/2014 13 4/23/2014 every by ≥ 0 14 4/23/2014 15 4/23/2014 16 4/23/2014 17 4/23/2014 18 4/23/2014 19 4/23/2014 20 4/23/2014 Retiming and pipelining • • • • • 21 Review slides Parhi3.pdf Parhi follows a graph-theoretic approach to compute optimal pipelining/retiming For our purposes “moving delays around” is sufficient: Node retiming (Parhi4.pdf, slide 2) Introduction of a delay at all inputs (or all outputs) 4/23/2014 Parhi ’95, Fig 3a 1 2 2 1 2 Critical path is 10 time units long (transposed version: 8 time units) 22 4/23/2014 1 1 Parhi ’95, Fig 3a / retiming step 1 Critical path is 10 time units long 23 4/23/2014 Parhi ’95, Fig 3a / retiming step 2 Critical path is 10 time units long 24 4/23/2014 Parhi ’95, Fig 3a / retiming step 3 Critical path is 7 time units long 25 4/23/2014 Parhi ’95, Fig 3a / retiming step 4 Critical path is 7 time units long 26 4/23/2014 Parhi ’95, Fig 3a / retiming step 5 Critical path is 4 time units long 27 4/23/2014 Parhi ’95, Fig 3a / retiming step 6 Critical path is 4 time units long 28 4/23/2014 Parhi ’95, Fig 3a / retiming step 7 3 Critical path is 3 time units long 29 4/23/2014 3 Parhi ’95, Fig 3a / retiming step 8 4 Critical path is 3 time units long 30 4/23/2014 3 Parhi ’95, Fig 3a / retiming step 9 4 Critical path is 2 time units long 31 4/23/2014 3 Parhi ’95, Fig 3a after retiming = Fig 3b Critical path is 2 time units long 32 4/23/2014 33 4/23/2014 34 4/23/2014 35 4/23/2014 36 4/23/2014 37 4/23/2014 38 4/23/2014 39 4/23/2014 x(2(k-1)) x(10(k-1)) 40 4/23/2014 Unfolding, L=2 •Parhi’s paper, Fig 1/2, paper p123/124 •y(n) = ax(n) + bx(n-1) + cx(n-2) •y(2k) = ax(2k) + bx(2k-1) + cx(2k-2) •y(2k+1) = ax(2k+1) + bx(2k) + cx(2k-1) •Rewrite all indices in equations to the form •(L(k - i) + j), with 0 ≤ j < L •y(2k) = ax(2k) + bx(2(k-1)+1) + cx(2(k-1)) •y(2k+1) = ax(2k+1) + bx(2k) + cx(2(k-1)+1) 41 4/23/2014 = Fig 2 Unfolding, L=3 •Same FIR •y(3k) = ax(3k ) •y(3k+1) = ax(3k+1) •y(3k+2) = ax(3k+2) + bx(3k-1) + cx(3k-2) + bx(3k ) + cx(3k-1) + bx(3k+1) + cx(3k ) •Rewrite all indices in equations to the form •(L(k - i) + j), with 0 ≤ j < L •y(3k) = ax(3k ) •y(3k+1) = ax(3k+1) •y(3k+2) = ax(3k+2) 42 + bx(3(k-1)+2)+ cx(3(k-1)+1) 4/23/2014 + bx(3k ) + cx(3(k-1)+2) + bx(3k+1) + cx(3k ) 43 4/23/2014 44 4/23/2014 45 4/23/2014 46 4/23/2014 47 4/23/2014 48 4/23/2014 Parhi 5, slide 2 •Original program: y(n) = a x(n) + b y(n-2) •2-unfolded version y(2k) = a x(2k) + b y(2k-2) y(2k+1) = a x(2k+1) + b y(2k-1) • •Rewrite all indices in equations to the form •(L(k - i) + j), with 0 ≤ j < L •2-unfolded version y(2k) = a x(2k) + b y(2(k-1)) y(2k+1) = a x(2k+1) + b y(2(k-1)+1) 49 4/23/2014 Parhi 5, slide 3 (Fig 5.3, pp 123) •Original program: v(n) •4-unfolded version v(4k) • v(4k+1) • v(4k+2) • v(4k+3) •4-unfolded, • • • 50 = u(n-37) = u(4k-37) = u(4k-36) = u(4k-35) = u(4k-34) v(4k) = u(4(k-10) +3) v(4k+1) = u(4(k-9)) v(4k+2) = u(4(k-9)+1) v(4k+3) = u(4(k-9)+2) 4/23/2014 51 4/23/2014 Parhi5, slide 4 (Fig 5.4, pp 123) •v(n) •v(3k) •v(3k+1) •v(3k+2) •v(3k) •v(3k+1) •v(3k+2) = u(n-1) + t(n-6) + v(n-12) = u(3k-1) + t(3k-6) + v(3k-12) = u(3k) + t(3k-5) + v(3k-11) = u(3k+1) + t(3k-4) + v(3k-10) = u(3(k-1)+2) + t(3(k-2)) = u(3k) + t(3(k-2)+1) + v(3(k-4)+1) = u(3k+1) + t(3(k-2)+2) + v(3(k-4)+2) •= Fig 5.4b 52 + v(3(k-4)) 4/23/2014 53 4/23/2014 Parhi5, slide 6 (Fig 5.6, pp 129) •u(n) = p(n) + (s*u(n-3) + t*u(n-2)) •u(2k) = p(2k) + (s*u(2k-3) + t*u(2k-2)) •u(2k+1) = p(2k+1) + (s*u(2k-2) + t*u(2k-1)) •u(2k) = p(2k) + (s*u(2(k-2)+1) + t*u(2(k-1)) •u(2k+1) = p(2k+1) + (s*u(2(k-1)) + t*u(2(k-1)+1) 54 4/23/2014 55 4/23/2014 56 4/23/2014 FIR assignment • • Consider FIR: y(n) = a*x(n) + b*x(n-1) + c*x(n-5) Assume add and multiply times: 2 and 5 nsec resp. 1. Draw DFG of FIR, calculate throughput. 2. Pipeline and retime FIR for maximal throughput. 3. Unfold FIR J=2; draw the unfolded DFG. Throughput? 4. pipeline and retime unfolded FIR; draw DFG. Throughput? 5. Same for J=3 (draw DFG), and J=16 (no need to draw DFGs). Throughput? • 57 Return deadline: Tuesday April 29, 13:45 4/23/2014 IIR assignment • • Consider IIR: y(n) = x(n) + a*y(n-3) Assume add and multiply time: 2 and 5 nsec resp. 1. Draw DFG of IIR, calculate throughput. 2. Pipeline and retime IIR for maximal throughput. 3. Unfold IIR J=2; draw the unfolded DFG. Throughput? 4. pipeline and retime unfolded IIR; draw DFG. Throughput? 5. Same for J=3 (draw DFG), and J=16 (no need to draw DFGs). Throughput? • 58 Return deadline: Tuesday April 29, 13:45 4/23/2014 VLSI Programming: April 29 • • • 59 Parhi, More unfolding, parallelism Strength reduction 4/23/2014 THANK YOU
© Copyright 2024 ExpyDoc