Fundamental Algorithms for System Modeling, Analysis, and Optimization Edward A. Lee, Jaijeet Roychowdhury, Sanjit A. Seshia, Stavros Tripakis UC Berkeley EECS 144/244 Spring 2013 Copyright © 2010-13, E. A. Lee, J. Roychowdhury, S. A. Seshia, S. Tripakis. All rights reserved Lecture: Timing Analysis, Retiming Thanks to Kurt Keutzer for several slides RTL Synthesis Flow FSM, Verilog, VHDL HDL Simulation/ Verification HDL RTL Synthesis netlist Library/ module generators a 0 b 1 d q s Boolean circuit/network clk logic optimization netlist a 0 b 1 s physical design d q Boolean circuit/network clk Timing Analysis Graph / Rectangles layout K. Keutzer EECS 144/244, UC Berkeley: 2 1 Timing Analysis / Verification Verifying a property about system timing Arises in many settings: Integrated circuits Embedded software Distributed embedded systems Biological systems … Here we focus on circuits. Chapter 15 of Lee-Seshia book contains a more broad discussion. EECS 144/244, UC Berkeley: 3 Timing Analysis for Digital ICs (Clock) Speed is one of the major performance metrics for digital circuits Timing Analysis = the process of verifying that a chip meets its speed requirement E.g., 1 GHz means that next-state function must be computed within 1 ns EECS 144/244, UC Berkeley: 4 2 Static Timing Analysis for Circuits Combinational logic Combinational logic Combinational logic clk clk clk Determine fastest permissible clock speed (e.g. 1 GHz) by determining delay of longest path from register to register (e.g. 1ns.) EECS 144/244, UC Berkeley: 5 Cycle Time - Critical Path Delay Cycle time (T) cannot be smaller than longest path delay (Tmax) Longest (critical) path delay is a function of: Total gate, wire delays cycle time data Tclock1 Q2 Q1 Tclock1 Tmax T critical path, ~5 logic levels Tclock2 clock EECS 144/244, UC Berkeley: 6 3 Cycle Time - Setup Time For FFs to correctly latch data, input data must be stable during: Setup time (Tsetup) before clock arrives setup time data Tclock1 Tmax Tsetup T Q2 Q1 critical path, ~5 logic levels Tclock1 Tclock2 clock EECS 144/244, UC Berkeley: 7 Cycle Time - Clock-skew If clock network has unbalanced delay – clock skew Cycle time is also a function of clock skew (Tskew) data Tclock2 Tclock1 Q2 Tmax Tsetup Tskew T Q2 Q1 Tclock1 clock skew critical path, ~5 logic levels Tclock2 clock EECS 144/244, UC Berkeley: 8 4 Cycle Time - Clock to Q Cycle time is also a function of propagation delay of FF (Tclk-to-Q) Tclk-to-Q : time from arrival of clock signal till change at FF output) data Tclock1 Tclock2 Q2 clock-to-Q Tmax Tsetup Tskew Tclk to Q T Q2 Q1 critical path, ~5 logic levels Tclock1 Tclock2 clock EECS 144/244, UC Berkeley: 9 Min Path Delay - Hold Time For FFs to correctly latch data, input data must be stable during Hold time (Thold) after clock arrives Determined by delay of shortest path in circuit (Tmin) and clock skew (Tskew) hold time data Tclock1 Q2 Q1 Tclock1 Tmin Thold Tskew short path, ~3 logic levels Tclock2 clock EECS 144/244, UC Berkeley: 10 5 Summary Need to be able to compute Tmin: delay of “fastest” path in the circuit Tmax: delay of “slowest” (critical) path in the circuit EECS 144/244, UC Berkeley: 11 Modeling Timing in a Combinational Circuit Arrival time in green 0 Interconnect delay in red Gate delay in blue 0 .20 A W .10 .05 X 2 1 Z .05 C 2 Y f .15 .20 2 B 1 .20 1 What’s the right mathematical object to use to represent this physical object? EECS 144/244, UC Berkeley: 13 6 Modeling - 1 Use a labeled directed graph G = <V,E> Vertices represent gates, primary inputs and primary outputs Edges represent wires Labels represent delays s What about arrival times? .05 X 0 2 1 .10 Z .05 C 0 .20 A W 2 B 1 .20 1 A X 0.05 0 0 f .15 .20 C 2 Y 2 1 .1 W Y 1 2 .15 f .2 1 .20 .05 2 Z .20 B EECS 144/244, UC Berkeley: 14 Modeling - 2 A Find longest/shortest paths in a directed graph G = <V,E> 0 0 s What sort of directed graph do we have? X 0 C .1 2 1 W 1 2 .15 f .2 1 .20 .05 Y 2 .20 Z B Is this in the standard form for a longest/shortest path problem? EECS 144/244, UC Berkeley: 15 7 Split Nodes into Edges A X 0 0 0 s C 2 1 .1 W Y 1 2 .15 f .2 1 .20 .05 2 Z .20 B 2 2 2 3 1 2.5 0.5 EECS 144/244, UC Berkeley: 16 DAG with Weighted Edges A X 0 0 2.2 0 C 0.1 W s 2.15 1.05 Z 0.2 1 1 f 2.2 Y B Problem: Find the longest (critical) path from source s to sink f. EECS 144/244, UC Berkeley: 17 8 Naïve Approach: Enumerate Paths How many paths in this example? In the worst case? A X 0 0 2.2 0 C .1 W s 2.15 1.05 f Z .2 2.2 1 1 Y B Problem: Find the longest path from source s to sink f. EECS 144/244, UC Berkeley: 18 Algorithm 1: Longest path in a DAG Critical Path Method [Kirkpatrick 1966, IBM JRD] (dynamic programming) Let w(u,v) denote weight of edge from u to v Steps: 1. Topologically sort vertices (graph is acyclic) order: v1, v2, …, vn v1 = s, vn = ? 2. For each vertex v, compute d(v) = length of longest path from source s to v d(v1) = 0 For i = 2..n d(vi) = maxall incoming edges (u, vi) d(u) + w(u,vi) EECS 144/244, UC Berkeley: 19 9 Algorithm 1: Longest path in a DAG Critical Path Method [Kirkpatrick 1966, IBM JRD] Let w(u,v) denote weight of edge from u to v Steps: Time Complexity? 1. Topologically sort vertices O(m+n) order: v1, v2, …, vn v1 = s, vn = f 2. For each vertex v, compute d(v) = length of longest path from source s to v d(v1) = 0 For i = 2..n d(vi) = maxall incoming edges (u, vi) d(u) + w(u,vi) Run the CPM on our example EECS 144/244, UC Berkeley: 20 Graph vs. Circuit Delay of a combinational circuit depends on Circuit topology (graph model) Delay model (e.g., fixed vs. variable) Boolean behavior of gates We have only considered circuit topology so far. Longest/shortest path found on the graph can be very pessimistic: Paths can be FALSE Delay values are BOUNDS EECS 144/244, UC Berkeley: 21 10 False Paths (consider Transition Mode) A path is false if it cannot be responsible for the delay of a circuit Graph model implies path of length 6 EECS 144/244, UC Berkeley: 22 False Paths A path is false if it cannot be responsible for the delay of a circuit Graph model implies path of length 6 EECS 144/244, UC Berkeley: 23 11 False vs. True Paths TRUE path = one that can be responsible for the delay of a circuit Need techniques to find whether a path is TRUE or FALSE EECS 144/244, UC Berkeley: 24 The Fixed Delay Model: Constant delay for each gate (or wire) EECS 144/244, UC Berkeley: 25 12 Paradoxical Behavior with the Transition Model? non-monotonic w.r.t. time delays! EECS 144/244, UC Berkeley: 26 Problems with Fixed Delay + Transition Model 1. 2. Transition model can be tricky to reason about Fixed gate delays are unrealistic, due to manufacturing process variations More realistic delay model: Lower and upper bounds Perform timing analysis for a whole family of circuits that share the same lower/upper bounds EECS 144/244, UC Berkeley: 27 13 Fixed Delays Bounded Delays Want algorithms that report the critical path delay of the slowest circuit in the circuit family Paper: Computation of Floating Mode Delay in Combinational Circuits: Theory and Algorithms, Devadas, Keutzer, Malik, 1993 (on bspace) EECS 144/244, UC Berkeley: 28 RETIMING EECS 144/244, UC Berkeley: 29 14 Retiming Tradeoffs a b 1 1 2 2 1 2 clock period = 6 # registers = 4 [Shenoy, 1997] EECS 144/244, UC Berkeley: 30 Retiming Tradeoffs a b 1 1 2 2 1 2 clock period = # registers = 5 4 [Shenoy, 1997] EECS 144/244, UC Berkeley: 31 15 Retiming Tradeoffs a b 1 1 2 2 1 2 clock period = # registers = 4 3 [Shenoy, 1997] EECS 144/244, UC Berkeley: 32 Retiming Tradeoffs a b 1 1 2 2 1 2 clock period = # registers = 2 4 [Shenoy, 1997] EECS 144/244, UC Berkeley: 33 16 Goals of Retiming Possible goals: • Minimize clock period (min-period retiming) • Minimize number of registers (min-area retiming) • Minimize number of registers for a target clock period (constrained min-area retiming) [Shenoy, 1997] EECS 144/244, UC Berkeley: 34 Abstraction: Circuit Graph Nodes: circuit elements Node weights: delay Dummy node for fanout EECS 144/244, UC Berkeley: 35 17 Abstraction: Registers? EECS 144/244, UC Berkeley: 36 Abstraction - Registers 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 Arc weights indicate registers EECS 144/244, UC Berkeley: 37 18 Abstraction - Environment 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 EECS 144/244, UC Berkeley: 38 Cutset – Divides the Graph in Two 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 1 EECS 144/244, UC Berkeley: 39 19 Retiming: Add a register on all arcs crossing the cutset in one direction, and subtract a register from all arcs crossing the cutset in the other direction. 0 0+1=1 0 0 0 0 0 0 0 0 0 1 0 0+1=1 1-1=0 0 1 1-1=0 EECS 144/244, UC Berkeley: 40 Recall: Retiming Tradeoffs a b 1 1 2 2 1 2 clock period = 6 # registers = 4 [Shenoy, 1997] EECS 144/244, UC Berkeley: 41 20 Recall: Retiming Tradeoffs a 1 b 1 2 2 1 2 clock period = # registers = 5 4 [Shenoy, 1997] EECS 144/244, UC Berkeley: 42 Simplest cutset surrounds one node 0 0 1-1=0 0+1=1 0 0 0 0 0 0 0 0 1-1=0 0 0 1 1 0 EECS 144/244, UC Berkeley: 43 21 Recall: Retiming Tradeoffs a b 1 1 2 2 1 2 clock period = # registers = 5 4 [Shenoy, 1997] EECS 144/244, UC Berkeley: 44 Recall: Retiming Tradeoffs a b 1 1 2 2 1 2 clock period = # registers = 4 3 [Shenoy, 1997] EECS 144/244, UC Berkeley: 45 22 For each node v, define r (v) = # of registers moved from the outputs to the inputs. A retiming is now an assignment r (v) for every node v such that the weight of every arc is non-negative. r (v) = -1 0 0 1-1=0 0+1=1 0 v 0 0 0 0 0 0 0 1-1=0 0 0 1 1 0 EECS 144/244, UC Berkeley: 46 Rest of the story Retiming operations = graph transformations Each graph has a vector of costs: (clock period, # registers, …) Formulate and solve optimization problem. Papers (on bspace): 1. Leiserson, C. E. and J. B. Saxe (1983). "Optimizing synchronous systems." Journal of VLSI and Computer Systems: pp. 41-67. 2. Leiserson, C. E. and J. B. Saxe (1991). "Retiming synchronous circuitry." Algorithmica 6(1): pp. 5-35. 3. Shenoy, N. (1997). "Retiming: Theory and practice." Integration, the VLSI Journal 22: pp. 1-21. EECS 144/244, UC Berkeley: 47 23 Back-up slides EECS 144/244, UC Berkeley: 48 Fixed Delays Bounded Delays Want algorithms that report the critical path delay of the slowest circuit in the circuit family Delay of 6 for the above circuit for transition model (longest path that can propagate a transition) EECS 144/244, UC Berkeley: 49 24 Floating-Mode Delay Model Input transition Single input vector condition Pessimistic, but easier to compute EECS 144/244, UC Berkeley: 50 Floating-Mode Delay Model Assume an input pair <v1, v2> has been applied, but we only look at v2 -- i.e. node values are unknown until set by v2 (pessimistic because we assume any v1 can be adversarially selected, to reason about long paths) Assume the 1 at the input of the AND arrives before the 0 (even if in reality it arrives later and the gate output stays at 0 throughout, and no path is sensitized). EECS 144/244, UC Berkeley: 51 25 Roadmap for rest of lecture Consider conditions under which paths are TRUE or FALSE under the floating-mode delay model with fixed delays + under floating-mode model, fixed and bounded delays yield same worst-case circuit delay (for same upper bounds) + worst-case delay under floating-mode model is upper bound on that under the transition model Reference (posted on bSpace): S. Devadas, K. Keutzer, S. Malik: “Computation of Floating Mode Delay in Combinational Circuits: Theory and Algorithms”, IEEE TCAD, December 1993. EECS 144/244, UC Berkeley: 52 Controlling and Non-Controlling Values A controlling value at a gate input is the value that determines the output value of that gate irrespective of the other input value. (the output value is called a controlled value) A controlling value for an AND gate is 0 and for an OR gate is 1. (The controlled values are 0 and 1 resp.) A non-controlling value for an AND gate is 1 and for an OR gate is 0. What about NAND and NOR gates? EECS 144/244, UC Berkeley: 53 26 Static Sensitization Definition: A path is statically sensitized by a vector V, if along each gate on the path, if the gate output is a controlled value, the input corresponding to the path is the only input with a controlling value Input vector 100X statically sensitizes red path EECS 144/244, UC Berkeley: 54 Static Sensitization Static sensitization is sufficient for a path to be responsible for the delay of a circuit WHY? Input vector 100X statically sensitizes red path EECS 144/244, UC Berkeley: 55 27 Is this path statically sensitizable? Definition: A path is statically sensitized by a vector V, if along each gate on the path, if the gate output is a controlling value, the input corresponding to the path is the only input with a controlling value EECS 144/244, UC Berkeley: 56 Is this path statically sensitizable? No, red path is NOT statically sensitizable (work this out) EECS 144/244, UC Berkeley: 57 28 More on Static Sensitization Are paths a,d,f,g and b,d,f,g statically sensitizable? Are they true paths? EECS 144/244, UC Berkeley: 58 Static Sensitization is too strong A true path (one that is responsible for delay of a circuit) need not be statically sensitizable Paths a,d,f,g and b,d,f,g are NOT statically sensitizable. But they are TRUE paths. EECS 144/244, UC Berkeley: 59 29 Static Co-sensitization Definition: A path is statically co-sensitized by a vector V, if the input corresponding to the path presents a controlling value at each gate along the path whose output is a controlled value. Not necessarily the ONLY controlling value EECS 144/244, UC Berkeley: 60 Static Co-sensitization Definition: A path is statically co-sensitized by a vector V, if the input corresponding to the path presents a controlling value at each gate along the path whose output is a controlling value. Not necessarily the ONLY controlling value Paths a,d,f,g and b,d,f,g are statically co-sensitizable EECS 144/244, UC Berkeley: 61 30 Static Co-sensitization and Delay Static co-sensitization is necessary for a path to be responsible for the delay of a circuit. (WHY?) Is it sufficient? EECS 144/244, UC Berkeley: 62 Static Co-sensitization and Delay Static co-sensitization is necessary for a path to be responsible for the delay of a circuit But NOT sufficient Path of length 6 is statically co-sensitized Delay of circuit is 5 (as observed earlier) EECS 144/244, UC Berkeley: 63 31 Summary Static sensitization (SS) sufficient for true path, but not necessary Static co-sensitization (SC) necessary for true path, but not sufficient All paths SS True paths SC EECS 144/244, UC Berkeley: 64 32
© Copyright 2024 ExpyDoc