Fundamental Algorithms for System Modeling, Analysis

Fundamental Algorithms
for System Modeling,
Analysis, and Optimization
Edward A. Lee, Jaijeet Roychowdhury,
Sanjit A. Seshia, Stavros Tripakis
UC Berkeley
EECS 144/244
Spring 2013
Copyright © 2010-13, E. A. Lee, J. Roychowdhury,
S. A. Seshia, S. Tripakis. All rights reserved
Lecture: Timing Analysis, Retiming
Thanks to Kurt Keutzer for several slides
RTL Synthesis Flow
FSM,
Verilog,
VHDL
HDL
Simulation/
Verification
HDL
RTL
Synthesis
netlist
Library/
module
generators
a
0
b
1
d
q
s
Boolean circuit/network
clk
logic
optimization
netlist
a
0
b
1
s
physical
design
d
q
Boolean circuit/network
clk
Timing Analysis
Graph / Rectangles
layout
K. Keutzer
EECS 144/244, UC Berkeley: 2
1
Timing Analysis / Verification
Verifying a property about system timing
Arises in many settings:





Integrated circuits
Embedded software
Distributed embedded systems
Biological systems
…
Here we focus on circuits.
Chapter 15 of Lee-Seshia book contains a more broad
discussion.
EECS 144/244, UC Berkeley: 3
Timing Analysis for Digital ICs
(Clock) Speed is one of the major performance metrics
for digital circuits
Timing Analysis = the process of verifying that a chip
meets its speed requirement
 E.g.,
1 GHz means that next-state function must be
computed within 1 ns
EECS 144/244, UC Berkeley: 4
2
Static Timing Analysis for Circuits
Combinational
logic
Combinational
logic
Combinational
logic
clk
clk
clk
Determine fastest permissible clock speed (e.g. 1 GHz)
by determining delay of longest path from register to
register (e.g. 1ns.)
EECS 144/244, UC Berkeley: 5
Cycle Time - Critical Path Delay
Cycle time (T) cannot be
smaller than longest
path delay (Tmax)
Longest (critical) path
delay is a function of:
Total gate, wire delays
cycle time
data
Tclock1
Q2
Q1
Tclock1
Tmax  T
critical path,
~5 logic levels
Tclock2
clock
EECS 144/244, UC Berkeley: 6
3
Cycle Time - Setup Time
For FFs to correctly latch
data, input data must
be stable during:
Setup time (Tsetup) before
clock arrives
setup time
data
Tclock1
Tmax  Tsetup  T
Q2
Q1
critical path,
~5 logic levels
Tclock1
Tclock2
clock
EECS 144/244, UC Berkeley: 7
Cycle Time - Clock-skew
If clock network has
unbalanced delay – clock
skew
Cycle time is also a function
of clock skew (Tskew)
data
Tclock2
Tclock1
Q2
Tmax  Tsetup  Tskew  T
Q2
Q1
Tclock1
clock skew
critical path,
~5 logic levels
Tclock2
clock
EECS 144/244, UC Berkeley: 8
4
Cycle Time - Clock to Q
Cycle time is also a
function of
propagation delay of
FF (Tclk-to-Q)
Tclk-to-Q : time from
arrival of clock signal
till change at FF
output)
data
Tclock1
Tclock2
Q2
clock-to-Q
Tmax  Tsetup  Tskew  Tclk  to Q  T
Q2
Q1
critical path,
~5 logic levels
Tclock1
Tclock2
clock
EECS 144/244, UC Berkeley: 9
Min Path Delay - Hold Time
For FFs to correctly latch
data, input data must be
stable during Hold time
(Thold) after clock arrives
Determined by delay of
shortest path in circuit
(Tmin) and clock skew
(Tskew)
hold time
data
Tclock1
Q2
Q1
Tclock1
Tmin  Thold  Tskew
short path, ~3
logic levels
Tclock2
clock
EECS 144/244, UC Berkeley: 10
5
Summary
Need to be able to compute
Tmin: delay of “fastest” path in the circuit
Tmax: delay of “slowest” (critical) path in the circuit
EECS 144/244, UC Berkeley: 11
Modeling Timing in a Combinational Circuit
Arrival time
in green
0
Interconnect
delay in
red
Gate delay in
blue
0
.20
A
W
.10
.05 X
2
1
Z
.05
C
2
Y
f
.15
.20
2
B
1
.20
1
What’s the right mathematical
object to use to represent this
physical object?
EECS 144/244, UC Berkeley: 13
6
Modeling - 1
Use a labeled
directed graph
G = <V,E>
Vertices represent
gates, primary
inputs and
primary outputs
Edges represent
wires
Labels represent
delays
s
What about arrival
times?
.05 X
0
2
1
.10
Z
.05
C
0
.20
A
W
2
B
1
.20
1
A
X
0.05
0
0
f
.15
.20
C
2
Y
2
1
.1
W
Y
1
2 .15
f
.2
1
.20
.05
2
Z
.20
B
EECS 144/244, UC Berkeley: 14
Modeling - 2
A
Find longest/shortest
paths in a directed
graph G = <V,E>
0
0
s
What sort of directed
graph do we have?
X
0
C
.1
2
1
W
1
2 .15
f
.2
1
.20
.05
Y
2
.20
Z
B
Is this in the standard
form for a
longest/shortest
path problem?
EECS 144/244, UC Berkeley: 15
7
Split Nodes into Edges
A
X
0
0
0
s
C
2
1
.1
W
Y
1
2 .15
f
.2
1
.20
.05
2
Z
.20
B
2
2
2
3
1
2.5
0.5
EECS 144/244, UC Berkeley: 16
DAG with Weighted Edges
A
X
0
0
2.2
0
C
0.1
W
s
2.15
1.05
Z
0.2
1
1
f
2.2
Y
B
Problem: Find the longest (critical) path
from source s to sink f.
EECS 144/244, UC Berkeley: 17
8
Naïve Approach: Enumerate Paths
How many paths in this example?
In the worst case?
A
X
0
0
2.2
0
C
.1
W
s
2.15
1.05
f
Z
.2
2.2
1
1
Y
B
Problem:
Find the longest path from source s to sink f.
EECS 144/244, UC Berkeley: 18
Algorithm 1: Longest path in a DAG
Critical Path Method [Kirkpatrick 1966, IBM JRD]
(dynamic programming)
Let w(u,v) denote weight of edge from u to v
Steps:
1. Topologically sort vertices (graph is acyclic)
order: v1, v2, …, vn
v1 = s, vn = ?
2. For each vertex v, compute
d(v) = length of longest path from source s to v
d(v1) = 0
For i = 2..n
d(vi) = maxall incoming edges (u, vi) d(u) + w(u,vi)
EECS 144/244, UC Berkeley: 19
9
Algorithm 1: Longest path in a DAG
Critical Path Method [Kirkpatrick 1966, IBM JRD]
Let w(u,v) denote weight of edge from u to v
Steps:
Time Complexity?
1. Topologically sort vertices
O(m+n)
order: v1, v2, …, vn
v1 = s, vn = f
2. For each vertex v, compute
d(v) = length of longest path from source s to v
d(v1) = 0
For i = 2..n
d(vi) = maxall incoming edges (u, vi) d(u) + w(u,vi)
Run the CPM on our example
EECS 144/244, UC Berkeley: 20
Graph vs. Circuit
Delay of a combinational circuit depends on



Circuit topology (graph model)
Delay model (e.g., fixed vs. variable)
Boolean behavior of gates
We have only considered circuit topology so far.
Longest/shortest path found on the graph can be very
pessimistic:


Paths can be FALSE
Delay values are BOUNDS
EECS 144/244, UC Berkeley: 21
10
False Paths
(consider Transition Mode)
A path is false if it cannot be responsible for the delay of a circuit
Graph model implies path of length 6
EECS 144/244, UC Berkeley: 22
False Paths
A path is false if it cannot be responsible for the delay of a circuit
Graph model implies path of length 6
EECS 144/244, UC Berkeley: 23
11
False vs. True Paths
TRUE path = one that can be responsible for the delay of
a circuit
Need techniques to find whether a path is TRUE or
FALSE
EECS 144/244, UC Berkeley: 24
The Fixed Delay Model: Constant delay for each
gate (or wire)
EECS 144/244, UC Berkeley: 25
12
Paradoxical Behavior with the Transition Model?
non-monotonic
w.r.t. time delays!
EECS 144/244, UC Berkeley: 26
Problems with Fixed Delay + Transition Model
1.
2.
Transition model can be tricky to reason about
Fixed gate delays are unrealistic, due to
manufacturing process variations
More realistic delay model: Lower and upper bounds
Perform timing analysis for a whole family of circuits that
share the same lower/upper bounds
EECS 144/244, UC Berkeley: 27
13
Fixed Delays  Bounded Delays
Want algorithms that report the critical path delay
of the slowest circuit in the circuit family
Paper: Computation of Floating Mode Delay in Combinational
Circuits: Theory and Algorithms, Devadas, Keutzer, Malik, 1993
(on bspace)
EECS 144/244, UC Berkeley: 28
RETIMING
EECS 144/244, UC Berkeley: 29
14
Retiming Tradeoffs
a
b
1
1
2
2
1
2
clock period = 6
# registers = 4
[Shenoy, 1997]
EECS 144/244, UC Berkeley: 30
Retiming Tradeoffs
a
b
1
1
2
2
1
2
clock period =
# registers =
5
4
[Shenoy, 1997]
EECS 144/244, UC Berkeley: 31
15
Retiming Tradeoffs
a
b
1
1
2
2
1
2
clock period =
# registers =
4
3
[Shenoy, 1997]
EECS 144/244, UC Berkeley: 32
Retiming Tradeoffs
a
b
1
1
2
2
1
2
clock period =
# registers =
2
4
[Shenoy, 1997]
EECS 144/244, UC Berkeley: 33
16
Goals of Retiming
Possible goals:
•
Minimize clock period (min-period retiming)
•
Minimize number of registers (min-area retiming)
•
Minimize number of registers for a target clock period
(constrained min-area retiming)
[Shenoy, 1997]
EECS 144/244, UC Berkeley: 34
Abstraction:
Circuit  Graph
Nodes: circuit elements
Node weights: delay
Dummy node for fanout
EECS 144/244, UC Berkeley: 35
17
Abstraction:
Registers?
EECS 144/244, UC Berkeley: 36
Abstraction - Registers
0
0
1
0
0
0
0
1
0
0
0
0
0
0
0
1
0
1
Arc weights indicate registers
EECS 144/244, UC Berkeley: 37
18
Abstraction - Environment
0
0
0
0
0
0
0
0
0
0
0
0
1
0
1
0
1
1
EECS 144/244, UC Berkeley: 38
Cutset – Divides the Graph in Two
0
0
0
0
0
0
0
0
0
1
0
0
0
1
0
1
0
1
EECS 144/244, UC Berkeley: 39
19
Retiming: Add a register on all arcs crossing the
cutset in one direction, and subtract a register from
all arcs crossing the cutset in the other direction.
0
0+1=1
0
0
0
0
0
0
0
0
0
1
0
0+1=1
1-1=0
0
1
1-1=0
EECS 144/244, UC Berkeley: 40
Recall: Retiming Tradeoffs
a
b
1
1
2
2
1
2
clock period = 6
# registers = 4
[Shenoy, 1997]
EECS 144/244, UC Berkeley: 41
20
Recall: Retiming Tradeoffs
a
1
b
1
2
2
1
2
clock period =
# registers =
5
4
[Shenoy, 1997]
EECS 144/244, UC Berkeley: 42
Simplest cutset surrounds one node
0
0
1-1=0
0+1=1
0
0
0
0
0
0
0
0
1-1=0
0
0
1
1
0
EECS 144/244, UC Berkeley: 43
21
Recall: Retiming Tradeoffs
a
b
1
1
2
2
1
2
clock period =
# registers =
5
4
[Shenoy, 1997]
EECS 144/244, UC Berkeley: 44
Recall: Retiming Tradeoffs
a
b
1
1
2
2
1
2
clock period =
# registers =
4
3
[Shenoy, 1997]
EECS 144/244, UC Berkeley: 45
22
For each node v, define r (v) = # of registers moved
from the outputs to the inputs.
A retiming is now an
assignment r (v) for every
node v such that the weight of
every arc is non-negative.
r (v) = -1
0
0
1-1=0
0+1=1
0
v
0
0
0
0
0
0
0
1-1=0
0
0
1
1
0
EECS 144/244, UC Berkeley: 46
Rest of the story
Retiming operations = graph transformations
Each graph has a vector of costs: (clock period, # registers, …)
Formulate and solve optimization problem.
Papers (on bspace):
1. Leiserson, C. E. and J. B. Saxe (1983). "Optimizing synchronous systems."
Journal of VLSI and Computer Systems: pp. 41-67.
2. Leiserson, C. E. and J. B. Saxe (1991). "Retiming synchronous circuitry."
Algorithmica 6(1): pp. 5-35.
3. Shenoy, N. (1997). "Retiming: Theory and practice." Integration, the VLSI
Journal 22: pp. 1-21.
EECS 144/244, UC Berkeley: 47
23
Back-up slides
EECS 144/244, UC Berkeley: 48
Fixed Delays  Bounded Delays
Want algorithms that report the critical path delay
of the slowest circuit in the circuit family
Delay of 6 for the above circuit for transition model
(longest path that can propagate a transition)
EECS 144/244, UC Berkeley: 49
24
Floating-Mode Delay Model
Input transition  Single input vector condition
Pessimistic, but easier to compute
EECS 144/244, UC Berkeley: 50
Floating-Mode Delay Model
Assume an input pair <v1, v2> has been applied, but we only
look at v2 -- i.e. node values are unknown until set by v2
(pessimistic because we assume any v1 can be adversarially
selected, to reason about long paths)
Assume the 1 at the input of the AND arrives before the
0 (even if in reality it arrives later and the gate output
stays at 0 throughout, and no path is sensitized).
EECS 144/244, UC Berkeley: 51
25
Roadmap for rest of lecture
Consider conditions under which paths are TRUE or
FALSE under the floating-mode delay model with fixed
delays
+ under floating-mode model, fixed and bounded delays yield
same worst-case circuit delay (for same upper bounds)
+ worst-case delay under floating-mode model is upper bound
on that under the transition model
Reference (posted on bSpace):
S. Devadas, K. Keutzer, S. Malik:
“Computation of Floating Mode Delay in Combinational Circuits:
Theory and Algorithms”, IEEE TCAD, December 1993.
EECS 144/244, UC Berkeley: 52
Controlling and Non-Controlling Values
A controlling value at a gate input is the value that
determines the output value of that gate irrespective of
the other input value.
(the output value is called a controlled value)
A controlling value for an AND gate is 0 and for an OR
gate is 1. (The controlled values are 0 and 1 resp.)
A non-controlling value for an AND gate is 1 and for an
OR gate is 0.
What about NAND and NOR gates?
EECS 144/244, UC Berkeley: 53
26
Static Sensitization
Definition: A path is statically sensitized by a vector V, if
along each gate on the path, if the gate output is a
controlled value, the input corresponding to the path is the
only input with a controlling value
Input vector 100X statically sensitizes red path
EECS 144/244, UC Berkeley: 54
Static Sensitization
Static sensitization is sufficient for a path to
be responsible for the delay of a circuit
WHY?
Input vector 100X statically sensitizes red path
EECS 144/244, UC Berkeley: 55
27
Is this path statically sensitizable?
Definition: A path is statically sensitized by a vector V, if along each gate
on the path, if the gate output is a controlling value, the input
corresponding to the path is the only input with a controlling value
EECS 144/244, UC Berkeley: 56
Is this path statically sensitizable?
No, red path is NOT statically sensitizable (work this out)
EECS 144/244, UC Berkeley: 57
28
More on Static Sensitization
Are paths a,d,f,g and b,d,f,g statically sensitizable?
Are they true paths?
EECS 144/244, UC Berkeley: 58
Static Sensitization is too strong
A true path (one that is responsible for delay of a circuit)
need not be statically sensitizable
Paths a,d,f,g and b,d,f,g are NOT statically sensitizable.
But they are TRUE paths.
EECS 144/244, UC Berkeley: 59
29
Static Co-sensitization
Definition: A path is statically co-sensitized by a
vector V, if the input corresponding to the path
presents a controlling value at each gate along
the path whose output is a controlled value.
Not necessarily the ONLY controlling value
EECS 144/244, UC Berkeley: 60
Static Co-sensitization
Definition: A path is statically co-sensitized by a
vector V, if the input corresponding to the path
presents a controlling value at each gate along
the path whose output is a controlling value.
Not necessarily the ONLY controlling value
Paths a,d,f,g and b,d,f,g are statically co-sensitizable
EECS 144/244, UC Berkeley: 61
30
Static Co-sensitization and Delay
Static co-sensitization is necessary for a path to
be responsible for the delay of a circuit. (WHY?)
Is it sufficient?
EECS 144/244, UC Berkeley: 62
Static Co-sensitization and Delay
Static co-sensitization is necessary for a path to
be responsible for the delay of a circuit
But NOT sufficient
Path of length 6 is statically co-sensitized
Delay of circuit is 5 (as observed earlier)
EECS 144/244, UC Berkeley: 63
31
Summary
Static sensitization (SS) sufficient for true path, but not
necessary
Static co-sensitization (SC) necessary for true path, but
not sufficient
All paths
SS
True
paths
SC
EECS 144/244, UC Berkeley: 64
32