LQR and LQG Controllers

Last Update: 02.06.2014
1 State Dependent Riccati Equation Method and H∞ Control
1.1 Introduction
1.1.1 The Linear Quadratic Regulator (LQR)
Consider the system
x = Ax + B u
And the performance criterion
J[u(·)] = ∫_0^∞ [ xᵀQx + uᵀRu ] dt,   Q ≥ 0, R > 0.
Problem: Calculate a function u : [0, ∞) → ℝ^p such that J[u] is minimized.
Remarks:
1. LQR can be considered for finite final times (finite horizon)
2. LQR can be considered for time varying matrices
3. LQR can be extended in several ways to nonlinear systems (e.g. State Dependent
Riccati Equations)
4. LQR assumes full knowledge of the state
The LQR controller has the following form
u(t) = −R⁻¹Bᵀ P x(t)
where P ∈ ℝ^{n×n} is given by the positive semidefinite (symmetric) solution of
0 = PA + AᵀP + Q − PBR⁻¹BᵀP
This equation is called the Riccati equation. It is solvable iff the pair (A, B) is controllable and (Q, A) is detectable.
LQR controller design
1. (A, B) is given by “design” and cannot be modified at this stage.
2. (Q, R) are the controller design parameters. A large Q penalizes transients of x, a large R penalizes usage of the control action u.
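As a minimal sketch of the offline part of this design, the snippet below computes the LQR gain for an assumed double-integrator plant with hand-picked weights; the plant, the weights and the use of scipy.linalg.solve_continuous_are are illustrative assumptions, not part of the original notes.

import numpy as np
from scipy.linalg import solve_continuous_are

# Assumed example plant (double integrator); not from the notes
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])

# Design parameters: Q penalizes transients of x, R penalizes control effort
Q = np.diag([10.0, 1.0])
R = np.array([[1.0]])

# Solve 0 = PA + A^T P + Q - P B R^{-1} B^T P for the stabilizing P >= 0
P = solve_continuous_are(A, B, Q, R)

# State-feedback gain: u = -K x with K = R^{-1} B^T P
K = np.linalg.solve(R, B.T @ P)
print("LQR gain K =", K)

The closed loop is then ẋ = (A − BK)x.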
1.1.2 The Linear Quadratic Gaussian Regulator (LQG)
In LQR we assumed that the whole state is available for control at all times (see the formula for the control action above). This is unrealistic: at the very least there is always measurement noise.
One possible generalization is to look at
x = Ax + B u + w
y = Cx + v
where v, w are stochastic processes called measurement and process noise, respectively. For simplicity one assumes these processes to be white noise (i.e. zero mean, uncorrelated, Gaussian).
Now only y(t) is available for control. It turns out that for linear systems a separation principle holds:
1. First, calculate x̂(t), an estimate of the full state x(t), using the available information.
2. Secondly, apply the LQR controller, using the estimate x̂(t) in place of the true (now unknown) state x(t).
Observer design (Kalman Filter)
The estimate x̂(t) is calculated by integrating in real time the following ODE
dx̂/dt = Ax̂ + Bu + L(y − Cx̂)
with the following matrices calculated offline (i.e. beforehand)
L = PCᵀR⁻¹
0 = AP + PAᵀ − PCᵀR⁻¹CP + Q,   P ≥ 0,
Q = E(wwᵀ),  R = E(vvᵀ)
The Riccati equation above has its origin in the minimization of the cost functional
J[x̂(·)] = ∫_{−∞}^0 (x̂ − x)(x̂ − x)ᵀ dt
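As a sketch of the observer design under assumed plant and noise data (all numerical values below are placeholders), the Kalman gain follows from the dual Riccati equation above, and the observer ODE is then integrated in real time, here with a simple Euler step.

import numpy as np
from scipy.linalg import solve_continuous_are

# Assumed plant and noise covariances (illustration only)
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Qw = np.diag([1e-3, 1e-3])     # process-noise covariance  E(w w^T)
Rv = np.array([[1e-2]])        # measurement-noise covariance  E(v v^T)

# Dual Riccati equation  0 = A P + P A^T - P C^T Rv^{-1} C P + Qw
P = solve_continuous_are(A.T, C.T, Qw, Rv)
L = P @ C.T @ np.linalg.inv(Rv)        # observer gain  L = P C^T Rv^{-1}

def observer_step(xhat, u, y, dt=1e-3):
    # One explicit Euler step of  dxhat/dt = A xhat + B u + L (y - C xhat)
    return xhat + dt * (A @ xhat + B @ u + L @ (y - C @ xhat))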
1.2 State Dependent Riccati Equation Approach
The State-Dependent Riccati Equation (SDRE) strategy provides an effective algorithm for synthesizing nonlinear feedback controls by allowing nonlinearities in the system states while additionally offering design flexibility through state-dependent weighting matrices.
The method entails factorization (that is, parameterization) of the nonlinear dynamics into the product of the state vector and a matrix-valued function that depends on the state itself. In doing so, the SDRE algorithm brings the nonlinear system to a non-unique linear structure having state-dependent coefficient (SDC) matrices. The method includes minimizing a nonlinear performance index having a quadratic-like structure.
An algebraic Riccati equation (ARE) using the SDC matrices is then solved on-line to give the suboptimal control law. The coefficients of this equation vary with the given
point in state space. The algorithm thus involves solving, at a given point in state space,
an algebraic state-dependent Riccati equation, or SDRE. The non-uniqueness of the
parameterization creates extra degrees of freedom, which can be used to enhance
controller performance.
Problem Formulation
Consider the deterministic, infinite-horizon nonlinear optimal regulation (stabilization) problem, where the system is full-state observable, autonomous, nonlinear in the state, and affine in the input, represented in the form
ẋ(t) = f(x) + B(x)u(t),   x(0) = x0   (1)
where x ∈ ℝ^n is the state vector, u ∈ ℝ^m is the input vector, and t ∈ [0, ∞), with C¹ functions f : ℝ^n → ℝ^n and B : ℝ^n → ℝ^{n×m}, and B(x) ≠ 0 ∀x.
Without any loss of generality, x = 0 is assumed to be an equilibrium point:
f (0) = 0.
In this context, the minimization of the infinite-time performance criterion
J(x0, u(·)) = ∫_0^∞ [ xᵀ(t)Q(x)x(t) + uᵀ(t)R(x)u(t) ] dt   (2)
is considered, which is non-quadratic in x but quadratic in u .
The state and input weighting matrices are assumed state dependent such that
Q : ℝ^n → ℝ^{n×n} and R : ℝ^n → ℝ^{m×m}.
These design parameters satisfy Q(x) ≥ 0 and R(x) > 0 for all x .
Under the specified conditions, a control law
u(x) = k(x) = −K(x)x,   k(0) = 0,   (3)
where K(·) ∈ C¹(ℝ^n), is then sought that will (approximately) minimize the cost (2) subject to the input-affine nonlinear differential constraint (1) while regulating the system to the origin ∀x, such that lim_{t→∞} x(t) = 0.
Extended Linearization is the process of factorizing a nonlinear system into a linear-like structure which contains SDC matrices. Under the assumptions
f(0) = 0 and f(·) ∈ C¹(ℝ^n),
a continuous nonlinear matrix-valued function A(x) always exists such that
f(x) = A(x)x,   (4)
where A : ℝ^n → ℝ^{n×n} is found by mathematical factorization and is, clearly, non-unique when n > 1.
Hence, after extended linearization, the input-affine nonlinear system (1) becomes
ẋ(t) = A(x)x(t) + B(x)u(t),   x(0) = x0,   (5)
which has a linear structure with SDC matrices A(x), B(x).
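To make the non-uniqueness of the SDC parameterization concrete, the small sketch below uses an assumed drift f(x) (not from these notes) and exhibits two different factorizations that both satisfy f(x) = A(x)x.

import numpy as np

# Assumed drift (illustration only): f(x) = [x2, -x1^3 - x2]
def f(x):
    return np.array([x[1], -x[0]**3 - x[1]])

# One natural SDC factorization f(x) = A1(x) x
def A1(x):
    return np.array([[0.0,       1.0],
                     [-x[0]**2, -1.0]])

# A second valid factorization: add any E(x) with E(x) x = 0,
# e.g. E(x) = [[x2, -x1], [0, 0]]; this shows the non-uniqueness for n > 1
def A2(x):
    E = np.array([[x[1], -x[0]],
                  [0.0,   0.0]])
    return A1(x) + E

x = np.array([0.7, -1.3])
assert np.allclose(A1(x) @ x, f(x)) and np.allclose(A2(x) @ x, f(x))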
The application of any linear control synthesis method to the linear-like SDC structure (5), where A(x) and B(x) are treated pointwise as constant matrices, forms an extended linearization control method. This is, broadly, a family of control methods that find a gain K(x) such that the feedback u(x) = −K(x)x renders A(x) − B(x)K(x) pointwise Hurwitz.
The following conditions are required for guaranteeing local asymptotic stability.
Condition 1: A(·), B(·), Q(·) and R(·) are C¹ matrix-valued functions.
Condition 2: The respective pairs {A(x), B(x)} and {A(x), Q^{1/2}(x)} are pointwise stabilizable and detectable SDC parameterizations of the nonlinear system (1) for all x.
Theorem 2 (Mracek & Cloutier, 1998).
Under Conditions 1 & 2, consider the nonlinear multivariable system
ẋ(t) = f(x) + B(x)u(t),   x(0) = x0   (1)
where x ∈ ℝ^n is the state vector and u ∈ ℝ^m is the input vector, together with the SDRE feedback control
u(x) = −R⁻¹(x)Bᵀ(x)P(x)x
where P(x) is the unique, symmetric, positive-definite solution of the algebraic State-Dependent Riccati Equation
P(x)A(x) + Aᵀ(x)P(x) − P(x)B(x)R⁻¹(x)Bᵀ(x)P(x) + Q(x) = 0.
Then, the method produces a closed-loop solution which is locally asymptotically stable.
Remark. Note that global stability has not been established; this is a local result. In general, even if A_cl(x) = A(x) − B(x)K(x) is Hurwitz for all x, this does not imply global stability. One can prove, though, that if A_cl(x) is symmetric and Hurwitz for all x, then global stability holds. The proof is obtained simply by showing that under these conditions V(x) = xᵀx is a Lyapunov function for system (1).
Optimality of the solution (Mracek & Cloutier, 1998). Under Conditions 1 & 2, the SDRE nonlinear feedback solution and its associated state and costate trajectories satisfy the first necessary condition for optimality, ∂H/∂u = 0, of the nonlinear optimal regulator problem.
Example.
Steer the following system to x = (d, 0):
dx1/dt = x2
dx2/dt = −a sin(x1) − b x2 + c u(t)
Indeed, writing the factorization in the shifted coordinate x1 − d (so that the target becomes the origin),
A(x) = [0 1; −a sin(x1−d)/(x1−d)  −b];  B(x) = [0; c];
We choose
Q = [max(1, (x1−d)^2) 0; 0 max(1, x2^2)];  R = 1;
The choice of Q(x) ensures larger control actions for large deviations from the equilibrium.
The blue trajectory is obtained using LQR on the standard linearization of the original system with Q = eye(2), R = 1. The magenta trajectory is obtained with the SDRE method described in the example. Note how the state is brought to the equilibrium faster in the SDRE case.
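A possible implementation of this example is sketched below in Python, under assumed parameter values a, b, c, d and a plain Euler integration (these choices are illustrative, not from the original notes): at each step the SDC matrices and the state-dependent weights are formed in the error coordinates e = x − (d, 0), the pointwise ARE is solved, and the resulting gain is applied.

import numpy as np
from scipy.linalg import solve_continuous_are

# Assumed parameters of the example (placeholders)
a, b, c, d = 1.0, 0.5, 1.0, 1.0
dt, T = 1e-3, 10.0

def sdc_matrices(e):
    # SDC factorization in error coordinates e = x - (d, 0),
    # following A(x) = [0 1; -a*sin(e1)/e1  -b],  B(x) = [0; c]
    e1 = e[0]
    sinc = np.sin(e1) / e1 if abs(e1) > 1e-8 else 1.0
    A = np.array([[0.0, 1.0],
                  [-a * sinc, -b]])
    B = np.array([[0.0], [c]])
    return A, B

def sdre_gain(e):
    A, B = sdc_matrices(e)
    # State-dependent weights as in the example: larger Q for larger deviations
    Q = np.diag([max(1.0, e[0]**2), max(1.0, e[1]**2)])
    R = np.array([[1.0]])
    P = solve_continuous_are(A, B, Q, R)     # pointwise ARE solve
    return np.linalg.solve(R, B.T @ P)       # K(x) = R^{-1} B^T P

# Plain Euler simulation of the closed loop, starting away from the target
e = np.array([2.0, 0.0])
for _ in range(int(T / dt)):
    u = float(-sdre_gain(e) @ e)
    e = e + dt * np.array([e[1], -a * np.sin(e[0]) - b * e[1] + c * u])
print("final error:", e)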
1.3 H∞ Control
Let us introduce the so-called H∞ norm of a transfer function G(s) ∈ C^{p×q}. This is a mapping from the space of matrix transfer functions into the nonnegative real numbers, defined by
‖G(s)‖∞ = max_{ω∈ℝ} ‖G(jω)‖₂ = max_{ω∈ℝ} σ1(G(jω)),
where σ1 denotes the largest singular value of the complex matrix G(jω).
The intuition associated with this definition is that the H∞ norm quantifies the maximal amplification that a signal may experience once applied to the system given by the transfer matrix G(s).
Figure 1. H∞ norm
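For intuition, the H∞ norm of a state-space model G(s) = C(sI − A)⁻¹B + D can be estimated by gridding the frequency axis and taking the largest value of σ1(G(jω)); the sketch below does exactly that for assumed matrices (production tools use bisection on γ instead of a grid).

import numpy as np

# Assumed state-space data for G(s) = C (sI - A)^{-1} B + D (illustration only)
A = np.array([[0.0, 1.0], [-2.0, -0.5]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.zeros((1, 1))

def hinf_norm_estimate(A, B, C, D, wmin=1e-3, wmax=1e3, n=2000):
    # Grid the frequency axis and record the largest singular value of G(jw)
    worst = 0.0
    for w in np.logspace(np.log10(wmin), np.log10(wmax), n):
        G = C @ np.linalg.solve(1j * w * np.eye(A.shape[0]) - A, B) + D
        worst = max(worst, np.linalg.svd(G, compute_uv=False)[0])
    return worst

print("||G||_inf approx.", hinf_norm_estimate(A, B, C, D))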
Let us now consider the system depicted below.
Figure 2. H∞ Objects
Here w(t) is a disturbance acting on the system, while u(t) is the control action. Further, z(t) is to be understood as a performance index. Without loss of generality, one assumes that this continuous-time system is given by the matrices and equations below.
ẋ = Ax + B1 w + B2 u
z = C1 x + D12 u
y = C2 x + D21 w
Figure 3. Underlying H∞ System Equations
Problem: H∞ control is about finding a (not necessarily memoryless) control law u(t) = F(y(t)) that stabilizes the system above AND minimizes the effect of the disturbance w(t) on the performance index z.
This goal can be achieved by minimizing the H∞ norm of the transfer function T : w → z that is generated once a controller design has been chosen.
Figure 4. H∞ Goal
In other words, one designs a controller that minimizes the effect of the worst possible
disturbance.
Another related formulation is to have a controller that renders the system dissipative and
internally stable, see below.
1.3.1 Linear H∞ Control Design
By analogy with the LQG case, for the design of this controller we expect to have to solve two Riccati equations, one for calculating the optimal action given the system state, and one for generating an estimate of the state.
The main difference is that we are now dealing with two objects:
1. The worst possible disturbance
2. The best possible reaction to that worst disturbance
A relation to game theory is seen here: we have two players, one trying to maximize a cost function and one trying to minimize it. The applicable cost function here is
J[w, u, x0] = ∫_0^t ( ‖z‖² − γ²‖w‖² ) ds
subject to the system equations, see Figure 3.
In practice one does not seek the optimal controller (i.e. the one that produces the absolute minimal value of ‖T‖∞); instead the design is made using an iterative procedure that seeks to reduce the norm while still looking at other performance measures. Indeed, for any sufficiently large given γ > 0 one can calculate a controller that makes ‖T‖∞ < γ using the following formulae:
i) Find X ≥ 0 that satisfies the ARE
0 = AᵀX + XA + C1ᵀC1 + X(1/γ² B1B1ᵀ − B2B2ᵀ)X
and such that A + (1/γ² B1B1ᵀ − B2B2ᵀ)X is stable.
ii) Find Y ≥ 0 that satisfies the ARE
0 = AY + YAᵀ + B1B1ᵀ + Y(1/γ² C1ᵀC1 − C2ᵀC2)Y
and such that A + Y(1/γ² C1ᵀC1 − C2ᵀC2) is stable.
iii) ρ(XY) < γ² (where ρ denotes the spectral radius).
Then the dynamic H∞ controller has the form
dx̂/dt = Ax̂ + B1 w_worst(t) + B2 u_opt(t) + ZL(C2 x̂(t) − y(t))
w_worst(t) = (1/γ²) B1ᵀ X x̂(t)
u_opt(t) = F x̂(t)
where
Z = (I − (1/γ²) Y X)⁻¹
L = −Y C2ᵀ
F = −B2ᵀ X
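The two AREs above can be solved numerically as sketched below. Because the quadratic term is indefinite, the stabilizing solutions are computed here from the stable invariant subspace of the associated Hamiltonian matrices (rather than with a standard LQR solver); the plant data and the value of γ are assumptions chosen only for illustration.

import numpy as np
from scipy.linalg import schur

def hinf_riccati(A, B1, B2, C1, gamma):
    # Stabilizing solution X of
    #   A^T X + X A + C1^T C1 + X (gamma^-2 B1 B1^T - B2 B2^T) X = 0,
    # taken from the stable invariant subspace of the Hamiltonian matrix
    n = A.shape[0]
    Rg = (B1 @ B1.T) / gamma**2 - B2 @ B2.T
    H = np.block([[A, Rg], [-C1.T @ C1, -A.T]])
    T, U, sdim = schur(H, output='real', sort='lhp')   # stable part first
    U1, U2 = U[:n, :n], U[n:, :n]
    return U2 @ np.linalg.inv(U1)

# Assumed plant data and gamma (illustration only)
A  = np.array([[0.0, 1.0], [-1.0, -1.0]])
B1 = np.array([[0.5], [0.0]])     # disturbance input
B2 = np.array([[0.0], [1.0]])     # control input
C1 = np.array([[1.0, 0.0]])       # performance output
C2 = np.array([[1.0, 0.0]])       # measurement
gamma = 2.0

X = hinf_riccati(A, B1, B2, C1, gamma)
Y = hinf_riccati(A.T, C1.T, C2.T, B1.T, gamma)         # dual equation for Y

assert max(abs(np.linalg.eigvals(X @ Y))) < gamma**2   # coupling condition iii)

F = -B2.T @ X                                          # u_opt = F x_hat
L = -Y @ C2.T                                          # observer gain
Z = np.linalg.inv(np.eye(A.shape[0]) - (Y @ X) / gamma**2)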
Remarks
1. For very small values of γ , these equations will have no solution while for very
large γ , these equations reduce to the LQG case.
2. The best design is made by an iterative procedure that reduces γ while keeping an eye on the disturbance rejection properties at all interesting frequencies (and not just at the maximum value).
It can be proved that these Riccati equations may have a solution only if the following “structural” assumptions hold.
Figure 5. Linear H∞ Assumptions
These assumptions are rather natural, as they transfer properties such as observability and controllability to this new context.
Remark: H∞ design provides robustness at the cost of a pessimistic control law: it
assumes that the worst possible perturbation is acting on the system at all times.
1.3.2 Nonlinear H∞ Control Design
All elements of the linear case are present here, and we expect a solution scheme in which one has an optimal control law given the full system state, and a law that helps us to generate optimal estimates of this actually unknown full system state.
The difference we do expect, though, is that in place of two Riccati equations, one for optimal control and one for the optimal observer, we will have to deal here with two Hamilton-Jacobi equations.
We have met these partial differential equations already, when we treated “Optimal Control and Dynamic Programming”. We know they are hyperbolic, that they are amenable to treatment with numerical methods, and that they have an interesting mathematical theory due to the fact that their solutions may become discontinuous, so that usual concepts like differentiability cannot be applied straightforwardly.
The theory has been developed for nonlinear systems of the form
ẋ = A(x)x + B1(x)w + B2(x)u
z = C1(x)x + D12(x)u
y = C2(x)x + D21(x)w
For this system the “Observation” PDE has the form.
Figure 6. Observation Hamilton Jacobi Equation
It turns out that the best estimate of the full state is given by the formula
x(p) = arg max_x ( p(x) + V(x) ),
where V(x) is the solution of the optimal control Hamilton-Jacobi PDE.
- 11 -
In the graphic above, the blue line is the true state, while the magenta line is its estimate via the formula x(p) = arg max_x ( p(x) + V(x) ).
The control law is then given by
u*(p) = u*_state(x(p)) = −C1(x(p)) − B2(x(p)) ∇ₓV(x(p))′
and steers the system to equilibrium.
- 12 -
Naturally, for all this to hold we need the system matrix functions to have good structural
properties for this PDE to make sense in the first place. So we expect
1. A(x), C1(x), C2(x) globally Lipschitz, smooth, vanish at 0
2. B1(x), B2(x) globally Lipschitz, bounded, smooth
3. D12(x), D21(x) constant (for simplicity)
1.4 References
 J.W. Helton & M.R. James, A General Framework for Extending H-Infinity Control to Nonlinear Systems, SIAM, 1998.
 J.A. Ball & J.W. Helton, H-Infinity Control for Stable Plants, MCSS 5 (1992), 233-262.
 J.A. Ball, J.W. Helton & M. Walker, H-Infinity Control for Nonlinear Systems via Output Feedback, IEEE TAC 38 (1993), 546-559.
 M.R. James & J.S. Baras, Robust H-Infinity Output Feedback Control for Nonlinear Systems, IEEE TAC 40 (1995), 1007-1017.
- 13 -