UNIVERSITY of LIMERICK
OLLSCOIL LUIMNIGH
College of Informatics and Electronics
END OF SEMESTER ASSESSMENT PAPER
MODULE CODE: MS4327
SEMESTER: Spring 2007
MODULE TITLE: Optimisation
DURATION OF EXAMINATION: 2 1/2 hours
LECTURER: Dr. J. Kinsella
PERCENTAGE OF TOTAL MARKS: 80%
EXTERNAL EXAMINER: Prof. J. King
INSTRUCTIONS TO CANDIDATES: Answer four questions. Full marks: 80%.
See the Appendix at the end of the paper for some useful results.
1
(a) Given the one-dimensional "line" function

φ(α) = 2 − 2α + 2α² − (1/2)α³,

show that the strong Wolfe conditions (Appendix B) hold (approximately) in the interval [0.45, 1]. Take c1 = 0.1 and c2 = 0.25.
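A quick numerical sanity check (illustrative only, not part of the required analytic argument). The endpoints of the interval are approximate, so the grid below stays slightly inside [0.45, 1]:

```python
# Verify the strong Wolfe conditions for phi(a) = 2 - 2a + 2a^2 - a^3/2
# with c1 = 0.1, c2 = 0.25 on a grid inside the (approximate) interval.

def phi(a):
    return 2 - 2*a + 2*a**2 - 0.5*a**3

def dphi(a):
    return -2 + 4*a - 1.5*a**2

C1, C2 = 0.1, 0.25

def strong_wolfe(a):
    sufficient_decrease = phi(a) <= phi(0) + C1 * a * dphi(0)   # (11a)
    curvature = abs(dphi(a)) <= C2 * abs(dphi(0))               # strong (12)
    return sufficient_decrease and curvature

# Both conditions hold throughout (roughly) [0.455, 0.999]...
grid = [0.455 + i * (0.999 - 0.455) / 100 for i in range(101)]
assert all(strong_wolfe(a) for a in grid)
# ...and fail outside the interval, e.g. the curvature condition at alpha = 0.3:
assert not strong_wolfe(0.3)
```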
(b) The function φ(α) has a local minimum at α = 2/3. Is this consistent with the above result?
(c) For any line function φ, define

Φ(α) = φ(α) − (φ(0) + c1αφ′(0))

and show that when 0 < c1 < c2 < 1:
(i) for any α, Φ(α) ≤ 0 is equivalent to the first Wolfe condition (11a) holding at α;
(ii) for any α, Φ′(α) = 0 implies that the second (weak) Wolfe condition (11b) holds at α.
(d) Use the results from (c) to show that if the first Wolfe condition fails for some α0 > 0, then the interval (0, α0) contains values of α that satisfy the strong Wolfe conditions. (Assume that φ′(0) < 0.) Illustrate your proof with a sketch graph of Φ.
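A numeric illustration of (c)–(d) using the φ from part (a) (c1 = 0.1 and c2 = 0.25 are assumed, as in (a)): the first Wolfe condition fails at α0 = 2, and a grid minimiser of Φ inside (0, α0) does satisfy the strong Wolfe conditions:

```python
# Phi(a) = phi(a) - (phi(0) + c1*a*phi'(0)); by (c)(i), Phi > 0 means the
# first Wolfe condition fails, and a minimiser of Phi has Phi' ~ 0, so by
# (c)(ii) the curvature condition holds there.

def phi(a):
    return 2 - 2*a + 2*a**2 - 0.5*a**3

def dphi(a):
    return -2 + 4*a - 1.5*a**2

C1, C2 = 0.1, 0.25

def Phi(a):
    return phi(a) - (phi(0) + C1 * a * dphi(0))

alpha0 = 2.0
assert Phi(alpha0) > 0                       # first Wolfe condition fails here

grid = [i * alpha0 / 10000 for i in range(1, 10000)]   # grid on (0, alpha0)
a_hat = min(grid, key=Phi)                   # approximate minimiser of Phi

assert Phi(a_hat) < 0                        # (11a) holds at a_hat
assert abs(dphi(a_hat)) <= C2 * abs(dphi(0)) # strong curvature condition too
```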
(e) Explain briefly the significance of this result for step-size selection
algorithms.
1
1
2 The Dogleg Trust Region method is described in Appendix C.
(a) First show that, for a positive definite matrix B, the Cauchy-Schwarz inequality

(uᵀv)² ≤ (uᵀu)(vᵀv)
(1)

implies that

(pᵀBp)(pᵀB⁻¹p) ≥ (pᵀp)² for any vector p.
(2)
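A numerical spot-check of inequality (2) (an illustration, not the proof; the matrix below is an arbitrary example):

```python
# B = M^T M + I is symmetric positive definite for any M; check that
# (p^T B p)(p^T B^-1 p) >= (p^T p)^2 for many random vectors p.
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
B = M.T @ M + np.eye(4)          # symmetric positive definite
B_inv = np.linalg.inv(B)

for _ in range(100):
    p = rng.standard_normal(4)
    lhs = (p @ B @ p) * (p @ B_inv @ p)
    rhs = (p @ p) ** 2
    assert lhs >= rhs - 1e-9     # inequality (2), up to round-off
```

(The proof itself follows from (1) with u = B^(1/2)p and v = B^(−1/2)p.)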
(b) Show that if B is positive definite then (you may assume the result from part (a)):
(i) ‖p̃(τ)‖ is an increasing function of τ, and
(ii) m(p̃(τ)) is a decreasing function of τ.
(c) Explain why the path p̃(τ) intersects the trust-region boundary ‖p‖ = ∆ at exactly one point if ‖p̃(2)‖ ≡ ‖pB‖ ≥ ∆, and nowhere otherwise.
(d) What value should be selected for p if ‖pB‖ ≤ ∆?
(e) In the case where the vector p is chosen to be on the boundary, explain carefully how the appropriate value of the parameter τ is chosen.
(f) Finally, if pU = (0.5, 1)ᵀ, pB = (1.5, 1)ᵀ and ∆ = 1.5, find the appropriate value of τ.
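A worked numerical check for (f), taking pU = (0.5, 1)ᵀ, pB = (1.5, 1)ᵀ and ∆ = 1.5 as the question data. Since ‖pU‖ < ∆ ≤ ‖pB‖, the boundary crossing lies on the second segment, so we solve ‖pU + (τ − 1)(pB − pU)‖ = ∆ for τ ∈ [1, 2]:

```python
# Solve the quadratic ||pU + s*(pB - pU)||^2 = Delta^2 in s = tau - 1.
import math

pU = (0.5, 1.0)
pB = (1.5, 1.0)
Delta = 1.5

d = (pB[0] - pU[0], pB[1] - pU[1])
a = d[0]**2 + d[1]**2                   # a s^2 + 2 b s + c = 0
b = pU[0]*d[0] + pU[1]*d[1]
c = pU[0]**2 + pU[1]**2 - Delta**2
s = (-b + math.sqrt(b*b - a*c)) / a     # the positive root
tau = 1.0 + s
assert 1.0 <= tau <= 2.0

# p(tau) lies exactly on the trust-region boundary:
p = (pU[0] + s*d[0], pU[1] + s*d[1])
assert abs(math.hypot(p[0], p[1]) - Delta) < 1e-12
print(tau)                              # about 1.618
```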
3 The Fletcher-Reeves (FR) version of the non-linear conjugate gradient algorithm (Alg. 1) is given in Appendix D.
(a) Show that the function h(t) = (2t − 1)/(1 − t) is increasing on 0 ≤ t ≤ 1/2, and conclude that −1 ≤ h(t) < 0 on [0, 1/2).
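A quick numerical check of (a) (illustrative; the analytic argument uses h′(t) = 1/(1 − t)² > 0):

```python
# h is increasing on [0, 1/2] and takes values in [-1, 0) on [0, 1/2).

def h(t):
    return (2*t - 1) / (1 - t)

ts = [i / 1000 * 0.5 for i in range(1001)]              # grid on [0, 0.5]
vals = [h(t) for t in ts]
assert all(v2 > v1 for v1, v2 in zip(vals, vals[1:]))   # increasing
assert vals[0] == -1.0 and abs(vals[-1]) < 1e-12        # h(0) = -1, h(1/2) = 0
assert all(-1 <= v < 0 for v in vals[:-1])              # bounds on [0, 1/2)
```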
(b) Suppose that the algorithm is implemented with a step length αk that satisfies the strong Wolfe conditions with 0 < c2 < 1/2. Use the result of (a) to show that the method generates descent directions pk that satisfy the following inequalities:

−1/(1 − c2) ≤ pkᵀgk / ‖gk‖² ≤ (2c2 − 1)/(1 − c2), for all k = 0, 1, . . . .
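An illustration of the bounds in (b): run FR-CG (Algorithm 1, Appendix D) on a small convex quadratic. An exact line search is used here for simplicity; on a quadratic it satisfies the strong Wolfe conditions whenever c1 ≤ 1/2, and c2 = 0.25 is an assumed value for the check:

```python
# FR-CG on f(x) = 0.5 x'Qx - b'x; the ratio p'g/||g||^2 should stay inside
# [-1/(1-c2), (2c2-1)/(1-c2)] at every iteration.
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
Q = M.T @ M + np.eye(5)                     # SPD Hessian
b = rng.standard_normal(5)

c2 = 0.25
lo, hi = -1 / (1 - c2), (2 * c2 - 1) / (1 - c2)

x = np.zeros(5)
g = Q @ x - b                               # gradient (the r_k of Algorithm 1)
p = -g
for k in range(20):
    if np.linalg.norm(g) < 1e-8:
        break
    assert lo <= (p @ g) / (g @ g) <= hi    # descent-direction bounds of (b)
    alpha = -(g @ p) / (p @ Q @ p)          # exact line search step
    x = x + alpha * p
    g_new = Q @ x - b
    beta = (g_new @ g_new) / (g @ g)        # Fletcher-Reeves beta
    p = -g_new + beta * p
    g = g_new
```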
(c) Explain the significance of the result.
(d) Explain carefully why the method "sticks" if a "bad" value of pk (satisfying ‖gk‖/‖pk‖ ≪ 1) is generated at iteration k.
(e) What is the significance of this result for the method?
4 The BFGS method updates an estimate Hk of the inverse Hessian in the form:

Hk+1 = (I − γk sk ykᵀ) Hk (I − γk yk skᵀ) + γk sk skᵀ.
(3)

Here sk = xk+1 − xk, yk = ∇f(xk+1) − ∇f(xk) and γk = 1/(skᵀyk).
(a) Derive this update rule by solving the problem:

min over H of ‖H − Hk‖
(3a)
subject to H = Hᵀ, Hyk = sk.
(3b)

Use the "weighted Frobenius norm" ‖A‖W ≡ ‖W^(1/2) A W^(1/2)‖F, where ‖C‖F² ≡ Σ(i=1..n) Σ(j=1..n) Cij². The weight W is an unspecified positive definite symmetric matrix satisfying Wsk = yk.
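A numerical check (illustrative) that the update (3) produces a symmetric matrix satisfying the secant equation Hk+1 yk = sk, and that it preserves positive definiteness when skᵀyk > 0 (cf. parts (c)–(d)); the test data below are arbitrary:

```python
# BFGS inverse-Hessian update, equation (3).
import numpy as np

def bfgs_update(H, s, y):
    gamma = 1.0 / (s @ y)
    I = np.eye(len(s))
    V = I - gamma * np.outer(y, s)
    return V.T @ H @ V + gamma * np.outer(s, s)   # (I-g s y')H(I-g y s')+g s s'

rng = np.random.default_rng(2)
M = rng.standard_normal((3, 3))
Hess = M.T @ M + np.eye(3)         # a model "true Hessian" (SPD)
s = rng.standard_normal(3)
y = Hess @ s                       # guarantees the curvature condition s'y > 0

H = np.eye(3)                      # positive definite initial estimate
H_new = bfgs_update(H, s, y)
assert np.allclose(H_new, H_new.T)              # symmetry
assert np.allclose(H_new @ y, s)                # secant equation H_{k+1} y = s
assert np.all(np.linalg.eigvalsh(H_new) > 0)    # positive definiteness kept
```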
(b) Explain briefly how you could use the BFGS method to generate search directions.
(c) Show that (when the inverse Hessian approximations are generated using the BFGS algorithm) if Hk is positive definite, then the next iterate Hk+1 is also positive definite.
(d) Explain the significance of this result.
5
(a) Consider the constrained minimisation problem:

min over x ∈ R² of f(x) = (x1 − a)² + (x2 − b)² + x1x2
subject to 0 ≤ xi ≤ 1, for i = 1, 2.
(i) Show that the unconstrained minimum of f is at

x = (2/3)(2a − b, 2b − a).

(ii) Write the first-order necessary (KKT) conditions for this problem. (See Appendix E.)
(iii) When a = 7/4 and b = 5/4, show that the unconstrained minimum of f is at x = (3/2, 1/2), an infeasible point.
(iv) Show that, because of the location of the unconstrained minimum, either one or two of the constraints x1 ≤ 1, x2 ≥ 0 and x2 ≤ 1 must be active at the optimal point. Explain why the solution must be either at (A) the vertex (1, 1), (B) on the edge x1 = 1 or (C) at the vertex (1, 0); use a sketch to illustrate your argument.
(v) Solve the KKT equations to find the solution to the constrained problem for these values of a and b; check the three cases A, B and C separately.
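A spot-check for (a)(i) and (iii) (illustrative only): the stationary point of f is x = (2/3)(2a − b, 2b − a), and for a = 7/4, b = 5/4 it is (3/2, 1/2), which violates x1 ≤ 1:

```python
# Verify the stationary point formula and its infeasibility for the given a, b.

def grad_f(x1, x2, a, b):
    # gradient of (x1-a)^2 + (x2-b)^2 + x1*x2
    return (2 * (x1 - a) + x2, 2 * (x2 - b) + x1)

def unconstrained_min(a, b):
    return ((2.0 / 3.0) * (2 * a - b), (2.0 / 3.0) * (2 * b - a))

a, b = 7 / 4, 5 / 4
x1, x2 = unconstrained_min(a, b)
assert abs(x1 - 1.5) < 1e-12 and abs(x2 - 0.5) < 1e-12
g1, g2 = grad_f(x1, x2, a, b)
assert abs(g1) < 1e-12 and abs(g2) < 1e-12     # genuinely stationary
assert not (0 <= x1 <= 1)                      # infeasible: x1 > 1
```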
(b) Consider the problem

min over x ∈ R² of f(x) = −2x1 + x2
subject to (1 − x1)³ − x2 ≥ 0,
x2 + 0.25x1² − 1 ≥ 0.
(4)
(i) Show that the LICQ applies at x* = (0, 1) (the active constraint gradients are linearly independent).
(ii) Show that the KKT conditions are satisfied at x*.
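A numerical check for (b) (illustrative): at x* = (0, 1) both constraints are active, their gradients are independent, and the multipliers solving ∇f = λ1∇c1 + λ2∇c2 are non-negative:

```python
# KKT check at x* = (0, 1) for problem (4).
import numpy as np

x = np.array([0.0, 1.0])
c1 = (1 - x[0])**3 - x[1]                     # constraint values at x*
c2 = x[1] + 0.25 * x[0]**2 - 1
assert abs(c1) < 1e-12 and abs(c2) < 1e-12    # both constraints active

grad_f = np.array([-2.0, 1.0])
grad_c1 = np.array([-3 * (1 - x[0])**2, -1.0])
grad_c2 = np.array([0.5 * x[0], 1.0])

J = np.column_stack([grad_c1, grad_c2])
assert np.linalg.matrix_rank(J) == 2          # LICQ: gradients independent
lam = np.linalg.solve(J, grad_f)              # stationarity of the Lagrangian
assert np.all(lam >= 0)                       # lambda = (2/3, 5/3), both >= 0
```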
6 Consider the equality-constrained Quadratic Program (Q.P.):

min over x of q(x) = (1/2)xᵀGx + xᵀd,
(5a)
subject to aiᵀx = bi, i = 1, . . . , k.
(5b)
(a) Writing the set of k equality constraints (5b) as the matrix equation Ax − b = 0, show that if x* is a local minimum then the KKT conditions (Appendix E) require that there must be a vector λ* of Lagrange multipliers such that the following system of equations is satisfied:

[ G  −Aᵀ ] [ x* ]   [ −d ]
[ A    0 ] [ λ* ] = [  b ]
(6)
(b) If we write x* = x0 + p, where x0 is any estimate of the solution and p the required step to the solution, show that (6) can be re-written as:

[ G  Aᵀ ] [ −p ]   [ g ]
[ A   0 ] [ λ* ] = [ c ]
(7)

where c = Ax0 − b, g = Gx0 + d (the gradient of q at x0) and p = x* − x0.
(c) Let A have full row rank and let Z be the n × (n − k) matrix whose columns form a basis for the null space of A (so that AZ = 0). Show that, if the "reduced-Hessian" matrix ZᵀGZ is positive definite, the KKT matrix

K = [ G  Aᵀ ]
    [ A   0 ]
(8)

is non-singular.
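A small illustration of (c): G need not be positive definite itself; it is enough that the reduced Hessian ZᵀGZ is. The matrices below are example values chosen for the demo:

```python
# K is non-singular even though G is indefinite, because Z'GZ is SPD.
import numpy as np

G = np.diag([1.0, -1.0, 2.0])           # indefinite
A = np.array([[0.0, 1.0, 0.0]])         # full row rank (k = 1, n = 3)
Z = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 1.0]])              # columns span null(A)

assert np.allclose(A @ Z, 0)
reduced = Z.T @ G @ Z                   # = diag(1, 2), positive definite
assert np.all(np.linalg.eigvalsh(reduced) > 0)

K = np.block([[G, A.T], [A, np.zeros((1, 1))]])
assert abs(np.linalg.det(K)) > 1e-12    # the KKT matrix is non-singular
```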
(d) Briefly explain the significance of this result.
(e) Show that this block matrix equation can be reduced to the problem of solving

(AG⁻¹Aᵀ)λ* = AG⁻¹g − c
(9)

for λ* and then solving

−Gp + Aᵀλ* = g
(10)

for p.
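A sketch of the solution procedure in (e), checked on a random problem (the function name and test data are illustrative, not part of the paper):

```python
# Range-space solve of the equality-constrained QP via (9) and (10).
import numpy as np

def solve_eq_qp(G, d, A, b, x0):
    """Solve min 0.5 x'Gx + x'd  s.t.  Ax = b  via equations (9)-(10)."""
    g = G @ x0 + d                      # gradient of q at x0
    c = A @ x0 - b
    G_inv = np.linalg.inv(G)
    lam = np.linalg.solve(A @ G_inv @ A.T, A @ G_inv @ g - c)   # (9)
    p = G_inv @ (A.T @ lam - g)         # (10): -Gp + A'lam = g
    return x0 + p, lam

rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4))
G = M.T @ M + np.eye(4)                 # positive definite
d = rng.standard_normal(4)
A = rng.standard_normal((2, 4))
b = rng.standard_normal(2)

x, lam = solve_eq_qp(G, d, A, b, x0=np.zeros(4))
assert np.allclose(A @ x, b)                    # feasible
assert np.allclose(G @ x + d, A.T @ lam)        # KKT stationarity
```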
(f) Given q(x) = 2x² + 2xy + 2y² − x + y and the inequality constraint 2x + 3y ≥ 1, solve the QP using the method you derived in (e). (First check whether the unconstrained solution is feasible; if it is, you are finished; if not, the inequality constraint must be active at the optimal point.)
Appendix of Results
A The Wolfe conditions for the step length α in a line search require that

f(xk + αpk) ≤ f(xk) + c1α pkᵀg(xk),
(10a)
pkᵀg(xk + αpk) ≥ c2 pkᵀg(xk),
(10b)

where g(x) ≡ ∇f(x) and 0 < c1 < c2 < 1. The strong Wolfe conditions replace (10b) by

|pkᵀg(xk + αpk)| ≤ c2|pkᵀg(xk)|.
(11)
B In terms of a "line" function φ(α) ≡ f(x + αp), the Wolfe conditions for the step length α in a line search require that

φ(α) ≤ φ(0) + c1αφ′(0),
(11a)
φ′(α) ≥ c2φ′(0),
(11b)

where 0 < c1 < c2 < 1. The strong Wolfe conditions replace (11b) by

|φ′(α)| ≤ c2|φ′(0)|.
(12)
C The Trust Region Method is based on the problem:

min over p ∈ Rⁿ of m(p) ≡ f0 + gᵀp + (1/2)pᵀBp, such that ‖p‖ ≤ ∆,
(13)

where f0 is a fixed scalar, g a fixed vector in Rⁿ, B a fixed n × n matrix and ∆ a fixed positive scalar. The "dogleg" method finds an approximate solution to (13) by replacing the (unknown) curved trajectory for p*(∆) with a path consisting of two line segments. The first line segment runs from the starting point to the unconstrained minimiser along the steepest descent direction, defined by

pU = −(gᵀg / gᵀBg) g,
(14)

while the second line segment runs from pU to pB ≡ −B⁻¹g. We can define the trajectory as a path p̃(τ) parameterised by τ as follows:

p̃(τ) = τ pU for 0 ≤ τ ≤ 1, and p̃(τ) = pU + (τ − 1)(pB − pU) for 1 ≤ τ ≤ 2.
(15)
D Algorithm 1 (FR-CGM)
begin
  Given x0;
  set r0 ← ∇f0, p0 ← −r0, k ← 0;
  while rk ≠ 0 do
    αk ← result of line search along pk;
    xk+1 ← xk + αk pk;
    rk+1 ← ∇fk+1;
    βFR,k+1 ← (rk+1ᵀ rk+1)/(rkᵀ rk) ≡ ‖rk+1‖²/‖rk‖²;
    pk+1 ← −rk+1 + βFR,k+1 pk;
    k ← k + 1;
  end (while)
end
E The first-order (KKT) necessary conditions for a point x* with optimal multipliers λ* to be a local solution of a constrained minimisation problem are as follows: Suppose that x* is a local solution and that the LICQ holds at x*. Then there is a Lagrange multiplier vector λ*, with components λ*i, i ∈ E ∪ I, such that the following conditions are satisfied at (x*, λ*):

∇xL(x*, λ*) = 0,
(16a)
ci(x*) = 0, for all i ∈ E,
(16b)
ci(x*) ≥ 0, for all i ∈ I,
(16c)
λ*i ≥ 0, for all i ∈ I,
(16d)
λ*i ci(x*) = 0, for all i ∈ E ∪ I.
(16e)