Problem Set 2

EECS227C - Homework II
February 18, 2014
This assignment is due at the beginning of class on March 4.
1. Let $f$ be a convex, continuously differentiable function, with $L$-Lipschitz derivatives, and
   bounded level sets. That is, suppose that for any $\alpha$, the set
   \[ \Omega_\alpha = \{x : f(x) \le \alpha\} \]
   is bounded. Prove that gradient descent with backtracking line search converges for all such
   $f$. Provide as tight an upper bound on the rate of convergence as you can.
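   For intuition, here is a minimal numerical sketch of gradient descent with Armijo-style
   backtracking; the quadratic test function, the sufficient-decrease constant $10^{-4}$, and the
   shrink factor $1/2$ are illustrative choices, not prescribed by the problem.

```python
import numpy as np

def gradient_descent_backtracking(f, grad, x0, tol=1e-8, max_iter=1000):
    """Gradient descent with Armijo backtracking line search."""
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        t = 1.0
        # Shrink the step until the sufficient-decrease condition holds.
        while f(x - t * g) > f(x) - 1e-4 * t * (g @ g):
            t *= 0.5
        x = x - t * g
    return x

# Illustrative test problem: a convex quadratic with bounded level sets.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
print(gradient_descent_backtracking(f, grad, np.array([4.0, -3.0])))
```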
2. Consider the momentum method
   \[ x^{(k+1)} = x^{(k)} - \alpha_k \nabla f(x^{(k)}) + \beta_k (x^{(k)} - x^{(k-1)}), \]
   where $\alpha_k$ and $\beta_k$ are chosen to minimize $f$ at each iteration. Assume that
   $x^{(0)} = x^{(-1)}$. Show that if $f(x) = \frac{1}{2} x^T A x - b^T x$, then this method is
   equivalent to the conjugate gradient method.
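   One way to see the claimed equivalence numerically (a sketch, not a proof): on a quadratic,
   choosing $\alpha_k$ and $\beta_k$ to minimize $f$ amounts to an exact minimization over the
   plane spanned by $-\nabla f(x^{(k)})$ and $x^{(k)} - x^{(k-1)}$, which has a closed form; the
   random test matrix below is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)       # arbitrary positive definite test matrix
b = rng.standard_normal(5)
grad = lambda x: A @ x - b

# Momentum method: pick (alpha_k, beta_k) by exactly minimizing f over
# the plane x + span{-grad f(x), x - x_prev}.
x_prev = x = np.zeros(5)
momentum_iterates = []
for k in range(5):
    g = grad(x)
    P = np.column_stack([-g, x - x_prev])
    if k == 0:
        P = P[:, :1]              # x^{(0)} = x^{(-1)}: no momentum direction yet
    c = np.linalg.lstsq(P.T @ A @ P, -P.T @ g, rcond=None)[0]
    x_prev, x = x, x + P @ c
    momentum_iterates.append(x)

# Standard conjugate gradient on the same quadratic.
x = np.zeros(5)
r = b - A @ x
p = r.copy()
for k in range(5):
    a = (r @ r) / (p @ A @ p)
    x = x + a * p
    r_new = r - a * A @ p
    p = r_new + ((r_new @ r_new) / (r @ r)) * p
    r = r_new
    print(np.linalg.norm(x - momentum_iterates[k]))   # ~1e-15: iterates agree
```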
3. Let $\ell$ and $L$ be constants satisfying $0 < \ell \le L$. Define
   \[ \alpha := \frac{4}{(\sqrt{L} + \sqrt{\ell})^2}, \qquad
      \beta := \frac{\sqrt{L} - \sqrt{\ell}}{\sqrt{L} + \sqrt{\ell}}, \]
   and the matrix
   \[ T(\lambda) = \begin{bmatrix} 1 + \beta - \alpha\lambda & -\beta \\ 1 & 0 \end{bmatrix}. \]
   Assume throughout that $\lambda \in (\ell, L)$.
   (a) Show that $T(\lambda)$ has two complex eigenvalues $\gamma_1$ and $\gamma_2$, each with
       magnitude $\sqrt{\beta}$.
   (b) Bound the condition number of the complex similarity transform $S(\lambda)$ which satisfies
       \[ S(\lambda)^{-1} T(\lambda) S(\lambda) = \begin{bmatrix} \gamma_1 & 0 \\ 0 & \gamma_2 \end{bmatrix}. \]
       That is, compute $\max_\lambda \|S(\lambda)\| \, \|S(\lambda)^{-1}\|$.
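   Part (a) can be sanity-checked numerically before proving it; the values $\ell = 1$, $L = 10$
   below are arbitrary.

```python
import numpy as np

ell, L = 1.0, 10.0                # arbitrary test constants
alpha = 4 / (np.sqrt(L) + np.sqrt(ell)) ** 2
beta = (np.sqrt(L) - np.sqrt(ell)) / (np.sqrt(L) + np.sqrt(ell))

for lam in np.linspace(ell + 0.01, L - 0.01, 5):
    T = np.array([[1 + beta - alpha * lam, -beta],
                  [1.0, 0.0]])
    # A complex conjugate pair has |gamma_1| = |gamma_2| = sqrt(det T) = sqrt(beta).
    print(lam, np.abs(np.linalg.eigvals(T)), np.sqrt(beta))
```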
4. Recall that in the Heavy Ball Method, we tried to force the matrix
   \[ T = \begin{bmatrix} (1 + \beta)I - \alpha A & -\beta I \\ I & 0 \end{bmatrix} \]
   to have as small a spectral radius as possible when the eigenvalues of $A$ were in the interval
   $(\ell, L)$. What is the smallest norm of $T$ possible? That is, compute
   \[ \min_{\alpha > 0,\; 1 > \beta > 0} \;\; \max_{\ell I \preceq A \preceq L I} \|T\|. \]
   What values of $\alpha$ and $\beta$ achieve the minimum?
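   Since $T$ block-diagonalizes along the eigenvectors of $A$, the inner maximum reduces to a scan
   over $\lambda \in [\ell, L]$ of $2 \times 2$ blocks, so the min-max can be probed with a crude
   grid search; the interval $\ell = 1$, $L = 10$ and the grid resolutions below are arbitrary,
   and this is only a numerical probe, not the requested closed form.

```python
import numpy as np

ell, L = 1.0, 10.0                           # arbitrary test interval

def worst_norm(alpha, beta, lams):
    # max over eigenvalues lambda of A of the spectral norm of the 2x2 block.
    return max(
        np.linalg.norm(np.array([[1 + beta - alpha * lam, -beta],
                                 [1.0, 0.0]]), 2)
        for lam in lams
    )

lams = np.linspace(ell, L, 50)
val, a_best, b_best = min(
    ((worst_norm(a, b, lams), a, b)
     for a in np.linspace(0.01, 1.0, 40)
     for b in np.linspace(0.01, 0.99, 40)),
    key=lambda t: t[0],
)
print(f"approx min-max norm {val:.3f} at alpha={a_best:.3f}, beta={b_best:.3f}")
```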
5. Let $f$ be a convex, continuously differentiable function, with $L$-Lipschitz derivatives.
   Prove that Nesterov's method converges when $\alpha_k = 1/L$ and $\beta_k = \frac{k-1}{k+2}$
   for all such $f$. Provide as tight an upper bound on the rate of convergence as you can.
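   A minimal sketch of the iteration with this schedule, on a made-up quadratic (the test matrix,
   and hence $L = 100$, are illustrative):

```python
import numpy as np

A = np.diag([1.0, 10.0, 100.0])      # illustrative convex quadratic
b = np.ones(3)
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
L = 100.0                            # Lipschitz constant of the gradient

x_prev = x = np.zeros(3)
for k in range(1, 200):
    beta = (k - 1) / (k + 2)
    y = x + beta * (x - x_prev)      # extrapolation (momentum) step
    x_prev, x = x, y - grad(y) / L   # gradient step from y with alpha = 1/L
fstar = f(np.linalg.solve(A, b))
print(f(x) - fstar)                  # suboptimality after 200 iterations
```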
6. Apply Newton’s method with a constant stepsize to minimize the function $f : \mathbb{R}^n \to \mathbb{R}$
   given by
   \[ f(x) := \frac{1}{3} \left( \sum_{i=1}^{n} x_i^2 \right)^{3/2}. \]
   Identify the range of stepsizes for which this method converges. Show that for any stepsize
   within this range, the iterates converge linearly to $x_{\mathrm{opt}} = 0$. Explain why the
   method does not converge quadratically to the optimal solution.
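   A short numerical illustration with stepsize $t = 1$ and an arbitrary starting point; note that
   $f(x) = \frac{1}{3}\|x\|^3$, so its gradient and Hessian have simple closed forms.

```python
import numpy as np

norm = np.linalg.norm

def newton_step(x, t):
    """One Newton step with constant stepsize t for f(x) = ||x||^3 / 3."""
    g = norm(x) * x                                           # gradient
    H = norm(x) * np.eye(len(x)) + np.outer(x, x) / norm(x)   # Hessian
    return x - t * np.linalg.solve(H, g)

x = np.array([3.0, -4.0])         # arbitrary starting point
for _ in range(10):
    x = newton_step(x, t=1.0)
    print(norm(x))                # shrinks by a constant factor each iteration
```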
7. Derive closed form expressions for the following quasi-Newton methods:
   (a) $H^{(k+1)}$ a diagonal matrix: $H^{(k+1)} = \mathrm{diag}(h_1, \ldots, h_n)$ for all $k$.
       Minimize the distance to the last iterate in the operator norm subject to
       $s^{(k)} = H^{(k+1)} y^{(k)}$.
   (b) $H^{(k+1)}$ a rank-one update: $H^{(k+1)} = H^{(k)} + vv^T$. Ensure $s^{(k)} = H^{(k+1)} y^{(k)}$.
   In both cases, determine whether $H^{(k)}$ is positive definite for all $k$. Explain why or why not.
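   For part (b), a sketch of a rank-one secant update (the symmetric rank-one, or SR1, form) with
   a check of the secant equation on made-up data; the strict $vv^T$ form only exists when
   $(s - Hy)^T y > 0$.

```python
import numpy as np

def rank_one_update(H, s, y):
    """Rank-one secant update: H+ = H + u u^T / (u^T y) with u = s - H y.

    Then H+ y = H y + u = s, so the secant equation holds. The update has
    the form v v^T only when u^T y > 0 (otherwise the coefficient is negative).
    """
    u = s - H @ y
    return H + np.outer(u, u) / (u @ y)

rng = np.random.default_rng(1)
H = np.eye(3)
s, y = rng.standard_normal(3), rng.standard_normal(3)
H_new = rank_one_update(H, s, y)
print(np.allclose(H_new @ y, s))    # True: secant equation satisfied
```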