E0 229: Foundations of Data Science
Lecturer: Ravi Kannan        Scribe: Vidya Narayanan
Lecture 9        6-Feb-2014

Power Method for Computing SVD

The power method is a conceptual starting point for algorithms that compute singular vectors. The intent is to look at an algorithm that approximately computes the SVD and to establish clean bounds (without many parameters) on the approximation.

Let A be an n x d matrix of rank r and let A = Σ_t σ_t u_t v_t^T be its SVD. We want to find the first singular vector v_1 of A. (Since the greedy approach works, we can then repeat the process on the subspace orthogonal to v_1 to find the next singular vector, and so on.)

Let B = A^T A, a square (d x d) matrix. Then

    B = A^T A = [Σ_s σ_s u_s v_s^T]^T [Σ_t σ_t u_t v_t^T]
      = Σ_s Σ_t σ_s σ_t v_s (u_s^T u_t) v_t^T
      = Σ_t σ_t^2 v_t v_t^T                          (since u_s^T u_t = 0 for s ≠ t)

    B^2 = Σ_t σ_t^4 v_t v_t^T,   and in general   B^k = Σ_t σ_t^{2k} v_t v_t^T.

Assume σ_1 > σ_2. Then

    B^k / σ_1^{2k} = v_1 v_1^T + (σ_2^{2k} / σ_1^{2k}) v_2 v_2^T + (σ_3^{2k} / σ_1^{2k}) v_3 v_3^T + ...

    lim_{k→∞} B^k / σ_1^{2k} = v_1 v_1^T.

From the matrix v_1 v_1^T it is easy to find v_1, since each column is a scalar multiple of v_1. The value of k needed depends on the gap σ_1 − σ_2. If σ_2 = (1 − ε) σ_1, then

    (σ_2 / σ_1)^{2k} = (1 − ε)^{2k} ≈ e^{−2kε}       (since 1 + x ≤ e^x)

so, as a rule of thumb, we need k ≳ 1/ε (for small ε).

An issue with the above approach is the use of matrix-matrix multiplication: each step is a d x d matrix multiplication requiring O(d^3) computation, which is costly. Moreover, A is often a large, sparse matrix, but B = A^T A need not be sparse, and it may not even be easy to store all the entries of B. To avoid matrix multiplication, consider the following. Let x be a random unit d-dimensional vector.

First, B^k x also gives us v_1 in the limit:

    B^k x = Σ_t σ_t^{2k} v_t (v_t^T x)

    B^k x / σ_1^{2k} = v_1 (v_1^T x) + (σ_2^{2k} / σ_1^{2k}) v_2 (v_2^T x) + (σ_3^{2k} / σ_1^{2k}) v_3 (v_3^T x) + ...

    lim_{k→∞} B^k x / σ_1^{2k} = v_1 (v_1^T x) = c · v_1        (c = v_1^T x, a scalar)

Second, B^k x can be computed by matrix-vector multiplication alone, which is computationally much cheaper than matrix multiplication:

    B^k x = B (B^{k−1} x) = A^T (A (B^{k−1} x)),

so B^k x can be computed with 2k matrix-vector multiplications. However, if x is orthogonal or nearly orthogonal to v_1, then lim_{k→∞} B^k x / σ_1^{2k} = v_1 (v_1^T x) = 0 (or nearly 0), and the method fails to recover v_1.

If σ_1 = σ_2 or σ_1 ≈ σ_2, the above procedure finds some v in the space spanned by v_1 and v_2 with |Av| = σ_1. The following theorem shows that the vector found by the power method has a high component in the space spanned by the singular vectors corresponding to the (nearly) highest singular values, and a low component in the orthogonal space.

Let w = (A^T A)^k x / |(A^T A)^k x| be the result of running the power method for k steps. Let V be the subspace spanned by the singular vectors v_1, ..., v_l with σ_1, ..., σ_l ≥ (1 − ε) σ_1, while σ_{l+1}, ..., σ_r < (1 − ε) σ_1. Assume x satisfies |v_1^T x| ≥ δ > 0.

Theorem 1. If k ≥ ln(1/(εδ)) / (2ε), the component of w along the subspace V spanned by the first l singular vectors is at least 1 − ε.

Proof. The component of (A^T A)^k x along V is at least its component along v_1:

    |v_1^T (A^T A)^k x| = |v_1^T (Σ_t σ_t^{2k} v_t v_t^T) x| = |σ_1^{2k} v_1^T x| ≥ δ σ_1^{2k}.

In particular, |(A^T A)^k x| ≥ δ σ_1^{2k}. The squared component of (A^T A)^k x along the space orthogonal to V is

    [v_{l+1}^T (A^T A)^k x]^2 + [v_{l+2}^T (A^T A)^k x]^2 + ...
      = (σ_{l+1}^{2k} v_{l+1}^T x)^2 + (σ_{l+2}^{2k} v_{l+2}^T x)^2 + ...
      = σ_{l+1}^{4k} (v_{l+1}^T x)^2 + σ_{l+2}^{4k} (v_{l+2}^T x)^2 + ...
      ≤ (1 − ε)^{4k} σ_1^{4k} ((v_{l+1}^T x)^2 + (v_{l+2}^T x)^2 + ...)
      ≤ (1 − ε)^{4k} σ_1^{4k} |x|^2 = (1 − ε)^{4k} σ_1^{4k}     (since v_{l+1}, ..., v_r can be extended to an orthonormal basis)

Hence the component of w along the space orthogonal to V is at most

    (1 − ε)^{2k} σ_1^{2k} / |(A^T A)^k x| ≤ (1 − ε)^{2k} σ_1^{2k} / (δ σ_1^{2k}) = (1 − ε)^{2k} / δ ≤ e^{−2kε − ln δ} = e^{ln ε} = ε

for k = ln(1/(εδ)) / (2ε)   (using 1 + x ≤ e^x). The component of w along V is therefore at least √(1 − ε^2) ≥ 1 − ε. ∎

For a random unit vector x, with high probability |v_1^T x| = Ω(1/√d) (Lemma 4.10 in the text), so taking δ = Ω(1/√d), even with k ≥ c · ln(d/ε) / ε (c a positive constant) the power method converges.
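As a concrete illustration, here is a minimal NumPy sketch of the procedure just analyzed (the function name, the seed, and the final check against numpy.linalg.svd are illustrative choices, not part of the lecture). It computes w = (A^T A)^k x / |(A^T A)^k x| using only matrix-vector products, renormalizing at every step, which changes the scale but not the direction of the iterate:

    import numpy as np

    def power_method(A, k, seed=0):
        """Sketch: approximate the top right singular vector v_1 of A by
        computing (A^T A)^k x for a random unit vector x, using 2k
        matrix-vector products and renormalizing to avoid overflow."""
        _, d = A.shape
        rng = np.random.default_rng(seed)
        x = rng.standard_normal(d)
        x /= np.linalg.norm(x)        # random unit vector; |v_1^T x| = Omega(1/sqrt(d)) w.h.p.
        for _ in range(k):
            x = A.T @ (A @ x)         # one application of B = A^T A, without ever forming B
            x /= np.linalg.norm(x)    # rescale only; the direction is what we are after
        return x

    # Usage: compare against the exact top right singular vector.
    A = np.random.default_rng(1).standard_normal((200, 50))
    w = power_method(A, k=100)
    v1 = np.linalg.svd(A)[2][0]       # row 0 of V^T
    print(abs(w @ v1))                # close to 1 when sigma_1 > sigma_2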
We can now carry out a greedy procedure for finding the next singular vector: run the power method again in the space perpendicular to w, and repeat; a sketch follows below. The greedy algorithm still works for approximate SVD, though the proof requires more work. (By approximate SVD we mean that the left singular vectors are only approximately orthogonal: when A = Σ_t σ'_t u_t w_t^T, where the w_t were computed using the approximate SVD, the w_t are orthogonal by construction, but the u_t are only nearly orthogonal. Note that the equality itself is exact.)
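A sketch of this greedy procedure, under the same illustrative assumptions as the previous snippet: each new run of the power method starts from, and is repeatedly projected back into, the subspace perpendicular to the vectors already found, so the computed w_t are orthonormal by construction.

    import numpy as np

    def greedy_svd(A, r, k, seed=0):
        """Sketch: greedily compute r approximate right singular vectors.
        Each power-method run is confined to the orthogonal complement of
        the vectors found so far; sigma_t is estimated as |A w_t|."""
        _, d = A.shape
        rng = np.random.default_rng(seed)
        W = []                                 # orthonormal by construction
        for _ in range(r):
            x = rng.standard_normal(d)
            for w in W:                        # start perpendicular to earlier vectors
                x -= (w @ x) * w
            x /= np.linalg.norm(x)
            for _ in range(k):
                x = A.T @ (A @ x)
                for w in W:                    # re-project: keeps the iterate in span(W)^perp
                    x -= (w @ x) * w
                x /= np.linalg.norm(x)
            W.append(x)
        sigmas = np.array([np.linalg.norm(A @ w) for w in W])
        return np.array(W), sigmas

Re-projecting inside the loop, rather than only at the start, guards against numerical drift of the iterate back toward the top singular directions.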