STOR 565: Introduction to Machine Learning
Principal Component Analysis (PCA)
Andrew Nobel
January, 2014

Approximating a Set of Vectors

Given: vectors $u_1, \ldots, u_n \in \mathbb{R}^p$, centered so that $\sum_i u_i = 0$.

Goal: find a low dimensional summary of $u_1, \ldots, u_n$; more precisely, a subspace $V$ of $\mathbb{R}^p$ such that
- $\dim(V)$ is much less than $p$ (and $n$)
- the projection of $u_j$ onto $V$ is close to $u_j$

Note: the smallest subspace of $\mathbb{R}^p$ containing $\{u_i\}$ is $V = \mathrm{span}(u_1, \ldots, u_n)$, with $\dim(V) \le n$.

Approximating a Set of Vectors (continued)

Consider an approximating subspace $V$ of dimension 1; equivalently, $V = \{\alpha u_0 : \alpha \in \mathbb{R}\}$ for some $u_0 \in \mathbb{R}^p$ with $\|u_0\| = 1$.

Note: the projection of $u$ onto $V$ is $(u^t u_0)\, u_0$.

Two goals:
- find $u_0$ to maximize $\mathrm{Var}(\{u_1^t u_0, \ldots, u_n^t u_0\})$
- find $u_0$ to minimize $n^{-1} \sum_{i=1}^n \|u_i - (u_i^t u_0)\, u_0\|^2$

Variance of Projections

By definition of the variance,
$$
\mathrm{Var}(\{u_1^t u_0, \ldots, u_n^t u_0\})
= \frac{1}{n} \sum_{i=1}^n (u_i^t u_0)^2 - \Big[\frac{1}{n} \sum_{i=1}^n u_i^t u_0\Big]^2
= \frac{1}{n} \sum_{i=1}^n (u_i^t u_0)^2 - \Big[u_0^t \Big(\frac{1}{n} \sum_{i=1}^n u_i\Big)\Big]^2
= \frac{1}{n} \sum_{i=1}^n (u_i^t u_0)^2 .
$$
The last equality follows since $\sum_i u_i = 0$ (the data are centered).

Sum of Squares Fit

Consider the sum of squares. Expanding the square in the $i$th term gives
$$
\|u_i - (u_i^t u_0)\, u_0\|^2
= \|u_i\|^2 - 2\,(u_i^t u_0)^2 + (u_i^t u_0)^2 \|u_0\|^2
= \|u_i\|^2 - (u_i^t u_0)^2 .
$$
Therefore
$$
\frac{1}{n} \sum_{i=1}^n \|u_i - (u_i^t u_0)\, u_0\|^2
= \frac{1}{n} \sum_{i=1}^n \|u_i\|^2 - \frac{1}{n} \sum_{i=1}^n (u_i^t u_0)^2
= \frac{1}{n} \sum_{i=1}^n \|u_i\|^2 - \mathrm{Var}(\{u_i^t u_0\}) . \qquad (1)
$$
Conclusion: choosing $u_0$ to minimize the sum of squares fit is equivalent to maximizing the variance of the projection lengths.

Empirical Covariance Matrix

Define $X$ to be the $n \times p$ matrix with rows $u_1^t, \ldots, u_n^t$, and let $\Sigma = n^{-1} X^t X$ be the $p \times p$ covariance matrix of $X$. Note that
$$
\mathrm{Var}(\{u_i^t u_0\})
= \frac{1}{n} \sum_{i=1}^n (u_i^t u_0)^2
= \frac{1}{n} \sum_{i=1}^n (u_0^t u_i)(u_i^t u_0)
= u_0^t \Big(\frac{1}{n} \sum_{i=1}^n u_i u_i^t\Big) u_0
= u_0^t \Big(\frac{1}{n} X^t X\Big) u_0
= u_0^t \Sigma\, u_0 .
$$

Best 1-Dimensional Subspace

Upshot: the variance of the projections $\{u_1^t u_0, \ldots, u_n^t u_0\}$ onto $u_0$ is equal to $u_0^t \Sigma\, u_0$, which is maximized when $u_0$ is the leading eigenvector of $\Sigma$.

Let $\lambda_1 \ge \cdots \ge \lambda_p \ge 0$ be the eigenvalues of $\Sigma$, with corresponding orthonormal eigenvectors $v_1, \ldots, v_p$.

Fact: the best one dimensional approximation of $u_1, \ldots, u_n$ is obtained by projecting the data vectors onto the line $V_1 = \mathrm{span}\{v_1\}$, in the direction of the principal eigenvector of $\Sigma$.

Approximation Error of $V_1$

By (1) and the fact that $\frac{1}{n} \sum_{i=1}^n \|u_i\|^2 = \mathrm{tr}(\Sigma)$, the approximation error is
$$
\frac{1}{n} \sum_{i=1}^n \|u_i - (u_i^t v_1)\, v_1\|^2
= \frac{1}{n} \sum_{i=1}^n \|u_i\|^2 - v_1^t \Sigma\, v_1
= \mathrm{tr}(\Sigma) - \lambda_1
= \sum_{i=1}^p \lambda_i - \lambda_1
= \sum_{i=2}^p \lambda_i .
$$
The residual error after projecting onto $v_1$ is the sum of the remaining eigenvalues $2$ through $p$.
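The identities above are easy to check numerically. The following is a minimal numpy sketch (not part of the original slides, with synthetic data and illustrative variable names): it builds a centered data matrix, forms the empirical covariance $\Sigma = n^{-1} X^t X$, and verifies that the variance of the projections onto the leading eigenvector equals $\lambda_1$ while the one-dimensional reconstruction error equals $\lambda_2 + \cdots + \lambda_p$.

```python
import numpy as np

# Synthetic centered data: n vectors u_1, ..., u_n in R^p, stored as rows of X.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p)) @ np.diag(np.linspace(3.0, 0.5, p))
X = X - X.mean(axis=0)                      # center so that sum_i u_i = 0

Sigma = X.T @ X / n                          # empirical covariance, p x p
evals, evecs = np.linalg.eigh(Sigma)         # eigh returns ascending eigenvalues
evals, evecs = evals[::-1], evecs[:, ::-1]   # reorder: lambda_1 >= ... >= lambda_p
v1 = evecs[:, 0]                             # leading eigenvector

# Variance of the projection lengths equals v1' Sigma v1 = lambda_1.
proj_lengths = X @ v1
print(np.allclose(proj_lengths.var(), evals[0]))

# One-dimensional reconstruction error equals the sum of the remaining eigenvalues.
residual = X - np.outer(proj_lengths, v1)
err = (residual ** 2).sum(axis=1).mean()
print(np.allclose(err, evals[1:].sum()))
```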
Higher Order PCs

For $1 \le d \le p$, the $d$-dimensional subspace $V$ of $\mathbb{R}^p$ minimizing
$$
\frac{1}{n} \sum_{i=1}^n \|u_i - \mathrm{proj}_V(u_i)\|^2
$$
is $V_d = \mathrm{span}\{v_1, \ldots, v_d\}$, the span of the $d$ leading eigenvectors of $\Sigma$.

In this case, the projection of $u$ onto $V_d$ is
$$
\mathrm{proj}_{V_d}(u) = \sum_{j=1}^d (u^t v_j)\, v_j
$$
and the approximation error of $V_d$ is given by
$$
\frac{1}{n} \sum_{i=1}^n \|u_i - \mathrm{proj}_{V_d}(u_i)\|^2 = \sum_{i=d+1}^p \lambda_i .
$$

Principal Components

Definitions:
- The eigenvectors $v_i$ of $\Sigma$ are called the principal component directions (loadings) of $\{u_i\}$ or $X$.
- The projection $(u_i^t v_j)\, v_j$ is called the $j$th principal component of $u_i$.

Definition: the percentage of variation captured by the first $d$ principal components, equivalently by the subspace $V_d$, is
$$
\frac{\sum_{i=1}^d \lambda_i}{\sum_{j=1}^p \lambda_j} \times 100 .
$$

TCGA Gene Expression Data

Heat map of gene expression data from The Cancer Genome Atlas (TCGA).

Samples:
- 95 Luminal A breast tumors
- 122 Basal breast tumors

Variables: 2000 randomly selected genes

PCA on TCGA Expression Data

[Figure: projections of the sample data onto the first four principal components (PC1-PC4) of the TCGA dataset. Colors represent subtype of cancer: Luminal A and Basal.]

Image Data

Data: $X$ = 458 x 685 matrix of pixel intensities.

Question: can we project the columns of the image onto a low dimensional subspace and still reconstruct the image?

Approach: project the columns of $X$ onto the $d$ leading eigenvectors of their empirical covariance matrix.

Proportion Variation Explained

[Figure: proportion of variation explained (roughly 0.8 to 1) versus number of principal components (0 to 100).]

Image Reconstruction
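The column-projection approach described above can be sketched in a few lines of numpy. This is an illustrative sketch rather than course code: it assumes the image is already available as a 458 x 685 array `img`, and it centers the columns before forming their covariance, a choice the slides do not spell out. Function and variable names are ours.

```python
import numpy as np

def reconstruct(img, d):
    """Project the columns of img onto the d leading eigenvectors of their
    empirical covariance matrix and map back to pixel space."""
    col_mean = img.mean(axis=1, keepdims=True)       # mean column, shape (458, 1)
    U = img - col_mean                               # centered columns
    Sigma = U @ U.T / U.shape[1]                     # 458 x 458 covariance of the columns
    evals, evecs = np.linalg.eigh(Sigma)             # ascending eigenvalues
    V = evecs[:, ::-1][:, :d]                        # d leading eigenvectors
    scores = V.T @ U                                 # coordinates of each column in V_d
    approx = col_mean + V @ scores                   # rank-d reconstruction of the image
    explained = evals[::-1][:d].sum() / evals.sum()  # proportion of variation explained
    return approx, explained

# Usage (with img a 458 x 685 array of pixel intensities):
#   approx, explained = reconstruct(img, d=20)
```

Equivalently, the same rank-$d$ reconstruction can be obtained from a truncated singular value decomposition of the centered matrix of columns.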