
STOR 565: Introduction to Machine Learning
Principal Component Analysis (PCA)
Andrew Nobel
January, 2014
Approximating a Set of Vectors
Given: Vectors $u_1, \ldots, u_n \in \mathbb{R}^p$, centered so that $\sum_{i=1}^n u_i = 0$
Goal: Find a low-dimensional summary of $u_1, \ldots, u_n$; more precisely, a subspace $V$ of $\mathbb{R}^p$ such that
$\dim(V)$ is much less than $p$ (and $n$)
the projection of each $u_j$ onto $V$ is close to $u_j$
Note: The smallest subspace of $\mathbb{R}^p$ containing $\{u_i\}$ is $V = \mathrm{span}(u_1, \ldots, u_n)$, with $\dim(V) \le n$.
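As a quick added illustration (not part of the original slides), a minimal numpy sketch of the centering assumption above; the data matrix X and its dimensions are arbitrary:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))          # n = 100 vectors u_i in R^5, one per row

    U = X - X.mean(axis=0)                 # center: subtract the column means
    print(np.allclose(U.sum(axis=0), 0))   # True: the centered u_i sum to zero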
Approximating a Set of Vectors
Consider an approximating subspace $V$ of dimension 1; equivalently,
\[
V = \{\alpha\, u_0 : \alpha \in \mathbb{R}\} \quad \text{for some } u_0 \in \mathbb{R}^p \text{ with } \|u_0\| = 1 .
\]
Note: The projection of $u$ onto $V$ is $(u^t u_0)\, u_0$.
Two Goals:
Find $u_0$ to maximize $\mathrm{Var}(\{u_1^t u_0, \ldots, u_n^t u_0\})$
Find $u_0$ to minimize $\frac{1}{n} \sum_{i=1}^n \|u_i - (u_i^t u_0)\, u_0\|^2$
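An added sketch of the rank-one projection formula above (illustrative only; $u_0$ is just a random unit vector here):

    import numpy as np

    rng = np.random.default_rng(1)
    u = rng.normal(size=4)                   # a data vector in R^4
    u0 = rng.normal(size=4)
    u0 /= np.linalg.norm(u0)                 # unit vector spanning V

    proj = (u @ u0) * u0                     # projection of u onto V = span{u0}
    print(np.allclose((u - proj) @ u0, 0))   # True: the residual is orthogonal to u0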
Variance of Projections
By definition of the variance,
\[
\mathrm{Var}(\{u_1^t u_0, \ldots, u_n^t u_0\})
= \frac{1}{n}\sum_{i=1}^n (u_i^t u_0)^2 - \Big[\frac{1}{n}\sum_{i=1}^n u_i^t u_0\Big]^2
= \frac{1}{n}\sum_{i=1}^n (u_i^t u_0)^2 - \Big[u_0^t \Big(\frac{1}{n}\sum_{i=1}^n u_i\Big)\Big]^2
= \frac{1}{n}\sum_{i=1}^n (u_i^t u_0)^2 .
\]
The last equality follows since $\sum_i u_i = 0$ (the data are centered).
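A quick numerical check (added, not from the slides) that for centered data the variance of the projection lengths reduces to the mean of their squares:

    import numpy as np

    rng = np.random.default_rng(2)
    U = rng.normal(size=(200, 6))
    U -= U.mean(axis=0)                # centered rows u_i
    u0 = rng.normal(size=6)
    u0 /= np.linalg.norm(u0)

    t = U @ u0                         # projection lengths u_i^t u_0
    print(np.allclose(t.var(), np.mean(t ** 2)))   # True, since the data are centered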
Sum of Squares Fit
Consider the sum of squares criterion. Expanding the $i$th term gives
\[
\|u_i - (u_i^t u_0)\, u_0\|^2 = \|u_i\|^2 - 2\,(u_i^t u_0)^2 + (u_i^t u_0)^2 \|u_0\|^2 = \|u_i\|^2 - (u_i^t u_0)^2 ,
\]
using $\|u_0\| = 1$. Therefore
\[
\frac{1}{n}\sum_{i=1}^n \|u_i - (u_i^t u_0)\, u_0\|^2
= \frac{1}{n}\sum_{i=1}^n \|u_i\|^2 - \frac{1}{n}\sum_{i=1}^n (u_i^t u_0)^2
= \frac{1}{n}\sum_{i=1}^n \|u_i\|^2 - \mathrm{Var}(\{u_i^t u_0\}) . \tag{1}
\]
Conclusion: Choosing $u_0$ to minimize the sum of squares fit is equivalent to choosing $u_0$ to maximize the variance of the projection lengths.
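An added numerical check of identity (1) and of the equivalence just stated:

    import numpy as np

    rng = np.random.default_rng(3)
    U = rng.normal(size=(150, 5))
    U -= U.mean(axis=0)                          # centered rows u_i
    u0 = rng.normal(size=5)
    u0 /= np.linalg.norm(u0)

    t = U @ u0                                   # projection lengths
    sse = np.mean(np.sum((U - np.outer(t, u0)) ** 2, axis=1))
    rhs = np.mean(np.sum(U ** 2, axis=1)) - t.var()
    print(np.allclose(sse, rhs))                 # True: equation (1)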
Empirical Covariance Matrix
Define $X$ to be the $n \times p$ matrix with rows $u_1^t, \ldots, u_n^t$, and let $\Sigma = n^{-1} X^t X$ be the $p \times p$ covariance matrix of $X$. Note that
\[
\mathrm{Var}(\{u_i^t u_0\})
= \frac{1}{n}\sum_{i=1}^n (u_i^t u_0)^2
= \frac{1}{n}\sum_{i=1}^n (u_0^t u_i)(u_i^t u_0)
= u_0^t \Big(\frac{1}{n}\sum_{i=1}^n u_i u_i^t\Big) u_0
= u_0^t \Big(\frac{1}{n} X^t X\Big) u_0
= u_0^t \Sigma\, u_0 .
\]
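A minimal added sketch of the empirical covariance matrix and the quadratic-form identity above (arbitrary example data):

    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.normal(size=(300, 4))
    X -= X.mean(axis=0)                    # centered data matrix, rows u_i^t

    Sigma = (X.T @ X) / X.shape[0]         # empirical covariance, p x p
    u0 = rng.normal(size=4)
    u0 /= np.linalg.norm(u0)
    print(np.allclose((X @ u0).var(), u0 @ Sigma @ u0))   # True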
Best 1-Dimensional Subspace
Upshot: The variance of the projections $\{u_1^t u_0, \ldots, u_n^t u_0\}$ onto $u_0$ is equal to $u_0^t \Sigma\, u_0$. This is maximized when $u_0$ is the leading eigenvector of $\Sigma$.
Let $\lambda_1 \ge \cdots \ge \lambda_p \ge 0$ be the eigenvalues of $\Sigma$, with corresponding orthonormal eigenvectors $v_1, \ldots, v_p$.
Fact: The best one-dimensional approximation for $u_1, \ldots, u_n$ is obtained by projecting the data vectors onto the line
\[
V_1 = \mathrm{span}\{v_1\},
\]
that is, in the direction of the principal eigenvector of $\Sigma$.
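An added sketch (arbitrary example data) of computing the leading eigenvector of $\Sigma$ and projecting onto $V_1$:

    import numpy as np

    rng = np.random.default_rng(5)
    X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))   # correlated columns
    X -= X.mean(axis=0)

    Sigma = (X.T @ X) / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(Sigma)     # eigenvalues in ascending order
    v1 = eigvecs[:, -1]                          # leading eigenvector of Sigma

    scores = X @ v1                              # projection lengths onto V_1
    print(np.allclose(scores.var(), eigvals[-1]))   # variance of scores = lambda_1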
Approximation Error of V1
By (1) and the fact that $\frac{1}{n}\sum_{i=1}^n \|u_i\|^2 = \mathrm{tr}(\Sigma)$, the approximation error is
\[
\frac{1}{n}\sum_{i=1}^n \|u_i - (u_i^t v_1)\, v_1\|^2
= \frac{1}{n}\sum_{i=1}^n \|u_i\|^2 - v_1^t \Sigma\, v_1
= \mathrm{tr}(\Sigma) - \lambda_1
= \sum_{i=1}^p \lambda_i - \lambda_1
= \sum_{i=2}^p \lambda_i .
\]
The residual error after projecting onto $v_1$ is the sum of the remaining eigenvalues $2$ through $p$.
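An added numerical check that the residual error after projecting onto $v_1$ matches the sum of the remaining eigenvalues:

    import numpy as np

    rng = np.random.default_rng(6)
    X = rng.normal(size=(250, 5))
    X -= X.mean(axis=0)

    Sigma = (X.T @ X) / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(Sigma)      # ascending order
    v1 = eigvecs[:, -1]

    residuals = X - np.outer(X @ v1, v1)          # u_i - (u_i^t v_1) v_1
    error = np.mean(np.sum(residuals ** 2, axis=1))
    print(np.allclose(error, eigvals[:-1].sum())) # True: lambda_2 + ... + lambda_p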
Higher Order PCs
For $1 \le d \le p$, the $d$-dimensional subspace $V$ of $\mathbb{R}^p$ minimizing
\[
\frac{1}{n}\sum_{i=1}^n \|u_i - \mathrm{proj}_V(u_i)\|^2
\]
is $V_d = \mathrm{span}\{v_1, \ldots, v_d\}$, the span of the $d$ leading eigenvectors of $\Sigma$. In this case, the projection of $u$ onto $V_d$ is
\[
\mathrm{proj}_{V_d}(u) = \sum_{j=1}^d (u^t v_j)\, v_j ,
\]
and the approximation error of $V_d$ is given by
\[
\frac{1}{n}\sum_{i=1}^n \|u_i - \mathrm{proj}_{V_d}(u_i)\|^2 = \sum_{i=d+1}^p \lambda_i .
\]
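An added sketch of the $d$-dimensional case, checking the approximation-error formula above (d = 3 is illustrative):

    import numpy as np

    rng = np.random.default_rng(7)
    X = rng.normal(size=(200, 8))
    X -= X.mean(axis=0)

    Sigma = (X.T @ X) / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(Sigma)
    V = eigvecs[:, ::-1]                       # columns v_1, ..., v_p (descending eigenvalues)
    lam = eigvals[::-1]

    d = 3
    Vd = V[:, :d]                              # d leading eigenvectors
    proj = X @ Vd @ Vd.T                       # proj_{V_d}(u_i) for each row
    error = np.mean(np.sum((X - proj) ** 2, axis=1))
    print(np.allclose(error, lam[d:].sum()))   # True: lambda_{d+1} + ... + lambda_p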
Principal Components
Definitions:
The eigenvectors $v_i$ of $\Sigma$ are called the principal component directions (loadings) of $\{u_i\}$ or $X$.
The projection $(u_i^t v_j)\, v_j$ is called the $j$th principal component of $u_i$.
Definition: The percentage of variation captured by the first $d$ principal components, equivalently by the subspace $V_d$, is
\[
\frac{\sum_{i=1}^d \lambda_i}{\sum_{j=1}^p \lambda_j} \times 100 .
\]
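An added sketch computing this percentage for every $d$ at once (the cumulative explained-variance curve used in the image example below):

    import numpy as np

    rng = np.random.default_rng(8)
    X = rng.normal(size=(100, 10))
    X -= X.mean(axis=0)

    Sigma = (X.T @ X) / X.shape[0]
    lam = np.sort(np.linalg.eigvalsh(Sigma))[::-1]   # lambda_1 >= ... >= lambda_p

    pct = 100 * np.cumsum(lam) / lam.sum()           # entry d-1: % captured by first d PCs
    print(pct)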
TCGA Gene Expression Data
Heat map of gene expression data from The Cancer Genome Atlas (TCGA)
Samples: 95 Luminal A breast tumors, 122 Basal breast tumors
Variables: 2000 randomly selected genes
PCA on TCGA Expression Data
[Scatterplot matrix: pairwise projections onto PC1, PC2, PC3, and PC4]
Figure: Projections of the sample data onto the first four principal components of the TCGA dataset. Colors represent the subtype of cancer: Luminal A and Basal.
Image Data
Data: X = 458 × 685 matrix of pixel intensities
Question: Can we project the columns of the image onto a low-dimensional
subspace and still reconstruct the image?
Approach: Project columns of X onto the d leading
eigenvectors of their empirical covariance matrix.
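A sketch of this approach (added; a random placeholder array stands in for the actual 458 x 685 image, and d = 20 is illustrative):

    import numpy as np

    img = np.random.rand(458, 685)            # placeholder for the real pixel-intensity matrix X

    col_mean = img.mean(axis=1, keepdims=True)
    U = img - col_mean                        # each column is a data vector in R^458, centered

    Sigma = (U @ U.T) / U.shape[1]            # empirical covariance of the columns (458 x 458)
    eigvals, eigvecs = np.linalg.eigh(Sigma)

    d = 20
    Vd = eigvecs[:, -d:]                      # d leading eigenvectors
    recon = Vd @ (Vd.T @ U) + col_mean        # project columns onto V_d, add the mean back
    print(np.linalg.norm(img - recon) / np.linalg.norm(img))   # relative reconstruction error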
Proportion Variation Explained
[Plot: Proportion of Variation Explained (y-axis, roughly 0.80 to 1.00) versus Number of Principal Components (x-axis, 0 to 100)]
Image Reconstruction