Gaussian Graphical Models
Jiang Guo
January 9th, 2013
Contents
1. Gaussian Graphical Models
   - Undirected Graphical Model
   - Gaussian Graphical Model
   - Precision matrix estimation
   - Main approaches: Sparsity, Graphical Lasso, CLIME, TIGER, Greedy methods
   - Measure methods
   - Non-Gaussian scenario
   - Applications
   - Project
2. Arguable points
Undirected Graphical Model
Markov Property
- pairwise: X_u ⊥ X_v | X_{V \ {u,v}} if (u, v) ∉ E
- local
- global
Conditional Independence
Partial Correlation
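For Gaussian variables, the partial correlation between X_u and X_v given all remaining variables can be read directly off the precision matrix: ρ_{uv·rest} = -Ω_uv / sqrt(Ω_uu Ω_vv). A minimal numpy sketch (the toy covariance below is only for illustration):

    import numpy as np

    # Toy covariance over three variables (illustrative values only)
    Sigma = np.array([[1.0, 0.5, 0.2],
                      [0.5, 1.0, 0.3],
                      [0.2, 0.3, 1.0]])
    Omega = np.linalg.inv(Sigma)  # precision matrix

    def partial_corr(Omega, u, v):
        """Partial correlation of X_u and X_v given all other variables."""
        return -Omega[u, v] / np.sqrt(Omega[u, u] * Omega[v, v])

    print(partial_corr(Omega, 0, 1))

A zero entry Ω_uv therefore corresponds to zero partial correlation, i.e. conditional independence in the Gaussian case.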
Gaussian Graphical Model
Multivariate Gaussian
X ∼ N_d(ξ, Σ)
If Σ is positive definite, the distribution has a density on R^d:

    f(x | ξ, Σ) = (2π)^{-d/2} (det Ω)^{1/2} exp(-(x - ξ)^T Ω (x - ξ) / 2)

where Ω = Σ^{-1} is the precision matrix of the distribution.
Marginal distribution: X_S ∼ N_{|S|}(ξ_S, Σ_{S,S})
Conditional distribution:

    X_1 | X_2 ∼ N(ξ_{1|2}, Σ_{1|2})

where ξ_{1|2} = ξ_1 + Σ_{12} Σ_{22}^{-1} (x_2 - ξ_2) and Σ_{1|2} = Σ_{11} - Σ_{12} Σ_{22}^{-1} Σ_{21}.
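As a quick numerical check of the conditioning formulas, a small numpy sketch (the block split and toy covariance are assumptions for illustration):

    import numpy as np

    # Toy 3-variable covariance; block 1 = variable 0, block 2 = variables 1 and 2
    Sigma = np.array([[2.0, 0.6, 0.3],
                      [0.6, 1.0, 0.2],
                      [0.3, 0.2, 1.5]])
    xi = np.zeros(3)
    x2 = np.array([1.0, -0.5])                          # observed values of block 2

    S11, S12 = Sigma[:1, :1], Sigma[:1, 1:]
    S21, S22 = Sigma[1:, :1], Sigma[1:, 1:]

    xi_1_2 = xi[:1] + S12 @ np.linalg.solve(S22, x2 - xi[1:])   # conditional mean
    Sigma_1_2 = S11 - S12 @ np.linalg.solve(S22, S21)           # conditional covariance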
Gaussian Graphical Model
Multivariate Gaussian
Sample Covariance Matrix

    Σ̃ = (1 / (n - 1)) Σ_{i=1}^{n} (x_i - ξ)(x_i - ξ)^T

Precision Matrix

    Ω = Σ^{-1}

In high-dimensional settings where p ≫ n, Σ̃ is not invertible (only positive semi-definite).
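A small numpy sketch of the issue (sizes chosen arbitrarily): with p > n the sample covariance has rank at most n - 1, so it cannot be inverted directly.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 20, 50                          # fewer samples than dimensions
    X = rng.standard_normal((n, p))

    S = np.cov(X, rowvar=False)            # p x p sample covariance (1/(n-1) scaling)
    print(np.linalg.matrix_rank(S))        # at most n - 1 < p, so S is singular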
Gaussian Graphical Model
Every multivariate Gaussian distribution can be represented by a pairwise Gaussian Markov Random Field (GMRF).
GMRF: undirected graph G = (V, E) with
- vertex set V = {1, ..., p} corresponding to the random variables
- edge set E = {(i, j) ∈ V × V | i ≠ j, Ω_ij ≠ 0}
Goal: estimate the sparse graph structure given n ≪ p i.i.d. observations.
Precision matrix estimation
Graph recovery
Also known as "graph structure learning/estimation".
For each pair of nodes (variables), decide whether there should be an edge:

    edge (α, β) does not exist  ⇔  X_α ⊥ X_β | X_{V \ {α, β}}  ⇔  Ω_{α,β} = 0

The precision matrix is sparse.
Hence, it turns out to be a non-zero pattern detection problem.
Example adjacency (support) pattern for nodes x, y, z, w:

        x  y  z  w
    x   0  1  0  1
    y   1  0  0  1
    z   0  0  0  0
    w   1  1  0  0
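In code, reading the graph off an estimated precision matrix is a thresholding step (exact zeros here; in practice a small tolerance or the estimator's reported support is used). A minimal numpy sketch:

    import numpy as np

    def precision_to_adjacency(Omega, tol=1e-8):
        """Edge (i, j) present iff Omega[i, j] is (numerically) non-zero."""
        A = (np.abs(Omega) > tol).astype(int)
        np.fill_diagonal(A, 0)             # no self-loops
        return A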
Precision matrix estimation
Parameter Estimation: given the recovered support, estimate the values of the non-zero entries of Ω.

        x  y  z  w
    x   0  ?  0  ?
    y   ?  0  0  ?
    z   0  0  0  0
    w   ?  ?  0  0
Sparsity
The word "sparsity" has (at least) four related meanings in NLP! (Noah Smith et al.)
1. Data sparsity: N is too small to obtain a good estimate for w (usually bad).
2. "Probability" sparsity: most events receive zero probability.
3. Sparsity in the dual: associated with SVMs and other kernel-based methods.
4. Model sparsity: most dimensions of f are not needed for a good h_w; those dimensions of w can be zero, leading to a sparse w (model).
We focus on sense #4.
Sparsity
Linear regression
    f(x) = w_0 + Σ_{i=1}^{d} w_i x_i

Sparsity means some of the w_i are zero.
1. Problem 1: why do we need a sparse solution?
   - feature/variable selection
   - better interpretation of the data
   - shrinks the model size
   - computational savings
   - discourages overfitting
2. Problem 2: how to achieve a sparse solution?
   - solutions to come...
Sparsity
Ordinary Least Squares
Objective function to minimize:

    L = Σ_{i=1}^{N} |y_i - f(x_i)|^2
Sparsity
Ridge: L2 norm regularization

    min L = Σ_{i=1}^{N} |y_i - f(x_i)|^2 + (λ/2) ||w||_2^2

Equivalent form (constrained optimization):

    min L = Σ_{i=1}^{N} |y_i - f(x_i)|^2   subject to   ||w||_2^2 ≤ C

Corresponds to a zero-mean Gaussian prior w ∼ N(0, σ^2 I), i.e. p(w) ∝ exp(-λ ||w||_2^2).
Sparsity
Lasso: L1 norm regularization

    min L = Σ_{i=1}^{N} |y_i - f(x_i)|^2 + (λ/2) ||w||_1

Equivalent form (constrained optimization):

    min L = Σ_{i=1}^{N} |y_i - f(x_i)|^2   subject to   ||w||_1 ≤ C

Corresponds to a zero-mean Laplace prior w ∼ Laplace(0, b), i.e. p(w_i) ∝ exp(-λ |w_i|).
→ sparse solution
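To see the difference empirically, a minimal scikit-learn sketch on synthetic data (hyperparameters are arbitrary): the lasso fit drives most irrelevant coefficients exactly to zero, while ridge only shrinks them.

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.default_rng(0)
    n, d = 100, 20
    w_true = np.zeros(d)
    w_true[:3] = [2.0, -1.5, 1.0]                      # only 3 relevant features
    X = rng.standard_normal((n, d))
    y = X @ w_true + 0.1 * rng.standard_normal(n)

    ridge = Ridge(alpha=1.0).fit(X, y)
    lasso = Lasso(alpha=0.1).fit(X, y)
    print("ridge zero coefficients:", np.sum(ridge.coef_ == 0))   # typically none
    print("lasso zero coefficients:", np.sum(lasso.coef_ == 0))   # typically most of the 17 irrelevant ones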
Why is the lasso sparse? An intuitive interpretation
(figure with two panels: Ridge and Lasso)
Algorithms for the Lasso
Subgradient methods
Interior-point methods (Boyd et al. 2007)
Least Angle Regression (LARS) (Efron et al. 2004): computes the entire path of solutions; state of the art until 2008.
Pathwise Coordinate Descent (Friedman, Hastie et al. 2007)
Proximal gradient (projected gradient)
Pathwise Coordinate Descent for the Lasso
Coordinate descent: optimize one parameter (coordinate) at a time.
How? Suppose we had only one predictor; the problem is to minimize

    Σ_i (y_i - x_i β)^2 + λ|β|

The solution is the soft-thresholded estimate

    sign(β̂)(|β̂| - λ)_+

where β̂ is the least-squares solution, and

    sign(z)(|z| - λ)_+ = { z - λ   if z > 0 and λ < |z|
                         { z + λ   if z < 0 and λ < |z|
                         { 0       if λ ≥ |z|
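In code, the soft-thresholding operator is a one-liner (numpy sketch):

    import numpy as np

    def soft_threshold(z, lam):
        """Soft-thresholding: sign(z) * max(|z| - lam, 0)."""
        return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)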
Pathwise Coordinate Descent for the Lasso
With multiple predictors, cycle through each predictor in turn: compute the partial residuals r_i = y_i - Σ_{j≠k} x_ij β̂_j and apply univariate soft-thresholding, pretending our data is (x_ik, r_i).
Start with a large value of λ (high sparsity) and slowly decrease it.
Each fit exploits the current estimate as a warm start, leading to a more stable solution path.
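Putting the two previous slides together, a compact (unoptimized) numpy sketch of pathwise coordinate descent for the lasso; names are mine, no intercept, and the objective is (1/2)||y - Xβ||² + λ||β||₁:

    import numpy as np

    def soft_threshold(z, lam):
        return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

    def lasso_cd(X, y, lam, beta=None, n_iter=100):
        """Coordinate descent for (1/2)||y - X beta||^2 + lam * ||beta||_1."""
        n, d = X.shape
        beta = np.zeros(d) if beta is None else beta.copy()
        col_sq = (X ** 2).sum(axis=0)                      # x_j^T x_j
        for _ in range(n_iter):
            for j in range(d):
                r_j = y - X @ beta + X[:, j] * beta[j]     # partial residual without x_j
                beta[j] = soft_threshold(X[:, j] @ r_j, lam) / col_sq[j]
        return beta

    def lasso_path(X, y, lams):
        """Pathwise: start at a large lambda (very sparse), decrease it, warm-start each fit."""
        beta, path = None, []
        for lam in sorted(lams, reverse=True):
            beta = lasso_cd(X, y, lam, beta)
            path.append((lam, beta))
        return path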
Variable selection
Lasso: L1 regularization

    L = Σ_{i=1}^{N} |y_i - f(x_i)|^2 + (λ/2) ||w||_1

Subset selection: L0 regularization

    L = Σ_{i=1}^{N} |y_i - f(x_i)|^2 + (λ/2) ||w||_0

- Greedy forward
- Greedy backward
- Greedy forward-backward (Tong Zhang, 2009 and 2011)
Greedy forward-backward algorithm
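The original slide shows the algorithm as a figure. As a rough illustration only (not the exact published procedure; the helper names, stopping threshold eps, and backward factor nu are mine), a forward-backward greedy sketch for sparse least squares:

    import numpy as np

    def refit(X, y, support):
        """Least-squares refit restricted to the active set; returns (loss, coefficients)."""
        beta = np.zeros(X.shape[1])
        if support:
            cols = sorted(support)
            beta[cols] = np.linalg.lstsq(X[:, cols], y, rcond=None)[0]
        return np.sum((y - X @ beta) ** 2), beta

    def foba(X, y, eps=1e-3, nu=0.5):
        """Forward-backward greedy selection for sparse least squares (illustrative only)."""
        support = set()
        loss, beta = refit(X, y, support)
        while True:
            # Forward step: add the feature whose inclusion decreases the loss the most
            candidates = [j for j in range(X.shape[1]) if j not in support]
            if not candidates:
                break
            new_loss, j_best = min((refit(X, y, support | {j})[0], j) for j in candidates)
            delta = loss - new_loss
            if delta < eps:                        # gain too small: stop
                break
            support.add(j_best)
            loss, beta = refit(X, y, support)
            # Backward step(s): drop the weakest feature while the loss increase is < nu * delta
            while len(support) > 1:
                drop_loss, j_weak = min((refit(X, y, support - {j})[0], j) for j in support)
                if drop_loss - loss < nu * delta:
                    support.remove(j_weak)
                    loss, beta = refit(X, y, support)
                else:
                    break
        return beta

The Global and Local Greedy slides later instantiate the same forward/backward logic with the Gaussian log-likelihood and the per-node least-squares loss, respectively.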
Graphical Lasso
Recall the Gaussian graphical model (multivariate Gaussian):

    f(x | ξ, Σ) = (2π)^{-d/2} (det Ω)^{1/2} exp(-(x - ξ)^T Ω (x - ξ) / 2)

Log-likelihood:

    log det Ω - trace(Σ̂ Ω)

Graphical lasso (Friedman et al. 2007): maximize the L1-penalized log-likelihood

    log det Ω - trace(Σ̂ Ω) - λ ||Ω||_1

Coordinate descent.
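scikit-learn ships an implementation of this estimator; a minimal usage sketch on synthetic data (the penalty value is arbitrary):

    import numpy as np
    from sklearn.covariance import GraphicalLasso

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))         # n = 200 samples, p = 10 variables

    model = GraphicalLasso(alpha=0.1).fit(X)   # alpha plays the role of lambda above
    Omega_hat = model.precision_               # sparse estimate of the precision matrix
    print((np.abs(Omega_hat) > 1e-8).sum())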
CLIME
Constrained L1 Minimization approach to sparse precision matrix Estimation (Cai et al. 2011).
CLIME estimator:

    min ||Ω||_1   subject to   ||Σ̂ Ω - I||_∞ ≤ λ_n,   Ω ∈ R^{p×p}

The solution Ω̂ has to be symmetrized.
Equivalent to solving p separate optimization problems (one per column):

    min ||β||_1   subject to   ||Σ̂ β - e_i||_∞ ≤ λ_n
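Each of these column problems is a linear program. A minimal scipy sketch (splitting β into positive and negative parts; the dense formulation and names are mine, and real implementations are considerably more careful):

    import numpy as np
    from scipy.optimize import linprog

    def clime_column(S_hat, i, lam):
        """Solve min ||b||_1 s.t. ||S_hat @ b - e_i||_inf <= lam as an LP in (u, v), b = u - v."""
        p = S_hat.shape[0]
        e_i = np.zeros(p)
        e_i[i] = 1.0
        c = np.ones(2 * p)                      # objective: sum(u) + sum(v) = ||b||_1
        A_ub = np.block([[ S_hat, -S_hat],      #  S_hat b - e_i <= lam
                         [-S_hat,  S_hat]])     # -S_hat b + e_i <= lam
        b_ub = np.concatenate([lam + e_i, lam - e_i])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
        u, v = res.x[:p], res.x[p:]
        return u - v

The p columns are then assembled into Ω̂ and symmetrized, e.g. by keeping, for each pair (i, j), whichever of the two estimated entries has smaller magnitude.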
Gaussian Graphical Model and Column-by-Column Regression
Consider the conditional distribution

    X_1 | X_2 ∼ N(ξ_{1|2}, Σ_{1|2})

where ξ_{1|2} = ξ_1 + Σ_{12} Σ_{22}^{-1} (X_2 - ξ_2) and Σ_{1|2} = Σ_{11} - Σ_{12} Σ_{22}^{-1} Σ_{21}.
By standardizing the data matrix, i.e. ξ_1 = ξ_2 = 0:

    X_1 | X_2 ∼ N(Σ_{12} Σ_{22}^{-1} X_2, Σ_{11} - Σ_{12} Σ_{22}^{-1} Σ_{21})

Column-by-Column Regression:

    X_1 = α_1^T X_2 + ε_1,   where ε_1 ∼ N(0, σ_1)
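This is the neighborhood-selection view of the problem: lasso-regress each variable on all the others and read its neighbors off the non-zero coefficients. A minimal scikit-learn sketch (arbitrary penalty; an OR rule symmetrizes the result, an AND rule is the common alternative):

    import numpy as np
    from sklearn.linear_model import Lasso

    def neighborhood_selection(X, lam=0.1):
        """Estimate each node's neighborhood by a lasso regression on the other columns."""
        n, p = X.shape
        A = np.zeros((p, p), dtype=int)
        for j in range(p):
            others = [k for k in range(p) if k != j]
            coef = Lasso(alpha=lam).fit(X[:, others], X[:, j]).coef_
            A[j, others] = (np.abs(coef) > 1e-8).astype(int)
        return np.maximum(A, A.T)              # OR rule to symmetrize

The per-column fits are independent of one another, which is what makes this approach easy to parallelize (next slide).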
Column-by-Column Regression
Node-by-node neighbor detection.
Easy to implement.
Easy to parallelize, scalable to large-scale data.
TIGER
A generalized framework for supervised learning:

    β̂ = argmin_β L(β; X, Y) + Ω(β)

A Tuning-Insensitive Approach for Optimally Estimating Gaussian Graphical Models (Han et al. 2012).
SQRT-Lasso (see the sketch below):

    β̂ = argmin_{β ∈ R^d} { (1/√n) ||y - Xβ||_2 + λ ||β||_1 }
Tuning-insensitive (good point)
State-of-the-art
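scikit-learn has no SQRT-Lasso estimator; purely as an illustration, the objective itself in numpy (it could be handed to a generic solver). The square root on the loss is what makes the theoretically optimal λ independent of the unknown noise level, hence the tuning insensitivity.

    import numpy as np

    def sqrt_lasso_objective(beta, X, y, lam):
        """(1/sqrt(n)) * ||y - X beta||_2 + lam * ||beta||_1"""
        n = X.shape[0]
        return np.linalg.norm(y - X @ beta) / np.sqrt(n) + lam * np.linalg.norm(beta, 1)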
Greedy methods
High-dimensional (Gaussian) Graphical Model Estimation Using Greedy Methods (Pradeep et al. 2012).
Rather than exploiting lasso-like methods to achieve a sparse solution, apply greedy methods to do the variable selection.
Global greedy: treat each element of the precision matrix as a variable.
Local greedy: column-by-column fashion.
(Potentially) state-of-the-art
Global Greedy
Estimate the graph structure through a series of forward and backward stagewise steps.
Forward step: choose the "best" new edge and add it to the current estimate, as long as the decrease in loss δ exceeds the stopping criterion.
Backward step: choose the "weakest" current edge and remove it if the increase in loss is less than the backward step factor times the decrease in loss due to the previous forward step (νδ).
"Best" and "weakest" are determined by the sample-based Gaussian MLE:

    L(Ω) = log det Ω - trace(Σ̂ Ω)
Local Greedy
Estimates each node's neighborhood in parallel, using a series of forward and backward steps.
Forward step: choose the "best" new edge and add it to the current estimate, as long as the decrease in loss δ exceeds the stopping criterion.
Backward step: choose the "weakest" current edge and remove it if the increase in loss is less than the backward step factor times the decrease in loss due to the previous forward step (νδ).
"Best" and "weakest" are determined by the least-squares loss.
Measure methods
Graph recovery: ROC Curves
TP: true positive rate
FP: false positive rate
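For graph recovery, these rates are computed on the off-diagonal support pattern of the estimated versus true precision matrix; a minimal numpy sketch (upper triangle only, so each edge is counted once):

    import numpy as np

    def edge_rates(Omega_hat, Omega_true, tol=1e-8):
        """True/false positive rates of edge recovery from two precision matrices."""
        p = Omega_true.shape[0]
        iu = np.triu_indices(p, k=1)                 # each edge counted once
        est = np.abs(Omega_hat[iu]) > tol
        true = np.abs(Omega_true[iu]) > tol
        tpr = (est & true).sum() / max(true.sum(), 1)
        fpr = (est & ~true).sum() / max((~true).sum(), 1)
        return tpr, fpr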
Graph recovery: ROC Curves
Decrease the regularization parameter gradually to obtain the solution path.
An illustration: (ROC curve figure omitted)
Measure methods
Parameter estimation: norm error
Frobenius norm error: ||Ω̂ - Ω||_F

    ||A||_F = sqrt( Σ_{i=1}^{m} Σ_{j=1}^{n} |a_ij|^2 )

Spectral norm error: ||Ω̂ - Ω||_2

    ||A||_2 = sqrt( λ_max(A* A) ) = σ_max(A)
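Both error measures are one-liners in numpy:

    import numpy as np

    def norm_errors(Omega_hat, Omega_true):
        frob = np.linalg.norm(Omega_hat - Omega_true, "fro")   # Frobenius norm error
        spec = np.linalg.norm(Omega_hat - Omega_true, 2)       # spectral norm (largest singular value)
        return frob, spec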
Performance: Greedy, TIGER, CLIME, Glasso
(comparison figures omitted)
Non-Gaussian scenario
Question: in a general graph, does a relationship exist between conditional independence and the structure of the precision matrix?
This remains unsolved, but recently there has been some progress:
- High dimensional semiparametric Gaussian copula graphical models (Han et al. 2012)
- The nonparanormal: semiparametric estimation of high dimensional undirected graphs (Han et al. 2009)
- Structure estimation for discrete graphical models: generalized covariance matrices and their inverses (NIPS 2012 Outstanding Student Paper Award)
Applications
- Bioinformatics
- Social media
- NLP?
Project: a graphical modeling platform
(architecture diagram; components: User, dataset, Graphical Modeling Platform hosted on Amazon EC2 (cloud) & a Princeton server, Model; outputs: numerical precision matrix, graph visualization via Infovis (client, Javascript) and Circos (server, Perl), regression analysis (future))
Project: a graphical modeling platform
(screenshots omitted: R visualization; my visualization (1))
Arguable points
It is the student who should push the supervisor, rather than the supervisor pushing the student.
Work extremely hard; blindly trust your supervisor.
Keep quality first; quantity will follow.
Science is about problems, not equations.
Stay focused.
Think less, do more.
Keep an open mind.