Gaussian Graphical Models arguable points Gaussian Graphical Models Jiang Guo Junuary 9th, 2013 Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Contents 1 Gaussian Graphical Models Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Sparsity Graphical Lasso CLIME TIGER Greedy methods Measure methods non-gaussian scenario Applications Project 2 arguable points Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Undirected Graphical Model Markov Property pairwise Xu ⊥ ⊥ Xv |XV \{u,v} if {u, v} ∈ /E local global Conditional Independence Partial Correlation Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Gaussian Graphical Model Multivariate Gaussian X ∼ Nd (ξ, Σ) If Σ is positive definite, distribution has density on Rd T f (x|ξ, Σ) = (2π)−d/2 (detΩ)1/2 e−(x−ξ) Ω(x−ξ)/2 where Ω = Σ−1 is the Precision matrix of the distribution. Marginal distribution: Xs ∼ Nd0 (ξs , Σs,s ) Conditional distribution: X1 |X2 ∼ N (ξ1|2 , Σ1|2 ) where: ξ1|2 = ξ1 + Σ12 Σ−1 22 (x2 − ξ2 ) Σ1|2 = Σ11 − Σ12 Σ−1 22 Σ21 Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Gaussian Graphical Model Multivariate Gaussian Sample Covariance Matrix n ˜= Σ 1 X (xi − ξ)(xi − ξ)T n − 1 i=1 Precision Matrix Ω = Σ−1 ˜ is not invertible In high dimensional settings where p n, the Σ (semi-positive definite). Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Gaussian Graphical Model Every Multivariate Gaussian distribution can be represented by a pairwise Gaussian Markov Random Field (GMRF) GMRF: Undirected graph G = (V, E) with vertex set V = {1, ..., p} corresponding to random variables edge set E = {(i, j) ∈ V |i 6= j, Ωij 6= 0} Goal: Estimate sparse graph structuregiven n p iid observations. Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Precision matrix estimation Graph recovery also known as ”Graph structure learning/estimation” For each pair of nodes (variables), decide whether there should be an edge Edge(α, β) not exists ⇔ α ⊥ ⊥ β|V \{α, β} ⇔ Ωα,β = 0 Precision matrix is sparse Hence, it turns out to be a non-zero patterns detection problem. x y z w Jiang Guo x y z w 0 1 0 1 1 0 0 1 0 0 0 0 1 1 0 0 Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Precision matrix estimation Parameter Estimation x y z w Jiang Guo x y z w 0 ? 0 ? ? 0 0 ? 0 0 0 0 ? ? 0 0 Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Sparsity The word ”sparsity” has (at least) four related meanings in NLP! (Noah Smith et al.) 1 Data sparsity: N is too small to obtain a good estimate for w. (usually bad) 2 ”Probability” sparsity: most of events receive zero probability 3 Sparsity in the dual: Associated with SVM and other kernel-based methods. 4 Model sparsity: Most dimensions of f is not needed for a good hw ; those dimensions of w can be zero, leading to a sparse w (model) We focus on sense #4. Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Sparsity Linear regression f (~x) = w0 + d X wi ∗ xi i=1 sparsity means some of the wi (s) are zero 1 problem 1: why do we need sparse solution? feature/variable selection better interprete the data shrinkage the size of model computatioal savings discourage overfitting 2 problem 2: how to achieve sparse solution? solutions to come... Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Sparsity Ordinary Least Square Objective function to minimize: L= N X |yi − f (xi )|2 i=1 Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Sparsity Ridge: L2 norm Regularization min L = N X |yi − f (xi )|2 + i=1 λ kwk ~ 22 2 equaivalent form (constrained optimization): min L = N X |yi − f (xi )|2 i=1 subject to kwk ~ 22 6 C Corresponds to zero-mean Gaussian prior w ~ ∼ N (0, σ 2 ), i.e. 2 p(wi ) ∝ exp(−λkwk2 ) Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Sparsity Lasso: L1 norm Regularization min L = N X |yi − f (xi )|2 + i=1 λ kwk ~ 1 2 equaivalent form (constrained optimization): min L = N X |yi − f (xi )|2 i=1 subject to kwk ~ 16C Corresponds to zero-mean Laplace prior w ~ ∼ Laplace(0, b), i.e. p(wi ) ∝ exp(−λ|wi |) sparse solution Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Why lasso sparse? an intuitive interpretation Ridge Lasso Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Algorithms for the Lasso Subgradient methods interior-point methods (Boyd et al 2007) Least Angle RegreSsion(LARS) (Efron et al 2004), computes entire path of solutions. State of the Art until 2008. Pathwise Coordinate Descent (Friedman, Hastie et al 2007) Proximal Gradient (project gradient) Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Pathwise Coordinate Descent for the Lasso Coordinate descent: optimize one parameter (coordinate) at a time. How? suppose we had only one predictor, problem is to minimize X (γi − xi β)2 + λ|β| i Solution is the soft-thresholded estimate ˆ β| ˆ − λ)+ sign(β)(| where βˆ is the least square solution. z−λ z+λ sign(z)(|z| − λ)+ = 0 Jiang Guo and if z > 0 and λ < |z| if z < 0 and λ < |z| if λ > |z| Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Pathwise Coordinate Descent for the Lasso With multiple predictors, cycle Pthrough each predictor in turn. We compute residuals γi = yi − j6=k xij βˆk and apply univariate soft-thresholding, pretending our data is (xij , ri ) Start with large value for λ (high sparsity) and slowly decrease it. Exploits current estimation as warm start, leading to a more stable solution. Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Variable selection Lasso: L1 regularization L= N X λ |yi − f (xi )|2 + kwk ~ 1 2 i=1 Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Variable selection Lasso: L1 regularization L= N X λ |yi − f (xi )|2 + kwk ~ 1 2 i=1 Subset selection: L0 regularization L= N X λ ~ 0 |yi − f (xi )|2 + kwk 2 i=1 Greedy forward Greedy backward Greedy forward-backward (Tong Zhang, 2009 and 2011) Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Greedy forward-backward algorithm Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Graphical Lasso Recall the Gaussian graphical model(multi-variate gaussian) T f (x|ξ, Σ) = (2π)−d/2 (detΩ)1/2 e−(x−ξ) Ω(x−ξ)/2 log-likelihood ˆ log det Ω − trace(ΣΩ) Graphical lasso (Friedman et al 2007) Maximize the L1 penalized log-likelihood: ˆ − λkΩk1 log det Ω − trace(ΣΩ) Coordinate descent Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim CLIME Constrained L1 Minimization approach to sparse precision matrix Estimation. (CAI et al 2011) CLIME estimator minkΩk1 subject to: ˆ kΣΩ − Ik∞ 6 λn , Ω ∈ Rp×p ˆ have to be symmetrized Solution Ω Equivalent to solving the p optimization problems: ˆ − ei k∞ 6 λn minkβk1 subject to: kΣβ Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim CLIME Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Gaussian Graphical Model and Column-by-Column Regression Consider the conditional distribution X1 |X2 ∼ N (ξ1|2 , Σ1|2 ) where: ξ1|2 = ξ1 + Σ12 Σ−1 22 (X2 − ξ2 ) Σ1|2 = Σ11 − Σ12 Σ−1 22 Σ21 By standardizing the data matrix, i.e. ξ1 = ξ2 = 0 −1 X1 |X2 ∼ N (Σ12 Σ−1 22 X2 , Σ11 − Σ12 Σ22 Σ21 ) Column-by-Column Regression X1 = α1T X2 + 1 where 1 ∼ N (0, σ1 ) Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Column-by-Column Regression node-by-node neighbor detection Easy to implement Easy to parallelize, scalable to large scale data Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim TIGER A generalized framework for supervised learning βˆ = arg min L(β; X, Y ) + Ω(β) β A Tuning-Insensitive Approach for Optimally Estimating Gaussian Graphical Models (Han et al 2012) SQRT-Lasso 1 βˆ = arg min { √ ky − Xβk2 + λkβk1 } n β∈Rd Tuning-insensitive (good point) State-of-the-art Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Greedy methods High-dimensional (Gaussian) Graphical Model Estimation Using Greedy Methods (Pradeep et al 2012) Rather than exploiting lasso-like methods to achieve sparse solution, we apply greedy methods to do variable selection Global greedy: treat each element of the Precision Matrix as a Variable Local greedy: Column-by-column fashion (Potentially) state-of-the-art Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Global Greedy Estimate graph structure through a series of forward and backward stagewise steps Forward Step: Choose ”best” new edge and add to current estimate, as long as decrease in loss δ exceeds stopping criterion. Backward Step: Choose ”weakest” current edge and remove if increase in loss is < product of backward step factor and decrease in loss due to previous forward step (νδ). ”Best” and ”Weakest” determined by sample-based Gaussian MLE ˆ L(Ω) = log det Ω − trace(ΣΩ) Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Local Greedy Estimates each node’s neighborhood in parallel using a series of forward and backward steps Forward Step: Choose ”best” new edge and add to current estimate, as long as decrease in loss δ exceeds stopping criterion. Backward Step: Choose ”weakest” current edge and remove if increase in loss is < product of backward step factor and decrease in loss due to previous forward step (νδ). ”Best” and ”Weakest” determined by least-square loss Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Measure methods Graph recovery: ROC Curves TP: True positive rate FP: False positve rate Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Graph recovery: ROC Curves Decrease the regularization parameter gradually to achieve the solution path An illustration: Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Measure methods Parameter estimation: norm error ˆ − ΩkF Frobenius norm errorµkΩ v uX n um X kAkF = t |a2ij | i=1 j=1 ˆ − Ωk2 Spectrum norm error: kΩ p kAk2 = λmax (A ∗ A) = σmax (A) Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Performance: Greedy, TIGER, Clime, Glasso Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Performance: Greedy, TIGER, Clime, Glasso Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim non-gaussian scenario Question? In a general graph, whether a relationship exists between conditional independence and the structure of the precision matrix? Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim non-gaussian scenario Question? In a general graph, whether a relationship exists between conditional independence and the structure of the precision matrix? Remain unsolved. Recently, there are some progresses: High dimensional semiparametric Gaussian copula graphical models (Han et al. 2012) The nonparanormal: Semiparametric estimation of high dimensional undirected graphs (Han et al. 2009) Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim non-gaussian scenario Question? In a general graph, whether a relationship exists between conditional independence and the structure of the precision matrix? Remain unsolved. Recently, there are some progresses: High dimensional semiparametric Gaussian copula graphical models (Han et al. 2012) The nonparanormal: Semiparametric estimation of high dimensional undirected graphs (Han et al. 2009) Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses (NIPS 2012 Outstanding Student Paper Awards) Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Applications Biomatics Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Applications Biomatics Social media Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Applications Biomatics Social media NLP? Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Applications Biomatics Social media NLP? Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Project: a graphical modeling platform Numerical Precision Matrix Graph Visualization · Infovis (Client, Javascript) · Circos (Server, Perl) Amazon EC2 (Cloud) & Princeton Server Graphical Modeling Platform User dataset Jiang Guo Model Gaussian Graphical Models · Regression Analysising (future) Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estim Project: a graphical modeling platform R Visualization: My visualization(1): Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Arguable points It is student who should push supervisor, rather than the superviser push student Work extremely hard, blindly trust your supervisor Keep quality first, quantity will follow Science is about problems, not equations. Being focused. Think less, do more. Open mind. Jiang Guo Gaussian Graphical Models
© Copyright 2025 ExpyDoc