An Introduction to Graphical Lasso

1 An Introduction to Graphical Lasso
Bo Chang (UBC), Graphical Models Reading Group, May 15, 2015

2 Undirected Graphical Models
In an undirected graphical model, each vertex of the graph represents a random variable. The absence of an edge between two vertices means that the corresponding random variables are conditionally independent given the other variables. The Gaussian distribution is widely used for such models because of its convenient analytical properties. Penalized regression methods for inducing sparsity in the precision matrix are central to the construction of Gaussian graphical models.

3 Precision Matrix
Denote the covariance matrix by $\Sigma$; its inverse $\Theta = \Sigma^{-1}$ is called the precision matrix. Let $\theta_{ij}$ be the $(i, j)$th element of $\Theta$. For $i \neq j$,
$$\theta_{ij} = -\sigma_{ij;\mathrm{rest}} \, \det(\Sigma_{(ij)}) \, \det(\Sigma)^{-1}.$$
$\sigma_{ij;\mathrm{rest}}$: conditional (partial) covariance of variables $i$ and $j$ given the other variables.
$\Sigma_{(ij)}$: the matrix $\Sigma$ with the $i$th and $j$th rows and columns removed.
If $\theta_{ij} = 0$, then variables $i$ and $j$ are conditionally independent given the other variables.
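A small numerical illustration of the zero pattern may help here. The sketch below assumes NumPy and uses a hypothetical AR(1)-type covariance, $\sigma_{ij} = \rho^{|i-j|}$, which is not part of the slides; it is chosen only because its precision matrix is known to be tridiagonal, so non-adjacent variables are marginally correlated yet conditionally independent.

```python
import numpy as np

# Hypothetical example (not from the slides): sigma_ij = rho ** |i - j|
rho = 0.6
idx = np.arange(4)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :])

# Precision matrix Theta = Sigma^{-1}
Theta = np.linalg.inv(Sigma)

# Variables 0 and 2 are marginally correlated (Sigma[0, 2] is nonzero) ...
print("sigma_02 =", Sigma[0, 2])
# ... but theta_02 is zero up to rounding: conditionally independent given the rest
print("theta_02 =", round(Theta[0, 2], 10))
print(np.round(Theta, 4))
```

Only the entries next to the diagonal of `Theta` are nonzero, which corresponds to a chain graph 0 - 1 - 2 - 3.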

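4 Precision Matrix
Suppose we partition $X^T = (X_1^T, X_2)$, where $X_1$ consists of the first $d - 1$ variables and $X_2$ is the last variable. The corresponding partitions of $\Sigma$ and $\Theta$ are
$$\Sigma = \begin{pmatrix} \Sigma_{11} & \sigma_{12} \\ \sigma_{12}^T & \sigma_{22} \end{pmatrix}, \qquad \Theta = \begin{pmatrix} \Theta_{11} & \theta_{12} \\ \theta_{12}^T & \theta_{22} \end{pmatrix}.$$
Let $\beta = \Sigma_{11}^{-1} \sigma_{12}$ be the multiple linear regression coefficient of $X_2$ on $X_1$. Since $\Sigma \Theta = I$, the upper-right block gives
$$\Sigma_{11} \theta_{12} + \sigma_{12} \theta_{22} = 0, \qquad \beta = \Sigma_{11}^{-1} \sigma_{12} = -\theta_{12} / \theta_{22}.$$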
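The identity $\beta = \Sigma_{11}^{-1} \sigma_{12} = -\theta_{12} / \theta_{22}$ can be checked numerically. The following is a minimal sketch assuming NumPy; the covariance matrix is an arbitrary positive definite matrix generated for illustration and has no connection to the data discussed later.

```python
import numpy as np

# Arbitrary positive definite covariance, for illustration only
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
Sigma = A @ A.T + 5 * np.eye(5)
Theta = np.linalg.inv(Sigma)

# Partition with the last variable playing the role of X_2
Sigma11, sigma12 = Sigma[:-1, :-1], Sigma[:-1, -1]
theta12, theta22 = Theta[:-1, -1], Theta[-1, -1]

# Regression coefficient of X_2 on X_1, computed two ways
beta_regression = np.linalg.solve(Sigma11, sigma12)
beta_precision = -theta12 / theta22

print(np.allclose(beta_regression, beta_precision))
```

A zero in $\theta_{12}$ therefore corresponds exactly to a zero regression coefficient, which is what links sparse regression to sparse precision matrices.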

5 Precision Matrix
Regression coefficient: $\beta = -\theta_{12} / \theta_{22}$. A zero entry of $\theta_{12}$ is therefore a zero regression coefficient, so we can learn about the dependence structure through multiple linear regression. Meinshausen and Bühlmann (2006) estimate which components $\theta_{ij}$ are zero rather than fully estimating $\Theta$: they fit a lasso regression using each variable in turn as the response and the others as predictors.

6 Lasso
Minimize
$$Q(\beta) = \tfrac{1}{2} \| Y - X\beta \|^2 + \lambda \sum_j |\beta_j|.$$
When $n = p = 1$ and $X = 1$,
$$Q(\beta) = \tfrac{1}{2} (y - \beta)^2 + \lambda |\beta|, \qquad Q'(\beta) = -y + \beta + \lambda \, \mathrm{sign}(\beta) = 0.$$
The lasso solution is
$$\hat\beta(\lambda) = \mathrm{sign}(y) (|y| - \lambda)_+ = S(y, \lambda),$$
where $S(y, \lambda)$ is called the soft-thresholding operator.
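The soft-thresholding operator is short enough to write out directly. This is a plain-Python sketch that assumes $\lambda \ge 0$; the function names and the grid-search check are illustrative additions, not part of the slides.

```python
def soft_threshold(y: float, lam: float) -> float:
    """S(y, lam) = sign(y) * max(|y| - lam, 0), assuming lam >= 0."""
    if y > lam:
        return y - lam
    if y < -lam:
        return y + lam
    return 0.0


def lasso_objective(beta: float, y: float, lam: float) -> float:
    """One-dimensional lasso objective Q(beta) with n = p = 1 and X = 1."""
    return 0.5 * (y - beta) ** 2 + lam * abs(beta)


# Compare the closed form with a brute-force minimizer on a grid (step 0.001)
y, lam = 1.5, 0.4
grid = [i / 1000 for i in range(-3000, 3001)]
best = min(grid, key=lambda b: lasso_objective(b, y, lam))
print(soft_threshold(y, lam), best)
```

Values of $y$ with $|y| \le \lambda$ are mapped to exactly zero, which is the mechanism that produces sparse solutions.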

7 Graphical Lasso
A more systematic approach is due to Friedman, Hastie and Tibshirani (2008). Consider maximizing the penalized log-likelihood
$$\log \det \Theta - \mathrm{trace}(S\Theta) - \lambda \|\Theta\|_1.$$
$S$: sample covariance matrix.
$\|\Theta\|_1$: elementwise $L_1$ norm, the sum of the absolute values of the elements of $\Theta$.
The gradient equation is
$$\Theta^{-1} - S - \lambda \, \mathrm{Sign}(\Theta) = 0.$$

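8 Graphical Lasso
The gradient equation is
$$\Theta^{-1} - S - \lambda \, \mathrm{Sign}(\Theta) = 0.$$
Let $W = \Theta^{-1}$, so that
$$\begin{pmatrix} W_{11} & w_{12} \\ w_{12}^T & w_{22} \end{pmatrix} \begin{pmatrix} \Theta_{11} & \theta_{12} \\ \theta_{12}^T & \theta_{22} \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0^T & 1 \end{pmatrix}.$$
The upper-right block gives
$$w_{12} = -W_{11} \theta_{12} / \theta_{22} = W_{11} \beta, \qquad \text{where } \beta = -\theta_{12} / \theta_{22}.$$
The upper-right block of the gradient equation then becomes
$$W_{11} \beta - s_{12} + \lambda \, \mathrm{Sign}(\beta) = 0,$$
which is recognized as the estimating equation for a lasso regression.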
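The sign change in front of $\lambda$ is easy to miss, so the intermediate steps are written out below. This is a reconstruction from the definitions on the slide; the only extra fact used is that $\theta_{22} > 0$ because $\Theta$ is positive definite.

```latex
\begin{aligned}
&\text{Upper-right block of } W\Theta = I:
  && W_{11}\theta_{12} + w_{12}\theta_{22} = 0
  \;\Longrightarrow\; w_{12} = -W_{11}\theta_{12}/\theta_{22} = W_{11}\beta, \\
&\text{Upper-right block of } W - S - \lambda\,\mathrm{Sign}(\Theta) = 0:
  && w_{12} - s_{12} - \lambda\,\mathrm{Sign}(\theta_{12}) = 0, \\
&\text{Since } \theta_{22} > 0,\ \mathrm{Sign}(\theta_{12}) = -\mathrm{Sign}(\beta):
  && W_{11}\beta - s_{12} + \lambda\,\mathrm{Sign}(\beta) = 0.
\end{aligned}
```

The last line is the stationarity condition of $\tfrac{1}{2}\beta^T W_{11}\beta - s_{12}^T\beta + \lambda\|\beta\|_1$, that is, a lasso problem in which $W_{11}$ plays the role of $X^TX$ and $s_{12}$ the role of $X^TY$.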

9 Graphical Lasso

10 Graphical Lasso
Coordinate descent: let $V = W_{11}$ and cycle through the coordinates, updating
$$\hat\beta_i \leftarrow S\Big( s_{12,i} - \sum_{k \neq i} V_{ki} \hat\beta_k, \; \lambda \Big) \Big/ V_{ii},$$
where $S(y, \lambda)$ is the soft-thresholding operator.
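Putting the pieces together, the procedure can be sketched in a few dozen lines. The code below is an illustrative implementation assuming NumPy, not the authors' reference code: it starts from $W = S + \lambda I$ (the diagonal is penalized, matching the elementwise $L_1$ norm above), solves the lasso problem for each column by the coordinate-descent update, and recovers $\Theta$ at the end from $\theta_{22} = 1/(w_{22} - w_{12}^T\beta)$ and $\theta_{12} = -\beta\,\theta_{22}$. The function names, iteration limits and tolerances are assumptions made for this sketch.

```python
import numpy as np


def soft_threshold(y, lam):
    """S(y, lam) = sign(y) * max(|y| - lam, 0)."""
    return np.sign(y) * max(abs(y) - lam, 0.0)


def lasso_cd(V, s, lam, beta, n_sweeps=100, tol=1e-8):
    """Coordinate descent for V beta - s + lam * Sign(beta) = 0 (beta updated in place)."""
    for _ in range(n_sweeps):
        max_change = 0.0
        for i in range(len(s)):
            # s_i - sum_{k != i} V_ki beta_k
            r = s[i] - V[:, i] @ beta + V[i, i] * beta[i]
            new = soft_threshold(r, lam) / V[i, i]
            max_change = max(max_change, abs(new - beta[i]))
            beta[i] = new
        if max_change < tol:
            break
    return beta


def graphical_lasso(S, lam, n_iter=100, tol=1e-6):
    """Block coordinate descent sketch; returns (W, Theta) with W approximating Theta^{-1}."""
    p = S.shape[0]
    W = S + lam * np.eye(p)      # the diagonal of W stays fixed at s_jj + lam
    B = np.zeros((p, p))         # column j holds beta for variable j (warm starts)
    for _ in range(n_iter):
        W_old = W.copy()
        for j in range(p):
            rest = np.arange(p) != j
            W11 = W[np.ix_(rest, rest)]
            beta = lasso_cd(W11, S[rest, j], lam, B[rest, j].copy())
            B[rest, j] = beta
            w12 = W11 @ beta     # w_12 = W_11 beta
            W[rest, j] = w12
            W[j, rest] = w12
        if np.abs(W - W_old).mean() < tol:
            break
    # Recover Theta column by column
    Theta = np.zeros((p, p))
    for j in range(p):
        rest = np.arange(p) != j
        theta22 = 1.0 / (W[j, j] - W[rest, j] @ B[rest, j])
        Theta[j, j] = theta22
        Theta[rest, j] = -B[rest, j] * theta22
    return W, Theta


# Usage on a hypothetical 4 x 4 correlation matrix with entries 0.5 ** |i - j|
idx = np.arange(4)
S = 0.5 ** np.abs(idx[:, None] - idx[None, :])
W, Theta = graphical_lasso(S, lam=0.1)
print(np.round(Theta, 3))
```

Larger values of `lam` set more off-diagonal entries of `Theta` to zero; once `lam` exceeds every off-diagonal $|s_{ij}|$, the estimate is diagonal. For real analyses an established implementation is preferable to this sketch.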

11 Analysis of Protein-signalling Data
We analyze a flow cytometry dataset on $d = 11$ proteins and $n = 7466$ cells. Several methods are compared: graphical lasso, Bayesian network, truncated vine (sequential MST), and factor analysis.

12 Discrepancy Measure
A common discrepancy measure in the psychometrics and structural equation modeling literatures is
$$D = \log(\det[R_{\mathrm{model}}(\hat\delta)]) - \log(\det[R_{\mathrm{data}}]) + \mathrm{tr}[R_{\mathrm{model}}^{-1}(\hat\delta) R_{\mathrm{data}}] - d.$$
$d$: number of variables.
$R_{\mathrm{data}}$: sample correlation matrix.
$R_{\mathrm{model}}(\hat\delta)$: model-based correlation matrix, based on the estimate of the parameter $\delta$.
If a model has some conditional independence relations, then the dimension of $\delta$ is less than $d(d-1)/2$.

13 Discrepancy Measure
Other comparisons are the AIC and BIC based on a Gaussian log-likelihood. Also useful are the average and maximum absolute deviations of the model-based correlation matrix from the empirical correlation matrix; the maximum is
$$\max_{j<k} \big| R_{\mathrm{data},jk} - R_{\mathrm{model},jk}(\hat\delta) \big|.$$
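Both measures are straightforward to compute once the two correlation matrices are available. The following is a minimal sketch assuming NumPy; the function names and the 2 x 2 example matrices are hypothetical and unrelated to the protein data.

```python
import numpy as np


def discrepancy(R_model, R_data):
    """D = log det(R_model) - log det(R_data) + tr(R_model^{-1} R_data) - d."""
    d = R_data.shape[0]
    _, logdet_model = np.linalg.slogdet(R_model)
    _, logdet_data = np.linalg.slogdet(R_data)
    trace_term = np.trace(np.linalg.solve(R_model, R_data))
    return logdet_model - logdet_data + trace_term - d


def abs_deviations(R_model, R_data):
    """Average and maximum of |R_data,jk - R_model,jk| over pairs j < k."""
    j, k = np.triu_indices_from(R_data, k=1)
    diff = np.abs(R_data[j, k] - R_model[j, k])
    return diff.mean(), diff.max()


# Hypothetical example: an independence model fitted to correlated data
R_data = np.array([[1.0, 0.5], [0.5, 1.0]])
R_model = np.eye(2)
print(discrepancy(R_model, R_data), abs_deviations(R_model, R_data))
```

$D$ is zero when the model reproduces the sample correlation matrix exactly and positive otherwise, so smaller values indicate a closer fit.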

14 Results
[Table: columns are Model, Dfit, MaxAbsDiff, AIC ($\times 10^5$), BIC ($\times 10^5$), #Par. Rows are BN; glasso ($\lambda = 0.13$); glasso ($\lambda = 0.10$); glasso ($\lambda = 0.08$); truncated seq. MST (five fits); factor (four fits). Numeric entries are not available.]

15 References
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning. New York: Springer.
Pourahmadi, M. (2013). High-Dimensional Covariance Estimation: With High-Dimensional Data. John Wiley & Sons.
Meinshausen, N., & Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics.
Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3).

16 The End
