Multivariate Normal Models

Size: px

Start display at page:

Download "Multivariate Normal Models"

Winifred Richards
5 years ago
Views:

1 Case Study 3: fmri Prediction Coping with Large Covariances: Latent Factor Models, Graphical Models, Graphical LASSO Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox February 11 th, 2014 Emily Fox Multivariate Normal Models So far, we looked at univariate multiple regression If one has a multivariate response y i 2 R d Assuming independence between dimensions Emily Fox

Multivariate Normal Models If one has a multivariate response y i 2 R d Assuming correlation between the output dimensions Assume linear (or other mean regression) is removed and

2 Multivariate Normal Models If one has a multivariate response y i 2 R d Assuming correlation between the output dimensions Assume linear (or other mean regression) is removed and focus on the correlation structure Matrix valued parameter! Emily Fox Low-Rank Approximations In pictures 0 = diag( 2 1,..., 2 d) = = + Number of parameters: Emily Fox

Latent Factor Models Original multivariate

Proof: Emily Fox 2014 5 Lower-dim Embeddings

3 Latent Factor Models Original multivariate regression y i = B T x i + i, i N(0, ) = Latent factor model assumption: Low-rank approximation arises from a latent factor model Proof: Emily Fox Lower-dim Embeddings Sharing informa,on in low- dim subspace R d R k Emily Fox

4 Sparsity Assumptions What if we assume is sparse? More often, we can reasonably make statements about conditional independence Emily Fox Information Form Motivations for considering information form of multivariate normal Easier to read off conditional densities Has log-linear form in terms of information parameters Emily Fox

5 Conditional Densities Assume a model with and divide the dimensions into two sets Then, Emily Fox Conditional Densities Let A = {s, t} p(y A y Ā )=N 1 ( A A Āy Ā, AA ) Therefore, Emily Fox

6 Connection with Graphical Models Undirected graphical model or Markov random field (MRF) p(y, ) / Y t t(y t ) Y (s,t)2e st(y s,y t ) t(y t ) / e ty t st(y s,y t ) / e 1 2 y s st y t Emily Fox Sparse Precision vs. Covariance For a sparse precision matrix, the covariance need not be Emily Fox

7 ML Estimation for Given Graph 1 p 2 e 1 2 (y µ)t 1 (y µ) Assume a known graph G = {V,E} Rewrite log likelihood: Emily Fox ML Estimation for Given Graph L( ) = log tr(s ) Take gradient: Many approaches to solving: Barrier method add penalty if (Dahl et al. 2008) leaves the positive definite cone Coordinate descent method (cf., Hastie et al. 2009) Emily Fox

8 ML Estimation for Given Graph Can show that the optimal solution satisfies Example: G = B A S = B A = B = B 0 C A Emily Fox Estimating Graph Structure To learn the structure of the Gaussian graphical model, we want to trade off fit and sparsity Measure of fit: Encouraging sparsity: Overall objective = graphical LASSO or Glasso Emily Fox

9 Solving the Graphical LASSO Objective is convex, but non-smooth as in LASSO Also, positive definite constraint! There are many approaches to optimizing the objective Most common = coordinate descent akin to shooting algorithm (Friedman et al. 2008) Some issues Ballpark: several minutes for a 1000-variable problem Algorithms scale as O(d^3) Other approach = ADMM Emily Fox Faster Computations From Daniela Witten s talk at JSM 2012: 1. The jth variable is unconnected from all others in the graphical lasso solution if and only if S ij apple for all i =1,...,j 1, j +1,...,p. 2. Let A denote the p p matrix whose elements take the form A ii = 1, A ij =1 Sij >. Then the connected components of A are the same as the connected components of the graphical lasso solution. We can obtain the exact right answer by solving the graphical lasso on each connected component separately! Citations: Witten et al. JCGS 2011, Mazumder and Hastie JMLR 2012 Emily Fox

10 Covariance Screening for Glasso From Daniela Witten s talk at JSM 2012: I The solution to the graphical lasso problem with = 0.7 has five connected components (why 5?!) I Perform graphical lasso on each component separately! I Reduction in computational time: From O(50 3 )too(24 3 ). 12 / 23 Emily Fox

Multivariate Normal Models

Case Study 3: fmri Prediction Graphical LASSO Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Emily Fox February 26 th, 2013 Emily Fox 2013 1 Multivariate Normal Models