Inverse Covariance Estimation with Missing Data using the Concave-Convex Procedure

Inverse Covariance Estimation with Missing Data using the Concave-Convex Procedure
Jérôme Thai (1), Timothy Hunter (1), Anayo Akametalu (1), Claire Tomlin (1), Alex Bayen (1,2)
(1) Department of Electrical Engineering & Computer Sciences, University of California at Berkeley
(2) Department of Civil & Environmental Engineering, University of California at Berkeley
December 8, 2014

Outline
- Motivation: Gaussian Markov Random Fields
- Sparse estimator with missing data
- Concave-Convex Procedure
- Comparison with the Expectation-Maximization algorithm
- Conclusion

[Section: Motivation: Gaussian Markov Random Fields]

Multivariate Gaussian & Gaussian Markov Random Fields

Definition: Multivariate Gaussian
A random vector $x = (X_1, \ldots, X_p)^T \in \mathbb{R}^p$ is a multivariate Gaussian with mean $\mu$ and inverse covariance matrix $Q$ if its density is
$$\phi(x \mid \mu, Q^{-1}) = (2\pi)^{-p/2}\, |Q|^{1/2} \exp\!\Big(-\tfrac{1}{2}(x-\mu)^T Q (x-\mu)\Big)$$

Definition: Gaussian Markov Random Field
A random vector $x = (X_1, \ldots, X_p)^T$ is a Gaussian Markov random field w.r.t. a graph $G = (V, E)$ if it is a multivariate Gaussian with
$$Q_{ij} = 0 \iff \{i,j\} \notin E \iff X_i \perp X_j \mid x_{-\{i,j\}}, \qquad \forall\, i \neq j$$

Applications of Gaussian Markov Random Fields (GMRF)
- Find a sparse $\Sigma^{-1}$ capturing dependency patterns between biological factors [1]
- Sparse estimator for a GMRF from data $y_1, \ldots, y_n \in \mathbb{R}^p$ ($\hat\mu$-centered) [2, 3]:
$$\hat Q = \operatorname*{argmin}_{Q \succ 0}\; \sum_{j=1}^n \big\{ -\log|Q| + \operatorname{Tr}(y_j y_j^T Q) \big\} + \lambda \|Q\|_1$$

1 Dobra. Variable selection and dependency networks for genomewide data. Biostatistics, 2009.
2 Banerjee, El Ghaoui, d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. JMLR, 2008.
3 Friedman, Hastie, Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9, 2008.
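As a concrete reference point, here is a minimal CVXPY sketch of the complete-data estimator above (this code is mine, not from the slides; the function name is illustrative, and it uses the usual sample-covariance normalization, i.e. the slide's objective divided by n, which only rescales the penalty weight).

```python
import cvxpy as cp
import numpy as np

def sparse_precision_estimate(Y: np.ndarray, lam: float) -> np.ndarray:
    """Graphical-lasso-style estimator: Y is (n, p) with mu-hat-centered rows."""
    n, p = Y.shape
    S = Y.T @ Y / n                       # empirical covariance of the centered samples
    Q = cp.Variable((p, p), symmetric=True)
    obj = -cp.log_det(Q) + cp.trace(S @ Q) + lam * cp.sum(cp.abs(Q))
    cp.Problem(cp.Minimize(obj)).solve()  # log_det keeps Q inside the PD cone
    return Q.value

# e.g. Q_hat = sparse_precision_estimate(np.random.randn(200, 10), lam=0.1)
```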

[Section: Sparse estimator with missing data]

Maximum likelihood with missing data
- $\mathrm{obs}_j$ = observed entries, $\mathrm{mis}_j$ = missing entries in each sample $y_j$
- Sparse estimator (max. likelihood) with missing data:
$$\hat Q = \operatorname*{argmax}_{Q \succ 0}\; \sum_{j=1}^n \log \phi(y_{j,\mathrm{obs}} \mid \Sigma_{\mathrm{obs}_j}) - \lambda\|Q\|_1$$
where $\phi(\cdot \mid \Sigma_{\mathrm{obs}_j})$ is the density of the marginal Gaussian $\mathcal N(\mu_{\mathrm{obs}_j}, \Sigma_{\mathrm{obs}_j})$, with $\mu_{\mathrm{obs}_j}$, $\Sigma_{\mathrm{obs}_j}$ the subvector and submatrix over the entries $\mathrm{obs}_j$
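For intuition, a small NumPy/SciPy sketch (mine, with illustrative names; it assumes zero-mean data) that evaluates this penalized observed-data negative log-likelihood for a candidate Q:

```python
import numpy as np
from scipy.stats import multivariate_normal

def neg_penalized_loglik(Q, samples, obs_masks, lam):
    """samples: list of length-p arrays; obs_masks: boolean arrays marking obs_j."""
    Sigma = np.linalg.inv(Q)                       # full covariance Sigma = Q^{-1}
    total = 0.0
    for y, obs in zip(samples, obs_masks):
        Sigma_obs = Sigma[np.ix_(obs, obs)]        # marginal covariance over obs_j
        total -= multivariate_normal.logpdf(
            y[obs], mean=np.zeros(int(obs.sum())), cov=Sigma_obs)
    return total + lam * np.abs(Q).sum()           # add the l1 penalty
```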

What is the dependency of $(\Sigma_{\mathrm{obs}_j})^{-1}$ on $Q$?
- Modulo a permutation matrix $P_j$:
$$P_j y_j = \begin{bmatrix} y_{j,\mathrm{obs}} \\ y_{j,\mathrm{mis}} \end{bmatrix}, \qquad P_j Q P_j^T = \begin{bmatrix} Q_{\mathrm{obs}_j} & Q_{\mathrm{obs}_j \mathrm{mis}_j} \\ Q_{\mathrm{mis}_j \mathrm{obs}_j} & Q_{\mathrm{mis}_j} \end{bmatrix}$$
- We have the Schur complement w.r.t. $Q_{\mathrm{obs}_j}$:
$$S_j(Q) := (\Sigma_{\mathrm{obs}_j})^{-1} = Q_{\mathrm{obs}_j} - Q_{\mathrm{obs}_j \mathrm{mis}_j} Q_{\mathrm{mis}_j}^{-1} Q_{\mathrm{mis}_j \mathrm{obs}_j}$$
- Explicit formulation of the sparse estimator with missing data [4, 5]:
$$\hat Q = \operatorname*{argmin}_{Q \succ 0}\; \sum_{j=1}^n \big\{ -\log|S_j(Q)| + \operatorname{Tr}\big(y_{j,\mathrm{obs}}\, y_{j,\mathrm{obs}}^T\, S_j(Q)\big) \big\} + \lambda\|Q\|_1$$

4 Kolar and Xing. Estimating Sparse Precision Matrices from Data with Missing Values. ICML, 2012.
5 Städler and Bühlmann. Missing values: sparse inverse covariance estimation and an extension to sparse regression. Statistics and Computing, 2009.
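A quick numerical sanity check of the Schur-complement identity above (my own sketch; the indices and dimensions are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 6
A = rng.standard_normal((p, p))
Q = A @ A.T + p * np.eye(p)                        # a random positive definite Q
obs, mis = np.array([0, 2, 3]), np.array([1, 4, 5])

Sigma = np.linalg.inv(Q)
inv_Sigma_obs = np.linalg.inv(Sigma[np.ix_(obs, obs)])
schur = (Q[np.ix_(obs, obs)]
         - Q[np.ix_(obs, mis)] @ np.linalg.inv(Q[np.ix_(mis, mis)]) @ Q[np.ix_(mis, obs)])
print(np.allclose(inv_Sigma_obs, schur))           # True: S_j(Q) = (Sigma_obs_j)^{-1}
```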

Problem statement
- Inverse covariance estimation with missing data has a complicated non-convex objective.
- Design a novel application of the Concave-Convex Procedure to solve our program.
- Better theoretical and experimental convergence than the Expectation-Maximization algorithm.

[Section: Concave-Convex Procedure]

Difference of convex programs
Our program: $\min\; f(Q) - g(Q)$ s.t. $Q \succ 0$, with
$$f(Q) = -\sum_{j=1}^n \log|S_j(Q)| + \lambda\|Q\|_1, \qquad g(Q) = -\sum_{j=1}^n \operatorname{Tr}\big(y_{j,\mathrm{obs}}\, y_{j,\mathrm{obs}}^T\, S_j(Q)\big)$$
Why are both f and g convex?

Lemma
The function $Q \mapsto S(Q)$ is concave on the set of positive definite matrices.

Proposition
The function $Q \mapsto \log|S(Q)|$ is concave on the set of positive definite matrices.
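The concavity claim can be probed numerically; this short sketch (mine) checks that $t \mapsto \log|S((1-t)Q_0 + t Q_1)|$ lies above its chord for randomly drawn positive definite $Q_0, Q_1$ and an arbitrary observed/missing split:

```python
import numpy as np

def rand_pd(p, rng):
    A = rng.standard_normal((p, p))
    return A @ A.T + p * np.eye(p)

def log_det_schur(Q, obs, mis):
    S = (Q[np.ix_(obs, obs)]
         - Q[np.ix_(obs, mis)] @ np.linalg.inv(Q[np.ix_(mis, mis)]) @ Q[np.ix_(mis, obs)])
    return np.linalg.slogdet(S)[1]

rng = np.random.default_rng(1)
obs, mis = np.array([0, 1, 2]), np.array([3, 4, 5])
Q0, Q1 = rand_pd(6, rng), rand_pd(6, rng)
for t in np.linspace(0.0, 1.0, 11):
    value = log_det_schur((1 - t) * Q0 + t * Q1, obs, mis)
    chord = (1 - t) * log_det_schur(Q0, obs, mis) + t * log_det_schur(Q1, obs, mis)
    assert value >= chord - 1e-9                   # concave: function lies above the chord
```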

Review of the Concave-Convex Procedure

Definition: Difference of Convex (DC) program
Let f, g be two convex functions and $\mathcal X \subseteq \mathbb R^n$ a convex set; a Difference of Convex (DC) program is
$$\min\; f(x) - g(x) \quad \text{s.t. } x \in \mathcal X$$

Concave-Convex Procedure (CCCP) to solve DC programs [6]
At $x_t$, solve the convex approximation obtained by linearizing $g$ ($L^{x_t}_g$):
$$x_{t+1} = \operatorname*{argmin}_{x \in \mathcal X}\; f(x) - g(x_t) - \nabla g(x_t)^T (x - x_t)$$

Proposition
CCCP is a descent method: $f(x_{t+1}) - g(x_{t+1}) \le f(x_t) - g(x_t)$.

6 Yuille and Rangarajan. The Concave-Convex Procedure. Neural Computation, 2003.
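To illustrate the mechanics of the update, here is a toy example of my own, unrelated to the covariance problem: minimize $f(x) - g(x)$ with $f(x) = x^4$ and $g(x) = 2x^2$ by linearizing $g$ at each iterate and minimizing the convex surrogate.

```python
import numpy as np
from scipy.optimize import minimize_scalar

f = lambda x: x**4
g = lambda x: 2.0 * x**2
grad_g = lambda x: 4.0 * x

x = 3.0                                            # arbitrary starting point
for _ in range(50):
    # convex surrogate at x_t: f(x) - g(x_t) - g'(x_t) (x - x_t)
    surrogate = lambda z, xt=x: f(z) - g(xt) - grad_g(xt) * (z - xt)
    x = minimize_scalar(surrogate).x               # minimize the surrogate numerically
print(x, f(x) - g(x))                              # approaches x = 1, objective value -1
```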

Illustration of the Concave-Convex Procedure (CCCP) [figure omitted from the transcription]

[Section: Comparison with the Expectation-Maximization algorithm]

Expectation-Maximization (EM) algorithm
- Our estimate:
$$\hat Q = \operatorname*{argmin}_{Q \succ 0}\; \sum_{j=1}^n -\log \underbrace{\phi(y_{j,\mathrm{obs}} \mid \Sigma_{\mathrm{obs}_j})}_{\text{density of the marginal distribution over } \mathrm{obs}_j} + \lambda\|Q\|_1$$
- Difficult because $(\Sigma_{\mathrm{obs}_j})^{-1} = S_j(Q) = Q_{\mathrm{obs}_j} - Q_{\mathrm{obs}_j \mathrm{mis}_j} Q_{\mathrm{mis}_j}^{-1} Q_{\mathrm{mis}_j \mathrm{obs}_j}$

Expectation-Maximization (EM) algorithm: updates $Q_t$ in two steps.
E-step: treat the missing entries as $x_{j,\mathrm{mis}} \mid \{x_{j,\mathrm{obs}} = y_{j,\mathrm{obs}},\ Q = Q_t\}$ and form
$$\hat\Sigma_t = \frac{1}{n} \sum_j \mathbb E_{x_{j,\mathrm{mis}} \mid x_{j,\mathrm{obs}} = y_{j,\mathrm{obs}}}\big[y_j y_j^T\big]$$
M-step:
$$Q_{t+1} = \operatorname*{argmin}_{Q \succ 0}\; -\sum_j \mathbb E_{x_{j,\mathrm{mis}} \mid x_{j,\mathrm{obs}} = y_{j,\mathrm{obs}}}\big[\log \phi(x_j \mid Q^{-1})\big] + \lambda\|Q\|_1 = \operatorname*{argmin}_{Q \succ 0}\; -\log|Q| + \operatorname{Tr}(\hat\Sigma_t Q) + \lambda\|Q\|_1$$
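For reference, a sketch of the Gaussian E-step above (the formulas are the standard conditional-Gaussian identities; the function name and the zero-mean assumption are mine):

```python
import numpy as np

def e_step_second_moment(Q_t, y, obs, mis):
    """E[y y^T | y_obs = y[obs]] for x ~ N(0, inv(Q_t)); obs/mis are index arrays."""
    Sigma = np.linalg.inv(Q_t)
    K = Sigma[np.ix_(mis, obs)] @ np.linalg.inv(Sigma[np.ix_(obs, obs)])
    mu_mis = K @ y[obs]                                              # conditional mean of x_mis
    cov_mis = Sigma[np.ix_(mis, mis)] - K @ Sigma[np.ix_(obs, mis)]  # conditional covariance
    m = np.zeros(Q_t.shape[0])
    m[obs], m[mis] = y[obs], mu_mis                                  # E[y | y_obs]
    M = np.outer(m, m)
    M[np.ix_(mis, mis)] += cov_mis                                   # add Var(x_mis | y_obs)
    return M

# Sigma_hat_t = np.mean([e_step_second_moment(Q_t, y, obs, mis) for ...], axis=0);
# the M-step is then the penalized log-det problem with Tr(Sigma_hat_t Q), e.g. via CVXPY as earlier.
```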

EM algorithm as a Concave-Convex Procedure

Proposition
For Gaussians, the EM algorithm is a CCCP with the DC decomposition
$$\min\; f_{EM}(Q) - g_{EM}(Q) \quad \text{s.t. } Q \succ 0$$
$$f_{EM}(Q) = -\log|Q| + \lambda\|Q\|_1, \qquad g_{EM}(Q) = -\frac{1}{n}\sum_{j=1}^n \big\{\log|Q_{\mathrm{mis}_j}| + \operatorname{Tr}\big(y_{j,\mathrm{obs}}\, y_{j,\mathrm{obs}}^T\, S_j(Q)\big)\big\}$$

If we pose $h(Q) := -\frac{1}{n}\sum_{j=1}^n \log|Q_{\mathrm{mis}_j}|$:
EM decomposition: $f_{EM} - (h + g)$; our decomposition: $(f_{EM} - h) - g$.

Proposition
With our decomposition, the CCCP surrogate is a lower bound on EM's: $(f_{EM} - h) - L_g \le f_{EM} - (L_h + L_g)$.
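To unpack the proposition (this expansion is mine, using only the tangent inequality for convex functions): write $F = f_{EM} - h - g$ for the true objective, $M_t = (f_{EM} - h) - L_g(\cdot\,; Q_t)$ for our surrogate at iterate $Q_t$, and $M_t^{EM} = f_{EM} - L_h(\cdot\,; Q_t) - L_g(\cdot\,; Q_t)$ for EM's. Since the linearization of a convex function lies below it, $L_g \le g$ and $L_h \le h$, hence
$$F(Q) \;\le\; M_t(Q) \;\le\; M_t^{EM}(Q) \quad \text{for all } Q \succ 0, \qquad F(Q_t) = M_t(Q_t) = M_t^{EM}(Q_t),$$
i.e. both surrogates majorize the objective and touch it at $Q_t$, and ours is the tighter of the two, which gives a per-iteration decrease guarantee at least as strong as EM's.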

Experimental setting
1. Generate n samples $y_1, \ldots, y_n$ from $\mathcal N(0, \Sigma)$ for 3 models with dimension $p = 10, 50, 100$ and $n = 100, 150, 200$ [7]:
   - Model 1 (AR(1)): $\Sigma_{ij} = 0.7^{|i-j|}$
   - Model 2: $\Sigma_{ij} = \mathbb 1_{i=j} + 0.4\, \mathbb 1_{|i-j|=1} + 0.2\, \mathbb 1_{|i-j| \in \{2,3\}} + 0.1\, \mathbb 1_{|i-j|=4}$
   - Model 3: $\Sigma = B + \delta I$, where $B_{ij} = 0$ if $i = j$ and $B_{ij} = 0$ or $0.5$, each with probability $0.5$, if $i \neq j$
2. Remove at random 20, 40, 60, 80% of the data for each sample
3. Impute $y_{j,\mathrm{mis}}$ by row means, giving complete sufficient statistics $\hat\Sigma$
4. Initialization: $Q_0 = \operatorname*{argmin}_{Q \succ 0}\; -\log|Q| + \operatorname{Tr}(\hat\Sigma Q) + \lambda\|Q\|_1$

7 N. Städler and P. Bühlmann. Missing values: sparse inverse covariance estimation and an extension to sparse regression. Statistics and Computing, 2009.
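A sketch of this setup (my reconstruction of the description above; the helper names and the default value of $\delta$ are assumptions, and $\delta$ must be large enough to keep $\Sigma$ positive definite):

```python
import numpy as np

def make_sigma(model, p, rng, delta=1.0):
    i, j = np.indices((p, p))
    d = np.abs(i - j)
    if model == 1:                                         # Model 1: AR(1)
        return 0.7 ** d
    if model == 2:                                         # Model 2: banded
        return 1.0 * (d == 0) + 0.4 * (d == 1) + 0.2 * ((d == 2) | (d == 3)) + 0.1 * (d == 4)
    B = 0.5 * rng.binomial(1, 0.5, size=(p, p))            # Model 3: off-diagonal entries 0 / 0.5
    B = np.triu(B, 1)
    B = B + B.T                                            # symmetrize, zero diagonal
    return B + delta * np.eye(p)                           # delta chosen to make Sigma PD (assumption)

rng = np.random.default_rng(0)
p, n, miss_frac = 10, 100, 0.4
Y = rng.multivariate_normal(np.zeros(p), make_sigma(1, p, rng), size=n)
mask = rng.random((n, p)) < miss_frac                      # True = entry removed
Y_miss = np.where(mask, np.nan, Y)
row_means = np.nanmean(Y_miss, axis=1, keepdims=True)      # mean of observed entries per sample
Y_imputed = np.where(mask, row_means, Y_miss)              # impute y_{j,mis} by row means
Sigma_hat = Y_imputed.T @ Y_imputed / n                    # complete sufficient statistic for Q_0
```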

Numerical results on synthetic datasets [figures omitted from the transcription: panels for missing = 20%, 40%, 60%, 80%]

Numerical results on real datasets [figures omitted from the transcription: panels for missing = 20%, 40%, 60%, 80%]

Numerical results [figure omitted from the transcription]

[Section: Conclusion]

Summary of contributions
- The Schur complement is log-concave.
- Hence the sparse inverse covariance estimator with missing data is a DC program.
- A new CCCP for the sparse inverse covariance estimator.
- Superior convergence, in number of iterations, compared to the EM algorithm.
- Validated by numerical results on synthetic and real datasets.