High Dimensional Inverse Covariance Matrix Estimation via Linear Programming


High Dimensional Inverse Covariance Matrix Estimation via Linear Programming
Ming Yuan
October 24, 2011

Gaussian Graphical Model
X = (X_1, ..., X_p)^T ∼ N(µ, Σ)
Inverse covariance matrix: Σ^{-1} = Ω = (ω_{ij})_{p×p}
ω_{ij} = 0 iff X_i and X_j are conditionally independent given the remaining variables
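
To make this concrete, here is a tiny numpy illustration (my own example, not part of the talk): a zero entry in Ω signals conditional independence even though the corresponding covariance entry is typically nonzero.

```python
import numpy as np

# Illustrative only: a sparse precision matrix encodes conditional independence
# even when the covariance matrix itself is dense.
Omega = np.array([[ 2., -1.,  0.],
                  [-1.,  2., -1.],
                  [ 0., -1.,  2.]])      # omega_13 = 0
Sigma = np.linalg.inv(Omega)
print(Sigma[0, 2])                        # nonzero: X1 and X3 are marginally correlated
# omega_13 = 0 means X1 and X3 are conditionally independent given X2.
```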

Previous work
Yuan and Lin (2007), Banerjee, El Ghaoui and d'Aspremont (2008), d'Aspremont et al. (2008), Friedman et al. (2008):
min_Ω tr(Σ̂ Ω) − log det(Ω) + λ Σ_{i≠j} |ω_{ij}|,
where Σ̂ is the sample covariance matrix. These methods may not perform well when p is larger than n.
Meinshausen and Bühlmann (2006): neighborhood selection approach, which focuses on identifying the correct graphical model rather than on estimation.

Motivation
Focus of this paper: estimate a high dimensional inverse covariance matrix that can be well approximated by sparse matrices.
The estimation procedure is a linear program, so it has the potential to scale to very high dimensional problems.
Oracle inequalities are established for the estimation error.

Methodology
Write X_{-i} = (X_1, ..., X_{i-1}, X_{i+1}, ..., X_p)^T. Then
X_i | X_{-i} ∼ N( µ_i + Σ_{i,-i} Σ_{-i,-i}^{-1} (X_{-i} − µ_{-i}),  Σ_{ii} − Σ_{i,-i} Σ_{-i,-i}^{-1} Σ_{-i,i} ).
Equivalently,
X_i = α_i + X_{-i}^T θ^{(i)} + ε_i,
where
α_i = µ_i − Σ_{i,-i} Σ_{-i,-i}^{-1} µ_{-i} is a scalar,
θ^{(i)} = Σ_{-i,-i}^{-1} Σ_{-i,i} is a (p−1)-dimensional vector,
ε_i ∼ N(0, Σ_{ii} − Σ_{i,-i} Σ_{-i,-i}^{-1} Σ_{-i,i}).

Regression and Inverse Covariance Matrix
Ω_{ii} = (Σ_{ii} − Σ_{i,-i} Σ_{-i,-i}^{-1} Σ_{-i,i})^{-1} = (Var(ε_i))^{-1};
Ω_{-i,i} = −(Σ_{ii} − Σ_{i,-i} Σ_{-i,-i}^{-1} Σ_{-i,i})^{-1} Σ_{-i,-i}^{-1} Σ_{-i,i} = −(Var(ε_i))^{-1} θ^{(i)}.
Thus sparsity in the entries of Ω corresponds to sparsity in θ^{(i)}.
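
As a quick sanity check on these identities, here is a minimal numpy sketch (mine, with arbitrary example values) verifying that the i-th column of Ω = Σ^{-1} is recovered from the regression coefficients θ^{(i)} and the residual variance:

```python
import numpy as np

# Verify Omega_ii = 1/Var(eps_i) and Omega_{-i,i} = -theta^(i)/Var(eps_i)
# for an arbitrary positive definite Sigma (example values only).
rng = np.random.default_rng(0)
p, i = 5, 2
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)           # positive definite covariance
Omega = np.linalg.inv(Sigma)

mask = np.arange(p) != i                  # indices of X_{-i}
theta_i = np.linalg.solve(Sigma[np.ix_(mask, mask)], Sigma[mask, i])
var_eps = Sigma[i, i] - Sigma[i, mask] @ theta_i

assert np.isclose(Omega[i, i], 1.0 / var_eps)
assert np.allclose(Omega[mask, i], -theta_i / var_eps)
```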

Initial estimate
Let Z_i = X_i − X̄_i denote the centered variable. The Dantzig selector estimate of θ^{(i)} is
min_{β ∈ R^{p−1}} ||β||_{ℓ_1} subject to ||E_n[(Z_i − Z_{-i}^T β) Z_{-i}]||_{ℓ_∞} ≤ δ,
where E_n denotes the sample average and δ > 0 is a tuning parameter. Since E_n[Z_i Z_j] = S_{ij}, this is equivalent to
min_{β ∈ R^{p−1}} ||β||_{ℓ_1} subject to ||S_{-i,i} − S_{-i,-i} β||_{ℓ_∞} ≤ δ.
Once an estimate θ̂^{(i)} is obtained,
Var̂(ε_i) = E_n[(Z_i − Z_{-i}^T θ̂^{(i)})^2] = S_{ii} − 2 θ̂^{(i)T} S_{-i,i} + θ̂^{(i)T} S_{-i,-i} θ̂^{(i)}.
The initial estimate Ω̃ is obtained by plugging θ̂^{(i)} and Var̂(ε_i) into the identities above, repeating the procedure for i = 1, ..., p.
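
Each column update is a linear program. Below is a minimal sketch (my own illustration, not the author's code) of one column update using scipy.optimize.linprog, splitting β into positive and negative parts; the function name dantzig_column is an assumption.

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_column(S, i, delta):
    """Sketch of the per-column Dantzig selector LP:
    min ||beta||_1  s.t.  ||S_{-i,i} - S_{-i,-i} beta||_inf <= delta,
    solved by writing beta = beta_plus - beta_minus with both parts >= 0."""
    p = S.shape[0]
    mask = np.arange(p) != i
    S_mm = S[np.ix_(mask, mask)]          # S_{-i,-i}
    s_mi = S[mask, i]                     # S_{-i,i}
    m = p - 1

    c = np.ones(2 * m)                    # objective: sum(beta_plus + beta_minus)
    # S_mm (b+ - b-) - s_mi <= delta  and  -(S_mm (b+ - b-) - s_mi) <= delta
    A_ub = np.vstack([np.hstack([S_mm, -S_mm]),
                      np.hstack([-S_mm, S_mm])])
    b_ub = np.concatenate([delta + s_mi, delta - s_mi])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    return res.x[:m] - res.x[m:]          # beta = beta_plus - beta_minus
```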

Symmetrization
The initial estimate Ω̃ is not itself a satisfactory estimate, since it is not symmetric.
However, Ω̃ can still be close to Ω_0 in a suitable matrix operator norm, which motivates the symmetrization step below.

Matrix Operator Norm
For a vector x = (x_1, x_2, ..., x_p)^T: ||x||_{ℓ_q} = (|x_1|^q + ... + |x_p|^q)^{1/q}.
For a p × p matrix A: ||A||_{ℓ_q} = sup_{x ≠ 0} ||Ax||_{ℓ_q} / ||x||_{ℓ_q}.
Special cases:
||A||_{ℓ_1} = max_{1 ≤ j ≤ p} Σ_{i=1}^p |a_{ij}| (maximum absolute column sum);
||A||_{ℓ_∞} = max_{1 ≤ i ≤ p} Σ_{j=1}^p |a_{ij}| (maximum absolute row sum);
||A||_{ℓ_2} = σ_max(A), the largest singular value.
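
These three operator norms are available directly in numpy; a quick sketch with my own example values:

```python
import numpy as np

A = np.array([[1., -2.],
              [3.,  4.]])
print(np.linalg.norm(A, 1))        # l1 operator norm: max column sum of |a_ij| -> 6.0
print(np.linalg.norm(A, np.inf))   # l_inf operator norm: max row sum of |a_ij| -> 7.0
print(np.linalg.norm(A, 2))        # l2 operator norm: largest singular value
```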

Symmetrization
Construct the final estimate as
Ω̂ = argmin_{Ω symmetric} ||Ω − Ω̃||_{ℓ_1}.
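
Since ||Ω − Ω̃||_{ℓ_1} is the maximum absolute column sum of the difference, this projection is itself a linear program. A minimal sketch using cvxpy (my own formulation and function name; cvxpy reformulates the absolute values and the max, and any LP solver would do):

```python
import cvxpy as cp
import numpy as np

def symmetrize_l1(Omega_tilde):
    """Symmetric matrix closest to Omega_tilde in the l1 operator norm
    (maximum absolute column sum of the difference)."""
    p = Omega_tilde.shape[0]
    Omega = cp.Variable((p, p), symmetric=True)
    col_sums = cp.sum(cp.abs(Omega - Omega_tilde), axis=0)   # column sums of |difference|
    prob = cp.Problem(cp.Minimize(cp.max(col_sums)))
    prob.solve()
    return Omega.value
```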

Algorithm Summary
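
Putting the pieces together, here is a sketch of the full procedure (it assembles the hypothetical helpers dantzig_column and symmetrize_l1 from the earlier sketches; the name estimate_precision is mine):

```python
import numpy as np

def estimate_precision(X, delta):
    """Column-by-column Dantzig selector followed by l1-operator-norm
    symmetrization; assumes dantzig_column and symmetrize_l1 are in scope."""
    n, p = X.shape
    S = np.cov(X, rowvar=False, bias=True)        # sample covariance E_n[Z Z^T]
    Omega_tilde = np.zeros((p, p))
    for i in range(p):
        mask = np.arange(p) != i
        theta_i = dantzig_column(S, i, delta)
        var_eps = (S[i, i] - 2 * theta_i @ S[mask, i]
                   + theta_i @ S[np.ix_(mask, mask)] @ theta_i)
        Omega_tilde[i, i] = 1.0 / var_eps          # Omega~_ii = 1/Var^(eps_i)
        Omega_tilde[mask, i] = -theta_i / var_eps  # Omega~_{-i,i} = -theta^(i)/Var^(eps_i)
    return symmetrize_l1(Omega_tilde)
```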

Oracle Set
A ≻ 0: the matrix A is symmetric and positive definite.
λ_min(A) and λ_max(A): the smallest and largest eigenvalues of A, respectively.
||A||_max = max_{1 ≤ i,j ≤ p} |a_{ij}|.
Approximation error is measured by ||Σ_0 Ω − I||_max.
These quantities enter the definition of the oracle set O(ν, η, τ) appearing in Theorem 1.

Oracle Inequality
Theorem 1. There exist constants C_1, C_2 depending only on ν, τ, λ_min(Ω_0) and λ_max(Ω_0), and C_3 depending only on c_0, such that for any A > 0, with probability at least 1 − p^{−A},
||Ω̂ − Ω_0||_{ℓ_1} ≤ C_1 inf_{Ω ∈ O(ν,η,τ)} ( ||Ω − Ω_0||_{ℓ_1} + deg(Ω) δ ),
provided that
inf_{Ω ∈ O(ν,η,τ)} ( ||Ω − Ω_0||_{ℓ_1} + deg(Ω) δ ) ≤ C_2
and
δ ≥ νη + C_3 ν τ λ_min^{−1}(Ω_0) ((A + 1) n^{−1} log p)^{1/2},
where deg(Ω) = max_i Σ_j I(Ω_{ij} ≠ 0).

Oracle Inequality
Recall that for a symmetric matrix A, ||A||_{ℓ_∞} = ||A||_{ℓ_1} and ||A||_{ℓ_2} ≤ (||A||_{ℓ_1} ||A||_{ℓ_∞})^{1/2} = ||A||_{ℓ_1}.
Corollary 2. Under the same conditions as Theorem 1, for any A > 0, with probability at least 1 − p^{−A},
||Ω̂ − Ω_0||_{ℓ_∞}, ||Ω̂ − Ω_0||_{ℓ_2} ≤ C_1 inf_{Ω ∈ O(ν,η,τ)} ( ||Ω − Ω_0||_{ℓ_1} + deg(Ω) δ ).
Note: the proposed Ω̂ is not guaranteed to be positive definite, but Corollary 2 suggests that with overwhelming probability it is, since λ_min(Ω̂) ≥ λ_min(Ω_0) − ||Ω̂ − Ω_0||_{ℓ_2}.

Oracle Inequality
Moreover, a positive definite estimate, still denoted Ω̂, can always be constructed by replacing the negative eigenvalues of Ω̂ with δ.
Corollary 3. Under the same conditions as Theorem 1, for any A > 0, with probability at least 1 − p^{−A},
||Ω̂^{−1} − Σ_0||_{ℓ_2}, ||Ω̂ − Ω_0||_{ℓ_2} ≤ C_1 inf_{Ω ∈ O(ν,η,τ)} ( ||Ω − Ω_0||_{ℓ_1} + deg(Ω) δ ).

Sparse Models
Define
M_1(τ_0, ν_0, d) = { A ≻ 0 : ||A||_{ℓ_1} ≤ τ_0, ν_0^{−1} ≤ λ_min(A) ≤ λ_max(A) ≤ ν_0, deg(A) ≤ d },
with τ_0, ν_0 > 1 and deg(A) = max_i Σ_j I(A_{ij} ≠ 0).
Theorem 4. Assume that d (n^{−1} log p)^{1/2} = o(1). Then
sup_{Ω_0 ∈ M_1(τ_0, ν_0, d)} ||Ω̂ − Ω_0||_{ℓ_q} = O_p( d (n^{−1} log p)^{1/2} ),
provided that δ = C (n^{−1} log p)^{1/2} and C is large enough.

Sparse Models
Theorem 5. Assume that d (n^{−1} log p)^{1/2} = o(1). Then there exists a constant C > 0 depending only on τ_0 and ν_0 such that
inf_{Ω̄} sup_{Ω_0 ∈ M_1(τ_0, ν_0, d)} P( ||Ω̄ − Ω_0||_{ℓ_1} ≥ C d (n^{−1} log p)^{1/2} ) > 0,
where the infimum is taken over all estimators Ω̄ based on the observations X^{(1)}, ..., X^{(n)}.
Note: the estimability of a sparse inverse covariance matrix is dictated by its degree deg(Ω_0), as opposed to the total number of nonzero entries.

Approximately Sparse Models
In many applications, the inverse covariance matrix is only approximately sparse. Define
M_2(τ_0, ν_0, α, M) = { A ≻ 0 : ||A^{−1}||_{ℓ_1} ≤ τ_0, ν_0^{−1} ≤ λ_min(A) ≤ λ_max(A) ≤ ν_0, max_{1 ≤ i ≤ p} Σ_{j=1}^p |A_{ij}|^α ≤ M },
with τ_0, ν_0 > 1 and 0 < α < 1.
M_1 is the limiting case of M_2 as α approaches 0.

Approximately Sparse Models
Theorem 6. Assume that M (n^{−1} log p)^{(1−α)/2} = o(1). Then
sup_{Ω_0 ∈ M_2(τ_0, ν_0, α, M)} ||Ω̂ − Ω_0||_{ℓ_q} = O_p( M (n^{−1} log p)^{(1−α)/2} ),
provided that δ = C (n^{−1} log p)^{1/2} and C is sufficiently large.
Theorem 7. Assume that M (n^{−1} log p)^{(1−α)/2} = o(1). Then there exists a constant C > 0 depending only on τ_0 and ν_0 such that
inf_{Ω̄} sup_{Ω_0 ∈ M_2(τ_0, ν_0, α, M)} P( ||Ω̄ − Ω_0||_{ℓ_1} ≥ C M (n^{−1} log p)^{(1−α)/2} ) > 0
and
inf_{Σ̄} sup_{Σ_0 ∈ M_2(τ_0, ν_0, α, M)} P( ||Σ̄ − Σ_0||_{ℓ_1} ≥ C M (n^{−1} log p)^{(1−α)/2} ) > 0,
where the infima are taken over all estimators based on the observations X^{(1)}, ..., X^{(n)}.

Simulation Result
n = 50; p = 25, 50, 100, or 200.
(Σ_0)_{ij} = ρ^{|i−j|}, where ρ = 0.1, 0.2, ..., 0.7.
δ = (2 n^{−1} log p)^{1/2} is used for all simulation studies.
Methods for comparison:
(1) ℓ_1 penalized likelihood estimate (Yuan and Lin, 2007);
(2) neighborhood selection approach (Meinshausen and Bühlmann, 2006).
Estimation error is measured by ||Ĉ − C||_{ℓ_2}.
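
A rough sketch of this simulation design in numpy (my own code and parameter choices, reusing the hypothetical estimate_precision helper from the algorithm sketch; the error metric is the ℓ_2 operator norm as on the slide):

```python
import numpy as np

# Illustrative simulation setup; not the paper's code.
rng = np.random.default_rng(0)
n, p, rho = 50, 25, 0.3
idx = np.arange(p)
Sigma0 = rho ** np.abs(idx[:, None] - idx[None, :])    # (Sigma_0)_ij = rho^|i-j|
Omega0 = np.linalg.inv(Sigma0)

X = rng.multivariate_normal(np.zeros(p), Sigma0, size=n)
delta = np.sqrt(2 * np.log(p) / n)                     # assumed tuning choice
Omega_hat = estimate_precision(X, delta)               # helper from the earlier sketch

print(np.linalg.norm(Omega_hat - Omega0, 2))           # l2 operator norm error
```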

Simulation Result