High Dimensional Covariance and Precision Matrix Estimation


Wei Wang (Washington University in St. Louis)
Thursday 23rd February, 2017

Outline
1 Introduction and Notation
2 Part I: Covariance Matrix Estimation
  Shrinkage Estimation
  Sparse Estimation
  Factor Model-based Estimation
3 Part II: Precision Matrix Estimation
  CLIME
  CONDREG

Introduction
Covariance matrix: marginal correlations between variables.
Precision (inverse covariance) matrix: conditional correlations between pairs of variables given the remaining variables.
The estimation of covariance and precision matrices is fundamental in multivariate analysis. In high dimensional settings, the sample covariance matrix has undesirable properties:
  p > n implies that it is singular;
  its eigenvalues are over-spread (the largest overestimated, the smallest underestimated) under the large p, small n scenario.

Eigenvalues of the sample covariance matrix under the large p, small n scenario
Figure: Average of the largest and smallest eigenvalues of the sample covariance matrices of i.i.d. samples from N(0, I) over 100 replications, where p ranges from 5 to 100 and n = 50.
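The simulation behind this figure is easy to reproduce. A minimal sketch, assuming numpy is available (the grid of p values and the random seed are illustrative choices, not taken from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_rep = 50, 100
p_grid = np.arange(5, 101, 5)

avg_largest, avg_smallest = [], []
for p in p_grid:
    lmax, lmin = [], []
    for _ in range(n_rep):
        X = rng.standard_normal((n, p))      # i.i.d. rows from N(0, I_p)
        S = np.cov(X, rowvar=False)          # sample covariance matrix
        eigvals = np.linalg.eigvalsh(S)      # eigenvalues in ascending order
        lmax.append(eigvals[-1])
        lmin.append(eigvals[0])
    avg_largest.append(np.mean(lmax))
    avg_smallest.append(np.mean(lmin))

# True eigenvalues are all 1; as p grows toward n, the average largest eigenvalue
# drifts well above 1 and the average smallest eigenvalue toward 0.
for p, hi, lo in zip(p_grid, avg_largest, avg_smallest):
    print(f"p={p:3d}  avg largest={hi:5.2f}  avg smallest={lo:5.2f}")
```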

Notation
X_i = (X_{i1}, \dots, X_{ip})^T, i = 1, \dots, n, are i.i.d. samples of a p-variate random vector X = (X_1, \dots, X_p)^T \in R^p with Cov(X) = \Sigma and precision matrix \Omega = \Sigma^{-1}. Write \Sigma = (\sigma_{ij})_{p \times p} and \Omega = (\omega_{ij})_{p \times p}.
Sample covariance matrix: S_n = (s_{jk})_{p \times p} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^T, where \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.
Operator (spectral) norm of a square matrix A = (a_{ij})_{p \times p}: \|A\|_{op} = \lambda_{\max}(A) for symmetric positive semi-definite A.
Frobenius norm: \|A\|_F = \sqrt{\sum_i \sum_j a_{ij}^2}.
Elementwise maximum norm: \|A\|_\infty = \max_{1 \le i \le p, 1 \le j \le p} |a_{ij}|.
\ell_1-norm: \|A\|_1 = \sum_i \sum_j |a_{ij}|, and \|A\|_{1,\mathrm{off}} = \sum_i \sum_{j \ne i} |a_{ij}|.
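For reference, a small numpy sketch computing these norms for a given square matrix (the helper name matrix_norms is illustrative):

```python
import numpy as np

def matrix_norms(A):
    """Norms used throughout the talk, computed for a square matrix A."""
    op_norm  = np.linalg.norm(A, 2)                   # largest singular value; equals
                                                      # lambda_max(A) when A is symmetric PSD
    fro_norm = np.sqrt(np.sum(A ** 2))                # Frobenius norm
    max_norm = np.max(np.abs(A))                      # elementwise maximum norm
    l1_norm  = np.sum(np.abs(A))                      # elementwise l1 norm
    l1_off   = l1_norm - np.sum(np.abs(np.diag(A)))   # l1 norm of the off-diagonal entries
    return op_norm, fro_norm, max_norm, l1_norm, l1_off
```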

Part I: Covariance Matrix Estimation

Shrinkage Estimation
Ledoit and Wolf (2003) proposed the shrinkage estimator
  \hat{S} = \lambda T + (1 - \lambda) S_n,
where T is the target matrix and \lambda \in [0, 1] is the shrinkage parameter. T is often chosen to be positive definite and well conditioned. Two popular target matrices are the identity matrix I and \mathrm{diag}(s_{11}, \dots, s_{pp}).
Warton (2008): the sample correlation matrix R_n is regularized as
  \hat{R}(\lambda) = \lambda R_n + (1 - \lambda) I,
where R_n = S_d^{-1/2} S_n S_d^{-1/2} and S_d = \mathrm{diag}(s_{11}, \dots, s_{pp}).
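A minimal numpy sketch of linear shrinkage toward a target and of the Warton-style regularized correlation matrix; lambda is treated as given here, whereas in practice it would be chosen by cross-validation or an analytic rule (function names are illustrative):

```python
import numpy as np

def shrink_to_target(S_n, lam, T=None):
    """Linear shrinkage S_hat = lam * T + (1 - lam) * S_n."""
    if T is None:
        T = np.eye(S_n.shape[0])           # identity target
    return lam * T + (1.0 - lam) * S_n

def warton_correlation(S_n, lam):
    """Regularized correlation matrix R_hat(lam) = lam * R_n + (1 - lam) * I."""
    d = np.sqrt(np.diag(S_n))              # marginal standard deviations
    R_n = S_n / np.outer(d, d)             # R_n = S_d^{-1/2} S_n S_d^{-1/2}
    return lam * R_n + (1.0 - lam) * np.eye(S_n.shape[0])
```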

Sparse Estimation: Banding, Tapering and Thresholding
Banding and tapering require a natural ordering among the variables and assume that variables farther apart in the ordering are less correlated.
1. Banding
Bickel and Levina (2008a) give the k-banded estimator of \Sigma:
  B_k(S_n) = [s_{ij} 1(|i - j| \le k)]_{p \times p}.
Here k (0 \le k \le p) is the banding parameter, which is usually chosen by cross-validation.
Figure: Banding of a 16 x 16 matrix whose (i, j)th entry is 0.8^{|i-j|}; k = 5.
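A minimal numpy sketch of the banding operator B_k(S_n); k is assumed given (the slide notes it is usually picked by cross-validation):

```python
import numpy as np

def band(S_n, k):
    """k-banded estimator B_k(S_n): keep entries with |i - j| <= k, zero out the rest."""
    p = S_n.shape[0]
    i, j = np.indices((p, p))
    return np.where(np.abs(i - j) <= k, S_n, 0.0)
```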

1. Banding (cont'd)
The banded estimator is consistent in the operator (spectral) norm, uniformly over the class of approximately bandable matrices
  U(\alpha, \varepsilon) = \{\Sigma : 0 < \varepsilon \le \lambda_{\min}(\Sigma) \le \lambda_{\max}(\Sigma) \le \varepsilon^{-1}, \ \max_j \sum_{i: |i - j| > k} |\sigma_{ij}| \le C k^{-\alpha} \text{ for all } k > 0\},
under the condition \log p / n \to 0 as p, n \to \infty.
  C > 0 and \varepsilon > 0 are fixed and independent of p.
  \alpha > 0 controls the rate of decay of the covariance entries \sigma_{ij} as one moves away from the main diagonal.
B_k(S_n) is not necessarily positive definite.

2. Tapering
A tapered estimator of \Sigma with a tapering matrix W = (w_{ij})_{p \times p} is given by the elementwise (Hadamard) product
  S_W = S_n \circ W = (s_{ij} w_{ij})_{p \times p}.
A smoother positive-definite tapering matrix, with off-diagonal entries gradually decaying to zero, ensures positive definiteness as well as the optimal rate of convergence of the tapered estimator. E.g. Cai et al. (2010) used the trapezoidal weight matrix
  w_{ij} = 1 if |i - j| \le k_h,
  w_{ij} = 2 - |i - j| / k_h if k_h < |i - j| < k,
  w_{ij} = 0 otherwise.
Under the autoregressive model scenario one usually takes k_h = k/2.
Banding is a special case of tapering with w_{ij} = 1 if |i - j| \le k and w_{ij} = 0 otherwise.
Consistency under both the operator and Frobenius norms holds over a larger class of covariance matrices than for banding, one in which the smallest eigenvalue is allowed to be 0.
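A minimal numpy sketch of the trapezoidal tapering weights and the resulting tapered estimator, assuming k_h = k/2 as on the slide:

```python
import numpy as np

def taper_weights(p, k):
    """Trapezoidal tapering weights of Cai et al. (2010) with k_h = k / 2."""
    k_h = k / 2.0
    dist = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # |i - j|
    return np.clip(2.0 - dist / k_h, 0.0, 1.0)    # 1 inside the band, linear decay, then 0

def taper(S_n, k):
    """Tapered estimator S_W: S_n multiplied elementwise by the weight matrix."""
    return S_n * taper_weights(S_n.shape[0], k)
```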

Comparison of Banding and Tapering
Figure: Banding and tapering of a 16 x 16 matrix whose (i, j)th entry is 0.8^{|i-j|}. Upper: banded (k = 5). Lower: tapered (k = 10, k_h = 5).

3. Thresholding
Thresholding does not require the variables to be ordered, so the estimator is invariant to permutations of the variables.
Sparsity: e.g. soft thresholding. A soft-thresholded covariance matrix estimator is defined by applying the soft-thresholding operator to S_n elementwise,
  \hat{\Sigma}_\lambda = S(S_n, \lambda),
where S(x, \lambda) = \mathrm{sign}(x)(|x| - \lambda)_+ is the soft-thresholding operator.
The soft-thresholded estimator is the solution of the optimization problem
  \hat{\Sigma}_\lambda = \arg\min_\Sigma \{ \tfrac{1}{2} \|\Sigma - S_n\|_F^2 + \lambda \|\Sigma\|_1 \} = \arg\min_\Sigma \sum_{i=1}^{p} \sum_{j=1}^{p} \{ \tfrac{1}{2} (\sigma_{ij} - s_{ij})^2 + \lambda |\sigma_{ij}| \}.
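A minimal numpy sketch of elementwise soft thresholding. By default it thresholds every entry, as in the displayed optimization problem; the keep_diag flag (an illustrative extra) restores the common variant that leaves the variances untouched:

```python
import numpy as np

def soft_threshold_cov(S_n, lam, keep_diag=False):
    """Elementwise soft thresholding: sign(s_ij) * (|s_ij| - lam)_+ ."""
    Sigma_hat = np.sign(S_n) * np.maximum(np.abs(S_n) - lam, 0.0)
    if keep_diag:
        np.fill_diagonal(Sigma_hat, np.diag(S_n))   # optional: do not shrink the variances
    return Sigma_hat
```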

3. Thresholding (cont'd)
Regularize the eigenvalues of S_n: e.g. Liu (2014), Estimation of Covariance matrices with Eigenvalue Constraints (EC2).
The EC2 estimator of the correlation matrix is defined as
  \hat{R}_{EC2} = \arg\min_\Sigma \tfrac{1}{2} \|S_n - \Sigma\|_F^2 + \lambda \|\Sigma\|_{1,\mathrm{off}}   s.t.   \tau \le \lambda_{\min}(\Sigma), \ \sigma_{jj} = 1,
where \tau > 0 is a desired lower bound on the minimum eigenvalue of the estimator.
The EC2 estimator of the covariance matrix is then
  \hat{\Sigma}_{EC2} = S_d^{1/2} \hat{R}_{EC2} S_d^{1/2},
where S_d^{1/2} = \mathrm{diag}(\sqrt{s_{11}}, \dots, \sqrt{s_{pp}}).

Factor Model-based Estimation
In many applications a more appropriate assumption is conditional sparsity, i.e. conditional on the common factors, the covariance matrix of the remaining components is sparse.
Fan et al. (2013) proposed an estimator of \Sigma, the Principal Orthogonal complEment Thresholding (POET) estimator, which can be written as a sum of a low-rank and a sparse matrix.
Start with the spectral decomposition of the sample covariance matrix of the data,
  S_n = \sum_{i=1}^{q} \hat{\lambda}_i \hat{e}_i \hat{e}_i^T + \hat{R},
where q is the number of selected principal components and \hat{R} = (\hat{r}_{ij}) is the matrix of residuals.
The estimator is obtained by adaptively thresholding the residual matrix after taking out the first q principal components.
Choosing q by data-driven methods is a familiar and well-studied topic in the PCA and factor analysis literature.
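A minimal numpy sketch of the POET construction; q and the threshold lam are assumed given, and the adaptive, entry-specific thresholds of Fan et al. are replaced by a single soft threshold for brevity:

```python
import numpy as np

def poet(S_n, q, lam):
    """POET-style estimator: low-rank part from the top q PCs of S_n,
    plus a soft-thresholded residual matrix."""
    eigvals, eigvecs = np.linalg.eigh(S_n)             # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:q]                # indices of the q largest eigenvalues
    low_rank = (eigvecs[:, idx] * eigvals[idx]) @ eigvecs[:, idx].T
    R = S_n - low_rank                                 # principal orthogonal complement
    R_thr = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)
    np.fill_diagonal(R_thr, np.diag(R))                # keep the residual variances
    return low_rank + R_thr
```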

Part II: Precision Matrix Estimation

Constrained l_1-minimization for Inverse Matrix Estimation (CLIME), Cai et al. (2011)
The CLIME estimator is the solution of the optimization problem
  \min_\Omega \|\Omega\|_1   s.t.   \|S_n \Omega - I\|_\infty \le \lambda,
where \lambda > 0 is the tuning parameter. The solution is usually not symmetric.
Suppose \hat{\Omega}_1 = (\hat{\omega}^1_{ij})_{p \times p} is the solution of the above optimization problem. The final CLIME estimator is defined as \hat{\Omega} = (\hat{\omega}_{ij}), where \hat{\omega}_{ij} = \hat{\omega}_{ji} is whichever of \hat{\omega}^1_{ij} and \hat{\omega}^1_{ji} has the smaller magnitude. This estimator is demonstrated to be positive definite with high probability.
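The CLIME problem decouples across columns: for each j one solves min ||w||_1 subject to ||S_n w - e_j||_\infty \le \lambda, which is a linear program in the split variables w = w_+ - w_-. A hedged sketch using scipy.optimize.linprog (a generic LP solver, not the specialized algorithms used in the CLIME literature), followed by the smaller-magnitude symmetrization:

```python
import numpy as np
from scipy.optimize import linprog

def clime(S_n, lam):
    """CLIME sketch: column-wise l1-minimization under a sup-norm constraint."""
    p = S_n.shape[0]
    Omega1 = np.zeros((p, p))
    c = np.ones(2 * p)                         # minimize sum(w_plus) + sum(w_minus)
    # Constraints:  S_n (w+ - w-) - e_j <= lam   and   -(S_n (w+ - w-) - e_j) <= lam
    A_ub = np.vstack([np.hstack([S_n, -S_n]),
                      np.hstack([-S_n, S_n])])
    for j in range(p):
        e_j = np.zeros(p)
        e_j[j] = 1.0
        b_ub = np.concatenate([lam + e_j, lam - e_j])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
        Omega1[:, j] = res.x[:p] - res.x[p:]
    # Symmetrize by keeping, for each (i, j), the entry with the smaller magnitude.
    keep_ij = np.abs(Omega1) <= np.abs(Omega1.T)
    return np.where(keep_ij, Omega1, Omega1.T)
```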

CONDition number REGularized estimation (CONDREG), Won et al. (2013)
The CONDREG estimator is the solution of the optimization problem
  \min_\Omega \mathrm{tr}(\Omega S_n) - \log\det(\Omega)   s.t.   \lambda_{\max}(\Omega) / \lambda_{\min}(\Omega) \le k,
where k > 0 is the tuning parameter (a bound on the condition number).
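At the optimum the precision estimate shares eigenvectors with S_n, so the problem reduces to choosing its eigenvalues within an interval [u, k u]; for a fixed u the best choice is to clip 1/l_i (the reciprocals of the sample eigenvalues) to that interval, and u can then be found by a one-dimensional search. The sketch below uses a simple grid search over u, a simplification under these assumptions rather than the closed-form path algorithm of Won et al.:

```python
import numpy as np

def condreg(S_n, k, n_grid=200):
    """Condition-number-regularized precision estimate (grid-search sketch).
    Minimizes tr(Omega S_n) - log det(Omega) subject to cond(Omega) <= k."""
    l, Q = np.linalg.eigh(S_n)                   # sample eigenvalues (ascending), eigenvectors
    l = np.maximum(l, 1e-12)                     # guard against exactly-zero eigenvalues
    # Candidate lower endpoints u of the precision eigenvalue interval [u, k*u].
    u_grid = np.geomspace(1.0 / (k * l.max()), 1.0 / l.min(), n_grid)
    best_obj, best_d = np.inf, None
    for u in u_grid:
        d = np.clip(1.0 / l, u, k * u)           # optimal eigenvalues for this interval
        obj = np.sum(l * d - np.log(d))          # tr(Omega S_n) - log det(Omega)
        if obj < best_obj:
            best_obj, best_d = obj, d
    return (Q * best_d) @ Q.T                    # Omega_hat = Q diag(d) Q^T
```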

Q&A
Thank you!