High Dimensional Covariance and Precision Matrix Estimation
Wei Wang
Washington University in St. Louis
Thursday 23rd February, 2017
Outline
1. Introduction and Notation
2. Part I: Covariance Matrix Estimation
   - Shrinkage Estimation
   - Sparse Estimation
   - Factor Model-based Estimation
3. Part II: Precision Matrix Estimation
   - CLIME
   - CONDREG
Introduction
- Covariance matrix: marginal correlations between pairs of variables.
- Precision (inverse covariance) matrix: conditional correlations between pairs of variables given the remaining variables.
The estimation of covariance and precision matrices is fundamental in multivariate analysis. In high-dimensional settings, however, the sample covariance matrix has undesirable properties:
- p > n implies that it is singular;
- its eigenvalues spread out (the largest overestimated, the smallest underestimated) under the large-p-small-n scenario.
Eigenvalues of the sample covariance matrix under the large-p-small-n scenario
Figure: Average of the largest and smallest eigenvalues of the sample covariance matrices of i.i.d. samples from N(0, I) over 100 replications, where p ranges from 5 to 100 and n = 50.
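The eigenvalue spreading in the figure is easy to reproduce numerically. A minimal sketch (a single replication with an arbitrary seed, rather than the 100 replications averaged in the figure), with n = 50 and p = 100:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 100
X = rng.standard_normal((n, p))      # n i.i.d. draws from N(0, I_p)
S = np.cov(X, rowvar=False)          # p x p sample covariance

eigvals = np.linalg.eigvalsh(S)      # ascending order
# Every population eigenvalue is 1, yet the sample extremes spread out:
# the largest is inflated well above 1 and, since rank(S) <= n - 1 < p,
# the smallest is (numerically) 0 -- S is singular.
print(f"largest  = {eigvals[-1]:.2f}")
print(f"smallest = {eigvals[0]:.2e}")
```

With p/n = 2, random-matrix theory predicts the largest sample eigenvalue near (1 + sqrt(p/n))^2, roughly 5.8, even though the truth is 1.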
Notation
- X_i = (X_{i1}, ..., X_{ip})^T, i = 1, ..., n, are i.i.d. samples of a p-variate random vector X = (X_1, ..., X_p)^T ∈ R^p with Cov(X) = Σ and precision matrix Ω = Σ^{-1}. Write Σ = (σ_{ij})_{p×p} and Ω = (ω_{ij})_{p×p}.
- Sample covariance matrix: S_n = (s_{jk})_{p×p} = (1/(n−1)) Σ_{i=1}^n (X_i − X̄)(X_i − X̄)^T, where X̄ = (1/n) Σ_{i=1}^n X_i.
- Operator norm of a square matrix A = (a_{ij})_{p×p}: ‖A‖_op = λ_max(A) (for symmetric positive semi-definite A).
- Frobenius norm: ‖A‖_F = (Σ_i Σ_j a_{ij}^2)^{1/2}.
- ℓ1-norms: ‖A‖_1 = Σ_i Σ_j |a_{ij}|, ‖A‖_{1,off} = Σ_i Σ_{j≠i} |a_{ij}|.
- Max norm: ‖A‖_∞ = max_{1≤i≤p, 1≤j≤p} |a_{ij}|.
Part I: Covariance Matrix Estimation
Shrinkage Estimation
Ledoit and Wolf (2003) proposed the shrinkage estimator
S* = λT + (1 − λ)S_n,
where T is the target matrix and λ ∈ [0, 1] is the shrinkage parameter. T is often chosen to be positive definite and well conditioned. Two popular target matrices:
- the identity matrix I
- diag(s_11, ..., s_pp)
Warton (2008): the sample correlation matrix R_n is regularized as
R̂(λ) = λR_n + (1 − λ)I,
where R_n = S_d^{−1/2} S_n S_d^{−1/2} and S_d = diag(s_11, ..., s_pp).
Sparse Estimation: Banding, Tapering and Thresholding
Banding and tapering require a natural ordering among the variables and assume that variables farther apart in the ordering are less correlated.
1. Banding
Bickel and Levina (2008a) give the k-banded estimator of Σ:
B_k(S_n) = [s_{ij} 1(|i − j| ≤ k)]_{p×p}.
Here k (0 ≤ k ≤ p) is the banding parameter, usually chosen by cross-validation.
Figure: Banding of a 16 × 16 matrix whose (i, j)th entry is 0.8^{|i−j|}; k = 5.
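The banding operator is a one-liner. This sketch (function name illustrative) reproduces the figure's setting, Σ_{ij} = 0.8^{|i−j|} with p = 16 and k = 5:

```python
import numpy as np

def band(S, k):
    """Bickel-Levina banded estimator: keep entries with |i - j| <= k, zero the rest."""
    i, j = np.indices(S.shape)
    return np.where(np.abs(i - j) <= k, S, 0.0)

p = 16
i, j = np.indices((p, p))
Sigma = 0.8 ** np.abs(i - j)          # the matrix from the figure
B = band(Sigma, k=5)
print(B[0, 5], B[0, 6])               # entry inside the band kept, outside zeroed
```

Note that banding preserves symmetry but, as the slide after this one points out, not necessarily positive definiteness.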
1. Banding (cont'd)
The banded estimator is consistent in the operator (spectral) norm, uniformly over the class of approximately bandable matrices
U(α, ε) = {Σ : 0 < ε ≤ λ_min(Σ) ≤ λ_max(Σ) ≤ ε^{−1}, max_j Σ_{i: |i−j|>k} |σ_{ij}| ≤ C k^{−α} for all k > 0},
under the condition (log p)/n → 0 as p, n → ∞.
- C > 0 and ε > 0 are fixed and independent of p.
- α > 0 controls the rate of decay of the covariance entries σ_{ij} as one moves away from the main diagonal.
- B_k(S_n) is not necessarily positive definite.
2. Tapering
A tapered estimator of Σ with a tapering matrix W = (w_{ij})_{p×p} is given by
S_W = S_n ∘ W = (s_{ij} w_{ij})_{p×p}.
A smoother positive-definite tapering matrix, with off-diagonal entries gradually decaying to zero, ensures positive definiteness as well as an optimal rate of convergence of the tapered estimator. E.g., Cai et al. (2010) used the trapezoidal weight matrix
w_{ij} = 1 if |i − j| ≤ k_h; 2 − |i − j|/k_h if k_h < |i − j| < k; 0 otherwise.
Under the autoregressive model scenario, one usually takes k_h = k/2.
Banding is the special case of tapering with
w_{ij} = 1 if |i − j| ≤ k; 0 otherwise.
Consistency under both the operator and Frobenius norms holds over a larger class of covariance matrices than for banding, in which the smallest eigenvalue is allowed to be 0.
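A sketch of the trapezoidal weights of Cai et al. (2010) and the resulting tapered estimator, in the comparison figure's setting (k = 10, k_h = 5; function name illustrative):

```python
import numpy as np

def taper_weights(p, k, kh=None):
    """Trapezoidal weights: 1 inside band kh, linear decay to 0 at band k."""
    kh = k // 2 if kh is None else kh
    i, j = np.indices((p, p))
    d = np.abs(i - j)
    return np.where(d <= kh, 1.0,
                    np.where(d < k, 2.0 - d / kh, 0.0))

p = 16
i, j = np.indices((p, p))
Sigma = 0.8 ** np.abs(i - j)
W = taper_weights(p, k=10, kh=5)
S_taper = Sigma * W                    # Schur (elementwise) product S_n o W
print(W[0, 5], W[0, 7], W[0, 10])      # 1 inside, ~0.6 in the taper, 0 outside
```

Banding is recovered by replacing the trapezoid with the hard indicator 1(|i − j| ≤ k), which is why tapering is the smoother generalization.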
Comparison of Banding and Tapering
Figure: Banding and tapering of a 16 × 16 matrix whose (i, j)th entry is 0.8^{|i−j|}. Upper: banded (k = 5). Lower: tapered (k = 10, k_h = 5).
3. Thresholding
Thresholding does not require the variables to be ordered, so the estimator is invariant to permutations of the variables. It encourages sparsity, e.g. via soft thresholding. A soft-thresholded covariance matrix estimator is defined by applying the soft thresholding operator to S_n elementwise,
Σ̂_λ = S(S_n, λ),
where S(x, λ) = sign(x)(|x| − λ)_+ is the soft thresholding operator. The soft-thresholded estimator is the solution of the optimization problem
Σ̂_λ = argmin_Σ { (1/2)‖Σ − S_n‖_F^2 + λ‖Σ‖_1 } = argmin_Σ Σ_{i=1}^p Σ_{j=1}^p { (1/2)(σ_{ij} − s_{ij})^2 + λ|σ_{ij}| }.
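Because the problem separates entrywise, the solution is a closed-form elementwise operation. A minimal sketch (some authors leave the diagonal unthresholded; this version thresholds every entry, matching the displayed problem):

```python
import numpy as np

def soft_threshold(S, lam):
    """Elementwise soft thresholding: sign(s) * max(|s| - lam, 0)."""
    return np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)

S = np.array([[1.00, 0.25, 0.05],
              [0.25, 1.00, 0.40],
              [0.05, 0.40, 1.00]])
Sig_hat = soft_threshold(S, lam=0.1)
# entries below lam in magnitude become exactly zero;
# larger entries are shrunk toward zero by lam
print(Sig_hat)
```

The exact zeros are what distinguish soft thresholding from plain shrinkage: it produces a genuinely sparse estimate.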
3. Thresholding (cont'd)
One may also regularize the eigenvalues of S_n, e.g. Liu (2014): Estimation of Covariance matrices with Eigenvalue Constraints (EC2). The EC2 estimator of the correlation matrix is defined as
R̂_EC2 = argmin_Σ (1/2)‖S_n − Σ‖_F^2 + λ‖Σ‖_{1,off} s.t. τ ≤ λ_min(Σ), σ_jj = 1,
where τ > 0 is a desired lower bound on the minimum eigenvalue of the estimator. The EC2 estimator of the covariance matrix is then
Σ̂_EC2 = S_d^{1/2} R̂_EC2 S_d^{1/2},
where S_d^{1/2} = diag(√s_11, ..., √s_pp).
Factor Model-based Estimation
In many applications the more plausible assumption is conditional sparsity: conditional on the common factors, the covariance matrix of the remaining components is sparse. Fan et al. (2013) proposed the Principal Orthogonal complEment Thresholding (POET) estimator of Σ, which can be written as a sum of a low-rank and a sparse matrix. Start with the spectral decomposition of the sample covariance matrix of the data,
S_n = Σ_{i=1}^q λ̂_i ê_i ê_i^T + R̂,
where q is the number of selected principal components and R̂ = (r̂_{ij}) is the matrix of residuals. The estimator is obtained by adaptively thresholding the residual matrix after taking out the first q principal components. Choosing q by data-based methods is a well-studied topic in the literature on PCA and factor analysis.
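The two POET steps can be sketched as follows. This sketch uses plain soft thresholding of the residual rather than the adaptive (entry-dependent) thresholds of Fan et al., and the function name is illustrative:

```python
import numpy as np

def poet(S, q, lam):
    """Low-rank part from the top q PCs + thresholded principal orthogonal complement."""
    vals, vecs = np.linalg.eigh(S)              # eigenvalues in ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]      # sort descending
    low_rank = (vecs[:, :q] * vals[:q]) @ vecs[:, :q].T   # sum_{i<=q} lam_i e_i e_i^T
    R = S - low_rank                            # residual matrix R-hat
    R_thr = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)
    np.fill_diagonal(R_thr, np.diag(R))         # the diagonal is not thresholded
    return low_rank + R_thr

rng = np.random.default_rng(0)
B = rng.standard_normal((30, 2))                # loadings for 2 latent factors
S = B @ B.T + np.eye(30)                        # low rank + sparse (diagonal) truth
S_poet = poet(S, q=2, lam=0.05)
print(np.allclose(poet(S, q=2, lam=0.0), S))    # lam = 0 recovers S up to rounding
```

With λ = 0 the decomposition is lossless, so thresholding only ever acts on the part of S_n left over after the leading factors are removed.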
Part II: Precision Matrix Estimation
Constrained ℓ1-Minimization for Inverse Matrix Estimation (CLIME), Cai et al. (2011)
The CLIME estimator is the solution of the optimization problem
min_Ω ‖Ω‖_1 s.t. ‖S_n Ω − I‖_∞ ≤ λ,
where λ > 0 is the tuning parameter. The solution is usually not symmetric. Suppose Ω̂^1 = (ω̂^1_{ij})_{p×p} solves the problem above. The final CLIME estimator is defined as Ω̂ = (ω̂_{ij}), where ω̂_{ij} = ω̂_{ji} is whichever of ω̂^1_{ij} and ω̂^1_{ji} has the smaller magnitude. This estimator is shown to be positive definite with high probability.
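The problem decouples into one linear program per column of Ω: splitting ω = u − v with u, v ≥ 0 turns the ℓ1 objective and the ∞-norm constraint into linear ones. A minimal sketch via scipy.optimize.linprog (function name illustrative), checked on the trivial case S_n = I where the solution is (1 − λ)I:

```python
import numpy as np
from scipy.optimize import linprog

def clime(S, lam):
    """CLIME sketch: min ||omega||_1 s.t. ||S omega - e_j||_inf <= lam, per column."""
    p = S.shape[0]
    # omega = u - v, u >= 0, v >= 0; objective is sum(u) + sum(v)
    A_ub = np.vstack([np.hstack([S, -S]),      #  S(u - v) <= lam*1 + e_j
                      np.hstack([-S, S])])     # -S(u - v) <= lam*1 - e_j
    Omega1 = np.zeros((p, p))
    for j in range(p):
        e = np.zeros(p); e[j] = 1.0
        b_ub = np.concatenate([lam + e, lam - e])
        res = linprog(np.ones(2 * p), A_ub=A_ub, b_ub=b_ub, method="highs")
        Omega1[:, j] = res.x[:p] - res.x[p:]
    # symmetrize: keep the entry of smaller magnitude between (i,j) and (j,i)
    keep = np.abs(Omega1) <= np.abs(Omega1.T)
    return np.where(keep, Omega1, Omega1.T)

Omega_hat = clime(np.eye(3), lam=0.1)
print(np.round(Omega_hat, 3))                  # approximately 0.9 * I
```

Solving p separate small LPs (rather than one joint program) is what makes CLIME attractive computationally: the columns can be computed in parallel.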
CONDition number REGularized estimation (CONDREG), Won et al. (2013)
The CONDREG estimator is the solution of the optimization problem
min_Ω tr(ΩS_n) − log det Ω s.t. λ_max(Ω)/λ_min(Ω) ≤ κ,
where κ ≥ 1 is the tuning parameter bounding the condition number of the estimator.
Q&A
Thank you!