Spectral analysis of the Moore-Penrose inverse of a large dimensional sample covariance matrix


Taras Bodnar^a, Holger Dette^b and Nestor Parolya^c

^a Department of Mathematics, Stockholm University, SE-10691 Stockholm, Sweden, e-mail: taras.bodnar@math.su.se
^b Department of Mathematics, Ruhr University Bochum, D-44870 Bochum, Germany, e-mail: holger.dette@ruhr-uni-bochum.de
^c Institute of Empirical Economics, Leibniz University Hannover, D-30167 Hannover, Germany, e-mail: nestor.parolya@ewifo.uni-hannover.de

Abstract: For a sample of $n$ independent identically distributed $p$-dimensional centered random vectors with covariance matrix $\Sigma_n$, let $\bar S_n$ denote the usual sample covariance matrix (centered by the mean) and $S_n$ the non-centered sample covariance matrix (i.e., the matrix of second moment estimates), where $p > n$. In this paper we provide the limiting spectral distribution and a central limit theorem for linear spectral statistics of the Moore-Penrose inverses of $S_n$ and $\bar S_n$. We consider the large-dimensional asymptotics where the number of variables $p$ and the sample size $n$ tend to infinity such that $p/n \to c \in (1, +\infty)$. We present a Marchenko-Pastur law for both types of matrices, which shows that the limiting spectral distributions of the two sample covariance matrices coincide. On the other hand, we demonstrate that the asymptotic distribution of linear spectral statistics of the Moore-Penrose inverse of $\bar S_n$ differs in the mean from that of $S_n$.

AMS 2010 subject classifications: 60B20, 60F05, 60F15, 60F17, 62H10.
Keywords: CLT, large-dimensional asymptotics, Moore-Penrose inverse, random matrix theory.

1 Introduction

Many statistical, financial and genetic problems require estimates of the inverse population covariance matrix, which are often constructed by inverting the sample covariance matrix. Nowadays, modern scientific data sets involve a number of sample points which is often smaller than the dimension (number of features), so that the sample covariance matrix is not invertible. For example, stock markets include a large number of companies, which is often larger than the number of available time points; similarly, DNA data can contain a fairly large number of genes in comparison to a small number of patients. In such situations, the Moore-Penrose inverse (or pseudoinverse) of the sample covariance matrix can be used as an estimator for the precision matrix [see, e.g., Srivastava (2007), Kubokawa and Srivastava (2008), Hoyle (2011), Bodnar et al. (2015)]. In order to better understand the statistical properties of estimators and tests based on the Moore-Penrose inverse in high-dimensional settings, it is of interest to study the asymptotic spectral properties of the Moore-Penrose inverse, for example the convergence of its linear spectral statistics (LSS). This information is of great interest for high-dimensional statistics because more efficient estimators and tests, which do not suffer from the curse of dimensionality and do not reduce the number of dimensions, may be constructed and applied in practice. Most of the classical multivariate procedures are based on central limit theorems assuming that the dimension $p$ is fixed and the sample size $n$ increases. However, it has been pointed out by numerous authors that this assumption does not yield precise distributional approximations for commonly used statistics, and that better approximations can be obtained by considering scenarios where the dimension tends to infinity as well [see, e.g., Bai and Silverstein (2004) and references therein].
More precisely, by high-dimensional asymptotics we understand the case where the sample size $n$ and the dimension $p$ tend to infinity such that their ratio $p/n$ converges to some positive constant $c$. Under this condition the well-known Marchenko-Pastur equation as well as the Marchenko-Pastur law were derived [see Marčenko and Pastur (1967), Silverstein (1995)]. While most authors in random matrix theory investigate spectral properties of the sample covariance matrix $S_n = \frac1n \sum_{i=1}^n y_i y_i'$ (here $y_1, \ldots, y_n$ denotes a sample of i.i.d. $p$-dimensional random vectors with mean $0$ and variance $\Sigma_n$), Pan (2014) studies the differences occurring if $S_n$ is replaced by its centered version $\bar S_n = \frac1n \sum_{i=1}^n (y_i - \bar y)(y_i - \bar y)'$ (here $\bar y$ denotes the mean of $y_1, \ldots, y_n$). Corresponding (asymptotic) spectral properties for the inverse of $S_n$ have recently been derived by Zheng et al. (2013) in the case $p < n$, which corresponds to the case $c < 1$.

The aim of the present paper is to close a gap in the literature by focusing on the case $c \in (1, \infty)$. We investigate the differences in the asymptotic spectral properties of the Moore-Penrose inverses of centered and non-centered sample covariance matrices. In particular, we provide the limiting spectral distribution and the central limit theorem (CLT) for linear spectral statistics of the Moore-Penrose inverse of the sample covariance matrix. In Section 2 we present the Marchenko-Pastur equation together with a Marchenko-Pastur law for the Moore-Penrose inverse of the sample covariance matrix. Section 3 is divided into two parts: the first is dedicated to the CLT for the LSS of the pseudoinverse of the non-centered sample covariance matrix, while the second part covers the case where the sample covariance matrix is centered. While the limiting spectral distributions for both sample covariance matrices are the same, it is shown that the asymptotic distributions of the LSS of the Moore-Penrose inverses of $S_n$ and $\bar S_n$ differ. Finally, some technical details are given in Section 4.

2 Preliminaries and the Marchenko-Pastur equation

Throughout this paper we use the following notations and assumptions. For a symmetric matrix $A$ we denote by $\lambda_1(A) \le \ldots \le \lambda_p(A)$ its ordered eigenvalues and by $F_A(t)$ the corresponding empirical distribution function (e.d.f.), that is
$$ F_A(t) = \frac{1}{p} \sum_{i=1}^p \mathbf{1}\{\lambda_i(A) \le t\}, $$
where $\mathbf{1}\{\cdot\}$ is the indicator function.

(A1) Let $X_n$ be a $p \times n$ matrix which consists of independent and identically distributed (i.i.d.) real random variables with zero mean and unit variance.

(A2) For the matrix $X_n = (X_{ij})_{i=1,\ldots,p}^{j=1,\ldots,n}$ we assume additionally that $E(X_{11}^{4+\delta}) < \infty$ for some $\delta > 0$.

By $Y_n = \Sigma_n^{1/2} X_n$ we define a $p \times n$ observation matrix with independent columns with mean $0$ and covariance matrix $\Sigma_n$. It is further assumed that neither $\Sigma_n^{1/2}$ nor $X_n$ are observable. The centered and non-centered sample covariance matrices are denoted by
$$ \bar S_n = \frac{1}{n}(Y_n - \bar y \mathbf{1}')(Y_n - \bar y \mathbf{1}')' = \frac{1}{n} Y_n Y_n' - \bar y \bar y', \qquad S_n = \frac{1}{n} Y_n Y_n' = \frac{1}{n} \Sigma_n^{1/2} X_n X_n' \Sigma_n^{1/2}, $$
where $\mathbf{1}$ denotes the $n$-dimensional vector of ones and $\bar y = \frac{1}{n}\sum_{i=1}^n y_i$. The corresponding e.d.f.'s are given by $F_{\bar S_n}$ and $F_{S_n}$, respectively.

The Moore-Penrose inverse of a $p \times p$ matrix $A$ is denoted by $A^+$; by its definition it must satisfy the following four criteria [see, e.g., Horn and Johnson (1985)]:
(i) $AA^+A = A$, (ii) $A^+AA^+ = A^+$, (iii) $AA^+$ is symmetric, (iv) $A^+A$ is symmetric.
It is worth pointing out that the generalized inverse considered recently by Bodnar et al. (2015) does not satisfy the conditions (iii) and (iv) presented above. If the matrix $A$ has full column rank, then the matrix $A'A$ is invertible and the Moore-Penrose inverse obeys the simple representation
$$ A^+ = (A'A)^{-1} A'. \quad (2.1) $$
We could easily include the population mean vector into the model, but this would only make the formulas for the weak convergence more complex, not the analysis itself.
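The four Penrose criteria and the representation (2.1) are easy to verify numerically. The following sketch (an illustration with NumPy on simulated Gaussian data with $\Sigma_n = I$; it is not part of the paper, and the dimensions are arbitrary) checks them for a singular sample covariance matrix with $p > n$:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 50, 20
Y = rng.standard_normal((p, n))          # observation matrix, Sigma_n = I
S = Y @ Y.T / n                          # non-centered sample covariance, rank n < p
S_plus = np.linalg.pinv(S)               # Moore-Penrose inverse

# criteria (i)-(iv)
assert np.allclose(S @ S_plus @ S, S)
assert np.allclose(S_plus @ S @ S_plus, S_plus)
assert np.allclose(S @ S_plus, (S @ S_plus).T)
assert np.allclose(S_plus @ S, (S_plus @ S).T)

# (2.1): A^+ = (A'A)^{-1} A' for a matrix A of full column rank
A = Y / np.sqrt(n)                       # p x n, full column rank a.s.
assert np.allclose(np.linalg.pinv(A), np.linalg.solve(A.T @ A, A.T))
```

Note that `np.linalg.pinv` computes the Moore-Penrose inverse via the SVD, so the four criteria hold up to floating-point tolerance.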

For a function $G: \mathbb{R} \to \mathbb{R}$ of bounded variation we introduce the Stieltjes transform
$$ m_G(z) = \int_{-\infty}^{+\infty} \frac{1}{\lambda - z}\, dG(\lambda), \qquad z \in \mathbb{C}^+. $$

Remark 1. Although assumption (A2) requires the existence of moments of order $4+\delta$, we suspect that the results of this paper also hold under the existence of moments of order 4. For a proof one would have to use truncation techniques as provided by Bai et al. (2007) for the matrix $\frac{1}{n} Y_n' Y_n$. These extremely technical details are omitted for the sake of brevity and transparency.

In this paper we are interested in the asymptotic properties of the empirical distribution function and linear spectral statistics of the eigenvalues of the Moore-Penrose inverses of the matrices $S_n$ and $\bar S_n$. Actually, the limiting spectral distributions of both matrices coincide because they differ only by a rank-one perturbation. This is shown in the following Lemma 2.1.

Lemma 2.1. Let $S_n^+$ and $\bar S_n^+$ be the Moore-Penrose inverses of the non-centered and centered sample covariance matrices, respectively. Then
$$ \left\| F_{\bar S_n^+} - F_{S_n^+} \right\| \le \frac{2}{p}, \quad (2.2) $$
where $\|g\|$ is the usual supremum norm of a function $g: \mathbb{R} \to \mathbb{R}$.

Proof. For the rank-one update of the Moore-Penrose inverse we obtain [see Meyer (1973)]
$$ \bar S_n^+ = (S_n - \bar y \bar y')^+ = S_n^+ - \frac{S_n^+ \bar y \bar y' (S_n^+)^2 + (S_n^+)^2 \bar y \bar y' S_n^+}{\bar y'(S_n^+)^2 \bar y} + \frac{\bar y'(S_n^+)^3 \bar y}{\left(\bar y'(S_n^+)^2 \bar y\right)^2}\, S_n^+ \bar y \bar y' S_n^+. \quad (2.3) $$
With the notation $u = S_n^+ \bar y$ and $v = (S_n^+)^2 \bar y$ the difference of the Moore-Penrose inverses can therefore be rewritten as follows
$$ \bar S_n^+ - S_n^+ = -\frac{uv' + vu'}{u'u} + \frac{u'v}{(u'u)^2}\, uu' = -\frac{1}{u'v}\, vv' + \frac{1}{u'u}\, ww', \quad (2.4) $$
where
$$ w = \left[\frac{u'v}{u'u}\right]^{1/2} u - \left[\frac{u'u}{u'v}\right]^{1/2} v. $$
Thus, the difference $\bar S_n^+ - S_n^+$ is a matrix of rank at most 2, and the rank inequality in Theorem A.43 of Bai and Silverstein (2010) yields the estimate (2.2). □

We now study the asymptotic properties of the e.d.f. of the spectrum of the Moore-Penrose inverses of the sample covariance matrices $S_n$ and $\bar S_n$. As a consequence of Lemma 2.1 and the equality (2.1), the asymptotic properties of the e.d.f. of both Moore-Penrose inverses can be studied concentrating on the matrix
$$ S_n^+ = \left(\tfrac1n Y_n Y_n'\right)^+ = \left[\left(\tfrac{1}{\sqrt n} Y_n\right)'\right]^+ \left(\tfrac{1}{\sqrt n} Y_n\right)^+ = \tfrac1n\, Y_n \left(\tfrac1n Y_n'Y_n\right)^{-2} Y_n' $$
together with the corresponding e.d.f. $F_{S_n^+}$. Indeed, because $m_{F_{\frac1n Y_n'Y_n}}(z)$ tends almost surely to the solution of the Marchenko-Pastur equation, we obtain the following result.
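The rank-two bound of Lemma 2.1 can be illustrated by simulation. The sketch below (an illustration assuming Gaussian data with $\Sigma_n = I$; not part of the paper) compares the empirical spectral distributions of the two Moore-Penrose inverses on a grid and checks the bound $2/p$:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 200, 100
Y = rng.standard_normal((p, n))
ybar = Y.mean(axis=1, keepdims=True)

S = Y @ Y.T / n                            # non-centered
S_bar = (Y - ybar) @ (Y - ybar).T / n      # centered: equals S - ybar ybar'
ev_plus = np.sort(np.linalg.eigvalsh(np.linalg.pinv(S)))
ev_bar = np.sort(np.linalg.eigvalsh(np.linalg.pinv(S_bar)))

# sup-norm distance of the two e.d.f.'s, evaluated on a grid of test points
grid = np.linspace(ev_plus.min(), ev_plus.max(), 2000)
F1 = np.searchsorted(ev_plus, grid, side="right") / p
F2 = np.searchsorted(ev_bar, grid, side="right") / p
assert np.max(np.abs(F1 - F2)) <= 2.0 / p + 1e-12   # Lemma 2.1: at most 2/p
```

The bound is deterministic (it follows from the rank inequality alone), so it holds for every realization, not merely on average.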

Theorem 2.1. Assume that (A1) holds, $p/n \to c \in (1, \infty)$ as $n \to \infty$, and that $F_{\Sigma_n}$ converges weakly to a cumulative distribution function (c.d.f.) $H$. Then the e.d.f. $F_{S_n^+}$ converges weakly almost surely to some deterministic c.d.f. $P$ whose Stieltjes transform $m_P$ satisfies the following equation
$$ m_P(z) = \frac{1-2c}{cz} - \frac{1}{z^2} \int_{-\infty}^{+\infty} \frac{dH(\tau)}{c\tau\left(z m_P(z)+1\right) - \frac1z}. $$

Proof. Let $S_n = U_n D_n U_n'$ be the eigenvalue decomposition of the matrix $S_n$. Then
$$ m_{F_{S_n^+}}(z) = \frac1p \operatorname{tr}\left[\left(U_nD_n^+U_n' - zI\right)^{-1}\right] = \frac1p \operatorname{tr}\left[\left(D_n^+ - zI\right)^{-1}\right] = -\frac1z\left(1 - \frac np\right) + \frac np\, \frac1n \sum_{i=1}^n \frac{1}{\lambda_i^{-1}\left(\frac1n Y_n'Y_n\right) - z} = -\frac1z\left(1 - \frac np\right) + \frac np\, m_{F_{(\frac1n Y_n'Y_n)^{-1}}}(z). \quad (2.5) $$
The last two equalities follow from the fact that the spectrum of the matrix $S_n^+$ differs from that of $(\frac1n Y_n'Y_n)^{-1}$ by exactly $p - n$ zero eigenvalues. For the Stieltjes transform $m_{F_{(\frac1n Y_n'Y_n)^{-1}}}(z)$ in this expression we get
$$ m_{F_{(\frac1n Y_n'Y_n)^{-1}}}(z) = \frac1n \sum_{i=1}^n \frac{1}{\lambda_i^{-1}\left(\frac1n Y_n'Y_n\right) - z} = \frac1n \sum_{i=1}^n \frac{\lambda_i\left(\frac1n Y_n'Y_n\right)}{1 - z\,\lambda_i\left(\frac1n Y_n'Y_n\right)} = -\frac1z - \frac{1}{z^2}\,\frac1n \sum_{i=1}^n \frac{1}{\lambda_i\left(\frac1n Y_n'Y_n\right) - \frac1z} = -\frac1z - \frac{1}{z^2}\, m_{F_{\frac1n Y_n'Y_n}}\!\left(\frac1z\right). \quad (2.6) $$
Combining the identities (2.5) and (2.6) provides an equation which relates the Stieltjes transforms of $F_{S_n^+}$ and $F_{\frac1n Y_n'Y_n}$, that is
$$ m_{F_{S_n^+}}(z) = -\frac1z - \frac np\, \frac{1}{z^2}\, m_{F_{\frac1n Y_n'Y_n}}\!\left(\frac1z\right). \quad (2.7) $$
It now follows from Bai and Silverstein (2010) that as $p/n \to c > 1$ the e.d.f.'s $F_{\frac1n Y_n'Y_n}$ and $F_{\frac1n Y_nY_n'}$ converge weakly almost surely to non-degenerate distribution functions $\underline F$ and $F$, respectively, with corresponding Stieltjes transforms satisfying the equation
$$ m_{\underline F}(z) = -\frac{1-c}{z} + c\, m_F(z). $$
Consequently, we have from (2.7) almost surely
$$ m_{F_{S_n^+}}(z) \to m_P(z) := -\frac1z - \frac{1}{cz^2}\left(-(1-c)\,z + c\, m_F\!\left(\frac1z\right)\right) = \frac{1-2c}{cz} - \frac{m_F(1/z)}{z^2} $$

as $n \to \infty$, where $m_F(1/z)$ is the Stieltjes transform of the limiting distribution $F$ of the e.d.f. of $S_n$, which satisfies the equation (see Silverstein (1995))
$$ m_F\!\left(\frac1z\right) = \int_{-\infty}^{+\infty} \frac{dH(\tau)}{\tau\left(1 - c - \frac cz\, m_F\!\left(\frac1z\right)\right) - \frac1z}. \quad (2.8) $$
Thus, $\frac{z(1-2c)}{c} - z^2 m_P(z) = m_F(1/z)$ must satisfy the same equation (2.8), which implies
$$ \frac{z(1-2c)}{c} - z^2 m_P(z) = \int_{-\infty}^{+\infty} \frac{dH(\tau)}{\tau\left(1 - c - \frac cz\left(\frac{z(1-2c)}{c} - z^2 m_P(z)\right)\right) - \frac1z}. $$
After some simplification the result follows. □

Corollary 2.1. If $\Sigma_n = \sigma^2 I$ and the assumptions of Theorem 2.1 are satisfied, then the e.d.f. of $S_n^+$ converges weakly almost surely to a deterministic distribution function $P$ with Stieltjes transform
$$ m_P(z) = \frac{\frac1z + (1-3c)\sigma^2 + \sqrt{\left(\frac1z + (1-c)\sigma^2\right)^2 - \frac{4\sigma^2}{z}}}{2c\sigma^2 z}, $$
where the branch of the square root is chosen such that $m_P$ is a Stieltjes transform. Moreover, the limiting distribution is given by
$$ P = \left(1 - \frac1c\right)\delta_0 + \nu(x)\,dx, $$
where $\delta_a$ denotes the Dirac measure at the point $a \in \mathbb{R}$ and
$$ \nu(x) = \begin{cases} \dfrac{\sqrt{(\lambda_+ - 1/x)(1/x - \lambda_-)}}{2\pi\sigma^2 c\, x}, & x \in \left[1/\lambda_+,\, 1/\lambda_-\right], \\[1ex] 0, & \text{otherwise}, \end{cases} $$
with $\lambda_+ = \sigma^2(1+\sqrt c)^2$ and $\lambda_- = \sigma^2(1-\sqrt c)^2$.

3 CLT for linear spectral statistics

For a (random) symmetric matrix $A$ with spectrum $\lambda_1(A), \ldots, \lambda_p(A)$ we consider the linear spectral statistic
$$ F_A(g) = \int_{-\infty}^{+\infty} g(x)\, dF_A(x) = \frac1p \sum_{i=1}^p g\left(\lambda_i(A)\right), $$
where $g: \mathbb{R} \to \mathbb{R}$ is a given test function. In the next two sections we will investigate the asymptotic properties of $F_{S_n^+}(g)$ and $F_{\bar S_n^+}(g)$.

3.1 Linear spectral statistics of $S_n^+$

In the following discussion we consider the random function $G_n(x) = p\left(F_{S_n^+}(x) - P_n(x)\right)$, where $P_n$ is a finite-sample proxy of the limiting spectral distribution $P$ of $S_n^+$. Note that the function $P_n$ is constructed simply by substituting $p/n$ for $c$ and $H_n = F_{\Sigma_n}$ for $H$ in the limiting distribution $P$.
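Both Corollary 2.1 and the Stieltjes-transform relation (2.7) from the proof of Theorem 2.1 are easy to illustrate by simulation: for $\Sigma_n = \sigma^2 I$ the Moore-Penrose inverse has exactly $p - n$ zero eigenvalues (the atom of mass $1 - 1/c$ in the limit), and its nonzero eigenvalues concentrate on $[1/\lambda_+, 1/\lambda_-]$. A minimal sketch (Gaussian data; dimensions and tolerances are ad hoc assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2, p, n = 2.0, 1000, 400              # c = p/n = 2.5 > 1
c = p / n
Y = np.sqrt(sigma2) * rng.standard_normal((p, n))
ev = np.sort(np.linalg.eigvalsh(np.linalg.pinv(Y @ Y.T / n)))

lam_plus = sigma2 * (1 + np.sqrt(c)) ** 2
lam_minus = sigma2 * (1 - np.sqrt(c)) ** 2   # = sigma2*(sqrt(c)-1)^2 for c > 1

# atom at zero: exactly p - n zero eigenvalues for finite p, n
assert np.sum(np.abs(ev) < 1e-8) == p - n
# continuous part supported on [1/lambda_+, 1/lambda_-], up to edge fluctuations
nonzero = ev[p - n:]
assert nonzero.min() > 0.9 / lam_plus
assert nonzero.max() < 1.1 / lam_minus

# relation (2.7) between the Stieltjes transforms, at a test point z in C^+
z = 0.4 + 0.8j
m_Splus = np.mean(1.0 / (ev - z))
m_gram = np.mean(1.0 / (np.linalg.eigvalsh(Y.T @ Y / n) - 1.0 / z))
assert np.isclose(m_Splus, -1.0 / z - (n / p) * m_gram / z**2)
```

The last assertion holds exactly (up to rounding) for every finite sample, since the nonzero spectrum of $S_n^+$ consists of the reciprocals of the eigenvalues of $\frac1n Y_n'Y_n$.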

Theorem 3.1. Assume that:
(i) (A1) and (A2) hold;
(ii) $p/n \to c \in (1, \infty)$ as $n \to \infty$;
(iii) the population covariance matrix $\Sigma_n$ is nonrandom, symmetric and positive definite with a bounded spectral norm, and the e.d.f. $H_n = F_{\Sigma_n}$ converges weakly to a nonrandom distribution function $H$ with a support bounded away from zero;
(iv) $g_1, \ldots, g_k$ are functions on $\mathbb{R}$ analytic on an open region $D$ of the complex plane which contains the real interval
$$ \left[\,0,\ \limsup_{n\to\infty} \lambda_{\max}\left(\Sigma_n^{-1}\right)\left(1 - \sqrt c\right)^{-2}\,\right]; $$
(v) let $e_i$ denote a $p$-dimensional vector with the $i$-th element equal to $1$ and all others $0$, and define
$$ \kappa_i(z) = e_i'\,\Sigma_n^{1/2}\left(m_F(1/z)\Sigma_n + I\right)^{-1}\Sigma_n^{1/2}\,e_i, \qquad \chi_i(z) = e_i'\,\Sigma_n^{1/2}\left(m_F(1/z)\Sigma_n + I\right)^{-2}\Sigma_n^{1/2}\,e_i $$
as the $i$-th diagonal elements of the matrices $\Sigma_n^{1/2}(m_F(1/z)\Sigma_n + I)^{-1}\Sigma_n^{1/2}$ and $\Sigma_n^{1/2}(m_F(1/z)\Sigma_n + I)^{-2}\Sigma_n^{1/2}$, respectively; the population covariance matrix $\Sigma_n$ satisfies
$$ \lim_{n\to\infty} \frac1n \sum_{i=1}^n \kappa_i(z_1)\,\kappa_i(z_2) = h_1(z_1, z_2), \qquad \lim_{n\to\infty} \frac1n \sum_{i=1}^n \kappa_i(z)\,\chi_i(z) = h_2(z). $$
Then
$$ \left(\int_{-\infty}^{+\infty} g_1(x)\,dG_n(x),\ \ldots,\ \int_{-\infty}^{+\infty} g_k(x)\,dG_n(x)\right) \xrightarrow{\;\mathcal D\;} \left(X_{g_1}, \ldots, X_{g_k}\right), $$
where $(X_{g_1}, \ldots, X_{g_k})$ is a Gaussian vector with mean
$$ E(X_g) = \frac{1}{2\pi i}\oint \frac{g(z)}{z^2}\, \frac{c\displaystyle\int \frac{t^2\, m_F^3(1/z)}{\left(1+t\,m_F(1/z)\right)^3}\,dH(t)}{\left(1 - c\displaystyle\int \frac{t^2\, m_F^2(1/z)}{\left(1+t\,m_F(1/z)\right)^2}\,dH(t)\right)^2}\, dz + \frac{E(X_{11}^4) - 3}{2\pi i}\oint \frac{g(z)}{z^2}\, \frac{c\, m_F^3(1/z)\, h_2(z)}{1 - c\displaystyle\int \frac{t^2\, m_F^2(1/z)}{\left(1+t\,m_F(1/z)\right)^2}\,dH(t)}\, dz \quad (3.1) $$
and covariance function
$$ \operatorname{Cov}(X_{g_1}, X_{g_2}) = -\frac{1}{2\pi^2}\oint\oint \frac{g_1(z_1)\,g_2(z_2)}{z_1^2\, z_2^2}\, \frac{m_F'(1/z_1)\, m_F'(1/z_2)}{\left(m_F(1/z_1) - m_F(1/z_2)\right)^2}\, dz_1\, dz_2 - \frac{E(X_{11}^4) - 3}{4\pi^2}\oint\oint \frac{g_1(z_1)\,g_2(z_2)}{z_1^2\, z_2^2}\, \frac{\partial^2}{\partial z_1 \partial z_2}\left[m_F(1/z_1)\, m_F(1/z_2)\, h_1(z_1, z_2)\right] dz_1\, dz_2. \quad (3.2) $$

The contours in (3.1) and (3.2) are both contained in the analytic region of the functions $g_1, \ldots, g_k$ and both enclose the support of $P_n$ for sufficiently large $n$. Moreover, the contours in (3.2) are disjoint.

Proof. The proof of this theorem follows by combining arguments from the proof of Lemma 1.1 of Bai and Silverstein (2004) and Theorem 1 of Pan (2014). To be precise, we note first that $m_{F_{\frac1n Y_n'Y_n}}(z)$ converges almost surely to a limit, say $m_{\underline F}(z)$. We also observe that the CLT for $p\left(m_{F_{S_n^+}}(z) - m_{P_n}(z)\right)$ is the same as that for $-\frac{n}{z^2}\left(m_{F_{\frac1n Y_n'Y_n}}(1/z) - m_{\underline F_n}(1/z)\right)$, and we proceed in two steps.

(i) Assume first that $E(X_{11}^4) = 3$. By Lemma 1.1 of Bai and Silverstein (2004) we get that the process $-\frac{n}{z^2}\left(m_{F_{\frac1n Y_n'Y_n}}(1/z) - m_{\underline F_n}(1/z)\right)$, defined on an arbitrary positively oriented contour $\mathcal C$ which contains the support of $P_n$ for sufficiently large $n$, converges weakly to a Gaussian process with mean
$$ \frac{1}{z^2}\, \frac{c\displaystyle\int \frac{t^2\, m_F^3(1/z)}{\left(1+t\,m_F(1/z)\right)^3}\,dH(t)}{\left(1 - c\displaystyle\int \frac{t^2\, m_F^2(1/z)}{\left(1+t\,m_F(1/z)\right)^2}\,dH(t)\right)^2} $$
and covariance
$$ \frac{m_F'(1/z_1)\, m_F'(1/z_2)}{z_1^2\, z_2^2 \left(m_F(1/z_1) - m_F(1/z_2)\right)^2} - \frac{1}{(z_1 - z_2)^2}. $$
In order to obtain the assertion of Theorem 3.1 we use again the argument made at the beginning of the proof and the identity
$$ \int g(x)\, dG_n(x) = -\frac{1}{2\pi i}\oint g(z)\, m_{G_n}(z)\, dz, \quad (3.3) $$
which is valid with probability one for any analytic function $g$ defined on an open set containing the support of $G_n$ if $n$ is sufficiently large. The complex integral on the right-hand side of (3.3) is over a certain positively oriented contour enclosing the support of $G_n$ on which the function $g$ is analytic [see the discussion in Bai and Silverstein (2004) following Lemma 1.1]. Further, following the proof of Theorem 1.1 in Bai and Silverstein (2004) we obtain that $\left(\int g_1(x)\,dG_n(x), \ldots, \int g_k(x)\,dG_n(x)\right)$ converges weakly to a Gaussian vector $(X_{g_1}, \ldots, X_{g_k})$.

(ii) In the case $E(X_{11}^4) \ne 3$ we use a result proved in Pan and Zhou (2008), more precisely Theorem 1.4 of that reference. Here we find that in the case $E(X_{11}^4) \ne 3$ there appears an additional summand in the asymptotic mean and covariance which involves the limiting functions $h_1(z_1, z_2)$ and $h_2(z)$ from assumption (v); namely, for the mean we obtain the additional term
$$ \frac{E(X_{11}^4) - 3}{z^2}\cdot \frac{c\, m_F^3(1/z)\, h_2(z)}{1 - c\displaystyle\int \frac{t^2\, m_F^2(1/z)}{\left(1+t\,m_F(1/z)\right)^2}\,dH(t)} $$
and for the covariance
$$ \frac{E(X_{11}^4) - 3}{z_1^2\, z_2^2}\, \frac{\partial^2}{\partial z_1 \partial z_2}\left[m_F(1/z_1)\, m_F(1/z_2)\, h_1(z_1, z_2)\right]. $$

These new summands arise by studying the limits of products of $E(X_{11}^4) - 3$ and the diagonal elements of the matrix $\left(S_n - \frac1z I\right)^{-1}$ [see Pan and Zhou (2008), proof of Theorem 1.4]. Imposing assumption (v) we basically assure that these limits are the same. The assertion is now obtained by the same arguments as given in part (i) of this proof. □

A version of Theorem 3.1 has been proved in Zheng et al. (2013) for the usual inverse of the sample covariance matrix $S_n$. It is also not hard to verify the CLT for linear spectral statistics of $S_n^{-1}$ in the case $c < 1$ with the same limit distribution. Theorem 3.1 shows that in the case $c > 1$ there appear additional terms in the asymptotic mean and covariance of the linear spectral statistics corresponding to $S_n^+$. In general, as we can see, the results on the limiting spectral distribution as well as the CLT for linear spectral statistics of $S_n^+$ follow more or less from the already known findings which correspond to the matrix $\frac1n Y_n'Y_n$. In the next section we will show that the general CLT for LSS of the random matrix $\bar S_n^+$ is different from that of $S_n^+$.

3.2 CLT for linear spectral statistics of $\bar S_n^+$

The goal of this section is to show that the CLT for linear spectral statistics of the Moore-Penrose inverse $\bar S_n^+$ differs from that of $S_n^+$. In order to see why these two CLTs are different, consider again the identity (2.3), that is
$$ \bar S_n^+ = (S_n - \bar y\bar y')^+ = S_n^+ - \frac{S_n^+\bar y\bar y'(S_n^+)^2 + (S_n^+)^2\bar y\bar y' S_n^+}{\bar y'(S_n^+)^2\bar y} + \frac{\bar y'(S_n^+)^3\bar y}{\left(\bar y'(S_n^+)^2\bar y\right)^2}\, S_n^+\bar y\bar y' S_n^+. $$
We emphasize here the difference to the well-known Sherman-Morrison identity for the inverse of a non-singular matrix $S_n$, that is
$$ \bar S_n^{-1} = (S_n - \bar y\bar y')^{-1} = S_n^{-1} + \frac{S_n^{-1}\bar y\bar y' S_n^{-1}}{1 - \bar y' S_n^{-1}\bar y} \quad (3.4) $$
[see Sherman and Morrison (1950)]. Note that it follows from the identity
$$ \bar y'\, S_n^+\, \bar y = \frac{1}{n^2}\,\mathbf 1'\, Y_n'\left[\tfrac1n Y_n\left(\tfrac1n Y_n'Y_n\right)^{-2} Y_n'\right] Y_n\,\mathbf 1 = \frac{1}{n^2}\,\mathbf 1'(nI)\,\mathbf 1 = 1, \qquad\text{i.e.}\quad 1 - \bar y' S_n^+\bar y = 0, $$
that the right-hand side of the formula (3.4) is not defined for the Moore-Penrose inverse in the case $p > n$.

Now consider the resolvent of $\bar S_n^+$, namely
$$ R(z) = \left(\bar S_n^+ - zI\right)^{-1} = \left(\left[S_n - \bar y\bar y'\right]^+ - zI\right)^{-1} = \left(S_n^+ - \frac{S_n^+\bar y\bar y'(S_n^+)^2 + (S_n^+)^2\bar y\bar y'S_n^+}{\bar y'(S_n^+)^2\bar y} + \frac{\bar y'(S_n^+)^3\bar y}{\left(\bar y'(S_n^+)^2\bar y\right)^2}\, S_n^+\bar y\bar y'S_n^+ - zI\right)^{-1} = \left(A(z) - \frac{1}{u'v}\, vv' + \frac{1}{u'u}\, ww'\right)^{-1}, $$
where we use (2.4) and the notations $u = S_n^+\bar y$, $v = (S_n^+)^2\bar y$, $A(z) = S_n^+ - zI$ and
$$ w = \left[\frac{u'v}{u'u}\right]^{1/2} u - \left[\frac{u'u}{u'v}\right]^{1/2} v $$
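The identity $\bar y' S_n^+ \bar y = 1$, which makes the Sherman-Morrison denominator $1 - \bar y' S_n^+ \bar y$ vanish for $p > n$, can be checked directly. A minimal sketch (simulated Gaussian data with $\Sigma_n = I$; the dimensions are hypothetical, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 80, 30                      # p > n, so ybar lies in the range of S_n
Y = rng.standard_normal((p, n))
ybar = Y.mean(axis=1)
S_plus = np.linalg.pinv(Y @ Y.T / n)

# ybar' S_n^+ ybar = 1, hence 1 - ybar' S_n^+ ybar = 0: the
# Sherman-Morrison update of S_n - ybar ybar' breaks down for p > n
assert np.isclose(ybar @ S_plus @ ybar, 1.0)
```

This holds for every finite sample with $p > n$, since $Y_n' S_n^+ Y_n = nI$ whenever $\frac1n Y_n'Y_n$ is invertible.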

to obtain the last identity. A twofold application of the Sherman-Morrison formula yields the representation
$$ R(z) = \left(A(z) - \tfrac{1}{u'v}\,vv'\right)^{-1} - \frac{\left(A(z) - \tfrac{1}{u'v}\,vv'\right)^{-1} ww' \left(A(z) - \tfrac{1}{u'v}\,vv'\right)^{-1}}{u'u + w'\left(A(z) - \tfrac{1}{u'v}\,vv'\right)^{-1} w}, \qquad \left(A(z) - \tfrac{1}{u'v}\,vv'\right)^{-1} = A^{-1}(z) + \frac{A^{-1}(z)\,vv'\,A^{-1}(z)}{u'v - v'A^{-1}(z)v}. \quad (3.5) $$
Taking the trace of both sides of (3.5) we obtain the identity
$$ p\, m_{F_{\bar S_n^+}}(z) = \operatorname{tr}(R(z)) = p\, m_{F_{S_n^+}}(z) + \frac{v'A^{-2}(z)v}{u'v - v'A^{-1}(z)v} - \frac{w'\left[A^{-1}(z) + \frac{A^{-1}(z)vv'A^{-1}(z)}{u'v - v'A^{-1}(z)v}\right]^2 w}{u'u + w'A^{-1}(z)w + \frac{\left(w'A^{-1}(z)v\right)^2}{u'v - v'A^{-1}(z)v}}, \quad (3.6) $$
which indicates that the CLTs for linear spectral statistics of the Moore-Penrose inverses $S_n^+$ and $\bar S_n^+$ might differ. In fact, the following result shows that the last two terms on the right-hand side of (3.6) are asymptotically not negligible.

Theorem 3.2. Let $\tilde G_n(x) = p\left(F_{\bar S_n^+}(x) - P_n(x)\right)$ and suppose that the assumptions of Theorem 3.1 are satisfied. Then
$$ \left(\int_{-\infty}^{+\infty} g_1(x)\, d\tilde G_n(x),\ \ldots,\ \int_{-\infty}^{+\infty} g_k(x)\, d\tilde G_n(x)\right) \xrightarrow{\;\mathcal D\;} \left(\tilde X_{g_1}, \ldots, \tilde X_{g_k}\right), $$
where $(\tilde X_{g_1}, \ldots, \tilde X_{g_k})$ is a Gaussian vector with mean
$$ E(\tilde X_g) = \frac{1}{2\pi i}\oint \frac{g(z)}{z^2}\,\frac{c\displaystyle\int \frac{t^2\, m_F^3(1/z)}{\left(1+t\,m_F(1/z)\right)^3}\,dH(t)}{\left(1 - c\displaystyle\int \frac{t^2\, m_F^2(1/z)}{\left(1+t\,m_F(1/z)\right)^2}\,dH(t)\right)^2}\, dz + \frac{E(X_{11}^4)-3}{2\pi i}\oint \frac{g(z)}{z^2}\,\frac{c\, m_F^3(1/z)\, h_2(z)}{1 - c\displaystyle\int \frac{t^2\, m_F^2(1/z)}{\left(1+t\,m_F(1/z)\right)^2}\,dH(t)}\, dz + \frac{1}{2\pi i}\oint \frac{g(z)}{z^2}\,\frac{m_F'(1/z)}{m_F(1/z)}\, dz \quad (3.7) $$
and covariance function
$$ \operatorname{Cov}(\tilde X_{g_1}, \tilde X_{g_2}) = -\frac{1}{2\pi^2}\oint\oint \frac{g_1(z_1)\,g_2(z_2)}{z_1^2\, z_2^2}\,\frac{m_F'(1/z_1)\, m_F'(1/z_2)}{\left(m_F(1/z_1) - m_F(1/z_2)\right)^2}\, dz_1\, dz_2 - \frac{E(X_{11}^4)-3}{4\pi^2}\oint\oint \frac{g_1(z_1)\,g_2(z_2)}{z_1^2\, z_2^2}\,\frac{\partial^2}{\partial z_1\partial z_2}\left[m_F(1/z_1)\, m_F(1/z_2)\, h_1(z_1, z_2)\right] dz_1\, dz_2. \quad (3.8) $$
The contours in (3.7) and (3.8) are both contained in the analytic region of the functions $g_1, \ldots, g_k$ and both enclose the support of $P_n$ for sufficiently large $n$. Moreover, the contours in (3.8) are disjoint.

Before we provide the proof of this result we emphasize the existence of the extra summand in the asymptotic mean. It has a very simple structure and can be calculated without much effort in practice. Indeed, the last integral in (3.7) can be rewritten using integration by parts in the following way (see Section 4.1 for a detailed derivation)
$$ \frac{1}{2\pi i}\oint g(z)\, \frac{m_F'(1/z)}{z^2\, m_F(1/z)}\, dz = \frac{1}{\pi}\int_a^b g'(x)\, \arg\left[m_F(1/x)\right] dx, \quad (3.9) $$

where $m_F(1/x) \equiv \lim_{z\to x,\, z\in\mathbb C^+} m_F(1/z)$ for $x\in\mathbb R$ and the interval $(a, b)$ contains the support of $P$. On the other hand, the asymptotic variances of the linear spectral statistics for centered and non-centered sample covariance matrices coincide. For a discussion of assumption (v) we refer to Pan (2014), Remark 2, and other references therein.

Proof of Theorem 3.2. The proof of Theorem 3.2 is based on the Stieltjes transform method and consists of a combination of arguments similar to those given by Bai and Silverstein (2004) and Pan (2014). First, by the analyticity of the functions $g_1, \ldots, g_k$ and (3.3), it is sufficient to consider the Stieltjes transforms of the spectral e.d.f.'s of the sample covariance matrices. Furthermore, recall from (3.6) that the Stieltjes transform of the e.d.f. of $\bar S_n^+$ can be decomposed as the sum of the Stieltjes transform of the e.d.f. of $S_n^+$ and the additional term
$$ \xi_n(z) = \frac{v'A^{-2}(z)v}{u'v - v'A^{-1}(z)v} - \frac{w'\left[A^{-1}(z) + \frac{A^{-1}(z)vv'A^{-1}(z)}{u'v - v'A^{-1}(z)v}\right]^2 w}{u'u + w'A^{-1}(z)w + \frac{\left(w'A^{-1}(z)v\right)^2}{u'v - v'A^{-1}(z)v}} \quad (3.10) $$
involving the sample mean $\bar y$ and $S_n^+$. Thus, it is sufficient to show that this random variable converges almost surely on $\mathbb C^+ = \{z \in \mathbb C : \Im z > 0\}$ to a nonrandom quantity as $p/n \to c > 1$, and to determine its limit. As a result, Theorem 3.2 follows from Slutsky's theorem, the continuous mapping theorem and the results in Bai and Silverstein (2004) and Pan and Zhou (2008). It is shown in Section 4.2 that the function $\xi_n$ in (3.10) can be represented as
$$ \xi_n(z) = -\frac1z\,\frac{\bar y'\bar y + 2z\,\theta_n(z) + z^2\,\theta_n'(z)}{1 + z\,\bar y'\bar y + z^2\,\theta_n(z)}, \quad (3.11) $$
where the function $\theta_n(z) = \bar y'(S_n^+ - zI)^{-1}\bar y$ admits the representation
$$ \theta_n(z) = -\frac{1}{z^2} - \frac{\bar y'\bar y}{z} - \frac{1}{z^3}\,\eta_n\!\left(\frac1z\right), \qquad \eta_n(z) = \frac1n\,\mathbf 1'\left(\tfrac1n Y_n'Y_n - zI\right)^{-1}\mathbf 1. \quad (3.12) $$
As a consequence, the asymptotic properties of $\xi_n$ can be obtained by analyzing the quantity
$$ \eta_n\!\left(\frac1z\right) = \operatorname{tr}\left[\left(\tfrac1n Y_n'Y_n - \tfrac1z I\right)^{-1}\Theta_n\right], \qquad \Theta_n = \frac1n\,\mathbf 1\mathbf 1'. $$
It now follows from Theorem 1 in Rubio and Mestre (2011) that
$$ \operatorname{tr}\left[\left(\tfrac1n Y_n'Y_n - \tfrac1z I\right)^{-1}\Theta_n\right] - x_n\!\left(\frac1z\right) \to 0 \quad \text{a.s.}, $$
where $x_n(1/z)$ is the unique solution in $\mathbb C^+$ of the equation
$$ \frac{1}{x_n(1/z)} + \frac1z = \frac cp\,\operatorname{tr}\left[\left(x_n(1/z)\, I + \Sigma_n^{-1}\right)^{-1}\right]. $$
Note that $\operatorname{tr}(\Theta_n) = 1$ and that Theorem 1 in Rubio and Mestre (2011) is originally proven assuming the existence of moments of order $8 + \delta$. However, it is shown in Bodnar et al. (2015) that only the existence of moments of order $4 + \delta$ is required for this statement.

In order to see how $x_n(1/z)$ relates to $m_F(1/z)$ we note that due to assumption (iii) $H_n \xrightarrow{\mathcal D} H$ as $n \to \infty$ and thus $x_n(1/z) \to x(1/z)$. This implies
$$ \frac{1}{x_n(1/z)} + \frac1z = \frac cp\,\operatorname{tr}\left[\left(x_n(1/z)\, I + \Sigma_n^{-1}\right)^{-1}\right] = c\int \frac{\tau\, dH_n(\tau)}{x_n(1/z)\,\tau + 1} \to c\int \frac{\tau\, dH(\tau)}{x(1/z)\,\tau + 1}, $$
which leads to
$$ \frac{1}{x(1/z)} = -\frac1z + c\int \frac{\tau\, dH(\tau)}{x(1/z)\,\tau + 1}. $$
The last equation is the well-known Marchenko-Pastur equation for the Stieltjes transform $m_F(1/z)$ of the limiting distribution $F$. Because the solution of this equation is unique, we obtain $x(1/z) = m_F(1/z)$. As a result, observing that $1 + z\bar y'\bar y + z^2\theta_n(z) = -\frac1z\,\eta_n(1/z)$ by (3.12), we get the asymptotics
$$ 1 + z\,\bar y'\bar y + z^2\,\theta_n(z) \to -\frac{m_F(1/z)}{z} \quad \text{a.s.} $$
as $n \to \infty$, which together with (3.11) ensures that
$$ \xi_n(z) \to \frac{m_F'(1/z)}{z^2\, m_F(1/z)} \quad \text{a.s.} \quad (3.13) $$
for $p/n \to c \in (1, +\infty)$ as $n \to \infty$. The assertion of Theorem 3.2 now follows taking into account the argument made at the beginning of the proof and (3.13). □

4 Appendix

4.1 Derivation of (3.9)

We select the contour to be a rectangle with sides parallel to the axes. It intersects the real axis at the points $a \ne 0$ and $b$ such that the interval $(a, b)$ contains the support of $P$. The horizontal sides are taken at a distance $y_0 > 0$ from the real axis. More precisely, the contour $\mathcal C$ is given by
$$ \mathcal C = \{a + iy : |y| \le y_0\} \cup \{x + iy_0 : x \in [a, b]\} \cup \{b + iy : |y| \le y_0\} \cup \{x - iy_0 : x \in [a, b]\}, \quad (4.1) $$
so that $(a, b)$ contains $\left[0,\ \limsup_{n\to\infty}\lambda_{\max}(\Sigma_n^{-1})(1-\sqrt c)^{-2}\right]$ and $\mathcal C$ is enclosed in the analytic region of the function $g$. We calculate the four parts of the contour integral and then let $y_0$ tend to zero. First, we note that
$$ \frac{d}{dz}\log\left(m_F(1/z)\right) = -\frac{m_F'(1/z)}{z^2\, m_F(1/z)}. $$
Then, using integration by parts, the last integral in (3.7) becomes
$$ \frac{1}{2\pi i}\oint g(z)\,\frac{m_F'(1/z)}{z^2\,m_F(1/z)}\,dz = -\frac{1}{2\pi i}\oint g(z)\, d\left(\log m_F(1/z)\right) = \frac{1}{2\pi i}\oint g'(z)\, \log m_F(1/z)\, dz = \frac{1}{2\pi i}\oint g'(z)\left(\log\left|m_F(1/z)\right| + i\arg\left(m_F(1/z)\right)\right) dz, \quad (4.2) $$

where any branch of the logarithm may be taken. Naturally extending the Stieltjes transform to the lower half-plane and using $z = x + iy$, we get
$$ \left|m_F(1/z)\right| = \left|\int \frac{dF(\lambda)}{\lambda - 1/z}\right| \le \int \frac{dF(\lambda)}{\left|\lambda - 1/z\right|} = \int \frac{dF(\lambda)}{\sqrt{\left(\lambda - \frac{x}{x^2+y^2}\right)^2 + \frac{y^2}{(x^2+y^2)^2}}}. \quad (4.3) $$
Next we note that any portion of the integral (4.2) which involves the vertical sides can be neglected. Indeed, using (4.3), the fact that $|g'(z)| \le K$ and (5.1) in Bai and Silverstein (2004), for the left vertical side we have
$$ \frac{1}{2\pi}\left|\int_{-y_0}^{y_0} g'(a+iy)\,\log m_F(1/(a+iy))\,dy\right| \le \frac{K}{2\pi}\int_{-y_0}^{y_0}\left(\left|\log\left|m_F(1/(a+iy))\right|\right| + \left|\arg\left(m_F(1/(a+iy))\right)\right|\right)dy \le \frac{K}{2\pi}\int_{-y_0}^{y_0}\left(\log\frac{\sqrt{a^2+y^2}}{|y|} + \pi\right)dy = \frac{K}{2\pi}\left(y_0\log\frac{y_0^2+a^2}{y_0^2} + 2a\arctan\frac{y_0}{a}\right) + K y_0, \quad (4.4) $$
which converges to zero as $y_0 \to 0$. A similar argument can be used for the right vertical side. Consequently, only the remaining terms
$$ \frac{1}{2\pi}\int_a^b \Im\left[g'(x+iy_0)\right]\log\left|m_F(1/(x+iy_0))\right|dx + \frac{1}{2\pi}\int_a^b \Re\left[g'(x+iy_0)\right]\arg\left[m_F(1/(x+iy_0))\right]dx $$
$$ {}+ \frac{1}{2\pi}\int_a^b \Im\left[g'(x-iy_0)\right]\log\left|m_F(1/(x-iy_0))\right|dx + \frac{1}{2\pi}\int_a^b \Re\left[g'(x-iy_0)\right]\arg\left[m_F(1/(x-iy_0))\right]dx $$
have to be considered in the limit of the integral (4.2). Similarly, using the fact that
$$ \sup_{x\in[a,b]} \left|\Im\, h(x+iy)\right| \le K|y| \quad (4.5) $$
for any real-valued analytic function $h$ on the bounded interval $[a, b]$ [see equation (5.6) in Bai and Silverstein (2004)], we obtain that the first and the third integrals are bounded in absolute value by $O(y_0 \log y_0^{-1})$ and can thus be neglected. As a result, the dominated convergence theorem leads to
$$ \frac{1}{2\pi i}\oint g'(z)\left(\log\left|m_F(1/z)\right| + i\arg\left(m_F(1/z)\right)\right) dz \xrightarrow{\; y_0 \to 0\;} \frac{1}{\pi}\int_a^b g'(x)\,\arg\left[m_F(1/x)\right]dx, \quad (4.6) $$
which proves (3.9).

4.2 Proof of the representations (3.11) and (3.12)

For a proof of (3.11) we introduce the notations

$$a = \left(u^\top v\right)^{-1/2} v\,, \qquad b = \left(u^\top v\right)^{1/2}\,\frac{u}{u^\top u}\,.$$

Then $\left(u^\top u\right)^{-1/2} w = b - a$, and we obtain for (3.10)

$$\xi_n(z) = \frac{a^\top A^{-2}(z)\,a}{1 - a^\top A^{-1}(z)\,a} + \left[(b-a)^\top\!\left(A^{-2}(z) + \frac{A^{-2}(z)\,aa^\top A^{-1}(z) + A^{-1}(z)\,aa^\top A^{-2}(z)}{1 - a^\top A^{-1}(z)\,a} + \frac{a^\top A^{-2}(z)\,a\; A^{-1}(z)\,aa^\top A^{-1}(z)}{\left(1 - a^\top A^{-1}(z)\,a\right)^2}\right)(b-a)\right] \Bigg/ \left[1 + (b-a)^\top A^{-1}(z)(b-a) + \frac{\left((b-a)^\top A^{-1}(z)\,a\right)^2}{1 - a^\top A^{-1}(z)\,a}\right].$$

A tedious but straightforward calculation now gives

$$\xi_n(z) = \frac{a^\top A^{-2}(z)\,a\;b^\top A^{-1}(z)\,b - b^\top A^{-2}(z)\,b\left[1 - a^\top A^{-1}(z)\,a\right] + 2\,a^\top A^{-2}(z)\,b\left[1 - a^\top A^{-1}(z)\,b\right]}{\left[a^\top A^{-1}(z)\,b - 1\right]^2 + b^\top A^{-1}(z)\,b\left[1 - a^\top A^{-1}(z)\,a\right]}\,. \tag{4.7}$$

Now note that $\xi_n(z)$ is a non-linear function of the quantities $a^\top A^{-1}(z)a$, $a^\top A^{-2}(z)a$, $b^\top A^{-1}(z)b$, $b^\top A^{-2}(z)b$, $a^\top A^{-1}(z)b$ and $a^\top A^{-2}(z)b$, which can easily be expressed in terms of $\bar{y}$ and $S_n^+$. For example, we have

$$a^\top A^{-1}(z)\,a = \frac{\bar{y}^\top (S_n^+)^2\left(S_n^+ - zI\right)^{-1}(S_n^+)^2\,\bar{y}}{\bar{y}^\top (S_n^+)^3\,\bar{y}}$$
$$= \frac{1}{\bar{y}^\top (S_n^+)^3\,\bar{y}}\;\bar{y}^\top (S_n^+)^2\left(S_n^+ - zI\right)^{-1}\left(S_n^+ - zI + zI\right)S_n^+\,\bar{y}$$
$$= \frac{1}{\bar{y}^\top (S_n^+)^3\,\bar{y}}\left(\bar{y}^\top (S_n^+)^3\,\bar{y} + z\,\bar{y}^\top (S_n^+)^2\left(S_n^+ - zI\right)^{-1}S_n^+\,\bar{y}\right)$$
$$= \frac{1}{\bar{y}^\top (S_n^+)^3\,\bar{y}}\left(\bar{y}^\top (S_n^+)^3\,\bar{y} + z\,\bar{y}^\top (S_n^+)^2\,\bar{y} + z^2\,\bar{y}^\top S_n^+\left(S_n^+ - zI\right)^{-1}S_n^+\,\bar{y}\right)$$
$$= \frac{1}{\bar{y}^\top (S_n^+)^3\,\bar{y}}\left(\bar{y}^\top (S_n^+)^3\,\bar{y} + z\,\bar{y}^\top (S_n^+)^2\,\bar{y} + z^2\,\bar{y}^\top S_n^+\,\bar{y} + z^3\,\bar{y}^\top\left(S_n^+ - zI\right)^{-1}S_n^+\,\bar{y}\right)$$
$$= \frac{1}{\bar{y}^\top (S_n^+)^3\,\bar{y}}\left(\bar{y}^\top (S_n^+)^3\,\bar{y} + z\,\bar{y}^\top (S_n^+)^2\,\bar{y} + z^2 + z^3\,\bar{y}^\top\bar{y} + z^4\,\bar{y}^\top\left(S_n^+ - zI\right)^{-1}\bar{y}\right),$$

where the last equality follows from the fact that $\bar{y}^\top S_n^+\,\bar{y} = 1$. In a similar way, namely adding and subtracting $zI$ from $S_n^+$ in sequence, we obtain representations for the remaining quantities of interest:

$$a^\top A^{-2}(z)\,a = \frac{1}{\bar{y}^\top (S_n^+)^3\,\bar{y}}\left(\bar{y}^\top (S_n^+)^2\,\bar{y} + 2z + 3z^2\,\bar{y}^\top\bar{y} + 4z^3\,\bar{y}^\top\left(S_n^+ - zI\right)^{-1}\bar{y} + z^4\,\bar{y}^\top\left(S_n^+ - zI\right)^{-2}\bar{y}\right)$$
$$b^\top A^{-1}(z)\,b = \frac{\bar{y}^\top (S_n^+)^3\,\bar{y}}{\left(\bar{y}^\top (S_n^+)^2\,\bar{y}\right)^2}\left(1 + z\,\bar{y}^\top\bar{y} + z^2\,\bar{y}^\top\left(S_n^+ - zI\right)^{-1}\bar{y}\right)$$
$$b^\top A^{-2}(z)\,b = \frac{\bar{y}^\top (S_n^+)^3\,\bar{y}}{\left(\bar{y}^\top (S_n^+)^2\,\bar{y}\right)^2}\left(\bar{y}^\top\bar{y} + 2z\,\bar{y}^\top\left(S_n^+ - zI\right)^{-1}\bar{y} + z^2\,\bar{y}^\top\left(S_n^+ - zI\right)^{-2}\bar{y}\right)$$
$$a^\top A^{-1}(z)\,b = \frac{1}{\bar{y}^\top (S_n^+)^2\,\bar{y}}\left(\bar{y}^\top (S_n^+)^2\,\bar{y} + z + z^2\,\bar{y}^\top\bar{y} + z^3\,\bar{y}^\top\left(S_n^+ - zI\right)^{-1}\bar{y}\right)$$
$$a^\top A^{-2}(z)\,b = \frac{1}{\bar{y}^\top (S_n^+)^2\,\bar{y}}\left(1 + 2z\,\bar{y}^\top\bar{y} + 3z^2\,\bar{y}^\top\left(S_n^+ - zI\right)^{-1}\bar{y} + z^3\,\bar{y}^\top\left(S_n^+ - zI\right)^{-2}\bar{y}\right).$$

In the next step we substitute these results into (4.7). More precisely, introducing the notations

$$\theta_n(z) = \bar{y}^\top\left(S_n^+ - zI\right)^{-1}\bar{y}\,, \qquad \alpha_n = 1\big/\bar{y}^\top (S_n^+)^2\,\bar{y}\,,$$

we have $\theta_n'(z) = \frac{\partial}{\partial z}\theta_n(z) = \bar{y}^\top\left(S_n^+ - zI\right)^{-2}\bar{y}$ and obtain for $\xi_n(z)$ the following representation

$$\xi_n(z) = \frac{\alpha_n^2\left(\alpha_n^{-1} + 2z + 3z^2\,\bar{y}^\top\bar{y} + 4z^3\theta_n(z) + z^4\theta_n'(z)\right)\left(1 + z\,\bar{y}^\top\bar{y} + z^2\theta_n(z)\right)}{\alpha_n^2 z^2\left(1 + z\,\bar{y}^\top\bar{y} + z^2\theta_n(z)\right)^2 - \alpha_n^2 z\left(\alpha_n^{-1} + z + z^2\,\bar{y}^\top\bar{y} + z^3\theta_n(z)\right)\left(1 + z\,\bar{y}^\top\bar{y} + z^2\theta_n(z)\right)}$$
$$+\;\frac{\alpha_n^2 z\left(\bar{y}^\top\bar{y} + 2z\theta_n(z) + z^2\theta_n'(z)\right)\left(\alpha_n^{-1} + z + z^2\,\bar{y}^\top\bar{y} + z^3\theta_n(z)\right)}{\alpha_n^2 z^2\left(1 + z\,\bar{y}^\top\bar{y} + z^2\theta_n(z)\right)^2 - \alpha_n^2 z\left(\alpha_n^{-1} + z + z^2\,\bar{y}^\top\bar{y} + z^3\theta_n(z)\right)\left(1 + z\,\bar{y}^\top\bar{y} + z^2\theta_n(z)\right)}$$
$$-\;\frac{2\alpha_n^2 z\left(1 + 2z\,\bar{y}^\top\bar{y} + 3z^2\theta_n(z) + z^3\theta_n'(z)\right)\left(1 + z\,\bar{y}^\top\bar{y} + z^2\theta_n(z)\right)}{\alpha_n^2 z^2\left(1 + z\,\bar{y}^\top\bar{y} + z^2\theta_n(z)\right)^2 - \alpha_n^2 z\left(\alpha_n^{-1} + z + z^2\,\bar{y}^\top\bar{y} + z^3\theta_n(z)\right)\left(1 + z\,\bar{y}^\top\bar{y} + z^2\theta_n(z)\right)}\,.$$

In order to simplify the following calculations we introduce the quantities

$$\psi_n(z) = 1 + z\,\bar{y}^\top\bar{y} + z^2\theta_n(z)\,; \qquad \psi_n'(z) = \bar{y}^\top\bar{y} + 2z\,\theta_n(z) + z^2\theta_n'(z)\,,$$

which lead to

$$\xi_n(z) = \frac{\alpha_n^2\,\psi_n(z)\left(\alpha_n^{-1} - z^2\psi_n'(z)\right) + \alpha_n^2\, z\,\psi_n'(z)\left(\alpha_n^{-1} + z\,\psi_n(z)\right)}{-\alpha_n z\,\psi_n(z)} = \frac{\alpha_n^2\,\alpha_n^{-1}\left(\psi_n(z) + z\,\psi_n'(z)\right)}{-\alpha_n z\,\psi_n(z)} = -\frac{\psi_n(z) + z\,\psi_n'(z)}{z\,\psi_n(z)} = -\frac{1}{z} - \frac{\psi_n'(z)}{\psi_n(z)} = -\frac{1}{z} - \frac{\bar{y}^\top\bar{y} + 2z\,\theta_n(z) + z^2\theta_n'(z)}{1 + z\,\bar{y}^\top\bar{y} + z^2\theta_n(z)}\,.$$

Finally, we derive the representation (3.12) of $\theta_n(z)$ using the Woodbury matrix inversion lemma [see, e.g., Horn and Johnson (1985)]:

$$\theta_n(z) = \bar{y}^\top\left(S_n^+ - zI\right)^{-1}\bar{y} = \bar{y}^\top\left(\frac{1}{n} Y_n\left(\frac{1}{n} Y_n^\top Y_n\right)^{-2} Y_n^\top - zI\right)^{-1}\bar{y}$$
$$= -\frac{1}{z}\,\bar{y}^\top\bar{y} - \frac{1}{z^2}\,\bar{y}^\top\,\frac{1}{n} Y_n\left(\left(\frac{1}{n} Y_n^\top Y_n\right)^2 - \frac{1}{z}\,\frac{1}{n} Y_n^\top Y_n\right)^{-1} Y_n^\top\,\bar{y}$$
$$= -\frac{1}{z}\,\bar{y}^\top\bar{y} - \frac{1}{z^2}\,\frac{1}{n}\,\mathbf{1}_n^\top\,\frac{1}{n} Y_n^\top Y_n\left(\left(\frac{1}{n} Y_n^\top Y_n\right)^2 - \frac{1}{z}\,\frac{1}{n} Y_n^\top Y_n\right)^{-1}\frac{1}{n} Y_n^\top Y_n\,\mathbf{1}_n$$
$$= -\frac{1}{z}\,\bar{y}^\top\bar{y} - \frac{1}{z^2}\,\frac{1}{n}\,\mathbf{1}_n^\top\left(I - \frac{1}{z}\left(\frac{1}{n} Y_n^\top Y_n\right)^{-1}\right)^{-1}\mathbf{1}_n$$
$$= -\frac{1}{z}\,\bar{y}^\top\bar{y} + \frac{1}{z}\,\frac{1}{n}\,\mathbf{1}_n^\top\left(\left(\frac{1}{n} Y_n^\top Y_n\right)^{-1} - zI\right)^{-1}\mathbf{1}_n\,.$$

Acknowledgements. The authors would like to thank M. Stein, who typed this manuscript with considerable technical expertise. The work of H. Dette was supported by the DFG Research Unit 1735, DE 502/26-2.

References

Bai, Z. D., Miao, B. Q., and Pan, G. M. (2007). On asymptotics of eigenvectors of large sample covariance matrix. Annals of Probability, 35:1532–1572.

Bai, Z. D. and Silverstein, J. W. (2004). CLT for linear spectral statistics of large dimensional sample covariance matrices. Annals of Probability, 32:553–605.

Bai, Z. D. and Silverstein, J. W. (2010). Spectral Analysis of Large Dimensional Random Matrices. Springer, New York.

Bodnar, T., Gupta, A. K., and Parolya, N. (2015). Direct shrinkage estimation of large dimensional precision matrix. Journal of Multivariate Analysis, to appear.

Horn, R. A. and Johnson, C. R. (1985). Matrix Analysis. Cambridge University Press, Cambridge.

Hoyle, D. C. (2011). Accuracy of pseudo-inverse covariance learning – a random matrix theory analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33:1470–1481.

Kubokawa, T. and Srivastava, M. (2008). Estimation of the precision matrix of a singular Wishart distribution and its application in high-dimensional data. Journal of Multivariate Analysis, 99:1906–1928.

Marčenko, V. A. and Pastur, L. A. (1967). Distribution of eigenvalues for some sets of random matrices. Sbornik: Mathematics, 1:457–483.

Meyer, C. R. (1973). Generalized inversion of modified matrices. SIAM Journal on Applied Mathematics, 24:315–323.

Pan, G. M. (2014). Comparison between two types of large sample covariance matrices. Annales de l'Institut Henri Poincaré Probabilités et Statistiques, 50:655–677.

Pan, G. M. and Zhou, W. (2008). Central limit theorem for signal-to-interference ratio of reduced rank linear receiver. Annals of Applied Probability, 18:1232–1270.

Rubio, F. and Mestre, X. (2011). Spectral convergence for a general class of random matrices. Statistics & Probability Letters, 81:592–602.

Sherman, J. and Morrison, W. J. (1950). Adjustment of an inverse matrix corresponding to a change in one element of a given matrix. Annals of Mathematical Statistics, 21:124–127.

Silverstein, J. W. (1995). Strong convergence of the empirical distribution of eigenvalues of large-dimensional random matrices. Journal of Multivariate Analysis, 55:331–339.

Srivastava, M. (2007). Multivariate theory for analyzing high dimensional data. Journal of the Japan Statistical Society, 37:53–86.

Zheng, Z., Bai, Z. D., and Yao, J. (2013). CLT for linear spectral statistics of random matrix $S^{-1}T$. Working paper.