CLT for linear spectral statistics of large dimensional sample covariance matrices with dependent data

Similar documents
On corrections of classical multivariate tests for high-dimensional data

arxiv: v1 [math.pr] 13 Aug 2007

On corrections of classical multivariate tests for high-dimensional data. Jian-feng. Yao Université de Rennes 1, IRMAR

A note on a Marčenko-Pastur type theorem for time series. Jianfeng. Yao

Strong Convergence of the Empirical Distribution of Eigenvalues of Large Dimensional Random Matrices

On singular values distribution of a matrix large auto-covariance in the ultra-dimensional regime. Title

Spectral analysis of the Moore-Penrose inverse of a large dimensional sample covariance matrix

Large sample covariance matrices and the T 2 statistic

An Introduction to Large Sample covariance matrices and Their Applications

Assessing the dependence of high-dimensional time series via sample autocovariances and correlations

HYPOTHESIS TESTING ON LINEAR STRUCTURES OF HIGH DIMENSIONAL COVARIANCE MATRIX

A new test for the proportionality of two large-dimensional covariance matrices. Citation Journal of Multivariate Analysis, 2014, v. 131, p.

A new method to bound rate of convergence

Non white sample covariance matrices.

Weiming Li and Jianfeng Yao

TESTS FOR LARGE DIMENSIONAL COVARIANCE STRUCTURE BASED ON RAO S SCORE TEST

Estimation of the Global Minimum Variance Portfolio in High Dimensions

Convergence in Distribution

Notes on Random Vectors and Multivariate Normal

Empirical Likelihood Tests for High-dimensional Data

Eigenvalues and Singular Values of Random Matrices: A Tutorial Introduction

Asymptotic Statistics-III. Changliang Zou

Lecture I: Asymptotics for large GUE random matrices

Preface to the Second Edition...vii Preface to the First Edition... ix

Section 27. The Central Limit Theorem. Po-Ning Chen, Professor. Institute of Communications Engineering. National Chiao Tung University

The Multivariate Gaussian Distribution

Fluctuations from the Semicircle Law Lecture 4

Random Matrices and Multivariate Statistical Analysis

Convergence of empirical spectral distributions of large dimensional quaternion sample covariance matrices

Some functional (Hölderian) limit theorems and their applications (II)

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product

Asymptotic Statistics-VI. Changliang Zou

arxiv: v2 [math.pr] 2 Nov 2009

Limiting Distributions

Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data

EXTENDED GLRT DETECTORS OF CORRELATION AND SPHERICITY: THE UNDERSAMPLED REGIME. Xavier Mestre 1, Pascal Vallet 2

Central Limit Theorems for Classical Likelihood Ratio Tests for High-Dimensional Normal Distributions

Large Sample Theory. Consider a sequence of random variables Z 1, Z 2,..., Z n. Convergence in probability: Z n

1 Math 241A-B Homework Problem List for F2015 and W2016

4 Sums of Independent Random Variables

SHOTA KATAYAMA AND YUTAKA KANO. Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka , Japan

Appendix B for The Evolution of Strategic Sophistication (Intended for Online Publication)

2. Matrix Algebra and Random Vectors

LARGE DIMENSIONAL RANDOM MATRIX THEORY FOR SIGNAL DETECTION AND ESTIMATION IN ARRAY PROCESSING

5: MULTIVARATE STATIONARY PROCESSES

Random matrices: Distribution of the least singular value (via Property Testing)

arxiv: v2 [math.pr] 16 Aug 2014

STA205 Probability: Week 8 R. Wolpert

arxiv: v2 [math.st] 22 Nov 2013

Inhomogeneous circular laws for random matrices with non-identically distributed entries

International Competition in Mathematics for Universtiy Students in Plovdiv, Bulgaria 1994

DISTRIBUTION OF EIGENVALUES OF REAL SYMMETRIC PALINDROMIC TOEPLITZ MATRICES AND CIRCULANT MATRICES

ON SUM OF SQUARES DECOMPOSITION FOR A BIQUADRATIC MATRIX FUNCTION

Independence Test For High Dimensional Random Vectors

3. Probability and Statistics

A Note on the Central Limit Theorem for the Eigenvalue Counting Function of Wigner and Covariance Matrices

Convergence rates of spectral distributions of large sample covariance matrices

The Dynamics of Learning: A Random Matrix Approach

An introduction to G-estimation with sample covariance matrices

The purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j.

Lecture 21: Convergence of transformations and generating a random variable

Grouped Network Vector Autoregression

Gaussian vectors and central limit theorem

INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS

Orthogonal decompositions in growth curve models

A Bootstrap Test for Conditional Symmetry

Random Matrix Theory Lecture 1 Introduction, Ensembles and Basic Laws. Symeon Chatzinotas February 11, 2013 Luxembourg

Bootstrapping high dimensional vector: interplay between dependence and dimensionality

Stein s Method and Characteristic Functions

high-dimensional inference robust to the lack of model sparsity

7 Convergence in R d and in Metric Spaces

Large Sample Properties of Estimators in the Classical Linear Regression Model

n! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2

5.1 Consistency of least squares estimates. We begin with a few consistency results that stand on their own and do not depend on normality.

Spectral representations and ergodic theorems for stationary stochastic processes

Statistica Sinica Preprint No: SS R2

Applications of random matrix theory to principal component analysis(pca)

Testing High-Dimensional Covariance Matrices under the Elliptical Distribution and Beyond

Concentration Inequalities for Random Matrices

Probability and Measure

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.

Gaussian processes. Basic Properties VAG002-

Covariance estimation using random matrix theory

The largest eigenvalues of the sample covariance matrix. in the heavy-tail case

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows.

Journal of Multivariate Analysis. Sphericity test in a GMANOVA MANOVA model with normal error

Jackknife Empirical Likelihood Test for Equality of Two High Dimensional Means

Multivariate Analysis and Likelihood Inference

Local semicircle law, Wegner estimate and level repulsion for Wigner random matrices

The circular law. Lewis Memorial Lecture / DIMACS minicourse March 19, Terence Tao (UCLA)

Limiting Distributions

Practical conditions on Markov chains for weak convergence of tail empirical processes

Random Matrix Theory and its Applications to Econometrics

Wigner s semicircle law

Random regular digraphs: singularity and spectrum

Han-Ying Liang, Dong-Xia Zhang, and Jong-Il Baek

Explicit evaluation of the transmission factor T 1. Part I: For small dead-time ratios. by Jorg W. MUller

Supplemental Material 1 for On Optimal Inference in the Linear IV Model

1 Intro to RMT (Gene)

X n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2)

Transcription:

arxiv:708.03749v [math.pr] 2 Aug 207 CLT for linear spectral statistics of large dimensional sample covariance matrices with dependent data Shurong Zheng, Zhidong Bai, Jianfeng Yao and Hongtu Zhu July 3, 206 Abstract This paper investigates the central limit theorem for linear spectral statistics of high dimensional sample covariance matrices of the form B n = n n Qx jx j Q whereqis anonrandommatrixof dimensionp k, and{x j }isasequenceofindependent k-dimensional random vector with independent entries, under the assumption that p/n y > 0. A key novelty here is that the dimension k p can be arbitrary, possibly infinity. This new model of sample covariance matrices B n covers most of the known models as its special cases. For example, standard sample covariance matrices are obtained with k = p and Q = T /2 n for some positive definite Hermitian matrix T n. Also with k = our model covers the case of repeated linear processes considered in recent high-dimensional time series literature. The CLT found in this paper substantially generalizes the seminal CLT in Bai and Silverstein 2004. Applications of this new CLT are proposed for testing the structure of a high-dimensional covariance matrix. The derived tests are then used to analyse a large fmri data set regarding its temporary correlation structure. AMS 2000 subject Classifications: Primary 62H5; secondary 60F05,5B52. Key word: Large sample covariance matrices, linear spectral statistics, central limit theorem, high-dimensional time series, high-dimensional dependent data. Shurong Zheng and Zhidong Bai are professors in School of Mathematics & Statistics and KLAS, Northeast Normal University, China; Jianfeng Yao is professor with the Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong; Hongtu Zhu is professor at the University of North Caroline at Chapel Hill, USA.

Introduction Sample covariance matrices are of central importance in multivariate and high-dimensional data analysis. They have prominent applications in various big-data fields such as wireless communication networks, social networks and signal processing, see for instance the recent survey papers Couillet and Merouane 203, Johnstone 2007 and Paul and Aue 204. Many statistical tools in these area depend on the so-called linear spectral statistics LSS of a sample covariance matrix, say B n of size p p with eigenvalues {λ i } p i=, which have the form p p i= fλ i where f is a given function. Two main questions arise for such LSS, namely i determining the point limit of a limiting spectral distribution LSD, say G, such that LSS converges to Gf = fxdgx in an appropriate sense and for a wide family of functions f; and ii characterizing the fluctuations p p i= fλ i Gf in terms of an appropriate central limit theorem CLT. Both questions have a long history and continue to receive considerable attention in recent years. As for LSDs, the question has been extensively studied in the literature starting from the classical work of Marčenko and Pastur 967, and contitued in Silverstein 995 and Wachter 980, and found many recent developments Bai and Zhou, 2008; Yao, 202; Banna and Merlevède, 205, among others. As for CLTs for linear spectral statistics, a CLT for trb n,, trb l n is established in Jonsson 982 for a sequence of Wishart matrices {B n } with Gaussian variables, where l is a fixed number, and the dimension p of the matrices grows proportionally to the sample size n. By employing a general method based on Stieltjes transform, Bai and Silverstein 2004 provides a CLT for LSS of large dimensional sample covariance matrices from a general population, that is, not necessarily Gaussian along with an arbitrary population covariance matrix. A distinguished feature of this CLT is that its centering term and limiting means and covariance functions are all fully characterized. This celebrated paper has been subsequently improved in several follow-up papers including Pan and Zhou 2008, Pan 202, Najim and Yao 203 and Zheng, Bai and Yao 205. These CLTs have found successful applications in high-dimensional statistics by solving notably important problems in parameter estimation or hypothesis testing with high-dimensional data. Examples include Bai et al. 2009, Wang and Yao 203 and Liu et al. 204 for hypothesis testing on covariance matrix; Bai et al. 203 for testing on regression coefficients, and Jiang, Bai and Zheng 203 for testing indepedence between two large sets of variables, among others. The aim of this paper is to establish a new CLT for a much extended class of sample covariance matrices. This new development represents a novel and major extension of the 2

CLT of Bai and Silverstein 2004 and its recent follow-up version. Specifically, in this paper, we consider samples {y j } j n of the form y j = Qx j where M {x j = x j,...,x kj T, j n}isasequence ofindependent k-dimensionalrandom vectorswithindependent standardizedcomponentsx ij, i.e. Ex ij = 0andE x ij 2 =, and the dimension k p, possibly k = for example, each x j is a white noise made with an infinite sequence of standardized errors; M2 Q is a p k matrix with arbitrary entries. The sample covariance matrix in our setting is then given by B n = n y j yj = n QX n X nq,. where X n = x,...,x n is an k n matrix with independent entries and stands for the transpose and complex conjugate of matrices or vectors. In contrast, classical sample covariance matrices as in Bai and Silverstein 2004 consider samples of the form y j = C /2 n x j, where O {x j = x j,...,x pj T, j n} is a sequence of iid p-dimensional random vectors with independent standardized components x ij, i.e. Ex ij = 0 and E x ij 2 =. O2 C /2 n is a p p positive definite Hermitian matrix. The sample covariance matrix of interest thus takes form S n = n i= y j y j = n C /2 n X n X nc /2 n,.2 where X n = x,...,x n is a p n data matrix with independent entries. The main innovation in the model. is that the mixing matrix Q can have an infinite number of columns, and this will allow to cover dependent samples. For example, the repeated linear process {y ij = t= b tx i t,j,i =,...,p,j =,...,n} is a special case of. with. b p b 2 p... b 0 b b 2... y j = Qx j =....... x p,j b b 0... b p 2 b p b p..... b 0 b... b p b p b p+... }{{} Q is p dimensional 3 x,j. }{{} x j is dimensional

Such data structure often arises in panel surveys or longitudinal studies, in which respondents or subjects are interviewed/observed during a certain time periodsee Wong and Miller 990; Wong, Miller and Shrestha 200. Moreover, if indeed k = p, then.2 of Bai and Silverstein 2004 is a special case of our novel extension. In this case, although the entries of p p matrix Q are arbitrary and Q is not required to be Hermitian or positive definite, our setup is in fact strictly equivalent to Bai and Silverstein 2004 due to the polar decomposition Q = U n T /2 n, in which U n is unitary and T /2 n is Hermitian and nonnegative definite. Consequently, B n = U n S n U n where S n is equivalent to the sample covariance matrix in.2 has been considered in Bai and Silverstein 2004. Therefore, B n and S n have exactly the same spectrum of eigenvalues and their LSS are identical. However, when k tends to infinity with a higher order than p, the new model in. represents a non-trivial extension of the framework of Bai and Silverstein 2004. We will derive the LSD of the sample covariance matrix B n in. and the corresponding CLT for its linear spectral statistics, thus extending both the results of Bai and Silverstein 2004 and Silverstein 995 into the new framework. As it will be seen below, establishing these extensions requires several novel ideas and major techniques in order to tackle with a mixing matrix Q that can be infinite and with arbitrary entries. The rest of the paper is organized as follows. In Section 2, we derive the LSD of the sample covariancematrixb n under suitable conditions. InSection3, wederive thecltfor linear spectral statistics of B n under some reinforced conditions. In Section 4, we present two applications of these theoretical results. Section 5 collects the proofs for the main theorems of the paper. Finally, some technical lemmas and auxiliary results used in the proofs of Section 5 are postponed to the appendices. 2 LSD of the sample covariance matrix B n This section aims at the derivation of the LSD for the sample covariance matrix B n in. from sample y j s of dependent data. Recall that the empirical spectral distribution ESD of a p p square matrix A is the probability measure F A = p p i= δ λ i, where the λ i s are eigenvalues of A and δ a denotes the Dirac mass at point a. For any probability measure F on the real line, its Stieltjes transform is defined by mz = t z dfx, z C+ = {z C : Iz > 0}, where C denotes the complex plane and I the imaginary part. The assumptions needed for the derivation of the LSD of B n are as follows. 4

Assumption a Samples are {y j = Qx j,j =,...,n}, where Q is p k, x j is k, x j = x j,...,x kj T, and the dimension k is arbitrary possibly infinite. Moreover, {x ij,i =,...,k,j =,...,n} is a k n array of independent random variables, not necessarily identically distributed, with common moments Ex ij = 0, E x 2 ij =, and satisfying the following Lindeberg-type condition: for each η > 0, pnη 2 k i= q i 2 E x 2 ij x I ij > η n/ q i 0, where q i is the Euclidean norm k is finite or the l 2 norm k = of the i-th column vector q i of Q. Assumption b With probability one, the ESD H n of the population covariance matrix T n = QQ converges weakly to a probability distribution H. Also the sequence T n n is bounded in spectral norm. Assumption c Both p and n tend to infinity such that y n = p/n y > 0 as n. The Lindeberg-type condition in Assumption a is a classical moment condition of second order. It is automatically satisfied if the entries {x ij } are identically distributed. This condition is here used to tackle with possibly non identically distributed entries and ensures a suitable truncation of the variables. Assumption b is also a standard condition on the convergence of the population spectral distribution and requires the weak convergence of the ESD of the population covairance matrix. Assumption c defines the asymptotic regime following the seminal work of Marčenko and Pastur 967. As the first main result of the paper, we derive the LSD of B n. Theorem 2.. Under Assumptions a, b and c, almost surely the ESD F Bn of B n weakly converges to a non-random LSD F y,h. Moreover, the LSD F y,h is determined through its Stieltjes transform mz, which is the unique solution to the following Marčenko- Pastur equation mz = on the set {mz C : y/z +ymz C + }. dht, 2. t[ y yzmz] z Theorem 2. has several important implications. The LSD F y,h is exactly the generalized Marčenko-Pastur distribution with parameters y, H as described in Silverstein 5

995. The key novelty of Theorem 2. consists of an extension of this LSD into a much more general setting on matrix entries as defined in Assumptions a-b. Therefore, Theorem 2. includes many existing results related to LSDs of sample covariance matrices from dependent data as special cases, e.g. Bai and Zhou 2008; Jin et al. 204; Jin, Wang and Miao 2009; Wang, Jin and Miao 20; Yao 202. To the best of our knowledge, although the model considered in Bai and Zhou 2008 is more general than that in this paper, it relies on a strong quadratic form condition, which can be hardly verified in many applications. Further developments on this LSD rely on an equivalent representation of the Marčenko- Pastur law. Namely, define the companion LSD of B n as F y,h = yδ 0 +yf y,h. It is readily checked that F y,h is the LSD of the companion sample covariance matrix B n = n X n Q QX n which is n n, and its Stieltjes transform mz satisfies the socalled Silverstein equation z = mz +y t dht, mz = y +ymz. 2.2 +tmz z The advantage of this equation is that it indeed defines the inverse function of the Stieltjes transform mz. This equation is the key to numerical evaluation of the Stieltjes transform, or the underlying density function of the LSD. Moreover, many analytical properties on LSD can be inferred from this equation see Silverstein and Choi 995. We consider an example to elaborate more on Theorem 2.. Example 2.. Assume that for each j n, {y ij } i is an ARMA, process of the form y ij = φy i,j +θε i,j +ε ij, where {ε ij } is an array of independently and identically distributed random variables with mean zero and variance. The condition φ θ < is assumed for causality and invertibility of this ARMA, process. From 2. or 2.2, it follows that the Stieltjes transform mz of the LSD F y,h satisfies the following equation z = mz + θ yθmz φ φ+θ+φθ ǫα yθmz φ 2 α2 4 with α = ymz+θ2 ++φ 2 yθmz φ 6 and ǫα = sgniα.

In the case of an AR process θ = 0, we have z = mz + [ymz++φ2 ] 2 4φ 2. Similarly, in the case of a MA processs φ = 0, we have z = mz + ymz + y 2 m 2 z {[ymz] ++θ 2 } 2 4θ 2 see Jin, Wang and Miao 2009, Wang, Jin and Miao 20 and Yao 202. Remark. In general, we cannot find a closed-form formula for the Stieltjes transform mz or mz. One has to rely on numerical evaluation using the Silverstein equation 2.2. Notice that the values of mz at points z = x + iε with small positive ε provides an approximation to the value of the density of the LSD at x by the inversion formula. For details on these numerical algorithms, we refer to Dobriban 205. By 2.2 and Silverstein and Choi 995, we obtain mz and the limiting density f y,h of F y,h, which satisfies f y,h x = yπ lim z x+0i Imz. Remark 2. If T n = QQ is the identity matrix I p, then H = δ and we have by 2.2, That is, z = mz + y +mz. mz = z + y+ z y 2 4y 2z and the LSD F y,h of B n is the Marčenko-Pastur law with the density function f y,h = 2πyx [+ y 2 x][x y 2 ] 2.3 and has a point mass /y at the origin if y >. 3 CLT for linear spectral statistics of the sample covariance matrix B n In this section, we establish the corresponding CLT for LSS of the sample covariance matrix B n in. under approriate conditions. This CLT constitutes the second main contribution of this paper. We first introduce the assumptions needed for this CLT. 7

Assumption d The variables {x ij,i =,...,k,j =,...,n} are independent, with common moments Ex ij = 0, E x 2 ij =, β x = E x 4 ij Ex2 ij 2 2, and α x = Ex 2 ij 2, and satisfying the following Lindeberg-type condition: for each η > 0 pnη 6 k i= q i 2 E x 4 ij x I ij > η n/ q i 0. 3. Assumption e Either α x = 0, or the mixing matrix Q is real with arbitrary α x. Assumption f Either β x = 0, or the mixing matrix Q is such that diagonal Q Q with arbitrary β x. Remark 3. Assumption d is stronger than Assumption a in Section 2 since Assumption d assumes the Lindeberg-type condition on the fourth order moments instead of the previous moments of second order. Assumption e means that if variables x ij s are complex-valued, Ex 2 ij = 0 should hold which is the Gaussian-like second moment in Bai and Silverstein 2004. However, if the other real Q condition is imposed, then the Gaussian-like second moment can be removed α x 0. If α x 0 and Q is not real, then the CLT for LSS of B n may not hold. Such counterexample for the case of k = p can be found in Zheng, Bai and Yao 205. As for Assumption f, if the population is Gaussian, then β x = 0 is the Gaussian-like fourth moment. If we impose the condition that the matrix Q Q is real and diagonal, then the Gaussian-like fourth moment condition can be removed. Again if β x 0 and Q Q is not diagonal, the CLT may not hold. Such counterexamples with k = p can be found in Zheng, Bai and Yao 205. Theorem 3.. Under Assumptions b-f, let f,...,f L be L functions analytic on a complex domain containing [I 0<y< y 2 liminf n λ Tn min, + y 2 limsup n λ Tn max ] 3.2 with T n = QQ, and λ Tn min and λtn max denoting its smallest and the largest eigenvalue, respectively. Consider the random vector X p f,...,x p f L with X p f l = p f l λ j pf yn,hn f l, l =,...,L, 3.3 8

in which F yn,hn f l = f l xf yn,hn xdx and {λ j } p are the sample eigenvalues of B n. Then, the random vector X p f,...,x p f L converges to a L-dimensional Gaussian random vector X f,...,x fl with mean function EX fl = 2πi C β x 2πi and variance-covariance function C α x y m 3 zt 2 dht +tmz f l z 3 y m 2 zt 2 dht α +tmz 2 x y m 2 zt 2 m3zt2 y dht mzt+ f l z 3 y dz, m 2 zt 2 dht +tmz 2 +tmz 2 dht CovX fl,x fl = f l z f l z 2 mz mz 2 dz 4π 2 mz mz 2 2 dz 2 z z 2 C C 2 yβ [ ] x t t mz mz 2 f l z 4π 2 f l z 2 mz t+ 2 mz 2 t+ 2dHt dz dz 2 z z 2 C C 2 + [ ] 2 f l z 4π 2 f l z 2 log az,z 2 dz dz 2, 3.4 z z 2 C 2 C where C, C and C 2 are closed contours in the complex plane enclosing the support of the LSD F y,h, and C and C 2 are non-overlapping. Moreover, az,z 2 is given by az,z 2 = α x + mz mz 2 z z 2. mz 2 mz Remark 4. Theorem 3. gives the explicit form for the mean and covariance functions of X fl expressed by contour integrals. For some specific function f l and Q, the mean and covariance functions of X fl can be explicitly obtained. Even if the mean and covariance functions of X fl have no explicit forms, numerical methods can be used since the integrals are at most two-dimensional. If the population is Gaussian, then β x = 0 and the corresponding terms in both the limiting mean and covariance functions vanish. If the population is real, then we have α x = and 2 log az,z 2 z z 2 = mz mz 2. mz mz 2 2 z z 2 Notice that the limiting mean and covariance functions of {X fl } depend on the LSD H of QQ and y only. In many applications, the ESD H n of QQ and the ratio y n = p/n can 9 dz

be used to replace the LSD H and the limit y. But it is worthwhile to mention that when B n is the centered sample covariance matrix B n = n n i= y i ȳy i ȳ T with ȳ = n n i= y i, y n in 3.3 should be replaced by y n. For example, y i N0 p,i p, then the Wishart matrices n n i= y iyi T and n n i= y i ȳy i ȳ T have degrees of freedoms n and n, respectively. For practical use, by Theorem 3. and Remark 2, we will give three corollaries that are useful for computing the variance and covariance of the limiting distribution. Corollary 3.. Under the conditions of Theorem 3. and if the population is real, then the random vector X p f,...,x p f L converges to an L-dimensional Gaussian random vector X f,...,x fl with mean function EX fl = 2πi C f l z y m 3 zt 2 dht +tmz 3 y 2 dz β x m 2 zt 2 2πi dht +tmz 2 C m3zt2 y dht mzt+ f l z 3 y dz, m 2 zt 2 dht +tmz 2 and variance-covariance function CovX fl,x fl = 2π 2 yβ x 4π 2 C C C 2 f l z f l z 2 mz mz 2 2 mz z mz 2 z 2 dz dz 2 [ f l z f l z 2 C 2 ] t t mz mz 2 mz t+ 2 mz 2 t+ 2dHt dz dz 2, z z 2 where C, C and C 2 are closed contours in the complex plane enclosing the support of the LSD F y,h, and C and C 2 are non-overlapping. Corollary 3.2. Under the assumptions of Theorem 3., if the population is real and H n = δ, then we have F y,hn f l = + y 2 f l x [+ y y 2πyx 2 x][x y 2 ]dx+ /ysgny > f l 0, 2 EX fl = f l y 2 +f l + y 2 + y 2 f l x 4 2π dx y 2 4y x y 2 β x ym 3 zmz+ 3 f l z 2πi ym 2 z+mz 2dz, C 0

and CovX fl,x fl = 2π 2 yβ x 4π 2 C C C 2 f l z f l z 2 mz mz 2 dz mz mz 2 2 dz 2 z z 2 f l z mz dz mz + 2 z C f l z 2 mz 2 dz mz 2 + 2 2, z 2 where C, C and C 2 are closed contours in the complex plane enclosing the support [ y 2, + y 2 ] of the LSD F y,h, C and C 2 are non-overlapping, and z = m z + y+mz. If the real samples {y i,i =,...,n} are transformed as {QQ /2 y i,i =,...,n}, then the population covariance matrix of QQ /2 y i is the identity matrix I p. Then H n = δ. By Corollary 3.2 and Remark 4, we give below a corollary useful for highdimensional statistical inference. Corollary 3.3. Under the assumptions of Theorem 3., if the population is such that H n = δ, then we have { tr[qq B n ] p,...,tr[qq B n ] L pf L y n } T Nµ,Σ where B n = n n i= y i Ey i y i Ey i T, µ = µ,...,µ L T and Σ = σ ll L l,l =. Moreover, if B n = n n i= y i ȳy i ȳ T with the sample mean ȳ = n n i= y i, then we have { tr[qq B n ] p,...,tr[qq B n ] L pf L y n } T Nµ,Σ. Here, F l y = + y 2 y 2 x l 2πy [+ y 2 x][x y 2 ]dx, l =,...,L, µ l = y 2l ++ y 2l 4 2 l l +β x sgnl 2 l 2 2 l 2 =2 l 2 l y l l =0 l l 2 l y l+ l 2, l =,...,L, and σ ll =

l 2y l+l +yβ x l l l =0 l 2 =0 l 3 = l l y l l l 3 l 2 l l 3 y l l 3 y l l 3 = l +l 2 l l l 3 =0 l 3 2l l l 3 l l l y l l 3, l,l =,...,L. l 3 l 3 2l l 2 +l 3 l In practice, y may be replaced by y n. Corollary 3.3 can be used for many high dimensional testing problems and its proof is given in the appendix. 4 Applications In this section, we propose two applications of Theorem 3.. The first application is about testing the structure of a high-dimensional covariance matrix. The second application applies such structure testing to the analysis of a large fmri dataset. 4. Testing high dimensional covariance structure Testing the structure of high dimensional covariance matrices is a fundamental problem in multivariate and high-dimensional data analysis Srivastava, 2005; Bai et al., 2009; Chen, Zhang and Zhong, 200; Wang and Yao, 203. Most of this literature consider samples of the form y i = Qx i where the dimension k of the x i s is finite. When k = as for time series observations, these existing results are not applicable anymore for structure testing on high dimensional covariance matrices. We consider two testing problems on high dimensional covariance structure. Let Σ 0 be a given covariance matrix. The first testing problem is given by H 0 : QQ = Σ 0. 4. We consider trσ /2 0 B n Σ /2 0 I p 2 as our test statistic, where B n = n n i= y i ȳy i ȳ T. Because Σ /2 0 QQ Σ /2 0 = I p under H 0, H n = δ Dirac mass at. Therefore, it follows from Corollary 3.3 and the delta method that we have under H 0 2 {trσ 0 B n I p 2 py n β x +y n }/ yn 2 +β x +2yn 3 = N0,. 4.2 The second testing problem we consider is about the hypothesis H 02 : QQ = σ 2 Σ 0. 4.3 2

When Σ 0 = I p, this reduces to the well known sphericity test Wang and Yao, 203. We consider tr{σ /2 0 B n Σ /2 0 /[p trσ 0 B n] I p } 2 as our test statistic. Since σ 2 Σ /2 0 Q Q Σ /2 0 = I p under H 02, again H n = δ. By Corollary 3.3 and the delta method, we have under H 02, 2 {tr[σ 0 B n/p trσ 0 B n I p ] 2 py n β x +y n }/y n = N0,. 4.4 Simulation experiments are done to evaluate the finite sample performance of our test for the null H 02. The test size is set at 5% and empirical sizes and powers are obtained using 5000 independent replications. We draw y i according to the AR2 model y ti = φ y t,i + φ 2 y t 2,i + e ti, where the e ti s are i.i.d. as N0,σ 2. We set n {00,200,300} and p {50,00,200,500,000}. Empirical sizes are evaluated for φ {0.3,0.6}, φ 2 {0.2,0.3}, and Σ 0 = {γ i j } p i, with γ 0 =, γ = φ / φ 2 and γ l = φ γ l +φ 2 γ l 2 for l 2. They are given in Table. To evaluate empirical powers, the null hypothesis has φ = φ 2 = 0.8 and the alternative hypothesis has φ {0.3,0.35} and φ 2 {0.2,0.25}. Table 2 presents these empirical powers. Both tables show a very satisfactory finite-sample performance of the proposed test. Table : Empirical test sizes for H 02 in percentage φ = 0.3 φ = 0.6 φ 2 n p =50 00 200 500 000 p=50 00 200 500 000 0.2 00 5.36 5.38 5.38 5.32 5.54 5.2 5.36 5.92 5.46 4.94 200 5.2 5.46 5.22 5.22 5.80 5.76 5.34 5.32 5.4 4.76 300 5.2 4.82 5.0 5.50 4.90 5.44 5.22 5.36 5.52 5.0 0.3 00 5.72 5.28 5.60 5.22 5.90 4.90 5.76 5.42 5.06 5.28 200 4.60 5.2 5.28 5.02 5.6 5.52 5.68 5.48 5.8 5.20 300 5.04 5.56 5.30 5.24 4.88 5.54 5.64 5.48 5.50 5.56 4.2 Application to a fmri dataset We apply the proposed method to a structure testing problem for the resting-state fmri data from the ADHD-200 sample. The whole data was collected from eight sites of the ADHD-200 consortium. We focus on the data from New York University NYU with the largest number of subjects. Among them, we used the 000 ROI extracted time courses 3

Table 2: Empirical powers for H 02 in percentage φ = 0.3 φ = 0.35 φ 2 n p =50 00 200 500 p=50 00 200 500 0.2 00 59.00 6.24 62.80 64.50 98.54 99.36 99.54 00.00 200 97.98 98.90 99.28 00.00 00.00 00.00 00.00 00.00 300 00.00 00.00 00.00 00.00 00.00 00.00 00.00 00.00 0.25 00 94.46 97.8 97.46 98.3 00.00 00.00 00.00 00.00 200 00.00 00.00 00.00 00.00 00.00 00.00 00.00 00.00 300 00.00 00.00 00.00 00.00 00.00 00.00 00.00 00.00 preprocessed by the Neuroimaging Analysis Kit NIAK Ahn, 205. The data set consists of 70 subjects n = 70, each of which contains 924 Regions Of Interest ROIs. Moreover, each ROI contains a time series with 72 time points. At a given ROI, the fmri observations for the ith subject are treated as y i, a time series of length p = 72. The question of interest here is to test whether the standard autoregressive assumption e.g., AR or AR2 is valid for the processed fmri time series across all ROIs Lindqusit, 2008. Applying results derived in the previous section, we first test the AR structure with the hypothesis H 0 : The time series in a specific ROI is an AR process. 4.5 Let φ be the coefficient of the AR process and Σ φ = φ i j p i,. The auto-covariance matrix of the AR process is given by QQ = σ 2 Σ φ and then we have CovΣ /2 φ y i = σ 2 I p, where σ 2 is the variance. Testing the AR assumption of a given ROI is equivalent to testing the sphericity covariance structure of such ROI. Similar to the arguments of Section 4., we can show that 4.4 still holds here. We detail results for ROI 200 and 500. Since φ takes values in the interval,, we set φ be +0.00, +0.002,..., 0.00 and calculated the P-values of the test 4.5 in Figure. It indicates that both sets of P- values are much smaller than 5%, which yields the rejection of the proposed AR structure in both ROIs. Next we test the AR2 hypothesis H 0 : The time series in a given ROI is an AR2 process. 4.6 Let φ and φ 2 be the auto-regressive coefficients of the AR2 model. Then, the autocovariance matrix of the AR2 model is equal to QQ = σ 2 Σ φ,φ 2, where Σ φ,φ 2 = 4

ROI 200 ROI 500 p values 0.0e+00 4.0e 8.0e.2e 0 p values 0.00000 0.00005 0.0000 0.0005.0 0.5 0.0 0.5.0 φ.0 0.5 0.0 0.5.0 φ Figure : P-values of both ROI 200 and 500 for the testing problem 4.5. γ i j p i, with γ 0 =, γ = φ / φ 2, and γ l = φ γ l + φ 2 γ l 2 for l 2 satisfying φ 2 + φ2 2 < and φ 2 + φ <. Then, we have CovΣ /2 φ,φ 2 y i = σ 2 I p. For a specific ROI, testing the AR2 assumption is again equivalent to testing the sphericity covariance structure; in particular 4.4 still holds here. Similar to the previous test, we calculated the P-values of our test statistic for every φ,φ 2 { +0.00, +0.002,..., 0.00} satisfying φ 2 +φ2 2 < and φ 2 + φ < for ROI 200 and 500. As a result, these P-values from both ROI 200 and 500 are much smaller than the nominal level 5% so that the proposed AR2 structure is clearly rejected. In summary, neither of the AR and AR2 structures is suitable for the resting-state fmri data considered here. 5 Proofs of the main theorems 5. Proof of Theorem 2. 5.. Case where Q has an infinite number of columns Consider a sequence of {k p } that satisfies p p i= l=k p+ q2 il = op 2. For simplicity, we write k for k p. If Q is a p dimensional matrix, then we truncate Q as Q = Q, Q where Q is a p k dimensional matrix and Q = q ij is a p dimensional matrix with i =,...,p,j = k+,...,. Similarly, truncate X n as X n = X n, X n where X n is k n 5

dimensional and X n is n dimensional. Then we have L 4 F n QX nx nq,f n Q X n X n Q 2p 2 n 2 trqx n X n Q + Q X n X n Q tr Q X n X n Q wherel,isthelevydistance, F n QX nx n Q istheesdofn QX n X n Q andf n Q X n X n Q is the ESD of n Q Xn X n Q. We have Note that and = pn E pn p 4 n 4 η6 n p 4 pn tr Q X n X n Q = pn p i= E pn p i= l=k+ p i= l=k+ l=k+ l=k+ l=k+ p i= q l 2 + 3η4 n p 4 p i= q 2 il x 2 lj + pn q 2 ik x 2 lj l=k+ p i= = p q 2 il x2 lj E x2 lj 4 q 2 il p q il x lj 2 i= l=k+ 4 E x 8 lj + 3 p 4 n 4 l=k+ k k 2 q ik q ik2 x k,j x k2,j. l=k+ q 2 il = op 2 p i= q 2 il 2 E x 4 q l 2 2 = op 2. 5. These inequalities simply imply pn p n k i= l= q2 il x2 lj = o a.s.. Furthermore, 2 p E q ik q ik2 x k,j x k2,j 2 p 2 q ik q pn p 2 n 2 ik2 i= k k 2 k k 2 i= 2 p 2 n tr Q Q 2 = op 2, 5.2 which implies that pn p n i= k k 2 q ik q ik2 x k,j x k2,j 0, a.s. Then we have pn tr Q X n X n Q = o a.s.. Similarly, wecanprovepn trqx n X n Q = O a.s. andpn tr Q X n X n Q = O a.s.. Then we have L 4 F n QX nx nq,f n Q X n X n Q = o a.s.. Therefore, without loss of generality, we will hereafter assume that the number of columns k of Q is finite. 6 lj 2

5..2 Sketch of the proof of Theorem 2. Silverstein 995 obtained the LSD of the classical sample covariance matrix when the dimension increases proportionally with the sample size. But the sample covariance matrix B n of this paper is different from that of Silverstein 995 because Silverstein 995 assumed that Q is p p and nonnegative definite while this paper assumes Q is p k with k p, even being infinity. In establishing the LSD of B n under the assumptions of Theorem 2., we need the truncation, centralization and rescaling on x ij. Under the assumptions of Theorem 2., the LSD of B n will be the same before or after the truncation, centralization and rescaling. Because the dimension of Q is different from those defined in Silverstein 995, the assumptions and truncations of Theorem 2. are also different from those given in Silverstein 995. The readers are reminded that the corresponding proofs given in Bai and Silverstein 200 and Silverstein 995, strongly depend on a fact that n β jz = trn X n T n X n zi n, where β j z s are defined in Subsection 5..3 and T n = T n in Silverstein 995 while T n = Q Q in the present paper. It may cause some ambiguity when k is infinity in the multiplication of infinite-dimensional matrices, we will avoid to use this fact in this paper. Let m n z = p trb n zi p be the Stieltjes transform of the ESD of B n, where z = u + iv with v > 0. We will show that m n z tends to a non-random limit that satisfies 2.2, where m n z = y n z +y n m n z is in fact the Stieltjes transform of B n = n X n T n X n, where T n = Q Q. The proof will be split into two parts: i m n z Em n z 0 a.s., 5.3 ii Em n z mz satisfying 2.2. 5.4 For truncation and re-normalization see Appendix B, we may further assume x ij < η n n/ qi, where the constant sequence {η n } tends to zero as n. 5..3 Proof of 5.3 Let r j = n /2 Qx j, then B n = n r j r j. Define Dz = B n zi p and D j z = Dz r j r j, β j z = [+r jd j zr j ], γ j z = r jd 2 j zr j β j z. 5.5 Since Iβ j z = vr jd j zd j z r j > v r jd 2, j zr j we obtain γ j z v, 5.6 7

where Iz = v. Moreover, I[zr j D j zr j ] = 2i [zr j D j zr j zr j D j zr j ] = v z 2 zr j D j z zr i r i D j zr j 0, where z and D j z denote the conjugate of z and D j z. Thus we have i j β j z z v. 5.7 Denote the conditional expectation given {r,,r j } by E j and E 0 for the unconditional expectation. Then, we have m n z Em n z = p E j E j trd z = p E j E j [trd z trd j z] = p E j E j γ j z. 5.8 By Lemma A. Burkholder inequality and the inequality 5.6, for any l >, we obtain l 2 E m n z Em n z l K 0 p l E E j E j γ j z 2 K 0 v l p l n l 2 5.9 where K 0 is a constant. Taking l > 2, Chebyshev inequality and 5.9 imply 5.3: m n z Em n z 0 a.s. 5..4 Proof of 5.4 Following the steps of the proof of Theorem. of Bai and Zhou 2008, define K = +y n a n, T n and y n = p/n, where T n = QQ, a n,l = p Etr[T l n D z], l = 0 or. We have K zi D z = n K zi r j r jd z K zi KD z = n K zi r j r j D j zβ j z K zi KD z. 5.0 8

For l = 0,, multiplying both sides by T l n and then taking trace and dividing by p, we have p Etr[T l n K zi ] a n,l = p Er jd j zt l nk zi r j β j z p Etr[T l nk zi KD z].5. One can prove a formula similar to.5 of Bai and Silverstein 2004 and verify that E r j D j zr j n trq D j zq 2 2n 2 EtrQ D j zqq D j zq +n 2 k i= Q D j zq ii 2 E x 4 ij k 2n 2 pv 2 Q 4 +n 2 v 2 q i 4 ηnn/ q 2 i 2 Cηn 2 0. 5.2 By Lemma A.7, one can prove that i= n trq D j zq n trq D zq 2 Kn 2 v 2 and by the similar method of the proof of 5.3, we have E n trq D zq n EtrQ D zq 2 Knv 2. Therefore we have E β j z +y n a n, 2 = o which, applying Lemma A.7 again and 5.7, implies that p = p Er j D j zt l n K zi r j β j z p Etr[T l n K zi KD z] Er j D j zt l n K zi r j +y n a n, p Etr[T l n K zi KD z]+o = pn EtrD j zt l n K zi T n +y n a n, p Etr[T l n K zi KD z]+o = pn EtrD j zt l n K zi K p EtrD zt l n K zi K+o = o. 9 5.3

It then follows from 5. and 5.3 that { a n,l = p tr Tn[ l +yn a n, T n zi ] } +o t l = t+y n a n, z dh nt+o, 5.4 whereh n istheesdoft n. BecauseI[z+y n a n, ] > v,weconcludethat +y n a n, z /v. Taking l = in 5.4 and multiplying both sides by +y n a n,, we obtain From this, one can easily derive that a n, +y na n, = t+y na n, dh t+y na n, z nt+o = +z dh t+y na n, z nt+o = +za n,0 +o. +y n a n, = y n +za n,0 +o = y n [+zem n z]+o. 5.5 Finally, from 5.4 with l = 0, we obtain Em n z = t[ y n +zem n z] z dh nt+o. 5.6 The limiting equation of the above estimate is mz = dht. 5.7 t[ y+zm y z] z It was proved in Silverstein 995 that for each z C + the above equation has a unique solution mz satisfying Im > 0. By this fact, we conclude that Em n z tends to the uniquesolutiontotheequation5.7. Finally, bytherelationmz = yz +ymz, we obtain 5.4. 5.2 Proof of Theorem 3. First, when the number of columns k in Q is infinite we can reduce the situation to the finite k case using arguments analoguous to those developed in Section 5... Therefore, without loss of generality we assume in this section that k is finite. 5.2. Sketch of the proof of Theorem 3. Let M n z = p[m n z m 0 n z] = n[m nz m 0 n z] where m nz is the Stieltjes transform of the ESD of B n, m 0 nz and m 0 nz are the Stieltjes transforms of F yn,hn and F yn,hn 20

satisfying z = m 0 n z +y n tdh n t +tm 0 n z with y n = p/n and H n being the ESD of T n. Let x r be a number greater than + y 2 limsupλ Tn max. Let x l be a number between 0 and y 2 liminf n n λtn min if the latter is greater than 0 and y <. Otherwise, let x l be a negative number. Let η l and η r satisfy x l < η l < y 2 I 0<y< liminf n λtn min < + y 2 limsup n λ Tn max < η r < x r. Let v 0 be any positive number. Define a contour C = {x l +iv : v v 0 } C u C b {x r +iv : v v 0 } where C u = {x+iv 0 : x [x l,x r ]}, C b = {x iv 0 : x [x l,x r ]} 5.8 and C n = {z : z C and Iz > n ǫ n } with ǫ n n α, 0 < α <. Let M n z = p[m n z m F yn,hnz] and M n z, z C n, M n x r +in ǫ n, x = x r, v [0,n ǫ n ], M n z = M n x r in ǫ n, x = x r, v [ n ǫ n,0, M n x l +in ǫ n, x l < 0,x = x l, v [0,n ǫ n ], M n x l in ǫ n, x l < 0,x = x l, v [ n ǫ n,0. Lemma 5.. Under Assumptions b-c-d-e-f, there exists {ǫ n } for which { M n } forms a tight sequence on C. Moreover, Mn converges weakly to a two-dimensional Gaussian process M satisfying for z C y m 3 zt 2 +tmz 3 dht EMz = α x y m 2 zt 2 dht α +tmz 2 x y m 2 zt 2 and variance-covariance function +β x y m 3 zt 2 +tmz 3 dht y m 2 zt 2 +tmz 2 dht, +tmz 2 dht CovMz,Mz 2 = mz [ ] / z mz 2 / z 2 t t mz mz 2 +yβ mz mz 2 2 x mz t+ 2 mz 2 t+ 2dHt z z 2 2 z z 2 2 log az,z 2 z z 2 where az,z 2 = α x + mz mz 2 z z 2. mz 2 mz 2

The proof of Lemma 5. is given in the following Section 5.2.2. We show how Theorem 3. follows from the above Lemma. We use the identity fxdgx = fzmzdz 5.9 2πi valid for c.d.f. G and f analytic on the support of G where mz is the Stieltjes transform of G. The complex integral on the right is over any positively oriented contour enclosing the support of G and on which f is analytic. Choose v 0, x r, and x l so that f,...,f L are all analytic on and inside the contour C. Therefore for any f {f,...,f L }, with probability one fxdg n x = 2πi fzm n zdz for all n large, where the complex integral is over C. Moreover, with probability one, for all n large fzm n z M n zdz 4Kǫ n maxλ Tn max + c n 2,λ Bn max x r C + minλ Tn min I 0,c n c n 2,λ Bn min x l which converges to zero as n. Here K is a bound on f over C. Since M n f z 2πi M n zdz,..., f L z 2πi M n zdz is a continuous mapping of CC,R 2 into R r, it follows that the above vector forms tight sequences. Letting M denote thelimit ofanyweakly converging subsequence of{ M n }, then we have the weak limit equal in distribution to f zmzdz,..., 2πi 2πi f L zmzdz. The fact that this vector, under the assumptions b-c-d-e-f, is multivariate Gaussian following from the fact that Riemann sums corresponding to these integrals are multivariate Gaussian, and that weak limits of Gaussian vectors can only be Gaussian. The limiting expressions for the mean and covariance follow immediately. 5.2.2 Proof of Lemma 5. The proof of Lemma 5. is similar to the proof of Lemma. of Bai and Silverstein 2004 but the proof of the mean and variance-covariance is different from Lemma. of Bai and Silverstein 2004. Let M n z = p[m n z m 0 n z]. In fact, we also have M n z = n[m n z m 0 n z]. Convergence of finite dimensional distributions. Write for z C n, M n z = M n z+m2 n z where M nz = p[m n z Em n z], 22 M 2 nz = p[em n z m 0 nz],

and m 0 nz is the Stieltjes transform of F yn,hn. In this subsection we will show that for any positive integer r and any complex numbers z,,z r, the random vector M l n z j, j =,,r converges to an 2r-dimensional Gaussian vector for l =,2. Because of Assumption e, without loss of generality, we may assume Q for all n. Constants appearing in inequalities will be denoted by K and may take on different values from one expression to the next. Let ǫ j z = r jd j zr j n trt n D j z, δ j z = r jd 2 j zr j n trt n D 2 j z = d dz ǫ jz, 5.20 and Notice that β j z = By 5.22, we obtain +n trt n D j z, b nz = +n EtrT n D z. 5.2 D z = D j z D j zr j r j D j zβ j z. 5.22 p[m n z Em n z] = tr[d z ED z] = tre j D z tre j D z = = tre j [D z D j z] tre j [D z D j z] E j E j β j zr jd 2 j zr j = d E j E j logβ j z dz = d E j E j log +ǫ j z β j z dz where β j z = β j z β j z β j zǫ j z. By Lemma A.2, we have d E E j E j [ log +ǫ j z β j z ǫ j z β j z ] 2 dz [ log +ǫj ζ β j ζ ǫ j ζ β j ζ ] E dζ 2πi ζ z =v/2 z ζ 2 E ǫ 2πv 4 j ζ β j ζ 4 dζ ζ z =v/2 23 2

K 2πn 4 v 4 + k i= ζ z =v/2 } E x ij 8 E q T i D j zq i 4 dζ { E [ trt n D j ζt n D j z ζ ] 2 Kn +Kη 4 n 0. 5.23 Therefore, we need only to derive the finite dimensional limiting distribution of d E j E j ǫ j z β j z = d E j ǫ j z β j z. 5.24 dz dz Similar to the last three lines of the proof of 5.23, one can show that E E j E j d dz ǫ 2 jz β j z I E j E d j dz ǫ jz β j z ǫ d E E ǫ 2 j dz ǫ jz β j z 4 0. Thus, the martingale difference sequence{e j E j d dz ǫ jz β j z} satisfies the Lyapunov condition. Applying Lemma A.4, the random vector M nz,,m nz r will tend to an r-dimensional Gaussian vector Mz,, Mz r whose covariance function is given by CovMz,Mz 2 = lim E j E j ǫ j z β j z E j ǫ j z 2 β j z 2. n z z 2 Consider the sum Γ n z,z 2 = n E j [E j β j z ǫ j z E j β j z 2 ǫ j z 2 ]. Using the same approach of Bai and Silverstein 2004, we may replace β j z by b n z. Therefore, by.5 of Bai and Silverstein 2004, we have Γ n z,z 2 = b n z b n z 2 E j [E j ǫ j z E j ǫ j z 2 ] [ = b n 2 n z b n z 2 tr E j T n D j z E j T n D j z 2 = b n 2 n z b n z 2 tr +α x tre j QQ D j z QQ T E j D T j z 2 ] k +β x q ie j D j z q i q ie j D j z 2 q i i= [ E j T n D j z E j T n D j z 2 24

+α x trt n E j D j z T n E j D T j z 2 ] k +β x q i E jd j z q i q i E jd j z 2 q i i= 5.25 where α x = Ex 2 2 and β x = E x 4 α x 2. Here, the third equality holds if either α x = 0 or Q is real which implies that T n = QQ T = QQ. Now we use the new method to derive the limit of the first term which is different from but easier than that used in Bai and Silverstein 2004. Let v 0 be a lower bound on Iz i. Define r j as an i.i.d. copy of r j, j =,,n and define D j z similar as D j z by using r,,r j, r j+, r n. Then we have [ ] [ ] tr E j T n D j z E j T n D j z 2 = tre j T n D j z T n D j z 2. Similar to 5.5, one can prove that n E j [z trt n D j z z 2 trt n D j z 2 ] z [b z ] z 2 [b z 2 ],a.s. where bz = lim n b n z = zmz. On the other hand, E j n [z trt n D j z z 2 trt n D j z 2 ] [ j = n E j trt n D j z z z 2 r i r i + j = n z z 2 E j r i i= +n i= i=j+ i= i=j+ D ji z 2T n D ji z r i β ji z β ji z 2 ] D z r i r i z 2 r i r i j z 2 [ ] E j z βji z 2 r i D ji z 2T n D j z r i z 2 β ji z r i D j z 2 T n D ji z r i j = n 2 z z 2 E j trt n D ji z 2T n D ji z bz bz 2 = +n 2 i=j+ [ ] E j z bz 2 trt n D ji z 2T n D j z z 2 bz trt n D j z 2 T n D ji z +o a.s. by replacing D ji = D j +D ji r ir i D ji β ji [ j n z z 2 bz bz 2 + n j ] n z bz 2 z 2 bz E j n trt n D j z 2 T n D j z +o a.s.. Comparing the two estimates, we obtain E j n trt n D j z 2 T n D j z = z b z z 2 b z 2 +o a.s. j z n z 2 bz bz 2 + n j z n bz 2 z 2 bz 25

Consequently, we obtain [ ] b n 2 n z b n z 2 tre j T n D j z E j T n D j z 2 az,z 2 az,z 2 0 taz,z 2 dt = log az,z 2 = 0 z dz, where Thus, we have [ ] bz bz 2 z b z z 2 b z 2 az,z 2 = z bz 2 z 2 bz = + bz bz 2 z 2 z z bz 2 z 2 bz = + mz mz 2 z z 2. mz 2 mz az,z 2 2 z 2 z 0 z dz = z az,z 2 z 2 az,z 2 = mz 2 mz z 2 mz z mz 2 z z 2 +mz mz 2 mz 2 mz 2 +mz mz 2 z z 2 mz z mz 2 mz mz mz 2 z 2 z = mz / z + z 2 mz = mz / z mz 2 / z 2 mz 2 mz 2 + mz / z z z 2 mz 2 mz z z 2 2. Next, we compute the limit of the second term of 5.25. In this step, we need the assumption that the matrix Q is real. Similarly, we consider n E j[z trt n D j z z 2 trt n D T j z 2 ] z b z z 2 b z 2,a.s. On the other hand, n E j [z trtd j z z 2 trt D T j z 2 ] = n E j trtd j z [ j z r i r i z 2 r i r T i + z r i r i z 2 r i r T i ] D T j z 2 i= 26 i=j+

j [ = n E j z βji z 2 r i D T ji z 2 T[D ji z D ji z r i r id ji z β ji z ]r i = z 2 r T i i= +n D T ji z 2 β ji z 2 D T ji z 2 r i r T i D T ji z 2 i=j+ E j [ z βji z 2 r i D T ji z 2 TD j z r i z 2 r T i TD ji z β ji z ] r i ] ] D j z 2 TD ji z r i β ji z { } j n α x [ z bz 2 z 2 bz +bz bz 2 z z 2 ]+z bz 2 z 2 bz E j n trtd j z T D T j z 2 +o a.s.. Comparing the two estimates, we obtain = E j n trt D j z 2 TD j z z b z z 2 b z 2 +o a.s. j n α x [ z bz 2 z 2 bz +bz bz 2 z z 2 ]+[z bz 2 z 2 bz ]. Consequently, we obtain where n 2 α x b n z b n z 2 tre j TD j z E j TD T j z 2 ãz,z 2 0 tãz,z 2 dt = log ãz,z 2 = ãz,z 2 = α xbz bz 2 z b z z 2 b z 2 z bz 2 z 2 bz = α x + bz bz 2 z 2 z z bz 2 z 2 bz ãz,z 2 0 z dz, = α x + mz mz 2 z z 2 mz 2 mz Last,wewillcomputethelimitofthethirdtermof5.25. By9.9.2ofBai and Silverstein 200, we have. = k n 2 β x e T i Q D j z Qe i e T i Q D j z 2 Qe i n 2 z z 2 i= k β x e T i Q mz T n +I p Qe i e T i Q mz 2 T n +I p Qe i +o p i= 27

Ifβ x 0,thenbyassumption, thematrixq Qisdiagonal. UsingtheidentityQ [mzt n + I p ] Q = m z{i k [mzq Q+I k ] }, we see that the matrix is diagonal. We have n 2 z z 2 = n 2 z z 2 = yβ x z z 2 k β x e T i Q [mz T n +I p ] Qe i e T i Q [mz 2 T n +I p ] Qe i i= β x tr{q [mz T n +I p ] QQ [mz 2 T n +I p ] Q} t 2 [+tmz ][+tmz 2 ] dht+o. Then the third term of CovMz,Mz 2 is { 2 } yβx bz bz 2 t 2 z z 2 z z 2 [+tmz ][+tmz 2 ] dht 2 } t = {yβ 2 mz mz 2 x z z 2 [+tmz ][+tmz 2 ] dht mz mz 2 t 2 = yβ x z z 2 [+tmz ] 2 [+tmz 2 ] 2dHt. Tightness of Mn z. As done in Bai and Silverstein 2004, the proof of tightness of Mn relies on the proof of sup n,z,z 2 C M n E z M n z 2 E M n z M n z 2 z z 2 = sup E trd z D z 2 EtrD z D z 2 2 n,z,z 2 C Iz i >n ǫn := sup n,z,z 2 C Iz i >n ǫn J n z,z 2 < 5.26 where by the formula 5.22, we have J n z,z 2 = E E j E j trd z D z 2 = + E Ej E j r j D j z D j z 2 D j z r j β j z 2 E E j E j r j D j z 2 D j z D j z 2 r j β j z 2 2 2 28 2

= + + + E Ej E j r j D j z D j z 2 r j r j D j z 2 D j z r j β j z β j z 2 2 E Ej E j r jd j z D j z 2 D j z r j b j z 2 = O. E Ej E j r j D j z 2 D j z D j z 2 r j b j z 2 2 E Ej E j r jd j z D j z 2 r j r jd j z 2 D j z r j b j z b j z 2 2 +O Convergence of M 2 n z. In order to simplify the exposition, we let C = C u or C u C l if x l < 0, and C 2 = C r or C r C l if x l > 0. We begin with proving sup z C n Em n z mz 0 as n. 5.27 Since F B n F y,h almost surely, we get from d.c.t. EF B n F y,h. It is easy to verify that EF B n is a proper c.d.f. Since, as z ranges in C, the functions λ z in λ [0, form a bounded, equicontinuous family, it follows [see, e.g. Billingsley 968, Problem 8, p. 7] that sup z C Em n z mz 0. For z C 2, we write η l,η r defined as in previous section Em n z mz = λ z I [η l,η r]λdef B n λ F c,h λ+e As above, the first term converges uniformly to zero. With l 2, we get sup E λ z I [η l,η r] cλdfb n λ z C 2 ǫ n /n P B n η r or λ Bn min η l Knǫ n n l 0. λ z I [η l,η r] cλdfb n λ. From the fact that F yn,hn F y,h [see Bai and Silverstein 998, below 3.0] along with the fact that C lies outside the support of F c,h, it is straightforward to verify that sup m 0 nz mz 0 as n 5.28 z C where m 0 n z is the Stieltjes transform of Fyn,Hn. We now show that sup n,z C n Em n zt n +I p <. 5.29 29

From Lemma 2. of Bai and Silverstein 998, Em n zt n + I p is bounded by max2,4v 0 on C u. On C l, the boundedness of Em n zt n +I p follows from the fact that Em n zλ j + REm n zλ j + Pλ min B n > η l ǫ n Pλ min B n < η l, where λ j is an arbitrary eigenvalue of T n and Q. Now let us consider the bound on C r. By. of Bai and Silverstein 998, there exists a support point t 0 of H such that +t 0 mz 0 on C r. Since mz is analytic on C r, there exist positive constants δ and µ 0 such that inf +t 0 mz > δ, and sup mz < µ 0. z C r z C r By 5.27 and H n H, for all large n, there exists an integer j n such that λ j t 0 < δ /4µ 0 and sup z Cr Em n z mz < δ /4. Then, we have which completes the proof of 5.29. inf +λ j Em n z > δ /2, z C r Next we show the existence of ξ 0, such that for all n large sup y nem n z 2 t 2 +tem n z 2dH nt < ξ. 5.30 z C n From the identity. of Bai and Silverstein 998 mz = z +y t dht +tmz valid for z = x+iv outside the support of F y,h, we find Imz = v +Imzy z +y t2 dht +tmz 2 t dht 2. +tmz Therefore ymz2 t 2 +tmz 2dHt = y t 2 +tmz 2 dht z +y t dht 2 +tmz Imzy t 2 +tmz 2 dht v +Imzy t 2 +tmz 2 dht 30

= y df y,h x t 2 dht x z 2 +tmz 2 +y df x z y,h x t 2 dht <,5.3 2 +tmz 2 for all z C. By continuity, we have ξ < such that t 2 ymz2 +tmz 2dHt < ξ. 5.32 sup z C Therefore, using 5.27, 5.30 follows. We proceed with some improved bounds on quantities appearing earlier. Recall the functions β j z and γ j z defined in 5.5. For p 4, we have Let M be nonrandom p p. Then E trd zm EtrD zm 2 = E E j trd zm E j trd zm 2 = E = = b n z 2 E γ z p Kn 2. 5.33 E j E j trd z D j zm 2 E E j E j β j zr j D j zmd j zr j 2 = b n z 2 E E j E j β j zγ j zr j D j zmd j zr j 2 E E j r j D j zmd j zr j n trd j zmd j zt n E j E j β j zγ j zr j D j zmd j zr j 2 2 b n z 2 ne r jd j zmd j zr j n trd j zmd j zt n 2 +4 b n z 2 ne β j zγ j zr j D j zmd j zr j 2 2 b n z 2 ne r j D j zmd j zr j n trd j zmd j zt n 2 +4 b n z 2 ne γ j z 4 /2 E β j z 8 /4 E r jd j zmd j zr j 8 /4. Using 9.9.3 of Bai and Silverstein 200, 5.33 and the boundedness of b n z we get E trd zm EtrD zm 2 K M 2. 5.34 The same argument holds for D z so we also have E trd zm EtrD zm 2 K M 2. 5.35 3

Our next task is to investigate the limiting behavior of dh n t n y n +tem n z +zy nem n z [ ] = neβ z r D zem n zt n +I p r n EtrEm n zt n +I p T n D z, for z C n [see 5.2 in Bai and Silverstein 998]. Throughout the following, all bounds, including O and o expressions, and convergence statements hold uniformly for z C n. We have From 5.29 we get Therefore EtrEm n zt n +I p T n D z EtrEm nzt n +I p T n D z 5.36 = Eβ ztrem n zt n +I p T n D zr r D z = b n ze β zγ zr D zem n zt n +I p T n D zr. Eβ zγ zr D zem nzt n +I p T n D zr E γ z 2 /2 E β z 4 /4 E r D zem n zt n +I p T n D zr 4 /4 Kn /2. 5.36 n b n zetrd zem nzt n +I p T n D zt n Kn /2. Since β z = b n z b 2 n zγ z+β zb 2 n zγ2 z, we have neβ zr D zem n zt n +I p r Eβ zetrem n zt n +I p T n D z = b 2 n zneγ zr D zem nzt n +I p r +b 2 nzneβ zγ 2 zr D zem n zt n +I p r Eβ zγ 2 zetrem nzt n +I p T n D z = b 2 n zneγ zr D zem nzt n +I p r +b 2 nze[nβ zγ 2 zr D zem n zt n +I p r β zγ 2 ztrd zem nzt n +I p T n ] +b 2 n zcovβ zγ 2 z,trd zem nzt n +I p T n CovX,Y = EXY EXEY. Using 5.29, 5.33, 5.35 and Lemma A.2, we have E[nβ zγ 2 zr D zem n zt n +I p r β zγ 2 ztrd zem n zt n +I p T p ] 32

ne β z 4 /4 E γ z 4 /2 E r D zem n zt n +I p r and n trd Em nzt n +I p T n 4 /4 Kn /2 Covβ zγ 2 z,trd zem n zt n +I p T p E β z 6 /6 E γ z 6 /3 E trd zem nzt n +I p T n EtrD zem nzt n +I p T n 2 /2 Kn 2/3. Since β z = b n z b n zβ zγ z, then we have Eβ z = b n z+on /2. Write Enγ zr D zem n zt n +I p r = ne[r D zr n trd zt nr D zem nzt n +I p r n trd zem n zt n +I p T n ] +n CovtrD zt n,trd zem nzt n +I p T n. From 5.35 we see that the second term above is On. Therefore, we arrive at dh n t n y n +zy n Em n z +tem n = b 2 n zn EtrD zem nzt n +I p T n D zt n b 2 n zne[r D zr n trd zt nr D zem nzt n +I p r n trd zem n zt n +I p T n ]+o m = b2 nz n β x i= q i D zq i q i D zem nzt n +I q i b2 n z n α xtrt n D T z T n D zem n zt n +I. dh Let A n z = y nt n +zy +tem n z nem n z. Using theidentity Em n z = yn +y z n Em n z, we have dh n t A n z = y n +tem n z y n +zem n z+ = Em n z z Em n z +y tdh n t n. +tem n z It follows that Em n z = tdh z +y nt n +A +tem n z n/em n z. 33

From this, together with the analogous identity [below 4.4] we get Em n z m 0 n z = m 0 nza n y n Em n zm 0 nz t 2 dh nt. 5.37 +tem n z+tm 0 n z We see from 5.30 and the corresponding bound involving m 0 n z, that the denominator of 5.37 is bounded away from zero. First we have α x z 2 m 2 z n 2 m 2 z = α x n EtrmzT n +I p 3 T 2 n z 2 m 4 z { +α x Etr n 2 i j Etr[D z{mzt n +I p } T n D T z T n ] mzt n +I p T n mzt n +I p r i r i n T na ij z mzt n +I p T n A T ij z r i r T i } n T n +o m 2 z = α x n trmzt n +I p 3 T 2 n + α2 x z2 m 4 z {trmzt n 3 n +I p T n } 2 trd zmzt n +I p T n A T j z T n +o. Then we have = = Thus we obtain = α x z 2 m 2 z n 2 α x m 2 z Etr[D z{mzt n +I p } T n D T z T n ] tr{mzt n n +I p 3 T 2 n} αxm2 z tr{mzt n n +I p T n } +o 2 α x y m 2 zt 2 dht +tmz 3 α x y +o. m 2 zt 2 dht +tmz 2 α x z 2 m 2 z n 2 Etr[D z{mzt n +I p } T n D T z T n ] y m 2 zt 2 dht +tmz 2 α x y m 2 zt 2 dht +tmz 3 { y m 2 zt 2 dht }{ α +tmz 2 x y m 2 zt 2 dht } +o. +tmz 2 34

Moreover, if β x = 0 or Q Q is diagonal, then we have where and β x y n z 2 m 3 z pn m i= y m3zt2 dht mzt+ = β 3 x y +o m 2 zt 2 dht +tmz 2 Eq i D z{mzt n +I p } q i q i D zq i y m 2 zt 2 dht +tmz 2 q id z{mzt n +I p } q i q id zq i = e T i Q D z{mzt n +I p } Qe i e T i Q D zqe i = z 2 e T i Q {mzt n +I p } 2 Qe i e T i Q {mzt n +I p } Qe i = z 2 e T i Q {mzt n +I p } QQ QQ {mzt n +I p } Qe i e T i Q {mzt n +I p } Qe i = z 2 e T i Q {mzt n +I p } QQ QQ /2 {mzt n +I p } QQ /2 Qe i That is, e T i Q {mzt n +I p } Qe i, Q QQ /2 mzqq +I p QQ /2 Q = I k [Q Q+m zi k ] Q Q Q mzqq +I p Q = m z{i k [I k +mzq Q] }. Em n z m 0 n z m 0 n = za nz c n Em n zm 0 n z = t 2 dh nt +tem n z+tm 0 n z α x y m 3 zt 2 dht +tmz 3 y m 2 zt 2 dht α +tmz 2 x y m 2 zt 2 dht 5.3 Proof of Corollary 3.3 +tmz 2 +β x y m 3 zt 2 dht mzt+ 3 y. m 2 zt 2 dht +tmz 2 Let ay = y 2 and by = + y 2, then for l {,...,L}, we have EX fl = [ay]l +[by] l 4 2π π 0 +y 2 ycosθ l dθ 35