TESTS FOR LARGE DIMENSIONAL COVARIANCE STRUCTURE BASED ON RAO S SCORE TEST

Size: px

Start display at page:

Download "TESTS FOR LARGE DIMENSIONAL COVARIANCE STRUCTURE BASED ON RAO S SCORE TEST"

Ambrose Blair
5 years ago
Views:

1 Submitted to the Annals of Statistics arxiv:. TESTS FOR LARGE DIMENSIONAL COVARIANCE STRUCTURE BASED ON RAO S SCORE TEST By Dandan Jiang Jilin University arxiv: v1 [stat.me] 11 Oct 15 This paper proposes a new test for covariance matrices structure based on the correction to Rao s score test in large dimensional framework. By generalizing the CLT for the linear spectral statistics of large dimensional sample covariance matrices, the test can be applicable for large dimensional non-gaussian variables in a wider range without the restriction of the 4th moment. Moreover, the amending Rao s score test is also powerful even for the ultra high dimensionality as p n, which breaks the inherent idea that the corrected tests by RMT can be only used when p < n. Finally, we compare the proposed test with other high dimensional covariance structure tests to evaluate their performances through the simulation study. 1. Introduction. Recent advances in data acquisition techniques and the ease of access to high computation power have fueled increased interest in analyzing the data with moderate even large dimensional variables in most sciences, such as microarray gene expressions in biology, where the number of feature variables p greatly exceeds the sample size n. However, the traditional statistical methods encounter failure due to the increase in dimensionality, because they are established on the basis of fixed dimension p as the sample size n tends to infinity. So many efforts have been made to improve the power of the classical statistical methods and to propose new procedures designed for the large dimensional data. A particular attention has been paid to the covariance matrices structure test, which is of fundamental statistical interest and widely used in the biology, finance and etc. Let χ = (x 1,, x n ) is an independent and identically distributed sample from a p dimensional random vector X with mean µ and covariance matrix Σ. To test on the structure of covariance matrices, we consider the hypothesis (1.1) H : Σ = Σ v.s. H 1 : Σ Σ, Supported by Project from NSFC. Primary 6H15; secondary 6H1 Keywords and phrases: Large dimensional data, Covariance structure, Rao s score test, Random matrix theory. 1

2 D. JIANG which covers the identity hypothesis test H : Σ = I p and the sphericity hypothesis test H : Σ = γi p as the special cases. Within this context, it has been well studied under the normal distribution assumption with the classical setting of fixed p, such as [1], [1] and [13]. Also, the Rao s score test was given in [14]. But they all lost their effectiveness as p is a moderate or ultra high dimension, even worse for the non-gaussian variables. Therefore, many statisticians have investigated this problem and provided the various solutions for the large dimensional data setting. The earlier works include [11], [1] and [15], which involved some well-chosen distance function relied on the first and second spectral moments as dimension p and sample size n go to infinity together, whereas they were invalid for either the ultra high dimensionality or non-gaussian variables. Then Bai et al. [] focused on deriving the limiting behavior of the corrected LRT under the large dimensional limiting scheme p/n c [, 1), and Jiang et al. [9] extended it to a wider spread with c [, 1] and p < n. Their methods expanded the application range without distribution assumption, but still not applicable for the case of p > n where the likelihood ratio cannot be well defined. Recently, Chen et al. [7] proposed a nonparametric test with the constrains of uniformly bounded 8th moment and derived its asymptotic distribution under the null hypothesis regardless of the limiting behavior of p/n. Motivated by this, Cai and Ma [6] investigated the high dimensional covariance testing problem from a minimax point of view under the normal assumption. It showed that its power uniformly dominated that of the corrected LRTs over the entire asymptotic regime in which the corrected LRTs were defined. Though it had the optimal power, as seen from our simulation, It failed in empirical sizes when the dimension p was much higher than the sample size n, especially the case of large p small n. In this paper, we proposed a new test for the hypothesis (1.1) by RMT (random matrix theory) based on the aforementioned Rao s score test. The main contributions of this work displayed in several aspects. First, we generalized the CLT(central limit theorem) for the LSS (linear spectral statistic) of large dimensional sample covariance matrices in [4]. By removing the restriction that the 4th moment of the variable is 3 + δ, where δ is a positive constant tending to, we provided an enhanced version of the theorem, which made the test proposed in this work suitable for non-gaussian variables in a wider range. Moreover, our correction based on Rao s score test can be applied to the ultra high dimensionality in despite of the functional relationship between p and n. Although it was derived under the limiting scheme p/(n 1) q [, + ) with unknown mean parameter µ, exactly what we need was just the ratio of p over n in practical problems,

3 LARGE DIMENSIONAL COVARIANCE STRUCTURE TEST. 3 which is always easily acquired under any functional expression of p and n. It can be sustained by the simulation when (p, n) = (4, 19) or (3, 79) and etc., which are close to the pair numbers adopted in [7] by the function p = exp(n.4 ) + 1. It also revealed that whether the corrections by RMT can be used in the case of p > n depends on the corrected statistics we chose rather than the tools we used in RMT. Finally, the restricted condition is relaxed to the finite 4th moment compared with [7], and our correction to Rao s score test has the more accurate sizes and better powers as shown in the simulation study. The remainder of the article is organized as follows. Section gives a quick review of the Rao s Score test, then details their testing statistics for covariance structure tests. An enhanced version of the large dimensional CLT in [4] is also provided in this part. In Section 3, we propose the new testing statistics in large dimensional setting based on the Rao s score test. Simulation results are presented to evaluate the performance of our test compared with other large dimensional covariance matrices tests in Section 4. Then we draw a conclusion in the Section 5, and the proofs and derivations are listed in the Appendix A. Preliminary. We first give a quick review of the Rao s Score test, and derive their classical test statistic for the hypothesis (1.1). Then the test statistic is refined into something precisely needed in the amendment process. An enhanced version of the CLT for LSS of large dimensional sample covariance matrices is also presented, which makes it possible that the modifications of the score tests have a wider use with the 4th moment requirement excluded..1. Rao s Score Test. Let X be a random variable with population distribution F X (x, θ) and density function f X (x, θ), where θ is an unknown parameter. The score vector of X is defined as U(X, θ) = d dθ lnf X(x, θ). Then the information matrix of X is I(X, θ) = E(U(X, θ)u (X, θ)) It is well known that the information matrix can be also calculated by Hessian matrix H(X, θ) as below: I(X, θ) = E(H(X, θ)) = E( d dθ lnf X(x, θ)) Let χ = (x 1,, x n ) denote a sample from the population distribution F X (x, θ). Then the log-likelihood, the score function and the infor-

4 4 D. JIANG mation matrix of the sample are given by l(χ, θ) = n lnf(x i, θ), U(χ, θ) = n U(x i, θ) and I(χ, θ) = ni(x 1, θ), respectively. Then we have the definition of Rao s score test statistic as below: Definition.1. Rao s score test statistic for the hypothesis H : θ = θ is defined by RST(χ, θ ) = U (χ, θ )I(χ, θ ) 1 U(χ, θ ), where θ = (θ 1,, θ p ) is a known vector and RST(χ, θ ) tends to a χ p limiting distribution as n under H. (Rao,1948). To specify the Rao s score test statistic for hypothesis test (1.1), we suppose the sample χ = (x 1,, x n ) follows a normal distribution with mean parameter µ and covariance matrix Σ. Denote θ = (µ, vec(σ) ), where vec( ) is the vectorization operator. First, the logarithm of the density of the sample χ is written as l(χ, θ) = np ln(π) n ln Σ 1 By the definition where d = dθ the sample is d dµ d dvec(σ) (.1) U(χ, θ) =: where (.) ˆµ = 1 n n tr ( Σ 1 (x i µ)(x i µ) ). U(χ, θ) = d l(χ, θ), dθ is a p(p + 1) 1 vector, then the score vector for ( U1 (χ, θ) U (χ, θ) ) ( = n x i and A = 1 n nσ 1 ) ( ˆµ µ) n vec(σ 1 (AΣ 1 I p )) n (x i µ)(x i µ). Derivations of (.1) is specified in the Appendix A.1.

5 LARGE DIMENSIONAL COVARIANCE STRUCTURE TEST. 5 ( ) Secondly, the Hessian matrix H(χ, θ) = d l(χ, θ) =: H11 H 1, dθ H 1 H where the part of the parameter Σ is (.3) H = n = n dvec(σ 1 (AΣ 1 I p )) dvec (Σ) dvec(σ 1 ) dvec(σ 1 AΣ 1 Σ 1 ) dvec (Σ) dvec (Σ 1 ) = n (Σ 1 Σ 1 )(AΣ 1 I p + I p AΣ 1 I p ). Details of derivations for (.3) can be found in the Appendix A.1. Because I(X, θ) = E(H(X, θ)) and E(A) = E[(X µ)(x µ) ] = Σ, where A is defined in (.), then the information matrix where the part for Σ is I(χ, θ) =: ( I11 (χ, θ) I 1 (χ, θ) I 1 (χ, θ) I (χ, θ) I (χ, θ) = n (Σ 1 Σ 1 ) If there are no restrictions on µ, the parameter µ in the score vector is replaced by its maximum likelihood estimator ˆµ. Then the part of the score vector corresponding to µ turns to, and only the second part of the score vector U (χ, θ) and the element I (χ, θ) of the information matrix contribute to the calculation of the Rao s score test statistic.(see [8]). Therefore, the Rao s score test statistic for hypothesis test (1.1) can be calculated by the expressions of U (χ, θ) and I (χ, θ), where µ and A are substituted with sample mean ˆµ and the sample covariance matrix (.4) ˆΣ = 1 n n (x i ˆµ)(x i ˆµ), respectively. Also Σ is instead of Σ under the null hypothesis. Thus, we have Proposition.1. Rao s score test statistic for testing H : Σ = Σ with no constrains on µ has the following form (.5) RST(χ, Σ ) = n tr[(σ 1 Σ I p ) ] where χ = (x 1,, x n ) is a sample from N p (µ, Σ), and the test statistic RST(χ, Σ ) tends to a χ p(p + 1) distribution with freedom degree under H when n. ),

6 6 D. JIANG Proof. RST(χ, Σ ) = n vec (Σ 1 1 ( ˆΣΣ I p ))[ n (Σ 1 Σ 1 n )] 1 vec(σ 1 1 ( ˆΣΣ I p )) = n vec (Σ 1 1 ( ˆΣΣ I p ))vec( ˆΣ Σ ) = n tr[(σ 1 Σ I p ) ] For some special cases are listed in the corollaries as following. Corollary.1. Rao s score test statistic for testing H : Σ = I p with no constrains on µ has the following form RST(χ, I p ) = n tr[( Σ I p ) ] where χ = (x 1,, x n ) is a sample from N p (µ, Σ), and the test statistic RST(χ, I p ) tends to a χ p(p + 1) distribution with freedom degree under H when n. Corollary.. Rao s score test statistic for testing H : Σ = γi p with no constrains on µ has the following form RST(χ, γi p ) = n tr[( p tr( Σ) Σ I p ) ] where χ = (x 1,, x n ) is a sample from N p (µ, Σ), and the test statistic RST(χ, γi p ) tends to a χ p(p + 1) distribution with freedom degree 1 under H when n. Proof. Replace the Σ by ˆγI p according to (.5), where ˆγ = the maximum likelihood estimator of γ. tr( Σ) p is.. CLT for LSS of a large dimensional sample covariance matrix. As seen above, the statistics of Rao s score test for the hypothesis (1.1) can be encoded by the trace function of a matrix, i.e. a function of the eigenvalues of some matrix concerned with the sample covariance matrix. That is exactly what we need in the corrections to the score test for large dimensional cases.

7 LARGE DIMENSIONAL COVARIANCE STRUCTURE TEST. 7 Consequently, a quick survey of the CLT for LSS of a large dimensional sample covariance matrix referred in [4] is presented below, which is a basic tool for improvements on the classical Rao s score test. Because the original version of the theorem has a strict condition that the 4th moment of the variable is 3 + δ, where δ is a positive constant tending to, so we derive an enhanced version excluding this requirement for a more widely usage. Before quoting, we first introduce some basic concepts and notations. Suppose {ξ ki C, i, k = 1,, } be a double array of i.i.d. random variables with mean and variance 1. Then (ξ 1,, ξ n ) is regarded as an i.i.d. sample from some p-dimensional distribution with mean p and covariance matrix I p, where ξ i = (ξ 1i, ξ i,, ξ pi ). So the sample covariance matrix is (.6) S n = 1 n n ξ i ξ i, where we use conjugate transpose for the complex variables instead. For simplicity we use F q, F qn to denote the Marčenko-Pastur law of index q and q n respectively, where q n = p Sn n q [, + ). Fn marks the empirical spectral distribution(esd) of the matrix S n, which is defined as ( where λ Sn i Fn Sn (x) = 1 p p 1 λ Sn i x, x R, ) are the real eigenvalues of the p p square matrix S n. Define f(x)dfn Sn (x) = 1 p p f(λ Sn i ), which is a so-called linear spectral statistic (LSS) of the random matrix S n. Based on this, we consider the empirical process G n := {G n (f)} indexed by A, (.7) G n (f) = p + f(x) [ F Sn n F qn] (dx), f A, where U is an open set of the complex plane including [a(q), b(q)], where a(q) = (1 q) and b(q) = (1+ q) ], and A be the set of analytic functions f : U C. Actually, the contours in U should contain the whole supporting set of the LSD F q. It is known that if q 1, exactly it is [a(q), b(q)]. If q > 1, the contours should enclose the whole support {} [a(q), b(q)], because the F q has a positive mass at the origin at this time. However, due to the exact

8 8 D. JIANG separation theorem in [3], for large enough p and n, the discrete mass at the origin will coincide with that of F q. So we can restrict the integral G n (f) on the contours only enclosed the continuous part of the LSD F q. Define {, if the ξ variables are real, κ = 1, if the ξ variables are complex. Then an enhanced version of Theorem 1.1 in [4] is provided, which will play a fundamental role in next derivations. Lemma.1. Assume: f 1,, f k A, {ξ ij } are i.i.d. random variables, such that Eξ 11 =, E ξ 11 = κ 1, E ξ 11 4 < and the {ξ ij } satisfy the condition 1 np E ξ ij 4 I( ξ ij nη) ij for any fixed η >. Moreover, p n = q n q [, + ) as n, p and E(ξ11 4 ) = β + κ + 1, where β is a constant. Then the random vector (G n (f 1 ),, G n (f k )) forms a tight sequence by the index n, and it weakly converges to a k-dimensional Gaussian vector with mean vector µ(f j ) = κ 1 qm 3 (.8) (z)(1 + m(z)) f j (z) πi [(1 q)m (z) + m(z) + 1] dz + βq m 3 (.9) (z) f j (z) πi (1 + m(z))[(1 q)m (z) + m(z) + 1] dz, and covariance function υ (f j, f l ) = κ f j (z 1 )f l (z ) (.1) 4π (m(z 1 ) m(z )) dm(z 1)dm(z ) βq f j (z 1 )f l (z ) (.11) 4π (1 + m(z 1 )) (1 + m(z )) dm(z 1)dm(z ), where j, l {1,, k}, and m(z) m F q(z) is the Stieltjes Transform of F q (1 q)i [, ) + qf q. The contours all contain the support of F q and non overlapping in both (.1) and (.11). The proof of the Lemma.1 is detailed in Appendix A..

9 LARGE DIMENSIONAL COVARIANCE STRUCTURE TEST The Proposed Testing Statistics. In this part, χ = (x 1,, x n ) remains to be an independent and identically distributed sample from a p dimensional random vector X with mean µ and covariance matrix Σ. For testing the hypothesis H : Σ = Σ, set ξ i = Σ 1 (x i µ), then the array { ξ i },,n contains p-dimensional standardized variables under H. If the mean parameter µ is known, the Lemma.1 can be cited in a direct way because its sample covariance matrix is identical with S n in (.6). However, it shows a slightly difference with unknown µ. By [17], it is reasonable to use n 1 instead of n, if applying the CLT in the Lemma.1 to correct the score test in large dimensional data with the unknown mean parameter. Also, the estimator of covariance matrix in the the corrected statistics should be changed into the unbiased one. Therefore, we define the unbiased sample covariance matrix of { ξ i } as Σ = Σ 1 n ( n 1 Σ)Σ 1, and denote S = Σ 1 ( n n 1 Σ). Because Σ has the same LSD with S n 1 defined in (.6) with n substituted by n 1, so that the matrix S has the same LSD as S n 1 due to the positive definiteness of Σ. Therefore, it is natural to use n 1 instead of n by [17] when the Lemma.1 is applied to amending the score test concerned the eigenvalues of S. Let (3.1) RST(χ, Σ ) = n tr[(s I p) ], then the correction to Rao s score test is hold in the following theorem: Theorem 3.1. Suppose that the conditions of Lemma.1 hold, for hypothesis test H : Σ = Σ, RST(χ, Σ ) is defined as (3.1), set p/(n 1) = q n q [, + ), q n 1 and g(x) = (x 1). Then, under H and when n, the correction to Rao s score test statistics is (3.) [ ] CRST (χ, Σ ) = υ(g) 1 n RST(χ, Σ ) p F qn (g) µ(g) N (, 1), where F qn is the Marčenko-Pastur law of index q n, and F qn (g), µ(g) and υ(g) are calculated in (3.3), (3.5) and (3.6), respectively.

10 1 D. JIANG Proof. By the derivation (3.1), we have n RST(χ, Σ ) = tr[(s I p ) ] = p ( λ S i 1 ) = p (x 1) dfn S (x) = p g(x)d ( F S n (x) F qn (x) ) + p F qn (g), where (λ S i ), i = 1,, p and F n S are the eigenvalues and the ESD of the matrix S, respectively. F qn (g) denotes the integral of the function g(x) by the density corresponding to the Marčenko-Pastur law of index q n, that is (3.3) F qn (g) = g(x)df qn (x) = q n, if q n 1, which is calculated in the Appendix A.3. As the definition in (.7), we have (3.4) G n (g) = p g(x)d ( Fn S (x) F qn (x) ) = n RST(χ, Σ ) p F qn (g). By Lemma.1, G n (g) weakly converges to a Gaussian vector with the mean (3.5) µ(g) = (κ 1)q + βq and variance (3.6) υ(g) = κq (1 + q) + 4βq 3, which are calculated in the Appendix A.3. Then, by (3.4) we arrive at Finally, CRST (χ, Σ ) = υ(g) 1 n RST(χ, Σ ) p F qn (g) N (µ(g), υ(g)), [ ] n RST(χ, Σ ) p F qn (g) µ(g) N (, 1) For the identity and sphericity hypothesis tests, we have the following corollaries:

11 LARGE DIMENSIONAL COVARIANCE STRUCTURE TEST. 11 Corollary 3.1. For testing H : Σ = I p with no constrains on µ, the conclusion of Theorem 3.1 still holds, only with the test statistic RST(χ, I p ) in (3.) is revised by RST(χ, I p ) = n tr[( n n 1 Σ I p ) ]. Corollary 3.. For testing H : Σ = γi p with no constrains on µ, the conclusion of Theorem 3.1 still holds, only with the test statistic RST(χ, I p ) in (3.) is revised by where ˆγ = tr( n n 1 Σ) p RST(χ, γi p ) = n n tr[(ˆγ 1 n 1 Σ I p ) ] is the maximum likelihood estimator of γ. 4. Simulation Study. Simulations are conducted in this section to evaluate the correction to Rao s score test (CRST) that we proposed. To compare the performance, we also present the corresponding simulation results of the test in [7] (SCT), the test in [6] (TCT) and the classical Rao s score test in [14] (RST). We consider the identity hypothesis test H : Σ = I p, and generate i.i.d random samples χ = (x 1,, x n ) from a p-dimensional random vector X following two scenarios of the populations under the null hypothesis, Gaussian Assumption: random vector X follows a p-dimensional normal distribution with mean µ 1 p and covariance matrix I p, where µ = and 1 p denotes a vector with that all elements are 1. Gamma Assumption: random vector X = (X 1,, X p ) and the components are independent and identically distributed as Gamma (4,.5), so that each of the random variables X i also has mean and variance 1. For each set of the scenarios, we report both empirical Type I errors and powers with 1, replications at α =.5 significance level. Different pair values of p, n are selected at a wide rage regardless of the functional expression or limiting behavior between them. The mean parameter is supposed to be unknown and substituted by the sample mean during the calculations. To calculate the empirical powers of the tests, two alternatives are designed in the simulations. In the first alternative, two different sample sets are provided for the corresponding scenarios. For Gaussian assumption,

12 1 D. JIANG the samples are independently generated from the random vector X following the normal distribution with mean vector µ 1 p and covariance matrix Σ = diag( 1 [v p], 1 p [v p]), where µ =, v =. are varying constants and [ ] denotes the integer truncation function. For Gamma assumption, the samples are still randomly selected from the random vector X = (X 1,, X p ) with independent components. Each component of the front part (X 1,, X [v p]) is distributed as Gamma(,1), whereas each of the components in the rest part (X [v p]+1,, X [p v p]) follows Gamma(4,.5), where v =.4. In the second alternative, the samples for Gaussian assumption are independently drawn from the normal distribution with mean vector µ 1 p and covariance matrix Σ = diag((1 + / np) 1 [v p], 1 p [v p]), where µ =, v =.5. The samples for Gamma assumption are followed the distribution of the random vector X = (X 1,, X p ), which satisfied that the components are independent and each component in the front part 4 (X 1,, X [v p]) is distributed as Gamma( 1+/ np, 1+/ np ), whereas each component of the rest part (X [v p]+1,, X [p v p]) follows Gamma(4,.5), where v =.5. Simulation results of empirical Type I errors and powers for the first alternative are listed in the Table 1, and the empirical powers for the second alternative is represented in Table. As seen from the Table 1, the empirical Type I errors of our proposed test CRST for both scenarios are almost around the nominal size 5%, and it converges to the nominal level rapidly as the dimension p approaches infinity, even for small n. Although, the empirical sizes of the proposed CRST is slightly higher for the case of p = 17 or under the Gamma assumption, it can be accepted with comparison to the other tests and be understood due to both asymptotic and nonparametric. For a further comparison, it is limited to several aspects. First, the Rao s score test and our proposed test both give a good performance when p is very small under the normal assumption. However, the empirical sizes of the Rao s score test deviate from the nominal level as p increases, and it shows a even worse result under the Gamma distribution assumption, where the proposed CRST is still active. Another interesting note is that the Rao s score test has a resilient power for the normal cases when p is much higher than n, for example (p = 16, n = 19). Second, for small and moderate dimensions like p = or 4 with higher sample size n = 79 or 159, the empirical Type I errors of the TCT in [6] behave well. However, the TCT leads to a dramatically high empirical size as the dimension p increases much higher, especially for large p, small n such as (p=16,n=19), though it has the optimal powers. Meanwhile, the

13 LARGE DIMENSIONAL COVARIANCE STRUCTURE TEST. 13 Table 1 Empirical sizes and powers(in brackets) of the comparative tests for H : Σ = I p at α =.5 significance level for normal and gamma random vectors with 1, replications. The alternative hypothesis is Σ = diag( 1 [v p], 1 p [v p]) with v =. for Normal variables and v =.4 for Gamma variables. proposed CRST SCT TCT RST p n=19 Normal random vectors: Type I error (Power) (.785).748 (.34).157 (.497).973 (.385).715 (.511).765 (.181).116 (.494).118 (.31) 4.68 (.1638).75 (.146).54 (.345).119 (.487) (.1148).7 (.951).6664 (.7115).17 (.1893) (.1955).718 (.178).9985 (.999).89 (.53) (.791).76 (.146) 1. (1.).419 (.434) Gamma random vectors: Type I error (Power) (.986).94 (.895).193 (.134).189 (.173).93 (.943).913 (.887).145 (.1465).34 (.55) (.1611).788 (.1316).794 (.3584).8 (.363) (.53).745 (.1578).6858 (.7897).98 (.496) (.375).734 (.155).9988 (.9995).1667 (.5511) (.4719).714 (.1497) 1. (1.).184 (.68) p n=39 Normal random vectors: Type I error (Power).687 (.3986).71 (.94).88 (.3168).89 (.4451) (.558).638 (.1771).1179 (.69).143 (.3451) (.448).637 (.168).146 (.644).146 (.337) (.159).6 (.161).591 (.3465).1151 (.581) (.3).63 (.179).653 (.7761).191 (.44) 3.53 (.458).614 (.146).9955 (.9985).83 (.541) Gamma random vectors: Type I error (Power).937 (.945).891 (.91).113 (.13).53 (.316) (.41).741 (.19).1339 (.83).398 (.4734) (.46).79 (.177).1415 (.656).414 (.4649) (.3958).664 (.97).748 (.544).441 (.686) (.541).69 (.5).6615 (.8553).189 (.7913) 3.54 (.731).611 (.).9954 (.9996).1764 (.8957)

14 14 D. JIANG TABLE 1 (cont.) proposed CRST SCT TCT RST p n=79 Normal random vectors: Type I error (Power).64 (.6563).645 (.53).69 (.5353).759 (.6743) (.496).594 (.874).83 (.336).885 (.4841) (.41).568 (.1556).16 (.659).164 (.359) (.38).577 (.1487).158 (.6).17 (.3458) (.545).554 (.68).573 (.5345).1164 (.665) (.7117).564 (.36).6458 (.8616).197 (.831) Gamma random vectors: Type I error (Power).898 (.919).83 (.816).861 (.839).175 (.156) (.363).698 (.946).959 (.3415).384 (.694) (.659).618 (.443).133 (.5758).487 (.8698) (.6473).615 (.49).1337 (.581).494 (.8656) (.88).579 (.4377).668 (.753).487 (.9543) 3.54 (.9493).551 (.4533).654 (.955).31 (.994) p n=159 Normal random vectors: Type I error (Power).685 (.9178).68 (.8598).693 (.8657).713 (.9178) 4.56 (.7159).531 (.5891).616 (.69).69 (.7563) (.435).586 (.836).839 (.361).874 (.538) (.7986).563 (.4415).1198 (.5954).13 (.8869) (.851).541 (.4443).16 (.5989).198 (.89) 3.55 (.953).5 (.469).545 (.7714).119 (.9736) Gamma random vectors: Type I error (Power).76 (.889).76 (.745).766 (.747).1983 (.1) (.69).595 (.5397).78 (.551).1 (.7831) (.9145).68 (.7869).997 (.867).493 (.981) (.9783).594 (.83).1175 (.8963).5 (.999) (.9785).576 (.843).136 (.914).475 (.9997) (.9998).58 (.8566).53 (.9716).673 (1.)

15 LARGE DIMENSIONAL COVARIANCE STRUCTURE TEST. 15 Table Empirical powers of the comparative tests for H : Σ = I p at α =.5 significance level for normal and gamma random vectors with 1, replications. The alternative hypothesis is Σ = diag((1 + / np) 1 [v p], 1 p [v p]) with v =.5. CRST SCT TCT RST CRST SCT TCT RST p proposed p proposed Normal random vectors n=19 n= n=79 n= Gamma random vectors n=19 n= n=79 n=

16 16 D. JIANG proposed CRST remains accurate. Last, compared to the SCT in [7], our proposed CRST have more closer empirical sizes to 5% with growing dimension p, especially for the small sample sizes. Furthermore, the powers of proposed CRST uniformly dominates that of the SCT over the entire range. For example, the powers of proposed CRST rise rapidly up to 1 as p increases in the case of n = 79 under the Gamma assumption, while those of the SCT remains less than.5 even if the sample size is not quite small. Finally, It must be pointed out that the proposed CRST cannot be use for the case q n = 1, but it remains in force even if q = 1, which means the q n could be very close to 1 by two sides. So we choose a different p for each n, which makes q n 1, for example (p=17,n=19) or (p=77,n=79). Also, the cases as (p=,n=19) or (p=8,n=79) are chosen for q n 1 +. As seen from the results, the proposed CRST performs well even if q n 1. Table shows a more apparent comparison advantage under the second alternative. The higher empirical powers of RST and SCT don t make sense because their empirical sizes are much higher. Moreover, the powers of the TCT decline sharply, even near to.1, when the dimension p rises up. Whereas, the proposed CRST gives the powers around.9 at the eligible empirical sizes. For a more intuitive understanding, take the cases (n=39,p=8) and (n=39,p=3) as an example, Figure 1 portrays a dynamic view of the powers for the first alternative under the Gamma assumption by the varying parameter v from to.1. Figure describes the powers for the second alternative under the Gamma assumption by the varying parameter v from to.5. Because v depicts the distance between the null and alternative hypothesis, so the starting point at v = is for the empirical sizes. As shown in the picture, the proposed CRST is a more sensitive and powerful test with the accurate empirical sizes. 5. Conclusion. In this paper, we propose a new testing statistic for the large dimensional covariance structure test based on amending Rao s score tests by RMT. Through generalizing the CLT for LSS of a large dimensional sample covariance matrix in [4], we guarantee the test proposed is feasible for the non-gaussian variables in a wider range. Furthermore, the correction to Rao s score test can be also used in the case of ultra high dimensionality regardless of the functional relationship between p and n. It breaks the inherent thinking that the corrections by RMT are usually practicable when p < n, and shows that it is the corrected statistics we chose to decide whether the corrections by RMT can be used in the case of p > n rather than the

17 LARGE DIMENSIONAL COVARIANCE STRUCTURE TEST. 17 n=39,p=8 n=39,p=3 Powers CRST SCT TCT RST Powers CRST SCT TCT RST Varying Constant v Varying Constant v Fig 1 Empirical sizes and powers of the comparative tests for H : Σ = I p at α =.5 significance level based on 1, independent replications of Gamma Assumption. The null and alternative hypothesis are Σ = diag( 1 [v p], 1 p [v p]) with v varied from to.1. Left: n = 39, p = 8; Right: n = 39, p = 3. n=39,p=8 n=39,p=3 Powers CRST SCT TCT RST Powers CRST SCT TCT RST Varying Constant v Varying Constant v Fig Empirical sizes and powers of the comparative tests for H : Σ = I p at α =.5 significance level based on 1, independent replications of Gamma Assumption. The null and alternative hypothesis are Σ = diag((1 + / np) 1 [v p], 1 p [v p]) with v varied from to.5. Left: n = 39, p = 8; Right: n = 39, p = 3.

18 18 D. JIANG tools we used in RMT. So we believe that large dimensional spectral analysis in RMT will have more application fields in light of different situations. APPENDIX A: DERIVATIONS AND PROOFS. A.1. Proofs of the derivation in (.1) and (.3). The logarithm of the density of the sample χ is written as l(χ, θ) = np ln(π) n ln Σ 1 = np ln(π) n ln Σ 1 n (x i µ) Σ 1 (x i µ) n tr ( Σ 1 (x i µ)(x i µ) ), and θ is denoted as (µ, vec(σ) ). For the first part of (.1), by the formula dx BX dx = (B + B )X, where the X is a vector and B is a matrix dependent on X, we have dl(χ, θ) dµ = 1 = 1 n d(x i µ) Σ 1 (x i µ) dµ n Σ 1 (x i µ) = nσ 1 ( ˆµ µ) where ˆµ = 1 n n x i. For the second part of (.1), by the following formulas d ln X dx = vec((x 1 ) ) dtr(b X) dx dx 1 dx = ((X 1 ) X 1 ) = vec(b) (B C)vec(D) = vec(cdb) where X, B, C, D are all matrices. Then we have dl(χ, θ) dσ = n ( d ln Σ dσ 1 dσ 1 dtr Σ 1 dσ = n vec(σ 1 ) + n (Σ 1 Σ 1 )vec(a) = n vec(σ 1 (AΣ 1 I p )) n (x i µ)(x i µ) ) dσ 1

19 where A = 1 n LARGE DIMENSIONAL COVARIANCE STRUCTURE TEST. 19 n (x i µ)(x i µ). Thus ( ) dl(χ, θ) dl(χ, θ) dvec(σ) = vec = n dσ vec(σ 1 (AΣ 1 I p )). Therefore, the score vector for the sample is ( ) ( U1 (χ, θ) nσ 1 ( ˆµ µ) U(χ, θ) =: = n U (χ, θ) vec(σ 1 (AΣ 1 I p )) Next consider the derivation of (.3). By the definitions of the Hessian matrix and score vector, we have ( ) H(χ, θ) = d du(χ, θ) H11 H l(χ, θ) = dθ dθ =: 1, H 1 H where the part of the parameter Σ is Since and H = n = n dvec(σ 1 (AΣ 1 I p )) dvec (Σ) dvec(σ 1 ) dvec(σ 1 AΣ 1 Σ 1 ) dvec (Σ) dvec (Σ 1 ) = n (Σ 1 Σ 1 )(AΣ 1 I p + I p AΣ 1 I p ). d(vec(σ 1 AΣ 1 ) vec(σ 1 )) dvec (Σ 1 ) = d(aσ 1 I p )vec(σ 1 ) dvec (Σ 1 ) = dvec(σ 1 ) dvec (Σ 1 ) (AΣ 1 I p ) + d(aσ 1 I p ) dvec (Σ 1 ) vec(σ 1 ) I p = (AΣ 1 I p ) + dvec(aσ 1 ) dvec (Σ 1 ) (I p Σ 1 ) I p = (AΣ 1 I p ) + (I p A)(I p Σ 1 ) I p = (AΣ 1 I p + I p AΣ 1 I p ). dvec(σ 1 ) dvec (Σ) = (Σ 1 I p )(I p Σ 1 ) = (Σ 1 Σ 1 ), ) dvec(σ 1 ) dvec (Σ 1 ) where the formulas (B C)vec(D) = vec(cdb) and (B C)(D E) = (BD CE) are repeatedly used.

20 D. JIANG A.. Proofs of Lemma.1. First, the result of (.8) and (.1) is corresponding to the ones in [4] with the 4th moment equal to 3. Obviously, the mean in (.8) is formed under the condition that the matrix T in [4] is identity, and its LSD is H(t) = I [1, ) (t) according to the assumptions in Lemma.1. Next, If we drop the condition on the 4th moment, it will be found that each of the (4.1) and (.7) in [4] should be plused an additional item by their (1.15) ) βqb p(z) E (e 1T 1 D 1 T 1 e1 e 1T 1 D 1 (m(z)t + I) 1 T 1 e1 and βb p (z 1 )b p (z ) n respectively, where n p j=1 e it 1 Ej (D 1 j (z 1 ))T 1 ei e it 1 Ej (D 1 (z ))T 1 ei e i = (,, 1,,, ) ; D(z) = T 1 Sn T 1 zi; }{{} i 1 D j (z) = D(z) r j rj ; r j = 1 T 1 1 ξ j b p (z) = n 1 + n 1 E trt D1 1 and E j is the conditional expectation given r 1,, r j for j = 1,, n. According to the Lemma 6. in [16], if the 4th moment is arbitrary finite number, the mean function of M(z) in Lemma 1.1 in [4] should be added βqm3 (z) t 1+tm(z) dh(t) 1 dh(t) (1+tm(z)) 1 q t m (z) dh(t) (1+tm(z)) which is the limit of βqm(z)b p(z) E (e 1 T 1 D 1 T 1 e 1 e 1 T 1 D 1 (m(z)t + I) 1 T 1 e 1 ) 1 q t m (z) (1+tm(z)) dh(t) ever dropped in (4.1) and (4.1) of [4]. Similarly, the covariance function of M(z) should include the additional item tm (z 1 ) βq (1 + tm(z 1 )) dh(t) tm (z ) (1 + tm(z )) dh(t) which is the limit of βb p(z 1 )b p (z ) z 1 z n n p j=1 e it 1 Ej (D 1 j j (z 1 ))T 1 ei e it 1 Ej (D 1 (z ))T 1 ei j

21 LARGE DIMENSIONAL COVARIANCE STRUCTURE TEST. 1 ever dropped in (.7) of [4]. Then by their (1.14), the added mean function of G n (f j ) should be βq (A.1) f j (z) m3 (z) t 1+tm(z) dh(t) 1 dh(t) (1+tm(z)) πi 1 q dz t m (z) dh(t) (1+tm(z)) and the covariance function of G n (f j ) should plus (A.) βq tm (z 1 ) 4π f j (z 1 )f l (z ) (1 + tm(z 1 )) dh(t) tm (z ) (1 + tm(z )) dh(t) dz 1dz Put the condition H(t) = I [1, ) (t) assuming in Lemma.1 into the equation (A.1) and (A.), then we have the additional mean function βq m 3 (z) f j (z) πi (1 + m(z))[(1 q)m (z) + m(z) + 1] dz, and the added covariance function βq f j (z 1 )f l (z ) 4π (1 + m(z 1 )) (1 + m(z )) dm(z 1)dm(z ), where j, l {1,, k}. A.3. test. Proofs of limiting schemes for the correction to Rao s score Calculation of F qn (g) in (3.3). Because F qn (g) = g(x)df qn (x), where F qn (x) is the Marčenko- Pastur law of the matrix S with index q n, the density is p qn (x) = 1 (bn x)(x a n ), πxq n if a n x b n,, otherwise, and has a point mass 1 1 at the origin if q n > 1, where a n = q n (1 q n ) and b n = (1 + q n ). (See in [5]). According the the definition, the supporting set of MP-law is x [, 4] if q n = 1. But it is unreasonable that x lies on the denominator if x = by the expression of the density. So we exclude the case of q n = 1 and consider the following integral first, F qn (g) = (x 1) (bn x)(x a n )dx πxq n

22 D. JIANG Make a substitution x = 1 + q n q n cos θ, where θ π, then F qn (g) = = π = 1 π bn (x 1) (bn x)(x a n )dx a n πxq n π (q n q n cos θ) 1 + q n q n cos θ π sin θ sin θdθ (q n q n cos θ) sin θ 1 + q n dθ q n cos θ Let x = 1+q n q n cos θ = q n (cos θ+d ), where d = 1 + q n is q n a constant. Thus, the above integral F qn (g) is obtained by the partition into three parts as below : 1 π = 1 π = 1 π π = 1 π { = π π (q n q n cos θ) sin θ 1 + q n dθ q n cos θ [ q n (cos θ + d ) 1] sin θ dθ q n (cos θ + d ) [ q n cos θ sin θ ( q n d + ) sin θ [ + (q n 1)π + π q n (1 + q n 1 q n ) q n, if q n < 1; q n 1 + 1/q n, if q n > 1, ] sin ] θ dθ q n (cos θ + d ) where the third part is calculated by the following integral, which is also used in other calculations. (A.3) π π = = = π 1 dθ = cos θ + d π 1 cos θ sin θ + d dθ d θ (1 tan θ + d sec θ ) cos θ d tan θ d 1 = π d 1 d (d 1) tan θ π ( ) 1 d 1 ( ) d 1 + d 1 d +1 tan θ d + 1 tan θ

23 LARGE DIMENSIONAL COVARIANCE STRUCTURE TEST. 3 and the third part of the limiting integral F qn (g) is (A.4) π = 1 q n = 1 q n sin θ q n (cos θ + d ) dθ π π 1 cos θ cos θ + d dθ = π q n (1 + q n 1 q n ) ( cos θ + d + 1 d cos θ + d )dθ Because the density corresponding to F qn (x) has a point mass 1 1 q n at the origin if q n > 1, then the F qn (g) should be added the term (1 ) (1 1 q n ) if q n > 1. Then we arrive at F qn (g) = q n, if q n < 1 & q n > 1. Calculation of µ(g) in (3.5). By (9.1.13) in Bai and Silverstein [5], with H(t) = I [1, (t), the first part of the limiting mean µ(g) in (.8) can also be expressed as ( g (a(q)) + g (b(q)) µ 1 (g) = (κ 1) 1 ) b(q) g(x) 4 π dx a(q) 4q (x 1 q) where a(q) = (1 q) and b(q) = (1 + q). For g(x) = (x 1), make a substitution x = 1 + q q cos θ, θ π, then ( g (a(q)) + g (b(q)) µ 1 (g) = (κ 1) 4 1 π g(1 + q ) q cos θ) π q sin θdθ 4q ( q cos θ) ( g (a(q)) + g (b(q)) = (κ 1) ( 4q + q = (κ 1) = (κ 1)q 4 1 4π π 1 π g(1 + q ) q cos θ)dθ 4π ) (q 4q 3 cos θ + 4q cos θ)dθ where κ = if the variables are real, and κ = 1 if the variables are complex.

24 4 D. JIANG The second part of the limiting mean µ(g) is obtained by (.9) µ (g) = βq (1 z) m 3 (z) πi (1 + m(z))[(1 q)m (z) + m(z) + 1] dz, For z C +, recall the equation (9.1.1) given in [4] z = 1 m(z) + q 1 + m(z). Denote m(z) as m for simplicity, it is easily obtained that (1 z) = [m (q )m + 1] m (1 + m) dz = (1 q)m + m + 1 m (1 + m) dm then we have µ (g) = βq [m (q )m + 1] πi m(1 + m) 5 dm, By solving for m from (9.1.1) in [5], we get the contour for the integral above should enclose the interval [ min( 1 1 q, ), max( q 1 q, 1 ] 1 + q ). Therefore, -1 is the residue if q 1 and is the residue if q > 1. and the integral is calculated as µ (g) = βq, which is the same result for both the cases of q 1 and q > 1. Finally, we obtained Calculation of υ(g) in (3.6). µ(g) = (κ 1)q + βq. By Lemma.1, the first part of limiting variance υ(g) in (.1) is υ 1 (g) = κ g(z 1 )g(z ) 4π (m(z 1 ) m(z )) dm(z 1)dm(z ) and g(z 1 )g(z ) = (z 1 1) (z 1) = 1 z 1 z + z 1 + z + 4z 1 z z 1z z 1 z + z 1z.

25 LARGE DIMENSIONAL COVARIANCE STRUCTURE TEST. 5 Let 1 denote constant function which equals to 1, It is obvious that υ(1, 1) =. Denoting m(z i ) = m i, i = 1,. As mentioned above, for fixed m, we have a contour enclosed -1, but not when q 1, whereas it enclosed, but not -1 when q > 1. On one hand, we consider the case of q 1. Because z 1 (m 1 m ) dm 1 1 = q ( + 1 q ) (1 + m 1 ) j (m + 1) l( m m 1 q m + 1 )l 1 dm 1 and (A.5) = πi q (1 + m ). z j= (m 1 m ) dm 1 ( = 1 ) ( q 1 + m 1 + m [ = 4πi m q (1 + m 1 ) + q (1 + m 1 ) 3 ]. ) q (1 + m ) l=1 1 m m 1 dm So υ(z 1 z 1, 1) =. Similarly, υ(1, z z ) =. Therefore, there are only four parts left, i.e. z 1 z z 1 z z 1 z +4z 1z. Further, υ(z 1, z ) = κq πi = κq 1 (m + 1) ( q 1 + m q ) (1 + m ) j dm j= υ(z1, z ) = κ 4π = κq ( 1 m 1 + πi z 1 z (m 1 m ) dm 1dm ) q 1+m 1 (1 + m 1 ) dm 1 = κq ( 1 ) ( q 1 + πi m m 1 = κq(1 + q). m 1 ) q 1 (1 + m 1 ) dm m 1

26 6 D. JIANG Similarly, υ(z 1, z ) = κq(1+q). For the last part υ(z 1, z ), the integral is calculated by eq.(a.5) as below. υ(z1, z) = κ z1 z = κq ( πi = κq πi + κq πi 4π (m 1 m ) dm 1dm 1 ) q [ ] 1 + m m 1 (1 + m 1 ) + q (1 + m 1 ) 3 dm 1 ( 1 ) ( ) q 1 q 1 + m m 1 m 1 (1 + m 1 ) dm m 1 ( 1 ) ( ) [ q 1 q 1 + m m 1 (1 + m 1 ) (1 + m 1 ) = κ(4q + 1q + 4q 3 ) Finally, we obtain m 1 υ 1 (g) = υ(z 1, z ) υ(z 1, z ) υ(z 1, z ) + 4υ(z 1, z ) = κ(4q + 1q + 4q 3 ) 8κq(1 + q) + 4κq = κq (1 + q). when q 1. On the other hand, Similar calculations are conducted for the case of q > 1. It is found that the result is the same as the one of the case above only with the residues are changed from -1 to. So for all the cases of q, we arrive at υ 1 (g) = κq (1 + q) For the second part of υ(g) in (.11), we have υ (g) = βq g(z 1 )g(z ) 4π (1 + m 1 ) (1 + m ) dm 1dm. Furthermore, g(z 1 ) (1 + m 1 ) dm 1 = [m 1 (q )m 1 + 1] m 1 (1 + m 1) 4 dm 1 = 4πiq Since the contour contains -1 as a residue if q 1, and enclose as a residue for the other case q > 1. By the calculations of the both cases, it will be found that the results are all the same although the residues are different. Thus we get υ (g) = βq 4π ( 4πiq) ( 4πiq) = 4βq3. ] dm 1

27 LARGE DIMENSIONAL COVARIANCE STRUCTURE TEST. 7 Finally, we obtained υ(g) = κq (1 + q) + 4βq 3. ACKNOWLEDGEMENT The author thanks the reviewers for their helpful comments and suggestions to make an improvement of this article. This research was supported by the National Natural Science Foundation of China REFERENCES [1] Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis. Second Edition. John Wiley & Sons. [] Bai, Z.D., Jiang, D., Yao, J. F. and Zheng, S. (9) Corrections to LRT on large dimensional covariance matrix by RMT. Ann. Statist. 37, No.6B: [3] Bai, Z. D. and Silverstein, J. W. (1999). Exact Separation of Eigenvalues of Large Dimensional Sample Covariance Matrices. Ann.Probab. 7, No.3, [4] Bai, Z. D. and Silverstein, J. W. (4). CLT for linear spectral statistics of large dimensional sample covariance matrices. Ann.Probab. 3, [5] Bai, Z. D. and Silverstein, J. W. (1). Spectral analysis of large dimensional random matrices. nd ed. Beijing: Science Press. [6] Cai, T.T. and Ma, Z. (13). Optimal hypothesis testing for high dimensional covariance matrices. Bernoulli. 19 (5B), [7] Chen, S.X., Zhang, L.X. and Zhong, P.S. (1). Tests for high-dimensional covariance matrices. J. Amer. Statist. Assoc. 15, [8] Gombay, E. (). Parametric sequential tests in the presence of nuisance parameters. Theory Stochastic Progresses. 8, [9] Jiang, D., Jiang, T. and Yang, F. (1). Likelihood ratio tests for covariance matrices of high-dimensional normal distributions. J. Statist. Plann. Inference. 14, [1] John, S. (1971). Some Optimal Multivariate Tests. Biometrika. 59, [11] Johnstone, I.M. (1). On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist. 9, [1] Ledoit, O. and Wolf, M. () Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size. Ann. Statist. 3, [13] Nagao, H. (1973). On some test criteria for covariance matrix. Ann. Statist. 1, [14] Rao, C.R. (1948). Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Mathematical Proceedings of the Cambridge Philosophical Society. 44, [15] Srivastava, M.S. (5). Some tests concerning the covariance matrix in high dimensional data. J. Japan Statist. Soc. 35, [16] Zheng, S. (1). Central limit theorems for linear spectral statistics of large dimensional F -Matrices. Annales de 1 Institut Henri Poincaré-Probabilités et Statistiques. 48, No., [17] Zheng, S., Bai, Z.D. and Yao, J. F. (15). Substitution principle for CLT of linear spectral statistics of high-dimensional sample covariance matrices with applications to hypothesis testing. Ann. Statist. 43, No.,

28 8 D. JIANG School of Mathematics, Jilin University, 699 QianJin Street, Changchun 131, China.

On corrections of classical multivariate tests for high-dimensional data

On corrections of classical multivariate tests for high-dimensional data Jian-feng Yao with Zhidong Bai, Dandan Jiang, Shurong Zheng Overview Introduction High-dimensional data and new challenge in statistics