TESTS FOR LARGE DIMENSIONAL COVARIANCE STRUCTURE BASED ON RAO S SCORE TEST

Size: px
Start display at page:

Download "TESTS FOR LARGE DIMENSIONAL COVARIANCE STRUCTURE BASED ON RAO S SCORE TEST"

Transcription

1 Submitted to the Annals of Statistics arxiv:. TESTS FOR LARGE DIMENSIONAL COVARIANCE STRUCTURE BASED ON RAO S SCORE TEST By Dandan Jiang Jilin University arxiv: v1 [stat.me] 11 Oct 15 This paper proposes a new test for covariance matrices structure based on the correction to Rao s score test in large dimensional framework. By generalizing the CLT for the linear spectral statistics of large dimensional sample covariance matrices, the test can be applicable for large dimensional non-gaussian variables in a wider range without the restriction of the 4th moment. Moreover, the amending Rao s score test is also powerful even for the ultra high dimensionality as p n, which breaks the inherent idea that the corrected tests by RMT can be only used when p < n. Finally, we compare the proposed test with other high dimensional covariance structure tests to evaluate their performances through the simulation study. 1. Introduction. Recent advances in data acquisition techniques and the ease of access to high computation power have fueled increased interest in analyzing the data with moderate even large dimensional variables in most sciences, such as microarray gene expressions in biology, where the number of feature variables p greatly exceeds the sample size n. However, the traditional statistical methods encounter failure due to the increase in dimensionality, because they are established on the basis of fixed dimension p as the sample size n tends to infinity. So many efforts have been made to improve the power of the classical statistical methods and to propose new procedures designed for the large dimensional data. A particular attention has been paid to the covariance matrices structure test, which is of fundamental statistical interest and widely used in the biology, finance and etc. Let χ = (x 1,, x n ) is an independent and identically distributed sample from a p dimensional random vector X with mean µ and covariance matrix Σ. To test on the structure of covariance matrices, we consider the hypothesis (1.1) H : Σ = Σ v.s. H 1 : Σ Σ, Supported by Project from NSFC. Primary 6H15; secondary 6H1 Keywords and phrases: Large dimensional data, Covariance structure, Rao s score test, Random matrix theory. 1

2 D. JIANG which covers the identity hypothesis test H : Σ = I p and the sphericity hypothesis test H : Σ = γi p as the special cases. Within this context, it has been well studied under the normal distribution assumption with the classical setting of fixed p, such as [1], [1] and [13]. Also, the Rao s score test was given in [14]. But they all lost their effectiveness as p is a moderate or ultra high dimension, even worse for the non-gaussian variables. Therefore, many statisticians have investigated this problem and provided the various solutions for the large dimensional data setting. The earlier works include [11], [1] and [15], which involved some well-chosen distance function relied on the first and second spectral moments as dimension p and sample size n go to infinity together, whereas they were invalid for either the ultra high dimensionality or non-gaussian variables. Then Bai et al. [] focused on deriving the limiting behavior of the corrected LRT under the large dimensional limiting scheme p/n c [, 1), and Jiang et al. [9] extended it to a wider spread with c [, 1] and p < n. Their methods expanded the application range without distribution assumption, but still not applicable for the case of p > n where the likelihood ratio cannot be well defined. Recently, Chen et al. [7] proposed a nonparametric test with the constrains of uniformly bounded 8th moment and derived its asymptotic distribution under the null hypothesis regardless of the limiting behavior of p/n. Motivated by this, Cai and Ma [6] investigated the high dimensional covariance testing problem from a minimax point of view under the normal assumption. It showed that its power uniformly dominated that of the corrected LRTs over the entire asymptotic regime in which the corrected LRTs were defined. Though it had the optimal power, as seen from our simulation, It failed in empirical sizes when the dimension p was much higher than the sample size n, especially the case of large p small n. In this paper, we proposed a new test for the hypothesis (1.1) by RMT (random matrix theory) based on the aforementioned Rao s score test. The main contributions of this work displayed in several aspects. First, we generalized the CLT(central limit theorem) for the LSS (linear spectral statistic) of large dimensional sample covariance matrices in [4]. By removing the restriction that the 4th moment of the variable is 3 + δ, where δ is a positive constant tending to, we provided an enhanced version of the theorem, which made the test proposed in this work suitable for non-gaussian variables in a wider range. Moreover, our correction based on Rao s score test can be applied to the ultra high dimensionality in despite of the functional relationship between p and n. Although it was derived under the limiting scheme p/(n 1) q [, + ) with unknown mean parameter µ, exactly what we need was just the ratio of p over n in practical problems,

3 LARGE DIMENSIONAL COVARIANCE STRUCTURE TEST. 3 which is always easily acquired under any functional expression of p and n. It can be sustained by the simulation when (p, n) = (4, 19) or (3, 79) and etc., which are close to the pair numbers adopted in [7] by the function p = exp(n.4 ) + 1. It also revealed that whether the corrections by RMT can be used in the case of p > n depends on the corrected statistics we chose rather than the tools we used in RMT. Finally, the restricted condition is relaxed to the finite 4th moment compared with [7], and our correction to Rao s score test has the more accurate sizes and better powers as shown in the simulation study. The remainder of the article is organized as follows. Section gives a quick review of the Rao s Score test, then details their testing statistics for covariance structure tests. An enhanced version of the large dimensional CLT in [4] is also provided in this part. In Section 3, we propose the new testing statistics in large dimensional setting based on the Rao s score test. Simulation results are presented to evaluate the performance of our test compared with other large dimensional covariance matrices tests in Section 4. Then we draw a conclusion in the Section 5, and the proofs and derivations are listed in the Appendix A. Preliminary. We first give a quick review of the Rao s Score test, and derive their classical test statistic for the hypothesis (1.1). Then the test statistic is refined into something precisely needed in the amendment process. An enhanced version of the CLT for LSS of large dimensional sample covariance matrices is also presented, which makes it possible that the modifications of the score tests have a wider use with the 4th moment requirement excluded..1. Rao s Score Test. Let X be a random variable with population distribution F X (x, θ) and density function f X (x, θ), where θ is an unknown parameter. The score vector of X is defined as U(X, θ) = d dθ lnf X(x, θ). Then the information matrix of X is I(X, θ) = E(U(X, θ)u (X, θ)) It is well known that the information matrix can be also calculated by Hessian matrix H(X, θ) as below: I(X, θ) = E(H(X, θ)) = E( d dθ lnf X(x, θ)) Let χ = (x 1,, x n ) denote a sample from the population distribution F X (x, θ). Then the log-likelihood, the score function and the infor-

4 4 D. JIANG mation matrix of the sample are given by l(χ, θ) = n lnf(x i, θ), U(χ, θ) = n U(x i, θ) and I(χ, θ) = ni(x 1, θ), respectively. Then we have the definition of Rao s score test statistic as below: Definition.1. Rao s score test statistic for the hypothesis H : θ = θ is defined by RST(χ, θ ) = U (χ, θ )I(χ, θ ) 1 U(χ, θ ), where θ = (θ 1,, θ p ) is a known vector and RST(χ, θ ) tends to a χ p limiting distribution as n under H. (Rao,1948). To specify the Rao s score test statistic for hypothesis test (1.1), we suppose the sample χ = (x 1,, x n ) follows a normal distribution with mean parameter µ and covariance matrix Σ. Denote θ = (µ, vec(σ) ), where vec( ) is the vectorization operator. First, the logarithm of the density of the sample χ is written as l(χ, θ) = np ln(π) n ln Σ 1 By the definition where d = dθ the sample is d dµ d dvec(σ) (.1) U(χ, θ) =: where (.) ˆµ = 1 n n tr ( Σ 1 (x i µ)(x i µ) ). U(χ, θ) = d l(χ, θ), dθ is a p(p + 1) 1 vector, then the score vector for ( U1 (χ, θ) U (χ, θ) ) ( = n x i and A = 1 n nσ 1 ) ( ˆµ µ) n vec(σ 1 (AΣ 1 I p )) n (x i µ)(x i µ). Derivations of (.1) is specified in the Appendix A.1.

5 LARGE DIMENSIONAL COVARIANCE STRUCTURE TEST. 5 ( ) Secondly, the Hessian matrix H(χ, θ) = d l(χ, θ) =: H11 H 1, dθ H 1 H where the part of the parameter Σ is (.3) H = n = n dvec(σ 1 (AΣ 1 I p )) dvec (Σ) dvec(σ 1 ) dvec(σ 1 AΣ 1 Σ 1 ) dvec (Σ) dvec (Σ 1 ) = n (Σ 1 Σ 1 )(AΣ 1 I p + I p AΣ 1 I p ). Details of derivations for (.3) can be found in the Appendix A.1. Because I(X, θ) = E(H(X, θ)) and E(A) = E[(X µ)(x µ) ] = Σ, where A is defined in (.), then the information matrix where the part for Σ is I(χ, θ) =: ( I11 (χ, θ) I 1 (χ, θ) I 1 (χ, θ) I (χ, θ) I (χ, θ) = n (Σ 1 Σ 1 ) If there are no restrictions on µ, the parameter µ in the score vector is replaced by its maximum likelihood estimator ˆµ. Then the part of the score vector corresponding to µ turns to, and only the second part of the score vector U (χ, θ) and the element I (χ, θ) of the information matrix contribute to the calculation of the Rao s score test statistic.(see [8]). Therefore, the Rao s score test statistic for hypothesis test (1.1) can be calculated by the expressions of U (χ, θ) and I (χ, θ), where µ and A are substituted with sample mean ˆµ and the sample covariance matrix (.4) ˆΣ = 1 n n (x i ˆµ)(x i ˆµ), respectively. Also Σ is instead of Σ under the null hypothesis. Thus, we have Proposition.1. Rao s score test statistic for testing H : Σ = Σ with no constrains on µ has the following form (.5) RST(χ, Σ ) = n tr[(σ 1 Σ I p ) ] where χ = (x 1,, x n ) is a sample from N p (µ, Σ), and the test statistic RST(χ, Σ ) tends to a χ p(p + 1) distribution with freedom degree under H when n. ),

6 6 D. JIANG Proof. RST(χ, Σ ) = n vec (Σ 1 1 ( ˆΣΣ I p ))[ n (Σ 1 Σ 1 n )] 1 vec(σ 1 1 ( ˆΣΣ I p )) = n vec (Σ 1 1 ( ˆΣΣ I p ))vec( ˆΣ Σ ) = n tr[(σ 1 Σ I p ) ] For some special cases are listed in the corollaries as following. Corollary.1. Rao s score test statistic for testing H : Σ = I p with no constrains on µ has the following form RST(χ, I p ) = n tr[( Σ I p ) ] where χ = (x 1,, x n ) is a sample from N p (µ, Σ), and the test statistic RST(χ, I p ) tends to a χ p(p + 1) distribution with freedom degree under H when n. Corollary.. Rao s score test statistic for testing H : Σ = γi p with no constrains on µ has the following form RST(χ, γi p ) = n tr[( p tr( Σ) Σ I p ) ] where χ = (x 1,, x n ) is a sample from N p (µ, Σ), and the test statistic RST(χ, γi p ) tends to a χ p(p + 1) distribution with freedom degree 1 under H when n. Proof. Replace the Σ by ˆγI p according to (.5), where ˆγ = the maximum likelihood estimator of γ. tr( Σ) p is.. CLT for LSS of a large dimensional sample covariance matrix. As seen above, the statistics of Rao s score test for the hypothesis (1.1) can be encoded by the trace function of a matrix, i.e. a function of the eigenvalues of some matrix concerned with the sample covariance matrix. That is exactly what we need in the corrections to the score test for large dimensional cases.

7 LARGE DIMENSIONAL COVARIANCE STRUCTURE TEST. 7 Consequently, a quick survey of the CLT for LSS of a large dimensional sample covariance matrix referred in [4] is presented below, which is a basic tool for improvements on the classical Rao s score test. Because the original version of the theorem has a strict condition that the 4th moment of the variable is 3 + δ, where δ is a positive constant tending to, so we derive an enhanced version excluding this requirement for a more widely usage. Before quoting, we first introduce some basic concepts and notations. Suppose {ξ ki C, i, k = 1,, } be a double array of i.i.d. random variables with mean and variance 1. Then (ξ 1,, ξ n ) is regarded as an i.i.d. sample from some p-dimensional distribution with mean p and covariance matrix I p, where ξ i = (ξ 1i, ξ i,, ξ pi ). So the sample covariance matrix is (.6) S n = 1 n n ξ i ξ i, where we use conjugate transpose for the complex variables instead. For simplicity we use F q, F qn to denote the Marčenko-Pastur law of index q and q n respectively, where q n = p Sn n q [, + ). Fn marks the empirical spectral distribution(esd) of the matrix S n, which is defined as ( where λ Sn i Fn Sn (x) = 1 p p 1 λ Sn i x, x R, ) are the real eigenvalues of the p p square matrix S n. Define f(x)dfn Sn (x) = 1 p p f(λ Sn i ), which is a so-called linear spectral statistic (LSS) of the random matrix S n. Based on this, we consider the empirical process G n := {G n (f)} indexed by A, (.7) G n (f) = p + f(x) [ F Sn n F qn] (dx), f A, where U is an open set of the complex plane including [a(q), b(q)], where a(q) = (1 q) and b(q) = (1+ q) ], and A be the set of analytic functions f : U C. Actually, the contours in U should contain the whole supporting set of the LSD F q. It is known that if q 1, exactly it is [a(q), b(q)]. If q > 1, the contours should enclose the whole support {} [a(q), b(q)], because the F q has a positive mass at the origin at this time. However, due to the exact

8 8 D. JIANG separation theorem in [3], for large enough p and n, the discrete mass at the origin will coincide with that of F q. So we can restrict the integral G n (f) on the contours only enclosed the continuous part of the LSD F q. Define {, if the ξ variables are real, κ = 1, if the ξ variables are complex. Then an enhanced version of Theorem 1.1 in [4] is provided, which will play a fundamental role in next derivations. Lemma.1. Assume: f 1,, f k A, {ξ ij } are i.i.d. random variables, such that Eξ 11 =, E ξ 11 = κ 1, E ξ 11 4 < and the {ξ ij } satisfy the condition 1 np E ξ ij 4 I( ξ ij nη) ij for any fixed η >. Moreover, p n = q n q [, + ) as n, p and E(ξ11 4 ) = β + κ + 1, where β is a constant. Then the random vector (G n (f 1 ),, G n (f k )) forms a tight sequence by the index n, and it weakly converges to a k-dimensional Gaussian vector with mean vector µ(f j ) = κ 1 qm 3 (.8) (z)(1 + m(z)) f j (z) πi [(1 q)m (z) + m(z) + 1] dz + βq m 3 (.9) (z) f j (z) πi (1 + m(z))[(1 q)m (z) + m(z) + 1] dz, and covariance function υ (f j, f l ) = κ f j (z 1 )f l (z ) (.1) 4π (m(z 1 ) m(z )) dm(z 1)dm(z ) βq f j (z 1 )f l (z ) (.11) 4π (1 + m(z 1 )) (1 + m(z )) dm(z 1)dm(z ), where j, l {1,, k}, and m(z) m F q(z) is the Stieltjes Transform of F q (1 q)i [, ) + qf q. The contours all contain the support of F q and non overlapping in both (.1) and (.11). The proof of the Lemma.1 is detailed in Appendix A..

9 LARGE DIMENSIONAL COVARIANCE STRUCTURE TEST The Proposed Testing Statistics. In this part, χ = (x 1,, x n ) remains to be an independent and identically distributed sample from a p dimensional random vector X with mean µ and covariance matrix Σ. For testing the hypothesis H : Σ = Σ, set ξ i = Σ 1 (x i µ), then the array { ξ i },,n contains p-dimensional standardized variables under H. If the mean parameter µ is known, the Lemma.1 can be cited in a direct way because its sample covariance matrix is identical with S n in (.6). However, it shows a slightly difference with unknown µ. By [17], it is reasonable to use n 1 instead of n, if applying the CLT in the Lemma.1 to correct the score test in large dimensional data with the unknown mean parameter. Also, the estimator of covariance matrix in the the corrected statistics should be changed into the unbiased one. Therefore, we define the unbiased sample covariance matrix of { ξ i } as Σ = Σ 1 n ( n 1 Σ)Σ 1, and denote S = Σ 1 ( n n 1 Σ). Because Σ has the same LSD with S n 1 defined in (.6) with n substituted by n 1, so that the matrix S has the same LSD as S n 1 due to the positive definiteness of Σ. Therefore, it is natural to use n 1 instead of n by [17] when the Lemma.1 is applied to amending the score test concerned the eigenvalues of S. Let (3.1) RST(χ, Σ ) = n tr[(s I p) ], then the correction to Rao s score test is hold in the following theorem: Theorem 3.1. Suppose that the conditions of Lemma.1 hold, for hypothesis test H : Σ = Σ, RST(χ, Σ ) is defined as (3.1), set p/(n 1) = q n q [, + ), q n 1 and g(x) = (x 1). Then, under H and when n, the correction to Rao s score test statistics is (3.) [ ] CRST (χ, Σ ) = υ(g) 1 n RST(χ, Σ ) p F qn (g) µ(g) N (, 1), where F qn is the Marčenko-Pastur law of index q n, and F qn (g), µ(g) and υ(g) are calculated in (3.3), (3.5) and (3.6), respectively.

10 1 D. JIANG Proof. By the derivation (3.1), we have n RST(χ, Σ ) = tr[(s I p ) ] = p ( λ S i 1 ) = p (x 1) dfn S (x) = p g(x)d ( F S n (x) F qn (x) ) + p F qn (g), where (λ S i ), i = 1,, p and F n S are the eigenvalues and the ESD of the matrix S, respectively. F qn (g) denotes the integral of the function g(x) by the density corresponding to the Marčenko-Pastur law of index q n, that is (3.3) F qn (g) = g(x)df qn (x) = q n, if q n 1, which is calculated in the Appendix A.3. As the definition in (.7), we have (3.4) G n (g) = p g(x)d ( Fn S (x) F qn (x) ) = n RST(χ, Σ ) p F qn (g). By Lemma.1, G n (g) weakly converges to a Gaussian vector with the mean (3.5) µ(g) = (κ 1)q + βq and variance (3.6) υ(g) = κq (1 + q) + 4βq 3, which are calculated in the Appendix A.3. Then, by (3.4) we arrive at Finally, CRST (χ, Σ ) = υ(g) 1 n RST(χ, Σ ) p F qn (g) N (µ(g), υ(g)), [ ] n RST(χ, Σ ) p F qn (g) µ(g) N (, 1) For the identity and sphericity hypothesis tests, we have the following corollaries:

11 LARGE DIMENSIONAL COVARIANCE STRUCTURE TEST. 11 Corollary 3.1. For testing H : Σ = I p with no constrains on µ, the conclusion of Theorem 3.1 still holds, only with the test statistic RST(χ, I p ) in (3.) is revised by RST(χ, I p ) = n tr[( n n 1 Σ I p ) ]. Corollary 3.. For testing H : Σ = γi p with no constrains on µ, the conclusion of Theorem 3.1 still holds, only with the test statistic RST(χ, I p ) in (3.) is revised by where ˆγ = tr( n n 1 Σ) p RST(χ, γi p ) = n n tr[(ˆγ 1 n 1 Σ I p ) ] is the maximum likelihood estimator of γ. 4. Simulation Study. Simulations are conducted in this section to evaluate the correction to Rao s score test (CRST) that we proposed. To compare the performance, we also present the corresponding simulation results of the test in [7] (SCT), the test in [6] (TCT) and the classical Rao s score test in [14] (RST). We consider the identity hypothesis test H : Σ = I p, and generate i.i.d random samples χ = (x 1,, x n ) from a p-dimensional random vector X following two scenarios of the populations under the null hypothesis, Gaussian Assumption: random vector X follows a p-dimensional normal distribution with mean µ 1 p and covariance matrix I p, where µ = and 1 p denotes a vector with that all elements are 1. Gamma Assumption: random vector X = (X 1,, X p ) and the components are independent and identically distributed as Gamma (4,.5), so that each of the random variables X i also has mean and variance 1. For each set of the scenarios, we report both empirical Type I errors and powers with 1, replications at α =.5 significance level. Different pair values of p, n are selected at a wide rage regardless of the functional expression or limiting behavior between them. The mean parameter is supposed to be unknown and substituted by the sample mean during the calculations. To calculate the empirical powers of the tests, two alternatives are designed in the simulations. In the first alternative, two different sample sets are provided for the corresponding scenarios. For Gaussian assumption,

12 1 D. JIANG the samples are independently generated from the random vector X following the normal distribution with mean vector µ 1 p and covariance matrix Σ = diag( 1 [v p], 1 p [v p]), where µ =, v =. are varying constants and [ ] denotes the integer truncation function. For Gamma assumption, the samples are still randomly selected from the random vector X = (X 1,, X p ) with independent components. Each component of the front part (X 1,, X [v p]) is distributed as Gamma(,1), whereas each of the components in the rest part (X [v p]+1,, X [p v p]) follows Gamma(4,.5), where v =.4. In the second alternative, the samples for Gaussian assumption are independently drawn from the normal distribution with mean vector µ 1 p and covariance matrix Σ = diag((1 + / np) 1 [v p], 1 p [v p]), where µ =, v =.5. The samples for Gamma assumption are followed the distribution of the random vector X = (X 1,, X p ), which satisfied that the components are independent and each component in the front part 4 (X 1,, X [v p]) is distributed as Gamma( 1+/ np, 1+/ np ), whereas each component of the rest part (X [v p]+1,, X [p v p]) follows Gamma(4,.5), where v =.5. Simulation results of empirical Type I errors and powers for the first alternative are listed in the Table 1, and the empirical powers for the second alternative is represented in Table. As seen from the Table 1, the empirical Type I errors of our proposed test CRST for both scenarios are almost around the nominal size 5%, and it converges to the nominal level rapidly as the dimension p approaches infinity, even for small n. Although, the empirical sizes of the proposed CRST is slightly higher for the case of p = 17 or under the Gamma assumption, it can be accepted with comparison to the other tests and be understood due to both asymptotic and nonparametric. For a further comparison, it is limited to several aspects. First, the Rao s score test and our proposed test both give a good performance when p is very small under the normal assumption. However, the empirical sizes of the Rao s score test deviate from the nominal level as p increases, and it shows a even worse result under the Gamma distribution assumption, where the proposed CRST is still active. Another interesting note is that the Rao s score test has a resilient power for the normal cases when p is much higher than n, for example (p = 16, n = 19). Second, for small and moderate dimensions like p = or 4 with higher sample size n = 79 or 159, the empirical Type I errors of the TCT in [6] behave well. However, the TCT leads to a dramatically high empirical size as the dimension p increases much higher, especially for large p, small n such as (p=16,n=19), though it has the optimal powers. Meanwhile, the

13 LARGE DIMENSIONAL COVARIANCE STRUCTURE TEST. 13 Table 1 Empirical sizes and powers(in brackets) of the comparative tests for H : Σ = I p at α =.5 significance level for normal and gamma random vectors with 1, replications. The alternative hypothesis is Σ = diag( 1 [v p], 1 p [v p]) with v =. for Normal variables and v =.4 for Gamma variables. proposed CRST SCT TCT RST p n=19 Normal random vectors: Type I error (Power) (.785).748 (.34).157 (.497).973 (.385).715 (.511).765 (.181).116 (.494).118 (.31) 4.68 (.1638).75 (.146).54 (.345).119 (.487) (.1148).7 (.951).6664 (.7115).17 (.1893) (.1955).718 (.178).9985 (.999).89 (.53) (.791).76 (.146) 1. (1.).419 (.434) Gamma random vectors: Type I error (Power) (.986).94 (.895).193 (.134).189 (.173).93 (.943).913 (.887).145 (.1465).34 (.55) (.1611).788 (.1316).794 (.3584).8 (.363) (.53).745 (.1578).6858 (.7897).98 (.496) (.375).734 (.155).9988 (.9995).1667 (.5511) (.4719).714 (.1497) 1. (1.).184 (.68) p n=39 Normal random vectors: Type I error (Power).687 (.3986).71 (.94).88 (.3168).89 (.4451) (.558).638 (.1771).1179 (.69).143 (.3451) (.448).637 (.168).146 (.644).146 (.337) (.159).6 (.161).591 (.3465).1151 (.581) (.3).63 (.179).653 (.7761).191 (.44) 3.53 (.458).614 (.146).9955 (.9985).83 (.541) Gamma random vectors: Type I error (Power).937 (.945).891 (.91).113 (.13).53 (.316) (.41).741 (.19).1339 (.83).398 (.4734) (.46).79 (.177).1415 (.656).414 (.4649) (.3958).664 (.97).748 (.544).441 (.686) (.541).69 (.5).6615 (.8553).189 (.7913) 3.54 (.731).611 (.).9954 (.9996).1764 (.8957)

14 14 D. JIANG TABLE 1 (cont.) proposed CRST SCT TCT RST p n=79 Normal random vectors: Type I error (Power).64 (.6563).645 (.53).69 (.5353).759 (.6743) (.496).594 (.874).83 (.336).885 (.4841) (.41).568 (.1556).16 (.659).164 (.359) (.38).577 (.1487).158 (.6).17 (.3458) (.545).554 (.68).573 (.5345).1164 (.665) (.7117).564 (.36).6458 (.8616).197 (.831) Gamma random vectors: Type I error (Power).898 (.919).83 (.816).861 (.839).175 (.156) (.363).698 (.946).959 (.3415).384 (.694) (.659).618 (.443).133 (.5758).487 (.8698) (.6473).615 (.49).1337 (.581).494 (.8656) (.88).579 (.4377).668 (.753).487 (.9543) 3.54 (.9493).551 (.4533).654 (.955).31 (.994) p n=159 Normal random vectors: Type I error (Power).685 (.9178).68 (.8598).693 (.8657).713 (.9178) 4.56 (.7159).531 (.5891).616 (.69).69 (.7563) (.435).586 (.836).839 (.361).874 (.538) (.7986).563 (.4415).1198 (.5954).13 (.8869) (.851).541 (.4443).16 (.5989).198 (.89) 3.55 (.953).5 (.469).545 (.7714).119 (.9736) Gamma random vectors: Type I error (Power).76 (.889).76 (.745).766 (.747).1983 (.1) (.69).595 (.5397).78 (.551).1 (.7831) (.9145).68 (.7869).997 (.867).493 (.981) (.9783).594 (.83).1175 (.8963).5 (.999) (.9785).576 (.843).136 (.914).475 (.9997) (.9998).58 (.8566).53 (.9716).673 (1.)

15 LARGE DIMENSIONAL COVARIANCE STRUCTURE TEST. 15 Table Empirical powers of the comparative tests for H : Σ = I p at α =.5 significance level for normal and gamma random vectors with 1, replications. The alternative hypothesis is Σ = diag((1 + / np) 1 [v p], 1 p [v p]) with v =.5. CRST SCT TCT RST CRST SCT TCT RST p proposed p proposed Normal random vectors n=19 n= n=79 n= Gamma random vectors n=19 n= n=79 n=

16 16 D. JIANG proposed CRST remains accurate. Last, compared to the SCT in [7], our proposed CRST have more closer empirical sizes to 5% with growing dimension p, especially for the small sample sizes. Furthermore, the powers of proposed CRST uniformly dominates that of the SCT over the entire range. For example, the powers of proposed CRST rise rapidly up to 1 as p increases in the case of n = 79 under the Gamma assumption, while those of the SCT remains less than.5 even if the sample size is not quite small. Finally, It must be pointed out that the proposed CRST cannot be use for the case q n = 1, but it remains in force even if q = 1, which means the q n could be very close to 1 by two sides. So we choose a different p for each n, which makes q n 1, for example (p=17,n=19) or (p=77,n=79). Also, the cases as (p=,n=19) or (p=8,n=79) are chosen for q n 1 +. As seen from the results, the proposed CRST performs well even if q n 1. Table shows a more apparent comparison advantage under the second alternative. The higher empirical powers of RST and SCT don t make sense because their empirical sizes are much higher. Moreover, the powers of the TCT decline sharply, even near to.1, when the dimension p rises up. Whereas, the proposed CRST gives the powers around.9 at the eligible empirical sizes. For a more intuitive understanding, take the cases (n=39,p=8) and (n=39,p=3) as an example, Figure 1 portrays a dynamic view of the powers for the first alternative under the Gamma assumption by the varying parameter v from to.1. Figure describes the powers for the second alternative under the Gamma assumption by the varying parameter v from to.5. Because v depicts the distance between the null and alternative hypothesis, so the starting point at v = is for the empirical sizes. As shown in the picture, the proposed CRST is a more sensitive and powerful test with the accurate empirical sizes. 5. Conclusion. In this paper, we propose a new testing statistic for the large dimensional covariance structure test based on amending Rao s score tests by RMT. Through generalizing the CLT for LSS of a large dimensional sample covariance matrix in [4], we guarantee the test proposed is feasible for the non-gaussian variables in a wider range. Furthermore, the correction to Rao s score test can be also used in the case of ultra high dimensionality regardless of the functional relationship between p and n. It breaks the inherent thinking that the corrections by RMT are usually practicable when p < n, and shows that it is the corrected statistics we chose to decide whether the corrections by RMT can be used in the case of p > n rather than the

17 LARGE DIMENSIONAL COVARIANCE STRUCTURE TEST. 17 n=39,p=8 n=39,p=3 Powers CRST SCT TCT RST Powers CRST SCT TCT RST Varying Constant v Varying Constant v Fig 1 Empirical sizes and powers of the comparative tests for H : Σ = I p at α =.5 significance level based on 1, independent replications of Gamma Assumption. The null and alternative hypothesis are Σ = diag( 1 [v p], 1 p [v p]) with v varied from to.1. Left: n = 39, p = 8; Right: n = 39, p = 3. n=39,p=8 n=39,p=3 Powers CRST SCT TCT RST Powers CRST SCT TCT RST Varying Constant v Varying Constant v Fig Empirical sizes and powers of the comparative tests for H : Σ = I p at α =.5 significance level based on 1, independent replications of Gamma Assumption. The null and alternative hypothesis are Σ = diag((1 + / np) 1 [v p], 1 p [v p]) with v varied from to.5. Left: n = 39, p = 8; Right: n = 39, p = 3.

18 18 D. JIANG tools we used in RMT. So we believe that large dimensional spectral analysis in RMT will have more application fields in light of different situations. APPENDIX A: DERIVATIONS AND PROOFS. A.1. Proofs of the derivation in (.1) and (.3). The logarithm of the density of the sample χ is written as l(χ, θ) = np ln(π) n ln Σ 1 = np ln(π) n ln Σ 1 n (x i µ) Σ 1 (x i µ) n tr ( Σ 1 (x i µ)(x i µ) ), and θ is denoted as (µ, vec(σ) ). For the first part of (.1), by the formula dx BX dx = (B + B )X, where the X is a vector and B is a matrix dependent on X, we have dl(χ, θ) dµ = 1 = 1 n d(x i µ) Σ 1 (x i µ) dµ n Σ 1 (x i µ) = nσ 1 ( ˆµ µ) where ˆµ = 1 n n x i. For the second part of (.1), by the following formulas d ln X dx = vec((x 1 ) ) dtr(b X) dx dx 1 dx = ((X 1 ) X 1 ) = vec(b) (B C)vec(D) = vec(cdb) where X, B, C, D are all matrices. Then we have dl(χ, θ) dσ = n ( d ln Σ dσ 1 dσ 1 dtr Σ 1 dσ = n vec(σ 1 ) + n (Σ 1 Σ 1 )vec(a) = n vec(σ 1 (AΣ 1 I p )) n (x i µ)(x i µ) ) dσ 1

19 where A = 1 n LARGE DIMENSIONAL COVARIANCE STRUCTURE TEST. 19 n (x i µ)(x i µ). Thus ( ) dl(χ, θ) dl(χ, θ) dvec(σ) = vec = n dσ vec(σ 1 (AΣ 1 I p )). Therefore, the score vector for the sample is ( ) ( U1 (χ, θ) nσ 1 ( ˆµ µ) U(χ, θ) =: = n U (χ, θ) vec(σ 1 (AΣ 1 I p )) Next consider the derivation of (.3). By the definitions of the Hessian matrix and score vector, we have ( ) H(χ, θ) = d du(χ, θ) H11 H l(χ, θ) = dθ dθ =: 1, H 1 H where the part of the parameter Σ is Since and H = n = n dvec(σ 1 (AΣ 1 I p )) dvec (Σ) dvec(σ 1 ) dvec(σ 1 AΣ 1 Σ 1 ) dvec (Σ) dvec (Σ 1 ) = n (Σ 1 Σ 1 )(AΣ 1 I p + I p AΣ 1 I p ). d(vec(σ 1 AΣ 1 ) vec(σ 1 )) dvec (Σ 1 ) = d(aσ 1 I p )vec(σ 1 ) dvec (Σ 1 ) = dvec(σ 1 ) dvec (Σ 1 ) (AΣ 1 I p ) + d(aσ 1 I p ) dvec (Σ 1 ) vec(σ 1 ) I p = (AΣ 1 I p ) + dvec(aσ 1 ) dvec (Σ 1 ) (I p Σ 1 ) I p = (AΣ 1 I p ) + (I p A)(I p Σ 1 ) I p = (AΣ 1 I p + I p AΣ 1 I p ). dvec(σ 1 ) dvec (Σ) = (Σ 1 I p )(I p Σ 1 ) = (Σ 1 Σ 1 ), ) dvec(σ 1 ) dvec (Σ 1 ) where the formulas (B C)vec(D) = vec(cdb) and (B C)(D E) = (BD CE) are repeatedly used.

20 D. JIANG A.. Proofs of Lemma.1. First, the result of (.8) and (.1) is corresponding to the ones in [4] with the 4th moment equal to 3. Obviously, the mean in (.8) is formed under the condition that the matrix T in [4] is identity, and its LSD is H(t) = I [1, ) (t) according to the assumptions in Lemma.1. Next, If we drop the condition on the 4th moment, it will be found that each of the (4.1) and (.7) in [4] should be plused an additional item by their (1.15) ) βqb p(z) E (e 1T 1 D 1 T 1 e1 e 1T 1 D 1 (m(z)t + I) 1 T 1 e1 and βb p (z 1 )b p (z ) n respectively, where n p j=1 e it 1 Ej (D 1 j (z 1 ))T 1 ei e it 1 Ej (D 1 (z ))T 1 ei e i = (,, 1,,, ) ; D(z) = T 1 Sn T 1 zi; }{{} i 1 D j (z) = D(z) r j rj ; r j = 1 T 1 1 ξ j b p (z) = n 1 + n 1 E trt D1 1 and E j is the conditional expectation given r 1,, r j for j = 1,, n. According to the Lemma 6. in [16], if the 4th moment is arbitrary finite number, the mean function of M(z) in Lemma 1.1 in [4] should be added βqm3 (z) t 1+tm(z) dh(t) 1 dh(t) (1+tm(z)) 1 q t m (z) dh(t) (1+tm(z)) which is the limit of βqm(z)b p(z) E (e 1 T 1 D 1 T 1 e 1 e 1 T 1 D 1 (m(z)t + I) 1 T 1 e 1 ) 1 q t m (z) (1+tm(z)) dh(t) ever dropped in (4.1) and (4.1) of [4]. Similarly, the covariance function of M(z) should include the additional item tm (z 1 ) βq (1 + tm(z 1 )) dh(t) tm (z ) (1 + tm(z )) dh(t) which is the limit of βb p(z 1 )b p (z ) z 1 z n n p j=1 e it 1 Ej (D 1 j j (z 1 ))T 1 ei e it 1 Ej (D 1 (z ))T 1 ei j

21 LARGE DIMENSIONAL COVARIANCE STRUCTURE TEST. 1 ever dropped in (.7) of [4]. Then by their (1.14), the added mean function of G n (f j ) should be βq (A.1) f j (z) m3 (z) t 1+tm(z) dh(t) 1 dh(t) (1+tm(z)) πi 1 q dz t m (z) dh(t) (1+tm(z)) and the covariance function of G n (f j ) should plus (A.) βq tm (z 1 ) 4π f j (z 1 )f l (z ) (1 + tm(z 1 )) dh(t) tm (z ) (1 + tm(z )) dh(t) dz 1dz Put the condition H(t) = I [1, ) (t) assuming in Lemma.1 into the equation (A.1) and (A.), then we have the additional mean function βq m 3 (z) f j (z) πi (1 + m(z))[(1 q)m (z) + m(z) + 1] dz, and the added covariance function βq f j (z 1 )f l (z ) 4π (1 + m(z 1 )) (1 + m(z )) dm(z 1)dm(z ), where j, l {1,, k}. A.3. test. Proofs of limiting schemes for the correction to Rao s score Calculation of F qn (g) in (3.3). Because F qn (g) = g(x)df qn (x), where F qn (x) is the Marčenko- Pastur law of the matrix S with index q n, the density is p qn (x) = 1 (bn x)(x a n ), πxq n if a n x b n,, otherwise, and has a point mass 1 1 at the origin if q n > 1, where a n = q n (1 q n ) and b n = (1 + q n ). (See in [5]). According the the definition, the supporting set of MP-law is x [, 4] if q n = 1. But it is unreasonable that x lies on the denominator if x = by the expression of the density. So we exclude the case of q n = 1 and consider the following integral first, F qn (g) = (x 1) (bn x)(x a n )dx πxq n

22 D. JIANG Make a substitution x = 1 + q n q n cos θ, where θ π, then F qn (g) = = π = 1 π bn (x 1) (bn x)(x a n )dx a n πxq n π (q n q n cos θ) 1 + q n q n cos θ π sin θ sin θdθ (q n q n cos θ) sin θ 1 + q n dθ q n cos θ Let x = 1+q n q n cos θ = q n (cos θ+d ), where d = 1 + q n is q n a constant. Thus, the above integral F qn (g) is obtained by the partition into three parts as below : 1 π = 1 π = 1 π π = 1 π { = π π (q n q n cos θ) sin θ 1 + q n dθ q n cos θ [ q n (cos θ + d ) 1] sin θ dθ q n (cos θ + d ) [ q n cos θ sin θ ( q n d + ) sin θ [ + (q n 1)π + π q n (1 + q n 1 q n ) q n, if q n < 1; q n 1 + 1/q n, if q n > 1, ] sin ] θ dθ q n (cos θ + d ) where the third part is calculated by the following integral, which is also used in other calculations. (A.3) π π = = = π 1 dθ = cos θ + d π 1 cos θ sin θ + d dθ d θ (1 tan θ + d sec θ ) cos θ d tan θ d 1 = π d 1 d (d 1) tan θ π ( ) 1 d 1 ( ) d 1 + d 1 d +1 tan θ d + 1 tan θ

23 LARGE DIMENSIONAL COVARIANCE STRUCTURE TEST. 3 and the third part of the limiting integral F qn (g) is (A.4) π = 1 q n = 1 q n sin θ q n (cos θ + d ) dθ π π 1 cos θ cos θ + d dθ = π q n (1 + q n 1 q n ) ( cos θ + d + 1 d cos θ + d )dθ Because the density corresponding to F qn (x) has a point mass 1 1 q n at the origin if q n > 1, then the F qn (g) should be added the term (1 ) (1 1 q n ) if q n > 1. Then we arrive at F qn (g) = q n, if q n < 1 & q n > 1. Calculation of µ(g) in (3.5). By (9.1.13) in Bai and Silverstein [5], with H(t) = I [1, (t), the first part of the limiting mean µ(g) in (.8) can also be expressed as ( g (a(q)) + g (b(q)) µ 1 (g) = (κ 1) 1 ) b(q) g(x) 4 π dx a(q) 4q (x 1 q) where a(q) = (1 q) and b(q) = (1 + q). For g(x) = (x 1), make a substitution x = 1 + q q cos θ, θ π, then ( g (a(q)) + g (b(q)) µ 1 (g) = (κ 1) 4 1 π g(1 + q ) q cos θ) π q sin θdθ 4q ( q cos θ) ( g (a(q)) + g (b(q)) = (κ 1) ( 4q + q = (κ 1) = (κ 1)q 4 1 4π π 1 π g(1 + q ) q cos θ)dθ 4π ) (q 4q 3 cos θ + 4q cos θ)dθ where κ = if the variables are real, and κ = 1 if the variables are complex.

24 4 D. JIANG The second part of the limiting mean µ(g) is obtained by (.9) µ (g) = βq (1 z) m 3 (z) πi (1 + m(z))[(1 q)m (z) + m(z) + 1] dz, For z C +, recall the equation (9.1.1) given in [4] z = 1 m(z) + q 1 + m(z). Denote m(z) as m for simplicity, it is easily obtained that (1 z) = [m (q )m + 1] m (1 + m) dz = (1 q)m + m + 1 m (1 + m) dm then we have µ (g) = βq [m (q )m + 1] πi m(1 + m) 5 dm, By solving for m from (9.1.1) in [5], we get the contour for the integral above should enclose the interval [ min( 1 1 q, ), max( q 1 q, 1 ] 1 + q ). Therefore, -1 is the residue if q 1 and is the residue if q > 1. and the integral is calculated as µ (g) = βq, which is the same result for both the cases of q 1 and q > 1. Finally, we obtained Calculation of υ(g) in (3.6). µ(g) = (κ 1)q + βq. By Lemma.1, the first part of limiting variance υ(g) in (.1) is υ 1 (g) = κ g(z 1 )g(z ) 4π (m(z 1 ) m(z )) dm(z 1)dm(z ) and g(z 1 )g(z ) = (z 1 1) (z 1) = 1 z 1 z + z 1 + z + 4z 1 z z 1z z 1 z + z 1z.

25 LARGE DIMENSIONAL COVARIANCE STRUCTURE TEST. 5 Let 1 denote constant function which equals to 1, It is obvious that υ(1, 1) =. Denoting m(z i ) = m i, i = 1,. As mentioned above, for fixed m, we have a contour enclosed -1, but not when q 1, whereas it enclosed, but not -1 when q > 1. On one hand, we consider the case of q 1. Because z 1 (m 1 m ) dm 1 1 = q ( + 1 q ) (1 + m 1 ) j (m + 1) l( m m 1 q m + 1 )l 1 dm 1 and (A.5) = πi q (1 + m ). z j= (m 1 m ) dm 1 ( = 1 ) ( q 1 + m 1 + m [ = 4πi m q (1 + m 1 ) + q (1 + m 1 ) 3 ]. ) q (1 + m ) l=1 1 m m 1 dm So υ(z 1 z 1, 1) =. Similarly, υ(1, z z ) =. Therefore, there are only four parts left, i.e. z 1 z z 1 z z 1 z +4z 1z. Further, υ(z 1, z ) = κq πi = κq 1 (m + 1) ( q 1 + m q ) (1 + m ) j dm j= υ(z1, z ) = κ 4π = κq ( 1 m 1 + πi z 1 z (m 1 m ) dm 1dm ) q 1+m 1 (1 + m 1 ) dm 1 = κq ( 1 ) ( q 1 + πi m m 1 = κq(1 + q). m 1 ) q 1 (1 + m 1 ) dm m 1

26 6 D. JIANG Similarly, υ(z 1, z ) = κq(1+q). For the last part υ(z 1, z ), the integral is calculated by eq.(a.5) as below. υ(z1, z) = κ z1 z = κq ( πi = κq πi + κq πi 4π (m 1 m ) dm 1dm 1 ) q [ ] 1 + m m 1 (1 + m 1 ) + q (1 + m 1 ) 3 dm 1 ( 1 ) ( ) q 1 q 1 + m m 1 m 1 (1 + m 1 ) dm m 1 ( 1 ) ( ) [ q 1 q 1 + m m 1 (1 + m 1 ) (1 + m 1 ) = κ(4q + 1q + 4q 3 ) Finally, we obtain m 1 υ 1 (g) = υ(z 1, z ) υ(z 1, z ) υ(z 1, z ) + 4υ(z 1, z ) = κ(4q + 1q + 4q 3 ) 8κq(1 + q) + 4κq = κq (1 + q). when q 1. On the other hand, Similar calculations are conducted for the case of q > 1. It is found that the result is the same as the one of the case above only with the residues are changed from -1 to. So for all the cases of q, we arrive at υ 1 (g) = κq (1 + q) For the second part of υ(g) in (.11), we have υ (g) = βq g(z 1 )g(z ) 4π (1 + m 1 ) (1 + m ) dm 1dm. Furthermore, g(z 1 ) (1 + m 1 ) dm 1 = [m 1 (q )m 1 + 1] m 1 (1 + m 1) 4 dm 1 = 4πiq Since the contour contains -1 as a residue if q 1, and enclose as a residue for the other case q > 1. By the calculations of the both cases, it will be found that the results are all the same although the residues are different. Thus we get υ (g) = βq 4π ( 4πiq) ( 4πiq) = 4βq3. ] dm 1

27 LARGE DIMENSIONAL COVARIANCE STRUCTURE TEST. 7 Finally, we obtained υ(g) = κq (1 + q) + 4βq 3. ACKNOWLEDGEMENT The author thanks the reviewers for their helpful comments and suggestions to make an improvement of this article. This research was supported by the National Natural Science Foundation of China REFERENCES [1] Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis. Second Edition. John Wiley & Sons. [] Bai, Z.D., Jiang, D., Yao, J. F. and Zheng, S. (9) Corrections to LRT on large dimensional covariance matrix by RMT. Ann. Statist. 37, No.6B: [3] Bai, Z. D. and Silverstein, J. W. (1999). Exact Separation of Eigenvalues of Large Dimensional Sample Covariance Matrices. Ann.Probab. 7, No.3, [4] Bai, Z. D. and Silverstein, J. W. (4). CLT for linear spectral statistics of large dimensional sample covariance matrices. Ann.Probab. 3, [5] Bai, Z. D. and Silverstein, J. W. (1). Spectral analysis of large dimensional random matrices. nd ed. Beijing: Science Press. [6] Cai, T.T. and Ma, Z. (13). Optimal hypothesis testing for high dimensional covariance matrices. Bernoulli. 19 (5B), [7] Chen, S.X., Zhang, L.X. and Zhong, P.S. (1). Tests for high-dimensional covariance matrices. J. Amer. Statist. Assoc. 15, [8] Gombay, E. (). Parametric sequential tests in the presence of nuisance parameters. Theory Stochastic Progresses. 8, [9] Jiang, D., Jiang, T. and Yang, F. (1). Likelihood ratio tests for covariance matrices of high-dimensional normal distributions. J. Statist. Plann. Inference. 14, [1] John, S. (1971). Some Optimal Multivariate Tests. Biometrika. 59, [11] Johnstone, I.M. (1). On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist. 9, [1] Ledoit, O. and Wolf, M. () Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size. Ann. Statist. 3, [13] Nagao, H. (1973). On some test criteria for covariance matrix. Ann. Statist. 1, [14] Rao, C.R. (1948). Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Mathematical Proceedings of the Cambridge Philosophical Society. 44, [15] Srivastava, M.S. (5). Some tests concerning the covariance matrix in high dimensional data. J. Japan Statist. Soc. 35, [16] Zheng, S. (1). Central limit theorems for linear spectral statistics of large dimensional F -Matrices. Annales de 1 Institut Henri Poincaré-Probabilités et Statistiques. 48, No., [17] Zheng, S., Bai, Z.D. and Yao, J. F. (15). Substitution principle for CLT of linear spectral statistics of high-dimensional sample covariance matrices with applications to hypothesis testing. Ann. Statist. 43, No.,

28 8 D. JIANG School of Mathematics, Jilin University, 699 QianJin Street, Changchun 131, China.

On corrections of classical multivariate tests for high-dimensional data

On corrections of classical multivariate tests for high-dimensional data On corrections of classical multivariate tests for high-dimensional data Jian-feng Yao with Zhidong Bai, Dandan Jiang, Shurong Zheng Overview Introduction High-dimensional data and new challenge in statistics

More information

On corrections of classical multivariate tests for high-dimensional data. Jian-feng. Yao Université de Rennes 1, IRMAR

On corrections of classical multivariate tests for high-dimensional data. Jian-feng. Yao Université de Rennes 1, IRMAR Introduction a two sample problem Marčenko-Pastur distributions and one-sample problems Random Fisher matrices and two-sample problems Testing cova On corrections of classical multivariate tests for high-dimensional

More information

Empirical Likelihood Tests for High-dimensional Data

Empirical Likelihood Tests for High-dimensional Data Empirical Likelihood Tests for High-dimensional Data Department of Statistics and Actuarial Science University of Waterloo, Canada ICSA - Canada Chapter 2013 Symposium Toronto, August 2-3, 2013 Based on

More information

HYPOTHESIS TESTING ON LINEAR STRUCTURES OF HIGH DIMENSIONAL COVARIANCE MATRIX

HYPOTHESIS TESTING ON LINEAR STRUCTURES OF HIGH DIMENSIONAL COVARIANCE MATRIX Submitted to the Annals of Statistics HYPOTHESIS TESTING ON LINEAR STRUCTURES OF HIGH DIMENSIONAL COVARIANCE MATRIX By Shurong Zheng, Zhao Chen, Hengjian Cui and Runze Li Northeast Normal University, Fudan

More information

A new test for the proportionality of two large-dimensional covariance matrices. Citation Journal of Multivariate Analysis, 2014, v. 131, p.

A new test for the proportionality of two large-dimensional covariance matrices. Citation Journal of Multivariate Analysis, 2014, v. 131, p. Title A new test for the proportionality of two large-dimensional covariance matrices Authors) Liu, B; Xu, L; Zheng, S; Tian, G Citation Journal of Multivariate Analysis, 04, v. 3, p. 93-308 Issued Date

More information

Estimation of the Global Minimum Variance Portfolio in High Dimensions

Estimation of the Global Minimum Variance Portfolio in High Dimensions Estimation of the Global Minimum Variance Portfolio in High Dimensions Taras Bodnar, Nestor Parolya and Wolfgang Schmid 07.FEBRUARY 2014 1 / 25 Outline Introduction Random Matrix Theory: Preliminary Results

More information

Central Limit Theorems for Classical Likelihood Ratio Tests for High-Dimensional Normal Distributions

Central Limit Theorems for Classical Likelihood Ratio Tests for High-Dimensional Normal Distributions Central Limit Theorems for Classical Likelihood Ratio Tests for High-Dimensional Normal Distributions Tiefeng Jiang 1 and Fan Yang 1, University of Minnesota Abstract For random samples of size n obtained

More information

Lecture 3. Inference about multivariate normal distribution

Lecture 3. Inference about multivariate normal distribution Lecture 3. Inference about multivariate normal distribution 3.1 Point and Interval Estimation Let X 1,..., X n be i.i.d. N p (µ, Σ). We are interested in evaluation of the maximum likelihood estimates

More information

Regularized LRT for Large Scale Covariance Matrices: One Sample Problem

Regularized LRT for Large Scale Covariance Matrices: One Sample Problem Regularized LRT for Large Scale ovariance Matrices: One Sample Problem Young-Geun hoi, hi Tim Ng and Johan Lim arxiv:502.00384v3 [stat.me] 0 Jun 206 August 6, 208 Abstract The main theme of this paper

More information

Statistical Inference On the High-dimensional Gaussian Covarianc

Statistical Inference On the High-dimensional Gaussian Covarianc Statistical Inference On the High-dimensional Gaussian Covariance Matrix Department of Mathematical Sciences, Clemson University June 6, 2011 Outline Introduction Problem Setup Statistical Inference High-Dimensional

More information

Weiming Li and Jianfeng Yao

Weiming Li and Jianfeng Yao LOCAL MOMENT ESTIMATION OF POPULATION SPECTRUM 1 A LOCAL MOMENT ESTIMATOR OF THE SPECTRUM OF A LARGE DIMENSIONAL COVARIANCE MATRIX Weiming Li and Jianfeng Yao arxiv:1302.0356v1 [stat.me] 2 Feb 2013 Abstract:

More information

Testing Some Covariance Structures under a Growth Curve Model in High Dimension

Testing Some Covariance Structures under a Growth Curve Model in High Dimension Department of Mathematics Testing Some Covariance Structures under a Growth Curve Model in High Dimension Muni S. Srivastava and Martin Singull LiTH-MAT-R--2015/03--SE Department of Mathematics Linköping

More information

High-dimensional asymptotic expansions for the distributions of canonical correlations

High-dimensional asymptotic expansions for the distributions of canonical correlations Journal of Multivariate Analysis 100 2009) 231 242 Contents lists available at ScienceDirect Journal of Multivariate Analysis journal homepage: www.elsevier.com/locate/jmva High-dimensional asymptotic

More information

SHOTA KATAYAMA AND YUTAKA KANO. Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka , Japan

SHOTA KATAYAMA AND YUTAKA KANO. Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka , Japan A New Test on High-Dimensional Mean Vector Without Any Assumption on Population Covariance Matrix SHOTA KATAYAMA AND YUTAKA KANO Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama,

More information

Large sample covariance matrices and the T 2 statistic

Large sample covariance matrices and the T 2 statistic Large sample covariance matrices and the T 2 statistic EURANDOM, the Netherlands Joint work with W. Zhou Outline 1 2 Basic setting Let {X ij }, i, j =, be i.i.d. r.v. Write n s j = (X 1j,, X pj ) T and

More information

Journal of Multivariate Analysis. Sphericity test in a GMANOVA MANOVA model with normal error

Journal of Multivariate Analysis. Sphericity test in a GMANOVA MANOVA model with normal error Journal of Multivariate Analysis 00 (009) 305 3 Contents lists available at ScienceDirect Journal of Multivariate Analysis journal homepage: www.elsevier.com/locate/jmva Sphericity test in a GMANOVA MANOVA

More information

Random Matrices and Multivariate Statistical Analysis

Random Matrices and Multivariate Statistical Analysis Random Matrices and Multivariate Statistical Analysis Iain Johnstone, Statistics, Stanford imj@stanford.edu SEA 06@MIT p.1 Agenda Classical multivariate techniques Principal Component Analysis Canonical

More information

Thomas J. Fisher. Research Statement. Preliminary Results

Thomas J. Fisher. Research Statement. Preliminary Results Thomas J. Fisher Research Statement Preliminary Results Many applications of modern statistics involve a large number of measurements and can be considered in a linear algebra framework. In many of these

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

Statistica Sinica Preprint No: SS

Statistica Sinica Preprint No: SS Statistica Sinica Preprint No: SS-2017-0275 Title Testing homogeneity of high-dimensional covariance matrices Manuscript ID SS-2017-0275 URL http://www.stat.sinica.edu.tw/statistica/ DOI 10.5705/ss.202017.0275

More information

STAT 730 Chapter 4: Estimation

STAT 730 Chapter 4: Estimation STAT 730 Chapter 4: Estimation Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Analysis 1 / 23 The likelihood We have iid data, at least initially. Each datum

More information

High-Dimensional AICs for Selection of Redundancy Models in Discriminant Analysis. Tetsuro Sakurai, Takeshi Nakada and Yasunori Fujikoshi

High-Dimensional AICs for Selection of Redundancy Models in Discriminant Analysis. Tetsuro Sakurai, Takeshi Nakada and Yasunori Fujikoshi High-Dimensional AICs for Selection of Redundancy Models in Discriminant Analysis Tetsuro Sakurai, Takeshi Nakada and Yasunori Fujikoshi Faculty of Science and Engineering, Chuo University, Kasuga, Bunkyo-ku,

More information

EXTENDED GLRT DETECTORS OF CORRELATION AND SPHERICITY: THE UNDERSAMPLED REGIME. Xavier Mestre 1, Pascal Vallet 2

EXTENDED GLRT DETECTORS OF CORRELATION AND SPHERICITY: THE UNDERSAMPLED REGIME. Xavier Mestre 1, Pascal Vallet 2 EXTENDED GLRT DETECTORS OF CORRELATION AND SPHERICITY: THE UNDERSAMPLED REGIME Xavier Mestre, Pascal Vallet 2 Centre Tecnològic de Telecomunicacions de Catalunya, Castelldefels, Barcelona (Spain) 2 Institut

More information

Testing High-Dimensional Covariance Matrices under the Elliptical Distribution and Beyond

Testing High-Dimensional Covariance Matrices under the Elliptical Distribution and Beyond Testing High-Dimensional Covariance Matrices under the Elliptical Distribution and Beyond arxiv:1707.04010v2 [math.st] 17 Apr 2018 Xinxin Yang Department of ISOM, Hong Kong University of Science and Technology

More information

Multivariate Time Series

Multivariate Time Series Multivariate Time Series Notation: I do not use boldface (or anything else) to distinguish vectors from scalars. Tsay (and many other writers) do. I denote a multivariate stochastic process in the form

More information

On singular values distribution of a matrix large auto-covariance in the ultra-dimensional regime. Title

On singular values distribution of a matrix large auto-covariance in the ultra-dimensional regime. Title itle On singular values distribution of a matrix large auto-covariance in the ultra-dimensional regime Authors Wang, Q; Yao, JJ Citation Random Matrices: heory and Applications, 205, v. 4, p. article no.

More information

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data Journal of Multivariate Analysis 78, 6282 (2001) doi:10.1006jmva.2000.1939, available online at http:www.idealibrary.com on Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone

More information

A new multivariate CUSUM chart using principal components with a revision of Crosier's chart

A new multivariate CUSUM chart using principal components with a revision of Crosier's chart Title A new multivariate CUSUM chart using principal components with a revision of Crosier's chart Author(s) Chen, J; YANG, H; Yao, JJ Citation Communications in Statistics: Simulation and Computation,

More information

Composite Hypotheses and Generalized Likelihood Ratio Tests

Composite Hypotheses and Generalized Likelihood Ratio Tests Composite Hypotheses and Generalized Likelihood Ratio Tests Rebecca Willett, 06 In many real world problems, it is difficult to precisely specify probability distributions. Our models for data may involve

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

Strong Convergence of the Empirical Distribution of Eigenvalues of Large Dimensional Random Matrices

Strong Convergence of the Empirical Distribution of Eigenvalues of Large Dimensional Random Matrices Strong Convergence of the Empirical Distribution of Eigenvalues of Large Dimensional Random Matrices by Jack W. Silverstein* Department of Mathematics Box 8205 North Carolina State University Raleigh,

More information

Covariance estimation using random matrix theory

Covariance estimation using random matrix theory Covariance estimation using random matrix theory Randolf Altmeyer joint work with Mathias Trabs Mathematical Statistics, Humboldt-Universität zu Berlin Statistics Mathematics and Applications, Fréjus 03

More information

Preface to the Second Edition...vii Preface to the First Edition... ix

Preface to the Second Edition...vii Preface to the First Edition... ix Contents Preface to the Second Edition...vii Preface to the First Edition........... ix 1 Introduction.............................................. 1 1.1 Large Dimensional Data Analysis.........................

More information

Introduction to Self-normalized Limit Theory

Introduction to Self-normalized Limit Theory Introduction to Self-normalized Limit Theory Qi-Man Shao The Chinese University of Hong Kong E-mail: qmshao@cuhk.edu.hk Outline What is the self-normalization? Why? Classical limit theorems Self-normalized

More information

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.

More information

Chapter 7. Hypothesis Testing

Chapter 7. Hypothesis Testing Chapter 7. Hypothesis Testing Joonpyo Kim June 24, 2017 Joonpyo Kim Ch7 June 24, 2017 1 / 63 Basic Concepts of Testing Suppose that our interest centers on a random variable X which has density function

More information

Spectral analysis of the Moore-Penrose inverse of a large dimensional sample covariance matrix

Spectral analysis of the Moore-Penrose inverse of a large dimensional sample covariance matrix Spectral analysis of the Moore-Penrose inverse of a large dimensional sample covariance matrix Taras Bodnar a, Holger Dette b and Nestor Parolya c a Department of Mathematics, Stockholm University, SE-069

More information

arxiv: v1 [stat.ap] 27 Mar 2015

arxiv: v1 [stat.ap] 27 Mar 2015 Submitted to the Annals of Applied Statistics A NOTE ON THE SPECIFIC SOURCE IDENTIFICATION PROBLEM IN FORENSIC SCIENCE IN THE PRESENCE OF UNCERTAINTY ABOUT THE BACKGROUND POPULATION By Danica M. Ommen,

More information

Some Statistical Inferences For Two Frequency Distributions Arising In Bioinformatics

Some Statistical Inferences For Two Frequency Distributions Arising In Bioinformatics Applied Mathematics E-Notes, 14(2014), 151-160 c ISSN 1607-2510 Available free at mirror sites of http://www.math.nthu.edu.tw/ amen/ Some Statistical Inferences For Two Frequency Distributions Arising

More information

Numerical Solutions to the General Marcenko Pastur Equation

Numerical Solutions to the General Marcenko Pastur Equation Numerical Solutions to the General Marcenko Pastur Equation Miriam Huntley May 2, 23 Motivation: Data Analysis A frequent goal in real world data analysis is estimation of the covariance matrix of some

More information

Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach

Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach By Shiqing Ling Department of Mathematics Hong Kong University of Science and Technology Let {y t : t = 0, ±1, ±2,

More information

Probability Distributions Columns (a) through (d)

Probability Distributions Columns (a) through (d) Discrete Probability Distributions Columns (a) through (d) Probability Mass Distribution Description Notes Notation or Density Function --------------------(PMF or PDF)-------------------- (a) (b) (c)

More information

Assessing the dependence of high-dimensional time series via sample autocovariances and correlations

Assessing the dependence of high-dimensional time series via sample autocovariances and correlations Assessing the dependence of high-dimensional time series via sample autocovariances and correlations Johannes Heiny University of Aarhus Joint work with Thomas Mikosch (Copenhagen), Richard Davis (Columbia),

More information

COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS

COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS Communications in Statistics - Simulation and Computation 33 (2004) 431-446 COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS K. Krishnamoorthy and Yong Lu Department

More information

10. Composite Hypothesis Testing. ECE 830, Spring 2014

10. Composite Hypothesis Testing. ECE 830, Spring 2014 10. Composite Hypothesis Testing ECE 830, Spring 2014 1 / 25 In many real world problems, it is difficult to precisely specify probability distributions. Our models for data may involve unknown parameters

More information

Notes on the Multivariate Normal and Related Topics

Notes on the Multivariate Normal and Related Topics Version: July 10, 2013 Notes on the Multivariate Normal and Related Topics Let me refresh your memory about the distinctions between population and sample; parameters and statistics; population distributions

More information

CENTRAL LIMIT THEOREMS FOR CLASSICAL LIKELIHOOD RATIO TESTS FOR HIGH-DIMENSIONAL NORMAL DISTRIBUTIONS

CENTRAL LIMIT THEOREMS FOR CLASSICAL LIKELIHOOD RATIO TESTS FOR HIGH-DIMENSIONAL NORMAL DISTRIBUTIONS The Annals of Statistics 013, Vol. 41, No. 4, 09 074 DOI: 10.114/13-AOS1134 Institute of Mathematical Statistics, 013 CENTRAL LIMIT THEOREMS FOR CLASSICAL LIKELIHOOD RATIO TESTS FOR HIGH-DIMENSIONAL NORMAL

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

A Bootstrap Test for Conditional Symmetry

A Bootstrap Test for Conditional Symmetry ANNALS OF ECONOMICS AND FINANCE 6, 51 61 005) A Bootstrap Test for Conditional Symmetry Liangjun Su Guanghua School of Management, Peking University E-mail: lsu@gsm.pku.edu.cn and Sainan Jin Guanghua School

More information

High-dimensional two-sample tests under strongly spiked eigenvalue models

High-dimensional two-sample tests under strongly spiked eigenvalue models 1 High-dimensional two-sample tests under strongly spiked eigenvalue models Makoto Aoshima and Kazuyoshi Yata University of Tsukuba Abstract: We consider a new two-sample test for high-dimensional data

More information

TAMS39 Lecture 2 Multivariate normal distribution

TAMS39 Lecture 2 Multivariate normal distribution TAMS39 Lecture 2 Multivariate normal distribution Martin Singull Department of Mathematics Mathematical Statistics Linköping University, Sweden Content Lecture Random vectors Multivariate normal distribution

More information

An Introduction to Large Sample covariance matrices and Their Applications

An Introduction to Large Sample covariance matrices and Their Applications An Introduction to Large Sample covariance matrices and Their Applications (June 208) Jianfeng Yao Department of Statistics and Actuarial Science The University of Hong Kong. These notes are designed for

More information

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Statistics - Lecture One. Outline. Charlotte Wickham  1. Basic ideas about estimation Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence

More information

KRUSKAL-WALLIS ONE-WAY ANALYSIS OF VARIANCE BASED ON LINEAR PLACEMENTS

KRUSKAL-WALLIS ONE-WAY ANALYSIS OF VARIANCE BASED ON LINEAR PLACEMENTS Bull. Korean Math. Soc. 5 (24), No. 3, pp. 7 76 http://dx.doi.org/34/bkms.24.5.3.7 KRUSKAL-WALLIS ONE-WAY ANALYSIS OF VARIANCE BASED ON LINEAR PLACEMENTS Yicheng Hong and Sungchul Lee Abstract. The limiting

More information

Tests for Covariance Matrices, particularly for High-dimensional Data

Tests for Covariance Matrices, particularly for High-dimensional Data M. Rauf Ahmad Tests for Covariance Matrices, particularly for High-dimensional Data Technical Report Number 09, 200 Department of Statistics University of Munich http://www.stat.uni-muenchen.de Tests for

More information

A new method to bound rate of convergence

A new method to bound rate of convergence A new method to bound rate of convergence Arup Bose Indian Statistical Institute, Kolkata, abose@isical.ac.in Sourav Chatterjee University of California, Berkeley Empirical Spectral Distribution Distribution

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

Adjusted Empirical Likelihood for Long-memory Time Series Models

Adjusted Empirical Likelihood for Long-memory Time Series Models Adjusted Empirical Likelihood for Long-memory Time Series Models arxiv:1604.06170v1 [stat.me] 21 Apr 2016 Ramadha D. Piyadi Gamage, Wei Ning and Arjun K. Gupta Department of Mathematics and Statistics

More information

An Introduction to Multivariate Statistical Analysis

An Introduction to Multivariate Statistical Analysis An Introduction to Multivariate Statistical Analysis Third Edition T. W. ANDERSON Stanford University Department of Statistics Stanford, CA WILEY- INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Contents

More information

Stochastic Design Criteria in Linear Models

Stochastic Design Criteria in Linear Models AUSTRIAN JOURNAL OF STATISTICS Volume 34 (2005), Number 2, 211 223 Stochastic Design Criteria in Linear Models Alexander Zaigraev N. Copernicus University, Toruń, Poland Abstract: Within the framework

More information

Modified Simes Critical Values Under Positive Dependence

Modified Simes Critical Values Under Positive Dependence Modified Simes Critical Values Under Positive Dependence Gengqian Cai, Sanat K. Sarkar Clinical Pharmacology Statistics & Programming, BDS, GlaxoSmithKline Statistics Department, Temple University, Philadelphia

More information

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Jianqing Fan Department of Statistics Chinese University of Hong Kong AND Department of Statistics

More information

Least Squares Estimation-Finite-Sample Properties

Least Squares Estimation-Finite-Sample Properties Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions

More information

PATTERNED RANDOM MATRICES

PATTERNED RANDOM MATRICES PATTERNED RANDOM MATRICES ARUP BOSE Indian Statistical Institute Kolkata November 15, 2011 Introduction Patterned random matrices. Limiting spectral distribution. Extensions (joint convergence, free limit,

More information

CONSTRUCTION OF SLICED SPACE-FILLING DESIGNS BASED ON BALANCED SLICED ORTHOGONAL ARRAYS

CONSTRUCTION OF SLICED SPACE-FILLING DESIGNS BASED ON BALANCED SLICED ORTHOGONAL ARRAYS Statistica Sinica 24 (2014), 1685-1702 doi:http://dx.doi.org/10.5705/ss.2013.239 CONSTRUCTION OF SLICED SPACE-FILLING DESIGNS BASED ON BALANCED SLICED ORTHOGONAL ARRAYS Mingyao Ai 1, Bochuan Jiang 1,2

More information

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015 Part IB Statistics Theorems with proof Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

A GENERAL THEORY FOR ORTHOGONAL ARRAY BASED LATIN HYPERCUBE SAMPLING

A GENERAL THEORY FOR ORTHOGONAL ARRAY BASED LATIN HYPERCUBE SAMPLING Statistica Sinica 26 (2016), 761-777 doi:http://dx.doi.org/10.5705/ss.202015.0029 A GENERAL THEORY FOR ORTHOGONAL ARRAY BASED LATIN HYPERCUBE SAMPLING Mingyao Ai, Xiangshun Kong and Kang Li Peking University

More information

September Math Course: First Order Derivative

September Math Course: First Order Derivative September Math Course: First Order Derivative Arina Nikandrova Functions Function y = f (x), where x is either be a scalar or a vector of several variables (x,..., x n ), can be thought of as a rule which

More information

A note on profile likelihood for exponential tilt mixture models

A note on profile likelihood for exponential tilt mixture models Biometrika (2009), 96, 1,pp. 229 236 C 2009 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asn059 Advance Access publication 22 January 2009 A note on profile likelihood for exponential

More information

Asymptotic Statistics-VI. Changliang Zou

Asymptotic Statistics-VI. Changliang Zou Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous

More information

Testing a Normal Covariance Matrix for Small Samples with Monotone Missing Data

Testing a Normal Covariance Matrix for Small Samples with Monotone Missing Data Applied Mathematical Sciences, Vol 3, 009, no 54, 695-70 Testing a Normal Covariance Matrix for Small Samples with Monotone Missing Data Evelina Veleva Rousse University A Kanchev Department of Numerical

More information

Asymptotic efficiency of simple decisions for the compound decision problem

Asymptotic efficiency of simple decisions for the compound decision problem Asymptotic efficiency of simple decisions for the compound decision problem Eitan Greenshtein and Ya acov Ritov Department of Statistical Sciences Duke University Durham, NC 27708-0251, USA e-mail: eitan.greenshtein@gmail.com

More information

A note on a Marčenko-Pastur type theorem for time series. Jianfeng. Yao

A note on a Marčenko-Pastur type theorem for time series. Jianfeng. Yao A note on a Marčenko-Pastur type theorem for time series Jianfeng Yao Workshop on High-dimensional statistics The University of Hong Kong, October 2011 Overview 1 High-dimensional data and the sample covariance

More information

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science 1 Likelihood Ratio Tests that Certain Variance Components Are Zero Ciprian M. Crainiceanu Department of Statistical Science www.people.cornell.edu/pages/cmc59 Work done jointly with David Ruppert, School

More information

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1.1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner

More information

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints Noname manuscript No. (will be inserted by the editor) A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints Mai Zhou Yifan Yang Received: date / Accepted: date Abstract In this note

More information

Testing linear hypotheses of mean vectors for high-dimension data with unequal covariance matrices

Testing linear hypotheses of mean vectors for high-dimension data with unequal covariance matrices Testing linear hypotheses of mean vectors for high-dimension data with unequal covariance matrices Takahiro Nishiyama a,, Masashi Hyodo b, Takashi Seo a, Tatjana Pavlenko c a Department of Mathematical

More information

MATH5745 Multivariate Methods Lecture 07

MATH5745 Multivariate Methods Lecture 07 MATH5745 Multivariate Methods Lecture 07 Tests of hypothesis on covariance matrix March 16, 2018 MATH5745 Multivariate Methods Lecture 07 March 16, 2018 1 / 39 Test on covariance matrices: Introduction

More information

Multivariate-sign-based high-dimensional tests for sphericity

Multivariate-sign-based high-dimensional tests for sphericity Biometrika (2013, xx, x, pp. 1 8 C 2012 Biometrika Trust Printed in Great Britain Multivariate-sign-based high-dimensional tests for sphericity BY CHANGLIANG ZOU, LIUHUA PENG, LONG FENG AND ZHAOJUN WANG

More information

TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST

TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST Econometrics Working Paper EWP0402 ISSN 1485-6441 Department of Economics TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST Lauren Bin Dong & David E. A. Giles Department

More information

Exercises Chapter 4 Statistical Hypothesis Testing

Exercises Chapter 4 Statistical Hypothesis Testing Exercises Chapter 4 Statistical Hypothesis Testing Advanced Econometrics - HEC Lausanne Christophe Hurlin University of Orléans December 5, 013 Christophe Hurlin (University of Orléans) Advanced Econometrics

More information

Topic 4 Notes Jeremy Orloff

Topic 4 Notes Jeremy Orloff Topic 4 Notes Jeremy Orloff 4 auchy s integral formula 4. Introduction auchy s theorem is a big theorem which we will use almost daily from here on out. Right away it will reveal a number of interesting

More information

Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data

Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data Sri Lankan Journal of Applied Statistics (Special Issue) Modern Statistical Methodologies in the Cutting Edge of Science Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations

More information

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Jeremy S. Conner and Dale E. Seborg Department of Chemical Engineering University of California, Santa Barbara, CA

More information

On Selecting Tests for Equality of Two Normal Mean Vectors

On Selecting Tests for Equality of Two Normal Mean Vectors MULTIVARIATE BEHAVIORAL RESEARCH, 41(4), 533 548 Copyright 006, Lawrence Erlbaum Associates, Inc. On Selecting Tests for Equality of Two Normal Mean Vectors K. Krishnamoorthy and Yanping Xia Department

More information

Cluster Validity. Oct. 28, Cluster Validity 10/14/ Erin Wirch & Wenbo Wang. Outline. Hypothesis Testing. Relative Criteria.

Cluster Validity. Oct. 28, Cluster Validity 10/14/ Erin Wirch & Wenbo Wang. Outline. Hypothesis Testing. Relative Criteria. 1 Testing Oct. 28, 2010 2 Testing Testing Agenda 3 Testing Review of Testing Testing Review of Testing 4 Test a parameter against a specific value Begin with H 0 and H 1 as the null and alternative hypotheses

More information

1 Data Arrays and Decompositions

1 Data Arrays and Decompositions 1 Data Arrays and Decompositions 1.1 Variance Matrices and Eigenstructure Consider a p p positive definite and symmetric matrix V - a model parameter or a sample variance matrix. The eigenstructure is

More information

MAS223 Statistical Inference and Modelling Exercises

MAS223 Statistical Inference and Modelling Exercises MAS223 Statistical Inference and Modelling Exercises The exercises are grouped into sections, corresponding to chapters of the lecture notes Within each section exercises are divided into warm-up questions,

More information

Canonical Correlation Analysis of Longitudinal Data

Canonical Correlation Analysis of Longitudinal Data Biometrics Section JSM 2008 Canonical Correlation Analysis of Longitudinal Data Jayesh Srivastava Dayanand N Naik Abstract Studying the relationship between two sets of variables is an important multivariate

More information

On robust and efficient estimation of the center of. Symmetry.

On robust and efficient estimation of the center of. Symmetry. On robust and efficient estimation of the center of symmetry Howard D. Bondell Department of Statistics, North Carolina State University Raleigh, NC 27695-8203, U.S.A (email: bondell@stat.ncsu.edu) Abstract

More information

An introduction to G-estimation with sample covariance matrices

An introduction to G-estimation with sample covariance matrices An introduction to G-estimation with sample covariance matrices Xavier estre xavier.mestre@cttc.cat Centre Tecnològic de Telecomunicacions de Catalunya (CTTC) "atrices aleatoires: applications aux communications

More information

Introduction to Machine Learning. Lecture 2

Introduction to Machine Learning. Lecture 2 Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for

More information

Non white sample covariance matrices.

Non white sample covariance matrices. Non white sample covariance matrices. S. Péché, Université Grenoble 1, joint work with O. Ledoit, Uni. Zurich 17-21/05/2010, Université Marne la Vallée Workshop Probability and Geometry in High Dimensions

More information

STA 2101/442 Assignment 3 1

STA 2101/442 Assignment 3 1 STA 2101/442 Assignment 3 1 These questions are practice for the midterm and final exam, and are not to be handed in. 1. Suppose X 1,..., X n are a random sample from a distribution with mean µ and variance

More information

Primer on statistics:

Primer on statistics: Primer on statistics: MLE, Confidence Intervals, and Hypothesis Testing ryan.reece@gmail.com http://rreece.github.io/ Insight Data Science - AI Fellows Workshop Feb 16, 018 Outline 1. Maximum likelihood

More information

Math 494: Mathematical Statistics

Math 494: Mathematical Statistics Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/

More information

Bootstrapping high dimensional vector: interplay between dependence and dimensionality

Bootstrapping high dimensional vector: interplay between dependence and dimensionality Bootstrapping high dimensional vector: interplay between dependence and dimensionality Xianyang Zhang Joint work with Guang Cheng University of Missouri-Columbia LDHD: Transition Workshop, 2014 Xianyang

More information

Mathematics Ph.D. Qualifying Examination Stat Probability, January 2018

Mathematics Ph.D. Qualifying Examination Stat Probability, January 2018 Mathematics Ph.D. Qualifying Examination Stat 52800 Probability, January 2018 NOTE: Answers all questions completely. Justify every step. Time allowed: 3 hours. 1. Let X 1,..., X n be a random sample from

More information

Practice Problems Section Problems

Practice Problems Section Problems Practice Problems Section 4-4-3 4-4 4-5 4-6 4-7 4-8 4-10 Supplemental Problems 4-1 to 4-9 4-13, 14, 15, 17, 19, 0 4-3, 34, 36, 38 4-47, 49, 5, 54, 55 4-59, 60, 63 4-66, 68, 69, 70, 74 4-79, 81, 84 4-85,

More information

DA Freedman Notes on the MLE Fall 2003

DA Freedman Notes on the MLE Fall 2003 DA Freedman Notes on the MLE Fall 2003 The object here is to provide a sketch of the theory of the MLE. Rigorous presentations can be found in the references cited below. Calculus. Let f be a smooth, scalar

More information