Testing linear hypotheses of mean vectors for high-dimension data with unequal covariance matrices

Size: px

Start display at page:

Download "Testing linear hypotheses of mean vectors for high-dimension data with unequal covariance matrices"

Rosemary Sims
5 years ago
Views:

1 Testing linear hypotheses of mean vectors for high-dimension data with unequal covariance matrices Takahiro Nishiyama a,, Masashi Hyodo b, Takashi Seo a, Tatjana Pavlenko c a Department of Mathematical Information Science, Tokyo University of Science, 1-3, Kagurazaka, Shinjuku-ku, Tokyo, , Japan b Graduate School of Economics, The University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, , Japan, JSPS Research Fellow c Department of Mathematics, KTH Royal Institute of Technology, SE-100, 44, Stockholm, Sweden Abstract We propose a new test procedure for testing linear hypothesis on the mean vectors of normal populations with unequal covariance matrices when the dimensionality, p exceeds the sample size N, i.e. p/n c <. Our procedure is based on the Dempster trace criterion and is shown to be power and size consistent in high dimensions. The asymptotic null and non-null distributions of the proposed test statistic are established in the high dimensional setting and improved estimator of the critical point of the test is derived using Cornish-Fisher expansion. As a special case, our testing procedure is applied to multivariate Behrens-Fisher problem. We illustrate the relevance and benefits of the proposed approach via Monte-Carlo simulation which show that our new test is comparable to, and in many cases is more powerful than, the tests for equality of means presented in the recent literature. Keywords: Cornish-Fisher transform, Dempster trace criterion, High dimensionality, Multivariate Behrens-Fisher problem, (N, p)-asymptotics. Corresponding author. Tel.: ; Fax: address: nishiyam@rs.kagu.tus.ac.jp (Takahiro Nishiyama )

2 1. Introduction The problem of testing mean vectors is a part of many procedures of multivariate statistical analysis, such as multiple comparisons, MANOVA and classification. The standard testing technique is based on classical Hotelling s T test which is known to have optimal performance properties in a large sample case, i.e assuming that the number of feature variables, p is fixed and is much smaller the sample size, N. However, in many practical applications of modern multivariate statistics (e.g. DNA microarray data) the number of feature p exceeds N, so that a straightforward use of T statistics is impossible due to singularity of the sample covariance matrix. Thus, to cope with this high dimensional situation, it would be desirable to develop new tests for N p, and investigate their asymptotic properties when both N and p are going to infinity; this asymptotic framework is also known as (N, p)-asymptotics. There have been a series of important results on this testing problem. In particular, Bai and Saranadasa (1996) focus on (normal) the two-sample case with equal covariance matrix, and propose to use an estimator of Euclidean norm of the shift vector instead of T statistics; they also establish the asymptotic normality of the test statistics assuming that p and N is of the same order. Later, by using the same approach as Bai and Saranadasa (1996), Aoshima and Yata (011) derive the test for which the significance levels can be controlled with or without the assumption of normality of the data, that is robust for the model assumption. Unlike the above approach, Srivastava (007) suggests F -type test statistics based on the Moore-Penrose inverse of the singular sample covariance matrix, and Srivastava and Du (008) have developed the test procedure based on the Dempster s trace criterion (D-criterion) (Dempster (1958,1960)) under the assumption of variable independence. Also it is important to note, that both these procedures are less restrictive than that of Bai and Saranadasa (1996) in a sense that they allow p grow faster than N according to (N, p)-asymptotic framework. Motivated by the previous literature and as part of effort in developing testing procedures with stable characteristics in high-dimensions, we focus on testing linear hypotheses of mean vectors for high-dimensional data with unequal covariance matrices. Our main objective in this paper is to show that our newly derived test statistics based on the Dempster trace criterion has a number of attractive asymptotic properties and demonstrates good performance in large (N, p). We state the asymptotic distribution of the derived test statistics under both the null and the local alternative hypotheses, and

3 provide the explicit expressions for asymptotic power of the test in terms of δ when N = O(p δ ) and 0 < δ < 1. To further improve the test performance, Cornish-Fisher approximation of the upper 100α percentile of the test is provided in (N, p)-asymptotics. We also apply our new test procedure to the multivariate Behrens-Fisher problem and compare its performance with the above-mentioned testing procedures from resent literature. The rest of the paper is organized as follows. Section provides description of the new test and main asymptotic results. Section 3 considers the application to the Behrens-Fisher problem. In Section 4, the level hypothesis is tested and the attained significance levels of the newly derived test is analyzed for a number of high dimensional scenarios. At last, we provide some concluding remarks. The proofs of theorems and lemmas stated in Section are given in the Appendix A and the Appendix B.. Description of the new test and asymptotic properties Let x (i) 1,..., x (i) N i, i = 1,..., k be N i samples from N p (µ i, Σ i ), respectively. We are interesting in the linear testing the hypothesis H 0 : k β i µ i = 0 vs. H 1 : H 0, (.1) i=1 where β 1,..., β k are given scalars and covariance matrices Σ i s are assumed to be unequal. In this study, we consider Bennett (1951) s transform derived Anderson (003) for k-sample case (see, Bennett (1951), Anderson (003)). For convenience, let N 1 be the smallest. Then, for j = 1,..., N 1, we define y j = β 1 x (1) j + k l= β l N1 N l ( x (l) j 1 N 1 N 1 m=1 x (l) m + 1 N1 N l N l Especially, when N 1 = = N k, y j = k l=1 β lx (l) j. The expected value of y j and the covariance matrix of y l and y m are n=1 x (l) n ). E ( ) k y j = β i µ i, i=1 Cov (y l, y m ) = δ lm ( k i=1 βi N 1 Σ i N i ), 3

4 respectively, where δ lm is Kronecker s delta. We further set k β i µ i µ, i=1 k i=1 β i N 1 N i Σ i Σ, respectively, and note that y 1,..., y N1 are independent and identically distributed as N p (µ, Σ). Thus, we may consider testing H 0 : µ = 0 vs. H 1 : µ 0, (.) which is equivalent to (.1). For testing (.) under the assumption that p > n 1 = N 1 1, we define a new test statistic as follows where T D = N 1y y trs, (.3) y = 1 N 1 y N i, S = 1 N 1 (y 1 n i y)(y i y). 1 i=1 T D is based on Dempster trace criterion and does not require any restrictions on the relation between the dimension and sample size. At first, we derive the null distribution of the statistic (.3) under the following (N, p)-asymptotic framework: (A.1) : p, n 1, i=1 Further, in addition to (A1), we assume that trσ i (A.) : lim p p trσ i (A.3) : lim p p p n 1 c (0, ). a i (0, ), i = 1,..., 4, a i (0, ), i = 1,..., 8. Let T D = { } N1 y y p trs 1. (.4) 4

5 The following theorem provides asymptotic null distribution of T D /σ D where a trσ /p σ D = =. a 1 trσ/p Theorem.1. When the null hypothesis H 0 : µ = 0 is true, the distribution function of T D / σ D can be expanded in the asymptotic framework (A.1) and under assumption (A.3) as P ( ) TD z σ D [ 1 = Φ(z) φ(z) p c 3 h (z) + 1 p {c 4h 3 (z) + c 6 h 5 (z)} + 1 ] c h 1 (z) + O(p 3/ ), (.5) n 1 where σ D is obtained by replacing a 1 and a with their estimators, Φ(z) is the distribution function of the standard normal distribution, c = 1, c 3 = a3 3 a 3, c 4 = a 4, c a 6 = a 3, 9a 3 and h i (z) s (i = 1,..., 6) are the Hermite polynomials given by h 1 (z) = z, h (z) = z 1, h 3 (z) = z 3 3z, h 4 (z) = z 4 6z + 3, h 5 (z) = z 5 10z z, h 6 (z) = z 6 15z z 15. Proof. See, Appendix A.1. In practice, c i s and a i s are unknown. Hence, to use the result of Theorem, we need their estimators that are expected to be good in high-dimension setting. As sample counterparts of a i s, we use their (N, p)-consistent and unbiased estimators derived in Srivastava(005), Srivastava and Yanagihara (010) and Hyodo, Takahashi and Nishiyama (01) as â 1 = trs p, 5

6 â = n 1 p(n 1 1)(n 1 + ) n 1 {trs (trs) â 3 = (n 1 1)(n 1 )(n 1 + )(n 1 + 4) { trs 3 } 3(n 1 + )(n 1 1)â 1 â n 1 p â 3 1, p â 4 = 1 ( trs 4 ) pb 1 â 1 p b â b 0 p 1â pb 3 â n 1 p 3 â 4 1, where n 1 }, b 0 = n 1 (n n 1 + 1n ), b 1 = n 1 (n 1 + 6n 1 + 9), b = n 1 (3n 1 + ), b 3 = n 1 (n 1 + 5n 1 + 7). Type 1 error of the asymptotic test based on using the main term of (.5) can be essentially improved by the corrected estimator of the upper 100α-percentile of T D / σ. This correction is achieved by using Cornish-Fisher expansion. Corollary.. Under the asymptotic framework (A.1) and assumption (A.3), Cornish-Fisher expansion of the estimated upper percentile of T D / σ is derived as ẑ(α) = z α + 1 â3 p 3 (z â 3 α 1) + 1 p { â4 â z α (zα 3) â 3 z 9â 3 α (zα 5) } + 1 n 1 z α + O p (p 3/ ), (.6) where z α is the upper 100α% percentile of the standard normal distribution. Proof. See, Appendix A.. Next, we state the asymptotic distribution of the test statistic T D under the local alternative. We state H L(δ) 1 : Nµ µ = O(p δ ), Nµ Σµ = O(p δ ), 0 < δ < 1, (.7) 6

7 where N = k i=1 N i and assume that N 1 /N i c i (0, ) (i = 1,..., k). Then, under the hypothesis H L(δ) 1, the test statistic can be expressed as TD = N 1y y trs 1 N 1µ µ trσ. (.8) Theorem.3 provides the limiting distribution of TD /σ D, where σd trσ = + 4N 1 µ Σµ, (trσ) under the hypothesis H L(δ) 1. Theorem.3. When the local alternative hypothesis H L(δ) 1 is true, the distribution of TD /σ D is asymptotically normal, i.e. TD = N 1y y/trs 1 N 1 µ µ/trσ d N(0, 1), σd (trσ + 4N 1 µ Σµ)/trΣ in the asymptotic framework (A.1) and under assumption (A.). Proof. See, Appendix A.3. Using the asymptotic distribution under the alternative hypothesis, we are able to describe the (N, p)-asymptotic behavior of the power function of our test statistic which is collaborated in the following theorem. Theorem.4. Let Power α ( T D, δ) = P ( ) TD z α H L(δ) 1, σ D be the power function of T D. Then, in the asymptotic framework (A.1) and assumptions (A.), (i) Power α ( T D, δ) α, if 0 < δ < 1/, ( ) (ii) Power α ( T N1 µ µ D, δ) Φ z α, if δ = 1/, trσ (iii) Power α ( T D, δ) 1, if 1/ < δ < 1. 7

8 Proof. See, Appendix A.4. In words, this theorem claims that the test statistic T D is (N, p)-consistent. 3. New test procedure for multivariate Behrens-Fisher problem In this section, we focus on an important special case of the testing problem (.1). We consider testing equality of mean vectors of two normal populations with unequal covariance matrices, that is, we consider testing the hypothesis H 0 : µ 1 = µ vs. H 1 : µ 1 µ. (3.1) We note that (3.1) is the special case that k =, β 1 = 1 and β = 1 in (.1). This problem is known as the multivariate Behrens-Fisher problem, and many authors have discussed (see, e.g., Bennett (1951), Johnson and Weerahandi (1988) and Yanagihara and Yuan (005)). Also, for highdimensional data, testing equality of mean vectors of two populations have been discussed by Bai and Saranadasa (1996), Srivastava (007), Chen and Qin (010), Aoshima and Yata (011), and so on. Especially, Chen and Qin (010) and Aoshima and Yata (011) gave a test statistic for the multivariate Behrens-Fisher problem in high-dimension setting. We propose a new test statistic for this problem by using the idea stated in Section, that is, we consider the following statistic where, for j = 1..., N 1, y j = x (1) j N1 N x () j + T D = N 1y y trs, 1 N 1 N1 N m=1 x () m 1 N N n=1 x () n. (3.) Then we note that y 1,..., y N1 are independent and identically distributed as N p (µ, Σ) where µ = µ 1 µ, Σ = Σ 1 + N 1 N Σ, (3.3) 8

9 respectively (see, Bennett (1951)), that is, (3.1) is equivalent to the following hypothesis: H 0 : µ = 0 vs. H 1 : µ 0. Therefore from Theorem.1, Corollary. and Theorem.4, we have following corollary. Corollary 3.1. Suppose that T D, â i s, c i s and Power α ( T D, δ) are defined according to (3.) and (3.3). Then, under the asymptotic framework (A.1) and assumption (A.3) the asymptotic distribution of T D under the null hypothesis H 0 : µ 1 = µ and Cornish-Fisher expansion of the upper percentile are derived as follows ( ) TD [ 1 P z = Φ(z) φ(z) p c 3 h (z) σ D + 1 p {c 4h 3 (z) + c 6 h 5 (z)} + 1 ] c h 1 (z) + O(p 3/ ), n 1 ẑ(α) = z α + 1 â3 p 3 (z â 3 α 1) + 1 p { â4 â z α (zα 3) â 3 z 9â 3 α (zα 5) respectively, and for the asymptotic power of T D we have } + 1 n 1 z α + O p (p 3/ ), (3.4) (i) Power α ( T D, δ) α, if 0 < δ < 1/, ( ) (ii) Power α ( T N1 µ µ D, δ) Φ z α, if δ = 1/, trσ (iii) Power α ( T D, δ) 1, if 1/ < δ < Simulation study A simulation study shows the effectiveness of the suggested test statistics in high dimension. We first provide a study justifying accuracy of the approximation for the critical point derived in Corollary. of our testing procedure 9

10 by simulating the Attained Significance Level (ASL), or size of the test. We draw k independent samples of size N i = 10(1 + i) and N i = 0(1 + i), where i = 1,..., k valid p-dimensional normal distributions under the null hypothesis (i.e. (.1)). We replicate this r = 10 5 times, and using T D from (.4) calculate ( ( ) of TD / σ > z α H 0 is true) ASL 1 α TD =, r and ( ( ) of TD / σ > ẑ(α) H 0 is true) ASL α TD =, r denoting the ASL of T D where z α is the upper 100α percentile of the standard normal distribution and ẑ(α) is the corrected value of the upper 100α percentile given by (.6). Tables 1-4 provide the results for a number of high-dimensional scenarios and an assortment of null hypothesis H 0 specified by the choice of β i s for i = 1,..., k. Also, we set up the covariance structures Σ 1 = I, Σ = (0. i j ), Σ 3 = (0.5 i j ) for the case k = 3 and Σ 1 = I, Σ = I, Σ 3 = (0. i j ), Σ 4 = (0.5 i j ) for the case k = 4, respectively. For each table, the simulated size of T D is systematically lower for the suggested corrected percentile ẑ(α), and ẑ(α) is closer to the selected nominal level α. Furthermore, in all the settings of H 0, the size of the test remains essentially the same when both p and k grows thereby validating our asymptotic results. Further, we perform a series of power simulations to investigate consistency of our test and to demonstrate its improved performance under certain alternative hypotheses. From now on we focus on the simulations for the case of k =, representing multivariate Behrens-Fisher problem. We provide examples for two cases of H L(δ) 1, = 5 and = 10 for the following settings of Σ i : Table 5 : Σ 1 = I, Σ = (0.5 i j ) and = 5, Table 6 : Σ 1 = I, Σ = (0.5 i j ) and = 10, Table 7 : Σ 1 = (0. i j ), Σ = (0.5 i j ) and = 5, Table 8 : Σ 1 = (0. i j ), Σ = (0.5 i j ) and = 10, 10

11 where = µ 1 µ, and we recall that in our notations, see (.7), N = O(p δ ). Using the same number of replicates as above, we draw k = 1, independent samples of size N k under H L(δ) 1 and using TD from (.8) calculate the empirical power as ( ) EP α TD, δ = of ( ) TD / σ D > ẑ(α) H L(δ) 1 is true. r To make comparisons of T D with the test statistics defined in (Srivastava (007) (S), Srivastava and Du (008) (SD) and Aoshima and Yata (011) (AY)), the same is repeated for the corresponding tests. Srivastava (007) and Srivastava and Du (008) discussed one and two sample problem, but their procedures for testing equality of two mean vectors are based on the assumption of equal covariance structure. So, in order to adapt these procedures to our approach, we at first consider transformation (3.), and adapted transformed y j (j = 1,..., N 1 ) to their procedures for one sample problem. Aoshima and Yata (011) proposed the test procedure for the multivariate Behrens-Fisher problem with size α and power no less than 1 β when L, where α, β (0, 1/) and L (> 0) are prespecified constant. As they assumed L = o(p 1/ ), we set up β = 0.1 and L = /. Also, procedure by Aoshima and Yata (011) does not need the assumption of normality, but in order to compare it with other procedures we carry out the simulation study under normality. Tables 5 to 8 show that our test statistic appear to be consistent as (N, p). Further, under simulated alternative hypothesis H L(δ) 1 with = 5 our statistic performs better than both S and SD, and is comparable to AY. For the alternative with = 10 our test turns out to be most powerful for almost all the settings of p ans N i. Hence, as increases the newly proposed test appears to dominate that of AY, and both are seen to be consistent as grows. We also see that neither test perform particularly well for = 5, in combination with small N i and large p. Further, a series of simulations is provided to demonstrate the advantage of the correction procedure suggested in (3.4) for the critical point of the test. We simulate the ASL of our test using ẑ(α) from the approximation (3.4), and ASL of AY which uses L z α /(z α + z β ) as the critical point (see for details Aoshima and Yata (011)). Both S and SD use z α. Tables

12 give ASL for the following cases: Table 9 : Σ 1 = I, Σ = (0.5 i j ) N 1 = 10, and N = 0, Table 10 : Σ 1 = I, Σ = (0.5 i j ) N 1 = 0, and N = 30, Table 11 : Σ 1 = (0. i j ), Σ = (0.5 i j ) N 1 = 10, and N = 0, Table 1 : Σ 1 = (0. i j ), Σ = (0.5 i j ) N 1 = 0, and N = 30. Also, AY(.5) and AY(5) denote AY with L =.5 and L = 5, respectively. Tables 9 to 1 show that the our test procedure based on ẑ(α) outperforms all other procedures for both types of alternatives, = 5 and = 10, and for all the settings of p and N i, yielding the value of ASL closest to the nominal level α. Lastly, we study the effect of p on the power of the test. From tables 5 to 1 one can see that for both = 5 and = 10 our test statistic appear to have most stable power when p grows, and this result remain valid for very small sizes. 5. Concluding remarks We have proposed a new test statistic for testing linear hypothesis of mean vectors assuming that covariance matrices are unequal. We also suggested a method for correcting the critical point of the test that leads to better performance in large p and small N i case. Simulations indicate that the newly derived test statistic, T D, in (.4) appear to perform well for a range of settings of H 0 specified by β i s and k >, when ẑ(α) from Corollary. is used as a critical point. For the multivariate Behrens-Fisher problem, our test procedure has a comparable power performance to that of Aoshima and Yata (011), and outperforms both procedures derived in Srivastava (007) and Srivastava and Du (008) for all the high-dimensional settings of p and N i and a number of settings of Σ i s. It is especially important to point out that our procedure performs well for small deviations from H 0, i.e. when = 5. In conclusion, our test can be recommended for testing the mean vectors for both k > case and multivariate Behrens-Fisher problem, when p is much larger than N i, and when a small deviation from H 0 is suspected. 1

13 Appendix A. A.1. Proof of Theorem.1. To derive the asymptotic expansion of the distribution of T D / σ D under null hypothesis we need estimators of a 1 and a. So we consider unbiased and (N, p)-consistent estimators of a 1 and a derived in Srivastava (005) (see, section ). By (N, p) consistency, we obtain T D σ D = N 1y y trs pâ. Let w = n 1 (â a ) which implies that w = O p (1) (see Srivastava (005)) and T D / σ D can be expanded as ( T D 1 = (N 1 y y trs) ) 1/ w σ D pâ n 1 a ( 1 = (N 1 y y trs) 1 1 ) w + O p (n 1 ). pâ n 1 a To further explore the distribution of T D / σ D, we derive the asymptotic expansion of the characteristic function. The following lemma is proved in Appendix B. Lemma A.1. Under the asymptotic framework (A.1), the characteristic function of T D / σ D can be expressed as 1/ (it) n C(t) = I (it) 1 / p Σ pa I p + Σ Ez n 1 pa [g(s, y)], 1,Z where g(s, y) = 1 (it)w n 1 a pa (N 1 y y trs) + O p (n 1 ). We now proceed to show Theorem.1 by analyzing 1/ (it) n (i) I (it) 1 / p Σ pa I p + Σ, (ii) n 1 pa Ez [g(s, y)], 1,Z 13

14 respectively. At first, for (i), the following equality are hold 1/ (it) n log I (it) 1 / p Σ log pa I p + Σ n 1 pa = 1 ( ) (it) trσ + 1 ( ) (it) trσ (it) trσ 3 pa pa 3 pa ( ) { (it) trσ 4 + O(p 3/ ) } n 1 (it) trσ 4 pa n 1 pa (it) trσ + O(p 7/ ) } n 1pa = (it) (it) 3 a (it)4 a 4 pa 3 pa + (it) n 1 + O(n 3/ 1 ). Therefore (i) can be rewritten as 1/ (it) I (it) p Σ pa I p + Σ n 1 pa { (it) (it) 3 a 3 = exp (it)4 a 4 pa 3 pa { } { (it) (it) 3 a 3 = exp (it)4 a 4 pa 3 pa n 1 / } + (it) + O(n 3/ 1 ) n 1 + (it)6 a 3 + (it) 9pa 3 n 1 Secondly, we expand Ez [g(s, y)] in (ii) as follows: 1,Z E (z 1,Z ) [g(s, y)] (it) = 1 E[w(N 1 y y trs) + O p (n 1 )] n 1 a pa (it) [{ n ( 1 tr(σz Z = 1 E ΣZ Z ) n 1 a pa p(n 1 + )(n 1 1) (tr(σz Z )) n 3 1 ) trσ p n 1 } +O(n 3/ 1 ). }( tr(σz 1 z 1) tr(σz Z ) ) + O p (n 1 ) n 1 14 ] (1)

15 (it) [{ 1 ( ) = 1 E n 1 a pa (n 1 + )(n 1 1) tr BZZ BZZ 1 ( ( )) ( ) tr BZ n 1 (n 1 + )(n 1 1) Z trσ }{ tr Az 1z1 1 ( )}] tr BZ n Z + O(n 1 ), () 1 where A = Σ ( I p 1 ( ) 1 (it) (it) Σ), B = Σ I p + Σ, pa n 1 pa respectively. lemma. To calculate the expectation in (), we need the following Lemma A.. Let z 1 and Z be mutually independently and distributed as N p (0, I p ) and N pn1 (0, I p I n1 ), respectively. Then the following expectations are calculated as E[tr(BZZ BZZ )]E[tr(Az 1z 1 )] = tra{n 1 (n 1 + 1)trB + n 1 (tra) }, E[tr(BZZ BZZ )tr(bzz )] = 4n 1 (n 1 + 1)trB 3 + n 1 (n 1 + n 1 + 4) trbtrb + n 1(trB) 3, E[{tr(BZZ )} tr(az 1z 1 )] = tra{n 1 trb + n 1(trB) }, E[{tr(BZZ )} 3 ] = 8n 1 trb 3 + 6n 1trBtrB + n 3 1(trB) 3, E[trΣ tr(az 1z 1 )] = trσ tra, E[trΣ tr(bzz )] = trσ trb. Proof. See, Himeno (007). Now, by applying Lemma A. to (), we obtain: E (z 1,Z ) [g(s, y)] = 1 + O(n 3/ 1 ). Summarizing (1) and (), we obtain the expansion of the characteristic function { (it) } [ (it) 3 a 3 C(t) = exp (it)4 a 4 + (it)6 a ] 3 + (it) + O(n 3/ pa 3 pa 9pa 3 1 ). n 1 15

16 By inverting this characteristic function, we get the following density function of T D / σ D ; f(z) = 1 exp{ itz}c(t)dt π [ = φ(z) 1+ 1 c 3 h 3 (z)+ 1 p p {c 4h 4 (z)+c 6 h 6 (z)}+ 1 ] c h (z) +O(n 3/ 1 ), n 1 where φ(z) is the density function of the standard normal distribution, c = 1, c 3 = a3 3 a 3, c 4 = a 4, c a 6 = a 3, 9a 3 and h i (z) s (i = 1,..., 6) are the Hermite polynomials given by h 1 (z) = z, h (z) = z 1, h 3 (z) = z 3 3z, h 4 (z) = z 4 6z + 3, h 5 (z) = z 5 10z z, h 6 (z) = z 6 15z z 15. Therefore, we obtain Theorem.1. A.. Proof of Corollary.. We prepare following Lemma to derive the Cornish-Fisher expansion of the upper 100α percentiles of T D / σ D. Lemma A.3. Let z(α) be z(α) = z α + 1 p q 1 (z α ) + 1 p q (z α ) + 1 n 1 q 3 (z α ), where z α is the upper 100α% point of the standard normal distribution and q 1 (z α ) = a3 3 (z a 3 α 1), q (z α ) = a 4 z a α (zα 3) a 3 z 9a 3 α (zα 5), q 3 (z α ) = z α. 16

17 Then under the framework (A.1) and assumption (A.3), ( ) TD P z(α) = 1 α + O(p 3/ ). σ D Proof. See, Appendix B.. Now, by replacing a i s in Lemma A.3 with their unbiased and consistent estimators â i s for i = 1,..., 4, we obtain Corollary.. A.3. Proof of Theorem.3. We derive the limiting distribution of the statistic TD /σ D under (A.1), (A.) and H L(δ) 1. TD /σ D is expanded as T D σ D = (trσ/trs)n 1y y trσ N 1 µ µ trσ + 4N 1 µ Σµ 1 = trσ + 4N 1 µ Σµ (z 1Σz 1 + N 1 µ Σ 1/ z 1 trσ) + o p (1). Also, the characteristic function of TD /σ D is calculated as { it(z C(t) = Ez 1 [exp 1Σz 1 + }] N 1 µσ 1/ z 1 trσ) + o(1) trσ + 4N 1 µ Σµ { } { (it)trσ = etr (π) p/ etr 1 ( I p trσ + 4N 1 µ Σµ z 1 (it)σ trσ + 4N 1 µ Σµ ) z 1 z 1 + (it) N 1 µ Σ 1/ z 1 trσ + 4N 1 µ Σµ } dz 1 + o(1). Further, we consider the following transformation z 1 = ( 1/ (it)σ I p z 1 trσ + 4N 1 µ Σµ) ( ) 1/ (it)σ (it) I p N1 Σ trσ + 4N 1 µ Σµ trσ + 4N 1 µ Σµ 1/ µ, 17

18 whose Jacobian is given by I p (it)σ/ trσ + 4N 1 µ Σµ 1/. Also, under (A.), 1/ log I (it)σ p trσ + 4N 1 µ Σµ = (it)trσ trσ + 4N 1 µ Σµ + (it) trσ trσ + 4N 1 µ Σµ + o(1). Therefore, 1/ I (it)σ p trσ + 4N 1 µ Σµ ( ) (it)trσ = exp trσ + 4N 1 µ Σµ + (it) trσ + o(1), trσ + 4N 1 µ Σµ and then we obtain the expansion of the characteristic function ( ) ( (it)trσ (it)trσ C(t) = exp exp trσ + 4N 1 µ Σµ trσ + 4N 1 µ Σµ ) ( ) (it) trσ (it) N 1 µ Σµ + exp + o(1) trσ + 4N 1 µ Σµ trσ + 4N 1 µ Σµ { } (it) = exp + o(1). Therefore, T D /σ D d N(0, 1). A.4. Proof of Theorem.4. Let T = T D T D = p N 1µ µ trσ, then the power of T D with significance level α can be expressed as: Power α ( T D, δ) = P (T D > σ D z α T H L(δ) 1 ). 18

19 Then, by the results of Theorem.3, lim Power α( T D, δ) = lim Φ p p ( T σ D z α By the assumption (A.), Power α ( T D, δ) 1 when 1/ < δ < 1, since T, and Power α ( T D, δ) α when 0 < δ < 1/, since T 0 and σd σ D. When δ = 1/, we obtain lim Power α( T D, δ) = Φ p Appendix B. B.1. Proof of Lemma A.1. ( T σ D z α ) σ D ). ( ) N1 µ µ = Φ z α. trσ The characteristic function of T D / σ D is calculated as [ ( (it) C(t) = E exp T )] [ { } ] D (it) = E exp (N 1 y y trs) g(s, y), σ D pa where g(s, y) = 1 (it)w n 1 a pa (N 1 y y trs) + O p (n 1 ). Let z 1 be a p-dimensional random vector distributed as N p (0, I p ), and Z be a p n 1 random matrix such that vec(z ) is distributed as N pn1 (0, I p I n1 ). Then we note that N 1 y y = tr(σ 1/ z 1 z 1Σ 1/ ), n 1 S = Σ 1/ Z Z Σ 1/, and we can rewrite the characteristic function as [ C(t) = E exp = { ( (it) tr(σ 1/ z 1 z pa 1Σ 1/ ) tr(σ1/ Z Z Σ 1/ ) { ( (π) (n1+1)p/ etr 1 ) } (it) I p Σ z 1 z 1 pa { etr 1 ( ) } 1 I p + Σ Z Z g(s, y)dz 1 dz. n 1 pa 19 n 1 )} ] g(s, y)

20 Here, we consider following transformations z 1 = ( I p ) 1/ ( ) n1 / (it) (it) Σ z pa 1, Z = I p + Σ Z n 1 pa, respectively, and the Jacobians for these transformations are (it) I p Σ pa 1/, (it) I p + Σ n 1 pa n 1 / Therefore the characteristic function can be written as 1/ (it) n C(t) = I (it) 1 / p Σ pa I p + Σ n 1 pa { (π) (n1+1)p/ etr 1 z 1z 1 1 } Z Z g(s, y)dz 1dZ = 1/ (it) I (it) p Σ pa I p + Σ n 1 pa n 1 /. Ez [g(s, y)]. 1,Z B.. Proof of Lemma A.3 Let z(α) be the upper 100α percentile of T D / σ D. We further expand z(α) as z(α) = u + 1 p q 1 (u) + 1 p q (u) + 1 n 1 q 3 (u). Now, from the result of Theorem.1, we derive the following expansion ( ) TD 1 α = P z(α) σ D [ 1 = Φ(z(α)) φ(z(α)) p c 3 h (z(α)) + 1 p {c 4h 3 (z(α)) + c 6 h 5 (z(α))} + 1 ] c h 1 (z(α)) + O(p 3/ ). n 1 0

21 Then, by Taylor expansion of Φ, φ and h i s around u, we obtain ( ) [ TD 1 P z(α) = Φ(u) φ(u) p {q 1 (u) c 3 h (u)} σ D + 1 p {q (u) c 4 h 3 (u) c 6 h 5 (u)} + 1 ] {q 3 (u) c h 1 (u)} + O(p 3/ ). n 1 Therefore, we have q 1 (u) = a3 3 (u 1), a 3 q (u) = a 4 u(u 3) a 3 u(u 5), a 9a 3 q 3 (u) = u. Acknowledgements T. Nishiyama s research was in part supported by the travel grant from The Royal Swedish Academy of Sciences, G.S. Magnuson s fond. M. Hyodo s research was in part supported by Japan Society for the Promotion of Science (JSPS). T. Seo s research was in part supported by Grant-in-Aid for Scientific Research (C)( ). T. Pavlenko s research was in part supported by grant from Swedish Research Council ( ). References [1] Anderson, T. W. (003). An Introduction to Multivariate Statistical Analysis Third Edition. Wiley, New york. [] Aoshima, M. and Yata, K. (011). Two stage procedures for highdimensional data. Sequential Analysis, 30, [3] Bai, Z. and Saranadasa, H. (1996). Effect of high dimension: by an example of a two sample problem. Statistica Sinica, 6,

22 [4] Bennett, B. M. (1951). Note on a solution of the generalized Behrens- Fisher problem. Annals of the Institute of Statistical Mathematics,, [5] Chen, S. X. and Qin, Y. L. (010). A two-sample test for highdimensional data with applications to gene-set testing. The Annals of Statistics, 38, [6] Dempster, A. P. (1958). A high dimensional two sample significance test. Annals of Mathematical Statistics, 9, [7] Dempster, A. P. (1960). A significant test for the separation of two highly multivariate small samples. Biometrics, 16, [8] Himeno, T. (007). Asymptotic expansions of the null distributions for the Dempster trace criterion. Hiroshima Mathematical Journal, 37, [9] Hyodo, M., Takahashi, S. and Nishiyama, T. (01). Multiple comparisons among mean vectors when the dimension is larger than the total sample size. Technical Report No.1-01, Statistical Research Group, Hiroshima University. [10] Johnson, R. A. and Weerahendi, S. (1988). A Baysian solution to the multivariate Behrens-Fisher problem. Journal of the American Statistical Association, 83, [11] Srivastava, M. S. (005). Some tests concerning the covariance matrix in high-dimensional data. Journal of Japan Statistical Society, 35, [1] Srivastava, M. S. (007). Multivariate theory for analyzing high dimensional data. Journal of Japan Statistical Society, 37, [13] Srivastava, M. S. and Du, M. (008). A test for the mean vector with fewer observations than the dimension. Journal of Multivariate Analysis, 99, [14] Srivastava, M. S. and Yanagihara, H. (010). Testing the equality of several covariance matrices with fewer observations than the dimension. Journal of Multivariate Analysis, 101,

23 [15] Yanagihara, H. and Yuan, K. H. (005). Three approximate solutions to the multivariate Behrens-Fisher problem. Communications in Statistics - Simulation and Computation, 34,

24 Table 1: ASL in the case of k = 3 and β 1 = β = β 3 = 1 (i = 1,, 3). N i = 10(1 + i) N i = 0(1 + i) p α z α t ASL 1 α ASL α t ASL 1 α ASL α Table : ASL in the case of k = 3, β 1 = 1 and β = β 3 = 1/ (i = 1,, 3). N i = 10(1 + i) N i = 0(1 + i) p α z α t ASL 1 α ASL α t ASL 1 α ASL α

25 Table 3: ASL in the case of k = 4 and β 1 = β = β 3 = β 4 = 1 (i = 1,..., 4). N i = 10(1 + i) N i = 0(1 + i) p α z α t ASL 1 α ASL α t ASL 1 α ASL α Table 4: ASL in the case of k = 4, β 1 = β 3 = 1 and β = β 4 = 1 (i = 1,..., 4). N i = 10(1 + i) N i = 0(1 + i) p α z α t ASL 1 α ASL α t ASL 1 α ASL α

26 Table 5: Empirical powers with Σ 1 = I, Σ = (0.5 i j ) and = 5. N 1 = 10, N = 0 N 1 = 0, N = 30 p α S SD AY NHSP S SD AY NHSP Table 6: Empirical powers with Σ 1 = I, Σ = (0.5 i j ) and = 10. N 1 = 10, N = 0 N 1 = 0, N = 30 p α S SD AY NHSP S SD AY NHSP

27 Table 7: Empirical powers with Σ 1 = (0. i j ), Σ = (0.5 i j ) and = 5. N 1 = 10, N = 0 N 1 = 0, N = 30 p α S SD AY NHSP S SD AY NHSP Table 8: Empirical powers with Σ 1 = (0. i j ), Σ = (0.5 i j ) and = 10. N 1 = 10, N = 0 N 1 = 0, N = 30 p α S SD AY NHSP S SD AY NHSP

28 Table 9: ASL in the case of Σ 1 = I, Σ = (0.5 i j ). N 1 = 10, N = 0 p α S SD AY(.5) AY(5) NHSP Table 10: ASL in the case of Σ 1 = I, Σ = (0.5 i j ). N 1 = 0, N = 30 p α S SD AY(.5) AY(5) NHSP

29 Table 11: ASL in the case of Σ 1 = (0. i j ), Σ = (0.5 i j ). N 1 = 10, N = 0 p α S SD AY(.5) AY(5) NHSP Table 1: ASL in the case of Σ 1 = (0. i j ), Σ = (0.5 i j ). N 1 = 0, N = 30 p α S SD AY(.5) AY(5) NHSP

Testing Block-Diagonal Covariance Structure for High-Dimensional Data

Testing Block-Diagonal Covariance Structure for High-Dimensional Data MASASHI HYODO 1, NOBUMICHI SHUTOH 2, TAKAHIRO NISHIYAMA 3, AND TATJANA PAVLENKO 4 1 Department of Mathematical Sciences, Graduate School