Testing Block-Diagonal Covariance Structure for High-Dimensional Data Under Non-normality


Yuki Yamada, Masashi Hyodo and Takahiro Nishiyama

Department of Mathematical Information Science, Tokyo University of Science, 1-3 Kagurazaka, Shinjuku-ku, Tokyo, Japan; Department of Mathematical Sciences, Graduate School of Engineering, Osaka Prefecture University, 1-1 Gakuen-cho, Naka-ku, Sakai-shi, Osaka, Japan; Department of Business Administration, Senshu University, 2-1-1 Higashimita, Tama-ku, Kawasaki-shi, Kanagawa, Japan.

Hiroshima Statistical Research Group Technical Report, March 30, 2016

Abstract

In this article, we propose a test for making an inference about the block-diagonal covariance structure of a covariance matrix in non-normal high-dimensional data. We prove that the limiting null distribution of the proposed test is normal under mild conditions when its dimension is substantially larger than its sample size. We further study the local power of the proposed test. Finally, we study the finite sample performance of the proposed test via Monte Carlo simulations, and we demonstrate the relevance and benefits of the proposed approach for a number of alternative covariance structures.

AMS 2000 subject classification: Primary 62H15; secondary 62F05.

Key words: Tests of covariance structure, high dimension, statistical hypothesis testing, non-normality.

1. Introduction

In recent years, a number of statistical methods for high-dimensional data such as DNA microarray gene expressions, where the number of feature variables, p, exceeds the sample size, N, have been discussed by many

authors. For analysing high-dimensional data, the p × p covariance matrix Σ is an important measure of dependence among the components of a high-dimensional random vector x, and it thus plays a major role in many statistical procedures. Accordingly, statistical inference on Σ is an important problem. A number of procedures for testing a high-dimensional covariance matrix, including tests of identity, sphericity and diagonal structure, have been proposed by Schott (2005), Srivastava (2005), Akita, Jin and Wakaki (2010), Srivastava and Reid (2012), and others. All these tests assume normality, or moderate dimensionality such that p/N → c for a finite constant c, or both. There has also been substantial research on inference for the covariance matrix under non-normal high-dimensional distributions: see, for example, Chen, Zhang and Zhong (2010), Srivastava, Kollo and von Rosen (2011), Zhong and Chen (2011), Li and Chen (2012), Qiu and Chen (2012), Yata and Aoshima (2013), Srivastava, Yanagihara and Kubokawa (2014) and Himeno and Yamada (2014). In this paper, we provide tests for covariance matrices without the normality assumption while allowing the dimension p to be much larger than the sample size N.

Let x_1, x_2, ..., x_N (N ≥ 4) be independent and identically distributed (i.i.d.) p-dimensional random vectors with E[x_i] = µ and Var[x_i] = Σ. We partition x_i, µ, and Σ into q components:

x_i = (x_i^(1)′, x_i^(2)′, ..., x_i^(q)′)′, µ = (µ^(1)′, µ^(2)′, ..., µ^(q)′)′, Σ = (Σ_gh)_{g,h=1,...,q},

where x_i^(g) and µ^(g) are p_g × 1 vectors and Σ_gh is a p_g × p_h matrix, g, h = 1, ..., q. Note that p = Σ_{g=1}^q p_g. Our interest is to test the covariance structure

H_0: Σ = Σ_d vs. H_A: Σ ≠ Σ_d,   (1.1)

where Σ_d = diag(Σ_11, Σ_22, ..., Σ_qq). When p ≫ N, verifying the block-diagonal structure is very important; if H_0 holds, then the total number of unknown parameters to be estimated for Σ reduces to Σ_{i=1}^q p_i(p_i + 1)/2 instead of p(p + 1)/2.
Further, it is interesting to note that the block-diagonal covariance structure has attracted much attention in convex optimization and machine learning, with the main focus being Gaussian graph structure learning procedures.
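To make the parameter-count reduction concrete, the following is a minimal sketch (the function names are ours, not the paper's) that counts the free parameters of an unrestricted p × p covariance matrix versus a block-diagonal one:

```python
def n_params_full(p):
    # free parameters in an unrestricted p x p symmetric covariance matrix
    return p * (p + 1) // 2

def n_params_block_diag(block_sizes):
    # free parameters under H0: Sigma = diag(Sigma_11, ..., Sigma_qq)
    return sum(pg * (pg + 1) // 2 for pg in block_sizes)

print(n_params_full(100))             # p(p+1)/2 = 5050
print(n_params_block_diag([50, 50]))  # 2 * (50*51/2) = 2550
```

For p = 100 split into two blocks of size 50, the number of unknown parameters roughly halves; with p blocks of size one (a diagonal covariance) it drops to p.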

For testing hypothesis (1.1), Hyodo et al. (2015) proposed a test statistic based on the normalized Frobenius matrix norm under the normality assumption. They used an estimator of trΣ² proposed by Bai and Saranadasa (1996) and Srivastava (2005). Under the normality assumption this estimator is unbiased, but it is not unbiased in general. Thus, in this paper, we propose a new test statistic based on an unbiased estimator, proposed by Himeno and Yamada (2014) and Srivastava, Yanagihara and Kubokawa (2014), that does not need the normality assumption.

Hypothesis (1.1) includes some important special cases. For p_g = 1, that is, for a test of the covariance being diagonal (H_0: Σ = diag(σ_1², ..., σ_p²)), Schott (2005) and Srivastava (2005) proposed approximate tests for a high-dimensional diagonal matrix under the multivariate normality assumption, and Chen, Zhang and Zhong (2010) proposed an approximate test without the normality assumption. Further, for q = 2, that is, for a test of the independence of two sub-vectors (H_0: Σ = diag(Σ_11, Σ_22)), Srivastava and Reid (2012) proposed a test procedure under the multivariate normality assumption, while Yata and Aoshima (2015) proposed a test procedure without the normality assumption. Our testing procedure extends the results derived by Chen, Zhang and Zhong (2010) and Yata and Aoshima (2015).

The rest of the paper is organized as follows. Section 2 presents the test procedure after establishing the asymptotic normality of the test statistic, and Section 3 derives asymptotic powers. Further, in Section 4, the attained significance levels and powers of the suggested test are empirically analyzed. Finally, Section 5 concludes this paper. Some technical details are relegated to the Appendix.

2. Test procedure

2.1. Assumptions

In this section, we state the assumptions made to construct a test for (1.1). Suppose x_1, x_2, ..., x_N are i.i.d.
p-dimensional random vectors such that

x_i = Γz_i + µ for i = 1, ..., N,   (2.1)

where Γ = (Γ^(1)′, Γ^(2)′, ..., Γ^(q)′)′ is a p × m constant matrix such that ΓΓ′ = Σ, and z_1, z_2, ..., z_N are i.i.d. m-dimensional random vectors such that E[z_i] = 0 and Var[z_i] = I_m, the m × m identity matrix. Note that x_i^(g) = Γ^(g) z_i + µ^(g). Further, we use the following assumptions as necessary.

(A1) Let z_i = (z_i1, ..., z_im)′, with each z_ij having a uniformly bounded 4th moment, and suppose there exist finite constants κ_3 and κ_4 such that E[z_ij³] = κ_3, E[z_ij⁴] = κ_4 + 3, and for any positive integers r and α_l such that α_l ≤ 4 and Σ_{l=1}^r α_l ≤ 8,

E[z_{ij_1}^{α_1} z_{ij_2}^{α_2} ⋯ z_{ij_r}^{α_r}] = E[z_{ij_1}^{α_1}] E[z_{ij_2}^{α_2}] ⋯ E[z_{ij_r}^{α_r}]

whenever j_1, j_2, ..., j_r are distinct indices.

(A1′) Let z_i = (z_i1, ..., z_im)′, with each z_ij having a uniformly bounded 8th moment, and suppose there exist finite constants κ_3 and κ_4 such that E[z_ij³] = κ_3, E[z_ij⁴] = κ_4 + 3, and for any positive integers r and α_l such that Σ_{l=1}^r α_l ≤ 8, E[z_{ij_1}^{α_1} ⋯ z_{ij_r}^{α_r}] = E[z_{ij_1}^{α_1}] ⋯ E[z_{ij_r}^{α_r}] whenever j_1, j_2, ..., j_r are distinct indices.

(A2) We assume one of the following asymptotic frameworks:
(i) N → ∞, q is fixed and at least one of p_1, ..., p_q goes to infinity;
(ii) N → ∞, q → ∞ and p_1, ..., p_q are fixed.

(A3) Let τ = Σ_{g,h=1, g≠h}^q trΣ_gg² trΣ_hh². Then, under (A2),

τ^{-1} min[ trΣ⁴, Σ_{g,h=1}^q {(trΣ_gg⁴)^{1/4} (trΣ_hh⁴)^{1/4}}² ] = o(1).

(A4) Under (A2),

τ^{-1} Σ_{g≠g′, h≠h′, (g,h)≠(g′,h′)} tr(Σ_gh Σ_hg) tr(Σ_g′h′ Σ_h′g′) = o(1),
τ^{-1} N tr{(Σ − Σ_d)Σ}² = o(1),
τ^{-1} Σ_{g=1}^q Σ_{g′≠h′} trΣ_gg² tr(Σ_g′h′ Σ_h′g′) = o(1).

(A5) Under (A2),

(1/N²) {tr(Σ − Σ_d)²}^{-2} Σ_{g≠g′, h≠h′} tr(Σ_gh Σ_hg) tr(Σ_g′h′ Σ_h′g′) = o(1).

We state some remarks on the conditions. In assumption (A1), the requirement of common third and fourth moments of z_ij is not essential and is purely for the sake of simpler notation. Our theory allows different third and fourth moments as long as they are uniformly bounded, which is assured by z_ij having a uniformly bounded fourth moment. We sometimes assume (A1′) instead of (A1) to investigate the power properties of the test. Assumption (A2)-(i) means that the number of blocks is fixed but the block dimensions are large, whereas assumption (A2)-(ii) means that each block is small but the number of blocks is large; in both cases the matrix is high-dimensional. We discuss some specific examples of assumptions (A3) and (A4). Assumption (A3) is used to derive the asymptotic null distribution, and assumption (A4) is used to derive the asymptotic non-null distribution.

Let us consider the case q = 2. If q = 2, H_0 is the hypothesis of no correlation between x_i^(1) and x_i^(2). For this hypothesis, Yata and Aoshima (2015) proposed a test statistic using the extended cross-data-matrix methodology and showed the asymptotic normality of this statistic under assumption (A1) and the conditions

(C1) min_{g=1,2} { trΣ_gg⁴ / (trΣ_gg²)² } → 0 as p → ∞,
(C2) lim sup_{m→∞} { N tr(Σ_12 Σ_21) / (trΣ_11² trΣ_22²)^{1/2} } < ∞,

where m = min{N, p}. Note that if (C1) holds,

min{ trΣ⁴, 2(trΣ_11⁴)^{1/2}(trΣ_22⁴)^{1/2} } / τ ≤ 2(trΣ_11⁴)^{1/2}(trΣ_22⁴)^{1/2} / τ → 0 as p → ∞.

We also note that if (C1) and (C2) hold,

0 ≤ N tr{(Σ − Σ_d)Σ}² / τ ≤ 4N λ_max(Σ_11) λ_max(Σ_22) tr(Σ_12 Σ_21) / (trΣ_11² trΣ_22²) → 0 as p → ∞.

Here, λ_max(A) denotes the largest eigenvalue of a matrix A. From these results, (A3) and (A4) are quite similar to, but somewhat weaker than, conditions (C1) and (C2) formulated by Yata and Aoshima (2015). Thus, we can compare our statistic with that proposed by Yata and Aoshima (2015) under assumptions (A1), (C1), and (C2).

We now consider specific types of the alternative covariance structure under p_1 = p_2 = ⋯ = p_q. The following covariance structure is called the blocked

compound symmetric (BCS) structure:

Σ = I_q ⊗ (Σ_2 − R) + (1_q 1_q′) ⊗ R,

where Σ_2 is a p_1 × p_1 positive definite symmetric matrix and R is a p_1 × p_1 symmetric matrix. Here, 1_q denotes a q-dimensional vector with all elements equal to one. If q = p, Σ_2 = σ² and R = σ²ρ, where −1/(p−1) ≤ ρ ≤ 1, then

Σ = σ²{(1 − ρ)I_p + ρ 1_p 1_p′}.

This covariance structure is called the intraclass correlation structure. Note that

τ = q(q−1)(trΣ_2²)², Σ_{g,h=1}^q {(trΣ_gg⁴)^{1/4}(trΣ_hh⁴)^{1/4}}² = q² trΣ_2⁴,

N tr{(Σ − Σ_d)Σ}² = N{ q(q−1) tr(RΣ_2)² + 2q(q−1)(q−2) tr(R³Σ_2) + q(q−1)(q² − 3q + 3) trR⁴ }.

Using the results for the correlation matrix in Yata and Aoshima (2015), we get

tr(RΣ_2)² ≤ λ_max²(Σ_2) trR², tr(R³Σ_2) ≤ {tr(RΣ_2)² trR⁴}^{1/2} ≤ λ_max(Σ_2)(trR²)^{1/2}(trR⁴)^{1/2}, trR⁴ ≤ λ_max²(Σ_2) trR².

Suppose that

(A3′) trΣ_2⁴ / (trΣ_2²)² → 0 as p → ∞,
(A4′) lim sup_{m→∞} { N trR² / trΣ_2² } < ∞.

If (A2)-(i), (A3′), and (A4′) are assumed, then (A3) and (A4) hold. Further, if we assume (A2)-(ii) and r_ij ≍ 1/√(Nq), then (A3) and (A4) hold. Here, a ≍ b means that a = O(b) and b = O(a), and r_ij denotes an element of the correlation matrix R.
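The BCS structure is easy to generate numerically via Kronecker products. The following is a small illustrative sketch (the function name is ours, not the paper's) that builds Σ = I_q ⊗ (Σ_2 − R) + (1_q 1_q′) ⊗ R; with 1 × 1 blocks it reproduces the intraclass correlation structure:

```python
import numpy as np

def bcs_covariance(q, Sigma2, R):
    """Blocked compound symmetry: Sigma = I_q kron (Sigma2 - R) + (1_q 1_q') kron R.

    Every diagonal block equals Sigma2; every off-diagonal block equals R."""
    Sigma2 = np.asarray(Sigma2, dtype=float)
    R = np.asarray(R, dtype=float)
    ones = np.ones((q, q))
    return np.kron(np.eye(q), Sigma2 - R) + np.kron(ones, R)

# With q blocks of size one, Sigma2 = sigma^2 and R = sigma^2 * rho, this is
# the intraclass correlation structure sigma^2 {(1 - rho) I_p + rho 1_p 1_p'}:
S = bcs_covariance(4, [[1.0]], [[0.3]])
```

Under H_0 one would instead take R = O, which reduces the construction to the block-diagonal Σ = I_q ⊗ Σ_2 used in the simulations of Section 4.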

2.2. Test statistic

We note that H_0 is valid if and only if Σ_gh = O for all g ≠ h, and the latter holds if and only if tr(Σ − Σ_d)² = 0. A test for the hypothesis H_0 vs. H_A can thus be based on an unbiased estimator of tr(Σ − Σ_d)² = trΣ² − trΣ_d². We assume (2.1), and let

x̄ = (1/N) Σ_{i=1}^N x_i, S = {1/(N−1)} Σ_{i=1}^N (x_i − x̄)(x_i − x̄)′, Q = {1/(N−1)} Σ_{i=1}^N {(x_i − x̄)′(x_i − x̄)}².

Then an unbiased estimator of trΣ² is

tr̂Σ² = {(N−1)/(N(N−2)(N−3))} {(N−1)(N−2) trS² + (trS)² − NQ}.

This estimator has been proposed in Himeno and Yamada (2014) and Srivastava, Yanagihara and Kubokawa (2014). From these results, it is straightforward to propose an unbiased estimator of trΣ_d²:

tr̂Σ_d² = {(N−1)/(N(N−2)(N−3))} Σ_{g=1}^q {(N−1)(N−2) trS_g² + (trS_g)² − NQ_g},

where

x̄^(g) = (1/N) Σ_{i=1}^N x_i^(g), S_g = {1/(N−1)} Σ_{i=1}^N (x_i^(g) − x̄^(g))(x_i^(g) − x̄^(g))′, Q_g = {1/(N−1)} Σ_{i=1}^N {(x_i^(g) − x̄^(g))′(x_i^(g) − x̄^(g))}².

From these facts, we construct the test statistic

T = tr̂Σ² − tr̂Σ_d²,

which is an unbiased estimator of tr(Σ − Σ_d)² under (2.1). The following theorem shows that the statistic T is a ratio-consistent estimator of tr(Σ − Σ_d)².

Theorem 2.1. Under assumptions (A1), (A3), and (A5), T / tr(Σ − Σ_d)² →p 1. Here, →p denotes convergence in probability.

(Proof) The assertion follows immediately from Lemma A.3.

2.3. Asymptotic normality of the test statistic

We first introduce the asymptotic variance of T:

σ² = {4/(N(N−1))} Σ_{g≠g′, h≠h′} tr(Σ_gh Σ_hg) tr(Σ_g′h′ Σ_h′g′) + (8/N) tr{(Σ − Σ_d)Σ}² + (4κ_4/N) tr({Γ′(Σ − Σ_d)Γ} ∘ {Γ′(Σ − Σ_d)Γ}),

where ∘ denotes the Hadamard product, that is, (A ∘ B)_ij = (A)_ij (B)_ij. (See Lemma A.3 for details.)

Remark 2.1. Under H_0: Σ = Σ_d,

σ²_{H0} = {4/(N(N−1))} Σ_{g,h=1, g≠h}^q trΣ_gg² trΣ_hh².

Note that for any symmetric matrix A, tr(A ∘ A) ≤ trA². Hence,

tr({Γ′(Σ − Σ_d)Γ} ∘ {Γ′(Σ − Σ_d)Γ}) ≤ tr{(Σ − Σ_d)Σ}².

Thus, under assumptions (A1)-(A4), σ²/σ²_{H0} → 1.

The following theorem establishes the asymptotic normality of T.

Theorem 2.2. Under assumptions (A1)-(A4),

{T − tr(Σ − Σ_d)²} / σ_{H0} →d N(0, 1),

where →d denotes convergence in distribution.

(Proof) See Appendix A.2.

To test H_0, it is necessary to estimate σ²_{H0}. Using the unbiased estimator of trΣ_gg²,

tr̂Σ_gg² = {(N−1)/(N(N−2)(N−3))} {(N−1)(N−2) trS_g² + (trS_g)² − NQ_g},

we estimate σ²_{H0} by

σ̂²_{H0} = {4/(N(N−1))} Σ_{g,h=1, g≠h}^q tr̂Σ_gg² tr̂Σ_hh².

The following theorem shows that σ̂²_{H0} is a ratio-consistent estimator of σ²_{H0}.

Theorem 2.3. Under H_0 and assumptions (A1) and (A2), σ̂²_{H0} / σ²_{H0} →p 1.

(Proof) See Appendix A.3.

Applying Theorems 2.2 and 2.3, under H_0,

T / σ̂_{H0} →d N(0, 1).

Hence, we propose a test based on the test statistic T that rejects H_0 if T > σ̂_{H0} z_α, where z_α is the upper 100α% point of the standard normal distribution at significance level α.

3. Asymptotic power

In this section, we discuss the asymptotic power of our statistic T. We define the power of the test under H_A: Σ ≠ Σ_d as

Power_{YHN} = Pr(T > σ̂_{H0} z_α | Σ ≠ Σ_d).

First, we compare the asymptotic power of our statistic with that of Yata and Aoshima's statistic. From Theorems 2.2 and 2.3, we obtain the following corollary.

Corollary 3.1. We assume q = 2, (C1), and (C2). Then,

{T − tr(Σ − Σ_d)²} / σ̂_{H0} →d N(0, 1).
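Returning to the rejection rule of Section 2.3: given T and per-block estimates of trΣ_gg², the decision T > σ̂_{H0} z_α is a short computation. A hedged sketch (names are ours; z_α is taken from the standard normal quantile in the Python standard library):

```python
import numpy as np
from statistics import NormalDist

def reject_h0(T, tr_sigma2_blocks, N, alpha=0.05):
    """Reject H0 when T > sigma_hat_{H0} * z_alpha.

    tr_sigma2_blocks: estimates of tr(Sigma_gg^2) for g = 1, ..., q."""
    t = np.asarray(tr_sigma2_blocks, dtype=float)
    # sigma_hat^2_{H0} = 4/(N(N-1)) * sum_{g != h} trSigma_gg^2 * trSigma_hh^2
    sum_off_diag = t.sum() ** 2 - np.sum(t ** 2)
    sigma_h0 = np.sqrt(4.0 / (N * (N - 1)) * sum_off_diag)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)   # upper 100*alpha% point
    return bool(T > sigma_h0 * z_alpha), float(sigma_h0)
```

The identity (Σ_g t_g)² − Σ_g t_g² used for the double sum avoids an explicit loop over the q(q−1) off-diagonal pairs.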

Remark 3.1. Let

δ = tr(Σ_12 Σ_21) / [ {2/(N(N−1))} trΣ_11² trΣ_22² ]^{1/2}.

From Corollary 3.1, under q = 2, (A1), (A2), (C1), and (C2),

Power_{YHN} → 1 − Φ(z_α − δ_0) if δ → δ_0 < ∞, and Power_{YHN} → 1 if δ → ∞.

From the above results and Theorem 4.1 in Yata and Aoshima (2015), the power of the statistic T when q = 2 is asymptotically equivalent to that of Yata and Aoshima's statistic.

Second, we derive the asymptotic power of our test under a specific type of alternative covariance structure, the BCS structure. Let

δ = {q(q−1)N(N−1)}^{1/2} trR² / (2 trΣ_2²).

Using Theorem 2.2, under (A1), (A2)-(i), (A3′), and (A4′), or under (A1), (A2)-(ii) and r_ij ≍ 1/√(Nq),

{T − q(q−1) trR²} / σ̂_{H0} →d N(0, 1).

From the above results, we obtain the asymptotic power of our statistic as follows:

Power_{YHN} → 1 − Φ(z_α − δ_0) if δ → δ_0 < ∞, and Power_{YHN} → 1 if δ → ∞.

Finally, we derive the asymptotic distribution of T when we cannot assume (A4). Moreover, we consider the asymptotic power of our test under H_A: Σ ≠ Σ_d. To derive the asymptotic distribution of T, we assume assumption (A1′) instead of assumption (A1). In assumption (A1′), the requirement of common third and fourth moments of z_ij is not essential and is purely for the sake of simpler notation. Our theory allows different third and fourth moments as long as they are uniformly bounded, which is assured by z_ij having a uniformly bounded eighth moment. The following theorem establishes the asymptotic normality of T.

Theorem 3.1. Under assumptions (A1′), (A2), and (A3),

{T − tr(Σ − Σ_d)²} / σ →d N(0, 1).

(Proof) See Appendix A.4.

We define

ν = tr(Σ − Σ_d)² / σ,

which may be viewed as a signal-to-noise ratio for the testing problem, because tr(Σ − Σ_d)² is the square of the Frobenius norm of the difference between Σ and Σ_d. The following corollary expresses the asymptotic power of the test more clearly.

Corollary 3.2. Under assumptions (A1′), (A2), and (A3),

Power_{YHN} → 1 − Φ((σ_{H0}/σ) z_α − ν_0) if ν → ν_0 < ∞, and Power_{YHN} → 1 if ν → ∞.

Thus, if the difference between Σ and Σ_d is not too small, in the sense that tr(Σ − Σ_d)² is of the same order as σ or of a larger order, the test will be powerful. Conversely, if the difference between Σ and Σ_d is so small that tr(Σ − Σ_d)² is of a smaller order than σ, the test will not be powerful and cannot distinguish H_0 from H_A.

4. Simulation studies

We now investigate the effectiveness of the proposed test statistic by numerical studies. We first provide a numerical evaluation by simulating the attained significance level (ASL), or size, of the test. The ASL is defined by

ASL(α) = #{T_0 > σ̂_{H0} z_α} / r,

where T_0 is the value of the test statistic T calculated from N independent observations under the null hypothesis H_0, r is the number of replications, and z_α is the upper 100α percentile of the standard normal distribution. In

our simulation studies, we set N = 10, 50, 100. We evaluate the ASL under the diagonal structure

Σ = diag( {0.5 + 1/(p+1)}^{1/2}, ..., {0.5 + p/(p+1)}^{1/2} )

for p = 100, 200, 400, 1000, and under the block-diagonal structure

Σ = I_q ⊗ Σ_2.

Here, Σ_2 = B(0.3^{|i−j|^{1/3}})B, where B = diag( {0.5 + 1/(d+1)}^{1/2}, ..., {0.5 + d/(d+1)}^{1/2} ). Throughout the simulations, for j = 1, ..., N, z_j = (z_ij) in assumption (A1) is generated in the following cases:

Case 1) z_j ∼ N_p(0, I_p); Case 2) z_ij = (u_ij − 1)/√2 for u_ij ∼ χ_1²; Case 3) z_ij ∼ SN(1); Case 4) z_ij = u_ij/√3 for u_ij ∼ t_3; Case 5) z_ij = u_ij/√(5/3) for u_ij ∼ t_5.

Further, we set α = 0.05 and

(q, d) = (2, 50), (2, 100), (2, 200), (2, 500), (5, 20), (5, 40), (5, 80), (5, 200).

Tables 1-3 give the results for the ASL. Overall, it may be noted that although the approximation accuracy is low for N = 10, it becomes good as the values of p and N become large. The procedures proposed by Srivastava (2005) and Srivastava and Reid (2012) need the assumption of normality; thus, these procedures are not applicable to Cases 2-5. However, we note that the accuracy of approximation for our procedure is almost the same across Cases 1-5. Further, because Yata and Aoshima (2015) provide a procedure under the

HDLSS framework, for this simulation setting, we observe that our procedure is better than that of Yata and Aoshima (2015). Further, we compute the empirical power (EP) under certain alternative covariance structures. The EP is defined as

EP(α) = #{T_1 > σ̂_{H0} z_α | Σ ≠ Σ_d} / r,

where T_1 is the value of the test statistic T calculated from N independent observations under the alternative hypothesis H_1. When H_0 is that Σ is a diagonal covariance matrix, we set the alternative hypotheses as follows:

(i) H_1: Σ = (σ_ij), σ_ij = δ_ij{0.5 + i/(p+1)}^{1/2} + ρ I(|i−j| ≠ 0) for i, j = 1, 2, ..., p, and
(ii) H_1: Σ = D^{1/2}(ρ 1_p 1_p′ + (1 − ρ)I_p)D^{1/2}, where D = diag( {0.5 + 1/(p+1)}^{1/2}, ..., {0.5 + p/(p+1)}^{1/2} ).

When H_0: Σ = I_q ⊗ Σ_2, we set the following alternative hypotheses:

(i) H_1: Σ = I_q ⊗ (Σ_2 − ρ 1_d 1_d′) + (1_q 1_q′) ⊗ (ρ 1_d 1_d′), and
(ii) H_1: Σ = (I(|i−j| = 1))_{1≤i,j≤q} ⊗ (ρ 1_d 1_d′) + I_q ⊗ Σ_2.

Tables 4-9 give the results for the EP. As a whole, it may be noted that the power becomes high when the values of p and N become large. However, for case (ii) and q = 2, the power becomes low when p is large. Besides, for q = 2, we observe that the power of our procedure is almost the same as that of Yata and Aoshima (2015) and Srivastava and Reid (2012) for Cases 1 and 3. Further, we note that the power for q = 2 is higher than that for q = 5 in case (i), and the power for q = 5 is higher than that for q = 2 in case (ii). When H_0 is that Σ is a diagonal covariance matrix, we observe that the power of our procedure is almost the same as that of Srivastava (2005) for Cases 1 and 3, and the proposed test is slightly more powerful than that of Srivastava (2005) in Cases 2, 4 and 5.
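A Monte Carlo sketch of the ASL computation in this section (a simplified stand-in for the authors' full study: Gaussian null data only, our own function names, small r) could look like:

```python
import numpy as np
from statistics import NormalDist

def tr_sigma2_hat(X):
    # unbiased estimator of tr(Sigma^2) (formula from Section 2.2)
    N = X.shape[0]
    D = X - X.mean(axis=0)
    S = D.T @ D / (N - 1)
    Q = np.sum(np.sum(D * D, axis=1) ** 2) / (N - 1)
    return (N - 1) / (N * (N - 2) * (N - 3)) * (
        (N - 1) * (N - 2) * np.trace(S @ S) + np.trace(S) ** 2 - N * Q)

def attained_significance_level(N, block_sizes, alpha=0.05, r=500, seed=0):
    """Fraction of null replications with T > sigma_hat_{H0} * z_alpha,
    drawing X ~ N(0, I_p) so that H0 holds (Case 1 only)."""
    rng = np.random.default_rng(seed)
    p = sum(block_sizes)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    edges = np.cumsum([0] + list(block_sizes))
    rejections = 0
    for _ in range(r):
        X = rng.standard_normal((N, p))
        blocks = [tr_sigma2_hat(X[:, a:b]) for a, b in zip(edges[:-1], edges[1:])]
        T = tr_sigma2_hat(X) - sum(blocks)
        t = np.asarray(blocks)
        sigma2_h0 = 4.0 / (N * (N - 1)) * (t.sum() ** 2 - np.sum(t ** 2))
        rejections += T > np.sqrt(sigma2_h0) * z_alpha
    return rejections / r
```

An empirical-power loop is identical except that X is drawn under a chosen alternative covariance (e.g., the BCS structures above) rather than under H_0.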

5. Concluding remarks

In this paper, we discussed a test for the block-diagonal covariance structure of the covariance matrix in non-normal, high-dimensional settings. This hypothesis includes some important special cases: when each block size is one, it is a test of the covariance being diagonal, and when there are two blocks, it is a test of the independence of two sub-vectors. Further, the validity of commonly used high-dimensional statistical procedures, including (diagonal or block-diagonal) linear classifiers, requires the assumption of (diagonal or block-diagonal) covariance matrices. A test for the block-diagonal covariance structure of the covariance matrix is usually based on the likelihood ratio test. Akita, Jin and Wakaki (2010) derived the Edgeworth expansion of a likelihood ratio test statistic when both the sample size and the dimension are large, and simulated the goodness of fit of its approximation. However, the likelihood ratio test cannot be used when p > N. Motivated by an unbiased estimator of the Frobenius norm of Σ − Σ_d, Hyodo et al. (2015) proposed an approximate test under a multivariate normal population. The difference between our study and theirs lies in the population distribution: under a non-normal population, we proposed an approximate test based on the asymptotic normality of the unbiased estimator of tr(Σ − Σ_d)² that does not need the normality assumption, and we also studied its asymptotic power. The type-I error and power of our test have been investigated under some non-normal settings by simulation. Through the simulation results, we have confirmed that the approximate test performs well in most cases except for N = 10. However, some multivariate non-normal distributions will not satisfy assumption (A1) (or (A1′)). For instance, elliptical distributions other than the multivariate normal model seldom satisfy assumption (A1) (or (A1′)).
For these reasons, it is interesting to investigate the robustness of our test under elliptical distributions. This is an important topic for future research that is of both theoretical and practical interest.

Acknowledgments. The research of the third author was supported in part by a Grant-in-Aid for Young Scientists (B) from the Japan Society for the Promotion of Science.

References

[1] T. Akita, J. Jin, H. Wakaki, High-dimensional Edgeworth expansion of a test statistic on independence and its error bound. J. Multivariate Anal. (2010).
[2] Z. Bai, H. Saranadasa, Effect of high dimension: by an example of a two sample problem. Statist. Sinica 6 (1996).
[3] S. X. Chen, L. X. Zhang, P. S. Zhong, Tests for high-dimensional covariance matrices. J. Amer. Statist. Assoc. 105 (2010).
[4] T. Himeno, T. Yamada, Estimations for some functions of covariance matrix in high dimension under non-normality and its applications. J. Multivariate Anal. (2014).
[5] M. Hyodo, N. Shutoh, T. Nishiyama, T. Pavlenko, Testing block-diagonal covariance structure for high-dimensional data. Stat. Neerl. (2015).
[6] J. Li, S. X. Chen, Two sample tests for high-dimensional covariance matrices. Ann. Statist. 40 (2012).
[7] Y. Qiu, S. X. Chen, Test for bandedness of high-dimensional covariance matrices and bandwidth estimation. Ann. Statist. 40 (2012).
[8] J. R. Schott, Testing for complete independence in high dimensions. Biometrika 92 (2005).
[9] M. S. Srivastava, Some tests concerning the covariance matrix in high-dimensional data. J. Japan Statist. Soc. 35 (2005).
[10] M. S. Srivastava, T. Kollo, D. von Rosen, Some tests for the covariance matrix with fewer observations than the dimension under non-normality. J. Multivariate Anal. (2011).
[11] M. S. Srivastava, N. Reid, Testing the structure of the covariance matrix with fewer observations than the dimension. J. Multivariate Anal. 112 (2012).

[12] M. S. Srivastava, H. Yanagihara, T. Kubokawa, Tests for covariance matrices in high dimension with less sample size. J. Multivariate Anal. (2014).
[13] K. Yata, M. Aoshima, High-dimensional inference on covariance structures via the extended cross-data-matrix methodology. Submitted (arXiv preprint), 2015.
[14] K. Yata, M. Aoshima, Correlation tests for high-dimensional data using extended cross-data-matrix methodology. J. Multivariate Anal. (2013).
[15] P. S. Zhong, S. X. Chen, Tests for high-dimensional regression coefficients with factorial designs. J. Amer. Statist. Assoc. 106 (2011).

A. Appendix

A.1. Preliminary results

Lemma A.1. Let A, B, C and D be m × m symmetric matrices and let E, F be m × m matrices. Under assumption (A1), we have:

i) E[z_1′Ez_1 z_1′Fz_1] = κ_4 tr(E ∘ F) + trE trF + tr(EF) + tr(EF′);

ii) E[z_1′Az_2 z_1′Bz_2 z_1′Ez_2] = κ_3² tr{(A ∘ B)E′};

iii) E[z_1′Az_2 z_1′Bz_2 z_1′Cz_2 z_1′Dz_2] = κ_4² tr{(A ∘ B)(C ∘ D)} + κ_4 tr{(AB) ∘ (CD)} + κ_4 tr{(AC) ∘ (BD)} + κ_4 tr{(AD) ∘ (BC)} + trAB trCD + trAC trBD + trAD trBC + tr(ABCD) + tr(ABDC) + tr(ACBD) + tr(ACDB) + tr(ADBC) + tr(ADCB);

iv) E[( Σ_{i≠j}^m a_ij z_1i z_1j )⁴] ≤ const. ( Σ_{i≠j}^m a_ij² )².

(Proof) See the supplemental material.

Lemma A.2. Let A^(g) = Γ^(g)′Γ^(g) = (a_ij^(g)) and Σ_{g=1}^q A^(g) = Γ′Γ = A = (a_ij). It holds that:

i) tr{(A^(g)A^(h))²} ≤ min[ Σ_{g,h=1}^q {(trΣ_gg⁴)^{1/4}(trΣ_hh⁴)^{1/4}}², trΣ⁴ ];

ii) Σ_{g≠g′, h≠h′} tr(A^(g)A^(g′)A^(h)A^(h′)) ≤ min[ Σ_{g,h=1}^q {(trΣ_gg⁴)^{1/4}(trΣ_hh⁴)^{1/4}}², 2 trΣ⁴ ];

iii) tr[ Σ_{g≠g′, h≠h′} (A^(g) ∘ A^(h))(A^(g′) ∘ A^(h′)) ] ≤ min[ Σ_{g,h=1}^q {(trΣ_gg⁴)^{1/4}(trΣ_hh⁴)^{1/4}}², 2 trΣ⁴ ];

iv) Σ_{g≠g′, h≠h′} tr{(A^(g) ∘ A^(g′))(A^(h′) ∘ A^(h))} ≤ min[ Σ_{g,h=1}^q {(trΣ_gg⁴)^{1/4}(trΣ_hh⁴)^{1/4}}², 4 trΣ⁴ ];

v) tr[ Σ_{g_1≠h_1, g_2≠h_2, g_3≠h_3, g_4≠h_4} (A^(g_1) ∘ A^(g_3))(A^(h_1) ∘ A^(h_3))(A^(g_2) ∘ A^(g_4))(A^(h_2) ∘ A^(h_4)) ] ≤ min[ Σ_{g,h=1}^q {(trΣ_gg⁴)^{1/4}(trΣ_hh⁴)^{1/4}}², 128 trΣ⁴ ];

vi) Σ_{g_1≠h_1, g_2≠h_2, g_3≠h_3, g_4≠h_4} tr(A^(g_1)A^(g_2)A^(g_3)A^(g_4)) tr(A^(h_1)A^(h_2)A^(h_3)A^(h_4)) ≤ min[ Σ_{g,h=1}^q {(trΣ_gg⁴)^{1/4}(trΣ_hh⁴)^{1/4}}², 128 trΣ⁴ ];

vii) tr{A^(g) Γ′(Σ − Σ_d)Γ A^(h) Γ′(Σ − Σ_d)Γ} ≤ min[ 4 trΣ⁴, (trΣ_gg⁴)^{1/4}(trΣ_hh⁴)^{1/4} ] tr{Σ(Σ − Σ_d)²};

viii) Σ_{g≠g′, h≠h′} [diag(Γ′(Σ − Σ_d)Γ)]′ (A^(g) ∘ A^(h))(A^(g′) ∘ A^(h′)) [diag(Γ′(Σ − Σ_d)Γ)] ≤ min[ 2 trΣ⁴, Σ_{g,h=1}^q (trΣ_gg⁴)^{1/4}(trΣ_hh⁴)^{1/4} ] tr{Σ(Σ − Σ_d)²}.

Here, [diag(A)] = (a_11, a_22, ..., a_mm)′ for an m × m matrix A = (a_ij).

(Proof) See the supplemental material.

Lemma A.3. Under assumptions (A1)-(A3),

E[T] = tr(Σ − Σ_d)², Var[T] = σ²(1 + o(1)).

(Proof) Let

U_1 = {1/(N(N−1))} Σ_{g≠h}^q Σ_{i≠j}^N z_i′A^(g)z_j z_i′A^(h)z_j,

U_2 = {1/(N(N−1)(N−2))} Σ_{g≠h}^q Σ_{i,j,k distinct}^N z_i′A^(g)z_j z_i′A^(h)z_k,

U_3 = {1/(N(N−1)(N−2)(N−3))} Σ_{g≠h}^q Σ_{i,j,k,l distinct}^N z_i′A^(g)z_j z_k′A^(h)z_l.

Then T can be expressed as T = U_1 − 2U_2 + U_3. It is straightforward to show that E[T] = E[U_1] = tr(Σ − Σ_d)². By using Lemma A.1, Var[U_1] is calculated as

Var[U_1] = Σ_{i=1}^6 V_i^(1),

where

V_1^(1) = {4/(N(N−1))} Σ_{g≠g′, h≠h′} tr(A^(g)A^(h)) tr(A^(g′)A^(h′)) = {4/(N(N−1))} Σ_{g≠g′, h≠h′} tr(Σ_gh Σ_hg) tr(Σ_g′h′ Σ_h′g′),

V_2^(1) = (8/N) tr{(Σ − Σ_d)Σ}²,

V_3^(1) = (4κ_4/N) tr({Γ′(Σ − Σ_d)Γ} ∘ {Γ′(Σ − Σ_d)Γ}),

V_4^(1) = {2κ_4²/(N(N−1))} Σ_{g≠h} tr{(A^(g) ∘ A^(h))²},

V_5^(1) = {8κ_4/(N(N−1))} tr[ Σ_{g≠g′, h≠h′} (A^(g) ∘ A^(h))(A^(g′) ∘ A^(h′)) ],

V_6^(1) = {4/(N(N−1))} Σ_{g≠g′, h≠h′} tr(A^(g)A^(g′)A^(h)A^(h′)).

From Lemma A.2, V_4^(1), V_5^(1) and V_6^(1) are negligible under (A1)-(A3). Hence,

Var[U_1] = Σ_{t=1}^3 V_t^(1) + o(σ²)   (A.1)

under (A1)-(A3). From (A.1) and Σ_{t=1}^3 V_t^(1) = σ², we have

Var[U_1]/σ² → 1.   (A.2)

By using Lemma A.1, Var[U_2] is calculated as Var[U_2] = Σ_{i=1}^5 V_i^(2), where

V_1^(2) = {4/(N(N−1)(N−2))} Σ_{g≠g′, h≠h′} tr(A^(g)A^(h)) tr(A^(g′)A^(h′)) = {4/(N(N−1)(N−2))} Σ_{g≠g′, h≠h′} tr(Σ_gh Σ_hg) tr(Σ_g′h′ Σ_h′g′),

V_2^(2) = {2/(N(N−1))} Σ_{g≠h} tr{(A^(g)A^(h))²},

V_3^(2) = {2/(N(N−1))} tr{(Σ − Σ_d)Σ}²,

V_4^(2) = {κ_4/(N(N−1)(N−2))} tr[ Σ_{g≠g′, h≠h′} (A^(g) ∘ A^(h))(A^(g′) ∘ A^(h′)) ],

V_5^(2) = {4κ_3²/(N(N−1)(N−2))} Σ_{g≠g′, h≠h′} [ tr{(A^(g) ∘ A^(g′))(A^(h′) ∘ A^(h))} + tr(A^(g)A^(g′)A^(h)A^(h′)) ].

From Lemma A.2, under (A1)-(A3),

V_3^(2)/σ² → 0, V_4^(2)/σ² → 0, V_5^(2)/σ² → 0,   (A.3)

and

V_1^(2)/V_1^(1) → 0, V_2^(2)/V_1^(1) → 0   (A.4)

as N → ∞. From (A.3) and (A.4), under (A1)-(A3),

Var[U_2]/σ² = Σ_{t=1}^5 V_t^(2)/σ² → 0.   (A.5)

By using Lemma A.1, Var[U_3] is calculated as

Var[U_3] = {16/(N²(N−1)²(N−2)²(N−3)²)} Σ_{g_1≠h_1, g_2≠h_2} Σ_{i,j,k,l distinct} E[z_i′A^(g_1)z_j z_k′A^(h_1)z_l z_i′A^(g_2)z_k z_j′A^(h_2)z_l] + {8/(N²(N−1)²(N−2)²(N−3)²)} Σ_{g_1≠h_1, g_2≠h_2} Σ_{i,j,k,l distinct} E[z_i′A^(g_1)z_j z_k′A^(h_1)z_l z_i′A^(g_2)z_j z_k′A^(h_2)z_l] = {8/(N(N−1)(N−2)(N−3))} Σ_{g≠g′, h≠h′} { tr(A^(g)A^(g′)A^(h)A^(h′)) + tr(A^(g)A^(g′)) tr(A^(h)A^(h′)) }.

From the above result, under assumptions (A1)-(A3),

Var[U_3]/σ² → 0.   (A.6)

Combining (A.2), (A.5) and (A.6), Var[T] = σ²{1 + o(1)}.

Lemma A.4. Under assumptions (A1)-(A4), it holds that

T − tr(Σ − Σ_d)² = {1/(N(N−1))} Σ_{i≠j}^N Σ_{k_1≠k_2, l_1≠l_2} β_{k_1l_1k_2l_2} z_{ik_1} z_{ik_2} z_{jl_1} z_{jl_2} + o_p(σ_{H0}),

where β_{k_1l_1k_2l_2} = Σ_{g≠h} a^(g)_{k_1l_1} a^(h)_{k_2l_2}.

And under assumptions (A1)-(A3), it holds that

T − tr(Σ − Σ_d)² = {1/(N(N−1))} Σ_{i≠j}^N Σ_{k_1≠k_2, l_1≠l_2} β_{k_1l_1k_2l_2} z_{ik_1} z_{ik_2} z_{jl_1} z_{jl_2} + (2/N) Σ_{j=1}^N { z_j′Γ′(Σ − Σ_d)Γ z_j − tr{(Σ − Σ_d)Σ} } + o_p(σ).

(Proof) From (A.5) and (A.6), U_2 and U_3 are negligible under (A1)-(A3), so we only consider U_1 in T − tr(Σ − Σ_d)². Let

U_11 = {1/(N(N−1))} Σ_{i≠j}^N Σ_{k_1≠k_2, l_1≠l_2} β_{k_1l_1k_2l_2} z_{ik_1} z_{ik_2} z_{jl_1} z_{jl_2},

U_12 = (2/N) Σ_{j=1}^N Σ_{k,l} β_{klkl} (z_{jl}² − 1),

U_13 = (2/N) Σ_{j=1}^N Σ_k Σ_{l_1≠l_2} β_{kl_1kl_2} z_{jl_1} z_{jl_2},

U_14 = {1/(N(N−1))} Σ_{i≠j}^N Σ_{k,l} β_{klkl} (z_{ik}² − 1)(z_{jl}² − 1),

U_15 = {2/(N(N−1))} Σ_{i≠j}^N Σ_{k_1≠k_2} Σ_l β_{k_1lk_2l} z_{ik_1} z_{ik_2} (z_{jl}² − 1).

Then these random variables are uncorrelated, namely, Cov(U_{1t_1}, U_{1t_2}) = 0 for t_1 ≠ t_2, and U_1 can be expressed as U_1 = tr(Σ − Σ_d)² + Σ_{t=1}^5 U_{1t}. At first, we show the first statement in Lemma A.4. For the proof of the first statement, we need to show that

Var[Σ_{t=2}^5 U_{1t}] / σ²_{H0} → 0

under (A1)-(A4). The variance of U_11 is calculated as

Var[U_11] = {4/(N(N−1))} Σ_{k_1≠k_2, l_1≠l_2} ( β_{k_1l_1k_2l_2}² + β_{k_1l_1k_2l_2} β_{k_2l_1k_1l_2} )
= {4/(N(N−1))} Σ_{g≠g′, h≠h′} [ tr(A^(g)A^(h)) tr(A^(g′)A^(h′)) + tr(A^(g)A^(h)A^(g′)A^(h′)) + tr{(A^(g) ∘ A^(h))(A^(g′) ∘ A^(h′))} ].

Thus, under (A1)-(A4),

Var[U_11]/V_1^(1) → 1.   (A.7)

We also note that, under (A1)-(A4),

Var[U_1]/V_1^(1) → 1, V_1^(1)/σ²_{H0} → 1.   (A.8)

From (A.7) and (A.8), under (A1)-(A4),

Var[Σ_{t=2}^5 U_{1t}]/σ²_{H0} = Var[U_1]/σ²_{H0} − Var[U_11]/σ²_{H0} → 0.

This proves the first statement of the lemma.

Next, we show the second statement in Lemma A.4. For the proof of the second statement, we need to show that

Var[Σ_{t=4}^5 U_{1t}]/σ² → 0

under (A1)-(A3). Note that

U_12 + U_13 = (2/N) Σ_{j=1}^N { z_j′Γ′(Σ − Σ_d)Γ z_j − tr{(Σ − Σ_d)Σ} }.

By using Lemma A.1, the variance of U_12 + U_13 is calculated as

Var[U_12 + U_13] = (4/N²) Var[ Σ_{j=1}^N { z_j′{Γ′(Σ − Σ_d)Γ}z_j − tr{(Σ − Σ_d)Σ} } ] = (4κ_4/N) tr[{Γ′(Σ − Σ_d)Γ} ∘ {Γ′(Σ − Σ_d)Γ}] + (8/N) tr{(Σ − Σ_d)Σ}².

Thus, under (A1)-(A3),

Var[Σ_{t=1}^3 U_{1t}]/σ² = { Var[U_11] + Var[U_12 + U_13] }/σ² → 1.   (A.9)

We also note that, under (A1)-(A3),

Var[U_1]/σ² → 1.   (A.10)

From (A.9) and (A.10), under (A1)-(A3),

Var[Σ_{t=4}^5 U_{1t}]/σ² = Var[U_1]/σ² − Var[Σ_{t=1}^3 U_{1t}]/σ² → 0.

This proves the second statement of the lemma.

A.2. Proof of Theorem 2.2.

From Lemma A.4, under assumptions (A1)-(A4), the leading order term in Var[T] is contributed by U_11. Hence, we only need to show the asymptotic normality of U_11. Let

ε_j = {2/(N(N−1))} Σ_{i=1}^{j−1} Σ_{k_1≠k_2, l_1≠l_2} β_{k_1l_1k_2l_2} z_{ik_1} z_{ik_2} z_{jl_1} z_{jl_2}.

Then U_11 = Σ_{j=1}^N ε_j. Define F_0 = {∅, Ω} and let F_j = σ{z_1, z_2, ..., z_j} for j = 1, ..., N be the σ-algebra generated by the sequence of random vectors z_1, z_2, ..., z_j. Then it is straightforward to show that E[ε_j] = 0 and E[ε_j | F_{j−1}] = 0, so {ε_j} is a martingale difference sequence. To apply the martingale central limit theorem, we need to show that

Σ_{j=1}^N E[ε_j² | F_{j−1}] / σ²_{H0} →p 1 and Σ_{j=1}^N E[ε_j⁴] / σ⁴_{H0} → 0.   (A.11)

We first show the first part of (A.11). Note that

E[ε_j² | F_{j−1}] = {4/(N²(N−1)²)} Σ_{i_1,i_2=1}^{j−1} Σ_{k_1≠k_2, k_3≠k_4} γ_{k_1k_2k_3k_4} z_{i_1k_1} z_{i_1k_2} z_{i_2k_3} z_{i_2k_4},

where

γ_{k_1k_2k_3k_4} = Σ_{l_1≠l_2} ( β_{k_1l_1k_2l_2} β_{k_3l_1k_4l_2} + β_{k_1l_1k_2l_2} β_{k_3l_2k_4l_1} ).

To show the first part of (A.11), we decompose Σ_{j=1}^N E[ε_j² | F_{j−1}] into the sum of ten parts:

Σ_{j=1}^N E[ε_j² | F_{j−1}] = Var[U_11] + Σ_{t=1}^9 R_t,

where

R_1 = {8/(N²(N−1)²)} Σ_{j=1}^N Σ_{i=1}^{j−1} Σ_{k_1≠k_2} γ_{k_1k_2k_1k_2} (z_{ik_1}² − 1)(z_{ik_2}² − 1),

R_2 = {8/(N²(N−1)²)} Σ_{j=1}^N Σ_{i=1}^{j−1} Σ_{k_1≠k_2} γ_{k_1k_2k_1k_2} (z_{ik_1}² − 1),

R_3 = {8/(N²(N−1)²)} Σ_{j=1}^N Σ_{i=1}^{j−1} Σ_{k_1≠k_2} γ_{k_1k_2k_1k_2} (z_{ik_2}² − 1),

R_4 = {8/(N²(N−1)²)} Σ_{j=1}^N Σ_{i=1}^{j−1} Σ_{k_1≠k_2≠k_3} γ_{k_1k_2k_1k_3} (z_{ik_1}² − 1) z_{ik_2} z_{ik_3},

R_5 = {8/(N²(N−1)²)} Σ_{j=1}^N Σ_{i=1}^{j−1} Σ_{k_1≠k_2≠k_3} γ_{k_1k_2k_1k_3} z_{ik_2} z_{ik_3},

R_6 = {8/(N²(N−1)²)} Σ_{j=1}^N Σ_{i=1}^{j−1} Σ_{k_1≠k_2≠k_3} γ_{k_1k_2k_2k_3} (z_{ik_2}² − 1) z_{ik_1} z_{ik_3},

R_7 = {8/(N²(N−1)²)} Σ_{j=1}^N Σ_{i=1}^{j−1} Σ_{k_1≠k_2≠k_3} γ_{k_1k_2k_2k_3} z_{ik_1} z_{ik_3},

R_8 = {4/(N²(N−1)²)} Σ_{j=1}^N Σ_{i=1}^{j−1} Σ_{k_1≠k_2≠k_3≠k_4, k_1≠k_3, k_2≠k_4} γ_{k_1k_2k_3k_4} z_{ik_1} z_{ik_2} z_{ik_3} z_{ik_4},

R_9 = {8/(N²(N−1)²)} Σ_{j=1}^N Σ_{i_1<i_2≤j−1} Σ_{k_1≠k_2, k_3≠k_4} γ_{k_1k_2k_3k_4} z_{i_1k_1} z_{i_1k_2} z_{i_2k_3} z_{i_2k_4}.

Then we need to show that E[R_t²] = o(σ⁴_{H0}) for t = 1, ..., 9, since Var[U_11]/σ²_{H0} → 1. By applying the Cauchy-Schwarz inequality, we obtain

E[R_1²] ≤ {32(κ_4 + 2)/(N²(N−1)³)} Σ_{k_1≠k_2} ( γ_{k_1k_2k_1k_2}² + γ_{k_1k_2k_1k_2} γ_{k_2k_1k_2k_1} )
≤ {64(κ_4 + 2)/(N²(N−1)³)} Σ_{k_1≠k_2} γ_{k_1k_2k_1k_2}²
≤ {256(κ_4 + 2)/(N²(N−1)³)} Σ_{k_1,k_2=1}^m ( Σ_{l_1≠l_2} β_{k_1l_1k_2l_2}² )²
≤ {256(κ_4 + 2)/(N³(N−1)²)} Σ_{g≠g′, h≠h′} tr(Σ_gh Σ_hg) tr(Σ_g′h′ Σ_h′g′).

Similarly, we obtain

E[R_2²] ≤ {512(κ_4 + 2)/(N³(N−1)²)} Σ_{g≠g′, h≠h′} tr(Σ_gh Σ_hg) tr(Σ_g′h′ Σ_h′g′),
E[R_3²] ≤ {512(κ_4 + 2)/(N³(N−1)²)} Σ_{g≠g′, h≠h′} tr(Σ_gh Σ_hg) tr(Σ_g′h′ Σ_h′g′),
E[R_4²] ≤ {256(κ_4 + 2)(κ_3² + 2)/(N³(N−1)²)} Σ_{g≠g′, h≠h′} tr(Σ_gh Σ_hg) tr(Σ_g′h′ Σ_h′g′),
E[R_5²] ≤ {2560/(N³(N−1)²)} Σ_{g≠g′, h≠h′} tr(Σ_gh Σ_hg) tr(Σ_g′h′ Σ_h′g′),
E[R_6²] ≤ {256(κ_4 + 2)(κ_3² + 2)/(N³(N−1)²)} Σ_{g≠g′, h≠h′} tr(Σ_gh Σ_hg) tr(Σ_g′h′ Σ_h′g′),
E[R_7²] ≤ {2560/(N³(N−1)²)} Σ_{g≠g′, h≠h′} tr(Σ_gh Σ_hg) tr(Σ_g′h′ Σ_h′g′),
E[R_8²] ≤ {2560/(N³(N−1)²)} Σ_{g≠g′, h≠h′} tr(Σ_gh Σ_hg) tr(Σ_g′h′ Σ_h′g′).

From these results and (V_1^(1))² ≍ σ⁴, under (A1)-(A3),

E[R_t²]/σ⁴ ≍ E[R_t²]/(V_1^(1))² = O(N^{−1}) (t = 1, ..., 8).   (A.12)

Also, we evaluate E[R_9²]. The expectation of R_9² is evaluated as follows:

E[R_9²] ≤ {32(N−2)²/(3N²(N−1)³)} Σ_{k_1≠k_2, k_3≠k_4} ( γ_{k_1k_2k_3k_4}² + γ_{k_1k_2k_3k_4} γ_{k_1k_2k_4k_3} + γ_{k_1k_2k_3k_4} γ_{k_2k_1k_3k_4} + γ_{k_1k_2k_3k_4} γ_{k_2k_1k_4k_3} )
≤ {128(N−2)²/(3N²(N−1)³)} Σ_{k_1,k_2,k_3,k_4=1}^m γ_{k_1k_2k_3k_4}²
≤ {256(N−2)²/(3N²(N−1)³)} Σ_{k_1,k_2,k_3,k_4=1}^m ( Σ_{l_1≠l_2} β_{k_1l_1k_2l_2} β_{k_3l_1k_4l_2} )² + {256(N−2)²/(3N²(N−1)³)} Σ_{k_1,k_2,k_3,k_4=1}^m ( Σ_{l_1≠l_2} β_{k_1l_1k_2l_2} β_{k_3l_2k_4l_1} )²
≤ {512(N−2)²/(3N²(N−1)³)} Σ_{k_1,k_2,k_3,k_4=1}^m ( Σ_{l_1,l_2} β_{k_1l_1k_2l_2} β_{k_3l_1k_4l_2} )² + {1024(N−2)²/(3N²(N−1)³)} Σ_{k_1,k_2,k_3,k_4=1}^m ( Σ_l β_{k_1lk_2l} β_{k_3lk_4l} )²
≤ {1024(N−2)²/(3N²(N−1)³)} Σ_{g_1≠h_1, g_2≠h_2, g_3≠h_3, g_4≠h_4} tr(A^(g_1)A^(g_2)A^(g_3)A^(g_4)) tr(A^(h_1)A^(h_2)A^(h_3)A^(h_4)) + {1024(N−2)²/(3N²(N−1)³)} tr[ Σ_{g_1≠h_1, g_2≠h_2, g_3≠h_3, g_4≠h_4} (A^(g_1) ∘ A^(g_3))(A^(h_1) ∘ A^(h_3))(A^(g_2) ∘ A^(g_4))(A^(h_2) ∘ A^(h_4)) ].

From Lemma A.2, under (A1)-(A3),

E[R_9^2]/σ^4 = o(1).   (A.13)

From (A.12), (A.13) and σ^2/σ_{H_0}^2 → 1 under (A1)-(A4), we obtain E[R_t^2] = o(σ_{H_0}^4). This proves the first part of (A.11).

Next we show the second part of (A.11). There exist constants c_1 and c_2 such that

E[ε_j^4] ≤ c_1 {(j-1)(j-2)/(N^4(N-1)^4)} E[(Σ_{k_1,k_2=1}^m Σ_{l_1≠l_2} β_{k_1l_1k_2l_2} z_{jl_1} z_{jl_2})^2]
        ≤ c_2 {(j-1)(j-2)/(N^4(N-1)^4)} Σ_{g≠h} Σ_{g'≠h'} tr(Σ_gh Σ_hg) tr(Σ_{g'h'} Σ_{h'g'}).

From the above results,

Σ_{j=1}^N E[ε_j^4]/σ^4 ≤ 2 Σ_{j=1}^N E[ε_j^4]/V_1^{(2)} = O(N^{-1}).   (A.14)

From (A.14) and σ^2/σ_{H_0}^2 → 1 under (A1)-(A4), we obtain

Σ_{j=1}^N E[ε_j^4] = o(σ_{H_0}^4).

This proves the second part of (A.11) and completes the proof of Theorem 2.1.

A.3. Proof of Theorem 2.3. Under assumptions (A1) and (A2), it holds that

tr(Σ_gg^2)-hat = tr(Σ_gg^2) (1 + O_p(N^{-1/2})).

Hence

σ̂_{H_0}^2 = σ_{H_0}^2 (1 + O_p(N^{-1/2}))

and hence

σ̂_{H_0}^2 - σ_{H_0}^2 = σ_{H_0}^2 O_p(N^{-1/2}).

This completes the proof of Theorem 2.3.

A.4. Proof of Theorem 3.1. From Lemma A.4, under assumptions (A1')-(A3), the leading order term in Var[T] is contributed by U_{11} + U_{12} + U_{13}. Hence, we only need to show the asymptotic normality of U_{11} + U_{12} + U_{13}. Let

ψ_j = ε_j + (2/N){z_j' Γ'(Σ - Σ_d)Γ z_j - tr(Σ - Σ_d)}.

Then Σ_{j=1}^N ψ_j = U_{11} + U_{12} + U_{13}, and it is straightforward to show that E[ψ_j] = 0 and E[ψ_j | F_{j-1}] = 0. So {ψ_j} is a martingale difference sequence. To apply the martingale central limit theorem, we need to show that

(1/σ^2) Σ_{j=1}^N E[ψ_j^2 | F_{j-1}] →_p 1   and   σ^{-4} Σ_{j=1}^N E[ψ_j^4] → 0.   (A.15)

We first show the first part of (A.15). To this end, we decompose Σ_{j=1}^N E[ψ_j^2 | F_{j-1}] as

Σ_{j=1}^N E[ψ_j^2 | F_{j-1}] = Var[U_{11}] + V_2^{(1)} + V_3^{(1)} + Σ_{t=1}^{10} R_t,

where R_t (t = 1, 2, ..., 9) is defined in the proof of Theorem 2.1, and

R_{10} = {8/(N^2(N-1))} Σ_{j=2}^N Σ_{i=1}^{j-1} Σ_{k_1,k_2,k_3=1; k_1≠k_2}^m γ_{k_1k_2k_3k_3} z_{ik_1} z_{ik_2}.

Then we need to show E[R_t^2] = o(σ^4) for t = 1, ..., 10, since under (A1')-(A3)

(Var[U_{11}] + V_2^{(1)} + V_3^{(1)})/σ^2 → 1.

From (A.12) and (A.13), under (A1')-(A3),

E[R_t^2]/σ^4 = o(1)   (t = 1, 2, ..., 9).   (A.16)

Also, we evaluate E[R_{10}^2]. The expectation of R_{10}^2 is evaluated as follows:

E[R_{10}^2] = {32/(N^2(N-1))} Σ_{k_1≠k_2} Σ_{k_3,k̃_3=1}^m (γ_{k_1k_2k_3k_3} γ_{k_1k_2k̃_3k̃_3} + γ_{k_1k_2k_3k_3} γ_{k_2k_1k̃_3k̃_3})
          ≤ {64/(N^2(N-1))} Σ_{k_1,k_2=1}^m (Σ_{k_3=1}^m γ_{k_1k_2k_3k_3})^2
          ≤ ... ≤ {256/(N^2(N-1)^2)} Σ_{g≠h} tr(A^{(g)} Γ'(Σ-Σ_d)Γ A^{(h)} Γ'(Σ-Σ_d)Γ)
            + {256/(N^2(N-1)^2)} Σ_{g≠h} [diag(Γ'(Σ-Σ_d)Γ)]' (A^{(g)} ∘ A^{(h)}) (A^{(g)} ∘ A^{(h)}) diag(Γ'(Σ-Σ_d)Γ).

From the above result and Lemma A.2,

E[R_{10}^2] ≤ {512/(N^2(N-1)^2)} min(4 tr Σ^4, Σ_{g≠h} tr(Σ_gg^4)^{1/4} tr(Σ_hh^4)^{1/4})^{1/2} tr{(Σ - Σ_d)^2}.

From the above result, under (A1')-(A3),

E[R_{10}^2]/σ^4 = o(1).   (A.17)

Combining (A.16) and (A.17), we prove the first part of (A.15).

Next we show the second part of (A.15). By applying Hölder's inequality,

E[ψ_j^4] ≤ 8 E[ε_j^4] + (128/N^4) E[{z_j' Γ'(Σ - Σ_d)Γ z_j - tr(Σ - Σ_d)}^4].   (A.18)

Note that for any symmetric matrix A, E[(z_1' A z_1 - tr A)^4] ≤ const · {tr(A^2)}^2. Hence, there exists a constant c_3 such that

E[{z_j' Γ'(Σ - Σ_d)Γ z_j - tr(Σ - Σ_d)}^4] ≤ c_3 {tr(Σ - Σ_d)^2}^2.   (A.19)

From (A.18) and (A.19), Σ_{j=1}^N E[ψ_j^4] = o(σ^4). This proves the second part of (A.15) and completes the proof of Theorem 3.1.
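The quantity the test targets under the alternative is tr{(Σ - Σ_d)^2}, which equals the total squared Frobenius norm of the off-diagonal blocks, Σ_{g≠h} tr(Σ_gh Σ_hg), because the cross term tr{Σ_d(Σ - Σ_d)} vanishes. A minimal numeric sketch of this identity, with a hypothetical partition and a randomly generated positive definite matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
p_blocks = [2, 3, 2]                  # hypothetical partition sizes p_1, p_2, p_3
p = sum(p_blocks)
X = rng.standard_normal((p, p))
Sigma = X @ X.T                       # a generic positive definite "covariance"

# Block-diagonal part Sigma_d = diag(Sigma_11, ..., Sigma_qq)
Sigma_d = np.zeros_like(Sigma)
edges = np.concatenate(([0], np.cumsum(p_blocks)))
for a, b in zip(edges[:-1], edges[1:]):
    Sigma_d[a:b, a:b] = Sigma[a:b, a:b]

# tr{(Sigma - Sigma_d)^2} = sum_{g != h} tr(Sigma_gh Sigma_hg)
lhs = np.trace((Sigma - Sigma_d) @ (Sigma - Sigma_d))
rhs = sum(np.trace(Sigma[edges[g]:edges[g+1], edges[h]:edges[h+1]]
                   @ Sigma[edges[h]:edges[h+1], edges[g]:edges[g+1]])
          for g in range(3) for h in range(3) if g != h)
assert abs(lhs - rhs) < 1e-8
# the cross term vanishes: tr{Sigma_d (Sigma - Sigma_d)} = 0
assert abs(np.trace(Sigma_d @ (Sigma - Sigma_d))) < 1e-8
```

Under H_0 this distance is zero, which is why the test statistic is centered at an estimator of Σ_{g≠h} tr(Σ_gh Σ_hg).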

Table 1: Attained significance level in the case of the diagonal matrix, for our procedure and Srivastava (2005); entries are reported for each (p, N) combination under Cases 1-5.

Table 2: Attained significance level in the case of q = 2, for our procedure, Yata and Aoshima (2015), and Srivastava and Reid (2012); entries are reported for each (p, N) combination under Cases 1-5.

Table 3: Attained significance level in the case of q = 5; entries are reported for each (p, N) combination under Cases 1-5.

Table 4: Empirical power in the case of the diagonal matrix for (i), for our procedure and Srivastava (2005); entries are reported for each (p, N) combination under Cases 1-5.

Table 5: Empirical power in the case of the diagonal matrix for (ii), for our procedure and Srivastava (2005); entries are reported for each (p, N) combination under Cases 1-5.

Table 6: Empirical power in the case of q = 2 for (i), for our procedure, Yata and Aoshima (2015), and Srivastava and Reid (2012); entries are reported for each (p, N) combination under Cases 1-5.

Table 7: Empirical power in the case of q = 2 for (ii), for our procedure, Yata and Aoshima (2015), and Srivastava and Reid (2012); entries are reported for each (p, N) combination under Cases 1-5.

Table 8: Empirical power in the case of q = 5 for (i); entries are reported for each (p, N) combination under Cases 1-5.

Table 9: Empirical power in the case of q = 5 for (ii); entries are reported for each (p, N) combination under Cases 1-5.

Supplemental Material: Testing Block-Diagonal Covariance Structure for High-Dimensional Data Under Non-normality

Yuki Yamada, Masashi Hyodo and Takahiro Nishiyama

March 30, 2016

Abstract. The supplement provides the full proofs of Lemmas A.1 and A.2, which are omitted in the original paper.

(Affiliations: Department of Mathematical Information Science, Tokyo University of Science, 1-3 Kagurazaka, Shinjuku-ku, Tokyo, Japan, y.y.rfpfg.t.o.d.006.s.m.e@gmail.com; Department of Mathematical Sciences, Graduate School of Engineering, Osaka Prefecture University, 1-1 Gakuen-cho, Naka-ku, Sakai, Osaka, Japan, hyodo@ms.osakafu-u.ac.jp; Department of Business Administration, Senshu University, 2-1-1 Higashimita, Tama-ku, Kawasaki-shi, Kanagawa, Japan.)

1 Proof of Lemma A.1

First, we show i).

E[z_1' E z_1 · z_1' F z_1] = E[(Σ_i e_ii z_{1i}^2 + Σ_{i≠j} e_ij z_{1i} z_{1j})(Σ_i f_ii z_{1i}^2 + Σ_{i≠j} f_ij z_{1i} z_{1j})]
  = Σ_i e_ii f_ii E[z_{1i}^4] + Σ_{i≠j} e_ii f_jj E[z_{1i}^2 z_{1j}^2] + Σ_{i≠j} (e_ij f_ij + e_ij f_ji) E[z_{1i}^2 z_{1j}^2]
  = κ_4 tr(E ∘ F) + tr E tr F + tr(EF) + tr(EF').

Second, we show ii).

E[z_1' A z_2 · z_1' B z_2 · z_1' E z_2] = E[(Σ_{i,j} a_ij z_{1i} z_{2j})(Σ_{i,j} b_ij z_{1i} z_{2j})(Σ_{i,j} e_ij z_{1i} z_{2j})]
  = Σ_{i,j} a_ij b_ij e_ij E[z_{1i}^3] E[z_{2j}^3] = κ_3^2 tr((A ∘ B) E').
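The two moment identities above can be verified exactly by enumerating finite-support standardized distributions: Rademacher signs for i) (where κ_4 = -2) and a skewed two-point distribution for ii) (where κ_3 = 1.5). This is only an illustrative sanity check; the matrices and dimensions are arbitrary choices.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
m = 3
E = rng.integers(-2, 3, (m, m)).astype(float); E = (E + E.T) / 2  # symmetric test matrices
F = rng.integers(-2, 3, (m, m)).astype(float); F = (F + F.T) / 2

# i) with Rademacher entries (E[z^2] = 1, kappa_4 = E[z^4] - 3 = -2):
#    E[z'Ez z'Fz] = kappa_4 tr(E o F) + trE trF + tr(EF) + tr(EF')
lhs1 = 0.0
for s in itertools.product([-1.0, 1.0], repeat=m):
    z = np.array(s)
    lhs1 += (z @ E @ z) * (z @ F @ z)
lhs1 /= 2.0**m
k4 = -2.0
rhs1 = (k4 * np.sum(np.diag(E) * np.diag(F)) + np.trace(E) * np.trace(F)
        + np.trace(E @ F) + np.trace(E @ F.T))
assert abs(lhs1 - rhs1) < 1e-9

# ii) with skewed two-point entries: z = 2 w.p. 0.2, z = -0.5 w.p. 0.8
#    (mean 0, variance 1, kappa_3 = E[z^3] = 1.5), z_1 and z_2 independent:
#    E[z_1'Az_2 z_1'Bz_2 z_1'Cz_2] = kappa_3^2 sum_ij a_ij b_ij c_ij
vals, probs = [2.0, -0.5], [0.2, 0.8]
A, B, C = (rng.integers(-2, 3, (2, 2)).astype(float) for _ in range(3))
lhs2 = 0.0
for i1 in itertools.product([0, 1], repeat=2):
    for i2 in itertools.product([0, 1], repeat=2):
        z1 = np.array([vals[k] for k in i1]); z2 = np.array([vals[k] for k in i2])
        w = np.prod([probs[k] for k in i1]) * np.prod([probs[k] for k in i2])
        lhs2 += w * (z1 @ A @ z2) * (z1 @ B @ z2) * (z1 @ C @ z2)
k3 = 0.2 * 2.0**3 + 0.8 * (-0.5)**3   # = 1.5
rhs2 = k3**2 * np.sum(A * B * C)
assert abs(lhs2 - rhs2) < 1e-9
```

Because the supports are finite, the expectations are computed exactly, so the comparison is not subject to Monte Carlo error.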

Third, we show iii). Write

E[z_1' A z_2 · z_1' B z_2 · z_1' C z_2 · z_1' D z_2] = E[(Σ_{i,j} a_ij z_{1i} z_{2j})(Σ_{i,j} b_ij z_{1i} z_{2j})(Σ_{i,j} c_ij z_{1i} z_{2j})(Σ_{i,j} d_ij z_{1i} z_{2j})].

Expanding the product, a generic term a_{i_1 j_1} b_{i_2 j_2} c_{i_3 j_3} d_{i_4 j_4} E[z_{1i_1} z_{1i_2} z_{1i_3} z_{1i_4}] E[z_{2j_1} z_{2j_2} z_{2j_3} z_{2j_4}] is nonzero only when the indices i_1, ..., i_4 (and likewise j_1, ..., j_4) are all equal or coincide in pairs. Since E[z_{1i}^4] = κ_4 + 3 and E[z_{1i}^2 z_{1j}^2] = 1 for i ≠ j, sorting the nonzero terms by these coincidence patterns and collecting the coefficients of κ_4^2, κ_4 and 1 gives

E[z_1' A z_2 · z_1' B z_2 · z_1' C z_2 · z_1' D z_2]
  = κ_4^2 tr((A ∘ B)(C ∘ D))
    + 2κ_4 {tr((AB) ∘ (CD)) + tr((AC) ∘ (BD)) + tr((AD) ∘ (BC))}
    + tr(AB) tr(CD) + tr(AC) tr(BD) + tr(AD) tr(BC)
    + tr(ABCD) + tr(ABDC) + tr(ACBD) + tr(ACDB) + tr(ADBC) + tr(ADCB).

Finally, we show iv). Expanding the square of the fourth power,

E[(Σ_{i≠j} a_ij z_{1i} z_{1j})^4]
  = E[(Σ_{i≠j} a_ij^2 z_{1i}^2 z_{1j}^2 + Σ_{i≠j, k≠l, j≠l, i≠k} a_ij a_kl z_{1i} z_{1j} z_{1k} z_{1l} + 4 Σ_{i≠j, k≠i} a_ij a_ik z_{1i}^2 z_{1j} z_{1k})^2]
  ≤ 12 E[(Σ_{i≠j} a_ij^2 z_{1i}^2 z_{1j}^2)^2] + 48 E[(Σ_{i≠j, k≠l, j≠l, i≠k} a_ij a_kl z_{1i} z_{1j} z_{1k} z_{1l})^2]
    + 32 E[(Σ_{i≠j, k≠i} a_ij a_ik z_{1i}^2 z_{1j} z_{1k})^2].   (1.1)

Each term on the right-hand side of (1.1) is evaluated as

E[(Σ_{i≠j, k≠l, j≠l, i≠k} a_ij a_kl z_{1i} z_{1j} z_{1k} z_{1l})^2]
  = Σ_{i≠j, k≠l, j≠l, i≠k} (8 a_ij^2 a_kl^2 + 16 a_ij a_kl a_ik a_jl) E[z_{1i}^2 z_{1j}^2 z_{1k}^2 z_{1l}^2]   (1.2)
  ≤ 24 Σ_{i≠j, k≠l} a_ij^2 a_kl^2 ≤ 24 (Σ_{i≠j} a_ij^2)^2,   (1.3)

E[(Σ_{i≠j, k≠i} a_ij a_ik z_{1i}^2 z_{1j} z_{1k})^2]
  = Σ_{i≠j, k≠i} a_ij^2 a_ik^2 E[z_{1i}^4 z_{1j}^2 z_{1k}^2] + 4 Σ_{i≠j, k≠i} a_ij a_ik a_ji a_jk E[z_{1i}^3 z_{1j}^3 z_{1k}^2]
    + Σ_{i≠j, k≠l, j≠l, i≠k} a_ij a_ik a_lj a_lk E[z_{1i}^2 z_{1j}^2 z_{1k}^2 z_{1l}^2]
  ≤ (κ_4 + 3) Σ_{i≠j, k≠i} a_ij^2 a_ik^2 + 4κ_3^2 Σ_{i≠j, k≠i} a_ij a_ik a_ji a_jk + Σ a_ij a_ik a_lj a_lk
  ≤ (κ_4 + 4κ_3^2 + 8) (Σ_{i≠j} a_ij^2)^2,   (1.4)

and

E[(Σ_{i≠j} a_ij^2 z_{1i}^2 z_{1j}^2)^2]
  = Σ_{i≠j} a_ij^4 E[z_{1i}^4 z_{1j}^4] + Σ_{i≠j, k≠l, j≠l, i≠k} a_ij^2 a_kl^2 E[z_{1i}^2 z_{1j}^2 z_{1k}^2 z_{1l}^2] + Σ_{i≠j, k≠i} a_ij^2 a_ik^2 E[z_{1i}^4 z_{1j}^2 z_{1k}^2]
  ≤ {2κ_4(κ_4 + 3) + 12} (Σ_{i≠j} a_ij^2)^2.   (1.5)

Combining (1.1)-(1.5), we obtain iv).
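Identity iii) above can also be checked exactly with independent Rademacher vectors (for which κ_4 = -2 and κ_3 = 0) by enumerating all sign patterns; the small symmetric matrices and the dimension are arbitrary choices for the check.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
m = 2
mats = []
for _ in range(4):
    M = rng.integers(-2, 3, (m, m)).astype(float)
    mats.append((M + M.T) / 2)        # symmetric A, B, C, D
A, B, C, D = mats

# Exact E[z1'Az2 z1'Bz2 z1'Cz2 z1'Dz2] over independent Rademacher z1, z2.
lhs = 0.0
for s1 in itertools.product([-1.0, 1.0], repeat=m):
    for s2 in itertools.product([-1.0, 1.0], repeat=m):
        z1, z2 = np.array(s1), np.array(s2)
        lhs += (z1 @ A @ z2) * (z1 @ B @ z2) * (z1 @ C @ z2) * (z1 @ D @ z2)
lhs /= 2.0 ** (2 * m)

k4 = -2.0
tr = np.trace
had = lambda M, N: np.sum(np.diag(M) * np.diag(N))   # tr((M) o (N)) = sum_i M_ii N_ii
rhs = (k4**2 * tr((A * B) @ (C * D))                  # A*B is the Hadamard product
       + 2 * k4 * (had(A @ B, C @ D) + had(A @ C, B @ D) + had(A @ D, B @ C))
       + tr(A @ B) * tr(C @ D) + tr(A @ C) * tr(B @ D) + tr(A @ D) * tr(B @ C)
       + tr(A @ B @ C @ D) + tr(A @ B @ D @ C) + tr(A @ C @ B @ D)
       + tr(A @ C @ D @ B) + tr(A @ D @ B @ C) + tr(A @ D @ C @ B))
assert abs(lhs - rhs) < 1e-9
```

With m = 2 only 16 joint sign patterns exist, so the left-hand side is an exact expectation rather than a simulation.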

2 Proof of Lemma A.2

First, we show i). By applying the Cauchy-Schwarz inequality,

tr{(A^{(g)} ∘ A^{(h)})(A^{(g')} ∘ A^{(h')})}
  ≤ (Σ_{i,j} (a_ij^{(g)} a_ij^{(h)})^2)^{1/2} (Σ_{i,j} (a_ij^{(g')} a_ij^{(h')})^2)^{1/2}
  ≤ (Σ_{i,j} a_ij^{(g)4})^{1/4} (Σ_{i,j} a_ij^{(h)4})^{1/4} (Σ_{i,j} a_ij^{(g')4})^{1/4} (Σ_{i,j} a_ij^{(h')4})^{1/4}
  ≤ tr(A^{(g)4})^{1/4} tr(A^{(h)4})^{1/4} tr(A^{(g')4})^{1/4} tr(A^{(h')4})^{1/4}
  ≤ tr(Σ_gg^4)^{1/4} tr(Σ_hh^4)^{1/4} tr(Σ_{g'g'}^4)^{1/4} tr(Σ_{h'h'}^4)^{1/4}.   (2.1)

On the other hand, we note that

Σ_{g≠h} Σ_{g'≠h'} tr{(A^{(g)} ∘ A^{(h)})(A^{(g')} ∘ A^{(h')})} ≤ tr Σ^4,   (2.2)

since

Σ_{i,j} a_ij^4 ≤ tr(A^2 A^2) ≤ tr Σ^4,

and, since Σ_g A^{(g)} = Γ'Σ_d Γ,

Σ_{g,h=1}^q tr(A^{(g)} A^{(h)})^2 ≤ tr(Σ^2 Σ_d^2) ≤ (tr Σ^4)^{1/2} (tr Σ_d^4)^{1/2} ≤ tr Σ^4.

46 From.1) and.), we obtain ) ) tr A g) A h) min trσ 4 gg) 1/4 trσ 4 hh) 1/4, trσ 4. Second, we show ii). By applying Cauchy-Schwarz inequality tra g) A g ) A h) A h ),g h On the other hand, we note that g) tra A g ) 1/ tra h) A h ) 1/,g h trσ 4 gg) 1/4 trσ 4 hh) 1/4..3) tra g) A g ) A h) A h ),g h tra 4 tra g) A + g1 tra 4 + tra g) A h) g,h1 tra g) A h) g,h1 trσ 4,.4) since tra g) A h) tra g) A h) trσσ d trσ 4. g,h1 g,h1 From.3) and.4), we obtain ) tra g) A g ) A h) A h ) min trσ 4 gg) 1/4 trσ 4 hh) 1/4, trσ 4.,g h 6

Third, we show iii). By applying the Cauchy-Schwarz inequality,

Σ_{g_1≠h_1} Σ_{g_2≠h_2} tr{(A^{(g_1)} ∘ A^{(h_1)})(A^{(g_2)} ∘ A^{(h_2)})}
  = Σ_{g_1≠h_1} Σ_{g_2≠h_2} Σ_{i,j} (a_ij^{(g_1)} a_ij^{(h_1)})(a_ij^{(g_2)} a_ij^{(h_2)})
  ≤ Σ_{g_1≠h_1} Σ_{g_2≠h_2} (Σ_{i,j} (a_ij^{(g_1)} a_ij^{(h_1)})^2)^{1/2} (Σ_{i,j} (a_ij^{(g_2)} a_ij^{(h_2)})^2)^{1/2}
  ≤ ... ≤ Σ_{g_1≠h_1} Σ_{g_2≠h_2} tr(Σ_{g_1g_1}^4)^{1/4} tr(Σ_{h_1h_1}^4)^{1/4} tr(Σ_{g_2g_2}^4)^{1/4} tr(Σ_{h_2h_2}^4)^{1/4}.   (2.5)

On the other hand, we note that

Σ_{g_1≠h_1} Σ_{g_2≠h_2} tr{(A^{(g_1)} ∘ A^{(h_1)})(A^{(g_2)} ∘ A^{(h_2)})} ≤ tr Σ^4,   (2.6)

since

Σ_{g,h=1}^q tr{(A^{(g)} ∘ A^{(h)})(A^{(h)} ∘ A^{(g)})} ≤ Σ_{g,h=1}^q tr(A^{(g)} A^{(h)})^2 ≤ tr Σ^4.

From (2.5) and (2.6), we obtain

Σ_{g_1≠h_1} Σ_{g_2≠h_2} tr{(A^{(g_1)} ∘ A^{(h_1)})(A^{(g_2)} ∘ A^{(h_2)})} ≤ min((Σ_{g≠h} tr(Σ_gg^4)^{1/4} tr(Σ_hh^4)^{1/4})^2, tr Σ^4).

Fourth, we show iv). By applying the Cauchy-Schwarz inequality,

tr{(A^{(g)} ∘ A^{(g')})(A^{(h')} ∘ A^{(h)})}
  ≤ tr{(A^{(g)} ∘ A^{(g')})^2}^{1/2} tr{(A^{(h')} ∘ A^{(h)})^2}^{1/2}
  ≤ tr(A^{(g)4})^{1/4} tr(A^{(g')4})^{1/4} tr(A^{(h)4})^{1/4} tr(A^{(h')4})^{1/4}
  ≤ tr(Σ_gg^4)^{1/4} tr(Σ_hh^4)^{1/4} tr(Σ_{g'g'}^4)^{1/4} tr(Σ_{h'h'}^4)^{1/4}.   (2.7)

On the other hand, we note that

Σ_{g≠h} Σ_{g'≠h'} tr{(A^{(g)} ∘ A^{(g')})(A^{(h')} ∘ A^{(h)})}
  ≤ tr{(A ∘ A) A^2} + Σ_{g=1}^q tr{(A^{(g)} ∘ A) A A^{(g)}} + Σ_{g,h=1}^q tr{(A^{(g)} ∘ A^{(h)})(A^{(h)} ∘ A^{(g)})}
  ≤ 4 tr Σ^4,   (2.8)

since

tr{(A ∘ A) A^2} ≤ tr{(A ∘ A)^2}^{1/2} tr(A^4)^{1/2} ≤ tr Σ^4,

Σ_{g=1}^q tr{(A^{(g)} ∘ A) A A^{(g)}} = Σ_{g=1}^q Σ_{i,j,k} a_ij^{(g)} a_ij a_jk a_ki^{(g)} ≤ ... ≤ tr Σ^4,

Σ_{g,h=1}^q tr{(A^{(g)} ∘ A^{(h)})(A^{(h)} ∘ A^{(g)})} ≤ Σ_{g,h=1}^q tr(A^{(g)} A^{(h)})^2 ≤ tr(Σ^2 Σ_d^2) ≤ tr Σ^4.

From (2.7) and (2.8), we obtain

Σ_{g≠h} Σ_{g'≠h'} tr{(A^{(g)} ∘ A^{(g')})(A^{(h')} ∘ A^{(h)})} ≤ min((Σ_{g≠h} tr(Σ_gg^4)^{1/4} tr(Σ_hh^4)^{1/4})^2, 4 tr Σ^4).

Fifth, we show v). Note that

tr[{(A^{(g_1)} ∘ A^{(g_3)})(A^{(h_1)} ∘ A^{(h_3)})}{(A^{(g_2)} ∘ A^{(g_4)})(A^{(h_2)} ∘ A^{(h_4)})}]
  ≤ tr[{(A^{(g_1)} ∘ A^{(g_3)})(A^{(h_1)} ∘ A^{(h_3)})}{(A^{(g_3)} ∘ A^{(g_1)})(A^{(h_3)} ∘ A^{(h_1)})}]^{1/2}
    × tr[{(A^{(g_2)} ∘ A^{(g_4)})(A^{(h_2)} ∘ A^{(h_4)})}{(A^{(g_4)} ∘ A^{(g_2)})(A^{(h_4)} ∘ A^{(h_2)})}]^{1/2}
  ≤ {tr(A^{(g_1)2} A^{(g_3)2}) tr(A^{(h_1)2} A^{(h_3)2}) tr(A^{(g_2)2} A^{(g_4)2}) tr(A^{(h_2)2} A^{(h_4)2})}^{1/2}
  ≤ Π_{t=1}^4 tr(A^{(g_t)4})^{1/4} tr(A^{(h_t)4})^{1/4}.


More information

Stochastic Design Criteria in Linear Models

Stochastic Design Criteria in Linear Models AUSTRIAN JOURNAL OF STATISTICS Volume 34 (2005), Number 2, 211 223 Stochastic Design Criteria in Linear Models Alexander Zaigraev N. Copernicus University, Toruń, Poland Abstract: Within the framework

More information

arxiv: v1 [stat.ml] 5 Jan 2015

arxiv: v1 [stat.ml] 5 Jan 2015 To appear on the Annals of Statistics SUPPLEMENTARY MATERIALS FOR INNOVATED INTERACTION SCREENING FOR HIGH-DIMENSIONAL NONLINEAR CLASSIFICATION arxiv:1501.0109v1 [stat.ml] 5 Jan 015 By Yingying Fan, Yinfei

More information

More powerful tests for sparse high-dimensional covariances matrices

More powerful tests for sparse high-dimensional covariances matrices Accepted Manuscript More powerful tests for sparse high-dimensional covariances matrices Liu-hua Peng, Song Xi Chen, Wen Zhou PII: S0047-259X(16)30004-5 DOI: http://dx.doi.org/10.1016/j.jmva.2016.03.008

More information

More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction

More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction Sankhyā : The Indian Journal of Statistics 2007, Volume 69, Part 4, pp. 700-716 c 2007, Indian Statistical Institute More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order

More information

High-Dimensional AICs for Selection of Redundancy Models in Discriminant Analysis. Tetsuro Sakurai, Takeshi Nakada and Yasunori Fujikoshi

High-Dimensional AICs for Selection of Redundancy Models in Discriminant Analysis. Tetsuro Sakurai, Takeshi Nakada and Yasunori Fujikoshi High-Dimensional AICs for Selection of Redundancy Models in Discriminant Analysis Tetsuro Sakurai, Takeshi Nakada and Yasunori Fujikoshi Faculty of Science and Engineering, Chuo University, Kasuga, Bunkyo-ku,

More information

Persistence and global stability in discrete models of Lotka Volterra type

Persistence and global stability in discrete models of Lotka Volterra type J. Math. Anal. Appl. 330 2007 24 33 www.elsevier.com/locate/jmaa Persistence global stability in discrete models of Lotka Volterra type Yoshiaki Muroya 1 Department of Mathematical Sciences, Waseda University,

More information

Tests for High Dimensional Generalized Linear Models

Tests for High Dimensional Generalized Linear Models MPRA Munich Personal RePEc Archive Tests for High Dimensional Generalized Linear Models Song Xi Chen and Bin Guo Peking Univeristy and Iowa State University, Peking Univeristy 2014 Online at http://mpra.ub.uni-muenchen.de/59816/

More information

2. Matrix Algebra and Random Vectors

2. Matrix Algebra and Random Vectors 2. Matrix Algebra and Random Vectors 2.1 Introduction Multivariate data can be conveniently display as array of numbers. In general, a rectangular array of numbers with, for instance, n rows and p columns

More information

Stein s Method and Characteristic Functions

Stein s Method and Characteristic Functions Stein s Method and Characteristic Functions Alexander Tikhomirov Komi Science Center of Ural Division of RAS, Syktyvkar, Russia; Singapore, NUS, 18-29 May 2015 Workshop New Directions in Stein s method

More information

Networks: Lectures 9 & 10 Random graphs

Networks: Lectures 9 & 10 Random graphs Networks: Lectures 9 & 10 Random graphs Heather A Harrington Mathematical Institute University of Oxford HT 2017 What you re in for Week 1: Introduction and basic concepts Week 2: Small worlds Week 3:

More information

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data Journal of Multivariate Analysis 78, 6282 (2001) doi:10.1006jmva.2000.1939, available online at http:www.idealibrary.com on Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone

More information

Grouped Network Vector Autoregression

Grouped Network Vector Autoregression Statistica Sinica: Supplement Grouped Networ Vector Autoregression Xuening Zhu 1 and Rui Pan 2 1 Fudan University, 2 Central University of Finance and Economics Supplementary Material We present here the

More information

A NOTE ON ESTIMATION UNDER THE QUADRATIC LOSS IN MULTIVARIATE CALIBRATION

A NOTE ON ESTIMATION UNDER THE QUADRATIC LOSS IN MULTIVARIATE CALIBRATION J. Japan Statist. Soc. Vol. 32 No. 2 2002 165 181 A NOTE ON ESTIMATION UNDER THE QUADRATIC LOSS IN MULTIVARIATE CALIBRATION Hisayuki Tsukuma* The problem of estimation in multivariate linear calibration

More information

Thomas J. Fisher. Research Statement. Preliminary Results

Thomas J. Fisher. Research Statement. Preliminary Results Thomas J. Fisher Research Statement Preliminary Results Many applications of modern statistics involve a large number of measurements and can be considered in a linear algebra framework. In many of these

More information

Chapter 7, continued: MANOVA

Chapter 7, continued: MANOVA Chapter 7, continued: MANOVA The Multivariate Analysis of Variance (MANOVA) technique extends Hotelling T 2 test that compares two mean vectors to the setting in which there are m 2 groups. We wish to

More information

Illustration of the Varying Coefficient Model for Analyses the Tree Growth from the Age and Space Perspectives

Illustration of the Varying Coefficient Model for Analyses the Tree Growth from the Age and Space Perspectives TR-No. 14-06, Hiroshima Statistical Research Group, 1 11 Illustration of the Varying Coefficient Model for Analyses the Tree Growth from the Age and Space Perspectives Mariko Yamamura 1, Keisuke Fukui

More information

CONDITIONS FOR ROBUSTNESS TO NONNORMALITY ON TEST STATISTICS IN A GMANOVA MODEL

CONDITIONS FOR ROBUSTNESS TO NONNORMALITY ON TEST STATISTICS IN A GMANOVA MODEL J. Japan Statist. Soc. Vol. 37 No. 1 2007 135 155 CONDITIONS FOR ROBUSTNESS TO NONNORMALITY ON TEST STATISTICS IN A GMANOVA MODEL Hirokazu Yanagihara* This paper presents the conditions for robustness

More information

A Squared Correlation Coefficient of the Correlation Matrix

A Squared Correlation Coefficient of the Correlation Matrix A Squared Correlation Coefficient of the Correlation Matrix Rong Fan Southern Illinois University August 25, 2016 Abstract Multivariate linear correlation analysis is important in statistical analysis

More information

Pattern correlation matrices and their properties

Pattern correlation matrices and their properties Linear Algebra and its Applications 327 (2001) 105 114 www.elsevier.com/locate/laa Pattern correlation matrices and their properties Andrew L. Rukhin Department of Mathematics and Statistics, University

More information

Inference For High Dimensional M-estimates: Fixed Design Results

Inference For High Dimensional M-estimates: Fixed Design Results Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49

More information

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Takeshi Emura and Hisayuki Tsukuma Abstract For testing the regression parameter in multivariate

More information

A Bootstrap Test for Conditional Symmetry

A Bootstrap Test for Conditional Symmetry ANNALS OF ECONOMICS AND FINANCE 6, 51 61 005) A Bootstrap Test for Conditional Symmetry Liangjun Su Guanghua School of Management, Peking University E-mail: lsu@gsm.pku.edu.cn and Sainan Jin Guanghua School

More information

Freeness and the Transpose

Freeness and the Transpose Freeness and the Transpose Jamie Mingo (Queen s University) (joint work with Mihai Popa and Roland Speicher) ICM Satellite Conference on Operator Algebras and Applications Cheongpung, August 8, 04 / 6

More information

AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES

AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES JOEL A. TROPP Abstract. We present an elementary proof that the spectral radius of a matrix A may be obtained using the formula ρ(a) lim

More information

f-divergence Estimation and Two-Sample Homogeneity Test under Semiparametric Density-Ratio Models

f-divergence Estimation and Two-Sample Homogeneity Test under Semiparametric Density-Ratio Models IEEE Transactions on Information Theory, vol.58, no.2, pp.708 720, 2012. 1 f-divergence Estimation and Two-Sample Homogeneity Test under Semiparametric Density-Ratio Models Takafumi Kanamori Nagoya University,

More information

Advanced Statistics II: Non Parametric Tests

Advanced Statistics II: Non Parametric Tests Advanced Statistics II: Non Parametric Tests Aurélien Garivier ParisTech February 27, 2011 Outline Fitting a distribution Rank Tests for the comparison of two samples Two unrelated samples: Mann-Whitney

More information

Notes on Random Vectors and Multivariate Normal

Notes on Random Vectors and Multivariate Normal MATH 590 Spring 06 Notes on Random Vectors and Multivariate Normal Properties of Random Vectors If X,, X n are random variables, then X = X,, X n ) is a random vector, with the cumulative distribution

More information

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8 Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall

More information

The largest eigenvalues of the sample covariance matrix. in the heavy-tail case

The largest eigenvalues of the sample covariance matrix. in the heavy-tail case The largest eigenvalues of the sample covariance matrix 1 in the heavy-tail case Thomas Mikosch University of Copenhagen Joint work with Richard A. Davis (Columbia NY), Johannes Heiny (Aarhus University)

More information

Intermediate Econometrics

Intermediate Econometrics Intermediate Econometrics Heteroskedasticity Text: Wooldridge, 8 July 17, 2011 Heteroskedasticity Assumption of homoskedasticity, Var(u i x i1,..., x ik ) = E(u 2 i x i1,..., x ik ) = σ 2. That is, the

More information

Lecture 20: Linear model, the LSE, and UMVUE

Lecture 20: Linear model, the LSE, and UMVUE Lecture 20: Linear model, the LSE, and UMVUE Linear Models One of the most useful statistical models is X i = β τ Z i + ε i, i = 1,...,n, where X i is the ith observation and is often called the ith response;

More information

Multivariate Distributions

Multivariate Distributions IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate

More information

Additive functionals of infinite-variance moving averages. Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535

Additive functionals of infinite-variance moving averages. Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535 Additive functionals of infinite-variance moving averages Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535 Departments of Statistics The University of Chicago Chicago, Illinois 60637 June

More information

Bias Correction of Cross-Validation Criterion Based on Kullback-Leibler Information under a General Condition

Bias Correction of Cross-Validation Criterion Based on Kullback-Leibler Information under a General Condition Bias Correction of Cross-Validation Criterion Based on Kullback-Leibler Information under a General Condition Hirokazu Yanagihara 1, Tetsuji Tonda 2 and Chieko Matsumoto 3 1 Department of Social Systems

More information

Quick Review on Linear Multiple Regression

Quick Review on Linear Multiple Regression Quick Review on Linear Multiple Regression Mei-Yuan Chen Department of Finance National Chung Hsing University March 6, 2007 Introduction for Conditional Mean Modeling Suppose random variables Y, X 1,

More information

Estimation of the Conditional Variance in Paired Experiments

Estimation of the Conditional Variance in Paired Experiments Estimation of the Conditional Variance in Paired Experiments Alberto Abadie & Guido W. Imbens Harvard University and BER June 008 Abstract In paired randomized experiments units are grouped in pairs, often

More information

Gaussian vectors and central limit theorem

Gaussian vectors and central limit theorem Gaussian vectors and central limit theorem Samy Tindel Purdue University Probability Theory 2 - MA 539 Samy T. Gaussian vectors & CLT Probability Theory 1 / 86 Outline 1 Real Gaussian random variables

More information

COMPLETE SPACELIKE HYPERSURFACES IN THE DE SITTER SPACE

COMPLETE SPACELIKE HYPERSURFACES IN THE DE SITTER SPACE Chao, X. Osaka J. Math. 50 (203), 75 723 COMPLETE SPACELIKE HYPERSURFACES IN THE DE SITTER SPACE XIAOLI CHAO (Received August 8, 20, revised December 7, 20) Abstract In this paper, by modifying Cheng Yau

More information

TESTING SERIAL CORRELATION IN SEMIPARAMETRIC VARYING COEFFICIENT PARTIALLY LINEAR ERRORS-IN-VARIABLES MODEL

TESTING SERIAL CORRELATION IN SEMIPARAMETRIC VARYING COEFFICIENT PARTIALLY LINEAR ERRORS-IN-VARIABLES MODEL Jrl Syst Sci & Complexity (2009) 22: 483 494 TESTIG SERIAL CORRELATIO I SEMIPARAMETRIC VARYIG COEFFICIET PARTIALLY LIEAR ERRORS-I-VARIABLES MODEL Xuemei HU Feng LIU Zhizhong WAG Received: 19 September

More information

Assignment 10. Arfken Show that Stirling s formula is an asymptotic expansion. The remainder term is. B 2n 2n(2n 1) x1 2n.

Assignment 10. Arfken Show that Stirling s formula is an asymptotic expansion. The remainder term is. B 2n 2n(2n 1) x1 2n. Assignment Arfken 5.. Show that Stirling s formula is an asymptotic expansion. The remainder term is R N (x nn+ for some N. The condition for an asymptotic series, lim x xn R N lim x nn+ B n n(n x n B

More information

Minimal basis for connected Markov chain over 3 3 K contingency tables with fixed two-dimensional marginals. Satoshi AOKI and Akimichi TAKEMURA

Minimal basis for connected Markov chain over 3 3 K contingency tables with fixed two-dimensional marginals. Satoshi AOKI and Akimichi TAKEMURA Minimal basis for connected Markov chain over 3 3 K contingency tables with fixed two-dimensional marginals Satoshi AOKI and Akimichi TAKEMURA Graduate School of Information Science and Technology University

More information