Estimation of multivariate 3rd moment for high-dimensional data and its application for testing multivariate normality


Takayuki Yamada, Tetsuto Himeno

Institute for Comprehensive Education, Center of General Education, Kagoshima University, 1-21-30 Korimoto, Kagoshima, Japan

Faculty of Data Science, Shiga University, 1-1-1 Banba, Hikone, Shiga 522-8522, Japan

Abstract

This paper is concerned with the multivariate 3rd moment and its estimation. Mardia (1970) and Srivastava (1984) independently proposed measures of multivariate skewness and their estimators. However, these estimators cannot be defined when the dimension $p$ is larger than the sample size $N$. In this paper, we treat the multivariate 3rd moment $\gamma$, which is defined by using the Hadamard product of observation vectors, and propose an estimator of $\gamma$ which is well defined when $p > N$. Based on the estimator, we propose a new test for multivariate normality. Simulation results revealed that our proposed test has good accuracy.

Keywords: Multivariate 3rd moment, Hadamard product, testing multivariate normality, $(N, p)$-asymptotic

AMS 2000 subject classification: Primary 62H15, Secondary 62E20

1 Introduction

Let $x$ and $y$ be $p$-variate random vectors which are independently and identically distributed as a $p$-dimensional distribution $F_p(\mu, \Sigma)$ with mean vector $\mu$ and covariance matrix $\Sigma$. Mardia (1970) defined multivariate skewness and kurtosis as

$$\beta^{M}_{1,p} = E[\{(x-\mu)'\Sigma^{-1}(y-\mu)\}^{3}], \qquad \beta^{M}_{2,p} = E[\{(x-\mu)'\Sigma^{-1}(x-\mu)\}^{2}],$$

respectively. When $p = 1$, $\beta^{M}_{1,p}$ and $\beta^{M}_{2,p}$ reduce to the ordinary univariate squared skewness and kurtosis, respectively. Let $x_1, \dots, x_N$ be a random sample drawn from a population with distribution $F_p(\mu, \Sigma)$. Mardia [10] proposed sample counterparts of $\beta^{M}_{1,p}$ and $\beta^{M}_{2,p}$ as

$$b^{M}_{1,p} = \frac{1}{N^{2}}\sum_{i=1}^{N}\sum_{j=1}^{N}\{(x_i-\bar{x})'S^{-1}(x_j-\bar{x})\}^{3}, \qquad b^{M}_{2,p} = \frac{1}{N}\sum_{i=1}^{N}\{(x_i-\bar{x})'S^{-1}(x_i-\bar{x})\}^{2},$$

respectively, where

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N}x_i, \qquad S = \frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})(x_i-\bar{x})'.$$
Srivastava [8] (as cited in a more recent book by this author [9]) defined another multivariate skewness $\beta^{S}_{1,p}$ and kurtosis $\beta^{S}_{2,p}$, which are described as follows. Let $\Gamma = (\gamma_1, \dots, \gamma_p)$ be an orthogonal matrix such that $\Gamma'\Sigma\Gamma = D_\lambda$, where $D_\lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$. Let $y_l = \gamma_l'x$ and $\theta_l = \gamma_l'\mu$ for $l = 1, \dots, p$.

E-mail address: yamada@gm.kagoshima-u.ac.jp

Then $\beta^{S}_{1,p}$ and $\beta^{S}_{2,p}$ are given by

$$\beta^{S}_{1,p} = \frac{1}{p}\sum_{l=1}^{p}\left\{\frac{E[(y_l-\theta_l)^{3}]}{\lambda_l^{3/2}}\right\}^{2}, \qquad \beta^{S}_{2,p} = \frac{1}{p}\sum_{l=1}^{p}\frac{E[(y_l-\theta_l)^{4}]}{\lambda_l^{2}},$$

respectively. Srivastava [8] proposed sample counterparts of $\beta^{S}_{1,p}$ and $\beta^{S}_{2,p}$ as $b^{S}_{1,p}$ and $b^{S}_{2,p}$, respectively. To describe them, let $\{N/(N-1)\}S = HD_uH'$, where $H = (h_1, \dots, h_p)$ is an orthogonal matrix and $D_u = \mathrm{diag}(u_1, \dots, u_p)$. With $y_{li} = h_l'x_i$ ($l = 1, \dots, p$, $i = 1, \dots, N$), $b^{S}_{1,p}$ and $b^{S}_{2,p}$ are given by

$$b^{S}_{1,p} = \frac{1}{N^{2}p}\sum_{l=1}^{p}\left\{u_l^{-3/2}\sum_{i=1}^{N}(y_{li}-\bar{y}_l)^{3}\right\}^{2}, \qquad b^{S}_{2,p} = \frac{1}{Np}\sum_{l=1}^{p}u_l^{-2}\sum_{i=1}^{N}(y_{li}-\bar{y}_l)^{4},$$

respectively, where $\bar{y}_l = N^{-1}\sum_{i=1}^{N}y_{li}$.

These sample counterparts of multivariate skewness and kurtosis are applied to testing multivariate normality, i.e., testing the null hypothesis that the population distribution is multivariate normal. Under multivariate normality, Mardia [10] showed that $(N/6)\,b^{M}_{1,p}$ converges in distribution to $\chi^{2}(f_M)$, the chi-squared distribution with $f_M$ degrees of freedom, and that $\{b^{M}_{2,p} - p(p+2)\}/\{8p(p+2)/N\}^{1/2}$ converges in distribution to $N(0, 1)$. Here, $f_M = p(p+1)(p+2)/6$. Thus, if

$$\frac{N}{6}\,b^{M}_{1,p} \geq \chi^{2}_{f_M}(1-\alpha), \qquad (1)$$

non-normality of the data is indicated, where $\chi^{2}_{f_M}(1-\alpha)$ is the $100(1-\alpha)\%$ point of $\chi^{2}(f_M)$. Similarly, normality is rejected if

$$\left|\sqrt{\frac{N}{8p(p+2)}}\,\{b^{M}_{2,p} - p(p+2)\}\right| \geq z_{1-\alpha/2},$$

where $z_{1-\alpha/2}$ is the $100(1-\alpha/2)\%$ point of the standard normal distribution. Srivastava [8] showed that $(Np/6)\,b^{S}_{1,p}$ converges in distribution to $\chi^{2}(p)$ under multivariate normality, and that $(Np/24)^{1/2}(b^{S}_{2,p} - 3)$ converges in distribution to $N(0, 1)$. Based on these convergences, Srivastava [8] proposed the testing criterion which rejects the normality of the data if

$$\frac{Np}{6}\,b^{S}_{1,p} \geq \chi^{2}_{p}(1-\alpha), \qquad (2)$$

and the testing criterion under which normality is rejected if

$$\left|\sqrt{\frac{Np}{24}}\,(b^{S}_{2,p} - 3)\right| \geq z_{1-\alpha/2}.$$

Some other methods to assess multivariate normality have been studied. For a review of the results, see, e.g., Henze [3] and Mecklin and Mundfrom [11].

In recent years, we encounter more and more problems in applications in which $p$ is comparable with $N$ or even exceeds it. For example, financial data, consumer data, network data and medical data have this feature. When $p > N$, it is impossible to define $b^{M}_{1,p}$, $b^{M}_{2,p}$, $b^{S}_{1,p}$ and $b^{S}_{2,p}$, because the sample covariance matrix $S$ becomes singular.
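The breakdown for $p > N$ can be seen numerically. The following is a minimal sketch, assuming NumPy is available; the function name `mardia_skewness`, the seed and the sample sizes are illustrative choices and do not come from the paper. It computes Mardia's sample skewness with the divisor-$N$ covariance matrix, checks its affine invariance in a low-dimensional case, and shows that $S$ is rank deficient once $p > N$.

```python
import numpy as np

def mardia_skewness(X):
    """Mardia's sample skewness b_{1,p}^M for an (N, p) data matrix X."""
    N = X.shape[0]
    Xc = X - X.mean(axis=0)               # centre the observations
    S = Xc.T @ Xc / N                     # sample covariance with divisor N
    G = Xc @ np.linalg.solve(S, Xc.T)     # G[i, j] = (x_i - xbar)' S^{-1} (x_j - xbar)
    return float((G ** 3).sum() / N**2)

rng = np.random.default_rng(0)

# Low-dimensional case (N = 50 > p = 3): the statistic is well defined
X = rng.standard_normal((50, 3))
b1 = mardia_skewness(X)
M = rng.standard_normal((3, 3)) + 3.0 * np.eye(3)   # almost surely invertible map
b1_affine = mardia_skewness(X @ M + 1.0)            # same value up to rounding

# High-dimensional case (N = 5 < p = 8): S has rank at most N - 1
Xw = rng.standard_normal((5, 8))
Xwc = Xw - Xw.mean(axis=0)
rank_S = int(np.linalg.matrix_rank(Xwc.T @ Xwc / 5))
print(b1, b1_affine, rank_S)
```

Because the rank of $S$ cannot exceed $N - 1$, the inverse $S^{-1}$ used above does not exist when $p \geq N$, which is the obstruction the estimator of this paper is designed to avoid.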

Instead of dealing with $\beta^{M}_{2,p}$ or $\beta^{S}_{2,p}$, Himeno and Yamada [4] treated

$$\kappa = E[\{(x-\mu)'(x-\mu)\}^{2}] - 2\,\mathrm{tr}\,\Sigma^{2} - (\mathrm{tr}\,\Sigma)^{2}.$$

When $\Sigma$ equals the identity matrix $I_p$, $\kappa$ coincides with another definition of Mardia's [10] multivariate kurtosis, $\beta^{M}_{2,p} - p(p+2)$. Himeno and Yamada [4] gave an unbiased estimator $\hat{\kappa}$ of $\kappa$ as the sample counterpart of $\kappa$. It is a linear combination of $\mathrm{tr}\,S^{2}$, $(\mathrm{tr}\,S)^{2}$ and $Q$, with coefficients depending only on $N$, where $Q$ is a normalized sum of $\{(x_i-\bar{x})'(x_i-\bar{x})\}^{2}$ over $i = 1, \dots, N$; see [4] for the explicit coefficients. It is noted that $\hat{\kappa}$ is well defined when $p > N$. Himeno and Yamada [4] showed that $(N/8)^{1/2}\,p^{-1}\,(\hat{\kappa}/\hat{a}_2)$ converges in distribution to $N(0, 1)$ as $N$ and $p$ go to infinity together under the following assumptions:

(1) $(N-1)/p$ converges to a positive constant as $N, p$ go to infinity;

(2) $a_i = \mathrm{tr}\,\Sigma^{i}/p$ converges to a positive constant as $p \to \infty$, $i = 1, \dots, 4$.

Here, $\hat{a}_2$ is an unbiased estimator of $a_2$, which is likewise a linear combination of $\mathrm{tr}\,S^{2}$, $(\mathrm{tr}\,S)^{2}$ and $Q$ given in [4]. They proposed the testing criterion which rejects the normality of the data if

$$\left|\sqrt{\frac{N}{8}}\,\frac{\hat{\kappa}}{p\,\hat{a}_2}\right| \geq z_{1-\alpha/2}.$$

It is reported in Himeno and Yamada [4] that the power of this test increases as $N$ and $p$ become large under the local alternative hypothesis that the population distribution is multivariate $t$.

In this paper, we treat another quantity, the multivariate 3rd moment $\gamma$, instead of dealing with $\beta^{M}_{1,p}$ or $\beta^{S}_{1,p}$. An estimator $\hat{\gamma}$ of $\gamma$ is proposed that is well defined when $p > N$. Based on $\hat{\gamma}$, we propose a new test for assessing multivariate normality.

The present paper is organized as follows. In Section 2, we define $\gamma$ and give the estimator $\hat{\gamma}$. Asymptotic normality of $\hat{\gamma}$ is shown under the asymptotic framework $N, p \to \infty$ when the population distribution is multivariate normal. We give a testing criterion for multivariate normality based on $\hat{\gamma}$. Results of a small-scale simulation, covering the attained significance level and the empirical power, are reported in Section 3. We give concluding remarks in Section 4. All technical proofs are relegated to the Appendix.
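2 Multivariate 3rd moment and its estimator

Assume that $x_1, \dots, x_N$ are independently and identically distributed (i.i.d.), and assume the following multivariate linear model:

$$x_i = \mu + \Sigma^{1/2}\varepsilon_i, \qquad (3)$$

where $\varepsilon_1, \dots, \varepsilon_N$ are i.i.d. as a $p$-dimensional distribution $F_p(0, I_p)$. For a random vector $x = \mu + \Sigma^{1/2}\varepsilon$ with $\varepsilon \sim F_p(0, I_p)$, let

$$\gamma = \frac{E[\{\odot^{3}(x-\mu)\}'\mathbf{1}_p]}{\sqrt{\mathbf{1}_p'(\odot^{3}\Sigma)\mathbf{1}_p}},$$

where $\mathbf{1}_p$ is the $p$-vector of ones and the notation $\odot^{i}A$ stands for the Hadamard product of $i$ copies of the matrix $A$, i.e., $\odot^{i}A = A \odot A \odot \cdots \odot A$ ($i$ times). We note that $\odot^{i}A$ is positive definite whenever $A$ is, which is guaranteed by Schur's product theorem (cf. Schott [7]). When $p = 1$, $\gamma$ reduces to the univariate skewness.

It is hard to obtain an analytic form for the unbiased estimator of $\gamma$. Instead, we construct the estimator as the ratio of an unbiased estimator of the numerator of $\gamma$ to the square root of an unbiased estimator of the quantity under the square root in the denominator. The estimator we propose is

$$\hat{\gamma} = \frac{T}{\sqrt{S}},$$

where $T$ estimates $E[\{\odot^{3}(x-\mu)\}'\mathbf{1}_p]$ and $S$ estimates $\mathbf{1}_p'(\odot^{3}\Sigma)\mathbf{1}_p$:

$$T = \frac{4}{3\,{}_N P_3}\sum^{*}_{i,j,k}\left[\odot^{3}\left\{x_i - \tfrac{1}{2}(x_j + x_k)\right\}\right]'\mathbf{1}_p,$$

$$S = \frac{1}{8\,{}_N P_6}\sum^{*}_{k,l,\alpha,\beta,\gamma,\delta}\mathbf{1}_p'\left\{(x_k - x_l)(x_k - x_l)' \odot (x_\alpha - x_\beta)(x_\alpha - x_\beta)' \odot (x_\gamma - x_\delta)(x_\gamma - x_\delta)'\right\}\mathbf{1}_p.$$

Here ${}_N P_k = N(N-1)\cdots(N-k+1)$, and the notation $\sum^{*}$ stands for the sum over all tuples of indices which are mutually distinct. For example,

$$\sum^{*}_{i,j,k} = \sum_{i=1}^{N}\ \sum_{j=1,\,j\neq i}^{N}\ \sum_{k=1,\,k\neq i,\,k\neq j}^{N}.$$

2.1 Asymptotic distribution of $\hat{\gamma}$ and a test for $\gamma$

Firstly, we show the asymptotic normality of $\hat{\gamma}$ as $N, p \to \infty$ under the assumption that $F_p(0, I_p) = N_p(0, I_p)$. It is noted that $T$ is invariant under a location shift. Hence, without loss of generality, we may assume $\mu = 0$. Writing $x_i = \Sigma^{1/2}z_i$ with $z_i \sim N_p(0, I_p)$, we have

$$T = \frac{4}{3\,{}_N P_3}\sum^{*}_{i,j,k}\left[\odot^{3}\left\{\Sigma^{1/2}\left(z_i - \tfrac{1}{2}(z_j + z_k)\right)\right\}\right]'\mathbf{1}_p.$$

To derive the analytic form of $\mathrm{Var}(T)$, we decompose $T$ as

$$T = \frac{1}{N}\sum_{l=1}^{p}\sum_{i=1}^{N}(a_l'z_i)^{3} - \frac{3}{N(N-1)}\sum_{l=1}^{p}\sum^{*}_{i,j}(a_l'z_i)^{2}a_l'z_j + \frac{2}{N(N-1)(N-2)}\sum_{l=1}^{p}\sum^{*}_{i,j,k}a_l'z_i\,a_l'z_j\,a_l'z_k = T_1 + T_2 + T_3,$$

where

$$T_1 = \frac{1}{N}\sum_{l=1}^{p}\sum_{i=1}^{N}\left\{(a_l'z_i)^{3} - 3a_l'a_l\,a_l'z_i\right\},$$

$$T_2 = -\frac{3}{N(N-1)}\sum_{l=1}^{p}\left\{\sum^{*}_{i,j}(a_l'z_i)^{2}a_l'z_j - (N-1)\,a_l'a_l\sum_{i=1}^{N}a_l'z_i\right\},$$

$$T_3 = \frac{2}{N(N-1)(N-2)}\sum_{l=1}^{p}\sum^{*}_{i,j,k}a_l'z_i\,a_l'z_j\,a_l'z_k,$$

and $\Sigma^{1/2} = A = (a_1, \dots, a_p)$. After much algebraic calculation, it is shown that $\mathrm{Cov}(T_1, T_2) = 0$, $\mathrm{Cov}(T_1, T_3) = 0$ and $\mathrm{Cov}(T_2, T_3) = 0$, and so

$$\sigma_T^{2} = \mathrm{Var}(T) = \mathrm{Var}(T_1) + \mathrm{Var}(T_2) + \mathrm{Var}(T_3).$$

We obtain analytic forms of $\mathrm{Var}(T_1)$, $\mathrm{Var}(T_2)$ and $\mathrm{Var}(T_3)$, which are as follows:

$$\mathrm{Var}(T_1) = \frac{6}{N}\,\mathbf{1}_p'(\odot^{3}\Sigma)\mathbf{1}_p, \qquad (4)$$

$$\mathrm{Var}(T_2) = \frac{18}{N(N-1)}\,\mathbf{1}_p'(\odot^{3}\Sigma)\mathbf{1}_p, \qquad (5)$$

$$\mathrm{Var}(T_3) = \frac{24}{N(N-1)(N-2)}\,\mathbf{1}_p'(\odot^{3}\Sigma)\mathbf{1}_p. \qquad (6)$$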
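As a sanity check on the definition of $T$ and on (4)-(6), the sketch below, assuming NumPy, evaluates $T$ directly from its triple sum and compares it with the coordinate-wise sum of the classical third $k$-statistic, $N/\{(N-1)(N-2)\}\sum_i (x_i - \bar{x})^{3}$, with which it agrees algebraically. It also adds the three variance coefficients exactly with rational arithmetic. The function names and the test data are illustrative and not from the paper.

```python
import itertools
from fractions import Fraction

import numpy as np

def T_definition(X):
    """T as the normalized sum over ordered triples of distinct indices (O(N^3))."""
    N = X.shape[0]
    total = 0.0
    for i, j, k in itertools.permutations(range(N), 3):
        total += ((X[i] - 0.5 * (X[j] + X[k])) ** 3).sum()
    return 4.0 * total / (3.0 * N * (N - 1) * (N - 2))

def T_kstat(X):
    """Sum over coordinates of the third k-statistic (O(Np))."""
    N = X.shape[0]
    Xc = X - X.mean(axis=0)
    return N / ((N - 1) * (N - 2)) * (Xc ** 3).sum()

def variance_coefficients(N):
    """Coefficients of 1'(Hadamard cube of Sigma)1 in Var(T1), Var(T2), Var(T3), Var(T)."""
    v1 = Fraction(6, N)
    v2 = Fraction(18, N * (N - 1))
    v3 = Fraction(24, N * (N - 1) * (N - 2))
    total = Fraction(6 * N, (N - 1) * (N - 2))
    return v1, v2, v3, total

rng = np.random.default_rng(1)
X = rng.exponential(size=(8, 4))          # skewed data, N = 8, p = 4
t_def, t_k = T_definition(X), T_kstat(X)
t_shift = T_definition(X + 5.0)           # T is invariant under a location shift
print(t_def, t_k, t_shift)
print(variance_coefficients(60))
```

The three coefficients add up to $6N/\{(N-1)(N-2)\}$ for every $N \geq 3$, so the leading term $6/N$ from $T_1$ dominates as $N$ grows.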

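These proofs are given in the Appendix. From (4), (5) and (6), we have

$$\sigma_T^{2} = \mathrm{Var}(T) = \frac{6N}{(N-1)(N-2)}\,\mathbf{1}_p'(\odot^{3}\Sigma)\mathbf{1}_p.$$

It is found that

$$\frac{\mathrm{Var}(T_2)}{\sigma_T^{2}} = \frac{3(N-2)}{N^{2}} \to 0, \qquad \frac{\mathrm{Var}(T_3)}{\sigma_T^{2}} = \frac{4}{N^{2}} \to 0 \qquad (N, p \to \infty).$$

From Chebyshev's inequality we have

$$\frac{T_2}{\sigma_T} \xrightarrow{p} 0, \qquad \frac{T_3}{\sigma_T} \xrightarrow{p} 0 \qquad (N, p \to \infty).$$

These convergences in probability imply that

$$\sigma_T^{-1}(T - T_1) \xrightarrow{p} 0 \qquad (N, p \to \infty).$$

We can also express

$$\sigma_T^{-1}T_1 = \frac{1}{N}\sum_{i=1}^{N}\left[\sigma_T^{-1}\sum_{l=1}^{p}\left\{(a_l'z_i)^{3} - 3a_l'a_l\,a_l'z_i\right\}\right] = \frac{1}{\sqrt{N}}\sum_{i=1}^{N}\frac{\eta_i}{\sqrt{N}\,\sigma_T}, \qquad \eta_i = \sum_{l=1}^{p}\left\{(a_l'z_i)^{3} - 3a_l'a_l\,a_l'z_i\right\}.$$

It is shown that $E[\eta_i] = 0$, and it is shown in the Appendix that

$$\mathrm{Var}\left(\frac{\eta_i}{\sqrt{N}\,\sigma_T}\right) = \frac{6}{N\sigma_T^{2}}\,\mathbf{1}_p'(\odot^{3}\Sigma)\mathbf{1}_p, \qquad (7)$$

which converges to 1 as $N, p \to \infty$. By virtue of the ordinary central limit theorem, we find that the asymptotic distribution of $\sigma_T^{-1}T_1$ is $N(0, 1)$ as $N, p \to \infty$. Using Slutsky's theorem, the asymptotic normality of $T$ can be proved, which is given as the following theorem.

Theorem 1. As $N, p \to \infty$,

$$\sigma_T^{-1}T \xrightarrow{D} N(0, 1).$$

In the Appendix, we prove the ratio consistency of $S/\mathbf{1}_p'(\odot^{3}\Sigma)\mathbf{1}_p$ under the condition C:

$$\mathrm{C}: \quad \limsup_{p\to\infty}\frac{\mathbf{1}_p'(\odot^{3}\Sigma_{+})\mathbf{1}_p}{\mathbf{1}_p'(\odot^{3}\Sigma)\mathbf{1}_p} < \infty,$$

where $\Sigma_{+} = (|\sigma_{ij}|)$ for $\Sigma = (\sigma_{ij})$. The result is given as the following theorem.

Theorem 2. Suppose that the condition C holds. Then, as $N, p \to \infty$,

$$\frac{S}{\mathbf{1}_p'(\odot^{3}\Sigma)\mathbf{1}_p} \xrightarrow{p} 1.$$

From Theorem 1, Theorem 2 and Slutsky's theorem, we find that

$$\sqrt{\frac{N}{6}}\,\hat{\gamma} \xrightarrow{D} N(0, 1) \qquad (8)$$

as $N, p \to \infty$ under the condition that C holds.

As an application, the assumption of multivariate normality is assessed by testing $\gamma = 0$. The null hypothesis that $\gamma = 0$ is rejected at significance level $\alpha$ when $|\sqrt{N/6}\,\hat{\gamma}| > z_{1-\alpha/2}$, where $z_{\alpha}$ is the $100\alpha\%$ point of the standard normal distribution.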
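The whole procedure can be sketched in a few lines, assuming NumPy and Python 3.8 or later. The code below uses power-sum expressions for $T$ and $S$ that are algebraically equivalent to their multi-index definitions (the same idea as the loop-free expressions of Section 3.1), and it checks the fast $S$ against a direct evaluation of the six-index sum on a tiny sample. All function names, the seed and the sample sizes are illustrative assumptions rather than the authors' implementation.

```python
import itertools
import math
from statistics import NormalDist

import numpy as np

def fast_T(X):
    """T from column-wise power sums (no loop over index triples)."""
    N = X.shape[0]
    s1, s2, s3 = X.sum(0), (X**2).sum(0), (X**3).sum(0)
    return float((N**2 * s3 - 3*N*s1*s2 + 2*s1**3).sum() / (N*(N-1)*(N-2)))

def fast_S(X):
    """S = A/P3 - 3B/P4 + 3C/P5 - D/P6 with A, B, C, D written via Hadamard products."""
    N = X.shape[0]
    pw = [X**i for i in range(4)]     # pw[i]: elementwise i-th powers (pw[0] is all ones)
    M = [[pw[i].T @ pw[j] for j in range(4)] for i in range(4)]  # M[i][j] plays the role of X_ij
    X10, X01, X20, X02 = M[1][0], M[0][1], M[2][0], M[0][2]
    X11, X12, X21 = M[1][1], M[1][2], M[2][1]
    X13, X22, X23, X33 = M[1][3], M[2][2], M[2][3], M[3][3]
    A = (X11**3 - 3*X11*X22 + 2*X33).sum()
    B = (-X11**3 + 5*X11*X22 - 6*X33 - 4*X10*X11*X12 + 4*X10*X23
         + 2*X12*X21 + X10*X01*X11**2 - X10*X01*X22).sum()
    C = (2*X11**3 - 14*X11*X22 + 24*X33 + 16*X10*X11*X12 - 24*X10*X23
         - 8*X12*X21 - 4*X10*X01*X11**2 + 8*X10*X01*X22
         - 4*X10*X01**2*X21 + 4*X10**2*X13 + 4*X10*X21*X02
         - 4*X13*X20 - 2*X20*X01**2*X11 + X20*X11*X02
         + X10**2*X01**2*X11).sum()
    s1, s2, s3 = X.sum(0), (X**2).sum(0), (X**3).sum(0)
    D = (s1**3 - 3*s1*s2 + 2*s3).sum()**2 - 6*A - 18*B - 9*C
    P = lambda k: math.perm(N, k)     # number of k-permutations of N
    return float(A/P(3) - 3*B/P(4) + 3*C/P(5) - D/P(6))

def S_definition(X):
    """Direct six-index sum; only feasible for very small N."""
    N = X.shape[0]
    total = 0.0
    for k, l, a, b, c, d in itertools.permutations(range(N), 6):
        v = (X[k] - X[l]) * (X[a] - X[b]) * (X[c] - X[d])
        total += v.sum() ** 2         # equals 1'(three-fold Hadamard product)1
    return total / (8 * math.perm(N, 6))

def normality_test(X, alpha=0.05):
    """Return (sqrt(N/6) * gamma_hat, reject) for the two-sided test of gamma = 0."""
    N = X.shape[0]
    gamma_hat = fast_T(X) / math.sqrt(fast_S(X))
    stat = math.sqrt(N / 6) * gamma_hat
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    return stat, abs(stat) > crit

rng = np.random.default_rng(2)
Xs = rng.standard_normal((7, 3))                  # tiny sample for the brute-force check
s_fast, s_def = fast_S(Xs), S_definition(Xs)
stat, reject = normality_test(rng.standard_normal((60, 100)))   # p > N is allowed
print(s_fast, s_def, stat, reject)
```

The fast version costs a handful of $p \times p$ matrix products, whereas the direct sum grows like $N^{6}$, which is why the brute-force check is restricted to $N = 7$ here.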

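3 Numerical results

3.1 Simulation

In order to see the accuracy of the asymptotic normality (8), we carried out a numerical simulation. It can be expressed that

$$S = \frac{A}{{}_N P_3} - \frac{3B}{{}_N P_4} + \frac{3C}{{}_N P_5} - \frac{D}{{}_N P_6}, \qquad (9)$$

where ${}_N P_k$ denotes the number of $k$-permutations of $N$ and

$$A = \sum^{*}_{k,l,\alpha}\mathbf{1}_p'(x_kx_k' \odot x_lx_l' \odot x_\alpha x_\alpha')\mathbf{1}_p, \qquad B = \sum^{*}_{k,l,\alpha,\beta}\mathbf{1}_p'(x_kx_k' \odot x_lx_l' \odot x_\alpha x_\beta')\mathbf{1}_p,$$

$$C = \sum^{*}_{k,l,\alpha,\beta,\gamma}\mathbf{1}_p'(x_kx_k' \odot x_lx_\alpha' \odot x_\beta x_\gamma')\mathbf{1}_p, \qquad D = \sum^{*}_{k,l,\alpha,\beta,\gamma,\delta}\mathbf{1}_p'(x_kx_l' \odot x_\alpha x_\beta' \odot x_\gamma x_\delta')\mathbf{1}_p.$$

To compute $A$, a triple sum is needed, which is performed by triple-loop calculations. Likewise, $B$, $C$, $D$ and $T$ require multi-loop calculations. Generally, multi-loop calculations take a lot of time. To save computing time, we provide other expressions for $A$, $B$, $C$, $D$ and $T$. Put

$$X_{ij} = \sum_{k=1}^{N}(\odot^{i}x_k)(\odot^{j}x_k)' \quad (i, j = 1, 2, 3), \qquad X_{0i}' = X_{i0} = \sum_{k=1}^{N}(\odot^{i}x_k)\mathbf{1}_p' \quad (i = 1, 2, 3),$$

$$s_k = \sum_{i=1}^{N}\odot^{k}x_i \quad (k = 1, 2, 3).$$

Then, it holds that

$$T = \frac{N}{(N-1)(N-2)}\,s_3'\mathbf{1}_p - \frac{3}{(N-1)(N-2)}\,(s_1 \odot s_2)'\mathbf{1}_p + \frac{2}{N(N-1)(N-2)}\,(\odot^{3}s_1)'\mathbf{1}_p,$$

$$A = \mathbf{1}_p'\left[\odot^{3}X_{11} - 3X_{11}\odot X_{22} + 2X_{33}\right]\mathbf{1}_p,$$

$$B = \mathbf{1}_p'\left[-\odot^{3}X_{11} + 5X_{11}\odot X_{22} - 6X_{33} - 4X_{10}\odot X_{11}\odot X_{12} + 4X_{10}\odot X_{23} + 2X_{12}\odot X_{21} + X_{10}\odot X_{01}\odot(\odot^{2}X_{11}) - X_{10}\odot X_{01}\odot X_{22}\right]\mathbf{1}_p,$$

$$\begin{aligned}C = \mathbf{1}_p'\big[&2\odot^{3}X_{11} - 14X_{11}\odot X_{22} + 24X_{33} + 16X_{10}\odot X_{11}\odot X_{12} - 24X_{10}\odot X_{23} - 8X_{12}\odot X_{21}\\ &- 4X_{10}\odot X_{01}\odot(\odot^{2}X_{11}) + 8X_{10}\odot X_{01}\odot X_{22} - 4X_{10}\odot(\odot^{2}X_{01})\odot X_{21} + 4(\odot^{2}X_{10})\odot X_{13}\\ &+ 4X_{10}\odot X_{21}\odot X_{02} - 4X_{13}\odot X_{20} - 2X_{20}\odot(\odot^{2}X_{01})\odot X_{11} + X_{20}\odot X_{11}\odot X_{02}\\ &+ (\odot^{2}X_{10})\odot(\odot^{2}X_{01})\odot X_{11}\big]\mathbf{1}_p,\end{aligned}$$

$$D = \left\{(\odot^{3}s_1)'\mathbf{1}_p - 3(s_1\odot s_2)'\mathbf{1}_p + 2s_3'\mathbf{1}_p\right\}^{2} - 6A - 18B - 9C.$$

One can see that these expressions involve no multiple sums over sample indices.

We generated the data based on the model (3). Firstly, we checked the asymptotic normality of $\hat{\gamma}$ given in (8). We made $10^{4}$ samples of size $N$, each consisting of $p$-dimensional vectors which are i.i.d. as $N_p(0, \Sigma)$. In this study, the following two cases for $\Sigma$ are treated: (1) $\Sigma = I_p$; (2) $\Sigma$ has a Toeplitz structure such that $\Sigma = \Sigma_T = D^{1/2}PD^{1/2}$, where $D = \mathrm{diag}(d_1, \dots, d_p)$ with $d_i = 5 + (-1)^{i}(p-i+1)/p$, and $P$ is the matrix whose $(i, j)$ element is a fixed constant in $(0, 1)$ raised to the power $|i-j|$. We note that these two cases satisfy the condition C. Figure 1 shows the Q-Q plot of $\sqrt{N/6}\,\hat{\gamma}$ when $\Sigma = I_p$ for the case in which $(N, p) = (60, 60)$, and Figure 2 shows the Q-Q plot when $\Sigma = \Sigma_T$. We can see from these figures that the standard normal distribution fits $\sqrt{N/6}\,\hat{\gamma}$ well. These numerical simulations were carried out for several other dimensions as well, $p = 120, 240, 480$, and also for other sample sizes, $N = 120, 240, 480$, but due to the similarity of the graphs, only a selection is reported here.

Figure 1: Normal Q-Q plot of $\sqrt{N/6}\,\hat{\gamma}$ when $\Sigma = I_p$ (theoretical quantiles against sample quantiles).

Figure 2: Normal Q-Q plot of $\sqrt{N/6}\,\hat{\gamma}$ when $\Sigma = \Sigma_T$ (theoretical quantiles against sample quantiles).

Next, we checked the attained significance level (ASL) for testing the null hypothesis $H_0 : F_p(0, I_p) = N_p(0, I_p)$ under the assumption of model (3) by using the following criterion:

$$\left|\sqrt{\frac{N}{6}}\,\hat{\gamma}\right| > z_{1-\alpha/2} \ \Rightarrow\ \text{reject the null hypothesis } H_0 : F_p(0, I_p) = N_p(0, I_p). \qquad (10)$$

To compute the ASL, a Monte Carlo simulation with $10^{4}$ replications was done for the nominal level $\alpha$. We computed the ASL for the cases in which $N = 60, 120, 240, 480$ and $p = 60, 120, 240, 480$, and report these values in Table 1. The settings of $\Sigma$ treated in this paper are as follows.

Case 1. $\Sigma = I_p$.

Case 2. $\Sigma = \Sigma_T$.

Case 3. Generate a sparse positive definite covariance matrix $\Sigma$ by Algorithm 1, with sparsity level $s = 0.7$.

Case 4. Use the same generating method as Case 3 with sparsity level $s = 0.4$.

Case 5. Use the same generating method as Case 3 with a third, smaller sparsity level.

Algorithm 1. The algorithm for generating a sparse covariance matrix $\Sigma$.

1: Construct a $p \times p$ matrix $R = (r_{ij})$ as follows. For each $i < j$, set $r_{ij} = 0$ with probability $s$, where $s$ is the level of sparsity in $R$; with the remaining probability $1 - s$, draw $r_{ij}$ from either $\mathrm{Unif}(0, 1)$ or $\mathrm{Unif}(-1, 0)$. Then set $r_{ji} = r_{ij}$ to obtain symmetry. Furthermore, set the diagonal elements of $R$ equal to 1.

2: Prepare a $p \times p$ diagonal matrix $D = (d_{ij})$ such that $d_{11}, \dots, d_{pp}$ are i.i.d. draws from a chi-squared distribution.

3: Create $A = DRD$.

4: In order to obtain positive definiteness of $\Sigma$, calculate the minimum eigenvalue $e_{\min}$ of $A$. Set $\Sigma = A + (|e_{\min}| + c)I_p$ with a small positive constant $c$ if $e_{\min} \leq 0$, and $\Sigma = A$ otherwise.

For Cases 3-5, we repeated the Monte Carlo simulation several times and report the average of the resulting values in the table. The value in parentheses is the standard error.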
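The steps of Algorithm 1 can be sketched as follows, assuming NumPy. Several constants of the algorithm are not legible in this copy of the paper, so the chi-squared degrees of freedom `df`, the shift constant `eps` and the share `pos_frac` of positive non-zero entries are hypothetical parameters chosen purely for illustration, as is the function name `sparse_covariance`.

```python
import numpy as np

def sparse_covariance(p, s, rng, df=1.0, eps=0.1, pos_frac=0.5):
    """Sparse positive definite matrix in the spirit of Algorithm 1.

    df, eps and pos_frac are illustrative assumptions, not values from the paper.
    """
    R = np.zeros((p, p))
    iu = np.triu_indices(p, k=1)                 # index pairs with i < j
    m = iu[0].size
    nonzero = rng.random(m) >= s                 # an entry is zero with probability s
    sign = np.where(rng.random(m) < pos_frac, 1.0, -1.0)
    R[iu] = nonzero * sign * rng.random(m)       # Unif(0, 1) or Unif(-1, 0) draws
    R = R + R.T                                  # symmetrize
    np.fill_diagonal(R, 1.0)                     # unit diagonal
    d = rng.chisquare(df, size=p)                # diagonal of D
    A = np.outer(d, d) * R                       # equals D R D for D = diag(d)
    e_min = np.linalg.eigvalsh(A).min()
    if e_min <= 0:                               # shift the spectrum to make A positive definite
        A = A + (-e_min + eps) * np.eye(p)
    return A

rng = np.random.default_rng(3)
Sigma = sparse_covariance(30, 0.4, rng)
print(np.linalg.eigvalsh(Sigma).min())
```

Writing $DRD$ as an elementwise product with the outer product of the diagonal keeps the result exactly symmetric in floating point, which makes the eigenvalue check in the last step well behaved.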

Table 1: Attained significance level of the proposed test. Columns: $N$, $p$, Case 1, Case 2, Case 3 (S.E.), Case 4 (S.E.), Case 5 (S.E.).

We can see from Table 1 that the ASL of our statistic is little affected by the sparsity of the covariance matrix. It is observed that the accuracy of the approximation gets better as $N$ and $p$ become large.

Thirdly, we checked the low-dimensional performance of the proposed test, with the dimension fixed at a single small value of $p$. We compare the ASL of our proposed test with that of Mardia's [10] test (1) and that of Srivastava's [8] test (2) for the cases in which $N = 40, 50, 60, 70, 80, 90, 100$ and $\Sigma = I_p$. The computing methodology is the same as that for Table 1. In Figure 3, the symbol P denotes the ASL of the proposed test (10), M denotes that of Mardia's [10] test (1), and S denotes that of Srivastava's [8] test (2). We can see that the ASL of our proposed test exceeds the nominal level, whereas the ASLs of Mardia's [10] test and Srivastava's [8] test fall below it. Our approximation seems to be good for $N \geq 70$, whereas the other two approximations are not good. The precision of all three tests gets better as $N$ becomes large.

Figure 3: Comparison of ASL in the low-dimensional setting (P: proposed test, S: Srivastava's test, M: Mardia's test).

Finally, we checked the empirical power (EMP) of the proposed test by Monte Carlo simulation.

The setting of the local alternative hypothesis treated in this paper is as follows: for $\varepsilon = (\varepsilon_1, \dots, \varepsilon_p)' \sim F_p(0, I_p)$,

$$\varepsilon_i = \sqrt{\frac{\nu}{2}}\left(\frac{c_i}{\nu} - 1\right) \quad (i = 1, \dots, p),$$

where $c_1, \dots, c_p$ are i.i.d. as the chi-squared distribution with $\nu$ degrees of freedom. We computed the EMP by Monte Carlo repetition for the case in which $\alpha = 0.05$ and $\nu$ is fixed. The settings of $N$ and $p$ are the same as the ones in Table 1. We carried out the simulation for all models of $\Sigma$ listed for Table 1, but due to the similarity of the tendencies, Cases 2-5 are omitted here. We can see from Table 2 that the EMP is monotone increasing in $p$ and $N$.

Table 2: Empirical power of the proposed test for Case 1 (rows: $N$, columns: $p$).

3.2 Real data analysis

We applied our test to the data sets used in Kubokawa and Srivastava [6]. The first data set is the colon data, which consists of expression levels of 2000 ($= p$) genes on 22 normal colon tissues and 40 tumor colon tissues. We preprocessed the data by applying the $\log_{10}$ transformation. The second data set is the leukemia data, which consists of 3571 ($= p$) gene expressions on 47 patients suffering from acute lymphoblastic leukemia (ALL, 47 cases) and 25 patients suffering from acute myeloid leukemia (AML, 25 cases). This data set was preprocessed by following the protocol written in Dudoit et al. [1]. For the colon data, we computed the p-value of our test separately for the normal colon tissues and for the tumor colon tissues. For the leukemia data, we computed the p-value of our test separately for ALL and for AML. These results indicate that the multivariate normality assumption on both sets of data cannot be rejected at the usual significance level of 5%.

4 Concluding remarks

This paper treats the multivariate 3rd moment $\gamma$, which is defined by using the Hadamard product of observation vectors. When the dimension equals 1, $\gamma$ becomes the univariate skewness. An estimator of $\gamma$, denoted by $\hat{\gamma}$, which is well defined for $p > N$, is proposed. Based on $\hat{\gamma}$, a testing criterion for assessing multivariate normality is given. We provide an expression of the test statistic that involves no multiple sums, which shortens the computing time. The performance of the testing criterion, in terms of the attained significance level and the empirical power, is shown through simulations for several settings of sample sizes and dimensions. We apply our proposed test to microarray data sets, which are among the most popular examples of high-dimensional data.

To assess the normality of high-dimensional data, we recommend using not only our proposed test but also Himeno and Yamada's [4] test based on $\hat{\kappa}$. An omnibus test using $\hat{\gamma}$ and $\hat{\kappa}$, which imitates Jarque and Bera's [5] test, is left as future work.

A Moments of the statistics

In this section, analytic forms of $\mathrm{Var}(T_1)$, $\mathrm{Var}(T_2)$ and $\mathrm{Var}(T_3)$ are derived. We derive them by using the following results (Lemma 1), the proofs of which are tedious but simple and are therefore skipped.

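Lemma 1. Let $z$ be a random vector distributed as $N_p(0, I_p)$, and let $a$ and $b$ be constant vectors. Then,

$$E[(a'z)^{2}] = a'a, \qquad E[(a'z)^{4}] = 3(a'a)^{2}, \qquad E[(a'z)^{2}(b'z)^{2}] = 2(a'b)^{2} + a'a\,b'b,$$

$$E[(a'z)^{3}\,b'z] = 3a'a\,a'b, \qquad E[(a'z)^{6}] = 15(a'a)^{3}, \qquad E[(a'z)^{3}(b'z)^{3}] = 6(a'b)^{3} + 9a'a\,a'b\,b'b.$$

Here, we write the analytic forms of $\mathrm{Var}(T_1)$, $\mathrm{Var}(T_2)$ and $\mathrm{Var}(T_3)$ in the following lemma.

Lemma 2. For $T_1$, $T_2$ and $T_3$ as defined in Section 2.1,

$$\mathrm{Var}(T_1) = \frac{6}{N}\,\mathbf{1}_p'(\odot^{3}\Sigma)\mathbf{1}_p, \qquad \mathrm{Var}(T_2) = \frac{18}{N(N-1)}\,\mathbf{1}_p'(\odot^{3}\Sigma)\mathbf{1}_p, \qquad \mathrm{Var}(T_3) = \frac{24}{N(N-1)(N-2)}\,\mathbf{1}_p'(\odot^{3}\Sigma)\mathbf{1}_p.$$

Proof. In Section 2.1, we expressed $T_1$ as

$$T_1 = \frac{1}{N}\sum_{i=1}^{N}\eta_i, \qquad \eta_i = \sum_{l=1}^{p}\left\{(a_l'z_i)^{3} - 3a_l'a_l\,a_l'z_i\right\}.$$

Since $\eta_1, \dots, \eta_N$ are i.i.d., it holds that

$$\mathrm{Var}(T_1) = \frac{1}{N}\mathrm{Var}(\eta_1) = \frac{1}{N}\mathrm{Var}(\zeta), \qquad \zeta = \sum_{l=1}^{p}\left\{(a_l'z)^{3} - 3a_l'a_l\,a_l'z\right\},$$

where $z \sim N_p(0, I_p)$. We find that $E[\zeta] = 0$, and so

$$\begin{aligned}\mathrm{Var}(\zeta) &= E[\zeta^{2}]\\ &= E\left[\sum_{l=1}^{p}\left\{(a_l'z)^{3} - 3a_l'a_l\,a_l'z\right\}^{2} + \sum^{*}_{l,\alpha}\left\{(a_l'z)^{3} - 3a_l'a_l\,a_l'z\right\}\left\{(a_\alpha'z)^{3} - 3a_\alpha'a_\alpha\,a_\alpha'z\right\}\right]\\ &= \sum_{l=1}^{p}(15 - 18 + 9)(a_l'a_l)^{3} + \sum^{*}_{l,\alpha}\left\{6(a_l'a_\alpha)^{3} + (9 - 9 - 9 + 9)\,a_l'a_l\,a_l'a_\alpha\,a_\alpha'a_\alpha\right\}\\ &= 6\sum_{l=1}^{p}(a_l'a_l)^{3} + 6\sum^{*}_{l,\alpha}(a_l'a_\alpha)^{3}\\ &= 6\,\mathbf{1}_p'(\odot^{3}A^{2})\mathbf{1}_p = 6\,\mathbf{1}_p'(\odot^{3}\Sigma)\mathbf{1}_p,\end{aligned}$$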

11 where the third equality follows from Lemma. Since E[T ] 0, we have Var(T ) E[T ] 9 ( ) E * (a l z i ) 4 (a lz j ) ( )a la l a lz i + * * (a l z i ) 4 (a lz j ) ( )a la l a lz i l,α * (a α z i ) 4 (a αz j ) ( )a αa α a αz i 9 ( ) E l z i ) 4 (a lz j ) + l z i ) (a lz j ) (a lz k ) l,α +( ) (a la l ) (a lz i ) ( )a * la l (a l z i ) (a lz j ) + * l z i ) (a αz i ) a lz j a αz j + l z i ) a lz j a αz j (a αz k ) ( )a * αa α (a l z i ) a lz j a αz j ( )a * la l (a α z i ) a αz j a lz j +( ) a la l a αa α 9 ( ) a lz i a αz i }],k,k [ { ( ) + ( )( ) + ( ) ( ) } (a la l ) + * { ( ){(a la α ) + a la l a αa α }a la α l,α + {( )( ) ( ) ( ) + ( ) }a la l a la α a αa α ] 9 ( ( ) ) 8 ( ) p( A ) p 8 ( ) p( Σ) p, (a la l ) + ( ) l a α ) where the fourth equality follows from Lemma. l,α

12 Since E[T ] 0, we have Var(T ) E[T ] 4 ( ) ( ) E * a l z i a lz j a lz k l,α,k,k + * * a l z i a lz j a lz k * a α z i a αz j a αz k 4 ( ) ( ) E 6 +6,k,k * * a l z i z ia α a lz j z ja α a lz k z ka α l,α,k 4 ( )( ) (a la l ) + 4 ( )( ) p( A ) p 4 ( )( ) p( Σ) p, where the fourth equality follows from Lemma. B Proof of Theorem l z i ) (a lz j ) (a lz k ) l a α ) In this section, we give a proof of Theorem. Before proving it, we prepare three lemmas. The first two lemmas (Lemma and Lemma 4) treat formula about multi-sum, and the last lemma treats the order of the expectations which is used to prove Theorem. Since proofs of Lemma and Lemma 4 are tedious but simple, we skipped here. Lemma. The following equations hold: ij σstσ is σ jt tr{( Σ)Σ} tr DΣ( Σ)ΣD tr D( Σ)Σ( Σ) p( Σ) p,s,t ij σ st σ is σ jt σ it σ js,s,t + p{d ( Σ)} p j s t l,α ij σis,s ij σ is σ jj σjs,s σ ij σ st σ is σ jt σ it σ js tr D( Σ)Σ( Σ) tr D( Σ)Σ( Σ) + p{d ( Σ)} p ii σijσ isσ js,,s ii σijσ isσ js.,s

13 Lemma 4. The following equations hold: ij σis p( Σ) p p{d ( Σ)} p ii σij 6 ij,,s ii σ ij σ is σjs tr DΣ( Σ)ΣD p{d ( Σ)} p ii σij ii σijσ jj,,s ii σijσ isσ js tr D( Σ)Σ( Σ) p{d ( Σ)} p,s ii σij ii σijσ 4 jj. Lemma 5. Let Q klαβγδ a iz k z l a j a i z α z β a j a iz γ z δa j, j where Σ / A (a,..., a p ), and z k, z l, z α, z β, z γ and z δ are i.i.d. as p (0, I p ). Then, E[Q ] O max { p( Σ) p }, σ ij σ st σ is σ jt σ it σ js, () j s t E[Q 4] O max { p( Σ) p }, σ ij σ st σ is σ jt σ it σ js, () j s t E[Q 45] O ( { p( Σ) p } ), () E[Q 456] O ( { p( Σ) p } ). (4) In addition, under the assumption C, E[Q ] O ( { p( Σ) p } ), E[Q 4] O ( { p( Σ) p } ). Proof. Since (), () and (4) can be proved by using the same way to show (), we only show this, and omit other three here. It can be expressed that [ E[Q ] E (a iz z a i a iz z a i a iz z a i ) + i z z a i a iz z a i a iz z a i )(a jz z a j a jz z a j a jz z a j ) + i z z a j a iz z a j a iz z a j ) + 4 i z z a j a iz z a j a iz z a j )(a iz z a s a iz z a s a iz z a s ),s + i z z a j a iz z a j a iz z a j )(a sz z a t a sz z a t a sz z a t ) + 4 +,s,t i z z a i a iz z a i a iz z a i )(a iz z a s a iz z a s a iz z a s ) i z z a i a iz z a i a iz z a i )(a jz z a s a jz z a s a jz z a s )

14 4 {(a ia i ) } * {a i a i a ja j + (a ia j ) } + * {a i a i a ja j + (a ia j ) } i a i a ja s + a ia j a ia s ) +,s i a j a sa t + a ia s a ja t + a ia t a ja s ),s,t + 4 i a i a ia j ) + i a i a ja s + a ia j a ja s ),s 7 σii 6 + ii σjj + 8 ii σijσ jj + 6 ii σijσ 4 jj ij,s + 08 ii σij + 48 ij σis + 6 ii σjs + 6 ii σ ij σ is σjs + 7 ii σijσ isσ js +,s,s ij σst + 8,s,t,s ij σstσ is σ jt + 6,s,t ij σ is σ it σ js σ jt σ st,s,t { p( Σ) p } + 4 σii ii σijσ jj + 6 ii σijσ 4 jj ij + 96 ii σij + 6 ij σis + 6 ii σ ij σ is σjs + 7 ii σ js σijσ is + 8,s ij σstσ is σ jt + 6,s,t,s ij σ is σ it σ js σ jt σ st,s,t { p( Σ) p } + 8 tr{( Σ)Σ} 8 tr DΣ( Σ)ΣD 6 tr D( Σ)Σ( Σ) 8 p( Σ) p + 48 p{d ( Σ)} p + 4 σii ii σijσ jj + 6 ii σijσ 4 jj ij + 96 ii σij + 8 ij σis + 8 ii σ ij σ is σjs + 6 ii σijσ isσ js + 6 j s t,s σ ij σ is σ it σ js σ jt σ st,s,s { p( Σ) p } + 8 tr{( Σ)Σ} 4 p{d ( Σ)} p j s t σ ij σ is σ it σ js σ jt σ st { p( Σ) p } + 8 tr{( Σ)Σ} + 6 j s t,s σ ij σ is σ it σ js σ jt σ st, σii ii σij where the second equality follows from Lemma, the third equality from bottom follows from Lemma and the second equality from bottom follows from Lemma 4. Since Σ is positive definite, we have tr{( Σ)Σ} {tr( Σ)Σ}, where the right-hand side of the inequality can be written as { p( Σ) p }. From this result, it is found that E[Q ] O max { p( Σ) p }, σ ij σ st σ is σ jt σ it σ js. j s t

15 5 ote that σ ij σ st σ is σ jt σ it σ js σ ij σ st ( σ is ) ( σ jt ) σ js σ it From this inequality, we have σ ij σ st σ is σ jt σ it σ js j s t Hence under the assumption C, j s t σ ij σ st σ is σ jt + σsj σ ti σ is σ jt, j s t σ ij σ st σ is σ jt σ it σ js [ tr{( Σ + )Σ + } + tr{( Σ + )Σ + } ] [ p( Σ + ) p ]. σ ij σ st σ is σ jt σ it σ js O ( { p( Σ) p } ), and so E[Q ] O ( { p( Σ) p } ). Proof of Theorem. Since S is unbiased estimator of p( Σ) p, it is sufficient to show that Var(S/ p( Σ) p ) 0 (5) as, p. Since S is invariant under location shift, without loss of generality, we may assume that µ 0. Each of A, B, C and D in (9) can be written as A B C D * p {(Σ / z k z kσ / ) (Σ / z l z lσ / ) (Σ / z α z ασ / )} p, k,l,α k,l,α,β k,l,α,β,γ k,l,α,β,γ,δ From (9), we can express that ( S/ p ( Σ) p ) * p {(Σ / z k z kσ / ) (Σ / z l z lσ / ) (Σ / z α z βσ / )} p, * p {(Σ / z k z kσ / ) (Σ / z l z ασ / ) (Σ / z β z γσ / )} p, * p {(Σ / z k z lσ / ) (Σ / z α z βσ / ) (Σ / z γ z δσ / )} p. [{ } A P p( B Σ) p P 4 p( + C Σ) p P 5 p( D Σ) p P 6 p( Σ) p [ { } ( ) ( ) ( ) ( A B C 4 P p( + Σ) p P 4 p( + Σ) p P 5 p( Σ) p ( ) ( ) ] D + P 6 p(, Σ) p ] )

16 6 where the inequality follows from Jensen inequality. Thus, it is described that ( ) [ [ { } ] ( ) [ ( ) ] S A B Var p( 4 E Σ) p P p( + E Σ) p P 4 p( Σ) p ( ) [ ( ) ] ( ) [ ( ) ]] C D + E P 5 p( + E Σ) p P 6 p(. (6) Σ) p For the first term in the right-hand side of the inequality (6), expanding {.}, and excluding the term whose expectation becomes 0, we have [ { } ] ( ) A E P p( Σ) p P p( E 6 * Q Σ) kkllαα + 8 * Qkkllαα Q kkllββ p k,l,α k,l,α,β +9 * Qkkllαα Q kkββγγ + * Qkkllαα Q ββγγδδ where k,l,α,β,γ k,l,α,β,γ,δ ( ) [6 P E[A ] + 8 P 4 E[A ] + 9 P 5 E[A ]] P { } P 6 + ( P ) ( ) [6 P E[A ] + 8 P 4 E[A ] + 9 P 5 E[A ]] P ( 5 + 0), (7) P E[A ] E[Q ] { p( Σ) p }, E[A ] E[Q Q 44 ] { p( Σ) p }, E[A ] E[Q Q 4455 ] { p( Σ) p }. Using the same calculation method, the second-fourth terms in the right-hand side of the inequality (6) can be written as [ { } ] ( ) B E P 4 p( [ P 4 E[B ] + P 4 E[B ] + 4 P 5 E[B ] + 4 P 5 E[B 4 ] Σ) p P 4 + P 6 E[B 5 ] + P 6 E[B 6 ]], (8) [ { } ] ( ) C E P 5 p( [4 P 5 E[C ] + 8 P 5 E[C ] + 8 P 5 E[C ] + 4 P 5 E[C 4 ] Σ) p P 5 +4 P 6 E[C 5 ] + 8 P 6 E[C 6 ] + 8 P 6 E[C 7 ] + 4 P 6 E[C 8 ]], (9) [ { } ] ( ) D E P 6 p( [6 P 6 E[D ] + 4 P 6 E[D ] + 4 P 6 E[D ] Σ) p P 6 +6 P 6 E[D 4 ]], (0)

17 7 where E[B ] E[Q 4] { p( Σ) p }, E[B ] E[Q 4Q 4 ] { p( Σ) p }, E[B ] E[Q 4Q 554 ] { p( Σ) p }, E[B 4 ] E[Q 4Q 554 ] { p( Σ) p }, E[B 5 ] E[Q 4Q ] { p( Σ) p }, E[B 6 ] E[Q 4Q ] { p( Σ) p }, E[C ] E[Q 45] { p( Σ) p }, E[C ] E[Q 45Q 54 ] { p( Σ) p }, E[C ] E[Q 45Q 45 ] { p( Σ) p }, E[C 4 ] E[Q 45Q 54 ] { p( Σ) p }, E[C 5 ] E[Q 45Q 6645 ] { p( Σ) p }, E[C 6 ] E[Q 45Q 6654 ] { p( Σ) p }, E[C 7 ] E[Q 45Q 6645 ] { p( Σ) p }, E[C 8 ] E[Q 45Q 6654 ] { p( Σ) p }, E[D ] E[Q 456] { p( Σ) p }, E[D ] E[Q 456Q 465 ] { p( Σ) p }, E[D ] E[Q 456Q 465 ] { p( Σ) p }, E[D 4 ] E[Q 456Q 465 ] { p( Σ) p }. From Lemma 5, we have E[A ] O(), E[B ] O(), E[C ] O(), E[D ] O(). It follows from Cauchy-Schwarz inequality that E[A j ] E[A ] (j,, 4), E[B j ] E[B ] (j,..., 6), E[C j ] E[C ] (j,..., 8), E[D j ] E[D ] (j,, 4). From them, we find that E[A j ] O() (j,, 4), E[B j ] O() (j,..., 6), E[C j ] O() (j,..., 8), E[D j ] O() (j,, 4). Combining these results with (7), (8), (9) and (0), it is found that [ { } ] A E P p( O( ), Σ) p [ { } ] B E P 4 p( O( ), Σ) p [ { } ] C E P 5 p( O( 4 ), Σ) p [ { } ] C E P 6 p( O( 6 ), Σ) p and so which converges to 0 as, p. ( ) S Var p( O( ), Σ) p

References

[1] Dudoit, S., Fridlyand, J. and Speed, T.P. (2002). Comparison of discrimination methods for the classification of tumors using gene expression data. J. Amer. Statist. Assoc., 97.

[2] Fujikoshi, Y., Ulyanov, V.V. and Shimizu, R. (2010). Multivariate Statistics: High-Dimensional and Large-Sample Approximations, John Wiley & Sons, Hoboken, NJ.

[3] Henze, N. (2002). Invariant tests for multivariate normality: a critical review. Statist. Papers, 43.

[4] Himeno, T. and Yamada, T. (2014). Estimations for some functions of covariance matrix in high dimension under non-normality and its applications. J. Multivariate Anal., 130.

[5] Jarque, C.M. and Bera, A.K. (1987). A test for normality of observations and regression residuals. Internat. Statist. Rev., 55, 163–172.

[6] Kubokawa, T. and Srivastava, M.S. (2008). Estimation of the precision matrix of a singular Wishart distribution and its application in high-dimensional data. J. Multivariate Anal., 99.

[7] Schott, J.R. (2005). Matrix Analysis for Statistics, 2nd ed., John Wiley & Sons, Hoboken, NJ.

[8] Srivastava, M.S. (1984). A measure of skewness and kurtosis and a graphical method for assessing multivariate normality. Statist. Prob. Lett., 2.

[9] Srivastava, M.S. (2002). Methods of Multivariate Statistics, John Wiley & Sons, NY.

[10] Mardia, K.V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57.

[11] Mecklin, C.J. and Mundfrom, D.J. (2004). An appraisal and bibliography of tests for multivariate normality. Internat. Statist. Rev., 72, 123–138.
