Estimation of multivariate 3rd moment for high-dimensional data and its application for testing multivariate normality

Size: px
Start display at page:

Download "Estimation of multivariate 3rd moment for high-dimensional data and its application for testing multivariate normality"

Transcription

1 Estimation of multivariate rd moment for high-dimensional data and its application for testing multivariate normality Takayuki Yamada, Tetsuto Himeno Institute for Comprehensive Education, Center of General Education Kagoshima University --0 Korimoto, Kagoshima Japan Faculty of Data Science Shiga University -- Banba, Hikone, Shiga 5-85 Japan Abstract This paper is concerned with the multivariate rd moment and its estimation. Mardia (970) and Srivastava (984) proposed the multivariate skewness and its estimator, independently. However, these estimators can not defined for the case in which the dimension p is larger than the sample size. In this paper, we treat the multivariate rd moment γ which is defined by using Hadamard product of observation vectors, and propose an estimate of γ which is well defined when p >. Based on the estimator, we propose new test for multivariate normality. Simulation results revealed that our proposed test has good accuracy. Keywords: Multivariate rd moment, Hadamard product, testing multivariate normality, (n, p)- asymptotic AMS 00 subject classification: Primary 6H5, Secondary 6E0 Introduction Let x and y be p-variate random vectors which are independently and identically distributed as a p-dimensional distribution F p (µ, Σ) with mean vector µ and covariance matrix Σ. Mardia (970) defined multivariate skewness and kurtosis as β M,p E[{(x µ) Σ (y µ)} ], β M,p E[{(x µ) Σ (x µ)} ], respectively. When p, β,p and β,p are reduced to the ordinal univariate squared skewness and kurtosis, respectively. Let x,..., x be a random sample drown from a population with distribution F p (µ, Σ). Mardia [0] proposed sample counterparts of β M,p and β M,p as respectively, where b M,p b M,p x j {(x i x) S (x j x)}, {(x i x) S (x i x)}, x i, S (x i x)(x i x). Srivastava [8] (as cited in a more recent book by this author [9]) defined other multivariate skewness β S,p and kurtosis β S,p, which are described as follows: Let Γ (γ,..., γ p ) be an orthogonal matrix such that Γ (γ,..., γ p ), where D λ diag(λ,..., λ p ). Let y l γ l x and θ l γ l µ for l,..., p. address:yamada@gm.kagoshima-u.ac.jp

2 Then β S,p and β S,p are given by β S,p p β S,p p { } E[(y l θ l ) ], λ / l E[(y l θ l ) 4 ] λ, l respectively. Srivastava [8] proposed sample counterparts of β S,p and β S,p as b S,p and b S,p, respectively. To describe them, let {/( )}S HD u H, where H (h,..., h p ) is an orthogonal matrix and D u diag(u,..., u p ). With being y li h lx i (l,..., p, i,..., ), b S,p and b S,p are given by b S,p p b S,p p { u / l u l } (y li ȳ l ), (y li ȳ l ) 4, respectively, where ȳ l y li. These sample counterparts of multivariate skewness and kurtosis are applied for testing multivariate normality, i.e., testing null hypothesis that population distribution is multivariate normal. Under multivariate normality, Mardia [0] showed that (/6)b M,p converges in distribution to χ (f M ), chisquared distribution with f M degrees of freedom, and {b M,p p(p + )}/{8p(p + )} / converges in distribution to (0, ). Here, f M p(p + )(p + )/6. Thus, if 6 b M,p χ f M ( α) () non-normality of the data is expected, where χ f M ( α) is the 00( α)% point of χ (f M ). Similarly, the normality is rejected if 8p(p + ) b M,p p(p + ) z α/, where z α/ is the 00( α/)% point of the standard normal distribution. Srivastava [8] showed that {p/6}b S,p converges in distribution to χ (p) under multivariate normality, and showed that (p/4) / (b S,p ) converges in distribution to (0, ). Based on these convergences, Srivastava [8] proposed testing criterion which rejects the normality of the data if and proposed the testing criterion that the normality is rejected if p 4 (b S,p ) z α/. p 6 b S,p χ p( α), () Some other methods to assess the multivariate normality have been studied. For a review of the results, see, e.g., Henze [] and Mecklin and Mundfrom []. In last years, we encounter more and more problems in applications when p is comparable with or even exceeds it. For example, financial data, consumer data, network data and medical data have this feature. When p >, it is impossible to define b M,p, b M,p, b S,p and b S,p, because the sample covariance matrix S becomes singular.

3 Instead of dealing with β M,p or β S,p, Himeno and Yamada [4] has treated κ E[{(x µ) (x µ)}] tr Σ (tr Σ). When Σ equals to identity matrix I p, κ is the same as the another definition of the Mardia [0] s multivariate kurtosis β M, p(p + ). Himeno and Yamada [4] gave the unbiased estimator of κ as the sample counterpart of κ, which is as follows: where ˆκ ( )( ) { tr S + (tr S) ( + )Q}, Q {(x x) (x x)}. It is noted that ˆκ is well defined for the case in which p >. Himeno and Yamada [4] showed that (/8) / p (ˆκ/â ) converges in distribution to (0, ) as and p go to infinity together under the following assumptions: () ( )/p converges to a positive constant as, p go to infinity; () a i tr Σ i /p converges to a positive constant as p, i,..., 4. Here, â is an unbiased estimator of a, which is as follows: â ( )( )( ) {( )( ) tr S + (tr S) ( ) Q}. They proposed testing criterion which rejects the normality of the data if ˆκ 8 p z α/. â It is reported in Himeno and Yamada [4] that the power of this test gets large as and p become large for local alternative hypothesis that population distribution is multivariate t. In this paper, we treat other multivariate rd moment γ instead of dealing with β M,p or β S,p. An estimator of γ is proposed that is well defined for the case in which p >. Based on ˆγ, we propose a new test for assessing multivariate normality. The present paper is organized as follows. In Section, we propose γ, and gave the estimator ˆγ. Asymptotic normality of ˆγ is shown under the asymptotic framework that, p for the case in which population distribution is multivariate normal. We give a testing criterion for multivariate normality based on ˆγ. Some results of small-scale simulation including to see attained significance level and empirical power are reported in Section. We give concluding remarks in Section 4. All technical proofs are relegated to the Appendix. Multivariate rd moment and its estimator Assume that x,..., x are independently and identically distributed (i.i.d.), and assume the following multivariate linear model: x i µ + Σ / ε i, () where ε,..., ε are i.i.d. as a p-dimensional distribution F p (0, I p ). For random vector x µ+σ / ε with ε F p (0, I p ), let γ E[{ (x µ)} p ] p( Σ) p, where the notation i A stands for the Hadamard product of i matrices A, i.e., i A A A A (i times). We note that i A is positive definite, which is guaranteed by Schur s product theorem (cf. Schott [7]). When p, γ is reduced to the univariate skewness. It is hard to obtain analytic form for

4 4 the unbiased estimator of γ. Instead of getting it, we construct the estimator by taking the ratio of unbiased estimator for the numerator of γ divided by for the denominator, which an estimator we proposed is as follows. E[{ ˆγ (x µ)} p ] T, p ( Σ) S p where T 4 * [ { x i P (x j + x k )}] p, S 8 P 6,k * p {(x k x l )(x k x l ) (x α x β )(x α x β ) (x γ x δ )(x γ x δ ) } p. k,l,α,β, γ,δ Here, the notation stands for the sum of all pairs of indices which are not equal. For example,,k j,j i k,k i,k j.. Asymptotic distribution of ˆγ and a test for γ Firstly, we show the asymptotic normality of ˆγ as, p under the assumption that F p (0, I p ) p (0, I p ). It is noted that T is invariant under the location shift. Hence, without loss of generality, we may assume µ 0. It can be described that T 4 { * [Σ / z i P (z j + z k )}] p.,k For deriving the anaclitic form of Var(T ), we decompose T as T + + (a lz i ) ( )( ) ( ),k l z i ) a lz j * a l z i a lz j a lz k { (a l z i ) a la l a lz i } ( ) ( )( ) T + T + T,,k * a l z i a lz j a lz k l z i ) a lz j ( )a la l a lz i where Σ / A (a,..., a p ). After much algebraic calculations, it is shown that Cov(T, T ) 0, Cov(T, T ) 0 and Cov(T, T ) 0, and so σ T Var(T ) Var(T ) + Var(T ) + Var(T ). We obtain analytic forms of Var(T ), Var(T ) and Var(T ), respectively, which are as follows. Var(T ) 6 p( Σ) p, (4) 8 Var(T ) ( ) p( Σ) p, (5) 4 Var(T ) ( )( ) p( Σ) p. (6)

5 5 These proofs are given in Appendix. From (4), (5) and (6), we have It is found that Var(T ) 6 ( )( ) p( Σ) p. Var(T ) σ T From Chebyshev inequality we have ( ) 0 (, p ), Var(T ) σ T 4 0 (, p ). These probability convergences imply that p p T 0 (, p ), T 0 (, p ). σ T σ T σ T (T T ) p 0 (, p ). And we can express that σ T T [ σ T { (a l z i ) a la l a lz i } ] η i σ T. It is shown that E[η i ] 0, and is shown in Appendix that ( ) ηi Var 6 σ T σt p( Σ) p, (7) which converges to as, p. By virtue of the ordinal central limit theorem, we find that the asymptotic distribution of σ T T is (0, ) as, p. Using Slutsly theorem, the asymptotic normality of T can be proved, which is given as the following theorem. Theorem. As, p, σ T T D (0, ). In Appendix, we prove the rate consistency for S/ p( Σ) p under the condition C; C : lim sup p p( Σ + ) p p( Σ) p <, where Σ + ( σ ij ) for Σ (σ ij ). The result is given as the following theorem. Theorem. Suppose that the condition C holds. Then, as, p, S p( Σ) p p. From these theorems (Theorem and Theorem ) and Slutsky s theorem, we find that 6 ˆγ D (0, ) (8) as, p under the condition that C holds. As an application, assessing the assumption of multivariate normality is considered by testing γ 0. The null hypothesis that γ 0 is rejected with significance level α for the case in which /6ˆγ > z α/, where z α is the 00 α% point of the standard normal distribution.

6 6 umerical results. Simulation In order to see the accuracy of the asymptotic normality (8), we tried a numerical simulation. It can be expressed that S A B + C D, (9) P P 4 P 5 P 6 where A C * p (x k x k x l x l x α x α) p, B k,l,α k,l,α,β,γ * p (x k x k x l x α x β x γ) p, D k,l,α,β k,l,α,β,γ,δ * p (x k x k x l x l x α x β) p, * p (x k x l x α x β x γ x δ) p. To compute A, it is needed to calculate triple sum, which is performed by triple-loop calculations. We see that B, C, D and T are needed to perform multi-loop calculations. Generally, it takes a lot time to carry out multi-loop calculations. To save computing time, we shall provide other expressions for A, B, C, D and T. Put Then, it holds that X ij ( i x k )( j x k ) (i, j,..., ), k { X 0i X i0 ( i x k ) p} (i,..., ), s k k ( k x i ) (k,..., ). T ( )( ) s p ( )( ) (s s ) p + ( )( ) ( s ) p, A [ p ( ] X ) X X + X p, B p [ ( X ) + 5X X 6X 4X 0 X X + 4X 0 X +X X + X 0 X 0 ( ] X ) X 0 X 0 X p, C [ p ( X ) 4X X + 4X + 6X 0 X X 4X 0 X 8X X 4X 0 X 0 ( X ) + 8X 0 X 0 X 4X 0 ( X 0 ) X + 4( X 0 ) X + 4X 0 X X 0 4X X 0 X 0 ( X 0 ) X +X 0 X X 0 + ( X 0 ) ( X 0 ) X ] p, D {( s ) p (s s ) p + s p } 6A 8B 9C. One can see that these expressions have no multi-sum expression. Generate the data based on the model (). Firstly, we checked the asymptotic normality of ˆγ given in (8). Make 0 4 samples of the size, each of which is constructed by p-dimensional vectors which are i.i.d. as p (0, Σ). In this study, the following two cases for Σ are treated: () Σ I p ; () Σ has Toeplitz structure such that Σ Σ T D / P D /, where D diag(d,..., d p ) with d i 5+( ) i (p i+)/p, and P (0. i j ). We note that these two cases satisfy the condition C. Figure shows Q-Q plot of /6ˆγ when Σ I p for the case in which (, p) (60, 60), and Figure shows Q-Q plot when Σ Σ T. We can see the goodness of fit of the standard normal distribution to /6ˆγ from these figures. This numerical simulations were carried out for several other dimensions

7 ormal Q Q Plot Sample Quantiles 0 Sample Quantiles 4 4 ormal Q Q Plot Theoretical Quantiles Figure : Q-Q plot of 0γ when Σ I Theoretical Quantiles Figure : Q-Q plot of 0γ when Σ ΣT as well, p 0, 40, 480, and also for other sample sizes, 0, 40, 480, but due to similarity of the graphs, only a selection is reported here. ext, we checked the attained significance level(asl) for testing the null hypothesis that H0 : Fp (0, I p ) p (0, I p ) under the assumption of model () by using the following criterion: /6γ > z α/ reject the null hypothesis that H0 : Fp (0, I p ) p (0, I p ). (0) To compute ASL, Monte Carlo simulation with 04 replication was done for the nominal level α We computed for the case in which 60,0, 40, 480, and p 60, 0, 40, 480, and wrote these values in Table. The settings of Σ treated in this paper are as follows. Case. Σ I p. Case. Σ ΣT. Case. Generate a sparse positive definite covariance matrix Σ by Algorithm. Set sparsity level Algorithm The algorithm for generating sparse covariance matrix Σ : Construct p p matrix R (rij ) as follows. For each i < j, with probability ( s) 0.75, Unif(0, ) Unif(, 0) with probability ( s) 0.75, rij 0 with probability s, where s is the level of sparsity in R. Then, we set rji rij to obtain symmetry. Furthermore, set the diagonal elements of R to equal. : Prepare a p p diagonal matrix D (dij ) such that d,..., dpp are i.i.d. chi-squared distribution with degree of freedom. : Create A DRD. 4: In order to obtain positive definiteness of Σ, calculate the minimum eigenvalue emin of A. Set { A + ( emin + 0.)I p if emin 0, Σ A otherwise. s 0.7. Case 4. Use the same generating method as Case with sparsity level s 0.4. Case 5. Use the same generating method as Case with sparsity level s 0.. For Case -5, we repeat the Monte Carlo simulation 0 times, and wrote the average of these 0 values in the table. The value in parenthesis is the standard error. 7

8 8 Table : Attained significance level of the proposed test p Case Case Case (S.E) Case 4(S.E) Case 5(S.E) (0.00) 0.056(0.006) 0.056(0.004) (0.004) 0.056(0.00) 0.056(0.000) (0.005) 0.056(0.00) 0.056(0.00) (0.005) 0.056(0.00) 0.056(0.00) (0.00) 0.05(0.004) 0.05(0.00) (0.00) 0.05(0.00) 0.05(0.009) (0.00) 0.05(0.00) 0.05(0.004) (0.00) 0.05(0.00) 0.05(0.00) (0.005) 0.05(0.00) 0.05(0.00) (0.005) 0.05(0.000) 0.05(0.000) (0.00) 0.05(0.008) 0.05(0.000) (0.000) 0.05(0.00) 0.05(0.000) (0.00) 0.05(0.00) 0.05(0.00) (0.00) 0.05(0.00) 0.05(0.00) (0.000) 0.05(0.00) 0.05(0.00) (0.00) 0.05(0.004) 0.05(0.00) We can see from Table that ASL of our statistic is little affeected by the sparsity of the covariance matrix. It is observed that the accuracy of the approximation gets better as and p become large. Thirdly, we checked low-dimensional performance of the proposed test. The setting of dimensionality is p 0. We compare ASL of our proposed test with the one of Mardia [0] s test () and the one of Srivastava [8] s test () for the case in which 40, 50, 60, 70, 80, 90, 00 and Σ I 0. Computing methodology is the same as the one in Table. The symbol P denotes the value of ASL for proposed test (0), M denotes for Mardia [0] s test (), and S denotes for Srivastava [8] s test (). We can see that ASL of our proposed test overestimate, whereas ASLs of Mardia [0] s test and Srivastava [8] s test underestimate. Our approximation seems to be good for the case in which 70, whereas other two approximations are not good. The precisions of all three tests gets better as becomes large. ASL P P P P P P P S S S S S S S M M M M M M M Figure : Comparison of ASL when p 0 Finally, we checked the empirical power(emp) of the proposed test by Monte Carlo simulation.

9 9 The setting of the local alternative hypothesis treated in this paper is as follows: For ε (ε,..., ε p ) F p (0, I p ), ν ( ci ) ε i ν (i,..., p), where c,..., c p are i.i.d. as the chi-squared distribution with ν degrees of freedom. We computed the EMP based on 0 repetition for the case in which α 0.05 and ν The settings of and p are the same as the ones in Table. We carried out for all models of Σ written in Table, but due to similarity of the tendencies, Case -5 are omitted here. We can see from Table that EMP are monotone increasing for p and. Table : Empirical power of the proposed test for Case \ p Real data analysis We applied our test to data-set which is used in Kubokawa and Srivastava [6]. The first applied data is colon data which is constructed by 000 (p) genes expression levels on normal colon tissues and 40 tumor colon tissues. We preprocessed the data by applying 0 logarithmic transformation. The second applied data-set is leukemia data which is constructed by 57(p) gene expressions on 47 patients suffering from acute lymphoblastic leukemia (ALL, 47 cases) and 5 patients suffering from acute myeloid leukemia (AML, 5 cases). The data-set are preprocessed by following protocol written in Dudoit et al. []. For the colon data, the p-value of our test is for normal colon tissues and for tumor colon tissues. For the leukemia data, we observed that both the p-value of our test for ALL and for AML are for the leukemia data. These results indicate that the multivariate normality assumption on both sets of data cannot be rejected at the usual significance level 5%. 4 Concluding remarks This paper treats multivariate rd moment γ which is defined by using Hadamard product of observation vectors. When the dimension equals to, γ becomes univariate skewness. An estimator of γ, denoted as ˆγ, which is well defined for p > is proposed. Based on ˆγ, testing criterion for assessing multivariate normality is given. We prepare the expression of the testing statistic without having multi-sum. Calculating time becomes shorten. The performance of the testing criterion, in terms of the attained significance level and the empirical power, is shown through simulations for several setting of sample sizes and dimensions. We apply our proposed test to microarray data sets, some of the most popular are in high-dimensional data. To assess the normality of high-dimensional data, we recommend to use not only our proposed test but also Himeno and Yamada [4] s test based on ˆκ together. Omnibus test using ˆγ and ˆκ, which imitates Jarque and Bera [5] s test, is a future work. A Moment of statistic In this section, analytic forms of Var(T ), Var(T ) and Var(T ) are proposed. We derive them by using these results (Lemma ), proofs of which are tedious but simple, therefore skipped.

10 0 Lemma. Let z be a random vector distributed as p (0, I p ), and a and b be constant vectors. Then, E[(a z) ] a a, E[(a z) 4 ] (a a), E[(a z) (b z) ] (a b) + a ab b, E[(a z) b z] a aa b, E[(a z) 6 ] 5(a a), E[(a z) (b z) ] 6(a b) + 9a aa bb b. Here, we write analytic forms of Var(T ), Var(T ) and Var(T ) in the following lemma. Lemma. For T, T and T, as defined in Section., Proof. In Section, we expressed that where Var(T ) 6 p( Σ) p, 8 Var(T ) ( ) p( Σ) p, 4 Var(T ) ( )( ) p( Σ) p. η i T η i, { (a l z i ) a la l z } lz i. Since η,..., η are i.i.d., it holds that ( Var(T ) Var(η) Var where z p (0, I p ). We find that E[ζ] 0, and so ) {(a lz) a la l a lz}, Var(ζ) E[ζ ] [ E {(a lz) a la l a lz} + * {(a l z) a la l a lz}{(a αz) a αa α a αz} l,α [ {( )(a la l ) } + * {6(a l a l ) + ( )a la l a la α a αa α } l,α 6 (a la l ) + l a α ) 6 p( A ) p 6 p( Σ) p, l,α

11 where the third equality follows from Lemma. Since E[T ] 0, we have Var(T ) E[T ] 9 ( ) E * (a l z i ) 4 (a lz j ) ( )a la l a lz i + * * (a l z i ) 4 (a lz j ) ( )a la l a lz i l,α * (a α z i ) 4 (a αz j ) ( )a αa α a αz i 9 ( ) E l z i ) 4 (a lz j ) + l z i ) (a lz j ) (a lz k ) l,α +( ) (a la l ) (a lz i ) ( )a * la l (a l z i ) (a lz j ) + * l z i ) (a αz i ) a lz j a αz j + l z i ) a lz j a αz j (a αz k ) ( )a * αa α (a l z i ) a lz j a αz j ( )a * la l (a α z i ) a αz j a lz j +( ) a la l a αa α 9 ( ) a lz i a αz i }],k,k [ { ( ) + ( )( ) + ( ) ( ) } (a la l ) + * { ( ){(a la α ) + a la l a αa α }a la α l,α + {( )( ) ( ) ( ) + ( ) }a la l a la α a αa α ] 9 ( ( ) ) 8 ( ) p( A ) p 8 ( ) p( Σ) p, (a la l ) + ( ) l a α ) where the fourth equality follows from Lemma. l,α

12 Since E[T ] 0, we have Var(T ) E[T ] 4 ( ) ( ) E * a l z i a lz j a lz k l,α,k,k + * * a l z i a lz j a lz k * a α z i a αz j a αz k 4 ( ) ( ) E 6 +6,k,k * * a l z i z ia α a lz j z ja α a lz k z ka α l,α,k 4 ( )( ) (a la l ) + 4 ( )( ) p( A ) p 4 ( )( ) p( Σ) p, where the fourth equality follows from Lemma. B Proof of Theorem l z i ) (a lz j ) (a lz k ) l a α ) In this section, we give a proof of Theorem. Before proving it, we prepare three lemmas. The first two lemmas (Lemma and Lemma 4) treat formula about multi-sum, and the last lemma treats the order of the expectations which is used to prove Theorem. Since proofs of Lemma and Lemma 4 are tedious but simple, we skipped here. Lemma. The following equations hold: ij σstσ is σ jt tr{( Σ)Σ} tr DΣ( Σ)ΣD tr D( Σ)Σ( Σ) p( Σ) p,s,t ij σ st σ is σ jt σ it σ js,s,t + p{d ( Σ)} p j s t l,α ij σis,s ij σ is σ jj σjs,s σ ij σ st σ is σ jt σ it σ js tr D( Σ)Σ( Σ) tr D( Σ)Σ( Σ) + p{d ( Σ)} p ii σijσ isσ js,,s ii σijσ isσ js.,s

13 Lemma 4. The following equations hold: ij σis p( Σ) p p{d ( Σ)} p ii σij 6 ij,,s ii σ ij σ is σjs tr DΣ( Σ)ΣD p{d ( Σ)} p ii σij ii σijσ jj,,s ii σijσ isσ js tr D( Σ)Σ( Σ) p{d ( Σ)} p,s ii σij ii σijσ 4 jj. Lemma 5. Let Q klαβγδ a iz k z l a j a i z α z β a j a iz γ z δa j, j where Σ / A (a,..., a p ), and z k, z l, z α, z β, z γ and z δ are i.i.d. as p (0, I p ). Then, E[Q ] O max { p( Σ) p }, σ ij σ st σ is σ jt σ it σ js, () j s t E[Q 4] O max { p( Σ) p }, σ ij σ st σ is σ jt σ it σ js, () j s t E[Q 45] O ( { p( Σ) p } ), () E[Q 456] O ( { p( Σ) p } ). (4) In addition, under the assumption C, E[Q ] O ( { p( Σ) p } ), E[Q 4] O ( { p( Σ) p } ). Proof. Since (), () and (4) can be proved by using the same way to show (), we only show this, and omit other three here. It can be expressed that [ E[Q ] E (a iz z a i a iz z a i a iz z a i ) + i z z a i a iz z a i a iz z a i )(a jz z a j a jz z a j a jz z a j ) + i z z a j a iz z a j a iz z a j ) + 4 i z z a j a iz z a j a iz z a j )(a iz z a s a iz z a s a iz z a s ),s + i z z a j a iz z a j a iz z a j )(a sz z a t a sz z a t a sz z a t ) + 4 +,s,t i z z a i a iz z a i a iz z a i )(a iz z a s a iz z a s a iz z a s ) i z z a i a iz z a i a iz z a i )(a jz z a s a jz z a s a jz z a s )

14 4 {(a ia i ) } * {a i a i a ja j + (a ia j ) } + * {a i a i a ja j + (a ia j ) } i a i a ja s + a ia j a ia s ) +,s i a j a sa t + a ia s a ja t + a ia t a ja s ),s,t + 4 i a i a ia j ) + i a i a ja s + a ia j a ja s ),s 7 σii 6 + ii σjj + 8 ii σijσ jj + 6 ii σijσ 4 jj ij,s + 08 ii σij + 48 ij σis + 6 ii σjs + 6 ii σ ij σ is σjs + 7 ii σijσ isσ js +,s,s ij σst + 8,s,t,s ij σstσ is σ jt + 6,s,t ij σ is σ it σ js σ jt σ st,s,t { p( Σ) p } + 4 σii ii σijσ jj + 6 ii σijσ 4 jj ij + 96 ii σij + 6 ij σis + 6 ii σ ij σ is σjs + 7 ii σ js σijσ is + 8,s ij σstσ is σ jt + 6,s,t,s ij σ is σ it σ js σ jt σ st,s,t { p( Σ) p } + 8 tr{( Σ)Σ} 8 tr DΣ( Σ)ΣD 6 tr D( Σ)Σ( Σ) 8 p( Σ) p + 48 p{d ( Σ)} p + 4 σii ii σijσ jj + 6 ii σijσ 4 jj ij + 96 ii σij + 8 ij σis + 8 ii σ ij σ is σjs + 6 ii σijσ isσ js + 6 j s t,s σ ij σ is σ it σ js σ jt σ st,s,s { p( Σ) p } + 8 tr{( Σ)Σ} 4 p{d ( Σ)} p j s t σ ij σ is σ it σ js σ jt σ st { p( Σ) p } + 8 tr{( Σ)Σ} + 6 j s t,s σ ij σ is σ it σ js σ jt σ st, σii ii σij where the second equality follows from Lemma, the third equality from bottom follows from Lemma and the second equality from bottom follows from Lemma 4. Since Σ is positive definite, we have tr{( Σ)Σ} {tr( Σ)Σ}, where the right-hand side of the inequality can be written as { p( Σ) p }. From this result, it is found that E[Q ] O max { p( Σ) p }, σ ij σ st σ is σ jt σ it σ js. j s t

15 5 ote that σ ij σ st σ is σ jt σ it σ js σ ij σ st ( σ is ) ( σ jt ) σ js σ it From this inequality, we have σ ij σ st σ is σ jt σ it σ js j s t Hence under the assumption C, j s t σ ij σ st σ is σ jt + σsj σ ti σ is σ jt, j s t σ ij σ st σ is σ jt σ it σ js [ tr{( Σ + )Σ + } + tr{( Σ + )Σ + } ] [ p( Σ + ) p ]. σ ij σ st σ is σ jt σ it σ js O ( { p( Σ) p } ), and so E[Q ] O ( { p( Σ) p } ). Proof of Theorem. Since S is unbiased estimator of p( Σ) p, it is sufficient to show that Var(S/ p( Σ) p ) 0 (5) as, p. Since S is invariant under location shift, without loss of generality, we may assume that µ 0. Each of A, B, C and D in (9) can be written as A B C D * p {(Σ / z k z kσ / ) (Σ / z l z lσ / ) (Σ / z α z ασ / )} p, k,l,α k,l,α,β k,l,α,β,γ k,l,α,β,γ,δ From (9), we can express that ( S/ p ( Σ) p ) * p {(Σ / z k z kσ / ) (Σ / z l z lσ / ) (Σ / z α z βσ / )} p, * p {(Σ / z k z kσ / ) (Σ / z l z ασ / ) (Σ / z β z γσ / )} p, * p {(Σ / z k z lσ / ) (Σ / z α z βσ / ) (Σ / z γ z δσ / )} p. [{ } A P p( B Σ) p P 4 p( + C Σ) p P 5 p( D Σ) p P 6 p( Σ) p [ { } ( ) ( ) ( ) ( A B C 4 P p( + Σ) p P 4 p( + Σ) p P 5 p( Σ) p ( ) ( ) ] D + P 6 p(, Σ) p ] )

16 6 where the inequality follows from Jensen inequality. Thus, it is described that ( ) [ [ { } ] ( ) [ ( ) ] S A B Var p( 4 E Σ) p P p( + E Σ) p P 4 p( Σ) p ( ) [ ( ) ] ( ) [ ( ) ]] C D + E P 5 p( + E Σ) p P 6 p(. (6) Σ) p For the first term in the right-hand side of the inequality (6), expanding {.}, and excluding the term whose expectation becomes 0, we have [ { } ] ( ) A E P p( Σ) p P p( E 6 * Q Σ) kkllαα + 8 * Qkkllαα Q kkllββ p k,l,α k,l,α,β +9 * Qkkllαα Q kkββγγ + * Qkkllαα Q ββγγδδ where k,l,α,β,γ k,l,α,β,γ,δ ( ) [6 P E[A ] + 8 P 4 E[A ] + 9 P 5 E[A ]] P { } P 6 + ( P ) ( ) [6 P E[A ] + 8 P 4 E[A ] + 9 P 5 E[A ]] P ( 5 + 0), (7) P E[A ] E[Q ] { p( Σ) p }, E[A ] E[Q Q 44 ] { p( Σ) p }, E[A ] E[Q Q 4455 ] { p( Σ) p }. Using the same calculation method, the second-fourth terms in the right-hand side of the inequality (6) can be written as [ { } ] ( ) B E P 4 p( [ P 4 E[B ] + P 4 E[B ] + 4 P 5 E[B ] + 4 P 5 E[B 4 ] Σ) p P 4 + P 6 E[B 5 ] + P 6 E[B 6 ]], (8) [ { } ] ( ) C E P 5 p( [4 P 5 E[C ] + 8 P 5 E[C ] + 8 P 5 E[C ] + 4 P 5 E[C 4 ] Σ) p P 5 +4 P 6 E[C 5 ] + 8 P 6 E[C 6 ] + 8 P 6 E[C 7 ] + 4 P 6 E[C 8 ]], (9) [ { } ] ( ) D E P 6 p( [6 P 6 E[D ] + 4 P 6 E[D ] + 4 P 6 E[D ] Σ) p P 6 +6 P 6 E[D 4 ]], (0)

17 7 where E[B ] E[Q 4] { p( Σ) p }, E[B ] E[Q 4Q 4 ] { p( Σ) p }, E[B ] E[Q 4Q 554 ] { p( Σ) p }, E[B 4 ] E[Q 4Q 554 ] { p( Σ) p }, E[B 5 ] E[Q 4Q ] { p( Σ) p }, E[B 6 ] E[Q 4Q ] { p( Σ) p }, E[C ] E[Q 45] { p( Σ) p }, E[C ] E[Q 45Q 54 ] { p( Σ) p }, E[C ] E[Q 45Q 45 ] { p( Σ) p }, E[C 4 ] E[Q 45Q 54 ] { p( Σ) p }, E[C 5 ] E[Q 45Q 6645 ] { p( Σ) p }, E[C 6 ] E[Q 45Q 6654 ] { p( Σ) p }, E[C 7 ] E[Q 45Q 6645 ] { p( Σ) p }, E[C 8 ] E[Q 45Q 6654 ] { p( Σ) p }, E[D ] E[Q 456] { p( Σ) p }, E[D ] E[Q 456Q 465 ] { p( Σ) p }, E[D ] E[Q 456Q 465 ] { p( Σ) p }, E[D 4 ] E[Q 456Q 465 ] { p( Σ) p }. From Lemma 5, we have E[A ] O(), E[B ] O(), E[C ] O(), E[D ] O(). It follows from Cauchy-Schwarz inequality that E[A j ] E[A ] (j,, 4), E[B j ] E[B ] (j,..., 6), E[C j ] E[C ] (j,..., 8), E[D j ] E[D ] (j,, 4). From them, we find that E[A j ] O() (j,, 4), E[B j ] O() (j,..., 6), E[C j ] O() (j,..., 8), E[D j ] O() (j,, 4). Combining these results with (7), (8), (9) and (0), it is found that [ { } ] A E P p( O( ), Σ) p [ { } ] B E P 4 p( O( ), Σ) p [ { } ] C E P 5 p( O( 4 ), Σ) p [ { } ] C E P 6 p( O( 6 ), Σ) p and so which converges to 0 as, p. ( ) S Var p( O( ), Σ) p

18 8 References [] Dudoit, S, Fridlyand, J. and Speed, T.P. (00). Comparison of discrimination methods for the classification of tumors using gene expression data. J. Amer. Statist. Assoc., 97, [] Fujikoshi, Y., Ulyanov, V.V. and Shimizu, R. (00). Multivariate Statistics High-Dimensional and Large-Sample Approximations, John Wiley & Sons, Hoboken, J. [] Henze,. (00). Invariant tests for multivariate normality: a critical review. Statist. Papers, 4, [4] Himeno, T. and Yamada, T. (04). Estimations for some functions of covariance matrix in high dimension under non-normality and its applications. J. Multivariate Anal., 0, [5] Jarque, C.M. and Bera, A.K. (987). A test for normality of observations and regression residuals. Internat. Statist. Rev., 55, 6 7. [6] Kubokawa, T. and Srivastava, M.S. (008). Estimation of the precision matrix of a singular Wishart distribution and its application in high-dimensional data. J. Multivariate Anal., 99, [7] Schott, J.R. (005). Matrix Analysis for Statistics, nd ed., John Wiley & Sons, Hoboken, J. [8] Srivastava, M.S. (984). A measure of skewness and kurtosis and a graphical method for assessing multivariate normality. Statist. Prob. Lett.,, [9] Srivastava, M.S. (00). Methods of Multivariate Statistics, John Wiley & Sons, Y. [0] Mardia, K.V. (970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57, [] Mecklin, C.J. and Mundfrom, D.J. (004). An appraisal and bibliography of tests for multivariate normality. Internat. Statist. Rev., 7, 8.

SHOTA KATAYAMA AND YUTAKA KANO. Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka , Japan

SHOTA KATAYAMA AND YUTAKA KANO. Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka , Japan A New Test on High-Dimensional Mean Vector Without Any Assumption on Population Covariance Matrix SHOTA KATAYAMA AND YUTAKA KANO Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama,

More information

Statistical Inference On the High-dimensional Gaussian Covarianc

Statistical Inference On the High-dimensional Gaussian Covarianc Statistical Inference On the High-dimensional Gaussian Covariance Matrix Department of Mathematical Sciences, Clemson University June 6, 2011 Outline Introduction Problem Setup Statistical Inference High-Dimensional

More information

On testing the equality of mean vectors in high dimension

On testing the equality of mean vectors in high dimension ACTA ET COMMENTATIONES UNIVERSITATIS TARTUENSIS DE MATHEMATICA Volume 17, Number 1, June 2013 Available online at www.math.ut.ee/acta/ On testing the equality of mean vectors in high dimension Muni S.

More information

MULTIVARIATE THEORY FOR ANALYZING HIGH DIMENSIONAL DATA

MULTIVARIATE THEORY FOR ANALYZING HIGH DIMENSIONAL DATA J. Japan Statist. Soc. Vol. 37 No. 1 2007 53 86 MULTIVARIATE THEORY FOR ANALYZING HIGH DIMENSIONAL DATA M. S. Srivastava* In this article, we develop a multivariate theory for analyzing multivariate datasets

More information

A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data

A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data Yujun Wu, Marc G. Genton, 1 and Leonard A. Stefanski 2 Department of Biostatistics, School of Public Health, University of Medicine

More information

Testing Some Covariance Structures under a Growth Curve Model in High Dimension

Testing Some Covariance Structures under a Growth Curve Model in High Dimension Department of Mathematics Testing Some Covariance Structures under a Growth Curve Model in High Dimension Muni S. Srivastava and Martin Singull LiTH-MAT-R--2015/03--SE Department of Mathematics Linköping

More information

Consistency of test based method for selection of variables in high dimensional two group discriminant analysis

Consistency of test based method for selection of variables in high dimensional two group discriminant analysis https://doi.org/10.1007/s42081-019-00032-4 ORIGINAL PAPER Consistency of test based method for selection of variables in high dimensional two group discriminant analysis Yasunori Fujikoshi 1 Tetsuro Sakurai

More information

Consistency of Distance-based Criterion for Selection of Variables in High-dimensional Two-Group Discriminant Analysis

Consistency of Distance-based Criterion for Selection of Variables in High-dimensional Two-Group Discriminant Analysis Consistency of Distance-based Criterion for Selection of Variables in High-dimensional Two-Group Discriminant Analysis Tetsuro Sakurai and Yasunori Fujikoshi Center of General Education, Tokyo University

More information

Consistency of Test-based Criterion for Selection of Variables in High-dimensional Two Group-Discriminant Analysis

Consistency of Test-based Criterion for Selection of Variables in High-dimensional Two Group-Discriminant Analysis Consistency of Test-based Criterion for Selection of Variables in High-dimensional Two Group-Discriminant Analysis Yasunori Fujikoshi and Tetsuro Sakurai Department of Mathematics, Graduate School of Science,

More information

Testing Block-Diagonal Covariance Structure for High-Dimensional Data

Testing Block-Diagonal Covariance Structure for High-Dimensional Data Testing Block-Diagonal Covariance Structure for High-Dimensional Data MASASHI HYODO 1, NOBUMICHI SHUTOH 2, TAKAHIRO NISHIYAMA 3, AND TATJANA PAVLENKO 4 1 Department of Mathematical Sciences, Graduate School

More information

Jackknife Empirical Likelihood Test for Equality of Two High Dimensional Means

Jackknife Empirical Likelihood Test for Equality of Two High Dimensional Means Jackknife Empirical Likelihood est for Equality of wo High Dimensional Means Ruodu Wang, Liang Peng and Yongcheng Qi 2 Abstract It has been a long history to test the equality of two multivariate means.

More information

On the Testing and Estimation of High- Dimensional Covariance Matrices

On the Testing and Estimation of High- Dimensional Covariance Matrices Clemson University TigerPrints All Dissertations Dissertations 1-009 On the Testing and Estimation of High- Dimensional Covariance Matrices Thomas Fisher Clemson University, thomas.j.fisher@gmail.com Follow

More information

COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS

COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS Communications in Statistics - Simulation and Computation 33 (2004) 431-446 COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS K. Krishnamoorthy and Yong Lu Department

More information

Approximate interval estimation for EPMC for improved linear discriminant rule under high dimensional frame work

Approximate interval estimation for EPMC for improved linear discriminant rule under high dimensional frame work Hiroshima Statistical Research Group: Technical Report Approximate interval estimation for PMC for improved linear discriminant rule under high dimensional frame work Masashi Hyodo, Tomohiro Mitani, Tetsuto

More information

T 2 Type Test Statistic and Simultaneous Confidence Intervals for Sub-mean Vectors in k-sample Problem

T 2 Type Test Statistic and Simultaneous Confidence Intervals for Sub-mean Vectors in k-sample Problem T Type Test Statistic and Simultaneous Confidence Intervals for Sub-mean Vectors in k-sample Problem Toshiki aito a, Tamae Kawasaki b and Takashi Seo b a Department of Applied Mathematics, Graduate School

More information

SOME ASPECTS OF MULTIVARIATE BEHRENS-FISHER PROBLEM

SOME ASPECTS OF MULTIVARIATE BEHRENS-FISHER PROBLEM SOME ASPECTS OF MULTIVARIATE BEHRENS-FISHER PROBLEM Junyong Park Bimal Sinha Department of Mathematics/Statistics University of Maryland, Baltimore Abstract In this paper we discuss the well known multivariate

More information

TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST

TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST Econometrics Working Paper EWP0402 ISSN 1485-6441 Department of Economics TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST Lauren Bin Dong & David E. A. Giles Department

More information

Comparison of Discrimination Methods for High Dimensional Data

Comparison of Discrimination Methods for High Dimensional Data Comparison of Discrimination Methods for High Dimensional Data M.S.Srivastava 1 and T.Kubokawa University of Toronto and University of Tokyo Abstract In microarray experiments, the dimension p of the data

More information

Multivariate Normal-Laplace Distribution and Processes

Multivariate Normal-Laplace Distribution and Processes CHAPTER 4 Multivariate Normal-Laplace Distribution and Processes The normal-laplace distribution, which results from the convolution of independent normal and Laplace random variables is introduced by

More information

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015 Part IB Statistics Theorems with proof Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

EPMC Estimation in Discriminant Analysis when the Dimension and Sample Sizes are Large

EPMC Estimation in Discriminant Analysis when the Dimension and Sample Sizes are Large EPMC Estimation in Discriminant Analysis when the Dimension and Sample Sizes are Large Tetsuji Tonda 1 Tomoyuki Nakagawa and Hirofumi Wakaki Last modified: March 30 016 1 Faculty of Management and Information

More information

Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data

Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data Sri Lankan Journal of Applied Statistics (Special Issue) Modern Statistical Methodologies in the Cutting Edge of Science Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations

More information

Sparse Linear Discriminant Analysis With High Dimensional Data

Sparse Linear Discriminant Analysis With High Dimensional Data Sparse Linear Discriminant Analysis With High Dimensional Data Jun Shao University of Wisconsin Joint work with Yazhen Wang, Xinwei Deng, Sijian Wang Jun Shao (UW-Madison) Sparse Linear Discriminant Analysis

More information

High-dimensional asymptotic expansions for the distributions of canonical correlations

High-dimensional asymptotic expansions for the distributions of canonical correlations Journal of Multivariate Analysis 100 2009) 231 242 Contents lists available at ScienceDirect Journal of Multivariate Analysis journal homepage: www.elsevier.com/locate/jmva High-dimensional asymptotic

More information

On the conservative multivariate Tukey-Kramer type procedures for multiple comparisons among mean vectors

On the conservative multivariate Tukey-Kramer type procedures for multiple comparisons among mean vectors On the conservative multivariate Tukey-Kramer type procedures for multiple comparisons among mean vectors Takashi Seo a, Takahiro Nishiyama b a Department of Mathematical Information Science, Tokyo University

More information

Empirical Likelihood Tests for High-dimensional Data

Empirical Likelihood Tests for High-dimensional Data Empirical Likelihood Tests for High-dimensional Data Department of Statistics and Actuarial Science University of Waterloo, Canada ICSA - Canada Chapter 2013 Symposium Toronto, August 2-3, 2013 Based on

More information

Testing Block-Diagonal Covariance Structure for High-Dimensional Data Under Non-normality

Testing Block-Diagonal Covariance Structure for High-Dimensional Data Under Non-normality Testing Block-Diagonal Covariance Structure for High-Dimensional Data Under Non-normality Yuki Yamada, Masashi Hyodo and Takahiro Nishiyama Department of Mathematical Information Science, Tokyo University

More information

A Box-Type Approximation for General Two-Sample Repeated Measures - Technical Report -

A Box-Type Approximation for General Two-Sample Repeated Measures - Technical Report - A Box-Type Approximation for General Two-Sample Repeated Measures - Technical Report - Edgar Brunner and Marius Placzek University of Göttingen, Germany 3. August 0 . Statistical Model and Hypotheses Throughout

More information

High-Dimensional AICs for Selection of Redundancy Models in Discriminant Analysis. Tetsuro Sakurai, Takeshi Nakada and Yasunori Fujikoshi

High-Dimensional AICs for Selection of Redundancy Models in Discriminant Analysis. Tetsuro Sakurai, Takeshi Nakada and Yasunori Fujikoshi High-Dimensional AICs for Selection of Redundancy Models in Discriminant Analysis Tetsuro Sakurai, Takeshi Nakada and Yasunori Fujikoshi Faculty of Science and Engineering, Chuo University, Kasuga, Bunkyo-ku,

More information

Testing linear hypotheses of mean vectors for high-dimension data with unequal covariance matrices

Testing linear hypotheses of mean vectors for high-dimension data with unequal covariance matrices Testing linear hypotheses of mean vectors for high-dimension data with unequal covariance matrices Takahiro Nishiyama a,, Masashi Hyodo b, Takashi Seo a, Tatjana Pavlenko c a Department of Mathematical

More information

Testing for a unit root in an ar(1) model using three and four moment approximations: symmetric distributions

Testing for a unit root in an ar(1) model using three and four moment approximations: symmetric distributions Hong Kong Baptist University HKBU Institutional Repository Department of Economics Journal Articles Department of Economics 1998 Testing for a unit root in an ar(1) model using three and four moment approximations:

More information

Analysis of variance, multivariate (MANOVA)

Analysis of variance, multivariate (MANOVA) Analysis of variance, multivariate (MANOVA) Abstract: A designed experiment is set up in which the system studied is under the control of an investigator. The individuals, the treatments, the variables

More information

arxiv: v1 [math.st] 2 Apr 2016

arxiv: v1 [math.st] 2 Apr 2016 NON-ASYMPTOTIC RESULTS FOR CORNISH-FISHER EXPANSIONS V.V. ULYANOV, M. AOSHIMA, AND Y. FUJIKOSHI arxiv:1604.00539v1 [math.st] 2 Apr 2016 Abstract. We get the computable error bounds for generalized Cornish-Fisher

More information

The purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j.

The purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j. Chapter 9 Pearson s chi-square test 9. Null hypothesis asymptotics Let X, X 2, be independent from a multinomial(, p) distribution, where p is a k-vector with nonnegative entries that sum to one. That

More information

On the Efficiencies of Several Generalized Least Squares Estimators in a Seemingly Unrelated Regression Model and a Heteroscedastic Model

On the Efficiencies of Several Generalized Least Squares Estimators in a Seemingly Unrelated Regression Model and a Heteroscedastic Model Journal of Multivariate Analysis 70, 8694 (1999) Article ID jmva.1999.1817, available online at http:www.idealibrary.com on On the Efficiencies of Several Generalized Least Squares Estimators in a Seemingly

More information

Lecture 11. Multivariate Normal theory

Lecture 11. Multivariate Normal theory 10. Lecture 11. Multivariate Normal theory Lecture 11. Multivariate Normal theory 1 (1 1) 11. Multivariate Normal theory 11.1. Properties of means and covariances of vectors Properties of means and covariances

More information

HYPOTHESIS TESTING ON LINEAR STRUCTURES OF HIGH DIMENSIONAL COVARIANCE MATRIX

HYPOTHESIS TESTING ON LINEAR STRUCTURES OF HIGH DIMENSIONAL COVARIANCE MATRIX Submitted to the Annals of Statistics HYPOTHESIS TESTING ON LINEAR STRUCTURES OF HIGH DIMENSIONAL COVARIANCE MATRIX By Shurong Zheng, Zhao Chen, Hengjian Cui and Runze Li Northeast Normal University, Fudan

More information

17: INFERENCE FOR MULTIPLE REGRESSION. Inference for Individual Regression Coefficients

17: INFERENCE FOR MULTIPLE REGRESSION. Inference for Individual Regression Coefficients 17: INFERENCE FOR MULTIPLE REGRESSION Inference for Individual Regression Coefficients The results of this section require the assumption that the errors u are normally distributed. Let c i ij denote the

More information

More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction

More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction Sankhyā : The Indian Journal of Statistics 2007, Volume 69, Part 4, pp. 700-716 c 2007, Indian Statistical Institute More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order

More information

Testing for Granger Non-causality in Heterogeneous Panels

Testing for Granger Non-causality in Heterogeneous Panels Testing for Granger on-causality in Heterogeneous Panels Elena-Ivona Dumitrescu Christophe Hurlin December 211 Abstract This paper proposes a very simple test of Granger (1969 non-causality for heterogeneous

More information

1 Appendix A: Matrix Algebra

1 Appendix A: Matrix Algebra Appendix A: Matrix Algebra. Definitions Matrix A =[ ]=[A] Symmetric matrix: = for all and Diagonal matrix: 6=0if = but =0if 6= Scalar matrix: the diagonal matrix of = Identity matrix: the scalar matrix

More information

A Test for Order Restriction of Several Multivariate Normal Mean Vectors against all Alternatives when the Covariance Matrices are Unknown but Common

A Test for Order Restriction of Several Multivariate Normal Mean Vectors against all Alternatives when the Covariance Matrices are Unknown but Common Journal of Statistical Theory and Applications Volume 11, Number 1, 2012, pp. 23-45 ISSN 1538-7887 A Test for Order Restriction of Several Multivariate Normal Mean Vectors against all Alternatives when

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Asymptotic Statistics-VI. Changliang Zou

Asymptotic Statistics-VI. Changliang Zou Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous

More information

Journal of Multivariate Analysis. Use of prior information in the consistent estimation of regression coefficients in measurement error models

Journal of Multivariate Analysis. Use of prior information in the consistent estimation of regression coefficients in measurement error models Journal of Multivariate Analysis 00 (2009) 498 520 Contents lists available at ScienceDirect Journal of Multivariate Analysis journal homepage: www.elsevier.com/locate/jmva Use of prior information in

More information

KRUSKAL-WALLIS ONE-WAY ANALYSIS OF VARIANCE BASED ON LINEAR PLACEMENTS

KRUSKAL-WALLIS ONE-WAY ANALYSIS OF VARIANCE BASED ON LINEAR PLACEMENTS Bull. Korean Math. Soc. 5 (24), No. 3, pp. 7 76 http://dx.doi.org/34/bkms.24.5.3.7 KRUSKAL-WALLIS ONE-WAY ANALYSIS OF VARIANCE BASED ON LINEAR PLACEMENTS Yicheng Hong and Sungchul Lee Abstract. The limiting

More information

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data Journal of Multivariate Analysis 78, 6282 (2001) doi:10.1006jmva.2000.1939, available online at http:www.idealibrary.com on Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone

More information

Confidence Intervals for the Process Capability Index C p Based on Confidence Intervals for Variance under Non-Normality

Confidence Intervals for the Process Capability Index C p Based on Confidence Intervals for Variance under Non-Normality Malaysian Journal of Mathematical Sciences 101): 101 115 2016) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES Journal homepage: http://einspem.upm.edu.my/journal Confidence Intervals for the Process Capability

More information

Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2

Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2 Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2 Fall, 2013 Page 1 Random Variable and Probability Distribution Discrete random variable Y : Finite possible values {y

More information

Testing a Normal Covariance Matrix for Small Samples with Monotone Missing Data

Testing a Normal Covariance Matrix for Small Samples with Monotone Missing Data Applied Mathematical Sciences, Vol 3, 009, no 54, 695-70 Testing a Normal Covariance Matrix for Small Samples with Monotone Missing Data Evelina Veleva Rousse University A Kanchev Department of Numerical

More information

Quick Review on Linear Multiple Regression

Quick Review on Linear Multiple Regression Quick Review on Linear Multiple Regression Mei-Yuan Chen Department of Finance National Chung Hsing University March 6, 2007 Introduction for Conditional Mean Modeling Suppose random variables Y, X 1,

More information

Statistica Sinica Preprint No: SS R2

Statistica Sinica Preprint No: SS R2 Statistica Sinica Preprint No: SS-2016-0063R2 Title Two-sample tests for high-dimension, strongly spiked eigenvalue models Manuscript ID SS-2016-0063R2 URL http://www.stat.sinica.edu.tw/statistica/ DOI

More information

Stochastic Design Criteria in Linear Models

Stochastic Design Criteria in Linear Models AUSTRIAN JOURNAL OF STATISTICS Volume 34 (2005), Number 2, 211 223 Stochastic Design Criteria in Linear Models Alexander Zaigraev N. Copernicus University, Toruń, Poland Abstract: Within the framework

More information

Multivariate Analysis and Likelihood Inference

Multivariate Analysis and Likelihood Inference Multivariate Analysis and Likelihood Inference Outline 1 Joint Distribution of Random Variables 2 Principal Component Analysis (PCA) 3 Multivariate Normal Distribution 4 Likelihood Inference Joint density

More information

MULTIVARIATE ANALYSIS OF VARIANCE

MULTIVARIATE ANALYSIS OF VARIANCE MULTIVARIATE ANALYSIS OF VARIANCE RAJENDER PARSAD AND L.M. BHAR Indian Agricultural Statistics Research Institute Library Avenue, New Delhi - 0 0 lmb@iasri.res.in. Introduction In many agricultural experiments,

More information

Robust Backtesting Tests for Value-at-Risk Models

Robust Backtesting Tests for Value-at-Risk Models Robust Backtesting Tests for Value-at-Risk Models Jose Olmo City University London (joint work with Juan Carlos Escanciano, Indiana University) Far East and South Asia Meeting of the Econometric Society

More information

Simulating Uniform- and Triangular- Based Double Power Method Distributions

Simulating Uniform- and Triangular- Based Double Power Method Distributions Journal of Statistical and Econometric Methods, vol.6, no.1, 2017, 1-44 ISSN: 1792-6602 (print), 1792-6939 (online) Scienpress Ltd, 2017 Simulating Uniform- and Triangular- Based Double Power Method Distributions

More information

Multivariate Regression

Multivariate Regression Multivariate Regression The so-called supervised learning problem is the following: we want to approximate the random variable Y with an appropriate function of the random variables X 1,..., X p with the

More information

HANDBOOK OF APPLICABLE MATHEMATICS

HANDBOOK OF APPLICABLE MATHEMATICS HANDBOOK OF APPLICABLE MATHEMATICS Chief Editor: Walter Ledermann Volume VI: Statistics PART A Edited by Emlyn Lloyd University of Lancaster A Wiley-Interscience Publication JOHN WILEY & SONS Chichester

More information

Empirical Power of Four Statistical Tests in One Way Layout

Empirical Power of Four Statistical Tests in One Way Layout International Mathematical Forum, Vol. 9, 2014, no. 28, 1347-1356 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/imf.2014.47128 Empirical Power of Four Statistical Tests in One Way Layout Lorenzo

More information

A new test for the proportionality of two large-dimensional covariance matrices. Citation Journal of Multivariate Analysis, 2014, v. 131, p.

A new test for the proportionality of two large-dimensional covariance matrices. Citation Journal of Multivariate Analysis, 2014, v. 131, p. Title A new test for the proportionality of two large-dimensional covariance matrices Authors) Liu, B; Xu, L; Zheng, S; Tian, G Citation Journal of Multivariate Analysis, 04, v. 3, p. 93-308 Issued Date

More information

A Bootstrap Test for Causality with Endogenous Lag Length Choice. - theory and application in finance

A Bootstrap Test for Causality with Endogenous Lag Length Choice. - theory and application in finance CESIS Electronic Working Paper Series Paper No. 223 A Bootstrap Test for Causality with Endogenous Lag Length Choice - theory and application in finance R. Scott Hacker and Abdulnasser Hatemi-J April 200

More information

Membership Statistical Test For Shapes With Application to Face Identification

Membership Statistical Test For Shapes With Application to Face Identification IEEE PROCEEDIGS, VOL., O. Y, MOTH 004 1 Membership Statistical Test For Shapes With Application to Face Identification Eugene Demidenko Abstract We develop a statistical test to determine if shape belongs

More information

Intermediate Econometrics

Intermediate Econometrics Intermediate Econometrics Heteroskedasticity Text: Wooldridge, 8 July 17, 2011 Heteroskedasticity Assumption of homoskedasticity, Var(u i x i1,..., x ik ) = E(u 2 i x i1,..., x ik ) = σ 2. That is, the

More information

Estimation of large dimensional sparse covariance matrices

Estimation of large dimensional sparse covariance matrices Estimation of large dimensional sparse covariance matrices Department of Statistics UC, Berkeley May 5, 2009 Sample covariance matrix and its eigenvalues Data: n p matrix X n (independent identically distributed)

More information

CONTROL CHARTS FOR MULTIVARIATE NONLINEAR TIME SERIES

CONTROL CHARTS FOR MULTIVARIATE NONLINEAR TIME SERIES REVSTAT Statistical Journal Volume 13, Number, June 015, 131 144 CONTROL CHARTS FOR MULTIVARIATE NONLINEAR TIME SERIES Authors: Robert Garthoff Department of Statistics, European University, Große Scharrnstr.

More information

Hypothesis Testing. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA

Hypothesis Testing. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA Hypothesis Testing Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA An Example Mardia et al. (979, p. ) reprint data from Frets (9) giving the length and breadth (in

More information

Unit Roots in White Noise?!

Unit Roots in White Noise?! Unit Roots in White Noise?! A.Onatski and H. Uhlig September 26, 2008 Abstract We show that the empirical distribution of the roots of the vector auto-regression of order n fitted to T observations of

More information

Vector Autoregressive Model. Vector Autoregressions II. Estimation of Vector Autoregressions II. Estimation of Vector Autoregressions I.

Vector Autoregressive Model. Vector Autoregressions II. Estimation of Vector Autoregressions II. Estimation of Vector Autoregressions I. Vector Autoregressive Model Vector Autoregressions II Empirical Macroeconomics - Lect 2 Dr. Ana Beatriz Galvao Queen Mary University of London January 2012 A VAR(p) model of the m 1 vector of time series

More information

Regression and Statistical Inference

Regression and Statistical Inference Regression and Statistical Inference Walid Mnif wmnif@uwo.ca Department of Applied Mathematics The University of Western Ontario, London, Canada 1 Elements of Probability 2 Elements of Probability CDF&PDF

More information

arxiv:math/ v1 [math.pr] 9 Sep 2003

arxiv:math/ v1 [math.pr] 9 Sep 2003 arxiv:math/0309164v1 [math.pr] 9 Sep 003 A NEW TEST FOR THE MULTIVARIATE TWO-SAMPLE PROBLEM BASED ON THE CONCEPT OF MINIMUM ENERGY G. Zech and B. Aslan University of Siegen, Germany August 8, 018 Abstract

More information

Estimating Individual Mahalanobis Distance in High- Dimensional Data

Estimating Individual Mahalanobis Distance in High- Dimensional Data CESIS Electronic Working Paper Series Paper No. 362 Estimating Individual Mahalanobis Distance in High- Dimensional Data Dai D. Holgersson T. Karlsson P. May, 2014 The Royal Institute of technology Centre

More information

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Takeshi Emura and Hisayuki Tsukuma Abstract For testing the regression parameter in multivariate

More information

Cointegration Lecture I: Introduction

Cointegration Lecture I: Introduction 1 Cointegration Lecture I: Introduction Julia Giese Nuffield College julia.giese@economics.ox.ac.uk Hilary Term 2008 2 Outline Introduction Estimation of unrestricted VAR Non-stationarity Deterministic

More information

Test for Parameter Change in ARIMA Models

Test for Parameter Change in ARIMA Models Test for Parameter Change in ARIMA Models Sangyeol Lee 1 Siyun Park 2 Koichi Maekawa 3 and Ken-ichi Kawai 4 Abstract In this paper we consider the problem of testing for parameter changes in ARIMA models

More information

Discussion of High-dimensional autocovariance matrices and optimal linear prediction,

Discussion of High-dimensional autocovariance matrices and optimal linear prediction, Electronic Journal of Statistics Vol. 9 (2015) 1 10 ISSN: 1935-7524 DOI: 10.1214/15-EJS1007 Discussion of High-dimensional autocovariance matrices and optimal linear prediction, Xiaohui Chen University

More information

Thomas J. Fisher. Research Statement. Preliminary Results

Thomas J. Fisher. Research Statement. Preliminary Results Thomas J. Fisher Research Statement Preliminary Results Many applications of modern statistics involve a large number of measurements and can be considered in a linear algebra framework. In many of these

More information

SIMULTANEOUS CONFIDENCE INTERVALS AMONG k MEAN VECTORS IN REPEATED MEASURES WITH MISSING DATA

SIMULTANEOUS CONFIDENCE INTERVALS AMONG k MEAN VECTORS IN REPEATED MEASURES WITH MISSING DATA SIMULTANEOUS CONFIDENCE INTERVALS AMONG k MEAN VECTORS IN REPEATED MEASURES WITH MISSING DATA Kazuyuki Koizumi Department of Mathematics, Graduate School of Science Tokyo University of Science 1-3, Kagurazaka,

More information

On the conservative multivariate multiple comparison procedure of correlated mean vectors with a control

On the conservative multivariate multiple comparison procedure of correlated mean vectors with a control On the conservative multivariate multiple comparison procedure of correlated mean vectors with a control Takahiro Nishiyama a a Department of Mathematical Information Science, Tokyo University of Science,

More information

Lecture Note 1: Probability Theory and Statistics

Lecture Note 1: Probability Theory and Statistics Univ. of Michigan - NAME 568/EECS 568/ROB 530 Winter 2018 Lecture Note 1: Probability Theory and Statistics Lecturer: Maani Ghaffari Jadidi Date: April 6, 2018 For this and all future notes, if you would

More information

YUN WU. B.S., Gui Zhou University of Finance and Economics, 1996 A REPORT. Submitted in partial fulfillment of the requirements for the degree

YUN WU. B.S., Gui Zhou University of Finance and Economics, 1996 A REPORT. Submitted in partial fulfillment of the requirements for the degree A SIMULATION STUDY OF THE ROBUSTNESS OF HOTELLING S T TEST FOR THE MEAN OF A MULTIVARIATE DISTRIBUTION WHEN SAMPLING FROM A MULTIVARIATE SKEW-NORMAL DISTRIBUTION by YUN WU B.S., Gui Zhou University of

More information

Chapter 7, continued: MANOVA

Chapter 7, continued: MANOVA Chapter 7, continued: MANOVA The Multivariate Analysis of Variance (MANOVA) technique extends Hotelling T 2 test that compares two mean vectors to the setting in which there are m 2 groups. We wish to

More information

Journal of Multivariate Analysis. Sphericity test in a GMANOVA MANOVA model with normal error

Journal of Multivariate Analysis. Sphericity test in a GMANOVA MANOVA model with normal error Journal of Multivariate Analysis 00 (009) 305 3 Contents lists available at ScienceDirect Journal of Multivariate Analysis journal homepage: www.elsevier.com/locate/jmva Sphericity test in a GMANOVA MANOVA

More information

6-1. Canonical Correlation Analysis

6-1. Canonical Correlation Analysis 6-1. Canonical Correlation Analysis Canonical Correlatin analysis focuses on the correlation between a linear combination of the variable in one set and a linear combination of the variables in another

More information

Empirical Economic Research, Part II

Empirical Economic Research, Part II Based on the text book by Ramanathan: Introductory Econometrics Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna December 7, 2011 Outline Introduction

More information

TESTING FOR EQUAL DISTRIBUTIONS IN HIGH DIMENSION

TESTING FOR EQUAL DISTRIBUTIONS IN HIGH DIMENSION TESTING FOR EQUAL DISTRIBUTIONS IN HIGH DIMENSION Gábor J. Székely Bowling Green State University Maria L. Rizzo Ohio University October 30, 2004 Abstract We propose a new nonparametric test for equality

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

On corrections of classical multivariate tests for high-dimensional data

On corrections of classical multivariate tests for high-dimensional data On corrections of classical multivariate tests for high-dimensional data Jian-feng Yao with Zhidong Bai, Dandan Jiang, Shurong Zheng Overview Introduction High-dimensional data and new challenge in statistics

More information

Application of Parametric Homogeneity of Variances Tests under Violation of Classical Assumption

Application of Parametric Homogeneity of Variances Tests under Violation of Classical Assumption Application of Parametric Homogeneity of Variances Tests under Violation of Classical Assumption Alisa A. Gorbunova and Boris Yu. Lemeshko Novosibirsk State Technical University Department of Applied Mathematics,

More information

1 Least Squares Estimation - multiple regression.

1 Least Squares Estimation - multiple regression. Introduction to multiple regression. Fall 2010 1 Least Squares Estimation - multiple regression. Let y = {y 1,, y n } be a n 1 vector of dependent variable observations. Let β = {β 0, β 1 } be the 2 1

More information

Multivariate Time Series

Multivariate Time Series Multivariate Time Series Notation: I do not use boldface (or anything else) to distinguish vectors from scalars. Tsay (and many other writers) do. I denote a multivariate stochastic process in the form

More information

Multivariate Distributions

Multivariate Distributions IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate

More information

On the Power of Tests for Regime Switching

On the Power of Tests for Regime Switching On the Power of Tests for Regime Switching joint work with Drew Carter and Ben Hansen Douglas G. Steigerwald UC Santa Barbara May 2015 D. Steigerwald (UCSB) Regime Switching May 2015 1 / 42 Motivating

More information

TESTING FOR HOMOGENEITY IN COMBINING OF TWO-ARMED TRIALS WITH NORMALLY DISTRIBUTED RESPONSES

TESTING FOR HOMOGENEITY IN COMBINING OF TWO-ARMED TRIALS WITH NORMALLY DISTRIBUTED RESPONSES Sankhyā : The Indian Journal of Statistics 2001, Volume 63, Series B, Pt. 3, pp 298-310 TESTING FOR HOMOGENEITY IN COMBINING OF TWO-ARMED TRIALS WITH NORMALLY DISTRIBUTED RESPONSES By JOACHIM HARTUNG and

More information

Large Sample Properties of Estimators in the Classical Linear Regression Model

Large Sample Properties of Estimators in the Classical Linear Regression Model Large Sample Properties of Estimators in the Classical Linear Regression Model 7 October 004 A. Statement of the classical linear regression model The classical linear regression model can be written in

More information

On Selecting Tests for Equality of Two Normal Mean Vectors

On Selecting Tests for Equality of Two Normal Mean Vectors MULTIVARIATE BEHAVIORAL RESEARCH, 41(4), 533 548 Copyright 006, Lawrence Erlbaum Associates, Inc. On Selecting Tests for Equality of Two Normal Mean Vectors K. Krishnamoorthy and Yanping Xia Department

More information

Generalized Linear Models (GLZ)

Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) are an extension of the linear modeling process that allows models to be fit to data that follow probability distributions other than the

More information

High-dimensional two-sample tests under strongly spiked eigenvalue models

High-dimensional two-sample tests under strongly spiked eigenvalue models 1 High-dimensional two-sample tests under strongly spiked eigenvalue models Makoto Aoshima and Kazuyoshi Yata University of Tsukuba Abstract: We consider a new two-sample test for high-dimensional data

More information

Regression Analysis. y t = β 1 x t1 + β 2 x t2 + β k x tk + ϵ t, t = 1,..., T,

Regression Analysis. y t = β 1 x t1 + β 2 x t2 + β k x tk + ϵ t, t = 1,..., T, Regression Analysis The multiple linear regression model with k explanatory variables assumes that the tth observation of the dependent or endogenous variable y t is described by the linear relationship

More information