ASYMPTOTIC RESULTS OF A HIGH DIMENSIONAL MANOVA TEST AND POWER COMPARISON WHEN THE DIMENSION IS LARGE COMPARED TO THE SAMPLE SIZE

J Jaan Statist Soc Vol 34 No 2004 9 26 ASYMPTOTIC RESULTS OF A HIGH DIMENSIONAL MANOVA TEST AND POWER COMPARISON WHEN THE DIMENSION IS LARGE COMPARED TO THE SAMPLE SIZE Yasunori Fujikoshi*, Tetsuto Himeno and Hirofumi Wakaki This aer is concerned with Demster trace criterion for multivariate linear hyothesis which was roosed for high dimensional situation First we derive asymtotic null and nonnull distributions of Demster trace criterion when both the samle size and the dimension tend to infinity Our aroximations are examined through some numerical exeriments Next we comare the ower of Demster trace criterion with the ones of three classical criteria; likelihood ratio criterion, Lawley-Hotelling trace criterion, and Bartlett-Nanda-Pillai trace criterion when the dimension is large comared to the samle size Key words and hrases: Asymtotic null and nonnull distributions, Demster trace criterion, high dimensional situation, MANOVA tests, ower comarison Introduction Let Y be an N observation matrix which is obtained by indeendently observinga dimensional variate y =(y,,y ) for N subjects A multivariate linear model for Y is exressed as () Y = AΘ+E, where A is a known N k design matrix with rank(a) =k, Θ is a k unknown arameter matrix, and E is an N error matrix It is assumed that the rows of E are indeendently distributed as N (0, Σ) For testing (2) H 0 : CΘ =O vs H : CΘ O, let S h and S e be the matrices of sums of squares and roducts due to the hyothesis and the error defined by S h =(C ˆΘ) [ C(A A) C ] C ˆΘ, S e =(Y A ˆΘ) (Y A ˆΘ), resectively, where C is a q k known matrix with rank(c) = q, and ˆΘ = (A A) A Y Then S h and S e are indeendently distributed as a noncentral Received March 3, 2004 Revised May20, 2004 Acceted May24, 2004 *Graduate School of Science, Hiroshima University, -3- Kagamiyama, Higashi-Hiroshima 739-8526, Jaan

20 YASUNORI FUJIKOSHI ET AL Wishart distribution W (q, Σ; MM ) and a central Wishart distribution W (n, Σ), where n = N k, and M is a q matrix such that (3) MM =(CΘ) [ C(A A) C ] CΘ Under the assumtion that n, the followingthree well known statistics have been used: (i) Likelihood Ratio statistic: log( S e / S e + S h ) (ii) Lawley-Hotellingtrace criterion: tr S h Se (iii) Bartlett-Nanda-Pillai trace criterion: tr S h (S e + S h ) When n<, S e becomes singular, and it will be imossible to use the classical statistics For such cases, a non-exact test was first roosed by Demster (958, 960) for one and two samle cases For testing(2), we can write the corresondingstatistic as (iv) Demster trace criterion: (4) T D = (tr S h )/(tr S e ), which may be called Demster trace criterion For the null distribution of T D,it has been roosed to use F -aroximations by Demster (958, 960), Takeda and Goto (999), etc It seems that the F -aroximations are good in some situations with Σ = λi The test was alied by Demster (960) to an examle which consist of 62 biological measurements and 2 subjects, in order to examine whether there is evidence that the 62 items could be used to distinguish between alcoholic or non-alchoholic In general, it is getting imortant to develo multivariate theory for analyzingmultivariate datasets with fewer observations than the dimension, or with large dimension in comarison of the number of samles, in various alied area Note that the criterion has been roosed for high dimensional case, and so, it is imortant to study its asymtotic behavior when the dimension is large comared to the number n In this aer we study its distribution under a high dimensional framework: (5) A : q;fix, n,, c (0, ) n Further, in addition to A we will assume (6) A2 : tr Σk = O(), (k =, 2), for the null case, and in addition to A and A2 (7) A3 : tr Σk Ω=O(), (k =, 2), for the nonnull case, where Ω=Σ /2 Θ C (C(A A) C ) CΘΣ /2

ASYMPTOTICS IN HIGH DIMENSIONAL MANOVA 2 The assumtion (6) will be natural The assumtion (7) corresonds to local alternatives when Σ = λi In Section 2 we derive asymtotic null distribution of T D Our aroximations are numerically examined through some exeriments On the other hand, Wakaki et al (2002) derived asymtotic distributions of the null and nonnull distributions of the three classical test statistics under the assumtion (5) with c< and the assumtions (6) and (7) In Section 3 we comare the ower of Demster trace criterion with the ones of three classical tests Naturally it is seen that T D will be more owerful than the classical tests as becomes near to n 2 Demster s test In this section we derive the limitingnull and nonnull distributions of Demster test statistic 2 Limiting null distribution Under the null hyothesis, S h and S e are indeendently distributed as the central Wishart distributions, W (q, Σ) and W (n, Σ), resectively Let T D = { n trs } h q trs e Let U and V be defined by U = trs h qtrσ, V = trs e ntrσ (2) 2qtrΣ 2 2ntrΣ 2 For our asymtotics, we assume (5) and (6) Then, consideringthe characteristic functions of U and V, it is seen that U and V are asymtotically distributed as the nomal distribution N(0, ) Using(2), we can exand T D as { T D = U 2nq (trσ 2 )/ + q } n(trσ)/ V 2 (trσ 2 )/ + q n(trσ)/ 2q(trΣ = 2 )/ U + o() (trσ)/ Therefore, we obtain the followingtheorem Theorem 2 (6), it holds that Under the asymtotic framework (5) and the assumtion T D σ D d N(0, ), where d denotes convergence in distribution, and σ D = 2q(tr Σ 2 )/ (tr Σ)/

22 YASUNORI FUJIKOSHI ET AL For a ractical situation, we need to use an estimator instead of σ D since Σ is usually unknown Srivastava (2003) has roosed a high dimensional consistent estimator, ie, (n, )-consistent estimator given by ˆσ D = 2q {(trs 2 e )/n 2 (trs e ) 2 /n 3 } / (trs e )/(n) Table 2 Uer 5 ercent oints Σ = diag(,,) Table 22 Actual error robabilities of the first kind when the nominal level is 005 Σ = diag(,,) q n Simu σ D ˆσ D 2 40 40 3589 3290 3354 2 40 80 354 3290 3328 2 80 40 355 3290 339 2 80 80 3459 3290 3339 2 20 20 3427 3290 3304 2 20 200 3406 3290 3337 2 200 20 3399 3290 330 2 200 200 3398 3290 3283 4 40 40 558 4652 4860 4 40 80 5039 4652 4652 4 80 40 4969 4652 4648 4 80 80 490 4652 4665 4 20 20 4835 4652 4648 4 20 200 4803 4652 4567 4 200 20 4798 4652 47 4 200 200 4788 4652 4709 q n σ D ˆσ D 2 40 40 00640 006 2 40 80 0067 00597 2 80 40 00604 00555 2 80 80 00586 00560 2 20 20 00564 00558 2 20 200 00557 00532 2 200 20 0055 00546 2 200 200 0055 00553 4 40 40 0067 00595 4 40 80 00632 00632 4 80 40 00606 00607 4 80 80 00595 0059 4 20 20 00563 00564 4 20 200 00557 00589 4 200 20 00548 00528 4 200 200 00550 00528 Table 23 Maximal value of q =2, 4, 6, 8, 0 n 20 40 60 80 00 20 40 60 80 200 20 8 8 40 6 4 6 0 8 8 60 2 2 6 8 0 0 0 0 80 4 4 8 0 8 0 0 0 00 4 6 6 8 8 0 0 0 20 2 6 8 0 0 0 0 0 40 2 4 4 8 0 0 0 0 0 60 4 4 8 0 0 0 0 0 80 2 4 4 0 0 0 0 0 0 200 2 2 6 8 0 0 0 0 0

ASYMPTOTICS IN HIGH DIMENSIONAL MANOVA 23 We attemted to clear numerical accuracy of the limitingdistribution in Theorem 2 Note that the null distribution of T D deends on Σ through its characteristic roots λ,λ 2,,λ, and hence we may assume that Σ = diag(λ,,λ ) We choosed the values of q, n,, and (λ,,λ ) as follows: q; 2, 4, 6, 8, 0, n, ; 20, 40, 60, 80, 00, 20, 40, 60, 80, 200, (λ,,λ )=(,,) The error robabilities of the first kind are almost monotone decreasingfor n and, monotone increasingfor q, and lager than 005 If n or is smaller than 50, the error robabilities of the first kind are almost lager than 006 If n and are larger than 00, the error robabilities of the first kind are almost smaller than 006 and larger than 005 We show a art of these results on Table 2 and Table 22 Table 2 gives the estimated uer 5 ercent oints based on Monte Carlo simulation, the limitingdistribution in Theorem 2 and the limitingdistribution when σ D is relaced by ˆσ D, resectively Table 22 gives the actual error robabilities of the first kind by usingthe aroximated ercent oints Table 23 gives the maximal values of q in the above range such that the error robability of the first kind is smaller than 006 and larger than 005 It should be noted that our aroximation method underestimates the ercent oint Further, the aroximations are rather imrovingby usingˆσ D instead of σ D However, the aroximations are not very accurate excet the case when n is large in comarison with In order to get more accurate aroximations it is exected to obtain asymtotic exansions u to O(n /2 )oro(n ) under the high dimensional framework (5) and the assumtion (6) with k = 3or k = 4 22 Limiting nonnull distribution Under alternative hyotheses, S e and S h are indeendently distributed as the central and noncentral Wishart distributions, W (n, Σ) and W (q, Σ,MM ), resectively, where M is defined in (3) Let TD = { n trs h q trσω } trs e trσ Let U and V be defined by (22) U = (trs h qtrσ trσω), V = n (trs e ntrσ) For our asymtotics, we assume (5), (6) and (7) Then U and V are asymtotically indeendent and normal More recisely, V 2(tr Σ 2 )/ d N(0, )

24 YASUNORI FUJIKOSHI ET AL Further, usingthe characteristic function of the noncentral Wishart distribution (see, chater 0 of Muirhead (982)), the cumulant generating function of U can be exanded as log E[ex(itU)] = log E [ ( it ex trs h )] it = q ( 2 logdet I 2it ) Σ (qtrσ + trσω) 2 trω + ( 2 trω I 2it ) Σ it (qtrσ + trσω) { } 2it trσ + 2(it)2 trσ 2 2 trω = q 2 { trω + 2it + trσω + 4(it)2 2 it (qtrσ + trσω) + o() { } =(it) 2 q trσ2 +2 trσ2 Ω + o() Using(22) we obtain TD = { nu + q n(trσ)/ + n(trσω)/ V + n(trσ)/ = (trσ)/ U + o() Therefore we obtain the followingtheorem } trσ 2 Ω q trσω } trσ Theorem 22 Under the asymtotic framework (5) and the assumtions (6) and (7), it holds that T D σ D d N(0, ), where σ D = 2q(trΣ 2 )/ + 4(trΣ 2 Ω)/ (trσ)/ In a secial case Ω = 0, we get the limiting null distribution in Theorem 2 3 Power comarison In this section we comare the ower of Demster test with ones of likelihood ratio test, Lawley-Hotellingtest, and Bartlett-Nanda-Pillai test Let δ D = T D T D = trσω trσ,

ASYMPTOTICS IN HIGH DIMENSIONAL MANOVA 25 then the ower of T D with significance level α can be exressed as P D = Pr( T D >σ D z α )=Pr(T D >σ D z α δ D ), where z α is the uer 00α % oint of the standard nomal distribution Using Theorem 22, the asymtotic ower is ( ) lim P D = lim Φ δd σ D z α If the order of trσ k Ω(k =, 2) is larger than, δ D so that the asymtotic ower tends to one, while if the order of trσ k Ω(k =, 2) is smaller than, the asymtotic ower is α since δ D 0 and σd σ D When trσ k Ω=O( )(k =, 2), we obtain the asymtotic result easily, then ( ) lim P δd D =Φ z α σ D (3) =Φ ( σ D trσω 2qtrΣ 2 z α ) On the other hand, let T LR = T H = { m T BNP = ( + m ){ trs hse ( + m S e log (+ S e + S h + q log ) }, m } q, ) {( + m ) } trs h (S e + S h ) q, then under the assumtion trω = O( ), the ower of T G (G=LR, H, BNP) with significance level α can be exressed (see Wakaki et al (2002)) as ( ) lim P δ0 G =Φ σ z α ( ) trω/ (32) =Φ z α, 2q( + r) where r = /m and m = n + q Note that Ω in the case T D is used for trω in the case T G Comaring(3) with (32) and neglectingthe terms of o(), (trω)/ +r (trω)/ +r > trσω trσ 2 P G >P D, < trσω trσ 2 P G <P D

26 YASUNORI FUJIKOSHI ET AL By the definition of Ω, ie, Ω = Σ /2 MM Σ /2 we have trσω (trω/ ) trσ = trmm 2 trσ MM (trσ 2 )/ (vec(m)) vec(m) = (vec(m)) (I q Σ )vec(m) trσ 2 / max λ j j trσ 2 / min λ j j = trσ 2 / min λ j j, max λ j j where λ i s are the characteristic roots of Σ Thus, neglecting the terms of o() we can see that min λ j j R λ = max j λ j > +r P G <P D In articular, () If Σ = ai (a: constant), then P G <P D (2) If R λ is near to one, then P G <P D (3) If R λ is small and is small, then P G >P D (4) If c is close to one, r becomes large since r = /m c/( c), and hence P G <P D Acknowledgements The authors would like to thank two referees for their careful readings and useful comments References Demster, A P (958) A high dimensional two samle significance test, Ann Math Statist, 29, 995 00 Demster, A P (960) A significance test for the searation of two highly multivariate small samles, Biometrics, 6, 4 50 Muirhead, R J (982) Asects of Multivariate Statistical Theory, John Wiley & Sons, New York Srivastava, M (2003) Multivariate analysis with fewer observations than the dimension, submitted for ublication Takeda, J and Goto, M (999) Comarison of highly multivariate small samles Estimation of effective dimensionality in the Demster s aroximation test, Jaanese J Alied Statist, 28, 63 77 (in Jaanese) Wakaki, H, Fujikoshi, Y and Ulyanov, V (2002) Asymtotic exansions of the distributions of MANOVA test statistics when the dimension is large, TR No 02-0, Statistical Research Grou, Hiroshima University