On testing the equality of mean vectors in high dimension


ACTA ET COMMENTATIONES UNIVERSITATIS TARTUENSIS DE MATHEMATICA
Volume 17, Number 1, June 2013
Available online at

On testing the equality of mean vectors in high dimension

Muni S. Srivastava

Abstract. In this article, we review various tests that have been proposed in the literature for testing the equality of several mean vectors. In particular, this includes testing the equality of two mean vectors, the so-called two-sample problem, as well as testing the equality of several mean vectors, the so-called multivariate analysis of variance or MANOVA problem. The total sample size, however, may be less than the dimension of the mean vectors, so the usual tests cannot be used. Powers of these tests are compared using simulation.

Received January 12.
Mathematics Subject Classification. 62H10, 62H15.
Key words and phrases. Equality of two mean vectors, high dimensional, multivariate analysis of variance, sample smaller than dimension, inequality of two covariance matrices.

1. Introduction

In this article, we review various tests that have been proposed in the literature for testing the equality of several mean vectors. We begin with the comparison of the mean vectors of two groups with independently distributed p-dimensional observation vectors x_ij, mean vectors µ_i, and covariance matrices Σ_i, i = 1, 2, j = 1, ..., N_i. The total number of p-dimensional observation vectors, N = N_1 + N_2, is less than p. Various tests have been proposed in the literature. For normally distributed observation vectors, and when Σ_1 = Σ_2, a test has been proposed by Dempster [3]. Bai and Saranadasa [1] proposed another test which does not require the assumption of normality but has asymptotically the same power as the one proposed by Dempster [3]. Srivastava [7] proposed a Hotelling's T^2-type test, denoted by T^{+2}, using the Moore-Penrose inverse of S in place of S^{-1}, since S is singular when n < p, n = N_1 + N_2 - 2. It may be noted that all three of the above tests, namely T_D, T_BS and T^{+2}, are invariant under the group of orthogonal matrices. A test that is invariant under the group of non-singular p × p diagonal

matrices has recently been proposed by Srivastava and Du [9] and Srivastava [8]. It may be noted that this test is not invariant under transformation by orthogonal matrices. The power comparison of these tests will be given in this article. Bai and Saranadasa's [1] test as well as the test proposed by Srivastava and Du [9] can be generalized to the case when the covariance matrices Σ_1 and Σ_2 of the two groups are not equal; for testing the equality of two covariance matrices, see Srivastava and Yanagihara [13]. The generalized versions of these two tests have been considered by Chen and Qin [2] and Srivastava, Katayama and Kano [11].

For testing the equality of the mean vectors of several groups, the so-called multivariate analysis of variance or simply MANOVA, it is assumed that all the groups have the same covariance matrix. Under the assumption that (p/n) → c, c ∈ (0, ∞), Fujikoshi, Himeno, and Wakaki [4] and Schott [5] have given generalized versions of the Dempster [3] and Bai-Saranadasa [1] two-sample tests for the MANOVA problem. Tests that do not require the above assumption have been proposed by Srivastava and Fujikoshi [10]. The two tests considered by Fujikoshi et al. [4], Schott [5], and Srivastava and Fujikoshi [10] require the assumption of normality to obtain the asymptotic distributions of the statistics. Following Srivastava [8], it can, however, be shown that these two tests are robust under general non-normal distributions. A third test, based on the Moore-Penrose inverse of the sample covariance matrix, has been proposed by Srivastava [7] under the assumption of normality. For a comparison of the asymptotic powers of these three tests, see Srivastava and Fujikoshi [10]. The above three tests are, however, not invariant under transformation by non-singular diagonal matrices. A test that has this property has recently been proposed by Yamada and Srivastava [14] under the normality assumption.
The organization of this article is as follows. Since it is assumed that N < p, we show in Section 2 that there does not exist a test that is invariant under transformation of the observation vectors by an arbitrary p × p non-singular matrix. Thus, we shall consider tests that are invariant under smaller groups. Two such groups are the group of orthogonal matrices and the group of non-singular diagonal matrices. Such tests for the equality of two mean vectors will be given in Section 4 for the non-normal model described in Section 3. In Section 5, the problem of testing the equality of two mean vectors is considered again, under normality only, but with the covariance matrices of the two groups not equal. The MANOVA problem is considered in Section 6. The paper concludes in Section 7.

2. Consequences of N < p, Σ non-singular

In this section, we show that no invariant test exists under transformation of the observation vectors by a non-singular matrix. Let

X = (x_1, x_2, ..., x_N) : p × N,  N < p,
X* = (x*_1, x*_2, ..., x*_N) : p × N,  N < p,

be two sample points. Let

Z = (X, X_1)  and  Z* = (X*, X*_1),

where X_1 and X*_1 are arbitrary p × (p − N) matrices chosen such that Z and Z* are non-singular (n.s.). Thus

I_p = (Z*)^{-1} Z* = (Z*)^{-1}(X*, X*_1) = [(Z*)^{-1} X*, (Z*)^{-1} X*_1] = ( I_N, 0 ; 0, I_{p−N} ).

Hence

(Z*)^{-1} X* = ( I_N ; 0 ),   Z (Z*)^{-1} X* = (X, X_1) ( I_N ; 0 ) = X,

that is,

X = A X*,   A = Z (Z*)^{-1},

where A is n.s. Thus, for any two sample points X and X* in the sample space, there exists a non-singular matrix taking one point to the other. That is, the whole sample space is a single orbit; the group of non-singular transformations is transitive. Hence, no invariant test under the group of non-singular transformations exists. Clearly, then, we need to consider smaller groups of transformations, namely invariance under the group of orthogonal matrices and invariance under the group of non-singular diagonal matrices. Thus, tests have been proposed in the literature that are invariant under the orthogonal group or under the group of non-singular diagonal matrices. The latter tests appear to perform better than the ones that are invariant under the orthogonal group.

3. A non-normal model

To show that the two-sample tests and the tests in multivariate analysis of variance (MANOVA) remain valid when the observation vectors are not necessarily normally distributed, we consider a general model for the independently distributed p-dimensional vectors x_ij, j = 1, ..., N_i, i = 1, 2. We call this model Model M and describe it next.

Model M. In this model, we assume that the observation vectors x_ij satisfy (a), (b), and (c) below:

(a) x_ij = µ_i + F_i u_ij, j = 1, ..., N_i, i = 1, 2, Cov(x_ij) = Σ_i = F_i^2, i = 1, 2,

(b) E[ Π_{k=1}^p u_ijk^{ν_k} ] = Π_{k=1}^p E(u_ijk^{ν_k}), u_ij = (u_ij1, ..., u_ijp)', ν_k ≥ 0, ν_1 + · · · + ν_p ≤ 4, the ν_k integers,

(c) E(u_ijk^4) = K_4 + 3, K_4 ≥ −2.

It may be noted that F_i is the unique p × p positive definite matrix such that Σ_i = F_i F_i. The results, however, hold for a general factorization Σ_i = C_i C_i', where C_i is a p × p non-singular matrix (see Srivastava [8]). For simplicity of presentation we have, however, chosen Σ_i = F_i^2. Bai and Saranadasa [1] have chosen a model in which Σ_i = Γ_i Γ_i', where Γ_i is a p × m matrix, m ≥ p; in our model m = p. However, the assumptions needed to prove the asymptotic normality of the test statistic proposed by Bai and Saranadasa [1] under their model are much stronger than those for Model M; see Section 4.

4. Two-sample tests: Σ_1 = Σ_2

Let x_11, ..., x_1N_1 be independent and identically distributed (i.i.d.) vectors with p-variate normal distribution N_p(µ_1, Σ_1), and let x_21, ..., x_2N_2 be i.i.d. N_p(µ_2, Σ_2), the two samples being independent. The sample mean vectors are, respectively,

x̄_1 = (1/N_1) Σ_{j=1}^{N_1} x_1j,   x̄_2 = (1/N_2) Σ_{j=1}^{N_2} x_2j.

The sample covariance matrices are, respectively,

S_1 = (1/n_1) Σ_{j=1}^{N_1} (x_1j − x̄_1)(x_1j − x̄_1)',  n_1 = N_1 − 1,
S_2 = (1/n_2) Σ_{j=1}^{N_2} (x_2j − x̄_2)(x_2j − x̄_2)',  n_2 = N_2 − 1.

When Σ_1 = Σ_2 = Σ, an unbiased estimator of Σ is given by

S = (n_1 S_1 + n_2 S_2)/n,   n = n_1 + n_2 = N_1 + N_2 − 2.

4.1. Dempster's test. For testing the hypothesis H : µ_1 = µ_2 versus A : µ_1 ≠ µ_2, four tests have been proposed in the literature, by Dempster [3], Bai and Saranadasa [1], Srivastava [7], and Srivastava and Du [9]. The first three tests are invariant under orthogonal transformations and the fourth one is invariant under transformation by any p × p non-singular diagonal matrix. We begin with the test proposed by Dempster [3] under the assumption of normality. In the notation of Section 4, Dempster's test statistic is given by

T_D = (1/N_1 + 1/N_2)^{-1} (x̄_1 − x̄_2)'(x̄_1 − x̄_2)/(tr S).   (4.1)

Let G be an N × N orthogonal matrix given by

G = [ N^{-1/2} 1_N,  ( (N_2/(N_1 N))^{1/2} 1'_{N_1}, −(N_1/(N_2 N))^{1/2} 1'_{N_2} )',  g_3, ..., g_N ],   N = N_1 + N_2,

where GG' = G'G = I_N, g_i'g_i = 1, i = 3, ..., N, and g_i'g_j = 0, i ≠ j. Let X = (x_11, ..., x_1N_1, x_21, ..., x_2N_2) : p × N be the matrix of the observation vectors from the two groups, and let

XG = (y_1, ..., y_N).

Then

E(y_1) = N^{-1/2}(N_1 µ_1 + N_2 µ_2),
E(y_2) = (N_1 N_2 / N)^{1/2} (µ_1 − µ_2),
E(y_j) = 0,  j = 3, ..., N.

Hence, we can write T_D in terms of the (N − 1) independent random vectors y_2, ..., y_N as

T_D = n y_2'y_2 / (y_3'y_3 + · · · + y_N'y_N),   n = N_1 + N_2 − 2.   (4.2)

If Σ = γ^2 I_p, then y_2, ..., y_N are i.i.d. N_p(0, γ^2 I_p) under µ_1 = µ_2. Hence T_D ~ F_{p,np}. It may be noted that when Σ = γ^2 I_p, and under the assumption of normality, Dempster's test T_D is uniformly most powerful among all tests whose power depends on µ'µ/γ^2, µ = µ_1 − µ_2. To obtain the distribution of T_D when Σ ≠ γ^2 I_p, it is assumed that under the null hypothesis the y_i'y_i, i = 2, ..., N, are independently distributed as m χ^2_r, where m > 0 and r > 0 are unknown scalar quantities. Clearly, the distribution of T_D will not depend on m, and thus we need to find only r. Dempster

gave two iterative methods to obtain an estimate of r. Bai and Saranadasa [1] obtained an expression for r as follows:

E(y_i'y_i) = tr Σ = m E(χ^2_r) = m r

and

Var(y_i'y_i) = 2 tr Σ^2 = Var(m χ^2_r) = 2 m^2 r.

Hence

r = (tr Σ)^2 / (tr Σ^2).

Bai and Saranadasa [1], however, did not provide an estimator of r, as they provided only a ratio-consistent estimator of tr Σ^2, which cannot be used here. An estimator of r is obtained as follows. Let b = a_1^2/a_2, where a_i = tr Σ^i/p, i = 1, 2. It may be noted from the Cauchy-Schwarz inequality that 0 < (tr Σ)^2 ≤ p tr Σ^2, where the equality on the right-hand side holds if and only if Σ = γ^2 I_p, γ > 0. Hence 0 < b ≤ 1 and r ≤ p. Under the assumption of normality of the observation vectors, and assuming that 0 < lim_{p→∞} a_i < ∞, i = 1, ..., 4, it has been shown by Srivastava [6] that

â_1 = (tr S)/p,   â_2 = (1/p)[ tr S^2 − (1/n)(tr S)^2 ]

are consistent estimators of a_1 and a_2. Thus b̂ = â_1^2/â_2 is a consistent estimator of b, giving r̂ = p b̂, and T_D ≈ F_{[r̂],[n r̂]}. Thus, under the hypothesis that µ_1 = µ_2, Dempster's T_D statistic for normally distributed observations is approximately distributed as an F-distribution with [r̂] and [n r̂] degrees of freedom, where [a] denotes the largest integer ≤ a. Clearly, then, the hypothesis of equality of the two mean vectors is rejected if T_D > F_{[r̂],[n r̂],1−α}, the 100(1 − α)% point of the F-distribution with the degrees of freedom mentioned above. The asymptotic power of the T_D test has been derived by Bai and Saranadasa [1], who proposed another test and showed that the asymptotic power of the T_D test is the same as that of the one proposed by them. This test will be called the Bai-Saranadasa test in our discussion, and we describe it next.
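As a concrete illustration, the orthogonally invariant statistics of this section can be assembled in a few lines. The sketch below, under assumptions of our own (columns of X1 and X2 are observations; all function and variable names are ours, not from the paper), computes Dempster's T_D of (4.1) together with the estimates â_1, â_2, and b̂ = â_1²/â_2 that give r̂ = pb̂, as well as the Bai-Saranadasa statistic T_BS of (4.4) and Srivastava's T^{+2} of (4.8) scaled by b̂p/n as in (4.11):

```python
import numpy as np

def two_sample_tests(X1, X2):
    """Sketch (not an official implementation) of T_D (4.1), T_BS (4.4),
    and the scaled Moore-Penrose statistic (b_hat*p/n)*T^{+2} of (4.11).
    X1, X2 are p x N_i data matrices whose columns are observations."""
    p, N1 = X1.shape
    N2 = X2.shape[1]
    n = N1 + N2 - 2
    xbar1, xbar2 = X1.mean(axis=1), X2.mean(axis=1)
    C1, C2 = X1 - xbar1[:, None], X2 - xbar2[:, None]
    S = (C1 @ C1.T + C2 @ C2.T) / n              # pooled covariance estimator
    d = xbar1 - xbar2
    tau = N1 * N2 / (N1 + N2)                    # = (1/N1 + 1/N2)^{-1}
    trS, trS2 = np.trace(S), np.trace(S @ S)
    a1_hat = trS / p                             # estimates tr(Sigma)/p
    a2_hat = (trS2 - trS ** 2 / n) / p           # estimates tr(Sigma^2)/p
    b_hat = a1_hat ** 2 / a2_hat                 # r_hat = p * b_hat
    T_D = tau * (d @ d) / trS                    # compare with F_[r_hat],[n r_hat]
    T_BS = (tau * (d @ d) - trS) / np.sqrt(2 * (trS2 - trS ** 2 / n))
    Tp2 = tau * d @ np.linalg.pinv(S) @ d        # uses the Moore-Penrose S^+
    return T_D, T_BS, b_hat * p / n * Tp2        # last one approx. chi^2_n under H
```

The last value would be compared with a chi-square quantile with n degrees of freedom, and T_BS with z_{1-α}, as described in the text.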

4.2. Bai-Saranadasa test. Consider an asymptotic version of Dempster's statistic. The mean of the numerator of T_D is

E[ (1/N_1 + 1/N_2)^{-1} (x̄_1 − x̄_2)'(x̄_1 − x̄_2) ]
= (1/N_1 + 1/N_2)^{-1} E( x̄_1'x̄_1 − 2 x̄_1'x̄_2 + x̄_2'x̄_2 )
= (1/N_1 + 1/N_2)^{-1} [ tr Σ/N_1 + µ_1'µ_1 − 2 µ_1'µ_2 + tr Σ/N_2 + µ_2'µ_2 ]
= tr Σ + τ (µ_1 − µ_2)'(µ_1 − µ_2),   τ = N_1 N_2/(N_1 + N_2).

So τ(µ_1 − µ_2)'(µ_1 − µ_2) may be estimated by

(1/N_1 + 1/N_2)^{-1} (x̄_1 − x̄_2)'(x̄_1 − x̄_2) − tr S.

The asymptotic variance of this estimator is 2 tr Σ^2 + o(1). Under the assumptions that lim (p/n) = c, 0 < c < ∞, and λ_max(Σ) = o(p^{1/2}), Bai and Saranadasa [1] showed that, as (n, p) → ∞,

(tr Σ^2)^{-1} [ tr S^2 − (1/n)(tr S)^2 ] →^p 1.   (4.3)

Thus, using the ratio-consistent estimator of tr Σ^2 given in (4.3), Bai and Saranadasa [1] proposed the statistic

T_BS = [ τ (x̄_1 − x̄_2)'(x̄_1 − x̄_2) − tr S ] / { 2 [ tr S^2 − (1/n)(tr S)^2 ] }^{1/2}   (4.4)

for testing the hypothesis that µ_1 = µ_2. They also showed that under the hypothesis T_BS is asymptotically normally distributed with mean zero and variance 1 for the general model described in Section 3, which includes the normal model as a special case. To obtain the asymptotic distribution of T_BS under the alternative hypothesis A : µ_1 ≠ µ_2, it is assumed that the difference between the two mean vectors satisfies the following conditions:

(1/N_1 + 1/N_2)^{-1} (µ_1 − µ_2)' Σ (µ_1 − µ_2) = o(tr Σ^2),   (4.5)

(p/n) → c > 0  and  λ_max(Σ) = o( (tr Σ^2)^{1/2} ).   (4.6)

Under conditions (4.5) and (4.6), Bai and Saranadasa [1] showed that the asymptotic powers of Dempster's test T_D and Bai and Saranadasa's test

T_BS are equal and given by

β(T_D) = β(T_BS) = Φ( −z_{1−α} + τ (µ_1 − µ_2)'(µ_1 − µ_2) / (2 p a_2)^{1/2} ),   τ = N_1 N_2/(N_1 + N_2),   (4.7)

where Φ is the distribution function of the standard normal random variable with mean 0 and variance 1, and Φ(z_{1−α}) = 1 − α, 0 < α < 1. It may be recalled that both tests T_D and T_BS are one-sided tests: reject H : µ_1 = µ_2 if T_D > z_{1−α}; similarly, reject H if T_BS > z_{1−α}.

4.3. Srivastava's T^{+2} test. Using the Moore-Penrose inverse of S, Srivastava [7] proposed the statistic

T^{+2} = (1/N_1 + 1/N_2)^{-1} (x̄_1 − x̄_2)' S^+ (x̄_1 − x̄_2),   (4.8)

where S^+ is the Moore-Penrose inverse of S, which is unique and satisfies the following conditions: S^+ S S^+ = S^+, S S^+ S = S, and S S^+ and S^+ S are symmetric matrices. Let S = V/n, n = N_1 + N_2 − 2. It can be shown that the T^{+2} statistic is invariant under the transformations x_ij → c Γ x_ij, c ≠ 0, Γ Γ' = I_p. Thus, without loss of generality, we may assume that Σ = Λ, a diagonal matrix. Note that

S^+ = n V^+ = n H' L^{-1} H,   H H' = I_n,

where L = diag(l_1, ..., l_n) contains the non-zero eigenvalues of V. Thus

T^{+2} = n (1/N_1 + 1/N_2)^{-1} (x̄_1 − x̄_2)' H' L^{-1} H (x̄_1 − x̄_2).   (4.9)

Let

z = (1/N_1 + 1/N_2)^{-1/2} A H (x̄_1 − x̄_2),  where  A = (H Λ H')^{-1/2}.

Hence, under the hypothesis that µ_1 = µ_2,

T^{+2} = n z' (A L A)^{-1} z,   z ~ N_n(0, I_n).   (4.10)

In order to obtain the asymptotic distribution of T^{+2}, it is assumed that 0 < lim_{p→∞} a_i < ∞, i = 1, ..., 4, where a_i = tr Σ^i/p. Under the above assumption it has been shown in Srivastava [7] that, as p → ∞,

p^{-1} A L A →^p b I_n,   b = a_1^2/a_2.

Thus, under the hypothesis, as p → ∞,

(b p/n) T^{+2} →^d z'z,

where →^d denotes convergence in distribution. Since, under the hypothesis, z'z has a chi-square distribution with n degrees of freedom, denoted by χ^2_n, the statistic (bp/n) T^{+2} is asymptotically distributed as χ^2_n as p → ∞. Although b is unknown, a consistent estimator of b is given by b̂ = â_1^2/â_2 as p and n go to infinity. Thus (b̂p/n) T^{+2} is approximately distributed as χ^2_n, and the hypothesis is rejected if

(b̂p/n) T^{+2} > χ^2_{n,1−α},   (4.11)

where P(χ^2_n < χ^2_{n,1−α}) = 1 − α. Alternatively, we may consider the asymptotic distribution of (b̂p/n) T^{+2} as p → ∞ and then n → ∞. That is, we consider the standardized statistic

T^+_S = [ (b̂p/n) T^{+2} − n ] / (2n)^{1/2},   (4.12)

which is asymptotically distributed as standard normal. Thus, the hypothesis is rejected if T^+_S > z_{1−α}, where Φ(z_{1−α}) = 1 − α. It may be noted that for moderate sample sizes the approximate distribution in (4.11) may give a better approximation than (4.12). To obtain the asymptotic power of the T^+_S test, we consider local alternatives in which

(µ_1 − µ_2) = (τ n)^{−1/2} δ,   (4.13)

where δ is fixed, n = N_1 + N_2 − 2, and τ = N_1 N_2/(N_1 + N_2). The asymptotic power of the T^+_S statistic is given by

β(T^+_S) = Φ[ −z_{1−α} + n^{1/2} τ (µ_1 − µ_2)' Λ (µ_1 − µ_2) / (2^{1/2} p a_2) ].   (4.14)

It may be noted that all three of the above tests, namely T_D, T_BS, and T^{+2} (or equivalently T^+_S), are invariant under the transformations x_ij → a Γ x_ij, a ≠ 0, Γ Γ' = I_p. It has been shown by Bai and Saranadasa [1] that the tests T_D and T_BS have the same asymptotic power, given in (4.7). Comparing it with the power of T^+_S given in (4.14), we find that the test T^+_S may be preferred if

( n/(p a_2) )^{1/2} θ' Λ θ > θ' θ,   θ = µ_1 − µ_2.   (4.15)

For example, if θ ~ N_p(0, Λ), then on average (4.15) implies that

( n/(p a_2) )^{1/2} tr Λ^2 > tr Σ,   that is,   n > (a_1^2/a_2) p = b p.   (4.16)

Since a_1^2 ≤ a_2, many such n exist for large p. Similarly, if θ = (λ_1^{1/2}, ..., λ_p^{1/2})', where λ_1 > · · · > λ_p are the eigenvalues of Σ, the same inequality given in (4.16) is obtained.

4.4. Srivastava-Du test. All three of the above tests are invariant under the transformations x_ij → a Γ x_ij, a ≠ 0, Γ Γ' = I_p, but not under the transformations x_ij → D x_ij, where D = diag(d_1, ..., d_p) is a p × p non-singular diagonal matrix. This implies that a change in the units of measurement will affect all three statistics. Srivastava and Du [9] proposed a statistic that is invariant under transformation by any p × p non-singular diagonal matrix. This test statistic is given by

T_SD = [ (1/N_1 + 1/N_2)^{-1} (x̄_1 − x̄_2)' D_S^{-1} (x̄_1 − x̄_2) − np/(n − 2) ] / { 2 [ tr R̂^2 − p^2/n ] c_{p,n} }^{1/2},

where

R̂ = D_S^{-1/2} S D_S^{-1/2},   D_S = diag(S_11, ..., S_pp),   S = (S_ij),
c_{p,n} = 1 + (tr R̂^2)/p^{3/2} → 1,   as (n, p) → ∞.

Under normality assumptions this test was proposed by Srivastava and Du [9]. It has been shown to be robust by Srivastava [8]. Let

Σ = (σ_ij) = D_σ^{1/2} R D_σ^{1/2},   (4.17)

where D_σ^{1/2} = diag(σ_11^{1/2}, ..., σ_pp^{1/2}). Then R is the correlation matrix, which can be estimated by R̂ defined above. In order to obtain the distribution of T_SD, Srivastava and Du [9] assumed the following:

(i) 0 < lim_{p→∞} (tr R^i)/p < ∞,   i = 1, ..., 4,   (4.18)

(ii) lim_{p→∞} max_{1≤i≤p} λ_ip / p^{1/2} = 0,   (4.19)

where λ_1p, ..., λ_pp are the eigenvalues of the correlation matrix R. Under the hypothesis of equality of the two mean vectors, T_SD has an asymptotically standard

normal distribution. Under the local alternative defined by

µ_1 − µ_2 = (τ n)^{−1/2} δ,   τ = N_1 N_2/(N_1 + N_2),   n = N_1 + N_2 − 2,

where δ is fixed, the asymptotic power of the statistic T_SD is given by

lim_{(n,p)→∞} P(T_SD > z_{1−α}) = Φ[ −z_{1−α} + τ (µ_1 − µ_2)' D_σ^{-1} (µ_1 − µ_2) / (2 tr R^2)^{1/2} ].

It may be noted that Yamada and Srivastava [14] have shown that condition (4.19) is not needed in deriving the asymptotic distribution of the statistic T_SD. Srivastava and Du [9] showed theoretically that under the condition

0 < lim_{p→∞} δ'δ/p = lim_{p→∞} δ' D_σ^{-1} δ / tr(D_σ^{-1}) < ∞,   (4.20)

the T_SD test is superior to the T_BS test. It may be noted that condition (4.20) is satisfied for δ = (δ_1, ..., δ_p)' in which δ_i = δ_0, i = 1, ..., p, that is, when all the components of the mean difference are the same. In the next two subsections, we compare the attained significance level and power of the two robust tests, namely T_D and T_SD, through simulation.

4.5. Attained significance level (ASL). To compare the tests, we need to define the attained significance level and the empirical power. Let z_{1−α} be the 100(1 − α)% quantile of the asymptotic null distribution of the test statistic T, which is N(0, 1) in our case; thus z_{1−α} is the 100(1 − α)% quantile of N(0, 1). With m replications of the data set simulated under the null hypothesis, the ASL is

α̂ = (# of t_H ≥ z_{1−α}) / m,   m = 10,000,   α = 0.05,

where the t_H represent the values of the test statistic T based on the data sets simulated under the null hypothesis. Here m α̂ is approximately distributed as Binomial(10000, 0.05), so the standard deviation of α̂ is estimated by se(α̂) = (0.05 × 0.95/10,000)^{1/2} ≈ 0.0022. For simplicity of presentation, we shall consider the one-sample case, in which we test that the mean vector is zero.

4.6. Empirical power. To compute the empirical powers, we shall use empirical critical points.
Specifically, we first simulate m replications of the data set under the null hypothesis, then select the (mα)-th largest value of the test statistic as the empirical critical point, denoted ẑ_{1−α}; that is, the 100(1 − α)% quantile of the empirical null distribution of the test statistic

obtained from the m replications. Then another m replications of the data set are simulated under the alternative with the given choice of µ. The empirical power is calculated as

β̂ = (# of t_A ≥ ẑ_{1−α}) / m.

Table 1. Attained significance levels of T_SD and T_D under the null hypothesis, when R = I_p and R = R_1, respectively. Columns: p, N, and (T_SD, T_D) for D_σ = I_p, D_σ = D_σ,1, and D_σ = D_σ,2; row blocks for R = I_p and R = R_1. [Table entries not recovered.]
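The ASL and empirical-power recipe of Sections 4.5-4.6 is a generic Monte Carlo loop. The sketch below implements it for an arbitrary user-supplied statistic and samplers (all names are ours, and m is kept small for illustration; the hard-coded critical point corresponds to α = 0.05):

```python
import numpy as np

def asl_and_power(stat, sample_null, sample_alt, m=2000, alpha=0.05):
    """Monte Carlo sketch of Sections 4.5-4.6: ASL against the asymptotic
    N(0,1) critical point, empirical power against the empirical one.
    `stat`, `sample_null`, `sample_alt` are user-supplied callables."""
    z = 1.6448536269514722                       # 95% quantile of N(0,1)
    t_null = np.sort([stat(sample_null()) for _ in range(m)])
    asl = np.mean(t_null >= z)                   # attained significance level
    z_emp = t_null[int(round((1 - alpha) * m))]  # empirical critical point
    t_alt = np.array([stat(sample_alt()) for _ in range(m)])
    power = np.mean(t_alt >= z_emp)              # empirical power
    return asl, power
```

For example, with the one-sample statistic sqrt(N) x̄ for N(θ, 1) data, the ASL should be close to 0.05 and the power close to 1 for a large shift θ.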

Table 2. Empirical powers of T_SD and T_D under the alternative hypothesis, when R = I_p and R = R_1, respectively. Columns: p, N, and (T_SD, T_D) for D_σ = I_p, D_σ = D_σ,1, and D_σ = D_σ,2. [Table entries not recovered.]

4.7. Parameter selection: one-sample case. We consider both the independence structure R = I_p = diag(1, 1, ..., 1) and the equal-correlation structure R = R_1 = (ρ_ij), ρ_ij = 0.25 for i ≠ j. We also consider different scale matrices D_σ = diag(σ_11, ..., σ_pp). We select D_σ = I_p; D_σ = D_σ,1 with σ_11^{1/2}, ..., σ_pp^{1/2} i.i.d. Unif(2, 3); and D_σ = D_σ,2 with σ_11, ..., σ_pp i.i.d. χ^2_3. For the alternative hypothesis, we choose µ = ν = (ν_1, ..., ν_p)' with ν_{2k−1} = 0 and ν_{2k} i.i.d. Unif(−1/2, 1/2), k = 1, ..., p/2.
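Under the same conventions as before (columns of X1, X2 are observations; helper names are ours), the scale-invariant statistic T_SD of Section 4.4 can be sketched as follows. Only the ratios (x̄_1i − x̄_2i)²/S_ii and the sample correlation matrix R̂ enter, which is exactly the diagonal invariance the test is built around:

```python
import numpy as np

def srivastava_du(X1, X2):
    """Sketch of T_SD (Section 4.4): pooled S, D_S = diag(S),
    R_hat = D_S^{-1/2} S D_S^{-1/2}, centering np/(n-2), factor c_{p,n}."""
    p, N1 = X1.shape
    N2 = X2.shape[1]
    n = N1 + N2 - 2
    xbar1, xbar2 = X1.mean(axis=1), X2.mean(axis=1)
    C1, C2 = X1 - xbar1[:, None], X2 - xbar2[:, None]
    S = (C1 @ C1.T + C2 @ C2.T) / n
    d = xbar1 - xbar2
    tau = N1 * N2 / (N1 + N2)
    inv_sqrt = 1.0 / np.sqrt(np.diag(S))
    R_hat = inv_sqrt[:, None] * S * inv_sqrt[None, :]   # sample correlation matrix
    trR2 = np.trace(R_hat @ R_hat)
    c_pn = 1 + trR2 / p ** 1.5                          # c_{p,n} -> 1 as (n,p) -> inf
    num = tau * (d ** 2 / np.diag(S)).sum() - n * p / (n - 2)
    return num / np.sqrt(2 * (trR2 - p ** 2 / n) * c_pn)
```

A quick numerical check of the invariance is to rescale every coordinate by a positive diagonal matrix and verify that the statistic is unchanged.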

5. Two-sample tests: Σ_1 ≠ Σ_2

In this section, we consider the problem of testing the equality of the mean vectors of two groups when the covariance matrices of the two groups are not equal. For normally distributed observation vectors, the equality of the two covariance matrices can be ascertained using a test proposed by Srivastava and Yanagihara [13]. If it is found that the covariance matrices of the two groups are not equal, the tests given in this section should be used. We begin with a test proposed by Chen and Qin [2].

5.1. Chen-Qin test statistic. In the notation of Section 4, this test statistic is given by

T_cq = [ (1/(N_1 n_1)) Σ_{i≠j} x_1i' x_1j + (1/(N_2 n_2)) Σ_{i≠j} x_2i' x_2j − 2 x̄_1' x̄_2 ]
       / [ (2/(N_1 n_1)) tr Σ̂_1^2 + (2/(N_2 n_2)) tr Σ̂_2^2 + (4/(N_1 N_2)) tr Σ̂_1 Σ̂_2 ]^{1/2},

where

tr Σ̂_i^2 = (1/(N_i n_i)) tr Σ_{j≠k} (x_ij − x̄_i(j,k)) x_ij' (x_ik − x̄_i(j,k)) x_ik',   i = 1, 2,

tr Σ̂_1 Σ̂_2 = (1/(N_1 N_2)) tr Σ_{j=1}^{N_1} Σ_{k=1}^{N_2} (x_1j − x̄_1(j)) x_1j' (x_2k − x̄_2(k)) x_2k',

x̄_i(j,k) = (N_i x̄_i − x_ij − x_ik)/(N_i − 2),   i = 1, 2;   j, k = 1, ..., N_i,
x̄_i(k) = (N_i x̄_i − x_ik)/n_i,   i = 1, 2;   k = 1, ..., N_i.

In order to derive the asymptotic distribution of the statistic T_cq, Chen and Qin [2] made the following assumptions.

Assumption (A)

A(1) x_ij = Γ_i z_ij + µ_i, j = 1, ..., N_i, i = 1, 2, where each Γ_i is a p × m matrix for some m ≥ p such that Γ_i Γ_i' = Σ_i, and {z_ij}_{j=1}^{N_i} are m-variate independent and identically distributed random vectors satisfying E(z_ij) = 0, Cov(z_ij) = I_m; and, writing z_ij = (z_ij1, ..., z_ijm)', it is assumed that E(z_ijk^4) = K < ∞ and

E( z_ijl_1^{α_1} z_ijl_2^{α_2} · · · z_ijl_q^{α_q} ) = Π_{r=1}^q E( z_ijl_r^{α_r} ),   Σ_{r=1}^q α_r ≤ 8,   l_1 ≠ l_2 ≠ · · · ≠ l_q,

A(2) lim_{p→∞} tr( Σ_i Σ_j Σ_k Σ_l ) / { tr(Σ_1 + Σ_2)^2 }^2 = 0,   i, j, k, l = 1 or 2,

A(3) N (µ_1 − µ_2)' Σ_i (µ_1 − µ_2) = o{ tr(Σ_1 + Σ_2)^2 }.

Chen and Qin [2] proved the asymptotic normality of T_cq under the hypothesis that µ_1 = µ_2 and under Assumptions A(1)-A(2) given above. They also obtained the asymptotic power under Assumptions A(1)-A(3). It is given by

lim_{(N_1,N_2,p)→∞} P_1( T_cq > z_{1−α} ) = Φ( −z_{1−α} + (µ_1 − µ_2)'(µ_1 − µ_2) / σ_{n,p} ),

where

σ_{n,p}^2 = (2/(N_1 n_1)) tr(Σ_1^2) + (2/(N_2 n_2)) tr(Σ_2^2) + (4/(N_1 N_2)) tr(Σ_1 Σ_2).

It may be noted that σ_{n,p}^2 is the variance of the numerator of T_cq, which can also be estimated using the usual consistent estimators of tr Σ_i^2/p and tr Σ_1 Σ_2/p, i = 1, 2. Most importantly, note that the numerator of T_cq satisfies

(1/(N_1 n_1)) Σ_{i≠j} x_1i' x_1j + (1/(N_2 n_2)) Σ_{i≠j} x_2i' x_2j − 2 x̄_1' x̄_2
= (x̄_1 − x̄_2)'(x̄_1 − x̄_2) − (1/N_1) tr S_1 − (1/N_2) tr S_2,

which is identical to the numerator obtained by generalizing the Bai-Saranadasa test to the case Σ_1 ≠ Σ_2, and much simpler to compute. Thus, Srivastava, Katayama and Kano [11] proposed a simpler test, given in the next subsection, in which a different consistent estimator of the variance of the numerator is used.

5.2. A simpler test than T_cq. The Chen-Qin test T_cq has rather complicated expressions and takes much longer to compute, with no apparent advantage in terms of power over the corresponding simpler test

T_2 = p^{-1/2} [ (x̄_1 − x̄_2)'(x̄_1 − x̄_2) − N_1^{-1} tr S_1 − N_2^{-1} tr S_2 ]
      / [ 2 N_1^{-2} â_21 + 2 N_2^{-2} â_22 + 4 tr(S_1 S_2)/(N_1 N_2 p) ]^{1/2},

where

â_2i = (1/p)[ tr S_i^2 − (1/n_i)(tr S_i)^2 ],   n_i = N_i − 1,   i = 1, 2.

The test T_2 has the same asymptotic distribution as T_cq. The power and ASL of the two tests are indistinguishable, as seen in the attached tables obtained from simulation.
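The simpler statistic T_2 just described involves only the sample means, the traces of S_1, S_2 and their products. A sketch under our usual conventions (columns are observations; names are ours):

```python
import numpy as np

def t2_unequal_cov(X1, X2):
    """Sketch of the statistic T_2 of Section 5.2, the
    Srivastava-Katayama-Kano simplification of T_cq."""
    p, N1 = X1.shape
    N2 = X2.shape[1]
    n1, n2 = N1 - 1, N2 - 1
    xbar1, xbar2 = X1.mean(axis=1), X2.mean(axis=1)
    S1 = (X1 - xbar1[:, None]) @ (X1 - xbar1[:, None]).T / n1
    S2 = (X2 - xbar2[:, None]) @ (X2 - xbar2[:, None]).T / n2
    d = xbar1 - xbar2
    num = (d @ d - np.trace(S1) / N1 - np.trace(S2) / N2) / np.sqrt(p)
    a21 = (np.trace(S1 @ S1) - np.trace(S1) ** 2 / n1) / p
    a22 = (np.trace(S2 @ S2) - np.trace(S2) ** 2 / n2) / p
    var_hat = (2 * a21 / N1 ** 2 + 2 * a22 / N2 ** 2
               + 4 * np.trace(S1 @ S2) / (N1 * N2 * p))
    return num / np.sqrt(var_hat)                 # compare with z_{1-alpha}
```

Unlike T_cq, no leave-two-out means are needed, which is the computational saving mentioned above.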

5.3. Srivastava-Katayama-Kano test. It may be noted that the tests T_cq and T_2 are both invariant under transformation by an orthogonal matrix but not under transformation by a non-singular diagonal matrix. In this subsection we describe a test which is invariant under the latter group. Let

S_i = (s_ijk),   D̂_i = diag(s_i11, ..., s_ipp),   i = 1, 2,
D̂ = N_1^{-1} D̂_1 + N_2^{-1} D̂_2,
R̂ = D̂^{-1/2} ( N_1^{-1} S_1 + N_2^{-1} S_2 ) D̂^{-1/2} = (r̂_ij),
q_n = (x̄_1 − x̄_2)' D̂^{-1} (x̄_1 − x̄_2) − p.

Let σ^2(q_n) = Var(q_n) = 2 tr R^2, where R is the population version of the sample correlation-type matrix R̂. Then the statistic proposed by Srivastava, Katayama and Kano [11] is given by

T_1 = q_n / [ c_{p,n} σ̂^2(q_n) ]^{1/2},   n_i = N_i − 1,   i = 1, 2,

where

c_{p,n} = 1 + (tr R̂^2)/p^{3/2} → 1,   as (n, p) → ∞,

and

σ̂^2(q_n) = 2 [ tr R̂^2 − (1/(n_1 N_1^2)) (tr D̂^{-1} S_1)^2 − (1/(n_2 N_2^2)) (tr D̂^{-1} S_2)^2 ].

It can be shown that under Assumptions (B1)-(B3) stated below, σ̂^2(q_n) is a ratio-consistent estimator of σ^2(q_n), namely σ̂^2(q_n)/σ^2(q_n) →^p 1.

Assumption (B)

(B1) 0 < c_1 < min_{1≤k≤p} σ_ikk ≤ max_{1≤k≤p} σ_ikk < c_2 < ∞ uniformly in p,
(B2) lim_{p→∞} tr R^4/(tr R^2)^2 = 0,
(B3) (N_1/N) → k ∈ (0, 1) as N = N_1 + N_2 → ∞,
(B4) N_m = O(p^δ), δ > 1/2, N_m = min(N_1, N_2),
(B5) (µ_1 − µ_2)' D^{-1/2} R D^{-1/2} (µ_1 − µ_2) = o(tr R^2),

where

R = D^{-1/2} ( N_1^{-1} Σ_1 + N_2^{-1} Σ_2 ) D^{-1/2},   Σ_i = (σ_ijk),   i = 1, 2,
D = N_1^{-1} D_1 + N_2^{-1} D_2,   D_i = diag(σ_i11, ..., σ_ipp).

It can be shown that when µ_1 = µ_2, and under Assumptions (B1)-(B4), we get the following theorem.

Theorem 5.1. Under Assumptions (B1)-(B4), when µ_1 = µ_2,

P_0{ T_1 ≤ z_{1−α} } → Φ(z_{1−α})   as (N_1, N_2, p) → ∞.

Srivastava, Katayama and Kano [11] obtained this result assuming normality of the observation vectors. To obtain the distribution under the alternative hypothesis, we choose the local alternative given in (B5) and obtain the following theorem.

Theorem 5.2. Under Assumption (B), the distribution of the statistic T_1 under the local alternative (B5) is given by

P_1{ T_1 > z_{1−α} } → Φ{ −z_{1−α} + (µ_1 − µ_2)' D^{-1} (µ_1 − µ_2) / (2 tr R^2)^{1/2} }.

5.4. Comparison of the T_cq, T_1 and T_2 tests: simulation. The parameters of the simulation are given by

N_1 = N_2 = N/2,
µ_1 = µ_2 = 0 for the ASL;   µ_1 = 0, µ_2 = (u_1, ..., u_p)', u_i i.i.d. U(1/2, 3/2), for the power,
Σ_1 = diag(d_1, ..., d_p) R_1 diag(d_1, ..., d_p),
Σ_2 = diag(ψ_1, ..., ψ_p) R_2 diag(ψ_1, ..., ψ_p),
d_i = 2 + (p − i + 1)/p,   ψ_i^2 i.i.d. χ^2_3,
R_1 = (r_ij) with r_ij = (−1)^{i+j} (0.2)^{|i−j|^{0.1}},
R_2 = (ρ_ij) with ρ_ij = (−1)^{i+j} (0.4)^{|i−j|^{0.1}}.
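The diagonal-invariant statistic T_1 of Section 5.3 can be sketched from its ingredients q_n, R̂, c_{p,n}, and σ̂²(q_n); as before, the data layout and helper names are ours:

```python
import numpy as np

def skk_t1(X1, X2):
    """Sketch of the Srivastava-Katayama-Kano statistic T_1 (Section 5.3).
    Columns of the p x N_i matrices X1, X2 are observations."""
    p, N1 = X1.shape
    N2 = X2.shape[1]
    n1, n2 = N1 - 1, N2 - 1
    xbar1, xbar2 = X1.mean(axis=1), X2.mean(axis=1)
    S1 = (X1 - xbar1[:, None]) @ (X1 - xbar1[:, None]).T / n1
    S2 = (X2 - xbar2[:, None]) @ (X2 - xbar2[:, None]).T / n2
    Dhat = np.diag(S1) / N1 + np.diag(S2) / N2          # diagonal of D_hat
    q_n = ((xbar1 - xbar2) ** 2 / Dhat).sum() - p
    inv_sqrt = 1.0 / np.sqrt(Dhat)
    R_hat = inv_sqrt[:, None] * (S1 / N1 + S2 / N2) * inv_sqrt[None, :]
    trR2 = np.trace(R_hat @ R_hat)
    c_pn = 1 + trR2 / p ** 1.5
    tr1 = (np.diag(S1) / Dhat).sum()                    # tr(D_hat^{-1} S1)
    tr2 = (np.diag(S2) / Dhat).sum()                    # tr(D_hat^{-1} S2)
    sig2 = 2 * (trR2 - tr1 ** 2 / (n1 * N1 ** 2) - tr2 ** 2 / (n2 * N2 ** 2))
    return q_n / np.sqrt(c_pn * sig2)
```

Note that R̂ has unit diagonal by construction, since diag(S_1/N_1 + S_2/N_2) is exactly D̂.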

Table 3. ASL and power for the normal distribution: 1000 simulations. Columns: p, N_1 = N_2, ASL (T_1, T_cq, T_2), and power (T_1, T_cq, T_2). [Table entries not recovered.]

Table 4. ASL and power for the χ^2_8 distribution: 10,000 simulations. Columns: p, N_1 = N_2, ASL T_1, ASL T_2, power T_1, power T_2. [Table entries not recovered.]

Table 5. ASL and power for the χ^2_32 distribution: 1,000 simulations. Columns: p, N_1 = N_2, ASL T_1, ASL T_2, power T_1, power T_2. [Table entries not recovered.]

6. Multivariate analysis of variance (MANOVA)

The problem of testing the equality of (q + 1) mean vectors, q ≥ 2, is a special case of the multivariate regression model

Y = X Θ + U Λ^{1/2},
Y = (y_1, ..., y_N)' : N × p,   U = (u_1, ..., u_N)' : N × p,   u_i = (u_i1, ..., u_ip)',

where y_1, ..., y_N are independently distributed p-dimensional observation vectors, X : N × k is a matrix of constants of rank k, and Θ : k × p is a matrix of parameters. Rather than assuming normality of the y_i, or equivalently of the u_i, we shall assume that

E(u_i) = 0,   Cov(u_i) = I_p,   E(u_ik^4) = K_4 + 3,   (6.1)
Cov(y_i) = Λ = Λ^{1/2} Λ^{1/2} = (λ_ij).

We also assume that for ν_k ≥ 0 with Σ_{k=1}^p ν_k ≤ 4,

E[ Π_{k=1}^p u_ik^{ν_k} ] = Π_{k=1}^p E( u_ik^{ν_k} ).   (6.2)

Under the assumptions (6.1) and (6.2), we consider the problem of testing the hypothesis H : C Θ = 0 versus A : C Θ ≠ 0, where C is a q × k matrix with q < k and rank(C) = q. Let

B = Y' G Y = (C Θ̂)' [ C (X'X)^{-1} C' ]^{-1} C Θ̂,

where

G = X (X'X)^{-1} C' [ C (X'X)^{-1} C' ]^{-1} C (X'X)^{-1} X',
Θ̂ = (X'X)^{-1} X' Y,
S = n^{-1} Y' (I_N − H) Y,   H = X (X'X)^{-1} X',   n = N − k.

Then, under the hypothesis C Θ = 0, E(B) = q Σ, and thus B measures the departure from the hypothesis. On the other hand, E(S) = Σ, irrespective of whether the hypothesis is true or not. Thus, a test is usually constructed by comparing B with S in some manner. Since n < p, the inverse of S does not exist, and thus the likelihood ratio test, which uses the inverse of S, does not exist. Fujikoshi, Himeno and Wakaki [4] proposed a test by generalizing Dempster's two-sample test, and Srivastava [7] proposed a test using the Moore-Penrose inverse of S. Srivastava and Fujikoshi [10] and Schott [5] proposed tests by generalizing Bai and Saranadasa's two-sample test. These tests are described in the following subsections.

6.1. Generalization of Dempster's test. Assume

(p/n) → c ∈ (0, ∞),   (6.3)

and let

T_D = p^{1/2} [ tr B / tr S − q ],
â_1 = (tr S)/p,   â_2 = p^{-1} [ tr S^2 − (1/n)(tr S)^2 ],
σ̂_D^2 = 2 q â_2 / â_1^2.

Then Fujikoshi, Himeno and Wakaki [4] showed that, under the hypothesis H, assumption (6.1), and normality, (T_D/σ̂_D) → N(0, 1). The distribution under the alternative hypothesis is also given.

6.2. Generalization of the Bai-Saranadasa test. Let

T_2 = p^{-1/2} [ tr B − q tr S ] / (2 q â_2)^{1/2}.   (6.4)

This test was proposed by Srivastava and Fujikoshi [10] and by Schott [5]. Under the hypothesis that C Θ = 0, the asymptotic distribution of T_2 is

normal with mean 0 and variance 1. To obtain the asymptotic distribution when C Θ ≠ 0, we consider local alternatives. For this, let

(C Θ)' [ C (X'X)^{-1} C' ]^{-1} (C Θ) = M M',   (6.5)

where M is a p × q matrix of rank q < n when C Θ ≠ 0. We shall assume that q is finite and that

lim_{p→∞} (tr Λ M M')/p = 0.   (6.6)

Then Srivastava and Fujikoshi [10] have shown that the power of the T_2 test is given by

β(T_2) ≈ Φ( −z_{1−α} + tr M M' / (2 p q a_2)^{1/2} )   (6.7)

for local alternatives satisfying (6.6) and finite q.

6.3. Srivastava's test. Let V = nS, let V^+ be its Moore-Penrose inverse, and let d_1, ..., d_q be the non-zero eigenvalues of B V^+, q < n. Then Srivastava [7] proposed the statistic

U^+ = Π_{i=1}^q (1 + d_i)^{-1} = |I + B V^+|^{-1}.   (6.8)

When Σ = γ^2 I, or when Σ is of rank r ≤ n, Srivastava [7] obtained distributions of chi-square type, similar to the likelihood ratio test. It may be noted that the hypothesis Σ = γ^2 I can be tested by a test proposed by Srivastava [6], which has been shown to be robust under some departures from normality by Srivastava, Kollo, and von Rosen [12]. For general Σ, however, he showed that under the null hypothesis C Θ = 0,

lim_{(n,p)→∞} P[ ( p b̂ log U^{+ −1} − qn ) / (2qn)^{1/2} < z_{1−α} ] = Φ(z_{1−α}).   (6.9)

Srivastava and Fujikoshi [10] considered more general forms of T_D and T_2 and obtained their distributions without condition (6.3), which was required by Schott [5]. Next we give the asymptotic distribution of the statistic U^+ under local alternatives given by

M = n^{−1/2} ∆,   (6.10)

where ∆ is O(1). The asymptotic power of the test based on U^+ is given by

β(U^+) = lim_{n→∞} lim_{p→∞} P[ ( p b̂ log U^{+ −1} − qn ) / (2qn)^{1/2} ≥ z_{1−α} ]
       = lim_{n→∞} lim_{p→∞} Φ[ −z_{1−α} + n tr Λ M M' / ( p a_2 (2qn)^{1/2} ) ].
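The MANOVA ingredients B, S, and the statistic T_2 of (6.4) can be computed directly from Y, X, and C. The sketch below assumes our own names and a design matrix supplied by the caller; it is an illustration, not the authors' code:

```python
import numpy as np

def manova_t2(Y, X, C):
    """Sketch of the generalized Bai-Saranadasa statistic T_2 of (6.4):
    B = (C Theta_hat)'[C(X'X)^{-1}C']^{-1} C Theta_hat, S = Y'(I-H)Y/n."""
    N, p = Y.shape
    k, q = X.shape[1], C.shape[0]
    n = N - k
    XtX_inv = np.linalg.inv(X.T @ X)
    Theta_hat = XtX_inv @ X.T @ Y                      # k x p least-squares estimator
    CT = C @ Theta_hat                                 # q x p
    B = CT.T @ np.linalg.inv(C @ XtX_inv @ C.T) @ CT   # E(B) = q*Sigma under H
    H = X @ XtX_inv @ X.T
    S = Y.T @ (np.eye(N) - H) @ Y / n
    a2_hat = (np.trace(S @ S) - np.trace(S) ** 2 / n) / p
    return (np.trace(B) - q * np.trace(S)) / np.sqrt(p) / np.sqrt(2 * q * a2_hat)
```

For the three-group problem of Section 6.5, X is block-diagonal in the three vectors of ones and C is a 2 x 3 contrast matrix.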

The U+ test should be preferred over T_2 if

n tr(ΛMM')/(p a_2 √(2qn)) > tr(MM')/√(2pq a_2),

that is, if tr(ΛMM') > tr(MM') (p a_2/n)^{1/2}. For example, if M = (m_1, ..., m_q) and m_i = (√λ_1, ..., √λ_p)', i = 1, ..., q, where λ_1, ..., λ_p are the eigenvalues of Σ given in the same order as in the matrix Λ of the eigenvalues of Σ, then the test U+ should be preferred if

n < p(a_1²/a_2) = pb.

Since 0 < b ≤ 1, many such n exist for large p. It may be noted that the U+ test assumes normality and has yet to be generalized to non-normal models such as Model M. Thus, it is not included in the tables of comparison by simulation.

These tests are invariant under the group of orthogonal transformations, but not under the group of p × p non-singular diagonal matrices. A test that is invariant under the latter transformations has been proposed by Yamada and Srivastava [14], assuming normality of the observation vectors. We describe this test in the next subsection.

Invariant test statistics. As mentioned above, the T_D and T_2 tests are not invariant under non-singular diagonal matrices. Thus, Yamada and Srivastava [14] proposed, under normality, the statistic

T_1 = [tr(B D_S^{-1}) − npq/(n − 2)] / [2 c_{p,n} q (tr R² − n^{-1}p²)]^{1/2},  R = D_S^{-1/2} S D_S^{-1/2},

where c_{p,n} = 1 + (tr R²/p^{3/2}) and D_S is the diagonal matrix formed by the diagonal elements of S. The asymptotic distribution under the null hypothesis is standard normal, and under local alternatives similar to (6.6), the asymptotic power of the T_1-test is given by

β(T_1) ≈ Φ[−z_{1−α} + tr(MM'D_Σ^{-1})/√(2q tr R̄²)],

where R̄ = D_Σ^{-1/2} Σ D_Σ^{-1/2} and D_Σ = diag(σ_11, ..., σ_pp), σ_11, ..., σ_pp being the diagonal elements of Σ. For details see Yamada and Srivastava [14], where a theoretical comparison between T_1 and T_2, similar to that of Srivastava and Du [9], is also given. In the next section, we compare the power of the T_1-test with that of the T_2-test by simulation.
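A minimal sketch of T_1, coded directly from the displayed formula, may help here. This is our own implementation, not the authors' code, and all names are ours; the diagonal rescaling at the end illustrates the claimed invariance under non-singular diagonal matrices.

```python
import numpy as np

# Sketch of the scale-invariant statistic T_1 (our own transcription of the
# displayed formula; function and variable names are hypothetical).
def t1_statistic(B, S, n, q):
    p = S.shape[0]
    d2 = np.diag(S)                          # diagonal elements s_ii, i.e. D_S
    R = S / np.sqrt(np.outer(d2, d2))        # R = D_S^{-1/2} S D_S^{-1/2}
    trR2 = np.trace(R @ R)
    c_pn = 1.0 + trR2 / p**1.5               # c_{p,n} = 1 + tr(R^2)/p^{3/2}
    num = np.sum(np.diag(B) / d2) - n * p * q / (n - 2.0)  # tr(B D_S^{-1}) - npq/(n-2)
    den = np.sqrt(2.0 * c_pn * q * (trR2 - p**2 / n))
    return num / den

# Toy one-way MANOVA data with p > N (same construction as in the text).
rng = np.random.default_rng(1)
k, Ni, p, q = 3, 10, 40, 2
N = k * Ni
n = N - k
X = np.kron(np.eye(k), np.ones((Ni, 1)))
C = np.array([[1.0, 0.0, -1.0], [0.0, 1.0, -1.0]])
Y = rng.standard_normal((N, p))
XtX_inv = np.linalg.inv(X.T @ X)
CT = C @ (XtX_inv @ X.T @ Y)
B = CT.T @ np.linalg.inv(C @ XtX_inv @ C.T) @ CT
S = Y.T @ (np.eye(N) - X @ XtX_inv @ X.T) @ Y / n

t1 = t1_statistic(B, S, n, q)
D = np.diag(rng.uniform(0.5, 2.0, size=p))   # arbitrary diagonal rescaling
t1_scaled = t1_statistic(D @ B @ D, D @ S @ D, n, q)   # unchanged value
```

Rescaling each coordinate of the data by a diagonal matrix D maps B to DBD and S to DSD, and `t1_scaled` equals `t1` up to floating-point error, which is exactly the invariance that T_D and T_2 lack.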

Power comparison by simulation. For the simulation, we consider the problem of testing the equality of 3 mean vectors, that is, k = q + 1 = 3 and q = 2, where N_1 = N_2 = N_3 = N, and the cases (N, p) = (10, 40), (20, 80), (30, 120) and (40, 200) are treated. Note that n = N_1 + N_2 + N_3 − k = 3(N − 1). For testing the equality of the three mean vectors, we write Θ = (µ_1, µ_2, µ_3)' : 3 × p,

C = ( 1 0 −1 ; 0 1 −1 ),  CΘ = ( (µ_1 − µ_3)' ; (µ_2 − µ_3)' ).

The observation matrix is

Y = (y_1^(1), ..., y_N^(1); y_1^(2), ..., y_N^(2); y_1^(3), ..., y_N^(3))',

and the design matrix is X = diag(1_N, 1_N, 1_N) : 3N × 3, where 1_N = (1, ..., 1)' : N × 1, so that the total number of observations is 3N. For the hypothesis, without loss of generality, we choose µ_1 = µ_2 = µ_3 = 0. For the alternative hypothesis, we choose

µ_1 = 0,  µ_2 = 3n^{-1/2}p^{-1/4} 1_p,  µ_3 = −µ_2.

To generate the Y matrix from a non-normal distribution, we generate 3N × p i.i.d. random variables u_ij from three kinds of chi-square distributions, namely χ²_2, χ²_8 and χ²_32 with 2, 8 and 32 degrees of freedom, respectively, and centre and scale them as ν_ij = (u_ij − m)/√(2m) for u_ij ~ χ²_m, m = 2, 8, 32. Since the skewness and kurtosis (K_4 + 3) of χ²_m are, respectively, (8/m)^{1/2} and 3 + 12/m, it is noted that χ²_2 has higher skewness and kurtosis than χ²_8 and χ²_32. Write the standardized variables as

V = (ν_1^(1), ..., ν_N^(1); ν_1^(2), ..., ν_N^(2); ν_1^(3), ..., ν_N^(3))',

where the ν_j^(i) are p-vectors, j = 1, ..., N, i = 1, 2, 3. For the covariance matrix, we consider two cases:

Case 1: Σ = I_p,  Case 2: Σ = D_a = diag(a_1², ..., a_p²),

where the a_i are i.i.d. chi-square random variables with 3 degrees of freedom. For Case 1 we define

Y = V + X(0, µ_2, µ_3)',

where under the hypothesis Y = V, and under the alternative µ_2 and µ_3 are replaced by the vectors mentioned above. For Case 2, let

Y = V D_a^{1/2} + X(0, µ_2, µ_3)',

where under the hypothesis Y = V D_a^{1/2}, and under the alternative µ_2 and µ_3 are replaced by the vectors mentioned above. The simulation results under the χ²_m distributions for m = 2, 8 and 32 are presented in Tables 6, 7 and 8, respectively. The critical values are computed from 100,000 replications and the powers from 10,000 replications. The three tables report the critical values and the power under the hypothesis of the two tests, and it is seen that the critical values are appropriate.

Table 6. Critical values and powers of the tests T_1 and T_2 in the case of the χ²_2-distribution with skewness 2 and kurtosis 9. (Columns: N, p; critical value, power in H and power in A, each for T_1 and T_2; rows for Case 1 and Case 2. The numerical entries have not been recovered.)

Table 7. Critical values and powers of the tests T_1 and T_2 in the case of the χ²_8-distribution with skewness 1 and kurtosis 4.5. (Same layout as Table 6; numerical entries not recovered.)
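The data-generating scheme of this section can be sketched as follows. This is our own code, not the authors'; `generate_Y` and its arguments are hypothetical names, and the standardization follows ν_ij = (u_ij − m)/√(2m).

```python
import numpy as np

# Sketch of the simulation's data generation (our own code): standardized
# chi-square errors with m degrees of freedom, Case 1 (Sigma = I_p) and
# Case 2 (Sigma = D_a = diag(a_1^2, ..., a_p^2) with a_i i.i.d. chi^2_3).
def generate_Y(N, p, m, case, rng, alternative=False):
    k = 3
    n = 3 * (N - 1)                            # n = N1 + N2 + N3 - k
    U = rng.chisquare(m, size=(k * N, p))      # u_ij ~ chi^2_m
    V = (U - m) / np.sqrt(2.0 * m)             # nu_ij: mean 0, variance 1
    if case == 2:
        a = rng.chisquare(3, size=p)           # a_i i.i.d. chi^2_3
        V = V * a                              # V D_a^{1/2}, D_a^{1/2} = diag(a_i)
    if not alternative:
        return V                               # under H: Y = V (or V D_a^{1/2})
    X = np.kron(np.eye(k), np.ones((N, 1)))    # design matrix, 3N x 3
    mu2 = 3.0 / (np.sqrt(n) * p**0.25) * np.ones(p)
    Theta = np.vstack([np.zeros(p), mu2, -mu2])  # mu1 = 0, mu3 = -mu2
    return V + X @ Theta

rng = np.random.default_rng(2)
Y0 = generate_Y(10, 40, 8, 1, rng)                     # null, Case 1
Y1 = generate_Y(10, 40, 2, 2, rng, alternative=True)   # alternative, Case 2
```

Feeding such replicates through the T_1 and T_2 statistics, with critical values estimated from null replicates, would reproduce the design of Tables 6 to 8.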

Table 8. Critical values and powers of the tests T_1 and T_2 in the case of the χ²_32-distribution with skewness 0.5 and kurtosis 3.375. (Same layout as Table 6; numerical entries not recovered.)

As reported in the tables, the two tests perform similarly in Case 1, but the proposed test T_1 has much higher power than T_2 in Case 2. For the χ²_2-distribution, which has higher skewness and kurtosis, T_1 has slightly higher power than T_2 in Case 1. Clearly, when Σ = I_p, all the components have the same unit of measurement and hence both tests perform equally well; but when the units of measurement are not the same, as in Case 2, the proposed test performs much better than the test based on T_2.

Concluding remarks

In this article we reviewed several tests for the equality of two mean vectors, including the case when the covariance matrices of the two groups may be unequal. The asymptotic distributions are given under non-normal models; thus, the tests are robust against departures from normality. In MANOVA, we assume that the covariance matrices are all equal. We have shown through simulation that the tests that are invariant under non-singular diagonal matrices perform better than those that are not.

Acknowledgements

I would like to thank a referee and the editor handling this paper for their suggestions, which led to a great improvement in the presentation. The research for this paper was supported by NSERC of Canada.

References

[1] Z. Bai and H. Saranadasa, Effect of high dimension: by an example of a two sample problem, Statist. Sinica 6 (1996).
[2] S. X. Chen and Y. L. Qin, A two-sample test for high-dimensional data with applications to gene-set testing, Ann. Statist. 38 (2010).
[3] A. P. Dempster, A high dimensional two-sample significance test, Ann. Math. Statist. 29 (1958).
[4] Y. Fujikoshi, T. Himeno, and H. Wakaki, Asymptotic results of a high dimensional MANOVA test and power comparison when the dimension is large compared to the sample size, J. Japan Statist. Soc. 34 (2004).
[5] J. R. Schott, Some high-dimensional tests for a one-way MANOVA, J. Multivariate Anal. 98 (2007).
[6] M. S. Srivastava, Some tests concerning the covariance matrix in high-dimensional data, J. Japan Statist. Soc. 35 (2005).
[7] M. S. Srivastava, Multivariate theory for analysing high dimensional data, J. Japan Statist. Soc. 37 (2007).
[8] M. S. Srivastava, A test of the mean vector with fewer observations than the dimension under non-normality, J. Multivariate Anal. 100 (2009).
[9] M. S. Srivastava and M. Du, A test for the mean vector with fewer observations than the dimension, J. Multivariate Anal. 99 (2008).
[10] M. S. Srivastava and Y. Fujikoshi, Multivariate analysis of variance with fewer observations than the dimension, J. Multivariate Anal. 97 (2006).
[11] M. S. Srivastava, S. Katayama, and Y. Kano, A two sample test in high dimension with fewer observations than the dimension, J. Multivariate Anal. 114 (2013).
[12] M. S. Srivastava, T. Kollo, and D. von Rosen, Some tests for the covariance matrix with fewer observations than the dimension under non-normality, J. Multivariate Anal. 102 (2011).
[13] M. S. Srivastava and H. Yanagihara, Testing the equality of several covariance matrices with fewer observations than the dimension, J. Multivariate Anal. 101 (2010).
[14] T. Yamada and M. S. Srivastava, A test for the multivariate analysis of variance in high-dimension, Commun. Statist. Theory Methods 41 (2012), no. 13–14.

Department of Statistical Sciences, University of Toronto, 100 St George Street, Toronto, Canada
E-mail address: srivasta@utstat.toronto.edu


More information

Asymptotic Statistics-VI. Changliang Zou

Asymptotic Statistics-VI. Changliang Zou Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Spring 2012 Math 541B Exam 1

Spring 2012 Math 541B Exam 1 Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote

More information

1 Data Arrays and Decompositions

1 Data Arrays and Decompositions 1 Data Arrays and Decompositions 1.1 Variance Matrices and Eigenstructure Consider a p p positive definite and symmetric matrix V - a model parameter or a sample variance matrix. The eigenstructure is

More information

arxiv: v2 [math.st] 24 Jul 2016

arxiv: v2 [math.st] 24 Jul 2016 SCIENCE CHINA Mathematics. ARTICLES. January 05 Vol.55 No.: XX doi: 0.007/s45-000-0000-0 In memory of 0 years since the passing of Professor Xiru Chen - a great Chinese statistician arxiv:603.0003v [math.st]

More information

Multivariate Distributions

Multivariate Distributions IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate

More information

Consistency of Distance-based Criterion for Selection of Variables in High-dimensional Two-Group Discriminant Analysis

Consistency of Distance-based Criterion for Selection of Variables in High-dimensional Two-Group Discriminant Analysis Consistency of Distance-based Criterion for Selection of Variables in High-dimensional Two-Group Discriminant Analysis Tetsuro Sakurai and Yasunori Fujikoshi Center of General Education, Tokyo University

More information

ASYMPTOTIC RESULTS OF A HIGH DIMENSIONAL MANOVA TEST AND POWER COMPARISON WHEN THE DIMENSION IS LARGE COMPARED TO THE SAMPLE SIZE

ASYMPTOTIC RESULTS OF A HIGH DIMENSIONAL MANOVA TEST AND POWER COMPARISON WHEN THE DIMENSION IS LARGE COMPARED TO THE SAMPLE SIZE J Jaan Statist Soc Vol 34 No 2004 9 26 ASYMPTOTIC RESULTS OF A HIGH DIMENSIONAL MANOVA TEST AND POWER COMPARISON WHEN THE DIMENSION IS LARGE COMPARED TO THE SAMPLE SIZE Yasunori Fujikoshi*, Tetsuto Himeno

More information

General Linear Test of a General Linear Hypothesis. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 35

General Linear Test of a General Linear Hypothesis. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 35 General Linear Test of a General Linear Hypothesis Copyright c 2012 Dan Nettleton (Iowa State University) Statistics 611 1 / 35 Suppose the NTGMM holds so that y = Xβ + ε, where ε N(0, σ 2 I). opyright

More information

STA 437: Applied Multivariate Statistics

STA 437: Applied Multivariate Statistics Al Nosedal. University of Toronto. Winter 2015 1 Chapter 5. Tests on One or Two Mean Vectors If you can t explain it simply, you don t understand it well enough Albert Einstein. Definition Chapter 5. Tests

More information

Multivariate Regression

Multivariate Regression Multivariate Regression The so-called supervised learning problem is the following: we want to approximate the random variable Y with an appropriate function of the random variables X 1,..., X p with the

More information

CHANGE DETECTION IN TIME SERIES

CHANGE DETECTION IN TIME SERIES CHANGE DETECTION IN TIME SERIES Edit Gombay TIES - 2008 University of British Columbia, Kelowna June 8-13, 2008 Outline Introduction Results Examples References Introduction sunspot.year 0 50 100 150 1700

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Statistics Ph.D. Qualifying Exam: Part I October 18, 2003

Statistics Ph.D. Qualifying Exam: Part I October 18, 2003 Statistics Ph.D. Qualifying Exam: Part I October 18, 2003 Student Name: 1. Answer 8 out of 12 problems. Mark the problems you selected in the following table. 1 2 3 4 5 6 7 8 9 10 11 12 2. Write your answer

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

MLES & Multivariate Normal Theory

MLES & Multivariate Normal Theory Merlise Clyde September 6, 2016 Outline Expectations of Quadratic Forms Distribution Linear Transformations Distribution of estimates under normality Properties of MLE s Recap Ŷ = ˆµ is an unbiased estimate

More information

EPMC Estimation in Discriminant Analysis when the Dimension and Sample Sizes are Large

EPMC Estimation in Discriminant Analysis when the Dimension and Sample Sizes are Large EPMC Estimation in Discriminant Analysis when the Dimension and Sample Sizes are Large Tetsuji Tonda 1 Tomoyuki Nakagawa and Hirofumi Wakaki Last modified: March 30 016 1 Faculty of Management and Information

More information

Least Squares Estimation-Finite-Sample Properties

Least Squares Estimation-Finite-Sample Properties Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions

More information

Large-Scale Multiple Testing of Correlations

Large-Scale Multiple Testing of Correlations Large-Scale Multiple Testing of Correlations T. Tony Cai and Weidong Liu Abstract Multiple testing of correlations arises in many applications including gene coexpression network analysis and brain connectivity

More information

Finite Singular Multivariate Gaussian Mixture

Finite Singular Multivariate Gaussian Mixture 21/06/2016 Plan 1 Basic definitions Singular Multivariate Normal Distribution 2 3 Plan Singular Multivariate Normal Distribution 1 Basic definitions Singular Multivariate Normal Distribution 2 3 Multivariate

More information

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B Simple Linear Regression 35 Problems 1 Consider a set of data (x i, y i ), i =1, 2,,n, and the following two regression models: y i = β 0 + β 1 x i + ε, (i =1, 2,,n), Model A y i = γ 0 + γ 1 x i + γ 2

More information

Estimation of Variances and Covariances

Estimation of Variances and Covariances Estimation of Variances and Covariances Variables and Distributions Random variables are samples from a population with a given set of population parameters Random variables can be discrete, having a limited

More information

Lecture 5: Hypothesis tests for more than one sample

Lecture 5: Hypothesis tests for more than one sample 1/23 Lecture 5: Hypothesis tests for more than one sample Måns Thulin Department of Mathematics, Uppsala University thulin@math.uu.se Multivariate Methods 8/4 2011 2/23 Outline Paired comparisons Repeated

More information

STATISTICS SYLLABUS UNIT I

STATISTICS SYLLABUS UNIT I STATISTICS SYLLABUS UNIT I (Probability Theory) Definition Classical and axiomatic approaches.laws of total and compound probability, conditional probability, Bayes Theorem. Random variable and its distribution

More information

STT 843 Key to Homework 1 Spring 2018

STT 843 Key to Homework 1 Spring 2018 STT 843 Key to Homework Spring 208 Due date: Feb 4, 208 42 (a Because σ = 2, σ 22 = and ρ 2 = 05, we have σ 2 = ρ 2 σ σ22 = 2/2 Then, the mean and covariance of the bivariate normal is µ = ( 0 2 and Σ

More information

STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method.

STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method. STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method. Rebecca Barter May 5, 2015 Linear Regression Review Linear Regression Review

More information

Multivariate-sign-based high-dimensional tests for sphericity

Multivariate-sign-based high-dimensional tests for sphericity Biometrika (2013, xx, x, pp. 1 8 C 2012 Biometrika Trust Printed in Great Britain Multivariate-sign-based high-dimensional tests for sphericity BY CHANGLIANG ZOU, LIUHUA PENG, LONG FENG AND ZHAOJUN WANG

More information

THE UNIVERSITY OF CHICAGO Graduate School of Business Business 41912, Spring Quarter 2008, Mr. Ruey S. Tsay. Solutions to Final Exam

THE UNIVERSITY OF CHICAGO Graduate School of Business Business 41912, Spring Quarter 2008, Mr. Ruey S. Tsay. Solutions to Final Exam THE UNIVERSITY OF CHICAGO Graduate School of Business Business 41912, Spring Quarter 2008, Mr. Ruey S. Tsay Solutions to Final Exam 1. (13 pts) Consider the monthly log returns, in percentages, of five

More information

BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation

BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation Yujin Chung November 29th, 2016 Fall 2016 Yujin Chung Lec13: MLE Fall 2016 1/24 Previous Parametric tests Mean comparisons (normality assumption)

More information

Hypothesis Testing. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA

Hypothesis Testing. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA Hypothesis Testing Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA An Example Mardia et al. (979, p. ) reprint data from Frets (9) giving the length and breadth (in

More information

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Takeshi Emura and Hisayuki Tsukuma Abstract For testing the regression parameter in multivariate

More information