SHOTA KATAYAMA AND YUTAKA KANO. Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka, Japan


A New Test on High-Dimensional Mean Vector Without Any Assumption on Population Covariance Matrix

SHOTA KATAYAMA AND YUTAKA KANO

Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka, Japan

Abstract

We propose a new test for the equality of the mean vectors of two groups with the same number of observations in high-dimensional data. Existing tests for this problem require a strong condition on the population covariance matrix; the test proposed in this paper requires no such condition. The test is obtained in a general model, that is, the data need not be normally distributed.

1 Introduction

Advances in information technology and database systems make it possible to collect and store high-dimensional data, which have fewer observations than, or as many observations as, the dimension. High-dimensional data appear in various fields, such as DNA microarray data analysis and marketing data analysis. For such data, however, traditional multivariate analysis is often not applicable.

In this paper, we propose a new test for the equality of two mean vectors based on n observations (X_p^{(i1)}, X_p^{(i2)}), i = 1, ..., n, where the X_p^{(ij)} are p-dimensional random vectors. Within group j = 1, 2, the X_p^{(ij)} (i = 1, ..., n) are independent, while the two groups may be dependent. In a typical situation, we may regard X_p^{(i1)} and X_p^{(i2)} as two measurements on the same subject i before and after a treatment. Let X_p^{(i)} = X_p^{(i1)} − X_p^{(i2)}, μ_p = E(X_p^{(i)}) and Σ_p = Var(X_p^{(i)}); then the problem reduces to a one-sample problem. We wish to test the hypothesis

H_0: μ_p = 0 versus H_1: μ_p ≠ 0.   (1.1)

A traditional method for testing the null hypothesis is Hotelling's T^2 test. However, Hotelling's T^2 test is not applicable when n ≤ p since the inverse of the

sample covariance matrix does not exist. Bai and Saranadasa (1996) propose a test for the two-sample problem with equal population covariance matrices. For the one-sample problem in (1.1), their test is based on the statistic

F_{n,p} = X̄_{n,p}^T X̄_{n,p} − (1/n) tr S_{n,p},   (1.2)

where X̄_{n,p} is the sample mean vector and S_{n,p} is the sample covariance matrix. More recently, Chen and Qin (2010) propose a test based on the statistic

G_{n,p} = {1/(n(n−1))} Σ_{i≠j} X_p^{(i)T} X_p^{(j)}   (1.3)

for the one-sample problem in (1.1). They also propose a test for the two-sample problem with unequal population covariance matrices. The asymptotic normality of their test statistic is obtained under the null hypothesis H_0 in (1.1) and the condition trΣ_p^4 = o{(trΣ_p^2)^2} as the sample size n and the dimension p tend to infinity. It is easily shown that F_{n,p} = G_{n,p}. Hence, for the one-sample problem in (1.1), the asymptotic normality of Bai and Saranadasa's test statistic also holds under the condition trΣ_p^4 = o{(trΣ_p^2)^2}, which is weaker than their original one.

However, this condition is still restrictive. Indeed, it excludes some typical situations, such as Σ_p = (1 − ρ)I_p + ρ 1_p 1_p^T with ρ ∈ (0, 1), where I_p is the p × p identity matrix and 1_p is the p-vector with all entries one, or the case where the maximum eigenvalue of Σ_p has order larger than p^{1/2}. Moreover, because of the high dimensionality, it is hard to assess from the data whether the condition is satisfied. When it is not, the population covariance matrix has an enormous effect on the asymptotic null distributions of the two test statistics. Katayama et al. (2010) show that the type of the asymptotic null distribution of Bai and Saranadasa's test statistic depends on the population covariance matrix, and hence so does that of Chen and Qin's test statistic.

To illustrate this phenomenon, we give a simple example. Suppose temporarily that the X_p^{(i)} are independently and identically distributed (i.i.d.) as a p-dimensional multivariate normal distribution with mean vector 0_p and covariance matrix Σ_p = (1 − ρ_p)I_p + ρ_p 1_p 1_p^T, ρ_p = p^{−α}, α ∈ (0, 1). Here, 0_p denotes the p-vector with all entries zero. Note that the condition is satisfied only when 1/2 < α < 1. Since the two test statistics are equivalent, we consider Chen

and Qin's test statistic only. We define the standardized version of the test statistic as

T = G_{n,p} / {2 trΣ_p^2 / (n(n−1))}^{1/2}.

The distributions of T are plotted for (n, p) = (80, 300), (80, 500), based on 10,000 simulations, in Figure 1. The figure shows that the type of the asymptotic null distribution of T depends on α, and hence on Σ_p. When the condition is not satisfied, i.e., 0 < α ≤ 1/2, the asymptotic distribution of T is clearly not normal for the two cases considered. Indeed, the asymptotic distribution is a standardized chi-square distribution with 1 degree of freedom for 0 < α < 1/2, and the convolution of normal and chi-square distributions for α = 1/2. See Katayama et al. (2010) for more details. Hence the type I error of Chen and Qin's test would be asymptotically incorrect.

Figure 1: Distributions of T, when n = 80 and p = 300, 500, based on 10,000 Monte Carlo simulations. Each panel shows the estimated density of T for α = 0.1, 0.3, 0.5, 0.7, 0.9 together with the standard normal density.

Motivated by these findings, we construct in this paper a test statistic which is asymptotically normal under H_0 without any assumption on the population covariance matrix. Hence, the type I error of the proposed test is always asymptotically correct when we use the percentile point of the normal

distribution. The asymptotic normality is obtained without assuming that the random vectors are i.i.d. as a multivariate normal distribution.

The paper is organized as follows. In Section 2, we introduce the proposed test and obtain its asymptotic normality. Several simulations and a real data example are given in Section 3. All proofs are given in Section 4.

2 Proposed Test

In this section, we propose a test statistic for (1.1). The asymptotic normality of the test statistic is obtained under H_0 without any assumption on the population covariance matrix. Like Srivastava (2009), we assume

X_p^{(i)} = μ_p + C_p Z_p^{(i)}, i = 1, ..., n,   (2.1)

where C_p is a p × p matrix with Σ_p = C_p C_p^T and Z_p^{(i)} = (z_{i1}, ..., z_{ip})^T. Assume that the z_{ij} are independent random variables which satisfy

E(z_{ij}) = 0, E(z_{ij}^2) = 1, E(z_{ij}^4) = γ < ∞.

For the high-dimensional inference, we assume the following condition:

Condition A. n = n(p) → ∞ as p → ∞.

Now we propose the test statistic

T_{n,p} = tr(X_{n,p}^T M_n X_{n,p}),   (2.2)

where X_{n,p} = (X_p^{(1)}, ..., X_p^{(n)})^T is the data matrix and M_n is a known n × n symmetric matrix which satisfies the following condition:

Condition B. M_n = (m_ij) with m_ij ≥ 0 and m_ii = 0, 1_n^T M_n 1_n > 0, and trM_n^4 = o{(trM_n^2)^2}.

Since m_ii = 0, the proposed test statistic can also be written as

T_{n,p} = Σ_{i≠j} m_ij X_p^{(i)T} X_p^{(j)}.

The mean and variance of T_{n,p} are given by

E(T_{n,p}) = (1_n^T M_n 1_n) μ_p^T μ_p,   (2.3)

Var(T_{n,p}) = 2 trM_n^2 trΣ_p^2 + 4 (1_n^T M_n^2 1_n) μ_p^T Σ_p μ_p.   (2.4)
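For concreteness, the equality of the trace form (2.2) and the double-sum form can be checked numerically. The following sketch is ours, not from the paper (the function name T_stat and the random test matrices are hypothetical); it computes T_{n,p} from the n × p data matrix without ever forming a p × p matrix:

```python
import numpy as np

def T_stat(X, M):
    """T_{n,p} = tr(X^T M X) for an n x p data matrix X.  Since m_ii = 0,
    this equals sum_{i != j} m_ij x_i^T x_j.  Computed as sum((M @ X) * X),
    which avoids forming the p x p matrix X^T M X."""
    return float(((M @ X) * X).sum())

rng = np.random.default_rng(0)
n, p = 6, 10
X = rng.standard_normal((n, p))
M = rng.random((n, n))          # nonnegative entries, as Condition B requires
M = (M + M.T) / 2               # symmetric ...
np.fill_diagonal(M, 0.0)        # ... with zero diagonal

# Brute-force double sum over i != j for comparison.
double_sum = sum(M[i, j] * X[i] @ X[j]
                 for i in range(n) for j in range(n) if i != j)
```

Computing (M @ X) * X costs O(n^2 p), which matters once p is in the thousands, as in the microarray example of Section 3.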

The derivation of (2.4) is a little complicated, while equation (2.3) is easily obtained. We give the derivation of (2.4) in Section 4. Since 1_n^T M_n 1_n > 0 under Condition B, T_{n,p} is a valid statistic for testing the null hypothesis in (1.1). In view of trM_n^2 > 0, we also note that Var(T_{n,p}) > 0 for any population mean vector μ_p. For the asymptotic normality of T_{n,p} under the null hypothesis H_0 in (1.1), we have the following theorem.

Theorem 2.1 Under the null hypothesis H_0 and Conditions A and B, we have

T̃_{n,p} = T_{n,p} / (2 trM_n^2 trΣ_p^2)^{1/2} →_d N(0, 1), p → ∞,

where →_d denotes convergence in distribution and N(0, 1) denotes the standard normal distribution.

Since trΣ_p^2 is generally unknown, we need to estimate it. Define

U_2 = (1/P_n^2) Σ_{i≠j} (X_p^{(i)T} X_p^{(j)})^2,

U_3 = (1/P_n^3) Σ_{i≠j≠k} X_p^{(i)T} X_p^{(j)} X_p^{(i)T} X_p^{(k)},

U_4 = (1/P_n^4) Σ_{i≠j≠k≠l} X_p^{(i)T} X_p^{(j)} X_p^{(k)T} X_p^{(l)},

where P_n^r = n!/(n − r)! and the summations run over mutually distinct indices. Following Chen et al. (2010), an unbiased estimator \widehat{trΣ_p^2} of trΣ_p^2 is given by

\widehat{trΣ_p^2} = U_2 − 2U_3 + U_4.   (2.5)

In addition to unbiasedness, the estimator \widehat{trΣ_p^2} is ratio-consistent, as the following theorem shows:

Theorem 2.2 Under Condition A, we have

\widehat{trΣ_p^2} / trΣ_p^2 →_P 1, p → ∞,   (2.6)

where →_P denotes convergence in probability.
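The estimator (2.5) can be evaluated directly by summing over mutually distinct indices. The sketch below is ours (the name tr_sigma2_hat is hypothetical); it does this naively for small n, and also illustrates that the estimator is invariant under the location shift X_i → X_i + c_p, a fact used later in the proof of Theorem 2.2:

```python
import numpy as np
from itertools import permutations
from math import perm

def tr_sigma2_hat(X):
    """Direct evaluation of (2.5): U_2 - 2 U_3 + U_4, summing over mutually
    distinct indices (O(n^4); fine for small n)."""
    n = X.shape[0]
    A = X @ X.T                                            # a_ij = x_i^T x_j
    U2 = sum(A[i, j] ** 2 for i, j in permutations(range(n), 2)) / perm(n, 2)
    U3 = sum(A[i, j] * A[i, k]
             for i, j, k in permutations(range(n), 3)) / perm(n, 3)
    U4 = sum(A[i, j] * A[k, l]
             for i, j, k, l in permutations(range(n), 4)) / perm(n, 4)
    return U2 - 2.0 * U3 + U4

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5))
est = tr_sigma2_hat(X)
# Location invariance: shifting every observation by the same constant
# vector c_p (here c_p = 3 * 1_p) leaves the estimator unchanged.
est_shifted = tr_sigma2_hat(X + 3.0)
```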

From Theorems 2.1 and 2.2, under H_0 and Conditions A and B, we have

T_{n,p} / (2 trM_n^2 \widehat{trΣ_p^2})^{1/2} →_d N(0, 1), p → ∞.

This result does not require any condition on the population covariance matrix. The asymptotic normality is attained by using a known matrix M_n which satisfies Condition B. For instance, M_n may be given by

(i) M_n = (ρ^{|i−j|} I(|i−j| > 0)) with ρ ∈ (0, 1);
(ii) M_n = (|i−j|^{−s} I(|i−j| > 0)) with s > 1;
(iii) M_n = (m_{|i−j|} I(|i−j| > 0)) with lim_{n→∞} Σ_{k=1}^{n−1} m_k < ∞,

where I(E) for an event E equals 1 when E is true and 0 when E is false. The matrices (i) and (ii) are special cases of (iii). We apply the Szegö theorem (see, e.g., Grenander and Szegö, 1958) to show that (iii) meets Condition B.

For the asymptotic power of T_{n,p}, we assume the following condition:

Condition C. (1_n^T M_n^2 1_n) μ_p^T Σ_p μ_p = o(trM_n^2 trΣ_p^2).

The following theorem establishes the asymptotic power of T_{n,p} under the local alternative defined by Condition C.

Theorem 2.3 Under the alternative H_1 and Conditions A–C, we have

P(T̃_{n,p} > z_{1−α} | H_1) − Φ( −z_{1−α} + (1_n^T M_n 1_n) μ_p^T μ_p / (2 trM_n^2 trΣ_p^2)^{1/2} ) → 0, p → ∞,

where Φ denotes the distribution function of the standard normal distribution, z_{1−α} is its upper α-percentile, and P(· | H_1) denotes the probability under the alternative H_1.

From Theorem 2.3, the asymptotic power of T_{n,p} depends on λ_n = 1_n^T M_n 1_n / (trM_n^2)^{1/2}. Hence, we expect that selecting a matrix M_n with larger λ_n leads to larger power of T_{n,p}. Obviously, the matrix which maximizes λ_n is c 1_n 1_n^T, where c is any constant. However, this matrix satisfies neither m_ii = 0 nor trM_n^4 = o{(trM_n^2)^2} in Condition B. We need to maximize λ_n over the matrices which satisfy Condition B. Unfortunately, it is hard to find such a matrix even in case (iii), since the

number of parameters we need to determine to maximize λ_n increases with the sample size. For this reason, we shall compare T_{n,p} using (i) and (ii) numerically in the next section.

It should be mentioned that when we use the matrix M_n which maximizes λ_n subject only to m_ii = 0 in Condition B, the proposed statistic T_{n,p} is equivalent to the statistic G_{n,p} given in (1.3), and hence to F_{n,p} given in (1.2). Thus the power of the proposed test would be lower than that of those two tests. In this view, we should use the existing tests when Σ_p is known and the asymptotic distribution can be specified using the result of Katayama et al. (2010). Nevertheless, the proposed test is useful for obtaining the correct type I error when we have no information about Σ_p.

In the rest of this section, we consider the consistency of the proposed test. Assume that the matrix M_n is given by (iii). Then we have

1_n^T M_n 1_n = 2 Σ_{k=1}^{n−1} (n − k) m_k = O(n)

and

trM_n^2 = 2 Σ_{k=1}^{n−1} (n − k) m_k^2 = O(n),

since lim_{n→∞} Σ_{k=1}^{n−1} m_k < ∞ and Σ_{k=1}^{n−1} m_k^2 ≤ (Σ_{k=1}^{n−1} m_k)^2. Hence

1_n^T M_n 1_n / (trM_n^2)^{1/2} = O(√n).   (2.7)

Theorem 2.3 and (2.7) yield the following theorem:

Theorem 2.4 Assume Conditions A–C and that M_n has the form given by (iii). Let δ_{n,p} = n^{1/2} μ_p^T μ_p / (trΣ_p^2)^{1/2}. Under the alternative H_1, if δ_{n,p} → ∞ as p → ∞, then we have

P(T̃_{n,p} > z_{1−α} | H_1) → 1,

where P(· | H_1) denotes the probability under the alternative H_1.

3 Numerical Results

In this section, we compare the performance of the proposed test statistic T_{n,p} with that of the existing statistic T by computing the empirical significance level and power. The performance on a DNA microarray data set is also evaluated.
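The quantities appearing in Condition B and in the power discussion of Section 2 are easy to compute for the example matrices (i) and (ii). The following sketch is ours, not from the paper (helper names are hypothetical); it builds M_n of form (iii) and evaluates trM_n^4/(trM_n^2)^2, which Condition B requires to vanish, and λ_n = 1_n^T M_n 1_n/(trM_n^2)^{1/2}, which by (2.7) grows like √n:

```python
import numpy as np

def toeplitz_M(n, weights):
    """M_n of form (iii): m_ij = w_{|i-j|} off the diagonal, m_ii = 0."""
    w = np.concatenate(([0.0], np.asarray(weights, dtype=float)))
    return w[np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])]

def cond_b_ratio(M):
    """tr M_n^4 / (tr M_n^2)^2, which Condition B requires to go to 0."""
    M2 = M @ M
    return np.trace(M2 @ M2) / np.trace(M2) ** 2

def lambda_n(M):
    """lambda_n = 1_n^T M_n 1_n / (tr M_n^2)^{1/2}, which drives the power."""
    return M.sum() / np.sqrt(np.trace(M @ M))

ratio, lam = {}, {}
for n in (50, 200):
    k = np.arange(1, n)
    for name, w in (("i", 0.5 ** k),         # example (i) with rho = 0.5
                    ("ii", 1.0 / k ** 2)):   # example (ii) with s = 2
        M = toeplitz_M(n, w)
        ratio[(name, n)] = cond_b_ratio(M)
        lam[(name, n)] = lambda_n(M)
```

For both examples the Condition B ratio behaves like a constant over n, while λ_n roughly doubles when n is quadrupled, in line with the O(√n) rate in (2.7).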

3.1 Computational Complexity of \widehat{trΣ_p^2}

Before conducting the simulations and the DNA microarray data analysis, we need to mention the computational complexity of \widehat{trΣ_p^2}. A large amount of time is required to calculate it directly, since it includes a summation over sets of up to four distinct indices, i.e., {(i, j, k, l) : 1 ≤ i, j, k, l ≤ n}. Let a_ij = X_p^{(i)T} X_p^{(j)} for simplicity. Then it follows that

P_n^3 U_3 = Σ_{i≠j≠k} a_ij a_ik = Σ_{i=1}^n (Σ_{j=1}^n a_ij)^2 − Σ_{i,j=1}^n a_ij^2 − 2 Σ_{i=1}^n a_ii Σ_{j=1}^n a_ij + 2 Σ_{i=1}^n a_ii^2   (3.1)

and

P_n^4 U_4 = Σ_{i≠j≠k≠l} a_ij a_kl = (Σ_{i,j=1}^n a_ij)^2 − 2 (Σ_{i=1}^n a_ii)(Σ_{j,k=1}^n a_jk) + (Σ_{i=1}^n a_ii)^2 + 2 Σ_{i,j=1}^n a_ij^2 − 4 Σ_{i=1}^n (Σ_{j=1}^n a_ij)^2 + 8 Σ_{i=1}^n a_ii Σ_{j=1}^n a_ij − 6 Σ_{i=1}^n a_ii^2.   (3.2)

Thus, once the a_ij (i, j = 1, ..., n) are given, only O(n^2) operations are required to calculate \widehat{trΣ_p^2} through equations (3.1) and (3.2), while calculating it directly costs as many as O(n^4) operations.

3.2 Simulations

We consider three population distributions in the model (2.1): the z_ij are i.i.d. as the standard normal distribution, the standardized t distribution with 5 degrees of freedom, and the standardized gamma distribution with shape and scale parameters 2 and 2. For the population covariance matrix,

we choose

Σ_1 = (σ_i σ_j ρ^{|i−j|}), Σ_2 = (σ_i σ_j ρ), Σ_3 = (σ_i σ_j u_ij),

where ρ = 0.5, σ_i^2 = 2 + (p − i + 1)/p, u_ii = 1 and u_ij = u_ji (i ≠ j) are i.i.d. as U(0, 1), the uniform distribution on (0, 1). We note that T_{n,p} and T include the unknown parameter trΣ_p^2; the estimator (2.5) is used in the following simulations. The empirical significance level and power are calculated based on 10,000 Monte Carlo simulations.

Table 1: Empirical significance levels for Σ_1, Σ_2 and Σ_3 at 5% and 1% significance based on the normal approximation. The table reports New Test (1), New Test (2) and T for the normal, t and gamma populations with p = 100, 300, 500.
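The O(n^2) identities (3.1) and (3.2) of Section 3.1 can be verified against brute-force summation over mutually distinct indices. This is our own sketch (function names hypothetical), not code from the paper:

```python
import numpy as np
from itertools import permutations

def p3u3_p4u4_fast(A):
    """O(n^2) evaluation of P_n^3 U_3 and P_n^4 U_4 via (3.1) and (3.2),
    where A = (a_ij) with a_ij = x_i^T x_j."""
    d = np.diag(A)               # a_ii
    R = A.sum(axis=1)            # row sums, diagonal included
    S, D, Q = A.sum(), d.sum(), (A ** 2).sum()
    p3u3 = (R ** 2).sum() - Q - 2.0 * (d * R).sum() + 2.0 * (d ** 2).sum()
    p4u4 = (S ** 2 - 2.0 * D * S + D ** 2 + 2.0 * Q
            - 4.0 * (R ** 2).sum() + 8.0 * (d * R).sum() - 6.0 * (d ** 2).sum())
    return p3u3, p4u4

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
A = X @ X.T
fast3, fast4 = p3u3_p4u4_fast(A)
# Brute-force O(n^3) / O(n^4) sums over mutually distinct indices.
naive3 = sum(A[i, j] * A[i, k] for i, j, k in permutations(range(6), 3))
naive4 = sum(A[i, j] * A[k, l] for i, j, k, l in permutations(range(6), 4))
```

Together with P_n^2 U_2 = Σ_{i,j} a_ij^2 − Σ_i a_ii^2, this gives the whole estimator (2.5) in O(n^2) once A is available.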

In Table 1, we evaluate the empirical significance levels of the three statistics based on the normal approximation at 5% and 1% significance. We choose n = 80 and p = 100, 300, 500. In the table, New Test (1) and (2) denote the proposed test using (i) and (ii) as M_n, respectively. Considering trM_n^4 = o{(trM_n^2)^2} in Condition B for the asymptotic normality of T_{n,p}, we determine ρ and s in (i) and (ii) such that trM_n^4/(trM_n^2)^2 takes a prescribed small value. T denotes the test proposed by Chen and Qin (2010). It is found that the proposed test has a better approximation for the significance level, particularly at 1%, for all the populations and population covariance matrices considered.

We denote η = μ_p^T μ_p/(trΣ_p^2)^{1/2} for comparing the empirical powers of the three tests. This simulation is conducted at significance level 5%. In Figure 2, we show the empirical power curves for n = 80, p = 300, and Σ_1, Σ_2 and Σ_3 with the standard normal, t and gamma distributions. It is found that the power of the proposed test using (ii) as M_n is slightly higher than that using (i). As described in Section 2, the power of the proposed test is theoretically lower than that of Chen and Qin's test; we also observe this in the figure. To improve the power of the proposed test, we may need to find an optimal matrix M_n among the matrices which satisfy Condition B. This is a future problem.

3.3 A Real Data Example

We evaluate the performance of the proposed test using the prostate cancer data (Singh et al., 2002). These data contain 12,600 genes for 102 patients, 52 of whom are prostate tumor patients and 50 of whom are normal patients. Following Dudoit et al. (2002), we preprocess the data by first truncating intensities to the interval [1, 5000], then removing genes with max/min ≤ 5 or max − min ≤ 150, where max and min denote the maximum and minimum intensities over the 102 patients, and finally transforming all intensities to base-10 logarithms. The number of genes is thereby reduced to p = 2,745.

We use n = 50 observations from each of the tumor and normal patients. This setup leads to the situation of (1.1). In Figure 3, we plot the distributions of T̃_{n,p} and T under H_0 using the parametric bootstrap method; that is, we draw 50 samples from N_p(0_p, S_{n,p}) ten thousand times and compute the statistics each time. For the proposed test statistic, the matrix M_n is given by (ii), whose parameter s is determined such that trM_n^4/(trM_n^2)^2 takes the same prescribed value as in the simulations. The figure shows that the distribution of the proposed test statistic is much closer to the standard normal distribution

Figure 2: Power curves of the proposed test and Chen and Qin's test for n = 80, p = 300, Σ_1, Σ_2 and Σ_3 with normal, t and gamma distributions. Each panel plots the power against η for New Test (1), New Test (2) and T.

than the existing one.

Figure 3: Distributions of T̃_{n,p} and T under H_0 based on the parametric bootstrap method, together with the standard normal density.

4 Proofs

In this section, we give proofs of equation (2.4) and Theorems 2.1–2.3 of Section 2. Throughout this section, we write X_i and Z_i for X_p^{(i)} and Z_p^{(i)}, respectively.

4.1 Proof of equation (2.4)

Note that

T_{n,p} = Σ_{i≠j} m_ij (X_i − μ_p)^T (X_j − μ_p) + 2 Σ_{i≠j} m_ij μ_p^T X_j − Σ_{i≠j} m_ij μ_p^T μ_p
       = Σ_{i≠j} m_ij Z_i^T C_p^T C_p Z_j + 2 Σ_{i≠j} m_ij μ_p^T C_p Z_j + Σ_{i≠j} m_ij μ_p^T μ_p.

Hence, we have

Var(T_{n,p}) = Var(Σ_{i≠j} m_ij Z_i^T C_p^T C_p Z_j) + 4 Var(Σ_{i≠j} m_ij μ_p^T C_p Z_j) + 4 Σ_{i≠j} Σ_{k≠l} m_ij m_kl Cov(Z_i^T C_p^T C_p Z_j, μ_p^T C_p Z_l) = V_1 + 4V_2 + 4V_3, say.

Clearly, V_3 = 0 since the Z_i are independent with E(Z_i) = 0_p. For the first term V_1, using the vec-operator vec(·) and the Kronecker product ⊗ (see, e.g., Schott, 2005), it follows that

Σ_{i≠j} m_ij Z_i^T C_p^T C_p Z_j = vec(Z_{n,p})^T (C_p^T C_p ⊗ M_n) vec(Z_{n,p}),

where Z_{n,p} = (Z_1, ..., Z_n)^T. Then, from Lemma 2.1 in Srivastava (2009),

V_1 = 2 tr(C_p^T C_p ⊗ M_n)^2 = 2 trM_n^2 trΣ_p^2.

For the second term V_2, let Θ = 1_n μ_p^T. Note that

Σ_{i≠j} m_ij μ_p^T C_p Z_j = vec(Θ)^T (C_p ⊗ M_n) vec(Z_{n,p}).

Then we obtain

V_2 = vec(Θ)^T (C_p ⊗ M_n)(C_p ⊗ M_n)^T vec(Θ) = (μ_p ⊗ 1_n)^T (Σ_p ⊗ M_n^2)(μ_p ⊗ 1_n) = (1_n^T M_n^2 1_n) μ_p^T Σ_p μ_p.

Thus, equation (2.4) is derived.
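As a small numerical check (ours, not part of the paper), the vec/Kronecker identity used for V_1 can be confirmed for random matrices, with vec(·) taken as column-stacking:

```python
import numpy as np

# Check:  sum_{i,j} m_ij Z_i^T (C^T C) Z_j
#       = vec(Z_{n,p})^T (C^T C kron M_n) vec(Z_{n,p}),
# where Z_{n,p} = (Z_1, ..., Z_n)^T and vec(.) stacks columns.
rng = np.random.default_rng(0)
n, p = 5, 4
C = rng.standard_normal((p, p))
D = C.T @ C                                  # C_p^T C_p
M = rng.standard_normal((n, n))
M = (M + M.T) / 2
np.fill_diagonal(M, 0.0)                     # m_ii = 0, as in Condition B
Z = rng.standard_normal((n, p))              # rows are Z_1, ..., Z_n

lhs = sum(M[i, j] * (Z[i] @ D @ Z[j]) for i in range(n) for j in range(n))
vecZ = Z.flatten(order="F")                  # column-major (column-stacking) vec
rhs = vecZ @ np.kron(D, M) @ vecZ
```

Note the ordering np.kron(D, M): with column-stacking vec of the n × p matrix Z_{n,p}, the p × p factor comes first in the Kronecker product.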

4.2 Proof of Theorem 2.1

This proof is based on the results in Srivastava (2009) and Chen and Qin (2010). Let m̃_ij = m_ij/(2 trM_n^2 trΣ_p^2)^{1/2} and M̃_n = (m̃_ij). Then we have

T̃_{n,p} = Σ_{i≠j} m_ij X_i^T X_j / (2 trM_n^2 trΣ_p^2)^{1/2} = Σ_{i≠j} m̃_ij X_i^T X_j = 2 Σ_{j=2}^n Σ_{i=1}^{j−1} m̃_ij X_i^T X_j = 2 Σ_{j=2}^n Y_j,

where Y_j = Σ_{i=1}^{j−1} m̃_ij X_i^T X_j. We need to show that

Q_n = Σ_{j=2}^n Y_j →_d N(0, 1/4), p → ∞.   (4.1)

We apply the martingale central limit theorem (Hall and Heyde, 1980) to Q_n to show (4.1). Let F_j be the σ-algebra generated by {X_1, ..., X_j}. Then F_{j−1} ⊂ F_j for any j = 2, ..., n. It is easily seen that E(Q_n) = 0 and E(Q_n^2) < ∞. We note for n > m that Q_n = Q_m + Σ_{j=m+1}^n Y_j and E(Σ_{j=m+1}^n Y_j | F_m) = 0, since for m + 1 ≤ j ≤ n we have E(X_i^T X_j | F_m) = X_i^T E(X_j) = 0 for 1 ≤ i ≤ m and E(X_i^T X_j | F_m) = E(X_i^T X_j) = 0 for m + 1 ≤ i ≤ j − 1. Hence the sequence {Q_n, F_n} is a zero-mean, square-integrable martingale array. It suffices to show that, under Conditions A and B,

Σ_{j=2}^n E(Y_j^2 | F_{j−1}) →_P 1/4, p → ∞,   (4.2)

Σ_{j=2}^n E(Y_j^4) → 0, p → ∞.   (4.3)

We first show (4.2). Since E(X_j X_j^T) = Σ_p under H_0, we have

Σ_{j=2}^n E(Y_j^2 | F_{j−1}) = Σ_{j=2}^n E[(Σ_{i=1}^{j−1} m̃_ij X_i^T X_j)^2 | F_{j−1}] = Σ_{i<j} m̃_ij^2 X_i^T Σ_p X_i + Σ_{j=2}^n Σ_{i_1≠i_2<j} m̃_{i_1 j} m̃_{i_2 j} X_{i_1}^T Σ_p X_{i_2} = A_1 + A_2, say.

Also we note that

E(A_1) = Σ_{i<j} m̃_ij^2 E(X_i^T Σ_p X_i) = Σ_{i<j} m̃_ij^2 trΣ_p^2 = (1/2) trM̃_n^2 trΣ_p^2 = 1/4,

and we have

Var(A_1) = Var(Σ_{i<j} m̃_ij^2 X_i^T Σ_p X_i) = Σ_{i=1}^{n−1} (Σ_{j=i+1}^n m̃_ij^2)^2 Var(X_i^T Σ_p X_i) = Σ_{i=1}^{n−1} (Σ_{j=i+1}^n m̃_ij^2)^2 [(γ − 3) Σ_{k=1}^p v_kk^2 + 2 trΣ_p^4],

where v_kk is the k-th diagonal element of C_p^T Σ_p C_p = (C_p^T C_p)(C_p^T C_p). The last equation is derived from Lemma 2.1 in Srivastava (2009). Let D_p = C_p^T C_p = (d_ij). Following Lemma 2.6 in Srivastava (2009), we note that

Σ_{k=1}^p v_kk^2 = Σ_{k=1}^p (Σ_{i=1}^p d_ik^2)^2 ≤ tr(D_p^T D_p)^2 = trΣ_p^4.   (4.4)

Also we note that, under Condition B,

Σ_{i=1}^{n−1} (Σ_{j=i+1}^n m̃_ij^2)^2 ≤ Σ_{i_1, i_2} (Σ_{j=1}^n m̃_{i_1 j} m̃_{i_2 j})^2 = trM̃_n^4.

Hence, we have Var(A_1) = O(trM̃_n^4 trΣ_p^4). Since trΣ_p^4/(trΣ_p^2)^2 ≤ 1, we obtain

trM̃_n^4 trΣ_p^4 = trM_n^4 trΣ_p^4 / {4 (trM_n^2)^2 (trΣ_p^2)^2} → 0, p → ∞,   (4.5)

under Conditions A and B. Thus A_1 →_P 1/4 by Chebyshev's inequality.

Next, we show A_2 →_P 0. Since

A_2 = Σ_{i_1≠i_2} X_{i_1}^T Σ_p X_{i_2} Σ_{j=max{i_1,i_2}+1}^n m̃_{i_1 j} m̃_{i_2 j},

and E(X_{i_1}^T Σ_p X_{i_2} X_{i_3}^T Σ_p X_{i_4}) = 0 when at least one index differs between (i_1, i_2) and (i_3, i_4), we have E(A_2) = 0 and

E(A_2^2) = Σ_{i_1≠i_2} E(X_{i_1}^T Σ_p X_{i_2})^2 (Σ_{j=max{i_1,i_2}+1}^n m̃_{i_1 j} m̃_{i_2 j})^2.

Note that E(X_{i_1}^T Σ_p X_{i_2})^2 = trΣ_p^4 for i_1 ≠ i_2, and that, since m̃_ij ≥ 0 under Condition B,

Σ_{i_1≠i_2} (Σ_{j=max{i_1,i_2}+1}^n m̃_{i_1 j} m̃_{i_2 j})^2 ≤ Σ_{i_1≠i_2} (Σ_{j=1}^n m̃_{i_1 j} m̃_{i_2 j})^2 + Σ_{i=1}^n (Σ_{j=1}^n m̃_ij^2)^2 = trM̃_n^4.

Then we have E(A_2^2) = O(trM̃_n^4 trΣ_p^4). From (4.5), E(A_2^2) → 0 as p → ∞ under Conditions A and B, and Chebyshev's inequality yields A_2 →_P 0.

Next, we verify (4.3). Note that

E(Y_j^4) = E(Σ_{i=1}^{j−1} m̃_ij X_i^T X_j)^4 = E[Σ_{i=1}^{j−1} m̃_ij^2 (X_i^T X_j)^2 + Σ_{i_1≠i_2<j} m̃_{i_1 j} m̃_{i_2 j} X_{i_1}^T X_j X_{i_2}^T X_j]^2 = Σ_{i=1}^{j−1} m̃_ij^4 E(X_i^T X_j)^4 + 3 Σ_{i_1≠i_2<j} m̃_{i_1 j}^2 m̃_{i_2 j}^2 E[(X_{i_1}^T X_j)^2 (X_{i_2}^T X_j)^2],

since the cross terms, which contain an odd power of some X_i, vanish. Following Chen and Qin (2010), we have

E(X_i^T X_j)^4 = O(trΣ_p^4) + O{(trΣ_p^2)^2}, i ≠ j.

Also we have, from (4.4),

E[(X_{i_1}^T X_j)^2 (X_{i_2}^T X_j)^2] = E[(X_j^T Σ_p X_j)^2] = (γ − 3) Σ_{k=1}^p v_kk^2 + 2 trΣ_p^4 + (trΣ_p^2)^2 = O(trΣ_p^4) + O{(trΣ_p^2)^2}, i_1 ≠ i_2 ≠ j.

Following Corollary 2.7 in Srivastava (2009), we have

Σ_{j=2}^n Σ_{i=1}^{j−1} m̃_ij^4 ≤ Σ_{i≠j} m̃_ij^4 ≤ trM̃_n^4

and

Σ_{j=2}^n Σ_{i_1≠i_2<j} m̃_{i_1 j}^2 m̃_{i_2 j}^2 ≤ Σ_{j=1}^n (Σ_{i=1}^n m̃_ij^2)^2 ≤ trM̃_n^4.

From (4.5) and the fact that, under Conditions A and B,

trM̃_n^4 (trΣ_p^2)^2 = trM_n^4 / {4 (trM_n^2)^2} → 0, p → ∞,

we obtain Σ_{j=2}^n E(Y_j^4) → 0 as p → ∞. This ends the proof.

4.3 Proof of Theorem 2.2

Note that \widehat{trΣ_p^2} is invariant under the location transformations X_i → X_i + c_p, where c_p is an arbitrary constant p-vector. Hence we may assume μ_p = 0 without loss of generality. Obviously, we have E(U_2) = trΣ_p^2 and E(U_3) = E(U_4) = 0. It follows from Chen et al. (2010) that

Var(U_2) = (4/n^2)(trΣ_p^2)^2 + (4/n){2 trΣ_p^4 + (γ − 3) tr[(C_p^T C_p)^2 ∘ (C_p^T C_p)^2]} + O[(1/n^3)(trΣ_p^2)^2 + (1/n^2) trΣ_p^4],

Var(U_3) = (2/n^3)(trΣ_p^2)^2 + O[(1/n^2) trΣ_p^4 + (1/n^4)(trΣ_p^2)^2]

and

Var(U_4) = (8/n^4)(trΣ_p^2)^2 + O[(1/n^5)(trΣ_p^2)^2 + (1/n^4) trΣ_p^4],

where ∘ denotes the Hadamard product. Using the fact that

tr[(C_p^T C_p)^2 ∘ (C_p^T C_p)^2] ≤ tr(C_p^T C_p)^4 = trΣ_p^4 ≤ (trΣ_p^2)^2,

we have Var(U_i / trΣ_p^2) → 0 for i = 2, 3, 4 under Condition A. This and Chebyshev's inequality yield (2.6).

4.4 Proof of Theorem 2.3

Let σ_{n,p} = (2 trM_n^2 trΣ_p^2)^{1/2}. Applying Theorem 2.1 to the centered vectors X_i − μ_p, we have, under Conditions A and B,

Σ_{i≠j} m_ij (X_i − μ_p)^T (X_j − μ_p) / σ_{n,p} →_d N(0, 1), p → ∞.

Note that

T_{n,p} = Σ_{i≠j} m_ij (X_i − μ_p)^T (X_j − μ_p) + 2 Σ_{i≠j} m_ij μ_p^T X_j − Σ_{i≠j} m_ij μ_p^T μ_p.

Hence it remains to show that

{2 Σ_{i≠j} m_ij μ_p^T X_j − 2 Σ_{i≠j} m_ij μ_p^T μ_p} / σ_{n,p} →_P 0, p → ∞.   (4.6)

We notice that E(2 Σ_{i≠j} m_ij μ_p^T X_j) = 2 Σ_{i≠j} m_ij μ_p^T μ_p and

Var(2 Σ_{i≠j} m_ij μ_p^T X_j) = 4 (1_n^T M_n^2 1_n) μ_p^T Σ_p μ_p = o(σ_{n,p}^2)

under Condition C. Chebyshev's inequality yields (4.6). Slutsky's theorem and (4.6) complete the proof.
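To close the proofs, here is an end-to-end numerical sanity check (ours, not from the paper; draw_tilde_T is a hypothetical name). It draws the standardized statistic of Theorem 2.1 under H_0 with M_n of form (i) and a compound-symmetry Σ_p that violates trΣ_p^4 = o{(trΣ_p^2)^2}, using the known value of trΣ_p^2. By (2.3)–(2.4) the draws have mean 0 and variance 1 exactly, and by Theorem 2.1 they are approximately standard normal even for this Σ_p:

```python
import numpy as np

def draw_tilde_T(n, p, rho_x=0.5, rho_m=0.5, n_rep=200, seed=0):
    """Draws of tilde T_{n,p} = T_{n,p} / sqrt(2 tr M_n^2 tr Sigma_p^2)
    under H0, with Sigma_p = (1 - rho_x) I_p + rho_x 1_p 1_p^T (which
    violates tr Sigma^4 = o{(tr Sigma^2)^2}) and M_n of form (i)."""
    rng = np.random.default_rng(seed)
    idx = np.arange(n)
    M = rho_m ** np.abs(idx[:, None] - idx[None, :])
    np.fill_diagonal(M, 0.0)
    tr_m2 = np.trace(M @ M)
    # Sigma_p has one eigenvalue 1 + (p-1) rho_x and p-1 eigenvalues 1 - rho_x.
    tr_s2 = (1 + (p - 1) * rho_x) ** 2 + (p - 1) * (1 - rho_x) ** 2
    denom = np.sqrt(2.0 * tr_m2 * tr_s2)
    out = np.empty(n_rep)
    for r in range(n_rep):
        # X_i = sqrt(1 - rho_x) Z_i + sqrt(rho_x) g_i 1_p has Var(X_i) = Sigma_p.
        Z = rng.standard_normal((n, p))
        g = rng.standard_normal((n, 1))
        X = np.sqrt(1 - rho_x) * Z + np.sqrt(rho_x) * g
        out[r] = ((M @ X) * X).sum() / denom       # tr(X^T M_n X) / denom
    return out

draws = draw_tilde_T(n=60, p=300)
```

Repeating this with the statistic T of Section 1 in place of T̃_{n,p} reproduces the skewed null distributions of Figure 1.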

References

[1] Bai, Z. and Saranadasa, H. (1996). Effect of high dimension: by an example of a two sample problem. Statistica Sinica 6.

[2] Chen, S. X. and Qin, Y. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics 38.

[3] Chen, S. X., Zhang, L. X. and Zhong, P. S. (2010). Tests for high-dimensional covariance matrices. Journal of the American Statistical Association 105.

[4] Dudoit, S., Fridlyand, J. and Speed, T. P. (2002). Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association 97.

[5] Grenander, U. and Szegö, G. (1958). Toeplitz Forms and Their Applications, 2nd edition. New York: Chelsea Publishing Company.

[6] Hall, P. and Heyde, C. (1980). Martingale Limit Theory and Its Application. New York: Academic Press.

[7] Katayama, S., Kano, Y. and Srivastava, M. S. (2010). Asymptotic distributions of some test criteria for the mean vector with fewer observations than the dimension. Submitted.

[8] Schott, J. R. (2005). Matrix Analysis for Statistics, 2nd edition. New York: Wiley.

[9] Singh, D., Febbo, P. G., Ross, K., Jackson, D. G., Manola, J., Ladd, C., Tamayo, P., Renshaw, A. A., D'Amico, A. V., Richie, J. P., Lander, E. S., Loda, M., Kantoff, P. W., Golub, T. R. and Sellers, W. R. (2002). Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1.

[10] Srivastava, M. S. (2009). A test for the mean vector with fewer observations than the dimension under non-normality. Journal of Multivariate Analysis 100.


Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data Sri Lankan Journal of Applied Statistics (Special Issue) Modern Statistical Methodologies in the Cutting Edge of Science Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations

More information

KRUSKAL-WALLIS ONE-WAY ANALYSIS OF VARIANCE BASED ON LINEAR PLACEMENTS

KRUSKAL-WALLIS ONE-WAY ANALYSIS OF VARIANCE BASED ON LINEAR PLACEMENTS Bull. Korean Math. Soc. 5 (24), No. 3, pp. 7 76 http://dx.doi.org/34/bkms.24.5.3.7 KRUSKAL-WALLIS ONE-WAY ANALYSIS OF VARIANCE BASED ON LINEAR PLACEMENTS Yicheng Hong and Sungchul Lee Abstract. The limiting

More information

Empirical Likelihood Tests for High-dimensional Data

Empirical Likelihood Tests for High-dimensional Data Empirical Likelihood Tests for High-dimensional Data Department of Statistics and Actuarial Science University of Waterloo, Canada ICSA - Canada Chapter 2013 Symposium Toronto, August 2-3, 2013 Based on

More information

Testing Block-Diagonal Covariance Structure for High-Dimensional Data Under Non-normality

Testing Block-Diagonal Covariance Structure for High-Dimensional Data Under Non-normality Testing Block-Diagonal Covariance Structure for High-Dimensional Data Under Non-normality Yuki Yamada, Masashi Hyodo and Takahiro Nishiyama Department of Mathematical Information Science, Tokyo University

More information

Modified Simes Critical Values Under Positive Dependence

Modified Simes Critical Values Under Positive Dependence Modified Simes Critical Values Under Positive Dependence Gengqian Cai, Sanat K. Sarkar Clinical Pharmacology Statistics & Programming, BDS, GlaxoSmithKline Statistics Department, Temple University, Philadelphia

More information

Statistica Sinica Preprint No: SS R2

Statistica Sinica Preprint No: SS R2 Statistica Sinica Preprint No: SS-2016-0063R2 Title Two-sample tests for high-dimension, strongly spiked eigenvalue models Manuscript ID SS-2016-0063R2 URL http://www.stat.sinica.edu.tw/statistica/ DOI

More information

Testing Some Covariance Structures under a Growth Curve Model in High Dimension

Testing Some Covariance Structures under a Growth Curve Model in High Dimension Department of Mathematics Testing Some Covariance Structures under a Growth Curve Model in High Dimension Muni S. Srivastava and Martin Singull LiTH-MAT-R--2015/03--SE Department of Mathematics Linköping

More information

TAMS39 Lecture 2 Multivariate normal distribution

TAMS39 Lecture 2 Multivariate normal distribution TAMS39 Lecture 2 Multivariate normal distribution Martin Singull Department of Mathematics Mathematical Statistics Linköping University, Sweden Content Lecture Random vectors Multivariate normal distribution

More information

Testing equality of two mean vectors with unequal sample sizes for populations with correlation

Testing equality of two mean vectors with unequal sample sizes for populations with correlation Testing equality of two mean vectors with unequal sample sizes for populations with correlation Aya Shinozaki Naoya Okamoto 2 and Takashi Seo Department of Mathematical Information Science Tokyo University

More information

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline. Practitioner Course: Portfolio Optimization September 10, 2008 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y ) (x,

More information

High-dimensional asymptotic expansions for the distributions of canonical correlations

High-dimensional asymptotic expansions for the distributions of canonical correlations Journal of Multivariate Analysis 100 2009) 231 242 Contents lists available at ScienceDirect Journal of Multivariate Analysis journal homepage: www.elsevier.com/locate/jmva High-dimensional asymptotic

More information

On Expected Gaussian Random Determinants

On Expected Gaussian Random Determinants On Expected Gaussian Random Determinants Moo K. Chung 1 Department of Statistics University of Wisconsin-Madison 1210 West Dayton St. Madison, WI 53706 Abstract The expectation of random determinants whose

More information

Approximate interval estimation for EPMC for improved linear discriminant rule under high dimensional frame work

Approximate interval estimation for EPMC for improved linear discriminant rule under high dimensional frame work Hiroshima Statistical Research Group: Technical Report Approximate interval estimation for PMC for improved linear discriminant rule under high dimensional frame work Masashi Hyodo, Tomohiro Mitani, Tetsuto

More information

Multivariate Time Series

Multivariate Time Series Multivariate Time Series Notation: I do not use boldface (or anything else) to distinguish vectors from scalars. Tsay (and many other writers) do. I denote a multivariate stochastic process in the form

More information

Statistical Inference of Covariate-Adjusted Randomized Experiments

Statistical Inference of Covariate-Adjusted Randomized Experiments 1 Statistical Inference of Covariate-Adjusted Randomized Experiments Feifang Hu Department of Statistics George Washington University Joint research with Wei Ma, Yichen Qin and Yang Li Email: feifang@gwu.edu

More information

2. Matrix Algebra and Random Vectors

2. Matrix Algebra and Random Vectors 2. Matrix Algebra and Random Vectors 2.1 Introduction Multivariate data can be conveniently display as array of numbers. In general, a rectangular array of numbers with, for instance, n rows and p columns

More information

On a General Two-Sample Test with New Critical Value for Multivariate Feature Selection in Chest X-Ray Image Analysis Problem

On a General Two-Sample Test with New Critical Value for Multivariate Feature Selection in Chest X-Ray Image Analysis Problem Applied Mathematical Sciences, Vol. 9, 2015, no. 147, 7317-7325 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2015.510687 On a General Two-Sample Test with New Critical Value for Multivariate

More information

Consistency of Distance-based Criterion for Selection of Variables in High-dimensional Two-Group Discriminant Analysis

Consistency of Distance-based Criterion for Selection of Variables in High-dimensional Two-Group Discriminant Analysis Consistency of Distance-based Criterion for Selection of Variables in High-dimensional Two-Group Discriminant Analysis Tetsuro Sakurai and Yasunori Fujikoshi Center of General Education, Tokyo University

More information

Tests for Covariance Matrices, particularly for High-dimensional Data

Tests for Covariance Matrices, particularly for High-dimensional Data M. Rauf Ahmad Tests for Covariance Matrices, particularly for High-dimensional Data Technical Report Number 09, 200 Department of Statistics University of Munich http://www.stat.uni-muenchen.de Tests for

More information

Composite Hotelling s T-Square Test for High-Dimensional Data. 1. Introduction

Composite Hotelling s T-Square Test for High-Dimensional Data. 1. Introduction A^VÇÚO 1 33 ò 1 4 Ï 2017 c 8 Chinese Journal of Applied Probability and Statistics Aug., 2017, Vol. 33, No. 4, pp. 349-368 doi: 10.3969/j.issn.1001-4268.2017.04.003 Composite Hotelling s T-Square Test

More information

The purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j.

The purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j. Chapter 9 Pearson s chi-square test 9. Null hypothesis asymptotics Let X, X 2, be independent from a multinomial(, p) distribution, where p is a k-vector with nonnegative entries that sum to one. That

More information

Consistency of Test-based Criterion for Selection of Variables in High-dimensional Two Group-Discriminant Analysis

Consistency of Test-based Criterion for Selection of Variables in High-dimensional Two Group-Discriminant Analysis Consistency of Test-based Criterion for Selection of Variables in High-dimensional Two Group-Discriminant Analysis Yasunori Fujikoshi and Tetsuro Sakurai Department of Mathematics, Graduate School of Science,

More information

Advanced Statistics II: Non Parametric Tests

Advanced Statistics II: Non Parametric Tests Advanced Statistics II: Non Parametric Tests Aurélien Garivier ParisTech February 27, 2011 Outline Fitting a distribution Rank Tests for the comparison of two samples Two unrelated samples: Mann-Whitney

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and

More information

Comparison of Discrimination Methods for High Dimensional Data

Comparison of Discrimination Methods for High Dimensional Data Comparison of Discrimination Methods for High Dimensional Data M.S.Srivastava 1 and T.Kubokawa University of Toronto and University of Tokyo Abstract In microarray experiments, the dimension p of the data

More information

Consistency of test based method for selection of variables in high dimensional two group discriminant analysis

Consistency of test based method for selection of variables in high dimensional two group discriminant analysis https://doi.org/10.1007/s42081-019-00032-4 ORIGINAL PAPER Consistency of test based method for selection of variables in high dimensional two group discriminant analysis Yasunori Fujikoshi 1 Tetsuro Sakurai

More information

Asymptotic Theory. L. Magee revised January 21, 2013

Asymptotic Theory. L. Magee revised January 21, 2013 Asymptotic Theory L. Magee revised January 21, 2013 1 Convergence 1.1 Definitions Let a n to refer to a random variable that is a function of n random variables. Convergence in Probability The scalar a

More information

Effects of data dimension on empirical likelihood

Effects of data dimension on empirical likelihood Biometrika Advance Access published August 8, 9 Biometrika (9), pp. C 9 Biometrika Trust Printed in Great Britain doi:.9/biomet/asp7 Effects of data dimension on empirical likelihood BY SONG XI CHEN Department

More information

Course information: Instructor: Tim Hanson, Leconte 219C, phone Office hours: Tuesday/Thursday 11-12, Wednesday 10-12, and by appointment.

Course information: Instructor: Tim Hanson, Leconte 219C, phone Office hours: Tuesday/Thursday 11-12, Wednesday 10-12, and by appointment. Course information: Instructor: Tim Hanson, Leconte 219C, phone 777-3859. Office hours: Tuesday/Thursday 11-12, Wednesday 10-12, and by appointment. Text: Applied Linear Statistical Models (5th Edition),

More information

Bootstrapping high dimensional vector: interplay between dependence and dimensionality

Bootstrapping high dimensional vector: interplay between dependence and dimensionality Bootstrapping high dimensional vector: interplay between dependence and dimensionality Xianyang Zhang Joint work with Guang Cheng University of Missouri-Columbia LDHD: Transition Workshop, 2014 Xianyang

More information

TESTING FOR EQUAL DISTRIBUTIONS IN HIGH DIMENSION

TESTING FOR EQUAL DISTRIBUTIONS IN HIGH DIMENSION TESTING FOR EQUAL DISTRIBUTIONS IN HIGH DIMENSION Gábor J. Székely Bowling Green State University Maria L. Rizzo Ohio University October 30, 2004 Abstract We propose a new nonparametric test for equality

More information

Canonical Correlation Analysis of Longitudinal Data

Canonical Correlation Analysis of Longitudinal Data Biometrics Section JSM 2008 Canonical Correlation Analysis of Longitudinal Data Jayesh Srivastava Dayanand N Naik Abstract Studying the relationship between two sets of variables is an important multivariate

More information

2.1 Linear regression with matrices

2.1 Linear regression with matrices 21 Linear regression with matrices The values of the independent variables are united into the matrix X (design matrix), the values of the outcome and the coefficient are represented by the vectors Y and

More information

Lecture Note 1: Probability Theory and Statistics

Lecture Note 1: Probability Theory and Statistics Univ. of Michigan - NAME 568/EECS 568/ROB 530 Winter 2018 Lecture Note 1: Probability Theory and Statistics Lecturer: Maani Ghaffari Jadidi Date: April 6, 2018 For this and all future notes, if you would

More information

Lecture 3 - Expectation, inequalities and laws of large numbers

Lecture 3 - Expectation, inequalities and laws of large numbers Lecture 3 - Expectation, inequalities and laws of large numbers Jan Bouda FI MU April 19, 2009 Jan Bouda (FI MU) Lecture 3 - Expectation, inequalities and laws of large numbersapril 19, 2009 1 / 67 Part

More information

Probability Lecture III (August, 2006)

Probability Lecture III (August, 2006) robability Lecture III (August, 2006) 1 Some roperties of Random Vectors and Matrices We generalize univariate notions in this section. Definition 1 Let U = U ij k l, a matrix of random variables. Suppose

More information

High-Dimensional AICs for Selection of Redundancy Models in Discriminant Analysis. Tetsuro Sakurai, Takeshi Nakada and Yasunori Fujikoshi

High-Dimensional AICs for Selection of Redundancy Models in Discriminant Analysis. Tetsuro Sakurai, Takeshi Nakada and Yasunori Fujikoshi High-Dimensional AICs for Selection of Redundancy Models in Discriminant Analysis Tetsuro Sakurai, Takeshi Nakada and Yasunori Fujikoshi Faculty of Science and Engineering, Chuo University, Kasuga, Bunkyo-ku,

More information

Chp 4. Expectation and Variance

Chp 4. Expectation and Variance Chp 4. Expectation and Variance 1 Expectation In this chapter, we will introduce two objectives to directly reflect the properties of a random variable or vector, which are the Expectation and Variance.

More information

Large-Scale Multiple Testing of Correlations

Large-Scale Multiple Testing of Correlations University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 5-5-2016 Large-Scale Multiple Testing of Correlations T. Tony Cai University of Pennsylvania Weidong Liu Follow this

More information

CS281A/Stat241A Lecture 17

CS281A/Stat241A Lecture 17 CS281A/Stat241A Lecture 17 p. 1/4 CS281A/Stat241A Lecture 17 Factor Analysis and State Space Models Peter Bartlett CS281A/Stat241A Lecture 17 p. 2/4 Key ideas of this lecture Factor Analysis. Recall: Gaussian

More information

01 Probability Theory and Statistics Review

01 Probability Theory and Statistics Review NAVARCH/EECS 568, ROB 530 - Winter 2018 01 Probability Theory and Statistics Review Maani Ghaffari January 08, 2018 Last Time: Bayes Filters Given: Stream of observations z 1:t and action data u 1:t Sensor/measurement

More information

Large-Scale Multiple Testing of Correlations

Large-Scale Multiple Testing of Correlations Large-Scale Multiple Testing of Correlations T. Tony Cai and Weidong Liu Abstract Multiple testing of correlations arises in many applications including gene coexpression network analysis and brain connectivity

More information

Han-Ying Liang, Dong-Xia Zhang, and Jong-Il Baek

Han-Ying Liang, Dong-Xia Zhang, and Jong-Il Baek J. Korean Math. Soc. 41 (2004), No. 5, pp. 883 894 CONVERGENCE OF WEIGHTED SUMS FOR DEPENDENT RANDOM VARIABLES Han-Ying Liang, Dong-Xia Zhang, and Jong-Il Baek Abstract. We discuss in this paper the strong

More information

Multivariate Analysis and Likelihood Inference

Multivariate Analysis and Likelihood Inference Multivariate Analysis and Likelihood Inference Outline 1 Joint Distribution of Random Variables 2 Principal Component Analysis (PCA) 3 Multivariate Normal Distribution 4 Likelihood Inference Joint density

More information

Spring 2012 Math 541B Exam 1

Spring 2012 Math 541B Exam 1 Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote

More information

COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS

COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS Communications in Statistics - Simulation and Computation 33 (2004) 431-446 COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS K. Krishnamoorthy and Yong Lu Department

More information

Statistica Sinica Preprint No: SS

Statistica Sinica Preprint No: SS Statistica Sinica Preprint No: SS-2017-0275 Title Testing homogeneity of high-dimensional covariance matrices Manuscript ID SS-2017-0275 URL http://www.stat.sinica.edu.tw/statistica/ DOI 10.5705/ss.202017.0275

More information

Economics 583: Econometric Theory I A Primer on Asymptotics

Economics 583: Econometric Theory I A Primer on Asymptotics Economics 583: Econometric Theory I A Primer on Asymptotics Eric Zivot January 14, 2013 The two main concepts in asymptotic theory that we will use are Consistency Asymptotic Normality Intuition consistency:

More information

6 The normal distribution, the central limit theorem and random samples

6 The normal distribution, the central limit theorem and random samples 6 The normal distribution, the central limit theorem and random samples 6.1 The normal distribution We mentioned the normal (or Gaussian) distribution in Chapter 4. It has density f X (x) = 1 σ 1 2π e

More information

A new method to bound rate of convergence

A new method to bound rate of convergence A new method to bound rate of convergence Arup Bose Indian Statistical Institute, Kolkata, abose@isical.ac.in Sourav Chatterjee University of California, Berkeley Empirical Spectral Distribution Distribution

More information

Multivariate Statistical Analysis

Multivariate Statistical Analysis Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 3 for Applied Multivariate Analysis Outline 1 Reprise-Vectors, vector lengths and the angle between them 2 3 Partial correlation

More information

A new test for the proportionality of two large-dimensional covariance matrices. Citation Journal of Multivariate Analysis, 2014, v. 131, p.

A new test for the proportionality of two large-dimensional covariance matrices. Citation Journal of Multivariate Analysis, 2014, v. 131, p. Title A new test for the proportionality of two large-dimensional covariance matrices Authors) Liu, B; Xu, L; Zheng, S; Tian, G Citation Journal of Multivariate Analysis, 04, v. 3, p. 93-308 Issued Date

More information

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) 1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For

More information

Probability and Statistics Notes

Probability and Statistics Notes Probability and Statistics Notes Chapter Five Jesse Crawford Department of Mathematics Tarleton State University Spring 2011 (Tarleton State University) Chapter Five Notes Spring 2011 1 / 37 Outline 1

More information

A Test for Order Restriction of Several Multivariate Normal Mean Vectors against all Alternatives when the Covariance Matrices are Unknown but Common

A Test for Order Restriction of Several Multivariate Normal Mean Vectors against all Alternatives when the Covariance Matrices are Unknown but Common Journal of Statistical Theory and Applications Volume 11, Number 1, 2012, pp. 23-45 ISSN 1538-7887 A Test for Order Restriction of Several Multivariate Normal Mean Vectors against all Alternatives when

More information

Asymptotic Statistics-III. Changliang Zou

Asymptotic Statistics-III. Changliang Zou Asymptotic Statistics-III Changliang Zou The multivariate central limit theorem Theorem (Multivariate CLT for iid case) Let X i be iid random p-vectors with mean µ and and covariance matrix Σ. Then n (

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

Testing Homogeneity Of A Large Data Set By Bootstrapping

Testing Homogeneity Of A Large Data Set By Bootstrapping Testing Homogeneity Of A Large Data Set By Bootstrapping 1 Morimune, K and 2 Hoshino, Y 1 Graduate School of Economics, Kyoto University Yoshida Honcho Sakyo Kyoto 606-8501, Japan. E-Mail: morimune@econ.kyoto-u.ac.jp

More information

Asymptotic Statistics-VI. Changliang Zou

Asymptotic Statistics-VI. Changliang Zou Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous

More information

4. Distributions of Functions of Random Variables

4. Distributions of Functions of Random Variables 4. Distributions of Functions of Random Variables Setup: Consider as given the joint distribution of X 1,..., X n (i.e. consider as given f X1,...,X n and F X1,...,X n ) Consider k functions g 1 : R n

More information

Principal Components Analysis (PCA) and Singular Value Decomposition (SVD) with applications to Microarrays

Principal Components Analysis (PCA) and Singular Value Decomposition (SVD) with applications to Microarrays Principal Components Analysis (PCA) and Singular Value Decomposition (SVD) with applications to Microarrays Prof. Tesler Math 283 Fall 2015 Prof. Tesler Principal Components Analysis Math 283 / Fall 2015

More information

large number of i.i.d. observations from P. For concreteness, suppose

large number of i.i.d. observations from P. For concreteness, suppose 1 Subsampling Suppose X i, i = 1,..., n is an i.i.d. sequence of random variables with distribution P. Let θ(p ) be some real-valued parameter of interest, and let ˆθ n = ˆθ n (X 1,..., X n ) be some estimate

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Central Limit Theorems for Classical Likelihood Ratio Tests for High-Dimensional Normal Distributions

Central Limit Theorems for Classical Likelihood Ratio Tests for High-Dimensional Normal Distributions Central Limit Theorems for Classical Likelihood Ratio Tests for High-Dimensional Normal Distributions Tiefeng Jiang 1 and Fan Yang 1, University of Minnesota Abstract For random samples of size n obtained

More information

Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics

Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics The candidates for the research course in Statistics will have to take two shortanswer type tests

More information

Stat 206: Sampling theory, sample moments, mahalanobis

Stat 206: Sampling theory, sample moments, mahalanobis Stat 206: Sampling theory, sample moments, mahalanobis topology James Johndrow (adapted from Iain Johnstone s notes) 2016-11-02 Notation My notation is different from the book s. This is partly because

More information

TESTING SERIAL CORRELATION IN SEMIPARAMETRIC VARYING COEFFICIENT PARTIALLY LINEAR ERRORS-IN-VARIABLES MODEL

TESTING SERIAL CORRELATION IN SEMIPARAMETRIC VARYING COEFFICIENT PARTIALLY LINEAR ERRORS-IN-VARIABLES MODEL Jrl Syst Sci & Complexity (2009) 22: 483 494 TESTIG SERIAL CORRELATIO I SEMIPARAMETRIC VARYIG COEFFICIET PARTIALLY LIEAR ERRORS-I-VARIABLES MODEL Xuemei HU Feng LIU Zhizhong WAG Received: 19 September

More information

18Ï È² 7( &: ÄuANOVAp.O`û5 571 Based on this ANOVA model representation, Sobol (1993) proposed global sensitivity index, S i1...i s = D i1...i s /D, w

18Ï È² 7( &: ÄuANOVAp.O`û5 571 Based on this ANOVA model representation, Sobol (1993) proposed global sensitivity index, S i1...i s = D i1...i s /D, w A^VÇÚO 1 Êò 18Ï 2013c12 Chinese Journal of Applied Probability and Statistics Vol.29 No.6 Dec. 2013 Optimal Properties of Orthogonal Arrays Based on ANOVA High-Dimensional Model Representation Chen Xueping

More information

Inference For High Dimensional M-estimates: Fixed Design Results

Inference For High Dimensional M-estimates: Fixed Design Results Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49

More information

Stochastic Convergence, Delta Method & Moment Estimators

Stochastic Convergence, Delta Method & Moment Estimators Stochastic Convergence, Delta Method & Moment Estimators Seminar on Asymptotic Statistics Daniel Hoffmann University of Kaiserslautern Department of Mathematics February 13, 2015 Daniel Hoffmann (TU KL)

More information

A note on mean testing for high dimensional multivariate data under non-normality

A note on mean testing for high dimensional multivariate data under non-normality A note on mean testing for high dimensional multivariate data under non-normality M. Rauf Ahmad, Dietrich von Rosen and Martin Singull Linköping University Post Print N.B.: When citing this work, cite

More information

A TWO SAMPLE TEST FOR HIGH DIMENSIONAL DATA WITH APPLICATIONS TO GENE-SET TESTING. By Song Xi Chen,Ying-LiQin Iowa State University

A TWO SAMPLE TEST FOR HIGH DIMENSIONAL DATA WITH APPLICATIONS TO GENE-SET TESTING. By Song Xi Chen,Ying-LiQin Iowa State University Submitted to the Annals of Statistics A TWO SAMPLE TEST FOR HIGH DIMENSIONAL DATA WITH APPLICATIONS TO GENE-SET TESTING By Song Xi Chen,Ying-LiQin Iowa State University We proposed a two sample test for

More information

4.1 Order Specification

4.1 Order Specification THE UNIVERSITY OF CHICAGO Booth School of Business Business 41914, Spring Quarter 2009, Mr Ruey S Tsay Lecture 7: Structural Specification of VARMA Models continued 41 Order Specification Turn to data

More information

The largest eigenvalues of the sample covariance matrix. in the heavy-tail case

The largest eigenvalues of the sample covariance matrix. in the heavy-tail case The largest eigenvalues of the sample covariance matrix 1 in the heavy-tail case Thomas Mikosch University of Copenhagen Joint work with Richard A. Davis (Columbia NY), Johannes Heiny (Aarhus University)

More information

BTRY 4090: Spring 2009 Theory of Statistics

BTRY 4090: Spring 2009 Theory of Statistics BTRY 4090: Spring 2009 Theory of Statistics Guozhang Wang September 25, 2010 1 Review of Probability We begin with a real example of using probability to solve computationally intensive (or infeasible)

More information

Thomas J. Fisher. Research Statement. Preliminary Results

Thomas J. Fisher. Research Statement. Preliminary Results Thomas J. Fisher Research Statement Preliminary Results Many applications of modern statistics involve a large number of measurements and can be considered in a linear algebra framework. In many of these

More information

STA205 Probability: Week 8 R. Wolpert

STA205 Probability: Week 8 R. Wolpert INFINITE COIN-TOSS AND THE LAWS OF LARGE NUMBERS The traditional interpretation of the probability of an event E is its asymptotic frequency: the limit as n of the fraction of n repeated, similar, and

More information