Jackknife Empirical Likelihood Test for Equality of Two High Dimensional Means


Ruodu Wang¹, Liang Peng¹ and Yongcheng Qi²

Abstract

Testing the equality of two multivariate means has a long history. One popular test is the so-called Hotelling $T^2$ test. However, as the dimension diverges, the Hotelling $T^2$ test performs poorly due to the possible inconsistency of the sample covariance estimation. To overcome this issue and allow the dimension to diverge as fast as possible, Bai and Saranadasa (1996) and Chen and Qin (2010) proposed tests that do not involve the sample covariance, and derived their asymptotic limits, which depend on whether the dimension is fixed or diverges under a specific multivariate model. In this paper, we propose a jackknife empirical likelihood test whose chi-square limit is independent of the dimension, under conditions much weaker than those of the existing methods. A simulation study shows that the proposed new test has a very robust size with respect to the dimension, and is powerful as well.

Keywords: Jackknife empirical likelihood, high dimensional mean, hypothesis test.

1 Introduction

Suppose $X_1=(X_{1,1},\ldots,X_{1,d})^T,\ldots,X_{n_1}=(X_{n_1,1},\ldots,X_{n_1,d})^T$ and $Y_1=(Y_{1,1},\ldots,Y_{1,d})^T,\ldots,Y_{n_2}=(Y_{n_2,1},\ldots,Y_{n_2,d})^T$ are two independent random samples with means $\mu_1$ and $\mu_2$, respectively. Testing $H_0:\mu_1=\mu_2$ against $H_a:\mu_1\neq\mu_2$ for a fixed dimension $d$ has a long history. When both $X_1$ and $Y_1$ have a multivariate normal distribution with equal covariance, the well-known test is the

¹ School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332-0160, USA. Email: ruodu.wang@math.gatech.edu, peng@math.gatech.edu
² Department of Mathematics and Statistics, University of Minnesota Duluth, 1117 University Drive, Duluth, MN 55812, USA. Email: yqi@d.umn.edu

so-called Hotelling $T^2$ test, defined as
$$T^2=\eta\,(\bar X-\bar Y)^TA_n^{-1}(\bar X-\bar Y), \qquad (1)$$
where $\eta=\frac{(n_1+n_2-2)n_1n_2}{n_1+n_2}$, $\bar X=n_1^{-1}\sum_{i=1}^{n_1}X_i$, $\bar Y=n_2^{-1}\sum_{i=1}^{n_2}Y_i$ and $A_n=\sum_{i=1}^{n_1}(X_i-\bar X)(X_i-\bar X)^T+\sum_{i=1}^{n_2}(Y_i-\bar Y)(Y_i-\bar Y)^T$. However, when $d=d(n_1,n_2)\to\infty$, the Hotelling $T^2$ test performs poorly due to the possible inconsistency of the sample covariance estimation. When $d/(n_1+n_2)\to c\in(0,1)$, Bai and Saranadasa (1996) derived the asymptotic power of $T^2$. To overcome the restriction $c<1$, Bai and Saranadasa (1996) proposed to employ
$$M_n=(\bar X-\bar Y)^T(\bar X-\bar Y)-\eta^{-1}\,\mathrm{tr}(A_n)$$
instead of $T^2$ under a special multivariate model, without assuming multivariate normality but keeping the condition of equal covariance, and derived the asymptotic limit when $d/n\to c>0$. Recently, Chen and Qin (2010) proposed to use the test statistic
$$T_{CQ}=\frac{\sum_{i\neq j}^{n_1}X_i^TX_j}{n_1(n_1-1)}+\frac{\sum_{i\neq j}^{n_2}Y_i^TY_j}{n_2(n_2-1)}-\frac{2\sum_{i=1}^{n_1}\sum_{j=1}^{n_2}X_i^TY_j}{n_1n_2} \qquad (2)$$
in order to allow $d$ to be of a possibly larger order than in Bai and Saranadasa (1996). Again, the asymptotic limit of $T_{CQ}$ depends on whether the dimension is fixed or diverges, which results in either a normal limit or a chi-square limit, and special models for $\{X_i\}$ and $\{Y_i\}$ are employed. Another modification of the Hotelling $T^2$ test was proposed by Srivastava and Du (2008) and Srivastava (2009), with the covariance matrix replaced by a diagonal matrix. Rates of convergence for high dimensional means are studied by Kuelbs and Vidyashankar (2010). For nonasymptotic studies of high dimensional means, we refer to Arlot, Blanchard and Roquain (2010a,b).

Here, we are interested in seeking a test which does not need to distinguish whether the dimension is fixed or diverges. Noting that $\mu_1=\mu_2$ is equivalent to $(\mu_1-\mu_2)^T(\mu_1-\mu_2)=0$, one may think of applying an empirical likelihood test to the estimating equation $E\{(X_{i_1}-Y_{j_1})^T(X_{i_2}-Y_{j_2})\}=0$ for $i_1\neq i_2$

and $j_1\neq j_2$. If one directly applies the empirical likelihood method based on estimating equations, proposed by Qin and Lawless (1994), to the samples $X_1,\ldots,X_{n_1}$ and $Y_1,\ldots,Y_{n_2}$, then one may define the empirical likelihood function as
$$\sup\Big\{\Big\{\prod_{i=1}^{n_1}(n_1p_i)\Big\}\Big\{\prod_{j=1}^{n_2}(n_2q_j)\Big\}:p_1\ge0,\ldots,p_{n_1}\ge0,\ q_1\ge0,\ldots,q_{n_2}\ge0,\ \sum_{i=1}^{n_1}p_i=1,\ \sum_{j=1}^{n_2}q_j=1,\ \sum_{i_1\neq i_2}^{n_1}\sum_{j_1\neq j_2}^{n_2}(p_{i_1}X_{i_1}-q_{j_1}Y_{j_1})^T(p_{i_2}X_{i_2}-q_{j_2}Y_{j_2})=0\Big\},$$
which makes the optimization unsolvable. The reason is that the estimating equation defines a nonlinear functional, and in general one has to linearize a nonlinear functional before applying the empirical likelihood method. For more details on empirical likelihood methods, we refer to Owen (2001) and the review paper of Chen and Van Keilegom (2009). Recently, Jing, Yuan and Zhou (2009) proposed a so-called jackknife empirical likelihood method to construct confidence regions for nonlinear functionals, with a particular focus on U-statistics. Using this idea, one would construct a jackknife sample based on the estimator
$$\frac{1}{n_1(n_1-1)n_2(n_2-1)}\sum_{i_1\neq i_2}^{n_1}\sum_{j_1\neq j_2}^{n_2}(X_{i_1}-Y_{j_1})^T(X_{i_2}-Y_{j_2}),$$
which equals the statistic $T_{CQ}$ given in (2). However, in order to have the jackknife empirical likelihood method work, one has to show that
$$\frac{\sqrt{n_1n_2}}{n_1(n_1-1)n_2(n_2-1)}\sum_{i_1\neq i_2}^{n_1}\sum_{j_1\neq j_2}^{n_2}(X_{i_1}-Y_{j_1})^T(X_{i_2}-Y_{j_2})$$
has a normal limit when $\mu_1=\mu_2$. Consider $n_1=n_2=n$, $d=1$ and $\mu_1=\mu_2$. Then it is easy to see that
$$\frac{1}{n(n-1)^2}\sum_{i_1\neq i_2}^{n}\sum_{j_1\neq j_2}^{n}(X_{i_1}-Y_{j_1})(X_{i_2}-Y_{j_2})=\frac{1}{n-1}\Big[\Big(\sum_{i=1}^nX_i\Big)^2+\Big(\sum_{j=1}^nY_j\Big)^2-\sum_{i=1}^nX_i^2-\sum_{j=1}^nY_j^2\Big]-\frac{2}{n}\sum_{i=1}^nX_i\sum_{j=1}^nY_j\xrightarrow{d}(\chi_1^2-1)E(X_1-Y_1)^2$$
as $n\to\infty$, where $\chi_1^2$ denotes a random variable having a chi-square distribution with one degree of freedom. Obviously, the above limit is not normally distributed. Hence a direct application of the jackknife empirical likelihood method to the statistic $T_{CQ}$ will not lead to a chi-square limit. In this paper, we propose a novel way to formulate a jackknife empirical likelihood test for testing $H_0:\mu_1=\mu_2$ against $H_a:\mu_1\neq\mu_2$ by dividing the samples into two parts.
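The classical statistics (1) and (2) discussed above are straightforward to compute. The following sketch illustrates them in NumPy; the helper names `hotelling_T2` and `T_CQ` are ours, not from the paper, and the $i\neq j$ sums in (2) are evaluated via the sum-minus-diagonal identity.

```python
import numpy as np

def hotelling_T2(X, Y):
    """Hotelling T^2 statistic (1); rows of X (n1 x d) and Y (n2 x d) are observations."""
    n1, n2 = len(X), len(Y)
    eta = (n1 + n2 - 2) * n1 * n2 / (n1 + n2)
    diff = X.mean(axis=0) - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    A = Xc.T @ Xc + Yc.T @ Yc            # pooled sum-of-squares matrix A_n
    return eta * diff @ np.linalg.solve(A, diff)

def T_CQ(X, Y):
    """Chen-Qin statistic (2); the i != j sums are full sums minus the diagonal."""
    n1, n2 = len(X), len(Y)
    Gx, Gy = X @ X.T, Y @ Y.T
    t1 = (Gx.sum() - np.trace(Gx)) / (n1 * (n1 - 1))
    t2 = (Gy.sum() - np.trace(Gy)) / (n2 * (n2 - 1))
    t3 = 2.0 * (X @ Y.T).sum() / (n1 * n2)
    return t1 + t2 - t3
```

Note that `T_CQ` avoids forming the $O(n_1^2 n_2^2)$ quadruple sum of the equivalent U-statistic representation above.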
The proposed new test

has no need to distinguish whether the dimension is fixed or goes to infinity. It turns out that the asymptotic limit of the new test under $H_0$ is a chi-square limit independent of the dimension, and the conditions on $d$ and on the random vectors $\{X_i\}$ and $\{Y_j\}$ are weaker too. A simulation study shows that the size of the new test is quite stable with respect to the dimension and that the proposed test is powerful as well.

We organize this paper as follows. In Section 2, the new methodology and main results are given. Section 3 presents a simulation study and a real data analysis. All proofs are put in Section 4.

2 Methodology

Throughout, assume $X_i=(X_{i,1},\ldots,X_{i,d})^T$ for $i=1,\ldots,n_1$ and $Y_j=(Y_{j,1},\ldots,Y_{j,d})^T$ for $j=1,\ldots,n_2$ are two independent random samples with means $\mu_1$ and $\mu_2$, respectively, and that $\min\{n_1,n_2\}$ goes to infinity. The question is to test $H_0:\mu_1=\mu_2$ against $H_a:\mu_1\neq\mu_2$. Since $\mu_1=\mu_2$ is equivalent to $(\mu_1-\mu_2)^T(\mu_1-\mu_2)=0$ and $E(X_{i_1}-Y_{j_1})^T(X_{i_2}-Y_{j_2})=(\mu_1-\mu_2)^T(\mu_1-\mu_2)$ for $i_1\neq i_2$ and $j_1\neq j_2$, we propose to apply the jackknife empirical likelihood method to this estimating equation. As explained in the introduction, a direct application fails to have a chi-square limit. Here we propose to split the samples into two groups as follows. Put $m_1=[n_1/2]$, $m_2=[n_2/2]$, $m=m_1+m_2$, $\bar X_i=X_{i+m_1}$ for $i=1,\ldots,m_1$, and $\bar Y_j=Y_{j+m_2}$ for $j=1,\ldots,m_2$. First we propose to estimate $(\mu_1-\mu_2)^T(\mu_1-\mu_2)$ by
$$\frac{1}{m_1m_2}\sum_{i=1}^{m_1}\sum_{j=1}^{m_2}(X_i-Y_j)^T(\bar X_i-\bar Y_j), \qquad (3)$$
which is less efficient than the statistic $T_{CQ}$. However, it allows us to add more estimating equations and to employ the empirical likelihood method without estimating the asymptotic covariance. Noting that $E\{(X_i-Y_j)^T(\bar X_i-\bar Y_j)\}=(\mu_1-\mu_2)^T(\mu_1-\mu_2)=\|\mu_1-\mu_2\|^2$ instead of $O(\|\mu_1-\mu_2\|)$, one may expect that a test based on (3) alone will not be powerful for a small value of $\|\mu_1-\mu_2\|$, which is confirmed by a brief simulation study. In order to improve the power, we propose to apply the jackknife empirical

likelihood method of Jing, Yuan and Zhou (2009) to both (3) and a linear functional such as
$$\frac{1}{m_1m_2}\sum_{i=1}^{m_1}\sum_{j=1}^{m_2}\{\alpha^T(X_i-Y_j)+\alpha^T(\bar X_i-\bar Y_j)\} \qquad (4)$$
rather than (3) only, where $\alpha\in\mathbb{R}^d$ is a vector chosen based on prior information. Theoretically, when no additional information is available, any linear functional is a possible choice, and $\alpha=(1,\ldots,1)^T\in\mathbb{R}^d$ is a convenient one. Note that more linear functionals of the form (4) with different choices of $\alpha$ can be added to further improve the power. In the simulation study we apply the jackknife empirical likelihood to (3) and (4) with $\alpha=(1,\ldots,1)^T\in\mathbb{R}^d$, which results in a test with good power and quite robust size with respect to the dimension.

As in Jing, Yuan and Zhou (2009), based on (3) and (4), we formulate the jackknife sample as $Z_k=(Z_{k,1},Z_{k,2})^T$ for $k=1,\ldots,m$, where, for $k=1,\ldots,m_1$,
$$Z_{k,1}=\frac{m_1+m_2}{m_1m_2}\sum_{i=1}^{m_1}\sum_{j=1}^{m_2}(X_i-Y_j)^T(\bar X_i-\bar Y_j)-\frac{m_1+m_2-1}{(m_1-1)m_2}\sum_{i=1,\,i\neq k}^{m_1}\sum_{j=1}^{m_2}(X_i-Y_j)^T(\bar X_i-\bar Y_j),$$
$$Z_{k,2}=\frac{m_1+m_2}{m_1m_2}\sum_{i=1}^{m_1}\sum_{j=1}^{m_2}\{\alpha^T(X_i-Y_j)+\alpha^T(\bar X_i-\bar Y_j)\}-\frac{m_1+m_2-1}{(m_1-1)m_2}\sum_{i=1,\,i\neq k}^{m_1}\sum_{j=1}^{m_2}\{\alpha^T(X_i-Y_j)+\alpha^T(\bar X_i-\bar Y_j)\},$$
and, for $k=m_1+1,\ldots,m$,
$$Z_{k,1}=\frac{m_1+m_2}{m_1m_2}\sum_{i=1}^{m_1}\sum_{j=1}^{m_2}(X_i-Y_j)^T(\bar X_i-\bar Y_j)-\frac{m_1+m_2-1}{m_1(m_2-1)}\sum_{i=1}^{m_1}\sum_{j=1,\,j\neq k-m_1}^{m_2}(X_i-Y_j)^T(\bar X_i-\bar Y_j),$$
$$Z_{k,2}=\frac{m_1+m_2}{m_1m_2}\sum_{i=1}^{m_1}\sum_{j=1}^{m_2}\{\alpha^T(X_i-Y_j)+\alpha^T(\bar X_i-\bar Y_j)\}-\frac{m_1+m_2-1}{m_1(m_2-1)}\sum_{i=1}^{m_1}\sum_{j=1,\,j\neq k-m_1}^{m_2}\{\alpha^T(X_i-Y_j)+\alpha^T(\bar X_i-\bar Y_j)\}.$$
Based on this jackknife sample, the jackknife empirical likelihood ratio function for testing $H_0:\mu_1=\mu_2$ is defined as
$$L_m=\sup\Big\{\prod_{i=1}^m(mp_i):p_1\ge0,\ldots,p_m\ge0,\ \sum_{i=1}^mp_i=1,\ \sum_{i=1}^mp_iZ_i=(0,0)^T\Big\}.$$
By the Lagrange multiplier technique, we have $p_i=\frac{1}{m}\{1+\beta^TZ_i\}^{-1}$ for $i=1,\ldots,m$ and
$$l_m:=-2\log L_m=2\sum_{i=1}^m\log\{1+\beta^TZ_i\},$$

where $\beta$ satisfies
$$\frac{1}{m}\sum_{i=1}^m\frac{Z_i}{1+\beta^TZ_i}=(0,0)^T. \qquad (5)$$
Write $\Sigma=(\sigma_{i,j})_{1\le i\le d,\,1\le j\le d}=E\{(X_1-\mu_1)(X_1-\mu_1)^T\}$, the covariance matrix of $X_1$, and use $\lambda_1\le\cdots\le\lambda_d$ to denote the $d$ eigenvalues of the matrix $\Sigma$. Similarly, write $\tilde\Sigma=(\tilde\sigma_{i,j})_{1\le i\le d,\,1\le j\le d}=E\{(Y_1-\mu_2)(Y_1-\mu_2)^T\}$ and use $\tilde\lambda_1\le\cdots\le\tilde\lambda_d$ to denote the $d$ eigenvalues of the matrix $\tilde\Sigma$. Also write
$$\rho_1=\sum_{i,j=1}^d\sigma_{i,j}^2=\mathrm{tr}(\Sigma^2),\quad \rho_2=\sum_{i,j=1}^d\tilde\sigma_{i,j}^2=\mathrm{tr}(\tilde\Sigma^2),\quad \tau_1=2\alpha^T\Sigma\alpha,\quad \tau_2=2\alpha^T\tilde\Sigma\alpha. \qquad (6)$$
Here $\mathrm{tr}$ denotes the trace of a matrix. Note that $\rho_1=E[(X_1-\mu_1)^T(X_2-\mu_1)]^2$, $\rho_2=E[(Y_1-\mu_2)^T(Y_2-\mu_2)]^2$, $\tau_1=2E[\alpha^T(X_1-\mu_1)]^2$ and $\tau_2=2E[\alpha^T(Y_1-\mu_2)]^2$, and these quantities may depend on $n_1,n_2$ since $d$ may depend on $n_1,n_2$.

Theorem 1. Assume $\min\{n_1,n_2\}\to\infty$, $\tau_1$ and $\tau_2$ in (6) are positive, and for some $\delta>0$,
$$\frac{E|(X_1-\mu_1)^T(\bar X_1-\mu_1)|^{2+\delta}}{\rho_1^{(2+\delta)/2}}=o\big(m_1^{\frac{\delta+\min(\delta,2)}{4}}\big), \qquad (7)$$
$$\frac{E|(Y_1-\mu_2)^T(\bar Y_1-\mu_2)|^{2+\delta}}{\rho_2^{(2+\delta)/2}}=o\big(m_2^{\frac{\delta+\min(\delta,2)}{4}}\big), \qquad (8)$$
$$\frac{E|\alpha^T(X_1+\bar X_1-2\mu_1)|^{2+\delta}}{\tau_1^{(2+\delta)/2}}=o\big(m_1^{\frac{\delta+\min(\delta,2)}{4}}\big), \qquad (9)$$
and
$$\frac{E|\alpha^T(Y_1+\bar Y_1-2\mu_2)|^{2+\delta}}{\tau_2^{(2+\delta)/2}}=o\big(m_2^{\frac{\delta+\min(\delta,2)}{4}}\big). \qquad (10)$$
Then, under $H_0:\mu_1=\mu_2$, $l_m$ converges in distribution to a chi-square distribution with two degrees of freedom as $\min\{n_1,n_2\}\to\infty$.

Based on the above theorem, one can test $H_0:\mu_1=\mu_2$ against $H_a:\mu_1\neq\mu_2$ by rejecting $H_0$ when $l_m\ge\chi^2_{2,\gamma}$, where $\chi^2_{2,\gamma}$ denotes the $(1-\gamma)$-quantile of the chi-square distribution with two degrees of freedom and $\gamma$ is the significance level.
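The whole procedure is easy to implement. The sketch below is ours, not the authors' code: `jackknife_sample` builds the pseudo-values $Z_k=mT-(m-1)T_{(-k)}$ from (3) and (4), and `jel_statistic` solves (5) by Newton's method (assuming the origin lies inside the convex hull of the $Z_i$). For the decision rule, the $(1-\gamma)$-quantile of $\chi^2_2$ has the closed form $-2\log\gamma$.

```python
import numpy as np

def jackknife_sample(X, Y, alpha):
    """Pseudo-values Z_k = (Z_{k,1}, Z_{k,2}) from (3) and (4); a sketch.

    The first halves (X1, Y1) are paired with the second halves (Xb, Yb);
    deleting index k removes the pair (X_k, Xbar_k) or (Y_k, Ybar_k)."""
    m1, m2 = len(X) // 2, len(Y) // 2
    X1, Xb = X[:m1], X[m1:2 * m1]
    Y1, Yb = Y[:m2], Y[m2:2 * m2]
    m = m1 + m2
    # u[i, j] = (X_i - Y_j)^T (Xbar_i - Ybar_j); v[i, j] = the linear functional in (4)
    u = (np.einsum('id,id->i', X1, Xb)[:, None] - X1 @ Yb.T
         - Xb @ Y1.T + np.einsum('jd,jd->j', Y1, Yb)[None, :])
    v = ((X1 + Xb) @ alpha)[:, None] - ((Y1 + Yb) @ alpha)[None, :]
    Z = np.empty((m, 2))
    for stat, col in ((u, 0), (v, 1)):
        S = stat.sum()
        T = S / (m1 * m2)                       # the full-sample statistic
        Z[:m1, col] = m * T - (m - 1) * (S - stat.sum(axis=1)) / ((m1 - 1) * m2)
        Z[m1:, col] = m * T - (m - 1) * (S - stat.sum(axis=0)) / (m1 * (m2 - 1))
    return Z

def jel_statistic(Z, iters=50):
    """Solve (5) for beta by Newton's method; return l_m = 2*sum(log(1 + beta^T Z_i))."""
    m, p = Z.shape
    beta = np.zeros(p)
    for _ in range(iters):
        w = 1.0 + Z @ beta
        grad = (Z / w[:, None]).sum(axis=0) / m       # left-hand side of (5)
        hess = -(Z / w[:, None] ** 2).T @ Z / m       # Jacobian of (5) in beta
        beta = beta - np.linalg.solve(hess, grad)
    return 2.0 * np.sum(np.log1p(Z @ beta))

def jel_test(X, Y, alpha, gamma=0.05):
    """Reject H0: mu1 = mu2 when l_m >= chi^2_{2,gamma} = -2*log(gamma)."""
    return jel_statistic(jackknife_sample(X, Y, alpha)) >= -2.0 * np.log(gamma)
```

A useful sanity check is the jackknife identity that the average of the pseudo-values equals the original estimator.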

Remark. In equations (7) (0), the restrictions are put on E W 2+δ /(EW 2 ) (2+δ)/2 for some random variables W, which are necessary for the CL to hold for random arrays. Later we will see those conditions are satisfied by imposing some conditions on the higher-order moments or special dependence structure. Remark 2. he proposed test has the following merits:. he limiting distribution is always chi-square without estimating the asymptotic covariance. 2. It does not require any specific structure such as the one used in Bai and Saranadasa (996) and Chen and Qin (200), which will be discussed later. 3. With higher-order moment condition or special dependence structure of {X i } and {Y i }, d can be very large. 4. here is no restriction imposed on the relation between n and n 2 except that min{n, n 2 }. hat is, no need to assume a limit or bound on the ratio n /n 2. Moreover, no assumptions are needed on ρ /ρ 2 or τ /τ 2. Hence the covariance matrices Σ and Σ 2 can be arbitrary as long as τ,τ 2 > 0, which are simply equivalent to saying that both α X and α Y are non-degenerate random variables. Next we verify heorem by imposing conditions on the moments and dimension of the random vectors. A: 0 < lim inf λ lim sup n n λ d < and 0 < lim inf n 2 λ lim sup λ d < ; n 2 A2: For some δ > 0, d d E{ X,i µ,i 2+δ + Y,i µ 2,i 2+δ } = O(); A3: d = o(m δ+min(δ,2) 2(2+δ) ). Corollary. Assume min{n, n 2 } and conditions A A3 hold. hen, under H 0 : µ = µ 2, conditions (7) (0) are satisfied, i.e., heorem holds. 7

Condition A3 is a somewhat restrictive condition on the dimension $d$. Note that conditions A1 and A2 are related only to the covariance matrices and some higher moments of the components of the random vectors. The higher the moments we have, the less restriction is imposed on $d$. Condition A3 can be removed for models with some special dependence structures. For comparison, we prove the Wilks theorem for the proposed jackknife empirical likelihood test under the following model B1, considered by Bai and Saranadasa (1996), Chen, Peng and Qin (2009) and Chen and Qin (2010):

B1 (Factor model). $X_i=\Gamma_1B_i+\mu_1$ for $i=1,\ldots,n_1$ and $Y_j=\Gamma_2\tilde B_j+\mu_2$ for $j=1,\ldots,n_2$, where $\Gamma_1,\Gamma_2$ are $d\times k$ matrices with $\Gamma_1\Gamma_1^T=\Sigma$, $\Gamma_2\Gamma_2^T=\tilde\Sigma$, and $\{B_i=(B_{i,1},\ldots,B_{i,k})^T\}_{i=1}^{n_1}$ and $\{\tilde B_j=(\tilde B_{j,1},\ldots,\tilde B_{j,k})^T\}_{j=1}^{n_2}$ are two independent random samples satisfying $EB_i=E\tilde B_i=0$, $\mathrm{Var}(B_i)=\mathrm{Var}(\tilde B_i)=I_{k\times k}$, $EB_{i,j}^4=3+\xi_1<\infty$, $E\tilde B_{i,j}^4=3+\xi_2<\infty$, and $E\prod_{l=1}^kB_{i,l}^{\nu_l}=\prod_{l=1}^kEB_{i,l}^{\nu_l}$ and $E\prod_{l=1}^k\tilde B_{i,l}^{\nu_l}=\prod_{l=1}^kE\tilde B_{i,l}^{\nu_l}$ whenever $\nu_1+\cdots+\nu_k=4$ for distinct nonnegative integers $\nu_l$'s.

Theorem 2. Assume $\min\{n_1,n_2\}\to\infty$, and $\tau_1$ and $\tau_2$ in (6) are positive. Under model B1 and $H_0:\mu_1=\mu_2$, $l_m$ converges in distribution to a chi-square distribution with two degrees of freedom as $\min\{n_1,n_2\}\to\infty$.

Remark 3. It can be seen from the proof of Theorem 2 that the assumptions $EB_{i,j}^4=3+\xi_1<\infty$ and $E\tilde B_{i,j}^4=3+\xi_2<\infty$ in model B1 can be replaced by the much weaker conditions $\max_{1\le j\le k}EB_{1,j}^4=o(m)$ and $\max_{1\le j\le k}E\tilde B_{1,j}^4=o(m)$. Unlike Bai and Saranadasa (1996) and Chen and Qin (2010), there is no restriction on $d$ and $k$ for our proposed method. The only constraint imposed on the matrices $\Gamma_1$ and $\Gamma_2$ is that both $\alpha^T\Sigma\alpha$ and $\alpha^T\tilde\Sigma\alpha$ are positive, which is very weak.

3 Simulation study and data analysis

3.1 Simulation study

We investigate the finite sample behavior of the proposed jackknife empirical likelihood test (JEL) and compare it with the test statistic in (2) proposed by Chen and Qin (2010) in terms of both size

and power. Let $W_1,\ldots,W_d$ be iid random variables with the standard normal distribution function, and let $\tilde W_1,\ldots,\tilde W_d$, independent of the $W_i$'s, be iid random variables with distribution function $t(8)$. Put
$$X_{1,1}=W_1,\quad X_{1,2}=W_1+W_2,\quad\ldots,\quad X_{1,d}=W_{d-1}+W_d,$$
$$Y_{1,1}=\tilde W_1+\mu_{2,1},\quad Y_{1,2}=\tilde W_1+\tilde W_2+\mu_{2,2},\quad\ldots,\quad Y_{1,d}=\tilde W_{d-1}+\tilde W_d+\mu_{2,d},$$
where $\mu_{2,i}=c_1$ if $i\le[c_2d]$, and $\mu_{2,i}=0$ if $i>[c_2d]$. That is, $100c_2\%$ of the components of $Y$ have a shifted mean compared to those of $X$. Since we test $H_0:EX=EY$ against $H_a:EX\neq EY$, the case $c_1=0$ gives the size of the tests. By drawing 1,000 random samples of sizes $n_1=30,100,150$ from $X=(X_{1,1},\ldots,X_{1,d})^T$ and independently drawing 1,000 random samples of sizes $n_2=30,100,200$ from $Y=(Y_{1,1},\ldots,Y_{1,d})^T$ with $d=10,20,\ldots,100,300,500$, $c_1=0,0.1$ and $c_2=0.25,0.75$, we calculate the powers of the two tests mentioned above. In Tables 1–3, we report the empirical sizes and powers of the proposed jackknife empirical likelihood test with $\alpha=(1,\ldots,1)^T\in\mathbb{R}^d$ and the test in Chen and Qin (2010) at level 5%. Results for level 10% are similar. From these three tables, we observe that (i) the sizes of both tests, i.e., the results for $c_1=0$, are comparable and quite stable with respect to the dimension $d$; (ii) the proposed jackknife empirical likelihood test is more powerful than the test in Chen and Qin (2010) in the case $c_2=0.75$, but less powerful in the case when the data is sparse and $d$ is large (i.e., the case $c_1=0.1$, $c_2=0.25$). Since equation (4) has nothing to do with sparsity, it is expected that the proposed jackknife empirical likelihood method is not powerful when the data is sparse. Hence, it would be of interest to connect sparsity with some estimating equations so as to improve the power of the proposed jackknife empirical likelihood test. In conclusion, the proposed jackknife empirical likelihood test has a very stable size with respect to the dimension and is powerful as well.
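The data-generating design above can be sketched as follows; the function name and layout are ours, not from the paper.

```python
import numpy as np

def simulate_sample(n1, n2, d, c1, c2, rng):
    """One draw from the Section 3.1 design (a sketch).

    X_{,1} = W_1 and X_{,j} = W_{j-1} + W_j with W_j iid N(0,1); Y is built
    the same way from iid t(8) variables, plus the mean shift mu_2."""
    def moving_sum(W):
        V = np.empty_like(W)
        V[:, 0] = W[:, 0]
        V[:, 1:] = W[:, :-1] + W[:, 1:]
        return V
    X = moving_sum(rng.standard_normal((n1, d)))
    mu2 = np.where(np.arange(1, d + 1) <= int(c2 * d), c1, 0.0)
    Y = moving_sum(rng.standard_t(df=8, size=(n2, d))) + mu2
    return X, Y
```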
Moreover, the new test is easy to compute, is flexible enough to take other information into account, and works for both fixed and divergent dimensions. In comparison with the test in Chen and Qin (2010), the new test has a comparable size but is more powerful when

the data is not very sparse. Some further research on formulating sparsity into estimating equations will be pursued in the future.

3.2 Data analysis

In this subsection we apply the proposed method to the colon data, which contains 2000 gene expression levels on 22 ($n_1$) normal colon tissues and 40 ($n_2$) tumor colon tissues. This data set is available from the R package plsgenomics, and has been analyzed by Alon et al. (1999) and Srivastava and Du (2009). The p-values of the three tests proposed by Srivastava and Du (2009) equal 1.38e-06, 4.48e-11 and 0.00000, which clearly reject the null hypothesis that the tumor group has the same gene expression levels as the normal group. A direct application of the proposed jackknife empirical likelihood method and the $T_{CQ}$ test for testing the equality of means gives p-values 1.36e-01 and 5.06e-09, respectively, which clearly results in a contradiction. Although the test in Chen and Qin (2010) and the three tests in Srivastava and Du (2009) clearly reject the null hypothesis, their p-values are significantly different. A closer look at the differences of sample means (see Figure 1) shows that some genes have a significant difference of sample means and a high variability, which may play an important role in the $T_{CQ}$ test and in equation (4) with $\alpha=(1,\ldots,1)^T\in\mathbb{R}^d$ of the proposed jackknife empirical likelihood method. To examine this effect, we apply the methods to those genes without a significant difference in sample means, and to the logarithms of the 2000 gene expression levels. First we apply the proposed jackknife empirical likelihood method and the $T_{CQ}$ test to those genes satisfying $|n_1^{-1}\sum_{i=1}^{n_1}X_{i,j}-n_2^{-1}\sum_{i=1}^{n_2}Y_{i,j}|\le c_3$ for some given threshold $c_3>0$. In Table 4, we report the p-values of these two methods for different $c_3$, which confirm that some genes play an important role in rejecting the equality of means in the $T_{CQ}$ test.
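The thresholding step can be implemented as a simple column filter; `filter_genes` is a hypothetical name and this sketch assumes the genes are stored as columns.

```python
import numpy as np

def filter_genes(X, Y, c3):
    """Keep only the genes (columns) whose absolute difference of sample
    means is at most c3, as in the Table 4 analysis (a sketch)."""
    keep = np.abs(X.mean(axis=0) - Y.mean(axis=0)) <= c3
    return X[:, keep], Y[:, keep]
```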
Figure 1 clearly shows that some genes have a large positive mean difference and some genes have a large negative one, a characteristic that equation (4) with the simple choice $\alpha=(1,1,\ldots,1)^T$ cannot capture.

Here, we propose to replace (4) by
$$\frac{1}{m_1m_2}\sum_{i=1}^{m_1}\sum_{j=1}^{m_2}(X_i-Y_j)^T\{I(\bar X_i-\bar Y_j>0)-I(\bar X_i-\bar Y_j<0)\}, \qquad (11)$$
where the indicator functions are applied componentwise. This results in the p-value 2.63e-03, and so we reject the null hypothesis that the tumor group has the same gene expression levels as the normal group. Although the derived theorems based on (3) and (4) cannot be applied to (3) and (11) directly, the results hold under some similar conditions by noting the boundedness of $I(\bar X_i-\bar Y_j>0)-I(\bar X_i-\bar Y_j<0)$.

It is well known that gene expression data are quite noisy and some transformation and normalization are needed before doing statistical analysis; see Chapter 6 of Lee (2004). Here we apply the $T_{CQ}$ test and the proposed empirical likelihood methods based on (3) and (4), and on (3) and (11), to testing the equality of means of the logarithms of the 2000 gene expression levels on normal colon tissues and tumor colon tissues, which gives p-values 0.184, 0.206 and 0.148, respectively. That is, one cannot reject the equality of means of the logarithms of the 2000 gene expression levels. We plot the differences of sample means of the logarithms in Figure 2, which are much less volatile than the differences of sample means in Figure 1. In summary, a careful choice of $\alpha$ in the empirical likelihood method is needed when it is applied to the colon data, which has a small sample size and a large variation. However, simply choosing $\alpha=(1,\ldots,1)^T$ in the empirical likelihood method gives a similar result to the test in Chen and Qin (2010) for testing the equality of means of the logarithms of the colon data.

4 Proofs

In the proofs we use $\|\cdot\|$ to denote the $L_2$ norm of a vector or matrix. Since $\mu_1-\mu_2$ is our target and $\mu_1-\mu_2=0$ under the null hypothesis, without loss of generality we assume $\mu_1=\mu_2=0$. Write $u_{ij}=(X_i-Y_j)^T(\bar X_i-\bar Y_j)$ and $v_{ij}=\alpha^T(X_i-Y_j)+\alpha^T(\bar X_i-\bar Y_j)$ for $1\le i\le m_1$, $1\le j\le m_2$. Then it

is easily verified that for $1\le i,k\le m_1$ and $1\le j,l\le m_2$, $E(u_{ij})=E(v_{kl})=E(u_{ij}v_{kl})=0$, and
$$\mathrm{Var}(u_{kl})=\sum_{i,j=1}^d(\sigma_{i,j}+\tilde\sigma_{i,j})^2=\mathrm{tr}\{(\Sigma+\tilde\Sigma)^2\},\qquad \mathrm{Var}(v_{kl})=2\alpha^T(\Sigma+\tilde\Sigma)\alpha=\tau_1+\tau_2.$$

Lemma 1. Under the conditions of Theorem 1, we have as $\min\{n_1,n_2\}\to\infty$
$$\frac{1}{\sqrt{m_1\rho_1}}\sum_{i=1}^{m_1}X_i^T\bar X_i\xrightarrow{d}N(0,1), \qquad (12)$$
$$\frac{1}{\sqrt{m_2\rho_2}}\sum_{j=1}^{m_2}Y_j^T\bar Y_j\xrightarrow{d}N(0,1), \qquad (13)$$
$$\frac{1}{\sqrt{m_1\tau_1}}\sum_{i=1}^{m_1}\alpha^T(X_i+\bar X_i)\xrightarrow{d}N(0,1), \qquad (14)$$
$$\frac{1}{\sqrt{m_2\tau_2}}\sum_{j=1}^{m_2}\alpha^T(Y_j+\bar Y_j)\xrightarrow{d}N(0,1). \qquad (15)$$

Proof. Since $\mathrm{Var}(X_i^T\bar X_i)=\rho_1$ and $X_1^T\bar X_1,\ldots,X_{m_1}^T\bar X_{m_1}$ are i.i.d. for fixed $m_1$, equation (12) follows from (7) and the Lyapunov central limit theorem. The rest can be shown in the same way.

From now on we denote $\bar\rho=\frac{m}{m_1}\rho_1+\frac{m}{m_2}\rho_2$ and $\bar\tau=\frac{m}{m_1}\tau_1+\frac{m}{m_2}\tau_2$.

Lemma 2. Under the conditions of Theorem 1, we have as $\min\{n_1,n_2\}\to\infty$
$$\frac{\sqrt m}{m_1m_2\sqrt{\bar\rho}}\sum_{i=1}^{m_1}X_i^T\sum_{j=1}^{m_2}\bar Y_j\xrightarrow{p}0, \qquad (16)$$
$$\frac{\sqrt m}{m_1\sqrt{m_2\bar\tau}}\sum_{i=1}^{m_1}\alpha^T(X_i+\bar X_i)\xrightarrow{p}0, \qquad (17)$$

and
$$\frac{\sqrt m}{m_2\sqrt{m_1\bar\tau}}\sum_{j=1}^{m_2}\alpha^T(Y_j+\bar Y_j)\xrightarrow{p}0, \qquad (18)$$
$$\frac{1}{m_1\rho_1}\sum_{i=1}^{m_1}(X_i^T\bar X_i)^2\xrightarrow{p}1, \qquad (19)$$
$$\frac{1}{m_2\rho_2}\sum_{j=1}^{m_2}(Y_j^T\bar Y_j)^2\xrightarrow{p}1, \qquad (20)$$
$$\frac{1}{m_1\tau_1}\sum_{i=1}^{m_1}[\alpha^T(X_i+\bar X_i)]^2\xrightarrow{p}1, \qquad (21)$$
$$\frac{1}{m_2\tau_2}\sum_{j=1}^{m_2}[\alpha^T(Y_j+\bar Y_j)]^2\xrightarrow{p}1, \qquad (22)$$
$$\frac{1}{m_1\sqrt{\rho_1\tau_1}}\sum_{i=1}^{m_1}X_i^T\bar X_i\,[\alpha^T(X_i+\bar X_i)]\xrightarrow{p}0, \qquad (23)$$
$$\frac{1}{m_2\sqrt{\rho_2\tau_2}}\sum_{j=1}^{m_2}Y_j^T\bar Y_j\,[\alpha^T(Y_j+\bar Y_j)]\xrightarrow{p}0. \qquad (24)$$

Proof. Note that $\mu_1=\mu_2=0$ is assumed in Section 4. Then (16) follows from the fact that
$$\mathrm{Var}\Big(\frac{\sqrt m}{m_1m_2\sqrt{\bar\rho}}\sum_{i=1}^{m_1}X_i^T\sum_{j=1}^{m_2}\bar Y_j\Big)=\frac{m}{m_1^2m_2^2\bar\rho}E\Big(\sum_{i=1}^{m_1}\sum_{j=1}^{m_2}X_i^T\bar Y_j\Big)^2=\frac{m}{m_1m_2\bar\rho}E(X_1^T\bar Y_1)^2=\frac{m}{m_1m_2\bar\rho}\sum_{i,j=1}^d\sigma_{i,j}\tilde\sigma_{i,j}\le\frac{m(\rho_1+\rho_2)}{2m_1m_2\bar\rho}\le\frac{1}{2m_1}+\frac{1}{2m_2}=o(1).$$
In the same way, we can show (17) and (18).

To show (19), write $u_i=X_i^T\bar X_i$. We need to estimate $E|\sum_{i=1}^{m_1}u_i^2-m_1\rho_1|^{(2+\delta)/2}$. Note that $\rho_1=Eu_1^2$. When $0<\delta\le2$, it follows from von Bahr and Esseen (1965) that
$$E\Big|\sum_{i=1}^{m_1}u_i^2-m_1\rho_1\Big|^{(2+\delta)/2}\le2m_1E|u_1^2-E(u_1^2)|^{(2+\delta)/2}=O(m_1E|u_1|^{2+\delta}). \qquad (25)$$
When $\delta>2$, it follows from Dharmadhikari and Jogdeo (1969) that
$$E\Big|\sum_{i=1}^{m_1}u_i^2-m_1\rho_1\Big|^{(2+\delta)/2}=O\big(m_1^{(2+\delta)/4}E|u_1^2-E(u_1^2)|^{(2+\delta)/2}\big)=O\big(m_1^{(2+\delta)/4}E|u_1|^{2+\delta}\big). \qquad (26)$$
Therefore, by (25), (26) and (7), we have for any $\varepsilon>0$
$$P\Big(\Big|\frac{1}{m_1\rho_1}\sum_{i=1}^{m_1}u_i^2-1\Big|>\varepsilon\Big)\le\varepsilon^{-(2+\delta)/2}\,\frac{E|\sum_{i=1}^{m_1}u_i^2-m_1\rho_1|^{(2+\delta)/2}}{(m_1\rho_1)^{(2+\delta)/2}}=O\Big(m_1^{-\frac{\delta+\min(\delta,2)}{4}}E\Big|\frac{u_1}{\sqrt{\rho_1}}\Big|^{2+\delta}\Big)=o(1),$$
which implies (19). The rest can be shown in the same way.

Lemma 3. Under the conditions of Theorem 1, we have as $\min\{n_1,n_2\}\to\infty$
$$\Big(\frac{\sqrt m}{m_1m_2\sqrt{\bar\rho}}\sum_{i=1}^{m_1}\sum_{j=1}^{m_2}u_{ij},\ \frac{\sqrt m}{m_1m_2\sqrt{\bar\tau}}\sum_{i=1}^{m_1}\sum_{j=1}^{m_2}v_{ij}\Big)^T\xrightarrow{d}N(0,I_2), \qquad (27)$$
$$\frac{m}{m_1^2m_2^2\bar\rho}\sum_{k=1}^{m_1}\Big(\sum_{j=1}^{m_2}u_{kj}\Big)^2-\frac{m\rho_1}{m_1\bar\rho}\xrightarrow{p}0, \qquad (28)$$
$$\frac{m}{m_1^2m_2^2\bar\rho}\sum_{k=1}^{m_2}\Big(\sum_{i=1}^{m_1}u_{ik}\Big)^2-\frac{m\rho_2}{m_2\bar\rho}\xrightarrow{p}0, \qquad (29)$$
$$\frac{m}{m_1^2m_2^2\bar\tau}\sum_{k=1}^{m_1}\Big(\sum_{j=1}^{m_2}v_{kj}\Big)^2-\frac{m\tau_1}{m_1\bar\tau}\xrightarrow{p}0, \qquad (30)$$
$$\frac{m}{m_1^2m_2^2\bar\tau}\sum_{k=1}^{m_2}\Big(\sum_{i=1}^{m_1}v_{ik}\Big)^2-\frac{m\tau_2}{m_2\bar\tau}\xrightarrow{p}0, \qquad (31)$$

$$\frac{m}{m_1^2m_2^2\sqrt{\bar\rho\bar\tau}}\sum_{k=1}^{m_1}\Big(\sum_{j=1}^{m_2}u_{kj}\Big)\Big(\sum_{j=1}^{m_2}v_{kj}\Big)\xrightarrow{p}0, \qquad (32)$$
$$\frac{m}{m_1^2m_2^2\sqrt{\bar\rho\bar\tau}}\sum_{k=1}^{m_2}\Big(\sum_{i=1}^{m_1}u_{ik}\Big)\Big(\sum_{i=1}^{m_1}v_{ik}\Big)\xrightarrow{p}0, \qquad (33)$$
where $I_2$ is the $2\times2$ identity matrix.

Proof. It follows from Lemma 2 that
$$\frac{\sqrt m}{m_1m_2\sqrt{\bar\rho}}\sum_{i=1}^{m_1}\sum_{j=1}^{m_2}u_{ij}=\frac{\sqrt m}{m_1m_2\sqrt{\bar\rho}}\Big(m_2\sum_{i=1}^{m_1}X_i^T\bar X_i+m_1\sum_{j=1}^{m_2}Y_j^T\bar Y_j-\sum_{i=1}^{m_1}X_i^T\sum_{j=1}^{m_2}\bar Y_j-\sum_{j=1}^{m_2}Y_j^T\sum_{i=1}^{m_1}\bar X_i\Big)$$
$$=\sqrt{\frac{m\rho_1}{m_1\bar\rho}}\,\frac{1}{\sqrt{m_1\rho_1}}\sum_{i=1}^{m_1}X_i^T\bar X_i+\sqrt{\frac{m\rho_2}{m_2\bar\rho}}\,\frac{1}{\sqrt{m_2\rho_2}}\sum_{j=1}^{m_2}Y_j^T\bar Y_j+o_p(1)=a_mA_m+b_mB_m+o_p(1),$$
where $a_m=\sqrt{\frac{m\rho_1}{m_1\bar\rho}}$, $b_m=\sqrt{\frac{m\rho_2}{m_2\bar\rho}}$, $A_m=\frac{1}{\sqrt{m_1\rho_1}}\sum_{i=1}^{m_1}X_i^T\bar X_i\xrightarrow{d}N(0,1)$ and $B_m=\frac{1}{\sqrt{m_2\rho_2}}\sum_{j=1}^{m_2}Y_j^T\bar Y_j\xrightarrow{d}N(0,1)$. Obviously $a_m^2+b_m^2=1$ and $A_m$, $B_m$ are independent. Denote the characteristic functions of $A_m$ and $B_m$ by $\Phi_m$ and $\Psi_m$, respectively. Then
$$E\exp(it(a_mA_m+b_mB_m))=E\exp(ita_mA_m)\,E\exp(itb_mB_m)=\Phi_m(ta_m)\Psi_m(tb_m)=\Big[\exp\Big(-\frac{(ta_m)^2}{2}\Big)+o(1)\Big]\Big[\exp\Big(-\frac{(tb_m)^2}{2}\Big)+o(1)\Big]=\exp\Big(-\frac{t^2}{2}\Big)+o(1),$$
i.e.,
$$\frac{\sqrt m}{m_1m_2\sqrt{\bar\rho}}\sum_{i=1}^{m_1}\sum_{j=1}^{m_2}u_{ij}\xrightarrow{d}N(0,1). \qquad (34)$$
Similarly, we have
$$\frac{\sqrt m}{m_1m_2\sqrt{\bar\tau}}\sum_{i=1}^{m_1}\sum_{j=1}^{m_2}v_{ij}=\sqrt{\frac{m\tau_1}{m_1\bar\tau}}\,\frac{1}{\sqrt{m_1\tau_1}}\sum_{i=1}^{m_1}\alpha^T(X_i+\bar X_i)-\sqrt{\frac{m\tau_2}{m_2\bar\tau}}\,\frac{1}{\sqrt{m_2\tau_2}}\sum_{j=1}^{m_2}\alpha^T(Y_j+\bar Y_j)\xrightarrow{d}N(0,1).$$

Let $a$ and $b$ be two real numbers with $a^2+b^2\neq0$. Note that
$$\frac{\sqrt m}{m_1m_2}\sum_{i=1}^{m_1}\sum_{j=1}^{m_2}\Big(\frac{a\,u_{ij}}{\sqrt{\bar\rho}}+\frac{b\,v_{ij}}{\sqrt{\bar\tau}}\Big)=\frac{1}{\sqrt{m_1}}\sum_{i=1}^{m_1}\Big(a\sqrt{\frac{m\rho_1}{m_1\bar\rho}}\,\frac{X_i^T\bar X_i}{\sqrt{\rho_1}}+b\sqrt{\frac{m\tau_1}{m_1\bar\tau}}\,\frac{\alpha^T(X_i+\bar X_i)}{\sqrt{\tau_1}}\Big)$$
$$\qquad+\frac{1}{\sqrt{m_2}}\sum_{j=1}^{m_2}\Big(a\sqrt{\frac{m\rho_2}{m_2\bar\rho}}\,\frac{Y_j^T\bar Y_j}{\sqrt{\rho_2}}-b\sqrt{\frac{m\tau_2}{m_2\bar\tau}}\,\frac{\alpha^T(Y_j+\bar Y_j)}{\sqrt{\tau_2}}\Big)+o_p(1)=:I_1+I_2+o_p(1).$$
Since $\frac{m\rho_1}{m_1\bar\rho}$, $\frac{m\rho_2}{m_2\bar\rho}$, $\frac{m\tau_1}{m_1\bar\tau}$ and $\frac{m\tau_2}{m_2\bar\tau}$ are all bounded by one, it is easy to check that $I_1$ and $I_2$ satisfy the Lyapunov condition by (7)–(10). Therefore
$$\Big\{a^2\frac{m\rho_1}{m_1\bar\rho}+b^2\frac{m\tau_1}{m_1\bar\tau}\Big\}^{-1/2}I_1\xrightarrow{d}N(0,1)\quad\text{and}\quad\Big\{a^2\frac{m\rho_2}{m_2\bar\rho}+b^2\frac{m\tau_2}{m_2\bar\tau}\Big\}^{-1/2}I_2\xrightarrow{d}N(0,1).$$
Since the $X_i$'s are independent of the $Y_j$'s, it follows from the same arguments as in proving (34) that $I_1+I_2\xrightarrow{d}N(0,a^2+b^2)$, i.e., (27) holds by the Cramér–Wold device.

To prove (28), we write
$$\frac{m}{m_1^2m_2^2\bar\rho}\sum_{k=1}^{m_1}\Big(\sum_{j=1}^{m_2}u_{kj}\Big)^2=\frac{m}{m_1^2m_2^2\bar\rho}\sum_{k=1}^{m_1}\Big(m_2X_k^T\bar X_k-X_k^T\sum_{j=1}^{m_2}\bar Y_j-\Big(\sum_{j=1}^{m_2}Y_j\Big)^T\bar X_k+\sum_{j=1}^{m_2}Y_j^T\bar Y_j\Big)^2. \qquad (35)$$

Since $m\rho_1/(m_1\bar\rho)\le1$, it follows from Lemma 2 that
$$\frac{m}{m_1^2\bar\rho}\sum_{k=1}^{m_1}(X_k^T\bar X_k)^2-\frac{m\rho_1}{m_1\bar\rho}\xrightarrow{p}0. \qquad (36)$$
By Lemma 1, we have
$$\frac{m}{m_1^2m_2^2\bar\rho}\sum_{k=1}^{m_1}\Big(\sum_{j=1}^{m_2}Y_j^T\bar Y_j\Big)^2=\frac{m}{m_1m_2^2\bar\rho}\Big(\sum_{j=1}^{m_2}Y_j^T\bar Y_j\Big)^2=O_p\Big(\frac{m\rho_2}{m_1m_2\bar\rho}\Big)=o_p(1). \qquad (37)$$
A direct calculation shows that
$$E\Big\{\Big(\sum_{j=1}^{m_2}Y_j\Big)^T\bar X_k\Big\}^2=E\,\mathrm{tr}\Big\{\bar X_k\bar X_k^T\Big(\sum_{j=1}^{m_2}Y_j\Big)\Big(\sum_{j=1}^{m_2}Y_j\Big)^T\Big\}=\mathrm{tr}\{\Sigma\,m_2\tilde\Sigma\}=m_2\sum_{i,j=1}^d\sigma_{i,j}\tilde\sigma_{i,j}\le\frac{m_2(\rho_1+\rho_2)}{2}=o\Big(\frac{m_1m_2^2\bar\rho}{m}\Big),$$
which implies that
$$\frac{m}{m_1^2m_2^2\bar\rho}\sum_{k=1}^{m_1}\Big\{\Big(\sum_{j=1}^{m_2}Y_j\Big)^T\bar X_k\Big\}^2=o_p(1). \qquad (38)$$
Similarly we have
$$\frac{m}{m_1^2m_2^2\bar\rho}\sum_{k=1}^{m_1}\Big(X_k^T\sum_{j=1}^{m_2}\bar Y_j\Big)^2=o_p(1). \qquad (39)$$
It follows from (36), (38) and the Cauchy–Schwarz inequality that
$$\frac{m}{m_1^2m_2\bar\rho}\Big|\sum_{k=1}^{m_1}(X_k^T\bar X_k)\Big(\sum_{j=1}^{m_2}Y_j\Big)^T\bar X_k\Big|\le\Big\{\frac{m}{m_1^2\bar\rho}\sum_{k=1}^{m_1}(X_k^T\bar X_k)^2\Big\}^{1/2}\Big\{\frac{m}{m_1^2m_2^2\bar\rho}\sum_{k=1}^{m_1}\Big(\Big(\sum_{j=1}^{m_2}Y_j\Big)^T\bar X_k\Big)^2\Big\}^{1/2}=O_p(1)\,o_p(1)=o_p(1). \qquad (40)$$

Similarly we can show that the remaining cross-product terms in the expansion of (35) are negligible; for instance,
$$\frac{m}{m_1^2m_2\bar\rho}\sum_{k=1}^{m_1}(X_k^T\bar X_k)\Big(X_k^T\sum_{j=1}^{m_2}\bar Y_j\Big)=o_p(1)\quad\text{and}\quad\frac{m}{m_1^2m_2^2\bar\rho}\sum_{k=1}^{m_1}\Big(X_k^T\sum_{j=1}^{m_2}\bar Y_j\Big)\Big(\Big(\sum_{j=1}^{m_2}Y_j\Big)^T\bar X_k\Big)=o_p(1). \qquad (41)$$
Hence (28) follows from (35)–(41). The rest can be shown in the same way as proving (28).

Lemma 4. Under the conditions of Theorem 1, we have as $\min\{n_1,n_2\}\to\infty$
$$\frac{1}{\sqrt m}\sum_{k=1}^m\Big(\frac{Z_{k,1}}{\sqrt{\bar\rho}},\ \frac{Z_{k,2}}{\sqrt{\bar\tau}}\Big)^T\xrightarrow{d}N(0,I_2). \qquad (42)$$
Moreover, we have
$$\frac{1}{m\bar\rho}\sum_{k=1}^mZ_{k,1}^2-1\xrightarrow{p}0, \qquad (43)$$
$$\frac{1}{m\bar\tau}\sum_{k=1}^mZ_{k,2}^2-1\xrightarrow{p}0, \qquad (44)$$
$$\frac{1}{m\sqrt{\bar\rho\bar\tau}}\sum_{k=1}^mZ_{k,1}Z_{k,2}\xrightarrow{p}0, \qquad (45)$$
$$\max_{1\le k\le m}\frac{|Z_{k,1}|}{\sqrt{\bar\rho}}=o_p(m^{1/2})\quad\text{and}\quad\max_{1\le k\le m}\frac{|Z_{k,2}|}{\sqrt{\bar\tau}}=o_p(m^{1/2}). \qquad (46)$$

Proof. Note that for $1\le k\le m_1$,
$$Z_{k,1}=-\frac{1}{m_1(m_1-1)}\sum_{i=1}^{m_1}\sum_{j=1}^{m_2}u_{ij}+\frac{m_1+m_2-1}{(m_1-1)m_2}\sum_{j=1}^{m_2}u_{kj},\qquad Z_{k,2}=-\frac{1}{m_1(m_1-1)}\sum_{i=1}^{m_1}\sum_{j=1}^{m_2}v_{ij}+\frac{m_1+m_2-1}{(m_1-1)m_2}\sum_{j=1}^{m_2}v_{kj},$$
and for $m_1+1\le k\le m$,
$$Z_{k,1}=-\frac{1}{m_2(m_2-1)}\sum_{i=1}^{m_1}\sum_{j=1}^{m_2}u_{ij}+\frac{m_1+m_2-1}{(m_2-1)m_1}\sum_{i=1}^{m_1}u_{i,k-m_1},$$

$$Z_{k,2}=-\frac{1}{(m_2-1)m_2}\sum_{i=1}^{m_1}\sum_{j=1}^{m_2}v_{ij}+\frac{m_1+m_2-1}{(m_2-1)m_1}\sum_{i=1}^{m_1}v_{i,k-m_1}.$$
Thus
$$\frac{1}{\sqrt m}\sum_{k=1}^{m}\frac{Z_{k,1}}{\sqrt\rho}
=\Big(-\frac{1}{m_1-1}+\frac{m_1+m_2-1}{(m_1-1)m_2}-\frac{1}{m_2-1}+\frac{m_1+m_2-1}{(m_2-1)m_1}\Big)
\frac{1}{\sqrt m}\sum_{i=1}^{m_1}\sum_{j=1}^{m_2}\frac{u_{ij}}{\sqrt\rho}
=\frac{\sqrt m}{m_1m_2}\sum_{i=1}^{m_1}\sum_{j=1}^{m_2}\frac{u_{ij}}{\sqrt\rho}$$
and
$$\frac{1}{\sqrt m}\sum_{k=1}^{m}\frac{Z_{k,2}}{\sqrt\tau}
=\frac{\sqrt m}{m_1m_2}\sum_{i=1}^{m_1}\sum_{j=1}^{m_2}\frac{v_{ij}}{\sqrt\tau},$$
which imply (42) by using Lemma 3.

It follows from Lemma 3 that
$$\begin{aligned}
\frac{1}{m\rho}\sum_{k=1}^{m}Z_{k,1}^{2}
&=\frac{1}{m\rho}\sum_{k=1}^{m_1}\Big\{-\frac{1}{(m_1-1)m_1}\sum_{i=1}^{m_1}\sum_{j=1}^{m_2}u_{ij}
+\frac{m_1+m_2-1}{(m_1-1)m_2}\sum_{j=1}^{m_2}u_{kj}\Big\}^{2}\\
&\quad+\frac{1}{m\rho}\sum_{k=m_1+1}^{m}\Big\{-\frac{1}{(m_2-1)m_2}\sum_{i=1}^{m_1}\sum_{j=1}^{m_2}u_{ij}
+\frac{m_1+m_2-1}{(m_2-1)m_1}\sum_{i=1}^{m_1}u_{i,k-m_1}\Big\}^{2}\\
&=\Big\{O_p\Big(\frac{m_2}{m\sqrt{m_1}}\Big)\Big\}^{2}
+\frac{(m_1+m_2-1)^{2}m_1^{2}}{(m_1-1)^{2}m^{2}}\Big\{\frac{m\rho_1}{m_1\rho}+o_p(1)\Big\}\\
&\quad+\Big\{O_p\Big(\frac{m_1}{m\sqrt{m_2}}\Big)\Big\}^{2}
+\frac{(m_1+m_2-1)^{2}m_2^{2}}{(m_2-1)^{2}m^{2}}\Big\{\frac{m\rho_2}{m_2\rho}+o_p(1)\Big\}+o_p(1)\\
&=\frac{m\rho_1}{m_1\rho}+\frac{m\rho_2}{m_2\rho}+o_p(1)=1+o_p(1),
\end{aligned}$$
where the cross terms have been bounded by the Cauchy–Schwarz inequality, i.e., (43) holds. Similarly we can show (44) and (45).
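The computation above rests on the defining property of jackknife pseudo-values: they average back to the original statistic, $\sum_{k=1}^{m}Z_k=mT_m$. A minimal numerical sketch of this identity for a generic two-sample U-statistic (the kernel below is an illustrative stand-in, not the paper's $u_{ij}$):

```python
import numpy as np

rng = np.random.default_rng(0)
m1, m2 = 7, 9
m = m1 + m2
X = rng.normal(size=m1)
Y = rng.normal(size=m2)

def T(x, y):
    # two-sample U-statistic of degree (1, 1) with an illustrative kernel u(x, y) = x*y
    return np.mean(np.outer(x, y))

Tm = T(X, Y)
# jackknife pseudo-values: Z_k = m*T_m - (m-1)*T with the k-th observation
# (taken from either sample, k = 1, ..., m) deleted
Z = np.empty(m)
for k in range(m1):
    Z[k] = m * Tm - (m - 1) * T(np.delete(X, k), Y)
for k in range(m2):
    Z[m1 + k] = m * Tm - (m - 1) * T(X, np.delete(Y, k))

# the pseudo-values average back to the original statistic
assert np.isclose(Z.mean(), Tm)
```

The same cancellation of deletion constants is what produces the coefficient $\sqrt m/(m_1m_2)$ in the displays above.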

Since $E\{(\sum_{i=1}^{m_1}u_{ij})^2\}=\mathrm{Var}(\sum_{i=1}^{m_1}u_{ij})=m_1(\rho_1+\rho_2)$, we have
$$E\Big(\max_{1\le j\le m_2}\Big(\sum_{i=1}^{m_1}u_{ij}\Big)^2\Big)
\le\sum_{j=1}^{m_2}E\Big(\Big(\sum_{i=1}^{m_1}u_{ij}\Big)^2\Big)=m_1m_2(\rho_1+\rho_2),$$
which implies that
$$\max_{1\le j\le m_2}\Big|\sum_{i=1}^{m_1}u_{ij}\Big|=O_p\big(\{m_1m_2(\rho_1+\rho_2)\}^{1/2}\big).$$
Similarly we have
$$\max_{1\le k\le m_1}\Big|\sum_{j=1}^{m_2}u_{kj}\Big|=O_p\big(\{m_1m_2(\rho_1+\rho_2)\}^{1/2}\big).$$
Hence, by Lemma 3 and the expression for $Z_{k,1}$, we have
$$\begin{aligned}
\max_{1\le k\le m}\frac{|Z_{k,1}|}{\sqrt\rho}
&\le\frac{1}{(m_1-1)m_1}\Big|\sum_{i=1}^{m_1}\sum_{j=1}^{m_2}\frac{u_{ij}}{\sqrt\rho}\Big|
+\frac{m_1+m_2-1}{(m_1-1)m_2}\max_{1\le k\le m_1}\Big|\sum_{j=1}^{m_2}\frac{u_{kj}}{\sqrt\rho}\Big|\\
&\quad+\frac{1}{(m_2-1)m_2}\Big|\sum_{i=1}^{m_1}\sum_{j=1}^{m_2}\frac{u_{ij}}{\sqrt\rho}\Big|
+\frac{m_1+m_2-1}{(m_2-1)m_1}\max_{1\le j\le m_2}\Big|\sum_{i=1}^{m_1}\frac{u_{ij}}{\sqrt\rho}\Big|\\
&=o_p(1)+O_p\Big(\frac{m_1+m_2-1}{(m_1-1)m_2\sqrt\rho}\{m_1m_2(\rho_1+\rho_2)\}^{1/2}\Big)\\
&\quad+o_p(1)+O_p\Big(\frac{m_1+m_2-1}{(m_2-1)m_1\sqrt\rho}\{m_1m_2(\rho_1+\rho_2)\}^{1/2}\Big)\\
&=o_p(1)+O_p\Big(\frac{m^{1/2}}{(\min(m_1,m_2))^{1/2}}\Big)=o_p(m^{1/2}).
\end{aligned}$$
Similarly we can show that
$$\max_{1\le k\le m}\frac{|Z_{k,2}|}{\sqrt\tau}=o_p(m^{1/2}).$$

Proof of Theorem 1. It follows from Lemma 4 and the standard arguments in the empirical likelihood method (see Owen (1990)).

To show Corollary 1 and Theorem 2, we first prove the following lemmas.

Lemma 5. $\mathrm{tr}(\Sigma_1^4)=O\big((\mathrm{tr}(\Sigma_1^2))^2\big)$, $\rho_1=\sum_{j=1}^{d}\lambda_j^2$, and $2\|\alpha\|^2\lambda_1\le\tau_1\le 2\|\alpha\|^2\lambda_d$, where $\lambda_1\le\cdots\le\lambda_d$ are the eigenvalues of $\Sigma_1$.

Proof. Since $\mathrm{tr}(\Sigma_1^j)=\sum_{i=1}^{d}\lambda_i^j$ for any positive integer $j$, the first equality follows immediately. The second equality follows since $\rho_1=\mathrm{tr}(\Sigma_1^2)$. The third pair of inequalities on $\tau_1$ comes from the definition of $\tau_1$.
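Lemma 5's three claims are elementary spectral facts, so they can be confirmed numerically on an arbitrary covariance matrix. The sketch below takes $\tau_1=2\alpha^{T}\Sigma_1\alpha$ (our reading of the definition of $\tau_1$); the matrix and $\alpha$ are random test inputs:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6
A = rng.normal(size=(d, d))
Sigma = A @ A.T                      # a positive semi-definite covariance matrix
alpha = rng.normal(size=d)

lam = np.linalg.eigvalsh(Sigma)      # eigenvalues in ascending order
rho1 = np.trace(Sigma @ Sigma)       # rho_1 = tr(Sigma^2)
tau1 = 2 * alpha @ Sigma @ alpha     # assumed definition of tau_1

# tr(Sigma^4) <= (tr(Sigma^2))^2, since sum(lam^4) <= (sum(lam^2))^2
assert np.trace(np.linalg.matrix_power(Sigma, 4)) <= rho1 ** 2
# rho_1 equals the sum of squared eigenvalues
assert np.isclose(rho1, np.sum(lam ** 2))
# 2*||alpha||^2*lambda_min <= tau_1 <= 2*||alpha||^2*lambda_max
n2 = alpha @ alpha
assert 2 * n2 * lam[0] <= tau1 <= 2 * n2 * lam[-1]
```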

Lemma 6. For any $\delta>0$,
$$E\big|X_1^{T}\bar X_1\big|^{2+\delta}\le d^{\delta}\Big(\sum_{i=1}^{d}E|X_{1,i}|^{2+\delta}\Big)^2$$
and
$$E\big|\alpha^{T}(X_1+\bar X_1)\big|^{2+\delta}\le 2^{4+\delta}\|\alpha\|^{2+\delta}d^{\delta/2}\sum_{i=1}^{d}E|X_{1,i}|^{2+\delta}.$$

Proof. It follows from the Cauchy–Schwarz inequality that $|X_1^{T}\bar X_1|^2\le\|X_1\|^2\|\bar X_1\|^2$. Then by using the $C_r$ inequality we conclude that
$$\begin{aligned}
E\big|X_1^{T}\bar X_1\big|^{2+\delta}
&\le E\Big(\sum_{i=1}^{d}X_{1,i}^2\Big)^{(2+\delta)/2}E\Big(\sum_{i=1}^{d}\bar X_{1,i}^2\Big)^{(2+\delta)/2}
=\Big(E\Big(\sum_{i=1}^{d}X_{1,i}^2\Big)^{(2+\delta)/2}\Big)^2\\
&=\Big(d^{(2+\delta)/2}E\Big(\frac1d\sum_{i=1}^{d}X_{1,i}^2\Big)^{(2+\delta)/2}\Big)^2
\le\Big(d^{\delta/2}\sum_{i=1}^{d}E|X_{1,i}|^{2+\delta}\Big)^2
=d^{\delta}\Big(\sum_{i=1}^{d}E|X_{1,i}|^{2+\delta}\Big)^2.
\end{aligned}$$
Similarly, from the $C_r$ inequality we have
$$\begin{aligned}
E\big|\alpha^{T}(X_1+\bar X_1)\big|^{2+\delta}
&\le 2^{4+\delta}E\big|\alpha^{T}X_1\big|^{2+\delta}
\le 2^{4+\delta}\|\alpha\|^{2+\delta}E\big(\|X_1\|^{2+\delta}\big)\\
&=2^{4+\delta}\|\alpha\|^{2+\delta}d^{1+\delta/2}E\Big(\frac1d\sum_{i=1}^{d}X_{1,i}^2\Big)^{(2+\delta)/2}
\le 2^{4+\delta}\|\alpha\|^{2+\delta}d^{\delta/2}\sum_{i=1}^{d}E|X_{1,i}|^{2+\delta}.
\end{aligned}$$
This completes the proof.

Proof of Corollary 1. Equations (7) and (9) follow from Conditions A1–A3 by using Lemmas 5 and 6. So do Equations (8) and (10), since we have the same assumptions on $\{X_i\}$ and $\{Y_j\}$.
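Both bounds in Lemma 6 are loose (Cauchy–Schwarz followed by the $C_r$/Jensen steps), so they are easy to confirm by simulation. A sketch with $\delta=2$; the Laplace coordinate distribution, the direction $\alpha$, and the sample size are assumptions of this illustration only:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, delta = 5, 200_000, 2.0
# X_1 and an independent copy, with iid non-Gaussian mean-zero coordinates
X = rng.laplace(size=(n, d))
Xt = rng.laplace(size=(n, d))

coord = np.sum(np.mean(np.abs(X) ** (2 + delta), axis=0))  # sum_i E|X_{1,i}|^{2+delta}

# first bound: E|X_1' X_1~|^{2+delta} <= d^delta (sum_i E|X_{1,i}|^{2+delta})^2
lhs1 = np.mean(np.abs(np.sum(X * Xt, axis=1)) ** (2 + delta))
rhs1 = d ** delta * coord ** 2
assert lhs1 <= rhs1

# second bound: E|alpha'(X_1 + X_1~)|^{2+delta}
#               <= 2^{4+delta} ||alpha||^{2+delta} d^{delta/2} sum_i E|X_{1,i}|^{2+delta}
alpha = rng.normal(size=d)
na = np.sqrt(alpha @ alpha)
lhs2 = np.mean(np.abs((X + Xt) @ alpha) ** (2 + delta))
rhs2 = 2 ** (4 + delta) * na ** (2 + delta) * d ** (delta / 2) * coord
assert lhs2 <= rhs2
```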

Proof of Theorem 2. It suffices to verify conditions (7) and (9) with $\delta=2$ in Theorem 1. Recall that we assume $\mu_1=\mu_2=0$. Note that $\mathrm{Var}(X_1)=\Sigma_1=\Gamma_1^{T}\Gamma_1$. Denote $\alpha^{T}\Gamma_1^{T}=(a_1,\dots,a_k)$ and $\bar\Sigma=\Gamma_1\Gamma_1^{T}=(\sigma_{j,l})_{1\le j,l\le k}$. Then
$$X_1^{T}\bar X_1=\sum_{j=1}^{k}\sum_{l=1}^{k}\sigma_{j,l}B_{1,j}B_{1+m_1,l}$$
and
$$\alpha^{T}(X_1+\bar X_1)=\sum_{j=1}^{k}a_j\big(B_{1,j}+B_{1+m_1,j}\big).$$
Set $\delta_{j_1,j_2,j_3,j_4}=E(B_{1,j_1}B_{1,j_2}B_{1,j_3}B_{1,j_4})$. Then $\delta_{j_1,j_2,j_3,j_4}$ equals $3+\xi$ if $j_1=j_2=j_3=j_4$, equals $1$ if $j_1$, $j_2$, $j_3$ and $j_4$ form two different pairs of integers, and is zero otherwise. By Lemma 5, we have
$$\begin{aligned}
E\big(X_1^{T}\bar X_1\big)^4
&=\sum_{j_1,j_2,j_3,j_4=1}^{k}\sum_{l_1,l_2,l_3,l_4=1}^{k}
\sigma_{j_1,l_1}\sigma_{j_2,l_2}\sigma_{j_3,l_3}\sigma_{j_4,l_4}\,
\delta_{j_1,j_2,j_3,j_4}\delta_{l_1,l_2,l_3,l_4}\\
&=O\Big(\sum_{j_1\ne j_2}\sum_{l_1\ne l_2}\sigma_{j_1,l_1}\sigma_{j_1,l_2}\sigma_{j_2,l_1}\sigma_{j_2,l_2}\Big)
+O\Big(\sum_{j_1\ne j_2}\sum_{l=1}^{k}\sigma_{j_1,l}^2\sigma_{j_2,l}^2\Big)\\
&\quad+O\Big(\sum_{j=1}^{k}\sum_{l_1\ne l_2}\sigma_{j,l_1}^2\sigma_{j,l_2}^2\Big)
+O\Big(\sum_{j=1}^{k}\sum_{l=1}^{k}\sigma_{j,l}^4\Big)\\
&=O\Big(\sum_{j_1=1}^{k}\sum_{j_2=1}^{k}\sum_{l_1=1}^{k}\sum_{l_2=1}^{k}
\sigma_{j_1,l_1}\sigma_{j_1,l_2}\sigma_{j_2,l_1}\sigma_{j_2,l_2}\Big)
+O\Big(\sum_{j=1}^{k}\Big(\sum_{l=1}^{k}\sigma_{j,l}^2\Big)^2\Big)\\
&=O\big(\mathrm{tr}(\bar\Sigma^4)\big)+O\big((\mathrm{tr}(\bar\Sigma^2))^2\big)
=O\big(\mathrm{tr}(\Sigma_1^4)\big)+O\big((\mathrm{tr}(\Sigma_1^2))^2\big)=O(\rho_1^2),
\end{aligned}$$
i.e., (7) holds with $\delta=2$.
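To make the order of this fourth moment concrete: if $X_1$ and $\bar X_1$ were independent $N(0,\Sigma_1)$ (a special case only; the argument above does not need normality), conditioning on $\bar X_1$ gives $E(X_1^{T}\bar X_1)^4=3\{(\mathrm{tr}\,\Sigma_1^2)^2+2\,\mathrm{tr}\,\Sigma_1^4\}$, which Lemma 5 bounds by $9\rho_1^2$. A deterministic sketch via eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 8
A = rng.normal(size=(d, d))
Sigma = A @ A.T                   # an arbitrary covariance matrix
lam = np.linalg.eigvalsh(Sigma)

t2 = np.sum(lam ** 2)             # tr(Sigma^2) = rho_1
t4 = np.sum(lam ** 4)             # tr(Sigma^4)

# Gaussian case: E(X' X~)^4 = 3 E(X~' Sigma X~)^2 = 3[(tr Sigma^2)^2 + 2 tr Sigma^4],
# and tr(Sigma^4) <= (tr Sigma^2)^2 by Lemma 5, so E(X' X~)^4 <= 9 rho_1^2 = O(rho_1^2)
fourth_moment = 3 * (t2 ** 2 + 2 * t4)
assert fourth_moment <= 9 * t2 ** 2
```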

Similarly we have
$$E\big(\alpha^{T}(X_1+\bar X_1)\big)^4
\le 2^{4}E\Big(\sum_{j=1}^{k}a_jB_{1,j}\Big)^4
=O\Big(\sum_{j_1,j_2=1}^{k}a_{j_1}^2a_{j_2}^2\Big)+O\Big(\sum_{j=1}^{k}a_j^4\Big)
=O\Big(\Big(\sum_{j=1}^{k}a_j^2\Big)^2\Big)
=O\big((\alpha^{T}\Gamma_1^{T}\Gamma_1\alpha)^2\big)=O(\tau_1^2),$$
which yields (9) with $\delta=2$. Equations (8) and (10) can be shown in the same way. Hence Theorem 2 follows from Theorem 1.

Acknowledgment. We thank an associate editor and two reviewers for helpful comments. Wang's research was supported by the Bob Price Fellowship in the School of Mathematics at Georgia Tech, Peng's research was supported by NSF Grant DMS-1005336, and Qi's research was supported by NSF Grant DMS-1005345.

References

[1] Alon, U., Barkai, N., Notterman, D.A., Gish, K., Ybarra, S., Mack, D. and Levine, A.J. (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. U.S.A. 96, 6745-6750.

[2] Arlot, S., Blanchard, G. and Roquain, E. (2010a). Some nonasymptotic results on resampling in high dimension I: confidence regions. Ann. Statist. 38, 51-82.

[3] Arlot, S., Blanchard, G. and Roquain, E. (2010b). Some nonasymptotic results on resampling in high dimension II: multiple tests. Ann. Statist. 38, 83-99.

[4] Bai, Z. and Saranadasa, H. (1996). Effect of high dimension: by an example of a two sample problem. Statist. Sinica 6, 311-329.

[5] Chen, S.X., Peng, L. and Qin, Y.-L. (2009). Effects of data dimension on empirical likelihood. Biometrika 96, 711-722.

[6] Chen, S.X. and Qin, Y.-L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. Ann. Statist. 38, 808-835.

[7] Chen, S.X. and Van Keilegom, I. (2009). A review on empirical likelihood methods for regression. Test 18, 415-447.

[8] Dharmadhikari, S.W. and Jogdeo, K. (1969). Bounds on moments of certain random variables. Ann. Math. Statist. 40, 1506-1508.

[9] Jing, B.-Y., Yuan, J.-Q. and Zhou, W. (2009). Jackknife empirical likelihood. J. Amer. Statist. Assoc. 104, 1224-1232.

[10] Lee, M.-L.T. (2004). Analysis of Microarray Gene Expression Data. Kluwer.

[11] Kuelbs, J. and Vidyashankar, A.N. (2010). Asymptotic inference for high dimensional data. Ann. Statist. 38, 836-869.

[12] Owen, A. (1990). Empirical likelihood ratio confidence regions. Ann. Statist. 18, 90-120.

[13] Owen, A. (2001). Empirical Likelihood. Chapman & Hall/CRC.

[14] Qin, J. and Lawless, J. (1994). Empirical likelihood and general estimating equations. Ann. Statist. 22, 300-325.

[15] Srivastava, M.S. (2009). A test for the mean vector with fewer observations than the dimension under non-normality. J. Multivariate Anal. 100, 518-532.

[16] Srivastava, M.S. and Du, M. (2008). A test for the mean vector with fewer observations than the dimension. J. Multivariate Anal. 99, 386-402.

[17] von Bahr, B. and Esseen, C.-G. (1965). Inequalities for the rth absolute moment of a sum of random variables, 1 <= r <= 2. Ann. Math. Statist. 36, 299-303.

Figure 1: Colon data: differences of the sample means are plotted against each gene.

Table 1: Sizes and powers of the proposed jackknife empirical likelihood test (JEL) and the test in Chen and Qin (2010) (CQ) are reported for the case of (n1, n2) = (30, 30) at level 5%.

         c1 = 0            c1 = 0.1          c1 = 0.1
         c2 = 0.25         c2 = 0.25         c2 = 0.75
  d      JEL      CQ       JEL      CQ       JEL      CQ
  10     0.070    0.049    0.07     0.049    0.072    0.062
  20     0.056    0.037    0.057    0.049    0.096    0.060
  30     0.064    0.047    0.066    0.049    0.3      0.066
  40     0.070    0.052    0.069    0.058    0.6      0.072
  50     0.067    0.049    0.083    0.054    0.38     0.067
  60     0.063    0.039    0.069    0.043    0.74     0.055
  70     0.053    0.053    0.076    0.065    0.90     0.08
  80     0.056    0.059    0.063    0.067    0.9      0.082
  90     0.056    0.044    0.080    0.054    0.204    0.07
  100    0.066    0.060    0.082    0.064    0.229    0.09
  300    0.056    0.045    0.4      0.054    0.537    0.092
  500    0.049    0.05     0.60     0.063    0.73     0.0

Table 2: Sizes and powers of the proposed jackknife empirical likelihood test (JEL) and the test in Chen and Qin (2010) (CQ) are reported for the case of (n1, n2) = (100, 100) at level 5%.

         c1 = 0            c1 = 0.1          c1 = 0.1
         c2 = 0.25         c2 = 0.25         c2 = 0.75
  d      JEL      CQ       JEL      CQ       JEL      CQ
  10     0.074    0.054    0.072    0.063    0.099    0.090
  20     0.043    0.047    0.053    0.055    0.45     0.098
  30     0.047    0.047    0.056    0.063    0.9      0.5
  40     0.05     0.050    0.063    0.062    0.264    0.25
  50     0.055    0.040    0.077    0.06     0.326    0.3
  60     0.055    0.044    0.077    0.067    0.374    0.5
  70     0.043    0.05     0.063    0.086    0.395    0.50
  80     0.042    0.059    0.082    0.079    0.474    0.7
  90     0.043    0.040    0.098    0.065    0.527    0.63
  100    0.049    0.054    0.09     0.088    0.575    0.94
  300    0.048    0.054    0.27     0.02     0.974    0.389
  500    0.049    0.04     0.353    0.5      0.999    0.544

Table 3: Sizes and powers of the proposed jackknife empirical likelihood test (JEL) and the test in Chen and Qin (2010) (CQ) are reported for the case of (n1, n2) = (50, 200) at level 5%.

         c1 = 0            c1 = 0.1          c1 = 0.1
         c2 = 0.25         c2 = 0.25         c2 = 0.75
  d      JEL      CQ       JEL      CQ       JEL      CQ
  10     0.048    0.054    0.054    0.062    0.29     0.6
  20     0.055    0.042    0.078    0.075    0.237    0.66
  30     0.052    0.054    0.079    0.08     0.330    0.207
  40     0.039    0.035    0.070    0.068    0.430    0.22
  50     0.039    0.048    0.07     0.094    0.480    0.23
  60     0.047    0.05     0.092    0.095    0.598    0.273
  70     0.046    0.05     0.086    0.07     0.658    0.309
  80     0.042    0.047    0.3      0.09     0.753    0.327
  90     0.046    0.043    0.48     0.098    0.78     0.346
  100    0.048    0.059    0.4      0.7      0.82     0.365
  300    0.044    0.040    0.370    0.63              0.703
  500    0.047    0.045    0.555    0.235             0.899

Figure 2: Colon data: differences of the sample means of logarithms of gene expression levels are plotted against each gene.

Table 4: Colon data: p-values for testing equal means of those genes with the absolute difference of sample means less than the threshold c3.

  c3      number of genes    CQ          JEL
  50      58                 2.94e-01    2.3e-01
  100     50                 5.63e-01    2.82e-01
  200     742                7.2e-01     3.87e-01
  500     93                 2.7e-02     3.75e-01
  1000    978                6.79e-05    3.40e-01
  3000    2000               5.06e-09    1.36e-01