PHASE TRANSITION AND REGULARIZED BOOTSTRAP IN LARGE-SCALE t-tests WITH FALSE DISCOVERY RATE CONTROL


The Annals of Statistics 2014, Vol. 42, No. 5, DOI: /14-AOS1249. Institute of Mathematical Statistics, 2014.

BY WEIDONG LIU$^{1}$ AND QI-MAN SHAO$^{2}$

Shanghai Jiao Tong University and The Chinese University of Hong Kong

Applying the Benjamini and Hochberg (B-H) method to multiple Student's t tests is a popular technique for gene selection in microarray data analysis. Given the nonnormality of the population, the true p-values of the hypothesis tests are typically unknown. Hence it is common to use the standard normal distribution $N(0,1)$, Student's t distribution $t_{n-1}$ or the bootstrap method to estimate the p-values. In this paper, we prove that when the population has a finite 4th moment and the dimension $m$ and the sample size $n$ satisfy $\log m = o(n^{1/3})$, the B-H method controls the false discovery rate (FDR) and the false discovery proportion (FDP) at a given level $\alpha$ asymptotically with p-values estimated from the $N(0,1)$ or $t_{n-1}$ distribution. However, a phase transition phenomenon occurs when $\log m \ge c_0 n^{1/3}$. In this case, the FDR and the FDP of the B-H method may be larger than $\alpha$ or even converge to one. In contrast, the bootstrap calibration is accurate for $\log m = o(n^{1/2})$ as long as the underlying distribution has sub-Gaussian tails. However, such a light-tailed condition cannot generally be weakened. The simulation study shows that the bootstrap calibration is very conservative for heavy-tailed distributions. To solve this problem, a regularized bootstrap correction is proposed and is shown to be robust to the tails of the distributions. The simulation study shows that the regularized bootstrap method performs better than its usual counterpart.

1. Introduction. Multiple Student's t tests arise in many real applications, such as gene selection. Consider $m$ tests on the mean values

$$H_{0i}: \mu_i = 0 \quad\text{versus}\quad H_{1i}: \mu_i \neq 0, \qquad 1 \le i \le m.$$
Received October 2013; revised May.
$^{1}$ Supported by NSFC, Grant No and No, the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning, Shanghai Pujiang Program, Foundation for the Author of National Excellent Doctoral Dissertation of PR China and a grant from Australian Research Council.
$^{2}$ Supported in part by Hong Kong RGC GRF and.
MSC2010 subject classifications. 62H15.
Key words and phrases. Bootstrap correction, false discovery rate, multiple t-tests, phase transition.

A popular procedure is to use the Benjamini and Hochberg (B-H) method to search for significant findings, with the false discovery rate (FDR) controlled at a given level $0 < \alpha < 1$; that is,

$$E\left[\frac{V}{R \vee 1}\right] \le \alpha,$$

where $V$ is the number of wrongly rejected hypotheses and $R$ is the total number of rejected hypotheses. The seminal work of Benjamini and Hochberg (1995) is to reject the null hypotheses for which $p_i \le p_{(\hat k)}$, where $p_i$ is the p-value for $H_{0i}$,

$$\hat k = \max\{0 \le i \le m : p_{(i)} \le \alpha i/m\} \tag{1}$$

and $p_{(1)} \le \cdots \le p_{(m)}$ are the ordered p-values. Let $T_1,\ldots,T_m$ be Student's t test statistics

$$T_i = \frac{\bar X_i}{\hat s_{ni}/\sqrt{n}}, \qquad\text{where}\quad \bar X_i = \frac{1}{n}\sum_{k=1}^{n} X_{ki}, \quad \hat s_{ni}^2 = \frac{1}{n-1}\sum_{k=1}^{n}(X_{ki}-\bar X_i)^2,$$

and $(X_{k1},\ldots,X_{km})$, $1 \le k \le n$, are i.i.d. random samples from $(X_1,\ldots,X_m)$. When $T_1,\ldots,T_m$ are independent and the true p-values $p_i$ are known, Benjamini and Hochberg (1995) showed that the B-H method controls the FDR at level $\alpha$.

In many applications, the distributions of $X_i$, $1 \le i \le m$, are non-Gaussian. Hence it is difficult to know the exact null distributions of $T_i$ and the true p-values. When applying the B-H method, the p-values are actually estimators. According to the central limit theorem, it is common to use the standard normal distribution $N(0,1)$ or Student's t distribution $t_{n-1}$ to estimate the p-values, where $t_{n-1}$ denotes the Student's t random variable with $n-1$ degrees of freedom. In a microarray analysis, Efron (2004) observed that the null distribution choices substantially affect the simultaneous inference procedure. However, a systematic theoretical study on the influence of the estimated p-values is still lacking. It is important to know how accurate the $N(0,1)$ and $t_{n-1}$ calibrations can be. In this paper, we show that the $N(0,1)$ and $t_{n-1}$ calibrations are accurate when $\log m = o(n^{1/3})$. Moreover, if the underlying distributions are symmetric, then the dimension can be as large as $\log m = o(n^{1/2})$.
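The step-up rule (1) with $N(0,1)$-calibrated p-values can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' code; the function names `t_statistic`, `normal_p_value` and `benjamini_hochberg` are hypothetical, and only the normal calibration $2 - 2\Phi(|T_i|)$ is shown because the standard library has no $t_{n-1}$ distribution function.

```python
import math
from statistics import mean, stdev


def t_statistic(sample):
    """Student's t statistic: sample mean divided by (s / sqrt(n)),
    where s uses the n - 1 denominator, as in the definition of T_i."""
    n = len(sample)
    return mean(sample) / (stdev(sample) / math.sqrt(n))


def normal_p_value(t):
    """Two-sided p-value 2 - 2*Phi(|t|) under the N(0,1) calibration.
    Uses the identity 2 * (1 - Phi(x)) = erfc(x / sqrt(2))."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def benjamini_hochberg(p_values, alpha):
    """Indices rejected by the B-H step-up rule (1):
    k_hat = max{i : p_(i) <= alpha * i / m}, reject the k_hat smallest."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_hat = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k_hat = rank  # keep the largest rank that passes
    return sorted(order[:k_hat])
```

In use, one would compute `normal_p_value(t_statistic(x))` for each of the $m$ samples and pass the resulting list to `benjamini_hochberg`.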
Under the finite 4th moment of $X_i$, the FDR and the false discovery proportion (FDP) of the B-H method with the estimated p-values $\hat p_{i,\Phi} = 2 - 2\Phi(|T_i|)$ or $\hat p_{i,\Psi_{n-1}} = 2 - 2\Psi_{n-1}(|T_i|)$ will converge to $\alpha m_0/m$, where $m_0$ is the number of true null hypotheses, $\Phi(t)$ is the standard normal distribution function and $\Psi_{n-1}(t) = P(t_{n-1} \le t)$. However, when $\log m \ge c_0 n^{1/3}$ for some $c_0 > 0$ and the distributions are asymmetric, the $N(0,1)$ and $t_{n-1}$ calibrations may not work well, and a phase transition phenomenon occurs. Under $\log m \ge c_0 n^{1/3}$, the number of true alternative hypotheses $m_1 = \exp(o(n^{1/3}))$ and the average of skewnesses $\tau = \lim_{m\to\infty} m_0^{-1}\sum_{i\in H_0} |EX_i^3/\sigma_i^3| > 0$, we show that the FDR of the B-H method satisfies $\lim_{(m,n)\to\infty} \mathrm{FDR} \ge \kappa$ for some constant $\kappa > \alpha$, where $H_0 = \{i : \mu_i = 0\}$. Furthermore, if $\log m/n^{1/3} \to \infty$, then $\lim_{(m,n)\to\infty} \mathrm{FDR} = 1$. Similar results are proven for the false discovery proportion. This indicates that the $N(0,1)$ and $t_{n-1}$ calibrations are inaccurate when the average of skewnesses $\tau \neq 0$ in the ultra high dimensional setting.

It is well known that the bootstrap is an effective way to improve the accuracy of an exact null distribution approximation. Fan, Hall and Yao (2007) showed that for bounded noise, the bootstrap can improve the accuracy and allow a higher dimension $\log m = o(n^{1/2})$ when controlling the family-wise error rate. Delaigle, Hall and Jin (2011) showed that the bootstrap method has significant advantages in higher criticism. In this paper, we show that when the bootstrap calibration is used and $\log m = o(n^{1/2})$, the B-H method can asymptotically control the FDR and FDP at level $\alpha$. In our results, we assume sub-Gaussian tails instead of the bounded noise in Fan, Hall and Yao (2007). Although the bootstrap method allows for a higher dimension, the light-tailed condition cannot generally be weakened. The simulation study shows that the bootstrap method is very conservative for heavy-tailed distributions. To solve this problem, we propose a regularized bootstrap method that is robust to the tails of the distributions. The proposed regularized bootstrap only requires a finite 6th moment, and the dimension can be as large as $\log m = o(n^{1/2})$.

It is also not uncommon in real applications for $X_1,\ldots,X_m$ to be dependent. This results in a dependency between $T_1,\ldots,T_m$. In this paper, we obtain some similar results for the B-H method under a general weak dependence condition. It should be noted that much work has been done on the robustness of FDR/FDP controlling methods against dependence. Benjamini and Yekutieli (2001) proved that the B-H procedure controls the FDR under positive regression dependency.
Storey (2003), Storey, Taylor and Siegmund (2004) and Ferreira and Zwinderman (2006) imposed a dependence condition that required the law of large numbers for the empirical distributions under the null and alternative hypotheses. Wu (2008) developed FDR controlling procedures for data coming from special models, such as the time series model. However, to satisfy the conditions in most of the existing methods, it is often necessary to assume that the number of true alternative hypotheses $m_1$ is asymptotically $\pi_1 m$ with some $\pi_1 > 0$. They exclude the sparse setting $m_1 = o(m)$, which is important in applications such as gene selection. For example, if $m_1 = o(m)$, then the conditions of Theorem 4 in Storey, Taylor and Siegmund (2004) and the conditions of the main results in Wu (2008) are not satisfied. In contrast, our results on FDR and FDP control under dependence allow $m_1 \le \gamma m$ for some $\gamma < 1$.

The remainder of this paper is organized as follows. In Section 2.1, we show the robustness of and the phase transition phenomenon for the $N(0,1)$ and $t_{n-1}$ calibrations. In Section 2.2, we show that the bootstrap calibration can improve the FDR and FDP control. The regularized bootstrap method is proposed in Section 3. The results are extended to the dependence case in Section 4. The simulation study is presented in Section 5 and the proofs are postponed to Section 6. Throughout the paper, all constants such as $\gamma$, $b_0$, $c_0$ in the upper bounds and lower bounds do not depend on $n$ and $m$.

2. Main results.

2.1. Robustness and phase transition. In this section, we assume that the Student's t test statistics $T_1,\ldots,T_m$ are independent; the results are extended to the dependent case in Section 4. Before stating the main theorems, we introduce some notation. Let $\hat p_{i,\Phi} = 2 - 2\Phi(|T_i|)$ and $\hat p_{i,\Psi_{n-1}} = 2 - 2\Psi_{n-1}(|T_i|)$ be the p-values calculated from the standard normal distribution and the t-distribution, respectively. Let $\mathrm{FDR}_{\Phi}$ and $\mathrm{FDR}_{\Psi_{n-1}}$ be the FDR of the B-H method with $\hat p_{i,\Phi}$ and $\hat p_{i,\Psi_{n-1}}$ in (1), respectively. Similarly, we denote the false discovery proportions of the B-H method by $\mathrm{FDP}_{\Phi}$ $(= V/(R \vee 1))$ and $\mathrm{FDP}_{\Psi_{n-1}}$. Recall that $R$ is the total number of rejections. The critical values of the tests are then $\hat t_{\Phi} = \Phi^{-1}(1 - \alpha R/(2m))$ and $\hat t_{\Psi_{n-1}} = \Psi_{n-1}^{-1}(1 - \alpha R/(2m))$. Set $Y_i = (X_i - \mu_i)/\sigma_i$ with $\sigma_i^2 = \mathrm{Var}(X_i)$, $1 \le i \le m$. Recall that $m_1$ is the number of true alternative hypotheses. Throughout this paper, we assume $m_1 \le \gamma m$ for some $\gamma < 1$, which includes the important sparse setting $m_1 = o(m)$.

THEOREM 2.1. Suppose $X_1,\ldots,X_m$ are independent and $\log m = o(n^{1/2})$. Assume that $\max_{1\le i\le m} EY_i^4 \le b_0$ for some constant $b_0 > 0$ and

$$\mathrm{Card}\left\{ i : |\mu_i/\sigma_i| \ge 4\sqrt{\log m/n} \right\} \to \infty. \tag{2}$$

Then

$$\lim_{(n,m)\to\infty} \frac{\mathrm{FDR}_{\Phi}}{(m_0/m)\alpha\kappa_{\Phi}} = 1 \quad\text{and}\quad \lim_{(n,m)\to\infty} \frac{\mathrm{FDR}_{\Psi_{n-1}}}{(m_0/m)\alpha\kappa_{\Psi_{n-1}}} = 1,$$

where $\kappa_{\Phi} = E\left[\hat\kappa_{\Phi}\, I\{\hat\kappa_{\Phi} \le 2(\alpha - \alpha\gamma)^{-1}\}\right]$,

$$\hat\kappa_{\Phi} = \frac{1}{2m_0}\sum_{i\in H_0}\left\{ \exp\left(\frac{\hat t_{\Phi}^3\, EX_i^3}{3\sqrt{n}\,\sigma_i^3}\right) + \exp\left(-\frac{\hat t_{\Phi}^3\, EX_i^3}{3\sqrt{n}\,\sigma_i^3}\right) \right\}$$

and $\kappa_{\Psi_{n-1}}$ is defined in the same way. For the false discovery proportion, we have

$$\frac{\mathrm{FDP}_{\Phi}}{(m_0/m)\alpha\hat\kappa_{\Phi}} \to 1 \quad\text{and}\quad \frac{\mathrm{FDP}_{\Psi_{n-1}}}{(m_0/m)\alpha\hat\kappa_{\Psi_{n-1}}} \to 1$$

in probability as $(n,m) \to \infty$.

Let $\tau = \lim_{m\to\infty} m_0^{-1}\sum_{i\in H_0} |EY_i^3|$. We have the following corollary.

COROLLARY 2.1. Assume that the conditions in Theorem 2.1 are satisfied.

(i) If $\log m = o(n^{1/3})$, then we have

$$\lim_{(n,m)\to\infty} \frac{\mathrm{FDR}_{\Phi}}{\alpha m_0/m} = 1 \quad\text{and}\quad \frac{\mathrm{FDP}_{\Phi}}{\alpha m_0/m} \to 1 \text{ in probability.}$$

(ii) Suppose $\log m \ge c_0 n^{1/3}$ for some $c_0 > 0$ and $m_1 = \exp(o(n^{1/3}))$. Also assume that $\tau > 0$. Then $\lim_{(n,m)\to\infty} \mathrm{FDR}_{\Phi} \ge \beta$ and $\lim_{(n,m)\to\infty} P(\mathrm{FDP}_{\Phi} \ge \beta) = 1$ for some constant $\beta > \alpha$.

(iii) Suppose $\log m/n^{1/3} \to \infty$ and $m_1 = \exp(o(n^{1/3}))$. Assume that $\tau > 0$. Then we have $\lim_{(n,m)\to\infty} \mathrm{FDR}_{\Phi} = 1$ and $\mathrm{FDP}_{\Phi} \to 1$ in probability.

The same conclusions hold for $\mathrm{FDR}_{\Psi_{n-1}}$ and $\mathrm{FDP}_{\Psi_{n-1}}$.

Theorem 2.1 and Corollary 2.1 show that when $\log m = o(n^{1/3})$, the $N(0,1)$ and $t_{n-1}$ calibrations are accurate. Note that only a finite 4th moment of $Y_i$ is required. Furthermore, if the skewnesses $EY_i^3 = 0$ for $i \in H_0$, then the dimension can be as large as $\log m = o(n^{1/2})$. However, a phase transition occurs if the average of skewnesses $\tau > 0$, for example, for the exponential distribution. The FDR and FDP of the B-H method are greater than $\alpha$ as long as $\log m \ge c_0 n^{1/3}$ and converge to one when $\log m/n^{1/3} \to \infty$. Under a finite 4th moment of $X_i$, Cao and Kosorok (2011) prove the robustness of Student's t test statistics and the $N(0,1)$ calibration in the control of FDR and FDP. They require $m_1/m \to c$ for some $0 < c < 1$, which does not cover the sparse case.

Corollary 2.1 also indicates that the choice of asymptotic null distributions is important in the study of large-scale testing problems. When the dimension is much larger than the sample size, simply using the null limiting distribution to estimate the true p-values may result in larger FDR and FDP. This is further verified by our simulation study in Section 5.

In Theorem 2.1 and Corollary 2.1, we require the technical condition (2). Actually, this condition is nearly optimal for the FDP results. If the number of true alternative hypotheses $m_1$ is fixed as $m \to \infty$, then Proposition 2.1 below shows that even for the true p-values, the B-H method is unable to control the FDP at any level $0 < \xi < 1$ with overwhelming probability.
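The role of the $n^{1/3}$ rate in Corollary 2.1 can be illustrated numerically. The sketch below is a simplification made for illustration and is not taken from the paper: it assumes all null variables share one skewness value, evaluates the correction factor $\frac12\{\exp(x) + \exp(-x)\} = \cosh(x)$ with $x = t^3\,EY^3/(3\sqrt n)$ at the threshold $t = \sqrt{2\log m}$, and uses the hypothetical function name `skewness_inflation`. For the exponential distribution the skewness is $EY^3 = 2$.

```python
import math


def skewness_inflation(log_m, n, skewness):
    """cosh(t^3 * skewness / (3 * sqrt(n))) at t = sqrt(2 * log m).

    Illustrative assumption: a single common skewness for all nulls and
    the threshold fixed at sqrt(2 log m) rather than the random t_hat.
    """
    t = math.sqrt(2.0 * log_m)
    return math.cosh(t ** 3 * skewness / (3.0 * math.sqrt(n)))


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        below = skewness_inflation(n ** 0.25, n, 2.0)      # log m = n^(1/4)
        boundary = skewness_inflation(n ** (1 / 3), n, 2.0)  # log m = n^(1/3)
        above = skewness_inflation(n ** 0.4, n, 2.0)       # log m = n^(0.4)
        print(n, below, boundary, above)
```

Under these assumptions the factor drifts toward one when $\log m = n^{1/4}$, stays at a constant above one when $\log m = n^{1/3}$ (because $t^3/\sqrt n$ no longer depends on $n$), and grows without bound when $\log m = n^{0.4}$, which mirrors the three regimes of the corollary.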
Note that (2) is only slightly stronger than $m_1 \to \infty$.

Let $\mathrm{FDP}_{\mathrm{true}}$ be the false discovery proportion of the B-H method with the true p-values $p_i$, $1 \le i \le m$. Let $U(0,1)$ be the uniform random variable on $(0,1)$.

PROPOSITION 2.1. Assume that $m_1$ is fixed as $m \to \infty$ and $X_1,\ldots,X_m$ are independent. Suppose that $p_i \sim U(0,1)$ for $i \in H_0$. For any $0 < \xi < 1$, we have

$$\lim_{(n,m)\to\infty} P(\mathrm{FDP}_{\mathrm{true}} \ge \xi) \ge \eta$$

for some $\eta > 0$, where $\eta$ may depend on $m_1$ and $\xi$.

Proposition 2.1 indicates that $m_1 \to \infty$ is a necessary condition for FDP control. In contrast, the control of FDR does not need $m_1 \to \infty$ when $\log m = o(n^{1/3})$. However, $\mathrm{FDR}_{\Phi}$ and $\mathrm{FDR}_{\Psi_{n-1}}$ may still converge to one if $\log m/n^{1/3} \to \infty$ and $\tau > 0$.

PROPOSITION 2.2. Suppose $m_1$ is fixed as $m \to \infty$, $X_1,\ldots,X_m$ are independent and $\log m = o(n^{1/2})$. Assume that $\max_{1\le i\le m} EY_i^4 \le b_0$ for some constant $b_0 > 0$.

(i) If $\log m = o(n^{1/3})$ and $p_i \sim U(0,1)$ for $i \in H_0$, then $\lim_{(n,m)\to\infty} \mathrm{FDR}_{\Phi} \le \alpha$.

(ii) Suppose $\log m/n^{1/3} \to \infty$. Assume that $\tau > 0$. We have $\lim_{(n,m)\to\infty} \mathrm{FDR}_{\Phi} = 1$.

The same conclusions remain valid for $\mathrm{FDR}_{\Psi_{n-1}}$.

2.2. Bootstrap calibration. In this section, we show that the bootstrap procedure can improve the accuracy of FDR and FDP control. Write $\mathcal{X}_i = \{X_{1i},\ldots,X_{ni}\}$. Let $\mathcal{X}^*_{ki} = \{X^*_{1ki},\ldots,X^*_{nki}\}$, $1 \le k \le N$, be resamples drawn randomly with replacement from $\mathcal{X}_i$. Let $T^*_{ki}$ be Student's t test statistics constructed from $\{X^*_{1ki} - \bar X_i,\ldots,X^*_{nki} - \bar X_i\}$. We use

$$G^*_{N,m}(t) = \frac{1}{Nm}\sum_{k=1}^{N}\sum_{i=1}^{m} I\{|T^*_{ki}| \ge t\}$$

to approximate the null distribution and define the p-values by $\hat p_{i,B} = G^*_{N,m}(|T_i|)$. Let $\mathrm{FDR}_B$ and $\mathrm{FDP}_B$ denote the FDR and FDP of the B-H method with $\hat p_{i,B}$ in (1), respectively.

THEOREM 2.2. Suppose that $\max_{1\le i\le m} E e^{tY_i^2} \le K$ for some constants $t > 0$ and $K > 0$, and the conditions in Theorem 2.1 are satisfied.

(i) If $\log m = o(n^{1/3})$, then we have

$$\lim_{(n,m)\to\infty} \frac{\mathrm{FDR}_B}{\alpha m_0/m} = 1 \quad\text{and}\quad \frac{\mathrm{FDP}_B}{\alpha m_0/m} \to 1 \text{ in probability.} \tag{3}$$

(ii) If $\log m = o(n^{1/2})$ and $m_1 \le m^{\eta}$ for some $\eta < 1$, then (3) holds.

Another common bootstrap method is to estimate the p-values individually by $\tilde p_{i,B} = G^*_i(|T_i|)$, where $G^*_i(t) = N^{-1}\sum_{k=1}^{N} I\{|T^*_{ki}| \ge t\}$; see Fan, Hall and Yao (2007) and Delaigle, Hall and Jin (2011). Similar results to those achieved in Theorem 2.2 can be obtained if $N$ is large enough. Let $\widetilde{\mathrm{FDR}}_B$ and $\widetilde{\mathrm{FDP}}_B$ be the FDR and FDP of the B-H method with $\tilde p_{i,B}$, respectively. The following result holds.

PROPOSITION 2.3. Suppose that $N \ge m^{2+\delta}$ for some $\delta > 0$, $\max_{1\le i\le m} E e^{tY_i^2} \le K$ for some constants $t > 0$ and $K > 0$, and $\log m = o(n^{1/2})$. Assume that $X_1,\ldots,X_m$ are independent.

(i) If (2) holds, then the results of Theorem 2.2(i) and (ii) hold for $\widetilde{\mathrm{FDR}}_B$ and $\widetilde{\mathrm{FDP}}_B$.

(ii) Suppose that $m_1$ is fixed and $p_i \sim U(0,1)$ for $i \in H_0$. If $\log m = o(n^{1/2})$, then we have $\lim_{(n,m)\to\infty} \widetilde{\mathrm{FDR}}_B \le \alpha$.

Fan, Hall and Yao (2007) proved that the bootstrap calibration accurately controls the family-wise error rate if $\log m = o(n^{1/2})$ and $P(|Y_i| \le C) = 1$ for $1 \le i \le m$. Our result on FDR control only requires sub-Gaussian tails, which is a weaker requirement than bounded noise.

The bootstrap method has often been used in multiple Student's t tests in real applications. Fan, Hall and Yao (2007) and Delaigle, Hall and Jin (2011) have proven that the bootstrap method provides more accurate p-values than the normal or $t_{n-1}$ approximation for light-tailed distributions. Theorem 2.2 and Proposition 2.3 show that the bootstrap method allows a higher dimension $\log m = o(n^{1/2})$ for FDR control as long as $\max_{1\le i\le m} E e^{tY_i^2} \le K$. However, some real data may not satisfy such a light-tailed condition. The simulation study in Section 5 also indicates that the bootstrap calibration does not always outperform the $N(0,1)$ or $t_{n-1}$ calibrations.

3. Regularized bootstrap in large-scale tests. In this section, we introduce a regularized bootstrap method that is robust for heavy-tailed distributions, and the dimension $m$ can be as large as $e^{o(n^{1/2})}$. For the regularized bootstrap method, the finite 6th moment condition is enough.

Let $\lambda_{ni}$ be a regularization parameter. Define

$$\hat X_{ki} = X_{ki}\, I\{|X_{ki}| \le \lambda_{ni}\}, \qquad 1 \le k \le n,\ 1 \le i \le m.$$

Write $\hat{\mathcal{X}}_i = \{\hat X_{1i},\ldots,\hat X_{ni}\}$. Let $\hat{\mathcal{X}}^*_{ki} = \{\hat X^*_{1ki},\ldots,\hat X^*_{nki}\}$, $1 \le k \le N$, be resamples drawn independently and uniformly with replacement from $\hat{\mathcal{X}}_i$. Let $\hat T^*_{ki}$ be Student's t test statistics constructed from $\{\hat X^*_{1ki} - \bar{\hat X}_i,\ldots,\hat X^*_{nki} - \bar{\hat X}_i\}$, where $\bar{\hat X}_i = n^{-1}\sum_{k=1}^{n}\hat X_{ki}$. We use

$$\hat G^*_{N,m}(t) = \frac{1}{Nm}\sum_{k=1}^{N}\sum_{i=1}^{m} I\{|\hat T^*_{ki}| \ge t\}$$

to approximate the null distribution and define the p-values by $\hat p_{i,RB} = \hat G^*_{N,m}(|T_i|)$.
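The pooled bootstrap p-values of Section 2.2 and their regularized version can be sketched as follows. This is an illustrative standard-library implementation rather than the authors' code: the name `pooled_bootstrap_p_values` and its parameters are hypothetical, the truncation levels `lam` must be supplied by the caller, and a resample with zero variance is assigned statistic 0, which is a convention chosen here only to avoid division by zero.

```python
import bisect
import math
import random
from statistics import mean, stdev


def t_stat(xs):
    """Student's t statistic; returns 0.0 for a degenerate (constant) sample."""
    s = stdev(xs)
    return mean(xs) / (s / math.sqrt(len(xs))) if s > 0 else 0.0


def pooled_bootstrap_p_values(data, n_resamples, lam=None, seed=0):
    """Pooled bootstrap p-values G*_{N,m}(|T_i|).

    data        : list of m samples, each a list of n floats
    n_resamples : N, the number of resamples per variable
    lam         : optional list of m truncation levels; when given, values
                  with |x| > lam[i] are set to 0 before resampling
                  (regularized bootstrap); None gives the ordinary bootstrap
    """
    rng = random.Random(seed)
    observed = [abs(t_stat(xs)) for xs in data]  # |T_i| from the raw data
    pool = []
    for i, xs in enumerate(data):
        if lam is not None:
            xs = [x if abs(x) <= lam[i] else 0.0 for x in xs]
        centre = mean(xs)
        centred = [x - centre for x in xs]  # enforce the null in resamples
        n = len(centred)
        for _ in range(n_resamples):
            resample = [centred[rng.randrange(n)] for _ in range(n)]
            pool.append(abs(t_stat(resample)))
    pool.sort()
    total = len(pool)
    # fraction of pooled |T*| values that are >= the observed |T_i|
    return [(total - bisect.bisect_left(pool, t)) / total for t in observed]
```

Because all $Nm$ resampled statistics are pooled into one reference distribution, a small `n_resamples` already gives a fine grid of attainable p-values when $m$ is large.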
Let $\mathrm{FDR}_{RB}$ and $\mathrm{FDP}_{RB}$ be the FDR and FDP of the B-H method with $\hat p_{i,RB}$ in (1), respectively.

THEOREM 3.1. Assume that $\max_{1\le i\le m} EX_i^6 \le K$ for some constant $K > 0$. Suppose $X_1,\ldots,X_m$ are independent, (2) holds and $\min_{1\le i\le m}\sigma_{ii} \ge c_1$ for some $c_1 > 0$. Let $c_2 (n/\log m)^{1/6} \le \lambda_{ni} \le c_3 (n/\log m)^{1/6}$ for some $c_2, c_3 > 0$.

(i) If $\log m = o(n^{1/3})$, then

$$\lim_{(n,m)\to\infty} \frac{\mathrm{FDR}_{RB}}{\alpha m_0/m} = 1 \quad\text{and}\quad \frac{\mathrm{FDP}_{RB}}{\alpha m_0/m} \to 1 \text{ in probability.} \tag{4}$$

(ii) If $\log m = o(n^{1/2})$ and $m_1 \le m^{\eta}$ for some $\eta < 1$, then (4) remains valid.

In Theorem 3.1, we only require $\max_{1\le i\le m} EX_i^6 \le K$, which is much weaker than the moment condition in Theorem 2.2. As in Section 2.2, we can also estimate the p-values individually by $\tilde p_{i,RB} = \hat G^*_i(|T_i|)$, where $\hat G^*_i(t) = N^{-1}\sum_{k=1}^{N} I\{|\hat T^*_{ki}| \ge t\}$. Let $\widetilde{\mathrm{FDR}}_{RB}$ and $\widetilde{\mathrm{FDP}}_{RB}$ be the FDR and FDP of the B-H method with $\tilde p_{i,RB}$, respectively. We have the following result.

PROPOSITION 3.1. Suppose that $N \ge m^{2+\delta}$ for some $\delta > 0$, $\max_{1\le i\le m} EX_i^6 \le K$ for some constant $K > 0$, $\min_{1\le i\le m}\sigma_{ii} \ge c_1$ for some $c_1 > 0$ and $c_2 (n/\log m)^{1/6} \le \lambda_{ni} \le c_3 (n/\log m)^{1/6}$ for some $c_2, c_3 > 0$. Assume that $X_1,\ldots,X_m$ are independent.

(i) Suppose that (2) holds. Then Theorem 3.1(i) and (ii) hold for $\widetilde{\mathrm{FDR}}_{RB}$ and $\widetilde{\mathrm{FDP}}_{RB}$.

(ii) Suppose that $m_1$ is fixed and $p_i \sim U(0,1)$ for $i \in H_0$. If $\log m = o(n^{1/2})$, then we have $\lim_{(n,m)\to\infty} \widetilde{\mathrm{FDR}}_{RB} \le \alpha$.

Theorem 3.1 does not cover the case when $m_1$ is fixed. However, if $\tilde p_{i,RB}$, $1 \le i \le m$, are used, then Proposition 3.1 shows that the FDR can be controlled when $m_1$ is fixed and $\log m = o(n^{1/2})$. Actually, when $m_1$ is fixed and $\log m = o(n^{1/3})$, by the proof of Propositions 2.2 and 3.1, we can show that $\lim_{(n,m)\to\infty} \mathrm{FDR}_{RB} \le \alpha$. It is unclear whether a similar result holds for $\mathrm{FDR}_{RB}$ when the dimension becomes larger, that is, $\log m = o(n^{1/2})$. However, under (2), Theorem 3.1 only requires $N \ge 1$ because we use the average of all $m$ variables. Hence $\hat p_{i,RB}$ has a significant advantage in computational cost over $\tilde p_{i,RB}$. Moreover, Proposition 2.1 indicates that (2) is nearly necessary for FDP control. Note that when one has FDP control, one can also have FDR control, but the reverse is not true, as Proposition 2.1 shows. Because FDR control is about the FDP average, studying FDP is more appealing in applications than FDR control.

In the regularized bootstrap method, we must choose the regularization parameter $\lambda_{ni}$.
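By Theorem 1.2 in Wang (2005), equation (2.2) in Shao (1999) and the proof of Theorem 3.1, we have

$$P^*\left(|\hat T^*_{ki}| \ge t \mid \hat{\mathcal{X}}\right) = \frac12 G(t)\left[\exp\left(-\frac{t^3}{3\sqrt n}\hat\kappa_i(\lambda_{ni})\right) + \exp\left(\frac{t^3}{3\sqrt n}\hat\kappa_i(\lambda_{ni})\right)\right]\left(1 + o_P(1)\right),$$

uniformly for $0 \le t \le o(n^{1/4})$, where $G(t) = 2 - 2\Phi(t)$, $\hat{\mathcal{X}} = \{\hat{\mathcal{X}}_1,\ldots,\hat{\mathcal{X}}_m\}$,

$$\hat\kappa_i(\lambda_{ni}) = \frac{1}{n\hat\sigma_i^3}\sum_{k=1}^{n}(\hat X_{ki} - \bar{\hat X}_i)^3 \quad\text{and}\quad \hat\sigma_i^2 = \frac1n\sum_{k=1}^{n}(\hat X_{ki} - \bar{\hat X}_i)^2. \tag{5}$$

Also,

$$P\left(|T_i| \ge t\right) = \frac12 G(t)\left[\exp\left(-\frac{t^3}{3\sqrt n}\kappa_i\right) + \exp\left(\frac{t^3}{3\sqrt n}\kappa_i\right)\right]\left(1 + o(1)\right),$$

uniformly for $0 \le t \le o(n^{1/4})$, where $\kappa_i = EY_i^3$. A good choice of $\lambda_{ni}$ is to make $\hat\kappa_i(\lambda_{ni})$ approach $\kappa_i$. As $\kappa_i$ is unknown, we propose the following cross-validation method.

Data-driven choice of $\lambda_{ni}$. We propose to choose $\hat\lambda_{ni} = |\bar X_i| + \hat s_{ni}\hat\lambda$, where $\hat\lambda$ will be selected as follows. Split the samples into two parts $I_0 = \{1,\ldots,n_0\}$ and $I_1 = \{n_0+1,\ldots,n\}$ with sizes $n_0 = [n/2]$ and $n_1 = n - n_0$, respectively. For $I = I_0$ or $I_1$, let

$$\hat\kappa_{i,I} = \frac{1}{|I|\,\hat s_{ni,I}^3}\sum_{k\in I}(X_{ki} - \bar X_{i,I})^3, \qquad \hat s_{ni,I}^2 = \frac{1}{|I|}\sum_{k\in I}(X_{ki} - \bar X_{i,I})^2, \qquad \bar X_{i,I} = \frac{1}{|I|}\sum_{k\in I} X_{ki}.$$

Let $\hat\kappa_{i,I}(\lambda_{ni})$, with $\lambda_{ni} = |\bar X_{i,I}| + \hat s_{ni,I}\lambda/2$, be defined as in (5) based on $\{\hat X_{ki}, k \in I\}$. Define the risk

$$R_j(\lambda) = \sum_{i=1}^{m}\left(\hat\kappa_{i,I_j}(\lambda_{ni}) - \hat\kappa_{i,I_{1-j}}\right)^2.$$

We choose $\lambda$ by

$$\hat\lambda = \mathop{\arg\min}_{0 < \lambda < \infty}\left\{R_0(\lambda) + R_1(\lambda)\right\}. \tag{6}$$

The final regularization parameter is $\hat\lambda_{ni} = |\bar X_i| + \hat s_{ni}\hat\lambda$. The numerical performance comparison between the data-driven choice $\hat\lambda_{ni}$ and the theoretical choice [e.g., $(n/\log m)^{1/6}$] is given in Section 5. In addition, it is important to investigate the theoretical property of $\hat\lambda_{ni}$ and to see whether Theorem 3.1 still holds when $\hat\lambda_{ni}$ is used. We leave this for future work.

4. FDR control under dependence. To generalize the results to the dependent case, we introduce a class of correlation matrices. Let $A = (a_{ij})$ be a symmetric matrix. Let $k_m$ and $s_m$ be positive numbers. Assume that for every $1 \le j \le m$,

$$\mathrm{Card}\left\{1 \le i \le m : |a_{ij}| \ge k_m\right\} \le s_m. \tag{7}$$

Let $\mathcal{A}(k_m, s_m)$ be the class of symmetric matrices satisfying (7). Let $R = (r_{ij})$ be the correlation matrix of $X$. We introduce the following two conditions:

(C1) Suppose that $\max_{1\le i<j\le m}|r_{ij}| \le r$ for some $0 < r < 1$ and $R \in \mathcal{A}(k_m, s_m)$ with $k_m = (\log m)^{-2-\theta}$ and $s_m = O(m^{\rho})$ for some $\theta > 0$ and $0 < \rho < (1-r)/(1+r)$.

(C1*) Suppose that $\max_{1\le i<j\le m}|r_{ij}| \le r$ for some $0 < r < 1$. For each $X_i$, assume that the number of variables $X_j$ that are dependent with $X_i$ is no more than $s_m$.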

(C1) and (C1*) impose weak dependence between $X_1,\ldots,X_m$. In (C1), each variable can be highly correlated with $s_m$ other variables and weakly correlated with the remaining variables. (C1*) is stronger than (C1). For each $X_i$, (C1*) requires independence between $X_i$ and the other $m - s_m$ variables. Recall that $m_1 \le \gamma m$ for some $\gamma < 1$.

THEOREM 4.1. Assume that $\max_{1\le i\le m} EY_i^4 \le b_0$ for some constant $b_0 > 0$, and (2) holds.

(i) If $\log m = O(n^{\zeta})$ for some $0 < \zeta < 3/23$ and (C1) is satisfied, then we have

$$\lim_{(n,m)\to\infty} \frac{\mathrm{FDR}_{\Phi}}{(m_0/m)\alpha} = 1 \quad\text{and}\quad \frac{\mathrm{FDP}_{\Phi}}{(m_0/m)\alpha} \to 1 \text{ in probability.} \tag{8}$$

(ii) Under $\log m = o(n^{1/3})$ and (C1*), (8) also holds.

The same conclusions hold for $\mathrm{FDR}_{\Psi_{n-1}}$ and $\mathrm{FDP}_{\Psi_{n-1}}$.

For the bootstrap and regularized bootstrap procedures, we have similar results.

THEOREM 4.2. Suppose that $\max_{1\le i\le m} E e^{tY_i^2} \le K$ and (2) is satisfied.

(1) Under the conditions of (i) or (ii) in Theorem 4.1, we have

$$\lim_{(n,m)\to\infty} \frac{\mathrm{FDR}_B}{(m_0/m)\alpha} = 1 \quad\text{and}\quad \frac{\mathrm{FDP}_B}{(m_0/m)\alpha} \to 1 \text{ in probability.} \tag{9}$$

(2) Under (C1*), $\log m = o(n^{1/2})$ and $m_1 \le m^{\eta}$ for some $\eta < 1$, (9) holds.

THEOREM 4.3. Suppose that $\max_{1\le i\le m} EX_i^6 \le K$ for some constant $K > 0$, $\min_{1\le i\le m}\sigma_{ii} \ge c_1$ for some $c_1 > 0$ and (2) is satisfied. Let $c_2 (n/\log m)^{1/6} \le \lambda_{ni} \le c_3 (n/\log m)^{1/6}$ for some $c_2, c_3 > 0$.

(1) Under the conditions of (i) or (ii) in Theorem 4.1, we have

$$\lim_{(n,m)\to\infty} \frac{\mathrm{FDR}_{RB}}{(m_0/m)\alpha} = 1 \quad\text{and}\quad \frac{\mathrm{FDP}_{RB}}{(m_0/m)\alpha} \to 1 \text{ in probability.} \tag{10}$$

(2) Under (C1*), $\log m = o(n^{1/2})$ and $m_1 \le m^{\eta}$ for some $\eta < 1$, (10) holds.

Theorems 4.1 to 4.3 imply that the B-H method remains valid asymptotically under weak dependence. As with the phase transition phenomenon caused by the growth of the dimension, it would be interesting to investigate when the B-H method will fail to control the FDR as the correlation becomes stronger.
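Condition (7), which defines the class $\mathcal{A}(k_m, s_m)$ used in (C1), is a simple column-wise count and can be checked directly for a given matrix. The helper below is a hypothetical illustration (the name `in_class_A` is not from the paper); note that the diagonal entry is included in the count, so for a correlation matrix with $k_m \le 1$ every column contributes at least one.

```python
def in_class_A(matrix, k_m, s_m):
    """Check condition (7): for every column j, the number of rows i with
    |a_ij| >= k_m is at most s_m. The diagonal entry is counted too."""
    m = len(matrix)
    for j in range(m):
        strong = sum(1 for i in range(m) if abs(matrix[i][j]) >= k_m)
        if strong > s_m:
            return False
    return True


if __name__ == "__main__":
    # Example: correlations decaying geometrically with |i - j|
    m, rho = 50, 0.5
    corr = [[rho ** abs(i - j) for j in range(m)] for i in range(m)]
    # 0.5**3 = 0.125 >= 0.1 > 0.5**4, so interior columns have 7 strong entries
    print(in_class_A(corr, 0.1, 7), in_class_A(corr, 0.1, 6))
```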

5. Numerical study. In this section, we conduct a small simulation to verify the phase transition phenomenon. Let

$$X_i = \mu_i + (\varepsilon_i - E\varepsilon_i), \qquad 1 \le i \le m, \tag{11}$$

where $(\varepsilon_1,\ldots,\varepsilon_m)$ are i.i.d. random variables. We consider three models for $\varepsilon_i$ and $\mu_i$.

Model 1. $\varepsilon_i$ is the exponential random variable with parameter 1. Let $\mu_i = 2\sigma\sqrt{\log m/n}$ for $1 \le i \le m_1$ with $m_1 = 0.05m$ and $\mu_i = 0$ for $m_1 < i \le m$, where $\sigma^2 = \mathrm{Var}(\varepsilon_i)$.

Model 2-1. $\varepsilon_i$ is the Gamma random variable with parameter $(0.5, 1)$. Let $\mu_i = 4\sigma\sqrt{\log m/n}$ for $1 \le i \le m_1$ with $m_1 = 0.05m$ and $\mu_i = 0$ for $m_1 < i \le m$.

Model 2-2. $\varepsilon_i$ is the Gamma random variable with parameter $(0.5, 1)$. Let $m_1 = 0$.

In all three models, the average of skewnesses is $\tau > 0$. We generate $n = 30, 50$ independent random samples from (11). In our simulation, $\alpha$ is taken to be 0.1, 0.2, 0.3 and $m$ is taken to be 500, 1000, 3000. For computational reasons, we only consider the estimated p-values $\hat p_{i,B}$ and $\hat p_{i,RB}$ in the bootstrap and regularized bootstrap procedures, respectively. The number of bootstrap resamples is taken to be $N = 200$. We use $\mathrm{FDR}_B$, $\mathrm{FDR}_{RB}$ and $\mathrm{FDR}'_{RB}$ to denote the FDR of the B-H method with the bootstrap, the regularized bootstrap with data-driven $\hat\lambda_{ni}$ and the regularized bootstrap with theoretical $\lambda_{ni} = (n/\log m)^{1/6}$, respectively. The simulation is replicated 1000 times and the empirical FDR and power for $m = 3000$ are summarized in Tables 1 and 2. To save space, we leave the simulation results for $m = 500$ and 1000 to the supplementary material [Liu and Shao (2014)]. The empirical power is defined as the average ratio between the number of correct rejections and $m_1$.

Due to the nonzero skewness and $m \ge \exp(n^{1/3})$, the empirical $\mathrm{FDR}_{\Phi}$ and $\mathrm{FDR}_{\Psi_{n-1}}$ are much larger than the target FDR. The bootstrap method and the regularized bootstrap method with data-driven $\hat\lambda_{ni}$ provide more accurate approximations for the true p-values. Thus the empirical $\mathrm{FDR}_B$ and $\mathrm{FDR}_{RB}$ are much closer to $\alpha$ than $\mathrm{FDR}_{\Phi}$ and $\mathrm{FDR}_{\Psi_{n-1}}$.
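One replication of Model 1 under the $N(0,1)$ calibration can be sketched with the standard library alone. The code below is an illustration of the simulation design in (11), not the authors' implementation; the function name `simulate_model1_once` is hypothetical, and the rounding of $m_1 = 0.05m$ to an integer is a choice made here.

```python
import math
import random
from statistics import mean, stdev


def simulate_model1_once(m, n, alpha, rng):
    """One replication of Model 1 with N(0,1)-calibrated B-H.

    Returns (FDP, empirical power) for this replication.
    """
    m1 = round(0.05 * m)  # assumption: round 0.05 * m to an integer
    sigma = 1.0           # Var(eps) = 1 for the Exp(1) distribution
    signal = 2.0 * sigma * math.sqrt(math.log(m) / n)
    p_values = []
    for i in range(m):
        mu = signal if i < m1 else 0.0
        # X = mu + (eps - E eps) with eps ~ Exp(1), E eps = 1
        xs = [mu + rng.expovariate(1.0) - 1.0 for _ in range(n)]
        t = mean(xs) / (stdev(xs) / math.sqrt(n))
        p_values.append(math.erfc(abs(t) / math.sqrt(2.0)))  # 2 - 2*Phi(|t|)
    # B-H step-up rule (1)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_hat = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k_hat = rank
    rejected = order[:k_hat]
    false_rejections = sum(1 for idx in rejected if idx >= m1)
    fdp = false_rejections / max(len(rejected), 1)
    power = (len(rejected) - false_rejections) / m1 if m1 > 0 else 0.0
    return fdp, power


if __name__ == "__main__":
    rng = random.Random(2014)
    results = [simulate_model1_once(500, 30, 0.2, rng) for _ in range(20)]
    print("mean FDP:", mean(r[0] for r in results))
    print("mean power:", mean(r[1] for r in results))
```

Averaging the first component over many replications gives an empirical FDR that can be compared with the target level $\alpha$; the replication count of 20 in the usage example is kept small only to limit run time.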
For Models 1, 2-1 and 2-2, the bootstrap method and the proposed regularized bootstrap method with data-driven $\hat\lambda_{ni}$ perform quite similarly. In addition, the data-driven $\hat\lambda_{ni}$ performs much better than the theoretical $\lambda_{ni}$. All four methods perform better as the sample size $n$ grows from 30 to 50, although the empirical $\mathrm{FDR}_{\Phi}$ and $\mathrm{FDR}_{\Psi_{n-1}}$ still exhibit a serious departure from $\alpha$.

Next, we consider the following two models to compare the performance of the four methods when the distributions are symmetric and heavy tailed.

Model 3. $\varepsilon_i$ has Student's t distribution with 4 degrees of freedom. Let $\mu_i = 2\sqrt{\log m/n}$ for $1 \le i \le m_1$ with $m_1 = 0.1m$ and $\mu_i = 0$ for $m_1 < i \le m$.

TABLE 1. Comparison of FDR (FDR = $\alpha$, $m = 3000$) for $n = 30$ and $n = 50$. Rows $\mathrm{FDR}_{\Phi}$, $\mathrm{FDR}_{\Psi_{n-1}}$, $\mathrm{FDR}_B$, $\mathrm{FDR}_{RB}$ and $\mathrm{FDR}'_{RB}$ for each of Exp(1); Gamma(0.5, 1) with $m_1 = 0.05m$; and Gamma(0.5, 1) with $m_1 = 0$. (Table entries not preserved.)

Model 4. $\varepsilon_i = \varepsilon_{i1} - \varepsilon_{i2}$, where $\varepsilon_{i1}$ and $\varepsilon_{i2}$ are independent lognormal random variables with parameters $(0, 1)$. Let $\mu_i = 4\sqrt{\log m/n}$ for $1 \le i \le m_1$ with $m_1 = 0.1m$ and $\mu_i = 0$ for $m_1 < i \le m$.

For these two models, the normal approximation performs the best on the control of FDR; see Tables 3 and 4. $\mathrm{FDR}_B$ is much smaller than $\alpha$, so the bootstrap method is quite conservative. This is mainly due to the heavy tails of the $t(4)$ and lognormal distributions. The regularized bootstrap method works much better than the bootstrap method in controlling the FDR. Table 4 shows that it also has a higher power ($\mathrm{power}_{RB}$) than the bootstrap method ($\mathrm{power}_B$). Hence the proposed regularized bootstrap is more robust than the commonly used bootstrap method.

Finally, we examine the FDP control of the B-H method when m is small and p-values are known. To this end, we consider Model 5, in which the exact null distributions are known.

Model 5. Let $\varepsilon_i$ be i.i.d. $N(0,1)$ random variables. Let $\mu_i = 2\sqrt{\log m/n}$ for $1 \le i \le m_1$ and $\mu_i = 0$ for $m_1 < i \le m$, where $m_1 = 0$, 1 and 5.

In Figure 1, we plot the curve of the tail probability of FDP based on 5000 replications, that is, $\sum_{i=1}^{5000} I\{\mathrm{FDP}_i \ge t\}/5000$, where $\mathrm{FDP}_i$ is the true FDP in the $i$th replication. From Figure 1, we can see that when $m_1$ is small, the B-H method works unfavorably on FDP control. For example, the empirical probability of FDP > 0.4 is 1 when $m_1 = 0$, 0.35 when $m_1 = 1$ and 0.12 when $m_1 = 5$.

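This phenomenon is in accord with Proposition 2.1. In contrast, as indicated by Theorem 2.1, the performance of FDP control improves when $m_1$ increases.

TABLE 2. Comparison of power (FDR = $\alpha$, $m = 3000$) for $n = 30$ and $n = 50$. Rows $\mathrm{power}_{\Phi}$, $\mathrm{power}_{\Psi_{n-1}}$, $\mathrm{power}_B$, $\mathrm{power}_{RB}$ and $\mathrm{power}'_{RB}$ for each of Exp(1) and Gamma(0.5, 1) with $m_1 = 0.05m$. (Table entries not preserved.)

TABLE 3. Comparison of FDR (FDR = $\alpha$, $m = 3000$) for $n = 30$ and $n = 50$. Rows $\mathrm{FDR}_{\Phi}$, $\mathrm{FDR}_{\Psi_{n-1}}$, $\mathrm{FDR}_B$, $\mathrm{FDR}_{RB}$ and $\mathrm{FDR}'_{RB}$ for each of $t(4)$ and Lognormal(0, 1). (Table entries not preserved.)

TABLE 4. Comparison of power (FDR = $\alpha$, $m = 3000$) for $n = 30$ and $n = 50$. Rows $\mathrm{power}_{\Phi}$, $\mathrm{power}_{\Psi_{n-1}}$, $\mathrm{power}_B$, $\mathrm{power}_{RB}$ and $\mathrm{power}'_{RB}$ for each of $t(4)$ and Lognormal(0, 1). (Table entries not preserved.)

FIG. 1. Tail probability of FDP with $\alpha = 0.2$ and $n = 50$. Panels: (a) $m_1 = 0$, (b) $m_1 = 1$, (c) $m_1 = 5$. The y-axis values are the empirical tail probabilities $\sum_{i=1}^{5000} I\{\mathrm{FDP}_i \ge t\}/5000$.

6. Proof of main results. We begin the proof by showing a uniform law of large numbers (13), which plays a key role in the proof of the main results. According to Theorem 1.2 in Wang (2005) and equation (2.2) in Shao (1999), we have for $0 \le t \le o(n^{1/4})$,

$$P\left(|T_i - \sqrt n\mu_i/\hat s_{ni}| \ge t\right) = \frac12 G(t)\left[\exp\left(-\frac{t^3}{3\sqrt n}\kappa_i\right) + \exp\left(\frac{t^3}{3\sqrt n}\kappa_i\right)\right]\left(1 + o(1)\right), \tag{12}$$

where $o(1)$ is uniform in $1 \le i \le m$, $G(t) = 2 - 2\Phi(t)$ and $\kappa_i = EY_i^3$. For any $b_m \to \infty$ and $b_m = o(m)$, we first prove that, under (C1*) and $\log m = o(n^{1/2})$ [or (C1) and $\log m = O(n^{\zeta})$ for some $0 < \zeta < 3/23$],

$$\sup_{0 \le t \le G_{\kappa}^{-1}(b_m/m)}\left|\frac{\sum_{i\in H_0} I\{|T_i| \ge t\}}{m_0 G_{\kappa}(t)} - 1\right| \to 0 \tag{13}$$

in probability, where

$$G_{\kappa}(t) = \frac{1}{2m_0} G(t)\sum_{i\in H_0}\left[\exp\left(-\frac{t^3}{3\sqrt n}\kappa_i\right) + \exp\left(\frac{t^3}{3\sqrt n}\kappa_i\right)\right] =: G(t)\hat\kappa_{\Phi}(t)$$

and $G_{\kappa}^{-1}(t) = \inf\{y \ge 0 : G_{\kappa}(y) = t\}$ for $0 \le t \le 1$. Note that for $0 \le t \le o(\sqrt n)$, $G_{\kappa}(t)$ is a strictly decreasing and continuous function. Let $z_0 < z_1 < \cdots < z_{d_m} \le 1$ and $t_i = G_{\kappa}^{-1}(z_i)$, where $z_0 = b_m/m$, $z_i = b_m/m + b_m^{2/3} e^{i^{\delta}}/m$, $d_m = [\{\log((m - b_m)/b_m^{2/3})\}^{1/\delta}]$ and $0 < \delta < 1$, which will be specified later. Note that $G_{\kappa}(t_i)/G_{\kappa}(t_{i+1}) = 1 + o(1)$ uniformly in $i$, and $t_0/\sqrt{2\log(m/b_m)} = 1 + o(1)$. Then to prove (13), it is enough to show that

$$\sup_{0 \le j \le d_m}\left|\frac{\sum_{i\in H_0} I\{|T_i| \ge t_j\}}{m_0 G_{\kappa}(t_j)} - 1\right| \to 0 \tag{14}$$

in probability. Under (C1), define

$$S_j = \left\{i \in H_0 : |r_{ij}| \ge (\log m)^{-2-\theta}\right\}, \qquad S_j^c = H_0 \setminus S_j,$$

and under (C1*), define

$$S_j = \{i \in H_0 : X_i \text{ is dependent with } X_j\}.$$

We claim that, under (C1*) and $\log m = o(n^{1/2})$ [or (C1) and $\log m = O(n^{\zeta})$ for some $0 < \zeta < 3/23$], for any $\varepsilon > 0$ and some $\gamma_1 > 0$,

$$I_2(t) := E\left(\sum_{i\in H_0}\left\{I\{|T_i| \ge t\} - P(|T_i| \ge t)\right\}\right)^2 \le C m_0^2 G_{\kappa}^2(t)\left(\frac{1}{m_0 G_{\kappa}(t)} + \frac{\exp\left((r+\varepsilon)t^2/(1+r)\right)}{m^{1-\rho}} + (\log m)^{-1-\gamma_1}\right) \tag{15}$$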

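uniformly in $t \in [0, K\sqrt{\log m}]$ for all $K > 0$. Take $(1+\gamma_1)^{-1} < \delta < 1$. Given (15) and $G_{\kappa}^{-1}(b_m/m) \le \sqrt{2\log(m/b_m)}$, for any $\varepsilon > 0$, we have

$$\begin{aligned}
\sum_{j=0}^{d_m} P\left(\left|\frac{\sum_{i\in H_0} I\{|T_i| \ge t_j\}}{m_0 G_{\kappa}(t_j)} - 1\right| \ge \varepsilon\right)
&\le \sum_{j=0}^{d_m} P\left(\left|\frac{\sum_{i\in H_0}\left(I\{|T_i| \ge t_j\} - P(|T_i| \ge t_j)\right)}{m_0 G_{\kappa}(t_j)}\right| \ge \varepsilon/2\right) \\
&\le C\left(\frac{1}{m_0 G_{\kappa}(t_0)} + \sum_{j=1}^{d_m}\frac{1}{m_0 G_{\kappa}(t_j)} + d_m m^{-1+\rho+(2r+2\varepsilon)/(1+r)+o(1)} + d_m(\log m)^{-1-\gamma_1}\right) \\
&\le C\left(b_m^{-1} + b_m^{-2/3}\sum_{j=1}^{d_m} e^{-j^{\delta}} + o(1)\right) = o(1).
\end{aligned}$$

This proves (14). To prove (15), we need the following lemma, which is proven in the supplementary material [Liu and Shao (2014)].

LEMMA 6.1. (i) Suppose that $\log m = O(n^{1/2})$. For any $\varepsilon > 0$,

$$\max_{j\in H_0}\max_{i\in S_j\setminus j} P\left(|T_i| \ge t, |T_j| > t\right) \le C\exp\left(-(1-\varepsilon)t^2/(1+r)\right) \tag{16}$$

uniformly in $t \in [0, o(n^{1/4}))$.

(ii) Suppose that $\log m = O(n^{\zeta})$ for some $0 < \zeta < 3/23$. We have for any $K > 0$

$$P\left(|T_i| > t, |T_j| > t\right) = (1 + A_n) P\left(|T_i| > t\right) P\left(|T_j| > t\right) \tag{17}$$

uniformly in $0 \le t \le K\sqrt{\log m}$, $j \in H_0$ and $i \in S_j^c$, where $|A_n| \le C(\log m)^{-1-\gamma_1}$ for some $\gamma_1 > 0$.

Set $f_{ij}(t) = P(|T_i| \ge t, |T_j| \ge t) - P(|T_i| \ge t)P(|T_j| \ge t)$. Note that under (C1*), $f_{ij} = 0$ when $j \in H_0 \setminus S_i$. We have

$$\begin{aligned}
I_2(t) &\le \sum_{i\in H_0}\sum_{j\in S_i} P\left(|T_i| \ge t, |T_j| \ge t\right) + \sum_{i\in H_0}\sum_{j\in H_0\setminus S_i} f_{ij}(t) \\
&\le C m_0 G_{\kappa}(t) + C\frac{\exp\left((r+2\varepsilon)t^2/(1+r)\right)}{m^{1-\rho}} m_0^2 G_{\kappa}^2(t) + A_n m_0^2 G_{\kappa}^2(t),
\end{aligned}$$

where the last inequality follows from Lemma 6.1 and $G_{\kappa}(t) = G(t)e^{o(1)t^2}$ for $t = o(\sqrt n)$. This proves (15).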

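Proof of Theorem 2.1 and Corollary 2.1. We only prove the theorem for $\hat p_{i,\Phi}$. The proof for $\hat p_{i,\Psi_{n-1}}$ is exactly the same when $G(t)$ is replaced with $2 - 2\Psi_{n-1}(t)$. By Lemma 1 in Storey, Taylor and Siegmund (2004), we can see that the B-H method with $\hat p_{i,\Phi}$ is equivalent to the following procedure: reject $H_{0i}$ if and only if $\hat p_{i,\Phi} \le \hat t_0$, where

$$\hat t_0 = \sup\left\{0 \le t \le 1 : t \le \frac{\alpha\max\left(\sum_{1\le i\le m} I\{\hat p_{i,\Phi} \le t\}, 1\right)}{m}\right\}.$$

It is equivalent to reject $H_{0i}$ if and only if $|T_i| \ge \hat t$, where

$$\hat t = \inf\left\{t \ge 0 : 2 - 2\Phi(t) \le \frac{\alpha\max\left(\sum_{1\le i\le m} I\{|T_i| \ge t\}, 1\right)}{m}\right\}.$$

By the continuity of $\Phi(t)$ and the monotonicity of the indicator function, it is easy to see that

$$\frac{m G(\hat t)}{\max\left(\sum_{1\le i\le m} I\{|T_i| \ge \hat t\}, 1\right)} = \alpha,$$

where $G(t) = 2 - 2\Phi(t)$. Let $\mathcal{M}$ be a subset of $\{1, 2,\ldots,m\}$ satisfying $\mathcal{M} \subset \{i : |\mu_i/\sigma_i| \ge 4\sqrt{\log m/n}\}$ and $\mathrm{Card}(\mathcal{M}) \le \sqrt n$. By $\max_{1\le i\le m} EY_i^4 \le K$ and Markov's inequality, for any $\varepsilon > 0$,

$$P\left(\max_{i\in\mathcal{M}}\left|\hat s_{ni}^2/\sigma_i^2 - 1\right| \ge \varepsilon\right) = O(1/\sqrt n).$$

This, together with (2) and (12), implies that there exist some $c > \sqrt 2$ and some $b_m \to \infty$ such that

$$P\left(\sum_{i=1}^{m} I\left\{|T_i| \ge c\sqrt{\log m}\right\} \ge b_m\right) \to 1. \tag{18}$$

This implies that $P(\hat t \le G^{-1}(\alpha b_m/m)) \to 1$. Given (13) and $G_{\kappa}(t) \ge G(t)$, it follows that $P(\hat t \le G_{\kappa}^{-1}(\alpha b_m/m)) \to 1$. Therefore, by (13),

$$\frac{\sum_{i\in H_0} I\{|T_i| \ge \hat t\}}{m_0 G_{\kappa}(\hat t)} \to 1$$

in probability. Note that

$$G(\hat t) = \frac{\alpha\hat m}{m} + \frac{\alpha m_0}{m}\cdot\frac{\sum_{i\in H_0} I\{|T_i| \ge \hat t\}}{m_0},$$

where $\hat m = \sum_{i\in H_1} I\{|T_i| \ge \hat t\}$. With probability tending to one,

$$G(\hat t) = \frac{\alpha\hat m}{m} + \frac{\alpha m_0}{m} G(\hat t)\hat\kappa_{\Phi}\left(1 + o(1)\right) \ge \frac{\alpha m_0}{m} G(\hat t)\hat\kappa_{\Phi}\left(1 + o(1)\right). \tag{19}$$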

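Thus $P(\hat\kappa \le m/(\alpha m_0) + \varepsilon) \to 1$ for any $\varepsilon > 0$. Let $\hat\kappa^{*} = \hat\kappa I\{\hat\kappa \le 2(\alpha(1-\gamma))^{-1}\}$. Note that $m/(\alpha m_0) + \varepsilon \le 2(\alpha(1-\gamma))^{-1}$. We have

\[
\frac{\mathrm{FDP}}{(m_0/m)\alpha\hat\kappa^{*}} = \frac{\sum_{i \in H_0} I\{|T_i| \ge \hat t\}}{m_0 G_\kappa(\hat t)} \cdot \frac{\hat\kappa}{\hat\kappa^{*}} (1 + o(1)) \to 1
\]

in probability. Then for any $\varepsilon > 0$,

\[
\mathrm{FDR} \le (1+\varepsilon) \frac{m_0}{m} \alpha E\hat\kappa^{*} + P\Bigl( \mathrm{FDP} \ge (1+\varepsilon) \frac{m_0}{m} \alpha \hat\kappa^{*} \Bigr)
\]

and

\[
\mathrm{FDR} \ge (1-\varepsilon) \frac{m_0}{m} \alpha E\hat\kappa^{*} - 2(\alpha(1-\gamma))^{-1} P\Bigl( \mathrm{FDP} \le (1-\varepsilon) \frac{m_0}{m} \alpha \hat\kappa^{*} \Bigr).
\]

This proves Theorem 2.1. Corollary 2.1(1) follows directly from Theorem 2.1 and $P(\hat t \le \sqrt{2\log m}) \to 1$.

To prove Corollary 2.1(2), we first assume that $\frac{\alpha m_0}{m}\hat\kappa \le 1 - \eta$ for some $(1-\eta)/\alpha > 1$. So, by (19) and the condition $m_1 = \exp(o(n^{1/3}))$, with probability tending to one,

\[
G(\hat t) \le 2\alpha\eta^{-1} \hat m/m \le 2\alpha\eta^{-1} m^{-1+o(1)}.
\]

Hence, $\hat t \ge c\sqrt{\log m}$ for any $c < \sqrt{2}$. Recall that $\tau = \lim_{m \to \infty} m_0^{-1} \sum_{i \in H_0} |EY_i^3| > 0$. Set

\[
H_{01} = \{ i \in H_0 : |EY_i^3| \ge \tau/8 \}.
\]

According to the definition of $\tau$ and $|EY_i^3| \le (EY_i^4)^{3/4} \le b_0^{3/4}$,

\[
m_0^{-1} |H_{01}^c| \tau/8 + b_0^{3/4} m_0^{-1} |H_{01}| \ge \tau/2.
\]

This implies that $|H_{01}| \ge \tau b_0^{-3/4} m_0/4$. Hence, we can get $m_0^{-1} \sum_{i \in H_0} |EY_i^3|^2 \ge c_\tau$ for some $c_\tau > 0$. It follows from Taylor's expansion of the exponential function and $\hat t \ge c\sqrt{\log m}$ that $\hat\kappa \ge 1 + \epsilon$ for some $\epsilon > 0$. However, if $\frac{\alpha m_0}{m}\hat\kappa > 1 - \eta$, then $\hat\kappa \ge 1 + \epsilon$ for some $\epsilon > 0$. This yields that $P(\hat\kappa \ge 1 + \epsilon) \to 1$ for some $\epsilon > 0$. So we have $\kappa \ge 1 + \epsilon$ for some $\epsilon > 0$. Note that $m_0/m \to 1$. This proves Corollary 2.1(2).

We next prove Corollary 2.1(3). By the inequality $e^{x} + e^{-x} \ge |x|$ and $P(\hat\kappa \le m/(\alpha m_0) + \varepsilon) \to 1$, we obtain that

\[
\frac{\sum_{i \in H_0} (\hat t^3/\sqrt{n}) |EY_i^3|}{2 m_0} \le m/(\alpha m_0) + \varepsilon
\]

with probability tending to one. By $\tau > 0$, we have $P(\hat t \le c n^{1/6}) \to 1$ for some constant $c > 0$. Thus $P(G(\hat t) \ge \exp(-2c n^{1/3})) \to 1$. Because $\hat m/m \le \exp(-M n^{1/3})$ for any $M > 0$, and given (19), we have $\frac{\alpha m_0}{m}\hat\kappa \to 1$ in probability. Hence, $\kappa \to 1/\alpha$ as $m_0/m \to 1$. The proof is finished.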
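To get a feel for the phase transition at $\log m \asymp n^{1/3}$, the sketch below evaluates a skewness correction factor of the form $\frac{1}{2}[e^{x} + e^{-x}]$ with $x = t^3 \kappa/(3\sqrt{n})$ at the B-H scale $t = \sqrt{2\log m}$. It assumes, purely for illustration, that every null has the same third moment `skewness`; the function names and the chosen values of $m$, $n$ and skewness are hypothetical and not taken from the paper.

```python
import math


def kappa_factor(t, n, skewness):
    """(e^x + e^-x) / 2 with x = t^3 * skewness / (3 * sqrt(n)).

    Assumption: a single common third moment for all null hypotheses.
    """
    x = t ** 3 * skewness / (3.0 * math.sqrt(n))
    return math.cosh(x)


def factor_at_bh_scale(m, n, skewness):
    """Evaluate the factor at t = sqrt(2 log m), the largest relevant threshold."""
    t = math.sqrt(2.0 * math.log(m))
    return kappa_factor(t, n, skewness)


if __name__ == "__main__":
    m = 10 ** 4
    for n in (50, 500, 5000, 10 ** 6):
        ratio = math.log(m) / n ** (1.0 / 3.0)
        print(n, round(ratio, 3), factor_at_bh_scale(m, n, skewness=2.0))
```

When $\log m / n^{1/3}$ is small the factor stays close to one, which matches the regime where normal calibration controls the FDR; when the ratio is of constant order or larger, the factor moves away from one, which is the inflation described in Corollary 2.1.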

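Proof of Theorems 2.2 and 4.2.

Let $\hat\kappa_i = \frac{1}{n \hat s_{ni}^3} \sum_{k=1}^{n} (X_{ki} - \bar X_i)^3$. Define the event

\[
F = \Bigl\{ \max_{1 \le i \le m} \frac{1}{n \hat s_{ni}^4} \sum_{k=1}^{n} (X_{ki} - \bar X_i)^4 \le K_1,\ \max_{1 \le i \le m} |\hat\kappa_i - \kappa_i| \le K_2 \sqrt{\log m/n} \Bigr\}
\]

for some large $K_1 > 0$ and $K_2 > 0$. We first suppose that $P(F) \to 1$. Let $G_i^{*}(t) = P^{*}(|T_{ki}^{*}| \ge t)$ be the conditional distribution of $T_{ki}^{*}$ given $\mathcal{X} = \{X_1, \ldots, X_m\}$. Note that, given $\mathcal{X}$ and on the event $F$,

\[
\begin{aligned}
G_i^{*}(t) &= \frac{1}{2} G(t) \Bigl[ \exp\Bigl( \frac{t^3}{3\sqrt{n}} \hat\kappa_i \Bigr) + \exp\Bigl( -\frac{t^3}{3\sqrt{n}} \hat\kappa_i \Bigr) \Bigr] (1 + o(1)) \\
&= \frac{1}{2} G(t) \Bigl[ \exp\Bigl( \frac{t^3}{3\sqrt{n}} \kappa_i \Bigr) + \exp\Bigl( -\frac{t^3}{3\sqrt{n}} \kappa_i \Bigr) \Bigr] (1 + o(1))
\end{aligned}
\]

uniformly in $0 \le t \le o(n^{1/4})$. Hence, given $\mathcal{X}$ and on the event $F$,

\[
\frac{G_i^{*}(t)}{P(|T_i - \sqrt{n}\mu_i/\hat s_{ni}| \ge t)} = 1 + o(1) \tag{20}
\]

uniformly in $1 \le i \le m$ and $0 \le t \le o(n^{1/4})$. Put

\[
\hat G_\kappa(t) = \frac{1}{2m} G(t) \sum_{1 \le i \le m} \Bigl[ \exp\Bigl( \frac{t^3}{3\sqrt{n}} \kappa_i \Bigr) + \exp\Bigl( -\frac{t^3}{3\sqrt{n}} \kappa_i \Bigr) \Bigr].
\]

Set $\hat c_m = \hat G_\kappa^{-1}(b_m/m)$. Note that, given $\mathcal{X}$, $T_{ki}^{*}$, $1 \le k \le N$, $1 \le i \le m$, are independent. Hence, as in (13), we can show that for any $b_m \to \infty$,

\[
\sup_{0 \le t \le \hat c_m} \Bigl| \frac{G_{N,m}^{*}(t)}{\hat G_\kappa(t)} - 1 \Bigr| \to 0 \tag{21}
\]

in probability. For $t = O(\sqrt{\log m})$, under the conditions of Theorem 3.2, we have $\hat G_\kappa(t)/G_\kappa(t) = 1 + o(1)$. So, it is easy to see that (13) still holds when $G_\kappa^{-1}(b_m/m)$ is replaced by $\hat G_\kappa^{-1}(b_m/m)$. This implies that for any $b_m \to \infty$,

\[
\sup_{0 \le t \le \hat c_m} \Bigl| \frac{\sum_{i \in H_0} I\{|T_i| \ge t\}}{m_0 G_{N,m}^{*}(t)} - 1 \Bigr| \to 0 \tag{22}
\]

in probability. Let

\[
\hat t_0 = \sup\Bigl\{ 0 \le t \le 1 : t \le \frac{\alpha \max\bigl(\sum_{1 \le i \le m} I\{\hat p_{i,B} \le t\}, 1\bigr)}{m} \Bigr\}.
\]

Then we have

\[
\hat t_0 = \frac{\alpha \max\bigl(\sum_{1 \le i \le m} I\{\hat p_{i,B} \le \hat t_0\}, 1\bigr)}{m}.
\]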
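The quantities $\hat\kappa_i$ and $\hat G_\kappa(t)$ in this proof are simple to compute from data. The sketch below is illustrative only: it assumes $\hat s_{ni}^2$ is the usual sample variance with divisor $n - 1$ (the definition is not restated in this section), and it plugs the estimates $\hat\kappa_i$ into the formula for $\hat G_\kappa$ in place of the population values $\kappa_i$. The names `sample_skewness` and `corrected_tail` are hypothetical.

```python
import math


def sample_skewness(xs):
    """kappa_hat = sum((x - mean)^3) / (n * s^3), with s^2 the (n-1)-divisor variance.

    The (n-1) divisor is an assumption made for this sketch.
    """
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)
    if s2 == 0.0:
        raise ValueError("sample has zero variance")
    s = math.sqrt(s2)
    return sum((x - mean) ** 3 for x in xs) / (n * s ** 3)


def corrected_tail(t, samples):
    """Skewness-corrected two-sided tail, averaged over the m tests.

    samples: list of m lists, each holding the n observations of one test.
    Uses estimated skewness where the formula in the text uses kappa_i.
    """
    m = len(samples)
    n = len(samples[0])
    g = math.erfc(t / math.sqrt(2.0))  # G(t) = 2 - 2*Phi(t)
    total = 0.0
    for xs in samples:
        x = t ** 3 * sample_skewness(xs) / (3.0 * math.sqrt(n))
        total += math.exp(x) + math.exp(-x)
    return g * total / (2.0 * m)


if __name__ == "__main__":
    data = [[0.0, 0.0, 0.0, 4.0], [-1.0, 0.0, 0.5, 1.0]]
    print(corrected_tail(2.0, data))
```

Because $e^{x} + e^{-x} \ge 2$, the corrected tail is never smaller than $G(t)$, and it reduces to $G(t)$ exactly when all estimated skewnesses are zero.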

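According to (12) and (20) we have, given $\mathcal{X}$ and on the event $F$, $G_i^{*}(c\sqrt{\log m}) = m^{-c^2/2+o(1)}$ for any $c > \sqrt{2}$ uniformly in $i$. So, by Markov's inequality, for any $\varepsilon > 0$, we have $P(G_{N,m}^{*}(c\sqrt{\log m}) \le m^{-c^2/2+\varepsilon}) \to 1$. By (2) and (18), we have $P(\hat t_0 \ge \alpha b_m/m) \to 1$ for some $b_m \to \infty$. It follows from (22) that

\[
\frac{\sum_{i \in H_0} I\{\hat p_{i,B} \le \hat t_0\}}{m_0 \hat t_0} \to 1
\]

in probability. This finishes the proof of Theorem 2.2(1), (2) and Theorem 4.2 if we can show that $P(F) \to 1$.

Without loss of generality, we can assume that $\mu_i = 0$ and $\sigma_i = 1$. We first show that for some constant $K_1 > 0$,

\[
P\Bigl( \max_{1 \le i \le m} \Bigl| \sum_{k=1}^{n} (X_{ki}^4 - EX_{ki}^4) \Bigr| \ge K_1 n \Bigr) = o(1). \tag{23}
\]

For $1 \le i \le n$, put

\[
\hat X_{ki} = X_{ki} I\{|X_{ki}| \le \sqrt{n/\log m}\}, \qquad \tilde X_{ki} = X_{ki} - \hat X_{ki}.
\]

Then, for large $n$,

\[
\begin{aligned}
P\Bigl( \max_{1 \le i \le m} \Bigl| \sum_{k=1}^{n} (\tilde X_{ki}^4 - E\tilde X_{ki}^4) \Bigr| \ge K_1 n/2 \Bigr) &\le nm \max_{1 \le i \le m} P(|X_{1i}| \ge \sqrt{n/\log m}) \\
&\le C \exp(\log m + \log n - tn/\log m) = o(1).
\end{aligned}
\]

Let $Z_{ki} = \hat X_{ki}^4 - E\hat X_{ki}^4$. By the inequality $|e^{s} - 1 - s| \le s^2 e^{\max(s,0)}$ and $1 + s \le e^{s}$, we have for $\eta = 2^{-1} t (\log m)/n$ and some large $K_1$

\[
\begin{aligned}
P\Bigl( \max_{1 \le i \le m} \Bigl| \sum_{k=1}^{n} Z_{ki} \Bigr| \ge K_1 n/2 \Bigr) &\le \sum_{i=1}^{m} P\Bigl( \sum_{k=1}^{n} Z_{ki} \ge K_1 n/2 \Bigr) + \sum_{i=1}^{m} P\Bigl( -\sum_{k=1}^{n} Z_{ki} \ge K_1 n/2 \Bigr) \\
&\le \sum_{i=1}^{m} \exp(-\eta K_1 n/2) \Bigl[ \prod_{k=1}^{n} E\exp(\eta Z_{ki}) + \prod_{k=1}^{n} E\exp(-\eta Z_{ki}) \Bigr] \\
&\le 2 \sum_{i=1}^{m} \exp\bigl( -\eta K_1 n/2 + \eta^2 n E Z_{1i}^2 e^{\eta |Z_{1i}|} \bigr) \\
&\le C \exp\bigl( \log m - t K_1 (\log m)/4 \bigr) = o(1).
\end{aligned}
\]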
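The last step of the bound above can be unpacked as follows. This is a sketch of the arithmetic only, under the assumption (not stated explicitly on this page) that $\max_i E Z_{1i}^2 e^{\eta |Z_{1i}|} \le C_0$ for a constant $C_0$, and that $\log m = o(n)$; the symbol $C_0$ is introduced here for illustration.

```latex
% With \eta = t (\log m) / (2n):
\[
-\eta K_1 n/2 = -\frac{t K_1}{4} \log m,
\qquad
\eta^2 n C_0 = \frac{C_0 t^2 (\log m)^2}{4n} = o(\log m).
\]
% Summing the m identical bounds then gives
\[
2 \sum_{i=1}^{m} \exp\bigl( -\eta K_1 n/2 + \eta^2 n C_0 \bigr)
\le 2 \exp\Bigl( \log m - \frac{t K_1}{4} \log m + o(\log m) \Bigr),
\]
% which tends to zero as soon as K_1 > 4/t.
```

This makes explicit why $K_1$ has to be taken large: the union bound over $m$ tests costs a factor $e^{\log m}$, which the exponential tilt must outweigh.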

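This proves (23). By replacing $X_{ki}^4$, $\eta = 2^{-1} t (\log m)/n$ and $K_1 n/2$ with $X_{ki}^3$, $\eta = 2^{-1} t \sqrt{(\log m)/n}$ and $K_1 \sqrt{n \log m}/2$, respectively, in the above proof, we can show that

\[
P\Bigl( \max_{1 \le i \le m} \Bigl| \frac{1}{n} \sum_{k=1}^{n} (X_{ki}^3 - EX_{ki}^3) \Bigr| \ge K_1 \sqrt{(\log m)/n} \Bigr) = o(1). \tag{24}
\]

Similarly, we have

\[
P\Bigl( \max_{1 \le i \le m} \Bigl| \frac{1}{n} \sum_{k=1}^{n} (X_{ki}^2 - EX_{ki}^2) \Bigr| \ge K_1 \sqrt{(\log m)/n} \Bigr) = o(1) \tag{25}
\]

and

\[
P\Bigl( \max_{1 \le i \le m} \Bigl| \frac{1}{n} \sum_{k=1}^{n} (X_{ki} - EX_{ki}) \Bigr| \ge K_1 \sqrt{(\log m)/n} \Bigr) = o(1). \tag{26}
\]

Combining (23)-(26), we prove that $P(F) \to 1$.

Proof of Theorems 3.1 and 4.3.

Let

\[
\hat F = \Bigl\{ \max_{1 \le i \le m} \frac{1}{n \hat\sigma_i^4} \sum_{k=1}^{n} (\hat X_{ki} - \bar{\hat X}_i)^4 \le K_1,\ \max_{1 \le i \le m} |\hat\kappa_i(\lambda_{ni}) - \kappa_i| \le K_2 \sqrt{\log m/n} \Bigr\}.
\]

By the proof of Theorems 2.2 and 4.2, it is enough to show that $P(\hat F) \to 1$. Recall that $\hat X_{ki} = X_{ki} I\{|X_{ki}| \le \lambda_{ni}\}$ and put $Z_{ki} = \hat X_{ki}^4 - E\hat X_{ki}^4$. Take $\eta = (\log m)/n$. We have

\[
\begin{aligned}
P\Bigl( \max_{1 \le i \le m} \Bigl| \sum_{k=1}^{n} Z_{ki} \Bigr| \ge K_1 n/2 \Bigr) &\le 2 \sum_{i=1}^{m} \exp\bigl( -\eta K_1 n/2 + \eta^2 n E Z_{1i}^2 e^{\eta |Z_{1i}|} \bigr) \\
&\le C \exp\bigl( 2\log m - K_1 (\log m)/4 \bigr) = o(1).
\end{aligned}
\]

Similarly, by replacing $\hat X_{ki}^4$, $\eta = (\log m)/n$ and $K_1 n/2$ with $\hat X_{ki}^3$, $\eta = \sqrt{(\log m)/n}$ and $K_1 \sqrt{n \log m}/2$, respectively, in the above proof, we can show that

\[
P\Bigl( \max_{1 \le i \le m} \Bigl| \frac{1}{n} \sum_{k=1}^{n} (\hat X_{ki}^3 - E\hat X_{ki}^3) \Bigr| \ge K_1 \sqrt{(\log m)/n} \Bigr) = o(1).
\]

Also, using the above arguments, it is easy to show that

\[
P\Bigl( \max_{1 \le i \le m} \Bigl| \frac{1}{n} \sum_{k=1}^{n} (\hat X_{ki}^2 - E\hat X_{ki}^2) \Bigr| \ge K_1 \sqrt{(\log m)/n} \Bigr) = o(1)
\]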

and

\[
P\Bigl( \max_{1 \le i \le m} \Bigl| \frac{1}{n} \sum_{k=1}^{n} (\hat X_{ki} - E\hat X_{ki}) \Bigr| \ge K_1 \sqrt{(\log m)/n} \Bigr) = o(1).
\]

Note that

\[
\max_{1 \le i \le m} E|X_{1i}|^3 I\{|X_{1i}| \ge \lambda_{ni}\} \le C \sqrt{\frac{\log m}{n}} \max_{1 \le i \le m} EX_{1i}^6
\]

and

\[
\max_{1 \le i \le m} E|X_{1i}|^2 I\{|X_{1i}| \ge \lambda_{ni}\} \le C \Bigl( \frac{\log m}{n} \Bigr)^{2/3} \max_{1 \le i \le m} EX_{1i}^6.
\]

This proves that $P(\hat F) \to 1$.

Proof of Theorem 4.1.

Recall that

\[
\frac{m G(\hat t)}{\max\bigl(\sum_{1 \le i \le m} I\{|T_i| \ge \hat t\}, 1\bigr)} = \alpha.
\]

From (18), we have $P(\hat t \le G^{-1}(\alpha b_m/m)) \to 1$. The theorem follows from (13) and the fact that $G_\kappa(t)/G(t) = 1 + o(1)$ uniformly in $t \in [0, o(n^{1/6}))$.

Proof of Propositions 2.1, 2.2, 2.3 and 3.1.

To save space, the proofs of these propositions are given in the supplementary material [Liu and Shao (2014)].

Acknowledgments. The authors would like to thank the Associate Editor and two referees for their valuable comments, which have helped to improve the quality and presentation of this paper.

SUPPLEMENTARY MATERIAL

Supplement to "Phase transition and regularized bootstrap in large-scale t-tests with false discovery rate control" (DOI: /14-AOS1249SUPP; .pdf). The supplementary material includes part of the numerical results and the proofs of Lemma 6.1 and Propositions 2.1, 2.2, 2.3 and 3.1.

REFERENCES

BENJAMINI, Y. and HOCHBERG, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Stat. Methodol.

BENJAMINI, Y. and YEKUTIELI, D. (2001). The control of the false discovery rate in multiple testing under dependency. Ann. Statist.

CAO, H. and KOSOROK, M. R. (2011). Simultaneous critical values for t-tests in very high dimensions. Bernoulli.

DELAIGLE, A., HALL, P. and JIN, J. (2011). Robustness and accuracy of methods for high dimensional data analysis based on Student's t-statistic. J. R. Stat. Soc. Ser. B Stat. Methodol.

EFRON, B. (2004). Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. J. Amer. Statist. Assoc.

FAN, J., HALL, P. and YAO, Q. (2007). To how many simultaneous hypothesis tests can normal, Student's t or bootstrap calibration be applied? J. Amer. Statist. Assoc.

FERREIRA, J. A. and ZWINDERMAN, A. H. (2006). On the Benjamini-Hochberg method. Ann. Statist.

LIU, W. and SHAO, Q. (2014). Supplement to "Phase transition and regularized bootstrap in large-scale t-tests with false discovery rate control." DOI: /14-AOS1249SUPP.

SHAO, Q.-M. (1999). A Cramér type large deviation result for Student's t-statistic. J. Theoret. Probab.

STOREY, J. D. (2003). The positive false discovery rate: A Bayesian interpretation and the q-value. Ann. Statist.

STOREY, J. D., TAYLOR, J. E. and SIEGMUND, D. (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: A unified approach. J. R. Stat. Soc. Ser. B Stat. Methodol.

WANG, Q. (2005). Limit theorems for self-normalized large deviation. Electron. J. Probab. (electronic).

WU, W. B. (2008). On false discovery control under dependence. Ann. Statist.

DEPARTMENT OF MATHEMATICS AND INSTITUTE OF NATURAL SCIENCES
SHANGHAI JIAO TONG UNIVERSITY
SHANGHAI
CHINA
weidongl@sjtu.edu.cn

DEPARTMENT OF STATISTICS
THE CHINESE UNIVERSITY OF HONG KONG
SHATIN, N.T., HONG KONG
CHINA


Incorporation of Sparsity Information in Large-scale Multiple Two-sample t Tests

Weidong Liu

October 19, 2014

Abstract

Large-scale multiple two-sample Student's t testing problems often arise from the