PHASE TRANSITION AND REGULARIZED BOOTSTRAP IN LARGE-SCALE t-tests WITH FALSE DISCOVERY RATE CONTROL


The Annals of Statistics
2014, Vol. 42, No. 5, DOI: /14-AOS1249
Institute of Mathematical Statistics, 2014

BY WEIDONG LIU¹ AND QI-MAN SHAO²

Shanghai Jiao Tong University and The Chinese University of Hong Kong

Applying the Benjamini and Hochberg (B-H) method to multiple Student's t tests is a popular technique for gene selection in microarray data analysis. Given the nonnormality of the population, the true p-values of the hypothesis tests are typically unknown. Hence it is common to use the standard normal distribution $N(0,1)$, Student's t distribution $t_{n-1}$ or the bootstrap method to estimate the p-values. In this paper, we prove that when the population has a finite 4th moment and the dimension $m$ and the sample size $n$ satisfy $\log m = o(n^{1/3})$, the B-H method controls the false discovery rate (FDR) and the false discovery proportion (FDP) at a given level $\alpha$ asymptotically with p-values estimated from the $N(0,1)$ or $t_{n-1}$ distribution. However, a phase transition phenomenon occurs when $\log m \ge c_0 n^{1/3}$. In this case, the FDR and the FDP of the B-H method may be larger than $\alpha$ or even converge to one. In contrast, the bootstrap calibration is accurate for $\log m = o(n^{1/2})$ as long as the underlying distribution has sub-Gaussian tails. However, such a light-tailed condition cannot generally be weakened. The simulation study shows that the bootstrap calibration is very conservative for heavy-tailed distributions. To solve this problem, a regularized bootstrap correction is proposed and is shown to be robust to the tails of the distributions. The simulation study shows that the regularized bootstrap method performs better than its usual counterpart.

1. Introduction. Multiple Student's t tests often arise in real applications such as gene selection. Consider $m$ tests on the mean values

$$H_{0i}: \mu_i = 0 \quad \text{versus} \quad H_{1i}: \mu_i \neq 0, \qquad 1 \le i \le m.$$
Received October 2013; revised May
¹Supported by NSFC, Grant No and No , the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning, Shanghai Pujiang Program, Foundation for the Author of National Excellent Doctoral Dissertation of PR China and a grant from Australian Research Council.
²Supported in part by Hong Kong RGC GRF and
MSC2010 subject classifications. 62H15.
Key words and phrases. Bootstrap correction, false discovery rate, multiple t-tests, phase transition.

A popular procedure is to use the B-H method to search for significant findings, with the FDR controlled at a given level $0 < \alpha < 1$; that is,

$$E\left[\frac{V}{\max(R,1)}\right] \le \alpha,$$

where $V$ is the number of wrongly rejected hypotheses and $R$ is the total number of rejected hypotheses. The seminal procedure of Benjamini and Hochberg (1995) rejects the null hypotheses for which $p_i \le p_{(\hat k)}$, where $p_i$ is the p-value for $H_{0i}$,

$$\hat k = \max\{0 \le i \le m : p_{(i)} \le \alpha i/m\} \tag{1}$$

and $p_{(1)} \le \cdots \le p_{(m)}$ are the ordered p-values. Let $T_1, \dots, T_m$ be Student's t test statistics

$$T_i = \frac{\bar X_i}{\hat s_{ni}/\sqrt{n}}, \qquad \bar X_i = \frac{1}{n}\sum_{k=1}^n X_{ki}, \qquad \hat s_{ni}^2 = \frac{1}{n-1}\sum_{k=1}^n (X_{ki} - \bar X_i)^2,$$

where $(X_{k1}, \dots, X_{km})$, $1 \le k \le n$, are i.i.d. random samples from $(X_1, \dots, X_m)$. When $T_1, \dots, T_m$ are independent and the true p-values $p_i$ are known, Benjamini and Hochberg (1995) showed that the B-H method controls the FDR at level $\alpha$. In many applications, the distributions of $X_i$, $1 \le i \le m$, are non-Gaussian. Hence it is difficult to know the exact null distributions of $T_i$ and the true p-values. When the B-H method is applied, the p-values are therefore estimates. Motivated by the central limit theorem, it is common to use the standard normal distribution $N(0,1)$ or Student's t distribution $t_{n-1}$ to estimate the p-values, where $t_{n-1}$ denotes a Student's t random variable with $n-1$ degrees of freedom. In a microarray analysis, Efron (2004) observed that the choice of null distribution substantially affects the simultaneous inference procedure. However, a systematic theoretical study of the influence of estimated p-values is still lacking. It is important to know how accurate the $N(0,1)$ and $t_{n-1}$ calibrations can be.

In this paper, we show that the $N(0,1)$ and $t_{n-1}$ calibrations are accurate when $\log m = o(n^{1/3})$. Moreover, if the underlying distributions are symmetric, then the dimension can be as large as $\log m = o(n^{1/2})$.
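The B-H rule (1) and the $N(0,1)$-calibrated p-values $2 - 2\Phi(|T_i|)$ can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' code. The function names are our own, and zero-variance samples are not handled. Replacing `normal_pvalue` with a $t_{n-1}$ tail probability would give the $t_{n-1}$ calibration instead.

```python
import math


def t_statistic(sample):
    """Student's t statistic T = Xbar / (s / sqrt(n)), where s^2 uses divisor n - 1."""
    n = len(sample)
    xbar = math.fsum(sample) / n
    s2 = math.fsum((x - xbar) ** 2 for x in sample) / (n - 1)
    return xbar / math.sqrt(s2 / n)


def normal_pvalue(t):
    """Two-sided N(0, 1) p-value 2 - 2*Phi(|t|), written as erfc(|t| / sqrt(2))."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def benjamini_hochberg(pvalues, alpha):
    """Indices rejected by rule (1): reject H_0i iff p_i <= p_(k_hat)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_hat = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k_hat = rank  # step-up: keep the largest rank that passes
    if k_hat == 0:
        return set()
    cutoff = pvalues[order[k_hat - 1]]
    return {i for i, p in enumerate(pvalues) if p <= cutoff}


# Toy usage with three hypothetical samples of size 4.
samples = [[0.1, -0.3, 0.2, 0.0], [2.1, 1.9, 2.3, 2.0], [-0.2, 0.4, -0.1, 0.1]]
pvals = [normal_pvalue(t_statistic(x)) for x in samples]
print(benjamini_hochberg(pvals, alpha=0.1))  # prints {1}
```

The rule is step-up: the largest passing rank determines the cutoff, even if some smaller ranks fail their own comparison.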
Under a finite 4th moment of $X_i$, the FDR and the false discovery proportion (FDP) of the B-H method converge to $\alpha m_0/m$. This holds with either of the estimated p-values $\hat p_{i,\Phi} = 2 - 2\Phi(|T_i|)$ or $\hat p_{i,\Psi_{n-1}} = 2 - 2\Psi_{n-1}(|T_i|)$. Here $m_0$ is the number of true null hypotheses, $\Phi(t)$ is the standard normal distribution function and $\Psi_{n-1}(t) = P(t_{n-1} \le t)$. However, when $\log m \ge c_0 n^{1/3}$ for some $c_0 > 0$ and the distributions are asymmetric, the $N(0,1)$ and $t_{n-1}$ calibrations may not work well, and a phase transition phenomenon occurs. Suppose that $\log m \ge c_0 n^{1/3}$, that the number of true alternative hypotheses satisfies $m_1 = \exp(o(n^{1/3}))$, and that the average of skewnesses satisfies $\tau = \lim_{m\to\infty} m_0^{-1}\sum_{i \in H_0} EX_i^3/\sigma_i^3 > 0$. Under these conditions we show that the FDR of

the B-H method satisfies $\liminf_{(m,n)\to\infty} \mathrm{FDR} \ge \kappa$ for some constant $\kappa > \alpha$, where $H_0 = \{i : \mu_i = 0\}$. Furthermore, if $\log m/n^{1/3} \to \infty$, then $\lim_{(m,n)\to\infty} \mathrm{FDR} = 1$. Similar results are proven for the false discovery proportion. This indicates that the $N(0,1)$ and $t_{n-1}$ calibrations are inaccurate in the ultra-high-dimensional setting when the average of skewnesses $\tau > 0$.

It is well known that the bootstrap is an effective way to improve the accuracy of approximating the exact null distribution. Fan, Hall and Yao (2007) showed that for bounded noise, the bootstrap can improve the accuracy and allow a higher dimension $\log m = o(n^{1/2})$ when controlling the family-wise error rate. Delaigle, Hall and Jin (2011) showed that the bootstrap method has significant advantages in higher criticism. In this paper, we show that when the bootstrap calibration is used and $\log m = o(n^{1/2})$, the B-H method asymptotically controls the FDR and FDP at level $\alpha$. In our results, we assume sub-Gaussian tails instead of the bounded noise in Fan, Hall and Yao (2007). Although the bootstrap method allows a higher dimension, the light-tailed condition cannot generally be weakened. The simulation study shows that the bootstrap method is very conservative for heavy-tailed distributions. To solve this problem, we propose a regularized bootstrap method that is robust to the tails of the distributions. The proposed regularized bootstrap only requires a finite 6th moment, and the dimension can be as large as $\log m = o(n^{1/2})$.

It is also not uncommon in real applications for $X_1, \dots, X_m$ to be dependent, which results in dependence among $T_1, \dots, T_m$. In this paper, we obtain similar results for the B-H method under a general weak dependence condition. Much work has been done on the robustness of FDR/FDP controlling methods against dependence. Benjamini and Yekutieli (2001) proved that the B-H procedure controls the FDR under positive regression dependency.
Storey (2003), Storey, Taylor and Siegmund (2004) and Ferreira and Zwinderman (2006) imposed a dependence condition that requires the law of large numbers for the empirical distributions under the null and alternative hypotheses. Wu (2008) developed FDR controlling procedures for data coming from special models, such as time series models. However, to satisfy the conditions of most existing methods, it is often necessary to assume that the number of true alternative hypotheses $m_1$ is asymptotically $\pi_1 m$ for some $\pi_1 > 0$. This excludes the sparse setting $m_1 = o(m)$, which is important in applications such as gene selection. For example, if $m_1 = o(m)$, then neither the conditions of Theorem 4 in Storey, Taylor and Siegmund (2004) nor the conditions of the main results in Wu (2008) are satisfied. In contrast, our results on FDR and FDP control under dependence allow $m_1 \le \gamma m$ for some $\gamma < 1$.

The remainder of this paper is organized as follows. In Section 2.1, we show the robustness of, and the phase transition phenomenon for, the $N(0,1)$ and $t_{n-1}$ calibrations. In Section 2.2, we show that the bootstrap calibration can improve FDR and FDP control. The regularized bootstrap method is proposed in Section 3. The results are extended to the dependent case in Section 4. The simulation study

is presented in Section 5, and the proofs are deferred to Section 6. Throughout the paper, the constants in the upper and lower bounds, such as $\gamma$, $b_0$ and $c_0$, do not depend on $n$ and $m$.

2. Main results.

2.1. Robustness and phase transition. In this section, we assume that the Student's t test statistics $T_1, \dots, T_m$ are independent; the results are extended to the dependent case in Section 4. Before stating the main theorems, we introduce some notation. Let $\hat p_{i,\Phi} = 2 - 2\Phi(|T_i|)$ and $\hat p_{i,\Psi_{n-1}} = 2 - 2\Psi_{n-1}(|T_i|)$ be the p-values calculated from the standard normal distribution and the t distribution, respectively. Let $\mathrm{FDR}_\Phi$ and $\mathrm{FDR}_{\Psi_{n-1}}$ be the FDR of the B-H method with $\hat p_{i,\Phi}$ and $\hat p_{i,\Psi_{n-1}}$ in (1), respectively. Similarly, we denote the false discovery proportions of the B-H method by $\mathrm{FDP}_\Phi = V/\max(R,1)$ and $\mathrm{FDP}_{\Psi_{n-1}}$. Recall that $R$ is the total number of rejections. The critical values of the tests are then $\hat t_\Phi = \Phi^{-1}(1 - \alpha R/(2m))$ and $\hat t_{\Psi_{n-1}} = \Psi_{n-1}^{-1}(1 - \alpha R/(2m))$. Set $Y_i = (X_i - \mu_i)/\sigma_i$ with $\sigma_i^2 = \mathrm{Var}(X_i)$, $1 \le i \le m$. Recall that $m_1$ is the number of true alternative hypotheses. Throughout this paper, we assume $m_1 \le \gamma m$ for some $\gamma < 1$, which includes the important sparse setting $m_1 = o(m)$.

THEOREM 2.1. Suppose $X_1, \dots, X_m$ are independent and $\log m = o(n^{1/2})$. Assume that $\max_{1 \le i \le m} EY_i^4 \le b_0$ for some constant $b_0 > 0$ and

$$\mathrm{Card}\big\{i : |\mu_i/\sigma_i| \ge 4\sqrt{\log m/n}\big\} \to \infty. \tag{2}$$

Then

$$\lim_{(n,m)\to\infty} \frac{\mathrm{FDR}_\Phi}{(m_0/m)\alpha\kappa_\Phi} = 1 \quad\text{and}\quad \lim_{(n,m)\to\infty} \frac{\mathrm{FDR}_{\Psi_{n-1}}}{(m_0/m)\alpha\kappa_{\Psi_{n-1}}} = 1,$$

where $\kappa_\Phi = E\big[\hat\kappa_\Phi I\{\hat\kappa_\Phi \le 2(\alpha - \alpha\gamma)^{-1}\}\big]$,

$$\hat\kappa_\Phi = \frac{1}{2m_0}\sum_{i \in H_0}\Big\{\exp\Big(\frac{\hat t_\Phi^3 EX_i^3}{3\sqrt{n}\,\sigma_i^3}\Big) + \exp\Big(-\frac{\hat t_\Phi^3 EX_i^3}{3\sqrt{n}\,\sigma_i^3}\Big)\Big\}$$

and $\kappa_{\Psi_{n-1}}$ is defined in the same way. For the false discovery proportion, we have $\mathrm{FDP}_\Phi/((m_0/m)\alpha\hat\kappa_\Phi) \to 1$ and $\mathrm{FDP}_{\Psi_{n-1}}/((m_0/m)\alpha\hat\kappa_{\Psi_{n-1}}) \to 1$ in probability as $(n,m) \to \infty$.

Let $\tau = \lim_{m\to\infty} m_0^{-1}\sum_{i \in H_0} EY_i^3$. We have the following corollary.

COROLLARY 2.1. Assume that the conditions in Theorem 2.1 are satisfied.
(i) If $\log m = o(n^{1/3})$, then $\lim_{(n,m)\to\infty} \mathrm{FDR}_\Phi/(\alpha m_0/m) = 1$ and $\mathrm{FDP}_\Phi/(\alpha m_0/m) \to 1$ in probability.
(ii) Suppose $\log m \ge c_0 n^{1/3}$ for some $c_0 > 0$ and $m_1 = \exp(o(n^{1/3}))$. Also assume that $\tau > 0$. Then $\liminf_{(n,m)\to\infty} \mathrm{FDR}_\Phi \ge \beta$ and $\lim_{(n,m)\to\infty} P(\mathrm{FDP}_\Phi \ge \beta) = 1$ for some constant $\beta > \alpha$.
(iii) Suppose $\log m/n^{1/3} \to \infty$ and $m_1 = \exp(o(n^{1/3}))$. Assume that $\tau > 0$. Then $\lim_{(n,m)\to\infty} \mathrm{FDR}_\Phi = 1$ and $\mathrm{FDP}_\Phi \to 1$ in probability.
The same conclusions hold for $\mathrm{FDR}_{\Psi_{n-1}}$ and $\mathrm{FDP}_{\Psi_{n-1}}$.

Theorem 2.1 and Corollary 2.1 show that the $N(0,1)$ and $t_{n-1}$ calibrations are accurate when $\log m = o(n^{1/3})$. Note that only a finite 4th moment of $Y_i$ is required. Furthermore, if the skewnesses satisfy $EY_i^3 = 0$ for $i \in H_0$, then the dimension can be as large as $\log m = o(n^{1/2})$. However, a phase transition occurs if the average of skewnesses $\tau > 0$, as for the exponential distribution. The FDR and FDP of the B-H method are then greater than $\alpha$ as long as $\log m \ge c_0 n^{1/3}$, and they converge to one when $\log m/n^{1/3} \to \infty$. Under a finite 4th moment of $X_i$, Cao and Kosorok (2011) proved the robustness of Student's t test statistics and the $N(0,1)$ calibration in the control of FDR and FDP. They require $m_1/m \to c$ for some $0 < c < 1$, which does not cover the sparse case. Corollary 2.1 also indicates that the choice of the asymptotic null distribution is important in large-scale testing problems. When the dimension is much larger than the sample size, simply using the limiting null distribution to estimate the true p-values may result in a larger FDR and FDP. This is further verified by our simulation study in Section 5.

In Theorem 2.1 and Corollary 2.1, we require the technical condition (2). In fact, this condition is nearly optimal for the FDP results. Suppose the number of true alternative hypotheses $m_1$ stays fixed as $m \to \infty$. Proposition 2.1 below shows that even with the true p-values, the B-H method then fails to control the FDP with overwhelming probability at any level $0 < \xi < 1$.
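The following sketch makes the phase transition in Theorem 2.1 and Corollary 2.1 concrete. It evaluates the inflation factor $\hat\kappa_\Phi$, which is the average of $\cosh(\hat t^3 EY_i^3/(3\sqrt n))$ over the null hypotheses. As a heuristic assumption, it uses the threshold $t = \sqrt{2\log m}$ in place of the data-dependent $\hat t_\Phi$. The helper names are hypothetical.

With $\log m = c\,n^{1/3}$, the exponent $t^3/(3\sqrt n) = (2c)^{3/2}/3$ does not depend on $n$, so the factor stays bounded away from one. With $\log m = o(n^{1/3})$ the exponent vanishes and the factor tends to one.

```python
import math


def kappa_hat(t, skewness, n):
    """Inflation factor of Theorem 2.1 at threshold t:
    (1/(2 m0)) * sum_i [exp(a k_i) + exp(-a k_i)] with a = t^3 / (3 sqrt(n)),
    i.e. the mean of cosh(a * k_i) over the null skewnesses k_i = E Y_i^3."""
    a = t ** 3 / (3.0 * math.sqrt(n))
    return math.fsum(math.cosh(a * k) for k in skewness) / len(skewness)


def kappa_on_boundary(n, c, skew):
    """kappa_hat at the heuristic threshold t = sqrt(2 log m), with log m = c * n^(1/3)."""
    log_m = c * n ** (1.0 / 3.0)
    return kappa_hat(math.sqrt(2.0 * log_m), [skew], n)


# Exponential errors have skewness 2. On the boundary log m = n^(1/3) the factor
# does not change with n; with log m = n^(1/4) it decreases slowly towards 1.
for n in (30, 1000, 10 ** 6):
    slower = kappa_hat(math.sqrt(2.0 * n ** 0.25), [2.0], n)
    print(n, round(kappa_on_boundary(n, 1.0, 2.0), 3), round(slower, 3))
```

Since $\cosh x \ge 1$, the factor can only inflate the FDR relative to $\alpha m_0/m$. This matches the direction of the phase transition described above.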
Condition (2) is only slightly stronger than $m_1 \to \infty$. Let $\mathrm{FDP}_{\mathrm{true}}$ be the false discovery proportion of the B-H method with the true p-values $p_i$, $1 \le i \le m$. Let $U(0,1)$ be a uniform random variable on $(0,1)$.

PROPOSITION 2.1. Assume that $m_1$ is fixed as $m \to \infty$ and $X_1, \dots, X_m$ are independent. Suppose that $p_i \sim U(0,1)$ for $i \in H_0$. For any $0 < \xi < 1$, we have $\liminf_{(n,m)\to\infty} P(\mathrm{FDP}_{\mathrm{true}} \ge \xi) \ge \eta$ for some $\eta > 0$, where $\eta$ may depend on $m_1$ and $\xi$.

Proposition 2.1 indicates that $m_1 \to \infty$ is a necessary condition for FDP control. In contrast, FDR control does not need $m_1 \to \infty$ when $\log m = o(n^{1/3})$. However, $\mathrm{FDR}_\Phi$ and $\mathrm{FDR}_{\Psi_{n-1}}$ may still converge to one if $\log m/n^{1/3} \to \infty$ and $\tau > 0$.

PROPOSITION 2.2. Suppose $m_1$ is fixed as $m \to \infty$, $X_1, \dots, X_m$ are independent and $\log m = o(n^{1/2})$. Assume that $\max_{1 \le i \le m} EY_i^4 \le b_0$ for some constant $b_0 > 0$.
(i) If $\log m = o(n^{1/3})$ and $p_i \sim U(0,1)$ for $i \in H_0$, then $\limsup_{(n,m)\to\infty} \mathrm{FDR}_\Phi \le \alpha$.
(ii) Suppose $\log m/n^{1/3} \to \infty$. Assume that $\tau > 0$. Then $\lim_{(n,m)\to\infty} \mathrm{FDR}_\Phi = 1$.
The same conclusions remain valid for $\mathrm{FDR}_{\Psi_{n-1}}$.

2.2. Bootstrap calibration. In this section, we show that the bootstrap procedure can improve the accuracy of FDR and FDP control. Write $\mathcal X_i = \{X_{1i}, \dots, X_{ni}\}$. Let $\mathcal X^*_{ki} = \{X^*_{1ki}, \dots, X^*_{nki}\}$, $1 \le k \le N$, be resamples drawn randomly with replacement from $\mathcal X_i$. Let $T^*_{ki}$ be Student's t test statistics constructed from $\{X^*_{1ki} - \bar X_i, \dots, X^*_{nki} - \bar X_i\}$. We use

$$G^*_{N,m}(t) = \frac{1}{Nm}\sum_{k=1}^N\sum_{i=1}^m I\{|T^*_{ki}| \ge t\}$$

to approximate the null distribution and define the p-values by $\hat p_{i,B} = G^*_{N,m}(|T_i|)$. Let $\mathrm{FDR}_B$ and $\mathrm{FDP}_B$ denote the FDR and FDP of the B-H method with $\hat p_{i,B}$ in (1), respectively.

THEOREM 2.2. Suppose that $\max_{1 \le i \le m} Ee^{tY_i^2} \le K$ for some constants $t > 0$ and $K > 0$, and the conditions in Theorem 2.1 are satisfied.
(i) If $\log m = o(n^{1/3})$, then

$$\lim_{(n,m)\to\infty} \mathrm{FDR}_B/(\alpha m_0/m) = 1 \quad\text{and}\quad \mathrm{FDP}_B/(\alpha m_0/m) \to 1 \text{ in probability}. \tag{3}$$

(ii) If $\log m = o(n^{1/2})$ and $m_1 \le m^\eta$ for some $\eta < 1$, then (3) holds.

Another common bootstrap method estimates the p-values individually by $\tilde p_{i,B} = G^*_i(|T_i|)$, where $G^*_i(t) = N^{-1}\sum_{k=1}^N I\{|T^*_{ki}| \ge t\}$; see Fan, Hall and Yao (2007) and Delaigle, Hall and Jin (2011). Results similar to Theorem 2.2 can be obtained if $N$ is large enough. Let $\widetilde{\mathrm{FDR}}_B$ and $\widetilde{\mathrm{FDP}}_B$ be the FDR and FDP of the B-H method with $\tilde p_{i,B}$, respectively. The following result holds.

PROPOSITION 2.3. Suppose that $N \ge m^{2+\delta}$ for some $\delta > 0$, $\max_{1 \le i \le m} Ee^{tY_i^2} \le K$ for some constants $t > 0$ and $K > 0$, and $\log m = o(n^{1/2})$. Assume that $X_1, \dots, X_m$ are independent.

(i) If (2) holds, then the results of Theorem 2.2(i) and (ii) hold for $\widetilde{\mathrm{FDR}}_B$ and $\widetilde{\mathrm{FDP}}_B$.
(ii) Suppose that $m_1$ is fixed and $p_i \sim U(0,1)$ for $i \in H_0$. If $\log m = o(n^{1/2})$, then $\limsup_{(n,m)\to\infty} \widetilde{\mathrm{FDR}}_B \le \alpha$.

Fan, Hall and Yao (2007) proved that the bootstrap calibration accurately controls the family-wise error rate if $\log m = o(n^{1/2})$ and $P(|Y_i| \le C) = 1$ for $1 \le i \le m$. Our result on FDR control only requires sub-Gaussian tails, which is weaker than bounded noise. The bootstrap method has often been used for multiple Student's t tests in real applications. Fan, Hall and Yao (2007) and Delaigle, Hall and Jin (2011) proved that the bootstrap method provides more accurate p-values than the normal or $t_{n-1}$ approximation for light-tailed distributions. Theorem 2.2 and Proposition 2.3 show that the bootstrap method allows a higher dimension $\log m = o(n^{1/2})$ for FDR control as long as $\max_{1 \le i \le m} Ee^{tY_i^2} \le K$. However, some real data may not satisfy such a light-tailed condition. The simulation study in Section 5 also indicates that the bootstrap calibration does not always outperform the $N(0,1)$ or $t_{n-1}$ calibrations.

3. Regularized bootstrap in large-scale tests. In this section, we introduce a regularized bootstrap method that is robust for heavy-tailed distributions and allows the dimension $m$ to be as large as $e^{o(n^{1/2})}$. For the regularized bootstrap method, a finite 6th moment is enough. Let $\lambda_{ni}$ be a regularization parameter. Define

$$\hat X_{ki} = X_{ki} I\{|X_{ki}| \le \lambda_{ni}\}, \qquad 1 \le k \le n,\ 1 \le i \le m.$$

Write $\hat{\mathcal X}_i = \{\hat X_{1i}, \dots, \hat X_{ni}\}$. Let $\hat{\mathcal X}^*_{ki} = \{\hat X^*_{1ki}, \dots, \hat X^*_{nki}\}$, $1 \le k \le N$, be resamples drawn independently and uniformly with replacement from $\hat{\mathcal X}_i$. Let $\hat T^*_{ki}$ be Student's t test statistics constructed from $\{\hat X^*_{1ki} - \bar{\hat X}_i, \dots, \hat X^*_{nki} - \bar{\hat X}_i\}$, where $\bar{\hat X}_i = n^{-1}\sum_{k=1}^n \hat X_{ki}$. We use

$$\hat G^*_{N,m}(t) = \frac{1}{Nm}\sum_{k=1}^N\sum_{i=1}^m I\{|\hat T^*_{ki}| \ge t\}$$

to approximate the null distribution and define the p-values by $\hat p_{i,RB} = \hat G^*_{N,m}(|T_i|)$.
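The pooled bootstrap p-values $\hat p_{i,B} = G^*_{N,m}(|T_i|)$ and their regularized version $\hat p_{i,RB} = \hat G^*_{N,m}(|T_i|)$ differ only in whether the data are truncated before resampling, so one sketch can cover both. It is an illustration under simplifying assumptions, not the authors' implementation. A single truncation level `lam` is used for all variables, whereas the paper lets $\lambda_{ni}$ vary with $i$. The toy data and the value `lam = 4.0` are arbitrary.

```python
import bisect
import math
import random


def t_stat(sample):
    """Student's t statistic; a zero-variance sample is mapped to 0 or +/-inf."""
    n = len(sample)
    xbar = math.fsum(sample) / n
    s2 = math.fsum((x - xbar) ** 2 for x in sample) / (n - 1)
    if s2 == 0.0:
        return 0.0 if xbar == 0.0 else math.copysign(math.inf, xbar)
    return xbar / math.sqrt(s2 / n)


def pooled_bootstrap_pvalues(data, n_boot, lam=math.inf, seed=0):
    """Return G*_{N,m}(|T_i|) for each variable i.

    data[i] holds the n observations of variable i. With lam = math.inf this is the
    ordinary pooled bootstrap p-value p_{i,B}; a finite lam first replaces X_ki by
    X_ki * I{|X_ki| <= lam}, giving the regularized p-value p_{i,RB}.
    """
    rng = random.Random(seed)
    null_stats = []
    for sample in data:
        truncated = [x if abs(x) <= lam else 0.0 for x in sample]
        centre = math.fsum(truncated) / len(truncated)
        # Centring the pool once equals centring each resample by the truncated mean.
        centred = [x - centre for x in truncated]
        for _ in range(n_boot):
            resample = rng.choices(centred, k=len(centred))
            null_stats.append(abs(t_stat(resample)))
    null_stats.sort()
    total = len(null_stats)  # N * m pooled bootstrap statistics
    # The observed T_i are computed from the original, untruncated data.
    return [(total - bisect.bisect_left(null_stats, abs(t_stat(x)))) / total
            for x in data]


# Toy usage: 30 null variables and one shifted variable, n = 20, N = 20.
rng = random.Random(1)
data = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(30)]
data.append([rng.gauss(3.0, 1.0) for _ in range(20)])
p_b = pooled_bootstrap_pvalues(data, n_boot=20)
p_rb = pooled_bootstrap_pvalues(data, n_boot=20, lam=4.0)
```

Pooling the $N \cdot m$ resampled statistics is what lets a small $N$ suffice, as remarked after Proposition 3.1. The individual p-values $\tilde p_{i,B}$ would instead rank $|T_i|$ only against the $N$ statistics of variable $i$.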
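Let $\mathrm{FDR}_{RB}$ and $\mathrm{FDP}_{RB}$ be the FDR and FDP of the B-H method with $\hat p_{i,RB}$ in (1), respectively.

THEOREM 3.1. Assume that $\max_{1 \le i \le m} EX_i^6 \le K$ for some constant $K > 0$. Suppose $X_1, \dots, X_m$ are independent, (2) holds and $\min_{1 \le i \le m} \sigma_{ii} \ge c_1$ for some $c_1 > 0$. Let $c_2(n/\log m)^{1/6} \le \lambda_{ni} \le c_3(n/\log m)^{1/6}$ for some $c_2, c_3 > 0$.
(i) If $\log m = o(n^{1/3})$, then

$$\lim_{(n,m)\to\infty} \mathrm{FDR}_{RB}/(\alpha m_0/m) = 1 \quad\text{and}\quad \mathrm{FDP}_{RB}/(\alpha m_0/m) \to 1 \text{ in probability}. \tag{4}$$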

(ii) If $\log m = o(n^{1/2})$ and $m_1 \le m^\eta$ for some $\eta < 1$, then (4) remains valid.

In Theorem 3.1, we only require $\max_{1 \le i \le m} EX_i^6 \le K$, which is much weaker than the moment condition in Theorem 2.2. As in Section 2.2, we can also estimate the p-values individually by $\tilde p_{i,RB} = \hat G^*_i(|T_i|)$, where $\hat G^*_i(t) = N^{-1}\sum_{k=1}^N I\{|\hat T^*_{ki}| \ge t\}$. Let $\widetilde{\mathrm{FDR}}_{RB}$ and $\widetilde{\mathrm{FDP}}_{RB}$ be the FDR and FDP of the B-H method with $\tilde p_{i,RB}$, respectively. We have the following result.

PROPOSITION 3.1. Suppose that $N \ge m^{2+\delta}$ for some $\delta > 0$, $\max_{1 \le i \le m} EX_i^6 \le K$ for some constant $K > 0$, $\min_{1 \le i \le m} \sigma_{ii} \ge c_1$ for some $c_1 > 0$ and $c_2(n/\log m)^{1/6} \le \lambda_{ni} \le c_3(n/\log m)^{1/6}$ for some $c_2, c_3 > 0$. Assume that $X_1, \dots, X_m$ are independent.
(i) Suppose that (2) holds. Then Theorem 3.1(i) and (ii) hold for $\widetilde{\mathrm{FDR}}_{RB}$ and $\widetilde{\mathrm{FDP}}_{RB}$.
(ii) Suppose that $m_1$ is fixed and $p_i \sim U(0,1)$ for $i \in H_0$. If $\log m = o(n^{1/2})$, then $\limsup_{(n,m)\to\infty} \widetilde{\mathrm{FDR}}_{RB} \le \alpha$.

Theorem 3.1 does not cover the case where $m_1$ is fixed. However, if $\tilde p_{i,RB}$, $1 \le i \le m$, are used, then Proposition 3.1 shows that the FDR can be controlled when $m_1$ is fixed and $\log m = o(n^{1/2})$. In fact, when $m_1$ is fixed and $\log m = o(n^{1/3})$, the proofs of Propositions 2.2 and 3.1 show that $\limsup_{(n,m)\to\infty} \mathrm{FDR}_{RB} \le \alpha$. It is unclear whether a similar result holds for $\mathrm{FDR}_{RB}$ when the dimension becomes larger, that is, $\log m = o(n^{1/2})$. However, under (2), Theorem 3.1 only requires $N \ge 1$ because the bootstrap statistics of all $m$ variables are averaged. Hence $\hat p_{i,RB}$ has a significant computational advantage over $\tilde p_{i,RB}$. Moreover, Proposition 2.1 indicates that (2) is nearly necessary for FDP control. FDP control implies FDR control, but the reverse is not true, as Proposition 2.1 shows. Because the FDR only controls the average of the FDP, FDP control is more appealing in applications than FDR control. In the regularized bootstrap method, we must choose the regularization parameter $\lambda_{ni}$.
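By Theorem 1.2 in Wang (2005), equation (2.2) in Shao (1999) and the proof of Theorem 3.1, we have

$$P^*\big(|\hat T^*_{ki}| \ge t \mid \hat{\mathcal X}\big) = \frac{1}{2}G(t)\Big[\exp\Big(-\frac{t^3}{3\sqrt{n}}\hat\kappa_i(\lambda_{ni})\Big) + \exp\Big(\frac{t^3}{3\sqrt{n}}\hat\kappa_i(\lambda_{ni})\Big)\Big]\big(1 + o_P(1)\big)$$

uniformly for $0 \le t \le o(n^{1/4})$, where $G(t) = 2 - 2\Phi(t)$, $\hat{\mathcal X} = \{\hat{\mathcal X}_1, \dots, \hat{\mathcal X}_m\}$,

$$\hat\kappa_i(\lambda_{ni}) = \frac{1}{n\hat\sigma_i^3}\sum_{k=1}^n (\hat X_{ki} - \bar{\hat X}_i)^3 \quad\text{and}\quad \hat\sigma_i^2 = \frac{1}{n}\sum_{k=1}^n (\hat X_{ki} - \bar{\hat X}_i)^2. \tag{5}$$

Also,

$$P(|T_i| \ge t) = \frac{1}{2}G(t)\Big[\exp\Big(-\frac{t^3}{3\sqrt{n}}\kappa_i\Big) + \exp\Big(\frac{t^3}{3\sqrt{n}}\kappa_i\Big)\Big]\big(1 + o(1)\big),$$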

uniformly for $0 \le t \le o(n^{1/4})$, where $\kappa_i = EY_i^3$. A good choice of $\lambda_{ni}$ makes $\hat\kappa_i(\lambda_{ni})$ close to $\kappa_i$. As $\kappa_i$ is unknown, we propose the following cross-validation method.

Data-driven choice of $\lambda_{ni}$. We propose to choose $\hat\lambda_{ni} = \bar X_i + \hat s_{ni}\lambda$, where $\lambda$ is selected as follows. Split the samples into two parts $I_0 = \{1, \dots, n_0\}$ and $I_1 = \{n_0 + 1, \dots, n\}$ with sizes $n_0 = [n/2]$ and $n_1 = n - n_0$, respectively. For $I = I_0$ or $I_1$, let

$$\bar X_{i,I} = \frac{1}{|I|}\sum_{k \in I} X_{ki}, \qquad \hat s^2_{ni,I} = \frac{1}{|I|}\sum_{k \in I}(X_{ki} - \bar X_{i,I})^2, \qquad \hat\kappa_{i,I} = \frac{1}{|I|\,\hat s^3_{ni,I}}\sum_{k \in I}(X_{ki} - \bar X_{i,I})^3.$$

Let $\hat\kappa_{i,I}(\lambda_{ni})$, with $\lambda_{ni} = \bar X_{i,I} + \hat s_{ni,I}\lambda/2$, be defined as in (5) based on $\{\hat X_{ki}, k \in I\}$. Define the risk

$$R_j(\lambda) = \sum_{i=1}^m \big(\hat\kappa_{i,I_j}(\lambda_{ni}) - \hat\kappa_{i,I_{1-j}}\big)^2. \tag{6}$$

We choose $\lambda$ by

$$\hat\lambda = \arg\min_{0 < \lambda < \infty}\{R_0(\lambda) + R_1(\lambda)\}.$$

The final regularization parameter is $\hat\lambda_{ni} = \bar X_i + \hat s_{ni}\hat\lambda$. Section 5 compares the numerical performance of the data-driven choice $\hat\lambda_{ni}$ with that of the theoretical choice [e.g., $(n/\log m)^{1/6}$]. It remains important to investigate the theoretical properties of $\hat\lambda_{ni}$ and whether Theorem 3.1 still holds when $\hat\lambda_{ni}$ is used. We leave this for future work.

4. FDR control under dependence. To generalize the results to the dependent case, we introduce a class of correlation matrices. Let $A = (a_{ij})$ be a symmetric matrix. Let $k_m$ and $s_m$ be positive numbers. Assume that for every $1 \le j \le m$,

$$\mathrm{Card}\{1 \le i \le m : |a_{ij}| \ge k_m\} \le s_m. \tag{7}$$

Let $\mathcal A(k_m, s_m)$ be the class of symmetric matrices satisfying (7). Let $R = (r_{ij})$ be the correlation matrix of $X$. We introduce the following two conditions:
(C1) Suppose that $\max_{1 \le i < j \le m}|r_{ij}| \le r$ for some $0 < r < 1$ and $R \in \mathcal A(k_m, s_m)$ with $k_m = (\log m)^{-2-\theta}$ and $s_m = O(m^\rho)$ for some $\theta > 0$ and $0 < \rho < (1-r)/(1+r)$.
(C1*) Suppose that $\max_{1 \le i < j \le m}|r_{ij}| \le r$ for some $0 < r < 1$. For each $X_i$, assume that at most $s_m$ of the variables $X_j$ are dependent with $X_i$.
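For a given finite $m$, membership in $\mathcal A(k_m, s_m)$ can be checked column by column. The sketch below is a finite-sample check with an explicitly supplied $s_m$. Conditions (C1) and (C1*) are asymptotic statements, so the sketch does not verify them. The function names and the block-correlation example are hypothetical.

```python
import math


def in_class_A(a, k_m, s_m):
    """Condition (7): each column j has at most s_m entries with |a_ij| >= k_m."""
    m = len(a)
    return all(sum(1 for i in range(m) if abs(a[i][j]) >= k_m) <= s_m
               for j in range(m))


def satisfies_c1_finite(r_mat, r, theta, s_m):
    """Finite-m check of the two ingredients of (C1) for a supplied s_m:
    max_{i<j} |r_ij| <= r and R in A(k_m, s_m) with k_m = (log m)^(-2 - theta)."""
    m = len(r_mat)
    k_m = math.log(m) ** (-2.0 - theta)
    off_diag = max((abs(r_mat[i][j]) for i in range(m) for j in range(i + 1, m)),
                   default=0.0)
    return off_diag <= r and in_class_A(r_mat, k_m, s_m)


def block_correlation(m, block, rho):
    """Hypothetical example: equicorrelated blocks of size `block`, independent blocks."""
    return [[1.0 if i == j else (rho if i // block == j // block else 0.0)
             for j in range(m)] for i in range(m)]


R = block_correlation(6, 2, 0.5)
# Each column has the diagonal entry and one partner above k_m, so s_m = 2 suffices.
print(satisfies_c1_finite(R, r=0.6, theta=0.5, s_m=2))  # prints True
```

The diagonal entry is counted in (7) because $i$ ranges over all of $1, \dots, m$. A budget of $s_m = 1$ therefore already fails for any matrix with a single strong off-diagonal correlation in some column.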

(C1) and (C1*) impose weak dependence among $X_1, \dots, X_m$. In (C1), each variable can be highly correlated with $s_m$ other variables and weakly correlated with the remaining variables. (C1*) is stronger than (C1): for each $X_i$, (C1*) requires independence between $X_i$ and the other $m - s_m$ variables. Recall that $m_1 \le \gamma m$ for some $\gamma < 1$.

THEOREM 4.1. Assume that $\max_{1 \le i \le m} EY_i^4 \le b_0$ for some constant $b_0 > 0$, and (2) holds.
(i) If $\log m = O(n^\zeta)$ for some $0 < \zeta < 3/23$ and (C1) is satisfied, then

$$\lim_{(n,m)\to\infty} \frac{\mathrm{FDR}_\Phi}{(m_0/m)\alpha} = 1 \quad\text{and}\quad \frac{\mathrm{FDP}_\Phi}{(m_0/m)\alpha} \to 1 \text{ in probability}. \tag{8}$$

(ii) Under $\log m = o(n^{1/3})$ and (C1*), (8) also holds.
The same conclusions hold for $\mathrm{FDR}_{\Psi_{n-1}}$ and $\mathrm{FDP}_{\Psi_{n-1}}$.

For the bootstrap and regularized bootstrap procedures, we have similar results.

THEOREM 4.2. Suppose that $\max_{1 \le i \le m} Ee^{tY_i^2} \le K$ and (2) is satisfied.
(1) Under the conditions of (i) or (ii) in Theorem 4.1, we have

$$\lim_{(n,m)\to\infty} \frac{\mathrm{FDR}_B}{(m_0/m)\alpha} = 1 \quad\text{and}\quad \frac{\mathrm{FDP}_B}{(m_0/m)\alpha} \to 1 \text{ in probability}. \tag{9}$$

(2) Under (C1*), $\log m = o(n^{1/2})$ and $m_1 \le m^\eta$ for some $\eta < 1$, (9) holds.

THEOREM 4.3. Suppose that $\max_{1 \le i \le m} EX_i^6 \le K$ for some constant $K > 0$, $\min_{1 \le i \le m} \sigma_{ii} \ge c_1$ for some $c_1 > 0$ and (2) is satisfied. Let $c_2(n/\log m)^{1/6} \le \lambda_{ni} \le c_3(n/\log m)^{1/6}$ for some $c_2, c_3 > 0$.
(1) Under the conditions of (i) or (ii) in Theorem 4.1, we have

$$\lim_{(n,m)\to\infty} \frac{\mathrm{FDR}_{RB}}{(m_0/m)\alpha} = 1 \quad\text{and}\quad \frac{\mathrm{FDP}_{RB}}{(m_0/m)\alpha} \to 1 \text{ in probability}. \tag{10}$$

(2) Under (C1*), $\log m = o(n^{1/2})$ and $m_1 \le m^\eta$ for some $\eta < 1$, (10) holds.

Theorems 4.1-4.3 imply that the B-H method remains asymptotically valid under weak dependence. The growth of the dimension causes a phase transition. By analogy, it would be interesting to investigate when the B-H method fails to control the FDR as the correlation becomes stronger.

5. Numerical study. In this section, we conduct a small simulation study to verify the phase transition phenomenon. Let

$$X_i = \mu_i + \varepsilon_i - E\varepsilon_i, \qquad 1 \le i \le m, \tag{11}$$

where $(\varepsilon_1, \dots, \varepsilon_m)$ are i.i.d. random variables. We consider three models for $\varepsilon_i$ and $\mu_i$.

Model 1. $\varepsilon_i$ is an exponential random variable with parameter 1. Let $\mu_i = 2\sigma\sqrt{\log m/n}$ for $1 \le i \le m_1$ with $m_1 = 0.05m$ and $\mu_i = 0$ for $m_1 < i \le m$, where $\sigma^2 = \mathrm{Var}(\varepsilon_i)$.

Model 2-1. $\varepsilon_i$ is a Gamma random variable with parameters $(0.5, 1)$. Let $\mu_i = 4\sigma\sqrt{\log m/n}$ for $1 \le i \le m_1$ with $m_1 = 0.05m$ and $\mu_i = 0$ for $m_1 < i \le m$.

Model 2-2. $\varepsilon_i$ is a Gamma random variable with parameters $(0.5, 1)$. Let $m_1 = 0$.

In all three models, the average skewness satisfies $\tau > 0$. We generate $n = 30, 50$ independent random samples from (11). In our simulation, $\alpha$ is taken to be 0.1, 0.2, 0.3 and $m$ is taken to be 500, 1000, 3000. For computational reasons, we only consider the estimated p-values $\hat p_{i,B}$ and $\hat p_{i,RB}$ in the bootstrap and regularized bootstrap procedures, respectively. The number of bootstrap resamples is taken to be $N = 200$. We use $\mathrm{FDR}_B$, $\mathrm{FDR}_{RB}$ and $\mathrm{FDR}'_{RB}$ to denote the FDR of the B-H method with the bootstrap, the regularized bootstrap with data-driven $\hat\lambda_{ni}$ and the regularized bootstrap with theoretical $\lambda_{ni} = (n/\log m)^{1/6}$, respectively. The simulation is replicated 1000 times, and the empirical FDR and power for $m = 3000$ are summarized in Tables 1 and 2. To save space, the simulation results for $m = 500$ and 1000 are given in the supplementary material [Liu and Shao (2014)]. The empirical power is defined as the average ratio between the number of correct rejections and $m_1$. Because of the nonzero skewness and $m \ge \exp(n^{1/3})$, the empirical $\mathrm{FDR}_\Phi$ and $\mathrm{FDR}_{\Psi_{n-1}}$ are much larger than the target FDR. The bootstrap method and the regularized bootstrap method with data-driven $\hat\lambda_{ni}$ provide more accurate approximations to the true p-values. Thus the empirical $\mathrm{FDR}_B$ and $\mathrm{FDR}_{RB}$ are much closer to $\alpha$ than $\mathrm{FDR}_\Phi$ and $\mathrm{FDR}_{\Psi_{n-1}}$.
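A single replicate of Model 1 with the $N(0,1)$ calibration can be sketched as follows. It only illustrates the data-generating mechanism (11) and the FDP computation. It is not the code behind Tables 1 and 2, and the names and default arguments are our own. Averaging the returned FDP over many seeds gives an empirical FDR.

```python
import math
import random


def t_stat(sample):
    """Student's t statistic with the n - 1 variance divisor."""
    n = len(sample)
    xbar = math.fsum(sample) / n
    s2 = math.fsum((x - xbar) ** 2 for x in sample) / (n - 1)
    return xbar / math.sqrt(s2 / n)


def bh_reject(pvalues, alpha):
    """B-H step-up rule (1); returns the set of rejected indices."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_hat = max((rank for rank, i in enumerate(order, start=1)
                 if pvalues[i] <= alpha * rank / m), default=0)
    if k_hat == 0:
        return set()
    cutoff = pvalues[order[k_hat - 1]]
    return {i for i, p in enumerate(pvalues) if p <= cutoff}


def model1_fdp(m=3000, n=30, alpha=0.1, seed=0):
    """FDP of N(0,1)-calibrated B-H in one replicate of Model 1.

    X_ki = mu_i + eps_ki - 1 with eps_ki ~ Exp(1) (so E eps = 1 and sigma = 1);
    mu_i = 2 * sigma * sqrt(log m / n) for the first m1 = 0.05 m indices, else 0.
    """
    rng = random.Random(seed)
    m1 = int(0.05 * m)
    shift = 2.0 * math.sqrt(math.log(m) / n)
    pvalues = []
    for i in range(m):
        mu = shift if i < m1 else 0.0
        sample = [mu + rng.expovariate(1.0) - 1.0 for _ in range(n)]
        pvalues.append(math.erfc(abs(t_stat(sample)) / math.sqrt(2.0)))
    rejected = bh_reject(pvalues, alpha)
    false_rejections = sum(1 for i in rejected if i >= m1)
    return false_rejections / max(len(rejected), 1)


# A rough empirical FDR from 5 replicates (far fewer than the 1000 used in the text).
print(sum(model1_fdp(m=1000, seed=s) for s in range(5)) / 5)
```

Swapping the `math.erfc` line for a bootstrap or regularized bootstrap p-value would give the corresponding $\mathrm{FDR}_B$ and $\mathrm{FDR}_{RB}$ versions of the experiment.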
For Models 1, 2-1 and 2-2, the bootstrap method and the proposed regularized bootstrap method with data-driven $\hat\lambda_{ni}$ perform quite similarly. In addition, the data-driven $\hat\lambda_{ni}$ performs much better than the theoretical $\lambda_{ni}$. All four methods perform better as the sample size $n$ grows from 30 to 50, although the empirical $\mathrm{FDR}_\Phi$ and $\mathrm{FDR}_{\Psi_{n-1}}$ still depart seriously from $\alpha$. Next, we consider the following two models to compare the four methods when the distributions are symmetric and heavy-tailed.

Model 3. $\varepsilon_i$ follows Student's t distribution with 4 degrees of freedom. Let $\mu_i = 2\sqrt{\log m/n}$ for $1 \le i \le m_1$ with $m_1 = 0.1m$ and $\mu_i = 0$ for $m_1 < i \le m$.

TABLE 1. Comparison of FDR (target FDR = $\alpha$, $m = 3000$). Rows: $\mathrm{FDR}_\Phi$, $\mathrm{FDR}_{\Psi_{n-1}}$, $\mathrm{FDR}_B$ and the two regularized bootstrap FDRs (data-driven and theoretical $\lambda_{ni}$), for exp(1), Gamma(0.5, 1) with $m_1 = 0.05m$, and Gamma(0.5, 1) with $m_1 = 0$. Columns: $\alpha$, for $n = 30$ and $n = 50$.

Model 4. $\varepsilon_i = \varepsilon_{i1} - \varepsilon_{i2}$, where $\varepsilon_{i1}$ and $\varepsilon_{i2}$ are independent lognormal random variables with parameters $(0, 1)$. Let $\mu_i = 4\sqrt{\log m/n}$ for $1 \le i \le m_1$ with $m_1 = 0.1m$ and $\mu_i = 0$ for $m_1 < i \le m$.

For these two models, the normal approximation performs best in controlling the FDR; see Tables 3 and 4. $\mathrm{FDR}_B$ is much smaller than $\alpha$, so the bootstrap method is quite conservative. This is mainly due to the heavy tails of the $t(4)$ and lognormal distributions. The regularized bootstrap method controls the FDR much better than the bootstrap method. Table 4 shows that it also has higher power ($\mathrm{power}_{RB}$) than the bootstrap method ($\mathrm{power}_B$). Hence the proposed regularized bootstrap is more robust than the commonly used bootstrap method.

Finally, we examine FDP control of the B-H method when $m_1$ is small and the p-values are known. To this end, we consider Model 5, in which the exact null distributions are known.

Model 5. Let $\varepsilon_i$ be i.i.d. $N(0,1)$ random variables. Let $\mu_i = 2\sqrt{\log m/n}$ for $1 \le i \le m_1$ and $\mu_i = 0$ for $m_1 < i \le m$, where $m_1 = 0$, 1 and 5.

In Figure 1, we plot the tail probability of the FDP based on 5000 replications, that is, $\sum_{i=1}^{5000} I\{\mathrm{FDP}_i \ge t\}/5000$, where $\mathrm{FDP}_i$ is the true FDP in the $i$th replication. Figure 1 shows that when $m_1$ is small, the B-H method performs poorly in FDP control. For example, the empirical probability of FDP > 0.4 is 1 when $m_1 = 0$, 0.35 when $m_1 = 1$ and 0.12 when $m_1 = 5$.
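The tail probability plotted in Figure 1 can be approximated by Monte Carlo. As a simplification that is not the paper's setup, the sketch treats $\sigma = 1$ as known and draws the z-statistics $\sqrt{n}\,\bar X_i \sim N(\sqrt{n}\mu_i, 1)$ directly, so the p-values $2 - 2\Phi(|Z_i|)$ are exact. The value $m = 100$ in the usage loop is hypothetical, because $m$ for Model 5 is not stated in this section.

```python
import math
import random


def bh_reject(pvalues, alpha):
    """B-H step-up rule (1); returns the set of rejected indices."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_hat = max((rank for rank, i in enumerate(order, start=1)
                 if pvalues[i] <= alpha * rank / m), default=0)
    if k_hat == 0:
        return set()
    cutoff = pvalues[order[k_hat - 1]]
    return {i for i, p in enumerate(pvalues) if p <= cutoff}


def fdp_tail_probability(m, m1, n, alpha, threshold, reps, seed=0):
    """Monte Carlo estimate of P(FDP > threshold) for B-H with exact p-values.

    Simplifying assumption: sigma = 1 is known, so the z-statistic
    Z_i = sqrt(n) * Xbar_i ~ N(sqrt(n) * mu_i, 1) is drawn directly and
    2 - 2 Phi(|Z_i|) is an exact p-value; mu_i = 2 sqrt(log m / n) for i < m1.
    """
    rng = random.Random(seed)
    signal = math.sqrt(n) * 2.0 * math.sqrt(math.log(m) / n)  # = 2 sqrt(log m)
    exceed = 0
    for _ in range(reps):
        pvalues = []
        for i in range(m):
            z = (signal if i < m1 else 0.0) + rng.gauss(0.0, 1.0)
            pvalues.append(math.erfc(abs(z) / math.sqrt(2.0)))
        rejected = bh_reject(pvalues, alpha)
        false_rejections = sum(1 for i in rejected if i >= m1)
        if false_rejections / max(len(rejected), 1) > threshold:
            exceed += 1
    return exceed / reps


for m1 in (0, 1, 5):
    print(m1, fdp_tail_probability(m=100, m1=m1, n=50, alpha=0.2,
                                   threshold=0.4, reps=1000))
```

The signal $\sqrt{n}\mu_i = 2\sqrt{\log m}$ does not depend on $n$ in this setup. Only $m$, $m_1$ and $\alpha$ affect the estimate here.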

TABLE 2. Comparison of power (FDR = $\alpha$, $m = 3000$). Rows: $\mathrm{power}_\Phi$, $\mathrm{power}_{\Psi_{n-1}}$, $\mathrm{power}_B$ and the two regularized bootstrap powers (data-driven and theoretical $\lambda_{ni}$), for exp(1) and Gamma(0.5, 1) with $m_1 = 0.05m$. Columns: $\alpha$, for $n = 30$ and $n = 50$.

The poor FDP control for small $m_1$ is in accord with Proposition 2.1. In contrast, as indicated by Theorem 2.1, FDP control improves as $m_1$ increases.

TABLE 3. Comparison of FDR (FDR = $\alpha$, $m = 3000$). Rows: $\mathrm{FDR}_\Phi$, $\mathrm{FDR}_{\Psi_{n-1}}$, $\mathrm{FDR}_B$ and the two regularized bootstrap FDRs, for $t(4)$ and Lognormal(0, 1). Columns: $\alpha$, for $n = 30$ and $n = 50$.

TABLE 4. Comparison of power (FDR = $\alpha$, $m = 3000$). Rows: $\mathrm{power}_\Phi$, $\mathrm{power}_{\Psi_{n-1}}$, $\mathrm{power}_B$ and the two regularized bootstrap powers, for $t(4)$ and Lognormal(0, 1). Columns: $\alpha$, for $n = 30$ and $n = 50$.

6. Proof of main results. We begin by proving the uniform law of large numbers (13), which plays a key role in the proofs of the main results. According to Theorem 1.2 in Wang (2005) and equation (2.2) in Shao (1999), we have for $0 \le t \le o(n^{1/4})$,

$$P\big(|T_i - \sqrt{n}\mu_i/\hat s_{ni}| \ge t\big) = \frac{1}{2}G(t)\Big[\exp\Big(-\frac{t^3}{3\sqrt{n}}\kappa_i\Big) + \exp\Big(\frac{t^3}{3\sqrt{n}}\kappa_i\Big)\Big]\big(1 + o(1)\big), \tag{12}$$

where $o(1)$ is uniform in $1 \le i \le m$, $G(t) = 2 - 2\Phi(t)$ and $\kappa_i = EY_i^3$. For any $b_m \to \infty$ with $b_m = o(m)$, we first prove the following. Under (C1*) and $\log m = o(n^{1/2})$ [or (C1) and $\log m = O(n^\zeta)$ for some $0 < \zeta < 3/23$],

$$\sup_{0 \le t \le G_\kappa^{-1}(b_m/m)}\Big|\frac{\sum_{i \in H_0} I\{|T_i| \ge t\}}{m_0 G_\kappa(t)} - 1\Big| \to 0 \tag{13}$$

in probability, where

$$G_\kappa(t) = \frac{1}{2m_0}G(t)\sum_{i \in H_0}\Big[\exp\Big(-\frac{t^3}{3\sqrt{n}}\kappa_i\Big) + \exp\Big(\frac{t^3}{3\sqrt{n}}\kappa_i\Big)\Big] =: G(t)\hat\kappa(t)$$

and $G_\kappa^{-1}(t) = \inf\{y \ge 0 : G_\kappa(y) = t\}$ for $0 \le t \le 1$. Note that for $0 \le t \le o(\sqrt{n})$, $G_\kappa(t)$ is a strictly decreasing and continuous function. Let $z_0 < z_1 < \cdots < z_{d_m} \le 1$ and $t_i = G_\kappa^{-1}(z_i)$. Here $z_0 = b_m/m$, $z_i = b_m/m + b_m^{2/3}e^{i^\delta}/m$ and $d_m = [\{\log((m - b_m)/b_m^{2/3})\}^{1/\delta}]$, with $0 < \delta < 1$ to be specified later. Note that $G_\kappa(t_i)/G_\kappa(t_{i+1}) = 1 + o(1)$ uniformly in $i$, and $t_0/\sqrt{2\log(m/b_m)} = 1 + o(1)$. Then to prove (13), it suffices to show that

$$\sup_{0 \le j \le d_m}\Big|\frac{\sum_{i \in H_0} I\{|T_i| \ge t_j\}}{m_0 G_\kappa(t_j)} - 1\Big| \to 0 \tag{14}$$

in probability. Under (C1), define

$$S_j = \{i \in H_0 : |r_{ij}| \ge (\log m)^{-2-\theta}\}, \qquad S_j^c = H_0 \setminus S_j,$$

and under (C1*), define $S_j = \{i \in H_0 : X_i \text{ is dependent with } X_j\}$.

FIG. 1. Tail probability of FDP with $\alpha = 0.2$ and $n = 50$; panels (a) $m_1 = 0$, (b) $m_1 = 1$, (c) $m_1 = 5$. The y-axis values are the empirical tail probabilities $\sum_{i=1}^{5000} I\{\mathrm{FDP}_i \ge t\}/5000$.

We claim the following. Under (C1*) and $\log m = o(n^{1/2})$ [or (C1) and $\log m = O(n^\zeta)$ for some $0 < \zeta < 3/23$], for any $\varepsilon > 0$ and some $\gamma_1 > 0$,

$$I_2(t) := E\Big(\sum_{i \in H_0}\big\{I\{|T_i| \ge t\} - P(|T_i| \ge t)\big\}\Big)^2 \le C m_0^2 G_\kappa^2(t)\Big(\frac{1}{m_0 G_\kappa(t)} + \frac{\exp((r+\varepsilon)t^2/(1+r))}{m^{1-\rho}} + (\log m)^{-1-\gamma_1}\Big) \tag{15}$$

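uniformly in $t \in [0, K\sqrt{\log m}]$ for all $K > 0$. Take $(1+\gamma_1)^{-1} < \delta < 1$. Given (15) and $G_\kappa^{-1}(b_m/m) \le \sqrt{2\log(m/b_m)}$, for any $\varepsilon > 0$ we have

$$\begin{aligned}
P\Big(\max_{0 \le j \le d_m}\Big|\frac{\sum_{i \in H_0} I\{|T_i| \ge t_j\}}{m_0 G_\kappa(t_j)} - 1\Big| \ge \varepsilon\Big)
&\le \sum_{j=0}^{d_m} P\Big(\Big|\frac{\sum_{i \in H_0}\big(I\{|T_i| \ge t_j\} - P(|T_i| \ge t_j)\big)}{m_0 G_\kappa(t_j)}\Big| \ge \varepsilon/2\Big)\\
&\le C\sum_{j=0}^{d_m}\frac{1}{m_0 G_\kappa(t_j)} + d_m m^{-1+\rho+(2r+2\varepsilon)/(1+r)+o(1)} + d_m(\log m)^{-1-\gamma_1}\\
&\le C\Big(b_m^{-1} + b_m^{-2/3}\sum_{j=1}^{d_m} e^{-j^\delta} + o(1)\Big) = o(1).
\end{aligned}$$

This proves (14). To prove (15), we need the following lemma, which is proved in the supplementary material [Liu and Shao (2014)].

LEMMA 6.1. (i) Suppose that $\log m = O(n^{1/2})$. For any $\varepsilon > 0$,

$$\max_{j \in H_0}\max_{i \in S_j \setminus \{j\}} P(|T_i| \ge t, |T_j| \ge t) \le C\exp\big(-(1-\varepsilon)t^2/(1+r)\big) \tag{16}$$

uniformly in $t \in [0, o(n^{1/4}))$.
(ii) Suppose that $\log m = O(n^\zeta)$ for some $0 < \zeta < 3/23$. Then for any $K > 0$,

$$P(|T_i| \ge t, |T_j| \ge t) = (1 + A_n)P(|T_i| \ge t)P(|T_j| \ge t) \tag{17}$$

uniformly in $0 \le t \le K\sqrt{\log m}$, $j \in H_0$ and $i \in S_j^c$, where $|A_n| \le C(\log m)^{-1-\gamma_1}$ for some $\gamma_1 > 0$.

Set $f_{ij}(t) = P(|T_i| \ge t, |T_j| \ge t) - P(|T_i| \ge t)P(|T_j| \ge t)$. Note that under (C1*), $f_{ij}(t) = 0$ when $j \in H_0 \setminus S_i$. We have

$$\begin{aligned}
I_2(t) &\le \sum_{i \in H_0}\sum_{j \in S_i} P(|T_i| \ge t, |T_j| \ge t) + \sum_{i \in H_0}\sum_{j \in H_0 \setminus S_i}|f_{ij}(t)|\\
&\le C m_0 G_\kappa(t) + C\frac{\exp((r+2\varepsilon)t^2/(1+r))}{m^{1-\rho}}m_0^2 G_\kappa^2(t) + |A_n| m_0^2 G_\kappa^2(t),
\end{aligned}$$

where the last inequality follows from Lemma 6.1 and $G_\kappa(t) = G(t)e^{o(1)t^2}$ for $t = o(\sqrt{n})$. This proves (15).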

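Proof of Theorem 2.1 and Corollary 2.1. We only prove the theorem for $\hat p_{i,\Phi}$. The proof for $\hat p_{i,\Psi_{n-1}}$ is exactly the same when $G(t)$ is replaced with $2 - 2\Psi_{n-1}(t)$. By Lemma 1 in Storey, Taylor and Siegmund (2004), the B-H method with $\hat p_{i,\Phi}$ is equivalent to the following procedure: reject $H_{0i}$ if and only if $\hat p_{i,\Phi} \le \hat t_0$, where

$$\hat t_0 = \sup\Big\{0 \le t \le 1 : t \le \frac{\alpha\max\big(\sum_{i=1}^m I\{\hat p_{i,\Phi} \le t\}, 1\big)}{m}\Big\}.$$

Equivalently, reject $H_{0i}$ if and only if $|T_i| \ge \hat t$, where

$$\hat t = \inf\Big\{t \ge 0 : 2 - 2\Phi(t) \le \frac{\alpha\max\big(\sum_{i=1}^m I\{|T_i| \ge t\}, 1\big)}{m}\Big\}.$$

By the continuity of $\Phi(t)$ and the monotonicity of the indicator function, it is easy to see that

$$\frac{m G(\hat t)}{\max\big(\sum_{i=1}^m I\{|T_i| \ge \hat t\}, 1\big)} = \alpha,$$

where $G(t) = 2 - 2\Phi(t)$. Let $\mathcal M$ be a subset of $\{1, 2, \dots, m\}$ satisfying $\mathcal M \subset \{i : |\mu_i/\sigma_i| \ge 4\sqrt{\log m/n}\}$ and $\mathrm{Card}(\mathcal M) \le \sqrt{n}$. By $\max_{1 \le i \le m} EY_i^4 \le K$ and Markov's inequality, for any $\varepsilon > 0$,

$$P\Big(\max_{i \in \mathcal M}\big|\hat s_{ni}^2/\sigma_i^2 - 1\big| \ge \varepsilon\Big) = O(1/\sqrt{n}).$$

This, together with (2) and (12), implies that there exist some $c > 2$ and some $b_m \to \infty$ such that

$$P\Big(\sum_{i=1}^m I\big\{|T_i| \ge \sqrt{c\log m}\big\} \ge b_m\Big) \to 1. \tag{18}$$

This implies that $P(\hat t \le G^{-1}(\alpha b_m/m)) \to 1$. Given (13) and $G_\kappa(t) \ge G(t)$, it follows that $P(\hat t \le G_\kappa^{-1}(\alpha b_m/m)) \to 1$. Therefore, by (13),

$$\frac{\sum_{i \in H_0} I\{|T_i| \ge \hat t\}}{m_0 G_\kappa(\hat t)} \to 1$$

in probability. Note that

$$G(\hat t) = \frac{\alpha\hat m}{m} + \frac{\alpha m_0}{m}\cdot\frac{\sum_{i \in H_0} I\{|T_i| \ge \hat t\}}{m_0},$$

where $\hat m = \sum_{i \in H_1} I\{|T_i| \ge \hat t\}$. With probability tending to one,

$$G(\hat t) = \frac{\alpha\hat m}{m} + \frac{\alpha m_0}{m}G(\hat t)\hat\kappa\big(1 + o(1)\big) \ge \frac{\alpha m_0}{m}G(\hat t)\hat\kappa\big(1 + o(1)\big). \tag{19}$$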

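By (19), $P(\hat\kappa\le m/(\alpha m_0)+\varepsilon)\to1$ for any $\varepsilon>0$. Let $\hat\kappa'=\hat\kappa I\{\hat\kappa\le2(\alpha(1-\gamma))^{-1}\}$. Note that $m/(\alpha m_0)+\varepsilon\le2(\alpha(1-\gamma))^{-1}$. We have

$$
\frac{\mathrm{FDP}}{(m_0/m)\alpha\hat\kappa'}=\frac{\sum_{i\in H_0}I\{|T_i|\ge\hat t\}}{m_0G_\kappa(\hat t)}\cdot\frac{\hat\kappa}{\hat\kappa'}\bigl(1+o(1)\bigr)\to1
$$

in probability. Then for any $\varepsilon>0$,

$$
\mathrm{FDR}\le(1+\varepsilon)\frac{m_0}{m}\alpha E\hat\kappa'+P\left(\mathrm{FDP}\ge(1+\varepsilon)\frac{m_0}{m}\alpha\hat\kappa'\right)
$$

and

$$
\mathrm{FDR}\ge(1-\varepsilon)\frac{m_0}{m}\alpha E\hat\kappa'-2\bigl(\alpha(1-\gamma)\bigr)^{-1}P\left(\mathrm{FDP}\le(1-\varepsilon)\frac{m_0}{m}\alpha\hat\kappa'\right).
$$

This proves Theorem 2.1. Corollary 2.1(1) follows directly from Theorem 2.1 and $P(\hat t\le\sqrt{2\log m})\to1$.

To prove Corollary 2.1(2), we first assume that $\frac{\alpha m_0}{m}\hat\kappa\le1-\eta$ for some $\eta$ with $(1-\eta)/\alpha>1$. Then, by (19) and the condition $m_1=\exp(o(n^{1/3}))$, with probability tending to one,

$$
G(\hat t)\le2\alpha\eta^{-1}\hat m/m\le2\alpha\eta^{-1}m^{-1+o(1)}.
$$

Hence $\hat t\ge c\sqrt{\log m}$ for any $c<\sqrt2$. Recall that $\tau=\lim_{m\to\infty}m_0^{-1}\sum_{i\in H_0}|EY_i^3|>0$. Set $H_{01}=\{i\in H_0:|EY_i^3|\ge\tau/8\}$. According to the definition of $\tau$ and $|EY_i^3|\le(EY_i^4)^{3/4}\le b_0^{3/4}$, for large $m$,

$$
\tau/2\le m_0^{-1}\sum_{i\in H_0}|EY_i^3|\le\tau/8+b_0^{3/4}m_0^{-1}|H_{01}|.
$$

This implies that $|H_{01}|\ge\tau b_0^{-3/4}m_0/4$. Hence $m_0^{-1}\sum_{i\in H_0}(EY_i^3)^2\ge c_\tau$ for some $c_\tau>0$. It follows from Taylor's expansion of the exponential function and $\hat t\ge c\sqrt{\log m}$ that $\hat\kappa\ge1+\epsilon$ for some $\epsilon>0$. On the other hand, if $\frac{\alpha m_0}{m}\hat\kappa>1-\eta$, then $\hat\kappa\ge1+\epsilon$ for some $\epsilon>0$, because $(1-\eta)/\alpha>1$ and $m\ge m_0$. This yields $P(\hat\kappa\ge1+\epsilon)\to1$ for some $\epsilon>0$, so $\kappa\ge1+\epsilon$ for some $\epsilon>0$. Since $m_0/m\to1$, this proves Corollary 2.1(2).

We next prove Corollary 2.1(3). By the inequality $e^x+e^{-x}\ge|x|$ and $P(\hat\kappa\le m/(\alpha m_0)+\varepsilon)\to1$, we obtain that

$$
\sum_{i\in H_0}\frac{\hat t^3}{3\sqrt n}|EY_i^3|\le2m_0\left(\frac{m}{\alpha m_0}+\varepsilon\right)
$$

with probability tending to one. By $\tau>0$, we have $P(\hat t\le cn^{1/6})\to1$ for some constant $c>0$. Thus $P(G(\hat t)\ge\exp(-2c^2n^{1/3}))\to1$. Because $\hat m/m\le\exp(-Mn^{1/3})$ for any $M>0$, (19) gives $\frac{\alpha m_0}{m}\hat\kappa\to1$ in probability. Hence $\kappa\to1/\alpha$, as $m_0/m\to1$. The proof is finished.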
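The factor $\hat\kappa$ that drives Corollary 2.1 compares a skewness-corrected null tail with $G$. If, as in $\hat G_\kappa$ in the proof of Theorems 2.2 and 4.2, the two-sided tail of a null $|T_i|$ is approximated by $\frac12G(t)[\exp(-t^3\kappa_i/(3\sqrt n))+\exp(t^3\kappa_i/(3\sqrt n))]=G(t)\cosh(t^3\kappa_i/(3\sqrt n))$, then the size of this correction at the B–H scale $t\approx\sqrt{2\log m}$ shows where the phase transition comes from. The Python sketch below evaluates the factor for one hypothetical skewness value $\kappa=1$ and two growth rates of $\log m$; it illustrates the approximation only and is not a computation taken from the paper.

```python
import math


def skew_inflation(t, kappa, n):
    """cosh(x) with x = t^3 * kappa / (3 * sqrt(n)).

    This equals (exp(-x) + exp(x)) / 2, the factor by which the skewness
    correction multiplies G(t) for a row with third moment kappa.
    """
    return math.cosh(t ** 3 * kappa / (3.0 * math.sqrt(n)))


def inflation_at_bh_scale(log_m, kappa, n):
    """Evaluate the factor at t = sqrt(2 log m), the order of the B-H cutoff."""
    return skew_inflation(math.sqrt(2.0 * log_m), kappa, n)


n = 10 ** 6
kappa = 1.0  # hypothetical common skewness E Y^3 of the null rows
on_boundary = inflation_at_bh_scale(n ** (1 / 3), kappa, n)     # log m = n^(1/3)
below_boundary = inflation_at_bh_scale(n ** (1 / 4), kappa, n)  # log m = n^(1/4)
print(f"log m = n^(1/3): {on_boundary:.3f}; log m = n^(1/4): {below_boundary:.3f}")
```

With $t=\sqrt{2\log m}$ the exponent equals $2^{3/2}\kappa(\log m)^{3/2}/(3\sqrt n)$. It is the constant $2^{3/2}\kappa/3$ whenever $\log m=n^{1/3}$, whatever $n$ is, and it decays like $n^{-1/8}$ when $\log m=n^{1/4}$, which is the $\log m\asymp n^{1/3}$ boundary behind the phase transition.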

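Proof of Theorems 2.2 and 4.2. Let $\hat\kappa_i=\frac{1}{n\hat s_{ni}^3}\sum_{k=1}^n(X_{ki}-\bar X_i)^3$. Define the event

$$
F=\left\{\max_{1\le i\le m}\frac{1}{n\hat s_{ni}^4}\sum_{k=1}^n(X_{ki}-\bar X_i)^4\le K_1,\ \max_{1\le i\le m}|\hat\kappa_i-\kappa_i|\le K_2\sqrt{\log m/n}\right\}
$$

for some large $K_1>0$ and $K_2>0$. We first suppose that $P(F)\to1$. Let $G_i^*(t)=P^*(|T^*_{ki}|\ge t)$ be the conditional distribution of $|T^*_{ki}|$ given $\mathcal X=\{X_1,\dots,X_m\}$. Note that, given $\mathcal X$ and on the event $F$,

$$
\begin{aligned}
G_i^*(t)&=\frac12G(t)\left[\exp\left(-\frac{t^3}{3\sqrt n}\hat\kappa_i\right)+\exp\left(\frac{t^3}{3\sqrt n}\hat\kappa_i\right)\right]\bigl(1+o(1)\bigr)\\
&=\frac12G(t)\left[\exp\left(-\frac{t^3}{3\sqrt n}\kappa_i\right)+\exp\left(\frac{t^3}{3\sqrt n}\kappa_i\right)\right]\bigl(1+o(1)\bigr)
\end{aligned}
$$

uniformly in $t\in[0,o(n^{1/4}))$. Hence, given $\mathcal X$ and on the event $F$,

$$
\frac{G_i^*(t)}{P\bigl(|T_i-\sqrt n\mu_i/\hat s_{ni}|\ge t\bigr)}=1+o(1)\qquad(20)
$$

uniformly in $1\le i\le m$ and $t\in[0,o(n^{1/4}))$. Put

$$
\hat G_\kappa(t)=\frac{1}{2m}G(t)\sum_{1\le i\le m}\left[\exp\left(-\frac{t^3}{3\sqrt n}\kappa_i\right)+\exp\left(\frac{t^3}{3\sqrt n}\kappa_i\right)\right].
$$

Set $\hat c_m=\hat G_\kappa^{-1}(b_m/m)$. Note that, given $\mathcal X$, the $T^*_{ki}$, $1\le k\le N$, $1\le i\le m$, are independent. Hence, as for (13), we can show that for any $b_m\to\infty$,

$$
\sup_{0\le t\le\hat c_m}\left|\frac{G^*_{N,m}(t)}{\hat G_\kappa(t)}-1\right|\to0\qquad(21)
$$

in probability. For $t=O(\sqrt{\log m})$, under the conditions of Theorem 3.2, we have $\hat G_\kappa(t)/G_\kappa(t)=1+o(1)$. So it is easy to see that (13) still holds when $G_\kappa^{-1}(b_m/m)$ is replaced by $\hat G_\kappa^{-1}(b_m/m)$. This implies that for any $b_m\to\infty$,

$$
\sup_{0\le t\le\hat c_m}\left|\frac{\sum_{i\in H_0}I\{|T_i|\ge t\}}{m_0G^*_{N,m}(t)}-1\right|\to0\qquad(22)
$$

in probability. Let

$$
\hat t_0=\sup\left\{0\le t\le1:\ t\le\frac{\alpha\max\bigl(\sum_{1\le i\le m}I\{\hat p_{i,B}\le t\},1\bigr)}{m}\right\}.
$$

Then we have

$$
\hat t_0=\frac{\alpha\max\bigl(\sum_{1\le i\le m}I\{\hat p_{i,B}\le\hat t_0\},1\bigr)}{m}.
$$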
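Given $\mathcal X$, the expansion of $G_i^*(t)$ above depends on the sample only through $\hat\kappa_i$. The Python sketch below computes $\hat\kappa_i$ and a plug-in version of $\hat G_\kappa(t)$ in which each $\kappa_i$ is replaced by $\hat\kappa_i$ (on the event $F$ the two differ by at most $K_2\sqrt{\log m/n}$). It assumes that $\hat s_{ni}$ uses divisor $n$, which this excerpt does not specify, and the function names are hypothetical.

```python
import math


def G(t):
    """G(t) = 2 - 2*Phi(t) = erfc(t / sqrt(2))."""
    return math.erfc(t / math.sqrt(2.0))


def kappa_hat(xs):
    """(1 / (n * s^3)) * sum_k (x_k - mean)^3, with s^2 = (1/n) sum_k (x_k - mean)^2.

    The divisor n in s is an assumption; the row must not be constant.
    """
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return sum((x - mean) ** 3 for x in xs) / (n * s ** 3)


def G_kappa_plugin(t, rows):
    """(1/(2m)) G(t) sum_i [exp(-c k_i) + exp(c k_i)], c = t^3/(3 sqrt n), k_i = kappa_hat(row i)."""
    n = len(rows[0])
    c = t ** 3 / (3.0 * math.sqrt(n))
    # (exp(-x) + exp(x)) / 2 = cosh(x), so the sum over i becomes an average of cosh terms.
    return G(t) * sum(math.cosh(c * kappa_hat(r)) for r in rows) / len(rows)
```

Because $\cosh\ge1$, the plug-in curve never lies below $G(t)$, and it coincides with $G(t)$ when every row has zero sample skewness.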

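According to (12) and (20), given $\mathcal X$ and on the event $F$, $G_i^*(c\sqrt{\log m})=m^{-c^2/2+o(1)}$ for any $c>\sqrt2$, uniformly in $i$. So, by Markov's inequality, for any $\varepsilon>0$, we have $P\bigl(G^*_{N,m}(c\sqrt{\log m})\le m^{-c^2/2+\varepsilon}\bigr)\to1$. By (2) and (18), we have $P(\hat t_0\ge\alpha b_m/m)\to1$ for some $b_m\to\infty$. It follows from (22) that

$$
\frac{\sum_{i\in H_0}I\{\hat p_{i,B}\le\hat t_0\}}{m_0\hat t_0}\to1
$$

in probability. This finishes the proof of Theorem 2.2(1), (2) and Theorem 4.2, provided we can show that $P(F)\to1$.

Without loss of generality, we can assume that $\mu_i=0$ and $\sigma_i=1$. We first show that for some constant $K_1>0$,

$$
P\left(\max_{1\le i\le m}\left|\sum_{k=1}^n\bigl(X_{ki}^4-EX_{ki}^4\bigr)\right|\ge K_1n\right)=o(1).\qquad(23)
$$

For $1\le k\le n$ and $1\le i\le m$, put $\hat X_{ki}=X_{ki}I\{|X_{ki}|\le\sqrt{n/\log m}\}$ and $\check X_{ki}=X_{ki}-\hat X_{ki}$. Then, for large $n$,

$$
P\left(\max_{1\le i\le m}\left|\sum_{k=1}^n\bigl(\check X_{ki}^4-E\check X_{ki}^4\bigr)\right|\ge K_1n/2\right)\le nm\max_{1\le i\le m}P\bigl(|X_{1i}|\ge\sqrt{n/\log m}\bigr)\le C\exp\bigl(\log m+\log n-tn/\log m\bigr)=o(1).
$$

Let $Z_{ki}=\hat X_{ki}^4-E\hat X_{ki}^4$. By the inequality $|e^s-1-s|\le s^2e^{\max(s,0)}$ and $1+s\le e^s$, we have, for $\eta=2^{-1}t(\log m)/n$ and some large $K_1$,

$$
\begin{aligned}
P\left(\max_{1\le i\le m}\left|\sum_{k=1}^nZ_{ki}\right|\ge K_1n/2\right)
&\le\sum_{i=1}^mP\left(\sum_{k=1}^nZ_{ki}\ge K_1n/2\right)+\sum_{i=1}^mP\left(-\sum_{k=1}^nZ_{ki}\ge K_1n/2\right)\\
&\le\sum_{i=1}^m\exp(-\eta K_1n/2)\left[\prod_{k=1}^nE\exp(\eta Z_{ki})+\prod_{k=1}^nE\exp(-\eta Z_{ki})\right]\\
&\le2\sum_{i=1}^m\exp\Bigl(-\eta K_1n/2+\eta^2nE\bigl(Z_{1i}^2e^{\eta|Z_{1i}|}\bigr)\Bigr)\\
&\le C\exp\bigl(\log m-tK_1(\log m)/4\bigr)=o(1).
\end{aligned}
$$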
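The bootstrap statistics $T^*_{ki}$, the pooled tail $G^*_{N,m}$ and the p-values $\hat p_{i,B}$ are defined in the main text rather than in this excerpt. As a minimal Python sketch, assume the pooled form $\hat p_{i,B}=G^*_{N,m}(|T_i|)$ with $G^*_{N,m}(t)=(Nm)^{-1}\sum_{k=1}^N\sum_{j=1}^mI\{|T^*_{kj}|\ge t\}$, where $T^*_{kj}$ is the $t$ statistic of the $k$th resample of row $j$, recentred at that row's sample mean. The optional argument `lam` is a hypothetical, user-chosen truncation level (not the paper's $\lambda_{ni}$): when given, observations with $|X_{ki}|>\lambda$ are set to zero before resampling, in the spirit of the truncated variables $\hat X_{ki}$ used in the proof of Theorems 3.1 and 4.3.

```python
import bisect
import math
import random


def t_stat(xs, center=0.0):
    """sqrt(n) * (mean - center) / s with s using divisor n - 1.

    A resample with zero spread has no defined t statistic; this sketch
    returns 0.0 for it (an arbitrary convention, not taken from the paper).
    """
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    if var == 0.0:
        return 0.0
    return math.sqrt(n) * (mean - center) / math.sqrt(var)


def bootstrap_pvalues(rows, n_boot=200, lam=None, seed=0):
    """Pooled bootstrap p-values p_i = G*_{N,m}(|T_i|) for H_0i: mu_i = 0."""
    rng = random.Random(seed)
    observed = [abs(t_stat(r)) for r in rows]  # T_i always uses the raw data
    if lam is not None:
        # Hypothetical regularization: X_ki * I{|X_ki| <= lam} before resampling.
        rows = [[x if abs(x) <= lam else 0.0 for x in r] for r in rows]
    pooled = []
    for r in rows:
        n = len(r)
        center = sum(r) / n  # in the bootstrap world the row mean is the true mean
        for _ in range(n_boot):
            resample = [r[rng.randrange(n)] for _ in range(n)]
            pooled.append(abs(t_stat(resample, center)))
    pooled.sort()
    total = len(pooled)
    # G*_{N,m}(t): fraction of pooled |T*| values that are >= t.
    return [(total - bisect.bisect_left(pooled, t)) / total for t in observed]
```

Because one pooled sample serves every row, $\hat p_{i,B}$ is a nonincreasing function of $|T_i|$, so applying B–H to these p-values is again a threshold rule on $|T_i|$.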

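Together, these two bounds prove (23). By replacing $X_{ki}^4$, $\eta=2^{-1}t(\log m)/n$ and $K_1n/2$ with $X_{ki}^3$, $\eta=2^{-1}t\sqrt{(\log m)/n}$ and $K_1\sqrt{n\log m}/2$, respectively, in the above proof, we can show that

$$
P\left(\max_{1\le i\le m}\left|\frac1n\sum_{k=1}^n\bigl(X_{ki}^3-EX_{ki}^3\bigr)\right|\ge K_1\sqrt{(\log m)/n}\right)=o(1).\qquad(24)
$$

Similarly, we have

$$
P\left(\max_{1\le i\le m}\left|\frac1n\sum_{k=1}^n\bigl(X_{ki}^2-EX_{ki}^2\bigr)\right|\ge K_1\sqrt{(\log m)/n}\right)=o(1)\qquad(25)
$$

and

$$
P\left(\max_{1\le i\le m}\left|\frac1n\sum_{k=1}^n\bigl(X_{ki}-EX_{ki}\bigr)\right|\ge K_1\sqrt{(\log m)/n}\right)=o(1).\qquad(26)
$$

Combining (23)–(26), we prove that $P(F)\to1$.

Proof of Theorems 3.1 and 4.3. Let

$$
\hat F=\left\{\max_{1\le i\le m}\frac{1}{n\hat\sigma_i^4}\sum_{k=1}^n(\hat X_{ki}-\bar{\hat X}_i)^4\le K_1,\ \max_{1\le i\le m}|\hat\kappa_i(\lambda_{ni})-\kappa_i|\le K_2\sqrt{\log m/n}\right\}.
$$

By the proof of Theorems 2.2 and 4.2, it is enough to show that $P(\hat F)\to1$. Recall that $\hat X_{ki}=X_{ki}I\{|X_{ki}|\le\lambda_{ni}\}$, and put $Z_{ki}=\hat X_{ki}^4-E\hat X_{ki}^4$. Take $\eta=(\log m)/n$. We have

$$
P\left(\max_{1\le i\le m}\left|\sum_{k=1}^nZ_{ki}\right|\ge K_1n/2\right)\le2\sum_{i=1}^m\exp\Bigl(-\eta K_1n/2+\eta^2nE\bigl(Z_{1i}^2e^{\eta|Z_{1i}|}\bigr)\Bigr)\le C\exp\bigl(2\log m-K_1(\log m)/4\bigr)=o(1).
$$

Similarly, by replacing $\hat X_{ki}^4$, $\eta=(\log m)/n$ and $K_1n/2$ with $\hat X_{ki}^3$, $\eta=\sqrt{(\log m)/n}$ and $K_1\sqrt{n\log m}/2$, respectively, in the above proof, we can show that

$$
P\left(\max_{1\le i\le m}\left|\frac1n\sum_{k=1}^n\bigl(\hat X_{ki}^3-E\hat X_{ki}^3\bigr)\right|\ge K_1\sqrt{(\log m)/n}\right)=o(1).
$$

Also, using the above arguments, it is easy to show that

$$
P\left(\max_{1\le i\le m}\left|\frac1n\sum_{k=1}^n\bigl(\hat X_{ki}^2-E\hat X_{ki}^2\bigr)\right|\ge K_1\sqrt{(\log m)/n}\right)=o(1)
$$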

and

$$
P\left(\max_{1\le i\le m}\left|\frac1n\sum_{k=1}^n\bigl(\hat X_{ki}-E\hat X_{ki}\bigr)\right|\ge K_1\sqrt{(\log m)/n}\right)=o(1).
$$

Note that

$$
\max_{1\le i\le m}E|X_{1i}|^3I\{|X_{1i}|\ge\lambda_{ni}\}\le C\sqrt{\frac{\log m}{n}}\max_{1\le i\le m}EX_{1i}^6
$$

and

$$
\max_{1\le i\le m}EX_{1i}^2I\{|X_{1i}|\ge\lambda_{ni}\}\le C\left(\frac{\log m}{n}\right)^{2/3}\max_{1\le i\le m}EX_{1i}^6.
$$

This proves that $P(\hat F)\to1$.

Proof of Theorem 4.1. Recall that

$$
\frac{mG(\hat t)}{\max\bigl(\sum_{1\le i\le m}I\{|T_i|\ge\hat t\},1\bigr)}=\alpha.
$$

From (18), we have $P(\hat t\le G^{-1}(\alpha b_m/m))\to1$. The theorem follows from (13) and the fact that $G_\kappa(t)/G(t)=1+o(1)$ uniformly in $t\in[0,o(n^{1/6}))$.

Proof of Propositions 2.1, 2.2, 2.3 and 3.1. To save space, the proofs of these propositions are given in the supplementary material [Liu and Shao (2014)].

Acknowledgments. The authors would like to thank the Associate Editor and two referees for their valuable comments, which have helped to improve the quality and presentation of this paper.

SUPPLEMENTARY MATERIAL

Supplement to "Phase transition and regularized bootstrap in large-scale t-tests with false discovery rate control" (DOI: /14-AOS1249SUPP; .pdf). The supplementary material includes part of the numerical results and the proofs of Lemma 6.1 and Propositions 2.1, 2.2, 2.3 and 3.1.

REFERENCES

BENJAMINI, Y. and HOCHBERG, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Stat. Methodol.

BENJAMINI, Y. and YEKUTIELI, D. (2001). The control of the false discovery rate in multiple testing under dependency. Ann. Statist.

CAO, H. and KOSOROK, M. R. (2011). Simultaneous critical values for t-tests in very high dimensions. Bernoulli.

DELAIGLE, A., HALL, P. and JIN, J. (2011). Robustness and accuracy of methods for high dimensional data analysis based on Student's t-statistic. J. R. Stat. Soc. Ser. B Stat. Methodol.

EFRON, B. (2004). Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. J. Amer. Statist. Assoc.

FAN, J., HALL, P. and YAO, Q. (2007). To how many simultaneous hypothesis tests can normal, Student's t or bootstrap calibration be applied? J. Amer. Statist. Assoc.

FERREIRA, J. A. and ZWINDERMAN, A. H. (2006). On the Benjamini–Hochberg method. Ann. Statist.

LIU, W. and SHAO, Q. (2014). Supplement to "Phase transition and regularized bootstrap in large-scale t-tests with false discovery rate control." DOI: /14-AOS1249SUPP.

SHAO, Q.-M. (1999). A Cramér type large deviation result for Student's t-statistic. J. Theoret. Probab.

STOREY, J. D. (2003). The positive false discovery rate: A Bayesian interpretation and the q-value. Ann. Statist.

STOREY, J. D., TAYLOR, J. E. and SIEGMUND, D. (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: A unified approach. J. R. Stat. Soc. Ser. B Stat. Methodol.

WANG, Q. (2005). Limit theorems for self-normalized large deviation. Electron. J. Probab. (electronic).

WU, W. B. (2008). On false discovery control under dependence. Ann. Statist.

DEPARTMENT OF MATHEMATICS AND INSTITUTE OF NATURAL SCIENCES
SHANGHAI JIAO TONG UNIVERSITY
SHANGHAI
CHINA
weidongl@sjtu.edu.cn

DEPARTMENT OF STATISTICS
THE CHINESE UNIVERSITY OF HONG KONG
SHATIN, N.T., HONG KONG
CHINA


More information

Estimating False Discovery Proportion Under Arbitrary Covariance Dependence

Estimating False Discovery Proportion Under Arbitrary Covariance Dependence Estimating False Discovery Proportion Under Arbitrary Covariance Dependence arxiv:1010.6056v2 [stat.me] 15 Nov 2011 Jianqing Fan, Xu Han and Weijie Gu May 31, 2018 Abstract Multiple hypothesis testing

More information

Bootstrap inference for the finite population total under complex sampling designs

Bootstrap inference for the finite population total under complex sampling designs Bootstrap inference for the finite population total under complex sampling designs Zhonglei Wang (Joint work with Dr. Jae Kwang Kim) Center for Survey Statistics and Methodology Iowa State University Jan.

More information

Qualifying Exam in Probability and Statistics. https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf

Qualifying Exam in Probability and Statistics. https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf Part : Sample Problems for the Elementary Section of Qualifying Exam in Probability and Statistics https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf Part 2: Sample Problems for the Advanced Section

More information

Refining the Central Limit Theorem Approximation via Extreme Value Theory

Refining the Central Limit Theorem Approximation via Extreme Value Theory Refining the Central Limit Theorem Approximation via Extreme Value Theory Ulrich K. Müller Economics Department Princeton University February 2018 Abstract We suggest approximating the distribution of

More information

STAT 263/363: Experimental Design Winter 2016/17. Lecture 1 January 9. Why perform Design of Experiments (DOE)? There are at least two reasons:

STAT 263/363: Experimental Design Winter 2016/17. Lecture 1 January 9. Why perform Design of Experiments (DOE)? There are at least two reasons: STAT 263/363: Experimental Design Winter 206/7 Lecture January 9 Lecturer: Minyong Lee Scribe: Zachary del Rosario. Design of Experiments Why perform Design of Experiments (DOE)? There are at least two

More information

A NEW PERSPECTIVE ON ROBUST M-ESTIMATION: FINITE SAMPLE THEORY AND APPLICATIONS TO DEPENDENCE-ADJUSTED MULTIPLE TESTING

A NEW PERSPECTIVE ON ROBUST M-ESTIMATION: FINITE SAMPLE THEORY AND APPLICATIONS TO DEPENDENCE-ADJUSTED MULTIPLE TESTING Submitted to the Annals of Statistics A NEW PERSPECTIVE ON ROBUST M-ESTIMATION: FINITE SAMPLE THEORY AND APPLICATIONS TO DEPENDENCE-ADJUSTED MULTIPLE TESTING By Wen-Xin Zhou, Koushiki Bose, Jianqing Fan,

More information

If we want to analyze experimental or simulated data we might encounter the following tasks:

If we want to analyze experimental or simulated data we might encounter the following tasks: Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction

More information

A REVERSE TO THE JEFFREYS LINDLEY PARADOX

A REVERSE TO THE JEFFREYS LINDLEY PARADOX PROBABILITY AND MATHEMATICAL STATISTICS Vol. 38, Fasc. 1 (2018), pp. 243 247 doi:10.19195/0208-4147.38.1.13 A REVERSE TO THE JEFFREYS LINDLEY PARADOX BY WIEBE R. P E S T M A N (LEUVEN), FRANCIS T U E R

More information

Comprehensive Examination Quantitative Methods Spring, 2018

Comprehensive Examination Quantitative Methods Spring, 2018 Comprehensive Examination Quantitative Methods Spring, 2018 Instruction: This exam consists of three parts. You are required to answer all the questions in all the parts. 1 Grading policy: 1. Each part

More information

Testing Jumps via False Discovery Rate Control

Testing Jumps via False Discovery Rate Control Testing Jumps via False Discovery Rate Control Yu-Min Yen August 12, 2011 Abstract Many recently developed nonparametric jump tests can be viewed as multiple hypothesis testing problems. For such multiple

More information

GARCH Models Estimation and Inference

GARCH Models Estimation and Inference GARCH Models Estimation and Inference Eduardo Rossi University of Pavia December 013 Rossi GARCH Financial Econometrics - 013 1 / 1 Likelihood function The procedure most often used in estimating θ 0 in

More information

A remark on the maximum eigenvalue for circulant matrices

A remark on the maximum eigenvalue for circulant matrices IMS Collections High Dimensional Probability V: The Luminy Volume Vol 5 (009 79 84 c Institute of Mathematical Statistics, 009 DOI: 04/09-IMSCOLL5 A remark on the imum eigenvalue for circulant matrices

More information

A General Framework for High-Dimensional Inference and Multiple Testing

A General Framework for High-Dimensional Inference and Multiple Testing A General Framework for High-Dimensional Inference and Multiple Testing Yang Ning Department of Statistical Science Joint work with Han Liu 1 Overview Goal: Control false scientific discoveries in high-dimensional

More information

arxiv: v2 [stat.me] 14 Mar 2011

arxiv: v2 [stat.me] 14 Mar 2011 Submission Journal de la Société Française de Statistique arxiv: 1012.4078 arxiv:1012.4078v2 [stat.me] 14 Mar 2011 Type I error rate control for testing many hypotheses: a survey with proofs Titre: Une

More information

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods Chapter 4 Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods 4.1 Introduction It is now explicable that ridge regression estimator (here we take ordinary ridge estimator (ORE)

More information

Control of Generalized Error Rates in Multiple Testing

Control of Generalized Error Rates in Multiple Testing Institute for Empirical Research in Economics University of Zurich Working Paper Series ISSN 1424-0459 Working Paper No. 245 Control of Generalized Error Rates in Multiple Testing Joseph P. Romano and

More information

Optional Stopping Theorem Let X be a martingale and T be a stopping time such

Optional Stopping Theorem Let X be a martingale and T be a stopping time such Plan Counting, Renewal, and Point Processes 0. Finish FDR Example 1. The Basic Renewal Process 2. The Poisson Process Revisited 3. Variants and Extensions 4. Point Processes Reading: G&S: 7.1 7.3, 7.10

More information

New Approaches to False Discovery Control

New Approaches to False Discovery Control New Approaches to False Discovery Control Christopher R. Genovese Department of Statistics Carnegie Mellon University http://www.stat.cmu.edu/ ~ genovese/ Larry Wasserman Department of Statistics Carnegie

More information

Research Article Sample Size Calculation for Controlling False Discovery Proportion

Research Article Sample Size Calculation for Controlling False Discovery Proportion Probability and Statistics Volume 2012, Article ID 817948, 13 pages doi:10.1155/2012/817948 Research Article Sample Size Calculation for Controlling False Discovery Proportion Shulian Shang, 1 Qianhe Zhou,

More information

Multiple Change-Point Detection and Analysis of Chromosome Copy Number Variations

Multiple Change-Point Detection and Analysis of Chromosome Copy Number Variations Multiple Change-Point Detection and Analysis of Chromosome Copy Number Variations Yale School of Public Health Joint work with Ning Hao, Yue S. Niu presented @Tsinghua University Outline 1 The Problem

More information

FDR-CONTROLLING STEPWISE PROCEDURES AND THEIR FALSE NEGATIVES RATES

FDR-CONTROLLING STEPWISE PROCEDURES AND THEIR FALSE NEGATIVES RATES FDR-CONTROLLING STEPWISE PROCEDURES AND THEIR FALSE NEGATIVES RATES Sanat K. Sarkar a a Department of Statistics, Temple University, Speakman Hall (006-00), Philadelphia, PA 19122, USA Abstract The concept

More information

Asymptotic inference for a nonstationary double ar(1) model

Asymptotic inference for a nonstationary double ar(1) model Asymptotic inference for a nonstationary double ar() model By SHIQING LING and DONG LI Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong maling@ust.hk malidong@ust.hk

More information

simple if it completely specifies the density of x

simple if it completely specifies the density of x 3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research

More information

A CENTRAL LIMIT THEOREM FOR NESTED OR SLICED LATIN HYPERCUBE DESIGNS

A CENTRAL LIMIT THEOREM FOR NESTED OR SLICED LATIN HYPERCUBE DESIGNS Statistica Sinica 26 (2016), 1117-1128 doi:http://dx.doi.org/10.5705/ss.202015.0240 A CENTRAL LIMIT THEOREM FOR NESTED OR SLICED LATIN HYPERCUBE DESIGNS Xu He and Peter Z. G. Qian Chinese Academy of Sciences

More information

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of

More information

Journal Club: Higher Criticism

Journal Club: Higher Criticism Journal Club: Higher Criticism David Donoho (2002): Higher Criticism for Heterogeneous Mixtures, Technical Report No. 2002-12, Dept. of Statistics, Stanford University. Introduction John Tukey (1976):

More information

Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2

Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2 Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2 Fall, 2013 Page 1 Random Variable and Probability Distribution Discrete random variable Y : Finite possible values {y

More information

SOME CONVERSE LIMIT THEOREMS FOR EXCHANGEABLE BOOTSTRAPS

SOME CONVERSE LIMIT THEOREMS FOR EXCHANGEABLE BOOTSTRAPS SOME CONVERSE LIMIT THEOREMS OR EXCHANGEABLE BOOTSTRAPS Jon A. Wellner University of Washington The bootstrap Glivenko-Cantelli and bootstrap Donsker theorems of Giné and Zinn (990) contain both necessary

More information