PHASE TRANSITION AND REGULARIZED BOOTSTRAP IN LARGE-SCALE t-tests WITH FALSE DISCOVERY RATE CONTROL
The Annals of Statistics 2014, Vol. 42, No. 5, DOI: /14-AOS1249. Institute of Mathematical Statistics, 2014.

BY WEIDONG LIU (1) AND QI-MAN SHAO (2)

Shanghai Jiao Tong University and The Chinese University of Hong Kong

Applying the Benjamini and Hochberg (B-H) method to multiple Student's t tests is a popular technique for gene selection in microarray data analysis. Given the nonnormality of the population, the true p-values of the hypothesis tests are typically unknown. Hence it is common to use the standard normal distribution $N(0,1)$, Student's t distribution $t_{n-1}$ or the bootstrap method to estimate the p-values. In this paper, we prove that when the population has a finite 4th moment and the dimension $m$ and the sample size $n$ satisfy $\log m = o(n^{1/3})$, the B-H method controls the false discovery rate (FDR) and the false discovery proportion (FDP) at a given level $\alpha$ asymptotically with p-values estimated from the $N(0,1)$ or $t_{n-1}$ distribution. However, a phase transition phenomenon occurs when $\log m \ge c_0 n^{1/3}$. In this case, the FDR and the FDP of the B-H method may be larger than $\alpha$ or even converge to one. In contrast, the bootstrap calibration is accurate for $\log m = o(n^{1/2})$ as long as the underlying distribution has sub-Gaussian tails. However, such a light-tailed condition cannot generally be weakened. The simulation study shows that the bootstrap calibration is very conservative for heavy-tailed distributions. To solve this problem, a regularized bootstrap correction is proposed and is shown to be robust to the tails of the distributions. The simulation study shows that the regularized bootstrap method performs better than its usual counterpart.

Received October 2013; revised May.
(1) Supported by NSFC grants, the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning, Shanghai Pujiang Program, Foundation for the Author of National Excellent Doctoral Dissertation of PR China and a grant from Australian Research Council.
(2) Supported in part by Hong Kong RGC GRF grants.
MSC2010 subject classifications. 62H15.
Key words and phrases. Bootstrap correction, false discovery rate, multiple t-tests, phase transition.

1. Introduction. Multiple Student's t tests arise in many real applications, such as gene selection. Consider $m$ tests on the mean values

$$H_{0i}: \mu_i = 0 \quad \text{versus} \quad H_{1i}: \mu_i \ne 0, \qquad 1 \le i \le m.$$

A popular procedure is to use the Benjamini and Hochberg (B-H) method to search for significant findings, with the false discovery rate (FDR) controlled at a given level $0 < \alpha < 1$; that is,

$$E\left[\frac{V}{R \vee 1}\right] \le \alpha,$$

where $V$ is the number of wrongly rejected hypotheses and $R$ is the total number of rejected hypotheses. The seminal work of Benjamini and Hochberg (1995) is to reject the null hypotheses for which $p_i \le p_{(\hat k)}$, where $p_i$ is the p-value for $H_{0i}$,

$$\hat k = \max\{0 \le i \le m : p_{(i)} \le \alpha i/m\} \tag{1}$$

and $p_{(1)} \le \cdots \le p_{(m)}$ are the ordered p-values. Let $T_1, \ldots, T_m$ be Student's t test statistics

$$T_i = \frac{\bar X_i}{\hat s_{ni}/\sqrt{n}}, \quad \text{where} \quad \bar X_i = \frac{1}{n}\sum_{k=1}^{n} X_{ki}, \qquad \hat s_{ni}^2 = \frac{1}{n-1}\sum_{k=1}^{n} (X_{ki} - \bar X_i)^2,$$

and $(X_{k1}, \ldots, X_{km})$, $1 \le k \le n$, are i.i.d. random samples from $(X_1, \ldots, X_m)$. When $T_1, \ldots, T_m$ are independent and the true p-values $p_i$ are known, Benjamini and Hochberg (1995) showed that the B-H method controls the FDR at level $\alpha$.

In many applications, the distributions of $X_i$, $1 \le i \le m$, are non-Gaussian. Hence it is difficult to know the exact null distributions of $T_i$ and the true p-values. When applying the B-H method, the p-values are actually estimators. According to the central limit theorem, it is common to use the standard normal distribution $N(0,1)$ or Student's t distribution $t_{n-1}$ to estimate the p-values, where $t_{n-1}$ denotes the Student's t random variable with $n-1$ degrees of freedom. In a microarray analysis, Efron (2004) observed that the null distribution choices substantially affect the simultaneous inference procedure. However, a systematic theoretical study on the influence of the estimated p-values is still lacking. It is important to know how accurate the $N(0,1)$ and $t_{n-1}$ calibrations can be. In this paper, we show that the $N(0,1)$ and $t_{n-1}$ calibrations are accurate when $\log m = o(n^{1/3})$. Moreover, if the underlying distributions are symmetric, then the dimension can be as large as $\log m = o(n^{1/2})$.
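The step-up rule (1) and the $N(0,1)$ calibration described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function names `t_statistic`, `normal_p_value` and `benjamini_hochberg` are hypothetical, and only the Python standard library is used. The identity $2 - 2\Phi(|t|) = \operatorname{erfc}(|t|/\sqrt{2})$ replaces an explicit normal distribution function.

```python
import math
import statistics


def t_statistic(sample):
    """One-sample Student's t statistic: mean / (sample sd / sqrt(n))."""
    n = len(sample)
    return statistics.fmean(sample) / (statistics.stdev(sample) / math.sqrt(n))


def normal_p_value(t):
    """Two-sided p-value under N(0,1): 2 - 2*Phi(|t|) = erfc(|t| / sqrt(2))."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def benjamini_hochberg(p_values, alpha):
    """Return the sorted indices rejected by the B-H step-up rule at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_hat = 0
    # k_hat is the largest rank whose ordered p-value is at most alpha * rank / m.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k_hat = rank
    # Reject every hypothesis whose p-value is among the k_hat smallest.
    return sorted(order[:k_hat])
```

Given a list of columns `data`, one would compute `p = [normal_p_value(t_statistic(col)) for col in data]` and pass `p` to `benjamini_hochberg`. The step-up nature matters: a p-value that misses its own threshold is still rejected when a larger ordered p-value meets its threshold.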
Under a finite 4th moment of $X_i$, the FDR and the false discovery proportion (FDP) of the B-H method with the estimated p-values $\hat p_{i,\Phi} = 2 - 2\Phi(|T_i|)$ or $\hat p_{i,\Psi_{n-1}} = 2 - 2\Psi_{n-1}(|T_i|)$ will converge to $\alpha m_0/m$, where $m_0$ is the number of true null hypotheses, $\Phi(t)$ is the standard normal distribution function and $\Psi_{n-1}(t) = P(t_{n-1} \le t)$. However, when $\log m \ge c_0 n^{1/3}$ for some $c_0 > 0$ and the distributions are asymmetric, the $N(0,1)$ and $t_{n-1}$ calibrations may not work well, and a phase transition phenomenon occurs. Under $\log m \ge c_0 n^{1/3}$, the number of true alternative hypotheses $m_1 = \exp(o(n^{1/3}))$ and the average of skewnesses $\tau = \lim_{m\to\infty} m_0^{-1}\sum_{i\in H_0} |EX_i^3/\sigma_i^3| > 0$, we show that the FDR of the B-H method satisfies $\lim_{(m,n)\to\infty} \mathrm{FDR} \ge \kappa$ for some constant $\kappa > \alpha$, where $H_0 = \{i : \mu_i = 0\}$. Furthermore, if $\log m/n^{1/3} \to \infty$, then $\lim_{(m,n)\to\infty} \mathrm{FDR} = 1$. Similar results are proven for the false discovery proportion. This indicates that the $N(0,1)$ and $t_{n-1}$ calibrations are inaccurate when the average of skewnesses $\tau \ne 0$ in the ultra high dimensional setting.

It is well known that the bootstrap is an effective way to improve the accuracy of an exact null distribution approximation. Fan, Hall and Yao (2007) showed that for bounded noise, the bootstrap can improve the accuracy and allow a higher dimension $\log m = o(n^{1/2})$ when controlling the family-wise error rate. Delaigle, Hall and Jin (2011) showed that the bootstrap method has significant advantages in higher criticism. In this paper, we show that when the bootstrap calibration is used and $\log m = o(n^{1/2})$, the B-H method can asymptotically control the FDR and FDP at level $\alpha$. In our results, we assume sub-Gaussian tails instead of the bounded noise in Fan, Hall and Yao (2007).

Although the bootstrap method allows for a higher dimension, the light-tailed condition cannot generally be weakened. The simulation study shows that the bootstrap method is very conservative for heavy-tailed distributions. To solve this problem, we propose a regularized bootstrap method that is robust to the tails of the distributions. The proposed regularized bootstrap only requires a finite 6th moment, and the dimension can be as large as $\log m = o(n^{1/2})$.

It is also not uncommon in real applications for $X_1, \ldots, X_m$ to be dependent. This results in a dependency between $T_1, \ldots, T_m$. In this paper, we obtain similar results for the B-H method under a general weak dependence condition. It should be noted that much work has been done on the robustness of FDR/FDP controlling methods against dependence. Benjamini and Yekutieli (2001) proved that the B-H procedure controls the FDR under positive regression dependency.
Storey (2003), Storey, Taylor and Siegmund (2004) and Ferreira and Zwinderman (2006) imposed a dependence condition that required the law of large numbers for the empirical distributions under the null and alternative hypotheses. Wu (2008) developed FDR controlling procedures for data coming from special models, such as the time series model. However, to satisfy the conditions in most of the existing methods, it is often necessary to assume that the number of true alternative hypotheses $m_1$ is asymptotically $\pi_1 m$ with some $\pi_1 > 0$. They exclude the sparse setting $m_1 = o(m)$, which is important in applications such as gene selection. For example, if $m_1 = o(m)$, then the conditions of Theorem 4 in Storey, Taylor and Siegmund (2004) and the conditions of the main results in Wu (2008) are not satisfied. In contrast, our results on FDR and FDP control under dependence allow $m_1 \le \gamma m$ for some $\gamma < 1$.

The remainder of this paper is organized as follows. In Section 2.1, we show the robustness of and the phase transition phenomenon for the $N(0,1)$ and $t_{n-1}$ calibrations. In Section 2.2, we show that the bootstrap calibration can improve the FDR and FDP control. The regularized bootstrap method is proposed in Section 3. The results are extended to the dependence case in Section 4. The simulation study is presented in Section 5 and the proofs are postponed to Section 6. Throughout the paper, all constants such as $\gamma$, $b_0$, $c_0$ in the upper bounds and lower bounds do not depend on $n$ and $m$.

2. Main results.

2.1. Robustness and phase transition. In this section, we assume that the Student's t test statistics $T_1, \ldots, T_m$ are independent, and the results are extended to the dependent case in Section 4. Before stating the main theorems, we introduce some notation. Let $\hat p_{i,\Phi} = 2 - 2\Phi(|T_i|)$ and $\hat p_{i,\Psi_{n-1}} = 2 - 2\Psi_{n-1}(|T_i|)$ be the p-values calculated from the standard normal distribution and the t-distribution, respectively. Let $\mathrm{FDR}_\Phi$ and $\mathrm{FDR}_{\Psi_{n-1}}$ be the FDR of the B-H method with $\hat p_{i,\Phi}$ and $\hat p_{i,\Psi_{n-1}}$ in (1), respectively. Similarly, we denote the false discovery proportions of the B-H method by $\mathrm{FDP}_\Phi$ $(= V/(R \vee 1))$ and $\mathrm{FDP}_{\Psi_{n-1}}$. Recall that $R$ is the total number of rejections. The critical values of the tests are then $\hat t_\Phi = \Phi^{-1}(1 - \alpha R/(2m))$ and $\hat t_{\Psi_{n-1}} = \Psi_{n-1}^{-1}(1 - \alpha R/(2m))$. Set $Y_i = (X_i - \mu_i)/\sigma_i$ with $\sigma_i^2 = \mathrm{Var}(X_i)$, $1 \le i \le m$. Recall that $m_1$ is the number of true alternative hypotheses. Throughout this paper, we assume $m_1 \le \gamma m$ for some $\gamma < 1$, which includes the important sparse setting $m_1 = o(m)$.

THEOREM 2.1. Suppose $X_1, \ldots, X_m$ are independent and $\log m = o(n^{1/2})$. Assume that $\max_{1\le i\le m} EY_i^4 \le b_0$ for some constant $b_0 > 0$ and

$$\mathrm{Card}\left\{ i : |\mu_i/\sigma_i| \ge 4\sqrt{\log m/n} \right\} \to \infty. \tag{2}$$

Then

$$\lim_{(n,m)\to\infty} \frac{\mathrm{FDR}_\Phi}{(m_0/m)\alpha\kappa_\Phi} = 1 \quad \text{and} \quad \lim_{(n,m)\to\infty} \frac{\mathrm{FDR}_{\Psi_{n-1}}}{(m_0/m)\alpha\kappa_{\Psi_{n-1}}} = 1,$$

where $\kappa_\Phi = E\left[\hat\kappa_\Phi I\{\hat\kappa_\Phi \le 2(\alpha - \alpha\gamma)^{-1}\}\right]$,

$$\hat\kappa_\Phi = \frac{\sum_{i\in H_0}\left\{ \exp\left(\hat t_\Phi^3 EX_i^3/(\sqrt{n}\sigma_i^3)\right) + \exp\left(-\hat t_\Phi^3 EX_i^3/(\sqrt{n}\sigma_i^3)\right) \right\}}{2m_0}$$

and $\kappa_{\Psi_{n-1}}$ is defined in the same way. For the false discovery proportion, we have

$$\frac{\mathrm{FDP}_\Phi}{(m_0/m)\alpha\hat\kappa_\Phi} \to 1 \quad \text{and} \quad \frac{\mathrm{FDP}_{\Psi_{n-1}}}{(m_0/m)\alpha\hat\kappa_{\Psi_{n-1}}} \to 1$$

in probability as $(n,m) \to \infty$.

Let $\tau = \lim_{m\to\infty} m_0^{-1}\sum_{i\in H_0} |EY_i^3|$. We have the following corollary.
COROLLARY 2.1. Assume that the conditions in Theorem 2.1 are satisfied.

(i) If $\log m = o(n^{1/3})$, then we have

$$\lim_{(n,m)\to\infty} \frac{\mathrm{FDR}_\Phi}{\alpha m_0/m} = 1 \quad \text{and} \quad \frac{\mathrm{FDP}_\Phi}{\alpha m_0/m} \to 1 \text{ in probability.}$$

(ii) Suppose $\log m \ge c_0 n^{1/3}$ for some $c_0 > 0$ and $m_1 = \exp(o(n^{1/3}))$. Also assume that $\tau > 0$. Then $\lim_{(n,m)\to\infty} \mathrm{FDR}_\Phi \ge \beta$ and $\lim_{(n,m)\to\infty} P(\mathrm{FDP}_\Phi \ge \beta) = 1$ for some constant $\beta > \alpha$.

(iii) Suppose $\log m/n^{1/3} \to \infty$ and $m_1 = \exp(o(n^{1/3}))$. Assume that $\tau > 0$. Then we have $\lim_{(n,m)\to\infty} \mathrm{FDR}_\Phi = 1$ and $\mathrm{FDP}_\Phi \to 1$ in probability.

The same conclusions hold for $\mathrm{FDR}_{\Psi_{n-1}}$ and $\mathrm{FDP}_{\Psi_{n-1}}$.

Theorem 2.1 and Corollary 2.1 show that when $\log m = o(n^{1/3})$, the $N(0,1)$ and $t_{n-1}$ calibrations are accurate. Note that only a finite 4th moment of $Y_i$ is required. Furthermore, if the skewnesses $EY_i^3 = 0$ for $i \in H_0$, then the dimension can be as large as $\log m = o(n^{1/2})$. However, a phase transition occurs if the average of skewnesses $\tau > 0$, for example, for the exponential distribution. The FDR and FDP of the B-H method are greater than $\alpha$ as long as $\log m \ge c_0 n^{1/3}$ and converge to one when $\log m/n^{1/3} \to \infty$.

Under a finite 4th moment of $X_i$, Cao and Kosorok (2011) prove the robustness of Student's t test statistics and the $N(0,1)$ calibration in the control of FDR and FDP. They require $m_1/m \to c$ for some $0 < c < 1$, which does not cover the sparse case.

Corollary 2.1 also indicates that the choice of asymptotic null distributions is important in the study of large-scale testing problems. When the dimension is much larger than the sample size, simply using the null limiting distribution to estimate the true p-values may result in larger FDR and FDP. This is further verified by our simulation study in Section 5.

In Theorem 2.1 and Corollary 2.1, we require the technical condition (2). Actually, this condition is nearly optimal for the FDP results. If the number of true alternative hypotheses $m_1$ is fixed as $m \to \infty$, then Proposition 2.1 below shows that even for the true p-values, the B-H method is unable to control the FDP at any level $0 < \xi < 1$ with overwhelming probability.
Note that (2) is only slightly stronger than $m_1 \to \infty$. Let $\mathrm{FDP}_{\mathrm{true}}$ be the false discovery proportion of the B-H method with the true p-values $p_i$, $1 \le i \le m$. Let $U(0,1)$ be the uniform random variable on $(0,1)$.

PROPOSITION 2.1. Assume that $m_1$ is fixed as $m \to \infty$ and $X_1, \ldots, X_m$ are independent. Suppose that $p_i \sim U(0,1)$ for $i \in H_0$. For any $0 < \xi < 1$, we have

$$\lim_{(n,m)\to\infty} P(\mathrm{FDP}_{\mathrm{true}} \ge \xi) \ge \eta$$

for some $\eta > 0$, where $\eta$ may depend on $m_1$ and $\xi$.
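The special case $m_1 = 0$ gives a quick sanity check on Proposition 2.1. The following worked step is an illustration added under the proposition's assumptions (independent $p_i \sim U(0,1)$ for all $i$), not part of the original argument. With no true alternatives every rejection is false, so $\mathrm{FDP}_{\mathrm{true}}$ takes only the values 0 and 1, and for any $0 < \xi < 1$

$$P(\mathrm{FDP}_{\mathrm{true}} \ge \xi) = P(R \ge 1) = P\left(\exists\, 1 \le i \le m : p_{(i)} \le \alpha i/m\right) = \alpha,$$

where the last equality is Simes' identity for independent uniform p-values. In this case one may take $\eta = \alpha$, whatever the value of $\xi$: the FDR equals $\alpha$ exactly, yet the FDP exceeds any level $\xi < 1$ with probability $\alpha$.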
Proposition 2.1 indicates that $m_1 \to \infty$ is a necessary condition for FDP control. In contrast, the control of FDR does not need $m_1 \to \infty$ when $\log m = o(n^{1/3})$. However, $\mathrm{FDR}_\Phi$ and $\mathrm{FDR}_{\Psi_{n-1}}$ may still converge to one if $\log m/n^{1/3} \to \infty$ and $\tau > 0$.

PROPOSITION 2.2. Suppose $m_1$ is fixed as $m \to \infty$, $X_1, \ldots, X_m$ are independent and $\log m = o(n^{1/2})$. Assume that $\max_{1\le i\le m} EY_i^4 \le b_0$ for some constant $b_0 > 0$.

(i) If $\log m = o(n^{1/3})$ and $p_i \sim U(0,1)$ for $i \in H_0$, then $\lim_{(n,m)\to\infty} \mathrm{FDR}_\Phi \le \alpha$.

(ii) Suppose $\log m/n^{1/3} \to \infty$. Assume that $\tau > 0$. We have $\lim_{(n,m)\to\infty} \mathrm{FDR}_\Phi = 1$.

The same conclusions remain valid for $\mathrm{FDR}_{\Psi_{n-1}}$.

2.2. Bootstrap calibration. In this section, we show that the bootstrap procedure can improve the accuracy of FDR and FDP control. Write $\mathcal{X}_i = \{X_{1i}, \ldots, X_{ni}\}$. Let $\mathcal{X}^*_{ki} = \{X^*_{1ki}, \ldots, X^*_{nki}\}$, $1 \le k \le N$, be resamples drawn randomly with replacement from $\mathcal{X}_i$. Let $T^*_{ki}$ be Student's t test statistics constructed from $\{X^*_{1ki} - \bar X_i, \ldots, X^*_{nki} - \bar X_i\}$. We use

$$G^*_{N,m}(t) = \frac{1}{Nm}\sum_{k=1}^{N}\sum_{i=1}^{m} I\{|T^*_{ki}| \ge t\}$$

to approximate the null distribution and define the p-values by $\hat p_{i,B} = G^*_{N,m}(|T_i|)$. Let $\mathrm{FDR}_B$ and $\mathrm{FDP}_B$ denote the FDR and FDP of the B-H method with $\hat p_{i,B}$ in (1), respectively.

THEOREM 2.2. Suppose that $\max_{1\le i\le m} Ee^{tY_i^2} \le K$ for some constants $t > 0$ and $K > 0$, and the conditions in Theorem 2.1 are satisfied.

(i) If $\log m = o(n^{1/3})$, then we have

$$\lim_{(n,m)\to\infty} \frac{\mathrm{FDR}_B}{\alpha m_0/m} = 1 \quad \text{and} \quad \frac{\mathrm{FDP}_B}{\alpha m_0/m} \to 1 \text{ in probability.} \tag{3}$$

(ii) If $\log m = o(n^{1/2})$ and $m_1 \le m^\eta$ for some $\eta < 1$, then (3) holds.

Another common bootstrap method is to estimate the p-values individually by $\tilde p_{i,B} = G^*_i(|T_i|)$, where $G^*_i(t) = N^{-1}\sum_{k=1}^{N} I\{|T^*_{ki}| \ge t\}$; see Fan, Hall and Yao (2007) and Delaigle, Hall and Jin (2011). Results similar to those in Theorem 2.2 can be obtained if $N$ is large enough. Let $\widetilde{\mathrm{FDR}}_B$ and $\widetilde{\mathrm{FDP}}_B$ be the FDR and FDP of the B-H method with $\tilde p_{i,B}$, respectively. The following result holds.

PROPOSITION 2.3. Suppose that $N \ge m^{2+\delta}$ for some $\delta > 0$, $\max_{1\le i\le m} Ee^{tY_i^2} \le K$ for some constants $t > 0$ and $K > 0$, and $\log m = o(n^{1/2})$. Assume that $X_1, \ldots, X_m$ are independent.
(i) If (2) holds, then the results of Theorem 2.2(i) and (ii) hold for $\widetilde{\mathrm{FDR}}_B$ and $\widetilde{\mathrm{FDP}}_B$.

(ii) Suppose that $m_1$ is fixed and $p_i \sim U(0,1)$ for $i \in H_0$. If $\log m = o(n^{1/2})$, then we have $\lim_{(n,m)\to\infty} \widetilde{\mathrm{FDR}}_B \le \alpha$.

Fan, Hall and Yao (2007) proved that the bootstrap calibration accurately controls the family-wise error rate if $\log m = o(n^{1/2})$ and $P(|Y_i| \le C) = 1$ for $1 \le i \le m$. Our result on FDR control only requires sub-Gaussian tails, which is a weaker requirement than bounded noise.

The bootstrap method has often been used in multiple Student's t tests in real applications. Fan, Hall and Yao (2007) and Delaigle, Hall and Jin (2011) have proven that the bootstrap method provides more accurate p-values than the normal or $t_{n-1}$ approximation for light-tailed distributions. Theorem 2.2 and Proposition 2.3 show that the bootstrap method allows a higher dimension $\log m = o(n^{1/2})$ for FDR control as long as $\max_{1\le i\le m} Ee^{tY_i^2} \le K$. However, some real data may not satisfy such a light-tailed condition. The simulation study in Section 5 also indicates that the bootstrap calibration does not always outperform the $N(0,1)$ or $t_{n-1}$ calibrations.

3. Regularized bootstrap in large-scale tests. In this section, we introduce a regularized bootstrap method that is robust for heavy-tailed distributions, and the dimension $m$ can be as large as $e^{o(n^{1/2})}$. For the regularized bootstrap method, the finite 6th moment condition is enough. Let $\lambda_{ni}$ be a regularization parameter. Define

$$\hat X_{ki} = X_{ki} I\{|X_{ki}| \le \lambda_{ni}\}, \qquad 1 \le k \le n,\ 1 \le i \le m.$$

Write $\hat{\mathcal{X}}_i = \{\hat X_{1i}, \ldots, \hat X_{ni}\}$. Let $\hat{\mathcal{X}}^*_{ki} = \{\hat X^*_{1ki}, \ldots, \hat X^*_{nki}\}$, $1 \le k \le N$, be resamples drawn independently and uniformly with replacement from $\hat{\mathcal{X}}_i$. Let $\hat T^*_{ki}$ be Student's t test statistics constructed from $\{\hat X^*_{1ki} - \bar{\hat X}_i, \ldots, \hat X^*_{nki} - \bar{\hat X}_i\}$, where $\bar{\hat X}_i = n^{-1}\sum_{k=1}^{n} \hat X_{ki}$. We use

$$\hat G^*_{N,m}(t) = \frac{1}{Nm}\sum_{k=1}^{N}\sum_{i=1}^{m} I\{|\hat T^*_{ki}| \ge t\}$$

to approximate the null distribution and define the p-values by $\hat p_{i,RB} = \hat G^*_{N,m}(|T_i|)$.
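The pooled bootstrap p-values $\hat p_{i,B}$ and their regularized version $\hat p_{i,RB}$ differ only in the truncation step, which the following sketch makes explicit. It is an illustration under stated assumptions rather than the authors' implementation: the names `t_stat`, `truncate` and `pooled_bootstrap_p_values` are hypothetical, the `seed` argument is added for reproducibility, and a single scalar `lam` stands in for the coordinate-specific $\lambda_{ni}$. Passing `lam=None` skips truncation and gives the ordinary pooled bootstrap.

```python
import bisect
import math
import random
import statistics


def t_stat(values, center=0.0):
    """Student's t statistic of the values after subtracting `center`."""
    n = len(values)
    mean = statistics.fmean(values) - center
    sd = statistics.stdev(values)
    if sd == 0.0:
        # Degenerate resample: every value is identical.
        return 0.0 if mean == 0.0 else math.copysign(math.inf, mean)
    return mean / (sd / math.sqrt(n))


def truncate(values, lam):
    """Regularization step: keep x when |x| <= lam, otherwise replace it by 0."""
    return [x if abs(x) <= lam else 0.0 for x in values]


def pooled_bootstrap_p_values(data, n_resamples, lam=None, seed=0):
    """Pooled (regularized) bootstrap p-values for a list of data columns."""
    rng = random.Random(seed)
    pooled = []
    for column in data:
        base = column if lam is None else truncate(column, lam)
        center = statistics.fmean(base)
        for _ in range(n_resamples):
            # Resample with replacement and centre at the mean of the base sample.
            resample = rng.choices(base, k=len(base))
            pooled.append(abs(t_stat(resample, center)))
    pooled.sort()
    total = len(pooled)
    # The p-value is the fraction of pooled |T*| that are >= the observed |T_i|;
    # the observed statistic always uses the untruncated column.
    return [
        (total - bisect.bisect_left(pooled, abs(t_stat(column)))) / total
        for column in data
    ]
```

Pooling over all $m$ coordinates is what allows a small number of resamples per coordinate: the reference distribution is built from `n_resamples * len(data)` statistics rather than `n_resamples` alone.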
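Let $\mathrm{FDR}_{RB}$ and $\mathrm{FDP}_{RB}$ be the FDR and FDP of the B-H method with $\hat p_{i,RB}$ in (1), respectively.

THEOREM 3.1. Assume that $\max_{1\le i\le m} EX_i^6 \le K$ for some constant $K > 0$. Suppose $X_1, \ldots, X_m$ are independent, (2) holds and $\min_{1\le i\le m} \sigma_{ii} \ge c_1$ for some $c_1 > 0$. Let $c_2 (n/\log m)^{1/6} \le \lambda_{ni} \le c_3 (n/\log m)^{1/6}$ for some $c_2, c_3 > 0$.

(i) If $\log m = o(n^{1/3})$, then

$$\lim_{(n,m)\to\infty} \frac{\mathrm{FDR}_{RB}}{\alpha m_0/m} = 1 \quad \text{and} \quad \frac{\mathrm{FDP}_{RB}}{\alpha m_0/m} \to 1 \text{ in probability.} \tag{4}$$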
(ii) If $\log m = o(n^{1/2})$ and $m_1 \le m^\eta$ for some $\eta < 1$, then (4) remains valid.

In Theorem 3.1, we only require $\max_{1\le i\le m} EX_i^6 \le K$, which is much weaker than the moment condition in Theorem 2.2. As in Section 2.2, we can also estimate the p-values individually by $\tilde p_{i,RB} = \hat G^*_i(|T_i|)$, where $\hat G^*_i(t) = N^{-1}\sum_{k=1}^{N} I\{|\hat T^*_{ki}| \ge t\}$. Let $\widetilde{\mathrm{FDR}}_{RB}$ and $\widetilde{\mathrm{FDP}}_{RB}$ be the FDR and FDP of the B-H method with $\tilde p_{i,RB}$, respectively. We have the following result.

PROPOSITION 3.1. Suppose that $N \ge m^{2+\delta}$ for some $\delta > 0$, $\max_{1\le i\le m} EX_i^6 \le K$ for some constant $K > 0$, $\min_{1\le i\le m} \sigma_{ii} \ge c_1$ for some $c_1 > 0$ and $c_2 (n/\log m)^{1/6} \le \lambda_{ni} \le c_3 (n/\log m)^{1/6}$ for some $c_2, c_3 > 0$. Assume that $X_1, \ldots, X_m$ are independent.

(i) Suppose that (2) holds. Then Theorem 3.1(i) and (ii) hold for $\widetilde{\mathrm{FDR}}_{RB}$ and $\widetilde{\mathrm{FDP}}_{RB}$.

(ii) Suppose that $m_1$ is fixed and $p_i \sim U(0,1)$ for $i \in H_0$. If $\log m = o(n^{1/2})$, then we have $\lim_{(n,m)\to\infty} \widetilde{\mathrm{FDR}}_{RB} \le \alpha$.

Theorem 3.1 does not cover the case when $m_1$ is fixed. However, if $\tilde p_{i,RB}$, $1 \le i \le m$, are used, then Proposition 3.1 shows that the FDR can be controlled when $m_1$ is fixed and $\log m = o(n^{1/2})$. Actually, when $m_1$ is fixed and $\log m = o(n^{1/3})$, by the proof of Propositions 2.2 and 3.1, we can show that $\lim_{(n,m)\to\infty} \mathrm{FDR}_{RB} \le \alpha$. It is unclear whether a similar result holds for $\mathrm{FDR}_{RB}$ when the dimension becomes larger, that is, $\log m = o(n^{1/2})$. However, under (2), Theorem 3.1 only requires $N \ge 1$ because we use the average of all $m$ variables. Hence $\hat p_{i,RB}$ has a significant computational advantage over $\tilde p_{i,RB}$. Moreover, Proposition 2.1 indicates that (2) is nearly necessary for FDP control. Note that when one has FDP control, one can also have FDR control, but the reverse is not true, as Proposition 2.1 shows. Because FDR control is about the FDP average, studying FDP is more appealing in applications than FDR control.

In the regularized bootstrap method, we must choose the regularization parameter $\lambda_{ni}$.
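By Theorem 1.2 in Wang (2005), equation (2.2) in Shao (1999) and the proof of Theorem 3.1, we have

$$P^*\left(|\hat T^*_{ki}| \ge t \mid \hat{\mathcal{X}}\right) = \frac{1}{2} G(t)\left[\exp\left(\frac{t^3}{\sqrt{n}}\hat\kappa_i(\lambda_{ni})\right) + \exp\left(-\frac{t^3}{\sqrt{n}}\hat\kappa_i(\lambda_{ni})\right)\right](1 + o_P(1)),$$

uniformly for $0 \le t \le o(n^{1/4})$, where $G(t) = 2 - 2\Phi(t)$, $\hat{\mathcal{X}} = \{\hat{\mathcal{X}}_1, \ldots, \hat{\mathcal{X}}_m\}$,

$$\hat\kappa_i(\lambda_{ni}) = \frac{1}{n\hat\sigma_i^3}\sum_{k=1}^{n} (\hat X_{ki} - \bar{\hat X}_i)^3 \quad \text{and} \quad \hat\sigma_i^2 = \frac{1}{n}\sum_{k=1}^{n} (\hat X_{ki} - \bar{\hat X}_i)^2. \tag{5}$$

Also,

$$P(|T_i| \ge t) = \frac{1}{2} G(t)\left[\exp\left(\frac{t^3}{\sqrt{n}}\kappa_i\right) + \exp\left(-\frac{t^3}{\sqrt{n}}\kappa_i\right)\right](1 + o(1)),$$

uniformly for $0 \le t \le o(n^{1/4})$, where $\kappa_i = EY_i^3$. A good choice of $\lambda_{ni}$ is one that makes $\hat\kappa_i(\lambda_{ni})$ approach $\kappa_i$. As $\kappa_i$ is unknown, we propose the following cross-validation method.

Data-driven choice of $\lambda_{ni}$. We propose to choose $\hat\lambda_{ni} = |\bar X_i| + \hat s_{ni}\lambda$, where $\lambda$ will be selected as follows. Split the samples into two parts $I_0 = \{1, \ldots, n_0\}$ and $I_1 = \{n_0 + 1, \ldots, n\}$ with sizes $n_0 = [n/2]$ and $n_1 = n - n_0$, respectively. For $I = I_0$ or $I_1$, let

$$\hat\kappa_{i,I} = \frac{1}{|I|\hat s_{ni,I}^3}\sum_{k\in I} (X_{ki} - \bar X_{i,I})^3, \qquad \hat s_{ni,I}^2 = \frac{1}{|I|}\sum_{k\in I} (X_{ki} - \bar X_{i,I})^2, \qquad \bar X_{i,I} = \frac{1}{|I|}\sum_{k\in I} X_{ki}.$$

Let $\hat\kappa_{i,I}(\lambda_{ni})$, with $\lambda_{ni} = |\bar X_{i,I}| + \hat s_{ni,I}\lambda/2$, be defined as in (5) based on $\{\hat X_{ki}, k \in I\}$. Define the risk

$$R_j(\lambda) = \sum_{i=1}^{m}\left(\hat\kappa_{i,I_j}(\lambda_{ni}) - \hat\kappa_{i,I_{1-j}}\right)^2.$$

We choose $\lambda$ by

$$\hat\lambda = \arg\min_{0<\lambda<\infty}\left\{R_0(\lambda) + R_1(\lambda)\right\}. \tag{6}$$

The final regularization parameter is $\hat\lambda_{ni} = |\bar X_i| + \hat s_{ni}\hat\lambda$. The numerical performance comparison between the data-driven choice $\hat\lambda_{ni}$ and the theoretical choice [e.g., $(n/\log m)^{1/6}$] is given in Section 5. In addition, it is important to investigate the theoretical properties of $\hat\lambda_{ni}$ and to see whether Theorem 3.1 still holds when $\hat\lambda_{ni}$ is used. We leave this for future work.

4. FDR control under dependence. To generalize the results to the dependent case, we introduce a class of correlation matrices. Let $A = (a_{ij})$ be a symmetric matrix. Let $k_m$ and $s_m$ be positive numbers. Assume that for every $1 \le j \le m$,

$$\mathrm{Card}\{1 \le i \le m : |a_{ij}| \ge k_m\} \le s_m. \tag{7}$$

Let $\mathcal{A}(k_m, s_m)$ be the class of symmetric matrices satisfying (7). Let $R = (r_{ij})$ be the correlation matrix of $X$. We introduce the following two conditions:

(C1) Suppose that $\max_{1\le i<j\le m} |r_{ij}| \le r$ for some $0 < r < 1$ and $R \in \mathcal{A}(k_m, s_m)$ with $k_m = (\log m)^{-2-\theta}$ and $s_m = O(m^\rho)$ for some $\theta > 0$ and $0 < \rho < (1-r)/(1+r)$.

(C1*) Suppose that $\max_{1\le i<j\le m} |r_{ij}| \le r$ for some $0 < r < 1$. For each $X_i$, assume that the number of variables $X_j$ that are dependent with $X_i$ is no more than $s_m$.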
(C1) and (C1*) impose weak dependence between $X_1, \ldots, X_m$. In (C1), each variable can be highly correlated with $s_m$ other variables and weakly correlated with the remaining variables. (C1*) is stronger than (C1). For each $X_i$, (C1*) requires independence between $X_i$ and the other $m - s_m$ variables. Recall that $m_1 \le \gamma m$ for some $\gamma < 1$.

THEOREM 4.1. Assume that $\max_{1\le i\le m} EY_i^4 \le b_0$ for some constant $b_0 > 0$, and (2) holds.

(i) If $\log m = O(n^\zeta)$ for some $0 < \zeta < 3/23$ and (C1) is satisfied, then we have

$$\lim_{(n,m)\to\infty} \frac{\mathrm{FDR}_\Phi}{(m_0/m)\alpha} = 1 \quad \text{and} \quad \frac{\mathrm{FDP}_\Phi}{(m_0/m)\alpha} \to 1 \text{ in probability.} \tag{8}$$

(ii) Under $\log m = o(n^{1/3})$ and (C1*), (8) also holds.

The same conclusions hold for $\mathrm{FDR}_{\Psi_{n-1}}$ and $\mathrm{FDP}_{\Psi_{n-1}}$.

For the bootstrap and regularized bootstrap procedures, we have similar results.

THEOREM 4.2. Suppose that $\max_{1\le i\le m} Ee^{tY_i^2} \le K$ and (2) is satisfied.

(1) Under the conditions of (i) or (ii) in Theorem 4.1, we have

$$\lim_{(n,m)\to\infty} \frac{\mathrm{FDR}_B}{(m_0/m)\alpha} = 1 \quad \text{and} \quad \frac{\mathrm{FDP}_B}{(m_0/m)\alpha} \to 1 \text{ in probability.} \tag{9}$$

(2) Under (C1*), $\log m = o(n^{1/2})$ and $m_1 \le m^\eta$ for some $\eta < 1$, (9) holds.

THEOREM 4.3. Suppose that $\max_{1\le i\le m} EX_i^6 \le K$ for some constant $K > 0$, $\min_{1\le i\le m} \sigma_{ii} \ge c_1$ for some $c_1 > 0$ and (2) is satisfied. Let $c_2 (n/\log m)^{1/6} \le \lambda_{ni} \le c_3 (n/\log m)^{1/6}$ for some $c_2, c_3 > 0$.

(1) Under the conditions of (i) or (ii) in Theorem 4.1, we have

$$\lim_{(n,m)\to\infty} \frac{\mathrm{FDR}_{RB}}{(m_0/m)\alpha} = 1 \quad \text{and} \quad \frac{\mathrm{FDP}_{RB}}{(m_0/m)\alpha} \to 1 \text{ in probability.} \tag{10}$$

(2) Under (C1*), $\log m = o(n^{1/2})$ and $m_1 \le m^\eta$ for some $\eta < 1$, (10) holds.

Theorems 4.1 to 4.3 imply that the B-H method remains valid asymptotically under weak dependence. As with the phase transition phenomenon caused by the growth of the dimension, it would be interesting to investigate when the B-H method fails to control the FDR as the correlation becomes stronger.
5. Numerical study. In this section, we conduct a small simulation to verify the phase transition phenomenon. Let

$$X_i = \mu_i + (\varepsilon_i - E\varepsilon_i), \qquad 1 \le i \le m, \tag{11}$$

where $(\varepsilon_1, \ldots, \varepsilon_m)$ are i.i.d. random variables. We consider three models for $\varepsilon_i$ and $\mu_i$.

Model 1. $\varepsilon_i$ is the exponential random variable with parameter 1. Let $\mu_i = 2\sigma\sqrt{\log m/n}$ for $1 \le i \le m_1$ with $m_1 = 0.05m$ and $\mu_i = 0$ for $m_1 < i \le m$, where $\sigma^2 = \mathrm{Var}(\varepsilon_i)$.

Model 2-1. $\varepsilon_i$ is the Gamma random variable with parameter $(0.5, 1)$. Let $\mu_i = 4\sigma\sqrt{\log m/n}$ for $1 \le i \le m_1$ with $m_1 = 0.05m$ and $\mu_i = 0$ for $m_1 < i \le m$.

Model 2-2. $\varepsilon_i$ is the Gamma random variable with parameter $(0.5, 1)$. Let $m_1 = 0$.

In all three models, the average of skewnesses is $\tau > 0$. We generate $n = 30, 50$ independent random samples from (11). In our simulation, $\alpha$ is taken to be 0.1, 0.2, 0.3 and $m$ is taken to be 500, 1000, 3000. For computational reasons, we only consider the estimated p-values $\hat p_{i,B}$ and $\hat p_{i,RB}$ in the bootstrap and regularized bootstrap procedures, respectively. The number of bootstrap resamples is taken to be $N = 200$. We use $\mathrm{FDR}_B$, $\mathrm{FDR}_{RB}$ and $\mathrm{FDR}^*_{RB}$ to denote the FDR of the B-H method with bootstrap, regularized bootstrap with data-driven $\hat\lambda_{ni}$ and regularized bootstrap with theoretical $\lambda_{ni} = (n/\log m)^{1/6}$, respectively. The simulation is replicated 1000 times and the empirical FDR and power for $m = 3000$ are summarized in Tables 1 and 2. To save space, we leave the simulation results for $m = 500$ and $1000$ in the supplementary material of Liu and Shao (2014). The empirical power is defined by the average ratio between the number of correct rejections and $m_1$.

Due to the nonzero skewness and $m \ge \exp(n^{1/3})$, the empirical $\mathrm{FDR}_\Phi$ and $\mathrm{FDR}_{\Psi_{n-1}}$ are much larger than the target FDR. The bootstrap method and the regularized bootstrap method with data-driven $\hat\lambda_{ni}$ provide more accurate approximations for the true p-values. Thus the empirical $\mathrm{FDR}_B$ and $\mathrm{FDR}_{RB}$ are much closer to $\alpha$ than $\mathrm{FDR}_\Phi$ and $\mathrm{FDR}_{\Psi_{n-1}}$.
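For readers who want to reproduce the setting, the data-generating step of Model 1 and the per-replication FDP can be sketched as follows. This is a hedged illustration, not the simulation code used in the paper: the names `simulate_model1` and `false_discovery_proportion` are hypothetical, `seed` is an added convenience, and $m_1 = 0.05m$ is rounded to the nearest integer.

```python
import math
import random


def simulate_model1(m, n, seed=0):
    """Draw n samples of X_i = mu_i + (eps_i - E eps_i), eps_i ~ Exp(1), for i < m.

    Returns (data, is_null): data[i] is the list of n observations of
    coordinate i, and is_null[i] is True when mu_i = 0.
    """
    rng = random.Random(seed)
    m1 = round(0.05 * m)                      # number of true alternatives
    sigma = 1.0                               # Var(eps_i) = 1 for Exp(1)
    signal = 2.0 * sigma * math.sqrt(math.log(m) / n)
    data, is_null = [], []
    for i in range(m):
        mu = signal if i < m1 else 0.0
        # E eps_i = 1 for Exp(1), so the noise is centred by subtracting 1.
        data.append([mu + rng.expovariate(1.0) - 1.0 for _ in range(n)])
        is_null.append(i >= m1)
    return data, is_null


def false_discovery_proportion(rejected, is_null):
    """FDP = V / max(R, 1) for a collection of rejected indices."""
    false_rejections = sum(1 for i in rejected if is_null[i])
    return false_rejections / max(len(rejected), 1)
```

Averaging `false_discovery_proportion` over many replications gives the empirical FDR reported in the tables; the calibration being studied enters only through how the rejected set is produced.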
For Models 1, 2-1 and 2-2, the bootstrap method and the proposed regularized bootstrap method with data-driven $\hat\lambda_{ni}$ perform quite similarly. In addition, the data-driven $\hat\lambda_{ni}$ performs much better than the theoretical $\lambda_{ni}$. All four methods perform better as the sample size $n$ grows from 30 to 50, although the empirical $\mathrm{FDR}_\Phi$ and $\mathrm{FDR}_{\Psi_{n-1}}$ still exhibit a serious departure from $\alpha$.

Next, we consider the following two models to compare the performance of the four methods when the distributions are symmetric and heavy tailed.

Model 3. $\varepsilon_i$ is Student's t distribution with 4 degrees of freedom. Let $\mu_i = 2\sqrt{\log m/n}$ for $1 \le i \le m_1$ with $m_1 = 0.1m$ and $\mu_i = 0$ for $m_1 < i \le m$.
Model 4. $\varepsilon_i = \varepsilon_{i1} - \varepsilon_{i2}$, where $\varepsilon_{i1}$ and $\varepsilon_{i2}$ are independent lognormal random variables with parameters $(0, 1)$. Let $\mu_i = 4\sqrt{\log m/n}$ for $1 \le i \le m_1$ with $m_1 = 0.1m$ and $\mu_i = 0$ for $m_1 < i \le m$.

TABLE 1. Comparison of FDR (FDR = $\alpha$, $m = 3000$), for $n = 30$ and $n = 50$ at each level $\alpha$. Panels: Exp(1); Gamma(0.5, 1) with $m_1 = 0.05m$; Gamma(0.5, 1) with $m_1 = 0$. Rows in each panel: $\mathrm{FDR}_\Phi$, $\mathrm{FDR}_{\Psi_{n-1}}$, $\mathrm{FDR}_B$, $\mathrm{FDR}_{RB}$, $\mathrm{FDR}^*_{RB}$.

For these two models, the normal approximation performs the best on the control of FDR; see Tables 3 and 4. $\mathrm{FDR}_B$ is much smaller than $\alpha$, so the bootstrap method is quite conservative. This is mainly due to the heavy tails of the $t(4)$ and lognormal distributions. The regularized bootstrap method works much better than the bootstrap method in controlling the FDR. Table 4 shows that it also has a higher power ($\mathrm{power}_{RB}$) than the bootstrap method ($\mathrm{power}_B$). Hence the proposed regularized bootstrap is more robust than the commonly used bootstrap method.

Finally, we examine the FDP control of the B-H method when $m_1$ is small and the p-values are known. To this end, we consider Model 5, in which the exact null distributions are known.

Model 5. Let $\varepsilon_i$ be i.i.d. $N(0,1)$ random variables. Let $\mu_i = 2\sqrt{\log m/n}$ for $1 \le i \le m_1$ and $\mu_i = 0$ for $m_1 < i \le m$, where $m_1 = 0$, 1 and 5.

In Figure 1, we plot the curve of the tail probability of the FDP based on 5000 replications, that is, $\sum_{i=1}^{5000} I\{\mathrm{FDP}_i \ge t\}/5000$, where $\mathrm{FDP}_i$ is the true FDP in the $i$th replication. From Figure 1, we can see that when $m_1$ is small, the B-H method works unfavorably on FDP control. For example, the empirical probability of FDP > 0.4 is 1 when $m_1 = 0$, 0.35 when $m_1 = 1$ and 0.12 when $m_1 = 5$.
This phenomenon is in accord with Proposition 2.1. In contrast, as indicated by Theorem 2.1, the performance of FDP control improves when $m_1$ increases.

TABLE 2. Comparison of power (FDR = $\alpha$, $m = 3000$), for $n = 30$ and $n = 50$ at each level $\alpha$. Panels: Exp(1); Gamma(0.5, 1) with $m_1 = 0.05m$. Rows in each panel: $\mathrm{power}_\Phi$, $\mathrm{power}_{\Psi_{n-1}}$, $\mathrm{power}_B$, $\mathrm{power}_{RB}$, $\mathrm{power}^*_{RB}$.

TABLE 3. Comparison of FDR (FDR = $\alpha$, $m = 3000$), for $n = 30$ and $n = 50$ at each level $\alpha$. Panels: $t(4)$; Lognormal(0, 1). Rows in each panel: $\mathrm{FDR}_\Phi$, $\mathrm{FDR}_{\Psi_{n-1}}$, $\mathrm{FDR}_B$, $\mathrm{FDR}_{RB}$, $\mathrm{FDR}^*_{RB}$.

6. Proof of main results. We begin the proof by showing a uniform law of large numbers (13), which plays a key role in the proof of the main results. According to Theorem 1.2 in Wang (2005) and equation (2.2) in Shao (1999), we have for
14 2016 W. LIU AND Q.-M. SHAO TABLE 4 Comparison of power FDR = α) n = 30 n = 50 m α t4) 3000 power power power B power RB power RB Lognormal0, 1) 3000 power power power B power RB power RB t on 1/4 ), 12) P T i nμ i /ŝ n t ) = 1 2 Gt) [exp 1 + o1) ), t3 3 n κ i ) t 3 )] + exp 3 n κ i where o1) is uniformly in 1 i m, Gt) = 2 2 t) and κ i = EYi 3. For any b m and b m = om), we first prove that, under C1 )andlogm = on 1/2 ) [orc1)andlogm = On ζ ) for some 0 <ζ <3/23], 13) in probability, where sup 0 t G 1 κ G κ t) = 1 2m 0 Gt) i H 0 b m /m) [ exp t3 i H 0 I{ T i t} m 0 G κ t) 3 n κ i 1 0 ) t 3 ) ] + exp 3 n κ i =: Gt)ˆκ t) and G 1 κ t) = inf{y 0:G κy) = t} for 0 t 1. Note that for 0 t o n), G κ t) is a strictly decreasing and continuous function. Let z 0 <z 1 < < z dm 1andt i = G 1 κ z i), wherez 0 = b m /m, z i = b m /m + bm 2/3 e iδ /m, d m = [{logm b m )/bm 2/3 )} 1/δ ] and 0 <δ<1, which will be specified later. Note that G κ t i )/G κ t i+1 ) = 1 + o1) uniformly in i, andt 0 / 2logm/b m ) = 1 + o1). Then to prove 13), it is enough to show that 14) sup i H 0 I{ T i t j } 1 m 0 G κ t j ) 0 0 j d m
15 LARGE-SCALE t-tests 2017 a) m 1 = 0 b) m 1 = 1 c) m 1 = 5 FIG. 1. Tailed probability of FDP with α = 0.2 and n = 50. The y-axis values are the empirical tailed probabilities 5000 i=1 I{FDP i t}/5000. in probability. Under C1), define and under C1 ), define S j = { i H 0 : r ij log m) 2 θ }, S c j = H 0 S j, S j ={i H 0 : X i is dependent with X j }. We claim that, under C1 )andlogm = on 1/2 ) [orc1)andlogm = On ζ ) for some 0 <ζ <3/23], for any ε>0andsomeγ 1 > 0, 15) I 2 t) := E i H 0 { I{Ti t} P T i t )}) 2 Cm 2 0 G2 κ t) 1 m 0 G κ t) + expr + ε)t2 ) /1 + r)) m 1 ρ + log m) 1 γ 1
16 2018 W. LIU AND Q.-M. SHAO uniformly in t [0,K log m] for all K>0. Take 1 + γ 1 ) 1 <δ<1. Given 15) and G 1 κ b m/m) 2logm/b m ),foranyε>0, we have d m ) i H P 0 I{T i t j } 1 m 0 G κ t j ) ε j=0 d m ) i H P 0 I{T i t j } P T i t j )) m j=0 0 G κ t j ) ε/2 C C dm 1 m 0 G κ t 0 ) + b 1 m + b 2/3 m j=1 d m j=1 1 m 0 G κ t j ) + d mm 1+ρ+2r+2ε)/1+r))+o1) e j δ + o1) ) = o1). + d m log m) 1 γ 1 This proves 14). To prove 15), we need the following lemma, which is proven in the supplementary material Liu and Shao 2014). 16) LEMMA 6.1. i) Suppose that log m = On 1/2 ). For any ε>0, max max P T i t, T j >t ) C exp 1 ε)t 2 /1 + r) ) i S j \j j H 0 uniformly in t [0,on 1/4 )). ii) Suppose that log m = On ζ ) for some 0 <ζ <3/23. We have for any K>0 17) P T i >t, T j >t ) = 1 + A n )P T i >t ) P T j >t ) uniformly in 0 t K log m, j H 0 and i S c j, where A n Clog m) 1 γ 1 for some γ 1 > 0. Set f ij t) = P T i t, T j t) P T i t)p T j t). Note that under C1 ) f ij = 0whenj H 0 \ S i.wehave I 2 t) P Ti t, T j t ) + f ij t) j S i j H 0 \S i i H 0 i H 0 Cm 0 G κ t) + C expr + 2ε)t2 /1 + r)) m 1 ρ m 2 0 G2 κ t) + A nm 2 0 G2 κ t), where the last inequality follows from Lemma 6.1 and G κ t) = Gt)e o1)t2 for t = o n). Thisproves15). )
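Proof of Theorem 2.1 and Corollary 2.1. We only prove the theorem for $\hat p_{i,\Phi}$. The proof for $\hat p_{i,\Psi_{n-1}}$ is exactly the same when $G(t)$ is replaced with $2 - 2\Psi_{n-1}(t)$. By Lemma 1 in Storey, Taylor and Siegmund (2004), we can see that the B-H method with $\hat p_{i,\Phi}$ is equivalent to the following procedure: reject $H_{0i}$ if and only if $\hat p_{i,\Phi} \le \hat t_0$, where

$$\hat t_0 = \sup\left\{0 \le t \le 1 : t \le \frac{\alpha\max\left(\sum_{1\le i\le m} I\{\hat p_{i,\Phi} \le t\}, 1\right)}{m}\right\}.$$

It is equivalent to reject $H_{0i}$ if and only if $|T_i| \ge \hat t_\Phi$, where

$$\hat t_\Phi = \inf\left\{t \ge 0 : 2 - 2\Phi(t) \le \frac{\alpha\max\left(\sum_{1\le i\le m} I\{|T_i| \ge t\}, 1\right)}{m}\right\}.$$

By the continuity of $\Phi(t)$ and the monotonicity of the indicator function, it is easy to see that

$$\frac{mG(\hat t_\Phi)}{\max\left(\sum_{1\le i\le m} I\{|T_i| \ge \hat t_\Phi\}, 1\right)} = \alpha,$$

where $G(t) = 2 - 2\Phi(t)$. Let $\mathcal{M}$ be a subset of $\{1, 2, \ldots, m\}$ satisfying $\mathcal{M} \subset \{i : |\mu_i/\sigma_i| \ge 4\sqrt{\log m/n}\}$ and $\mathrm{Card}(\mathcal{M}) \le \sqrt{n}$. By $\max_{1\le i\le m} EY_i^4 \le K$ and Markov's inequality, for any $\varepsilon > 0$,

$$P\left(\max_{i\in\mathcal{M}} |\hat s_{ni}^2/\sigma_i^2 - 1| \ge \varepsilon\right) = O(1/\sqrt{n}).$$

This, together with (2) and (12), implies that there exist some $c > \sqrt{2}$ and some $b_m \to \infty$ such that

$$P\left(\sum_{i=1}^{m} I\{|T_i| \ge c\sqrt{\log m}\} \ge b_m\right) \to 1. \tag{18}$$

This implies that $P(\hat t_\Phi \le G^{-1}(\alpha b_m/m)) \to 1$. Given (13) and $G_\kappa(t) \ge G(t)$, it follows that $P(\hat t_\Phi \le G_\kappa^{-1}(\alpha b_m/m)) \to 1$. Therefore, by (13),

$$\frac{\sum_{i\in H_0} I\{|T_i| \ge \hat t_\Phi\}}{m_0 G_\kappa(\hat t_\Phi)} \to 1$$

in probability. Note that

$$G(\hat t_\Phi) = \frac{\alpha\hat m}{m} + \frac{\alpha m_0}{m}\cdot\frac{\sum_{i\in H_0} I\{|T_i| \ge \hat t_\Phi\}}{m_0},$$

where $\hat m = \sum_{i\in H_1} I\{|T_i| \ge \hat t_\Phi\}$. With probability tending to one,

$$G(\hat t_\Phi) = \frac{\alpha\hat m}{m} + \frac{\alpha m_0}{m} G(\hat t_\Phi)\hat\kappa_\Phi(1 + o(1)) \ge \frac{\alpha m_0}{m} G(\hat t_\Phi)\hat\kappa_\Phi(1 + o(1)). \tag{19}$$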
18 2020 W. LIU AND Q.-M. SHAO Thus Pˆκ m/αm 0 ) + ε) 1foranyε>0. Let ˆκ =ˆκ I{ˆκ 2α1 γ)) 1 }. Note that m/αm 0 ) + ε 2α1 γ)) 1.Wehave FDP i H m 0 /m)αˆκ = 0 I{ T i ˆt} ˆκ ) 1 + o1) 1 m 0 G κ ˆt) ˆκ in probability. Then for any ε>0, and FDR 1 + ε) m 0 m αeˆκ + P FDP 1 + ε) m ) 0 m α ˆκ FDR 1 ε) m 0 m αeˆκ 2 α1 γ) ) 1 P FDP 1 ε) m ) 0 m α ˆκ. This proves Theorem 2.1. Corollary 2.11) follows directly from Theorem 2.1 and Pˆt 2logm) 1. αm To prove Corollary 2.12), we first assume that 0 m ˆκ 1 η for some 1 η)/α > 1. So, by 19) and the condition m 1 = expon 1/3 )), with probability tending to one, Gˆt) 2αη 1 ˆm/m 2αη 1 m 1+o1). Hence, ˆt c log m for any c< 2. Recall that τ = lim m m 1 0 i H 0 EYi 3 > 0. Set H 01 = { i H 0 : EY 3 i τ/8 }. According to the definition of τ and EYi 3 EY i 4)3/4 b 3/4 0, m 1 0 Hc 01 τ/8 + b 3/4 0 m 1 0 H 01 τ/2. This implies that H 01 τb 3/4 0 m 0 /4. Hence, we can get m 1 0 i H 0 EYi 3 2 c τ for some c τ > 0. It follows from Taylor s expansion of the exponential function and ˆt c log m that ˆκ 1 + ɛ for some ɛ>0. However, if αm 0 m ˆκ > 1 η,then ˆκ 1 + ɛ for some ɛ>0. This yields that Pˆκ 1 + ɛ) 1forsomeɛ>0. So we have κ 1 + ɛ for some ɛ>0. Note that m 0 /m 1. We prove Corollary 2.12). We next prove Corollary 2.13). By the inequality e x + e x x, Pˆκ m/αm 0 ) + ε) 1, we obtain that i H 0 ˆt 3 / n) EYi 3 m/αm 0 ) + ε 2m 0 with probability tending to one. By τ>0, we have Pˆt cn 1/6 ) 1 for some constant c>0. Thus PGˆt) exp 2cn 1/3 ) 1. Because ˆm/m exp Mn 1/3 ) for any M>0, and given 19), we have αm 0 m ˆκ 1 in probability. Hence, κ 1/α as m 0 /m 1. The proof is finished.
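The equivalence used at the start of this proof, between the B-H step-up rule on the normal-calibrated p-values and the rule "reject $H_{0i}$ if and only if $|T_i| \ge \hat t$", can be checked numerically. The following is only a minimal sketch: the function names are hypothetical, the t-statistics are made-up illustrative numbers, and $\hat t$ is computed through the identity $m G(\hat t)/\max(\#\{i : |T_i| \ge \hat t\}, 1) = \alpha$ rather than by a direct search over $t$.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def G(t):
    """Two-sided normal tail G(t) = 2 - 2*Phi(t), written via erfc for accuracy."""
    return math.erfc(t / math.sqrt(2.0))


def G_inv(q):
    """Inverse of G on (0, 1): the t >= 0 with G(t) = q."""
    return -_STD_NORMAL.inv_cdf(q / 2.0)


def bh_num_rejections(t_stats, alpha):
    """Number k of B-H rejections: the largest j with p_(j) <= alpha * j / m."""
    m = len(t_stats)
    pvals = sorted(G(abs(t)) for t in t_stats)
    k = 0
    for j, p in enumerate(pvals, start=1):
        if p <= alpha * j / m:
            k = j
    return k


def reject_by_step_up(t_stats, alpha):
    """Indices rejected by the usual step-up rule on the p-values."""
    k = bh_num_rejections(t_stats, alpha)
    order = sorted(range(len(t_stats)), key=lambda i: G(abs(t_stats[i])))
    return sorted(order[:k])


def t_hat(t_stats, alpha):
    """Threshold on |T_i|, assuming G(t_hat) = alpha * max(k, 1) / m."""
    m = len(t_stats)
    k = bh_num_rejections(t_stats, alpha)
    return G_inv(alpha * max(k, 1) / m)


def reject_by_threshold(t_stats, alpha):
    """Indices i with |T_i| >= t_hat."""
    cut = t_hat(t_stats, alpha)
    return [i for i, t in enumerate(t_stats) if abs(t) >= cut]


if __name__ == "__main__":
    # Illustrative t-statistics only; not data from the paper.
    ts = [4.5, -3.9, 0.3, -1.2, 2.1, 0.8, -0.1, 3.6, 1.0, -0.5]
    print(reject_by_step_up(ts, 0.1))
    print(reject_by_threshold(ts, 0.1))
```

Both functions should select the same hypotheses; the threshold form is the one the proof works with because it turns the data-dependent cutoff into a statement about the single random variable $\hat t$.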
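Proof of Theorems 2.2 and 4.2. Let $\hat\kappa_i = \frac{1}{n \hat s_{ni}^3} \sum_{k=1}^{n} (X_{ki} - \bar X_i)^3$. Define the event
\[ F = \Bigl\{ \max_{1 \le i \le m} \frac{1}{n \hat s_{ni}^4} \sum_{k=1}^{n} (X_{ki} - \bar X_i)^4 \le K_1, \ \max_{1 \le i \le m} |\hat\kappa_i - \kappa_i| \le K_2 \sqrt{\log m/n} \Bigr\} \]
for some large $K_1 > 0$ and $K_2 > 0$. We first suppose that $P(F) \to 1$. Let $G_i^*(t) = P^*(|T_{ki}^*| \ge t)$ be the conditional distribution of $T_{ki}^*$ given $\mathcal{X} = \{X_1, \ldots, X_m\}$. Note that, given $\mathcal{X}$ and on the event $F$,
\begin{align*}
G_i^*(t) &= \frac{1}{2} G(t) \Bigl[ \exp\Bigl( \frac{t^3}{3\sqrt{n}} \hat\kappa_i \Bigr) + \exp\Bigl( -\frac{t^3}{3\sqrt{n}} \hat\kappa_i \Bigr) \Bigr] (1 + o(1)) \\
&= \frac{1}{2} G(t) \Bigl[ \exp\Bigl( \frac{t^3}{3\sqrt{n}} \kappa_i \Bigr) + \exp\Bigl( -\frac{t^3}{3\sqrt{n}} \kappa_i \Bigr) \Bigr] (1 + o(1))
\end{align*}
uniformly in $0 \le t \le o(n^{1/4})$. Hence, given $\mathcal{X}$ and on the event $F$,
\[ \frac{G_i^*(t)}{P(|T_i - \sqrt{n} \mu_i/\hat s_{ni}| \ge t)} = 1 + o(1) \tag{20} \]
uniformly in $1 \le i \le m$ and $0 \le t \le o(n^{1/4})$. Put
\[ \hat G_\kappa(t) = \frac{1}{2m} G(t) \sum_{1 \le i \le m} \Bigl[ \exp\Bigl( \frac{t^3}{3\sqrt{n}} \kappa_i \Bigr) + \exp\Bigl( -\frac{t^3}{3\sqrt{n}} \kappa_i \Bigr) \Bigr]. \]
Set $\hat c_m = \hat G_\kappa^{-1}(b_m/m)$. Note that, given $\mathcal{X}$, $T_{ki}^*$, $1 \le k \le N$, $1 \le i \le m$, are independent. Hence, as in (13), we can show that for any $b_m \to \infty$,
\[ \sup_{0 \le t \le \hat c_m} \Bigl| \frac{G_{N,m}^*(t)}{\hat G_\kappa(t)} - 1 \Bigr| \to 0 \tag{21} \]
in probability. For $t = O(\sqrt{\log m})$, under the conditions of Theorem 3.2, we have $\hat G_\kappa(t)/G_\kappa(t) = 1 + o(1)$. So, it is easy to see that (13) still holds when $G_\kappa^{-1}(b_m/m)$ is replaced by $\hat G_\kappa^{-1}(b_m/m)$. This implies that for any $b_m \to \infty$,
\[ \sup_{0 \le t \le \hat c_m} \Bigl| \frac{\sum_{i \in H_0} I\{|T_i| \ge t\}}{m_0 G_{N,m}^*(t)} - 1 \Bigr| \to 0 \tag{22} \]
in probability. Let
\[ \hat t_0 = \sup\Bigl\{ 0 \le t \le 1 : t \le \frac{\alpha \max\bigl( \sum_{1 \le i \le m} I\{\hat p_{i,B} \le t\}, 1 \bigr)}{m} \Bigr\}. \]
Then we have
\[ \hat t_0 = \frac{\alpha \max\bigl( \sum_{1 \le i \le m} I\{\hat p_{i,B} \le \hat t_0\}, 1 \bigr)}{m}. \]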
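To make the objects in this proof concrete, the sketch below computes pooled bootstrap p-values of the kind $\hat p_{i,B}$. It rests on assumptions that are not restated in this section: that $T_{ki}^*$ is the t-statistic of a resample drawn with replacement from the centered $i$-th sample, that $G_{N,m}^*(t)$ is the fraction of the $Nm$ values $|T_{ki}^*|$ that are at least $t$, that $\hat p_{i,B} = G_{N,m}^*(|T_i|)$, and that the variance estimate uses the divisor $n-1$. All function names are hypothetical.

```python
import bisect
import math
import random


def t_statistic(sample):
    """Student's t statistic sqrt(n) * mean / sd for testing a zero mean."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((v - mean) ** 2 for v in sample) / (n - 1)  # divisor n-1 is an assumption
    if var == 0.0:
        # Degenerate resample (all values equal): avoid dividing by zero.
        return 0.0 if mean == 0.0 else math.copysign(math.inf, mean)
    return math.sqrt(n) * mean / math.sqrt(var)


def pooled_bootstrap_pvalues(data, n_boot, rng):
    """Pooled bootstrap p-values for m samples (data is a list of m lists).

    Every sample is centered to impose the null, resampled n_boot times, and
    the absolute bootstrap t-statistics of all samples are pooled.
    """
    pooled = []
    for sample in data:
        n = len(sample)
        mean = sum(sample) / n
        centered = [v - mean for v in sample]
        for _ in range(n_boot):
            resample = [centered[rng.randrange(n)] for _ in range(n)]
            pooled.append(abs(t_statistic(resample)))
    pooled.sort()
    total = len(pooled)

    pvalues = []
    for sample in data:
        t_obs = abs(t_statistic(sample))
        # Number of pooled bootstrap values that are >= the observed |T_i|.
        n_ge = total - bisect.bisect_left(pooled, t_obs)
        pvalues.append(n_ge / total)
    return pvalues


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated N(0, 1) data for illustration only.
    data = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(5)]
    print(pooled_bootstrap_pvalues(data, n_boot=200, rng=rng))
```

Pooling across coordinates is what allows a moderate number $N$ of resamples to resolve tail probabilities of order $1/m$; a per-coordinate bootstrap would need far more resamples for the same resolution.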
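According to (12) and (20) we have, given $\mathcal{X}$ and on the event $F$, $G_i^*(c\sqrt{\log m}) = m^{-c^2/2+o(1)}$ for any $c > \sqrt{2}$ uniformly in $i$. So, by Markov's inequality, for any $\varepsilon > 0$, we have $P(G_{N,m}^*(c\sqrt{\log m}) \le m^{-c^2/2+\varepsilon}) \to 1$. By (2) and (18), we have $P(\hat t_0 \ge \alpha b_m/m) \to 1$ for some $b_m \to \infty$. It follows from (22) that
\[ \frac{\sum_{i \in H_0} I\{\hat p_{i,B} \le \hat t_0\}}{m_0 \hat t_0} \to 1 \]
in probability. This finishes the proof of Theorem 2.2(1), (2) and Theorem 4.2 if we can show that $P(F) \to 1$.

Without loss of generality, we can assume that $\mu_i = 0$ and $\sigma_i = 1$. We first show that for some constant $K_1 > 0$,
\[ P\Bigl( \max_{1 \le i \le m} \Bigl| \sum_{k=1}^{n} (X_{ki}^4 - E X_{ki}^4) \Bigr| \ge K_1 n \Bigr) = o(1). \tag{23} \]
For $1 \le i \le m$ and $1 \le k \le n$, put
\[ \hat X_{ki} = X_{ki} I\{ |X_{ki}| \le \sqrt{n/\log m} \}, \qquad \tilde X_{ki} = X_{ki} - \hat X_{ki}. \]
Then, for large $n$,
\begin{align*}
P\Bigl( \max_{1 \le i \le m} \Bigl| \sum_{k=1}^{n} (\tilde X_{ki}^4 - E \tilde X_{ki}^4) \Bigr| \ge K_1 n/2 \Bigr)
&\le n m \max_{1 \le i \le m} P\bigl( |X_{1i}| \ge \sqrt{n/\log m} \bigr) \\
&\le C \exp(\log m + \log n - t n/\log m) = o(1).
\end{align*}
Let $Z_{ki} = \hat X_{ki}^4 - E \hat X_{ki}^4$. By the inequality $|e^s - 1 - s| \le s^2 e^{\max(s,0)}$ and $1 + s \le e^s$, we have for $\eta = 2^{-1} t (\log m)/n$ and some large $K_1$,
\begin{align*}
P\Bigl( \max_{1 \le i \le m} \Bigl| \sum_{k=1}^{n} Z_{ki} \Bigr| \ge K_1 n/2 \Bigr)
&\le \sum_{i=1}^{m} P\Bigl( \sum_{k=1}^{n} Z_{ki} \ge K_1 n/2 \Bigr) + \sum_{i=1}^{m} P\Bigl( -\sum_{k=1}^{n} Z_{ki} \ge K_1 n/2 \Bigr) \\
&\le \sum_{i=1}^{m} \exp(-\eta K_1 n/2) \Bigl[ \prod_{k=1}^{n} E \exp(\eta Z_{ki}) + \prod_{k=1}^{n} E \exp(-\eta Z_{ki}) \Bigr] \\
&\le 2 \sum_{i=1}^{m} \exp\bigl( -\eta K_1 n/2 + \eta^2 n E Z_{1i}^2 e^{\eta |Z_{1i}|} \bigr) \\
&\le C \exp\bigl( \log m - t K_1 (\log m)/4 \bigr) = o(1).
\end{align*}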
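This proves (23). By replacing $X_{ki}^4$, $\eta = 2^{-1} t (\log m)/n$ and $K_1 n/2$ with $X_{ki}^3$, $\eta = 2^{-1} t \sqrt{(\log m)/n}$ and $K_1 \sqrt{n \log m}/2$, respectively, in the above proof, we can show that
\[ P\Bigl( \max_{1 \le i \le m} \Bigl| \frac{1}{n} \sum_{k=1}^{n} (X_{ki}^3 - E X_{ki}^3) \Bigr| \ge K_1 \sqrt{(\log m)/n} \Bigr) = o(1). \tag{24} \]
Similarly, we have
\[ P\Bigl( \max_{1 \le i \le m} \Bigl| \frac{1}{n} \sum_{k=1}^{n} (X_{ki}^2 - E X_{ki}^2) \Bigr| \ge K_1 \sqrt{(\log m)/n} \Bigr) = o(1) \tag{25} \]
and
\[ P\Bigl( \max_{1 \le i \le m} \Bigl| \frac{1}{n} \sum_{k=1}^{n} (X_{ki} - E X_{ki}) \Bigr| \ge K_1 \sqrt{(\log m)/n} \Bigr) = o(1). \tag{26} \]
Combining (23)-(26), we prove that $P(F) \to 1$.

Proof of Theorems 3.1 and 4.3. Let
\[ \hat F = \Bigl\{ \max_{1 \le i \le m} \frac{1}{n \hat\sigma_i^4} \sum_{k=1}^{n} (\hat X_{ki} - \bar{\hat X}_i)^4 \le K_1, \ \max_{1 \le i \le m} |\hat\kappa_i(\lambda_{ni}) - \kappa_i| \le K_2 \sqrt{\log m/n} \Bigr\}. \]
By the proof of Theorems 2.2 and 4.2, it is enough to show that $P(\hat F) \to 1$. Recall that $\hat X_{ki} = X_{ki} I\{|X_{ki}| \le \lambda_{ni}\}$ and put $Z_{ki} = \hat X_{ki}^4 - E \hat X_{ki}^4$. Take $\eta = (\log m)/n$. We have
\begin{align*}
P\Bigl( \max_{1 \le i \le m} \Bigl| \sum_{k=1}^{n} Z_{ki} \Bigr| \ge K_1 n/2 \Bigr)
&\le 2 \sum_{i=1}^{m} \exp\bigl( -\eta K_1 n/2 + \eta^2 n E Z_{1i}^2 e^{\eta |Z_{1i}|} \bigr) \\
&\le C \exp\bigl( 2\log m - K_1 (\log m)/4 \bigr) = o(1).
\end{align*}
Similarly, by replacing $\hat X_{ki}^4$, $\eta = (\log m)/n$ and $K_1 n/2$ with $\hat X_{ki}^3$, $\eta = \sqrt{(\log m)/n}$ and $K_1 \sqrt{n \log m}/2$, respectively, in the above proof, we can show that
\[ P\Bigl( \max_{1 \le i \le m} \Bigl| \frac{1}{n} \sum_{k=1}^{n} (\hat X_{ki}^3 - E \hat X_{ki}^3) \Bigr| \ge K_1 \sqrt{(\log m)/n} \Bigr) = o(1). \]
Also, using the above arguments, it is easy to show that
\[ P\Bigl( \max_{1 \le i \le m} \Bigl| \frac{1}{n} \sum_{k=1}^{n} (\hat X_{ki}^2 - E \hat X_{ki}^2) \Bigr| \ge K_1 \sqrt{(\log m)/n} \Bigr) = o(1) \]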
and
\[ P\Bigl( \max_{1 \le i \le m} \Bigl| \frac{1}{n} \sum_{k=1}^{n} (\hat X_{ki} - E \hat X_{ki}) \Bigr| \ge K_1 \sqrt{(\log m)/n} \Bigr) = o(1). \]
Note that
\[ \max_{1 \le i \le m} E |X_{1i}|^3 I\{ |X_{1i}| \ge \lambda_{ni} \} \le C \sqrt{\frac{\log m}{n}} \max_{1 \le i \le m} E X_{1i}^6 \]
and
\[ \max_{1 \le i \le m} E |X_{1i}|^2 I\{ |X_{1i}| \ge \lambda_{ni} \} \le C \Bigl( \frac{\log m}{n} \Bigr)^{2/3} \max_{1 \le i \le m} E X_{1i}^6. \]
This proves that $P(\hat F) \to 1$.

Proof of Theorem 4.1. Recall that
\[ \frac{m G(\hat t)}{\max\bigl( \sum_{1 \le i \le m} I\{|T_i| \ge \hat t\}, 1 \bigr)} = \alpha. \]
From (18), we have $P(\hat t \le G^{-1}(\alpha b_m/m)) \to 1$. The theorem follows from (13) and the fact that $G_\kappa(t)/G(t) = 1 + o(1)$ uniformly in $t \in [0, o(n^{1/6}))$.

Proof of Propositions 2.1, 2.2, 2.3 and 3.1. To save space, the proof of these propositions is given in the supplementary material [Liu and Shao (2014)].

Acknowledgments. The authors would like to thank the Associate Editor and two referees for their valuable comments, which have helped to improve the quality and presentation of this paper.

SUPPLEMENTARY MATERIAL

Supplement to "Phase transition and regularized bootstrap in large-scale t-tests with false discovery rate control" (DOI: /14-AOS1249SUPP; .pdf). The supplementary material includes part of the numerical results and the proofs of Lemma 6.1 and Propositions 2.1, 2.2, 2.3 and 3.1.

REFERENCES

BENJAMINI, Y. and HOCHBERG, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Stat. Methodol.
BENJAMINI, Y. and YEKUTIELI, D. (2001). The control of the false discovery rate in multiple testing under dependency. Ann. Statist.
CAO, H. and KOSOROK, M. R. (2011). Simultaneous critical values for t-tests in very high dimensions. Bernoulli.
DELAIGLE, A., HALL, P. and JIN, J. (2011). Robustness and accuracy of methods for high dimensional data analysis based on Student's t-statistic. J. R. Stat. Soc. Ser. B Stat. Methodol.
EFRON, B. (2004). Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. J. Amer. Statist. Assoc.
FAN, J., HALL, P. and YAO, Q. (2007). To how many simultaneous hypothesis tests can normal, Student's t or bootstrap calibration be applied? J. Amer. Statist. Assoc.
FERREIRA, J. A. and ZWINDERMAN, A. H. (2006). On the Benjamini-Hochberg method. Ann. Statist.
LIU, W. and SHAO, Q. (2014). Supplement to "Phase transition and regularized bootstrap in large-scale t-tests with false discovery rate control." DOI: /14-AOS1249SUPP.
SHAO, Q.-M. (1999). A Cramér type large deviation result for Student's t-statistic. J. Theoret. Probab.
STOREY, J. D. (2003). The positive false discovery rate: A Bayesian interpretation and the q-value. Ann. Statist.
STOREY, J. D., TAYLOR, J. E. and SIEGMUND, D. (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: A unified approach. J. R. Stat. Soc. Ser. B Stat. Methodol.
WANG, Q. (2005). Limit theorems for self-normalized large deviation. Electron. J. Probab. (electronic).
WU, W. B. (2008). On false discovery control under dependence. Ann. Statist.

DEPARTMENT OF MATHEMATICS AND INSTITUTE OF NATURAL SCIENCES
SHANGHAI JIAO TONG UNIVERSITY
SHANGHAI
CHINA
weidongl@sjtu.edu.cn

DEPARTMENT OF STATISTICS
THE CHINESE UNIVERSITY OF HONG KONG
SHATIN, N.T., HONG KONG
CHINA
New Approaches to False Discovery Control Christopher R. Genovese Department of Statistics Carnegie Mellon University http://www.stat.cmu.edu/ ~ genovese/ Larry Wasserman Department of Statistics Carnegie
More informationResearch Article Sample Size Calculation for Controlling False Discovery Proportion
Probability and Statistics Volume 2012, Article ID 817948, 13 pages doi:10.1155/2012/817948 Research Article Sample Size Calculation for Controlling False Discovery Proportion Shulian Shang, 1 Qianhe Zhou,
More informationMultiple Change-Point Detection and Analysis of Chromosome Copy Number Variations
Multiple Change-Point Detection and Analysis of Chromosome Copy Number Variations Yale School of Public Health Joint work with Ning Hao, Yue S. Niu presented @Tsinghua University Outline 1 The Problem
More informationFDR-CONTROLLING STEPWISE PROCEDURES AND THEIR FALSE NEGATIVES RATES
FDR-CONTROLLING STEPWISE PROCEDURES AND THEIR FALSE NEGATIVES RATES Sanat K. Sarkar a a Department of Statistics, Temple University, Speakman Hall (006-00), Philadelphia, PA 19122, USA Abstract The concept
More informationAsymptotic inference for a nonstationary double ar(1) model
Asymptotic inference for a nonstationary double ar() model By SHIQING LING and DONG LI Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong maling@ust.hk malidong@ust.hk
More informationsimple if it completely specifies the density of x
3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued
Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research
More informationA CENTRAL LIMIT THEOREM FOR NESTED OR SLICED LATIN HYPERCUBE DESIGNS
Statistica Sinica 26 (2016), 1117-1128 doi:http://dx.doi.org/10.5705/ss.202015.0240 A CENTRAL LIMIT THEOREM FOR NESTED OR SLICED LATIN HYPERCUBE DESIGNS Xu He and Peter Z. G. Qian Chinese Academy of Sciences
More informationA New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables
A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of
More informationJournal Club: Higher Criticism
Journal Club: Higher Criticism David Donoho (2002): Higher Criticism for Heterogeneous Mixtures, Technical Report No. 2002-12, Dept. of Statistics, Stanford University. Introduction John Tukey (1976):
More informationLecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2
Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2 Fall, 2013 Page 1 Random Variable and Probability Distribution Discrete random variable Y : Finite possible values {y
More informationSOME CONVERSE LIMIT THEOREMS FOR EXCHANGEABLE BOOTSTRAPS
SOME CONVERSE LIMIT THEOREMS OR EXCHANGEABLE BOOTSTRAPS Jon A. Wellner University of Washington The bootstrap Glivenko-Cantelli and bootstrap Donsker theorems of Giné and Zinn (990) contain both necessary
More information