Multiple Testing in a Two-Stage Adaptive Design with Combination Tests Controlling FDR


Sanat K. Sarkar, Jingjing Chen and Wenge Guo
March 13, 2012

Sanat K. Sarkar is Cyrus H. K. Curtis Professor, Department of Statistics, Temple University, Philadelphia, PA 19122 (e-mail: sanat@temple.edu). Jingjing Chen is a PhD candidate, Department of Statistics, Temple University, Philadelphia, PA 19122 (e-mail: jjchen@temple.edu). Wenge Guo is Assistant Professor, Department of Mathematical Sciences, New Jersey Institute of Technology, Newark, NJ 07102 (e-mail: wenge.guo@gmail.com). The research of Sarkar and Guo was supported by NSF Grants DMS-1006344 and DMS-1006021, respectively.

ABSTRACT

In many scientific studies requiring simultaneous testing of multiple null hypotheses, it is often necessary to carry out the multiple testing in two stages: to decide which of the hypotheses can be rejected or accepted at the first stage, and which should be followed up for further testing after combining their p-values from both stages. Unfortunately, no multiple testing procedure is yet available that performs this task by meeting pre-specified boundaries on the first-stage p-values in terms of the false discovery rate (FDR) while maintaining control over the overall FDR at a desired level. We present in this article two procedures, extending the classical Benjamini-Hochberg (BH) procedure and its adaptive version incorporating an estimate of the number of true null hypotheses, from a single-stage to a two-stage setting. These procedures are theoretically proved to control the overall FDR when the pairs of first- and second-stage p-values are independent and those corresponding to the null hypotheses are identically distributed as a pair $(p_1, p_2)$ satisfying the p-clud property of Brannath, Posch and Bauer (2002, Journal of the American Statistical Association, 97, 236-244). We consider two types of combination function, Fisher's and Simes', and present explicit formulas involving these functions towards carrying out the proposed procedures based on pre-determined critical values or through estimated FDRs. Our simulations indicate that the proposed procedures can have significant power improvements over the BH procedure based on the first-stage data, relative to the improvement offered by the ideal BH procedure that one would have used had the second-stage data been available for all the hypotheses, at least under independence, and that they can continue to control the FDR under some dependence situations. The proposed procedures are illustrated through a real gene expression data set.

Keywords: Combination test; early rejection and acceptance boundaries; false discovery rate; multiple testing; stepwise multiple testing procedure; two-stage adaptive design.

1 INTRODUCTION

Gene association or expression studies that usually involve a large number of endpoints (i.e., genetic markers) are often quite expensive. Such studies conducted in a multi-stage adaptive design setting can be cost effective and efficient, since genes are screened in early stages and selected genes are further investigated in later stages using additional observations. Multiplicity in simultaneous testing of hypotheses associated with the endpoints in a multi-stage adaptive design is an important issue, as in a single-stage design. For addressing the multiplicity concern, controlling the familywise error rate (FWER), the probability of at least one type I error among all hypotheses, is a commonly applied concept. However, these studies are often exploratory, so controlling the false discovery rate (FDR), which is the expected proportion of type I errors among all rejected hypotheses, is more appropriate than controlling the FWER (Weller et al. 1998; Benjamini and Hochberg 1995; and Storey and Tibshirani 2003).
Moreover, with a large number of hypotheses typically being tested in these studies, better power can be achieved by a multiple testing method under the FDR framework than under the more conservative FWER framework.

Adaptive designs with multiple endpoints have been considered in the literature under both the FWER and FDR frameworks. Miller et al. (2001) suggested using a two-stage design in gene experiments, and proposed using the Bonferroni method to control the FWER in testing the hypotheses selected at the first stage, although only the second-stage observations are used for this method. This was later improved by Satagopan and Elston (2003) by incorporating the first-stage data through group sequential schemes in the final Bonferroni test. Zehetmayer et al. (2005) considered a two-stage adaptive design where promising hypotheses are selected at the first stage using a constant rejection threshold for each p-value, and an estimation based approach to controlling the FDR asymptotically (as the number of hypotheses goes to infinity) was taken (Storey 2002; Storey, Taylor and Siegmund 2004) at the second stage to test the selected hypotheses using more observations. Zehetmayer et al. (2008) extended this work from two-stage to multi-stage adaptive designs under both the FDR and FWER frameworks, and provided useful insights into the power performance of optimized multi-stage adaptive designs with respect to the number of stages, and into the power difference between an optimized integrated design and an optimized pilot design. Posch et al. (2009) showed that a data-dependent sample size increase for all the hypotheses simultaneously in a multi-stage adaptive design has no effect on the asymptotic (as the number of hypotheses goes to infinity) control of the FDR if the hypotheses to be rejected are determined only by the test at the final interim analysis, under all scenarios except the global null hypothesis, when all the null hypotheses are true. Construction of methods with FWER or FDR control in the setting of a two-stage adaptive design allowing a reduction in the number of tested hypotheses at the interim analysis has been discussed, as a separate issue from sample size adaptations, in Bauer and Kieser (1999) and Kieser, Bauer and Lehmacher (1999), who presented methods with FWER control, and in Victor and Hommel (2007), who focused on controlling the FDR in terms of generalized global p-values.
We revisit this issue in the present paper, but focus primarily on FDR control in a non-asymptotic setting (with the number of hypotheses not being infinitely large). Our motivation behind this paper lies in the fact that the theory presented so far (see, for instance, Victor and Hommel 2007) towards developing an FDR controlling procedure in the setting of a two-stage adaptive design with combination tests does not seem to be as simple as one would hope for. Moreover, it does not allow setting boundaries on the first-stage p-values in terms of the FDR, nor does it operate in a manner that would be a natural extension of standard single-stage FDR controlling methods, like the BH method (Benjamini and Hochberg 1995) or methods related to it, from a single-stage to a two-stage design setting. So, we consider the following to be our main problem in this paper: to construct an FDR controlling procedure for simultaneous testing of the null hypotheses associated with multiple endpoints in the following two-stage adaptive design setting. The hypotheses are sequentially screened at the first stage as rejected or accepted based on pre-specified boundaries on their p-values in terms of the FDR, and those that are left out at the first stage are again sequentially tested at the second stage, after determining their second-stage p-values from additional observations and combining the p-values from the two stages through a combination function.

We propose two FDR controlling procedures: one extends the original single-stage BH procedure, which we call the BH-TSADC Procedure (BH type procedure for two-stage adaptive design with combination tests), and the other extends an adaptive version of the single-stage BH procedure incorporating an estimate of the number of true null hypotheses, which we call the Plug-In BH-TSADC Procedure, from a single-stage to a two-stage setting. Let $(p_{1i}, p_{2i})$ be the pair of first- and second-stage p-values corresponding to the $i$th null hypothesis. We provide a theoretical proof of the FDR control of the proposed procedures under the assumption that the $(p_{1i}, p_{2i})$'s are independent and those corresponding to the true null hypotheses are identically distributed as $(p_1, p_2)$ satisfying the p-clud property (Brannath et al. 2002), together with a standard assumption on the combination function.
We consider two special types of combination function, Fisher's and Simes', which are often used in multiple testing applications, and present explicit formulas for probabilities involving them that are useful for carrying out the proposed procedures at the second stage, either using critical values that can be determined before observing the p-values or based on estimated FDRs that can be obtained after observing the p-values. We carried out extensive simulations to investigate how well our proposed procedures perform in terms of FDR control and power under independence with respect to the number of true null hypotheses and the selection of early stopping boundaries. Simulations were also performed to evaluate whether or not the proposed procedures can continue to control the FDR under different types of (positive) dependence among the underlying test statistics, such as equal, clumpy and auto-regressive of order one [AR(1)] dependence.

Since the main motivation behind proposing our methods is to improve the usual FDR controlling BH method based only on the first-stage data by a suitable modification of it in the present two-stage setting, while still controlling the FDR, it is natural to measure the power performance of each of our proposed methods against that of this first-stage BH method. Of course, our methods will obviously be more powerful since they utilize more data, so it would not be fair to assess this improvement merely by looking at the power performance; it should instead be measured against the power of the so-called best-case-scenario BH method, which is the BH method one would have used had the second-stage data been available for all the endpoints. It is also important that the cost saving our procedures can potentially offer, relative to the maximum possible cost incurred by this ideal BH method, be taken into account while assessing this improvement. Gauging the simulated power improvements offered by our procedures over the first-stage BH method against that offered by the ideal BH method, we notice that with equal sample size allocation between the two stages, our procedures based on Fisher's combination function do much better, at least under independence, than those based on Simes' combination function. In terms of power, our procedures based on Fisher's combination function are closer to the ideal BH method than to the first-stage BH method, whereas those based on Simes' combination function lie in the middle between the first-stage and the ideal BH methods. Between our two procedures, whether based on Fisher's or Simes' combination function, the BH-TSADC seems to be the better choice in terms of controlling the FDR and improving power over the single-stage BH procedure when the proportion of true nulls is large.
If this proportion is not large, the Plug-In BH-TSADC procedure is better, but it might lose FDR control when the p-values exhibit equal or AR(1) type dependence with large equal- or auto-correlation. In terms of cost, our simulations indicate that both our procedures can provide significantly large savings. With 90% true nulls and half of the total sample size allocated to the first stage, our procedures can offer a 44% saving from the maximum cost incurred by using the BH method based on the full data from both stages. This saving gets larger with an increasing proportion of true nulls or a decreasing proportion of the sample size allocated to the first stage.

We applied our proposed two-stage procedures to reanalyze the data on multiple myeloma considered before by Zehetmayer et al. (2008), albeit for a different purpose. The data consist of a set of 12625 gene expression measurements for each of 36 patients with bone lytic lesions and 36 patients in a control group without such lesions. We considered these data in a two-stage framework, with the first 18 subjects per group for Stage 1 and the next 18 per group for Stage 2. With some pre-chosen early rejection and acceptance boundaries, these procedures produce significantly more discoveries than the first-stage BH procedure, relative to the additional discoveries made by the ideal BH procedure based on the full data from both stages.

The article is organized as follows. We review some basic results on FDR control in a single-stage design in Section 2, present our proposed procedures in Section 3, discuss the results of simulation studies in Section 4, and illustrate the real data application in Section 5. We make some concluding remarks in Section 6 and give proofs of our main theorem and propositions in the Appendix.

2 CONTROLLING THE FDR IN A SINGLE-STAGE DESIGN

Suppose that there are $m$ endpoints and the corresponding null hypotheses $H_i$, $i = 1, \ldots, m$, are to be simultaneously tested based on their respective p-values $p_i$, $i = 1, \ldots, m$, obtained in a single-stage design. The FDR of a multiple testing method that rejects $R$ and falsely rejects $V$ null hypotheses is $E(\mathrm{FDP})$, where $\mathrm{FDP} = V/\max\{R, 1\}$ is the false discovery proportion. Multiple testing is often carried out using a stepwise procedure defined in terms of $p_{(1)} \le \cdots \le p_{(m)}$, the ordered p-values. With $H_{(i)}$ the null hypothesis corresponding to $p_{(i)}$, a stepup procedure with critical values $\gamma_1 \le \cdots \le \gamma_m$ rejects $H_{(i)}$ for all $i \le k = \max\{j : p_{(j)} \le \gamma_j\}$, provided the maximum exists; otherwise, it accepts all null hypotheses. A stepdown procedure, on the other hand, with these same critical values rejects $H_{(i)}$ for all $i \le k = \max\{j : p_{(i)} \le \gamma_i \text{ for all } i \le j\}$, provided the maximum exists; otherwise, it accepts all null hypotheses.
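The two stepwise rules above can be sketched in a few lines of Python. This is a minimal illustration, not code from the article; the function names `stepup`, `stepdown` and `bh_critical_values` are hypothetical, and ties among p-values are handled simply by sorting. Each function returns $k$, and the procedure rejects $H_{(i)}$ for all $i \le k$; with $\gamma_i = i\alpha/m$ the stepup rule is the BH method.

```python
from typing import List, Sequence


def stepup(pvals: Sequence[float], crit: Sequence[float]) -> int:
    """Stepup: k = max{j : p_(j) <= gamma_j}, or 0 if no such j."""
    k = 0
    for j, (p, g) in enumerate(zip(sorted(pvals), crit), start=1):
        if p <= g:
            k = j  # keep the largest index satisfying the condition
    return k


def stepdown(pvals: Sequence[float], crit: Sequence[float]) -> int:
    """Stepdown: k = max{j : p_(i) <= gamma_i for all i <= j}, or 0."""
    k = 0
    for j, (p, g) in enumerate(zip(sorted(pvals), crit), start=1):
        if p > g:
            break  # the first failure ends the stepdown scan
        k = j
    return k


def bh_critical_values(m: int, alpha: float) -> List[float]:
    """BH critical values gamma_i = i * alpha / m, i = 1, ..., m."""
    return [i * alpha / m for i in range(1, m + 1)]


# Hypothetical p-values chosen so that the two rules disagree.
p = [0.01, 0.03, 0.035, 0.5]
crit = bh_critical_values(len(p), 0.05)
print(stepup(p, crit), stepdown(p, crit))  # 3 1
```

Here the stepup rule rejects three hypotheses because $p_{(3)} = 0.035 \le 3\alpha/4$, while the stepdown rule stops after one rejection because $p_{(2)} = 0.03 > 2\alpha/4$.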
The following are formulas for the FDRs of a stepup or single-step procedure (a single-step procedure being a stepup procedure whose critical values are all the same) and of a stepdown procedure in a single-stage design, which can guide us in developing stepwise procedures controlling the FDR in a two-stage design. We will use the notation $\mathrm{FDR}_1$ for the FDR of a procedure in a single-stage design.

Result 1 (Sarkar 2008). Consider a stepup or stepdown method for testing $m$ null hypotheses based on their p-values $p_i$, $i = 1, \ldots, m$, and critical values $\gamma_1 \le \cdots \le \gamma_m$ in a single-stage design. The FDR of this method satisfies

$$\mathrm{FDR}_1 \le \sum_{i \in J_0} E\left[\frac{I\big(p_i \le \gamma_{R^{(-i)}_{m-1}(\gamma_2,\ldots,\gamma_m)+1}\big)}{R^{(-i)}_{m-1}(\gamma_2,\ldots,\gamma_m)+1}\right],$$

with equality holding in the case of the stepup method, where $I$ is the indicator function, $J_0$ is the set of indices of the true null hypotheses, and $R^{(-i)}_{m-1}(\gamma_2, \ldots, \gamma_m)$ is the number of rejections in testing the $m-1$ null hypotheses other than $H_i$ based on their p-values, using the same type of stepwise method with the critical values $\gamma_2 \le \cdots \le \gamma_m$.

With $p_i$ having the cdf $F(u)$ when $H_i$ is true, the FDR of a stepup or stepdown method with the thresholds $\gamma_i$, $i = 1, \ldots, m$, under independence of the p-values, satisfies

$$\mathrm{FDR}_1 \le \sum_{i \in J_0} E\left(\frac{F\big(\gamma_{R^{(-i)}_{m-1}(\gamma_2,\ldots,\gamma_m)+1}\big)}{R^{(-i)}_{m-1}(\gamma_2,\ldots,\gamma_m)+1}\right).$$

When $F$ is the cdf of $U(0,1)$ and these thresholds are chosen as $\gamma_i = i\alpha/m$, $i = 1, \ldots, m$, each summand equals $\alpha/m$, so the FDR equals $\pi_0\alpha$ for the stepup method and is less than or equal to $\pi_0\alpha$ for the stepdown method, where $\pi_0$ is the proportion of true nulls; hence the FDR is controlled at $\alpha$. This stepup method is the so-called BH method (Benjamini and Hochberg 1995), the most commonly used FDR controlling procedure in a single-stage design. The FDR is bounded above by $\pi_0\alpha$ for the BH method as well as its stepdown analog under a certain type of positive dependence condition among the p-values (Benjamini and Yekutieli 2001; Sarkar 2002, 2008). The idea of improving the FDR control of the BH method by plugging into it a suitable estimate $\hat\pi_0$ of $\pi_0$, that is, by using the modified p-values $\hat\pi_0 p_i$ rather than the original p-values in the BH method, was introduced by Benjamini and Hochberg (2000), and was later brought into the estimation based approach to controlling the FDR by Storey (2002).

A number of such plugged-in versions of the BH method with proven and improved FDR control, mostly under independence, have been put forward based on different methods of estimating $\pi_0$ (for instance, Benjamini, Krieger and Yekutieli 2006; Blanchard and Roquain 2009; Gavrilov, Benjamini and Sarkar 2009; Sarkar 2008; and Storey, Taylor and Siegmund 2004).

3 CONTROLLING THE FDR IN A TWO-STAGE ADAPTIVE DESIGN

Now suppose that the $m$ null hypotheses $H_i$, $i = 1, \ldots, m$, are to be simultaneously tested in a two-stage adaptive design setting. When testing a single hypothesis, say $H_i$, the theory of a two-stage combination test can be described as follows. Given $p_{1i}$, the p-value available for $H_i$ at the first stage, and two constants $\lambda < \lambda'$, make an early decision regarding the hypothesis by rejecting it if $p_{1i} \le \lambda$, accepting it if $p_{1i} > \lambda'$, and continuing to test it at the second stage if $\lambda < p_{1i} \le \lambda'$. At the second stage, combine $p_{1i}$ with the additional p-value $p_{2i}$ available for $H_i$ using a combination function $C(p_{1i}, p_{2i})$, and reject $H_i$ if $C(p_{1i}, p_{2i}) \le \gamma$, for some constant $\gamma$. The constants $\lambda$, $\lambda'$ and $\gamma$ are determined subject to control of the type I error rate of the test.

For simultaneous testing, we consider a natural extension of this theory from single to multiple testing. More specifically, given the first-stage p-value $p_{1i}$ corresponding to $H_i$ for $i = 1, \ldots, m$, we first determine two thresholds $0 \le \hat\lambda < \hat\lambda' \le 1$, stochastic or non-stochastic, and make an early decision regarding the hypotheses at this stage by rejecting $H_i$ if $p_{1i} \le \hat\lambda$, accepting $H_i$ if $p_{1i} > \hat\lambda'$, and continuing to test $H_i$ at the second stage if $\hat\lambda < p_{1i} \le \hat\lambda'$. At the second stage, we use the additional p-value $p_{2i}$ available for the follow-up hypothesis $H_i$ and combine it with $p_{1i}$ using the combination function $C(p_{1i}, p_{2i})$. The final decision on the follow-up hypotheses is taken at the second stage by determining another threshold $\hat\gamma$, again stochastic or non-stochastic, and rejecting the follow-up hypothesis $H_i$ if $C(p_{1i}, p_{2i}) \le \hat\gamma$.
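For a single hypothesis, the two-stage combination test just described can be sketched as follows. This is an illustration only, with hypothetical constants ($\lambda = 0.02$, $\lambda' = 0.5$, $\alpha = 0.05$) and Fisher's product $C(p_1, p_2) = p_1 p_2$ as the assumed combination function. For independent $U(0,1)$ p-values and $\gamma < \lambda$, the null rejection probability is $\lambda + \gamma\ln(\lambda'/\lambda)$ (the first case of the formula for Fisher's combination function in Section 3.2), so $\gamma$ is chosen to make it equal to $\alpha$; a seeded Monte Carlo run then checks the type I error.

```python
import math
import random
from typing import Callable

Combine = Callable[[float, float], float]


def fisher(p1: float, p2: float) -> float:
    """Fisher's combination function C(p1, p2) = p1 * p2."""
    return p1 * p2


def two_stage_test(p1: float, p2: float, lam: float, lam_prime: float,
                   gamma: float, combine: Combine = fisher) -> str:
    """Two-stage combination test for a single hypothesis."""
    if p1 <= lam:
        return "reject at stage 1"  # early rejection
    if p1 > lam_prime:
        return "accept at stage 1"  # early acceptance
    # lam < p1 <= lam_prime: follow up and combine the two p-values
    return "reject at stage 2" if combine(p1, p2) <= gamma else "accept at stage 2"


# Hypothetical design constants.
ALPHA, LAM, LAM_PRIME = 0.05, 0.02, 0.5
# Solve lam + gamma * ln(lam_prime / lam) = alpha (valid because gamma < lam here).
GAMMA = (ALPHA - LAM) / math.log(LAM_PRIME / LAM)


def null_rejection_rate(n: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo type I error with independent U(0,1) p-values under H0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        decision = two_stage_test(rng.random(), rng.random(), LAM, LAM_PRIME, GAMMA)
        hits += decision.startswith("reject")
    return hits / n


print(round(GAMMA, 5), null_rejection_rate())
```

The simulated rate should be close to $\alpha = 0.05$; a different combination function can be passed through `combine`, but then $\gamma$ must be recomputed from its own $H$ function.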
Both first-stage and second-stage thresholds are to be determined in such a way that the overall FDR is controlled at the desired level $\alpha$. Let $p_{1(1)} \le \cdots \le p_{1(m)}$ be the ordered versions of the first-stage p-values, with $H_{(i)}$ being the null hypothesis corresponding to $p_{1(i)}$, $i = 1, \ldots, m$, and let $q_i = C(p_{1i}, p_{2i})$. Below we describe a general multiple testing procedure based on the above theory, before proposing our FDR controlling procedures, which will be of this type.

A General Stepwise Procedure.

1. For two non-decreasing sequences of constants $\lambda_1 \le \cdots \le \lambda_m$ and $\lambda'_1 \le \cdots \le \lambda'_m$, with $\lambda_i < \lambda'_i$ for all $i = 1, \ldots, m$, and the first-stage p-values $p_{1i}$, $i = 1, \ldots, m$, define two thresholds as follows:
$$R_1 = \max\{1 \le i \le m : p_{1(j)} \le \lambda_j \text{ for all } j \le i\} \quad \text{and} \quad S_1 = \max\{1 \le i \le m : p_{1(i)} \le \lambda'_i\},$$
where $0 \le R_1 \le S_1 \le m$, and $R_1$ or $S_1$ equals zero if the corresponding maximum does not exist. Reject $H_{(i)}$ for all $i \le R_1$, accept $H_{(i)}$ for all $i > S_1$, and continue testing $H_{(i)}$ at the second stage for all $i$ such that $R_1 < i \le S_1$.

2. At the second stage, consider $q_{(i)}$, $i = 1, \ldots, S_1 - R_1$, the ordered versions of the combined p-values $q_i = C(p_{1i}, p_{2i})$ of the $S_1 - R_1$ follow-up null hypotheses, and find
$$R_2(R_1, S_1) = \max\{1 \le i \le S_1 - R_1 : q_{(i)} \le \gamma_{R_1+i}(R_1, S_1)\},$$
given another non-decreasing sequence of constants $\gamma_{r_1+1}(r_1, s_1) \le \cdots \le \gamma_{s_1}(r_1, s_1)$ for every fixed $r_1 < s_1$. Reject the follow-up null hypothesis $H_{(i)}$ corresponding to $q_{(i)}$ for all $i \le R_2$ if this maximum exists; otherwise, reject none of the follow-up null hypotheses.

Remark 1. The above two-stage procedure screens out the null hypotheses at the first stage by accepting those with relatively large p-values through a stepup procedure and by rejecting those with relatively small p-values through a stepdown procedure. At the second stage, it applies a stepup procedure to the combined p-values. Conceptually, one could use any type of multiple testing procedure to screen out the null hypotheses at the first stage and to test the follow-up null hypotheses at the second stage. However, the particular types of stepwise procedure we have chosen at the two stages provide flexibility in developing a formula for the FDR and, eventually, in determining explicitly the thresholds needed to control the FDR at the desired level.
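The general stepwise procedure can be sketched as below. This is an illustrative implementation, not the authors' code: the function name `general_two_stage` is hypothetical, the second-stage critical values are supplied by a user-defined callable `gamma(r1, s1)` that returns $\gamma_{r_1+1}(r_1,s_1), \ldots, \gamma_{s_1}(r_1,s_1)$, and `p2` is assumed to hold a value for every index even though only the follow-up entries are used.

```python
from typing import Callable, Sequence, Set, Tuple


def general_two_stage(
    p1: Sequence[float],
    p2: Sequence[float],
    lam: Sequence[float],                          # lambda_1 <= ... <= lambda_m
    lam_prime: Sequence[float],                    # lambda'_1 <= ... <= lambda'_m
    gamma: Callable[[int, int], Sequence[float]],  # (r1, s1) -> gamma_{r1+1..s1}
    combine: Callable[[float, float], float],
) -> Tuple[Set[int], Set[int], Set[int]]:
    """Return (rejected at stage 1, followed up, rejected at stage 2) as index sets."""
    m = len(p1)
    order = sorted(range(m), key=lambda i: p1[i])  # order[k] is the index of p_{1(k+1)}
    ps = [p1[i] for i in order]

    # R_1: stepdown with the lambda_j (early rejection).
    r1 = 0
    while r1 < m and ps[r1] <= lam[r1]:
        r1 += 1
    # S_1: stepup with the lambda'_j; ranks above S_1 are accepted early.
    s1 = max((j + 1 for j in range(m) if ps[j] <= lam_prime[j]), default=0)

    follow = order[r1:s1]
    crit = gamma(r1, s1)  # crit[j] plays the role of gamma_{r1+j+1}(r1, s1)
    q = sorted((combine(p1[i], p2[i]), i) for i in follow)
    # R_2: stepup on the ordered combined p-values.
    r2 = max((j + 1 for j in range(len(q)) if q[j][0] <= crit[j]), default=0)
    return set(order[:r1]), set(follow), {i for _, i in q[:r2]}


# Hypothetical inputs: m = 5, BH-type boundaries, constant second-stage cutoffs.
p1 = [0.001, 0.2, 0.03, 0.9, 0.05]
p2 = [0.5, 0.8, 0.01, 0.3, 0.02]
lam = [0.005 * k for k in range(1, 6)]
lam_prime = [0.1 * k for k in range(1, 6)]
result = general_two_stage(p1, p2, lam, lam_prime,
                           gamma=lambda r1, s1: [0.002] * (s1 - r1),
                           combine=lambda a, b: a * b)
print(result)  # ({0}, {1, 2, 4}, {2, 4})
```

In this example $R_1 = 1$ and $S_1 = 4$, so the hypothesis with index 3 (zero-based) is accepted early, and the second-stage stepup rejects the two follow-ups whose products fall below the cutoff.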
Let $V_1$ and $V_2$ denote the numbers of false rejections among the $R_1$ null hypotheses rejected at the first stage and among the $R_2$ follow-up null hypotheses rejected at the second stage, respectively, in the above procedure. Then, the overall FDR of this two-stage procedure is given by
$$\mathrm{FDR}_{12} = E\left[\frac{V_1 + V_2}{\max\{R_1 + R_2, 1\}}\right].$$
The following theorem, proved in the Appendix, will guide us in determining the first- and second-stage thresholds in the above procedure that provide control of $\mathrm{FDR}_{12}$ at the desired level. Both procedures we propose in this article are of this type. Before stating the theorem, we need the following notation. $R^{(-i)}_1$ is defined as $R_1$ in terms of the $m-1$ first-stage p-values $\{p_{11}, \ldots, p_{1m}\} \setminus \{p_{1i}\}$ and the sequence of constants $\lambda_2 \le \cdots \le \lambda_m$. $\tilde R^{(-i)}_1$ and $S^{(-i)}_1$ are defined as $R_1$ and $S_1$, respectively, in terms of $\{p_{11}, \ldots, p_{1m}\} \setminus \{p_{1i}\}$ and the two sequences of constants $\lambda_1 \le \cdots \le \lambda_{m-1}$ and $\lambda'_2 \le \cdots \le \lambda'_m$. $R^{(-i)}_2$ is defined as $R_2$ with $R_1$ replaced by $\tilde R^{(-i)}_1$ and $S_1$ replaced by $S^{(-i)}_1 + 1$, and denotes the number of rejected follow-up null hypotheses based on all the combined p-values except $q_i$ and the critical values other than the first one; that is,
$$R^{(-i)}_2 \equiv R^{(-i)}_2\big(\tilde R^{(-i)}_1, S^{(-i)}_1 + 1\big) = \max\Big\{1 \le j \le S^{(-i)}_1 - \tilde R^{(-i)}_1 : q^{(-i)}_{(j)} \le \gamma_{\tilde R^{(-i)}_1 + j + 1}\big(\tilde R^{(-i)}_1, S^{(-i)}_1 + 1\big)\Big\},$$
where the $q^{(-i)}_{(j)}$'s are the ordered versions of the combined p-values of the follow-up null hypotheses except $q_i$.

Theorem 1. The FDR of the above general multiple testing procedure satisfies the following inequality:
$$\mathrm{FDR}_{12} \le \sum_{i \in J_0} E\left[\frac{I\big(p_{1i} \le \lambda_{R^{(-i)}_1 + 1}\big)}{R^{(-i)}_1 + 1}\right] + \sum_{i \in J_0} E\left[\frac{I\Big(\lambda_{\tilde R^{(-i)}_1 + 1} < p_{1i} \le \lambda'_{S^{(-i)}_1 + 1},\ q_i \le \gamma_{\tilde R^{(-i)}_1 + R^{(-i)}_2 + 1}\big(\tilde R^{(-i)}_1, S^{(-i)}_1 + 1\big)\Big)}{\tilde R^{(-i)}_1 + R^{(-i)}_2 + 1}\right].$$

3.1 BH Type Procedures

We are now ready to propose our FDR controlling multiple testing procedures in a two-stage adaptive design setting with a combination function. Before that, let us state the assumptions we need.

Assumption 1. The combination function $C(p_1, p_2)$ is non-decreasing in both arguments.

Assumption 2. The pairs $(p_{1i}, p_{2i})$, $i = 1, \ldots, m$, are independently distributed, and the pairs corresponding to the null hypotheses are identically distributed as $(p_1, p_2)$ with a joint distribution that satisfies the p-clud property (Brannath et al. 2002), that is, $\Pr(p_1 \le u) \le u$ and $\Pr(p_2 \le u \mid p_1) \le u$ for all $0 \le u \le 1$.

Let us define the function
$$H(c; t, t') = \int_t^{t'} \int_0^1 I\big(C(u_1, u_2) \le c\big)\, du_2\, du_1, \qquad 0 < c < 1.$$
When testing a single hypothesis based on the pair $(p_1, p_2)$, using $t$ and $t'$ as the first-stage rejection and acceptance thresholds, respectively, and $c$ as the second-stage rejection threshold, $H(c; t, t')$ is the probability that this hypothesis, when null (with $p_1$ and $p_2$ independent $U(0,1)$), is followed up at the second stage and then rejected.

Definition 1 (BH-TSADC Procedure).

1. Given the level $\alpha$ at which the overall FDR is to be controlled, three sequences of constants, $\lambda_i = i\lambda/m$, $i = 1, \ldots, m$, and $\lambda'_i = i\lambda'/m$, $i = 1, \ldots, m$, for some prefixed $\lambda < \alpha < \lambda'$, and $\gamma_{r_1+1, s_1} \le \cdots \le \gamma_{s_1, s_1}$ satisfying
$$H(\gamma_{r_1+i, s_1}; \lambda_{r_1}, \lambda'_{s_1}) = \frac{(r_1 + i)(\alpha - \lambda)}{m}, \qquad i = 1, \ldots, s_1 - r_1,$$
for every fixed $1 \le r_1 < s_1 \le m$, find
$$R_1 = \max\{1 \le i \le m : p_{1(j)} \le \lambda_j \text{ for all } j \le i\} \quad \text{and} \quad S_1 = \max\{1 \le i \le m : p_{1(i)} \le \lambda'_i\},$$
with $R_1$ or $S_1$ being equal to zero if the corresponding maximum does not exist.
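When no closed form for $H(c; t, t')$ is at hand, it can be evaluated numerically, and the critical values of Definition 1 can be obtained by bisection, since $H$ is non-decreasing in $c$ under Assumption 1. The sketch below uses hypothetical helpers, not part of the article: a midpoint rule in $u_1$, and for each $u_1$ a bisection for the length of $\{u_2 : C(u_1, u_2) \le c\}$, which is an initial segment of $[0, 1]$ because $C$ is non-decreasing in $u_2$. Grid size and iteration counts are arbitrary choices.

```python
import math
from typing import Callable

Combine = Callable[[float, float], float]


def _inner_length(comb: Combine, u1: float, c: float, iters: int = 40) -> float:
    """Length of {u2 in [0, 1] : C(u1, u2) <= c}, an initial segment of [0, 1]
    because C is non-decreasing in u2 (Assumption 1)."""
    if comb(u1, 1.0) <= c:
        return 1.0
    if comb(u1, 0.0) > c:
        return 0.0
    lo, hi = 0.0, 1.0  # invariant: C(u1, lo) <= c < C(u1, hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if comb(u1, mid) <= c:
            lo = mid
        else:
            hi = mid
    return lo


def h_numeric(comb: Combine, c: float, t: float, t_prime: float, n: int = 400) -> float:
    """Midpoint-rule approximation of H(c; t, t')."""
    h = (t_prime - t) / n
    return h * sum(_inner_length(comb, t + (k + 0.5) * h, c) for k in range(n))


def solve_gamma(comb: Combine, target: float, t: float, t_prime: float,
                iters: int = 40) -> float:
    """Smallest c in [0, 1] with H(c; t, t') >= target, found by bisection."""
    if target > h_numeric(comb, 1.0, t, t_prime):
        raise ValueError("target exceeds H(1; t, t'); no critical value exists")
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h_numeric(comb, mid, t, t_prime) < target:
            lo = mid
        else:
            hi = mid
    return hi


def fisher(a: float, b: float) -> float:
    """Fisher's combination function, used here only as an example."""
    return a * b


# Hypothetical Definition 1 setting: m = 100, alpha = 0.05, lambda = 0.025,
# lambda' = 0.5, r1 = 2, s1 = 30, i = 1.
m, alpha, lam, lam_prime, r1, s1, i = 100, 0.05, 0.025, 0.5, 2, 30, 1
target = (r1 + i) * (alpha - lam) / m
gamma = solve_gamma(fisher, target, r1 * lam / m, s1 * lam_prime / m)
# Here gamma < t = lambda_{r1}, so the first case of formula (2) gives target / ln(t'/t).
print(gamma, target / math.log(300))
```

The numerical value agrees with the closed form to within the accuracy of the midpoint rule; for the Fisher and Simes functions the explicit formulas of Section 3.2 are faster and exact.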

2. Reject $H_{(i)}$ for $i \le R_1$; accept $H_{(i)}$ for $i > S_1$; and continue testing $H_{(i)}$ for $R_1 < i \le S_1$, making use of the additional p-values $p_{2i}$ available for all such follow-up hypotheses at the second stage.

3. At the second stage, consider the combined p-values $q_i = C(p_{1i}, p_{2i})$ of the follow-up null hypotheses. Let $q_{(i)}$, $i = 1, \ldots, S_1 - R_1$, be their ordered versions. Reject $H_{(i)}$ [the null hypothesis corresponding to $q_{(i)}$] for all $i \le R_2(R_1, S_1) = \max\{1 \le j \le S_1 - R_1 : q_{(j)} \le \gamma_{R_1+j, S_1}\}$, provided this maximum exists; otherwise, reject none of the follow-up null hypotheses.

Proposition 1. Let $\pi_0$ be the proportion of true null hypotheses. Then, the FDR of the BH-TSADC method is less than or equal to $\pi_0\alpha$, and hence controlled at $\alpha$, if Assumptions 1 and 2 hold.

The proposition is proved in the Appendix.

The BH-TSADC procedure can be implemented alternatively, and often more conveniently, in terms of FDR estimates at both stages. With $R^{(1)}(t) = \#\{i : p_{1i} \le t\}$ and $R^{(2)}(c; t, t') = \#\{i : t < p_{1i} \le t',\ C(p_{1i}, p_{2i}) \le c\}$, let us define
$$\widehat{\mathrm{FDR}}_1(t) = \begin{cases} \dfrac{mt}{R^{(1)}(t)} & \text{if } R^{(1)}(t) > 0, \\ 0 & \text{if } R^{(1)}(t) = 0, \end{cases}$$
and
$$\widehat{\mathrm{FDR}}_{2|1}(c; t, t') = \begin{cases} \dfrac{mH(c; t, t')}{R^{(1)}(t) + R^{(2)}(c; t, t')} & \text{if } R^{(2)}(c; t, t') > 0, \\ 0 & \text{if } R^{(2)}(c; t, t') = 0. \end{cases}$$
Then, we have the following.

The BH-TSADC procedure: An alternative definition. Reject $H_{(i)}$ for all $i \le R_1 = \max\{1 \le k \le m : \widehat{\mathrm{FDR}}_1(p_{1(j)}) \le \lambda \text{ for all } j \le k\}$; accept $H_{(i)}$ for all $i > S_1 = \max\{1 \le k \le m : \widehat{\mathrm{FDR}}_1(p_{1(k)}) \le \lambda'\}$; continue to test $H_{(i)}$ at the second stage for all $i$ such that $R_1 < i \le S_1$. At the second stage, reject $H_{(i)}$, the follow-up null hypothesis corresponding to $q_{(i)}$, for all $i \le R_2(R_1, S_1) = \max\{1 \le k \le S_1 - R_1 : \widehat{\mathrm{FDR}}_{2|1}(q_{(k)}; R_1\lambda/m, S_1\lambda'/m) \le \alpha - \lambda\}$.

Remark 2. The BH-TSADC procedure is an extension of the BH procedure from a method of controlling the FDR in a single-stage design to one in a two-stage adaptive design with combination tests. When $\lambda = 0$ and $\lambda' = 1$, that is, when we have a single-stage design based on the combined p-values, this method reduces to the usual BH method. Notice that $\widehat{\mathrm{FDR}}_1(t)$ is a conservative estimate of the FDR of the single-step test with rejection region $p_i \le t$ for each $H_i$. So, at the first stage, the BH-TSADC procedure screens out as rejected (or accepted) those null hypotheses at whose p-values the estimated FDRs are all less than or equal to $\lambda$ (or greater than $\lambda'$).

Clearly, the BH-TSADC procedure can potentially be improved, in the sense of having tighter control over its FDR at $\alpha$, by plugging a suitable estimate of $\pi_0$ into it while choosing the second-stage thresholds, similar to what is done for the BH method in a single-stage design. As said in Section 2, there are different ways of estimating $\pi_0$, each of which has been shown to provide the ultimate control of the FDR, of course when the p-values are independent, by the resulting plugged-in version of the single-stage BH method (see, e.g., Sarkar 2008). However, we will consider the following estimate of $\pi_0$, which is of the type considered in Storey, Taylor and Siegmund (2004) and seems natural in the present adaptive design setting, where $m - S_1$ of the null hypotheses are accepted as being true at the first stage:
$$\hat\pi_0 = \frac{m - S_1 + 1}{m(1 - \lambda')}.$$
The following definition gives a modified version of the BH-TSADC procedure using this estimate.

Definition 2 (Plug-In BH-TSADC Procedure). Consider the BH-TSADC procedure with the early decision thresholds $R_1$ and $S_1$ based on the sequences of constants $\lambda_i = i\lambda/m$, $i = 1, \ldots, m$, and $\lambda'_i = i\lambda'/m$, $i = 1, \ldots, m$, given $0 \le \lambda < \lambda' \le 1$, and the second-stage critical values $\gamma'_{R_1+i, S_1}$, $i = 1, \ldots, S_1 - R_1$, given by the equations
$$H(\gamma'_{r_1+i, s_1}; \lambda_{r_1}, \lambda'_{s_1}) = \frac{(r_1 + i)(\alpha - \lambda)}{m\hat\pi_0}, \qquad i = 1, \ldots, s_1 - r_1. \qquad (1)$$
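The alternative, estimated-FDR form of the BH-TSADC procedure, together with the Plug-In variant based on $\hat\pi_0$, can be sketched as below. This is a hedged illustration rather than the authors' implementation: the function name `bh_tsadc` is hypothetical, $H(c; t, t')$ must be supplied by the caller (for instance, the explicit formula for Fisher's combination function in Section 3.2), and `p2` is assumed to hold a value for every index. In the denominator of the second-stage estimate it uses $R^{(1)}(R_1\lambda/m) = R_1$, which follows from the stepdown definition of $R_1$.

```python
from bisect import bisect_right
from typing import Callable, Dict, Sequence, Set


def bh_tsadc(p1: Sequence[float], p2: Sequence[float], alpha: float,
             lam: float, lam_prime: float,
             H: Callable[[float, float, float], float],
             combine: Callable[[float, float], float],
             plug_in: bool = False) -> Dict[str, Set[int]]:
    """BH-TSADC (or Plug-In BH-TSADC) via estimated FDRs at both stages."""
    m = len(p1)
    if not 0 <= lam < alpha < lam_prime <= 1:
        raise ValueError("need 0 <= lam < alpha < lam_prime <= 1")
    if plug_in and lam_prime >= 1:
        raise ValueError("the pi0 estimate needs lam_prime < 1")
    order = sorted(range(m), key=lambda i: p1[i])
    ps = [p1[i] for i in order]

    def fdr1(t: float) -> float:
        # m t / R^(1)(t); t is an observed first-stage p-value, so R^(1)(t) >= 1
        return m * t / bisect_right(ps, t)

    r1 = 0  # stepdown: estimated FDRs at all ranks up to r1 are <= lam
    while r1 < m and fdr1(ps[r1]) <= lam:
        r1 += 1
    # stepup: largest rank whose estimated FDR is <= lam'
    s1 = max((k + 1 for k in range(m) if fdr1(ps[k]) <= lam_prime), default=0)

    t, t_prime = r1 * lam / m, s1 * lam_prime / m
    pi0_hat = (m - s1 + 1) / (m * (1 - lam_prime)) if plug_in else 1.0
    follow = order[r1:s1]
    q = {i: combine(p1[i], p2[i]) for i in follow}
    qs = sorted(q.values())

    def fdr21(c: float) -> float:
        # m * pi0_hat * H(c; t, t') / (R^(1)(t) + R^(2)(c; t, t')), with R^(1)(t) = r1
        return m * pi0_hat * H(c, t, t_prime) / (r1 + bisect_right(qs, c))

    # stepup on the second-stage estimated FDRs at level alpha - lam
    r2 = max((k + 1 for k in range(len(qs)) if fdr21(qs[k]) <= alpha - lam), default=0)
    rejected2 = {i for i in follow if r2 and q[i] <= qs[r2 - 1]}
    return {"rejected_stage1": set(order[:r1]), "follow_up": set(follow),
            "accepted_stage1": set(order[s1:]), "rejected_stage2": rejected2}
```

Setting `plug_in=True` leaves the first-stage decisions unchanged and scales the second-stage estimates by $\hat\pi_0$, so when $\hat\pi_0 < 1$ the number of second-stage rejections can only stay the same or increase.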

Proposition 2. The FDR of the Plug-In BH-TSADC method is less than or equal to $\alpha$ if Assumptions 1 and 2 hold.

A proof of this proposition is given in the Appendix. As with the BH-TSADC procedure, the Plug-In BH-TSADC procedure can also be described alternatively using estimated FDRs at both stages. Let

$$\widehat{\mathrm{FDR}}^{*}_{2|1}(c; t, t') = \begin{cases} \dfrac{m\hat\pi_0 H(c; t, t')}{R^{(1)}(t) + R^{(2)}(c; t, t')} & \text{if } R^{(2)}(c; t, t') > 0,\\ 0 & \text{if } R^{(2)}(c; t, t') = 0. \end{cases}$$

Then we have the following.

The Plug-In BH-TSADC procedure: An alternative definition. At the first stage, decide which null hypotheses are to be rejected, accepted, or continued to be tested at the second stage based on $\widehat{\mathrm{FDR}}_1$, as in (the alternative description of) the BH-TSADC procedure. At the second stage, reject $H_{(i)}$, the follow-up null hypothesis corresponding to $q_{(i)}$, for all $i \le R^{*}_2(R_1, S_1) = \max\{1 \le k \le S_1 - R_1 : \widehat{\mathrm{FDR}}^{*}_{2|1}(q_{(k)}; R_1\lambda/m, S_1\lambda'/m) \le \alpha - \lambda\}$.

3.2 Two Special Combination Functions

We now present explicit formulas of $H(c; t, t')$ for two special combination functions, Fisher's and Simes', often used in multiple testing applications.

Fisher's combination function: $C(p_1, p_2) = p_1 p_2$.

$$H_{\mathrm{Fisher}}(c; t, t') = \int_t^{t'} \int_0^1 I(C(u_1, u_2) \le c)\, du_2\, du_1 = \begin{cases} c\ln(t'/t) & \text{if } c < t,\\ c - t + c\ln(t'/c) & \text{if } t \le c < t',\\ t' - t & \text{if } c \ge t', \end{cases} \qquad (2)$$

for $c \in (0, 1)$.
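Formula (2) and the second-stage rule of the alternative definitions can be checked with a short sketch. It assumes the first-stage counts $R_1$ and $S_1$ are already available (for instance from the screening step of the alternative definition) and uses Fisher's combination function; `h_fisher`, `h_numeric` and `second_stage` are hypothetical names. Passing `pi0=1` gives the BH-TSADC rule based on $\widehat{\mathrm{FDR}}_{2|1}$; passing a value of $\hat\pi_0$ gives the Plug-In rule based on $\widehat{\mathrm{FDR}}^{*}_{2|1}$.

```python
import math


def h_fisher(c, t, tp):
    """Formula (2): H(c; t, t') for C(p1, p2) = p1 * p2, with 0 <= t < t' <= 1."""
    if c <= 0.0:
        return 0.0
    if c < t:
        return c * math.log(tp / t)
    if c < tp:
        return c - t + c * math.log(tp / c)
    return tp - t


def h_numeric(c, t, tp, n=20000):
    """Midpoint-rule check: the inner integral over u2 equals min(1, c / u1)."""
    w = (tp - t) / n
    return w * sum(min(1.0, c / (t + (k + 0.5) * w)) for k in range(n))


def second_stage(p1, p2, r1, s1, lam, lam_p, alpha, pi0=1.0):
    """Indices rejected at Stage 2 with Fisher's combined p-values."""
    m = len(p1)
    t, tp = r1 * lam / m, s1 * lam_p / m
    follow = [i for i in range(m) if t < p1[i] <= tp]   # follow-up hypotheses
    q = {i: p1[i] * p2[i] for i in follow}              # q_i = C(p_1i, p_2i)
    q_sorted = sorted(q.values())
    r_first = sum(1 for p in p1 if p <= t)              # R^(1)(t)
    r2 = 0
    for k, c in enumerate(q_sorted, start=1):
        r_second = sum(1 for v in q_sorted if v <= c)   # R^(2)(c; t, t')
        est = m * pi0 * h_fisher(c, t, tp) / (r_first + r_second)
        if est <= alpha - lam:
            r2 = k                                      # keep the largest such k
    if r2 == 0:
        return []
    cutoff = q_sorted[r2 - 1]
    return sorted(i for i in follow if q[i] <= cutoff)


# Toy data (illustrative): with lambda = 0.01 and lambda' = 0.5 the screening
# gives R1 = 3 and S1 = 5, so indices 3 and 4 are followed up.
p1 = [0.0001, 0.0004, 0.002, 0.03, 0.2, 0.6, 0.9, 0.95, 0.5, 0.7]
p2 = [0.5] * 10
p2[3], p2[4] = 0.001, 0.9
print(second_stage(p1, p2, r1=3, s1=5, lam=0.01, lam_p=0.5, alpha=0.05))  # prints [3]
```

The three branches of `h_fisher` can be compared with `h_numeric` for any $c$ below $t$, between $t$ and $t'$, and above $t'$.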

Simes' combination function: $C(p_1, p_2) = \min\{2\min(p_1, p_2), \max(p_1, p_2)\}$.

$$H_{\mathrm{Simes}}(c; t, t') = \int_t^{t'} \int_0^1 I(C(u_1, u_2) \le c)\, du_2\, du_1 = \begin{cases} \frac{c}{2}(t' - t) & \text{if } c \le t,\\ c\left(\frac{c}{2} + \frac{t'}{2} - t\right) & \text{if } t < c \le \min(2t, t'),\\ c(t' - t) & \text{if } t' < c \le 2t,\\ \frac{c}{2}(1 + t') - t & \text{if } 2t < c \le t',\\ \frac{c}{2}(1 + 2t') - \frac{c^2}{2} - t & \text{if } \max(2t, t') < c \le 2t',\\ t' - t & \text{if } c > 2t', \end{cases}$$

for $c \in (0, 1)$. See also Brannath et al. (2002) for formula (2).

These formulas can be used to determine the critical values $\gamma_i$ before observing the combined p-values, or to estimate the FDR after observing the combined p-values at the second stage, in the BH-TSADC and Plug-In BH-TSADC procedures with Fisher's and Simes' combination functions. Of course, for large values of $m$, it is numerically more challenging to determine the $\gamma_i$ than to estimate the FDR at the second stage, so in that case we recommend using the alternative versions of these procedures. Given the p-values from the two stages, Fisher's combination function lets us use the evidence from both stages with equal importance in deciding on the corresponding null hypothesis. Simes' combination function, on the other hand, bases this decision on the strength of evidence against the null hypothesis provided by the smaller of the two p-values relative to the larger one.

4 SIMULATION STUDIES

There are a number of important issues related to our proposed procedures that are worth investigating. As said in the introduction, an important rationale behind developing our proposed methods is to modify the first-stage BH method so that it becomes more powerful in the present two-stage adaptive design setting, relative to the ideal BH method that would have been used had the second-stage data been collected for all the hypotheses, of course without losing ultimate control over the FDR. Hence, it is important to investigate numerically, at least under independence, how well the proposed procedures control the FDR and how powerful they can potentially be compared to both the first-stage and the ideal BH methods. Since ultimate control over the FDR has been established theoretically for our methods only under independence, it is worthwhile to provide some insight through simulations into their FDRs under some dependence situations. Cost efficiency is as essential a consideration as improved power when choosing a two-stage multiple testing procedure over its single-stage version, so it is also important to provide numerical evidence of how much cost saving our procedures can offer relative to the maximum possible cost incurred by using the ideal BH method. We conducted simulation studies addressing these issues. More details about these studies and the conclusions derived from them are given in the following subsections.

4.1 FDR and Power Under Independence

To investigate how well our procedures perform relative to the first-stage and full-data BH methods under independence, we (i) generated two independent sets of $m$ uncorrelated random variables $Z_i \sim N(\mu_i, 1)$, $i = 1, \ldots, m$, one for Stage 1 and the other for Stage 2, setting $m\pi_0$ of the $\mu_i$ to zero and the rest to 2; (ii) tested $H_i: \mu_i = 0$ against $K_i: \mu_i > 0$, simultaneously for $i = 1, \ldots, m$, by applying each of the following procedures at $\alpha = 0.05$ to the generated data: the (alternative versions of the) BH-TSADC and Plug-In BH-TSADC procedures with both Fisher's and Simes' combination functions, the first-stage BH method, and the BH method based on combining the data from the two stages (which we call the full-data BH method); and (iii) recorded the false discovery proportion and the proportion of false nulls that are rejected.
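The two single-stage comparators in step (ii) can be sketched as below: one replication of steps (i)-(iii) for the first-stage and full-data BH methods, then repeated and averaged. This is a minimal stdlib-only sketch, not the authors' simulation code; it assumes the full-data statistic pools the two stages as $(Z_{1i} + Z_{2i})/\sqrt{2}$, it omits the two-stage procedures themselves, and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal; one-sided p-value is 1 - Phi(z)


def bh(pvals, alpha):
    """Benjamini-Hochberg step-up procedure; returns the set of rejected indices."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank in range(m, 0, -1):
        if pvals[order[rank - 1]] <= rank * alpha / m:
            k = rank
            break
    return set(order[:k])


def one_replication(m, pi0, rng, mu_alt=2.0, alpha=0.05):
    """Steps (i)-(iii) for the first-stage and full-data BH methods only."""
    m0 = round(m * pi0)
    mu = [0.0] * m0 + [mu_alt] * (m - m0)      # indices < m0 are true nulls
    z1 = [rng.gauss(u, 1.0) for u in mu]        # Stage 1 statistics
    z2 = [rng.gauss(u, 1.0) for u in mu]        # Stage 2 statistics
    p_first = [1.0 - PHI.cdf(z) for z in z1]
    # Assumption: "full data" pools the two stages as (Z1 + Z2) / sqrt(2).
    p_full = [1.0 - PHI.cdf((a + b) / math.sqrt(2.0)) for a, b in zip(z1, z2)]
    result = {}
    for name, p in (("first-stage BH", p_first), ("full-data BH", p_full)):
        rejected = bh(p, alpha)
        false_rej = sum(1 for i in rejected if i < m0)
        fdp = false_rej / max(len(rejected), 1)               # false discovery proportion
        power = (len(rejected) - false_rej) / max(m - m0, 1)  # share of false nulls rejected
        result[name] = (fdp, power)
    return result


def simulate(m, pi0, reps=1000, seed=1):
    """Average FDP and power over replications (simulated FDR and average power)."""
    rng = random.Random(seed)
    totals = {}
    for _ in range(reps):
        for name, (fdp, power) in one_replication(m, pi0, rng).items():
            f, w = totals.get(name, (0.0, 0.0))
            totals[name] = (f + fdp, w + power)
    return {name: (f / reps, w / reps) for name, (f, w) in totals.items()}


print(simulate(100, 0.5, reps=200))
```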
We repeated steps (i)-(iii) 1000 times and averaged the above proportions over these 1000 runs to obtain the final simulated values of the FDR and the average power (the expected proportion of false nulls that are rejected) for each of these procedures. The simulated FDRs and average powers of these procedures for different values of $\pi_0$ and choices of early stopping boundaries are displayed graphically in Figures 1-8. Figures 1 and 3 compare the FDRs of the BH-TSADC and Plug-In BH-TSADC procedures, based on both Fisher's and Simes' combination functions, with those of the first-stage and full-data BH procedures for $m = 100$ (Figure 1) and $m = 1000$ (Figure 3), the early rejection boundary $\lambda = 0.005$, 0.010, or 0.025, and the early acceptance boundary $\lambda' = 0.5$; Figures 2 and 4 do the same in terms of average power. Figures 5 to 8 reproduce Figures 1 to 4, respectively, with the early rejection boundary fixed at $\lambda = 0.025$ and the early acceptance boundary $\lambda' = 0.5$, 0.8, or 0.9. To examine the performance of our proposed procedures in a more complicated genetic model, we explored a model with equally spaced, exponentially decreasing effect sizes at $1.5\,(2^{2}, 2^{1}, 2^{0.5}, 2^{0})$. The simulation results can be found in the supplementary materials of this article. These results show that our procedures are more powerful in the presence of such exponentially decreasing effect sizes than with a constant effect size for the alternative hypotheses.

4.2 FDR Under Dependence

We considered three different scenarios for dependent p-values in our simulation study to investigate the FDR control of our procedures under dependence. In particular, we generated two independent sets of $m = 100$ correlated normal random variables $Z_i \sim N(\mu_i, 1)$, $i = 1, \ldots, m$, one for Stage 1 and the other for Stage 2, with $m\pi_0$ of the $\mu_i$ equal to 0 and the rest equal to 2, and a correlation matrix exhibiting one of three types of dependence: equal, clumpy, and AR(1). In other words, the $Z_i$ were assumed to have a common, non-negative correlation $\rho$ in the case of equal dependence; were broken up into ten independent groups, with the 10 $Z_i$ within each group having a common, non-negative correlation $\rho$, in the case of clumpy dependence; and were assumed to have correlations $\rho_{ij} = \mathrm{Cor}(Z_i, Z_j)$ of the form $\rho_{ij} = \rho^{|i-j|}$ for all $i \ne j = 1, \ldots, m$, for some non-negative $\rho$, in the case of AR(1) dependence. We then applied the (alternative versions of the) BH-TSADC and Plug-In BH-TSADC procedures at level $\alpha = 0.05$, with both Fisher's and Simes' combination functions, $\lambda = 0.025$, and $\lambda' = 0.5$, to these data sets.
These two steps were repeated 1000 times to obtain the simulated FDRs and average powers of these procedures, as in our study of the independence case. Figures 9-11 display graphically the simulated FDRs of these procedures for different values of $\pi_0$ and the types of dependent p-values considered.
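The three dependence structures of Section 4.2 can be generated with a shared-factor construction for equal and clumpy dependence and a stationary first-order recursion for AR(1). This is a minimal sketch under those assumed constructions, which reproduce the stated correlations (unit variances, correlation $\rho$ within a block, $\rho^{|i-j|}$ for AR(1)); it is not the authors' generator, and the function name is hypothetical.

```python
import math
import random


def correlated_normals(mu, rho, kind, rng, group_size=10):
    """Draw Z_i ~ N(mu_i, 1) under 'equal', 'clumpy' or 'ar1' dependence (0 <= rho <= 1)."""
    m = len(mu)
    e = [rng.gauss(0.0, 1.0) for _ in range(m)]           # independent innovations
    if kind == "equal":
        w = rng.gauss(0.0, 1.0)                            # one factor shared by all Z_i
        noise = [math.sqrt(rho) * w + math.sqrt(1.0 - rho) * x for x in e]
    elif kind == "clumpy":
        n_groups = math.ceil(m / group_size)
        w = [rng.gauss(0.0, 1.0) for _ in range(n_groups)]  # one factor per group
        noise = [math.sqrt(rho) * w[i // group_size] + math.sqrt(1.0 - rho) * e[i]
                 for i in range(m)]
    elif kind == "ar1":
        noise = [e[0]]                                     # stationary start, variance 1
        for i in range(1, m):
            noise.append(rho * noise[-1] + math.sqrt(1.0 - rho * rho) * e[i])
    else:
        raise ValueError(f"unknown dependence type: {kind}")
    return [u + z for u, z in zip(mu, noise)]


# One data set for each stage: two independent draws with the same mean vector.
rng = random.Random(7)
mu = [0.0] * 50 + [2.0] * 50
z_stage1 = correlated_normals(mu, 0.5, "clumpy", rng)
z_stage2 = correlated_normals(mu, 0.5, "clumpy", rng)
```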

Table 1: Simulated values of the expected proportion of cost saving (with $\lambda = 0.025$ and $\lambda' = 0.5$)

             m = 100             m = 1000            m = 5000
          π0=0.5   π0=0.9     π0=0.5   π0=0.9     π0=0.5   π0=0.9
f = 0.25  0.4321   0.5653     0.4337   0.5716     0.4336   0.5723
f = 0.50  0.2405   0.4325     0.2442   0.4401     0.2442   0.4407
f = 0.75  0.1075   0.2300     0.1082   0.2319     0.1090   0.2320

4.3 Cost Saving

Let us consider determining the cost saving in the context of a genome-wide association study. Because of the high cost of genotyping hundreds of thousands of markers on thousands of subjects, such genotyping is often carried out in a two-stage format: a proportion of the available samples are genotyped on a large number of markers in the first stage, and a small proportion of these markers are selected and then followed up by genotyping them on the remaining samples in the second stage. Suppose that $c$ is the unit cost of genotyping one marker for each patient, $n$ is the total number of patients assigned across Stages 1 and 2, and $m$ is the total number of markers for each patient. If we had to apply the full-data BH method, the total cost of genotyping all these patients would be $nmc$. If, instead, we apply our proposed methods with a fraction $f$ of the $n$ patients assigned to Stage 1, then the expected total cost would be $fnmc + (1 - f)n[m - E(S(f))]c$, where $S(f)$ is the total number of hypotheses rejected or accepted at the first stage. Thus, for our proposed methods, the expected proportion of saving relative to the maximum possible cost of using the full-data BH method is

$$\frac{(1 - f)\, n\, E(S(f))\, c}{mnc} = \frac{(1 - f)\, E(S(f))}{m}.$$

Table 1 presents the simulated values of this expected proportion of cost saving for our proposed two-stage methods in multiple testing of $m$ (= 100, 1000, or 5000) independent normal means in the present two-stage setting, with a fraction $f$ (= 0.25, 0.50, 0.75, or 1.00) of the total number of patients allocated to the first stage.
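The cost-saving formula of Section 4.3 can be checked directly from the two cost expressions. In this sketch, `n = 200` and `c = 1.0` are arbitrary illustrative values (the saving does not depend on them), and `880.2` is not a reported quantity: it is back-calculated from the Table 1 entry 0.4401 (m = 1000, π0 = 0.9, f = 0.50) as $0.4401 \times 1000 / (1 - 0.5)$. Function names are hypothetical.

```python
def two_stage_cost(n, m, c, f, mean_s):
    """Expected cost f*n*m*c + (1 - f)*n*(m - E[S(f)])*c of the two-stage design."""
    return f * n * m * c + (1.0 - f) * n * (m - mean_s) * c


def expected_cost_saving(n, m, c, f, mean_s):
    """Saving relative to the full-data cost n*m*c; simplifies to (1 - f)*E[S(f)]/m."""
    full = n * m * c  # full-data BH genotypes every patient on every marker
    return (full - two_stage_cost(n, m, c, f, mean_s)) / full


# Illustrative back-calculation from Table 1: E[S(f)] = 880.2 for m = 1000, f = 0.5.
print(expected_cost_saving(n=200, m=1000, c=1.0, f=0.5, mean_s=880.2))
```

Only the product $(1 - f)E(S(f))$ matters, which is why the saving shrinks as more of the sample is allocated to Stage 1 even though $E(S(f))$ itself grows with $f$.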

4.4 Conclusions

Our simulations in Sections 4.1 and 4.2 mimic scenarios with equal allocation of sample size between the two stages. If we measure the performance of a two-stage procedure by how much power improvement it offers over the first-stage BH method, relative to that offered by the ideal, full-data BH method, then Figures 1-8 show that, under such equal allocation, our proposed two-stage FDR controlling procedures with Fisher's combination function do much better than those based on Simes' combination function, at least when the p-values are independent both across hypotheses and across stages. Of course, our procedures based on Simes' combination function also do reasonably well in terms of this measure of relative power improvement: their performance lies roughly between those of the first-stage and the full-data BH methods. Between our two proposed procedures, whether based on Fisher's or Simes' combination function, the BH-TSADC appears to be the better choice when $\pi_0$ is large, say more than 50%, which is often the case in practice. It controls the FDR not only under independence, which is known theoretically, but its FDR control also seems to be maintained under different types of positive dependence, as seen from Figures 9-11. If, however, $\pi_0$ is not large, the Plug-In BH-TSADC procedure provides better control of the FDR, although it might lose FDR control when the statistics generating the p-values exhibit equal or AR(1) type dependence with moderately large equi- or auto-correlation. As also seen from Figures 1-8, there is no appreciable difference in the power performance of the proposed procedures across different choices of the early stopping boundaries. From Table 1, we notice that our two-stage methods can provide large cost savings. For instance, with 90% true nulls and half of the total sample size allocated to the first stage, our procedures can offer a 44% saving relative to the maximum cost of using the ideal, full-data BH method.
This proportion gets larger with an increasing proportion of true nulls or a decreasing proportion of the total sample size allocated to the first stage.

5 A REAL DATA APPLICATION

To illustrate how the proposed procedures can be implemented in practice, we reanalyzed a dataset taken from an experiment by Tian et al. (2003) and post-processed by Jeffery et al. (2006). Zehetmayer et al. (2008) considered these data for a different purpose. In this data set, multiple myeloma samples were generated with Affymetrix Human U95A chips, each consisting of 12,625 probe sets. The samples were split into two groups based on the presence or absence of focal lesions of bone. The original dataset contains gene expression measurements of 36 patients without and 137 patients with bone lytic lesions. However, for the purpose of illustration, we used the gene expression measurements of 36 patients with bone lytic lesions and a control group of the same sample size without such lesions. We considered these data in a two-stage framework, with the first 18 subjects per group for Stage 1 and the next 18 subjects per group for Stage 2. We prefixed the Stage 1 early rejection boundary $\lambda$ at 0.005, 0.010, or 0.015, and the early acceptance boundary $\lambda'$ at 0.5, 0.8, or 0.9, and applied the proposed (alternative versions of the) BH-TSADC and Plug-In BH-TSADC procedures at the FDR level of 0.025. In particular, we considered all $m = 12{,}625$ probe set gene expression measurements for the first-stage data of 36 patients (18 per group) and for the full data of 72 patients (36 per group) across the two stages, and analyzed them based on a stepdown procedure with the critical values $\lambda_i = i\lambda/m$, $i = 1, \ldots, m$, and a stepup procedure with the critical values $\lambda'_i = i\lambda'/m$, $i = 1, \ldots, m$, using the corresponding p-values generated from one-sided t-tests applied to the first-stage data. We noted the probe sets that were rejected by the stepdown procedure and those that were accepted by the stepup procedure.
With these numbers being $r_1$ and $m - s_1$, respectively, we took the probe sets that were neither rejected by the stepdown procedure nor accepted by the stepup procedure, that is, the probe sets with first-stage p-values greater than $r_1\lambda/m$ but less than or equal to $s_1\lambda'/m$, for further analysis using estimated FDRs based on their first- and second-stage p-values combined through Fisher's and Simes' combination functions, as described in the alternative versions of the BH-TSADC and Plug-In BH-TSADC procedures.

The results of this analysis are reported in Table 2. As seen from this table, the BH-TSADC with Fisher's combination function does the best. For instance, with $\lambda = 0.005$ and $\lambda' = 0.9$, the number of additional discoveries it makes over the first-stage BH method is 104/125 = 83.2% of the additional discoveries that the ideal, full-data BH method could make, whereas these percentages are 52/125 = 41.6%, 32/125 = 25.6%, and 16/125 = 12.8% for the BH-TSADC with Simes' combination function, the Plug-In BH-TSADC with Fisher's combination function, and the Plug-In BH-TSADC with Simes' combination function, respectively. This pattern of dominance of the BH-TSADC with Fisher's combination function over the other procedures holds for the other values of $\lambda$ and $\lambda'$ as well.

Table 2: The numbers of discoveries made out of 12,625 probe sets in the Affymetrix Human U95A chips data from Tian et al. (2003) by the BH-TSADC and Plug-In BH-TSADC procedures, each with either Fisher's or Simes' combination function, at the FDR level of 0.025. "Plug-In" denotes the Plug-In BH-TSADC procedure; the last two columns give the BH method applied to the Stage 1 data and to the full data.

                      Fisher's            Simes'              BH
                      BH-TSADC  Plug-In   BH-TSADC  Plug-In   Stage 1   Full
λ = 0.005, λ' = 0.5   84        58        33        17        2         127
λ = 0.005, λ' = 0.8   97        35        42        17        2         127
λ = 0.005, λ' = 0.9   106       34        54        18        2         127
λ = 0.010, λ' = 0.5   74        41        24        13        2         127
λ = 0.010, λ' = 0.8   81        31        30        16        2         127
λ = 0.010, λ' = 0.9   90        31        37        18        2         127
λ = 0.015, λ' = 0.5   56        31        17        12        2         127
λ = 0.015, λ' = 0.8   63        29        23        15        2         127
λ = 0.015, λ' = 0.9   69        27        30        18        2         127

This table provides some additional insights into our procedures. For instance, under positive dependence across hypotheses, which can be assumed to be the case for this data set, the BH-TSADC procedure, with either Fisher's or Simes' combination function, appears to become steadily more powerful with increasing $\lambda'$ for fixed $\lambda$, or with decreasing $\lambda$ for fixed $\lambda'$. Note that our simulation studies did not give us the opportunity to gain this insight.

6 CONCLUDING REMARKS

Our main goal in this article has been to construct two-stage multiple testing procedures that allow early decisions on the null hypotheses, in terms of rejection, acceptance, or continuation to the second stage for further testing with more observations, and that eventually control the FDR. Such a two-stage formulation of multiple testing is of practical importance in many statistical investigations; nevertheless, generalizations of the classical BH-type methods from the single-stage to the present two-stage setting, which seem to be the most natural procedures to consider, had not been put forward until the present work. We have been able to construct two such generalizations with proven FDR control under independence. We have provided simulation results showing their meaningful improvements over the first-stage BH method, relative to those offered by the ideal BH method, in terms of both power and cost saving under independence, and given an example of their utility in practice. We have also presented numerical evidence that the proposed procedures can maintain control over the FDR even under some dependence situations.

It is important to emphasize that the theory behind our proposed two-stage FDR controlling methods has been driven by the idea of setting the early decision boundaries $\lambda < \lambda'$ on the (estimated) FDR at the first-stage p-values, rather than on these p-values themselves. In other words, we flag for rejection (or acceptance) at the first stage those null hypotheses at whose p-values the estimated FDRs are all less than or equal to $\lambda$ (or greater than $\lambda'$) before proceeding to the second stage; see Remark 2. This, we would argue, is often practical and meaningful when testing multiple hypotheses in two stages in an FDR framework.

Brannath et al. (2002) have defined a global p-value $p(p_1, p_2)$ for testing a single hypothesis in a two-stage adaptive design with combination function $C(p_1, p_2)$. With the boundaries $\lambda < \lambda'$ set on each $p_{1i}$, the global p-value for each $H_i$ is defined by

$$p_i \equiv p(p_{1i}, p_{2i}) = \begin{cases} p_{1i} & \text{if } p_{1i} \le \lambda \text{ or } p_{1i} > \lambda',\\ \lambda + H(C(p_{1i}, p_{2i}); \lambda, \lambda') & \text{if } \lambda < p_{1i} \le \lambda'. \end{cases}$$

They have shown that each $p_i$ is stochastically larger than or equal to $U(0, 1)$ when $(p_{1i}, p_{2i})$ satisfies the p-clud property, with equality when $p_{1i}$ and $p_{2i}$ are independently distributed as $U(0, 1)$. So, one may consider the BH method based on the $p_i$. This would control the overall FDR under the assumptions considered in this paper, and possibly under some positive dependence conditions as well. However, it does not set the early decision boundaries on the FDR.
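The global p-value of Brannath et al. (2002) needs $H$ for the chosen combination function. The sketch below codes the piecewise formula for Simes' combination function from Section 3.2, builds the global p-values with the boundaries applied directly to each $p_{1i}$, and runs the BH method on them, as suggested above. It is an illustrative sketch only; the function names and toy p-values are hypothetical.

```python
def simes_c(p1, p2):
    """Simes' combination function C(p1, p2) = min{2 min(p1, p2), max(p1, p2)}."""
    return min(2.0 * min(p1, p2), max(p1, p2))


def h_simes(c, t, tp):
    """Piecewise H(c; t, t') for Simes' combination function (Section 3.2), t < t'."""
    if c <= t:
        return c / 2.0 * (tp - t)
    if c <= min(2.0 * t, tp):
        return c * (c / 2.0 + tp / 2.0 - t)
    if tp < c <= 2.0 * t:
        return c * (tp - t)
    if 2.0 * t < c <= tp:
        return c / 2.0 * (1.0 + tp) - t
    if c <= 2.0 * tp:
        return c / 2.0 * (1.0 + 2.0 * tp) - c * c / 2.0 - t
    return tp - t


def global_p(p1, p2, lam, lam_p):
    """Global p-value with early boundaries lam < lam_p placed on p1 itself."""
    if p1 <= lam or p1 > lam_p:
        return p1
    return lam + h_simes(simes_c(p1, p2), lam, lam_p)


def bh(pvals, alpha):
    """Benjamini-Hochberg step-up; returns the sorted list of rejected indices."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = max((r for r in range(1, m + 1) if pvals[order[r - 1]] <= r * alpha / m), default=0)
    return sorted(order[:k])


# Toy example: stop early for index 0 (p1 <= lambda) and index 3 (p1 > lambda').
p1 = [0.001, 0.02, 0.3, 0.7]
p2 = [0.5, 0.004, 0.01, 0.2]
g = [global_p(a, b, 0.005, 0.5) for a, b in zip(p1, p2)]
print(g, bh(g, 0.05))
```

Here the boundaries act on the individual first-stage p-values, which is exactly the contrast with the BH-TSADC procedures drawn in the paragraph above.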
We proposed our FDR controlling procedures in this paper in a non-asymptotic setting. However, one may consider developing procedures that asymptotically control the FDR by taking the following approach to finding the first- and second-stage thresholds, subject to the early boundaries $\lambda < \lambda'$ and the final boundary $\alpha$ on the FDR.

Given two constants $t < t'$, make an early decision regarding $H_i$ by rejecting it if $p_{1i} \le t$, accepting it if $p_{1i} > t'$, and continuing to test it at the second stage if $t < p_{1i} \le t'$. At the second stage, reject $H_i$ if $C(p_{1i}, p_{2i}) \le c$. Storey's (2002) estimate of the first-stage FDR is given by

$$\widehat{\mathrm{FDR}}^{*}_1(t) = \begin{cases} \dfrac{m\hat\pi_0 t}{R^{(1)}(t)} & \text{if } R^{(1)}(t) > 0,\\ 0 & \text{if } R^{(1)}(t) = 0, \end{cases}$$

for some estimate $\hat\pi_0$ of $\pi_0$. Similarly, the overall FDR can be estimated by

$$\widehat{\mathrm{FDR}}^{*}_{12}(c; t, t') = \begin{cases} \dfrac{m\hat\pi_0\,[t + H(c; t, t')]}{R^{(1)}(t) + R^{(2)}(c; t, t')} & \text{if } R^{(1)}(t) + R^{(2)}(c; t, t') > 0,\\ 0 & \text{if } R^{(1)}(t) + R^{(2)}(c; t, t') = 0. \end{cases}$$

Let $\hat t_\lambda = \sup\{t : \widehat{\mathrm{FDR}}^{*}_1(s) \le \lambda \text{ for all } s \le t\}$, $\hat t_{\lambda'} = \inf\{t : \widehat{\mathrm{FDR}}^{*}_1(s) > \lambda' \text{ for all } s > t\}$, and $\hat c_\alpha(\lambda, \lambda') = \sup\{c : \widehat{\mathrm{FDR}}^{*}_{12}(c; \hat t_\lambda, \hat t_{\lambda'}) \le \alpha\}$. Then reject $H_i$ if $p_{1i} \le \hat t_\lambda$, or if $\hat t_\lambda < p_{1i} \le \hat t_{\lambda'}$ and $C(p_{1i}, p_{2i}) \le \hat c_\alpha(\lambda, \lambda')$. This may control the overall FDR asymptotically under the weak dependence condition and the consistency property of $\hat\pi_0$ (as in Storey, Taylor and Siegmund 2004).

There are a number of other important issues related to the present problem that we have not touched on in this paper but hope to address in separate communications. There are other combination functions, such as the weighted product (Fisher 1932) and the weighted inverse normal (Mosteller and Bush 1954); their performance would be worth investigating. Consideration of the conditional error function (Proschan and Hunsberger 1995) when defining a two-stage design, before constructing FDR controlling methods, is another important issue. Now that we know how to test multiple hypotheses in a two-stage design subject to first-stage boundaries on, and overall control of, the FDR, we should be able to address issues related to sample size determination.
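The asymptotic approach with thresholds $\hat t_\lambda$, $\hat t_{\lambda'}$ and $\hat c_\alpha(\lambda, \lambda')$ described above can be sketched with Fisher's combination function. This is a hypothetical illustration, not a procedure studied in the paper: the suprema and infima are approximated by searching only over observed p-values (first-stage p-values for $t$, combined p-values for $c$), and $\hat\pi_0$ is passed in as a number rather than estimated. The function name is made up for this sketch.

```python
import bisect
import math


def h_fisher(c, t, tp):
    """Formula (2) for Fisher's product combination, 0 <= t < t' <= 1."""
    if c <= 0.0:
        return 0.0
    if c < t:
        return c * math.log(tp / t)
    if c < tp:
        return c - t + c * math.log(tp / c)
    return tp - t


def storey_type_two_stage(p1, p2, lam, lam_p, alpha, pi0=1.0):
    """Discrete sketch of (t_hat_lam, t_hat_lam', c_hat_alpha) and the rejected indices."""
    m = len(p1)
    ps = sorted(p1)

    def fdr1(t):  # Storey-type first-stage estimate m*pi0*t / R^(1)(t)
        r = bisect.bisect_right(ps, t)
        return m * pi0 * t / r if r > 0 else 0.0

    t_lo = 0.0
    for p in ps:                       # walk up until the estimate first exceeds lam
        if fdr1(p) > lam:
            break
        t_lo = p
    t_hi = max([t_lo] + [p for p in ps if fdr1(p) <= lam_p])  # last p with estimate <= lam'
    r_first = bisect.bisect_right(ps, t_lo)                   # R^(1)(t_hat_lam)
    follow = [i for i in range(m) if t_lo < p1[i] <= t_hi]
    q = sorted(p1[i] * p2[i] for i in follow)
    c_hat = None
    for c in q:
        r_second = bisect.bisect_right(q, c)                  # R^(2)(c; t_lo, t_hi)
        est = m * pi0 * (t_lo + h_fisher(c, t_lo, t_hi)) / (r_first + r_second)
        if est <= alpha:
            c_hat = c                                         # largest qualifying candidate
    rejected = [i for i in range(m) if p1[i] <= t_lo
                or (c_hat is not None and t_lo < p1[i] <= t_hi and p1[i] * p2[i] <= c_hat)]
    return t_lo, t_hi, c_hat, rejected


p1 = [0.0001, 0.0004, 0.002, 0.03, 0.2, 0.6, 0.9, 0.95, 0.5, 0.7]
p2 = [0.5] * 10
p2[3], p2[4] = 0.001, 0.9
print(storey_type_two_stage(p1, p2, lam=0.01, lam_p=0.5, alpha=0.05))
```

Unlike the alternative BH-TSADC definitions, the second-stage numerator here includes the first-stage term $t$, so a single level $\alpha$ governs the overall estimate instead of the split $\lambda$ and $\alpha - \lambda$.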

7 Appendix

Proof of Theorem 1.

$$\mathrm{FDR}_{12} = E\left[\frac{V_1 + V_2}{\max\{R_1 + R_2, 1\}}\right] \le E\left[\frac{V_1}{\max\{R_1, 1\}}\right] + E\left[\frac{V_2}{\max\{R_1 + R_2, 1\}}\right].$$

Now,

$$E\left[\frac{V_1}{\max\{R_1, 1\}}\right] = \sum_{i \in J_0} E\left[\frac{I(p_{1i} \le \lambda_{R_1})}{\max\{R_1, 1\}}\right] \le \sum_{i \in J_0} E\left[\frac{I(p_{1i} \le \lambda_{R^{(-i)}_1+1})}{R^{(-i)}_1 + 1}\right],$$

since $I(p_{1i} \le \lambda_{R_1})/\max\{R_1, 1\} \le I(p_{1i} \le \lambda_{R^{(-i)}_1+1})/(R^{(-i)}_1 + 1)$ (as shown in Sarkar, 2008; see also Result 1). And

$$E\left[\frac{V_2}{\max\{R_1 + R_2, 1\}}\right] = \sum_{i \in J_0} E\left[\frac{I(\lambda_{R_1+1} < p_{1i} \le \lambda'_{S_1},\ q_i \le \gamma_{R_1+R_2, S_1},\ S_1 > R_1,\ R_2 > 0)}{R_1 + R_2}\right]. \qquad (3)$$

Writing $R_2$ more explicitly in terms of $R_1$ and $S_1$, we see that the expression in (3) is equal to

$$\begin{aligned}
&\sum_{i \in J_0} \sum_{s_1=1}^{m} \sum_{r_1=0}^{s_1-1} \sum_{r_2=1}^{s_1-r_1} E\left[\frac{I(\lambda_{r_1+1} < p_{1i} \le \lambda'_{s_1},\ q_i \le \gamma_{r_1+r_2, s_1},\ R_1 = r_1,\ S_1 = s_1,\ R_2(r_1, s_1) = r_2)}{r_1 + r_2}\right]\\
&= \sum_{i \in J_0} \sum_{s_1=1}^{m} \sum_{r_1=0}^{s_1-1} \sum_{r_2=1}^{s_1-r_1} E\left[\frac{I(\lambda_{r_1+1} < p_{1i} \le \lambda'_{s_1},\ q_i \le \gamma_{r_1+r_2, s_1},\ R^{(-i)}_1 = r_1,\ S^{(-i)}_1 = s_1 - 1,\ R^{(-i)}_2(r_1, s_1) = r_2 - 1)}{r_1 + r_2}\right]\\
&= \sum_{i \in J_0} \sum_{s_1=0}^{m-1} \sum_{r_1=0}^{s_1} \sum_{r_2=0}^{s_1-r_1} E\left[\frac{I(\lambda_{r_1+1} < p_{1i} \le \lambda'_{s_1+1},\ q_i \le \gamma_{r_1+r_2+1, s_1+1},\ R^{(-i)}_1 = r_1,\ S^{(-i)}_1 = s_1,\ R^{(-i)}_2(r_1, s_1+1) = r_2)}{r_1 + r_2 + 1}\right]\\
&= \sum_{i \in J_0} E\left[\frac{I(\lambda_{R^{(-i)}_1+1} < p_{1i} \le \lambda'_{S^{(-i)}_1+1},\ q_i \le \gamma_{R^{(-i)}_1+R^{(-i)}_2+1,\ S^{(-i)}_1+1})}{R^{(-i)}_1 + R^{(-i)}_2 + 1}\right].
\end{aligned}$$

Thus, the theorem is proved.

Proof of Proposition 1.

$$\begin{aligned}
\mathrm{FDR}_{12} &\le \sum_{i \in J_0} E\left[\frac{\Pr_H(p_1 \le \lambda_{R^{(-i)}_1+1})}{R^{(-i)}_1 + 1}\right] + \sum_{i \in J_0} E\left[\frac{\Pr_H(\lambda_{R^{(-i)}_1+1} < p_1 \le \lambda'_{S^{(-i)}_1+1},\ C(p_1, p_2) \le \gamma_{R^{(-i)}_1+R^{(-i)}_2+1,\ S^{(-i)}_1+1})}{R^{(-i)}_1 + R^{(-i)}_2 + 1}\right]\\
&\le \sum_{i \in J_0} E\left[\frac{\lambda_{R^{(-i)}_1+1}}{R^{(-i)}_1 + 1}\right] + \sum_{i \in J_0} E\left[\frac{\Pr(\lambda_{R^{(-i)}_1+1} < u_1 \le \lambda'_{S^{(-i)}_1+1},\ C(u_1, u_2) \le \gamma_{R^{(-i)}_1+R^{(-i)}_2+1,\ S^{(-i)}_1+1})}{R^{(-i)}_1 + R^{(-i)}_2 + 1}\right]. \qquad (4)
\end{aligned}$$

The first sum in (4) is less than or equal to $\pi_0\lambda$, since $\lambda_{R^{(-i)}_1+1} = [R^{(-i)}_1 + 1]\lambda/m$, and the second sum is less than or equal to $\pi_0(\alpha - \lambda)$, since the probability in the numerator of this sum is equal to

$$H\big(\gamma_{R^{(-i)}_1+R^{(-i)}_2+1,\ S^{(-i)}_1+1};\ \lambda_{R^{(-i)}_1+1}, \lambda'_{S^{(-i)}_1+1}\big) = \frac{[R^{(-i)}_1 + 1 + R^{(-i)}_2](\alpha - \lambda)}{m}.$$

Thus, the proposition is proved.

Proof of Proposition 2. This can be proved as in Proposition 1. More specifically, first note that the FDR here, which we call $\mathrm{FDR}^{*}_{12}$, satisfies

$$\mathrm{FDR}^{*}_{12} \le \sum_{i \in J_0} E\left[\frac{I(p_{1i} \le \lambda_{R^{(-i)}_1+1})}{R^{(-i)}_1 + 1}\right] + \sum_{i \in J_0} E\left[\frac{I(\lambda_{R^{(-i)}_1+1} < p_{1i} \le \lambda'_{S^{(-i)}_1+1},\ q_i \le \gamma_{R^{(-i)}_1+R^{(-i)}_2+1,\ S^{(-i)}_1+1})}{R^{(-i)}_1 + R^{(-i)}_2 + 1}\right], \qquad (5)$$

where $R^{(-i)}_2 \equiv R^{(-i)}_2(R^{(-i)}_1, S^{(-i)}_1 + 1) = \max\{1 \le j \le S^{(-i)}_1 - R^{(-i)}_1 : q^{(-i)}_{(j)} \le \gamma_{R^{(-i)}_1+j+1,\ S^{(-i)}_1+1}\}$, with $q^{(-i)}_{(j)}$ being the ordered versions of the combined p-values excluding $q_i$. As in Proposition 1, the first sum in (5) is less than or equal to $\pi_0\lambda$. Before working with the second sum, first note that the $\gamma$ satisfying Eq. (1), that is, the equation

$$H(\gamma_{r_1+i, s_1}; \lambda_{r_1}, \lambda'_{s_1}) = \frac{(r_1 + i)(\alpha - \lambda)(1 - \lambda')}{m - S_1 + 1},$$

is less than or equal to the $\gamma$ satisfying

$$H(\gamma_{r_1+i, s_1}; \lambda_{r_1}, \lambda'_{s_1}) = \frac{(r_1 + i)(\alpha - \lambda)(1 - \lambda')}{m - S^{(-j)}_1}$$

for any fixed $j = 1, \ldots, m$. So, with $\gamma$ now defined through the second equation (taking $j = i$), the second sum in (5) is less than or equal to

$$\begin{aligned}
&\sum_{i \in J_0} E\left[\frac{I(\lambda_{R^{(-i)}_1+1} < p_{1i} \le \lambda'_{S^{(-i)}_1+1},\ q_i \le \gamma_{R^{(-i)}_1+R^{(-i)}_2+1,\ S^{(-i)}_1+1})}{R^{(-i)}_1 + R^{(-i)}_2 + 1}\right]\\
&= \sum_{i \in J_0} E\left[\frac{H(\gamma_{R^{(-i)}_1+R^{(-i)}_2+1,\ S^{(-i)}_1+1};\ \lambda_{R^{(-i)}_1+1}, \lambda'_{S^{(-i)}_1+1})}{R^{(-i)}_1 + R^{(-i)}_2 + 1}\right] = (\alpha - \lambda)\sum_{i \in J_0} E\left[\frac{1 - \lambda'}{m - S^{(-i)}_1}\right] \le \alpha - \lambda,
\end{aligned}$$

since $\sum_{i \in J_0} E\left[(1 - \lambda')/(m - S^{(-i)}_1)\right] \le 1$; see, for instance, Sarkar (2008, p. 151). Hence,

$$\mathrm{FDR}^{*}_{12} \le \pi_0\lambda + \alpha - \lambda \le \alpha,$$

which proves the proposition.

References

[1] Bauer, P., and Kieser, M. (1999), Combining Different Phases in the Development of Medical Treatments within a Single Trial, Statistics in Medicine, 18, 1833-1848.
[2] Benjamini, Y., and Hochberg, Y. (1995), Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing, Journal of the Royal Statistical Society, Series B, 57, 289-300.
[3] Benjamini, Y., and Hochberg, Y. (2000), On the Adaptive Control of the False Discovery Rate in Multiple Testing with Independent Statistics, Journal of Educational and Behavioral Statistics, 25, 60-83.
[4] Benjamini, Y., Krieger, A., and Yekutieli, D. (2006), Adaptive Linear Step-up False Discovery Rate Controlling Procedures, Biometrika, 93(3), 491-507.
[5] Benjamini, Y., and Yekutieli, D. (2001), The Control of the False Discovery Rate in Multiple Testing under Dependency, Annals of Statistics, 29, 1165-1188.
[6] Blanchard, G., and Roquain, E. (2009), Adaptive FDR Control under Independence and Dependence, Journal of Machine Learning Research, 10, 2837-2871.