Sequential Bonferroni Methods for Multiple Hypothesis Testing with Strong Control of Familywise Error Rates I and II

Size: px

Start display at page:

Download "Sequential Bonferroni Methods for Multiple Hypothesis Testing with Strong Control of Familywise Error Rates I and II"

Herbert Mills
6 years ago
Views:

1 Sequential Bonferroni Methods for Multiple Hypothesis Testing with Strong Control of Familywise Error Rates I and II Shyamal K. De and Michael Baron Department of Mathematical Sciences, The University of Texas at Dallas, Richardson, Texas, USA Abstract: Sequential procedures are developed for simultaneous testing of multiple hypotheses in sequential experiments. Proposed stopping rules and decision rules achieve strong control of both familywise error rates I and II. The optimal procedure is sought that minimizes the expected sample size under these constraints. Bonferroni methods for multiple comparisons are extended to sequential setting and are shown to attain an approximately 50% reduction in the expected sample size compared with the earlier approaches. Asymptotically optimal procedures are derived under Pitman alternative. Keywords: Asymptotic optimality; Familywise error rate; Multiple comparisons; Pitman alternative; Sequential probability ratio test; Stopping rule. Subject Classifications: 62L, 62F03, 62F05, 62H INTRODUCTION The problem of multiple hypothesis testing often arises in sequential experiments such as sequential clinical trials with more than one endpoint see O Brien 1984, Pocock et al. 1987, Jennison and Turnbull 1993, Glaspy et al. 2001, and others, multichannel change-point detection Tartakovsky et al. 2003, Tartakovsky and Veeravalli 2004, acceptance sampling with different criteria of acceptance Baillie 1987, Hamilton and Lesperance 1991, etc. In such experiments, it is necessary to find a statistical answer to each posed question by testing each individual hypothesis instead of combining all the null hypotheses and giving one global answer to the resulting composite hypothesis. This paper develops sequential methods that result in accepting or rejecting each null hypothesis at the final stopping time. Non-sequential methods of multiple comparisons are already well developed. Proposed procedures include Bonferroni and Sidak testing, Holm and Hommel step-down methods controlling familywise error rate, Benjamini-Hochberg step-up and Guo-Sarkar method controlling false discovery rate, and others, see Sidak 1967, Holm 1979, Benjamini and Hochberg 1995, Benjamini et al. 2004, and Sarkar There is a growing demand for sequential analogues of these procedures. Furthermore, sequential tests of single hypotheses can be designed to control both probabilities of Type I and Type II errors. Generalizing to multiple testing, we develop sequential schemes to control both familywise error rates I and II in the strong sense. In the literature of sequential multiple comparisons, two types of testing problems are commonly understood as sequential multiple hypothesis testing. One of them is to compare effects θ 1,..., θ d of d > 1 Address correspondence to Shyamal K. De, Department of Mathematical Sciences, FO 35, The University of Texas at Dallas, 800 West Campbell Road, Richardson, TX , USA; Fax: ; shyamalkd@gmail.com 1

2 treatments with the goal of finding the best treatment, as in Edwards and Hsu 1983, Edwards 1987, Hughes 1993, Betensky 1996, and Jennison and Turnbull 2000, chap. 16. This involves testing of a composite null hypothesis H 0 : θ 1 = = θ d against H A : H 0 not true. The other type of problems is to classify sequentially observed data into one of the desired models, see Armitage 1950, Baum and Veeravalli 1994, Novikov 2009, and Darkhovsky In this case, the parameter of interest θ is classified into one of d alternative hypotheses, namely, H 1 : θ Θ 1 vs,, vs H d : θ Θ d. This paper deals with a different and more general problem of testing d individual hypotheses sequentially Problem Formulation Suppose a sequence of iid random vectors X n, n = 1, 2,...} is observed, where n is a sampled unit and X n = X n 1,..., X n d R d is a set of d measurements, possibly dependent, made on unit n. Let the j th measurement on each sampled unit has a marginal density f j x θ j with respect to a reference measure µ j, j = 1,..., d. While vectors X n are observed sequentially, the d tests are conducted simultaneously H 1 0 : θ 1 Θ 01 vs H 1 A : θ1 Θ A1, H d 0 : θ d Θ 0d vs H d A : θd Θ Ad. Denote the index set of true null hypotheses by J 0 1,..., d} and the set of false null hypotheses by J A = 1,..., d} \ J 0. Further, let ψ j = ψ j X n } equal 1 if H j 0 is rejected and equal 0 otherwise. Then the familywise error rate I FWER I J 0 = P ψ j > 0 J 0 j J0 is defined as the probability to reject at least one true null hypothesis and therefore, commit at least one Type I error, the familywise error rate II FWER II J A = P 1 ψ j > 0 J A, j J A is the probability of accepting at least one false null hypothesis and committing at least one Type II error, and the familywise power is the probability of rejecting all the false null hypotheses, FP = 1 FWER II e.g., Lee 2004, chap. 14. This paper develops stopping rules and multiple decision rules that control both FWER I and FWER II in the strong sense, i.e., FWER I = sup FWER I J 0 α and FWER II = sup FWER II J A β 1.2 J 0 J A for pre-specified levels α and β. This task alone is not difficult to accomplish. For example, one can conduct d separate sequential probability ratio tests, one for each null hypothesis, at levels α/d and β/d each. However, this leaves room for optimization. Our goal is to control both familywise error rates at the 2

3 minimum sampling cost. Namely, an optimization problem of minimizing the expected sample size ET under the strong control of FWER I and FWER II is investigated. A battery of one-sided tests H j 0 : θ j θ j 0 vs H j A : θj > θ j 0, j = 1,..., d}, is considered, which is typical for clinical trials of safety and efficacy. For example, in a case study of early detection of lung cancer Lange et al. 1994, chap. 15, 30,000 men at high risk of lung cancer were recruited sequentially over 12 years for testing three one-sided hypotheses whether intensive screening improves detectability of lung cancer, whether it reduces mortality due to lung cancer, and whether a surgical treatment is more efficient for the reduction of mortality rate than a nonsurgical treatment. In another study, a crossover trial of chronic respiratory diseases Pocock et al. 1987, Tang and Geller 1999 involved 17 patients with asthma or chronic obstructive airways disease to test any harmful effect of an inhaled drug based on four measures of the standard respiratory function Existing Approaches and Our Objective Sequential methods for simultaneous testing of multiple hypotheses have recently received a new wave of attention. Jennison and Turnbull 2000, chap. 15 develop group sequential methods of testing several outcome measures of a clinical trial. Their sampling strategy is to stop at the first time when one of the tests is accepted or rejected. Protecting FWER I conservatively, all the null hypotheses that are not rejected at the stopping time are accepted. While strongly controlling rate of falsely rejected null hypotheses, this testing scheme sacrifices power significantly. Tang and Geller 1999 introduce a closed testing procedure for group sequential clinical trials with multiple endpoints. Their method controls FWER I and yields power that is at least the power obtained by the Jennison-Turnbull procedure. In a recent work by Bartroff and Lai 2010, conventional methods of multiple testing such as Holm s step-down procedure and closed testing technique have been combined to develop a multistage step-down procedure that also controls FWER I. Their method does not assume any special structure of the hypotheses such as closedness or any correlation structure among different test statistics and can be applied to any group sequential, fully sequential, or truncated sequential experiments. All these methods mainly aim at controlling FWER I. To the best of our knowledge, no theory has been developed to control both FWER I and FWER II and also to design a sequential procedure that yields an optimal expected sampling cost under these constraints. The next Section extends Bonferroni methods for multiple comparisons to sequential experiments. Existing stopping rules are evaluated and modified to achieve lower familywise error rates or lower expected sampling cost. In Section 3, asymptotically optimal testing procedures are derived under Pitman alternative. Controlling FWER I and FWER II in the strong sense, they attain the asymptotically optimal rate of the expected sample size. Finite-sample performances of the proposed schemes are evaluated in a simulation study in Section 4. All lengthy proofs are moved to the Appendix. 2. SEQUENTIAL BONFERRONI METHODS In this section, we propose and compare four sequential approaches for testing d hypotheses controlling FWER I and FWER II that are based on the Bonferroni inequality. Assuming the Monotone Likelihood Ratio MLR property of the likelihood ratios and introducing a practical region of indifference θ j 0, θj A for parameter θj, the one-sided testing considered in Section 1 becomes equivalent to the test of H j 0 : θ j = θ j 0 vs H j A : θj = θ j A for all j = 1,..., d. 3

4 Following Neyman-Pearson arguments for the uniformly most powerful test and Wald s sequential probability ratio test, our methods are based on log-likelihood ratios Λ j n = n i=1 ln f jx j i θ j A f j X j i θ j 0 = n i=1 Z j i for j = 1,..., d. 2.1 With stopping boundaries a j and b j, after n observations, we reject H j 0 if Λ j n Λ j n b j, and continue sampling if Λ j n a j, accept H j 0 if b j, a j. The SPRT stopping time for the j th test is therefore T j = inf } n : Λ j n / b j, a j. 2.2 If Type I and Type II errors for the j th test are α j and β j, Wald s approximation gives the approximate stopping boundaries as 1 βj βj a j ln and b j ln. 2.3 α j 1 α j Now, consider any stopping time τ based on the likelihood ratios Λ j n and some stopping boundaries b j, a j such that Type I and Type II errors of the j th test are controlled at levels α j and β j respectively for j = 1,..., d. The strong control of the familywise error rates I and II at levels α and β follows from the Bonferroni inequality. That is, FWER I FWER II P Type I error on j th test J 0 = α j α, P Type II error on j th test J 0 = β j β for any index set J 0 of true null hypotheses Choice of Stopping and Decision Rules In this Subsection, we discuss stopping and decision rules that appeared in the sequential multiple testing literature and propose their improvements. One multiple decision sequential rule is proposed in Jennison and Turnbull 2000, chap Tmin rule. Jennison and Turnbull proposed a Bonferroni adjustment for the sequential testing of multiple hypotheses. According to their procedure, sampling is terminated as soon as at least one test statistics Λ j n leaves the continue-sampling region b j, a j. This stopping rule that we call Tmin is therefore T min = mint 1,..., T d, where stopping times T j are defined in 2.2. If Λ k n falls in its continue-sampling region at time T min for some k j, then H k 0 is considered accepted. For the stopping boundaries 2.3, the Tmin rule controls FWER I conservatively. However, due to an early stopping the Tmin rule fails to control FWER II and, hence, does not yield a high familywise power. In order to control both FWER I and II, sampling must continue beyond T min. 2. Incomplete T rule. For the control of both Type I and Type II error probabilities for testing each parameter θ j, we propose to continue sampling the j th coordinate at n = T j until Λ j n / b j, a j and 4

5 Accept H 1 Reject H 2 T int 5 Accept both a 2 b 2 Λ 2 n b 1 a 1 Reject both 0 Λ 1 n 5 T 5 T min Reject H 1 Accept H 2 Figure 1. Graphical Display of Tmin, T complete and incomplete, and intersection rule, d = 2. to reject H j 0 if and only if Λ j T j a j. Based on Wald s approximation 2.3, this rule yields approximate control of both Type I and Type II error probabilities at α j and β j for the j th test for each j = 1,..., d. This rule that we call incomplete T continues sampling until results for all tests are obtained, T = T 1,..., T d T min. It is incomplete because for all j with T j < T, only a portion of the available data X j 1,..., Xj T j is utilized to reach a decision. Sobel and Wald 1949 used a similar approach for a different sequential problem of classifying the unknown mean of a normal distribution into one of three regions. According to their rule, testing based on the j th likelihood ratio j = 1, 2 stops as soon as it leaves its continue-sampling region. Using the incomplete T rule, the number of used measurements taken on each sampling unit decreases in the course of this sequential scheme, although the cost of each unit e.g., patient is roughly the same. Typically, it is of no extra cost to ask patients to complete the entire questionnaire instead of a portion of it. Moreover, it may be important to include the entire new observed vectors because it may be necessary to retest some of the hypotheses. Indeed, if one or several test statistics Λ j n return to the continue-sampling region after crossing a stopping boundary once, and the corresponding hypothesis is retested based on a larger sample size n, it may further reduce the rate of false discoveries and false non-discoveries. Therefore, we propose an improvement of the incomplete T decision rule that is based on the same stopping time T but utilizes all the available data. 3. Complete T rule. As a plausible improvement over the incomplete T rule, we propose the complete T rule that continues sampling each coordinate of X and testing the corresponding hypothesis until time T. The sample size requirement for both complete and incomplete T rules are the same although the complete rule utilizes T T j additional observations for testing the j th hypothesis. Based on all the available data at time T, different versions of the terminal decision rules can be considered. Following the SPRT, one can accept or reject H j 0 depending on whether Λ j T is below the acceptance boundary b j or above the rejection boundary a j. However, some of the log-likelihood ratios may fall in the continue-sampling region b j, a j at T. An additional boundary c j [b j, a j ] has to be 5

6 pre-chosen to decide on the j th test. In Section 4, complete and incomplete T rules are compared by a simulation study; superiority of the complete rule is established with a certain choice of c j. Alternatively, we propose a randomized terminal decision rule that is based on a sufficient statistics for θ = θ 1,..., θ d at T unlike the incomplete T rule and show that it strongly controls FWER I and FWER II. Theorem 2.1. The randomized decision rule Dj that rejects Hj 0 at time T with probability p j = P Λ j T j a j Λ T, T strongly controls Type I and Type II error probabilities at levels α j and β j respectively. Proof. Let ψ j = I Λ j T j a j be the j th test function for the incomplete T decision rule D j and L θ j, D j = I θ j = θ j 0, reject Hj 0 + I θ j = θ j A, accept Hj 0 be the loss function for j th test. Also, note that Λ n = Λ 1 n,..., Λ d n is a sufficient statistic for θ for any n, and therefore, Λ T, T is a sufficient statistic Govindarajulu 1987, sect Hence, p j = E ψ j Λ T, T is free of θ. By the Rao-Blackwell theorem for randomized rules in Berger 1985, 36, we have EL θ j, Dj = EL θ j, D j. Since the the incomplete T decision rule Dj controls Type I and Type II error probabilities at levels α j and β j, the randomized rule Dj controls error probabilities at the same levels. 4. Intersection rule. To avoid sampling termination in the continue-sampling region of some coordinate and yet to use all coordinates of the already sampled random vectors, we propose the intersection rule. According to it, sampling continues until all log-likelihood ratios leave their respective continue-sampling regions, d T int = inf n : Λ j n / b j, a j. This is the first moment when SPRT-type decisions are obtained for all d tests simultaneously. Clearly, T int T a.s. The difference, however, is likely to be small, and it appears only when a log-likelihood ratio statistic for some j, having crossed the test boundary once, curved back into the continue-sampling region b j, a j. The following lemma gives sufficient conditions for T int to be a proper stopping time. Lemma 2.1. If for each j = 1..., d, the Kullback-Leibler information numbers } Kθ j, θ j = E θ j log f j X j θ j /f j X j θ j are strictly positive and finite for θ j θ j, then P T int < = 1. This lemma follows from the weak law of large numbers; see De and Baron 2012 for the complete proof. Consequently, T is also a proper stopping time. The strong control of FWER I and II by the intersection rule is shown in the following two statements. Lemma 2.2. Let τ be any stopping time with respect to σx 1,..., X n satisfying P Λ j τ b j, a j = 0 for a j > 0, b j < 0 for all j d. with the terminal decision rule of rejecting H j 0 if and only if Λ j τ a j. For such a test P Type I error on j th test = P Λ j τ P Type II error on j th test = P Λ j τ a j H j 0 e a j for all j = 1,..., d, b j H j A eb j for all j = 1,..., d. 6

7 The proof is given in De and Baron Corollary 2.1. With stopping boundaries a j = ln α j and b j = ln β j for j = 1,..., d, the intersection rule strongly controls FWER I and FWER II at levels α = α j and β = β j respectively. Proof. As a result of Lemma 2.2, Type I and Type II error probabilities are controlled at levels α j and β j for a j = ln α j and b j = ln β j. The corollary is established by applying the Bonferroni inequality and choosing α j, β j such that d α j α and d β j β. The three stopping times compared in this subsection are displayed in Fig. 1, based on a sample path of a two-dimensional log-likelihood ratio random walk. As seen in this figure, stopping time T min occurs when one of the coordinates of Λ n crosses the corresponding stopping boundary, T occurs when all the coordinates of Λ n have crossed their boundaries at least once, and finally, T int occurs when all the coordinates are beyond the boundaries simultaneously. In the case of T min and T int, the crossed boundaries determine the decision rules. For the data in Fig. 1, hypothesis H 1 0 is rejected at time T min, but later, both H 1 0 and H 2 0 are accepted at T int. At time T, the incomplete T rule rejects H 1 0 and accepts H 2 0 based on the first crossings. However, since Λ 1 T b 1, a 1 lands in the continue-sampling region, other decision rules can also be chosen, including the complete T rule with the decision boundary c j or the randomized rule D. 3. ASYMPTOTIC OPTIMALITY UNDER PITMAN ALTERNATIVE In this section, we give an asymptotically optimal solution to the problem considered in the previous sections. Namely, we find the optimal allocation of individual error probabilities α j and β j that yields the asymptotically lowest rate of the expected sample size subject to the constraints on familywise error rates I and II in the strong sense. Asymptotic analysis of the considered stopping times is based on the asymptotics of likelihood ratio and its expectation Kullback-Leibler information under Pitman alternative. The rigorous formulation of this optimization problem under Pitman alternative is as follows. Consider an array of sequences of observed random vectors X 11, X 21,, X i1 X 1ν, X 2ν,, X iν where X iν = X 1 iν,..., Xd iν. Observations X j iν in the νth row have a marginal density pmf or pdf f j. θ ν j for all j = 1,..., d. A battery of d simultaneous tests H j : θj ν = θ j 0 vs H j Aν : θj ν = θ j 0 + ϵ jν for j = 1,..., d, is considered, where min j d ϵ jν 0 as ν. The j th log-likelihood ratio statistic for the ν th row is then n Λ j n,ν = ln f j X j iν θj 0 + ϵ jν n i=1 f j X j iν = Z j θj i,ν for j = 1,..., d. 0 i=1 } Based on this statistic, let T jν = inf n : Λ j n,ϵ jν / B jν, A jν be the single-hypothesis Wald s SPRT stopping time for testing H j vs Hj Aν with stopping boundaries A jν = ln 1 β jν β jν and B jν = ln, α jν 1 α jν 7

8 where α jν and β jν are the individual Type I and Type II error probabilities allocated to the j th test. Also, let Tν = T 1ν,..., T dν be the corresponding stopping time for incomplete and complete T rules. In this section, we study the asymptotic behavior of stopping times T 1ν,..., T dν, and Tν as ν and also find the asymptotically optimal allocation of FWER I and FWER II among the d tests in order to minimize the weighted expected stopping time E π Tν = πj 0 E Tν J 0, 3.1 J 0 1,...,d} where πj 0 are weights or prior probabilities assigned to all possible combinations J 0 of true null hypotheses. This problem implies the asymptotic optimization of stopping boundaries B jν, A jν : j = 1,..., d}, or equivalently, the optimal allocation of individual error probabilities α jν, β jν : j = 1,..., d} among d tests. Asymptotic analysis presented here requires the following regularity conditions on the families of distributions f j. θ j : 1 j d }. Regularity Conditions Simplifying notations, we omit index j from θ j, H j, f j, µ j, etc., and consider the following problem. Suppose X 1, X 2,... are iid random variables with density f. θ. Consider testing of H 0 : θ = θ 0 vs H A : θ θ A, where θ A = θ 0 + ϵ. Define a log-likelihood ratio for one variable as Z ϵ = lnfx θ 0 + ϵ/fx θ 0 }. Assume that R1. The density f. θ is identifiable, i.e., θ θ implies P θ fx 1 θ fx 1 θ } > 0. R2. Identity fx θdx = 1 can be differentiated twice under the integral sign. R3. Derivatives l x, θ and l x, θ of log-likelihood lx, θ = ln fx θ are bounded by integrable functions in some neighborhood of θ 0. That is, there are functions L 1 x and L 2 x and δ > 0 such that θ θ 0 < δ implies E θ L j X 1 < for j = 1, 2, and l x, θ 2 L 1 x and l x, θ L 2 x µ-a.s. R4. The moment generating function Ms, ϵ = M Zϵ s = E exps Z ϵ } of Z ϵ exists for some s > 0, and for this s, 3 ϵ 3 Ms, ϵ is bounded for all ϵ within a neighborhood of 0. Conditions R1, R2, and the first part of R3 correspond to conditions A 0, RR3, and R of Borovkov 1997, chap. 2, 16, 26, and 31. Asymptotics of Log-likelihood Ratio Under regularity conditions R1-R4, the log-likelihood ratio Z ϵ is expanded as Z ϵ = ln fx θ 0 + ϵ ln fx θ 0 = ϵl X, θ 0 + ϵ2 2 l X, θ 0 + R 2,ϵ, by the Taylor theorem, where the Lagrange form of the remainder is R 2,ϵ = ϵ3 6 l X, θ and θ [θ0, θ A ]. 8

9 By regularity condition R3, E R 2,ϵ ϵ3 6 E L 2X = Oϵ 3. Therefore, under H 0, E θ0 Z ϵ = ϵe θ0 l X, θ 0 + ϵ2 2 E θ 0 l X, θ 0 + Oϵ 3 = ϵ2 2 Iθ 0 + Oϵ Under H A, using conditions R1-R4, [ E θa Z ϵ = E θa ln fx θ ] A ϵ fx θ A = ϵe θa l X, θ A ϵ2 2 E θ A l X, θ A + Oϵ 3 = ϵ2 2 Iθ A + Oϵ Next, E θ0 Z ϵ = Kθ 0, θ 0 + ϵ and E θa Z ϵ = Kθ A, θ A ϵ, where Kθ 0, θ A = ln fx θ 0 fx θ A fx θ 0dx is the Kullback-Leibler information distance between f. θ 0 and f. θ A. Thus, the first-order Taylor expansion of Z ϵ is Z ϵ = ϵl X, θ 0 + ϵ2 2 l X, θ 1 where θ 1 [θ 0, θ A ]. Therefore, for θ in ϵ-neighborhood of θ 0, E θ Z 2 ϵ = ϵ 2 E θ l X, θ ϵeθ l X, θ 0 l X, θ 1 + ϵ2 4 E θ l X, θ } 2 1 ϵ 2 E θ l X, θ 0 } 2 + ϵ E θ l X, θ 0 2 E θ l X, θ ϵ2 4 E θ L 1 X ϵ 2 E θ l X, θ Oϵ } c 1 ϵ for some constant c 1 > 0. The boundedness of E θ l X, θ 0 2 follows from condition R3 because E θ l X, θ 0 2 E θ l X, θ 2 + θ θ0 E θ l X, θ 2 Eθ l X, θ + ϵeθ l X, θ 2 E θ L1 X + ϵe θ L 1 X E θ L 1 X + ϵe θ L 2 1X, by the Jensen inequality. Let Λ n,ϵ = n i=1 Z i,ϵ be the log-likelihood ratio of n variables. Note that EΛ 2 n,ϵ = n i=1 EZ 2 i,ϵ + i j nc 1 ϵ 2 + nn 1 EZ ϵ 2 nϵ 2 c 1 + n 1kϵ 2, EZ i,ϵ EZ j,ϵ for some positive constant k. The last inequality holds as 3.2 and 3.3 hold. Therefore, EΛ 2 n,ϵ cnϵ 2, 3.5 9

10 for any c c 1 +n 1kϵ 2. Inequalities 3.2, 3.3, 3.4, and 3.5 are frequently used to prove asymptotic results in the later sections. The last auxiliary result establishes the upper bounds for Chernoff entropy ρ ϵ = inf s E e sz ϵ under both null and alternative hypotheses. Its proof is given in the Appendix. Lemma 3.1. There exist positive constant ϵ 0, K 0, and K A such that for all ϵ ϵ 0, i inf E θ 0 e sz ϵ 1 K 0 ϵ 2, 0<s<1 ii inf E θ 1<s<0 A e sz ϵ 1 KA ϵ 2. Asymptotics of SPRT Stopping Time Next, we derive the asymptotic expressions for the single-hypothesis SPRT stopping time T SPRT ϵ = inf n Λ n,ϵ / B, A} for testing H 0 : θ = θ 0 vs H A : θ θ 0 + ϵ with stopping boundaries A and B. If Type I and Type II error probabilities are given as α and β respectively, Wald s approximation for the expected stopping time gives under H 0, and under H A, α ln E θ0 Tϵ SPRT 1 β α + 1 α ln β 1 α, 3.6 Kθ 0, θ 0 + ϵ 1 β ln 1 β E θa Tϵ SPRT α + β ln β 1 α. 3.7 Kθ A, θ A ϵ If the underlying density f satisfies regularity conditions R1-R4, inequalities 3.2 and 3.3 yield, for ϵ = θ A θ 0, Kθ 0, θ A = ϵ2 2 Iθ 0 + Oϵ 3 and Kθ A, θ 0 = ϵ2 2 Iθ A + Oϵ 3 = ϵ2 2 Iθ 0 + Oϵ Therefore, as ϵ 0 under H 0, and under H A, α ln E θ0 Tϵ SPRT 1 β ln E θa Tϵ SPRT 1 β α + 1 α ln β 1 α ϵ 2 Iθ 0 /2 1 β α + β ln β 1 α ϵ 2 Iθ 0 /2 1 + o1 = k 0T SPRT ϵ ϵ o1, o1 = k AT SPRT ϵ ϵ o As seen from 3.9 and 3.10, under both null and alternative hypotheses, ETϵ SPRT as ϵ 0. The next lemma establishes that the stopping time Tϵ SPRT converges to in probability at the rate of ϵ 2. Lemma 3.2. Under regularity conditions R1-R4, 1 P gϵ ϵ2 Tϵ SPRT gϵ 1 as ϵ 0, for any function gϵ such that gϵ as ϵ 0. Corollary 3.1. ϵ 2 T SPRT ϵ is bounded away in probability from 0 and as ϵ 0. The proofs of Lemma 3.2 and Corollary 3.1 are given in Appendix. 10

11 Asymptotics of Weighted Expected Stopping Time T From 3.9 and 3.10, we have E θj T jν 1 ϵ 2 jν for θ j θ j 0, θj A }, as ν, if α jν and β jν are bounded away from 0 for j = 1,..., d. The following lemma provides a lower bound for weighted expected T defined in 3.1. Lemma 3.3. If α jν and β jν are bounded away from 0, there exist constants k j such that } E w Tν j d Proof. Since Tν T jν a.s. for all j d, E w Tν = π J 0 E J0 Tν where J 0 1,...,d} = π j E H j is the marginal prior probability of H j 0 k j ϵ 2 jν 1 + o1. J 0 1,...,d} π J 0 E J0 T jν T jν + 1 π j E j H T jν, 3.11 Aν π j = E w Tν 1 j d J 0 1,...,d}, j J 0 π J 0. Maximizing 3.11 for j = 1,..., d, π j E j H T jν + 1 π j E j H T jν Aν } Next, from 3.9 and 3.10, since α jν and β jν are bounded away from 0, there exist constant k 0j and k Aj such that E j H T jν = k 0j 1 + o1 and E ϵ 2 j jν H T jν = k Aj 1 + o1. The proof is completed by ϵ Aν 2 jν substituting these expressions into As ν, E w Tν converges to at least at the rate of j d ϵ 2 jν. A natural question arises about the best distribution of error probabilities α j and β j, α j = α, β j = β, among d tests. Is equal error spending optimal? If not, is there an asymptotically optimal choice of error probabilities α jν, β jν such that E w Tν attains its lower bound? The answers are given in the following two subsections Case 1: One Test More Difficult than Others In general, by the difficulty of a test we understand the closeness between the tested null and alternative hypotheses that can be measured by Kullback-Leibler distance between the corresponding distributions. In view of 3.8, an equivalent measure is the absolute difference ϵ between the null and alternative parameters. Let the d th test be the most difficult test in order if ϵ dν ϵ jν, that is, ϵ dν = oϵ jν for all j = 1,..., d 1. Suppose the complete T rule rejects H j if and only if Λj Tν,ν c j for some fixed boundary c j [ln 1 β j α j, ln β j 1 α j ]. Define Type I and II error probabilities of j th test at Tν as α j T ν = P Λ j T ν c j H j and βj T ν = P Λ j T < c ν j H j Aν. The following theorem proves that if the d th test is the most difficult in order, then FWER I and FWER II are controlled asymptotically at levels α d and β d respectively. The proof of this theorem is given in Appendix. 11

12 Theorem 3.1. If ϵ dν = oϵ jν for all j = 1,..., d 1, as ν, then α j T ν α d T ν = o1 and β j Tν = o1 for j = 1,..., d 1, = α d + o1 and β d Tν = β d + o1. According to Theorem 3.1, all error probabilities of less difficult tests converge to zero whereas for the most difficult test, they converge to positive constants α d and β d. Hence, the familywise error rates are controlled asymptotically at levels α d and β d instead of α and β. For this reason, equal allocation of error probabilities is not optimal. Indeed, if equal errors α/d and β/d are allocated to all d tests, FWER I and II are controlled at levels lower than required, which has to be at the cost of larger sample size. The following theorem establishes that an optimal Type I and Type II error probability allocation distributes the error probabilities mostly to the most difficult test. Such allocation of error probabilities leads to asymptotically efficient sequential tests that yield asymptotically optimal rate of the expected sample size. Theorem 3.2. Let ϵ dν ϵ jν for j < d, and α ν 0, β ν 0 are chosen so that ln w j α ν = o ϵjν ϵ dν 2 and ln w j β ν = o for w j > 0, d 1 1 w j = 1, and j = 1,..., d 1. Then, error probabilities α dν = α α ν, β dν = β β ν, and, α jν = w j α ν, β jν = w j β ν for j < d, 2 ϵjν are asymptotically optimal as E w Tν attains the asymptotically lower bound E w Tν = k d 1 ϵ 2 + o dν ϵ 2, dν for a constant k d > 0. Proof. The error constraints are satisfied, ϵ dν α jν = α and β jν = β. Using 3.9, we have for all j = 1,..., d 1, ϵ 2 dν E H j T jν = ϵdν ϵ jν 2 α jν ln 1 βjν α jν + 1 α jν ln Iθ j 0 /2 + O ϵ jν βjν 1 α jν 0, as ν, because ϵdν ϵ jν 2 ln βjν 0 from the choice of β jν. Similarly, ϵ 2 dν E H j T jν 0, Aν as ν for j = 1,..., d 1. Thus E w T jν = π j E H j 1 T jν + 1 π j E j H T jν = o Aν ϵ 2 dν for j = 1,..., d 1. 12

13 Using 3.9 and 3.10 for testing H d 0 vs H d, we have where k d = π d α ln E w T dν = π d E H d A T dν + 1 π d E d H T dν = k d Aν ϵ 2 dν 1 β α + 1 α ln β 1 α Iθ d 0 /2 + 1 π d 1 β ln A lower bound for E w Tν is given by } E w Tν π d E d H T dν + 1 π d E d H T dν = k d Aν ϵ 2 dν and an upper bound for E w Tν is obtained as Hence, E w Tν = k d ϵ 2 dν E w Tν + o 1. ϵ 2 dν 3.2. Case 2: Tests of Same Order of Difficulty E w T jν = k d ϵ 2 dν 1 + o ϵ 2. dν 1 + o ϵ 2, dν 1 β α + β ln β 1 α Iθ d 0 / o ϵ 2, dν Here, we consider a situation when all tests have a common order of difficulty, that is, ϵ jν = r j ϵ ν for j = 1,..., d, where r 1,..., r d are constants, and ϵ ν 0 as ν. The following theorem proves that one should not allocate an error probability that is too large or too small to any particular test in this case. Theorem 3.3. An asymptotically optimal allocation of error probabilities α jν, β jν is such that α jν and β jν are bounded away from 0 for all j = 1,..., d and ν = 1, 2,..., and α jν = α, β jν = β. Then k j 1 j d rj 2 + o1 ϵ 2 νe w Tν k j r 2 j + o1, where k j = π j k 0 T jν + 1 π j k A T jν, functions k 0 and k A being defined in 3.9 and Proof. An upper bound for E w Tν is obtained as E w Tν } π j E j H T jν + 1 π j E j H T jν = Aν k j r 2 j ϵ2 ν 1 + o1 }, as α jν, β jν are bounded away from 0. Also, from Lemma 3.3, a lower bound for E w Tν is } E w Tν j d k j r 2 j ϵ2 ν 1 + o1. 13

14 Thus k j 1 j d rj 2 + o1 ϵ 2 νe w Tν k j r 2 j + o1. Theorems 3.2 and 3.3 establish the asymptotically optimal error spending strategy among the d tests that attains the smallest rate of the expected sample size under the specified constraints on FWER I and II. 4. SIMULATION STUDY Sequential procedures for testing multiple hypotheses proposed above and in the literature are compared in this section. To cover all the interesting cases, simultaneous testing of several Normal means µ j and Bernoulli proportions p j is considered under a strong control of FWER I and FWER II at levels α = 0.05 and β = 0.1 respectively. Comparison of Six Schemes with Uniform Error Spending Table 1 compares performance of six multiple testing procedures including the non-sequential Holm method from Holm 1979, the multi-stage step-down procedure from Bartroff and Lai 2010, Tmin, incomplete T, complete T, and intersection rules for the following multiple testing scenarios. A sequence of random vectors X 1, X 2,... R 3 is observed, where X i = X 1 i, X 2 i, X 3 i, X j i Nµ j, 1 for j = 1, 2, and X 3 i Bernoullip. Three tests about parameters µ 1,2 and p are conducted, H 1 : µ 1 = 0 vs µ 1 = 0.5, H 2 : µ 2 = 0 vs µ 2 = 0.5, and H 3 : p = 0.5 vs p = 0.75, with the uniform error spending α j, β j = α/d, β/d, j = 1, 2, 3, and stopping boundaries b j, a j = ln β j, ln α j for the log-likelihood ratio statistics. We consider the following multiple testing scenarios. Multiple testing 1: H 1 0, H2 0, and H3 0 are true, and the coordinates of X i are independent. Multiple testing 2: H 1 0, H2 0, and H3 A are true, and the coordinates of X i are independent. Multiple testing 3a: H 1 0, H2 A, and H3 A are true, and the coordinates of X i are independent. Multiple testing 3b: H 1 0, H2 A, and H3 A are true, and the two normal components X1 i and X 2 i have correlation coefficient For each situation, the estimated actual FWER I and II are presented as percentage, along with the estimated expected sample size ÊT and its standard error. Results are based on 55,000 simulated sequences. Discussion of Results in Table 1 In all the multiple testing scenarios, our proposed incomplete T, complete T, and intersection rule yield uniformly lower error rates than non-sequential Holm and Bartroff-Lai s multi-stage procedure with approximately 50% lower expected sample size. For complete T rule, the decision boundary c j = a j + b j /2 is used for all j = 1,..., d. As expected, complete T rule yields lower error rates than incomplete T rule while using the same sample size. Tmin rule controls FWER I conservatively and certainly saves on a sample size, however, it is at the expense of a high FWER II leading to a significant reduction of familywise power. The estimated expected sample size and FWER I for the Bartroff-Lai procedure are adopted from Bartroff and Lai

15 Table 1. Comparison of six multiple testing schemes for 3 tests. Multiple Tests Scheme FWER I % FWER II % ÊT S.E. of ÊT non-sequential Holm 4 NA 105 NA multiple testing 1 Bartroff-Lai 4.8 NA NA Tmin 0.86 NA incomplete T 3.77 NA complete T 3.37 NA intersection 2.24 NA non-sequential Holm 4.4 NA 105 NA multiple testing 2 Bartroff-Lai 4.2 NA 98.3 NA Tmin incomplete T complete T intersection non-sequential Holm 4.3 NA 105 NA multiple testing 3a Bartroff-Lai 3.2 NA 92.3 NA Tmin incomplete T complete T intersection non-sequential Holm 4.5 NA 105 NA multiple testing 3b Bartroff-Lai 3.6 NA 94.2 NA Tmin incomplete T complete T intersection Asymptotically Optimal Error Spending Consider now the asymptotic setup where one or more parameters under the alternative hypotheses are close to their corresponding null values. Again, parameters µ j and p j of Nµ j, 1 and Bernoullip j distribution are tested with stopping boundaries b j, a j = ln β j, ln α j. The following testing scenarios are compared. In 4a, 4b, 5a, and 5b, one test is more difficult than the others. Ignoring this, tests 4a and 5a continue to use the uniform error allocation, which is no longer optimal, as shown in Section 3. Conversely, tests 4b and 5b spend the error probabilities according to Theorem 3.2, allocating most of them to the most difficult test. In test 6, all four tests are equally difficult, so uniform error allocation is optimal by Theorem 3.3. Multiple testing 4a: Test a battery of d = 4 hypotheses: H 1 : µ 1 = 0 vs µ 1 = 0.1, H 2 : µ 2 = 0 vs µ 2 = 0.5, and H j : p j = 0.5 vs p j = 0.75 for j = 3, 4 with uniform error spending α j, β j = α/d, β/d for j = 1,..., 4. The set of true null hypotheses is J 0 = 1, 3}. Multiple testing 4b: Tests in 4a are conducted with Type I error probabilities 0.04, 0.004, 0.003, and Type II error probabilities 0.08, 0.008, 0.006, 0.006, allocating most of the errors to the most difficult test and ensuring 4 α j = 0.05 and 4 β j = 0.1. Multiple testing 5a: Same tests as in 4a, with J 0 = 3, 4}, i.e., H 3 0 and H 4 0 are true. Multiple testing 5b: Same tests as in 4b, with J 0 = 3, 4}. Multiple testing 6: Test H j : µ j = 0 vs µ j = 0.1 for j = 1,..., 4, with uniform error spending 15

16 α j, β j = α/4, β/4, and J 0 = 3, 4}. For each multiple testing procedure, Table 2 contains the estimates of FWER I and FWER II in percentages, expected stopping time ÊT, expected SPRT stopping times ÊT j for marginal tests H j, j = 1,..., 4, and marginal Type I and Type II error probabilities ˆα j and ˆβ j in percentages. Results are based on 55,000 simulated sequences. Discussion of Results in Table 2 Proposed schemes control FWER I and FWER II at 5% and 10% levels respectively. In all the testing scenarios, incomplete and complete T rule require the same sample size, but the complete T rule yields lower FWER I and FWER II than the incomplete T rule. This reduction in error rates is due to utilizing the full sample for each test and using a sufficient statistic based on T samples. Compared to the incomplete and complete T rule, the intersection rule reduces both error rates even further, but requires greater expected sample size. Equal allocation of error probabilities α/4, β/4 on each test of multiple testing 4a yields much larger expected sample size 731 compared to the expected sample size 478 obtained in multiple testing 4b. This reduction in expected sample size is due to allocating large portions of error probabilities 0.04 and 0.08 to the first test which is the most difficult in this case. Similarly, expected sample size reduces from 850 in multiple testing 5a to 571 in multiple testing 5b as most of the error probabilities are allocated to the most difficult test. These simulation results support Theorem APPENDIX Proof of Lemma 3.1 Taylor expansion of M 0 s, ϵ = E θ0 e sz ϵ yields M 0 s, ϵ = M 0 s, 0 + ϵm 0 s, 0 + ϵ2 2 M 0 s, 0 + R 02,ϵ, 5.1 where M 0 s, 0 = 1 and R 02,ϵ = M 0 s, ϵ ϵ3 6 = Oϵ3 for some 0 ϵ ϵ because M 0 s, ϵ is bounded by regularity condition R4. Differentiating M 0 s, ϵ, obtain [ ϵ E θ 0 e sz ϵ fx θ0 + ϵ s ] [ ] fx θ0 + ϵ s 1 f X θ 0 + ϵ = Eθ0 = E θ0 s. ϵ fx θ 0 fx θ 0 fx θ 0 [ ] Then, from R2, M 0 s, 0 = se f X θ 0 θ0 fx θ 0 = 0. Differentiating M 0 s, ϵ again at ϵ = 0, f M X θ s, 0 = ss 1E θ0 + se θ0 fx θ 0 Therefore 5.1 becomes M 0 s, ϵ = 1 s1 siθ 0 ϵ2 2 + Oϵ3. There exists ϵ 1 such that for all ϵ ϵ 1 and for fixed s 0, 1, f X θ 0 = s1 siθ 0. fx θ 0 s1 siθ 0 + Oϵ Hence, for any ϵ ϵ 1 and K 0 = s1 siθ 0 /4, s1 s Iθ 0. 2 inf M 0s, ϵ 1 K 0 ϵ 2. 0<s<1 16

17 Table 2. Comparing expected sample size and error rates obtained from the three schemes when some of the four tests are difficult. Multiple Tests Scheme FWER I % FWER II % ÊT ÊT 1,..., ÊT 4 ˆαj% ˆβj % incomplete T ˆα1, ˆα3 = 1.16, 1.06 ˆβ 2, ˆβ 4 = 1.89, 1.8 multiple testing 4a 730.7, 36.4, 27.2, 33.7 complete T ˆα1, ˆα3 = 1.16, ˆβ 2, ˆβ 4 = 0.005, Intersection ˆα1, ˆα3 = 1.16, 0 ˆβ 2, ˆβ 4 = 0, incomplete T ˆα1, ˆα3 = 3.54, 0.23 ˆβ 2, ˆβ 4 = 0.64, 0.46 multiple testing 4b 477.1, 46.2, 37.4, 45.5 complete T ˆα1, ˆα3 = 3.54, 0.01 ˆβ 2, ˆβ 4 = 0.04, Intersection ˆα1, ˆα3 = 3.53, ˆβ 2, ˆβ 4 = 0.009, incomplete T ˆα3, ˆα4 = 1.06, 1.07 ˆβ 1, ˆβ 2 = 2.36, 1.88 multiple testing 5a 850.1, 36.4, 27.2, 27.2 complete T ˆα3, ˆα4 = 0, ˆβ 1, ˆβ 2 = 2.36, 0 Intersection ˆα3, ˆα4 = 0, 0 ˆβ 1, ˆβ 2 = 2.36, 0 incomplete T ˆα3, ˆα4 = 0.23, 0.27 ˆβ 1, ˆβ 2 = 7.39, 0.64 multiple testing 5b 570.9, 46.2, 37.4, 37.3 complete T ˆα3, ˆα4 = 0.002, ˆβ 1, ˆβ 2 = 7.39, Intersection ˆα3, ˆα4 = 0, 0 ˆβ 1, ˆβ 2 = 7.37, 0 incomplete T ˆα3, ˆα4 = 1.14, 1.12 ˆβ 1, ˆβ 2 = 2.17, 2.3 multiple testing , 848.3, 729.3, complete T ˆα3, ˆα4 = 0.93, 0.91 ˆβ 1, ˆβ 2 = 1.62, 1.74 Intersection ˆα3, ˆα4 = 0.48, 0.45 ˆβ 1, ˆβ 2 = 0.83,

18 This proves i of Lemma 3.1. Similarly to 5.1, Taylor expansion of M A s, ϵ = E θa e sz ϵ yields M A s, ϵ = M A s, 0 + ϵm A s, 0 + ϵ2 2 M A s, 0 + R A2,ϵ, 5.2 where M A s, 0 = 1 and R A2,ϵ = M A s, ϵ ϵ3 6 = Oϵ3 for some 0 ϵ ϵ, because M A s, ϵ is bounded R4. Along the same steps, we derive M A s, 0 = 0 and M A s, 0 = ss + 1Iθ A. Therefore, 5.2 becomes M A s, ϵ = 1 ϵ2 2 [1 + s siθ A + Oϵ]. There exists ϵ 2 such that for all ϵ ϵ 2 and for fixed s 1, 0, 1 + s siθ A + Oϵ Hence, for any ϵ ϵ 2 and K A = 1 + s siθ A /4, 1 + s s Iθ A. 2 inf M As, ϵ 1 K A ϵ 2. 1<s<0 Both i and ii of Lemma 3.1 hold for all ϵ ϵ 0 = minϵ 1, ϵ 2. This completes the proof. Proof of Lemma 3.2 For an arbitrary ϵ > 0, 1 P gϵ ϵ2 Tϵ SPRT gϵ = 1 P ϵ 2 Tϵ SPRT 1 P ϵ 2 Tϵ SPRT > gϵ. 5.3 gϵ For all i = 1, 2,..., using 3.2 and 3.3, µ ϵ = E Z i,ϵ kϵ 2 for some positive constant k. Let A 0 = mina, B. Then P ϵ 2 Tϵ SPRT 1 = P Tϵ SPRT 1 gϵ ϵ 2 gϵ P P 1 A 0 1 n 1 ϵ 2 gϵ 1 n 1 ϵ 2 gϵ k gϵ 2 Λ n,ϵ A 0 Λ n,ϵ nµ ϵ A 0 k gϵ 1 ϵ 2 gϵ n=1 EZ n,ϵ µ ϵ The last inequality in 5.4 follows from the Kolmogorov imal inequality for independent random variables e.g., Billingsley From 3.4, EZ n,ϵ µ ϵ 2 c 1 ϵ 2 for some c 1 > 0. Therefore, 5.4 implies P ϵ 2 Tϵ SPRT 1 c 1 /gϵ gϵ 2 = o1 as ϵ A 0 k/gϵ 18

19 From the Markov inequality, P ϵ 2 T SPRT ϵ ϵ > gϵ ϵ2 ET SPRT. gϵ Then, using 3.9 and 3.10, ϵ 2 ETϵ SPRT = O1 as ϵ 0. Hence, The proof is completed by applying 5.5 and 5.6 in 5.3. Proof of Corollary 3.1 c 1 δ P ϵ 2 T SPRT ϵ gϵ = o1 as ϵ Consider δ such that A 0 < 1. Such a choice of δ exists because the quadratic form in δ has discriminant kδ 2 c A 1 0kc 1 > 0. Substituting δ for gϵ in the inequality 5.5, obtain and therefore, Thus, ϵ 2 T SPRT ϵ P ϵ 2 T SPRT ϵ P ϵ 2 T SPRT ϵ δ > δ 1 c 1 δ A 0 kδ 2, c 1 δ A 0 kδ 2 > 0. is bounded away from 0 in probability as ϵ 0. Next, applying the Markov inequality, lim λ Hence, ϵ 2 T SPRT ϵ = O P 1 as ϵ 0. lim sup P ϵ 2 Tϵ SPRT > λ lim lim sup ϵ 0 λ ϵ 0 Proof of Theorem 3.1 Let c d = A d = ln 1 βd α d, and B d = ln βd 1 α d. Consider α d T = P ν d H = P H d Λ d Tν Λ d Tν,ν A d, Tν = T dν + P d H,ν A d + o1 = α d + o1, ϵ 2 ETϵ SPRT = 0. λ Λ d Tν,ν A d, T ν T dν because P Tν T dν = o1. The last equality is subject to Wald s approximation ignoring the overshoot for the d th test. Similarly, β d T = P ν d H Aν = P H d Aν Λ d Tν Λ d Tν,ν < A d, Tν = T dν + P d H Aν,ν < A d + o1 = P d H Λ d Aν ν T Λ d Tν,ν B d,ν < A d, Tν T dν + o1 = β d + o1. 19

20 Then, for any sequence of positive integers g ν as ν such that g ν = o ϵ 2 dν and j<d ϵ 2 jν = o g ν, P Tν < g ν P T dν < g ν P Λ d 1 m g ν m,ν A 0d where A 0d = min A d, B d g ν E Z m,ν d 2 A0d Cϵ 2 dν g 2 using 5.4 ν m=1 K d ϵ 2 dν g ν A0d Cϵ 2 dν g ν 2 = o 1, where the last inequality follows from 3.4. Here, C and K d are some positive constants. For all j = 1, 2,..., d 1, α j T P ν j Λ j H T ν,ν c j, Tν g ν + P j H Tν < g ν for any 0 s 1. Note that P H j = P H j m g ν m i=1 e s m g ν e s m i=1 Zj i,ν } Z j i,ν c j m i=1 Zj i,ν m=n e sc j + o1 + o1 is a nonnegative supermartingale. Applying Theorem 1 in Lake 2000, which is analogous to the Doob s imal inequality for nonnegative submartingales, P j H e s m i=1 Zj i,ν e sc j e sc j E j m g ν H e s gν i=1 Z j i,ν An upper bound of α j T ν is obtained by taking infimum over all s, inf e s gν i=1 Z j i,ν α j T ν 0 s 1 e sc j E j H inf = E 0 s 1 H j inf E 0 s 1 H j e s gν i=1 Z j i,ν + o 1 e szj i,ν } gν + o 1 + o 1 1 L j ϵ 2 jν} gν + o 1 for some large ν = o 1. The last inequality is obtained by using Lemma 3.1. A similar approach proves that for all j < d, β j ν 0 as ν. T ACKNOWLEDGEMENTS This research is supported by the National Science Foundation grant DMS Research of the second author is partially supported by the National Security Agency grant H We thank the associate editor and professor Mukhopadhyay for their help and attention to our work. 20

21 REFERENCES Armitage, P Sequential Analysis with More Than Two Alternative Hypotheses, and Its Relation to Discriminant Function Analysis, Journal of Royal Statistical Society, Series B 12: Baillie, D. H Multivariate Acceptance Sampling Some Applications to Defence Procurement, Statistician, 36: Bartroff, J. and Lai, T. L Multistage Tests of Multiple Hypotheses, Communications in Statistics- Theory and Methods, 39: Baum, C. W. and Veeravalli, V. V A Sequential Procedure for Multihypotheses Testing, IEEE Transactions on Information Theory, 40: Benjamini, Y. and Hochberg, Y Controlling the False Discovery Rate: a Practical and Powerful Approach to Multiple Testing, Journal of Royal Statistical Society, Series B 57: Benjamini, Y., Bertz, F. and Sarkar, S Recent Developments in Multiple Comparison Procedures, IMS Lecture Notes - Monograph Series, Beachwood, Ohio: Institute of Mathematical Statistics. Berger, J. O Statistical Decision Theory, New York: Springer-Verlag. Betensky, R. A An O Brien-Fleming Sequential Trial for Comparing Three Treatments, Annals of Statistics, 24: Billingsley, P Probability and Measure, New York: Wiley. Borovkov, A. A Matematicheskaya Statistika Mathematical Statistics, Novosibirsk: Nauka. Darkhovsky, B Optimal Sequential Tests for Testing Two Composite and Multiple Simple Hypotheses, Sequential Analysis, 30: De, S. K. and Baron, M Step-up and Step-down Methods for Testing Multiple Hypotheses in Sequential Experiments, Journal of Statistical Planning and Inference, DOI: /j.jspi Edwards, D Extended-Paulson Sequential Selection, Annals of Statistics, 15: Edwards, D. G. and Hsu, J. C Multiple Comparisons with the Best Treatment, Journal of American Statistical Association, 78: Glaspy, J., Jadeja, J. S., Justice, G., Kessler, J., Richards, D., Schwartzberg, L., Rigas, J., Kuter, D., Harmon, D., Prow, D., Demetri, G., Gordon, D., Arseneau, J., Saven, A., Hynes, H., Boccia, R., O Byrne, J., and Colowick, A. B A Dose-Finding and Safety Study of Novel Erythropoiesis Stimulating Protein NESP for the Treatment of Anemia in Patients Receiving Multicycle Chemotherapy, British Journal of Cancer, 84: Govindarajulu, Z The Sequential Statistical Analysis of Hypothesis Testing, Point and Interval Estimation, and Decision Theory, Columbus, Ohio: American Sciences Press. Hamilton, D. C. and Lesperance, M. L A Consulting Problem Involving Bivariate Acceptance Sampling by Variables, Canadian Journal of Statistics, 19: Holm, S A Simple Sequentially Rejective Multiple Test Procedure, Scandinavian Journal of Statistics, 6: Hughes, M. D Stopping Guidelines for Clinical Trials with Multiple Treatments, Statistics in Medicine, 12: Jennison, C. and Turnbull, B. W Group Sequential Tests for Bivariate Response: Interim Analyses of Clinical Trials with Both Efficacy and Safety Endpoints, Biometrics, 49: Jennison, C. and Turnbull, B. W Group Sequential Methods with Applications to Clinical Trials, Boca Raton: Chapman & Hall. Lake, D. E Minimum Chernoff Entropy and Exponential Bounds for Locating Changes, IEEE Transactions on Information Theory, 46:

22 Lange, N., Ryan, L., Billard, L., Brillinger, D., Conquest, L., and Greenhouse, J Case Studies in Biometry, New York: Wiley. Lee, M.-L. T Analysis of Microarray Gene Expression Data, Boston: Kluwer Academic Publishers. Novikov, A Optimal Sequential Multiple Hypothesis Testing, Kybernetika, 45: O Brien, P. C Procedures for Comparing Samples with Multiple Endpoints, Biometrics, 40: Pocock, S. J., Geller, N. L., and Tsiatis, A. A The Analysis of Multiple Endpoints in Clinical Trials, Biometrics, 43: Sarkar, S. K Step-up Procedures Controlling Generalized FWER and Generalized FDR, Annals of Statistics, 35: Sidak, Z Rectangular Confidence Regions for the Means of Multivariate Normal Distributions, Journal of American Statistical Association, 62: Sobel, M. and Wald, A A Sequential Decision Procedure for Choosing One of Three Hypotheses Concerning the Unknown Mean of a Normal Distribution, Annals of Mathematical Statistics, 20: Tang, D.-I. and Geller, N. L Closed Testing Procedures for Group Sequential Clinical Trials with Multiple Endpoints, Biometrics, 55: Tartakovsky, A. G. and Veeravalli, V. V Change-Point Detection in Multichannel and Distributed Systems with Applications, In N. Mukhopadhyay, S. Datta and S. Chattopadhyay, Eds., Applications of Sequential Methodologies, pages , New York: Marcel Dekker, Inc. Tartakovsky, A. G., Li, X. R., and Yaralov, G Sequential Detection of Targets in Multichannel Systems, IEEE Transactions on Information Theory, 49:

Multiple Testing in Group Sequential Clinical Trials

Multiple Testing in Group Sequential Clinical Trials Tian Zhao Supervisor: Michael Baron Department of Mathematical Sciences University of Texas at Dallas txz122@utdallas.edu 7/2/213 1 Sequential statistics