An Alpha-Exhaustive Multiple Testing Procedure

Current Research in Biostatistics

Original Research Paper

Mark Chang, Xuan Deng and John Balser
Boston University, Boston MA, USA
Veristat, Southborough MA, USA

Article history
Received: 9-07-06
Revised: 07-08-06
Accepted: 06--06

Corresponding Author: Mark Chang, Boston University, Boston MA, USA and Veristat, Southborough MA, USA. Email: mark.chang@veristat.com

Abstract: A multiple testing procedure can be a single-step procedure such as Bonferroni's method or a stepwise procedure such as Hochberg's stepup method and Hommel's method. It can be an $\alpha$-exhaustive or $\alpha$-conservative approach. We develop a single $\alpha$-exhaustive procedure that can improve power -5% over Hochberg's and Hommel's methods in common situations when the test statistics are mutually independent. The method can also be generalized to dependent test statistics. The idea behind our method is to construct the rejection rules using the product of marginal p-values and to control the upper bounds of the $k$th order terms so that $\alpha$ is controlled for any configuration of null hypotheses. Such upper bounds, or critical values, are determined progressively from $k = 1$ towards $k = K$, the number of null hypotheses in the problem.

Keywords: Multiple Testing, Alpha-Exhaustive, Hypothesis Test

Introduction

Multiple testing problems are common in pharmaceutical statistics and in the life sciences in general. The main goal of Multiple Testing Procedures (MTP) is to (strongly) control the familywise type-I error rate. An MTP can be a single-step data-independent procedure such as Bonferroni's method or a data-dependent stepwise procedure such as Hochberg's stepup method and Hommel's stepup method. It can be an $\alpha$-exhaustive or $\alpha$-conservative approach. Conceptually, stepwise procedures are usually more powerful than single-step procedures and $\alpha$-exhaustive procedures are usually more powerful than $\alpha$-conservative approaches.
However, when these two aspects are considered together, comparisons of testing procedures are not that simple and often depend on the configuration of the alternative "hypotheses" or, more precisely, the truths. For example, the power of a fallback procedure depends on the weights and the effect sizes in the alternative hypotheses. A fixed sequence testing procedure is a special case of the fallback procedure and its power depends heavily on the order in which the hypotheses are tested.

We develop a simple single $\alpha$-exhaustive procedure that can improve power -5% over Hochberg's and Hommel's methods in common situations when the test statistics are independent. The method can also be generalized to dependent test statistics. The idea behind our method is to construct the rejection rules using the product of marginal p-values and to control the upper bounds of the $k$th order terms so that $\alpha$ is exhausted for any configuration of null hypotheses. Such upper bounds, or critical values, are determined progressively from $k = 1$ towards $k = K$, the number of null hypotheses in the problem. Unlike common stepwise test procedures, in which every step in the decision rule involves only one critical value for decision-making, the proposed $\alpha$-exhaustive approach is a single-step method with multiple critical values involved in the decision rules.

The paper is organized as follows. In section 2, we review several important stepwise test procedures that will be used in our power comparisons. In section 3, we elaborate our progressive $\alpha$-exhaustive procedure for two-hypothesis testing, outline the idea, derive the formulations for critical values and provide examples of using this procedure in comparison with other methods. We also provide the power formulation for the $\alpha$-exhaustive procedure for two-hypothesis testing. In section 4, we provide power comparisons among several different methods using simulation.
In section 5, we extend the $\alpha$-exhaustive procedure to three-hypothesis testing and compare it with Hommel's procedure in power under broad conditions. In section 6, we further describe the $\alpha$-exhaustive procedure for general K-hypothesis testing and simulation algorithms for determining the critical values. In the last section, discussion and summary are provided. We place the mathematical derivations in the Appendix. To make the procedure ready for practical use, we have included the SAS code in the Appendix.

© 2016 Mark Chang, Xuan Deng and John Balser. This open access article is distributed under a Creative Commons Attribution (CC-BY) 3.0 license.

Mark Chang et al. / Current Research in Biostatistics 2016, 6 (): 30. DOI: 10.3844/amjbsp.2016.30.

Multiple Testing Procedures

Stepwise procedures differ from single-step procedures in that a stepwise procedure must follow a specific order to test each hypothesis. In general, stepwise procedures are more powerful than single-step procedures. There are three categories of stepwise procedures, depending on how the stepwise tests proceed: stepup, stepdown and fallback procedures. The commonly used stepwise procedures include the Bonferroni-Holm stepdown method (Holm, 1979), the Holm stepdown method (Dmitrienko et al., 2009, p.65), Hommel's stepup procedure (Hommel, 1988), Hochberg's stepup method (Hochberg, 1988), the fallback procedure (Wiens, 2003) and the sequential test with fixed sequences (Westfall et al., 1999).

Stepdown Procedure

A stepdown procedure starts with the most significant p-value and ends with the least significant p-value. In the procedure, the p-values are arranged in ascending order, i.e., from the smallest to the largest:

$$p_{(1)} \le p_{(2)} \le \dots \le p_{(K)} \quad (1)$$

with the corresponding hypotheses:

$$H_{(1)}, H_{(2)}, \dots, H_{(K)} \quad (2)$$

The test proceeds from $H_{(1)}$ to $H_{(K)}$. If $C_k p_{(k)} > \alpha$ $(k = 1, \dots, K)$, retain all $H_{(i)}$ $(i \ge k)$; otherwise, reject $H_{(k)}$ and continue to test $H_{(k+1)}$. The constants $C_k$ are different for different procedures. The adjusted p-values are:

$$\tilde{p}_1 = C_1 p_{(1)}, \qquad \tilde{p}_k = \max\left(\tilde{p}_{k-1},\, C_k p_{(k)}\right), \quad k = 2, \dots, K \quad (3)$$

Therefore, an alternative test procedure is to compare the adjusted p-values against the unadjusted $\alpha$. After adjusting the p-values, one can test the hypotheses in any order.

Fallback Procedure (Wiens, 2003)

The Holm procedure is based on a data-driven order of testing, while the fixed-sequence procedure is based on a prefixed order of testing. A compromise between them is the so-called fallback procedure.
The fallback procedure was introduced by Wiens (2003) and was further studied by Dmitrienko et al. (2006) and Hommel and Bretz (2008). The test procedure can be outlined as follows. In the fallback procedure, we allocate the overall error rate $\alpha$ among the hypotheses according to their weights $w_k$, where $w_k \ge 0$ and $\sum_k w_k = 1$. For the fixed sequence test, $w_1 = 1$ and $w_2 = \dots = w_K = 0$:

Step 1: Test $H_1$ at $\alpha_1 = \alpha w_1$. If $p_1 \le \alpha_1$, reject this hypothesis; otherwise retain it. Go to the next step.
Step $i = 2, \dots, K-1$: Test $H_i$ at $\alpha_i = \alpha_{i-1} + \alpha w_i$ if $H_{i-1}$ is rejected and at $\alpha_i = \alpha w_i$ if $H_{i-1}$ is retained. If $p_i \le \alpha_i$, reject $H_i$; otherwise retain it. Go to the next step.
Step $K$: Test $H_K$ at $\alpha_K = \alpha_{K-1} + \alpha w_K$ if $H_{K-1}$ is rejected and at $\alpha_K = \alpha w_K$ if $H_{K-1}$ is retained. If $p_K \le \alpha_K$, reject $H_K$; otherwise retain it.

The formula for the adjusted p-value is too complicated to be written explicitly.

Stepup Procedure

A stepup procedure starts with the least significant p-value and ends with the most significant p-value. The procedure proceeds from $H_{(K)}$ to $H_{(1)}$. If $C_k p_{(k)} \le \alpha$ $(k = K, \dots, 1)$, reject all $H_{(i)}$ $(i \le k)$; otherwise, retain $H_{(k)}$ and continue to test $H_{(k-1)}$. The adjusted p-values are:

$$\tilde{p}_K = C_K p_{(K)}, \qquad \tilde{p}_k = \min\left(\tilde{p}_{k+1},\, C_k p_{(k)}\right), \quad k = K-1, \dots, 1 \quad (4)$$

Progressive $\alpha$-Exhaustive Testing Procedure

An $\alpha$-exhaustive procedure is a closed testing procedure based on intersection hypothesis tests whose size is exactly $\alpha$. In other words, $\Pr(\text{Reject } H_I) = \alpha$ for any intersection hypothesis $H_I$, $I \subseteq \{1, \dots, K\}$. Put simply, in an $\alpha$-exhaustive procedure, the supremum of the probability of false rejection in any null hypothesis configuration is equal to $\alpha$. Many stepwise test procedures have been developed that are not necessarily $\alpha$-exhaustive. Therefore, there is room for improvement. However, an $\alpha$-exhaustive procedure is not necessarily a powerful test. A fixed sequence test is an $\alpha$-exhaustive test, but it is often the least powerful test if the sequence of tests is inappropriately chosen.

Let us discuss the situation of two-hypothesis testing:

$$H_o: H_1 \cap H_2 \quad \text{versus} \quad H_a: \bar{H}_1 \cup \bar{H}_2 \quad (5)$$

Here $\bar{H}_k$ is the negation of $H_k$, $k = 1, 2$. In this setting, if either $H_1$ or $H_2$ is rejected, the null hypothesis $H_o$ is rejected. Let $p_1$ and $p_2$ be the marginal p-values for testing $H_1$ and $H_2$, respectively. The decision rules of the progressive $\alpha$-exhaustive testing procedure are:

If $p_1 p_2 \le \alpha_1$ and $p_1 \le \alpha$, then reject $H_1$.
If $p_1 p_2 \le \alpha_2$ and $p_2 \le \alpha$, then reject $H_2$.

where the critical values satisfy $\alpha_1 > 0$ and $\alpha_2 > 0$. The idea behind this procedure is to borrow strength among marginal p-values. In plain language, the procedure says that we do not have to make an adjustment, as long as $p_1 \le \alpha$ and the other p-value $p_2$ is small. For example, if $p_1 = \alpha$ and $p_2$ = 0.0, we can reject $H_1$. The values $\alpha_1$ and $\alpha_2$ are determined so that when both $H_1$ and $H_2$ are true, the type-I error will not exceed $\alpha$. The procedure can control the Familywise Error Rate (FWER) strongly but at the same time exhaust all $\alpha$ under all the null hypothesis configurations: $H_1 \cap \bar{H}_2$, $\bar{H}_1 \cap H_2$ and $H_1 \cap H_2$. This is done progressively as described below.

Step 1: Consider controlling the familywise error rate at level $\alpha$ when only $H_1$ is true and $H_2$ is not true. In this case $p_1 p_2 \le \alpha_1$ can be satisfied with probability 1 (if, for example, the test drug is very effective, $p_2$ will be virtually always 0). Therefore, to control FWER, a necessary condition is $\sup \Pr(p_1 p_2 \le \alpha_1 \cap p_1 \le \alpha \mid H_1 \cap \bar{H}_2) = \sup \Pr(p_1 \le \alpha \mid H_1) = \alpha$. Therefore, type-I error is strongly controlled and exhausted when $H_1 \cap \bar{H}_2$ is true. By the same token, we can reach the same conclusion when $\bar{H}_1 \cap H_2$ is true.

Step 2: Now we need to determine $\alpha_1$ and $\alpha_2$ to exhaust $\alpha$ when $H_1 \cap H_2$ is true. In this study, we only consider the case when the two test statistics, $p_1$ and $p_2$, are independent. Under the global null hypothesis $H_0$, $p_1$ and $p_2$ are iid $U(0, 1)$, which is equivalent to the two standard normal test statistics $z_1 = -\Phi^{-1}(p_1)$ and $z_2 = -\Phi^{-1}(p_2)$ being independent under $H_0$. However, working on the p-scale, the testing procedure can be used for different endpoints (normal, binary, survival), as long as $p_1$ and $p_2$ are independent and stochastically equal to or larger than uniform $p_1$ and $p_2$.
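The two decision rules above translate directly into code. The following is a minimal sketch, not the authors' SAS implementation: the function name `alpha_ex_reject` and the p-values in the usage lines are hypothetical, and the critical values are passed in as plain inputs (the values 0.025 and 0.004855 are illustrative, assuming independent test statistics).

```python
def alpha_ex_reject(p1, p2, alpha, alpha1, alpha2):
    """Two-hypothesis rule: reject H_i when the product p1*p2 is below
    its critical value AND the marginal p-value is at most alpha.
    Returns the pair (reject_H1, reject_H2)."""
    prod = p1 * p2
    reject_h1 = prod <= alpha1 and p1 <= alpha
    reject_h2 = prod <= alpha2 and p2 <= alpha
    return reject_h1, reject_h2


# Hypothetical inputs chosen only to exercise the rule.
ALPHA, A1, A2 = 0.025, 0.004855, 0.004855

# p1 is below alpha and the product 0.002 is below alpha1, so H1 is
# rejected; p2 exceeds alpha, so H2 is retained.
print(alpha_ex_reject(0.02, 0.10, ALPHA, A1, A2))

# Here the product 0.01 exceeds both critical values: nothing is rejected,
# even though p1 alone is below alpha.
print(alpha_ex_reject(0.02, 0.50, ALPHA, A1, A2))
```

The second call illustrates the "consistency of evidence" aspect of the rule: a small marginal p-value is not enough when the other p-value is large.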
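Since $T = p_1 p_2 < \alpha_1$ implies $p_2 < \alpha_1 / p_1$, we have the conditional cdf for $T$:

$$F_T(\alpha_1 \mid p_1) = \Pr(T < \alpha_1 \mid p_1) = \begin{cases} 1, & p_1 \le \alpha_1 \\ \alpha_1 / p_1, & p_1 > \alpha_1 \end{cases} \quad (6)$$

Fig. 1. $\Pr(p_1 p_2 \le \alpha_1 \cap p_1 \le \alpha)$

If $\alpha_1 \ge \alpha$, then $\Pr(p_1 p_2 \le \alpha_1 \cap p_1 \le \alpha \mid H_1, H_2) = \Pr(p_1 \le \alpha \mid H_1, H_2) = \alpha$. If $\alpha_1 < \alpha$, then (see Fig. 1 for a geometric interpretation):

$$\Pr(p_1 p_2 \le \alpha_1 \cap p_1 \le \alpha) = \int_0^{\alpha} F_T(\alpha_1 \mid p_1)\, dp_1 = \int_0^{\alpha_1} dp_1 + \int_{\alpha_1}^{\alpha} \frac{\alpha_1}{p_1}\, dp_1 = \alpha_1 + \alpha_1 \ln\frac{\alpha}{\alpha_1} \quad (7)$$

Thus:

$$\Pr(p_1 p_2 \le \alpha_1 \cap p_1 \le \alpha) = \begin{cases} \alpha, & \alpha_1 \ge \alpha \\ \alpha_1 + \alpha_1 \ln\dfrac{\alpha}{\alpha_1}, & \alpha_1 < \alpha \end{cases} \quad (8)$$

Consequently, the type-I error rate under $H_1 \cap H_2$ is given by (for simplicity, we just write FWER$(H_1 \cap H_2)$):

$$\begin{aligned} \text{FWER}(H_1 \cap H_2) &= \Pr\big((p_1 p_2 \le \alpha_1 \cap p_1 \le \alpha) \cup (p_1 p_2 \le \alpha_2 \cap p_2 \le \alpha)\big) \\ &= \Pr(p_1 p_2 \le \alpha_1 \cap p_1 \le \alpha) + \Pr(p_1 p_2 \le \alpha_2 \cap p_2 \le \alpha) - \Pr\big(p_1 p_2 \le \min(\alpha_1, \alpha_2) \cap p_1 \le \alpha \cap p_2 \le \alpha\big) \\ &= \alpha_1 + \alpha_1 \ln\frac{\alpha}{\alpha_1} + \alpha_2 + \alpha_2 \ln\frac{\alpha}{\alpha_2} - \alpha^2, \quad \text{for } \alpha^2 \le \min(\alpha_1, \alpha_2) \end{aligned} \quad (9)$$

In (9), we have used the following result: if $\alpha^2 \le \min(\alpha_1, \alpha_2)$, then:

$$\Pr\big(p_1 p_2 \le \min(\alpha_1, \alpha_2) \cap p_1 \le \alpha \cap p_2 \le \alpha\big) = \Pr(p_1 \le \alpha \cap p_2 \le \alpha) = \alpha^2 \quad (10)$$

However, if $\alpha^2 > \min(\alpha_1, \alpha_2)$, then the probability becomes (Fig. 2):

$$\Pr\big(p_1 p_2 \le \alpha_{\min} \cap p_1 \le \alpha \cap p_2 \le \alpha\big) = \int_0^{\alpha_{\min}/\alpha} \alpha\, dp_1 + \int_{\alpha_{\min}/\alpha}^{\alpha} \frac{\alpha_{\min}}{p_1}\, dp_1 = \alpha_{\min} + \alpha_{\min} \ln\frac{\alpha^2}{\alpha_{\min}} \quad (11)$$

To summarize the type-I error rates under various null configurations, we have:

$$\text{FWER}(H_1 \cap \bar{H}_2) = \text{FWER}(\bar{H}_1 \cap H_2) = \alpha \quad (12)$$

$$\text{FWER}(H_1 \cap H_2) = \begin{cases} 2\alpha - \alpha^2, & \alpha_{\min} \ge \alpha \\ \alpha_1 + \alpha_1 \ln\dfrac{\alpha}{\alpha_1} + \alpha_2 + \alpha_2 \ln\dfrac{\alpha}{\alpha_2} - \alpha^2, & \alpha^2 \le \alpha_{\min} < \alpha \\ \alpha_1 + \alpha_1 \ln\dfrac{\alpha}{\alpha_1} + \alpha_2 + \alpha_2 \ln\dfrac{\alpha}{\alpha_2} - \alpha_{\min} - \alpha_{\min} \ln\dfrac{\alpha^2}{\alpha_{\min}}, & \alpha_{\min} < \alpha^2 \end{cases} \quad (13)$$

where $\alpha_{\min} = \min(\alpha_1, \alpha_2)$.

The values $\alpha_1$ and $\alpha_2$ are determined so that FWER$(H_1 \cap H_2) = \alpha$. We are not interested in $\alpha_{\min} \ge \alpha$, because then $p_1 p_2 \le \alpha_i$ in the rejection criteria has no effect. In fact, FWER$(H_1 \cap H_2) = 2\alpha - \alpha^2 = \alpha$ has no solution for any $\alpha$ between 0 and 1. We are not interested in $\alpha_{\min} < \alpha^2$ either, because it makes the conditions $p_1 < \alpha$ and $p_2 < \alpha$ have no effect in determining the rejection boundary. In fact:

$$\text{FWER}(H_1 \cap H_2) = \alpha_1 + \alpha_1 \ln\frac{\alpha}{\alpha_1} + \alpha_2 + \alpha_2 \ln\frac{\alpha}{\alpha_2} - \alpha_{\min} - \alpha_{\min} \ln\frac{\alpha^2}{\alpha_{\min}} = \alpha$$

has no solution for $\alpha_1 < \alpha^2$ and $\alpha_2 < \alpha^2$. Therefore, the only scenario that we are interested in is:

$$\text{FWER}(H_1 \cap H_2) = \alpha_1 + \alpha_1 \ln\frac{\alpha}{\alpha_1} + \alpha_2 + \alpha_2 \ln\frac{\alpha}{\alpha_2} - \alpha^2, \quad \alpha^2 \le \alpha_{\min} < \alpha \quad (14)$$

With (14), we can determine the rejection boundaries $\alpha_1$ and $\alpha_2$ for given $\alpha$. Here are the steps:

Choose $\alpha_1$ so that $\alpha^2 \le \alpha_1 < \alpha$.
Let FWER$(H_1 \cap H_2) = \alpha$ to solve for $\alpha_2$.

That is, $\alpha_2$ is the solution of:

$$\alpha_1 + \alpha_1 \ln\frac{\alpha}{\alpha_1} + \alpha_2 + \alpha_2 \ln\frac{\alpha}{\alpha_2} - \alpha^2 = \alpha, \quad \alpha^2 \le \alpha_{\min} < \alpha \quad (15)$$

Examples of critical values from (15) are presented in Tables 1 and 2. When $\alpha_1 = \alpha_2$, (15) can be simplified as:

$$2\alpha_1 \ln\frac{\alpha}{\alpha_1} + 2\alpha_1 - \alpha^2 = \alpha \quad (16)$$

Fig. 2. $\Pr(p_1 p_2 \le \alpha_{\min} \cap p_1 \le \alpha \cap p_2 \le \alpha)$

The critical values for various $\alpha$ with $\alpha_1 = \alpha_2$ are presented in Table 3. The rejection boundaries in Tables 1-3 have each been verified by 0,000,000 simulations.

Illustrative Example

Suppose that, in the Statistical Analysis Plan for a cardiovascular trial with two primary endpoints (not co-primary endpoints that have to be met simultaneously), the two test statistics for the two hypotheses ($H_1$ and $H_2$) of the two endpoints are assumed to be independent and the $\alpha$-exhaustive procedure (with one-sided $\alpha_1 = \alpha_2 = 0.004855$ and $\alpha = 0.025$) was specified in the analysis for the multiplicity adjustment to control FWER. At the end of the trial, the p-values for the two endpoints are: scenario (1) one-sided $p_1 = 0.024$ and $p_2 = 0.025$; scenario (2) $p_1$ = 0.0 and $p_2$ = 0.; scenario (3) $p_1$ = 0.05 and $p_2$ = 0.0; scenario (4) $p_1$ = 0.0 and $p_2$ = 0.6; and scenario (5) $p_1$ = 0.0 and $p_2$ = 0.5.

Using the $\alpha$-exhaustive procedure for scenario (1), we reject both $H_1$ and $H_2$: $p_1 p_2 = 0.0006 \le \alpha_1$ and $p_1 = 0.024 \le \alpha$ give the rejection of $H_1$, and $p_1 p_2 = 0.0006 \le \alpha_2$ and $p_2 = 0.025 \le \alpha$ give the rejection of $H_2$. For scenario (2), we reject $H_1$ but not $H_2$ because $p_1 p_2$ = 0.008 $\le \alpha_1$ and $p_1$ = 0.0 $\le \alpha$, whereas $p_2$ = 0. $> \alpha$. For scenario (3), we reject H, but not H. For scenario (4), we reject H but not H. For scenario (5), we can reject neither $H_1$ nor $H_2$. Using the test procedures described in the previous section, we summarize the rejection status in Table 4. The $\alpha$-exhaustive procedure can reject at least one hypothesis except for scenario (5). The reason that the $\alpha$-Ex method (with $\alpha_1 = \alpha_2$) cannot reject any hypothesis for scenario (5) is that the method emphasizes the consistency of the evidence against all the hypotheses and such consistency is obviously not present in scenario (5).

With $p_i = 1 - \Phi(z_i)$, the power of the $\alpha$-Ex procedure ($\alpha_1 = \alpha_2$) for two hypotheses at one-sided level $\alpha$ can be written as:

$$\text{Power} = \Pr\Big(\big(1 - \Phi(z_1)\big)\big(1 - \Phi(z_2)\big) \le \alpha_1 \cap \min\big(1 - \Phi(z_1),\, 1 - \Phi(z_2)\big) < \alpha \,\Big|\, \bar{H}_1, \bar{H}_2\Big)$$

As a comparison, the power of Hommel's procedure for two hypotheses at one-sided level $\alpha$ can be written as:

$$\text{Power} = \Pr\Big(\max\big(1 - \Phi(z_1),\, 1 - \Phi(z_2)\big) \le \alpha \cup \min\big(1 - \Phi(z_1),\, 1 - \Phi(z_2)\big) < \alpha/2 \,\Big|\, \bar{H}_1, \bar{H}_2\Big)$$

Table 1. Critical values for one-sided $\alpha = 0.025$
$\alpha_1$:  0.000095  0.000650  0.001000  0.002000  0.003000  0.004000  0.004855  0.005000
$\alpha_2$:  0.025000  0.088  0.012856  0.009378  0.0078  0.0058  0.004855  0.007

Table 2. Critical values for one-sided $\alpha = 0.05$
$\alpha_1$:  0.000435  0.002500  0.004000  0.005000  0.006000  0.007000  0.008000  0.010097
$\alpha_2$:  0.050000  0.025265  0.020078  0.017610  0.015607  0.0393  0.012508  0.010097

Table 3. Critical values for one-sided test when $\alpha_1 = \alpha_2$
$\alpha$:  0.005000  0.010000  0.025000  0.050000  0.075000  0.100000
$\alpha_1, \alpha_2$:  0.0009  0.001897  0.004855  0.010097  0.015739  0.021798

Table 4. Rejection with different test procedures (scenarios 1 to 5)
Fixed Sequence:  H, H  H  H  H
Bonferroni:  H  H
Fallback (w = 0.5):  H  H
Holm-Stepdown:  H  H
Hochberg:  H, H  H
Hommel:  H, H  H  H
$\alpha$-exhaustive:  H, H  H  H  H
Note: One-sided $\alpha = 0.025$, $\alpha_1 = \alpha_2 = 0.004855$ for $\alpha$-exhaustive

Table 5. Power comparisons for two-hypothesis testing ($\delta_2 = 0.3$, $\sigma$ = )
Columns: (Power$_1$, Power$_2$) for δ = 0.3, then for δ = 0.5, then for δ = 0
Fixed Seq ($H_1$, $H_2$):  0.60  0.800  0.  0.80  0.00  0.05
Fixed Seq ($H_2$, $H_1$):  0.60  0.800  0.  0.800  0.00  0.800
Bonferroni:  0.59  0.96  0.50  0.77  0.009  0.73
Fallback ($H_1$, $H_2$):  0.590  0.96  0.68  0.78  0.00  0.730
Fallback ($H_2$, $H_1$):  0.590  0.96  0.  0.783  0.08  0.73
Holm:  0.65  0.96  0.33  0.78  0.09  0.730
Hochberg:  0.660  0.933  0.  0.79  0.00  0.73
Hommel:  0.660  0.933  0.  0.79  0.00  0.73
Progressive $\alpha$-Ex:  0.660  0.96  0.0  0.83  0.00  0.7
Note: Sample size = 90, one-sided $\alpha = 0.025$, $\alpha_1 = \alpha_2 = 0.004855$ for Progressive $\alpha$-Ex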
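The critical-value equations for two hypotheses, (15) and its equal-boundary special case (16), have no closed-form solution but are easy to solve numerically. The sketch below is a minimal illustration using bisection in Python rather than the authors' SAS code; the helper names (`pair_term`, `fwer_both_null`, `bisect_root`, `solve_alpha2`, `solve_equal`) are hypothetical, and independence of the two p-values is assumed throughout.

```python
import math


def pair_term(a, alpha):
    """a + a*ln(alpha/a): Pr(p1*p2 <= a and p1 <= alpha) for a < alpha."""
    return a + a * math.log(alpha / a)


def fwer_both_null(alpha1, alpha2, alpha):
    """FWER when both nulls are true; valid when
    alpha**2 <= min(alpha1, alpha2) < alpha."""
    return pair_term(alpha1, alpha) + pair_term(alpha2, alpha) - alpha ** 2


def bisect_root(f, lo, hi, tol=1e-12):
    """Root of an increasing function f on [lo, hi] by bisection."""
    if f(lo) > 0 or f(hi) < 0:
        raise ValueError("no root in the admissible range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def solve_alpha2(alpha1, alpha):
    """alpha2 in [alpha**2, alpha] that makes the FWER equal to alpha."""
    return bisect_root(lambda a2: fwer_both_null(alpha1, a2, alpha) - alpha,
                       alpha ** 2, alpha)


def solve_equal(alpha):
    """Common critical value when alpha1 = alpha2."""
    return bisect_root(lambda a: fwer_both_null(a, a, alpha) - alpha,
                       alpha ** 2, alpha)


common = solve_equal(0.025)
print(round(common, 5))                       # close to 0.00486
print(round(solve_alpha2(0.002, 0.025), 5))   # close to 0.00938
```

Bisection is used here only because the left-hand side is increasing in each critical value on the admissible range; any one-dimensional root finder would serve equally well.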

Power Comparison of Two Hypotheses

There seems to be a general impression that, whatever test procedure is used, the power cannot be improved significantly for two-hypothesis testing. However, this is not necessarily true. We have compared the power of seven different testing methods described in section 2 and presented the results in Table 5, where Power$_1$ is the probability of simultaneously rejecting $H_1: \delta_1 \le 0$ and $H_2: \delta_2 \le 0$ and Power$_2$ is the probability of rejecting either $H_1$ or $H_2$. For the fallback procedure, the weights $w_1 = w_2 = 0.5$ are used. The fixed sequence method is equivalent to the fallback procedure with $w_1 = 1$ and $w_2 = 0$. The progressive $\alpha$-exhaustive procedure performs the best overall, while the Hommel method performs second best.

In general, the Holm procedure is uniformly more powerful than the Bonferroni procedure, Hochberg's procedure is uniformly more powerful than Holm's procedure and Hommel's procedure is uniformly more powerful than Hochberg's procedure. Holm, fixed-sequence and fallback are nonparametric and control FWER for any joint distribution of test statistics. The Hommel and Hochberg procedures are semiparametric and control FWER only for some joint distributions, including positively dependent test statistics such as multivariate normal test statistics. Nonparametric procedures make no assumptions about the joint distribution of test statistics, which results in power loss (Dmitrienko, 2013). For two-hypothesis testing, Hochberg's method is equivalent to Hommel's method. The power of the fallback method depends on the weights $w_i$ and the order of the hypotheses.

Comparisons of Hommel's method with the $\alpha$-Ex method using different $\alpha_1$ and $\alpha_2$ are presented in Table 6. For $\alpha$-Ex$_1$, $\alpha_1 = \alpha_2 = 0.004855$; for $\alpha$-Ex$_2$, $\alpha_1 = 0.003355$, $\alpha_2 = 2\alpha_1$; for $\alpha$-Ex$_3$, $\alpha_1 = 0.003798$, $\alpha_2 = 1.6\alpha_1$; for $\alpha$-Ex$_4$, $\alpha_1 = 0.004332$, $\alpha_2 = 1.25\alpha_1$; for $\alpha$-Ex$_5$, $\alpha_2 = 0.004332$, $\alpha_1 = 1.25\alpha_2$.
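A comparison of this kind can be mimicked with a small Monte Carlo experiment. The sketch below is an assumption-laden illustration and does not reproduce the paper's simulation design: the function `simulate` is hypothetical, the test statistics are taken to be independent unit-variance normals whose means `ncp1` and `ncp2` are hypothetical noncentrality values (0 under a true null), and Hochberg's procedure stands in for Hommel's because the two coincide for two hypotheses.

```python
import random
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal cdf


def simulate(ncp1, ncp2, alpha=0.025, a1=0.004855, a2=0.004855,
             n_sim=100_000, seed=1):
    """Estimate Pr(reject at least one hypothesis) for the alpha-Ex rule
    and for Hochberg's procedure from independent z statistics."""
    rng = random.Random(seed)
    hits_ex = hits_hoch = 0
    for _ in range(n_sim):
        # One-sided p-values: p = 1 - Phi(z).
        p1 = 1.0 - _PHI(rng.gauss(ncp1, 1.0))
        p2 = 1.0 - _PHI(rng.gauss(ncp2, 1.0))
        prod = p1 * p2
        # alpha-Ex: product condition plus unadjusted marginal condition.
        if (prod <= a1 and p1 <= alpha) or (prod <= a2 and p2 <= alpha):
            hits_ex += 1
        # Hochberg with K = 2: reject both if max p <= alpha,
        # otherwise reject the smaller one if min p <= alpha / 2.
        if max(p1, p2) <= alpha or min(p1, p2) <= alpha / 2:
            hits_hoch += 1
    return hits_ex / n_sim, hits_hoch / n_sim


fwer_ex, fwer_hoch = simulate(0.0, 0.0)     # both nulls true
power_ex, power_hoch = simulate(2.8, 2.8)   # hypothetical equal effects
print(f"FWER  alpha-Ex {fwer_ex:.4f}  Hochberg {fwer_hoch:.4f}")
print(f"Power alpha-Ex {power_ex:.4f}  Hochberg {power_hoch:.4f}")
```

Under the global null both estimates should sit near 0.025 up to Monte Carlo error; the power estimates depend entirely on the assumed noncentrality values and should not be read as the figures reported in the tables.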
From the table, we can see that all $\alpha$-Ex procedures perform very well except $\alpha$-Ex$_5$, in which the treatment effect $\delta_1$ is smaller than $\delta_2$, but the alphas were set up in the wrong direction ($\alpha_1 = 1.25\alpha_2 > \alpha_2$). In general, $\alpha_1$ should be chosen larger than $\alpha_2$ if $\delta_1$ is expected to be larger than $\delta_2$; otherwise choose $\alpha_1 \le \alpha_2$.

Table 6. Power comparison for two-hypothesis testing, by $\delta_1/\delta_2$ ($\sigma$ = )
Method:  0.0/0.3  0.03/0.3  0.05/0.3  0./0.3  0.5/0.3  0./0.3  0.3/0.3
Hommel:  0.73  0.736  0.7  0.759  0.79  0.836  0.933
$\alpha$-Ex$_1$:  0.7  0.736  0.75  0.796  0.83  0.890  0.96
$\alpha$-Ex$_2$:  0.75  0.76  0.777  0.8  0.85  0.893  0.96
$\alpha$-Ex$_3$:  0.735  0.755  0.770  0.808  0.850  0.89  0.96
$\alpha$-Ex$_4$:  0.73  0.76  0.76  0.803  0.86  0.89  0.96
$\alpha$-Ex$_5$:  0.700  0.75  0.7  0.790  0.839  0.887  0.96

The reason that the $\alpha$-Ex method (with $\alpha_1 = \alpha_2$) has lower power than Hommel's method in the extreme case when $\delta_1 = 0$ and $\delta_2 = 0.3$ is that the former emphasizes the consistency of the evidence against all the hypotheses, while $\delta_1 = 0$ and $\delta_2 = 0.3$ are clearly inconsistent. If we believe $\delta_1$ is smaller than $\delta_2$, we should use different $\alpha_1$ and $\alpha_2$ (e.g., $\alpha_2 = 1.6\alpha_1$ in $\alpha$-Ex$_3$). However, if the directional guess is wrong, it could reduce the power, as seen in $\alpha$-Ex$_5$.

Formulation for Three Hypotheses

We now discuss the progressive $\alpha$-exhaustive procedure for three-hypothesis testing:

$$H_0: H_1 \cap H_2 \cap H_3 \quad \text{vs} \quad H_a: \bar{H}_1 \cup \bar{H}_2 \cup \bar{H}_3 \quad (17)$$

Similar to two-hypothesis testing, the rejection-acceptance rules of the $\alpha$-exhaustive procedure for three-hypothesis testing are:

If $p_1 p_2 p_3 \le \tilde{\alpha} \cap p_1 p_2 \le \alpha_1 \cap p_1 p_3 \le \alpha_1 \cap p_1 \le \alpha$, reject $H_1$; otherwise accept $H_1$.
If $p_1 p_2 p_3 \le \tilde{\alpha} \cap p_1 p_2 \le \alpha_2 \cap p_2 p_3 \le \alpha_2 \cap p_2 \le \alpha$, reject $H_2$; otherwise accept $H_2$.
If $p_1 p_2 p_3 \le \tilde{\alpha} \cap p_1 p_3 \le \alpha_3 \cap p_2 p_3 \le \alpha_3 \cap p_3 \le \alpha$, reject $H_3$; otherwise accept $H_3$.

Here $\tilde{\alpha}$ denotes the critical value for the third-order term $p_1 p_2 p_3$.

Determination of Critical Values

The derivations of the critical values are placed in the Appendix. Here we describe the key steps and summarize the results. The critical values $\alpha_1$, $\alpha_2$ and $\alpha_3$ are determined by all the paired null hypotheses: $H_1 \cap H_2$, $H_1 \cap H_3$ and $H_2 \cap H_3$.
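To exhaust the familywise error rate, it is necessary that FWER$(H_1 \cap H_2) = \alpha$, FWER$(H_1 \cap H_3) = \alpha$ and FWER$(H_3 \cap H_2) = \alpha$, which are equivalent to (Appendix), respectively:

$$\begin{aligned} \alpha_1 + \alpha_1 \ln\frac{\alpha}{\alpha_1} + \alpha_2 + \alpha_2 \ln\frac{\alpha}{\alpha_2} - \alpha^2 &= \alpha \\ \alpha_1 + \alpha_1 \ln\frac{\alpha}{\alpha_1} + \alpha_3 + \alpha_3 \ln\frac{\alpha}{\alpha_3} - \alpha^2 &= \alpha \\ \alpha_3 + \alpha_3 \ln\frac{\alpha}{\alpha_3} + \alpha_2 + \alpha_2 \ln\frac{\alpha}{\alpha_2} - \alpha^2 &= \alpha \end{aligned} \qquad \alpha^2 \le \alpha_1, \alpha_2, \alpha_3 < \alpha \quad (18)$$

Each equation in (18) has the same form as (15). Therefore, the critical values in Tables 1-3 are also valid for (18).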
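The pairwise equations fix $\alpha_1$, $\alpha_2$ and $\alpha_3$; what remains is the critical value for the third-order term $p_1 p_2 p_3$, which must bring the FWER under $H_1 \cap H_2 \cap H_3$ to $\alpha$. A minimal simulation sketch follows. It assumes independent uniform p-values, equal pairwise critical values `a`, and a rejection rule for $H_i$ that requires $p_i \le \alpha$, both pairwise products involving $p_i$ to be at most `a`, and the triple product to be at most the cutoff. The function name `third_order_cutoff` is hypothetical, and instead of a trial-and-error search the sketch takes an empirical quantile, which gives the same answer for a fixed set of simulated p-values.

```python
import random


def third_order_cutoff(alpha, a, n_sim=400_000, seed=7):
    """Monte Carlo estimate of the third-order critical value under the
    global null. Returns (cutoff, estimated FWER at that cutoff)."""
    rng = random.Random(seed)
    products = []
    for _ in range(n_sim):
        p = (rng.random(), rng.random(), rng.random())
        for i in range(3):
            j, k = (i + 1) % 3, (i + 2) % 3
            # Marginal and pairwise conditions of the rule for H_i.
            if p[i] <= alpha and p[i] * p[j] <= a and p[i] * p[k] <= a:
                # This sample is a false rejection whenever the triple
                # product is also below the cutoff; record the product once.
                products.append(p[0] * p[1] * p[2])
                break
    products.sort()
    target = round(alpha * n_sim)  # number of false rejections allowed
    if len(products) <= target:
        # The lower-order conditions alone already keep the estimated
        # FWER at or below alpha in this sample.
        return float("inf"), len(products) / n_sim
    return products[target - 1], target / n_sim


cutoff, fwer = third_order_cutoff(alpha=0.025, a=0.004855)
print(cutoff, fwer)
```

Because every recorded product is at most `a`, the estimated cutoff can never exceed the pairwise critical value; its precise value carries Monte Carlo error and should be re-estimated with a larger `n_sim` before practical use.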

Table 7. Critical values for one-sided test ($\alpha_1 = \alpha_2 = \alpha_3$)
$\alpha$:  0.010000  0.025000  0.050000  0.075000  0.100000
$\alpha_1, \alpha_2, \alpha_3$:  0.001897  0.004855  0.010097  0.015739  0.021798
$\tilde{\alpha}$ (third-order critical value):  0.0005  0.00677  0.00557  0.007566  0.009966

Table 8. Power comparison, by $\delta_1/\delta_2$ ($\delta_3 = 0.3$, $\sigma$ = )
Method:  0.0/0.0  0.0/0.3  0.03/0.3  0./0.3  0./0.  0./0.  0.3/0.3
Hommel:  0.8  0.735  0.737  0.79  0.6  0.533  0.869
Dunnett Step-up:  0.78  0.75  0.78  0.78  0.60  0.58  0.86
Graphic Approach:  0.78  0.7  0.7  0.773  0.508  0.55  0.85
$\alpha$-Ex:  0.70  0.756  0.775  0.885  0.698  0.599  0.9
Note: $\alpha = 0.025$, $\alpha_1 = \alpha_2 = \alpha_3 = 0.004855$, $\tilde{\alpha}$ = 0.00677, sample size = 60

To exhaust $\alpha$ under $H_1 \cap H_2 \cap H_3$, it is required that FWER$(H_1 \cap H_2 \cap H_3) = \alpha$, that is (assuming $\alpha_1 = \alpha_2 = \alpha_3$, Appendix):

3 3 + ln + 3 ( ) + 3 = (19)

Now we can use (18) to determine $\alpha_1$, $\alpha_2$ and $\alpha_3$ and use (19) to determine the third-order critical value $\tilde{\alpha}$ for the case when $\alpha_1 = \alpha_2 = \alpha_3$. Examples of critical values for various $\alpha$ are presented in Table 7.

The critical values can also be determined through simulations. Especially when the dimension is high, simulation is a convenient way to obtain the results: for given $(\alpha_1, \alpha_2, \alpha_3)$, we can use simulation by trying different $\tilde{\alpha}$ until FWER$(H_1 \cap H_2 \cap H_3) = \alpha$. We have verified the critical values through simulations: for $\alpha = 0.025$ and $\alpha_1 = \alpha_2 = \alpha_3 = 0.004855$, $\tilde{\alpha}$ = 0.00677, the type-I error rate is 0.025003 under $H_1 \cap H_2 \cap H_3$ through 0,000,000 simulations. This progressive method to determine the critical values can be generalized to K-hypothesis testing.

Power Comparison

Let $H_i: \delta_i \le 0$, $i = 1, 2, 3$. Using the rejection boundaries in Table 7, we can easily obtain the power of the $\alpha$-Ex method through simulations. To evaluate the performance of $\alpha$-Ex, we used Hommel's method, the best of the methods considered so far, as the standard. Since there are three hypotheses, it is meaningful to compare our approach with other, more recently developed approaches.
However, the gatekeeping procedure is difficult to communicate to non-statisticians and requires a large set of tests when the number of individual hypotheses increases. An iterative graphical approach by Bretz et al. (2009) deals with those weaknesses and constructs Bonferroni-type tests with a simple updating algorithm that fully describes a sequentially rejective test procedure. The graphic approach was then extended by dissociating the underlying weighting strategy and applied using weighted Bonferroni tests, weighted parametric tests and weighted Simes tests (Bretz et al., 2011). The existing methods control the FWER; however, their power is not addressed or compared with other methods.

From Table 8, we can see that the $\alpha$-Ex procedure provides more power in all cases except the case when $\delta_1 = \delta_2 = 0$ and $\delta_3 = 0.3$. Again, as in the case of two-hypothesis testing, when the parameters in the alternative hypotheses (e.g., effects of the different endpoints) are very different, we should use different $\alpha_1$, $\alpha_2$ and $\alpha_3$ such that their trend is opposite to the trend of the parameters in the alternative hypotheses.

K-Hypothesis Testing Procedure

We now discuss the progressive $\alpha$-exhaustive procedure for K-hypothesis testing. To avoid the inconvenience of very small rejection boundaries, we use the term $\left(\prod_{i=1}^{k} p_i\right)^{1/k}$ instead of $\prod_{i=1}^{k} p_i$ in the decision rules for general K-hypothesis testing. It is obvious that these two test statistics are equivalent in terms of power. For K-hypothesis testing, the K rejection rules are specified as follows. We will reject $H_i$ if and only if:

$$(p_i p_k p_l)^{1/3} \le \alpha_3, \quad (p_i p_l)^{1/2} \le \alpha_2, \quad p_i \le \alpha_1, \qquad \forall\, k > l;\ k, l \ne i.$$

We did not include higher-order terms of the p-product because, through simulations, we find that even if we set $p_i p_j p_k p_m \le 1$, the FWER is controlled. This means that for a multiple testing problem with more than four hypotheses, the proposed procedure may not be an alpha-exhaustive one. The rejection boundaries $\alpha_1, \alpha_2, \dots, \alpha_K$ are progressively determined: determine $\alpha_1 = \alpha$ based on one-hypothesis testing; then, given $\alpha_1$, determine $\alpha_2$ based on two-hypothesis testing; and, given $\alpha_1$ and $\alpha_2$, determine $\alpha_3$ based on three-hypothesis testing. The process continues until $\alpha_K$ is determined. For high dimensional hypothesis testing problems, the Partition Principle for multiple testing (Hsu, 1996) can be used to reduce the number of null configurations to be tested. Simulation is usually more convenient than numerical integration when the dimension is high.

Summary and Discussion

To construct an MTP, we need to consider at least three things to ensure the power: (1) $\alpha$-exhaustiveness, (2) synergy of strengths among data for local hypotheses or marginal p-values and (3) the ability to use correlations between local test statistics or local p-values. In principle, the proposed $\alpha$-exhaustive procedure has considered all three aspects. To achieve $\alpha$-exhaustiveness, we use the marginal p-value product corresponding to each null hypothesis configuration and enforce an upper bound on it in the rejection rules. Such p-value product terms in the rejection rules also ensure the synergy between the marginal p-values. The K-hypothesis testing algorithm can be applied to correlated test statistics with modifications of the critical regions for rejection (the critical values in Tables 1-3 are applicable for independent test statistics), but due to its complexity and its wider applications in clinical trials (dose-finding, subgroup analysis, adaptive design), we are developing separate manuscripts to address that.

Unlike traditional stepwise procedures, the decision rule in the progressive $\alpha$-exhaustive procedure explicitly uses a set of statistics ($p_1$, $p_1 p_2$, $p_1 p_3$, $p_1 p_2 p_3$, etc.) with a set of critical values for rejecting a single $H_k$ ($k = 1, 2, \dots, K$). In this sense, the decision rules in the $\alpha$-exhaustive procedure are expressed in the form of "adjusted p-values" and hence the order of the tests is irrelevant.
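The way a single hypothesis is judged against a set of product statistics can be made concrete with a short sketch of the K-hypothesis rule on the root scale. This is an illustration under stated assumptions rather than the authors' implementation: the function name `alpha_ex_k_reject` and the argument `crit` are hypothetical, `crit[m-1]` is taken to be the critical value for the $m$-th root of a product of $m$ p-values ($m$ = 1, 2, 3), terms of order four and higher are left unconstrained as the text suggests, and the numerical inputs are made up.

```python
from itertools import combinations


def alpha_ex_k_reject(pvals, crit):
    """Return a list of booleans, one per hypothesis.
    H_i is rejected when p_i, every root-scaled pairwise product with p_i
    and every root-scaled triple product with p_i meet their critical
    values crit[0], crit[1], crit[2] (shorter crit skips higher orders)."""
    decisions = []
    for i, p_i in enumerate(pvals):
        others = [q for j, q in enumerate(pvals) if j != i]
        ok = p_i <= crit[0]
        if ok and len(crit) > 1:
            ok = all((p_i * q) ** 0.5 <= crit[1] for q in others)
        if ok and len(crit) > 2:
            ok = all((p_i * q * r) ** (1.0 / 3.0) <= crit[2]
                     for q, r in combinations(others, 2))
        decisions.append(ok)
    return decisions


# With two hypotheses the rule reduces to the product rule; the second
# critical value is the square root of a product-scale boundary.
print(alpha_ex_k_reject([0.02, 0.10], [0.025, 0.004855 ** 0.5]))

# Three hypotheses with a non-binding third-order bound of 1.0
# (a placeholder, not a calibrated critical value).
print(alpha_ex_k_reject([0.01, 0.02, 0.2], [0.025, 0.004855 ** 0.5, 1.0]))
```

Since each decision depends only on the full set of p-values and the fixed critical values, the hypotheses can be evaluated in any order, which is the point made in the paragraph above.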
Many stepwise testing procedures also borrow strength among the data for local hypotheses, but such dependencies are realized through a discrete function. The α-exhaustive procedure uses a continuous dependency function of the marginal p-values, i.e., the product of p-values, which is more effective. We have also tried other functions, such as the average of the p-values or a linear combination of normal-inverse p-values; the results are similar.

In summary, the proposed progressive α-exhaustive procedure is not only statistically powerful, but also stresses clinical and practical meaningfulness, since the method emphasizes consistency among the evidence coming from different endpoints, different doses and different populations, that is, the totality of the evidence. The test procedure is simple and performs well in a broad range of situations. When the true "standardized" effect size (the value of the parameter) differs greatly across hypotheses, the critical values do not have to be the same for rejecting all the hypotheses. Instead, the critical values can be different and optimized based on prior information on the effect sizes or on the importance of the different endpoints.

Appendices

Derivations of Formulations of Critical Values for Three-Hypothesis Testing

FWER H = suppr H3 H3 ( H ) ( p p p3 p p p p3 p ) ( p pp3 p p p p3 p ) (( p p p ) ( p p p )) = suppr ( p p p ) Pr ( p p p ) ( p p p p ) = Pr + Pr = + ln + + ln,, <

where FWER($H_1 \cap H_2$) is the type-I error rate under $H_1 \cap H_2$ and $\sup_{H_3}$ is the supremum over all possible $H_3$. To control the type-I error under $H_1 \cap H_2$, $H_2 \cap H_3$ and $H_3 \cap H_1$, it is required that, respectively:

ln ln + ln + + ln = + ln + + ln = + + 3 + 3 = 3 3 3 3 (A1)

From (A1), we can see that among , and 3, at least two should be the same, either = or = 3 ( ).
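As a numerical companion to the pairwise null configuration, the following Python sketch solves for a pairwise critical value by bisection. It rests on assumptions that are not taken verbatim from the paper: the two p-values are independent and uniform under the null, $H_i$ is rejected if and only if $p_1 p_2 \le a$ and $p_i \le \alpha$ with a common pairwise bound $a$ (the situation in which the third p-value tends to zero), and $\alpha^2 \le a \le \alpha$. Under these assumptions the probability of at least one rejection is $2a\,(1 + \ln(\alpha/a)) - \alpha^2$. The function names are hypothetical.

```python
from math import log


def fwer_two_nulls(a, alpha):
    """Pr(reject H1 or H2) for independent uniform p1, p2 under the rule
    'reject H_i iff p1*p2 <= a and p_i <= alpha'.

    Valid for alpha**2 <= a <= alpha: each single event has probability
    a + a*ln(alpha/a), and their intersection is the square [0, alpha]^2.
    """
    return 2.0 * a * (1.0 + log(alpha / a)) - alpha ** 2


def solve_pair_bound(alpha, tol=1e-12):
    """Find a in [alpha^2, alpha] with fwer_two_nulls(a, alpha) == alpha.

    The function is increasing in a on this interval (its derivative is
    2*ln(alpha/a) >= 0), so plain bisection is sufficient.
    """
    lo, hi = alpha ** 2, alpha
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fwer_two_nulls(mid, alpha) > alpha:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

For a one-sided level of 0.025, `solve_pair_bound(0.025)` returns a value close to 0.00486. The same bisection could be reused at the next stage of the progressive scheme, with the closed form replaced by a simulated FWER.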
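The type-I error rate under $H_1 \cap H_2 \cap H_3$ is the probability that at least one of the three hypotheses is rejected:

$$\mathrm{FWER}(H_1 \cap H_2 \cap H_3) = \Pr\Big(\bigcup_{i=1}^{3} \{H_i \text{ is rejected}\}\Big),$$

where the rejection event for $H_i$ is the intersection of the conditions on $p_1 p_2 p_3$, on the two pairwise products $p_i p_j$ ($j \ne i$) and on $p_i$. For the purpose of critical value derivation, we rewrite FWER($H_1 \cap H_2 \cap H_3$) as: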

$$\mathrm{FWER}(H_1 \cap H_2 \cap H_3) = \pi_1 + \pi_2 + \pi_3 - \pi_{12} - \pi_{13} - \pi_{23} + \pi_{123}$$

where, under $H_1 \cap H_2 \cap H_3$, $\pi_i$ is the probability that $H_i$ is rejected (A2, A3), $\pi_{ij}$ is the probability that both $H_i$ and $H_j$ are rejected (A4) and $\pi_{123}$ is the probability that all three hypotheses are rejected (A5).

We will use Fig. 3 to assist in our integration of $\pi_1$, $\pi_2$ and $\pi_3$. That is:

π + = adp dp + dp dp + dp dp Ω Ω p Ω3 p Ω dp dp p p = + + + p dpdp dp dp 0 p p dp dp p p = + ln + (A6)

Similarly, we use Fig. 4 to assist in our integration of $\pi_{12}$, $\pi_{13}$ and $\pi_{23}$. That is:

π = dp dp + dp dp + dp dp Ω Ω p Ω3 p ( since ) p dpdp 0 p = + < ( ) = a + a = (A7)

For $\pi_{123}$, we give the result only for the case in which the three critical values are equal ( 3 = = 3 ). Assume, otherwise only $p_1 p_2 p_3$ has an effect in the joint probabilities, while $p_1 p_2$ and the other pairwise products will have no effect.

Fig. 3. $\pi_1 = \pi_2 = \pi_3$

Fig. 4. $\pi_{12}$

Fig. 5. $\pi_{123}$

In fact, the is so small that the three curves $p_i p_j$ = cut the cube into a smaller cube, as shown in Fig. 5. Therefore, we have:

π = dp dp = (A8) 3 3 Ω

To summarize, we have:

( 3 ) FWER H H H = 3 a + ln + 3( ) + 3 = 3 + ln + 3 ( ) + 3 3 (A9)

To control FWER($H_1 \cap H_2 \cap H_3$) at level $\alpha$, it is required that (assume we have chosen = = 3 ):

3 3a + ln 3 ( ) + 3 = (A10)

After = = 3 is determined using (A1), we can solve from (A10). For a general case, after is chosen between and, and 3 can be determined using (A1) and can be easily obtained from simulations. The power simulations are presented in Table 8.

SAS Code for Progressive Test Procedure

    /* Progressive Alpha-exhaustive Test Procedure for Two-Hypothesis */
    %Macro aextest2h(nsims, u1, u2, sigma, N, alpha1, alpha2, alpha);
    * nsims = the number of simulation runs;
    * u1, u2 = parameters for H1 and H2. sigma = common standard deviation;
    * N = sample size;
    * alpha1, alpha2, alpha = critical values on p-scale;
    * Power = prob of rejecting H1 or H2,
    * PowerBoth = prob of rejecting H1 and H2.;
    Data aex2h;
      Keep u1 u2 N PowerBoth Power;
      N=&N; u1=&u1; u2=&u2; sigma=&sigma; alpha=&alpha;
      Power=0; PowerBoth=0;
      Do isim=1 To &nsims;
        /* standardized test statistics and one-sided p-values */
        z1=rand("normal", &u1, sigma/sqrt(N))/sigma*sqrt(N);
        p1=1-cdf("normal", z1);
        z2=rand("normal", &u2, sigma/sqrt(N))/sigma*sqrt(N);
        p2=1-cdf("normal", z2);
        sig1=0; sig2=0;
        /* reject H_i when both the p-value product and p_i are small */
        If p1*p2<=&alpha1 And p1<=alpha Then sig1=1;
        If p1*p2<=&alpha2 And p2<=alpha Then sig2=1;
        If sig1=1 Or sig2=1 Then Power=Power+1/&nsims;
        If sig1=1 And sig2=1 Then PowerBoth=PowerBoth+1/&nsims;
      End;
      Output;
    Run;
    %Mend;

    Title "Checking Type-I Error under H1 and H2";
    %aextest2h(10000000, 0, 0.0, 1, 90, 0.004855, 0.004855, 0.025);
    Proc print data=aex2h; Run;

    Title "Power under H1: u1=0.3 and H2: u2=0.3";
    %aextest2h(1000000, 0.3, 0.3, 1, 90, 0.004855, 0.004855, 0.025);
    Proc print data=aex2h; Run;

    /* Progressive Alpha-exhaustive Test Procedure for Three-Hypothesis */
    %Macro aextest3h(nsims, u1, u2, u3, sigma, N, alpha1, alpha2, alpha);
    * nsims = the number of simulation runs;
    * N = sample size;
    * u1, u2, u3 = parameters for H1, H2 and H3;
    * alpha1, alpha2, alpha = critical values on p-scale;
    * Power = prob of rejecting H1 or H2 or H3;
    * PowerAll = prob of rejecting H1, H2 and H3 simultaneously;
    Data aex3h;
      Keep u1 u2 u3 sigma N alpha PowerAll Power;
      u1=&u1; u2=&u2; u3=&u3; sigma=&sigma; N=&N;
      alpha1=&alpha1; alpha2=&alpha2; alpha=&alpha;
      Power=0; PowerAll=0;
      Do isim=1 To &nsims;
        z1=rand("normal", u1, sigma/sqrt(N))/sigma*sqrt(N);
        p1=1-cdf("normal", z1);
        z2=rand("normal", u2, sigma/sqrt(N))/sigma*sqrt(N);
        p2=1-cdf("normal", z2);
        z3=rand("normal", u3, sigma/sqrt(N))/sigma*sqrt(N);
        p3=1-cdf("normal", z3);
        sig1=0; sig2=0; sig3=0;
        p=p1*p2*p3;
        /* alpha2 bounds the triple product, alpha1 the pairwise products */
        If p<=alpha2 & p1*p2<=alpha1 & p1*p3<=alpha1 & p1<=alpha Then sig1=1;
        If p<=alpha2 & p1*p2<=alpha1 & p2*p3<=alpha1 & p2<=alpha Then sig2=1;
        If p<=alpha2 & p3*p1<=alpha1 & p3*p2<=alpha1 & p3<=alpha Then sig3=1;
        If sig1=1 Or sig2=1 Or sig3=1 Then Power=Power+1/&nsims;
        If sig1=1 And sig2=1 And sig3=1 Then PowerAll=PowerAll+1/&nsims;
      End;
      Output;
    Run;
    %Mend;

    Title "Checking Type-I Error under H1, H2 and H3";
    %aextest3h(10000000, 0, 0, 0, 1, 60, 0.004855, 0.002677, 0.025);
    Proc print data=aex3h; Run;

    Title "Power when u1=0, u2=0.3 and u3=0.3";
    %aextest3h(1000000, 0, 0.3, 0.3, 1, 60, 0.004855, 0.002677, 0.025);
    Proc print data=aex3h; Run;

Acknowledgment

We would like to thank the anonymous reviewers and editors for their reviews.

Author's Contributions

Mark Chang: Involved in the methodology development and in the drafting and approval of the manuscript.
Xuan Deng: Contributed to the review, revision and approval of the manuscript.
John Balser: Contributed to the discussion of ideas and issues and reviewed the manuscript.

Ethics

This article is original and contains unpublished material. The corresponding author confirms that all of the authors have read and approved the manuscript and that no ethical issues are involved.

References

Bretz, F., W. Maurer, W. Brannath and M. Posch, 2009. A graphical approach to sequentially rejective multiple test procedures. Stat. Med., 28: 586-604. DOI: 10.1002/sim.3495
Bretz, F., W. Maurer and G. Hommel, 2011. Test and power considerations for multiple endpoint analyses using sequentially rejective graphical procedures. Stat. Med., 30: 1489-1501. DOI: 10.1002/sim.3988
Dmitrienko, A., A.C. Tamhane and F. Bretz, 2009. Multiple Testing Problems in Pharmaceutical Statistics. 1st Edn., CRC Press, ISBN-10: 1584889853, pp: 320.
Dmitrienko, A., B.L. Wiens and P.H. Westfall, 2006. Fallback tests in dose-response clinical trials. J. Biopharmaceutical Stat., 16: 745-755. DOI: 10.1080/10543400600860600
Dmitrienko, A., 2013. Multiple testing procedures in clinical trials. Proceedings of the IBS Workshop, Sept. 9-0, Berlin.
Hochberg, Y., 1988. A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75: 800-802. DOI: 10.2307/2336325
Holm, S., 1979. A simple sequentially rejective multiple test procedure. Scandinavian J. Stat., 6: 65-70.
Hommel, G., 1988. A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika, 75: 383-386. DOI: 10.1093/biomet/75.2.383
Hommel, G. and F. Bretz, 2008. Aesthetics and power considerations in multiple testing-a contradiction? Biom. J., 50: 657-666. DOI: 10.1002/bimj.200710463
Hsu, J.C., 1996. Multiple Comparisons: Theory and Methods. 1st Edn., CRC Press, ISBN-10: 0412982811, pp: 296.
Westfall, P.H., R.D. Tobias, D. Rom, R.D. Wolfinger and Y. Hochberg, 1999. Multiple comparisons and multiple tests using SAS system. SAS Institute, SAS Campus Drive, Cary, North Carolina, USA.
Wiens, B.L., 2003. A fixed sequence Bonferroni procedure for testing multiple endpoints. Pharmaceutical Stat., 2: 211-215. DOI: 10.1002/pst.64