Multiple Endpoints: A Review and New. Developments. Ajit C. Tamhane. (Joint work with Brent R. Logan) Department of IE/MS and Statistics

Similar documents
Accurate Critical Constants for the One-Sided Approximate Likelihood Ratio. Test of a Normal Mean Vector When the Covariance Matrix Is Estimated

A superiority-equivalence approach to one-sided tests on multiple endpoints in clinical trials

Mixtures of multiple testing procedures for gatekeeping applications in clinical trials

Hochberg Multiple Test Procedure Under Negative Dependence

Analysis of Multiple Endpoints in Clinical Trials

A Mixture Gatekeeping Procedure Based on the Hommel Test for Clinical Trial Applications

Modified Simes Critical Values Under Positive Dependence

MULTISTAGE AND MIXTURE PARALLEL GATEKEEPING PROCEDURES IN CLINICAL TRIALS

Control of Directional Errors in Fixed Sequence Multiple Testing

On Generalized Fixed Sequence Procedures for Controlling the FWER

Control of Generalized Error Rates in Multiple Testing

Family-wise Error Rate Control in QTL Mapping and Gene Ontology Graphs

Testing a secondary endpoint after a group sequential test. Chris Jennison. 9th Annual Adaptive Designs in Clinical Trials

Step-down FDR Procedures for Large Numbers of Hypotheses

ON STEPWISE CONTROL OF THE GENERALIZED FAMILYWISE ERROR RATE. By Wenge Guo and M. Bhaskara Rao

Statistica Sinica Preprint No: SS R1

Stepwise Gatekeeping Procedures in Clinical Trial Applications

A Test for Order Restriction of Several Multivariate Normal Mean Vectors against all Alternatives when the Covariance Matrices are Unknown but Common

A note on tree gatekeeping procedures in clinical trials

Multistage Tests of Multiple Hypotheses

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

The International Journal of Biostatistics

COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS

Adaptive Extensions of a Two-Stage Group Sequential Procedure for Testing a Primary and a Secondary Endpoint (II): Sample Size Re-estimation

MULTIPLE TESTING TO ESTABLISH SUPERIORITY/EQUIVALENCE OF A NEW TREATMENT COMPARED WITH k STANDARD TREATMENTS

Multiple Comparison Methods for Means

Exact and Approximate Stepdown Methods For Multiple Hypothesis Testing

The Design of Group Sequential Clinical Trials that Test Multiple Endpoints

A class of improved hybrid Hochberg Hommel type step-up multiple test procedures

Familywise Error Rate Controlling Procedures for Discrete Data

Lecture 6 April

Chapter Seven: Multi-Sample Methods 1/52

Power and sample size determination for a stepwise test procedure for finding the maximum safe dose

Statistical Applications in Genetics and Molecular Biology

Multiple Testing of One-Sided Hypotheses: Combining Bonferroni and the Bootstrap

Reports of the Institute of Biostatistics

5 Inferences about a Mean Vector

More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction

Simultaneous identifications of the minimum effective dose in each of several groups

Adaptive, graph based multiple testing procedures and a uniform improvement of Bonferroni type tests.

On weighted Hochberg procedures

Superchain Procedures in Clinical Trials. George Kordzakhia FDA, CDER, Office of Biostatistics Alex Dmitrienko Quintiles Innovation

c 2011 Kuo-mei Chen ALL RIGHTS RESERVED

Exact and Approximate Stepdown Methods for Multiple Hypothesis Testing

Inferences about a Mean Vector

Applied Multivariate and Longitudinal Data Analysis

Optimal exact tests for complex alternative hypotheses on cross tabulated data

Multiple Testing. Anjana Grandhi. BARDS, Merck Research Laboratories. Rahway, NJ Wenge Guo. Department of Mathematical Sciences

Multiple comparisons - subsequent inferences for two-way ANOVA

FDR-CONTROLLING STEPWISE PROCEDURES AND THEIR FALSE NEGATIVES RATES

Confidence Intervals, Testing and ANOVA Summary

Extending the Robust Means Modeling Framework. Alyssa Counsell, Phil Chalmers, Matt Sigal, Rob Cribbie

Type I error rate control in adaptive designs for confirmatory clinical trials with treatment selection at interim

PROCEDURES CONTROLLING THE k-fdr USING. BIVARIATE DISTRIBUTIONS OF THE NULL p-values. Sanat K. Sarkar and Wenge Guo

On adaptive procedures controlling the familywise error rate

Optimal testing of multiple hypotheses with common effect direction

Statistical Applications in Genetics and Molecular Biology

SOME ASPECTS OF MULTIVARIATE BEHRENS-FISHER PROBLEM

Chapter 8: Hypothesis Testing Lecture 9: Likelihood ratio tests

1. Density and properties Brief outline 2. Sampling from multivariate normal and MLE 3. Sampling distribution and large sample behavior of X and S 4.

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

A Box-Type Approximation for General Two-Sample Repeated Measures - Technical Report -

Multiple Sample Numerical Data

ON TWO RESULTS IN MULTIPLE TESTING

Consonance and the Closure Method in Multiple Testing. Institute for Empirical Research in Economics University of Zurich

Bootstrapping high dimensional vector: interplay between dependence and dimensionality

Resampling-Based Control of the FDR

University of California, Berkeley

Statistics 203: Introduction to Regression and Analysis of Variance Course review

CHL 5225H Advanced Statistical Methods for Clinical Trials: Multiplicity

Application of Variance Homogeneity Tests Under Violation of Normality Assumption

Lecture 5: Hypothesis tests for more than one sample

Lecture 3. Inference about multivariate normal distribution

Robust Backtesting Tests for Value-at-Risk Models

Dose-response modeling with bivariate binary data under model uncertainty

Sample size re-estimation in clinical trials. Dealing with those unknowns. Chris Jennison. University of Kyoto, January 2018

A Gatekeeping Test on a Primary and a Secondary Endpoint in a Group Sequential Design with Multiple Interim Looks

STT 843 Key to Homework 1 Spring 2018

A3. Statistical Inference

Multiple Testing in Group Sequential Clinical Trials

Statistical Applications in Genetics and Molecular Biology

arxiv: v1 [math.st] 14 Nov 2012

Central Limit Theorem ( 5.3)

An Application of the Closed Testing Principle to Enhance One-Sided Confidence Regions for a Multivariate Location Parameter

The legacy of Sir Ronald A. Fisher. Fisher s three fundamental principles: local control, replication, and randomization.

Asymptotic Statistics-VI. Changliang Zou

UNIFORMLY MOST POWERFUL TESTS FOR SIMULTANEOUSLY DETECTING A TREATMENT EFFECT IN THE OVERALL POPULATION AND AT LEAST ONE SUBPOPULATION

Optimal exact tests for multiple binary endpoints

Bios 6649: Clinical Trials - Statistical Design and Monitoring

IMPROVING TWO RESULTS IN MULTIPLE TESTING

GENERALIZING SIMES TEST AND HOCHBERG S STEPUP PROCEDURE 1. BY SANAT K. SARKAR Temple University

High-Throughput Sequencing Course. Introduction. Introduction. Multiple Testing. Biostatistics and Bioinformatics. Summer 2018

Group sequential designs for Clinical Trials with multiple treatment arms

arxiv: v1 [math.st] 13 Mar 2008

Closure properties of classes of multiple testing procedures

Multivariate Statistical Analysis

Testing Algebraic Hypotheses

Applying the Benjamini Hochberg procedure to a set of generalized p-values

Optimal rejection regions for testing multiple binary endpoints in small samples

Simultaneous Confidence Intervals and Multiple Contrast Tests

Transcription:

1 Multiple Endpoints: A Review and New Developments Ajit C. Tamhane (Joint work with Brent R. Logan) Department of IE/MS and Statistics Northwestern University Evanston, IL 60208 ajit@iems.northwestern.edu http://users.iems.northwestern.edu/ ajit

2 OUTLINE 1. Notation 2. Hypotheses 3. Global Tests (Homoscedastic Case) 3.1 LR and ALR Tests 3.2 OLS and GLS Tests 3.3 Exact Tests 3.4 Power Comparison Between OLS and SS Tests 4. Global Tests (Heteroscedastic Case) 4.1 ALR Test 4.2 OLS and GLS Tests 5. Global Tests Based on p-values 5.1 Bonferroni Test 5.2 Simes Test

6. Tests for Identifying Individual Significant Endpoints 6.1 Closed Tests 6.2 Normal Theory Tests 6.3 Procedures Based on Individual Adjusted p-values 6.3.1 A Single-Step Procedure 6.3.2 A Step-Down Procedure 6.3.3 A Step-Up Procedure 6.4 Hybrid Approach 7. Some Clinical Decision Rules Based on Multiple Endpoints 3

4 Some Preliminary Remarks This talk will focus on statistical aspects. For clinical aspects and examples, see Chi (1998, 2000), Sankoh (2000), Sankoh, Huque and Dubey (1997), Sankoh, Huque, Russell and D Agostino (1999). This talk will mainly deal with continuous (normal) data. The related problem of multiple tumor sites involving count data will not be addressed.

5 1. Notation Two treatment groups (1 = Treatment, 2 = Control) Sample sizes n 1 and n 2 m = No. of endpoints (all primary) x ijk = Observation on the kth endpoint on the jth subject in the ith treatment group (i = 1, 2; j = 1, 2,..., n i ; k = 1, 2,..., m) x ij = (x ij1,..., x ijm ) N(µ i, Σ i ), i = 1, 2; j = 1, 2,..., n i. δ = µ 1 µ 2 = (δ 1,..., δ m ). x i = (x i 1,..., x i m ) : Vector of sample means. Σ 1 and Σ 2 sample covariance matrices with n 1 1 and n 2 1 degrees of freedom (d.f.)

6 2. Hypotheses Global Hypothesis: Used to decide the overall efficacy of the treatment. H 0 : δ = 0 vs. H 1 : δ O +, where O + is the positive orthant: O + = {δ δ 0, δ 0} Multivariate one-sided testing problem. Marginal Hypotheses: Used to identify individual significant endpoints (endpoint specific inference). For k = 1, 2,..., m, H 0k : δ k = 0 vs. H 1k : δ k > 0. Strong control of the familywise error rate (FWE): FWE = Pr{at least one true H 0k is rejected} α.

7 3. Global Tests (Homoscedastic Case) The common covariance matrix Σ 1 = Σ 2 = Σ = {σ kl } is estimated by Σ = (n 1 1) Σ 1 + (n 2 1) Σ 2 n 1 + n 2 2 with n 1 + n 2 2 d.f. = { σ kl } The population and sample correlation matrices are R = {ρ kl } and R = { ρ kl }. 3.1 LR and ALR Tests Hotelling (1931): T 2 test, being two-sided, is inappropriate. Bartholomew (1959), Kudô (1963): One sample, known Σ. Perlman (1969): One sample, unknown Σ.

8 Tang, Gnecco and Geller (1989): Approximate LR (ALR) Test: One sample, known Σ. As extended to the two sample case the test is as follows: 1. Make the transformation u = n 1 n 2 A(x 1 x 2 ), n 1 + n 2 where A is any positive definite matrix s.t. A A = Σ 1 and AΣA = I. Then u N(θ, I) and the hypotheses become H 0 : θ = 0 vs. H 1 : θ A(δ), where A(δ) is the polyhedral cone: A(δ) = n 1 n 2 Aδ δ O + n 1 + n 2.

2. A is not unique. Tang et al.: Choose A using the Cholesky decomposition s.t. center direction of A(δ) = center direction of O + = (λ,..., λ) for λ > 0. Alternate Choice: Left root symmetric method of Läuter, Kropf and Glimm (1998). 3. A(δ) is approximated by O +. Then projection of u on O + is (u 1 0,..., u m 0) where u k 0 = max(u k, 0). The ALR statistic is g(u) = m (u k 0) 2. k=1 9

10 4. The exact null distribution of g(u) is a chi-bar squared (χ 2 ) distribution: Pr{g(u) > c} = m k=0 m k 2 m Pr ( χ 2 k > c ) If Σ is used in place of Σ then the χ 2 distribution is too liberal. E.g., for m = 6 and α = 0.05:. ν Achieved α 10 0.3550 30 0.1066 50 0.0830 100 0.0611

11 5. F approximation (Tamhane and Logan 2001): Pr{g(u) > c} m k=0 m k 2 m Pr νk F k,ν m+1 > c ν m + 1. Matches the first moment exactly and the second moment approximately. Simulation Results (m = 6, α = 0.05) ν Achieved α 10 0.0464 30 0.0476 50 0.0510

12 Anomalies of LR Tests 1. If the correlations between the endpoints are highly positive then the LR tests may reject H 0 even if δ k = x 1 k x 2 k < 0 for all k (Silvapulle 1997). 2. The LR tests may be nonmonotone, i.e., if δ k = x 1 k x 2 k become more negative, the test statistic may get larger. 3. Cohen and Sackrowitz (1998) proposed cone ordered monotone (test statistic can only increase in the direction of the cone) as a way to fix these problems. Also see Follman (1996). 4. In practice, if several endpoint differences are negative or even if a few differences are large negative then a one-sided test should not be applied.

13 3.2 OLS and GLS Tests To obviate the computational difficulties with the exact LR tests, O Brien restricted the parameter space to δ k σkk = λ k = λ 0 for all k. Then the global hypothesis testing problem becomes H 0 : λ = 0 vs. H 1 : λ > 0. Consider the regression model y ijk = x ijk σkk = µ k σkk + λ 2 I ijk + ɛ ijk where µ k = (µ 1k + µ 2k )/2, I ijk = +1 if i = 1 and 1 if i = 2, and ɛ ijk N(0, 1) r.v. s with Corr(ɛ ij ) = R. Then one can derive λ OLS and λ GLS and their standard errors. Hence one can obtain the corresponding z-statistics if R is known, and t-like statistics if R is unknown for testing H 0.

14 The resulting t OLS and t GLS statistics are as follows. Let t k = n 1 n 2 n 1 + n 2 x 1 k x 2 k σ kk. Then t OLS = j t j Rj and t GLS = j R 1 t j R 1 j, where j = (1, 1,..., 1). Comments: 1. OLS uses the sum of t k statistics. GLS uses a weighted sum of t k statistics with the weight vector = R 1 j. 2. The large sample null distribution of t OLS and t GLS is N(0, 1).

15 3. The small sample null distributions are not known. O Brien suggested to use the t-distribution with n 1 + n 2 2m d.f. as an approx. This approx. is exact for m = 1, but conservative for m > 1. 4. Convergence of t GLS to N(0, 1) is slower than that of t OLS. 5. The powers of OLS and GLS are comparable when applied in a closed testing procedure (Reitmeir and Wassmer 1996). 6. GLS is subject to anomalies. The weights used by GLS may be negative. 7. Therefore OLS is preferred.

16 3.3 Exact Tests Let the total cross-products matrix be V = 2 i=1 n i j=1 (x ij x )(x ij x ). Let w = w(v ) 0 w.p. 1 be any m-vector of weights depending solely on V. Then Läuter(1996) showed that tw = n 1 n 2 n 1 + n 2 w t w Σw is t-distributed with n 1 + n 2 2 d.f. under H 0. Choices for w discussed in Läuter, Kropf and Glimm (1998). Standardized Sum (SS) Test w = 1, v11 1 v22,..., 1 vmm, where v kk = 2 i=1 n i j=1 (x ijk x k ) 2. The SS statistic can be calculated as follows:

17 1. Calculate the standardized sum of observations 2. Calculate y ij = m k=1 x ijk vkk. t SS = n 1 n 2 n 1 + n 2 y 1 y 2 σ y. 3.4 Power Comparison Between OLS and SS Tests The OLS test uses σkk as standardizing factors, while the SS test uses v kk as standardizing factors, which include the between treatment differences. So one would conjecture that Power OLS Power SS.

Asymptotically, for n 1 = n 2 = n, for α-level tests, and Power OLS = Φ Power SS = Φ z α + z α + a δ a Σa b δ b Σb where a = (a 1, a 2,..., a m ), b = (b 1, b 2,..., b m ), Therefore, a k = 1 σ k 2 b k = n 2 n 2 1 σ k 2 + λ 2 k/2. Power OLS Power SS a δ a Σa b δ b Σb. Power OLS > Power SS if λ 1 > 0, λ k = 0 for k > 1. In fact (Frick 1996), lim λ 1 b δ b Σb n 2 < = lim Power SS < 1. λ 1 Note that in this case the OLS test has low power. Power OLS = Power SS if λ k = λ for all k. Note that in this case the OLS test has high power. 18

19 4. Global Tests (Heteroscedastic Case) Covariance matrices are Σ 1, Σ 2 and correlation matrices are R 1, R 2. Define Σ = (1/n 1)Σ 1 + (1/n 2 )Σ 2 1/n 1 + 1/n 2 = Ω 1 + Ω 2 1/n 1 + 1/n 2 and let Ω = Ω 1 + Ω 2. Let Σ = (1/n 1) Σ 1 + (1/n 2 ) Σ 2, Ω i = 1 Σ i, Ω = Ω 1 + Ω 2. 1/n 1 + 1/n 2 n i 4.1 ALR Test Do the test as before using Σ to find A. Use the same F approximation as before, but with the Welch-Satterthwaite approx. d.f. for the multivariate Behrens-Fisher problem (Yao 1965) given by 1 ν = 1 (d Ω 1 d) 2 where d = (x 1 x 2 ). (d Ω 1 Ω 1 Ω 1 d) 2 n 1 1 + (d Ω 1 Ω 2 Ω 1 d) 2 n 2 1,

20 Simulation Results (α = 0.05) σ 2 1 σ 2 2 σ 2 2 ρ 1 ρ 2 n 1 = n 2 = 30 n 1 = n 2 = 50 m = 4 m = 8 m = 4 m = 8 1 4 4 0 0.0536.0486.0506.0487 1 4 2 0 0.0530.0468.0501.0500 1 4 4 0 0.5.0481.0453.0486.0500 1 4 2 0 0.5.0466.0460.0512.0460 1 4 4 0.5 0.0500.0501.0458.0477 1 4 2 0.5 0.0479.0474.0449.0476 1 4 4 0.5 0.5.0469.0525.0479.0482 1 4 2 0.5 0.5.0499.0450.0473.0490

21 4.2 OLS and GLS Tests The large sample statistic for testing H 0k : δ k = 0 is z k = x 1 k x 2 k σ1,kk /n 1 + σ 2,kk /n 2. By analogy with the homoscedastic case, Pocock, Geller and Tsiatis (1987) proposed z GLS = j R 1 z j R 1 j, where z = (z 1, z 2,..., z m ) and R = n 1R 1 +n 2 R 2 n 1 +n 2. 1. This is an ad-hoc statistic. 2. The correlation matrix of z is not R, but Γ = {γ kl }, where γ kl = σ 1,kl /n 1 + σ 2,kl /n 2 (σ1,kk /n 1 + σ 2,kk /n 2 )(σ 1,ll /n 1 + σ 2,ll /n 2 ). Therefore z GLS does not have an asymptotic N(0, 1) null distribution.

Derivation of OLS and GLS Tests from First Principles Define the standardized treatment effect by 22 λ k = δ k σ1,kk + σ 2,kk. Assume λ k = λ 0 for all k. Let y ijk = x ijk σ1,kk + σ 2,kk and Γ i = Cov(y ij1,..., y ijm ) = {γ i,kl }, where γ i,kl = Cov(y ijk, y ijl ) = σ i,kl (σ1,kk + σ 2,kk )(σ 1,ll + σ 2,ll ). Note Γ = Γ 1 + Γ 2 if n 1 = n 2. We can write a regression model for y ijk as before and derive the OLS and GLS tests for testing H 0 : λ = 0.

23 The resulting test statistics are t OLS = j (y 1 y 2 ) j ( Γ 1 /n 1 + Γ 2 /n 2 )j = j t j Γj if n 1 = n 2. and t GLS = j ( Γ 1 /n 1 + Γ 2 /n 2 ) 1 (y 1 y 2 ) j ( Γ 1 /n 1 + Γ 2 /n ) 1 2 j = j Γ 1 t j Γ 1 j if n 1 = n 2.

24 5. Global Tests Based on p-values p 1, p 2,..., p m : Raw (unadjusted) p-values corresponding to H 01, H 02,..., H 0m. p (1) p (2) p (m) : Ordered p-values corresponding to H 0(1), H 0(2),..., H 0(m). 5.1 Bonferroni Test Reject H 0 if p min = min 1 k m p k < α m. Overly conservative esp. if m is large or if the ρ kl are large (Pocock, Geller and Tsiatis 1987). Powerful if only one endpoint has a large effect, but lacks power if most endpoints have moderate effects (O Brien 1984).

25 Being a union-intersection (UI) test, rejection of H 0 = m k=1 H 0k implies rejection of any H 0k with p k < α/m. (Endpoint specific inference) 5.2 Simes Test Reject H 0 if p (k) < (m k + 1)α m for some k = 1, 2,..., m. Thus for m = 5 and α =.05, reject H 0 if p (1) <.05 or p (2) <.04 or p (3) <.03 or p (4) <.02 or p (5) <.01. Simes (1986) gave a proof of type I error control only for independent statistics. The proof was extended by Sarkar and Chang (1997) to statistics having TP 2 distributions and by Sarkar (1998) to MTP 2 and certain scale mixtures of MTP 2 distributions.

6. Tests for Identifying Individual Significant Endpoints 6.1 Closed Tests Kropf (1988) and Lehmacher, Wassmer and Reitmeir (1991) suggested applying any α-level global test to test all subset hypotheses 26 H 0K = k K H 0k, K M = {1, 2,..., m} using the closure principle of Marcus, Peritz and Gabriel (1976). Can obtain decisions on all subsets K M including individual endpoints k = 1, 2,..., m.

27 6.2 Normal Theory Tests Test each H 0k using the statistic t k = n 1 n 2 n 1 + n 2 x 1 k x 2 k σ kk. and the critical value = the (1 α)th quantile of the distribution of max(t 1,..., t m ) (or use a stepwise version). The distribution of t 1,..., t m is not the standard multivariate t because they have different though correlated denominators σkk. In the bivariate case this distribution was derived by Siddiqui (1967); also see Tamhane and Logan (2001). The correlations between the numerators and denominators are unknown.

6.3 Procedures Based on Individual Adjusted p-values Find adjusted (for multiple testing) p-values, p k, such that if p k < α then one can reject H 0k (1 k m) with FWE controlled at level α. The p k depend on the multiple test procedure used. 6.3.1 A Single-Step Procedure Let P 1, P 2,..., P m be the random variables corresponding to observed p-values: p 1, p 2,..., p m. Then for a single-step procedure p k = Pr H0 min P l p k 1 l m Bonferroni (1928): p k = mp k. Sidák (1968): p k = 1 (1 p k ) m.. 28

29 Dubey (1985): p k = 1 (1 p k ) m(1 ρ k ) where ρ k is the average of ρ kl for l k. Since p k is a function of all ρ kl, instead of ρ k one should use the overall average ρ. Note that p k = p k if ρ = 1 and p k = 1 (1 p k ) m if ρ = 0. Armitage and Parmar (1986): p k = 1 (1 p k ) mf, where f is an empirically fitted function of all ρ kl. Tukey, Ciminera and Heyse (1985): p k = 1 (1 p k ) m. James (1991): Analytical approximation to multivariate normal probabilities. Westfall and Young (1993): Resampling method. Distribution-free. Implicitly accounts for actual correlations.

30 6.3.2 A Step-Down Procedure p (m) = Pr H0 min P l p (m) 1 l m p (k) = max Holm (1979): p(k+1), Pr H0 and min P l p (k) 1 l k (1 k m 1). p (m) = mp (m), p (k) = max [ p(k+1), kp (k) ] (1 k m 1). Westfall and Young (1993): Resampling method. 6.3.3 A Step-Up Procedure Hochberg (1988): p (1) = p (1), p (k) = min [ p(k 1), kp (k) ] (2 k m). Troendle (1996): Resampling method.

31 6.4 Hybrid Approach We saw that for identifying individual significant endpoints, closed procedures with global tests have high power when all or most endpoints have positive effects, and tests based on adjusted p-values have high power when only a few endpoints have positive effects. By combining these two approaches we can get a test procedure with a more uniform power performance. Use Hothorn s T max testing principle for combining the tests. Logan and Tamhane (2001) combined OLS and Bonferroni tests. p K = Pr min min k K P k, P K,OLS min min p k, p K,OLS k K Adjusted for multiple testing, but not for step-down.

32 testing. Logan (2001) added the ALR test. Use bootstrap resampling to find estimate pk of p K. Reject H 0K at level α iff all hypotheses H 0L for L K are rejected at level α and pk < α. This procedure is more power-robust than its components. It is also more powerful if the ρ kl are small, but slightly less powerful if the ρ kl are large.

33 Simulation Results for Average Power (m = 4, n = 50) Average Power Bootstrap Bootstrap Closed Corr. δ max t max t/ols OLS 0.0 (0.5,0,0,0) 0.590 0.575 0.192 (0.5,0.5,0,0) 0.623 0.615 0.321 (0.5,0.5,0.5,0) 0.655 0.664 0.511 (0.5,0.5,0.5,0.5) 0.711 0.771 0.784 (0.25,0.25,0.75,0.75) 0.601 0.623 0.614 0.5 (0.5,0,0,0) 0.623 0.619 0.165 (0.5,0.5,0,0) 0.645 0.640 0.253 (0.5,0.5,0.5,0) 0.671 0.665 0.407 (0.5,0.5,0.5,0.5) 0.717 0.737 0.759 (0.25,0.25,0.75,0.75) 0.611 0.616 0.544 0.7 (0.5,0,0,0) 0.658 0.657 0.174 (0.5,0.5,0,0) 0.674 0.673 0.236 (0.5,0.5,0.5,0) 0.698 0.695 0.383 (0.5,0.5,0.5,0.5) 0.729 0.738 0.759 (0.25,0.25,0.75,0.75) 0.625 0.627 0.526

7. Clinical Decision Rules Based on Multiple Endpoints First suppose that all endpoints are primary. Decision Rule 1: Decide that the treatment is effective if at least r of the m endpoints show significant effects. r = 1: Union-intersection (UI) testing problem. Use the Bonferroni or one of its modifications (UI tests). r = m: Intersection-union (IU) testing problem. Laska and Meisner s (1989) MIN test rejects H 0 if min t k > c α, 1 k m where c α is the upper α critical point of the marginal null distribution of each t k. 34

35 This test is very conservative (Cappizi and Zhang 1996) because the null hypothesis is the complete complement of H 1 : δ O + : H 0 : m k=1 (δ k 0). The least favorable (LF) configuration is δ k and δ l = 0 for l k. Hochberg and Mosier (2001) restricted H 0 by assuming no qualitative interaction between treatments and their effects on endpoints. Hence H 0 is a partial complement: H 0 : m k=1 (δ k 0). In this case the LF configuration is δ 1 = = δ m = 0. Hence a more powerful test is obtained.

1 < r < m: In this case an α-level test rejects H 0 if t (m r+1) > c m,r,α,{ρkl }. 36 Tamhane, Liu and Dunnett (1998) tabulated these critical points in the multivariate t and known common correlation case, but they are not applicable here. Decision Rule 2: Decide that the treatment is effective if at least m 1 of the m endpoints are each significant at α 1 level and m 2 = m m 1 endpoints are each significant at α 2 level with α 2 > α 1. Alternatively, superiority in at least m 1 endpoints and equivalence in the remaining m 2 endpoints.

37 Many other decision rules discussed by Chi (2000). Some examples involving primary and secondary endpoints are as follows: Decision Rule 3: T 1 : Primary, S 1, S 2 : Secondary. T 1 significant at level α 1 or S 1 and S 2 both significant at level α 2 with α 1 + α 2 = α. Decision Rule 4: T 1 superior at level α 1 and S 1 and S 2 both equivalent at level α 2. A GENERAL FRAMEWORK FOR SUCH COMPLEX DECISION RULES AND METHODS FOR THEIR ANALYSIS ARE NEEDED.