INTRODUCTION TO INTERSECTION-UNION TESTS

Similar documents
A Brief Introduction to Intersection-Union Tests. Jimmy Akira Doi. North Carolina State University Department of Statistics

Likelihood Ratio Tests and Intersection-Union Tests. Roger L. Berger. Department of Statistics, North Carolina State University

Lecture 13: p-values and union intersection tests

Compatible simultaneous lower confidence bounds for the Holm procedure and other Bonferroni based closed tests

Political Science 236 Hypothesis Testing: Review and Bootstrapping

Lecture 21: October 19

STAT 135 Lab 5 Bootstrapping and Hypothesis Testing

LECTURE 5 HYPOTHESIS TESTING

Hypothesis Test. The opposite of the null hypothesis, called an alternative hypothesis, becomes

Statistical Inference

Practice Problems Section Problems

Tests about a population mean

A note on vector-valued goodness-of-fit tests

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Statistical Inference

The International Journal of Biostatistics

Partitioning the Parameter Space. Topic 18 Composite Hypotheses

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Multiple Testing in Group Sequential Clinical Trials

Multiple Testing of One-Sided Hypotheses: Combining Bonferroni and the Bootstrap

Testing a secondary endpoint after a group sequential test. Chris Jennison. 9th Annual Adaptive Designs in Clinical Trials

Multivariate Statistical Analysis

Brock University. Department of Computer Science. A Fast Randomisation Test for Rule Significance

Review: General Approach to Hypothesis Testing. 1. Define the research question and formulate the appropriate null and alternative hypotheses.

1 Statistical inference for a population mean

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Chapter Seven: Multi-Sample Methods 1/52

Modified Simes Critical Values Under Positive Dependence

Summary and discussion of: Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing

F79SM STATISTICAL METHODS

Mathematical statistics

On Generalized Fixed Sequence Procedures for Controlling the FWER

Institute of Actuaries of India

Week 14 Comparing k(> 2) Populations

Psychology 282 Lecture #4 Outline Inferences in SLR

Fundamental Probability and Statistics

University of California, Berkeley

IMPROVING TWO RESULTS IN MULTIPLE TESTING

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

ON TWO RESULTS IN MULTIPLE TESTING

Lecture 5: ANOVA and Correlation

Control of Directional Errors in Fixed Sequence Multiple Testing

This paper has been submitted for consideration for publication in Biometrics

STATISTICS ANCILLARY SYLLABUS. (W.E.F. the session ) Semester Paper Code Marks Credits Topic

y ˆ i = ˆ " T u i ( i th fitted value or i th fit)

Lecture 8 Inequality Testing and Moment Inequality Models

Mixtures of multiple testing procedures for gatekeeping applications in clinical trials

Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics

On Procedures Controlling the FDR for Testing Hierarchically Ordered Hypotheses

Master s Written Examination

Chapter 8 of Devore , H 1 :

Formal Statement of Simple Linear Regression Model

Test Volume 11, Number 1. June 2002

Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis

Review. December 4 th, Review

Type I error rate control in adaptive designs for confirmatory clinical trials with treatment selection at interim

Probability and Measure

Topic 15: Simple Hypotheses

FDR-CONTROLLING STEPWISE PROCEDURES AND THEIR FALSE NEGATIVES RATES

Chapter 7. Hypothesis Testing

INTERVAL ESTIMATION AND HYPOTHESES TESTING

Alpha-Investing. Sequential Control of Expected False Discoveries

STAT 515 fa 2016 Lec Statistical inference - hypothesis testing

Chapter 11. Hypothesis Testing (II)

Statistical Data Analysis Stat 3: p-values, parameter estimation

Introductory Econometrics

First Year Examination Department of Statistics, University of Florida

Space Telescope Science Institute statistics mini-course. October Inference I: Estimation, Confidence Intervals, and Tests of Hypotheses

Bios 6649: Clinical Trials - Statistical Design and Monitoring

First Year Examination Department of Statistics, University of Florida


LECTURE 10: NEYMAN-PEARSON LEMMA AND ASYMPTOTIC TESTING. The last equality is provided so this can look like a more familiar parametric test.

AN ALTERNATIVE APPROACH TO EVALUATION OF POOLABILITY FOR STABILITY STUDIES

MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30

Parameter Estimation, Sampling Distributions & Hypothesis Testing

ON STEPWISE CONTROL OF THE GENERALIZED FAMILYWISE ERROR RATE. By Wenge Guo and M. Bhaskara Rao

Bios 6649: Clinical Trials - Statistical Design and Monitoring

The Design of Group Sequential Clinical Trials that Test Multiple Endpoints

Hypothesis Testing. BS2 Statistical Inference, Lecture 11 Michaelmas Term Steffen Lauritzen, University of Oxford; November 15, 2004

Recall that in order to prove Theorem 8.8, we argued that under certain regularity conditions, the following facts are true under H 0 : 1 n

Family-wise Error Rate Control in QTL Mapping and Gene Ontology Graphs

Testing Algebraic Hypotheses

ORF 245 Fundamentals of Statistics Chapter 9 Hypothesis Testing

Pubh 8482: Sequential Analysis

Generalized, Linear, and Mixed Models

Consonance and the Closure Method in Multiple Testing. Institute for Empirical Research in Economics University of Zurich

HYPOTHESIS TESTING: FREQUENTIST APPROACH.

Sequential Monitoring of Clinical Trials Session 4 - Bayesian Evaluation of Group Sequential Designs

Finite-sample quantiles of the Jarque-Bera test

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics

Applying the Benjamini Hochberg procedure to a set of generalized p-values

Multivariate Time Series: Part 4

Can we do statistical inference in a non-asymptotic way? 1

Qualifying Exam in Probability and Statistics.

Hypothesis Testing. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA

Testing Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata

Lecture 7 Introduction to Statistical Decision Theory

Interval Estimation. Chapter 9

Spring 2012 Math 541B Exam 1

Inferences about a Mean Vector

Transcription:

INTRODUCTION TO INTERSECTION-UNION TESTS Jimmy A. Doi, Cal Poly State University San Luis Obispo Department of Statistics (jdoi@calpoly.edu Key Words: Intersection-Union Tests; Multiple Comparisons; Acceptance Sampling; Bioequivalence. Abstract The Intersection-Union Test (IU T has become increasingly popular, especially through its application in bioequivalence testing. Here we will provide a basic introduction and discuss some properties of the IUT. Since this method is based upon a combination of tests, there may be a concern for the need of a multiplicity adjustment to adequately control the overall type-i error rate. We will address this issue by considering two theorems. The first describes how the IU T can be level-α and the second shows under what conditions the test is size-α. Both results do not require any multiplicity adjustment. A simple example from acceptance sampling will be used to apply these theorems, and we will examine Monte Carlo simulations to verify the expected results. 1 Introduction 1.1 Multiple Comparisons Consider an experiment consisting of 5 treatments. We would then have a total of 10 possible pairwise comparisons or contrasts. Let t 1, t 2,..., t 10 represent 5% level t-tests corresponding to the respective contrasts. Suppose we combine all tests into φ t, where φ t rejects if any of the t i rejects. Then, the familywise error rate α 0.05 despite the fact that each of the individual t i is a 0.05 level test. In fact, the correct value for α would be about 40%. Looking at Table 1 it is evident that the overall type I error rate quickly grows beyond 0.05 as the number of comparisons increases. In order to adequately control the overall type I error rate we would need to utilize a multiplicity adjustment procedure. Examples of such procedures are described by Steel

Table 1: Familywise error rates corresponding to levels of n, the total number of comparisons. n 3 5 10 20 45 α 14% 23% 40% 64% 90% et al. (1997 in their chapter of multiple comparisons. The key issue to keep in mind here is the fact that φ t rejects if at least one of the individual tests t i rejects. In a moment, we will see how this compares to the intersection-union test and its basis for rejection. 1.2 Model Definition Given X f(x θ, suppose H 0 is expressed as a union of k sets (note: the index set need not be finite: H 0 : θ Θ 0 = k Θ i vs. H A : θ Θ c 0 = i=1 For each i, let R i be the rejection region for a test of H 0i : θ Θ i versus H Ai : θ Θ c i. k Θ c i (1 Then the rejection region for the IUT of H 0 versus H A is R = k i=1 R i. In other words, the IUT rejects only if all of the tests reject. The rationale behind R is easy to understand since the null hypothesis (H 0 is false if and only if each of the individual null hypotheses (H 0i is false. Note that the IUT is quite different from φ t in the manner in which rejection is determined. Will this difference have any effect upon the familywise error rate of the IUT? Is there a multiplicity adjustment need for the IUT as well? In order to address this issue, let us discuss two important theorems that deal with the level and size of the IUT. i=1

2 Two Theorems The following theorems are from Berger (1997. 2.1 Theorem 1 Theorem 1 If R i is a level-α test of H 0i, for i = 1,..., k, then the IUT with rejection region R = k i=1r i is a level-α test of H 0 versus H A in 1. Proof. Let θ Θ 0 = k i=1θ i. So, for some i = 1,..., k (say i = i, θ Θ i. Thus, P θ ( R i P θ (R i α Since θ Θ 0 was arbitrarily chosen, the IUT is level-α as sup θ Θ0 P θ ( R i α The result provides an overall type I error rate of α without the need for a multiplicity adjustment. It is made possible through the special way in which the individual tests are combined for the IUT. This illustrates the usefulness of the IUT as we can rely upon the simpler tests of the individual hypotheses and not worry about the formulation of the underlying joint multivariate distribution. This benefit is especially apparent when considering the situation where observations are believed to be correlated. It is important to note, however, that the IUT constructed via Theorem 1 can be quite conservative. In order to find when the IUT is size-α, let us appeal to the following theorem. 2.2 Theorem 2 Theorem 2 For some i = 1,..., k, suppose R i is a size-α rejection region for testing H 0i vs. H Ai. For every j = 1,..., k, j i, suppose R j is a level-α rejection region for testing H 0j vs. H Aj. Suppose there exists a sequence of parameter points θ l, l = 1, 2,..., in Θ i such that and, for every j = i,..., k, j i, lim l P θl (R i = α, lim l P θl (R j = 1.

Then the IUT with rejection region R = k i=1r i is a size α test of H 0 vs. H A. Proof. ( k lim l P θl R m lim l k P θ l (R m (k 1 = α + (k 1 (k 1 = α. The inequality above is based upon the Bonferroni inequality. Applying Theorem 1 we have α sup θ Θ0 P θ ( k R m So, we have ( k ( α sup θ Θ0 P θ R k m lim l P θl R m α ( k Hence, sup θ Θ0 P θ R m = α. Also, refer to the appendix for an alternative proof of Theorem 2. The key component of Theorem 2 is the use of the sequence of parameter points θ l Θ 0. To gain a better understanding of these two theorems, let us consider the following application of an example in acceptance sampling. 3 Example and Simulation Example. A sample of upholstery fabric is considered to be acceptable when it meets or exceeds certain criteria. Let θ 1 be the mean breaking strength of the fabric and θ 2 be the probability of the fabric passing a flammability test. Suppose the fabric is deemed to be acceptable when the following conditions are met: θ 1 > 50 and θ 2 > 0.95. Placing this under the hyposthesis testing framework, let us define H 0 : {(θ 1, θ 2 : θ 1 50 or θ 2 0.95} and H A : {(θ 1, θ 2 : θ 1 > 50 and θ 2 > 0.95}

where the fabric is deemed to be acceptable when the null hypothesis is iid rejected. Suppose we model the data where X 1,..., X n N(θ 1, σ 2 and iid Y 1,..., Y m Bern(θ 2. For the X i, let us use the corresponding likelihood ratio test (LRT of H 01 : θ 1 50 which has the form (x 50/(s/ n > t. Likewise, for the Y i, let us use the corresponding LRT of H 02 : θ 2 0.95 which has the form m i=1 y i > b. Then, combining these two components, the IUT rejection region is { k R = R i = (x, y : x 50 m } s/ n > t and y i > b. i=1 For the Monte Carlo simulation, let n = m = 58. Then, setting t = 1.672 and b = 57, the likelihood ratio tests above are approximate size-α tests. In this example, since the individual tests are (approx. size-α, applying Theorem 1 we have that the IUT is level-α. We can in fact go further by applying Theorem 2 and state that the IUT is size-α. To see this, let us construct a sequence of parameter points θ l Θ 0 that will satisfy the conditions of Theorem 2. Figure 3 illustrates the graphical representation of the null and alternative hypothesis spaces. Notice that, for our particular example, the boundary of the two regions is the union of the sets {(θ 1, θ 2 : θ 2 = 0.95, θ 1 50} and {(θ 1, θ 2 : θ 1 = 50, θ 2 0.95}. If we choose a sequence of parameter points from either set above (representing the horizontal and vertical boundary lines respectively it can be easily shown that the IU T attains size-α. For verification, a Monte Carlo simulation was performed to calculate estimates of the type I error rate, α. In Table 2, estimates were calculated where θ 2 is fixed at 0.95 and θ 1 is increasing (selecting points along the horizontal boundary. In Table 3, estimates were calculated where θ 1 is fixed at 50 and θ 2 is increasing towards 1(selecting points along the vertical boundary. For each simulation, a total of 10,000 runs were performed. As expected, the results indicate that the IUT does attain a size of 0.05. Finally, to determine Monte Carlo estimates of power, we chose 9 specific points from H A (see Figure 2. The values 60, 65, and 70 for θ 1 were chosen since they seemed to generate fairly large differences among the corresponding LRT statistics. The values.975,.990, and.999 for θ 2 were likewise selected for the same reason. This was the basis for generating the 9 specific points and the results of the power estimates are summarized in Table 4. i=1

Table 2: Monte Carlo Estimates of α, Fixed θ 2 and Increasing θ 1. θ 1 =50 θ 1 = 60 θ 1 = 75 θ 1 = 100 θ 2 =.95.002.041.049.051 NOTE: Each estimate based on 10,000 MC runs. Table 3: Monte Carlo Estimates of α, Fixed θ 1 and Increasing θ 2. θ 2 =.95 θ 2 =.99 θ 2 =.999 θ 2 =.9999 θ 1 =50.002.015.049.049 NOTE: Each estimate based on 10,000 MC runs. Table 4: MC Estimates of Power. θ 2.975.99.999 60.185.445.757 θ 1 65.226.544.927 70.230.553.944 NOTE: Each estimate based on 10,000 MC runs.

4 Summary We have defined the IU T model and discussed two theorems that describe how the IU T can attain level-α and size-α. Through these theorems we found that an overall type I error rate could be attained without the need for a multiplicity adjustment. This was made possible by the special way in which the individual tests are combined. The usefulness of the IUT is that there is no need to postulate a multivariate model since we only need to make use of the individual hypothesis tests and these are usually much easier to construct. In addition to the field of acceptance sampling, the IUT has been used for the comparison of regression functions (see Berger (1984 and testing for contingency tables (see Cohen et al. (1983. But, the field that has placed most attention to the IUT is the growing area of bioequivalence (see Berger and Hsu (1996. lim l P θ l APPENDIX Alternate Proof of Theorem 2 Recall that k R m is the rejection region of the IUT. ( k ( ( k 1 P θl R m = lim l ( k 1 lim P θl (R c l m The last inequality above follows from the fact that P ( m A m m P (A m. R c m Note : k P θl (R c m = P θl (R c i + m i P θl (R c m Now, we have lim l P θl (R c i 1 α. Also, for all m i we have lim l P θl (R c m 0. So, ( k ( 1 lim P θl (R c l m = 1 (1 α + 0 = α m i

So, we have lim l P θ l ( k Applying Theorem 1 we also have α sup θ Θ 0 P θ R m α (2 ( k R m (3 Putting 2 and 3 together we finally have ( k ( k α sup P θ R m lim P θl R m α θ Θ 0 l showing that the IUT is indeed a size α test! REFERENCES BERGER, R.L. (1984, Testing whether one regression function is larger than another, Communications in Statistics, Theory and Methods 13, 1793 1810. BERGER, R.L. (1997, Likelihood Ratio Tests and Intersection-Union Tests, Advances in Statistical Decision Theory and Applications, Boston: Birkhauser. pp. 225 237. BERGER, R.L. and Hsu, J.C. (1996, Bioequivalence trials, intersectionunion tests, and quivalence confidence sets. With discussion. Statistical Science, 11, 283 319. COHEN, A., GATSONIS, C. and MARDEN, J.I. (1983, Hypothesis testing for marginal probabilities in a 2 2 2 contingency table with conditional independence, Journal of the American Statistical Association, 78, 920 929. STEEL, R.G.D., TORRIE, J.H., DICKEY, D.A. (1997, Principles and Procedures of Statistics A Biometrical Approach, New York: McGraw- Hill.

Figure 1: Graphical illustration of H 0 versus H A.

Figure 2: Selected points in H A for determination of power estimates