Lecture 13: p-values and union-intersection tests


p-values

After a hypothesis test is carried out, one way of reporting the result is to report the size $\alpha$ of the test used to reject or accept $H_0$. If $\alpha$ is small, rejecting $H_0$ is convincing, but if $\alpha$ is large, rejecting $H_0$ is not very convincing. Another way of reporting the result of a hypothesis test is to report the value of a certain kind of test statistic called a p-value.

Definition 8.3.26. A p-value $p(X)$ is a test statistic satisfying $0 \le p(x) \le 1$ for every sample point $x$. Small values of $p(x)$ give evidence that $H_1$ is true. A p-value is valid iff for every $\theta \in \Theta_0$ and every $\alpha \in [0,1]$,
$$P_\theta(p(X) \le \alpha) \le \alpha.$$
If $p(X)$ is a valid p-value, then the test that rejects $H_0$ iff $p(X) \le \alpha$ is a level $\alpha$ test.

UW-Madison (Statistics) Stat 610 Lecture 13 2016 1 / 16
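Definition 8.3.26 can be checked exactly in a simple discrete setting. The sketch below (not part of the lecture; the sample size and $p_0$ are invented) takes $X \sim \mathrm{Binomial}(n, p)$ with $H_0: p \le p_0$, uses the tail p-value $p(x) = P_{p_0}(X \ge x)$, and verifies $P_\theta(p(X) \le \alpha) \le \alpha$ for several $\theta \in \Theta_0$ and $\alpha$:

```python
from math import comb

def binom_pmf(n, p, x):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def tail_pvalue(n, p0, x):
    # p(x) = P_{p0}(X >= x): small when x is large, i.e. when H1: p > p0 is supported
    return sum(binom_pmf(n, p0, k) for k in range(x, n + 1))

def validity_holds(n, p0, theta, alpha):
    # P_theta(p(X) <= alpha), computed exactly by summing over the sample space
    prob = sum(binom_pmf(n, theta, x)
               for x in range(n + 1)
               if tail_pvalue(n, p0, x) <= alpha)
    return prob <= alpha + 1e-12

n, p0 = 20, 0.3
checks = [validity_holds(n, p0, theta, alpha)
          for theta in (0.1, 0.2, 0.3)          # theta in Theta_0 = {p <= p0}
          for alpha in (0.01, 0.05, 0.1, 0.5)]
print(all(checks))  # True: the tail p-value is valid
```

The check works because $\{p(X) \le \alpha\}$ is of the form $\{X \ge c\}$, whose probability is largest over $\Theta_0$ at $p = p_0$, where it equals $p(c) \le \alpha$.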

An advantage of reporting a test result via a p-value is that each person can choose the $\alpha$ he or she considers appropriate, compare the p-value to $\alpha$, and know whether the data lead to acceptance or rejection of $H_0$. The smaller the p-value, the stronger the evidence for rejecting $H_0$. A p-value reports the result of a test on a more continuous scale, rather than just as the dichotomous decision "accept" or "reject" $H_0$.

The most common way to define a p-value is as the probability, under the null hypothesis, that a test statistic is at least as extreme as its observed value. The following result shows the validity of this p-value.

Theorem 8.3.27. Let $W(X)$ be a statistic such that large values of $W$ give evidence that $H_1$ is true. For each sample point $x$, define
$$p(x) = \sup_{\theta \in \Theta_0} P_\theta(W(X) \ge W(x)).$$
Then $p(X)$ is a valid p-value.

Proof. For a fixed $\theta \in \Theta_0$, let $F_\theta(w)$ denote the cdf of $-W(X)$, and
$$p_\theta(x) = P_\theta(W(X) \ge W(x)) = P_\theta(-W(X) \le -W(x)) = F_\theta(-W(x)).$$
For the random variable $p_\theta(X) = F_\theta(-W(X))$, by the probability integral transformation (Exercise 2.10),
$$P_\theta(p_\theta(X) \le \alpha) \le \alpha \quad \text{for every } \alpha \in [0,1].$$
Since $p(x) = \sup_{\theta \in \Theta_0} p_\theta(x) \ge p_\theta(x)$ for every $x$,
$$P_\theta(p(X) \le \alpha) \le P_\theta(p_\theta(X) \le \alpha) \le \alpha.$$
This is true for every $\theta \in \Theta_0$ and every $\alpha \in [0,1]$; hence $p(X)$ is a valid p-value.

Another common way to define a p-value involves a class of tests $T_\alpha$ indexed by $\alpha$ such that $T_\alpha$ is a level $\alpha$ test. We can then define the p-value to be the smallest level $\alpha$ at which $H_0$ would be rejected for the observed $x$, i.e.,
$$p(x) = \inf\{\alpha \in (0,1) : T_\alpha(x) \text{ rejects } H_0\}.$$

Thus, if $p(x)$ is observed, then based on the data $x$ we reject $H_0$ if $\alpha \ge p(x)$ and accept $H_0$ if $\alpha < p(x)$. We now show that this p-value is valid. Note that (for nested tests) $\inf\{t \in (0,1) : T_t(x) \text{ rejects } H_0\} \le \alpha$ implies that $T_\alpha(x)$ rejects $H_0$. Hence, for any $\theta \in \Theta_0$,
$$P_\theta(p(X) \le \alpha) \le P_\theta(T_\alpha(X) \text{ rejects } H_0) \le \alpha$$
because $T_\alpha$ has level $\alpha$.

Example 8.3.28 (two-sided normal p-value). Let $X_1, \ldots, X_n$ be iid from $N(\mu, \sigma^2)$ with unknown $\mu \in \mathbb{R}$ and $\sigma^2 > 0$. Consider testing the two-sided hypotheses $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$ with a constant $\mu_0$. From the last lecture, a UMPU test rejects $H_0$ for large values of $|t(X)| = \sqrt{n}\,|\bar{X} - \mu_0|/S$. If $\mu = \mu_0$ ($H_0$ holds), regardless of the value of $\sigma$, $t(X)$ has the t-distribution with $n-1$ degrees of freedom and, hence,
$$\sup_{\theta \in \Theta_0} P_\theta(|t(X)| \ge |t(x)|) = P_{\mu=\mu_0}(|t(X)| \ge |t(x)|) = P(|T_{n-1}| \ge |t(x)|) = 2P(T_{n-1} \ge |t(x)|),$$
where $T_{n-1}$ denotes a random variable having the t-distribution with $n-1$ degrees of freedom, which is symmetric about 0. Hence, $p(x)$ defined in Theorem 8.3.27 is $2P(T_{n-1} \ge |t(x)|)$.

On the other hand, the UMPU test of size $\alpha$ rejects $H_0$ iff $|t(x)| > t_{n-1,\alpha/2}$, where $P(T_{n-1} > t_{n-1,\alpha/2}) = \alpha/2$. Since $t_{n-1,\alpha/2}$ as a function of $\alpha$ is continuous, we obtain
$$\inf\{t \in (0,1) : \text{the UMPU test of size } t \text{ rejects } H_0\} = 2P(T_{n-1} \ge |t(x)|).$$
Thus, the two definitions of p-value agree.

Example 8.3.29 (one-sided normal p-value). Let $X_1, \ldots, X_n$ be iid from $N(\mu, \sigma^2)$ with unknown $\mu \in \mathbb{R}$ and $\sigma^2 > 0$. Consider testing the one-sided hypotheses $H_0: \mu \le \mu_0$ versus $H_1: \mu > \mu_0$ with a constant $\mu_0$.

From the last lecture, the UMPU test of size $\alpha$ rejects $H_0$ for large values of $t(X) = \sqrt{n}(\bar{X} - \mu_0)/S$. The p-value defined in Theorem 8.3.27 is
$$\sup_{\theta \in \Theta_0} P_\theta(t(X) \ge t(x)) = \sup_{\mu \le \mu_0} P_\mu\!\left(\frac{\sqrt{n}(\bar{X} - \mu_0)}{S} \ge t(x)\right) = \sup_{\mu \le \mu_0} P_\mu\!\left(\frac{\sqrt{n}(\bar{X} - \mu)}{S} \ge t(x) + \frac{\sqrt{n}(\mu_0 - \mu)}{S}\right)$$
$$= \sup_{\mu \le \mu_0} P_\mu\!\left(T_{n-1} \ge t(x) + \frac{\sqrt{n}(\mu_0 - \mu)}{S}\right) = P(T_{n-1} \ge t(x)),$$
the supremum being attained at $\mu = \mu_0$. Similar to the two-sided case,
$$\inf\{t \in (0,1) : \text{the UMPU test of size } t \text{ rejects } H_0\} = P(T_{n-1} \ge t(x)).$$
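The p-values of Examples 8.3.28 and 8.3.29 can be approximated without t-tables by simulating the null distribution of $t(X)$, which does not depend on $\sigma$. A Monte Carlo sketch (not from the lecture; the data and $\mu_0$ are invented):

```python
import random
from statistics import mean, stdev

def t_stat(xs, mu0):
    n = len(xs)
    return (n ** 0.5) * (mean(xs) - mu0) / stdev(xs)

def mc_pvalues(xs, mu0, reps=100_000, seed=0):
    # Under H0 (mu = mu0) the distribution of t(X) is t_{n-1} for any sigma,
    # so we can simulate it with standard normal draws.
    rng = random.Random(seed)
    n, t_obs = len(xs), t_stat(xs, mu0)
    one_sided = two_sided = 0
    for _ in range(reps):
        t = t_stat([rng.gauss(0.0, 1.0) for _ in range(n)], 0.0)
        one_sided += t >= t_obs                 # estimates P(T_{n-1} >= t(x))
        two_sided += abs(t) >= abs(t_obs)       # estimates 2 P(T_{n-1} >= |t(x)|)
    return one_sided / reps, two_sided / reps

data = [10.2, 9.8, 10.5, 10.9, 10.1, 10.4, 9.9, 10.6]   # hypothetical sample
p1, p2 = mc_pvalues(data, mu0=10.0)
print(round(p1, 3), round(p2, 3))
```

Since $t(x) > 0$ for this sample, the two-sided p-value comes out roughly twice the one-sided one, as the symmetry of $T_{n-1}$ predicts.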

Union-intersection tests

In some problems, tests for a complicated null hypothesis can be built from tests for simpler null hypotheses. The union-intersection method deals with hypotheses of the form
$$H_0: \theta \in \bigcap_{\gamma \in \Gamma} \Theta_\gamma \quad \text{versus} \quad H_1: \theta \in \bigcup_{\gamma \in \Gamma} \Theta_\gamma^c,$$
where $\Theta_\gamma \subset \Theta$ for each $\gamma \in \Gamma$ and $\Gamma$ is an index set that may be finite or infinite. Suppose that, for each $\gamma \in \Gamma$, we have a test of $H_{0\gamma}: \theta \in \Theta_\gamma$ versus $H_{1\gamma}: \theta \in \Theta_\gamma^c$ with rejection region $R_\gamma$. Then the rejection region for the union-intersection test (UIT) is
$$R = \bigcup_{\gamma \in \Gamma} R_\gamma.$$
This is because $H_0$ is true iff $H_{0\gamma}$ holds for every $\gamma \in \Gamma$: if any one of the $H_{0\gamma}$ is rejected, $H_0$ must be rejected; only if every $H_{0\gamma}$ is accepted is the intersection hypothesis $H_0$ accepted.

If each rejection region is of the form $R_\gamma = \{x : T_\gamma(x) > c\}$, where $T_\gamma$ is a statistic and $c$ does not depend on $\gamma$, then the rejection region for the UIT becomes
$$\bigcup_{\gamma \in \Gamma} \{x : T_\gamma(x) > c\} = \left\{x : \sup_{\gamma \in \Gamma} T_\gamma(x) > c\right\}.$$
The form of $\sup_{\gamma \in \Gamma} T_\gamma(x)$ may be derivable explicitly, as the following simple example indicates. Other examples can be found in Chapter 11.

Example 8.2.8. Let $X_1, \ldots, X_n$ be iid from $N(\mu, \sigma^2)$ with unknown $\mu \in \mathbb{R}$ and $\sigma^2 > 0$. Consider $\Gamma = \{L, U\}$ (two elements), $H_{0L}: \mu \le \mu_0$ and $H_{0U}: \mu \ge \mu_0$, where $\mu_0$ is a constant. Then
$$H_0: \mu \in \{\mu \le \mu_0\} \cap \{\mu \ge \mu_0\} = \{\mu_0\},$$
i.e., the intersection of two one-sided null hypotheses is a two-sided null hypothesis.
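The reduction of $\bigcup_\gamma R_\gamma$ to a single sup-statistic test can be seen concretely in Example 8.2.8. A sketch (not from the lecture; it anticipates the one-sided t statistics $T_L = \sqrt{n}(\bar{X}-\mu_0)/S$ and $T_U = -T_L$ used for these hypotheses, and the data are invented):

```python
from statistics import mean, stdev

def one_sided_stats(xs, mu0):
    # T_L rejects H_{0L}: mu <= mu0 for large values; T_U rejects H_{0U}: mu >= mu0
    n = len(xs)
    t = (n ** 0.5) * (mean(xs) - mu0) / stdev(xs)
    return t, -t                      # (T_L, T_U)

def uit_rejects(xs, mu0, c):
    # Union-intersection test: reject H0: mu = mu0 iff sup_gamma T_gamma(x) > c
    return max(one_sided_stats(xs, mu0)) > c

xs = [1.8, 2.6, 2.1, 2.9, 2.4]        # hypothetical data
t_L, t_U = one_sided_stats(xs, mu0=2.0)
# sup{T_L, T_U} = max(t, -t) = sqrt(n)|xbar - mu0|/S: the two-sided t statistic
print(max(t_L, t_U) == abs(t_L))  # True
```

So the supremum of the two one-sided statistics is exactly the two-sided statistic $\sqrt{n}\,|\bar{X}-\mu_0|/S$.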

The LRT (as well as the UMPU test) of size $\alpha$ for testing $H_{0L}: \mu \le \mu_0$ versus $H_{1L}: \mu > \mu_0$ rejects $H_{0L}$ iff $\sqrt{n}(\bar{X} - \mu_0)/S \ge t_{n-1,\alpha}$. The LRT (as well as the UMPU test) of size $\alpha$ for testing $H_{0U}: \mu \ge \mu_0$ versus $H_{1U}: \mu < \mu_0$ rejects $H_{0U}$ iff $\sqrt{n}(\bar{X} - \mu_0)/S \le -t_{n-1,\alpha}$, which is the same as $-\sqrt{n}(\bar{X} - \mu_0)/S \ge t_{n-1,\alpha}$. Then
$$\sup_{\gamma = L,U} T_\gamma(X) = \max\left\{\frac{\sqrt{n}(\bar{X} - \mu_0)}{S}, \; -\frac{\sqrt{n}(\bar{X} - \mu_0)}{S}\right\} = \frac{\sqrt{n}\,|\bar{X} - \mu_0|}{S},$$
and the UIT for testing $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$ rejects iff $\sqrt{n}\,|\bar{X} - \mu_0|/S \ge t_{n-1,\alpha}$, which is the LRT or UMPU test of size $2\alpha$.

Intersection-union tests

The intersection-union method may be useful when the null hypothesis is conveniently expressed as a union:
$$H_0: \theta \in \bigcup_{\gamma \in \Gamma} \Theta_\gamma \quad \text{versus} \quad H_1: \theta \in \bigcap_{\gamma \in \Gamma} \Theta_\gamma^c.$$

Suppose that, for each $\gamma \in \Gamma$, we have a test of $H_{0\gamma}: \theta \in \Theta_\gamma$ versus $H_{1\gamma}: \theta \in \Theta_\gamma^c$ with rejection region $R_\gamma$. Then the rejection region for the intersection-union test (IUT) is
$$R = \bigcap_{\gamma \in \Gamma} R_\gamma.$$
$H_0$ is false iff all of the $H_{0\gamma}$ are false, so $H_0$ can be rejected iff each of the $H_{0\gamma}$ can be rejected. Again, the test simplifies if each $R_\gamma$ is of the form $\{x : T_\gamma(x) > c\}$ with $c$ not depending on $\gamma$, in which case the rejection region for the IUT is
$$R = \bigcap_{\gamma \in \Gamma} \{x : T_\gamma(x) > c\} = \left\{x : \inf_{\gamma \in \Gamma} T_\gamma(x) > c\right\}.$$

Example 8.2.9. Let $X_1, \ldots, X_n$ be iid from $N(\mu, \sigma^2)$ with unknown $\mu \in \mathbb{R}$ and $\sigma^2 > 0$, and let $Y_1, \ldots, Y_m$ be iid Bernoulli variables with unknown $p \in (0,1)$. For quality control, we want $\mu > \mu_0$ and $p > p_0$, where $\mu_0$ and $p_0$ are constants.

Then we would like to test
$$H_0: \{\mu \le \mu_0\} \cup \{p \le p_0\} \quad \text{versus} \quad H_1: \{\mu > \mu_0\} \cap \{p > p_0\}.$$
The LRT or UMPU test of size $\alpha$ rejects $H_{01}: \mu \le \mu_0$ iff $\sqrt{n}(\bar{X} - \mu_0)/S > t_{n-1,\alpha}$, and the LRT or UMP test rejects $H_{02}: p \le p_0$ iff $\sum_{i=1}^m Y_i > b$ for a constant $b$. The rejection region for the IUT is
$$\left\{\frac{\sqrt{n}(\bar{X} - \mu_0)}{S} > t_{n-1,\alpha}, \; \sum_{i=1}^m Y_i > b\right\}.$$

The level and size of a UIT or IUT

Although UITs and IUTs are easy to construct, their properties may not be easy to obtain. The type I error probabilities of UITs and IUTs can often be bounded above by the level or size of another test. Such bounds are useful for deriving the level of a UIT or IUT, but deriving the size of a UIT or IUT may be difficult, because the bounds may not be sharp.
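The level bound described here can be computed exactly in a toy variant of Example 8.2.9 with two independent binomial components (a sketch, not from the lecture; the sample sizes, cutoffs, and parameter values are invented):

```python
from math import comb

def binom_tail(n, p, c):
    # P(X > c) for X ~ Binomial(n, p)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1, n + 1))

# Component tests: reject H_{0j}: p_j <= p0 iff the j-th count exceeds cutoff c_j
n1, n2, p0 = 25, 30, 0.5
c1, c2 = 17, 20
alpha1 = binom_tail(n1, p0, c1)   # size of test 1, attained at its boundary p1 = p0
alpha2 = binom_tail(n2, p0, c2)   # size of test 2, attained at its boundary p2 = p0

# IUT type I error at theta = (p0, 0.9), which is in Theta_0 because p1 <= p0;
# by independence, the IUT rejection probability is the product of the two tails
iut_err = binom_tail(n1, p0, c1) * binom_tail(n2, 0.9, c2)
print(iut_err <= max(alpha1, alpha2))  # True: bounded by the worst component level
```

The product form makes the bound transparent: the IUT's type I error at a null point is at most the size of whichever component test that point keeps null.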

Theorem 8.3.21. Let $\lambda_\gamma(X)$ be the LRT statistic for testing $H_{0\gamma}: \theta \in \Theta_\gamma$ versus $H_{1\gamma}: \theta \in \Theta_\gamma^c$, $\gamma \in \Gamma$, and let $\lambda(X)$ be the LRT statistic for testing $H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta_0^c$, where $\Theta_0 = \bigcap_{\gamma \in \Gamma} \Theta_\gamma$. Define $T(X) = \inf_{\gamma \in \Gamma} \lambda_\gamma(X)$ and form the UIT with rejection region
$$\{x : \lambda_\gamma(x) < c \text{ for some } \gamma \in \Gamma\} = \{x : T(x) < c\}.$$
Then
a. $T(x) \ge \lambda(x)$ for every $x \in \mathcal{X}$;
b. if $\beta_T(\theta)$ and $\beta_\lambda(\theta)$ are the power functions of the tests based on $T$ and $\lambda$, respectively, then $\beta_T(\theta) \le \beta_\lambda(\theta)$ for every $\theta \in \Theta$;
c. if the LRT is a level $\alpha$ test, then the UIT is a level $\alpha$ test.

Proof. Since $\Theta_0 = \bigcap_{\gamma \in \Gamma} \Theta_\gamma \subset \Theta_\gamma$ for any $\gamma \in \Gamma$, by the definition of the LRT,
$$\lambda_\gamma(x) \ge \lambda(x), \quad x \in \mathcal{X}, \; \gamma \in \Gamma,$$
which implies that
$$T(x) = \inf_{\gamma \in \Gamma} \lambda_\gamma(x) \ge \lambda(x), \quad x \in \mathcal{X}.$$
This proves a.

By a, $\{x : T(x) < c\} \subset \{x : \lambda(x) < c\}$, so b follows because
$$\beta_T(\theta) = P_\theta(T(X) < c) \le P_\theta(\lambda(X) < c) = \beta_\lambda(\theta), \quad \theta \in \Theta.$$
From b, if the LRT has level $\alpha$, then
$$\sup_{\theta \in \Theta_0} \beta_T(\theta) \le \sup_{\theta \in \Theta_0} \beta_\lambda(\theta) \le \alpha.$$

Theorem 8.3.21 shows that the LRT is at least as powerful as the UIT and, hence, the LRT should be used when it can be derived. However, the LRT may not be easy to construct, and that is the reason we consider UITs. Theorem 8.3.21 also shows that the UIT may have smaller type I error probabilities than the LRT, although we may be satisfied as long as all type I error probabilities are $\le \alpha$: the size of the UIT may be smaller than the size of the LRT. Also, if $H_0$ is rejected, the UIT may provide more information about which $H_{0\gamma}$ is rejected. In some special cases, the UIT is the same as the LRT, i.e., $T(x) = \lambda(x)$ for all $x$ (e.g., Example 8.2.8).
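The final remark — that the UIT statistic can coincide with the LRT statistic — can be checked numerically for Example 8.2.8. A sketch (not from the lecture; it uses the standard normal-mean LRT formulas, which the slides do not restate, and invented data):

```python
from statistics import mean, stdev

def lrt_two_sided(xs, mu0):
    # lambda(x) = (1 + t^2/(n-1))^(-n/2) for H0: mu = mu0 with unknown sigma^2
    n = len(xs)
    t = (n ** 0.5) * (mean(xs) - mu0) / stdev(xs)
    return (1 + t * t / (n - 1)) ** (-n / 2)

def lrt_one_sided(xs, mu0, side):
    # side=+1: H0: mu <= mu0; side=-1: H0: mu >= mu0.
    # lambda_gamma = 1 when the unrestricted MLE of mu already satisfies H_{0gamma};
    # otherwise the restricted MLE is mu0 and the two-sided formula applies.
    if side * (mean(xs) - mu0) <= 0:
        return 1.0
    return lrt_two_sided(xs, mu0)

xs = [0.4, 1.2, 0.9, 1.5, 0.7, 1.1]   # hypothetical data
T = min(lrt_one_sided(xs, 1.0, +1), lrt_one_sided(xs, 1.0, -1))
lam = lrt_two_sided(xs, 1.0)
print(T == lam)  # True: here T(x) = inf_gamma lambda_gamma(x) equals lambda(x)
```

One of the two one-sided statistics always equals 1 and the other equals $\lambda(x)$, so their infimum is exactly $\lambda(x)$.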

Theorem 8.3.23. For every $\gamma \in \Gamma$, let $\alpha_\gamma$ be the size of the test with rejection region $R_\gamma$ for testing $H_{0\gamma}: \theta \in \Theta_\gamma$ versus $H_{1\gamma}: \theta \in \Theta_\gamma^c$. For testing $H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta_0^c$, where $\Theta_0 = \bigcup_{\gamma \in \Gamma} \Theta_\gamma$, the IUT with rejection region $R = \bigcap_{\gamma \in \Gamma} R_\gamma$ has level $\sup_{\gamma \in \Gamma} \alpha_\gamma$.

Proof. Let $\theta \in \Theta_0$. Then $\theta \in \Theta_\gamma$ for some $\gamma \in \Gamma$, and
$$P_\theta(X \in R) \le P_\theta(X \in R_\gamma) \le \alpha_\gamma \le \sup_{\gamma \in \Gamma} \alpha_\gamma.$$
The result follows since $\theta \in \Theta_0$ is arbitrary.

Typically each $\alpha_\gamma = \alpha$, so that the IUT is of level $\alpha$. In Theorem 8.3.21 the tests are LRTs, but in Theorem 8.3.23 the tests can be arbitrary; note that an LRT of a given size or level may not be easy to obtain. Theorem 8.3.23 gives a level for the IUT, not the size. The following result gives conditions under which the size of the IUT is exactly $\alpha$.

Theorem 8.3.24. Let $\Theta_0 = \bigcup_{j=1}^k \Theta_j$, where $k$ is a fixed positive integer and $\Theta_j \subset \Theta$. For each $j = 1, \ldots, k$, let $R_j$ be the rejection region of a level $\alpha$ test of $H_{0j}: \theta \in \Theta_j$ versus $H_{1j}: \theta \in \Theta_j^c$. Suppose that for some integer $i$, $1 \le i \le k$, there exists a sequence of parameter values $\theta_l \in \Theta_i$, $l = 1, 2, \ldots$, such that
(i) $\lim_{l \to \infty} P_{\theta_l}(X \in R_i) = \alpha$;
(ii) for every integer $j \ne i$, $1 \le j \le k$, $\lim_{l \to \infty} P_{\theta_l}(X \in R_j) = 1$.
Then the IUT with rejection region $R = \bigcap_{j=1}^k R_j$ is a size $\alpha$ test of $H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta_0^c$.

Proof. By Theorem 8.3.23, the size of the IUT is $\le \alpha$. Since all $\theta_l \in \Theta_i \subset \Theta_0$,
$$\text{size of the IUT} = \sup_{\theta \in \Theta_0} P_\theta(X \in R) \ge \lim_{l \to \infty} P_{\theta_l}(X \in R) = \lim_{l \to \infty} P_{\theta_l}\!\left(X \in \bigcap_{j=1}^k R_j\right).$$
By Bonferroni's inequality, the last quantity is no smaller than
$$\lim_{l \to \infty} \left[\sum_{j=1}^k P_{\theta_l}(X \in R_j) - (k-1)\right] = \alpha + (k-1) - (k-1) = \alpha$$
by conditions (i)-(ii). Thus the size must be $\alpha$.

Example 8.3.25. In Example 8.2.9 ($k = 2$), let $X = (X_1, \ldots, X_n)$, $Y = (Y_1, \ldots, Y_m)$, $\theta = (\mu, \sigma^2, p)$, $\Theta_1 = \{\mu \le \mu_0\}$, and $\Theta_2 = \{p \le p_0\}$. For $\theta_l = (\mu_0, 1, 1 - l^{-1})$, $l = 2, 3, \ldots$, we have $\theta_l \in \Theta_1 \subset \Theta_0 = \Theta_1 \cup \Theta_2$, and
$$\lim_{l \to \infty} P_{\theta_l}\!\left(\frac{\sqrt{n}(\bar{X} - \mu_0)}{S} > t_{n-1,\alpha}\right) = \alpha, \qquad \lim_{l \to \infty} P_{\theta_l}\!\left(\sum_{i=1}^m Y_i > b\right) = 1.$$
Hence Theorem 8.3.24 applies and the size of the IUT is $\alpha$.

The sizes of the tests $j \ne i$ in Theorem 8.3.24 do not have to be $\alpha$. Also, in Example 8.3.25 only the marginal distributions of the $X_i$'s and $Y_i$'s are needed, not the joint distribution of $(X, Y)$.
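Theorem 8.3.24's conclusion for Example 8.3.25 can be checked by simulation: at $\theta = (\mu_0, 1, p)$ with $p$ close to 1, the IUT's rejection probability should be close to $\alpha$. A Monte Carlo sketch (not from the lecture; the constants below, including the binomial cutoff $b$ and the critical value estimated by simulation, are invented):

```python
import random
from statistics import mean, stdev

def t_stat(xs, mu0):
    n = len(xs)
    return (n ** 0.5) * (mean(xs) - mu0) / stdev(xs)

def t_crit(n, alpha, reps=100_000, seed=1):
    # Monte Carlo estimate of t_{n-1,alpha}: the upper-alpha quantile of t(X) under mu = mu0
    rng = random.Random(seed)
    ts = sorted(t_stat([rng.gauss(0.0, 1.0) for _ in range(n)], 0.0) for _ in range(reps))
    return ts[int((1 - alpha) * reps)]

def iut_rejects(xs, ys, mu0, c, b):
    # Reject H0 iff both component tests reject
    return t_stat(xs, mu0) > c and sum(ys) > b

n, m, alpha, mu0, b = 10, 15, 0.05, 0.0, 10
c = t_crit(n, alpha)
rng = random.Random(2)
reps = 20_000
hits = 0
for _ in range(reps):
    xs = [rng.gauss(mu0, 1.0) for _ in range(n)]               # mu = mu0: boundary of Theta_1
    ys = [1 if rng.random() < 0.995 else 0 for _ in range(m)]  # p close to 1
    hits += iut_rejects(xs, ys, mu0, c, b)
print(round(hits / reps, 3))  # should be close to alpha = 0.05
```

With $p = 0.995$ the binomial component rejects almost surely, so the IUT's type I error is driven by the t component alone, matching the limiting argument in the example.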