Lecture 21. Hypothesis Testing II

December 7, 2011

In the previous lecture, we defined a few key concepts of hypothesis testing and introduced the framework for parametric hypothesis testing. In this framework, we assume that the pdf/pmf of a random variable/vector of interest, X, is known up to a finite-dimensional parameter. Denote the pdf/pmf by f_X(x, θ), where the parameter θ is assumed to live in the parameter space Θ ⊆ R^{d_x}. The hypotheses restrict the parameter to subsets of Θ:

H_0: θ ∈ Θ_0 vs. H_1: θ ∈ Θ_1 := Θ\Θ_0. (1)

Assume that we have a random n-sample, X = {X_1, ..., X_n}. A test is a rejection region, C_R, which typically takes the form:

C_R = {X : T(X) > c}, (2)

where T(X) is a test statistic and c is a critical value. We defined the power function of the test to be:

γ(θ) := Pr_θ(T(X) > c), θ ∈ Θ. (3)

The size of the test is the maximum null rejection probability: Sz := max_{θ∈Θ_0} γ(θ). A test is said to be of significance level α iff Sz ≤ α.

Two tests of the same significance level may be compared by their power. Fix a significance level α ∈ (0, 1). Consider two tests for the hypotheses in (1), with power functions γ_1(θ) and γ_2(θ). Both tests are of significance level α, i.e., max_{θ∈Θ_0} γ_1(θ) ≤ α and max_{θ∈Θ_0} γ_2(θ) ≤ α. The first test is uniformly more powerful than the second test iff

γ_1(θ) ≥ γ_2(θ) for all θ ∈ Θ_1 and γ_1(θ) > γ_2(θ) for some θ ∈ Θ_1. (4)

Note that "uniformly" means uniformly over θ ∈ Θ_1.

Claim 1. If a test of significance level α has size Sz_1 < α, there must be another test of significance level α that is uniformly more powerful.

Proof. We show this claim by construction. Suppose the first test rejects iff T_1(X) > c for some test statistic T_1(X) and critical value c. Introduce an auxiliary random variable ε: conditional on T_1(X) > c, ε = 1; conditional on T_1(X) ≤ c, ε ~ Bern(p) with p = (1 − α)/(1 − Sz_1). Let the test statistic of the new test be T_2(X, ε) = εT_1(X) + (1 − ε)(c + 1), and let the new test's critical value also be c. In words, the new test rejects whenever the first test rejects, and additionally rejects with probability 1 − p when the first test accepts. Then the new test is of level α because, writing q(θ) := Pr_θ(T_1(X) > c), with q(θ) ≤ Sz_1 for all θ ∈ Θ_0,

Sz_2 = max_{θ∈Θ_0} Pr_θ(T_2(X, ε) > c)
     = max_{θ∈Θ_0} [Pr_θ(T_2(X, ε) > c, ε = 1) + Pr_θ(T_2(X, ε) > c, ε = 0)]
     = max_{θ∈Θ_0} [Pr_θ(T_1(X) > c) + Pr_θ(T_1(X) ≤ c)(1 − p)]
     = max_{θ∈Θ_0} [p·q(θ) + (1 − p)]
     ≤ p·Sz_1 + (1 − p) = α. (5)

The new test is uniformly more powerful than the first test because for any θ ∈ Θ_1 (in fact, in this construction, for any θ ∈ Θ with Pr_θ(T_1(X) ≤ c) > 0),

γ_2(θ) := Pr_θ(T_2(X, ε) > c) = Pr_θ(T_1(X) > c) + Pr_θ(T_1(X) ≤ c)(1 − p) > Pr_θ(T_1(X) > c) =: γ_1(θ), (6)

since 1 − p = (α − Sz_1)/(1 − Sz_1) > 0. Thus, the claim is proved.

Remark. (1) If a test of significance level α has size strictly less than α, we say this test is conservative. A conservative test can be easily improved upon if we know how much smaller

Sz is than our desired significance level. The proof above gives us a way to do so, through a randomized test. The new test in the proof above is called a randomized test because it uses an extra random variable that we generate ourselves (not from the data).

(2) The claim above does NOT imply that a conservative test is necessarily less powerful than a non-conservative test (i.e., one with Sz = α). For example, the completely randomized test (which rejects H_0 iff a Bernoulli(α) random variable independent of the data equals 1) is non-conservative, but its power is α, often much smaller than that of any test that uses the sample information in a reasonable way. In difficult testing situations, potentially conservative tests are used because it is difficult to figure out what Sz is exactly, but it is easier to prove that Sz ≤ α.

A test of significance level α is said to be uniformly most powerful (UMP) if it is uniformly more powerful than any other test of significance level α. A corollary of the claim above is that a conservative test cannot be UMP.

UMP tests do not always exist. We will discuss several situations in which they do exist.

Case 1. Both H_0 and H_1 are simple hypotheses:

H_0: θ = θ_0 vs. H_1: θ = θ_1. (7)

Lemma (Neyman-Pearson). Consider a random n-sample X from a distribution with density f_X(x, θ). For the H_0 vs. H_1 above, consider a test whose rejection region C_R satisfies: (1) any X such that f_X(X, θ_0)/f_X(X, θ_1) < k belongs to C_R; (2) any X such that f_X(X, θ_0)/f_X(X, θ_1) > k does not belong to C_R; and (3) Pr_{θ_0}(X ∈ C_R) = α. Then the test is a UMP test of significance level α.

Proof. (in the book)

Special case: If Pr_{θ_0}(f_X(X, θ_0)/f_X(X, θ_1) = k) = 0 at k = k(α), the α quantile of f_X(X, θ_0)/f_X(X, θ_1) under θ_0, then the UMP rejection region is of the form C_R = {X : f_X(X, θ_0)/f_X(X, θ_1) < k(α)}. Otherwise, one may need the UMP test to be a randomized test.
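To see why randomization can be needed, here is a small discrete illustration (a sketch with hypothetical numbers, not from the lecture): two simple hypotheses about a Bernoulli success probability, where the likelihood ratio takes only finitely many values, so no non-randomized cutoff hits size α exactly.

```python
from math import comb

# Hypothetical setup: H0: theta = 0.3 vs. H1: theta = 0.7, with S = number of
# successes in n = 10 Bernoulli trials. The likelihood ratio
# f(s; 0.3)/f(s; 0.7) = (3/7)**s * (7/3)**(n - s) is decreasing in s, so the
# Neyman-Pearson test rejects for large s -- but S is discrete, so attaining
# size alpha exactly requires randomizing at the boundary point.
n, th0, alpha = 10, 0.3, 0.05
pmf0 = [comb(n, s) * th0**s * (1 - th0)**(n - s) for s in range(n + 1)]

# Smallest cutoff c with Pr_{th0}(S > c) <= alpha:
c = min(k for k in range(n + 1) if sum(pmf0[k + 1:]) <= alpha)
# Reject outright when S > c; reject with probability gamma when S == c:
gamma = (alpha - sum(pmf0[c + 1:])) / pmf0[c]
size = sum(pmf0[c + 1:]) + gamma * pmf0[c]

assert c == 5 and 0.0 <= gamma < 1.0
assert abs(size - alpha) < 1e-12   # exact size alpha, thanks to randomization
```

Without the randomization step, the test "reject iff S > 5" would have size about 0.047, i.e., it would be conservative.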

Example 1. X_i ~ N(θ, 1) and θ_1 > θ_0. Then

f_X(x_1, ..., x_n, θ) = ∏_{i=1}^n (1/√(2π)) e^{−(x_i − θ)²/2}.

The likelihood ratio is

f_X(X, θ_0)/f_X(X, θ_1) = ∏_{i=1}^n e^{−(X_i − θ_0)²/2 + (X_i − θ_1)²/2}
                        = ∏_{i=1}^n e^{(−2(θ_1 − θ_0)X_i + θ_1² − θ_0²)/2}
                        = e^{Σ_{i=1}^n (−(θ_1 − θ_0)X_i + (θ_1² − θ_0²)/2)}
                        = e^{(n/2)(θ_1² − θ_0²) − n(θ_1 − θ_0)X̄_n}. (8)

Under H_0, this is a continuous and strictly monotone function of a continuous random variable (X̄_n ~ N(θ_0, 1/n)). Thus, it satisfies the special case: for any k > 0,

Pr_{θ_0}(f_X(X, θ_0)/f_X(X, θ_1) = k) = Pr_{θ_0}((n/2)(θ_1² − θ_0²) − n(θ_1 − θ_0)X̄_n = ln(k))
                                      = Pr_{θ_0}(X̄_n = 0.5(θ_1 + θ_0) − ln(k)/(n(θ_1 − θ_0))) = 0. (9)

Thus, we just need to find k(α) in order to get the UMP test. k(α) should satisfy Pr_{θ_0}(f_X(X, θ_0)/f_X(X, θ_1) < k(α)) = α. Thus, k(α) solves the following equation:

α = Pr_{θ_0}((n/2)(θ_1² − θ_0²) − n(θ_1 − θ_0)X̄_n < ln(k(α)))
  = Pr_{θ_0}(X̄_n > 0.5(θ_1 + θ_0) − ln(k(α))/(n(θ_1 − θ_0)))
  = Pr_{θ_0}(√n(X̄_n − θ_0) > 0.5√n(θ_1 − θ_0) − ln(k(α))/(√n(θ_1 − θ_0)))
  = 1 − Φ(0.5√n(θ_1 − θ_0) − ln(k(α))/(√n(θ_1 − θ_0))).

We could solve this equation and get a closed form for k(α), but it is more useful to realize that, for k(α) satisfying the above equation, f_X(X, θ_0)/f_X(X, θ_1) < k(α) is equivalent to √n(X̄_n − θ_0) > z_α, where z_α is the 1 − α quantile of N(0, 1). This tells us that the UMP rejection region is of the form:

C_R = {X : t_n := √n(X̄_n − θ_0) > z_α}. (10)

The statistic t_n is called the t-statistic and the test is the t-test. Notice that the UMP test in this example depends only on the null value θ_0, not on the alternative value θ_1. This suggests that C_R is also the UMP test for the following simple null against composite alternative hypotheses:

H_0: θ = θ_0 vs. H_1: θ > θ_0. (11)

Case 2. The null is simple and the alternative is composite:

H_0: θ = θ_0 vs. H_1: θ ∈ Θ\{θ_0}. (12)

Lemma. Consider a random n-sample X from a distribution with density f_X(x, θ). Assume that for any θ_1 ∈ Θ\{θ_0}, the likelihood ratio f_X(X, θ_0)/f_X(X, θ_1) is an increasing function of a common statistic T(X). Then, for the H_0 vs. H_1 above, consider a test whose rejection region C_R satisfies: (1) any X such that T(X) < k belongs to C_R; (2) any X such that T(X) > k does not belong to C_R; and (3) Pr_{θ_0}(X ∈ C_R) = α. Then the test is a UMP test of significance level α.

The additional assumption in this lemma relative to the Neyman-Pearson lemma is the monotone likelihood ratio assumption. The normal example above satisfies this assumption for the one-sided alternative in (11): for every θ_1 > θ_0, the likelihood ratio in (8) is an increasing function of T(X) = −√n(X̄_n − θ_0), so rejecting for small T(X) is exactly rejecting for large t_n. (Think: what if we'd like to test H_0: θ = θ_0 vs. H_1: θ < θ_0 in the normal example?)

Case 3. Both the null and the alternative are composite. We only consider the one-dimensional example:

H_0: θ < θ_0 vs. H_1: θ ≥ θ_0. (13)

Claim. If a test, C_R, of significance level α is UMP for H_0: θ = θ_0 vs. H_1: θ > θ_0, and sup_{θ<θ_0} Pr_θ(X ∈ C_R) ≤ α, then the test C_R is also a UMP test for H_0: θ < θ_0 vs. H_1: θ ≥ θ_0 of significance level α.
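As a sanity check on Example 1 and the condition in the claim above, here is a short simulation (a sketch, not from the lecture; the numbers n = 25 and θ_1 = 0.5 are arbitrary choices). It estimates the rejection probability of C_R = {√n(X̄_n − θ_0) > z_α} at the null, below the null, and at an alternative:

```python
import math
import random

random.seed(0)
theta0, n, alpha = 0.0, 25, 0.05
z_alpha = 1.6448536269514722           # 1 - alpha quantile of N(0, 1)

def rejects(theta):
    """One draw of the test C_R: reject iff sqrt(n) * (Xbar_n - theta0) > z_alpha."""
    xbar = sum(random.gauss(theta, 1.0) for _ in range(n)) / n
    return math.sqrt(n) * (xbar - theta0) > z_alpha

def rejection_rate(theta, reps=20000):
    return sum(rejects(theta) for _ in range(reps)) / reps

assert abs(rejection_rate(theta0) - alpha) < 0.01   # size ~ alpha at theta0
assert rejection_rate(theta0 - 0.3) < alpha         # even smaller below theta0
assert 0.75 < rejection_rate(0.5) < 0.85            # power ~ 0.80 at theta = 0.5
```

The second assertion illustrates the condition sup_{θ<θ_0} Pr_θ(X ∈ C_R) ≤ α: for this test the null rejection probability is increasing in θ, so its supremum over θ < θ_0 is attained at the boundary θ_0.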

Proof. (exercise)

The Neyman-Pearson lemma and its extensions motivate the use of the likelihood ratio test: to test H_0: θ ∈ Θ_0 against H_1: θ ∈ Θ_1, one can simply let the rejection region be the set of X that satisfies

max_{θ∈Θ_0} f_X(X, θ) / max_{θ∈Θ} f_X(X, θ) < k(α),

where k(α) is the α quantile of max_{θ∈Θ_0} f_X(X, θ)/max_{θ∈Θ} f_X(X, θ) under the null. (Verify: in the three cases above, assuming the cdf of this ratio is continuous at its α quantile, the likelihood ratio test is UMP.)

The likelihood ratio test is often used in problems where a UMP test does not exist. It is relatively simple and has reasonably good power (though it is not necessarily UMP). Typically, one uses the logarithm of the likelihood ratio (the log-likelihood ratio) instead of the likelihood ratio itself to define the rejection region:

LR_n := max_{θ∈Θ_0} ln(f_X(X, θ)) − max_{θ∈Θ} ln(f_X(X, θ)) < c(α),

where c(α) is the α quantile of LR_n under the null. In most non-normal settings, the exact distribution of LR_n is too complicated and its quantiles are too hard to find. But under fairly general conditions (for Θ_0 of a certain shape), one can show that −2LR_n converges in distribution to a χ² random variable with a known degree of freedom under any θ ∈ Θ_0. This leads to the asymptotic version of the likelihood ratio test, where the χ² quantiles are used as critical values. The size of the asymptotic version of the test is only asymptotically controlled:

lim_{n→∞} max_{θ∈Θ_0} Pr_θ(−2LR_n > χ²_{1−α,df}) = α, (14)

where df is the appropriate degree of freedom.

Some more words about the Neyman-Pearson framework for hypothesis testing. The type-I error and the type-II error are treated asymmetrically. We control the type-I error to be on or below the significance level α (which typically is a small number). And given that the type-I error is under control, we try to make the type-II error as small as possible (i.e., the power as large as possible). But the type-II error can still be very big (close to 1 − α) even for the UMP test.
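To make this concrete, here is a small computation (a sketch; the numbers n = 25 and θ_0 = 0 are hypothetical) of the power function γ(θ) = 1 − Φ(z_α − √n(θ − θ_0)) of the one-sided test from Example 1, showing that the type-II error is close to 1 − α for alternatives near the null:

```python
import math

def Phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Hypothetical numbers: n = 25 observations, theta0 = 0, level alpha = 0.05.
alpha, n, theta0 = 0.05, 25, 0.0
z_alpha = 1.6448536269514722           # 1 - alpha quantile of N(0, 1)

def power(theta):
    """Power of the test {sqrt(n)(Xbar_n - theta0) > z_alpha} at theta."""
    return 1.0 - Phi(z_alpha - math.sqrt(n) * (theta - theta0))

assert abs(power(theta0) - alpha) < 1e-9   # size alpha exactly at the null
assert power(0.02) < 0.07                  # type-II error ~ 0.94 just past the null
assert power(1.0) > 0.999                  # but near-perfect power far away
```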
This is why a rejection is a stronger conclusion than an acceptance of H_0. If the test (of significance level α) rejects H_0, the probability that it is making an error is at most α. If the test accepts H_0, the maximum probability that it is making an error is sup_{θ∈Θ_1}(1 − γ(θ)), which could be big. For this reason, it is argued that one should use the hypothesis that one wants to reject as the null hypothesis and its opposite as the alternative hypothesis.

However, because the type-I error and the type-II error are treated asymmetrically, H_0 and H_1 cannot be specified in any way we want. For example, suppose one wants to argue that θ = 0 (i.e., to reject the hypothesis that θ ≠ 0) and tries to specify H_0: θ ≠ 0 vs. H_1: θ = 0. If f_X(x, θ) is continuous in θ, then any test of significance level α cannot have power greater than α. This is because the power function γ(θ) is continuous in θ, and Sz := sup_{θ≠0} γ(θ) ≤ α implies that γ(0) ≤ α. As a result, the test rejects at most 100α% of the time even when H_1 is true.

As you may have noticed, the significance level α is important in the N-P framework. However, the choice of α is somewhat arbitrary. Conventionally, people use α = 0.01, α = 0.05, or α = 0.1. There is not much reason why these three values should be used, except that they look small and they look simple. Because different α's imply different choices of the critical value (different rejection regions), two researchers using the same test statistic for the same hypotheses may reach different conclusions depending on this α. This inconvenience prompts people to use the empirical significance level, or the p-value (probability value). The p-value is the smallest significance level at which the test rejects H_0. A smaller p-value is stronger evidence against H_0. In practice, the p-value is computed by:

p-value = Pr_{θ_0}(T(X) > T(x_1, ..., x_n)),

where (x_1, ..., x_n) is the observed value of X.
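As an illustration (a sketch with hypothetical numbers), the p-value of the one-sided test from Example 1, for an observed sample with X̄_n = 0.4, n = 25, and θ_0 = 0:

```python
import math

def Phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Hypothetical data summary: n = 25, observed Xbar_n = 0.4, null value theta0 = 0.
n, xbar, theta0 = 25, 0.4, 0.0
t_obs = math.sqrt(n) * (xbar - theta0)   # observed test statistic: 5 * 0.4 = 2.0
p_value = 1.0 - Phi(t_obs)               # Pr_{theta0}(t_n > t_obs), t_n ~ N(0, 1) under H0

assert abs(p_value - 0.0227501) < 1e-6
# A level-alpha test rejects exactly when p_value <= alpha:
assert p_value <= 0.05 and not p_value <= 0.01
```

So this sample would lead to rejection at level 0.05 but not at level 0.01, which is exactly the α-dependence that reporting the p-value itself avoids.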