STAT 801: Mathematical Statistics. Hypothesis Testing

Hypothesis testing is a statistical problem where you must choose, on the basis of data $X$, between two alternatives. We formalize this as the problem of choosing between two hypotheses, $H_0: \theta \in \Theta_0$ or $H_1: \theta \in \Theta_1$, where $\Theta_0$ and $\Theta_1$ partition the parameter space $\Theta$ of the model $\{P_\theta;\ \theta \in \Theta\}$. That is, $\Theta_0 \cup \Theta_1 = \Theta$ and $\Theta_0 \cap \Theta_1 = \emptyset$.

A rule for making the required choice can be described in two ways:

1. In terms of the set
$$R = \{X : \text{we choose } \Theta_1 \text{ if we observe } X\},$$
called the rejection or critical region of the test.

2. In terms of a function $\phi(x)$ which is equal to 1 for those $x$ for which we choose $\Theta_1$ and 0 for those $x$ for which we choose $\Theta_0$.

For technical reasons which will come up soon I prefer to use the second description. However, each $\phi$ corresponds to a unique rejection region $R_\phi = \{x : \phi(x) = 1\}$.

The Neyman-Pearson approach treats the two hypotheses asymmetrically. The hypothesis $H_0$ is referred to as the null hypothesis (traditionally the hypothesis that some treatment has no effect).

Definition: The power function of a test $\phi$ (or of the corresponding critical region $R_\phi$) is
$$\pi(\theta) = P_\theta(X \in R_\phi) = E_\theta(\phi(X)).$$

We are interested in optimality theory, that is, the problem of finding the best $\phi$. A good $\phi$ will evidently have $\pi(\theta)$ small for $\theta \in \Theta_0$ and large for $\theta \in \Theta_1$. There is generally a trade-off, which can be made in many ways, however.

Simple versus simple testing

Finding a best test is easiest when the hypotheses are very precise.

Definition: A hypothesis $H_i$ is simple if $\Theta_i$ contains only a single value $\theta_i$.

The simple versus simple testing problem arises when we test $\theta = \theta_0$ against $\theta = \theta_1$, so that $\Theta$ has only two points in it. This problem is of importance as a technical tool, not because it is a realistic situation.

Suppose that the model specifies that if $\theta = \theta_0$ then the density of $X$ is $f_0(x)$, and if $\theta = \theta_1$ then the density of $X$ is $f_1(x)$. How should we choose $\phi$? To answer the question we begin by studying the problem of minimizing the total error probability.

A Type I error is the error made when $\theta = \theta_0$ but we choose $H_1$, that is, $X \in R_\phi$. A Type II error is made when $\theta = \theta_1$ but we choose $H_0$. The level of a simple versus simple test is
$$\alpha = P_{\theta_0}(\text{we make a Type I error}) = P_{\theta_0}(X \in R_\phi) = E_{\theta_0}(\phi(X)).$$
The other error probability, denoted $\beta$, is
$$\beta = P_{\theta_1}(X \notin R_\phi) = E_{\theta_1}(1 - \phi(X)).$$
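To make the two error probabilities concrete, here is a minimal numerical sketch (mine, not part of the notes). It assumes an illustrative simple-versus-simple problem: one observation $X$ with $H_0: X \sim N(0,1)$ versus $H_1: X \sim N(1,1)$ and rejection region $R = \{x : x > c\}$, with the cutoffs $c$ chosen arbitrarily.

```python
# Sketch: alpha and beta for a simple-vs-simple test.
# Assumed example (not from the notes): one observation X,
# H0: X ~ N(0,1) vs H1: X ~ N(1,1), rejection region R = {x > c}.
from scipy.stats import norm

for c in [0.5, 1.0, 1.645, 2.0]:
    alpha = norm.sf(c, loc=0, scale=1)   # P_{theta_0}(X in R): Type I error
    beta = norm.cdf(c, loc=1, scale=1)   # P_{theta_1}(X not in R): Type II error
    print(f"c = {c:5.3f}: alpha = {alpha:.4f}, beta = {beta:.4f}, "
          f"alpha + beta = {alpha + beta:.4f}")
```

Raising $c$ lowers $\alpha$ but raises $\beta$: this is exactly the trade-off that the following minimization, and later the Neyman-Pearson formulation, manage.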

We minimize $\alpha + \beta$, the total error probability, given by
$$\alpha + \beta = E_{\theta_0}(\phi(X)) + E_{\theta_1}(1 - \phi(X)) = \int [\phi(x) f_0(x) + (1 - \phi(x)) f_1(x)]\, dx.$$

The problem is to choose, for each $x$, either the value 0 or the value 1 for $\phi(x)$, in such a way as to minimize the integral. But for each $x$ the quantity $\phi(x) f_0(x) + (1 - \phi(x)) f_1(x)$ is between $f_0(x)$ and $f_1(x)$. To make it small we take $\phi(x) = 1$ if $f_1(x) > f_0(x)$ and $\phi(x) = 0$ if $f_1(x) < f_0(x)$. It makes no difference what we do for those $x$ for which $f_1(x) = f_0(x)$. Notice: dividing both sides of these inequalities by $f_0(x)$ gives the condition in terms of the likelihood ratio $f_1(x)/f_0(x)$.

Theorem: For each fixed $\lambda$ the quantity $\beta + \lambda\alpha$ is minimized by any $\phi$ which has
$$\phi(x) = \begin{cases} 1 & f_1(x)/f_0(x) > \lambda \\ 0 & f_1(x)/f_0(x) < \lambda. \end{cases}$$

Neyman and Pearson suggested that in practice the two kinds of errors might well have unequal consequences. They suggested that rather than minimize any quantity of the form above, you pick the more serious kind of error, label it Type I, and require your rule to hold the probability $\alpha$ of a Type I error to be no more than some prespecified level $\alpha_0$. (This value $\alpha_0$ is typically 0.05 these days, chiefly for historical reasons.) The Neyman-Pearson approach is then to minimize $\beta$ subject to the constraint $\alpha \le \alpha_0$. Usually this is really equivalent to the constraint $\alpha = \alpha_0$, because if you use $\alpha < \alpha_0$ you could make $R$ larger, keep $\alpha \le \alpha_0$, and make $\beta$ smaller. For discrete models, however, this may not be possible.

Example: Suppose $X$ is Binomial$(n, p)$ and either $p = p_0 = 1/2$ or $p = p_1 = 3/4$. If $R$ is any critical region (so $R$ is a subset of $\{0, 1, \ldots, n\}$) then $P_{1/2}(X \in R) = k/2^n$ for some integer $k$. If we want $\alpha_0 = 0.05$ with, say, $n = 5$, we have to recognize that the possible values of $\alpha$ are $0$, $1/32 = 0.03125$, $2/32 = 0.0625$, and so on.

Possible rejection regions for $\alpha_0 = 0.05$:

Region            α         β
R_1 = ∅           0         1
R_2 = {x = 0}     0.03125   1 − (1/4)^5
R_3 = {x = 5}     0.03125   1 − (3/4)^5

So $R_3$ minimizes $\beta$ subject to $\alpha \le 0.05$.

Raise $\alpha_0$ slightly to 0.0625: the possible rejection regions are $R_1$, $R_2$, $R_3$ and $R_4 = R_2 \cup R_3$. The first three have the same $\alpha$ and $\beta$ as before, while $R_4$ has $\alpha = \alpha_0 = 0.0625$ and $\beta = 1 - (3/4)^5 - (1/4)^5$. Thus $R_4$ is optimal! Problem: if all trials are failures, this optimal $R$ chooses $p = 3/4$ rather than $p = 1/2$. But $p = 1/2$ makes 5 failures much more likely than does $p = 3/4$.

The problem is discreteness. Here is how we get around it. First we expand the set of possible values of $\phi$ to include numbers between 0 and 1. A value of $\phi(x)$ between 0 and 1 represents the chance that we choose $H_1$ given that we observe $x$; the idea is that we actually toss a (biased) coin to decide! This tactic will show us the kinds of rejection regions which are sensible. In practice we then restrict our attention to levels $\alpha_0$ for which the best $\phi$ is always either 0 or 1. In the binomial example we will insist that the value of $\alpha_0$ be either 0 or $P_{\theta_0}(X \ge 5)$ or $P_{\theta_0}(X \ge 4)$ and so on.

Smaller example: with $n = 3$ there are 4 possible values of $X$ and $2^4$ possible rejection regions. Here is a table of the levels for each possible rejection region $R$:

R                               α
∅                               0
{0}, {3}                        1/8
{0,3}                           2/8
{1}, {2}                        3/8
{0,1}, {0,2}, {1,3}, {2,3}      4/8
{0,1,3}, {0,2,3}                5/8
{1,2}                           6/8
{0,1,2}, {1,2,3}                7/8
{0,1,2,3}                       1
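The table can be reproduced by brute force. This is a small sketch of my own (not from the notes): it enumerates all $2^4$ subsets of $\{0,1,2,3\}$ as rejection regions for Binomial$(3, 1/2)$ versus Binomial$(3, 3/4)$ and records each region's level $\alpha$ and Type II error probability $\beta$, using exact fractions.

```python
# Sketch: enumerate all rejection regions for the n = 3 binomial example,
# H0: p = 1/2 vs H1: p = 3/4, with exact levels as fractions.
from itertools import combinations
from fractions import Fraction
from math import comb

n = 3
p0, p1 = Fraction(1, 2), Fraction(3, 4)

def pmf(x, p):
    # Binomial(n, p) probability of x, kept as an exact fraction
    return comb(n, x) * p**x * (1 - p)**(n - x)

rows = []
for size in range(n + 2):
    for R in combinations(range(n + 1), size):
        alpha = sum(pmf(x, p0) for x in R)      # level under H0
        beta = 1 - sum(pmf(x, p1) for x in R)   # Type II error under H1
        rows.append((alpha, beta, R))

for alpha, beta, R in sorted(rows):
    label = set(R) if R else "{}"
    print(f"R = {label}: alpha = {alpha}, beta = {beta}")
```

Restricting to non-randomized tests, the achievable levels are exactly the multiples of 1/8 listed above, and at level 2/8 the region {0, 3} has the smallest $\beta$.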

The best level 2/8 test has rejection region $\{0, 3\}$, with
$$\beta = 1 - [(3/4)^3 + (1/4)^3] = 36/64.$$
The best level 2/8 test using randomization rejects when $X = 3$ and, when $X = 2$, tosses a coin with $P(H) = 1/3$ and rejects if you get H. Its level is $1/8 + (1/3)(3/8) = 2/8$; its probability of a Type II error is
$$\beta = 1 - [(3/4)^3 + (1/3)(3)(3/4)^2(1/4)] = 28/64.$$

Definition: A hypothesis test is a function $\phi(x)$ whose values are always in $[0, 1]$. If we observe $X = x$ then we choose $H_1$ with conditional probability $\phi(x)$. In this case we have
$$\pi(\theta) = E_\theta(\phi(X)), \qquad \alpha = E_0(\phi(X)), \qquad \beta = 1 - E_1(\phi(X)).$$
Note that a test using a rejection region $C$ is equivalent to $\phi(x) = 1(x \in C)$.

The Neyman-Pearson Lemma: In testing $f_0$ against $f_1$, the probability $\beta$ of a Type II error is minimized, subject to $\alpha \le \alpha_0$, by the test function
$$\phi(x) = \begin{cases} 1 & f_1(x)/f_0(x) > \lambda \\ \gamma & f_1(x)/f_0(x) = \lambda \\ 0 & f_1(x)/f_0(x) < \lambda \end{cases}$$
where $\lambda$ is the largest constant such that
$$P_0\left(\frac{f_1(X)}{f_0(X)} \ge \lambda\right) \ge \alpha_0 \quad\text{and}\quad P_0\left(\frac{f_1(X)}{f_0(X)} \le \lambda\right) \ge 1 - \alpha_0,$$
and where $\gamma$ is any number chosen so that
$$E_0(\phi(X)) = P_0\left(\frac{f_1(X)}{f_0(X)} > \lambda\right) + \gamma P_0\left(\frac{f_1(X)}{f_0(X)} = \lambda\right) = \alpha_0.$$
The value of $\gamma$ is unique if $P_0(f_1(X)/f_0(X) = \lambda) > 0$.

Example: Binomial$(n, p)$ with $p_0 = 1/2$ and $p_1 = 3/4$: the ratio $f_1/f_0$ is $3^x 2^{-n}$. If $n = 5$ this ratio is one of 1, 3, 9, 27, 81, 243 divided by 32. Suppose we have $\alpha_0 = 0.05$. Then $\lambda$ must be one of the possible values of $f_1/f_0$. If we try $\lambda = 243/32$ then
$$P_0(3^X 2^{-5} \ge 243/32) = P_0(X = 5) = 1/32 < 0.05,$$
while
$$P_0(3^X 2^{-5} \ge 81/32) = P_0(X \ge 4) = 6/32 > 0.05.$$
So $\lambda = 81/32$. Since
$$P_0(3^X 2^{-5} > 81/32) = P_0(X = 5) = 1/32,$$
we must solve
$$P_0(X = 5) + \gamma P_0(X = 4) = 0.05$$
for $\gamma$ and find
$$\gamma = \frac{0.05 - 1/32}{5/32} = 0.12.$$
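The values of $\lambda$ and $\gamma$ above can be checked numerically. A minimal sketch (mine, using scipy's binomial distribution); it relies on the ratio $3^x/2^n$ being increasing in $x$, so that $P_0(\text{ratio} \ge 3^x/2^n) = P_0(X \ge x)$.

```python
# Sketch: Neyman-Pearson lambda and gamma for Binomial(5, p),
# H0: p = 1/2 vs H1: p = 3/4, alpha_0 = 0.05.
from scipy.stats import binom

n, alpha0 = 5, 0.05
# f1(x)/f0(x) = 3**x / 2**n is increasing in x, so
# P_0(ratio >= 3**x / 2**n) = P_0(X >= x) = binom.sf(x - 1, n, 0.5).
for x in range(n, -1, -1):
    if binom.sf(x - 1, n, 0.5) >= alpha0:
        k = x                    # largest valid lambda is 3**k / 2**n
        break

lam = 3**k / 2**n
# gamma solves P_0(X > k) + gamma * P_0(X = k) = alpha_0
gamma = (alpha0 - binom.sf(k, n, 0.5)) / binom.pmf(k, n, 0.5)
print(f"lambda = {lam} = 3^{k}/2^{n}, gamma = {gamma:.2f}")
# Expected: lambda = 81/32 = 2.53125, gamma = 0.12.
```

(For this $\lambda$ the second condition in the lemma, $P_0(\text{ratio} \le \lambda) \ge 1 - \alpha_0$, also holds, since $P_0(X \le 4) = 31/32$.)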

NOTE: No-one ever uses this procedure. Instead, the value of $\alpha_0$ used in discrete problems is chosen to be a possible value of the rejection probability when $\gamma = 0$ (or $\gamma = 1$). When the sample size is large you can come very close to any desired $\alpha_0$ with a non-randomized test.

If $\alpha_0 = 6/32$ then we can either take $\lambda = 81/32$ and $\gamma = 1$, or $\lambda = 27/32$ and $\gamma = 0$; both describe the same test, which rejects when $X \ge 4$. However, our definition of $\lambda$ in the theorem makes $\lambda = 81/32$ and $\gamma = 1$.

When the theorem is used for continuous distributions, it can be the case that the cdf of $f_1(X)/f_0(X)$ has a flat spot where it is equal to $1 - \alpha_0$. This is the point of the word "largest" in the theorem.

Example: If $X_1, \ldots, X_n$ are iid $N(\mu, 1)$ and we have $\mu_0 = 0$ and $\mu_1 > 0$, then
$$\frac{f_1(X_1, \ldots, X_n)}{f_0(X_1, \ldots, X_n)} = \exp\left\{\mu_1 \sum X_i - n\mu_1^2/2 - \mu_0 \sum X_i + n\mu_0^2/2\right\},$$
which simplifies to $\exp\{\mu_1 \sum X_i - n\mu_1^2/2\}$. Now choose $\lambda$ so that
$$P_0\left(\exp\left\{\mu_1 \sum X_i - n\mu_1^2/2\right\} > \lambda\right) = \alpha_0.$$
We can make this probability exactly $\alpha_0$ because $f_1(X)/f_0(X)$ has a continuous distribution. Rewrite the probability as
$$P_0\left(\sum X_i > \frac{\log\lambda + n\mu_1^2/2}{\mu_1}\right) = 1 - \Phi\left(\frac{\log\lambda + n\mu_1^2/2}{n^{1/2}\mu_1}\right).$$
Let $z_\alpha$ be the upper $\alpha$ critical point of $N(0, 1)$; then
$$z_{\alpha_0} = \frac{\log\lambda + n\mu_1^2/2}{n^{1/2}\mu_1}.$$
Solve this to get a formula for $\lambda$ in terms of $z_{\alpha_0}$, $n$ and $\mu_1$.

The rejection region looks complicated: reject if a complicated statistic is larger than a $\lambda$ which has a complicated formula. But in calculating $\lambda$ we re-expressed the rejection region in terms of
$$\sum X_i > z_{\alpha_0} n^{1/2}.$$
The key feature is that this rejection region is the same for any $\mu_1 > 0$. [WARNING: in the algebra above I used $\mu_1 > 0$.] This is why the Neyman-Pearson lemma is a lemma!
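The equivalence between the likelihood-ratio cutoff $\lambda$ and the simpler region $\sum X_i > z_{\alpha_0}\sqrt{n}$ can be checked directly. A small sketch of my own, with $n$ and $\mu_1$ chosen purely for illustration:

```python
# Sketch: for testing mu = 0 vs mu = mu1 > 0 with X_i iid N(mu, 1),
# check that {f1/f0 > lambda} is the same region as {sum(X_i) > z*sqrt(n)}.
import numpy as np
from scipy.stats import norm

n, mu1, alpha0 = 10, 0.7, 0.05        # illustrative values, not from the notes
z = norm.ppf(1 - alpha0)              # upper alpha_0 point of N(0,1)
log_lam = z * np.sqrt(n) * mu1 - n * mu1**2 / 2   # log(lambda) from the formula

rng = np.random.default_rng(0)
s = rng.normal(0.0, 1.0, size=(200_000, n)).sum(axis=1)   # sum(X_i) under H0
log_ratio = mu1 * s - n * mu1**2 / 2                      # log f1/f0

same = np.array_equal(log_ratio > log_lam, s > z * np.sqrt(n))
print("regions agree:", same)
print("simulated level:", (log_ratio > log_lam).mean())   # approx. 0.05
```

Changing `mu1` to any other positive value leaves the region $\{s > z\sqrt{n}\}$ unchanged, which is the point of the example.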

Definition: In the general problem of testing $\Theta_0$ against $\Theta_1$, the level of a test function $\phi$ is
$$\alpha = \sup_{\theta \in \Theta_0} E_\theta(\phi(X)).$$
The power function is
$$\pi(\theta) = E_\theta(\phi(X)).$$
A test $\phi^*$ is a Uniformly Most Powerful level $\alpha_0$ test if:

1. $\phi^*$ has level $\alpha \le \alpha_0$;

2. if $\phi$ has level $\alpha \le \alpha_0$, then for every $\theta \in \Theta_1$ we have
$$E_\theta(\phi(X)) \le E_\theta(\phi^*(X)).$$

Proof of the Neyman-Pearson lemma: Given a test $\phi$ with level strictly less than $\alpha_0$, we can define the test
$$\phi^*(x) = \frac{1 - \alpha_0}{1 - \alpha}\phi(x) + \frac{\alpha_0 - \alpha}{1 - \alpha},$$
which has level $\alpha_0$ and a $\beta$ smaller than that of $\phi$. Hence we may assume without loss that $\alpha = \alpha_0$ and minimize $\beta$ subject to $\alpha = \alpha_0$. However, the argument which follows doesn't actually need this.

Lagrange Multipliers

Suppose you want to minimize $f(x)$ subject to $g(x) = 0$. Consider first the function
$$h_\lambda(x) = f(x) + \lambda g(x).$$
If $x_\lambda$ minimizes $h_\lambda$, then for any other $x$
$$f(x_\lambda) \le f(x) + \lambda[g(x) - g(x_\lambda)].$$
Now suppose you can find a value of $\lambda$ such that the solution $x_\lambda$ has $g(x_\lambda) = 0$. Then for any $x$ we have
$$f(x_\lambda) \le f(x) + \lambda g(x),$$
and for any $x$ satisfying the constraint $g(x) = 0$ we have
$$f(x_\lambda) \le f(x).$$
This proves that for this special value of $\lambda$ the quantity $x_\lambda$ minimizes $f(x)$ subject to $g(x) = 0$. Notice that to find $x_\lambda$ you set the usual partial derivatives equal to 0; then to find the special $x_\lambda$ you add in the condition $g(x_\lambda) = 0$.
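As a sanity check on the mechanics, here is a trivial worked example of my own (not from the notes). To minimize $f(x) = x^2$ subject to $g(x) = x - 1 = 0$, form
$$h_\lambda(x) = x^2 + \lambda(x - 1).$$
Setting $h_\lambda'(x) = 0$ gives $x_\lambda = -\lambda/2$, and the choice $\lambda = -2$ makes $g(x_\lambda) = g(1) = 0$; by the argument above, $x = 1$ therefore minimizes $f$ subject to the constraint. The recipe is exactly the one used next: tune the multiplier until the unconstrained minimizer happens to satisfy the constraint.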

Return to the proof of the NP lemma. For each $\lambda > 0$ we have seen that $\phi_\lambda = 1(f_1(x)/f_0(x) \ge \lambda)$ minimizes $\lambda\alpha + \beta$. As $\lambda$ increases, the level of $\phi_\lambda$ decreases from 1 when $\lambda = 0$ to 0 when $\lambda = \infty$. There is thus a value $\lambda_0$ such that for $\lambda > \lambda_0$ the level is less than $\alpha_0$, while for $\lambda < \lambda_0$ the level is at least $\alpha_0$. Temporarily let $\delta = P_0(f_1(X)/f_0(X) = \lambda_0)$. If $\delta = 0$, define $\phi = \phi_{\lambda_0}$. If $\delta > 0$, define
$$\phi(x) = \begin{cases} 1 & f_1(x)/f_0(x) > \lambda_0 \\ \gamma & f_1(x)/f_0(x) = \lambda_0 \\ 0 & f_1(x)/f_0(x) < \lambda_0 \end{cases}$$
where $P_0(f_1(X)/f_0(X) > \lambda_0) + \gamma\delta = \alpha_0$. You can check that $\gamma \in [0, 1]$.

Now $\phi$ has level $\alpha_0$ and, according to the theorem above, minimizes $\lambda_0\alpha + \beta$. Suppose $\phi'$ is some other test with level $\alpha' \le \alpha_0$. Then
$$\lambda_0\alpha_\phi + \beta_\phi \le \lambda_0\alpha_{\phi'} + \beta_{\phi'}.$$
We can rearrange this as
$$\beta_{\phi'} \ge \beta_\phi + (\alpha_\phi - \alpha_{\phi'})\lambda_0.$$
Since
$$\alpha_{\phi'} \le \alpha_0 = \alpha_\phi,$$
the second term is non-negative and $\beta_{\phi'} \ge \beta_\phi$, which proves the Neyman-Pearson lemma.

Example application of the NP lemma: Binomial$(n, p)$, testing $p = p_0$ versus $p = p_1$ for a $p_1 > p_0$. The NP test is of the form
$$\phi(x) = 1(X > k) + \gamma 1(X = k)$$
where we choose $k$ so that
$$P_{p_0}(X > k) \le \alpha_0 < P_{p_0}(X \ge k)$$
and $\gamma \in [0, 1)$ so that
$$\alpha_0 = P_{p_0}(X > k) + \gamma P_{p_0}(X = k).$$
This rejection region depends only on $p_0$ and not on $p_1$, so this test is UMP for $p = p_0$ against $p > p_0$. Since this test has level $\alpha_0$ even for the larger null hypothesis, it is also UMP for $p \le p_0$ against $p > p_0$.

Application of the NP lemma: In the $N(\mu, 1)$ model, consider $\Theta_1 = \{\mu > 0\}$ and $\Theta_0 = \{0\}$ or $\Theta_0 = \{\mu \le 0\}$. The UMP level $\alpha_0$ test of $H_0: \mu \in \Theta_0$ against $H_1: \mu \in \Theta_1$ is
$$\phi(X_1, \ldots, X_n) = 1(n^{1/2}\bar{X} > z_{\alpha_0}).$$

Proof: For either choice of $\Theta_0$ this test has level $\alpha_0$, because for $\mu \le 0$ we have
$$P_\mu(n^{1/2}\bar{X} > z_{\alpha_0}) = P_\mu(n^{1/2}(\bar{X} - \mu) > z_{\alpha_0} - n^{1/2}\mu) = P(N(0,1) > z_{\alpha_0} - n^{1/2}\mu) \le P(N(0,1) > z_{\alpha_0}) = \alpha_0.$$
(Notice the use of $\mu \le 0$. The central point is that the critical point is determined by the behaviour on the edge of the null hypothesis.)

Now if $\phi'$ is any other level $\alpha_0$ test, then we have $E_0(\phi'(X_1, \ldots, X_n)) \le \alpha_0$. Fix a $\mu > 0$. According to the NP lemma,
$$E_\mu(\phi'(X_1, \ldots, X_n)) \le E_\mu(\phi_\mu(X_1, \ldots, X_n))$$
where $\phi_\mu$ rejects if
$$f_\mu(X_1, \ldots, X_n)/f_0(X_1, \ldots, X_n) > \lambda$$
for a suitable $\lambda$. But we just checked that this test has a rejection region of the form
$$n^{1/2}\bar{X} > z_{\alpha_0},$$
which is the rejection region of $\phi$. The NP lemma produces the same test for every $\mu > 0$ chosen as an alternative, so we have shown that $\phi_\mu = \phi$ for any $\mu > 0$.

This phenomenon is somewhat general. What happened was this: for any $\mu > \mu_0$ the likelihood ratio $f_\mu/f_{\mu_0}$ is an increasing function of $\sum X_i$. The rejection region of the NP test is thus always a region of the form $\sum X_i > k$. The value of the constant $k$ is determined by the requirement that the test have level $\alpha_0$, and this depends on $\mu_0$, not on $\mu_1$.

Definition: The family $\{f_\theta; \theta \in \Theta \subset \mathbb{R}\}$ has monotone likelihood ratio with respect to a statistic $T(X)$ if for each $\theta_1 > \theta_0$ the likelihood ratio $f_{\theta_1}(X)/f_{\theta_0}(X)$ is a monotone increasing function of $T(X)$.

Theorem: For a monotone likelihood ratio family, the Uniformly Most Powerful level $\alpha_0$ test of $\theta \le \theta_0$ (or of $\theta = \theta_0$) against the alternative $\theta > \theta_0$ is
$$\phi(x) = \begin{cases} 1 & T(x) > t_\alpha \\ \gamma & T(x) = t_\alpha \\ 0 & T(x) < t_\alpha \end{cases}$$
where
$$P_{\theta_0}(T(X) > t_\alpha) + \gamma P_{\theta_0}(T(X) = t_\alpha) = \alpha_0.$$
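The recipe in the theorem (find the cutoff, then the randomization probability) is easy to implement. A minimal sketch of my own for the Binomial$(n, p)$ family, which has monotone likelihood ratio in $T(X) = X$:

```python
# Sketch: UMP level alpha0 test of p <= p0 vs p > p0 for Binomial(n, p),
# which has monotone likelihood ratio in T(X) = X.
from scipy.stats import binom

def ump_binomial_test(n, p0, alpha0):
    """Return (k, gamma): reject if X > k; if X == k, reject w.p. gamma."""
    # smallest k with P_{p0}(X > k) <= alpha0
    k = min(k for k in range(n + 1) if binom.sf(k, n, p0) <= alpha0)
    # gamma makes the level exact: P_{p0}(X > k) + gamma*P_{p0}(X = k) = alpha0
    gamma = (alpha0 - binom.sf(k, n, p0)) / binom.pmf(k, n, p0)
    return k, gamma

k, gamma = ump_binomial_test(n=5, p0=0.5, alpha0=0.05)
print(f"reject if X > {k}; if X = {k}, reject with probability {gamma:.2f}")
# For n = 5, p0 = 1/2, alpha0 = 0.05 this gives k = 4, gamma = 0.12,
# matching the Neyman-Pearson example earlier in the notes.
```

Note that the function never looks at $p_1$: as the theorem says, the same test is optimal against every alternative $p > p_0$ at once.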

A typical family where this works is a one-parameter exponential family.

Usually there is no UMP test. Example: test $\mu = \mu_0$ against the two-sided alternative $\mu \ne \mu_0$. There is no UMP level $\alpha_0$ test. If there were, its power at $\mu > \mu_0$ would have to be as high as that of the one-sided level $\alpha_0$ test, and so its rejection region would have to be the same as that test's, rejecting for large positive values of $\bar{X} - \mu_0$. But it would also have to have power as good as the one-sided test for the alternative $\mu < \mu_0$, and so would have to reject for large negative values of $\bar{X} - \mu_0$. This would make its level too large.

The favourite test, the usual two-sided test, rejects for large values of $|\bar{X} - \mu_0|$. This test maximizes power subject to two constraints: first, it has level $\alpha_0$; second, its power is minimized at $\mu = \mu_0$. The second condition means the power on the alternative is larger than on the null.
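Both constraints can be seen in the power function. A short sketch (my own choice of $n$ and grid), computing the power of the usual two-sided z-test and confirming that it equals the level at $\mu_0$ and is minimized there:

```python
# Sketch: power function of the usual two-sided z-test of mu = mu0,
# which rejects when |sqrt(n) * (xbar - mu0)| > z_{alpha0/2}, X_i iid N(mu, 1).
import numpy as np
from scipy.stats import norm

n, mu0, alpha0 = 25, 0.0, 0.05           # illustrative values
z = norm.ppf(1 - alpha0 / 2)

def power(mu):
    # pi(mu) = P(|Z + d| > z) where d = sqrt(n)*(mu - mu0) and Z ~ N(0,1)
    d = np.sqrt(n) * (mu - mu0)
    return norm.cdf(-z - d) + norm.sf(z - d)

mus = np.linspace(mu0 - 1, mu0 + 1, 201)
pi = power(mus)
print("power at mu0 (the level):", power(mu0))        # 0.05
print("power minimized at mu =", mus[np.argmin(pi)])  # mu0
```

Every alternative $\mu \ne \mu_0$ has power above $\alpha_0$, which is the "power larger on the alternative than on the null" condition in the final paragraph.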