Topic 10: Hypothesis Testing


Topic 10: Hypothesis Testing
Course 003, 2016

The Problem of Hypothesis Testing

A statistical hypothesis is an assertion or conjecture about the probability distribution of one or more random variables. A test of a statistical hypothesis is a rule or procedure for deciding whether to reject that assertion.

Suppose we have a sample x = (x1, ..., xn) from a density f, and two hypotheses about f. On the basis of our sample, one of the hypotheses is accepted and the other is rejected. The two hypotheses have different status:

- the null hypothesis, H0, is the hypothesis under test. It is the conservative hypothesis, not to be rejected unless the evidence against it is clear
- the alternative hypothesis, H1, specifies the kind of departure from the null hypothesis that is of interest to us

A hypothesis is simple if it completely specifies the probability distribution; otherwise it is composite.

Examples:
(i) Income is log-normally distributed with known variance but unknown mean. H0: µ ≥ 8,000 rupees per month, H1: µ < 8,000
(ii) We would like to know whether parents are more likely to have boys than girls. Denote the probability of a boy child by the Bernoulli parameter p. H0: p = 1/2 and H1: p > 1/2

Statistical tests

Before deciding whether or not to accept H0, we observe a random sample. Denote by S the set of all possible outcomes X of the random sample. A test procedure partitions the set S into two subsets: one containing the values that lead to the acceptance of H0, the other containing the values that lead to its rejection.

A statistical test is defined by its critical region R, the subset of S for which H0 is rejected. The complement of this region must therefore contain all outcomes for which H0 is not rejected.

Tests are based on values taken by a test statistic (the sample mean, the sample variance, or functions of these). In this case the critical region R is a subset of values of the test statistic for which H0 is rejected, and the critical values of the test statistic are the bounds of R.

When arriving at a decision based on a sample and a test, we may make two types of errors:
- H0 may be rejected when it is true (a Type I error)
- H0 may be accepted when it is false (a Type II error)

A test is any rule which we use to make a decision, and we want rules that lead to better decisions (in terms of these errors). We will first discuss how to evaluate a given test using its power function, and then consider situations in which optimal tests exist and derive them.

The power function

One way to characterize a test is to specify, for each value of θ ∈ Ω, the probability π(θ) that the test procedure will lead to the rejection of H0. The function π(θ) is called the power function of the test:

π(θ) = Pr(X ∈ R) for θ ∈ Ω

If we are using a test statistic T, then π(θ) = Pr(T ∈ R) for θ ∈ Ω.

Since the power function of a test specifies the probability of rejecting H0 as a function of the true parameter value, we can evaluate a test by asking how often it leads to mistakes. What is the power function of an ideal test? Think of examples where such a test exists.

It is common to specify an upper bound α0 on π(θ) for every value θ ∈ Ω0. This bound α0 is the level of significance of the test. The size of a test, α, is the maximum probability, among all values of θ ∈ Ω0, of making an incorrect decision:

α = sup_{θ ∈ Ω0} π(θ)

Given a level of significance α0, only tests for which α ≤ α0 are admissible.

Example 1: Binomial distribution

We want to test a hypothesis about the number of defective bolts in a shipment. The probability of a bolt being defective is assumed to follow a Bernoulli distribution. Consider H0: p ≤ .02, H1: p > .02. We have a sample of 200 items, and our test takes the form of rejecting the null hypothesis if we find more than a certain number of items to be defective. We want to find a test for which α0 = .05.

The number of defective items X ~ Bin(n, p), and the tail probability Pr(X > c) is increasing in p. We can therefore restrict our attention to p = .02 when finding a test with significance level α0 = .05. It turns out that for p = .02, the probability that the number of defective items is greater than 7 is .049 (stata: display 1-binomial(200,7,.02)). R = {x : x > 7} is therefore the test we choose, and its size is .049. With discrete distributions, the size will often be strictly smaller than α0. The sizes of the tests which reject for more than 4, 5 and 6 defective pieces are .37, .21 and .11 respectively.
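These tail probabilities are easy to verify. Below is a minimal stdlib-only Python sketch of the Stata display 1-binomial(...) calls (the helper name binom_tail is ours, not from the lecture):

```python
from math import comb

def binom_tail(c, n, p):
    """Pr(X > c) for X ~ Bin(n, p): the size of R = {x : x > c} at p."""
    return 1 - sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

n, p0 = 200, 0.02
for c in (4, 5, 6, 7):
    # c = 7 gives size ~ 0.049; c = 4, 5, 6 give ~ .37, .21, .11
    print(c, round(binom_tail(c, n, p0), 3))
```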

Example 1: power function

Let's graph the power function of this test:

stata: twoway function 1-binomial(200,7,x), range(0 .1)

[Figure: the power function Pr(X > 7 | p) plotted for p between 0 and .1; the vertical axis runs from 0 to 1.]

What happens to this power function as we increase or decrease the critical region (say R = {x : x > 6} or R = {x : x > 8})? Can you mark off the two types of errors for different values of p?
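For readers without Stata, the same curve can be tabulated in a few lines of Python (stdlib only; the helper name power is ours):

```python
from math import comb

def power(p, c=7, n=200):
    # pi(p) = Pr(X > c | p) for X ~ Bin(n, p): the curve plotted above
    return 1 - sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

# power rises steeply from the size .049 at p = .02 toward 1
for p in (0.02, 0.04, 0.06, 0.08, 0.10):
    print(round(p, 2), round(power(p), 3))
```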

Example 2: Uniform distribution

A random sample is taken from a uniform distribution on [0, θ] and we would like to test H0: 3 ≤ θ ≤ 4 against H1: θ < 3 or θ > 4.

Our test procedure uses the M.L.E. of θ, Yn = max(X1, ..., Xn), and rejects the null hypothesis whenever Yn lies outside [2.9, 4]. What might be the rationale for this type of test?

The power function for this test is given by

π(θ) = Pr(Yn < 2.9 | θ) + Pr(Yn > 4 | θ)

What is the power of the test if θ < 2.9? (Since Yn ≤ θ, the test then rejects with probability 1.) When θ takes values between 2.9 and 4, the probability that any single sample value is less than 2.9 is 2.9/θ, so Pr(Yn < 2.9 | θ) = (2.9/θ)^n while Pr(Yn > 4 | θ) = 0. The power function is therefore π(θ) = (2.9/θ)^n.

When θ > 4, π(θ) = (2.9/θ)^n + 1 - (4/θ)^n.
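The piecewise power function above can be coded directly. This is an illustrative Python sketch (the sample size n = 5 in the printout is an arbitrary choice of ours):

```python
def power(theta, n):
    """pi(theta) for the test rejecting when Yn = max(X1,...,Xn) lies
    outside [2.9, 4], with X1,...,Xn i.i.d. Uniform[0, theta]."""
    if theta <= 2.9:
        return 1.0                        # Yn < 2.9 with probability 1
    if theta <= 4:
        return (2.9 / theta) ** n         # only Pr(Yn < 2.9) contributes
    return (2.9 / theta) ** n + 1 - (4 / theta) ** n

# power is 1 below 2.9, falls over [2.9, 4], rises again beyond 4
for theta in (2.5, 3.0, 3.5, 4.0, 5.0):
    print(theta, round(power(theta, n=5), 3))
```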

The power graph: Example 2

[Figure: the power function of the test, plotted for θ between 2 and 5; it equals 1 for θ ≤ 2.9, decreases over [2.9, 4], and increases again for θ > 4.]

Let T be a test statistic, and suppose that our test rejects the null hypothesis when T ≥ c, for some constant c. Suppose also that we want our test to have the level of significance α0. The power function of the test is π(θ | δ) = Pr(T ≥ c | θ).

Example 3: Normal distribution

X ~ N(µ, 100) and we are interested in testing H0: µ = 80 against H1: µ > 80. Let x̄ denote the mean of a sample of n = 25 from this distribution, and suppose we use the critical region R = {(x1, x2, ..., x25) : x̄ > 83}. The power function is

π(µ) = Pr(X̄ > 83 | µ) = Pr( (X̄ - µ)/(σ/√n) > (83 - µ)/2 ) = 1 - Φ((83 - µ)/2)

The size of this test is the probability of a Type I error:

α = π(80) = 1 - Φ(3/2) = .067

What are the values of π(83) and π(86)? π(80) is given above, π(83) = .5, and π(86) = 1 - Φ(-3/2) = Φ(3/2) = .933 (stata: display normal(1.5)).

We can sketch the graph of the power function using the command

stata: twoway function 1-normal((83-x)/2), range(70 90)

What is the p-value corresponding to x̄ = 83.41? This is the smallest level of significance α0 at which the null hypothesis would be rejected given the observed outcome of X. In this case, Pr(X̄ ≥ 83.41 | µ = 80) = 1 - Φ(3.41/2) = .044.
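The calculations on this slide can be reproduced with a short stdlib-only Python sketch (Phi is the standard normal c.d.f. built from math.erf; the helper name power is ours):

```python
from math import erf, sqrt

def Phi(z):
    # standard normal c.d.f. via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

def power(mu, c=83, sigma=10, n=25):
    # pi(mu) = 1 - Phi((c - mu) / (sigma/sqrt(n))); here sigma/sqrt(n) = 2
    return 1 - Phi((c - mu) / (sigma / sqrt(n)))

print(round(power(80), 3))                  # size: 1 - Phi(1.5) ~ 0.067
print(round(power(83), 3))                  # 0.5
print(round(power(86), 3))                  # Phi(1.5) ~ 0.933
print(round(1 - Phi((83.41 - 80) / 2), 3))  # p-value ~ 0.044
```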

Testing simple hypotheses

Suppose that Ω0 and Ω1 contain only a single element each and our null and alternative hypotheses are given by H0: θ = θ0 and H1: θ = θ1.

Denote by fi(x) the joint density function or p.f. of the observations in our sample under Hi:

fi(x) = f(x1 | θi) f(x2 | θi) ... f(xn | θi)

Denote the probabilities of type I and type II errors by α(δ) and β(δ) respectively:

α(δ) = Pr(Rejecting H0 | θ = θ0) and β(δ) = Pr(Not rejecting H0 | θ = θ1)

By always accepting H0, we achieve α(δ) = 0 but then β(δ) = 1. The converse is true if we always reject H0. It turns out that we can find an optimal test which minimizes any linear combination of α(δ) and β(δ).

Optimal tests for simple hypotheses

The test procedure δ* that minimizes aα(δ) + bβ(δ) for specified constants a and b:

THEOREM: Let δ* denote a test procedure such that the hypothesis H0 is accepted if a f0(x) > b f1(x) and H1 is accepted if a f0(x) < b f1(x). Either H0 or H1 may be accepted if a f0(x) = b f1(x). Then for any other test procedure δ,

aα(δ*) + bβ(δ*) ≤ aα(δ) + bβ(δ)

So we reject whenever the likelihood ratio f1(x)/f0(x) exceeds a/b. If we are minimizing the sum of the errors (a = b = 1), we reject whenever f1(x)/f0(x) > 1.

Proof (for discrete distributions):

aα(δ) + bβ(δ) = a Σ_{x ∈ R} f0(x) + b Σ_{x ∈ R^c} f1(x) = a Σ_{x ∈ R} f0(x) + b[1 - Σ_{x ∈ R} f1(x)] = b + Σ_{x ∈ R} [a f0(x) - b f1(x)]

This expression is minimized when the critical region R includes exactly those points for which a f0(x) - b f1(x) < 0, i.e. when we reject whenever the likelihood ratio f1(x)/f0(x) exceeds a/b.
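On a small discrete sample space the theorem can be verified by brute force: enumerate every possible critical region and check that the likelihood-ratio region attains the minimum of aα + bβ. This Python sketch uses a made-up four-point model (the p.f.s f0 and f1 are hypothetical, chosen only for illustration):

```python
from itertools import chain, combinations

# Toy discrete model: sample space {0, 1, 2, 3}, p.f. f0 under H0, f1 under H1
f0 = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
f1 = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
a, b = 1.0, 1.0   # minimize the sum of the two error probabilities

def cost(R):
    alpha = sum(f0[x] for x in R)                 # Pr(reject H0 | H0)
    beta = sum(f1[x] for x in f1 if x not in R)   # Pr(accept H0 | H1)
    return a * alpha + b * beta

points = sorted(f0)
all_regions = chain.from_iterable(
    combinations(points, r) for r in range(len(points) + 1))
best = min(cost(set(R)) for R in all_regions)

# The theorem's critical region: reject H0 where a*f0(x) < b*f1(x)
Rstar = {x for x in points if a * f0[x] < b * f1[x]}
assert abs(cost(Rstar) - best) < 1e-12
print(sorted(Rstar), round(cost(Rstar), 3))   # [2, 3] 0.6
```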

Minimizing β(δ), given α0

If we fix a level of significance α0, we want a test procedure that minimizes β(δ), the probability of a type II error, subject to α(δ) ≤ α0.

The Neyman-Pearson Lemma: Let δ* denote a test procedure such that, for some constant k, the hypothesis H0 is accepted if f0(x) > k f1(x) and H1 is accepted if f0(x) < k f1(x). Either H0 or H1 may be accepted if f0(x) = k f1(x). If δ is any other test procedure such that α(δ) ≤ α(δ*), then it follows that β(δ) ≥ β(δ*). Furthermore, if α(δ) < α(δ*), then β(δ) > β(δ*).

This result implies that if we set a level of significance α0 = .05, we should try to find a value of k for which α(δ*) = .05. This procedure will then have the minimum possible value of β(δ).

Proof (for discrete distributions): From the previous theorem (with a = 1 and b = k) we know that α(δ*) + kβ(δ*) ≤ α(δ) + kβ(δ). So if α(δ) ≤ α(δ*), it follows that β(δ) ≥ β(δ*).

Neyman-Pearson Lemma: example 1

Let X1, ..., Xn be a sample from a normal distribution with unknown mean θ and variance 1, and let H0: θ = 0 and H1: θ = 1. We will find a test procedure δ* which keeps α = .05 and minimizes β. We have

f0(x) = (2π)^(-n/2) exp{-(1/2) Σ xi²} and f1(x) = (2π)^(-n/2) exp{-(1/2) Σ (xi - 1)²}

f1(x)/f0(x) = exp{n(x̄n - 1/2)}

The lemma tells us to use a procedure which rejects H0 when the likelihood ratio is greater than a constant k. This condition can be rewritten in terms of our sample mean: x̄n > 1/2 + (1/n) log k = k′.

We want to find a value of k′ for which Pr(X̄n > k′ | θ = 0) = .05 or, alternatively, Pr(Z > k′√n) = .05 (why?). This gives us k′√n = 1.645, or k′ = 1.645/√n. Under this procedure, the type II error β(δ*) is given by

β(δ*) = Pr(X̄n < 1.645/√n | θ = 1) = Pr(Z < 1.645 - √n)

For n = 9, we have β(δ*) = .0877 (stata: display normal(1.645-3)).

If instead we are interested in choosing δ0 to minimize 2α(δ) + β(δ), we reject when the likelihood ratio exceeds a/b = 2 and so choose k′ = 1/2 + (1/n) log 2; with n = 9 our optimal procedure rejects H0 when x̄n > .577. In this case α(δ0) = .0417 (stata: display 1-normal(.577*3)) and β(δ0) = .1022 (stata: display normal((.577-1)*3)), and the minimized value of 2α(δ) + β(δ) is .186.
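A short stdlib-only Python check of the n = 9 numbers quoted above (Phi is the standard normal c.d.f. built from math.erf):

```python
from math import erf, log, sqrt

def Phi(z):
    # standard normal c.d.f. via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

n = 9

# Size-.05 Neyman-Pearson test: reject when the sample mean exceeds 1.645/sqrt(n)
kprime = 1.645 / sqrt(n)
beta = Phi(1.645 - sqrt(n))               # beta = Pr(Z < 1.645 - sqrt(n))
print(round(kprime, 3), round(beta, 4))   # 0.548 0.0877

# Test minimizing 2*alpha + beta: reject when the sample mean exceeds 1/2 + (1/n) log 2
cut = 0.5 + log(2) / n
alpha0 = 1 - Phi(cut * sqrt(n))
beta0 = Phi((cut - 1) * sqrt(n))
print(round(cut, 3), round(2 * alpha0 + beta0, 3))   # 0.577 0.186
```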

Neyman-Pearson Lemma: example 2

Let X1, ..., Xn be a sample from a Bernoulli distribution, with H0: p = 0.2 and H1: p = 0.4. We will find a test procedure δ* which keeps α ≤ .05 and minimizes β. Let Y = Σ Xi and let y be its realization. We have

f0(x) = (0.2)^y (0.8)^(n-y) and f1(x) = (0.4)^y (0.6)^(n-y)

f1(x)/f0(x) = (3/4)^n (8/3)^y

The lemma tells us to use a procedure which rejects H0 when the likelihood ratio is greater than a constant k. This condition can be rewritten in terms of y:

y > [log k + n log(4/3)] / log(8/3) = k′

Now we would like to find k′ such that Pr(Y > k′ | p = 0.2) = .05. We may not, however, be able to do this exactly, given that Y is discrete. If n = 10, we find that Pr(Y > 3 | p = 0.2) = .121 and Pr(Y > 4 | p = 0.2) = .033 (stata: display 1-binomial(10,4,.2)), so we can set one of these probabilities as the value of α(δ) and use the corresponding value of k′.
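The two attainable sizes can be checked in stdlib-only Python (the helper name size is ours):

```python
from math import comb

def size(c, n=10, p0=0.2):
    # Pr(Y > c | p0) for Y ~ Bin(n, p0): the size of the test R = {y : y > c}
    return 1 - sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(c + 1))

# only certain sizes are attainable because Y is discrete
for c in (3, 4):
    print(c, round(size(c), 3))   # c = 3 -> 0.121, c = 4 -> 0.033
```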