Hypothesis Testing

Definition 3.1 A statistical hypothesis is a statement about the unknown values of the parameters of the population distribution.

Suppose the family of population distributions is indexed by the d-dimensional vector $\theta \in \Theta \subseteq \mathbb{R}^d$. We shall deal with hypotheses of the form $\theta \in \Theta_0 \subseteq \Theta$.

Statistical Hypotheses

Definition 3.2 A statistical test requires two hypotheses. The hypothesis to be tested is called the null hypothesis, $H_0$. The alternative hypothesis, $H_1$, is the hypothesis which will be accepted as true if we conclude that $H_0$ is false.

Definition 3.3 If a statistical hypothesis completely specifies the population distribution then it is called a simple hypothesis; otherwise it is called a composite hypothesis.

The Critical Region

Definition 3.4 The critical region (or rejection region) associated with a statistical test is a subset of the sample space such that we reject the null hypothesis in favour of the alternative if, and only if, the observed sample falls within this set. Usually the critical region is specified in terms of a test statistic.

Hypothesis Testing Procedure

1. Specify the null hypothesis, $H_0$, which will be tested.
2. Specify the alternative hypothesis, $H_1$.
3. Specify the test statistic which will be used to test the hypothesis and define the critical region for the test.
4. Collect the data.
5. Reject $H_0$ if the observed value of the test statistic lies in the critical region; otherwise conclude that we cannot reject $H_0$.
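As an illustration of these five steps, here is a minimal sketch of a one-sided test for a normal mean with known variance. The hypotheses, significance level, and data are all assumed for the example and do not come from the notes.

```python
import math
from scipy.stats import norm

# Assumed setup (illustration only): X1,...,Xn iid N(mu, sigma^2) with
# sigma = 4 known; test H0: mu = 50 versus H1: mu > 50 at alpha = 0.05.
mu0, sigma, alpha = 50.0, 4.0, 0.05

# Steps 1-3: the test statistic is Z = (Xbar - mu0) / (sigma / sqrt(n)),
# with critical region {z : z > z_alpha}, z_alpha the upper-alpha quantile.
z_alpha = norm.ppf(1 - alpha)

# Step 4: collect the data (made-up numbers for the sketch).
data = [52.1, 49.8, 53.4, 51.0, 50.7, 52.9, 48.5, 51.8]
n = len(data)
xbar = sum(data) / n

# Step 5: reject H0 iff the observed statistic lies in the critical region.
z_obs = (xbar - mu0) / (sigma / math.sqrt(n))
print(f"z_obs = {z_obs:.3f}, critical value = {z_alpha:.3f}")
print("Reject H0" if z_obs > z_alpha else "Cannot reject H0")
```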

The Null Hypothesis

$H_0$ is generally the sceptical hypothesis. Hypothesis testing can be thought of as the search for evidence against $H_0$ in favour of $H_1$. We generally do not conclude that $H_0$ is true; rather we conclude that there is insufficient evidence to prove it false.

It is generally thought to be worse to declare $H_0$ false when it is not than to fail to reject $H_0$ when it is actually false. For this reason we generally try to limit the probability of rejecting $H_0$ when it is true.

Size of a Test

Definition 3.5 Suppose that we are testing $H_0: \theta \in \Theta_0$ versus $H_1: \theta \notin \Theta_0$ and that our rejection region is of the form $\{x : x \in C\}$. Then the size of the rejection region (or test) is defined to be

$$\alpha = \sup_{\theta \in \Theta_0} P_\theta(X \in C)$$

The size of the test is the highest probability of rejecting $H_0$ when it is true. Generally we decide on a value of $\alpha$ and then find the set $C$ such that the test has size $\alpha$.
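To make the supremum in Definition 3.5 concrete, the sketch below evaluates the rejection probability over a composite null for an assumed normal model; the supremum is attained on the boundary of $\Theta_0$.

```python
import numpy as np
from scipy.stats import norm

# Assumed setup (illustration only): X1,...,Xn iid N(theta, 1); test
# H0: theta <= 0 versus H1: theta > 0 with rejection region {xbar > c}.
n, c = 25, 0.329   # c chosen so the test has size roughly 0.05

# P_theta(Xbar > c) = 1 - Phi(sqrt(n) * (c - theta)) is increasing in theta,
# so the supremum over Theta0 = (-inf, 0] is attained at theta = 0.
thetas = np.linspace(-1.0, 0.0, 101)
rejection_prob = 1 - norm.cdf(np.sqrt(n) * (c - thetas))
size = rejection_prob.max()
print(f"size = sup over Theta0 of P_theta(X in C) = {size:.4f}")  # about 0.05
```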

Likelihood Ratio Tests

Definition 3.6 Suppose that $x_1, \ldots, x_n$ are the observed values of a random sample from a single-parameter distribution and we wish to test the simple null hypothesis $H_0: \theta = \theta_0$ versus $H_1: \theta \neq \theta_0$. Let $\hat{\theta}$ be the maximum likelihood estimate of $\theta$. Then the likelihood ratio test has rejection region

$$C = \left\{ (x_1, \ldots, x_n) : \frac{L(\theta_0 \mid x_1, \ldots, x_n)}{L(\hat{\theta} \mid x_1, \ldots, x_n)} \le k \right\}$$

where $k$ is chosen such that $P_{\theta_0}(X \in C) = \alpha$.
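The following sketch applies Definition 3.6 to an assumed Poisson sample, with the cut-off $k$ calibrated by simulation under $H_0$; everything in it (model, $\theta_0$, data) is illustrative.

```python
import numpy as np
from scipy.stats import poisson

# Assumed setup (illustration only): X1,...,Xn iid Poisson(theta); test
# H0: theta = theta0 versus H1: theta != theta0 using the ratio
# L(theta0 | x) / L(thetahat | x), where the MLE is thetahat = xbar.
rng = np.random.default_rng(0)
theta0, n, alpha = 3.0, 40, 0.05

def likelihood_ratio(x):
    theta_hat = max(x.mean(), 1e-12)   # MLE, guarded against a zero mean
    log_lr = poisson.logpmf(x, theta0).sum() - poisson.logpmf(x, theta_hat).sum()
    return np.exp(log_lr)

# Calibrate k by simulation so that P_theta0(LR <= k) is about alpha.
sims = np.array([likelihood_ratio(rng.poisson(theta0, n)) for _ in range(20000)])
k = np.quantile(sims, alpha)

x_obs = rng.poisson(3.6, n)            # made-up data for the illustration
print("Reject H0" if likelihood_ratio(x_obs) <= k else "Cannot reject H0")
```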

Generalized Likelihood Ratio Test

Definition 3.7 Suppose that $x_1, \ldots, x_n$ are the observed values of a random sample from a distribution depending on the parameter vector $\theta \in \Theta$ and we wish to test $H_0: \theta \in \Theta_0$ versus $H_1: \theta \notin \Theta_0$. Then the generalized likelihood ratio test statistic is

$$\lambda(x) = \frac{\sup_{\Theta_0} L(\theta \mid x_1, \ldots, x_n)}{\sup_{\Theta} L(\theta \mid x_1, \ldots, x_n)} = \frac{L(\hat{\theta}_0 \mid x_1, \ldots, x_n)}{L(\hat{\theta} \mid x_1, \ldots, x_n)}$$

where $\hat{\theta}$ is the maximum likelihood estimator and $\hat{\theta}_0$ is the maximum likelihood estimator constrained to be in the set $\Theta_0$. The generalized likelihood ratio test has rejection region $C = \{(x_1, \ldots, x_n) : \lambda(x) \le k\}$ where $k$ is chosen such that $\sup_{\theta \in \Theta_0} P_\theta(\lambda(X) \le k) = \alpha$.
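As a sketch of Definition 3.7, the code below computes $\lambda(x)$ for a normal model with both mean and variance unknown, where both suprema have closed forms; the model and data are assumed for illustration.

```python
import numpy as np

# Assumed setup (illustration only): X1,...,Xn iid N(mu, sigma^2), both
# parameters unknown; test H0: mu = mu0 versus H1: mu != mu0.
def glrt_lambda(x, mu0):
    n = len(x)
    s2_hat = np.mean((x - x.mean()) ** 2)   # unrestricted MLE of sigma^2
    s2_0 = np.mean((x - mu0) ** 2)          # MLE of sigma^2 restricted to Theta0
    # Plugging the MLEs into the normal likelihood gives
    # lambda = L(mu0, s2_0) / L(muhat, s2_hat) = (s2_hat / s2_0)^(n/2).
    return (s2_hat / s2_0) ** (n / 2)

x = np.array([5.1, 4.8, 5.6, 5.3, 4.9, 5.4, 5.0, 5.2])
lam = glrt_lambda(x, mu0=5.0)
print(f"lambda(x) = {lam:.4f}")   # small values give evidence against H0
# Here lambda is a monotone function of the usual t statistic, so choosing
# k to give size alpha reproduces the two-sided t-test.
```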

Likelihood Ratio Tests and Sufficient Statistics

Suppose that instead of recording $x_1, \ldots, x_n$ we only record $t = T(x)$ where $T$ is a sufficient statistic for $\theta$. Let $f_T(t \mid \theta)$ be the density (or mass) function for $T$. Then the likelihood for $\theta$ based on $t$ is

$$L^*(\theta \mid t) \propto f_T(t \mid \theta)$$

How do tests based on $L^*$ relate to those based on $L$?

Likelihood Ratio Tests and Sufficient Statistics

Theorem 3.1 Suppose that $X$ is a random sample from a population with density $f(x \mid \theta)$ and $T(X)$ is a sufficient statistic for $\theta$. Consider testing $H_0: \theta \in \Theta_0$ versus $H_1: \theta \notin \Theta_0$. Let $\lambda(x)$ be the likelihood ratio test statistic based on the sample $x$ and let $\lambda^*(t)$ be the likelihood ratio test statistic based on the sufficient statistic $t = T(x)$. Then for every $x$ in the sample space

$$\lambda^*\big(T(x)\big) = \lambda(x).$$
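A quick numerical check of Theorem 3.1 in an assumed normal model, where $\bar{X}$ is sufficient and its sampling distribution is known exactly:

```python
import numpy as np
from scipy.stats import norm

# Assumed setup (illustration only): X1,...,Xn iid N(theta, 1), so Xbar is
# sufficient with Xbar ~ N(theta, 1/n). The LR statistic from the full
# sample should match the one computed from Xbar alone.
rng = np.random.default_rng(3)
n, theta0 = 12, 0.0
x = rng.normal(0.4, 1.0, n)
xbar = x.mean()   # the MLE of theta in both likelihoods

lam_sample = np.exp(norm.logpdf(x, theta0).sum() - norm.logpdf(x, xbar).sum())
lam_T = norm.pdf(xbar, theta0, 1 / np.sqrt(n)) / norm.pdf(xbar, xbar, 1 / np.sqrt(n))
print(np.isclose(lam_sample, lam_T))   # True: lambda*(T(x)) = lambda(x)
```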

Error Probabilities

Definition 3.8 When testing a statistical hypothesis there are two types of errors that can be made. Rejecting $H_0$ when it is actually true is called a Type I Error. Failing to reject $H_0$ when it is false is called a Type II Error.

Suppose our test specifies: Reject $H_0 \iff T(x_1, \ldots, x_n) \in C$. The probability of making a Type I error is

$$\alpha = P\big(T(X_1, \ldots, X_n) \in C \mid H_0 \text{ is true}\big)$$

The probability of making a Type II error is

$$\beta = P\big(T(X_1, \ldots, X_n) \notin C \mid H_1 \text{ is true}\big)$$

Error Probabilities

For testing two simple hypotheses with critical region $C$: the probability of making a Type I error is $\alpha = P_{\theta_0}\big(T(X_1, \ldots, X_n) \in C\big)$ and the probability of making a Type II error is $\beta = P_{\theta_1}\big(T(X_1, \ldots, X_n) \notin C\big)$.

For testing composite hypotheses: the probability of making a Type I error is

$$\alpha = \sup_{\theta \in \Theta_0} P_\theta\big(T(X_1, \ldots, X_n) \in C\big)$$

and the probability of making a Type II error is

$$\beta = \sup_{\theta \in \Theta_1} P_\theta\big(T(X_1, \ldots, X_n) \notin C\big)$$
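For two simple hypotheses these two probabilities can be computed directly. A sketch under an assumed normal model:

```python
from math import sqrt
from scipy.stats import norm

# Assumed setup (illustration only): Xbar ~ N(theta, 1/n); simple hypotheses
# H0: theta = 0 versus H1: theta = 1, with critical region {xbar > c}.
n, c = 16, 0.411   # c chosen to give alpha of about 0.05

alpha = 1 - norm.cdf(sqrt(n) * (c - 0.0))   # P_theta0(T in C): Type I error
beta = norm.cdf(sqrt(n) * (c - 1.0))        # P_theta1(T not in C): Type II error
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}, power = {1 - beta:.4f}")
```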

The Power Function of a Test

Definition 3.9 The power function, $\beta(\theta)$, of a statistical test is the probability of rejecting the null hypothesis as a function of the true value $\theta$. If the rejection region is $\{x : T(x) \in C\}$ then

$$\beta(\theta) = P_\theta\big(T(X) \in C\big)$$

The ideal power function would be

$$\beta(\theta) = \begin{cases} 0 & \theta \in \Theta_0 \\ 1 & \theta \notin \Theta_0 \end{cases}$$

Such a power function is never possible, however.
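Continuing the assumed normal example above, the power function of the region $\{\bar{x} > c\}$ can be tabulated directly; note how it interpolates between 0 and 1 rather than matching the ideal step function.

```python
import numpy as np
from scipy.stats import norm

# Assumed setup (illustration only): Xbar ~ N(theta, 1/n), rejection region
# {xbar > c}; then beta(theta) = P_theta(Xbar > c).
n, c = 16, 0.411
for theta in np.linspace(-0.5, 1.5, 9):
    power = 1 - norm.cdf(np.sqrt(n) * (c - theta))
    print(f"theta = {theta:5.2f}   beta(theta) = {power:.4f}")
# beta(theta) rises smoothly from near 0 through alpha at theta = 0 toward 1,
# only approximating the ideal 0/1 power function.
```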

Size and Level of Tests

Note that the size of a test is given by

$$\alpha = \sup_{\theta \in \Theta_0} \beta(\theta)$$

Sometimes it is not possible to construct a test with size $\alpha$ and so we instead consider tests with level $\alpha$.

Definition 3.10 Consider a test of $H_0: \theta \in \Theta_0$ with power function $\beta(\theta)$. The test is said to be a level $\alpha$ test if

$$\sup_{\theta \in \Theta_0} \beta(\theta) \le \alpha$$

Unbiased Tests

Definition 3.11 Suppose that we are testing $H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta_1$ based on a test with power function $\beta(\theta)$. Then the test is said to be an unbiased test if, for every $\theta_0 \in \Theta_0$ and $\theta_1 \in \Theta_0^c$,

$$\beta(\theta_1) \ge \beta(\theta_0).$$

Note that a size $\alpha$ test is unbiased if, and only if, $\beta(\theta) \ge \alpha$ for every $\theta \in \Theta_0^c$.

Uniformly Most Powerful Tests

We cannot simultaneously minimize the probabilities of Type I and Type II errors. Instead we usually control the size or level of the test to be some value $\alpha$ and then try to minimize the probability of a Type II error within the class of all tests with level $\alpha$. If a single test has the lowest probability of a Type II error (highest power) for every possible true value of the parameter in the alternative, then it is called the uniformly most powerful test.

Uniformly Most Powerful Tests

Definition 3.12 A test of $H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta_1$ based on the critical region $C$ is said to be the uniformly most powerful (UMP) test of level $\alpha$ if

1. $\sup_{\theta \in \Theta_0} P_\theta(X \in C) \le \alpha$;
2. for any other critical region $D$ with $\sup_{\theta \in \Theta_0} P_\theta(X \in D) \le \alpha$ we have $P_\theta(X \in C) \ge P_\theta(X \in D)$ for all $\theta \in \Theta_1$.

Neyman-Pearson Theorem

Theorem 3.2 Suppose that we are testing the simple hypotheses $H_0: \theta = \theta_0$ versus $H_1: \theta = \theta_1$. Let $L(\theta \mid x)$ be the likelihood for the parameters and let $C$ be a subset of the sample space such that $P_{\theta_0}(X \in C) = \alpha$ and there exists a constant $k > 0$ such that

$$\frac{L(\theta_0 \mid x)}{L(\theta_1 \mid x)} \le k \text{ for all } x \in C, \qquad \text{and} \qquad \frac{L(\theta_0 \mid x)}{L(\theta_1 \mid x)} > k \text{ for all } x \notin C.$$

Then the test with critical region $C$ is the most powerful test among all tests of level $\alpha$.
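A sketch of the Neyman-Pearson construction in an assumed normal model, where the likelihood ratio is monotone in $\bar{x}$ and the region $\{L(\theta_0 \mid x)/L(\theta_1 \mid x) \le k\}$ reduces to $\{\bar{x} > c\}$:

```python
import numpy as np
from scipy.stats import norm

# Assumed setup (illustration only): X1,...,Xn iid N(theta, 1); simple
# hypotheses H0: theta = 0 versus H1: theta = 1. The log likelihood ratio is
#   log L(theta0|x) - log L(theta1|x)
#     = -n*xbar*(theta1 - theta0) + n*(theta1**2 - theta0**2)/2,
# which is decreasing in xbar, so {LR <= k} is the same as {xbar > c}.
n, alpha = 16, 0.05
theta0 = 0.0

# Choose c so that P_theta0(Xbar > c) = alpha.
c = theta0 + norm.ppf(1 - alpha) / np.sqrt(n)

rng = np.random.default_rng(1)
x = rng.normal(1.0, 1.0, n)   # data generated under the alternative
print(f"xbar = {x.mean():.3f}, critical value c = {c:.3f}")
print("Reject H0" if x.mean() > c else "Cannot reject H0")
```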

Neyman-Pearson Theorem

Corollary 3.2.1 Suppose that we are testing the simple hypotheses $H_0: \theta = \theta_0$ versus $H_1: \theta = \theta_1$. Let $T(X)$ be a sufficient statistic for $\theta$ and suppose that $g(t \mid \theta)$ is the sampling pdf (or pmf) of $T$ when $\theta$ is the true parameter value. Then the most powerful test has critical region $R$ (a subset of the sample space of $T$) satisfying, for some fixed $k \ge 0$,

$$t \in R \text{ if } g(t \mid \theta_1) \ge k\, g(t \mid \theta_0) \qquad \text{and} \qquad t \notin R \text{ if } g(t \mid \theta_1) < k\, g(t \mid \theta_0).$$

The size of the test is $\alpha = P_{\theta_0}(T \in R)$.

Uniformly Most Powerful Tests

UMP tests of level $\alpha$ generally do not exist! A situation where this is often the case is for two-sided tests of the form $H_0: \theta = \theta_0$ versus $H_1: \theta \neq \theta_0$. For one-sided tests of the form $H_0: \theta \ge \theta_0$ versus $H_1: \theta < \theta_0$, or $H_0: \theta \le \theta_0$ versus $H_1: \theta > \theta_0$, we can sometimes find a UMP test.

Karlin-Rubin Theorem

Definition 3.13 A family of distributions with pdf (or pmf) $f(x \mid \theta)$ is said to have a monotone likelihood ratio if there exists a statistic $t = T(x)$ such that for every pair $\theta_1 > \theta_2$,

$$\frac{L(\theta_1 \mid x)}{L(\theta_2 \mid x)} = \frac{g(t \mid \theta_1)}{g(t \mid \theta_2)}$$

is an increasing function of $t$.

Theorem 3.3 Suppose that the family of distributions $f(x \mid \theta)$ has a monotone likelihood ratio. Then the uniformly most powerful size $\alpha$ test of $H_0: \theta \le \theta_0$ versus $H_1: \theta > \theta_0$ has critical region $C = \{x : T(x) > t_0\}$ where $t_0$ is such that $P_{\theta_0}(T > t_0) = \alpha$.
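As a sketch of the monotone likelihood ratio property and Theorem 3.3, consider an assumed exponential model: the ratio $L(\theta_1 \mid x)/L(\theta_2 \mid x)$ is increasing in $t = \sum x_i$, so the UMP one-sided test rejects for large $\sum x_i$.

```python
from scipy.stats import expon, gamma

# Assumed setup (illustration only): X1,...,Xn iid Exponential with mean
# theta, so f(x|theta) = theta^(-n) exp(-t/theta) with t = sum(x_i). For
# theta1 > theta2 the likelihood ratio is increasing in t (MLR in T), and
# Karlin-Rubin gives a UMP test of H0: theta <= theta0 vs H1: theta > theta0
# with critical region {t > t0}.
n, theta0, alpha = 10, 2.0, 0.05

# Under theta0, T = sum(X_i) ~ Gamma(shape=n, scale=theta0); pick t0 so that
# P_theta0(T > t0) = alpha.
t0 = gamma.ppf(1 - alpha, a=n, scale=theta0)

x = expon.rvs(scale=2.8, size=n, random_state=2)   # made-up data
print(f"T = {x.sum():.2f}, t0 = {t0:.2f}")
print("Reject H0" if x.sum() > t0 else "Cannot reject H0")
```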

p-values of Tests

Definition 3.14 A p-value $p(X)$ for testing $H_0: \theta \in \Theta_0$ is a test statistic such that

1. $0 \le p(x) \le 1$ for every sample point $x$;
2. $P_\theta\big(p(X) \le \alpha\big) \le \alpha$ for every $\theta \in \Theta_0$ and every $0 \le \alpha \le 1$.

Small values of $p(x)$ give evidence against $H_0$. Typically the p-value is calculated from a test statistic, although it is a test statistic in its own right.

p-values of Tests

Theorem 3.4 Suppose that we wish to test $H_0: \theta \in \Theta_0$ versus $H_1: \theta \notin \Theta_0$ and let $W(X)$ be a test statistic such that large values of $W$ give evidence against $H_0$ in favour of $H_1$. For every sample point $x$ define

$$p(x) = \sup_{\theta \in \Theta_0} P_\theta\big(W(X) \ge W(x)\big)$$

Then $p(x)$ is a valid p-value.
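A minimal sketch of Theorem 3.4 for an assumed one-sided normal test: the supremum over $\Theta_0$ is attained at the boundary, so the p-value is a single tail probability.

```python
from math import sqrt
from scipy.stats import norm

# Assumed setup (illustration only): X1,...,Xn iid N(theta, 1); test
# H0: theta <= 0 versus H1: theta > 0 with W(x) = sqrt(n) * xbar. For theta
# in Theta0, P_theta(W >= w) is largest at theta = 0, so the supremum in
# Theorem 3.4 is attained on the boundary of Theta0.
n = 25
xbar_obs = 0.41                    # made-up observed sample mean
w_obs = sqrt(n) * xbar_obs

p_value = 1 - norm.cdf(w_obs)      # sup over Theta0 of P_theta(W(X) >= w_obs)
print(f"p-value = {p_value:.4f}")  # small values give evidence against H0
```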

Decision Theory

Decision theory is a method of doing inference based on specifying how much loss incorrect decisions produce. The method is applicable to all forms of inference. It is widely used in Bayesian inference, which we shall examine later, but can equally well be applied in frequentist inference. Here we shall briefly introduce the concept and its application in hypothesis testing, where it seems most natural.

Decision Rules

Definition 3.15 Suppose $X_1, \ldots, X_n$ is a random sample and we wish to make inference on a parameter $\theta$. A decision rule $\delta(X_1, \ldots, X_n)$ specifies what decision we would take based on the sample.

In point estimation, a decision rule is just an estimator. In hypothesis testing, suppose that we have a rejection region $C$; then a decision rule could be

$$\delta(X) = \begin{cases} \text{Reject } H_0 & \text{if } X \in C \\ \text{Do not reject } H_0 & \text{if } X \notin C \end{cases}$$

For convenience we will label the decision to not reject $H_0$ as $a_0$ and the decision to reject $H_0$ as $a_1$.

Loss Functions

Definition 3.16 A loss function $L(\theta, \delta)$ is a function of the parameter $\theta$ and the decision rule $\delta(x)$ and specifies what loss is incurred in using the decision rule $\delta(x)$ when $\theta$ is the true parameter value.

The loss function is specified by the analyst and should be chosen to reflect the seriousness of errors in inference. In estimation, common loss functions are:

Absolute error loss: $L(\theta, \hat{\theta}) = |\hat{\theta} - \theta|$.
Squared error loss: $L(\theta, \hat{\theta}) = (\hat{\theta} - \theta)^2$.

Both of these loss functions are symmetric in the estimation error, but there is no need for that in general. If overestimation is considered more serious than underestimation, for example, then the loss function could reflect that.
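The sketch below codes the two standard losses together with an assumed asymmetric variant (the factor of 2 is arbitrary) to illustrate how an analyst might penalize overestimation more heavily.

```python
import numpy as np

# The two symmetric losses from the notes, plus an assumed asymmetric
# variant (illustration only) that charges twice as much for overestimation.
def absolute_error(theta, theta_hat):
    return np.abs(theta_hat - theta)

def squared_error(theta, theta_hat):
    return (theta_hat - theta) ** 2

def asymmetric_loss(theta, theta_hat, c_over=2.0, c_under=1.0):
    err = theta_hat - theta
    return np.where(err > 0, c_over * err, -c_under * err)

theta = 1.0
for theta_hat in (0.5, 1.5):   # under- and over-estimate by the same amount
    print(theta_hat,
          absolute_error(theta, theta_hat),
          squared_error(theta, theta_hat),
          asymmetric_loss(theta, theta_hat))
# The symmetric losses give identical values for both errors; the
# asymmetric loss charges 1.0 for the overestimate but only 0.5 otherwise.
```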

Loss Function for Hypothesis Testing

The only losses that can be incurred in a hypothesis testing framework come from making a Type I or Type II error. One very simple loss function is called 0-1 loss:

$$L(\theta, a_0) = \begin{cases} 0 & \theta \in \Theta_0 \\ 1 & \theta \notin \Theta_0 \end{cases} \qquad \text{and} \qquad L(\theta, a_1) = \begin{cases} 1 & \theta \in \Theta_0 \\ 0 & \theta \notin \Theta_0 \end{cases}$$

This loss function can be generalized if we do not consider Type I and Type II errors to be equally bad:

$$L(\theta, a_0) = \begin{cases} 0 & \theta \in \Theta_0 \\ C_{II} & \theta \notin \Theta_0 \end{cases} \qquad \text{and} \qquad L(\theta, a_1) = \begin{cases} C_I & \theta \in \Theta_0 \\ 0 & \theta \notin \Theta_0 \end{cases}$$

General Loss Functions for Hypothesis Testing

More general loss functions can take into account that the cost of a Type I or Type II error may be different depending on the value of $\theta$. Consider the one-sided test $H_0: \theta \le \theta_0$ versus $H_1: \theta > \theta_0$. In this case we may consider a loss function of the type

$$L(\theta, a_0) = \begin{cases} 0 & \theta \le \theta_0 \\ c_{II}(\theta - \theta_0) & \theta > \theta_0 \end{cases} \qquad L(\theta, a_1) = \begin{cases} c_I(\theta_0 - \theta) & \theta \le \theta_0 \\ 0 & \theta > \theta_0 \end{cases}$$

If deviations when we reject $H_0$ are more serious than those when we fail to reject $H_0$, we could have different functions of $\theta - \theta_0$ in the two parts.

The Risk Function

Definition 3.17 The risk function of a decision rule $\delta(x)$ is the expected value of the loss function:

$$R(\theta, \delta) = E_\theta\big[L(\theta, \delta(X))\big].$$

The risk function will depend on the true value $\theta$ and on what decision rule and loss function we have specified for the problem. Often the decision rule is chosen to minimize the risk. Doing this uniformly for all possible $\theta$ is generally not possible, but it can be within certain classes of decision rules.

Risk Function for Hypothesis Testing

Suppose we have a test procedure $\delta(x)$ as defined in Definition 3.15 with corresponding power function $\beta(\theta) = P_\theta(X \in C)$. The risk function is given by

$$\begin{aligned} R(\theta, \delta) &= L(\theta, a_0)\, P_\theta(\delta(X) = a_0) + L(\theta, a_1)\, P_\theta(\delta(X) = a_1) \\ &= L(\theta, a_0)\, P_\theta(X \notin C) + L(\theta, a_1)\, P_\theta(X \in C) \\ &= L(\theta, a_0)(1 - \beta(\theta)) + L(\theta, a_1)\beta(\theta) \end{aligned}$$

For a generalized 0-1 loss function this becomes

$$R(\theta, \delta) = \begin{cases} C_I\, \beta(\theta) & \theta \in \Theta_0 \\ C_{II}\,(1 - \beta(\theta)) & \theta \notin \Theta_0 \end{cases}$$
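Continuing the assumed normal example from earlier slides, this sketch evaluates the risk of a one-sided test under generalized 0-1 loss; the costs $C_I$ and $C_{II}$ are made-up.

```python
import numpy as np
from scipy.stats import norm

# Assumed setup (illustration only): Xbar ~ N(theta, 1/n), rejection region
# {xbar > c}, Theta0 = (-inf, 0], generalized 0-1 loss with assumed costs.
n, c = 16, 0.411
C_I, C_II = 1.0, 2.0   # assume a Type II error is twice as costly

def power(theta):
    return 1 - norm.cdf(np.sqrt(n) * (c - theta))

def risk(theta):
    # R(theta, delta) = C_I * beta(theta) on Theta0, else C_II * (1 - beta(theta))
    return C_I * power(theta) if theta <= 0 else C_II * (1 - power(theta))

for theta in (-0.5, -0.1, 0.0, 0.1, 0.5, 1.0):
    print(f"theta = {theta:5.2f}   R(theta, delta) = {risk(theta):.4f}")
```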

Minimizing the Risk Function

For the generalized 0-1 loss function, the issue of minimizing risk for a test of a given size is essentially the same problem as maximizing power. In the Neyman-Pearson setup we would have

$$R(\theta, \delta) = \begin{cases} C_I\, \alpha & \theta = \theta_0 \\ C_{II}\,(1 - \beta(\theta_1)) & \theta = \theta_1 \end{cases}$$

so the minimum risk test is the same as the most powerful test. In general the problem of minimizing risk is closely related to that of maximizing power, although the specific form of the loss function will also play a key role.