Hypothesis Testing. BS2 Statistical Inference, Lecture 11, Michaelmas Term 2004. Steffen Lauritzen, University of Oxford; November 15, 2004


Hypothesis testing

We consider a family of densities F = {f(x; θ), θ ∈ Θ} on a sample space X and wish to test the null hypothesis

H_0 : θ ∈ Θ_0

vs. the alternative hypothesis

H_A : θ ∈ Θ_A = Θ \ Θ_0,

where Θ_0 ⊂ Θ. Within the Neyman–Pearson theory, a (non-randomised) test is determined by a critical region C, so that H_0 is rejected if X ∈ C and H_0 is accepted if X ∉ C.

Power and level of significance

The probability of rejecting the hypothesis in a test with critical region C is the power function:

φ(θ) = P_θ(X ∈ C).

The size or significance level α of the test is the largest possible probability of making an error of type I, i.e. of rejecting a true null hypothesis:

α = sup_{θ ∈ Θ_0} P_θ(X ∈ C) = sup_{θ ∈ Θ_0} φ(θ).

Thus the size of the test is the maximal power under the null hypothesis.
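As a concrete illustration (not from the lecture), consider testing H_0 : µ ≤ 0 vs. H_A : µ > 0 for a normal mean with known variance, rejecting when the sample mean exceeds a cut-off c. The power function is then available in closed form; the cut-off below is a hypothetical choice giving size roughly 0.05:

```python
import math

def norm_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def power(mu, c, sigma, n):
    # phi(mu) = P_mu(Xbar > c) where Xbar ~ N(mu, sigma^2 / n)
    se = sigma / math.sqrt(n)
    return 1.0 - norm_cdf((c - mu) / se)

# Reject H_0: mu <= 0 when Xbar > c.  phi is increasing in mu, so the
# size (sup of phi over the null) is attained at the boundary mu = 0.
sigma, n, c = 1.0, 25, 0.329   # c is a hypothetical cut-off
size = power(0.0, c, sigma, n)  # approximately 0.05
```

Since φ is monotone here, the supremum over Θ_0 reduces to evaluating the power at the boundary point µ = 0, which is how the size is computed above.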

If θ ∈ Θ_A, the probability of making an error of type II is

β = β(θ) = 1 − φ(θ),

i.e. the probability of accepting H_0 when H_0 is false.

Constructing critical regions

A common way of constructing critical regions is to use a test statistic T = t(X), chosen so that large values of T are good indicators of the null hypothesis being false. We then construct a test with critical region

C = {x : t(x) > t_0}.

If we choose the critical value t_0 to satisfy

α = sup_{θ ∈ Θ_0} P_θ(T ≥ t_0),

we get a test of size α.
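When the null distribution of T is not available in closed form, a critical value t_0 of approximately the right size can be found by simulation. A minimal sketch, assuming a hypothetical statistic (the sample mean) and N(0, 1) data under θ_0:

```python
import random
import statistics

random.seed(0)
alpha, n = 0.05, 20

def t_stat(x):
    # Hypothetical test statistic: the sample mean
    return statistics.fmean(x)

# Approximate the null distribution of T by simulating data under
# theta_0 (here N(0, 1)), then take t_0 as the (1 - alpha) empirical
# quantile, so that P_{theta_0}(T >= t_0) is approximately alpha.
null_draws = sorted(t_stat([random.gauss(0, 1) for _ in range(n)])
                    for _ in range(20000))
t0 = null_draws[int((1 - alpha) * len(null_draws))]
```

For this example the exact answer is known (t_0 = 1.645/√20 ≈ 0.368), which gives a check on the simulation.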

p-values and significance levels

The p-value when t(X) = t has been observed is

p = sup_{θ ∈ Θ_0} P_θ(T ≥ t),

i.e. the largest possible probability of obtaining a value of T at least as extreme as the one observed, under the assumption that the null hypothesis is true. If the p-value is very small, this is taken as strong evidence against the null hypothesis, corresponding to rejecting the hypothesis exactly when p ≤ α. In this sense, the p-value of a test outcome is the largest possible size at which the null hypothesis would still be accepted.
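For the one-sided normal-mean test sketched earlier, the p-value has a closed form: as in the definition, the supremum over µ ≤ µ_0 is attained at the boundary µ_0. A small illustration with made-up numbers:

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_value(t_obs, mu0, sigma, n):
    # One-sided p-value for T = Xbar under H_0: mu <= mu0.
    # sup over mu <= mu0 of P_mu(T >= t_obs) is attained at mu0,
    # where Xbar ~ N(mu0, sigma^2 / n).
    se = sigma / math.sqrt(n)
    return 1.0 - norm_cdf((t_obs - mu0) / se)

p = p_value(0.5, 0.0, 1.0, 25)   # observed mean 0.5 with n = 25
reject = p <= 0.05               # reject exactly when p <= alpha
```

Here the observed mean is 2.5 standard errors above µ_0, so p ≈ 0.006 and the null is rejected at size 0.05.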

Example

Consider a sample X = (X_1, ..., X_n) from a normal distribution N(µ, σ²), with µ and σ² > 0 unknown, and consider the hypothesis H_0 : σ² ≤ µ² vs. the alternative H_A : σ² > µ². One could directly choose to use the test statistic

t(X) = S² / X̄²,

where

X̄ = (1/n) Σ_i X_i,   S² = 1/(n − 1) Σ_i (X_i − X̄)².

Note that t′(X) = S² − k X̄² might as well have been chosen, but this leads to quite different critical regions.
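The statistic t(X) = S²/X̄² from the example is straightforward to compute; the data below are made up for illustration:

```python
import statistics

def t_stat(x):
    # t(x) = S^2 / Xbar^2, with S^2 the unbiased sample variance
    xbar = statistics.fmean(x)
    s2 = statistics.variance(x)   # divides by n - 1
    return s2 / xbar ** 2

x = [2.1, 1.9, 2.4, 2.0, 1.6]     # hypothetical sample
t = t_stat(x)                     # small t favours H_0: sigma^2 <= mu^2
```

For this sample X̄ = 2.0 and S² = 0.085, giving t = 0.02125, well inside the region favouring the null.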

Optimal testing

The Neyman–Pearson theory is concerned with finding optimal tests, i.e. optimal critical regions or, equivalently, optimal test statistics. Most commonly one attempts to maximize power under the alternative while keeping the size under control or, in other words, to minimize the type II error probability while controlling the probability of an error of type I. A more symmetric approach is to minimize a given linear combination of the type I and type II error probabilities, but this leads to essentially the same testing procedures.

Controversy between Fisher and Neyman

One of the deepest controversies in the history of statistics is that between J. Neyman and R. A. Fisher, which went on as long as both were alive. The fierce discussions between these two giants of statistical science were held in an ever more unpleasant and relentless tone, orally as well as in public writing. The dispute concerned many aspects of statistics, but in particular hypothesis testing. This divide is, in my personal opinion, much deeper than that between Bayesian and frequentist inference.

Cournot's principle

Briefly and inadequately put, the dispute on testing was rooted in Fisher's strong opposition to the mechanical acceptance/rejection which plays a dominating role in the Neyman–Pearson theory. Fisher would rather report the p-value and then let the scientist interpret the evidence. The p-value itself may best be interpreted according to what we could call Cournot's principle: events with small probability do not happen! Thus, if p is small, the null hypothesis cannot be sustained and must therefore be considered falsified.

Testing in different contexts

Personally, I find the point of view behind the Neyman–Pearson theory appropriate when testing has to be performed repeatedly, for example in quality control, say when a low proportion of defective items must be maintained by routine inspection procedures and decisions about accepting or rejecting batches of items must be taken. For scientific inference, the decision aspect is less prominent and I find concepts such as critical region, power and size less illuminating. Hypothesis testing is carried out in many different contexts, and each context may emphasize particular aspects of the mathematical theory as more or less relevant.

Neyman–Pearson lemma

This fundamental lemma in the theory of hypothesis testing is concerned with the case of a simple hypothesis Θ_0 = {θ_0} vs. a simple alternative Θ_A = {θ_1}. In this case it holds that any test with critical region of the form

C = {x : L(θ_1; x) / L(θ_0; x) > K}

is optimal of its size, i.e. for fixed size α = P_{θ_0}(C), it has maximal power φ(θ_1) = P_{θ_1}(C) under the alternative. In other words, if the observation is much more likely under the alternative hypothesis than under the null, we reject the null hypothesis. This test is the Likelihood Ratio Test (LRT).
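A minimal sketch of the Neyman–Pearson test for a simple-vs-simple normal example, N(θ_0, 1) vs. N(θ_1, 1); the data and the constant K below are hypothetical:

```python
import math

def likelihood(theta, x):
    # Product of N(theta, 1) densities over the sample
    return math.prod(math.exp(-0.5 * (xi - theta) ** 2) / math.sqrt(2 * math.pi)
                     for xi in x)

def lrt_reject(x, theta0, theta1, K):
    # Neyman-Pearson test: reject H_0 when L(theta1; x) / L(theta0; x) > K
    return likelihood(theta1, x) / likelihood(theta0, x) > K

x = [1.2, 0.8, 1.5, 0.9]          # hypothetical sample
reject = lrt_reject(x, theta0=0.0, theta1=1.0, K=1.0)
```

For this sample the likelihood ratio is exp(Σx_i − n/2) ≈ 11, so the observations are much more likely under θ_1 = 1 and the null is rejected.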

LRT in exponential families

The Likelihood Ratio Test has a simple form in a one-dimensional exponential family. We get

log { L(θ_1; x) / L(θ_0; x) } = (θ_1 − θ_0) t(x) + c(θ_0) − c(θ_1).

So, if θ_1 > θ_0, the critical region has the form C = {x : t(x) > t_0}, where T = t(X) is the canonical sufficient statistic. If θ_1 < θ_0, the inequality in the expression for the critical region must be reversed.
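This identity can be checked numerically. For n i.i.d. N(θ, 1) observations the canonical statistic is t(x) = Σ x_i and, for the whole sample, c(θ) = nθ²/2; the data below are hypothetical:

```python
# Log likelihood ratio in a one-parameter exponential family with
# density proportional to exp(theta * t(x) - c(theta)).  For n iid
# N(theta, 1) observations: t(x) = sum(x), c(theta) = n * theta^2 / 2.
def log_lr(x, theta0, theta1):
    n, t = len(x), sum(x)

    def c(theta):
        return n * theta * theta / 2.0

    return (theta1 - theta0) * t + c(theta0) - c(theta1)

x = [0.3, 1.1, -0.4, 0.9]       # hypothetical sample
val = log_lr(x, 0.0, 1.0)
# With theta1 > theta0 the log ratio is increasing in t(x), so the
# critical region {log LR > k} is one-sided in the canonical statistic.
```

As a sanity check, the value agrees with the difference of the two normal log-likelihoods computed directly, and increasing every observation (hence t(x)) increases the log ratio.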

One-sided hypothesis and alternative

Two important features of the LRT for a simple hypothesis vs. a simple alternative should be noted: the critical region has a simple form, i.e. a one-sided interval for the canonical statistic; and the critical region does not depend on the specific values (θ_0, θ_1) as long as θ_0 < θ_1 (or the converse). It follows that the critical region for the LRT in a one-dimensional canonical exponential family is also optimal for the hypothesis H_0 : θ ≤ θ_0 vs. the alternative H_A : θ > θ_0, i.e. for a one-sided alternative to a one-sided hypothesis.

Problems with the Neyman–Pearson theory

A major weakness of the theory is that, outside the very special cases just mentioned, there is typically no optimal (Uniformly Most Powerful) test. Remedies have been sought by demanding in addition that a test should be unbiased,

φ(θ) ≥ α for θ ∈ Θ_A,

or α-similar,

φ(θ) = α for all θ ∈ Θ_0,

or have certain invariance properties. These additional demands do to some extent ensure the existence of optimal tests in many cases, but far from all.