HYPOTHESIS TESTING: FREQUENTIST APPROACH.


These notes summarize the lectures on (the frequentist approach to) hypothesis testing. You should be familiar with standard hypothesis testing from previous statistics classes. Here, we will explain where this approach comes from and develop new ideas (all within the context of the parametric set-up).

1. Set-Up

The basic set-up of (the Neyman-Pearson approach to) hypothesis testing is as follows. There are two hypotheses you are trying to decide between: the null (H_0) and the alternative (H_A). If a hypothesis fully determines the behaviour (pdf/pmf or other) of the random variables, then it is called simple; otherwise it is known as composite. The hypothesis test rejects H_0 in favour of H_A if a test statistic T = T(X) falls into a rejection region (RR). We therefore have:

                  accept H_0        reject H_0
    H_0 true      correct           Type I error
    H_A true      Type II error     correct

The probability of a Type I error is denoted α and is also known as the significance level of the test. The probability of a Type II error is denoted β. The power of a test is 1 − β: the probability of doing the correct thing under H_A. Note that if H_A is composite, then both β and the power depend on the particular member of H_A which holds; in this case we will often plot the power function. Ideally we would have α = β = 0. In practice, however, decreasing α most often drives up β, and vice versa.

2. Neyman-Pearson

The main idea behind Neyman-Pearson is to fix α in advance (choosing α to be small) and then to find a test which yields a small value of β. The Neyman-Pearson lemma tells us that in such a set-up the likelihood ratio test (LRT) is the most powerful of all possible tests. This only works for two simple hypotheses.

Date: November 25, 2007.
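A small numerical illustration of the α-β tradeoff may help here. This is a sketch with hypothetical numbers (not from the lectures): n = 25 observations from N(µ, 1), testing H_0: µ = 0 against the simple alternative H_A: µ = 0.5, rejecting when the sample mean exceeds a cutoff c.

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def alpha_beta(c, n=25, mu_alt=0.5):
    """Type I and Type II error probabilities for the test that rejects
    H0: mu = 0 when the sample mean of n N(mu, 1) draws exceeds c."""
    se = 1.0 / sqrt(n)                  # sd of the sample mean
    alpha = 1.0 - norm_cdf(c / se)      # P(reject | H0: mu = 0)
    beta = norm_cdf((c - mu_alt) / se)  # P(accept | HA: mu = 0.5)
    return alpha, beta

# Raising the cutoff c makes alpha smaller but beta larger:
for c in (0.2, 0.33, 0.45):
    a, b = alpha_beta(c)
    print(f"c = {c:.2f}: alpha = {a:.3f}, beta = {b:.3f}")
```

With 25 observations, moving the cutoff from 0.20 to 0.45 drops α from about 0.16 to about 0.01 while β climbs from about 0.07 to about 0.40: no choice of c drives both errors to zero at a fixed sample size.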

Thus, assume that H_0 and H_A are both simple, and let f_0(x) denote the pdf/pmf (likelihood) of the data under H_0 (and f_A(x) that under H_A). The LRT is the test which rejects if

    f_0(x) / f_A(x) < c,

where c is chosen in such a way that P(reject | H_0) = α.

Lemma 2.1 (Neyman-Pearson Lemma). Any other test with significance level α' ≤ α has power less than or equal to that of the likelihood ratio test.

First of all, note that this is a very sensible thing to do: we reject H_0 if the data have a bigger likelihood under H_A. Thus, the basic idea is similar to that of maximum likelihood estimation. We next need to take the LRT and translate it into something easier to handle.

Example. ESP example (Bernoulli, sample size 10). We have that P(Total ≥ 6 | H_0) = 0.02 and that P(Total ≥ 5 | H_0) = 0.078, so we cannot choose α = 0.05 exactly. We will choose the rejection region to be {6, 7, 8, 9, 10}. In this case, the power function is given in Figure 1. The R code used to generate Figure 1 was:

    x <- rep(0, 250)
    for (i in 1:250) {
        x[i] <- 1 - pbinom(5, 10, 0.25 + i/250 * 0.75)
    }
    plot(x)

What happens if we do n independent tests at the same time?

Example. Population = exponential.

Example. Population = normal, variance known.

3. P-values

Performing an α-level test is not very informative as to the amount of evidence for or against the alternative hypothesis. The quantity that does allow us to measure this is the p-value. The p-value is defined as the smallest value of α for which the null hypothesis would be rejected. Typically it is calculated as the probability of obtaining a test statistic as extreme as, or more extreme than, what was actually observed; "extreme" is dictated by the form of the rejection region. For a specific example, in the ESP (Bernoulli) case, if

we observed 5 total successes, then since we reject for T = Total large, the p-value is calculated as P(T ≥ 5 | H_0). In my opinion, you should always report the p-value for a hypothesis test in your research.

Figure 1. Power function in the ESP example.

Example. Under the null hypothesis, show that the distribution of the p-value is Uniform[0, 1].
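The ESP numbers can be verified directly from the Binomial(10, 0.25) null distribution. This is a sketch; the null success probability 0.25 is an assumption read off from the power-curve code, which starts its grid at p = 0.25.

```python
from math import comb

def binom_tail(k, n, p):
    """P(T >= k) for T ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# Significance level of the rejection region {6, ..., 10} under H0: p = 0.25:
print(round(binom_tail(6, 10, 0.25), 3))   # prints 0.02

# p-value for an observed total of 5 successes:
print(round(binom_tail(5, 10, 0.25), 3))   # prints 0.078
```

In particular, the p-value for an observed total of 5 is P(T ≥ 5 | H_0) ≈ 0.078, which is why the region had to start at 6 to keep α below 0.05.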

4. Generalized Likelihood Ratio Test (GLRT)

The LRT is optimal for testing a simple hypothesis against a simple hypothesis. However, we often wish to compare simple vs. composite, or two composite, hypotheses. As the name implies, the generalized LRT is a generalization of the LRT which allows us to handle composite hypotheses. Although no optimality results exist for the generalized version, we do have some nice asymptotic results, and it is easily motivated as a natural extension of the LRT.

The set-up for the GLRT is as follows. Let f(x | θ) denote the pdf/pmf of the data if the parameter θ (possibly multivariate) is known. Notice that f(x | θ) is exactly the likelihood. The null hypothesis specifies that θ ∈ Θ_0 and the alternative says that θ ∈ Θ_A. We let Θ denote Θ_0 ∪ Θ_A. The GLRT rejects the null if

    Λ* = [ max_{θ ∈ Θ_0} f(x | θ) ] / [ max_{θ ∈ Θ_A} f(x | θ) ]

is small. Indeed, this is very reasonable. In practice, however, it is often easier to work with

    Λ = [ max_{θ ∈ Θ_0} f(x | θ) ] / [ max_{θ ∈ Θ} f(x | θ) ]

and reject H_0 if this is small. Since Λ = min(Λ*, 1), both versions actually do the same thing. We take the latter as our official definition of the GLRT.

Example. Two-sided normal, unknown variance.

Theorem 4.1. Under smoothness assumptions on the underlying pdf/pmf, the null distribution of −2 log Λ converges to a χ² distribution with degrees of freedom equal to dim Θ − dim Θ_0 as the sample size tends to infinity.

Since we reject for small values of Λ, we reject for large values of −2 log Λ.

Example. Compare this to the two-sided normal, unknown variance.

5. Power

In the previous sections we have largely avoided the issue of power. The LRT gives the test with the highest power for a fixed significance level, but what if this power still isn't good enough? In practice, increasing the sample size typically increases power. The following examples are designed to illustrate this.

Example. Normal, variance known.

Example. Normal, variance unknown.
(To calculate power, approximate using normal!)
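The normal approximation makes such power calculations mechanical and shows how power grows with the sample size. A sketch with hypothetical numbers (a two-sided level-0.05 z-test of H_0: µ = 0 with σ = 1 known, evaluated at the true mean µ = 0.5):

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def power(n, mu=0.5, sigma=1.0, z_crit=1.96):
    """Approximate power of the two-sided level-0.05 z-test of H0: mu = 0
    when the true mean is mu: P(|Z| > z_crit) with Z shifted by sqrt(n)*mu/sigma."""
    shift = sqrt(n) * mu / sigma   # noncentrality of the z statistic
    return (1.0 - norm_cdf(z_crit - shift)) + norm_cdf(-z_crit - shift)

for n in (5, 10, 20, 50):
    print(n, round(power(n), 3))
```

At µ = 0 the same formula returns the significance level 0.05, and at µ = 0.5 the power climbs from about 0.20 at n = 5 to about 0.94 at n = 50.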

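Theorem 4.1 can also be checked by simulation. For the normal mean with known variance σ, maximizing the likelihood over Θ = R gives µ̂ = X̄, so under H_0: µ = µ_0 we get −2 log Λ = n(X̄ − µ_0)²/σ², and the theorem says this should behave like χ² with dim Θ − dim Θ_0 = 1 degree of freedom. A quick Monte Carlo sketch (hypothetical numbers; 3.84 is roughly the 0.95 quantile of χ²₁):

```python
import random

random.seed(1)

def neg2_log_lambda(n, mu0=0.0, sigma=1.0):
    """-2 log(GLRT statistic) for H0: mu = mu0 with normal data, sigma known:
    maximizing over Theta gives mu-hat = X-bar, so the statistic reduces to
    n * (X-bar - mu0)**2 / sigma**2."""
    xbar = sum(random.gauss(mu0, sigma) for _ in range(n)) / n
    return n * (xbar - mu0) ** 2 / sigma ** 2

reps = 20000
stats = [neg2_log_lambda(n=10) for _ in range(reps)]

# Under H0, P(-2 log Lambda > 3.84) should be close to 0.05:
frac = sum(s > 3.84 for s in stats) / reps
print(round(frac, 3))
```

(For this particular example the statistic is exactly χ²₁ for every n; the asymptotics of Theorem 4.1 are only needed in less tidy models.)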
6. Duality of Confidence Intervals and Hypothesis Tests

A confidence interval (or set, in general) can be obtained by inverting a hypothesis test, and vice versa.

Example. Normal with known variance.

Theorem 6.1. Suppose that for every value θ_0 in Θ there is a level-α test of the hypothesis H_0: θ = θ_0. Denote the acceptance region of the test by A(θ_0). Then the set C(X) = {θ : X ∈ A(θ)} is a 100(1 − α)% confidence region for θ.

In words, a 100(1 − α)% confidence region for θ consists of all those values θ_0 for which the hypothesis θ = θ_0 is not rejected at level α.

Theorem 6.2. Suppose that C(X) is a 100(1 − α)% confidence region for θ; that is, for every θ_0, P(θ_0 ∈ C(X) | θ = θ_0) = 1 − α. Then an acceptance region for a level-α test of the hypothesis H_0: θ = θ_0 is A(θ_0) = {X : θ_0 ∈ C(X)}.

In words, this says that the hypothesis θ = θ_0 is accepted if θ_0 lies in the confidence region.

This duality works exactly for the t-test and z-tests and their associated confidence intervals. For other tests that are typically used (e.g. testing a proportion, or the LRT for the Poisson, say), the typical tests do not invert exactly to the confidence interval, and vice versa. This is not because duality fails in these cases, but because the test used is not an exact inversion of the confidence set.

Example. Suppose a t-test rejects the two-sided hypothesis µ = 0 at the 5% level. Would the 90% CI contain zero?

References

[R] Rice, J., Mathematical Statistics and Data Analysis, Duxbury Press, 2nd Edition, 1995.

Prepared by Hanna Jankowski
Department of Statistics, University of Washington
Box 354322, Seattle, WA 98195-4322, U.S.A.
e-mail: hanna@stat.washington.edu