Hypothesis Testing - Frequentist


Frequentist Hypothesis Testing

Compare two hypotheses to see which one better explains the data; or, alternatively, find the best way to separate events into two classes, those originating from each of two hypotheses. The two hypotheses are traditionally called H0, the null hypothesis, and H1, the alternative hypothesis. As usual, we know P(data | H0) and P(data | H1). If W is the space of all possible data, we must find a critical region w ⊂ W (in which we reject H0) such that P(data ∈ w | H0) = α (chosen to be small), while at the same time P(data ∈ (W − w) | H1) = β is made as small as possible.

F. James (CERN), Statistics for Physicists, 4: Hypothesis Testing, April 2012, DESY

Frequentist - Simple and Composite Hypotheses

Simple hypotheses: when the hypotheses H0 and H1 are completely specified (with no free parameters), they are called simple hypotheses. The theory of hypothesis testing for simple hypotheses is well developed and holds for large or small samples.

Composite hypotheses: when a hypothesis contains one or more free parameters, it is a composite hypothesis, for which there is only an asymptotic theory. Unfortunately, real problems usually involve composite hypotheses. Fortunately, we can get exact answers for small samples using Monte Carlo.

Frequentist - Simple Hypotheses: Errors of the first and second kinds

The level of significance α (the size of the test) is defined as the probability of X falling in w (rejecting H0) when H0 is true: P(X ∈ w | H0) = α.

                                     H0 TRUE                      H1 TRUE
X ∉ w: ACCEPT H0                     good                         error of the second kind:
                                     prob = 1 − α                 contamination, prob = β
X ∈ w: REJECT H0                     error of the first kind:     good
(critical region)                    loss, prob = α               prob = 1 − β

Frequentist - Simple Hypotheses: Errors of the first and second kinds (cont.)

The usefulness of a test depends on its ability to discriminate against the alternative hypothesis H1. The measure of this usefulness is the power of the test, defined as the probability 1 − β of X falling into the critical region if H1 is true:

P(X ∈ w | H1) = 1 − β.    (1)

In other words, β is the probability that X will fall in the acceptance region for H0 if H1 is true:

P(X ∈ W − w | H1) = β.

In practice, the determination of a multidimensional critical region may be difficult, so one often chooses a single test statistic t(X) instead. Then the critical region becomes w_t and we have

P(t ∈ w_t | H0) = α.

Frequentist - Simple Hypotheses: Example: separation of two classes of events

Problem: distinguish elastic proton scattering events from inelastic scattering events:

pp → pp (the hypothesis under test, H0)
pp → ppπ⁰ (the alternative hypothesis, H1),

in an experimental set-up which measures the proton trajectories, but where the π⁰ cannot be detected directly. The obvious choice for the test statistic is the missing mass M (the total rest energy of all unseen particles) for the event. Knowing the expected distributions of missing mass for each of the two hypotheses, we can obtain the curves shown in the next slide, which allow us to choose a critical region for M:

[Figure: resolution functions P(M | H0) and P(M | H1) for the missing mass M, with critical region M > M_c. The peak of P(M | H0) lies at M₁ and that of P(M | H1) at M_π⁰; the tail area α of P(M | H0) lies above M_c, and the tail area β of P(M | H1) lies below M_c.]
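The trade-off between α and β for the missing-mass cut can be sketched numerically. All numbers below (peak positions, resolution, cut value) are invented for illustration, assuming Gaussian resolution functions:

```python
from scipy.stats import norm

# Hypothetical resolution functions for the missing mass M (all numbers
# invented): Gaussians centred at M1 (elastic, H0) and M_pi0 (inelastic, H1).
M1, M_pi0, sigma = 0.0, 0.135, 0.050   # GeV
M_c = 0.080                            # critical region: reject H0 if M > M_c

alpha = norm.sf(M_c, loc=M1, scale=sigma)     # loss: P(M > M_c | H0)
beta = norm.cdf(M_c, loc=M_pi0, scale=sigma)  # contamination: P(M < M_c | H1)

print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```

Moving M_c to the right reduces α but increases β; choosing the cut is choosing this trade-off.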

Frequentist - Simple Hypotheses: Comparison of tests: power functions

Suppose the two hypotheses under test can be expressed as two values of some parameter θ:

H0: θ = θ0
H1: θ = θ1,

and the power of the test is p(θ1) = P(X ∈ w | θ1) = 1 − β. It is of interest to consider p(θ) as a function of θ = θ1. [Here H1 is still considered a simple hypothesis, with θ fixed, but at different values.] p(θ) is called a power function.

Frequentist - Simple Hypotheses: Comparison of tests: most powerful

[Figure: power functions p(θ) of tests A, B, and C at significance level α, all starting at p(θ0) = α. Of these tests, B is the best for θ > θ′; for smaller values of θ, C is better. A second panel shows the power functions of four tests, one of which (U) is Uniformly Most Powerful: its curve lies above the others for all θ.]

Frequentist - Simple Hypotheses: Tests of hypotheses: consistency

A test is said to be consistent if the power tends to unity as the number of observations tends to infinity:

lim_{N→∞} P(X ∈ w_α | H1) = 1,

where X is the set of N observations, and w_α is the critical region, of size α, under H0.

[Figure: power function p(θ) of a consistent test as a function of N; as N increases, it tends to a step function rising from α at θ0 to 1.]
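Consistency can be illustrated with a simple assumed example (invented values): testing H0: μ = 0 against H1: μ = 0.2 for the mean of N(μ, 1) data, the power of the size-α test tends to 1 as N grows:

```python
import numpy as np
from scipy.stats import norm

# Power of the one-sided size-alpha test of H0: mu = 0 vs H1: mu = 0.2
# for N(mu, 1) data, as a function of sample size N (illustrative values).
alpha, mu1 = 0.05, 0.2
for N in (10, 100, 1000):
    x_c = norm.ppf(1 - alpha) / np.sqrt(N)            # cut on xbar under H0
    power = norm.sf(x_c, loc=mu1, scale=1 / np.sqrt(N))
    print(f"N = {N:4d}   power = {power:.4f}")
```

The printed power rises toward 1, the step-function limit sketched in the figure.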

Frequentist - Simple Hypotheses: Tests of hypotheses: bias

Consider a power curve which does not take on its minimum at θ = θ0. In this case, the probability of accepting H0: θ = θ0 is greater when θ = θ1 than when θ = θ0, or

1 − β < α.

That is, we are more likely to accept the null hypothesis when it is false than when it is true. Such a test is called a biased test.

[Figure: power functions for a biased test (B), whose minimum lies away from θ0 so that p(θ1) < α, and an unbiased test (A), whose minimum is at θ0.]

Frequentist - Simple Hypotheses: Choice of tests

For a given test, one must still choose a value of α or β. It is instructive to look at different tests in the α–β plane.

[Figure: curves of β versus α for tests A, B, C and the Neyman-Pearson (N-P) test. Unbiased tests must lie below the dotted line β = 1 − α; the N-P curve lies below the others, since N-P is the most powerful test.]

Frequentist - Simple Hypotheses: The Neyman-Pearson test

Of all tests for H0 against H1 with significance α, the most powerful test is that with the best critical region in X-space, that is, the region with the smallest value of β. Suppose that the random variable X = (X1, ..., XN) has p.d.f. f_N(X | θ0) under θ0, and f_N(X | θ1) under θ1. From the definitions of α and the power 1 − β, we have

∫_{w_α} f_N(X | θ0) dX = α

1 − β = ∫_{w_α} f_N(X | θ1) dX
      = ∫_{w_α} [f_N(X | θ1) / f_N(X | θ0)] f_N(X | θ0) dX
      = E_{w_α}[ f_N(X | θ1) / f_N(X | θ0) | θ = θ0 ].

Frequentist - Simple Hypotheses: The Neyman-Pearson test (cont.)

1 − β = E_{w_α}[ f_N(X | θ1) / f_N(X | θ0) | θ = θ0 ].

Clearly this will be maximal if and only if w_α is that fraction α of X-space containing the largest values of f_N(X | θ1)/f_N(X | θ0). Thus the best critical region w_α consists of points satisfying

l_N(X, θ0, θ1) ≡ f_N(X | θ1) / f_N(X | θ0) ≥ c_α.

The procedure leads to the criterion:

if l_N(X, θ0, θ1) > c_α, choose H1: f_N(X | θ1);
if l_N(X, θ0, θ1) ≤ c_α, choose H0: f_N(X | θ0).

This is the Neyman-Pearson test. The test statistic l_N is essentially the ratio of the likelihoods for the two hypotheses, and this ratio must be calculable at all points X of the observable space. The two hypotheses H0 and H1 must therefore be completely specified simple hypotheses, and then this gives the best test.
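As a sketch with invented numbers, consider N(μ, 1) data with simple hypotheses H0: μ = 0 and H1: μ = 1. The likelihood ratio l_N is monotonic in the sample mean, so the Neyman-Pearson critical region reduces to a cut on x̄:

```python
import numpy as np
from scipy.stats import norm

mu0, mu1, N, alpha = 0.0, 1.0, 10, 0.05   # invented example values

# l_N = f_N(X|mu1)/f_N(X|mu0) is increasing in xbar, so the best critical
# region {l_N > c_alpha} is {xbar > x_c}, with x_c fixed by the size alpha.
x_c = norm.ppf(1 - alpha, loc=mu0, scale=1 / np.sqrt(N))
power = norm.sf(x_c, loc=mu1, scale=1 / np.sqrt(N))   # 1 - beta

# Monte Carlo check that the region really has size alpha under H0
rng = np.random.default_rng(0)
xbar = rng.normal(mu0, 1, size=(100_000, N)).mean(axis=1)

print(f"x_c = {x_c:.3f}, power = {power:.3f}")
print(f"MC size under H0: {(xbar > x_c).mean():.4f}")
```

Any other region of the same size α has smaller power, which is exactly the content of the lemma.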

Frequentist - Composite Hypotheses

The theory above applies only to simple hypotheses. Unfortunately, we often need to test hypotheses which contain unknown free parameters (composite hypotheses), such as

H0: θ1 = a, θ2 = b
H1: θ1 ≠ a, θ2 ≠ b

or

H0: θ1 = a, θ2 unspecified
H1: θ1 = b, θ2 unspecified.

Frequentist - Composite Hypotheses: Existence of optimal tests

For composite hypotheses there is in general no UMP test. However, when the p.d.f. is of the exponential form, there is a result similar to Darmois' theorem for the existence of sufficient statistics. If X1, ..., XN are independent, identically distributed random variables with a p.d.f. of the form

F(X) G(θ) exp[a(X) B(θ)],

where B(θ) is strictly monotonic, then there exists a UMP test of H0: θ = θ0 against H1: θ > θ0. (Note that this test is only one-sided.)

Frequentist - Composite Hypotheses: One-sided and two-sided tests

When the test involves the value of one parameter θ, we can have a one-sided test of the form

H0: θ = θ0, H1: θ > θ0

or a two-sided test of the form

H0: θ = θ0, H1: θ ≠ θ0.

For two-sided tests, no UMP test generally exists, as can be seen from the power curves: Test 1⁺ is UMP for θ > θ0; Test 1⁻ is UMP for θ < θ0; Test 2 is the (two-sided) sum of tests 1⁺ and 1⁻. At the same significance level α, Test 2 is clearly less powerful than Test 1⁺ on one side and Test 1⁻ on the other.

[Figure: power functions p(θ) of tests 1⁺, 1⁻, and 2, all equal to α at θ0.]

Frequentist - Composite Hypotheses: Maximizing local power

If no UMP test exists, an important alternative is to look for a test which is most powerful in the neighbourhood of the null hypothesis. Then we have

H0: θ = θ0
H1: θ = θ0 + Δ (Δ small).

Expanding the log-likelihood, we have

ln L(X, θ1) = ln L(X, θ0) + Δ ∂ln L/∂θ |_{θ=θ0} + ...

If we now apply the Neyman-Pearson lemma to H0 and H1, the test is of the form

ln L(X, θ1) − ln L(X, θ0) ≷ c_α,

or

∂ln L/∂θ |_{θ=θ0} ≷ k_α.

Frequentist - Composite Hypotheses: Maximizing local power (cont.)

If the observations are independent and identically distributed, then

E[ ∂ln L/∂θ |_{θ=θ0} ] = 0
E[ (∂ln L/∂θ)² ] = N I,

where N is the number of observations and I is the information. Under suitable conditions ∂ln L/∂θ is approximately Normal. Hence a locally most powerful test is approximately given by

∂ln L/∂θ |_{θ=θ0} ≷ λ_α √(N I).
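A minimal sketch of this locally most powerful (score) test, with invented Poisson data: for X_i ~ Poisson(θ), ∂ln L/∂θ = Σx_i/θ − N and the information is I = 1/θ, so under H0 the score has variance N/θ0:

```python
import numpy as np
from scipy.stats import norm

# Score test of H0: theta = 4 against theta > 4 for Poisson counts
# (data and theta0 invented for illustration).
theta0, alpha = 4.0, 0.05
x = np.array([5, 3, 6, 4, 7, 5, 4, 6, 5, 5])
N = len(x)

score = x.sum() / theta0 - N                       # d lnL / d theta at theta0
crit = norm.ppf(1 - alpha) * np.sqrt(N / theta0)   # lambda_alpha * sqrt(N * I)

print(f"score = {score:.2f}, critical value = {crit:.2f}")
print("choose H1" if score > crit else "choose H0")
```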

Frequentist - Composite Hypotheses: Likelihood ratio test

This is the extension of the Neyman-Pearson test to composite hypotheses. Unfortunately, its properties are known only asymptotically. Let the observations X have a distribution f(X | θ), depending on parameters θ = (θ1, θ2, ...). Then the likelihood function is

L(X | θ) = ∏_{i=1}^{N} f(X_i | θ).

In general, let the total θ-space be denoted Θ, and let ν be some subspace of Θ. Then any test of parametric hypotheses (of the same family) can be stated as

H0: θ ∈ ν
H1: θ ∈ Θ − ν.

Frequentist - Composite Hypotheses: Likelihood ratio test (cont.)

We can then define the maximum likelihood ratio, a test statistic for H0:

λ = max_{θ∈ν} L(X | θ) / max_{θ∈Θ} L(X | θ).

If H0 and H1 were simple hypotheses, λ would reduce to the Neyman-Pearson test statistic, giving the UMP test. For composite hypotheses, we can say only that λ is always a function of the sufficient statistic for the problem, and produces workable tests with good properties, at least for large sets of observations.

Frequentist - Composite Hypotheses: Likelihood ratio test (cont.)

The maximum likelihood ratio is then the ratio between the value of L(X | θ) maximized with respect to θ_j, j = 1, ..., s, while holding fixed θ_i = θ_{i0}, i = 1, ..., r, and the value of L(X | θ) maximized with respect to all the parameters. With this notation the test statistic becomes

λ = max_{θ_s} L(X | θ_{r0}, θ_s) / max_{θ_r, θ_s} L(X | θ_r, θ_s)

or

λ = L(X | θ_{r0}, θ̃_s) / L(X | θ̂_r, θ̂_s)

(this corrects the book, p. 271, Eq. (10.11)), where θ̃_s is the value of θ_s at the maximum in the restricted θ region, and θ̂_r, θ̂_s are the values of θ_r, θ_s at the maximum in the full θ region.

Frequentist - Composite Hypotheses: Likelihood ratio test (cont.)

The importance of the maximum likelihood ratio comes from the fact that, asymptotically, if H0 imposes r constraints on the s + r parameters in H0 and H1, then −2 ln λ is distributed as χ²(r) under H0. This means we can read off the significance level α from a table of χ². However, the bad news is that this is only true asymptotically, and there is no good way to know how good the approximation is, except to do a Monte Carlo calculation.
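A small sketch of this result in use, with invented data: for N(μ, 1) observations and H0: μ = 0 against μ free (r = 1 constraint), the ratio works out to −2 ln λ = N x̄², which is compared to χ²(1):

```python
import numpy as np
from scipy.stats import chi2

# -2 ln lambda for H0: mu = 0 vs mu free, N(mu, 1) data (invented numbers).
# Under H1 the MLE is xbar, and the sums of squares cancel to give
#   -2 ln lambda = N * xbar**2.
x = np.array([0.3, -0.1, 0.8, 0.4, -0.2, 0.6, 0.1, 0.5, 0.9, -0.3])
N, xbar = len(x), x.mean()

t = N * xbar**2
p_value = chi2.sf(t, df=1)   # read the significance off the chi2(1) tail

print(f"-2 ln lambda = {t:.3f}, p-value = {p_value:.3f}")
```

Whether χ²(1) is a good approximation at this N is exactly what one would check with a Monte Carlo calculation.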

Frequentist - Composite Hypotheses: Likelihood ratio test - example

Problem: find the ratio X of two complex decay amplitudes:

X = A(reaction 1) / A(reaction 2).

In the general case, X may be any complex number, but there exist three different theories which predict the following for X:

A: If Theory A is valid, X = 0.
B: If Theory B is valid, X is real: Im(X) = 0.
C: If Theory C is valid, X is purely imaginary and non-zero.

We decide that the value of X is interesting only in so far as it could distinguish between the hypotheses A, B, C or the general case. Therefore, we are doing hypothesis testing, not parameter estimation. Hypothesis A is simple; Hypothesis B is composite, including hypothesis A as a special case; Hypothesis C is also composite, and separate from A and B. The alternative to all of these is that Re(X) and Im(X) are both non-zero.

Frequentist - Composite Hypotheses: Likelihood ratio test - example (cont.)

[Figure: contours of the log-likelihood function ln L(X) near its maximum in the complex X plane, at ln L = ln L(θ̂) − 1/2 and ln L = ln L(θ̂) − 2. X = d is the point where ln L is maximal; X = b is the maximum of ln L when Im(X) = 0; X = c is the maximum of ln L when Re(X) = 0.]

Frequentist - Composite Hypotheses: Likelihood ratio test - example (cont.)

The maximum likelihood ratio for hypothesis A versus the general case is

λ_a = L(0) / L(d).

If hypothesis A is true, −2 ln λ_a (this corrects the book, p. 276, line 5) is distributed asymptotically as a χ²(2), and this gives the usual test for Theory A. To test Theory B, the maximum likelihood ratio for hypothesis B versus the general case is

λ_b = L(b) / L(d).

If B is true, −2 ln λ_b is distributed asymptotically as a χ²(1). Finally, Theory C can be tested in the same way, using L(c) in place of L(b).

Bayesian Hypothesis Testing

Recall that according to Bayes' theorem:

P(hyp | data) = P(data | hyp) P(hyp) / P(data).

The normalization factor P(data) can be determined in the case of parameter estimation, where all the possible values of the parameter are known; but in hypothesis testing this doesn't work, since we cannot enumerate all possible hypotheses. However, Bayes' theorem can still be used to find the ratio of probabilities for two hypotheses, since the normalizations cancel:

R = P(H0 | data) / P(H1 | data) = L(H0) P(H0) / [L(H1) P(H1)].
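Numerically the ratio is trivial to form; all numbers below are invented, purely to show how P(data) cancels:

```python
# Posterior odds R = L(H0) P(H0) / (L(H1) P(H1)); invented numbers.
L_H0, L_H1 = 0.12, 0.03        # likelihoods P(data | H0), P(data | H1)
prior_H0, prior_H1 = 0.5, 0.5  # prior beliefs

# The common normalization P(data) cancels in the ratio.
R = (L_H0 * prior_H0) / (L_H1 * prior_H1)
print(f"posterior odds R = {R:.1f}")
```

With equal priors, R is just the likelihood ratio; unequal priors scale it directly.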

Bayesian Hypothesis Testing (cont.)

But how do we interpret the ratio of two probabilities? This would be obvious for frequentist probabilities, but what is a ratio of beliefs? If R = 2.5, for example, it means that we believe in H0 2.5 times as much as we believe in H1. How can we use the value of R? For betting on H0: it gives us directly the odds we can accept if we want to bet on H0 against H1. Note that R is proportional to the ratio of prior probabilities P(H0)/P(H1), so there is no way to make the result insensitive to the priors.