Hypothesis testing (cont'd)


Ulrich Heintz, Brown University, 4/12/2016 (PHYS 1560 Lecture 11)

Hypothesis testing

Is our hypothesis about the fundamental physics correct? We will not be able to give a yes/no answer to this question, only a degree of confidence. General approach: use parameter estimation techniques and determine a p-value to quantify the degree of confidence.

A simple counting experiment

Consider an experiment which counts events of a certain type in search of a new process. The expected background count from known processes is b = 5.2. The only model parameter is the number of counts s from the new process, so the expected count is λ = b + s. The probability to observe n counts is

$p(n \mid s) = \frac{(b+s)^n \, e^{-(b+s)}}{n!}$

Results

Assume an observed count n. The p-value ξ is the probability to observe at least n counts under the hypothesis s = 0:

$\xi(n) = \sum_{n'=n}^{\infty} p(n' \mid s=0)$

 n     ξ
 6     0.419
 8     0.155
10     0.040
12     0.007
14     1.0 × 10⁻³
21     1.5 × 10⁻⁷

[Plot: p(n | s = 0) versus n]
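These tail probabilities are easy to reproduce numerically; a minimal sketch using scipy, where the survival function poisson.sf(n − 1, b) gives P(N ≥ n) under the background-only hypothesis:

```python
# Tail p-value for the counting experiment with b = 5.2:
# xi(n) = P(N >= n | s = 0), via the Poisson survival function.
from scipy.stats import poisson

b = 5.2
for n in (6, 8, 10, 12, 14, 21):
    xi = poisson.sf(n - 1, b)   # sf(k, mu) = P(N > k), so sf(n-1, b) = P(N >= n)
    print(f"n = {n:2d}  xi = {xi:.3g}")
```

This reproduces the table above.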

Hypothesis testing

Framework for decision making between two hypotheses:
Null hypothesis H0, here s = 0.
Alternative hypothesis H1, here s > 0.
Reject H0 if the p-value for the observed data under H0 is below some predefined threshold α.

Possible errors:
Error of the first kind (type I): reject H0 when it is true ("false discovery"); probability = α.
Error of the second kind (type II): accept H0 when it is false ("missed discovery"); probability = β.

The probability 1 − β to correctly reject H0 when H1 is true is called the power of the test. For a given α, select the test with the largest power (see the sketch below).
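For the counting experiment these definitions are concrete enough to compute. A minimal sketch, with illustrative (hypothetical) signal sizes: pick the smallest critical count n_c with P(N ≥ n_c | b) ≤ α, then the power is P(N ≥ n_c | b + s).

```python
# Power of the counting test: critical count for given alpha, then
# power = 1 - beta = P(N >= n_c | b + s). Signal sizes are illustrative.
from scipy.stats import poisson

b, alpha = 5.2, 0.0013                    # background mean, 3-sigma test size

n_c = 0                                   # smallest n_c with P(N >= n_c | b) <= alpha
while poisson.sf(n_c - 1, b) > alpha:
    n_c += 1

for s in (5.0, 10.0, 15.0):               # hypothetical signal counts
    power = poisson.sf(n_c - 1, b + s)    # P(N >= n_c | b + s)
    print(f"n_c = {n_c}, s = {s}: power = {power:.2f}")
```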

Z value

Often p-values are very small. Define the z-value ("number of sigmas") by

$\frac{1}{\sqrt{2\pi}} \int_z^{\infty} e^{-x^2/2} \, dx = \xi$

 z     ξ
−1     0.84
 0     0.50
 1     0.16
 2     0.023
 3     1.3 × 10⁻³
 4     3.2 × 10⁻⁵
 5     2.9 × 10⁻⁷

For large counts use the Gaussian approximation $z \approx s/\sqrt{b}$.
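The integral above is the standard normal survival function, so the conversion in either direction is a single scipy call; a minimal sketch:

```python
# Converting between p-values (xi) and z-values with the standard normal.
from scipy.stats import norm

print(norm.isf(1.3e-3))   # z for xi = 1.3e-3 -> ~3.0 ("3 sigma")
print(norm.sf(5.0))       # xi for z = 5      -> ~2.9e-7 (discovery threshold)
```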

Distribution testing

Generally it is more powerful to test a distribution of counts of many different event types. The likelihood function for counts n₁, n₂, … is

$L(\mu) = \prod_i p(n_i \mid b_i + \mu s_i)$

where μ is the signal strength parameter.
H0: background only (μ = 0).
H1: background + signal (μ > 0).

Test statistic

How do we define the p-value for a distribution test? Choose a function, called the test statistic, which characterizes how signal-like the data are. This could be a sum of squares χ² or a likelihood L. Choose the function which maximizes the power of the test. This is only well-defined if H1 is a simple hypothesis (one without free parameters); then the most powerful test statistic is the likelihood ratio (Neyman-Pearson lemma):

$t = \ln \frac{L(H_1)}{L(H_0)}$

A generalization for composite H1 is the profile likelihood ratio

$t = \ln \frac{\max_{\mu > 0} L(\mu)}{L(0)}$

Large values of t favor H1, small values favor H0. For an observed value t_obs, the p-value is ξ = P(t > t_obs | H0).
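A minimal sketch of this profile likelihood ratio for binned Poisson counts; the bin contents n, b, s below are made-up illustration values, and the one-parameter fit uses scipy:

```python
# Profile likelihood ratio t = ln[ max_{mu>0} L(mu) / L(0) ] for binned
# Poisson counts. Bin contents are hypothetical illustration values.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

n = np.array([4, 7, 12, 9, 5])              # observed counts (hypothetical)
b = np.array([5.0, 6.0, 7.0, 6.0, 5.0])     # expected background per bin
s = np.array([0.2, 1.0, 4.0, 1.0, 0.2])     # expected signal per bin at mu = 1

def nll(mu, counts):
    """Negative log-likelihood for signal strength mu."""
    return -np.sum(poisson.logpmf(counts, b + mu * s))

fit = minimize_scalar(nll, args=(n,), bounds=(0.0, 10.0), method="bounded")
t_obs = nll(0.0, n) - fit.fun               # = ln[ L(mu_hat) / L(0) ]
print(f"mu_hat = {fit.x:.2f}, t_obs = {t_obs:.2f}")
```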

Example

Perform two maximum likelihood fits: background only (μ = 0) and signal plus background (best-fit μ = 0.72). This gives t_obs = 4.92. What is the parent distribution of t? Generate many sample distributions with the background histogram as parent distribution and compute t for each of them.

[Plot: data histogram with the fitted curves for μ = 0 and μ = 0.72]
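Continuing the sketch above, the parent distribution of t under H0 can be estimated with background-only pseudo-experiments (toys):

```python
# Estimate the p-value of t_obs by throwing background-only toys and
# counting how often they fluctuate to t >= t_obs. Reuses b, s, nll, t_obs.
rng = np.random.default_rng(seed=1)

def t_stat(counts):
    fit = minimize_scalar(nll, args=(counts,), bounds=(0.0, 10.0),
                          method="bounded")
    return nll(0.0, counts) - fit.fun

t_toys = np.array([t_stat(rng.poisson(b)) for _ in range(10_000)])
xi = np.mean(t_toys >= t_obs)
print(f"xi = {xi:.2e}")
```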

Results

What fraction of the sample distributions yields a value of t > t_obs? 86 out of 10⁵ samples do, so ξ = 8.6 × 10⁻⁴, corresponding to z = 3.13.

Systematic uncertainties

How can we account for systematic uncertainties in the hypothesis test? Say in our counting experiment we measure the background from a control experiment to be b = 5.2 ± 2.6. Possible approach: include a probability distribution for b in the likelihood and average over all values of b. Such parameters are called nuisance parameters because we are not fundamentally interested in their values (as opposed to the signal strength parameter, which we want to measure).
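A minimal sketch of this averaging for the counting experiment, assuming (my choice, not stated on the slide) a Gaussian distribution for b truncated at b > 0:

```python
# Fold the uncertainty on b into the p-value by averaging the Poisson
# tail probability over a truncated Gaussian distribution for b.
import numpy as np
from scipy.stats import norm, poisson

b0, db, n_obs = 5.2, 2.6, 8
b_vals = np.linspace(0.01, b0 + 5 * db, 2000)   # grid over b > 0
w = norm.pdf(b_vals, b0, db)
w /= w.sum()                                    # normalized weights
xi = np.sum(w * poisson.sf(n_obs - 1, b_vals))  # averaged P(N >= n_obs)
print(f"xi with systematic: {xi:.3g}")          # exceeds the 0.155 at fixed b
```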

Result including systematic uncertainty

[Plots: distributions of the test statistic with and without the systematic uncertainty on b]

In general, adding systematic uncertainties broadens the distribution of the test statistic, increases the p-value, and reduces the z-value. Gaussian approximation for large counts:

$z \approx \frac{s}{\sqrt{b + (\delta b)^2}}$

Look-elsewhere effect

We reject H0 if ξ < α, so the probability to reject H0 incorrectly is α. If we repeat the procedure for the same H0 but many different alternative hypotheses H1 (e.g. testing for peaks at different places), the probability that some test rejects H0 becomes larger than α. For n independent tests the probability is ≈ nα (if α ≪ 1).

For example: assume we want to carry out a test with 3σ significance, α = 0.0013. The probability to reject H0 in one test is 0.13%. If 10 independent channels are tested, the probability to reject H0 in any one of them is 1.3%.

Correct the local p-value to a global p-value by multiplying by the trial factor n; see the sketch below.
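A minimal arithmetic check of the trial-factor approximation:

```python
# Local-to-global p-value for n independent tests.
alpha, n = 0.0013, 10         # 3-sigma local threshold, 10 channels
exact = 1 - (1 - alpha) ** n  # P(at least one test rejects H0)
print(exact, n * alpha)       # ~0.0129 vs 0.0130: n*alpha is a good approximation
```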

Result with look-elsewhere effect

If we look only at M = 500 we had ξ = 8.6 × 10⁻⁴, or z = 3.1.

If we look at a wider range of M, the probability to observe such a deviation increases. Draw random samples and compute the minimum p-value over all values of M. For M = 300, 500, 700 we get ξ_min = 2.2 × 10⁻³.

Return to our simple counting experiment

Count events of a certain type in search of a new process. The expected background count from known processes is b = 5.2. The only model parameter is the number of counts s from the new process, so the expected count is λ = b + s. Suppose we count n = 8 events. What statement can we make about s? Can we exclude large values of s, such as s = 500?

Neyman construction

Upper limit construction by hypothesis test inversion:
1. For a given s = s₀, carry out a hypothesis test with the null hypothesis s = s₀ and the alternative hypothesis s < s₀, with type-I error α (e.g., α = 0.05).
2. Repeat step 1 for different values of s₀.
3. The confidence interval for s comprises exactly those values s₀ for which the hypothesis test could not reject the null hypothesis s = s₀.

For this formulation of the hypothesis test we get an upper limit with confidence level 1 − α (here 95%). This is known as the Neyman construction.

Example: counting experiment with b = 5.2. As a function of s, determine the largest n₀ for which $P(n < n_0 \mid s) \le \alpha$; values of s with n_obs < n₀(s) are then rejected.

Example: for n_obs = 8, the 95% C.L. upper limit for s is 9.2.
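This limit is straightforward to reproduce by inverting the test numerically: the upper limit is the s at which the tail probability P(N ≤ n_obs | b + s) crosses α. A minimal sketch:

```python
# 95% CL upper limit on s by test inversion:
# solve P(N <= n_obs | b + s) = alpha for s.
from scipy.optimize import brentq
from scipy.stats import poisson

b, n_obs, alpha = 5.2, 8, 0.05
s_up = brentq(lambda s: poisson.cdf(n_obs, b + s) - alpha, 0.0, 50.0)
print(f"s < {s_up:.1f} at 95% C.L.")   # ~9.2, as quoted above
```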

Empty intervals

In the Neyman construction one can obtain empty intervals: e.g. for n_obs = 1 one would state s < 0 at 95% C.L. For a method with correct coverage and true s = 0, this happens in 5% of the cases, when n_obs happens to be small.
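A quick check of the n_obs = 1 case with the same inversion:

```python
# Empty-interval case: n_obs = 1 with b = 5.2. The 95% CL upper limit
# on the Poisson mean is ~4.74, below the background expectation.
from scipy.optimize import brentq
from scipy.stats import poisson

b, alpha = 5.2, 0.05
lam_up = brentq(lambda lam: poisson.cdf(1, lam) - alpha, 0.1, 20.0)
print(lam_up, lam_up - b)   # lam_up ~ 4.74, so s_up ~ -0.46 < 0
```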

Empty (or very small) intervals are unsatisfactory: we know we are in the 5% type-I error case, and we would cite a very strong limit although there is no experimental sensitivity to such small values. To avoid citing such intervals, one can modify the frequentist construction: modified frequentist intervals, also known as the CLs method.

The CLs method

Small or empty intervals occur when the data are incompatible with the background-only model (e.g. very few events are observed, fewer than expected even for background alone).

Consider the test statistic distribution for the background-only model, μ = 0. The idea: increase the limit if the data are incompatible with the background-only hypothesis, i.e. enlarge the interval in case of small values of p_b.

Definition of CLs

The CLs value is a modified p-value which is large for small p_b:

$CL_s = \frac{p_{s+b}}{p_b}$

In the limit construction, use CLs in place of p_{s+b} as before; the limit is the μ for which CLs = α. CLs limits are always more conservative than Neyman limits because CLs ≥ p_{s+b} by construction. The CLs method prevents citing limits where there is no experimental sensitivity.
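For the counting experiment, a natural reading (my identification, not spelled out on the slide) is p_{s+b} = P(N ≤ n_obs | b + s) and p_b = P(N ≤ n_obs | b); the CLs limit is then computed the same way as the Neyman limit. A minimal sketch:

```python
# CLs upper limit for the counting experiment: solve CLs(s) = alpha,
# with CLs = p_{s+b} / p_b built from Poisson tail probabilities.
from scipy.optimize import brentq
from scipy.stats import poisson

b, n_obs, alpha = 5.2, 8, 0.05

def cls(s):
    p_sb = poisson.cdf(n_obs, b + s)  # P(N <= n_obs | signal + background)
    p_b = poisson.cdf(n_obs, b)       # P(N <= n_obs | background only)
    return p_sb / p_b

s_up = brentq(lambda s: cls(s) - alpha, 0.0, 50.0)
print(f"CLs: s < {s_up:.1f} at 95% C.L.")  # slightly above the Neyman 9.2
```

Because CLs divides by p_b ≤ 1, the limit can only move up, which is exactly the conservatism described above.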