Physics 403. Segev BenZvi. Classical Hypothesis Testing: The Likelihood Ratio Test. Department of Physics and Astronomy University of Rochester

Similar documents
Physics 403. Segev BenZvi. Credible Intervals, Confidence Intervals, and Limits. Department of Physics and Astronomy University of Rochester

Statistical Data Analysis Stat 3: p-values, parameter estimation

Physics 403. Segev BenZvi. Choosing Priors and the Principle of Maximum Entropy. Department of Physics and Astronomy University of Rochester

STAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and Q-Q plots. March 8, 2015

Hypothesis testing (cont d)

Statistics for the LHC Lecture 1: Introduction

Statistical Methods in Particle Physics Lecture 1: Bayesian methods

ORF 245 Fundamentals of Statistics Chapter 9 Hypothesis Testing

2. What are the tradeoffs among different measures of error (e.g. probability of false alarm, probability of miss, etc.)?

Hypothesis Testing - Frequentist

Statistical Methods for Particle Physics Lecture 2: statistical tests, multivariate methods

PoS(NOW2016)041. IceCube and High Energy Neutrinos. J. Kiryluk (for the IceCube Collaboration)

6.4 Type I and Type II Errors

LECTURE 10: NEYMAN-PEARSON LEMMA AND ASYMPTOTIC TESTING. The last equality is provided so this can look like a more familiar parametric test.

Lecture 5: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics. 1 Executive summary

Journeys of an Accidental Statistician

Lecture 2. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 1

Physics 403 Probability Distributions II: More Properties of PDFs and PMFs

Frequentist Statistics and Hypothesis Testing Spring

Statistical Methods for Particle Physics Lecture 1: parameter estimation, statistical tests

Searches for astrophysical sources of neutrinos using cascade events in IceCube

Statistical Methods for Astronomy

Hypothesis Testing. Testing Hypotheses MIT Dr. Kempthorne. Spring MIT Testing Hypotheses

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester

Lecture Testing Hypotheses: The Neyman-Pearson Paradigm

A new IceCube starting track event selection and realtime event stream

Statistics for Particle Physics. Kyle Cranmer. New York University. Kyle Cranmer (NYU) CERN Academic Training, Feb 2-5, 2009

Statistics for Particle Physics. Kyle Cranmer. New York University. Kyle Cranmer (NYU) CERN Academic Training, Feb 2-5, 2009

P Values and Nuisance Parameters

Statistical Methods in Particle Physics Lecture 2: Limits and Discovery

Primer on statistics:

Statistical Methods for Particle Physics (I)

Chapters 10. Hypothesis Testing

Measurement of High Energy Neutrino Nucleon Cross Section and Astrophysical Neutrino Flux Anisotropy Study of Cascade Channel with IceCube

Some Statistical Tools for Particle Physics

Bayesian RL Seminar. Chris Mansley September 9, 2008

Fundamental Probability and Statistics

Mathematical Statistics

Testing Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata

E. Santovetti lesson 4 Maximum likelihood Interval estimation

Statistical Inference

Statistical Methods for Particle Physics Lecture 4: discovery, exclusion limits

(1) Introduction to Bayesian statistics

Lecture 4: Statistical Hypothesis Testing

Detection theory 101 ELEC-E5410 Signal Processing for Communications

hypothesis testing 1

Statistical Methods for Particle Physics Lecture 3: systematic uncertainties / further topics

Economics 520. Lecture Note 19: Hypothesis Testing via the Neyman-Pearson Lemma CB 8.1,

Statistics of Small Signals

A Multimessenger Neutrino Point Source Search with IceCube

Nuisance parameters and their treatment

HST.582J / 6.555J / J Biomedical Signal and Image Processing Spring 2007

A Search for Point Sources of High Energy Neutrinos with AMANDA-B10

Search for a diffuse cosmic neutrino flux with ANTARES using track and cascade events

Statistical Inference

Statistical Tests: Discriminants

Hypothesis Testing, Bayes Theorem, & Parameter Estimation

Partitioning the Parameter Space. Topic 18 Composite Hypotheses

Lecture 5: Bayes pt. 1

Bayesian Inference in Astronomy & Astrophysics A Short Course

Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process

Constructing Ensembles of Pseudo-Experiments

Chapter Three. Hypothesis Testing

Cosmic Neutrinos in IceCube. Naoko Kurahashi Neilson University of Wisconsin, Madison IceCube Collaboration

STAT 536: Genetic Statistics

Frequentist-Bayesian Model Comparisons: A Simple Example

ST 740: Model Selection

Detection theory. H 0 : x[n] = w[n]

Statistics Challenges in High Energy Physics Search Experiments

Stat 135, Fall 2006 A. Adhikari HOMEWORK 6 SOLUTIONS

IceCube Results & PINGU Perspectives

High Energy Astrophysical Neutrino Flux Characteristics for Neutrino-induced Cascades Using IC79 and IC86-String IceCube Configurations

PoS(ICRC2017)945. In-ice self-veto techniques for IceCube-Gen2. The IceCube-Gen2 Collaboration

Physics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester

Naïve Bayes classification

Physics 403. Segev BenZvi. Propagation of Uncertainties. Department of Physics and Astronomy University of Rochester

557: MATHEMATICAL STATISTICS II HYPOTHESIS TESTING: EXAMPLES

Chapter 9: Hypothesis Testing Sections

A Bayesian Approach to Phylogenetics

A Summary of recent Updates in the Search for Cosmic Ray Sources using the IceCube Detector

Ling 289 Contingency Table Statistics

Topic 19 Extensions on the Likelihood Ratio

Bayes Formula. MATH 107: Finite Mathematics University of Louisville. March 26, 2014

Introduction into Bayesian statistics

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Quantitative Introduction ro Risk and Uncertainty in Business Module 5: Hypothesis Testing

Bayes Factors for Discovery

STAT 135 Lab 5 Bootstrapping and Hypothesis Testing

Hypothesis Test. The opposite of the null hypothesis, called an alternative hypothesis, becomes

How bright can the brightest neutrino source be?

Sensitivity of oscillation experiments to the neutrino mass ordering

Negative binomial distribution and multiplicities in p p( p) collisions

LEARNING WITH BAYESIAN NETWORKS

Bayesian Regression Linear and Logistic Regression

Hypothesis Tests Solutions COR1-GB.1305 Statistics and Data Analysis

Inference for Proportions, Variance and Standard Deviation

Statistical Methods for Particle Physics Lecture 3: Systematics, nuisance parameters

Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics)

4 Hypothesis testing. 4.1 Types of hypothesis and types of error 4 HYPOTHESIS TESTING 49

The Shadow of the Moon in IceCube

Transcription:

Physics 403 Classical Hypothesis Testing: The Likelihood Ratio Test Segev BenZvi Department of Physics and Astronomy University of Rochester

Table of Contents 1 Bayesian Hypothesis Testing Posterior Odds Ratio Comparison to Parameter Estimation 2 Classical Hypothesis Testing Type I and Type II Errors Statistical Significance and Power Neyman-Pearson Lemma Wilks Theorem Using χ 2 instead of 2 ln L 3 Case Study: Detection of Extraterrestrial Neutrinos Segev BenZvi (UR) PHY 403 2 / 26

Hypothesis Testing Parameter estimation and model selection are quite similar; we are just asking different questions of the data In model selection we calculate the probability that some hypothesis H 0 is true, starting from Bayes Theorem: p(h 0 D, I ) = p(d H 0, I ) p(h 0 I ) p(d I ) The marginal evidence p(d I ) can be ignored if we are calculating the odds ratio of H 0 with some other hypothesis H 1 If we actually want to know p(h 0 D, I ) we need to calculate p(d I ). This requires the alternative hypothesis. Using marginalization and the produce rule, p(d I ) = p(d H 0, I ) p(h 0 I ) + p(d H 1, I ) p(h 1 I ) Segev BenZvi (UR) PHY 403 3 / 26

Posterior Odds of Two Models A and B Compare model A with model B which has a tunable parameter λ: O AB = p(a I ) p(b I ) p(d A, I ) p(d ˆλ, B, I ) λ max λ min δλ 2π Combination of prior odds, likelihood ratios, and an Ockham factor that penalizes scanning over parameter λ Segev BenZvi (UR) PHY 403 4 / 26

Comparison with Parameter Estimation Note how this differs from parameter estimation, where we assume that a model is correct and calculate the best parameter given that model To infer a best estimate of a parameter λ from the data, given that B is correct, we write p(λ D, B, I ) = p(d λ, B, I ) p(λ B, I ) p(d B, I ) To estimate λ we want to maximize the likelihood over the range [λ min, λ max ]. As long as the range contains enough of p(d λ, B, I ) around ˆλ, its particular bounds do not matter for finding ˆλ To calculate the odds ratio of A and B we are basically comparing the likelihoods averaged over the parameter space Therefore, in model selection the Ockham factor matters because there is a cost to averaging the likelihood over a larger parameter space Segev BenZvi (UR) PHY 403 5 / 26

Table of Contents 1 Bayesian Hypothesis Testing Posterior Odds Ratio Comparison to Parameter Estimation 2 Classical Hypothesis Testing Type I and Type II Errors Statistical Significance and Power Neyman-Pearson Lemma Wilks Theorem Using χ 2 instead of 2 ln L 3 Case Study: Detection of Extraterrestrial Neutrinos Segev BenZvi (UR) PHY 403 6 / 26

Hypothesis Testing in Classical Statistics Type I Errors Construct a test statistic t and use its value to decide whether to accept or reject a hypothesis The statistic t is basically a summary of the data given the hypothesis we want to test Define a cut value t cut and use that to accept or reject the hypothesis H 0 depending on the value of t measured in data Type I Error: reject H 0 even though it is true with tail probability α (shown in gray) Segev BenZvi (UR) PHY 403 7 / 26

Hypothesis Testing in Classical Statistics Type II Errors You can also specify an alternative hypothesis H 1 and use t to test if it s true Type II Error: accept H 0 even though it is false and H 1 is true. This tail probability β is shown in pink α = β = t cut p(t H 0 ) dt tcut p(t H 1 ) dt Segev BenZvi (UR) PHY 403 8 / 26

Statistical Significance and Power As you can see there is some tension between α and β. Increasing t cut will increase β and reduce α, and vice-versa Significance: α gives the significance of a test. When α is small we disfavor H 0, known as the null hypothesis Power: 1 β is called the power of a test. A powerful test has a small chance of wrongly accepting H 0 Example It s useful to think of the null hypothesis H 0 as a less interesting default/status quo result (your data contain only background) and H 1 as a potential discovery (your data contain signal). A good test will have high significance and high power, since this means a low chance of incorrectly claiming a discovery and a low chance of missing an important discovery. Segev BenZvi (UR) PHY 403 9 / 26

The Neyman-Pearson Lemma The Neyman-Pearson Lemma is used to balance signifance and power. It states that the acceptance region giving the highest power (and hence the highest signal purity ) for a given significance level α (or selection efficiency 1 α) is the region of t-space such that Λ(t) = p(t H 0) p(t H 1 ) > c Here Λ(t) is the likelihood ratio of the test statistic t under the two hypotheses H 0 and H 1. The constant c is determined by α. Note that t can be multidimensional. In practice, one often estimates the distribution of Λ(t) using Monte Carlo by generating t according to H 0 and H 1. Then use the distribution to determine the cut c that will give you the desired significance α. Segev BenZvi (UR) PHY 403 10 / 26

Comparing Two Simple Hypotheses A simple model is one in which the model parameter θ is fixed to some value; i.e., there are no unknown parameters to estimate In comparing two simple models, the null and alternative hypotheses can be written The likelihood ratio is H 0 : θ = θ 0 H 1 : θ = θ 1 Λ(t) = p(t θ 0) p(t θ 1 ), and the decision rule for the test is at significance level α is Λ > c : do not reject H 0 Λ < c : reject H 0 Λ = c : reject H 0 with probability q, where α = q p(λ = c H 0 ) + p(λ < c H 0 ) Segev BenZvi (UR) PHY 403 11 / 26

Comparing Two Composite Hypotheses A composite hypothesis is one in which the parameter θ is part of a subset Θ 0 of a larger parameter space Θ: The likelihood ratio is H 0 : θ Θ 0 H 1 : θ Θ Λ(t) = sup {p(t θ) : θ Θ 0} sup {p(t θ) : θ Θ}, where sup refers to the supremum function, also known as the least upper bound. The numerator is the max likelihood under H 0, and the denominator is the max likelihood under H 1 The Neyman-Pearson lemma states that this likelihood ratio test is the most powerful of all tests of level α for rejecting H 0 Segev BenZvi (UR) PHY 403 12 / 26

Wilks Theorem If H 0 is true and is a subspace of the larger parameter space represented by H 1, then as N, the statistic 2 ln Λ will be distributed as a χ 2 with the number of degrees of freedom equal to the difference in dimensionality of Θ 0 and Θ [1] This is what we call a nested model, and it shows up all the time Example Nested model of constant and line: H 0 : the data are described y = a H 1 : the data are described by y = a + bx Segev BenZvi (UR) PHY 403 13 / 26

Likelihood Ratio Test: Example Example You flip a coin N = 1000 times and get heads n = 550 times. Is it fair? H 0 : p = 0.5 H 1 : p [0, 1] Λ = L(n, N p, H 0) L(n, N p, H 1 ) ln L = n ln p + (N n) ln (1 p) Under H 1 the maximum likelihood estimate is ˆp = 0.55, so 2 ln Λ = 2(ln L 0 ln L 1 ) = 2(550 ln 0.5 + 450 ln 0.5 550 ln 0.55 450 ln 0.55) = 10.02 p(χ 2 > 10.02 N = 1) = 0.17% Segev BenZvi (UR) PHY 403 14 / 26

χ 2 and the Likelihood Ratio Test If you have χ 2 from nested model fits, you can use χ 2 instead of 2 ln L as long as the conditions of Wilks Theorem apply Example: simulated linear data with linear and quadratic fits. The distribution χ 2 has a mean of 1 and a variance of 2, as expected Segev BenZvi (UR) PHY 403 15 / 26

Table of Contents 1 Bayesian Hypothesis Testing Posterior Odds Ratio Comparison to Parameter Estimation 2 Classical Hypothesis Testing Type I and Type II Errors Statistical Significance and Power Neyman-Pearson Lemma Wilks Theorem Using χ 2 instead of 2 ln L 3 Case Study: Detection of Extraterrestrial Neutrinos Segev BenZvi (UR) PHY 403 16 / 26

Extraterrestrial Neutrino Spectra Sources of neutrinos at Earth [2]: Cosmic ν background Solar neutrinos Atmospheric ν s Astrophysical ν s We can t tell apart one kind of ν from another, but the energy spectra differ. So on a statistical basis we can discriminate populations Segev BenZvi (UR) PHY 403 17 / 26

Traditional Neutrino Detection Muons from cosmic rays are a large source of background in IceCube Put detectors underground/ice/sea to reduce muon counts Look in the Northern Hemisphere, where cosmic rays are blocked (but atmospheric ν s from air showers are not) Segev BenZvi (UR) PHY 403 18 / 26

All-Sky Searches for ν Point Sources in IceCube Compare the ratio of likelihoods for observing n s signal events to observing background only (n s = 0) as a function of position x on the sky: p i (x j, n s ) = n s N S i(x j ) + N n s N B i(x j ) The likelihood function is the product of all events L(n s ) = p i (x j, n s ) The test statistic is the log-likelihood ratio 2 ln Λ = 2 ln L(ˆn s ) L(n s = 0) Ignore the trivial sign flip; it s still the usual definition Segev BenZvi (UR) PHY 403 19 / 26

IceCube Signal and Background PDFs S i (x j ) and B i (x j ) depend on the energy and sky position of the i th neutrino: S i = 1 2πσ 2 i e r 2 i /2σ 2 i p(e i α), B i = B zen p atm (E i ) The index α of the source spectrum E α is a nuisance parameter Segev BenZvi (UR) PHY 403 20 / 26

IceCube Skymap The all-sky search calculates the likelihood ratio at each position on the sky. (For this analysis, only data from the Northern Hemisphere were used.) The goal is to look for hotspots, or areas of the sky where the signal PDFs from many ν candidates appear to produce a significant excess in ln Λ In this particular map, the maximum value of ln Λ = 13.4, which corresponds to a 4.8σ excess above background Segev BenZvi (UR) PHY 403 21 / 26

Correction for Look-Elsewhere Effects There is a big look-elsewhere effect in the significance because the analysis included a scan for hotspots over the full sky Correction: simulate 10 4 background-only skymaps and count the number with ln Λ max > 13.4. Result: p = 1.3%, or 2.2σ Segev BenZvi (UR) PHY 403 22 / 26

Major Improvement: Contained Event Search Define the outer shell of the detector to be an atmospheric µ veto layer Effective detection volume reduced, but atmospheric ν s strongly suppressed above E ν = 100 TeV [3] Segev BenZvi (UR) PHY 403 23 / 26

Skymap of Astrophysical Neutrino Sources Skymap of astrophysical ν arrival directions shows some hotspots For now, the value of 2 ln Λ is consistent with random clustering [3] Segev BenZvi (UR) PHY 403 24 / 26

Summary Wilks Theorem: if H 0 is a subset of H 1, the log-likelihood ratio 2 ln Λ(t) = 2 ln L(t H 0) L(t H 1 ) is distributed like a χ 2 with the number of degrees of freedom equal to the difference in the dimensionality between H 0 and H 1 The conditions under which Wilks Theorem hold may not apply to your data. In this case, just produce Monte Carlo to determine the distribution of 2 ln Λ Consider a Bayesian analysis, especially if you want to incorporate prior information Lesson from IceCube: analysis techniques are nice for background suppression, but nothing beats a good experimental design that eliminates sources of background from the start Segev BenZvi (UR) PHY 403 25 / 26

References I [1] S. S. Wilks. The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses. In: Ann. Math. Statist. 9.1 (Mar. 1938), pp. 60 62. [2] Julia K. Becker. High-energy neutrinos in the context of multimessenger physics. In: Phys.Rept. 458 (2008), pp. 173 246. arxiv: 0710.1557 [astro-ph]. [3] M.G. Aartsen et al. Evidence for High-Energy Extraterrestrial Neutrinos at the IceCube Detector. In: Science 342 (2013), p. 1242856. arxiv: 1311.5238 [astro-ph.he]. Segev BenZvi (UR) PHY 403 26 / 26