Lecture 5. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1

Similar documents
Statistical Methods in Particle Physics Lecture 1: Bayesian methods

Systematic uncertainties in statistical data analysis for particle physics. DESY Seminar Hamburg, 31 March, 2009

Topics in Statistical Data Analysis for HEP Lecture 1: Bayesian Methods CERN-JINR European School of High Energy Physics Bautzen, June 2009

Statistical Data Analysis Stat 5: More on nuisance parameters, Bayesian methods

Statistical Methods in Particle Physics Lecture 2: Limits and Discovery

Statistical Models with Uncertain Error Parameters (G. Cowan, arxiv: )

Introduction to Likelihoods

Introductory Statistics Course Part II

Statistical Methods for Discovery and Limits in HEP Experiments Day 3: Exclusion Limits

Statistical Methods for Particle Physics Lecture 3: systematic uncertainties / further topics

Statistical Methods for Particle Physics Lecture 4: discovery, exclusion limits

Advanced Statistical Methods. Lecture 6

Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis

Some Topics in Statistical Data Analysis

Statistical Methods in Particle Physics Day 4: Discovery and limits

Statistical Methods for Particle Physics Lecture 1: parameter estimation, statistical tests

E. Santovetti lesson 4 Maximum likelihood Interval estimation

Statistics for Particle Physics. Kyle Cranmer. New York University. Kyle Cranmer (NYU) CERN Academic Training, Feb 2-5, 2009

Statistics for the LHC Lecture 1: Introduction

Statistical Methods in Particle Physics

Topics in statistical data analysis for high-energy physics

Statistical Data Analysis Stat 3: p-values, parameter estimation

Statistics for the LHC Lecture 2: Discovery

Physics 403. Segev BenZvi. Credible Intervals, Confidence Intervals, and Limits. Department of Physics and Astronomy University of Rochester

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016

Statistics for Particle Physics. Kyle Cranmer. New York University. Kyle Cranmer (NYU) CERN Academic Training, Feb 2-5, 2009

Statistical Methods for Particle Physics (I)

A Bayesian Approach to Phylogenetics

Introduction to Statistical Methods for High Energy Physics

Confidence Limits and Intervals 3: Various other topics. Roger Barlow SLUO Lectures on Statistics August 2006

Statistical Methods for Particle Physics Lecture 3: Systematics, nuisance parameters

An introduction to Bayesian reasoning in particle physics

Physics 509: Propagating Systematic Uncertainties. Scott Oser Lecture #12

Detection ASTR ASTR509 Jasper Wall Fall term. William Sealey Gosset

Physics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester

Statistics and Data Analysis

Some Statistical Tools for Particle Physics

Statistical Methods for Particle Physics

Statistics Challenges in High Energy Physics Search Experiments

Lectures on Statistical Data Analysis

A Calculator for Confidence Intervals

Recent developments in statistical methods for particle physics

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn

Discovery significance with statistical uncertainty in the background estimate

Parameter Estimation. William H. Jefferys University of Texas at Austin Parameter Estimation 7/26/05 1

Modern Methods of Data Analysis - SS 2009

Lecture 2. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 1

Bayesian Regression Linear and Logistic Regression

Error analysis for efficiency

YETI IPPP Durham

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Principles of Bayesian Inference

Second Workshop, Third Summary

Statistical Methods for Particle Physics Lecture 2: statistical tests, multivariate methods

RWTH Aachen Graduiertenkolleg

CSC 2541: Bayesian Methods for Machine Learning

an introduction to bayesian inference

STA 4273H: Sta-s-cal Machine Learning

Bayesian Phylogenetics:

Physics 509: Error Propagation, and the Meaning of Error Bars. Scott Oser Lecture #10

Physics 509: Bootstrap and Robust Parameter Estimation

F denotes cumulative density. denotes probability density function; (.)

References. Markov-Chain Monte Carlo. Recall: Sampling Motivation. Problem. Recall: Sampling Methods. CSE586 Computer Vision II

Data Analysis I. Dr Martin Hendry, Dept of Physics and Astronomy University of Glasgow, UK. 10 lectures, beginning October 2006

Doing Bayesian Integrals

Confidence Intervals. First ICFA Instrumentation School/Workshop. Harrison B. Prosper Florida State University

arxiv: v1 [physics.data-an] 2 Mar 2011

Statistical Methods for Astronomy

Modern Methods of Data Analysis - WS 07/08

Lecture 7 and 8: Markov Chain Monte Carlo

CS 361: Probability & Statistics

LECTURE NOTES FYS 4550/FYS EXPERIMENTAL HIGH ENERGY PHYSICS AUTUMN 2013 PART I A. STRANDLIE GJØVIK UNIVERSITY COLLEGE AND UNIVERSITY OF OSLO

STA414/2104 Statistical Methods for Machine Learning II

Markov-Chain Monte Carlo

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida

Probability and Estimation. Alan Moses

Statistische Methoden der Datenanalyse. Kapitel 1: Fundamentale Konzepte. Professor Markus Schumacher Freiburg / Sommersemester 2009

Journeys of an Accidental Statistician

Lecture 5: Bayes pt. 1

Multivariate statistical methods and data mining in particle physics

arxiv: v1 [hep-ex] 9 Jul 2013

Primer on statistics:

Hypothesis testing. Chapter Formulating a hypothesis. 7.2 Testing if the hypothesis agrees with data

Bayesian Inference and MCMC

Brief introduction to Markov Chain Monte Carlo

Bayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007

(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis

Bayesian Networks in Educational Assessment

Markov Chain Monte Carlo

STAT 425: Introduction to Bayesian Analysis

Bayesian phylogenetics. the one true tree? Bayesian phylogenetics

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

Bayesian Inference in Astronomy & Astrophysics A Short Course

theta a framework for template-based modeling and inference

Markov chain Monte Carlo methods in atmospheric remote sensing

Bayesian Methods for Machine Learning

Stat 5101 Lecture Notes

Announcements. Inference. Mid-term. Inference by Enumeration. Reminder: Alarm Network. Introduction to Artificial Intelligence. V22.

Markov chain Monte Carlo Lecture 9

16 : Approximate Inference: Markov Chain Monte Carlo

Transcription:

Lecture 5 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics, multivariate methods, goodness-of-fit tests 3 Parameter estimation (90 min.) general concepts, maximum likelihood, variance of estimators, least squares 4 Interval estimation (60 min.) setting limits 5 Further topics (60 min.) systematic errors, MCMC... G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1

Statistical vs. systematic errors Statistical errors: How much would the result fluctuate upon repetition of the measurement? Implies some set of assumptions to define probability of outcome of the measurement. Systematic errors: What is the uncertainty in my result due to uncertainty in my assumptions, e.g., model (theoretical) uncertainty; modelling of measurement apparatus. The sources of error do not vary upon repetition of the measurement. Often result from uncertain value of, e.g., calibration constants, efficiencies, etc. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 2

Systematic errors and nuisance parameters Response of measurement apparatus is never modelled perfectly: y (measured value) x (true value) model: truth: Model can be made to approximate better the truth by including more free parameters. systematic uncertainty nuisance parameters G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 3

Nuisance parameters Suppose the outcome of the experiment is some set of data values x (here shorthand for e.g. x 1,..., x n ). We want to determine a parameter (could be a vector of parameters 1,..., n ). The probability law for the data x depends on : L(x ) (the likelihood function) E.g. maximize L to find estimator Now suppose, however, that the vector of parameters: contains some that are of interest, and others that are not of interest: Symbolically: The are called nuisance parameters. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 4

Example #1: fitting a straight line Data: Model: measured y i independent, Gaussian: assume x i and i known. Goal: estimate 0 (don t care about 1 ). G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 5

Case #1: 1 known a priori For Gaussian y i, ML same as LS Minimize 2 estimator Come up one unit from to find G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 6

Case #2: both 0 and 1 unknown Standard deviations from tangent lines to contour Correlation between to increase. causes errors G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 7

Case #3: we have a measurement t 1 of 1 The information on 1 improves accuracy of G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 8

The profile likelihood The tangent plane method is a special case of using the profile likelihood: is found by maximizing L ( 0, 1 ) for each 0. Equivalently use The interval obtained from is the same as what is obtained from the tangents to Well known in HEP as the MINOS method in MINUIT. Profile likelihood is one of several pseudo-likelihoods used in problems with nuisance parameters. See e.g. talk by Rolke at PHYSTAT05. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 9

The Bayesian approach In Bayesian statistics we can associate a probability with a hypothesis, e.g., a parameter value. Interpret probability of as degree of belief (subjective). Need to start with prior pdf ( ), this reflects degree of belief about before doing the experiment. Our experiment has data x, likelihood function L(x ). Bayes theorem tells how our beliefs should be updated in light of the data x: Posterior pdf p( x) contains all our knowledge about. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 10

Case #4: Bayesian method We need to associate prior probabilities with 0 and 1, e.g., reflects prior ignorance, in any case much broader than Putting this into Bayes theorem gives: based on previous measurement posterior Q likelihood prior G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 11

Bayesian method (continued) We then integrate (marginalize) p( 0, 1 x) to find p( 0 x): In this example we can do the integral (rare). We find Ability to marginalize over nuisance parameters is an important feature of Bayesian statistics. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 12

Digression: marginalization with MCMC Bayesian computations involve integrals like often high dimensionality and impossible in closed form, also impossible with normal acceptance-rejection Monte Carlo. Markov Chain Monte Carlo (MCMC) has revolutionized Bayesian computation. Google for MCMC, Metropolis, Bayesian computation,... MCMC generates correlated sequence of random numbers: cannot use for many applications, e.g., detector MC; effective stat. error greater than n. Basic idea: sample multidimensional look, e.g., only at distribution of parameters of interest. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 13

MCMC basics: Metropolis-Hastings algorithm Goal: given an n-dimensional pdf generate a sequence of points 1) Start at some point 2) Generate 3) Form Hastings test ratio 4) Generate Proposal density e.g. Gaussian centred about 5) If else move to proposed point old point repeated 6) Iterate G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 14

Metropolis-Hastings (continued) This rule produces a correlated sequence of points (note how each new point depends on the previous one). For our purposes this correlation is not fatal, but statistical errors larger than naive The proposal density can be (almost) anything, but choose so as to minimize autocorrelation. Often take proposal density symmetric: Test ratio is (Metropolis-Hastings): I.e. if the proposed step is to a point of higher if not, only take the step with probability If proposed step rejected, hop in place., take it; G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 15

Metropolis-Hastings caveats Actually one can only prove that the sequence of points follows the desired pdf in the limit where it runs forever. There may be a burn-in period where the sequence does not initially follow Unfortunately there are few useful theorems to tell us when the sequence has converged. Look at trace plots, autocorrelation. Check result with different proposal density. If you think it s converged, try it again with 10 times more points. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 16

Example: posterior pdf from MCMC Sample the posterior pdf from previous example with MCMC: Summarize pdf of parameter of interest with, e.g., mean, median, standard deviation, etc. Although numerical values of answer here same as in frequentist case, interpretation is different (sometimes unimportant?) G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 17

Case #5: Bayesian method with vague prior Suppose we don t have a previous measurement of 1 but rather some vague information, e.g., a theorist tells us: 1 0 (essentially certain); 1 should have order of magnitude less than 0.1 or so. Under pressure, the theorist sketches the following prior: From this we will obtain posterior probabilities for 0 (next slide). We do not need to get the theorist to commit to this prior; final result has if-then character. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 18

Sensitivity to prior Vary ( ) to explore how extreme your prior beliefs would have to be to justify various conclusions (sensitivity analysis). Try exponential with different mean values... Try different functional forms... G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 19

Example #2: Poisson data with background Count n events, e.g., in fixed time or integrated luminosity. s = expected number of signal events b = expected number of background events n ~ Poisson(s+b): Sometimes b known, other times it is in some way uncertain. Goal: measure or place limits on s, taking into consideration the uncertainty in b. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 20

Classical procedure with measured background Suppose we have a measurement of b, e.g., b meas ~ N (b, b ) So the data are really: n events and the value b meas. In principle the confidence interval recipe can be generalized to two measurements and two parameters. Difficult and not usually attempted, but see e.g. talks by K. Cranmer at PHYSTAT03, G. Punzi at PHYSTAT05. G. Punzi, PHYSTAT05 G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 21

Bayesian limits with uncertainty on b Uncertainty on b goes into the prior, e.g., Put this into Bayes theorem, Marginalize over b, then use p(s n) to find intervals for s with any desired probability content. Controversial part here is prior for signal s (s) (treatment of nuisance parameters is easy). G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 22

Cousins-Highland method Regard b as random, characterized by pdf (b). Makes sense in Bayesian approach, but in frequentist model b is constant (although unknown). A measurement b meas is random but this is not the mean number of background events, rather, b is. Compute anyway This would be the probability for n if Nature were to generate a new value of b upon repetition of the experiment with b (b). Now e.g. use this P(n;s) in the classical recipe for upper limit at CL = 1 : Result has hybrid Bayesian/frequentist character. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 23

Integrated likelihoods Consider again signal s and background b, suppose we have uncertainty in b characterized by a prior pdf b (b). Define integrated likelihood as also called modified profile likelihood, in any case not a real likelihood. Now use this to construct likelihood ratio test and invert to obtain confidence intervals. Feldman-Cousins & Cousins-Highland (FHC 2 ), see e.g. J. Conrad et al., Phys. Rev. D67 (2003) 012002 and Conrad/Tegenfeldt PHYSTAT05 talk. Calculators available (Conrad, Tegenfeldt, Barlow). G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 24

Interval from inverting profile LR test Suppose we have a measurement b meas of b. Build the likelihood ratio test with profile likelihood: and use this to construct confidence intervals. See PHYSTAT05 talks by Cranmer, Feldman, Cousins, Reid. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 25

Wrapping up lecture 5 We ve seen some main ideas about systematic errors, uncertainties in result arising from model assumptions; can be quantified by assigning corresponding uncertainties to additional (nuisance) parameters. Different ways to quantify systematics Bayesian approach in many ways most natural; marginalize over nuisance parameters; important tool: MCMC Frequentist methods rely on a hypothetical sample space for often non-repeatable phenomena G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 26

Lecture 5 extra slides G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 27

The error on the error Some systematic errors are well determined Error from finite Monte Carlo sample Some are less obvious Do analysis in n equally valid ways and extract systematic error from spread in results. Some are educated guesses Guess possible size of missing terms in perturbation series; vary renormalization scale Can we incorporate the error on the error? (cf. G. D Agostini 1999; Dose & von der Linden 1999) G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 28

Motivating a non-gaussian prior b (b) Suppose now the experiment is characterized by where s i is an (unreported) factor by which the systematic error is over/under-estimated. Assume correct error for a Gaussian b (b) would be s i i sys, so Width of s (s i ) reflects error on the error. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 29

Error-on-error function s (s) A simple unimodal probability density for 0 < s < 1 with adjustable mean and variance is the Gamma distribution: mean = b/a variance = b/a 2 Want e.g. expectation value of 1 and adjustable standard deviation s, i.e., s (s) In fact if we took s (s)» inverse Gamma, we could integrate b (b) in closed form (cf. D Agostini, Dose, von Linden). But Gamma seems more natural & numerical treatment not too painful. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 30 s

Prior for bias b (b) now has longer tails b (b) b Gaussian ( s = 0) P( b > 4 sys ) = 6.3 10-5 s = 0.5 P( b > 4 sys ) = 0.65% G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 31

A simple test Suppose fit effectively averages four measurements. Take sys = stat = 0.1, uncorrelated. Case #1: data appear compatible Posterior p( y): measurement p( y) experiment Usually summarize posterior p( y) with mode and standard deviation: G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 32

Simple test with inconsistent data Case #2: there is an outlier Posterior p( y): measurement p( y) experiment Bayesian fit less sensitive to outlier. Error now connected to goodness-of-fit. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 33

Goodness-of-fit vs. size of error In LS fit, value of minimized 2 does not affect size of error on fitted parameter. In Bayesian analysis with non-gaussian prior for systematics, a high 2 corresponds to a larger error (and vice versa). posterior 2000 repetitions of experiment, s = 0.5, here no actual bias. from least squares 2 G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 34

Is this workable for PDF fits? Straightforward to generalize to include correlations Prior on correlation coefficients ~ ( ) (Myth: = 1 is conservative ) Can separate out different systematic for same measurement Some will have small s, others larger. Remember the if-then nature of a Bayesian result: We can (should) vary priors and see what effect this has on the conclusions. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 35