Systematic uncertainties in statistical data analysis for particle physics. DESY Seminar Hamburg, 31 March, 2009


Systematic uncertainties in statistical data analysis for particle physics DESY Seminar Hamburg, 31 March, 2009 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan page 1

Outline
Preliminaries: role of probability in data analysis (frequentist, Bayesian)
Systematic errors and nuisance parameters
A simple fitting problem: frequentist solution / Bayesian solution
When does σ_tot² = σ_stat² + σ_sys² make sense?
Systematic uncertainties in a search: example of the search for the Higgs (ATLAS)
Examples of nuisance parameters in fits: b → sγ with the recoil method (BaBar)
Towards a general strategy for nuisance parameters
Conclusions

Data analysis in HEP Particle physics experiments are expensive, e.g. LHC: ~$10^10 (accelerator and experiments); the competition is intense (ATLAS vs. CMS, vs. the Tevatron); and the stakes are high: a 4 sigma effect is not the same as a 5 sigma effect. So there is a strong motivation to know precisely whether one's signal is a 4 sigma or a 5 sigma effect.

Frequentist vs. Bayesian approaches In frequentist statistics, probabilities are associated only with the data, i.e., outcomes of repeatable observations: probability = limiting frequency. The preferred hypotheses (theories, models, parameter values, ...) are those for which our observations would be considered "usual". In Bayesian statistics, the interpretation of probability is extended to degree of belief (subjective probability). Use Bayes' theorem to relate the (posterior) probability for hypothesis H given data x to the probability of x given H (the likelihood): P(H|x) = P(x|H) π(H) / ∫ P(x|H′) π(H′) dH′. This requires the prior probability π(H), i.e., the degree of belief before seeing the data.
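A minimal numerical illustration of the Bayes' theorem update above, for just two discrete hypotheses (call them S for signal and B for background); all probability values here are illustrative assumptions, not from the seminar.

```python
# Discrete Bayes update: posterior(H) = P(x|H) * prior(H) / sum over H'.
# The prior and likelihood numbers below are illustrative assumptions.
prior = {"S": 0.1, "B": 0.9}
likelihood = {"S": 0.8, "B": 0.2}   # P(x | H) for the observed data x

evidence = sum(likelihood[h] * prior[h] for h in prior)
posterior = {h: likelihood[h] * prior[h] / evidence for h in prior}
# the data raise the degree of belief in S from 0.1 to about 0.31
```

The denominator (the "evidence") is just the normalization that makes the posterior probabilities sum to one.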

Statistical vs. systematic errors Statistical errors: how much would the result fluctuate upon repetition of the measurement? This implies some set of assumptions that define the probability of the outcome of the measurement. Systematic errors: what is the uncertainty in my result due to uncertainty in my assumptions, e.g., model (theoretical) uncertainty, or the modelling of the measurement apparatus? These are usually taken to mean sources of error that do not vary upon repetition of the measurement; they often result from uncertain values of calibration constants, efficiencies, etc.

Systematic errors and nuisance parameters The model prediction y (including e.g. detector effects) is never the same as the "true prediction" of the theory. The model can be made to approximate the truth better by including more free parameters: systematic uncertainty ↔ nuisance parameters.

Example: fitting a straight line Data: (x_i, y_i, σ_i), i = 1, ..., n. Model: the measured y_i are independent and Gaussian distributed about the line μ(x; θ0, θ1) = θ0 + θ1 x; assume the x_i and σ_i are known. Goal: estimate θ0 (we don't care about θ1).

Frequentist approach Minimize χ²(θ0, θ1); standard deviations are found from tangent lines to the χ²_min + 1 contour. Correlation between the estimators of θ0 and θ1 causes the errors to increase.
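A minimal sketch of this frequentist straight-line fit: weighted least squares for y = θ0 + θ1 x, solved analytically via the normal equations, with the covariance matrix of the estimators as the inverse of the normal-equations matrix. The data points and σ values are illustrative assumptions.

```python
# Weighted least squares for y = theta0 + theta1 * x:
# minimize chi2 = sum((y_i - theta0 - theta1*x_i)^2 / sigma_i^2).
# Data values below are illustrative assumptions.

def fit_line(x, y, sigma):
    """Analytic weighted least-squares fit; returns estimates and covariance."""
    w = [1.0 / s**2 for s in sigma]
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sxx = sum(wi * xi**2 for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = S * Sxx - Sx**2
    theta0 = (Sxx * Sy - Sx * Sxy) / det
    theta1 = (S * Sxy - Sx * Sy) / det
    # covariance of (theta0, theta1) = inverse of the normal-equations matrix
    cov = [[Sxx / det, -Sx / det], [-Sx / det, S / det]]
    return theta0, theta1, cov

x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.9]
sigma = [0.1, 0.1, 0.1, 0.1]
t0, t1, cov = fit_line(x, y, sigma)
rho = cov[0][1] / (cov[0][0] * cov[1][1]) ** 0.5  # negative correlation inflates the error on theta0
```

The negative off-diagonal covariance element is the correlation the slide refers to: knowledge of θ1 (e.g. from an external measurement) would shrink the error on θ0.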

Frequentist case with a measurement t1 of θ1 The information on θ1 improves the accuracy of the estimate of θ0.

Bayesian method We need to associate prior probabilities with θ0 and θ1, e.g., for θ0 a broad prior that reflects prior ignorance (in any case much broader than the expected posterior), and for θ1 a prior based on the previous measurement. Putting this into Bayes' theorem gives: posterior ∝ likelihood × prior.

Bayesian method (continued) We then integrate (marginalize) p(θ0, θ1 | x) to find p(θ0 | x): p(θ0 | x) = ∫ p(θ0, θ1 | x) dθ1. In this example we can do the integral in closed form (rare). Usually numerical methods (e.g. Markov Chain Monte Carlo) are needed to do the integral.
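The marginalization integral can be sketched numerically on a grid. Here a toy correlated bivariate Gaussian stands in for p(θ0, θ1 | x) (the correlation value is an illustrative assumption); marginalizing over the correlated nuisance parameter θ1 yields a broader distribution for θ0 than the conditional slice at fixed θ1, which is exactly how the nuisance parameter inflates the uncertainty.

```python
# Grid marginalization of a toy 2D posterior (unnormalized bivariate Gaussian,
# unit sigmas, correlation rho = -0.9; all values are illustrative assumptions).
import math

def posterior(t0, t1, rho=-0.9):
    z = (t0**2 - 2 * rho * t0 * t1 + t1**2) / (1 - rho**2)
    return math.exp(-0.5 * z)

t1_grid = [i * 0.01 - 5.0 for i in range(1001)]   # grid over theta1 in [-5, 5]

def marginal(t0):
    # p(theta0 | x), up to normalization, as a Riemann sum over theta1
    return sum(posterior(t0, t1) for t1 in t1_grid) * 0.01

# marginal falls like exp(-0.5) at theta0 = 1 (unit marginal width) ...
ratio = marginal(1.0) / marginal(0.0)
# ... while the conditional at theta1 = 0 falls much faster (narrower width)
cond_ratio = posterior(1.0, 0.0) / posterior(0.0, 0.0)
```

For a genuinely high-dimensional posterior the grid sum becomes infeasible, which is why MCMC (discussed in the extra slides) takes over.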

Bayesian method with alternative priors Suppose we don't have a previous measurement of θ1 but rather, e.g., a theorist says it should be positive and not too much greater than 0.1 "or so", i.e., something like a prior that vanishes for negative values and falls off above ~0.1. From this we obtain (numerically) the posterior pdf for θ0. This summarizes all knowledge about θ0. One should also look at the result from a variety of priors.

A more general fit (symbolic) Given measurements y_i and (usually) their covariances V_ij. The predicted value is μ(x; θ) + b, with x a control variable, θ the parameters, and b a bias. Minimize χ²(θ) = (y − μ)ᵀ V⁻¹ (y − μ). This is equivalent to maximizing L(θ) ∝ e^(−χ²(θ)/2), i.e., least squares is the same as maximum likelihood using a Gaussian likelihood function.

Its Bayesian equivalent Take Gaussian likelihoods for the measurements and a Gaussian prior for the bias b. Form the joint probability for all parameters and use Bayes' theorem; to get the desired probability for θ, integrate (marginalize) over b. The posterior is Gaussian, with mode the same as the least-squares estimator and width the same as from χ² = χ²_min + 1. (Back where we started!)

Alternative priors for systematic errors A Gaussian prior for the bias b is often not realistic, especially if one considers the "error on the error". Incorporating this can give a prior π(b) with longer tails than a Gaussian: treat the systematic standard deviation σ_b itself as uncertain, with its own distribution whose relative width represents the error on the error.

A simple test Suppose the fit effectively averages four measurements. Take σ_sys = σ_stat = 0.1, uncorrelated. Case #1: the data appear compatible. Usually summarize the posterior p(μ | y) with its mode and standard deviation.

Simple test with inconsistent data Case #2: there is an outlier among the measurements. The Bayesian fit is less sensitive to the outlier. (See also D'Agostini 1999; Dose & von der Linden 1999.)

Example of systematics in a search Combination of Higgs search channels (ATLAS): Expected Performance of the ATLAS Experiment: Detector, Trigger and Physics, arXiv:0901.0512, CERN-OPEN-2008-20. Standard Model Higgs channels considered (more to be used later): H → γγ; H → WW(*) → eνμν; H → ZZ(*) → 4l (l = e, μ); H → ττ → ll, lh. Used the profile likelihood method for systematic uncertainties: background rates, signal & background shapes.

Statistical model for Higgs search Bin i of a given channel has n_i events, with expectation value E[n_i] = μ s_i + b_i. Here μ is a global strength parameter, common to all channels: μ = 0 means background only, μ = 1 is the SM hypothesis. The expected signal and background, s_i and b_i, depend on nuisance parameters: b_tot, θ_s, θ_b.

The likelihood function The single-channel likelihood function uses a Poisson model for the events in the signal and control histograms. Here the signal rate μ is the only parameter of interest; θ represents all nuisance parameters, e.g., the background rate and shapes. There is a likelihood L_i(μ, θ_i) for each channel, i = 1, ..., N. The full likelihood function is the product L(μ, θ) = ∏_i L_i(μ, θ_i).

Profile likelihood ratio To test a hypothesized value of μ, construct the profile likelihood ratio λ(μ) = L(μ, θ-hat-hat) / L(μ-hat, θ-hat), where the numerator is L maximized over the nuisance parameters for the given μ, and the denominator is the globally maximized L. Equivalently use q_μ = −2 ln λ(μ): if the data agree well with the hypothesized μ, q_μ is small; if they disagree, q_μ is large. The distribution of q_μ under the assumption of μ is related to chi-square (Wilks' theorem; here the approximation is valid for roughly L > 2 fb⁻¹).
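A hedged sketch of the profile likelihood ratio in the simplest setting with a nuisance parameter: the "on/off" counting experiment, with n ~ Poisson(μs + b) in the signal region and m ~ Poisson(τb) in a control region that constrains the background b (τ is the known control-to-signal scale factor). The numbers are illustrative assumptions; the real ATLAS combination uses many binned channels.

```python
# q0 = -2 ln [ L(mu=0, b-hat-hat) / L(mu-hat, b-hat) ] for the on/off problem,
# with the background b profiled. Illustrative inputs, not ATLAS numbers.
import math

def q0_onoff(n, m, tau):
    if n <= m / tau:                       # no excess: q0 = 0 by convention
        return 0.0
    b_hathat = (n + m) / (1.0 + tau)       # conditional ML of b under mu = 0
    # ln L up to constants: n ln(lam) - lam + m ln(tau b) - tau b;
    # at the global maximum, mu-hat s + b-hat = n and b-hat = m / tau
    lnL_max = n * math.log(n) - n + m * math.log(m) - m
    lnL_0 = n * math.log(b_hathat) + m * math.log(tau * b_hathat) - (n + m)
    return 2.0 * (lnL_max - lnL_0)

q0 = q0_onoff(n=150, m=100, tau=1.0)
Z = math.sqrt(q0)   # asymptotic significance for one parameter of interest (Wilks)
```

Note how the control measurement m enters only through the profiled background: weakening it (smaller τ) broadens the profile likelihood and reduces Z, which is the "loss of information" mechanism described in the slides.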

p-value / significance of hypothesized μ Test a hypothesized μ by giving the p-value: the probability, under the assumption of μ, to see data with equal or lesser compatibility with μ relative to the data actually observed. Equivalently use the significance Z, defined as the equivalent number of sigmas for a Gaussian fluctuation in one direction: Z = Φ⁻¹(1 − p).
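The conversion Z = Φ⁻¹(1 − p) can be sketched directly with the standard normal quantile from the Python standard library:

```python
# p-value <-> significance conversion via the standard normal distribution.
from statistics import NormalDist

def p_to_z(p):
    return NormalDist().inv_cdf(1.0 - p)   # Z = Phi^{-1}(1 - p)

def z_to_p(z):
    return 1.0 - NormalDist().cdf(z)       # one-sided tail probability

p5 = z_to_p(5.0)        # the conventional 5 sigma discovery threshold, p ~ 2.9e-7
z95 = p_to_z(0.05)      # p = 0.05 corresponds to Z ~ 1.64 (95% CL exclusion)
```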

Sensitivity Discovery: generate data under the s+b (μ = 1) hypothesis; test the hypothesis μ = 0 → p-value → Z. Exclusion: generate data under the background-only (μ = 0) hypothesis; test the hypothesis μ = 1. If μ = 1 has p-value < 0.05, exclude m_H at 95% CL. The presence of nuisance parameters leads to a broadening of the profile likelihood, reflecting the loss of information, and gives an appropriately reduced discovery significance and weaker limits.
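The discovery-sensitivity recipe can be sketched with toy Monte Carlo for a single counting experiment with known background: generate n under the s+b hypothesis, test μ = 0 with q0 = 2[n ln(n/b) − (n − b)], and take the median Z. The values of s, b and the number of toys are illustrative assumptions.

```python
# Median discovery significance from toys generated under the mu = 1 hypothesis.
# s, b and the toy count below are illustrative assumptions.
import math
import random

def z_discovery(n, b):
    # Z = sqrt(q0) with q0 = 2 [ n ln(n/b) - (n - b) ], 0 if no upward excess
    if n <= b:
        return 0.0
    return math.sqrt(2.0 * (n * math.log(n / b) - (n - b)))

def poisson(rng, mean):
    # Knuth's multiplication method; adequate for the modest means used here
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

rng = random.Random(42)
s, b = 30.0, 100.0
zs = sorted(z_discovery(poisson(rng, s + b), b) for _ in range(2000))
median_z = zs[len(zs) // 2]   # the expected (median) discovery significance
```

For these inputs the asymptotic median is sqrt(2[(s+b) ln(1 + s/b) − s]) ≈ 2.9, so this configuration is not yet at discovery sensitivity.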

Combined discovery significance Discovery signficance (in colour) vs. L, mh: Approximations used here not always accurate for L < 2 fb but in most cases conservative. page 24

Combined 95% CL exclusion limits p-value of the μ = 1 hypothesis (in colour) vs. L and m_H.

Fit example: b → sγ (BaBar) B. Aubert et al. (BaBar), Phys. Rev. D 77, 051103(R) (2008). "Recoil method": the decay of one B is fully reconstructed (Btag); look for a high-energy γ from the rest of the event (the signal B decaying to Xs γ). Signal and background yields come from fits to m_ES in bins of Eγ.

Fitting the m_ES distribution for b → sγ Fit the m_ES distribution using a superposition of Crystal Ball and Argus functions. The log-likelihood involves the rate parameters, the shape parameters, and the observed vs. predicted events in the ith bin.

Simultaneous fit of all m_ES distributions Need fits of the m_ES distributions in 14 bins of Eγ. At high Eγ there are not enough events to constrain the shape, so combine all Eγ bins into a global fit. The shape parameters could vary (smoothly) with Eγ, so make an Ansatz for the shape parameters, e.g. a polynomial dependence on Eγ. Start with no energy dependence, and include more parameters one by one until the data are well described.

Finding appropriate model flexibility Here, for the Argus shape parameter, a linear Eγ dependence gives a significant improvement; fitted coefficient of the linear term ±. χ²(1) − χ²(2) = 3.48, p-value of model (1) = 0.062, so the data want the extra parameter. D. Hopkins, PhD thesis, RHUL (2007). Inclusion of additional free parameters (e.g., a quadratic Eγ dependence for the shape parameter) does not bring significant improvement. So including the additional energy dependence for the shape parameters converts the systematic uncertainty into a statistical uncertainty on the parameters of interest.

Towards a general strategy (frequentist) In progress together with: S. Caron, S. Horner, J. Sundermann, E. Gross, O. Vitells, A. Alam. Suppose one needs to know the shape of a distribution. An initial model (e.g. from MC) is available, but known to be imperfect. Q: How can one incorporate the systematic error arising from use of the incorrect model? A: Improve the model. That is, introduce more adjustable parameters into the model so that for some point in the enlarged parameter space it is very close to the truth. Then profile the likelihood with respect to the additional (nuisance) parameters. The correlations with the nuisance parameters will inflate the errors on the parameters of interest. The difficulty is deciding how to introduce the additional parameters.

A simple example A plot compares a 0th order model, the true model ("Nature"), and the data. The naive model could have been e.g. from MC (here statistical errors are suppressed; the point is to illustrate how to incorporate systematics).

Comparing model vs. data Model the number of entries n_i in the ith bin as n_i ~ Poisson(ν_i). In the example shown, the model and data clearly don't agree well. To compare, use e.g. the Pearson chi-square statistic, χ²_P = Σ_i (n_i − ν_i)² / ν_i, which will follow a chi-square distribution for N degrees of freedom for sufficiently large n_i.
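The Pearson statistic is a one-liner; the toy bin contents below are illustrative assumptions.

```python
# Pearson chi-square comparison of binned data n_i to model expectations nu_i.
# Toy bin contents are illustrative assumptions.

def pearson_chi2(n, nu):
    return sum((ni - vi) ** 2 / vi for ni, vi in zip(n, nu))

n = [12, 25, 36, 21, 6]              # observed entries per bin
nu = [10.0, 24.0, 40.0, 18.0, 8.0]   # model expectations
chi2 = pearson_chi2(n, nu)           # compare to chi-square with N = 5 degrees of freedom
```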

Model-data comparison with likelihood ratio This is very similar to a comparison based on the likelihood ratio λ = L(ν) / L(ν̂), where L(ν) = P(n; ν) is the likelihood and the hat indicates the ML estimator (the value that maximizes the likelihood); here ν̂_i = n_i. It is easy to show that the equivalent logarithmic variable is q = −2 ln λ = 2 Σ_i [ν_i − n_i + n_i ln(n_i / ν_i)]. If the model is correct, q follows a chi-square distribution for N degrees of freedom.

p-values Using either χ²_P or q, state the level of data-model agreement by giving the p-value: the probability, under the assumption of the model, of obtaining an equal or greater incompatibility with the data relative to that found with the actual data, i.e., the integral of the chi-square distribution for N degrees of freedom from the observed value to infinity.
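For an even number of degrees of freedom N the chi-square survival function has a closed form, P(χ² ≥ x) = exp(−x/2) Σ_{k=0}^{N/2−1} (x/2)^k / k!, so the p-value can be sketched without any external statistics library (the x and N values below are illustrative):

```python
# p-value from the chi-square distribution, exact closed form for even ndof.
import math

def chi2_pvalue_even(x, ndof):
    assert ndof % 2 == 0, "closed form shown here is for even ndof"
    half = x / 2.0
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(ndof // 2))

p = chi2_pvalue_even(20.0, 10)   # e.g. chi2 = 20 observed with N = 10 dof
```

For odd N (or non-integer cases) one would instead use a regularized incomplete gamma function from a numerical library.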

Comparison with the 0th order model The 0th order model gives q = 258.8, corresponding to an extremely small p-value.

Enlarging the model Here we try to enlarge the model by multiplying the 0th order distribution by a function s(x), where s(x) is a linear superposition of Bernstein basis polynomials of order m.

Bernstein basis polynomials Using an increasingly high order for the basis polynomials gives an increasingly flexible function.
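A minimal sketch of the Bernstein basis b_{k,m}(x) = C(m,k) x^k (1−x)^(m−k) on [0, 1]: the basis functions are non-negative and sum to one (partition of unity), so a correction factor s(x) = Σ_k c_k b_{k,m}(x) stays well-behaved when the coefficients c_k stay near 1.

```python
# Bernstein basis polynomials and a linear superposition s(x) built from them.
from math import comb

def bernstein(k, m, x):
    return comb(m, k) * x**k * (1 - x)**(m - k)

def s(x, coeffs):
    m = len(coeffs) - 1
    return sum(c * bernstein(k, m, x) for k, c in enumerate(coeffs))

# partition of unity: with all coefficients equal to 1, s(x) = 1 for any x,
# i.e. the enlarged model contains the original model as a special case
val = s(0.37, [1.0, 1.0, 1.0, 1.0])
```

That last property is the key point of the slide: the 0th order model sits inside the enlarged parameter space, and the fit decides how far the coefficients need to move away from it.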

Enlarging the parameter space At each stage, compare the p-value to some threshold, e.g., 0.1 or 0.2, to decide whether to include the additional parameter. Iterate this procedure, and stop when the data do not require the addition of further parameters based on the likelihood-ratio test (the overall goodness-of-fit should also be good). Once the enlarged model has been found, simply include it in any further statistical procedures; the statistical errors from the additional parameters will account for the systematic uncertainty in the original model.

Fits using increasing numbers of parameters (Figure: fits for increasing Bernstein order, with "stop here" marking the order at which the data no longer require additional parameters.)

Deciding the appropriate level of flexibility When the p-value exceeds ~0.1 to 0.2, the fit is good enough and one stops there. The likelihood-ratio p-value says whether the data prefer an additional parameter; the goodness-of-fit p-value says whether the data are well described overall.

Issues with finding an improved model Sometimes, e.g., if the data set is very large, the total χ² can be very high (bad), even though the absolute deviation between model and data may be small. It may be that including additional parameters "spoils" the parameter of interest and/or leads to an unphysical fit result well before it succeeds in improving the overall goodness-of-fit. Possible remedies: include the new parameters in a clever (physically motivated, local) way, so that they affect only the required regions; or use a Bayesian approach, assigning priors to the new nuisance parameters that constrain them from moving too far (or use equivalent frequentist penalty terms in the likelihood). Unfortunately these solutions may not be practical and one may be forced to use ad hoc recipes (last resort).

Summary and conclusions The key to covering a systematic uncertainty is to include the appropriate nuisance parameters, constrained by all available information: enlarge the model so that for at least one point in its parameter space, its difference from the truth is negligible. In the frequentist approach one can use the profile likelihood (similar with the integrated product of likelihood and prior, not discussed today). Too many nuisance parameters spoil the information about the parameter(s) of interest. In the Bayesian approach, one needs to assign priors to (all) parameters; this can provide important flexibility over frequentist methods, but it can be difficult to encode uncertainty in priors. Exploit recent progress in Bayesian computation (MCMC). Finally, when the LHC announces a 5 sigma effect, it's important to know precisely what the "sigma" means.

Extra slides

Digression: marginalization with MCMC Bayesian computations involve integrals of often high dimensionality, impossible in closed form and also impossible with normal acceptance-rejection Monte Carlo. Markov Chain Monte Carlo (MCMC) has revolutionized Bayesian computation. MCMC (e.g., the Metropolis-Hastings algorithm) generates a correlated sequence of random numbers: one cannot use it for many applications, e.g., detector MC, and the effective statistical error is greater than the naive √n would suggest. Basic idea: sample the full multidimensional parameter space, then look, e.g., only at the distribution of the parameters of interest.

Example: posterior pdf from MCMC Sample the posterior pdf from the previous example with MCMC. Summarize the pdf of the parameter of interest with, e.g., mean, median, standard deviation, etc. Although the numerical values of the answer here are the same as in the frequentist case, the interpretation is different (sometimes unimportant?).

MCMC basics: Metropolis-Hastings algorithm Goal: given an n-dimensional pdf p(θ), generate a sequence of points θ_1, θ_2, ...
1) Start at some point θ_0.
2) Generate a proposed point θ from a proposal density q(θ; θ_0), e.g. a Gaussian centred about θ_0.
3) Form the Hastings test ratio α = min[1, p(θ) q(θ_0; θ) / (p(θ_0) q(θ; θ_0))].
4) Generate u uniform in [0, 1].
5) If u ≤ α, move to the proposed point; else stay at the old point (the old point is repeated in the sequence).
6) Iterate.

Metropolis-Hastings (continued) This rule produces a correlated sequence of points (note how each new point depends on the previous one). For our purposes this correlation is not fatal, but the statistical errors are larger than the naive √n would suggest. The proposal density can be (almost) anything, but choose it so as to minimize the autocorrelation. Often the proposal density is taken symmetric, q(θ; θ_0) = q(θ_0; θ); the test ratio then becomes (Metropolis-Hastings) α = min[1, p(θ)/p(θ_0)]. I.e., if the proposed step is to a point of higher p(θ), take it; if not, only take the step with probability p(θ)/p(θ_0). If the proposed step is rejected, hop in place.
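A minimal sketch of the symmetric-proposal (Metropolis) case: a Gaussian proposal centred on the current point, acceptance probability min[1, p(θ′)/p(θ)], rejected steps repeating the old point. The target density and step size are illustrative assumptions.

```python
# Metropolis sampler with a symmetric Gaussian proposal; works in log space
# for numerical stability. Target and tuning values are illustrative assumptions.
import math
import random

def metropolis(log_p, theta0, n_steps, step=1.0, seed=1):
    rng = random.Random(seed)
    chain, theta, lp = [], theta0, log_p(theta0)
    for _ in range(n_steps):
        prop = theta + rng.gauss(0.0, step)   # proposal centred on current point
        lp_prop = log_p(prop)
        # accept with probability min(1, p(prop)/p(theta)); else repeat old point
        if math.log(rng.random()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        chain.append(theta)
    return chain

# Sample a unit Gaussian starting far from the bulk, then discard a burn-in
chain = metropolis(lambda t: -0.5 * t * t, theta0=5.0, n_steps=20000)
post_burn = chain[2000:]
mean = sum(post_burn) / len(post_burn)
var = sum((t - mean) ** 2 for t in post_burn) / len(post_burn)
```

The deliberately bad starting point θ_0 = 5 illustrates the burn-in caveat on the next slide: the early part of the chain does not yet follow the target pdf and should be discarded.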

Metropolis-Hastings caveats Actually one can only prove that the sequence of points follows the desired pdf in the limit where it runs forever. There may be a "burn-in" period where the sequence does not initially follow p(θ). Unfortunately there are few useful theorems to tell us when the sequence has converged. Look at trace plots and the autocorrelation; check the result with a different proposal density; and if you think it's converged, try starting from a different point and see if the result is similar.