E. Santovetti lesson 4 Maximum likelihood Interval estimation


Extended Maximum Likelihood
Sometimes the total number of events n in the experiment is not fixed but is itself a Poisson random variable with mean ν. The extended likelihood function is then

L(\nu, \theta) = \frac{\nu^n e^{-\nu}}{n!} \prod_{i=1}^{n} f(x_i; \theta)

If ν is a function of θ, ν = ν(θ), we have

\ln L(\theta) = -\nu(\theta) + \sum_{i=1}^{n} \ln\left[\nu(\theta)\, f(x_i; \theta)\right] + \text{const}

Example: ν is the expected number of events of a certain process. Extended ML uses more information, so the errors on the parameters will be smaller than in the case where n is treated as independent of θ. If ν does not depend on θ, the ν terms factor out of the maximization and we recover the usual likelihood.

Extended ML example
Consider two types of events (e.g., signal and background), each of which predicts a given pdf for the variable x: f_s(x) and f_b(x). We observe a mixture of the two event types, with signal fraction θ, expected total number ν and observed total number n:

f(x; \theta) = \theta f_s(x) + (1 - \theta) f_b(x)

Let s = θν and b = (1 − θ)ν be the expected numbers of signal and background events that we want to estimate.

Extended ML example (2)
Take for the signal a Gaussian pdf and for the background an exponential. The extended log-likelihood is

\ln L(s, b) = -(s + b) + \sum_{i=1}^{n} \ln\left[s f_s(x_i) + b f_b(x_i)\right] + \text{const}

Maximize ln L to find the estimates of s and b. Here the errors reflect the total Poisson fluctuation as well as the fluctuation in the proportion of signal to background.
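A minimal numerical sketch of such a fit (the pdf shapes, yields and mass window below are invented for illustration; a real analysis would typically use a fitting package such as RooFit, see the later slides):

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(1)

# Toy data in a mass window [5.0, 5.6]: Gaussian signal peak on a
# falling exponential background (all numbers are illustrative).
x_sig = rng.normal(5.28, 0.03, rng.poisson(200))
x_bkg = 5.0 + rng.exponential(2.0, rng.poisson(800))
x = np.concatenate([x_sig, x_bkg])
x = x[(x > 5.0) & (x < 5.6)]

def f_s(v):
    # Signal pdf: Gaussian, essentially fully contained in the window
    return stats.norm.pdf(v, 5.28, 0.03)

def f_b(v):
    # Background pdf: exponential truncated and normalized on [5.0, 5.6]
    lam = 0.5
    return lam * np.exp(-lam * (v - 5.0)) / (1.0 - np.exp(-lam * 0.6))

def nll(params):
    s, b = params
    if s <= 0 or b <= 0:
        return np.inf
    # Extended likelihood: -lnL = (s + b) - sum_i ln[s f_s(x_i) + b f_b(x_i)]
    return (s + b) - np.sum(np.log(s * f_s(x) + b * f_b(x)))

res = optimize.minimize(nll, x0=[100.0, 500.0], method="Nelder-Mead")
print("fitted s, b =", res.x)
```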

Unphysical values for estimators
The fit can return an estimate in an unphysical region (e.g., a negative signal yield). If the estimator is unbiased, the unphysical value should nevertheless be reported, since the average of a large number of unbiased estimates converges to the true value (cf. PDG). To check this, repeat the entire MC experiment many times, allowing unphysical estimates.

Extended ML: goodness of fit
The likelihood does not provide any information on the goodness of the fit; this has to be checked separately:
- simulate toy MC according to the estimated pdf (using the fit results from data as the true parameter values) and compare the maximum likelihood value in the toys to the one in data;
- draw the data in a (binned) histogram and compare the distribution with the result of the ML fit.
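A sketch of the first check, for an assumed simple exponential-lifetime fit (model and numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.exponential(1.5, 200)            # stand-in for the real data

def max_lnL(sample):
    # For f(t; tau) = exp(-t/tau)/tau the ML estimate is the sample mean,
    # and ln L at its maximum is -n (ln tau_hat + 1)
    tau_hat = sample.mean()
    return -sample.size * (np.log(tau_hat) + 1.0)

lnL_data = max_lnL(data)
tau_fit = data.mean()
lnL_toys = np.array([max_lnL(rng.exponential(tau_fit, data.size))
                     for _ in range(1000)])

# p-value: fraction of toys with a smaller (worse) maximized ln L
print("p =", np.mean(lnL_toys <= lnL_data))
```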

Extended ML example II
Again we want to distinguish (and count) signal events with respect to background events. The signal is a B meson decaying to two daughter particles; the background is combinatorial: vertices (and hence candidate particles) reconstructed from wrong track combinations. To separate signal from background we can mainly use two observables:
1) the invariant mass of the two daughter particles, which must peak at the B meson mass;
2) the decay time of the B meson candidate, which must be of the order of the B meson lifetime.
These two variables behave in completely different ways for the two event categories. Let us look at their distributions.

Extended ML example II
A first look at the distributions (mass and time) allows us to state:
- pdf for signal mass: double Gaussian;
- pdf for signal time: exponential (negative slope);
- pdf for background mass: exponential (almost flat);
- pdf for background time: exponential + Lorentzian.
We build the total pdf from these components. By maximizing the likelihood we can estimate the number of signal and background events as well as the B meson mass and lifetime.

Extended ML example II
[Plots: mass and lifetime distributions for signal, background and all data, with the fit result overlaid.] The fit is done with the RooFit package (ROOT).

Weighted maximum likelihood
Suppose we want to measure the polarization of the J/ψ meson (J^PC = 1^--). The measurement can be done by looking at the angular distribution of the decay products of the meson itself, in the standard form

\frac{d^2 N}{d\cos\theta\, d\phi} \propto 1 + \lambda_\theta \cos^2\theta + \lambda_{\theta\phi} \sin 2\theta \cos\phi + \lambda_\phi \sin^2\theta \cos 2\phi

where θ and φ are respectively the polar and azimuthal angles of the positive muon in the J/ψ → μ⁺μ⁻ decay, measured in the J/ψ rest frame with the J/ψ direction in the lab frame chosen as the polar axis.

Weighted likelihood: polarization measurement
We have to measure the angular distribution and fit it with the function above. There are two main problems to face:
- when we select our signal there is an unavoidable amount of background events (evident from the mass distribution);
- the angular distribution of the background events is unknown and also very difficult to parametrize.
The likelihood function is

L(\lambda) = \prod_i \frac{\varepsilon(\theta_i, \phi_i)\, P(\theta_i, \phi_i; \lambda)}{\text{Norm}(\lambda)}

where ε is the total detection efficiency, P is the angular function above and Norm is a normalization function ensuring that the probability is normalized to 1.

Weighted likelihood: polarization measurement
The efficiency term does not depend on the λ parameters and is therefore a constant in the maximization procedure. In order to take the background events into account, the likelihood sum is extended to all events, but with proper weights:

\ln L_w(\lambda) = \sum_i w_i \left[\ln P(\theta_i, \phi_i; \lambda) - \ln \text{Norm}(\lambda)\right]

with weight +1 for events in the signal mass window and negative weights for events in the left and right sidebands. The background contribution cancels out if:
- the background mass distribution is linear (a hypothesis that is well satisfied here; otherwise we can always take this into account by readjusting the weights appropriately);
- the combinatorial background angular distributions are the same in the signal region and in the sidebands (which can be demonstrated by shifting the three regions 300 MeV up).
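A much-simplified sketch of the idea, keeping only the polar angle and assuming perfect efficiency so that Norm is analytic (every number below is invented):

```python
import numpy as np
from scipy import optimize

rng = np.random.default_rng(3)
lam_true, m0, W = 0.4, 3.097, 0.03   # true lambda, peak mass, window half-width

def draw_angular(lam, size):
    # Accept-reject sampling from pdf proportional to 1 + lam*c^2 on [-1, 1]
    out = np.empty(0)
    while out.size < size:
        c = rng.uniform(-1, 1, 2 * size)
        u = rng.uniform(0, 1 + max(lam, 0.0), 2 * size)
        out = np.concatenate([out, c[u < 1 + lam * c**2]])
    return out[:size]

# Toy events (mass, cos theta): signal peaks in mass, background is flat
# in mass; the background angular shape is the same at all masses.
m = np.concatenate([rng.normal(m0, 0.015, 2000),
                    rng.uniform(m0 - 3 * W, m0 + 3 * W, 3000)])
c = np.concatenate([draw_angular(lam_true, 2000), draw_angular(-0.3, 3000)])

# Signal window |m - m0| < W; sidebands W < |m - m0| < 2W have the same
# total width, so a flat/linear background contributes equally to both.
d = np.abs(m - m0)
w = np.where(d < W, 1.0, 0.0) - np.where((d > W) & (d < 2 * W), 1.0, 0.0)

def neg_wll(lam):
    # Weighted log-likelihood: sideband events enter with weight -1 and
    # cancel the background under the peak in expectation
    norm = 2 + 2 * lam / 3               # integral of 1 + lam*c^2 on [-1, 1]
    return -np.sum(w * np.log((1 + lam * c**2) / norm))

res = optimize.minimize_scalar(neg_wll, bounds=(-0.9, 5.0), method="bounded")
print("lambda_hat =", res.x)             # should come out near lam_true
```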

Weighted likelihood: polarization measurement
How do we evaluate the Norm function (which depends on the detector efficiency)? We can again use the MC simulation, generated unpolarized (P = 1), and sum the angular function over the accepted MC events:

\text{Norm}(\lambda) \propto \sum_{\text{MC events}} P(\theta_i, \phi_i; \lambda)

Weighted likelihood: polarization measurement
Then from the MC events we can compute the normalization function Norm(λ) numerically and use it in the weighted likelihood fit.

The sPlot technique

Relationship between ML and Bayesian estimators
In Bayesian statistics, both θ and x are random variables. In the Bayes approach, if θ is a certain hypothesis,

p(\theta \mid x) = \frac{L(x \mid \theta)\, \pi(\theta)}{\int L(x \mid \theta')\, \pi(\theta')\, d\theta'}

where p(θ|x) is the posterior pdf for θ (the conditional pdf for θ given x) and π(θ) is the prior probability for θ.
Purist Bayesian: p(θ|x) contains all the information about θ.
Pragmatist Bayesian: p(θ|x) can be a complicated function: summarize it by using a new estimator, e.g. the value of θ that maximizes p(θ|x).
Looking at p(θ|x): what do we use for π(θ)? There is no golden rule (it is subjective!); prior ignorance is often represented by π(θ) = constant, in which case p(θ|x) ∝ L(x|θ) and the most probable value of θ coincides with the ML estimate. But... we could have used a different parameter, e.g. λ = 1/θ, and if the prior π(θ) is constant, then π(λ) is not! Complete prior ignorance is not well defined.
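As a one-line worked example of the last point (a standard change of variables, not from the slide): if π(θ) is constant and λ = 1/θ, then

\pi_\lambda(\lambda) = \pi_\theta\big(\theta(\lambda)\big)\left|\frac{d\theta}{d\lambda}\right| = \pi_\theta(1/\lambda)\,\frac{1}{\lambda^2} \propto \frac{1}{\lambda^2}

so uniform ignorance about θ is strongly informative about λ.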

Relationship between ML and Bayesian estimators
The main concern expressed by frequentist statisticians regarding the use of Bayesian probability is its intrinsic dependence on a prior probability that can be chosen in an arbitrary way. This arbitrariness makes Bayesian probability to some extent subjective. Adding more measurements increases one's knowledge of the unknown parameter, so the posterior probability depends less on, and becomes less sensitive to, the choice of the prior. When a large number of measurements is available, the results of Bayesian calculations tend in most cases to be identical to those of frequentist calculations. However, many interesting statistical problems arise at low statistics, i.e. with a small number of measurements, and there Bayesian and frequentist methods usually lead to different results. In such cases, using the Bayesian approach, the choice of the prior probabilities plays a crucial role and has great influence on the results. One main difficulty is how to choose a PDF that models one's complete ignorance about an unknown parameter. One could naively choose a uniform ("flat") PDF in the interval of validity of the parameter. But it is clear that if we change the parametrization from x to a function of x (say log x or 1/x), the transformed parameter will no longer have a uniform prior PDF.

The Jeffreys prior
One possible approach was proposed by Harold Jeffreys: adopt a choice of prior PDF that is invariant under parameter transformations. This choice is

\pi(\theta) \propto \sqrt{\det I(\theta)}

where I(θ) is the Fisher information matrix,

I(\theta)_{ij} = -E\left[\frac{\partial^2 \ln L(x; \theta)}{\partial\theta_i\, \partial\theta_j}\right]

Examples of Jeffreys priors for some important parameters are given below.
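For reference, the standard cases (well-known results, quoted here since the slide's table was not transcribed):

\pi(\mu) \propto 1/\sqrt{\mu} \quad \text{(Poisson mean } \mu\text{)}
\pi(\mu) \propto \text{const} \quad \text{(Gaussian mean, known } \sigma\text{)}
\pi(\sigma) \propto 1/\sigma \quad \text{(Gaussian } \sigma\text{, known mean)}
\pi(\varepsilon) \propto 1/\sqrt{\varepsilon(1-\varepsilon)} \quad \text{(binomial efficiency } \varepsilon\text{)}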

Interval estimation, setting limits

Interval estimation: introduction
In addition to a point estimate of a parameter we should report an interval reflecting its statistical uncertainty. Desirable properties of such an interval may include that it:
- communicate objectively the result of the experiment;
- have a given probability of containing the true parameter;
- provide the information needed to draw conclusions about the parameter, possibly incorporating stated prior beliefs.
Often one uses +/- the estimated standard deviation of the estimator. In some cases, however, this is not adequate: e.g. an estimate near a physical boundary, such as an observed event rate consistent with zero. We will look briefly at frequentist and Bayesian intervals.

Neyman confidence intervals
This is the rigorous procedure to obtain confidence intervals in the frequentist approach. Consider an estimator \hat{\theta} for a parameter θ, with measured value \hat{\theta}_{obs}, and suppose we know its pdf g(\hat{\theta}; \theta). Specify upper and lower tail probabilities, e.g. α = 0.05, β = 0.05, then find functions u_α(θ) and v_β(θ) such that

P(\hat{\theta} \ge u_\alpha(\theta)) = \alpha = \int_{u_\alpha(\theta)}^{\infty} g(\hat{\theta}; \theta)\, d\hat{\theta}, \qquad P(\hat{\theta} \le v_\beta(\theta)) = \beta = \int_{-\infty}^{v_\beta(\theta)} g(\hat{\theta}; \theta)\, d\hat{\theta}

(the integrals run over the possible estimator values). This gives a band with probability content CL = 1 − α − β for the estimator as a function of the true parameter value θ. Note that this is an interval for the estimator, and there is no unique way to define it at the same CL.

Confidence interval from the confidence belt
The confidence belt is the region between u_α(θ) and v_β(θ), viewed as a function of the parameter. Find the points where the observed estimate \hat{\theta}_{obs} intersects the confidence belt: this gives the confidence interval [a, b] for the true parameter. The confidence level 1 − α − β is the probability for the interval to cover the true value of the parameter, and this holds for any possible true θ.
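A grid-based sketch of the construction for a Poisson mean (illustrative; the cross-check uses the exact chi-square formulas that reappear a few slides below):

```python
import numpy as np
from scipy import stats

alpha = beta = 0.05                  # 90% central interval
n_obs = 5

accepted = []
for mu in np.linspace(1e-6, 20, 4000):
    p_lo = stats.poisson.cdf(n_obs, mu)       # P(N <= n_obs; mu)
    p_hi = stats.poisson.sf(n_obs - 1, mu)    # P(N >= n_obs; mu)
    # mu is accepted if n_obs lies in neither rejection tail
    if p_lo > alpha and p_hi > beta:
        accepted.append(mu)

print("90% central interval:", min(accepted), max(accepted))
# Exact (Garwood) endpoints from the chi-square relation, as a cross-check:
print(0.5 * stats.chi2.ppf(beta, 2 * n_obs),
      0.5 * stats.chi2.ppf(1 - alpha, 2 * (n_obs + 1)))
```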

Confidence intervals by inverting a test
Confidence intervals for a parameter θ can be found by defining a test of each hypothesized value θ (do this for all θ):
- define values of the data that are disfavored by θ (the critical region) such that P(data in critical region) ≤ γ for a specified γ, e.g. 0.05 or 0.1;
- if the data are observed in the critical region, reject the value θ.
Now invert the test to define a confidence interval as the set of θ values that would not be rejected in a test of size γ (the confidence level is 1 − γ). The interval will cover the true value of θ with probability ≥ 1 − γ. This is equivalent to the confidence belt construction: the confidence belt is the acceptance region of the test.

Relation between confidence interval and p-value
Equivalently, we can consider a significance test for each hypothesized value of θ, resulting in a p-value, p_θ. The confidence interval at CL = 1 − γ consists of those values of θ that are not rejected, i.e. those with p_θ > γ. E.g. an upper limit on θ is the greatest value for which p_θ ≥ γ; in practice one finds it by setting p_θ = γ and solving for θ.

Confidence intervals in practice
In practice, to find the interval [a, b] we replace u_α(θ) and v_β(θ) with the observed value \hat{\theta}_{obs} and solve for the parameter:

P(\hat{\theta} \ge \hat{\theta}_{obs}; a) = \alpha, \qquad P(\hat{\theta} \le \hat{\theta}_{obs}; b) = \beta

- a is the hypothetical value of θ such that there is a probability α to observe \hat{\theta} as high as \hat{\theta}_{obs} or higher;
- b is the hypothetical value of θ such that there is a probability β to observe \hat{\theta} as low as \hat{\theta}_{obs} or lower.

Meaning of a confidence interval
Important to keep in mind: the interval is random, while the true θ is an unknown constant. Often we report the interval as \hat{\theta}^{+d}_{-c}. This does not mean P(a ≤ θ ≤ b) = 1 − α − β for the particular interval obtained; rather: if we repeat the measurement many times and build the interval according to the same prescription each time, then in a fraction 1 − α − β of the experiments the interval will contain θ.

Central vs. one-sided confidence intervals
Having fixed the CL, the choice of α and β is not unique; in the literature this choice is called the ordering rule. Sometimes only α or only β is specified: a one-sided interval (limit). Often α = β = γ/2, with coverage probability 1 − γ: a central confidence interval. N.B.: a central confidence interval does not mean an interval symmetric around the estimate. In HEP the convention to quote the error is α = β = γ/2 with 1 − γ = 68.3% (1σ).

Intervals from the likelihood function
In the large-sample limit it can be shown that ML estimators follow an N-dimensional Gaussian centred on the true θ with covariance matrix V; the surfaces of constant likelihood are then hyper-ellipsoids and define confidence regions. If \hat{\theta} follows a multi-dimensional Gaussian, the quantity

2\left[\ln L(\hat{\theta}) - \ln L(\theta)\right]

follows a χ² distribution with N degrees of freedom.

Approximate confidence regions from L(θ)
So the recipe to find a confidence region with CL = 1 − γ is

\ln L(\theta) \ge \ln L_{max} - \frac{Q_\gamma}{2}, \qquad Q_\gamma = F^{-1}_{\chi^2_N}(1 - \gamma)

where Q_γ is the (1 − γ) quantile of the χ² distribution with N degrees of freedom (N = number of parameters). For finite samples these are approximate confidence regions: the coverage probability is not guaranteed to be exactly 1 − γ, and there is no simple theorem to say by how far off it will be (use MC). Remember that here the interval is random, not the parameter.
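A sketch for a one-parameter case (an exponential lifetime fit is assumed for illustration): scan ln L and find where it drops by Q_γ/2 = 1/2 below the maximum.

```python
import numpy as np
from scipy import optimize

rng = np.random.default_rng(4)
t = rng.exponential(1.5, 100)
n, tau_hat = t.size, t.mean()        # ML estimate is the sample mean

def lnL(tau):
    return -n * np.log(tau) - t.sum() / tau

target = lnL(tau_hat) - 0.5          # Q_gamma = 1 for CL = 0.683, 1 parameter

lo = optimize.brentq(lambda tau: lnL(tau) - target, 1e-3, tau_hat)
hi = optimize.brentq(lambda tau: lnL(tau) - target, tau_hat, 10 * tau_hat)
print(f"tau = {tau_hat:.3f}, 68.3% interval = [{lo:.3f}, {hi:.3f}]")
```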

Example of interval from ln L(θ)
For N = 1 parameter and CL = 1 − γ = 0.683 we have Q_γ = 1, so the interval is defined by \ln L(\theta) \ge \ln L_{max} - 1/2: the usual "1σ" likelihood interval.

Setting limits on a Poisson parameter
Consider again the case in which we have a sample of events containing signal and background (with means s and b), both Poisson variables, so that n ~ Poisson(s + b). Suppose we can say how many background events we expect, and unfortunately we observe a number of events compatible with background alone: there is clearly no evidence of signal. This means that we cannot exclude s = 0; we can nevertheless put an upper limit on the number of signal events.

Upper limit for a Poisson parameter
We have to find the hypothetical s such that there is a given small probability, say γ = 0.05, to find as few events as we observed or fewer:

\gamma = P(n \le n_{obs}; s_{up} + b) = \sum_{k=0}^{n_{obs}} \frac{(s_{up} + b)^k}{k!}\, e^{-(s_{up} + b)}

Solving numerically for s gives an upper limit at confidence level 1 − γ (usually 0.95). Suppose b = 0 and we find n = 0: then γ = e^{-s_{up}}, so s_{up} = −ln γ ≈ 3.0 at 95% CL.

Calculating Poisson parameter limits
To find the lower and upper limits we can use the relation between the Poisson cumulative distribution and the χ² distribution:

s_{lo} = \frac{1}{2} F^{-1}_{\chi^2}(\alpha;\, 2n) - b, \qquad s_{up} = \frac{1}{2} F^{-1}_{\chi^2}(1 - \gamma;\, 2(n+1)) - b

For a downward fluctuation of n this can give a negative result for s_up, i.e. the confidence interval can be empty.
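The same formula in a few lines of code (illustrative):

```python
from scipy import stats

def s_up(n, b, cl=0.95):
    # Upper limit: half the chi-square quantile with 2(n+1) dof, minus b
    return 0.5 * stats.chi2.ppf(cl, 2 * (n + 1)) - b

print(s_up(0, 0.0))           # ~2.996, the classical 95% CL limit for n = 0
print(s_up(0, 2.5, cl=0.90))  # ~ -0.20: negative, cf. the next slide
```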

Limits near a physical boundary
Suppose e.g. b = 2.5 and we observe n = 0. If we choose CL = 0.9, the formula for s_up gives s_up = ½ F^{-1}_{χ²}(0.9; 2) − 2.5 ≈ −0.20: negative!?
Physicist: we already knew s ≥ 0 before we started; we can't use a negative upper limit to report the result of an expensive experiment!
Statistician: the interval is designed to cover the true value only 90% of the time; this was clearly not one of those times.
This is not an uncommon dilemma when the limit on a parameter is close to a physical boundary.

Expected limit for s = 0
Physicist: I should have used CL = 0.95, then s_up = 0.496. Even better: for CL = 0.917923 we get s_up = 10⁻⁴! But this is absurd: we are not taking the background fluctuations into account. Reality check: with b = 2.5, the typical Poisson fluctuation in n is at least √2.5 ≈ 1.6. How can the limit be so low? Look at the mean limit for the no-signal hypothesis (s = 0): the sensitivity. With N MC (Poisson) experiments with μ = 2.5, extract n and then evaluate s_up at 95% CL each time. The distribution of 95% CL limits with b = 2.5, s = 0 has mean upper limit = 4.44.
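A sketch of this sensitivity exercise (illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
b = 2.5
n = rng.poisson(b, 100000)                          # toy outcomes for s = 0
limits = 0.5 * stats.chi2.ppf(0.95, 2 * (n + 1)) - b
print("mean 95% CL upper limit =", limits.mean())   # ~4.4
```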

The flip-flopping problem
In order to determine confidence intervals, a consistent choice of ordering rule has to be adopted. Feldman and Cousins demonstrated that the choice of ordering rule must not depend on the outcome of the measurement, otherwise the quoted confidence intervals or upper limits can be incorrect. In some cases, an experiment searching for a rare signal chooses, when quoting its result, to switch from a central interval to an upper limit depending on the outcome of the measurement. A typical choice is to quote an upper limit if the significance of the observed signal is smaller than 3σ, and a central interval otherwise. We then have to quote the error at a fixed CL, say 90%: if x ≥ 3σ we choose a symmetric interval (5% on each side), while if x < 3σ an upper limit implies a completely asymmetric interval.

The flip-flopping problem
From a single measurement of x we can decide to quote an interval with a certain CL if x > 3σ, or we can decide to quote only an upper limit if our measurement gives x < 3σ.

The flip-flopping problem
The choice to switch from a central interval to a fully asymmetric interval (upper limit) based on the observed x clearly spoils the statistical coverage. Looking at the figure: for the interval [x1, x2] obtained by crossing the confidence belt with a horizontal line, depending on the value of μ one may have cases where the coverage decreases from 90% to 85%, lower than the desired CL. To avoid flip-flopping, decide before the measurement whether you will quote a limit or a two-sided interval, and stick to it. Or use Feldman-Cousins.

The Feldman-Cousins method
The ordering rule proposed by Feldman and Cousins provides a Neyman confidence belt that smoothly changes from a central (or quasi-central) interval to an upper limit in the case of a low observed signal yield. The ordering rule is based on the likelihood ratio: given a value θ₀ of the unknown parameter under the Neyman construction, the chosen interval on the variable x is defined from the ratio of two pdfs of x,

R(x) = \frac{f(x; \theta_0)}{f(x; \theta_{best}(x))}

one under the hypothesis that θ equals the considered fixed value θ₀, the other under the hypothesis that θ equals the maximum-likelihood estimate θ_best(x) (within the physical region) corresponding to the given measurement x. Values of x are included in the acceptance region in decreasing order of R until the desired coverage is reached.

Feldman-Cousins: Gaussian case
Let us apply the Feldman-Cousins method to a Gaussian distribution (unit width) with the physical boundary μ ≥ 0. The best physical estimate given a measurement x is μ_best = max(0, x), so dividing by f(x; μ_best) we obtain

R(x) = \begin{cases} e^{-(x-\mu)^2/2} & x \ge 0 \\ e^{x\mu - \mu^2/2} & x < 0 \end{cases}

an asymmetric function with a longer tail towards negative x values. Using the Feldman-Cousins approach, for large x we recover the usual symmetric confidence interval. Going to small x (close to the boundary) the interval becomes more and more asymmetric, and at a certain point it becomes a completely asymmetric interval (an upper limit).
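A grid-based sketch of this construction (90% CL, unit-width Gaussian; illustrative, and the endpoints should approximately reproduce the published Feldman-Cousins tables):

```python
import numpy as np
from scipy import stats

x_grid = np.linspace(-4, 8, 1201)
dx = x_grid[1] - x_grid[0]
mu_grid = np.linspace(0, 6, 601)
x_obs, CL = 0.5, 0.90

belt = []                          # acceptance region [x1, x2] for each mu
for mu in mu_grid:
    f = stats.norm.pdf(x_grid, mu)
    mu_best = np.maximum(x_grid, 0.0)          # ML estimate within mu >= 0
    R = f / stats.norm.pdf(x_grid, mu_best)    # likelihood-ratio ordering
    order = np.argsort(-R)                     # include highest-R x first
    p, accepted = 0.0, np.zeros_like(x_grid, bool)
    for i in order:
        accepted[i] = True
        p += f[i] * dx
        if p >= CL:
            break
    belt.append((x_grid[accepted].min(), x_grid[accepted].max()))

# The confidence interval: all mu whose acceptance region contains x_obs
inside = [mu for mu, (x1, x2) in zip(mu_grid, belt) if x1 <= x_obs <= x2]
print(f"90% CL FC interval for mu: [{min(inside):.2f}, {max(inside):.2f}]")
```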

The Bayesian approach
In Bayesian statistics we need to start with a prior pdf π(θ), which reflects our degree of belief about θ before doing the experiment. Bayes' theorem tells us how our beliefs should be updated in light of the data x:

p(\theta \mid x) \propto L(x \mid \theta)\, \pi(\theta)

We then integrate the posterior probability up to the desired credibility level. For the Poisson case, at 95% CL, the upper limit s_up satisfies

\int_0^{s_{up}} p(s \mid n)\, ds = 0.95

Bayesian prior for a Poisson parameter
Include the knowledge that s ≥ 0 by setting the prior π(s) = 0 for s < 0. One often tries to reflect prior ignorance with a flat prior, π(s) = constant for s ≥ 0. This is not normalized, but that is OK as long as L(s) dies off for large s. It is not invariant under a change of parameter: if we had used instead a flat prior for, say, the mass of the Higgs boson, this would imply a non-flat prior for the expected number of Higgs events. It does not really reflect a reasonable degree of belief, but it is often used as a point of reference; or it can be viewed as a recipe for producing an interval whose frequentist properties can be studied (the coverage will depend on the true s).

Bayesian interval with flat prior for s
With a flat prior for s ≥ 0 the posterior is

p(s \mid n) \propto (s + b)^n\, e^{-(s + b)}, \qquad s \ge 0

and we solve \int_0^{s_{up}} p(s \mid n)\, ds = 1 - \gamma numerically to find the limit s_up. For the special case b = 0, the Bayesian upper limit with flat prior is numerically the same as the classical one (a "coincidence"). Otherwise the Bayesian limit is everywhere greater than the classical one ("conservative"). It never goes negative, and it doesn't depend on b if n = 0.
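A numerical sketch (flat prior for s ≥ 0, known b; illustrative):

```python
import numpy as np
from scipy import integrate, optimize

def bayes_up(n, b, cl=0.95):
    # Posterior with flat prior: p(s|n) ~ (s+b)^n exp(-(s+b)), s >= 0
    post = lambda s: (s + b) ** n * np.exp(-(s + b))
    norm, _ = integrate.quad(post, 0, np.inf)
    f = lambda s_up: integrate.quad(post, 0, s_up)[0] / norm - cl
    return optimize.brentq(f, 0, 50)

print(bayes_up(0, 0.0))   # ~2.996: same as the classical limit (b = 0)
print(bayes_up(0, 2.5))   # unchanged: for n = 0 the limit doesn't depend on b
print(bayes_up(3, 2.5))   # larger than the classical limit ("conservative")
```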