Journeys of an Accidental Statistician


Journeys of an Accidental Statistician

A partially anecdotal account of "A Unified Approach to the Classical Statistical Analysis of Small Signals," G. J. Feldman and Robert D. Cousins, Phys. Rev. D 57, 3873 (1998), and further unpublished work.

Gary Feldman

A Simple Problem

Suppose you are searching for a rare process, have a well-known expected background of 3 events, and observe 0 events. What 90% confidence limit can you set on the unknown rate of this rare process?

A classical (or frequentist) statistician makes statements about the probability of data given theory. That is, given a hypothesis for the unknown true value µ, he or she will give you the probability of obtaining a set of data x, P(x | µ).

A classical confidence interval (Jerzy Neyman, 1937) is a statement of the form: "The unknown true value of µ lies in the region [µ1, µ2]." If this statement is made at the 90% confidence level, then it will be true 90% of the time and false 10% of the time.

Poisson statistics give P(x = 0 | µ = 2.3) = 0.1. Therefore, in the standard classical approach, µ < 2.3 at 90% C.L. Since µ = s + b and b = 3.0, s < -0.7 at 90% C.L. Thus, we are led to a statement that we know is a priori false.
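The arithmetic of this slide is easy to check directly. A minimal sketch (my own illustration, not part of the talk; the helper name poisson_pmf is made up):

```python
import math

def poisson_pmf(x, mu):
    """P(x | mu) for a Poisson process with mean mu."""
    return math.exp(-mu) * mu**x / math.factorial(x)

# Standard classical 90% C.L. upper limit when x = 0 events are observed:
# find mu such that P(x = 0 | mu) = 0.10, i.e. exp(-mu) = 0.10.
mu_up = -math.log(0.10)
print(round(mu_up, 2))                   # 2.3
print(round(poisson_pmf(0, mu_up), 3))   # 0.1

# With mu = s + b and a known background b = 3.0, the limit on the signal:
b = 3.0
print(round(mu_up - b, 1))               # -0.7, the a priori false statement
```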

Bayesian Statistics

A Bayesian takes the opposite position from a classical statistician. He or she calculates the probability of theory given data. That is, given a set of data x, he or she will calculate the probability that the unknown true value is µ, P(µ | x). This appears attractive because it is what you really want to know. However, it comes at a price.

P(x | µ) and P(µ | x) are related by Bayes' Theorem, which in set theory is the statement that the probability that an element is in both A and B satisfies

    P(A | B) P(B) = P(B | A) P(A),

which for probabilities of theory and data becomes

    P(µ | x) = P(x | µ) P(µ) / P(x).

P(x) is just a normalization term, but Bayes' Theorem transforms P(µ), the prior distribution of degree of belief in µ, into P(µ | x), the posterior distribution.

A credible interval, or Bayesian confidence interval, [µ1, µ2] is formed by

    ∫ from µ1 to µ2 of P(µ | x) dµ = 90%.
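For contrast with the classical result on the previous slide, here is a hedged sketch (my own illustration, not from the talk) of the Bayesian answer to the "0 observed, b = 3 expected" problem with a flat prior on the signal s ≥ 0:

```python
import math

# Likelihood for x = 0 observed: P(x=0 | s + b) = exp(-(s + b)).
# With a flat prior on s >= 0, the posterior is proportional to exp(-s);
# the factor exp(-b) cancels in the normalization, so the known background
# drops out entirely.
b = 3.0

# 90% credible upper limit: integral_0^s_up exp(-s) ds = 0.90
#   =>  1 - exp(-s_up) = 0.90  =>  s_up = -ln(0.10)
s_up = -math.log(1.0 - 0.90)
print(round(s_up, 2))   # 2.3, independent of b for a flat prior
```

Note that this gives a sensible positive limit where the naive classical subtraction gave s < -0.7, which is part of the appeal of the Bayesian approach described here.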

An Example

Suppose you have a large number of marbles, which are either white or black, and you wish information on the fraction that are white, µ. You draw a single marble, and it is white. What can you say at 90% confidence?

    Classical:                  µ ≥ 0.1
    Bayesian, flat prior:       µ ≥ 0.316
    Bayesian, 1/µ prior:        µ ≥ 0.1
    Bayesian, 1/(1-µ) prior:    unnormalizable
    Bayesian, µ prior:          µ ≥ 0.464
    Bayesian, (1-µ) prior:      µ ≥ 0.196

Notice that most of the Bayesian priors do not cover, i.e., they do not yield true statements the stated fraction of the time (90% in this case). There is no requirement that credible intervals cover. However, Bob Cousins warns [Am. J. Phys. 63, 398 (1995)] that if a Bayesian method is known to yield intervals with frequentist coverage appreciably less than the stated C.L. for some possible value of the unknown parameters, "then it seems to have no chance of gaining consensus acceptance in particle physics."
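The marble numbers can be reproduced by numerically integrating each posterior. An illustrative sketch of my own (the function lower_limit is made up, not from the talk): the likelihood of drawing one white marble is P(white | µ) = µ, so the posterior is proportional to µ times the prior, and the 90% credible lower limit µ1 satisfies ∫ from µ1 to 1 of the posterior = 90%.

```python
def lower_limit(posterior, n=100001):
    """90% credible lower limit on [0, 1] for an (unnormalized) posterior,
    via trapezoidal integration and a scan down from mu = 1."""
    h = 1.0 / (n - 1)
    xs = [i * h for i in range(n)]
    ys = [posterior(x) for x in xs]
    total = (sum(ys) - 0.5 * (ys[0] + ys[-1])) * h
    acc = 0.0
    for i in range(n - 1, 0, -1):
        acc += 0.5 * (ys[i] + ys[i - 1]) * h
        if acc >= 0.90 * total:
            return xs[i - 1]

# posterior = mu * prior in each case:
print(round(lower_limit(lambda m: m), 3))            # flat prior   -> 0.316
print(round(lower_limit(lambda m: 1.0), 3))          # 1/mu prior   -> 0.1
print(round(lower_limit(lambda m: m * m), 3))        # mu prior     -> 0.464
print(round(lower_limit(lambda m: m * (1 - m)), 3))  # (1-mu) prior -> 0.196
```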

The Role of Bayesian Statistics

Prosper [Phys. Rev. D 37, 1153 (1988)] argues for a 1/µ prior based on a scaling argument. I found it unsatisfactory for two reasons: (1) it fails for x = 0 (it is unnormalizable), and (2) in general, it undercovers. I decided to continue my search for a solution to the simple problem in classical statistics.

To quote Bob's prose from our paper: "In our view, the attempt to find a non-informative prior within Bayesian inference is misguided. The real power of Bayesian inference lies in its ability to incorporate informative prior information, not ignorance."

Prosper wrote that he was using a Bayesian approach because "we are merely acknowledging the fact that a coherent solution to the small-signal problem is more easily achieved within a Bayesian framework than one which uses the methods of classical statistics." By the end of this talk, I hope to convince you that this is no longer true.

Construction of Confidence Intervals

Neyman's prescription: Before doing an experiment, for each possible value of the theory parameters, determine a region of data that occurs C.L. of the time, say 90%. After doing the experiment, find all values of the theory parameters for which your data lie in their 90% region. This is the confidence interval.

Notice that there is complete freedom in the choice of which 90% region to take. This will be the key to our solution.

Examples of Poisson Confidence Belts

[Figure: 90% C.L. upper limits for Poisson µ with background b = 3]
[Figure: 90% C.L. central limits for Poisson µ with background b = 3]

The Solution

For both the upper limits and the central limits, x = 0 excludes the whole plane. But consider the problem from the point of view of the data: if one measures no events, then clearly the most likely value of µ is zero. Why should one rule out the most likely scenario?

Therefore, we propose a new ordering principle based on the ratio of the probability for a given µ to that for the most likely value µ̂:

    R = P(x | µ) / P(x | µ̂),

where µ̂ is the physically allowed value of µ that maximizes P(x | µ̂).

Example for µ = 0.5 and b = 3:

    x    P(x | µ)    µ̂      P(x | µ̂)    R        rank
    0    0.030       0.0     0.050       0.607     6
    1    0.106       0.0     0.149       0.708     5
    2    0.185       0.0     0.224       0.826     3
    3    0.216       0.0     0.224       0.963     2
    4    0.189       1.0     0.195       0.966     1
    5    0.132       2.0     0.175       0.753     4
    6    0.077       3.0     0.161       0.480     7
    7    0.039       4.0     0.149       0.259     8
    8    0.017       5.0     0.140       0.121     9

Values of x are accepted in rank order until the summed probability exceeds 90%.
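The table for µ = 0.5, b = 3 can be reproduced with a few lines of code. A sketch under the same assumptions (my own illustration; the helper name pois is made up):

```python
import math

def pois(x, mu):
    """Poisson probability P(x | mu)."""
    return math.exp(-mu) * mu**x / math.factorial(x)

mu, b = 0.5, 3.0
rows = []
for x in range(9):
    mu_best = max(0.0, x - b)   # physically allowed best-fit signal
    r = pois(x, mu + b) / pois(x, mu_best + b)
    rows.append((x, pois(x, mu + b), mu_best, pois(x, mu_best + b), r))

# Accept x values in decreasing order of R until the summed probability
# first reaches 90% (restricting to x <= 8 is enough here).
accepted, p_sum = [], 0.0
for row in sorted(rows, key=lambda row: -row[-1]):
    accepted.append(row[0])
    p_sum += row[1]
    if p_sum >= 0.90:
        break
print(sorted(accepted))   # acceptance region for mu = 0.5: [0, 1, 2, 3, 4, 5, 6]
```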

The Poisson Limits

[Figure: 90% C.L. unified limits for Poisson µ with background b = 3]

Excerpt from Table IV of the paper (90% C.L.):

    x \ b    0.0          1.0          2.0          3.0
    0        0.00-2.44    0.00-1.61    0.00-1.26    0.00-1.08
    1        0.11-4.36    0.00-3.36    0.00-2.53    0.00-1.88
    2        0.53-5.91    0.00-4.91    0.00-3.91    0.00-3.04
    3        1.10-7.42    0.10-6.42    0.00-5.42    0.00-4.42
    4        1.47-8.60    0.74-7.60    0.00-6.60    0.00-5.60
    5        1.84-9.99    1.25-8.99    0.43-7.99    0.00-6.99
    6        2.21-11.47   1.61-10.47   1.08-9.47    0.15-8.47
    7        3.56-12.53   2.56-11.53   1.59-10.53   0.89-9.53
    8        3.96-13.99   2.96-12.99   2.14-11.99   1.51-10.99
    9        4.36-15.30   3.36-14.30   2.53-13.30   1.88-12.30
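One property of the tabulated intervals is easy to check from the excerpt alone: they cover at (at least) the stated 90% level. A sketch of my own, using only the b = 3.0 column above; the check is valid only for small µ (up to about 1), where, assuming the lower limits keep increasing with x, the untabulated rows with x > 9 cannot contain µ because their lower limits exceed the x = 9 value of 1.88:

```python
import math

def pois(x, mu):
    return math.exp(-mu) * mu**x / math.factorial(x)

b = 3.0
# 90% C.L. unified intervals for b = 3.0, copied from the table excerpt above.
intervals = [(0.00, 1.08), (0.00, 1.88), (0.00, 3.04), (0.00, 4.42),
             (0.00, 5.60), (0.00, 6.99), (0.15, 8.47), (0.89, 9.53),
             (1.51, 10.99), (1.88, 12.30)]

def coverage(mu_true):
    """P(the reported interval contains mu_true), summing over x = 0..9."""
    return sum(pois(x, mu_true + b) * (lo <= mu_true <= hi)
               for x, (lo, hi) in enumerate(intervals))

for mu in (0.0, 0.5, 1.0):
    print(mu, round(coverage(mu), 3))   # all at or above 0.90
```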

Examples of Gaussian Confidence Belts

[Figure: 90% C.L. upper limits for Gaussian µ ≥ 0 vs. x (total minus background), in units of σ]
[Figure: 90% C.L. central limits for Gaussian µ ≥ 0]

Flip-flopping

How might a typical physicist use these plots?
(1) If the result is x < 3σ, I will quote an upper limit.
(2) If the result is x > 3σ, I will quote a central confidence interval.
(3) If the result is x < 0, I will pretend I measured zero.

[Figure: the resulting belt, with 10% and 5% regions marked]

In the range 1.36 ≤ µ ≤ 4.28, there is only 85% coverage! Due to flip-flopping (deciding whether to use an upper limit or a central confidence region based on the data), these are not valid confidence intervals.
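The undercoverage is easy to demonstrate by simulation. An illustrative sketch of my own (not from the talk), using the standard 90% Gaussian values of 1.28σ for a one-sided limit and 1.64σ for a central interval:

```python
import random

random.seed(1)

def flipflop_interval(x):
    """The flip-flopping physicist's 90% C.L. recipe from the slide."""
    if x < 0:
        x = 0.0                      # pretend a negative result was zero
    if x < 3:
        return (0.0, x + 1.28)       # one-sided 90% upper limit
    return (x - 1.64, x + 1.64)      # 90% central interval

def coverage(mu, n=200000):
    """Fraction of pseudo-experiments whose interval contains the true mu."""
    hit = 0
    for _ in range(n):
        x = random.gauss(mu, 1.0)
        lo, hi = flipflop_interval(x)
        hit += (lo <= mu <= hi)
    return hit / n

print(round(coverage(3.0), 2))   # about 0.85 inside the bad region
print(round(coverage(6.0), 2))   # about 0.90 outside it
```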

Unified Solution to the Gaussian Case

Notes:
(1) This approaches the central limits for x >> 1.
(2) The upper limit for x = 0 is 1.64, the two-sided rather than the one-sided value.
(3) By the defining 1937 paper of Neyman, this is the only valid confidence belt, since there are four requirements for a valid belt:
    (a) It must cover.
    (b) For every x, there must be at least one µ.
    (c) There must be no holes (valid only for a single µ).
    (d) Every limit must include its end points.

Including Errors on the Background

Our paper assumed perfect knowledge of the backgrounds. However, backgrounds are often estimated from an ancillary experiment (sidebands, a target-out run, etc.). How does the error on the ancillary experiment affect the limits?

A Bayesian can solve this problem trivially by integrating over the background distribution. However, this is not permitted for a frequentist. He or she must provide coverage for all possible values of the unknown true background rate. [The unknown true background rate is technically known as a nuisance parameter, because one must provide coverage for it even though you really do not care what its value is.]

The exact efficient solution to this problem is computationally taxing and has never been published. This is a second reason why physicists have been driven to Bayesian statistics. However, there is a simple solution to this problem within the unified approach, and Bob and I have agreed that we will publish it this summer.

A Trick for Including Errors on Backgrounds

If one provides coverage for the worst case of the nuisance parameter, then perhaps one will have provided coverage for all possible values of the nuisance parameter. Let

    µ = unknown true value of the signal parameter,
    β = unknown true value of the background parameter,
    x = measurement of µ + β,
    b = measurement of β in the ancillary experiment.

The rank R becomes

    R(µ; x, b) = [P(x | µ + β̂_µ) P(b | β̂_µ)] / [P(x | µ̂ + β̂) P(b | β̂)],

where µ̂ and β̂ maximize the denominator (as usual), and β̂_µ maximizes the numerator for the given µ. Since R is a function only of µ and the data, one proceeds to construct the confidence belt as before.
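The numerator maximization here is a profile of the likelihood over the nuisance parameter. A sketch of the pieces for the double-Poisson case with equal signal and control regions (my own illustration, not the authors' code; the closed-form quadratic for the profiled β is my own derivation, obtained by setting the derivative of the log-likelihood to zero):

```python
import math

def pois(x, mu):
    """Poisson probability P(x | mu)."""
    return math.exp(-mu) * mu**x / math.factorial(x)

def profile_beta(mu, x, b):
    """beta maximizing P(x | mu + beta) * P(b | beta) for fixed mu.
    The stationarity condition gives (assumption, my derivation):
        2*beta^2 + (2*mu - x - b)*beta - b*mu = 0;
    take the non-negative root."""
    a1 = 2.0 * mu - x - b
    return (-a1 + math.sqrt(a1 * a1 + 8.0 * b * mu)) / 4.0

def rank_R(mu, x, b):
    """Likelihood-ratio ordering variable with the background profiled out."""
    beta_mu = profile_beta(mu, x, b)
    num = pois(x, mu + beta_mu) * pois(b, beta_mu)
    # denominator: global maximum over the physical region mu >= 0, beta >= 0
    if x >= b:
        mu_hat, beta_hat = float(x - b), float(b)
    else:
        mu_hat, beta_hat = 0.0, (x + b) / 2.0
    den = pois(x, mu_hat + beta_hat) * pois(b, beta_hat)
    return num / den

print(round(rank_R(0.5, 4, 3), 3))   # R < 1: mu = 0.5 is not the best fit
print(round(rank_R(1.0, 4, 3), 3))   # R = 1: mu = 1 is exactly the best fit
```

Constructing the full belt from this R proceeds exactly as in the known-background case; that loop is omitted here.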

Examples and a Test of the Trick

Let x be a Poisson measurement of µ + β, and let b be a Poisson measurement of β/r in an ancillary experiment (i.e., r = signal region / control region).

    r \ (x, rb)    (0, 3)       (3, 3)       (6, 3)       (9, 3)
    0.0            0.00-1.08    0.00-4.42    0.15-8.47    1.88-12.30
    0.5            0.00-1.11    0.00-4.42    0.00-8.47    1.75-12.30
    1.0            0.00-1.49    0.00-4.73    0.00-8.70    1.32-12.55
    3.0            0.00-1.57    0.00-4.85    0.00-9.36    0.00-13.03

A test of coverage for r = 1.0, 0 ≤ µ ≤ 20, 0 ≤ β ≤ 15, at 10,000 randomly picked values of µ and β:

    Median coverage:                               90.25%
    Fraction that undercovered:                     4.10%
    Median coverage for those that undercovered:   89.96%
    Worst undercoverage:                           89.46%

Comment from the Harvard statisticians: "We have never seen a statistical approximation work this well."

Back to Neutrinos

As I had hoped, once we had the solution to the simple problem, the solution to the neutrino problem was obvious, unique, and natural. (My postdoc, Fred Weber, was doing almost the correct thing instinctively.)

Probability of oscillation:

    P = sin²(2θ) sin²(1.27 Δm² L/E)

Since χ² = -2 ln(P), the ordering principle becomes

    Δχ² = χ²(data | point in plane) - χ²(data | best-fit point in plane).

The prescription: to calculate the acceptance region for each point in the plane, do a Monte Carlo simulation and determine the critical value of Δχ² that will contain 90% of the data.

Combining Experiments: The Higgs Mass

An example that may be of interest to LEP experimenters is how to use these techniques to combine different experimental measurements of the Higgs mass. Let µ be the unknown true value of the Higgs mass, of which the expected counting rate ρ_i is presumably a known function, where i is an index denoting the experiment. In each experiment i, let β_i be the unknown true rate of background, let b_i be a measurement of the background in an ancillary experiment (say, the sidebands), and let n_i be the measured rate of ρ_i + β_i. The ordering principle becomes

    R(µ; n_i, b_i) = [∏_i P(n_i | µ, β̂_µ,i) P(b_i | β̂_µ,i)] / [∏_i P(n_i | µ̂, β̂_i) P(b_i | β̂_i)],

where µ̂ and the β̂_i maximize the denominator, and the β̂_µ,i maximize the numerator for the given µ.

The Higgs Mass Example (continued)

The maximization of the numerator can be done analytically; in general, the maximization of the denominator requires a simple numerical solution.

For each value of µ, the critical value R_c(µ) is determined such that

    Σ over {n_i, b_i : R(µ; n_i, b_i) ≥ R_c(µ)} of ∏_i P(n_i | µ, β̂_µ,i) P(b_i | β̂_µ,i) = 0.9.

If the explicit sum is computationally impractical, a Monte Carlo calculation can be substituted. If the measured values of n_i and b_i are N_i and B_i, then the 90% confidence interval for µ is given by

    R(µ; N_i, B_i) ≥ R_c(µ).

The Higgs Mass Example (continued)

Using the following information from CERN-EP/98-046, "Lower bound for the Standard Model Higgs boson mass from combining the results of the four LEP experiments,"

    Experiment    Expected Background    Seen
    Aleph         0.84                   0
    Delphi        4.24                   2
    Opal          4.07                   2
    Total         9.15                   4

(not enough information was given to include L3), and the figure giving expected signal vs. Higgs mass, and assuming the backgrounds are perfectly known, I obtained an exclusion of m_H < 75 GeV at 95% C.L., but a sensitivity to the Higgs mass of only 71 GeV. (There is a 5% probability of observing 4 events when you expect 9.15.) (The CERN-EP/98-047 numbers: 77.5 GeV and 75.5 GeV.)

Brand X Comparisons: Raster Scan

We built a toy model to compare this technique with three other classical techniques that have been, or could have been, used in previous experiments.

Raster scan: For each Δm², scan along sin²(2θ) to find the minimum χ² (including the unphysical region), and take the 90% confidence interval to be χ² ≤ χ²_min + 2.71 (the two-sided limit).

This technique (and the other two to be discussed) has all the problems of possibly excluding all physically allowed values, which we originally set out to solve. However, it does cover exactly, although it is not powerful (i.e., it does not do an efficient job of rejecting false hypotheses).

Brand X Comparisons: Flip-flop Raster Scan

Flip-flop raster scan: Same as the raster scan, but use a one-sided limit (χ² ≤ χ²_min + 1.64) unless the signal is > 3σ.

This has all the features of the raster scan, but it significantly undercovers (85%) in a substantial region:

[Figure: region of undercoverage of the flip-flop raster scan]

Brand X Comparisons: Global Scan

Global scan: Find the global minimum and take χ² ≤ χ²_min + 4.61 (the two-sided 90% C.L. for a χ² with 2 degrees of freedom).

This technique is powerful, but it has significant regions of both undercoverage (76%) and overcoverage (94%). Possible reasons: (1) the sinusoidal nature of the oscillation function; (2) one-dimensional regions.

Brand X Comparisons: Scorecard

    Technique         Always gives useful information    Gives proper coverage    Is powerful
    Raster scan       no                                 yes                      no
    Flip-flop R.S.    no                                 no                       no
    Global scan       no                                 no                       yes
    This technique    yes                                yes                      yes

Sensitivity

The main objection to this work has been that an experiment that observes fewer events than the expected background may report a lower upper limit than a (better designed?) experiment that has no background. To address this problem, and to provide additional information for the reader's assessment of the significance of the results, we suggest that experiments that have fewer counts than the expected background also report their sensitivity, which we define as the average upper limit that would be obtained by an ensemble of experiments with the expected background and no true signal.

For example, in the initial NOMAD publication of 1995 data, we reported an upper limit on high-mass ν_µ → ν_τ oscillations of sin²(2θ) < 4.2 × 10⁻³ and a sensitivity of 6.0 × 10⁻³. A few days ago, at the Neutrino 98 conference, we reported preliminary results from a partial analysis of 1995-97 data, yielding an upper limit of 2.2 × 10⁻³ and a sensitivity of 1.9 × 10⁻³.
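Using the b = 3.0 column of the Table IV excerpt on the earlier slide, the sensitivity for the simple problem (b = 3 expected background, no true signal) can be estimated directly as the Poisson-weighted average of the upper limits. An illustrative sketch of my own; truncating the ensemble at x = 9 biases the average slightly low, since the untabulated upper limits for x ≥ 10 are larger:

```python
import math

def pois(x, mu):
    return math.exp(-mu) * mu**x / math.factorial(x)

# 90% C.L. unified upper limits for b = 3.0, x = 0..9, from the table excerpt.
upper = [1.08, 1.88, 3.04, 4.42, 5.60, 6.99, 8.47, 9.53, 10.99, 12.30]

# Sensitivity: average upper limit over an ensemble with no true signal,
# i.e. x ~ Poisson(b).
b = 3.0
sens = sum(pois(x, b) * upper[x] for x in range(10))
sens /= sum(pois(x, b) for x in range(10))   # renormalize the truncated tail
print(round(sens, 2))   # roughly 4.4
```

Note that this is far above the x = 0 upper limit of 1.08, which is exactly the situation in which the talk recommends quoting the sensitivity alongside the limit.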

Visit to the Harvard Statistics Department

Towards the end of this work, I decided to try it out on some professional statisticians whom I know at Harvard. They told me that this was the standard method of constructing a confidence interval! I asked them if they could point to a single reference of anyone using this method before, and they could not.

They explained that in statistical theory there is a one-to-one correspondence between a hypothesis test and a confidence interval. (The confidence interval is a hypothesis test for each value in the interval.) The Neyman-Pearson Theorem states that the likelihood ratio gives the most powerful hypothesis test. Therefore, it must be "the standard method" of constructing a confidence interval.

I decided to start reading about hypothesis testing...

Kendall and Stuart (1961)

At the start of Chapter 24 of Kendall and Stuart's The Advanced Theory of Statistics (Chapter 23 of Stuart and Ord), I found 1¼ cryptic pages that contain all of the information, published and unpublished, in this lecture. (See text.)

1998 Particle Data Group Review

The 1998 PDG Review of Particle Physics [European Physics Journal C3 (1998) 1] has supported this approach strongly:

"The Bayesian methodology, while well adapted to decision-making situations, is not in general appropriate for the objective presentation of experimental data."

"In order to correct this problem [undercoverage due to flip-flopping] it is necessary to adopt a unified procedure. The appropriate unified procedure and ordering principle are given in Ref. 11 [Feldman-Cousins]."

"If it is desired to report an upper limit in an objective way such that it has a well-defined statistical meaning in terms of limiting frequency, then report the Frequentist confidence bound(s) as given by the unified Feldman-Cousins approach."

"... we suggest that a measure of sensitivity should be reported whenever the expected background is larger than or comparable to the number of observed counts. The best such measure we know of is that proposed and tabulated in Ref. 11."

Conclusion

I have yet to find a problem in the construction of classical confidence intervals and regions that is not solvable by the ordering principle suggested here. For example, we have been using it with success in the NOMAD experiment to combine the results for different τ decay modes, including errors on the background determination.

Postscript for LEP audiences: The LEP Higgs paper (CERN-EP/98-046) states, "There is no unique, exact method to deal with this problem." The method presented here is exact by construction. I believe that this method, due to its simplicity and its reliance on classical methods (the classical construction of the confidence interval and the use of the likelihood ratio), deserves to be called unique, if any method does.