Calculating CLs Limits. Abstract

DØNote 4492

Calculating CLs Limits

Harrison B. Prosper
Florida State University, Tallahassee, Florida 32306
(Dated: June 8, 2004)

Abstract

This note suggests how the calculation of limits based on the CLs method might be performed more efficiently.

I. INTRODUCTION

The CLs method [1], which has been integrated into the Single Top Analysis Framework by Brigitte Vachon, was developed by LEP physicists in the context of the Higgs search. The method combines frequentist and Bayesian elements in a procedure judged pragmatic and successful by its proponents but conceptually problematic by critics, of whom I am one. However, my aim here is not to smash the china but to offer constructive suggestions about how the CLs limits calculation might be made more efficient. But let me first address a question whose answer, at first, seems obvious.

A. To bin or not to bin, that is the question

Consider data arising from two sources, signal and background, characterized by the densities $s(x)$ and $b(x)$, respectively, where $x$ is a quantity that discriminates between the two sources. For example, $x$ could be the output of a neural network. Binning data always entails a loss of information. Therefore, the obvious answer to the question is that, in principle, data should not be binned. The likelihood for unbinned data is proportional to a product of $s(x) + b(x)$, with one term per observed event. But here is the rub: the modeling of $s(x)$ and $b(x)$ must be good enough that the uncertainty in the likelihood, due to modeling uncertainties in these densities, is negligible compared with other uncertainties in the problem. However, if there are too many terms in the product, the uncertainty in the form of the likelihood could become significant. The upshot is that only a careful analysis design study can provide a rational answer to the question of whether or not to bin.

There is another reason why the answer to that question is not necessarily the obvious one: the distinction between unbinned and binned data is merely one of degree, because each experimental datum is, of necessity, represented by a finite-precision number. There is no such thing as continuous experimental data; such a thing exists only as a convenient mathematical abstraction. In the real world, unbinned data are nothing more than data that have been binned into a large number of bins of very small width. This observation provides a way to see the connection between binned and unbinned likelihoods, which I now sketch. Consider the binned likelihood

$$
L \propto \prod_{i=1}^{K} \lambda_i^{k_i} \exp(-\lambda_i) / k_i!, \tag{I.1}
$$

where $K$ is the number of bins and $k_i$ and $\lambda_i$ are the count and the mean count, respectively, in bin $i$. The mean count is given by

$$
\lambda_i = \int_{\text{bin } i} [s(z) + b(z)] \, dz. \tag{I.2}
$$

Consider now the limit of Eqs. (I.1) and (I.2) as the bin size $\Delta x$ goes to zero. In that limit the probability to get more than one count in a bin becomes negligible relative to that for $k_i = 0$ or $1$ and, therefore, only those terms survive. For the terms with $k_i = 0$ the factor $\lambda_i^{k_i}/k_i!$ collapses to unity, leaving only $\exp(-\lambda_i)$, so we are left with the $k_i = 1$ factors multiplied by the exponential of the total mean count:

$$
\begin{aligned}
L &\propto \exp\Big(-\sum_{i=1}^{K} \lambda_i\Big) \prod_{j=1}^{N} \lambda_j \\
  &\approx \exp\Big(-\sum_{i=1}^{K} [s(x_i) + b(x_i)] \, \Delta x_i\Big) \prod_{j=1}^{N} [s(x_j) + b(x_j)] \, \Delta x_j \\
  &\to \exp\Big(-\int [s(x) + b(x)] \, dx\Big) \prod_{j=1}^{N} [s(x_j) + b(x_j)],
\end{aligned}
\tag{I.3}
$$

where $N$ is the number of bins with $k_i = 1$, the index $j$ runs over those bins, and the constant factors $\Delta x_j$ have been absorbed into the proportionality. In the last step, we have taken the limit $K \to \infty$ and $\Delta x \to 0$. This heuristic argument shows that an unbinned likelihood is merely a binned likelihood with sufficiently small bins, which suggests the following strategy: always use a binned likelihood, but choose the bin size and number of bins so as to maximize the desired optimality criterion.
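The limiting argument above can be checked numerically. The following is a minimal sketch, not part of the original note: the linear densities $s(x) = 10x$ and $b(x) = 40(1 - x)$ on $[0, 1]$, the eight event positions, and all function names are assumptions chosen purely for illustration. It evaluates the binned Poisson log-likelihood of Eq. (I.1) for progressively finer bins, subtracts the constant $N \ln \Delta x$, and compares the result with the unbinned log-likelihood corresponding to the last line of Eq. (I.3).

```python
from math import lgamma, log

import numpy as np


def density(x):
    """Hypothetical s(x) + b(x) on [0, 1]: s(x) = 10 x, b(x) = 40 (1 - x)."""
    return 10.0 * x + 40.0 * (1.0 - x)


# Integral of s(x) + b(x) over [0, 1] for the densities assumed above.
TOTAL_MEAN = 25.0

# Hypothetical event positions (illustrative values, not real data).
EVENTS = np.array([0.05, 0.12, 0.31, 0.33, 0.58, 0.71, 0.86, 0.93])


def binned_loglike(events, n_bins):
    """Log of Eq. (I.1): sum_i [k_i ln(lambda_i) - lambda_i - ln(k_i!)]."""
    counts, edges = np.histogram(events, bins=n_bins, range=(0.0, 1.0))
    width = 1.0 / n_bins
    mids = 0.5 * (edges[:-1] + edges[1:])
    # Midpoint rule for Eq. (I.2); exact here because the density is linear.
    lam = density(mids) * width
    occupied = counts > 0
    log_factorials = sum(lgamma(k + 1.0) for k in counts[occupied])
    log_terms = np.sum(counts[occupied] * np.log(lam[occupied]))
    return float(log_terms - lam.sum() - log_factorials)


def unbinned_loglike(events):
    """Log of the last line of Eq. (I.3)."""
    return float(np.sum(np.log(density(events))) - TOTAL_MEAN)


def gap(events, n_bins):
    """Binned minus unbinned log-likelihood after removing N ln(dx)."""
    width = 1.0 / n_bins
    shifted = binned_loglike(events, n_bins) - len(events) * log(width)
    return shifted - unbinned_loglike(events)


for n_bins in (10, 1000, 100000):
    print(n_bins, gap(EVENTS, n_bins))
```

With coarse bins the two log-likelihoods differ, both because some bins hold more than one event and because the density varies across a bin. As the bins shrink, the gap should approach zero, which is the content of the heuristic argument.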

B. The CLs statistic

Limits calculated with the CLs method are based on the sampling distribution of the logarithm of the quantity

$$
Q = \prod_{i=1}^{M} \frac{\exp[-(s_i + b_i)] \, (s_i + b_i)^{n_i}}{\exp(-b_i) \, b_i^{n_i}}
  = \prod_{i=1}^{M} \exp(-s_i) \left( \frac{s_i + b_i}{b_i} \right)^{n_i}, \tag{I.4}
$$

under two different hypotheses: the signal (plus background) hypothesis $S$ and the background hypothesis $B$. The product is over $i = 1 \ldots M$ channels [6], where $s_i \equiv a_i \sigma$ is the mean signal count, with $a_i$ the acceptance times integrated luminosity, and $b_i = \sum_{j=1}^{N} y_{ij}$ is the mean background count, which in general is a sum over $j = 1 \ldots N$ mean yields $y_{ij}$, one for each background source $j$ in each channel $i$. The quantity $n_i$ is the count in the $i$th channel. None of the quantities $a_i$ or $y_{ij}$ is known. However, we assume that we have estimates $A_i$ and $Y_{ij}$ of them, together with a covariance matrix $\mathrm{Cov}(v)$ that characterizes the uncertainty in our knowledge of the vector of parameters

$$
v = (y_{11}, y_{12}, \ldots, y_{1N}, a_1, \ldots, y_{M1}, y_{M2}, \ldots, y_{MN}, a_M)
$$

associated with a known vector of estimates

$$
V = (Y_{11}, Y_{12}, \ldots, Y_{1N}, A_1, \ldots, Y_{M1}, Y_{M2}, \ldots, Y_{MN}, A_M).
$$

Given the sampling distribution of $q \equiv \ln Q$, $p(q \mid v, V, S)$, under the signal hypothesis, and the sampling distribution $p(q \mid v, V, B)$ under the background hypothesis, one calculates a CLs limit, at level $\beta$, by solving the following equation, a ratio of p-values (that is, tail probabilities),

$$
\beta = \frac{\sum_{q \le q_0} p(q \mid v, V, S)}{\sum_{q \le q_0} p(q \mid v, V, B)}, \tag{I.5}
$$

for the upper limit on the cross section $\sigma$, where $q_0$ is the observed value of the statistic $q$. The principal task, therefore, is to compute the sums in Eq. (I.5) quickly and accurately.

II. NUMERICAL METHOD

A fast, elegant method is described in Ref. [2] for computing CLs limits when the statistic $q$ is continuous. That method can be construed as an application of a general method described some years ago in Ref. [3]. However, the statistic $q$ in Eq. (I.5) is not continuous; it is a weighted sum of integers,

$$
q \equiv \ln Q = -s + \sum_{i=1}^{M} c_i n_i, \tag{II.1}
$$

where $s \equiv \sum_{i=1}^{M} s_i$ is the total mean signal count and

$$
c_i \equiv \ln\left(1 + \frac{s_i}{b_i}\right). \tag{II.2}
$$

Therefore, the task is to compute sums like

$$
\begin{aligned}
C(\sigma, N, H) &= \sum_{q \le q_0} p(q \mid v, V, H) \\
                &= \sum_{n_1} \cdots \sum_{n_M} \prod_{i=1}^{M} \lambda_i^{n_i} \exp(-\lambda_i) / n_i!,
\end{aligned}
\tag{II.3}
$$

where $N = (N_1, \ldots, N_M)$ are the observed counts. The sums are over the lattice of points $\{n_i\}$ that satisfy the constraint $q \le q_0$, that is,

$$
-s + \sum_{i=1}^{M} c_i n_i \le -s + \sum_{i=1}^{M} c_i N_i, \tag{II.4}
$$

or equivalently, $\sum_{i=1}^{M} c_i n_i \le t$, where $t \equiv \sum_{i=1}^{M} c_i N_i$. Basically, we must sum over all the points that lie between the origin and the plane $t = \sum_{i=1}^{M} c_i x_i$. The Poisson parameter $\lambda_i$ is either $s_i + b_i$ or $b_i$, depending on which of the two hypotheses, $H = S$ or $H = B$, is being considered. For each $n_i$ the minimum value is zero, while the maximum value is given by $[t/c_i]$, that is, the integer part of $t/c_i$, which obtains when all other counts are zero. Such constrained sums can usually be done recursively.

For efficiency, $C(\sigma, N, H)$ should be computed at an appropriate set of points in the cross section $\sigma$, and an interpolation of $C$ with respect to $\sigma$ should be constructed. Then the upper limit $\sigma_U$, at CLs level $\beta$, can be had by solving

$$
\beta = \frac{C(\sigma_U, N, S)}{C(\sigma_U, N, B)}. \tag{II.5}
$$
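The constrained sum of Eq. (II.3) and the solution of Eq. (II.5) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation used in the Single Top Analysis Framework: the function names, the small floating-point tolerance `EPS`, and the example acceptances, backgrounds, and counts are all hypothetical. For brevity it solves Eq. (II.5) by plain bisection rather than through the interpolation in $\sigma$ suggested above, and it assumes that the ratio decreases with $\sigma$ and that all $a_i$, $b_i$, and $\sigma$ are strictly positive.

```python
from math import exp, lgamma, log

# Assumed tolerance so that rounding does not exclude the observed point,
# which lies exactly on the plane sum_i c_i n_i = t.
EPS = 1e-9


def poisson(n, lam):
    """Poisson probability lam^n exp(-lam) / n! for lam > 0."""
    return exp(n * log(lam) - lam - lgamma(n + 1))


def constrained_sum(c, lam, budget, i=0):
    """Recursive sum of prod_i Poisson(n_i | lam_i) over sum_i c_i n_i <= budget."""
    if i == len(c):
        return 1.0
    total = 0.0
    n = 0
    while n * c[i] <= budget + EPS:
        # Channel i takes n counts; the remaining channels share what is left.
        total += poisson(n, lam[i]) * constrained_sum(c, lam, budget - n * c[i], i + 1)
        n += 1
    return total


def cls(sigma, a, b, observed):
    """Ratio C(sigma, N, S) / C(sigma, N, B) of Eq. (II.5), for sigma > 0."""
    s = [a_i * sigma for a_i in a]
    c = [log(1.0 + s_i / b_i) for s_i, b_i in zip(s, b)]   # Eq. (II.2)
    t = sum(c_i * n_i for c_i, n_i in zip(c, observed))    # plane of Eq. (II.4)
    lam_signal = [s_i + b_i for s_i, b_i in zip(s, b)]
    return constrained_sum(c, lam_signal, t) / constrained_sum(c, b, t)


def upper_limit(a, b, observed, beta=0.05, lo=1e-3, hi=50.0, iterations=60):
    """Bisection for sigma_U; assumes cls(lo) > beta > cls(hi)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if cls(mid, a, b, observed) > beta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical two-channel example: acceptances, backgrounds, observed counts.
print(upper_limit([1.0, 0.5], [3.0, 1.0], [4, 1]))
```

A simple check is the single-channel case with zero observed counts, where the ratio reduces to $\exp(-a\sigma)$, so the sketch should return $\sigma_U = \ln(20)/a$ for $\beta = 0.05$. With several channels the ratio can jump as lattice points cross the plane, so bisection locates a crossing rather than an exact root.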

III. SYSTEMATIC UNCERTAINTY

In the CLs method, systematic uncertainty is accounted for using the Bayesian procedure of integrating the likelihood weighted by a prior density $\pi(v)$ for the vector of parameters $v$. This is equivalent to replacing $C(\sigma, N, H)$ with

$$
\bar{C}(\sigma, N, H) = \int C(\sigma, N, H) \, \pi(v) \, dv. \tag{III.1}
$$

In the Single Top Group, the prior density is defined in terms of the known covariance matrix $\mathrm{Cov}(v)$ and the known vector of estimates $V$. It is assumed that the prior $\pi(v)$ can be adequately modeled using a multivariate Gaussian. In that case, it is straightforward to generate vectors $v$ from the prior and approximate the integral defining $\bar{C}(\sigma, N, H)$ by the sum

$$
\bar{C}(\sigma, N, H) \approx \frac{1}{T} \sum_{i=1}^{T} C_i(\sigma, N, H), \tag{III.2}
$$

in which the $i$th term is evaluated at the generated vector $v_i$ (with $T$ denoting the number of generated vectors). Note that since $t = \sum_{i} c_i N_i$ depends on $v$, through the numbers $c_i$, its value will vary over this sum.

IV. SUMMARY

Careful consideration of what needs to be calculated suggests that it ought to be possible to calculate CLs limits in a reasonable amount of time. The suggestions made here may be helpful in this regard.

So far I have tried to be constructive, but now I cannot resist the temptation to smash a few plates! So here goes. The CLs method melds frequentist and Bayesian elements in a procedure that is neither frequentist nor Bayesian. Therefore, we are warned [1], correctly, not to interpret CLs limits in a frequentist or Bayesian way. CLs limits have frequency properties that we are at liberty to study. But this is beside the point. Any ensemble of anything has frequency properties that can be studied, at least on a computer, including Bayesian limits! The point is this: the CLs confidence level $\beta$ is not defined by its frequency properties, as is the case, by definition, for a frequentist confidence level.

I suspect, however, that CLs limits, as is true for all limits however computed, are almost certainly internalized in a Bayesian way because it is quite unnatural to do otherwise. I would hazard a guess that the overwhelming majority of consumers of the statement "the single top production cross section is less than 6 pb at 95% CL" take it to mean "there is a roughly 95% chance that the single top production cross section is less than 6 pb", where 95% simply means that while one is not certain of the truth of this statement, one is sure enough of its truth to be willing to proceed as if it were.

The proponents of CLs point to the method's pragmatically useful properties. Unfortunately, if pragmatic usefulness were the only criterion of acceptability, then one is left with no compelling reason why this method is to be favored over any other pragmatically useful method, of which there are several. All things being equal, I'm inclined to favor a method that is both pragmatically useful and well-founded.

APPENDIX

Direct numerical evaluation of the sums in Eq. (II.3) is perhaps the most straightforward way to proceed. However, it is possible to represent the sums analytically, which may perhaps be useful. Since the product in Eq. (II.3) can be factorized, we can write the sums in Eq. (II.3) as

$$
C(\sigma, N, H) = \exp\Big(-\sum_{i=1}^{M} \lambda_i\Big)
  \sum_{n_1=0}^{\infty} \frac{\lambda_1^{n_1}}{n_1!} \cdots
  \sum_{n_M=0}^{\infty} \frac{\lambda_M^{n_M}}{n_M!} \,
  H\Big(t - \sum_{i=1}^{M} c_i n_i\Big), \tag{IV.1}
$$

where the constraint is imposed using the Heaviside step function [4] (not to be confused with the hypothesis label $H$), defined by $H(x) = 1$ if $x \ge 0$ and $H(x) = 0$ if $x < 0$. Using the inverse Laplace transform [5] representation of the Heaviside step function,

$$
H(x) = \frac{1}{2\pi i} \int_{\gamma - i\infty}^{\gamma + i\infty} dp \, \exp(px) \, \frac{1}{p}, \tag{IV.2}
$$

we can express Eq. (IV.1) as

$$
C(\sigma, N, H) = \exp\Big(-\sum_{i=1}^{M} \lambda_i\Big) \frac{1}{2\pi i}
  \int_{\gamma - i\infty}^{\gamma + i\infty} dp \, \exp(pt) \, \frac{1}{p}
  \left[ \sum_{n_1=0}^{\infty} \frac{\lambda_1^{n_1} \exp(-c_1 n_1 p)}{n_1!} \right] \cdots
  \left[ \sum_{n_M=0}^{\infty} \frac{\lambda_M^{n_M} \exp(-c_M n_M p)}{n_M!} \right]. \tag{IV.3}
$$

Since

$$
\sum_{n_i=0}^{\infty} \frac{\lambda_i^{n_i} \exp(-c_i n_i p)}{n_i!} = \exp\big(\lambda_i \exp(-c_i p)\big), \tag{IV.4}
$$

we can write $C$ as the inverse Laplace transform

$$
C(\sigma, N, H) = \frac{1}{2\pi i} \int_{\gamma - i\infty}^{\gamma + i\infty} dp \, e^{pt} F(p), \tag{IV.5}
$$

of the function

$$
F(p) = \exp\Big(-\sum_{i=1}^{M} \lambda_i + \sum_{i=1}^{M} \lambda_i \exp(-c_i p)\Big) \Big/ p. \tag{IV.6}
$$

REFERENCES

[1] Alex Read, Modified Frequentist Analysis of Search Results (The CLs Method), CERN Yellow Report 2000-005; http://preprints.cern.ch/cernrep/2000/2000-005/2000-005.html.
[2] H. Hu and J. Nielsen, Analytic Confidence Level Calculations Using The Likelihood Ratio and Fourier Transform, CERN Yellow Report 2000-005; http://preprints.cern.ch/cernrep/2000/2000-005/2000-005.html.
[3] Daniel Gillespie, A Theorem For Physicists In The Theory Of Random Variables, Am. J. Phys. 51, 520 (1983).
[4] Heaviside Step Function, http://mathworld.wolfram.com/HeavisideStepFunction.html.
[5] Laplace Transform, http://mathworld.wolfram.com/LaplaceTransform.html.
[6] This could also include products over bins.