How Good Are Your Fits?

How Good Are Your Fits? LHCb Collaboration: Mike Williams, Imperial College London & MIT. ATHOS 2012, June 21st, 2012. Mike Williams Imperial College ATHOS, June 2012 1 / 10

Introduction
A common situation in physics is that an unbinned maximum likelihood fit to the data has been performed to obtain estimators for any unknown parameters in a PDF ($f_0$). The question is then: how well does the fit PDF describe the data? Unfortunately, the value of the likelihood $\mathcal{L}$ does not provide any information that would help answer this question (beware of claims to the contrary!). How do we answer it? Typically we (somehow) determine the p-value. The p-value is the probability that, if $f_0$ is the true PDF ($f$), a repeat of the experiment would show worse agreement between the data and $f_0$ than what we observe in our experiment. The p-value is not the probability that $f_0 = f$. If $f_0 = f$, then the p-value distribution is uniform on (0, 1) for an ensemble of data sets sampled from $f$.
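The definition of the p-value above can be sketched with toy experiments. This is a minimal illustration, not the method used in the talk: the statistic `mean_shift_statistic`, the unit-Gaussian model standing in for $f_0$, and all sample sizes are assumptions chosen to keep the example short. Larger values of the statistic are taken to mean worse agreement.

```python
import random

def mc_p_value(t_obs, t_toys):
    """Fraction of toy experiments whose statistic is at least as large as t_obs.

    Assumes larger values of the statistic mean worse agreement with f_0.
    """
    n_worse = sum(1 for t in t_toys if t >= t_obs)
    return n_worse / len(t_toys)

def mean_shift_statistic(sample):
    # Hypothetical statistic: |sample mean|, for a model f_0 that is a
    # unit Gaussian centred on zero.
    return abs(sum(sample) / len(sample))

rng = random.Random(1)
n_events = 100

# Ensemble of toy experiments sampled from f_0.
toys = [mean_shift_statistic([rng.gauss(0.0, 1.0) for _ in range(n_events)])
        for _ in range(2000)]

data_good = [rng.gauss(0.0, 1.0) for _ in range(n_events)]  # drawn from f_0
data_bad = [rng.gauss(1.0, 1.0) for _ in range(n_events)]   # drawn from a shifted PDF

p_good = mc_p_value(mean_shift_statistic(data_good), toys)
p_bad = mc_p_value(mean_shift_statistic(data_bad), toys)
print(f"p-value (data from f_0): {p_good:.3f}")
print(f"p-value (shifted data):  {p_bad:.3f}")
```

A small `p_bad` says only that data like these would rarely arise from $f_0$; it is not the probability that $f_0$ is wrong.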

Binned Tests
If the data are dense enough, they can be binned. One would then be tempted to use the $\chi^2$ test to get a p-value... but be careful! If we started by binning the data and determined the estimators for all unknown parameters in $f_0$ by minimizing the $\chi^2$, then the distribution of the $\chi^2$ statistic, $g(\chi^2)$, is known if $f_0 = f$. Thus, we can determine the p-value from, e.g., TMath::Prob(chi2,nbins-npars). The problem here is that we started by getting the estimators by maximizing $\mathcal{L}$. This means that we don't know $g$; all we know is that it lies between the $\chi^2$ distributions with $n_b - n_p$ and $n_b$ degrees of freedom, where $n_b$ is the number of bins and $n_p$ the number of fit parameters:

$$\chi^2(n_b - n_p) \le \chi^2 \le \chi^2(n_b).$$

In other words, had we explicitly minimized $\chi^2$, its value would have been less than or equal to the one we obtain from the PDF that maximizes $\mathcal{L}$ (by definition). This is something to be aware of, and a mistake that is often made in physics analyses.
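Taking the bounds above at face value, the p-value of a binned $\chi^2$ computed after a maximum likelihood fit can only be bracketed, not pinned down. The sketch below evaluates the two limiting cases in pure Python. The helper names `chi2_sf` and `p_value_bracket` and the input numbers are illustrative assumptions; `chi2_sf` plays the role of `TMath::Prob` and uses a simple series that is adequate for illustration but not for very small tail probabilities.

```python
import math

def chi2_sf(chi2, ndf):
    """Upper-tail probability of a chi-square distribution with ndf degrees of freedom.

    Uses the power series of the regularized lower incomplete gamma function.
    """
    if chi2 <= 0.0:
        return 1.0
    a, x = 0.5 * ndf, 0.5 * chi2
    term = 1.0 / a
    total = term
    n = 0
    # Sum x^n / (a (a+1) ... (a+n)) until the terms become negligible.
    while term > 1e-16 * total:
        n += 1
        term *= x / (a + n)
        total += term
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def p_value_bracket(chi2, n_bins, n_pars):
    """p-values for the two limiting cases ndf = n_bins - n_pars and ndf = n_bins."""
    return chi2_sf(chi2, n_bins - n_pars), chi2_sf(chi2, n_bins)

# Illustrative numbers only: 25 bins, 4 fitted parameters, chi2 = 30.
p_low, p_high = p_value_bracket(chi2=30.0, n_bins=25, n_pars=4)
print(f"p-value lies between {p_low:.3f} and {p_high:.3f}")
```

If both ends of the bracket lead to the same conclusion, the ambiguity does not matter in practice; if they straddle the chosen threshold, the binned $\chi^2$ alone cannot settle the question.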

Unbinned Tests
In many analyses, binning just isn't a viable option (high dimensions and/or low stats). Contrary to popular (physics) belief, there are many methods in the statistics literature that deal with these situations. I put the available methods into 5 categories: mixed-sample methods; point-to-point dissimilarity methods; distance to nearest-neighbor methods; local-density methods; kernel-based methods. Clearly, I don't have time to discuss them all; in this (very short) talk I'll just discuss one method from the point-to-point dissimilarity category. See M. Williams, JINST 5, P09004 (2010) [arXiv:1006.3019] for a full review.

Point-to-Point Dissimilarity Methods
If the parent PDF, $f(\vec{x})$, of the data were known, then the GOF of a test PDF, $f_0(\vec{x})$, could be obtained using the statistic

$$T = \frac{1}{2}\int \left(f(\vec{x}) - f_0(\vec{x})\right)^2 d\vec{x}.$$

Since $f$ is not known, $T$ cannot be calculated. Of course, if $f$ were known there would also be no reason to perform a fit. A more general expression involves correlating the difference between $f$ and $f_0$ at different points in the multivariate space using a weighting function:

$$T = \frac{1}{2}\iint \left(f(\vec{x}) - f_0(\vec{x})\right)\left(f(\vec{x}') - f_0(\vec{x}')\right)\psi(|\vec{x} - \vec{x}'|)\, d\vec{x}\, d\vec{x}'.$$

N.b. the first expression is the case $\psi(|\vec{x} - \vec{x}'|) = \delta(\vec{x} - \vec{x}')$. This quantity can be calculated without knowing $f$ (using the data)!
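To see why the weighted form of $T$ can be evaluated without knowing $f$, it helps to expand the product. The short derivation below assumes only that $\psi$ depends on the distance $|\vec{x} - \vec{x}'|$ and is therefore symmetric in its two arguments; the shorthand $E_{a,b}[\psi]$ for the expectation of $\psi$ over one point drawn from $a$ and one from $b$ is introduced here for illustration and does not appear in the slides.

```latex
\begin{align}
T &= \frac{1}{2}\iint \big(f(\vec{x}) - f_0(\vec{x})\big)\big(f(\vec{x}') - f_0(\vec{x}')\big)\,
     \psi(|\vec{x}-\vec{x}'|)\,d\vec{x}\,d\vec{x}' \\
  &= \frac{1}{2}\iint f(\vec{x})\,f(\vec{x}')\,\psi\,d\vec{x}\,d\vec{x}'
   + \frac{1}{2}\iint f_0(\vec{x})\,f_0(\vec{x}')\,\psi\,d\vec{x}\,d\vec{x}'
   - \iint f(\vec{x})\,f_0(\vec{x}')\,\psi\,d\vec{x}\,d\vec{x}' \\
  &= \tfrac{1}{2}\,E_{f,f}[\psi] + \tfrac{1}{2}\,E_{f_0,f_0}[\psi] - E_{f,f_0}[\psi]
\end{align}
```

Each term is an expectation value of $\psi$ over pairs of points, so it can be estimated by averaging $\psi$ over pairs of events: data events stand in for samples from $f$, and Monte Carlo events generated from the fit PDF stand in for samples from $f_0$.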

Point-to-Point Dissimilarity Methods
Using the data and a MC data set sampled from $f_0$, $T$ can be written as

$$T = \frac{1}{n_d(n_d-1)}\sum_{i,\,j>i}\psi(|\vec{x}_i - \vec{x}_j|) + \frac{1}{n_{mc}(n_{mc}-1)}\sum_{i,\,j>i}\psi(|\vec{y}_i - \vec{y}_j|) - \frac{1}{n_d\, n_{mc}}\sum_{i,\,j}\psi(|\vec{x}_i - \vec{y}_j|),$$

where $\vec{x}_i$ are the $n_d$ data events and $\vec{y}_j$ are the $n_{mc}$ MC events.

Values of the weighting function found in the statistical literature:
$\psi(z) = z^2$ [C.M. Cuadras and J. Fortiana, various works (1997-2003)]
$\psi(z) = z$ [L. Baringhaus and C. Franz, J. Multivariate Anal. 88 (2004) 190-206]
$\psi(z) = 1/z$, $-\log z$ or $e^{-z^2/2\sigma^2}$ [B. Aslan and G. Zech, Stat. Comp. Simul. 75, Issue 2 (2004) 109-119]

A&Z note that for $\psi(z) = 1/z$, $T$ is the electrostatic energy of two charge distributions with opposite sign (which is minimized if $f = f_0$); hence, they called it the energy test.
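The sample estimate of $T$ translates almost directly into code. The sketch below uses the Gaussian weighting function $e^{-z^2/2\sigma^2}$ and obtains a p-value by permuting the labels of the pooled data and MC events. The permutation step, the value of `sigma`, the sample sizes and all function names are assumptions made for illustration; the slide itself does not say how the distribution of $T$ is obtained.

```python
import math
import random

def psi_gauss(z, sigma):
    # Gaussian weighting function, one of the choices listed above.
    return math.exp(-z * z / (2.0 * sigma * sigma))

def distance(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def energy_statistic(data, mc, sigma=0.5):
    """Sample estimate of T from data events and MC events."""
    n_d, n_mc = len(data), len(mc)
    dd = sum(psi_gauss(distance(data[i], data[j]), sigma)
             for i in range(n_d) for j in range(i + 1, n_d))
    mm = sum(psi_gauss(distance(mc[i], mc[j]), sigma)
             for i in range(n_mc) for j in range(i + 1, n_mc))
    dm = sum(psi_gauss(distance(x, y), sigma) for x in data for y in mc)
    return dd / (n_d * (n_d - 1)) + mm / (n_mc * (n_mc - 1)) - dm / (n_d * n_mc)

def permutation_p_value(data, mc, n_perm=100, sigma=0.5, rng=None):
    """Return (T_obs, p-value), with the null distribution from label permutations."""
    rng = rng or random.Random(0)
    t_obs = energy_statistic(data, mc, sigma)
    pooled = list(data) + list(mc)
    n_d = len(data)
    n_worse = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if energy_statistic(pooled[:n_d], pooled[n_d:], sigma) >= t_obs:
            n_worse += 1
    return t_obs, n_worse / n_perm

def gauss_2d(rng, n, shift=0.0):
    # Toy 2D events: unit Gaussian, optionally shifted along the first axis.
    return [(rng.gauss(shift, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]

rng = random.Random(7)
mc = gauss_2d(rng, 30)
t_same, p_same = permutation_p_value(gauss_2d(rng, 30), mc, rng=rng)
t_shift, p_shift = permutation_p_value(gauss_2d(rng, 30, shift=3.0), mc, rng=rng)
print(f"same parent:    T = {t_same:.4f}, p = {p_same:.2f}")
print(f"shifted parent: T = {t_shift:.4f}, p = {p_shift:.2f}")
```

The double loops cost of order $n^2$ evaluations of $\psi$ per statistic, so a realistic analysis would want a vectorized or compiled implementation; the pure-Python version is kept here for readability.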

Example: CPV in Dalitz Plots
Dalitz model with a (very!) small amount of CP violation. We can use the energy test to tell us whether these two data sets are consistent with sharing the same parent distribution (a model-independent method for observing CP violation!). M. Williams, PRD 84, 054015 (2011) [arXiv:1105.5338]

Example: CPV in Dalitz Plots
At the 95% confidence level, the no-CPV hypothesis is rejected (3 ± 2)% of the time by the $\chi^2$ test but (52 ± 5)% of the time by the energy test! The $\chi^2$ test is blind to the CPV here, but the energy test is powerful enough to detect it. This test can also be used to perform non-parametric regression: M. Williams, JINST 6, P10003 (2011) [arXiv:1107.2285].
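Rejection rates like those quoted above are typically estimated from an ensemble of simulated data sets: count how often the p-value falls below the chosen threshold and attach a binomial uncertainty. The sketch below shows that bookkeeping. The function name and the list of p-values are made up for illustration, and the ensemble size of 100 is an assumption, not something stated in the talk.

```python
import math

def rejection_rate(p_values, alpha=0.05):
    """Fraction of p-values below alpha, with its binomial standard error."""
    n = len(p_values)
    n_rejected = sum(1 for p in p_values if p < alpha)
    rate = n_rejected / n
    error = math.sqrt(rate * (1.0 - rate) / n)
    return rate, error

# Illustrative input: 100 hypothetical p-values, 52 of them below 0.05.
p_values = [0.01] * 52 + [0.50] * 48
rate, error = rejection_rate(p_values, alpha=0.05)
print(f"rejected at 95% CL: ({100 * rate:.0f} ± {100 * error:.0f})%")
```

For a test with no sensitivity to the effect, the rejection rate at the 95% confidence level should be close to 5%, which is the sense in which the $\chi^2$ result above indicates blindness.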

Etc.
Choose your test before you look at the data (or, at least, before you calculate the p-value)! You can always find a test that will reject the true PDF and another that will accept (almost any) alternative PDF. Detector resolution in this test (and in any other MC-based one) can be handled when generating the MC. Alternatively, resolution can be put into the PDF directly. A small p-value could mean that you don't understand the physics, or the detector, or both! The PDF is a product of the two, and no statistical test (using just the data and MC) can determine which is the culprit.

Summary
For binned analyses, the $\chi^2$ test provides a simple way of obtaining the GOF (but be careful, as there are a few common mistakes that can bite you). For unbinned analyses, there are many GOF tests on the market. Many are very powerful (some are not!). Some are slow; some are fast; etc. Choose the right tool for the job. It's always a good idea to validate a GOF method for your analysis in MC. Any true GOF method produces a uniform p-value distribution when the true PDF is used.
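The validation suggested above can be sketched as follows: generate many data sets from the true PDF, compute a p-value for each, and check that the p-values look uniform. This example swaps in a one-dimensional Kolmogorov-Smirnov distance against a uniform PDF as a stand-in GOF statistic, calibrated on toys; the function names, sample sizes and the alternative PDF are all assumptions chosen for brevity.

```python
import random

def ks_distance(sample):
    """Kolmogorov-Smirnov distance between a sample and the uniform(0, 1) CDF."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

def toy_p_values(n_datasets, n_events, n_reference, draw, rng):
    """p-values of data sets produced by `draw`, calibrated on uniform(0, 1) toys."""
    reference = [ks_distance([rng.random() for _ in range(n_events)])
                 for _ in range(n_reference)]
    p_values = []
    for _ in range(n_datasets):
        d_obs = ks_distance([draw(rng) for _ in range(n_events)])
        n_worse = sum(1 for d in reference if d >= d_obs)
        p_values.append(n_worse / n_reference)
    return p_values

def fraction_below(p_values, alpha):
    return sum(1 for p in p_values if p < alpha) / len(p_values)

rng = random.Random(42)
# Data sets drawn from the true PDF: p-values should be uniform on (0, 1).
p_true = toy_p_values(500, 50, 1000, lambda r: r.random(), rng)
# Data sets drawn from a different PDF (density 2x on (0, 1)): p-values pile up near 0.
p_wrong = toy_p_values(500, 50, 1000, lambda r: r.random() ** 0.5, rng)

print(f"true PDF:  mean p = {sum(p_true) / len(p_true):.3f}, "
      f"fraction below 0.1 = {fraction_below(p_true, 0.1):.3f}")
print(f"wrong PDF: fraction below 0.1 = {fraction_below(p_wrong, 0.1):.3f}")
```

For a valid method the first line should show a mean near 0.5 and a fraction near 0.1, up to statistical fluctuations; a clear departure from that pattern means the p-values cannot be trusted.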