An introduction to Bayesian reasoning in particle physics

An introduction to Bayesian reasoning in particle physics Graduiertenkolleg seminar, May 15th 2013

Overview
  A concrete example
  Scientific reasoning
  Probability and the Bayesian interpretation
  Parameter estimation
  Numerical method: MCMC
  Further examples
  Summary

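A concrete example: Neutrinoless double beta decay
Rare nuclear transition (2nd order weak process):
  2νββ: $(Z, A) \to (Z+2, A) + 2e^- + 2\nu$, $\Delta L = 0$ ($T_{1/2} \sim 10^{21}$ y)
  0νββ: $(Z, A) \to (Z+2, A) + 2e^-$, $\Delta L = 2$ ($T_{1/2} > 10^{25}$ y)
0νββ requires a Majorana neutrino; the process violates lepton number conservation.
[Diagram of the nuclear process (Z, A) to (Z+2, A): two W bosons, exchange of a neutrino ν_i with mixing element U_ei at each vertex, two emitted electrons]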

A concrete example: Neutrinoless double beta decay
Searching for a single peak on top of a (flat) background...

A concrete example: Neutrinoless double beta decay
Do you see a line?

Questions in data analysis: Major subjects of data analysis
Model comparison: Which model describes the data best?
  SM background only? Does 0νββ + background describe the data better?
Parameter estimation: Given a model, what are the values of its free parameters?
  What is the rate of 0νββ? What is the actual background level?
Goodness-of-fit: Given a model, is it consistent with the data?
  Does the background-only hypothesis describe the data reasonably well?

Scientific reasoning: Ingredients

Scientific reasoning: Deductive reasoning
Used when making predictions from a model.
Application in data analysis:
  Premise P (model with parameters) ⇒ Conclusion Q (observables)
  Premise Q (observables) ⇒ Conclusion R (set of observations)
  Thus: Premise P (model) ⇒ Conclusion R (set of observations)
Given a model, the outcome is specified. No need to argue, it's math!
Example: the 0νββ + background model predicts a certain energy spectrum.

Scientific reasoning: Inductive reasoning
Used when choosing a model.
Application in data analysis:
  Premise P (model with parameters) ⇒ Conclusion R (set of observations)
  Observe R, what does it say about P? Not much, since it could have been P1 ⇒ R, P2 ⇒ R, P3 ⇒ R, ...
Validity of model P?
  If we know all models, and only P results in R, then we know that P is true.
  Otherwise, we cannot verify the model.
  We can try to falsify the model: if we observe something that contradicts the model, it cannot be true.
Can we know which model is true? No!

Scientific reasoning: Truth and knowledge
Plato: knowledge is justified true belief. Proposition P is known to be true if and only if
  P is true,
  P is believed to be true,
  it is justified that P is believed to be true.
Cannot agree: we cannot know the truth, so: knowledge is justified belief.
Justification comes from experimental observations:
  Derive predictions from the model and test them.
  The more tests are passed, the greater the belief in the model.
Examples: SM of particle physics, general relativity, ...

Scientific reasoning: Application in science
How do we gain knowledge?
  Set up models and specify their parameters (check arxiv.org!).
  Derive (deductively) predictions from the models.
  We cannot know all models, so we cannot verify a model.
  Good model: falsifiable, i.e. it makes predictions which can be proven wrong.
  Use data to gain (inductively) knowledge about the models and parameters.
Examples:
  Special relativity predicts time dilation. Atmospheric muons can thus be observed on the earth's surface.
  Neutrino postulation: Pauli was hesitant to publish his neutrino idea because he thought it would be difficult to discover.
Can we quantify the knowledge about a model? Yes, use probabilities.

Probability: Axioms and interpretation
Kolmogorov axioms: start from a set S
  1. For each subset A, assign a probability P(A) between 0 and 1
  2. Probability P(S) = 1
  3. For disjoint subsets A and B: P(A or B) = P(A) + P(B)
Nice mathematical formulation, but meaningless without an interpretation!
Bayesian interpretation:
  Subsets correspond to hypotheses, i.e. a model with a particular value of the parameter. Example: SM and the electron mass.
  Probability is understood as degree-of-belief (or state-of-knowledge) for this hypothesis to be true.
  Interpretation fully consistent with the Kolmogorov axioms.

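Bayes' Theorem
  $P(\mathrm{theory} \mid \mathrm{data}) = \frac{P(\mathrm{data} \mid \mathrm{theory})\, P(\mathrm{theory})}{\sum_{\mathrm{theories}} P(\mathrm{data} \mid \mathrm{theory})\, P(\mathrm{theory})}$
Here:
  P(theory | data): posterior probability (induction)
  P(data | theory): probability of the data, likelihood (deduction)
  P(theory): prior probability
In words: "My degree-of-belief in a model is x%", or "The parameter values lie in an interval [a, b] with x% probability".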
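A minimal numerical sketch of this update for a discrete set of theories. The two hypotheses, their expected event counts, the observed counts and the flat prior are all hypothetical values chosen for illustration; they are not taken from the talk.

```python
import math

def poisson_pmf(n, mu):
    """Probability of observing n events for a Poisson mean mu."""
    return math.exp(-mu) * mu**n / math.factorial(n)

def bayes_update(priors, likelihoods):
    """Posterior P(theory | data) for a discrete set of theories."""
    evidence = sum(priors[t] * likelihoods[t] for t in priors)
    return {t: priors[t] * likelihoods[t] / evidence for t in priors}

# Hypothetical counting experiment: expected number of events per theory
expected = {"background only": 3.0, "signal + background": 8.0}
priors = {"background only": 0.5, "signal + background": 0.5}

# First (hypothetical) observation: 7 events
like_1 = {t: poisson_pmf(7, mu) for t, mu in expected.items()}
posterior_1 = bayes_update(priors, like_1)

# The posterior of the first experiment serves as prior of a second one (6 events)
like_2 = {t: poisson_pmf(6, mu) for t, mu in expected.items()}
posterior_2 = bayes_update(posterior_1, like_2)

for theory in expected:
    print(theory, round(posterior_1[theory], 3), round(posterior_2[theory], 3))
```

Feeding the first posterior back in as the prior gives the same result as analysing both observations at once, which is the sense in which the update of knowledge is sequential.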

Priors: Interpretation and criticism
Prior can come from personal degree-of-belief, theoretical considerations, auxiliary measurements, ...
Elegant update of knowledge: the posterior of one experiment can be the prior of another experiment. Natural way to combine measurements.
Subjective? Yes, but made explicit.
Objective Bayesian movement: try to find objective priors (reference priors).
Prior depends on parametrization (lifetime vs. decay constant): Jeffreys prior.
Choice of (initial) prior should not play a strong role.
Difficult to formulate a single prior for a collaboration of ~3,000 people.

Parameter estimation: Point and interval estimates (1D)
Full solution: posterior probability.
Summarize the posterior using point and interval estimates.
Common point estimators:
  Maximum posterior probability (global mode)
  Maximum of marginalized probability (local mode)
  Mean value of marginalized probability
  Median of marginalized probability
Marginalized probability for parameter $\lambda_i$:
  $p(\lambda_i \mid D) = \int \prod_{j \neq i} d\lambda_j \, p(\vec{\lambda} \mid D)$
Common interval estimates:
  Smallest (set of) interval(s) covering 68% probability
  16% - 84% quantile
  Standard deviation of marginalized posterior
  Upper (lower) limits: 99%, 95%, 90% (1%, 5%, 10%) quantiles
Choose such that the point estimator lies inside the estimated interval!
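The estimators listed above can be sketched from a set of posterior samples. Here a skewed gamma distribution stands in for a marginalized posterior; that choice, the binning and the helper name `smallest_interval` are assumptions made for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for samples of a marginalized posterior (hypothetical, skewed)
samples = rng.gamma(shape=3.0, scale=1.0, size=100_000)

# Point estimates
mean = samples.mean()
median = np.median(samples)
counts, edges = np.histogram(samples, bins=100)
k = np.argmax(counts)
mode = 0.5 * (edges[k] + edges[k + 1])   # centre of the highest bin

# Central interval: 16% - 84% quantiles
q16, q84 = np.quantile(samples, [0.16, 0.84])

def smallest_interval(x, prob=0.68):
    """Shortest single interval containing a fraction prob of the samples."""
    xs = np.sort(x)
    n_in = int(np.ceil(prob * len(xs)))
    widths = xs[n_in - 1:] - xs[:len(xs) - n_in + 1]
    i = np.argmin(widths)
    return xs[i], xs[i + n_in - 1]

lo, hi = smallest_interval(samples)

# Upper limit at 95% probability
upper95 = np.quantile(samples, 0.95)

print("mode, median, mean:", mode, median, mean)
print("central 68%:", (q16, q84), "smallest 68%:", (lo, hi))
print("95% upper limit:", upper95)
```

For a skewed posterior the mode can fall outside a central quantile interval in extreme cases, while it lies inside the smallest interval, which illustrates why the point estimator and the interval should be chosen together.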

A concrete example: Concrete model
Data: binned, number of events
Shapes:
  Background: linearly decreasing
  Signal: Gaussian at fixed position
Statistical model: independent Poisson fluctuations
  Parameter 1: background strength, Gaussian prior
  Parameter 2: signal strength, exponentially decreasing prior
Fit procedure:
  Template fit: scale signal and background shapes until the sum of templates matches the data
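A minimal sketch of the posterior for such a two-template model. The binning, the peak position and width, the prior parameters and the pseudo-data are hypothetical; the talk does not give these numbers, and the grid scan merely stands in for a proper maximization.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical binning of the energy spectrum (keV)
edges = np.linspace(2000.0, 2080.0, 41)
centers = 0.5 * (edges[:-1] + edges[1:])

# Background template: linearly decreasing, normalized to unit sum
bkg_shape = 1.5 - (centers - edges[0]) / (edges[-1] - edges[0])
bkg_shape /= bkg_shape.sum()

# Signal template: Gaussian at a fixed position (position and width assumed)
sig_shape = np.exp(-0.5 * ((centers - 2039.0) / 3.0) ** 2)
sig_shape /= sig_shape.sum()

def log_posterior(params, data, bkg_mean=200.0, bkg_sigma=20.0, sig_scale=10.0):
    """Unnormalized log posterior for (background yield, signal yield)."""
    b, s = params
    if b <= 0.0 or s < 0.0:
        return -np.inf
    mu = b * bkg_shape + s * sig_shape   # expected events per bin
    # Independent Poisson terms; the n! term is dropped (parameter independent)
    log_like = np.sum(data * np.log(mu) - mu)
    # Gaussian prior on b, exponentially decreasing prior on s
    log_prior = -0.5 * ((b - bkg_mean) / bkg_sigma) ** 2 - s / sig_scale
    return log_like + log_prior

# Pseudo-data generated from the model itself
data = rng.poisson(200.0 * bkg_shape + 15.0 * sig_shape)

# Coarse grid scan for the global mode
b_grid = np.linspace(150.0, 250.0, 51)
s_grid = np.linspace(0.0, 40.0, 41)
logp = np.array([[log_posterior((b, s), data) for s in s_grid] for b in b_grid])
ib, js = np.unravel_index(np.argmax(logp), logp.shape)
print("mode: b =", b_grid[ib], "s =", s_grid[js])
```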

Point and interval estimates (2D)

Point and interval estimates (1D)

Update of knowledge

Fitted spectrum [figure: dN/dE versus E (keV)]

Systematic uncertainties: Nuisance parameters
Model = physics model (+ parameters) combined with detector model (+ nuisance parameters)
Associate nuisance parameters with sources of systematic uncertainties, e.g.
  Collider: luminosity uncertainty (1 parameter)
  Calorimeter: jet energy resolution (typically 3 parameters)
  Reconstructed objects: reconstruction efficiency (n parameters)
  Different physics models? ... Is it justified to use a nuisance parameter? Discrete vs. continuous parameters
Choose an appropriate prior (typically Gaussian, sometimes flat).
Marginalize w.r.t. all nuisance parameters:
  Removes the nuisance parameters from the final answer
  Combines systematic and statistical uncertainties
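A small grid-based sketch of marginalizing over one nuisance parameter. It assumes a hypothetical counting experiment with a known background, a flat prior on the signal yield and a 20% Gaussian prior on a relative signal efficiency `eps`; none of these numbers come from the talk.

```python
import numpy as np

n_obs = 12    # hypothetical observed count
bkg = 4.0     # hypothetical known background expectation

s = np.linspace(0.0, 40.0, 401)      # signal yield (parameter of interest)
eps = np.linspace(0.4, 1.6, 241)     # relative signal efficiency (nuisance)
ds = s[1] - s[0]

S, E = np.meshgrid(s, eps, indexing="ij")
mu = E * S + bkg
log_like = n_obs * np.log(mu) - mu   # Poisson log-likelihood up to a constant

# Gaussian prior on the nuisance parameter, flat prior on s
prior_eps = np.exp(-0.5 * ((E - 1.0) / 0.2) ** 2)
joint = np.exp(log_like - log_like.max()) * prior_eps

# Marginalize: integrate (here: sum on a uniform grid) over the nuisance parameter
post_marg = joint.sum(axis=1)
post_marg /= post_marg.sum() * ds

# For comparison: nuisance parameter fixed at its nominal value eps = 1
log_like_fixed = n_obs * np.log(s + bkg) - (s + bkg)
post_fixed = np.exp(log_like_fixed - log_like_fixed.max())
post_fixed /= post_fixed.sum() * ds

def std_from_density(x, p, dx):
    """Standard deviation of a density p tabulated on a uniform grid x."""
    m = np.sum(x * p) * dx
    return np.sqrt(np.sum((x - m) ** 2 * p) * dx)

std_marg = std_from_density(s, post_marg, ds)
std_fixed = std_from_density(s, post_fixed, ds)
print("width with systematic:", std_marg, "statistical only:", std_fixed)
```

The marginalized posterior no longer depends on `eps`, and it comes out wider than the fixed-efficiency one, which is how the systematic and statistical uncertainties end up combined in a single distribution.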

Systematic uncertainties
Add two uncertainties:
  Systematic 1: 10% uncertainty on signal and background yield
  Systematic 2: 60% uncertainty on signal yield
Priors: Gaussian with a mean value of 0 and a width of 1 sigma

Systematic 1
Constrained by background
Medium impact on signal

Systematic 2
Not constrained by background
Large impact on signal

Computational steps: Numerical issues
Point estimate: maximization of the posterior
  Typical tool: Minuit
  Also: simulated annealing
Calculation of marginal distributions:
  Analytical solutions usually difficult
  Numerical integration methods, e.g. VEGAS
  Sampling methods: hit & miss, simple Monte Carlo, ...
  Importance sampling
  Markov Chain Monte Carlo (MCMC): revolution of Bayesian computation

Markov Chain Monte Carlo: How does MCMC work?
Output of Bayesian analyses are posterior probability densities, i.e., functions of an arbitrary number of parameters (dimensions).
Sampling high-dimensional functions is difficult.
Idea: use a random walk heading towards regions of larger values (probabilities).
Metropolis algorithm (N. Metropolis et al., J. Chem. Phys. 21 (1953) 1087):
  Start at some randomly chosen $x_i$
  Randomly generate $y$ around $x_i$
  If $f(y) > f(x_i)$, set $x_{i+1} = y$
  If $f(y) < f(x_i)$, set $x_{i+1} = y$ with probability $p = f(y)/f(x_i)$
  If $y$ is not accepted, set $x_{i+1} = x_i$
  Start over
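The steps of the Metropolis algorithm can be sketched in a few lines for a one-dimensional function. The Gaussian proposal, the step size, the fixed starting point and the test target (an unnormalized standard normal) are assumptions of this sketch, not choices stated in the talk.

```python
import numpy as np

def metropolis(f, x0, n_steps, step=1.0, rng=None):
    """Random-walk Metropolis sampling of an unnormalized density f.

    Assumes f(x0) > 0 and a symmetric Gaussian proposal of width step.
    """
    rng = np.random.default_rng() if rng is None else rng
    chain = np.empty(n_steps)
    x, fx = x0, f(x0)
    for i in range(n_steps):
        y = x + step * rng.normal()      # randomly generate y around x_i
        fy = f(y)
        # Accept if f(y) >= f(x_i), otherwise with probability f(y)/f(x_i)
        if fy >= fx or rng.random() < fy / fx:
            x, fx = y, fy
        chain[i] = x                     # if rejected, x_{i+1} = x_i
    return chain

def target(x):
    """Hypothetical target: unnormalized standard normal density."""
    return np.exp(-0.5 * x**2)

chain = metropolis(target, x0=0.0, n_steps=50_000, step=1.0,
                   rng=np.random.default_rng(42))
print("sample mean:", chain.mean(), "sample std:", chain.std())
```

Only ratios of `f` enter the acceptance step, so the normalization of the posterior is never needed; this is what makes the method practical for Bayesian analyses.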

Markov Chain Monte Carlo: MCMC for Bayesian inference
Use MCMC to sample the posterior probability, i.e. marginalized distributions.
Marginalization of posterior: fill a histogram with just one coordinate while sampling.
Error propagation: calculate any function of the parameters while sampling.
Point estimate: find the mode while sampling.
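The three uses listed above can be illustrated once posterior samples are available. In this sketch, direct draws from a hypothetical two-dimensional Gaussian posterior for (background, signal) stand in for the output of a Markov chain; the means, the covariance and the derived quantity are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

# Stand-in for chain output: samples of (background, signal) from a
# hypothetical correlated Gaussian posterior
mean = np.array([100.0, 20.0])
cov = np.array([[25.0, -10.0],
                [-10.0, 16.0]])
samples = rng.multivariate_normal(mean, cov, size=200_000)

# Marginalization: histogram just one coordinate (here the signal)
hist_s, edges_s = np.histogram(samples[:, 1], bins=60, density=True)

# Error propagation: evaluate any function of the parameters per sample
frac = samples[:, 1] / (samples[:, 0] + samples[:, 1])
f16, f50, f84 = np.quantile(frac, [0.16, 0.50, 0.84])

# Point estimate: keep the sample with the highest posterior value
d = samples - mean
log_post = -0.5 * np.einsum("ni,ij,nj->n", d, np.linalg.inv(cov), d)
best = samples[np.argmax(log_post)]

print("signal fraction: median", f50, "68% interval", (f16, f84))
print("approximate global mode:", best)
```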

Markov Chain Monte Carlo: Does it work?
Test MCMC on a function: $f(x) = x^4 \sin^2 x$
Compare the MCMC distribution to the analytic function.
Several minima/maxima are no problem.
Different orders of magnitude are no problem.
But: need to make sure that these chains converge towards the true distribution.

Markov Chain Monte Carlo: Convergence
This is where it gets difficult...
  Add a burn-in phase
  Use multiple chains
  Convergence à la Gelman & Rubin
[Figures: burn-in phase; parameter 0 value vs. iteration; parameter 1 vs. parameter 0]
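A sketch of a Gelman & Rubin style check for one parameter, using the common textbook form of the statistic; the talk does not specify which variant is used, so the formula and the toy chains below are assumptions. Values close to 1 indicate that the chains agree with each other.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for one parameter.

    chains: array of shape (m chains, n iterations), burn-in already removed.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    within = chains.var(axis=1, ddof=1).mean()        # mean within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)     # between-chain variance
    var_hat = (n - 1) / n * within + between / n      # pooled variance estimate
    return np.sqrt(var_hat / within)

rng = np.random.default_rng(0)

# Toy case 1: four chains sampling the same distribution
mixed = rng.normal(size=(4, 5000))

# Toy case 2: four chains stuck around different values
stuck = mixed + np.array([0.0, 1.0, 2.0, 3.0])[:, None]

r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
print("R (mixed):", r_mixed, "R (stuck):", r_stuck)
```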

BAT: Bayesian Analysis Toolkit
Tool for Bayesian inference written in C++
Based on the ROOT core functionality, interface to RooStats
Uses MCMC for the calculation of the posterior probability
Full control over convergence, automatic adjustment of step size
Further algorithms: interface to CUBA, Minuit; importance sampling, simulated annealing, ...
Pre-defined models: histogram fitter, template fitter, tool for combination of measurements, ...
Web page: http://www.mppmu.mpg.de/bat/
Contact: bat@mppmu.mpg.de
Paper on BAT: A. Caldwell, D. Kollar, K. Kröninger, "BAT - The Bayesian Analysis Toolkit", Comp. Phys. Comm. 180 (2009) 2197-2209 [arXiv:0808.2552].

Particle physics examples
Dilepton resonances: ATLAS-CONF-2012-129

Particle physics examples
Classical template fit: J. Phys. G40 (2013) 035110
Anomalous couplings: JHEP 1206 (2012) 088
Longitudinal structure function (48-dim. fit): Phys. Lett. B682 (2009) 8

A complex example: Rare b-meson decays (Frederic Beaujean et al., JHEP 08 (2012) 030)
Tree-level FCNC forbidden in the SM.
Effective field theory: add effective operators to the Lagrangian, similar to Fermi's four-point interaction.
Physics case: search for non-SM contributions.
Model parameters:
  3 Wilson coefficients
  25 nuisance parameters
Input:
  59 measurements from BaBar, Belle, CDF, LHCb
  Theory calculations, quark masses, CKM parameters, ...
Numerically difficult: nearly impossible with MCMC alone.

A complex example: Rare b-meson decays (Frederic Beaujean et al., JHEP 08 (2012) 030)
Use MCMC plus population MC.
Dominant contribution: $\mathrm{BR}(B \to K \ell\ell) \propto |C_9|^2 + |C_{10}|^2$

A complex example: Rare b-meson decays (Frederic Beaujean et al., JHEP 08 (2012) 030)
Posterior probability well sampled.
No hint for new physics.
Showed the necessity to implement new numerical algorithms.
Example: F. Beaujean et al., "Initializing adaptive importance sampling with Markov chains", arXiv:1304.7808

Summary
Knowledge is justified belief.
Bayesian probability is degree-of-belief.
Bayes' theorem allows an easy update of knowledge.
Everything else is about math and numerical methods:
  Parameter (point and interval) estimation
  Treatment of systematic uncertainties
  Calculation of marginalized distributions
  Also: model comparison and goodness-of-fit (not covered)
Numerical methods are necessary for complex fits with a large number of parameters.