
Bayes Theorem (Recap) Adrian Bevan email: a.j.bevan@qmul.ac.uk

Overview

Bayes' theorem is given by

P(B|A) = P(A|B) P(B) / P(A)

where:
- P(B|A) is the probability you want to compute: the probability of the hypothesis B given the data A. This is sometimes called the posterior probability.
- P(A|B) is the probability of the data A given the hypothesis B. This you can compute given a theory or model.
- P(B) is the prior probability: a level of belief in the feasibility of a hypothesis.
- P(A) is the probability of the data A given all possible hypotheses,

P(A) = Σ_j P(A|B_j) P(B_j)

Example

What is the probability of being dealt an ace from a deck of 52 cards? There are two hypothetical outcomes: you are dealt an ace (B0), and you are not dealt an ace (B1). Assume that the deck is unbiased and properly shuffled, so that there are 4 aces out of 52 randomly distributed in the pack of cards:

P(B0) = 4/52, P(B1) = 48/52

Compute the probability of being dealt an ace given each hypothesis:

P(A|B0) = 4/52, P(A|B1) = 48/52

Now we can compute the posterior probability. Is this choice of prior reasonable?

P(A) = Σ_j P(A|B_j) P(B_j) = (4/52)(4/52) + (48/52)(48/52) = (16 + 2304)/2704 = 0.8580 (4 s.f.)

P(B0|A) = (4/52)(4/52)/0.8580 ≈ 0.0069

This result does not agree with a frequentist interpretation of the data.

An alternative calculation: assume uniform priors.

P(A) = Σ_j P(A|B_j) P(B_j) = (4/52)(1/2) + (48/52)(1/2) = (4 + 48)/104 = 0.5

P(B0|A) = (4/52)(1/2)/0.5 = 4/52 ≈ 0.077

This result agrees with a frequentist interpretation of the data.
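The two calculations above are easy to check with a short script. This is a minimal sketch, not from the lecture (the helper name `posterior` is mine); it computes P(B0|A) under both choices of prior:

```python
from fractions import Fraction

def posterior(likelihoods, priors):
    # Bayes' theorem: P(B_j|A) = P(A|B_j) P(B_j) / sum_k P(A|B_k) P(B_k)
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    return [l * p / evidence for l, p in zip(likelihoods, priors)]

likelihoods = [Fraction(4, 52), Fraction(48, 52)]  # P(A|B0), P(A|B1)

# Frequency-based priors: P(B0) = 4/52, P(B1) = 48/52
card_priors = [Fraction(4, 52), Fraction(48, 52)]
print(float(posterior(likelihoods, card_priors)[0]))  # ≈ 0.0069

# Uniform priors: P(B0) = P(B1) = 1/2
flat_priors = [Fraction(1, 2), Fraction(1, 2)]
print(float(posterior(likelihoods, flat_priors)[0]))  # = 4/52 ≈ 0.0769
```

Exact rational arithmetic via `Fraction` avoids rounding ambiguity in the comparison with the frequentist value 4/52.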

Weighted Average (Recap) Adrian Bevan email: a.j.bevan@qmul.ac.uk

Formula for two uncorrelated variables

For example, let's consider the total precision on the measurement of some observable X. If there are two ways to measure X, and these yield σ1 = 5 and σ2 = 3, then the total error on the average is

σ = [σ1² σ2² / (σ1² + σ2²)]^(1/2) = [(25 × 9)/(25 + 9)]^(1/2) = √6.62 ≈ 2.57
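As a quick check of that arithmetic, here is a minimal inverse-variance-weighting sketch; the function name and the dummy central values are illustrative, not from the lecture:

```python
def weighted_average(values, sigmas):
    # Inverse-variance weighted mean of uncorrelated measurements.
    weights = [1.0 / s ** 2 for s in sigmas]
    mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    sigma = sum(weights) ** -0.5  # combined error on the average
    return mean, sigma

# Only the uncertainties matter for the combined error; central values are dummies.
_, sigma = weighted_average([0.0, 0.0], [5.0, 3.0])
print(round(sigma, 2))  # 2.57
```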

If one has a set of uncorrelated observables, then it is straightforward to compute the average in the same way for each observable in the set, i.e. instead of an n-dimensional problem, you have n lots of a one-dimensional problem as in the previous example. Unfortunately, if you have a set of correlated observables from different measurements, then this is no longer the case and the problem becomes a little more complicated.

A more complicated problem arises when the observables are correlated. In this case the covariance matrix between a set of M measured observables plays a role:

x̄ = (Σ_j V_j⁻¹)⁻¹ Σ_j V_j⁻¹ x_j,    V = (Σ_j V_j⁻¹)⁻¹

where V_j⁻¹ is the inverse of the covariance matrix for the j-th measurement and x_j is the j-th measurement vector (the n correlated observables). The first factor, (Σ_j V_j⁻¹)⁻¹, is common both to the covariance matrix for the average and to the observable values for the average.

Example: measuring time-dependent CP asymmetries in B meson decays

A real example using results from two HEP experiments, with M = 2 measurements of n = 2 observables. Remember, if you only have a correlation, you need to compute the covariance in order to compute V_j. Having constructed x_j and V_j, one can compute the average.

Starting with the error matrix for each measurement, invert and sum the V_j to obtain the covariance matrix for the average, V = (Σ_j V_j⁻¹)⁻¹. Results: the variances are read off of the diagonal, and the covariance between the two observables is 0.0002.

Similarly, for the average value we can now compute S and C (the n observables of interest in our example) via x̄ = V Σ_j V_j⁻¹ x_j. Thus we obtain the average of the two measurements, with a covariance of 0.0002 between S and C.
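The combination can be sketched numerically as follows. The (S, C) central values and covariance matrices below are hypothetical placeholders, not the experimental inputs used in the lecture:

```python
import numpy as np

def combine(measurements, covariances):
    # Covariance-weighted average: V = (sum_j Vj^-1)^-1, xbar = V sum_j Vj^-1 xj
    inv_sum = sum(np.linalg.inv(Vj) for Vj in covariances)
    V = np.linalg.inv(inv_sum)
    xbar = V @ sum(np.linalg.inv(Vj) @ xj
                   for xj, Vj in zip(measurements, covariances))
    return xbar, V

# Hypothetical (S, C) measurements from two experiments, with covariances:
x1 = np.array([0.68, -0.10]); V1 = np.array([[0.0009, 0.0003], [0.0003, 0.0016]])
x2 = np.array([0.64, -0.03]); V2 = np.array([[0.0016, 0.0004], [0.0004, 0.0025]])

xbar, V = combine([x1, x2], [V1, V2])
# Variances of the average sit on the diagonal of V; the off-diagonal
# element is the covariance between the averaged S and C.
```

Because the same factor (Σ_j V_j⁻¹)⁻¹ appears in both the average and its covariance, the code computes it once as `V`.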

Poisson limits Adrian Bevan email: a.j.bevan@qmul.ac.uk

For a given number of observed events resulting from the study of a rare process, one wants to compute a limit on the true value of some underlying theory parameter (e.g. the mean occurrence of the rare process). This is a Poisson problem, where λ is the mean (and variance) of the underlying distribution and r is the observed number of events.

For a given observation, we can compute a likelihood as a function of λ:

L(λ; r) = λ^r e^(−λ) / r!

From each likelihood distribution one can construct a one- or two-sided interval by integrating L(λ; r) to obtain the desired coverage. For example, for r = 5, the interval shown contains 68% of the total integral of the likelihood. We can build a 2D region from a family of likelihood curves for different values of r.
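A minimal numerical sketch of this integration; the function names, grid settings, and the equal-tails convention are assumptions of mine, matching the "integrate the likelihood" prescription above:

```python
import math

def likelihood(lam, r):
    # Poisson likelihood L(lambda; r) = lambda^r e^(-lambda) / r!
    return lam ** r * math.exp(-lam) / math.factorial(r)

def central_interval(r, coverage=0.68, lam_max=30.0, n=20000):
    # Integrate L over lambda on a grid and return the central interval
    # containing `coverage` of the total integral (equal tails).
    dlam = lam_max / n
    lams = [(i + 0.5) * dlam for i in range(n)]
    probs = [likelihood(lam, r) * dlam for lam in lams]
    total = sum(probs)
    tail = (1.0 - coverage) / 2.0
    cum, lo, hi = 0.0, None, None
    for lam, p in zip(lams, probs):
        cum += p / total
        if lo is None and cum >= tail:
            lo = lam
        if cum >= 1.0 - tail:
            hi = lam
            break
    return lo, hi

lo, hi = central_interval(r=5)  # 68% interval for 5 observed events
```

A one-sided upper limit follows the same pattern, accumulating the integral from λ = 0 up to the desired coverage instead of clipping both tails.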

e.g. r = 5 is indicated by the vertical (red) dashed line. Physically there are discrete observed numbers of events for a background-free process. If, however, background plays a role, then the problem becomes more complicated, and non-integer values may be of relevance.

Multi- dimensional constraint An example of constraining a model using data. Adrian Bevan email: a.j.bevan@qmul.ac.uk

Motivation

The Standard Model of particle physics describes all phenomena we know at the sub-atomic level. The model is incomplete: the universal matter-antimatter asymmetry is unexplained, the nature of neutrinos is unknown, and we do not know what Dark Matter or Dark Energy are [possibly related to a higher-order GUT], etc. Many particle physicists want to test the Standard Model precisely for two (related) reasons: (i) to understand the model better, and (ii) to see where it breaks.

The problem

The decay B± → τ±ν has been measured, and can be compared with theoretical expectations.

Measurement: B(B± → τ±ν) = (1.15 ± 0.23) × 10⁻⁴
Standard Model expectation: B(B± → τ±ν)_SM = (1.01 ± 0.29) × 10⁻⁴

For a simple extension of the Standard Model, called the type-II Two-Higgs-Doublet Model, we know that the ratio r_H of the measured to the predicted branching fraction depends on the mass of the charged Higgs, m_H, and another parameter, tanβ:

r_H = [1 − (m_B²/m_H²) tan²β]²

What can we learn about m_H and tanβ for this model? We can compute r_H from our knowledge of the measured and predicted branching fractions: r_H = 1.14 ± 0.40. How can we use this to constrain m_H and tanβ via r_H = [1 − (m_B²/m_H²) tan²β]²?

Method 1: χ² approach

Construct a χ² in terms of r_H:

χ² = [(r_H − r̂_H(m_H, tanβ)) / σ_{r_H}]²

where r_H and σ_{r_H} come from SM theory and the experimental measurement, and r̂_H(m_H, tanβ) is calculated using r̂_H = [1 − (m_B²/m_H²) tan²β]². One has to select the parameter values.

Method 1: χ² approach

For a given value of m_H and tanβ you can compute χ². e.g. for m_H = 0.2 TeV and tanβ = 10, r̂_H(m_H, tanβ) = 0.93, so

χ² = [(1.14 − 0.93)/0.4]² ≈ 0.28

So the task at hand is to scan through values of the parameters in order to study the behaviour of the constraint on r_H.
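The single-point χ² is easy to verify; a minimal sketch follows (the function name is mine, and r̂_H = 0.93 is taken from the slide rather than recomputed from m_H and tanβ):

```python
def chi2(r_meas, r_pred, sigma):
    # chi^2 for a single constraint: ((r_H - r_H_hat) / sigma)^2
    return ((r_meas - r_pred) / sigma) ** 2

# Slide numbers: r_H = 1.14 +/- 0.40 measured, r_H_hat = 0.93 predicted
# at m_H = 0.2 TeV, tan(beta) = 10.
print(round(chi2(1.14, 0.93, 0.40), 2))  # 0.28
```

A parameter scan is then just this function evaluated over a grid of (m_H, tanβ) predictions.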

Method 1: χ² approach

[Plot: χ² as a function of m_H (0–2 TeV) and tanβ (0–100); a forbidden region of large χ² surrounds an allowed valley.]

A large χ² indicates a region of parameter space that is forbidden; a small value is allowed. In between we have to decide on a confidence level that we use as a cut-off. We really want to convert this distribution to a probability, so use the χ² probability distribution. There are 2 parameters and one constraint (the data), so there are 2 − 1 degrees of freedom, i.e. ν = 1.
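For ν = 1 the χ²-to-probability conversion has a closed form, p = erfc(√(χ²/2)); here is a minimal sketch (the helper name is mine):

```python
import math

def chi2_pvalue_1dof(chi2):
    # p-value of a chi^2 value with nu = 1 degree of freedom:
    # p = erfc(sqrt(chi^2 / 2))
    return math.erfc(math.sqrt(chi2 / 2.0))

print(round(chi2_pvalue_1dof(1.0), 3))  # 0.317, the familiar 1-sigma tail probability
```

Applying this to every bin of the χ² scan gives the probability maps discussed next.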

Method 1: χ² approach

[Plot: probability P as a function of m_H (0–2 TeV) and tanβ (0–100); the forbidden region has P ≈ 0, the allowed region P ≈ 1.]

A P ≈ 1 means that we have no constraint on the value of the parameters (i.e. they are allowed). A small value of P, ≈ 0, means that there is a very low (or zero) probability of the parameters being able to take those values (i.e. the parameters are forbidden in that region). Typically one sets a 1 − CL corresponding to 1 or 3σ to talk about the uncertainty of a measurement, or to indicate an exclusion region at that CL.

Method 1: χ² approach

[Plot: the same probability distribution, showing a visual artefact.]

Artefact: a remnant of binning the data. For these plots there are 100 × 100 bins; as a result visual oddities can occur in regions where the probability (or χ²) changes rapidly.

Method 1: χ² approach

A finer binning can be used to compute a 1 − CL distribution. Here 1, 2 and 3σ intervals are shown.

[Plot: 1 − CL in the (m_H, tanβ) plane; most of the plane is forbidden, with a tiny allowed slice of parameter space between two forbidden regions.]

Summary: χ² approach

This is nothing new; it is just a two-dimensional scan solution to a problem. It is, however, more computationally challenging to undertake (Excel probably won't be a good fit for solving the problem): a 1D problem has N scan points, while a 2D problem has N² scan points. As N becomes large (e.g. 100 or 1000) the number of sample points becomes very large, i.e. the curse of dimensionality strikes. An M-dimensional problem has N^M sample points. e.g. the Minimal Super-Symmetric Model (MSSM) has ~160 parameters, so one has a problem with N^160 sample points. This is currently not a viable computational method to explore the complexities of the MSSM.