Introduction to Bayes

Introduction to Bayes
Alan Heavens, ICIC Data Analysis Workshop, September 3, 2018

Overview
1. Inverse problems
2. The meaning of probability: probability rules and Bayes' theorem; p(x|y) is not the same as p(y|x)
3. Parameter inference: the posterior p(parameters|data); how to set up a problem; priors
4. Sampling. Case study: the Eddington 1919 eclipse expedition; the perils of not marginalising
5. The Monty Hall problem

Some books for further reading
- D. Sivia & J. Skilling: Data Analysis: A Bayesian Tutorial (CUP)
- P. Saha: Principles of Data Analysis (Capella Archive) http://www.physik.uzh.ch/~psaha/pda/pda-a4.pdf
- T. Loredo: Bayesian Inference in the Physical Sciences http://www.astro.cornell.edu/staff/loredo/bayes/
- M. Hobson et al.: Bayesian Methods in Cosmology (CUP)
- D. MacKay: Information Theory, Inference and Learning Algorithms (CUP) http://www.inference.phy.cam.ac.uk/itprnn/book.pdf
- A. Gelman et al.: Bayesian Data Analysis (CRC Press)

Inverse Problems
Analysis problems are inverse problems: given some data, we want to infer something about the process that generated the data. This is generally harder than predicting the outcome given a physical process; the latter is called forward modelling, or a generative model. Typical classes of problem: parameter inference and model comparison.

Typical problem: analyse WMAP Cosmic Microwave Background data.

WMAP Cosmic Microwave Background Data
ΛCDM fits the WMAP data well (two figure slides).

Bayesian Inference: what questions do we want to answer?
Parameter inference:
- I have a set of (x, y) pairs, with errors. If I assume y = mx + c, what are m and c?
- I have detected 5 X-ray photons. What is the luminosity of the source, and its uncertainty?
- Given LIGO gravitational wave data, what are the masses of the inspiralling objects?
- Given the Planck CMB map, how much Dark Matter is there?

Bayesian Inference: what questions do we want to answer?
Model comparison:
- Do the data support General Relativity or Newtonian gravity?
- Is ΛCDM more probable than the alternatives?

The meaning of probability
- Probability describes the relative frequency of outcomes in infinitely long trials (frequentist view).
- Probability expresses a degree of belief (Bayesian view).
- A logical proposition is a statement that could be true or false; p(A|B) is the degree to which the truth of a logical proposition B implies that A is also true.
- The Bayesian view expresses what we often want to know, e.g. given the Planck CMB data, what is the probability that the density parameter of cold dark matter is between 0.3 and 0.4?

Probability rules
- p(x) + p(¬x) = 1 (sum rule)
- p(x, y) = p(x|y) p(y) (product rule)
- p(x) = Σ_k p(x, y_k) (marginalisation over all possible discrete y_k values)
- p(x) = ∫ p(x, y) dy (marginalisation, continuous variables; p is a pdf)
- p(x, y) = p(y, x)
Bayes' theorem: p(y|x) = p(x|y) p(y) / p(x)

Conditional probabilities
Avoid the probability-101 mistake: p(x|y) is not the same as p(y|x).
e.g. x = is male; y = has a beard. Then p(y|x) ≈ 0.1 but p(x|y) ≈ 1.
Figure: photograph by Julia Margaret Cameron.

Example of p(x|y) vs p(y|x) errors
Medical test: an allergy test gives a positive result (T) in allergic patients (A) with p = 0.8, and has a false positive rate of 0.1. You get a positive result. If 0.01 of the population are allergic, what is the probability that you are?
Rule 1: write down what you want to know. It is p(A|T).
What we know: p(T|A) = 0.8, p(T|¬A) = 0.1, p(A) = 0.01.
Use Bayes' theorem: p(A|T) = p(T|A) p(A) / p(T).
Marginalisation in the denominator: p(T) = p(T, A) + p(T, ¬A), and write p(T, A) = p(T|A) p(A), and similarly for p(T, ¬A):
p(A|T) = p(T|A) p(A) / [p(T|A) p(A) + p(T|¬A) p(¬A)]

Allergy problem
Medical test: an allergy test gives a positive result (T) in allergic patients (A) with p = 0.8, and has a false positive rate of 0.1. You get a positive result. If 0.01 of the population are allergic, what is the probability that you are?
Put in the numbers:
p(A|T) = p(T|A) p(A) / [p(T|A) p(A) + p(T|¬A) p(¬A)] = (0.8 × 0.01) / (0.8 × 0.01 + 0.1 × 0.99) ≈ 0.075
So there is still a 92.5% chance that you do not have the allergy.
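As a minimal numerical check of this calculation (a sketch in Python, using exactly the probabilities quoted above):

```python
# Bayes' theorem for the allergy test: p(A|T) = p(T|A) p(A) / p(T)
p_T_given_A = 0.8       # true positive rate, p(T|A)
p_T_given_notA = 0.1    # false positive rate, p(T|not A)
p_A = 0.01              # prior probability of being allergic, p(A)

# Marginalise over A in the denominator: p(T) = p(T|A)p(A) + p(T|not A)p(not A)
p_T = p_T_given_A * p_A + p_T_given_notA * (1 - p_A)

p_A_given_T = p_T_given_A * p_A / p_T
print(f"p(A|T) = {p_A_given_T:.3f}")    # prints ~0.075
```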

Notation
Data d; model M; model parameters θ.
Rule 1: write down what you want to know. Usually it is the probability distribution for the parameters, given the data and assuming a model: p(θ|d, M). This is the Posterior.
To compute it, we use Bayes' theorem:
p(θ|d, M) = p(d|θ, M) p(θ|M) / p(d|M)
where the Likelihood is L(d|θ) = p(d|θ, M) and the Prior is π(θ) = p(θ|M).
p(d|M) is the Bayesian Evidence, which is important for model comparison, but not for parameter inference. Dropping the M dependence:
p(θ|d) = L(d|θ) π(θ) / p(d)

The Posterior p(θ|d, M)
"If you just try long enough and hard enough, you can always manage to boot yourself in the posterior." (A. J. Liebling)

It is all probability
Everything is focussed on getting at the posterior p(θ|d). Computing it: p(θ|d) ∝ L(θ) π(θ). We need to analyse the problem (a worked sketch follows below):
- What are the data, d?
- What is the model for the data?
- What are the model parameters?
- What is the likelihood function L(θ)?
- What is the prior π(θ)?
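For the straight-line example mentioned earlier (y = mx + c with Gaussian errors), a brute-force sketch of these steps: evaluate the unnormalised posterior on a grid with a flat prior. The data values here are invented purely for illustration.

```python
import numpy as np

# Toy data for y = m*x + c with Gaussian errors (values invented for illustration)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
sigma = 0.3 * np.ones_like(y)

# Flat prior over the grid range; Gaussian likelihood, so p(m, c | d) ∝ exp(-chi^2 / 2)
m_grid = np.linspace(1.0, 3.0, 200)
c_grid = np.linspace(0.0, 2.0, 200)
M, C = np.meshgrid(m_grid, c_grid, indexing="ij")

chi2 = np.zeros_like(M)
for xi, yi, si in zip(x, y, sigma):
    chi2 += ((yi - (M * xi + C)) / si) ** 2

post = np.exp(-0.5 * (chi2 - chi2.min()))      # unnormalised posterior on the grid
dm, dc = m_grid[1] - m_grid[0], c_grid[1] - c_grid[0]
post /= post.sum() * dm * dc                   # normalise

# Marginalise over the intercept c to get p(m | d)
p_m = post.sum(axis=1) * dc
print("posterior mean of m:", (m_grid * p_m).sum() * dm)
```

Grid evaluation like this is fine for 2 parameters; the Sampling section below explains why it breaks down in higher dimensions.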

Priors
Bayesian: the prior is (usually) the state of knowledge before the new data are collected.
- For parameter inference, the prior becomes unimportant as more data are added and the likelihood dominates.
- For model comparison, the prior remains important.
Issues: one usually wants an uninformative prior, but what does this mean? Typical choices: π(θ) = constant; π(θ) ∝ 1/θ, the so-called Jeffreys prior (by astronomers).

Priors (figure slide).

Uninformative prior
A flat prior seems natural, but consider this problem. Imagine Cartesian coordinates in N dimensions, with the prior range being (-1/2, 1/2) for each coordinate. The prior probability of being inside the N-sphere which just fits inside the prior volume is
p = π^(N/2) / (2^N Γ(1 + N/2))
Figure: log10(p) versus N, falling to about -70 by N = 100.
An apparently uninformative prior may be highly informative when viewed a different way.
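A short sketch evaluating this probability, using the log-Gamma function to avoid overflow:

```python
import numpy as np
from scipy.special import gammaln

# Probability that a point drawn from a flat prior on the cube (-1/2, 1/2)^N lies
# inside the inscribed N-sphere of radius 1/2:  p = pi^(N/2) / (2^N * Gamma(1 + N/2))
def log10_p_inside_sphere(N):
    ln_p = 0.5 * N * np.log(np.pi) - N * np.log(2.0) - gammaln(1.0 + N / 2.0)
    return ln_p / np.log(10.0)

for N in (1, 2, 10, 50, 100):
    print(N, round(log10_p_inside_sphere(N), 1))
# By N = 100, log10(p) is about -70: nearly all of the flat prior's mass
# lies outside the sphere, in the "corners" of the hypercube.
```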

Case study: fair coin?
From Sivia & Skilling. Model: the probability of a head is θ. A uniform prior in θ is assumed. The sequence is HHTT...
Second panel: after the first head, p(θ|H) ∝ p(H|θ) π(θ) = θ.

Case study: fair coin?
Three different priors, shown in the first panel. After many data are collected, the posterior becomes insensitive to the prior. In this case, the highly informative prior that supposes the coin is almost fair needs more data before it is overruled.
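A sketch of the coin example with a few priors; the head/tail counts and the exact prior shapes here are assumptions for illustration, not the ones plotted on the slide.

```python
import numpy as np

theta = np.linspace(0.0, 1.0, 1001)     # probability of a head
dtheta = theta[1] - theta[0]
n_heads, n_tails = 7, 13                # assumed data, for illustration only

likelihood = theta**n_heads * (1.0 - theta)**n_tails

priors = {
    "uniform":                    np.ones_like(theta),
    "strongly peaked at fair":    np.exp(-0.5 * ((theta - 0.5) / 0.02) ** 2),
    "favours a very biased coin": (theta - 0.5) ** 2,
}

for name, prior in priors.items():
    post = likelihood * prior
    post /= post.sum() * dtheta                        # normalise p(theta | data)
    mean = (theta * post).sum() * dtheta
    print(f"{name:28s} posterior mean = {mean:.3f}")
# With enough tosses all three posteriors converge; the sharply peaked prior
# takes the most data to overrule.
```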

High-dimensional priors
An interesting non-experiment with the VSA CMB experiment (Slosar et al. 2003). Priors: Ω_Λ ≥ 0 and 10 ≤ Age/Gyr ≤ 20. The marginal distribution for the Hubble parameter comes out as h ≈ 0.7 ± 0.1, in line with expectations. BUT there are NO data here: this is the prior, marginalised over an odd-shaped region,
p(θ_1) = ∫ ∏_{j≠1} dθ_j p(θ)
There are no data in these plots; it is all coming from the prior!

High-dimensional priors
Now with the data: a posterior.

Sampling
The posterior is rarely an analytic function, and evaluating it on a grid in parameter space is usually prohibitively expensive if there are more than 2 or 3 parameters.
MCMC: the standard technique is MCMC (Markov Chain Monte Carlo), where random steps are taken in parameter space according to a proposal distribution, and accepted or rejected according to the Metropolis-Hastings algorithm. This gives a chain of samples of the posterior (or the likelihood), with an expected number density proportional to the posterior.
MCMC example (figure).
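A minimal Metropolis sampler to illustrate the idea (a sketch, not the code behind any of the figures here; the correlated 2D Gaussian target simply stands in for a posterior):

```python
import numpy as np

def log_post(theta):
    # Stand-in log-posterior: a correlated 2D Gaussian
    cov = np.array([[1.0, 0.8], [0.8, 1.0]])
    return -0.5 * theta @ np.linalg.solve(cov, theta)

def metropolis(log_post, theta0, n_steps=20000, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    logp = log_post(theta)
    chain = np.empty((n_steps, theta.size))
    for i in range(n_steps):
        proposal = theta + step * rng.standard_normal(theta.size)   # symmetric proposal
        logp_prop = log_post(proposal)
        # Metropolis-Hastings acceptance: keep with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < logp_prop - logp:
            theta, logp = proposal, logp_prop
        chain[i] = theta
    return chain

chain = metropolis(log_post, [3.0, -3.0])
print("posterior mean estimate:", chain[2000:].mean(axis=0))   # discard burn-in
```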

Planck parameter inference, assuming ΛCDM
Figure: corner plot of posterior constraints on Ω_b h², Ω_c h², ln(10^10 A_s), n_s, τ and 100 θ_MC, for Planck EE+lowP, TE+lowP, TT+lowP and TT,TE,EE+lowP.

Case study: Eddington 1919 eclipse expedition
In General Relativity, light grazing the Sun is bent through an angle 4GM/(rc²); in Newtonian theory, the bend angle is 2GM/(rc²).
Figure: illustration of the lensing effect (highly magnified).
If we treat this as a parameter inference problem: the bend angle at the limb of the Sun is α, and we will infer α.
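As a quick numerical check of these formulae (using standard solar values, which are not given on the slide):

```python
import math

# Bend angle for light grazing the Sun: 4GM/(r c^2) in GR, 2GM/(r c^2) in Newtonian gravity
G = 6.674e-11       # m^3 kg^-1 s^-2
M = 1.989e30        # solar mass, kg
r = 6.957e8         # solar radius, m
c = 2.998e8         # speed of light, m/s

alpha_gr = 4 * G * M / (r * c**2)                 # radians
to_arcsec = 180.0 / math.pi * 3600.0
print(f"GR: {alpha_gr * to_arcsec:.2f} arcsec, Newtonian: {alpha_gr * to_arcsec / 2:.2f} arcsec")
# GR: ~1.75 arcsec; Newtonian: ~0.87 arcsec
```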

Case study: Eddington 1919 eclipse expedition
Analyse the experiment:
- What are the data? Measurements of displacements (Dx, Dy) of (7) stars, between eclipse plate(s) and a reference plate.
- What is the model? Displacements are radial, with magnitude α (arcsec) for light grazing the Sun.
- What are the model parameters? α.
- What is the likelihood function? Measurement errors are Gaussian (an assumption!).
- What prior should we choose? Uniform: π(α) = constant.

Case study: Eddington 1919 eclipse expedition. Nuisance parameters
Hang on:
- The reference plate may not be centred correctly.
- The plates may be rotated with respect to each other.
- The eclipse plate may have been scaled in size (thermal effects / different instruments).
The model for the data is
Dx = ax + by + c + αE_x
Dy = dx + ey + f + αE_y    (1)
7 parameters, including 6 nuisance parameters. Likelihood:
L ∝ ∏_{stars i} exp{ -[Dx_i - (ax_i + by_i + c + αE_x,i)]² / (2σ_i²) }
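A sketch of this likelihood as code (variable names follow the slide; here both displacement components of the model in Eq. (1) are included, whereas the slide writes out only the Dx term, and the E_x, E_y pattern and data arrays would come from the plate measurements):

```python
import numpy as np

def log_likelihood(params, x, y, Dx, Dy, Ex, Ey, sigma):
    """Log-likelihood for the plate model: 6 nuisance parameters (a..f) plus
    the bend angle alpha at the solar limb. Gaussian errors assumed."""
    a, b, c, d, e, f, alpha = params
    model_x = a * x + b * y + c + alpha * Ex
    model_y = d * x + e * y + f + alpha * Ey
    chi2 = np.sum(((Dx - model_x) / sigma) ** 2 + ((Dy - model_y) / sigma) ** 2)
    return -0.5 * chi2
```

A sampler such as the Metropolis sketch above (or the Hamiltonian MCMC used on the next slide) can then explore all 7 parameters jointly.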

Results from Plate II
Using only displacements in R.A. (4 parameters).
Figure: Hamiltonian MCMC samples of the distribution of the nuisance parameters a, b, c and the bending at the solar limb α (in arcsec; GR predicts 1.75).
Marginalising over a, b, c gives p(α) = ∫ p(a, b, c, α) da db dc.
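With MCMC samples in hand, this marginalisation is trivial: ignore the a, b, c columns of the chain and look at the α samples alone. A sketch (the chain below is a random placeholder, not the Plate II chain):

```python
import numpy as np

# Placeholder chain with columns (a, b, c, alpha); a real analysis would use
# the sampler output instead of these random numbers.
rng = np.random.default_rng(1)
chain = rng.normal(loc=[0.0, 0.0, 0.0, 1.75], scale=0.3, size=(10000, 4))

alpha = chain[:, 3]                       # marginal samples of the bend angle
print(f"alpha = {alpha.mean():.2f} +/- {alpha.std():.2f} arcsec")
# A histogram of these samples approximates p(alpha | data).
```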

Marginalise over nuisance parameters!
Figure: Exquisite BICEP B-mode CMB map (credit: BICEP team).

BICEP and losing the Nobel Prize
Figure: BICEP article (Science 2.0). The dust contribution was not marginalised over.

Back to Eddington 1919
What if we do not marginalise over a, b, c?

The Monty Hall Problem (two figure slides).