# Bayesian Inference in Astronomy & Astrophysics: A Short Course


Tom Loredo, Dept. of Astronomy, Cornell University

## Five Lectures

- Overview of Bayesian Inference
- From Gaussians to Periodograms
- Learning How To Count: Poisson Processes
- Miscellany: Frequentist Behavior, Experimental Design
- Why Try Bayesian Methods?

## Overview of Bayesian Inference

- What to do
- What's different about it
- How to do it: tools for Bayesian calculation

## What To Do: The Bayesian Recipe

Assess hypotheses by calculating their probabilities $p(H_i | \ldots)$, conditional on known and/or presumed information, using the rules of probability theory.

But... what does $p(H_i | \ldots)$ mean?

## What Is Distributed in p(x)?

**Frequentist:** Probability describes randomness (Venn, Boole, Fisher, Neyman, Pearson...). $x$ is a random variable if it takes different values throughout an infinite (imaginary?) ensemble of identical systems/experiments. $p(x)$ describes how $x$ is distributed throughout the ensemble: probability is frequency (the pdf is a histogram).

[Figure: pdf of $x$ over the ensemble — it is $x$ that is distributed.]

**Bayesian:** Probability describes uncertainty (Bernoulli, Laplace, Bayes, Gauss...). $p(x)$ describes how probability (plausibility) is distributed among the possible choices for $x$ in the case at hand. Analog: a mass density, $\rho(x)$. Here $x$ has a single, uncertain value; it is the probability $p$ that is distributed.

Relationships between probability and frequency were demonstrated mathematically (laws of large numbers, Bayes's theorem).

[Figure: pdf over possible values of $x$ — it is probability that is distributed.]

## Interpreting Abstract Probabilities

**Symmetry/invariance/counting:**

- Resolve possibilities into equally plausible microstates using symmetries
- Count microstates in each possibility

**Frequency from probability** — Bernoulli's law of large numbers: in repeated trials, given $P(\text{success})$, predict
$$\frac{N_{\rm success}}{N_{\rm total}} \to P \quad \text{as } N \to \infty$$

**Probability from frequency** — Bayes's "An Essay Towards Solving a Problem in the Doctrine of Chances": via Bayes's theorem, probability from frequency!

## Bayesian Probability: A Thermal Analogy

| Intuitive notion | Quantification | Calibration |
|---|---|---|
| Hot, cold | Temperature, $T$ | Cold as ice = 273 K; boiling hot = 373 K |
| Uncertainty | Probability, $P$ | Certainty = 0, 1; $p = 1/36$: as plausible as snake's eyes; $p = 1/1024$: as plausible as 10 heads |

## The Bayesian Recipe

Assess hypotheses by calculating their probabilities $p(H_i | \ldots)$, conditional on known and/or presumed information, using the rules of probability theory.

**Probability theory axioms ("grammar"):**

OR (sum rule):
$$P(H_1 + H_2 | I) = P(H_1 | I) + P(H_2 | I) - P(H_1, H_2 | I)$$

AND (product rule):
$$P(H_1, D | I) = P(H_1 | I)\, P(D | H_1, I) = P(D | I)\, P(H_1 | D, I)$$

## Direct Probabilities ("Vocabulary")

- Certainty: if $A$ is certainly true given $B$, $P(A | B) = 1$
- Falsity: if $A$ is certainly false given $B$, $P(A | B) = 0$
- Other rules exist for more complicated types of information; for example, invariance arguments, maximum (information) entropy, limit theorems (the CLT, tying probabilities to frequencies), bold (or desperate!) presumption...

## Three Important Theorems

**Normalization:** for exclusive, exhaustive $H_i$,
$$\sum_i P(H_i | \ldots) = 1$$

**Bayes's theorem:**
$$P(H_i | D, I) = P(H_i | I)\, \frac{P(D | H_i, I)}{P(D | I)}$$
$$\text{posterior} \propto \text{prior} \times \text{likelihood}$$
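Bayes's theorem for a discrete, exhaustive hypothesis space can be sketched in a few lines; the hypotheses, priors, and likelihoods below are purely hypothetical:

```python
# Minimal sketch of Bayes's theorem for discrete, exhaustive hypotheses.
# All numbers are made up for illustration.

def posterior(priors, likelihoods):
    """Return P(H_i | D, I) given priors P(H_i | I) and likelihoods P(D | H_i, I)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)                 # P(D | I): the normalization
    return [j / evidence for j in joint]

# Two hypotheses with equal priors; D is twice as probable under H1 as under H2.
post = posterior([0.5, 0.5], [0.2, 0.1])
print(post)  # [0.666..., 0.333...]
```

Note that the normalizing denominator is exactly the sum-rule normalization from the theorem above.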

**Marginalization:** note that for exclusive, exhaustive $\{B_i\}$,
$$\sum_i P(A, B_i | I) = \sum_i P(B_i | A, I)\, P(A | I) = P(A | I) = \sum_i P(B_i | I)\, P(A | B_i, I)$$

We can use $\{B_i\}$ as a basis to get $P(A | I)$.

Example: take $A = D$, $B_i = H_i$; then
$$P(D | I) = \sum_i P(D, H_i | I) = \sum_i P(H_i | I)\, P(D | H_i, I)$$
The prior predictive for $D$ is the average likelihood for the $H_i$.
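The marginalization identity can be checked numerically on a small joint table; the probabilities below are hypothetical, chosen only so the bookkeeping is visible:

```python
# Check the identity: sum_i P(A, B_i | I) = P(A | I) = sum_i P(B_i | I) P(A | B_i, I),
# for one statement A and an exhaustive basis {B_1, B_2}. Values are illustrative.

joint = {"B1": 0.12, "B2": 0.28}     # P(A, B_i | I)
p_B = {"B1": 0.4, "B2": 0.6}         # P(B_i | I)

p_A_direct = sum(joint.values())     # marginalize the joint over the basis
# Same answer via the conditional form, using P(A | B_i, I) = P(A, B_i | I) / P(B_i | I):
p_A_basis = sum(p_B[b] * (joint[b] / p_B[b]) for b in p_B)
print(p_A_direct, p_A_basis)         # both ≈ 0.4
```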

## Inference With Parametric Models: Parameter Estimation

- $I$ = model $M$ with parameters $\theta$ (+ any additional info)
- $H_i$ = statements about $\theta$; e.g. $\theta \in [2.5, 3.5]$, or $\theta > 0$

The probability for any such statement can be found using a probability density function (pdf) for $\theta$:
$$P(\theta \in [\theta, \theta + d\theta] | \ldots) = f(\theta)\, d\theta = p(\theta | \ldots)\, d\theta$$

Posterior probability density:
$$p(\theta | D, M) = \frac{p(\theta | M)\, L(\theta)}{\int d\theta\, p(\theta | M)\, L(\theta)}$$

Summaries of the posterior:

- Best-fit values: mode, posterior mean
- Uncertainties: credible regions (e.g., HPD regions)
- Marginal distributions: for interesting parameters $\psi$ and nuisance parameters $\phi$, the marginal distribution for $\psi$ is
$$p(\psi | D, M) = \int d\phi\, p(\psi, \phi | D, M)$$

This generalizes propagation of errors.
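These summaries can be computed on a parameter grid. A minimal sketch, assuming a Gaussian likelihood for the mean of some made-up data, a known noise level, and a flat prior (all hypothetical choices):

```python
# Grid-based posterior for a Gaussian mean: posterior mean and a 95% HPD region.
# Data, noise level, and prior are illustrative assumptions.
import math

data = [4.8, 5.1, 5.3, 4.9, 5.2]
sigma = 0.5                                      # assumed known noise level
grid = [i * 0.001 for i in range(3000, 7001)]    # theta in [3, 7]

def log_like(theta):
    return -0.5 * sum((x - theta) ** 2 for x in data) / sigma**2

w = [math.exp(log_like(t)) for t in grid]        # flat prior: posterior ∝ L(theta)
Z = sum(w)                                       # normalization (grid approximation)
post = [wi / Z for wi in w]

mean = sum(t * p for t, p in zip(grid, post))    # posterior mean
# HPD region: accumulate grid points from highest posterior density downward.
order = sorted(range(len(grid)), key=lambda i: -post[i])
mass, region = 0.0, []
for i in order:
    mass += post[i]
    region.append(grid[i])
    if mass >= 0.95:
        break
print(round(mean, 3), round(min(region), 2), round(max(region), 2))
```

With these numbers the posterior mean equals the sample mean (5.06), and the HPD interval matches the familiar $\pm 1.96\,\sigma/\sqrt{N}$ result, as it should for this Gaussian case.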

## Model Uncertainty: Model Comparison

$I = (M_1 + M_2 + \ldots)$ — specify a set of models. $H_i = M_i$ — each hypothesis chooses a model.

Posterior probability for a model:
$$p(M_i | D, I) = p(M_i | I)\, \frac{p(D | M_i, I)}{p(D | I)} \propto p(M_i)\, L(M_i)$$

But $L(M_i) = p(D | M_i) = \int d\theta_i\, p(\theta_i | M_i)\, p(D | \theta_i, M_i)$.

The likelihood for a model is the average likelihood for its parameters:
$$L(M_i) = \langle L(\theta_i) \rangle$$
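The average-likelihood idea can be sketched by comparing a hypothetical sharp model (parameter fixed) against one with a free parameter under a flat prior; all numbers are illustrative:

```python
# Marginal (average) likelihood by simple quadrature, for two hypothetical models:
# M1 fixes theta = 5 (no free parameter); M2 gives theta a flat prior on [0, 10].
import math

data = [4.8, 5.1, 5.3, 4.9, 5.2]
sigma = 0.5

def like(theta):
    return math.exp(-0.5 * sum((x - theta) ** 2 for x in data) / sigma**2)

L_M1 = like(5.0)                         # simple model: no averaging needed
# M2: average the likelihood over its prior, p(theta | M2) = 1/10 on [0, 10]
dtheta = 0.001
L_M2 = sum(like(t * dtheta) * (1 / 10) * dtheta for t in range(10000))
bayes_factor = L_M1 / L_M2
print(bayes_factor)  # > 1 here: the sharper model is favored despite fitting no better
```

This is the automatic Occam's razor at work: M2's likelihood is diluted by averaging over the mostly wasted prior range.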

## Model Uncertainty: Model Averaging

The models share a common subset of interesting parameters, $\psi$; each has a different set of nuisance parameters $\phi_i$ (or different prior info about them). $H_i$ = statements about $\psi$.

Calculate the posterior PDF for $\psi$:
$$p(\psi | D, I) = \sum_i p(M_i | D, I)\, p(\psi | D, M_i) \propto \sum_i L(M_i) \int d\phi_i\, p(\psi, \phi_i | D, M_i)$$

The model choice is itself a (discrete) nuisance parameter here.

## An Automatic Occam's Razor

Predictive probabilities can favor simpler models:
$$p(D | M_i) = \int d\theta_i\, p(\theta_i | M_i)\, L(\theta_i)$$

[Figure: $P(D | H)$ vs. $D$ — a simple $H$ concentrates its predictive probability, a complicated $H$ spreads it thin, so at $D_{\rm obs}$ the simple model can win.]

## The Occam Factor

$$p(D | M_i) = \int d\theta_i\, p(\theta_i | M_i)\, L(\theta_i) \approx p(\hat\theta_i | M_i)\, L(\hat\theta_i)\, \delta\theta_i \approx L(\hat\theta_i)\, \frac{\delta\theta_i}{\Delta\theta_i}$$

Here $L(\hat\theta_i)$ is the maximum likelihood and $\delta\theta_i / \Delta\theta_i$ is the Occam factor: the ratio of the posterior width $\delta\theta_i$ to the prior width $\Delta\theta_i$.

[Figure: prior and likelihood vs. $\theta$, with the likelihood's width $\delta\theta$ inside the prior range $\Delta\theta$.]

Models with more parameters often make the data more probable for the best fit. The Occam factor penalizes models for "wasted" volume of parameter space.

## What's the Difference?

**Bayesian Inference (BI):**

- Specify at least two competing hypotheses and priors
- Calculate their probabilities using probability theory
  - Parameter estimation:
$$p(\theta | D, M) = \frac{p(\theta | M)\, L(\theta)}{\int d\theta\, p(\theta | M)\, L(\theta)}$$
  - Model comparison:
$$O \propto \frac{\int d\theta_1\, p(\theta_1 | M_1)\, L(\theta_1)}{\int d\theta_2\, p(\theta_2 | M_2)\, L(\theta_2)}$$

**Frequentist Statistics (FS):**

- Specify a null hypothesis $H_0$ such that rejecting it implies an interesting effect is present
- Specify a statistic $S(D)$ that measures the departure of the data from null expectations
- Calculate $p(S | H_0) = \int dD\, p(D | H_0)\, \delta[S - S(D)]$ (e.g., by Monte Carlo simulation of data)
- Evaluate $S(D_{\rm obs})$; decide whether to reject $H_0$ based on, e.g., the tail probability $\int_{S_{\rm obs}}^{\infty} dS\, p(S | H_0)$

## Crucial Distinctions

**The role of subjectivity:** BI exchanges (implicit) subjectivity in the choice of null and statistic for (explicit) subjectivity in the specification of alternatives.

- Makes assumptions explicit
- Guides specification of further alternatives that generalize the analysis
- Automates identification of statistics:
  - BI is a problem-solving approach
  - FS is a solution-characterization approach

**The types of mathematical calculations:**

- BI requires integrals over hypothesis/parameter space
- FS requires integrals over sample/data space

## Complexity of Statistical Integrals

Inference with independent data: consider $N$ data, $D = \{x_i\}$, and a model $M$ with $m$ parameters ($m \ll N$). Suppose
$$L(\theta) = p(x_1 | \theta)\, p(x_2 | \theta) \cdots p(x_N | \theta)$$

**Frequentist integrals:**
$$\int dx_1\, p(x_1 | \theta) \int dx_2\, p(x_2 | \theta) \cdots \int dx_N\, p(x_N | \theta)\; f(D)$$

We seek integrals with properties independent of $\theta$. Such rigorous frequentist integrals usually can't be found. Approximate (e.g., asymptotic) results are easy via Monte Carlo (due to independence).

**Bayesian integrals:**
$$\int d^m\theta\; g(\theta)\, p(\theta | M)\, L(\theta)$$

- Such integrals are sometimes easy if analytic (especially in low dimensions).
- Asymptotic approximations require ingredients familiar from frequentist calculations.
- For large $m$ ($> 4$ is often enough!) the integrals are often very challenging because of correlations (lack of independence) in parameter space.

## How To Do It: Tools for Bayesian Calculation

- Asymptotic (large $N$) approximation: Laplace approximation
- Low-dimensional models ($m \lesssim 10$):
  - Randomized quadrature: quadrature + dithering
  - Subregion-adaptive quadrature: ADAPT, DCUHRE, BAYESPACK
  - Adaptive Monte Carlo: VEGAS, MISER
- High-dimensional models: posterior sampling
  - Rejection method
  - Markov chain Monte Carlo (MCMC)

## Laplace Approximations

Suppose the posterior has a single dominant (interior) mode at $\hat\theta$, with $m$ parameters:
$$p(\theta | M)\, L(\theta) \approx p(\hat\theta | M)\, L(\hat\theta)\, \exp\left[-\tfrac{1}{2}(\theta - \hat\theta)^T I\, (\theta - \hat\theta)\right]$$
where
$$I = -\left.\frac{\partial^2 \ln[p(\theta | M)\, L(\theta)]}{\partial\theta\, \partial\theta^T}\right|_{\hat\theta} \quad \text{(information matrix)}$$

Bayes factors:
$$\int d\theta\, p(\theta | M)\, L(\theta) \approx p(\hat\theta | M)\, L(\hat\theta)\, (2\pi)^{m/2}\, |I|^{-1/2}$$

Marginals: with the profile likelihood $L_p(\theta) \equiv \max_\phi L(\theta, \phi)$,
$$p(\theta | D, M) \propto L_p(\theta)\, |I(\theta)|^{-1/2}$$
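A 1-D sanity check of the Laplace evidence formula, reusing the same kind of hypothetical Gaussian-mean setup (flat prior on [0, 10], known noise) and comparing against brute-force quadrature:

```python
# 1-D Laplace approximation to the evidence Z = ∫ dθ p(θ|M) L(θ), vs. quadrature.
# Data, prior range, and noise level are illustrative assumptions.
import math

data = [4.8, 5.1, 5.3, 4.9, 5.2]
sigma = 0.5
prior = 1 / 10                                   # flat prior on [0, 10]

def integrand(theta):                            # p(theta | M) L(theta)
    return prior * math.exp(-0.5 * sum((x - theta) ** 2 for x in data) / sigma**2)

theta_hat = sum(data) / len(data)                # posterior mode (flat prior)
# Information "matrix" (a scalar in 1-D): I = -d^2 ln[p L]/d theta^2 = N / sigma^2
info = len(data) / sigma**2
Z_laplace = integrand(theta_hat) * math.sqrt(2 * math.pi / info)  # (2π)^{m/2} |I|^{-1/2}
Z_quad = sum(integrand(t * 0.001) * 0.001 for t in range(10000))  # brute force
print(Z_laplace, Z_quad)
```

Here the agreement is essentially exact, because a Gaussian likelihood with a flat prior makes the Laplace "approximation" the true answer; for non-Gaussian posteriors the two would differ at $O(1/N)$.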

The Laplace approximation:

- Uses the same ingredients as common frequentist calculations
- Uses ratios, so the approximation is often $O(1/N)$
- With a unit-information prior in the i.i.d. setting, it gives the Schwarz criterion / Bayesian Information Criterion (BIC):
$$\ln B \approx \ln L(\hat\theta) - \ln L(\hat\theta, \hat\phi) + \frac{m_2 - m_1}{2} \ln N$$

This is the Bayesian counterpart to adjusting $\chi^2$ for degrees of freedom, but it also accounts for parameter-space volume.

## Low-D (m ≲ 10): Quadrature & Monte Carlo

Quadrature/cubature rules:
$$\int d\theta\, f(\theta) \approx \sum_i w_i\, f(\theta_i) + O(n^{-2}) \text{ or } O(n^{-4})$$

- Smoothness → fast convergence in 1-D
- Curse of dimensionality: $O(n^{-2/m})$ or $O(n^{-4/m})$ in $m$ dimensions

Monte Carlo integration:
$$\int d\theta\, g(\theta)\, p(\theta) \approx \frac{1}{n} \sum_{\theta_i \sim p(\theta)} g(\theta_i) + O(n^{-1/2})$$
($O(n^{-1})$ with quasi-Monte Carlo.)

- Ignores smoothness → poor performance in 1-D
- Avoids the curse: $O(n^{-1/2})$ regardless of dimension
- Practical problem: the multiplier is large (the variance of $g$) → hard if $m \gtrsim 6$ (need a good importance sampler $p$)
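The $O(n^{-1/2})$ behavior is easy to see empirically. A minimal sketch, with an illustrative target $p(\theta) = N(0,1)$ and $g(\theta) = \theta^2$, so the true integral is 1:

```python
# Monte Carlo estimate of ∫ g(θ) p(θ) dθ with p = N(0,1), g(θ) = θ², true value 1.
# Target and integrand are illustrative choices.
import random

random.seed(1)

def mc_estimate(n):
    # Sample theta_i ~ p(theta) = N(0, 1), then average g(theta_i)
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n)) / n

small, large = mc_estimate(100), mc_estimate(100 * 100)
# With 100x more samples the error typically shrinks by about 10x, per O(n^{-1/2}).
print(abs(small - 1.0), abs(large - 1.0))
```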

**Randomized quadrature:** quadrature rule + random dithering of abscissas → get the benefits of both methods. Most useful in settings resembling Gaussian quadrature.

**Subregion-adaptive quadrature/MC:** concentrate points where most of the probability lies, via recursion.

- Adaptive quadrature: use a pair of lattice rules (for error estimation), subdivide regions with large error (ADAPT, DCUHRE, BAYESPACK by Genz et al.)
- Adaptive Monte Carlo: build the importance sampler on the fly (e.g., VEGAS, MISER in Numerical Recipes)

## Subregion-Adaptive Quadrature

Concentrate points where most of the probability lies, via recursion. Use a pair of lattice rules (for error estimation); subdivide regions with large error.

[Figure: ADAPT in action (galaxy polarizations).]

## Posterior Sampling

General approach: draw samples of $\theta, \phi$ from $p(\theta, \phi | D, M)$; then:

- Integrals and moments are easily found via $\frac{1}{n}\sum_i f(\theta_i, \phi_i)$
- $\{\theta_i\}$ are samples from the marginal $p(\theta | D, M)$

But how can we obtain $\{\theta_i, \phi_i\}$?

**Rejection method:** draw candidates from a comparison function that bounds the posterior, and accept each with probability proportional to the posterior at that point.

[Figure: $P(\theta)$ vs. $\theta$ under a bounding comparison function.]

It is hard to find an efficient comparison function if $m \gtrsim 6$.
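The rejection method in 1-D can be sketched as follows; the triangular target density and flat envelope are hypothetical choices for illustration:

```python
# Rejection sampling from a triangular density on [0, 1] under a flat envelope.
# Target and envelope are illustrative assumptions.
import random

random.seed(2)

def target(theta):               # unnormalized p(theta): peaks at theta = 0.5
    return 1.0 - abs(2.0 * theta - 1.0)

M = 1.0                          # envelope constant: M * uniform >= target everywhere

def rejection_sample(n):
    samples = []
    while len(samples) < n:
        theta = random.random()                  # draw from the flat envelope
        if random.random() * M <= target(theta):
            samples.append(theta)                # accept with prob target(theta)/M
        # otherwise reject and draw again
    return samples

draws = rejection_sample(20000)
print(sum(draws) / len(draws))   # near 0.5, the mean of the triangular density
```

The acceptance rate equals the ratio of the target's area to the envelope's, which is why a loose comparison function — the generic situation in high dimensions — wastes most draws.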

## Markov Chain Monte Carlo (MCMC)

Let $\Lambda(\theta) = \ln[p(\theta | M)\, p(D | \theta, M)]$. Then
$$p(\theta | D, M) = \frac{e^{\Lambda(\theta)}}{Z}, \qquad Z = \int d\theta\, e^{\Lambda(\theta)}$$

Bayesian integration looks like problems addressed in computational statistical mechanics and Euclidean QFT. Markov chain methods are standard: Metropolis; Metropolis-Hastings; molecular dynamics; hybrid Monte Carlo; simulated annealing; thermodynamic integration.

## A Complicated Marginal Distribution

Nascent neutron star properties inferred from neutrino data from SN 1987A: two variables derived from a 9-dimensional posterior distribution.

[Figure: the marginal distribution over the two derived variables.]

## The MCMC Recipe

Create a time series of samples $\theta_i$ from $p(\theta)$:

- Draw a candidate $\theta_{i+1}$ from a kernel $T(\theta_{i+1} | \theta_i)$
- Enforce detailed balance by accepting with probability $\alpha$:
$$\alpha(\theta_{i+1} | \theta_i) = \min\left[1,\; \frac{T(\theta_i | \theta_{i+1})\, p(\theta_{i+1})}{T(\theta_{i+1} | \theta_i)\, p(\theta_i)}\right]$$

Choosing $T$ to minimize burn-in and correlations is an art. Coupled, parallel chains eliminate this for select problems ("exact sampling").
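The recipe above can be sketched with a symmetric Gaussian proposal kernel, for which the $T$-ratio in $\alpha$ cancels (the plain Metropolis case). The target here is a hypothetical unit-Gaussian posterior, i.e. $\Lambda(\theta) = -\theta^2/2$ up to a constant:

```python
# Metropolis sampler with a symmetric Gaussian kernel T; the T-ratio cancels in alpha.
# The unit-Gaussian target and the step size are illustrative choices.
import random, math

random.seed(3)

def log_post(theta):             # Lambda(theta): log of the unnormalized posterior
    return -0.5 * theta * theta

def metropolis(n, step=1.0):
    theta = 0.0
    chain = []
    for _ in range(n):
        cand = theta + random.gauss(0.0, step)   # draw candidate from T(cand | theta)
        # alpha = min(1, p(cand)/p(theta)) for a symmetric kernel
        if math.log(random.random()) < log_post(cand) - log_post(theta):
            theta = cand                         # accept
        chain.append(theta)                      # on rejection, repeat the old state
    return chain

chain = metropolis(50000)
mean = sum(chain) / len(chain)
var = sum((t - mean) ** 2 for t in chain) / len(chain)
print(round(mean, 2), round(var, 2))  # near 0 and 1 for the unit-Gaussian target
```

Note the detail that repeated (rejected) states must be kept in the chain; dropping them would bias the samples toward the tails.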

## Summary

Bayesian/frequentist differences:

- Probabilities for hypotheses vs. for data
- Problem solving vs. solution characterization
- Integrals: parameter space vs. sample space

Computational techniques for Bayesian inference:

- Large $N$: Laplace approximation
- Exact: adaptive quadrature for low-D; posterior sampling for high-D


### APM 541: Stochastic Modelling in Biology Bayesian Inference. Jay Taylor Fall Jay Taylor (ASU) APM 541 Fall / 53

APM 541: Stochastic Modelling in Biology Bayesian Inference Jay Taylor Fall 2013 Jay Taylor (ASU) APM 541 Fall 2013 1 / 53 Outline Outline 1 Introduction 2 Conjugate Distributions 3 Noninformative priors

### Bayesian Inference for Normal Mean

Al Nosedal. University of Toronto. November 18, 2015 Likelihood of Single Observation The conditional observation distribution of y µ is Normal with mean µ and variance σ 2, which is known. Its density

### HST.582J / 6.555J / J Biomedical Signal and Image Processing Spring 2007

MIT OpenCourseWare http://ocw.mit.edu HST.582J / 6.555J / 16.456J Biomedical Signal and Image Processing Spring 2007 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

### STA 294: Stochastic Processes & Bayesian Nonparametrics

MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

### A Measure of the Goodness of Fit in Unbinned Likelihood Fits; End of Bayesianism?

A Measure of the Goodness of Fit in Unbinned Likelihood Fits; End of Bayesianism? Rajendran Raja Fermilab, Batavia, IL 60510, USA PHYSTAT2003, SLAC, Stanford, California, September 8-11, 2003 Maximum likelihood

### 32. STATISTICS. 32. Statistics 1

32. STATISTICS 32. Statistics 1 Revised September 2007 by G. Cowan (RHUL). This chapter gives an overview of statistical methods used in High Energy Physics. In statistics, we are interested in using a

Adaptive Monte Carlo methods Jean-Michel Marin Projet Select, INRIA Futurs, Université Paris-Sud joint with Randal Douc (École Polytechnique), Arnaud Guillin (Université de Marseille) and Christian Robert

### Nuisance parameters and their treatment

BS2 Statistical Inference, Lecture 2, Hilary Term 2008 April 2, 2008 Ancillarity Inference principles Completeness A statistic A = a(x ) is said to be ancillary if (i) The distribution of A does not depend

### Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

### ACCOUNTING FOR INPUT-MODEL AND INPUT-PARAMETER UNCERTAINTIES IN SIMULATION. <www.ie.ncsu.edu/jwilson> May 22, 2006

ACCOUNTING FOR INPUT-MODEL AND INPUT-PARAMETER UNCERTAINTIES IN SIMULATION Slide 1 Faker Zouaoui Sabre Holdings James R. Wilson NC State University May, 006 Slide From American

### On the Optimal Scaling of the Modified Metropolis-Hastings algorithm

On the Optimal Scaling of the Modified Metropolis-Hastings algorithm K. M. Zuev & J. L. Beck Division of Engineering and Applied Science California Institute of Technology, MC 4-44, Pasadena, CA 925, USA

### Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Department of Biomedical Engineering and Computational Science Aalto University January 26, 2012 Contents 1 Batch and Recursive Estimation

### Karl-Rudolf Koch Introduction to Bayesian Statistics Second Edition

Karl-Rudolf Koch Introduction to Bayesian Statistics Second Edition Karl-Rudolf Koch Introduction to Bayesian Statistics Second, updated and enlarged Edition With 17 Figures Professor Dr.-Ing., Dr.-Ing.

### GAUSSIAN PROCESS REGRESSION

GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The