Nested Sampling. Brendon J. Brewer. Department of Statistics, The University of Auckland.

Department of Statistics, The University of Auckland. https://www.stat.auckland.ac.nz/~brewer/

Nested Sampling is a Monte Carlo method (not necessarily MCMC) that was introduced by John Skilling in 2004. It is very popular in astrophysics and has some unique strengths.

Marginal Likelihood. The marginal likelihood is useful for model selection. Consider two models: $M_1$ with parameters $\theta_1$, and $M_2$ with parameters $\theta_2$. The marginal likelihoods are:
$$p(D \mid M_1) = \int p(\theta_1 \mid M_1)\, p(D \mid \theta_1, M_1)\, d\theta_1$$
$$p(D \mid M_2) = \int p(\theta_2 \mid M_2)\, p(D \mid \theta_2, M_2)\, d\theta_2$$
These are the normalising constants of the posteriors, within each model.

Bayesian Model Selection. If you have the marginal likelihoods, it's easy:
$$\frac{P(M_1 \mid D)}{P(M_2 \mid D)} = \frac{P(M_1)}{P(M_2)} \times \frac{P(D \mid M_1)}{P(D \mid M_2)}.$$
(posterior odds) = (prior odds) × (Bayes factor)
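
To make the arithmetic concrete, here is a tiny sketch with made-up log marginal likelihoods (the numbers are purely illustrative, not from any real analysis):

```python
import numpy as np

# Hypothetical log marginal likelihoods (illustrative values only)
logZ_1 = -104.2   # log p(D | M_1)
logZ_2 = -107.8   # log p(D | M_2)

log_bayes_factor = logZ_1 - logZ_2        # log[ p(D|M_1) / p(D|M_2) ]
prior_odds = 1.0                          # P(M_1) / P(M_2), e.g. equal prior probabilities
posterior_odds = prior_odds * np.exp(log_bayes_factor)

print(f"Bayes factor (M_1 vs M_2): {np.exp(log_bayes_factor):.1f}")
print(f"Posterior odds:            {posterior_odds:.1f}")
```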

Challenging features. Another motivation: standard MCMC methods can get stuck in situations like the following. [Figure: a challenging two-dimensional posterior with axes x and y.]

Nested Sampling was built to estimate the marginal likelihood. But it can also be used to generate posterior samples, and it can potentially work on harder problems where standard MCMC methods get stuck.

Notation. When discussing Nested Sampling, we use different symbols:
$$p(D \mid M_1) = \int p(\theta_1 \mid M_1)\, p(D \mid \theta_1, M_1)\, d\theta_1 \quad \text{becomes} \quad Z = \int \pi(\theta) L(\theta)\, d\theta.$$
$Z$ = marginal likelihood, $L(\theta)$ = likelihood function, $\pi(\theta)$ = prior distribution.

Imagine we had an easy 1-D problem, with a Uniform(0, 1) prior, and a likelihood that was strictly decreasing.

The key idea of Nested Sampling: our high-dimensional problem can be mapped onto the easy 1-D problem. [Figure from Skilling (2006).]

X. Define
$$X(L^*) = \int \pi(\theta)\, \mathbb{1}\big(L(\theta) > L^*\big)\, d\theta.$$
$X$ is the amount of prior probability with likelihood greater than $L^*$. Loosely, $X$ is the volume with likelihood above $L^*$. Higher $L^*$ $\Rightarrow$ lower volume.
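
As a sanity check on the definition (a toy sketch, not part of the original slides): for the easy 1-D problem with a Uniform(0, 1) prior and a strictly decreasing likelihood, $X(L(\theta^*))$ is simply $\theta^*$ itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: Uniform(0, 1) prior, strictly decreasing likelihood L(theta) = exp(-5*theta)
theta = rng.random(100_000)          # draws from the prior
L = np.exp(-5.0 * theta)

L_star = np.exp(-5.0 * 0.3)          # likelihood threshold at theta* = 0.3
X_estimate = np.mean(L > L_star)     # Monte Carlo estimate of X(L*)
print(X_estimate)                    # approximately 0.3, because here X(L(theta*)) = theta*
```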

Numerical Integration. If we had some points with likelihoods $L_i$, and we knew the corresponding X-values, we could approximate the integral numerically, using the trapezoidal rule or something similar.
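
A minimal sketch of that integration step (function and argument names are my own; it assumes the discarded points' log-likelihoods and estimated X-values are already available, with X decreasing towards zero):

```python
import numpy as np

def estimate_logZ(logL, X):
    """Trapezoidal-rule estimate of log Z, where Z = integral of L dX,
    given log-likelihoods logL and decreasing prior volumes X."""
    logL = np.asarray(logL)
    X = np.asarray(X)
    L_scaled = np.exp(logL - logL.max())           # rescale for numerical stability
    Z_scaled = np.trapz(L_scaled[::-1], X[::-1])   # reverse so X is increasing
    return np.log(Z_scaled) + logL.max()
```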

Procedure. This procedure gives us the likelihood values. (1) Sample $\theta = \{\theta_1, \ldots, \theta_N\}$ from the prior $\pi(\theta)$. (2) Find the point $\theta_k$ with the worst likelihood, and let $L^*$ be its likelihood. (3) Replace $\theta_k$ with a new point from $\pi(\theta)$, but restricted to the region where $L(\theta) > L^*$. Repeat steps (2) and (3) many times. The discarded points (the worst one at each iteration) are the output.
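
This loop could be sketched as follows (a generic outline, not Brewer's own implementation; `sample_prior`, `log_likelihood`, and `new_point_above` are placeholders for problem-specific functions, and the next slide describes one way to implement `new_point_above`):

```python
import numpy as np

def nested_sampling(sample_prior, log_likelihood, new_point_above,
                    num_particles=100, num_iterations=1000):
    """Basic Nested Sampling loop: repeatedly discard the worst particle
    and replace it with a new draw from the constrained prior.
    Returns the discarded points and their log-likelihoods."""
    particles = [sample_prior() for _ in range(num_particles)]
    logL = np.array([log_likelihood(p) for p in particles])

    discarded, discarded_logL = [], []
    for _ in range(num_iterations):
        worst = int(np.argmin(logL))             # index of the worst particle
        discarded.append(particles[worst])
        discarded_logL.append(logL[worst])

        # Replace it with a new point from the prior, restricted to L(theta) > L*
        particles[worst], logL[worst] = new_point_above(particles, logL,
                                                        logL_star=logL[worst])
    return discarded, np.array(discarded_logL)
```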

Generating the new point. We need a new point from $\pi(\theta)$, but restricted to the region where $L(\theta) > L^*$. The point being replaced has the worst likelihood, so all the other points satisfy the constraint! So we can use one of the other points to initialise an MCMC run, trying to sample the prior but rejecting any proposal with likelihood below $L^*$. See code.
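
The code itself isn't reproduced in this transcript, so here is a rough sketch of the idea under simple assumptions (a Gaussian random-walk proposal, a `log_prior` density, and a `log_likelihood` function; in practice these would be bound into whatever interface the main loop expects, e.g. with functools.partial):

```python
import numpy as np

def new_point_above(particles, logL, logL_star, log_prior, log_likelihood,
                    num_steps=1000, step_size=0.1, rng=None):
    """Draw a new point from the prior restricted to L(theta) > L*, by running
    a short Metropolis chain started from one of the surviving particles."""
    rng = np.random.default_rng() if rng is None else rng

    # Any particle other than the worst one already satisfies the constraint
    survivors = [p for p, l in zip(particles, logL) if l > logL_star]
    x = np.array(survivors[rng.integers(len(survivors))], dtype=float).copy()
    lp, ll = log_prior(x), log_likelihood(x)

    for _ in range(num_steps):
        proposal = x + step_size * rng.normal(size=x.shape)
        lp_new = log_prior(proposal)
        # Metropolis acceptance with respect to the prior...
        if np.log(rng.random()) < lp_new - lp:
            ll_new = log_likelihood(proposal)
            # ...plus the hard constraint L(theta) > L*
            if ll_new > logL_star:
                x, lp, ll = proposal, lp_new, ll_new
    return x, ll
```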

Generating the new point. There are alternative versions of NS available, such as MultiNest, that use different methods (not MCMC) to generate the new point. I also have a version called Diffusive Nested Sampling, which is a better way of doing NS when using MCMC. I'm happy to discuss it offline.

Procedure. Nested Sampling gives us a sequence of points with increasing likelihoods, but we need to somehow know their X-values!

Estimating the X values. Consider the simple one-dimensional problem with a Uniform(0, 1) prior. When we generate N points from the prior, the distribution for the X-value of the worst point is Beta(N, 1). So we can use a draw from Beta(N, 1) as a guess of the X value.

Estimating the X values. Each iteration, the worst point should reduce the volume by a factor that has a Beta(N, 1) distribution. So we can do this:
$$X_1 = t_1, \quad X_2 = t_2 X_1, \quad X_3 = t_3 X_2, \quad \ldots$$
where $t_i \sim \text{Beta}(N, 1)$. Alternatively, we can use a simple approximation.

Deterministic Approximation. [Figure: deterministic approximation.] Each iteration reduces the volume by a factor $e^{-1/N}$. E.g. if $N = 5$, the worst likelihood accounts for about 1/5th of the remaining prior volume.
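
Both options (the stochastic shrinkage factors and the deterministic $e^{-i/N}$ approximation) are easy to write down; a small sketch with illustrative names:

```python
import numpy as np

def X_deterministic(num_iterations, N):
    """Deterministic approximation: X_i = exp(-i / N), i = 1, 2, ..."""
    i = np.arange(1, num_iterations + 1)
    return np.exp(-i / N)

def X_stochastic(num_iterations, N, rng=None):
    """Stochastic version: X_i = t_1 t_2 ... t_i with t_i ~ Beta(N, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    t = rng.beta(N, 1, size=num_iterations)
    return np.cumprod(t)
```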

Posterior Distribution from Nested Sampling. The posterior sample can be obtained by assigning weights $W_j$ to the discarded points:
$$W_j = \frac{L_j w_j}{Z}$$
where $w_j = X_{j-1} - X_{j+1}$ is the prior weight/width associated with the point. The effective sample size is given by
$$\mathrm{ESS} = \exp\left( -\sum_{j=1}^{m} W_j \log W_j \right).$$
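
A sketch of this step (my own helper names; dividing by Z is done implicitly by normalising the weights to sum to one):

```python
import numpy as np

def posterior_weights(logL, X):
    """Posterior weights W_j proportional to L_j * w_j, with prior widths
    w_j = X_{j-1} - X_{j+1} (taking X_0 = 1 and X_{m+1} = 0)."""
    X_padded = np.concatenate(([1.0], X, [0.0]))
    w = X_padded[:-2] - X_padded[2:]
    logW = np.asarray(logL) + np.log(w)
    logW -= logW.max()                  # avoid overflow before exponentiating
    W = np.exp(logW)
    return W / W.sum()                  # normalise, i.e. divide by (an estimate of) Z

def effective_sample_size(W):
    """ESS = exp(-sum_j W_j log W_j)."""
    W = W[W > 0.0]
    return np.exp(-np.sum(W * np.log(W)))
```

Posterior samples can then be obtained by resampling the discarded points in proportion to these weights.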

Information. NS can also calculate the information, also known as the Kullback-Leibler divergence from the prior to the posterior:
$$H = \int p(\theta \mid D) \log\left[ \frac{p(\theta \mid D)}{p(\theta)} \right] d\theta \approx \log\left( \frac{\text{volume of prior}}{\text{volume of posterior}} \right).$$
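
Using the same weights, H can be estimated as $\sum_j W_j \log L_j - \log Z$, since $p(\theta \mid D)/p(\theta) = L(\theta)/Z$; a sketch:

```python
import numpy as np

def information_nats(logL, W, logZ):
    """Estimate H = sum_j W_j log L_j - log Z, in nats."""
    logL = np.asarray(logL)
    W = np.asarray(W)
    keep = W > 0.0
    return np.sum(W[keep] * logL[keep]) - logZ
```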

Code. I have written a basic implementation of Nested Sampling in Python. Let's use it on the transit problem and the asteroseismology problem.

Plots

Plots. A necessary but not sufficient condition for everything being okay is that you see the entire peak in the posterior weights. If it's not there, you haven't done enough NS iterations; i.e. your parameter values have lower likelihoods than what is typical of the posterior distribution.

Plots. The shape of the log(L) vs. log(X) plot is also informative: if it is straight for a long time, or concave up at some point, your problem contains a phase transition, and it's a good thing you used Nested Sampling!
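
A minimal sketch of these two diagnostic plots (assuming matplotlib, and the logL, X, and W arrays from the sketches above):

```python
import numpy as np
import matplotlib.pyplot as plt

def diagnostic_plots(logL, X, W):
    """Plot log(L) vs log(X) and the posterior weights of the discarded points."""
    fig, axes = plt.subplots(2, 1, sharex=True, figsize=(7, 6))
    axes[0].plot(np.log(X), logL)
    axes[0].set_ylabel("log(L)")
    axes[1].plot(np.log(X), W)
    axes[1].set_xlabel("log(X)")
    axes[1].set_ylabel("Posterior weight")
    plt.tight_layout()
    plt.show()
```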