
Reminder of some Markov chain properties:
1. A transition from one state to another occurs probabilistically.
2. The only state that matters is the one you are currently in (i.e. given the present, the future is independent of the past).

Together these mean that the next value x1 depends only on the last value x0, according to some transition probability p(x0 → x1). If you can get from any state to any state in a finite number of steps, the distribution of the population over the states eventually stops changing. This is referred to as the stationary distribution: a probability distribution π(x), the probability density of the state x for an infinitely long chain, satisfying πp = π.
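A minimal sketch of the stationary-distribution property in R (the transition matrix P below is our own illustrative example, not from the slides):

```r
# A 3-state chain: row i of P gives the transition probabilities
# p(x0 -> x1) out of state i.
P <- matrix(c(0.5, 0.4, 0.1,
              0.2, 0.6, 0.2,
              0.1, 0.4, 0.5),
            nrow = 3, byrow = TRUE)

pi0 <- c(1, 0, 0)          # start with all mass in state 1
for (i in 1:100) pi0 <- pi0 %*% P
print(pi0)                 # the stationary distribution pi
print(pi0 %*% P - pi0)     # ~ 0, confirming pi * P = pi
```

Starting from any other initial distribution gives the same limit, which is exactly the property MCMC exploits.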

Markov Chain Monte Carlo
1) Start from some initial parameter value.
2) Evaluate the unnormalized posterior.
3) Propose a new parameter value.
4) Evaluate the new unnormalized posterior.
5) Decide whether or not to accept the new value.
6) Repeat 3-5.

The idea of MCMC: could we find a transition rule p such that the stationary distribution (a) exists, and (b) equals the posterior (the target distribution)?


Markov Chain Monte Carlo looks remarkably similar to optimization, but:
- it evaluates the posterior rather than just the likelihood;
- the Repeat step has no stopping condition;
- the criteria for accepting a proposed step differ: optimization has a diverse variety of options but no fixed rule, while MCMC has stricter acceptance criteria;
- it converges in distribution rather than to a single point.

Markov Chain Monte Carlo (vs rejection sampling)
1) Start from some initial parameter value.
2) Evaluate the unnormalized posterior.
3) Propose a new parameter value depending on the previous value, according to a transition probability (a jump function is needed!).
4) Evaluate the new unnormalized posterior.
5) Decide whether or not to accept the new value in comparison to the value from the previous step (no envelope function needed!).
6) Repeat.

Rejection sampling vs MCMC (Metropolis algorithm)

Metropolis: perturb the parameters with a proposal Q(θ′; θ), e.g. N(θ, σ²), and accept the move with probability min(1, P(θ′|D) / P(θ|D)); otherwise keep the old parameters. Detail: Metropolis, as stated, requires Q(θ′; θ) = Q(θ; θ′).

[Figure: rejection sampling ("Explore everywhere!") vs Metropolis sampling ("Avoid empty space!") of a 2D target; subfigure from PRML, Bishop (2006).]

Rejection sampling:
- samples are independent
- parallelizable (computationally effective)
- easy to implement, easy to design
- needs A LOT of draws in higher dimensions (computationally costly)
- samples everywhere, approximating the answer everywhere

MCMC:
- each sample depends on the previous one
- sequential (computationally costly)
- easy to implement, harder to design
- needs only slightly more draws than usual in higher dimensions (computationally effective)
- did you miss anything? need to assess convergence

Metropolis Algorithm
- The most popular form of MCMC.
- Can be applied to almost any problem.
- Implementation requires little additional thought beyond writing the model.
- Evaluation/tuning does, however, require the most skill and experience.
- An indirect method: requires a second distribution to propose steps.

Metropolis Algorithm
1) Start from some initial parameter value θc.
2) Evaluate the unnormalized posterior p(θc|X).
3) Propose a new parameter value θ*: a random draw from a jump distribution centered on the current parameter value.
4) Evaluate the new unnormalized posterior p(θ*|X).
5) Decide whether or not to accept the new value: accept with probability a = p(θ*|X) / p(θc|X), i.e. if runif(1,0,1) < a, accept; else reject.
6) Repeat 3-5.

The transition probability is p(x0 → x1) = P(x1) / P(x0). Accepting the new value with probability a = P(θ*) / P(θc) means:
- if P(θ*) > P(θc), then a > 1: always accept;
- if P(θ*) < P(θc), then 0 < a < 1: accept sometimes.
In code: if runif(1,0,1) < a, accept; else reject.
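As a minimal sketch, this accept/reject step can be written in R as follows (the function and variable names are ours; unnorm_post is assumed to return the unnormalized posterior density):

```r
# One Metropolis step with a symmetric normal jump distribution.
# Because runif() returns a value in [0, 1), a > 1 is always accepted.
metropolis_step <- function(theta_c, unnorm_post, jump_sd) {
  theta_star <- rnorm(1, mean = theta_c, sd = jump_sd)  # propose
  a <- unnorm_post(theta_star) / unnorm_post(theta_c)   # acceptance ratio
  if (runif(1, 0, 1) < a) theta_star else theta_c       # accept or keep
}
```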

Proposal function. Idea: concentrate the sampling in the good areas.

[Animation: at each iteration N a value is proposed from the proposal function and accepted or rejected against the target distribution; A counts acceptances and AR = A/N is the acceptance rate. Over the frames shown, (N, A) progresses through (1,1), (2,2), (3,2), (4,3), (5,3), (6,4), (7,5), and in the long run a target of AR = 30-70% is indicated.]
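The bookkeeping from the animation is straightforward to add to the sampler. A sketch (run_chain and its arguments are our own names, building on metropolis_step above):

```r
# Run a Metropolis chain and track the acceptance rate AR = A/N.
# A proposal counts as accepted whenever the state actually changes
# (fine for continuous targets, where ties have probability zero).
run_chain <- function(theta0, unnorm_post, jump_sd, n_iter) {
  chain <- numeric(n_iter)
  chain[1] <- theta0
  A <- 0
  for (N in 2:n_iter) {
    chain[N] <- metropolis_step(chain[N - 1], unnorm_post, jump_sd)
    if (chain[N] != chain[N - 1]) A <- A + 1
  }
  list(chain = chain, acceptance_rate = A / (n_iter - 1))
}
```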

Convergence to the target distribution: do you have a representative sample of the target distribution (the posterior) yet? (YouTube: youtu.be/zl2lg_nfi80)

Jump distribution: for Metropolis, the jump distribution J must be SYMMETRIC: J(θ*|θc) = J(θc|θ*).


The most common jump distribution is the normal: J(θ*|θc) = N(θ*|θc, v). The user must set the variance v of the jump, typically by trial and error: tune it to get an acceptance rate of 30-70%. Low acceptance = decrease the variance (smaller steps); high acceptance = increase the variance (bigger steps).
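A sketch of this trial-and-error tuning (our own helper, using run_chain from above; the doubling/halving rule and the pilot length are illustrative assumptions, not from the slides):

```r
# Adjust the jump sd until the pilot acceptance rate lands in 30-70%.
tune_jump_sd <- function(theta0, unnorm_post, jump_sd = 1, n_pilot = 500) {
  for (k in 1:20) {                       # cap the number of pilot runs
    ar <- run_chain(theta0, unnorm_post, jump_sd, n_pilot)$acceptance_rate
    if (ar > 0.7)      jump_sd <- jump_sd * 2  # too timid: bigger steps
    else if (ar < 0.3) jump_sd <- jump_sd / 2  # too bold: smaller steps
    else break
  }
  jump_sd
}
```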

Speed of convergence is determined by the choice of a good proposal distribution.

Example: normal with known variance, unknown mean.
- Prior: N(53, 10000)
- Data: y = 43
- Known variance: 100
- Initial conditions: 3 chains starting at -100, 0, 100
- Jump distribution: normal
- Jump variance: 3, 10, 30
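This example can be run as a sketch with the helpers defined above (interpreting the prior N(53, 10000) as mean 53 and variance 10000; the seed and chain length are our choices):

```r
# Unnormalized posterior: likelihood for y = 43 (variance 100)
# times the N(53, 10000) prior.
unnorm_post <- function(theta) {
  dnorm(43, mean = theta, sd = sqrt(100)) *
    dnorm(theta, mean = 53, sd = sqrt(10000))
}

set.seed(1)
starts <- c(-100, 0, 100)                        # over-dispersed starts
fits <- lapply(starts, function(s)
  run_chain(s, unnorm_post, jump_sd = sqrt(10), n_iter = 5000))
sapply(fits, function(f) f$acceptance_rate)
# Re-run with jump_sd = sqrt(3) and sqrt(30) to compare against the
# acceptance rates shown on the next slide.
```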

[Trace plots for the three jump variances; acceptance rates: 90%, 70%, 12%.]

Higher autocorrelation means a lower effective sample size.
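These diagnostics can be computed with the coda package (a sketch, using the fits object from the example above):

```r
library(coda)

chains <- mcmc.list(lapply(fits, function(f) mcmc(f$chain)))
acf(fits[[1]]$chain)    # autocorrelation within one chain
effectiveSize(chains)   # effective sample size << total draws
gelman.diag(chains)     # Gelman-Rubin convergence diagnostic across chains
```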

Metropolis-Hastings: a generalization of Metropolis that allows an asymmetric jump distribution. The acceptance criterion becomes a = [p(θ*) / J(θ*|θc)] / [p(θc) / J(θc|θ*)]. Asymmetric jumps most commonly arise due to bounds on parameter values / non-normal jump distributions.
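For instance, a log-normal jump keeps proposals positive but is asymmetric, so the jump densities must enter the ratio. A sketch (the names are ours, mirroring metropolis_step above):

```r
# One Metropolis-Hastings step with an asymmetric log-normal jump,
# appropriate for a parameter bounded below by zero.
mh_step <- function(theta_c, unnorm_post, jump_sd) {
  theta_star <- rlnorm(1, meanlog = log(theta_c), sdlog = jump_sd)
  a <- (unnorm_post(theta_star) / dlnorm(theta_star, log(theta_c),    jump_sd)) /
       (unnorm_post(theta_c)    / dlnorm(theta_c,    log(theta_star), jump_sd))
  if (runif(1, 0, 1) < a) theta_star else theta_c
}
```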

[Figure: a symmetric proposal function satisfies J(θ*|θc) = J(θc|θ*); an asymmetric proposal function has J(θ*|θc) ≠ J(θc|θ*).]

Adjust the proposal to the shape of the posterior for effective sampling.

Multivariate example: bivariate normal, N2([0, 0], [[1, 0.5], [0.5, 1]]).
Option 1: draw from the joint distribution, J = N2([θ1*, θ2*] | [θ1c, θ2c], V).
Option 2: draw each parameter iteratively, J1 = N(θ1*|θ1c, v1), J2 = N(θ2*|θ2c, v2).
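Both options as a sketch in base R (the step functions and the proposal variance v are our own names; the target log-density is coded by hand; working on the log scale, log(runif(1)) < log-ratio is equivalent to the usual acceptance test):

```r
# Unnormalized log-density of the bivariate normal target.
Sigma_inv <- solve(matrix(c(1, 0.5, 0.5, 1), nrow = 2))
log_target <- function(th) -0.5 * drop(t(th) %*% Sigma_inv %*% th)

# Option 1: joint proposal, perturbing both components at once.
joint_step <- function(th, v) {
  prop <- th + rnorm(2, mean = 0, sd = sqrt(v))
  if (log(runif(1)) < log_target(prop) - log_target(th)) prop else th
}

# Option 2: iterative proposal, updating one component at a time.
iterative_step <- function(th, v) {
  for (j in 1:2) {
    prop <- th
    prop[j] <- th[j] + rnorm(1, mean = 0, sd = sqrt(v))
    if (log(runif(1)) < log_target(prop) - log_target(th)) th <- prop
  }
  th
}
```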

[Figures: sample paths and draws under the joint vs iterative proposal schemes.]

Hartig et al. 2011, Ecology Letters theoreticalecology.wordpress.com