Answers and expectations

Answers and expectations
For a function f(x) and distribution P(x), the expectation of f with respect to P is
$E_{P(x)}[f(x)] = \sum_x f(x)\,p(x)$
The expectation is the average of f when x is drawn from the probability distribution P.

The Monte Carlo principle
The expectation of f with respect to P can be approximated by
$E_{P(x)}[f(x)] \approx \frac{1}{n} \sum_{i=1}^{n} f(x_i)$
where the $x_i$ are sampled from P(x).
Example 1: the average number of spots on a die roll
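
A minimal sketch of the die example in Python (my code, assuming NumPy is available): the sample mean of f(x) = x under uniform rolls approaches the true expectation 3.5.

```python
import numpy as np

rng = np.random.default_rng(0)

# f(x) = x, P(x) uniform over {1, ..., 6}; the true expectation is 3.5
n = 10_000
rolls = rng.integers(1, 7, size=n)  # n samples x_i ~ P(x)
mc_estimate = rolls.mean()          # (1/n) * sum_i f(x_i)
print(mc_estimate)                  # close to 3.5
```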

The Monte Carlo principle
[Figure: average number of spots vs. number of rolls, converging by the law of large numbers: $\frac{1}{n} \sum_{i=1}^{n} f(x_i) \to E_{P(x)}[f(x)]$]

More formally
$\mu = E_{P(x)}[f(x)] \approx \frac{1}{n} \sum_{i=1}^{n} f(x_i) = \mu_{MC}$
$\mu_{MC}$ is consistent: $(\mu_{MC} - \mu) \to 0$ a.s. as $n \to \infty$
$\mu_{MC}$ is unbiased: $E[\mu_{MC}] = \mu$
$\mu_{MC}$ is asymptotically normal: $\sqrt{n}\,(\mu_{MC} - \mu) \to N(0, \sigma^2_{MC})$ in distribution, with $\sigma^2_{MC} = E_{P(x)}[(f(x) - E_{P(x)}[f(x)])^2]$

When simple Monte Carlo fails
Efficient algorithms for sampling exist only for a relatively small number of distributions.

Inverse cumulative distribution
[Figure: the cumulative distribution function $F(x) = \int_0^x p(x')\,dx'$ rising from 0 to 1]
(requires the CDF to be invertible)
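
A sketch of inverse-CDF sampling (my example, not the slides'): the exponential distribution has CDF $F(x) = 1 - e^{-\lambda x}$, which inverts in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Exponential(rate): F(x) = 1 - exp(-rate * x), so F^{-1}(u) = -log(1 - u) / rate
rate = 2.0
u = rng.uniform(0.0, 1.0, size=10_000)  # u ~ Uniform(0, 1)
samples = -np.log(1.0 - u) / rate       # x = F^{-1}(u) follows the target distribution
print(samples.mean())                   # close to 1 / rate = 0.5
```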

Rejection sampling
Want to sample from $f(\theta) = g(\theta) / \int g(\theta')\,d\theta'$.
Rejection sampling algorithm. For each sample, repeat until one θ is accepted:
1. sample a point θ from the known distribution s(θ);
2. sample y from the uniform distribution on [0, 1];
3. if $A y \le g(\theta)/s(\theta)$, accept θ.
Rejection sampling uses an easy-to-sample-from density s(θ). Requirement: g(θ)/s(θ) is bounded above by A.
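
A sketch of the algorithm above with an illustrative unnormalized target g(θ) of my choosing (a bimodal bump on [0, 1]), proposal s(θ) = Uniform(0, 1), and bound A on g(θ)/s(θ):

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized target g(theta); g/s <= A = 2 since s(theta) = 1 on [0, 1]
g = lambda theta: np.exp(-50 * (theta - 0.3) ** 2) + np.exp(-50 * (theta - 0.7) ** 2)
A = 2.0

def rejection_sample():
    while True:                        # repeat until one theta is accepted
        theta = rng.uniform(0.0, 1.0)  # 1. sample theta from s
        y = rng.uniform(0.0, 1.0)      # 2. sample y from Uniform(0, 1)
        if A * y <= g(theta):          # 3. accept if A*y <= g(theta)/s(theta)
            return theta

samples = np.array([rejection_sample() for _ in range(5_000)])
```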

When simple Monte Carlo fails
Efficient algorithms for sampling exist only for a relatively small number of distributions.
Sampling from distributions over large discrete state spaces is computationally expensive: a mixture model with n observations and k components, or an HMM with n observations and k states, has $k^n$ possibilities.
Sometimes we want to sample from distributions for which we only know the probability of each state up to a multiplicative constant.

Why Bayesian inference is hard
$P(h \mid d) = \frac{P(d \mid h)\,P(h)}{\sum_{h' \in H} P(d \mid h')\,P(h')}$
Evaluating the posterior probability of a hypothesis requires summing over all hypotheses (statistical physics: computing the partition function).

Modern Monte Carlo methods
Sampling schemes for distributions with large state spaces, known up to a multiplicative constant.
Two example approaches: importance sampling and Markov chain Monte Carlo.

Importance sampling
Basic idea: generate from the wrong distribution, then assign weights to samples to correct for this.
$E_{p(x)}[f(x)] = \int f(x)\,p(x)\,dx = \int f(x)\,\frac{p(x)}{q(x)}\,q(x)\,dx \approx \frac{1}{n} \sum_{i=1}^{n} f(x_i)\,\frac{p(x_i)}{q(x_i)}$ for $x_i \sim q(x)$
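
A sketch with densities of my choosing: estimating $E_p[x^2] = 1$ for a standard normal target p using a wider normal proposal q.

```python
import numpy as np

rng = np.random.default_rng(0)

f = lambda x: x ** 2                                    # E_p[f] = 1 under p = N(0, 1)
p = lambda x: np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)  # target density N(0, 1)
q_sd = 2.0                                              # proposal q = N(0, 2), wider than p
q = lambda x: np.exp(-x ** 2 / (2 * q_sd ** 2)) / (q_sd * np.sqrt(2 * np.pi))

n = 10_000
x = rng.normal(0.0, q_sd, size=n)  # x_i ~ q(x)
w = p(x) / q(x)                    # importance weights p(x_i) / q(x_i)
print(np.mean(f(x) * w))           # (1/n) sum_i f(x_i) w_i, close to 1
```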

Importance sampling works when sampling from the proposal is easy but sampling from the target is hard.

An alternative scheme
$E_{p(x)}[f(x)] \approx \frac{1}{n} \sum_{i=1}^{n} f(x_i)\,\frac{p(x_i)}{q(x_i)}$ for $x_i \sim q(x)$
$E_{p(x)}[f(x)] \approx \frac{\sum_{i=1}^{n} f(x_i)\,\frac{p(x_i)}{q(x_i)}}{\sum_{i=1}^{n} \frac{p(x_i)}{q(x_i)}}$ for $x_i \sim q(x)$
The second (self-normalized) form works when p(x) is known only up to a multiplicative constant.
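
The same sketch when p is known only up to a constant: drop the normalizing constant from p and divide by the summed weights, which cancels it.

```python
import numpy as np

rng = np.random.default_rng(0)

f = lambda x: x ** 2
p_tilde = lambda x: np.exp(-x ** 2 / 2)  # N(0, 1) without its normalizing constant
q_sd = 2.0
q = lambda x: np.exp(-x ** 2 / (2 * q_sd ** 2)) / (q_sd * np.sqrt(2 * np.pi))

x = rng.normal(0.0, q_sd, size=10_000)  # x_i ~ q(x)
w = p_tilde(x) / q(x)                   # weights correct only up to a constant
print(np.sum(f(x) * w) / np.sum(w))     # self-normalization cancels it; close to 1
```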

More formally
$\mu_{IS}$ is consistent: $(\mu_{IS} - \mu) \to 0$ a.s. as $n \to \infty$
$\mu_{IS}$ is asymptotically normal, with $\sigma^2_{IS} = E_{p(x)}\left[(f(x) - E_{p(x)}[f(x)])^2\,\frac{p(x)}{q(x)}\right]$
$\mu_{IS}$ is biased, with $\mu_{IS} - \mu = \frac{1}{n}\left(E_{p(x)}[f(x)]\,E_{p(x)}\!\left[\frac{p(x)}{q(x)}\right] - E_{p(x)}\!\left[f(x)\,\frac{p(x)}{q(x)}\right]\right)$

Optimal importance sampling
The asymptotic variance is $\sigma^2_{IS} = E_{p(x)}\left[(f(x) - E_{p(x)}[f(x)])^2\,\frac{p(x)}{q(x)}\right]$
This is minimized by $q(x) \propto |f(x) - E_{p(x)}[f(x)]|\,p(x)$

Likelihood weighting
A particularly simple form of importance sampling for posterior distributions: use the prior as the proposal distribution.
Weights: $\frac{p(\theta \mid d)}{p(\theta)} = \frac{p(d \mid \theta)\,p(\theta)}{p(d)\,p(\theta)} = \frac{p(d \mid \theta)}{p(d)} \propto p(d \mid \theta)$

Likelihood weighting
Generate samples of all variables except the observed variables; assign weights proportional to the probability of the observed data given the values in the sample.
[Figure: a Bayes net over variables X_1, X_2, X_3, X_4]
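
A sketch under a toy model of my choosing: a uniform (Beta(1, 1)) prior on a coin's bias θ and binomial data; sample θ from the prior and weight by the likelihood p(d | θ).

```python
import numpy as np

rng = np.random.default_rng(0)

heads, flips = 7, 10                                 # observed data d
theta = rng.uniform(0.0, 1.0, size=10_000)           # theta_i ~ prior (the proposal)
w = theta ** heads * (1 - theta) ** (flips - heads)  # weights proportional to p(d | theta)

print(np.sum(theta * w) / np.sum(w))  # posterior mean, close to (7+1)/(10+2) = 0.667
```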

Importance sampling
A general scheme for sampling from complex distributions that have simpler relatives.
Gives simple methods for sampling from posterior distributions in some cases (easy to sample from the prior; prior and posterior are close).
Can be more efficient than simple Monte Carlo, particularly for tail probabilities.
Also provides a solution to the question of how we can update beliefs as data come in.

Particle filtering
[Figure: a state-space model with hidden states s_1 → s_2 → s_3 → s_4 and observations d_1, d_2, d_3, d_4]
We want to generate samples from $P(s_4 \mid d_1, \dots, d_4)$:
$P(s_4 \mid d_1, \dots, d_4) \propto P(d_4 \mid s_4)\,P(s_4 \mid d_1, \dots, d_3) = P(d_4 \mid s_4) \sum_{s_3} P(s_4 \mid s_3)\,P(s_3 \mid d_1, \dots, d_3)$
We can use likelihood weighting if we can sample from $P(s_4 \mid s_3)$ and $P(s_3 \mid d_1, \dots, d_3)$.

Particle filtering
$P(s_4 \mid d_1, \dots, d_4) \propto P(d_4 \mid s_4) \sum_{s_3} P(s_4 \mid s_3)\,P(s_3 \mid d_1, \dots, d_3)$
[Figure: samples from $P(s_3 \mid d_1, \dots, d_3)$ → sample from $P(s_4 \mid s_3)$ → samples from $P(s_4 \mid d_1, \dots, d_3)$ → weight by $P(d_4 \mid s_4)$ → weighted atoms for $P(s_4 \mid d_1, \dots, d_4)$ → resample → samples from $P(s_4 \mid d_1, \dots, d_4)$]
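
A sketch of one filtering step under an assumed random-walk state model and Gaussian observation model (my choices, not the slides'): propagate through P(s_4 | s_3), weight by P(d_4 | s_4), then resample.

```python
import numpy as np

rng = np.random.default_rng(0)

def filter_step(particles, d, trans_sd=0.5, obs_sd=0.3):
    # sample from P(s_4 | s_3): assumed random-walk dynamics
    proposed = particles + rng.normal(0.0, trans_sd, size=len(particles))
    # weight by P(d_4 | s_4): assumed Gaussian observation model
    w = np.exp(-(d - proposed) ** 2 / (2 * obs_sd ** 2))
    w /= w.sum()
    # resample: weighted atoms -> unweighted samples from P(s_4 | d_1, ..., d_4)
    idx = rng.choice(len(proposed), size=len(proposed), p=w)
    return proposed[idx]

particles = rng.normal(0.0, 1.0, size=1_000)  # stand-in samples from P(s_3 | d_1, ..., d_3)
particles = filter_step(particles, d=1.2)
```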

Tweaks and variations
If we can enumerate the values of s_4, we can sample from $P(s_4 \mid d_1, \dots, d_4) \propto P(d_4 \mid s_4) \sum_{i=1}^{n} P(s_4 \mid s_3^{(i)})$
No need to resample at every step, since we can accumulate weights over multiple observations; resampling reduces diversity in the samples and is only necessary when the variance of the weights is large (one standard criterion is sketched below).
Stratification and clever resampling schemes reduce variance (Fearnhead, 2001).
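
One standard criterion for "variance of the weights is large" (my addition; the slide does not name it) is the effective sample size:

```python
import numpy as np

def effective_sample_size(w):
    """ESS = (sum w)^2 / sum(w^2): n for uniform weights, 1 when one weight dominates."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

# A common rule of thumb: resample only when ESS < n / 2.
```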

The promise of particle filters
People need to be able to update probability distributions over large hypothesis spaces as more data become available.
Particle filters provide a way to do this with limited computing resources: maintain a fixed, finite number of samples.
Not just for dynamic models: they can work with a fixed set of hypotheses, although this requires some further tricks for maintaining diversity.

Markov chain Monte Carlo
Basic idea: construct a Markov chain that will converge to the target distribution, and draw samples from that chain.
Uses just something proportional to the target distribution (good for Bayesian inference!).
Can work in state spaces of arbitrary (including unbounded) size (good for nonparametric Bayes).

Markov chains
[Figure: a chain of variables x^{(1)} → x^{(2)} → … → x^{(t)} → …]
Transition matrix $T = P(x^{(t+1)} \mid x^{(t)})$
Variables $x^{(t+1)}$ are independent of all previous variables given the immediate predecessor $x^{(t)}$.

An example: card shuffling
Each state x^{(t)} is a permutation of a deck of cards (there are 52! permutations).
The transition matrix T indicates how likely one permutation is to become another; the transition probabilities are determined by the shuffling procedure (riffle shuffle, overhand, one card).

Convergence of Markov chains
Why do we shuffle cards? Convergence to a uniform distribution takes only 7 riffle shuffles.
Other Markov chains will also converge to a stationary distribution, if certain simple conditions are satisfied (called ergodicity), e.g. every state can be reached in some number of steps from every other state.

Markov chain Monte Carlo
[Figure: a chain of variables x^{(1)} → x^{(2)} → … → x^{(t)} → …]
Transition matrix $T = P(x^{(t+1)} \mid x^{(t)})$
States of the chain are the variables of interest; the transition matrix is chosen to give the target distribution as its stationary distribution.

Metropolis-Hastings algorithm
Transitions have two parts:
proposal distribution: $Q(x^{(t+1)} \mid x^{(t)})$
acceptance: take proposals with probability
$A(x^{(t)}, x^{(t+1)}) = \min\left(1,\ \frac{P(x^{(t+1)})\,Q(x^{(t)} \mid x^{(t+1)})}{P(x^{(t)})\,Q(x^{(t+1)} \mid x^{(t)})}\right)$
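
A sketch with a symmetric Gaussian proposal, so the Q terms cancel (the Metropolis special case), and an unnormalized target of my choosing:

```python
import numpy as np

rng = np.random.default_rng(0)

p_tilde = lambda x: np.exp(-x ** 2 / 2)  # target known up to a constant (here N(0, 1))

def metropolis(n_steps, step_sd=1.0):
    x, chain = 0.0, []
    for _ in range(n_steps):
        proposal = x + rng.normal(0.0, step_sd)            # symmetric Q(x' | x)
        accept = min(1.0, p_tilde(proposal) / p_tilde(x))  # A(x, x'); Q terms cancel
        if rng.uniform() < accept:
            x = proposal                                   # accept the proposal
        chain.append(x)                                    # on rejection, x repeats
    return np.array(chain)

chain = metropolis(50_000)
print(chain.mean(), chain.std())  # approach 0 and 1 as the chain converges
```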

Metropolis-Hastings algorithm
[Figure sequence: a target density p(x) with proposed moves along the chain; one proposal is accepted with $A(x^{(t)}, x^{(t+1)}) = 0.5$, another with $A(x^{(t)}, x^{(t+1)}) = 1$]

Metropolis-Hastings in a slide

Metropolis-Hastings algorithm
For the right stationary distribution, we want $\sum_x P(x)\,T(x \to y) = P(y)$ for all y.
A sufficient condition is detailed balance: $P(x)\,T(x \to y) = P(y)\,T(y \to x)$.

Metropolis-Hastings algorithm
For $x \ne y$, $P(x)\,T(x \to y) = P(x)\,Q(y \mid x) \min\left(1,\ \frac{P(y)\,Q(x \mid y)}{P(x)\,Q(y \mid x)}\right) = \min\big(P(x)\,Q(y \mid x),\ P(y)\,Q(x \mid y)\big)$
This is symmetric in (x, y) and thus satisfies detailed balance.
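
A small numeric check of this claim (my construction): build the Metropolis-Hastings transition matrix for a three-state target and verify detailed balance and stationarity.

```python
import numpy as np

P = np.array([0.2, 0.3, 0.5])   # target distribution over 3 states
Q = np.full((3, 3), 1.0 / 3.0)  # proposal Q(y | x), here uniform

# T(x -> y) = Q(y | x) * min(1, P(y) Q(x | y) / (P(x) Q(y | x))) for y != x
T = np.zeros((3, 3))
for x in range(3):
    for y in range(3):
        if x != y:
            T[x, y] = Q[x, y] * min(1.0, (P[y] * Q[y, x]) / (P[x] * Q[x, y]))
    T[x, x] = 1.0 - T[x].sum()  # rejected proposals keep the chain at x

flow = P[:, None] * T             # flow[x, y] = P(x) T(x -> y)
assert np.allclose(flow, flow.T)  # detailed balance: P(x) T(x->y) = P(y) T(y->x)
assert np.allclose(P @ T, P)      # hence P is the stationary distribution
```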