Sampling Methods: Rejection Sampling, Importance Sampling, Markov Chain Monte Carlo. Oliver Schulte - CMPT 419/726. Bishop PRML Ch. 11.


Sampling Methods. Oliver Schulte - CMPT 419/726. Bishop PRML Ch. 11

Recall: Inference for General Graphs. The junction tree algorithm is an exact inference method for arbitrary graphs: a particular tree structure defined over cliques of variables. Inference ends up being exponential in the maximum clique size, and is therefore slow in many cases. Sampling methods: represent the desired distribution with a set of samples; as more samples are used, we obtain a more accurate representation.

Outline Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo


Sampling. The fundamental problem we address in this lecture is how to obtain samples from a probability distribution p(z). This could be a conditional distribution p(z|e). We often wish to evaluate expectations such as $E[f] = \int f(z)p(z)\,dz$, e.g. the mean when f(z) = z. For complicated p(z) this is difficult to do exactly, so approximate as $\hat{f} = \frac{1}{L}\sum_{l=1}^{L} f(z^{(l)})$, where $\{z^{(l)}, l = 1, \ldots, L\}$ are independent samples from p(z).

Sampling. [Figure: a density p(z) and a function f(z) plotted over z.] Approximate $\hat{f} = \frac{1}{L}\sum_{l=1}^{L} f(z^{(l)})$, where $\{z^{(l)}\}$ are independent samples from p(z). Demo on Excel sheet.
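
To make the estimator concrete, here is a minimal NumPy sketch (not the Excel demo from the slides): it estimates $E[z^2]$ under a standard Gaussian, where the true value is 1, by averaging f over independent draws.

```python
import numpy as np

rng = np.random.default_rng(0)

# Estimate E[f] = E[z^2] under p(z) = N(0, 1) by Monte Carlo:
# f_hat = (1/L) * sum_l f(z_l), with z_l drawn i.i.d. from p(z).
L = 100_000
z = rng.standard_normal(L)   # independent samples from p(z)
f_hat = np.mean(z ** 2)      # Monte Carlo estimate of E[z^2]

print(f_hat)                 # close to the true value 1.0
```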

Bayesian Networks - Generating Fair Samples. [Figure: the Cloudy / Sprinkler / Rain / Wet Grass Bayesian network with its conditional probability tables; from Russell and Norvig, AIMA.] How can we generate a fair set of samples from this BN?

Sampling from Bayesian Networks. Sampling from discrete Bayesian networks with no observations is straightforward, using ancestral sampling. The Bayesian network specifies a factorization of the joint distribution: $P(z_1, \ldots, z_n) = \prod_{i=1}^{n} P(z_i \mid \mathrm{pa}(z_i))$. Sample in order, parents before children; this is possible because the graph is a DAG. Choose a value for $z_i$ from $p(z_i \mid \mathrm{pa}(z_i))$.
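
A sketch of ancestral sampling on the sprinkler network. The CPT entries not legible in the transcript are filled in with the standard AIMA values, so treat the numbers as assumptions; the point is the parents-before-children sampling order.

```python
import numpy as np

rng = np.random.default_rng(0)

# CPTs for the sprinkler network. Entries not legible in the slides are
# filled in with the standard AIMA values (an assumption).
P_C = 0.5
P_S_given_C = {True: 0.10, False: 0.50}
P_R_given_C = {True: 0.80, False: 0.20}
P_W_given_SR = {(True, True): 0.99, (True, False): 0.90,
                (False, True): 0.90, (False, False): 0.01}

def ancestral_sample():
    """Sample all variables in topological order: parents before children."""
    c = rng.random() < P_C
    s = rng.random() < P_S_given_C[c]
    r = rng.random() < P_R_given_C[c]
    w = rng.random() < P_W_given_SR[(s, r)]
    return {"cloudy": c, "sprinkler": s, "rain": r, "wet_grass": w}

samples = [ancestral_sample() for _ in range(100_000)]
# The fraction of samples with rain=True estimates the marginal p(rain).
print(np.mean([smp["rain"] for smp in samples]))  # ~0.5*0.8 + 0.5*0.2 = 0.5
```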

Sampling From the Empty Network - Example. [Figure: successive slides step through ancestral sampling on the sprinkler network, sampling Cloudy, then Sprinkler, then Rain, then Wet Grass; from Russell and Norvig, AIMA.]


Ancestral Sampling. This sampling procedure is fair: the fraction of samples with a particular value tends towards the joint probability of that value.

Sampling Marginals. [Figure: the same sprinkler network.] Note that this procedure can be applied to generate samples for marginals as well: simply discard the portions of each sample which are not needed. E.g. for the marginal p(rain), the sample (cloudy = t, sprinkler = f, rain = t, wg = t) just becomes (rain = t). This is still a fair sampling procedure.

Other Problems. Continuous variables? Gaussians are okay: Box-Muller and other methods. More complex distributions? Undirected graphs (MRFs)?

Outline Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Rejection Sampling. [Figure: a density p(z) and a function f(z) plotted over z.] Consider the case of an arbitrary, continuous p(z). How can we draw samples from it? Assume we can evaluate p(z) up to some normalization constant: $p(z) = \frac{1}{Z_p}\tilde{p}(z)$, where $\tilde{p}(z)$ can be efficiently evaluated (e.g. an MRF).

Proposal Distribution. [Figure: the target $\tilde{p}(z)$ under an envelope $kq(z)$, with a proposed point $z_0$ and a uniform draw $u_0$ under $kq(z_0)$.] Let's also assume that we have some simpler distribution q(z), called a proposal distribution, from which we can easily draw samples, e.g. q(z) is a Gaussian. We can then draw samples from q(z) and use these. But these wouldn't be fair samples from p(z)?!

Comparison Function and Rejection. [Figure: the same envelope picture.] Introduce a constant k such that $kq(z) \geq \tilde{p}(z)$ for all z. Rejection sampling procedure: generate $z_0$ from q(z); generate $u_0$ uniformly from $[0, kq(z_0)]$; if $u_0 > \tilde{p}(z_0)$, reject the sample $z_0$, otherwise keep it. The original samples are uniform in the grey region; the kept samples are uniform in the white region, hence samples from p(z).
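
A minimal sketch of this procedure, assuming a made-up bimodal $\tilde{p}(z)$, a Gaussian proposal, and a hand-picked k that satisfies $kq(z) \geq \tilde{p}(z)$ everywhere:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized target: a bimodal density we can evaluate but not sample directly.
def p_tilde(z):
    return np.exp(-0.5 * (z - 2) ** 2) + 0.5 * np.exp(-0.5 * (z + 2) ** 2)

# Proposal q(z) = N(0, sigma^2); its wider tails make the envelope possible.
sigma = 3.0
def q(z):
    return np.exp(-0.5 * (z / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

k = 15.0  # a loose bound; any k with k*q(z) >= p_tilde(z) for all z works

def rejection_sample(n):
    samples = []
    while len(samples) < n:
        z0 = rng.normal(0.0, sigma)       # draw z0 from q(z)
        u0 = rng.uniform(0.0, k * q(z0))  # uniform on [0, k*q(z0)]
        if u0 <= p_tilde(z0):             # keep iff the point falls under p_tilde
            samples.append(z0)
    return np.array(samples)

zs = rejection_sample(10_000)
```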

Rejection Sampling Analysis. How likely are we to keep samples? The probability a sample is accepted is $p(\text{accept}) = \int \frac{\tilde{p}(z)}{kq(z)}\, q(z)\,dz = \frac{1}{k}\int \tilde{p}(z)\,dz$. Smaller k is better, subject to $kq(z) \geq \tilde{p}(z)$ for all z. If q(z) is similar to p(z), this is easier. In high-dimensional spaces the acceptance ratio falls off exponentially, and finding a suitable k is challenging.

Outline Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Discretization. Importance sampling is a sampling technique for computing expectations: $E[f] = \int f(z)p(z)\,dz$. We could approximate this using discretization over a uniform grid: $E[f] \approx \sum_{l=1}^{L} f(z^{(l)})\,p(z^{(l)})$, cf. a Riemann sum. Much wasted computation, and exponential scaling in dimension. Instead, again use a proposal distribution instead of a uniform grid.

Importance Sampling. [Figure: densities p(z) and q(z) and a function f(z) plotted over z.] Approximate the expectation by drawing points from q(z): $E[f] = \int f(z)p(z)\,dz = \int f(z)\frac{p(z)}{q(z)}q(z)\,dz \approx \frac{1}{L}\sum_{l=1}^{L} f(z^{(l)})\,\frac{p(z^{(l)})}{q(z^{(l)})}$. The quantities $p(z^{(l)})/q(z^{(l)})$ are known as importance weights; they correct for the use of the wrong distribution q(z) in sampling.
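
A small sketch of the estimator, with an assumed target p = N(2, 1) and proposal q = N(0, 4), so the true mean being estimated is 2:

```python
import numpy as np

rng = np.random.default_rng(0)
L = 100_000

# Target p(z) = N(2, 1); we want E[f] with f(z) = z, i.e. the mean (true value 2).
def p(z):
    return np.exp(-0.5 * (z - 2) ** 2) / np.sqrt(2 * np.pi)

# Proposal q(z) = N(0, 2^2), which we can sample from directly.
def q(z):
    return np.exp(-0.5 * (z / 2) ** 2) / (2 * np.sqrt(2 * np.pi))

z = rng.normal(0.0, 2.0, size=L)  # samples drawn from q, not from p
w = p(z) / q(z)                   # importance weights correct for that
f_hat = np.mean(z * w)            # estimates E_p[z]
print(f_hat)                      # ~2.0
```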

Sampling Importance Resampling. Note that importance sampling (e.g. likelihood weighted sampling) gives an approximation to an expectation, not samples. But samples can be obtained using these ideas: sampling-importance-resampling uses a proposal distribution q(z) to generate samples. Unlike rejection sampling, no parameter k is needed.

SIR - Algorithm. The sampling-importance-resampling algorithm has two stages. Sampling: draw samples $z^{(1)}, \ldots, z^{(L)}$ from the proposal distribution q(z). Importance resampling: put weights on the samples, $w_l = \frac{\tilde{p}(z^{(l)})/q(z^{(l)})}{\sum_m \tilde{p}(z^{(m)})/q(z^{(m)})}$, then draw samples from the discrete set $z^{(1)}, \ldots, z^{(L)}$ according to the weights $w_l$. This two-stage process is correct in the limit as $L \to \infty$.
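
A sketch of the two SIR stages under the same assumed bimodal $\tilde{p}(z)$ as before, with a Gaussian proposal; note that only the unnormalized $\tilde{p}$ is ever evaluated:

```python
import numpy as np

rng = np.random.default_rng(0)
L = 100_000

# Unnormalized target (the same assumed bimodal shape as in the earlier sketch).
def p_tilde(z):
    return np.exp(-0.5 * (z - 2) ** 2) + 0.5 * np.exp(-0.5 * (z + 2) ** 2)

# Stage 1 (sampling): draw z^(1..L) from the proposal q = N(0, 3^2).
z = rng.normal(0.0, 3.0, size=L)
q_pdf = np.exp(-0.5 * (z / 3.0) ** 2) / (3.0 * np.sqrt(2 * np.pi))

# Stage 2 (importance resampling): normalize the weights, then resample
# from the discrete set {z^(l)} according to those weights.
w = p_tilde(z) / q_pdf
w /= w.sum()
resampled = rng.choice(z, size=L, replace=True, p=w)  # approx. samples from p
```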

Outline Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo

Markov Chain Monte Carlo. Markov chain Monte Carlo (MCMC) methods also use a proposal distribution to generate samples from another distribution. Unlike the previous methods, we keep track of the samples generated: $z^{(1)}, \ldots, z^{(\tau)}$. The proposal distribution depends on the current state: $q(z \mid z^{(\tau)})$. Intuitively, we are walking around in state space, and each step depends only on the current state.

Metropolis Algorithm. The basic Metropolis algorithm assumes the proposal distribution is symmetric: $q(z_A \mid z_B) = q(z_B \mid z_A)$. Simple algorithm for walking around in state space: draw a sample $z^\star \sim q(z \mid z^{(\tau)})$; accept the sample with probability $A(z^\star, z^{(\tau)}) = \min\left(1, \frac{\tilde{p}(z^\star)}{\tilde{p}(z^{(\tau)})}\right)$; if accepted, $z^{(\tau+1)} = z^\star$, else $z^{(\tau+1)} = z^{(\tau)}$. Note that if $z^\star$ is better than $z^{(\tau)}$, it is always accepted. Every iteration produces a sample, though sometimes it is the same as the previous one; contrast with rejection sampling.
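
A minimal sketch of the Metropolis loop with a symmetric random-walk proposal, on the assumed unnormalized bimodal target from earlier; note that a sample is recorded every iteration, accepted or not:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized target; the symmetric proposal q(z*|z) = N(z, step^2) means
# the proposal terms cancel in the acceptance probability.
def p_tilde(z):
    return np.exp(-0.5 * (z - 2) ** 2) + 0.5 * np.exp(-0.5 * (z + 2) ** 2)

def metropolis(n_steps, step=1.0, z0=0.0):
    z = z0
    chain = []
    for _ in range(n_steps):
        z_star = rng.normal(z, step)                # propose from q(.|z)
        a = min(1.0, p_tilde(z_star) / p_tilde(z))  # acceptance probability
        if rng.random() < a:                        # accept the move ...
            z = z_star
        chain.append(z)                             # ... else keep current z
    return np.array(chain)

chain = metropolis(50_000)
samples = chain[1_000:]  # discard burn-in before using the samples
```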


Metropolis Example. [Figure: a Metropolis random walk over the contours of an anisotropic Gaussian; red lines show rejected moves, green lines show accepted moves.] p(z) is an anisotropic Gaussian; the proposal distribution q(z) is an isotropic Gaussian. (Isotropic = covariance matrix proportional to the identity.) As $\tau \to \infty$, the distribution of $z^{(\tau)}$ tends to p(z). True if $q(z_A \mid z_B) > 0$ - ergodicity. In practice, burn in the chain: collect samples only after some initial iterations, to get past the initial state.

Metropolis Example - Graphical Model. [Figure: the sprinkler network with its CPTs; from Russell and Norvig, AIMA.] Consider running the Metropolis algorithm to draw samples from p(cloudy, rain | spr = t, wg = t). Define $q(z \mid z^{(\tau)})$ to be: uniformly pick one of cloudy, rain; uniformly reset its value.

Metropolis Example. [Figure: the four joint states of (cloudy, rain), shown as four copies of the network.] Walk around in this state space, keeping track of how many times each state occurs.

Metropolis-Hastings Algorithm. A generalization of the previous algorithm to asymmetric proposal distributions is known as the Metropolis-Hastings algorithm. Accept a step with probability $A(z^\star, z^{(\tau)}) = \min\left(1, \frac{\tilde{p}(z^\star)\,q(z^{(\tau)} \mid z^\star)}{\tilde{p}(z^{(\tau)})\,q(z^\star \mid z^{(\tau)})}\right)$. Intuition: the extra factor is the ratio between the probability of moving from the new state back to the current state (good) and the probability of moving to the new state from the current state (bad).
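
The only change from the Metropolis sketch above is the acceptance probability. A hypothetical helper making the correction explicit, where q_pdf(a, b) evaluates the proposal density q(a | b):

```python
def mh_accept_prob(p_tilde, q_pdf, z_star, z_tau):
    """Metropolis-Hastings acceptance probability for an asymmetric proposal.

    The extra ratio q(z_tau | z_star) / q(z_star | z_tau) corrects for the
    asymmetry and reduces to 1 (plain Metropolis) when q is symmetric.
    """
    ratio = (p_tilde(z_star) * q_pdf(z_tau, z_star)) / \
            (p_tilde(z_tau) * q_pdf(z_star, z_tau))
    return min(1.0, ratio)
```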

Gibbs Sampling. A simple coordinate-wise MCMC method. Given a distribution $p(z) = p(z_1, \ldots, z_M)$, sample each variable in turn (in either a pre-defined or random order): sample $z_1^{(\tau+1)} \sim p(z_1 \mid z_2^{(\tau)}, z_3^{(\tau)}, \ldots, z_M^{(\tau)})$; sample $z_2^{(\tau+1)} \sim p(z_2 \mid z_1^{(\tau+1)}, z_3^{(\tau)}, \ldots, z_M^{(\tau)})$; ...; sample $z_M^{(\tau+1)} \sim p(z_M \mid z_1^{(\tau+1)}, z_2^{(\tau+1)}, \ldots, z_{M-1}^{(\tau+1)})$. These conditionals are easy if the Markov blanket is small (e.g. in an MRF with small cliques) and of a form amenable to sampling.

Gibbs Sampling - Example. [Figure: a correlated bivariate Gaussian (red) over $(z_1, z_2)$. The marginal distributions have width L; the conditional distributions (green) have width l; the step size for Gibbs sampling (blue) is of order l.]
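
A sketch of Gibbs sampling on this example: for a standard bivariate Gaussian with correlation rho, each full conditional is itself Gaussian, $z_i \mid z_j \sim N(\rho z_j,\, 1 - \rho^2)$, so each Gibbs step is an exact draw. With rho near 1 the chain takes steps of width l while the marginal has width L, illustrating the slow mixing in the figure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Standard bivariate Gaussian with correlation rho: each full conditional is
# z_i | z_j ~ N(rho * z_j, 1 - rho^2), so Gibbs steps are exact draws.
rho = 0.95

def gibbs(n_steps):
    z1, z2 = 0.0, 0.0
    chain = np.empty((n_steps, 2))
    for t in range(n_steps):
        z1 = rng.normal(rho * z2, np.sqrt(1 - rho ** 2))  # sample z1 | z2
        z2 = rng.normal(rho * z1, np.sqrt(1 - rho ** 2))  # sample z2 | z1
        chain[t] = (z1, z2)
    return chain

chain = gibbs(50_000)
print(np.corrcoef(chain[5_000:].T))  # empirical correlation ~ rho
```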

Gibbs Sampling Example - Graphical Model. [Figure: the sprinkler network with its CPTs; from Russell and Norvig, AIMA.] Consider running Gibbs sampling on p(cloudy, rain | spr = t, wg = t). $q(z \mid z^{(\tau)})$: pick one of cloudy, rain, and reset its value according to p(cloudy | rain, spr, wg) (or p(rain | cloudy, spr, wg)). This is often easy: we only need to look at the Markov blanket.

Conclusion. Particle filtering is a sampling method for temporal models (e.g., hidden Markov models). Readings: Ch. 11.1-11.3 (we skipped much of it). Sampling methods use proposal distributions to obtain samples from complicated distributions. Different methods use different ways of correcting for the proposal distribution not matching the desired distribution. In practice, effectiveness relies on having a good proposal distribution, one which matches the desired distribution well.