Sampling Methods. Bishop PRML Ch. 11. Alireza Ghane.

Recall: Inference for General Graphs. The junction tree algorithm is an exact inference method for arbitrary graphs: a particular tree structure defined over cliques of variables. Inference ends up being exponential in the maximum clique size, and is therefore slow in many cases. Sampling methods instead represent the desired distribution with a set of samples; as more samples are used, we obtain a more accurate representation.

Outline: Sampling; Rejection Sampling; Importance Sampling; Markov Chain Monte Carlo.

Sampling. The fundamental problem we address in this lecture is how to obtain samples from a probability distribution p(z); this could be a conditional distribution p(z | e). We often wish to evaluate expectations such as $E[f] = \int f(z)\,p(z)\,dz$, e.g. the mean when f(z) = z. For complicated p(z) this is difficult to do exactly, so we approximate it as $\hat{f} = \frac{1}{L} \sum_{l=1}^{L} f(z^{(l)})$, where $\{z^{(l)}, l = 1, \ldots, L\}$ are independent samples from p(z).
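
A minimal Python sketch of this estimator, with illustrative choices p(z) = N(0, 1) and f(z) = z^2, so the true expectation is 1:

import numpy as np
rng = np.random.default_rng(0)
L = 10000
z = rng.standard_normal(L)      # L independent samples z^(l) ~ p(z) = N(0, 1)
f_hat = np.mean(z ** 2)         # f_hat = (1/L) * sum_l f(z^(l))
print(f_hat)                    # close to the true expectation E[z^2] = 1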

Sampling. [Figure: a distribution p(z) and a function f(z) plotted against z.] Approximate $\hat{f} = \frac{1}{L} \sum_{l=1}^{L} f(z^{(l)})$, where $\{z^{(l)}, l = 1, \ldots, L\}$ are independent samples from p(z).

Bayesian Networks: Generating Fair Samples. [Figure: the Cloudy / Sprinkler / Rain / Wet Grass Bayesian network with its conditional probability tables, from Russell and Norvig, AIMA.] How can we generate a fair set of samples from this BN?

Sampling from Bayesian Networks. Sampling from discrete Bayesian networks with no observations is straightforward, using ancestral sampling. The Bayesian network specifies a factorization of the joint distribution: $P(z_1, \ldots, z_n) = \prod_{i=1}^{n} P(z_i \mid \mathrm{pa}(z_i))$. Sample in order, parents before children (possible because the graph is a DAG), choosing a value for each $z_i$ from $p(z_i \mid \mathrm{pa}(z_i))$.
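
A minimal ancestral-sampling sketch for the sprinkler network. P(C), P(S | C) and P(R | C) follow the CPT fragments on the slide; the Wet Grass entries that are not legible here (0.90, 0.90, 0.01) are assumed from the standard AIMA table:

import numpy as np
rng = np.random.default_rng(0)
def sample_once():
    # Sample parents before children, each from p(z_i | pa(z_i)).
    c = rng.random() < 0.5                       # P(C) = 0.5
    s = rng.random() < (0.1 if c else 0.5)       # P(S | C)
    r = rng.random() < (0.8 if c else 0.2)       # P(R | C)
    p_w = {(True, True): 0.99, (True, False): 0.90,
           (False, True): 0.90, (False, False): 0.01}[(s, r)]
    w = rng.random() < p_w                       # P(W | S, R)
    return c, s, r, w                            # one fair joint sample
samples = [sample_once() for _ in range(10000)]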

Sampling From an Empty Network: Example. [Figure: the sprinkler network, from Russell and Norvig, AIMA; a sequence of slides steps through one ancestral sample, drawing Cloudy first, then Sprinkler and Rain given Cloudy, then Wet Grass given Sprinkler and Rain.]

Ancestral Sampling. This sampling procedure is fair: the fraction of samples with a particular value tends towards the joint probability of that value. Define $S_{PS}(z_1, \ldots, z_n)$ to be the probability of generating the event $(z_1, \ldots, z_n)$; this is equal to $p(z_1, \ldots, z_n)$ due to the semantics of the Bayesian network. Define $N_{PS}(z_1, \ldots, z_n)$ to be the number of times we generate the event $(z_1, \ldots, z_n)$ in N runs. Then $\lim_{N \to \infty} N_{PS}(z_1, \ldots, z_n) / N = S_{PS}(z_1, \ldots, z_n) = p(z_1, \ldots, z_n)$.

Sampling Marginals. [Figure: the sprinkler network with its CPTs.] Note that this procedure can be applied to generate samples of marginals as well: simply discard the portions of each sample which are not needed. E.g. for the marginal p(rain), the sample (cloudy = t, sprinkler = f, rain = t, wg = t) just becomes (rain = t). This is still a fair sampling procedure.

Sampling with Evidence. What if we observe some values and want samples from p(z | e)? The naive method, logic sampling: generate N samples from p(z) using ancestral sampling, then discard those samples that do not have the correct evidence values. E.g. for p(rain | cloudy = t, spr = t, wg = t), the sample (cloudy = t, spr = f, rain = t, wg = t) is discarded. This generates fair samples, but wastes time: many samples will be discarded when p(e) is low.
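
Continuing the ancestral-sampling sketch above, logic sampling is just a filter. Here the query is simplified to p(rain | spr = t, wg = t), conditioning on two evidence variables rather than the three on the slide:

# Reuses sample_once() from the ancestral-sampling sketch above.
kept = [(c, s, r, w) for (c, s, r, w) in (sample_once() for _ in range(100000))
        if s and w]                     # keep only evidence-consistent samples
p_rain = sum(1 for (c, s, r, w) in kept if r) / len(kept)
print(len(kept) / 100000, p_rain)       # acceptance rate and estimate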

Other Problems. Continuous variables? Gaussians are fine (Box-Muller and other methods exist). More complex distributions? Undirected graphs (MRFs)?

Outline: Sampling; Rejection Sampling; Importance Sampling; Markov Chain Monte Carlo.

Rejection Sampling. [Figure: a distribution p(z) and a function f(z).] Consider the case of an arbitrary, continuous p(z). How can we draw samples from it? Assume we can evaluate p(z) up to some normalization constant: $p(z) = \frac{1}{Z_p} \tilde{p}(z)$, where $\tilde{p}(z)$ can be efficiently evaluated (e.g. an MRF).

Proposal Distribution. [Figure: an envelope $k q(z)$ lying above $\tilde{p}(z)$, with a proposed point $z_0$ and a uniform draw $u_0$.] Let's also assume that we have some simpler distribution q(z), called a proposal distribution, from which we can easily draw samples; e.g. q(z) is a Gaussian. We could then draw samples from q(z) and use these. But these wouldn't be fair samples from p(z)?!

Comparison Function and Rejection. [Figure: as above.] Introduce a constant k such that $k q(z) \ge \tilde{p}(z)$ for all z. Rejection sampling procedure: generate $z_0$ from q(z); generate $u_0$ uniformly from $[0, k q(z_0)]$; if $u_0 > \tilde{p}(z_0)$, reject the sample $z_0$, otherwise keep it. The original samples are uniform in the grey region (under $k q(z)$); the kept samples are uniform in the white region (under $\tilde{p}(z)$), and hence are samples from p(z).
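
A sketch of this procedure, with an illustrative unnormalized two-bump target $\tilde{p}(z)$, a Gaussian proposal, and k chosen by inspection so that $k q(z) \ge \tilde{p}(z)$ everywhere:

import numpy as np
rng = np.random.default_rng(0)
def p_tilde(z):   # illustrative unnormalized target
    return np.exp(-0.5 * (z - 2) ** 2) + 0.5 * np.exp(-0.5 * (z + 2) ** 2)
sigma, k = 3.0, 12.0                             # proposal N(0, sigma^2), envelope constant
def q(z):
    return np.exp(-0.5 * (z / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
def rejection_sample():
    while True:
        z0 = sigma * rng.standard_normal()       # z0 ~ q(z)
        u0 = rng.uniform(0.0, k * q(z0))         # u0 ~ U[0, k q(z0)]
        if u0 <= p_tilde(z0):                    # under p_tilde: keep
            return z0
samples = [rejection_sample() for _ in range(5000)]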

Rejection Sampling Analysis. How likely are we to keep samples? The probability a sample is accepted is $p(\mathrm{accept}) = \int \frac{\tilde{p}(z)}{k q(z)} q(z)\,dz = \frac{1}{k} \int \tilde{p}(z)\,dz$. Smaller k is better, subject to $k q(z) \ge \tilde{p}(z)$ for all z; if q(z) is similar to p(z), this is easier to achieve. In high-dimensional spaces the acceptance ratio falls off exponentially, and finding a suitable k is challenging.

Outline: Sampling; Rejection Sampling; Importance Sampling; Markov Chain Monte Carlo.

Discretization. Importance sampling is a sampling technique for computing expectations: $E[f] = \int f(z)\,p(z)\,dz$. We could approximate this using discretization over a uniform grid: $E[f] \approx \sum_{l=1}^{L} f(z^{(l)})\,p(z^{(l)})$, c.f. a Riemann sum. But this wastes much computation and scales exponentially in the dimension. Instead, again use a proposal distribution rather than a uniform grid.

Importance Sampling. [Figure: p(z), q(z), and f(z) plotted against z.] Approximate the expectation $E[f] = \int f(z)\,p(z)\,dz = \int f(z) \frac{p(z)}{q(z)} q(z)\,dz \approx \frac{1}{L} \sum_{l=1}^{L} \frac{p(z^{(l)})}{q(z^{(l)})} f(z^{(l)})$, where $z^{(l)} \sim q(z)$. The quantities $p(z^{(l)}) / q(z^{(l)})$ are known as importance weights; they correct for the use of the wrong distribution q(z) in sampling.
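
A short sketch with illustrative choices p(z) = N(0, 1), q(z) = N(1, 2^2), and f(z) = z, so the true expectation is 0:

import numpy as np
from scipy.stats import norm
rng = np.random.default_rng(0)
L = 100000
z = rng.normal(1.0, 2.0, size=L)                    # z^(l) ~ q(z)
w = norm.pdf(z, 0.0, 1.0) / norm.pdf(z, 1.0, 2.0)   # importance weights p/q
f_hat = np.mean(w * z)                              # weighted estimate of E_p[z]
print(f_hat)                                        # close to 0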

Likelihood Weighted Sampling. Consider the case where we have evidence e and again desire an expectation over p(x | e). If we have a Bayesian network, we can use a particular type of importance sampling called likelihood weighted sampling: perform ancestral sampling, but if a variable $z_i$ is in the evidence set, set its value rather than sampling it. The importance weights are then $\frac{p(z^{(l)})}{q(z^{(l)})} = \frac{p(x, e)}{p(e)} \cdot \frac{1}{\prod_{z_i \notin e} p(z_i \mid \mathrm{pa}_i)} \propto \prod_{z_i \in e} p(z_i \mid \mathrm{pa}_i)$.
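
Sketched for the sprinkler network and the query p(rain | spr = t, wg = t): evidence nodes are clamped, and each contributes its conditional probability to the weight (the CPT values, including the assumed Wet Grass entries, are as in the ancestral-sampling sketch above):

import numpy as np
rng = np.random.default_rng(0)
def weighted_sample():
    w = 1.0
    c = rng.random() < 0.5                      # Cloudy: not evidence, sampled
    w *= 0.1 if c else 0.5                      # Sprinkler = t clamped: w *= P(S=t | C)
    r = rng.random() < (0.8 if c else 0.2)      # Rain: not evidence, sampled
    w *= 0.99 if r else 0.90                    # WetGrass = t clamped: w *= P(W=t | S=t, R)
    return r, w
pairs = [weighted_sample() for _ in range(100000)]
p_rain = sum(w for r, w in pairs if r) / sum(w for _, w in pairs)
print(p_rain)                                   # weighted estimate of p(rain | evidence)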

Likelihood Weighted Sampling: Example. [Figure: the sprinkler network, from Russell and Norvig, AIMA; a sequence of slides steps through one weighted sample with evidence Sprinkler = t and Wet Grass = t, accumulating w = 1.0 x 0.1 x 0.99 = 0.099.]

Sampling-Importance-Resampling. Note that importance sampling, e.g. likelihood weighted sampling, gives an approximation to an expectation, not samples. But samples can be obtained using these ideas: sampling-importance-resampling (SIR) uses a proposal distribution q(z) to generate samples. Unlike rejection sampling, no parameter k is needed.

SIR: Algorithm. The sampling-importance-resampling algorithm has two stages. Sampling: draw samples $z^{(1)}, \ldots, z^{(L)}$ from the proposal distribution q(z). Importance resampling: put normalized weights on the samples, $w_l = \frac{p(z^{(l)}) / q(z^{(l)})}{\sum_m p(z^{(m)}) / q(z^{(m)})}$, then draw samples from the discrete set $\{z^{(1)}, \ldots, z^{(L)}\}$ according to the weights $w_l$. This two-stage process is correct in the limit as $L \to \infty$.
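
A sketch with the same illustrative target and proposal as above (p(z) = N(0, 1), q(z) = N(1, 2^2)):

import numpy as np
from scipy.stats import norm
rng = np.random.default_rng(0)
L = 100000
z = rng.normal(1.0, 2.0, size=L)                    # stage 1: z^(l) ~ q(z)
w = norm.pdf(z, 0.0, 1.0) / norm.pdf(z, 1.0, 2.0)   # unnormalized weights
w = w / w.sum()                                     # normalized weights w_l
resampled = rng.choice(z, size=L, p=w)              # stage 2: resample by w_l
print(resampled.mean(), resampled.std())            # roughly (0, 1) for large L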

Outline: Sampling; Rejection Sampling; Importance Sampling; Markov Chain Monte Carlo.

Markov Chain Monte Carlo. Markov chain Monte Carlo (MCMC) methods also use a proposal distribution to generate samples from another distribution. Unlike the previous methods, we keep track of the samples generated: $z^{(1)}, \ldots, z^{(\tau)}$. The proposal distribution depends on the current state: $q(z \mid z^{(\tau)})$. Intuitively, we are walking around in state space, with each step depending only on the current state.

Metropolis Algorithm. A simple algorithm for walking around in state space: draw a sample $z^{\ast} \sim q(z \mid z^{(\tau)})$, and accept it with probability $A(z^{\ast}, z^{(\tau)}) = \min\left(1, \frac{\tilde{p}(z^{\ast})}{\tilde{p}(z^{(\tau)})}\right)$. If accepted, $z^{(\tau+1)} = z^{\ast}$; else $z^{(\tau+1)} = z^{(\tau)}$. Note that if $z^{\ast}$ is better than $z^{(\tau)}$, it is always accepted. Every iteration produces a sample, though sometimes it is the same as the previous one (contrast with rejection sampling). The basic Metropolis algorithm assumes the proposal distribution is symmetric: $q(z_A \mid z_B) = q(z_B \mid z_A)$.

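A sketch of the Metropolis algorithm with a symmetric Gaussian random-walk proposal, reusing the illustrative two-bump $\tilde{p}(z)$ from the rejection-sampling example:

import numpy as np
rng = np.random.default_rng(0)
def p_tilde(z):   # illustrative unnormalized target
    return np.exp(-0.5 * (z - 2) ** 2) + 0.5 * np.exp(-0.5 * (z + 2) ** 2)
z = 0.0                                          # initial state z^(0)
chain = []
for _ in range(50000):
    z_star = z + 0.5 * rng.standard_normal()     # symmetric random-walk proposal
    if rng.random() < min(1.0, p_tilde(z_star) / p_tilde(z)):
        z = z_star                               # accept; otherwise keep old state
    chain.append(z)                              # every iteration yields a sample
samples = chain[5000::10]                        # burn-in discard, keep every 10th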

Metropolis Example. [Figure: p(z) is an anisotropic Gaussian and the proposal q(z) is an isotropic Gaussian; red lines show rejected moves, green lines show accepted moves.] As $\tau \to \infty$, the distribution of $z^{(\tau)}$ tends to p(z); this holds if $q(z_A \mid z_B) > 0$. In practice, burn in the chain and collect samples only after some initial iterations, keeping only every Mth sample.

Metropolis Example: Graphical Model. [Figure: the sprinkler network with its CPTs.] Consider running the Metropolis algorithm to draw samples from p(cloudy, rain | spr = t, wg = t). Define $q(z \mid z^{(\tau)})$ as: uniformly pick one of cloudy or rain, and uniformly reset its value.

Metropolis Example. [Figure: the four joint states of (cloudy, rain), each drawn as a copy of the network.] Walk around in this state space, keeping track of how many times each state occurs.

Metropolis-Hastings Algorithm. A generalization of the previous algorithm to asymmetric proposal distributions is known as the Metropolis-Hastings algorithm. Accept a step with probability $A(z^{\ast}, z^{(\tau)}) = \min\left(1, \frac{\tilde{p}(z^{\ast})\,q(z^{(\tau)} \mid z^{\ast})}{\tilde{p}(z^{(\tau)})\,q(z^{\ast} \mid z^{(\tau)})}\right)$. A sufficient condition for this algorithm to produce the correct distribution is detailed balance.
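
For reference, detailed balance for a chain with transition probabilities $T(z, z')$ and target p(z) is the standard condition $p(z)\,T(z, z') = p(z')\,T(z', z)$ for all pairs of states $z, z'$; summing both sides over z shows that p(z) is then invariant under the chain.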

Gibbs Sampling. A simple coordinate-wise MCMC method. Given a distribution $p(z) = p(z_1, \ldots, z_M)$, sample each variable in turn (in either a pre-defined or a random order): sample $z_1^{(\tau+1)} \sim p(z_1 \mid z_2^{(\tau)}, z_3^{(\tau)}, \ldots, z_M^{(\tau)})$; sample $z_2^{(\tau+1)} \sim p(z_2 \mid z_1^{(\tau+1)}, z_3^{(\tau)}, \ldots, z_M^{(\tau)})$; ...; sample $z_M^{(\tau+1)} \sim p(z_M \mid z_1^{(\tau+1)}, z_2^{(\tau+1)}, \ldots, z_{M-1}^{(\tau+1)})$. These conditionals are easy to sample if each Markov blanket is small, e.g. in an MRF with small cliques, and takes a form amenable to sampling.
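
A sketch for an illustrative 2D standard Gaussian with correlation rho, where both full conditionals are 1D Gaussians:

import numpy as np
rng = np.random.default_rng(0)
rho = 0.9                                        # correlation of the 2D Gaussian
z1, z2 = 0.0, 0.0
chain = []
for _ in range(20000):
    # Full conditionals of a standard bivariate Gaussian with correlation rho:
    z1 = rho * z2 + np.sqrt(1 - rho ** 2) * rng.standard_normal()   # z1 | z2
    z2 = rho * z1 + np.sqrt(1 - rho ** 2) * rng.standard_normal()   # z2 | z1
    chain.append((z1, z2))
samples = np.array(chain[2000:])                 # discard burn-in
print(np.corrcoef(samples.T)[0, 1])              # close to rho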

Gibbs Sampling: Example. [Figure: Gibbs sampling steps over a strongly correlated 2D Gaussian in $(z_1, z_2)$; the conditional distributions have width l, much smaller than the length L of the distribution along its major axis, so the chain moves slowly.]

Gibbs Sampling Example: Graphical Model. [Figure: the sprinkler network with its CPTs.] Consider running Gibbs sampling on p(cloudy, rain | spr = t, wg = t). $q(z \mid z^{\tau})$: pick one of cloudy or rain, and reset its value according to p(cloudy | rain, spr, wg) (or p(rain | cloudy, spr, wg)). This is often easy: one only needs to look at the Markov blanket.
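
Sketched concretely for this query, with the illustrative CPT values from before; each conditional involves only the resampled variable's Markov blanket:

import numpy as np
rng = np.random.default_rng(0)
def p_rain_given(c):                       # p(R=t | C, S=t, W=t), normalized
    pr = 0.8 if c else 0.2
    a = pr * 0.99                          # R=t: P(R=t | C) * P(W=t | S=t, R=t)
    b = (1.0 - pr) * 0.90                  # R=f: P(R=f | C) * P(W=t | S=t, R=f)
    return a / (a + b)
def p_cloudy_given(r):                     # p(C=t | R, S=t), normalized
    a = 0.5 * 0.1 * (0.8 if r else 0.2)    # C=t: P(C) * P(S=t | C) * P(R | C)
    b = 0.5 * 0.5 * (0.2 if r else 0.8)    # C=f
    return a / (a + b)
c, r = True, True
hits, N = 0, 100000
for _ in range(N):
    r = rng.random() < p_rain_given(c)     # resample Rain given its blanket
    c = rng.random() < p_cloudy_given(r)   # resample Cloudy given its blanket
    hits += r
print(hits / N)                            # estimate of p(rain | spr=t, wg=t)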

Conclusion. Readings: Ch. 11.1-11.3 (we skipped much of it). Sampling methods use proposal distributions to obtain samples from complicated distributions. The different methods use different ways of correcting for the mismatch between the proposal distribution and the desired distribution. In practice, effectiveness relies on having a good proposal distribution, one which matches the desired distribution well.