Markov Chains and MCMC


Markov Chains and MCMC. CompSci 590.02, Lecture 4, Spring 2013. Instructor: Ashwin Machanavajjhala.

Recap: Monte Carlo Method. If U is a universe of items and G ⊆ U is the subset satisfying some property, we want to estimate |G|, which is intractable or inefficient to count exactly. For i = 1 to N: choose u ∈ U uniformly at random, check whether u ∈ G, and set X_i = 1 if u ∈ G, X_i = 0 otherwise. Return the estimate |U| · (1/N) Σ_i X_i. The variance of this estimate scales as |U|² ρ(1−ρ)/N, where ρ = |G|/|U|, so it shrinks as the number of samples N grows.
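A minimal sketch of this estimator in Python. The universe, the property, and the sample size below are illustrative choices, not from the slides: we estimate how many integers in {1, ..., 10^6} are divisible by 7.

```python
import random

def monte_carlo_count(universe_size, in_G, num_samples):
    """Estimate |G| for G a subset of U = {1, ..., universe_size} by uniform sampling."""
    hits = 0
    for _ in range(num_samples):
        u = random.randint(1, universe_size)   # choose u in U uniformly at random
        if in_G(u):                            # check whether u is in G
            hits += 1
    return universe_size * hits / num_samples  # |U| * (fraction of samples that landed in G)

# Illustrative property: integers divisible by 7 (true count is roughly universe_size / 7).
estimate = monte_carlo_count(10**6, lambda u: u % 7 == 0, num_samples=100_000)
print(estimate)
```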

Recap: Monte Carlo Method. When is this method an FPRAS (fully polynomial randomized approximation scheme)? When U is known and easy to sample from uniformly, it is easy to check whether a sample is in G, and the ratio |U|/|G| is small (polynomial in the size of the input), so that polynomially many samples suffice.

Recap: Importance Sampling. In certain cases |G| << |U|, so the number of samples required is not small. Suppose q(x) is the density of interest; instead of sampling from q, sample from a different, approximating density p(x) and reweight each sample x by q(x)/p(x).
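A hedged sketch of the reweighting idea in Python. The target q (standard normal), the proposal p (a wider normal), and the test function x² are all illustrative choices, not from the slides; each draw from p is weighted by q(x)/p(x) so the weighted average estimates an expectation under q.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def importance_sampling_mean(num_samples=100_000):
    """Estimate E_q[X^2] for q = N(0,1) using samples from the proposal p = N(0,2)."""
    total = 0.0
    for _ in range(num_samples):
        x = random.gauss(0.0, 2.0)                              # sample from the proposal p
        w = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 2.0)   # importance weight q(x)/p(x)
        total += w * x**2
    return total / num_samples                                  # should be close to 1

print(importance_sampling_mean())
```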

Today's Class: Markov chains, and Markov chain Monte Carlo sampling (a.k.a. the Metropolis-Hastings method), the standard technique for probabilistic inference in machine learning when the probability distribution is hard to compute exactly.

Markov Chains. Consider a time-varying random process that takes the value X_t at time t. Values of X_t are drawn from a finite (more generally, countable) set of states Ω. The sequence {X_0, ..., X_t, ..., X_n} is a Markov chain if the distribution of X_t depends only on X_{t-1}.

Transition Probabilities. Pr[X_{t+1} = s_j | X_t = s_i], denoted P(i,j), is called the transition probability. The transition probabilities can be represented as a |Ω| × |Ω| matrix P, where P(i,j) is the probability that the chain moves from state i to state j. Let π_i(t) = Pr[X_t = s_i] denote the probability of being in state i at time t.

Transition Probabilities (continued). If π(t) denotes the 1 × |Ω| row vector of state probabilities at time t, then π(t+1) = π(t) P, and more generally π(t+k) = π(t) P^k.

Example. Suppose Ω = {Rainy, Sunny, Cloudy} and tomorrow's weather depends only on today's weather, i.e., a Markov process. For instance, Pr[X_{t+1} = Sunny | X_t = Rainy] = 0.25 and Pr[X_{t+1} = Sunny | X_t = Sunny] = 0, so there are no two consecutive days of sun (Seattle?).

Example. Suppose today is Sunny. What is the weather 2 days from now?

Example. Suppose today is Sunny. What is the weather 7 days from now?

Example. Suppose today is Rainy. What is the weather 2 days from now? What is the weather 7 days from now? (Computed by applying π(t+k) = π(t) P^k; see the sketch below.)
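A small sketch of the π(t+k) = π(t) P^k computation for this example. Only two entries of the transition matrix are stated in the slides (Pr[Sunny | Rainy] = 0.25 and Pr[Sunny | Sunny] = 0); the remaining entries below are illustrative assumptions chosen so that each row sums to one.

```python
import numpy as np

states = ["Rainy", "Sunny", "Cloudy"]
# Rows = today's weather, columns = tomorrow's weather.
# Only P[Rainy, Sunny] = 0.25 and P[Sunny, Sunny] = 0 come from the slides;
# all other entries are assumed for illustration.
P = np.array([
    [0.50, 0.25, 0.25],   # Rainy  -> Rainy, Sunny, Cloudy
    [0.50, 0.00, 0.50],   # Sunny  -> (no two consecutive sunny days)
    [0.25, 0.25, 0.50],   # Cloudy ->
])

def forecast(start_state, days):
    """Distribution over states `days` steps ahead: pi(t+k) = pi(t) @ P^k."""
    pi = np.zeros(len(states))
    pi[states.index(start_state)] = 1.0
    return pi @ np.linalg.matrix_power(P, days)

print(dict(zip(states, forecast("Sunny", 2))))   # 2 days out, starting Sunny
print(dict(zip(states, forecast("Rainy", 7))))   # 7 days out, starting Rainy
```

With this illustrative matrix, the 7-day forecasts from a Sunny start and a Rainy start nearly coincide, which anticipates the stationary-distribution discussion on the next slide.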

Example. After a sufficient amount of time, the expected weather distribution is independent of the starting value. Moreover, it no longer changes from one step to the next, i.e., π(t+1) = π(t). This is called the stationary distribution.

Stationary Distribution. π is called a stationary distribution of the Markov chain if π = πP. That is, once the stationary distribution is reached, every subsequent X_i is a sample from the distribution π. How to use Markov chains for sampling: suppose you want to sample from a set Ω according to a distribution π. Construct a Markov chain (P) such that π is its stationary distribution; once the stationary distribution is reached, we get samples from the correct distribution.

Conditions for a Stationary Distribution. A Markov chain is ergodic if it is (1) irreducible: any state j can be reached from any state i in some finite number of steps, and (2) aperiodic: the chain is not forced into cycles of fixed length between certain states. Theorem: for every ergodic Markov chain there is a unique vector π such that, for all initial probability vectors π(0), lim_{t→∞} π(0) P^t = π.
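A minimal sketch of how the theorem can be checked numerically: repeatedly multiplying an arbitrary starting distribution by P (power iteration) converges to the same π regardless of the start. The matrix reuses the illustrative weather example above.

```python
import numpy as np

P = np.array([[0.50, 0.25, 0.25],
              [0.50, 0.00, 0.50],
              [0.25, 0.25, 0.50]])   # illustrative ergodic transition matrix

def stationary_by_iteration(P, pi0, num_steps=1000):
    """Iterate pi <- pi @ P; for an ergodic chain this converges to the unique stationary pi."""
    pi = np.asarray(pi0, dtype=float)
    for _ in range(num_steps):
        pi = pi @ P
    return pi

# Two different starting distributions converge to the same vector.
print(stationary_by_iteration(P, [1.0, 0.0, 0.0]))
print(stationary_by_iteration(P, [0.0, 0.0, 1.0]))
```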

Sufficient Condition: Detailed Balance. A distribution π satisfies detailed balance if π_j P(j,k) = π_k P(k,j) for every pair of states j, k; summing this condition over j shows that such a π is stationary. Intuitively, in a stationary walk the chain is as likely to move from j to k as from k to j. This is also called the reversibility condition.
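Spelled out, the one-line argument that detailed balance implies stationarity, obtained by summing the balance condition over the first index:

```latex
\[
(\pi P)_k \;=\; \sum_j \pi_j P(j,k)
        \;=\; \sum_j \pi_k P(k,j)
        \;=\; \pi_k \sum_j P(k,j)
        \;=\; \pi_k ,
\]
so $\pi P = \pi$, i.e.\ $\pi$ is stationary.
```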

Example: Random Walks. Consider a graph G = (V, E) with weights w(e) on the edges. Random walk: start at some node u of G and move from node u to node v with probability proportional to w(u,v). The random walk is a Markov chain with state space V and P(u,v) = w(u,v) / Σ_{v': (u,v') ∈ E} w(u,v') if (u,v) ∈ E, and P(u,v) = 0 otherwise.

Example: Random Walk. The random walk is ergodic if G is connected (irreducible: every state can be reached from every other state in a finite number of steps) and G is not bipartite (aperiodic: the chain is not forced into cycles of fixed length between certain states).

Example: Random Walk. Uniform random walk: suppose all edge weights are 1, so P(u,v) = 1/deg(u) if (u,v) ∈ E and 0 otherwise. Theorem: if G is connected and not bipartite, then the stationary distribution of the random walk is π(v) = deg(v) / 2|E|.
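A small sketch checking the theorem empirically. The graph below (a triangle with a pendant node) is an illustrative choice that is connected and not bipartite; the visit frequencies of a long uniform random walk approach deg(v) / 2|E|.

```python
import random
from collections import Counter

# Illustrative graph: triangle {0,1,2} with a pendant node 3 (connected, not bipartite).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2

def random_walk_frequencies(adj, start=0, num_steps=200_000):
    """Empirical visit frequencies of a uniform random walk: P(u,v) = 1/deg(u)."""
    counts = Counter()
    u = start
    for _ in range(num_steps):
        u = random.choice(adj[u])   # move to a uniformly random neighbour
        counts[u] += 1
    return {v: counts[v] / num_steps for v in adj}

print(random_walk_frequencies(adj))
print({v: len(adj[v]) / (2 * num_edges) for v in adj})   # theoretical pi(v) = deg(v) / 2|E|
```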

Example: Random Walk. Symmetric random walk: suppose P(u,v) = P(v,u) for all u, v. Theorem: if G is connected and not bipartite, then the stationary distribution of the symmetric random walk is uniform, π(v) = 1/|V|.

Recap: Stationary Distribution. To sample from Ω according to π, construct a Markov chain P whose stationary distribution is π; once the stationary distribution is reached, the chain's states are samples from the correct distribution.

Metropolis-Hastings Algorithm (MCMC). Suppose we want to sample from a complex distribution f(x) = p(x) / K, where the normalizing constant K is unknown or hard to compute. Example: Bayesian inference, where the posterior is proportional to prior × likelihood but the normalizing evidence term is intractable.

Metropolis-Hastings Algorithm. Start with any initial value x_0 such that p(x_0) > 0. Using the current value x_{t-1}, sample a new point x* according to some proposal distribution q(x* | x_{t-1}). Compute the acceptance probability α = min(1, p(x*) q(x_{t-1} | x*) / (p(x_{t-1}) q(x* | x_{t-1}))); note that the unknown constant K cancels in this ratio. With probability α accept the move and set x_t = x*; otherwise reject it and keep x_t = x_{t-1}.
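A minimal Metropolis-Hastings sketch in Python following these steps. The unnormalized target p (a two-component Gaussian mixture) and the Gaussian random-walk proposal q are illustrative choices, not from the slides; the acceptance ratio is written out in full to match the general algorithm, even though the q terms cancel for this symmetric proposal.

```python
import math
import random

def p_unnormalized(x):
    # Illustrative unnormalized target: a two-component Gaussian mixture (K unknown/ignored).
    return math.exp(-0.5 * (x - 2.0) ** 2) + 0.5 * math.exp(-0.5 * (x + 2.0) ** 2)

def q_sample(x_prev, scale=1.0):
    # Proposal: Gaussian random walk centred at the current point.
    return random.gauss(x_prev, scale)

def q_density(x_to, x_from, scale=1.0):
    return math.exp(-0.5 * ((x_to - x_from) / scale) ** 2) / (scale * math.sqrt(2 * math.pi))

def metropolis_hastings(x0, num_samples=50_000):
    """Metropolis-Hastings: propose from q, accept with probability alpha, else stay put."""
    x = x0
    samples = []
    for _ in range(num_samples):
        x_new = q_sample(x)
        alpha = min(1.0, (p_unnormalized(x_new) * q_density(x, x_new)) /
                         (p_unnormalized(x) * q_density(x_new, x)))
        if random.random() < alpha:   # accept the move with probability alpha
            x = x_new                 # otherwise keep the current value
        samples.append(x)
    return samples

samples = metropolis_hastings(x0=0.0)
print(sum(samples) / len(samples))   # rough mean under the illustrative target
```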

Why does Metropolis-Hastings work? Metropolis-Hastings describes a Markov chain with transition probabilities P(x,y) = q(y | x) α(x,y) for y ≠ x. We want to show that f(x) = p(x)/K is the stationary distribution. Recall the sufficient condition for a stationary distribution: detailed balance, f(x) P(x,y) = f(y) P(y,x).

Why does Metropolis-Hastings work? (continued) Since K appears on both sides of the detailed-balance condition, it cancels, so it is sufficient to show p(x) P(x,y) = p(y) P(y,x).

Proof: Case 1. Suppose p(x) q(y | x) = p(y) q(x | y). Then α(x,y) = α(y,x) = 1, so P(x,y) = q(y | x) and P(y,x) = q(x | y). Therefore P(x,y) p(x) = q(y | x) p(x) = p(y) q(x | y) = P(y,x) p(y).

Proof: Case 2. Suppose p(x) q(y | x) > p(y) q(x | y). Then α(x,y) = p(y) q(x | y) / (p(x) q(y | x)) < 1 and α(y,x) = 1, so P(x,y) = q(y | x) α(x,y) = p(y) q(x | y) / p(x) and P(y,x) = q(x | y). Therefore p(x) P(x,y) = p(y) q(x | y) = p(y) P(y,x). The proof of Case 3 (p(x) q(y | x) < p(y) q(x | y)) is identical with the roles of x and y swapped.
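As a numerical sanity check of this proof, one can build the full Metropolis-Hastings transition matrix on a small discrete state space and verify p(x) P(x,y) = p(y) P(y,x) entrywise; the target p and the uniform proposal q below are illustrative choices.

```python
import numpy as np

n = 5
p = np.array([1.0, 2.0, 3.0, 2.0, 1.0])   # unnormalized target over 5 states
p = p / p.sum()                           # normalizing only to keep numbers small; K cancels anyway
q = np.full((n, n), 1.0 / n)              # illustrative proposal: uniform over all states

# Metropolis-Hastings transition matrix: P(x,y) = q(y|x) * alpha(x,y) for y != x.
P = np.zeros((n, n))
for x in range(n):
    for y in range(n):
        if x != y:
            alpha = min(1.0, (p[y] * q[y, x]) / (p[x] * q[x, y]))
            P[x, y] = q[x, y] * alpha
    P[x, x] = 1.0 - P[x].sum()            # rejected proposals stay at x

# Detailed balance holds entrywise: the matrix M[x,y] = p(x) P(x,y) is symmetric.
M = p[:, None] * P
print(np.allclose(M, M.T))                # True
```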

When is the stationary distribution reached? Next class.