Robert Collins, CSE586, PSU. Markov-Chain Monte Carlo



Problem Intuition: In high-dimensional problems, the Typical Set (the volume of non-negligible probability in the state space) is a small fraction of the total space.

High-Dimensional Spaces: consider the ratio of the volume of a hypersphere inscribed inside a hypercube (illustrated on the slide in 2D and 3D). Asymptotic behavior: as the dimension d increases, the ratio goes to zero, so most of the volume of the hypercube lies outside of the hypersphere. See http://www.cs.rpi.edu/~zaki/courses/dmcourse/fall09/notes/highdim.pdf
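For concreteness, the inscribed unit-radius sphere has volume pi^(d/2) / Gamma(d/2 + 1) and the enclosing side-2 cube has volume 2^d; a quick Matlab sketch of their ratio (the dimensions sampled below are an arbitrary choice):

```matlab
% Ratio of the volume of the inscribed unit hypersphere to the
% volume of the enclosing hypercube of side 2, for growing dimension d.
for d = [2 3 5 10 20]
    ratio = pi^(d/2) / (gamma(d/2 + 1) * 2^d);
    fprintf('d = %2d: ratio = %.2e\n', d, ratio);
end
% d = 2 gives pi/4 ~ 0.785, d = 3 gives pi/6 ~ 0.524,
% and the ratio collapses toward zero as d increases.
```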

High-Dimensional Spaces: each pixel has two states, on and off, ... but this ignores the neighborhood structure of the pixel lattice and the empirical evidence that images are smooth.

Recall: Markov Chains. A Markov chain is a sequence of random variables X1, X2, X3, ... Each variable is a distribution over a set of states (a, b, c, ...). The transition probability of going to the next state depends only on the current state, e.g. P(X_{n+1} = a | X_n = b). The transition probs can be arranged in an NxN table of elements k_ij = P(X_{n+1} = j | X_n = i), where the rows sum to one.

General Idea: MCMC Sampling. Start in some state, and then run the simulation for some number of time steps. After you have run it long enough, start keeping track of the states you visit. {... X1 X2 X1 X3 X3 X2 X1 X2 X1 X1 X3 X3 X2 ...} These are samples from the distribution you want, so you can now compute any expected values with respect to that distribution empirically.
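A minimal Matlab sketch of this procedure (the 3-state transition table below is an assumed example, not one given in the slides):

```matlab
% Simulate a 3-state Markov chain and record the states visited.
T = [0.5 0.3 0.2;                 % row i, column j: P(X_{n+1}=j | X_n=i)
     0.2 0.6 0.2;
     0.3 0.3 0.4];                % each row sums to one
nSteps = 50000; burnIn = 1000;
x = 1; counts = zeros(1,3);
for t = 1:nSteps
    x = find(rand < cumsum(T(x,:)), 1);  % draw the next state from row x
    if t > burnIn
        counts(x) = counts(x) + 1;       % keep track only after burn-in
    end
end
disp(counts / sum(counts))        % empirical fraction of time in each state
```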

The Theory Behind MCMC Sampling. Irreducible: every state is accessible from every other state. Positive recurrent: the expected return time to every state is finite. If the Markov chain is positive recurrent, there exists a stationary distribution; if it is positive recurrent and irreducible, there exists a unique stationary distribution. Then the average of a function f over samples of the Markov chain is equal to the average of f with respect to the stationary distribution. The sample average we can compute empirically as we generate samples; the stationary-distribution average is what we want to compute, and is infeasible to compute in any other way.
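Written out as a formula (the standard statement of this ergodic average, supplied here for reference, with pi denoting the stationary distribution):

```latex
\frac{1}{N}\sum_{n=1}^{N} f(X_n) \;\longrightarrow\; \mathbb{E}_{\pi}[f] \;=\; \sum_i \pi_i \, f(i)
\qquad \text{as } N \to \infty .
```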

K = transpose of the transition prob table {k_ij}, so the columns sum to one. We do this for computational convenience: the state distribution, stored as a column vector p, then evolves as p_{n+1} = K p_n (next slide).

Question: Assume you start in some state, and then run the simulation for a large number of time steps. What percentage of time do you spend at X1, X2 and X3?

Four possible initial distributions, e.g. [.33 .33 .33]. Plotting each initial distribution and the distribution after each successive time step, all eventually end up with the same distribution -- this is the stationary distribution!

In Matlab: [E,D] = eigs(K). The stationary distribution is the eigenvector of K with eigenvalue 1, which is guaranteed to exist for a stochastic matrix by the Perron-Frobenius theorem.
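A sketch of that computation, using eig since the example matrix is small and dense (the transition table is the same assumed example used earlier, transposed to be column-stochastic):

```matlab
% Find the stationary distribution as the eigenvector of the
% column-stochastic matrix K with eigenvalue 1 (Perron-Frobenius).
K = [0.5 0.2 0.3;
     0.3 0.6 0.3;
     0.2 0.2 0.4];                    % columns sum to one
[E,D] = eig(K);
[~,idx] = max(real(diag(D)));         % the leading eigenvalue is 1
p = real(E(:,idx));
p = p / sum(p);                       % normalize the entries to sum to one
disp(p')                              % stationary distribution
disp((K*p)')                          % check: K*p equals p
```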

The PageRank of a webpage as used by Google is defined by a Markov chain: it is the probability of being at page i in the stationary distribution of the following Markov chain on all (known) webpages. If N is the number of known webpages and page i has k_i outgoing links, then from page i the transition probability is (1-q)/k_i + q/N for each page that i links to, and q/N for each page that it does not link to. The parameter q is taken to be about 0.15.
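A sketch of PageRank as power iteration on this chain (the 4-page link structure below is hypothetical, chosen so every page has at least one outlink):

```matlab
% PageRank by power iteration, q = 0.15 as in the text.
q = 0.15;
A = [0 1 1 0;     % A(i,j) = 1 if page i links to page j (assumed links)
     1 0 1 1;
     0 0 0 1;
     1 0 0 0];
N = size(A,1);
k = sum(A,2);                          % number of outlinks per page
G = q/N * ones(N) + (1-q) * (A ./ k);  % transition matrix, rows sum to one
p = ones(1,N)/N;                       % start from the uniform distribution
for t = 1:100
    p = p * G;                         % one step of the chain
end
disp(p)                                % stationary distribution = PageRank
```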

Another Question: Assume you want to spend a particular percentage of time at X1, X2 and X3, say P(X1) = .2, P(X2) = .3, P(X3) = .5. What should the transition probabilities be? K = [? ? ?; ? ? ?; ? ? ?]

Thought Experiment: Consider only two states. What transition probabilities should we use so that we spend roughly equal time in each of the two states (i.e. 50% of the time we are in state 1 and 50% of the time we are in state 2)?

Detailed Balance. Consider a pair of configuration nodes r, s. We want to generate them with frequency relative to their likelihoods L(r) and L(s). Let q(r,s) be the relative frequency of proposing configuration s when the current state is r (and vice versa for q(s,r)). A sufficient condition to generate r and s with the desired frequency is the detailed balance condition: L(r) q(r,s) = L(s) q(s,r).

Detailed Balance. In practice, you just propose some transition probabilities. They typically will NOT satisfy detailed balance (unless you are extremely lucky). Instead, you fix them by introducing a computational fudge factor a, applied so that detailed balance holds: a * L(r) q(r,s) = L(s) q(s,r). Solve for a: a = [L(s) q(s,r)] / [L(r) q(r,s)].

Note: you can just make up the transition probability q on-the-fly, using whatever criteria you wish. Difference from rejection sampling: instead of throwing away rejections, you replicate the current state into the next time step.

Note: the transition probabilities q(x^(t), x') can be arbitrary distributions. They can depend on the current state and change at every time step, if you want.

Metropolis-Hastings Example (Matlab demo). Target distribution: P(X1) = .2, P(X2) = .3, P(X3) = .5. Proposal distribution: q(x_i, x_{(i-1) mod 3}) = .4, q(x_i, x_{(i+1) mod 3}) = .6.
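A minimal Matlab sketch of this demo (state indices are 1-based here, so the mod-3 arithmetic is shifted accordingly; the run length is an arbitrary choice):

```matlab
% Metropolis-Hastings for the 3-state example: target [.2 .3 .5],
% asymmetric cyclic proposal (.4 to the previous state, .6 to the next).
P = [0.2 0.3 0.5];
nSteps = 100000;
x = 1;                            % start in state X1
counts = zeros(1,3);
for t = 1:nSteps
    if rand < 0.4                 % propose the previous state, prob .4
        xp = mod(x-2,3) + 1;
        qf = 0.4; qr = 0.6;       % forward and reverse proposal probs
    else                          % propose the next state, prob .6
        xp = mod(x,3) + 1;
        qf = 0.6; qr = 0.4;
    end
    a = (P(xp)*qr) / (P(x)*qf);   % acceptance factor from detailed balance
    if rand < min(1,a)
        x = xp;                   % accept: move to the proposed state
    end                           % reject: replicate the current state
    counts(x) = counts(x) + 1;
end
disp(counts/nSteps)               % approaches [.2 .3 .5]
```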

Variants of MCMC: there are many variations on this general approach, some derived as special cases of the Metropolis-Hastings algorithm.

e.g. if the proposal is symmetric (such as a Gaussian centered on the current state), then q(x^(t), x') = q(x', x^(t)) and the proposal terms cancel in the acceptance ratio; this special case is the original Metropolis algorithm.

A simpler version (Gibbs sampling) uses 1D conditional distributions, or line search, or ... Draws from each variable's 1D conditional distribution are interleaved, cycling through the variables in turn.
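A minimal Gibbs sketch; since the slides don't specify a demo target, a standard bivariate Gaussian with correlation rho is assumed here, whose 1D conditionals are themselves Gaussian:

```matlab
% Gibbs sampling sketch on an assumed bivariate Gaussian target.
rho = 0.8; nSteps = 20000;
X = zeros(nSteps,2);
x1 = 0; x2 = 0;
for t = 1:nSteps
    % interleave draws from the two 1D conditional distributions
    x1 = rho*x2 + sqrt(1-rho^2)*randn;   % sample p(x1 | x2)
    x2 = rho*x1 + sqrt(1-rho^2)*randn;   % sample p(x2 | x1)
    X(t,:) = [x1 x2];
end
C = corrcoef(X);
disp(C(1,2))   % empirical correlation, close to rho = 0.8
```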

Simulated Annealing: introduce a temperature term that makes it more likely to accept proposals early on, leading to more aggressive exploration of the state space. Gradually reduce the temperature, causing the process to spend more time exploring high-likelihood states. Rather than remembering all states visited, keep track of the best state you've seen so far. This is a method that attempts to find the global max (MAP) state.
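A minimal simulated-annealing sketch; the objective f and the cooling schedule below are assumptions for illustration, not taken from the slides:

```matlab
% Simulated annealing on an assumed 1D likelihood with two global maxima.
f = @(x) exp(-(x.^2 - 2).^2);       % hypothetical likelihood, peaks at +-sqrt(2)
x = 0; best = x; T = 10;
for t = 1:20000
    xp = x + 0.5*randn;             % symmetric Gaussian proposal
    a = (f(xp)/f(x))^(1/T);         % high T flattens the ratio: accept more early on
    if rand < min(1,a), x = xp; end
    if f(x) > f(best), best = x; end  % remember the best state seen so far
    T = max(0.99*T, 1e-3);          % gradually reduce the temperature
end
disp(best)                          % approximate MAP state, near +-1.414
```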

Trans-dimensional MCMC: exploring alternative state spaces of differing dimensions (for example, when doing EM, also trying to estimate the number of clusters along with the parameters of each cluster). Green's reversible-jump approach (RJMCMC) gives a general template for exploring and comparing states of differing dimension.