Understanding MCMC. Marcel Lüthi, University of Basel. Slides based on presentation by Sandro Schönborn


The big picture

The Metropolis-Hastings algorithm induces a Markov chain which satisfies the detailed balance condition for p(x). If the Markov chain is aperiodic and irreducible, it converges to its equilibrium distribution, and that equilibrium distribution is exactly p(x), the distribution the algorithm samples from.

Understanding Markov Chains

Markov Chain

A sequence of random variables $\{X_i\}_{i=1}^N$, $X_i \in S$ (the state space), with joint distribution

$P(X_1, X_2, \dots, X_N) = P(X_1) \prod_{i=2}^N P(X_i \mid X_{i-1})$

where $P(X_1)$ is the initial distribution and $P(X_i \mid X_{i-1})$ the transition probability.

Simplifications (for our analysis):
- Discrete state space: $S = \{1, 2, \dots, K\}$. Automatically true if we use computers (e.g. 32-bit floats).
- Homogeneous chain: $P(X_i = l \mid X_{i-1} = m) = T_{lm}$.

(Figure: a four-state transition diagram with edge probabilities 1, 1/2, 1/3 and 1/6.)

Example: Markov Chain

A simple weather model: each hour is dry (D) or rainy (R).
- State space: S = {D, R}
- The condition in the next hour, $X_{t+1}$, is stochastic: $P(X_{t+1} \mid X_t)$
- It depends only on the current condition $X_t$

(Figure: two-state diagram with $P(D \mid D) = 0.95$, $P(R \mid D) = 0.05$, $P(R \mid R) = 0.8$, $P(D \mid R) = 0.2$.)

Draw samples from the chain:
- Initial: $X_0 = D$
- Evolution: sample from $P(X_{t+1} \mid X_t)$

DDDDDDDDRRRRRRRRRRRDDDDDDDDDDD DDDDDDDDDDDDDDDDDDDDDDDDDDDDDD DDDDDDDDDRDD...

Long-term behavior:
- Does it converge? What is the average probability of rain?
- Dynamics? How quickly will it converge?
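This chain is easy to simulate directly. Below is a minimal sketch using NumPy; the state encoding 0 = D, 1 = R and the helper name `simulate` are my own choices, not from the slides:

```python
import numpy as np

# Column-stochastic transition matrix: column m holds P(next state | current = m).
# States: 0 = dry (D), 1 = rainy (R); probabilities from the slide.
T = np.array([[0.95, 0.2],
              [0.05, 0.8]])

rng = np.random.default_rng(0)

def simulate(T, x0, n_steps):
    """Draw a trajectory X_0, ..., X_n from the chain."""
    states = [x0]
    for _ in range(n_steps):
        # The next state is sampled from the column of the current state.
        states.append(rng.choice(len(T), p=T[:, states[-1]]))
    return states

print("".join("DR"[s] for s in simulate(T, x0=0, n_steps=60)))
# Long runs of D with occasional bursts of R, as on the slide.
print("fraction rainy:", np.mean(simulate(T, 0, 100_000)))  # close to 0.2
```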

Discrete Homogeneous Markov Chain

Formally, in linear algebra terms:

Distribution (vector) for $P(X_i)$:

$p_i = \begin{pmatrix} P(X_i = 1) \\ \vdots \\ P(X_i = K) \end{pmatrix}$

Transition probability (transition matrix) for $P(X_i \mid X_{i-1})$:

$T = \begin{pmatrix} P(1 \mid 1) & \cdots & P(1 \mid K) \\ \vdots & \ddots & \vdots \\ P(K \mid 1) & \cdots & P(K \mid K) \end{pmatrix}, \qquad T_{lm} = P(l \mid m) = P(X_i = l \mid X_{i-1} = m)$

Evolution of the Initial Distribution

Evolution of $P(X_1) \to P(X_2)$:

$P(X_2 = l) = \sum_{m \in S} P(l \mid m)\, P(X_1 = m)$, i.e. $p_2 = T p_1$

Evolution over $n$ steps: $p_{n+1} = T^n p_1$

Is there a stable distribution $p^*$ (a steady state)?

$p^* = T p^*$

A stable distribution is an eigenvector of $T$ with eigenvalue $\lambda = 1$.
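Both views, iterating the chain and solving the eigenvalue problem, are easy to compare numerically. A sketch, reusing the weather matrix from the example above:

```python
import numpy as np

T = np.array([[0.95, 0.2],
              [0.05, 0.8]])  # weather chain: column-stochastic

# Iterate p_{n+1} = T p_n, starting from a point mass on "dry".
p = np.array([1.0, 0.0])
for _ in range(50):
    p = T @ p
print("after 50 steps:", p)              # approaches (0.8, 0.2)

# The steady state is the right eigenvector of T with eigenvalue 1.
vals, vecs = np.linalg.eig(T)
p_star = vecs[:, np.argmax(vals.real)].real
p_star /= p_star.sum()                   # normalize to a probability vector
print("eigenvector steady state:", p_star)
```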

Steady-State Distribution: p*

It exists: $T$ is subject to a normalization constraint, $\sum_l T_{lm} = 1$, which means the all-ones row vector is a left eigenvector to eigenvalue 1:

$(1, 1, \dots, 1)\, T = (1, 1, \dots, 1)$

So $T$ has eigenvalue $\lambda = 1$ (left and right eigenvalues are the same), and the steady-state distribution is the corresponding right eigenvector:

$T p^* = p^*$

Does any arbitrary initial distribution evolve to $p^*$? Convergence? Uniqueness?

Equilibrium Distribution: p*

Additional requirement for $T$: $(T^n)_{lm} > 0$ for $n > N_0$. The chain is then called irreducible and aperiodic (which implies ergodic):
- All states are connected using at most $N_0$ steps.
- Return intervals to a given state are irregular.

Perron-Frobenius theorem for positive matrices:
- PF1: $\lambda_1 = 1$ is a simple eigenvalue with a 1-dimensional eigenspace (uniqueness).
- PF2: $\lambda_1 = 1$ is dominant: $|\lambda_i| < 1$ for all $i \neq 1$ (convergence).

$p^*$ is a stable attractor, called the equilibrium distribution: $T p^* = p^*$.

Convergence

Time evolution of an arbitrary distribution $p_0$: $p_n = T^n p_0$.

Expand $p_0$ in the eigenbasis of $T$, with $T e_i = \lambda_i e_i$, $|\lambda_i| < \lambda_1 = 1$ and $|\lambda_k| \geq |\lambda_{k+1}|$:

$p_0 = \sum_{i=1}^K c_i e_i$

$T p_0 = \sum_{i=1}^K c_i \lambda_i e_i$

$T^n p_0 = \sum_{i=1}^K c_i \lambda_i^n e_i = c_1 e_1 + \lambda_2^n c_2 e_2 + \lambda_3^n c_3 e_3 + \cdots$

Convergence (II)

$T^n p_0 = \sum_{i=1}^K c_i \lambda_i^n e_i = c_1 e_1 + \lambda_2^n c_2 e_2 + \lambda_3^n c_3 e_3 + \cdots \approx p^* + \lambda_2^n c_2 e_2 \quad (n \gg 1)$

We have convergence: $T^n p_0 \to p^*$ as $n \to \infty$, with $c_1 e_1 = p^*$.

Rate of convergence: $\|p_n - p^*\| \approx |\lambda_2|^n \|c_2 e_2\|$

Normalizations: $e_1$ is scaled so that $c_1 e_1 = p^*$ and $\sum_i p^*_i = 1$.

Example: Weather Dynamics

Rain forecast for stable versus mixed weather:

Stable: $W_s = \begin{pmatrix} 0.95 & 0.2 \\ 0.05 & 0.8 \end{pmatrix}$, eigenvalues 1 and 0.75.

Mixed: $W_m = \begin{pmatrix} 0.85 & 0.6 \\ 0.15 & 0.4 \end{pmatrix}$, eigenvalues 1 and 0.25.

Both have the same equilibrium $p^* = (0.8, 0.2)$: the long-term average probability of rain is 20%.

Rainy now; what about the next hours?
Stable: RRRRDDDDDDDDDDDD DDDDDDDDDDDDD...
Mixed: RDDDDDDDDDDDDDDD RDDDRDDDDDDDD...
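A numeric sketch that reproduces these eigenvalues and shows how much faster the mixed chain forgets its initial state (the variable names are mine):

```python
import numpy as np

# "Stable" and "mixed" weather chains from the slide; both have p* = (0.8, 0.2).
W_s = np.array([[0.95, 0.2], [0.05, 0.8]])
W_m = np.array([[0.85, 0.6], [0.15, 0.4]])
p_star = np.array([0.8, 0.2])

for name, W in [("stable", W_s), ("mixed", W_m)]:
    lam2 = np.sort(np.abs(np.linalg.eigvals(W)))[0]   # second-largest |eigenvalue|
    p0 = np.array([0.0, 1.0])                         # start in state "rainy"
    for n in (1, 5, 10):
        dist = np.linalg.norm(np.linalg.matrix_power(W, n) @ p0 - p_star)
        print(f"{name}: |lambda_2| = {lam2:.2f}, ||p_{n} - p*|| = {dist:.4f}")
```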

Markov Chain: First Results

- Aperiodic and irreducible chains are ergodic (every state is reachable after > N steps, return times are irregular): they converge towards a unique equilibrium distribution $p^*$.
- The equilibrium distribution $p^*$ is the eigenvector of $T$ with eigenvalue $\lambda = 1$: $T p^* = p^*$.
- Rate of convergence: exponential decay governed by the second-largest eigenvalue, $|\lambda_2|^n$.

Only useful if we can design a chain with the desired equilibrium distribution!

Detailed Balance

A special property of some Markov chains. A distribution $p$ satisfies detailed balance if the flow of probability between every pair of states is equal in both directions (a local equilibrium):

$P(l \mid m)\, p_m = P(m \mid l)\, p_l$

Detailed balance implies that $p$ is the equilibrium distribution:

$(Tp)_l = \sum_m T_{lm}\, p_m = \sum_m T_{ml}\, p_l = p_l$

where the last step uses the column normalization $\sum_m T_{ml} = 1$. Most MCMC methods construct chains which satisfy detailed balance.
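The condition is easy to test numerically. A small sketch (the helper name is mine); for the two-state weather chain the probability flows D→R and R→D indeed match:

```python
import numpy as np

def satisfies_detailed_balance(T, p, tol=1e-12):
    """Check T_lm * p_m == T_ml * p_l for every pair of states (l, m)."""
    flow = T * p               # flow[l, m] = T_lm * p_m (broadcasts over columns)
    return np.allclose(flow, flow.T, atol=tol)

W_s = np.array([[0.95, 0.2], [0.05, 0.8]])
print(satisfies_detailed_balance(W_s, np.array([0.8, 0.2])))
# True: flow R->D is 0.2 * 0.2 = 0.04, flow D->R is 0.05 * 0.8 = 0.04
```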

The Metropolis-Hastings Algorithm

MCMC to draw samples from an arbitrary distribution.

Idea of the Metropolis-Hastings algorithm

Design a Markov chain which satisfies the detailed balance condition for the target distribution $P$:

$T_{\mathrm{MH}}(x' \mid x)\, P(x) = T_{\mathrm{MH}}(x \mid x')\, P(x')$

Ergodicity then ensures that the chain converges to this distribution.

Attempt 1: A simple algorithm

Initialize with a sample $x$. To generate the next sample from the current sample $x$:
1. Draw a sample $x'$ from $Q(x' \mid x)$ (the "proposal").
2. Emit $x'$ as the new sample.

It's a Markov chain, but we would need to choose a $Q$ for every $P$ that satisfies detailed balance by itself:

$Q(x' \mid x)\, P(x) = Q(x \mid x')\, P(x')$

Attempt 2: A more general solution

Initialize with a sample $x$. To generate the next sample from the current sample $x$:
1. Draw a sample $x'$ from $Q(x' \mid x)$ (the "proposal").
2. With probability $\alpha(x, x')$, emit $x'$ as the new sample.
3. With probability $1 - \alpha(x, x')$, emit $x$ as the new sample.

It's a Markov chain, and it decouples $Q$ from $P$ through the acceptance rule $\alpha$. How should we choose $\alpha$?

What is the acceptance function α?

Write the MH transition as proposal times acceptance, $T_{\mathrm{MH}}(x' \mid x) = \alpha(x' \mid x)\, Q(x' \mid x)$, and plug it into detailed balance.

Case A: $x' = x$. Detailed balance,

$T_{\mathrm{MH}}(x \mid x)\, P(x) = T_{\mathrm{MH}}(x \mid x)\, P(x)$

is trivially satisfied for every $\alpha(x, x)$.

Case B: $x' \neq x$. From

$\alpha(x' \mid x)\, Q(x' \mid x)\, P(x) = \alpha(x \mid x')\, Q(x \mid x')\, P(x')$

we get the requirement

$\frac{\alpha(x' \mid x)}{\alpha(x \mid x')} = \frac{Q(x \mid x')\, P(x')}{Q(x' \mid x)\, P(x)}$

What is the acceptance function α? (continued)

Requirement: choose $\alpha(x' \mid x)$ such that

$\frac{\alpha(x' \mid x)}{\alpha(x \mid x')} = \frac{Q(x \mid x')\, P(x')}{Q(x' \mid x)\, P(x)}$

and $\alpha$ is a valid probability: $0 \leq \alpha(x' \mid x) \leq 1$.

It is easy to check that

$\alpha(x' \mid x) = \min\left(1,\; \frac{Q(x \mid x')\, P(x')}{Q(x' \mid x)\, P(x)}\right)$

satisfies this property.
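Putting the pieces together, a minimal random-walk Metropolis sampler, as a sketch (not from the slides): with a symmetric Gaussian proposal the $Q$-terms cancel, so $\alpha = \min(1, P(x')/P(x))$. The target here is a standard normal, known only up to its normalizing constant:

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis_hastings(log_p, x0, n_samples, step=1.0):
    """Random-walk Metropolis: symmetric proposal Q, so the acceptance
    probability reduces to alpha = min(1, P(x') / P(x))."""
    x = x0
    samples = []
    for _ in range(n_samples):
        x_prop = x + step * rng.normal()         # draw x' ~ Q(x' | x)
        # Accept x' with probability min(1, P(x')/P(x)), computed in log space.
        if np.log(rng.uniform()) < log_p(x_prop) - log_p(x):
            x = x_prop                           # accept: emit x'
        samples.append(x)                        # on rejection, emit x again
    return np.array(samples)

# An unnormalized log-density of a standard normal is enough.
samples = metropolis_hastings(lambda x: -0.5 * x**2, x0=5.0, n_samples=50_000)
print(samples.mean(), samples.std())             # approximately 0.0 and 1.0
```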

The big picture (recap)

The Metropolis-Hastings algorithm induces a Markov chain which satisfies the detailed balance condition for p(x). If the Markov chain is aperiodic and irreducible, it converges to its equilibrium distribution, and that equilibrium distribution is exactly p(x), the distribution the algorithm samples from.