Introduction to Computational Biology Lecture # 14: MCMC - Markov Chain Monte Carlo

Assaf Weiner, Tuesday, March 13, 2007

1 Introduction

Today we return to the motif finding problem. In lecture 10 we built an HMM that describes a sequence containing one or more occurrences of a motif, and we estimated the model's parameters with the EM algorithm. Today we show a different approach to the motif finding problem, based on the Markov chain Monte Carlo (MCMC) method.

Given a probability distribution P(x), we would like to compute the expectation of a function over the data, E_P[f(x)]. Suppose we can generate i.i.d. samples x_1, ..., x_n from P(x); then

    E_P[f(x)] \approx \frac{1}{n} \sum_{i=1}^{n} f(x_i)    (1)

is a Monte Carlo (MC) estimator of E_P[f(x)]. The next question is how to draw i.i.d. samples from some P(x).

2 Markov Chains

Before introducing the Metropolis-Hastings algorithm and the Gibbs sampler, a few introductory comments on Markov chains.

Markov process: Let X^{(t)} denote the value of a random variable at time t, and let the state space be the range of possible values of X. The random variable is a Markov process if the transition probabilities between values in the state space depend only on the current state, i.e.,

    Pr(X^{(t+1)} = s_{t+1} | X^{(t)} = s_t, ..., X^{(1)} = s_1) = Pr(X^{(t+1)} = s_{t+1} | X^{(t)} = s_t)    (2)

Markov chain: A Markov chain is a sequence of random variables X^{(1)}, ..., X^{(n)} with the Markov property (each state depends only on the previous state). A particular chain is defined by its transition probabilities P(i, j) = P(i \to j), the probability that a process at state s_i moves to state s_j in a single step:

    P(i, j) = P(i \to j) = Pr(X^{(t+1)} = s_j | X^{(t)} = s_i)    (3)

Homogeneous Markov chain: A Markov chain is called homogeneous (a chain with homogeneous transition probabilities) if the transition probabilities do not depend on time; formally, for all t, t',

    Pr(X^{(t+1)} | X^{(t)}) = Pr(X^{(t'+1)} | X^{(t')})    (4)
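As a small illustration of the estimator in (1) and of a homogeneous chain with transition probabilities as in (3), here is a minimal Python sketch. The two-state chain and the function f are made-up toy choices, not part of the lecture:

import numpy as np

rng = np.random.default_rng(0)

# Toy homogeneous two-state chain: P[i, j] = Pr(X^(t+1) = j | X^(t) = i).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def simulate_chain(P, x0, n_steps):
    """Simulate a homogeneous Markov chain; the next state depends only on the current one."""
    states = [x0]
    for _ in range(n_steps):
        states.append(rng.choice(len(P), p=P[states[-1]]))
    return np.array(states)

f = lambda x: x                      # toy function whose expectation we estimate
xs = simulate_chain(P, x0=0, n_steps=100_000)
print("Monte Carlo estimate of E[f(X)]:", f(xs).mean())   # eq. (1) applied to the visited states

Note that the visited states of a chain are not i.i.d. draws; Section 3 discusses the conditions under which such long-run averages nevertheless recover expectations under the target distribution.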

Irreducible: A Markov chain is said to be irreducible if it is possible to get from any state to any other state with positive probability.

Acyclic process: A state s is called periodic with period k > 1 if any return to s can occur only in a multiple of k steps, where k is the largest number with this property:

    Pr(X^{(n+d)} = s | X^{(n)} = s) > 0  only if  d mod k = 0

A process is called acyclic (aperiodic) if no state is periodic. Put another way, the chain is not forced into a cycle of fixed length between certain states.

Definition 2.1 (Ergodic Markov chain): A Markov chain is called ergodic if it is homogeneous, acyclic and irreducible.

3 Sampling using Markov Chains

Our problem is to obtain samples from some complex probability distribution P(x). To achieve this we build a Markov chain with transition probability Q(X^{(t+1)} | X^{(t)}) such that, as t \to \infty, the chain converges to a stationary distribution Q_\infty:

    Q(X^{(t)} | X^{(0)}) \xrightarrow{t \to \infty} Q_\infty(X^{(t)})   s.t.   P(X = x) = Q_\infty(X^{(t)} = x)    (5)

In order to use a Markov chain for sampling we want to reach a steady state that represents our target distribution; we denote this probability Q_\infty(X = x) = P(X = x) (the stationary distribution).

Stationary distribution: A distribution over the states such that the distribution at time n + 1 is the same as the distribution at time n is called a stationary distribution. The conditions for convergence to a unique stationary distribution are that the chain is irreducible and aperiodic (ergodic). When a chain is periodic, it can cycle deterministically between states and hence never settles down to a stationary distribution. A sufficient condition for Q_\infty to be the stationary distribution is that the detailed balance equation holds. For large enough t, the probability of being at X^{(t)} and then moving to X^{(t+1)} is

    P(X^{(t)}, X^{(t+1)}) = Q_\infty(X^{(t)}) Q(X^{(t+1)} | X^{(t)})    (6)

A Markov process is said to satisfy detailed balance if the transition probabilities between each pair of states a and b in the state space obey

    Q(X^{(t+1)} = b | X^{(t)} = a) Q_\infty(X^{(t)} = a) = Q(X^{(t+1)} = a | X^{(t)} = b) Q_\infty(X^{(t)} = b)    (7)

Reversible Markov chains: A reversible Markov chain is a process that generates the same trajectory whether we walk forward or backward through it. Another way of phrasing this: given two states in the trajectory, we cannot tell which one came before the other (diffusion in a solution is an example of such a reversible process). In a reversible Markov chain,

    P(X = a) Q(X^{(t+1)} = b | X^{(t)} = a) = P(X = b) Q(X^{(t+1)} = a | X^{(t)} = b)    (8)

so the ratio between the stationary probabilities of two states is

    \frac{P(X = a)}{P(X = b)} = \frac{Q(X^{(t+1)} = a | X^{(t)} = b)}{Q(X^{(t+1)} = b | X^{(t)} = a)}    (9)
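Continuing the toy two-state example from above (renaming the matrix Q to match this section's notation), the following sketch finds the stationary distribution by power iteration and checks conditions (7)/(8) and the ratio (9) numerically; the numbers are illustrative, not from the lecture:

import numpy as np

# Toy transition matrix Q[a, b] = Q(X^(t+1) = b | X^(t) = a); rows sum to 1.
Q = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Stationary distribution by power iteration: pi Q = pi.
pi = np.array([0.5, 0.5])
for _ in range(1000):
    pi = pi @ Q

# Detailed balance check, eq. (7)/(8): pi(a) Q(a->b) == pi(b) Q(b->a).
a, b = 0, 1
print("pi:", pi)
print("pi(a)Q(a->b):", pi[a] * Q[a, b], "  pi(b)Q(b->a):", pi[b] * Q[b, a])
print("pi(a)/pi(b):", pi[a] / pi[b], "  Q(b->a)/Q(a->b):", Q[b, a] / Q[a, b])   # eq. (9)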

If (8) holds for every pair of states a and b, then the stationary distribution is the only distribution satisfying this condition, and we have

    P(X = x) = Q_\infty(X^{(t)} = x),  i.e.  P = Q_\infty    (10)

We will introduce two methods for constructing a Markov chain (i.e., for defining Q) that has the desired distribution as its stationary distribution.

4 Metropolis-Hastings Algorithm

Our goal is to draw samples from some distribution p(x), where p(x) = f(x)/K; the normalizing constant K may not be known and may be very difficult to compute. The Metropolis-Hastings algorithm (Metropolis and Ulam 1949, Metropolis et al. 1953, Hastings 1970) generates a sequence of draws from this distribution as follows.

4.1 The Algorithm

1. Specify an initial value \theta^{(0)} satisfying f(\theta^{(0)}) > 0.

2. Repeat for t = 1, 2, ..., M:

   (a) Using the current value \theta^{(t)}, sample a candidate point \theta^* from some proposal distribution r(\theta^* | \theta^{(t)}). This distribution is also referred to as the jumping or candidate-generating distribution. In the original Metropolis algorithm the only restriction on the proposal density is that it is symmetric, i.e., r(a | b) = r(b | a); Hastings' generalization, used below, allows asymmetric proposals.

   (b) Sample q \sim U(0, 1).

   (c) Let

       \theta^{(t+1)} = \theta^*       if  \frac{P(\theta^*) r(\theta^{(t)} | \theta^*)}{P(\theta^{(t)}) r(\theta^* | \theta^{(t)})} > q;
       \theta^{(t+1)} = \theta^{(t)}   otherwise.

3. Return the values \theta^{(1)}, \theta^{(2)}, ..., \theta^{(M)}.

We can summarize Metropolis-Hastings sampling as first computing

    \alpha(b | a) = \min\left( \frac{P(b) r(a | b)}{P(a) r(b | a)}, 1 \right)    (11)

and then accepting the candidate point with probability \alpha (the probability of a move). This generates a Markov chain (\theta^{(0)}, ..., \theta^{(k)}, ...), since the transition probability from \theta^{(t)} to \theta^{(t+1)} depends only on \theta^{(t)} and not on (\theta^{(0)}, ..., \theta^{(t-1)}). Following a sufficient burn-in period (of, say, k steps), the chain approaches its stationary distribution, and the values (\theta^{(k+1)}, ..., \theta^{(k+n)}) are samples from p(x).
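A minimal Python sketch of the algorithm above, under illustrative assumptions that are not part of the lecture: the target is a made-up unnormalized density f (so the constant K is never needed), and the proposal is a symmetric Gaussian random walk, so the proposal terms r cancel in the acceptance ratio:

import numpy as np

rng = np.random.default_rng(0)

def f(theta):
    """Unnormalized target density f(theta) = K * p(theta); a toy bimodal example."""
    return np.exp(-(theta - 2) ** 2 / 2) + np.exp(-(theta + 2) ** 2 / 2)

def metropolis_hastings(f, theta0, n_steps, step=1.0):
    theta = theta0
    samples = []
    for _ in range(n_steps):
        # (a) propose from a symmetric random walk; the r-terms cancel in the ratio
        proposal = theta + step * rng.normal()
        # (b) sample q ~ U(0,1); (c) accept the candidate if f(proposal)/f(theta) > q
        if f(proposal) / f(theta) > rng.uniform():
            theta = proposal
        samples.append(theta)
    return np.array(samples)

samples = metropolis_hastings(f, theta0=0.0, n_steps=50_000)
burn_in = 5_000                      # discard the burn-in period
print("estimate of E[theta]:", samples[burn_in:].mean())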

Metropolis-Hastings sampling as a Markov chain: To demonstrate that Metropolis-Hastings sampling generates a Markov chain whose equilibrium density is the target density p(x), it is sufficient to show that the Metropolis-Hastings transition probability satisfies the detailed balance equation with p(x). Under the Metropolis-Hastings algorithm, the probability of moving from state a to a different state b is

    Q(a \to b) = r(b | a) \alpha(b | a) = r(b | a) \min\left(1, \frac{P(b) r(a | b)}{P(a) r(b | a)}\right)    (12)

Thus, if this transition probability satisfies P(X = a) Q(X^{(t+1)} = b | X^{(t)} = a) = P(X = b) Q(X^{(t+1)} = a | X^{(t)} = b), then the stationary distribution corresponds to draws from the target distribution. Assume w.l.o.g. that \frac{P(b) r(a | b)}{P(a) r(b | a)} > 1 for a \neq b, so that \alpha(b | a) = 1 and \alpha(a | b) = \frac{P(a) r(b | a)}{P(b) r(a | b)}. Then

    \frac{Q(a \to b)}{Q(b \to a)} = \frac{r(b | a) \cdot 1}{r(a | b) \cdot \frac{P(a) r(b | a)}{P(b) r(a | b)}} = \frac{r(b | a) P(b) r(a | b)}{r(a | b) P(a) r(b | a)} = \frac{P(b)}{P(a)}    (13)

and the detailed balance equation holds.

5 Gibbs Sampling

The Gibbs sampler (introduced in the context of image processing by Geman and Geman 1984) is a special case of Metropolis-Hastings sampling in which the proposed value is always accepted. The task remains to specify how to construct a Markov chain whose values converge to the target distribution. We will present this algorithm in the context of the motif finding problem. Let us first recall the motif finding problem: given a set of n DNA sequences, each of length t, find the profile that maximizes the consensus score.

5.1 Algorithm

1. Input a = <a_1, ..., a_n>.

2. Repeat for t = 1, 2, ..., M:

   (a) Sample a random position i in the array, i \sim r(\cdot), where r is a distribution over {1, ..., n} (e.g., uniform).

   (b) Set b = <b_1, ..., b_n> such that:

      i.  b_j = a_j for j \neq i;

      ii. b_i \sim P(X_i | X_1 = a_1, ..., X_{i-1} = a_{i-1}, X_{i+1} = a_{i+1}, ..., X_n = a_n).

   (c) Set a := b (the new state becomes the current state).

3. Return b = <b_1, ..., b_n>.

This algorithm produces a Markov chain that satisfies detailed balance. We are interested only in transitions from a to b where a \neq b and where a and b differ in exactly one coordinate (the only moves the Gibbs algorithm makes). Denote a_{-i} = <a_1, ..., a_{i-1}, a_{i+1}, ..., a_n>. The transition probabilities are

    Q(a \to b) = r(i) P(X_i = b_i | X_{-i} = a_{-i})    (14)

    Q(b \to a) = r(i) P(X_i = a_i | X_{-i} = b_{-i})    (15)

so their ratio is

    \frac{Q(a \to b)}{Q(b \to a)} = \frac{P(X_i = b_i | X_{-i} = a_{-i})}{P(X_i = a_i | X_{-i} = b_{-i})}    (16)

Since a_{-i} = b_{-i},

    \frac{Q(a \to b)}{Q(b \to a)} = \frac{P(X_i = b_i | X_{-i} = b_{-i})}{P(X_i = a_i | X_{-i} = a_{-i})} = \frac{P(b) / P(X_{-i} = b_{-i})}{P(a) / P(X_{-i} = a_{-i})} = \frac{P(b)}{P(a)}    (17)

which is exactly the reversibility condition (9), so the target distribution P is the stationary distribution of the chain.
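A minimal generic sketch of the sampler above, under illustrative assumptions: the target is a toy discrete joint distribution over two coordinates, so the conditional P(X_i | X_{-i}) in step (b)ii can be obtained by slicing the joint table and normalizing. The joint table and names are made up for the example:

import numpy as np

rng = np.random.default_rng(0)

# Toy joint distribution P(X_1, X_2) over two ternary variables (made-up numbers, sums to 1).
joint = np.array([[0.10, 0.05, 0.15],
                  [0.05, 0.20, 0.05],
                  [0.15, 0.05, 0.20]])

def gibbs(joint, n_steps):
    a = [0, 0]                                  # initial state a = <a_1, a_2>
    samples = []
    for _ in range(n_steps):
        i = rng.integers(2)                     # (a) sample a random coordinate i ~ r(.)
        # (b) resample coordinate i from P(X_i | X_{-i}): slice the joint and normalize
        cond = joint[:, a[1]] if i == 0 else joint[a[0], :]
        a[i] = rng.choice(3, p=cond / cond.sum())
        samples.append(tuple(a))                # (c) the new state becomes the current state
    return samples

samples = gibbs(joint, 100_000)
# Empirical frequencies should approach the target joint distribution.
vals, counts = np.unique(samples, axis=0, return_counts=True)
print(np.column_stack([vals, counts / len(samples)]))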

The Gibbs sampler is somewhat easier to implement than the Metropolis-Hastings algorithm, since we do not have to design a proposal distribution. Furthermore, the conditional p(x_1 | x_2, ..., x_n) can be computed using

    p(x_1 | x_2, ..., x_n) = \frac{p(x_1, ..., x_n)}{\sum_{x_1} p(x_1, x_2, ..., x_n)}

In most cases the numerator and the denominator can both be expressed as products, and most of the factors cancel.

A sketch of a Gibbs sampling algorithm for the motif finding problem:

Initialization:
   Select random motif locations in the sequences x_1, ..., x_N.
   Compute an initial model M from these locations.

Sampling iterations:
   Remove one sequence x_i.
   Recalculate the model from the remaining sequences.
   Pick a new motif location in x_i according to the model.
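To make the sketch concrete, here is a hedged Python implementation of one possible version of this motif sampler. The sequence data, motif width, pseudocounts and the product scoring model are illustrative choices, not specified in the lecture: each iteration removes one sequence, rebuilds a position weight matrix from the other sequences' current motif locations, and samples a new location in the removed sequence proportionally to its score under the model.

import numpy as np

rng = np.random.default_rng(0)
ALPHABET = "ACGT"
IDX = {c: i for i, c in enumerate(ALPHABET)}

def pwm_from_locations(seqs, locs, w, skip=None, pseudo=1.0):
    """Position weight matrix (w x 4) from the current motif locations, optionally skipping one sequence."""
    counts = np.full((w, 4), pseudo)
    for s, (seq, loc) in enumerate(zip(seqs, locs)):
        if s == skip:
            continue
        for j, c in enumerate(seq[loc:loc + w]):
            counts[j, IDX[c]] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def gibbs_motif_sampler(seqs, w, n_iters=2000):
    # Initialization: random motif locations and an initial model.
    locs = [rng.integers(len(seq) - w + 1) for seq in seqs]
    for _ in range(n_iters):
        i = rng.integers(len(seqs))                       # remove one sequence x_i
        pwm = pwm_from_locations(seqs, locs, w, skip=i)   # recalculate the model without x_i
        # Score every possible location in x_i and sample one proportionally to its score.
        scores = np.array([
            np.prod([pwm[j, IDX[c]] for j, c in enumerate(seqs[i][p:p + w])])
            for p in range(len(seqs[i]) - w + 1)
        ])
        locs[i] = rng.choice(len(scores), p=scores / scores.sum())
    return locs

# Toy input: made-up sequences sharing an approximate planted motif "TATAAT".
seqs = ["CGCGTATAATGCAGT", "TTATAATCCGGAACG", "GGCACGTATAATTCA", "ATATAATGGCTTCGC"]
print(gibbs_motif_sampler(seqs, w=6))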