Markov Chains. Sarah Filippi Department of Statistics TA: Luke Kelly


Markov Chains. Sarah Filippi, Department of Statistics. http://www.stats.ox.ac.uk/~filippi TA: Luke Kelly. With grateful acknowledgements to Prof. Yee Whye Teh's slides from 2013-14.

Schedule
09:30-10:30 Lecture: Introduction to Markov chains
10:30-12:00 Practical
12:00-13:00 Lecture: Continuous-Time Markov Chains
13:00-14:00 Lunch
14:00-15:30 Practical

Practicals. Some mathematical derivations. Some programming in R or MATLAB. Probably not possible to do all practicals; pick and choose. Package available at http://www.stats.ox.ac.uk/~filippi/teaching.html

Markov Chains Andrey Andreyevich Markov 1856-1922

Sequential Processes. Sequence of random variables X_0, X_1, X_2, X_3, ... Not iid (independent and identically distributed). Examples: X_i = rain or shine on day i; X_i = nucleotide base at position i; X_i = state of the system at time i. The joint probability can be factorized by repeated conditioning (the chain rule): $P(X_0 = x_0, X_1 = x_1, X_2 = x_2, \ldots) = P(X_0 = x_0)\, P(X_1 = x_1 \mid X_0 = x_0)\, P(X_2 = x_2 \mid X_0 = x_0, X_1 = x_1) \cdots$

Markov Assumption. Markov assumption: each X_i depends only on the previous value X_{i-1}: $P(X_0 = x_0, X_1 = x_1, X_2 = x_2, \ldots) = P(X_0 = x_0)\, P(X_1 = x_1 \mid X_0 = x_0)\, P(X_2 = x_2 \mid X_0 = x_0, X_1 = x_1) \cdots = P(X_0 = x_0)\, P(X_1 = x_1 \mid X_0 = x_0)\, P(X_2 = x_2 \mid X_1 = x_1) \cdots$ The future is independent of the past, given the present: the process has no memory. X_0 -> X_1 -> X_2 -> X_3 -> ... Higher-order Markov chains (order k): $P(X_t = x_t \mid X_0 = x_0, \ldots, X_{t-1} = x_{t-1}) = P(X_t = x_t \mid X_{t-k} = x_{t-k}, \ldots, X_{t-1} = x_{t-1})$

Random Pub Crawl

Jukes-Cantor DNA Evolution. Example: GCTCATGCCG -> GCTAATGCCG -> GCTATTGGCG. Per-generation transition matrix (rows and columns ordered A, G, C, T): diagonal entries $1 - 3\epsilon$, off-diagonal entries $\epsilon$. The mutation process operates independently at each position. There is a small total probability $3\epsilon$ of a mutation happening at each generation.

Random Walk on Z. Start at 0. Move up or down with probability 1/2. [Figure: simulated random-walk trajectories over successively longer time horizons.]

Parameterization. X_0 -> X_1 -> X_2 -> X_3 -> ... Initial distribution: $P(X_0 = i) = \lambda_i$. Transition probability matrix: $P(X_t = j \mid X_{t-1} = i) = T_{ij}$. Homogeneous Markov chains: the transition probabilities do not depend on the time step. Inhomogeneous Markov chains: the transitions do depend on the time step.

Transition Probability Matrix. A transition probability matrix T has to have non-negative entries and rows that sum to 1. Conversely, any such (stochastic) matrix is a valid transition probability matrix.

State Transition Diagrams. [Figure: example state transition diagrams, with edges labelled by transition probabilities such as 1/4, 1/2 and 1.]

Practical: Simulating the Random Pub Crawl (*). Write a programme to simulate from the random pub crawl. (From the Home state allow probability 1/2 of going back to the Royal Oak.) Starting from the Home state, run your programme 1000 times, each time simulating a Markov chain of length 100. Each simulation should be a random sequence of values (s_1, s_2, s_3, ..., s_100) where each s_i is a pub. Collect statistics of the number of times each state is visited at each time step t = 1, ..., 100. How do the statistics differ if you start at the Magdalen Arms? Does the distribution over states visited at step t converge for large t? Approximately how long does it take for the chain to forget whether it started at Home or at the Magdalen Arms?
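A minimal simulation sketch in R (the real pub-crawl transition matrix is in the course package, so the small matrix and state names below are only stand-ins):

# Simulate n steps of a homogeneous Markov chain with transition matrix Tmat.
# Tmat: K x K matrix with non-negative entries and rows summing to 1;
# rownames/colnames give the state labels.
simulate_chain <- function(Tmat, start, n) {
  states <- rownames(Tmat)
  path <- character(n)
  current <- start
  for (t in 1:n) {
    current <- sample(states, size = 1, prob = Tmat[current, ])
    path[t] <- current
  }
  path
}

# Stand-in 3-state matrix (NOT the real pub-crawl graph):
states <- c("Home", "RoyalOak", "MagdalenArms")
Tmat <- matrix(c(0.0,  0.5,  0.5,
                 0.25, 0.5,  0.25,
                 0.5,  0.25, 0.25),
               nrow = 3, byrow = TRUE, dimnames = list(states, states))

# 1000 runs of length 100 started from Home; visits[t, s] counts how many
# of the 1000 chains are in state s at time step t.
runs <- replicate(1000, simulate_chain(Tmat, "Home", 100))
visits <- t(apply(runs, 1, function(r) table(factor(r, levels = states))))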

Useful Properties of Markov Chains

Chapman-Kolmogorov Equations. We can calculate multi-step transition probabilities recursively: $P(X_{t+2} = j \mid X_t = i) = \sum_k P(X_{t+2} = j \mid X_{t+1} = k)\, P(X_{t+1} = k \mid X_t = i) = \sum_k T_{ik} T_{kj} = (T^2)_{ij}$. Similarly, $P^{(m)}_{ij} := P(X_{t+m} = j \mid X_t = i) = \sum_k P(X_{t+m} = j \mid X_{t+1} = k)\, P(X_{t+1} = k \mid X_t = i) = \sum_k T_{ik}\, P^{(m-1)}_{kj} = (T^m)_{ij}$.
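As a quick numerical illustration (a sketch in R; any valid transition matrix can be substituted for the tiny two-state example below), the m-step transition probabilities are just the entries of the m-th matrix power:

# Chapman-Kolmogorov: m-step transition probabilities are (T^m)_ij.
mat_power <- function(M, m) Reduce(`%*%`, replicate(m, M, simplify = FALSE))

Tmat <- matrix(c(0.9, 0.1,
                 0.3, 0.7), nrow = 2, byrow = TRUE)  # tiny two-state example

mat_power(Tmat, 2)            # equals Tmat %*% Tmat, i.e. (T^2)_ij
rowSums(mat_power(Tmat, 10))  # rows of T^m still sum to 1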

Marginal Distributions. Similarly, we can calculate the marginal probabilities of each X_t recursively: $P^{(t)}_i := P(X_t = i) = \sum_k P(X_{t-1} = k)\, P(X_t = i \mid X_{t-1} = k) = \sum_k P^{(t-1)}_k T_{ki} = (\lambda T^t)_i$, where we take λ to be a row vector.

Communicating Classes. [Figure: example chains showing open and closed communicating classes, and an absorbing state.] Irreducible: a chain with one communicating class.

Periodicity. [Figure: example chains with periods 3, 2 and 1.] Period of i: $\gcd\{n : P(\text{returning to } i \text{ from } i \text{ in } n \text{ steps}) > 0\}$. If a chain is irreducible, then all states have the same period. If the period is 1, then we say the chain is aperiodic.

Recurrence and Transience. If we start at state i, what is the chance that we will return to i? Two possibilities: if $P(\exists t > 0 : X_t = i \mid X_0 = i) = p < 1$, the total number of times we will encounter i in the future is finite, and state i is transient; if $P(\exists t > 0 : X_t = i \mid X_0 = i) = 1$, we will return to i infinitely often, and state i is recurrent. A state i is recurrent if and only if $\sum_t P^{(t)}_{ii} = \infty$.

Random Walk on Z. Start at 0. Move up or down with probability 1/2. [Figure: a simulated trajectory of the walk.] X_{2t} is the sum of 2t iid {+1, -1} variables. It equals 0 if there are exactly t +1's and t -1's. This probability is $P^{(2t)}_{00} = \binom{2t}{t} \left(\tfrac{1}{2}\right)^{2t}$. Using Stirling's formula, $n! \approx \sqrt{2\pi}\, n^{n+1/2} e^{-n}$, this is approximately $1/\sqrt{\pi t}$, which sums to infinity over t, so the chain is recurrent.

Positive Recurrence and Null Recurrence. Recurrence: the chain will revisit a state infinitely often; from state i we will return to i after a (random) finite number of steps. But the expected number of steps can be infinite: this is called null recurrence. If the expected number of steps is finite, the state is called positive recurrent. Example of a null recurrent chain: the random walk on Z.

Practical: Communicating Classes. Find the communicating classes and determine whether each class is open or closed, and the periodicity of the closed classes. $\begin{pmatrix} 0 & 0 & 0 & 0 & 1 \\ 0 & 1/2 & 0 & 1/2 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 1/4 & 1/4 & 1/4 & 1/4 \\ 1 & 0 & 0 & 0 & 0 \end{pmatrix} \qquad \begin{pmatrix} 1/2 & 0 & 1/2 & 0 & 0 \\ 0 & 1/4 & 0 & 3/4 & 0 \\ 0 & 0 & 1/3 & 0 & 2/3 \\ 1/4 & 1/2 & 0 & 1/4 & 0 \\ 1/3 & 0 & 1/3 & 0 & 1/3 \end{pmatrix}$

Convergence of Markov Chains

Do Markov Chains Forget? A Markov chain on {0, 1, 2, ..., 100}. At each step: move up or down by 1 at random, except at the boundaries. Start at 0 and at 100. [Figure: the distributions of chains started at 0 and at 100 merging as the number of steps grows, shown on a log time scale up to 10^4 steps.]

Do Markov Chains Forget?

Stationary Distribution. If a Markov chain forgets its initial state, then for any two initial distributions (probability vectors) λ and γ, $\lambda T^n \approx \gamma T^n$ for large n. In particular, there is a distribution (probability vector) π such that $\lambda T^n \to \pi$ as $n \to \infty$. Taking λ = π, we see that $\pi T = \pi$. Such a distribution is called a stationary or equilibrium distribution. When do Markov chains have stationary distributions? When are stationary distributions unique?

Convergence Theorems. A positive recurrent Markov chain T has a stationary distribution. If T is irreducible and has a stationary distribution, then it is unique and $\pi_i = 1/m_i$, where $m_i$ is the mean return time of state i. If T is irreducible, aperiodic and has stationary distribution π, then $P(X_n = i) \to \pi_i$ as $n \to \infty$. (Ergodic Theorem) If T is irreducible with stationary distribution π, then $\#\{t \le n : X_t = i\}/n \to \pi_i$ as $n \to \infty$.

Stationarity and Reversibility. Global balance: at a stationary distribution, the flow of probability mass into and out of each state has to be balanced: $\sum_{i=1}^K \pi_i T_{ij} = \pi_j = \sum_{k=1}^K \pi_j T_{jk}$. Detailed balance: the flow of probability mass between each pair of states is balanced: $\pi_i T_{ij} = \pi_j T_{ji}$. A Markov chain satisfying detailed balance is called reversible: reversing the dynamics leads to the same chain. Detailed balance can be used to check that a distribution is the stationary distribution of an irreducible, aperiodic, reversible Markov chain.

Eigenvalue Decomposition. The stationary distribution is a left eigenvector of T with eigenvalue 1: $\pi T = \pi$. All eigenvalues of T have modulus at most 1 (some eigenvalues can be complex valued). If there is another eigenvector with eigenvalue 1, then the stationary distribution is not unique.
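A short R sketch of this idea: compute the left eigenvector of T for eigenvalue 1 (an eigenvector of the transpose) and normalise it to sum to 1. The two-state matrix is again only an example:

# Stationary distribution as the left eigenvector of T for eigenvalue 1.
stationary <- function(Tmat) {
  e <- eigen(t(Tmat))                  # left eigenvectors of T = eigenvectors of t(T)
  v <- Re(e$vectors[, which.min(abs(e$values - 1))])
  v / sum(v)                           # normalise to a probability vector
}

Tmat <- matrix(c(0.9, 0.1,
                 0.3, 0.7), nrow = 2, byrow = TRUE)
stationary(Tmat)           # (0.75, 0.25) for this example
stationary(Tmat) %*% Tmat  # check: pi T = pi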

Practical Random Walk Show that a random walk on a connected graph is reversible, and has stationary distribution π with π i proportional to deg(i), the number of edges connected to i. What is the probability the drinker is at home at Monday 9am if he started the pub crawl on Friday?

Practical: Stationary Distributions. Solve for the (possibly not unique) stationary distribution(s) of the following Markov chains. $\begin{pmatrix} 0 & 0 & 0 & 0 & 1 \\ 0 & 1/2 & 0 & 1/2 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 1/4 & 1/4 & 1/4 & 1/4 \\ 1 & 0 & 0 & 0 & 0 \end{pmatrix} \qquad \begin{pmatrix} 1/2 & 0 & 1/2 & 0 & 0 \\ 0 & 1/4 & 0 & 3/4 & 0 \\ 0 & 0 & 1/3 & 0 & 2/3 \\ 1/4 & 1/2 & 0 & 1/4 & 0 \\ 1/3 & 0 & 1/3 & 0 & 1/3 \end{pmatrix}$

Estimating Markov Chains

Maximum Likelihood Estimation. Observe a sequence x_0, x_1, x_2, x_3, ..., x_t. The likelihood of the sequence under the Markov chain model is $L(\lambda, T) = \lambda_{x_0} \prod_{s=1}^{t} T_{x_{s-1} x_s} = \lambda_{x_0} \prod_{i=1}^{K} \prod_{j=1}^{K} T_{ij}^{N_{ij}}$, where $N_{ij}$ is the number of observed transitions i -> j. We can solve for the maximum likelihood estimator: $\hat{T}_{ij} = \frac{N_{ij}}{\sum_{k=1}^K N_{ik}}$.
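A minimal sketch of this estimator in R: count the observed transitions and normalise each row. The state labels and toy sequence are placeholders:

# ML estimate of the transition matrix from a single observed sequence.
estimate_T <- function(x, states = sort(unique(x))) {
  K <- length(states)
  N <- matrix(0, K, K, dimnames = list(states, states))
  for (s in 2:length(x)) {
    N[x[s - 1], x[s]] <- N[x[s - 1], x[s]] + 1  # count transition x[s-1] -> x[s]
  }
  N / rowSums(N)  # T_ij = N_ij / sum_k N_ik (rows never visited give NaN)
}

x <- c("a", "b", "b", "a", "c", "b", "a", "a", "b", "c")  # toy observed sequence
estimate_T(x)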

Practical Markov Model of English Text (*) Download a large piece of English text, say War and Peace from Project Gutenberg. We will model the text as a sequence of characters. Write a programme to compute the ML estimate for the transition probability matrix. You can use the file markov_text.r or markov_text.m to help convert from text to the sequence of states needed. There are K = 96 states and the two functions are text2states and states2text. Generate a string of length 200 using your ML estimate. Does it look sensible?

Continuous-Time Markov Chains

Jukes-Cantor DNA Evolution. Per-generation transition matrix (rows and columns ordered A, G, C, T): diagonal entries $1 - 3\epsilon$, off-diagonal entries $\epsilon$. The probability of a mutation is $O(\epsilon)$ per generation, so mutations appear at a rate of roughly once every $O(1/\epsilon)$ generations. Measuring time in units of $1/\epsilon$ leads to a continuous-time Markov chain. In each time step of length $\epsilon$, the total probability of a mutation is $3\epsilon$: $P = I + \epsilon R$, where $R = \begin{pmatrix} -3 & 1 & 1 & 1 \\ 1 & -3 & 1 & 1 \\ 1 & 1 & -3 & 1 \\ 1 & 1 & 1 & -3 \end{pmatrix}$ (rows and columns ordered A, G, C, T).

Continuous-Time Markov Chains. A collection of random variables $(X_t)_{t \ge 0}$, with an initial distribution λ and a transition rate matrix R. Suppose $X_t = i$. Then over the next $\epsilon$ of time, $P(X_{t+\epsilon} = j \mid X_t = i) = I_{ij} + \epsilon R_{ij}$. Rows of R sum to 0; off-diagonal entries are non-negative; diagonal entries are the negative of the sum of the off-diagonal ones. Example (Jukes-Cantor): $R = \begin{pmatrix} -3 & 1 & 1 & 1 \\ 1 & -3 & 1 & 1 \\ 1 & 1 & -3 & 1 \\ 1 & 1 & 1 & -3 \end{pmatrix}$.

Lotka-Volterra Process (Predator-Prey). A continuous-time Markov chain over $\mathbb{N}^2$: the number of prey x and the number of predators y in an ecosystem. Transition rates: $R(\{x,y\} \to \{x+1,y\}) = \alpha x$, $R(\{x,y\} \to \{x-1,y\}) = \beta x y$, $R(\{x,y\} \to \{x,y+1\}) = \delta x y$, $R(\{x,y\} \to \{x,y-1\}) = \gamma y$.

Gillespie's Algorithm. [Figure: a sample path jumping between states A, G, C, T over time.] Start by sampling $X_0$ from the initial distribution λ. When in state i, wait in that state for an amount of time distributed as $\mathrm{Exp}(-R_{ii})$. At the end of the waiting time, transition to a different state $j \ne i$ with probability proportional to $R_{ij}$, i.e. according to the probability vector $\frac{1}{-R_{ii}}\,(R_{i1}, \ldots, R_{i,i-1}, 0, R_{i,i+1}, \ldots, R_{iK})$ (zero in position i).
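A sketch of the algorithm in R for a finite state space, using the Jukes-Cantor rate matrix as the running example (time measured in the rescaled units of the previous slides):

# Gillespie simulation of a finite-state continuous-time Markov chain.
# R: rate matrix with rows summing to 0; returns jump times and states visited.
gillespie <- function(R, init, t_max) {
  states <- rownames(R)
  times <- 0; path <- init
  current <- init; t <- 0
  repeat {
    rate <- -R[current, current]          # total rate of leaving the current state
    t <- t + rexp(1, rate = rate)         # exponential holding time, Exp(|R_ii|)
    if (t > t_max) break
    probs <- R[current, ]; probs[current] <- 0
    current <- sample(states, 1, prob = probs / rate)  # jump with prob R_ij / |R_ii|
    times <- c(times, t); path <- c(path, current)
  }
  data.frame(time = times, state = path)
}

nuc <- c("A", "G", "C", "T")
R_jc <- matrix(c(-3, 1, 1, 1,
                  1,-3, 1, 1,
                  1, 1,-3, 1,
                  1, 1, 1,-3), nrow = 4, byrow = TRUE, dimnames = list(nuc, nuc))
gillespie(R_jc, "A", t_max = 2)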

Chapman-Kolmogorov Equations. Denote by P(t) the transition probability matrix over a time interval of length t. Transition probabilities can be computed using matrix exponentiation: $P(t)_{ij} := P(X_t = j \mid X_0 = i) \approx \big((I + \tfrac{1}{n} R)^{tn}\big)_{ij} \to \big(\exp(tR)\big)_{ij}$ as $n \to \infty$. Composition of transition probability matrices: $P(t+s) = \exp((t+s)R) = \exp(tR)\exp(sR) = P(t)P(s)$. Forward/backward equations: $P(t+\epsilon) \approx P(t)(I + \epsilon R)$, so $\frac{\partial P(t)}{\partial t} = \lim_{\epsilon \to 0} \frac{P(t+\epsilon) - P(t)}{\epsilon} = P(t) R = R\, P(t)$.
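A numerical sketch of the limit on this slide in R: approximate exp(tR) by (I + tR/n)^n for large n (a matrix-exponential routine such as expm() from the Matrix package could be used instead) and check the composition property:

# Approximate the CTMC transition matrix P(t) = exp(t R) by (I + t R / n)^n.
approx_Pt <- function(R, t, n = 10000) {
  step <- diag(nrow(R)) + (t / n) * R
  out <- diag(nrow(R))
  for (i in seq_len(n)) out <- out %*% step
  out
}

R_jc <- matrix(c(-3, 1, 1, 1,
                  1,-3, 1, 1,
                  1, 1,-3, 1,
                  1, 1, 1,-3), nrow = 4, byrow = TRUE)

P_05 <- approx_Pt(R_jc, 0.5)
P_03 <- approx_Pt(R_jc, 0.3)
P_08 <- approx_Pt(R_jc, 0.8)
max(abs(P_05 %*% P_03 - P_08))  # close to 0 (up to discretisation error): P(0.5) P(0.3) = P(0.8)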

Convergence to Stationary Distribution. Suppose we have an irreducible, aperiodic and positive recurrent continuous-time Markov chain with rate matrix R. Then it has a unique stationary distribution π, which it converges to: $P(X_t = i) \to \pi_i$ as $t \to \infty$, and $\frac{1}{T}\int_0^T 1(X_t = i)\, dt \to \pi_i$ as $T \to \infty$.

Reversibility and Detailed Balance. If a Markov chain with rate matrix R has reached its stationary distribution π, then the flow of probability mass into and out of each state is balanced. Global balance: $\sum_{i=1}^K \pi_i R_{ij} = 0 = \sum_{k=1}^K \pi_j R_{jk}$, equivalently $\sum_{i \ne j} \pi_i R_{ij} = -\pi_j R_{jj} = \sum_{k \ne j} \pi_j R_{jk}$. Detailed balance for reversible chains: $\pi_i R_{ij} = \pi_j R_{ji}$.

Kimura 80 Model. Rate matrix (rows and columns ordered A, G, C, T), writing κ for the transition rate and 1 for the transversion rate: off-diagonal entries are κ for A<->G and C<->T and 1 otherwise, and diagonal entries are $-(\kappa + 2)$ so that rows sum to 0. The model distinguishes between transitions, A<->G (purines) and C<->T (pyrimidines), and transversions. Practical: show that the stationary distribution of the K80 model is uniform over {A, G, C, T}.

Felsenstein 81 Model. Rate matrix (rows and columns ordered A, G, C, T): the rate of substituting nucleotide i by nucleotide j ≠ i is $\pi_j$, so every off-diagonal entry in column j equals $\pi_j$ (with $\pi_A, \pi_G, \pi_C, \pi_T$ the target base frequencies), and the diagonal entries make each row sum to 0. Practical: Find the stationary distribution of the F81 model.

Practical: Predator-Prey Model (*). Use Gillespie's algorithm to simulate from the predator-prey model: $R(\{x,y\} \to \{x+1,y\}) = \alpha x$, $R(\{x,y\} \to \{x-1,y\}) = \beta x y$, $R(\{x,y\} \to \{x,y+1\}) = \delta x y$, $R(\{x,y\} \to \{x,y-1\}) = \gamma y$. You can represent the continuous-time trajectory using a sequence of pairs $(t_0, s_0), (t_1, s_1), \ldots, (t_A, s_A)$, where $t_0 = 0$, $s_0$ is the initial state at time 0, and each subsequent pair $(t_a, s_a)$ gives the time of the a-th jump and the state the chain jumps to. How are the dynamics of the model affected by the parameters α, β, γ, δ?
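A sketch in R using Gillespie's algorithm, with x as the prey count and y as the predator count; the parameter values below are placeholders, not values from the course:

# Gillespie simulation of the stochastic Lotka-Volterra (predator-prey) model.
lotka_volterra <- function(x0, y0, alpha, beta, delta, gamma, t_max) {
  t <- 0; x <- x0; y <- y0
  times <- 0; xs <- x0; ys <- y0
  while (t < t_max) {
    rates <- c(alpha * x,     # prey birth:     (x, y) -> (x+1, y)
               beta * x * y,  # prey eaten:     (x, y) -> (x-1, y)
               delta * x * y, # predator birth: (x, y) -> (x, y+1)
               gamma * y)     # predator death: (x, y) -> (x, y-1)
    if (sum(rates) == 0) break            # absorbed: no further events possible
    t <- t + rexp(1, rate = sum(rates))   # exponential time to the next event
    if (t > t_max) break
    event <- sample(4, 1, prob = rates / sum(rates))
    jump <- switch(event, c(1, 0), c(-1, 0), c(0, 1), c(0, -1))
    x <- x + jump[1]; y <- y + jump[2]
    times <- c(times, t); xs <- c(xs, x); ys <- c(ys, y)
  }
  data.frame(time = times, prey = xs, predators = ys)
}

traj <- lotka_volterra(x0 = 50, y0 = 20, alpha = 1, beta = 0.02,
                       delta = 0.01, gamma = 0.5, t_max = 30)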

Summary. Definition of Markov chains. Properties of Markov chains: communicating classes, recurrence and transience, periodicity. Convergence of Markov chains: stationary distribution. Estimation of Markov chains. Continuous-time Markov chains.