4. Markov Chains

A discrete time process $\{X_n, n = 0,1,2,\dots\}$ with discrete state space $X_n \in \{0,1,2,\dots\}$ is a Markov chain if it has the Markov property:
$$P[X_{n+1}=j \mid X_n=i, X_{n-1}=i_{n-1},\dots,X_0=i_0] = P[X_{n+1}=j \mid X_n=i].$$
In words, the past is conditionally independent of the future given the present state of the process; equivalently, given the present state, the past contains no additional information on the future evolution of the system. The Markov property is common in probability models because, by assumption, one supposes that the important variables for the system being modeled are all included in the state space. We consider homogeneous Markov chains, for which $P[X_{n+1}=j \mid X_n=i] = P[X_1=j \mid X_0=i]$.
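
As a small illustration (not part of the original notes), the following sketch simulates a path of a homogeneous Markov chain from a given transition matrix; the matrix `P` below is a made-up example.

```python
import numpy as np

def simulate_chain(P, x0, n_steps, rng=None):
    """Simulate n_steps of a homogeneous Markov chain with transition matrix P,
    started from state x0. States are labeled 0, 1, ..., len(P)-1."""
    rng = np.random.default_rng() if rng is None else rng
    P = np.asarray(P)
    path = [x0]
    for _ in range(n_steps):
        # The next state depends only on the current state (Markov property).
        path.append(rng.choice(len(P), p=P[path[-1]]))
    return path

# Hypothetical 3-state transition matrix; rows sum to 1.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]
print(simulate_chain(P, x0=0, n_steps=20))
```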

Example: physical systems. If the state space contains the masses, velocities and accelerations of particles subject to Newton's laws of mechanics, the system is Markovian (but not random!).

Example: speech recognition. Context can be important for identifying words. Context can be modeled as a probability distribution for the next word given the most recent k words. This can be written as a Markov chain whose state is a vector of k consecutive words.

Example: epidemics. Suppose each infected individual has some chance of contacting each susceptible individual in each time interval, before becoming removed (recovered or hospitalized). Then, the number of infected and susceptible individuals may be modeled as a Markov chain.

Define $P_{ij} = P[X_{n+1}=j \mid X_n=i]$. Let $P = [P_{ij}]$ denote the (possibly infinite) transition matrix of the one-step transition probabilities. Write $P^2_{ij} = \sum_{k=0}^{\infty} P_{ik}P_{kj}$, corresponding to standard matrix multiplication. Then
$$P^2_{ij} = \sum_k P[X_{n+1}=k \mid X_n=i]\, P[X_{n+2}=j \mid X_{n+1}=k] = \sum_k P[X_{n+2}=j, X_{n+1}=k \mid X_n=i] \quad \text{(via the Markov property. Why?)}$$
$$= P\Big[\bigcup_k \{X_{n+2}=j,\, X_{n+1}=k\} \,\Big|\, X_n=i\Big] = P[X_{n+2}=j \mid X_n=i].$$
Generalizing this calculation: the matrix power $P^n_{ij}$ gives the $n$-step transition probabilities.

The matrix multiplication identity $P^{n+m} = P^n P^m$ corresponds to the Chapman-Kolmogorov equation $P^{n+m}_{ij} = \sum_{k=0}^{\infty} P^n_{ik} P^m_{kj}$. Let $\nu^{(n)}$ be the (possibly infinite) row vector of probabilities at time $n$, so $\nu^{(n)}_i = P[X_n=i]$. Then $\nu^{(n)} = \nu^{(0)} P^n$, using standard multiplication of a vector and a matrix. Why?
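
A quick numerical check of the Chapman-Kolmogorov relation and of $\nu^{(n)} = \nu^{(0)} P^n$, using the same hypothetical 3-state matrix as above.

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])

# Chapman-Kolmogorov: P^(n+m) = P^n P^m.
n, m = 3, 4
assert np.allclose(np.linalg.matrix_power(P, n + m),
                   np.linalg.matrix_power(P, n) @ np.linalg.matrix_power(P, m))

# Marginal distribution at time n: nu^(n) = nu^(0) P^n.
nu0 = np.array([1.0, 0.0, 0.0])            # start in state 0
nu10 = nu0 @ np.linalg.matrix_power(P, 10)
print(nu10)                                 # P[X_10 = i] for i = 0, 1, 2
```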

Example. Set $X_0 = 0$, and let $X_n$ evolve as a Markov chain with transition matrix
$$P = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}.$$
[State diagram on states 0, 1, 2 omitted.] Find $\nu^{(n)}_0 = P[X_n=0]$ by (i) using a probabilistic argument

(ii) using linear algebra.
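
For part (ii), a sketch of the linear-algebra route (assuming the transition matrix as printed above): compute $\nu^{(0)}P^n$ for several $n$, and use the eigenvalues of $P$ to see the general form of $\nu^{(n)}_0$.

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
nu0 = np.array([1.0, 0.0, 0.0])

# P[X_n = 0] for n = 0..9, via nu^(0) P^n.
for n in range(10):
    print(n, (nu0 @ np.linalg.matrix_power(P, n))[0])

# The eigendecomposition expresses nu^(n)_0 as a combination of
# lambda^n terms over the eigenvalues lambda of P.
print(np.linalg.eigvals(P))
```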

Classification of States

State $j$ is accessible from $i$ if $P^k_{ij} > 0$ for some $k \geq 0$. The transition matrix can be represented as a directed graph with arrows corresponding to positive one-step transition probabilities; $j$ is accessible from $i$ if there is a path from $i$ to $j$. For example, consider the chain whose only positive-probability transitions form the path $0 \to 1 \to 2 \to 3 \to 4 \to \cdots$. Here, 4 is accessible from 0, but not vice versa.

States $i$ and $j$ communicate if they are accessible from each other. This is written $i \leftrightarrow j$, and is an equivalence relation, meaning that
(i) $i \leftrightarrow i$ [reflexivity]
(ii) If $i \leftrightarrow j$ then $j \leftrightarrow i$ [symmetry]
(iii) If $i \leftrightarrow j$ and $j \leftrightarrow k$ then $i \leftrightarrow k$ [transitivity]

An equivalence relation divides a set (here, the state space) into disjoint classes of equivalent states (here, called communication classes). A Markov chain is irreducible if all the states communicate with each other, i.e., if there is only one communication class. The communication class containing $i$ is absorbing if $P_{jk} = 0$ whenever $i \leftrightarrow j$ but $i \not\leftrightarrow k$ (i.e., when $i$ communicates with $j$ but not with $k$). An absorbing class can never be left. A partial converse is...

Example: Show that a communication class, once left, can never be re-entered.
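
One way to explore this computationally (a sketch, not from the notes): compute the reachability relation from the transition matrix and group states that reach each other.

```python
import numpy as np

def communication_classes(P):
    """Group the states of a finite chain into communication classes."""
    n = len(P)
    reach = np.asarray(P) > 0
    np.fill_diagonal(reach, True)        # every state reaches itself (k = 0)
    # Warshall-style transitive closure of the accessibility relation.
    for k in range(n):
        reach = reach | (reach[:, [k]] & reach[[k], :])
    communicate = reach & reach.T        # i <-> j
    classes, seen = [], set()
    for i in range(n):
        if i not in seen:
            cls = {j for j in range(n) if communicate[i, j]}
            classes.append(sorted(cls))
            seen |= cls
    return classes

# Hypothetical chain: states 0 and 1 communicate, state 2 is absorbing.
P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.0, 0.0, 1.0]]
print(communication_classes(P))   # [[0, 1], [2]]
```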

State $i$ has period $d$ if $P^n_{ii} = 0$ when $n$ is not a multiple of $d$, and if $d$ is the greatest integer with this property. If $d = 1$ then $i$ is aperiodic.

Example: Show that all states in the same communication class have the same period.

Note: This is obvious by considering a generic directed graph for a periodic state: a cycle $i \to i+1 \to i+2 \to \cdots \to i+d-1 \to i$.
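
A small sketch (not in the notes) that estimates the period of a state numerically, as the gcd of the return times $n$ with $P^n_{ii} > 0$, up to a cutoff.

```python
import numpy as np
from math import gcd
from functools import reduce

def period(P, i, n_max=50):
    """Period of state i: gcd of all n <= n_max with (P^n)_{ii} > 0."""
    P = np.asarray(P)
    return_times, Pn = [], np.eye(len(P))
    for n in range(1, n_max + 1):
        Pn = Pn @ P
        if Pn[i, i] > 1e-12:
            return_times.append(n)
    return reduce(gcd, return_times) if return_times else 0

# Hypothetical 3-cycle: every state has period 3.
P = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]
print(period(P, 0))   # 3
```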

State $i$ is recurrent if $P[\text{re-enter } i \mid X_0=i] = 1$, where $\{\text{re-enter } i\}$ is the event $\bigcup_{n=1}^{\infty}\{X_n=i,\ X_k \neq i \text{ for } k = 1,\dots,n-1\}$. If $i$ is not recurrent, it is transient.

Let $S_0 = 0$ and $S_1, S_2, \dots$ be the times of successive returns to $i$, with $N_i(t) = \max\{n : S_n \leq t\}$ being the corresponding counting process. If $i$ is recurrent, then $N_i(t)$ is a renewal process, since the Markov property gives independence of the interarrival times $X^A_n = S_n - S_{n-1}$. Letting $\mu_{ii} = E[X^A_1]$, the expected return time for $i$, we then have the following from renewal theory:

$P\big[\lim_{t\to\infty} N_i(t)/t = 1/\mu_{ii} \mid X_0=i\big] = 1$

If $i \leftrightarrow j$, $\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} P^k_{ij} = 1/\mu_{jj}$

If $i$ is aperiodic, $\lim_{n\to\infty} P^n_{ij} = 1/\mu_{jj}$ for $j \leftrightarrow i$

If $i$ has period $d$, $\lim_{n\to\infty} P^{nd}_{ii} = d/\mu_{ii}$

If $i$ is transient, then $N_i(t)$ is a defective renewal process. This is a generalization of renewal processes where $X^A_1, X^A_2, \dots$ are still iid, but we allow $P[X^A_1 = \infty] > 0$.

Proposition: $i$ is recurrent if and only if $\sum_{n=1}^{\infty} P^n_{ii} = \infty$.

Example: Show that if $i$ is recurrent and $i \leftrightarrow j$, then $j$ is recurrent.

Example: If $i$ is recurrent and $i \leftrightarrow j$, show that $P[\text{never enter state } j \mid X_0=i] = 0$.

If $i$ is transient, then $\mu_{ii} = \infty$. If $i$ is recurrent and $\mu_{ii} < \infty$, then $i$ is said to be positive recurrent. Otherwise, if $\mu_{ii} = \infty$, $i$ is null recurrent.

Proposition: If $i \leftrightarrow j$ and $i$ is recurrent, then either $i$ and $j$ are both positive recurrent, or both null recurrent (i.e., positive/null recurrence is a property of communication classes).

Random Walks

The simple random walk is a Markov chain on the integers, $\mathbb{Z} = \{\dots,-1,0,1,2,\dots\}$, with $X_0 = 0$ and $P[X_{n+1}=X_n+1] = p$, $P[X_{n+1}=X_n-1] = 1-p$.

Example: If $X_n$ counts the number of successes minus the number of failures for a new medical procedure, $X_n$ could be modeled as a random walk, with $p$ the success rate of the procedure. When should the trial be stopped?

If $p = 1/2$, the random walk is symmetric. The symmetric random walk in $d$ dimensions is a vector-valued Markov chain, with state space $\mathbb{Z}^d$ and $X^{(d)}_0 = (0,\dots,0)$. Two possible definitions are
(i) Let $X^{(d)}_{n+1} - X^{(d)}_n$ take each of the $2^d$ possibilities $(\pm 1, \pm 1, \dots, \pm 1)$ with equal probability.
(ii) Let $X^{(d)}_{n+1} - X^{(d)}_n$ take one of the $2d$ values $(\pm 1, 0, \dots, 0), (0, \pm 1, 0, \dots, 0), \dots$ with equal probability. This is harder to analyze.

Example: A biological signaling molecule becomes separated from its receptor. It starts diffusing, due to thermal noise. Suppose the diffusion is well modeled by a random walk. Will the molecule return to the receptor? If the molecule is constrained to a one-dimensional line? A two-dimensional surface? Three-dimensional space?

Proposition: The symmetric random walk is null recurrent when $d = 1$ and $d = 2$, but transient for $d \geq 3$.

Proof: The method is to employ Stirling's formula, $n! \sim n^{n+1/2} e^{-n} \sqrt{2\pi}$, where $a_n \sim b_n$ means $\lim_{n\to\infty}(a_n/b_n) = 1$, to approximate $P\big[X^{(d)}_n = X^{(d)}_0\big]$.

Note: This is in HW 5. You are expected to solve the simpler case (i), though you can solve (ii) if you want a bigger challenge.
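
A rough numerical illustration (a sketch, not part of the homework solution): simulate symmetric random walks under definition (i) and record how often they return to the origin within some horizon. The horizon and number of replicates are arbitrary choices.

```python
import numpy as np

def return_fraction(d, n_walks=1000, horizon=1000, rng=None):
    """Fraction of d-dimensional symmetric walks (definition (i)) that revisit
    the origin within `horizon` steps. Higher d should give smaller fractions."""
    rng = np.random.default_rng(0) if rng is None else rng
    returned = 0
    for _ in range(n_walks):
        steps = rng.choice([-1, 1], size=(horizon, d))   # each coordinate moves +-1
        pos = np.cumsum(steps, axis=0)
        if np.any(np.all(pos == 0, axis=1)):
            returned += 1
    return returned / n_walks

for d in (1, 2, 3):
    print(d, return_fraction(d))
```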

Stationary Distributions

$\pi = \{\pi_i, i = 0,1,\dots\}$ is a stationary distribution for $P = [P_{ij}]$ if $\pi_j = \sum_{i=0}^{\infty} \pi_i P_{ij}$, with $\pi_i \geq 0$ and $\sum_{i=0}^{\infty} \pi_i = 1$. In matrix notation, $\pi_j = \sum_{i=0}^{\infty} \pi_i P_{ij}$ is $\pi = \pi P$, where $\pi$ is a row vector.

Theorem: An irreducible, aperiodic, positive recurrent Markov chain has a unique stationary distribution, which is also the limiting distribution $\pi_j = \lim_{n\to\infty} P^n_{ij}$. Such Markov chains are called ergodic.

Proof

Proof continued

Irreducible chains which are transient or null recurrent have no stationary distribution. Why? Chains which are periodic, or which have multiple communication classes, may have $\lim_{n\to\infty} P^n_{ij}$ not existing, or depending on $i$. A chain started in a stationary distribution will remain in that distribution, i.e., will result in a stationary process. If we can find any probability distribution solving the stationarity equations $\pi = \pi P$, and we check that the chain is irreducible and aperiodic, then we know that
(i) The chain is positive recurrent.
(ii) $\pi$ is the unique stationary distribution.
(iii) $\pi$ is the limiting distribution.
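
For a finite state space, the stationarity equations can be solved directly; below is a sketch that solves $\pi = \pi P$ with the normalization constraint (the example matrix is made up).

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi = pi P, sum(pi) = 1, for a finite transition matrix P."""
    P = np.asarray(P, dtype=float)
    n = len(P)
    # Solve (P^T - I) pi^T = 0 together with the normalization row.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

P = [[0.5, 0.5, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]
pi = stationary_distribution(P)
print(pi)                                             # stationary probabilities
print(np.linalg.matrix_power(np.asarray(P), 50)[0])   # compare with a limiting row
```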

Example: Markov chain Monte Carlo

Suppose we wish to evaluate $E[h(X)]$ where $X$ has distribution $\pi$ (i.e., $P[X=i] = \pi_i$). The Monte Carlo approach is to generate $X_1, X_2, \dots, X_n \sim \pi$ and estimate $E[h(X)] \approx \frac{1}{n}\sum_{i=1}^{n} h(X_i)$. If it is hard to generate an iid sample from $\pi$, we may look to generate a sequence from a Markov chain with limiting distribution $\pi$. This idea, called Markov chain Monte Carlo (MCMC), was introduced by Metropolis et al. (1953) and generalized by Hastings (1970). It has become a fundamental computational method for the physical and biological sciences. It is also commonly used for Bayesian statistical inference.

Metropolis-Hastings Algorithm

(i) Choose a transition matrix $Q = [q_{ij}]$.
(ii) Set $X_0 = 0$.
(iii) For $n = 1, 2, \dots$, generate $Y_n$ with $P[Y_n=j \mid X_{n-1}=i] = q_{ij}$. If $X_{n-1} = i$ and $Y_n = j$, set
$$X_n = \begin{cases} j & \text{with probability } \min(1, \pi_j q_{ji}/\pi_i q_{ij}) \\ i & \text{otherwise.} \end{cases}$$
Here, $Y_n$ is called the proposal, and we say the proposal is accepted with probability $\min(1, \pi_j q_{ji}/\pi_i q_{ij})$. If the proposal is not accepted, the chain stays in its previous state.
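
A minimal sketch of the algorithm for a finite target distribution $\pi$, with a made-up proposal matrix $Q$; note that only the ratios $\pi_j/\pi_i$ appear, so $\pi$ is needed only up to a normalizing constant.

```python
import numpy as np

def metropolis_hastings(pi, Q, n_steps, x0=0, rng=None):
    """Metropolis-Hastings chain targeting pi (up to a constant) with proposal matrix Q."""
    rng = np.random.default_rng(1) if rng is None else rng
    pi, Q = np.asarray(pi, float), np.asarray(Q, float)
    x, path = x0, [x0]
    for _ in range(n_steps):
        y = rng.choice(len(Q), p=Q[x])                        # propose Y_n ~ q_{x,.}
        accept = min(1.0, (pi[y] * Q[y, x]) / (pi[x] * Q[x, y]))
        if rng.random() < accept:
            x = y                                             # accept the proposal
        path.append(x)                                        # otherwise stay at x
    return np.array(path)

pi = [0.5, 0.25, 0.25]                # hypothetical target distribution
Q = [[0.0, 0.5, 0.5],                 # hypothetical proposal chain
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
path = metropolis_hastings(pi, Q, n_steps=50_000)
print(np.bincount(path, minlength=3) / len(path))   # should be close to pi
```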

Proposition: Set
$$P_{ij} = \begin{cases} q_{ij}\min(1, \pi_j q_{ji}/\pi_i q_{ij}) & j \neq i \\ q_{ii} + \sum_{k \neq i} q_{ik}\{1 - \min(1, \pi_k q_{ki}/\pi_i q_{ik})\} & j = i. \end{cases}$$
Then $\pi$ is a stationary distribution of the Metropolis-Hastings chain $\{X_n\}$. If $P_{ij}$ is irreducible and aperiodic, then $\pi$ is also the limiting distribution.

Proof.

Example: (spatial models on a lattice) Let $\{Y_{ij}, 1 \leq i \leq m, 1 \leq j \leq n\}$ be a spatial stochastic process. Let $N(i,j)$ define a neighborhood, e.g.
$$N(i,j) = \{(p,q) : |p-i| + |q-j| = 1\}.$$
We may model the conditional distribution of $Y_{ij}$ given the neighbors. E.g., if your neighbors vote Republican (R), you are more likely to do so. Say,
$$P\big[Y_{ij}=R \mid \{Y_{pq}, (p,q) \neq (i,j)\}\big] = \tfrac{1}{2} + \alpha\Big[\Big(\textstyle\sum_{(p,q) \in N(i,j)} I\{Y_{pq}=R\}\Big) - 2\Big].$$
The full distribution $\pi$ of $Y$ is, in principle, specified by the conditional distributions. In practice, one simulates from $\pi$ using MCMC, where the proposal distribution is to pick a random $(i,j)$ and swap it from R to D or vice versa.

Example: Sampling conditional distributions. If $X$ and $\Theta$ have joint density $f_{X\Theta}(x,\theta)$ and we observe $X = x$, then one wants to sample from
$$f_{\Theta|X}(\theta \mid x) = \frac{f_{X\Theta}(x,\theta)}{f_X(x)} \propto f_\Theta(\theta)\, f_{X|\Theta}(x \mid \theta).$$
For Bayesian inference, $f_\Theta(\theta)$ is the prior distribution and $f_{\Theta|X}(\theta \mid x)$ is the posterior. The model determines $f_\Theta(\theta)$ and $f_{X|\Theta}(x \mid \theta)$. The normalizing constant, $f_X(x) = \int f_{X\Theta}(x,\theta)\,d\theta$, is often unknown. Using MCMC to sample from the posterior allows numerical evaluation of the posterior mean $E[\Theta \mid X=x]$, the posterior variance $\mathrm{Var}(\Theta \mid X=x)$, or a $1-\alpha$ credible region defined to be a set $A$ such that $P[\Theta \in A \mid X=x] = 1-\alpha$.

Galton-Watson Branching Process

Let $X_n$ be the size of a population at time $n$. Suppose each individual has an iid offspring distribution, so $X_{n+1} = \sum_{k=1}^{X_n} Z^{(k)}_{n+1}$, where $Z^{(k)}_{n+1} \sim Z$. We can think of $Z^{(k)}_{n+1} = 0$ as the death of the parent, or think of $X_n$ as the size of the $n$th generation. Suppose $X_0 = 1$. Notice that $X_n = 0$ is an absorbing state, termed extinction. Supposing $P[Z=0] > 0$ and $P[Z>1] > 0$, we can show that $1, 2, 3, \dots$ are transient states. Why? Thus, the branching process either becomes extinct or tends to infinity.

Set $\varphi(s) = E[s^Z]$, the probability generating function for $Z$. Set $\varphi_n(s) = E[s^{X_n}]$. Show that $\varphi_n(s) = \varphi^{(n)}(s) = \varphi(\varphi(\dots\varphi(s)\dots))$, meaning $\varphi$ applied $n$ times.

If we can find a solution to $\varphi(s) = s$, we will have $\varphi_n(s) = s$ for all $n$. This suggests plotting $\varphi(s)$ vs $s$, noting that
(i) $\varphi(1) = 1$. Why?
(ii) $\varphi(0) = P[Z=0]$. Why?
(iii) $\varphi(s)$ is increasing, i.e., $d\varphi/ds > 0$. Why?
(iv) $\varphi(s)$ is convex, i.e., $d^2\varphi/ds^2 > 0$. Why?

Now, notice that, for $0 < s < 1$,
$$P[\text{extinction}] = \lim_{n\to\infty} P[X_n=0] = \lim_{n\to\infty} \varphi_n(s).$$
Argue that, in CASE 1, $\lim_{n\to\infty} \varphi_n(s)$ is the unique fixed point of $\varphi(s) = s$ with $0 < s < 1$. In CASE 2, $\lim_{n\to\infty} \varphi_n(s) = 1$.
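
Numerically, the extinction probability can be approximated by iterating the generating function, since $P[X_n=0] = \varphi_n(0) = \varphi(\varphi_{n-1}(0))$; the offspring distribution below is an arbitrary illustrative choice, not the one used in the notes.

```python
def extinction_probability(offspring_pmf, n_iter=200):
    """Approximate P[extinction] by iterating s -> phi(s) starting from s = 0.
    offspring_pmf[k] = P[Z = k]."""
    phi = lambda s: sum(p * s**k for k, p in enumerate(offspring_pmf))
    s = 0.0
    for _ in range(n_iter):
        s = phi(s)              # s becomes phi_n(0) = P[X_n = 0]
    return s

# Illustrative offspring distribution: P[Z=0]=0.2, P[Z=1]=0.3, P[Z=2]=0.5.
print(extinction_probability([0.2, 0.3, 0.5]))   # converges to the smallest fixed point (0.4)
```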

Conclude that in CASE 1 (with $\frac{d\varphi}{ds}\big|_{s=1} > 1$, i.e. $E[Z] > 1$) there is some possibility of infinite growth. In CASE 2 (with $\frac{d\varphi}{ds}\big|_{s=1} \leq 1$, i.e. $E[Z] \leq 1$) extinction is assured.

Example: take
$$Z = \begin{cases} 0 & \text{w.p. } 1/4 \\ 1 & \text{w.p. } 1/4 \\ 2 & \text{w.p. } 1/2 \end{cases}$$
Find the chance of extinction. Now suppose the founding population has size $k$ (i.e., $X_0 = k$). Find the chance of extinction.

Time Reversal

Thinking backwards in time can be a powerful strategy. For any Markov chain $\{X_k, k = 0,1,\dots,n\}$, we can set $Y_k = X_{n-k}$. Then $Y_k$ is a Markov chain. If $X_k$ has transition matrix $P_{ij}$, then $Y_k$ has inhomogeneous transition matrix $P^{[k]}_{ij}$, where
$$P^{[k]}_{ij} = P[Y_{k+1}=j \mid Y_k=i] = P[X_{n-k-1}=j \mid X_{n-k}=i] = \frac{P[X_{n-k}=i \mid X_{n-k-1}=j]\, P[X_{n-k-1}=j]}{P[X_{n-k}=i]} = P_{ji}\, P[X_{n-k-1}=j]\,/\, P[X_{n-k}=i].$$
If $\{X_k\}$ is stationary, with stationary distribution $\pi$, then $\{Y_k\}$ is homogeneous, with transition matrix $P^*_{ij} = P_{ji}\pi_j/\pi_i$. Surprisingly many important chains have the time-reversibility property $P^*_{ij} = P_{ij}$, for which the chain looks the same forwards as backwards.

Theorem (detailed balance equations). If $\pi$ is a probability distribution with $\pi_i P_{ij} = \pi_j P_{ji}$, then $\pi$ is a stationary distribution for $P$ and the chain started at $\pi$ is time-reversible.

Proof

The detailed balance equations are simpler than the general equations for finding $\pi$. Try to solve them when looking for $\pi$, in case you get lucky! Intuition: $\pi_i P_{ij}$ is the rate of going directly from $i$ to $j$. A chain is reversible if this equals the rate of going directly from $j$ to $i$. Periodic chains ($\pi$ does not give limiting probabilities) and reducible chains ($\pi$ is not unique) are both OK here.
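
A tiny sketch of the "try detailed balance first" advice: given a candidate $\pi$, check $\pi_i P_{ij} = \pi_j P_{ji}$ for all pairs; if it holds, $\pi$ is stationary. The matrix and candidate below are made up.

```python
import numpy as np

def satisfies_detailed_balance(P, pi, tol=1e-10):
    """Check pi_i P_ij == pi_j P_ji for all i, j."""
    P, pi = np.asarray(P, float), np.asarray(pi, float)
    flow = pi[:, None] * P           # flow[i, j] = pi_i P_ij
    return np.allclose(flow, flow.T, atol=tol)

# Hypothetical reversible chain (a birth-death chain on 3 states).
P = [[0.7, 0.3, 0.0],
     [0.2, 0.5, 0.3],
     [0.0, 0.6, 0.4]]
pi_candidate = np.array([2.0, 3.0, 1.5])
pi_candidate /= pi_candidate.sum()
print(satisfies_detailed_balance(P, pi_candidate))   # True
```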

An equivalent condition for reversibility of ergodic Markov chains is that every loop of states starting and ending in $i$ has the same probability as the reverse loop, i.e., for all $i_1, \dots, i_k$,
$$P_{i i_1} P_{i_1 i_2} \cdots P_{i_{k-1} i_k} P_{i_k i} = P_{i i_k} P_{i_k i_{k-1}} \cdots P_{i_2 i_1} P_{i_1 i}.$$

Proof

Example: Birth-Death Process. Let $P[X_{n+1}=X_n+1 \mid X_n=i] = p_i$ and $P[X_{n+1}=X_n-1 \mid X_n=i] = q_i = 1-p_i$, with $p_0 = 1$. If $\{X_n\}$ is positive recurrent, it must be reversible. Heuristic reason: [state diagram on $0, 1, 2, 3, 4, \dots$ with up-probabilities $p_i$ and down-probabilities $q_i$ omitted.]

Find the stationary distribution, giving conditions for its existence.

Note that this chain is periodic ($d = 2$). There is still a unique stationary distribution which solves the detailed balance equations.
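
A sketch of the detailed-balance computation for a truncated birth-death chain (the $p_i$ values are arbitrary): detailed balance gives $\pi_{i+1} = \pi_i\, p_i / q_{i+1}$, which can be accumulated and normalized.

```python
import numpy as np

def birth_death_stationary(p):
    """Stationary distribution of a birth-death chain on {0,...,N} from
    detailed balance pi_{i+1} q_{i+1} = pi_i p_i, with q_i = 1 - p_i."""
    p = np.asarray(p, float)          # p[i] = up-probability from state i
    q = 1.0 - p
    pi = np.ones(len(p))
    for i in range(len(p) - 1):
        pi[i + 1] = pi[i] * p[i] / q[i + 1]
    return pi / pi.sum()

# Hypothetical chain on {0,...,4}: reflecting at 0 (p_0 = 1) and at 4 (p_4 = 0).
p = [1.0, 0.6, 0.5, 0.4, 0.0]
print(birth_death_stationary(p))
```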

Example: Metropolis Algorithm. The Metropolis-Hastings algorithm with symmetric proposals ($q_{ij} = q_{ji}$) is the Metropolis algorithm. Here,
$$P_{ij} = \begin{cases} q_{ij}\min(1, \pi_j/\pi_i) & j \neq i \\ q_{ii} + \sum_{k \neq i} q_{ik}\{1 - \min(1, \pi_k/\pi_i)\} & j = i. \end{cases}$$
Show that $P_{ij}$ is reversible, with stationary distribution $\pi$.

Solution

Random Walk on a Graph

Consider an undirected, connected graph with a positive weight $w_{ij}$ on each edge $ij$. Set $w_{ij} = 0$ if $i$ and $j$ are not connected by an edge. Set
$$P_{ij} = \frac{w_{ij}}{\sum_k w_{ik}}.$$
The Markov chain $\{X_n\}$ with transition matrix $P$ is called a random walk on a graph. Think of a driver who is lost: each vertex is an intersection, the weight is the width (in lanes) of a road leaving an intersection, and the driver picks a random exit upon arriving at an intersection but has a preference for bigger streets. [Weighted graph on vertices $0, \dots, 5$ omitted.]

Show that the random walk on a graph is reversible, and has stationary distribution
$$\pi_i = \frac{\sum_j w_{ij}}{\sum_{j,k} w_{jk}}.$$
Note: the double sum counts each weight twice. Hence, find the limiting probability of being at vertex 3 on the graph shown above.
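
A small sketch of this computation for an arbitrary weight matrix (the weights below are made up, since the figure's weights are not reproduced here): $\pi_i$ is the weight incident to $i$ divided by the total weight counted from both endpoints.

```python
import numpy as np

# Hypothetical symmetric weight matrix for a small undirected graph; w[i, j] = 0 means no edge.
w = np.array([[0, 1, 2, 0],
              [1, 0, 1, 3],
              [2, 1, 0, 2],
              [0, 3, 2, 0]], dtype=float)

P = w / w.sum(axis=1, keepdims=True)       # P_ij = w_ij / sum_k w_ik
pi = w.sum(axis=1) / w.sum()               # pi_i = sum_j w_ij / sum_{j,k} w_jk

print(pi)
print(np.allclose(pi @ P, pi))                              # stationarity check
print(np.allclose(pi[:, None] * P, (pi[:, None] * P).T))    # detailed balance (reversibility)
```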

Ergodicity

A stochastic process $\{X_n\}$ is ergodic if limiting time averages equal limiting probabilities, i.e.,
$$\lim_{n\to\infty} \frac{\text{time in state } i \text{ up to time } n}{n} = \lim_{n\to\infty} P[X_n=i].$$
Show that an irreducible, aperiodic, positive recurrent Markov chain is ergodic (i.e., this general idea of ergodicity matches our definition for Markov chains).

Semi-Markov Processes

A semi-Markov process is a discrete state, continuous time process $\{Z(t), t \geq 0\}$ for which the sequence of states visited is a Markov chain $\{X_n, n = 0,1,\dots\}$ and, conditional on $\{X_n\}$, the times between transitions are independent. Specifically, define transition times $S_0, S_1, \dots$ by $S_0 = 0$ and $S_n - S_{n-1} \sim F_{ij}$, conditional on $X_{n-1}=i$ and $X_n=j$. Then $Z(t) = X_{N(t)}$, where $N(t) = \sup\{n : S_n \leq t\}$. $\{X_n\}$ is the embedded Markov chain. The special case where $F_{ij} \sim \text{Exponential}(\nu_i)$ is called a continuous time Markov chain. The special case where
$$F_{ij}(t) = \begin{cases} 0, & t < 1 \\ 1, & t \geq 1 \end{cases}$$
results in $Z(n) = X_n$, so we retrieve the discrete time Markov chain.

Example: A taxi serves the airport, the hotel district and the theater district. Let $Z(t) = 0, 1, 2$ if the taxi's current destination is each of these locations, respectively. The time to travel from $i$ to $j$ has distribution $F_{ij}$. Upon arrival at $i$, the taxi immediately picks up a new customer whose destination has distribution given by $P_{ij}$. Then $Z(t)$ is a semi-Markov process. To study the limiting behavior of $Z(t)$ we make some definitions...

Say $\{Z(t)\}$ is irreducible if $\{X_n\}$ is irreducible. Say $\{Z(t)\}$ is non-lattice if $\{N(t)\}$ is non-lattice. Set $H_i$ to be the c.d.f. of the time in state $i$ before a transition, so $H_i(t) = \sum_j P_{ij} F_{ij}(t)$, and the expected time of a visit to $i$ is $\mu_i = \int_0^\infty x \, dH_i(x) = \sum_j P_{ij} \int_0^\infty x \, dF_{ij}(x)$. Let $T_{ii}$ be the time between successive visits to $i$ and define $\mu_{ii} = E[T_{ii}]$.

Proposition: If $\{Z(t)\}$ is an irreducible, non-lattice semi-Markov process with $\mu_{ii} < \infty$, then
$$\lim_{t\to\infty} P[Z(t)=i \mid Z(0)=j] = \frac{\mu_i}{\mu_{ii}} = \lim_{t\to\infty} \frac{\text{time in } i \text{ before } t}{t} \quad \text{w.p. 1.}$$
Proof: identify a relevant alternating renewal process.

By counting up the time spent in $i$ differently, we get another identity:
$$\lim_{t\to\infty} \frac{\text{time in } i \text{ during } [0,t]}{t} = \frac{\pi_i \mu_i}{\sum_j \pi_j \mu_j},$$
supposing that $\{X_n\}$ is ergodic (irreducible, aperiodic, positive recurrent) with stationary distribution $\pi$.

Proof
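
A sketch of this identity for a taxi-style example with made-up embedded transition matrix and made-up mean travel times: compute $\pi$ for the embedded chain, then the long-run fraction of time in each state as $\pi_i\mu_i / \sum_j \pi_j\mu_j$.

```python
import numpy as np

# Hypothetical embedded chain over {airport, hotel, theater} and mean travel times (minutes).
P = np.array([[0.0, 0.7, 0.3],
              [0.5, 0.0, 0.5],
              [0.6, 0.4, 0.0]])
mean_travel = np.array([[ 0.0, 30.0, 20.0],     # E[F_ij]
                        [30.0,  0.0, 10.0],
                        [20.0, 10.0,  0.0]])

# Stationary distribution of the embedded chain {X_n}: left eigenvector for eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
pi /= pi.sum()

mu = (P * mean_travel).sum(axis=1)              # mu_i = sum_j P_ij * E[F_ij]
fractions = pi * mu / (pi * mu).sum()           # long-run fraction of time in each state
print(fractions)
```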

Calculations with semi-Markov models typically involve identifying a suitable renewal, alternating renewal, regenerative or reward process.

Example: Let $Y(t) = S_{N(t)+1} - t$ be the residual life process for an ergodic, non-lattice semi-Markov model. Find an expression for $\lim_{t\to\infty} P[Y(t) > x]$. Hint: Consider an alternating renewal process which switches on when $Z(t)$ enters $i$, and switches off immediately if the next transition is not to $j$, or switches off once the time until the transition to $j$ is less than $x$.

Proof (continued)