The Transition Probability Function P_ij(t)

Consider a continuous-time Markov chain {X(t), t ≥ 0}. We are interested in the probability that in t time units the process will be in state j, given that it is currently in state i:

    P_ij(t) = P(X(t+s) = j | X(s) = i)

This function is called the transition probability function of the process.

Example: Find the transition probability functions P_ij(t) (i ≠ j) and P_ii(t) for a pure birth process with birth rates {λ_n}_{n≥0}.

Definition: The Yule process is a pure birth process in which each individual in the population is assumed to give birth at rate λ. That is, λ_n = nλ (n ≥ 1).

Example: Consider a Yule process. Assuming that the population starts with a single individual (X(0) = 1), find the distribution of the size of the population at time t.

61
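The Yule example lends itself to a quick simulation check. The sketch below (with made-up values for λ and t) simulates the process directly from its definition, drawing the exponential holding time in state n with rate nλ, and compares the empirical mean population size against the well-known mean E[X(t)] = e^{λt} for a Yule process started at X(0) = 1.

```python
import math
import random

def simulate_yule(lam, t, seed=None):
    """Simulate a Yule process (lambda_n = n * lambda) started at X(0) = 1;
    return the population size at time t."""
    rng = random.Random(seed)
    n, clock = 1, 0.0
    while True:
        # The holding time in state n is exponential with rate n * lambda.
        clock += rng.expovariate(n * lam)
        if clock > t:
            return n
        n += 1

lam, t = 0.5, 2.0  # made-up rate and time horizon
samples = [simulate_yule(lam, t, seed=s) for s in range(20_000)]
mean = sum(samples) / len(samples)
print(round(mean, 2), round(math.exp(lam * t), 2))  # empirical vs. e^{lambda t}
```

A natural follow-up is to compare the empirical histogram of `samples` with the distribution found in the example.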
Recall that the transition probabilities for discrete-time Markov chains were subject to the Chapman-Kolmogorov equations, which said that the k-step transition probabilities can be obtained by raising the 1-step transition probability matrix to the k-th power. We would like to derive a similar relationship (but in the form of differential equations) for the transition probability functions P_ij(t).

Recall further that a continuous-time Markov chain can be defined through the rates v_i at which transitions from one state to another occur and through the (not time dependent) transition probability matrix P = (P_ij).

Definition: Let q_ij = v_i P_ij be the rate at which the process makes transitions from state i to state j. The q_ij are called the instantaneous transition rates of the process.

Note that both the transition rates and the transition probabilities can be recovered from the instantaneous transition rates:

    v_i = Σ_j v_i P_ij = Σ_j q_ij    and    P_ij = q_ij / v_i = q_ij / Σ_j q_ij

Specifying the instantaneous transition rates is therefore equivalent to specifying v and P.

To derive differential equations for the transition probability functions, we need a few intermediate results.

Lemma 1:

    lim_{h→0} [1 − P_ii(h)] / h = v_i

Proof:

62
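The equivalence between (v, P) and the instantaneous rates q can be checked numerically. The sketch below uses a hypothetical 3-state chain (all numbers made up for illustration): it forms q_ij = v_i P_ij and then recovers v_i = Σ_j q_ij and P_ij = q_ij / v_i from the q_ij alone.

```python
# Hypothetical 3-state chain: holding rates v_i and jump-chain matrix P
# (P_ii = 0, each row summing to one). All numbers are made up.
v = [2.0, 1.0, 4.0]
P = [[0.0, 0.3, 0.7],
     [0.5, 0.0, 0.5],
     [0.9, 0.1, 0.0]]

# Instantaneous transition rates q_ij = v_i * P_ij
q = [[v[i] * P[i][j] for j in range(3)] for i in range(3)]

# Recover v_i = sum_j q_ij and P_ij = q_ij / v_i from the q_ij alone
v_back = [sum(row) for row in q]
P_back = [[q[i][j] / v_back[i] for j in range(3)] for i in range(3)]

for i in range(3):
    assert abs(v_back[i] - v[i]) < 1e-12
    for j in range(3):
        assert abs(P_back[i][j] - P[i][j]) < 1e-12
print("v and P recovered from the instantaneous rates q")
```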
Lemma 2:

    lim_{h→0} P_ij(h) / h = q_ij    (i ≠ j)

Proof:

Lemma 3 (Chapman-Kolmogorov equations for CTMCs): For all s ≥ 0, t ≥ 0,

    P_ij(t+s) = Σ_k P_ik(t) P_kj(s)

Proof:

Combining the results above, we now have

    P_ij(t+h) − P_ij(t) = Σ_k P_ik(h) P_kj(t) − P_ij(t)
                        = Σ_{k≠i} P_ik(h) P_kj(t) − [1 − P_ii(h)] P_ij(t)

Hence

    lim_{h→0} [P_ij(t+h) − P_ij(t)] / h = lim_{h→0} ( Σ_{k≠i} [P_ik(h)/h] P_kj(t) − [(1 − P_ii(h))/h] P_ij(t) )

Exchanging the limit and summation and using the above results leads to the following theorem.

Theorem (Kolmogorov Backward Equations):

    P′_ij(t) = Σ_{k≠i} q_ik P_kj(t) − v_i P_ij(t)

Note: Why are these equations called backward equations? Consider a specific target state j (fixed) that you want to reach by time t+s. The backward equations
63
assign probabilities of reaching j within s time units if at time t you are currently in state i (as a function of i). That is, they condition backwards in time on the earlier state.

Example: Formulate the backward equations for a birth and death process with birth rates {λ_n}_{n≥0} and death rates {µ_n}_{n≥1}, respectively.

Example: Suppose a machine works for an amount of time that is exponentially distributed with rate λ before breaking down. The time it takes for the machine to be repaired is exponential with rate µ. Set up a continuous-time Markov chain probability model (i.e., define the states and transition parameters). Formulate the backward equations.

64
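For the machine example, one possible model has two states (0 = working, 1 = under repair) with q_01 = λ and q_10 = µ, so the backward equations read P′_0j(t) = λ[P_1j(t) − P_0j(t)] and P′_1j(t) = µ[P_0j(t) − P_1j(t)]. The sketch below (with made-up rates) integrates these with small Euler steps from P(0) = I and compares P_00(t) against the known closed form for a two-state chain.

```python
import math

# Made-up rates for the machine example: breakdowns at rate lam,
# repairs at rate mu (state 0 = working, state 1 = under repair).
lam, mu = 2.0, 1.0

def backward_P00(t, steps=100_000):
    """Euler-integrate the backward equations
       P00'(t) = lam * (P10(t) - P00(t)),  P10'(t) = mu * (P00(t) - P10(t))
    from P(0) = I, returning P00(t)."""
    h = t / steps
    P00, P10 = 1.0, 0.0
    for _ in range(steps):
        P00, P10 = P00 + h * lam * (P10 - P00), P10 + h * mu * (P00 - P10)
    return P00

t = 1.5
approx = backward_P00(t)
# Known closed form for a two-state chain:
exact = mu / (lam + mu) + lam / (lam + mu) * math.exp(-(lam + mu) * t)
print(round(approx, 4), round(exact, 4))
```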
Similarly to Kolmogorov's backward equations, one can also derive another set of differential equations known as Kolmogorov's forward equations. These equations are useful in finding the probability distribution of the future state j if the current state i is fixed (forwards in time).

For the backward equations we used the Chapman-Kolmogorov equations in the form

    P_ij(t+h) = Σ_k P_ik(h) P_kj(t)

Of course, we could also write

    P_ij(t+h) = Σ_k P_ik(t) P_kj(h)

When combined with the results of the other lemmata, this leads to

    P_ij(t+h) − P_ij(t) = Σ_k P_ik(t) P_kj(h) − P_ij(t)
                        = Σ_{k≠j} P_ik(t) P_kj(h) − [1 − P_jj(h)] P_ij(t)

which becomes

    lim_{h→0} [P_ij(t+h) − P_ij(t)] / h = lim_{h→0} ( Σ_{k≠j} P_ik(t) [P_kj(h)/h] − [(1 − P_jj(h))/h] P_ij(t) )

Theorem (Kolmogorov's Forward Equations): Under suitable regularity conditions (i.e., whenever it is ok to interchange summation and limit above) the following holds:

    P′_ij(t) = Σ_{k≠j} q_kj P_ik(t) − v_j P_ij(t)

Example: Consider the pure birth process with general birth rates {λ_n}_{n≥0}. Use the forward equations to find P_ii(t) and P_ij(t) (i < j).

65
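The pure birth example can be explored numerically: for such a process the forward equations reduce to P′_ij(t) = λ_{j−1} P_{i,j−1}(t) − λ_j P_ij(t), and for j = i the inflow term vanishes, giving P_ii(t) = e^{−λ_i t}. The sketch below (made-up rates, state space truncated for the computation) integrates the forward equations with Euler steps and checks this.

```python
import math

lam = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up birth rates lambda_0 .. lambda_4
N = len(lam)

def forward_probs(i, t, steps=100_000):
    """Euler-integrate the forward equations of a pure birth process,
       P'_ij = lam_{j-1} * P_{i,j-1} - lam_j * P_ij,
    starting from state i. States are truncated at N - 1, so probability
    leaks out of the last state and sum(P) may fall below one."""
    h = t / steps
    P = [0.0] * N
    P[i] = 1.0
    for _ in range(steps):
        dP = [0.0] * N
        for j in range(N):
            inflow = lam[j - 1] * P[j - 1] if j > 0 else 0.0
            dP[j] = inflow - lam[j] * P[j]
        P = [P[j] + h * dP[j] for j in range(N)]
    return P

i, t = 1, 0.8
P = forward_probs(i, t)
print(round(P[i], 4), round(math.exp(-lam[i] * t), 4))  # P_ii(t) vs. e^{-lam_i t}
```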
Limiting Probabilities

Recall that we have shown for discrete-time Markov chains that under certain conditions (the chain being ergodic, for instance) a limiting distribution exists. The limiting distribution was the probability that the chain is in state j after running for a long time, regardless of the initial state. An analogous concept exists for continuous-time Markov chains.

Definition: Define

    P_j = lim_{t→∞} P_ij(t)

Note that the definition assumes that this limit exists and does not depend on the initial state i.

For discrete-time Markov chains, the limiting probabilities (or stationary distributions) could be found as left eigenvectors of the transition probability matrix corresponding to eigenvalue one. For continuous-time Markov chains we will use the Kolmogorov forward equations to derive them:

    P′_ij(t) = Σ_{k≠j} q_kj P_ik(t) − v_j P_ij(t)

Taking limits on both sides (and assuming that limit and summation are exchangeable) we obtain

    lim_{t→∞} P′_ij(t) = lim_{t→∞} [ Σ_{k≠j} q_kj P_ik(t) − v_j P_ij(t) ] = Σ_{k≠j} q_kj P_k − v_j P_j

Since P_ij(t) is a probability (and hence bounded between zero and one), the limit of its derivative must be zero. (Why?) Thus,

    0 = Σ_{k≠j} q_kj P_k − v_j P_j    or    v_j P_j = Σ_{k≠j} q_kj P_k

This set of equations is sometimes called the balance equations of the process. Together with the fact that the limiting probabilities P_j need to sum to one, they can be used to solve for the limiting probabilities.

Note that v_j P_j is the long-run rate at which the Markov process leaves state j, whereas Σ_{k≠j} q_kj P_k is the long-run rate at which the process enters state j.

Note: The above assumed that the limiting probabilities exist. That is the case for continuous-time Markov chains if the Markov chain is irreducible (that is, each state is reachable from every other state) and positive recurrent in the sense that the expected time to return to state j from state j is finite for every state j. Continuous-time Markov chains of this type are called ergodic.

66
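For a finite chain, the balance equations together with the normalization condition form a linear system that can be solved mechanically. The sketch below (numpy, with made-up instantaneous rates q_kj for a 3-state chain) builds the system v_j P_j = Σ_{k≠j} q_kj P_k, replaces one redundant equation with Σ_j P_j = 1, and solves for the limiting probabilities.

```python
import numpy as np

# Made-up instantaneous rates q[k][j] for a 3-state chain (q_kk = 0).
q = np.array([[0.0, 1.0, 2.0],
              [3.0, 0.0, 1.0],
              [0.5, 0.5, 0.0]])
v = q.sum(axis=1)  # v_j: total rate out of state j

# Balance equations in matrix form: (q^T - diag(v)) P = 0.
A = q.T - np.diag(v)
# The balance equations are linearly dependent (they sum to zero), so
# replace one of them with the normalization sum_j P_j = 1.
A[-1, :] = 1.0
b = np.zeros(3)
b[-1] = 1.0
P = np.linalg.solve(A, b)

# Rate out of each state equals the rate in:
for j in range(3):
    rate_in = sum(q[k][j] * P[k] for k in range(3) if k != j)
    assert abs(v[j] * P[j] - rate_in) < 1e-10
print(P)
```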
Example: Consider an M/M/1 queue in which arrivals occur according to a Poisson process with rate λ = 20 and service times are exponentially distributed with rate µ = 10. Suppose further that the maximal length of the queue is 1; that is, there will never be more than one customer waiting in line. A customer who arrives and finds another customer already waiting in line will go away (without being served). Model this problem through a continuous-time Markov chain (define the states and transition rates of the process). Use the model to find the proportion of time the server will be busy. What is the proportion of customers that will go away without being served?

67
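One way to set up this model is with states 0, 1, 2 counting the customers in the system (the one in service plus at most one waiting), with arrivals blocked in state 2. For such a birth-death chain the balance equations reduce to λ P_0 = µ P_1 and λ P_1 = µ P_2, so P_n is proportional to (λ/µ)^n. The sketch below solves these; identifying the lost-customer fraction with P_2 uses the fact that Poisson arrivals see time averages (PASTA).

```python
# States 0, 1, 2 = number of customers in the system (including the one
# in service); arrivals that find the system in state 2 are lost.
lam, mu = 20.0, 10.0
rho = lam / mu

# Balance equations for this birth-death chain give P_n proportional to rho^n.
weights = [rho**n for n in range(3)]
total = sum(weights)
P = [w / total for w in weights]

busy = P[1] + P[2]  # server busy whenever at least one customer is present
lost = P[2]         # fraction of arrivals turned away (by PASTA)
print([round(p, 4) for p in P], round(busy, 4), round(lost, 4))
```

With λ = 20 and µ = 10 this gives P = (1/7, 2/7, 4/7), so the server is busy 6/7 of the time and 4/7 of the customers go away without being served.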
Computing Transition Probabilities

Recall that for discrete-time Markov chains we found stationary distributions by solving the matrix equation πP = π. A similar matrix equation exists for continuous-time Markov chains, but it requires additional notation.

Definition: Let R = (r_ij) with

    r_ij = q_ij    if i ≠ j
    r_ij = −v_i    if i = j

R is called the transition rate matrix of the process.

In this notation, the Kolmogorov forward and backward equations become

    (Backward)    P′_ij(t) = Σ_{k≠i} q_ik P_kj(t) − v_i P_ij(t) = Σ_k r_ik P_kj(t)
    (Forward)     P′_ij(t) = Σ_{k≠j} q_kj P_ik(t) − v_j P_ij(t) = Σ_k r_kj P_ik(t)

In matrix notation (using P′(t) = (P′_ij(t))) this can be written as matrix differential equations:

    P′(t) = R P(t)    and    P′(t) = P(t) R

Similar to scalar differential equations (where the only function that is proportional to its own derivative is the exponential function), one can show that the solution to the above matrix differential equations is

    P(t) = P(0) e^{Rt}

where P(0) = I is the identity matrix and the matrix e^{Rt} is defined through

    e^{Rt} = Σ_{n=0}^{∞} R^n t^n / n!

It is not efficient to compute e^{Rt} directly through the sum above (because there is a lot of computational error in raising R to high powers). But there are approximation methods that work well in practice. Recall that the exponential function can also be expressed as

    e^x = lim_{n→∞} (1 + x/n)^n

In this context that means

    e^{Rt} = lim_{n→∞} (I + Rt/n)^n

Using a large number for n, usually of the form n = 2^k, requires less computation (and thus introduces less error) in approximating e^{Rt}: the matrix I + Rt/n is formed once and then squared k times.

68
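The squaring idea fits in a few lines: with n = 2^k, form I + Rt/n once and square the result k times. The sketch below (numpy) applies it to the rate matrix of the two-state machine chain with made-up rates; a quick sanity check is that every row of the approximated P(t) sums to one, and for this chain P(5) is already very close to the limiting distribution (1/3, 2/3).

```python
import numpy as np

def expm_squaring(R, t, k=20):
    """Approximate e^{Rt} via (I + Rt/2^k)^(2^k): form the factor
    once, then square it k times."""
    n = 2**k
    A = np.eye(R.shape[0]) + R * (t / n)
    for _ in range(k):
        A = A @ A
    return A

# Transition rate matrix of the two-state machine chain (made-up rates):
# off-diagonal entries are the q_ij, diagonal entries are -v_i.
lam, mu = 2.0, 1.0
R = np.array([[-lam,  lam],
              [  mu,  -mu]])

P5 = expm_squaring(R, 5.0)
print(P5)
print(P5.sum(axis=1))  # each row of a transition probability matrix sums to one
```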
Example: Consider again the M/M/1 queue from the previous example. Formulate the transition rate matrix for this process and use it to approximate P(5). If there is currently one customer in line, what is the probability that t = 5 (or 50, or 500) time units from now the server will be idle? 69