Discrete Time Markov Chains - Definition and Classification
Applied Mathematics and Computer Science
02407 Stochastic Processes 1, September 5 2017

Today:
- Short recap of probability theory
- Markov chain introduction (Markov property)
- Chapman-Kolmogorov equations
- First step analysis

Next week:
- Random walks
- First step analysis revisited
- Branching processes
- Generating functions

Two weeks from now:
- Classification of states
- Classification of chains
- Discrete time Markov chains - invariant probability distribution

Basic concepts in probability

Probability axioms and first results
- Sample space Ω: the set of all possible outcomes
- Outcome ω
- Events A, B
- Complementary event A^c = Ω \ A
- Union A ∪ B: the outcome is in at least one of A or B
- Intersection A ∩ B: the outcome is in both A and B
- ∅: the empty (impossible) event

Axioms: 0 ≤ P(A) ≤ 1, P(Ω) = 1, and P(A ∪ B) = P(A) + P(B) for A ∩ B = ∅.
Leading to P(∅) = 0, P(A^c) = 1 − P(A), and
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)  (inclusion-exclusion)
Conditional probability and independence
P(A | B) = P(A ∩ B) / P(B)
P(A ∩ B) = P(A | B) P(B)  (multiplication rule)
For a partition ∪_i B_i = Ω with B_i ∩ B_j = ∅ for i ≠ j:
P(A) = Σ_i P(A | B_i) P(B_i)  (law of total probability)
Independence:
P(A | B) = P(A | B^c) = P(A), equivalently P(A ∩ B) = P(A) P(B)

Discrete random variables
A random variable X is a mapping from the sample space to a metric space (read: the real numbers).
Probability mass function: f(x) = P(X = x) = P({ω : X(ω) = x})
Distribution function: F(x) = P(X ≤ x) = P({ω : X(ω) ≤ x}) = Σ_{t ≤ x} f(t)

Expectation
E(X) = Σ_x x P(X = x),  E(g(X)) = Σ_x g(x) P(X = x) = Σ_x g(x) f(x)

Joint distribution
We can define the joint distribution of (X_0, X_1) through
P(X_0 = x_0, X_1 = x_1) = P(X_0 = x_0) P(X_1 = x_1 | X_0 = x_0) = P(X_0 = x_0) P_{x_0,x_1}
f(x_1, x_2) = P(X_1 = x_1, X_2 = x_2),  F(x_1, x_2) = P(X_1 ≤ x_1, X_2 ≤ x_2)
Marginals:
f_{X_1}(x_1) = P(X_1 = x_1) = Σ_{x_2} P(X_1 = x_1, X_2 = x_2) = Σ_{x_2} f(x_1, x_2)
F_{X_1}(x_1) = Σ_{t ≤ x_1} Σ_{x_2} P(X_1 = t, X_2 = x_2) = F(x_1, ∞)
This extends straightforwardly to n variables.

Suppose now, in addition to some stationarity, that X_2 conditioned on X_1 is independent of X_0:
P(X_0 = x_0, X_1 = x_1, X_2 = x_2)
= P(X_0 = x_0) P(X_1 = x_1 | X_0 = x_0) P(X_2 = x_2 | X_0 = x_0, X_1 = x_1)
= P(X_0 = x_0) P(X_1 = x_1 | X_0 = x_0) P(X_2 = x_2 | X_1 = x_1)
= p_{x_0} P_{x_0,x_1} P_{x_1,x_2}
which generalizes to arbitrary n.
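The factorization above can be checked numerically. The sketch below uses a made-up two-state chain (the initial vector `p0`, the matrix `P`, and the chosen path are illustrative values, not from the lecture) and verifies that the path probabilities p_{x_0} P_{x_0,x_1} P_{x_1,x_2} sum to one over all paths.

```python
import numpy as np

# A made-up two-state chain illustrating the factorization
# P(X0=x0, X1=x1, X2=x2) = p_{x0} P_{x0,x1} P_{x1,x2}
p0 = np.array([0.5, 0.5])          # initial distribution p
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])         # transition matrix {P_{i,j}}

path = [0, 1, 1]                   # the path (X0, X1, X2) = (0, 1, 1)
prob = p0[path[0]] * P[path[0], path[1]] * P[path[1], path[2]]

# sanity check: the joint probabilities of all length-3 paths sum to one
total = sum(p0[a] * P[a, b] * P[b, c]
            for a in range(2) for b in range(2) for c in range(2))
```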
Markov property
P(X_n = x_n | H) = P(X_n = x_n | X_0 = x_0, X_1 = x_1, X_2 = x_2, ..., X_{n−1} = x_{n−1}) = P(X_n = x_n | X_{n−1} = x_{n−1})
In general the next state depends on the current state and on the time. In most applications the chain is assumed to be time homogeneous, i.e. the transition probabilities do not depend on time. The only parameters needed are
p_{ij} = P(X_n = j | X_{n−1} = i)
We collect these parameters in a matrix {p_{ij}}.
The joint probability of the first n transitions is
P(X_0 = x_0, X_1 = x_1, X_2 = x_2, ..., X_n = x_n) = p_{x_0} P_{x_0,x_1} P_{x_1,x_2} ··· P_{x_{n−1},x_n}

A profuse number of applications
- Storage/inventory models
- Telecommunications systems
- Biological models
X_n is the value attained at time n; X_n could be
- the number of cars in stock
- the number of days since the last rainfall
- the number of passengers booked for a flight
See the textbook for further examples.

Example 1: Random walk with two reflecting barriers 0 and N

    1−p    p     0     0    ···    0
    q    1−p−q   p     0    ···    0
    0      q   1−p−q   p    ···    0
    ···
    0    ···    q    1−p−q   p
    0    ···    0      q    1−q

Example 2: Random walk with one reflecting barrier at 0

    1−p    p     0     0   ···
    q    1−p−q   p     0   ···
    0      q   1−p−q   p   ···
    ···
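Example 1 can be built programmatically. The helper name `reflecting_walk` and the numeric values of N, p, q below are illustrative choices, not from the lecture; the assertion checks the defining property that every row sums to one.

```python
import numpy as np

def reflecting_walk(N, p, q):
    """Transition matrix for the random walk on {0, ..., N} with two
    reflecting barriers (Example 1): up with prob p, down with prob q,
    stay with prob 1 - p - q in the interior."""
    P = np.zeros((N + 1, N + 1))
    P[0, 0], P[0, 1] = 1 - p, p
    P[N, N - 1], P[N, N] = q, 1 - q
    for i in range(1, N):
        P[i, i - 1], P[i, i], P[i, i + 1] = q, 1 - p - q, p
    return P

P = reflecting_walk(4, p=0.3, q=0.2)
# every row is a conditional distribution, so rows must sum to one
assert np.allclose(P.sum(axis=1), 1.0)
```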
Example 3: Random walk with two absorbing barriers

    1      0     0     0   ···    0
    q    1−p−q   p     0   ···    0
    0      q   1−p−q   p   ···    0
    ···
    0    ···    q    1−p−q   p
    0    ···    0      0     1

The matrix can be finite (if the Markov chain is finite)

    p_{1,1}  p_{1,2}  ···  p_{1,n}
    p_{2,1}  p_{2,2}  ···  p_{2,n}
    ···
    p_{n,1}  p_{n,2}  ···  p_{n,n}

e.g. two reflecting/absorbing barriers, or infinite (if the Markov chain is infinite)

    p_{1,1}  p_{1,2}  p_{1,3}  ···
    p_{2,1}  p_{2,2}  p_{2,3}  ···
    ···

e.g. at most one barrier.

The matrix P can be interpreted as
- the engine that drives the process
- the statistical descriptor of the quantitative behaviour
- a collection of discrete probability distributions
For each i we have a conditional distribution: what is the probability of the next state being j, knowing that the current state is i?
p_{ij} = P(X_n = j | X_{n−1} = i),  Σ_j p_{ij} = 1
We say that P is a stochastic matrix.
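The absorbing-barrier walk of Example 3 satisfies the same stochastic-matrix property; a minimal sketch (the helper name and numeric values are mine, not the lecture's) builds it and checks that barrier rows are unit vectors:

```python
import numpy as np

def absorbing_walk(N, p, q):
    """Random walk on {0, ..., N} where the barriers 0 and N absorb
    (Example 3); interior states move up with prob p, down with prob q."""
    P = np.zeros((N + 1, N + 1))
    P[0, 0] = 1.0                      # absorbing barrier at 0
    P[N, N] = 1.0                      # absorbing barrier at N
    for i in range(1, N):
        P[i, i - 1], P[i, i], P[i, i + 1] = q, 1 - p - q, p
    return P

P = absorbing_walk(5, p=0.3, q=0.2)
assert np.allclose(P.sum(axis=1), 1.0)   # P is a stochastic matrix
```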
More definitions and first properties
We have defined rules for the behaviour from one value and onwards. Boundary conditions specify e.g. the behaviour of X_0: X_0 could be certain (X_0 = a) or random with P(X_0 = i) = p_i. Once again we collect the possibly infinitely many parameters in a vector p.

n-step transition probabilities
P(X_n = j | X_0 = i) = P^{(n)}_{ij}
is the probability of being in j at the n-th transition, having started in i. Once again these are collected in a matrix P^{(n)} = {P^{(n)}_{ij}}. The rows of P^{(n)} can be interpreted like the rows of P: we can define a new Markov chain on a larger time scale (P^n).

Small example

    P = | 1−p   p    0    0  |
        |  q    0    p    0  |
        |  0    q    0    p  |
        |  0    0    q   1−q |

    P^{(2)} = | (1−p)^2 + pq   (1−p)p    p^2        0            |
              | q(1−p)         2qp       0          p^2          |
              | q^2            0         2qp        p(1−q)       |
              | 0              q^2       (1−q)q     (1−q)^2 + qp |

Chapman-Kolmogorov equations
There is a generalisation of the example above. Suppose we start in i at time 0 and want to get to j at time n + m. At some intermediate time n we must be in some state k. We apply the law of total probability, P(B) = Σ_k P(B | A_k) P(A_k):
P(X_{n+m} = j | X_0 = i) = Σ_k P(X_{n+m} = j | X_0 = i, X_n = k) P(X_n = k | X_0 = i)
= Σ_k P(X_{n+m} = j | X_0 = i, X_n = k) P(X_n = k | X_0 = i)
and by the Markov property we get
= Σ_k P(X_{n+m} = j | X_n = k) P(X_n = k | X_0 = i) = Σ_k P^{(m)}_{kj} P^{(n)}_{ik} = Σ_k P^{(n)}_{ik} P^{(m)}_{kj}
which in matrix formulation is
P^{(n+m)} = P^{(n)} P^{(m)} = P^{n+m}

The behaviour of the process itself - X_n
The behaviour conditional on X_0 = i is known (P^{(n)}_{ij}). Define P(X_n = j) = p^{(n)}_j; with p^{(n)} = {p^{(n)}_j} we find
p^{(n)} = p P^{(n)} = p P^n

Small example - revisited
With p = (1/2, 0, 0, 1/2) we get

    p^{(1)} = (1/2, 0, 0, 1/2) | 1−p   p    0    0  |  =  ( (1−p)/2, p/2, q/2, (1−q)/2 )
                               |  q    0    p    0  |
                               |  0    q    0    p  |
                               |  0    0    q   1−q |
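The Chapman-Kolmogorov identity and p^{(n)} = p P^n are easy to verify numerically for the small example. The values p = 0.3, q = 0.7 below are illustrative (this particular P needs p + q = 1 for its middle rows to sum to one):

```python
import numpy as np

# The 4-state small example with illustrative values p = 0.3, q = 0.7
p_, q_ = 0.3, 0.7
P = np.array([[1 - p_, p_, 0, 0],
              [q_, 0, p_, 0],
              [0, q_, 0, p_],
              [0, 0, q_, 1 - q_]])

# Chapman-Kolmogorov in matrix form: P^(n+m) = P^(n) P^(m)
P2 = P @ P
assert np.allclose(np.linalg.matrix_power(P, 5),
                   P2 @ np.linalg.matrix_power(P, 3))

# distribution of X_2 from the initial vector p = (1/2, 0, 0, 1/2)
p0 = np.array([0.5, 0.0, 0.0, 0.5])
p2 = p0 @ P2
assert np.isclose(p2.sum(), 1.0)
```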
    p^{(2)} = (1/2, 0, 0, 1/2) | (1−p)^2 + pq   (1−p)p    p^2        0            |
                               | q(1−p)         2qp       0          p^2          |
                               | q^2            0         2qp        p(1−q)       |
                               | 0              q^2       (1−q)q     (1−q)^2 + qp |

            = ( ((1−p)^2 + pq)/2, ((1−p)p + q^2)/2, (p^2 + (1−q)q)/2, ((1−q)^2 + qp)/2 )

First step analysis - setup
Consider the transition probability matrix

    | 1  0  0 |
    | α  β  γ |
    | 0  0  1 |

Define T = min{n ≥ 0 : X_n = 0 or X_n = 2} and
u = P(X_T = 0 | X_0 = 1)
v = E(T | X_0 = 1)

First step analysis - absorption probability
u = P(X_T = 0 | X_0 = 1)
  = Σ_{k=0}^{2} P(X_1 = k | X_0 = 1) P(X_T = 0 | X_0 = 1, X_1 = k)
  = Σ_{k=0}^{2} P(X_1 = k | X_0 = 1) P(X_T = 0 | X_1 = k)
  = P_{1,0} · 1 + P_{1,1} · u + P_{1,2} · 0
And we find
u = P_{1,0} / (1 − P_{1,1}) = α / (α + γ)

First step analysis - time to absorption
v = E(T | X_0 = 1)
  = Σ_{k=0}^{2} P(X_1 = k | X_0 = 1) E(T | X_0 = 1, X_1 = k)
  = 1 + Σ_{k=0}^{2} P(X_1 = k | X_0 = 1) E(T | X_0 = k)   (NB!)
  = 1 + P_{1,0} · 0 + P_{1,1} · v + P_{1,2} · 0
And we find
v = 1 / (1 − P_{1,1}) = 1 / (1 − β)
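A Monte Carlo simulation can confirm the first-step analysis results u = α/(α + γ) and v = 1/(1 − β). The numeric values of α, β, γ below are made up for illustration:

```python
import random

random.seed(1)
alpha, beta, gamma = 0.3, 0.5, 0.2   # made-up values of the middle row

def run_once():
    """Run the chain from state 1 until it hits an absorbing state (0 or 2);
    return the absorbing state reached and the number of steps taken."""
    state, steps = 1, 0
    while state == 1:
        r = random.random()
        state = 0 if r < alpha else (1 if r < alpha + beta else 2)
        steps += 1
    return state, steps

samples = [run_once() for _ in range(100_000)]
u_hat = sum(1 for s, _ in samples if s == 0) / len(samples)   # estimates u
v_hat = sum(n for _, n in samples) / len(samples)             # estimates v
# compare with u = alpha/(alpha + gamma) = 0.6 and v = 1/(1 - beta) = 2
```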
More than one transient state

    | 1        0        0        0       |
    | P_{1,0}  P_{1,1}  P_{1,2}  P_{1,3} |
    | P_{2,0}  P_{2,1}  P_{2,2}  P_{2,3} |
    | 0        0        0        1       |

Here we will need conditional probabilities u_i = P(X_T = 0 | X_0 = i) and conditional expectations v_i = E(T | X_0 = i), leading to
u_1 = P_{1,0} + P_{1,1} u_1 + P_{1,2} u_2
u_2 = P_{2,0} + P_{2,1} u_1 + P_{2,2} u_2
and
v_1 = 1 + P_{1,1} v_1 + P_{1,2} v_2
v_2 = 1 + P_{2,1} v_1 + P_{2,2} v_2

General absorbing Markov chain
For a general finite-state absorbing Markov chain, order the states so that the transient states 0, ..., r − 1 come first:

    P = | Q  R |
        | 0  I |

and let T = min{n ≥ 0 : X_n ≥ r}. In state j we accumulate a reward g(j); w_i is the expected total reward conditioned on starting in state i:
w_i = E( Σ_{n=0}^{T−1} g(X_n) | X_0 = i )
leading to
w_i = g(i) + Σ_{j < r} P_{i,j} w_j
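The first-step equations for u and v are linear systems and can be solved directly. The transient block Q and the absorption probabilities R0 below are hypothetical values (rows of the full 4-state matrix sum to one):

```python
import numpy as np

# Hypothetical 4-state chain: transient states 1, 2 and absorbing states 0, 3.
# Q is the transient block, R0 the one-step probabilities of jumping to 0.
Q = np.array([[0.2, 0.3],
              [0.4, 0.1]])
R0 = np.array([0.4, 0.2])            # P_{1,0}, P_{2,0}

# u = R0 + Q u  and  v = 1 + Q v  rearrange to (I - Q) u = R0, (I - Q) v = 1
u = np.linalg.solve(np.eye(2) - Q, R0)       # absorption probabilities into 0
v = np.linalg.solve(np.eye(2) - Q, np.ones(2))   # expected times to absorption

assert np.allclose(u, R0 + Q @ u)    # u solves its first-step equations
assert np.allclose(v, 1 + Q @ v)     # v solves its first-step equations
```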
Special cases of the general absorbing Markov chain
- g(i) = 1: expected time to absorption (v_i)
- g(i) = δ_{ik}: expected number of visits to state k before absorption
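Both special cases follow from the same linear system, since w = (I − Q)^{−1} g. A sketch with a hypothetical transient block Q (not a lecture example); the matrix (I − Q)^{−1} is often called the fundamental matrix:

```python
import numpy as np

# Hypothetical transient block Q of an absorbing chain
Q = np.array([[0.2, 0.3],
              [0.4, 0.1]])
N = np.linalg.inv(np.eye(2) - Q)     # fundamental matrix (I - Q)^{-1}

# g(i) = 1: expected time to absorption v_i
v = N @ np.ones(2)

# g(i) = delta_{ik}: expected visits to transient state k before absorption,
# here for k = 0, which picks out the first column of N
visits_to_0 = N @ np.array([1.0, 0.0])

assert np.allclose(v, N.sum(axis=1))
```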