Lecture 11: Introduction to Markov Chains. Copyright G. Caire (Sample Lectures) 321


Discrete-time random processes

A sequence of RVs indexed by a variable n ∈ {0, 1, 2, ...} forms a discrete-time random process {X_n} = {X_n : n = 0, 1, 2, ...}. The process is defined by the collection of all joint distributions of order m,

F_{X_{n_1},...,X_{n_m}}(x_1, ..., x_m)

for all instants {n_1, ..., n_m} and for all m = 1, 2, 3, ... If the process has the property that, given the present, the future is independent of the past, we say that the process satisfies the Markov property, or that it is a Markov chain. Furthermore, we are mainly interested in the case where X_n takes values in some countable set S, called the state space.

Markov chains

Definition 44. The process {X_n} is a Markov chain if it satisfies the Markov property:

P(X_n = j | X_0 = x_0, ..., X_{n-1} = i) = P(X_n = j | X_{n-1} = i)

for all i, j, x_0, ..., x_{n-2} ∈ S and for all n = 1, 2, 3, ...

The Markov property implies that

P(X_{n_k} = j | X_{n_0} = x_0, ..., X_{n_{k-1}} = i) = P(X_{n_k} = j | X_{n_{k-1}} = i)

for all k, all n_0 ≤ n_1 ≤ ⋯ ≤ n_{k-1} ≤ n_k, and all i, j, x_0, ..., x_{n_{k-1}}. Also,

P(X_{n+m} = j | X_0 = x_0, ..., X_m = i) = P(X_{n+m} = j | X_m = i)

Transition probability

The evolution of a Markov chain is defined by its transition probability

P(X_{n+1} = j | X_n = i)

(where, without loss of generality, we may assume that S is a set of integers).

Definition 45. The chain {X_n} is called homogeneous if its transition probabilities do not depend on time, i.e.,

P(X_{n+1} = j | X_n = i) = P(X_1 = j | X_0 = i)  for all n, i, j.

The transition probability matrix P = [p_{i,j}] is the |S| × |S| matrix of transition probabilities, such that p_{i,j} = P(X_{n+1} = j | X_n = i).
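To make the definitions concrete, a homogeneous chain can be simulated directly from its matrix P: the next state is drawn from the row of the current state. The two-state matrix below is an assumed toy example, not one from the lecture.

```python
import numpy as np

# Assumed toy example: a two-state homogeneous chain.
P = np.array([[0.5, 0.5],
              [0.25, 0.75]])

def simulate(P, x0, n_steps, seed=None):
    """Sample a path X_0, ..., X_n of a homogeneous chain with matrix P."""
    rng = np.random.default_rng(seed)
    path = [x0]
    for _ in range(n_steps):
        # The next state depends only on the current one (Markov property):
        path.append(rng.choice(len(P), p=P[path[-1]]))
    return path

path = simulate(P, x0=0, n_steps=10, seed=0)
```

Homogeneity is what makes this loop valid: the same matrix row is used at every time step.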

Property of the transition matrix

Theorem 46. The transition matrix P of a Markov chain is a stochastic matrix, that is, it has non-negative elements such that

Σ_{j∈S} p_{i,j} = 1

(the elements of each row sum to 1).

In order to characterize the probabilities of n-step transitions, we introduce the n-step transition probability matrix with elements

p_{i,j}(m, m+n) = P(X_{m+n} = j | X_m = i)

By homogeneity, we have that P(m, m+1) = P. Furthermore, P(m, m+n) = P(n) does not depend on m.

Chapman-Kolmogorov equation

Theorem 47.

p_{i,j}(m, m+n+r) = Σ_k p_{i,k}(m, m+n) p_{k,j}(m+n, m+n+r)

Therefore, P(m, m+n+r) = P(m, m+n) P(m+n, m+n+r). It follows that for homogeneous Markov chains P(m, m+n) = P^n, i.e., P(n) = P^n.

Long-term evolution

We let u(n) denote the pmf of X_n; that is, for each n, u(n) is a vector with |S| non-negative components that sum to 1.

Lemma 33. u(m+n) = u(m) P^n, and hence u(n) = u(0) P^n.

This describes the pmf of X_n in terms of the initial state pmf u(0).
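The Chapman-Kolmogorov equation and Lemma 33 reduce the evolution of the chain to matrix algebra. A minimal sketch, using an assumed two-state matrix:

```python
import numpy as np

# Assumed illustrative matrix (rows sum to 1).
P = np.array([[0.5, 0.5],
              [0.25, 0.75]])

# n-step transition matrix: P(n) = P^n (Chapman-Kolmogorov).
n = 5
Pn = np.linalg.matrix_power(P, n)

# pmf of X_n from the initial pmf u(0):
u0 = np.array([1.0, 0.0])   # start in state 0 with probability 1
un = u0 @ Pn                # u(n) = u(0) P^n
```

Note that P^n is again stochastic, so u(n) is again a pmf.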

First step and last step conditioning

We have a MC with transition matrix P. Let A ⊂ S denote a subset of states. For i ∉ A, we are interested in the probability

P(X_k ∈ A for some k = 1, ..., m | X_0 = i) = P(∪_{k=1}^m {X_k ∈ A} | X_0 = i)

Define the stopping time N = min{n ≥ 1 : X_n ∈ A}, and the new MC

W_n = X_n if n < N,  W_n = a if n ≥ N

where a is a special (absorbing) symbol.

{W_n} has transition probability matrix

q_{i,j} = p_{i,j}            if i ∉ A, j ∉ A
q_{i,a} = Σ_{j∈A} p_{i,j}    if i ∉ A
q_{a,a} = 1

Notice that the original MC enters a state in A by time m if and only if W_m = a; then

P(∪_{k=1}^m {X_k ∈ A} | X_0 = i) = P(W_m = a | W_0 = i) = q_{i,a}(m) = [Q^m]_{i,a}
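The construction of Q and the identity P(∪_{k=1}^m {X_k ∈ A} | X_0 = i) = [Q^m]_{i,a} can be checked numerically. The three-state matrix and target set A below are assumptions chosen purely for illustration:

```python
import numpy as np

# Assumed 3-state chain; we ask for the probability of entering A = {2}
# within m steps starting from state 0.
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.1, 0.8]])
A = {2}

def hitting_prob(P, A, i, m):
    """P(X_k in A for some k = 1..m | X_0 = i) via the modified chain W_n."""
    keep = [s for s in range(len(P)) if s not in A]
    idx = {s: r for r, s in enumerate(keep)}
    n = len(keep)
    Q = np.zeros((n + 1, n + 1))            # last row/column is the symbol a
    for s in keep:
        for t in keep:
            Q[idx[s], idx[t]] = P[s, t]     # q_{i,j} = p_{i,j} for i, j not in A
        Q[idx[s], n] = P[s, list(A)].sum()  # q_{i,a} = sum_{j in A} p_{i,j}
    Q[n, n] = 1.0                           # q_{a,a} = 1: a is absorbing
    return np.linalg.matrix_power(Q, m)[idx[i], n]

p = hitting_prob(P, A, i=0, m=5)
```

For m = 1 this reduces to p_{0,2} directly, which gives a quick sanity check on the construction.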

Now we are interested in the probability that the chain is in state j at time m without ever having entered the subset of states A; i.e., for i, j ∉ A we wish to compute

P({X_m = j} ∩ ∩_{k=1}^{m-1} {X_k ∉ A} | X_0 = i)

This condition is equivalent to W_m = j conditionally on W_0 = i, hence

P({X_m = j} ∩ ∩_{k=1}^{m-1} {X_k ∉ A} | X_0 = i) = q_{i,j}(m)

Next, we wish to consider the first entrance probability: for i ∉ A and j ∈ A,

P({X_m = j} ∩ ∩_{k=1}^{m-1} {X_k ∉ A} | X_0 = i)
  = Σ_{r∉A} P({X_m = j} ∩ {X_{m-1} = r} ∩ ∩_{k=1}^{m-2} {X_k ∉ A} | X_0 = i)
  = Σ_{r∉A} P(X_m = j | X_{m-1} = r) P({X_{m-1} = r} ∩ ∩_{k=1}^{m-2} {X_k ∉ A} | X_0 = i)
  = Σ_{r∉A} p_{r,j} q_{i,r}(m-1)

When i ∈ A and j ∉ A, we can determine

P({X_m = j} ∩ ∩_{k=1}^{m-1} {X_k ∉ A} | X_0 = i)

by conditioning on the first transition:

P({X_m = j} ∩ ∩_{k=1}^{m-1} {X_k ∉ A} | X_0 = i)
  = Σ_{r∉A} P({X_m = j} ∩ ∩_{k=2}^{m-1} {X_k ∉ A} ∩ {X_1 = r} | X_0 = i)
  = Σ_{r∉A} P({X_m = j} ∩ ∩_{k=2}^{m-1} {X_k ∉ A} | X_1 = r) P(X_1 = r | X_0 = i)
  = Σ_{r∉A} q_{r,j}(m-1) p_{i,r}

Finally, we can calculate the conditional probability of X_m = j given that the chain starts at X_0 = i and has not entered the subset A at any time up to time m. For i, j ∉ A, we have

P(X_m = j | {X_0 = i} ∩ ∩_{k=1}^m {X_k ∉ A})
  = P({X_m = j} ∩ ∩_{k=1}^m {X_k ∉ A} | X_0 = i) / P(∩_{k=1}^m {X_k ∉ A} | X_0 = i)
  = P({X_m = j} ∩ ∩_{k=1}^{m-1} {X_k ∉ A} | X_0 = i) / Σ_{r∉A} P({X_m = r} ∩ ∩_{k=1}^{m-1} {X_k ∉ A} | X_0 = i)
  = q_{i,j}(m) / Σ_{r∉A} q_{i,r}(m)

Classification of states (1)

Definition 46. A state i ∈ S is called persistent (or recurrent) if

P(X_n = i for some n ≥ 1 | X_0 = i) = 1

Otherwise, if the above probability is strictly less than 1, the state is called transient.

We are interested in the first passage probability

f_{i,j}(n) = P(X_1 ≠ j, X_2 ≠ j, ..., X_{n-1} ≠ j, X_n = j | X_0 = i)

We define f_{i,j} = Σ_{n=1}^∞ f_{i,j}(n). Notice: state j is persistent if and only if f_{j,j} = 1.
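By conditioning on the first step, the first passage probabilities obey the recursion f_{i,j}(1) = p_{i,j} and f_{i,j}(n) = Σ_{k≠j} p_{i,k} f_{k,j}(n-1), which is easy to iterate numerically. The matrix below is again an assumed toy example:

```python
import numpy as np

# Assumed two-state chain; first-step recursion for f_{i,j}(n):
# f_{i,j}(1) = p_{i,j},  f_{i,j}(n) = sum_{k != j} p_{i,k} f_{k,j}(n-1).
P = np.array([[0.5, 0.5],
              [0.25, 0.75]])

def first_passage(P, j, n_max):
    """Return the array [f_{i,j}(1), ..., f_{i,j}(n_max)] for every start state i."""
    S = len(P)
    f = np.zeros((n_max + 1, S))
    f[1] = P[:, j]
    others = [k for k in range(S) if k != j]
    for n in range(2, n_max + 1):
        f[n] = P[:, others] @ f[n - 1, others]
    return f[1:]

f = first_passage(P, j=1, n_max=20)
# Summing over n approximates f_{i,j}; this chain is finite and irreducible,
# so f_{i,1} = 1 for every start state i.
```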

We define the generating functions

P_{i,j}(s) = Σ_{n=0}^∞ p_{i,j}(n) s^n,   F_{i,j}(s) = Σ_{n=0}^∞ f_{i,j}(n) s^n

with the conventions p_{i,j}(0) = δ_{i,j} and f_{i,j}(0) = 0 for all i, j ∈ S. We shall assume |s| < 1, since in this way the generating functions converge, and occasionally evaluate limits for s ↑ 1 using Abel's theorem. Notice that f_{i,j} = F_{i,j}(1).

Theorem 48.
a) P_{i,i}(s) = 1 + P_{i,i}(s) F_{i,i}(s);
b) P_{i,j}(s) = F_{i,j}(s) P_{j,j}(s), for i ≠ j.

Classification of states (2)

Corollary 6.
a) State j is persistent if Σ_n p_{j,j}(n) = ∞. If this holds, then Σ_n p_{i,j}(n) = ∞ for all states i such that f_{i,j} > 0.
b) State j is transient if Σ_n p_{j,j}(n) < ∞. If this holds, then Σ_n p_{i,j}(n) < ∞ for all i.
c) If j is transient, then p_{i,j}(n) → 0 as n → ∞.

Let N(i) denote the number of times the chain visits state i given that it starts at i. It is intuitively clear that

P(N(i) = ∞) = 1 if i is persistent, 0 if i is transient.

Let T_j = min{n ≥ 1 : X_n = j} denote the time of the first visit to state j. This is a random variable, with the convention that T_j = ∞ if the visit never occurs. We have that P(T_j = ∞ | X_0 = j) > 0 if and only if j is transient, and in this case E[T_j | X_0 = j] = ∞.

We define the mean recurrence time µ_j of a state j as

µ_j = E[T_j | X_0 = j] = Σ_n n f_{j,j}(n) if j is persistent, ∞ if j is transient.

Notice: µ_j may be infinite even though the state is persistent!

Definition 47. A persistent state j is called null if µ_j = ∞, and non-null (or positive) if µ_j < ∞.

Theorem 49. A persistent state i is null if and only if p_{i,i}(n) → 0 as n → ∞. If this holds, then p_{j,i}(n) → 0 for all j.

Let d(i) denote the period of state i, defined as

d(i) = GCD{n : p_{i,i}(n) > 0}

We call i periodic if d(i) > 1, and aperiodic if d(i) = 1.

Definition 48. A state is called ergodic if it is persistent, non-null, and aperiodic.
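For a finite chain the period can be computed mechanically from the definition, scanning the diagonal of P^n over a finite range of n (the cutoff n_max is an assumption of this sketch; for small chains the gcd stabilizes quickly):

```python
from math import gcd
import numpy as np

# Deterministic 2-cycle: p_{1,2} = p_{2,1} = 1 (states relabelled 0, 1).
# Its states have period 2.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])

def period(P, i, n_max=50):
    """d(i) = gcd{ n >= 1 : p_{i,i}(n) > 0 }, scanned up to n_max steps."""
    d = 0
    Pn = np.eye(len(P))
    for n in range(1, n_max + 1):
        Pn = Pn @ P
        if Pn[i, i] > 0:
            d = gcd(d, n)   # gcd(0, n) == n, so the first hit initializes d
    return d
```

The 2-cycle here is the same chain used later in the lecture to show that periodic transition probabilities need not converge.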

Classification of chains (1)

We say that state i communicates with state j (written i → j) if p_{i,j}(n) > 0 for some n ≥ 0. In this case, state j can be reached from state i with positive probability. We say that states i and j intercommunicate if i → j and j → i, in which case we write i ↔ j.

Clearly i ↔ i, since p_{i,i}(0) = 1. Also, i → j if and only if f_{i,j} > 0.

Intercommunication ↔ induces an equivalence relation on the state space: in fact, if i ↔ j and j ↔ k, then i ↔ k. Hence S can be partitioned into equivalence classes of mutually intercommunicating states.

Theorem 50. If i ↔ j, then:
a) i and j have the same period;
b) i is transient if and only if j is transient;
c) i is null persistent if and only if j is null persistent.

Classification of chains (2)

Definition 49. A set C ⊂ S of states is called:
a) closed if p_{i,j} = 0 for all i ∈ C and j ∉ C;
b) irreducible if i ↔ j for all i, j ∈ C.

Once the chain enters a closed set of states, it never leaves it (it is trapped). A closed set of cardinality 1 (a single state) is called an absorbing state. For example, the state 0 in a branching process is an absorbing state.

Equivalence classes with respect to the relation ↔ are irreducible. We call an irreducible set C aperiodic, persistent, null persistent, etc., if all states in the set have that property.

Theorem 51 (Decomposition theorem). The state space S can be uniquely partitioned as

S = T ∪ C_1 ∪ C_2 ∪ ⋯

where T is the set of transient states and the C_i are irreducible closed sets of persistent states.

Qualitative behavior of Markov chains

In light of Theorem 51 we can identify the following behaviors:
- If the chain starts in some state i ∈ C_r, then it stays in C_r forever.
- If the chain starts in T, then either it stays in T forever or it eventually ends up in some C_r.

If S is finite (a finite-state Markov chain), then the first of these two alternatives (staying in T forever) cannot occur:

Lemma 34. If S is finite, then at least one state is persistent and all persistent states are non-null.

Example

Let S = {1, 2, 3, 4, 5, 6} and consider the transition matrix

P = [ 1/2  1/2   0    0    0    0
      1/4  3/4   0    0    0    0
      1/4  1/4  1/4  1/4   0    0
      1/4   0   1/4  1/4   0   1/4
       0    0    0    0   1/2  1/2
       0    0    0    0   1/2  1/2 ]
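The communicating classes of a finite chain can be found mechanically: compute reachability (i → j) as a transitive closure and intersect it with its transpose to obtain ↔. A sketch applied to the six-state matrix above, with states relabelled 0..5:

```python
import numpy as np

# The 6-state example (states relabelled 0..5).
P = np.array([
    [1/2, 1/2, 0,   0,   0,   0  ],
    [1/4, 3/4, 0,   0,   0,   0  ],
    [1/4, 1/4, 1/4, 1/4, 0,   0  ],
    [1/4, 0,   1/4, 1/4, 0,   1/4],
    [0,   0,   0,   0,   1/2, 1/2],
    [0,   0,   0,   0,   1/2, 1/2],
])

def communicating_classes(P):
    """Partition the states into equivalence classes of the relation i <-> j."""
    n = len(P)
    # reach[i, j] is True iff p_{i,j}(m) > 0 for some m >= 0.
    reach = (P > 0) | np.eye(n, dtype=bool)
    for k in range(n):                      # Warshall transitive closure
        reach |= reach[:, k:k+1] & reach[k:k+1, :]
    inter = reach & reach.T                 # i <-> j
    classes, seen = [], set()
    for i in range(n):
        if i not in seen:
            cls = {j for j in range(n) if inter[i, j]}
            seen |= cls
            classes.append(sorted(cls))
    return classes
```

Here the classes are {0, 1} and {4, 5} (closed, hence persistent) and {2, 3} (not closed, hence transient), matching the decomposition theorem.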

Stationary distribution

Definition 50. The vector π is called a stationary distribution of the chain if it has entries {π_j : j ∈ S} such that:
a) π_j ≥ 0 for all j, and Σ_{j∈S} π_j = 1;
b) π = πP, that is, π_j = Σ_i π_i p_{i,j}, for all j ∈ S.

This is called a stationary distribution since, if X_0 is distributed with u(0) = π, then all X_n have the same distribution; in fact

u(n) = u(0) P^n = π P^n = (πP) P^{n-1} = π P^{n-1} = ⋯ = π

Given the classification of chains and the decomposition theorem, we shall assume that the chain is irreducible; that is, its state space is formed by a single equivalence class of intercommunicating (persistent) states C, or by the class of transient states T.

Stationary distribution and mean recurrence time

Theorem 52. An irreducible chain has a stationary distribution π if and only if all states are non-null persistent. In this case, π is unique and satisfies π_j = 1/µ_j, where µ_j is the mean recurrence time of state j.

Solution of the linear equation x = xP

Fix a state k ∈ S. Let ρ_i(k) denote the mean number of visits of the chain to state i between two successive visits to state k. Defining

N_i = Σ_{n=1}^∞ 1{X_n = i, T_k ≥ n}

where T_k is the time of first passage to state k, we have

ρ_i(k) = E[N_i | X_0 = k] = Σ_{n=1}^∞ P(X_n = i, T_k ≥ n | X_0 = k)

Since k is persistent,

ρ_k(k) = Σ_{n=1}^∞ P(X_n = k, T_k ≥ n | X_0 = k) = Σ_{n=1}^∞ P(T_k = n | X_0 = k) = Σ_{n=1}^∞ f_{k,k}(n) = 1

It is also clear that T_k = Σ_{i∈S} N_i, since the time between two visits to state k must be spent somewhere. Taking expectations, we find

µ_k = Σ_{i∈S} ρ_i(k)

This shows that the vector ρ(k) = {ρ_i(k) : i ∈ S} contains terms whose sum equals the mean recurrence time of state k, denoted as usual by µ_k.

Lemma 35. For any state k of an irreducible persistent chain, the vector ρ(k) satisfies ρ_i(k) < ∞ for all i, and furthermore ρ(k) = ρ(k) P.

For an irreducible persistent chain, we have thus proved that the equation x = xP has a solution ρ(k) with non-negative entries that sum to µ_k. If state k is non-null, then µ_k < ∞, and therefore the vector π = ρ(k)/µ_k is non-negative, sums to 1, and solves the equation x = xP. It follows that π = ρ(k)/µ_k is a stationary distribution. This shows that every non-null persistent irreducible chain has a stationary distribution, summarized as follows:

Theorem 53. If the chain is irreducible and persistent, there exists a positive root x of the equation x = xP, which is unique up to a multiplicative constant. The chain is non-null if Σ_i x_i < ∞ and null if Σ_i x_i = ∞.

Example: irreducible class of the previous example

Let S = {1, 2} and consider the transition matrix

P = [ 1/2  1/2
      1/4  3/4 ]
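A stationary distribution, when it exists, can be found by solving the linear system π = πP together with Σ_j π_j = 1. For this matrix the solution is π = (1/3, 2/3), so by Theorem 52 the mean recurrence times are µ_1 = 3 and µ_2 = 3/2. A sketch:

```python
import numpy as np

# The 2x2 irreducible chain from the example above.
P = np.array([[1/2, 1/2],
              [1/4, 3/4]])

# pi = pi P is (P^T - I) pi = 0; replace one (redundant) equation by the
# normalisation constraint sum(pi) = 1 to get a uniquely solvable system.
n = len(P)
A = P.T - np.eye(n)
A[-1, :] = 1.0
b = np.zeros(n)
b[-1] = 1.0
pi = np.linalg.solve(A, b)   # pi = (1/3, 2/3)

# Mean recurrence times via Theorem 52: mu_j = 1 / pi_j.
mu = 1.0 / pi                # mu = (3, 3/2)
```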

Criterion for non-null persistency of irreducible chains

Consider an irreducible chain (notice: irreducibility must be checked directly, by inspection). In order to check whether the chain is non-null persistent, we just have to look for a stationary distribution π. Since Theorem 52 is a necessary and sufficient condition, if we find π, then we conclude that the chain is non-null persistent.

Criterion for transience of irreducible chains

Theorem 54. Let s ∈ S be a state of an irreducible chain. The chain is transient if and only if there exists a non-zero solution {y_j : j ≠ s}, satisfying |y_j| ≤ 1 for all j, to the system of equations

y_i = Σ_{j≠s} p_{i,j} y_j,  for all i ≠ s
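Theorem 54 can be checked numerically on a standard example (the chain and the candidate solution below are assumptions for illustration, not from the lecture): for the random walk on {0, 1, 2, ...} with p_{i,i+1} = p and p_{i,i-1} = 1 - p, with p > 1/2 and s = 0, the bounded non-zero vector y_j = 1 - ((1-p)/p)^j solves the system, confirming transience.

```python
# Random walk on {0, 1, 2, ...}: up with prob p, down with prob q = 1 - p.
# With s = 0, the equations of Theorem 54 read
#   y_i = p * y_{i+1} + q * y_{i-1}   for i >= 2,
#   y_1 = p * y_2                      (the jump to s = 0 is excluded).
p, q = 0.7, 0.3
y = lambda j: 1 - (q / p) ** j   # candidate solution, bounded in [0, 1]

for i in range(1, 200):
    rhs = p * y(i + 1) + (q * y(i - 1) if i >= 2 else 0.0)
    assert abs(y(i) - rhs) < 1e-12   # y satisfies the system exactly
```

Since y is non-zero and |y_j| ≤ 1, Theorem 54 gives transience of the walk for p > 1/2, in agreement with the classical result.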

Criterion for persistency and null persistency

It follows from Theorem 54 that an irreducible chain is persistent if and only if the only bounded solution to the system of equations in Theorem 54 is the zero vector y_j = 0 for all j. If this happens, but the chain has no stationary distribution, then we conclude that the chain is null persistent.

Another condition for persistency is given as follows:

Theorem 55. Let s ∈ S be a state of an irreducible chain with S = {0, 1, 2, ...}. The chain is persistent if there exists a solution {y_j : j ≠ s} to the system of inequalities

y_i ≥ Σ_{j≠s} p_{i,j} y_j,  for all i ≠ s

such that y_i → ∞ as i → ∞.

Ergodic behavior of Markov chains

We are interested in the limit of the n-step transition probabilities p_{i,j}(n) for large n. Periodicity causes some difficulties, as the following example shows.

Example: consider S = {1, 2}, with p_{1,2} = p_{2,1} = 1. Then

p_{1,1}(n) = p_{2,2}(n) = 0 if n is odd, 1 if n is even.

It is clear that if states are periodic, the sequence of n-step transition probabilities may not converge.

Ergodic theorem for Markov chains

Theorem 56. For an irreducible aperiodic chain, we have

lim_{n→∞} p_{i,j}(n) = 1/µ_j  for all i, j ∈ S

Remarks:
- If the chain is transient or null persistent, then p_{i,j}(n) → 0 for all i, j, since µ_j = ∞.
- If the chain is non-null persistent, then p_{i,j}(n) → 1/µ_j = π_j > 0, independent of i, where π is the unique stationary distribution.

From Theorem 56 we see that, in any case, for an irreducible aperiodic chain the limit of p_{i,j}(n) is independent of X_0 = i.
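Theorem 56 can be observed numerically: for an irreducible aperiodic chain, the rows of P^n all converge to π. Using the 2×2 chain from the earlier example, where π = (1/3, 2/3):

```python
import numpy as np

# Irreducible aperiodic 2x2 chain; its stationary distribution is (1/3, 2/3).
P = np.array([[1/2, 1/2],
              [1/4, 3/4]])

Pn = np.linalg.matrix_power(P, 50)
# Every row of P^50 is (almost exactly) (1/3, 2/3) = (1/mu_1, 1/mu_2):
# the chain forgets its starting point.
```

Convergence here is geometric: the second eigenvalue of P is 1/4, so the error decays like (1/4)^n.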

Therefore, the chain forgets its starting point. It follows that

P(X_n = j) = Σ_i P(X_0 = i) p_{i,j}(n) → 1/µ_j

irrespective of the initial pmf vector u(0) = {P(X_0 = i), i ∈ S}.

If the chain {X_n} is periodic with period d, it turns out that sampling the chain at integer multiples of the period, {Y_n = X_{dn} : n ≥ 0}, yields an aperiodic chain. In this case, letting q_{j,j}(n) = P(Y_n = j | Y_0 = j), it turns out that

q_{j,j}(n) = P(X_{nd} = j | X_0 = j) = p_{j,j}(nd) → d/µ_j

or, equivalently, the mean recurrence time of the new chain {Y_n} is equal to 1/d times the mean recurrence time of the original chain.

End of Lecture 11