Lecture 7

1 Stationary measures of a Markov chain

We now study the long time behavior of a Markov chain: in particular, the existence and uniqueness of stationary measures, and the convergence of the distribution of the Markov chain to its stationary measure as time tends to infinity.

1.1 Existence and uniqueness of the stationary measure

Definition 1.1 [Stationary measure] Let $X$ be an irreducible Markov chain with countable state space $S$ and transition matrix $\Pi$. A measure $\mu$ on $S$ is called a stationary measure for $X$ if
$$(\mu\Pi)(x) := \sum_{y \in S} \mu(y)\,\Pi(y,x) = \mu(x) \qquad \text{for all } x \in S, \tag{1.1}$$
or equivalently,
$$\langle \mu, \Pi f\rangle = \langle \mu, f\rangle \qquad \text{for all bounded } f, \tag{1.2}$$
where $\langle \mu, f\rangle := \sum_{x \in S} \mu(x)f(x)$. When $\mu$ is a probability measure, we say $\mu$ is a stationary distribution. The equivalence comes from the fact that $\mu$ is uniquely determined by its action on bounded test functions, while $\langle \mu, \Pi f\rangle = \langle \mu\Pi, f\rangle$.

Example 1.2 A random walk on $\mathbb{Z}^d$, regardless of the distribution of its increments, has $\mu \equiv 1$ as a stationary measure, by virtue of the translation invariance of $\mathbb{Z}^d$. Any irreducible finite state Markov chain admits a unique stationary distribution, which is a left eigenvector of $\Pi$ with eigenvalue $1$.

We are interested in the long time behavior of the Markov chain. If the chain is transient, then for any $x, y \in S$, $G(x,y) := \sum_{n \geq 0} \Pi^n(x,y) < \infty$. In particular, $\Pi^n(x,y) \to 0$ as $n \to \infty$. This rules out the existence of, and convergence to, a stationary probability distribution. The more interesting cases are the null recurrent and positive recurrent Markov chains.

Theorem 1.3 [Existence of a stationary measure for recurrent Markov chains] Let $X$ be an irreducible recurrent Markov chain with countable state space $S$ and transition matrix $\Pi$. Then for any $x \in S$, with $\tau_x := \inf\{n \geq 1 : X_n = x\}$, the measure
$$\mu(y) := \sum_{n \geq 0} P_x(X_n = y,\ n < \tau_x) = E_x\Big[\sum_{n=0}^{\tau_x - 1} 1_{\{X_n = y\}}\Big], \qquad y \in S,$$
is a stationary measure for $X$, and $\sum_{y \in S} \mu(y) = E_x[\tau_x]$.
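Before turning to the proof, here is a minimal numerical sanity check of Example 1.2 and Theorem 1.3 (an added sketch, not part of the original notes; the 3-state transition matrix, the seed, and all variable names are illustrative assumptions). It computes the stationary distribution of a small irreducible chain as a left eigenvector of $\Pi$, and compares it with a Monte Carlo estimate of the normalized cycle-trick measure $\mu(y) = E_x[\#\text{ visits to } y \text{ before } \tau_x]$.

```python
# Sanity check (illustrative): stationary distribution of a small irreducible
# chain, computed two ways -- as a left eigenvector of Pi (Example 1.2), and
# via the cycle trick of Theorem 1.3, mu(y) = E_x[# visits to y before tau_x].
import numpy as np

rng = np.random.default_rng(0)
Pi = np.array([[0.5, 0.3, 0.2],           # arbitrary irreducible 3x3 chain
               [0.2, 0.5, 0.3],
               [0.3, 0.3, 0.4]])

# Left eigenvector of Pi with eigenvalue 1, normalized to a probability vector.
w, V = np.linalg.eig(Pi.T)
mu = np.real(V[:, np.argmin(np.abs(w - 1))])
mu /= mu.sum()

# Cycle trick: count visits to each state during excursions from x = 0;
# the visit counts, once normalized, should approximate mu.
x, visits = 0, np.zeros(3)
for _ in range(20_000):
    state = x
    while True:
        visits[state] += 1                # counts X_n for 0 <= n < tau_x
        state = rng.choice(3, p=Pi[state])
        if state == x:
            break

print(mu)                                 # exact stationary distribution
print(visits / visits.sum())              # Monte Carlo estimate, close to mu
```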

Remark. In words, $\mu(y)$ is the expected number of visits to $y$ before the Markov chain returns to $x$. Note that $\mu(x) = 1$. This is sometimes called the cycle trick.

Proof. First we show that $\mu(y) < \infty$ for all $y \in S$. Since $\mu(x) = 1$, let $y \neq x$. Since the Markov chain is irreducible and recurrent, $P_y(\tau_x < \tau_y) > 0$. Therefore, starting from $y$, the number of visits to $y$ before the chain visits $x$ is geometrically distributed. In particular, the expected number of visits to $y$ before $\tau_x$ is finite, and so is $\mu(y)$.

For each $y \neq x$, since $X_0 = x \neq y$,
$$\begin{aligned}
\mu(y) &= \sum_{n \geq 1} P_x(X_n = y,\ n < \tau_x) = \sum_{n \geq 1} \sum_{z \in S} P_x(X_{n-1} = z,\ X_n = y,\ n < \tau_x) \\
&= \sum_{n \geq 1} \sum_{z \in S} P_x(X_{n-1} = z,\ n-1 < \tau_x)\,\Pi(z,y) = \sum_{z \in S} \mu(z)\,\Pi(z,y),
\end{aligned}$$
where the second line uses that $\{X_n = y,\ n < \tau_x\} = \{X_n = y,\ n-1 < \tau_x\}$ for $y \neq x$, together with the Markov property. This verifies the stationarity of $\mu$ at all $y \neq x$. On the other hand, by the recurrence of $X$ and a similar decomposition,
$$\begin{aligned}
1 &= \sum_{n \geq 1} P_x(\tau_x = n) = \sum_{n \geq 1} \sum_{y \in S} P_x(X_{n-1} = y,\ n-1 < \tau_x,\ X_n = x) \\
&= \sum_{n \geq 1} \sum_{y \in S} P_x(X_{n-1} = y,\ n-1 < \tau_x)\,\Pi(y,x) = \sum_{y \in S} \mu(y)\,\Pi(y,x),
\end{aligned}$$
which verifies the stationarity of $\mu$ at $x$. $\square$

Theorem 1.4 [Uniqueness of stationary measures for recurrent Markov chains] Let $X$ be an irreducible recurrent Markov chain with countable state space $S$. Then the stationary measure $\mu$ for $X$ is unique up to a constant multiple.

Proof. Let $\mu$ be the stationary measure defined in Theorem 1.3 with $\mu(x) = 1$. Let $\nu$ be any stationary measure with $\nu(x) = 1$. We have for any $y \neq x$,
$$\nu(y) = \nu(x)\,\Pi(x,y) + \sum_{z_1 \neq x} \nu(z_1)\,\Pi(z_1,y) \tag{1.3}$$
$$\phantom{\nu(y)} = \nu(x)\,\Pi(x,y) + \sum_{z_1 \neq x} \nu(x)\,\Pi(x,z_1)\,\Pi(z_1,y) + \sum_{z_1, z_2 \neq x} \nu(z_2)\,\Pi(z_2,z_1)\,\Pi(z_1,y),$$
where we have substituted (1.3) into itself. Iterating the substitution indefinitely then gives
$$\nu(y) \;\geq\; \Pi(x,y) + \sum_{n \geq 1}\ \sum_{z_1,\dots,z_n \neq x} \Pi(x,z_1)\,\Pi(z_1,z_2)\cdots\Pi(z_{n-1},z_n)\,\Pi(z_n,y) \;=\; \sum_{n \geq 0} P_x(X_n = y,\ n < \tau_x) \;=\; \mu(y).$$

Now suppose that $\nu(y) > \mu(y)$ for some $y \in S$. By irreducibility, there exists $n \in \mathbb{N}$ with $\Pi^n(y,x) > 0$. The stationarity of $\mu$ and $\nu$ implies
$$\sum_{z \in S} \mu(z)\,\Pi^n(z,x) = \mu(x) = 1 = \nu(x) = \sum_{z \in S} \nu(z)\,\Pi^n(z,x).$$
Therefore
$$0 = \sum_{z \in S} \big(\nu(z) - \mu(z)\big)\,\Pi^n(z,x) \;\geq\; \big(\nu(y) - \mu(y)\big)\,\Pi^n(y,x) \;>\; 0,$$
which is a contradiction. Therefore $\nu = \mu$. $\square$

Combining Theorems 1.3 and 1.4 with the observation that transient irreducible Markov chains do not admit stationary probability distributions, we have the following.

Corollary 1.5 [Stationary distributions] An irreducible Markov chain admits a stationary probability distribution $\mu$ (which is necessarily unique) if and only if it is positive recurrent, in which case $\mu(x) = \frac{1}{E_x[\tau_x]}$ for all $x \in S$.

1.2 Convergence of the Markov chain

We now proceed to the study of the convergence of an irreducible Markov chain: what is the limit of the probability measure $\Pi^n(x,\cdot)$ as $n \to \infty$ for each $x \in S$? When the chain is transient, we have seen that $\Pi^n(x,y) \to 0$ for all $x, y \in S$. If the chain is null recurrent, then there is a unique (up to a constant multiple) stationary measure, which has infinite mass. Since $\Pi^n(x,\cdot)$ corresponds to the Markov chain starting with unit mass at $x$, we expect the measure to spread out and approximate a multiple of the stationary measure, hence $\Pi^n(x,y) \to 0$ for all $x, y \in S$. If the chain is positive recurrent, then it is natural to expect that $\Pi^n(x,y) \to \mu(y)$, the mass of the unique stationary distribution $\mu$ at $y$.

The last statement is almost true, except for the issue of periodicity. To illustrate the problem, take a simple random walk on the torus $S := \{0, 1, \dots, 2m\}$, where $0$ and $2m$ are identified. Clearly the Markov chain is irreducible, and the uniform distribution on $S$ is the unique stationary distribution. However, $\Pi^n(0,\cdot)$ is supported on the even sites when $n$ is even, and on the odd sites when $n$ is odd. So $\Pi^n(0,\cdot)$ does not converge to the uniform distribution on $S$. Therefore we first need to address the issue of periodicity.

Definition 1.6 [Period of a Markov chain] Let $X$ be an irreducible Markov chain with countable state space $S$ and transition matrix $\Pi$. For $x \in S$, let $D_x := \{n : \Pi^n(x,x) > 0\}$ and let $d_x$ be the greatest common divisor (gcd) of $D_x$. Then $d_x$ is independent of $x \in S$; we denote the common value simply by $d$ and call it the period of the Markov chain. When $d = 1$, we say the chain is aperiodic.

In the definition above, we have used part of the following result.

Lemma 1.7 Let $X$ be an irreducible Markov chain with countable state space $S$. Then $d_x = d_y$ for all $x, y \in S$. Furthermore, for any $x \in S$, $D_x$ contains all sufficiently large multiples of $d_x$.

Proof. By irreducibility, there exist $K, L \in \mathbb{N}$ with $\Pi^K(x,y) > 0$ and $\Pi^L(y,x) > 0$. Therefore
$$\Pi^{K+L}(x,x) \geq \Pi^K(x,y)\,\Pi^L(y,x) > 0,$$
and hence $d_x \mid (K+L)$, i.e., $d_x$ divides $K+L$. For any $m \in D_y$, $\Pi^m(y,y) > 0$, and therefore
$$\Pi^{K+L+m}(x,x) \geq \Pi^K(x,y)\,\Pi^m(y,y)\,\Pi^L(y,x) > 0.$$
So $d_x \mid (K+L+m)$. Since $d_x \mid (K+L)$, we have $d_x \mid m$ for all $m \in D_y$. Therefore $d_x \mid d_y$. Similarly we also have $d_y \mid d_x$, and hence $d_x = d_y$.

Since $d_x$ is the greatest common divisor of $D_x$, it is the gcd of a finite subset $n_1, \dots, n_k \in D_x$. By the properties of the gcd, there exist $a_1, \dots, a_k \in \mathbb{Z}$ such that $\sum_{i=1}^k a_i n_i = d_x$. Moving the terms with negative $a_i$ to the right-hand side (and using that $D_x$ is closed under addition, since $\Pi^{m+n}(x,x) \geq \Pi^m(x,x)\,\Pi^n(x,x)$) shows that there exists $m \in \mathbb{N}$ with $m d_x, (m+1) d_x \in D_x$. For any $n \geq m^2$, we can write $n d_x = (lm+r) d_x = (l-r)\,m d_x + r\,(m+1) d_x$, where $r$ is the remainder of $n$ after dividing by $m$, and $l \geq m > r$ by assumption. Therefore $n d_x \in D_x$ for all $n \geq m^2$, which proves the lemma. $\square$

We are now ready to state the convergence result for irreducible aperiodic Markov chains.

Theorem 1.8 [Convergence of transition kernels] Let $X$ be an irreducible aperiodic Markov chain with countable state space $S$. If the chain is transient or null recurrent, then
$$\lim_{n \to \infty} \Pi^n(x,y) = 0 \qquad \text{for all } x, y \in S. \tag{1.4}$$
If the chain is positive recurrent with stationary distribution $\mu$, then
$$\lim_{n \to \infty} \Pi^n(x,y) = \mu(y) \qquad \text{for all } x, y \in S. \tag{1.5}$$

Theorem 1.8 follows from the renewal theorem.

Theorem 1.9 [Renewal Theorem] Let $f$ be a probability distribution on $\mathbb{N} \cup \{\infty\}$ with mean $m = \sum_n n f(n) \in [1, \infty]$. Assume further that $D := \{n \geq 1 : f(n) > 0\}$ has greatest common divisor $1$. A renewal process $(U_n)_{n \geq 0}$ with renewal time distribution $f$ is a homogeneous Markov chain with state space $\{0, 1, \dots\} \cup \{\infty\}$ and transition probabilities $p(x, x+n) = f(n)$ for all $x \geq 0$ and $p(\infty, \infty) = 1$. Then we have
$$\lim_{n \to \infty} P_0(U_i = n \text{ for some } i \in \mathbb{N}) = \frac{1}{m}. \tag{1.6}$$

Proof of Theorem 1.8. For a Markov chain $X$ starting from $x \in S$, if we let $U_0 = 0$ and $U_n$ be the successive return times of $X$ to $x$, then clearly $(U_n)$ is a renewal process with $f(n) = P_x(\tau_x = n)$, $m = E_x[\tau_x]$, and $\Pi^n(x,x) = P_0(U_i = n \text{ for some } i \in \mathbb{N})$. Equations (1.4)–(1.5) with $y = x$ then follow from the renewal theorem, since $E_x[\tau_x] = \infty$ when $X$ is transient or null recurrent, and $\mu(x) = \frac{1}{E_x[\tau_x]} = \frac{1}{m}$ when the chain is positive recurrent. When $x \neq y$, note that
$$\Pi^n(x,y) = \sum_{i=1}^n P_x(\tau_y = i)\,\Pi^{n-i}(y,y).$$
Equations (1.4)–(1.5) then follow from the case $y = x$ and the dominated convergence theorem. $\square$
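The role of aperiodicity in Theorem 1.8 can also be seen numerically. The following minimal sketch (an addition, not part of the original notes; the cycle length and the lazy variant are illustrative choices) reproduces the counterexample from Section 1.2, a simple random walk on an even cycle, and contrasts it with its lazy version, which is aperiodic and does converge as in (1.5).

```python
# Illustration of Theorem 1.8 and of periodicity: simple random walk on a
# cycle of even length has period 2, so Pi^n(0, .) oscillates between the
# even and odd sites; the lazy walk is aperiodic, and Pi^n(0, .) converges
# to the uniform stationary distribution.
import numpy as np

S = 6                                     # cycle of even length (period 2)
Pi = np.zeros((S, S))
for i in range(S):
    Pi[i, (i - 1) % S] = Pi[i, (i + 1) % S] = 0.5

print(np.linalg.matrix_power(Pi, 1000)[0])    # mass only on even sites
print(np.linalg.matrix_power(Pi, 1001)[0])    # mass only on odd sites

lazy = 0.5 * np.eye(S) + 0.5 * Pi         # hold with prob. 1/2: aperiodic
print(np.linalg.matrix_power(lazy, 1000)[0])  # ~ uniform, i.e. [1/6]*6
```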

Remark 1.10 Not surprisingly, the renewal theorem can conversely be deduced from Theorem 1.8. Given a renewal process $U$ on $\{0, 1, \dots\}$ with renewal time distribution $f$ on $\mathbb{N} \cup \{\infty\}$, we can construct an irreducible aperiodic Markov chain $X$ on $\{0, 1, \dots\} \cup \{\infty\}$ as follows. Let $\Pi(0, l) = f(l+1)$ for $l \in \{0, 1, \dots\} \cup \{\infty\}$, $\Pi(i, i-1) = 1$ for $i \geq 1$, and $\Pi(\infty, \infty) = 1$. Then the successive return times of $X$ to $0$ are distributed as $U$, and $P_0(U_i = n \text{ for some } i \in \mathbb{N})$ is precisely $\Pi^n(0,0)$. Since $m = \sum_n n f(n) = E_0[\tau_0]$, (1.6) follows from (1.4)–(1.5).

Proof of Theorem 1.9. If $f(\infty) > 0$, then the chain $U$ is absorbed at $\infty$ almost surely; since $m = \infty$ in this case, (1.6) clearly holds. From now on, we assume $f(\infty) = 0$, so that $\sum_{n \in \mathbb{N}} f(n) = 1$.

Let $p(n) := P_0(U_i = n \text{ for some } i \in \mathbb{N})$, with $p(0) = 1$ since $U_0 = 0$. By decomposing with respect to the first renewal time, $p(n)$ satisfies the recursive relation (known as the renewal equation)
$$p(n) = \sum_{i=1}^n f(i)\,p(n-i). \tag{1.7}$$
Summing over $1 \leq n \leq N$, we obtain
$$\begin{aligned}
\sum_{n=1}^N p(n) &= \big(f(1) + \cdots + f(N)\big) + \big(f(1) + \cdots + f(N-1)\big)p(1) + \cdots + f(1)\,p(N-1) \\
&= \sum_{n=1}^N p(N-n) \sum_{i=1}^n f(i) = \sum_{n=1}^N p(N-n)\big(1 - T(n+1)\big),
\end{aligned}$$
where $T(n+1) := \sum_{i=n+1}^\infty f(i)$. Rearranging terms then gives
$$\sum_{n=1}^N T(n)\,p(N-n+1) = 1 - T(N+1). \tag{1.8}$$
Note that $\sum_{n \geq 1} T(n) = m$. By dominated convergence, if $\lim_{n \to \infty} p(n)$ exists, then it must be $\frac{1}{m}$.

Let $a := \limsup_{n \to \infty} p(n)$, which is bounded by $1$ since $p(n) \leq 1$. By Cantor diagonalization, we can find a sequence $(n_j)_{j \in \mathbb{N}}$ along which $p(n_j + i) \to q(i)$ for all $i \in \mathbb{Z}$, with $q(0) = a$. We claim that $q \equiv a$. Assuming the claim, taking the limit $N \to \infty$ in (1.8) along the sequence $n_j$ shows that $a = 0$ when $m = \infty$ by Fatou's lemma, and $a = \frac{1}{m}$ when $m < \infty$ by dominated convergence.

It remains to verify $q \equiv a$. Applying the dominated convergence theorem along the sequence $n_j + k$ in (1.7) gives
$$q(k) = \sum_{i \geq 1} f(i)\,q(k-i). \tag{1.9}$$
In particular, $a = q(0) = \sum_{i \geq 1} f(i)\,q(-i)$. Since by the definition of $a$, $q(-i) \leq a$ for all $i \in \mathbb{Z}$, we must have $q(-i) = a$ for all $i \in D := \{n \in \mathbb{N} : f(n) > 0\}$. The same argument applied to (1.9) shows that $q(-i) = a$ for all $i \in 2D := \{x + y : x, y \in D\}$, and inductively, for all $i \in kD$, $k \in \mathbb{N}$, with $kD$ defined analogously. Since the gcd of $D$ is $1$, the proof of Lemma 1.7 shows that $q(-i) = a$ for all $i$ sufficiently large. Substituting these values of $q$ into (1.9) shows that $q \equiv a$.

The same argument can be used to show that $\liminf_{n \to \infty} p(n) = \frac{1}{m}$ when $m < \infty$, which proves Theorem 1.9. $\square$
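As a final numerical check of Theorem 1.9 (an added sketch, not part of the original notes; the renewal distribution $f$ below is an arbitrary choice with $\gcd(D) = 1$), one can compute $p(n)$ directly from the renewal equation (1.7) and watch it converge to $\frac{1}{m}$.

```python
# Renewal theorem check: iterate the renewal equation (1.7),
# p(n) = sum_{i=1}^{n} f(i) p(n-i) with p(0) = 1, and compare p(N) to 1/m.
import numpy as np

f = np.array([0.0, 0.2, 0.5, 0.3])        # f(1), f(2), f(3); gcd{1,2,3} = 1
m = sum(n * f[n] for n in range(len(f)))  # mean renewal time, here m = 2.1

N = 200
p = np.zeros(N + 1)
p[0] = 1.0                                # a renewal occurs at time 0
for n in range(1, N + 1):
    # f(i) = 0 for i > 3, so the sum is truncated accordingly
    p[n] = sum(f[i] * p[n - i] for i in range(1, min(n, len(f) - 1) + 1))

print(p[N], 1 / m)                        # p(n) -> 1/m ~ 0.47619
```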