Lecture 7

1 Stationary measures of a Markov chain

We now study the long time behavior of a Markov chain: in particular, the existence and uniqueness of stationary measures, and the convergence of the distribution of the Markov chain to its stationary measure as time tends to infinity.

1.1 Existence and uniqueness of the stationary measure

Definition 1.1 [Stationary measure] Let $X$ be an irreducible Markov chain with countable state space $S$ and transition matrix $\Pi$. A measure $\mu$ on $S$ is called a stationary measure for $X$ if
$$(\mu\Pi)(x) := \sum_{y \in S} \mu(y)\,\Pi(y,x) = \mu(x) \qquad \text{for all } x \in S, \tag{1.1}$$
or equivalently,
$$\langle \mu, \Pi f\rangle = \langle \mu, f\rangle \qquad \text{for all bounded } f, \tag{1.2}$$
where $\langle \mu, f\rangle := \sum_{x \in S} \mu(x)f(x)$. When $\mu$ is a probability measure, we say $\mu$ is a stationary distribution. The equivalence comes from the fact that $\mu$ is uniquely determined by its action on bounded test functions, while $\langle \mu, \Pi f\rangle = \langle \mu\Pi, f\rangle$.

Example 1.2 A random walk on $\mathbb{Z}^d$, regardless of the distribution of its increments, has $\mu \equiv 1$ as a stationary measure, by virtue of the translation invariance of $\mathbb{Z}^d$. Any irreducible finite state Markov chain admits a unique stationary distribution, which is a left eigenvector of $\Pi$ with eigenvalue $1$.

We are interested in the long time behavior of the Markov chain. If the chain is transient, then for any $x, y \in S$, $G(x,y) := \sum_{n \geq 0} \Pi^n(x,y) < \infty$. In particular, $\Pi^n(x,y) \to 0$ as $n \to \infty$. This rules out the existence of, and convergence to, a stationary probability distribution. The more interesting cases are the null recurrent and positive recurrent Markov chains.

Theorem 1.3 [Existence of a stationary measure for recurrent Markov chains] Let $X$ be an irreducible recurrent Markov chain with countable state space $S$ and transition matrix $\Pi$. Then for any $x \in S$, with $\tau_x := \inf\{n \geq 1 : X_n = x\}$, the measure
$$\mu(y) := \sum_{n \geq 0} P_x(X_n = y,\ n < \tau_x) = E_x\Big[\sum_{n=0}^{\tau_x - 1} 1_{\{X_n = y\}}\Big], \qquad y \in S,$$
is a stationary measure for $X$, and $\sum_{y \in S} \mu(y) = E_x[\tau_x]$.
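Before turning to the proof, here is a minimal numerical sanity check of Example 1.2 and Theorem 1.3 (an added sketch, not part of the original notes; the 3-state transition matrix, the seed, and all variable names are illustrative assumptions). It computes the stationary distribution of a small irreducible chain as a left eigenvector of $\Pi$, and compares it with a Monte Carlo estimate of the normalized cycle-trick measure $\mu(y) = E_x[\#\text{ visits to } y \text{ before } \tau_x]$.

```python
# Sanity check (illustrative): stationary distribution of a small irreducible
# chain, computed two ways -- as a left eigenvector of Pi (Example 1.2), and
# via the cycle trick of Theorem 1.3, mu(y) = E_x[# visits to y before tau_x].
import numpy as np

rng = np.random.default_rng(0)
Pi = np.array([[0.5, 0.3, 0.2],           # arbitrary irreducible 3x3 chain
               [0.2, 0.5, 0.3],
               [0.3, 0.3, 0.4]])

# Left eigenvector of Pi with eigenvalue 1, normalized to a probability vector.
w, V = np.linalg.eig(Pi.T)
mu = np.real(V[:, np.argmin(np.abs(w - 1))])
mu /= mu.sum()

# Cycle trick: count visits to each state during excursions from x = 0;
# the visit counts, once normalized, should approximate mu.
x, visits = 0, np.zeros(3)
for _ in range(20_000):
    state = x
    while True:
        visits[state] += 1                # counts X_n for 0 <= n < tau_x
        state = rng.choice(3, p=Pi[state])
        if state == x:
            break

print(mu)                                 # exact stationary distribution
print(visits / visits.sum())              # Monte Carlo estimate, close to mu
```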

Remark. In words, $\mu(y)$ is the expected number of visits to $y$ before the Markov chain returns to $x$. Note that $\mu(x) = 1$. This is sometimes called the cycle trick.

Proof. First we show that $\mu(y) < \infty$ for all $y \in S$. Since $\mu(x) = 1$, let $y \neq x$. Since the Markov chain is irreducible and recurrent, $P_y(\tau_x < \tau_y) > 0$. Therefore, starting from $y$, the number of visits to $y$ before the chain visits $x$ is geometrically distributed. In particular, the expected number of visits to $y$ before $\tau_x$ is finite, and so is $\mu(y)$.

For each $y \neq x$, since $X_0 = x \neq y$,
$$\begin{aligned}
\mu(y) &= \sum_{n \geq 1} P_x(X_n = y,\ n < \tau_x) = \sum_{n \geq 1} \sum_{z \in S} P_x(X_{n-1} = z,\ X_n = y,\ n < \tau_x) \\
&= \sum_{n \geq 1} \sum_{z \in S} P_x(X_{n-1} = z,\ n-1 < \tau_x)\,\Pi(z,y) = \sum_{z \in S} \mu(z)\,\Pi(z,y),
\end{aligned}$$
where the second line uses that $\{X_n = y,\ n < \tau_x\} = \{X_n = y,\ n-1 < \tau_x\}$ for $y \neq x$, together with the Markov property. This verifies the stationarity of $\mu$ at all $y \neq x$. On the other hand, by the recurrence of $X$ and a similar decomposition,
$$\begin{aligned}
1 &= \sum_{n \geq 1} P_x(\tau_x = n) = \sum_{n \geq 1} \sum_{y \in S} P_x(X_{n-1} = y,\ n-1 < \tau_x,\ X_n = x) \\
&= \sum_{n \geq 1} \sum_{y \in S} P_x(X_{n-1} = y,\ n-1 < \tau_x)\,\Pi(y,x) = \sum_{y \in S} \mu(y)\,\Pi(y,x),
\end{aligned}$$
which verifies the stationarity of $\mu$ at $x$. $\square$

Theorem 1.4 [Uniqueness of stationary measures for recurrent Markov chains] Let $X$ be an irreducible recurrent Markov chain with countable state space $S$. Then the stationary measure $\mu$ for $X$ is unique up to a constant multiple.

Proof. Let $\mu$ be the stationary measure defined in Theorem 1.3 with $\mu(x) = 1$. Let $\nu$ be any stationary measure with $\nu(x) = 1$. We have for any $y \neq x$,
$$\nu(y) = \nu(x)\,\Pi(x,y) + \sum_{z_1 \neq x} \nu(z_1)\,\Pi(z_1,y) \tag{1.3}$$
$$\phantom{\nu(y)} = \nu(x)\,\Pi(x,y) + \sum_{z_1 \neq x} \nu(x)\,\Pi(x,z_1)\,\Pi(z_1,y) + \sum_{z_1, z_2 \neq x} \nu(z_2)\,\Pi(z_2,z_1)\,\Pi(z_1,y),$$
where we have substituted (1.3) into itself. Iterating the substitution indefinitely then gives
$$\nu(y) \;\geq\; \Pi(x,y) + \sum_{n \geq 1}\ \sum_{z_1,\dots,z_n \neq x} \Pi(x,z_1)\,\Pi(z_1,z_2)\cdots\Pi(z_{n-1},z_n)\,\Pi(z_n,y) \;=\; \sum_{n \geq 0} P_x(X_n = y,\ n < \tau_x) \;=\; \mu(y).$$

Now suppose that $\nu(y) > \mu(y)$ for some $y \in S$. By irreducibility, there exists $n \in \mathbb{N}$ with $\Pi^n(y,x) > 0$. The stationarity of $\mu$ and $\nu$ implies
$$\sum_{z \in S} \mu(z)\,\Pi^n(z,x) = \mu(x) = 1 = \nu(x) = \sum_{z \in S} \nu(z)\,\Pi^n(z,x).$$
Therefore
$$0 = \sum_{z \in S} \big(\nu(z) - \mu(z)\big)\,\Pi^n(z,x) \;\geq\; \big(\nu(y) - \mu(y)\big)\,\Pi^n(y,x) \;>\; 0,$$
which is a contradiction. Therefore $\nu = \mu$. $\square$

Combining Theorems 1.3 and 1.4 with the observation that transient irreducible Markov chains do not admit stationary probability distributions, we have the following.

Corollary 1.5 [Stationary distributions] An irreducible Markov chain admits a stationary probability distribution $\mu$ (which is necessarily unique) if and only if it is positive recurrent, in which case $\mu(x) = \frac{1}{E_x[\tau_x]}$ for all $x \in S$.

1.2 Convergence of the Markov chain

We now proceed to the study of the convergence of an irreducible Markov chain: what is the limit of the probability measure $\Pi^n(x,\cdot)$ as $n \to \infty$ for each $x \in S$? When the chain is transient, we have seen that $\Pi^n(x,y) \to 0$ for all $x, y \in S$. If the chain is null recurrent, then there is a unique (up to a constant multiple) stationary measure, which has infinite mass. Since $\Pi^n(x,\cdot)$ corresponds to the Markov chain starting with unit mass at $x$, we expect the measure to spread out and approximate a multiple of the stationary measure, hence $\Pi^n(x,y) \to 0$ for all $x, y \in S$. If the chain is positive recurrent, then it is natural to expect that $\Pi^n(x,y) \to \mu(y)$, the mass of the unique stationary distribution $\mu$ at $y$.

The last statement is almost true, except for the issue of periodicity. To illustrate the problem, take a simple random walk on the torus $S := \{0, 1, \dots, 2m\}$, where $0$ and $2m$ are identified. Clearly the Markov chain is irreducible, and the uniform distribution on $S$ is the unique stationary distribution. However, $\Pi^n(0,\cdot)$ is supported on the even sites when $n$ is even, and on the odd sites when $n$ is odd. So $\Pi^n(0,\cdot)$ does not converge to the uniform distribution on $S$. Therefore we first need to address the issue of periodicity.

Definition 1.6 [Period of a Markov chain] Let $X$ be an irreducible Markov chain with countable state space $S$ and transition matrix $\Pi$. For $x \in S$, let $D_x := \{n : \Pi^n(x,x) > 0\}$ and let $d_x$ be the greatest common divisor (gcd) of $D_x$. Then $d_x$ is independent of $x \in S$; we denote the common value simply by $d$ and call it the period of the Markov chain. When $d = 1$, we say the chain is aperiodic.

In the definition above, we have used part of the following result.

Lemma 1.7 Let $X$ be an irreducible Markov chain with countable state space $S$. Then $d_x = d_y$ for all $x, y \in S$. Furthermore, for any $x \in S$, $D_x$ contains all sufficiently large multiples of $d_x$.

Proof. By irreducibility, there exist $K, L \in \mathbb{N}$ with $\Pi^K(x,y) > 0$ and $\Pi^L(y,x) > 0$. Therefore
$$\Pi^{K+L}(x,x) \geq \Pi^K(x,y)\,\Pi^L(y,x) > 0,$$
and hence $d_x \mid (K+L)$, i.e., $d_x$ divides $K+L$. For any $m \in D_y$, $\Pi^m(y,y) > 0$, and therefore
$$\Pi^{K+L+m}(x,x) \geq \Pi^K(x,y)\,\Pi^m(y,y)\,\Pi^L(y,x) > 0.$$
So $d_x \mid (K+L+m)$. Since $d_x \mid (K+L)$, we have $d_x \mid m$ for all $m \in D_y$. Therefore $d_x \mid d_y$. Similarly we also have $d_y \mid d_x$, and hence $d_x = d_y$.

Since $d_x$ is the greatest common divisor of $D_x$, it is the gcd of a finite subset $n_1, \dots, n_k \in D_x$. By the properties of the gcd, there exist $a_1, \dots, a_k \in \mathbb{Z}$ such that $\sum_{i=1}^k a_i n_i = d_x$. Moving the terms with negative $a_i$ to the right-hand side (and using that $D_x$ is closed under addition, since $\Pi^{m+n}(x,x) \geq \Pi^m(x,x)\,\Pi^n(x,x)$) shows that there exists $m \in \mathbb{N}$ with $m d_x, (m+1) d_x \in D_x$. For any $n \geq m^2$, we can write $n d_x = (lm+r) d_x = (l-r)\,m d_x + r\,(m+1) d_x$, where $r$ is the remainder of $n$ after dividing by $m$, and $l \geq m > r$ by assumption. Therefore $n d_x \in D_x$ for all $n \geq m^2$, which proves the lemma. $\square$

We are now ready to state the convergence result for irreducible aperiodic Markov chains.

Theorem 1.8 [Convergence of transition kernels] Let $X$ be an irreducible aperiodic Markov chain with countable state space $S$. If the chain is transient or null recurrent, then
$$\lim_{n \to \infty} \Pi^n(x,y) = 0 \qquad \text{for all } x, y \in S. \tag{1.4}$$
If the chain is positive recurrent with stationary distribution $\mu$, then
$$\lim_{n \to \infty} \Pi^n(x,y) = \mu(y) \qquad \text{for all } x, y \in S. \tag{1.5}$$

Theorem 1.8 follows from the renewal theorem.

Theorem 1.9 [Renewal Theorem] Let $f$ be a probability distribution on $\mathbb{N} \cup \{\infty\}$ with mean $m = \sum_n n f(n) \in [1, \infty]$. Assume further that $D := \{n \geq 1 : f(n) > 0\}$ has greatest common divisor $1$. A renewal process $(U_n)_{n \geq 0}$ with renewal time distribution $f$ is a homogeneous Markov chain with state space $\{0, 1, \dots\} \cup \{\infty\}$ and transition probabilities $p(x, x+n) = f(n)$ for all $x \geq 0$ and $p(\infty, \infty) = 1$. Then we have
$$\lim_{n \to \infty} P_0(U_i = n \text{ for some } i \in \mathbb{N}) = \frac{1}{m}. \tag{1.6}$$

Proof of Theorem 1.8. For a Markov chain $X$ starting from $x \in S$, if we let $U_0 = 0$ and $U_n$ be the successive return times of $X$ to $x$, then clearly $(U_n)$ is a renewal process with $f(n) = P_x(\tau_x = n)$, $m = E_x[\tau_x]$, and $\Pi^n(x,x) = P_0(U_i = n \text{ for some } i \in \mathbb{N})$. Equations (1.4)–(1.5) with $y = x$ then follow from the renewal theorem, since $E_x[\tau_x] = \infty$ when $X$ is transient or null recurrent, and $\mu(x) = \frac{1}{E_x[\tau_x]} = \frac{1}{m}$ when the chain is positive recurrent. When $x \neq y$, note that
$$\Pi^n(x,y) = \sum_{i=1}^n P_x(\tau_y = i)\,\Pi^{n-i}(y,y).$$
Equations (1.4)–(1.5) then follow from the case $y = x$ and the dominated convergence theorem. $\square$
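The role of aperiodicity in Theorem 1.8 can also be seen numerically. The following minimal sketch (an addition, not part of the original notes; the cycle length and the lazy variant are illustrative choices) reproduces the counterexample from Section 1.2, a simple random walk on an even cycle, and contrasts it with its lazy version, which is aperiodic and does converge as in (1.5).

```python
# Illustration of Theorem 1.8 and of periodicity: simple random walk on a
# cycle of even length has period 2, so Pi^n(0, .) oscillates between the
# even and odd sites; the lazy walk is aperiodic, and Pi^n(0, .) converges
# to the uniform stationary distribution.
import numpy as np

S = 6                                     # cycle of even length (period 2)
Pi = np.zeros((S, S))
for i in range(S):
    Pi[i, (i - 1) % S] = Pi[i, (i + 1) % S] = 0.5

print(np.linalg.matrix_power(Pi, 1000)[0])    # mass only on even sites
print(np.linalg.matrix_power(Pi, 1001)[0])    # mass only on odd sites

lazy = 0.5 * np.eye(S) + 0.5 * Pi         # hold with prob. 1/2: aperiodic
print(np.linalg.matrix_power(lazy, 1000)[0])  # ~ uniform, i.e. [1/6]*6
```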

Remark 1.10 Not surprisingly, the renewal theorem can conversely be deduced from Theorem 1.8. Given a renewal process $U$ on $\{0, 1, \dots\}$ with renewal time distribution $f$ on $\mathbb{N} \cup \{\infty\}$, we can construct an irreducible aperiodic Markov chain $X$ on $\{0, 1, \dots\} \cup \{\infty\}$ as follows. Let $\Pi(0, l) = f(l+1)$ for $l \in \{0, 1, \dots\} \cup \{\infty\}$, $\Pi(i, i-1) = 1$ for $i \geq 1$, and $\Pi(\infty, \infty) = 1$. Then the successive return times of $X$ to $0$ are distributed as $U$, and $P_0(U_i = n \text{ for some } i \in \mathbb{N})$ is precisely $\Pi^n(0,0)$. Since $m = \sum_n n f(n) = E_0[\tau_0]$, (1.6) follows from (1.4)–(1.5).

Proof of Theorem 1.9. If $f(\infty) > 0$, then the chain $U$ is absorbed at $\infty$ almost surely; since $m = \infty$ in this case, (1.6) clearly holds. From now on, we assume $f(\infty) = 0$, so that $\sum_{n \in \mathbb{N}} f(n) = 1$.

Let $p(n) := P_0(U_i = n \text{ for some } i \in \mathbb{N})$, with $p(0) = 1$ since $U_0 = 0$. By decomposing with respect to the first renewal time, $p(n)$ satisfies the recursive relation (known as the renewal equation)
$$p(n) = \sum_{i=1}^n f(i)\,p(n-i). \tag{1.7}$$
Summing over $1 \leq n \leq N$, we obtain
$$\begin{aligned}
\sum_{n=1}^N p(n) &= \big(f(1) + \cdots + f(N)\big) + \big(f(1) + \cdots + f(N-1)\big)p(1) + \cdots + f(1)\,p(N-1) \\
&= \sum_{n=1}^N p(N-n) \sum_{i=1}^n f(i) = \sum_{n=1}^N p(N-n)\big(1 - T(n+1)\big),
\end{aligned}$$
where $T(n+1) := \sum_{i=n+1}^\infty f(i)$. Rearranging terms then gives
$$\sum_{n=1}^N T(n)\,p(N-n+1) = 1 - T(N+1). \tag{1.8}$$
Note that $\sum_{n \geq 1} T(n) = m$. By dominated convergence, if $\lim_{n \to \infty} p(n)$ exists, then it must be $\frac{1}{m}$.

Let $a := \limsup_{n \to \infty} p(n)$, which is bounded by $1$ since $p(n) \leq 1$. By Cantor diagonalization, we can find a sequence $(n_j)_{j \in \mathbb{N}}$ along which $p(n_j + i) \to q(i)$ for all $i \in \mathbb{Z}$, with $q(0) = a$. We claim that $q \equiv a$. Assuming the claim, taking the limit $N \to \infty$ in (1.8) along the sequence $n_j$ shows that $a = 0$ when $m = \infty$ by Fatou's lemma, and $a = \frac{1}{m}$ when $m < \infty$ by dominated convergence.

It remains to verify $q \equiv a$. Applying the dominated convergence theorem along the sequence $n_j + k$ in (1.7) gives
$$q(k) = \sum_{i \geq 1} f(i)\,q(k-i). \tag{1.9}$$
In particular, $a = q(0) = \sum_{i \geq 1} f(i)\,q(-i)$. Since by the definition of $a$, $q(-i) \leq a$ for all $i \in \mathbb{Z}$, we must have $q(-i) = a$ for all $i \in D := \{n \in \mathbb{N} : f(n) > 0\}$. The same argument applied to (1.9) shows that $q(-i) = a$ for all $i \in 2D := \{x + y : x, y \in D\}$, and inductively, for all $i \in kD$, $k \in \mathbb{N}$, with $kD$ defined analogously. Since the gcd of $D$ is $1$, the proof of Lemma 1.7 shows that $q(-i) = a$ for all $i$ sufficiently large. Substituting these values of $q$ into (1.9) shows that $q \equiv a$.

The same argument can be used to show that $\liminf_{n \to \infty} p(n) = \frac{1}{m}$ when $m < \infty$, which proves Theorem 1.9. $\square$
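As a final numerical check of Theorem 1.9 (an added sketch, not part of the original notes; the renewal distribution $f$ below is an arbitrary choice with $\gcd(D) = 1$), one can compute $p(n)$ directly from the renewal equation (1.7) and watch it converge to $\frac{1}{m}$.

```python
# Renewal theorem check: iterate the renewal equation (1.7),
# p(n) = sum_{i=1}^{n} f(i) p(n-i) with p(0) = 1, and compare p(N) to 1/m.
import numpy as np

f = np.array([0.0, 0.2, 0.5, 0.3])        # f(1), f(2), f(3); gcd{1,2,3} = 1
m = sum(n * f[n] for n in range(len(f)))  # mean renewal time, here m = 2.1

N = 200
p = np.zeros(N + 1)
p[0] = 1.0                                # a renewal occurs at time 0
for n in range(1, N + 1):
    # f(i) = 0 for i > 3, so the sum is truncated accordingly
    p[n] = sum(f[i] * p[n - i] for i in range(1, min(n, len(f) - 1) + 1))

print(p[N], 1 / m)                        # p(n) -> 1/m ~ 0.47619
```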