Lecture 5

1 Markov chain: definition

Definition 1.1 [Markov chain] A sequence of random variables $(X_n)_{n\ge 0}$ taking values in a measurable state space $(S, \mathcal{S})$ is called a (discrete time) Markov chain, if for $\mathcal{F}_n := \sigma(X_0, \ldots, X_n)$,
$$P(X_{n+1} \in A \mid \mathcal{F}_n) = P(X_{n+1} \in A \mid X_n) \qquad \forall\, n \ge 0 \text{ and } A \in \mathcal{S}. \tag{1.1}$$
If we interpret the index $n \ge 0$ as time, then a Markov chain simply requires that the future depends only on the present and not on the past.

Examples: 1. Random walks; 2. Branching processes; 3. Polya's urn.

Remark. Note that any stochastic process $(X_n)_{n\ge 0}$ taking values in $S$ can be turned into a Markov chain if we enlarge the state space from $S$ to $\bigcup_{n\in\mathbb{N}} S^n$, and change the process from $(X_n)_{n\ge 0}$ to $(\tilde{X}_n)_{n\ge 0}$ with $\tilde{X}_n = (X_0, X_1, \ldots, X_n) \in S^{n+1}$; namely, the process becomes Markov if we take its entire past to be its present state.

A more concrete way of characterizing Markov chains is by transition probabilities.

Definition 1.2 [Markov chain transition probabilities] A function $p : S \times \mathcal{S} \to [0,1]$ is called a transition probability, if

(i) For each $x \in S$, $A \mapsto p(x, A)$ is a probability measure on $(S, \mathcal{S})$.

(ii) For each $A \in \mathcal{S}$, $x \mapsto p(x, A)$ is a measurable function on $(S, \mathcal{S})$.

We say a Markov chain $(X_n)_{n\ge 0}$ has transition probabilities $p_n$ if
$$P(X_n \in A \mid \mathcal{F}_{n-1}) = p_n(X_{n-1}, A) \tag{1.2}$$
almost surely for all $n \in \mathbb{N}$ and $A \in \mathcal{S}$. If $p_n \equiv p$ for all $n \in \mathbb{N}$, then we call $(X_n)_{n\ge 0}$ a time-homogeneous Markov chain, or a Markov chain with stationary transition probabilities.

If the underlying state space $(S, \mathcal{S})$ is nice, then the distribution of a Markov chain $X$ satisfying (1.1) can be characterized by the initial distribution $\mu$ of $X_0$ and the transition probabilities $(p_n)_{n\in\mathbb{N}}$. In particular, if $S$ is a complete separable metric space with Borel $\sigma$-algebra $\mathcal{S}$, then regular conditional probability distributions always exist, which guarantees the existence of transition probabilities $p_n$. Conversely, a given family of transition probabilities $p_n$ and an initial law $\mu$ for $X_0$ uniquely determine a consistent family of finite-dimensional distributions:
$$P_\mu(X_i \in A_i,\ 0 \le i \le n) = \int_{A_0}\mu(dx_0)\int_{A_1}p_1(x_0, dx_1)\cdots\int_{A_n}p_n(x_{n-1}, dx_n), \tag{1.3}$$
which are the finite-dimensional distributions of $(X_n)_{n\ge 0}$. When $(S, \mathcal{S})$ is a Polish space with Borel $\sigma$-algebra $\mathcal{S}$, by Kolmogorov's extension theorem (see [1, Section A.7]), the law of $(X_n)_{n\ge 0}$, regarded as a random variable taking values in $(S^{\mathbb{N}_0}, \mathcal{S}^{\mathbb{N}_0})$, is uniquely determined. Here $\mathbb{N}_0 := \{0\} \cup \mathbb{N}$ and $\mathcal{S}^{\mathbb{N}_0}$ is the Borel $\sigma$-algebra generated by the product topology on $S^{\mathbb{N}_0}$.
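For readers who want to experiment numerically, the following is a small illustrative sketch (not part of the original notes) in Python/NumPy. It simulates a time-homogeneous chain on a hypothetical three-state space, with a made-up stochastic kernel and initial law, and compares a Monte Carlo estimate of one finite-dimensional probability with the exact value from (1.3).

```python
import numpy as np

# Hypothetical example: 3-state chain on S = {0, 1, 2}; mu and P are made-up numbers.
mu = np.array([0.5, 0.3, 0.2])          # initial distribution of X_0
P = np.array([[0.1, 0.6, 0.3],          # stochastic kernel p(x, y); rows sum to 1
              [0.4, 0.4, 0.2],
              [0.3, 0.3, 0.4]])
rng = np.random.default_rng(0)

def sample_path(init, length):
    """Sample (X_0, ..., X_length) of the time-homogeneous chain with kernel P."""
    x = rng.choice(3, p=init)
    path = [int(x)]
    for _ in range(length):
        x = rng.choice(3, p=P[x])       # next state drawn from p(x, .)
        path.append(int(x))
    return path

# Exact value of P_mu(X_0 = 0, X_1 = 1, X_2 = 2) from (1.3): mu(0) p(0,1) p(1,2).
exact = mu[0] * P[0, 1] * P[1, 2]

# Monte Carlo estimate of the same finite-dimensional probability.
N = 100_000
hits = sum(sample_path(mu, 2) == [0, 1, 2] for _ in range(N))
print(exact, hits / N)                  # the two numbers should be close
```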

Theorem 1.3 [Characterization of Markov chains via transition probabilities] Suppose that $(S, \mathcal{S})$ is a Polish space equipped with the Borel $\sigma$-algebra. Then to any collection of transition probabilities $p_n : S \times \mathcal{S} \to [0,1]$ and any probability measure $\mu$ on $(S, \mathcal{S})$, there corresponds a Markov chain $(X_n)_{n\ge 0}$ with state space $(S, \mathcal{S})$, initial distribution $\mu$, and finite-dimensional distributions given as in (1.3). Conversely, if $(X_n)_{n\ge 0}$ is a Markov chain with initial distribution $\mu$, then we can construct a family of transition probabilities $(p_n)_{n\in\mathbb{N}}$ such that the finite-dimensional distributions of $X$ satisfy (1.3).

From (1.3), it is also easily seen that $P_\mu(\cdot) = \int P_x(\cdot)\,\mu(dx)$, where $P_x$ denotes the law of the Markov chain starting at $X_0 = x$.

Remark. When there is no other randomness involved besides the Markov chain $(X_n)_{n\ge 0}$, it is customary to let $(S^{\mathbb{N}_0}, \mathcal{S}^{\mathbb{N}_0}, P_\mu)$ be the canonical probability space for $X$ with initial distribution $\mu$.

From the one-step transition probabilities $(p_n)_{n\in\mathbb{N}}$, we can easily construct the transition probabilities between times $k < l$, i.e., $P(X_l \in A \mid \mathcal{F}_k)$. Define
$$p_{k,l}(x, A) = \int_S p_{k+1}(x, dy_{k+1})\int_S p_{k+2}(y_{k+1}, dy_{k+2})\cdots p_l(y_{l-1}, A).$$
It is an easy exercise to show that:

Theorem 1.4 [Chapman-Kolmogorov equations] The transition probabilities $(p_{k,m})_{0\le k<m}$ satisfy the relations
$$p_{k,n}(x, A) = \int_S p_{k,m}(x, dy)\,p_{m,n}(y, A) \tag{1.4}$$
for all $k < m < n$, $x \in S$ and $A \in \mathcal{S}$. In convolution notation, this reads $p_{k,n} = p_{k,m} * p_{m,n}$. In particular, for any $0 \le m < n$,
$$P(X_n \in A \mid \mathcal{F}_m) = p_{m,n}(X_m, A) \quad \text{a.s.}$$

Time-homogeneous Markov chains are determined by their one-step transition probabilities $p = p_{n-1,n}$ for all $n \in \mathbb{N}$. We call $p^{(k)} = p_{n,n+k}$ the $k$-step transition probabilities. The Chapman-Kolmogorov equation (1.4) then reads $p^{(m+n)} = p^{(m)} * p^{(n)}$.

2 The Markov and strong Markov property

We now restrict ourselves to time-homogeneous Markov chains. The Markov property asserts that given the value of $X_n$, the law of $(X_n, X_{n+1}, \ldots)$ is the same as that of a Markov chain starting from $X_n$, while the strong Markov property asserts that the same is true if we replace $n$ by a stopping time $\tau$. When the stopping time is a hitting time of a particular point $x_0 \in S$, the strong Markov property tells us that the process renews itself and has no memory of the past. Such renewal structures are particularly useful in the study of Markov chains. We will formulate the Markov property as an equality in law in terms of conditional expectations of bounded measurable functions.
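For a finite state space, the kernels $p^{(n)}$ are simply matrix powers, and (1.4) becomes an identity between matrix products. The following is a small illustrative check (not part of the original notes) in Python/NumPy, using a made-up $3 \times 3$ stochastic matrix:

```python
import numpy as np

# Hypothetical 3x3 stochastic matrix (made-up numbers); rows sum to 1.
P = np.array([[0.2, 0.5, 0.3],
              [0.1, 0.6, 0.3],
              [0.4, 0.1, 0.5]])

def nstep(P, n):
    """n-step transition kernel p^(n), the n-th matrix power of P."""
    return np.linalg.matrix_power(P, n)

m, n = 3, 4
lhs = nstep(P, m + n)                 # p^(m+n)(x, y)
rhs = nstep(P, m) @ nstep(P, n)       # sum_z p^(m)(x, z) p^(n)(z, y)
print(np.allclose(lhs, rhs))          # True: Chapman-Kolmogorov in matrix form
```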

Theorem 2.1 [The Markov property] Let $(S^{\mathbb{N}_0}, \mathcal{S}^{\mathbb{N}_0}, P_\mu)$ be the canonical probability space of a homogeneous Markov chain $X$ with initial distribution $\mu$, and let $\mathcal{F}_n = \sigma(X_0, \ldots, X_n)$. Let $\theta_n : S^{\mathbb{N}_0} \to S^{\mathbb{N}_0}$ denote the shift map with $(\theta_n X)_m = X_{m+n}$ for $m \ge 0$. Then for any bounded measurable function $f : S^{\mathbb{N}_0} \to \mathbb{R}$,
$$E_\mu[f(\theta_n X) \mid \mathcal{F}_n] = E_{X_n}[f], \tag{2.5}$$
where $E_\mu$ (resp. $E_{X_n}$) denotes expectation w.r.t. the Markov chain with initial law $\mu$ (resp. $\delta_{X_n}$).

Proof. It suffices to show that for all $A \in \mathcal{F}_n$ and all bounded measurable $f$,
$$E_\mu[f(\theta_n X)\,1_A] = E_\mu\big[E_{X_n}[f]\,1_A\big]. \tag{2.6}$$
We can use the $\pi$-$\lambda$ theorem to restrict our attention to sets of the form $A = \{\omega \in S^{\mathbb{N}_0} : \omega_0 \in A_0, \omega_1 \in A_1, \ldots, \omega_n \in A_n\}$, and use the monotone class theorem to restrict our attention to functions of the form $f(\omega) = \prod_{i=0}^k g_i(\omega_i)$ for some $k \in \mathbb{N}$ and bounded measurable $g_i : S \to \mathbb{R}$.

For $A$ and $f$ of the forms specified above, by successive conditioning and the fact that the transition probabilities $p$ of the Markov chain are regular conditional probabilities,
$$\begin{aligned}
E_\mu[f(\theta_n X)\,1_A] &= E_\mu\big[g_k(X_{n+k})\cdots g_0(X_n)\,1_{A_n}(X_n)\cdots 1_{A_0}(X_0)\big] \\
&= \int_{A_0}\mu(dx_0)\int_{A_1}p(x_0, dx_1)\cdots\int_{A_n}p(x_{n-1}, dx_n)\,g_0(x_n)\\
&\qquad\quad \int_S p(x_n, dx_{n+1})\,g_1(x_{n+1})\cdots\int_S p(x_{n+k-1}, dx_{n+k})\,g_k(x_{n+k}) \\
&= E_\mu\big[E_{X_n}[g_0\cdots g_k]\,1_A\big] = E_\mu\big[E_{X_n}[f]\,1_A\big].
\end{aligned} \tag{2.7}$$

Given $f = \prod_{i=0}^k g_i(\omega_i)$, the collection of sets $A \in \mathcal{F}_n$ which satisfy (2.7) is a $\lambda$-system, while the sets of the form $A = \{\omega \in S^{\mathbb{N}_0} : \omega_0 \in A_0, \ldots, \omega_n \in A_n\}$ form a $\pi$-system. Therefore, by the $\pi$-$\lambda$ theorem, (2.7) holds for all $A \in \mathcal{F}_n$.

Now we fix $A \in \mathcal{F}_n$. Let $\mathcal{H}$ denote the set of bounded measurable functions for which (2.7) holds. We have shown that $\mathcal{H}$ contains all functions of the form $f(\omega) = \prod_{i=0}^k g_i(\omega_i)$. In particular, $\mathcal{H}$ contains indicator functions of sets of the form $A = \{\omega \in S^{\mathbb{N}_0} : \omega_0 \in A_0, \ldots, \omega_k \in A_k\}$, which form a $\pi$-system that generates the $\sigma$-algebra $\mathcal{S}^{\mathbb{N}_0}$. Clearly $\mathcal{H}$ is closed under addition, scalar multiplication, and increasing limits. Therefore, by the monotone class theorem, $\mathcal{H}$ contains all bounded measurable functions.

Theorem 2.2 [Monotone class theorem] Let $\Pi$ be a $\pi$-system which contains the full set $\Omega$, and let $\mathcal{H}$ be a collection of real-valued functions satisfying

(i) If $A \in \Pi$, then $1_A \in \mathcal{H}$.

(ii) If $f, g \in \mathcal{H}$, then $f + g \in \mathcal{H}$, and $cf \in \mathcal{H}$ for any $c \in \mathbb{R}$.

(iii) If $f_n \in \mathcal{H}$ are non-negative, and $f_n \uparrow f$ where $f$ is bounded, then $f \in \mathcal{H}$.

Then $\mathcal{H}$ contains all bounded measurable functions w.r.t. the $\sigma$-algebra generated by $\Pi$.

The monotone class theorem is a simple consequence of the $\pi$-$\lambda$ theorem. See e.g. Durrett [1] for a proof.
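As a numerical sanity check of (2.5) (again an illustrative sketch, not part of the original notes), one can compare, by simulation, the conditional expectation of a bounded functional of the shifted path given $X_n = x$ with the same expectation for a chain started afresh at $x$. The chain below is the same hypothetical 3-state example as before, and $f(\omega) = 1_{\{\omega_1 = 2\}}$, so both sides should be close to $p(x, 2)$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Same hypothetical 3-state chain as before (made-up numbers).
mu = np.array([0.5, 0.3, 0.2])
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2],
              [0.3, 0.3, 0.4]])

def sample_path(init, length):
    x = rng.choice(3, p=init)
    path = [int(x)]
    for _ in range(length):
        x = rng.choice(3, p=P[x])
        path.append(int(x))
    return path

n, x_star, N = 4, 1, 100_000

# E_mu[ f(theta_n X) | X_n = x_star ] with f(omega) = 1{omega_1 = 2},
# estimated by conditioning simulated paths on the event {X_n = x_star}.
vals = []
for _ in range(N):
    path = sample_path(mu, n + 1)
    if path[n] == x_star:
        vals.append(float(path[n + 1] == 2))
lhs = np.mean(vals)

# E_{x_star}[ f ]: restart the chain at x_star and evaluate f on the fresh path.
delta = np.zeros(3); delta[x_star] = 1.0
rhs = np.mean([float(sample_path(delta, 1)[1] == 2) for _ in range(N)])

print(lhs, rhs)   # both estimates should be close to P[x_star, 2]
```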

Theorem 2.3 [The strong Markov property] Following the setup of Theorem 2.1, let $\tau$ be an $(\mathcal{F}_n)_{n\ge 0}$ stopping time. Let $(f_n)_{n\ge 0}$ be a sequence of uniformly bounded measurable functions from $S^{\mathbb{N}_0}$ to $\mathbb{R}$. Then
$$E_\mu[f_\tau(\theta_\tau X) \mid \mathcal{F}_\tau]\,1_{\{\tau<\infty\}} = E_{X_\tau}[f_\tau]\,1_{\{\tau<\infty\}} \quad \text{a.s.} \tag{2.8}$$

Proof. Let $A \in \mathcal{F}_\tau$. Then
$$E_\mu\big[f_\tau(\theta_\tau X)\,1_{A\cap\{\tau<\infty\}}\big] = \sum_{n=0}^\infty E_\mu\big[f_n(\theta_n X)\,1_{A\cap\{\tau=n\}}\big].$$
Since $A \cap \{\tau = n\} \in \mathcal{F}_n$, by the Markov property (2.5), the right-hand side equals
$$\sum_{n=0}^\infty E_\mu\big[E_{X_n}[f_n]\,1_{A\cap\{\tau=n\}}\big] = E_\mu\big[E_{X_\tau}[f_\tau]\,1_{A\cap\{\tau<\infty\}}\big],$$
which proves (2.8).

To illustrate the use of the strong Markov property and the reason for introducing the dependence of the functions $f_n$ on $n$, we prove the following.

Example 2.4 [Reflection principle for simple symmetric random walks] Let $X_n = \sum_{i=1}^n \xi_i$, where the $\xi_i$ are i.i.d. with $P(\xi_i = \pm 1) = \frac{1}{2}$. Then for any $a \in \mathbb{N}$,
$$P\Big(\max_{1\le i\le n} X_i \ge a\Big) = 2P(X_n \ge a+1) + P(X_n = a). \tag{2.9}$$

Proof. Let $\tau_a = \inf\{0 \le k \le n : X_k = a\}$ with $\tau_a = \infty$ if the set is empty. Then $\max_{1\le i\le n} X_i \ge a$ if and only if $\tau_a \le n$. Therefore
$$P\Big(\max_{1\le i\le n} X_i \ge a\Big) = P(\tau_a \le n) = P(\tau_a \le n, X_n < a) + P(\tau_a \le n, X_n > a) + P(\tau_a \le n, X_n = a).$$
Note that $P(\tau_a \le n, X_n > a) = P(X_n > a)$ because $X$ is a nearest-neighbor random walk, and similarly $P(\tau_a \le n, X_n = a) = P(X_n = a)$, while
$$P(\tau_a \le n, X_n < a) = E\big[1_{\{\tau_a\le n\}}P(X_n < a \mid \mathcal{F}_{\tau_a})\big] = E\big[1_{\{\tau_a\le n\}}P_a(X_{n-\tau_a} < a)\big],$$
where we have applied (2.8) with $f_k = 1_{\{X_{n-k} < a\}}$ if $0 \le k \le n$ and $f_k = 0$ otherwise. By symmetry, conditional on $\tau_a$, we have $P_a(X_{n-\tau_a} < a) = P_a(X_{n-\tau_a} > a)$. Therefore
$$P(\tau_a \le n, X_n < a) = P(\tau_a \le n, X_n > a) = P(X_n > a),$$
which then implies (2.9), since $P(X_n > a) = P(X_n \ge a+1)$ for the integer-valued walk $X_n$.

Remark. The proof of Theorem 2.3 shows that a discrete time Markov chain is always strong Markov. However, this conclusion is false for continuous time Markov processes. The reason is that there are uncountably many times which may conspire together to make the strong Markov property fail, even though the Markov property holds almost surely at deterministic times. One way to guarantee the strong Markov property is to require the transition probabilities $p_t(x, \cdot)$ to be continuous in $t$ and $x$, which is called the Feller property.
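The identity (2.9) is easy to test numerically. The following Monte Carlo sketch (not part of the original notes, Python/NumPy, with arbitrarily chosen $n$ and $a$) compares the two sides of the reflection principle:

```python
import numpy as np

rng = np.random.default_rng(2)

# Monte Carlo check of the reflection principle (2.9) for the simple symmetric walk.
n, a, N = 20, 3, 200_000
steps = rng.choice([-1, 1], size=(N, n))   # i.i.d. increments xi_1, ..., xi_n
paths = np.cumsum(steps, axis=1)           # X_1, ..., X_n for each of the N samples

lhs = np.mean(paths.max(axis=1) >= a)                                  # P(max X_i >= a)
rhs = 2 * np.mean(paths[:, -1] >= a + 1) + np.mean(paths[:, -1] == a)  # 2 P(X_n >= a+1) + P(X_n = a)
print(lhs, rhs)                            # the two estimates should agree up to Monte Carlo error
```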

3 Markov chains with a countable state space

We now focus on time-homogeneous Markov chains with a countable state space $S$. Let $(p(x,y))_{x,y\in S}$ denote the 1-step transition probability kernel of the Markov chain $(X_n)_{n\ge 0}$, which is a matrix with non-negative entries and $\sum_{y\in S} p(x,y) = 1$ for all $x \in S$. Such matrices are called stochastic matrices. The $n$-step transition probability kernel of the Markov chain is then given by the $n$-th power of $p$, i.e., $p^{(n)}(x,y) = \sum_{z\in S} p^{(n-1)}(x,z)\,p(z,y)$.

We first consider the following subclass of Markov chains.

Definition 3.1 [Irreducible Markov chains] A Markov chain with a countable state space $S$ is called irreducible if for all $x, y \in S$, $p^{(n)}(x,y) > 0$ for some $n \ge 0$. In other words, every state communicates with every other state.

A Markov chain fails to be irreducible either because the state space can be partitioned into non-communicating disjoint subsets, or because there are subsets of the state space acting as sinks: once the Markov chain enters such a subset, it can never leave it.

Definition 3.2 [Transience, null recurrence, and positive recurrence] Let $\tau_y := \inf\{n > 0 : X_n = y\}$ be the first hitting time (after time 0) of the state $y \in S$ by the Markov chain $X$. Any state $x \in S$ can then be classified into the following three types:

(i) Transient, if $P_x(\tau_x < \infty) < 1$.

(ii) Null recurrent, if $P_x(\tau_x < \infty) = 1$ and $E_x[\tau_x] = \infty$.

(iii) Positive recurrent, if $P_x(\tau_x < \infty) = 1$ and $E_x[\tau_x] < \infty$.

It turns out that for an irreducible Markov chain, all states are of the same type. Therefore transience, null recurrence and positive recurrence will also be used to classify irreducible Markov chains. Before proving this claim, we first prove some preliminary results.

Lemma 3.3 Let $\rho_{xy} = P_x(\tau_y < \infty)$ for $x, y \in S$. Let $G(x,y) = \sum_{n=0}^\infty P_x(X_n = y) = \sum_{n=0}^\infty p^{(n)}(x,y)$. If $y$ is transient, then
$$G(x,y) = \begin{cases} \dfrac{\rho_{xy}}{1-\rho_{yy}} & \text{if } x \ne y, \\[1ex] \dfrac{1}{1-\rho_{yy}} & \text{if } x = y. \end{cases} \tag{3.10}$$
If $y$ is recurrent, then $G(x,y) = \infty$ for all $x \in S$ with $\rho_{xy} > 0$.

Proof. Assuming $X_0 = y$, let $T_y^0 = 0$, and define inductively $T_y^k = \inf\{i > T_y^{k-1} : X_i = y\}$. Namely, the $T_y^k$ are the successive return times to $y$. By the strong Markov property, $P_y(T_y^k < \infty \mid T_y^{k-1} < \infty) = P_y(T_y^1 < \infty) = \rho_{yy}$. By successive conditioning, we thus have $P_y(T_y^k < \infty) = \rho_{yy}^k$. Therefore
$$G(y,y) = \sum_{k=0}^\infty P_y(T_y^k < \infty) = \sum_{k=0}^\infty \rho_{yy}^k = \frac{1}{1-\rho_{yy}}. \tag{3.11}$$
Therefore $G(y,y) = \infty$ if and only if $\rho_{yy} = 1$, i.e., $y$ is recurrent.
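To see (3.10)-(3.11) in action (an illustrative sketch, not part of the original notes), consider a biased nearest-neighbor walk on $\mathbb{Z}$ with $p(x, x+1) = 0.7$ and $p(x, x-1) = 0.3$, started at 0; the state 0 is then transient. The simulation below compares a truncated estimate of $G(0,0)$ with $1/(1-\rho_{00})$, where $\rho_{00}$ is estimated from the same paths:

```python
import numpy as np

rng = np.random.default_rng(3)

# Biased walk on Z: p(x, x+1) = 0.7, p(x, x-1) = 0.3; the state 0 is transient.
p_up, T, N = 0.7, 200, 50_000
steps = np.where(rng.random((N, T)) < p_up, 1, -1)
paths = np.cumsum(steps, axis=1)                 # positions at times 1, ..., T

visits = 1 + (paths == 0).sum(axis=1)            # visits to 0 per path, counting time 0
rho = np.mean(visits > 1)                        # P_0(tau_0 <= T), a proxy for rho_00
G = visits.mean()                                # truncated estimate of G(0, 0)
print(rho, G, 1.0 / (1.0 - rho))                 # by (3.10)-(3.11), G(0,0) = 1/(1 - rho_00)
```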

For $x \ne y$, we first have to wait till $X$ visits $y$, and
$$G(x,y) = \sum_{k=1}^\infty P_x(T_y^k < \infty) = \sum_{k=1}^\infty \rho_{xy}\rho_{yy}^{k-1} = \frac{\rho_{xy}}{1-\rho_{yy}}, \tag{3.12}$$
where we used the fact that $P_x(T_y^1 < \infty) = \rho_{xy}$. This completes the proof of the lemma.

Lemma 3.4 If $x \in S$ is recurrent, $y \ne x$, and $\rho_{xy} := P_x(\tau_y < \infty) > 0$, then $P_x(\tau_y < \tau_x) > 0$, $\rho_{yx} := P_y(\tau_x < \infty) = 1 = \rho_{xy}$, and $y$ is also recurrent.

Proof. If $P_x(\tau_y < \tau_x) = 0$, so that the Markov chain starting from $x$ returns to $x$ before visiting $y$ almost surely, then when it returns to $x$, it starts afresh and will not visit $y$ before a second return to $x$. Iterating this reasoning, the Markov chain will visit $x$ infinitely often before visiting $y$, which means it will never visit $y$, contradicting the assumption that $\rho_{xy} > 0$.

Suppose that $\rho_{yx} < 1$. Since $P_x(\tau_y < \tau_x) > 0$, there exist $k \ge 1$ and $y_1, \ldots, y_{k-1} \in S$, all distinct from $x$ and $y$, such that $p(x, y_1)p(y_1, y_2)\cdots p(y_{k-1}, y) > 0$. Then
$$P_x(\tau_x = \infty) \ge p(x, y_1)\cdots p(y_{k-1}, y)\,(1 - \rho_{yx}) > 0,$$
which contradicts the recurrence of $x$. Hence $\rho_{yx} = 1$.

Upon each return to $x$, the Markov chain visits $y$ before the next return to $x$ with probability $P_x(\tau_y < \tau_x) > 0$. Since the Markov chain returns to $x$ infinitely often by recurrence, and the events that $y$ is visited between different consecutive returns to $x$ are independent by the strong Markov property, it follows that $\rho_{xy} = 1$. Since $\rho_{yx} = \rho_{xy} = 1$, almost surely the Markov chain starting from $y$ will visit $x$ and then return to $y$. Therefore $y$ is also recurrent.

We are now ready to prove

Theorem 3.5 For an irreducible Markov chain, all states are of the same type.

Proof. Lemma 3.4 has shown that if $x$ is recurrent, then so is any other $y \in S$ by the irreducibility assumption. It remains to show that if $x$ is positive recurrent, then so is any $y \in S$. Let $p = P_x(\tau_y < \tau_x)$, which is positive by Lemma 3.4. Then $E_x[\tau_x] \ge P_x(\tau_y < \tau_x)\,E_y[\tau_x]$. Therefore $E_y[\tau_x] \le \frac{1}{p}E_x[\tau_x] < \infty$. On the other hand,
$$\begin{aligned}
E_x[\tau_y] &\le E_x\big[1_{\{\tau_y<\tau_x\}}\tau_x\big] + E_x\big[1_{\{\tau_x<\tau_y\}}\tau_y\big] \\
&= E_x\big[1_{\{\tau_y<\tau_x\}}\tau_x\big] + E_x\big[1_{\{\tau_x<\tau_y\}}E[\tau_y \mid \mathcal{F}_{\tau_x}]\big] \\
&= E_x\big[1_{\{\tau_y<\tau_x\}}\tau_x\big] + E_x\big[1_{\{\tau_x<\tau_y\}}(\tau_x + E_x[\tau_y])\big] \\
&= E_x[\tau_x] + (1-p)\,E_x[\tau_y].
\end{aligned}$$
Therefore $E_x[\tau_y] \le \frac{1}{p}E_x[\tau_x]$, and
$$E_y[\tau_y] \le E_y[\tau_x] + E_x[\tau_y] \le \frac{2}{p}E_x[\tau_x] < \infty,$$
which proves the positive recurrence of $y$.

Remark. Theorem 3.5 allows us to classify an irreducible countable state space Markov chain as either transient, null recurrent, or positive recurrent, depending on the type of its states.

References

[1] R. Durrett, Probability: Theory and Examples, 2nd edition, Duxbury Press, Belmont, California, 1996.