Lecture 5

1 Markov chain: definition

Definition 1.1 [Markov chain] A sequence of random variables $(X_n)_{n\ge 0}$ taking values in a measurable state space $(S, \mathcal{S})$ is called a (discrete time) Markov chain, if for $\mathcal{F}_n := \sigma(X_0, \ldots, X_n)$,
$$P(X_{n+1} \in A \mid \mathcal{F}_n) = P(X_{n+1} \in A \mid X_n) \qquad \forall\, n \ge 0 \text{ and } A \in \mathcal{S}. \tag{1.1}$$
If we interpret the index $n \ge 0$ as time, then a Markov chain simply requires that the future depends only on the present and not on the past.

Examples: 1. Random walks; 2. Branching processes; 3. Polya's urn.

Remark. Note that any stochastic process $(X_n)_{n\ge 0}$ taking values in $S$ can be turned into a Markov chain if we enlarge the state space from $S$ to $\bigcup_{n\in\mathbb{N}} S^n$, and change the process from $(X_n)_{n\ge 0}$ to $(\tilde{X}_n)_{n\ge 0}$ with $\tilde{X}_n = (X_0, X_1, \ldots, X_n) \in S^{n+1}$; namely, the process becomes Markov if we take its entire past to be its present state.

A more concrete way of characterizing Markov chains is by transition probabilities.

Definition 1.2 [Markov chain transition probabilities] A function $p : S \times \mathcal{S} \to [0,1]$ is called a transition probability, if

(i) For each $x \in S$, $A \mapsto p(x, A)$ is a probability measure on $(S, \mathcal{S})$.

(ii) For each $A \in \mathcal{S}$, $x \mapsto p(x, A)$ is a measurable function on $(S, \mathcal{S})$.

We say a Markov chain $(X_n)_{n\ge 0}$ has transition probabilities $p_n$ if
$$P(X_n \in A \mid \mathcal{F}_{n-1}) = p_n(X_{n-1}, A) \tag{1.2}$$
almost surely for all $n \in \mathbb{N}$ and $A \in \mathcal{S}$. If $p_n \equiv p$ for all $n \in \mathbb{N}$, then we call $(X_n)_{n\ge 0}$ a time-homogeneous Markov chain, or a Markov chain with stationary transition probabilities.

If the underlying state space $(S, \mathcal{S})$ is nice, then the distribution of a Markov chain $X$ satisfying (1.1) can be characterized by the initial distribution $\mu$ of $X_0$ and the transition probabilities $(p_n)_{n\in\mathbb{N}}$. In particular, if $S$ is a complete separable metric space with Borel $\sigma$-algebra $\mathcal{S}$, then regular conditional probability distributions always exist, which guarantees the existence of transition probabilities $p_n$. Conversely, a given family of transition probabilities $p_n$ and an initial law $\mu$ for $X_0$ uniquely determine a consistent family of finite-dimensional distributions:
$$P_\mu(X_i \in A_i,\ 0 \le i \le n) = \int_{A_0}\mu(dx_0)\int_{A_1}p_1(x_0, dx_1)\cdots\int_{A_n}p_n(x_{n-1}, dx_n), \tag{1.3}$$
which are the finite-dimensional distributions of $(X_n)_{n\ge 0}$. When $(S, \mathcal{S})$ is a Polish space with Borel $\sigma$-algebra $\mathcal{S}$, by Kolmogorov's extension theorem (see [1, Section A.7]), the law of $(X_n)_{n\ge 0}$, regarded as a random variable taking values in $(S^{\mathbb{N}_0}, \mathcal{S}^{\mathbb{N}_0})$, is uniquely determined. Here $\mathbb{N}_0 := \{0\} \cup \mathbb{N}$ and $\mathcal{S}^{\mathbb{N}_0}$ is the Borel $\sigma$-algebra generated by the product topology on $S^{\mathbb{N}_0}$.
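For readers who want to experiment numerically, the following is a small illustrative sketch (not part of the original notes) in Python/NumPy. It simulates a time-homogeneous chain on a hypothetical three-state space, with a made-up stochastic kernel and initial law, and compares a Monte Carlo estimate of one finite-dimensional probability with the exact value from (1.3).

```python
import numpy as np

# Hypothetical example: 3-state chain on S = {0, 1, 2}; mu and P are made-up numbers.
mu = np.array([0.5, 0.3, 0.2])          # initial distribution of X_0
P = np.array([[0.1, 0.6, 0.3],          # stochastic kernel p(x, y); rows sum to 1
              [0.4, 0.4, 0.2],
              [0.3, 0.3, 0.4]])
rng = np.random.default_rng(0)

def sample_path(init, length):
    """Sample (X_0, ..., X_length) of the time-homogeneous chain with kernel P."""
    x = rng.choice(3, p=init)
    path = [int(x)]
    for _ in range(length):
        x = rng.choice(3, p=P[x])       # next state drawn from p(x, .)
        path.append(int(x))
    return path

# Exact value of P_mu(X_0 = 0, X_1 = 1, X_2 = 2) from (1.3): mu(0) p(0,1) p(1,2).
exact = mu[0] * P[0, 1] * P[1, 2]

# Monte Carlo estimate of the same finite-dimensional probability.
N = 100_000
hits = sum(sample_path(mu, 2) == [0, 1, 2] for _ in range(N))
print(exact, hits / N)                  # the two numbers should be close
```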

Theorem 1.3 [Characterization of Markov chains via transition probabilities] Suppose that $(S, \mathcal{S})$ is a Polish space equipped with the Borel $\sigma$-algebra. Then to any collection of transition probabilities $p_n : S \times \mathcal{S} \to [0,1]$ and any probability measure $\mu$ on $(S, \mathcal{S})$, there corresponds a Markov chain $(X_n)_{n\ge 0}$ with state space $(S, \mathcal{S})$, initial distribution $\mu$, and finite-dimensional distributions given as in (1.3). Conversely, if $(X_n)_{n\ge 0}$ is a Markov chain with initial distribution $\mu$, then we can construct a family of transition probabilities $(p_n)_{n\in\mathbb{N}}$ such that the finite-dimensional distributions of $X$ satisfy (1.3).

From (1.3), it is also easily seen that $P_\mu(\cdot) = \int P_x(\cdot)\,\mu(dx)$, where $P_x$ denotes the law of the Markov chain starting at $X_0 = x$.

Remark. When there is no other randomness involved besides the Markov chain $(X_n)_{n\ge 0}$, it is customary to let $(S^{\mathbb{N}_0}, \mathcal{S}^{\mathbb{N}_0}, P_\mu)$ be the canonical probability space for $X$ with initial distribution $\mu$.

From the one-step transition probabilities $(p_n)_{n\in\mathbb{N}}$, we can easily construct the transition probabilities between times $k < l$, i.e., $P(X_l \in A \mid \mathcal{F}_k)$. Define
$$p_{k,l}(x, A) = \int_S p_{k+1}(x, dy_{k+1})\int_S p_{k+2}(y_{k+1}, dy_{k+2})\cdots p_l(y_{l-1}, A).$$
It is an easy exercise to show that:

Theorem 1.4 [Chapman-Kolmogorov equations] The transition probabilities $(p_{k,m})_{0\le k<m}$ satisfy the relations
$$p_{k,n}(x, A) = \int_S p_{k,m}(x, dy)\,p_{m,n}(y, A) \tag{1.4}$$
for all $k < m < n$, $x \in S$ and $A \in \mathcal{S}$. In convolution notation, this reads $p_{k,n} = p_{k,m} * p_{m,n}$. In particular, for any $0 \le m < n$,
$$P(X_n \in A \mid \mathcal{F}_m) = p_{m,n}(X_m, A) \quad \text{a.s.}$$

Time-homogeneous Markov chains are determined by their one-step transition probabilities $p = p_{n-1,n}$ for all $n \in \mathbb{N}$. We call $p^{(k)} = p_{n,n+k}$ the $k$-step transition probabilities. The Chapman-Kolmogorov equation (1.4) then reads $p^{(m+n)} = p^{(m)} * p^{(n)}$.

2 The Markov and strong Markov property

We now restrict ourselves to time-homogeneous Markov chains. The Markov property asserts that given the value of $X_n$, the law of $(X_n, X_{n+1}, \ldots)$ is the same as that of a Markov chain starting from $X_n$, while the strong Markov property asserts that the same is true if we replace $n$ by a stopping time $\tau$. When the stopping time is a hitting time of a particular point $x_0 \in S$, the strong Markov property tells us that the process renews itself and has no memory of the past. Such renewal structures are particularly useful in the study of Markov chains. We will formulate the Markov property as an equality in law in terms of conditional expectations of bounded measurable functions.
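For a finite state space, the kernels $p^{(n)}$ are simply matrix powers, and (1.4) becomes an identity between matrix products. The following is a small illustrative check (not part of the original notes) in Python/NumPy, using a made-up $3 \times 3$ stochastic matrix:

```python
import numpy as np

# Hypothetical 3x3 stochastic matrix (made-up numbers); rows sum to 1.
P = np.array([[0.2, 0.5, 0.3],
              [0.1, 0.6, 0.3],
              [0.4, 0.1, 0.5]])

def nstep(P, n):
    """n-step transition kernel p^(n), the n-th matrix power of P."""
    return np.linalg.matrix_power(P, n)

m, n = 3, 4
lhs = nstep(P, m + n)                 # p^(m+n)(x, y)
rhs = nstep(P, m) @ nstep(P, n)       # sum_z p^(m)(x, z) p^(n)(z, y)
print(np.allclose(lhs, rhs))          # True: Chapman-Kolmogorov in matrix form
```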

Theorem 2.1 [The Markov property] Let $(S^{\mathbb{N}_0}, \mathcal{S}^{\mathbb{N}_0}, P_\mu)$ be the canonical probability space of a homogeneous Markov chain $X$ with initial distribution $\mu$, and let $\mathcal{F}_n = \sigma(X_0, \ldots, X_n)$. Let $\theta_n : S^{\mathbb{N}_0} \to S^{\mathbb{N}_0}$ denote the shift map with $(\theta_n X)_m = X_{m+n}$ for $m \ge 0$. Then for any bounded measurable function $f : S^{\mathbb{N}_0} \to \mathbb{R}$,
$$E_\mu[f(\theta_n X) \mid \mathcal{F}_n] = E_{X_n}[f], \tag{2.5}$$
where $E_\mu$ (resp. $E_{X_n}$) denotes expectation w.r.t. the Markov chain with initial law $\mu$ (resp. $\delta_{X_n}$).

Proof. It suffices to show that for all $A \in \mathcal{F}_n$ and all bounded measurable $f$,
$$E_\mu[f(\theta_n X)\,1_A] = E_\mu\big[E_{X_n}[f]\,1_A\big]. \tag{2.6}$$
We can use the $\pi$-$\lambda$ theorem to restrict our attention to sets of the form $A = \{\omega \in S^{\mathbb{N}_0} : \omega_0 \in A_0, \omega_1 \in A_1, \ldots, \omega_n \in A_n\}$, and use the monotone class theorem to restrict our attention to functions of the form $f(\omega) = \prod_{i=0}^k g_i(\omega_i)$ for some $k \in \mathbb{N}$ and bounded measurable $g_i : S \to \mathbb{R}$.

For $A$ and $f$ of the forms specified above, by successive conditioning and the fact that the transition probabilities $p$ of the Markov chain are regular conditional probabilities,
$$\begin{aligned}
E_\mu[f(\theta_n X)\,1_A] &= E_\mu\big[g_k(X_{n+k})\cdots g_0(X_n)\,1_{A_n}(X_n)\cdots 1_{A_0}(X_0)\big] \\
&= \int_{A_0}\mu(dx_0)\int_{A_1}p(x_0, dx_1)\cdots\int_{A_n}p(x_{n-1}, dx_n)\,g_0(x_n)\\
&\qquad\quad \int_S p(x_n, dx_{n+1})\,g_1(x_{n+1})\cdots\int_S p(x_{n+k-1}, dx_{n+k})\,g_k(x_{n+k}) \\
&= E_\mu\big[E_{X_n}[g_0\cdots g_k]\,1_A\big] = E_\mu\big[E_{X_n}[f]\,1_A\big].
\end{aligned} \tag{2.7}$$

Given $f = \prod_{i=0}^k g_i(\omega_i)$, the collection of sets $A \in \mathcal{F}_n$ which satisfy (2.7) is a $\lambda$-system, while the sets of the form $A = \{\omega \in S^{\mathbb{N}_0} : \omega_0 \in A_0, \ldots, \omega_n \in A_n\}$ form a $\pi$-system. Therefore, by the $\pi$-$\lambda$ theorem, (2.7) holds for all $A \in \mathcal{F}_n$.

Now we fix $A \in \mathcal{F}_n$. Let $\mathcal{H}$ denote the set of bounded measurable functions for which (2.7) holds. We have shown that $\mathcal{H}$ contains all functions of the form $f(\omega) = \prod_{i=0}^k g_i(\omega_i)$. In particular, $\mathcal{H}$ contains indicator functions of sets of the form $A = \{\omega \in S^{\mathbb{N}_0} : \omega_0 \in A_0, \ldots, \omega_k \in A_k\}$, which form a $\pi$-system that generates the $\sigma$-algebra $\mathcal{S}^{\mathbb{N}_0}$. Clearly $\mathcal{H}$ is closed under addition, scalar multiplication, and increasing limits. Therefore, by the monotone class theorem, $\mathcal{H}$ contains all bounded measurable functions.

Theorem 2.2 [Monotone class theorem] Let $\Pi$ be a $\pi$-system which contains the full set $\Omega$, and let $\mathcal{H}$ be a collection of real-valued functions satisfying

(i) If $A \in \Pi$, then $1_A \in \mathcal{H}$.

(ii) If $f, g \in \mathcal{H}$, then $f + g \in \mathcal{H}$, and $cf \in \mathcal{H}$ for any $c \in \mathbb{R}$.

(iii) If $f_n \in \mathcal{H}$ are non-negative, and $f_n \uparrow f$ where $f$ is bounded, then $f \in \mathcal{H}$.

Then $\mathcal{H}$ contains all bounded measurable functions w.r.t. the $\sigma$-algebra generated by $\Pi$.

The monotone class theorem is a simple consequence of the $\pi$-$\lambda$ theorem. See e.g. Durrett [1] for a proof.
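As a numerical sanity check of (2.5) (again an illustrative sketch, not part of the original notes), one can compare, by simulation, the conditional expectation of a bounded functional of the shifted path given $X_n = x$ with the same expectation for a chain started afresh at $x$. The chain below is the same hypothetical 3-state example as before, and $f(\omega) = 1_{\{\omega_1 = 2\}}$, so both sides should be close to $p(x, 2)$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Same hypothetical 3-state chain as before (made-up numbers).
mu = np.array([0.5, 0.3, 0.2])
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2],
              [0.3, 0.3, 0.4]])

def sample_path(init, length):
    x = rng.choice(3, p=init)
    path = [int(x)]
    for _ in range(length):
        x = rng.choice(3, p=P[x])
        path.append(int(x))
    return path

n, x_star, N = 4, 1, 100_000

# E_mu[ f(theta_n X) | X_n = x_star ] with f(omega) = 1{omega_1 = 2},
# estimated by conditioning simulated paths on the event {X_n = x_star}.
vals = []
for _ in range(N):
    path = sample_path(mu, n + 1)
    if path[n] == x_star:
        vals.append(float(path[n + 1] == 2))
lhs = np.mean(vals)

# E_{x_star}[ f ]: restart the chain at x_star and evaluate f on the fresh path.
delta = np.zeros(3); delta[x_star] = 1.0
rhs = np.mean([float(sample_path(delta, 1)[1] == 2) for _ in range(N)])

print(lhs, rhs)   # both estimates should be close to P[x_star, 2]
```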

Theorem 2.3 [The strong Markov property] Following the setup of Theorem 2.1, let $\tau$ be an $(\mathcal{F}_n)_{n\ge 0}$ stopping time. Let $(f_n)_{n\ge 0}$ be a sequence of uniformly bounded measurable functions from $S^{\mathbb{N}_0}$ to $\mathbb{R}$. Then
$$E_\mu[f_\tau(\theta_\tau X) \mid \mathcal{F}_\tau]\,1_{\{\tau<\infty\}} = E_{X_\tau}[f_\tau]\,1_{\{\tau<\infty\}} \quad \text{a.s.} \tag{2.8}$$

Proof. Let $A \in \mathcal{F}_\tau$. Then
$$E_\mu\big[f_\tau(\theta_\tau X)\,1_{A\cap\{\tau<\infty\}}\big] = \sum_{n=0}^\infty E_\mu\big[f_n(\theta_n X)\,1_{A\cap\{\tau=n\}}\big].$$
Since $A \cap \{\tau = n\} \in \mathcal{F}_n$, by the Markov property (2.5), the right-hand side equals
$$\sum_{n=0}^\infty E_\mu\big[E_{X_n}[f_n]\,1_{A\cap\{\tau=n\}}\big] = E_\mu\big[E_{X_\tau}[f_\tau]\,1_{A\cap\{\tau<\infty\}}\big],$$
which proves (2.8).

To illustrate the use of the strong Markov property and the reason for introducing the dependence of the functions $f_n$ on $n$, we prove the following.

Example 2.4 [Reflection principle for simple symmetric random walks] Let $X_n = \sum_{i=1}^n \xi_i$, where the $\xi_i$ are i.i.d. with $P(\xi_i = \pm 1) = \frac{1}{2}$. Then for any $a \in \mathbb{N}$,
$$P\Big(\max_{1\le i\le n} X_i \ge a\Big) = 2P(X_n \ge a+1) + P(X_n = a). \tag{2.9}$$

Proof. Let $\tau_a = \inf\{0 \le k \le n : X_k = a\}$ with $\tau_a = \infty$ if the set is empty. Then $\max_{1\le i\le n} X_i \ge a$ if and only if $\tau_a \le n$. Therefore
$$P\Big(\max_{1\le i\le n} X_i \ge a\Big) = P(\tau_a \le n) = P(\tau_a \le n, X_n < a) + P(\tau_a \le n, X_n > a) + P(\tau_a \le n, X_n = a).$$
Note that $P(\tau_a \le n, X_n > a) = P(X_n > a)$ because $X$ is a nearest-neighbor random walk, and similarly $P(\tau_a \le n, X_n = a) = P(X_n = a)$, while
$$P(\tau_a \le n, X_n < a) = E\big[1_{\{\tau_a\le n\}}P(X_n < a \mid \mathcal{F}_{\tau_a})\big] = E\big[1_{\{\tau_a\le n\}}P_a(X_{n-\tau_a} < a)\big],$$
where we have applied (2.8) with $f_k = 1_{\{X_{n-k} < a\}}$ if $0 \le k \le n$ and $f_k = 0$ otherwise. By symmetry, conditional on $\tau_a$, we have $P_a(X_{n-\tau_a} < a) = P_a(X_{n-\tau_a} > a)$. Therefore
$$P(\tau_a \le n, X_n < a) = P(\tau_a \le n, X_n > a) = P(X_n > a),$$
which then implies (2.9), since $P(X_n > a) = P(X_n \ge a+1)$ for the integer-valued walk $X_n$.

Remark. The proof of Theorem 2.3 shows that a discrete time Markov chain is always strong Markov. However, this conclusion is false for continuous time Markov processes. The reason is that there are uncountably many times which may conspire together to make the strong Markov property fail, even though the Markov property holds almost surely at deterministic times. One way to guarantee the strong Markov property is to require the transition probabilities $p_t(x, \cdot)$ to be continuous in $t$ and $x$, which is called the Feller property.
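The identity (2.9) is easy to test numerically. The following Monte Carlo sketch (not part of the original notes, Python/NumPy, with arbitrarily chosen $n$ and $a$) compares the two sides of the reflection principle:

```python
import numpy as np

rng = np.random.default_rng(2)

# Monte Carlo check of the reflection principle (2.9) for the simple symmetric walk.
n, a, N = 20, 3, 200_000
steps = rng.choice([-1, 1], size=(N, n))   # i.i.d. increments xi_1, ..., xi_n
paths = np.cumsum(steps, axis=1)           # X_1, ..., X_n for each of the N samples

lhs = np.mean(paths.max(axis=1) >= a)                                  # P(max X_i >= a)
rhs = 2 * np.mean(paths[:, -1] >= a + 1) + np.mean(paths[:, -1] == a)  # 2 P(X_n >= a+1) + P(X_n = a)
print(lhs, rhs)                            # the two estimates should agree up to Monte Carlo error
```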

3 Markov chains with a countable state space

We now focus on time-homogeneous Markov chains with a countable state space $S$. Let $(p(x,y))_{x,y\in S}$ denote the 1-step transition probability kernel of the Markov chain $(X_n)_{n\ge 0}$, which is a matrix with non-negative entries and $\sum_{y\in S} p(x,y) = 1$ for all $x \in S$. Such matrices are called stochastic matrices. The $n$-step transition probability kernel of the Markov chain is then given by the $n$-th power of $p$, i.e., $p^{(n)}(x,y) = \sum_{z\in S} p^{(n-1)}(x,z)\,p(z,y)$.

We first consider the following subclass of Markov chains.

Definition 3.1 [Irreducible Markov chains] A Markov chain with a countable state space $S$ is called irreducible if for all $x, y \in S$, $p^{(n)}(x,y) > 0$ for some $n \ge 0$. In other words, every state communicates with every other state.

A Markov chain fails to be irreducible either because the state space can be partitioned into non-communicating disjoint subsets, or because there are subsets of the state space acting as sinks: once the Markov chain enters such a subset, it can never leave it.

Definition 3.2 [Transience, null recurrence, and positive recurrence] Let $\tau_y := \inf\{n > 0 : X_n = y\}$ be the first hitting time (after time 0) of the state $y \in S$ by the Markov chain $X$. Any state $x \in S$ can then be classified into the following three types:

(i) Transient, if $P_x(\tau_x < \infty) < 1$.

(ii) Null recurrent, if $P_x(\tau_x < \infty) = 1$ and $E_x[\tau_x] = \infty$.

(iii) Positive recurrent, if $P_x(\tau_x < \infty) = 1$ and $E_x[\tau_x] < \infty$.

It turns out that for an irreducible Markov chain, all states are of the same type. Therefore transience, null recurrence and positive recurrence will also be used to classify irreducible Markov chains. Before proving this claim, we first prove some preliminary results.

Lemma 3.3 Let $\rho_{xy} = P_x(\tau_y < \infty)$ for $x, y \in S$. Let $G(x,y) = \sum_{n=0}^\infty P_x(X_n = y) = \sum_{n=0}^\infty p^{(n)}(x,y)$. If $y$ is transient, then
$$G(x,y) = \begin{cases} \dfrac{\rho_{xy}}{1-\rho_{yy}} & \text{if } x \ne y, \\[1ex] \dfrac{1}{1-\rho_{yy}} & \text{if } x = y. \end{cases} \tag{3.10}$$
If $y$ is recurrent, then $G(x,y) = \infty$ for all $x \in S$ with $\rho_{xy} > 0$.

Proof. Assuming $X_0 = y$, let $T_y^0 = 0$, and define inductively $T_y^k = \inf\{i > T_y^{k-1} : X_i = y\}$. Namely, the $T_y^k$ are the successive return times to $y$. By the strong Markov property, $P_y(T_y^k < \infty \mid T_y^{k-1} < \infty) = P_y(T_y^1 < \infty) = \rho_{yy}$. By successive conditioning, we thus have $P_y(T_y^k < \infty) = \rho_{yy}^k$. Therefore
$$G(y,y) = \sum_{k=0}^\infty P_y(T_y^k < \infty) = \sum_{k=0}^\infty \rho_{yy}^k = \frac{1}{1-\rho_{yy}}. \tag{3.11}$$
Therefore $G(y,y) = \infty$ if and only if $\rho_{yy} = 1$, i.e., $y$ is recurrent.
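To see (3.10)-(3.11) in action (an illustrative sketch, not part of the original notes), consider a biased nearest-neighbor walk on $\mathbb{Z}$ with $p(x, x+1) = 0.7$ and $p(x, x-1) = 0.3$, started at 0; the state 0 is then transient. The simulation below compares a truncated estimate of $G(0,0)$ with $1/(1-\rho_{00})$, where $\rho_{00}$ is estimated from the same paths:

```python
import numpy as np

rng = np.random.default_rng(3)

# Biased walk on Z: p(x, x+1) = 0.7, p(x, x-1) = 0.3; the state 0 is transient.
p_up, T, N = 0.7, 200, 50_000
steps = np.where(rng.random((N, T)) < p_up, 1, -1)
paths = np.cumsum(steps, axis=1)                 # positions at times 1, ..., T

visits = 1 + (paths == 0).sum(axis=1)            # visits to 0 per path, counting time 0
rho = np.mean(visits > 1)                        # P_0(tau_0 <= T), a proxy for rho_00
G = visits.mean()                                # truncated estimate of G(0, 0)
print(rho, G, 1.0 / (1.0 - rho))                 # by (3.10)-(3.11), G(0,0) = 1/(1 - rho_00)
```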

For $x \ne y$, we first have to wait till $X$ visits $y$, and
$$G(x,y) = \sum_{k=1}^\infty P_x(T_y^k < \infty) = \sum_{k=1}^\infty \rho_{xy}\rho_{yy}^{k-1} = \frac{\rho_{xy}}{1-\rho_{yy}}, \tag{3.12}$$
where we used the fact that $P_x(T_y^1 < \infty) = \rho_{xy}$. This completes the proof of the lemma.

Lemma 3.4 If $x \in S$ is recurrent, $y \ne x$, and $\rho_{xy} := P_x(\tau_y < \infty) > 0$, then $P_x(\tau_y < \tau_x) > 0$, $\rho_{yx} := P_y(\tau_x < \infty) = 1 = \rho_{xy}$, and $y$ is also recurrent.

Proof. If $P_x(\tau_y < \tau_x) = 0$, so that the Markov chain starting from $x$ returns to $x$ before visiting $y$ almost surely, then when it returns to $x$, it starts afresh and will not visit $y$ before a second return to $x$. Iterating this reasoning, the Markov chain will visit $x$ infinitely often before visiting $y$, which means it will never visit $y$, contradicting the assumption that $\rho_{xy} > 0$.

Suppose that $\rho_{yx} < 1$. Since $P_x(\tau_y < \tau_x) > 0$, there exist $k \ge 1$ and $y_1, \ldots, y_{k-1} \in S$, all distinct from $x$ and $y$, such that $p(x, y_1)p(y_1, y_2)\cdots p(y_{k-1}, y) > 0$. Then
$$P_x(\tau_x = \infty) \ge p(x, y_1)\cdots p(y_{k-1}, y)\,(1 - \rho_{yx}) > 0,$$
which contradicts the recurrence of $x$. Hence $\rho_{yx} = 1$.

Upon each return to $x$, the Markov chain visits $y$ before the next return to $x$ with probability $P_x(\tau_y < \tau_x) > 0$. Since the Markov chain returns to $x$ infinitely often by recurrence, and the events that $y$ is visited between different consecutive returns to $x$ are independent by the strong Markov property, it follows that $\rho_{xy} = 1$. Since $\rho_{yx} = \rho_{xy} = 1$, almost surely the Markov chain starting from $y$ will visit $x$ and then return to $y$. Therefore $y$ is also recurrent.

We are now ready to prove

Theorem 3.5 For an irreducible Markov chain, all states are of the same type.

Proof. Lemma 3.4 has shown that if $x$ is recurrent, then so is any other $y \in S$ by the irreducibility assumption. It remains to show that if $x$ is positive recurrent, then so is any $y \in S$. Let $p = P_x(\tau_y < \tau_x)$, which is positive by Lemma 3.4. Then $E_x[\tau_x] \ge P_x(\tau_y < \tau_x)\,E_y[\tau_x]$. Therefore $E_y[\tau_x] \le \frac{1}{p}E_x[\tau_x] < \infty$. On the other hand,
$$\begin{aligned}
E_x[\tau_y] &\le E_x\big[1_{\{\tau_y<\tau_x\}}\tau_x\big] + E_x\big[1_{\{\tau_x<\tau_y\}}\tau_y\big] \\
&= E_x\big[1_{\{\tau_y<\tau_x\}}\tau_x\big] + E_x\big[1_{\{\tau_x<\tau_y\}}E[\tau_y \mid \mathcal{F}_{\tau_x}]\big] \\
&= E_x\big[1_{\{\tau_y<\tau_x\}}\tau_x\big] + E_x\big[1_{\{\tau_x<\tau_y\}}(\tau_x + E_x[\tau_y])\big] \\
&= E_x[\tau_x] + (1-p)\,E_x[\tau_y].
\end{aligned}$$
Therefore $E_x[\tau_y] \le \frac{1}{p}E_x[\tau_x]$, and
$$E_y[\tau_y] \le E_y[\tau_x] + E_x[\tau_y] \le \frac{2}{p}E_x[\tau_x] < \infty,$$
which proves the positive recurrence of $y$.

Remark. Theorem 3.5 allows us to classify an irreducible countable state space Markov chain as either transient, null recurrent, or positive recurrent, depending on the type of its states.

References

[1] R. Durrett, Probability: Theory and Examples, 2nd edition, Duxbury Press, Belmont, California, 1996.