Summary of Results on Markov Chains

Enrico Scalas
Laboratory on Complex Systems, Dipartimento di Scienze e Tecnologie Avanzate,
Università del Piemonte Orientale "Amedeo Avogadro", Via Bellini 25 G, 15100 Alessandria, Italy
(Dated: August 30, 2007)

Abstract

These short lecture notes contain a summary of results on the elementary theory of Markov chains. The purpose of these notes is to let the reader understand as quickly as possible the concept of statistical equilibrium, based on the stationary distribution of homogeneous Markov chains. Some exercises related to these notes can be found in a separate document.

PACS numbers: 02.50.-r, 02.50.Ey, 05.40.-a, 05.40.Jc
Electronic address: enrico.scalas@mfn.unipmn.it; URL: www.mfn.unipmn.it/~scalas

I. INTRODUCTION

Many models used in Economics, in Physics or in other sciences are instances of Markov chains. This is the case of Schelling's model [1] or of the closely related Ising model [2] with the usual Monte Carlo dynamics [3]. Economists will find further motivation to study Markov chains in a recent book by Aoki and Yoshikawa [4]. Markov chains have the advantage that their theory can be introduced and many results can be proven in the framework of elementary probability theory, without extensive use of measure-theoretic tools. In compiling the present summary, the books by Hoel et al., by Kemeny and Snell, by Durrett and by Çinlar [5-8] have been consulted. These notes can be considered a summary of the first two chapters of Hoel et al.

In this summary, random variables will be denoted by capital letters X, Y, ... and their values by small letters x, y, .... In order to define a Markov chain, a random variable X_n will be considered that can assume values in a finite or at most denumerable set of states S at instants denoted by the subscript n = 0, 1, 2, .... This subscript will always be interpreted as a discrete-time index. It will be further assumed that

P(X_{n+1} = x_{n+1} | X_0 = x_0, ..., X_n = x_n) = P(X_{n+1} = x_{n+1} | X_n = x_n),   (1)

for every choice of the non-negative integer n and of the values x_0, ..., x_n belonging to S. P(· | ·) is a conditional probability. The meaning of equation (1) is that the probability of X_{n+1} does not depend on the past history, but only on the value of X_n; this equation, the so-called Markov property, can be used to define Markov chains. The conditional probabilities P(X_{n+1} = x_{n+1} | X_n = x_n) are called transition probabilities. If they do not depend on n, they are stationary (or homogeneous) transition probabilities and the corresponding Markov chains are stationary (or homogeneous) Markov chains.
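As a concrete illustration of the definition, the following minimal Python sketch (the helper name and the specific two-state chain are illustrative assumptions, not part of the original notes) generates a trajectory x_0, x_1, ..., x_n: each new state is drawn using only the current state, exactly as required by the Markov property (1).

```python
import numpy as np

def simulate_chain(P, pi0, n_steps, rng=None):
    """Sample a trajectory X_0, ..., X_{n_steps} of a homogeneous Markov chain.

    P   : (M, M) transition matrix, rows summing to 1
    pi0 : initial distribution over the M states
    """
    rng = np.random.default_rng() if rng is None else rng
    M = len(pi0)
    x = rng.choice(M, p=pi0)          # draw X_0 from the initial distribution
    path = [x]
    for _ in range(n_steps):
        x = rng.choice(M, p=P[x])     # next state depends only on the current one
        path.append(x)
    return np.array(path)

# Illustrative two-state chain (states 0 and 1)
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
pi0 = np.array([1.0, 0.0])
print(simulate_chain(P, pi0, 20))
```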

II. PROPERTIES OF MARKOV CHAINS

A. Transitions and initial distribution

The transition function P(x, y) of a Markov chain X_n is defined as

P(x, y) = P(X_1 = y | X_0 = x),   x, y ∈ S.   (2)

The values of P(x, y) are non-negative and the sum of P(x, y) over the final states y is 1. In the finite case with M states, this function can be represented as a square M × M matrix with non-negative entries and with rows summing to 1. For a stationary Markov chain, one has

P(X_{n+1} = y | X_n = x) = P(x, y),   n ≥ 1,   (3)

the initial distribution is

π_0(x) = P(X_0 = x),   (4)

and the joint probability distribution P(X_0 = x_0, X_1 = x_1, ..., X_n = x_n) can be expressed as a product of π_0 and the P(x, y)'s in the following way:

P(X_0 = x_0, X_1 = x_1, ..., X_n = x_n) = π_0(x_0) P(x_0, x_1) ··· P(x_{n-1}, x_n).   (5)

The m-step transition function P^m(x, y) is the probability of going from state x to state y in m steps. It is given by

P^m(x, y) = Σ_{y_1} ··· Σ_{y_{m-1}} P(x, y_1) P(y_1, y_2) ··· P(y_{m-2}, y_{m-1}) P(y_{m-1}, y)   (6)

for m ≥ 2; for m = 1, it coincides with P(x, y), and for m = 0, it is 1 if x = y and 0 otherwise. The following three formulae involving P^m(x, y) are useful in the theory of Markov chains:

P^{n+m}(x, y) = Σ_z P^n(x, z) P^m(z, y),   (7)

P(X_n = y) = Σ_x π_0(x) P^n(x, y),   (8)

P(X_{n+1} = y) = Σ_x P(X_n = x) P(x, y).   (9)

B. Hitting times and classification of states

Given a subset of states A, the hitting time T_A is defined as

T_A = min{n > 0 : X_n ∈ A}.   (10)

Thanks to the concept of hitting time, it is possible to classify the states of Markov chains in a very useful way. Let P_x(·) denote the probability of an event for a Markov chain starting at state x. Then one has the following formula for the n-step transition function:

P^n(x, y) = Σ_{m=1}^{n} P_x(T_y = m) P^{n-m}(y, y).   (11)
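Equation (11) can be checked numerically for a small chain. The sketch below (an illustration under the assumption of a specific three-state transition matrix; the helper name is not from the notes) computes the first-passage probabilities P_x(T_y = m) by the standard recursion — step anywhere except y first, then reach y for the first time in m−1 further steps — and compares the right-hand side of (11) with the matrix power P^n.

```python
import numpy as np

def first_passage_probs(P, y, n_max):
    """f[m-1, x] = P_x(T_y = m) for m = 1, ..., n_max."""
    M = P.shape[0]
    f = np.zeros((n_max, M))
    f[0] = P[:, y]                          # reach y in exactly one step
    for m in range(1, n_max):
        for x in range(M):
            # first step to any z != y, then first visit to y in m-1 steps from z
            f[m, x] = sum(P[x, z] * f[m - 1, z] for z in range(M) if z != y)
    return f

P = np.array([[0.5, 0.3, 0.2],     # illustrative 3-state transition matrix
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
x, y, n = 0, 2, 6
f = first_passage_probs(P, y, n)
Pn = np.linalg.matrix_power
lhs = Pn(P, n)[x, y]
rhs = sum(f[m - 1, x] * Pn(P, n - m)[y, y] for m in range(1, n + 1))
print(lhs, rhs)    # the two numbers agree, as stated by equation (11)
```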

An absorbing state of a Markov chain is a state a for which P(a, a) = 1 or, equivalently, P(a, y) = 0 for every state y ≠ a. If the chain reaches such a state, it is trapped there and will never leave. For an absorbing state, it turns out that P^n(x, a) = P_x(T_a ≤ n) for n ≥ 1.

The quantity

ρ_xy = P_x(T_y < ∞)   (12)

can be used to introduce two classes of states. ρ_yy is the probability that a chain starting at y will ever return to y. A state y is recurrent if ρ_yy = 1 and transient if ρ_yy < 1. For a transient state, there is a positive probability of never returning. An absorbing state is recurrent.

The indicator function I_y(·) helps in defining the counting random variable N(y). The indicator function I_y(X_n) is 1 if X_n = y and 0 otherwise; therefore

N(y) = Σ_{n=1}^{∞} I_y(X_n)   (13)

counts the number of times the chain visits state y. The event {N(y) ≥ 1} coincides with the event {T_y < ∞}. Therefore, one can write

P_x(N(y) ≥ 1) = P_x(T_y < ∞) = ρ_xy.   (14)

By induction, one can prove that for m ≥ 1

P_x(N(y) ≥ m) = ρ_xy ρ_yy^{m−1},   (15)

hence

P_x(N(y) = m) = ρ_xy ρ_yy^{m−1} (1 − ρ_yy),   (16)

and finally

P_x(N(y) = 0) = 1 − P_x(N(y) ≥ 1) = 1 − ρ_xy.   (17)

One can define G(x, y) = E_x(N(y)), the average number of visits to state y for a Markov chain started at x. It turns out that

G(x, y) = E_x(N(y)) = Σ_{n=1}^{∞} P^n(x, y).   (18)

It is now possible to state the following

Theorem 1.

1. Let y be a transient state. Then P_x(N(y) < ∞) = 1 and

G(x, y) = ρ_xy / (1 − ρ_yy),   x ∈ S,   (19)

which is finite for all states x.

2. Let y be a recurrent state. Then P_y(N(y) = ∞) = 1 and G(y, y) = ∞, and one also has

P_x(N(y) = ∞) = P_x(T_y < ∞) = ρ_xy,   x ∈ S.   (20)

Finally, if ρ_xy = 0, then G(x, y) = 0, whereas if ρ_xy > 0, then G(x, y) = ∞.

This theorem tells us that the Markov chain pays only a finite number of visits to a transient state, whereas if it starts from a recurrent state it will come back there an infinite number of times. If the Markov chain starts at an arbitrary state x, it may well be that it never visits the recurrent state y, but if it gets there, it will come back infinitely many times. A Markov chain is called transient if it has only transient states and recurrent if all of its states are recurrent. A finite Markov chain has at least one recurrent state and cannot be transient.

C. The decomposition of the state space

A state x leads to another state y if ρ_xy > 0 or, equivalently, if there exists a positive integer n for which P^n(x, y) > 0. If x leads to y and y leads to z, then x leads to z. Based on this concept, there is the following

Theorem 2. Let x be a recurrent state and suppose that x leads to y. Then y is recurrent and ρ_xy = ρ_yx = 1.

A set of states C is said to be closed if no state in C leads to a state outside C. An absorbing state a defines the closed set {a}. There are several characterizations of closed sets, but they will not be included here. A closed set C is irreducible (or ergodic) if, for any choice of two states x and y in C, x leads to y. It is a consequence of Theorem 2 that if C is an irreducible closed set, either every state in C is transient or every state in C is recurrent. Another consequence of Theorems 1 and 2 is the following

Corollary 1. For an irreducible closed set C of recurrent states, one has ρ_xy = 1, P_x(N(y) = ∞) = 1, and G(x, y) = ∞ for all choices of x and y in C.

Finally, one has the following important result as a direct consequence of the above theorems and corollaries.

Theorem 3. If C is a finite irreducible closed set, then every state in C is recurrent.
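For a finite chain these notions can be checked directly from the transition matrix. The sketch below (a minimal illustration; the function names and the example matrix are assumptions, not part of the notes) computes, by breadth-first search on the directed graph with an edge x → y whenever P(x, y) > 0, which states each state leads to, and uses this to test whether a given set of states is closed and irreducible.

```python
import numpy as np
from collections import deque

def reachable_from(P, x):
    """Set of states y such that x leads to y, i.e. P^n(x, y) > 0 for some n >= 1."""
    M = P.shape[0]
    seen, queue = set(), deque([x])
    while queue:
        z = queue.popleft()
        for y in range(M):
            if P[z, y] > 0 and y not in seen:
                seen.add(y)
                queue.append(y)
    return seen

def is_closed(P, C):
    return all(reachable_from(P, x) <= set(C) for x in C)

def is_irreducible(P, C):
    return all(set(C) <= reachable_from(P, x) for x in C)

# Illustrative chain: states {0, 1} form a closed irreducible set, state 2 is transient
P = np.array([[0.5, 0.5, 0.0],
              [0.3, 0.7, 0.0],
              [0.2, 0.3, 0.5]])
print(is_closed(P, [0, 1]), is_irreducible(P, [0, 1]))   # True True
```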

If we are given a finite Markov chain, it is often possible to directly verify whether the process is irreducible (or ergodic) by using the transition function (matrix) and checking whether every state leads to every other state. Finally, one can prove the following decomposition into irreducible (ergodic) components.

Theorem 4. The set S_R of recurrent states, if non-empty, is the union of a finite or countably infinite number of disjoint irreducible closed sets C_1, C_2, ....

If the initial state of the Markov chain is within one of the sets C_i, the time evolution will take place within this set and the chain will visit each of its states an infinite number of times. If the chain starts within the set of transient states S_T, either it will stay in this set, visiting each transient state only a finite number of times, or, if it reaches one of the C_i, it will stay there and will visit every state of that irreducible closed set infinitely many times. The problem then arises of determining the hitting-time distribution of the various ergodic components for a chain that starts in a transient state, as well as the absorption probability ρ_C(x) = P_x(T_C < ∞) for x ∈ S_T. The latter problem has the following solution when S_T is finite.

Theorem 5. Let the set S_T be finite and let C be a closed irreducible set of recurrent states. Then the system of equations

f(x) = Σ_{y∈C} P(x, y) + Σ_{y∈S_T} P(x, y) f(y),   x ∈ S_T,   (21)

has the unique solution f(x) = ρ_C(x), x ∈ S_T.
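For a finite set of transient states, (21) is just a linear system and can be solved directly. The following sketch (illustrative; the specific gambler's-ruin-like chain and the function name are assumptions) computes the absorption probabilities ρ_C(x) for the transient states.

```python
import numpy as np

def absorption_probabilities(P, transient, C):
    """Solve equation (21): f(x) = sum_{y in C} P(x,y) + sum_{y in S_T} P(x,y) f(y)."""
    Q = P[np.ix_(transient, transient)]          # transitions within S_T
    r = P[np.ix_(transient, C)].sum(axis=1)      # one-step absorption into C
    return np.linalg.solve(np.eye(len(transient)) - Q, r)

# Gambler's-ruin-like chain on {0,...,4}: 0 and 4 absorbing, fair coin in between
p = 0.5
P = np.zeros((5, 5))
P[0, 0] = P[4, 4] = 1.0
for x in (1, 2, 3):
    P[x, x - 1], P[x, x + 1] = 1 - p, p

S_T, C = [1, 2, 3], [4]
print(absorption_probabilities(P, S_T, C))   # approximately [0.25, 0.5, 0.75]
```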

III. THE PATH TO STATISTICAL EQUILIBRIUM

A. The stationary distribution

The stationary distribution π(x) is a function on the state space of the Markov chain such that its values are non-negative, its sum over the state space is 1, and

Σ_x π(x) P(x, y) = π(y),   y ∈ S.   (22)

It is interesting to notice that, for all n,

Σ_x π(x) P^n(x, y) = π(y),   y ∈ S.   (23)

Moreover, if X_0 follows the stationary distribution, then, for all n, the distribution of X_n also follows the stationary distribution. Indeed, the distribution of X_n does not depend on n if and only if π_0(x) = π(x). If π(x) is a stationary distribution and lim_{n→∞} P^n(x, y) = π(y) holds for every initial state x and for every state y, then one can conclude that lim_{n→∞} P(X_n = y) = π(y) irrespective of the initial distribution. This means that, after a transient period, the distribution of chain states reaches a stationary distribution, which can then be interpreted as an equilibrium distribution in the statistical sense. For the reasons discussed above, it is important to see under which conditions π(x) exists and is unique, and to study the convergence properties of P^n(x, y).

B. How many times is a recurrent state visited on average?

Let N_n(y) denote the number of visits to a state y up to time step n. This random variable is defined as

N_n(y) = Σ_{m=1}^{n} I_y(X_m).   (24)

One can also define the average number of visits to state y, starting from x, up to step n:

G_n(x, y) = E_x(N_n(y)) = Σ_{m=1}^{n} P^m(x, y).   (25)

If m_y = E_y(T_y) is taken to indicate the mean return (recurrence) time to y for a chain starting at y, then, as an application of the strong law of large numbers, one has

Theorem 6. Let y be a recurrent state. Then

lim_{n→∞} N_n(y)/n = I_{{T_y < ∞}} / m_y   (26)

with probability one, and

lim_{n→∞} G_n(x, y)/n = ρ_xy / m_y,   x ∈ S.   (27)

The meaning of this theorem is that if a chain reaches a recurrent state y, then it returns there with frequency 1/m_y. Note that the quantity N_n(y)/n is immediately accessible from Monte Carlo simulation of Markov chains.
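Theorem 6 is easy to probe by simulation. The sketch below (illustrative assumptions: a specific two-state chain and a cut-off long run) compares the empirical visit frequency N_n(y)/n with 1/m_y, where the mean return time m_y is obtained exactly from a small linear system for the expected hitting times.

```python
import numpy as np

rng = np.random.default_rng(0)

P = np.array([[0.9, 0.1],      # illustrative two-state chain
              [0.5, 0.5]])
y, n_steps = 0, 200_000

# Exact mean return time m_y: solve k(z) = 1 + sum_{w != y} P(z, w) k(w) for z != y,
# then m_y = 1 + sum_z P(y, z) k(z), with k(y) = 0.
others = [z for z in range(P.shape[0]) if z != y]
Q = P[np.ix_(others, others)]
k = np.linalg.solve(np.eye(len(others)) - Q, np.ones(len(others)))
m_y = 1.0 + P[y, others] @ k

# Long simulation started at y; count visits to y.
x, visits = y, 0
for _ in range(n_steps):
    x = rng.choice(2, p=P[x])
    visits += (x == y)

print(visits / n_steps, 1.0 / m_y)   # both close to 5/6 = 0.833...
```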

A corollary is of immediate relevance to finite Markov chains:

Corollary 2. Let x, y be two generic states in an irreducible closed set C of recurrent states. Then

lim_{n→∞} G_n(x, y)/n = 1/m_y,   (28)

and if P(X_0 ∈ C) = 1, then with probability one, for any state y in C,

lim_{n→∞} N_n(y)/n = 1/m_y.   (29)

If m_y = ∞, the right-hand sides are both 0.

A null recurrent state y is a recurrent state for which m_y = ∞. A positive recurrent state y is a recurrent state for which m_y < ∞. The following result characterizes positive recurrent states.

Theorem 7. If x is a positive recurrent state and x leads to y, then y is also positive recurrent.

In a finite irreducible closed set of states there is no null recurrent state:

Theorem 8. If C is a finite irreducible closed set of states, every state in C is positive recurrent.

The following corollaries are immediate consequences of the above theorems and corollary.

Corollary 3. An irreducible Markov chain having a finite number of states is positive recurrent.

Corollary 4. A Markov chain having a finite number of states has no null recurrent states.

As a final remark of this subsection, note that Theorem 6 and Corollary 2 connect time averages, defined by N_n(y)/n, to ensemble averages, defined by G_n(x, y)/n, and they can therefore be called ergodic theorems. Ergodic theorems are related to the so-called strong law of large numbers, one of the important results of probability theory.

Theorem 9. Let ξ_1, ξ_2, ... be independent and identically distributed random variables with finite mean µ. Then

lim_{n→∞} (ξ_1 + ξ_2 + ··· + ξ_n)/n = µ.

If these random variables are positive with infinite mean, the theorem still holds with µ = +∞.

C. Existence, uniqueness and convergence to the stationary distribution

Eventually, the main results on the existence and uniqueness of π(x) and on the limiting behaviour of P^n(x, y) can be stated. The ergodic theorems discussed in the previous subsection provide a rule for the Monte Carlo approximation of π(x) that can be used to prove its existence and uniqueness. First of all, the stationary weight of both transient states and null recurrent states is zero.

Theorem 10. If π(x) is a stationary distribution and x is a transient state or a null recurrent state, then π(x) = 0.

This means that a Markov chain without positive recurrent states cannot have a stationary probability distribution. However,

Theorem 11. An irreducible positive recurrent Markov chain has a unique stationary distribution π(x), given by

π(x) = 1/m_x.   (30)

This theorem provides the ultimate justification for the use of Markov chain Monte Carlo simulations to sample the stationary distribution, if the hypotheses of the theorem are fulfilled. In order to get an approximate value for π(y), one lets the system equilibrate (to fully justify this step, the convergence theorem will be necessary), then counts the number of occurrences N_n(y) of state y in a long enough simulation of the Markov chain and divides it by the number of Monte Carlo steps n. This program can be carried out when the state space is not too large. In a typical Monte Carlo simulation of the Ising model with K sites, the number of states is 2^K and soon grows to become intractable. In a simulation, many states will never be sampled even if the Markov chain is irreducible. For this reason, Metropolis et al. introduced the importance sampling trick, whose explanation is outside the scope of the present notes [3, 9].

The next corollary provides a nice characterization of positive recurrent Markov chains.

Corollary 5. An irreducible Markov chain is positive recurrent if and only if it has a stationary distribution.
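The recipe described above can be illustrated with a small chain (the matrix below is an illustrative assumption): the stationary distribution is obtained once by solving the linear system π P = π together with Σ_x π(x) = 1, and once as long-run visit frequencies N_n(y)/n from a simulation.

```python
import numpy as np

rng = np.random.default_rng(1)

P = np.array([[0.5, 0.3, 0.2],    # illustrative irreducible 3-state chain
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
M = P.shape[0]

# Exact stationary distribution: solve pi P = pi together with sum(pi) = 1.
A = np.vstack([P.T - np.eye(M), np.ones(M)])
b = np.append(np.zeros(M), 1.0)
pi_exact, *_ = np.linalg.lstsq(A, b, rcond=None)

# Monte Carlo estimate: long-run visit frequencies N_n(y)/n.
n_steps, counts, x = 200_000, np.zeros(M), 0
for _ in range(n_steps):
    x = rng.choice(M, p=P[x])
    counts[x] += 1

print(pi_exact)             # exact pi
print(counts / n_steps)     # visit frequencies, close to pi for large n
```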

For chains with a finite number of states, the existence and uniqueness of the stationary distribution is guaranteed if they are irreducible.

Corollary 6. If a Markov chain having a finite number of states is irreducible, it has a unique stationary distribution.

Finally, there is the corollary discussed above, where the recipe was given to estimate π(x) from Monte Carlo simulations:

Corollary 7. For an irreducible positive recurrent Markov chain having stationary distribution π, one has, with probability one,

lim_{n→∞} N_n(x)/n = π(x).   (31)

For reducible Markov chains the following results hold.

Theorem 12. Let S_P denote the set of positive recurrent states of a Markov chain.

1. If S_P is empty, the stationary distribution does not exist.
2. If S_P is not empty and irreducible, the chain has a unique stationary distribution.
3. If S_P is not empty and reducible, the chain has an infinite number of stationary distributions.

Case 3 is the one where the chain reaches one of the closed irreducible sets and then stays there forever. It is a subtle case, where Monte Carlo simulations may not give proper results if the reducibility of the chain is not studied.

If x is a state of a Markov chain such that P^n(x, x) > 0 for some n ≥ 1, its period d_x can be defined as the greatest common divisor of the set {n ≥ 1 : P^n(x, x) > 0}. For two states x and y leading to each other, d_x = d_y. The states of an irreducible Markov chain have a common period d. The chain is called periodic of period d if d > 1 and aperiodic if d = 1. The following theorem gives the conditions for the convergence of P^n(x, y) to the stationary distribution.

Theorem 13. For an aperiodic irreducible positive recurrent Markov chain with stationary distribution π(x),

lim_{n→∞} P^n(x, y) = π(y),   x, y ∈ S.   (32)

For a periodic chain with the same properties and with period d, for each pair of states x, y in S, there is an integer r, 0 ≤ r < d, such that P^n(x, y) = 0 unless n = md + r for some non-negative integer m, and

lim_{m→∞} P^{md+r}(x, y) = d π(y),   x, y ∈ S.   (33)

This theorem is the only one in the list that needs (mild) number-theoretic tools to be proven.
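Theorem 13 can also be observed numerically. The sketch below (illustrative assumptions: a specific aperiodic chain and a brute-force period computation that scans only powers up to a cutoff) computes the period of a state and shows the rows of P^n approaching the stationary distribution.

```python
import numpy as np
from math import gcd
from functools import reduce

def period(P, x, n_max=50):
    """Period d_x = gcd of {n >= 1 : P^n(x, x) > 0}, scanning n up to n_max (illustrative cutoff)."""
    returns = [n for n in range(1, n_max + 1)
               if np.linalg.matrix_power(P, n)[x, x] > 0]
    return reduce(gcd, returns) if returns else None

P = np.array([[0.5, 0.3, 0.2],    # illustrative aperiodic irreducible chain
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

print(period(P, 0))                         # 1: the chain is aperiodic
for n in (1, 5, 20):
    print(n, np.linalg.matrix_power(P, n))  # all rows converge to the same vector pi
```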

Acknowledgements

These notes were written during a visit to Marburg University supported by an Erasmus fellowship. The author wishes to thank Prof. Guido Germano and his group for their warm hospitality.

[1] T.C. Schelling (1971) Dynamic Models of Segregation, Journal of Mathematical Sociology 1, 143-186.
[2] E. Ising (1924) Beitrag zur Theorie des Ferro- und Paramagnetismus, Dissertation, Mathematisch-Naturwissenschaftliche Fakultät der Hamburgischen Universität, Hamburg.
[3] D. Landau and K. Binder (1995) A Guide to Monte Carlo Simulations in Statistical Physics, Cambridge University Press.
[4] M. Aoki and H. Yoshikawa (2007) Reconstructing Macroeconomics. A Perspective from Statistical Physics and Combinatorial Stochastic Processes, Cambridge University Press.
[5] P.G. Hoel, S.C. Port, and C.J. Stone (1972) Introduction to Stochastic Processes, Houghton Mifflin, Boston.
[6] J.G. Kemeny and J. Laurie Snell (1976) Finite Markov Chains, Springer, New York.
[7] R. Durrett (1999) Essentials of Stochastic Processes, Springer, New York.
[8] E. Çinlar (1975) Introduction to Stochastic Processes, Prentice Hall, Englewood Cliffs.
[9] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller and E. Teller (1953) Equation of State Calculations by Fast Computing Machines, Journal of Chemical Physics 21, 1087-1092.