Discrete time Markov chains

Size: px
Start display at page:

Download "Discrete time Markov chains"

Transcription

1 Chapter Discrete time Markov chains In this course we consider a class of stochastic processes called Markov chains. The course is roughly equally divided between discrete-time and continuous-time Markov chains. We shall study various methods to help understand the behaviour of Markov chains, in particular over the long term.. What is a Markov chain?.. Examples A stochastic process is a mathematical model for a random development in time. Formally, a discretetime stochastic process is a sequence {X n : n =,,,...} of random variables. The value of the random variable X n is interpreted as the state of the random development after n time steps. Example.. Suppose a virus can exist in two different strains α, β and in each generation either stays the same, or with probability p / mutates to the other strain. Suppose the virus is in strain α initially, what is the probability that it is in the same strain after n generations? We let X n be the strain of the virus in the nth generation, which is a random variable with values in {α, β}. The crucial point here is that the random variable X n and X n+ are not independent and things you have learnt about i.i.d. sequences of random variables do not apply here! To check this out note that P{X n = α, X n+ = α} = P{X n = α and no mutation occurs in step n +} =( p)p{x n = α}, and hence This gives, P{X n = β,x n+ = α} = P{X n = β and mutation occurs in step n +} = pp{x n = β}, P{X n+ = α} = P{X n = α, X n+ = α} + P{X n = β,x n+ = α} = ( p)p{x n = α} + p( P{X n = α}) = ( p)p{x n = α} + p<( p). P({X n = α} {X n+ = α}) =( p)p{x n = α} > P{X n+ = α}p{x n = α},

2 contradicting independence. Recall that another way to express the last formula is that P{X n+ = α X n = α} > P{X n+ = α}. To study this process we therefore need a new theory, which is the theory of discrete-time Markov chains. The possible values of X n are making up the statespace I of the chain, here I = {α, β}. The characteristic feature of a Markov chain is that the past influences the future only via the present.for example you should check yourself that P{X n+ = α X n = α, X n = α} = P{X n+ = α X n = α}. Here the state of the virus at time n + (future) does not depend on the state of the virus at time n (past) if the state at time n (present) is already known. The question raised above can be anserwed using this theory, we will give an intuitive argument below. Example. The simple random walk A particle jumps about at random on the set Z of integers. At time the particle is in fixed position x Z. At each time n N a coin with probability p of heads and q = p of tails is tossed. If the coin falls heads, then the particle jumps one position to the right, if the coin falls tails the particle jumps one position to the left. For n N the position X n of the particle at time n is therefore X n = x + Y + + Y n = X n + Y n where { if kth toss is heads, Y k := ifkth toss is tails. The Y k s are independent random variables with P{Y k =} = p and P{Y k = } = q. The stochastic process {X n : n N} has again the time-parameter set N and the statespace is the discrete (but infinite) set Z of integers. The rules of evaluation are given by the laws of Y k and we note that P{X n+ = k X n = j, X n = x n,...,x = x } = P{Y n+ = k j Y n = j x n,y n = x n x n,...,y = x x,y = x x}} if k j, = P{Y n+ = k j} = p if k j =, q if k j =, is again independent of x,...,x n. Observe that here the independence of the random variables Y k plays an important role, though the random variables X n are not independent... Intuition Markov chain theory offers many important models for application and presents systematic methods to study certain questions. The existence of these systematic methods does not stop us from using intuition and common sense to guess the behaviour of the models.

3 For example, in Example. one can use the equation together with P{X n+ = α X = α} = pp{x n = β X = α} +( p)p{x n = α X = α} P{X n = β X = α} = P{X n = α X = α}, to obtain for the desired quantity p n = P{X n = α X = α} the recursive relation This has a unique solution given by p n+ = p( p n )+( p)p n = p n ( p)+p, p =. p n = + ( p)n. As n this converges to the long term probability that the virus is in strain α, whichis/ and therefore independent of the mutation probability p. The theory of Markov chains provides a systematic approach to this and similar questions... Definition of discrete-time Markov chains Suppose I is a discrete, i.e. finite or countably infinite, set. A stochastic process with statespace I and discrete time parameter set N = {,,,...} is a collection {X n : n N} of random variables (on the same probability space) with values in I. The stochastic process {X n : n N} is called a Markov chain with statespace I and discrete time parameter set N if its law of evolution is specified by the following: (i) An initial distribution on the state space I given by a probability mass function (w i with w i and i I w i =. : i I), (ii) A one-step transition matrix P =(p ij : i, j I) withp ij for all i, j I and p ij = foralli I. j I The law of evolution is given by P{X = x,x = x,...,x n = x n } = w x p x x p xn x n, for all x,...,x n I...4 Discussion of the Markov property Interpretation of the one-step transition matrix All jumps from i to j in one step occur with probability p ij,soifwefixastatei and observe where we are going in the next time step, then the probability distribution of the next location in I has the probability mass function (p ij : j I). Given the present, the future is independent of the past

4 We have P{X = x,x = x,...,x n = x n } = w x p x x p xn x n, and P{X = x,x = x,...,x n = x n } = w x p x x p xn x n, for all x,...,x n I. Recall the definition of conditional probabilities Dividing the two equations gives P(A B) = P(A B). P(B) P{X n = x n X = x,...,x n = x n } = p xn x n. (..) In order to find P{X n = x n X n = x n } we use the properties of the one-step transition matrix P to see that P{X n = x n,x n = x n } = P{X n = x n,...,x = x } x I x n I = w x p x x p xn x n p xn x n x I x n I = P{X = x,...,x n = x n }p xn x n x I x n I = P{X n = x n } p xn x n. Hence, P{X n = x n X n = x n } = p xn x n. (..) Now compare (..) and (..). We get the formula known as the Markov property: P{X n = x n X n = x n,...,x = x } = P{X n = x n X n = x n } = p xn x n, for all x,...,x n I and n...5 Examples Example. Recall the virus example. Initially the virus is in strain α, hence w =(w α,w β )=(, ). The P -matrix is ( ) p p P =. p p Example. For the simple random walk, the particle starts in a fixed point x, hence the initial distribution is given by { if i = x. w i = P{X = i} = otherwise. The one-step transition matrix is given by if j i, p ij = P{X n+ = j X n = i} = p if j i =, q if j i =. 4

5 Example. Suppose that X,X,X,... is a sequence of independent and identically distributed random variables with P{X n = i} = µ(i) for all n N,i I, for some finite statespace I and probability mass function µ : I [, ]. Then the initial distribution is w =(w i : i I) withw i = µ(i) and the one-step transition matrix is P = µ(i ) µ(i ) µ(i n ) µ(i ) µ(i ) µ(i n ) µ(i ) µ(i ) µ(i n ) where we enumerated the statespace I = {i,...,i n }. Example.4 Random walk on a finite graph A particle is moving on the graph below by starting on the top left vertex and at each time step moving along one of the adjacent edges to a neighbouring vertex, choosing the edge with equal probability and independently of all previous movements. There are four vertices, which we enumerate from left to right by {,...,4}. Attimen =wearein vertex. Hence w =(,,, ) is the initial distribution. Each vertex has exactly two neighbours, so that it jumps to each neighbour with probability /. This gives P = / / / / / / / /.. Fundamental questions about the long term behaviour Will the Markov chain converge to some equilibrium regime? precisely? And, what does this mean How much time does the Markov chain spend in the different states? favourite state? Does the answer depend on the starting position? What is the chain s How long does it take, on average, to get from some given state to another one? 5

6 . n-step transition probabilities Unless otherwise stated X = {X n : n N} is a Markov chain with (discrete) statespace I and onestep transition matrix P. The Markov property shows that, whatever the initial distribution of the Markov chain is, we have P{X n+ = j X n = i} = p ij. Let us consider a two-step transition, P{X n+ = j X n = i} = k I P{X n+ = j, X n+ = k X n = i} = k I P{X n+ = k X n = i} P{X n+ = j X n+ = k, X n = i} = p ik p kj =(P ) ij, k I where P is the product of the matrix P with itself. More generally, P{X n+k = j X k = i} =(P n ) ij. Moreover, if the vector (w i : i I) is the initial distribution, we get P{X n = j} = P{X = k}p{x n = j X = k} k I = k I w k (P n ) kj. Henceweget P{X n = j} =(wp n ) j. Hence, if we can calculate the matrix power P n, we can find the distribution of X n and the n-step transition probabilities, i.e. the probabilities of being in state j at time n + k if we are in state i at time n. If n is large (recall that we are particularly interested in the long term behaviour of the process!) it is not advisable to calculate P n directly, but there are some more efficient methods, which we shall discuss now... Calculation of matrix powers by diagonalization We present the diagonalization method by an example. Suppose a village has three pubs along the main road. A customer decides after every pint whether he moves to the next pub on the left or on the right, and he chooses each option with the same probability. If there is no pub in this direction, he stays where he is for another pint. Here is a graph indicating the situation The statespace of this Markov chain is I = {,, }, where the three numbers indicate the three pubs and the one-step transition matrix is P =

7 To find P k for large k, wediagonalize P, if possible. This is happening in two steps. Step Find det(λi P ), the characteristic polynomial, of P.Here λ λi P = λ, λ and hence det(λi P ) = ( ) ( ) λ (λ )det λ + det λ = (λ )(λ λ/ 4 ) 4 (λ ) = (λ )(λ )(λ + ). Thus P has three distinct eigenvalues λ =,λ =,λ =, so it must be diagonalizable (which in this particular case was clear from the fact that P is symmetric). Important remark: isalways an eigenvalue of p, because the vector v =(,...,) T satisfies Pv = v. ThisisthefactthattherowsumsofP-matrices are! Step Now we find the corresponding eigenvectors v,v,v with Pv i = λ i v i. You can either solve the simultaneous equations (P λ i I)v i = or guess the answer. Let S be the matrix with the eigenvectors as columns. Inourcase, S =, then Hence P = SΛS,and Continuing, we get λ S PS = λ =: Λ. λ P = SΛS SΛS = SΛ S. λ n P n = S λ n S. (..) λ n 7

8 HencewehaveP n. Looking back at our example, in our case S =. If our friend starts at Pub, what is the probability that he is in Pub after pints, i.e. what is P{X =}? We find P = /8 /8 /4 /8 = /8 /4 /8. /8 /4 /8 /8 (Check: row sums are still one, all entries positive!) Now we need the initial distribution, which is given by the probability mass function w =(,, ) (start is in Pub ) and we get P{X =} =(wp ) =/8, P{X =} =(wp ) =/8, P{X =} =(wp ) =/4... Tricks for Diagonalization We continue with the same example and give some useful tricks to obtain a quick diagonalization of (small) matrices. In step, to find the eigenvalues of the matrix P, one has three automatic equations, () is always an eigenvalue for a P -matrix, () the trace of P (=sum of the diagonal entries) equals the sum of the eigenvalues, () the determinant of P is the product of the eigenvalues. In the or case one may get the eigenvalues by solving the system of equations resulting from these three facts. In our example we have λ = by Fact (), Fact () gives + λ + λ =,whichmeansλ = λ. Fact () gives λ λ = /4, hence λ =/, and λ = / (or vice versa). If we have distinct eigenvalues, the matrix is diagonalizable. Then from the representation of P n in (..) we see that (in the case) there exist matrices U,U and U with P n = λ n U + λ n U + λ n U. We know this without having to find the eigenvectors! Now taking the values n =,, in the equation gives () U + U + U = P = I =, () U + U U = P, () U + 4 U + 4 U = P. 8

9 All one has to do is to solve the (matrix) simultaneous equations. In the example () 4 () gives U = I 4P and () + () gives U +U = I +P.Now, Hence P = = U = (4P I) = [ ] = And, U = (I +P U )= [ + ] =, U = I U U = =. Hence, for all n =,,,..., ( P n = + ) n + And recall that the n-step are just the entries of this matrix, i.e ( ) n p (n) ij := P{X n = j X = i} =(P n ) ij, for example P{X n = X =} =(P n ) = ( )n + ( )n. As a further example look at P = Then the eigenvalues are λ =(always)andλ,λ satisfy λ λ λ =detp =,andλ + λ + λ = trace P =/. Hence λ = / andλ =. As the eigenvalues are distinct, the matrix P is diagonalizable and there exist matrices U,U,U with ( P n = U + ) nu + n U, for all n =,,,.... Either take n =,, andsolveor find U directly as U =, 9

10 by solving πp = π [see tutorials for this trick]. In the second case one only has to use the equations for n =, andgetsi = U + U + U and P = U U. Hence U =(U P )= For n wedonotneedtofindu and get P n = + ( ) n.. The method of generating functions. This method if particularly efficient when we want to find certain entries of the matrix P n without having to determine the complete matrix. The starting point is the geometric series, +z + z + =, for all z <. z For a finite square matrix A the same proof gives that I + A + A + =(I A), if lim n An =andi A is invertible. This is the case if all the eigenvalues of A have modulus <. Recall that the one-step transition matrix of a Markov chain does not fulfill this condition, as is always an eigenvalue, however we have the following useful fact.. Lemma. Suppose P is the one-step transition matrix of a finite state Markov chain, then all eigenvalues have modulus less or equal to one. We can thus use this series for the matrices θp for all θ <. The idea is to expand (I θp) as a power series in θ around θ = and to find P n by comparing the coefficients of θ n. Letuslookatanexample. Let X be a Markov chain with statespace I = {,, } and one-step transition matrix P = 4 4. Then θ I θp = 4 θ θ 4 θ θ To invert this matrix, recall that A = C T / det A where C = {c ij : i, j } is the matrix of cofactors of A, i.e. c ij is the determinant of the matrix A with the ith row and jth column removed,.

11 multiplied with the factor ( ) i+j. For example, (I θp) = = c det(i θp) = ( det det θ 4 θ θ θ θ 4 θ = θ 4 θ ( θ)( + θ). ( θ ) θ ) ( θ)det To expand this as a power series in θ we need to use partial fractions, write θ ( θ)( + θ) = a θ + b + θ. ( 4 θ 4 θ then θ = a( + θ)+b( θ). Using this for θ =givesa =/, and for θ =wegetb = /. We can thus continue with (I θp) = ( θ) ( ( θ/)) = ( + θ + θ + ) ( θ + θ 4 θ 8 + ). For the last expansion we have used the geometric series. Now we can compare the coefficients here with those coming from the matrix valued geometric series, (I θp) = θ n (P n ) = n= n= θ n p (n), ) and get, for all n N, P{X n = X =} = p (n) = ( n. ) You may wish to check that p () =. As another example we ask for the probability that, starting in state, we are again in state after n time steps. Then, ( θ det θ ) 4 (I θp) = c det(i θp) = θ ( θ)( + θ ) = θ θ 4 ( θ)( + θ ). Now we use partial fractions again (note that we need the α-term, as numerator and denominator havethesameorder), θ θ 4 ( θ)( + θ ) = α + β θ + γ + θ. To get α, β, γ, welookat α( θ)( + θ )+β( + θ )+γ( θ) = θ θ 4.

12 Then θ =givesβ =/, θ = givesγ =/ andcomparingtheθ coefficient yields α =/. Hence, (I θp) = + ( θ) + ( ( θ )) = + ( + θ + θ + θ + )+ ( θ + θ 4 ). Equating coefficients of θ n gives, { ( n P{X n = X =} = + ) for n, for n =.. Hitting probabilities and expected waiting times As always, X is a Markov chain with discrete statespace I and one-step transition matrix P.Wenow introduce a notation which allows us to look at the Markov chain and vary the initial distributions, this point of view turns out to be very useful later. We denote by P i the law of the chain X when X starts in the fixed state i I. Inotherwords,forall x,...,x n I, P i {X = x,...,x n = x n } = p ix p x x p xn x n. We can also think of P i as the law of the Markov chain conditioned on X = i, i.e. P i (A) =P{A X = i}. The law P w of X with initial distribution w is then given as the mixture or weighted average of the laws P i, i.e. for the initial distribution w =(w i : i I) of the Markov chain and any event A, P w (A) = i I P w (A {X = i}) = i I P w {X = i}p i (A) = i I w i P i (A). We also use this notation for expected values, i.e. E i refers to the expectation with respect to P i and E w refers to expectation with respect to P w. Remember that in expressions such as P{X n+ = x n+ X n = x n } = p xnxn+ we do not have to write the index w because if we are told the state at time n we can forget about the initial distribution... Hitting probabilities We define the first hitting time T j of a state j I by T j := min{n >:X n = j} and understand that T j := if the set is empty. Note the strict inequality here: we always have T j >. Also observe the following equality of events, {T j = } = {X never hits j}.

13 Using this we can now define the hitting probabilities (F ij : i, j I), where F ij is the probability that the chain started in i hits the state j (in positive, finite time). We let F ij := P i {T j < }. Note that F ii is in fact the probability that the chain started in i returns to i in finite time and need not be! Example: We suppose a particle is moving on the integers Z as follows: it is started at the origin and in the first step moves to the left with probability / and to the right with probability /. In each further step it moves one step away from the origin. (Exercise: write down the P -matrix!) Here we have if j>i>orj<i<, if i< <jor i> >j, F ij = / if i =,j, if i = j. Theorem. For a fixed state b I the probabilities x i := F i,b := P i {T b < }, for i I, form the least non-negative solution of the system of equations, ( ) x i = p ij x j + p ib for i I. (..) j b In particular, if y i for all i I form a solution of ( ) y i = p ij y j + p ib, (..) then y i x i for every i I. j b Note: y i =foralli I always solves (..). Proof: First step: We show that x i = P i {T b < } is a solution of (..). Let E = {T b < } be the event that the chain hits b. Then x i = P i (E) = P i (E {X = j}) j I = P i {X = j}p i {E X = j} j I = p ij P{E X = j}. j I

14 Looking at the last probability we see that P{E X = j} = { if j = b, P j {T b < } if j b. This gives x i = j b p ij x j + p ib. Second step: Suppose y i forms a solution of (..). To finish the proof we have to show that y i x i.observethat, y i = Repeated substitution yields, ( ) p ij y j + p ib j b = p ib + j b = p ib + j b ( ) p ij p jk y k + p jb k b p ij p jb + j b p ij p jk y k k b = P i {X = b} + P i {X b, X = b} + j b p ij p jk y k. y i = P i {X = b} + P i {X b, X = b} P i {X b,...,x n b, X n = b} + p ij p j j p jn j n y jn. j b j n b Because y j the last term is positive and we get, for all n N, y i P i {X = b} + P i {X b, X = b} P i {X b,...,x n b, X n = b} = P i {T b =} + + P i {T b = n} = P i {T b n}. Hence, letting n, ( y i lim P ) i{t b n} = P i {T b n} = P i {T b < } = x i. n n k b Example Consider a Markov chain with state space I = {,,,...} given by the diagram below, where, for i =,,...,wehave<p i = q i <. Here is an absorbing state and x i := P i {T < } is the hitting probability of state if the chain is started in state i I. 4

15 q p q p q p q 4 4 p 4 q 5 5 p 5... For γ :=,γ i := q iq i q p i p i p for i, If i= γ i = we have x i =foralli I. If i= γ i <, wehave j=i x i = γ j j= γ j for i I. To prove this, we first use the one-step method to find the equations x =and The solutions of this system are given by x i = q i x i + p i x i+, for i. x i = A(γ + γ i ), for i, for any choice of a fixed A R. To prove this we have to check that the given form solves the equation, show that any solution is of this form. For the first part we just plug in: and q i x i + p i x i+ = q i ( A(γ + + γ i )) + p i ( A(γ + + γ i )) = A(q i (γ + + γ i )+p i (γ + + γ i )), q i (γ + + γ i )+p i (γ + + γ i )=γ + + γ i + p iq i q + q i q p i p = γ + + γ i. For the second part assume that (x i : i =,, )isany solution. Let y i = x i x i. Then p i y i+ = q i y i, which implies inductively that ( qi ) y i+ = y i = γ i y. p i Hence Therefore, for A := y,wehave x x i = y + + y i = y (γ + + γ i ). x i = A(γ + + γ i ). 5

16 To complete the proof we have to find the smallest nonnegative solution, which means we want to make A as large a spossible without making x i negative. First suppose that i= γ i =, thenfor any A> the solution gets negative eventually, so that the largest nonnegative solution corresponds to the case A =, i.e. x i =. Next suppose that i= γ i := M<. Then the solution remains nonnegative iff AM, i.e. if A /M, so that the choice A =/M gives the smallest solution. This solution is the one giving the right value for the hitting probabilities. Now we look at a simple random walk {X n : n =,, } with parameter p [/, ]. If X = i>, we are essentially in the situation of the previous example with p i = p and q i = p. Then γ i = (( p)/p) i and i= γ i dicverges if p = and converges otherwise. Hence F i =ifp = and otherwise, for i>, Note that, by the one-step method, F i = ( ) j p j=i p ( ) j = p j= p ( p ) i. F = pf + qf = p p + q = p + q = p. p Hence we can summarize our results for the random walk case as follows. Theorem. The hitting probabilities F ij = P i {T j < } for a simple random walk with parameter p [/, ] are given in the symmetric case p =/ by F ij = for all i, j Z, and in the asymmetric case p>/ by if i<j, F ij = p if i = j, ) i j if i>j. ( p p By looking at X the theorem also gives full information about the case p</. As a special case note that, for all p [, ], P {X does not return to } = p q. p.. Expected waiting times In a manner similar to the problem of hitting probabilities, Theorem., one can prove the following theorem for the expected waiting time until a state b ishitforthefirsttime. Theorem.4 Fix a state b I and let y i := E i {T b }.Then y i =+ j b p ij y j (..) and y is the least nonnegative solution of this equation.

17 A B D C Note that y i = is possible for some, or even all, i, andtheconvention = is in place. Example. We again look at the simple random walk and calculate y i = E{T }, the average time taken to reach the origin if we start from the state i. To save some work we use the intuitively obvious extra equation y n = ny for all n. Combining this with (..) for i =gives y =+py =+py. If p</ (the case of a downward drift) we get the solution E n {T } = n p for all n. If p / thisgivesy +y, which implies y =. Inparticular,forp =/ weget, The average waiting time until a symmetric, simple random walk travels from state i to state j is infinite for all i j. However, we know that the waiting time is finite almost surely! Example. Consider a Markov chain with statespace I = {A, B, C, D} and jump probabilities given by the diagram. Problem : Find the expected time until the chain started in C reaches A. Solution: Let x i = E i {T A }. By considering the first step and using the Markov property (or just using (..)) we get x C = + x B + x D x B = + + x C x D = + + x C. Hence, x C =+ ( + x C)+ ( + x C)=+ x C, 7

18 which implies E C {T A } = x C = 4. The expected time until the chain started in C reaches A is 4. Problem : What is the probability that the chain started in A reaches the state C before B? Solution: Let x i = P i {T C <T B }. By considering the first step and using the Markov property we get x A = x D +, x D = x A +. Hence x A =(/)x A +(/), which gives x A =/5. Hence the probability of hitting C before B when we start in A is /5..4 Classification of states and the renewal theorem We now begin our study of the long-term behaviour of the Markov chain. Recall that T i =inf{n > :X n = i} and Definition The state i is called F ii = P i {T i < } = P i {X returns to i at some positive time}. transient if F ii <, i.e. if there is a positive probability of escape from i, recurrent or persistent if F ii =, i.e. if the chain returns to state i almost surely, and hence infinitely often. Let us look at the total number of visits to a state i given by the random variable V i = {Xn=i}, n= where {Xn=i} takes the value if X n = i and otherwise. Note that we include time n =. If i is recurrent we have P i {V i = } =, in particular the expected number of visits to state i is E i {V i } =. Ifi is transient, then, for n, hence E i {V i } = n= P i {V i = n} = F n ii ( F ii ), np i {V i = n} =( F ii ) n= nf n ii = F ii ( F ii ) = F ii <, recalling n= nxn =( x) for x <. Example: We look at the simple random walk again, focusing on the case p>qof an upward drift. We know that F ii =q = (p q), so E i V i =(p q). Intuitively, this can be explained by recalling that X n /n p q. For large n, the walk must visit n(p q) states in its first n steps, so it can spend a time of roughly /(p q) in each state. 8

19 .4. The renewal theorem The aim of this section is to study the long term asymptotics of P i {X n = i} =(P n ) ii. We start by deriving a second formula for E i {V i }. Directly from the definition of V i we get E i {V i } = E i { {Xn=i}} = n= Recall that P i {X n = i} =(P n ) ii. Hence, E i {V i } = P i {X n = i}. n= (P n ) ii. n= We thus get the renewal theorem in the transient case. Theorem.5 (Renewal Theorem in the transient case) If i is a transient state of the Markov chain X, then (P n ) ii = E i {V i } = <. F ii n= In particular, we have lim n (P n ) ii =. The more interesting case of the renewal theorem refers to the recurrent case. In this case, E i {V i } = n= (P n ) ii =, leavingopen whether lim n (P n ) ii =. In fact, as we shall see below, both cases can occur. Definition A recurrent state i is called positive recurrent if E i {T i } <, i.e. if the mean time until return to i is finite, null recurrent if E i {T i } =. For example we have seen before that in the symmetric simple random walk E {T } =, so(and all other states) is null recurrent. If we want study the limiting behaviour of (P n ) ii we first have to deal with the problem of periodicity. For example, for simple random walk we have (P n ) ii { = ifn odd, > otherwise. We can only expect interesting behaviour for the limit of (P n ) ii. Generally, we define the period of a state i I as the greatest common divisor of the set {n > : (P n ) ii > }. Wewrited(i) for the period of the state i. For example, the period of every state in the simple random walk is. For another example let ( ) P =. Although one cannot return to the first state imediately, the period is one. 9

20 Theorem. (Renewal Theorem (main part)) (a) If i is a null recurrent state, then lim n (P n ) ii =. (b) If i is positive recurrent, then (P n ) ii =if n is not a multiple of d(i). Otherwise, lim (P nd(i) ) ii = n d(i) E i {T i }. We omit the proof and look at examples instead. Example.7 This is a trivial example without randomness, a particle moves always one step counterclockwise through the graph. A B D C Here the period of every state is 4 and the average return time is also 4, the transition probabilities satisfy (P 4n ) ii =. The theorem holds (even without the limit!) Example.8 Consider the example given by the following graph. The one-step transition matrix is given by P = / /. All states have period. We can find (a) P n by diagonalization with tricks, (b) E i {T i } by the one-step method. Then we can verify the statement of the theorem in our example. (a) We have trace P =,detp = and hence eigenvalues,,. Solving πp = π gives π =(/4, /, /4). Hence P n = U +( ) n U + n U for all n,

21 and /4 / /4 U = /4 / /4, /4 / /4 /4 / /4 U = U P = /4 / /4, /4 / /4 and U is irrelevant. For example, we get (P n ) = 4 +( )n 4 for all n. (b) For y i = E i {T } we get the equations y = +y y = + y y = +y. Solving gives, for example, E {T } = y =4,E {T } = 4 by symmetry and trivially E {T } =. Checking the renewal theorem we observe that (P n ) =ifn is odd and lim (P n ) = lim n n 4 +( )n 4 = = d() E {T }..4. Class properties and irreducibility We say that the state i communicates with the state j, andwritei j if there exist n, m with (P n ) ij > and(p m ) ji >. The relation is an equivalence relation on the state space I, because i i, i j implies j i, i j and j k implies i k. Only the last statement is nontrivial. To prove it assume i j and j k. Then there exist n,n with (P n ) ij > and(p n ) jk >. Then, (P n +n ) ik = P i {X(n + n )=k} P i {X(n )=j, X(n + n )=k} P i {X(n )=j}p i {X(n + n )=k X(n )=j} = (P n ) ij (P n ) jk >. Similarly, there exist m,m with (P m +m ) ki > and hence i k. Since is an equivalence relation, we can define the corresponding equivalence classes, which are called communicating classes. Theclassofi consists of all j I with i j. A property of a state is a class property if whenever it holds for one state, it holds for all states in the same class. The following properties are class properties,

22 i is transient, i is positive recurrent, i is null recurrent, i has period d. Now we can attribute the property to the class, saying for example that a class has period d, etc. We can decompose the statespace I as a disjoint union I = T R R..., where T is the set of transient states and R,R,... are the recurrent classes. If X starts in T, it can either stay in T forever (somehow drifting off to infinity) or get trapped (and stay forever) in one of the recurrent classes. p>q p p q p q p q... Figure.: classes, recurrent, 4 transient A Markov chain is called irreducible if all states communicate with each other, i.e. if I is the only communicating class. Thanks to the decomposition the study of general chains is frequently reduced to the study of irreducible chains..5 The Big Theorem.5. The invariant distribution Let π be a probability mass function on I, i.e. π : I [, ] with i I π i =. π is called an invariant distribution or equilibrium distribution if πp = π, thatis π i p ij = π j for all j. i I If such a π exists, then we can use is at as initial distribution to the following effect, P π {X = j} = i I P π {X = i, X = j} = i I π i p ij = π j = P π {X = j},

23 and, more generally, P π {X n = j} = π j for all j I. In other words the law of X n under P π is π at all times n, we say that the system is in equilibrium. We now assume that the chain X is irreducible. The all states have the same period d. Thechainis called aperiodic if d =. Theorem.7 (The Big Theorem) Let X be an irreducible chain. Then the following statements are equivalent: X has a positive recurrent state, all states of X are positive recurrent, X has an invariant distribution π. If this holds, the invariant distribution is given by Moreover, π i = E i {T i } >. (a) For all initial distributions w, withprobability, ( ) #visits to state i by time n π i. n (b) E j {#visits to state i before T j } = π i π j for all i j. (c) In the aperiodic case d =, we have for all initial distributions w, lim P w{x n = j} = π j for all j I. n Note how (b) tallies with the following fact deduced from (a), #visits to state i by time n #visits to state j by time n π i π j. We do not give the full proof here, but sketch the proofof(c), because it is a nice example of a coupling argument. WeletX be the Markov chain with initial distribution w and Y an independent Markov chain with the same P -matrix and initial distribution π. The proof comes in two steps. Step. Fix any state b I and let T =inf{n >:X n = b and Y n = b}. We show that P{T < } =. The process W = {W n =(X n,y n ) : n N} is a Markov chain with statespace I I and n-step transition probabilities ( P n ) (i,k)(j,l) =(P n ) ij (P n ) kl,

24 which is positive for sufficiently large n since P is aperiodic. Hence W is irreducible. W has an invariant distribution given by π (i,k) = π i π k and hence by the first part of the big theorem it must be positive recurrent. Positive recurrence implies that the expected first hitting time of every state is finite with probability one, see Q on Sheet. Now observe that T =inf{n >:X n = b and Y n = b} =inf{n >:W n =(b, b)}, and note that we have shown that T< almost surely. Step. The trick is to use the finite time T to switch from the chain X to the chain Y. Let Z be the Markov chain given by { Xn if n T, Z n = Y n if n T. It is intuitively obvious and not hard to show that Z is a Markov chain with the same P -matrix as X and initial distribution w. Wehave Hence, P{Z n = j} = P{X n = j and n<t} + P{Y n = j and n T }. P{X n = j} π j = P{Z n = j} P{Y n = j} = P{X n = j and n<t} P{Y n = j and n<t} P{n <T}. As P{T < } = the last term converges to and we are done..5. Time-reversible Markov chains We discuss the case of time-reversible or symmetrizable Markov chains. Suppose (m i : i I) isa collection of nonnegative numbers, not all zero. We call a Markov chain m-symmetrizable if we have m i p ij = m j p ji for all i, j I. These equations are called detailed balance equations. Theyimplythat m i p ij = m j p ji = m j for all j I. i I If, moreover, M = i I m i <, then π i = m i M i I is an invariant distribution and also solves the detailed balance equations, P π {X = i,...,x n = i n } = P π {X = i n,...,x n = i }. In other words, under the law P π the sequence X,...,X n has the same law as the time-reversed sequence X n,...,x. Note that both statements are very easy to check! Remarks: 4

25 If X is π-symmetrizable, then π is an invariant distribution of X. But conversely, if π is an invariant distribution of X this does not imply that X is π-symmetrizable! It is sometimes much easier to solve the detailed balance equations and thus find an invariant distribution, rather than solving πp = π, see Example. below. If the invariant distribution does not solve the detailed balance equations, then they have no solution. Example.9 Let X be a Markov chain with state space I = {,, } and transition matrix / /4 /4 P = / / / /4 / 7/ Find the equilibrium distribution π. Is Pπ-symmetrizable? π has to satisfy π + π + π = and, from πp = π, π + π + 4 π = π, 4 π + π + π = π, 4 π + π + 7 π = π. Thiscanbesolvedtogiveπ =(/5, /5, /5). To check symmetrizability we have to verify π i p ij = π j p ji for all i, j {,, }. There are three non-trivial equations to be checked, π p = π p π =π π p = π p π = π π p = π p π =π. This is satisfied for our π. In fact one could have started with these equations and π + π + π = and the only solution is the invariant distribution π. Example. Let X be a Markov chain with state space I = {,, } and transition matrix / /4 /4 P = / / /. / / / This matrix is not symmetrizable. Trying to find a suitable m leads to the equations m p = m p m =m m p = m p m =m m p = m p m =m. These equations have only the trivial solution m = m = m =, which is not permitted in the definition of symmetrizability! Still, one can find an invariant distribution π =(/, 5/4, 7/4). 5

26 . Finding the invariant distribution using generating functions Let X be an irreducible chain. The Big Theorem tells us that it is worth trying to find the invariant distribution, because it can tell us lots about the long term behaviour of the Markov chain, for example the ergodic principle ( ) #visits to state j by time n π j n and, in the aperiodic case, lim P w{x n = j} = π j for all j I. n If the state space I = N, then we cannot find π by solving a finite system of linear equations, as before. Instead the powerful method of generating functions is available. The generating function ˆπ of an invariant distribution is given by ˆπ(s) = π n s n. n= One of our requirements for an invariant distribution is that ˆπ() =, which is equivalent to n= π n =. We study the method by looking at an example. Example. Let X be a Markov chain with statespace I = N and one-step transition matrix The system πp = π gives /7 / /7 /7 P = /7 / π + 7 π = π 7 π = π 7 π + 7 π = π 7 π + 7 π 4 = π, and so on. Multiplying the equations above by s, s,s,... gives for the generating function 7 sˆπ(s)+ ) (ˆπ(s) π sπ = sˆπ(s). Rearranging gives, ( ˆπ(s) 7 s s + 7) ( = π 7 ) 7 s, hence ( s) ˆπ(s) =π s 7s +. To find π recall ˆπ() =. Here are two ways to use this.

27 First Method: Use L Hôpital s rule to get ˆπ() = lim s π s 7 = π, hence π =. Better Method: Factorize and cancel the common factor s if possible. Here ˆπ(s) =π ( s) ( s)(s + s ). Hence ˆπ() = π /4 and we arrive at the same conclusion. Altogether 4 ˆπ(s) = s + s = 4 (s +)(s ). Once we have found ˆπ we can find π n by writing ˆπ as a power series and equating coefficients. For this purpose first use partial fractions 4 (s +)(s ) = a s + + b s, which gives 4 =a(s ) + b(s +)ands =givesb = 4/5 ands = givesa =4/5, hence ˆπ(s) = 4 5 s s. Now we use ( s) =+s + s + s + and obtain ˆπ(s) = 4 5 Hence the coefficient at s n is, for all n, ( ( s + s s ) + + (+ s + s + s )) +. π n = 4 5 ( ) ( )n + n+ n+. We have thus found the invariant distribution π. To find the long term average state of the system, recall that we just have to find the mean of the distribution π, whichis nπ n =ˆπ (), n= because ˆπ (s) = n= nπ ns n by taking termwise derivatives of a power series. In our example ˆπ (s) = then ˆπ () = /4 is the long time average state. 4( + s) (s + s ), 7

P i [B k ] = lim. n=1 p(n) ii <. n=1. V i :=

P i [B k ] = lim. n=1 p(n) ii <. n=1. V i := 2.7. Recurrence and transience Consider a Markov chain {X n : n N 0 } on state space E with transition matrix P. Definition 2.7.1. A state i E is called recurrent if P i [X n = i for infinitely many n]

More information

MATH 56A: STOCHASTIC PROCESSES CHAPTER 1

MATH 56A: STOCHASTIC PROCESSES CHAPTER 1 MATH 56A: STOCHASTIC PROCESSES CHAPTER. Finite Markov chains For the sake of completeness of these notes I decided to write a summary of the basic concepts of finite Markov chains. The topics in this chapter

More information

Markov Chains CK eqns Classes Hitting times Rec./trans. Strong Markov Stat. distr. Reversibility * Markov Chains

Markov Chains CK eqns Classes Hitting times Rec./trans. Strong Markov Stat. distr. Reversibility * Markov Chains Markov Chains A random process X is a family {X t : t T } of random variables indexed by some set T. When T = {0, 1, 2,... } one speaks about a discrete-time process, for T = R or T = [0, ) one has a continuous-time

More information

Lecture 11: Introduction to Markov Chains. Copyright G. Caire (Sample Lectures) 321

Lecture 11: Introduction to Markov Chains. Copyright G. Caire (Sample Lectures) 321 Lecture 11: Introduction to Markov Chains Copyright G. Caire (Sample Lectures) 321 Discrete-time random processes A sequence of RVs indexed by a variable n 2 {0, 1, 2,...} forms a discretetime random process

More information

Lecture 20 : Markov Chains

Lecture 20 : Markov Chains CSCI 3560 Probability and Computing Instructor: Bogdan Chlebus Lecture 0 : Markov Chains We consider stochastic processes. A process represents a system that evolves through incremental changes called

More information

Lecture 9 Classification of States

Lecture 9 Classification of States Lecture 9: Classification of States of 27 Course: M32K Intro to Stochastic Processes Term: Fall 204 Instructor: Gordan Zitkovic Lecture 9 Classification of States There will be a lot of definitions and

More information

8. Statistical Equilibrium and Classification of States: Discrete Time Markov Chains

8. Statistical Equilibrium and Classification of States: Discrete Time Markov Chains 8. Statistical Equilibrium and Classification of States: Discrete Time Markov Chains 8.1 Review 8.2 Statistical Equilibrium 8.3 Two-State Markov Chain 8.4 Existence of P ( ) 8.5 Classification of States

More information

P(X 0 = j 0,... X nk = j k )

P(X 0 = j 0,... X nk = j k ) Introduction to Probability Example Sheet 3 - Michaelmas 2006 Michael Tehranchi Problem. Let (X n ) n 0 be a homogeneous Markov chain on S with transition matrix P. Given a k N, let Z n = X kn. Prove that

More information

4.7.1 Computing a stationary distribution

4.7.1 Computing a stationary distribution At a high-level our interest in the rest of this section will be to understand the limiting distribution, when it exists and how to compute it To compute it, we will try to reason about when the limiting

More information

2. Transience and Recurrence

2. Transience and Recurrence Virtual Laboratories > 15. Markov Chains > 1 2 3 4 5 6 7 8 9 10 11 12 2. Transience and Recurrence The study of Markov chains, particularly the limiting behavior, depends critically on the random times

More information

Chapter 11 Advanced Topic Stochastic Processes

Chapter 11 Advanced Topic Stochastic Processes Chapter 11 Advanced Topic Stochastic Processes CHAPTER OUTLINE Section 1 Simple Random Walk Section 2 Markov Chains Section 3 Markov Chain Monte Carlo Section 4 Martingales Section 5 Brownian Motion Section

More information

Discrete time Markov chains. Discrete Time Markov Chains, Limiting. Limiting Distribution and Classification. Regular Transition Probability Matrices

Discrete time Markov chains. Discrete Time Markov Chains, Limiting. Limiting Distribution and Classification. Regular Transition Probability Matrices Discrete time Markov chains Discrete Time Markov Chains, Limiting Distribution and Classification DTU Informatics 02407 Stochastic Processes 3, September 9 207 Today: Discrete time Markov chains - invariant

More information

MATH 564/STAT 555 Applied Stochastic Processes Homework 2, September 18, 2015 Due September 30, 2015

MATH 564/STAT 555 Applied Stochastic Processes Homework 2, September 18, 2015 Due September 30, 2015 ID NAME SCORE MATH 56/STAT 555 Applied Stochastic Processes Homework 2, September 8, 205 Due September 30, 205 The generating function of a sequence a n n 0 is defined as As : a ns n for all s 0 for which

More information

Note that in the example in Lecture 1, the state Home is recurrent (and even absorbing), but all other states are transient. f ii (n) f ii = n=1 < +

Note that in the example in Lecture 1, the state Home is recurrent (and even absorbing), but all other states are transient. f ii (n) f ii = n=1 < + Random Walks: WEEK 2 Recurrence and transience Consider the event {X n = i for some n > 0} by which we mean {X = i}or{x 2 = i,x i}or{x 3 = i,x 2 i,x i},. Definition.. A state i S is recurrent if P(X n

More information

Stochastic Processes

Stochastic Processes Stochastic Processes 8.445 MIT, fall 20 Mid Term Exam Solutions October 27, 20 Your Name: Alberto De Sole Exercise Max Grade Grade 5 5 2 5 5 3 5 5 4 5 5 5 5 5 6 5 5 Total 30 30 Problem :. True / False

More information

STOCHASTIC PROCESSES Basic notions

STOCHASTIC PROCESSES Basic notions J. Virtamo 38.3143 Queueing Theory / Stochastic processes 1 STOCHASTIC PROCESSES Basic notions Often the systems we consider evolve in time and we are interested in their dynamic behaviour, usually involving

More information

Markov Chains Handout for Stat 110

Markov Chains Handout for Stat 110 Markov Chains Handout for Stat 0 Prof. Joe Blitzstein (Harvard Statistics Department) Introduction Markov chains were first introduced in 906 by Andrey Markov, with the goal of showing that the Law of

More information

STA 294: Stochastic Processes & Bayesian Nonparametrics

STA 294: Stochastic Processes & Bayesian Nonparametrics MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

More information

12 Markov chains The Markov property

12 Markov chains The Markov property 12 Markov chains Summary. The chapter begins with an introduction to discrete-time Markov chains, and to the use of matrix products and linear algebra in their study. The concepts of recurrence and transience

More information

IEOR 6711: Professor Whitt. Introduction to Markov Chains

IEOR 6711: Professor Whitt. Introduction to Markov Chains IEOR 6711: Professor Whitt Introduction to Markov Chains 1. Markov Mouse: The Closed Maze We start by considering how to model a mouse moving around in a maze. The maze is a closed space containing nine

More information

Positive and null recurrent-branching Process

Positive and null recurrent-branching Process December 15, 2011 In last discussion we studied the transience and recurrence of Markov chains There are 2 other closely related issues about Markov chains that we address Is there an invariant distribution?

More information

Example: physical systems. If the state space. Example: speech recognition. Context can be. Example: epidemics. Suppose each infected

Example: physical systems. If the state space. Example: speech recognition. Context can be. Example: epidemics. Suppose each infected 4. Markov Chains A discrete time process {X n,n = 0,1,2,...} with discrete state space X n {0,1,2,...} is a Markov chain if it has the Markov property: P[X n+1 =j X n =i,x n 1 =i n 1,...,X 0 =i 0 ] = P[X

More information

MATH 56A: STOCHASTIC PROCESSES CHAPTER 2

MATH 56A: STOCHASTIC PROCESSES CHAPTER 2 MATH 56A: STOCHASTIC PROCESSES CHAPTER 2 2. Countable Markov Chains I started Chapter 2 which talks about Markov chains with a countably infinite number of states. I did my favorite example which is on

More information

Chapter 7. Markov chain background. 7.1 Finite state space

Chapter 7. Markov chain background. 7.1 Finite state space Chapter 7 Markov chain background A stochastic process is a family of random variables {X t } indexed by a varaible t which we will think of as time. Time can be discrete or continuous. We will only consider

More information

= main diagonal, in the order in which their corresponding eigenvectors appear as columns of E.

= main diagonal, in the order in which their corresponding eigenvectors appear as columns of E. 3.3 Diagonalization Let A = 4. Then and are eigenvectors of A, with corresponding eigenvalues 2 and 6 respectively (check). This means 4 = 2, 4 = 6. 2 2 2 2 Thus 4 = 2 2 6 2 = 2 6 4 2 We have 4 = 2 0 0

More information

MAS275 Probability Modelling Exercises

MAS275 Probability Modelling Exercises MAS75 Probability Modelling Exercises Note: these questions are intended to be of variable difficulty. In particular: Questions or part questions labelled (*) are intended to be a bit more challenging.

More information

Markov Processes Hamid R. Rabiee

Markov Processes Hamid R. Rabiee Markov Processes Hamid R. Rabiee Overview Markov Property Markov Chains Definition Stationary Property Paths in Markov Chains Classification of States Steady States in MCs. 2 Markov Property A discrete

More information

Markov Chains and Stochastic Sampling

Markov Chains and Stochastic Sampling Part I Markov Chains and Stochastic Sampling 1 Markov Chains and Random Walks on Graphs 1.1 Structure of Finite Markov Chains We shall only consider Markov chains with a finite, but usually very large,

More information

SMSTC (2007/08) Probability.

SMSTC (2007/08) Probability. SMSTC (27/8) Probability www.smstc.ac.uk Contents 12 Markov chains in continuous time 12 1 12.1 Markov property and the Kolmogorov equations.................... 12 2 12.1.1 Finite state space.................................

More information

Linear Algebra March 16, 2019

Linear Algebra March 16, 2019 Linear Algebra March 16, 2019 2 Contents 0.1 Notation................................ 4 1 Systems of linear equations, and matrices 5 1.1 Systems of linear equations..................... 5 1.2 Augmented

More information

Lecture 7. µ(x)f(x). When µ is a probability measure, we say µ is a stationary distribution.

Lecture 7. µ(x)f(x). When µ is a probability measure, we say µ is a stationary distribution. Lecture 7 1 Stationary measures of a Markov chain We now study the long time behavior of a Markov Chain: in particular, the existence and uniqueness of stationary measures, and the convergence of the distribution

More information

Markov Chains. As part of Interdisciplinary Mathematical Modeling, By Warren Weckesser Copyright c 2006.

Markov Chains. As part of Interdisciplinary Mathematical Modeling, By Warren Weckesser Copyright c 2006. Markov Chains As part of Interdisciplinary Mathematical Modeling, By Warren Weckesser Copyright c 2006 1 Introduction A (finite) Markov chain is a process with a finite number of states (or outcomes, or

More information

INTRODUCTION TO MARKOV CHAINS AND MARKOV CHAIN MIXING

INTRODUCTION TO MARKOV CHAINS AND MARKOV CHAIN MIXING INTRODUCTION TO MARKOV CHAINS AND MARKOV CHAIN MIXING ERIC SHANG Abstract. This paper provides an introduction to Markov chains and their basic classifications and interesting properties. After establishing

More information

Recitation 8: Graphs and Adjacency Matrices

Recitation 8: Graphs and Adjacency Matrices Math 1b TA: Padraic Bartlett Recitation 8: Graphs and Adjacency Matrices Week 8 Caltech 2011 1 Random Question Suppose you take a large triangle XY Z, and divide it up with straight line segments into

More information

CS 246 Review of Linear Algebra 01/17/19

CS 246 Review of Linear Algebra 01/17/19 1 Linear algebra In this section we will discuss vectors and matrices. We denote the (i, j)th entry of a matrix A as A ij, and the ith entry of a vector as v i. 1.1 Vectors and vector operations A vector

More information

Outlines. Discrete Time Markov Chain (DTMC) Continuous Time Markov Chain (CTMC)

Outlines. Discrete Time Markov Chain (DTMC) Continuous Time Markov Chain (CTMC) Markov Chains (2) Outlines Discrete Time Markov Chain (DTMC) Continuous Time Markov Chain (CTMC) 2 pj ( n) denotes the pmf of the random variable p ( n) P( X j) j We will only be concerned with homogenous

More information

Markov Chains, Stochastic Processes, and Matrix Decompositions

Markov Chains, Stochastic Processes, and Matrix Decompositions Markov Chains, Stochastic Processes, and Matrix Decompositions 5 May 2014 Outline 1 Markov Chains Outline 1 Markov Chains 2 Introduction Perron-Frobenius Matrix Decompositions and Markov Chains Spectral

More information

Statistics 253/317 Introduction to Probability Models. Winter Midterm Exam Monday, Feb 10, 2014

Statistics 253/317 Introduction to Probability Models. Winter Midterm Exam Monday, Feb 10, 2014 Statistics 253/317 Introduction to Probability Models Winter 2014 - Midterm Exam Monday, Feb 10, 2014 Student Name (print): (a) Do not sit directly next to another student. (b) This is a closed-book, closed-note

More information

Markov chains. MC 1. Show that the usual Markov property. P(Future Present, Past) = P(Future Present) is equivalent to

Markov chains. MC 1. Show that the usual Markov property. P(Future Present, Past) = P(Future Present) is equivalent to Markov chains MC. Show that the usual Markov property is equivalent to P(Future Present, Past) = P(Future Present) P(Future, Past Present) = P(Future Present) P(Past Present). MC 2. Suppose that X 0, X,...

More information

Stochastic Simulation

Stochastic Simulation Stochastic Simulation Ulm University Institute of Stochastics Lecture Notes Dr. Tim Brereton Summer Term 2015 Ulm, 2015 2 Contents 1 Discrete-Time Markov Chains 5 1.1 Discrete-Time Markov Chains.....................

More information

µ n 1 (v )z n P (v, )

µ n 1 (v )z n P (v, ) Plan More Examples (Countable-state case). Questions 1. Extended Examples 2. Ideas and Results Next Time: General-state Markov Chains Homework 4 typo Unless otherwise noted, let X be an irreducible, aperiodic

More information

[Disclaimer: This is not a complete list of everything you need to know, just some of the topics that gave people difficulty.]

[Disclaimer: This is not a complete list of everything you need to know, just some of the topics that gave people difficulty.] Math 43 Review Notes [Disclaimer: This is not a complete list of everything you need to know, just some of the topics that gave people difficulty Dot Product If v (v, v, v 3 and w (w, w, w 3, then the

More information

Math 456: Mathematical Modeling. Tuesday, March 6th, 2018

Math 456: Mathematical Modeling. Tuesday, March 6th, 2018 Math 456: Mathematical Modeling Tuesday, March 6th, 2018 Markov Chains: Exit distributions and the Strong Markov Property Tuesday, March 6th, 2018 Last time 1. Weighted graphs. 2. Existence of stationary

More information

ISE/OR 760 Applied Stochastic Modeling

ISE/OR 760 Applied Stochastic Modeling ISE/OR 760 Applied Stochastic Modeling Topic 2: Discrete Time Markov Chain Yunan Liu Department of Industrial and Systems Engineering NC State University Yunan Liu (NC State University) ISE/OR 760 1 /

More information

CS145: Probability & Computing Lecture 18: Discrete Markov Chains, Equilibrium Distributions

CS145: Probability & Computing Lecture 18: Discrete Markov Chains, Equilibrium Distributions CS145: Probability & Computing Lecture 18: Discrete Markov Chains, Equilibrium Distributions Instructor: Erik Sudderth Brown University Computer Science April 14, 215 Review: Discrete Markov Chains Some

More information
