M3/4/5 S4 Applied Probability


M3/4/5 S4 Applied Probability, Autumn 2015. Badr Missaoui, Room 545, Huxley Building, Department of Mathematics, Imperial College London, 180 Queen's Gate, London, SW7 2AZ. Last update: Sunday 11th October, 2015.

Abstract

This course aims to give students an understanding of the basics of stochastic processes. The theory of different kinds of processes will be described and illustrated by applications in several areas, laying the groundwork for further work in areas such as genetics, finance, industrial applications, and medicine.

Syllabus: Revision of basic ideas of probability. Important discrete and continuous probability distributions. Random processes: Bernoulli processes, point processes. Poisson processes and their properties; superposition and thinning of Poisson processes; non-homogeneous, compound, and doubly stochastic Poisson processes. Autocorrelation functions. Probability generating functions and how to use them. General continuous-time Markov chains: generator, forward and backward equations, holding times, stationarity, long-term behaviour, jump chain, explosion; birth, death, immigration, emigration processes. Differential and difference equations and pgfs. Finding pgfs. Embedded processes. Time to extinction. Queues. Brownian motion and its properties. Random walks. Gambler's ruin. Branching processes and their properties. Galton-Watson model. Absorbing and reflecting barriers. Markov chains. Chapman-Kolmogorov equations. Recurrent, transient, periodic, aperiodic chains. Return probabilities and times. Communicating classes. The basic limit theorem. Stationarity. Ergodic theorem. Time-reversibility.

Contents

Course Information

1 Preliminaries
  1.1 The Probability Space
  1.2 Some Rules of Probability
  1.3 Random Variables and Stochastic Processes

2 Discrete Time Markov Chains
  2.1 The Markov Property
  2.2 The Chapman-Kolmogorov Equations
  2.3 Dynamics of the Markov Chain
  2.4 Properties of Markov Chains
      Recurrence and Transience
      Aperiodicity and Ergodicity
  2.5 Communicating classes
      The Decomposition Theorem
      Class properties
  2.6 Application: Gambler's Ruin
  2.7 Stationarity
      Existence of a Stationary Distribution on a Finite State Space
      Ergodic theorem
  2.8 Time Reversibility
  2.9 Summary
      Properties of irreducible Markov chains
      Markov chains with a finite state space

3 Poisson Processes
  3.1 Introduction
  3.2 Some Definitions
      Poisson Process: First Definition
      Poisson Process: Second Definition
      Equivalence of Definitions
  3.3 Some Properties of Poisson Processes
      Inter-Arrival Time Distribution
      Time to the n-th event
      Conditional Distribution of the Arrival Times
  3.4 Some Extension to Poisson Processes
      Superposition
      Thinning
      Non-Homogeneous Poisson Processes
      Compound Poisson Processes
  3.5 The Cramér-Lundberg Model in Insurance Mathematics
  3.6 The Coalescent Process
      Problem Process

Course Information

Lecture times: 9am Monday (room 34), 9am Tuesday (room 34) and 1pm Thursday (room 34); first lecture: Thursday at 1pm.

Office hour: 1-2pm on Mondays in room 545, Huxley Building. If you have any questions regarding the course material, please come to my office hour or ask me before/after the lectures. I cannot answer longer questions by email.

Problem sheets: There will be non-assessed problem sheets, to be handed out during the course. I will use 7 lecture slots for problem classes (dates to be announced), and it is recommended that these problems are attempted beforehand.

Assessed coursework: There are two pieces of assessed coursework for all students. First assessed coursework: to be handed out on ... (hand in ...) (to be confirmed). Second assessed coursework: to be handed out on ... (hand in ...) (to be confirmed). You are reminded that any form of plagiarism in the assessed coursework will be treated as an examination offence.

Exam: 2-hour May/June exam comprising 4 questions. There will be no formula sheet in the exam, and it is expected that you will know standard properties of densities (and their functional form).

Mastery examination: In addition to the regular exam, fourth year and MSc students will need to take a mastery fifth question (half an hour).

You will obtain feedback through: marked assessed coursework, non-assessed coursework, and one-to-one meetings with me in my office hour. You can give me feedback about the course by sending me an email, talking to me before/after the lectures or in my office hour, or contacting one of the two course representatives, who will be elected in mid-October.

Course details: This course is focused upon applied probability, which is a rather complex academic discipline. We will study discrete- and continuous-time stochastic processes and their applications, that is, collections of random variables indexed by a time parameter; this parameter can change either continuously or discretely. In particular, we will focus upon three broad areas:

Markov chains in discrete time;
Markov chains in continuous time (incl. Poisson and birth-death processes);
Brownian motion: a Markov process in continuous time on a continuous state space.

Applications: During the course we will look at genuine applications of these ideas, within the context of finance, population genetics and stochastic simulation. Examples include:

the evolution of a stock price S_t over a day of financial trading; here t ∈ [0, T];
the time evolution of DNA sequences back to a most recent common ancestor;
the state of a simulation algorithm, for example, sampling a random variable at discrete time intervals.

This course is based on material by Dr. Almut Veraart, who taught the course in 2012, 2013 and 2014. Note that this course differs substantially from earlier years (that is, before 2009), but covers the same material as in the academic years ..., ... and .... The notes are designed so that it is possible to understand applied probability from them; however, at some points there are missing definitions (which are filled in during lectures). At numerous points there are exercises in the lecture notes, and it is quite important to attempt them. (The model solutions are provided in the lecture notes, but you are not supposed to read them before you have tried to solve the problem yourself!) The course material will be available online through Blackboard Learn, where you will need to sign up for this course. All parts of the lecture notes and of the coursework are potentially examinable.

References: The recommended reference for this course is:

Grimmett, G. & Stirzaker, D. Probability and Random Processes (2001). Third edition, Oxford University Press: Oxford.

At times, the course relies quite heavily on this text, and reading this book, as well as doing the exercises, will help quite a lot. Solutions to the exercises in the book can be found in:

Grimmett, G. & Stirzaker, D. One Thousand Exercises in Probability (2001). Oxford University Press: Oxford.

Other very good textbooks include:

Norris, J. R. Markov Chains (1998). Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press: Cambridge.
Ross, S. Introduction to Probability Models (2007). Academic Press: London.
Pinsky, M. and Karlin, S. An Introduction to Stochastic Modeling (2011). Fourth edition. Academic Press.

For very motivated students, the references:

Billingsley, P. Probability and Measure (1995). Wiley: New York.
Kallenberg, O. Foundations of Modern Probability (2002). Second edition. Probability and its Applications. Springer-Verlag: New York.
Shiryaev, A. Probability (1996). Springer: New York.
Williams, D. Probability with Martingales (1991). Cambridge University Press: Cambridge.

might be of interest, but note that they are very advanced.

Chapter 1 Preliminaries

1.1 The Probability Space

Before we begin the course, there are a number of concepts that need to be known. You should be familiar with the material in this section from your first and second year probability and statistics courses. If you have forgotten some of the material, however, then it is a good idea to start revising it now, before we start with the new material!

Recall that probability theory starts with a probability space, which consists of a triple (Ω, F, P), where:

the set Ω is the sample space (or state space), which is the set of all possible outcomes; e.g. when we roll a die once, the corresponding sample space is given by Ω = {1, 2, 3, 4, 5, 6};
F is a collection of events/sets we can make probability statements about (we will make this precise below);
P is a probability measure which measures the probability of each set A ∈ F.

Let us revise the underlying concepts briefly. Let Ω be a set.

Definition 1.1.1. A collection F of subsets of Ω is called an algebra on Ω if it satisfies the following three conditions:
1. ∅ ∈ F;
2. if A ∈ F, then A^c ∈ F;
3. if A_1, A_2 ∈ F, then A_1 ∪ A_2 ∈ F.

Note that the above definition implies that Ω = ∅^c ∈ F, and that if A_1, A_2 ∈ F, then A_1 ∩ A_2 = (A_1^c ∪ A_2^c)^c ∈ F. So we have seen that an algebra is stable under finitely many set operations.

Definition 1.1.2. A collection F of subsets of Ω is called a σ-algebra on Ω if F is an algebra such that if A_1, A_2, ... ∈ F, then ∪_{i=1}^∞ A_i ∈ F.

Note that if F is a σ-algebra and A_1, A_2, ... ∈ F, then ∩_{i=1}^∞ A_i = (∪_{i=1}^∞ A_i^c)^c ∈ F. Hence a σ-algebra is stable under countably infinite set operations.

Definition 1.1.3. Let Ω denote a set and let F be a σ-algebra on Ω. We call the pair (Ω, F) a measurable space. An element of F is called an F-measurable subset of Ω.

Definition 1.1.4. Let A denote an arbitrary family of subsets of Ω. Then the σ-algebra generated by A is defined as the smallest σ-algebra which contains A, i.e.

σ(A) := ∩ {G : G is a σ-algebra, A ⊆ G}.

Example. The smallest σ-algebra associated with Ω is the collection F = {∅, Ω}.

Example. Let A be any subset of Ω. Then σ(A) = {∅, A, A^c, Ω}.

Example. A very important example is the σ-algebra generated by the open subsets of R, which we call the Borel σ-algebra and which is denoted by B = B(R). One can show that B = σ({(−∞, y] : y ∈ R}). Note that (−∞, y] = ∩_{n∈N} (−∞, y + 1/n) is a countable intersection of open sets and hence is in B.

Definition. A probability measure P on the measurable space (Ω, F) is a function P : F → [0, 1] satisfying the following three conditions:
1. P(∅) = 0;
2. P(Ω) = 1;
3. for any collection {A_i : i = 1, 2, ...} of disjoint (i.e. A_i ∩ A_j = ∅ for all pairs i, j satisfying i ≠ j) F-measurable subsets of Ω, we have P(∪_i A_i) = Σ_i P(A_i).

Example. Suppose that Ω = {0, 1}; then we can take F = {∅, {0}, {1}, Ω}. If P({0}) = P({1}) = 0.5, then we have defined a probability space for the uniform distribution on {0, 1}.

Summarising, we re-iterate that throughout the course we always assume that we are on a probability space (Ω, F, P), where:

the set Ω is the sample space and ω ∈ Ω is called a sample point;
F is a σ-algebra on Ω which describes the family of events; an event is defined as an element of F, i.e. an event is an F-measurable subset of Ω;
P is a probability measure on the measurable space (Ω, F).

1.2 Some Rules of Probability

As a reminder, and in a non-rigorous way, we have the following identities. All notation is as above, and we refer to sets as above. Note that we use these ideas, without further reference, later in the notes.

Complement: P(A^c) = 1 − P(A).

Addition law: P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Conditional probability: If P(B) > 0, we define

P(A | B) = P(A ∩ B) / P(B).

In this course, we will deal with conditional probabilities a lot! When we write down a conditional probability, we will always (implicitly) assume that the event on which we condition does not have zero probability.

Definition 1. A finite or countably infinite family of events {A_i : i = 1, 2, ...} is called a partition of the set Ω if A_i ∩ A_j = ∅ when i ≠ j, and ∪_i A_i = Ω.

Recall the law of total probability, which we will be using many, many times throughout this course.

Theorem 2 (Law of Total Probability). Let (Ω, F, P) denote a probability space. Let {A_i : i = 1, 2, ...} be a partition of Ω with A_i ∈ F and P(A_i) > 0 for all i. Then we have

P(B) = Σ_i P(B ∩ A_i) = Σ_i P(B | A_i) P(A_i).

Example. Given two events A, B ∈ F such that 0 < P(B) < 1, we have

P(A) = P(A | B) P(B) + P(A | B^c) P(B^c).

Definition. Events A and B are independent if P(A ∩ B) = P(A) P(B). More generally, a family {A_i : i ∈ I} is called independent if

P(∩_{i∈J} A_i) = Π_{i∈J} P(A_i),

for all finite subsets J of I.
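As a small numerical illustration (not part of the original notes), the law of total probability can be checked by simulation. The Python sketch below uses a made-up two-stage experiment: an event B occurs with probability 0.3, and the probability of a second event A depends on whether B occurred. The Monte Carlo estimate of P(A) is compared with P(A|B)P(B) + P(A|B^c)P(B^c); all numbers are hypothetical, chosen only for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000

    p_B = 0.3           # P(B), arbitrary illustrative value
    p_A_given_B = 0.9   # P(A | B)
    p_A_given_Bc = 0.2  # P(A | B^c)

    B = rng.random(n) < p_B
    # whether A occurs depends on whether B occurred
    A = rng.random(n) < np.where(B, p_A_given_B, p_A_given_Bc)

    print("empirical P(A)        :", A.mean())
    print("total probability rule:", p_A_given_B * p_B + p_A_given_Bc * (1 - p_B))
    # conditional probability P(A | B) estimated as P(A and B) / P(B)
    print("empirical P(A | B)    :", (A & B).mean() / B.mean())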

1.3 Random Variables and Stochastic Processes

Let us recall further important definitions:

Definition. A random variable is a map X : Ω → R such that for any A ∈ B(R),

X^{-1}(A) = {ω ∈ Ω : X(ω) ∈ A} ∈ F.

Such a function is said to be F-measurable. Here B(R) is the Borel σ-algebra on R. You can picture an F-measurable function X as follows: X maps Ω into R, while the pre-image map X^{-1} sends Borel sets in B(R) back into F.

Note that a random variable X, say, defines a σ-algebra

σ(X) = {X^{-1}(A) : A ∈ B(R)} ⊆ F,

which is called the σ-algebra generated by X.

Definition. The random variable is called discrete if it takes values in some countable subset {x_1, x_2, ...}, only, of R. The discrete random variable has (probability) mass function f : R → [0, 1] given by f(x) = P(X = x).

Definition. The random variable X is called continuous if its distribution function can be expressed as

F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(u) du,  x ∈ R,

for some integrable function f : R → [0, ∞) called the (probability) density function of X.

Recall that earlier we defined independent events. Next, we want to define the concept of independence of random variables.

Definition. Discrete random variables X and Y are independent if the events {X = x} and {Y = y} are independent for all x and y. More generally: a family {X_i : i ∈ I} of discrete random variables is independent if and only if

P(X_i = x_i for all i ∈ J) = Π_{i∈J} P(X_i = x_i),

for all sets {x_i : i ∈ I} and for all finite subsets J of I.

Recall that we cannot define the independence of continuous random variables X and Y in terms of events such as {X = x} and {Y = y}, since these events have zero probability and are hence trivially independent. We now state a definition of independence which is valid for any pair of random variables, regardless of their types (discrete, continuous, etc.).

Definition. Random variables X and Y are called independent if {X ≤ x} and {Y ≤ y} are independent events for all x, y ∈ R.

Let G, F be σ-algebras on Ω. We call G a sub-σ-algebra of F if G ⊆ F.

Definition. The sub-σ-algebras G_1, G_2, ... of F are independent if, whenever i_1, ..., i_n (for n ∈ N) are distinct and G_{i_k} ∈ G_{i_k} (for k ∈ {1, ..., n}), then

P(G_{i_1} ∩ ... ∩ G_{i_n}) = Π_{k=1}^{n} P(G_{i_k}).

Definition. Random variables X_1, X_2, ... are called independent if the corresponding σ-algebras σ(X_1), σ(X_2), ... are independent.

One can show that this definition and the earlier definition of independence of random variables are equivalent, i.e. random variables which are independent according to one definition are also independent according to the other, and vice versa.

Definition. A stochastic process (X_t)_{t∈T} on (Ω, F, P) is a collection of R-valued random variables, i.e. X : Ω × T → R, where T is some time domain (e.g. T = [0, T] or T = [0, ∞)).

Definition. A filtration on (Ω, F, P) is a family of σ-algebras {F_t}_{t≥0} that is increasing, i.e. F_s ⊆ F_t ⊆ F if s ≤ t.

Example. Let (X_t)_{t∈T} be a stochastic process. Then {F_t^X}_{t≥0} with F_t^X = σ({X_s : s ≤ t}) is called the filtration generated by X, or the natural filtration of X. You can view the natural filtration as the history of the process, since it is generated by past values of the process. Note that F_t^X denotes the smallest σ-algebra on Ω which contains all pre-images of the type

X_s^{-1}(A) for s ≤ t, A ∈ B(R),   (1.3.1)

i.e. we have F_t^X = σ({X_s^{-1}(A) : s ≤ t, A ∈ B(R)}), meaning that you need to take the intersection of all possible σ-algebras that contain the pre-images given in (1.3.1) (cf. Definition 1.1.4). Please note that in the literature you will find both

F_t^X = σ({X_s : s ≤ t})   (1.3.2)

and

F_t^X = σ({X_s^{-1}(A) : s ≤ t, A ∈ B(R)}).   (1.3.3)

While equation (1.3.2) is a notation which is widely used, equation (1.3.3) is the definition of the natural filtration. Note that this does NOT mean that the sets {X_s : s ≤ t} and {X_s^{-1}(A) : s ≤ t, A ∈ B(R)} are the same! But again, this is just notation which is used interchangeably in the literature.

Typically we work under the usual conditions, meaning that F_0 contains all the P-null sets of F (which we denote by N) and that the filtration is right-continuous, i.e. F_t = F_{t+} := ∩_{n∈N} F_{t+1/n}. Intuitively speaking, right-continuity means that nothing can happen in an infinitesimally small time interval.

A stochastic process {X_t}_{t≥0} is said to be adapted to the filtration if, for each t ≥ 0, X_t is F_t-measurable. Clearly, {X_t}_{t≥0} is adapted to {F_t^X}_{t≥0}!

Chapter 2 Discrete Time Markov Chains

Markov processes are one of the most important (if not the most important) of all stochastic processes. Examples include Poisson processes, Brownian motion, diffusion processes, etc. The underlying idea is to encapsulate the dependence structure of the process using a simple property. Andrei Markov (1856-1922) was a Russian mathematician, who is well known for his influential work on stochastic processes.

Informally, a Markov process has the property that, conditional on the present value, the future is independent of the past. We will, of course, formalise this concept as the chapter progresses. "Markov process" is often the term used for a continuous-time process on a general (continuous) state space. We will consider one such process towards the end of this course, but this chapter is focussed upon Markov chains, which, for our purposes, are Markov processes in discrete (or continuous) time on a discrete state space.

We begin with discrete-time, discrete-space Markov chains, and we will investigate a variety of properties, including stationarity. We then turn to continuous-time, discrete-state Markov chains in a later chapter; this will be connected to the birth process section in the chapter preceding it.

2.1 The Markov Property

We begin by giving the basic set-up. Let X_0, X_1, ... be a sequence of random variables, each variable taking some value in a state space E ⊆ Z (indeed, E can be any countably infinite set). We write N_0 = N ∪ {0}. Here {X_n}_{n∈N_0} is a discrete-time stochastic process (note the time parameter is countably infinite).

Definition. A discrete-time stochastic process {X_n}_{n∈N_0} on E is a Markov chain if it satisfies the Markov condition

P(X_n = s | X_{n−1} = x_{n−1}, ..., X_0 = x_0) = P(X_n = s | X_{n−1} = x_{n−1})

for all n ∈ N and for all s, x_0, ..., x_{n−1} ∈ E.

Here we see that the dependence of X_n, conditional upon the sequence X_{0:n−1} = (X_0, ..., X_{n−1}), is only on X_{n−1}. An important point to note is that X_n is not independent of (say) X_{n−2}. That is, for n ≥ 2,

P(X_n = x_n, X_{n−2} = x_{n−2})
= Σ_{x_{n−1}∈E} P(X_n = x_n, X_{n−1} = x_{n−1}, X_{n−2} = x_{n−2})
= Σ_{x_{n−1}∈E} P(X_n = x_n | X_{n−1} = x_{n−1}, X_{n−2} = x_{n−2}) P(X_{n−1} = x_{n−1}, X_{n−2} = x_{n−2})
= Σ_{x_{n−1}∈E} P(X_n = x_n | X_{n−1} = x_{n−1}) P(X_{n−1} = x_{n−1} | X_{n−2} = x_{n−2}) P(X_{n−2} = x_{n−2}),

where we used the law of total probability, the definition of conditional probability and the Markov property. Note that the expression above does not factorise into the two marginal probabilities of X_{n−2} and X_n.

In the following, we will always work with time-homogeneous transition probabilities (unless stated otherwise):

Definition. 1. The Markov chain {X_n}_{n∈N_0} is time-homogeneous if

P(X_{n+1} = j | X_n = i) = P(X_1 = j | X_0 = i)

for every n ≥ 0 and all i, j ∈ E.
2. In addition, let K = Card(E) = |E|. The transition matrix P = (p_ij) is the K × K matrix of transition probabilities

p_ij = P(X_{n+1} = j | X_n = i).

We finish this section with the following theorem:

Theorem. The transition matrix P is a stochastic matrix, i.e.:
1. P has non-negative entries: p_ij ≥ 0 for all i, j ∈ E;
2. P has row sums equal to 1: Σ_{j∈E} p_ij = 1 for all i ∈ E.

Proof. Exercise!

Remark. Note that throughout this section we always assume that the state space E is either finite (i.e. K < ∞) or countably infinite (with K = ∞).
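To make the set-up concrete, here is a short Python sketch (an illustration, not part of the original notes). It stores a transition matrix as a numpy array, checks the two stochastic-matrix conditions of the theorem above, and simulates a path of the chain; the particular three-state matrix is an arbitrary example chosen for illustration.

    import numpy as np

    # an arbitrary 3-state transition matrix (states 0, 1, 2), for illustration only
    P = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3],
                  [0.2, 0.2, 0.6]])

    # check that P is a stochastic matrix
    assert (P >= 0).all(), "entries must be non-negative"
    assert np.allclose(P.sum(axis=1), 1.0), "row sums must equal 1"

    def simulate_chain(P, x0, n, rng):
        """Simulate n steps of a time-homogeneous Markov chain started at x0."""
        path = [x0]
        for _ in range(n):
            # the next state is drawn from the row of P for the current state
            path.append(rng.choice(len(P), p=P[path[-1]]))
        return path

    rng = np.random.default_rng(1)
    print(simulate_chain(P, x0=0, n=20, rng=rng))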

A useful tool in the context of Markov chains are so-called transition diagrams, where we draw a node for each state in E and a directed edge between the nodes i and j (i, j ∈ E) if p_ij > 0.

Example. Let us consider an example of a Markov chain with three possible states {1, 2, 3} and strictly positive transition probabilities p_ij for i, j ∈ {1, 2, 3}. [Transition diagram: three nodes 1, 2, 3, each with a self-loop labelled p_11, p_22, p_33, and directed edges between every pair of nodes labelled p_12, p_21, p_23, p_32, p_13, p_31.]

2.2 The Chapman-Kolmogorov Equations

We have looked at the 1-step transition dynamics of a Markov chain. However, it is often of interest to consider the n-step transition dynamics, defined as follows.

Definition. Let n ≥ 1. The n-step transition matrix P_n = (p_ij(n)) is the matrix of n-step transition probabilities

p_ij(n) = P(X_{m+n} = j | X_m = i), with m ≥ 0.

Note that p_ij(n) is the probability that a process which is currently in state i will be in state j after n steps.

Exercise. Before we continue, consider three random variables X_1, X_2, X_3. Show that

P(X_1 = x_1, X_2 = x_2 | X_3 = x_3) = P(X_1 = x_1 | X_2 = x_2, X_3 = x_3) P(X_2 = x_2 | X_3 = x_3).

Remark. For a discrete Markov chain (X_n)_{n≥0} on the state space E, we have

P(X_{n+m} = x_{n+m} | X_n = x_n, ..., X_0 = x_0) = P(X_{n+m} = x_{n+m} | X_n = x_n),

for m ≥ 1 and for all x_{n+m}, x_n, ..., x_0 ∈ E (see Problem Sheet 1).

We can now formulate the Chapman-Kolmogorov equations, which can be used for computing n-step transition probabilities.

Theorem. Let m, n ≥ 1. Then we have

p_ij(m + n) = Σ_{l∈E} p_il(m) p_lj(n),

that is, P_{m+n} = P_m P_n and P_n = P^n.

Remark. Note that in the case K < ∞, matrix multiplication is well defined. For the general case we extend the definition in the natural way: let x be a K-dimensional row vector and let P be a K × K matrix, where K = ∞. Then

(xP)_j := Σ_{i∈E} x_i p_ij,   (P^2)_ik := Σ_{j∈E} p_ij p_jk,

for i, j, k ∈ E. Similarly, we define P^n for any n ≥ 0. Also, P^0 is the identity matrix, where (P^0)_ij = δ_ij.

Proof. By definition,

p_ij(m + n) = P(X_{m+n} = j | X_0 = i)
= Σ_{l∈E} P(X_{m+n} = j, X_m = l | X_0 = i)
= Σ_{l∈E} P(X_{m+n} = j | X_m = l, X_0 = i) P(X_m = l | X_0 = i)
= Σ_{l∈E} P(X_{m+n} = j | X_m = l) P(X_m = l | X_0 = i)
= Σ_{l∈E} p_il(m) p_lj(n).

Here we have used the law of total probability, the exercise above and the Markov property. The proof that P_n = P^n is left as an exercise, see Problem Sheet 1, Question 1.

Example. Consider the following transition matrix of a two-state Markov chain on the state space E = {0, 1},

P = ( α   1−α )
    ( β   1−β ),

for α, β ∈ (0, 1). (Check whether this is indeed a transition matrix!) Let α = 0.7, β = 0.4. What is p_00(4)? Note that p_00(4) is the probability that we will be in state 0 in four steps, given that we are in state 0 now. We need to compute P^4. Then

P^2 = ( 0.7  0.3 ) ( 0.7  0.3 ) = ( 0.61  0.39 )
      ( 0.4  0.6 ) ( 0.4  0.6 )   ( 0.52  0.48 ),

P^4 = P^2 P^2 = ( 0.61  0.39 ) ( 0.61  0.39 ) = ( 0.5749  0.4251 )
                ( 0.52  0.48 ) ( 0.52  0.48 )   ( 0.5668  0.4332 ).

Hence we have p_00(4) = 0.5749.

2.3 Dynamics of the Markov Chain

The equations above are called the Chapman-Kolmogorov (CK) equations and are the most basic ingredient of Markov chains. They help to relate the long-term behaviour of the Markov chain to the local transition dynamics (or transition matrix) of the chain. We can also use the CK equations to describe the marginal distribution of the chain at time n. Let

ν_i^(n) = P(X_n = i), i ∈ E, n ≥ 0,

(this is the probability mass function of X_n) and, for K = Card(E) = |E|, let

ν^(n) = (ν_1^(n), ..., ν_K^(n))

be the K-dimensional row vector of probabilities for the chain at time n.

Theorem. With the notation introduced above, we have

ν^(m+n) = ν^(m) P^n, for n ≥ 1, m ≥ 0,

and hence ν^(n) = ν^(0) P^n, for n ≥ 1.

Proof. Consider j ∈ E; then

ν_j^(m+n) = P(X_{m+n} = j) = Σ_{i∈E} P(X_{m+n} = j | X_m = i) P(X_m = i) = Σ_{i∈E} ν_i^(m) p_ij(n).

It is then concluded that: the dynamics of the time-homogeneous Markov chain are determined by the initial probability mass function ν^(0) and the transition matrix P.
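The matrix computation in the example above, and the relation ν^(n) = ν^(0) P^n, are easy to reproduce numerically. The following Python sketch (an illustration, not part of the original notes) verifies that p_00(4) ≈ 0.5749 for α = 0.7, β = 0.4, and propagates an initial distribution forward in time.

    import numpy as np

    alpha, beta = 0.7, 0.4
    P = np.array([[alpha, 1 - alpha],
                  [beta,  1 - beta]])

    # 4-step transition matrix via repeated matrix multiplication
    P4 = np.linalg.matrix_power(P, 4)
    print("p_00(4) =", P4[0, 0])        # approximately 0.5749

    # marginal distribution nu^(n) = nu^(0) P^n, starting in state 0
    nu = np.array([1.0, 0.0])
    for n in range(1, 5):
        nu = nu @ P
        print(f"nu^({n}) =", nu)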

Remark. Note that the CK equations are necessary for the Markov property, but they are not sufficient! This is related to the fact that pairwise independence of random variables is weaker than independence. You can find an example of a stochastic process which satisfies the CK equations but which is not a Markov chain in Grimmett & Stirzaker.

Exercise. Suppose that E = Z. A famous (and sometimes useful) Markov chain is the simple random walk:

p_ij = p, if j = i + 1;
p_ij = 1 − p, if j = i − 1;
p_ij = 0, otherwise,

for p ∈ (0, 1). Find the n-step transition probabilities.

Note that the simple random walk can be written as the sum X_n = Σ_{i=0}^{n} Y_i, where Y_1, Y_2, ... are independent random variables taking the values −1 and 1 with probabilities (1 − p) and p, respectively, and X_0 = Y_0 denotes the initial value. We want to find P(X_n = j | X_0 = i). In order to get from i to j in n steps, we could go up u times and down d times. Note that we require

n = u + d,   i + u − d = j.

Solving for u and d, we have

u = (n − i + j)/2,   d = (n − j + i)/2,   for u, d ≥ 0.

There are (n choose u) possibilities of going up u steps; hence we have

p_ij(n) = P(X_n = j | X_0 = i) = (n choose u) p^u (1 − p)^d = (n choose (n−i+j)/2) p^{(n−i+j)/2} (1 − p)^{(n−j+i)/2},

if n − i + j is even (and |j − i| ≤ n), and p_ij(n) = 0 otherwise.

2.4 Properties of Markov Chains

We now discuss a variety of properties of Markov chains which will allow us to solve a number of interesting questions associated to Markov chains.

2.4.1 Recurrence and Transience

We begin with the concept of recurrence:

Definition. Let {X_n}_{n∈N_0} be a Markov chain on a state space E. A state i ∈ E is recurrent if

P(X_n = i for some n ≥ 1 | X_0 = i) = 1,

that is, the probability of returning to i, starting from i, is 1. If this probability is less than 1, the state i is transient.

Recurrence is of interest if we want to answer questions about eventual returns to given states of the Markov chain. Next we introduce the idea of the first passage time:

f_ij(n) = P(X_n = j, X_{n−1} ≠ j, ..., X_1 ≠ j | X_0 = i), with i ≠ j.

This is the probability that the first time we visit state j is at time n. Likewise, we can define

f_ii(n) = P(X_n = i, X_{n−1} ≠ i, ..., X_1 ≠ i | X_0 = i),

which is the probability that the first time we return to state i is at time n. Also, write

f_ij = Σ_{n=1}^{∞} f_ij(n),

which is the probability that the chain ever visits j, starting at i.

Exercise. What condition on f_jj is needed for state j to be recurrent? Clearly, we need f_jj = 1.

Exercise. Show that f_ii(1) = p_ii and that we have the recursion

p_ii(n) = Σ_{l=0}^{n} f_ii(l) p_ii(n − l),   n ≥ 1,   (f_ii(0) := 0 for all i).

We start off with the case n = 1. Here we have p_ii(0) = 1 and f_ii(0) = 0. Also, for n = 1, the first visit is equal to the nth visit, so we can write p_ii(1) = f_ii(1), and

Σ_{l=0}^{1} f_ii(l) p_ii(1 − l) = f_ii(0) p_ii(1) + f_ii(1) p_ii(0) = f_ii(1).

Altogether we have p_ii(1) = Σ_{l=0}^{1} f_ii(l) p_ii(1 − l).

20 In the general case, we argue as follows. Consider all possible cases where the process starts at X = i and also X n = i. The first return to state i occurs at the lth transition. Now define the disjoint events (for l = 1,..., n) E l that the first visit to i (after time ) takes place at time l, i.e. Then (using the law of total probability): E l := {X l = i, X r i, for 1 r < l}. P(X n = i X = i) = Now, we use the Markov property to deduce that So, altogether, we have n P({X n = i} E l {X = i}). l=1 P({X n = i} E l {X = i}) = P({X n = i} E l {X = i}) P(E l {X = i}) P({X = i}) = P({X n = i} E l {X = i})p(e l {X = i}) = P({X n = i} {X l = i})p(e l {X = i}) = p ii (n l)f ii (l). P(X n = i X = i) = p ii (n) = n p ii (n l)f ii (l). From the simple exercise, we see that: A state j E is recurrent if and only if, the f jj = 1 holds. Likewise, a state j E is transient if and only if f jj < 1. In general, this can be very difficult to calculate, so let us consider a condition on the n step transition probabilities. Theorem We have: 1. j E is recurrent if and only if n=1 p jj(n) =. 2. j E is transient if and only if n=1 p jj(n) <. l=1 Proof. Suppose the Markov chain is currently in state j E and j is recurrent, then with probability 1 it will return to state j. By the Markov property, we know that the process starts over again when it reenters j, and we know that with probability 1, it will reenter state j again (and again and again...). Hence, the process will in fact reenter the recurrent state j infinitely often. 18

21 j transient n=1 p jj(n) < : Suppose now that j is transient. Then, each time when the chain visits j, there is a positive probability (1 f jj ) that it will never again enter state j. Let M denote the number of periods that the chain is in state j: M = I n, where I n = n= { 1, if Xn = j,, if X n j, If the chain starts at j, then the probability that the chain will be in state j exactly n times (for n 1) is given by P(M = n X = j) = f n 1 jj (1 f jj ), This is the probability mass function of a geometric distribution, which has (finite) mean Then we have > E(M X = j) = n=1 1 (1 f jj ) = E (M X = j) = nf n 1 jj (1 f jj ) = (1 f jj) (1 f jj ) 2 = 1 (1 f jj ) <, = E(I n X = j) = P(X n = j X = j) n= p jj (n). n= j transient n=1 p jj(n) < : Conversely, suppose that n= p jj(n) <. Then M is a random variable with finite mean, thus M must be finite. That implies that starting from state j, the chain returns to state j only finitely many times. Hence, there is a positive probability, that starting from state j, the chain never returns to j. I.e. 1 f jj >. Hence f jj < 1, hence j is transient. j recurrent n=1 p jj(n) = : Similarly, we conclude that the state j is recurrent if and only if, starting in state j, the expected number of returns to state j is infinite. n= These results lead us to Corollary If j is transient then p ij (n) as n for all i. 19

22 Proof. If j is transient, then n=1 p jj(n) <. Hence p jj (n) as n. Similarly as in Exercise 2.4.3, we have for any i E that p ij (n) = n f ij (n l)p jj (l), n 1, f ij () :=, for all i. l= Note that p ij () = for i j. Then p ij (n) = n= = n f ij (n l)p jj (l) = p jj (l) f ij (n l) n= l= l= n= } {{ } =f ij l= n=l p jj (l) f ij (n) p jj (l) <. l= The necessary condition for the convergence of the infinite series is p ij (n) as n. We can see that a state is either recurrent or transient. First Hitting Times and Mean Recurrence Time An alternative and important way to classify the states of a Markov chain is through the first hitting time T j = inf{n 1 : X n = j} (here we use the convention that inf{ } = ). Note that P(T i = X = i) > iff i is transient; in that case E[T i X = i] =. Definition The mean recurrence time µ i of a state i is defined as { µ i = E[T i X = i] = n=1 nf ii(n) if i is recurrent if i is transient. Definition A recurrent state i is called null if µ i = and positive (or non-null) if µ i <. An important result, proved in Grimmett & Stirzaker (21) is: Theorem A recurrent state i is null iff p ii (n) as n ; if this holds then p ji (n) as n for all j. 2
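To get a feel for the recurrence/transience dichotomy just discussed, the following simulation sketch (illustrative only, not part of the original notes) estimates the return probability f_00 of the simple random walk started at 0 by truncating at a finite horizon. For p ≠ 1/2 the estimate stays visibly below 1 (the exact return probability is known to be 1 − |p − q| = 2 min(p, q)), consistent with transience; for p = 1/2 the walk is recurrent and the truncated estimate only slowly approaches 1.

    import numpy as np

    def estimated_return_probability(p, n_paths=10_000, horizon=1_000, seed=0):
        """Fraction of simulated walks that return to 0 within `horizon` steps."""
        rng = np.random.default_rng(seed)
        steps = rng.choice([1, -1], size=(n_paths, horizon), p=[p, 1 - p])
        positions = steps.cumsum(axis=1)       # position after each step
        return (positions == 0).any(axis=1).mean()

    for p in (0.5, 0.6, 0.8):
        est = estimated_return_probability(p)
        exact = 1 - abs(2 * p - 1)
        print(f"p = {p}: truncated estimate of f_00 ~ {est:.3f}, exact f_00 = {exact:.3f}")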

23 2.4.2 Aperiodicity and Ergodicity Definition The period of a state i is defined by d(i) = gcd{n : p ii (n) > } the greatest common divisor of the epochs at which return is possible. If d(i) > 1 then the state is periodic otherwise it is aperiodic. Definition A state is ergodic if it is positive, recurrent and aperiodic. Example Consider the Markov chain with state space E = {1, 2, 3, 4} and transition matrix 1 P = We can easily see that for i {1, 2, 3, 4} we have p ii (n) = 1 > for n {4, 8, 12,... }. Hence d i = Communicating classes We now look at the way various states are inter-related. Definition We say that state j is accessible from state i, written i j, if the chain may ever visit state j, with positive probability, starting from i. In other words, i j if there exist m such that p ij (m) >. 2. i and j communicate if i j and j i, written i j. Clearly, if i j, i j iff f ij >. Theorem The concept of communication is an equivalence relation. 21

24 Proof. 1. Reflexivity (i i): Note that we have p ii () = 1 since p ij () = δ ij = { 1, if i = j,, if i j. 2. Symmetry (if i j, then j i): This follows directly from the definition. 3. Transitivity (if i j and j k, then i k): i j and j k imply that there exist integers n, m such that p ij (n) > and p jk (m) >. Hence (using the CK equations and the positivity of the transition probabilities) p ik (n + m) = l E p il (n)p lk (m) p ij (n)p jk (m) > i k. Similarly, one can show that i k. Remark The totality of states can be partitioned into equivalence classes, which are called communicating classes. Note that the states in an equivalence class are those which communicate with each other. Note that it may be possible starting in one equivalence class to enter another class with positive probability. In that case, we could not return to the initial class (else the classes would form a single equivalence class). These findings will be formalized in Theorem We have the following Theorem: Theorem If i j then 1. i and j have the same period 2. i is transient if and only if j is transient 3. i is recurrent if and only if j is recurrent. 4. i is null recurrent if and only if j is null recurrent. Proof. We only show the recurrence: Let i j and let i be recurrent. Then there exist integers n, m such that p ij (n) >, p ji (m) >. For any integer l, we have (using the CK equations) Then l=1 p jj (m + l + n) p ji (m) p ii (l) p ij (n). }{{}}{{} > > p jj (l) p jj (m + l + n) p ji (m)p ij (n) p ii (l) =. l=1 Now, we only need to apply Theorem to finish the proof. See Grimmett & Stirzaker (p. 224) for the complete proof. l=1 22

25 Definition A set of states C is 1. closed if p ij = for all i C, j / C 2. irreducible if i j for all i, j C. What the definition tells us, is once we enter a closed set, we never leave; if a closed set only contains one state, i.e. C = {i} for an i E, then i is called absorbing. An irreducible set is aperiodic (or null recurrent etc) if all the states in C have this property; thanks to Theorem 2.5.4, this makes sense. Importantly, if the entire state-space E is irreducible, then we say that the Markov chain is irreducible The Decomposition Theorem An important result is: Theorem The state-space E can be partitioned uniquely into ( ) E = T C i, where T is the set of transient states, and the C i are irreducible, closed sets of recurrent states. i Proof. Let C 1, C 2,... denote the recurrent equivalence classes of. We only need to show that each C r is closed. Suppose there exists an i C r, j C r, such that p ij >. Hence, i j, but j i. Then which contradicts that i is recurrent. P(X n i for all n 1 X = i) P(X 1 = j X = i) = p ij >, This result helps us to understand what is going on, in terms of the Markov chain. If we start the chain in any of the C i then the chain never leaves, and, effectively this is the state-space. On the other hand, if the chain starts in the transient set, the chain either stays there forever, or moves, and gets absorbed into a closed set. We finish the Subsection with the interesting result: Lemma Let K <. Then at least one state is recurrent and all recurrent states are positive. Proof. Suppose that all states are transient; then according to Corollary lim n + p ij (n) = for all i. Since the transition matrix is stochastic, we have j E p ij(n) = 1. Then take the limit through the summation sign to obtain lim n + p ij (n) =. However, this is a contradiction as the L.H.S must be 1. The second part is proved similarly. j Note: The proof of the second part goes as follows: Assume there exists a null recurrent state j E = T ( i C i). I.e. there exists an i such that j C i. Since C i is a closed, recurrent communicating class with one null recurrent state, we know that in fact all states in C i are null recurrent. Also, all states in C i are essential (since they are recurrent). Note now that a transition matrix restricted to an essential class is stochastic. 1 = p ik (n) = p ik (n). k E k C i According to Theorem we know that p jj (n) as n and also p ij (n) as n for all i E. Since all states in the class C i are null recurrent, we get in fact p ik (n) as n for all 23

i ∈ E and also for all k ∈ C_i. As before, we take the limit through the summation sign to obtain

1 = lim_{n→∞} Σ_{k∈C_i} p_ik(n) = 0,

which is a contradiction.

We conclude this subsection with some remarks.

Remark. Every recurrent class is closed. Every finite closed class is recurrent and also positive. If you would like to read the proofs of the properties stated above, then have a look at Chapter 1 of the following excellent textbook: J.R. Norris (1997): Markov Chains. Cambridge University Press.

Class properties

Type of class | Finite             | Infinite
Closed        | positive recurrent | positive recurrent, null recurrent or transient
Not closed    | transient          | transient

2.6 Application: Gambler's Ruin

Now we study a very famous example: the Gambler's ruin problem. Consider a gambler who at each play of the game has

probability p of winning one unit;
probability q = 1 − p of losing one unit.

Assume that successive games are independent. What is the probability that, if the gambler starts with i units, the gambler's fortune will reach N before reaching 0? (We assume that N ∈ N.)

Let X_n denote the gambler's fortune at time n. Then {X_n}_{n∈{0,1,2,...}} is a Markov chain with transition probabilities

p_00 = p_NN = 1,   p_{i,i+1} = p = 1 − p_{i,i−1},   i = 1, 2, ..., N − 1.

How many classes does this Markov chain have? Three classes: {0}, {1, 2, ..., N − 1}, {N}, the first and third being positive recurrent and the second transient. Recall that a transient class will only be visited finitely many times. Hence, after some finite amount of time, the gambler will either win N or go broke.
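Before deriving the success probabilities analytically, here is a small Monte Carlo sketch (illustrative only, not part of the original notes) that estimates the probability of reaching N before 0 from a starting fortune i; the closed-form value quoted in the last comment is the formula derived just below, so the two can be compared.

    import numpy as np

    def prob_reach_N_before_0(i, N, p, n_games=20_000, seed=5):
        """Monte Carlo estimate of P(fortune hits N before 0 | start at i)."""
        rng = np.random.default_rng(seed)
        wins = 0
        for _ in range(n_games):
            x = i
            while 0 < x < N:
                x += 1 if rng.random() < p else -1
            wins += (x == N)
        return wins / n_games

    i, N, p = 5, 20, 0.6
    q = 1 - p
    print("Monte Carlo:", prob_reach_N_before_0(i, N, p))
    # closed-form solution derived below: h_i = (1 - (q/p)**i) / (1 - (q/p)**N) for p != 1/2
    print("formula    :", (1 - (q / p) ** i) / (1 - (q / p) ** N))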

For i = 0, 1, ..., N, define the event

W_i := {the gambler's fortune will eventually reach N if he starts with i units},

and

h_i = P(W_i) = P(X_n = N for some n ≥ 0 | X_0 = i).

Define the event F := {gambler wins first game}. Clearly, P(F) = p. Also, F^c = {gambler loses first game}, P(F^c) = q = 1 − p. If he wins, then he has i + 1 units, otherwise he has i − 1. By conditioning on the outcome of the initial game we get the recurrence relation

h_i = P(W_i) = P(W_i | F) P(F) + P(W_i | F^c) P(F^c) = h_{i+1} p + h_{i−1} q,   i = 1, 2, ..., N − 1.

We need to solve this difference equation. Try h_i = c x^i. Then

c x^i = c p x^{i+1} + c q x^{i−1}.

Hence we get the auxiliary/characteristic equation:

x = p x^2 + q,  i.e.  0 = p x^2 − x + q.

The solutions are x = q/p and x = 1. The general solution is given by

h_i = c_1 (q/p)^i + c_2, if p ≠ q;   h_i = c_1 i + c_2, if p = q,

for constants c_1, c_2. Using the boundary conditions h_0 = 0 and h_N = 1, we get

h_i = (1 − (q/p)^i) / (1 − (q/p)^N), if p ≠ q, i.e. p ≠ 1/2;   h_i = i/N, if p = q = 1/2.

As N → ∞, we have

h_i → 1 − (q/p)^i, if p > 1/2;   h_i → 0, if p ≤ 1/2.

So if p > 0.5, there is a positive probability that the gambler's fortune will increase indefinitely. However, if p ≤ 0.5, then the gambler will go broke with probability 1 (assuming he plays against an infinitely rich adversary).

2.7 Stationarity

We will (finally) start to use the theory that we have been building up on Markov chains. The question we look to answer here is the following: what can we say about the probabilistic behaviour of the chain; is there a stationary behaviour? To answer this question, we begin with the concept of a stationary distribution.

Definition 2.7.1. A vector π is a stationary distribution of a Markov chain {X_n}_{n∈N_0} on E if:
1. for each j ∈ E, π_j ≥ 0, and Σ_{j∈E} π_j = 1;
2. π = πP, that is, for each j ∈ E, π_j = Σ_{i∈E} π_i p_ij.

The term stationarity is used for the following reason:

πP^2 = (πP)P = πP = π,

that is, πP^n = π for any n ∈ N. That is to say, if π is the initial distribution of the Markov chain, then the marginal distribution at any subsequent time instant is also π.

Exercise. Assume X_0 has distribution π. Show that X_n also has distribution π for all n ∈ N. Hint: use the theorem on the dynamics of the chain from Section 2.3.

Our objective is to study the limiting behaviour of the Markov chain. We start with a result for Markov chains with a finite state space.

Theorem. Let K = |E| < ∞. Suppose that for some i ∈ E,

p_ij(n) → π_j as n → ∞, for all j ∈ E.

Then π is a stationary distribution.

Proof. Homework, see problem sheet!

Note that sometimes we refer to the stationary distribution also as the invariant distribution or the equilibrium distribution. Now we turn to the case of a not necessarily finite state space. We formulate a very important theorem, which we will prove in detail.

Theorem. An irreducible chain has a stationary distribution π if and only if all the states are positive recurrent; in this case π is the unique stationary distribution of the chain and is given by π_i = µ_i^{-1} for each i ∈ E, where µ_i is the mean recurrence time.

In order to prove the theorem, we need an intermediate lemma. Fix a state j and define ρ_i(j) to be the expected number of visits to the state i between two successive visits to state j, i.e. ρ_i(j) = E[N_i(j) | X_0 = j], where

N_i(j) = Σ_{n=1}^{∞} I_{{i}×{T_j ≥ n}}(X_n, X_{1:n}),   (2.7.1)

and T_j is the first time of return to state j. (Hence N_j(j) = 1 and ρ_j(j) = 1.) Then

ρ_i(j) = Σ_{n=1}^{∞} P(X_n = i, T_j ≥ n | X_0 = j).

Write ρ(j) for the vector of these quantities. Since T_j = Σ_{i∈E} N_i(j), we find, taking (conditional) expectations, that

µ_j = E[T_j | X_0 = j] = Σ_{i∈E} E[N_i(j) | X_0 = j] = Σ_{i∈E} ρ_i(j).   (2.7.2)

29 Lemma For any state j E of an irreducible, recurrent chain, the vector ρ(j) satisfies ρ i (j) < for all i, and furthermore ρ(j) = ρ(j)p. Note that the sum in the definition of N i (j) in formula (2.7.1) has to be understood in the following sense: For each n N, we have X 1:n := (X 1,..., X n ) and { 1, if Xn = i and T I {i} {Tj n}(x n, X 1:n ) = j n,, otherwise. Recall the notion of the first passage time f ij (n)! Proof. First we prove that ρ i (j) < for all i j. Let l ji (n) = P(X n = i, T j n X = j), denote the probability that the chain reaches state i in n steps without intermediate return to its starting point j. Then f jj (m + n) l ji (m)f ij (n), (2.7.3) since the first return takes place after n + m steps if (1.) X m = i, (2.) there is no return to j prior to time m, (3.) the next subsequent visit to j takes place after another n steps. Since the chain is irreducible, there exists an n such that f ij (n ) >. Using (2.7.3), we get l ji (m) f jj(m + n ) f ij (n, ) then ρ i (j) = 1 l ji (m) f ij (n ) m=1 f jj (m + n ) m=1 1 f ij (n ) f jj (m) m=1 } {{ } =f jj=1 1 f ij (n ) <. 27

30 Next, we prove ρ(j) = ρ(j)p. As before, ρ i (j) = l ji (n). n=1 Also, l ji (1) = p ji, and for n 2: l ji (n) = P(X n = i, X n 1 = r, T j n X = j) r:r j = P(X n = i X n 1 = r, T j n, X = j)p(x n 1 = r, T j n X = j) r:r j = p ri l jr (n 1) r:r j Then, by summing over n, we get ρ i (j) = }{{} 1 p ji + ( ) l jr (n 1) p ri = =ρ j(j) r:r j n=2 r }{{} =ρ r(j) ρ r (j)p ri, which concludes the proof. Lemma Every positive recurrent, irreducible chain has a stationary distribution. Proof. From Lemma we get the representation result ρ(j) = ρ(j)p. From equation (2.7.2) we know that µ j = i E ρ i(j), which is clearly nonnegative. Also the µ j are finite for any positive recurrent chain. Define π i := ρ i(j) µ j. Then π i for all i and i E π i = 1 and π(j) = π(j)p, hence π is a stationary distribution. Theorem If the chain is irreducible and recurrent, then there exists a positive root x of the equation x = xp, which is unique up to a multiplicative constant. Moreover, the chain is positive recurrent if i x i < and null if i x i =. Proof. The existence of the root is an immediate consequence of Lemma This root is always non negative and can in fact to be taken strictly positive. The proof of the uniqueness is left as an exercise! You can find the solution to the above theorem in Grimmet and Stirzaker (29): One Thousand Exercises in Probability (p. 35, Problem ). Exercise A useful result used below is as follows. Let X be a non-negative, integer valued random variable. Show that E[X] = P(X > n) = P(X n). (2.7.4) n= n=1 28

31 You can check your solution with E[X] = np(x = n) = = n= P(X m) = m=1 n 1 n= m= P(X = n) = P(X > m). m= m= n=m+1 P(X = n) We are now in a position to prove Theorem Recall that we already proved that any irreducible, positive recurrent chain has a stationary distribution. Hence, we need to show: If an irreducible chain has a stationary distribution, then the chain is positive recurrent. π i = 1/µ i. Note that the uniqueness of the stationary distribution will follow from Theorem Proof of Theorem Suppose that π is the stationary distribution of the chain. Assume there exists a transient state. Since the chain is irreducible that implies that all states are transient. If all states are transient then p ij (n), as n, for all i, j, by Corollary Since πp n = π, for any j π j = i E π i p ij (n) n thus, π could not be a stationary vector (see Definition 2.7.1); all states are recurrent. This follows from switching the order of summation and limits using the dominated convergence theorem. Theorem (Dominated Convergence Theorem). If i a i(n) is an absolutely convergent series for all n N such that 1. for all i the limit lim n a i (n) = a i exists, 2. there exists a sequence (b i ) i, such that b i for all i and i b i < such that for all n, i : a i (n) b i. Then i a i < and a i = i i lim a i(n) = lim n n i a i (n). Here we have a i (n) = π i p ij (n). Clearly, i a i(n) is absolutely convergent for all n since i π ip ij (n) = i π ip ij (n) = π j 1 <. Also a i (n) =: a i for all n. Next, a i (n) = π i p ij (n) π i =: b i and i b i = i π i = 1 <. Applying Theorem concludes the proof. 29

32 Now, we show that the existence of π implies that all states are positive (recurrent) and that π i = µ 1 i each i. Suppose that X π (i.e. P(X = i) = π i for each i), using (2.7.4), π j µ j = P(X = j)e(t j X = j) = = P(T j n, X = j). n=1 P(T j n X = j)p(x = j) But, P(T j 1, X = j) = P(X = j) (since T j 1 by definition) and for n 2 n=1 P(T j n, X = j) = P(X = j, X m j, 1 m n 1) = P(X m j, 1 m n 1) P(X m j, m n 1) = P(X m j, m n 2) P(X m j, m n 1) = a n 2 a n 1, where we have used homogeneity. Note that we have used the law of total probability again: P(X m j, 1 m n 1) = P(X = j, X m j, 1 m n 1) + P(X j, X m j, 1 m n 1), hence P(X = j, X m j, 1 m n 1) = P(X m j, 1 m n 1) P(X j, X m j, 1 m n 1). We define a n = P(X m j, m n). Then, summing over n (telescoping sum!) However, π j µ j = P(X = j) + by recurrence of j. That is, π 1 j = µ j if π j >. (a n 2 a n 1 ) n=2 = P(X = j) + P(X j) lim n + a n = 1 lim n + a n. a n P(X m j, m) = To see that π j > for all j, suppose the converse; then = π j = i E π i p ij (n) π i p ij (n) for for all i, n, yielding that π i = whenever i j. However, the chain is irreducible, so that π i = for each i - a contradiction to the fact that π is a stationary vector. Thus µ i < and all states are positive. To finish, if π exists then it is unique and all states are positive recurrent. Conversely, if the states of the chain are positive recurrent then the chain has a stationary distribution from Lemma Remark Note that Theorem provides a very useful criterion for checking whether an irreducible chain is positive recurrent: You just have to look for a stationary solution! Note that the stationary solution is a left eigenvector of the transition matrix! π = πp. 3

Example. Let E = {1, 2} and let the transition matrix be given by

P = ( 0.5   0.5  )
    ( 0.25  0.75 ).

Find the stationary distribution! We need to find a solution to the following equation:

(π_1, π_2) = (π_1, π_2) P.

We get the following system of equations:

π_1 = (1/2) π_1 + (1/4) π_2,
π_2 = (1/2) π_1 + (3/4) π_2.

Also, we can use the probability condition that π_1 + π_2 = 1. We obtain

(π_1, π_2) = (1/3, 2/3).

Compute the mean recurrence times! µ_1 = 1/π_1 = 3, µ_2 = 1/π_2 = 3/2.

An important question to answer is: when is the limiting distribution (as the time parameter goes to ∞) also the stationary distribution?

Theorem. For an irreducible, aperiodic chain we have

lim_{n→∞} p_ij(n) = 1/µ_j.

The proof is quite long and can be seen in Grimmett & Stirzaker (2001). We conclude this section with some important remarks:

Remark. If the irreducible chain is transient or null recurrent, then p_ij(n) → 0 for all i, j ∈ E, since µ_j = ∞. If the chain is irreducible, aperiodic and positive recurrent, then we have

lim_{n→∞} p_ij(n) = π_j = 1/µ_j for all i, j ∈ E,

where π is the unique stationary distribution. The limiting distribution of an irreducible aperiodic chain does not depend on the starting point (X_0 = i), but forgets its origin; hence we have

P(X_n = j) = Σ_{i∈E} P(X_0 = i) p_ij(n) → 1/µ_j, as n → ∞.
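A quick numerical confirmation of this example (illustrative only, not part of the original notes): the stationary distribution can be obtained as the normalised left eigenvector of P for eigenvalue 1, and the mean recurrence time µ_1 = 3 can be estimated by simulating return times to state 1 (coded as state 0 below).

    import numpy as np

    P = np.array([[0.5, 0.5],
                  [0.25, 0.75]])

    # stationary distribution: left eigenvector of P for eigenvalue 1, normalised
    w, V = np.linalg.eig(P.T)
    pi = np.real(V[:, np.argmin(np.abs(w - 1))])
    pi /= pi.sum()
    print("stationary distribution:", pi)        # approx (1/3, 2/3)

    # estimate the mean recurrence time of the first state by simulation
    rng = np.random.default_rng(3)
    returns, state, t = [], 0, 0
    for _ in range(200_000):
        state = rng.choice(2, p=P[state])
        t += 1
        if state == 0:
            returns.append(t)   # length of the excursion back to the first state
            t = 0
    print("estimated mu_1:", np.mean(returns))   # approx 3 = 1/pi_1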

34 2.7.1 Existence of a Stationary Distribution on a Finite State Space So far, we have studied general Markov chains in discrete time with a discrete state space. As mentioned before, the state space E could be any countably infinite set, e.g. E = N. There are however, many examples where the state space is indeed finite, i.e. K = Card(E) = E <, e.g. E = {1, 2, 3} or E = {sunny, rainy} etc.. In that case, the theory simplifies and we obtain some very nice results which are very useful in applications. The main results we will establish in this Subsection are the following ones: Existence: A discrete-time Markov chain on a finite state space always has (at least) one stationary distribution. Uniqueness: Every Markov Chain with a finite state space has a unique stationary distribution unless the chain has two or more closed communicating classes. Definition A state i E is essential if for all j such that i j it is also true that j i. A state i E is inessential if it is not essential. Lemma If i is an essential state and i j, then j is essential. Proof. Choose any l E such that j l. We know that i j, hence we have i l. Since i is essential, we have l i and also i j, which implies l j. Remark The previous Lemma implies that all states in a single communicating class are either all essential or inessential. Remark The definition above is equivalent to saying: A state is essential if it belongs to a closed communicating class. Also, a state is essential, if it communicates with every state which is accessible from it. Note that one can show that all inessential states are transient. This implies that all recurrent states are essential! I.e. inessential = transient recurrent = essential. Note here that essential states need not be recurrent. For example, when we consider the asymmetric random walk on E = Z (with < p < 1, p.5), then we have an irreducible Markov chain (which implies in particular that all states are essential), which is transient. However, on a finite state space (i.e. when K < ) the definition of a transient state coincides with that of an inessential state. Also, the definition of a recurrent state coincides with the one of an essential state, i.e. inessential transient recurrent essential. Lemma Every finite chain has at least one essential class. Proof. We can inductively define a sequence (i, i 1,... ) as follows: Fix an arbitrary initial state i. For n 1, given (i, i 1,..., i n 1 ), if i n 1 is essential, stop. Otherwise we need to proceed... Recall that an essential state is a state in a closed communicating class. Hence, if state i n 1 is inessential, it is in a communicating class, which is not closed. Hence, we can find a state i n such that i n 1 i n but i n i n 1. There can be no repeated states in the sequence since if m < n and i n i m, then i n i n 1 which would be a contradiction. Since the state space is finite and since the sequence cannot repeat elements, the sequence must terminate in a state, which is in a closed communicating class, i.e. in an essential state. 32

35 Remark Note that according to Lemma 2.5.7, every finite Markov chain has at least one recurrent state. Then, Remark states that every recurrent class is closed. Hence, combining these results, we get that every finite Markov chain has at least one closed communicating class. This implies the existence of an essential class. So there was actually no need to prove Lemma , since the result is already implied by our previous results! Lemma Let C denote an essential communicating class. Then the transition matrix P restricted to C is stochastic. Often we write P(C) (or P C ) for the restriction of P to C. Proof. Since all elements of P are non-negative, this is also true for P(C). j E p ij = 1 for all i E. Let i C, then p il = for l C. Hence, we have 1 = j E p ij = j C p ij + j E\C p ij = j C p ij. Further, we know that Theorem Suppose we have a finite state space. If π is stationary for the transition matrix P, then π i = for all inessential states i. Proof. Let C denote an essential communicating class and let P denote the transition matrix for the entire chain (not just for the restriction to C). Define πp(c) := j C(πP) j. Note that (πp) j = l C π l p lj + l C π l p lj. Since both sums are finite we can interchange the order of summation and we obtain πp(c) = (πp) j = π l p lj + π l p lj j C j C l C l C = π l p lj + π l p lj. l C j C l C j C 33

36 Recall that for l C we have j C p lj = 1. Hence πp(c) = π l + π l p lj. l C l C j C Define π(c) := l C π l. Since π is a stationary distribution, we have that πp = π, and hence also πp(c) = π(c). This implies that π l p lj =. l C j C The sum consists of only nonnegative elements and is therefore only zero if all elements are zero, i.e. π l p lj =, for all l C, j C. (2.7.5) Suppose now that l is inessential. According to the proof of Lemma we can find a sequence of states l, l 1, l 2,..., l r satisfying p li 1l i > for i = 1,..., r; the states l, l 1, l 2,..., l r 1 are inessential and l r C where C is an essential communicating class. We know that p lr 1l r >, but also π lr 1 p lr 1l r = (from (2.7.5)). Hence π lr 1 =. We can now carry out an induction backwards where we argue that if π li =, then = π li = π j p jli, j E which implies π j p jli = and π j = for j {l r 1, l r 2,..., l }. We conclude that π l =. 34

37 Exercise Suppose we have a Markov chain with state space E = {1, 2, 3, 4} and transition matrix P = Draw a transition diagram an find the communicating classes! Determine whether the classes are (positive) recurrent or transient. Solution: The transition diagram is given by: We have three communicating classes: 1. C 1 = {1, 2} is (positive) recurrent, closed; 2. E 1 = {3} is transient, not closed. 3. C 2 = {4} is (positive) recurrent, closed, absorbing. We can now formulate an important result. Theorem Suppose we have a finite state space. The stationary distribution π for a transition matrix P is unique if and only if there is a unique essential communicating class. Proof of Theorem First suppose there is a unique essential communicating class C. Write P C for the restriction of the matrix P to the states in C. Suppose i C and p ij >. Since i is essential and i j, it must also hold that j i, hence j C. So we know that P C is a transition matrix which has to be irreducible on C (not necessarily on the entire state space!). From Lemma we obtain that all states in C are positive recurrent, hence we can apply Theorem to conclude that there exists a unique stationary distribution π C 1: C for P C. Let π be a stationary distribution on E satisfying π = πp. Theorem tells us that π i = whenever i C. Hence π is supported on C only. 35

Consequently, for i ∈ C, we have

π_i = Σ_{j∈E} π_j p_ji = Σ_{j∈C} π_j p_ji = Σ_{j∈C} π_j p^C_ji,

where π restricted to C is stationary for P_C. Since the stationary distribution for P_C is unique, we get π_i = π^C_i for all i ∈ C. Altogether we have

π_i = π^C_i, if i ∈ C;   π_i = 0, if i ∉ C,

and the solution of π = πP is unique.

Now suppose that there is a unique stationary distribution and two distinct essential communicating classes for P, say C_1 and C_2. Clearly, the restriction of P to each of these classes is irreducible. Therefore, for each i = 1, 2 there exists a measure π^(i) supported on C_i which is stationary for P_{C_i}. We can see (check it!) that each π^(i) is stationary for P. Hence P has a stationary solution, but it is not unique.

2.7.2 Ergodic theorem

Now we formulate the ergodic theorem, which is concerned with the limiting behaviour of averages over time.

Theorem (Ergodic Theorem). Suppose we are given an irreducible Markov chain {X_n}_{n∈N_0} with state space E. Let µ_i denote the mean recurrence time of state i ∈ E and let

V_i(n) = Σ_{k=0}^{n−1} 1_{{X_k = i}}

denote the number of visits to i before time n, so that V_i(n)/n denotes the proportion of time before n spent in state i. Then

P( V_i(n)/n → 1/µ_i, as n → ∞ ) = 1.

See Norris (1997), Chapter 1.10 for a proof.

Altogether, we get the following results. If the chain is irreducible and positive recurrent, then V_i(n)/n → π_i (the unique stationary distribution) as n → ∞. If it is irreducible and null recurrent or transient, we have V_i(n)/n → 0 as n → ∞.
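The ergodic theorem is easy to visualise by simulation. The following sketch (illustrative only, not part of the original notes) reuses the two-state chain from the earlier example and checks that the occupation fractions V_i(n)/n approach the stationary probabilities (1/3, 2/3).

    import numpy as np

    P = np.array([[0.5, 0.5],
                  [0.25, 0.75]])
    rng = np.random.default_rng(4)

    n_steps = 100_000
    visits = np.zeros(2)
    state = 0
    for _ in range(n_steps):
        visits[state] += 1              # count the visit to the current state
        state = rng.choice(2, p=P[state])

    print("V_i(n)/n:", visits / n_steps)   # should be close to (1/3, 2/3)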

39 2.8 Time Reversibility An interesting concept in the study of Markov chains is that of time reversibility. The idea is to reverse the time scale of the Markov chain; such a concept, as we will see in a moment, is very useful for constructing Markov chains with a pre-specified stationary distribution. This is important, for example for Markov chain Monte Carlo (MCMC) algorithms. Define an irreducible, positive recurrent Markov chain {X n } n {,1,...,N} for an N N. We assume that π is the stationary distribution, and P is the transition matrix, and that for any n {, 1,..., N} the marginal distribution is also π. The reversed chain is defined to be, for any n {, 1,..., N} Y n = X N n. Theorem The sequence Y is a Markov chain which satisfies P(Y n+1 = j Y n = i) = π j π i p ji. Proof. P(Y n+1 = i n+1 Y n = i n, Y n 1 = i n 1,..., Y = i ) = P(Y k = i k, k n + 1) P(Y k = i k, k n) = P(X N k = i k, k n + 1). P(X N k = i k, k n) Now we apply Bayes theorem and the Markov property to deduce that Hence P(X N k = i k, k n + 1) = P(X N = i X N k = i k, 1 k n + 1)P(X N k = i k, 1 k n + 1) = P(X N = i X N 1 = i 1 )P(X N k = i k, 1 k n + 1) = P(X N = i X N 1 = i 1 )P(X N 1 = i 1 X N 2 = i 2 ) P(X N n = i n X N n 1 = i n+1 ) P(X N n 1 = i n+1 ) = π in+1 p in+1i n... p i1i Similarly, we get that P(Y n+1 = i n+1 Y n = i n, Y n 1 = i n 1, Y = i ) = π i n+1 p in+1i n... p i1i π in p ini n 1... p i1i P(Y n+1 = i n+1 Y n = i n ) = P(Y n+1 = i n+1, Y n = i n ) P(Y n = i n ) = P(X N n = i n X N n 1 = i n+1 )P(X N n 1 = i n+1 ) P(X N n = i n ) = π i n+1 p in+1i n π in. = P(X N n 1 = i n+1, X N n = i n ) P(X N n = i n ) = π i n+1 p in+1i n π in. 37

So overall we have shown that for any n ∈ N and for any states i_0, ..., i_{n+1} ∈ E we have

P(Y_{n+1} = i_{n+1} | Y_n = i_n, Y_{n-1} = i_{n-1}, ..., Y_0 = i_0) = P(Y_{n+1} = i_{n+1} | Y_n = i_n) = π_{i_{n+1}} p_{i_{n+1} i_n} / π_{i_n},

which completes the proof.

Definition. Let X = {X_n : n ∈ {0, 1, ..., N}} be an irreducible Markov chain such that X_n has the stationary distribution π as its marginal distribution for all n ∈ {0, 1, ..., N}. The Markov chain X is called time reversible if the transition matrices of X and of its time reversal Y are the same.

Theorem. {X_n}_{n ∈ {0,1,...,N}} is time reversible if and only if for any i, j ∈ E

π_i p_{ij} = π_j p_{ji}. (2.8.1)

Note that condition (2.8.1) is often referred to as detailed balance.

Proof. Let Q be the transition matrix of {Y_n}_{n ∈ {0,1,...,N}}. From the above arguments we have q_{ij} = (π_j / π_i) p_{ji}, thus q_{ij} = p_{ij} for all i, j if and only if (2.8.1) holds.

Theorem. For an irreducible chain, if there exists a probability vector π such that (2.8.1) holds for any i, j ∈ E, then the chain is time reversible (once it is in its stationary regime) and positive recurrent, with stationary distribution π.

Proof. Given the detailed balance condition, for any j ∈ E we have

Σ_{i ∈ E} π_i p_{ij} = Σ_{i ∈ E} π_j p_{ji} = π_j Σ_{i ∈ E} p_{ji} = π_j,

thus π is stationary. The remainder of the result follows from the previous theorem.

Essentially, the result tells us that if we want to construct a chain with a given stationary distribution π, then one way to do so is through the detailed balance condition.

Remark. Note that it is possible to extend the definition of time reversibility to an infinite time set {0, 1, 2, ...}, or even to a doubly infinite time set {..., -2, -1, 0, 1, 2, ...}.

Exercise. Let {X_n}_{n ∈ N} denote a Markov chain with state space E = {1, 2, 3} and transition matrix

P =
( 0      p      1-p )
( 1-p    0      p   )
( p      1-p    0   ),

for 0 < p < 1. Is the Markov chain reversible?

Solution: The Markov chain is irreducible, with finite state space. Hence there is a unique stationary distribution, which is given by π = (1/3, 1/3, 1/3). Now we check the detailed balance equations π_i p_{ij} = π_j p_{ji}. Here we need (1/3) p_{ij} = (1/3) p_{ji}, i.e. p_{ij} = p_{ji}

for any i, j ∈ {1, 2, 3}. These equations hold if and only if p = 1 − p, i.e. p = 1/2. So the chain is reversible if and only if p = 1/2.
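The detailed balance check in this exercise is also easy to automate. Below is a minimal Python sketch (the helper name is ours, not from the notes) that tests condition (2.8.1) for the cyclic matrix above with the uniform stationary distribution; it reports reversibility only for p = 1/2.

import numpy as np

def detailed_balance_holds(P, pi, tol=1e-12):
    """Check pi_i p_ij == pi_j p_ji for all i, j (condition (2.8.1))."""
    lhs = pi[:, None] * P          # entry (i, j) is pi_i p_ij
    return np.allclose(lhs, lhs.T, atol=tol)

pi = np.full(3, 1 / 3)             # uniform stationary distribution

for p in (0.3, 0.5):
    P = np.array([
        [0.0,     p, 1 - p],
        [1 - p, 0.0,     p],
        [p,   1 - p,   0.0],
    ])
    print(f"p = {p}: detailed balance holds -> {detailed_balance_holds(P, pi)}")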

2.9 Summary

2.9.1 Properties of irreducible Markov chains

There are three kinds of irreducible Markov chains:

1. Positive recurrent
   (a) A stationary distribution π exists.
   (b) The stationary distribution is unique.
   (c) All mean recurrence times are finite and µ_i = 1/π_i.
   (d) V_i(n)/n → π_i as n → ∞, where V_i(n)/n denotes the proportion of time before n spent in state i.
   (e) If the chain is aperiodic, then P(X_n = i) → π_i for all i ∈ E as n → ∞.

2. Null recurrent
   (a) Recurrent, but all mean recurrence times are infinite.
   (b) No stationary distribution exists.
   (c) V_i(n)/n → 0 as n → ∞.
   (d) P(X_n = i) → 0 for all i ∈ E as n → ∞.

3. Transient
   (a) Any particular state is eventually never visited.
   (b) No stationary distribution exists.
   (c) V_i(n)/n → 0 as n → ∞.
   (d) P(X_n = i) → 0 for all i ∈ E as n → ∞.
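Property 1(e) can be checked numerically: for an irreducible, aperiodic, positive recurrent chain, every row of P^n converges to π. A minimal Python sketch with an arbitrary 3-state matrix (a hypothetical example, not taken from the notes):

import numpy as np

# Arbitrary irreducible, aperiodic transition matrix (hypothetical example).
P = np.array([
    [0.2, 0.5, 0.3],
    [0.6, 0.1, 0.3],
    [0.3, 0.3, 0.4],
])

Pn = np.linalg.matrix_power(P, 50)   # row i of P^n is the law of X_n given X_0 = i
print(Pn)                            # all rows are (numerically) identical

# The common row is the stationary distribution pi, as in property 1(e).
pi = Pn[0]
print("pi P = pi?", np.allclose(pi @ P, pi))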

2.9.2 Markov chains with a finite state space

Suppose we have a Markov chain with a finite state space. Then:
- A stationary distribution always exists.
- The stationary distribution is unique if and only if there is a unique essential communicating class, which (on a finite state space) holds if and only if there is a unique closed communicating class.

If you need to find a stationary distribution, proceed as follows:
- Find all closed communicating classes C_i (e.g. by looking at the transition diagram or by examining the transition matrix P).
- For each closed communicating class C_i, i = 1, 2, ..., solve a system of equations: if C_i is such a closed communicating class, let π^{C_i} denote a Card(C_i)-dimensional row vector with non-negative entries and solve π^{C_i} P_{C_i} = π^{C_i}.
- One possible stationary distribution is then given by the row vector π which consists of the corresponding elements of the vectors π^{C_i}, for i = 1, 2, ..., and of zeros for the inessential (transient) states. Be careful to get the order of the elements right: we often study nicely blocked Markov chains in this course, but that need not be the case in a real application!
- In a final step, derive the conditions needed to ensure that π has non-negative entries and that its elements sum to 1. If you only found one closed class, these conditions should lead to a unique stationary distribution; if you still have free parameters, then there has to be a mistake in your calculations!
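The recipe above can be turned into a short computation. The Python sketch below is one possible implementation under our own naming, not something from the notes: it uses SciPy's strongly connected components to find the communicating classes, keeps the closed ones, solves π_C P_C = π_C on each, and pads with zeros on the transient states. The example matrix is hypothetical.

import numpy as np
from scipy.sparse.csgraph import connected_components

def stationary_on_closed_classes(P):
    """One stationary distribution per closed communicating class (hypothetical helper)."""
    n = P.shape[0]
    # Communicating classes = strongly connected components of the directed
    # graph with an edge i -> j whenever p_ij > 0.
    n_comp, labels = connected_components(P > 0, directed=True, connection='strong')
    results = []
    for c in range(n_comp):
        states = np.where(labels == c)[0]
        P_C = P[np.ix_(states, states)]
        # The class is closed iff no probability leaves it.
        if not np.allclose(P_C.sum(axis=1), 1.0):
            continue
        # Solve pi_C P_C = pi_C together with sum(pi_C) = 1 as a linear system.
        m = len(states)
        A = np.vstack([P_C.T - np.eye(m), np.ones(m)])
        b = np.append(np.zeros(m), 1.0)
        pi_C, *_ = np.linalg.lstsq(A, b, rcond=None)
        pi = np.zeros(n)
        pi[states] = pi_C          # zeros on the transient states
        results.append(pi)
    return results

# Example: {0, 1} is a closed class, state 3 is absorbing, state 2 is transient.
P = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.3, 0.7, 0.0, 0.0],
    [0.2, 0.2, 0.2, 0.4],
    [0.0, 0.0, 0.0, 1.0],
])
for pi in stationary_on_closed_classes(P):
    print(np.round(pi, 4), "stationary:", np.allclose(pi @ P, pi))

Any convex combination of the returned vectors is again stationary, which is exactly why the stationary distribution fails to be unique when there is more than one closed class.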

Chapter 3

Poisson Processes

3.1 Introduction

After having studied Markov chains in discrete time, we now want to study Markov chains in continuous time. The general theory will be introduced in the next chapter. Here we start off with one particular example of a Markov process in continuous time: the Poisson process. Poisson processes are the most basic form of continuous-time stochastic processes. Informally, we have a process that, starting at zero, counts events that occur during some time period; a realisation of the process is displayed in the following figure.

[Figure: a realisation of a Poisson process on [0, 1].]
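A realisation like the one in the figure can be generated with a few lines of code. The Python sketch below (rate, horizon and seed are arbitrary choices) uses the fact, derived later in this chapter, that the inter-arrival times of a rate-λ Poisson process are independent exponential random variables with mean 1/λ.

import numpy as np

rng = np.random.default_rng(1)

lam = 5.0        # rate of the process (arbitrary choice)
t_max = 1.0      # time horizon (arbitrary choice)

# Inter-arrival times are i.i.d. Exp(lam); cumulative sums give the jump times.
arrivals = []
t = rng.exponential(1 / lam)
while t <= t_max:
    arrivals.append(t)
    t += rng.exponential(1 / lam)

# N(t) counts the arrivals up to time t; evaluate it on a grid of times.
grid = np.linspace(0, t_max, 11)
for s in grid:
    print(f"N({s:.1f}) = {np.sum(np.array(arrivals) <= s)}")

Plotting N(t) against t produces the characteristic step function jumping by 1 at each arrival time.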

Such a process is very useful in practice: it can be used as a model for earthquakes, queues, traffic, etc. More interestingly, it can be combined with more complex processes to describe:
- jumps in the value of a stock;
- a process for mutations on a genealogical tree.

Siméon-Denis Poisson (21 June 1781 – 25 April 1840) was a French mathematician, geometer, and physicist. As a digression, the Poisson process (as well as Brownian motion) is a Lévy process, and we will see some common features in the definitions of Poisson processes and Brownian motion. Paul Pierre Lévy (1886 – 1971) was a French mathematician.
