Lectures on Probability and Statistical Models

Phil Pollett, Professor of Mathematics, The University of Queensland. © These materials can be used for any educational purpose provided they are not altered.

Imprecise (intuitive) definition. A Markov process is a random process that forgets its past, in the following sense:
$$\Pr(\text{Future} = y \mid \text{Present} = x \text{ and } \text{Past} = z) = \Pr(\text{Future} = y \mid \text{Present} = x).$$
Thus, given the past and the present state of the process, only the present state is of use in predicting the future.

Equivalently,
$$\Pr(\text{Future} = y \text{ and } \text{Past} = z \mid \text{Present} = x) = \Pr(\text{Future} = y \mid \text{Present} = x)\,\Pr(\text{Past} = z \mid \text{Present} = x),$$
so that, given the present state of the process, its past and its future are independent. If the set of states $S$ is discrete, then the process is called a Markov chain.

Remark. At first sight this definition might appear to cover only trivial examples, but note that the current state could be complicated and could include a record of the recent past.

Andrei Andreyevich Markov (Born: 14/06/1856, Ryazan, Russia; Died: 20/07/1922, St Petersburg, Russia). Markov is famous for his pioneering work on Markov chains, which launched the theory of stochastic processes. His early work was in number theory, analysis, continued fractions, limits of integrals, approximation theory and convergence of series.

Example. There are two rooms, labelled A and B. There is a spider, initially in Room A, hunting a fly that is initially in Room B. They move from room to room independently: every minute each changes rooms (with probability p for the spider and q for the fly) or stays put, with the complementary probabilities. Once in the same room, the spider eats the fly and the hunt ceases. The hunt can be represented as a Markov chain with three states: (0) the spider and the fly are in the same room (the hunt has ended), (1) the spider is in Room A and the fly is in Room B, and, (2) the spider is in Room B and the fly is in Room A.

Eventually we will be able to answer questions like "What is the probability that the hunt lasts more than two minutes?" Let $X_n$ be the state of the process at time $n$ (that is, after $n$ minutes). Then, $X_n \in S = \{0, 1, 2\}$. The set $S$ is called the state space. The initial state is $X_0 = 1$. State 0 is called an absorbing state, because the process remains there once it is reached.

Definition. A sequence $\{X_n,\ n = 0, 1, \dots\}$ of random variables is called a discrete-time stochastic process; $X_n$ usually represents the state of the process at time $n$. If $\{X_n\}$ takes values in a discrete state space $S$, then it is called a Markov chain if
$$\Pr(X_{m+1} = j \mid X_m = i, X_{m-1} = i_{m-1}, \dots, X_0 = i_0) = \Pr(X_{m+1} = j \mid X_m = i) \tag{1}$$
for all time points $m$ and all states $i_0, \dots, i_{m-1}, i, j \in S$. If the right-hand side of (1) is the same for all $m$, then the Markov chain is said to be time homogeneous.

We will consider only time-homogeneous chains, and we shall write
$$p^{(n)}_{ij} = \Pr(X_{m+n} = j \mid X_m = i) = \Pr(X_n = j \mid X_0 = i)$$
for the $n$-step transition probabilities and
$$p_{ij} := p^{(1)}_{ij} = \Pr(X_{m+1} = j \mid X_m = i) = \Pr(X_1 = j \mid X_0 = i)$$
for the 1-step transition probabilities (or simply transition probabilities).

By the law of total probability, we have that
$$\sum_{j \in S} p^{(n)}_{ij} = \sum_{j \in S} \Pr(X_n = j \mid X_0 = i) = 1,$$
and in particular that $\sum_{j \in S} p_{ij} = 1$. The matrix $P^{(n)} = (p^{(n)}_{ij},\ i, j \in S)$ is called the $n$-step transition matrix and $P = (p_{ij},\ i, j \in S)$ is called the 1-step transition matrix (or simply transition matrix).

Remarks. (1) Matrices like this (with non-negative entries and all row sums equal to 1) are called stochastic matrices. Writing $\mathbf{1} = (1, 1, \dots)^T$ (where $T$ denotes transpose), we see that $P\mathbf{1} = \mathbf{1}$. Hence, $P$ (and indeed any stochastic matrix) has an eigenvector $\mathbf{1}$ corresponding to an eigenvalue $\lambda = 1$. (2) We may usefully set $P^{(0)} = I$, where, as usual, $I$ denotes the identity matrix:
$$p^{(0)}_{ij} = \delta_{ij} := \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}$$

Example. Returning to the hunt, the three states were: (0) the spider and the fly are in the same room, (1) the spider is in Room A and the fly is in Room B, and (2) the spider is in Room B and the fly is in Room A. Since the spider changes rooms with probability $p$ and the fly changes rooms with probability $q$,
$$P = \begin{pmatrix} 1 & 0 & 0 \\ r & (1-p)(1-q) & pq \\ r & pq & (1-p)(1-q) \end{pmatrix},$$
where $r = p(1-q) + q(1-p) = p + q - 2pq = 1 - [(1-p)(1-q) + pq]$.

For example, if $p = 1/4$ and $q = 1/2$, then
$$P = \begin{pmatrix} 1 & 0 & 0 \\ 1/2 & 3/8 & 1/8 \\ 1/2 & 1/8 & 3/8 \end{pmatrix}.$$
What is the chance that the hunt is over by $n$ minutes? Can we calculate the chance of being in each of the various states after $n$ minutes?
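The transition matrix of the hunt can be built directly from $p$ and $q$. The following is a minimal Python sketch using exact fractions; the function name `hunt_matrix` and the variable names are illustrative choices, not part of the lectures.

```python
from fractions import Fraction

def hunt_matrix(p, q):
    """Transition matrix of the spider-and-fly chain on states 0, 1, 2."""
    stay_apart = (1 - p) * (1 - q)   # neither creature changes rooms
    swap = p * q                     # both change rooms, so they swap
    r = p * (1 - q) + q * (1 - p)    # exactly one changes rooms: they meet
    return [
        [Fraction(1), Fraction(0), Fraction(0)],  # state 0 is absorbing
        [r, stay_apart, swap],
        [r, swap, stay_apart],
    ]

P = hunt_matrix(Fraction(1, 4), Fraction(1, 2))
row_sums = [sum(row) for row in P]   # each row of a stochastic matrix sums to 1
print(P[1])
print(row_sums)
```

Passing `Fraction` arguments keeps every entry exact, so the row sums can be compared with 1 without rounding error.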

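By the law of total probability, we have
$$p^{(n+m)}_{ij} = \Pr(X_{n+m} = j \mid X_0 = i) = \sum_{k \in S} \Pr(X_{n+m} = j \mid X_n = k, X_0 = i)\,\Pr(X_n = k \mid X_0 = i).$$
But,
$$\begin{aligned} \Pr(X_{n+m} = j \mid X_n = k, X_0 = i) &= \Pr(X_{n+m} = j \mid X_n = k) && \text{(Markov property)} \\ &= \Pr(X_m = j \mid X_0 = k) && \text{(time homogeneous)} \end{aligned}$$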

and so, for all $m, n \geq 1$,
$$p^{(n+m)}_{ij} = \sum_{k \in S} p^{(n)}_{ik}\, p^{(m)}_{kj}, \quad i, j \in S,$$
or, equivalently, in terms of transition matrices, $P^{(n+m)} = P^{(n)} P^{(m)}$. Thus, in particular, we have $P^{(n)} = P^{(n-1)} P$ (remembering that $P := P^{(1)}$). Therefore, $P^{(n)} = P^n$, $n \geq 1$. Note that since $P^{(0)} = I = P^0$, this expression is valid for all $n \geq 0$.

Example. Returning to the hunt, if the spider and the fly change rooms with probability $p = 1/4$ and $q = 1/2$, respectively, then
$$P = \begin{pmatrix} 1 & 0 & 0 \\ 1/2 & 3/8 & 1/8 \\ 1/2 & 1/8 & 3/8 \end{pmatrix}.$$
A simple calculation gives
$$P^2 = \begin{pmatrix} 1 & 0 & 0 \\ 3/4 & 5/32 & 3/32 \\ 3/4 & 3/32 & 5/32 \end{pmatrix},$$

$$P^3 = \begin{pmatrix} 1 & 0 & 0 \\ 7/8 & 9/128 & 7/128 \\ 7/8 & 7/128 & 9/128 \end{pmatrix},$$
et cetera, and, to four decimal places,
$$P^{15} = \begin{pmatrix} 1 & 0 & 0 \\ 1.0000 & 0.0000 & 0.0000 \\ 1.0000 & 0.0000 & 0.0000 \end{pmatrix}.$$
Recall that $X_0 = 1$, so $p^{(n)}_{10}$ is the probability that the hunt ends by $n$ minutes. What, then, is the probability that the hunt lasts more than two minutes? Answer: $1 - p^{(2)}_{10} = 1 - 3/4 = 1/4$.
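These matrix powers are easy to reproduce numerically. The sketch below, in Python with exact fractions, computes $P^2$, $P^3$ and a rounded $P^{15}$ for the hunt with $p = 1/4$ and $q = 1/2$, and checks one instance of the identity $P^{(n+m)} = P^{(n)} P^{(m)}$. The helper names `mat_mul` and `mat_pow` are illustrative assumptions.

```python
from fractions import Fraction as F

def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, n):
    """n-step transition matrix P^n, with P^0 = I."""
    size = len(P)
    result = [[F(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result

# Hunt chain with p = 1/4 and q = 1/2.
P = [[F(1), F(0), F(0)],
     [F(1, 2), F(3, 8), F(1, 8)],
     [F(1, 2), F(1, 8), F(3, 8)]]

P2 = mat_pow(P, 2)
P3 = mat_pow(P, 3)
P15_rounded = [[f"{float(x):.4f}" for x in row] for row in mat_pow(P, 15)]

# One instance of P^(n+m) = P^(n) P^(m), with n = 2 and m = 3.
chapman_kolmogorov_holds = mat_pow(P, 5) == mat_mul(P2, P3)

# The hunt starts in state 1, so it lasts more than two minutes
# with probability 1 - p_10^(2).
prob_more_than_two = 1 - P2[1][0]
print(prob_more_than_two)   # 1/4
```

Repeated multiplication is adequate for a 3-state chain; for large $n$ one would normally use repeated squaring instead.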

Arbitrary initial conditions. What if we are unsure about where the process starts? Let $\pi^{(n)}_j = \Pr(X_n = j)$ and define a row vector $\pi^{(n)} = (\pi^{(n)}_j,\ j \in S)$, being the distribution of the chain at time $n$. Suppose that we know the initial distribution $\pi^{(0)}$, that is, the distribution of $X_0$ (in the previous example we had $\pi^{(0)} = (0\ \ 1\ \ 0)$).

By the law of total probability, we have
$$\pi^{(n)}_j = \Pr(X_n = j) = \sum_{i \in S} \Pr(X_n = j \mid X_0 = i)\,\Pr(X_0 = i) = \sum_{i \in S} \pi^{(0)}_i\, p^{(n)}_{ij},$$
and so $\pi^{(n)} = \pi^{(0)} P^n$, $n \geq 0$.

Definition. If $\pi^{(n)} = \pi$ is the same for all $n$, then $\pi$ is called a stationary distribution. If $\lim_{n \to \infty} \pi^{(n)}$ exists and equals $\pi$, then $\pi$ is called a limiting distribution.

Example. Returning to the hunt with $p = 1/4$ and $q = 1/2$, suppose that, at the beginning of the hunt, each creature is equally likely to be in either room, so that $\pi^{(0)} = (1/2\ \ 1/4\ \ 1/4)$. Then,
$$\pi^{(n)} = \pi^{(0)} P^n = (1/2\ \ 1/4\ \ 1/4) \begin{pmatrix} 1 & 0 & 0 \\ 1/2 & 3/8 & 1/8 \\ 1/2 & 1/8 & 3/8 \end{pmatrix}^{n}.$$

For example,
$$\pi^{(3)} = (1/2\ \ 1/4\ \ 1/4) \begin{pmatrix} 1 & 0 & 0 \\ 1/2 & 3/8 & 1/8 \\ 1/2 & 1/8 & 3/8 \end{pmatrix}^{3} = (1/2\ \ 1/4\ \ 1/4) \begin{pmatrix} 1 & 0 & 0 \\ 7/8 & 9/128 & 7/128 \\ 7/8 & 7/128 & 9/128 \end{pmatrix} = (15/16\ \ 1/32\ \ 1/32).$$
So, if, initially, each creature is equally likely to be in either room, then the probability that the hunt ends within 3 minutes is $15/16$.
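The distribution at time $n$ can also be obtained without forming $P^n$, by multiplying the row vector by $P$ a total of $n$ times. A minimal Python sketch for the hunt, assuming the same $p = 1/4$ and $q = 1/2$ (the function name `distribution_at` is illustrative):

```python
from fractions import Fraction as F

# Hunt chain with p = 1/4 and q = 1/2.
P = [[F(1), F(0), F(0)],
     [F(1, 2), F(3, 8), F(1, 8)],
     [F(1, 2), F(1, 8), F(3, 8)]]

def distribution_at(n, pi0, P):
    """Return pi(n) = pi(0) P^n using n vector-matrix multiplications."""
    dist = list(pi0)
    for _ in range(n):
        dist = [sum(dist[i] * P[i][j] for i in range(len(P)))
                for j in range(len(P))]
    return dist

pi0 = [F(1, 2), F(1, 4), F(1, 4)]   # each creature equally likely in either room
pi3 = distribution_at(3, pi0, P)
print(pi3[0])   # 15/16
```

Each step costs one vector-matrix product, which is cheaper than a full matrix product when only a single initial distribution is of interest.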

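The two-state chain. Let $S = \{0, 1\}$ and let
$$P = \begin{pmatrix} 1-p & p \\ q & 1-q \end{pmatrix},$$
where $p, q \in (0, 1)$. It can be shown that
$$P = \frac{1}{p+q} \begin{pmatrix} 1 & -p \\ 1 & q \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & r \end{pmatrix} \begin{pmatrix} q & p \\ -1 & 1 \end{pmatrix},$$
where $r = 1 - p - q$. This is of the form $P = V D V^{-1}$. Check it! (The procedure is called diagonalization.)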

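This is good news because
$$P^2 = (V D V^{-1})(V D V^{-1}) = V D (V^{-1} V) D V^{-1} = V (D I D) V^{-1} = V D^2 V^{-1}.$$
Similarly, $P^n = V D^n V^{-1}$ for all $n \geq 1$. Hence,
$$P^{(n)} = \frac{1}{p+q} \begin{pmatrix} 1 & -p \\ 1 & q \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & r^n \end{pmatrix} \begin{pmatrix} q & p \\ -1 & 1 \end{pmatrix} = \frac{1}{p+q} \begin{pmatrix} q + p r^n & p - p r^n \\ q - q r^n & p + q r^n \end{pmatrix}.$$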

Thus we have an explicit expression for the n-step transition probabilities. Remark. The above procedure generalizes to any Markov chain with a finite state space.
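The explicit expression for the two-state chain, with entries $(q + pr^n)/(p+q)$, $(p - pr^n)/(p+q)$, $(q - qr^n)/(p+q)$ and $(p + qr^n)/(p+q)$ where $r = 1 - p - q$, can be checked against direct matrix multiplication. The Python sketch below does this with exact fractions; the values $p = 1/3$, $q = 1/5$ and the function names are illustrative assumptions.

```python
from fractions import Fraction as F

def two_state_power(p, q, n):
    """Closed-form P^n for the chain with P = [[1-p, p], [q, 1-q]]."""
    r = 1 - p - q
    s = p + q
    return [[(q + p * r**n) / s, (p - p * r**n) / s],
            [(q - q * r**n) / s, (p + q * r**n) / s]]

def brute_force_power(p, q, n):
    """P^n by repeated multiplication, starting from the identity."""
    P = [[1 - p, p], [q, 1 - q]]
    result = [[F(1), F(0)], [F(0), F(1)]]
    for _ in range(n):
        result = [[sum(result[i][k] * P[k][j] for k in range(2))
                   for j in range(2)] for i in range(2)]
    return result

p, q = F(1, 3), F(1, 5)   # illustrative values in (0, 1)
agree = all(two_state_power(p, q, n) == brute_force_power(p, q, n)
            for n in range(8))
print(agree)   # True
```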

If the initial distribution is $\pi^{(0)} = (a\ \ b)$, then, since $\pi^{(n)} = \pi^{(0)} P^n$,
$$\Pr(X_n = 0) = \frac{q + (ap - bq)\, r^n}{p + q}, \qquad \Pr(X_n = 1) = \frac{p - (ap - bq)\, r^n}{p + q}.$$
(You should check this for $n = 0$ and $n = 1$.) Notice that when $ap = bq$, we have $\Pr(X_n = 0) = 1 - \Pr(X_n = 1) = q/(p+q)$, for all $n \geq 0$, so that $\pi = (q/(p+q)\ \ p/(p+q))$ is a stationary distribution.

Notice also that $|r| < 1$, since $p, q \in (0, 1)$. Therefore, $\pi$ is also a limiting distribution because
$$\lim_{n \to \infty} \Pr(X_n = 0) = q/(p+q), \qquad \lim_{n \to \infty} \Pr(X_n = 1) = p/(p+q).$$
Remark. If, for a general Markov chain, a limiting distribution $\pi$ exists, then it is a stationary distribution, that is, $\pi P = \pi$ ($\pi$ is a left eigenvector corresponding to the eigenvalue 1). For details (and the converse), you will need a more advanced course on Stochastic Processes.
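Both properties of $\pi = (q/(p+q)\ \ p/(p+q))$ can be illustrated numerically for the two-state chain. The Python sketch below checks $\pi P = \pi$ exactly and then iterates from a different starting distribution to watch the gap to $\pi$ shrink; the values $p = 1/4$, $q = 1/2$ and the helper `step` are illustrative assumptions.

```python
from fractions import Fraction as F

def step(pi, P):
    """One step of the chain: the row vector pi multiplied by P."""
    return [sum(pi[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]

p, q = F(1, 4), F(1, 2)            # illustrative values in (0, 1)
P = [[1 - p, p], [q, 1 - q]]
pi = [q / (p + q), p / (p + q)]    # candidate stationary distribution

is_stationary = step(pi, P) == pi  # pi P = pi

# Start in state 0 with certainty and iterate 20 times.
dist = [F(1), F(0)]
for _ in range(20):
    dist = step(dist, P)
gap = max(abs(dist[j] - pi[j]) for j in range(2))
print(is_stationary, float(gap))
```

Here $r = 1 - p - q = 1/4$, so the gap decays geometrically by a factor of 4 per step.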

Example. Max (a dog) is subjected to a series of trials, in each of which he is given a choice of going to a dish to his left, containing tasty food, or a dish to his right, containing food with an unpleasant taste. Suppose that if, on any given occasion, Max goes to the left, then he will return there on the next occasion with probability 0.99, while if he goes to the right, he will do so on the next occasion with probability 0.1 (Max is smart, but he is not infallible).

[Photograph: Poppy and Max]

Let $X_n$ be 0 or 1 according as Max chooses the dish to the left or the dish to the right on trial $n$. Then, $\{X_n\}$ is a two-state Markov chain with $p = 0.01$ and $q = 0.9$ and hence $r = 0.09$. Therefore, if the first dish is chosen at random (at time $n = 1$), then Max chooses the tasty food on the $n$-th trial with probability
$$\frac{90}{91} - \frac{89}{182}\,(0.09)^{n-1},$$
the long-term probability being $90/91$.
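As a check on Max's probability of choosing the tasty food, $90/91 - (89/182)(0.09)^{n-1}$, the Python sketch below compares this closed form with a direct iteration of the two-state chain started from a fair coin flip at trial 1. The function names are illustrative assumptions.

```python
from fractions import Fraction as F

p, q = F(1, 100), F(9, 10)   # left-to-right and right-to-left switching probabilities
r = 1 - p - q                # equals 9/100

def prob_tasty_closed_form(n):
    """Pr(X_n = 0) from the closed-form expression."""
    return F(90, 91) - F(89, 182) * r ** (n - 1)

def prob_tasty_iterated(n):
    """Pr(X_n = 0) by stepping the chain from trial 1 to trial n."""
    left, right = F(1, 2), F(1, 2)   # the first dish is chosen at random
    for _ in range(n - 1):
        left, right = left * (1 - p) + right * q, left * p + right * (1 - q)
    return left

for n in (1, 2, 5):
    print(n, float(prob_tasty_closed_form(n)))
```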

Birth-death chains. Their state space $S$ is either the integers, the non-negative integers, or $\{0, 1, \dots, N\}$, and jumps of size greater than 1 are not permitted; their transition probabilities are therefore of the form $p_{i,i+1} = a_i$, $p_{i,i-1} = b_i$ and $p_{ii} = 1 - a_i - b_i$, with $p_{ij} = 0$ otherwise. The birth probabilities $(a_i)$ and the death probabilities $(b_i)$ are strictly positive and satisfy $a_i + b_i \leq 1$, except perhaps at the boundaries of $S$, where they could be 0. If $a_i = a$ and $b_i = b$, the chain is called a random walk.
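For a finite state space $\{0, 1, \dots, N\}$, the transition matrix of a birth-death chain is tridiagonal and can be assembled from the birth and death probabilities. A minimal Python sketch follows; the function name `birth_death_matrix` and the example values ($a = 1/3$, $b = 1/2$ in the interior, with both boundaries absorbing) are illustrative assumptions.

```python
from fractions import Fraction as F

def birth_death_matrix(a, b):
    """Tridiagonal transition matrix on {0, ..., N}.

    a[i] and b[i] are the birth and death probabilities in state i.
    The caller must supply b[0] == 0 and a[N] == 0 so that rows sum to 1.
    """
    N = len(a) - 1
    P = [[F(0)] * (N + 1) for _ in range(N + 1)]
    for i in range(N + 1):
        if i < N:
            P[i][i + 1] = a[i]
        if i > 0:
            P[i][i - 1] = b[i]
        P[i][i] = 1 - a[i] - b[i]
    return P

# Random walk on {0, ..., 4} with absorbing boundaries (illustrative values).
a = [F(0), F(1, 3), F(1, 3), F(1, 3), F(0)]
b = [F(0), F(1, 2), F(1, 2), F(1, 2), F(0)]
P = birth_death_matrix(a, b)
print(P[2])
```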

Gambler's ruin. A gambler successively wagers a single unit in an even-money game. $X_n$ is his capital after $n$ bets and $S = \{0, 1, \dots, N\}$. If his capital reaches $N$ he stops and leaves happy, while state 0 corresponds to bust. Here $a_i = b_i = 1/2$, except at the boundaries (0 and $N$ are absorbing states). It is easy to show that the player goes bust with probability $1 - i/N$ if his initial capital is $i$.
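One way to confirm the ruin probability $1 - i/N$ is a first-step argument (an approach assumed here, not spelled out in the text): if $h_i$ denotes the probability of going bust from capital $i$, then $h_0 = 1$, $h_N = 0$, and conditioning on the first bet gives $h_i = \tfrac12 h_{i-1} + \tfrac12 h_{i+1}$ for $0 < i < N$. The Python sketch below checks that $h_i = 1 - i/N$ satisfies these equations; the function names are illustrative.

```python
from fractions import Fraction as F

def ruin_probabilities(N):
    """Candidate ruin probabilities h_i = 1 - i/N for i = 0, ..., N."""
    return [1 - F(i, N) for i in range(N + 1)]

def satisfies_first_step_equations(h):
    """Check h_0 = 1, h_N = 0 and h_i = (h_{i-1} + h_{i+1}) / 2 in between."""
    N = len(h) - 1
    interior_ok = all(h[i] == (h[i - 1] + h[i + 1]) / 2 for i in range(1, N))
    return h[0] == 1 and h[N] == 0 and interior_ok

h = ruin_probabilities(10)
print(h[3], satisfies_first_step_equations(h))   # 7/10 True
```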

The Ehrenfest diffusion model. $N$ particles are allowed to pass through a small aperture between two chambers A and B. We assume that at each time epoch $n$, a single particle, chosen uniformly and at random from the $N$, passes through the aperture. Let $X_n$ be the number in chamber A at time $n$. Then, $S = \{0, 1, \dots, N\}$ and, for $i \in S$, $a_i = 1 - i/N$ and $b_i = i/N$. In this model, 0 and $N$ are reflecting barriers. It is easy to show that the stationary distribution is binomial $B(N, 1/2)$.
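The claim about the binomial stationary distribution can be verified exactly for a small number of particles. The Python sketch below builds the Ehrenfest transition matrix with $a_i = 1 - i/N$ and $b_i = i/N$, and checks that the $B(N, 1/2)$ probabilities are unchanged by one step of the chain. The choice $N = 6$ and the function names are illustrative assumptions.

```python
from fractions import Fraction as F
from math import comb

def ehrenfest_matrix(N):
    """Transition matrix for the number of particles in chamber A."""
    P = [[F(0)] * (N + 1) for _ in range(N + 1)]
    for i in range(N + 1):
        if i < N:
            P[i][i + 1] = 1 - F(i, N)   # a_i: a particle moves from B to A
        if i > 0:
            P[i][i - 1] = F(i, N)       # b_i: a particle moves from A to B
    return P

def binomial_half(N):
    """Probabilities of the binomial B(N, 1/2) distribution."""
    return [F(comb(N, j), 2**N) for j in range(N + 1)]

N = 6
P = ehrenfest_matrix(N)
pi = binomial_half(N)
pi_next = [sum(pi[i] * P[i][j] for i in range(N + 1)) for j in range(N + 1)]
print(pi_next == pi)   # True
```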

Population models. Here $X_n$ is the size of the population at time $n$ (for example, at the end of the $n$-th breeding cycle, or at the time of the $n$-th census). $S = \{0, 1, \dots\}$, or $S = \{0, 1, \dots, N\}$ when there is an upper limit $N$ on the population size (frequently interpreted as the carrying capacity). Usually 0 is an absorbing state, corresponding to population extinction, and $N$ is reflecting.

Example. Take $S = \{0, 1, \dots\}$ with $a_0 = 0$ and, for $i \geq 1$, $a_i = a > 0$ and $b_i = b > 0$, where $a + b = 1$. It can be shown that extinction occurs with probability 1 when $a \leq b$, and with probability $(b/a)^i$ when $a > b$, where $i$ is the initial population size. This is a good simple model for a population of cells: $a = \lambda/(\lambda + \mu)$ and $b = \mu/(\lambda + \mu)$, where $\mu$ and $\lambda$ are, respectively, the death and the cell division rates.
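The extinction probability for the cell model is simple to evaluate. The Python sketch below encodes it and, as a consistency check (a first-step argument assumed here rather than taken from the text), confirms that $h_i = (b/a)^i$ satisfies $h_i = a\,h_{i+1} + b\,h_{i-1}$ with $h_0 = 1$. The rates $\lambda = 2$ and $\mu = 1$ are illustrative, and the function expects `Fraction` arguments.

```python
from fractions import Fraction as F

def extinction_probability(i, lam, mu):
    """Extinction probability from initial size i, with a = lam / (lam + mu)."""
    a = lam / (lam + mu)
    b = mu / (lam + mu)
    return F(1) if a <= b else (b / a) ** i

lam, mu = F(2), F(1)   # illustrative rates: division faster than death
a, b = lam / (lam + mu), mu / (lam + mu)
h = [extinction_probability(i, lam, mu) for i in range(8)]

# First-step check: h_0 = 1 and h_i = a h_{i+1} + b h_{i-1} for i = 1, ..., 6.
consistent = h[0] == 1 and all(
    h[i] == a * h[i + 1] + b * h[i - 1] for i in range(1, 7))
print(h[3], consistent)   # 1/8 True
```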

The logistic model. This has $S = \{0, \dots, N\}$, with 0 absorbing and $N$ reflecting, and, for $i = 1, \dots, N-1$,
$$a_i = \frac{\lambda(1 - i/N)}{\mu + \lambda(1 - i/N)}, \qquad b_i = \frac{\mu}{\mu + \lambda(1 - i/N)}.$$
Here $\lambda$ and $\mu$ are birth and death rates. Notice that the birth and the death probabilities depend on $i$ only through $i/N$, a quantity which is proportional to the population density: $i/N = (i/\text{area})/(N/\text{area})$. Models with this property are called density dependent.
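Density dependence in the logistic model can be seen directly by evaluating the birth and death probabilities $a_i = \lambda(1 - i/N)/(\mu + \lambda(1 - i/N))$ and $b_i = \mu/(\mu + \lambda(1 - i/N))$ at two population sizes with the same ratio $i/N$. In the Python sketch below, the function name `logistic_probs` and the rates $\lambda = 3$, $\mu = 1$ are illustrative assumptions.

```python
from fractions import Fraction as F

def logistic_probs(i, N, lam, mu):
    """Return (a_i, b_i) for the logistic model at an interior state i."""
    birth = lam * (1 - F(i, N))      # birth rate reduced by crowding
    return birth / (mu + birth), mu / (mu + birth)

lam, mu = F(3), F(1)                 # illustrative birth and death rates
small = logistic_probs(2, 10, lam, mu)
large = logistic_probs(20, 100, lam, mu)   # same density i/N = 1/5
print(small, small == large)
```

The two results coincide because the probabilities depend on $i$ and $N$ only through $i/N$.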

Telecommunications. (1) A communications link in a telephone network has $N$ circuits. One circuit is held by each call for its duration. Calls arrive at rate $\lambda > 0$ and are completed at rate $\mu > 0$. Let $X_n$ be the number of calls in progress at the $n$-th time epoch (when an arrival or a departure occurs). Then, $S = \{0, \dots, N\}$, with 0 and $N$ both reflecting barriers, and, for $i = 1, \dots, N-1$,
$$a_i = \frac{\lambda}{\lambda + i\mu}, \qquad b_i = \frac{i\mu}{\lambda + i\mu}.$$

(2) At a node in a packet-switching network, data packets are stored in a buffer of size $N$. They arrive at rate $\lambda > 0$ and are transmitted one at a time (in the order in which they arrive) at rate $\mu > 0$. Let $X_n$ be the number of packets yet to be transmitted just after the $n$-th time epoch (an arrival or a departure). Then, $S = \{0, \dots, N\}$, with 0 and $N$ both reflecting barriers, and, for $i = 1, \dots, N-1$,
$$a_i = \frac{\lambda}{\lambda + \mu}, \qquad b_i = \frac{\mu}{\lambda + \mu}.$$
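The two telecommunications models differ only in how the departure rate depends on the state: in the link model each of the $i$ calls in progress can finish, while in the buffer model only one packet is transmitted at a time. The Python sketch below makes this contrast explicit for interior states; the function names and the rates $\lambda = 3$, $\mu = 1$ are illustrative assumptions.

```python
from fractions import Fraction as F

def link_probs(i, lam, mu):
    """(a_i, b_i) for the N-circuit link with i calls in progress."""
    total = lam + i * mu             # arrivals compete with i active calls
    return lam / total, i * mu / total

def buffer_probs(lam, mu):
    """(a_i, b_i) for the packet buffer; the same for every interior state."""
    total = lam + mu                 # arrivals compete with one transmission
    return lam / total, mu / total

lam, mu = F(3), F(1)                 # illustrative arrival and service rates
print(link_probs(2, lam, mu))
print(buffer_probs(lam, mu))
```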

Genetic models. The simplest of these is the Wright-Fisher model. There are $N$ individuals, each of one of two genetic types, A-type and a-type. Mutation (if any) occurs at birth. We assume that A-types are selectively superior in that the relative survival rate of A-type over a-type individuals in successive generations is $\gamma > 1$. Let $X_n$ be the number of A-type individuals, so that $N - X_n$ is the number of a-type.

Wright and Fisher postulated that the composition of the next generation is determined by $N$ Bernoulli trials, where the probability $p_i$ of producing an A-type offspring is given by
$$p_i = \frac{\gamma[i(1-\alpha) + (N-i)\beta]}{\gamma[i(1-\alpha) + (N-i)\beta] + [i\alpha + (N-i)(1-\beta)]},$$
where $\alpha$ and $\beta$ are the respective mutation probabilities. We have $S = \{0, \dots, N\}$ and
$$p_{ij} = \binom{N}{j}\, p_i^{\,j}\, (1 - p_i)^{N-j}, \qquad i, j \in S.$$
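The Wright-Fisher transition matrix can be computed row by row: each row $i$ is a binomial distribution with $N$ trials and success probability $p_i$. The Python sketch below does this with exact fractions; the function name and the parameter values ($N = 4$, $\gamma = 3/2$, $\alpha = 1/100$, $\beta = 1/50$) are illustrative assumptions.

```python
from fractions import Fraction as F
from math import comb

def wright_fisher_matrix(N, gamma, alpha, beta):
    """Transition matrix for the number of A-type individuals."""
    P = []
    for i in range(N + 1):
        # Weight of A-type offspring: selection factor gamma times the
        # unmutated A-types plus the a-types that mutate to A.
        num = gamma * (i * (1 - alpha) + (N - i) * beta)
        den = num + (i * alpha + (N - i) * (1 - beta))
        p_i = num / den
        P.append([comb(N, j) * p_i**j * (1 - p_i)**(N - j)
                  for j in range(N + 1)])
    return P

P = wright_fisher_matrix(4, F(3, 2), F(1, 100), F(1, 50))
rows_sum_to_one = all(sum(row) == 1 for row in P)

# With no mutation, states 0 and N are absorbing (fixation of one type).
P_no_mutation = wright_fisher_matrix(4, F(3, 2), F(0), F(0))
print(rows_sum_to_one, P_no_mutation[0][0], P_no_mutation[4][4])
```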