MARKOV CHAINS: STATIONARY DISTRIBUTIONS AND FUNCTIONS ON STATE SPACES

JAMES READY

Abstract. In this paper, we first introduce the concepts of Markov chains and their stationary distributions. We then discuss total variation distance and mixing times to bound the rate of convergence of a Markov chain to its stationary distribution. Functions on state spaces are then considered, including a discussion of which properties of Markov chains are preserved under functions, and we will show that the mixing time of a Markov chain is greater than or equal to the mixing time of its image.

1. Introduction

Definition 1.1. A Markov Chain with countable state space Ω is a sequence of Ω-valued random variables X_1, X_2, X_3, ..., such that for any states x_i and any time n,

P{X_n = x_n | X_{n−1} = x_{n−1}, X_{n−2} = x_{n−2}, ..., X_0 = x_0} = P{X_n = x_n | X_{n−1} = x_{n−1}}

This definition says that the state of a Markov chain depends only on the state immediately preceding it, and is independent of any behavior of the chain before that. We will often denote the probability of going from one state to another, P{X_n = x_n | X_{n−1} = x_{n−1}}, as p(x_{n−1}, x_n). To refer to the probability of going from one state to another in j steps, P{X_{n+j} = y | X_n = x}, we will use the notation p^j(x, y). We will also often refer to the transition probability matrix P of a Markov chain. This means the |Ω| × |Ω| matrix where, for all i, j in Ω, P_ij = p(i, j).

Definition 1.2. A Markov Chain X_n with transition probability matrix P is irreducible if for any two states x, y there exists an integer t (possibly depending on x and y) such that p^t(x, y) > 0.

By this definition, if a Markov chain X_n is irreducible, there is positive probability that, starting at any state, the Markov chain will reach any other state in finite time.

Definition 1.3. Let T(x) = {t : p^t(x, x) > 0} be the set of times t for which there is a positive probability for the chain to return to its starting position x at time t. The period of state x is defined to be the greatest common divisor of T(x).

Date: August 3, 2010.
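Definition 1.3 can be checked numerically for small chains. The sketch below (the helper names, the pure-Python matrix multiply, and the truncation at max_t are my own illustration, not from the paper) approximates the period of a state by taking the gcd of the return times observed up to max_t:

```python
from math import gcd

def matmul(A, B):
    """Multiply two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def period(P, x, max_t=50):
    """gcd of {t <= max_t : p^t(x, x) > 0}, approximating the period of state x."""
    g = 0
    Pt = P
    for t in range(1, max_t + 1):
        if Pt[x][x] > 0:
            g = gcd(g, t)
        Pt = matmul(Pt, P)
    return g

# Deterministic 3-cycle: returns to a state only at multiples of 3.
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
# A chain with a self-loop at state 0: that state has period 1.
lazy = [[0.5, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0.5]]
```

For the deterministic 3-cycle the only return times are multiples of 3, so the gcd is 3; the second chain can return to state 0 in one step, so its period is 1.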
Definition 1.4. A Markov Chain X_n is said to be aperiodic if the period of all of its states is 1.

We will often discuss Markov chains that are both irreducible and aperiodic. In this case, the following theorem is useful:

Theorem 1.5. If X_n is an aperiodic and irreducible Markov chain with transition probability matrix P on some finite state space Ω, then there exists an integer M such that for all m > M, p^m(x, y) > 0 for all x, y ∈ Ω.

Proof. For any x, recall from the definition of periodicity that T(x) = {t : p^t(x, x) > 0}, and since X_n is aperiodic, we know gcd(T(x)) = 1. Consider any s and t in T(x). Since

p^{s+t}(x, x) ≥ p^s(x, x) p^t(x, x) > 0,

we know that s + t is in T(x) as well, so T(x) is closed under addition. We will now use the fact from number theory that any set of non-negative integers which is closed under addition and which has greatest common divisor 1 must contain all but finitely many of the non-negative integers. This implies that we can find some t_x such that, for all t ≥ t_x, t ∈ T(x).

Fix any x, y in Ω. Because X_n is irreducible, there exists some k(x,y) such that p^{k(x,y)}(x, y) > 0. Thus, for t ≥ t_x + k(x,y),

p^t(x, y) ≥ p^{t − k(x,y)}(x, x) p^{k(x,y)}(x, y) > 0

Now, for any x, let t′_x = t_x + max_y k(x,y). For all y, we have p^{t′_x}(x, y) > 0. We then know that for all m ≥ max_x t′_x, we have p^m(x, y) > 0 for any x, y.

2. Stationary Distributions

Definition 2.1. For any Markov chain X_n with transition probability matrix P, a stationary distribution of X_n is any probability distribution π satisfying the condition that

π = πP

By matrix multiplication, it is equivalent that, for any y,

π(y) = Σ_x π(x) p(x, y)

I will show in this section that, for any Markov chain X_n on a finite state space, the stationary distribution π exists, and that if X_n is irreducible, then π is unique. In the next section, I will show that if X_n is an irreducible and aperiodic Markov chain, there is a notion of convergence to its stationary distribution.
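The defining condition π = πP of Definition 2.1 is easy to verify directly for a small chain. A minimal sketch (the two-state matrix and the helper names are my own example, not from the paper):

```python
def vec_mat(pi, P):
    """Row vector times matrix: the distribution after one step of the chain."""
    n = len(P)
    return [sum(pi[x] * P[x][y] for x in range(n)) for y in range(n)]

def is_stationary(pi, P, tol=1e-12):
    """Check the condition pi = pi P of Definition 2.1, up to rounding."""
    return all(abs(a - b) <= tol for a, b in zip(pi, vec_mat(pi, P)))

P = [[0.9, 0.1], [0.2, 0.8]]   # a two-state chain (illustrative example)
pi = [2 / 3, 1 / 3]            # candidate stationary distribution
```

For this chain π = (2/3, 1/3) satisfies π = πP, while the uniform distribution (1/2, 1/2) does not.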
For most of the theorems in this section, the condition of a finite state space is necessary. To show this, consider the simple random walk on Z, which is a Markov chain represented by the transition probabilities:

p(x, y) = 1/2 if |x − y| = 1, and 0 otherwise.
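The walk's failure to settle down can also be seen numerically: iterating its transition kernel from a point mass spreads the probability out indefinitely — the variance grows linearly in t and the largest point mass shrinks toward 0 — so no limiting probability distribution exists. A small sketch, with helper names of my own:

```python
def srw_step(dist):
    """One step of the simple random walk on Z; dist maps site -> probability."""
    out = {}
    for x, p in dist.items():
        out[x - 1] = out.get(x - 1, 0.0) + p / 2
        out[x + 1] = out.get(x + 1, 0.0) + p / 2
    return out

dist = {0: 1.0}                 # start at the origin
for _ in range(20):
    dist = srw_step(dist)

# After t steps the exact distribution has mean 0 and variance t,
# and its largest point mass decays like 1/sqrt(t).
mean = sum(x * p for x, p in dist.items())
var = sum((x - mean) ** 2 * p for x, p in dist.items())
```

After 20 steps the variance is exactly 20 and no site carries more than C(20,10)/2^20 ≈ 0.176 of the mass.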
I will show that the simple random walk cannot have any stationary distribution. Suppose the simple random walk on Z has a stationary distribution; that is, there exists some probability distribution π such that

π(y) = Σ_x π(x) p(x, y)  ⇒  π(y) = (1/2)π(y − 1) + (1/2)π(y + 1)

This means that π : Z → [0, 1] is a harmonic function; that is, it satisfies the condition that, for all x, π(x) equals the average of its neighbors. It can be shown that the only harmonic functions on Z are linear. Because π must always be nonnegative, we know that the slope of π must be zero, so π is constant. If π(x) = 0 for all x, then π is not a probability distribution; similarly, if π(x) = c ≠ 0 for all x, then Σ_{x∈Z} π(x) = ∞, so π is not a probability distribution. Thus, the simple random walk cannot have any stationary distribution.

Theorem 2.2. If X_n is a Markov chain with transition probability matrix P on a finite state space Ω, then π, the stationary distribution of X_n, exists.

Proof. Let S = {probability distributions on {1, 2, ..., n}} ⊆ R^n. Because the sum of all components of a probability distribution must equal one, S is a compact subset of R^n. Choose any μ ∈ S. We know that μP, μP^2, μP^3, ... are all also in S. Now, consider the sequence

ν_n = (1/n) Σ_{k=0}^{n−1} μP^k

ν_n is also a sequence in S. Because S is compact, it must have some convergent subsequence. Thus, there exists some subsequence ν_{n_j} that converges to some π in S. I claim that this π is a stationary distribution for P. We want to show that πP = π, or equivalently, πP − π = 0.

πP − π = lim_{j→∞} ν_{n_j}P − lim_{j→∞} ν_{n_j}
= lim_{j→∞} [ (1/n_j) Σ_{k=0}^{n_j−1} μP^{k+1} − (1/n_j) Σ_{k=0}^{n_j−1} μP^k ]
= lim_{j→∞} (1/n_j) (μP^{n_j} − μ)
= 0

since μP^{n_j} − μ is bounded and 1/n_j goes to 0.

Theorem 2.3. If X_n is an irreducible Markov chain with transition probability matrix P on a finite state space Ω, then the stationary distribution π is unique.
Proof. Suppose there exist π and ν such that πP = π and νP = ν. It is sufficient to show that π(x)/ν(x) is constant over all x, because if all stationary vectors are scalar multiples of each other, there can be at most one with components that sum to 1.

Because X_n is irreducible, we know that if πP = π, then π(x) > 0 for all x. This is true because, since Σ_x π(x) = 1, there must exist some y such that π(y) > 0; so if we choose any x, there must be some m such that p^m(y, x) > 0 (or else X_n would be reducible), and then π(x) = Σ_z π(z) p^m(z, x) ≥ π(y) p^m(y, x) > 0, so it immediately follows that π can have no 0 components. The same holds for ν.

Since Ω is finite, it follows that there exists some α > 0 such that

α = min_x π(x)/ν(x)

This implies that, for all y, π(y) − αν(y) ≥ 0, and π(y) − αν(y) = 0 for some y. If π − αν is not the zero vector, then, for some appropriate scalar β, β(π − αν) is also a stationary distribution. However, β(π − αν) = 0 at some x, contradicting what we showed above, that no stationary distribution of an irreducible Markov chain can have any components equal to 0. Thus, π − αν must equal 0 for all x, so π(x)/ν(x) is a constant, and P can only have one stationary distribution.

3. Convergence

The goal of this section is to place an upper bound on the time it takes any irreducible, aperiodic Markov chain to converge to its stationary distribution. This section is mostly based on Chapter 4 of Markov Chains and Mixing Times.

3.1. Total Variation Distance. In order to define mixing times, we first need to define a method to measure distance between probability distributions.

Definition 3.1. For two probability distributions μ and ν on Ω, the Total Variation Distance between μ and ν, denoted ||μ − ν||_TV, is

||μ − ν||_TV = max_{A⊆Ω} |μ(A) − ν(A)|

Now, we will introduce some other definitions of total variation distance and show that they are equivalent.

Proposition 3.2. ||μ − ν||_TV = (1/2) Σ_{x∈Ω} |μ(x) − ν(x)|

Proof. Let B = {x : μ(x) ≥ ν(x)}. For any event A,

μ(A) − ν(A) ≤ μ(A ∩ B) − ν(A ∩ B) ≤ μ(B) − ν(B)

Similarly,

ν(A) − μ(A) ≤ ν(B^c) − μ(B^c)

Because

μ(B) + μ(B^c) = 1 = ν(B) + ν(B^c)
we get that ν(B^c) − μ(B^c) = μ(B) − ν(B), so this common value is an upper bound for ||μ − ν||_TV, and if we consider the event A = B, we get equality. Thus, we get

||μ − ν||_TV = (1/2)[μ(B) − ν(B) + ν(B^c) − μ(B^c)] = (1/2) Σ_{x∈Ω} |μ(x) − ν(x)|

Note that the above proof also shows that

||μ − ν||_TV = Σ_{x∈Ω : μ(x)>ν(x)} [μ(x) − ν(x)]

To understand the next definition of total variation distance, we need to use coupling.

Definition 3.3. A coupling of two probability distributions μ and ν is a pair of random variables (X, Y) such that the marginal distribution of X is μ and the marginal distribution of Y is ν. Thus, if q is a joint distribution of X, Y on Ω × Ω, meaning q(x, y) = P{X = x, Y = y}, then Σ_y q(x, y) = μ(x) and Σ_x q(x, y) = ν(y). Clearly, X and Y can always have the same value only if μ and ν are identical.

Proposition 3.4. ||μ − ν||_TV = inf{P{X ≠ Y} : (X, Y) is a coupling of μ and ν}

Note: Such a coupling is called an optimal coupling. In the proof below, we will show that an optimal coupling always exists.

Proof. For any event A,

μ(A) − ν(A) = P{X ∈ A} − P{Y ∈ A} ≤ P{X ∈ A, Y ∉ A} ≤ P{X ≠ Y}

⇒ ||μ − ν||_TV ≤ inf{P{X ≠ Y} : (X, Y) is a coupling of μ and ν}

Now, we need to construct a coupling such that equality holds. Let

p = Σ_{x∈Ω} min{μ(x), ν(x)}
We show that

p = Σ_x min{μ(x), ν(x)}
= Σ_{x : μ(x)≤ν(x)} μ(x) + Σ_{x : μ(x)>ν(x)} ν(x)
= 1 − Σ_{x : μ(x)>ν(x)} μ(x) + Σ_{x : μ(x)>ν(x)} ν(x)
= 1 − Σ_{x : μ(x)>ν(x)} [μ(x) − ν(x)]
= 1 − ||μ − ν||_TV

Now, flip a coin with probability p of heads. If the coin is heads, choose the value of X by the probability distribution

γ_I(x) = min{μ(x), ν(x)} / p

In this case, let Y = X. If the coin is tails, choose the value of X using the probability distribution

γ_II(x) = (μ(x) − ν(x)) / (1 − p) if μ(x) > ν(x), and 0 otherwise.

Independently choose the value of Y using the probability distribution

γ_III(x) = (ν(x) − μ(x)) / (1 − p) if ν(x) > μ(x), and 0 otherwise.

Using this, for the distribution of X we have p·γ_I + (1 − p)·γ_II = μ, and for the distribution of Y we have p·γ_I + (1 − p)·γ_III = ν. This means that we have defined a coupling (X, Y) of μ and ν. Note that X ≠ Y if and only if the coin lands tails. Thus,

P{X ≠ Y} = 1 − p = ||μ − ν||_TV

3.2. The Convergence Theorem.

Theorem 3.5. Suppose P is an irreducible, aperiodic Markov chain, with stationary distribution π. Then there exist α ∈ (0, 1) and C > 0 such that, for all t ≥ 0,

max_{x∈Ω} ||P^t(x, ·) − π||_TV ≤ Cα^t

Proof. Since P is irreducible and aperiodic, there exist δ > 0 and r ∈ N such that, for all x, y in Ω,

p^r(x, y) ≥ δπ(y)

Let θ = 1 − δ. Let Π be the square matrix with |Ω| rows, each of which is π. Then

P^r = (1 − θ)Π + θQ

defines a stochastic matrix Q.
Now, we will show P^{rk} = (1 − θ^k)Π + θ^k Q^k by induction. For k = 1, this is true, because it is exactly how we defined Q. Assume this holds for k = n. Then

P^{r(n+1)} = P^{rn} P^r = [(1 − θ^n)Π + θ^n Q^n] P^r = (1 − θ^n)Π P^r + (1 − θ)θ^n Q^n Π + θ^{n+1} Q^n Q

Because Π P^r = Π, and Q^n Π = Π, we get

P^{r(n+1)} = (1 − θ^{n+1})Π + θ^{n+1} Q^{n+1}

completing the induction. Multiplying by P^j,

P^{rk} P^j = (1 − θ^k)Π P^j + θ^k Q^k P^j
⇒ P^{rk+j} = Π + θ^k (Q^k P^j − Π)
⇒ P^{rk+j}(x_0, ·) − π = θ^k [Q^k P^j(x_0, ·) − π]

Thus, the left-hand side above becomes (after taking norms) the total variation distance, and the total variation norm of the bracket on the right-hand side is at most 1, so, setting α = θ^{1/r} and writing θ^k = α^{rk+j} α^{−j}, we get that there exists some C_j such that

max_{x_0} ||P^{rk+j}(x_0, ·) − π||_TV ≤ C_j α^{rk+j}

Now, let C = max{C_0, C_1, ..., C_{r−1}}, let k = ⌊t/r⌋ so that t = rk + j with 0 ≤ j < r, and we have

max_{x_0} ||P^t(x_0, ·) − π||_TV ≤ Cα^t

3.3. Standardizing Distance from Stationary. We will now introduce notation for the distance of a Markov chain from its stationary distribution, and the distance between two Markov chains.

Definition 3.6.

d(t) = max_{x∈Ω} ||P^t(x, ·) − π||_TV
d̄(t) = max_{x,y∈Ω} ||P^t(x, ·) − P^t(y, ·)||_TV

Lemma 3.7. d(t) ≤ d̄(t) ≤ 2d(t)

Proof. Because the triangle inequality holds for total variation distance, for all x, y in Ω,

||P^t(x, ·) − P^t(y, ·)||_TV ≤ ||P^t(x, ·) − π||_TV + ||P^t(y, ·) − π||_TV

so d̄(t) ≤ 2d(t). Since π is stationary,

π(A) = Σ_y π(y) P^t(y, A)
Thus, we have

d(t) = max_x ||P^t(x, ·) − π||_TV
= max_x max_A |P^t(x, A) − π(A)|
= max_x max_A |Σ_y π(y)[P^t(x, A) − P^t(y, A)]|
≤ max_x max_A Σ_y π(y) |P^t(x, A) − P^t(y, A)|
≤ max_x Σ_y π(y) max_A |P^t(x, A) − P^t(y, A)|
= max_x Σ_y π(y) ||P^t(x, ·) − P^t(y, ·)||_TV
≤ max_x max_y ||P^t(x, ·) − P^t(y, ·)||_TV = d̄(t)

The second-to-last inequality holds since Σ_y π(y)(·) is a convex linear combination.

Lemma 3.8. d̄ is submultiplicative; that is, d̄(s + t) ≤ d̄(s)d̄(t).

Proof. Fix x, y in Ω, and let (X_s, Y_s) be the optimal coupling of P^s(x, ·) and P^s(y, ·). We earlier showed this coupling does exist, and that

||P^s(x, ·) − P^s(y, ·)||_TV = P{X_s ≠ Y_s}

Now,

p^{s+t}(x, w) = Σ_z p^s(x, z) p^t(z, w) = Σ_z P{X_s = z} p^t(z, w) = E(p^t(X_s, w))

Similarly, p^{s+t}(y, w) = E(p^t(Y_s, w)).

⇒ p^{s+t}(x, w) − p^{s+t}(y, w) = E(p^t(X_s, w) − p^t(Y_s, w))
⇒ ||P^{s+t}(x, ·) − P^{s+t}(y, ·)||_TV = (1/2) Σ_w |E(p^t(X_s, w) − p^t(Y_s, w))|
≤ (1/2) E( Σ_w |p^t(X_s, w) − p^t(Y_s, w)| )
= E( ||P^t(X_s, ·) − P^t(Y_s, ·)||_TV )
≤ d̄(t) P{X_s ≠ Y_s}
≤ d̄(t) d̄(s)

We now get to the following corollary:

Corollary 3.9. For all c ∈ N, d(ct) ≤ d̄(ct) ≤ d̄(t)^c
Proof. From Lemma 3.7, d(ct) ≤ d̄(ct). From Lemma 3.8,

d̄(ct) = d̄(t + t + ... + t) (c times) ≤ d̄(t)d̄(t)···d̄(t) (c times) = d̄(t)^c

3.4. Mixing Times.

Definition 3.10. The mixing time of a Markov chain, with respect to ε, is

t_mix(ε) = min{t : d(t) ≤ ε}

We will also let t_mix = t_mix(1/4). The choice of exactly 1/4 is somewhat arbitrary, but as we will see below, it simplifies the calculation of a key inequality. By Corollary 3.9, for any l ∈ N,

d(l·t_mix(ε)) ≤ d̄(l·t_mix(ε)) ≤ d̄(t_mix(ε))^l ≤ (2ε)^l
⇒ d(l·t_mix) ≤ 2^{−l}

This means that a Markov chain converges to its stationary distribution exponentially fast.

4. Functions

We will now consider the idea of functions on state spaces. We let X_n be a Markov chain with transition probability matrix P on some state space Ω, and consider some function F : Ω → W. F will always be surjective, but not necessarily injective. We will let W_n = F(X_n). For all of this section, assume Ω is at most countable.

I will show, by counterexample, that W_n is not always a Markov chain. Let Ω = {a, b, c} and W = {d, e}. Let X_n be the Markov chain represented by the directed graph a → b → c → a, where p(i, j) = 1 if i has an outgoing edge to j, and 0 otherwise. Define F such that F(a) = d and F(b) = F(c) = e. Let X_n have the starting distribution {p_1, p_2, p_3} over {a, b, c}. We want to show the Markov condition does not hold for W_n; that is, there exist some n and w_n such that

P{W_n = w_n | W_{n−1} = w_{n−1}} ≠ P{W_n = w_n | W_{n−1} = w_{n−1}, W_{n−2} = w_{n−2}, ..., W_0 = w_0}

This is true for w_2 = d, because

P{W_2 = d | W_1 = e, W_0 = e} = P{X_2 = a, X_1 ∈ {b, c}, X_0 ∈ {b, c}} / P{X_1 ∈ {b, c}, X_0 ∈ {b, c}} = p_2 / p_2 = 1
but

P{W_2 = d | W_1 = e} = P{X_2 = a, X_1 ∈ {b, c}} / P{X_1 ∈ {b, c}} = p_2 / (p_1 + p_2)

P{W_2 = d | W_1 = e, W_0 = e} ≠ P{W_2 = d | W_1 = e}, so W_n is not a Markov chain.

However, there does exist a condition on X_n that guarantees W_n to be a Markov chain.

Theorem 4.1. Let X_n be a Markov chain with transition probability matrix P on some state space Ω, and consider some function F : Ω → W. Let W_n = F(X_n). For each w ∈ W, let F^{−1}(w) = {x ∈ Ω : F(x) = w}. If for every pair of states w_1, w_2 in W and every pair of states x_1, x_2 in F^{−1}(w_1),

Σ_{y∈F^{−1}(w_2)} p(x_1, y) = Σ_{y∈F^{−1}(w_2)} p(x_2, y)

then W_n is a Markov chain on W.

I will say that any Markov chain with a function which satisfies the condition of the above theorem is function regular.

Proof. We want to show that

P{W_n = w_n | W_{n−1} = w_{n−1}} = P{W_n = w_n | W_{n−1} = w_{n−1}, W_{n−2} = w_{n−2}, ..., W_0 = w_0}

We know that for every x_1, x_2 in F^{−1}(w_{n−1}), there exists some ω ∈ R such that

Σ_{y∈F^{−1}(w_n)} p(x_1, y) = Σ_{y∈F^{−1}(w_n)} p(x_2, y) = ω

I claim that

P{W_n = w_n | W_{n−1} = w_{n−1}} = P{W_n = w_n | W_{n−1} = w_{n−1}, ..., W_0 = w_0} = ω

First, I will show this is true for the left-hand side.

P{W_n = w_n | W_{n−1} = w_{n−1}} = P{X_n ∈ F^{−1}(w_n) | X_{n−1} ∈ F^{−1}(w_{n−1})}
= P{X_n ∈ F^{−1}(w_n), X_{n−1} ∈ F^{−1}(w_{n−1})} / P{X_{n−1} ∈ F^{−1}(w_{n−1})}
= [Σ_{x∈F^{−1}(w_{n−1})} P{X_{n−1} = x} Σ_{y∈F^{−1}(w_n)} p(x, y)] / P{X_{n−1} ∈ F^{−1}(w_{n−1})}
= [Σ_{x∈F^{−1}(w_{n−1})} P{X_{n−1} = x} · ω] / P{X_{n−1} ∈ F^{−1}(w_{n−1})}
= ω
Now, I will show this is true for the right-hand side.

P{W_n = w_n | W_{n−1} = w_{n−1}, ..., W_0 = w_0}
= P{X_n ∈ F^{−1}(w_n) | X_{n−1} ∈ F^{−1}(w_{n−1}), ..., X_0 ∈ F^{−1}(w_0)}
= P{X_n ∈ F^{−1}(w_n), X_{n−1} ∈ F^{−1}(w_{n−1}), ..., X_0 ∈ F^{−1}(w_0)} / P{X_{n−1} ∈ F^{−1}(w_{n−1}), ..., X_0 ∈ F^{−1}(w_0)}
= Σ_{x∈F^{−1}(w_{n−1})} P{X_{n−1} = x | X_{n−1} ∈ F^{−1}(w_{n−1}), ..., X_0 ∈ F^{−1}(w_0)} (Σ_{y∈F^{−1}(w_n)} p(x, y))
= ω Σ_{x∈F^{−1}(w_{n−1})} P{X_{n−1} = x | X_{n−1} ∈ F^{−1}(w_{n−1}), ..., X_0 ∈ F^{−1}(w_0)}
= ω

Thus, W_n is a Markov chain.

However, function regularity is only a sufficient condition for the image to be a Markov chain, not a necessary one. Let X_n be, for example, the Markov chain on {a, b, c, d} with transition probabilities

p(a, b) = p(a, c) = 1/2,  p(b, a) = 1,  p(c, d) = 1,  p(d, a) = 1

and all other transition probabilities 0. Let F(a) = 1, F(b) = F(c) = 2, and F(d) = 3. Because p(b, a) ≠ p(c, a), X_n is not function regular. We then have that W_n is represented by the transition probabilities

q(1, 2) = 1,  q(2, 1) = q(2, 3) = 1/2,  q(3, 1) = 1

This counterexample demonstrates an issue with deciding how we determine the starting distribution of the X_n. Given any starting distribution of W_n, can we choose how we want to pull this back onto the X_n? If we know, or can fix, any pull-back of the starting probabilities in Ω, then W_n is a Markov chain, so this is a counterexample. However, if we do not know how the starting distribution is pulled back to Ω, and all we know is the observed probabilities of the W_n, then P{W_1 = 1 | W_0 = 2} is not well defined, and W_n is not a Markov chain.

For the rest of this section, unless otherwise stated, we will not assume X_n is function regular. However, we will assume that W_n is a Markov chain. We will now discuss some properties of Markov chains that are, or are not, preserved over functions on state spaces. First, we will prove a lemma needed for the next theorem.

Lemma 4.2. Let P be a transition probability matrix on a state space Y. Then a probability distribution μ on Y is stationary for P if, on some probability space, there exist Y-valued random variables Y_0, Y_1 such that (a) each of Y_0 and Y_1 has marginal distribution μ; and (b) the conditional distribution of Y_1 given Y_0 = y is the y-th row of P.

Proof. Suppose we have some random variables Y_0, Y_1 such that P{Y_1 = x | Y_0 = y} = P(y, x) and P{Y_0 = y} = P{Y_1 = y} = μ(y). We want to show that μ is stationary; that is, we want to show μ(y_i) = Σ_{y∈Y} μ(y) P(y, y_i).

Σ_{y∈Y} μ(y) P(y, y_i) = Σ_{y∈Y} P{Y_0 = y} P{Y_1 = y_i | Y_0 = y} = Σ_{y∈Y} P{Y_1 = y_i, Y_0 = y} = μ(y_i)

Theorem 4.3. Let X_n, with stationary distribution π, be a function regular Markov chain. Let W_n = F(X_n), as usual. Then the projection of π under F to W is a stationary distribution for W_n.

Proof. We will first choose two random variables Y_0, Y_1 such that:

(1) P{Y_1 = x | Y_0 = y} = q(y, x)
(2) P{Y_0 = y} = P{Y_1 = y} = π(F^{−1}(y)) = Σ_{z∈F^{−1}(y)} π(z)

By the lemma, we know that if two such random variables exist, then π ∘ F^{−1} is a stationary distribution. Let P{Y_1 = x, Y_0 = y} = f(x, y). Give Y_0 the distribution P{Y_0 = y} = Σ_{z∈F^{−1}(y)} π(z) and, given Y_0 = y, let Y_1 have distribution q(y, ·), so that

f(x, y) = q(y, x) Σ_{z∈F^{−1}(y)} π(z)  ⇒  Σ_{x∈W} f(x, y) = Σ_{z∈F^{−1}(y)} π(z)
The last step follows since Σ_{x∈W} q(y, x) = 1. Thus, we have just shown that P{Y_0 = y} = Σ_{z∈F^{−1}(y)} π(z), the second part of condition (2). Now, all that is left is to show that P{Y_1 = x} = π(F^{−1}(x)). Note that

q(y, x) = P{W_1 = x | W_0 = y} = P{X_1 ∈ F^{−1}(x) | X_0 ∈ F^{−1}(y)} = P{X_1 ∈ F^{−1}(x) | X_0 = z},  z ∈ F^{−1}(y)

We know that, since X_n is function regular, the last expression does not depend on the choice of z. Choose any u in F^{−1}(y), so q(y, x) = Σ_{v∈F^{−1}(x)} p(u, v). Then

Σ_{y∈W} f(x, y) = Σ_{y∈W} Σ_{z∈F^{−1}(y)} Σ_{v∈F^{−1}(x)} p(u, v) π(z)
= Σ_{y∈W} Σ_{z∈F^{−1}(y)} Σ_{v∈F^{−1}(x)} p(z, v) π(z)   (since, by function regularity, Σ_v p(u, v) = Σ_v p(z, v) for every z ∈ F^{−1}(y))
= Σ_{v∈F^{−1}(x)} Σ_{z∈Ω} π(z) p(z, v)   (since we were able to combine the y sum and the z sum)
= Σ_{v∈F^{−1}(x)} π(v)   (since π is stationary in Ω)
= (π ∘ F^{−1})(x)

We just showed that Σ_y f(x, y) = P{Y_1 = x} = (π ∘ F^{−1})(x), so we are done.

Now, as an example of a Markov chain with a function on its state space, I will introduce the Ehrenfest Urn model. The state space Ω is {0, 1}^N. At each integer time t, one random index j ≤ N is chosen, and the j-th coordinate from time t − 1 is switched. Thus, for two vectors x and y, the transition probabilities are:

p(x, y) = 1/N if x and y differ in exactly one coordinate, and 0 otherwise.

This can be visualized as two urns of balls. At each time t, one ball is randomly chosen to switch urns. I claim that the stationary distribution π of the Ehrenfest Urn model is the one in which all possible vectors are uniformly distributed with probability 1/2^N. Because there are 2^N vectors, this is a probability distribution. This distribution is stationary because, for any y,

Σ_x π(x) p(x, y) = Σ_{x : x and y differ in exactly one coordinate} (1/2^N)(1/N) = N (1/2^N)(1/N) = 1/2^N = π(y)

Because the Ehrenfest Urn model is irreducible and Ω is finite, π is unique.
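Both claims about the Ehrenfest model — that the uniform distribution is stationary, and that the coordinate-sum function used below satisfies the function-regularity condition of Theorem 4.1 — can be verified by brute force for small N. A sketch with N = 4 (the helper names are mine):

```python
from itertools import product

N = 4
states = list(product([0, 1], repeat=N))   # the state space {0,1}^N

def p(x, y):
    """Ehrenfest transition probability: 1/N if x and y differ in one coordinate."""
    diff = sum(a != b for a, b in zip(x, y))
    return 1.0 / N if diff == 1 else 0.0

# Stationarity of the uniform distribution: sum_x pi(x) p(x, y) = 1/2^N for all y.
pi = {x: 1.0 / 2 ** N for x in states}
left = {y: sum(pi[x] * p(x, y) for x in states) for y in states}

# Function regularity of F(x) = sum(x): the total transition probability into
# the level set {F = w2} is the same from every state in a level set {F = w1}.
def mass_to(x, w2):
    return sum(p(x, y) for y in states if sum(y) == w2)
```

For instance, from any vector with a single 1, the probability of moving to a vector with two 1's is 3/4 (three of the four coordinates are 0's that can be flipped), regardless of which coordinate holds the 1.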
Now, let W = {0, 1, 2, ..., N}, and define

W_n = Σ_{j=1}^{N} X_n(j)

We know that W_n is a Markov chain because X_n is function regular (any vector with k 1's at time t − 1 is equally likely to have l 1's at time t). We can determine the transition probabilities of W_n by looking at the transition probabilities of X_n. If W_{n−1} = w, then W_n increases by 1 if a 0 is chosen to be switched in X_{n−1}, and decreases by 1 if a 1 is chosen. Thus, the transition probabilities are as follows:

q(w, w + 1) = (N − w)/N
q(w, w − 1) = w/N
q(w, any other value) = 0

Using Theorem 4.3, we can also determine the stationary distribution of W_n by considering the stationary distribution π of X_n. Since all vectors are equally likely, it is simply a matter of counting how many vectors in Ω go to each value in W. The number of ways to have w 1's in a vector of length N is (N choose w). Because there are 2^N binary vectors of length N, we can see that the stationary distribution π_W of W_n is:

π_W(w) = (N choose w) / 2^N

We will now show that, if X_n converges to its stationary distribution, then W_n converges to its stationary distribution at least as fast.

Theorem 4.4. If X_n is irreducible and aperiodic, then W_n is irreducible and aperiodic.

Proof. Because X_n is irreducible and aperiodic, there exists some M such that for all m > M and all x, y, p^m(x, y) > 0. We will now show that q^m(w_1, w_2) > 0 for any m > M and w_1, w_2 ∈ W, where q^m(w_1, w_2) represents the probability of going from w_1 to w_2 in m steps in the chain W_n. Choose any w_1, w_2 ∈ W.

q^m(w_1, w_2) = P{X_n ∈ F^{−1}(w_2) | X_{n−m} ∈ F^{−1}(w_1)}
= P{X_n ∈ F^{−1}(w_2), X_{n−m} ∈ F^{−1}(w_1)} / P{X_{n−m} ∈ F^{−1}(w_1)}
= [Σ_{x∈F^{−1}(w_1), y∈F^{−1}(w_2)} P{X_{n−m} = x} p^m(x, y)] / P{X_{n−m} ∈ F^{−1}(w_1)} > 0

This condition is stronger than the condition for irreducibility, so W_n is irreducible. If we let w_1 = w_2, this condition says that for any m > M and w_1 ∈ W, q^m(w_1, w_1) > 0, so T(w_1) ⊇ {m : m > M}. This implies that gcd(T(w_1)) = 1 for all w_1 ∈ W, so W_n is aperiodic.

Theorem 4.5. d(t)_{W_n} ≤ d(t)_{X_n}.
Proof. Write q^t(w, ·) for the t-step distribution of W_n started at w, and recall from Theorem 4.3 that π_W(A) = π(F^{−1}(A)). For any w ∈ W,

||q^t(w, ·) − π_W||_TV
= max_{A⊆W} | Σ_{x∈F^{−1}(w)} P{X_0 = x | X_0 ∈ F^{−1}(w)} p^t(x, F^{−1}(A)) − π(F^{−1}(A)) |
≤ max_{A⊆W} max_{x∈F^{−1}(w)} | p^t(x, F^{−1}(A)) − π(F^{−1}(A)) |
≤ max_{x∈Ω} ||P^t(x, ·) − π||_TV = d(t)_{X_n}

The second line followed because Σ_{x∈F^{−1}(w)} P{X_0 = x | X_0 ∈ F^{−1}(w)} = 1, so the sum is a convex combination. Taking the maximum over w ∈ W gives d(t)_{W_n} ≤ d(t)_{X_n}.

Corollary 4.6. For all ε, t_mix(ε)(W_n) ≤ t_mix(ε)(X_n).

Proof. Recall that t_mix(ε) = min{t : d(t) ≤ ε}. This corollary follows immediately from Theorem 4.5.

I'll now show some counterexamples of properties that are not preserved under functions on state spaces.

If X_n is reducible, W_n is not necessarily reducible. Consider an X_n consisting of two identical, non-communicating classes, for example the Markov chain on {a, b, c, d} given by the two disjoint cycles a → b → a and c → d → c. It is clear that there is a function (namely, F(a) = F(c) = 1 and F(b) = F(d) = 2) that maps the corresponding states in each class to the same point, creating a Markov chain with a single communicating class.

If X_n is periodic, then W_n is not necessarily periodic. Consider, for example, the Markov chain X_n on {a, b, c, d} in which every transition moves between the classes {a, c} and {b, d}:

p(a, b) = p(a, d) = 1/2,  p(b, a) = p(b, c) = 1/2,  p(c, b) = p(c, d) = 1/2,  p(d, a) = p(d, c) = 1/2

Now let F(a) = F(b) = 1 and F(c) = F(d) = 2. We then have that W_n is represented by the transition probabilities

q(1, 1) = q(1, 2) = q(2, 1) = q(2, 2) = 1/2

Here, X_n has period 2, but its image has period 1, so W_n is aperiodic.

Consider the following lemma:
Lemma 4.7. Let X_n be a function regular Markov chain, and W_n = F(X_n). Then q(w_1, w_2) = 0 if and only if for all x ∈ F^{−1}(w_1) and y ∈ F^{−1}(w_2), we have p(x, y) = 0.

Proof. First, we will show that the right-hand side implies the left-hand side. Suppose that for all x ∈ F^{−1}(w_1), y ∈ F^{−1}(w_2), p(x, y) = 0. We then get that

P{X_n ∈ F^{−1}(w_2) | X_{n−1} ∈ F^{−1}(w_1)} = 0

which is equivalent to saying q(w_1, w_2) = 0, as desired.

Now, we will show the left-hand side implies the right-hand side by proving the contrapositive. Suppose there exist some x_0 ∈ F^{−1}(w_1) and y_0 ∈ F^{−1}(w_2) such that p(x_0, y_0) > 0. This means that Σ_{y∈F^{−1}(w_2)} p(x_0, y) = ε for some ε > 0. Because X_n is function regular, we know that for all x ∈ F^{−1}(w_1), Σ_{y∈F^{−1}(w_2)} p(x, y) = ε. Also, because we are conditioning on the event X_{n−1} ∈ F^{−1}(w_1), we will assume that P{X_{n−1} ∈ F^{−1}(w_1)} > 0. We can now show that

q(w_1, w_2) = P{X_n ∈ F^{−1}(w_2) | X_{n−1} ∈ F^{−1}(w_1)}
= P{X_n ∈ F^{−1}(w_2), X_{n−1} ∈ F^{−1}(w_1)} / P{X_{n−1} ∈ F^{−1}(w_1)}
= [Σ_{x∈F^{−1}(w_1)} P{X_{n−1} = x} Σ_{y∈F^{−1}(w_2)} p(x, y)] / P{X_{n−1} ∈ F^{−1}(w_1)}
= ε P{X_{n−1} ∈ F^{−1}(w_1)} / P{X_{n−1} ∈ F^{−1}(w_1)} = ε > 0

Corollary 4.8. If X_n is irreducible and function regular, then W_n is irreducible.

Proof. Recall our definition of an irreducible Markov chain: X_n is irreducible if for any two states x, y there exists an integer t (possibly depending on x and y) such that p^t(x, y) > 0. It follows immediately from the lemma that if X_n is irreducible, then W_n is irreducible.

Corollary 4.9. If X_n is aperiodic and function regular, then W_n is aperiodic.

Proof. Recall the set T(x) = {t : p^t(x, x) > 0} from our definition of the period of a Markov chain. Let T̃(x) = {t : q^t(F(x), F(x)) > 0}. The above lemma shows that T̃(x) must always contain T(x). Thus, for any x, gcd(T̃(x)) ≤ gcd(T(x)). Because X_n is aperiodic, gcd(T(x)) = 1 for all x, so, because F is always onto, every w ∈ W has period 1. Thus, W_n is aperiodic.

Acknowledgments.
It is a pleasure to thank my mentor, Yuan Shao, for helping me decide on a topic, assisting me with the difficult proofs and concepts, and looking over all of my work to ensure its correctness. I could not have done this paper without him.

References

[1] David A. Levin, Yuval Peres, and Elizabeth Wilmer. Markov Chains and Mixing Times. dlevin/markov/markovmixing.pdf.
[2] Steven P. Lalley. Markov Chains: Basic Theory. lalley/courses/3/markovchains.pdf.
[3] Steven P. Lalley. Statistics 3: Homework Assignment 5. lalley/courses/3/hw5.pdf.
More information0 Sets and Induction. Sets
0 Sets and Induction Sets A set is an unordered collection of objects, called elements or members of the set. A set is said to contain its elements. We write a A to denote that a is an element of the set
More informationTUCBOR. is feaiherinp hit nest. The day before Thanks, as to reflect great discredit upon that paper. Clocks and Jewelry repaired and warranted.
W B J G Bk 85 X W G WY B 7 B 4 & B k F G? * Bk P j?) G j B k k 4 P & B J B PB Y B * k W Y) WY G G B B Wk J W P W k k J J P -B- W J W J W J k G j F W Wk P j W 8 B Bk B J B P k F BP - W F j $ W & B P & P
More informationMATH 56A: STOCHASTIC PROCESSES CHAPTER 1
MATH 56A: STOCHASTIC PROCESSES CHAPTER. Finite Markov chains For the sake of completeness of these notes I decided to write a summary of the basic concepts of finite Markov chains. The topics in this chapter
More informationConvex Optimization CMU-10725
Convex Optimization CMU-10725 Simulated Annealing Barnabás Póczos & Ryan Tibshirani Andrey Markov Markov Chains 2 Markov Chains Markov chain: Homogen Markov chain: 3 Markov Chains Assume that the state
More informationContinuity. Chapter 4
Chapter 4 Continuity Throughout this chapter D is a nonempty subset of the real numbers. We recall the definition of a function. Definition 4.1. A function from D into R, denoted f : D R, is a subset of
More informationIntroduction to Real Analysis
Introduction to Real Analysis Joshua Wilde, revised by Isabel Tecu, Takeshi Suzuki and María José Boccardi August 13, 2013 1 Sets Sets are the basic objects of mathematics. In fact, they are so basic that
More informationReal Analysis. Joe Patten August 12, 2018
Real Analysis Joe Patten August 12, 2018 1 Relations and Functions 1.1 Relations A (binary) relation, R, from set A to set B is a subset of A B. Since R is a subset of A B, it is a set of ordered pairs.
More informationLecture 9: Cardinality. a n. 2 n 1 2 n. n=1. 0 x< 2 n : x = :a 1 a 2 a 3 a 4 :::: s n = s n n+1 x. 0; if 0 x<
Lecture 9: Cardinality 9. Binary representations Suppose fa n g n= is a sequence such that, for each n =; 2; 3;:::, either a n =0ora n = and, for any integer N, there exists an integer n>n such that a
More informationNader H. Bshouty Lisa Higham Jolanta Warpechowska-Gruca. Canada. (
Meeting Times of Random Walks on Graphs Nader H. Bshouty Lisa Higham Jolanta Warpechowska-Gruca Computer Science University of Calgary Calgary, AB, T2N 1N4 Canada (e-mail: fbshouty,higham,jolantag@cpsc.ucalgary.ca)
More informationReal Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi
Real Analysis Math 3AH Rudin, Chapter # Dominique Abdi.. If r is rational (r 0) and x is irrational, prove that r + x and rx are irrational. Solution. Assume the contrary, that r+x and rx are rational.
More informationNote that in the example in Lecture 1, the state Home is recurrent (and even absorbing), but all other states are transient. f ii (n) f ii = n=1 < +
Random Walks: WEEK 2 Recurrence and transience Consider the event {X n = i for some n > 0} by which we mean {X = i}or{x 2 = i,x i}or{x 3 = i,x 2 i,x i},. Definition.. A state i S is recurrent if P(X n
More informationfor average case complexity 1 randomized reductions, an attempt to derive these notions from (more or less) rst
On the reduction theory for average case complexity 1 Andreas Blass 2 and Yuri Gurevich 3 Abstract. This is an attempt to simplify and justify the notions of deterministic and randomized reductions, an
More informationLecture 20 : Markov Chains
CSCI 3560 Probability and Computing Instructor: Bogdan Chlebus Lecture 0 : Markov Chains We consider stochastic processes. A process represents a system that evolves through incremental changes called
More informationLecture 7. We can regard (p(i, j)) as defining a (maybe infinite) matrix P. Then a basic fact is
MARKOV CHAINS What I will talk about in class is pretty close to Durrett Chapter 5 sections 1-5. We stick to the countable state case, except where otherwise mentioned. Lecture 7. We can regard (p(i, j))
More informationExtremal Cases of the Ahlswede-Cai Inequality. A. J. Radclie and Zs. Szaniszlo. University of Nebraska-Lincoln. Department of Mathematics
Extremal Cases of the Ahlswede-Cai Inequality A J Radclie and Zs Szaniszlo University of Nebraska{Lincoln Department of Mathematics 810 Oldfather Hall University of Nebraska-Lincoln Lincoln, NE 68588 1
More information4.4 Noetherian Rings
4.4 Noetherian Rings Recall that a ring A is Noetherian if it satisfies the following three equivalent conditions: (1) Every nonempty set of ideals of A has a maximal element (the maximal condition); (2)
More informationPreface These notes were prepared on the occasion of giving a guest lecture in David Harel's class on Advanced Topics in Computability. David's reques
Two Lectures on Advanced Topics in Computability Oded Goldreich Department of Computer Science Weizmann Institute of Science Rehovot, Israel. oded@wisdom.weizmann.ac.il Spring 2002 Abstract This text consists
More informationSection 27. The Central Limit Theorem. Po-Ning Chen, Professor. Institute of Communications Engineering. National Chiao Tung University
Section 27 The Central Limit Theorem Po-Ning Chen, Professor Institute of Communications Engineering National Chiao Tung University Hsin Chu, Taiwan 3000, R.O.C. Identically distributed summands 27- Central
More informationLecture 8: Basic convex analysis
Lecture 8: Basic convex analysis 1 Convex sets Both convex sets and functions have general importance in economic theory, not only in optimization. Given two points x; y 2 R n and 2 [0; 1]; the weighted
More informationP i [B k ] = lim. n=1 p(n) ii <. n=1. V i :=
2.7. Recurrence and transience Consider a Markov chain {X n : n N 0 } on state space E with transition matrix P. Definition 2.7.1. A state i E is called recurrent if P i [X n = i for infinitely many n]
More informationMarkov and Gibbs Random Fields
Markov and Gibbs Random Fields Bruno Galerne bruno.galerne@parisdescartes.fr MAP5, Université Paris Descartes Master MVA Cours Méthodes stochastiques pour l analyse d images Lundi 6 mars 2017 Outline The
More informationFinite and Infinite Sets
Chapter 9 Finite and Infinite Sets 9. Finite Sets Preview Activity (Equivalent Sets, Part ). Let A and B be sets and let f be a function from A to B..f W A! B/. Carefully complete each of the following
More informationPower Domains and Iterated Function. Systems. Abbas Edalat. Department of Computing. Imperial College of Science, Technology and Medicine
Power Domains and Iterated Function Systems Abbas Edalat Department of Computing Imperial College of Science, Technology and Medicine 180 Queen's Gate London SW7 2BZ UK Abstract We introduce the notion
More informationDefinition A finite Markov chain is a memoryless homogeneous discrete stochastic process with a finite number of states.
Chapter 8 Finite Markov Chains A discrete system is characterized by a set V of states and transitions between the states. V is referred to as the state space. We think of the transitions as occurring
More informationExercise Solutions to Functional Analysis
Exercise Solutions to Functional Analysis Note: References refer to M. Schechter, Principles of Functional Analysis Exersize that. Let φ,..., φ n be an orthonormal set in a Hilbert space H. Show n f n
More informationChapter 1 : The language of mathematics.
MAT 200, Logic, Language and Proof, Fall 2015 Summary Chapter 1 : The language of mathematics. Definition. A proposition is a sentence which is either true or false. Truth table for the connective or :
More informationNotes on uniform convergence
Notes on uniform convergence Erik Wahlén erik.wahlen@math.lu.se January 17, 2012 1 Numerical sequences We begin by recalling some properties of numerical sequences. By a numerical sequence we simply mean
More informationLecture Notes on DISCRETE MATHEMATICS. Eusebius Doedel
Lecture Notes on DISCRETE MATHEMATICS Eusebius Doedel c Eusebius J. Doedel, 009 Contents Logic. Introduction............................................................................... Basic logical
More informationMetric Spaces and Topology
Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies
More information1 Take-home exam and final exam study guide
Math 215 - Introduction to Advanced Mathematics Fall 2013 1 Take-home exam and final exam study guide 1.1 Problems The following are some problems, some of which will appear on the final exam. 1.1.1 Number
More information3.1 Basic properties of real numbers - continuation Inmum and supremum of a set of real numbers
Chapter 3 Real numbers The notion of real number was introduced in section 1.3 where the axiomatic denition of the set of all real numbers was done and some basic properties of the set of all real numbers
More informationLecture 2: September 8
CS294 Markov Chain Monte Carlo: Foundations & Applications Fall 2009 Lecture 2: September 8 Lecturer: Prof. Alistair Sinclair Scribes: Anand Bhaskar and Anindya De Disclaimer: These notes have not been
More information1 Solutions to selected problems
1 Solutions to selected problems 1. Let A B R n. Show that int A int B but in general bd A bd B. Solution. Let x int A. Then there is ɛ > 0 such that B ɛ (x) A B. This shows x int B. If A = [0, 1] and
More informationChapter 8. P-adic numbers. 8.1 Absolute values
Chapter 8 P-adic numbers Literature: N. Koblitz, p-adic Numbers, p-adic Analysis, and Zeta-Functions, 2nd edition, Graduate Texts in Mathematics 58, Springer Verlag 1984, corrected 2nd printing 1996, Chap.
More informationAnalysis Finite and Infinite Sets The Real Numbers The Cantor Set
Analysis Finite and Infinite Sets Definition. An initial segment is {n N n n 0 }. Definition. A finite set can be put into one-to-one correspondence with an initial segment. The empty set is also considered
More informationMath 140A - Fall Final Exam
Math 140A - Fall 2014 - Final Exam Problem 1. Let {a n } n 1 be an increasing sequence of real numbers. (i) If {a n } has a bounded subsequence, show that {a n } is itself bounded. (ii) If {a n } has a
More informationSOME THEORY AND PRACTICE OF STATISTICS by Howard G. Tucker CHAPTER 2. UNIVARIATE DISTRIBUTIONS
SOME THEORY AND PRACTICE OF STATISTICS by Howard G. Tucker CHAPTER. UNIVARIATE DISTRIBUTIONS. Random Variables and Distribution Functions. This chapter deals with the notion of random variable, the distribution
More informationMath 3450 Homework Solutions
Math 3450 Homework Solutions I have decided to write up all the solutions to prolems NOT assigned from the textook first. There are three more sets to write up and I am doing those now. Once I get the
More information(1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define
Homework, Real Analysis I, Fall, 2010. (1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define ρ(f, g) = 1 0 f(x) g(x) dx. Show that
More information(1) Which of the following are propositions? If it is a proposition, determine its truth value: A propositional function, but not a proposition.
Math 231 Exam Practice Problem Solutions WARNING: This is not a sample test. Problems on the exams may or may not be similar to these problems. These problems are just intended to focus your study of the
More informationMarkov chains. Randomness and Computation. Markov chains. Markov processes
Markov chains Randomness and Computation or, Randomized Algorithms Mary Cryan School of Informatics University of Edinburgh Definition (Definition 7) A discrete-time stochastic process on the state space
More informationWinter 2019 Math 106 Topics in Applied Mathematics. Lecture 9: Markov Chain Monte Carlo
Winter 2019 Math 106 Topics in Applied Mathematics Data-driven Uncertainty Quantification Yoonsang Lee (yoonsang.lee@dartmouth.edu) Lecture 9: Markov Chain Monte Carlo 9.1 Markov Chain A Markov Chain Monte
More informationLecture 5. 1 Chung-Fuchs Theorem. Tel Aviv University Spring 2011
Random Walks and Brownian Motion Tel Aviv University Spring 20 Instructor: Ron Peled Lecture 5 Lecture date: Feb 28, 20 Scribe: Yishai Kohn In today's lecture we return to the Chung-Fuchs theorem regarding
More informationMarkov Chains Handout for Stat 110
Markov Chains Handout for Stat 0 Prof. Joe Blitzstein (Harvard Statistics Department) Introduction Markov chains were first introduced in 906 by Andrey Markov, with the goal of showing that the Law of
More informationLecture Notes 1 Basic Concepts of Mathematics MATH 352
Lecture Notes 1 Basic Concepts of Mathematics MATH 352 Ivan Avramidi New Mexico Institute of Mining and Technology Socorro, NM 87801 June 3, 2004 Author: Ivan Avramidi; File: absmath.tex; Date: June 11,
More informationMarkov Chains. Andreas Klappenecker by Andreas Klappenecker. All rights reserved. Texas A&M University
Markov Chains Andreas Klappenecker Texas A&M University 208 by Andreas Klappenecker. All rights reserved. / 58 Stochastic Processes A stochastic process X tx ptq: t P T u is a collection of random variables.
More informationQUASI-UNIFORMLY POSITIVE OPERATORS IN KREIN SPACE. Denitizable operators in Krein spaces have spectral properties similar to those
QUASI-UNIFORMLY POSITIVE OPERATORS IN KREIN SPACE BRANKO CURGUS and BRANKO NAJMAN Denitizable operators in Krein spaces have spectral properties similar to those of selfadjoint operators in Hilbert spaces.
More informationADDENDUM B: CONSTRUCTION OF R AND THE COMPLETION OF A METRIC SPACE
ADDENDUM B: CONSTRUCTION OF R AND THE COMPLETION OF A METRIC SPACE ANDREAS LEOPOLD KNUTSEN Abstract. These notes are written as supplementary notes for the course MAT11- Real Analysis, taught at the University
More informationContinuity. Chapter 4
Chapter 4 Continuity Throughout this chapter D is a nonempty subset of the real numbers. We recall the definition of a function. Definition 4.1. A function from D into R, denoted f : D R, is a subset of
More informationNets Hawk Katz Theorem. There existsaconstant C>so that for any number >, whenever E [ ] [ ] is a set which does not contain the vertices of any axis
New York Journal of Mathematics New York J. Math. 5 999) {3. On the Self Crossing Six Sided Figure Problem Nets Hawk Katz Abstract. It was shown by Carbery, Christ, and Wright that any measurable set E
More informationCS145: Probability & Computing Lecture 18: Discrete Markov Chains, Equilibrium Distributions
CS145: Probability & Computing Lecture 18: Discrete Markov Chains, Equilibrium Distributions Instructor: Erik Sudderth Brown University Computer Science April 14, 215 Review: Discrete Markov Chains Some
More informationNotes on Iterated Expectations Stephen Morris February 2002
Notes on Iterated Expectations Stephen Morris February 2002 1. Introduction Consider the following sequence of numbers. Individual 1's expectation of random variable X; individual 2's expectation of individual
More informationFinal Exam Review. 2. Let A = {, { }}. What is the cardinality of A? Is
1. Describe the elements of the set (Z Q) R N. Is this set countable or uncountable? Solution: The set is equal to {(x, y) x Z, y N} = Z N. Since the Cartesian product of two denumerable sets is denumerable,
More informationFoundations of Mathematics MATH 220 FALL 2017 Lecture Notes
Foundations of Mathematics MATH 220 FALL 2017 Lecture Notes These notes form a brief summary of what has been covered during the lectures. All the definitions must be memorized and understood. Statements
More informationMath 320: Real Analysis MWF 1pm, Campion Hall 302 Homework 8 Solutions Please write neatly, and in complete sentences when possible.
Math 320: Real Analysis MWF pm, Campion Hall 302 Homework 8 Solutions Please write neatly, and in complete sentences when possible. Do the following problems from the book: 4.3.5, 4.3.7, 4.3.8, 4.3.9,
More informationMAT115A-21 COMPLETE LECTURE NOTES
MAT115A-21 COMPLETE LECTURE NOTES NATHANIEL GALLUP 1. Introduction Number theory begins as the study of the natural numbers the integers N = {1, 2, 3,...}, Z = { 3, 2, 1, 0, 1, 2, 3,...}, and sometimes
More informationVector Space Basics. 1 Abstract Vector Spaces. 1. (commutativity of vector addition) u + v = v + u. 2. (associativity of vector addition)
Vector Space Basics (Remark: these notes are highly formal and may be a useful reference to some students however I am also posting Ray Heitmann's notes to Canvas for students interested in a direct computational
More informationLecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable
Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed
More informationAlgebraic function fields
Algebraic function fields 1 Places Definition An algebraic function field F/K of one variable over K is an extension field F K such that F is a finite algebraic extension of K(x) for some element x F which
More information1 Introduction It will be convenient to use the inx operators a b and a b to stand for maximum (least upper bound) and minimum (greatest lower bound)
Cycle times and xed points of min-max functions Jeremy Gunawardena, Department of Computer Science, Stanford University, Stanford, CA 94305, USA. jeremy@cs.stanford.edu October 11, 1993 to appear in the
More informationMAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9
MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended
More informationTHE MAXIMAL SUBGROUPS AND THE COMPLEXITY OF THE FLOW SEMIGROUP OF FINITE (DI)GRAPHS
THE MAXIMAL SUBGROUPS AND THE COMPLEXITY OF THE FLOW SEMIGROUP OF FINITE (DI)GRAPHS GÁBOR HORVÁTH, CHRYSTOPHER L. NEHANIV, AND KÁROLY PODOSKI Dedicated to John Rhodes on the occasion of his 80th birthday.
More informationTheorems. Theorem 1.11: Greatest-Lower-Bound Property. Theorem 1.20: The Archimedean property of. Theorem 1.21: -th Root of Real Numbers
Page 1 Theorems Wednesday, May 9, 2018 12:53 AM Theorem 1.11: Greatest-Lower-Bound Property Suppose is an ordered set with the least-upper-bound property Suppose, and is bounded below be the set of lower
More information6 Markov Chain Monte Carlo (MCMC)
6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution
More informationMathematical Methods for Computer Science
Mathematical Methods for Computer Science Computer Science Tripos, Part IB Michaelmas Term 2016/17 R.J. Gibbens Problem sheets for Probability methods William Gates Building 15 JJ Thomson Avenue Cambridge
More informationMonte Carlo Methods. Leon Gu CSD, CMU
Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte
More informationNotes on Measure, Probability and Stochastic Processes. João Lopes Dias
Notes on Measure, Probability and Stochastic Processes João Lopes Dias Departamento de Matemática, ISEG, Universidade de Lisboa, Rua do Quelhas 6, 1200-781 Lisboa, Portugal E-mail address: jldias@iseg.ulisboa.pt
More informationContents. 4 Arithmetic and Unique Factorization in Integral Domains. 4.1 Euclidean Domains and Principal Ideal Domains
Ring Theory (part 4): Arithmetic and Unique Factorization in Integral Domains (by Evan Dummit, 018, v. 1.00) Contents 4 Arithmetic and Unique Factorization in Integral Domains 1 4.1 Euclidean Domains and
More informationTopology. Xiaolong Han. Department of Mathematics, California State University, Northridge, CA 91330, USA address:
Topology Xiaolong Han Department of Mathematics, California State University, Northridge, CA 91330, USA E-mail address: Xiaolong.Han@csun.edu Remark. You are entitled to a reward of 1 point toward a homework
More informationTreball final de grau GRAU DE MATEMÀTIQUES Facultat de Matemàtiques Universitat de Barcelona MARKOV CHAINS
Treball final de grau GRAU DE MATEMÀTIQUES Facultat de Matemàtiques Universitat de Barcelona MARKOV CHAINS Autor: Anna Areny Satorra Director: Dr. David Márquez Carreras Realitzat a: Departament de probabilitat,
More informationLectures on Markov Chains
Lectures on Markov Chains David M. McClendon Department of Mathematics Ferris State University 2016 edition 1 Contents Contents 2 1 Markov chains 4 1.1 The definition of a Markov chain.....................
More informationFuzzy and Non-deterministic Automata Ji Mo ko January 29, 1998 Abstract An existence of an isomorphism between a category of fuzzy automata and a cate
University of Ostrava Institute for Research and Applications of Fuzzy Modeling Fuzzy and Non-deterministic Automata Ji Mo ko Research report No. 8 November 6, 1997 Submitted/to appear: { Supported by:
More informationLecture 11: Introduction to Markov Chains. Copyright G. Caire (Sample Lectures) 321
Lecture 11: Introduction to Markov Chains Copyright G. Caire (Sample Lectures) 321 Discrete-time random processes A sequence of RVs indexed by a variable n 2 {0, 1, 2,...} forms a discretetime random process
More information