MARKOV CHAINS: STATIONARY DISTRIBUTIONS AND FUNCTIONS ON STATE SPACES

JAMES READY

Abstract. In this paper, we first introduce the concepts of Markov chains and their stationary distributions. We then discuss total variation distance and mixing times, which we use to bound the rate of convergence of a Markov chain to its stationary distribution. Finally, we consider functions on state spaces, discuss which properties of Markov chains are preserved under such functions, and show that the mixing time of a Markov chain is greater than or equal to the mixing time of its image.

Contents

1. Introduction
2. Stationary Distributions
3. Convergence
4. Functions
Acknowledgments
References

1. Introduction

Definition 1.1. A Markov chain with countable state space $\Omega$ is a sequence of $\Omega$-valued random variables $X_1, X_2, X_3, \dots$ such that for any states $x_i \in \Omega$ and any time $n$,
$$P\{X_n = x_n \mid X_{n-1} = x_{n-1},\, X_{n-2} = x_{n-2},\, \dots,\, X_0 = x_0\} = P\{X_n = x_n \mid X_{n-1} = x_{n-1}\}.$$

This definition says that the state of a Markov chain depends only on the state immediately preceding it, and is independent of any earlier behavior of the chain.

We will often denote the probability of going from one state to another, $P\{X_n = x_n \mid X_{n-1} = x_{n-1}\}$, by $p(x_{n-1}, x_n)$. To refer to the probability of going from one state to another in $j$ steps, $P\{X_{n+j} = y \mid X_n = x\}$, we will use the notation $p_j(x, y)$. We will also often refer to the transition probability matrix $P$ of a Markov chain: the $|\Omega| \times |\Omega|$ matrix in which, for all $i, j$ in $\Omega$, $P_{ij} = p(i, j)$.

Definition 1.2. A Markov chain $X_n$ with transition probability matrix $P$ is irreducible if for any two states $x, y$ there exists an integer $t$ (possibly depending on $x$ and $y$) such that $p_t(x, y) > 0$.

By this definition, if a Markov chain $X_n$ is irreducible, then starting from any state there is positive probability that the chain will reach any other state in finite time.

Definition 1.3. Let $T(x) = \{t \geq 1 : p_t(x, x) > 0\}$ be the set of times $t$ for which there is positive probability for the chain to return to its starting position $x$ at time $t$. The period of state $x$ is defined to be the greatest common divisor of $T(x)$.

Date: August 3, 2010.
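Since the development here is purely theoretical, a small simulation may help fix the notation. The following Python sketch is illustrative only: the 3-state matrix and all identifiers are invented examples, not part of the paper. It samples a trajectory using only the current state (the Markov property of Definition 1.1) and checks an empirical two-step frequency against $p_2(x, y) = (P^2)_{xy}$.

```python
import numpy as np

# An invented transition matrix P on the state space {0, 1, 2};
# row x holds the one-step distribution p(x, .).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.4, 0.6]])

def sample_path(P, x0, steps, rng):
    """Sample X_0, ..., X_steps; each step depends only on the current state."""
    path = [x0]
    for _ in range(steps):
        path.append(rng.choice(len(P), p=P[path[-1]]))
    return np.array(path)

rng = np.random.default_rng(0)
path = sample_path(P, x0=0, steps=200_000, rng=rng)

# Empirical estimate of p_2(0, 2) against the matrix power (P^2)[0, 2].
starts = np.where(path[:-2] == 0)[0]
print((path[starts + 2] == 2).mean(), np.linalg.matrix_power(P, 2)[0, 2])
```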

Definition 1.4. A Markov chain $X_n$ is said to be aperiodic if the period of every one of its states is $1$.

We will often discuss Markov chains that are both irreducible and aperiodic. In this case, the following theorem is useful.

Theorem 1.5. If $X_n$ is an aperiodic and irreducible Markov chain with transition probability matrix $P$ on some finite state space $\Omega$, then there exists an integer $M$ such that for all $m > M$, $p_m(x, y) > 0$ for all $x, y \in \Omega$.

Proof. For any $x$, recall from the definition of periodicity that $T(x) = \{t : p_t(x, x) > 0\}$, and since $X_n$ is aperiodic, we know $\gcd(T(x)) = 1$. Consider any $s$ and $t$ in $T(x)$. Since
$$p_{s+t}(x, x) \geq p_s(x, x)\,p_t(x, x) > 0,$$
we know that $s + t$ is in $T(x)$ as well, so $T(x)$ is closed under addition. We now use the fact from number theory that any set of non-negative integers which is closed under addition and has greatest common divisor $1$ must contain all but finitely many of the non-negative integers. This implies that we can find some $t_x$ such that $t \in T(x)$ for all $t \geq t_x$.

Fix any $x, y \in \Omega$. Because $X_n$ is irreducible, there exists some $k_{(x,y)}$ such that $p_{k_{(x,y)}}(x, y) > 0$. Thus, for $t \geq t_x + k_{(x,y)}$,
$$p_t(x, y) \geq p_{t - k_{(x,y)}}(x, x)\,p_{k_{(x,y)}}(x, y) > 0.$$
Now, for any $x$, let $t'_x = t_x + \max_y k_{(x,y)}$. For all $y$, we have $p_{t'_x}(x, y) > 0$. It follows that for all $m \geq \max_x t'_x$, we have $p_m(x, y) > 0$ for any $x, y \in \Omega$. $\blacksquare$

2. Stationary Distributions

Definition 2.1. For any Markov chain $X_n$ with transition probability matrix $P$, a stationary distribution of $X_n$ is any probability distribution $\pi$ satisfying
$$\pi = \pi P.$$
By matrix multiplication, it is equivalent that, for any $y \in \Omega$,
$$\pi(y) = \sum_{x \in \Omega} \pi(x)\,p(x, y).$$

I will show in this section that, for any Markov chain $X_n$ on a finite state space, a stationary distribution $\pi$ exists, and that if $X_n$ is irreducible, then $\pi$ is unique. In the next section, I will show that if $X_n$ is an irreducible and aperiodic Markov chain, there is a notion of convergence to its stationary distribution.

For most of the theorems in this section, the condition of a finite state space is necessary. To show this, consider the simple random walk on $\mathbb{Z}$, which is the Markov chain with transition probabilities
$$p(x, y) = \begin{cases} \tfrac{1}{2} & \text{if } |x - y| = 1, \\ 0 & \text{otherwise.} \end{cases}$$

I will show that the simple random walk cannot have any stationary distribution. Suppose the simple random walk on $\mathbb{Z}$ has a stationary distribution; that is, there exists some probability distribution $\pi$ such that
$$\pi(y) = \sum_{x \in \mathbb{Z}} \pi(x)\,p(x, y) = \tfrac{1}{2}\pi(y - 1) + \tfrac{1}{2}\pi(y + 1).$$
This means that $\pi : \mathbb{Z} \to [0, 1]$ is a harmonic function; that is, it satisfies the condition that, for all $x$, $\pi(x)$ equals the average of its neighbors. It can be shown that the only harmonic functions on $\mathbb{Z}$ are linear, and because $\pi$ must always be non-negative, the slope of $\pi$ must be zero; hence $\pi$ is constant. If $\pi(x) = 0$ for all $x$, then $\sum_x \pi(x) = 0$, so $\pi$ is not a probability distribution; similarly, if $\pi(x) = c \neq 0$ for all $x$, then $\sum_{x \in \mathbb{Z}} \pi(x) = \infty$, so $\pi$ is not a probability distribution. Thus, the simple random walk cannot have any stationary distribution.

Theorem 2.2. If $X_n$ is a Markov chain with transition probability matrix $P$ on a finite state space $\Omega$ with $|\Omega| = n$, then a stationary distribution $\pi$ of $X_n$ exists.

Proof. Let $S = \{\text{probability distributions on } \{1, 2, \dots, n\}\} \subseteq \mathbb{R}^n$. Because the components of a probability distribution are non-negative and sum to one, $S$ is a compact subset of $\mathbb{R}^n$. Choose any $\mu \in S$. We know that $\mu P, \mu P^2, \mu P^3, \dots$ all lie in $S$ as well. Now consider the sequence
$$\nu_n = \frac{1}{n}\sum_{k=0}^{n-1} \mu P^k,$$
which is also a sequence in $S$. Because $S$ is compact, this sequence has a convergent subsequence: there exists some subsequence $\nu_{n_j}$ that converges to some $\pi \in S$. I claim that this $\pi$ is a stationary distribution for $P$. We want to show that $\pi P = \pi$, or equivalently, $\pi P - \pi = 0$:
$$\pi P - \pi = \lim_{j \to \infty}\left(\frac{1}{n_j}\sum_{k=0}^{n_j - 1}\mu P^{k+1} - \frac{1}{n_j}\sum_{k=0}^{n_j - 1}\mu P^k\right) = \lim_{j \to \infty}\frac{1}{n_j}\left(\mu P^{n_j} - \mu\right) = 0,$$
since $\mu P^{n_j} - \mu$ is bounded and $\lim_{j \to \infty} \frac{1}{n_j} = 0$. $\blacksquare$
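The Cesàro-averaging argument in this proof is directly computable. Below is a minimal illustrative sketch (using the same invented 3-state matrix as the earlier snippet) that forms $\nu_n = \frac{1}{n}\sum_{k<n}\mu P^k$ and checks that the result is nearly stationary:

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.4, 0.6]])

mu = np.array([1.0, 0.0, 0.0])   # any starting distribution in S works
n = 10_000
avg = np.zeros(len(mu))
row = mu.copy()
for _ in range(n):               # avg becomes nu_n = (1/n) sum_{k<n} mu P^k
    avg += row / n
    row = row @ P                # advance mu P^k -> mu P^{k+1}
print("Cesaro average:", avg)
print("stationarity residual (should be ~0):", np.abs(avg @ P - avg).max())
```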

Theorem 2.3. If $X_n$ is an irreducible Markov chain with transition probability matrix $P$ on a finite state space $\Omega$, then the stationary distribution $\pi$ is unique.

Proof. Suppose there exist $\pi_1$ and $\pi_2$ such that $\pi_1 P = \pi_1$ and $\pi_2 P = \pi_2$. It is sufficient to show that $\pi_2(x)/\pi_1(x)$ is constant over all $x$, because if all stationary vectors are scalar multiples of each other, there can be at most one whose components sum to $1$.

Because $X_n$ is irreducible, we know that if $\pi P = \pi$, then $\pi(x) > 0$ for all $x$. This is true because: since $\sum_x \pi(x) = 1$, there must exist some $y$ such that $\pi(y) > 0$; if we choose any $x$, there must be some $m$ such that $p_m(y, x) > 0$ (or else $X_n$ would be reducible), and it immediately follows that $\pi(x) \geq \pi(y)\,p_m(y, x) > 0$, so $\pi$ can have no zero components.

Since $\Omega$ is finite, it follows that there exists some $\alpha > 0$ such that
$$\alpha = \min_x \frac{\pi_2(x)}{\pi_1(x)}.$$
This implies that, for all $y$, $\pi_2(y) - \alpha\pi_1(y) \geq 0$, and $\pi_2(y) - \alpha\pi_1(y) = 0$ for some $y$. If $\pi_2 - \alpha\pi_1$ is not the zero vector, then, for some appropriate scalar $\beta$, $\beta(\pi_2 - \alpha\pi_1)$ is also a stationary distribution. However, $\pi_2 - \alpha\pi_1$ vanishes at some $y$, contradicting what we showed above, that no stationary distribution of an irreducible Markov chain can have any component equal to $0$. Thus, $\pi_2 - \alpha\pi_1$ must equal $0$ at every $x$, so $\pi_2(x)/\pi_1(x)$ is constant, and $P$ can have only one stationary distribution. $\blacksquare$

3. Convergence

The goal of this section is to place an upper bound on the time it takes an irreducible, aperiodic Markov chain to converge to its stationary distribution. This section is mostly based on Chapter 4 of Markov Chains and Mixing Times [1].

3.1. Total Variation Distance. In order to define mixing times, we first need a way to measure distance between probability distributions.

Definition 3.1. For probability distributions $\mu$ and $\nu$ on $\Omega$, the total variation distance between $\mu$ and $\nu$, denoted $\|\mu - \nu\|_{TV}$, is
$$\|\mu - \nu\|_{TV} = \max_{A \subseteq \Omega} |\mu(A) - \nu(A)|.$$

Now we introduce some other characterizations of total variation distance and show that they are equivalent.

Proposition 3.2. $\|\mu - \nu\|_{TV} = \frac{1}{2}\sum_{x \in \Omega} |\mu(x) - \nu(x)|$.

Proof. Let $B = \{x : \mu(x) \geq \nu(x)\}$. For any event $A$,
$$\mu(A) - \nu(A) \leq \mu(A \cap B) - \nu(A \cap B) \leq \mu(B) - \nu(B).$$
Similarly,
$$\nu(A) - \mu(A) \leq \nu(B^c) - \mu(B^c).$$
Because $\mu(B) + \mu(B^c) = 1 = \nu(B) + \nu(B^c)$, we get that $\nu(B^c) - \mu(B^c) = \mu(B) - \nu(B)$, so this common value is an upper bound for $\|\mu - \nu\|_{TV}$, and considering the event $A = B$ gives equality. Thus,
$$\|\mu - \nu\|_{TV} = \frac{1}{2}\left[\mu(B) - \nu(B) + \nu(B^c) - \mu(B^c)\right] = \frac{1}{2}\sum_{x \in \Omega}|\mu(x) - \nu(x)|. \quad\blacksquare$$

Note that the above proof also shows that
$$\|\mu - \nu\|_{TV} = \sum_{x \in \Omega:\; \mu(x) > \nu(x)} \left(\mu(x) - \nu(x)\right).$$
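For a small state space, Proposition 3.2 can be sanity-checked by brute force: enumerate every event $A \subseteq \Omega$ and compare $\max_A |\mu(A) - \nu(A)|$ with the half-$\ell^1$ formula. A minimal sketch (the two distributions are invented for illustration):

```python
import numpy as np
from itertools import chain, combinations

def tv_by_events(mu, nu):
    """max_A |mu(A) - nu(A)|, brute-forced over all events A (Definition 3.1)."""
    idx = range(len(mu))
    events = chain.from_iterable(combinations(idx, r) for r in range(len(mu) + 1))
    return max(abs(mu[list(A)].sum() - nu[list(A)].sum()) for A in events)

def tv_half_l1(mu, nu):
    """(1/2) sum_x |mu(x) - nu(x)| (Proposition 3.2)."""
    return 0.5 * np.abs(mu - nu).sum()

mu = np.array([0.5, 0.3, 0.2])
nu = np.array([0.2, 0.2, 0.6])
print(tv_by_events(mu, nu), tv_half_l1(mu, nu))   # both ~ 0.4
```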

To understand the next characterization of total variation distance, we need the notion of coupling.

Definition 3.3. A coupling of two probability distributions $\mu$ and $\nu$ is a pair of random variables $(X, Y)$ such that the marginal distribution of $X$ is $\mu$ and the marginal distribution of $Y$ is $\nu$.

Thus, if $q$ is a joint distribution of $(X, Y)$ on $\Omega \times \Omega$, meaning $q(x, y) = P\{X = x, Y = y\}$, then $\sum_y q(x, y) = \mu(x)$ and $\sum_x q(x, y) = \nu(y)$. Clearly, $X$ and $Y$ can always have the same value only if $\mu$ and $\nu$ are identical.

Proposition 3.4. $\|\mu - \nu\|_{TV} = \inf\{P\{X \neq Y\} : (X, Y) \text{ is a coupling of } \mu \text{ and } \nu\}$.

Note: such a coupling is called an optimal coupling. In the proof below, we will show that an optimal coupling always exists.

Proof. For any event $A$,
$$\mu(A) - \nu(A) = P\{X \in A\} - P\{Y \in A\} \leq P\{X \in A,\; Y \notin A\} \leq P\{X \neq Y\},$$
$$\Rightarrow\; \|\mu - \nu\|_{TV} \leq \inf\{P\{X \neq Y\} : (X, Y) \text{ is a coupling of } \mu \text{ and } \nu\}.$$
Now, we need to construct a coupling for which equality holds. Let
$$p = \sum_{x \in \Omega} \min\{\mu(x), \nu(x)\}.$$

We first show that $p = 1 - \|\mu - \nu\|_{TV}$:
$$p = \sum_x \min\{\mu(x), \nu(x)\} = \sum_{x:\,\mu(x) \leq \nu(x)} \mu(x) + \sum_{x:\,\mu(x) > \nu(x)} \nu(x) = 1 - \sum_{x:\,\mu(x) > \nu(x)}\left(\mu(x) - \nu(x)\right) = 1 - \|\mu - \nu\|_{TV}.$$
Now, flip a coin with probability $p$ of heads. If the coin lands heads, choose the value of $X$ according to the probability distribution
$$\gamma_{I}(x) = \frac{\min\{\mu(x), \nu(x)\}}{p},$$
and let $Y = X$. If the coin lands tails, choose the value of $X$ according to the probability distribution
$$\gamma_{II}(x) = \begin{cases} \dfrac{\mu(x) - \nu(x)}{1 - p} & \text{if } \mu(x) > \nu(x), \\ 0 & \text{otherwise,} \end{cases}$$
and independently choose the value of $Y$ according to the probability distribution
$$\gamma_{III}(x) = \begin{cases} \dfrac{\nu(x) - \mu(x)}{1 - p} & \text{if } \nu(x) > \mu(x), \\ 0 & \text{otherwise.} \end{cases}$$
With these choices, the distribution of $X$ is $p\,\gamma_{I} + (1 - p)\,\gamma_{II} = \mu$, and the distribution of $Y$ is $p\,\gamma_{I} + (1 - p)\,\gamma_{III} = \nu$. This means we have defined a coupling $(X, Y)$ of $\mu$ and $\nu$. Note that $X \neq Y$ if and only if the coin lands tails. Thus,
$$P\{X \neq Y\} = 1 - p = \|\mu - \nu\|_{TV}. \quad\blacksquare$$
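The proof of Proposition 3.4 is constructive, and the construction translates directly into a sampler. The sketch below is illustrative (the function name is invented): it flips the coin with heads probability $p = \sum_x \min\{\mu(x), \nu(x)\}$ and draws from $\gamma_I$, $\gamma_{II}$, $\gamma_{III}$ exactly as in the proof, so the empirical frequency of $\{X \neq Y\}$ estimates $\|\mu - \nu\|_{TV}$:

```python
import numpy as np

def optimal_coupling_sample(mu, nu, rng):
    """One draw of (X, Y) from the coupling constructed in Proposition 3.4."""
    m = np.minimum(mu, nu)
    p = m.sum()                          # overlap mass: p = 1 - ||mu - nu||_TV
    if rng.random() < p:                 # heads: X = Y ~ gamma_I = min(mu, nu)/p
        x = rng.choice(len(mu), p=m / p)
        return x, x
    # tails: X ~ gamma_II and Y ~ gamma_III, drawn independently
    x = rng.choice(len(mu), p=np.maximum(mu - nu, 0) / (1 - p))
    y = rng.choice(len(mu), p=np.maximum(nu - mu, 0) / (1 - p))
    return x, y

rng = np.random.default_rng(1)
mu = np.array([0.5, 0.3, 0.2])
nu = np.array([0.2, 0.2, 0.6])
pairs = [optimal_coupling_sample(mu, nu, rng) for _ in range(100_000)]
print(np.mean([x != y for x, y in pairs]))   # ~ 0.4 = ||mu - nu||_TV
```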

3.2. The Convergence Theorem.

Theorem 3.5. Suppose $P$ is an irreducible, aperiodic Markov chain with stationary distribution $\pi$. Then there exist $\alpha \in (0, 1)$ and $C > 0$ such that, for all $t \geq 0$,
$$\max_{x \in \Omega} \|P^t(x, \cdot) - \pi\|_{TV} \leq C\alpha^t.$$

Proof. Since $P$ is irreducible and aperiodic, there exist $\delta > 0$ and $r \in \mathbb{N}$ such that, for all $x, y \in \Omega$,
$$p_r(x, y) \geq \delta\,\pi(y).$$
Let $\theta = 1 - \delta$, and let $\Pi$ be the square matrix with $|\Omega|$ rows, each of which is $\pi$. Then
$$P^r = (1 - \theta)\Pi + \theta Q$$
defines a stochastic matrix $Q$. Now we will show by induction that
$$P^{rk} = (1 - \theta^k)\Pi + \theta^k Q^k.$$
For $k = 1$, this is true, because it is exactly how we defined $Q$. Assume the formula holds for $k = n$. Then
$$P^{r(n+1)} = P^{rn}P^r = \left[(1 - \theta^n)\Pi + \theta^n Q^n\right]P^r = (1 - \theta^n)\Pi P^r + (1 - \theta)\theta^n Q^n\Pi + \theta^{n+1}Q^n Q.$$
Because $\Pi P^r = \Pi$ and $Q^n\Pi = \Pi$, we get
$$P^{r(n+1)} = (1 - \theta^{n+1})\Pi + \theta^{n+1}Q^{n+1},$$
completing the induction. Multiplying by $P^j$ and using $\Pi P^j = \Pi$,
$$P^{rk}P^j = (1 - \theta^k)\Pi P^j + \theta^k Q^k P^j \;\Rightarrow\; P^{rk+j} - \Pi = \theta^k\left(Q^k P^j - \Pi\right),$$
so, reading off the row indexed by $x_0$,
$$P^{rk+j}(x_0, \cdot) - \pi = \theta^k\left[Q^k P^j(x_0, \cdot) - \pi\right].$$
Taking total variation distances, the left-hand side becomes $\|P^{rk+j}(x_0, \cdot) - \pi\|_{TV}$, and the total variation term on the right-hand side is at most $1$, so for each $j \in \{0, 1, \dots, r - 1\}$ there exists some $C_j$ such that
$$\max_{x_0 \in \Omega}\|P^{rk+j}(x_0, \cdot) - \pi\|_{TV} \leq C_j\,\theta^k.$$
Now let $\alpha = \theta^{1/r}$ and $C = \theta^{-1}\max\{C_0, C_1, \dots, C_{r-1}\}$. Writing any $t$ as $t = rk + j$ with $k = \lfloor t/r \rfloor$ and $0 \leq j < r$, we have $\theta^k = \alpha^{t-j} \leq \theta^{-1}\alpha^t$, and therefore
$$\max_{x \in \Omega}\|P^t(x, \cdot) - \pi\|_{TV} \leq C\alpha^t. \quad\blacksquare$$

3.3. Standardizing Distance from Stationary. We now introduce notation for the distance of a Markov chain from its stationary distribution, and for the distance between two copies of the chain started at different states.

Definition 3.6.
$$d(t) = \max_{x \in \Omega}\|P^t(x, \cdot) - \pi\|_{TV}, \qquad \bar{d}(t) = \max_{x, y \in \Omega}\|P^t(x, \cdot) - P^t(y, \cdot)\|_{TV}.$$

Lemma 3.7. $d(t) \leq \bar{d}(t) \leq 2d(t)$.

Proof. Because the triangle inequality holds for total variation distance, for all $x, y \in \Omega$,
$$\|P^t(x, \cdot) - P^t(y, \cdot)\|_{TV} \leq \|P^t(x, \cdot) - \pi\|_{TV} + \|P^t(y, \cdot) - \pi\|_{TV},$$
so $\bar{d}(t) \leq 2d(t)$. Since $\pi$ is stationary,
$$\pi(A) = \sum_{y \in \Omega}\pi(y)\,P^t(y, A).$$
Thus we have
$$d(t) = \max_x \|P^t(x, \cdot) - \pi\|_{TV} = \max_x \max_A\left|\sum_y \pi(y)\left(P^t(x, A) - P^t(y, A)\right)\right| \leq \max_x \max_A \sum_y \pi(y)\left|P^t(x, A) - P^t(y, A)\right|$$
$$\leq \max_x \sum_y \pi(y)\max_A\left|P^t(x, A) - P^t(y, A)\right| = \max_x \sum_y \pi(y)\,\|P^t(x, \cdot) - P^t(y, \cdot)\|_{TV} \leq \max_{x, y}\|P^t(x, \cdot) - P^t(y, \cdot)\|_{TV} = \bar{d}(t).$$
The last inequality holds because $\sum_y \pi(y)(\cdot)$ is a convex combination. $\blacksquare$

Lemma 3.8. $\bar{d}$ is submultiplicative; that is, $\bar{d}(s + t) \leq \bar{d}(s)\,\bar{d}(t)$.

Proof. Fix $x, y \in \Omega$, and let $(X_s, Y_s)$ be the optimal coupling of $P^s(x, \cdot)$ and $P^s(y, \cdot)$. We showed earlier that this coupling exists and that
$$\|P^s(x, \cdot) - P^s(y, \cdot)\|_{TV} = P\{X_s \neq Y_s\}.$$
Now,
$$p_{s+t}(x, w) = \sum_z p_s(x, z)\,p_t(z, w) = \sum_z P\{X_s = z\}\,p_t(z, w) = E\left(p_t(X_s, w)\right),$$
and similarly $p_{s+t}(y, w) = E(p_t(Y_s, w))$. Therefore
$$p_{s+t}(x, w) - p_{s+t}(y, w) = E\left(p_t(X_s, w) - p_t(Y_s, w)\right),$$
$$\Rightarrow\; \|P^{s+t}(x, \cdot) - P^{s+t}(y, \cdot)\|_{TV} = \frac{1}{2}\sum_w\left|E\left(p_t(X_s, w) - p_t(Y_s, w)\right)\right| \leq \frac{1}{2}E\left(\sum_w\left|p_t(X_s, w) - p_t(Y_s, w)\right|\right) = E\left(\|P^t(X_s, \cdot) - P^t(Y_s, \cdot)\|_{TV}\right).$$
The total variation distance inside the expectation vanishes on the event $\{X_s = Y_s\}$ and is at most $\bar{d}(t)$ otherwise, so
$$\|P^{s+t}(x, \cdot) - P^{s+t}(y, \cdot)\|_{TV} \leq \bar{d}(t)\,P\{X_s \neq Y_s\} \leq \bar{d}(t)\,\bar{d}(s).$$
Maximizing over $x, y$ gives the result. $\blacksquare$

We now get the following corollary:

Corollary 3.9. For all $c \in \mathbb{N}$, $d(ct) \leq \bar{d}(ct) \leq \bar{d}(t)^c$.

Proof. From Lemma 3.7, $d(ct) \leq \bar{d}(ct)$. From Lemma 3.8,
$$\bar{d}(ct) = \bar{d}(\underbrace{t + t + \cdots + t}_{c\text{ times}}) \leq \underbrace{\bar{d}(t)\,\bar{d}(t)\cdots\bar{d}(t)}_{c\text{ times}} = \bar{d}(t)^c. \quad\blacksquare$$
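Both quantities in Definition 3.6 are computable from matrix powers, so Lemmas 3.7 and 3.8 can be observed numerically on a concrete chain. A minimal illustrative sketch, reusing the invented 3-state matrix from the earlier snippets:

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.4, 0.6]])

# Stationary distribution: left eigenvector of P for eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi /= pi.sum()

def tv(a, b):
    return 0.5 * np.abs(a - b).sum()

def d(t):
    Pt = np.linalg.matrix_power(P, t)
    return max(tv(Pt[x], pi) for x in range(len(P)))

def d_bar(t):
    Pt = np.linalg.matrix_power(P, t)
    n = len(P)
    return max(tv(Pt[x], Pt[y]) for x in range(n) for y in range(n))

print(d(3), d_bar(3), 2 * d(3))                      # d(t) <= d_bar(t) <= 2 d(t)
for s, t in [(1, 1), (1, 2), (2, 3)]:
    print(d_bar(s + t), "<=", d_bar(s) * d_bar(t))   # submultiplicativity
```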

3.4. Mixing Times.

Definition 3.10. The mixing time of a Markov chain, with respect to $\varepsilon$, is
$$t_{mix}(\varepsilon) = \min\{t : d(t) \leq \varepsilon\}.$$
We will also write $t_{mix} = t_{mix}(\tfrac{1}{4})$.

The choice of exactly $\tfrac{1}{4}$ is somewhat arbitrary, but as we will see below, it simplifies the calculation of a key inequality. By Corollary 3.9 and Lemma 3.7, for any $\ell \in \mathbb{N}$,
$$d(\ell\, t_{mix}(\varepsilon)) \leq \bar{d}(\ell\, t_{mix}(\varepsilon)) \leq \bar{d}(t_{mix}(\varepsilon))^\ell \leq (2\varepsilon)^\ell \;\Rightarrow\; d(\ell\, t_{mix}) \leq 2^{-\ell}.$$
This means that a Markov chain converges to its stationary distribution exponentially fast.
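Once $d(t)$ is computable, Definition 3.10 gives $t_{mix}$ by direct search, and the bound $d(\ell\, t_{mix}) \leq 2^{-\ell}$ can be observed. A self-contained illustrative sketch on the same invented example chain:

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.4, 0.6]])

# Stationary distribution: left eigenvector of P for eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi /= pi.sum()

def d(t):
    """d(t) = max_x || P^t(x, .) - pi ||_TV (Definition 3.6)."""
    Pt = np.linalg.matrix_power(P, t)
    return max(0.5 * np.abs(Pt[x] - pi).sum() for x in range(len(P)))

def t_mix(eps=0.25, t_max=1000):
    """Smallest t with d(t) <= eps (Definition 3.10)."""
    return next(t for t in range(1, t_max + 1) if d(t) <= eps)

tm = t_mix()
for ell in (1, 2, 3):
    print(ell, d(ell * tm), 2.0 ** -ell)   # observe d(ell * t_mix) <= 2^{-ell}
```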

0 JAMES READY but P fw = djw = eg = P f = a; = (b or c)g P f = (b or c)g = p p + p P fw = djw 0 = e; W = eg 6= P fw = djw = eg, so W n is not a Markov Chain. However, there does exist a condition on n that guarantees W n to be a Markov Chain. Theorem 4.. Let n be a Markov Chain with transition probability matrix P on some state space, and consider some function F :! W. Let W n = F ( n ). For each w W, let F (w) = fx jf (x) = wg. If for every pair of states w ; w in W and every pair of states x ; x in F (w ), yf (w ) then W n is a Markov Chain on W. p(x ; y) = yf (w ) p(x ; y) I will say that any Markov Chain with a function which satises the condition of the above theorem is function regular. Proof. We want to show that P fw n = w n jw n = w n g = P fw n = w n jw n = w n ; W n = w n ; ; W 0 = w 0 g We know for every x ; x in F (w n ), there exists some! R such that I claim that yf (w n) p(x ; y) = yf (w n) p(x ; y) =! P fw n = w n jw n = w n g = P fw n = w n jw n = w n ; W n = w n ; ; W 0 = w 0 g =! First, I will show this is true for the left hand side. P fw n = w n jw n = w n g = P f n F (w n )j n F (w n )g = P f n F (w n ); n F (w n )g P f n F (w n )g P PF = (P ( (w n ) n = x) yf (w n) p(x; y)) P f n F (w n )g PF (P ( (w n ) n = x)!) = P f n F (w n )g =! P F (w n ) (P ( n = x)) P f n F (w n )g =!

Now, the right-hand side:
$$P\{W_n = w_n \mid W_{n-1} = w_{n-1},\, \dots,\, W_0 = w_0\} = P\{X_n \in F^{-1}(w_n) \mid X_{n-1} \in F^{-1}(w_{n-1}),\, \dots,\, X_0 \in F^{-1}(w_0)\}$$
$$= \frac{P\{X_n \in F^{-1}(w_n),\; X_{n-1} \in F^{-1}(w_{n-1}),\; \dots,\; X_0 \in F^{-1}(w_0)\}}{P\{X_{n-1} \in F^{-1}(w_{n-1}),\; \dots,\; X_0 \in F^{-1}(w_0)\}}$$
$$= \sum_{x \in F^{-1}(w_{n-1})} P\{X_{n-1} = x \mid X_{n-1} \in F^{-1}(w_{n-1}),\, \dots,\, X_0 \in F^{-1}(w_0)\}\left(\sum_{y \in F^{-1}(w_n)} p(x, y)\right)$$
$$= \beta\sum_{x \in F^{-1}(w_{n-1})} P\{X_{n-1} = x \mid X_{n-1} \in F^{-1}(w_{n-1}),\, \dots,\, X_0 \in F^{-1}(w_0)\} = \beta.$$
Thus, $W_n$ is a Markov chain. $\blacksquare$

However, function regularity is only a sufficient condition for the image to be a Markov chain, not a necessary one. Let $X_n$ be the Markov chain represented by the transition probability matrix below:

$$\begin{array}{c|cccc} & a & b & c & d \\ \hline a & 0 & 0.5 & 0.5 & 0 \\ b & 1 & 0 & 0 & 0 \\ c & 0 & 0 & 0 & 1 \\ d & 0 & 0.5 & 0.5 & 0 \end{array}$$

Let $F(a) = 1$, $F(b) = F(c) = 2$, and $F(d) = 3$. Because $p(b, a) \neq p(c, a)$, $X_n$ is not function regular. We then have that $W_n$ is represented by the following transition probability matrix:

$$\begin{array}{c|ccc} & 1 & 2 & 3 \\ \hline 1 & 0 & 1 & 0 \\ 2 & 0.5 & 0 & 0.5 \\ 3 & 0 & 1 & 0 \end{array}$$

This counterexample demonstrates an issue with deciding how we determine the starting distribution of the $X_n$. Given any starting distribution of $W_n$, can we choose how we want to pull it back onto the $X_n$? If we know, or can fix, a pullback of the starting probabilities in $\Omega$, then $W_n$ is a Markov chain, so this is a counterexample. However, if we do not know how the starting distribution is pulled back in $\Omega$, and all we know is the observed probabilities of the $W_n$, then $P\{W_1 = 1 \mid W_0 = 2\}$ is not well defined, and $W_n$ is not a Markov chain.
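The condition of Theorem 4.1 is mechanical to verify for a finite chain: within each block $F^{-1}(w_1)$, the row sums into every block $F^{-1}(w_2)$ must agree. The sketch below is illustrative (the helper name `lumped` is invented; in the Markov chain literature this condition is usually called lumpability, a term the paper does not use). It checks the condition and, when it holds, builds the transition matrix of $W_n$; it correctly reports failure on the counterexample above, and succeeds on the bipartite chain that reappears in the periodicity example later in this section.

```python
import numpy as np

def lumped(P, blocks):
    """Check Theorem 4.1's condition for the partition `blocks` of the state
    space: within each block, the row sums into every other block must agree.
    Return the transition matrix of W_n if they do, else None."""
    Q = np.zeros((len(blocks), len(blocks)))
    for i, b1 in enumerate(blocks):
        for j, b2 in enumerate(blocks):
            sums = P[np.ix_(b1, b2)].sum(axis=1)   # one block-sum per x in b1
            if not np.allclose(sums, sums[0]):
                return None                        # not function regular
            Q[i, j] = sums[0]
    return Q

# The counterexample above: states a, b, c, d = 0, 1, 2, 3 and
# F(a) = 1, F(b) = F(c) = 2, F(d) = 3.
P = np.array([[0.0, 0.5, 0.5, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.5, 0.5, 0.0]])
print(lumped(P, [[0], [1, 2], [3]]))   # None, since p(b, a) != p(c, a)

# A function regular example: the bipartite chain from the periodicity
# counterexample below, with F(a) = F(b) = 1 and F(c) = F(d) = 2.
P2 = np.array([[0.0, 0.5, 0.0, 0.5],
               [0.5, 0.0, 0.5, 0.0],
               [0.0, 0.5, 0.0, 0.5],
               [0.5, 0.0, 0.5, 0.0]])
print(lumped(P2, [[0, 1], [2, 3]]))    # [[0.5, 0.5], [0.5, 0.5]]
```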

For the rest of this section, unless otherwise stated, we will not assume $X_n$ is function regular; however, we will assume that $W_n$ is a Markov chain. We now discuss some properties of Markov chains that are, or are not, preserved under functions on state spaces. First, we prove a lemma needed for the next theorem.

Lemma 4.2. Let $P$ be a transition probability matrix on a state space $\mathcal{Y}$. Then a probability distribution $\pi$ on $\mathcal{Y}$ is stationary for $P$ if, on some probability space, there exist $\mathcal{Y}$-valued random variables $Y_0, Y_1$ such that (a) each of $Y_0$ and $Y_1$ has marginal distribution $\pi$; and (b) the conditional distribution of $Y_1$ given $Y_0 = y$ is the $y$-th row of $P$.

Proof. Suppose we have random variables $Y_0, Y_1$ such that $P\{Y_1 = x \mid Y_0 = y\} = P(y, x)$ and $P\{Y_0 = y\} = P\{Y_1 = y\} = \pi(y)$. We want to show that $\pi$ is stationary; that is, $\pi(y_i) = \sum_{y \in \mathcal{Y}} \pi(y)\,P(y, y_i)$:
$$\sum_{y \in \mathcal{Y}} \pi(y)\,P(y, y_i) = \sum_{y \in \mathcal{Y}} P\{Y_0 = y\}\,P\{Y_1 = y_i \mid Y_0 = y\} = \sum_{y \in \mathcal{Y}} P\{Y_1 = y_i,\; Y_0 = y\} = \pi(y_i). \quad\blacksquare$$

Theorem 4.3. Let $X_n$, with stationary distribution $\pi$, be a function regular Markov chain, and let $W_n = F(X_n)$ as usual. Then the projection $\pi_F$ of $\pi$ to $W$, defined by $\pi_F(w) = \sum_{z \in F^{-1}(w)} \pi(z)$, is a stationary distribution for $W_n$.

Proof. Let $q$ denote the transition probability matrix of $W_n$. Define a pair of random variables $(Y_0, Y_1)$ by letting $Y_0$ have distribution $\pi_F$ and letting the conditional distribution of $Y_1$ given $Y_0 = y$ be the $y$-th row of $q$. The joint distribution is then
$$f(x, y) = P\{Y_1 = x,\; Y_0 = y\} = q(y, x)\,\pi_F(y) = q(y, x)\sum_{z \in F^{-1}(y)} \pi(z).$$
(Summing over $x \in W$ recovers $P\{Y_0 = y\} = \pi_F(y)$, since $\sum_{x \in W} q(y, x) = 1$.) By Lemma 4.2, it now suffices to show that $Y_1$ also has distribution $\pi_F$; that is, that $P\{Y_1 = x\} = \sum_{y \in W} f(x, y) = \pi_F(x)$.

Because $X_n$ is function regular, for any $z \in F^{-1}(y)$,
$$q(y, x) = P\{W_1 = x \mid W_0 = y\} = P\{X_1 \in F^{-1}(x) \mid X_0 \in F^{-1}(y)\} = P\{X_1 \in F^{-1}(x) \mid X_0 = z\} = \sum_{v \in F^{-1}(x)} p(z, v),$$
and the last expression does not depend on the choice of $z$. Choose any $u \in F^{-1}(y)$. Then
$$\sum_{y \in W} f(x, y) = \sum_{y \in W}\left(\sum_{v \in F^{-1}(x)} p(u, v)\right)\sum_{z \in F^{-1}(y)} \pi(z) = \sum_{y \in W}\sum_{z \in F^{-1}(y)}\sum_{v \in F^{-1}(x)} p(z, v)\,\pi(z)$$
(replacing $u$ by $z$, which function regularity allows)
$$= \sum_{v \in F^{-1}(x)}\sum_{z \in \Omega} p(z, v)\,\pi(z) \quad \text{(combining the $y$ sum and the $z$ sum)}$$
$$= \sum_{v \in F^{-1}(x)} \pi(v) \quad \text{(since $\pi$ is stationary for $P$)}$$
$$= \pi_F(x).$$
We have just shown that $P\{Y_1 = x\} = \sum_y f(x, y) = \pi_F(x)$, so we are done. $\blacksquare$

Now, as an example of a Markov chain with a function on its state space, I will introduce the Ehrenfest urn model. The state space is $\Omega = \{0, 1\}^N$. At each integer time $t$, one index $j \in \{1, \dots, N\}$ is chosen uniformly at random, and the $j$-th coordinate of the current vector is flipped. Thus, for two vectors $x$ and $y$, the transition probabilities are:
$$p(x, y) = \begin{cases} \tfrac{1}{N} & \text{if } x \text{ and } y \text{ differ in exactly one coordinate,} \\ 0 & \text{otherwise.} \end{cases}$$
This can be visualized as two urns containing $N$ balls in total: at each time $t$, one ball is chosen at random and moved to the other urn.

I claim that the stationary distribution $\pi$ of the Ehrenfest urn model is uniform, giving each of the $2^N$ possible vectors probability $\frac{1}{2^N}$. Because there are $2^N$ vectors, this is a probability distribution. It is stationary because each $y$ has exactly $N$ neighbors differing from it in one coordinate, so
$$\sum_x \pi(x)\,p(x, y) = \sum_{\{x\,:\,x,\,y \text{ differ in exactly one coordinate}\}} \frac{1}{2^N}\cdot\frac{1}{N} = N\cdot\frac{1}{2^N}\cdot\frac{1}{N} = \frac{1}{2^N} = \pi(y).$$
Because the Ehrenfest urn model is irreducible and $\Omega$ is finite, $\pi$ is unique.

Now, let $W = \{0, 1, 2, \dots, N\}$, and define
$$W_n = \sum_{j=1}^N X_n(j).$$
We know that $W_n$ is a Markov chain because $X_n$ is function regular (any vector with $k$ ones at time $t$ is equally likely to have $l$ ones at time $t + 1$). We can determine the transition probabilities of $W_n$ by looking at the transition probabilities of $X_n$. If $W_n = w$, then $W_{n+1}$ increases by $1$ if a $0$-coordinate is chosen to be flipped in $X_n$, and decreases by $1$ if a $1$-coordinate is chosen. Thus the transition probabilities are:
$$q(w, w + 1) = \frac{N - w}{N}, \qquad q(w, w - 1) = \frac{w}{N}, \qquad q(w, \text{any other value}) = 0.$$
Using Theorem 4.3, we can also determine the stationary distribution of $W_n$ from the stationary distribution $\pi$ of $X_n$. Since all vectors are equally likely, it is simply a matter of counting how many vectors in $\Omega$ map to each value in $W$. The number of ways to have $w$ ones in a vector of length $N$ is $\binom{N}{w}$, and there are $2^N$ binary vectors of length $N$, so the stationary distribution $\pi_F$ of $W_n$ is
$$\pi_F(w) = \frac{\binom{N}{w}}{2^N}.$$
(A numerical check of this computation appears after Theorem 4.4 below.)

We will now show that, if $X_n$ converges to its stationary distribution, then $W_n$ converges to its stationary distribution at least as fast.

Theorem 4.4. If $X_n$ is irreducible and aperiodic, then $W_n$ is irreducible and aperiodic.

Proof. Because $X_n$ is irreducible and aperiodic, there exists some $M$ such that for all $m > M$ and all $x, y \in \Omega$, $p_m(x, y) > 0$ (Theorem 1.5). We will show that $q_m(w_1, w_2) > 0$ for any $m > M$ and $w_1, w_2 \in W$, where $q_m(w_1, w_2)$ denotes the probability of going from $w_1$ to $w_2$ in $m$ steps in the chain $W_n$. Choose any $w_1, w_2 \in W$. Then
$$q_m(w_1, w_2) = P\{X_n \in F^{-1}(w_2) \mid X_{n-m} \in F^{-1}(w_1)\} = \frac{\sum_{x \in F^{-1}(w_1)} P\{X_{n-m} = x\}\sum_{y \in F^{-1}(w_2)} p_m(x, y)}{P\{X_{n-m} \in F^{-1}(w_1)\}} > 0,$$
since every $p_m(x, y) > 0$. This condition is stronger than the condition for irreducibility, so $W_n$ is irreducible. If we let $w_1 = w_2$, this condition says that for any $m > M$ and $w_1 \in W$, $q_m(w_1, w_1) > 0$, so $T(w_1) \supseteq \{m : m > M\}$. This implies that $\gcd(T(w_1)) = 1$ for all $w_1 \in W$, so $W_n$ is aperiodic. $\blacksquare$
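For small $N$, the Ehrenfest computation can be verified numerically: build the $2^N$-state chain, build the lumped birth-and-death chain, confirm the binomial stationary distribution, and compare distances from stationarity, which previews the comparison theorem proved next. A minimal illustrative sketch, with $N = 5$ an arbitrary choice:

```python
import numpy as np
from itertools import product
from math import comb

N = 5
states = list(product([0, 1], repeat=N))          # Omega = {0,1}^N
index = {s: i for i, s in enumerate(states)}

# Full Ehrenfest chain: flip one uniformly chosen coordinate.
P = np.zeros((2 ** N, 2 ** N))
for s in states:
    for j in range(N):
        s2 = list(s)
        s2[j] ^= 1
        P[index[s], index[tuple(s2)]] = 1.0 / N

# Lumped chain W_n = number of ones: w -> w+1 w.p. (N-w)/N, w -> w-1 w.p. w/N.
Q = np.zeros((N + 1, N + 1))
for w in range(N + 1):
    if w < N:
        Q[w, w + 1] = (N - w) / N
    if w > 0:
        Q[w, w - 1] = w / N

# The binomial distribution is stationary for the lumped chain (Theorem 4.3).
pi_W = np.array([comb(N, w) / 2 ** N for w in range(N + 1)])
print("stationarity residual:", np.abs(pi_W @ Q - pi_W).max())   # ~ 0

# d(t) for the image vs. the original chain: the image is never further away.
pi_X = np.full(2 ** N, 1.0 / 2 ** N)
def d(M, pi, t):
    Mt = np.linalg.matrix_power(M, t)
    return max(0.5 * np.abs(Mt[i] - pi).sum() for i in range(len(M)))
for t in (1, 2, 5):
    print(t, d(Q, pi_W, t), "<=", d(P, pi_X, t))
```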

Theorem 4.5. $d_{W_n}(t) \leq d_{X_n}(t)$.

Proof. Fix $w \in W$, and let $c_x = P\{X_0 = x \mid X_0 \in F^{-1}(w)\}$ for $x \in F^{-1}(w)$, for some pullback of the starting state; note that $\sum_{x \in F^{-1}(w)} c_x = 1$. Writing $q_t$ for the $t$-step transition probabilities of $W_n$ and $\pi_F$ for the projection of $\pi$ from Theorem 4.3, we have $q_t(w, y) = \sum_{x \in F^{-1}(w)} c_x \sum_{z \in F^{-1}(y)} p_t(x, z)$, and therefore
$$\|q_t(w, \cdot) - \pi_F\|_{TV} = \frac{1}{2}\sum_{y \in W}\left|\sum_{x \in F^{-1}(w)} c_x\sum_{z \in F^{-1}(y)}\left(p_t(x, z) - \pi(z)\right)\right| \leq \sum_{x \in F^{-1}(w)} c_x\,\frac{1}{2}\sum_{y \in W}\sum_{z \in F^{-1}(y)}\left|p_t(x, z) - \pi(z)\right|$$
$$= \sum_{x \in F^{-1}(w)} c_x\,\frac{1}{2}\sum_{z \in \Omega}\left|p_t(x, z) - \pi(z)\right| = \sum_{x \in F^{-1}(w)} c_x\,\|P^t(x, \cdot) - \pi\|_{TV} \leq \max_{x \in \Omega}\|P^t(x, \cdot) - \pi\|_{TV} = d_{X_n}(t),$$
where the first inequality is the triangle inequality and the last holds because $\sum_x c_x(\cdot)$ is a convex combination. Maximizing over $w \in W$ gives $d_{W_n}(t) \leq d_{X_n}(t)$. $\blacksquare$

Corollary 4.6. For all $\varepsilon$, $t_{mix}^{W_n}(\varepsilon) \leq t_{mix}^{X_n}(\varepsilon)$.

Proof. Recall that $t_{mix}(\varepsilon) = \min\{t : d(t) \leq \varepsilon\}$. The corollary follows immediately from Theorem 4.5. $\blacksquare$

I will now show some counterexamples of properties that are not preserved under functions on state spaces.

If $X_n$ is reducible, $W_n$ is not necessarily reducible. Consider an $X_n$ consisting of two identical, non-communicating classes, for example the Markov chain
$$a \leftrightarrow b \qquad\qquad c \leftrightarrow d,$$
with each transition having probability $1$. It is clear that there is a function (namely, $F(a) = F(c) = 1$ and $F(b) = F(d) = 2$) that maps the corresponding states in each class to the same point, creating a Markov chain with a single communicating class.

If $X_n$ is periodic, then $W_n$ is not necessarily periodic. Consider the Markov chain represented by the transition probability matrix below to be $X_n$:

$$\begin{array}{c|cccc} & a & b & c & d \\ \hline a & 0 & 0.5 & 0 & 0.5 \\ b & 0.5 & 0 & 0.5 & 0 \\ c & 0 & 0.5 & 0 & 0.5 \\ d & 0.5 & 0 & 0.5 & 0 \end{array}$$

Now let $F(a) = F(b) = 1$ and $F(c) = F(d) = 2$. We then have that $W_n$ is represented by the transition probability matrix below:

$$\begin{array}{c|cc} & 1 & 2 \\ \hline 1 & 0.5 & 0.5 \\ 2 & 0.5 & 0.5 \end{array}$$

Here, $X_n$ has period $2$, but its image has period $1$, so $W_n$ is aperiodic.

Consider the following lemma:

Lemma 4.7. Let $X_n$ be a function regular Markov chain, and $W_n = F(X_n)$. Then $q(w_1, w_2) = 0$ if and only if $p(x, y) = 0$ for all $x \in F^{-1}(w_1)$ and $y \in F^{-1}(w_2)$.

Proof. First, we show that the right-hand side implies the left-hand side. Suppose that for all $x \in F^{-1}(w_1)$ and $y \in F^{-1}(w_2)$, $p(x, y) = 0$. We then get
$$P\{X_n \in F^{-1}(w_2) \mid X_{n-1} \in F^{-1}(w_1)\} = 0,$$
which is equivalent to saying $q(w_1, w_2) = 0$, as desired.

Now, we show that the left-hand side implies the right-hand side by proving the contrapositive. Suppose there exist some $x_0 \in F^{-1}(w_1)$ and $y_0 \in F^{-1}(w_2)$ such that $p(x_0, y_0) \neq 0$. This means that $\sum_{y \in F^{-1}(w_2)} p(x_0, y) = \beta$ for some $\beta > 0$, and because $X_n$ is function regular, we know that $\sum_{y \in F^{-1}(w_2)} p(x, y) = \beta$ for all $x \in F^{-1}(w_1)$. Also, because we are conditioning on the event $X_{n-1} \in F^{-1}(w_1)$, we may assume $P\{X_{n-1} \in F^{-1}(w_1)\} > 0$. We can now show that
$$q(w_1, w_2) = P\{X_n \in F^{-1}(w_2) \mid X_{n-1} \in F^{-1}(w_1)\} = \frac{P\{X_n \in F^{-1}(w_2),\; X_{n-1} \in F^{-1}(w_1)\}}{P\{X_{n-1} \in F^{-1}(w_1)\}}$$
$$= \frac{\sum_{x \in F^{-1}(w_1)} P\{X_{n-1} = x\}\left(\sum_{y \in F^{-1}(w_2)} p(x, y)\right)}{P\{X_{n-1} \in F^{-1}(w_1)\}} = \beta\,\frac{P\{X_{n-1} \in F^{-1}(w_1)\}}{P\{X_{n-1} \in F^{-1}(w_1)\}} = \beta > 0. \quad\blacksquare$$

Corollary 4.8. If $X_n$ is irreducible and function regular, then $W_n$ is irreducible.

Proof. Recall our definition of an irreducible Markov chain: $X_n$ is irreducible if for any two states $x, y$ there exists an integer $t$ (possibly depending on $x$ and $y$) such that $p_t(x, y) > 0$. It follows from the lemma that if $X_n$ is irreducible, then $W_n$ is irreducible: if $p_t(x, y) > 0$, take a positive-probability path from $x$ to $y$ and apply the lemma to each step to obtain a positive-probability path of images from $F(x)$ to $F(y)$. $\blacksquare$

Corollary 4.9. If $X_n$ is aperiodic and function regular, then $W_n$ is aperiodic.

Proof. Recall the set $T(x) = \{t : p_t(x, x) > 0\}$ from our definition of the period of a Markov chain, and let $T_W(w) = \{t : q_t(w, w) > 0\}$. By the argument above, $T_W(F(x))$ always contains $T(x)$, so $\gcd(T_W(F(x))) \leq \gcd(T(x))$ for every $x$. Because $X_n$ is aperiodic, $\gcd(T(x)) = 1$ for all $x$, and because $F$ is always onto, every $w \in W$ equals $F(x)$ for some $x$; hence $\gcd(T_W(w)) = 1$ for all $w \in W$. Thus, $W_n$ is aperiodic. $\blacksquare$

Acknowledgments. It is a pleasure to thank my mentor, Yuan Shao, for helping me decide on a topic, assisting me with the difficult proofs and concepts, and looking over all of my work to ensure its correctness. I could not have done this paper without him.

References

[1] David A. Levin, Yuval Peres, and Elizabeth Wilmer. Markov Chains and Mixing Times. http://www.uoregon.edu/~dlevin/markov/markovmixing.pdf.
[2] Steven P. Lalley. Markov Chains: Basic Theory. http://galton.uchicago.edu/~lalley/courses/3/markovchains.pdf.

[3] Steven P. Lalley. Statistics 3: Homework Assignment 5. http://galton.uchicago.edu/~lalley/courses/3/hw5.pdf.