Achieving Stationary Distributions in Markov Chains. Monday, November 17, 2008 Rice University

Instructor: Dr. Volkan Cevher, STAT 1 / ELEC 9: Graphical Models
Scribes: Ryan E. Guerra, Tahira N. Saleem, Terrance D. Savitsky

1 Motivation

Markov Chain Monte Carlo (MCMC) simulation is a very popular method for producing samples from a known posterior distribution over hidden variables when the form of the distribution is so complex that it cannot be sampled directly. The MCMC algorithm instead draws samples from a proposal distribution from which we know it is easy to acquire samples. When certain properties are met by both the known posterior and the proposal distributions, the MCMC algorithm possesses the pleasing quality that samples drawn in successive iterations almost surely converge to samples drawn from the known posterior. The theory of the Markov Chain (MC) time-indexed random process supplies the properties that must be met by both the posterior distribution and the proposal distribution so that we may use MCMC algorithms. Specifically, we will enumerate the properties that a properly constructed transition probability matrix for a discrete-state Markov Chain must possess to ensure that it converges to an invariant, stationary distribution as the number of steps in our chain increases.

2 Review of Markov Chains

A MC is a time-indexed random process with the Markov property: given the present state, future states are independent of the past states. Consider a $k$-state Markov chain where, for $n \in \mathbb{Z}^+$ (the set of all positive integers),

$$\pi_j(0) = p(x_0 = s_j), \qquad \pi_j(n) = p(x_n = s_j).$$
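As a quick numerical illustration of the marginals $\pi_j(n)$, the sketch below estimates them two ways: exactly, by propagating the marginal vector one step at a time, and by Monte Carlo over simulated sample paths. The 2-state matrix is a hypothetical example, indexed row-wise ($T[i][j] = P(i \to j)$, i.e. the transpose of the column-stochastic convention used for $P$ in these notes):

```python
import random

# Hypothetical 2-state chain used only for illustration.
# T[i][j] = P(x_n = s_j | x_{n-1} = s_i): rows index the current state,
# so T is the transpose of the column-stochastic matrix P in the notes.
T = [[0.9, 0.1],
     [0.5, 0.5]]
pi0 = [1.0, 0.0]  # pi_j(0): start in state s_1 with certainty

def exact_marginal(pi0, T, n):
    """Propagate the marginal n steps: pi(n) = pi(n-1) T in row form."""
    pi = pi0[:]
    for _ in range(n):
        pi = [sum(pi[i] * T[i][j] for i in range(len(pi)))
              for j in range(len(pi))]
    return pi

def simulated_marginal(pi0, T, n, trials=20000):
    """Monte Carlo estimate of pi_j(n) = p(x_n = s_j) from sample paths."""
    counts = [0] * len(pi0)
    for _ in range(trials):
        x = random.choices(range(len(pi0)), weights=pi0)[0]
        for _ in range(n):
            x = random.choices(range(len(T)), weights=T[x])[0]
        counts[x] += 1
    return [c / trials for c in counts]

random.seed(0)
print([round(p, 3) for p in exact_marginal(pi0, T, 10)])  # -> [0.833, 0.167]
print(simulated_marginal(pi0, T, 10))                     # close to the exact values
```

The two estimates agree, and for this particular chain both are already close to the stationary distribution $[5/6,\ 1/6]$ after only ten steps.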

$$\cdots \rightarrow x_{n-1} \rightarrow x_n \rightarrow x_{n+1} \rightarrow \cdots$$

By D-separation (1) for a head-to-tail configuration, we know $x_{n+1} \perp \{x_1, x_2, \ldots, x_{n-1}\} \mid x_n$. We define the transition probability $P(i \to j) = P(x_n = s_j \mid x_{n-1} = s_i)$, the probability of moving from state $s_i$ at time $n-1$ to state $s_j$ at time $n$. We assume the transition matrix $P$ is time-homogeneous, meaning the probabilities do not vary with time (or space) in the Markov chain. More constructively, we build our transition probability matrix with $[P]_{ji} = P(i \to j)$. Define $\Pi(n)$ as the marginal probability distribution (vector) that supplies the probabilities of being in each of the states of the state space $S$ at time $n$. Then, given the marginal probability vector at time $n-1$, we may compute the same at time $n$:

$$\Pi(n) = P\,\Pi(n-1).$$

By iterating this recursion starting at time $n = 0$, we may obtain the marginal probability over the state space $S$ at time $n$ by repeated application of the transition matrix:

$$\Pi(n) = P^n\,\Pi(0).$$

3 Convergence to the Stationary Distribution

3.1 The transition matrix converges to an invariant distribution

We will demonstrate that if our Markov Chain satisfies certain properties reflected in the construction of the transition matrix $P$, then

$$\lim_{n \to \infty} P^n\,\Pi(0) = \Pi,$$

where $\Pi$ is an invariant/constant marginal distribution, independent of time. In other words, $P^n$ will converge to a rank-one matrix with constant columns equal to the stationary distribution.

3.2 Conditions for convergence of the transition matrix P

For a transition matrix $P$ to converge to an invariant distribution, it must possess the following properties (5):

Irreducibility - A MC is said to be irreducible if every state in the state space $S$ can be reached from every other state in a finite number of moves with positive probability. This property may also be stated as: every state communicates with every other state. We may express this property in compact form:

$$\exists\, n \geq 0 : p^{(n)}_{ij} > 0, \quad \forall\, i, j \in S.$$

An irreducible Markov Chain is said to possess a single class. If a MC contains multiple classes, then the MC may be divided into separate chains, each with its own transition matrix.

Aperiodicity - The period $d(i)$ of state $s_i$ is the greatest common divisor (gcd) of the numbers of steps after which the chain has positive probability of returning to that state:

$$d(i) := \gcd\{n \geq 1 : p^{(n)}_{ii} > 0\},$$

so $d(i)$ represents the step multiple required to return to state $i$. A state is called aperiodic if $d(i) = 1$. A MC is said to be aperiodic if all the states in the MC are aperiodic.

Recurrence - A state $s_i$ is called recurrent if the chain returns to $s_i$ with probability 1 in a finite number of steps. The MC is recurrent if all the states in $S$ are recurrent.

When these properties are satisfied by our MC, then all the entries of $P$ satisfy $0 < p_{ij} < 1$ for all $i, j$. Intuitively, this construction of $P$ says that there is some positive probability of moving to (or remaining in) any state from any other state at time $n$: we cannot get stuck in any state. The Perron-Frobenius theorem tells us that for a matrix $A$ with positive entries $a_{ij} > 0$ there is a positive real eigenvalue $r$ of $A$ such that any other eigenvalue $\lambda$ satisfies $|\lambda| < r$. The bound $r$ is referred to as the spectral radius of $A$.
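For a finite chain these conditions can be checked mechanically. A minimal sketch, using Wielandt's bound: a $k$-state chain is irreducible and aperiodic precisely when the $((k-1)^2 + 1)$-th power of its transition matrix is strictly positive. Both test matrices below are hypothetical examples, not from the lecture:

```python
def matmul(A, B):
    """Naive product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_regular(P):
    """True iff the chain with transition matrix P is irreducible and
    aperiodic; by Wielandt's bound it suffices to check that the
    ((k-1)^2 + 1)-th power of P has strictly positive entries."""
    k = len(P)
    M = P
    for _ in range((k - 1) ** 2):
        M = matmul(M, P)
    return all(entry > 0 for row in M for entry in row)

# The 2-state chain that always swaps states is irreducible but has
# period 2; giving each state a self-loop makes it aperiodic as well.
swap = [[0.0, 1.0], [1.0, 0.0]]
lazy = [[0.5, 0.5], [0.5, 0.5]]
print(is_regular(swap), is_regular(lazy))  # -> False True
```

Positivity of matrix powers does not depend on the row- versus column-stochastic convention, so the same check works for either orientation.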

The practical significance is that on repeated application of $A$, for example in a time-indexed random process, the direction of the largest eigenvalue dominates, so that starting in any state at time 0, repeated application of $A$ will drive the system to the invariant direction expressed by the eigenvector $v_1$ of $\lambda_1$, the largest eigenvalue of $A$. For example, if we have an eigenbasis $E = (v_1, \ldots, v_k)$ that spans $\mathbb{R}^k$ for $A$ (which means it is diagonalizable), then we may express any state vector $x \in \mathbb{R}^k$ as $x = c_1 v_1 + \cdots + c_k v_k$. Then,

$$Ax = A(c_1 v_1 + \cdots + c_k v_k) = c_1 \lambda_1 v_1 + \cdots + c_k \lambda_k v_k.$$

Then in repeated application of $A$ over $n$ steps, we have

$$\frac{A^n x}{\lambda_1^n} = c_1 v_1 + c_2 \left(\frac{\lambda_2}{\lambda_1}\right)^n v_2 + \cdots + c_k \left(\frac{\lambda_k}{\lambda_1}\right)^n v_k.$$

Since $|\lambda_j / \lambda_1| < 1$ for $j \geq 2$, we may conclude

$$\lim_{n \to \infty} \frac{A^n x}{\lambda_1^n} = c_1 v_1.$$

Returning to our transition matrix $P$, with $0 < p_{ij} < 1$ for all $i, j$, this construction of $P$ ensures that the largest eigenvalue is $\lambda_1 = 1$ and all other eigenvalues $\lambda \neq 1$ of $P$ satisfy $|\lambda| < 1$. Also, in this case there exists a vector having positive entries, summing to 1, which is an eigenvector associated with the eigenvalue $\lambda_1 = 1$. Both properties can then be used in combination to show that the limit $P^\infty := \lim_{k \to \infty} P^k$ exists and is a positive stochastic matrix of matrix rank one containing the desired stationary distribution $\Pi$, which is the eigenvector associated with $\lambda_1 = 1$. Note that since eigenvectors are unique only up to constants of proportionality, we enforce the constraint that the entries of the eigenvector must sum to 1 in order to provide the desired unique solution.

3.3 Eigendecomposition and diagonalizability

Recall that for a matrix $A$ the eigenvalues are the solutions of the characteristic polynomial

$$p_A(z) = \det(zI - A) = (z - \lambda_1)(z - \lambda_2)\cdots(z - \lambda_m) = 0.$$

The algebraic multiplicity of $\lambda$ is the number of times $\lambda$ is repeated among the roots of $p_A(z)$. The associated eigenvectors are derived for each $\lambda$ from $\ker(\lambda I - A)$. Then define the geometric multiplicity of $\lambda$ as the dimension of this space, $E_\lambda = \dim(\ker(\lambda I - A))$. We are able to define an eigenbasis for $A$, and therefore to diagonalize $A$, if the algebraic multiplicity equals the geometric multiplicity for all $\lambda$.
In this case, we may decompose $A = \Gamma \Lambda \Gamma^{-1}$, where the columns of $\Gamma$ form the eigenbasis of $A$ and $\Lambda$ is a diagonal matrix with the eigenvalues of $A$ as the diagonal entries.
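Both claims can be checked numerically. The sketch below uses a hand-picked $2 \times 2$ example (the matrix and its eigenpairs are assumptions of this illustration, not from the lecture): it rebuilds $A$ from $\Gamma \Lambda \Gamma^{-1}$ and then verifies that $A^n x / \lambda_1^n \to c_1 v_1$:

```python
def matmul(A, B):
    """Naive matrix product for small dense matrices (lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Hand-computed eigendecomposition of a 2x2 example (illustration only):
# A = [[2, 1], [1, 2]] has eigenvalues 3 and 1 with eigenvectors
# v1 = (1, 1) and v2 = (1, -1), so A = Gamma Lam Gamma^{-1}.
Gamma = [[1.0, 1.0],
         [1.0, -1.0]]          # columns are the eigenvectors v1, v2
Lam = [[3.0, 0.0],
       [0.0, 1.0]]             # eigenvalues on the diagonal
Gamma_inv = [[0.5, 0.5],
             [0.5, -0.5]]      # inverse of Gamma

A = matmul(matmul(Gamma, Lam), Gamma_inv)
print(A)                       # -> [[2.0, 1.0], [1.0, 2.0]]

# Repeated application drives any start vector toward the dominant
# eigendirection: A^n x / lambda_1^n -> c_1 v_1.
x = [[1.0], [0.0]]             # x = c1*v1 + c2*v2 with c1 = c2 = 0.5
lam1 = 3.0
for _ in range(40):
    x = matmul(A, x)
    x = [[v[0] / lam1] for v in x]      # divide by lambda_1 each step
print([round(v[0], 6) for v in x])      # -> [0.5, 0.5], i.e. c_1 v_1
```

The second eigendirection decays like $(\lambda_2/\lambda_1)^n = (1/3)^n$, so forty iterations are far more than enough for the limit $c_1 v_1 = 0.5\,(1, 1)$ to appear at machine precision.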

4 Example: Finding the Stationary Distribution

Consider a 3-state chain with transition matrix $P$, where $P(x_1 = s_j \mid x_0 = s_i) = [P]_{j,i} > 0$ for all $i, j$, so that $P$ is a column-stochastic matrix with strictly positive entries. From the Perron-Frobenius theorem and the properties of an irreducible, aperiodic and recurrent MC, we know the largest eigenvalue is $\lambda_1 = 1 > |\lambda_2| \geq |\lambda_3|$. Diagonalizing,

$$P = [v_1\, v_2\, v_3] \begin{pmatrix} 1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix} [v_1\, v_2\, v_3]^{-1} = \Gamma \begin{pmatrix} 1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix} \Gamma^{-1},$$

so that

$$P^\infty = \lim_{n \to \infty} P^n = \Gamma \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \Gamma^{-1},$$

and each column of $P^\infty$ equals

$$v_1 / \lVert v_1 \rVert_1 := u.$$
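A short sketch of this limit in code, using a hypothetical positive column-stochastic matrix in place of the lecture's numeric $P$: repeated squaring computes a high power of $P$, and the columns of the result all agree with the stationary distribution $u$:

```python
def matmul(A, B):
    """Naive product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical column-stochastic matrix (every COLUMN sums to 1),
# standing in for the lecture's numeric example.
P = [[0.5, 0.2, 0.3],
     [0.3, 0.6, 0.3],
     [0.2, 0.2, 0.4]]

Pinf = P
for _ in range(6):            # squaring six times yields P^64
    Pinf = matmul(Pinf, Pinf)

# The limit is rank one: every column equals the stationary distribution.
for row in Pinf:
    print([round(v, 4) for v in row])
# -> [0.3214, 0.3214, 0.3214]
#    [0.4286, 0.4286, 0.4286]
#    [0.25, 0.25, 0.25]
```

For this particular matrix the stationary distribution is $u = (9/28,\ 3/7,\ 1/4)$, which one can confirm directly from $Pu = u$.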

$$\Pi = u.$$

If we calculate the eigenvalues of our matrix $P$ we find $\lambda_1 = 1.0000$, $\lambda_2 = 0.5$, $\lambda_3 = 0.05$, and $u = [.18\ .5\ .9]$.

References

[1] C. Bishop, Pattern Recognition and Machine Learning, Cambridge, U.K.: Springer Science, 2006.

[2] G. Casella and E. George, "Explaining the Gibbs Sampler," The American Statistician, Vol. 46, No. 3, August 1992.

[3] S. Chib and E. Greenberg, "Understanding the Metropolis-Hastings Algorithm," The American Statistician, Vol. 49, No. 4, November 1995.

[4] J.L. Doob, Stochastic Processes, New York: John Wiley and Sons, 1953.

[5] S.P. Meyn and R.L. Tweedie, Markov Chains and Stochastic Stability, London: Springer-Verlag, 1993. Second edition to appear, Cambridge University Press, 2008. Online: http://decision.csl.uiuc.edu/~meyn/pages/book.html.