Achieving Stationary Distributions in Markov Chains. Monday, November 17, 2008 Rice University

Instructor: Dr. Volkan Cevher, STAT 1 / ELEC 9: Graphical Models
Scribes: Ryan E. Guerra, Tahira N. Saleem, Terrance D. Savitsky

1 Motivation

Markov Chain Monte Carlo (MCMC) simulation is a very popular method for producing samples from a known posterior distribution over hidden variables when the form of the distribution is so complex that it cannot be sampled directly. The MCMC algorithm instead draws samples from a proposal distribution from which we know it is easy to acquire samples. When certain properties are met by both the known posterior and the proposal distributions, the MCMC algorithm possesses the pleasing quality that samples drawn in successive iterations almost surely converge to samples drawn from the known posterior. The theory of the Markov Chain (MC) time-indexed random process supplies the properties that must be met by both the posterior distribution and the proposal distribution so that we may use MCMC algorithms. Specifically, we will enumerate the properties that a properly constructed transition probability matrix for a discrete-state Markov Chain must possess to ensure that it converges to an invariant, stationary distribution as the number of steps in our chain increases.

2 Review of Markov Chains

A MC is a time-indexed random process with the Markov property: given the present state, future states are independent of the past states. Consider a $k$-state Markov chain where, for $n \in \mathbb{Z}^+$ (the set of all positive integers),

$$\pi_j(0) = p(x_0 = s_j), \qquad \pi_j(n) = p(x_n = s_j).$$
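As a quick numerical illustration of the marginals $\pi_j(n)$, the sketch below estimates them two ways: exactly, by propagating the marginal vector one step at a time, and by Monte Carlo over simulated sample paths. The 2-state matrix is a hypothetical example, indexed row-wise ($T[i][j] = P(i \to j)$, i.e. the transpose of the column-stochastic convention used for $P$ in these notes):

```python
import random

# Hypothetical 2-state chain used only for illustration.
# T[i][j] = P(x_n = s_j | x_{n-1} = s_i): rows index the current state,
# so T is the transpose of the column-stochastic matrix P in the notes.
T = [[0.9, 0.1],
     [0.5, 0.5]]
pi0 = [1.0, 0.0]  # pi_j(0): start in state s_1 with certainty

def exact_marginal(pi0, T, n):
    """Propagate the marginal n steps: pi(n) = pi(n-1) T in row form."""
    pi = pi0[:]
    for _ in range(n):
        pi = [sum(pi[i] * T[i][j] for i in range(len(pi)))
              for j in range(len(pi))]
    return pi

def simulated_marginal(pi0, T, n, trials=20000):
    """Monte Carlo estimate of pi_j(n) = p(x_n = s_j) from sample paths."""
    counts = [0] * len(pi0)
    for _ in range(trials):
        x = random.choices(range(len(pi0)), weights=pi0)[0]
        for _ in range(n):
            x = random.choices(range(len(T)), weights=T[x])[0]
        counts[x] += 1
    return [c / trials for c in counts]

random.seed(0)
print([round(p, 3) for p in exact_marginal(pi0, T, 10)])  # -> [0.833, 0.167]
print(simulated_marginal(pi0, T, 10))                     # close to the exact values
```

The two estimates agree, and for this particular chain both are already close to the stationary distribution $[5/6,\ 1/6]$ after only ten steps.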

$$\cdots \rightarrow x_{n-1} \rightarrow x_n \rightarrow x_{n+1} \rightarrow \cdots$$

By D-separation (1) for a head-to-tail configuration, we know $x_{n+1} \perp \{x_1, x_2, \ldots, x_{n-1}\} \mid x_n$. We define the transition probability $P(i \to j) = P(x_n = s_j \mid x_{n-1} = s_i)$, the probability of moving from state $s_i$ at time $n-1$ to state $s_j$ at time $n$. We assume the transition matrix $P$ is time-homogeneous, meaning the probabilities do not vary with time (or space) in the Markov chain. More constructively, we build our transition probability matrix with $[P]_{ji} = P(i \to j)$. Define $\Pi(n)$ as the marginal probability distribution (vector) that supplies the probabilities of being in each of the states of the state space $S$ at time $n$. Then, given the marginal probability vector at time $n-1$, we may compute the same at time $n$:

$$\Pi(n) = P\,\Pi(n-1).$$

By iterating this recursion starting at time $n = 0$, we may obtain the marginal probability over the state space $S$ at time $n$ by repeated application of the transition matrix:

$$\Pi(n) = P^n\,\Pi(0).$$

3 Convergence to the Stationary Distribution

3.1 The transition matrix converges to an invariant distribution

We will demonstrate that if our Markov Chain satisfies certain properties reflected in the construction of the transition matrix $P$, then

$$\lim_{n \to \infty} P^n\,\Pi(0) = \Pi,$$

where $\Pi$ is an invariant/constant marginal distribution, independent of time. In other words, $P^n$ will converge to a rank-one matrix with constant columns equal to the stationary distribution.

3.2 Conditions for convergence of the transition matrix P

For a transition matrix $P$ to converge to an invariant distribution, it must possess the following properties (5):

Irreducibility - A MC is said to be irreducible if every state in the state space $S$ can be reached from every other state in a finite number of moves with positive probability. This property may also be stated as: every state communicates with every other state. We may express this property in compact form:

$$\exists\, n \geq 0 : p^{(n)}_{ij} > 0, \quad \forall\, i, j \in S.$$

An irreducible Markov Chain is said to possess a single class. If a MC contains multiple classes, then the MC may be divided into separate chains, each with its own transition matrix.

Aperiodicity - The period $d(i)$ of state $s_i$ is the greatest common divisor (gcd) of the numbers of steps after which the chain has positive probability of returning to that state:

$$d(i) := \gcd\{n \geq 1 : p^{(n)}_{ii} > 0\},$$

so $d(i)$ represents the step multiple required to return to state $i$. A state is called aperiodic if $d(i) = 1$. A MC is said to be aperiodic if all the states in the MC are aperiodic.

Recurrence - A state $s_i$ is called recurrent if the chain returns to $s_i$ with probability 1 in a finite number of steps. The MC is recurrent if all the states in $S$ are recurrent.

When these properties are satisfied by our MC, then all the entries of $P$ satisfy $0 < p_{ij} < 1$ for all $i, j$. Intuitively, this construction of $P$ says that there is some positive probability of moving to (or remaining in) any state from any other state at time $n$: we cannot get stuck in any state. The Perron-Frobenius theorem tells us that for a matrix $A$ with positive entries $a_{ij} > 0$ there is a positive real eigenvalue $r$ of $A$ such that any other eigenvalue $\lambda$ satisfies $|\lambda| < r$. The bound $r$ is referred to as the spectral radius of $A$.
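For a finite chain these conditions can be checked mechanically. A minimal sketch, using Wielandt's bound: a $k$-state chain is irreducible and aperiodic precisely when the $((k-1)^2 + 1)$-th power of its transition matrix is strictly positive. Both test matrices below are hypothetical examples, not from the lecture:

```python
def matmul(A, B):
    """Naive product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_regular(P):
    """True iff the chain with transition matrix P is irreducible and
    aperiodic; by Wielandt's bound it suffices to check that the
    ((k-1)^2 + 1)-th power of P has strictly positive entries."""
    k = len(P)
    M = P
    for _ in range((k - 1) ** 2):
        M = matmul(M, P)
    return all(entry > 0 for row in M for entry in row)

# The 2-state chain that always swaps states is irreducible but has
# period 2; giving each state a self-loop makes it aperiodic as well.
swap = [[0.0, 1.0], [1.0, 0.0]]
lazy = [[0.5, 0.5], [0.5, 0.5]]
print(is_regular(swap), is_regular(lazy))  # -> False True
```

Positivity of matrix powers does not depend on the row- versus column-stochastic convention, so the same check works for either orientation.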

The practical significance is that on repeated application of $A$, for example in a time-indexed random process, the direction of the largest eigenvalue dominates, so that starting in any state at time 0, repeated application of $A$ will drive the system to the invariant direction expressed by the eigenvector $v_1$ of $\lambda_1$, the largest eigenvalue of $A$. For example, if we have an eigenbasis $E = (v_1, \ldots, v_k)$ that spans $\mathbb{R}^k$ for $A$ (which means it is diagonalizable), then we may express any state vector $x \in \mathbb{R}^k$ as $x = c_1 v_1 + \cdots + c_k v_k$. Then,

$$Ax = A(c_1 v_1 + \cdots + c_k v_k) = c_1 \lambda_1 v_1 + \cdots + c_k \lambda_k v_k.$$

Then in repeated application of $A$ over $n$ steps, we have

$$\frac{A^n x}{\lambda_1^n} = c_1 v_1 + c_2 \left(\frac{\lambda_2}{\lambda_1}\right)^n v_2 + \cdots + c_k \left(\frac{\lambda_k}{\lambda_1}\right)^n v_k.$$

Since $|\lambda_j / \lambda_1| < 1$ for $j \geq 2$, we may conclude

$$\lim_{n \to \infty} \frac{A^n x}{\lambda_1^n} = c_1 v_1.$$

Returning to our transition matrix $P$, with $0 < p_{ij} < 1$ for all $i, j$, this construction of $P$ ensures that the largest eigenvalue is $\lambda_1 = 1$ and all other eigenvalues $\lambda \neq 1$ of $P$ satisfy $|\lambda| < 1$. Also, in this case there exists a vector having positive entries, summing to 1, which is an eigenvector associated with the eigenvalue $\lambda_1 = 1$. Both properties can then be used in combination to show that the limit $P^\infty := \lim_{k \to \infty} P^k$ exists and is a positive stochastic matrix of matrix rank one containing the desired stationary distribution $\Pi$, which is the eigenvector associated with $\lambda_1 = 1$. Note that since eigenvectors are unique only up to constants of proportionality, we enforce the constraint that the entries of the eigenvector must sum to 1 in order to provide the desired unique solution.

3.3 Eigendecomposition and diagonalizability

Recall that for a matrix $A$ the eigenvalues are the solutions of the characteristic polynomial

$$p_A(z) = \det(zI - A) = (z - \lambda_1)(z - \lambda_2)\cdots(z - \lambda_m) = 0.$$

The algebraic multiplicity of $\lambda$ is the number of times $\lambda$ is repeated among the roots of $p_A(z)$. The associated eigenvectors are derived for each $\lambda$ from $\ker(\lambda I - A)$. Then define the geometric multiplicity of $\lambda$ as the dimension of this space, $E_\lambda = \dim(\ker(\lambda I - A))$. We are able to define an eigenbasis for $A$, and therefore to diagonalize $A$, if the algebraic multiplicity equals the geometric multiplicity for all $\lambda$.
In this case, we may decompose $A = \Gamma \Lambda \Gamma^{-1}$, where the columns of $\Gamma$ form the eigenbasis of $A$ and $\Lambda$ is a diagonal matrix with the eigenvalues of $A$ as the diagonal entries.
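Both claims can be checked numerically. The sketch below uses a hand-picked $2 \times 2$ example (the matrix and its eigenpairs are assumptions of this illustration, not from the lecture): it rebuilds $A$ from $\Gamma \Lambda \Gamma^{-1}$ and then verifies that $A^n x / \lambda_1^n \to c_1 v_1$:

```python
def matmul(A, B):
    """Naive matrix product for small dense matrices (lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Hand-computed eigendecomposition of a 2x2 example (illustration only):
# A = [[2, 1], [1, 2]] has eigenvalues 3 and 1 with eigenvectors
# v1 = (1, 1) and v2 = (1, -1), so A = Gamma Lam Gamma^{-1}.
Gamma = [[1.0, 1.0],
         [1.0, -1.0]]          # columns are the eigenvectors v1, v2
Lam = [[3.0, 0.0],
       [0.0, 1.0]]             # eigenvalues on the diagonal
Gamma_inv = [[0.5, 0.5],
             [0.5, -0.5]]      # inverse of Gamma

A = matmul(matmul(Gamma, Lam), Gamma_inv)
print(A)                       # -> [[2.0, 1.0], [1.0, 2.0]]

# Repeated application drives any start vector toward the dominant
# eigendirection: A^n x / lambda_1^n -> c_1 v_1.
x = [[1.0], [0.0]]             # x = c1*v1 + c2*v2 with c1 = c2 = 0.5
lam1 = 3.0
for _ in range(40):
    x = matmul(A, x)
    x = [[v[0] / lam1] for v in x]      # divide by lambda_1 each step
print([round(v[0], 6) for v in x])      # -> [0.5, 0.5], i.e. c_1 v_1
```

The second eigendirection decays like $(\lambda_2/\lambda_1)^n = (1/3)^n$, so forty iterations are far more than enough for the limit $c_1 v_1 = 0.5\,(1, 1)$ to appear at machine precision.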

4 Example: Finding the Stationary Distribution

Consider a 3-state chain with transition matrix $P$, where $P(x_1 = s_j \mid x_0 = s_i) = [P]_{j,i} > 0$ for all $i, j$, so that $P$ is a column-stochastic matrix with strictly positive entries. From the Perron-Frobenius theorem and the properties of an irreducible, aperiodic and recurrent MC, we know the largest eigenvalue is $\lambda_1 = 1 > |\lambda_2| \geq |\lambda_3|$. Diagonalizing,

$$P = [v_1\, v_2\, v_3] \begin{pmatrix} 1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix} [v_1\, v_2\, v_3]^{-1} = \Gamma \begin{pmatrix} 1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix} \Gamma^{-1},$$

so that

$$P^\infty = \lim_{n \to \infty} P^n = \Gamma \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \Gamma^{-1},$$

and each column of $P^\infty$ equals

$$v_1 / \lVert v_1 \rVert_1 := u.$$
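A short sketch of this limit in code, using a hypothetical positive column-stochastic matrix in place of the lecture's numeric $P$: repeated squaring computes a high power of $P$, and the columns of the result all agree with the stationary distribution $u$:

```python
def matmul(A, B):
    """Naive product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical column-stochastic matrix (every COLUMN sums to 1),
# standing in for the lecture's numeric example.
P = [[0.5, 0.2, 0.3],
     [0.3, 0.6, 0.3],
     [0.2, 0.2, 0.4]]

Pinf = P
for _ in range(6):            # squaring six times yields P^64
    Pinf = matmul(Pinf, Pinf)

# The limit is rank one: every column equals the stationary distribution.
for row in Pinf:
    print([round(v, 4) for v in row])
# -> [0.3214, 0.3214, 0.3214]
#    [0.4286, 0.4286, 0.4286]
#    [0.25, 0.25, 0.25]
```

For this particular matrix the stationary distribution is $u = (9/28,\ 3/7,\ 1/4)$, which one can confirm directly from $Pu = u$.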

$$\Pi = u.$$

If we calculate the eigenvalues of our matrix $P$ we find $\lambda_1 = 1.0000$, $\lambda_2 = 0.5$, $\lambda_3 = 0.05$, and $u = [.18\ .5\ .9]$.

References

[1] C. Bishop, Pattern Recognition and Machine Learning, Cambridge, U.K.: Springer Science, 2006.

[2] G. Casella and E. George, "Explaining the Gibbs Sampler," The American Statistician, Vol. 46, No. 3, August 1992.

[3] S. Chib and E. Greenberg, "Understanding the Metropolis-Hastings Algorithm," The American Statistician, Vol. 49, No. 4, November 1995.

[4] J.L. Doob, Stochastic Processes, New York: John Wiley and Sons, 1953.

[5] S.P. Meyn and R.L. Tweedie, Markov Chains and Stochastic Stability, London: Springer-Verlag, 1993. Second edition to appear, Cambridge University Press, 2008. Online: http://decision.csl.uiuc.edu/~meyn/pages/book.html.