Lecture 5: Random Walks and Markov Chain


Spectral Graph Theory and Applications WS 2011/2012 — Lecturer: Thomas Sauerwald & He Sun

1 Introduction to Markov Chains

Definition 5.1. A sequence of random variables $X_0, X_1, \ldots \in \Omega$ is a Markov chain with state space $\Omega$ if for all possible states $x_{t+1}, x_t, \ldots, x_0 \in \Omega$ we have
$$\Pr\left[ X_{t+1} = x_{t+1} \mid X_t = x_t \cap X_{t-1} = x_{t-1} \cap \cdots \cap X_0 = x_0 \right] = \Pr\left[ X_{t+1} = x_{t+1} \mid X_t = x_t \right].$$
The Markov chain is called time-homogeneous if the latter probability is independent of $t$.

In this course, we will only consider finite Markov chains which are also time-homogeneous. In this case, it is convenient to store all information about the transitions of the Markov chain in the so-called transition matrix $P$, a $|\Omega| \times |\Omega|$ matrix defined by
$$P_{x,y} := \Pr\left[ X_1 = y \mid X_0 = x \right].$$

Properties of the transition matrix $P$: all entries of $P$ are in $[0,1]$, and every row sum equals $1$.

$$P = \begin{pmatrix} 0 & 1/2 & 1/2 & 0 & 0 \\ 1/3 & 0 & 1/3 & 1/3 & 0 \\ 1/4 & 1/4 & 0 & 1/4 & 1/4 \\ 0 & 1/2 & 1/2 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{pmatrix}$$

Figure 1: Example of a Markov chain corresponding to a random walk on a graph $G$ with 5 vertices. (The drawing of the graph on the vertices $1, \ldots, 5$ is omitted here.)

A very important special case is the Markov chain that corresponds to a random walk on an undirected, unweighted graph. Here, the random walk at each step picks a neighbor of the current vertex uniformly at random and moves to that neighbor. Hence the transition matrix is $P = D^{-1}A$, where $A$ is the adjacency matrix and $D$ is the diagonal matrix with $D_{i,i} = \deg(i)$. The next lemma shows how to compute the probability for multiple steps.
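To make this construction concrete, here is a minimal numerical sketch (not part of the original notes; it assumes Python with numpy) that builds $P = D^{-1}A$ for the 5-vertex graph of Figure 1 and checks the two properties of a transition matrix stated above.

```python
import numpy as np

# Adjacency matrix of the 5-vertex graph from Figure 1
# (edges: 1-2, 1-3, 2-3, 2-4, 3-4, 3-5).
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
], dtype=float)

deg = A.sum(axis=1)        # D_{i,i} = deg(i)
P = A / deg[:, None]       # P = D^{-1} A: divide row i by deg(i)

assert np.all((P >= 0) & (P <= 1))      # all entries lie in [0, 1]
assert np.allclose(P.sum(axis=1), 1.0)  # every row sums to 1
print(P)
```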

Lemma 5.2. For any two states $x, y \in \Omega$ and any round $t \in \mathbb{N}$,
$$\Pr\left[ X_t = y \mid X_0 = x \right] = P^t_{x,y},$$
where $P^0$ is the identity matrix.

Proof. Let $\mu^0$ be the (row) unit vector which has $1$ at component $x$ and $0$ otherwise (the starting distribution), and let $\mu^t$ be the distribution of the random walk at step $t$. Clearly,
$$\mu^1_y = P_{x,y} = \sum_{z \in \Omega} \mu^0_z P_{z,y},$$
and therefore $\mu^1 = \mu^0 P$, and more generally,
$$\mu^t = \mu^{t-1} P = \mu^0 P^t.$$

Let us now return to the above example and see what the powers of $P$ look like:
$$P = \begin{pmatrix} 0 & 1/2 & 1/2 & 0 & 0 \\ 1/3 & 0 & 1/3 & 1/3 & 0 \\ 1/4 & 1/4 & 0 & 1/4 & 1/4 \\ 0 & 1/2 & 1/2 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{pmatrix}, \quad
P^{10} \approx \begin{pmatrix} 0.17 & 0.24 & 0.32 & 0.17 & 0.09 \\ 0.16 & 0.25 & 0.34 & 0.16 & 0.08 \\ 0.16 & 0.26 & 0.35 & 0.16 & 0.08 \\ 0.17 & 0.24 & 0.32 & 0.17 & 0.09 \\ 0.18 & 0.24 & 0.31 & 0.18 & 0.09 \end{pmatrix}, \quad
\lim_{t \to \infty} P^t = \begin{pmatrix} 1/6 & 1/4 & 1/3 & 1/6 & 1/12 \\ 1/6 & 1/4 & 1/3 & 1/6 & 1/12 \\ 1/6 & 1/4 & 1/3 & 1/6 & 1/12 \\ 1/6 & 1/4 & 1/3 & 1/6 & 1/12 \\ 1/6 & 1/4 & 1/3 & 1/6 & 1/12 \end{pmatrix}$$

Another important special case of a Markov chain is the random walk on edge-weighted graphs. Here, every (undirected) edge $\{x,y\} \in E$ has a non-negative weight $w_{\{x,y\}}$. The transition matrix $P$ is defined as
$$P_{x,y} = \frac{w_{\{x,y\}}}{w_x}, \quad \text{where } w_x = \sum_{z \in N(x)} w_{\{x,z\}}.$$

Definition 5.3. A distribution $\pi$ is called stationary if $\pi P = \pi$. A distribution $\pi$ is (time-)reversible w.r.t. $P$ if it satisfies for all $x, y \in \Omega$,
$$\pi_x P_{x,y} = \pi_y P_{y,x}.$$

Lemma 5.4. Let $\pi$ be a probability distribution that is time-reversible w.r.t. $P$. Then $\pi$ is a stationary distribution for $P$.
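The convergence visible in the powers above can be reproduced with a short sketch (again an illustration with numpy, not from the notes): by Lemma 5.2 the distribution after $t$ steps is $\mu^0 P^t$, and it approaches $(1/6, 1/4, 1/3, 1/6, 1/12)$.

```python
import numpy as np

# Transition matrix of the random walk from Figure 1.
P = np.array([
    [0,   1/2, 1/2, 0,   0  ],
    [1/3, 0,   1/3, 1/3, 0  ],
    [1/4, 1/4, 0,   1/4, 1/4],
    [0,   1/2, 1/2, 0,   0  ],
    [0,   0,   1,   0,   0  ],
])

mu0 = np.array([1.0, 0, 0, 0, 0])   # walk starts at vertex 1

# Lemma 5.2: the distribution after t steps is mu0 @ P^t.
for t in (1, 10, 100):
    print(t, mu0 @ np.linalg.matrix_power(P, t))
# For t = 100 this prints, up to rounding, (1/6, 1/4, 1/3, 1/6, 1/12).
```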

Proof. Summing over all states $x \in \Omega$ and using reversibility yields
$$\sum_{x \in \Omega} \pi_x P_{x,y} = \sum_{x \in \Omega} \pi_y P_{y,x} = \pi_y.$$

Hence, if $\pi$ is time-reversible w.r.t. $P$, then once the distribution $\pi$ is attained, the chain moves with the same frequency from $x$ to $y$ as from $y$ to $x$. Random walks on graphs and random walks on edge-weighted graphs are always reversible. (A simple example of a non-reversible Markov chain is one for which there are two states $x, y$ with $P_{x,y} > 0$ but $P_{y,x} = 0$.)

Let us now compute the stationary distribution for three important examples of Markov chains:

- For a random walk on an (unweighted) graph $G = (V,E)$, $\pi_x = \deg(x)/(2|E|)$:
$$\sum_{y \in N(x)} \pi_y P_{y,x} = \sum_{y \in N(x)} \frac{\deg(y)}{2|E|} \cdot \frac{1}{\deg(y)} = \frac{\deg(x)}{2|E|} = \pi_x.$$
An even easier way to verify this is to use Lemma 5.4.

- If the transition matrix $P$ is symmetric, then $\pi = (1/n, \ldots, 1/n)$ is uniform.

- For a random walk with non-negative edge weights $w_{\{u,v\}}$, $\{u,v\} \in E$, let $w_G = \sum_{x \in V} w_x$. Then the Markov chain is reversible with $\pi_x = w_x / w_G$ (and hence $\pi$ is the stationary distribution):
$$\pi_x P_{x,y} = \frac{w_x}{w_G} \cdot \frac{w_{\{x,y\}}}{w_x} = \frac{w_{\{x,y\}}}{w_G} = \frac{w_{\{y,x\}}}{w_G} = \frac{w_y}{w_G} \cdot \frac{w_{\{y,x\}}}{w_y} = \pi_y P_{y,x}.$$
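As a quick numerical check of the first example (an illustrative sketch, not from the notes), one can verify detailed balance and stationarity for the unweighted random walk of Figure 1:

```python
import numpy as np

P = np.array([
    [0,   1/2, 1/2, 0,   0  ],
    [1/3, 0,   1/3, 1/3, 0  ],
    [1/4, 1/4, 0,   1/4, 1/4],
    [0,   1/2, 1/2, 0,   0  ],
    [0,   0,   1,   0,   0  ],
])
deg = np.array([2, 3, 4, 2, 1])
pi = deg / deg.sum()             # pi_x = deg(x) / (2|E|), here 2|E| = 12

F = pi[:, None] * P              # F_{x,y} = pi_x * P_{x,y}
assert np.allclose(F, F.T)       # time-reversibility (Definition 5.3)
assert np.allclose(pi @ P, pi)   # stationarity, as guaranteed by Lemma 5.4
```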

Figure 2: The Markov chain representing the weather in Saarbrücken. (The drawing is omitted; it shows the four states rainy, foggy, cloudy and sunny, both with their transition probabilities and as an edge-weighted graph with weight 5 on the loop at rainy, 2 on rainy–foggy, 1 on rainy–sunny, 1 on foggy–cloudy, 2 on the loop at cloudy, and 3 on cloudy–sunny.)

The transition matrix for the weather chain (the states are ordered R, F, C, S):
$$P = \begin{pmatrix} 5/8 & 1/4 & 0 & 1/8 \\ 2/3 & 0 & 1/3 & 0 \\ 0 & 1/6 & 1/3 & 1/2 \\ 1/4 & 0 & 3/4 & 0 \end{pmatrix}$$

By using the above formula for edge-weighted random walks, we can compute the stationary distribution:
$$\pi = (8/21, \, 3/21, \, 6/21, \, 4/21)$$
(note that $21$ is the sum of all weights of the vertices, where every edge weight is counted twice but every loop weight is counted once).

The naive approach to compute $P^t$ numerically involves $\log_2(t)$ matrix multiplications (the fastest known matrix-multiplication algorithm needs time $O(n^{2.376})$ per multiplication). Using reversibility, we now show how to do it with only a constant number of matrix multiplications (assuming that we can efficiently compute the eigenvectors; for very small $t$ it might actually be more efficient to compute $P^t$ directly). Moreover, the approach will give us a precise formula for any entry $P^t_{x,y}$ as a function of $t$.

Since the matrix $P$ is reversible w.r.t. $\pi$, it follows that the matrix $N$ with entries
$$N_{x,y} := \pi_x^{1/2} \, P_{x,y} \, \pi_y^{-1/2}$$
is symmetric. Let us rewrite this using diagonal matrices: $N = D_\pi^{1/2} P D_\pi^{-1/2}$. Hence the matrix $N$ can be diagonalized, i.e., there is an orthogonal matrix $R$ whose rows are eigenvectors of $N$ such that $R N R^{-1} = D$, where $D$ is the diagonal matrix with the eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$ of $N$. Therefore, we can compute powers of the matrix $P$ in the following way:
$$P^t = \left( D_\pi^{-1/2} N D_\pi^{+1/2} \right)^t = \left( D_\pi^{-1/2} R^{-1} D R D_\pi^{+1/2} \right)^t = D_\pi^{-1/2} R^{-1} D^t R D_\pi^{+1/2}.$$

Definition 5.5. A Markov chain is called aperiodic if for all states $x \in \Omega$:
$$\gcd\left( \left\{ t \in \mathbb{N} : P^t_{x,x} > 0 \right\} \right) = 1.$$
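The diagonalization recipe above can be spelled out in a few lines (a sketch with numpy, not from the notes; `eigh` plays the role of computing $R$ and $D$). It reproduces $P^t$ for the weather chain without repeated matrix multiplication:

```python
import numpy as np

# Weather chain (states ordered R, F, C, S) and its stationary distribution.
P = np.array([
    [5/8, 1/4, 0,   1/8],
    [2/3, 0,   1/3, 0  ],
    [0,   1/6, 1/3, 1/2],
    [1/4, 0,   3/4, 0  ],
])
pi = np.array([8, 3, 6, 4]) / 21

s = np.sqrt(pi)
N = s[:, None] * P / s[None, :]   # N = D_pi^{1/2} P D_pi^{-1/2}
assert np.allclose(N, N.T)        # symmetric, since P is reversible w.r.t. pi

lam, U = np.linalg.eigh(N)        # N = U diag(lam) U^T with U orthogonal

def P_power(t):
    # P^t = D_pi^{-1/2} U diag(lam^t) U^T D_pi^{1/2}
    return (U * lam**t) @ U.T * s[None, :] / s[:, None]

assert np.allclose(P_power(10), np.linalg.matrix_power(P, 10))
```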

$$P = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1/2 & 0 & 0 & 1/2 \\ 0 & 1 & 0 & 0 \end{pmatrix}, \quad
P^2 = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 1/2 & 0 & 0 & 1/2 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \quad
P^3 = \begin{pmatrix} 1/2 & 0 & 0 & 1/2 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1/2 & 0 & 0 & 1/2 \end{pmatrix}, \quad
P^4 = P.$$

Figure 3: Example of a Markov chain with period 3. The set $\{ t \in \mathbb{N} : P^t_{1,1} > 0 \}$ contains only multiples of 3. (The drawing of the chain on the states 1, 2, 3, 4 is omitted.)

Definition 5.6. A Markov chain is irreducible if for any two states $x, y \in \Omega$ there exists an integer $t$ such that $P^t_{x,y} > 0$.

Intuitively, a Markov chain is irreducible if it is connected when viewed as a graph. The following Markov chain corresponds to a random walk on the (unconnected) graph which consists of two disjoint cycles of length 2; hence the transition matrix $P$ below is not irreducible:
$$P = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

Note that $P$ has two stationary distributions: $\pi = (1/2, 1/2, 0, 0)$ and $\pi' = (0, 0, 1/2, 1/2)$.

Theorem 5.7 (Fundamental Theorem for Markov Chains). If a Markov chain is aperiodic and irreducible, then it holds for all states $x, y \in \Omega$ that:

there exists an integer $T = T(x,y) \in \mathbb{N}$ such that for all $t \geq T$, $P^t_{x,y} > 0$;
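One can confirm the period of the chain in Figure 3 directly from Definition 5.5 (a small sketch, assuming numpy; it inspects which powers of $P$ allow a return to state 1):

```python
import math
from functools import reduce
import numpy as np

# The period-3 chain from Figure 3.
P = np.array([
    [0,   1, 0, 0  ],
    [0,   0, 1, 0  ],
    [1/2, 0, 0, 1/2],
    [0,   1, 0, 0  ],
])

# Times t <= 30 at which a return 1 -> 1 has positive probability.
returns = [t for t in range(1, 31)
           if np.linalg.matrix_power(P, t)[0, 0] > 0]
print(returns)                     # [3, 6, 9, ..., 30]: only multiples of 3
print(reduce(math.gcd, returns))   # 3 != 1, so the chain is not aperiodic
```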

and, in the limit,
$$\lim_{t \to \infty} P^t_{x,y} = \pi_y = \frac{1}{\mathbb{E}\left[ \tau_{y,y} \right]},$$
where $\tau_{y,y} = \min\{ t \in \mathbb{N} \setminus \{0\} : X_t = y \}$ is the time of the first return to $y$ when the chain starts at $X_0 = y$.

Hence for large $t$, every entry in column $y$ of the matrix $P^t$ is approximately $\pi_y$. Moreover, note that $\pi_y = 1/\mathbb{E}[\tau_{y,y}]$ means that the long-run proportion of steps spent in state $y$ equals the reciprocal of the expected waiting time between two consecutive visits to $y$.

2 Speed of Convergence

We now focus on random walks on graphs. We assume $G = (V, E)$ is an undirected, unweighted and connected graph. For simplicity, we also assume that $G$ is $d$-regular. Recall that $M = D^{-1}A$ is the normalized adjacency matrix, which will be the transition matrix of the random walk on $G$, so $P = M$. Since $P$ is a symmetric matrix, $\pi = (1/n, \ldots, 1/n)$ is the stationary distribution.

Lemma 5.8. Let $P$ be any symmetric transition matrix. Then for any probability vector $x$,
$$\left\| x P^t - \pi \right\|_2 \leq \lambda^t,$$
where $\pi = (1/n, \ldots, 1/n)$ is the uniform vector and $\lambda = \max\{ |\lambda_2|, |\lambda_n| \}$.
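To illustrate Lemma 5.8 (a sketch assuming numpy; the complete graph $K_4$ is my choice of example, not one from the notes), the walk on the 3-regular graph $K_4$ has $\lambda = 1/3$, and the distance to the uniform distribution indeed decays at least that fast:

```python
import numpy as np

A = np.ones((4, 4)) - np.eye(4)   # adjacency matrix of K_4 (3-regular)
P = A / 3                         # symmetric transition matrix P = (1/d) A

eigs = np.sort(np.linalg.eigvalsh(P))    # lambda_n <= ... <= lambda_1 = 1
lam = max(abs(eigs[-2]), abs(eigs[0]))   # lambda = max(|lambda_2|, |lambda_n|) = 1/3

x = np.array([1.0, 0, 0, 0])             # start deterministically at one vertex
pi = np.full(4, 1/4)
for t in range(1, 6):
    err = np.linalg.norm(x @ np.linalg.matrix_power(P, t) - pi)
    print(t, err, lam**t)                # err <= lam^t, as Lemma 5.8 states
```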