Markov Chains, Random Walks on Graphs, and the Laplacian

CMPSCI 791BB: Advanced ML
Sridhar Mahadevan

Random Walks

- There is significant interest in the problem of random walks:
  - Markov chain analysis
  - Computer networks: routing
  - Markov decision processes and reinforcement learning
  - Spectral clustering and semi-supervised learning
  - Physical and biological processes
- There is a natural connection between a graph and a random walk on its vertices.
- Laplacian spectra and random walk convergence: how fast a random walk converges depends intimately on the spectrum, which in turn depends on the topology of the graph.

Examples of Markov Chains

- Knight's tour:
  - Assume a knight moves on an empty chessboard, starting from some position and randomly selecting one of the possible legal moves.
  - How long does it take the knight to return to its starting position?
- Shuffling a deck of d cards:
  - What is the optimal algorithm for shuffling a deck of cards?
  - Riffle shuffle: ~ log_2 d; top-to-random shuffle: ~ d log_2 d.
- Sampling from a complex distribution:
  - Let f : R^d -> (0, 1) be some arbitrary distribution.
  - How do we efficiently sample from f when d is large?
  - Markov chain Monte Carlo (MCMC) methods (a minimal sampler is sketched below).
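The MCMC idea in miniature: a hedged sketch of a Metropolis sampler for a one-dimensional target. The bimodal density and Gaussian proposal are made-up illustrations, not from the lecture.

```python
import numpy as np

def metropolis(log_f, x0, steps=10_000, scale=1.0, seed=0):
    """Minimal Metropolis sampler with a Gaussian random-walk proposal."""
    rng = np.random.default_rng(seed)
    x, samples = x0, []
    for _ in range(steps):
        y = x + scale * rng.normal()                     # propose a move
        if np.log(rng.uniform()) < log_f(y) - log_f(x):  # accept/reject step
            x = y
        samples.append(x)
    return np.array(samples)

# Made-up bimodal (unnormalized) target density
log_f = lambda x: np.log(np.exp(-(x - 2.0)**2) + np.exp(-(x + 2.0)**2))
samples = metropolis(log_f, x0=0.0)
print(samples.mean(), samples.std())   # roughly 0 and roughly 2
```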

Markov Chain

- A Markov chain is a stochastic process on a finite (or infinite) set of states S where
  P(s_{t+1} = j | s_t = i, s_{t-1} = k, ...) = P(s_{t+1} = j | s_t = i) = P_{ij}
  (the Markov property holds).
- Let α_0 be an initial distribution on S: α_0(i) = P(s_0 = i).
- We are interested in computing the long-term distribution as t -> infinity.
- Note that P(s_1 = i) = Σ_k P(s_1 = i | s_0 = k) P(s_0 = k), i.e., α_1 = α_0 P as a row vector, or equivalently α_1 = P^T α_0 as a column vector.
- So, by induction, we get α_t = (P^T)^t α_0 (computed numerically in the sketch below).
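A minimal NumPy sketch of this recursion; the two-state transition matrix is a made-up example, not from the lecture.

```python
import numpy as np

# Made-up 2-state transition matrix: row i holds P(s_{t+1} = j | s_t = i)
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

alpha = np.array([1.0, 0.0])   # alpha_0: start in state 0

# alpha_t = (P^T)^t alpha_0, by repeated multiplication
for t in range(50):
    alpha = P.T @ alpha

print(alpha)   # approaches the stationary distribution [5/6, 1/6]
```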

Types of Markov Chain

- There are many different types of Markov chains.
- A state s is recurrent if the chain started at s returns to s with probability 1; for a finite chain, equivalently, P^t(s, s) does not tend to 0 as t -> infinity.
- A state s is transient if P^t(s, s) -> 0 as t -> infinity.
- Two states u and v communicate if there is a positive probability of reaching each from the other: P^k(u, v) > 0 and P^l(v, u) > 0 for some k > 0, l > 0.
- A set of states is irreducible if it forms a single communicating class.
- A set of states is ergodic if all states in the set communicate with each other and do not communicate with states outside the set.
- A Markov chain is aperiodic if, for every state s, the gcd of all t such that P^t(s, s) > 0 is 1.
- A Markov chain is ergodic if its state space is irreducible and the chain is aperiodic.

Examples of Markov Chains

[Slide shows several small transition diagrams on states {1, 2}, with edge probabilities .3, .7, .5, .5, 1, .2, .8.]

What type of Markov chain does each of these define? Which states are recurrent or transient? Which chains are aperiodic? Does a Knight's tour define an ergodic Markov chain?

Limiting Distribution

- Let P be a stochastic matrix defining a Markov chain.
- The stationary (or invariant) distribution matrix P^* is defined as P^* = lim_{t -> infinity} P^t.
- If the Markov chain is irreducible (and aperiodic, so that the limit exists), the invariant distribution P^* has identical rows (see the numerical sketch below).
- In other words, the long-term probability of being in a state does not depend on the starting distribution.
- If a state is transient, what can be said about the column corresponding to it in the invariant distribution?
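To see the identical-rows claim numerically, one can approximate P^* by powering the same made-up matrix as before (it is irreducible and aperiodic here, since all entries of P are positive):

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

P_star = np.linalg.matrix_power(P, 100)   # approximates lim_{t -> inf} P^t
print(P_star)   # both rows approach the stationary distribution [5/6, 1/6]
```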

Random Walk on Undirected Graphs

- Let G = (V, E, W) be a weighted undirected graph.
- G defines a natural random walk, where:
  - The vertices of G are the states of a Markov chain.
  - The transition probabilities are p_{uv} = w_{uv} / d_u, where d_u = Σ_v w_{uv}.
- More compactly, we can write P = D^{-1} W, where D is the diagonal degree matrix (a sketch follows this list).
- What sort of Markov chain does this define?
- Does this chain converge to a stationary distribution?
- Is the rate of convergence related to the graph topology?
- How can we extend this model to directed graphs?
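A minimal sketch of forming P = D^{-1} W, on a made-up weighted triangle graph:

```python
import numpy as np

# Made-up symmetric weight matrix W of a 3-node weighted undirected graph
W = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])

d = W.sum(axis=1)      # degrees d_u = sum_v w_{uv}
P = W / d[:, None]     # P = D^{-1} W, row-stochastic

assert np.allclose(P.sum(axis=1), 1.0)
print(P)
```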

Reversible Random Walks on Graphs

- If an undirected graph is connected and non-bipartite, the Markov chain defined by the random walk is irreducible (and aperiodic).
- A random walk (or Markov chain) is reversible if α^*(u) P(u, v) = α^*(v) P(v, u).
- Random walks on undirected weighted graphs are reversible.
- Note from our earlier analysis that even though the random walk on a graph defines an asymmetric matrix, its eigenvalues are all real:
  D^{-1} W = D^{-1/2} (D^{-1/2} W D^{-1/2}) D^{1/2} = D^{-1/2} (I - L) D^{1/2},
  where L is the normalized Laplacian; P is thus similar to a symmetric matrix.
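Reusing the made-up triangle graph above, a quick check that the eigenvalues of D^{-1} W are real and that detailed balance holds with α^*(u) = d_u / Σ_v d_v (the standard stationary distribution of an undirected walk):

```python
import numpy as np

W = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])
d = W.sum(axis=1)
P = W / d[:, None]

# Eigenvalues of the (asymmetric) matrix P are all real ...
eigvals = np.linalg.eigvals(P)
assert np.allclose(np.imag(eigvals), 0.0)

# ... and detailed balance alpha*(u) P(u,v) = alpha*(v) P(v,u) holds
alpha_star = d / d.sum()
F = alpha_star[:, None] * P     # F[u, v] = alpha*(u) P(u, v)
assert np.allclose(F, F.T)
print("eigenvalues:", np.sort(np.real(eigvals)))
```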

Perron's Theorem for Positive Matrices

- A matrix A is positive if all its elements are positive.
- Theorem: For any positive square matrix A (of size n):
  - ρ(A) > 0, and ρ(A) is an eigenvalue of A.
  - Any other eigenvalue λ satisfies |λ| < ρ(A).
  - The eigenvector (called the Perron vector) associated with ρ(A) has all positive elements.
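A quick numerical illustration on a random entrywise-positive matrix (illustrative only, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(0.1, 1.0, size=(5, 5))   # entrywise-positive matrix

eigvals, eigvecs = np.linalg.eig(A)
i = int(np.argmax(np.abs(eigvals)))      # index of the dominant eigenvalue
rho = eigvals[i].real

assert rho > 0
assert all(abs(lam) < rho for j, lam in enumerate(eigvals) if j != i)

perron = eigvecs[:, i].real
perron = perron / perron.sum()           # fix sign/scale: entries all positive
assert (perron > 0).all()
print("spectral radius:", rho)
```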

Irreducible Matrices

- A matrix A is reducible if there exists a permutation matrix P such that
  P^T A P = [ B  C ]
            [ 0  D ]
- A matrix that cannot be reduced in this way is irreducible.
- A matrix is nonnegative if A >= 0 entrywise.
- Theorem: A nonnegative matrix A (of size n) is irreducible if and only if (I + A)^{n-1} > 0 (implemented in the sketch below).
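A direct sketch of the theorem as an irreducibility test (the function name is mine); the two example chains on the following slides can be checked with it:

```python
import numpy as np

def is_irreducible(A: np.ndarray) -> bool:
    """Test irreducibility of a nonnegative matrix A via (I + A)^(n-1) > 0."""
    n = A.shape[0]
    M = np.linalg.matrix_power(np.eye(n) + A, n - 1)
    return bool((M > 0).all())

# A 2-state permutation matrix is irreducible (though periodic):
print(is_irreducible(np.array([[0.0, 1.0],
                               [1.0, 0.0]])))   # True
```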

Example

[Slide shows a 3-state chain diagram. From the (I + P)^2 computation on the next slide, its transition matrix is]

P = [ .3  .7   0
       0  .6  .4
       0  .7  .3 ]

Is this irreducible?

Irreducibility Check

(I + P)^2 = [ 1.69  2.03  0.28
              0     2.84  1.16
              0     2.03  1.97 ]

Here n - 1 = 2 and (I + P)^2 has zero entries, so P is not irreducible: state 1 cannot be reached from states 2 or 3.

Example

[Slide shows a 3-state chain diagram. From the (I + P)^2 computation on the next slide, its transition matrix is]

P = [ .3  .7   0
      .1  .5  .4
       0  .7  .3 ]

Is this irreducible?

Irreducibility Check

(I + P)^2 = [ 1.76  1.96  0.28
              0.28  2.60  1.12
              0.07  1.96  1.97 ]

All entries are positive, so by the theorem P is irreducible.

Perron-Frobenius Theorem

- Perron-Frobenius Theorem: Let A be an irreducible nonnegative matrix (A >= 0). Then:
  - A has a positive real eigenvalue ρ such that every other eigenvalue λ of A satisfies |λ| <= ρ (with strict inequality when A is primitive, i.e., also aperiodic).
  - The eigenvector x corresponding to ρ has all positive entries, and is the unique such eigenvector (up to scaling).
  - ρ is a simple eigenvalue (both geometrically and algebraically).
- The largest eigenvalue modulus of A (here ρ itself) is called its spectral radius.
- Simple corollary: if A is irreducible, so is A^T; in that case, the positive Perron eigenvector of A^T is a left eigenvector of A.

Applying the Perron-Frobenius Theorem

- Let P be an irreducible stochastic matrix, so that Σ_j P_{ij} = 1.
- Since all its entries are nonnegative, we can apply the Perron-Frobenius theorem.
- Note, however, that we already know the constant vector 1 is an eigenvector of P, since P 1 = 1.
- Being positive, it must be the Perron vector!
- Thus, the spectral radius of P is ρ = 1 (checked numerically below).
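A sketch checking this on the irreducible example chain reconstructed earlier (assuming that reconstruction):

```python
import numpy as np

P = np.array([[0.3, 0.7, 0.0],
              [0.1, 0.5, 0.4],
              [0.0, 0.7, 0.3]])

eigvals = np.linalg.eigvals(P)
print(max(abs(eigvals)))      # spectral radius rho = 1.0
print(P @ np.ones(3))         # the constant vector is the Perron vector: [1. 1. 1.]
```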

Convergence of Random Walks

- Theorem: Let G = (V, E, W) be a connected, non-bipartite weighted undirected graph, and let α_0 be an initial probability distribution on V. Let
  λ = max { |λ_i(P(G))| : λ_i(P(G)) != 1 }
  (i.e., 1 - λ is the Fiedler eigenvalue).
- Then, for a simple random walk on G, the distribution α_t converges geometrically to the stationary distribution α^*:
  ||α_t - α^*||_2 <= λ^t ||α_0||_2.

Proof of Convergence Theorem

- Let Q = P^T, and let y_i, 1 <= i <= n, be orthonormal eigenfunctions of Q (the walk is reversible, so Q is similar to a symmetric matrix and its spectrum is real).
- Note that λ_n(Q) = 1 (why?) and therefore λ < 1.
- We apply the Perron-Frobenius theorem and deduce that all components of y_n are positive: y_n(v) > 0.
- Since the eigenfunctions y_i form a basis for R^n, we can write α_0 = Σ_{i=1}^n w_i y_i.
- Observe that w_n = <α_0, y_n> > 0.
- We can now write the distribution at time step t as
  α_t = Q^t α_0 = Σ_{i=1}^{n-1} w_i λ_i^t y_i + w_n y_n.

Proof of Convergence Theorem (continued)

- Since λ < 1, α_t -> w_n y_n = α^*.
- Hence, the difference can be written as
  α_t - α^* = Σ_{i=1}^{n-1} w_i λ_i^t y_i,
  so ||α_t - α^*||_2 <= λ^t ||α_0||_2.
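A small numerical illustration of the geometric rate, again on the made-up triangle graph: ||α_t - α^*|| is printed alongside λ^t.

```python
import numpy as np

W = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])
d = W.sum(axis=1)
P = W / d[:, None]

alpha_star = d / d.sum()                        # stationary distribution
lam = sorted(abs(np.linalg.eigvals(P)))[-2]     # max |eigenvalue| != 1

alpha = np.array([1.0, 0.0, 0.0])               # alpha_0: start at vertex 0
for t in range(1, 11):
    alpha = P.T @ alpha
    print(t, np.linalg.norm(alpha - alpha_star), lam**t)
```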

Implications of this Result

- The theorem quantifies the relationship between the rate at which the random walk approaches the stationary distribution and the first nonzero eigenvalue λ_1 of the normalized Laplacian (why?).
- By deriving bounds on λ_1 for example graphs, we can see how fast a random walk will mix on these graphs.
- Spielman's lectures (2 and 3) derive lower bounds for canonical graphs. For a path graph P_n on n nodes, λ_1 > 4/n^2 (checked numerically below).
- The mixing rate of a random walk is defined as 1/(1 - λ).
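Checking the path-graph bound numerically (unweighted path, my own construction). Note the pure walk on a path is bipartite, so the sketch uses the signed second eigenvalue of P, i.e., the Laplacian gap, rather than the eigenvalue modulus.

```python
import numpy as np

n = 20
W = np.zeros((n, n))
for i in range(n - 1):                 # unweighted path graph P_n
    W[i, i + 1] = W[i + 1, i] = 1.0

d = W.sum(axis=1)
P = W / d[:, None]

ev = np.sort(np.real(np.linalg.eigvals(P)))[::-1]   # real spectrum, descending
gap = 1 - ev[1]    # first nonzero eigenvalue of the normalized (random-walk) Laplacian
print(gap, 4 / n**2)                   # the bound: gap > 4 / n^2
```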

Laplacian for Directed Graphs

In the next class, we will see how to apply the Perron-Frobenius theorem to define the Laplacian for directed graphs.