Probability & Computing

Stochastic Process
A stochastic process $\{X_t \mid t \in T\}$ is a collection of random variables indexed by time $t \in T$, taking values in a state space $\Omega$; $X_t \in \Omega$ is the state at time $t$, with states $x \in \Omega$.
discrete time: $T$ is countable, say $T = \{0, 1, 2, \dots\}$
discrete space: $\Omega$ is finite or countably infinite
We study discrete-time, discrete-space processes $X_0, X_1, X_2, \dots$

Markov Property
The dependency structure of $X_0, X_1, X_2, \dots$
Markov property (memorylessness): $X_{t+1}$ depends only on $X_t$, that is, $\forall t = 0, 1, 2, \dots$ and $\forall x_0, x_1, \dots, x_{t-1}, x, y \in \Omega$:
$\Pr[X_{t+1} = y \mid X_0 = x_0, \dots, X_{t-1} = x_{t-1}, X_t = x] = \Pr[X_{t+1} = y \mid X_t = x]$
Markov chain: a discrete-time, discrete-space stochastic process with the Markov property.

Transition Matrix
Markov chain $X_0, X_1, X_2, \dots$:
$\Pr[X_{t+1} = y \mid X_0 = x_0, \dots, X_{t-1} = x_{t-1}, X_t = x] = \Pr[X_{t+1} = y \mid X_t = x] = P^{(t)}_{xy} = P_{xy}$ (time-homogeneous)
$P$ is a stochastic matrix: $\forall x \in \Omega$, $\sum_{y \in \Omega} P_{xy} = 1$.

chain: $X_0, X_1, X_2, \dots$; distributions: $\pi^{(0)}, \pi^{(1)}, \pi^{(2)}, \dots \in [0,1]^{\Omega}$, where $\pi^{(t)}_x = \Pr[X_t = x]$ and $\sum_{x \in \Omega} \pi^{(t)}_x = 1$.
One step of the chain: $\pi^{(t+1)} = \pi^{(t)} P$, since
$\pi^{(t+1)}_y = \Pr[X_{t+1} = y] = \sum_{x \in \Omega} \Pr[X_t = x] \Pr[X_{t+1} = y \mid X_t = x] = \sum_{x \in \Omega} \pi^{(t)}_x P_{xy} = (\pi^{(t)} P)_y$.

$\pi^{(0)} \xrightarrow{P} \pi^{(1)} \xrightarrow{P} \pi^{(2)} \xrightarrow{P} \cdots \xrightarrow{P} \pi^{(t)} \xrightarrow{P} \pi^{(t+1)} \xrightarrow{P} \cdots$
initial distribution: $\pi^{(0)}$; transition matrix: $P$; hence $\pi^{(t)} = \pi^{(0)} P^t$
Markov chain: $\mathcal{M} = (\Omega, P)$
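
A minimal sketch of this iteration, assuming NumPy (the matrix $P$ is the three-state example from the next slide):

```python
import numpy as np

# Transition matrix of the 3-state example chain on the next slide.
P = np.array([[0,   1,   0  ],
              [1/3, 0,   2/3],
              [1/3, 1/3, 1/3]])

pi = np.array([1.0, 0.0, 0.0])   # pi^(0): start deterministically in state 1
for _ in range(20):
    pi = pi @ P                  # one step: pi^(t+1) = pi^(t) P
print(pi)                        # approaches [0.25, 0.375, 0.375]
```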

Example: a chain on states $\{1, 2, 3\}$ with edge weights $1/3$ and $2/3$ and transition matrix
$P = \begin{pmatrix} 0 & 1 & 0 \\ 1/3 & 0 & 2/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}$
(state 1 always moves to 2; state 2 moves to 1 w.p. $1/3$ and to 3 w.p. $2/3$; state 3 moves to 1, to 2, or stays, each w.p. $1/3$).

Convergence
For the example chain, the powers of $P$ converge:
$P^2 = \begin{pmatrix} 0.3333 & 0 & 0.6667 \\ 0.2222 & 0.5556 & 0.2222 \\ 0.2222 & 0.4444 & 0.3333 \end{pmatrix}$, ..., $P^{20} \approx \begin{pmatrix} 0.2500 & 0.3750 & 0.3750 \\ 0.2500 & 0.3750 & 0.3750 \\ 0.2500 & 0.3750 & 0.3750 \end{pmatrix}$
so from any initial distribution, $\pi^{(t)}$ converges to $(\tfrac{1}{4}, \tfrac{3}{8}, \tfrac{3}{8})$.

Stationary Distribution
Markov chain $\mathcal{M} = (\Omega, P)$; stationary distribution $\pi$: $\pi P = \pi$ (a fixed point).
Perron-Frobenius Theorem, for a stochastic matrix $P$:
1 is also a left eigenvalue of $P$ (an eigenvalue of $P^{\mathsf T}$);
the corresponding left eigenvector $\pi$, with $\pi P = \pi$, is nonnegative;
hence a stationary distribution always exists.

Perron-Frobenius
Perron-Frobenius Theorem: let $A$ be a nonnegative $n \times n$ matrix with spectral radius $\rho(A)$. Then:
$\rho(A)$ is an eigenvalue of $A$;
there is a nonnegative (left and right) eigenvector associated with $\rho(A)$;
if further $A$ is irreducible, then $\rho(A) > 0$, the eigenvalue $\rho(A)$ has multiplicity 1, and there is a positive (left and right) eigenvector associated with $\rho(A)$;
for a stochastic matrix $A$, the spectral radius is $\rho(A) = 1$.
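
A sketch of this computation, assuming NumPy: the stationary distribution of the example chain is its left eigenvector for eigenvalue 1, i.e. a right eigenvector of $P^{\mathsf T}$, normalized to sum to 1.

```python
import numpy as np

P = np.array([[0,   1,   0  ],
              [1/3, 0,   2/3],
              [1/3, 1/3, 1/3]])

vals, vecs = np.linalg.eig(P.T)          # left eigenvectors of P
k = int(np.argmin(np.abs(vals - 1.0)))   # locate the Perron eigenvalue 1
pi = np.real(vecs[:, k])
pi = pi / pi.sum()                       # normalize to a distribution
print(pi)                                # [0.25, 0.375, 0.375]
```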

Convergence
$P^{20} \approx \begin{pmatrix} 0.2500 & 0.3750 & 0.3750 \\ 0.2500 & 0.3750 & 0.3750 \\ 0.2500 & 0.3750 & 0.3750 \end{pmatrix}$
ergodic: convergent to the stationary distribution.

reducible
A chain on states $\{1, 2, 3, 4\}$ that splits into components $\{1, 2\}$ and $\{3, 4\}$:
$P = \begin{pmatrix} 1/2 & 1/2 & 0 & 0 \\ 1/3 & 2/3 & 0 & 0 \\ 0 & 0 & 3/4 & 1/4 \\ 0 & 0 & 1/4 & 3/4 \end{pmatrix}$,
$P^{20} \approx \begin{pmatrix} 0.4 & 0.6 & 0 & 0 \\ 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.5 & 0.5 \\ 0 & 0 & 0.5 & 0.5 \end{pmatrix}$
the limit depends on the starting state.

periodic
$P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, $P^2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, and in general $P^{2k} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, $P^{2k+1} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$:
the powers oscillate and never converge.

[Figure: the reducible four-state chain and the periodic two-state chain, side by side with the ergodic three-state example (edge weights 1/3 and 2/3).]

Fundamental Theorem of Markov Chains: if a finite Markov chain $\mathcal{M} = (\Omega, P)$ is irreducible and aperiodic, then for any initial distribution $\pi^{(0)}$,
$\lim_{t \to \infty} \pi^{(0)} P^t = \pi$ (ergodic),
where $\pi$ is the unique stationary distribution satisfying $\pi P = \pi$.

Irreducibility
$y$ is accessible from $x$: $\exists t$, $P^t(x, y) > 0$
$x$ communicates with $y$: $x$ is accessible from $y$ and $y$ is accessible from $x$
A Markov chain is irreducible if all pairs of states communicate.
[Figure: example chains on states $\{1, 2, 3, 4\}$ and their communicating classes.]

Reducible Chains
Two closed components $A$ and $B$:
$P = \begin{pmatrix} P_A & 0 \\ 0 & P_B \end{pmatrix}$
stationary distributions: every mixture $\pi = \alpha \pi_A + (1 - \alpha) \pi_B$ of the components' stationary distributions, so uniqueness fails.
If instead component $A$ can reach component $B$ but not vice versa, the stationary distribution is $\pi = (0, \pi_B)$.

Aperiodicity
period of state $x$: $d_x = \gcd\{t \mid P^t(x, x) > 0\}$
aperiodic chain: all states have period 1
Equivalently, the period of $x$ is the gcd of the lengths of all cycles through $x$.
[Figure: a state $x$ lying on cycles whose lengths have gcd 3, so $d_x = 3$.]

Lemma (period is a class property): if $x$ and $y$ communicate, then $d_x = d_y$.
Proof idea: fix a path $x \to y$ of length $n_1$ and a path $y \to x$ of length $n_2$. For any cycle through $y$ of length $n$:
$d_x \mid (n_1 + n_2)$ and $d_x \mid (n_1 + n_2 + n)$, hence $d_x \mid n$.
So $d_x$ divides the length of every cycle through $y$, giving $d_x \le d_y$; by symmetry, $d_x = d_y$.

Fundamental Theorem of Markov Chains: if a finite Markov chain $\mathcal{M} = (\Omega, P)$ is irreducible and aperiodic, then for any initial distribution $\pi^{(0)}$,
$\lim_{t \to \infty} \pi^{(0)} P^t = \pi$,
where $\pi$ is the unique stationary distribution satisfying $\pi P = \pi$.
finiteness + irreducibility ⇒ existence + uniqueness (Perron-Frobenius)
aperiodicity ⇒ convergence (ergodicity)

Coupling of Markov Chains
Markov chain $\mathcal{M} = (\Omega, P)$. Run two copies:
from the initial distribution $\pi^{(0)}$: $X_0, X_1, X_2, \dots$
from the stationary distribution $\pi$: $X'_0, X'_1, X'_2, \dots$
Couple them: run independently until the first time $n$ with $X_n = X'_n$, and from then on move together; the coupled copy $X''$ is still a faithful running of the chain.
Goal: $\lim_{t \to \infty} \Pr[X''_t = X'_t] = 1$.

Coupling of Markov Chains
With the same coupling, it suffices that the two copies meet: conditioning on $X_0 = x$ and $X'_0 = y$,
$\exists n$, $\Pr[X_n = X'_n = y] > 0$.

Markov chain $\mathcal{M} = (\Omega, P)$; condition on $X_0 = x$ and $X'_0 = y$.
irreducible: $\exists n_1$, $P^{n_1}(x, y) > 0$
aperiodic: $\exists n_2$ with $P^{n_2}(y, y) > 0$ and $P^{n_1 + n_2}(y, y) > 0$; set $n = n_1 + n_2$
Since the copies move independently until they meet,
$\Pr[X_n = X'_n = y] \ge \Pr[X_n = y] \Pr[X'_n = y] \ge P^{n_1}(x, y)\, P^{n_2}(y, y)\, P^{n}(y, y) > 0$.

Markov chain $\mathcal{M} = (\Omega, P)$, with the faithful coupled run $X''$ as above.
Conditioning on any $X_0 = x$, $X'_0 = y$: $\exists n$, $\Pr[X_n = X'_n] > 0$.
Repeating this argument over successive time blocks, the two copies eventually meet with probability 1, so
$\lim_{t \to \infty} \Pr[X''_t = X'_t] = 1$.

Fundamental Theorem of Markov Chains (recap): if a finite Markov chain $\mathcal{M} = (\Omega, P)$ is irreducible and aperiodic, then for any initial distribution $\pi^{(0)}$,
$\lim_{t \to \infty} \pi^{(0)} P^t = \pi$ (ergodic),
where $\pi$ is the unique stationary distribution satisfying $\pi P = \pi$.
finiteness + irreducibility ⇒ existence + uniqueness (Perron-Frobenius)
aperiodicity ⇒ convergence (ergodicity)

Fundamental Theorem of Markov Chains (general): if a Markov chain $\mathcal{M} = (\Omega, P)$ is irreducible and ergodic, then for any initial distribution $\pi^{(0)}$,
$\lim_{t \to \infty} \pi^{(0)} P^t = \pi$,
where $\pi$ is the unique stationary distribution satisfying $\pi P = \pi$.
finite chains: ergodic = aperiodic
infinite chains: ergodic = aperiodic + non-null persistent

Infinite Markov Chains
$f_{xy}$: probability of ever reaching $y$ starting from $x$; $h_{xy}$: expected time of first reaching $y$ starting from $x$:
$f_{xy}(t) \triangleq \Pr[y \notin \{X_1, \dots, X_{t-1}\} \wedge X_t = y \mid X_0 = x]$
$f_{xy} \triangleq \sum_{t \ge 1} f_{xy}(t)$, $h_{xy} \triangleq \sum_{t \ge 1} t \, f_{xy}(t)$
$x$ is transient if $f_{xx} < 1$
$x$ is recurrent (persistent) if $f_{xx} = 1$
$x$ is null-persistent if $x$ is persistent and $h_{xx} = \infty$
$x$ is non-null-persistent if $x$ is persistent and $h_{xx} < \infty$
A chain is non-null-persistent if all its states are.

Fundamental Theorem of Markov Chains (general): if a Markov chain $\mathcal{M} = (\Omega, P)$ is irreducible and ergodic, then for any initial distribution $\pi^{(0)}$,
$\lim_{t \to \infty} \pi^{(0)} P^t = \pi$,
where $\pi$ is the unique stationary distribution satisfying $\pi P = \pi$.
finite chains: ergodic = aperiodic; infinite chains: ergodic = aperiodic + non-null persistent
irreducible + non-null persistent ⇒ unique stationary distribution
aperiodic + non-null persistent ⇒ ergodic

PageRank
Rank: the importance of a page.
A page has higher rank if it is pointed to by more high-rank pages.
High-rank pages have greater influence.
Pages pointing to few others give each outgoing link greater influence.

PageRank (simplified)
the web graph $G(V, E)$; rank of a page $v$:
$r(v) = \sum_{u : (u,v) \in E} \frac{r(u)}{d^+(u)}$
where $d^+(u)$ is the out-degree of $u$.
Random walk: $P(u, v) = \frac{1}{d^+(u)}$ if $(u, v) \in E$, and $0$ otherwise.
The ranks form a stationary distribution, $rP = r$: a tireless random surfer.
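
A sketch of the simplified scheme as power iteration, assuming NumPy; the four-page link graph is made up for illustration. (Real PageRank adds a damping factor to handle dangling pages and periodicity; this toy graph happens not to need it.)

```python
import numpy as np

# Hypothetical 4-page link graph: u -> list of pages u links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n = 4

P = np.zeros((n, n))
for u, outs in links.items():
    for v in outs:
        P[u, v] = 1 / len(outs)   # P(u,v) = 1/d+(u) for each link (u,v)

r = np.full(n, 1 / n)             # start from the uniform distribution
for _ in range(100):
    r = r @ P                     # power iteration toward the fixed point rP = r
print(r)                          # page 3 has no in-links, so its rank -> 0
```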

Random Walk on a Graph
undirected graph $G(V, E)$
walk: $v_1, v_2, \dots \in V$ such that $v_{i+1} \sim v_i$
random walk: $v_{i+1}$ is chosen uniformly at random from $N(v_i)$:
$P(u, v) = \begin{cases} \frac{1}{d(u)} & u \sim v \\ 0 & u \not\sim v \end{cases}$
In matrix form, $P = D^{-1} A$ with adjacency matrix $A$ and diagonal degree matrix $D$: $D(u, v) = d(u)$ if $u = v$, $0$ if $u \ne v$.

Random Walk on a Graph
convergence (ergodicity): does the walk converge?
stationary distribution: to what?
hitting time: time to reach a vertex
cover time: time to reach all vertices
mixing time: time to converge

Random Walk on a Graph
$G(V, E)$; $P(u, v) = \frac{1}{d(u)}$ if $u \sim v$, else $0$. For a finite chain: irreducible and aperiodic ⇒ convergent.
irreducible ⇔ $G$ is connected (since $P^t(u, v) > 0 \Leftrightarrow A^t(u, v) > 0$)
aperiodic ⇔ $G$ is non-bipartite:
bipartite ⇔ no odd cycle, so every closed walk has even length and the period is 2;
non-bipartite ⇒ some $(2k+1)$-cycle, while each undirected edge already gives a 2-cycle, so the period divides $\gcd(2, 2k+1) = 1$: aperiodic.

Lazy Random Walk
undirected graph $G(V, E)$
lazy random walk: flip a fair coin to decide whether to stay:
$P(u, v) = \begin{cases} \frac{1}{2} & u = v \\ \frac{1}{2 d(u)} & u \sim v \\ 0 & \text{otherwise} \end{cases}$
In matrix form, $P = \frac{1}{2}(I + D^{-1} A)$ with adjacency matrix $A$ and degree matrix $D$.
Always aperiodic!
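
A sketch contrasting the two walks, assuming NumPy, on the 4-cycle (bipartite, so the simple walk has period 2 and its powers never converge, while the lazy walk's powers do):

```python
import numpy as np

A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])               # the 4-cycle: bipartite
d = A.sum(axis=1)
P = A / d[:, None]                         # simple walk P = D^{-1} A
P_lazy = (np.eye(4) + P) / 2               # lazy walk P = (I + D^{-1} A)/2

print(np.linalg.matrix_power(P, 21))       # odd/even powers keep oscillating
print(np.linalg.matrix_power(P_lazy, 50))  # every row -> d(v)/2m = 1/4
```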

Random Walk on a Graph
$G(V, E)$; $P(u, v) = \frac{1}{d(u)}$ if $u \sim v$, else $0$.
Stationary distribution: $\forall v \in V$, $\pi_v = \frac{d(v)}{2m}$.
It is a distribution: $\sum_{v \in V} \pi_v = \frac{1}{2m} \sum_{v \in V} d(v) = 1$.
It is stationary: $(\pi P)_v = \sum_{u \in V} \pi_u P(u, v) = \sum_{u \in N(v)} \frac{d(u)}{2m} \cdot \frac{1}{d(u)} = \frac{d(v)}{2m} = \pi_v$.
regular graph ⇒ uniform distribution

Random Walk on a Graph
$G(V, E)$; lazy walk $P(u, v) = \frac{1}{2}$ if $u = v$, $\frac{1}{2 d(u)}$ if $u \sim v$, else $0$.
Same stationary distribution: $\forall v \in V$, $\pi_v = \frac{d(v)}{2m}$:
$(\pi P)_v = \sum_{u \in V} \pi_u P(u, v) = \frac{1}{2} \cdot \frac{d(v)}{2m} + \frac{1}{2} \sum_{u \in N(v)} \frac{d(u)}{2m} \cdot \frac{1}{d(u)} = \frac{d(v)}{2m} = \pi_v$.

Reversibility
ergodic flow / detailed balance equation: $\pi(x) P(x, y) = \pi(y) P(y, x)$
time-reversible Markov chain: $\exists \pi$, $\forall x, y \in \Omega$: $\pi(x) P(x, y) = \pi(y) P(y, x)$
Such a $\pi$ is a stationary distribution:
$(\pi P)_y = \sum_x \pi(x) P(x, y) = \sum_x \pi(y) P(y, x) = \pi(y)$
time-reversible: when started from $\pi$,
$\Pr[X_0 = x_0 \wedge X_1 = x_1 \wedge \cdots \wedge X_n = x_n] = \Pr[X_0 = x_n \wedge X_1 = x_{n-1} \wedge \cdots \wedge X_n = x_0]$.

Equivalently: when started from $\pi$, the trajectories $(X_0, X_1, \dots, X_n)$ and $(X_n, X_{n-1}, \dots, X_0)$ have the same distribution.

A time-reversible Markov chain can be uniquely represented as an edge-weighted undirected graph $G(V, E)$ with the ergodic flows $\pi(x) P(x, y)$ as edge weights.

Random Walk on a Graph
For both the simple walk $P(u, v) = \frac{1}{d(u)}$ ($u \sim v$) and the lazy walk, check the detailed balance equation $\pi(u) P(u, v) = \pi(v) P(v, u)$ with $\pi_u = \frac{d(u)}{2m}$:
$u = v$: holds for free;
$u \not\sim v$: both sides are 0;
$u \sim v$: $\pi(u) P(u, v) = \frac{d(u)}{2m} \cdot \frac{1}{d(u)} = \frac{1}{2m} = \pi(v) P(v, u)$.
So random walks on undirected graphs are time-reversible.
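
A quick numerical check of this argument, assuming NumPy and a small hand-picked tree:

```python
import numpy as np

# A small tree: edges {0,1}, {1,2}, {1,3}; m = 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]])
d = A.sum(axis=1)
m = A.sum() / 2
P = A / d[:, None]              # simple random walk
pi = d / (2 * m)                # claimed stationary distribution d(v)/2m

F = pi[:, None] * P             # ergodic flows pi(u) P(u,v) = A(u,v)/2m
print(np.allclose(F, F.T))      # detailed balance: flow matrix is symmetric
print(np.allclose(pi @ P, pi))  # hence pi P = pi
```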

Random Walk on a Graph
undirected graph $G(V, E)$ (not necessarily regular); lazy random walk
Goal: a uniform stationary distribution $\pi$
Goal: an arbitrary stationary distribution $\pi$

Random Walk
convergence (ergodicity); stationary distribution;
hitting time: time to reach a vertex;
cover time: time to reach all vertices;
mixing time: time to converge.

Hitting and Covering
Consider a random walk on $G(V, E)$.
hitting time: expected time to reach $v$ from $u$:
$\tau_{u,v} = \mathbb{E}[\min\{n > 0 \mid X_n = v\} \mid X_0 = u]$
cover time: expected time to visit all vertices:
$C_u = \mathbb{E}[\min\{n \mid \{X_0, \dots, X_n\} = V\} \mid X_0 = u]$, $C(G) = \max_{u \in V} C_u$

$G = K_n$: $C(G) = \Theta(n \log n)$
$G$ a path or cycle: $C(G) = \Theta(n^2)$
$G$ a lollipop ($K_{n/2}$ attached to a path of $n/2$ vertices): $C(G) = \Theta(n^3)$

Hitting Time
$\tau_{u,v} = \mathbb{E}[\min\{n > 0 \mid X_n = v\} \mid X_0 = u]$
stationary distribution: $\pi_v = \frac{d(v)}{2m}$
Renewal Theorem: for an irreducible chain, $\tau_{v,v} = \frac{1}{\pi_v} = \frac{2m}{d(v)}$.

Renewal Theorem: for an irreducible chain with stationary distribution $\pi$, $\tau_{x,x} = \frac{1}{\pi(x)}$; for the walk on a graph, $\tau_{v,v} = \frac{2m}{d(v)}$.
Proof: start the chain from $\pi$, i.e. $\pi(x) = \Pr[X_0 = x]$, and let $T_x$ be the first $t > 0$ with $X_t = x$. For an integer random variable $X \ge 0$, $\mathbb{E}[X] = \sum_{n \ge 1} \Pr[X \ge n]$, so
$\tau_{x,x} = \mathbb{E}[T_x \mid X_0 = x] = \sum_{n \ge 1} \Pr[T_x \ge n \mid X_0 = x]$.
Multiplying by $\pi_x$:
$\tau_{x,x}\, \pi_x = \sum_{n \ge 1} \Pr[T_x \ge n \mid X_0 = x] \Pr[X_0 = x] = \sum_{n \ge 1} \Pr[T_x \ge n \wedge X_0 = x]$
$= \Pr[X_0 = x] + \sum_{n \ge 2} \big( \Pr[\forall 1 \le t \le n-1,\ X_t \ne x] - \Pr[\forall 0 \le t \le n-1,\ X_t \ne x] \big)$
$= \Pr[X_0 = x] + \sum_{n \ge 2} \big( \Pr[\forall 0 \le t \le n-2,\ X_t \ne x] - \Pr[\forall 0 \le t \le n-1,\ X_t \ne x] \big)$ (stationarity shifts time by one)
$= \Pr[X_0 = x] + \Pr[X_0 \ne x] - \lim_{n \to \infty} \Pr[\forall 0 \le t \le n,\ X_t \ne x]$ (telescoping $a_{n-2} - a_{n-1}$)
$= 1 - 0 = 1$, since by irreducibility the chain visits $x$ eventually.
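
A Monte Carlo check of this identity, assuming NumPy, on the small tree used earlier (vertex 0 is a leaf, so $\tau_{0,0} = 2m/d(0) = 6$):

```python
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]])           # tree with m = 3 edges
nbrs = [np.flatnonzero(A[u]) for u in range(4)]
rng = np.random.default_rng(1)

def return_time(v):
    """Simulate one excursion from v until the walk returns to v."""
    x, t = v, 0
    while True:
        x = rng.choice(nbrs[x])        # step to a uniform random neighbour
        t += 1
        if x == v:
            return t

samples = [return_time(0) for _ in range(20000)]
print(np.mean(samples), 2 * 3 / 1)     # estimate vs. 2m/d(0) = 6
```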

Random walk on $G(V, E)$: $\tau_{v,v} = \frac{1}{\pi_v} = \frac{2m}{d(v)}$.
Lemma: if $uv \in E$ then $\tau_{u,v} < 2m$.
Proof: conditioning on the first step out of $v$,
$\tau_{v,v} = \sum_{w : wv \in E} \frac{1}{d(v)} (1 + \tau_{w,v})$,
so $2m = d(v)\, \tau_{v,v} = \sum_{w : wv \in E} (1 + \tau_{w,v}) \ge 1 + \tau_{u,v} > \tau_{u,v}$.

Cover Time
$C_u = \mathbb{E}[\min\{n \mid \{X_0, \dots, X_n\} = V\} \mid X_0 = u]$, $C(G) = \max_{u \in V} C_u$
Theorem: $C(G) \le 4nm$.
Proof: pick a spanning tree $T$ of $G$; doubling its edges gives an Eulerian tour $v_1 v_2 \cdots v_{2(n-1)} v_{2n-1} = v_1$ that visits every vertex. Hence
$C(G) \le \sum_{i=1}^{2(n-1)} \tau_{v_i, v_{i+1}} < 2(n-1) \cdot 2m < 4nm$,
using the lemma $\tau_{u,v} < 2m$ on each tree edge.
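
A simulation sketch for the cycle, assuming NumPy; the cover time of the $n$-cycle is known to be exactly $n(n-1)/2$, well below the $4nm = 4n^2$ bound:

```python
import numpy as np

def cover_time_once(n, rng):
    """Steps for a walk on the n-cycle, started at 0, to visit every vertex."""
    x, seen, t = 0, {0}, 0
    while len(seen) < n:
        x = (x + rng.choice((-1, 1))) % n   # uniform neighbour on the cycle
        seen.add(x)
        t += 1
    return t

rng = np.random.default_rng(2)
n = 20
est = np.mean([cover_time_once(n, rng) for _ in range(2000)])
print(est, n * (n - 1) / 2, 4 * n * n)      # simulation ~ 190, bound 1600
```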

USTCON (undirected s-t connectivity)
Instance: an undirected graph $G(V, E)$ and vertices $s, t$.
Question: are $s$ and $t$ connected?
Deterministically: traverse (BFS/DFS), which takes linear space. Can it be done in log-space?

Theorem (Aleliunas-Karp-Lipton-Lovász-Rackoff 1979): USTCON can be solved by a poly-time Monte Carlo randomized algorithm with bounded one-sided error, which uses $O(\log n)$ extra space.
Algorithm: start a random walk at $s$; if $t$ is reached within $4n^3$ steps, return "yes", else return "no".
space: $O(\log n)$ (the current vertex and a step counter)
unconnected ⇒ always "no"
connected ⇒ the cover time satisfies $C(G) \le 4nm < 2n^3$, so by Markov's inequality the walk fails to reach $t$ within $4n^3$ steps with probability $< 1/2$: $\Pr[\text{"no"}] < 1/2$.
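
A sketch of the walk-based test; the adjacency-list dictionary is only for illustration (the log-space claim is about the walk storing just the current vertex and a step counter, with the input graph read-only):

```python
import random

def ustcon(adj, s, t, n):
    """One-sided error: True is always correct; if s and t are connected,
    False is returned with probability < 1/2."""
    x = s
    for _ in range(4 * n ** 3):
        if x == t:
            return True
        if adj[x]:
            x = random.choice(adj[x])   # step to a uniform random neighbour
    return x == t

# Hypothetical instance: a path 0-1-2 plus an isolated vertex 3.
adj = {0: [1], 1: [0, 2], 2: [1], 3: []}
print(ustcon(adj, 0, 2, 4))   # True
print(ustcon(adj, 0, 3, 4))   # False
```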

Electric Network
edge $uv$: resistance $R_{uv}$
vertex $v$: potential $\phi_v$
edge orientation $u \to v$: current flow $C_{u \to v}$; potential difference $\phi_{u,v} = \phi_u - \phi_v$
Kirchhoff's Law: at every vertex $v$, flow in = flow out.
Ohm's Law: on every edge $uv$, $C_{u \to v} = \frac{\phi_{u,v}}{R_{uv}}$.

Effective Resistance
In an electrical network, the effective resistance $R(u, v)$ is the potential difference between $u$ and $v$ required to send 1 unit of current from $u$ to $v$.

Give each edge $e$ resistance $R_e = 1$.
Theorem (Chandra-Raghavan-Ruzzo-Smolensky-Tiwari 1989): $\forall u, v \in V$, $\tau_{u,v} + \tau_{v,u} = 2m\, R(u, v)$.

Given a graph $G(V, E)$, construct an electrical network:
every edge $e$: resistance $R_e = 1$;
fix a special vertex $v$;
inject $d(u)$ units of current into every vertex $u$;
remove all $2m$ units of current from $v$.
Lemma: $\forall u \in V$, $\phi_{u,v} = \tau_{u,v}$.

Lemma: $\forall u \in V$, $\phi_{u,v} = \tau_{u,v}$.
For the potentials: the $d(u)$ units injected at $u$ must leave along its edges (Kirchhoff), and with $R_e = 1$ each edge carries current equal to its potential difference (Ohm):
$d(u) = \sum_{uw \in E} C_{u \to w} = \sum_{uw \in E} \phi_{u,w} = \sum_{uw \in E} (\phi_{u,v} - \phi_{w,v}) = d(u)\, \phi_{u,v} - \sum_{uw \in E} \phi_{w,v}$
hence $\phi_{u,v} = 1 + \frac{1}{d(u)} \sum_{uw \in E} \phi_{w,v}$.

For the hitting times $\tau_{u,v} = \mathbb{E}[\min\{n > 0 \mid X_n = v\} \mid X_0 = u]$: conditioning on the first step,
$\tau_{u,v} = \sum_{uw \in E} \frac{1}{d(u)} (1 + \tau_{w,v}) = 1 + \frac{1}{d(u)} \sum_{uw \in E} \tau_{w,v}$.

So
$\phi_{u,v} = 1 + \frac{1}{d(u)} \sum_{uw \in E} \phi_{w,v}$ and $\tau_{u,v} = 1 + \frac{1}{d(u)} \sum_{uw \in E} \tau_{w,v}$
are the same system of linear equations, which has a unique solution: $\phi_{u,v} = \tau_{u,v}$ for all $u \in V$.
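
A sketch that solves this system directly, assuming NumPy, on the small tree used earlier, with target $v = 0$ (and $\tau_{v,v}$ treated as 0 in the sum):

```python
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]])
n = 4
d = A.sum(axis=1)
P = A / d[:, None]
v = 0                                      # target vertex

idx = [u for u in range(n) if u != v]      # unknowns tau_{u,v} for u != v
M = np.eye(n - 1) - P[np.ix_(idx, idx)]    # the system (I - P_{-v}) tau = 1
tau = np.linalg.solve(M, np.ones(n - 1))
print(dict(zip(idx, tau)))                 # {1: 5.0, 2: 6.0, 3: 6.0}
```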

Effective Resistance (recap)
With every edge resistance $R_e = 1$: the effective resistance $R(u, v)$ is the potential difference between $u$ and $v$ required to send 1 unit of current from $u$ to $v$.

Each edge $e$ has resistance $R_e = 1$.
Theorem (Chandra-Raghavan-Ruzzo-Smolensky-Tiwari 1989): $\forall u, v \in V$, $\tau_{u,v} + \tau_{v,u} = 2m\, R(u, v)$.
Scenario A: inject $d(w)$ units of current into every vertex $w$, remove $2m$ units from $v$.
Scenario B: inject $d(w)$ units of current into every vertex $w$, remove $2m$ units from $u$.

Recall the lemma: $\phi_{u,v} = 1 + \frac{1}{d(u)} \sum_{uw \in E} \phi_{w,v}$ and $\tau_{u,v} = 1 + \frac{1}{d(u)} \sum_{uw \in E} \tau_{w,v}$ have the same unique solution, so $\phi_{u,v} = \tau_{u,v}$.

Theorem (Chandra-Raghavan-Ruzzo-Smolensky-Tiwari 1989): $\forall u, v \in V$, $\tau_{u,v} + \tau_{v,u} = 2m\, R(u, v)$.
Applying the lemma in each scenario:
Scenario A (extraction at $v$): $\phi^A_{u,v} = \tau_{u,v}$
Scenario B (extraction at $u$): $\phi^B_{v,u} = \tau_{v,u}$

Scenario C: reverse all currents of Scenario B, i.e. remove $d(w)$ units from every vertex $w$ and inject $2m$ units into $u$; all potentials negate, so $\phi^C_{u,v} = \phi^B_{v,u} = \tau_{v,u}$.
Scenario D = A + C: by superposition, the $d(w)$ injections cancel, leaving $2m$ units injected at $u$ and removed at $v$, with
$\phi^D_{u,v} = \phi^A_{u,v} + \phi^C_{u,v} = \tau_{u,v} + \tau_{v,u}$.
$\phi^D_{u,v}$: the potential difference between $u$ and $v$ needed to send $2m$ units of current from $u$ to $v$.

Since sending $2m$ units of current from $u$ to $v$ requires potential difference $2m\, R(u, v)$,
$\tau_{u,v} + \tau_{v,u} = \phi^D_{u,v} = 2m\, R(u, v)$, i.e. $R(u, v) = \frac{\phi^D_{u,v}}{2m}$.
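
A numerical check of the commute-time identity, assuming NumPy, on the same tree; here $R(u, v)$ is computed from the Laplacian pseudoinverse via the standard formula $R(u, v) = (e_u - e_v)^{\mathsf T} L^{+} (e_u - e_v)$ (a fact not proved on these slides):

```python
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]])
n = 4
d = A.sum(axis=1)
m = A.sum() / 2
L = np.diag(d) - A                           # graph Laplacian
Lp = np.linalg.pinv(L)                       # its pseudoinverse

def hitting_times(v):
    """Hitting times tau_{.,v}, solving the linear system as before."""
    idx = [u for u in range(n) if u != v]
    P = A / d[:, None]
    tau = np.linalg.solve(np.eye(n - 1) - P[np.ix_(idx, idx)], np.ones(n - 1))
    return dict(zip(idx, tau))

u, v = 2, 3
R = Lp[u, u] - 2 * Lp[u, v] + Lp[v, v]       # effective resistance R(u,v) = 2
print(hitting_times(v)[u] + hitting_times(u)[v], 2 * m * R)   # both 12.0
```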

Theorem (Chandra-Raghavan-Ruzzo-Smolensky-Tiwari 1989): $\forall u, v \in V$, $\tau_{u,v} + \tau_{v,u} = 2m\, R(u, v)$.
Theorem: $C(G) \le 2nm$.
Proof: pick a spanning tree $T$ of $G$ and take the Eulerian tour $v_1 v_2 \cdots v_{2(n-1)} v_{2n-1} = v_1$ that doubles its edges. Then
$C(G) \le \sum_{i=1}^{2(n-1)} \tau_{v_i, v_{i+1}} = \sum_{uv \in T} (\tau_{u,v} + \tau_{v,u}) = 2m \sum_{uv \in T} R(u, v) \le 2m(n-1) < 2nm$,
since each tree edge $uv \in E$ has effective resistance $R(u, v) \le R_{uv} = 1$.

Theorem (Chandra-Raghavan-Ruzzo-Smolensky-Tiwari 1989): $\forall u, v \in V$, $\tau_{u,v} + \tau_{v,u} = 2m\, R(u, v)$.
$G$ a path with endpoints $u$ and $v$ ($n$ vertices, $m = n - 1$ edges):
upper bound: $C(G) \le 2nm = O(n^2)$;
lower bound: $R(u, v) = n - 1$ (unit resistances in series), so $\tau_{u,v} + \tau_{v,u} = 2m\, R(u, v) = 2(n-1)^2$ and $C(G) \ge \max\{\tau_{u,v}, \tau_{v,u}\} = \Omega(n^2)$.
Hence $C(G) = \Theta(n^2)$.

Theorem (Chandra-Raghavan-Ruzzo-Smolensky-Tiwari 1989): $\forall u, v \in V$, $\tau_{u,v} + \tau_{v,u} = 2m\, R(u, v)$.
$G$ a lollipop: $K_{n/2}$ joined to a path of $n/2$ vertices, with $u$ the junction vertex and $v$ the far end of the path:
upper bound: $m = \Theta(n^2)$, so $C(G) \le 2nm = O(n^3)$;
lower bound: $R(u, v) = \Theta(n)$, so $\tau_{u,v} + \tau_{v,u} = 2m\, R(u, v) = \Theta(n^3)$; since $\tau_{v,u} = O(n^2)$, we get $\tau_{u,v} = \Theta(n^3)$, and $C(G) \ge \tau_{u,v} = \Omega(n^3)$.
Hence $C(G) = \Theta(n^3)$.