Probability & Computing
Stochastic Process
A stochastic process $\{X_t \mid t \in T\}$ assigns to each time $t \in T$ a random state $X_t \in \Omega$.
- discrete time: $T$ is countable, e.g. $T = \{0, 1, 2, \dots\}$
- discrete space: $\Omega$ is finite or countably infinite
This gives a random sequence $X_0, X_1, X_2, \dots$
Markov Property
The Markov property (memorylessness) constrains the dependency structure of $X_0, X_1, X_2, \dots$: the next state $X_{t+1}$ depends only on the current state $X_t$. For all $t = 0, 1, 2, \dots$ and all $x_0, \dots, x_{t-1}, x, y \in \Omega$,
$$\Pr[X_{t+1} = y \mid X_0 = x_0, \dots, X_{t-1} = x_{t-1}, X_t = x] = \Pr[X_{t+1} = y \mid X_t = x].$$
A Markov chain is a discrete-time, discrete-space stochastic process with the Markov property.
Transition Matrix
For a Markov chain $X_0, X_1, X_2, \dots$ that is time-homogeneous, the transition probability
$$\Pr[X_{t+1} = y \mid X_0 = x_0, \dots, X_{t-1} = x_{t-1}, X_t = x] = \Pr[X_{t+1} = y \mid X_t = x] = P_{xy}$$
does not depend on $t$. $P$ is a stochastic matrix: $P_{xy} \ge 0$ and $\sum_{y \in \Omega} P_{xy} = 1$ for every $x \in \Omega$.
Writing $\pi^{(t)}_x = \Pr[X_t = x]$ for the distribution of the chain $X_0, X_1, X_2, \dots$ at time $t$ (so $\sum_{x \in \Omega} \pi^{(t)}_x = 1$), one step of the chain is a vector-matrix product, $\pi^{(t+1)} = \pi^{(t)} P$:
$$\pi^{(t+1)}_y = \Pr[X_{t+1} = y] = \sum_{x \in \Omega} \Pr[X_t = x] \Pr[X_{t+1} = y \mid X_t = x] = \sum_{x \in \Omega} \pi^{(t)}_x P_{xy} = (\pi^{(t)} P)_y.$$
$$\pi^{(0)} \xrightarrow{P} \pi^{(1)} \xrightarrow{P} \cdots \xrightarrow{P} \pi^{(t)} \xrightarrow{P} \pi^{(t+1)} \xrightarrow{P} \cdots$$
Given the initial distribution $\pi^{(0)}$ and transition matrix $P$, the distribution at time $t$ is $\pi^{(t)} = \pi^{(0)} P^t$. A Markov chain is specified by the pair $M = (\Omega, P)$.
Example: a chain on three states $\{1, 2, 3\}$, where state 1 always moves to 2, state 2 moves to 1 or 3 with probabilities $1/3$ and $2/3$, and state 3 moves to each state with probability $1/3$:
$$P = \begin{pmatrix} 0 & 1 & 0 \\ 1/3 & 0 & 2/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}$$
Convergence
Raising $P$ to successive powers, the rows approach a common distribution:
$$P^2 = \begin{pmatrix} 1/3 & 0 & 2/3 \\ 2/9 & 5/9 & 2/9 \\ 2/9 & 4/9 & 1/3 \end{pmatrix}, \qquad P^{20} \approx \begin{pmatrix} 0.2500 & 0.3750 & 0.3750 \\ 0.2500 & 0.3750 & 0.3750 \\ 0.2500 & 0.3750 & 0.3750 \end{pmatrix}$$
Every row of $P^{20}$ is (approximately) the same distribution $(\tfrac14, \tfrac38, \tfrac38)$: the chain forgets its starting state.
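The convergence of the powers can be checked numerically; this is a small sketch (the variable names are my own):

```python
import numpy as np

# Transition matrix of the example 3-state chain above.
P = np.array([[0,   1,   0  ],
              [1/3, 0,   2/3],
              [1/3, 1/3, 1/3]])

# Every row of P^t converges to the stationary distribution (1/4, 3/8, 3/8).
P20 = np.linalg.matrix_power(P, 20)
print(np.round(P20, 4))
```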
Stationary Distribution
For a Markov chain $M = (\Omega, P)$, a stationary distribution is a distribution $\pi$ with $\pi P = \pi$ (a fixed point of the map $\pi \mapsto \pi P$). By the Perron-Frobenius theorem, for a stochastic matrix $P$: since $P \mathbf{1} = \mathbf{1}$, the value 1 is an eigenvalue of $P$, hence also an eigenvalue of $P^{\mathsf{T}}$, i.e. a left eigenvalue of $P$; and the corresponding left eigenvector $\pi$ with $\pi P = \pi$ can be chosen nonnegative. So a stationary distribution always exists.
Perron-Frobenius
Perron-Frobenius Theorem: Let $A$ be a nonnegative $n \times n$ matrix with spectral radius $\rho(A)$. Then:
- $\rho(A)$ is an eigenvalue of $A$;
- there are nonnegative left and right eigenvectors associated with $\rho(A)$;
- if further $A$ is irreducible, then the left and right eigenvectors associated with $\rho(A)$ are positive, and the eigenvalue $\rho(A)$ has multiplicity 1.
For a stochastic matrix $A$, the spectral radius is $\rho(A) = 1$.
Convergence
For the example chain,
$$P^{20} \approx \begin{pmatrix} 0.2500 & 0.3750 & 0.3750 \\ 0.2500 & 0.3750 & 0.3750 \\ 0.2500 & 0.3750 & 0.3750 \end{pmatrix}$$
The chain is ergodic: it converges to the stationary distribution from every starting state.
A reducible chain need not forget its start. Example on four states:
$$P = \begin{pmatrix} 1/2 & 1/2 & 0 & 0 \\ 1/3 & 2/3 & 0 & 0 \\ 0 & 0 & 3/4 & 1/4 \\ 0 & 0 & 1/4 & 3/4 \end{pmatrix}, \qquad P^{20} \approx \begin{pmatrix} 0.4 & 0.6 & 0 & 0 \\ 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.5 & 0.5 \\ 0 & 0 & 0.5 & 0.5 \end{pmatrix}$$
The limit depends on which component, $\{1, 2\}$ or $\{3, 4\}$, the chain starts in.
A periodic chain need not converge at all. Example:
$$P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad P^2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad P^{2k} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad P^{2k+1} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$
The powers oscillate between these two matrices forever.
[Figures: transition diagrams of the reducible four-state chain and of a periodic chain.]
Fundamental Theorem of Markov Chains: If a finite Markov chain $M = (\Omega, P)$ is irreducible and aperiodic (i.e. ergodic), then for any initial distribution $\pi^{(0)}$,
$$\lim_{t \to \infty} \pi^{(0)} P^t = \pi,$$
where $\pi$ is the unique stationary distribution, satisfying $\pi P = \pi$.
Irreducibility
$y$ is accessible from $x$ if $\exists t,\ P^t(x, y) > 0$. $x$ communicates with $y$ if $x$ is accessible from $y$ and $y$ is accessible from $x$. A Markov chain is irreducible if all pairs of states communicate. Communication partitions the state space into communicating classes; the example diagram shows a four-state chain with 4 communicating classes.
Reducible Chains
If the chain splits into two closed components $A$ and $B$, the transition matrix is block-diagonal,
$$P = \begin{pmatrix} P_A & 0 \\ 0 & P_B \end{pmatrix},$$
and every mixture $\pi = \lambda \pi_A + (1 - \lambda) \pi_B$ of the components' stationary distributions is stationary, so uniqueness fails. If instead component $A$ leaks into component $B$ (but not back), the only stationary distribution is $\pi = (0, \pi_B)$.
Aperiodicity
The period of a state $x$ is $d_x = \gcd\{t \mid P^t(x, x) > 0\}$, i.e. the gcd of the lengths of the cycles through $x$. A chain is aperiodic if all states have period 1. Example: for a state $x$ lying only on cycles whose lengths are divisible by 3, $d_x = 3$.
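The period can be computed straight from the definition; a small sketch (the helper `period` is my own, and capping the search at `tmax` powers is a simplification):

```python
import numpy as np
from math import gcd
from functools import reduce

def period(P, x, tmax=60):
    """d_x = gcd of all t (up to tmax) with P^t(x, x) > 0."""
    Pt = np.eye(len(P))
    times = []
    for t in range(1, tmax + 1):
        Pt = Pt @ P
        if Pt[x, x] > 1e-12:
            times.append(t)
    return reduce(gcd, times)

# Directed 3-cycle: every cycle through a state has length divisible by 3.
C3 = np.array([[0, 1, 0],
               [0, 0, 1],
               [1, 0, 0]], dtype=float)
print(period(C3, 0))   # 3
```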
Lemma (the period is a class property): if $x$ and $y$ communicate, then $d_x = d_y$.
Proof idea: take a path $x \to y$ of length $n_1$ and a path $y \to x$ of length $n_2$. For any cycle through $y$ of length $n$:
$$d_x \mid (n_1 + n_2) \ \text{and}\ d_x \mid (n_1 + n_2 + n) \implies d_x \mid n \implies d_x \le d_y.$$
By symmetry $d_y \le d_x$, hence $d_x = d_y$.
Fundamental Theorem of Markov Chains: If a finite Markov chain $M = (\Omega, P)$ is irreducible and aperiodic, then for any initial distribution $\pi^{(0)}$, $\lim_{t \to \infty} \pi^{(0)} P^t = \pi$, where $\pi$ is the unique stationary distribution satisfying $\pi P = \pi$.
Proof plan: finiteness and irreducibility give existence and uniqueness of $\pi$ (Perron-Frobenius); ergodicity gives convergence.
Coupling of Markov Chains
Run two faithful copies of the chain $M = (\Omega, P)$: one from the initial distribution $\pi^{(0)}$, giving $X_0, X_1, X_2, \dots$, and one from the stationary distribution $\pi$, giving $X'_0, X'_1, X'_2, \dots$. The coupled copy $X''$ follows $X$ until the two runs first meet, say $X_n = X'_n$, and follows $X'$ afterwards; it is still a faithful run of the chain started from $\pi^{(0)}$.
Goal: $\lim_{t \to \infty} \Pr[X''_t = X'_t] = 1$.
First step toward the goal: conditioning on $X_0 = x$ and $X'_0 = y$, show that $\exists n,\ \Pr[X_n = X'_n = y] > 0$.
Conditioning on $X_0 = x$ and $X'_0 = y$ for the chain $M = (\Omega, P)$:
- irreducibility gives $n_1$ with $P^{n_1}(x, y) > 0$;
- aperiodicity gives $n_2$ with both $P^{n_2}(y, y) > 0$ and $P^{n_1 + n_2}(y, y) > 0$.
Set $n = n_1 + n_2$. Since the two runs are independent until they meet,
$$\Pr[X_n = X'_n = y] \ge \Pr[X_n = y] \Pr[X'_n = y] \ge P^{n_1}(x, y)\, P^{n_2}(y, y)\, P^{n}(y, y) > 0.$$
Since such an $n$ with $\Pr[X_n = X'_n] > 0$ exists for every pair of starting states $x, y$, the two runs eventually meet with probability 1; after meeting, the coupled copy follows the stationary run, so $\lim_{t \to \infty} \Pr[X''_t = X'_t] = 1$. As $X'_t$ is distributed as $\pi$ for every $t$, the distribution of $X''_t$ converges to $\pi$.
Fundamental Theorem of Markov Chains (general form): If a Markov chain $M = (\Omega, P)$ is irreducible and ergodic, then for any initial distribution $\pi^{(0)}$, $\lim_{t \to \infty} \pi^{(0)} P^t = \pi$, where $\pi$ is the unique stationary distribution satisfying $\pi P = \pi$.
- finite chain: ergodic = aperiodic
- infinite chain: ergodic = aperiodic + non-null persistent
Infinite Markov Chains
- $f_{xy}$: probability of ever reaching $y$ starting from $x$
- $h_{xy}$: expected time of first reaching $y$ starting from $x$
Formally, with $f_{xy}^{(t)} \triangleq \Pr[y \notin \{X_1, \dots, X_{t-1}\} \wedge X_t = y \mid X_0 = x]$:
$$f_{xy} \triangleq \sum_{t=1}^{\infty} f_{xy}^{(t)}, \qquad h_{xy} \triangleq \sum_{t=1}^{\infty} t\, f_{xy}^{(t)}.$$
- $x$ is transient if $f_{xx} < 1$; $x$ is recurrent (or persistent) if $f_{xx} = 1$;
- $x$ is null-persistent if it is persistent and $h_{xx} = \infty$; $x$ is non-null-persistent if it is persistent and $h_{xx} < \infty$.
A chain is non-null-persistent if all its states are.
For infinite chains:
- irreducible + non-null persistent $\implies$ unique stationary distribution
- aperiodic + non-null persistent $\implies$ ergodic
PageRank
Rank measures the importance of a page:
- a page has higher rank if it is pointed to by more high-rank pages;
- high-rank pages have greater influence;
- pages pointing to few others have greater influence per link.
PageRank (simplified)
On the web graph $G(V, E)$, the rank $r(v)$ of a page satisfies
$$r(v) = \sum_{u : (u,v) \in E} \frac{r(u)}{d_+(u)},$$
where $d_+(u)$ is the out-degree of $u$. Equivalently, $r$ is the stationary distribution $rP = r$ of the random walk
$$P(u, v) = \begin{cases} 1/d_+(u) & (u, v) \in E, \\ 0 & \text{otherwise}, \end{cases}$$
that is, the long-run behavior of a tireless random surfer.
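A minimal power-iteration sketch of the simplified PageRank above (the function name and the tiny example graph are my own; the real algorithm additionally adds damping to handle dangling pages and reducibility):

```python
import numpy as np

def pagerank(adj, iters=100):
    """Simplified PageRank: the stationary distribution of the random
    surfer P(u, v) = 1/d_out(u), found by power iteration.
    Assumes every page has at least one out-link."""
    n = len(adj)
    P = np.zeros((n, n))
    for u, outs in enumerate(adj):
        for v in outs:
            P[u, v] = 1 / len(outs)
    r = np.full(n, 1 / n)      # start from the uniform distribution
    for _ in range(iters):
        r = r @ P              # r P^t converges to the stationary r
    return r

# Tiny 3-page web of my own: 0 -> 1; 1 -> 0, 2; 2 -> 0.
r = pagerank([[1], [0, 2], [0]])
print(np.round(r, 3))
```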
Random Walk on Graph
On an undirected graph $G(V, E)$, a walk is a sequence $v_1, v_2, \dots \in V$ with $v_{i+1} \sim v_i$; in a random walk, $v_{i+1}$ is chosen uniformly at random from $N(v_i)$:
$$P(u, v) = \begin{cases} 1/d(u) & u \sim v, \\ 0 & \text{otherwise}. \end{cases}$$
In matrix form, $P = D^{-1} A$, where $A$ is the adjacency matrix and $D$ is the diagonal degree matrix, $D(u, v) = d(u)$ if $u = v$ and $0$ otherwise.
Questions about random walks on graphs:
- convergence (ergodicity);
- stationary distribution;
- hitting time: time to reach a vertex;
- cover time: time to reach all vertices;
- mixing time: time to converge.
For the random walk on a finite graph $G(V, E)$, with $P(u, v) = 1/d(u)$ for $u \sim v$ and $0$ otherwise:
- irreducible and aperiodic $\implies$ convergence;
- irreducible $\iff$ $G$ is connected (since $P^t(u, v) > 0 \iff A^t(u, v) > 0$);
- aperiodic $\iff$ $G$ is non-bipartite:
  - bipartite $\implies$ no odd cycle $\implies$ every state has period 2;
  - non-bipartite $\implies$ some $(2k+1)$-cycle, while every undirected edge gives a 2-cycle, so the gcd of cycle lengths is 1 and the walk is aperiodic.
Lazy Random Walk
On an undirected graph $G(V, E)$, the lazy random walk flips a fair coin at each step to decide whether to stay put:
$$P(u, v) = \begin{cases} 1/2 & u = v, \\ 1/(2d(u)) & u \sim v, \\ 0 & \text{otherwise}. \end{cases}$$
In matrix form, $P = \frac{1}{2}(I + D^{-1} A)$, with adjacency matrix $A$ and diagonal degree matrix $D$. Because every state has a self-loop, the lazy walk is always aperiodic.
Stationary distribution of the random walk on $G(V, E)$ with $P(u, v) = 1/d(u)$ for $u \sim v$: take $\pi_v = \frac{d(v)}{2m}$, where $m = |E|$. It is a distribution,
$$\sum_{v \in V} \pi_v = \frac{1}{2m} \sum_{v \in V} d(v) = 1,$$
and it is stationary:
$$(\pi P)_v = \sum_{u \in V} \pi_u P(u, v) = \sum_{u \in N(v)} \frac{d(u)}{2m} \cdot \frac{1}{d(u)} = \frac{d(v)}{2m} = \pi_v.$$
On a regular graph this is the uniform distribution.
The lazy random walk on $G(V, E)$, with $P(u, v) = 1/2$ for $u = v$, $1/(2d(u))$ for $u \sim v$, and $0$ otherwise, has the same stationary distribution $\pi_v = \frac{d(v)}{2m}$:
$$(\pi P)_v = \sum_{u \in V} \pi_u P(u, v) = \frac{1}{2} \cdot \frac{d(v)}{2m} + \frac{1}{2} \sum_{u \in N(v)} \frac{d(u)}{2m} \cdot \frac{1}{d(u)} = \frac{d(v)}{2m} = \pi_v.$$
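Both stationarity claims are easy to check numerically; a sketch on a small graph of my own choosing:

```python
import numpy as np

# Random walk on an undirected graph: check pi_v = d(v)/2m is stationary.
# Example graph (mine): a path 0-1-2-3 plus the chord 1-3.
edges = [(0, 1), (1, 2), (2, 3), (1, 3)]
n, m = 4, len(edges)
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1
d = A.sum(axis=1)
P = A / d[:, None]                  # P = D^{-1} A
pi = d / (2 * m)
assert np.allclose(pi @ P, pi)      # pi P = pi for the plain walk
lazy = (np.eye(n) + P) / 2          # lazy walk: P = (I + D^{-1} A) / 2
assert np.allclose(pi @ lazy, pi)   # same stationary distribution
print(pi)
```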
Reversibility
The ergodic flow from $x$ to $y$ is $\pi(x) P(x, y)$. The detailed balance equation asks the flows in both directions to match: $\pi(x) P(x, y) = \pi(y) P(y, x)$. A Markov chain is time-reversible if $\exists \pi,\ \forall x, y \in \Omega:\ \pi(x) P(x, y) = \pi(y) P(y, x)$. Such a $\pi$ is automatically stationary:
$$(\pi P)_y = \sum_x \pi(x) P(x, y) = \sum_x \pi(y) P(y, x) = \pi(y).$$
When started from $\pi$, a time-reversible chain looks the same run backwards:
$$\Pr[X_0 = x_0 \wedge X_1 = x_1 \wedge \cdots \wedge X_n = x_n] = \Pr[X_0 = x_n \wedge X_1 = x_{n-1} \wedge \cdots \wedge X_n = x_0],$$
i.e. $(X_0, X_1, \dots, X_n)$ and $(X_n, X_{n-1}, \dots, X_0)$ are identically distributed.
A time-reversible Markov chain can be uniquely represented as an edge-weighted undirected graph $G(V, E)$, with the ergodic flows $\pi(x) P(x, y) = \pi(y) P(y, x)$ as edge weights.
The random walk on $G(V, E)$ (and its lazy version) satisfies the detailed balance equation $\pi(u) P(u, v) = \pi(v) P(v, u)$ with $\pi(u) \propto d(u)$:
- $u = v$: holds trivially;
- $u \not\sim v$: both sides are 0;
- $u \sim v$: $\pi(u) P(u, v) = \frac{d(u)}{2m} \cdot \frac{1}{d(u)} = \frac{1}{2m} = \pi(v) P(v, u)$.
So detailed balance holds for free.
Random Walk on Graph
For a random walk on an undirected graph $G(V, E)$ that is not necessarily regular (e.g. the lazy random walk), two design goals arise: achieving a uniform stationary distribution $\pi$, or an arbitrary prescribed stationary distribution $\pi$.
Hitting and Covering
Consider a random walk on $G(V, E)$.
- hitting time: expected time to reach $v$ from $u$,
$$\tau_{u,v} = \mathbb{E}\big[\min\{n > 0 \mid X_n = v\} \,\big|\, X_0 = u\big];$$
- cover time: expected time to visit all vertices,
$$C_u = \mathbb{E}\big[\min\{n \mid \{X_0, \dots, X_n\} = V\} \,\big|\, X_0 = u\big], \qquad C(G) = \max_{u \in V} C_u.$$
Examples:
- $G = K_n$ (complete graph): $C(G) = \Theta(n \log n)$;
- $G$ a path or cycle: $C(G) = \Theta(n^2)$;
- $G$ a lollipop ($K_{n/2}$ attached to a path on $n/2$ vertices): $C(G) = \Theta(n^3)$.
Hitting Time
$\tau_{u,v} = \mathbb{E}[\min\{n > 0 \mid X_n = v\} \mid X_0 = u]$. The stationary distribution is $\pi_v = \frac{d(v)}{2m}$, and by the Renewal Theorem, for an irreducible chain the expected return time is
$$\tau_{v,v} = \frac{1}{\pi_v} = \frac{2m}{d(v)}.$$
Renewal Theorem: for an irreducible chain $X_0, X_1, X_2, \dots$ started from its stationary distribution, $\Pr[X_0 = x] = \pi(x)$, the expected return time is $\tau_{x,x} = 1/\pi_x$ (for the random walk, $\tau_{v,v} = 2m/d(v)$).
Proof: Let $T_x$ be the first $t > 0$ with $X_t = x$. For a nonnegative integer random variable $X$, $\mathbb{E}[X] = \sum_{n \ge 1} \Pr[X \ge n]$, so
$$\tau_{x,x} = \mathbb{E}[T_x \mid X_0 = x] = \sum_{n \ge 1} \Pr[T_x \ge n \mid X_0 = x].$$
Then
$$\begin{aligned}
\pi_x \tau_{x,x} &= \sum_{n \ge 1} \Pr[T_x \ge n \mid X_0 = x]\, \Pr[X_0 = x] = \sum_{n \ge 1} \Pr[T_x \ge n \wedge X_0 = x] \\
&= \Pr[X_0 = x] + \sum_{n \ge 2} \big( \Pr[\forall\, 1 \le t \le n-1,\ X_t \ne x] - \Pr[\forall\, 0 \le t \le n-1,\ X_t \ne x] \big) \\
&= \Pr[X_0 = x] + \sum_{n \ge 2} \big( \Pr[\forall\, 0 \le t \le n-2,\ X_t \ne x] - \Pr[\forall\, 0 \le t \le n-1,\ X_t \ne x] \big) \quad \text{(stationarity)} \\
&= \Pr[X_0 = x] + \Pr[X_0 \ne x] - \lim_{n \to \infty} a_n = 1,
\end{aligned}$$
where $a_n = \Pr[\forall\, 0 \le t \le n-1,\ X_t \ne x]$; the sum telescopes, and $\lim_{n \to \infty} a_n = 0$ by irreducibility.
For the random walk on $G(V, E)$, $\tau_{v,v} = \frac{1}{\pi_v} = \frac{2m}{d(v)}$.
Lemma: if $uv \in E$, then $\tau_{u,v} < 2m$.
Proof: conditioning on the first step out of $v$,
$$\tau_{v,v} = \frac{1}{d(v)} \sum_{w : wv \in E} (1 + \tau_{w,v}),$$
so $2m = d(v)\, \tau_{v,v} = \sum_{w : wv \in E} (1 + \tau_{w,v}) \ge 1 + \tau_{u,v}$, hence $\tau_{u,v} < 2m$.
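The first-step equations for hitting times form a linear system, so small cases can be solved exactly; a sketch (the helper `hitting_times` and the path example are my own):

```python
import numpy as np

# Hitting times solve the first-step equations
#   tau(u, v) = 1 + (1/d(u)) * sum_{w ~ u} tau(w, v)   for u != v.
def hitting_times(A, v):
    """tau(u, v) for all u, for the random walk on adjacency matrix A."""
    n = len(A)
    P = A / A.sum(axis=1)[:, None]
    idx = [u for u in range(n) if u != v]
    M = np.eye(n - 1) - P[np.ix_(idx, idx)]   # (I - P restricted) tau = 1
    tau = np.zeros(n)
    tau[idx] = np.linalg.solve(M, np.ones(n - 1))
    return tau

# Path 0-1-2 (m = 2 edges).
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
tau = hitting_times(A, 2)
ret = 1 + tau[1]                   # return time tau(2,2) = 1 + tau(1,2)
assert np.isclose(ret, 2 * 2 / 1)  # renewal: tau(v,v) = 2m/d(v)
assert tau[1] < 2 * 2              # lemma: edge (1,2) gives tau(1,2) < 2m
print(tau)
```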
Cover Time
$C_u = \mathbb{E}[\min\{n \mid \{X_0, \dots, X_n\} = V\} \mid X_0 = u]$ and $C(G) = \max_{u \in V} C_u$.
Theorem: $C(G) \le 4nm$.
Proof: pick a spanning tree $T$ of $G$ and take an Eulerian tour of it, traversing each tree edge once in each direction: $v_1 \to v_2 \to \cdots \to v_{2(n-1)} \to v_{2n-1} = v_1$. Covering the vertices in the order of the tour gives, using the Lemma on each tour edge,
$$C(G) \le \sum_{i=1}^{2(n-1)} \tau_{v_i, v_{i+1}} < 2(n-1) \cdot 2m < 4nm.$$
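Cover times are also easy to estimate by direct simulation; a sketch (the helper `cover_time` and the 8-cycle example are mine; on a cycle the exact cover time is $n(n-1)/2$, well under the $4nm$ bound):

```python
import random

def cover_time(adj, start, trials=2000, rng=random.Random(0)):
    """Empirical cover time: average number of steps for a random walk
    (on adjacency lists of an undirected graph) to visit every vertex."""
    total = 0
    for _ in range(trials):
        seen, v, steps = {start}, start, 0
        while len(seen) < len(adj):
            v = rng.choice(adj[v])
            seen.add(v)
            steps += 1
        total += steps
    return total / trials

# Cycle on 8 vertices: exact cover time n(n-1)/2 = 28; bound 4nm = 256.
cycle = [[(i - 1) % 8, (i + 1) % 8] for i in range(8)]
c = cover_time(cycle, 0)
print(c)
```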
USTCON (undirected s-t connectivity)
Instance: an undirected graph $G(V, E)$ and vertices $s, t$. Question: are $s$ and $t$ connected?
A deterministic traversal (BFS/DFS) uses linear space. Can the problem be solved in log-space?
Theorem (Aleliunas-Karp-Lipton-Lovász-Rackoff 1979): USTCON can be solved by a poly-time Monte Carlo randomized algorithm with bounded one-sided error, which uses $O(\log n)$ extra space.
Algorithm: start a random walk at $s$; if $t$ is reached within $4n^3$ steps, return "yes", else return "no".
Space: $O(\log n)$ (the current vertex plus a step counter). If $s$ and $t$ are not connected, the answer is always "no". If they are connected, the cover time of their component satisfies $C(G) \le 4nm < 2n^3$, so by Markov's inequality $\Pr[\text{"no"}] < 1/2$.
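The random-walk algorithm above is short enough to sketch directly (adjacency lists and names are my own; a real log-space implementation would stream the graph rather than hold it in memory):

```python
import random

def ustcon(adj, s, t, rng=random.Random(0)):
    """Random-walk USTCON: walk 4*n^3 steps from s and report whether t
    was reached. One-sided error: "yes" is always correct; on a connected
    pair, "no" happens with probability < 1/2 (Markov's inequality applied
    to the cover time). Working state: current vertex + step counter."""
    n = len(adj)
    v = s
    for _ in range(4 * n ** 3):
        if v == t:
            return True
        v = rng.choice(adj[v])
    return v == t

# Two components: {0, 1, 2} a triangle, {3, 4} a single edge.
adj = [[1, 2], [0, 2], [0, 1], [4], [3]]
yes, no = ustcon(adj, 0, 2), ustcon(adj, 0, 3)
print(yes, no)
```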
Electric Network
View the graph as an electrical network: each edge $uv$ has resistance $R_{uv}$; each vertex $v$ has a potential $\phi_v$; along an edge oriented $u \to v$ there is a current flow $C_{u \to v}$; the potential difference is $\phi_{u,v} = \phi_u - \phi_v$.
- Kirchhoff's Law: at every vertex $v$, flow in = flow out;
- Ohm's Law: on every edge $uv$, $C_{u \to v} = \phi_{u,v} / R_{uv}$.
Effective Resistance
The effective resistance $R(u, v)$ of the network between $u$ and $v$ is the potential difference between $u$ and $v$ required to send one unit of current flow from $u$ to $v$.
Give each edge $e$ resistance $R_e = 1$.
Theorem (Chandra-Raghavan-Ruzzo-Smolensky-Tiwari 1989): $\forall u, v \in V,\ \tau_{u,v} + \tau_{v,u} = 2m R(u, v)$.
From the graph $G(V, E)$, construct an electrical network: every edge $e$ gets resistance $R_e = 1$; fix a special vertex $v$; inject $d(u)$ units of current flow into every vertex $u$; remove all $2m$ units of current flow at $v$.
Lemma: $\forall u \in V,\ \phi_{u,v} = \tau_{u,v}$.
Toward the Lemma $\phi_{u,v} = \tau_{u,v}$: by Kirchhoff's Law, the $d(u)$ units injected at $u$ leave along its edges, and by Ohm's Law (with $R_e = 1$) each edge carries its potential difference:
$$d(u) = \sum_{uw \in E} C_{u \to w} = \sum_{uw \in E} \phi_{u,w} = \sum_{uw \in E} (\phi_{u,v} - \phi_{w,v}) = d(u)\, \phi_{u,v} - \sum_{uw \in E} \phi_{w,v},$$
so
$$\phi_{u,v} = 1 + \frac{1}{d(u)} \sum_{uw \in E} \phi_{w,v}.$$
Meanwhile, the hitting time $\tau_{u,v} = \mathbb{E}[\min\{n > 0 \mid X_n = v\} \mid X_0 = u]$ satisfies, by conditioning on the first step,
$$\tau_{u,v} = \frac{1}{d(u)} \sum_{uw \in E} (1 + \tau_{w,v}) = 1 + \frac{1}{d(u)} \sum_{uw \in E} \tau_{w,v}.$$
So $\phi_{\cdot,v}$ and $\tau_{\cdot,v}$ satisfy the same system of linear equations,
$$\phi_{u,v} = 1 + \frac{1}{d(u)} \sum_{uw \in E} \phi_{w,v}, \qquad \tau_{u,v} = 1 + \frac{1}{d(u)} \sum_{uw \in E} \tau_{w,v},$$
which has a unique solution; hence $\phi_{u,v} = \tau_{u,v}$ for all $u \in V$.
Back to the theorem $\tau_{u,v} + \tau_{v,u} = 2m R(u, v)$ (with every edge of resistance $R_e = 1$), consider two scenarios: Scenario A injects $d(w)$ units of current at every vertex $w$ and removes $2m$ units at $v$; Scenario B injects $d(w)$ units at every vertex $w$ and removes $2m$ units at $u$.
By the Lemma applied to each scenario: $\phi^A_{u,v} = \tau_{u,v}$ and $\phi^B_{v,u} = \tau_{v,u}$.
Scenario C reverses all the currents of Scenario B (remove $d(w)$ units at every vertex $w$, inject $2m$ units at $u$), so $\phi^C_{u,v} = \phi^B_{v,u} = \tau_{v,u}$. Scenario D superposes A and C, $D = A + C$: the $d(w)$ injections cancel, leaving $2m$ units of current flowing in at $u$ and out at $v$. By linearity,
$$\phi^D_{u,v} = \phi^A_{u,v} + \phi^C_{u,v} = \tau_{u,v} + \tau_{v,u},$$
and $\phi^D_{u,v}$ is the potential difference between $u$ and $v$ needed to send $2m$ units of current flow from $u$ to $v$.
Since sending one unit of current from $u$ to $v$ requires potential difference $R(u, v)$, sending $2m$ units requires $2m R(u, v)$; hence
$$R(u, v) = \frac{\phi^D_{u,v}}{2m}, \qquad \tau_{u,v} + \tau_{v,u} = \phi^D_{u,v} = 2m R(u, v). \qquad \square$$
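The commute-time identity can be verified numerically; a sketch using the standard Laplacian-pseudoinverse formula for effective resistance, $R(u,v) = (e_u - e_v)^{\mathsf{T}} L^{+} (e_u - e_v)$ (the function name and path example are my own):

```python
import numpy as np

def effective_resistance(A, u, v):
    """R(u, v) via the Laplacian pseudoinverse (unit edge resistances)."""
    L = np.diag(A.sum(axis=1)) - A
    Lp = np.linalg.pinv(L)
    e = np.zeros(len(A))
    e[u], e[v] = 1, -1
    return e @ Lp @ e

# Path 0-1-2 with m = 2 edges: tau(0,2) + tau(2,0) = 4 + 4 = 8
# (from the first-step equations), and two unit resistors in series
# give R(0, 2) = 2, so 2m * R = 8 as the theorem predicts.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
m = 2
R = effective_resistance(A, 0, 2)
print(2 * m * R)
```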
The identity $\tau_{u,v} + \tau_{v,u} = 2m R(u, v)$ improves the cover-time bound.
Theorem: $C(G) \le 2nm$.
Proof: pick a spanning tree $T$ of $G$ and take its Eulerian tour $v_1 \to v_2 \to \cdots \to v_{2(n-1)} \to v_{2n-1} = v_1$. Then
$$C(G) \le \sum_{i=1}^{2(n-1)} \tau_{v_i, v_{i+1}} = \sum_{uv \in T} (\tau_{u,v} + \tau_{v,u}) = 2m \sum_{uv \in T} R(u, v) \le 2m(n-1) \le 2nm,$$
since the effective resistance across an edge is at most its own resistance, $R(u, v) \le 1$.
Example: $G$ a path from $u$ to $v$ on $n$ vertices ($m = n - 1$). The bound gives $C(G) \le 2nm = O(n^2)$. Conversely, $C(G) \ge \max(\tau_{u,v}, \tau_{v,u})$, and by the identity
$$\tau_{u,v} + \tau_{v,u} = 2m R(u, v) = 2(n-1)^2,$$
since $R(u, v) = n - 1$ (unit resistors in series); hence $C(G) = \Theta(n^2)$.
Example: $G$ the lollipop, $K_{n/2}$ joined to a path on $n/2$ vertices, with $u$ in the clique and $v$ at the far end of the path. The bound gives $C(G) \le 2nm = O(n^3)$, as $m = \Theta(n^2)$. Conversely, by the identity
$$\tau_{u,v} + \tau_{v,u} = 2m R(u, v) = \Theta(n^2) \cdot \Theta(n) = \Theta(n^3),$$
while $\tau_{v,u} = O(n^2)$, so $C(G) \ge \tau_{u,v} = \Omega(n^3)$ and therefore $C(G) = \Theta(n^3)$.