Modern Discrete Probability Spectral Techniques

Modern Discrete Probability VI - Spectral Techniques
Background
Sébastien Roch
UW Madison, Mathematics
December 1, 2014

1 Review 2 3 4

Mixing time I
Theorem (Convergence to stationarity) Consider a finite state space $V$. Suppose the transition matrix $P$ is irreducible, aperiodic and has stationary distribution $\pi$. Then, for all $x, y$, $P^t(x, y) \to \pi(y)$ as $t \to +\infty$.
For probability measures $\mu, \nu$ on $V$, let their total variation distance be
\[ \|\mu - \nu\|_{\mathrm{TV}} := \sup_{A \subseteq V} |\mu(A) - \nu(A)|. \]
Definition (Mixing time) The mixing time is
\[ t_{\mathrm{mix}}(\varepsilon) := \min\{t \ge 0 : d(t) \le \varepsilon\}, \qquad \text{where } d(t) := \max_{x \in V} \|P^t(x, \cdot) - \pi(\cdot)\|_{\mathrm{TV}}. \]
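The definitions above can be illustrated numerically. The following is a minimal sketch, not part of the lecture: the helper names `tv_distance`, `d_values` and `t_mix` and the three-state example chain are assumptions made for illustration. It uses the fact that, on a finite space, the supremum over events $A$ equals half the $\ell^1$ distance, and it computes $d(t)$ by brute-force matrix powers.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def tv_distance(mu, nu):
    """Total variation distance: half the l1 distance on a finite space."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))

def d_values(P, pi, t_max):
    """Return [d(0), ..., d(t_max)], d(t) = max_x ||P^t(x, .) - pi||_TV."""
    n = len(P)
    Pt = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # P^0
    out = []
    for _ in range(t_max + 1):
        out.append(max(tv_distance(row, pi) for row in Pt))
        Pt = mat_mul(Pt, P)
    return out

def t_mix(P, pi, eps, t_max=1000):
    """Smallest t <= t_max with d(t) <= eps, or None if it is never reached."""
    for t, d_t in enumerate(d_values(P, pi, t_max)):
        if d_t <= eps:
            return t
    return None

# Hypothetical example: a lazy walk on a 3-vertex path (illustrative only).
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]  # stationary: pi P = pi

ds = d_values(P, pi, 5)
print(ds)
print(t_mix(P, pi, 0.1))
```

For this particular chain $d(t)$ halves at every step after the first, so the mixing time grows like $\log_2(1/\varepsilon)$.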

Mixing time II
Definition (Separation distance) The separation distance is defined as
\[ s_x(t) := \max_{y \in V} \left[ 1 - \frac{P^t(x, y)}{\pi(y)} \right], \]
and we let $s(t) := \max_{x \in V} s_x(t)$. Because both $\{\pi(y)\}$ and $\{P^t(x, y)\}$ are non-negative and sum to 1, we have that $s_x(t) \ge 0$.
Lemma (Separation distance v. total variation distance) $d(t) \le s(t)$.

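Mixing time III
Proof: Because $1 = \sum_y \pi(y) = \sum_y P^t(x, y)$,
\[ \sum_{y : P^t(x,y) < \pi(y)} \left[ \pi(y) - P^t(x, y) \right] = \sum_{y : P^t(x,y) \ge \pi(y)} \left[ P^t(x, y) - \pi(y) \right]. \]
So
\[ \|P^t(x, \cdot) - \pi(\cdot)\|_{\mathrm{TV}} = \frac{1}{2} \sum_y \left| \pi(y) - P^t(x, y) \right| = \sum_{y : P^t(x,y) < \pi(y)} \left[ \pi(y) - P^t(x, y) \right] = \sum_{y : P^t(x,y) < \pi(y)} \pi(y) \left[ 1 - \frac{P^t(x, y)}{\pi(y)} \right] \le s_x(t). \]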
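As a sanity check of the lemma $d(t) \le s(t)$, here is a small sketch on a two-state chain. The chain (with off-diagonal entries 0.1 and 0.3) and the helper names are illustrative assumptions, not taken from the slides.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def tv(row, pi):
    """Total variation distance between a row of P^t and pi."""
    return 0.5 * sum(abs(p - q) for p, q in zip(row, pi))

def separation(row, pi):
    """s_x(t) = max_y [1 - P^t(x, y) / pi(y)] for a row P^t(x, .)."""
    return max(1.0 - p / q for p, q in zip(row, pi))

# Hypothetical two-state chain with stationary distribution (0.75, 0.25).
P = [[0.9, 0.1],
     [0.3, 0.7]]
pi = [0.75, 0.25]

pairs = []  # (d(t), s(t)) for t = 0, 1, ..., 30
Pt = [[1.0, 0.0], [0.0, 1.0]]
for t in range(31):
    d_t = max(tv(row, pi) for row in Pt)
    s_t = max(separation(row, pi) for row in Pt)
    pairs.append((d_t, s_t))
    Pt = mat_mul(Pt, P)

print(all(d_t <= s_t + 1e-12 for d_t, s_t in pairs))
```

On this example one finds $s(t) = 0.6^t$ and $d(t) = 0.75 \cdot 0.6^t$, so the inequality holds with a constant-factor gap.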

Reversible chains
Definition (Reversible chain) A transition matrix $P$ is reversible w.r.t. a measure $\eta$ if $\eta(x) P(x, y) = \eta(y) P(y, x)$ for all $x, y \in V$. By summing over $y$, such a measure is necessarily stationary.

Example I
Recall:
Definition (Random walk on a graph) Let $G = (V, E)$ be a finite or countable, locally finite graph. Simple random walk on $G$ is the Markov chain on $V$, started at an arbitrary vertex, which at each time picks a uniformly chosen neighbor of the current state.
Let $(X_t)$ be simple random walk on a connected graph $G$. Then $(X_t)$ is reversible w.r.t. $\eta(v) := \delta(v)$, where $\delta(v)$ is the degree of vertex $v$.

Example II
Definition (Random walk on a network) Let $G = (V, E)$ be a finite or countable, locally finite graph. Let $c : E \to \mathbb{R}_+$ be a positive edge weight function on $G$. We call $\mathcal{N} = (G, c)$ a network. Random walk on $\mathcal{N}$ is the Markov chain on $V$, started at an arbitrary vertex, which at each time picks a neighbor of the current state proportionally to the weight of the corresponding edge.
Any countable, reversible Markov chain can be seen as a random walk on a network (not necessarily locally finite) by setting $c(e) := \pi(x) P(x, y) = \pi(y) P(y, x)$ for all $e = \{x, y\} \in E$.
Let $(X_t)$ be random walk on a network $\mathcal{N} = (G, c)$. Then $(X_t)$ is reversible w.r.t. $\eta(v) := c(v)$, where $c(v) := \sum_{x \sim v} c(v, x)$.
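The reversibility claim for random walk on a network can be checked directly on a toy example. The sketch below is illustrative only: the 4-vertex network `weights` and the helper `build_walk` are hypothetical. It builds $P(x, y) = c(x, y)/c(x)$ and measures how far detailed balance and stationarity are from holding.

```python
# Hypothetical 4-vertex network: undirected edges {x, y} with weights c(e) > 0.
weights = {(0, 1): 1.0, (1, 2): 2.0, (0, 2): 0.5, (2, 3): 3.0}

def build_walk(weights):
    """Return (vertices, c, P): vertex weights c(v) and transition matrix P."""
    c_edge = {}
    for (x, y), w in weights.items():
        c_edge[(x, y)] = w
        c_edge[(y, x)] = w  # the graph is undirected
    vertices = sorted({v for edge in weights for v in edge})
    # c(v) is the total weight of the edges incident to v.
    c = {v: sum(c_edge.get((v, x), 0.0) for x in vertices) for v in vertices}
    # The walk moves from x to y with probability c(x, y) / c(x).
    P = {x: {y: c_edge.get((x, y), 0.0) / c[x] for y in vertices}
         for x in vertices}
    return vertices, c, P

vertices, c, P = build_walk(weights)

# Detailed balance: c(x) P(x, y) = c(y) P(y, x) for all x, y.
balance_gap = max(abs(c[x] * P[x][y] - c[y] * P[y][x])
                  for x in vertices for y in vertices)

# Stationarity, obtained by summing detailed balance over x.
stationarity_gap = max(abs(sum(c[x] * P[x][y] for x in vertices) - c[y])
                       for y in vertices)

print(c, balance_gap, stationarity_gap)
```

Normalizing `c` by its total gives the stationary distribution when the network is finite.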

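Eigenbasis I
We let $n := |V| < +\infty$. Assume that $P$ is irreducible and reversible w.r.t. its stationary distribution $\pi > 0$. Define
\[ \langle f, g \rangle_\pi := \sum_{x \in V} \pi(x) f(x) g(x), \qquad \|f\|_\pi^2 := \langle f, f \rangle_\pi, \qquad (Pf)(x) := \sum_y P(x, y) f(y). \]
We let $\ell^2(V, \pi)$ be the Hilbert space of real-valued functions on $V$ equipped with the inner product $\langle \cdot, \cdot \rangle_\pi$ (equivalent to the vector space $(\mathbb{R}^n, \langle \cdot, \cdot \rangle_\pi)$).
Theorem There is an orthonormal basis of $\ell^2(V, \pi)$ formed of eigenfunctions $\{f_j\}_{j=1}^n$ of $P$ with real eigenvalues $\{\lambda_j\}_{j=1}^n$.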

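Eigenbasis II
Proof: We work over $(\mathbb{R}^n, \langle \cdot, \cdot \rangle_\pi)$. Let $D_\pi$ be the diagonal matrix with $\pi$ on the diagonal. By reversibility,
\[ M(x, y) := \sqrt{\frac{\pi(x)}{\pi(y)}}\, P(x, y) = \sqrt{\frac{\pi(y)}{\pi(x)}}\, P(y, x) =: M(y, x). \]
So $M = (M(x, y))_{x,y} = D_\pi^{1/2} P D_\pi^{-1/2}$, as a symmetric matrix, has real eigenvectors $\{\phi_j\}_{j=1}^n$ forming an orthonormal basis of $\mathbb{R}^n$ with corresponding real eigenvalues $\{\lambda_j\}_{j=1}^n$. Define $f_j := D_\pi^{-1/2} \phi_j$. Then
\[ P f_j = P D_\pi^{-1/2} \phi_j = D_\pi^{-1/2} D_\pi^{1/2} P D_\pi^{-1/2} \phi_j = D_\pi^{-1/2} M \phi_j = \lambda_j D_\pi^{-1/2} \phi_j = \lambda_j f_j, \]
and
\[ \langle f_i, f_j \rangle_\pi = \langle D_\pi^{-1/2} \phi_i, D_\pi^{-1/2} \phi_j \rangle_\pi = \sum_x \pi(x) \left[ \pi(x)^{-1/2} \phi_i(x) \right] \left[ \pi(x)^{-1/2} \phi_j(x) \right] = \langle \phi_i, \phi_j \rangle. \]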

Eigenbasis III
Lemma For all $j \neq 1$, $\sum_x \pi(x) f_j(x) = 0$.
Proof: By orthonormality, $\langle f_1, f_j \rangle_\pi = 0$. Now use the fact that $f_1 \equiv 1$.
Let $\delta_x(y) := \mathbf{1}_{\{x = y\}}$.
Lemma For all $x, y$, $\sum_{j=1}^n f_j(x) f_j(y) = \pi(x)^{-1} \delta_x(y)$.
Proof: Using the notation of the theorem, the matrix $\Phi$ whose columns are the $\phi_j$'s is unitary so $\Phi \Phi' = I$. That is, $\sum_{j=1}^n \phi_j(x) \phi_j(y) = \delta_x(y)$, or
\[ \sum_{j=1}^n \sqrt{\pi(x) \pi(y)}\, f_j(x) f_j(y) = \delta_x(y). \]
Rearranging gives the result.

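Eigenbasis IV
Lemma Let $g \in \ell^2(V, \pi)$. Then $g = \sum_{j=1}^n \langle g, f_j \rangle_\pi f_j$.
Proof: By the previous lemma, for all $x$,
\[ \sum_{j=1}^n \langle g, f_j \rangle_\pi f_j(x) = \sum_{j=1}^n \sum_y \pi(y) g(y) f_j(y) f_j(x) = \sum_y \pi(y) g(y) \left[ \pi(x)^{-1} \delta_x(y) \right] = g(x). \]
Lemma Let $g \in \ell^2(V, \pi)$. Then $\|g\|_\pi^2 = \sum_{j=1}^n \langle g, f_j \rangle_\pi^2$.
Proof: By the previous lemma,
\[ \|g\|_\pi^2 = \Big\| \sum_{j=1}^n \langle g, f_j \rangle_\pi f_j \Big\|_\pi^2 = \Big\langle \sum_{i=1}^n \langle g, f_i \rangle_\pi f_i, \sum_{j=1}^n \langle g, f_j \rangle_\pi f_j \Big\rangle_\pi = \sum_{i,j=1}^n \langle g, f_i \rangle_\pi \langle g, f_j \rangle_\pi \langle f_i, f_j \rangle_\pi, \]
and the claim follows from the orthonormality of the $f_j$'s.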

Eigenvalues I
Let $P$ be finite, irreducible and reversible.
Lemma Any eigenvalue $\lambda$ of $P$ satisfies $|\lambda| \le 1$.
Proof: $Pf = \lambda f \implies |\lambda| \|f\|_\infty = \|Pf\|_\infty = \max_x \left| \sum_y P(x, y) f(y) \right| \le \|f\|_\infty$.
We order the eigenvalues $1 \ge \lambda_1 \ge \cdots \ge \lambda_n \ge -1$. In fact:
Lemma We have $\lambda_1 = 1$ and $\lambda_2 < 1$. Also we can take $f_1 \equiv 1$.
Proof: Because $P$ is stochastic, the all-one vector is a right eigenvector with eigenvalue 1. Any eigenfunction with eigenvalue 1 is $P$-harmonic. By Corollary 3.22, for a finite, irreducible chain the only harmonic functions are the constant functions. So the eigenspace corresponding to 1 is one-dimensional. Since all eigenvalues are real, we must have $\lambda_2 < 1$.

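Eigenvalues II
Theorem (Rayleigh's quotient) Let $P$ be finite, irreducible and reversible with respect to $\pi$. The second largest eigenvalue is characterized by
\[ \lambda_2 = \sup\left\{ \frac{\langle f, Pf \rangle_\pi}{\langle f, f \rangle_\pi} : f \in \ell^2(V, \pi),\ \sum_x \pi(x) f(x) = 0 \right\}. \]
(Similarly, $\lambda_1 = \sup_{f \in \ell^2(V, \pi)} \frac{\langle f, Pf \rangle_\pi}{\langle f, f \rangle_\pi}$.)
Proof: Recalling that $f_1 \equiv 1$, the condition $\sum_x \pi(x) f(x) = 0$ is equivalent to $\langle f_1, f \rangle_\pi = 0$. For such an $f$, the eigendecomposition is
\[ f = \sum_{j=1}^n \langle f, f_j \rangle_\pi f_j = \sum_{j=2}^n \langle f, f_j \rangle_\pi f_j, \]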

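Eigenvalues III
and
\[ Pf = \sum_{j=2}^n \langle f, f_j \rangle_\pi \lambda_j f_j, \]
so that
\[ \frac{\langle f, Pf \rangle_\pi}{\langle f, f \rangle_\pi} = \frac{\sum_{i=2}^n \sum_{j=2}^n \langle f, f_i \rangle_\pi \langle f, f_j \rangle_\pi \lambda_j \langle f_i, f_j \rangle_\pi}{\sum_{j=2}^n \langle f, f_j \rangle_\pi^2} = \frac{\sum_{j=2}^n \langle f, f_j \rangle_\pi^2 \lambda_j}{\sum_{j=2}^n \langle f, f_j \rangle_\pi^2} \le \lambda_2. \]
Taking $f = f_2$ achieves the supremum.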
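The variational characterization can be probed numerically. The sketch below makes several assumptions for illustration: it uses simple random walk on a 5-cycle, for which $\pi$ is uniform and the second largest eigenvalue is $\cos(2\pi/5)$, and it samples random functions $f$ centered so that $\sum_x \pi(x) f(x) = 0$. No sampled quotient should exceed $\lambda_2$, and $f(x) = \cos(2\pi x/5)$ should attain it.

```python
import math
import random

n = 5
# Simple random walk on the n-cycle: move to either neighbor with prob. 1/2.
P = [[0.5 if (x - y) % n in (1, n - 1) else 0.0 for y in range(n)]
     for x in range(n)]
pi = [1.0 / n] * n  # uniform stationary distribution

def inner(f, g):
    """<f, g>_pi = sum_x pi(x) f(x) g(x)."""
    return sum(p * a * b for p, a, b in zip(pi, f, g))

def apply_P(f):
    """(Pf)(x) = sum_y P(x, y) f(y)."""
    return [sum(P[x][y] * f[y] for y in range(n)) for x in range(n)]

def rayleigh(f):
    """Rayleigh quotient <f, Pf>_pi / <f, f>_pi."""
    return inner(f, apply_P(f)) / inner(f, f)

lam2 = math.cos(2 * math.pi / n)  # second largest eigenvalue of this chain

random.seed(0)
quotients = []
for _ in range(1000):
    f = [random.gauss(0.0, 1.0) for _ in range(n)]
    mean = sum(p * v for p, v in zip(pi, f))
    f = [v - mean for v in f]  # enforce sum_x pi(x) f(x) = 0
    quotients.append(rayleigh(f))

f2 = [math.cos(2 * math.pi * x / n) for x in range(n)]  # an eigenfunction
print(max(quotients), rayleigh(f2), lam2)
```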

Spectral decomposition I
Theorem Let $\{f_j\}_{j=1}^n$ be the eigenfunctions of a reversible and irreducible transition matrix $P$ with corresponding eigenvalues $\{\lambda_j\}_{j=1}^n$, as defined previously. Assume $\lambda_1 \ge \cdots \ge \lambda_n$. We have the decomposition
\[ \frac{P^t(x, y)}{\pi(y)} = 1 + \sum_{j=2}^n f_j(x) f_j(y) \lambda_j^t. \]

Spectral decomposition II
Proof: Let $F$ be the matrix whose columns are the eigenvectors $\{f_j\}_{j=1}^n$ and let $D_\lambda$ be the diagonal matrix with $\{\lambda_j\}_{j=1}^n$ on the diagonal. Using the notation of the eigenbasis theorem,
\[ D_\pi^{1/2} P^t D_\pi^{-1/2} = M^t = (D_\pi^{1/2} F) D_\lambda^t (D_\pi^{1/2} F)', \]
which after rearranging becomes
\[ P^t D_\pi^{-1} = F D_\lambda^t F'. \]

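Example: two-state chain I
Let $V := \{0, 1\}$ and, for $\alpha, \beta \in (0, 1)$,
\[ P := \begin{pmatrix} 1 - \alpha & \alpha \\ \beta & 1 - \beta \end{pmatrix}. \]
Observe that $P$ is reversible w.r.t. the stationary distribution
\[ \pi := \left( \frac{\beta}{\alpha + \beta}, \frac{\alpha}{\alpha + \beta} \right). \]
We know that $f_1 \equiv 1$ is an eigenfunction with eigenvalue 1. As can be checked by direct computation, the other eigenfunction (in vector form) is
\[ f_2 := \left( \sqrt{\frac{\alpha}{\beta}}, -\sqrt{\frac{\beta}{\alpha}} \right), \]
with eigenvalue $\lambda_2 := 1 - \alpha - \beta$. We normalized $f_2$ so $\|f_2\|_\pi^2 = 1$.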

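Example: two-state chain II
The spectral decomposition is therefore
\[ P^t D_\pi^{-1} = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} + (1 - \alpha - \beta)^t \begin{pmatrix} \frac{\alpha}{\beta} & -1 \\ -1 & \frac{\beta}{\alpha} \end{pmatrix}. \]
Put differently,
\[ P^t = \begin{pmatrix} \frac{\beta}{\alpha + \beta} & \frac{\alpha}{\alpha + \beta} \\ \frac{\beta}{\alpha + \beta} & \frac{\alpha}{\alpha + \beta} \end{pmatrix} + (1 - \alpha - \beta)^t \begin{pmatrix} \frac{\alpha}{\alpha + \beta} & -\frac{\alpha}{\alpha + \beta} \\ -\frac{\beta}{\alpha + \beta} & \frac{\beta}{\alpha + \beta} \end{pmatrix}. \]
(Note for instance that the case $\alpha + \beta = 1$ corresponds to a rank-one $P$, which immediately converges.)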
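The closed form for $P^t$ is easy to cross-check against repeated matrix multiplication. The following sketch is illustrative: the parameter values `alpha = 0.2` and `beta = 0.5` and the function names are assumptions, chosen only to exercise the formula.

```python
alpha, beta = 0.2, 0.5  # hypothetical parameters in (0, 1)
P = [[1 - alpha, alpha],
     [beta, 1 - beta]]

def mat_mul(A, B):
    """Product of two 2x2 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def closed_form(t):
    """P^t from the spectral decomposition of the two-state chain."""
    s = alpha + beta
    lam_t = (1 - alpha - beta) ** t
    return [[beta / s + lam_t * alpha / s, alpha / s - lam_t * alpha / s],
            [beta / s - lam_t * beta / s, alpha / s + lam_t * beta / s]]

max_error = 0.0
Pt = [[1.0, 0.0], [0.0, 1.0]]  # P^0
for t in range(21):
    C = closed_form(t)
    max_error = max(max_error,
                    max(abs(Pt[i][j] - C[i][j])
                        for i in range(2) for j in range(2)))
    Pt = mat_mul(Pt, P)

print(max_error)
```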

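Example: two-state chain III
Assume $\beta \ge \alpha$. Then
\[ d(t) = \max_x \frac{1}{2} \sum_y \left| P^t(x, y) - \pi(y) \right| = \frac{\beta}{\alpha + \beta} |1 - \alpha - \beta|^t. \]
As a result, for $\varepsilon < \beta/(\alpha + \beta)$ and $\alpha + \beta \ne 1$,
\[ t_{\mathrm{mix}}(\varepsilon) = \left\lceil \frac{\log\left( \varepsilon \frac{\alpha + \beta}{\beta} \right)}{\log |1 - \alpha - \beta|} \right\rceil = \left\lceil \frac{\log \varepsilon^{-1} - \log\left( \frac{\alpha + \beta}{\beta} \right)}{\log |1 - \alpha - \beta|^{-1}} \right\rceil. \]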
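As a quick check, the mixing-time formula can be compared with a brute-force search over $t$. This is a sketch under stated assumptions: the values `alpha = 0.1`, `beta = 0.3` (so that $\beta \ge \alpha$), the list of tolerances and the function names are all illustrative, and the formula is rounded up to an integer because $t_{\mathrm{mix}}$ is a minimum over integer times.

```python
import math

alpha, beta = 0.1, 0.3  # hypothetical parameters with beta >= alpha
s = alpha + beta
P = [[1 - alpha, alpha], [beta, 1 - beta]]
pi = [beta / s, alpha / s]

def mat_mul(A, B):
    """Product of two 2x2 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def t_mix_brute(eps, t_max=10_000):
    """Smallest t with max_x ||P^t(x, .) - pi||_TV <= eps."""
    Pt = [[1.0, 0.0], [0.0, 1.0]]
    for t in range(t_max + 1):
        d_t = max(0.5 * sum(abs(p - q) for p, q in zip(row, pi)) for row in Pt)
        if d_t <= eps:
            return t
        Pt = mat_mul(Pt, P)
    raise RuntimeError("eps not reached within t_max steps")

def t_mix_formula(eps):
    """Closed form, assuming eps < beta / (alpha + beta)."""
    return math.ceil(math.log(eps * s / beta) / math.log(abs(1 - s)))

eps_values = [0.5, 0.25, 0.1, 0.05, 0.01, 0.001]
comparison = [(eps, t_mix_brute(eps), t_mix_formula(eps)) for eps in eps_values]
print(comparison)
```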

Spectral decomposition: again
Recall:
Theorem Let $\{f_j\}_{j=1}^n$ be the eigenfunctions of a reversible and irreducible transition matrix $P$ with corresponding eigenvalues $\{\lambda_j\}_{j=1}^n$, as defined previously. Assume $\lambda_1 \ge \cdots \ge \lambda_n$. We have the decomposition
\[ \frac{P^t(x, y)}{\pi(y)} = 1 + \sum_{j=2}^n f_j(x) f_j(y) \lambda_j^t. \]

Spectral gap
From the spectral decomposition, the speed of convergence of $P^t(x, y)$ to $\pi(y)$ is governed by the largest eigenvalue of $P$, in absolute value, not equal to 1.
Definition (Spectral gap) The absolute spectral gap is $\gamma_* := 1 - \lambda_*$ where $\lambda_* := |\lambda_2| \vee |\lambda_n|$. The spectral gap is $\gamma := 1 - \lambda_2$.
Note that the eigenvalues of the lazy version $\frac{1}{2} P + \frac{1}{2} I$ of $P$ are $\{\frac{1}{2}(\lambda_j + 1)\}_{j=1}^n$, which are all nonnegative. So, there, $\gamma_* = \gamma$.
Definition (Relaxation time) The relaxation time is defined as $t_{\mathrm{rel}} := \gamma_*^{-1}$.

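Example continued: two-state chain
There are two cases:
$\alpha + \beta \le 1$: In that case $\lambda_2 = 1 - \alpha - \beta \ge 0$, the spectral gap is $\gamma_* = \gamma = \alpha + \beta$ and the relaxation time is $t_{\mathrm{rel}} = 1/(\alpha + \beta)$.
$\alpha + \beta > 1$: In that case $\lambda_2 < 0$ and $\lambda_* = \alpha + \beta - 1$, so the absolute spectral gap is $\gamma_* = 2 - \alpha - \beta$ (while $\gamma = \alpha + \beta$) and the relaxation time is $t_{\mathrm{rel}} = 1/(2 - \alpha - \beta)$.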

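Mixing time v. relaxation time I
Theorem Let $P$ be reversible, irreducible, and aperiodic with stationary distribution $\pi$. Let $\pi_{\min} = \min_x \pi(x)$. For all $\varepsilon > 0$,
\[ (t_{\mathrm{rel}} - 1) \log\left( \frac{1}{2\varepsilon} \right) \le t_{\mathrm{mix}}(\varepsilon) \le \log\left( \frac{1}{\varepsilon \pi_{\min}} \right) t_{\mathrm{rel}}. \]
Proof: We start with the upper bound. By the lemma, it suffices to find $t$ such that $s(t) \le \varepsilon$. By the spectral decomposition and Cauchy-Schwarz,
\[ \left| \frac{P^t(x, y)}{\pi(y)} - 1 \right| \le \lambda_*^t \sum_{j=2}^n |f_j(x) f_j(y)| \le \lambda_*^t \sqrt{\sum_{j=2}^n f_j(x)^2 \sum_{j=2}^n f_j(y)^2}. \]
By our previous lemma, $\sum_{j=2}^n f_j(x)^2 \le \pi(x)^{-1}$. Plugging this back above,
\[ \left| \frac{P^t(x, y)}{\pi(y)} - 1 \right| \le \lambda_*^t \sqrt{\pi(x)^{-1} \pi(y)^{-1}} \le \frac{\lambda_*^t}{\pi_{\min}} = \frac{(1 - \gamma_*)^t}{\pi_{\min}} \le \frac{e^{-\gamma_* t}}{\pi_{\min}}. \]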

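Mixing time v. relaxation time II
The r.h.s. is less than $\varepsilon$ when $t \ge \log\left( \frac{1}{\varepsilon \pi_{\min}} \right) t_{\mathrm{rel}}$.
For the lower bound, let $f_*$ be an eigenfunction associated with an eigenvalue achieving $\lambda_* := |\lambda_2| \vee |\lambda_n|$. Let $z$ be such that $|f_*(z)| = \|f_*\|_\infty$. By our previous lemma, $\sum_y \pi(y) f_*(y) = 0$. Hence
\[ \lambda_*^t |f_*(z)| = |P^t f_*(z)| = \Big| \sum_y \left[ P^t(z, y) f_*(y) - \pi(y) f_*(y) \right] \Big| \le \|f_*\|_\infty \sum_y |P^t(z, y) - \pi(y)| \le \|f_*\|_\infty\, 2 d(t), \]
so $d(t) \ge \frac{1}{2} \lambda_*^t$. When $t = t_{\mathrm{mix}}(\varepsilon)$, $\varepsilon \ge \frac{1}{2} \lambda_*^{t_{\mathrm{mix}}(\varepsilon)}$. Therefore
\[ t_{\mathrm{mix}}(\varepsilon) \left( \frac{1}{\lambda_*} - 1 \right) \ge t_{\mathrm{mix}}(\varepsilon) \log\left( \frac{1}{\lambda_*} \right) \ge \log\left( \frac{1}{2\varepsilon} \right). \]
The result follows from
\[ \left( \frac{1}{\lambda_*} - 1 \right)^{-1} = \left( \frac{1 - \lambda_*}{\lambda_*} \right)^{-1} = \left( \frac{\gamma_*}{1 - \gamma_*} \right)^{-1} = t_{\mathrm{rel}} - 1. \]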
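The two-sided bound can be checked on the two-state chain, where $\lambda_* = |1 - \alpha - \beta|$ is known in closed form. The sketch below is illustrative only: the parameter pairs, tolerances and helper names are assumptions, and the mixing time is found by brute force rather than by any formula.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def brute_force_t_mix(P, pi, eps, t_max=10_000):
    """Smallest t with d(t) <= eps, computed from matrix powers."""
    n = len(P)
    Pt = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for t in range(t_max + 1):
        d_t = max(0.5 * sum(abs(p - q) for p, q in zip(row, pi)) for row in Pt)
        if d_t <= eps:
            return t
        Pt = mat_mul(Pt, P)
    raise RuntimeError("d(t) did not reach eps within t_max steps")

def check_bounds(alpha, beta, eps):
    """Return (lower bound, t_mix, upper bound) for the two-state chain."""
    s = alpha + beta
    P = [[1 - alpha, alpha], [beta, 1 - beta]]
    pi = [beta / s, alpha / s]
    lam_star = abs(1 - s)             # only non-trivial eigenvalue, in modulus
    t_rel = 1.0 / (1.0 - lam_star)    # relaxation time 1 / gamma_*
    lower = (t_rel - 1.0) * math.log(1.0 / (2.0 * eps))
    upper = math.log(1.0 / (eps * min(pi))) * t_rel
    return lower, brute_force_t_mix(P, pi, eps), upper

# Hypothetical parameter choices, including one with alpha + beta > 1.
results = [check_bounds(a, b, e)
           for (a, b) in [(0.1, 0.3), (0.3, 0.3), (0.8, 0.9)]
           for e in (0.25, 0.05, 0.01)]
print(results)
```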

Random walk on the cycle I
Consider simple random walk on an $n$-cycle. That is, $V := \{0, 1, \ldots, n - 1\}$ and $P(x, y) = 1/2$ if and only if $|x - y| = 1 \bmod n$.
Lemma (Eigenbasis on the cycle) For $j = 0, \ldots, n - 1$, the function
\[ f_j(x) := \cos\left( \frac{2\pi j x}{n} \right), \qquad x = 0, 1, \ldots, n - 1, \]
is an eigenfunction of $P$ with eigenvalue
\[ \lambda_j := \cos\left( \frac{2\pi j}{n} \right). \]

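Random walk on the cycle II
Proof: Note that, for all $j, x$,
\[ \sum_y P(x, y) f_j(y) = \frac{1}{2} \left[ \cos\left( \frac{2\pi j (x - 1)}{n} \right) + \cos\left( \frac{2\pi j (x + 1)}{n} \right) \right] \]
\[ = \frac{1}{2} \left[ \frac{e^{i \frac{2\pi j (x - 1)}{n}} + e^{-i \frac{2\pi j (x - 1)}{n}}}{2} + \frac{e^{i \frac{2\pi j (x + 1)}{n}} + e^{-i \frac{2\pi j (x + 1)}{n}}}{2} \right] \]
\[ = \left[ \frac{e^{i \frac{2\pi j x}{n}} + e^{-i \frac{2\pi j x}{n}}}{2} \right] \left[ \frac{e^{i \frac{2\pi j}{n}} + e^{-i \frac{2\pi j}{n}}}{2} \right] = \cos\left( \frac{2\pi j x}{n} \right) \cos\left( \frac{2\pi j}{n} \right) = \cos\left( \frac{2\pi j}{n} \right) f_j(x). \]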
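The eigenfunction identity can also be confirmed numerically. This is a minimal sketch assuming a cycle of length `n = 7` (an arbitrary illustrative choice); it reports, for each $j$, the largest entry of $P f_j - \lambda_j f_j$.

```python
import math

n = 7  # hypothetical cycle length
# Simple random walk on the n-cycle: P(x, y) = 1/2 iff y = x +/- 1 mod n.
P = [[0.5 if (x - y) % n in (1, n - 1) else 0.0 for y in range(n)]
     for x in range(n)]

def residual(j):
    """max_x |(P f_j)(x) - lambda_j f_j(x)| for f_j(x) = cos(2 pi j x / n)."""
    f = [math.cos(2 * math.pi * j * x / n) for x in range(n)]
    lam = math.cos(2 * math.pi * j / n)
    Pf = [sum(P[x][y] * f[y] for y in range(n)) for x in range(n)]
    return max(abs(Pf[x] - lam * f[x]) for x in range(n))

residuals = [residual(j) for j in range(n)]
print(max(residuals))
```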

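Random walk on the cycle III
Theorem (Relaxation time on the cycle) The relaxation time for lazy simple random walk on the cycle is
\[ t_{\mathrm{rel}} = \frac{2}{1 - \cos\left( \frac{2\pi}{n} \right)} = \Theta(n^2). \]
Proof: The eigenvalues are
\[ \frac{1}{2} \left[ \cos\left( \frac{2\pi j}{n} \right) + 1 \right]. \]
The spectral gap is therefore $\frac{1}{2} \left( 1 - \cos\left( \frac{2\pi}{n} \right) \right)$. By a Taylor expansion,
\[ \frac{1}{2} \left( 1 - \cos\left( \frac{2\pi}{n} \right) \right) = \frac{\pi^2}{n^2} + O(n^{-4}). \]
Since $\pi_{\min} = 1/n$, we get $t_{\mathrm{mix}}(\varepsilon) = O(n^2 \log n)$ and $t_{\mathrm{mix}}(\varepsilon) = \Omega(n^2)$. We showed before that in fact $t_{\mathrm{mix}}(\varepsilon) = \Theta(n^2)$.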

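Random walk on the cycle IV
In this case, a sharper bound can be obtained by working directly with the spectral decomposition. By Jensen's inequality,
\[ 4 \|P^t(x, \cdot) - \pi(\cdot)\|_{\mathrm{TV}}^2 = \left\{ \sum_y \pi(y) \left| \frac{P^t(x, y)}{\pi(y)} - 1 \right| \right\}^2 \le \sum_y \pi(y) \left( \frac{P^t(x, y)}{\pi(y)} - 1 \right)^2 = \Big\| \sum_{j=2}^n \lambda_j^t f_j(x) f_j \Big\|_\pi^2 = \sum_{j=2}^n \lambda_j^{2t} f_j(x)^2. \]
The last sum does not depend on $x$ by symmetry. Summing over $x$ and dividing by $n$, which is the same as multiplying by $\pi(x)$, gives
\[ 4 \|P^t(x, \cdot) - \pi(\cdot)\|_{\mathrm{TV}}^2 \le \sum_x \pi(x) \sum_{j=2}^n \lambda_j^{2t} f_j(x)^2 = \sum_{j=2}^n \lambda_j^{2t} \sum_x \pi(x) f_j(x)^2 = \sum_{j=2}^n \lambda_j^{2t}, \]
where we used that $\|f_j\|_\pi^2 = 1$.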

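Random walk on the cycle V
Consider the non-lazy chain with $n$ odd. With the eigenvalues indexed by $j = 0, \ldots, n - 1$ as in the lemma (so that $j = 0$ is the eigenvalue 1), we get
\[ 4 d(t)^2 \le \sum_{j=1}^{n-1} \cos\left( \frac{2\pi j}{n} \right)^{2t} = 2 \sum_{j=1}^{(n-1)/2} \cos\left( \frac{\pi j}{n} \right)^{2t}. \]
For $x \in [0, \pi/2)$, $\cos x \le e^{-x^2/2}$. (Indeed, let $h(x) = \log(e^{x^2/2} \cos x)$. Then $h'(x) = x - \tan x \le 0$ since $(\tan x)' = 1 + \tan^2 x \ge 1$ for all $x$ and $\tan 0 = 0$. So $h(x) \le h(0) = 0$.) Then
\[ 4 d(t)^2 \le 2 \sum_{j=1}^{(n-1)/2} \exp\left( -\frac{\pi^2 j^2}{n^2} t \right) \le 2 \exp\left( -\frac{\pi^2}{n^2} t \right) \sum_{j=1}^{\infty} \exp\left( -\frac{\pi^2 (j^2 - 1)}{n^2} t \right) \]
\[ \le 2 \exp\left( -\frac{\pi^2}{n^2} t \right) \sum_{l=0}^{\infty} \exp\left( -\frac{3\pi^2 t}{n^2} l \right) = \frac{2 \exp\left( -\frac{\pi^2}{n^2} t \right)}{1 - \exp\left( -\frac{3\pi^2}{n^2} t \right)}, \]
where we used that $j^2 - 1 \ge 3(j - 1)$ for all $j = 1, 2, 3, \ldots$. So $t_{\mathrm{mix}}(\varepsilon) = O(n^2)$.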
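To see how the chain of inequalities behaves, one can compare $4 d(t)^2$, the spectral sum and the final closed-form bound on a small odd cycle. The sketch below is illustrative: `n = 7`, the time range and the names are assumptions. It evolves the distribution started at vertex 0 only, which is enough because the cycle looks the same from every vertex, and it starts at $t = 1$ since the closed-form bound is infinite at $t = 0$.

```python
import math

n = 7  # hypothetical odd cycle length

def step(mu):
    """One step of non-lazy simple random walk on the n-cycle."""
    return [0.5 * (mu[(y - 1) % n] + mu[(y + 1) % n]) for y in range(n)]

def spectral_sum(t):
    """Sum of lambda_j^(2t) over the non-trivial eigenvalues cos(2 pi j / n)."""
    return sum(math.cos(2 * math.pi * j / n) ** (2 * t) for j in range(1, n))

def closed_bound(t):
    """2 exp(-pi^2 t / n^2) / (1 - exp(-3 pi^2 t / n^2)), valid for t >= 1."""
    a = math.pi ** 2 * t / n ** 2
    return 2 * math.exp(-a) / (1 - math.exp(-3 * a))

mu = [1.0] + [0.0] * (n - 1)  # start at vertex 0
rows = []  # (t, 4 d(t)^2, spectral sum, closed-form bound)
for t in range(1, 201):
    mu = step(mu)
    d_t = 0.5 * sum(abs(m - 1.0 / n) for m in mu)
    rows.append((t, 4 * d_t ** 2, spectral_sum(t), closed_bound(t)))

print(rows[0])
```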

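Random walk on the hypercube I
Consider simple random walk on the hypercube $V := \{-1, +1\}^n$ where $x \sim y$ if $\|x - y\|_1 = 2$, that is, if $x$ and $y$ differ in exactly one coordinate. For $J \subseteq [n]$, we let
\[ \chi_J(x) = \prod_{j \in J} x_j, \qquad x \in V. \]
These are called parity functions.
Lemma (Eigenbasis on the hypercube) For all $J \subseteq [n]$, the function $\chi_J$ is an eigenfunction of $P$ with eigenvalue
\[ \lambda_J := \frac{n - 2|J|}{n}. \]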

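Random walk on the hypercube II
Proof: For $x \in V$ and $i \in [n]$, let $x^{[i]}$ be $x$ where coordinate $i$ is flipped. Note that, for all $J, x$,
\[ \sum_y P(x, y) \chi_J(y) = \sum_{i=1}^n \frac{1}{n} \chi_J(x^{[i]}) = \frac{n - |J|}{n} \chi_J(x) - \frac{|J|}{n} \chi_J(x) = \frac{n - 2|J|}{n} \chi_J(x). \]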
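The computation in the proof can be replayed exhaustively in a small dimension. This sketch assumes `n = 4` (an illustrative choice) and checks every subset $J$ and every vertex $x$; the helper names `chi`, `flip` and `P_chi` are hypothetical.

```python
from itertools import combinations, product

n = 4  # hypothetical dimension
vertices = list(product([-1, 1], repeat=n))

def chi(J, x):
    """Parity function chi_J(x) = product of x_j over j in J."""
    result = 1
    for j in J:
        result *= x[j]
    return result

def flip(x, i):
    """Return x with coordinate i flipped."""
    return x[:i] + (-x[i],) + x[i + 1:]

def P_chi(J, x):
    """(P chi_J)(x): average of chi_J over the n neighbors of x."""
    return sum(chi(J, flip(x, i)) for i in range(n)) / n

# Largest deviation from P chi_J = ((n - 2|J|) / n) chi_J over all J and x.
max_residual = max(
    abs(P_chi(J, x) - (n - 2 * len(J)) / n * chi(J, x))
    for r in range(n + 1)
    for J in combinations(range(n), r)
    for x in vertices
)
print(max_residual)
```

Flipping a coordinate in $J$ changes the sign of $\chi_J$ and flipping one outside $J$ does not, which is exactly the split used in the proof.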

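Random walk on the hypercube III
Theorem (Relaxation time on the hypercube) The relaxation time for lazy simple random walk on the hypercube is $t_{\mathrm{rel}} = n$.
Proof: The eigenvalues are $\frac{n - |J|}{n}$ for $J \subseteq [n]$. The spectral gap is
\[ \gamma_* = \gamma = 1 - \frac{n - 1}{n} = \frac{1}{n}. \]
Because $|V| = 2^n$, $\pi_{\min} = 1/2^n$. Hence we have $t_{\mathrm{mix}}(\varepsilon) = O(n^2)$ and $t_{\mathrm{mix}}(\varepsilon) = \Omega(n)$. We have shown before that in fact $t_{\mathrm{mix}}(\varepsilon) = \Theta(n \log n)$.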

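Random walk on the hypercube IV
As we did for the cycle, we obtain a sharper bound by working directly with the spectral decomposition. By the same argument,
\[ 4 d(t)^2 \le \sum_{J \neq \emptyset} \lambda_J^{2t}. \]
Consider the lazy chain again. Then
\[ 4 d(t)^2 \le \sum_{J \neq \emptyset} \left( \frac{n - |J|}{n} \right)^{2t} = \sum_{l=1}^n \binom{n}{l} \left( 1 - \frac{l}{n} \right)^{2t} \le \sum_{l=1}^n \binom{n}{l} \exp\left( -\frac{2 t l}{n} \right) = \left( 1 + \exp\left( -\frac{2t}{n} \right) \right)^n - 1. \]
So $t_{\mathrm{mix}}(\varepsilon) \le \frac{1}{2} n \log n + O(n)$.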
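The final bound can be compared with the exact distance in a small dimension. The following sketch is illustrative: it assumes `n = 4`, encodes vertices as bitmasks (a representation chosen here for convenience, not taken from the slides), and starts the lazy walk at a single vertex, which suffices because the hypercube looks the same from every vertex.

```python
import math

n = 4        # hypothetical dimension
N = 2 ** n   # number of vertices, encoded as bitmasks 0 .. N - 1

def lazy_step(mu):
    """One step of the lazy walk: stay w.p. 1/2, else flip a uniform bit."""
    return [0.5 * mu[x] + 0.5 * sum(mu[x ^ (1 << i)] for i in range(n)) / n
            for x in range(N)]

def bound(t):
    """(1 + exp(-2t/n))^n - 1, the bound on 4 d(t)^2 for the lazy chain."""
    return (1 + math.exp(-2 * t / n)) ** n - 1

mu = [0.0] * N
mu[0] = 1.0  # start at one vertex
rows = []    # (t, 4 d(t)^2, bound(t))
for t in range(41):
    d_t = 0.5 * sum(abs(m - 1.0 / N) for m in mu)
    rows.append((t, 4 * d_t ** 2, bound(t)))
    mu = lazy_step(mu)

print(rows[:3])
```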

Some remarks about infinite networks I
Remark (Recurrent case) The previous results cannot in general be extended to infinite networks. Suppose $P$ is irreducible, aperiodic and positive recurrent. Then it can be shown that, if $\pi$ is the stationary distribution, then for all $x$
\[ \|P^t(x, \cdot) - \pi(\cdot)\|_{\mathrm{TV}} \to 0, \qquad \text{as } t \to +\infty. \]
However, one needs stronger conditions on $P$ than reversibility for the spectral theorem to apply, e.g., compactness (that is, $P$ maps bounded sets to relatively compact sets, i.e., sets whose closure is compact).

Some remarks about infinite networks II
Example (A positive recurrent chain whose $P$ is not compact) For $p < 1/2$, let $(X_t)$ be the birth-death chain with $V := \{0, 1, 2, \ldots\}$, $P(0, 0) := 1 - p$, $P(0, 1) := p$, $P(x, x + 1) := p$ and $P(x, x - 1) := 1 - p$ for all $x \ge 1$, and $P(x, y) := 0$ if $|x - y| > 1$. As can be checked by direct computation, $P$ is reversible with respect to the stationary distribution $\pi(x) = (1 - \gamma) \gamma^x$ for $x \ge 0$, where $\gamma := \frac{p}{1 - p}$. For $j \ge 1$, define $g_j(x) := \pi(j)^{-1/2} \mathbf{1}_{\{x = j\}}$. Then $\|g_j\|_\pi^2 = 1$ for all $j$, so $\{g_j\}_j$ is bounded in $\ell^2(V, \pi)$. On the other hand,
\[ P g_j(x) = p\, \pi(j)^{-1/2} \mathbf{1}_{\{x = j - 1\}} + (1 - p)\, \pi(j)^{-1/2} \mathbf{1}_{\{x = j + 1\}}. \]

Some remarks about infinite networks III
Example (Continued) So
\[ \|P g_j\|_\pi^2 = p^2 \pi(j)^{-1} \pi(j - 1) + (1 - p)^2 \pi(j)^{-1} \pi(j + 1) = 2p(1 - p). \]
Hence $\{P g_j\}_j$ is also bounded. However, for $j > l$,
\[ \|P g_j - P g_l\|_\pi^2 \ge (1 - p)^2 \pi(j)^{-1} \pi(j + 1) + p^2 \pi(l)^{-1} \pi(l - 1) = 2p(1 - p). \]
So $\{P g_j\}_j$ does not have a converging subsequence and therefore is not relatively compact.
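The two norm computations in this example can be reproduced numerically. The sketch below is illustrative: it assumes `p = 0.3`, restricts to $1 \le l < j \le 20$, and stores each $P g_j$ as a sparse dictionary supported on $\{j - 1, j + 1\}$; the helper names are hypothetical.

```python
p = 0.3              # hypothetical parameter with p < 1/2
gamma = p / (1 - p)

def pi(x):
    """Stationary distribution pi(x) = (1 - gamma) gamma^x."""
    return (1 - gamma) * gamma ** x

def P_g(j):
    """P g_j as a sparse dict {x: value}, supported on {j - 1, j + 1}."""
    scale = pi(j) ** -0.5
    return {j - 1: p * scale, j + 1: (1 - p) * scale}

def sq_norm(v):
    """Squared l2(V, pi) norm of a sparse function."""
    return sum(pi(x) * val ** 2 for x, val in v.items())

def sq_dist(u, v):
    """Squared l2(V, pi) distance between two sparse functions."""
    keys = set(u) | set(v)
    return sum(pi(x) * (u.get(x, 0.0) - v.get(x, 0.0)) ** 2 for x in keys)

target = 2 * p * (1 - p)
norms = [sq_norm(P_g(j)) for j in range(1, 21)]
min_dist = min(sq_dist(P_g(j), P_g(l))
               for j in range(1, 21) for l in range(1, j))
print(target, norms[:3], min_dist)
```

Since the pairwise distances stay bounded away from zero, no subsequence of $\{P g_j\}_j$ can be Cauchy, which is the obstruction to compactness described above.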

Some remarks about infinite networks IV
Most random walks on infinite networks we have encountered so far were transient or null recurrent. In such cases, there is no stationary distribution to converge to. In fact:
Theorem If $P$ is an irreducible chain which is either transient or null recurrent, we have for all $x, y$
\[ \lim_{t} P^t(x, y) = 0. \]
Proof: In the transient case, since $\sum_t \mathbf{1}_{\{X_t = y\}} < +\infty$ a.s. under $\mathbb{P}_x$, we have $\sum_t P^t(x, y) = \mathbb{E}_x\left[ \sum_t \mathbf{1}_{\{X_t = y\}} \right] < +\infty$, so $P^t(x, y) \to 0$.