RANDOM WALKS

ARIEL YADIN

Course: Spring 2016. Lecture notes updated: May 2, 2016.

Contents

Lecture 1. Introduction
Lecture 2. Markov Chains
Lecture 3. Recurrence and Transience
Lecture 4. Stationary Distributions
Lecture 5. Positive Recurrent Chains
Lecture 6. Convergence to Equilibrium
Lecture 7. Conditional Expectation
Lecture 8. Martingales
Lecture 9. Reversible Chains
Lecture 10. Discrete Analysis
Lecture 11. Networks
Lecture 12. Network Reduction
Lecture 13. Thompson's Principle
Lecture 14. Nash-Williams
Lecture 15. Flows
Lecture 16. Resistance in Euclidean Lattices
Lecture 17. Spectral Analysis

Lecture 18. Kesten's Amenability Criterion

Number of exercises in lecture: 0
Total number of exercises until here: 0

Random Walks (Ariel Yadin)

Lecture 1: Introduction

1.1. Overview

In this course we will study the behavior of random processes; that is, processes that evolve in time with some randomness, or probability measure, governing the evolution. Let us give some examples:

- A gambler playing roulette.
- A drunk man walking in some city.
- A drunk bird flying in the sky.
- The evolution of a certain family name.

Some questions which we will (hopefully) be able to answer by the end of the course:

- Suppose a gambler starts with N Shekel. What is the probability that the gambler will earn another N Shekel before losing all of the money?
- How long will it take a drunk man walking to reach either his house or the city limits?
- Suppose a chess knight moves randomly on a chess board. Will the knight eventually return to the starting point? What is the expected number of steps until the knight returns?
- Suppose that men of the Rothschild family have three children on average. What is the probability that the Rothschild name will still be alive in another 100 years? Is there positive probability for the Rothschild name to survive forever?

1.2. Random Walks on Z

We will start with a soft example, and then move on to the deeper and more precise theory.

What is a random walk? A (simple) random walk on a graph is a process, or a sequence of vertices, such that at every step the next vertex is chosen uniformly among the neighbors of the current vertex, each step of the walk independently of the others.

Story about Pólya meeting a couple in the woods. (George Pólya, 1887-1985)

Figure 1. Path of a drunk man walking in the streets.

Figure 2. Path of a drunk bird flying around.

Now, suppose we want to perform a random walk on Z. If the walker is at a vertex z, then a uniformly chosen neighbor is z + 1 or z - 1, each with probability 1/2.

That is, we can model a random walk on Z by considering an i.i.d. sequence (X_k)_{k=1}^∞, where X_k is uniform on {-1, 1}, and the walk is

S_t = \sum_{k=1}^t X_k.

So X_k is the k-th step of the walk, and S_t is the position after t steps.

Let us consider a few properties of the random walk on Z. First let us calculate the expected number of visits to 0 by time t.

Proposition 1.1. Let (S_t)_t be a random walk on Z. Denote by V_t the number of visits to 0 up to time t; that is, V_t = #{1 ≤ k ≤ t : S_k = 0}. Then, there exists a constant c > 0 such that for all t, E[V_t] ≥ c√t.

Proof. An inequality we will use is Stirling's approximation of n!:

\sqrt{2\pi n} (n/e)^n e^{\frac{1}{12n+1}} < n! < \sqrt{2\pi n} (n/e)^n e^{\frac{1}{12n}}.

(James Stirling, 1692-1770)

This leads, by a bit of careful computation, to

\frac{2^{2n}}{\sqrt{\pi n}} \exp\left(-\frac{1}{12n+1}\right) < \binom{2n}{n} < \frac{2^{2n}}{\sqrt{\pi n}} \exp\left(\frac{1}{12n}\right).

Specifically,

\frac{1}{2} < \binom{2n}{n} \sqrt{\pi n}\, 2^{-2n} < 2.

Now, what is the probability P[S_k = 0]? Note that there are k steps, so for S_k = 0 we need the number of +'s to equal the number of -'s. Rigorously, if

R_t = #{1 ≤ k ≤ t : X_k = 1}  and  L_t = #{1 ≤ k ≤ t : X_k = -1},

then R_t + L_t = t. Moreover, the distribution of R_t is Bin(t, 1/2). Also, S_t = R_t - L_t, so for S_t = 0 we need R_t = L_t = t/2. This is only possible for even t, and we get

P[S_{2k} = 0] = P[R_{2k} = k] = \binom{2k}{k} 2^{-2k}  and  P[S_{2k+1} = 0] = 0.

Now, note that V_t = \sum_{k=1}^t 1_{\{S_k = 0\}}. So

E[V_t] = \sum_{k=1}^t P[S_k = 0] = \sum_{k=1}^{\lfloor t/2 \rfloor} \binom{2k}{k} 2^{-2k}.

Since \binom{2k}{k} 2^{-2k} ≥ \frac{1}{2\sqrt{\pi k}} and

\sum_{k=1}^m \frac{1}{\sqrt{\pi k}} ≥ \int_1^{m+1} \frac{dx}{\sqrt{\pi x}} = \frac{2}{\sqrt{\pi}} (\sqrt{m+1} - 1),

we get that E[V_t] ≥ c√t for some c > 0. □

Let us now consider the probability that the random walker will return to the origin.

Proposition 1.2. P[∃ t ≥ 1 : S_t = 0] = 1.

Proof. Let p = P[∃ t ≥ 1 : S_t = 0]. Assume for a contradiction that p < 1. (Note that p > 0, since p ≥ P[S_2 = 0] = 1/2.)

Suppose that S_t = 0 for some t > 0. Then, since S_{t+k} = S_t + \sum_{j=1}^k X_{t+j}, the process (S_{t+k})_k has the same distribution as a random walk on Z, and is independent of (S_1, ..., S_t). So

P[∃ k ≥ 1 : S_{t+k} = 0 | S_t = 0] = p.

Thus, every time we are at 0 there is probability 0 < 1 - p < 1 to never return.

Now we consider the different excursions. That is, let T_0 = 0 and define inductively

T_k = \inf\{t > T_{k-1} : S_t = 0\},

where \inf \emptyset = ∞. Now let K be the first k such that T_k = ∞. The analysis above gives that for k ≥ 1,

P[K = k] = P[T_1 < ∞, ..., T_{k-1} < ∞, T_k = ∞] = P[T_1 - T_0 < ∞, ..., T_{k-1} - T_{k-2} < ∞, T_k - T_{k-1} = ∞].

The main observation now is that the differences T_k - T_{k-1} are independent, so P[K = k] = p^{k-1}(1-p). That is, K ~ Geo(1-p). Thus, E[K] = \frac{1}{1-p} < ∞.

But note that K is exactly the number of visits to 0 by the infinite-time walk (counting the visit at time 0). That is, V_t ≤ K for all t. However, in the previous proposition we have shown that E[V_t] ≥ c√t, a contradiction! So it must be that p = 1. □

It is not a coincidence that the expected number of visits to 0 is infinite, and that the probability to return to 0 is 1. This will also be the case in 2 dimensions, but not in 3 dimensions. In the upcoming classes we will rigorously prove the following theorem of Pólya.

Theorem 1.3. Fix d ≥ 1. Let (X_k)_k be i.i.d. d-dimensional random variables, uniformly distributed on {±e_1, ..., ±e_d} (where e_1, ..., e_d is the standard basis of R^d). Let S_t = \sum_{k=1}^t X_k. Let p(d) = P[∃ t ≥ 1 : S_t = 0]. Then, p(d) = 1 for d ≤ 2 and p(d) < 1 for d ≥ 3.
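Theorem 1.3 is easy to probe by simulation. The following sketch is an illustration only (it is not part of the original notes; the horizon and sample sizes are arbitrary choices): it estimates the probability of returning to the origin within a finite horizon, which is a lower bound on p(d). For d = 1, 2 the estimate creeps towards 1 as the horizon grows, while for d = 3 it stays well below 1 (the true value of p(3) is roughly 0.34).

```python
import random

def returns_within(d, horizon=5000):
    """Run one simple random walk on Z^d; return True if it revisits the origin."""
    pos = [0] * d
    for _ in range(horizon):
        i = random.randrange(d)              # pick a coordinate uniformly
        pos[i] += random.choice((-1, 1))     # move +-1 in that coordinate
        if all(c == 0 for c in pos):
            return True
    return False

for d in (1, 2, 3):
    samples = 500
    hits = sum(returns_within(d) for _ in range(samples))
    print(f"d={d}: P[return within horizon] ~ {hits / samples:.2f}")
```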

Remark 1.4. The proof for d ≥ 3 rests mainly on the estimate P[S_t = 0] ≤ C t^{-d/2}. Thus, for d ≥ 3,

\sum_{t=1}^{\infty} P[S_t = 0] < ∞.

So by the Borel-Cantelli Lemma, P[S_t = 0 i.o.] = 0. In other words,

P[∃ T : ∀ t > T, S_t ≠ 0] = P[\liminf \{S_t ≠ 0\}] = 1.

Thus, a.s. the number of visits to 0 is finite. If the probability to return to 0 were 1, then the number of visits to 0 would have to be infinite a.s. All this will be done rigorously in the upcoming classes.

Number of exercises in lecture: 0
Total number of exercises until here: 0
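In the same spirit, Proposition 1.1 and Remark 1.4 contrast the growth of the number of visits to 0: on Z it grows on the order of √t, while on Z^3 it stays bounded. Here is a small Monte Carlo sketch (not from the notes; sample sizes are arbitrary):

```python
import random

def mean_visits(d, t, samples=300):
    """Estimate E[V_t], the expected number of visits to 0 by time t, on Z^d."""
    total = 0
    for _ in range(samples):
        pos = [0] * d
        for _ in range(t):
            i = random.randrange(d)
            pos[i] += random.choice((-1, 1))
            if all(c == 0 for c in pos):
                total += 1
    return total / samples

for t in (500, 2000, 8000):
    print(f"t={t:5d}  d=1: E[V_t] ~ {mean_visits(1, t):6.1f}   d=3: E[V_t] ~ {mean_visits(3, t):5.2f}")
```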

Random Walks (Ariel Yadin)

Lecture 2: Markov Chains

2.1. Preliminaries

Graphs. We will make use of the structure known as a graph.

Notation: For a set S we use \binom{S}{k} to denote the set of all subsets of size k in S; e.g.

\binom{S}{2} = \{\{x, y\} : x, y ∈ S, x ≠ y\}.

Definition 2.1. A graph G is a pair G = (V(G), E(G)), where V(G) is a countable set, and E(G) ⊆ \binom{V(G)}{2}. The elements of V(G) are called vertices. The elements of E(G) are called edges.

The notation x ~_G y (sometimes just x ~ y when G is clear from the context) is used for {x, y} ∈ E(G). If x ~ y, we say that x is a neighbor of y, or that x is adjacent to y. If x ∈ e ∈ E(G) then the edge e is said to be incident to x, and x is incident to e.

The degree of a vertex x, denoted deg(x) = deg_G(x), is the number of edges incident to x in G.

Notation: Many times we will use x ∈ G instead of x ∈ V(G).

Example 2.2. The complete graph. The empty graph on n vertices. Cycles. Z, Z^2, Z^d. Regular trees.

Cayley graphs of finitely generated groups: Let G = ⟨S⟩ be a finitely generated group, with a finite generating set S such that S is symmetric (S = S^{-1}). Then, we can equip G with a graph structure C = C_{G,S} by letting V(C) = G and {g, h} ∈ E(C) iff g^{-1}h ∈ S. S being symmetric implies that this is a graph.

C_{G,S} is called the Cayley graph of G with respect to S. Examples: Z^d, regular trees, cycles, complete graphs.

Definition 2.3. Let G be a graph. A path in G is a sequence γ = (γ_0, γ_1, ..., γ_n) (with the possibility of n = ∞) such that for all j, γ_j ~ γ_{j+1}. γ_0 is the start vertex and γ_n is the end vertex (when n < ∞). The length of γ is |γ| = n. If γ is a path in G that starts at x and ends at y, we write γ : x → y.

The notion of a path on a graph gives rise to two important notions: connectivity and graph distance.

Definition 2.4. Let G be a graph. For two vertices x, y ∈ G define

dist(x, y) = dist_G(x, y) := \inf\{|γ| : γ : x → y\},

where \inf \emptyset = ∞.

Exercise 2.1. Show that dist_G defines a metric on G. (Recall that a metric is a function ρ that satisfies:
- ρ(x, y) ≥ 0, and ρ(x, y) = 0 iff x = y;
- ρ(x, y) = ρ(y, x);
- ρ(x, y) ≤ ρ(x, z) + ρ(z, y).)

Definition 2.5. Let G be a graph. We say that vertices x and y are connected if there exists a path γ : x → y of finite length; that is, if dist_G(x, y) < ∞. We denote x connected to y by x ↔ y.

The relation ↔ is an equivalence relation, so we can speak of equivalence classes. The equivalence class of a vertex x under this relation is called the connected component of x.

If a graph G has only one connected component it is called connected. That is, G is connected if for every x, y ∈ G we have that x ↔ y.

Exercise 2.2. Prove that ↔ is an equivalence relation in any graph.

In this course we will focus on connected graphs.

Notation: For a path in a graph G, or more generally, a sequence of elements from a set S, we use the following time notation: if s = (s_0, s_1, ..., s_n, ...) is a sequence in S (finite or infinite), then

s[t_1, t_2] = (s_{t_1}, s_{t_1+1}, ..., s_{t_2})

for all integers t_2 ≥ t_1 ≥ 0.

S-valued random variables. Given a countable set S, we can define the discrete topology on S. Thus, the Borel σ-algebra on S is just the complete σ-algebra 2^S. This gives rise to the notion of S-valued random variables, which is just a fancy name for functions X from a probability space into S such that for every s ∈ S the pull-back X^{-1}(s) is an event. That is,

Definition 2.6. Let (Ω, F, P) be a probability space, and let S be a countable set. An S-valued random variable is a function X : Ω → S such that for any s ∈ S, X^{-1}(s) ∈ F.

Sequences - infinite dimensional vectors. At some point, we will want to consider sequences of random variables. If X = (X_n)_n is a sequence of S-valued random variables, we can think of X as an infinite dimensional vector. What is the appropriate measurable space for such vectors?

Well, we can consider Ω = S^N, the space of all sequences in S. Next, we have a π-system of cylinder sets: given a finite sequence s_0, s_1, ..., s_m in S, the cylinder induced by these is

C = C(s_0, ..., s_m) = \{ω ∈ S^N : ω_0 = s_0, ..., ω_m = s_m\}.

The collection of all cylinder sets forms a π-system. We let F be the σ-algebra generated by this π-system.

Carathéodory and Kolmogorov extension. (Constantin Carathéodory, 1873-1950) Now suppose we have a probability measure P on (Ω, F) as above. For every n, we can consider the restriction of P to the first n coordinates; that is, we can consider Ω_n = S^n with the full σ-algebra on Ω_n, and then

P_n[\{(s_0, s_1, ..., s_{n-1})\}] := P[C(s_0, s_1, ..., s_{n-1})]

defines a probability measure on Ω_n. Note that these measures are consistent, in the sense that for any n > m,

P_m[\{(s_0, ..., s_m)\}] = P_n[\{ω ∈ S^n : ω_0 = s_0, ..., ω_m = s_m\}].

Theorems of Carathéodory and Kolmogorov tell us that if we start with a consistent family of probability measures on S^n, n = 1, 2, ..., we can find a unique extension of these whose restrictions give back these measures. In other words, the finite-dimensional marginals determine the probability measure of the sequence. (Andrey Kolmogorov, 1903-1987)

Matrices. Recall that if A, B are n × n matrices and v is an n-dimensional vector, then Av, vA are vectors defined by

(Av)_k = \sum_{j=1}^n A_{k,j} v_j  and  (vA)_k = \sum_{j=1}^n v_j A_{j,k}.

Also, AB is the matrix defined by

(AB)_{m,k} = \sum_{j=1}^n A_{m,j} B_{j,k}.

These definitions can be generalized to infinite dimensions. Also, we will view vectors as functions, and matrices as operators. For example, let C_0(N) = R^N = \{f : N → R\}. Then, any infinite matrix A is an operator on C_0(N), by defining

(Af)(k) := \sum_n A(k, n) f(n)  and  (fA)(k) := \sum_n f(n) A(n, k).

2.2. Markov Chains

A stochastic process is just a sequence of random variables. If (X_n)_n is a stochastic process, or a sequence of random variables, then we can think of the sequence (X_n)_n as an infinite dimensional random variable; consider the function f : N → R defined by f(n) = X_n. This is a different function for each ω ∈ Ω. We can view this as a random function.

Up till now we have not restricted our processes, so anything can be a stochastic process. However, in the discussion regarding random walks, we wanted the current step to depend only on the current position, regardless of the history and time. This gives rise to the following definition:

Definition 2.7. Let S be a countable set. A Markov chain on S is a sequence (X_n)_{n ≥ 0} of S-valued random variables (i.e. measurable functions X_n : Ω → S) that satisfies the following Markovian property:

For any n ≥ 0, and any s_0, s_1, ..., s_n, s_{n+1} ∈ S,

P[X_{n+1} = s_{n+1} | X_0 = s_0, ..., X_n = s_n] = P[X_{n+1} = s_{n+1} | X_n = s_n] = P[X_1 = s_{n+1} | X_0 = s_n].

That is, the probability to go from s to s' does not depend on n or on the history, but only on the current position being s and on s'. This property is known as the Markov property. (Andrey Markov, 1856-1922)

A set S as above is called the state space.

Remark 2.8. Any Markov chain is characterized by its transition matrix. Let (X_n)_n be a Markov chain on S. For x, y ∈ S define P(x, y) = P[X_{n+1} = y | X_n = x] (which is independent of n). Then, P is an S × S matrix, indexed by the elements of S. One immediately notices that for all x,

\sum_{y ∈ S} P(x, y) = 1,

and that all the entries of P are in [0, 1]. Such a matrix is called stochastic. [Each row of the matrix is a probability measure on S.]

On the other hand, suppose that P is a stochastic matrix indexed by a countable set S. Then, one can define a sequence of S-valued random variables as follows. Let X_0 = x for some fixed starting point x ∈ S. For all n ≥ 0, conditioned on X_0 = s_0, ..., X_n = s_n, define X_{n+1} as the random variable with distribution

P[X_{n+1} = y | X_n = s_n, ..., X_0 = s_0] = P(s_n, y).

One can verify that this defines a Markov chain. We will identify a stochastic matrix P with the Markov chain it defines.

Notation: We say that (X_t)_t is Markov-(µ, P) if (X_t)_t is a Markov chain with transition matrix P and starting distribution X_0 ~ µ. If we wish to stress the state space, we say that (X_t)_t is Markov-(µ, P, S). Sometimes we omit the starting distribution; i.e. (X_t)_t is Markov-P means that (X_t)_t is a Markov chain with transition matrix P.

Example 2.9. Consider the following state space and matrix: S = Z, P(x, y) = 0 if |x - y| ≠ 1 and P(x, y) = 1/2 if |x - y| = 1.

What if we change this to P(x, y) = 1/4 for |x - y| = 1 and P(x, x) = 1/2? What about P(x, x+1) = 3/4 and P(x, x-1) = 1/4?

Example 2.10. Consider the set Z_n := Z/nZ = {0, 1, ..., n-1}. Let P(x, y) = 1/2 for x - y ≡ ±1 (mod n).
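Remark 2.8 says that a stochastic matrix is all one needs in order to run the chain. The following sketch (illustrative only; the function name run_chain and the choice of the cycle from Example 2.10 are mine, not from the notes) samples a trajectory of Markov-(δ_x, P) from a transition matrix given as a dictionary of rows:

```python
import random

def run_chain(P, x0, steps):
    """Sample a trajectory of the Markov chain with transition matrix P.

    P maps a state to a dict {next_state: probability} whose values sum to 1.
    """
    traj = [x0]
    for _ in range(steps):
        states, probs = zip(*P[traj[-1]].items())
        traj.append(random.choices(states, weights=probs)[0])
    return traj

# Example 2.10: simple random walk on the cycle Z/5Z.
n = 5
P = {x: {(x - 1) % n: 0.5, (x + 1) % n: 0.5} for x in range(n)}
print(run_chain(P, 0, 20))
```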

Example 2.11. Let G be a graph. For x, y ∈ G define P(x, y) = \frac{1}{\deg(x)} if x ~ y, and P(x, y) = 0 if x is not adjacent to y. This Markov chain is called the simple random walk on G.

If we take 0 < α < 1 and set Q(x, x) = α, Q(x, y) = (1 - α) \frac{1}{\deg(x)} for x ~ y, and Q(x, y) = 0 for x not adjacent to y, then Q is also a stochastic matrix, and defines what is sometimes called the lazy random walk on G (with holding probability α). Note that Q = αI + (1 - α)P.

Notation: We will usually use (X_n)_n to denote the realization of Markov chains. We will also use P_x to denote the probability measure P_x = P[ · | X_0 = x]. Note that the Markov property is just the statement that

P[X_{n+1} = x | X_0 = s_0, ..., X_n = s_n] = P[X_{n+1} = x | X_n = s_n] = P_{s_n}[X_1 = x].

More generally, if µ is a probability measure on S, we write

P_µ = P[ · | X_0 ~ µ] = \sum_s µ(s) P_s.

Exercise 2.3. Let (X_n)_n be a Markov chain on state space S, with transition matrix P. Show that for any event A ∈ σ(X_0, ..., X_k),

P_µ[X_{n+k} = y | A, X_k = x] = P^n(x, y)

(provided P_µ[A, X_k = x] > 0).

Remark 2.12. For those uncomfortable with σ-algebras, an event A ∈ σ(X_0, ..., X_k) is simply an event of the form A = {(X_0, ..., X_k) ∈ B} for some subset B ⊆ S^{k+1}.

Example 2.13. Consider a bored programmer. She has a (possibly biased) coin, and two chairs, say a and b. Every minute, out of boredom, she tosses the coin. If it comes out heads, she moves to the other chair. Otherwise, she does nothing.

This can be modeled by a Markov chain on the state space {a, b}. At each time, with some probability 1 - p the programmer does not move, and with probability p she jumps to the other state. The corresponding transition matrix would be

P = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}.

What is the probability P_a[X_n = b]? For this we need to calculate P^n. A complicated way would be to analyze the eigenvalues of P...

An easier way: Let µ_n = P^n(a, ·). So µ_{n+1} = µ_n P. Consider the vector π = (1/2, 1/2). Then πP = π. Now, consider a_n = (µ_n - π)(a). Since µ_n is a probability measure, we get that µ_n(b) = 1 - µ_n(a), so

a_n = ((µ_{n-1} - π)P)(a) = (1-p)µ_{n-1}(a) + pµ_{n-1}(b) - 1/2 = (1-2p)µ_{n-1}(a) + p - 1/2 = (1-2p)(µ_{n-1} - π)(a) = (1-2p) a_{n-1}.

So a_n = (1-2p)^n a_0 = (1-2p)^n \cdot \frac{1}{2}, and

P^n(a, a) = µ_n(a) = \frac{1 + (1-2p)^n}{2}.

(This also implies that P^n(a, b) = 1 - P^n(a, a) = \frac{1 - (1-2p)^n}{2}.)

We see that P^n → \begin{pmatrix} 1 \\ 1 \end{pmatrix} π, the matrix both of whose rows are π.

The following proposition relates starting distributions, and steps of the Markov chain, to matrix and vector multiplication.

Proposition 2.14. Let (X_n)_n be a Markov chain with transition matrix P on some state space S. Let µ be some distribution on S; i.e. µ is an S-indexed vector with \sum_s µ(s) = 1. Then,

P_µ[X_n = y] = (µP^n)(y).

Specifically, taking µ = δ_x we get that P_x[X_n = y] = P^n(x, y).

Moreover, if f : S → R is any function, which can be viewed as an S-indexed vector, then

µP^n f = E_µ[f(X_n)]  and  (P^n f)(x) = E_x[f(X_n)].

Proof. This is shown by induction. For n = 0 it is the definition (P^0 = I, the identity matrix). The Markov property gives for n > 0, using induction,

P_µ[X_n = y] = \sum_{s ∈ S} P_µ[X_n = y | X_{n-1} = s] P_µ[X_{n-1} = s] = \sum_s P(s, y)(µP^{n-1})(s) = ((µP^{n-1})P)(y) = (µP^n)(y).

The second assertion also follows by conditional expectation:

E_µ[f(X_n)] = \sum_s µ(s) E[f(X_n) | X_0 = s] = \sum_s µ(s) \sum_x P[X_n = x | X_0 = s] f(x) = \sum_{s,x} µ(s) P^n(s, x) f(x) = µP^n f.

(P^n f)(x) = E_x[f(X_n)] is just the case µ = δ_x. □
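Returning to the two-chair example, the closed form P^n(a, a) = (1 + (1-2p)^n)/2 is easy to check numerically. A minimal sketch (p and the values of n are arbitrary test choices, not from the notes):

```python
import numpy as np

p = 0.3
P = np.array([[1 - p, p],
              [p, 1 - p]])

for n in (1, 5, 20):
    Pn = np.linalg.matrix_power(P, n)       # matrix power P^n
    closed_form = (1 + (1 - 2 * p) ** n) / 2
    print(f"n={n:2d}  P^n(a,a) = {Pn[0, 0]:.6f}  formula = {closed_form:.6f}")
```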

2.3. Classification of Markov chains

When we spoke about graphs, we had the notion of connectivity. We are now interested in generalizing this notion to Markov chains. We want to say that a state x is connected to a state y if there is a way to get from x to y; note that for general Markov chains this does not necessarily imply that one can get from y to x.

Definition 2.15. Let P be the transition matrix of a Markov chain on S. P is called irreducible if for every pair of states x, y ∈ S there exists t > 0 such that P^t(x, y) > 0.

This means that for every pair, there is a large enough time such that with positive probability the chain can go from one of the pair to the other in that time.

Example 2.16. Consider the cycle Z/nZ, for n even. This is an irreducible chain, since for any x, y we have, for t = dist(x, y) and γ a path of length t from x to y,

P^t(x, y) ≥ P_x[(X_0, ..., X_t) = γ] = 2^{-t} > 0.

Note that at each step, the Markov chain moves from the current position to +1 or -1 (mod n). Thus, since n is even, at even times the chain must be at even vertices, and at odd times the chain must be at odd vertices. Thus, it is not true that there exists t > 0 such that for all x, y, P^t(x, y) > 0.

The main reason for this is that the chain has a period: at even times it is on some set, and at odd times on a different set. Similarly, the chain cannot be back at its starting point at odd times, only at even times.

Definition 2.17. Let P be a Markov chain on S. A state x is called periodic if

gcd\{t ≥ 1 : P^t(x, x) > 0\} > 1,

and this gcd is called the period of x. If gcd\{t ≥ 1 : P^t(x, x) > 0\} = 1 then x is called aperiodic.

P is called aperiodic if all x ∈ S are aperiodic. Otherwise P is called periodic.

Note that in the even-length cycle example, gcd\{t ≥ 1 : P^t(x, x) > 0\} = gcd\{2, 4, 6, ...\} = 2.

Remark 2.18. If P is periodic, then there is an easy way to modify P to become aperiodic: namely, let Q = αI + (1 - α)P be a lazy version of P. Then, Q(x, x) ≥ α for all x, and thus Q is aperiodic.

Proposition 2.19. Let P be a Markov chain on state space S.

- x is aperiodic if and only if there exists t(x) such that for all t > t(x), P^t(x, x) > 0.
- If P is irreducible, then P is aperiodic if and only if there exists an aperiodic state x.
- Consequently, if P is irreducible and aperiodic, and if S is finite, then there exists t_0 such that for all t > t_0, all x, y admit P^t(x, y) > 0.

Proof. We start with the first assertion. Assume that x is aperiodic. Let R = \{t ≥ 1 : P^t(x, x) > 0\}. Since P^{t+s}(x, x) ≥ P^t(x, x) P^s(x, x), we get that t, s ∈ R implies t + s ∈ R; i.e. R is closed under addition. A number theoretic result tells us that since gcd R = 1 it must be that R^c is finite. The other direction is simpler: if R^c is finite, then R contains primes p ≠ q, so gcd R ≤ gcd(p, q) = 1.

For the second assertion, if P is irreducible and x is aperiodic, then let t(x) be such that for all t > t(x), P^t(x, x) > 0. For any z, y let t(z, y) be such that P^{t(z,y)}(z, y) > 0 (which exists by irreducibility). Then, for any t > t(y, x) + t(x) + t(x, y) we get that

P^t(y, y) ≥ P^{t(y,x)}(y, x) \cdot P^{t - t(y,x) - t(x,y)}(x, x) \cdot P^{t(x,y)}(x, y) > 0.

So for all large enough t, P^t(y, y) > 0, which implies that y is aperiodic. This holds for all y, so P is aperiodic. The other direction is trivial from the definition.

For the third assertion, for any z, y let t(z, y) be such that P^{t(z,y)}(z, y) > 0. Let T = \max_{z,y} t(z, y). Let x be an aperiodic state and let t(x) be such that for all t > t(x), P^t(x, x) > 0. We get that for any t > 2T + t(x) we have that t - t(z, x) - t(x, y) ≥ t - 2T > t(x), so

P^t(z, y) ≥ P^{t(z,x)}(z, x) \cdot P^{t - t(z,x) - t(x,y)}(x, x) \cdot P^{t(x,y)}(x, y) > 0. □

Exercise 2.4. Let G be a finite connected graph, and let Q be the lazy random walk on G with holding probability α; i.e. Q = αI + (1 - α)P, where P(x, y) = \frac{1}{\deg(x)} if x ~ y and P(x, y) = 0 if x is not adjacent to y.

- Show that Q is aperiodic.
- Show that for diam(G) = \max\{dist(x, y) : x, y ∈ G\} we have that for all t > diam(G), all x, y ∈ G admit Q^t(x, y) > 0.
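As a small illustration of periods and laziness (the cutoff tmax and the function name period below are illustrative choices, not from the notes), the following sketch computes gcd{1 ≤ t ≤ tmax : P^t(x, x) > 0} for the cycle Z/4Z and for its lazy version:

```python
import numpy as np
from math import gcd
from functools import reduce

def period(P, x, tmax=30):
    """gcd of the return times {1 <= t <= tmax : P^t(x,x) > 0}."""
    times = []
    Pt = np.eye(len(P))
    for t in range(1, tmax + 1):
        Pt = Pt @ P
        if Pt[x, x] > 1e-12:
            times.append(t)
    return reduce(gcd, times)

n = 4
P = np.zeros((n, n))
for x in range(n):
    P[x, (x - 1) % n] = P[x, (x + 1) % n] = 0.5

Q = 0.5 * np.eye(n) + 0.5 * P          # lazy version with holding probability 1/2
print("period of the cycle:", period(P, 0))        # expected: 2
print("period of the lazy cycle:", period(Q, 0))   # expected: 1
```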

Number of exercises in lecture: 4
Total number of exercises until here: 4

Random Walks (Ariel Yadin)

Lecture 3: Recurrence and Transience

3.1. Recurrence and Transience

Notation: If (X_t)_t is Markov-P on state space S, we can define the following: for A ⊆ S,

T_A = \inf\{t ≥ 0 : X_t ∈ A\}  and  T_A^+ = \inf\{t ≥ 1 : X_t ∈ A\}.

These are the hitting time of A and the return time to A. (We use the convention that \inf \emptyset = ∞.) If A = \{x\} we write T_x = T_{\{x\}} and similarly T_x^+ = T_{\{x\}}^+.

Recall that we saw that the simple random walk on Z a.s. returns to the origin. We also stated that on Z^3 this is not true, and the simple random walk will never return to the origin with positive probability. Let us classify Markov chains according to these properties.

Definition 3.1. Let P be a Markov chain on S. Consider a state x ∈ S.

- If P_x[T_x^+ = ∞] > 0, we say that x is a transient state.
- If P_x[T_x^+ < ∞] = 1, we say that x is recurrent.

For a recurrent state x, there are two options:

- If E_x[T_x^+] < ∞ we say that x is positive recurrent.
- If E_x[T_x^+] = ∞ we say that x is null recurrent.

Our first goal will be to prove the following theorem.

Theorem 3.2. Let (X_t)_t be a Markov chain on S with transition matrix P. If P is irreducible, then for any x, y ∈ S, x is (positive, null) recurrent if and only if y is (positive, null) recurrent.

That is, for irreducible chains, all the states have the same classification.

3.2. Stopping Times

A word about σ-algebras: Recall that the canonical σ-algebra we take on the space S^N is the σ-algebra generated by the cylinder sets. A cylinder set is a set of the form

\{ω ∈ S^N : ω_0 = x_0, ..., ω_t = x_t\}

for some t ≥ 0. A set A ⊆ S^N is called a t-cylinder set if there exist x_0, ..., x_t ∈ S such that for every ω ∈ A we have ω_j = x_j for all j = 0, ..., t. Recall the σ-algebra

σ(X_0, ..., X_t) = σ(X_j^{-1}(x) : x ∈ S, j = 0, ..., t) = σ(A : A is a j-cylinder set for some j ≤ t).

Exercise 3.1. Define an equivalence relation on S^N by ω ∼_t ω' if ω_j = ω'_j for all j = 0, 1, ..., t.

- Show that this is indeed an equivalence relation.
- We say that an event A respects ∼_t if for any equivalent ω ∼_t ω' we have that ω ∈ A if and only if ω' ∈ A. Show that σ(X_0, X_1, ..., X_t) = \{A : A respects ∼_t\}.

The hitting and return times above have the property that their value can be determined by the history of the chain; that is, the event \{T_A ≤ t\} is determined by (X_0, X_1, ..., X_t).

Definition 3.3 (Stopping Time). Consider a Markov chain on S. Recall that the probability space is (S^N, F, P), where F is the σ-algebra generated by the cylinder sets.

A random variable T : S^N → N ∪ \{∞\} is called a stopping time if for all t ≥ 0, the event \{T ≤ t\} ∈ σ(X_0, ..., X_t).

Example 3.4. Any hitting time and return time is a stopping time. Indeed,

\{T_A ≤ t\} = \bigcup_{j=0}^t \{X_j ∈ A\}.

Similarly for T_A^+.

Example 3.5. Consider the simple random walk on Z^3. Let T = \sup\{t : X_t = 0\}. This is the last time the walk is at 0. One can show that T is a.s. finite. However, T is not a stopping time,

since, for example,

\{T = 0\} = \{∀ t > 0, X_t ≠ 0\} = \bigcap_{t=1}^{\infty} \{X_t ≠ 0\} ∉ σ(X_0).

Example 3.6. Let (X_t)_t be a Markov chain and let T = \inf\{t ≥ T_A : X_t ∈ A'\}, where A, A' ⊆ S. Then T is a stopping time, since

\{T ≤ t\} = \bigcup_{k=0}^t \bigcup_{m=0}^k \{X_m ∈ A, X_k ∈ A'\}.

Proposition 3.7. Let T, T' be stopping times. The following holds:

- Any constant t ∈ N is a stopping time.
- T ∧ T' and T ∨ T' are stopping times.
- T + T' is a stopping time.

Proof. Since \{t ≤ k\} ∈ \{∅, Ω\}, the trivial σ-algebra, we get that \{t ≤ k\} ∈ σ(X_0, ..., X_k) for any k. So constants are stopping times.

For the minimum: \{T ∧ T' ≤ t\} = \{T ≤ t\} ∪ \{T' ≤ t\} ∈ σ(X_0, ..., X_t). The maximum is similar: \{T ∨ T' ≤ t\} = \{T ≤ t\} ∩ \{T' ≤ t\} ∈ σ(X_0, ..., X_t).

For the addition,

\{T + T' ≤ t\} = \bigcup_{k=0}^t \{T = k, T' ≤ t - k\}.

Since \{T = k\} = \{T ≤ k\} \setminus \{T ≤ k-1\} ∈ σ(X_0, ..., X_k), we get that T + T' is a stopping time. □

Conditioning on a stopping time. Stopping times are extremely important in the theory of martingales, a subject we will come back to in the future. For the moment, the important property we want is the Strong Markov Property.

For a fixed time t, we saw that the process (X_{t+n})_n is a Markov chain with starting distribution X_t, independent of σ(X_0, ..., X_t). We want to do the same thing for stopping times.

Let T be a stopping time. The information captured by X_0, ..., X_T is the σ-algebra σ(X_0, ..., X_T). This is defined to be the collection of all events A such that for all t, A ∩ \{T ≤ t\} ∈ σ(X_0, ..., X_t). That is,

σ(X_0, ..., X_T) = \{A : A ∩ \{T ≤ t\} ∈ σ(X_0, ..., X_t) for all t\}.

One can check that this is indeed a σ-algebra.

Exercise 3.2. Show that σ(X_0, ..., X_T) is a σ-algebra.

Important examples are:

- For any t, \{T ≤ t\} ∈ σ(X_0, ..., X_T). Thus, T is measurable with respect to σ(X_0, ..., X_T).
- X_T is measurable with respect to σ(X_0, ..., X_T) (indeed \{X_T = x\} ∩ \{T ≤ t\} ∈ σ(X_0, ..., X_t) for all t and x).

Proposition 3.8 (Strong Markov Property). Let (X_t)_t be Markov-P on S, and let T be a stopping time. For all t ≥ 0, define Y_t = X_{T+t}. Then, conditioned on T < ∞ and X_T, the sequence (Y_t)_t is independent of σ(X_0, ..., X_T) and is Markov-(δ_{X_T}, P).

Proof. The (regular) Markov property tells us that for any m > k, and any event A ∈ σ(X_0, ..., X_k),

P[X_m = y, A, X_k = x] = P^{m-k}(x, y) P[A, X_k = x].

We need to show that for all t, and any A ∈ σ(X_0, ..., X_T),

P[X_{T+t+1} = y | X_{T+t} = x, A, T < ∞] = P(x, y)

(provided of course that P[X_{T+t} = x, A, T < ∞] > 0). Indeed this follows from the fact that A ∩ \{T = k\} ∈ σ(X_0, ..., X_k) ⊆ σ(X_0, ..., X_{k+t}) for all k, so

P[X_{T+t+1} = y, A, X_{T+t} = x, T < ∞] = \sum_{k=0}^{\infty} P[X_{k+t+1} = y, X_{k+t} = x, A, T = k]
= \sum_{k=0}^{\infty} P(x, y) P[X_{k+t} = x, A, T = k] = P(x, y) P[X_{T+t} = x, A, T < ∞]. □

Another way to state the above proposition is that for a stopping time T, conditional on T < ∞, we can restart the Markov chain from X_T.

3.3. Excursion Decomposition

We now use the strong Markov property to prove the following.

Example 3.9. Let P be an irreducible Markov chain on S. Fix x ∈ S. Define inductively the following stopping times: T_x^{(0)} = 0, and

T_x^{(k)} = \inf\{t ≥ T_x^{(k-1)} + 1 : X_t = x\}.

So T_x^{(k)} is the time of the k-th return to x.

Let V_t(x) be the number of visits to x up to time t; i.e. V_t(x) = \sum_{k=1}^t 1_{\{X_k = x\}}. It is immediate that V_t(x) ≥ k if and only if T_x^{(k)} ≤ t.

Now let us look at the excursions to x: the k-th excursion is

X[T_x^{(k-1)}, T_x^{(k)}] = (X_{T_x^{(k-1)}}, X_{T_x^{(k-1)}+1}, ..., X_{T_x^{(k)}}).

These excursions are paths of the Markov chain ending at x and starting at x (except, possibly, the first excursion, which starts at X_0). For k > 0 define

τ_x^{(k)} = T_x^{(k)} - T_x^{(k-1)}

if T_x^{(k)} < ∞, and 0 otherwise. For T_x^{(k)} < ∞, this is the length of the k-th excursion.

We claim that, conditioned on T_x^{(k-1)} < ∞, the excursion X[T_x^{(k-1)}, T_x^{(k)}] is independent of σ(X_0, ..., X_{T_x^{(k-1)}}), and has the distribution of the first excursion X[0, T_x^+] conditioned on X_0 = x. Indeed, let Y_t = X_{T_x^{(k-1)}+t}. For any A ∈ σ(X_0, ..., X_{T_x^{(k-1)}}), and for any path γ : x → x, since X_{T_x^{(k-1)}} = x,

P[Y[0, τ_x^{(k)}] = γ | A, T_x^{(k-1)} < ∞] = P[X[T_x^{(k-1)}, T_x^{(k)}] = γ | A, T_x^{(k-1)} < ∞] = P_x[X[0, T_x^+] = γ],

where we have used the strong Markov property.

This gives rise to the following relation.

Lemma 3.10. Let P be an irreducible Markov chain on S. Then,

(P_x[T_x^+ < ∞])^k = P_x[V_∞(x) ≥ k] = P_x[T_x^{(k)} < ∞].

Consequently,

1 + E_x[V_∞(x)] = \frac{1}{P_x[T_x^+ = ∞]},

where 1/0 = ∞.

Proof. The event \{V_∞(x) ≥ k\} is the event that x is visited at least k times, which is exactly the event that the k-th excursion ends at some finite time. From the example above we have that for any m,

P[T_x^{(m)} < ∞ | T_x^{(m-1)} < ∞] = P[∃ t ≥ 1 : X_{T_x^{(m-1)}+t} = x | T_x^{(m-1)} < ∞] = P_x[T_x^+ < ∞].

Since \{T_x^{(m)} < ∞\} = \{T_x^{(m)} < ∞, T_x^{(m-1)} < ∞\}, we can inductively conclude that

P_x[T_x^{(k)} < ∞] = P_x[T_x^{(k)} < ∞ | T_x^{(k-1)} < ∞] \cdot P_x[T_x^{(k-1)} < ∞] = \cdots = (P_x[T_x^+ < ∞])^k.

The second assertion follows from the fact that

1 + E_x[V_∞(x)] = 1 + \sum_{k=1}^{\infty} P_x[V_∞(x) ≥ k] = \sum_{k=0}^{\infty} (P_x[T_x^+ < ∞])^k = \frac{1}{P_x[T_x^+ = ∞]},

where this holds even if P_x[T_x^+ < ∞] = 1. □

Similarly, one can prove:

Exercise 3.3. Let (X_t)_t be Markov-(S, P) for some irreducible P. Let Z ⊆ S. Show that under P_x, the number of visits to x until hitting Z (i.e. the random variable V = V_{T_Z}(x) + 1_{\{X_0 = x\}}) is distributed Geo(p), for p = P_x[T_Z < T_x^+].

We now get the following important characterization of recurrence in Markov chains.

Corollary 3.11. Let P be an irreducible Markov chain on S. Then the following are equivalent:

(1) x is recurrent.
(2) P_x[V_∞(x) = ∞] = 1.
(3) For any state y, P_x[T_y^+ < ∞] = 1.
(4) E_x[V_∞(x)] = ∞.

Proof. If x is recurrent, then P_x[T_x^+ < ∞] = 1. So for any k, P_x[V_∞(x) ≥ k] = 1. Taking k to infinity, we get that P_x[V_∞(x) = ∞] = 1. This is the first implication.

For the second implication: Let y ∈ S. Let E_k = X[T_x^{(k-1)}, T_x^{(k)}] be the k-th excursion from x. We assumed that P_x[∀ k, T_x^{(k)} < ∞] = 1. So under P_x, all the (E_k)_k are independent and identically distributed.

Since P is irreducible, there exists t > 0 such that P_x[X_t = y, t < T_x^+] > 0 (this is an exercise). Thus, we have that

p := P_x[T_y < T_x^+] ≥ P_x[X_t = y, t < T_x^+] > 0.

This implies, by the strong Markov property, that

P_x[T_y < T_x^{(k+1)} | T_y > T_x^{(k)}, T_x^{(k)} < ∞] ≥ p > 0.

So, using the fact that P_x[∀ k, T_x^{(k)} < ∞] = 1,

P_x[T_y ≥ T_x^{(k)}] = P_x[T_y ≥ T_x^{(k)} | T_y > T_x^{(k-1)}, T_x^{(k-1)} < ∞] \cdot P_x[T_y > T_x^{(k-1)}] ≤ (1-p) P_x[T_y ≥ T_x^{(k-1)}] ≤ (1-p)^k.

Thus,

P_x[T_y^+ = ∞] ≤ P_x[∀ k, T_y ≥ T_x^{(k-1)}] = \lim_k (1-p)^k = 0.

This proves the second implication.

Finally, if for any y we have P_x[T_y^+ < ∞] = 1, then taking y = x shows that x is recurrent. This shows that (1), (2), (3) are equivalent.

It is obvious that (2) implies (4). Since P_x[T_x^+ = ∞] = \frac{1}{E_x[V_∞(x)] + 1}, we get that (4) implies (1). □

Exercise 3.4. Show that if P is irreducible, there exists t > 0 such that P_x[X_t = y, t < T_x^+] > 0.

Solution to Exercise 3.4. There exists n such that P^n(x, y) > 0 (because P is irreducible). Thus, there is a sequence x = x_0, x_1, ..., x_n = y such that P(x_j, x_{j+1}) > 0 for all 0 ≤ j < n. Let m = \max\{0 ≤ j < n : x_j = x\}, and let t = n - m and y_j := x_{m+j} for 0 ≤ j ≤ t. Then, we have the sequence x = y_0, ..., y_t = y such that y_j ≠ x for all 0 < j ≤ t, and we know that P(y_j, y_{j+1}) > 0 for all 0 ≤ j < t. Thus,

P_x[X_t = y, t < T_x^+] ≥ P_x[∀ 0 ≤ j ≤ t, X_j = y_j] = P(y_0, y_1) \cdots P(y_{t-1}, y_t) > 0.

Example 3.12. A gambler plays a fair game. Each round she wins a dollar with probability 1/2, and loses a dollar with probability 1/2, all rounds independent. What is the probability that she never goes bankrupt, if she starts with N dollars?

We have already seen that this defines a simple random walk on Z, and that E_0[V_t(0)] ≥ c√t. Thus, taking t → ∞ we get that E_0[V_∞(0)] = ∞, and so 0 is recurrent.

Note that 0 here was not special, since all vertices look the same. This symmetry implies that P_x[T_x^+ < ∞] = 1 for all x ∈ Z. Thus, for any N, P_N[T_0 = ∞] = 0 (using Corollary 3.11). That is, no matter how much money the gambler starts with, she will always go bankrupt eventually.

We now have part of Theorem 3.2.

Corollary 3.13. Let P be an irreducible Markov chain on S. Then, for any x, y ∈ S, x is transient if and only if y is transient.

Proof. As usual, by irreducibility, for any pair of states z, w we can find t(z, w) > 0 such that P^{t(z,w)}(z, w) > 0. Fix x, y ∈ S and suppose that x is transient. For any t > 0,

P^{t + t(x,y) + t(y,x)}(x, x) ≥ P^{t(x,y)}(x, y) \cdot P^t(y, y) \cdot P^{t(y,x)}(y, x).

Thus,

E_y[V_∞(y)] = \sum_{t=1}^{\infty} P^t(y, y) ≤ \frac{1}{P^{t(x,y)}(x, y) P^{t(y,x)}(y, x)} \sum_{t=1}^{\infty} P^{t + t(x,y) + t(y,x)}(x, x) < ∞.

So y is transient as well. □

Number of exercises in lecture: 4
Total number of exercises until here: 8

Random Walks (Ariel Yadin)

Lecture 4: Stationary Distributions

4.1. Stationary Distributions

Suppose that P is a Markov chain on state space S such that for some starting distribution µ, we have that P_µ[X_n = x] → π(x), where π is some limiting distribution. One immediately checks that in this case we must have

(πP)(x) = \sum_s \lim_n P_µ[X_n = s] P(s, x) = \lim_n P_µ[X_{n+1} = x] = π(x),

or πP = π. (That is, π is a left eigenvector for P with eigenvalue 1.)

Definition 4.1. Let P be a Markov chain. If π is a distribution satisfying πP = π, then π is called a stationary distribution.

Example 4.2. Recall the two-state chain

P = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}.

We saw that P^n → \begin{pmatrix} 1 \\ 1 \end{pmatrix} π with π = (1/2, 1/2). Indeed, it is simple to check that π = (1/2, 1/2) is a stationary distribution in this case.

Example 4.3. Consider a finite graph G. Let P be the transition matrix of a simple random walk on G. So P(x, y) = \frac{1}{\deg(x)} 1_{\{x ~ y\}}, or: \deg(x) P(x, y) = 1_{\{x ~ y\}}. Thus,

\sum_x \deg(x) P(x, y) = \deg(y).

So deg is a left eigenvector for P with eigenvalue 1. Since

\sum_x \deg(x) = \sum_x \sum_y 1_{\{x ~ y\}} = 2 \sum_{e ∈ E(G)} 1 = 2|E(G)|,

we normalize π(x) = \frac{\deg(x)}{2|E(G)|} to get a stationary distribution for P.

The above stationary distribution has a special property, known as the detailed balance equations: a distribution π is said to satisfy the detailed balance equations with respect to a transition matrix P if for all states x, y,

π(x) P(x, y) = π(y) P(y, x).
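Example 4.3 and the detailed balance equations can be verified mechanically on any small graph. A minimal sketch (not part of the notes; the particular graph is an arbitrary choice):

```python
import numpy as np

# A small graph given by its edge set (illustrative choice).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n = 4
A = np.zeros((n, n))
for x, y in edges:
    A[x, y] = A[y, x] = 1

deg = A.sum(axis=1)
P = A / deg[:, None]            # simple random walk transition matrix
pi = deg / deg.sum()            # candidate stationary distribution deg(x)/2|E|

print("pi P =", pi @ P)
print("pi   =", pi)
# Detailed balance: pi(x) P(x,y) == pi(y) P(y,x) for all x, y.
M = pi[:, None] * P
print("detailed balance holds:", np.allclose(M, M.T))
```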

Exercise 4.1. If π satisfies the detailed balance equations, then π is a stationary distribution.

We will come back to such distributions in the future.

4.2. Stationary Distributions and Hitting Times

There is a deep connection between stationary distributions and return times. The main result here is:

Theorem 4.4. Let P be an irreducible Markov chain on state space S. Then, the following are equivalent:

- P has a stationary distribution π.
- Every x is positive recurrent.
- Some x is positive recurrent.
- P has a unique stationary distribution, π(x) = \frac{1}{E_x[T_x^+]}.

The proof of this theorem goes through a few lemmas.

In the next lemma we will consider a function (vector) v : S → [0, ∞]. Although it may take the value ∞, since we are only dealing with non-negative numbers we can write vP(x) = \sum_y v(y) P(y, x) without confusion (with the convention that 0 · ∞ = 0).

Lemma 4.5. Let P be an irreducible Markov chain on state space S. Let v : S → [0, ∞] be such that vP = v. Then:

- If there exists a state x such that v(x) < ∞, then v(y) < ∞ for all states y.
- If v is not the zero vector, then v(y) > 0 for all states y.

Note that this implies that if π is a stationary distribution then all the entries of π are strictly positive.

Proof. For any t, using the fact that v ≥ 0,

v(x) = \sum_z v(z) P^t(z, x) ≥ v(y) P^t(y, x).

Thus, for a suitable choice of t, since P is irreducible, we know that P^t(y, x) > 0, and so

v(y) ≤ \frac{v(x)}{P^t(y, x)} < ∞.

For the second assertion, if v is not the zero vector, since it is non-negative, there exists a state x such that v(x) > 0. Thus, for any state y and for t such that P^t(x, y) > 0, we get

v(y) = \sum_z v(z) P^t(z, y) ≥ v(x) P^t(x, y) > 0. □

Notation: Recall that for a Markov chain (X_t)_t we denote by V_t(x) = \sum_{k=1}^t 1_{\{X_k = x\}} the number of visits to x.

Lemma 4.6. Let (X_t)_t be Markov-(P, µ) for irreducible P. Assume T is a stopping time such that

P_µ[X_T = x] = µ(x) for all x.

Assume further that 1 ≤ T < ∞, P_µ-a.s. Let v(x) = E_µ[V_T(x)]. Then, vP = v.

Moreover, if E_µ[T] < ∞, then P has a stationary distribution π(x) = \frac{v(x)}{E_µ[T]}.

Proof. The assumptions on T give that for any x,

µ(x) = P_µ[X_T = x] = \sum_{j=0}^{\infty} P_µ[X_j = x, T = j].

Thus we have that

\sum_{j=0}^{\infty} P_µ[X_j = y, T > j] = P_µ[X_0 = y] + \sum_{j=1}^{\infty} P_µ[X_j = y, T > j]
= \sum_{j=1}^{\infty} P_µ[X_j = y, T = j] + \sum_{j=1}^{\infty} P_µ[X_j = y, T > j]
= \sum_{j=1}^{\infty} P_µ[X_j = y, T ≥ j] = v(y).

(Here we used that T ≥ 1 a.s., so the j = 0 term in the sum for µ(y) vanishes.) So

v(x) = \sum_{j=1}^{\infty} P_µ[X_j = x, T ≥ j] = \sum_{j=0}^{\infty} P_µ[X_{j+1} = x, T > j] = \sum_{j=0}^{\infty} \sum_y P_µ[X_{j+1} = x, X_j = y, T > j]
= \sum_y \sum_{j=0}^{\infty} P_µ[X_j = y, T > j] P(y, x) = (vP)(x).

29 29 That is, vp = v. Since v(x) = E µ [ x x V T (x)] = E µ [T ], if E µ [T ] <, then π(x) = v(x) E µ[t ] defines a stationary distribution. Example 4.7. Consider (X t ) t that is Markov-P for an irreducible P, and let v(y) = E x [V T + x (y)]. If x is recurrent, then P x -a.s. we have 1 T x + <, and P x [X T + x = y] = 1 {y=x} = P x [X 0 = y]. So we conclude that vp = v. Since P x -a.s. V T + x (x) = 1, we have that 0 < v(x) = 1 <, so 0 < v(y) < for all y. Note that although it may be that E x [T x + ] =, i.e. x is null-recurrent, we still have that for any y, E x [V T + x (y)] <, i.e. the expected number of visits to y until returning to x is finite. If x is positive recurrent, then π(y) = Ex[V T x + (y)] is a stationary distribution for P. E x[t + x ] This vector plays a special role, as in the next Lemma. Lemma 4.8. Let P be an irreducible Markov chain. Let u(y) = E x [V T + x (y)]. Let v 0 be a non-negative vector such that vp = v, and v(x) = 1. Then, v u. Moreover, if x is recurrent, then v = u. Proof. If y = x then v(x) = 1 u(x), so we can assume that y x. We will prove by induction that for all t, for any y x, (4.1) Indeed, for t = 1 this is just t P x [X k = y, T x + k=1 k] v(y). P x [X 1 = y, T + x 1] = P (x, y) z v(z)p (z, y) = v(y), since v 0, v(x) = 1 and y x. For general t > 0, we rely on the fact that by the Markov property, for any y x, P x [X k+1 = y, T + x k+1] = z x P x [X k+1 = y, X k = z, T + x k] = z x P x [X k = z, T + x k]p (z, y).

So by induction,

\sum_{k=1}^{t+1} P_x[X_k = y, T_x^+ ≥ k] = P(x, y) + \sum_{k=1}^t P_x[X_{k+1} = y, T_x^+ ≥ k+1]
= P(x, y) + \sum_{z ≠ x} P(z, y) \sum_{k=1}^t P_x[X_k = z, T_x^+ ≥ k]
≤ P(x, y) + \sum_{z ≠ x} P(z, y) v(z) = \sum_z v(z) P(z, y) = v(y).

This completes the proof of (4.1) by induction.

Now, one notes that the left-hand side of (4.1) is just the expected number of visits to y, starting at x, up to time T_x^+ ∧ t. Taking t → ∞, using monotone convergence,

v(y) ≥ \sum_{k=1}^{\infty} P_x[X_k = y, T_x^+ ≥ k] = \lim_t E_x[V_{T_x^+ ∧ t}(y)] = u(y).

This proves that v ≥ u.

Now suppose x is recurrent. Then we have seen that uP = u, and u(x) = 1 = v(x). We have seen that v - u ≥ 0, and of course (v - u)P = v - u. Until now we have not actually used irreducibility; we will use it to show that v - u ≡ 0. Indeed, let y be any state. If v(y) > u(y), then v - u is a non-zero non-negative left eigenvector for P, so it must be positive everywhere (by Lemma 4.5). This contradicts v(x) - u(x) = 0. So it must be that v - u ≡ 0. □

We are now in good shape to prove Theorem 4.4.

Proof of Theorem 4.4. Assume that π is a stationary distribution for P. Fix any state x. Recall that π(x) > 0. Define the vector v(z) = \frac{π(z)}{π(x)}. We have that v ≥ 0, vP = v and v(x) = 1. Hence, v(z) ≥ E_x[V_{T_x^+}(z)] for all z. That is,

E_x[T_x^+] = \sum_y E_x[V_{T_x^+}(y)] ≤ \sum_y v(y) = \sum_y \frac{π(y)}{π(x)} = \frac{1}{π(x)} < ∞.

So x is positive recurrent. This holds for a generic x. The second bullet of course implies the third.

Now assume some state x is positive recurrent. Let v(y) = E_x[V_{T_x^+}(y)]. Since x is recurrent, we know that vP = v, and \sum_y v(y) = E_x[T_x^+] < ∞. So π = \frac{v}{E_x[T_x^+]} is a stationary distribution for P.

Since P has a stationary distribution, by the first implication all states are positive recurrent. Thus, for any state z, if v = \frac{π}{π(z)} then vP = v and v(z) = 1. So, z being recurrent, we get that

v(y) = E_z[V_{T_z^+}(y)] for all y. Specifically,

E_z[T_z^+] = \sum_y v(y) = \frac{1}{π(z)},

which holds for all states z.

For the final implication, if P has a unique stationary distribution of the above form, then of course it has a stationary distribution. □

Corollary 4.9 (Stationary distributions are unique). If an irreducible Markov chain P has two stationary distributions π and π', then π = π'.

Exercise 4.2. Let P be an irreducible Markov chain. Show that for positive recurrent states x, y,

E_x[V_{T_x^+}(y)] \cdot E_y[V_{T_y^+}(x)] = 1.

4.3. Transience, positive or null recurrence are properties of the chain

We have also now shown:

Theorem 3.2 (restated). Let P be an irreducible Markov chain. For any two states x, y: x is transient / null recurrent / positive recurrent if and only if y is transient / null recurrent / positive recurrent.

Proof. We have seen, using the identity

P_x[T_x^+ = ∞] = \frac{1}{1 + E_x[V_∞(x)]},

that x is transient if and only if y is transient (Corollary 3.13). Now, if x is positive recurrent, then P has a stationary distribution, so all states, including y, are positive recurrent. □

In light of this:

Definition 4.10. Let P be an irreducible Markov chain. We say that P is transient if there exists a transient state. P is null recurrent if there exists a null recurrent state. P is positive recurrent if there exists a positive recurrent state.

Number of exercises in lecture: 2
Total number of exercises until here: 10

Random Walks (Ariel Yadin)

Lecture 5: Positive Recurrent Chains

5.1. Simple Random Walks

Last lecture we proved that an irreducible Markov chain P has a stationary distribution if and only if P is positive recurrent, and that the stationary distribution is the reciprocal of the expected return time.

Let us investigate what this means in the setting of a simple random walk on a graph.

Example 5.1. Let G be a graph, and let P be the simple random walk on G; that is, P(x, y) = \frac{1}{\deg(x)} 1_{\{x ~ y\}}.

First, it is immediate that P is irreducible; this was shown in the exercises.

Consider the vector v(x) = \deg(x). We have that

\sum_x \deg(x) P(x, y) = \sum_x \deg(x) \cdot \frac{1}{\deg(x)} 1_{\{x ~ y\}} = \deg(y).

That is, vP = v. If we take u(y) = v(y)/v(x) for some x, then uP = u and u(x) = 1. Thus, if P is recurrent, then

E_x[V_{T_x^+}(y)] = u(y) = \frac{\deg(y)}{\deg(x)}

for all x, y. This does not depend on dist(x, y)!

Another observation is that \sum_x v(x) = 2|E(G)|. That is, P is positive recurrent if and only if G is finite. Moreover, in this case, the stationary distribution for P is π(x) = \frac{\deg(x)}{2|E(G)|}.

Note that if G is a finite regular graph then the stationary distribution on G is the uniform distribution.

Example 5.2. Recall the simple random walk on Z. We have already seen that this is a recurrent Markov chain. Thus, if vP = v, then v(y) = E_x[V_{T_x^+}(y)] \cdot v(x) for all x, y. Since the constant vector 1 satisfies 1P = 1, we get that E_x[V_{T_x^+}(y)] = 1 for all x, y. Thus, any v such that vP = v must satisfy v ≡ c for some constant c. So there is no stationary distribution on Z; that is, the simple random walk on Z is null recurrent. (We could have also deduced this from the previous example.)

Example 5.3. Consider a different Markov chain on Z: let P(x, x+1) = p and P(x, x-1) = 1 - p for all x. Suppose vP = v. Then, v(x) = v(x-1) p + v(x+1)(1-p), or

v(x+1) = \frac{1}{1-p} (v(x) - p v(x-1)).

Solving such recursions is simple: set u_x = \begin{pmatrix} v(x+1) \\ v(x) \end{pmatrix}. So u_{x+1} = \frac{1}{1-p} A u_x, where

A = \begin{pmatrix} 1 & -p \\ 1-p & 0 \end{pmatrix}.

Since the characteristic polynomial of A is λ^2 - λ + p(1-p) = (λ - p)(λ - (1-p)), we get that the eigenvalues of A are p and 1-p. One can easily check that A is diagonalizable, A = M D M^{-1}, and so

v(x) = u_x(2) = (1-p)^{-x} (A^x u_0)(2) = (1-p)^{-x} [0\ 1] M D^x M^{-1} u_0 = a \left(\frac{p}{1-p}\right)^x + b,

where D is diagonal with p, 1-p on the diagonal, and a, b are constants that depend on the matrix M and on u_0 (but are independent of x). Thus, \sum_x v(x) will only converge for a = 0, b = 0, which gives v = 0. That is, there is no stationary distribution, and P is not positive recurrent. In the future we will in fact see that P is transient for p ≠ 1/2, and for p = 1/2 we have already seen that P is null recurrent.

Example 5.4. A chess knight moves on a chess board; each step it chooses uniformly among the possible moves. Suppose the knight starts at the corner. What is the expected time it takes the knight to return to its starting point?

At first, this looks difficult... However, let G be the graph whose vertices are the squares of the chess board, V(G) = \{1, 2, ..., 8\}^2. Let x = (1, 1) be the starting point of the knight. For edges, we will connect two vertices if the knight can jump from one to the other in a legal move.
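Here is a sketch (not part of the notes; the helper names are mine) that builds this knight graph and computes the vertex degrees, confirming the degree sum and the expected return time worked out in the continuation of this example:

```python
# Build the knight-move graph on the 8x8 board and compute degrees.
moves = [(1, 2), (2, 1), (-1, 2), (-2, 1), (1, -2), (2, -1), (-1, -2), (-2, -1)]
squares = [(i, j) for i in range(1, 9) for j in range(1, 9)]

def degree(square):
    i, j = square
    return sum(1 for di, dj in moves if (i + di, j + dj) in squares)

total_degree = sum(degree(s) for s in squares)
print("2|E(G)| =", total_degree)                          # 336
print("deg(corner) =", degree((1, 1)))                    # 2
print("E_corner[T^+] =", total_degree / degree((1, 1)))   # 168.0
```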

Thus, for example, a vertex in the center of the board has 8 adjacent vertices. A corner, on the other hand, has 2 adjacent vertices. In fact, we can determine the degree of every vertex by counting the legal moves from each square. Summing all the degrees (by the symmetry of the board, four times the sum over one quadrant), one sees that

2|E(G)| = 4 \cdot 84 = 336.

Thus, the stationary distribution is π(i, j) = \deg(i, j)/336. Specifically, π(x) = 2/336, and so

E_x[T_x^+] = \frac{336}{2} = 168.

5.2. Summary so far

Let us sum up what we know so far about irreducible chains. If P is an irreducible Markov chain, then:

- E_x[V_∞(x)] + 1 = \frac{1}{P_x[T_x^+ = ∞]}.
- For all states x, y: x is transient if and only if y is transient.
- If P is recurrent, the vector v(z) = E_x[V_{T_x^+}(z)] is a positive left eigenvector for P, and any non-negative left eigenvector for P is proportional to v.
- P has a stationary distribution if and only if P is positive recurrent.
- If P is positive recurrent, then π(x) E_x[T_x^+] = 1.

5.3. Positive Recurrent Chains

Recall that Lemma 4.6 connects the expected number of visits to x up to an appropriate stopping time to the stationary distribution and the expected value of the stopping time:

Lemma 4.6 (restated). Let (X_t)_t be Markov-(P, µ) for irreducible P. Assume T is a stopping time such that

P_µ[X_T = x] = µ(x) for all x.

Assume further that 1 ≤ T < ∞, P_µ-a.s. Let v(x) = E_µ[V_T(x)]. Then, vP = v. Moreover, if E_µ[T] < ∞, then P has a stationary distribution π(x) = \frac{v(x)}{E_µ[T]}.

Good choices of the stopping time T for positive recurrent chains will give some nice identities.

Proposition 5.5. Let P be a positive recurrent chain with stationary distribution π. Then:

- E_x[T_x^+] = \frac{1}{π(x)}.
- E_x[V_{T_x^+}(y)] = \frac{π(y)}{π(x)}.
- For x ≠ y, 1 + E_x[V_{T_y^+}(x)] = π(x) (E_y[T_x^+] + E_x[T_y^+]).
- For x ≠ y, π(x) P_x[T_y^+ < T_x^+] (E_y[T_x^+] + E_x[T_y^+]) = 1.
- For x ≠ y,

E_x[T_y] + E_y[T_x] ≤ \frac{1}{π(x) P(x, y)}.

(This is sometimes called the edge commute inequality. It will be important in the future.)

Proof. We prove the assertions in order:

- The first follows by choosing T = T_x^+ in Lemma 4.6.
- We have already seen this. It also follows by choosing T = T_x^+ in Lemma 4.6.
- Let T = \inf\{t ≥ T_x : X_t = y\}. So E_y[T] = E_y[T_x^+] + E_x[T_y^+]. Since P_y[X_T = z] = 1_{\{z = y\}}, we can apply Lemma 4.6 (with µ = δ_y). The strong Markov property at time T_x gives that

E_y[V_T(x)] = E_y\left[\sum_{T_x ≤ k ≤ T} 1_{\{X_k = x\}}\right] = E_x[V_{T_y^+}(x)] + 1.

So by Lemma 4.6,

E_x[V_{T_y^+}(x)] = E_y[V_T(x)] - 1 = π(x) E_y[T] - 1 = π(x) (E_y[T_x^+] + E_x[T_y^+]) - 1.

- This follows from the previous bullet, since P_x-a.s. V_{T_y^+}(x) + 1 ~ Geo(p) for p = P_x[T_y^+ < T_x^+] (Exercise 3.3), so 1 + E_x[V_{T_y^+}(x)] = \frac{1}{p}.
- Since for x ≠ y we have that P_x[T_y^+ < T_x^+] ≥ P_x[X_1 = y] = P(x, y), we get the assertion from the previous bullet. □

Number of exercises in lecture: 0
Total number of exercises until here: 10

Random Walks (Ariel Yadin)

Lecture 6: Convergence to Equilibrium

6.1. Convergence to Equilibrium

Recall that we saw that if P^t(y, x) → π(x) for all x, then π must be a stationary distribution. We will now start working our way toward the converse, at least for irreducible and aperiodic chains. Our goal:

Theorem 6.5. Let (X_t)_t be an irreducible and aperiodic Markov chain. Suppose that π is a stationary distribution for this chain. Then, for any starting distribution µ, and any state x,

P_µ[X_t = x] → π(x).

6.2. Couplings

Example 6.1. Two gamblers walk into a casino in Las Vegas. The first one plays a fair game: every round she wins a dollar with probability 1/2, and loses a dollar with probability 1/2, all rounds independent. The second gambler plays an unfair game: every round he wins a dollar with probability p < 1/2, and loses a dollar with probability 1 - p, again all rounds independent.

It is extremely intuitive that the second gambler is worse off than the first one. It should be the case that the probability of the second gambler to go bankrupt is at least the probability of the first one. Also, it seems that any reasonable measure of success should be larger for the first gambler than for the second. How can we mathematically prove this?

For example, we would like to show that for all starting positions N and any M > N, we have that

P_N^1[T_0 < T_M] ≤ P_N^2[T_0 < T_M].

How can we show this? The idea is to use couplings.

Definition 6.2. A coupling of Markov chains P, Q on a state space S is a stochastic process (X_t, Y_t)_t such that (X_t)_t is Markov-P and (Y_t)_t is Markov-Q.

Note that (X_t, Y_t)_t need not be a Markov chain on S^2. If a coupling (X_t, Y_t)_t is in addition a Markov chain on S^2, then we say that (X_t, Y_t)_t is a Markovian coupling. If R is the transition matrix for the Markovian coupling (X_t, Y_t)_t, we say that R is a coupling of P, Q.

Example 6.3. Let us use a Markovian coupling to show that lowering the winning probability for a gambler lowers their chances of winning. Let p < q, and let P be the transition matrix on N for the gambler that wins with probability p, and let Q be the transition matrix for the gambler that wins with probability q. That is, P(n, n+1) = p and P(n, n-1) = 1 - p for all n > 0, and P(0, 0) = 1. Similarly for Q.

The corresponding Markov chains are (X_t)_t for P and (Y_t)_t for Q. We can couple the chains as follows: given (X_t, Y_t), since Y moves up with higher probability than X, we can organize a coupling such that Y_{t+1} ≥ X_{t+1} in any case. That is, given (X_t, Y_t), if X_t > 0 let

(X_{t+1}, Y_{t+1}) = (X_t, Y_t) + (1, 1) with probability p,
(X_{t+1}, Y_{t+1}) = (X_t, Y_t) + (-1, 1) with probability q - p,
(X_{t+1}, Y_{t+1}) = (X_t, Y_t) + (-1, -1) with probability 1 - q.

If X_t = 0, Y_t > 0 let

(X_{t+1}, Y_{t+1}) = (X_t, Y_t) + (0, 1) with probability q,
(X_{t+1}, Y_{t+1}) = (X_t, Y_t) + (0, -1) with probability 1 - q.

If X_t = Y_t = 0 let X_{t+1} = Y_{t+1} = 0.

It is immediate to check that this is indeed a coupling of P and Q, and that Y_t ≥ X_t for all t, provided that Y_0 ≥ X_0. One can check that the resulting transition matrix is

R((n, m), (n+i, m+j)) =
  p      if i = 1,  j = 1,  n, m > 0,
  q - p  if i = -1, j = 1,  n, m > 0,
  1 - q  if i = -1, j = -1, n, m > 0,
  q      if i = 0,  j = 1,  n = 0, m > 0,
  1 - q  if i = 0,  j = -1, n = 0, m > 0,
  1      if i = 0,  j = 0,  n = m = 0.

So this is a Markovian coupling.
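A short simulation of this coupling (a sketch only; the parameter values and the function name coupled_step are arbitrary choices, not from the notes) confirms the monotonicity Y_t ≥ X_t along a single trajectory:

```python
import random

def coupled_step(x, y, p, q):
    """One step of the monotone coupling of the two gamblers (requires p < q)."""
    u = random.random()
    if x == 0 and y == 0:
        return 0, 0
    if x == 0:                               # only the q-gambler still plays
        return 0, y + 1 if u < q else y - 1
    if u < p:
        return x + 1, y + 1
    elif u < q:
        return x - 1, y + 1
    else:
        return x - 1, y - 1

p, q, x, y = 0.4, 0.5, 10, 10
ok = True
for _ in range(10000):
    x, y = coupled_step(x, y, p, q)
    ok = ok and (y >= x)
print("Y_t >= X_t held throughout:", ok)
```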

Thus, for any M > N,

P_N^Q[T_0 < T_M] = P_{(N,N)}^R[∃ t : Y_t = 0 and ∀ n < t, Y_n < M]
≤ P_{(N,N)}^R[∃ t : X_t = 0 and ∀ n < t, X_n < M] = P_N^P[T_0 < T_M],

where P^P, P^Q, P^R denote the probability measures for P, Q, and R respectively, and we have used the fact that under P_{(N,N)}^R, a.s. X_t ≤ Y_t for all t.

6.2.1. Coupling Time.

Lemma 6.4. Let (X_t, Y_t)_t be a Markovian coupling of two Markov chains on the same state space S with the same transition matrix P. Define the coupling time as

τ = \inf\{t ≥ 0 : X_t = Y_t\}.

This is a stopping time for the Markov chain (X_t, Y_t)_t. Define

Z_t = X_t for t ≤ τ, and Z_t = Y_t for t ≥ τ.

Then, (Z_t)_t is a Markov chain with transition matrix P, started from Z_0 = X_0. Specifically, (Z_t, Y_t)_t is a coupling of Markov chains such that for all t ≥ τ, Z_t = Y_t.

Proof. Since \{τ ≥ t+1\} = \{τ < t+1\}^c ∈ σ((X_0, Y_0), ..., (X_t, Y_t)), the Markov property at time t gives

P[Z_{t+1} = y | Z_t = x, τ ≥ t+1, Z_{t-1}, ..., Z_0] = P[X_{t+1} = y | X_t = x, τ ≥ t+1, X_{t-1}, ..., X_0] = P(x, y).

Since τ is a stopping time, we can use the strong Markov property to deduce that for any t,

P[Z_{t+1} = y | Z_t = x, τ ≤ t, Z_{t-1}, ..., Z_0] = P[Y_{t+1} = y | Y_t = x, ..., Y_τ] = P(x, y).

Thus, for any t,

P[Z_{t+1} = y | Z_t = x, Z_{t-1}, ..., Z_0]
= P[Z_{t+1} = y, τ ≥ t+1 | Z_t = x, Z_{t-1}, ..., Z_0] + P[Z_{t+1} = y, τ ≤ t | Z_t = x, Z_{t-1}, ..., Z_0]
= P(x, y) (P[τ ≥ t+1 | Z_t = x, Z_{t-1}, ..., Z_0] + P[τ ≤ t | Z_t = x, Z_{t-1}, ..., Z_0]) = P(x, y). □

6.3. The Convergence Theorem

In this section we will prove a fundamental result in the theory of Markov chains.

Theorem 6.5. Let P be an irreducible and aperiodic Markov chain. If P has a stationary distribution π, then for any starting distribution µ, and any state x,

P_µ[X_t = x] → π(x).

Proof. Let (Y_t)_t be Markov-(π, P), independent of (X_t)_t. Since πP^t = π, we have that π(x) = P[Y_t = x]. Let τ be the coupling time of (X_t, Y_t)_t.

First we show that P[τ < ∞] = 1, so that P[τ > t] → 0. Indeed, (X_t, Y_t)_t is a Markov chain on S^2, with transition matrix

Q((x, y), (x', y')) = P(x, x') P(y, y').

Moreover, for χ(x, y) = π(x) π(y), we get that χ is a stationary distribution for Q. We claim that since P is irreducible and aperiodic, Q is also irreducible (and aperiodic). Indeed, let (x, y), (x', y') ∈ S^2. We already saw that there exist t(x, x'), t(y, y') such that for all t > t(x, x'), P^t(x, x') > 0, and for all t > t(y, y'), P^t(y, y') > 0. Thus, for all t > \max\{t(x, x'), t(y, y')\} we have that Q^t((x, y), (x', y')) > 0. Thus, Q is irreducible.

Since Q has a stationary distribution and Q is irreducible, we get that Q is positive recurrent. Specifically, P[T_{(x,x)} < ∞] = 1 for any x ∈ S. Since τ ≤ T_{(x,x)}, we get that P[τ < ∞] = 1.

Now define

Z_t = Y_t for t ≤ τ, and Z_t = X_t for t ≥ τ.

So (X_t, Z_t)_t is a coupling of Markov chains such that for all t ≥ τ, X_t = Z_t. Also, since Z_0 = Y_0 ~ π,

P[Z_t = x] = P[Z_t = x, t < τ] + P[Z_t = x, t ≥ τ] = P[Z_t = x, t < τ] + P[X_t = x, t ≥ τ].

Comparing this with

P[X_t = x] = P[X_t = x, t < τ] + P[X_t = x, t ≥ τ],

we get that

|P[X_t = x] - P[Z_t = x]| = |P[X_t = x, t < τ] - P[Z_t = x, t < τ]| ≤ P[τ > t] → 0.

Finally, the previous lemma tells us that (Z_t)_t is Markov-(π, P, S); most importantly, it has starting distribution π. So P[Z_t = x] = π(x). □
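Theorem 6.5 can be seen numerically on any small irreducible and aperiodic chain. The sketch below (the lazy walk on a 4-cycle and the starting state are arbitrary choices, not from the notes) prints max_x |P_µ[X_t = x] - π(x)|, which should tend to 0:

```python
import numpy as np

n = 4
P = np.zeros((n, n))
for x in range(n):
    P[x, (x - 1) % n] = P[x, (x + 1) % n] = 0.5
P = 0.5 * np.eye(n) + 0.5 * P          # lazy walk: irreducible and aperiodic

pi = np.full(n, 1 / n)                 # stationary distribution (regular graph)
mu = np.zeros(n); mu[0] = 1.0          # start at state 0

for t in (1, 5, 10, 20, 50):
    mut = mu @ np.linalg.matrix_power(P, t)
    print(f"t={t:3d}  max_x |P_mu[X_t=x] - pi(x)| = {np.abs(mut - pi).max():.6f}")
```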

Number of exercises in lecture: 0
Total number of exercises until here: 10

Random Walks (Ariel Yadin)

Lecture 7: Conditional Expectation

7.1. Conditional Probability

Recall that we want to define a random walk. A (simple) random walk is a process that, given the current location, chooses among the available neighbors uniformly. So we need a way of conditioning on the current position. That is, we want the notions of conditional probability and conditional expectation.

The notion of conditional expectation is central to probability. It is developed using the Radon-Nikodym derivative from measure theory. (Johann Radon, 1887-1956; Otto Nikodym, 1887-1974)

Theorem 7.1. Let µ, ν be two probability measures on (Ω, F). Suppose that µ is absolutely continuous with respect to ν; that is, ν(A) = 0 implies that µ(A) = 0 for all A ∈ F. Then, there exists a (ν-a.s. unique) random variable \frac{dµ}{dν} on (Ω, F, ν) such that for any event A ∈ F,

E_µ[1_A] = E_ν\left[1_A \frac{dµ}{dν}\right].

Lebesgue integrals give the following form:

\int_A dµ = \int_A \frac{dµ}{dν}\, dν,

which can be informally stated as \frac{dµ}{dν} \cdot dν = dµ.

This theorem is used to prove the following theorem.

Theorem 7.2. Let X be a random variable on a probability space (Ω, F, P) such that E[|X|] < ∞. Let G ⊆ F be a sub-σ-algebra of F. Then, there exists a (P-a.s. unique) G-measurable random variable Y such that for all A ∈ G,

E[Y 1_A] = E[X 1_A].

Notation: An X as above is called integrable.

Notation: If Y is G-measurable then we write Y ∈ G.

Definition 7.3. Let X be an integrable (E[|X|] < ∞) random variable on a probability space (Ω, F, P). Let G ⊆ F be a sub-σ-algebra of F. The random variable from the above theorem is denoted E[X | G].

If Y is a random variable on (Ω, F, P) then we denote E[X | Y] := E[X | σ(Y)]. If A ∈ F is any event then we write P[A | G] := E[1_A | G].

Proof of Theorem 7.2. Note that uniqueness is immediate from the fact that if Y, Y' are two such random variables, then for A_n = \{Y - Y' ≥ n^{-1}\} we have that A_n ∈ G (as a measurable function of (Y, Y')), and

P[A_n] \cdot n^{-1} ≤ E[(Y - Y') 1_{A_n}] = E[X 1_{A_n}] - E[X 1_{A_n}] = 0.

So by continuity of probability, P[Y > Y'] = P[\bigcup_n A_n] = \lim_n P[A_n] = 0. Exchanging the roles of Y and Y', we get that P[Y ≠ Y'] = 0.

For existence we use the Radon-Nikodym derivative. First assume that X ≥ 0. Then, define a probability measure on (Ω, G) by

Q(A) = \frac{E[X 1_A]}{E[X]},  A ∈ G.

If P[A] = 0 then Q(A) = 0 (e.g. by Cauchy-Schwarz, E[X 1_A]^2 ≤ E[X^2] P[A] = 0); that is, Q ≪ P. So the Radon-Nikodym derivative exists, and for all A ∈ G,

E[X 1_A] = E\left[\frac{dQ}{dP} 1_A\right] \cdot E[X].

Taking Y = \frac{dQ}{dP} \cdot E[X] completes the case of X ≥ 0.

For the general case, recall that X = X^+ - X^-, and X^+, X^- are non-negative. Let Y_1 = E[X^+ | G] and Y_2 = E[X^- | G]. Then, Y_1 - Y_2 ∈ G and for any A ∈ G,

E[X 1_A] = E[X^+ 1_A] - E[X^- 1_A] = E[(Y_1 - Y_2) 1_A].

Thus, Y = Y_1 - Y_2 completes the proof. □

Note that to prove that Y = E[X | G] one needs to show two things: Y ∈ G, and E[Y 1_A] = E[X 1_A] for all A ∈ G.

Important: Conditional expectation E[X | G] is the average value of X given the information

44 44 in G; this is a random variable, not a number as is the usual expectation. One needs to be careful with this. Whenever we write E[X G] = Z we actually mean that E[X G] = Z a.s. Exercise 7.1. Let X be an integrable random variable on (Ω, F, P). Let G F be a sub-σ-algebra. Then, If X G then E[X G] = X. [ The average value of X given X is X itself. ] If G = {, Ω} then E[X G] = E[X]. [ Given no information, the average value of X is E[X]. ] If X = c for c a constant, then X is measurable with respect to the trivial σ-algebra {, Ω} G, so E[c G] = c. If X is independent of G then E[X G] = E[X]. [ Given no information about X, the average value of X is E[X]. ] E[E[X G]] = E[X]. Solution. It is trivial that E[X1 A ] = E[X1 A ] so if X G then X satisfies both properties required to be a conditional expectation. Again, constants are measurable with respect to any σ-algebra. For the second property, E[X1 ] = 0 = E[E[X]1 ] and E[X1 Ω ] = E[X]. Easy. Follows from the previous bullets. If X is independent of G, then for any A G, E[X1 A ] = E[X] P[A] = E[E[X]1 A ]. Also, E[X] G since constants are measurable with respect to any σ-algebra. Consider the event Ω G. Since 1 = 1 Ω we get that E[X] = E[X1 Ω ] = E[E[X G]1 Ω ] = E[E[X G]]. Exercise 7.2. not change the conditioning. ] If Y = Y a.s. then E[X Y ] = E[X Y ]. [ Changing by measure 0 does Hint: Consider E[X σ(y ) σ(y )].

45 45 Solution. It suffices to prove that if G and G are σ-algebras such that if A G G then P[A] = 0 (that is, G and G only differ on measure 0 events) then E[X G] = E[X G ] a.s. G G is a σ-algebra as an intersection of σ-algebras. Let Z = E[X G G ]. Since G G G and G G G we have that Z is both G and G measurable. Moreover, for any A G: if A G then P[A] = 0 so E[X1 A ] = 0 = E[Z1 A ]. If A G then A G G so E[X1 A ] = E[Z1 A ] by definition. Thus, Z = E[X G]. Similarly, exchanging the roles of G and G, we get Z = E[X G ], so E[X G] = E[X G ] a.s. Exercise 7.3. E[aX + Y G] = a E[X G] + E[Y G] a.s. Solution. The right hand side is of course G-measurable. For any A G, E[(aX+Y )1 A ] = a E[X1 A ]+E[Y 1 A ] = a E[E[X G]1 A ]+E[E[Y G]1 A ] = E[(a E[X G]+E[Y G])1 A ]. Exercise 7.4. If X Y then E[X G] E[Y G]. Solution. Since Y X 0 is suffices to show that if X 0 then E[X G] 0 a.s. Let A n = { E[X G] n 1}. So A n G and P[A n ]n 1 E[E[X G]1 An ] = E[X1 An ] 0. So P[A n ] = 0 for all n, and thus P[E[X G] < 0] = P[ n : A n ] = 0. Exercise 7.5. Let G G. Show that for any event A with P[A] > 0, P[G A] = E[P[A G]1 G]. P[A] Thomas Bayes ( )

46 46 Solution. Note that since G G, by definition, E[P[A G]1 G ] = E[1 A 1 G ] = P[A G] More Properties Proposition 7.4 (Monotone Convergence). If (X n ) n is a monotone non-decreasing sequence of non-negative integrable random variables, such that X n X for some integrable X, then E[X n G] E[X G] a.s. Proof. Let Y n = X X n. Since X n X, we get that Y n 0 for all n. Thus, (E[Y n G]) n is a monotone non-increasing sequence of non-negative random variables. Let Z(ω) = inf n E[Y n G](ω) = lim n E[Y n G](ω) = lim inf n E[Y n G](ω). So Z G and Z 0. Fatou s Lemma gives that for any A G, E[Z] lim inf n E[E[Y n G]] = lim inf E[X X n ] = 0, n since E[X n ] E[X] by monotone convergence. Thus, Z = 0 a.s. This implies that E[X G] E[X n G] a.s. 0. Proposition 7.5. If Z G then E[XZ G] = E[X G]Z a.s. Proof. Note that E[X G]Z G so we only need to prove the second property. We use the usual four-step proof, from indicators to simple random variables to non-negatives to general. If Z = 1 B for some B G then for any A G, E[XZ1 A ] = E[X1 B A ] = E[E[X G]1 B A ] = E[E[X G]Z1 A ]. If Z is simple, then Z = k a k1 Ak and by linearity and the previous case, E[XZ G] = k a k E[X1 Ak G] = k a k E[X G]1 Ak = E[X G]Z. For general non-negative Z, in the case X is non-negative, we approximate Z by a nondecreasing sequence of simple random variables, Z n Z, so that XZ n XZ and by monotone convergence and the previous case, E[XZ G] = lim E[XZ n G] = lim E[X G]Z n = E[X G]Z. n n

47 47 For a general Z G, and general X, write Z = Z + Z and X = X + X, with 0 Z +, Z G and X +, X 0. By the previous case and linearity, E[X ± Z G] = E[X ± (Z + Z ) G] = E[X ± G](Z + Z ) = E[X ± G]Z, which immediately leads to the assertion. The following properties all have their usual proof adapted to the conditional setting. Proposition 7.6 (Jensen s Inequality). If g : R R is convex, and X, g(x) are integrable, then g(e[x G]) E[g(X) G]. Proof. If g is convex then for any m there exist a m, b m such that g(s) a ms+b m for all s, and g(m) = a mm+b m. Johan Jensen ( ) Thus, for any ω Ω, there exist A(ω), B(ω) such that g(s) A(ω)s + B(ω) for all s and g(e[x G](ω)) = A(ω) E[X G](ω) + B(ω). It is not difficult to see that A, B are measurable, and determined by E[X G] and g, so A, B are G-measurable random variables. Thus, g(e[x G]) = A E[X G] + B = E[AX + B G] E[g(X) G]. Proposition 7.7 (Cauchy-Schwarz). If X, Y are in L 2 (Ω, F, P), then (E[XY G]) 2 E[X 2 G] E[Y 2 G]. Proof. By Jensen s inequality, E E[XY G] E[ XY ] E[X 2 ][Y 2 ] <, so E[XY G] is a.s. finite. If E[Y 2 G] = 0 a.s. then Y = 0 a.s. and so both sides of the inequality become 0. So we can assume that E[Y 2 G] > 0. Set λ = E[XY G] E[Y 2, which is a G-measurable random variable. By linearity, G] Augustin-Louis Cauchy ( ) 0 E[(X λy ) 2 G] = E[X 2 G] + λ 2 E[Y 2 G] 2λ E[XY G] = E[X 2 G] (E[XY G])2 E[Y 2. G] Proposition 7.8 (Markov / Chebyshev ). If X 0 is integrable, then for any G-measurable Z such that Z > 0, P[X Z G] E[X G] Z. Hermann Schwarz ( ) Proof. Let Y = Z1 {X Z}. So Y X. Thus, Z P[X Z G] = E[Y G] E[X G]. Pafnuty Chebyshev ( )

48 48 Remark 7.9. Suppose that (Ω, F, P) is a probability space, and G F is some sub-σ-algebra. We have two vector spaces associated: L 2 (Ω, G, P) L 2 (Ω, F, P); the spaces of square-integrable G-measurable random variables and square-integrable F-measurable random variables. These spaces come equipped with a inner-product structure given by < X, Y >= E[XY ]. The theory of inner-product (or Hilbert) spaces tells us that L 2 (Ω, F, P) = L 2 (Ω, G, P) V where V is the orthogonal complement to L 2 (Ω, G, P) in L 2 (Ω, F, P). So we can project any F-measurable square integrable X onto L 2 (Ω, G, P). This projection turns out to be exactly X E[X G]. In fact, it is immediate that E[X G] is a square-integrable G-measurable random variable. Moreover, for Y L 2 (Ω, G, P), X E[X G], Y = E[XY E[X G]Y ] = E[XY ] E[E[XY G]] = 0. Thus, to minimize E[(X Y ) 2 ] over all Y L 2 (Ω, G, P), we can take Y = E[X G] The smaller σ-algebra always wins. Perhaps the most important property that has no unconditional counterpart is Proposition Let X be an integrable random variable on a probability space (Ω, F, P). Let H G F be sub-σ-algebras. Then, E[E[X H] G] = E[X H]. E[E[X G] H] = E[X H]. Proof. The first assertion comes from the fact that E[X H] H G, so conditioning on G has no effect. For the second assertion we have that E[X H] H of course, and for any A H, using that A G as well, E[E[X G]1 A ] = E[E[X1 A G]] = E[X1 A ] = E[E[X H]1 A ] Partitioned Spaces During this course, we will almost always use conditional probabilities conditioned on some discrete random variable. Note that if Y is discrete with range R (perhaps d-dimensional), then r R 1 {Y =r} = 1 a.s. This simplifies the discussion regarding conditional probabilities. The main observation is the following

49 49 Exercise 7.6. Suppose that (Ω, F, P) is a probability space with Ω = k I A k where A k F for all k I, with I some countable (possibly finite) index set. Show that σ((a k ) k I ) = { A k : J I }. k J Hint: Show that any set in the right-hand side must be in σ((a k ) k I ). Show that the righthand side is a σ-algebra. Lemma Let X be an integrable random variable on (Ω, F, P). Let I be some countable index set (possibly finite). Suppose that P[ k I A k] = 1 where A k F for all k, and P[A k ] > 0 for all k. Let G = σ((a k ) k I ). Then, E[X G] = k 1 Ak E[X1 A k ]. P[A k ] Proof. Let Y = k 1 A k E[X1 A k ] P[A k ]. The of course Y G. For any A G we have that 1 A = k J 1 A k (P-a.s.) for some J I. Thus, E[Y 1 A ] = k J E[1 Ak ] E[X1 A k ] P[A k ] = k J E[X1 Ak ] = E[X1 A ]. Corollary Let Y be a discrete random variable with range R on (Ω, F, P). Let X be an integrable random variable on the same space. Then, E[X Y ] = r R 1 {Y =r} E[X1 {Y =r} ] P[Y = r] = r R 1 {Y =r} E[X Y = r], where we take the convention that E[X Y = r] = E[X1 {Y =r}] P[Y =r] = 0 for P[Y = r] = 0. Proof. Ω = r R {Y = r}. Note that E[X Y ] is a discrete random variable as well, regardless of the original distribution of X. Number of exercises in lecture: 6 Total number of exercises until here: 16
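The corollary above (conditioning on a discrete random variable) is easy to check by simulation: on $\{Y = r\}$ the conditional expectation equals the constant $\mathbb{E}[X 1_{\{Y=r\}}]/\mathbb{P}[Y=r]$, and the tower property $\mathbb{E}[\mathbb{E}[X \mid Y]] = \mathbb{E}[X]$ should hold. A short sketch (mine, not from the notes), assuming numpy; the model for $(X, Y)$ is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Y discrete with range {0, 1, 2}; X depends on Y plus noise (illustrative model).
Y = rng.integers(0, 3, size=n)
X = Y**2 + rng.normal(0.0, 1.0, size=n)

# E[X | Y] via the partition formula: on {Y = r} it equals E[X 1_{Y=r}] / P[Y = r].
cond_exp = np.zeros(n)
for r in range(3):
    mask = (Y == r)
    cond_exp[mask] = X[mask].mean()    # empirical E[X 1_{Y=r}] / P[Y=r]

# Tower property E[E[X | Y]] = E[X], and the conditional values themselves.
print("E[X]        ", X.mean())
print("E[E[X|Y]]   ", cond_exp.mean())
print("E[X | Y=r]  ", [round(X[Y == r].mean(), 3) for r in range(3)])  # roughly r^2
```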

50 50 Random Walks Ariel Yadin Lecture 8: Martingales 8.1. Martingales Do conditional expectation Definition 8.1. Let (Ω, F, P) be a probability space. A filtration is a monotone sequence of sub-σ-algebras F 0 F 1 F. A sequence (X n ) n of random variables is said to be adapted to a filtration (F n ) n if for all n, X n F n. Definition 8.2. Let (Ω, F, P) be a probability space, and let (F n ) n be a filtration. A sequence (X n ) n is said to be a martingale with respect to the filtration (F n ) n, or sometimes a (F n ) n - martingale, if for all n, E[ X n ] < (i.e. X n is integrable). E[X n+1 F n ] = X n. (X n ) n is adapted to (F n ) n. If the filtration is not specified then we say that (X n ) n is a martingale if it is a martingale with respect to the natural filtration F n := σ(x 0,..., X n ); that is, a sequence of integrable random variables such that for all n, E[X n+1 X n,..., X 0 ] = X n. Exercise 8.1. Show that if (X n ) n is an F n -martingale then (X n ) n is also a martingale with respect to the natural filtration (σ(x 0,..., X n )) n. (Hint: Show that for all n, σ(x 0,..., X n ) F n.)

51 51 Example 8.3. Let (X n ) n be a simple random walk on Z started at X 0 = 0. property gives that The Markov E[X n+1 X n,..., X 0 ] = 1 2 (X n + 1) (X n 1) = X n. So (X n ) n is a martingale. Example 8.4. More generally, if (X n ) n is a sequence of independent random variables with E[X n ] = 0 for all n, and S n = n k=0 X k, then E[S n+1 S n,..., S 0 ] = S n + E[X n+1 S n,..., S 0 ]. Since S n,..., S 0 σ(x 0,..., X n ) and since X n+1 is independent of σ(x 0,..., X n ) we have that E[X n+1 S n,..., S 0 ] = E[X n+1 ] = 0. So, in conclusion, (S n ) n is a martingale. Proposition 8.5. Let (X n ) n be a (F n ) n -martingale. For any k n we have E[X n F k ] = X k. Proof. For k = n this is obvious. Assume that k < n. By properties of conditional expectation, because F k F n 1, E[X n F k ] = E[E[X n F n 1 ] F k ] = E[X n 1 F k ]. Continuing inductively, we get the proposition. Exercise 8.2. Let (X n ) n be a (F n ) n -martingale. Let T be a stopping time (with respect to the filtration (F n ) n ). Prove that (Y n := X T n ) n is a (F n ) n -martingale. Theorem 8.6 (Optional Stopping). Let (X n ) n be an (F n ) n -martingale and T a stopping time. We have that E[X T X 0 ] = X 0 in the following cases: If T is bounded; that is if T t a.s. for some 0 < t <. If T is a.s. finite and there exists M > 0 such that X n M for all n a.s. ((X n ) n is bounded). If E[T ] < and there exists M > 0 such that X n+1 X n M for all n a.s. (X n ) n has bounded increments.

52 52 Proof. We start with the first case: Let Y n = X T n. Since T t a.s. we get that Y t = X T. Since Y 0 = X 0 we conclude E[X T X 0 ] = E[Y t Y 0 ] = Y 0 = X 0. For the second case: Let Y n = X T n as above. We have E[Y n X 0 ] E[X T X 0 ] = E[(X T n X T ) 1 {T >n} X 0 ] 2M P[T > n X 0 ] 0, because T < a.s. Thus, since T n is a bounded stopping time, Finally, for the third case: Note that Thus, similarly to the above, E[X T X 0 ] = lim n E[Y n Y 0 ] = lim n Y 0 = X 0. T 1 X T n X T 1 {T >n} X k+1 X k 1 {T >n} MT 1 {T >n}. k=n E[X T n X 0 ] E[X T X 0 ] M E[T 1 {T >n} X 0 ]. Since T 1 {T >n} 0, and since E[T ] <, we get by dominated convergence that E[T 1 {T >n} ] 0, and so X 0 = E[X T n X 0 ] E[X T X 0 ]. Let us use martingales to calculate some probabilities. Example 8.7 (Gambler s Ruin). Let (X t ) t be a simple random walk on Z. Let T = T ({0, n}) be the first time the walk is at 0 or n. We can think of X t as a the amount of money a gambler playing a fair game has after the t-th game. What is the probability that a gambler that starts with x reaches n before going bankrupt? Let p n (x) = P x [T n < T 0 ]. Since (X t ) t is a martingale, we get that (X t T ) t is a bounded martingale under the measure P x. Since T is a.s. finite, we can apply the optional stopping theorem to get x = E x [X T T ] = E x [X T ] = E x [X T T n < T 0 ] p n (x) + E x [X T T 0 < T n ] (1 p n (x)) = p n (x) n.

53 53 So p n (x) = x n. Remark 8.8. This is another proof that Z is recurrent: Let A n = { T n < T 0 + }. So (An ) n is a decreasing sequence of events. Thus, P 1 [ n A n ] = lim n P[A n ] = lim n 1 n = 0. By symmetry, P 1 [ n A n ] = 0. Now, the event that the walk never returns to 0 is the event that the walk takes a step to either 1 or 1 and then never returns to 0; i.e. { T + 0 = } { = X 1 = 1, } { A n X 1 = 1, } A n. n n The Markov property gives P 0 [T + 0 = ] = P 1[ n A n ] + P 1 [ n A n ] = 0. Example 8.9. What about the amount of time it takes to reach 0 or n? Consider Y t = X 2 t t. E[Y t+1 X 0,..., X t ] = 1 2 ((X t + 1) 2 (t + 1) + (X t 1) 2 (t + 1) ) = Y t. So (Y t ) t is a martingale, and thus (Y T t ) is a bounded martingale under the measure P x. Thus, since Y 0 = X0 2, x 2 = E x [Y T Y 0 ] = E x [XT 2 ] E x [T ] = p n (x)n 2 E x [T ]. So by the previous example, for any 0 x n, E x [T ] = xn x 2 = x(n x). Remark This is another proof that Z is null-recurrent: Under P 0, the event T n < T + 0 implies that T + 0 2n. So, P 0 [T + 0 2n] P 0[X 1 = 1, T n < T + 0 ] = P 1[T n < T 0 ] = 1 n.

54 54 Since P 0 [T + 0 2n 1] P 0[T + 0 E 0 [T 0 + ] = P 0 [T 0 + = m=0 n=1 2n], we get that > m] = P 0 [T 0 + m] m=1 P 0 [T 0 + 2n 1] + P 0[T 0 + 2n] 2 n =. Example Consider the martingale X 2 t t. Using the optional stopping theorem at time n=1 T = T + 0 we get that 1 = E 1 [X 2 0 0] = E 1 [X 2 T T ] = E 1 [T ]. Similarly, E 1 [T ] = 1. Since E 0 [T + 0 ] = 1 2 we get that E 0 [T + 0 ] = 2 <! Where did we go wrong? ( E0 [T + 0 X 1 = 1] + E 0 [T + 0 X 1 = 1] ) = 1 2 (E 1[T 0 + 1] + E 1 [T 0 + 1]), We could not use the optional stopping theorem, because the martingale X 2 t t is not bounded! Example Actually, this last bit gives a third proof that E 0 [T + 0 ] =. Suppose that E x [T 0 ] <. Since (X t ) t is a martingale with bounded differences, by the optional stopping theorem x = E x [X T0 ]. But, X T0 = 0 a.s. so E x [T 0 ] = for all x. Using the Markov property, Number of exercises in lecture: 2 Total number of exercises until here: 18 E 0 [T + 0 ] = 1 2 (E 1[T 0 + 1] + E 1 [T 0 + 1]) =.
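The two optional-stopping computations of Examples 8.7 and 8.9, namely $\mathbb{P}_x[T_n < T_0] = x/n$ and $\mathbb{E}_x[T] = x(n-x)$, can be checked by direct simulation. A minimal sketch (not part of the notes), assuming numpy; $n$, $x$ and the number of runs are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def run_once(x, n):
    """Simple random walk from x until it hits 0 or n; return (hit n?, steps)."""
    t = 0
    while 0 < x < n:
        x += rng.choice((-1, 1))
        t += 1
    return x == n, t

n, x, runs = 10, 3, 20_000
wins, steps = 0, 0
for _ in range(runs):
    w, t = run_once(x, n)
    wins += w
    steps += t

print("P_x[T_n < T_0]  sim:", wins / runs, "  theory:", x / n)
print("E_x[T]          sim:", steps / runs, "  theory:", x * (n - x))
```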

55 55 Random Walks Ariel Yadin Lecture 9: Reversible Chains Let (X t ) t be Markov-P Time Reversal Then, conditioned on X t, we have that X[0, t] and X[t, ) are independent. This suggests looking at the chain run backwards in time - since determining the past given the future will only depend on the current state. However, in accordance with the second law of thermodynamics (entropy always increases), we know that nice enough chains converge to a stationary distribution, even if the chain is started from a very ordered distribution - namely a δ-measure. This suggests that there is a specific direction we are looking at, and that the chain is moving from order to disorder represented by the stationary measure. However, if we start the chain from the stationary distribution, perhaps we can view the chain both forwards and backwards in time. This is the content of the following. Definition 9.1. Let P be an irreducible Markov chain with stationary distribution π. Define ˆP (x, y) = P (y, x) π(y) π(x). ˆP is called the time reversal of P. The next theorem justifies the name time reversal. Theorem 9.2. Let π be the stationary distribution for an irreducible Markov chain P. Then, ˆP is an irreducible Markov chain, and π is a stationary distribution for ˆP. Moreover: Let (X t ) t be Markov-(π, P ). Fix any T > 0 and define Y t = X T t, t = 0,..., T. Then, (Y t ) T t=0 is Markov-(π, ˆP ). Proof. The fact that ˆP is a Markov chain follows from Also, (π ˆP )(x) = y ˆP (x, y) = y y π(y) ˆP (y, x) = y π(y)p (y, x) 1 π(x) = π(x) π(x) = 1. π(y)π(x)p (x, y) 1 π(y) = π(x) P (x, y) = π(x), y

56 56 so π is stationary for ˆP. Finally, note that π(x)p (x, y) = π(y) ˆP (y, x). So, P[Y 0 = x 0,..., Y T = x T ] = P π [X 0 = x T, X 1 = x T 1,..., X T = x 0 ] = π(x T )P (x T, x T 1 ) P (x 1, x 0 ) = ˆP (x T 1, x T ) ˆP (x T 2, x T 1 ) ˆP (x 0, x 1 ) π(x 0 ) = π(x 0 ) ˆP (x 0, x 1 ) ˆP (x 1, x 2 ) ˆP (x T 1, x T ). Recall the following definition: 9.2. Reversible Chains Definition 9.3. Let P be a Markov chain on S. A probability measure on S, π, is said to satisfy the detailed balance equations if for all x, y S, π(x)p (x, y) = π(y)p (y, x). We also say that P and π are in detailed balance. We also proved in the exercises that if P and π are in detailed balance, then π must be a stationary distribution for P. (The opposite is not necessarily true, as is shown in the exercises.) Immediately we see a connection between detailed balance and time reversals: Proposition 9.4. Let P be a Markov chain with stationary distribution π. The following are equivalent: P and π are in detailed balance. P = ˆP. For any T > 0, (X t ) T t=0 is Markov-(π, P ) if and only if (X T t ) T t=0 is Markov-(π, P ). [ The time reversal is the same as the forward-time chain. ] Proof. We show that each bullet implies the one after it. If P and π are in detailed balance, then for any states x, y, So P = ˆP. ˆP (x, y) = P (y, x) π(y) 1 π(x) = π(x)p (x, y) π(x) = P (x, y).

57 57 If P = ˆP then for any T > 0, if (X t ) T t=0 is Markov-(π, P ) then (X T t ) T t=0 is Markov-(π, ˆP ). Since ˆP = P we get that (X T t ) T t=0 is Markov-(π, P ). Reversing the roles of X t and X T t we get that for all T > 0, (X t ) T t=0 is Markov-(π, P ) if and only if (X T t ) T t=0 is Markov-(π, P ). Now for the third implication, assume that for all T > 0, (X t ) T t=0 is Markov-(π, P ) if and only if (X T t ) T t=0 is Markov-(π, P ). Take T = 1. Then (X 0, X 1 ) is Markov-(π, P ) if and only if (X 1, X 0 ) is Markov-(π, P ). That is, π(x)p (x, y) = P π [X 0 = x, X 1 = y] = P π [X 1 = y, X 0 = x] = π(y)p (x, y). So P and π are in detailed balance Reversible chains as weighted graphs Definition 9.5. Let G be a graph. A conductance on G is a function c : V (G) 2 [0, ) satisfying c(x, y) = c(y, x) for all x, y. c(x, y) > 0 if and only if x y. The pair (G, c) is called a weighted graph, or sometimes a network or electric network. Remark 9.6. Let (G, c) be a weighted graph, with C = x,y c(x, y) <. Define c x = y c(x, y) and P (x, y) = c(x,y) c x. P is a stochastic matrix, and so defines a Markov chain. For π(x) = cx C we have that π is a distribution, and π(x)p (x, y) = c(x, y) = c(y, x) = π(y)p (y, x). Thus, P is reversible. We will refer to such a P as the random walk on G induced by c. On the other hand, if P is a reversible Markov chain S, we can define a weighted graph as follows: Let V (G) = S and c(x, y) = π(x)p (x, y). Let x y if c(x, y) > 0. Note that c(x, y) = π(x)p (x, y) = 1. x,y x,y Also, we see that P is the random walk induced by (G, c). Connection to multiple edges and self-loops. Definition 9.7. If (G, c) is a weighted graph with x,y c(x, y) <, then the Markov chain P (x, y) = c(x,y) z c(x,z) is called the weighted random walk on G with weights c.

58 58 Example 9.8. Let (G, c) be the graph V (G) = {0, 1, 2}, with edges E(G) = {{0, 1}, {1, 2}, {0, 2}} and c(0, 1) = 1, c(1, 2) = 2 and c(2, 0) = 3. The weighted random walk is then P = The stationary measure is, of course, π(x) = y c(x, y)/ z,w c(z, w) so π = [ 1 3 stationary distribution. We can compute that ˆP = P (which is expected since P is reversible) ] is the Example 9.9 (One dimensional Markov chains are almost reversible). Let P be a Markov chain on Z such that P (x, y) > 0 if and only if x y = 1. For x Z let p x = P (x, x + 1) (so 1 p x = P (x, x 1). Consider the following conductances on Z: Let c(0, 1) = 1. For x > 0 set x p y c(x, x + 1) =. 1 p y Let c(0, 1) = 1 px p x and for x < 0 set Note that for any x Z we have that c(x, x 1) = y=1 x y= 1 c(x, x + 1) = c(x 1, x) 1 p y p y. p x 1 p x, so c(x, x + 1) c(x, x + 1) + c(x, x 1) = p x (1 p x )( px 1 p x + 1) = p x. So P is the weighted random walk with weights given by c. and Moreover, note that (c(x, x 1) + c(x, x + 1))P (x, x + 1) = c(x, x 1) 1 1 p x p x = c(x, x + 1) (c(x + 1, x) + c(x + 1, x + 2))P (x + 1, x) = c(x, x + 1) 1 1 p x (1 p x ) = c(x, x + 1), So for m(x) = c(x, x 1) + c(x, x + 1) we have that m(x)p (x, y) = m(y)p (y, x) for all x, y. That is, if m was a distribution, P would be reversible.

59 59 To normalize m to be a distribution we would need that m(x) = x x c(x, x 1) + c(x, x + 1) = 2 x c(x, x + 1) <. For example, if p x = 1/3 for x > 0 and p x = 2/3 for x < 0 we would have that c(x, x+1) = 2 x for x 0 and c(x, x 1) = 2 x for x 0. Thus m(x) = 2 2 x + 2 x = 4 2 = 8 <. x x=0 So π(x) = c(x,x 1)+c(x,x+1) 8 is a stationary distribution. In general, we see that a drift towards 0 would give a reversible chain. Number of exercises in lecture: 0 Total number of exercises until here: 18
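For the three-vertex network of Example 9.8 (conductances $c(0,1)=1$, $c(1,2)=2$, $c(2,0)=3$) everything can be verified numerically: the induced walk has $\pi = (1/3, 1/4, 5/12)$, $\pi$ is stationary, detailed balance holds, and the time reversal $\hat P$ equals $P$. A short sketch (my own check, assuming numpy):

```python
import numpy as np

# Conductances from Example 9.8: c(0,1)=1, c(1,2)=2, c(2,0)=3.
c = np.array([[0., 1., 3.],
              [1., 0., 2.],
              [3., 2., 0.]])

cx = c.sum(axis=1)              # c_x = sum_y c(x,y)
P = c / cx[:, None]             # P(x,y) = c(x,y) / c_x
pi = cx / cx.sum()              # pi(x) = c_x / sum_z c_z

print("P =\n", P)
print("pi =", pi)                               # [1/3, 1/4, 5/12]
print("pi P = pi ?", np.allclose(pi @ P, pi))   # stationarity
# Detailed balance pi(x)P(x,y) = pi(y)P(y,x), hence the time reversal equals P.
print("detailed balance?", np.allclose(pi[:, None] * P, (pi[:, None] * P).T))
Phat = (P.T * pi) / pi[:, None]                 # P-hat(x,y) = pi(y)P(y,x)/pi(x)
print("P-hat = P ?", np.allclose(Phat, P))
```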

60 60 Random Walks Ariel Yadin Lecture 10: Discrete Analysis Laplacian In order to study electric networks and conductances, we will first introduce the concept of harmonic functions. Let G = (V (G), c) be a network; recall that by this we mean: c : V (G) V (G) [0, ) with c(x, y) = c(y, x) for all x, y G and c x := y c(x, y) < for all x. We denote by E(G) the set of oriented edges of G; that is, E(G) = {(x, y) : c(x, y) > 0}. (We write x y when c(x, y) > 0.) For e E(G) we write e = (e +, e ). c is known as the conductance of the network. Let C 0 (V ) = {f : V (G) R} and C 0 (E) = {f : E(G) R} be the sets of all functions of vertices and (oriented) edges of G respectively. We can define an operator : C 0 (V ) C 0 (E) by: for any edge x y, ( f)(x, y) = c(x, y)(f(x) f(y)). We can also define an operator div : C 0 (E) C 0 (V ) (divf )(x) = y x 1 c x (F (x, y) F (y, x)). We can consider the spaces C 0 (V ), C 0 (E) with the inner products f, f = x c x f(x)f (x) and F, F = e 1 c(e) F (e)f (e). Consider the subspaces L 2 (V ) = { f C 0 (V ) : f, f < } and L 2 (E) = { F C 0 (E) : F, F < }. The operator is a linear operator from L 2 (V ) to L 2 (E). Also div : L 2 (E) L 2 (V ) is a linear

61 61 operator, and f, F = (x,y) (f(x) f(y))f (x, y) = x y f(x)(f (x, y) F (y, x)) = x c x f(x) y x 1 c x (F (x, y) F (y, x)) = f, divf. So = div and div = are dual of each other. Recall that the weighted random walk on the network G is just the Markov process with transition matrix given by P (x, y) = c(x,y) c x. Define the operator : C 0 (V ) C 0 (V ) by = 1 2div. That is, f(x) = 1 2 div f(x) = 1 2 y x 1 c x ( f(x, y) f(y, x)) = y P (x, y)(f(x) f(y)). Exercise Show that (in matrix form) = I P where I is the identity operator Harmonic functions Definition A function f : V (G) R is called harmonic at x if f(x) = 0. f is said to be harmonic on A if for all x A, f is harmonic at x. f is said to be harmonic, if f is harmonic at all x. Harmonic functions and martingales are intimately related. Proposition Let G = (V (G), c) be a network. Let f : G R be a function. Let S G and let T = T S c be the first exit time of S, for (X t ) t, the weighted random walk on G. Then, f is harmonic in S if and only if the sequence (M t = f(x t T )) t is a martingale under P x for all x. Proof. First assume that f is harmonic in S. Note that if x S then X t T = X 0 = x a.s. under P x. So as a constant sequence, M t = f(x) is a martingale. So we only need to deal with x S. The main observation here is that the Markov property is just the fact that E x [f(x t+1 ) F t ] = y P (X t, y)f(y) = (P f)(x t ).

62 62 For any t, since 1 {T t+1} = 1 {T >t} F t, and f(x T )1 {T t} F t, E x [M t+1 F t ] = E x [f(x t+1 ) F t ]1 {T >t} + f(x T )1 {T t} = (P f)(x t )1 {T >t} + f(x T )1 {T t}. If f is harmonic at x, then P f(x) = f(x). Thus, since on the event T > t, f is harmonic at X t, we get that (P f)(x t )1 {T >t} = f(x t )1 {T >t}. In conclusion, E x [M t+1 F t ] = (P f)(x t )1 {T >t} +f(x T )1 {T t} = f(x t )1 {T >t} +f(x T )1 {T t} = f(x t T ) = M t. So M t is a martingale. For the other direction, assume that M t T is a martingale. Then, for any x S, f(x) = M 0 = E x [M 1 ] = E x [f(x 1 )] = (P f)(x), were we have used that under P x, T 1 a.s. So we have that for any x S, f(x) = (I P )f(x) = 0. So f is harmonic in S. Harmonic functions exhibit properties analogous to those in the continuous case. Proposition 10.3 (Solution to Dirichlet Problem). Let G = (V (G), c) be a network. Let B G (we think of B as the boundary). Let D = {x G : P x [T B < ] = 1}. (So B D.) Let u : B R be some bounded function (boundary values). Then, there exists a unique function f : D R that is bounded, harmonic in D \ B and admits f(b) = u(b) for all b B. Proof. Define f(x) = E x [u(x TB )]. This is well defined, since under P x, T B < a.s. and since u is bounded. It is immediate to check that for any b B, f(b) = u(b). Also, for x D \ B, since T B 1 P x a.s., by the Markov property, f(x) = E x [u(x TB )] = y P (x, y) E y [u(x TB )] = P f(x). So f is harmonic at x. For uniqueness, assume that g : D R is bounded, harmonic in D \ B, and g(b) = u(b) for all b B. We want to show that (10.1) for all x D, g(x) = E x [u(x TB )].

63 63 g is bounded, so (g(x TB t)) t is a bounded martingale, so (10.1) holds by the optional stopping theorem, because T B < P x -a.s. for all x D. If we remove the condition that T B < then we can only guaranty existence but not uniqueness of the solution to the Dirichlet problem. Proposition Let G = (V (G), c) be a network. Let B G and let u : B R be some function. Then, there exists a function f : G R that is harmonic in G \ B and admits f(b) = u(b) for all b B. Proof. We define f(x) = E x [u(x TB )1 {TB < }]. Obviously, f(b) = u(b) for all b B. Also, for x B, since T B 1 P x -a.s. we have that f is harmonic at x by the Markov property. Comparison to Poisson formula? The maximum principle for harmonic functions in R d states that if a non-constant function is harmonic in a connected open subset of R d then it will have all its maximal values on the boundary. Proposition 10.5 (Maximum Principle). Let G = (V (G), c) be a network. Let B G and D = {x G : P x [T B < ] = 1}. Let f : D R be a bounded function, harmonic in D \ B. Then, sup x D f(x) = sup f(x) and inf f(x) = inf f(x). x B x D x B That is, supremum and infimum are on the boundary. Moreover, if D \ B is connected, and f is not constant, any x such that f(x) attains the supremum or infimum must admit x B. Proof. For any x D we know that because X TB B a.s. f(x) = E x [f(x TB )] sup f(b), b B Now, assume that f(x) sup y D f(y) for some x D \ B. Let z D. Since D \ B is connected, there exists a path from x to z that does not intersect B. Thus, there exists t > 0

64 64 such that P x [T B t, X t = z] > 0. Since f is harmonic in D \ B, we get that f(x TB s) s is a martingale. Thus, stopping at time s = t, (f(x) f(z)) P x [T B t, X t = z] = E x [(f(x) f(x t TB )) 1 {TB t,x t=z}] E x [f(x) f(x t TB )] = 0. So f(z) f(x) f(z) for any z D, and f is constant. This completes the proofs for the supremum. For the infimum, consider the function g = f. So g is bounded, harmonic in D \ B. Since sup x S g(x) = inf x S f(x) for any set S, we can apply the proposition to g to get the assertions for the infimum. Example Consider the following network: V (G) = Z and c(x, x + 1) = ( p 1 p) x. Suppose that p > 1/2 (if p = 1/2 this is just the simple random walk on Z, and if p < 1/2 then we can exchange x x to get the same thing). The weighted random walk here is given by P (x, x + 1) = c(x, x + 1) = p and P (x, x 1) = 1 p. c(x, x + 1) + c(x 1, x) First let s prove that the weighted random walk here is transient. For example, recall that it suffices to show that P 0 [X t = 0] <. t=0 Well, since at each step the walk moves right with probability p and left with probability 1 p independently, we can model this walk by X t = t ξ k, where (ξ k ) k are independent and all have distribution P[ξ k = 1] = p = 1 P[ξ k = 1]. The usual trick here is to note that ξ k+1 2 Ber(p), so ( ) 2t P 0 [X 2t = 0] = P[Bin(2t, p) = t] = p t (1 p) t. t (This is symmetric in p as expected.) Of course P[X 2t+1 = 0] = 0 because of parity issues. k=1 Now, since ( 2t t ) is the number of size t subsets out of 2t elements, this is at most the total number of subsets which is 2 2t. Since for p 1/2, 4p(1 p) < 1, we get that P[X t = 0] (4p(1 p)) t 1 = 1 4p(1 p) <. t=0 This is one proof that for p 1/2 the weighted walk is transient. t=0

65 65 Now, let us consider B = {0} and boundary values u(0) = 1. What is a bounded harmonic function f : G R such that f is harmonic in G \ B? Well, we can take f 1, which is one option. Another option is to take f(x) = P x [T 0 < ]. But since G is transient, we know that f 1! Since P x [T 0 < ] = E x [u(0)1 {T0< }] we see that this is the second solution from above. However, the uniqueness is only for functions defined on {x : P x [T 0 < ] = 1}, so a-priori there is freedom to choose more than one option for those x s such that P x [T 0 < ] < 1. add discussion on finite networks? Green Function Let G = (V (G), c) be a network. Let u : G R be a function. Suppose we want to solve the equation f = u. If we had a function g : G G R that satisfied g(, x) = 1 {x= } for every x, we could write f(y) = x g(y, x)u(x). Then, f(z) = x u(x)1 {x=z} = u(z), which is a solution. So finding the solution to g = 1 {x= } is the basic step. It turns out that such a g exists, and g is called the Green Function. It is the counterpart of the classical Green Function. Proposition Let G = (V (G), c) be a network. Let Z G be a set (possibly empty). Define T Z 1 g Z (x, y) = E x [ k=0 1 {Xk =y}]. Assume that at least one of the following conditions holds: The weighted random walk on G is transient. Z.

66 66 Then, g Z (, x) = 1 {x= } for all x Z. Moreover, for all x, y, c x g Z (x, y) = c y g Z (y, x). Proof. The conditions of the proposition are there to ensure that g Z (x, y) = P x [X k = y, T Z > k] <. k=0 First, the Markov property gives that for a fixed y, using h(x) = g Z (x, y), h(x) = 1 {x=y} + P x [X k = y, T Z > k] = 1 {x=y} + P (x, w) P w [X k 1 = y, T Z > k 1] k=1 = 1 {x=y} + w so h(x) = 1 {x=y}. P (x, w)h(w), The symmetry of g Z is shown as follows: By the definition of the weighted random walk, we have that c x P (x, y) = c y P (y, x) = c(x, y) for all x y. Thus, for any path in G, (x 0,..., x n ), Thus, for any x, y, k=1 c x0 P x0 [X 0 = x 0,..., X n = x n ] = c xn P xn [X 0 = x n,..., x n = x 0 ]. c x P x [X k = y, T Z > k] = = γ:x y γ =k, γ Z= γ:x y γ =k, γ Z= w c x P x [X[0, k] = γ] = c y P y [X k = x, T Z > k]. c y P y [X[0, k] = (γ k, γ k 1,..., γ 0 )] Summing over k completes the proof. Number of exercises in lecture: 1 Total number of exercises until here: 19
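On a finite network the Green function of the last proposition can be computed exactly by linear algebra: writing $P_D$ for $P$ restricted to $D = G \setminus Z$, one has $g_Z(x, y) = (I - P_D)^{-1}(x, y)$ for $x, y \in D$, since $g_Z(x, y) = \sum_k \mathbb{P}_x[X_k = y, T_Z > k]$. The sketch below is not from the notes; the five-vertex network and its conductances are arbitrary illustrative choices, and numpy is assumed. It checks both $\Delta g_Z(\cdot, y)(x) = 1_{\{x = y\}}$ on $D$ and the symmetry $c_x g_Z(x, y) = c_y g_Z(y, x)$.

```python
import numpy as np

# An arbitrary small weighted graph on 5 vertices (illustrative conductances).
n = 5
c = np.zeros((n, n))
edges = {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 1.0, (3, 4): 3.0, (4, 0): 1.0, (1, 3): 0.5}
for (x, y), w in edges.items():
    c[x, y] = c[y, x] = w

cx = c.sum(axis=1)
P = c / cx[:, None]

Z = [4]                                   # absorbing set
D = [v for v in range(n) if v not in Z]
PD = P[np.ix_(D, D)]

# g_Z(x, y) = expected visits to y before T_Z, started at x (x, y in D).
G = np.linalg.inv(np.eye(len(D)) - PD)

# Check Delta g_Z(., y)(x) = 1_{x=y} on D, i.e. (I - P_D) G = I.
print("Laplacian check:", np.allclose((np.eye(len(D)) - PD) @ G, np.eye(len(D))))

# Check the symmetry c_x g_Z(x, y) = c_y g_Z(y, x).
cD = cx[D]
print("symmetry check: ", np.allclose(cD[:, None] * G, (cD[:, None] * G).T))
```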

67 67 Random Walks Ariel Yadin Lecture 11: Networks Also, Let G = (V, c) be a network. Recall that for x y, We have the duality formula f, F = (x,y) Some discrete analysis ( f)(x, y) = c(x, y)(f(x) f(y)). (divf )(x) = y x 1 c x (F (x, y) F (y, x)). (f(x) f(y))f (x, y) = x y f(x)(f (x, y) F (y, x)) = x c x f(x) y x 1 c x (F (x, y) F (y, x)) = f, divf, where f, f = x c xf(x)f (x) and F, F = e 1 c(e) F (e)f (e). Also, = I P = 1 2 div. We want to think of as differentiation. So the opposite operation should be some kind of integral. Let γ : x y be a path in G. For a function F C 0 (E) on the oriented edges of G, define γ F = γ 1 j=0 1 F (γ j, γ j+1 ) c(γ j, γ j+1 ). For a path γ define its reversal by ˆγ = (γ γ, γ γ 1,..., γ 0 ). Also, define ˆF C 0 (E) by ˆF (x, y) = F (y, x). We make a few observations: Proposition Let F C 0 (E). ˆγ F = γ ˆF. Thus, if F is anti-symmetric, i.e. F (x, y) = F (y, x) for all x y, then for any path γ F = ˆγ F. If F = f for some f C 0 (V ), then for any path γ : x y we have that γ F = f(x) f(y).

68 68 If f = g then there exists a constant η such that f = g + η. Proof. The first bullet is immediate just reversing the order of the edges in F. For the second bullet, expanding the sum, we find that for γ : x y, γ F = γ 1 j=0 f(γ j ) f(γ j+1 ) = f(x) f(y). For the third bullet, note that for any γ : x y we have that f(x) f(y) = f = g = g(x) g(y). So f(x) g(x) = f(y) g(y) for all x, y, and the difference f g is constant. γ γ Definition A function F C 0 (E) is said to respect Kirchhoff s cycle law if for any cycle γ : x x, γ F = 0. Gustav Kirchhoff ( ) Any gradient respects Kirchhoff s cycle law, as shown above. But the converse also holds: Proposition F C 0 (E) respects Kirchhoff s cycle law if and only if there exists f C 0 (V ) such that F = f. In other words, if F respects Kirchhoff s cycle law, then we can define F := f for any f such that f = F, and then all representations of F differ by some constant. Proof. We only need to prove the only if direction. Assume that F respects Kirchhoff s cycle law. First, note that F must be anti-symmetric. Indeed, for x y, the path (x, y, x) is a cycle, and F (x, y) + F (y, x) = c(x, y) (x,y,x) F = 0. Now, fix x, y G and let γ : x y and β : x y. Then, the path α = γ ˆβ = (γ 0,..., γ γ, β β 1,..., β 0 ) is a cycle α : x x. So γ F F = F + F = F = 0. β γ ˆβ α So F does not depend on the choice of γ : x y, but only on the endpoints x and y. γ Fix some a G and for any x G define f(x) = F for some γ : x a, with the convention γ that f(a) = 0. It is clear that for any x y, 1 F (x, y) c(x, y) = So F = f. (x,y) F = f(x) f(y).
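The propositions above can be sanity-checked numerically: for any $f$, the path integral of $\nabla f$ along $\gamma : x \to y$ telescopes to $f(x) - f(y)$, and hence vanishes on cycles. A small illustrative sketch (mine, assuming numpy; the four-vertex network, the function $f$ and the chosen paths are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

# A small weighted cycle with a chord (illustrative).
n = 4
c = np.zeros((n, n))
for (x, y), w in {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 0.5, (3, 0): 1.5, (0, 2): 1.0}.items():
    c[x, y] = c[y, x] = w

f = rng.normal(size=n)                    # an arbitrary function on the vertices

def grad(x, y):                           # (grad f)(x, y) = c(x, y)(f(x) - f(y))
    return c[x, y] * (f[x] - f[y])

def path_integral(F, gamma):              # sum of F(e) / c(e) along the path
    return sum(F(gamma[j], gamma[j + 1]) / c[gamma[j], gamma[j + 1]]
               for j in range(len(gamma) - 1))

path1 = [0, 1, 2, 3]                      # two different paths from 0 to 3
path2 = [0, 2, 3]
cycle = [0, 1, 2, 0]                      # a cycle

print("integral over path1:", path_integral(grad, path1), "  f(0)-f(3):", f[0] - f[3])
print("integral over path2:", path_integral(grad, path2), "  f(0)-f(3):", f[0] - f[3])
print("integral over cycle:", path_integral(grad, cycle), " (should be 0)")
```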

69 Electrical Networks Let G = (V, c) be a network. For each edge x y, define the resistance of the edge to be r(x, y) = 1 c(x,y). Let A, Z G be two disjoint subsets. If we were physicists, we could enforce voltage 1 on A, voltage 0 on Z, and look at the voltage and current flowing through the graph G, where each edge is a r(x, y)-ohm resistor. According to Ohm s law, the current equals the potential difference divided by the resistance I = V R. Kirchhoff would reformulate this telling us that the total current out of each node should be 0, except for those nodes in A Z. Let us turn this into a mathematical definition. The physics will only serve as intuition (albeit usually good intuition). Definition Let G = (V, c) be a network. Let A, Z be disjoint subsets of G. A voltage imposed on A and Z is a function v : G R that is harmonic in G \ (A Z). A unit voltage is a voltage v with v(a) = 1 for all a A and v(z) = 0 for all z Z. Given a voltage v, the current induced by v is defined I(x, y) = v(x, y) = c(x, y)(v(x) v(y)) for all oriented edges x y. Note that this has the form I(x, y) = v(x) v(y) r(x,y), which is the form of Ohm s law. Definition Let G = (V, c) be a network, and let A, Z be disjoint subsets of G. A flow Georg Ohm ( ) from A to Z is a function F on oriented edges of G satisfying: F is anti-symmetric: For every edge x y, F (x, y) = F (y, x). F is divergence free: For every x G\(A Z), divf (x) = y x 1 c x (F (x, y) F (y, x)) = 0. (A function being divergence free is sometimes said to respect Kirchhoff s node law.) For simplicity, we will sometimes extend a flow F to all pairs (x, y) by defining F (x, y) = 0 for x y. Example If v is a voltage, then the current induced by v is a flow; indeed, I(x, y) = c(x, y)(v(x) v(y)) = c(y, x)(v(y) v(x)) = I(y, x),

70 70 and for x A Z, divi(x) = y x 1 (I(x, y) I(y, x)) = 2 c(x, y) (v(x) v(y)) = 2 v(x) = 0. c x c y x x This fact is Kirchhoff s node law. Example If v is a voltage, and I is the current induced by v, then we have Kirchhoff s cycle law: for any cycle γ : x x, γ = (x = γ 0, γ 1,..., γ n = x), n 1 n 1 I(x j, x j+1 )r(x j, x j+1 ) = v(x j ) v(x j+1 ) = v(x) v(x) = 0. j=0 j=0 This of course is due to the fact that any derivative v respects Kirchhoff s cycle law. Exercise Let G = (V, c) be a finite network and A, Z disjoint subsets of G. Let I be flow from A to Z, that satisfies Kirchhoff s cycle law: for any cycle γ : x x, γ = (x = γ 0, γ 1,..., γ n = x), n 1 I(x j, x j+1 )r(x j, x j+1 ) = 0. j=0 Then, there exists a voltage v such that I is induced by v. Moreover, if u, v are two such voltages, then v u = η, for some constant η Probability and Electric Networks Since voltages are harmonic functions, it is not surprising that there is a connection between probability and electric networks. Let us elaborate on this. Definition Let G = (V, c) be a network. Let a G and Z G. Let v be a voltage such that v(z) = 0 for all z Z and v(a) = 1. Define the effective conductance from a to Z by C eff (a, Z) : = x = x I(a, x) = c a 2 divi(a) c(a, x)(v(a) v(x)) = c a v(a), where I is the current induced by v.

71 71 The effective resistance is defined as the reciprocal of the effective conductance. R eff (a, Z) := (C eff (a, Z)) 1. Proposition Let G = (V, c) be a network. Let {a}, Z be disjoint subsets. Let v be a voltage such that v(z) = 0 for all z Z, and v(a) 0 arbitrary. Let I be the current induced by v. Then, C eff (a, Z) = x I(a,x) v(a) = ca v(a) v(a). If the component of a in G \ Z is finite, then C eff (a, Z) = c a P a [T Z < T + a ]. Specifically, in this case C eff (a, Z) does not depend on the choice of the voltage. Proof. The first bullet follows from the fact that u = 1 v(a) v is a voltage with u(z) = 0 for all z Z and u(a) = 1, and v(a) u = v. For the second bullet, let D be the component of a in G \ Z. We have two harmonic functions on D: u = v v(a) and P x[t Z > T a ], which are 0 on Z and 1 on a. Thus, these functions are equal, because D is finite. Now, c a P a [T Z < T + a ] = c a x = c a v(a) v(a) P (a, x)(1 u(x)) = 1 v(a) = C eff (a, Z). c(a, x)(v(a) v(x)) x Resistance to Infinity Example Let G = (V, c) be an infinite network, and let a G. Let (G n ) n be an increasing sequence of finite connected subgraphs of G, that contain a, such that G = n G n (in this case we say that (G n ) n exhaust G). For every n, let Z n = G \ G n. Note that the connected component of a in G \ Z n is G n which is finite. Thus, we can consider the effective conductance from a to Z n, C eff (a, Z n ). This is a sequence of numbers, which converges to a limit; indeed, if T a + <, since X[0, T a + ] is a finite path, there exists n 0 such that for all n > n 0, X[0, T a + ] G n. The events {T Zn < T a + } form a decreasing sequence, so lim C eff(a, Z n ) = c a lim P a[t Zn < T a + ] = c a P a [T a + = ]. n n
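The proposition above gives two routes to $C_{\mathrm{eff}}(a, Z)$: from the unit voltage, $C_{\mathrm{eff}}(a, Z) = \sum_x c(a, x)(v(a) - v(x))$, or probabilistically as $c_a \mathbb{P}_a[T_Z < T_a^+]$. The sketch below is not part of the notes; the small network, the choice $a = 0$, $Z = \{4\}$ and numpy are assumptions of the illustration. It solves the discrete Dirichlet problem for $v$ by linear algebra and estimates the escape probability by simulation.

```python
import numpy as np

rng = np.random.default_rng(4)

# Small illustrative network; a = 0, Z = {4}.
n = 5
c = np.zeros((n, n))
for (x, y), w in {(0, 1): 1.0, (0, 2): 2.0, (1, 3): 1.0, (2, 3): 1.0,
                  (3, 4): 2.0, (1, 2): 0.5}.items():
    c[x, y] = c[y, x] = w
cx = c.sum(axis=1)
P = c / cx[:, None]
a, Z = 0, [4]

# Unit voltage: v(a) = 1, v = 0 on Z, harmonic elsewhere -> a linear system.
free = [x for x in range(n) if x != a and x not in Z]
A = np.eye(len(free)) - P[np.ix_(free, free)]
b = P[np.ix_(free, [a])].flatten()        # boundary contribution from v(a) = 1
v = np.zeros(n); v[a] = 1.0
v[free] = np.linalg.solve(A, b)

C_eff_voltage = sum(c[a, x] * (v[a] - v[x]) for x in range(n))

# Probabilistic formula: C_eff = c_a * P_a[T_Z < T_a^+], estimated by simulation.
def escapes():
    x = a
    while True:
        x = rng.choice(n, p=P[x])
        if x == a:
            return False
        if x in Z:
            return True

runs = 20_000
p_escape = sum(escapes() for _ in range(runs)) / runs
print("C_eff from voltage:    ", C_eff_voltage)
print("c_a * P_a[T_Z < T_a+]: ", cx[a] * p_escape)
```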

72 72 Thus, we see that lim n C eff (a, Z n ) does not depend on the choice of the exhausting subgraphs (G n ) n, and (G, c) is recurrent lim n C eff(a, Z n ) = 0 lim n R eff(a, Z n ) =. In light of the above: Definition Let G = (V, c) be an infinite network, and let a G. Let (G n ) n be an increasing sequence of finite connected subgraphs of G, that contain a, such that G = n G n. Let Z n = G \ G n. Define the conductance from a to infinity and resistance from a to infinity as Thus, the theorem is: C eff (a, ) = lim n C eff(a, Z n ) and R eff (a, ) = C eff (a, ) 1. Theorem The weighted random walk in a network G is recurrent if and only if the resistance from some vertex a to infinity is infinite. Number of exercises in lecture: 1 Total number of exercises until here: 20

Random Walks Ariel Yadin
Lecture 12: Network Reduction

Network Reduction

Recall that $C_{\mathrm{eff}}(a, \infty) = c_a \mathbb{P}_a[T_a^+ = \infty]$. So the effective resistance or conductance to infinity will not help us decide whether $(G, c)$ is recurrent unless we have a way of simplifying a sequence of finite networks $G_n$. We will now describe a few operations that reduce networks to simpler ones without changing the effective conductance between $a$ and $Z$; this will give us the ability to compute probabilities on some networks.

When we wish to differentiate between effective conductances (or resistances) in two networks, we will write $C_{\mathrm{eff}}(a, Z; G)$ and $C_{\mathrm{eff}}(a, Z; G')$.

Parallel Law.

Exercise. Suppose $(G, c)$ is a network with multiple edges. Let $(G', c')$ be the network without multiple edges where the weight $c'(x, y)$ is the sum of all weights between $x$ and $y$ in $(G, c)$; that is,
$$c'(x, y) = \sum_{e \in E(G),\ e^+ = x,\ e^- = y} c(e).$$
Then $(G', c')$ is a network without multiple edges, and the weighted random walk on $(G', c')$ has the same distribution as the weighted random walk on $(G, c)$. Specifically, for all $a, Z$ the effective conductance between $a$ and $Z$ does not change.

Solution. This is just the fact that the transition probabilities for $(G, c)$ and $(G', c')$ are proportional to each other:
$$P(x, y) \propto \sum_{e\,:\, e^+ = x,\ e^- = y} c(e) = c'(x, y) \propto P'(x, y).$$

[Figure 3. Parallel Law: two parallel edges between $x$ and $y$ with conductances $c_1$ and $c_2$ are replaced by a single edge of conductance $c_1 + c_2$.]

Series Law.

Proposition 12.1 (Series Law). Let $(G, c)$ be a network. Suppose there exists $w$ that has exactly two adjacent vertices $u_1, u_2$. Let $(G', c')$ be the network given by $V(G') = V(G) \setminus \{w\}$ and
$$c'(x, y) = \begin{cases} c(x, y) & x, y \in V(G'),\ \{x, y\} \neq \{u_1, u_2\}, \\[4pt] c(u_1, u_2) + \dfrac{1}{r(u_1, w) + r(u_2, w)} & \{x, y\} = \{u_1, u_2\}. \end{cases}$$
That is, we remove the edges $u_1 \sim w$ and $u_2 \sim w$ and add weight $\frac{1}{c(u_1, w)^{-1} + c(u_2, w)^{-1}}$ to the edge $u_1 \sim u_2$ (which may have originally had weight $0$). Then, for any $a, Z$ such that $w \notin \{a\} \cup Z$, and such that the component of $a$ in $G \setminus Z$ is finite, we have that
$$C_{\mathrm{eff}}(a, Z; G) = C_{\mathrm{eff}}(a, Z; G').$$

Proof. Let $(G, c'')$ be a network identical to $(G, c)$ except that $c''(u_1, w) = c''(u_2, w) = 0$ and $c''(u_1, u_2) = c(u_1, u_2) + C$. We want to calculate $C$ so that any function that is harmonic at $u_1, w$ on $G$ will be harmonic at $u_1$ on $(G, c'')$ as well.

Let $f : G \to \mathbb{R}$ be harmonic at $u_1, w$ on $G$. If $f(u_1) = f(w)$, then harmonicity at $w$, together with the fact that $w$ is adjacent only to $u_1, u_2$, gives that $f(u_1) = f(w) = f(u_2)$. So the weights of the edges between $u_1, u_2, w$ do not affect harmonicity of the function, and can be changed.

75 75 Hence, we assume that f(u 1 ) f(w). Let h = h(w) = 0 and h(u 1 ) = 1. Harmonicity at u 1 gives that f f(w) f(u 1) f(w). So h is harmonic at u 1, w and c(u 1, y)(h(u 1 ) h(y)) = c(u 1, w)(h(u 1 ) h(w)) = c(u 1, w). y w Harmonicity at w gives c(u 1, w) + c(u 2, w)h(u 2 ) = 0. Thus, in order for h to be harmonic at u 1 on G, we require that 0 = ( c(u 1, y)(h(u 1 ) h(y)) + C(h(u 1 ) h(u 2 )) = c(u 1, w) + C 1 + c(u ) 1, w). c(u 2, w) y w This leads to the equation C = c(u 1, w) c(u 2, w) c(u 1, w) + c(u 2, w) = 1 r(u 1, w) + r(u 2, w). Thus, we have shown that choosing the weight 1 r(u 1,w)+r(u 2,w) as above, we get that if f is harmonic at u 1, w on G, then f is also harmonic at u 1 on G. Taking u 1 to play the role of u 2, the same holds if f is harmonic at u 2 and w on G. Let a, Z be as in the proposition. Let D be the component of a in G \ Z. Let v be a unit voltage imposed on a and Z in D. Since we chose the weight on u 1 u 2 in G correctly, we get that v is also a unit voltage imposed on a and Z in G. Because C eff (a, Z; G) = y v(a, y) and similarly in G, and since G \ Z and G \ Z only differ at edges adjacent to u 1, u 2 and w, we have that C eff (a, Z; G) C eff (a, Z; G ) = 0 for all a {u 1, u 2 }. Now, if a = u 1 then we have by harmonicity of v at w, (c(u 1, w) + c(u 2, w))v(w) = c(u 1, w)v(a) + c(u 2, w)v(u 2 ). Since the only difference is on edges adjacent to u 1, u 2 and w, C eff (a, Z; G) C eff (a, Z; G ) = c(a, w)(v(a) v(w)) = = 0. 1 r(u 1, w) + r(u 2, w) (v(a) v(u 2)) c(u 1, w) c(u 1, w) + c(u 2, w) ((c(u 1, w) + c(u 2, w))(v(a) v(w)) + c(u 2, w)(v(a) v(u 2 )))

Remark. Note that if $w$ has exactly two neighbors in a network $(G, c)$ as above, with resistances $r_1, r_2$ on these edges, then the network with these two resistors exchanged for a single resistor of resistance $r_1 + r_2$ is an equivalent network, in the sense that effective resistances and conductances do not change, as above.

[Figure 4. Series Law: a path $u_1 \sim w \sim u_2$ with conductances $c_1, c_2$ is replaced by a single edge $u_1 \sim u_2$ of conductance $(c_1^{-1} + c_2^{-1})^{-1}$.]

Example. What is the effective conductance between $a$ and $z$ in the following network?

[Figure: a small network on $a$ and $z$ is reduced step by step using the series and parallel laws; the edge labels appearing along the reduction (conductances such as $1/2$, $1/3$, $3/8$, $3/2$, $3/5$, $17/24$) record the intermediate networks.]
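Reductions like the example above can be cross-checked against a direct computation of the unit voltage. A minimal sketch (mine, not the network of the figure; numpy assumed): a two-edge series path from $a$ to $z$ together with a direct edge, reduced by hand via the series and parallel laws, and compared with $C_{\mathrm{eff}}$ computed from the unreduced network.

```python
import numpy as np

def effective_conductance(c, a, z):
    """C_eff(a, {z}) for a finite network, via the unit voltage (v(a)=1, v(z)=0)."""
    n = len(c)
    P = c / c.sum(axis=1)[:, None]
    free = [x for x in range(n) if x not in (a, z)]
    A = np.eye(len(free)) - P[np.ix_(free, free)]
    b = P[np.ix_(free, [a])].flatten()
    v = np.zeros(n); v[a] = 1.0
    if free:
        v[free] = np.linalg.solve(A, b)
    return sum(c[a, x] * (v[a] - v[x]) for x in range(n))

# a --c1-- w --c2-- z, plus a direct edge a --c3-- z (illustrative weights).
c1, c2, c3 = 1.0, 2.0, 0.5
a, w, z = 0, 1, 2
c = np.zeros((3, 3))
c[a, w] = c[w, a] = c1
c[w, z] = c[z, w] = c2
c[a, z] = c[z, a] = c3

by_reduction = c3 + 1.0 / (1.0 / c1 + 1.0 / c2)   # series law, then parallel law
print("series + parallel reduction:", by_reduction)
print("direct computation:         ", effective_conductance(c, a, z))
```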

77 Contracting Equal Voltages. Exercise Let (G, c) be a network, and let v be a unit voltage imposed on a and Z. Suppose x, y {a} Z are such that v(x) = v(y). Define (G, c ) by contracting x, y to the same vertex; that is: V (G ) is V (G) with the vertices x, y removed and a new vertex xy instead. All edges and weights stay the same, except for those adjacent to x or y, for which we have c (xy, w) = c(x, w) + c(y, w) for all w. Then, v is a unit voltage imposed on a and Z in G (where v(xy) := v(x) = v(y)), and the effective conductance between a and Z does not change: C eff (a, Z; G) = C eff (a, Z; G ). Solution. Since the only change is at edges adjacent to x and y, we only need to check that for w = xy or w xy such that w {a} Z, v is harmonic at w in G. For w xy, c (w, u)(v(w) v(u)) = c(w, u)(v(w) v(u)) + (c(w, x) + c(w, y))(v(w) v(xy)) u u x,y = u c(w, u)(v(w) v(u)), where we have used that v(xy) = v(x) = v(y). So if v is harmonic at w in G then v is harmonic at w in G. Similarly, for w = xy, c (xy, u)(v(xy) v(u)) = u u c(x, u)(v(x) v(u)) + u c(y, u)(v(y) v(u)), so v is harmonic at xy in G. Example What is the effective conductance between a and z in the following network:

[Figure: the network from the preceding example, reduced using contraction of equal-voltage vertices together with the parallel and series laws; edge labels such as $2$, $1/2$, $2/3$ are conductances appearing along the reduction.]

Exercise. Let $(G, c)$ be a network, and let $v$ be a unit voltage imposed on $a$ and $Z$. Suppose $x, y \notin \{a\} \cup Z$ are such that $v(x) = v(y)$. Let $c'$ be a new weight function on $G$ that is identical to $c$ except for the edge $x \sim y$: for $x \sim y$ let $c'(x, y) = C \ge 0$ be some arbitrary number, possibly $0$. Let $\Delta'$ be the Laplacian on $(G, c')$. Then, $v$ is harmonic in $G \setminus (\{a\} \cup Z)$ also with respect to $c'$. Conclude that the effective conductance between $a$ and $Z$ is the same in both $(G, c)$ and $(G, c')$.

Solution. Since the difference is only the edge $x \sim y$, we only need to check that harmonicity is preserved at $x$ and $y$. Because $v(x) - v(y) = 0$, for $z \in \{x, y\}$,
$$c'_z \Delta' v(z) = \sum_w c'(z, w)(v(z) - v(w)) = \sum_{w \,:\, \{z, w\} \neq \{x, y\}} c(z, w)(v(z) - v(w)) + c'(x, y)(v(x) - v(y)) = c_z \Delta v(z).$$

79 79 Thus v is a unit voltage imposed on a and Z with respect to c as well. Also, C eff (a, Z; (G, c )) = c a v(a) = c a v(a) = C eff (a, Z; (G, c)). Example The network from the previous example can be reduced by removing the vertical edge. Exercise Let G = (V, c) be a network such that V = Z and x y if and only if x y = 1. For the weighted random walk (X t ) t on G define V t (x) = t n=0 1 {Xn=x} the number of visits to x up to time t. Let T + 0 = inf {t 1 : X t = 0}. Calculate E 0 [V T + (x)] as a function of c only. 0 Number of exercises in lecture: 4 Total number of exercises until here: 24

80 80 Random Walks Ariel Yadin Lecture 13: Thompson s Principle Suppose G is a network. We think of weights as conductances, so it seems intuitive that increasing the conductance of edges would result in making the graph more transient. This is what we prove in this lecture Thomson s Principle Definition For F L 2 (E) and for v such that v L 2 (E) define the energy of F and of v by E(F ) := F, F = e r(e)f (e) 2 and E(v) := v, v = x y c(x, y)(v(x) v(y)) 2. Note that if v L 2 (V ) then E(v) = 2 v, v by the duality formula. Joseph John Thomson ( ) Lemma 13.2 (Thomson s Principle / Dirichlet Principle). Let G = (V, c) be a finite network, let A, Z be disjoint subsets. The unit voltage v is the function that minimizes the energy E(f) over all functions f with f(a) = 1 for all a A and f(z) = 0 for all z Z. Proof. By the duality formula we have that for any f, f C 0 (V ), f, f = 1 2 f, f = f, f. (That is, the Laplacian is self-dual.) Since f v = 0 on A Z, and since v is harmonic off A Z, we get that (f v) v 0. So, (f v), v = f v, v = x c x (f(x) v(x)) v(x) = 0. This implies E(f) = 2 (f v + v), f v + v = E(f v) + E(v) E(v), where we have used that the energy is always non-negative. Johann Dirichlet ( )

81 81 Lemma 13.3 (Thomson s Principle - Dual Form). Let G be a finite network, let {a}, Z be disjoint subsets. Let v(x) = 1 2 g Z(x, a) where g Z (x, a) is the Green function (the expected number of visits to a started at x from time 0 until before hitting Z). Then, over all flows F from a to Z with flow divf (a) = 1, the energy E(F ) is minimized at I = v. Proof. First, we know that v is a voltage on a and Z with v(z) = 0 for all z Z. Also, divi(a) = 2 v(a) = 1. Let F be a flow from a to Z with divf (a) = 1. Then, F I is a flow from a to Z with div(f I)(a) = 0. Since div(f I) is 0 off Z, and v is 0 on Z, we get that div(f I)v 0. Thus, F I, I = div(f I), v = 0. So, E(F ) = E(F I + I) = E(F I) + E(I) E(I). Corollary 13.4 (Rayleigh s Monotonicity Principle). Let G be a finite network, and let a be a point not in a subset Z. Suppose c is a weight function on G such that c c. Then, C eff (a, Z; c) C eff (a, Z; c ). Proof. Let v be the unit voltage imposed on a and Z with respect to c, and let u be the unit voltage imposed on a and Z with respect to c. Note that John William Strutt, 3rd Baron Rayleigh ( ) E(v) = 2 v, v = 2 x c x v(x)v(x) = 2c a v(a) = 2C eff (a, Z; c), because v(x) = 0 for x {a} Z, v(z) = 0 for z Z, and v(a) = 1. Similarly, E(u) = 2C eff (a, Z; c ). (This fact is called conservation of energy.) Since c a c a, using Thomson s principle, C eff (a, Z; c) = 1 2 c(x, y)(v(x) v(y)) c(x, y)(u(x) u(y)) 2 x,y 1 2 x,y x,y c (x, y)(u(x) u(y)) 2 = C eff (a, Z; c ).
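Rayleigh's monotonicity principle (Corollary 13.4) is easy to test numerically: increasing a single conductance should not decrease $C_{\mathrm{eff}}$. A sketch (my own illustration, assuming numpy; the random network and the edge whose conductance gets increased are arbitrary choices):

```python
import numpy as np

def effective_conductance(c, a, z):
    n = len(c)
    P = c / c.sum(axis=1)[:, None]
    free = [x for x in range(n) if x not in (a, z)]
    A = np.eye(len(free)) - P[np.ix_(free, free)]
    b = P[np.ix_(free, [a])].flatten()
    v = np.zeros(n); v[a] = 1.0
    if free:
        v[free] = np.linalg.solve(A, b)
    return sum(c[a, x] * (v[a] - v[x]) for x in range(n))

rng = np.random.default_rng(5)
n, a, z = 6, 0, 5

# A random connected illustrative network: a path 0-1-...-5 plus a few extra edges.
c = np.zeros((n, n))
for x in range(n - 1):
    c[x, x + 1] = c[x + 1, x] = rng.uniform(0.5, 2.0)
for _ in range(4):
    x, y = rng.choice(n, size=2, replace=False)
    w = rng.uniform(0.5, 2.0)
    c[x, y] += w; c[y, x] += w

base = effective_conductance(c, a, z)
c2 = c.copy()
c2[1, 2] += 5.0; c2[2, 1] += 5.0          # raise one conductance
after = effective_conductance(c2, a, z)
print("C_eff before:", base)
print("C_eff after increasing c(1,2):", after)
print("monotone?", after >= base - 1e-12)
```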

82 82 Corollary Let G be an infinite network. Let c be a weight function on G such that c c. If (G, c) is transient, then also (G, c ) is transient. Proof. Fix a vertex o G. For every n, let G n be the ball of radius n around o; that is G n = {x G : dist(x, o) n}. So (G n ) n form an increasing sequence of subgraphs that exhaust G. Let Z n = G n+1 \ G n, which is the outer boundary of the ball of radius n. We know that G is transient, which is equivalent to lim R eff(a, Z n ; c) < n (because imposing a unit voltage on a and G \ G n is the same as imposing a unit voltage on a and Z n ). Now, for each fixed n, since c c, considering the finite networks (G n, c) and (G n, c ), we have that C eff (a, Z n ; c) C eff (a, Z n, c ). Thus, so (G, c ) is transient. lim R eff(a, Z n ; c ) lim R eff(a, Z n ; c) <, n n Exercise Let H be a subgraph of a graph G (not necessarily spanning all vertices of G. Show that if the simple random walk on H is transient then so is the simple random walk on G Shorting Another intuitive network operation that can be done is to short two vertices. This can be thought of as imposing a conductance of between them. Since this increases the conductance, it is intuitive that this will increase the effective conductance. Proposition Let (G, c) be a finite network. Let b, d G and define (G, c ) by shorting b and d: Let V (G ) = V (G) \ {b, d} {bd} and c (z, w) = c(z, w) for z, w {b, d} and c (bd, w) = c(b, w) + c(d, w). Then, for any disjoint sets {a}, Z, we have that C eff (a, Z; G) C eff (a, Z; G ).

83 83 Proof. Let v be the unit voltage imposed on a and Z with respect to c, and let u be the unit voltage imposed on a and Z with respect to c. Conservation of energy tells us that 2C eff (a, Z; c) = c(x, y)(v(x) v(y)) 2 and 2C eff (a, Z; c ) = c (x, y)(u(x) u(y)) 2. x,y x,y Note that u can be viewed as a function on V (G) by setting u(b) = u(d) = u(bd). Since c a c a, using Thomson s principle, C eff (a, Z; c ) = 1 2 c (x, y)(u(x) u(y)) 2 + c (bd, w)(u(bd) u(w)) 2 x,y G\{b,d} w c(x, y)(u(x) u(y)) 2 + c(k, w)(u(k) u(w)) 2 = 1 2 x,y G\{b,d} = 1 2 x,y G 1 2 x,y G c(x, y)(u(x) u(y)) 2 Number of exercises in lecture: 1 Total number of exercises until here: 25 k=b,d w c(x, y)(v(x) v(y)) 2 = C eff (a, Z; c).

84 84 Random Walks Ariel Yadin Lecture 14: Nash-Williams A Probabilistic Interpretation of Current Proposition Let (G, c) be a network. Let a G and Z G such that the component of a in G \ Z is finite. For the weighted random walk on G, (X t ) t, and for any edge x y, let V x,y be the number of times the walk goes from x to y until hitting Z; that is, Then, V x,y := where v is a unit voltage imposed on a, Z. Proof. Let T Z 1 {Xk 1 =x,x k =y}. k=1 E a [V x,y V y,x ] = v(x, y) R eff (a, Z), g(x) = c a g Z (a, x) = c T Z 1 a E a 1 {Xk =x} = g Z (x, a). c x c x We have already seen that g is harmonic in G \ ({a} Z). Also, g(z) = 0 for all z and g(a) = k=0 1 P a [T Z < T + a ] = c a R eff (a, Z). (g is a voltage imposed on a, Z with g(z) = 0 for all z Z.) Now, Thus, E a [V x,y ] = = That is, since v = P a [X k 1 = x, X k = y, T Z > k 1] k=1 P a [X k = x, T Z > k]p (x, y) = 1 c x g(x)p (x, y) = 1 g(x)c(x, y). c a c a k=0 g g(a) E a [V x,y V y,x ] = 1 c a c(x, y)(g(x) g(y)). is a unit voltage imposed on a, Z, and since c 1 a g = R eff (a, Z)v, E a [V x,y V y,x ] = R eff (a, Z) c(x, y)(v(x) v(y)).

85 The Nash-Williams Criterion Definition Let G be a graph. Let A, Z be disjoint subsets. A subset of edges Π is a cut between A and Z if any path γ : a z with a A and z Z must pass through and edge in Π. A subset of edges Π is a cutest, (sometimes a cut between A and ), if any infinite simple path that starts at a A, must pass through an edge in Π. One intuitive statement is that if e is a cut edge between a and Z, then R eff (a, Z) r(e), because there is at least some more resistance between a and Z. Proposition Let (G, c) be a finite network. Let {a}, Z be disjoint subsets and let e be a cut edge between a and Z. Then R eff (a, Z) r(e). Proof. Suppose that e = (x, y). Let V x,y be the number of times a random walk crosses the edge (x, y) until hitting Z and let V y,x be the number of times the walk crosses y, x before hitting Z. We have seen that where v is a unit voltage imposed on a and Z. E a [V x,y V y,x ] = v(x, y) R eff (a, Z), Because G is finite, we know by uniqueness of harmonic functions that v(x) = P x [T a < T Z ]. Because (x, y) is a cut edge between a and Z, to get from y to a the walk must pass through x; that is, So 0 v(x) v(y) 1. v(y) = P y [T a < T Z ] = P y [T x < T Z ] P x [T a < T Z ] v(x). Now, since (x, y) is a cut edge between a and Z, we must have that V x,y V y,x 1, because the walk must cross the edge (x, y) and every time it crosses back over (y, x) it must return to cross (x, y). Thus, 1 c(x, y)(v(x) v(y))r eff (a, Z) c(x, y)r eff (a, Z). If Π is a cut between a and Z then shorting all edges in Π would result in a cut edge of conductance at most e Π c(e). A natural generalization of the above is the following. Lemma 14.4 (Nash-Williams Inequality). Let (G, c) be a finite network, and {a}, Z disjoint sets. Suppose that Π 1,..., Π k are k pairwise disjoint cuts between a and Z. Then, Crispin Nash-Williams ( )

86 86 R eff (a, Z) k c(e) e Π j j=1 1. Proof. Note that since removing edges from a cut-set only increases the right hand side, we can prove the lemma with the assumption that cut-sets are minimal. Specifically, they do not contain both (x, y) and (y, x) for an edge x y. Let v be a unit voltage imposed on a and Z. We know (conservation of energy) that 1 2 E(v) = C eff (a, Z). For an edge (x, y) let V x,y be the number of crossings from x to y until hitting Z; that is, V x,y = T Z 1 {Xk 1 =x,x k =y}. k=1 Then, for any minimal cut Π between a and Z, we have that P a -a.s. Also, we have that for any edge (x, y), V x,y V y,x 1. (x,y) Π E a [V x,y V y,x ] = v(x, y) R eff (a, Z). Thus, applying Cauchy-Schwarz, for any cut Π between a and Z, 1 R eff (a, Z) 2 (x,y) Π v(x, y) That is, for any one of the cuts Π j, c(e) e Π j 1 2 R eff (a, Z) 2 R eff (a, z) 2 (x,y) Π (x,y) Π c(x, y) (x,y) Π c(x, y)(v(x) v(y)) 2. c(x, y)(v(x) v(y)) 2. Since the cuts Π j are disjoint, and since we assumed that the cut does not contain both (x, y) and (y, x) (because they are minimal), we have that k c(e) e Π j j=1 1 R eff (a, z) c(x, y)(v(x) v(y)) 2 = R eff (a, Z). x,y

Corollary 14.5 (Nash-Williams Criterion). Let $(G, c)$ be an infinite network. If $(\Pi_n)_n$ is a sequence of pairwise disjoint finite cutsets between $a$ and $\infty$ such that
$$\sum_{n=1}^{\infty} \Big( \sum_{e \in \Pi_n} c(e) \Big)^{-1} = \infty,$$
then $(G, c)$ is recurrent.

Proof. Fix $n$. Let $G_n$ be the subnetwork induced by $(G, c)$ on the smallest ball (in the graph metric) that contains $\bigcup_{j=1}^{n} \Pi_j$. Let $Z_n = G \setminus G_n$. So $(G_n)_n$ exhaust $G$ and for each fixed $n$,
$$R_{\mathrm{eff}}(a, Z_n) \ge \sum_{j=1}^{n} \Big( \sum_{e \in \Pi_j} c(e) \Big)^{-1}.$$
Letting $n \to \infty$, the left hand side tends to $R_{\mathrm{eff}}(a, \infty)$ and the right hand side tends to the infinite sum. Since $R_{\mathrm{eff}}(a, \infty) = \infty$, $(G, c)$ is recurrent.

Example. We now give a proof that $\mathbb{Z}$ and $\mathbb{Z}^2$ are recurrent. Recall that we could prove this by showing that
$$\mathbb{P}_0[X_{2t} = 0] \ge \mathrm{const} \cdot \begin{cases} 1/\sqrt{t} & d = 1, \\ 1/t & d = 2. \end{cases}$$
However, it will be easier to do this without these calculations (especially in the more complicated $\mathbb{Z}^2$ case).

For $\mathbb{Z}$ this is easy, because $\mathbb{Z}$ is just composed of edges in series, so for any $n > 0$, $R_{\mathrm{eff}}(0, \{-n, n\}) = n/2$.

Now for $\mathbb{Z}^2$: By the Nash-Williams criterion, it suffices to find disjoint cutsets $(\Pi_n)_n$ such that $\sum_n |\Pi_n|^{-1} = \infty$. Indeed, taking
$$\Pi_n = \{ (x, y) : |x| = n,\ |y| = n + 1 \}$$
we have that $|\Pi_n| = 4(2n + 1)$.
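The cutsets in the $\mathbb{Z}^2$ example above give a quantitative bound: all conductances are $1$ and $|\Pi_n| = 4(2n+1)$, so by the Nash-Williams inequality $R_{\mathrm{eff}}(0, Z_N) \ge \sum_{n \le N} \frac{1}{4(2n+1)}$, which grows like $\frac{1}{8}\log N$. A tiny sketch (not from the notes, assuming numpy) that simply evaluates this bound:

```python
import numpy as np

# Nash-Williams lower bound for the simple random walk on Z^2, using the disjoint
# cutsets Pi_n between the spheres of radius n and n+1, with |Pi_n| = 4(2n+1).
def nash_williams_bound(N):
    return sum(1.0 / (4 * (2 * n + 1)) for n in range(1, N + 1))

for N in (10, 100, 1000, 10_000):
    print(f"N = {N:6d}   R_eff lower bound = {nash_williams_bound(N):.4f}"
          f"   (1/8) log N = {np.log(N) / 8:.4f}")
# The bound diverges (like (log N)/8), so R_eff(0, infinity) = infinity and Z^2 is recurrent.
```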

88

ADD: example that Nash-Williams is not necessary.

Number of exercises in lecture: 0
Total number of exercises until here: 25

89 89 Random Walks Ariel Yadin Lecture 15: Flows Finite Energy Flows The Nash-Williams criterion was a sufficient condition for recurrence. We now turn to a stronger condition which is necessary and sufficient. Let (G, c) be an infinite weighted graph. Recall that a flow from A to Z is an anti-symmetric function with vanishing divergence off A Z. In this spirit, we say that F is a flow from o G to if F is anti-symmetric. divf (o) 0 and divf (x) = 0 for all x o. If in addition divf (o) = 1 we say that F is a unit flow from o to infinity. Theorem 15.1 (T. Lyons, 1983). A weighted graph (G, c) is transient if and only if there exists a finite energy flow on (G, c) from some vertex o G to. Proof. The proof is an adaptation of a method of H. Royden in the continuous world. Assume that F is a flow from o to. By changing F F/divF (o) we can assume without loss of generality that F is a unit flow. For each n let G n be the finite subnetwork of G induced on the ball of radius n around o (in the graph metric). Let Z n = G n \ G n 1. Transience of G is equivalent to Terrence Lyons lim R eff(o, Z n ) <. n Let v n = 1 2 g Z n (x, o), where g Zn is the Green function on the finite network G n. Since div v n (o) = 2 v n (o) = 1, the dual version of Thompson s principle tells us that E(F ) E Gn (F ) E Gn (v n ). Also, since v n (o) = 1 2 g Z n (o, o) = 1 2 P o [T Zn < T + o ] = c o 2 R eff(o, Z n )

90 90 and v n (o) = 1 2, we get that E Gn (v n ) = 2 x c x v n (x)v n (x) = 2c o v n (o)v n (o) = c2 o 2 R eff(o, Z n ). Thus, if F has finite energy on G then and (G, c) is transient. lim R eff(o, Z n ) = 2 n c 2 lim E G n (v n ) 2 n 0 c 2 E(F ) < o For the other direction, assume that (G, c) is transient and consider the functions v n (x) = P x [T o < T Zn ] and v(x) = P x [T o < ]. v n is a unit voltage imposed on o and Z n in G n, and v n (x) v(x) for every x by monotone convergence. Note that v(o) = v n (o) = 1 and v, v n are non-constant because (G, c) is transient. Let I n = v n and I = v. Note that for every edge e, I n (e) 2 r(e) I(e) 2 r(e). Also, for every n, since G n is finite, E(I n ) = 2 v n, v n = 2c o v n (o)v n (o) = 2C eff (o, Z n ) 2c o <. Thus, Fatou s lemma (for sums) tells us that E(I) = e Since I = v, we have that lim I n(e) 2 r(e) lim inf n n I n (e) 2 r(e) 2c o <. e divi(o) = 2 v(o) = 2 y P (o, y)(1 P y [T o < ]) = 2 P o [T + o = ] > 0 by transience, and divi(x) = 2 v(x) = 0 for all x o. That is, I is a flow from o to with finite energy Flows on Z d We now want to give some more details about random walks on Z d. We start with a proof that Z d is transient for d 3. By Rayleigh s monotonicity principle it suffices to prove that Z 3 is transient. By Lyon s Theorem it suffices to provide a finite energy flow on Z 3. Let µ be a probability measure on paths on some graph G. Let Γ denote the random path, and suppose that µ-a.s. every vertex of G is visited finitely many times. Then, we can define V (x, y) to be the number of times Γ crosses the edge (x, y), and E µ (x, y) to be the expectation of V (x, y) under µ.

91 91 Claim Suppose that Γ is infinite and Γ 0 = o µ-a.s. Suppose also that E µ (x, y) < for every edge (x, y). Then, F (x, y) := E µ (x, y) E µ (y, x) is a flow from o to. Proof. Anti-symmetry is clear. Also, for any x o, since Γ is infinite, it cannot terminate at x. Thus, every time Γ crosses an edge (y, x) it must then cross an edge (x, z) immediately after. Thus, µ-a.s. y x V (x, y) V (y, x) = 0 and so divf (x) = 0. Also, since Γ 0 = o, we get one extra passage out of o, but the rest must cancel: c o divf (o) = 2 y F (o, y) = 2. That is, to show a graph is transient, we need to construct a measure on infinite paths, that start at some vertex, and the expected number of visits to any vertex is finite. If the energy is finite for such a measure, we have transience Wedges. Let us prove something a bit more general than Z 3 being transient and Z 2 being recurrent. Let ϕ : N N be an increasing function. Consider the subgraph of Z 3 induced on W ϕ = { (x, y, z) Z 3 : z ϕ( x ) }. (This is the ϕ-wedge.) Theorem 15.3 (T. Lyons 1983). If n=1 1 n(ϕ(n) + 1) = then W ϕ is recurrent. If ϕ(n + 1) ϕ(n) 1 and then W ϕ is transient. n=1 1 n(ϕ(n) + 1) < Proof. The first direction is simpler. Let W ϕ be a wedge, and let B n denote the ball of radius n around 0 in the graph metric (which is the L 1 distance in R 3 ). Let B n be the edges connecting B n to Bn. c Thus, B n form disjoint cutsets between 0 and. What is the size of B n? Well there are at most 2n choices for x and then, given x there are at most 2(ϕ( x ) + 1) 2(ϕ(n) + 1) choices for z, which then determines y up to sign. Thus, the

92 92 size is bounded by B n O(n(ϕ(n) + 1)). So Nash-Williams tells us that if 1 n(ϕ(n) + 1) = the walk is recurrent. n=1 Now for the other direction. We define a measure on paths in W ϕ. Let U 1, U 2 be uniformly chosen on [0, 1] independently. Let L be the set {(n, Un, U ϕ(n)) : n N}. Choose a monotone path Γ in W ϕ that is always at distance at most 1 from L. (A monotone path γ is a path in Z 3 such that dist(γ t+1, 0) dist(γ t, 0).) Fix an edge e in W ϕ and suppose that (x, y, z) is an endpoint of e. Let R = x + y + z. The event that e Γ implies that (x, y, z) is at distance at most 1 from L. That implies that (where we have used that ϕ(n) n). Also 3n n + Un + U ϕ(n) R 1 n R 1 3 x n + y Un + z U ϕ(n) 1 so nu [y 1, y + 1] and ϕ(n)u [z 1, z + 1]. Thus, µ[e Γ] 4 n(ϕ(n) + 1). Because Γ visits any edge at most once, this is also a bound on E µ (e). Since there are at most O(R(ϕ(R) + 1)) O(n(ϕ(n) + 1)) such possibilities for (x, y, z) W ϕ, we have that the energy of the flow is at most 2 R x + y + z =R e : e + =(x,y,z) = const. n 1 n(ϕ(n) + 1). E µ (e) 2 const. n(ϕ(n) + 1) n 1 n 2 (ϕ(n) + 1) 2 Since this is finite, the flow is of finite energy and the wedge is transient. Example For example, if we choose ϕ(n) = n ε, we get a transient wedge. This is also true if we take ϕ(n) = (log n) 2. If we choose ϕ(n) = 1 we get essentially Z 2 and recurrence, of course. Also, ϕ(n) = log n will give a divergent sum, so this wedge is recurrent. Number of exercises in lecture: 0 Total number of exercises until here: 25
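The flows used in this lecture can also be probed by simulation. Below is a hedged Python sketch (the construction and all names are ours, and the path is generated greedily, not exactly as in the proof): it samples a direction uniformly from the simplex in the positive octant, follows it with a monotone nearest-neighbour path as in the wedge argument, estimates the edge-crossing probabilities μ(e ∈ Γ) by Monte Carlo, and sums their squares over edges in a box. This approximates the energy of the induced unit-strength flow on Z^3; the partial sums stabilising as the box grows is the numerical shadow of Theorem 15.1.

```python
# Monte Carlo sketch: energy of the "follow a random direction" flow on Z^3.
# Sample a direction d from the simplex {d1 + d2 + d3 = 1, di > 0}, follow the
# ray t*d with a greedy monotone nearest-neighbour path, and estimate the
# probability that each directed edge is used.  The flow is F(e) = mu(e in Gamma)
# and its energy is sum_e F(e)^2 (unit conductances).
import random
from collections import defaultdict

def monotone_path(direction, n_steps):
    """Greedy monotone lattice path in Z^3 staying close to the ray t*direction."""
    pos, path = [0, 0, 0], []
    for t in range(1, n_steps + 1):
        # move in the coordinate that lags the target point t*direction the most
        i = max(range(3), key=lambda j: t * direction[j] - pos[j])
        path.append((tuple(pos), i))       # directed edge: (tail, coordinate incremented)
        pos[i] += 1
    return path

def estimate_energy(n_samples=2000, n_steps=60):
    counts = defaultdict(int)
    for _ in range(n_samples):
        # uniform direction on the positive simplex via normalized exponentials
        e1, e2, e3 = (random.expovariate(1.0) for _ in range(3))
        s = e1 + e2 + e3
        for edge in monotone_path((e1 / s, e2 / s, e3 / s), n_steps):
            counts[edge] += 1
    # partial energy restricted to edges whose tail is within L1-distance R of 0
    for R in (10, 20, 40):
        energy = sum((c / n_samples) ** 2
                     for (tail, _), c in counts.items() if sum(tail) < R)
        print(f"partial energy up to radius {R}: {energy:.3f}")

estimate_energy()
```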

93 93 Random Walks Ariel Yadin Lecture 16: Resistance in Euclidean Lattices Let s wrap up our discussions with some examples of random walks on graphs Euclidean lattices We have already seen that Z d are transient for d 3 and recurrent for d 2. We saw two different methods to prove this. The first was brute force computation of P 0 [S t = 0], using Stirling s formula, and then approximating E 0 [V t (0)] and E 0 [V (0)]. The second method was more robust, and less computational. It involved approximating the energy of certain flows, mainly taking a uniform direction and following that direction with a path in the lattice. Energy estimates and the Nash-Williams inequality can give us better control of the effective resistance and Green function Resistance Estimates. Since Z d, d 3, is transient, we know that R eff (0, B n ) is bounded by two constants, where B n is the boundary of the ball of radius n around 0. However, for d = 2 we know that R eff (0, B n ). We now investigate the growth rate of this function. Proposition Let Z n = { z Z 2 : dist(0, z) n }. Then, there exist constants 0 < c, C < independent of n such that c log n R eff (0, Z n ) C log n. Proof. The lower bound follows by noting that for the sets Π n = { (z, z ) Z 2 : dist(z, 0) = n 1, dist(z, 0) = n }, all Π 1, Π 2,..., Π n are cuts between 0 and Z n, with size Π n = O(n). So the Nash-Williams inequality gives R eff (0, Z n ) n k=1 n 1 Π k const. 1 = const. log n. k k=1

94 94 For the other direction, let v n (x) = 1 4 g Z n (x, 0). So v n is a voltage imposed on 0 and Z n, with v n (0) = 1 4 and v n(0) = (4 P 0 [T + 0 > T Z n ]) 1 = R eff (0, Z n ). Also, E(v n ) = 8 v n (0)v n (0) = 2R eff (0, Z n ). Let U be a uniform random variable in [0, 1], and let L = { (n, Un) R 2}. Let Γ be some random monotone path from 0 that always is at distance at most 1 from L. For any edge e = (x, y) in Z 2, the event e Γ implies that x n 1 and nu [y 1, y + 1]. Thus, the expected number of times Γ crosses e is at most 2 n 2 x 1. Let F n be the flow given by this random path restricted to G \ Z n. Since the number of edges with endpoint at distance n from 0 is O(n), E(F n ) n O(k k 2 ) = O(log n). k=1 Recall that divf n (0) = 1/2, so Thompson s principle tells us that for I = v n, since I is a current with divi(0) = 2 v n (0) = 1 2, E(F n ) E(I) = E(v n ) = 2R eff (0, Z n ). Remark If we tried to adapt the argument above to Z d, we would see that the probability that an edge e at distance n from 0 is in Γ is at most O(n (d 1) ) (because we would be looking at the direction (n, U 1 n, U 2 n,..., U d 1 n) for U 1,..., U d i.i.d. ). Thus, n n R eff (0, Z n ) 2 1 E(F n ) O(k d 1 k 2(d 1) ) = O(k 1 d ) = O(n 2 d ). k=1 k=1 Similarly the lower bound would follow from the Nash-Williams inequality Regular Trees Let T d denote the d regular tree. Fix some vertex ρ T d as the root. For n 0 let T n = {x T d : dist(x, ρ) = n}. It is easy to check that T 0 = 1 and T n = d(d 1) n 1 for n 1. For any x, y T n there exists a graph automorphism ϕ : T d T d that maps ϕ(x) = y and fixes each level T n ; i.e. ϕ(t n ) = T n. Thus, if v n is a unit voltage imposed on ρ and T n, we have that v n is constant on T k for k n. Thus, all vertices in each level T k can be shorted into

95 95 one vertex, without changing the effective resistance R eff (ρ, T n ). This gives us a network whose vertices are {0, 1,..., n}, and resistances r(k, k + 1) = T k+1 1. Thus, the effective resistance is Thus, R eff (ρ, ) = R eff (ρ, T n ) = 1 d n k=1 1 (d 1) k 1 = d 1 d(d 2) (1 (d 1) n). d 1 d(d 2) < so T d is transient for d > A computational proof. We now give a computational proof that the random walk on T d is transient for d > 2. Let (X t ) t be the random walk on T d, and consider the following sequences: D t := dist(x t, ρ) and M t = (d 1) Dt. Let T j be the first time X t T j. First, note that E[M t+1 F t ] = 1 {Dt=0}(d 1) {Dt>0} ( 1 d (d 1)M t + (1 1 d )(d 1) 1 M t ) = 1 {Dt=0}(d 1) {Dt>0}M t. So under P x for x ρ, we have that (M t T0 ) t is a bounded martingale. If P x [T 0 < ] = 1 we would have by the optional stopping theorem that (d 1) dist(x,ρ) = E x [M T0 ] = 1, which is a contradiction. Since we get that T d is transient. P ρ [T + ρ < ] = 1 P x [T 0 < ] < 1, d x T 1 In fact, the above lets us calculate exactly the probability to escape from ρ: If T = T 0 T n then by the optional stopping theorem, for x T 1, so So (d 1) 1 = E x [M T ] = P x [T ρ > T n ] (d 1) n + 1 P x [T ρ > T n ], P x [T ρ > T n ] = d 2 d 1 (d 1) n+1 = 1 (d 1)n 1 1 (d 1) n 1. Also, v(x) = P x [T ρ < T n ] is a unit voltage on ρ and T n. Thus, v(ρ) = 1 d 2 P x [T ρ > T n ] = d d 1 (d 1) n+1. x T 1 C eff (ρ, T n ) = d v(ρ) = d(d 2) d(d 2) = d 1 (d 1) n+1 d 1 (1 (d 1) n) 1,

96 96 which coincides with our calculation above Flows from random paths In this section, we generalize the previous constructions on Z d. Let µ be a probability measure on infinite paths in G started from o G. By mapping each path in the support of µ to its loop-erasure, we may assume without loss of generality that µ is supported on simple paths (paths that do not cross any vertex more than once). Let α, β be two independent random paths of law µ. Now, define F C 0 (E) by F (x, y) = µ((x, y) α) µ((y, x) α) (by e α we mean that there exists n such that e = (α n, α n+1 ). We claim that F is a flow. Indeed, for x o the number of edges going into x in α equals the number of edges exiting x in α. Thus, for x o, divf (x) = y x 1 c x (F (x, y) F (y, x)) = 2 c x E[1 {(x,y) α} 1 {(y,x) α} ] = 0. Similarly for x = o, there is one more edge exiting o than edges entering o. So we have divf (o) = 2 c o. y x Let us calculate the energy of F. First, note that for x y, F (x, y) 2 = (µ((x, y) α) µ((y, x) α)) 2 µ((x, y) α) 2 + µ((y, x) α) 2. Thus, for α, β independent paths of law µ, E(F ) = r(e)f (e) 2 2 r(e)µ(e α) µ(e β) e e = 2 E e r(e)1 {e α} 1 {e β} We conclude: Proposition Let G be a graph. Suppose that G admits a probability measure µ on infinite paths in G started from some fixed o G such that for two independent paths α, β we have E α β <. Then G is transient. The following is an open question. Conjecture Let G be a transitive graph. If the simple random walk on G is transient, then there exists a measure on infinite paths µ started from some fixed o G such that for two independent paths α, β of law µ, there exists ε > 0 with E[e ε α β ] <.
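To get a feel for the proposition above, one can estimate E|α ∩ β| (the expected number of edges used by both of two independent paths) by Monte Carlo for the "follow a random direction" paths on Z^3 from Lecture 15. The sketch below is ours; the path construction and all parameter choices are illustrative, not taken from the notes.

```python
# Sketch: estimate E|alpha ∩ beta| for two independent "random direction" paths
# in Z^3; a finite expected edge intersection implies transience by the
# proposition above.
import random

def monotone_path_edges(n_steps):
    """Directed edges of a greedy monotone path following a random direction."""
    e = [random.expovariate(1.0) for _ in range(3)]
    s = sum(e)
    d = [x / s for x in e]                 # uniform direction on the simplex
    pos, edges = [0, 0, 0], set()
    for t in range(1, n_steps + 1):
        i = max(range(3), key=lambda j: t * d[j] - pos[j])
        edges.add((tuple(pos), i))
        pos[i] += 1
    return edges

def mean_intersection(n_steps, n_pairs=500):
    total = 0
    for _ in range(n_pairs):
        total += len(monotone_path_edges(n_steps) & monotone_path_edges(n_steps))
    return total / n_pairs

for L in (25, 50, 100, 200):
    print(f"path length {L:4d}: mean |alpha ∩ beta| ≈ {mean_intersection(L):.2f}")
# The averages stabilize as L grows, consistent with E|alpha ∩ beta| < infinity
# and hence with transience of Z^3.
```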

97 97 Number of exercises in lecture: 0 Total number of exercises until here: 25

98 98 Random Walks Ariel Yadin Lecture 17: Spectral Analysis Spectral Radius Let (G, c) be a network. Recall that the transition matrix P is an operator on C 0 (V ) that operates by P f(x) = y P (x, y)f(y). Also, recall that the space L2 (V ) is the space of functions f C 0 (V ) that admit f, f = x c x f(x) 2 <. One can easily check that P : L 2 (V ) L 2 (V ). Also, P is a self-adjoint operator, and its norm admits P 1 (this is called a contraction). Proposition Let (G, c) be a weighted graph with transition matrix P. The limit does not depend on x, y. ρ(p ) = lim sup(p n (x, y)) 1/n n Proof. Fix z, w V. We will show that ρ(p ) lim sup n (P n (z, w)) 1/n. Because P is irreducible, we have that for some t, t > 0, P t (z, x) > 0, P t (y, w) > 0. Thus, P n (z, w) P t (z, x)p n t t (x, y)p t (y, w). Since (P t (z, x)) 1/n 1 and (P t (y, w)) 1/n 1, lim sup(p n (z, w)) 1/n lim sup(p n (x, y)) 1/n. n n Exchanging the roles of x, y and z, w we get that the lim sup does not depend on the choice of x, y. Definition 17.2 (Spectral Radius). Let (G, c) be a weighted graph with transition matrix P. Define the spectral radius of (G, c) to be ρ(g, c) = ρ(p ) := lim sup P n (x, x). n
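The lim sup in Definition 17.2 can be watched numerically. For the d-regular tree, the distance from the root is itself a Markov chain on N, so P^n(root, root) can be computed exactly; the sketch below (our own code) does this for d = 3 and compares with the value 2√(d−1)/d that is derived for T_d in Lecture 18.

```python
# Sketch: approximating the spectral radius via return probabilities.
# For the d-regular tree, dist(X_t, root) is a Markov chain on N with
# p(0,1) = 1, p(k,k+1) = (d-1)/d and p(k,k-1) = 1/d for k >= 1, and
# P^n(root, root) equals the return probability of this distance chain.
import math

def return_prob_tree(d, n):
    """P^n(root, root) for the simple random walk on the d-regular tree."""
    # prob[k] = P[dist(X_t, root) = k]; the walk cannot exceed distance n by time n
    prob = [0.0] * (n + 2)
    prob[0] = 1.0
    for _ in range(n):
        new = [0.0] * (n + 2)
        new[1] += prob[0]                    # from the root, always move outward
        for k in range(1, n + 1):
            new[k + 1] += prob[k] * (d - 1) / d
            new[k - 1] += prob[k] * 1 / d
        prob = new
    return prob[0]

d = 3
for n in (10, 50, 200, 1000):
    p = return_prob_tree(d, 2 * n)           # only even times can return
    print(f"2n = {2 * n:5d}   P^(2n)(o,o)^(1/2n) = {p ** (1 / (2 * n)):.4f}")
print("2*sqrt(d-1)/d =", 2 * math.sqrt(d - 1) / d)   # the limit computed in Lecture 18
```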

99 99 One of the reasons for the name spectral radius is that by the Cauchy-Hadamard criterion, the generating function for the Green function has radius of convergence ρ 1. That is, the function g(x, y z) = P n (x, y)z n n=0 converges when z < ρ 1. Note that ρ 1, and that g(x, y z = 1) is exactly the Green function. Since the Green function converges if and only if G is transient, we have that for recurrent graphs ρ = 1. The natural question arises, what are precisely the cases for which ρ = 1? This has been answered by Kesten in his PhD thesis in 1959, see Theorem 18.1 below. The above is a good reason for the radius part of the name spectral radius. The next proposition explains the spectral part of the name. Jacques Hadamard ( ) Proposition Let (G, c) be a weighted graph with transition matrix P. Then, P = ρ(p ). Moreover, for any x, y, Proof. First, note that for any x, y, Thus, ρ(p ) P. P n (x, y) cy c x ρ(p ) n. P n (x, y) = c 1 x P n δ y, δ x c 1 x P n δ y δ x = cy c x P n. The other direction is a bit more complicated. Let f L 2 (V ) have finite support S V. Now, because S is finite, for every ε there exists N = N(ε, S), such that for all n > N, and all x, y S we have that P 2n (x, y) (ρ(p ) + ε) 2n. Thus, for all n > N(ε, S), P n f 2 = P 2n f, f = x,y c x P 2n (x, y)f(x)f(y) c x P 2n (x, y)f(x)f(y)1 {f(x)>0} 1 {f(y)>0} x,y (ρ(p ) + ε) 2n c x f(x)f(y)1 {f(x)>0} 1 {f(y)>0} = C f (ρ(p ) + ε) 2n. x,y Thus, lim sup n P n f 1/n ρ(p ) + ε for any ε, and so lim sup n P n f 1/n ρ(p ). Now, consider the sequence a n = P n f. We have that a 2 n+1 = P n+1 f, P n+1 f = P n f, P n+2 f P n f P n+2 f = a n a n+2.

100 100 That is, b n := an+1 a n is a non-decreasing sequence. Thus, the following limits exist and satisfy So, sup b n = lim b n = lim n n n a1/n n ρ(p ). P f f This holds for all finitely supported f. We want this to hold for all f L 2 (V ). = b 0 sup b n ρ(p ). n We now use the fact that the finitely supported functions are dense in L 2 (V ). Indeed, let f L 2 (V ). Fix ε > 0. Since x c xf(x) 2 <, there exists a finite set S ε V such that x S ε c x f(x) 2 < ε 2. Thus, setting g = f1 Sε, we have that f g 2 < ε 2. Now, since g is finitely supported, and since g f, P f = P (f g) + P g P f g + P g P ε + ρ(p ) g P ε + ρ(p ) f. Taking ε 0, P f ρ(p ) f. Since this holds for all f, we get that P ρ(p ). Exercise Let (G, c) be a weighted graph with transition matrix P. Let ρ(p ) be the spectral radius. Show that if G is recurrent then ρ(p ) = Energy minimization. Let (G, c) be a weighted graph. Consider the functions on G with finite support; i.e. L 0 (V ). These all have finite energy. We want to find the function that minimizes the energy, when normalized to have length 1. Proposition Let (G, c) be a weighted graph. Then E(f) 1 ρ(g) = inf 0 f L 0 (V ) 2 f, f. (Sometimes 1 ρ is called the spectral gap. This is the minimal possible energy of unit length functions.)

101 101 Proof. Note that for f L 0 (V ) we can use duality so that Thus, it suffices to show that 1 2E(f) = f, f = f, f P f, f. ρ = ρ := Now, for any f 0 we have by Cauchy-Schwarz P f, f sup 0 f L 0 (V ) f, f. P f, f P f f P f, f, so ρ P = ρ(p ). On the other hand, since P is self-adjoint, for any f, g L 0 (V ), So Now take g = f P f P f, g = 1 ( P (f + g), f + g P (f g), f g ). 4 P f, g ρ 4 ( f + g, f + g + f g, f g ) = ρ ( f, f + g, g ). 2 f P f = So P f ρ f for all f L 0 (V ). P f. Plugging this in above gives f P f P f, P f ρ 2 ( f, f + f 2 P f, P f P f 2 ) = ρ f 2. Using the fact that L 0 (V ) is dense in L 2 (V ) completes the proof: For any f L 2 (V ) and any ε find g L 0 (V ) such that f g < ε and g f. Then, P f P (f g) + P g P ε + ρ g P ε + ρ f. Taking ε 0 gives that ρ(p ) = P ρ Isoperimetric Constant For a graph G, we are interested in how small a boundary of a set can be, compared to the volume of that set. These serve as bottlenecks in the graph, so a random walk can get stuck inside for a while. Thus, it makes sense to define the following. Definition Let (G, c) be a weighted graph. Let S G be a finite subset. Define the (edge) boundary of S to be S = {(x, y) E(G) : x S, y S}. Define the isoperimetric constant of G to be Φ = Φ(G, c) := inf {c( S)/c(S) : S is a finite connected subset of G}.

102 102 Here c( S) = e S c(e) and c(s) = x S c x. Of course 1 Φ(G) 0 for any graph. When Φ(G) > 0, we have that sets expand : the edges coming out of a set carry a constant proportion of the weight of the set. Definition Let (G, c) be a weighted graph. If Φ(G, c) = 0 we say that (G, c) is amenable. Otherwise we call (G, c) non-amenable. A sequence a finite connected sets (S n ) n such that c( S n )/c(s n ) 0 is called a Folner sequence, or the sets are called Folner sets. rling Folner ( ) The concept of amenability was introduced by von Neumann in the context of groups and the Banach-Tarski paradox. Folner s criterion using boundaries of sets provided the ability to carry over the concept of amenability to other geometric objects such as graphs. The isoperimetric constant is a geometrical object. It turns out that positivity of the isoperimetric constant is equivalent to the spectral radius being strictly less than 1. John von Neumann ( ) Exercise Let S T d be a finite connected subset, with S 2. Show that S = S (d 2) + 2. Deduce that Φ(T d ) = d 2 d. Number of exercises in lecture: 2 Total number of exercises until here: 27
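The last exercise is easy to check by machine. The sketch below (the encoding of T_d and the set-growing procedure are ours) grows random connected subsets of T_d, counts boundary edges, and compares with |S|(d−2)+2 and with Φ(T_d) = (d−2)/d.

```python
# Sketch: checking that |∂S| = |S|(d-2) + 2 for finite connected subsets of T_d.
# Vertices of the d-regular tree are encoded as tuples: the root is (), the root
# has d children, and every other vertex has one parent and d - 1 children.
import random

def neighbors(v, d):
    if v == ():
        return [(i,) for i in range(d)]
    return [v[:-1]] + [v + (i,) for i in range(d - 1)]

def grow_connected_set(d, size):
    """Grow a random connected subset of T_d of the given size."""
    S = {()}
    while len(S) < size:
        x = random.choice(list(S))
        S.add(random.choice(neighbors(x, d)))
    return S

d = 3
for size in (2, 10, 100, 1000):
    S = grow_connected_set(d, size)
    boundary = sum(1 for x in S for y in neighbors(x, d) if y not in S)
    print(f"|S| = {size:5d}   |∂S| = {boundary:5d}   |S|(d-2)+2 = {size * (d - 2) + 2:5d}"
          f"   |∂S|/c(S) = {boundary / (d * size):.3f}")
# c(S) = d|S| since every vertex has degree d; the ratio |∂S|/c(S) tends to
# (d-2)/d = 1/3 for d = 3, matching Φ(T_d) = (d-2)/d.
```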

103 103 Random Walks Ariel Yadin Lecture 18: Kesten s Amenability Criterion Kesten s Thesis Kesten, in his PhD thesis in 1959 proved the connection between amenability and spectral radius strictly less than 1. This was subsequently generalized to more general settings by others (including Cheeger, Dodziuk, Mohar). Theorem A weighted graph (G, c) is amenable if and only if ρ(g, c) = 1. In fact, Φ Φ 2 1 ρ Φ. First we require Harry Kesten (1931 ) Lemma Let (G, c) be a weighted graph. For any f L 0 (V ) (that is, with finite support) 2Φ(G, c) c x f(x) f(x, y). x x,y Note that if f = 1 S for a finite set S, this is exactly the definition of Φ. Proof. Since f has finite support we can write c({x : f(x) > t})dt = 0 0 x c x 1 {f(x)>t} dt = x c x f(x)1 {f(x) 0} x c x f(x). Also, 0 1 {f(x)>t f(y)} dt = f(x) f(y) 1 {f(x) f(y)}. Using the set S t = {x : f(x) > t} we see that S t = {(x, y) E : f(x) > t y}.

104 104 Since for any t, Φ c(s t ) c( S t ), we can integrate over t to get Φ c x f(x) Φ c(s t )dt x 0 0 x,y c(x, y)1 {f(x)>t f(y)} dt = c(x, y) f(x) f(y) 1 {f(x) f(y)} 1 2 x,y f(x, y), where we have used the fact that all sums are finite because f has finite support. x,y Proof of Theorem The leftmost inequality is just ξ 2 /2 1 1 ξ 2, valid for any ξ [0, 1]. The rightmost inequality follows by taking a sequence of finite connected sets (S n ) n such that Φ = lim n c( S n )/c(s n ). 1 2 E(1 S n ) = 1 2 Since ( 1 Sn (x, y)) 2 = c(x, y) 2 (1 {(x,y) Sn} + 1 {(y,x) Sn}), Also, 1 Sn, 1 Sn = x c x1 {x Sn} = c(s n ). Thus, r(x, y)( 1 Sn (x, y)) 2 = c(x, y)1 {(x,y) Sn} = c( S n ). x,y x,y E(f) 1 ρ = inf 0 f L 0 (V ) 2 f, f lim c( S n ) n c(s n ) = Φ. The central inequality is Φ 2 1 ρ 2. We use that First, for f L 0 (V ), E(f) 1 ρ = inf 0 f L 0 (V ) 2 f, f 2 f, f + 2 P f, f = x,y c(x, y)f(x) 2 + x,y P f, f and ρ sup 0 f L 0 (V ) f, f. c(x, y)f(y) x,y c(x, y)f(x)f(y) = x,y c(x, y)(f(x) + f(y)) 2 For g = f 2, f, f = x c x g(x) (2Φ) 1 x,y c(x, y) g(x) g(y) = (2Φ) 1 x,y c(x, y) f(x) f(y) f(x) + f(y). Applying Cauchy-Schwarz, 4Φ 2 f, f 2 c(x, y)(f(x) f(y)) 2 c(x, y)(f(x) + f(y)) 2 x,y x,y = E(f) (2 f, f + 2 P f, f ) 2E(f) f, f (1 + ρ).

105 105 Rearranging, we get that for any f L 0 (V ), 4Φ 2 2 E(f) (1 + ρ). f, f Taking infimum over all f L 0 (V ), we get that Φ 2 (1 ρ)(1 + ρ) = 1 ρ 2 as required. Example Let s calculate ρ(t d ), the spectral radius for the d-regular tree. Let r be the root of T d, and let T n = {x : dist(x, r) = n}. For one direction, consider the function n f n (x) = (d 1) k/2 1 {x Tk } = 1 {1 dist(x,r) n} (d 1) dist(x,r)/2. k=1 If x y then c(x, y)f(x)f(y) = (d 1) (dist(x,r)+dist(y,r))/2 if 1 dist(x, r), dist(y, r) n, and 0 otherwise. Thus, since T k = d(d 1) k 1, Simlarly, f n 2 = and 0 otherwise. So, x : 1 dist(x,r) n = d 2 (d 1) 1 n. c x (d 1) dist(x,r) = n d(d 1) k 1 d (d 1) k k=1 2(d 1) 1/2 d (d 1) dist(x,r)/2 2 dist(x, r) n 1, (d 1) 1/2 d (d 1) dist(x,r)/2 dist(x, r) {1, n}, P f n (x) = (d 1) 1/2 x = r, 1 d (d 1) n/2 dist(x, r) = n + 1, P f n 2 = x c x (P f n (x)) 2 n 1 = d d(d 1) k 1 k=2 This implies that 4(d 1) d 2 (d 1) k + d (d 1) 1 + d 2 (d 1) n + d 2 d 1 d 2 (d 1) 1 + d 2 (d 1) n 1 d 1 d 2 (d 1) n = 4(n 2) + d d = 4n 5 + d d 1. ρ(t d ) P f n / f n 2 d 1. d 1 (d 1) n d2 For the other direction, since Φ(T d ) = d 2 d, we have that ρ(t d) 1 Φ(T d ) 2 = 2 d 1 d.
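Plugging the values Φ(T_d) = (d−2)/d from the previous lecture and ρ(T_d) = 2√(d−1)/d (the upper bound just obtained, which is in fact the exact spectral radius of the d-regular tree) into Theorem 18.1 gives a quick sanity check of the inequalities Φ²/2 ≤ 1 − ρ ≤ Φ. The small script below is ours and purely illustrative.

```python
# Quick numeric check of the Kesten inequalities  Phi^2/2 <= 1 - rho <= Phi
# for the d-regular tree, with Phi(T_d) = (d-2)/d and rho(T_d) = 2*sqrt(d-1)/d.
import math

for d in (3, 4, 5, 10, 100):
    phi = (d - 2) / d
    rho = 2 * math.sqrt(d - 1) / d
    print(f"d = {d:3d}   Phi^2/2 = {phi ** 2 / 2:.4f} <= 1 - rho = {1 - rho:.4f}"
          f" <= Phi = {phi:.4f}   ->  {phi ** 2 / 2 <= 1 - rho <= phi}")
```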

106 106 Number of exercises in lecture: 0 Total number of exercises until here: 27

107 107 Random Walks Ariel Yadin Lecture 19: Speed of Random Walks Let (G, c) be a weighted graph and let (X t ) t be the corresponding weighted random walk. In the exercises one shows that the limit E x [dist(x t, X 0 )] lim t t exists for transitive graphs, and is independent of the choice of starting vertex x. We call this the speed of the random walk. For general graph this limit may not exist, so we consider lim inf and lim sup of the sequence. Of course, these limits lie between 0 and 1. Definition Let (G, c) be a weighted graph and let (X t ) t be the corresponding weighted random walk. Fix some o G. The lower speed and upper speed are defined to be lim inf t E o [dist(x t, X 0 )] t and lim sup t If these limits coincide, we call the corresponding limit the speed. E o [dist(x t, X 0 )]. t Example Let us calculate the speed of the random walk on T d. Fix o T d. Let (X t ) t be the random walk and define D t = dist(x t, o). Let L t = L t (o) = t k=0 1 {X k =o} and L 1 = 0. Consider the sequence M t = dist(x t, o) d 2 d t 2 d L t 1. ( ) E o [dist(x t+1, o) F t ] = 1 {Xt=o} + 1 {Xt o} (dist(xt, o) + 1) d 1 d + (dist(x t, o) 1) 1 d = dist(x t, o) + d 2 ( + 1 {Xt=o} 1 d 2 ) dist(x t, o) d d = dist(x t, o) + d 2 d + 2 d 1 {X t=o}, where we have used that dist(x t, o)1 {Xt=o} = 0. Thus, E o [M t+1 F t ] = dist(x t, o) + d 2 d = dist(x t, o) d 2 d t 2 d L t 1 = M t. + 2 d 1 {X t=o} d 2 d (t + 1) 2 d L t

108 108 So (M t ) t is a martingale. This implies that 0 = E o [M t ] = E o [dist(x t, o)] d 2 d t 2 d E o[l t 1 ]. Since T d is transient, we know by monotone convergence that Thus, lim E 1 o[l t 1 ] = E o [V (o) + 1] = t P o [T o + = ] <. 1 lim t t E o[dist(x t, o)] = d 2 d. It is not a coincidence that T d has positive speed. In fact, this has to do with the fact that ρ(t d ) < 1, or that T d is non-amenable. Theorem Let (G, c) be a weighted graph, and let (X t ) t be the corresponding random walk started at some o G. Assume the following: ρ(g, c) < 1. There exists M > 0 such that c x M for all x (i.e. c x is uniformly bounded). The limit b := lim sup B(o, r) 1/r < r is finite, where B(o, r) is the ball of radius r around o. Then, the lower speed is positive. In fact, a.s. lim inf t 1 t dist(x log ρ(g) t, o) > 0. log b Proof. Let 0 < α < log ρ log b. So ρbα < 1. We can choose λ > b such that ρλ α < 1. Because λ > b, there exists some universal constant K > 0 such that B(o, r) Kλ r for all r. Because c x is uniformly bounded, we have that K can be chosen large enough so that K > M/c o. Thus, for all x and all t, P t (o, x) Kρ t. Combining these two bounds we get that P[dist(X t, o) αt] x B(o, αt ) P t (o, x) K 2 ρ t λ αt. Since ρλ α < 1, we get that these probabilities are summable. By Borel-Cantelli, we have that P[lim inf 1 t dist(x t, o) α] = 0. Taking α log ρ log b completes the proof.
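Both the exact speed (d−2)/d computed for T_d in the example above and the positivity guaranteed by the theorem are easy to see in simulation. A short sketch (our code; it only tracks the distance-from-root chain, as in the spectral radius computation earlier):

```python
# Simulation sketch: the speed of the random walk on T_d.  We track only
# D_t = dist(X_t, root), a Markov chain on N: from 0 always move to 1; from
# k >= 1 move to k+1 with probability (d-1)/d and to k-1 with probability 1/d.
import random

def simulate_distance(d, t):
    D = 0
    for _ in range(t):
        if D == 0 or random.random() < (d - 1) / d:
            D += 1
        else:
            D -= 1
    return D

d, t, runs = 3, 10000, 500
avg = sum(simulate_distance(d, t) for _ in range(runs)) / runs
print(f"E[dist(X_t, o)]/t ≈ {avg / t:.4f}   vs   (d-2)/d = {(d - 2) / d:.4f}")
```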

109 109 Recall that by Fatou s Lemma log ρ log b E o[lim inf t 1 dist(x t, o)] lim inf t 1 E o [dist(x t, o)]. So, non-amenable graphs have positive (lower) speed. Example For all d, the random walk on Z d has zero speed: In fact, we show that for a random walk (X t ) t on Z d, E 0 [dist(x t, 0)] t 1/2. Consider the j-th coordinate X t (j). E 0 [X t+1 (j) 2 F t ] = 1 2d ((X t (j) + 1) 2 + (X t (j) 1) 2) + ( 1 1 2d) Xt (j) 2 = X t (j) d. Thus, M t = X t (j) 2 t d is a martingale, and 0 = E 0[X t (j) 2 ] t d. So E 0[ X t (j) ] t/d, and E 0 [dist(x t, 0)] dt. Also, note that we can write X t (j) = where (ξ k ) k are i.i.d. random variables with P[ξ k = 1] = P[ξ k = 1] = 1/2d and P[ξ k = 0] = 1 1/2d. Since E[ξ k ] = 0 and Var[ξ k ] = E[ξk 2 ] = 1/d, we get by the central limit theorem that dt 1/2 X t (j) converges in distribution to a standard normal random variable, N(0, 1). So t k=1 ξ k lim P 0[ d X t (j) t 1 t 2 ] = P[ N(0, 1) 1 2 ] := c > 0. Thus, and so lim inf t 1 t E 0 [ X t (j) ] lim inf t lim inf t 1 t P 0 [ X t (j) 1 2 td 1/2 ] 1 t E 0 [dist(x t, 0)] c 2 d. t 2 d = c 2 d, Since many interesting graphs have zero speed, we sometimes are interested in a bit more precision. Definition Let (G, c) be a weighted graph and let (X t ) t be the corresponding weighted random walk. Fix some o G. The lower escape exponent and upper escape exponent are defined to be lim inf t log E o [dist(x t, X 0 )] log t and lim sup t log E o [dist(x t, X 0 )]. log t If these limits coincide, we call the corresponding limit the escape exponent.
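A simulation makes the √t behaviour of the example above, that is, escape exponent 1/2 in the sense just defined, visible. The sketch below is ours; the dimension, the times and the number of runs are arbitrary illustrative choices.

```python
# Simulation sketch: diffusive scaling E_0[dist(X_t, 0)] ≍ sqrt(t) for the simple
# random walk on Z^2, i.e. escape exponent 1/2.
import math
import random

def srw_distance(t, d=2):
    """L1 distance from the origin after t steps of simple random walk on Z^d."""
    x = [0] * d
    for _ in range(t):
        i = random.randrange(d)
        x[i] += random.choice((-1, 1))
    return sum(abs(c) for c in x)

runs = 1000
for t in (100, 400, 1600, 6400):
    avg = sum(srw_distance(t) for _ in range(runs)) / runs
    print(f"t = {t:5d}   E[dist]/sqrt(t) ≈ {avg / math.sqrt(t):.3f}"
          f"   log E[dist] / log t ≈ {math.log(avg) / math.log(t):.3f}")
```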

110 110 Example T d has escape exponent 1. In fact any graph with positive speed has escape exponent 1. (This is immediate from log E o [dist(x t, X 0 )] = log( 1 t E o[dist(x t, X 0 )]) + log t.) Example Z d has escape exponent 1/2, as shown above. Speed exponent 1/2 plays an important role in the theory. Walks with speed exponent 1/2 are called diffusive. Walks with speed exponent < 1/2 (resp. > 1/2) are called sub-diffusive (resp. super-diffusive). Walks with speed exponent 1 (i.e. positive speed) are called ballistic Graph Powers If G is a graph, there is a natural graph structure on V (G) d : Define the graph G d to be as follows. The vertex set of G d is V (G d ) = V (G) d. The edges are define by the relations: (x 1,..., x d ) (y 1,..., y d ) k : j k, x j = y j and x k y k. Lemma Let G be a graph with speed exponent α. Then, any lazy random walk on G has speed exponent α. Moreover, for any d 1, the graph G d has speed exponent α as well. Proof. First, Exercise Show that d dist G d((x 1,..., x d ), (y 1,..., y d )) = dist G (x j, y j ). j=1 Now, let (X t ) t be a random walk on G d and let X t (j) be the j-th coordinate of X t. Note that (X t (j)) t is a lazy random walk on G with holding probability 1 1 d. Then, dist(x t, X 0 ) = j dist(x t (j), X 0 (j)), so it suffices to prove that any lazy walk on G has speed exponent α. Let (Y t ) t be a lazy walk on G with holding probability p. Let (X t ) t be a simple random walk on G. Suppose that P is the transition matrix for the simple random walk (X t ) t on G, so that

111 111 the transition matrix for (Y t ) t is Q = pi + (1 p)p. Let f(x) = dist(x, o). We have that t ( ) t Q t = (1 p) k p t k P k, k so k=0 E o [dist(y t, o)] = x Q t (o, x)dist(x, o) = (Q t f)(o) = t k=0 ( ) t (1 p) k p t k P k f(o) = k t k=0 Now, for any ε > 0 there exists K ε such that for all k > K ε, k α ε E o [dist(x k, o)] k α+ε. ( ) t (1 p) k p t k E o [dist(x k, o)]. k Let B t Bin(t, 1 p), and let q k = ( t k) (1 p) k p t k = P[B t = k]. By Chebychev s inequality, P[ B t (1 p)t > 1 2 (1 p)t] 4 Var[B t] (1 p) 2 t 2 = 4p (1 p)t, so P[B t 1 2 (1 p)t] 1 4p (1 p)t 1. Hence, for ε > 0, for all large enough t (so that (1 p)t > 2K ε ), which implies that lim inf t t k=0 log E o [dist(y t, o)] log t On the other hand, so lim sup t q k E o [dist(x k, o)] P[B t 1 2 (1 p)t] ( 1 2 (1 p)t) α ε, ( log P[Bt 1 2 (1 p)t] ((1 p)/2)α ε) α ε + lim = α ε. t log t t q k E o [dist(x k, o)] K ε + t α+ε, k=0 log E o [dist(y t, o)] log t Number of exercises in lecture: 1 Total number of exercises until here: 28 ( log 1 + K ε ) t α + ε + lim α+ε = α + ε. t log t
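The lemma's claim that laziness does not change the escape exponent can also be seen in simulation; the sketch below (ours) compares the simple walk on Z with a lazy walk with holding probability 1/2.

```python
# Simulation sketch for the lemma above: making the walk lazy does not change the
# escape exponent; it only rescales the constant (time is slowed by a factor 1 - p).
import math
import random

def lazy_walk_distance(t, holding_p):
    x = 0
    for _ in range(t):
        if random.random() >= holding_p:     # with probability 1 - p, actually move
            x += random.choice((-1, 1))
    return abs(x)

runs = 1000
for p in (0.0, 0.5):
    ratios = []
    for t in (200, 800, 3200):
        avg = sum(lazy_walk_distance(t, p) for _ in range(runs)) / runs
        ratios.append(avg / math.sqrt(t))
    print(f"holding prob. p = {p}:  E|X_t|/sqrt(t) ≈",
          ", ".join(f"{r:.3f}" for r in ratios))
# For both walks the ratio is essentially constant in t (exponent 1/2); laziness
# only shrinks it by a factor of about sqrt(1 - p).
```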

112

Random Walks

Ariel Yadin

Lecture 20: Lamp-Lighter Graphs

We have already seen that non-amenable graphs must have positive speed and so escape exponent 1. Non-amenable graphs are also transient, because their spectral radius is strictly less than 1. The converse of these statements does not hold. Figure 5 sums up the situation (for graphs) in terms of speed, amenability and transience.

[Figure 5: a diagram sorting examples by speed (positive / zero / subdiffusive), amenability (non-amenable / amenable) and transience (transient / recurrent), with an exponential growth line; the examples placed in it are T_d, τ(Ackermann), LL(Z^3), LL(Z), τ^3, τ = τ((β log_2 k)_k), Z^3 and Z.]

Figure 5. Possibilities for speed, amenability and transience

113 113 We will now construct a special class of graphs called lamp-lighter graphs. These are used to give many examples in geometric group theory. (exponential volume growth) amenable graphs with positive speed. They will provide us with examples of Let us describe the construction in words, before the formal definition. We start with any graph G (finite or infinite). This is the base graph. Suppose some lamp-lighter walks around on the graph G. At every site of G there is some lamp, whose state is either on or off. The lamp-lighter walks around and can also change the state of the lamp at her current position - changing it either to on or to off. What is a position in this new space? A position consists of the configuration of all lamps on G, that is, a function from G to {0, 1} and the position of the lamp-lighter, i.e. a vertex in G. Definition 20.1 (Lamp-Lighter Chain). Let P be a Markov chain on state space S. We define the Markov chain LL(P ), called lamp-lighter on P, as follows. The state space for LL(P ) is LL(S) := S ({0, 1} S ) c, where ({0, 1} S ) c is the set of ω : S {0, 1} with finite support (i.e. ω 1 (1) is finite). For a state (x, ω) LL(S), we call x the position of the lamp-lighter. If ω(y) = 1 we say the lamp at y is on, and if ω(y) = 0 we say it is off. For a lamp configuration ω ({0, 1} S ) c and a position x S we define ω x {0, 1} S ω x (y) = ω(y) for all y x and ω x (x) = ω(x) + 1 (mod 2). Define the transition matrix LL(P ) by setting for η {ω, ω x, ω y, (ω x ) y } and 0 otherwise. LL(P )((x, ω), (y, η)) = 1 P (x, y), 4 If (G, c) is a weighted graph, the LL(G) = LL(P ) for P the weighted random walk on (G, c). by Note that the chain LL(P ) evolves as follows: At each step, the lamp-lighter chooses a neighbor of her current position with distribution P (x, ) and moves there, then she refreshes the state of the lamps at the old position and at the new position to on or off with probability 1/2 each, independently. Remark If G is a graph, then LL(G) defines a graph structure as well. P ((x, ω), (y, η)) > 0 if and only if P (x, y) > 0 and η {ω, ω y }. So the graph structure on LL(G) is given by the relations (x, ω) (y, η) for η {ω, ω x, ω y, (ω x ) y }. In fact:

114 114 Exercise Suppose that (G, c) is a weighted graph, and P is the transition matrix of the weighted random walk on G. Show that LL(P ) is given by a weighted random walk on a weighted graph whose vertices are (x, ω), x G, ω ({0, 1} G ) c. What is the weight function on this graph? Exercise Let P be an irreducible Markov chain. Let (X t, ω t ) t be Markov-LL(P ). Show that (X t ) t is Markov-P. Exercise Let G be a graph, and let L = LL(G). Let o G and let 0 {0, 1} G denote the all zero function (configuration). Then, for any (x, ω) L, dist L ((x, ω), (o, 0)) ω 1 (1). The next example is an (exponential volume growth) amenable graph, but with positive speed. Example Consider L = LL(Z 3 ). First we show that L is amenable. We only need to demonstrate a Folner sequence. For every r, let (B r ) r be a Folner sequence in Z 3 (e.g. the L balls of radius r). Let Note that A r = B r 2 Br. A r = { (x, ω) L : x B r, ω 1 (1) B r }. Also, ((x, ω), (y, η)) A r if and only if (x, y) B r and η {ω, ω y, ω x, (ω x ) y }. Thus, A r = 4 B r 2 Br. Thus, since the degree in L is 12, and so L is amenable. Φ(L) inf r A r 12 A r = inf 4 B r = 0, r 6 B r Next we show that L has positive speed. Let 0 denote the all 0 lamp configuration, and let o = (0, 0) L. Let (X t, ω t ) be a random walk on L. We claim that for any z 0, (20.1) P o [ω t (z) = 1] = 1 2 P o[t + z t],

115 115 where T + z = inf {t 1 : X t = z}. Given this, we have that E o [dist(x t, o)] E o [ ωt 1 (1) ] = z P[ω t (z) = 1] = 1 2 E o[ R t ], where R t = {X 1,..., X t }. Since (X t ) t is a random walk on Z 3, we are left with showing that lim t t 1 E Z3 0 [ R t ] > 0. In fact, we have using Exercise 20.4 below, E Z3 0 [ R t ] P Z3 0 [T 0 + t = ] > 0. We turn to proving (20.1). Let (y 0, η 0 ),..., (y n, η n ) be a path in L. Let T = inf {t : y t = z}, (where inf = ). Define a new path (y 0, η 0 ),... (y T 1, η T 1 ), (y T, η z T ), (y T +1, η z T +1),..., (y n, η z n). Since L is a regular graph, both paths have the same probability. Summing over all possible paths, we get that for any k t, So Thus, proving (20.1). P o [ω t (z) = 1, T + z = k] = P o [ω z t (z) = 1, T + z = k] = P o [ω t (z) = 0, T + z = k]. P o [ω t (z) = 1] = P o [ω t (z) = 1 T + z = k] = P o [ω t (z) = 0 T + z = k] = 1 2. t P o [ω t (z) = 1, T z + = k] = 1 2 k=1 t P o [T z + = k] = 1 2 P o[t z + t], k=1 Exercise Show that for d 3, if (X t ) t is a random walk on Z d, and R t = {X 1,..., X t } is the range, then E 0 [ R t ] t P 0 [T + 0 = ]. Example We have already seen an example of amenable zero-speed graphs: Z d. We in fact know that these are diffusive. Let us show that this can even be done with a graph of exponential volume growth.

116 116 We will show that LL(Z) is (at most) diffusive. Let (X t, ω t ) be a random walk on L = LL(Z). Let o = (0, 0) L. Define M t = max k t X t. Since the lamp-lighter up to time t never leaves [ M t, M t ], we have P o -a.s. that ω 1 (1) [ M t, M t ]. Note that at time t, the lamp-lighter can walk to one of the ends of [ M t, M t ] in at most M t steps, and then start turning off all the lamps in [ M t, M t ] in at most 2M t steps, finally returning to 0 in at most another M t steps. Thus, dist(x t, o) 4M t for all t, P o -a.s. Thus it suffices to show that E[M t ] 2 t for all t. For this we use a trick called the reflection principle. By the strong Markov property at time T x, we have that P o [X t x, T x t] = t P o [T x = k] P x [X t k x] k=0 t 1 = 1 {x=o} P o [X t o] + P o [T x = k] P o[t x = t] P o [T x t] 1 2. where we have used transitivity, and symmetry by reflecting around 0: k=1 P x [X t x] = P 0 [X t 0] = P 0 [X t 0] 1 2, since P 0 [X t 0] + P 0 [X t 0] = 1 + P 0 [X t = 0] 1. We now have Reflecting around 0, So P 0 [max k t X k x] = P 0 [T x t] 2 P 0 [X t x]. P 0 [min k t X k x] = P 0 [T x t] 2 P 0 [X t x]. P 0 [M t x] P 0 [max k t X k x] + P 0 [min k t X k x] 2 P 0 [X t x] + 2 P 0 [X t x]. We conclude with 2 E[ X t ] = 2 P 0 [ X t x + 1] = 2 P 0 [X t x + 1] + 2 P 0 [X t (x + 1)] x=0 x=0 P 0 [M t x + 1] = E 0 [M t ]. x=0

117 117 So E 0 [M t ] 2 E[ X t ] 2 t. Thus, E o [dist(x t, o)] 8 t. Number of exercises in lecture: 4 Total number of exercises until here: 32

118 118 Random Walks Ariel Yadin Lecture 21: Our next goal is to complete the picture in Figure 5; that is to give examples of graphs that are transient, but have very slow speed (sub-diffusive), and examples of graphs that are recurrent but have positive upper speed Concentration of Martingales: Azuma s inequality Let (X t ) t be a random walk on Z. We know (using the martingale X t 2 t) that E 0 [T { r,r} ] = r 2. That is, it takes a random walk r 2 steps to reach distance r. We have already seen that this implies diffusive behavior of the walk. Let us prove a short concentration result, showing that actually T { r,r} is close to r 2 with very high probability. Theorem 21.1 (Azuma s Inequality). Let (M t ) t be a (F t ) t -martingale with bounded increments (i.e. M t+1 M t 1 a.s.). Then for any λ > 0, Proof. There are two main ideas: P[M t M 0 λ] exp ( λ2 2t The first idea, is that for a random variable X with E[X] = 0 and X 1 a.s. one has E[e αx ] e α2 /2. Indeed, f(x) = e αx is a convex function, so for x 1 we can write x = β 1 + (1 β) ( 1), where β = x+1 2, so ). e αx βe α + (1 β)e α = cosh(α) + x sinh(α). (Here 2 cosh(α) = e α + e α and 2 sinh(α) = e α e α.) Thus, because E[X] = 0, and using (2k)! 2 k k!, E[e αx ] cosh(α) + E[X] sinh(α) = cosh(α) = k=0 α 2k (2k)! α 2k 2 k k! = 2 eα /2. k=0 For the second idea, due to Sergei Bernstein, one applies the Chebychev / Markov inequality egei Bernstein ( ) to the non-negative random variable e αx, and then optimizes on α.

119 119 In our case: For every t, since E[M t M t 1 F t 1 ] = 0 and M t M t 1 1, exactly as above we could show that E[e α(mt Mt 1) F t 1 ] e α2 /2 a.s. Thus, E[e α(mt M0) ] = E [ e α(mt 1 M0) E[e α(mt Mt 1) F t 1 ] ] e α2 /2 E[e α(mt 1 M0) ] e tα2 /2. Now apply Markov s inequality to the non-negative random variable e α(mt M0) to get P[M t M 0 λ] = P[e α(mt M0) e αλ ] exp ( 1 2 tα2 αλ ). Optimizing over α we get that for α = λ/t, P[M t M 0 λ] exp ) ( λ2. 2t Example Let s apply Azuma s inequality to random walks on Z. Let (X t ) t be a random walk on Z. Recall that (X t ) t is a martingale. Consider the stopping time T = T { r,r}. This is the first time X t r. Recall the reflection principle: P 0 [T t] = P 0 [max k t X k r] 4 P 0 [X t r]. Using Azuma s inequality on this last term, P 0 [T { r,r} t] 4 exp ) ( r2. 2t d k Recurrent Trees - The Grove Let (d k ) k N be a sequence of positive numbers. For each k, let τ k be a binary tree of depth Define the graph τ((d k ) k ) to be the graph N, with the tree τ k glued at the vertex k N (let the root of τ k be k N); that is, the vertex set of τ((d k ) d ) is k=0 V (τ k). The edges are those in each τ k, with the edges k k + 1 for all k added. We call this the (d k ) k -grove. Proposition The graph τ((d k ) k ) is recurrent.

120 120 d 4 d 0 d 1 d 2 d Figure 6. The graph τ((d k ) k ). Proof. If v is a unit voltage on 0 and τ k, then for any n k, and any vertex x τ n we have that v(x) = v(n). Indeed, if (X t ) t is a random walk on this graph, then because τ n is finite, P x -a.s. the hitting time of n is finite, and also, v(x t ) is a bounded martingale. Thus, by the optional stopping theorem, v(x) = E x [v(x Tn )] = v(n). Thus, we can short together all vertices in each tree τ n, n k. This results in the network which is just the graph N. Thus, R eff (0, τ n ) = n. Recall that if τ is a finite binary tree of depth d, then E(τ) = V (τ) 1 = d k=0 2k 1 = 2 d+1 2. Lemma Let r N τ((d k ) k ). The hitting time of r, T r, has expectation given by r 1 E 0 [T r ] = 4 (r k)2 d k 3 r(r + 1). 2 k=0 Proof. Every time the walk as at a vertex k N, with probability 1/2 it starts a random walk in the finite subtree τ k. The expected time to return to the root in a finite tree is the reciprocal of the stationary distribution on that tree. Thus, we have λ k := E k [T + k X 1 N] = E(τ k ) = 2(2 d k 1). Now, using the strong Markov property, for k > 0, E k [T k+1 ] = E k 1[T k+1 ] (λ k + E k [T k+1 ]) = E k 1[T k ] E k[t k+1 ] λ k E k[t k+1 ].

121 121 Rearranging and iterating, k E k [T k+1 ] = 2λ k E k 1 [T k ] = = 2 λ j + k + E 0 [T 1 ]. Similarly, E 0 [T 1 ] = 2 3 (λ 0 + E 0 [T 1 ]) + 1 3, so E 0[T 1 ] = 2λ Thus, j=1 and k k E k [T k+1 ] = 2 λ j + k + 1 = 2 dj+2 3(k + 1), j=0 j=0 r 1 r 1 r 1 E 0 [T r ] = E k [T k+1 ] = (r k)2 dk+2 3 k + 1 k=0 k=0 k=0 r 1 = (r k)2 dk+2 3 r(r + 1) 2 k=0 Exercise Let τ be a finite binary tree of depth d with root o. Then, P o [T + o > 2 d ] (2e) 1. The next theorem gives an example of a tree with speed exponent α for any α 1/2. Theorem Let d k = β log 2 (k + 1). The tree τ((d k ) k ) has speed exponent (β + 2) 1. Proof. Let (X t ) t be a random walk on τ = τ((d k ) k ). For x τ denote x the number that is the root of the unique finite subtree τ k such that x τ k. So x dist(x, 0) x + d x. So it suffices to prove that log E 0 [ X t ] lim = (β + 2) 1. t log t The lower bound is simpler. Note that P 0 [ X t < r] P 0 [ X t < r, T r > t] + P 0 [ X t < r, T r t] P 0 [T r > t] + P r [ X t < r] t 1 E 0 [T r ] If we take t 4 E 0 [T r ], we get that P 0 [ X t < r] 3 4. Since r 1 E 0 [T r ] = 4 (r k)2 d k 3 r 2 r(r + 1) (r x + 1)x β dx 3 2 r(r + 1) rβ+2, k=0 1

122 122 We get that for t = 4 E 0 [T r ], r t 1/(β+2). So and E 0 [ X t ] P 0 [ X t r]r r 4 t1/(β+2). We now turn to the upper bound on E 0 [ X t ]. Define inductively the following times. θ 0 = 0 θ n+1 = inf {t θ n : X t X θn }. That is, (θ n ) n are the subsequent times the random walk moves from a vertex in N to a new vertex in N. For every 0 < k N P k [X 1 = k + 1 X 1 N] = P k [X 1 = k 1 X 1 N] = 1 2. So the sequence (Z n = X θn ) n is a random walk on N. Now, if the walk is at a vertex k N, then with probability 1/2 it performs an excursion into the finite subtree τ k, and with remaining probability 1/2 is moves in N. Thus, by the exercise above, P 0 [θ n+1 > θ n + 2 d k Zn = k, F θn ] (4e) 1 Let x = r, y = 2r, z = 3r. Let N < M be such that θ N = T y and θ M = inf {m > N : Z m {x, z}}. For n N let J n = 1 {θn+1>θ n+2 dx }, and let S = {N n < M : J n = 1}. Since d k d x for all k [x, z], we have from the above, that for any set A {0, 1,..., k 1}, a.s. P 0 [S A + N M N k] ( 1 (4e) 1) k A. Thus, for any λ < K, the event S < λ can be bounded by P 0 [ S < λ] P 0 [M N < K] + P 0 [ S < λ M N K] ( ) K (1 P 0 [M N < K] + (4e) 1 ) K λ. λ Now, M N is the number of steps a random walk on Z started at y = 2r takes to reach {x, z} = {r, 3r}. Translating r 0 we get that P 0 [M N < λ] is bounded by the probability a random walk on Z started at 0 reaches [ r, r] before time λ. Azuma s inequality above (and Example 21.2 following it) tells us that P 0 [M N < K] 4 exp ) ( r2. 2K Taking K = r 2 (8 log r) 2 and λ = εk for ε small enough (so that log(1 (4e) 1 ) (ε 1 1) > 2 log ε) we have P 0 [ S < λ] < exp( Ω((log r) 2 )) + exp( Ω((r/ log r) 2 )),

123 123 which decays faster than any polynomial in r. The event S λ implies that T 3r θ M > S 2 dx + θ N > λ 2 dx. We thus conclude that for t = λ 2 dx, P 0 [ X t > 3r] P 0 [T 3r < t] P 0 [ S < λ] exp( Ω((log r) 2 )). Since λ r 2 (log r) 1 and 2 dx r β. we get that t r 2+β (log r) 1, and E 0 [ X t ] 3r + t P 0 [ X t > 3r] 3r + t exp( Ω((log t) 2 )) t 1/(β+2)+o(1). So lim sup t which coincides with our lower bound. log E 0 [ X t ] log t 1 β + 2, Transient and Sub-Diffusive Example We now have an example of a transient sub-diffusive graph. Let τ = τ((d k ) k ) be the grove for d k = β log 2 (k + 1). Let G = τ 3. We know that as a graph power G has speed exponent (β + 2) 1. However, since N is a subgraph of τ, then also N 3 is a subgraph of G. We know that N 3 is transient, so G must be transient as well Recurrent Positive Speed Exercise [Paley-Zygmund Inequality] Let X be a non-negative random variable. Let α [0, 1]. Then, P[X > α E[X]] (E[X])2 E[X 2 ] (1 α) 2. Raymond Paley ( ) Lemma There exists a universal constant p > 0 such that the following holds. Let τ be a finite binary tree of depth d with root o. For any t d, P o [dist(x t, o) 1 6t] p, Antoni Zygmund ( ) where (X t ) t is a random walk on τ.

124 124 Proof. Let D t = dist(x t, o). We have already seen that for L t := t k=0 1 {X t=o} (L 1 = 0) and M t = D t 1 3 t L t 1 that (M t ) d t=0 is a martingale (the restriction to t d is so that the walk does not reach the leaves). Thus, for t d, Also, for t d, So E o [D t ] = 1 3 t + E o[l t 1 ]. E o [D 2 t F t 1 ] = 1 {Xt 1=o} + 1 {Xt 1 o} ( 1 3 (D t 1 1) (D t 1 + 1) 2) = D 2 t D t 1. E o [D 2 t ] = E o [D 2 t 1] E 0[D t 1 ] = = t t 1 k=0 1 3 k + E o[l k 1 ]. Note that for t d, we have that L t is the number of visits to the root up to time t. Let q be the probability that a random walk on an infinite rooted binary tree does not return to the root. Then, if A is the set of leaves in τ then P o [T A < T + o ] q. However, since P o -a.s. t d T A, we get that for any t d, Thus we conclude that 1 E o [L t ] E o [L TA ] = 1 P o [T A < T + o ] 1 q <. E o [D 2 t ] t t(t 1) + 1 q t. We now use the Payley-Zygmund inequality to conclude that for any t d, P o [D t 1 2 E o[d t ]] (E o[d t ]) 2 4 E o [Dt 2 ] Since E o [D t ] 1 3t the proof is complete t2 1 9 t2 + ( q ) t 1 4. Example We complete the picture in Figure 5 by giving an example of a recurrent graph, but with positive speed. Recall that for the (d k ) k grove, the expected time to reach the vertex r N is r 1 E 0 [T r ] = 4 (r k)2 d k 3 r(r + 1). 2 k=0 Let (d k ) k be an increasing sequence that satisfies d r > ( r 1 ) (r k)2 d k 3 2 r(r + 1). k=0

125 125 (This sequence must grow super-fast, at least like the Ackermann tower function.) Note that this ensures that d r > 4 E 0 [T r ]. Consider the (d k ) k -grove τ = τ((d k ) k ). τ is of course recurrent. Recall that for a random walk (X t ) t on τ and for t 4 E 0 [T r ], we have that P 0 [ X t < r] 3 4 (where X t is the root of the finite subtree containing X t ). Given that X 0 = r, we have by Lemma 21.7 for any t d r, P r [dist(x t, r) 1 6t] c > 0, for some universal constant c > 0. So if we take t = 2 t for t = 4 E 0 [T r ], then t < d r so So P 0 [dist(x t, 0) 1 6 t ] P 0 [ X t = k] P k [dist(x t, k) 1 6 t ] 1 4 c. And so τ has positive speed. t k=r E 0 [dist(x t, 0)] 1 4 c 1 6 t t. Number of exercises in lecture: 2 Total number of exercises until here: 34
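The hitting-time formula proved earlier in this lecture also explains the speed exponent numerically. For d_k = β log_2(k+1) we have 2^{d_k} = (k+1)^β; the sketch below (ours, with illustrative parameter choices) evaluates E_0[T_r] and watches log E_0[T_r] / log r approach β + 2, consistent with speed exponent (β + 2)^{-1}.

```python
# Sketch: the hitting-time formula for the grove τ((d_k)_k) with d_k = β log_2(k+1),
# i.e. 2^{d_k} = (k+1)^β.  We evaluate E_0[T_r] = 4 Σ_{k<r} (r-k) 2^{d_k} - (3/2) r(r+1)
# and check that log E_0[T_r] / log r approaches β + 2.
import math

def expected_hitting_time(r, beta):
    return 4 * sum((r - k) * (k + 1) ** beta for k in range(r)) - 1.5 * r * (r + 1)

beta = 1.0
for r in (10, 100, 1000, 10000):
    T = expected_hitting_time(r, beta)
    print(f"r = {r:6d}   log E_0[T_r] / log r ≈ {math.log(T) / math.log(r):.3f}"
          f"   (β + 2 = {beta + 2})")
```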

126 126 Random Walks Ariel Yadin Lecture 22: Galton-Watson Processes rancis Galton ( ) enry Watson ( ) The final topic for this course is a special Markov chain on trees, known as the Galton-Watson process. Galton and Watson were interested in the question of the survival of aristocratic surnames in the Victorian era. They proposed a model to study the dynamics of such a family name. In words, the model can be stated as follows. We start with one individual. This individual has a certain random number of offspring. Thus passes one generation. In the next generation, each one of the offspring has its own offspring independently. The processes continues building a random tree of descent. The formal definition is a bit complicated. For the moment let us focus only on the population size at a given generation. Definition Let µ be a distribution on N; i.e. µ : N [0, 1] such that n µ(n) = 1. The Galton-Watson Process, with offspring distribution µ, (also denoted GW µ,) is the following Markov chain (Z n ) n on N: Let (X j,k ) j,k N be a sequence of i.i.d. random variables with distribution µ. At generation n = 0 we set Z 0 = 1. [ Start with one individual. ] Given Z n, let Z n Z n+1 := X n+1,k. k=1 [ X n+1,k represents the number of offspring of the k-th individual in generation n. ] Example If µ(0) = 1 then the GW µ process is just the sequence Z 0 = 1, Z n = 0 for all n > 0. If µ(1) = 1 then GW µ is Z n = 1 for all n. How about µ(0) = p = 1 µ(1)? In this case, Z 0 = 1, and given that Z n = 1, we have that Z n+1 = 0 with probability p, and Z n+1 = 1 with probability 1 p, independently of all (Z k : k n). If Z n = 0 the Z n+1 = 0 as well.
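A few sample trajectories of (Z_n)_n are easy to generate. In the sketch below (our choice of offspring distribution, not one fixed by the notes) each individual has two children with probability p and none otherwise, so the mean offspring number is 2p; subcritical runs die out quickly, while supercritical runs may either die out or grow.

```python
# Simulation sketch: trajectories of a Galton-Watson process (Z_n)_n with the
# illustrative offspring distribution mu(0) = 1 - p, mu(2) = p (mean 2p).
import random

def galton_watson(p, generations):
    Z, sizes = 1, [1]
    for _ in range(generations):
        Z = sum(2 for _ in range(Z) if random.random() < p)   # offspring of each individual
        sizes.append(Z)
        if Z == 0:
            break
    return sizes

random.seed(2016)
for p in (0.4, 0.75):             # subcritical (mean 0.8) vs supercritical (mean 1.5)
    print(f"mu(2) = {p} (mean offspring {2 * p}):")
    for _ in range(4):
        print("  ", galton_watson(p, 15))
```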


More information

1.2. Markov Chains. Before we define Markov process, we must define stochastic processes.

1.2. Markov Chains. Before we define Markov process, we must define stochastic processes. 1. LECTURE 1: APRIL 3, 2012 1.1. Motivating Remarks: Differential Equations. In the deterministic world, a standard tool used for modeling the evolution of a system is a differential equation. Such an

More information

Lectures on Markov Chains

Lectures on Markov Chains Lectures on Markov Chains David M. McClendon Department of Mathematics Ferris State University 2016 edition 1 Contents Contents 2 1 Markov chains 4 1.1 The definition of a Markov chain.....................

More information

Markov Chains, Stochastic Processes, and Matrix Decompositions

Markov Chains, Stochastic Processes, and Matrix Decompositions Markov Chains, Stochastic Processes, and Matrix Decompositions 5 May 2014 Outline 1 Markov Chains Outline 1 Markov Chains 2 Introduction Perron-Frobenius Matrix Decompositions and Markov Chains Spectral

More information

http://www.math.uah.edu/stat/markov/.xhtml 1 of 9 7/16/2009 7:20 AM Virtual Laboratories > 16. Markov Chains > 1 2 3 4 5 6 7 8 9 10 11 12 1. A Markov process is a random process in which the future is

More information

Lecture notes for Analysis of Algorithms : Markov decision processes

Lecture notes for Analysis of Algorithms : Markov decision processes Lecture notes for Analysis of Algorithms : Markov decision processes Lecturer: Thomas Dueholm Hansen June 6, 013 Abstract We give an introduction to infinite-horizon Markov decision processes (MDPs) with

More information

LIMITING PROBABILITY TRANSITION MATRIX OF A CONDENSED FIBONACCI TREE

LIMITING PROBABILITY TRANSITION MATRIX OF A CONDENSED FIBONACCI TREE International Journal of Applied Mathematics Volume 31 No. 18, 41-49 ISSN: 1311-178 (printed version); ISSN: 1314-86 (on-line version) doi: http://dx.doi.org/1.173/ijam.v31i.6 LIMITING PROBABILITY TRANSITION

More information

Inference for Stochastic Processes

Inference for Stochastic Processes Inference for Stochastic Processes Robert L. Wolpert Revised: June 19, 005 Introduction A stochastic process is a family {X t } of real-valued random variables, all defined on the same probability space

More information

INTRODUCTION TO MARKOV CHAINS AND MARKOV CHAIN MIXING

INTRODUCTION TO MARKOV CHAINS AND MARKOV CHAIN MIXING INTRODUCTION TO MARKOV CHAINS AND MARKOV CHAIN MIXING ERIC SHANG Abstract. This paper provides an introduction to Markov chains and their basic classifications and interesting properties. After establishing

More information

MATH 56A SPRING 2008 STOCHASTIC PROCESSES

MATH 56A SPRING 2008 STOCHASTIC PROCESSES MATH 56A SPRING 008 STOCHASTIC PROCESSES KIYOSHI IGUSA Contents 4. Optimal Stopping Time 95 4.1. Definitions 95 4.. The basic problem 95 4.3. Solutions to basic problem 97 4.4. Cost functions 101 4.5.

More information

Stochastic Processes (Week 6)

Stochastic Processes (Week 6) Stochastic Processes (Week 6) October 30th, 2014 1 Discrete-time Finite Markov Chains 2 Countable Markov Chains 3 Continuous-Time Markov Chains 3.1 Poisson Process 3.2 Finite State Space 3.2.1 Kolmogrov

More information

Lecture Notes 7 Random Processes. Markov Processes Markov Chains. Random Processes

Lecture Notes 7 Random Processes. Markov Processes Markov Chains. Random Processes Lecture Notes 7 Random Processes Definition IID Processes Bernoulli Process Binomial Counting Process Interarrival Time Process Markov Processes Markov Chains Classification of States Steady State Probabilities

More information

1 Random Walks and Electrical Networks

1 Random Walks and Electrical Networks CME 305: Discrete Mathematics and Algorithms Random Walks and Electrical Networks Random walks are widely used tools in algorithm design and probabilistic analysis and they have numerous applications.

More information

Chapter 2. Markov Chains. Introduction

Chapter 2. Markov Chains. Introduction Chapter 2 Markov Chains Introduction A Markov chain is a sequence of random variables {X n ; n = 0, 1, 2,...}, defined on some probability space (Ω, F, IP), taking its values in a set E which could be

More information

MAS275 Probability Modelling Exercises

MAS275 Probability Modelling Exercises MAS75 Probability Modelling Exercises Note: these questions are intended to be of variable difficulty. In particular: Questions or part questions labelled (*) are intended to be a bit more challenging.

More information

2. Transience and Recurrence

2. Transience and Recurrence Virtual Laboratories > 15. Markov Chains > 1 2 3 4 5 6 7 8 9 10 11 12 2. Transience and Recurrence The study of Markov chains, particularly the limiting behavior, depends critically on the random times

More information

Introduction to Stochastic Processes

Introduction to Stochastic Processes 18.445 Introduction to Stochastic Processes Lecture 1: Introduction to finite Markov chains Hao Wu MIT 04 February 2015 Hao Wu (MIT) 18.445 04 February 2015 1 / 15 Course description About this course

More information

Stochastic Simulation

Stochastic Simulation Stochastic Simulation Ulm University Institute of Stochastics Lecture Notes Dr. Tim Brereton Summer Term 2015 Ulm, 2015 2 Contents 1 Discrete-Time Markov Chains 5 1.1 Discrete-Time Markov Chains.....................

More information

MS&E 321 Spring Stochastic Systems June 1, 2013 Prof. Peter W. Glynn Page 1 of 10

MS&E 321 Spring Stochastic Systems June 1, 2013 Prof. Peter W. Glynn Page 1 of 10 MS&E 321 Spring 12-13 Stochastic Systems June 1, 2013 Prof. Peter W. Glynn Page 1 of 10 Section 3: Regenerative Processes Contents 3.1 Regeneration: The Basic Idea............................... 1 3.2

More information

Markov Chains and Stochastic Sampling

Markov Chains and Stochastic Sampling Part I Markov Chains and Stochastic Sampling 1 Markov Chains and Random Walks on Graphs 1.1 Structure of Finite Markov Chains We shall only consider Markov chains with a finite, but usually very large,

More information

Chapter 11 Advanced Topic Stochastic Processes

Chapter 11 Advanced Topic Stochastic Processes Chapter 11 Advanced Topic Stochastic Processes CHAPTER OUTLINE Section 1 Simple Random Walk Section 2 Markov Chains Section 3 Markov Chain Monte Carlo Section 4 Martingales Section 5 Brownian Motion Section

More information

Math 6810 (Probability) Fall Lecture notes

Math 6810 (Probability) Fall Lecture notes Math 6810 (Probability) Fall 2012 Lecture notes Pieter Allaart University of North Texas September 23, 2012 2 Text: Introduction to Stochastic Calculus with Applications, by Fima C. Klebaner (3rd edition),

More information

MATH 56A: STOCHASTIC PROCESSES CHAPTER 1

MATH 56A: STOCHASTIC PROCESSES CHAPTER 1 MATH 56A: STOCHASTIC PROCESSES CHAPTER. Finite Markov chains For the sake of completeness of these notes I decided to write a summary of the basic concepts of finite Markov chains. The topics in this chapter

More information

An Introduction to Entropy and Subshifts of. Finite Type

An Introduction to Entropy and Subshifts of. Finite Type An Introduction to Entropy and Subshifts of Finite Type Abby Pekoske Department of Mathematics Oregon State University pekoskea@math.oregonstate.edu August 4, 2015 Abstract This work gives an overview

More information

Stochastic Processes

Stochastic Processes Stochastic Processes 8.445 MIT, fall 20 Mid Term Exam Solutions October 27, 20 Your Name: Alberto De Sole Exercise Max Grade Grade 5 5 2 5 5 3 5 5 4 5 5 5 5 5 6 5 5 Total 30 30 Problem :. True / False

More information

8. Statistical Equilibrium and Classification of States: Discrete Time Markov Chains

8. Statistical Equilibrium and Classification of States: Discrete Time Markov Chains 8. Statistical Equilibrium and Classification of States: Discrete Time Markov Chains 8.1 Review 8.2 Statistical Equilibrium 8.3 Two-State Markov Chain 8.4 Existence of P ( ) 8.5 Classification of States

More information

Markov Chains, Random Walks on Graphs, and the Laplacian

Markov Chains, Random Walks on Graphs, and the Laplacian Markov Chains, Random Walks on Graphs, and the Laplacian CMPSCI 791BB: Advanced ML Sridhar Mahadevan Random Walks! There is significant interest in the problem of random walks! Markov chain analysis! Computer

More information

Treball final de grau GRAU DE MATEMÀTIQUES Facultat de Matemàtiques Universitat de Barcelona MARKOV CHAINS

Treball final de grau GRAU DE MATEMÀTIQUES Facultat de Matemàtiques Universitat de Barcelona MARKOV CHAINS Treball final de grau GRAU DE MATEMÀTIQUES Facultat de Matemàtiques Universitat de Barcelona MARKOV CHAINS Autor: Anna Areny Satorra Director: Dr. David Márquez Carreras Realitzat a: Departament de probabilitat,

More information

Markov chains. 1 Discrete time Markov chains. c A. J. Ganesh, University of Bristol, 2015

Markov chains. 1 Discrete time Markov chains. c A. J. Ganesh, University of Bristol, 2015 Markov chains c A. J. Ganesh, University of Bristol, 2015 1 Discrete time Markov chains Example: A drunkard is walking home from the pub. There are n lampposts between the pub and his home, at each of

More information

WAITING FOR A BAT TO FLY BY (IN POLYNOMIAL TIME)

WAITING FOR A BAT TO FLY BY (IN POLYNOMIAL TIME) WAITING FOR A BAT TO FLY BY (IN POLYNOMIAL TIME ITAI BENJAMINI, GADY KOZMA, LÁSZLÓ LOVÁSZ, DAN ROMIK, AND GÁBOR TARDOS Abstract. We observe returns of a simple random wal on a finite graph to a fixed node,

More information

Discrete time Markov chains. Discrete Time Markov Chains, Limiting. Limiting Distribution and Classification. Regular Transition Probability Matrices

Discrete time Markov chains. Discrete Time Markov Chains, Limiting. Limiting Distribution and Classification. Regular Transition Probability Matrices Discrete time Markov chains Discrete Time Markov Chains, Limiting Distribution and Classification DTU Informatics 02407 Stochastic Processes 3, September 9 207 Today: Discrete time Markov chains - invariant

More information

On random walks. i=1 U i, where x Z is the starting point of

On random walks. i=1 U i, where x Z is the starting point of On random walks Random walk in dimension. Let S n = x+ n i= U i, where x Z is the starting point of the random walk, and the U i s are IID with P(U i = +) = P(U n = ) = /2.. Let N be fixed (goal you want

More information

Homework set 2 - Solutions

Homework set 2 - Solutions Homework set 2 - Solutions Math 495 Renato Feres Simulating a Markov chain in R Generating sample sequences of a finite state Markov chain. The following is a simple program for generating sample sequences

More information

6 Markov Chain Monte Carlo (MCMC)

6 Markov Chain Monte Carlo (MCMC) 6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution

More information

Stochastic Processes (Stochastik II)

Stochastic Processes (Stochastik II) Stochastic Processes (Stochastik II) Lecture Notes Zakhar Kabluchko University of Ulm Institute of Stochastics L A TEX-version: Judith Schmidt Vorwort Dies ist ein unvollständiges Skript zur Vorlesung

More information

Powerful tool for sampling from complicated distributions. Many use Markov chains to model events that arise in nature.

Powerful tool for sampling from complicated distributions. Many use Markov chains to model events that arise in nature. Markov Chains Markov chains: 2SAT: Powerful tool for sampling from complicated distributions rely only on local moves to explore state space. Many use Markov chains to model events that arise in nature.

More information

arxiv: v2 [math.pr] 4 Sep 2017

arxiv: v2 [math.pr] 4 Sep 2017 arxiv:1708.08576v2 [math.pr] 4 Sep 2017 On the Speed of an Excited Asymmetric Random Walk Mike Cinkoske, Joe Jackson, Claire Plunkett September 5, 2017 Abstract An excited random walk is a non-markovian

More information

Lecture 6 Random walks - advanced methods

Lecture 6 Random walks - advanced methods Lecture 6: Random wals - advanced methods 1 of 11 Course: M362K Intro to Stochastic Processes Term: Fall 2014 Instructor: Gordan Zitovic Lecture 6 Random wals - advanced methods STOPPING TIMES Our last

More information

Lecture 20 : Markov Chains

Lecture 20 : Markov Chains CSCI 3560 Probability and Computing Instructor: Bogdan Chlebus Lecture 0 : Markov Chains We consider stochastic processes. A process represents a system that evolves through incremental changes called

More information

Markov Processes on Discrete State Spaces

Markov Processes on Discrete State Spaces Markov Processes on Discrete State Spaces Theoretical Background and Applications. Christof Schuette 1 & Wilhelm Huisinga 2 1 Fachbereich Mathematik und Informatik Freie Universität Berlin & DFG Research

More information

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3 Brownian Motion Contents 1 Definition 2 1.1 Brownian Motion................................. 2 1.2 Wiener measure.................................. 3 2 Construction 4 2.1 Gaussian process.................................

More information

Probability & Computing

Probability & Computing Probability & Computing Stochastic Process time t {X t t 2 T } state space Ω X t 2 state x 2 discrete time: T is countable T = {0,, 2,...} discrete space: Ω is finite or countably infinite X 0,X,X 2,...

More information

Lecture 7. We can regard (p(i, j)) as defining a (maybe infinite) matrix P. Then a basic fact is

Lecture 7. We can regard (p(i, j)) as defining a (maybe infinite) matrix P. Then a basic fact is MARKOV CHAINS What I will talk about in class is pretty close to Durrett Chapter 5 sections 1-5. We stick to the countable state case, except where otherwise mentioned. Lecture 7. We can regard (p(i, j))

More information

MATH 56A SPRING 2008 STOCHASTIC PROCESSES 65

MATH 56A SPRING 2008 STOCHASTIC PROCESSES 65 MATH 56A SPRING 2008 STOCHASTIC PROCESSES 65 2.2.5. proof of extinction lemma. The proof of Lemma 2.3 is just like the proof of the lemma I did on Wednesday. It goes like this. Suppose that â is the smallest

More information

The coupling method - Simons Counting Complexity Bootcamp, 2016

The coupling method - Simons Counting Complexity Bootcamp, 2016 The coupling method - Simons Counting Complexity Bootcamp, 2016 Nayantara Bhatnagar (University of Delaware) Ivona Bezáková (Rochester Institute of Technology) January 26, 2016 Techniques for bounding

More information

A D VA N C E D P R O B A B I L - I T Y

A D VA N C E D P R O B A B I L - I T Y A N D R E W T U L L O C H A D VA N C E D P R O B A B I L - I T Y T R I N I T Y C O L L E G E T H E U N I V E R S I T Y O F C A M B R I D G E Contents 1 Conditional Expectation 5 1.1 Discrete Case 6 1.2

More information

Random Walk and Other Lattice Models

Random Walk and Other Lattice Models Random Walk and Other Lattice Models Christian Beneš Brooklyn College Math Club Brooklyn College Math Club 04-23-2013 (Brooklyn College Math Club) 04-23-2013 1 / 28 Outline 1 Lattices 2 Random Walk 3 Percolation

More information

INTRODUCTION TO MARKOV CHAIN MONTE CARLO

INTRODUCTION TO MARKOV CHAIN MONTE CARLO INTRODUCTION TO MARKOV CHAIN MONTE CARLO 1. Introduction: MCMC In its simplest incarnation, the Monte Carlo method is nothing more than a computerbased exploitation of the Law of Large Numbers to estimate

More information

215 Problem 1. (a) Define the total variation distance µ ν tv for probability distributions µ, ν on a finite set S. Show that

215 Problem 1. (a) Define the total variation distance µ ν tv for probability distributions µ, ν on a finite set S. Show that 15 Problem 1. (a) Define the total variation distance µ ν tv for probability distributions µ, ν on a finite set S. Show that µ ν tv = (1/) x S µ(x) ν(x) = x S(µ(x) ν(x)) + where a + = max(a, 0). Show that

More information

Math Homework 5 Solutions

Math Homework 5 Solutions Math 45 - Homework 5 Solutions. Exercise.3., textbook. The stochastic matrix for the gambler problem has the following form, where the states are ordered as (,, 4, 6, 8, ): P = The corresponding diagram

More information

Lecture Notes Introduction to Ergodic Theory

Lecture Notes Introduction to Ergodic Theory Lecture Notes Introduction to Ergodic Theory Tiago Pereira Department of Mathematics Imperial College London Our course consists of five introductory lectures on probabilistic aspects of dynamical systems,

More information

IEOR 6711: Professor Whitt. Introduction to Markov Chains

IEOR 6711: Professor Whitt. Introduction to Markov Chains IEOR 6711: Professor Whitt Introduction to Markov Chains 1. Markov Mouse: The Closed Maze We start by considering how to model a mouse moving around in a maze. The maze is a closed space containing nine

More information

MS&E 321 Spring Stochastic Systems June 1, 2013 Prof. Peter W. Glynn Page 1 of 10. x n+1 = f(x n ),

MS&E 321 Spring Stochastic Systems June 1, 2013 Prof. Peter W. Glynn Page 1 of 10. x n+1 = f(x n ), MS&E 321 Spring 12-13 Stochastic Systems June 1, 2013 Prof. Peter W. Glynn Page 1 of 10 Section 4: Steady-State Theory Contents 4.1 The Concept of Stochastic Equilibrium.......................... 1 4.2

More information

MARKOV CHAIN MONTE CARLO

MARKOV CHAIN MONTE CARLO MARKOV CHAIN MONTE CARLO RYAN WANG Abstract. This paper gives a brief introduction to Markov Chain Monte Carlo methods, which offer a general framework for calculating difficult integrals. We start with

More information

Probability Theory. Richard F. Bass

Probability Theory. Richard F. Bass Probability Theory Richard F. Bass ii c Copyright 2014 Richard F. Bass Contents 1 Basic notions 1 1.1 A few definitions from measure theory............. 1 1.2 Definitions............................. 2

More information

RECURRENCE IN COUNTABLE STATE MARKOV CHAINS

RECURRENCE IN COUNTABLE STATE MARKOV CHAINS RECURRENCE IN COUNTABLE STATE MARKOV CHAINS JIN WOO SUNG Abstract. This paper investigates the recurrence and transience of countable state irreducible Markov chains. Recurrence is the property that a

More information

25.1 Ergodicity and Metric Transitivity

25.1 Ergodicity and Metric Transitivity Chapter 25 Ergodicity This lecture explains what it means for a process to be ergodic or metrically transitive, gives a few characterizes of these properties (especially for AMS processes), and deduces

More information

Convergence Rate of Markov Chains

Convergence Rate of Markov Chains Convergence Rate of Markov Chains Will Perkins April 16, 2013 Convergence Last class we saw that if X n is an irreducible, aperiodic, positive recurrent Markov chain, then there exists a stationary distribution

More information

Definition A finite Markov chain is a memoryless homogeneous discrete stochastic process with a finite number of states.

Definition A finite Markov chain is a memoryless homogeneous discrete stochastic process with a finite number of states. Chapter 8 Finite Markov Chains A discrete system is characterized by a set V of states and transitions between the states. V is referred to as the state space. We think of the transitions as occurring

More information

Characterization of cutoff for reversible Markov chains

Characterization of cutoff for reversible Markov chains Characterization of cutoff for reversible Markov chains Yuval Peres Joint work with Riddhi Basu and Jonathan Hermon 23 Feb 2015 Joint work with Riddhi Basu and Jonathan Hermon Characterization of cutoff

More information

Characterization of cutoff for reversible Markov chains

Characterization of cutoff for reversible Markov chains Characterization of cutoff for reversible Markov chains Yuval Peres Joint work with Riddhi Basu and Jonathan Hermon 3 December 2014 Joint work with Riddhi Basu and Jonathan Hermon Characterization of cutoff

More information

FINITE MARKOV CHAINS

FINITE MARKOV CHAINS Treball final de grau GRAU DE MATEMÀTIQUES Facultat de Matemàtiques Universitat de Barcelona FINITE MARKOV CHAINS Lidia Pinilla Peralta Director: Realitzat a: David Márquez-Carreras Departament de Probabilitat,

More information

1 Sequences of events and their limits

1 Sequences of events and their limits O.H. Probability II (MATH 2647 M15 1 Sequences of events and their limits 1.1 Monotone sequences of events Sequences of events arise naturally when a probabilistic experiment is repeated many times. For

More information

1 Gambler s Ruin Problem

1 Gambler s Ruin Problem 1 Gambler s Ruin Problem Consider a gambler who starts with an initial fortune of $1 and then on each successive gamble either wins $1 or loses $1 independent of the past with probabilities p and q = 1

More information

Lecture 10. Theorem 1.1 [Ergodicity and extremality] A probability measure µ on (Ω, F) is ergodic for T if and only if it is an extremal point in M.

Lecture 10. Theorem 1.1 [Ergodicity and extremality] A probability measure µ on (Ω, F) is ergodic for T if and only if it is an extremal point in M. Lecture 10 1 Ergodic decomposition of invariant measures Let T : (Ω, F) (Ω, F) be measurable, and let M denote the space of T -invariant probability measures on (Ω, F). Then M is a convex set, although

More information

Markov chains. MC 1. Show that the usual Markov property. P(Future Present, Past) = P(Future Present) is equivalent to

Markov chains. MC 1. Show that the usual Markov property. P(Future Present, Past) = P(Future Present) is equivalent to Markov chains MC. Show that the usual Markov property is equivalent to P(Future Present, Past) = P(Future Present) P(Future, Past Present) = P(Future Present) P(Past Present). MC 2. Suppose that X 0, X,...

More information

Introduction and Preliminaries

Introduction and Preliminaries Chapter 1 Introduction and Preliminaries This chapter serves two purposes. The first purpose is to prepare the readers for the more systematic development in later chapters of methods of real analysis

More information