Chapter 5

Branching processes

Branching processes arise naturally in the study of stochastic processes on trees and locally tree-like graphs. After a review of the basic extinction theory of branching processes, we give a few classical examples of applications in discrete probability.

5.1 Background

We begin with a review of the extinction theory of Galton-Watson branching processes.

5.1.1 Basic definitions

Recall the definition of a Galton-Watson process.

Definition 5.1 (Galton-Watson branching process). A Galton-Watson branching process is a Markov chain of the following form. Let $Z_0 := 1$. Let $X(i,t)$, $i \ge 1$, $t \ge 1$, be an array of i.i.d. $\mathbb{Z}_+$-valued random variables with finite mean $m = \mathbb{E}[X(1,1)] < +\infty$, and define inductively
$$Z_t := \sum_{1 \le i \le Z_{t-1}} X(i,t).$$
To avoid trivialities we assume $\mathbb{P}[X(1,1) = i] < 1$ for all $i \ge 0$.

Version: September 9, 2015. Modern Discrete Probability: An Essential Toolkit. Copyright © 2015 Sébastien Roch.
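Definition 5.1 is easy to experiment with. The sketch below (a minimal simulation of ours, not code from the text) draws $Z_0, \ldots, Z_T$ for an offspring law given as a finite pmf:

```python
import random

def simulate_gw(offspring_pmf, generations, seed=0):
    """Simulate Z_0, ..., Z_{generations} for a Galton-Watson process.

    offspring_pmf: dict {k: p_k}, the law of X(1,1) (assumed to sum to 1).
    """
    rng = random.Random(seed)
    ks = list(offspring_pmf)
    ws = [offspring_pmf[k] for k in ks]
    Z = [1]
    for _ in range(generations):
        if Z[-1] == 0:            # 0 is absorbing: extinction
            Z.append(0)
            continue
        # each of the Z_{t-1} individuals reproduces independently
        Z.append(sum(rng.choices(ks, weights=ws, k=Z[-1])))
    return Z
```

For instance, the deterministic law $p_2 = 1$ yields $Z_t = 2^t$, consistent with the exponential growth $\mathbb{E}[Z_t] = m^t$ discussed next.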

In words, $Z_t$ models the size of a population at time (or generation) $t$. The random variable $X(i,t)$ corresponds to the number of offspring of the $i$-th individual (if there is one) in generation $t-1$. Generation $t$ is formed of all offspring of the individuals in generation $t-1$.

We denote by $\{p_k\}_{k \ge 0}$ the law of $X(1,1)$. We also let $f(s) := \mathbb{E}[s^{X(1,1)}]$ be the corresponding probability generating function. By tracking genealogical relationships, that is, who is whose child, we obtain a tree $T$ rooted at the single individual in generation $0$ with a vertex for each individual in the progeny and an edge for each parent-child relationship. We refer to $T$ as a Galton-Watson tree.

A basic observation about Galton-Watson processes is that their growth is exponential in $t$.

Lemma 5.2 (Exponential growth I). Let $M_t := m^{-t} Z_t$. Then $(M_t)$ is a nonnegative martingale with respect to the filtration $\mathcal{F}_t = \sigma(Z_0, \ldots, Z_t)$. In particular, $\mathbb{E}[Z_t] = m^t$.

Proof. Recall the following measure-theoretic lemma (see e.g. [Dur10, Exercise 5.1.1]).

Lemma 5.3. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. If $Y_1 = Y_2$ a.s. on $B \in \mathcal{F}$ then $\mathbb{E}[Y_1 \mid \mathcal{F}] = \mathbb{E}[Y_2 \mid \mathcal{F}]$ a.s. on $B$.

Returning to the proof, observe that on $\{Z_{t-1} = k\}$,
$$\mathbb{E}[Z_t \mid \mathcal{F}_{t-1}] = \mathbb{E}\Big[\sum_{1 \le j \le k} X(j,t) \,\Big|\, \mathcal{F}_{t-1}\Big] = mk = m Z_{t-1}.$$
This is true for all $k$. Rearranging shows that $(M_t)$ is a martingale. For the second claim, note that $\mathbb{E}[M_t] = \mathbb{E}[M_0] = 1$.

In fact, the martingale convergence theorem gives the following.

Lemma 5.4 (Exponential growth II). We have $M_t \to M_\infty < +\infty$ a.s. for some nonnegative random variable $M_\infty \in \sigma(\cup_t \mathcal{F}_t)$ with $\mathbb{E}[M_\infty] \le 1$.

Proof. This follows immediately from the martingale convergence theorem for nonnegative martingales (Corollary 3.35) and Fatou's lemma.

5.1.2 Extinction

Observe that $0$ is a fixed point of the process. The event
$$\{Z_t \to 0\} = \{\exists t : Z_t = 0\}$$
is called extinction. Establishing when extinction occurs is a central question in branching process theory. We let $\eta$ be the probability of extinction. Throughout, we assume that $p_0 > 0$ and $p_1 < 1$. Here is a first observation about extinction.

Lemma 5.5. A.s. either $Z_t \to 0$ or $Z_t \to +\infty$.

Proof. The process $(Z_t)$ is integer-valued and $0$ is the only fixed point of the process under the assumption that $p_1 < 1$. From any state $k > 0$, the probability of never coming back to $k$ is at least $p_0^k > 0$, so every state $k > 0$ is transient. The claim follows.

In the critical case, that immediately implies almost sure extinction.

Theorem 5.6 (Extinction: critical case). Assume $m = 1$. Then $Z_t \to 0$ a.s., i.e., $\eta = 1$.

Proof. When $m = 1$, $(Z_t)$ itself is a martingale. Hence $(Z_t)$ must converge to $0$ by Lemma 5.4.

We address the general case using generating functions. Let $f_t(s) := \mathbb{E}[s^{Z_t}]$. Note that, by monotonicity,
$$\eta = \mathbb{P}[\exists t \ge 0 : Z_t = 0] = \lim_{t \to +\infty} \mathbb{P}[Z_t = 0] = \lim_{t \to +\infty} f_t(0). \tag{5.1}$$
Moreover, by the Markov property, $f_t$ has a natural recursive form
$$f_t(s) = \mathbb{E}[s^{Z_t}] = \mathbb{E}\big[\mathbb{E}[s^{Z_t} \mid \mathcal{F}_{t-1}]\big] = \mathbb{E}\big[f(s)^{Z_{t-1}}\big] = f_{t-1}(f(s)) = \cdots = f^{(t)}(s), \tag{5.2}$$
where $f^{(t)}$ is the $t$-th iterate of $f$.

Theorem 5.7 (Extinction: subcritical and supercritical cases). The probability of extinction $\eta$ is given by the smallest fixed point of $f$ in $[0,1]$. Moreover:

- (Subcritical regime) If $m < 1$ then $\eta = 1$.
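Equations (5.1) and (5.2) suggest a direct numerical method: the extinction probability is the limit of the iterates $f^{(t)}(0)$. The following sketch (ours, with hypothetical function names) implements this fixed-point iteration:

```python
def extinction_probability(f, tol=1e-13, max_iter=10_000):
    """Approximate the extinction probability as lim_t f^{(t)}(0),
    per (5.1)-(5.2): the smallest fixed point of f in [0, 1].

    f: probability generating function s -> E[s^X] of the offspring law.
    """
    s = 0.0
    for _ in range(max_iter):
        s_new = f(s)
        if abs(s_new - s) <= tol:
            break
        s = s_new
    return s_new

# Offspring law p_0 = 1/4, p_2 = 3/4: f(s) = 1/4 + (3/4)s^2, mean m = 3/2 > 1.
# Fixed points are 1/3 and 1; iteration from 0 finds the smallest, 1/3.
eta_super = extinction_probability(lambda s: 0.25 + 0.75 * s * s)
# Subcritical variant p_0 = 3/4, p_2 = 1/4 (m = 1/2 < 1): extinction is certain.
eta_sub = extinction_probability(lambda s: 0.75 + 0.25 * s * s)
```

Starting the iteration at $0$ is what makes it converge to the *smallest* fixed point, in line with the monotone convergence of $f^{(t)}(0)$ in (5.1).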

- (Supercritical regime) If $m > 1$ then $\eta < 1$.

Proof. The case $p_0 + p_1 = 1$ is straightforward: the process dies almost surely after a geometrically distributed time. So we assume $p_0 + p_1 < 1$ for the rest of the proof.

We first summarize some properties of $f$.

Lemma 5.8. On $[0,1]$, the function $f$ satisfies:

(a) $f(0) = p_0$, $f(1) = 1$;
(b) $f$ is indefinitely differentiable on $[0,1)$;
(c) $f$ is strictly convex and increasing;
(d) $\lim_{s \uparrow 1} f'(s) = m < +\infty$.

Proof. See e.g. [Rud76] for the relevant power series facts. Observe that (a) is clear by definition. The function $f$ is a power series with radius of convergence $R \ge 1$. This implies (b). In particular,
$$f'(s) = \sum_{i \ge 1} i\, p_i\, s^{i-1} \ge 0, \qquad f''(s) = \sum_{i \ge 2} i(i-1)\, p_i\, s^{i-2} > 0,$$
because we must have $p_i > 0$ for some $i > 1$ by assumption. This proves (c). Since $m < +\infty$, $f'(1) = m$ is well defined and $f'$ is continuous on $[0,1]$, which implies (d).

We first characterize the fixed points of $f$. See Figure 5.1 for an illustration.

Lemma 5.9. We have:

1. If $m > 1$ then $f$ has a unique fixed point $\eta_0 \in [0,1)$.
2. If $m < 1$ then $f(t) > t$ for $t \in [0,1)$. Let $\eta_0 := 1$ in that case.

Proof. Assume $m > 1$. Since $f'(1) = m > 1$, there is $\delta > 0$ s.t. $f(1-\delta) < 1-\delta$. On the other hand $f(0) = p_0 > 0$, so by continuity of $f$ there must be a fixed point in $(0, 1-\delta)$. Moreover, by strict convexity and the fact that $f(1) = 1$, if $x \in (0,1)$ is a fixed point then $f(y) < y$ for $y \in (x,1)$, proving uniqueness. The second part follows by strict convexity and monotonicity.

It remains to prove convergence of the iterates to the appropriate fixed point. See Figure 5.2 for an illustration.

Lemma 5.10. We have:

Figure 5.1: Fixed points of $f$ in the subcritical (left) and supercritical (right) cases.

Figure 5.2: Convergence of the iterates to a fixed point.

1. If $x \in [0, \eta_0)$, then $f^{(t)}(x) \uparrow \eta_0$.
2. If $x \in (\eta_0, 1)$, then $f^{(t)}(x) \downarrow \eta_0$.

Proof. We only prove 1; the argument for 2 is similar. By monotonicity, for $x \in [0, \eta_0)$, we have $x < f(x) < f(\eta_0) = \eta_0$. Iterating,
$$x < f^{(1)}(x) < \cdots < f^{(t)}(x) < f^{(t)}(\eta_0) = \eta_0.$$
So $f^{(t)}(x) \uparrow L \le \eta_0$. By continuity of $f$ we can take the limit inside of $f^{(t)}(x) = f(f^{(t-1)}(x))$ to get $L = f(L)$. So by definition of $\eta_0$ we must have $L = \eta_0$.

The result then follows from the above lemmas together with Equations (5.1) and (5.2).

Example 5.11 (Poisson branching process). Consider the offspring distribution $X(1,1) \sim \mathrm{Poi}(\lambda)$ with $\lambda > 0$. We refer to this case as the Poisson branching process. Then
$$f(s) = \mathbb{E}[s^{X(1,1)}] = \sum_{i \ge 0} e^{-\lambda} \frac{\lambda^i}{i!}\, s^i = e^{\lambda(s-1)}.$$
So the process goes extinct with probability $1$ when $\lambda \le 1$. For $\lambda > 1$, the probability of extinction $\eta_\lambda$ is the smallest solution in $[0,1]$ to the equation
$$e^{-\lambda(1-x)} = x.$$
The survival probability $\zeta_\lambda := 1 - \eta_\lambda$ satisfies $1 - e^{-\lambda \zeta_\lambda} = \zeta_\lambda$.

We can use the extinction results to obtain more information on the limit in Lemma 5.4. Of course, conditioned on extinction, $M_\infty = 0$ a.s. On the other hand:

Lemma 5.12 (Exponential growth III). Conditioned on nonextinction, either $M_\infty = 0$ a.s. or $M_\infty > 0$ a.s. In particular, $\mathbb{P}[M_\infty = 0] \in \{\eta, 1\}$.

Proof. A property of rooted trees is said to be inherited if all finite trees satisfy this property and whenever a tree satisfies the property then so do all the descendant trees of the children of the root. The property $\{M_\infty = 0\}$ is inherited. The result then follows from the following 0-1 law.
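The survival equation of Example 5.11 has no closed form, but it is itself a contraction and can be solved by iteration. A short sketch of ours, assuming only the fixed-point characterization above:

```python
from math import exp

def poisson_survival(lam, tol=1e-13, max_iter=10_000):
    """Survival probability zeta of a Poi(lam) branching process (lam > 1):
    the unique root in (0, 1] of zeta = 1 - exp(-lam * zeta)."""
    z = 1.0
    for _ in range(max_iter):
        z_new = 1.0 - exp(-lam * z)
        if abs(z_new - z) <= tol:
            break
        z = z_new
    return z_new

zeta = poisson_survival(2.0)   # survival probability for lambda = 2
eta = 1.0 - zeta               # extinction probability
```

One can verify directly that the returned value satisfies $e^{-\lambda(1-\eta)} = \eta$ to machine precision.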

Lemma 5.13 (0-1 law for inherited properties). For a Galton-Watson tree $T$, an inherited property $A$ has, conditioned on nonextinction, probability $0$ or $1$.

Proof. Let $T^{(1)}, \ldots, T^{(Z_1)}$ be the descendant subtrees of the children of the root. We use the notation $T \in A$ to mean that the tree $T$ satisfies $A$. Then, by independence,
$$\mathbb{P}[A] = \mathbb{E}\big[\mathbb{P}[T \in A \mid Z_1]\big] \le \mathbb{E}\big[\mathbb{P}[T^{(i)} \in A,\ \forall i \le Z_1 \mid Z_1]\big] = \mathbb{E}\big[\mathbb{P}[A]^{Z_1}\big] = f(\mathbb{P}[A]),$$
so $\mathbb{P}[A] \in [0, \eta] \cup \{1\}$. Also $\mathbb{P}[A] \ge \eta$ because $A$ holds for finite trees. That concludes the proof.

A further moment assumption provides a more detailed picture.

Lemma 5.14 (Exponential growth: finite second moment). Let $(Z_t)$ be a branching process with $m = \mathbb{E}[X(1,1)] > 1$ and $\sigma^2 = \mathrm{Var}[X(1,1)] < +\infty$. Then $(M_t)$ converges in $L^2$ and, in particular, $\mathbb{E}[M_\infty] = 1$. Further, $\mathbb{P}[M_\infty = 0] = \eta$.

Proof. We bound $\mathbb{E}[M_t^2]$ by computing it explicitly by induction. From the orthogonality of increments (Lemma 3.39), it holds that
$$\mathbb{E}[M_t^2] = \mathbb{E}[M_{t-1}^2] + \mathbb{E}[(M_t - M_{t-1})^2].$$
On $\{Z_{t-1} = k\}$,
$$\mathbb{E}[(M_t - M_{t-1})^2 \mid \mathcal{F}_{t-1}] = m^{-2t}\, \mathbb{E}[(Z_t - m Z_{t-1})^2 \mid \mathcal{F}_{t-1}] = m^{-2t}\, \mathbb{E}\Big[\Big(\sum_{i=1}^{k} X(i,t) - mk\Big)^2 \,\Big|\, \mathcal{F}_{t-1}\Big] = m^{-2t} k \sigma^2 = m^{-2t} Z_{t-1} \sigma^2.$$
Hence
$$\mathbb{E}[M_t^2] = \mathbb{E}[M_{t-1}^2] + m^{-t-1} \sigma^2.$$
Since $\mathbb{E}[M_0^2] = 1$,
$$\mathbb{E}[M_t^2] = 1 + \sigma^2 \sum_{i=2}^{t+1} m^{-i},$$
which is uniformly bounded when $m > 1$. Therefore $(M_t)$ converges in $L^2$ (see e.g. [Dur10, Theorem 5.4.5]). Finally, by Fatou's lemma,
$$\mathbb{E}|M_\infty| \le \sup_t \|M_t\|_1 \le \sup_t \|M_t\|_2 < +\infty,$$

and
$$\big|\mathbb{E}[M_t] - \mathbb{E}[M_\infty]\big| \le \|M_t - M_\infty\|_1 \le \|M_t - M_\infty\|_2 \to 0,$$
which implies the convergence of expectations, so $\mathbb{E}[M_\infty] = \lim_t \mathbb{E}[M_t] = 1$. The last statement follows from Lemma 5.12.

Remark 5.15. The Kesten-Stigum Theorem gives a necessary and sufficient condition for $\mathbb{E}[M_\infty] = 1$ to hold [KS66b]. See e.g. [LP, Chapter 12].

5.1.3 Bond percolation on Galton-Watson trees

Let $T$ be a Galton-Watson tree for an offspring distribution with mean $m > 1$. Perform bond percolation on $T$ with density $p$.

Theorem 5.16 (Bond percolation on Galton-Watson trees). Conditioned on nonextinction,
$$p_c(T) = \frac{1}{m} \quad \text{a.s.}$$

Proof. Let $C_0$ be the cluster of the root in $T$ under percolation with density $p$. We can think of $C_0$ as being generated by a Galton-Watson branching process where the offspring distribution is the law of $\sum_{i=1}^{X(1,1)} I_i$, where the $I_i$'s are i.i.d. $\mathrm{Ber}(p)$ and $X(1,1)$ is distributed according to the offspring distribution of $T$. In particular, by conditioning on $X(1,1)$, the offspring mean of this branching process is $mp$.

If $mp \le 1$ then a.s.
$$1 = \mathbb{P}_p[|C_0| < +\infty] = \mathbb{E}\big[\mathbb{P}_p[|C_0| < +\infty \mid T]\big],$$
and we must have $\mathbb{P}_p[|C_0| < +\infty \mid T] = 1$ a.s. In other words, $p_c(T) \ge \frac{1}{m}$ a.s.

On the other hand, the property of trees $\{\mathbb{P}_p[|C_0| < +\infty \mid T] = 1\}$ is inherited. So by Lemma 5.13, conditioned on nonextinction, it has probability $0$ or $1$. That probability is of course $1$ on extinction. So, by
$$\mathbb{P}_p[|C_0| < +\infty] = \mathbb{E}\big[\mathbb{P}_p[|C_0| < +\infty \mid T]\big],$$
if the property held with probability $1$ conditioned on nonextinction, then it would follow that $mp \le 1$. In other words, for any fixed $p$ such that $mp > 1$, conditioned on nonextinction, $\mathbb{P}_p[|C_0| < +\infty \mid T] < 1$ a.s. By monotonicity of $\mathbb{P}_p[|C_0| < +\infty \mid T]$ in $p$, taking a limit $p_n \downarrow 1/m$ proves the result.

5.1.4 Random walk on Galton-Watson trees

To be written. See [LP, Theorem 3.5 and Corollary 5.10]. Requires: Section and

5.2 Random-walk representation

In this section, we develop a useful random-walk representation of the Galton-Watson process.

5.2.1 Exploration process

Consider the following exploration process of a Galton-Watson tree $T$. The exploration process, started at the root $0$, has 3 types of vertices:

- $\mathcal{A}_t$: active vertices,
- $\mathcal{E}_t$: explored vertices,
- $\mathcal{N}_t$: neutral vertices.

We start with $\mathcal{A}_0 := \{0\}$, $\mathcal{E}_0 := \emptyset$, and $\mathcal{N}_0$ contains all other vertices in $T$. At time $t$, if $\mathcal{A}_{t-1} = \emptyset$ we let $(\mathcal{A}_t, \mathcal{E}_t, \mathcal{N}_t) := (\mathcal{A}_{t-1}, \mathcal{E}_{t-1}, \mathcal{N}_{t-1})$. Otherwise, we pick an element, $a_t$, from $\mathcal{A}_{t-1}$ and set:

- $\mathcal{A}_t := \big(\mathcal{A}_{t-1} \cup \{x \in \mathcal{N}_{t-1} : \{x, a_t\} \in T\}\big) \setminus \{a_t\}$,
- $\mathcal{E}_t := \mathcal{E}_{t-1} \cup \{a_t\}$,
- $\mathcal{N}_t := \mathcal{N}_{t-1} \setminus \{x \in \mathcal{N}_{t-1} : \{x, a_t\} \in T\}$.

To be concrete, we choose $a_t$ in breadth-first search (or first-come-first-serve) manner: we exhaust all vertices in generation $t$ before considering vertices in generation $t+1$. We imagine revealing the edges of $T$ as they are encountered in the exploration process, and we let $(\mathcal{F}_t)$ be the corresponding filtration.

In words, starting with $0$, the Galton-Watson tree $T$ is progressively grown by adding to it at each time a child of one of the previously explored vertices and uncovering its children in $T$. In this process, $\mathcal{E}_t$ is the set of previously explored vertices and $\mathcal{A}_t$ is the set of vertices which are known to belong to $T$ but whose full neighborhood is waiting to be uncovered. The rest of the vertices form the set $\mathcal{N}_t$.

Let $A_t := |\mathcal{A}_t|$, $E_t := |\mathcal{E}_t|$, and $N_t := |\mathcal{N}_t|$. Note that $(E_t)$ is non-decreasing while $(N_t)$ is non-increasing. Let
$$\tau_0 := \inf\{t \ge 0 : A_t = 0\}$$
(which by convention is $+\infty$ if there is no such $t$). The process is fixed for all $t > \tau_0$. Notice that $E_t = t$ for all $t \le \tau_0$, as exactly one vertex is explored at each

time until the set of active vertices is empty. Moreover, for all $t$, $(\mathcal{A}_t, \mathcal{E}_t, \mathcal{N}_t)$ forms a partition of the vertex set of $T$; in particular, when $T$ has $n$ vertices,
$$A_t + t + N_t = n, \quad \forall t \le \tau_0.$$

Lemma 5.17 (Total progeny). Let $W$ be the total progeny. Then $W = \tau_0$.

The random-walk representation is the following. Observe that the process $(A_t)$ admits a simple recursive form. Recall that $A_0 := 1$. Conditioning on $\mathcal{F}_{t-1}$:

- If $A_{t-1} = 0$, the exploration process has finished its course and $A_t = 0$.
- Otherwise, (a) one active vertex becomes an explored vertex and (b) its offspring become active vertices. That is,
$$A_t = \begin{cases} A_{t-1} \underbrace{{}-1}_{(a)} \underbrace{{}+X_t}_{(b)}, & t-1 < \tau_0,\\[4pt] 0, & \text{o.w.}, \end{cases}$$
where $X_t$ is distributed according to the offspring distribution.

We let $Y_t := X_t - 1$ and
$$S_t := 1 + \sum_{i=1}^{t} Y_i,$$
with $S_0 := 1$. Then
$$\tau_0 = \inf\{t \ge 0 : S_t = 0\} = \inf\{t \ge 0 : 1 + [X_1 - 1] + \cdots + [X_t - 1] = 0\} = \inf\{t \ge 0 : X_1 + \cdots + X_t = t - 1\},$$
and $(A_t)$ is a random walk started at $1$ with steps $(Y_t)$, stopped when it hits $0$ for the first time:
$$A_t = S_{t \wedge \tau_0}.$$

We give two applications.
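The walk representation reduces the total progeny to a first hitting time, which is mechanical to compute. A small sketch of ours:

```python
def total_progeny(offspring_draws):
    """First hitting time of 0 by the walk S_t = 1 + sum_{i<=t} (X_i - 1),
    i.e. the total progeny tau_0 in the random-walk representation.

    offspring_draws: iterable yielding X_1, X_2, ...
    Returns tau_0, or None if the walk never hits 0 along the given draws.
    """
    s = 1
    for t, x in enumerate(offspring_draws, start=1):
        s += x - 1
        if s == 0:
            return t
    return None

# History (2, 0, 0): a root with two children, both childless; W = 3.
assert total_progeny([2, 0, 0]) == 3
```

Note that the condition $S_t = 0$ is exactly $X_1 + \cdots + X_t = t - 1$, the form used in the hitting-time theorem below.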

5.2.2 Duality principle

The random-walk representation above is useful to prove the following duality principle.

Theorem 5.18 (Duality principle). Let $(Z_t)$ be a branching process with offspring distribution $\{p_k\}_{k \ge 0}$ and extinction probability $\eta < 1$. Let $(Z_t')$ be a branching process with offspring distribution $\{p_k'\}_{k \ge 0}$ where
$$p_k' = \eta^{k-1} p_k.$$
Then $(Z_t)$ conditioned on extinction has the same distribution as $(Z_t')$, which is referred to as the dual branching process.

Proof. We use the random-walk representation. Let $H = (X_1, \ldots, X_{\tau_0})$ and $H' = (X_1', \ldots, X'_{\tau_0'})$ be the histories of the processes $(Z_t)$ and $(Z_t')$ respectively. (Under breadth-first search, the process $(Z_t)$ can be reconstructed from $H$.) In the case of extinction, the history of $(Z_t)$ has finite length. We call $(x_1, \ldots, x_t)$ a valid history if $x_1 + \cdots + x_i - (i-1) > 0$ for all $i < t$ and $x_1 + \cdots + x_t - (t-1) = 0$. By definition of the conditional probability, for a valid history $(x_1, \ldots, x_t)$ with a finite $t$,
$$\mathbb{P}[H = (x_1, \ldots, x_t) \mid \tau_0 < +\infty] = \frac{\mathbb{P}[H = (x_1, \ldots, x_t)]}{\mathbb{P}[\tau_0 < +\infty]} = \eta^{-1} \prod_{i=1}^{t} p_{x_i}.$$
Because $x_1 + \cdots + x_t = t - 1$,
$$\eta^{-1} \prod_{i=1}^{t} p_{x_i} = \prod_{i=1}^{t} \eta^{x_i - 1} p_{x_i} = \prod_{i=1}^{t} p'_{x_i} = \mathbb{P}[H' = (x_1, \ldots, x_t)].$$

Note that
$$\sum_{k \ge 0} p_k' = \sum_{k \ge 0} \eta^{k-1} p_k = \eta^{-1} f(\eta) = 1,$$
because $\eta$ is a fixed point of $f$. So $\{p_k'\}_{k \ge 0}$ is indeed a probability distribution. Note further that
$$\sum_{k \ge 0} k\, p_k' = \sum_{k \ge 0} k\, \eta^{k-1} p_k = f'(\eta) < 1,$$
since $f'$ is strictly increasing, $f(\eta) = \eta < 1$ and $f(1) = 1$ (indeed, the mean value theorem gives $\xi \in (\eta, 1)$ with $f'(\xi) = \frac{f(1) - f(\eta)}{1 - \eta} = 1$, and $f'(\eta) < f'(\xi)$). So the dual branching process is subcritical.
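Both facts at the end of the proof — that $\{p_k'\}$ is a probability distribution and that its mean $f'(\eta)$ is below $1$ — are easy to confirm numerically. A sketch of ours for the Poisson case:

```python
from math import exp, factorial

lam = 3.0
# Extinction probability: iterate s <- f(s) = exp(lam * (s - 1)) from 0.
eta = 0.0
for _ in range(500):
    eta = exp(lam * (eta - 1.0))

# Dual offspring law p'_k = eta^{k-1} p_k, with p_k the Poi(lam) pmf.
p_dual = [eta ** (k - 1) * exp(-lam) * lam ** k / factorial(k)
          for k in range(150)]
total = sum(p_dual)                              # should be ~1
mean = sum(k * q for k, q in enumerate(p_dual))  # f'(eta) = lam * eta < 1
```

Here `mean` agrees with $f'(\eta) = \lambda e^{\lambda(\eta - 1)} = \lambda\eta$, consistent with the dual process being subcritical.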

Example 5.19 (Poisson branching process). Let $(Z_t)$ be a Galton-Watson branching process with offspring distribution $\mathrm{Poi}(\lambda)$ where $\lambda > 1$. Then the dual probability distribution is given by
$$p_k' = \eta_\lambda^{k-1} p_k = \eta_\lambda^{k-1} e^{-\lambda} \frac{\lambda^k}{k!} = \frac{1}{\eta_\lambda}\, e^{-\lambda}\, \frac{(\lambda \eta_\lambda)^k}{k!},$$
where recall that $e^{-\lambda(1 - \eta_\lambda)} = \eta_\lambda$, so
$$p_k' = e^{\lambda(1 - \eta_\lambda)}\, e^{-\lambda}\, \frac{(\lambda \eta_\lambda)^k}{k!} = e^{-\lambda \eta_\lambda}\, \frac{(\lambda \eta_\lambda)^k}{k!}.$$
That is, the dual branching process has offspring distribution $\mathrm{Poi}(\lambda \eta_\lambda)$.

5.2.3 Hitting-time theorem

The random-walk representation also gives a formula for the distribution of the size of the progeny. We start with a combinatorial lemma of independent interest.

Let $u_1, \ldots, u_t \in \mathbb{R}$ and define $r_0 := 0$ and $r_i := u_1 + \cdots + u_i$ for $1 \le i \le t$. We say that $j$ is a ladder index if $r_j > r_0 \vee \cdots \vee r_{j-1}$. Consider the cyclic permutations of $u = (u_1, \ldots, u_t)$: $u^{(0)} = u$, $u^{(1)} = (u_2, \ldots, u_t, u_1)$, $\ldots$, $u^{(t-1)} = (u_t, u_1, \ldots, u_{t-1})$. Define the corresponding partial sums $r_j^{(\beta)} := u_1^{(\beta)} + \cdots + u_j^{(\beta)}$ for $j = 1, \ldots, t$ and $\beta = 0, \ldots, t-1$. Observe that
$$\begin{aligned}
(r_1^{(\beta)}, \ldots, r_t^{(\beta)}) &= \big(r_{\beta+1} - r_\beta, \ldots, r_t - r_\beta,\ [r_t - r_\beta] + r_1, \ldots, [r_t - r_\beta] + r_\beta\big)\\
&= \big(r_{\beta+1} - r_\beta, \ldots, r_t - r_\beta,\ r_t - [r_\beta - r_1], \ldots, r_t - [r_\beta - r_{\beta-1}],\ r_t\big).
\end{aligned} \tag{5.3}$$

Lemma 5.20 (Spitzer's combinatorial lemma). Assume $r_t > 0$. Let $\ell$ be the number of cyclic permutations such that $t$ is a ladder index. Then $\ell \ge 1$. Moreover, each such cyclic permutation has exactly $\ell$ ladder indices.

Proof. We first show that $\ell \ge 1$, i.e., there is at least one cyclic permutation where $t$ is a ladder index. Let $\beta$ be the smallest index achieving the maximum of $r_1, \ldots, r_t$, i.e.,
$$r_\beta > r_1 \vee \cdots \vee r_{\beta-1} \quad \text{and} \quad r_\beta \ge r_{\beta+1} \vee \cdots \vee r_t.$$
From (5.3),
$$r_{\beta+i} - r_\beta \le 0 < r_t, \quad \forall i = 1, \ldots, t - \beta,$$

and
$$r_t - [r_\beta - r_j] < r_t, \quad \forall j = 1, \ldots, \beta - 1.$$
Moreover, $r_t > 0 = r_0$ by assumption. So, in $u^{(\beta)}$, $t$ is a ladder index.

Since $\ell \ge 1$, we can assume w.l.o.g. that $u$ itself is such that $t$ is a ladder index, i.e., $r_t > r_j$ for all $j < t$. Then $\beta$ is a ladder index in $u$ if and only if
$$r_\beta > r_0 \vee \cdots \vee r_{\beta-1},$$
if and only if
$$r_t > r_t - r_\beta \quad \text{and} \quad r_t - [r_\beta - r_j] < r_t, \quad \forall j = 1, \ldots, \beta - 1.$$
Moreover, because $r_t > r_j$ for all $j < t$, we have $r_t - [r_{\beta+i} - r_\beta] = (r_t - r_{\beta+i}) + r_\beta$, so the conditions above hold if and only if, in addition,
$$r_t > r_{\beta+i} - r_\beta, \quad \forall i = 1, \ldots, t - \beta.$$
Comparing with (5.3), this says precisely that $t$ is a ladder index in the $\beta$-th cyclic permutation.

We are now ready to prove the hitting-time theorem.

Theorem 5.21 (Hitting-time theorem). Let $(Z_t)$ be a Galton-Watson branching process with total progeny $W$. In the random-walk representation of $(Z_t)$,
$$\mathbb{P}[W = t] = \frac{1}{t}\, \mathbb{P}[X_1 + \cdots + X_t = t - 1],$$
for all $t \ge 1$.

Proof. Let $R_i := 1 - S_i$ and $U_i := 1 - X_i$ for all $i = 1, \ldots, t$, and let $R_0 := 0$; note that $R_i = U_1 + \cdots + U_i$. Then
$$\{X_1 + \cdots + X_t = t - 1\} = \{R_t = 1\},$$
and
$$\{W = t\} = \{t \text{ is the first ladder index in } R_1, \ldots, R_t\}.$$
By symmetry, for all $\beta$,
$$\mathbb{P}[t \text{ is the first ladder index in } R_1, \ldots, R_t] = \mathbb{P}\big[t \text{ is the first ladder index in } R_1^{(\beta)}, \ldots, R_t^{(\beta)}\big].$$

Let $E_\beta$ be the event on the last line. Hence
$$\mathbb{P}[W = t] = \mathbb{E}[\mathbf{1}_{E_0}] = \frac{1}{t}\, \mathbb{E}\Big[\sum_{\beta=0}^{t-1} \mathbf{1}_{E_\beta}\Big].$$
By Spitzer's combinatorial lemma, there is at most one cyclic permutation where $t$ is the first ladder index. In particular, $\sum_{\beta=0}^{t-1} \mathbf{1}_{E_\beta} \in \{0, 1\}$. So
$$\mathbb{P}[W = t] = \frac{1}{t}\, \mathbb{P}\Big[\bigcup_{\beta=0}^{t-1} E_\beta\Big].$$
Finally observe that, because $R_0 = 0$ and $U_i \le 1$ for all $i$, the partial sum at the $j$-th ladder index must take value $j$. So the event $\bigcup_\beta E_\beta$ implies $\{R_t = 1\}$, because the last partial sum of all cyclic permutations is $R_t$. Similarly, because there is at least one cyclic permutation such that $t$ is a ladder index, the event $\{R_t = 1\}$ implies $\bigcup_\beta E_\beta$. Therefore,
$$\mathbb{P}[W = t] = \frac{1}{t}\, \mathbb{P}[R_t = 1],$$
which concludes the proof.

Note that the formula in the hitting-time theorem is somewhat remarkable, as the probability on the l.h.s. is $\mathbb{P}[S_i > 0,\ \forall i < t, \text{ and } S_t = 0]$ while the probability on the r.h.s. is $\frac{1}{t}\, \mathbb{P}[S_t = 0]$.

Example 5.22 (Poisson branching process). Let $(Z_t)$ be a Galton-Watson branching process with offspring distribution $\mathrm{Poi}(\lambda)$ where $\lambda > 0$. Let $W$ be its total progeny. By the hitting-time theorem, for $t \ge 1$,
$$\mathbb{P}[W = t] = \frac{1}{t}\, \mathbb{P}[X_1 + \cdots + X_t = t - 1] = \frac{1}{t}\, e^{-\lambda t}\, \frac{(\lambda t)^{t-1}}{(t-1)!} = \frac{e^{-\lambda t} (\lambda t)^{t-1}}{t!},$$
where we used that a sum of independent Poisson variables is Poisson.
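The progeny distribution of Example 5.22 can be checked against two standard facts: in the subcritical case it sums to $1$ (the progeny is a.s. finite) and its mean is $\sum_{t} m^t = \frac{1}{1-\lambda}$. A sketch of ours, computed in log space to avoid overflow:

```python
from math import exp, lgamma, log

def progeny_pmf(lam, t):
    """P[W = t] = e^{-lam t} (lam t)^{t-1} / t!  for Poi(lam) offspring
    (Example 5.22), evaluated in log space for numerical stability."""
    return exp(-lam * t + (t - 1) * log(lam * t) - lgamma(t + 1))

lam = 0.5  # subcritical, so the total progeny W is finite a.s.
mass = sum(progeny_pmf(lam, t) for t in range(1, 500))
mean = sum(t * progeny_pmf(lam, t) for t in range(1, 500))
```

With $\lambda = 1/2$, `mass` is numerically $1$ and `mean` is numerically $\frac{1}{1 - \lambda} = 2$; the truncation at $t = 500$ is harmless because the tail decays exponentially in the subcritical case.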

5.3 Comparison to branching processes

We begin with an example whose connection to branching processes is clear: percolation on trees. Translating standard branching process results into their percolation counterparts immediately gives a more detailed picture of the behavior of the process than was derived in Section 5.1.3. We then tackle the phase transition of Erdős-Rényi graphs using a comparison to branching processes.

5.3.1 Percolation on trees: critical exponents

In this section, we use branching processes to study bond percolation on the infinite $b$-ary tree $\widehat{T}_b$. The same techniques can be adapted to $T_d$ with $d = b + 1$ in a straightforward manner. We denote the root by $0$.

We think of the open cluster of the root, $C_0$, as the progeny of a branching process as follows. Denote by $\partial_n$ the $n$-th level of $\widehat{T}_b$, that is, the vertices of $\widehat{T}_b$ at graph distance $n$ from the root. In the branching process interpretation, we think of the immediate descendants in $C_0$ of a vertex $v$ as the children of $v$. By construction, $v$ has at most $b$ children, independently of all other vertices in the same generation. In this branching process: the offspring distribution is binomial with parameters $b$ and $p$; $Z_n := |C_0 \cap \partial_n|$ represents the size of the progeny at generation $n$; and $W := |C_0|$ is the total progeny of the process. In particular, $|C_0| < +\infty$ if and only if the process goes extinct. Because the mean number of offspring is $bp$, by Theorem 5.7, this leads immediately to a (second) proof of:

Claim 5.23.
$$p_c(\widehat{T}_b) = \frac{1}{b}.$$

The generating function of the offspring distribution is $\phi(s) := ((1-p) + ps)^b$. So, by Theorems 5.6 and 5.7, the percolation function
$$\theta(p) = \mathbb{P}_p[|C_0| = +\infty]$$
is $0$ on $[0, 1/b]$, while on $(1/b, 1]$ the quantity $\zeta(p) := 1 - \theta(p)$ is the unique solution in $[0,1)$ of the fixed point equation
$$s = ((1-p) + ps)^b. \tag{5.4}$$
For $b = 2$, for instance, we can compute the fixed point explicitly by noting that
$$0 = ((1-p) + ps)^2 - s = p^2 s^2 + [2p(1-p) - 1]\, s + (1-p)^2,$$

whose solution for $p \in (1/2, 1]$ is
$$\begin{aligned}
s &= \frac{-[2p(1-p) - 1] \pm \sqrt{[2p(1-p) - 1]^2 - 4p^2(1-p)^2}}{2p^2}\\
&= \frac{-[2p(1-p) - 1] \pm \sqrt{1 - 4p(1-p)}}{2p^2}\\
&= \frac{-[2p(1-p) - 1] \pm (2p - 1)}{2p^2}\\
&= \frac{2p^2 + [(1 - 2p) \pm (2p - 1)]}{2p^2}.
\end{aligned}$$
So, rejecting the fixed point $1$,
$$\theta(p) = 1 - \frac{2p^2 + 2(1 - 2p)}{2p^2} = \frac{2p - 1}{p^2}.$$
We have proved:

Claim 5.24. For $b = 2$,
$$\theta(p) = \begin{cases} 0, & 0 \le p \le \frac12,\\[4pt] \dfrac{2\big(p - \frac12\big)}{p^2}, & \frac12 < p \le 1. \end{cases}$$

The expected size of the population at generation $n$ is $(bp)^n$, so for $p \in [0, \frac1b)$,
$$\mathbb{E}\,|C_0| = \sum_{n \ge 0} (bp)^n = \frac{1}{1 - bp}.$$
For $p \in (\frac1b, 1)$, the total progeny is infinite with positive probability, but it is of interest to compute the expected cluster size conditioned on $\{|C_0| < +\infty\}$. We use the duality principle, Theorem 5.18. For $0 \le k \le b$, let
$$\begin{aligned}
\hat{p}_k &:= [\zeta(p)]^{k-1} \binom{b}{k} p^k (1-p)^{b-k}
= \frac{\binom{b}{k} [p\,\zeta(p)]^k (1-p)^{b-k}}{\zeta(p)}
= \frac{\binom{b}{k} [p\,\zeta(p)]^k (1-p)^{b-k}}{((1-p) + p\,\zeta(p))^b}\\
&= \binom{b}{k} \left(\frac{p\,\zeta(p)}{(1-p) + p\,\zeta(p)}\right)^{k} \left(\frac{1-p}{(1-p) + p\,\zeta(p)}\right)^{b-k}
= \binom{b}{k}\, \hat{p}^k (1 - \hat{p})^{b-k},
\end{aligned}$$

where we used (5.4) and implicitly defined the dual density
$$\hat{p} := \frac{p\, \zeta(p)}{(1-p) + p\, \zeta(p)}. \tag{5.5}$$
In particular, $\{\hat{p}_k\}$ is indeed a probability distribution. In fact, it is binomial with parameters $b$ and $\hat{p}$. The corresponding generating function is
$$\hat{\phi}(s) := ((1 - \hat{p}) + \hat{p}\, s)^b = \zeta(p)^{-1}\, \phi(s\, \zeta(p)),$$
where the second expression can be seen directly from the definition of $\{\hat{p}_k\}$. Moreover,
$$\hat{\phi}'(s) = \zeta(p)^{-1}\, \phi'(s\, \zeta(p))\, \zeta(p) = \phi'(s\, \zeta(p)),$$
so $\hat{\phi}'(1^-) = \phi'(\zeta(p)) < 1$ by the proof of Theorem 5.7, confirming that percolation with density $\hat{p}$ is subcritical. Summarizing:

Claim 5.25. Conditioned on $\{|C_0| < +\infty\}$, (supercritical) percolation on $\widehat{T}_b$ with density $p \in (\frac1b, 1)$ has the same distribution as (subcritical) percolation on $\widehat{T}_b$ with density $\hat{p}$ defined by (5.5).

Therefore:

Claim 5.26.
$$\chi_f(p) := \mathbb{E}_p\big[\,|C_0|\, \mathbf{1}_{\{|C_0| < +\infty\}}\big] = \begin{cases} \dfrac{1}{1 - bp}, & p \in [0, \frac1b),\\[6pt] \dfrac{\zeta(p)}{1 - b\hat{p}}, & p \in (\frac1b, 1). \end{cases}$$

For $b = 2$, $\zeta(p) = 1 - \theta(p) = \left(\frac{1-p}{p}\right)^2$, so
$$\hat{p} = \frac{p \left(\frac{1-p}{p}\right)^2}{(1-p) + p \left(\frac{1-p}{p}\right)^2} = \frac{(1-p)^2}{p(1-p) + (1-p)^2} = 1 - p,$$
and

Claim 5.27. For $b = 2$,
$$\chi_f(p) = \begin{cases} \dfrac{1}{1 - 2p}, & p \in [0, \frac12),\\[6pt] \dfrac{(1-p)^2}{p^2\,(2p - 1)}, & p \in (\frac12, 1). \end{cases}$$
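The explicit formulas for $b = 2$ can be verified against the fixed-point equation (5.4) directly. A sketch of ours:

```python
p = 0.75  # supercritical bond percolation on the binary tree (p_c = 1/2)

# zeta(p): fixed-point iteration for s = ((1-p) + p*s)^2, eq. (5.4);
# starting from 0 converges to the smallest fixed point.
zeta = 0.0
for _ in range(200):
    zeta = ((1 - p) + p * zeta) ** 2

theta = 1 - zeta                          # percolation function theta(p)
p_hat = p * zeta / ((1 - p) + p * zeta)   # dual density, eq. (5.5)
```

For $p = 3/4$ the iteration returns $\zeta(p) = \big(\frac{1-p}{p}\big)^2 = \frac19$, so $\theta(p) = \frac{2p-1}{p^2} = \frac89$ and the dual density is $\hat{p} = 1 - p = \frac14$, matching Claims 5.24 and the computation above.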

In fact, the hitting-time theorem, Theorem 5.21, gives an explicit formula for the distribution of $|C_0|$. Namely, $|C_0| \overset{d}{=} \tau_0$ for $S_t = \sum_{\ell' \le t} X_{\ell'} - (t - 1)$, where $S_0 = 1$ and the $X_{\ell'}$'s are i.i.d. binomial with parameters $b$ and $p$, and further $\mathbb{P}[\tau_0 = t] = \frac{1}{t}\, \mathbb{P}[S_t = 0]$. We have
$$\mathbb{P}_p[|C_0| = \ell] = \frac{1}{\ell}\, \mathbb{P}[X_1 + \cdots + X_\ell = \ell - 1] = \frac{1}{\ell} \binom{b\ell}{\ell - 1} p^{\ell - 1} (1-p)^{b\ell - (\ell - 1)}, \tag{5.6}$$
where we used that a sum of independent binomials with the same $p$ is still binomial. In particular, at criticality, using Stirling's formula it can be checked that
$$\mathbb{P}_{p_c}[|C_0| = \ell] \sim \frac{1}{\ell} \cdot \frac{1}{\sqrt{2\pi\, p_c (1 - p_c)\, b\ell}} = \frac{1}{\sqrt{2\pi (1 - p_c)\, \ell^3}},$$
as $\ell \to +\infty$.

Close to criticality, physicists predict that many quantities behave according to power laws in $|p - p_c|$, where the exponents involved are referred to as critical exponents. The critical exponents are believed to satisfy certain universality properties. But even proving the existence of such exponents in general remains a major open problem. On trees, though, we can simply read off the critical exponents from the above formulas. For $b = 2$, Claims 5.24 and 5.27 imply for instance that, as $p \to p_c$,
$$\theta(p) \sim 8(p - p_c)\, \mathbf{1}_{\{p > 1/2\}}, \quad \text{and} \quad \chi_f(p) \sim \frac{1}{2}\, |p - p_c|^{-1}.$$
In fact, as can be seen from Claim 5.26, the critical exponent of $\chi_f(p)$ does not depend on $b$. The same holds for $\theta(p)$. See Exercise 5.5. Using (5.6), the higher moments of $|C_0|$ can also be studied around criticality. See Exercise 5.6.

5.3.2 Random binary search trees: height

To be written. See [Dev98, Section 2.1].
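Formula (5.6) can be sanity-checked numerically: in the subcritical regime the probabilities must sum to $1$ and their mean must be $\mathbb{E}|C_0| = \frac{1}{1 - bp}$. A sketch of ours, in log space since the binomial coefficients grow quickly:

```python
from math import exp, lgamma, log

def cluster_size_pmf(b, p, l):
    """Eq. (5.6): P_p[|C_0| = l] on the infinite b-ary tree, in log space."""
    log_binom = lgamma(b * l + 1) - lgamma(l) - lgamma(b * l - l + 2)
    return exp(log_binom + (l - 1) * log(p)
               + (b * l - (l - 1)) * log(1 - p) - log(l))

b, p = 2, 0.3  # subcritical: bp = 0.6 < 1
mass = sum(cluster_size_pmf(b, p, l) for l in range(1, 2000))
mean = sum(l * cluster_size_pmf(b, p, l) for l in range(1, 2000))
```

Here $\ell = 1$ gives $(1-p)^b$ (both edges at the root closed), and `mean` is numerically $\frac{1}{1 - bp} = 2.5$.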

5.3.3 Erdős-Rényi graph: the phase transition

A compelling way to view Erdős-Rényi graphs as the density varies is the following coupling or "evolution". For each pair $\{i,j\}$, let $U_{\{i,j\}}$ be independent uniform random variables in $[0,1]$ and set $G(p) := ([n], E(p))$ where $\{i,j\} \in E(p)$ if and only if $U_{\{i,j\}} \le p$. Then $G(p)$ is distributed according to $G_{n,p}$. As $p$ varies from $0$ to $1$, we start with an empty graph and progressively add edges until the complete graph is obtained.

We showed previously that $\frac{\log n}{n}$ is a threshold function for connectivity. Before connectivity occurs in the evolution of the random graph, a quantity of interest is the size of the largest connected component. As we show in this section, this quantity itself undergoes a remarkable phase transition: when $p = \frac{\lambda}{n}$ with $\lambda < 1$, the largest component has size $\Theta(\log n)$; as $\lambda$ crosses $1$, many components quickly merge to form a so-called giant component of size $\Theta(n)$.

This celebrated result of Erdős and Rényi, which is often referred to as "the phase transition" of the Erdős-Rényi graph, is related to the phase transition in percolation. That should be clear from the similarities between the proofs, specifically the branching process approach to percolation on trees (Section 5.3.1). Although the proof is quite long, it is well worth studying in detail. It employs most tools we have seen up to this point: first and second moment methods, the Chernoff-Cramér bound, martingale techniques, coupling and stochastic domination, and branching processes. It is quintessential discrete probability.

Statements and proof sketch. Before stating the main theorems, we recall a basic result from Chapter 2:

- (Poisson tail) Let $S_n$ be a sum of $n$ i.i.d. $\mathrm{Poi}(\lambda)$ variables. Recall from (2.28) and (2.29) that, for $a > \lambda$,
$$-\frac{1}{n} \log \mathbb{P}[S_n \ge an] \ge a \log \frac{a}{\lambda} - a + \lambda =: I^{\mathrm{Poi}}_\lambda(a), \tag{5.7}$$
and similarly, for $a < \lambda$,
$$-\frac{1}{n} \log \mathbb{P}[S_n \le an] \ge I^{\mathrm{Poi}}_\lambda(a). \tag{5.8}$$
To simplify the notation, we let
$$I_\lambda := I^{\mathrm{Poi}}_\lambda(1) = \lambda - 1 - \log \lambda \ge 0, \tag{5.9}$$
where the inequality follows from the convexity of $I_\lambda$ as a function of $\lambda$ and the fact that it attains its minimum at $\lambda = 1$, where it is $0$.

Requires: Sections 2.2.1, 2.3.1, 4.1 and

We let $p_n = \frac{\lambda}{n}$ and denote by $C_{\max}$ the largest connected component. In the subcritical case, that is, when $\lambda < 1$, we show that the largest connected component has logarithmic size in $n$.

Theorem 5.28 (Subcritical case: upper bound on the largest cluster). Let $G_n \sim G_{n,p_n}$ where $p_n = \frac{\lambda}{n}$ with $\lambda \in (0,1)$. For all $\kappa > 0$,
$$\mathbb{P}_{n,p_n}\big[|C_{\max}| > (1 + \kappa)\, I_\lambda^{-1} \log n\big] = o(1),$$
where $I_\lambda$ is defined in (5.9).

(We also give a matching logarithmic lower bound on the size of $C_{\max}$ in Theorem 5.36.)

In the supercritical case, that is, when $\lambda > 1$, we prove the existence of a unique connected component of size linear in $n$, which is referred to as the giant component.

Theorem 5.29 (Supercritical regime: giant component). Let $G_n \sim G_{n,p_n}$ where $p_n = \frac{\lambda}{n}$ with $\lambda > 1$, and let $\zeta_\lambda$ be the survival probability of the Poisson branching process (Example 5.11). For any $\gamma \in (1/2, 1)$ and $\delta < 2\gamma - 1$,
$$\mathbb{P}_{n,p_n}\big[\,\big|\,|C_{\max}| - \zeta_\lambda n\,\big| \ge n^{\gamma}\,\big] \le O(n^{-\delta}).$$
In fact, with probability $1 - o(1)$, there is a unique largest component, and the second largest cluster has size $\Theta(\log n)$.

See Figure 5.3 for an illustration. At a high level, the proof goes as follows:

- (Subcritical regime) In the subcritical case, we use an exploration process and a domination argument to approximate the size of the connected components with the progeny of a branching process. The result then follows from the hitting-time theorem and the Poisson tail above.

- (Supercritical regime) In the supercritical case, a similar argument gives a bound on the expected size of the giant component, which is related to the survival of the branching process. Chebyshev's inequality gives concentration. The hard part there is to bound the variance.

Exploration process. For a vertex $v \in [n]$, let $C_v$ be the connected component containing $v$, also referred to as the cluster of $v$. To analyze the size of $C_v$, we introduce a natural procedure to explore $C_v$ and show that it is dominated above and below by branching processes.
This procedure is similar to the exploration process defined in Section 5.2.1. The exploration process started at $v$ has 3 types of vertices: active, explored, and neutral vertices:

Figure 5.3: Illustration of the phase transition.

- $\mathcal{A}_t$: active vertices,
- $\mathcal{E}_t$: explored vertices,
- $\mathcal{N}_t$: neutral vertices.

We start with $\mathcal{A}_0 := \{v\}$, $\mathcal{E}_0 := \emptyset$, and $\mathcal{N}_0$ contains all other vertices in $G_n$. At time $t$, if $\mathcal{A}_{t-1} = \emptyset$ we let $(\mathcal{A}_t, \mathcal{E}_t, \mathcal{N}_t) := (\mathcal{A}_{t-1}, \mathcal{E}_{t-1}, \mathcal{N}_{t-1})$. Otherwise, we pick a random element, $a_t$, from $\mathcal{A}_{t-1}$ and set:

- $\mathcal{A}_t := (\mathcal{A}_{t-1} \setminus \{a_t\}) \cup \{x \in \mathcal{N}_{t-1} : \{x, a_t\} \in G_n\}$,
- $\mathcal{E}_t := \mathcal{E}_{t-1} \cup \{a_t\}$,
- $\mathcal{N}_t := \mathcal{N}_{t-1} \setminus \{x \in \mathcal{N}_{t-1} : \{x, a_t\} \in G_n\}$.

We imagine revealing the edges of $G_n$ as they are encountered in the exploration process, and we let $(\mathcal{F}_t)$ be the corresponding filtration. In words, starting with $v$, the cluster of $v$ is progressively grown by adding to it at each time a vertex adjacent to one of the previously explored vertices and uncovering its neighbors in $G_n$. In this process, $\mathcal{E}_t$ is the set of previously explored vertices and $\mathcal{A}_t$ — the frontier of the process — is the set of vertices which are known to belong to $C_v$ but whose full neighborhood is waiting to be uncovered. The rest of the vertices form the set $\mathcal{N}_t$. See Figure 5.4.

Figure 5.4: Exploration process for $C_v$. The green edges are in $\mathcal{F}_t$; the red ones are not.

Let $A_t := |\mathcal{A}_t|$, $E_t := |\mathcal{E}_t|$, and $N_t := |\mathcal{N}_t|$. Note that $(E_t)$ is non-decreasing while $(N_t)$ is non-increasing. Let
$$\tau_0 := \inf\{t \ge 0 : A_t = 0\}.$$
The process is fixed for all $t > \tau_0$. Notice that $E_t = t$ for all $t \le \tau_0$, as exactly one vertex is explored at each time until the set of active vertices is empty. Moreover, for all $t$, $(\mathcal{A}_t, \mathcal{E}_t, \mathcal{N}_t)$ forms a partition of $[n]$, so
$$A_t + t + N_t = n, \quad \forall t \le \tau_0. \tag{5.10}$$
Hence, in tracking the size of the exploration process, we can work alternatively with $A_t$ or $N_t$. Specifically, the size of the cluster of $v$ can be characterized as follows.

Lemma 5.30. $\tau_0 = |C_v|$.

Proof. Indeed, a single vertex of $C_v$ is explored at each time until all of $C_v$ has been visited. At that point, $\mathcal{A}_t$ is empty.

The processes $(A_t)$ and $(N_t)$ admit a simple recursive form. Conditioning on $\mathcal{F}_{t-1}$:

- (Active vertices) If $A_{t-1} = 0$, the exploration process has finished its course and $A_t = 0$. Otherwise, (a) one active vertex becomes an explored vertex and (b) its neutral neighbors become active vertices. That is,
$$A_t = A_{t-1} + \mathbf{1}_{\{A_{t-1} > 0\}}\, \big(\underbrace{{}-1}_{(a)} \underbrace{{}+Z_t}_{(b)}\big), \tag{5.11}$$
where $Z_t$ is binomial with parameters $N_{t-1} = n - (t-1) - A_{t-1}$ and $p_n$. For the coupling arguments below, it will be useful to think of $Z_t$ as a sum of independent Bernoulli variables. That is, let $(I_{t,j} : t \ge 1, j \ge 1)$ be an array of independent, identically distributed $\{0,1\}$-variables with $\mathbb{P}[I_{1,1} = 1] = p_n$. We write
$$Z_t = \sum_{i=1}^{N_{t-1}} I_{t,i}. \tag{5.12}$$

- (Neutral vertices) Similarly, if $A_{t-1} > 0$, i.e. $N_{t-1} < n - (t-1)$, then $Z_t$ neutral vertices become active vertices. That is,
$$N_t = N_{t-1} - \mathbf{1}_{\{N_{t-1} < n - (t-1)\}}\, Z_t. \tag{5.13}$$

Branching process arguments. With these observations, we now relate the cluster size of $v$ to the total progeny of a certain branching process. This is the key lemma.

Lemma 5.31 (Cluster size: branching process approximation). Let $G_n \sim G_{n,p_n}$ where $p_n = \frac{\lambda}{n}$ with $\lambda > 0$, and let $C_v$ be the connected component of $v \in [n]$. Let $W$ be the total progeny of a branching process with offspring distribution $\mathrm{Poi}(\lambda)$. Then, for $k_n = o(\sqrt{n})$,
$$\mathbb{P}[W \ge k_n] - O\left(\frac{k_n^2}{n}\right) \le \mathbb{P}_{n,p_n}[|C_v| \ge k_n] \le \mathbb{P}[W \ge k_n].$$

Before proving the lemma, recall the following simple domination results from Chapter 4:

- (Binomial domination) We have
$$n \ge m \implies \mathrm{Bin}(n, p) \succeq \mathrm{Bin}(m, p). \tag{5.14}$$
- The binomial distribution is also dominated by the Poisson distribution in the following way:
$$\lambda \in (0, 1) \implies \mathrm{Poi}(\lambda) \succeq \mathrm{Bin}\Big(n - 1, \frac{\lambda}{n}\Big). \tag{5.15}$$

For the proofs, see Examples 4.24 and 4.25.
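The recursions (5.11)–(5.13) also make the cluster size easy to simulate without ever building the graph: only the counts $(A_t, N_t)$ matter. A Monte Carlo sketch of ours (the function name and parameters are illustrative, not from the text):

```python
import random

def cluster_size(n, lam, rng):
    """|C_v| in G(n, lam/n), simulated via the exploration recursions
    (5.11)-(5.12): only the counts A_t, N_t are tracked, not the graph."""
    p = lam / n
    A, N, t = 1, n - 1, 0
    while A > 0:
        t += 1
        Z = sum(1 for _ in range(N) if rng.random() < p)  # Bin(N_{t-1}, p_n)
        A += Z - 1
        N -= Z
    return t  # = tau_0 = |C_v| by Lemma 5.30

rng = random.Random(42)
n, lam, reps = 1000, 0.5, 2000
avg = sum(cluster_size(n, lam, rng) for _ in range(reps)) / reps
```

In the subcritical regime, Lemma 5.31 suggests `avg` should be close to the mean total progeny of the approximating branching process, $\frac{1}{1-\lambda} = 2$ here; the simulation agrees up to Monte Carlo error.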

We use these domination results to relate the size of the connected components to the progeny of branching processes.

Proof of Lemma 5.31. We start with the upper bound.

Upper bound: Because $N_{t-1} = n - (t-1) - A_{t-1} \le n - 1$, conditioned on $\mathcal{F}_{t-1}$, the following stochastic domination relations hold:
$$\mathrm{Bin}\Big(N_{t-1}, \frac{\lambda}{n}\Big) \preceq \mathrm{Bin}\Big(n-1, \frac{\lambda}{n}\Big) \preceq \mathrm{Poi}(\lambda),$$
by (5.14) and (5.15). Observe that the r.h.s. does not depend on $N_{t-1}$. Let $(\overline{Z}_t)$ be a sequence of independent $\mathrm{Poi}(\lambda)$ variables. Using the coupling in Example 4.28, we can couple the processes $(I_{t,j})_j$ and $(\overline{Z}_t)$ in such a way that $\overline{Z}_t \ge \sum_{j=1}^{n-1} I_{t,j}$ a.s. for all $t$. Then, by induction on $t$, $0 \le A_t \le \overline{A}_t$ a.s. for all $t$, where we define
$$\overline{A}_t := \overline{A}_{t-1} + \mathbf{1}_{\{\overline{A}_{t-1} > 0\}}\big({-1} + \overline{Z}_t\big), \tag{5.16}$$
with $\overline{A}_0 := 1$. (In fact, this is a domination of Markov transition matrices, as defined in Definition 4.36.) In words, $(A_t)$ is stochastically dominated by the exploration process of a branching process with offspring distribution $\mathrm{Poi}(\lambda)$. As a result, letting
$$\overline{\tau}_0 := \inf\{t \ge 0 : \overline{A}_t = 0\}$$
be the total progeny of the branching process, we immediately get
$$\mathbb{P}_{n,p_n}[|C_v| \ge k_n] = \mathbb{P}_{n,p_n}[\tau_0 \ge k_n] \le \mathbb{P}[\overline{\tau}_0 \ge k_n] = \mathbb{P}[W \ge k_n].$$

Lower bound: In the other direction, we proceed in two steps. We first show that, up to a certain time, the process is bounded from below by a branching process with binomial offspring distribution. In a second step, we show that this binomial branching process can be approximated by a Poisson branching process.

1. (Domination from below) Let $\underline{A}_t$ be defined as
$$\underline{A}_t := \underline{A}_{t-1} + \mathbf{1}_{\{\underline{A}_{t-1} > 0\}}\big({-1} + \underline{Z}_t\big), \tag{5.17}$$
with $\underline{A}_0 := 1$, where
$$\underline{Z}_t := \sum_{i=1}^{n - k_n} I_{t,i}. \tag{5.18}$$
Note that $(\underline{A}_t)$ is the size of the active set in the exploration process of a branching process with offspring distribution $\mathrm{Bin}(n - k_n, p_n)$. Let
$$\underline{\tau}_0 := \inf\{t \ge 0 : \underline{A}_t = 0\}$$

be the total progeny of this branching process. We claim that $A_t$ is bounded from below by $\underline A_t$ up to the time

\[
\sigma_{k_n} := \inf\{t \geq 0 : N_t \leq n - k_n\}.
\]

Indeed, for all $t \leq \sigma_{k_n}$, $N_{t-1} > n - k_n$. Hence, by (5.12) and (5.18), $Z_t \geq \underline Z_t$ for all $t \leq \sigma_{k_n}$, and as a result, by induction on $t$,

\[
A_t \geq \underline A_t, \quad \forall t \leq \sigma_{k_n}.
\]

Because the inequality between $A_t$ and $\underline A_t$ holds only up to time $\sigma_{k_n}$, we cannot compare $\tau_0$ and $\underline\tau_0$ directly. However, observe that the size of the cluster of $v$ is at least the total number of active and explored vertices at any time $t$; in particular, when $\sigma_{k_n} < +\infty$,

\[
|\mathcal{C}_v| \geq A_{\sigma_{k_n}} + E_{\sigma_{k_n}} = n - N_{\sigma_{k_n}} \geq k_n.
\]

On the other hand, when $\sigma_{k_n} = +\infty$, we have $N_t > n - k_n$ for all $t$ — in particular for $t = \tau_0$ — and therefore $|\mathcal{C}_v| = n - N_{\tau_0} < k_n$. Moreover, in that case, because $A_t \geq \underline A_t$ for all $t$, it holds in addition that $\underline\tau_0 \leq \tau_0 < k_n$. To sum up, we have proved the implications

\[
\underline\tau_0 \geq k_n \implies \sigma_{k_n} < +\infty \implies |\mathcal{C}_v| \geq k_n.
\]

In particular,

\[
\mathbb{P}[\underline\tau_0 \geq k_n] \leq \mathbb{P}_{n,p_n}[\tau_0 \geq k_n]. \tag{5.19}
\]

2. (Poisson approximation) By Theorem 5.21,

\[
\mathbb{P}[\underline\tau_0 = t] = \frac{1}{t}\, \mathbb{P}\!\left[\sum_{i=1}^{t} \underline Z_i = t-1\right], \tag{5.20}
\]

where the $\underline Z_i$'s are independent $\mathrm{Bin}(n-k_n, p_n)$. Note that $\sum_{i=1}^t \underline Z_i \sim \mathrm{Bin}(t(n-k_n), p_n)$. Recall the definition of $(\bar Z_t)$ from (5.16). By Example 4.10, Theorem 4.16, and the triangle inequality for total variation distance,

\[
\left| \mathbb{P}\!\left[\sum_{i=1}^{t} \underline Z_i = t-1\right] - \mathbb{P}\!\left[\sum_{i=1}^{t} \bar Z_i = t-1\right] \right|
\leq \frac{1}{2}\, t(n-k_n)\big({-\log(1-p_n)}\big)^2 + \Big[t\lambda - t(n-k_n)\big({-\log(1-p_n)}\big)\Big]
\]
\[
\leq \frac{1}{2}\, tn \left(\frac{\lambda}{n} + O(n^{-2})\right)^{2} + \left[t\lambda - t(n-k_n)\left(\frac{\lambda}{n} + O(n^{-2})\right)\right]
= O\!\left(\frac{t k_n}{n}\right).
\]
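As a quick numerical illustration of the binomial–Poisson comparison used in this step (not from the text — the parameters `m` and `p` below are stand-ins for $t(n-k_n)$ and $p_n$, and the function names are ours), one can compute the total variation distance between $\mathrm{Bin}(m,p)$ and $\mathrm{Poi}(mp)$ directly from the pmfs and compare it with Le Cam's classical bound $mp^2$, which is of the same order as the $O(tk_n/n)$ estimate above:

```python
import math

def binom_pmf(m, p, j):
    """Exact Binomial(m, p) pmf, computed in log-space to avoid overflow."""
    if j < 0 or j > m:
        return 0.0
    logp = (math.lgamma(m + 1) - math.lgamma(j + 1) - math.lgamma(m - j + 1)
            + j * math.log(p) + (m - j) * math.log(1.0 - p))
    return math.exp(logp)

def poisson_pmf(mu, j):
    """Exact Poisson(mu) pmf, computed in log-space."""
    return math.exp(-mu + j * math.log(mu) - math.lgamma(j + 1))

# Stand-ins for t(n - k_n) Bernoulli trials with success probability p_n.
m, p = 1000, 0.002
mu = m * p

# Total variation distance between Bin(m, p) and Poi(m p); the terms
# beyond j = 60 are negligible at these parameter values.
dtv = 0.5 * sum(abs(binom_pmf(m, p, j) - poisson_pmf(mu, j)) for j in range(61))

lecam = m * p * p  # Le Cam's inequality: d_TV(Bin(m, p), Poi(m p)) <= m p^2
```

The truncation at $j = 60$ is harmless here since both tails are negligible at these parameter values.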

So by (5.20),

\[
\mathbb{P}[\underline\tau_0 \geq k_n] = 1 - \mathbb{P}[\underline\tau_0 < k_n]
= 1 - \mathbb{P}[\bar\tau_0 < k_n] + O\!\left(\frac{k_n^2}{n}\right)
= \mathbb{P}[\bar\tau_0 \geq k_n] + O\!\left(\frac{k_n^2}{n}\right).
\]

Plugging this approximation back into (5.19) gives

\[
\mathbb{P}_{n,p_n}[|\mathcal{C}_v| \geq k_n] = \mathbb{P}_{n,p_n}[\tau_0 \geq k_n]
\geq \mathbb{P}[\bar\tau_0 \geq k_n] - O\!\left(\frac{k_n^2}{n}\right)
= \mathbb{P}[W \geq k_n] - O\!\left(\frac{k_n^2}{n}\right).
\]

Remark 5.32. In fact, one can get a slightly better lower bound. See Exercise 5.7.

When $k_n$ is large, the branching process approximation above is not as accurate because of a saturation effect: an Erdős-Rényi graph has a finite pool of vertices from which to draw edges; as the number of neutral vertices decreases, so does the expected number of uncovered edges at each time step. Instead we use the following lemma.

Lemma 5.33 (Cluster size: saturation). Let $G_n \sim \mathbb{G}_{n,p_n}$ where $p_n = \lambda/n$ with $\lambda > 0$, and let $\mathcal{C}_v$ be the connected component of $v \in [n]$. Let $Y_t \sim \mathrm{Bin}(n-1, 1-(1-p_n)^t)$. Then, for any $t$,

\[
\mathbb{P}_{n,p_n}[|\mathcal{C}_v| = t] \leq \mathbb{P}[Y_t = t-1].
\]

Proof. We work with neutral vertices. By Lemma 5.30 and Equation (5.10), for any $t$,

\[
\mathbb{P}_{n,p_n}[|\mathcal{C}_v| = t] = \mathbb{P}_{n,p_n}[\tau_0 = t] \leq \mathbb{P}_{n,p_n}[N_t = n - t]. \tag{5.21}
\]

Recall that $N_0 = n-1$ and

\[
N_t = N_{t-1} - \mathbf{1}_{\{N_{t-1} < n-(t-1)\}} \sum_{i=1}^{N_{t-1}} I_{t,i}. \tag{5.22}
\]

It is easier to consider the process without the indicator, as it has a simple distribution. Define $N'_0 := n-1$ and

\[
N'_t := N'_{t-1} - \sum_{i=1}^{N'_{t-1}} I_{t,i}, \tag{5.23}
\]

and observe that $N_t \geq N'_t$ for all $t$, as the two processes agree up to time $\tau_0$, at which point $N_t$ stays fixed. The interpretation of $N'_t$ is straightforward: starting with $n-1$ vertices, at each time step each remaining vertex is discarded with probability $p_n$. Hence, the number of surviving vertices at time $t$ has distribution

\[
N'_t \sim \mathrm{Bin}\!\left(n-1, (1-p_n)^t\right),
\]

by the independence of the steps. Arguing as in (5.21),

\[
\mathbb{P}_{n,p_n}[|\mathcal{C}_v| = t] \leq \mathbb{P}_{n,p_n}[N'_t = n-t]
= \mathbb{P}_{n,p_n}[(n-1) - N'_t = t-1] = \mathbb{P}[Y_t = t-1],
\]

which concludes the proof.

Combining the previous lemmas, we get:

Lemma 5.34 (Bound on the cluster size). Let $G_n \sim \mathbb{G}_{n,p_n}$ where $p_n = \lambda/n$ with $\lambda > 0$, and let $\mathcal{C}_v$ be the connected component of $v \in [n]$.

- (Subcritical case) Assume $\lambda \in (0,1)$. For all $\kappa > 0$,

\[
\mathbb{P}_{n,p_n}\!\left[|\mathcal{C}_v| > (1+\kappa) I_\lambda^{-1} \log n\right] = O\!\left(n^{-(1+\kappa)}\right).
\]

- (Supercritical case) Assume $\lambda > 1$. Let $\zeta_\lambda$ be the unique solution in $(0,1)$ to the fixed point equation $1 - e^{-\lambda\zeta} = \zeta$. Note that, by Example 5.11, $\zeta_\lambda$ is the survival probability of a branching process with offspring distribution $\mathrm{Poi}(\lambda)$. For any $\kappa > 0$,

\[
\mathbb{P}_{n,p_n}\!\left[|\mathcal{C}_v| > (1+\kappa) I_\lambda^{-1} \log n\right] = \zeta_\lambda + O\!\left(\frac{\log^2 n}{n}\right).
\]

Moreover, for any $\delta < \zeta_\lambda$ and any $\gamma > 0$, there exists $\kappa_{\delta,\gamma} > 0$ large enough so that

\[
\mathbb{P}_{n,p_n}\!\left[(1+\kappa_{\delta,\gamma}) I_\lambda^{-1} \log n \leq |\mathcal{C}_v| \leq \delta n\right] = O\!\left(n^{-(1+\gamma)}\right). \tag{5.24}
\]
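As an aside, the fixed point $\zeta_\lambda$ appearing in the supercritical case is easy to compute numerically. A minimal sketch (the function name is ours): since $f(z) = 1 - e^{-\lambda z}$ is increasing and concave with $f'(0) = \lambda > 1$, iterating $f$ from $z = 1$ decreases monotonically to $\zeta_\lambda$.

```python
import math

def zeta(lam, iters=200):
    """Unique root in (0,1) of 1 - exp(-lam*z) = z, for lam > 1.
    f(z) = 1 - e^{-lam z} is increasing and concave with f'(0) = lam > 1,
    so iterating f from z = 1 decreases monotonically to zeta_lam."""
    z = 1.0
    for _ in range(iters):
        z = 1.0 - math.exp(-lam * z)
    return z

z2 = zeta(2.0)  # survival probability of a Poi(2) branching process, ~0.797
z4 = zeta(4.0)  # ~0.980: survival becomes near-certain as lam grows
```

Away from the critical point $\lambda = 1$ the iteration is a strict contraction near the root, so a couple hundred iterations are more than enough.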

Proof. In both cases we use Lemma 5.31. To apply the lemma, we need to bound the tail of the progeny $W$ of a Poisson branching process. Using the notation of Lemma 5.31, by Theorem 5.21,

\[
\mathbb{P}[W > k_n] = \mathbb{P}[W = +\infty] + \sum_{t > k_n} \frac{1}{t}\, \mathbb{P}\!\left[\sum_{i=1}^{t} \bar Z_i = t-1\right], \tag{5.25}
\]

where the $\bar Z_i$'s are i.i.d. $\mathrm{Poi}(\lambda)$. Both terms on the r.h.s. depend on whether the mean $\lambda$ is smaller or larger than $1$.

We start with the first term. When $\lambda < 1$, the Poisson branching process goes extinct with probability $1$. Hence $\mathbb{P}[W = +\infty] = 0$. When $\lambda > 1$, on the other hand, $\mathbb{P}[W = +\infty] = \zeta_\lambda$, where $\zeta_\lambda > 0$ is the survival probability of the branching process.

As to the second term, the sum of the $\bar Z_i$'s is $\mathrm{Poi}(\lambda t)$. When $\lambda < 1$, using (5.7),

\[
\sum_{t > k_n} \frac{1}{t}\, \mathbb{P}\!\left[\sum_{i=1}^{t} \bar Z_i = t-1\right]
\leq \sum_{t > k_n} \mathbb{P}\!\left[\sum_{i=1}^{t} \bar Z_i \geq t-1\right]
\leq \sum_{t > k_n} \exp\!\left(-t\big(I_\lambda - O(t^{-1})\big)\right)
\leq \sum_{t > k_n} C' \exp(-t I_\lambda)
\leq C \exp(-I_\lambda k_n), \tag{5.26}
\]

for some constants $C, C' > 0$, where we assume that $k_n = \omega(1)$. When $\lambda > 1$,

\[
\sum_{t > k_n} \frac{1}{t}\, \mathbb{P}\!\left[\sum_{i=1}^{t} \bar Z_i = t-1\right]
\leq \sum_{t > k_n} \mathbb{P}\!\left[\sum_{i=1}^{t} \bar Z_i \leq t\right]
\leq \sum_{t > k_n} \exp(-t I_\lambda)
\leq C \exp(-I_\lambda k_n), \tag{5.27}
\]

for a possibly different $C > 0$.

Subcritical case: Assume $0 < \lambda < 1$ and let $c = (1+\kappa) I_\lambda^{-1}$ for $\kappa > 0$. By Lemma 5.31,

\[
\mathbb{P}_{n,p_n}[|\mathcal{C}_1| > c \log n] \leq \mathbb{P}[W > c \log n].
\]
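The domination just invoked can also be observed in simulation. A hedged sketch (all function names and parameter choices are ours, purely for illustration): we estimate both tails by Monte Carlo, running the exploration process of the graph and of a $\mathrm{Poi}(\lambda)$ branching process, each stopped as soon as $k$ vertices have been explored, which suffices to decide the event $\{\text{size} \geq k\}$.

```python
import math, random

def graph_tail(n, lam, k, trials, rng):
    """Monte Carlo estimate of P[|C_v| >= k] in G(n, lam/n), via the
    active/explored/neutral exploration process of Section 5.3."""
    p = lam / n
    hits = 0
    for _ in range(trials):
        active, neutral, explored = 1, n - 1, 0
        while active > 0 and explored < k:
            active -= 1        # (a) explore one active vertex
            explored += 1
            z = sum(rng.random() < p for _ in range(neutral))
            active += z        # (b) its neutral neighbors become active
            neutral -= z
        hits += (explored >= k)
    return hits / trials

def poisson_sample(lam, rng):
    """Knuth's product-of-uniforms Poisson sampler."""
    L, k, prod = math.exp(-lam), 0, rng.random()
    while prod > L:
        k += 1
        prod *= rng.random()
    return k

def gw_tail(lam, k, trials, rng):
    """Monte Carlo estimate of P[W >= k], W the total progeny of a
    Poi(lam) branching process, via its exploration process."""
    hits = 0
    for _ in range(trials):
        active, progeny = 1, 0
        while active > 0 and progeny < k:
            active -= 1
            progeny += 1
            active += poisson_sample(lam, rng)
        hits += (progeny >= k)
    return hits / trials

rng = random.Random(0)
t_graph = graph_tail(300, 1.5, 10, 2000, rng)
t_gw = gw_tail(1.5, 10, 2000, rng)  # dominates t_graph, up to sampling noise
```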

By (5.25) and (5.26),

\[
\mathbb{P}[W > c \log n] = O\!\left(\exp(-I_\lambda\, c \log n)\right) = O\!\left(n^{-(1+\kappa)}\right), \tag{5.28}
\]

which proves the claim.

Supercritical case: Now assume $\lambda > 1$ and again let $c = (1+\kappa) I_\lambda^{-1}$ for $\kappa > 0$. By Lemma 5.31,

\[
\mathbb{P}_{n,p_n}[|\mathcal{C}_v| > c \log n] = \mathbb{P}[W > c \log n] + O\!\left(\frac{\log^2 n}{n}\right). \tag{5.29}
\]

By (5.25) and (5.27),

\[
\mathbb{P}[W > c \log n] = \zeta_\lambda + O\!\left(\exp(-c\, I_\lambda \log n)\right) = \zeta_\lambda + O\!\left(n^{-(1+\kappa)}\right). \tag{5.30}
\]

Combining (5.29) and (5.30), for any $\kappa > 0$,

\[
\mathbb{P}_{n,p_n}[|\mathcal{C}_v| > c \log n] = \zeta_\lambda + O\!\left(\frac{\log^2 n}{n}\right). \tag{5.31}
\]

Next, we show that in the supercritical case, when $|\mathcal{C}_v| > c \log n$, the cluster size is in fact linear in $n$ with high probability. By Lemma 5.33,

\[
\mathbb{P}_{n,p_n}[|\mathcal{C}_v| = t] \leq \mathbb{P}[Y_t = t-1] \leq \mathbb{P}[Y_t \leq t],
\]

where $Y_t \sim \mathrm{Bin}(n-1, 1-(1-p_n)^t)$. Roughly, the r.h.s. is negligible until the mean $\mu_t := (n-1)(1-(1-\lambda/n)^t)$ is of the order of $t$. Let $\zeta_\lambda$ be the unique solution in $(0,1)$ to the fixed point equation $f(\zeta) := 1 - e^{-\lambda\zeta} = \zeta$. The solution is unique because $f(0) = 0$, $f(1) < 1$, and $f$ is increasing, strictly concave, and has derivative $\lambda > 1$ at $0$. Note in particular that, when $t = \zeta_\lambda n$, $\mu_t \approx t$. Let $\delta < \zeta_\lambda$. For any $t \in [c \log n, \delta n]$, by a Chernoff bound for Poisson trials (Theorem 2.31),

\[
\mathbb{P}[Y_t \leq t] \leq \exp\!\left(-\frac{\mu_t}{2}\left(1 - \frac{t}{\mu_t}\right)^{2}\right). \tag{5.32}
\]

For $t/n \leq \delta < \zeta_\lambda$, using $1-x \leq e^{-x}$ for $x \in (0,1)$, there is $\beta > 1$ such that

\[
\mu_t \geq (n-1)\left(1 - e^{-\lambda (t/n)}\right) = t \cdot \frac{n-1}{n} \cdot \frac{f(t/n)}{t/n} \geq \beta t,
\]

for $n$ large enough, by the properties of $f$ mentioned above. Plugging this back into (5.32), we get

\[
\mathbb{P}[Y_t \leq t] \leq \exp\!\left(-\frac{\beta t}{2}\left(1 - \frac{1}{\beta}\right)^{2}\right).
\]

Therefore,

\[
\sum_{t = c\log n}^{\delta n} \mathbb{P}_{n,p_n}[|\mathcal{C}_v| = t]
\leq \sum_{t = c\log n}^{\delta n} \mathbb{P}[Y_t \leq t]
\leq \sum_{t = c\log n}^{+\infty} \exp\!\left(-\frac{\beta t}{2}\left(1 - \frac{1}{\beta}\right)^{2}\right)
= O\!\left(\exp\!\left(-\frac{\beta c \log n}{2}\left(1 - \frac{1}{\beta}\right)^{2}\right)\right).
\]

Taking $\kappa > 0$ large enough proves (5.24).

Let $\mathcal{C}_{\max}$ be the largest connected component of $G_n$ (choosing the component containing the lowest label if there is more than one such component). Our goal is to characterize the size of $\mathcal{C}_{\max}$. Let

\[
X_k := \sum_{v \in [n]} \mathbf{1}_{\{|\mathcal{C}_v| > k\}}
\]

be the number of vertices in clusters of size larger than $k$. There is a natural connection between $X_k$ and $\mathcal{C}_{\max}$, namely,

\[
|\mathcal{C}_{\max}| > k \iff X_k > 0 \iff X_k > k.
\]
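The equivalence above, and the subcritical/supercritical contrast in $|\mathcal{C}_{\max}|$, can be checked on simulated graphs. A sketch under our own naming, using a union-find pass over all vertex pairs:

```python
import random
from collections import Counter

def component_sizes(n, p, rng):
    """All connected component sizes of one G(n, p) sample, via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
    return sorted(Counter(find(x) for x in range(n)).values())

rng = random.Random(7)
n = 400
super_sizes = component_sizes(n, 2.0 / n, rng)  # supercritical: lambda = 2
sub_sizes = component_sizes(n, 0.5 / n, rng)    # subcritical: lambda = 0.5

c_max = super_sizes[-1]    # giant component, roughly zeta_2 * n ~ 319
c_max_sub = sub_sizes[-1]  # subcritical: only logarithmic-size clusters

# The equivalence |C_max| > k <=> X_k > k, where X_k is the number of
# vertices in clusters of size larger than k:
k = 50
X_k = sum(s for s in super_sizes if s > k)
```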

A first moment argument on $X_k$ and the previous lemma immediately imply an upper bound on the size of $\mathcal{C}_{\max}$ in the subcritical case.

Proof of Theorem 5.28. Let $c = (1+\kappa) I_\lambda^{-1}$ for $\kappa > 0$. We use the first moment method on $X_k$. By symmetry and the first moment method (Corollary 2.5),

\[
\mathbb{P}_{n,p_n}[|\mathcal{C}_{\max}| > c \log n] = \mathbb{P}_{n,p_n}[X_{c\log n} > 0]
\leq \mathbb{E}_{n,p_n}[X_{c\log n}] = n\, \mathbb{P}_{n,p_n}[|\mathcal{C}_1| > c \log n]. \tag{5.33}
\]

By Lemma 5.34,

\[
\mathbb{P}_{n,p_n}[|\mathcal{C}_{\max}| > c \log n] = O\!\left(n \cdot n^{-(1+\kappa)}\right) = O(n^{-\kappa}) \to 0,
\]

as $n \to +\infty$.

In fact, we prove below that the largest component is of size roughly $I_\lambda^{-1} \log n$. But first we turn to the supercritical regime.

Second moment arguments

To characterize the size of the largest cluster in the supercritical case, we need a second moment argument. Assume $\lambda > 1$. For $\gamma > 0$ and $\delta < \zeta_\lambda$, let $\kappa_{\delta,\gamma}$ be as defined in Lemma 5.34. Set $k_n := (1+\kappa_{\delta,\gamma}) I_\lambda^{-1} \log n$ and $\bar k_n := \delta n$. We call a vertex $v$ such that $|\mathcal{C}_v| \leq k_n$ a small vertex. Let

\[
Y_k := \sum_{v \in [n]} \mathbf{1}_{\{|\mathcal{C}_v| \leq k\}}.
\]

Then $Y_{k_n}$ is the number of small vertices. Observe that by definition $Y_k = n - X_k$. Hence, by Lemma 5.34, the expectation of $Y_{k_n}$ is

\[
\mathbb{E}_{n,p_n}[Y_{k_n}] = n\left(1 - \mathbb{P}_{n,p_n}[|\mathcal{C}_v| > k_n]\right) = (1-\zeta_\lambda)\, n + O(\log^2 n). \tag{5.34}
\]

Using Chebyshev's inequality (Theorem 2.13), we prove that $Y_{k_n}$ is close to its expectation:

Lemma 5.35 (Concentration of $Y_{k_n}$). For any $\alpha \in (1/2, 1)$ and $\gamma < 2\alpha - 1$,

\[
\mathbb{P}_{n,p_n}\!\left[\left|Y_{k_n} - (1-\zeta_\lambda) n\right| \geq n^{\alpha}\right] \leq O(n^{-\gamma}).
\]
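At the level of first moments, the prediction (5.34) that a fraction about $1-\zeta_\lambda$ of the vertices is small can be checked by simulation. A hedged sketch (names and parameters are ours): we estimate $\mathbb{P}[|\mathcal{C}_1| \leq k] = \mathbb{E}[Y_k]/n$, by symmetry, exploring the cluster of a fixed vertex and stopping once more than $k$ vertices have been explored.

```python
import random

def small_vertex_prob(n, lam, k, trials, rng):
    """Monte Carlo estimate of P[|C_1| <= k] in G(n, lam/n): explore the
    cluster of a fixed vertex and stop as soon as more than k vertices
    have been explored (enough to decide whether the vertex is small)."""
    p = lam / n
    small = 0
    for _ in range(trials):
        active, neutral, explored = 1, n - 1, 0
        while active > 0 and explored <= k:
            active -= 1
            explored += 1
            z = sum(rng.random() < p for _ in range(neutral))
            active += z
            neutral -= z
        small += (active == 0 and explored <= k)
    return small / trials

rng = random.Random(3)
frac_small = small_vertex_prob(600, 2.0, 40, 400, rng)
# Supercritical with lambda = 2: expect roughly 1 - zeta_2 ~ 0.20.
```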

Lemma 5.35, which is proved below, leads to our main result in the supercritical case: the existence of the giant component, a unique cluster of size linear in $n$.

Proof of Theorem 5.29. Take $\delta \in (\zeta_\lambda/2, \zeta_\lambda)$ and let $k_n$ and $\bar k_n$ be as above. Let

\[
B_{1,n} := \left\{ \left|X_{k_n} - \zeta_\lambda n\right| \geq n^{\alpha} \right\}.
\]

Because $\alpha < 1$, for $n$ large enough, the event $B_{1,n}^c$ implies that $X_{k_n} \geq 1$ and, in particular, that $|\mathcal{C}_{\max}| \leq X_{k_n}$. Let

\[
B_{2,n} := \left\{\exists v,\ |\mathcal{C}_v| \in [k_n, \bar k_n]\right\}.
\]

If, in addition to $B_{1,n}^c$, $B_{2,n}^c$ also holds, then

\[
|\mathcal{C}_{\max}| \leq X_{k_n} = X_{\bar k_n}.
\]

There is equality in the last display if there is a unique cluster of size greater than $\bar k_n$. This is indeed the case under $B_{1,n}^c \cap B_{2,n}^c$: if there were two distinct clusters of size greater than $\bar k_n$, then, since $2\delta > \zeta_\lambda$, we would have for $n$ large enough

\[
X_{k_n} = X_{\bar k_n} > 2 \bar k_n = 2\delta n > \zeta_\lambda n + n^{\alpha},
\]

a contradiction. Hence we have proved that, under $B_{1,n}^c \cap B_{2,n}^c$, we have $|\mathcal{C}_{\max}| = X_{k_n} = X_{\bar k_n}$. Applying Lemmas 5.34 and 5.35 concludes the proof.

It remains to prove Lemma 5.35.

Proof of Lemma 5.35. The main task is to bound the variance of $Y_{k_n}$. Note that

\[
\mathbb{E}_{n,p_n}[Y_k^2] = \sum_{u,v \in [n]} \mathbb{P}_{n,p_n}[|\mathcal{C}_u| \leq k, |\mathcal{C}_v| \leq k]
= \sum_{u,v \in [n]} \Big\{ \mathbb{P}_{n,p_n}[|\mathcal{C}_u| \leq k, |\mathcal{C}_v| \leq k, u \leftrightarrow v]
+ \mathbb{P}_{n,p_n}[|\mathcal{C}_u| \leq k, |\mathcal{C}_v| \leq k, u \nleftrightarrow v] \Big\}, \tag{5.35}
\]

where $u \leftrightarrow v$ indicates that $u$ and $v$ are in the same connected component, and $u \nleftrightarrow v$ that they are not.

To bound the first term in (5.35), we note that $u \leftrightarrow v$ implies $\mathcal{C}_u = \mathcal{C}_v$. Hence,

\[
\sum_{u,v \in [n]} \mathbb{P}_{n,p_n}[|\mathcal{C}_u| \leq k, |\mathcal{C}_v| \leq k, u \leftrightarrow v]
= \sum_{u,v \in [n]} \mathbb{P}_{n,p_n}[|\mathcal{C}_u| \leq k, v \in \mathcal{C}_u]
= \sum_{u,v \in [n]} \mathbb{E}_{n,p_n}\!\left[\mathbf{1}_{\{|\mathcal{C}_u| \leq k\}}\, \mathbf{1}_{\{v \in \mathcal{C}_u\}}\right]
\]
\[
= \sum_{u \in [n]} \mathbb{E}_{n,p_n}\!\left[|\mathcal{C}_u|\, \mathbf{1}_{\{|\mathcal{C}_u| \leq k\}}\right]
= n\, \mathbb{E}_{n,p_n}\!\left[|\mathcal{C}_1|\, \mathbf{1}_{\{|\mathcal{C}_1| \leq k\}}\right]
\leq nk. \tag{5.36}
\]

To bound the second term in (5.35), we sum over the size of $\mathcal{C}_u$ and note that, conditioned on $\{|\mathcal{C}_u| = \ell,\, u \nleftrightarrow v\}$, the size of $\mathcal{C}_v$ has the same distribution as the unconditional size of $\mathcal{C}_1$ in a $\mathbb{G}_{n-\ell,p_n}$ random graph, that is,

\[
\mathbb{P}_{n,p_n}\!\left[|\mathcal{C}_v| \leq k \,\middle|\, |\mathcal{C}_u| = \ell,\, u \nleftrightarrow v\right] = \mathbb{P}_{n-\ell,p_n}[|\mathcal{C}_1| \leq k].
\]

Observe that this probability is increasing in $\ell$. Hence,

\[
\sum_{u,v \in [n]} \sum_{\ell \leq k} \mathbb{P}_{n,p_n}[|\mathcal{C}_u| = \ell, |\mathcal{C}_v| \leq k, u \nleftrightarrow v]
= \sum_{u,v \in [n]} \sum_{\ell \leq k} \mathbb{P}_{n,p_n}[|\mathcal{C}_u| = \ell, u \nleftrightarrow v]\, \mathbb{P}_{n,p_n}[|\mathcal{C}_v| \leq k \mid |\mathcal{C}_u| = \ell, u \nleftrightarrow v]
\]
\[
\leq \sum_{u,v \in [n]} \sum_{\ell \leq k} \mathbb{P}_{n,p_n}[|\mathcal{C}_u| = \ell]\, \mathbb{P}_{n-k,p_n}[|\mathcal{C}_v| \leq k]
= \sum_{u,v \in [n]} \mathbb{P}_{n,p_n}[|\mathcal{C}_u| \leq k]\, \mathbb{P}_{n-k,p_n}[|\mathcal{C}_v| \leq k].
\]

To get a bound on the variance of $Y_k$, we need to relate this last expression to $(\mathbb{E}_{n,p_n}[Y_k])^2$. For this purpose we define

\[
\Delta_k := \mathbb{P}_{n-k,p_n}[|\mathcal{C}_1| \leq k] - \mathbb{P}_{n,p_n}[|\mathcal{C}_1| \leq k].
\]

Then, plugging this back above, we get

\[
\sum_{u,v \in [n]} \sum_{\ell \leq k} \mathbb{P}_{n,p_n}[|\mathcal{C}_u| = \ell, |\mathcal{C}_v| \leq k, u \nleftrightarrow v]
\leq \sum_{u,v \in [n]} \mathbb{P}_{n,p_n}[|\mathcal{C}_u| \leq k]\left(\mathbb{P}_{n,p_n}[|\mathcal{C}_v| \leq k] + \Delta_k\right)
\leq (\mathbb{E}_{n,p_n}[Y_k])^2 + n^2 \Delta_k,
\]

and it remains to bound $\Delta_k$. We use a coupling argument. Let $H \sim \mathbb{G}_{n-k,p_n}$ and construct $H' \sim \mathbb{G}_{n,p_n}$ in the following manner: let $H'$ coincide with $H$ on the first $n-k$ vertices, then pick the rest of the edges independently. Then clearly $\Delta_k \geq 0$, since the cluster of $1$ in $H'$ includes the cluster of $1$ in $H$. In fact, $\Delta_k$ is the probability that, under this coupling, the cluster of $1$ has at most $k$ vertices in $H$ but not in $H'$. That implies in particular that at least one of the vertices in the cluster of $1$ in $H$ is connected to a vertex in $\{n-k+1, \ldots, n\}$. Hence, by a union bound over those pairs of vertices,

\[
\Delta_k \leq k^2 p_n,
\]

and

\[
\sum_{u,v \in [n]} \mathbb{P}_{n,p_n}[|\mathcal{C}_u| \leq k, |\mathcal{C}_v| \leq k, u \nleftrightarrow v]
\leq (\mathbb{E}_{n,p_n}[Y_k])^2 + \lambda k^2 n. \tag{5.37}
\]

Combining (5.36) and (5.37), we get $\mathrm{Var}[Y_k] \leq nk + \lambda k^2 n \leq 2\lambda k^2 n$ (for $k \geq \lambda^{-1}$). The result follows from Chebyshev's inequality (Theorem 2.13) and Equation (5.34).

A similar second moment argument also gives a lower bound on the size of the largest component in the subcritical case. We proved in Theorem 5.28 that, when $\lambda < 1$, the probability of observing a connected component of size significantly larger than $I_\lambda^{-1} \log n$ is vanishingly small. In the other direction, we get:

Theorem 5.36 (Subcritical regime: lower bound on the largest cluster). Let $G_n \sim \mathbb{G}_{n,p_n}$ where $p_n = \lambda/n$ with $\lambda \in (0,1)$. For all $\kappa \in (0,1)$,

\[
\mathbb{P}_{n,p_n}\!\left[|\mathcal{C}_{\max}| \leq (1-\kappa) I_\lambda^{-1} \log n\right] = o(1).
\]

Proof. Recall that $X_k := \sum_{v \in [n]} \mathbf{1}_{\{|\mathcal{C}_v| > k\}}$. It suffices to prove that, with probability $1 - o(1)$, we have $X_k > 0$ when $k = (1-\kappa) I_\lambda^{-1} \log n$. To apply the second moment method, we need an upper bound on the second moment of $X_k$ and a lower bound on its first moment. The following lemma is closely related to Lemma 5.35. Exercise 5.8 asks for a proof.

Lemma 5.37 (Second moment of $X_k$). Assume $\lambda < 1$. There is $C > 0$ such that

\[
\mathbb{E}_{n,p_n}[X_k^2] \leq (\mathbb{E}_{n,p_n}[X_k])^2 + C n k e^{-k I_\lambda}, \quad \forall k.
\]

Lemma 5.38 (First moment of $X_k$). Let $k_n = (1-\kappa) I_\lambda^{-1} \log n$. Then, for any $\delta \in (0, \kappa)$, we have

\[
\mathbb{E}_{n,p_n}[X_{k_n}] = \Omega(n^{\delta}),
\]

for $n$ large enough.

Proof. By Lemma 5.31,

\[
\mathbb{E}_{n,p_n}[X_{k_n}] = n\, \mathbb{P}_{n,p_n}[|\mathcal{C}_1| > k_n]
= n\, \mathbb{P}[W > k_n] - O(k_n^2). \tag{5.38}
\]

Once again, we use the random-walk representation of the total progeny of a branching process (Theorem 5.21). Using the notation of Lemma 5.31,

\[
\mathbb{P}[W > k_n] = \sum_{t > k_n} \frac{1}{t}\, \mathbb{P}\!\left[\sum_{i=1}^{t} \bar Z_i = t-1\right]
= \sum_{t > k_n} \frac{1}{t}\, e^{-\lambda t} \frac{(\lambda t)^{t-1}}{(t-1)!}.
\]

Using Stirling's formula, we note that

\[
\frac{1}{t}\, e^{-\lambda t} \frac{(\lambda t)^{t-1}}{(t-1)!}
= \frac{e^{-\lambda t} (\lambda t)^{t-1}}{t!}
= \frac{e^{-\lambda t} (\lambda t)^{t-1}}{(t/e)^{t} \sqrt{2\pi t}\, (1+o(1))}
= \frac{1+o(1)}{\lambda \sqrt{2\pi}\, t^{3/2}} \exp\!\left(-\lambda t + t \log \lambda + t\right)
= \frac{1+o(1)}{\lambda \sqrt{2\pi}\, t^{3/2}} \exp(-t I_\lambda).
\]

For any $\varepsilon > 0$, for $n$ large enough,

\[
\mathbb{P}[W > k_n] \geq \sum_{t > k_n} \exp\!\left(-t(I_\lambda + \varepsilon)\right) = \Omega\!\left(\exp(-k_n (I_\lambda + \varepsilon))\right).
\]

For any $\delta \in (0, \kappa)$, plugging the last line back into (5.38) and taking $\varepsilon$ small enough gives

\[
\mathbb{E}_{n,p_n}[X_{k_n}] = \Omega\!\left(n \exp(-k_n (I_\lambda + \varepsilon))\right)
= \Omega\!\left(\exp\!\big(\{1 - (1-\kappa) I_\lambda^{-1} (I_\lambda + \varepsilon)\} \log n\big)\right)
= \Omega(n^{\delta}).
\]
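The Stirling computation above can be sanity-checked numerically. A sketch (function names are ours): `borel_pmf` is the exact random-walk expression for $\mathbb{P}[W = t]$ and `asymptotic` is the $\frac{1}{\lambda\sqrt{2\pi}}\, t^{-3/2} e^{-t I_\lambda}$ approximation just derived; their ratio should approach $1$, and in the subcritical case the pmf sums to $1$ since $W < \infty$ a.s.

```python
import math

lam = 0.5
I_lam = lam - 1.0 - math.log(lam)  # rate I_lambda = lambda - 1 - log(lambda)

def borel_pmf(t):
    """Exact P[W = t] = (1/t) P[Poi(lam*t) = t-1] = e^{-lam t}(lam t)^{t-1}/t!,
    computed in log-space."""
    return math.exp(-lam * t + (t - 1) * math.log(lam * t) - math.lgamma(t + 1))

def asymptotic(t):
    """Stirling approximation: exp(-t I_lam) / (lam * sqrt(2 pi) * t^{3/2})."""
    return math.exp(-I_lam * t) / (lam * math.sqrt(2.0 * math.pi) * t ** 1.5)

# Ratios exact/approximate should tend to 1 as t grows.
ratios = [borel_pmf(t) / asymptotic(t) for t in (10, 100, 1000)]

# Subcritical total progeny is finite a.s., so the pmf sums to 1.
total = sum(borel_pmf(t) for t in range(1, 2000))
```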

We return to the proof of Theorem 5.36. Let $k_n = (1-\kappa) I_\lambda^{-1} \log n$. By the second moment method (Theorem 2.18) and Lemmas 5.37 and 5.38,

\[
\mathbb{P}_{n,p_n}[X_{k_n} > 0] \geq \frac{(\mathbb{E} X_{k_n})^2}{\mathbb{E}[X_{k_n}^2]}
\geq \frac{1}{1 + \dfrac{O(n k_n e^{-k_n I_\lambda})}{\Omega(n^{2\delta})}}
= \frac{1}{1 + \dfrac{O(k_n e^{\kappa \log n})}{\Omega(n^{2\delta})}} \to 1,
\]

for $\delta$ close enough to $\kappa$. That proves the claim.

Critical regime via martingales

To be written. See [NP10].

5.4 Further applications

Uniform random trees: local limit

To be written. See [Gri81].

Exercises

Exercise 5.1 (Galton-Watson process: geometric offspring). Let $(Z_t)$ be a Galton-Watson branching process with geometric offspring distribution (started at $0$), i.e., $p_k = p(1-p)^k$ for all $k \geq 0$, for some $p \in (0,1)$. Let $q := 1-p$, let $m$ be the mean of the offspring distribution, and let $M_t = m^{-t} Z_t$.

a) Compute the probability generating function $f$ of $\{p_k\}_{k \geq 0}$ and the extinction probability $\eta := \eta_p$ as a function of $p$.

b) If $G$ is a $2 \times 2$ matrix, define

\[
G(s) := \frac{G_{11} s + G_{12}}{G_{21} s + G_{22}}.
\]

Show that $G(H(s)) = (GH)(s)$.

c) Assume $m \neq 1$. Use b) to derive

\[
f_t(s) = \frac{p m^t (1-s) + qs - p}{q m^t (1-s) + qs - p}.
\]

Deduce that, when $m > 1$,

\[
\mathbb{E}[\exp(-\lambda M_\infty)] = \eta + (1-\eta)\, \frac{1-\eta}{\lambda + (1-\eta)}.
\]

d) Assume $m = 1$. Show that

\[
f_t(s) = \frac{t - (t-1)s}{t+1 - ts},
\]

and deduce that

\[
\mathbb{E}\!\left[e^{-\lambda Z_t/t} \,\middle|\, Z_t > 0\right] \to \frac{1}{1+\lambda}.
\]

Exercise 5.2 (Supercritical branching process: infinite line of descent). Let $(Z_t)$ be a supercritical Galton-Watson branching process with offspring distribution $\{p_k\}_{k \geq 0}$. Let $\eta$ be the extinction probability and define $\zeta := 1 - \eta$. Let $Z_t^\infty$ be the number of individuals in the $t$-th generation with an infinite line of descent, i.e., whose descendant subtree is infinite. Denote by $S$ the event of nonextinction of $(Z_t)$. Define $p_0^\infty := 0$ and

\[
p_k^\infty := \zeta^{-1} \sum_{j \geq k} \binom{j}{k}\, \zeta^{k}\, \eta^{j-k}\, p_j.
\]

a) Show that $\{p_k^\infty\}_{k \geq 0}$ is a probability distribution and compute its expectation.

b) Show that, for any $k \geq 0$,

\[
\mathbb{P}[Z_1^\infty = k \mid S] = p_k^\infty.
\]

[Hint: Condition on $Z_1$.]

c) Show by induction on $t$ that, conditioned on nonextinction, the process $(Z_t^\infty)$ has the same distribution as a Galton-Watson branching process with offspring distribution $\{p_k^\infty\}_{k \geq 0}$.

Exercise 5.3 (Hitting-time theorem: nearest-neighbor walk). Let $X_1, X_2, \ldots$ be i.i.d. random variables taking value $+1$ with probability $p$ and $-1$ with probability $q := 1-p$. Let $S_t := \sum_{i=1}^{t} X_i$ with $S_0 := 0$, and $M_t := \max\{S_i : 0 \leq i \leq t\}$.

a) For $r \geq 1$, use the reflection principle to show that

\[
\mathbb{P}[M_t \geq r, S_t = b] =
\begin{cases}
\mathbb{P}[S_t = b], & b \geq r,\\[2pt]
(q/p)^{r-b}\, \mathbb{P}[S_t = 2r - b], & b < r.
\end{cases}
\]

b) Use a) to give a proof of the hitting-time theorem in the following special case: letting $\tau_b$ be the first time $S_t$ hits $b > 0$, show that for all $t \geq 1$,

\[
\mathbb{P}[\tau_b = t] = \frac{b}{t}\, \mathbb{P}[S_t = b].
\]

[Hint: Consider the probability $\mathbb{P}[M_{t-1} = S_{t-1} = b-1, S_t = b]$.]

Exercise 5.4 (Percolation on bounded-degree graphs). Let $G = (V, E)$ be a countable graph such that all vertices have degree bounded by $b+1$ for $b \geq 2$. Let $0$ be a distinguished vertex in $G$. For bond percolation on $G$, prove that

\[
p_c(G) \geq p_c(\widehat{\mathbb{T}}_b),
\]

by bounding the expected size of the cluster of $0$. [Hint: Consider self-avoiding paths started at $0$.]

Exercise 5.5 (Percolation on $\widehat{\mathbb{T}}_b$: critical exponent of $\theta(p)$). Consider bond percolation on the rooted infinite $b$-ary tree $\widehat{\mathbb{T}}_b$ with $b > 2$. For $\varepsilon \in [0, 1 - \frac{1}{b}]$ and $u \in [0,1]$, define

\[
h(\varepsilon, u) := u - 1 + \left( \left(1 - \frac{1}{b} - \varepsilon\right) + \left(\frac{1}{b} + \varepsilon\right)(1-u) \right)^{b}.
\]

a) Show that there is a constant $C > 0$, not depending on $\varepsilon, u$, such that

\[
\left| h(\varepsilon, u) + b \varepsilon u - \frac{b-1}{2b}\, u^2 \right| \leq C\, (u^3 \vee \varepsilon u^2).
\]

b) Use a) to prove that

\[
\lim_{p \downarrow p_c(\widehat{\mathbb{T}}_b)} \frac{\theta(p)}{p - p_c(\widehat{\mathbb{T}}_b)} = \frac{2b^2}{b-1}.
\]

Exercise 5.6 (Percolation on $\widehat{\mathbb{T}}_2$: higher moments of $|\mathcal{C}_0|$). Consider bond percolation on the rooted infinite binary tree $\widehat{\mathbb{T}}_2$. For density $p < \frac12$, let $Z_p$ be an integer-valued random variable with distribution

\[
\mathbb{P}_p[Z_p = \ell] = \frac{\ell\, \mathbb{P}_p[|\mathcal{C}_0| = \ell]}{\mathbb{E}_p |\mathcal{C}_0|}, \quad \forall \ell \geq 1.
\]

a) Using the explicit formula for $\mathbb{P}_p[|\mathcal{C}_0| = \ell]$ derived in Section 5.3.1, show that, for all $0 < a < b < +\infty$,

\[
\mathbb{P}_p\!\left[4\left(\tfrac{1}{2} - p\right)^{2} Z_p \in [a,b]\right] \to C \int_a^b x^{-1/2} e^{-x}\, dx,
\]

as $p \uparrow \frac12$, for some constant $C > 0$.
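The limit in Exercise 5.5 b) can be checked numerically. A sketch under our own naming: `theta(b, p)` computes the survival probability $\theta(p)$ on the $b$-ary tree as the root of $u = 1 - (1 - pu)^b$ by bisection; just above $p_c = 1/b$, the ratio $\theta(p)/(p - p_c)$ should be close to $2b^2/(b-1)$.

```python
def theta(b, p):
    """Survival probability of percolation on the rooted b-ary tree:
    largest root of g(u) = u - (1 - (1 - p*u)**b) in [0, 1], by bisection.
    For p <= 1/b the only root is 0 and the bisection collapses to ~0."""
    g = lambda u: u - (1.0 - (1.0 - p * u) ** b)
    lo, hi = 1e-9, 1.0  # g(lo) < 0 iff p > p_c, and g(1) = (1-p)**b > 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

b = 3
pc = 1.0 / b                      # critical density on the b-ary tree
eps = 1e-4
ratio = theta(b, pc + eps) / eps  # should be close to 2*b*b/(b-1) = 9
sub = theta(b, pc - 0.03)         # subcritical: theta(p) = 0
```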


Generating and characteristic functions. Generating and Characteristic Functions. Probability generating function. Probability generating function Generating and characteristic functions Generating and Characteristic Functions September 3, 03 Probability generating function Moment generating function Power series expansion Characteristic function

More information

Lecture 5: January 30

Lecture 5: January 30 CS71 Randomness & Computation Spring 018 Instructor: Alistair Sinclair Lecture 5: January 30 Disclaimer: These notes have not been subjected to the usual scrutiny accorded to formal publications. They

More information

Lecture 9 Classification of States

Lecture 9 Classification of States Lecture 9: Classification of States of 27 Course: M32K Intro to Stochastic Processes Term: Fall 204 Instructor: Gordan Zitkovic Lecture 9 Classification of States There will be a lot of definitions and

More information

6.207/14.15: Networks Lecture 4: Erdös-Renyi Graphs and Phase Transitions

6.207/14.15: Networks Lecture 4: Erdös-Renyi Graphs and Phase Transitions 6.207/14.15: Networks Lecture 4: Erdös-Renyi Graphs and Phase Transitions Daron Acemoglu and Asu Ozdaglar MIT September 21, 2009 1 Outline Phase transitions Connectivity threshold Emergence and size of

More information

2. Transience and Recurrence

2. Transience and Recurrence Virtual Laboratories > 15. Markov Chains > 1 2 3 4 5 6 7 8 9 10 11 12 2. Transience and Recurrence The study of Markov chains, particularly the limiting behavior, depends critically on the random times

More information

1 Mechanistic and generative models of network structure

1 Mechanistic and generative models of network structure 1 Mechanistic and generative models of network structure There are many models of network structure, and these largely can be divided into two classes: mechanistic models and generative or probabilistic

More information

Concentration of Measures by Bounded Size Bias Couplings

Concentration of Measures by Bounded Size Bias Couplings Concentration of Measures by Bounded Size Bias Couplings Subhankar Ghosh, Larry Goldstein University of Southern California [arxiv:0906.3886] January 10 th, 2013 Concentration of Measure Distributional

More information

Ω = e E d {0, 1} θ(p) = P( C = ) So θ(p) is the probability that the origin belongs to an infinite cluster. It is trivial that.

Ω = e E d {0, 1} θ(p) = P( C = ) So θ(p) is the probability that the origin belongs to an infinite cluster. It is trivial that. 2 Percolation There is a vast literature on percolation. For the reader who wants more than we give here, there is an entire book: Percolation, by Geoffrey Grimmett. A good account of the recent spectacular

More information

Topological properties

Topological properties CHAPTER 4 Topological properties 1. Connectedness Definitions and examples Basic properties Connected components Connected versus path connected, again 2. Compactness Definition and first examples Topological

More information

The expansion of random regular graphs

The expansion of random regular graphs The expansion of random regular graphs David Ellis Introduction Our aim is now to show that for any d 3, almost all d-regular graphs on {1, 2,..., n} have edge-expansion ratio at least c d d (if nd is

More information

Lecture 17 Brownian motion as a Markov process

Lecture 17 Brownian motion as a Markov process Lecture 17: Brownian motion as a Markov process 1 of 14 Course: Theory of Probability II Term: Spring 2015 Instructor: Gordan Zitkovic Lecture 17 Brownian motion as a Markov process Brownian motion is

More information

Metapopulations with infinitely many patches

Metapopulations with infinitely many patches Metapopulations with infinitely many patches Phil. Pollett The University of Queensland UQ ACEMS Research Group Meeting 10th September 2018 Phil. Pollett (The University of Queensland) Infinite-patch metapopulations

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science 6.262 Discrete Stochastic Processes Midterm Quiz April 6, 2010 There are 5 questions, each with several parts.

More information

Limiting Distributions

Limiting Distributions We introduce the mode of convergence for a sequence of random variables, and discuss the convergence in probability and in distribution. The concept of convergence leads us to the two fundamental results

More information

w i w j = w i k w k i 1/(β 1) κ β 1 β 2 n(β 2)/(β 1) wi 2 κ 2 i 2/(β 1) κ 2 β 1 β 3 n(β 3)/(β 1) (2.2.1) (β 2) 2 β 3 n 1/(β 1) = d

w i w j = w i k w k i 1/(β 1) κ β 1 β 2 n(β 2)/(β 1) wi 2 κ 2 i 2/(β 1) κ 2 β 1 β 3 n(β 3)/(β 1) (2.2.1) (β 2) 2 β 3 n 1/(β 1) = d 38 2.2 Chung-Lu model This model is specified by a collection of weights w =(w 1,...,w n ) that represent the expected degree sequence. The probability of an edge between i to is w i w / k w k. They allow

More information

6. Brownian Motion. Q(A) = P [ ω : x(, ω) A )

6. Brownian Motion. Q(A) = P [ ω : x(, ω) A ) 6. Brownian Motion. stochastic process can be thought of in one of many equivalent ways. We can begin with an underlying probability space (Ω, Σ, P) and a real valued stochastic process can be defined

More information

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015 Part IA Probability Theorems Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

More information

Markov Chains CK eqns Classes Hitting times Rec./trans. Strong Markov Stat. distr. Reversibility * Markov Chains

Markov Chains CK eqns Classes Hitting times Rec./trans. Strong Markov Stat. distr. Reversibility * Markov Chains Markov Chains A random process X is a family {X t : t T } of random variables indexed by some set T. When T = {0, 1, 2,... } one speaks about a discrete-time process, for T = R or T = [0, ) one has a continuous-time

More information

We are going to discuss what it means for a sequence to converge in three stages: First, we define what it means for a sequence to converge to zero

We are going to discuss what it means for a sequence to converge in three stages: First, we define what it means for a sequence to converge to zero Chapter Limits of Sequences Calculus Student: lim s n = 0 means the s n are getting closer and closer to zero but never gets there. Instructor: ARGHHHHH! Exercise. Think of a better response for the instructor.

More information

Iowa State University. Instructor: Alex Roitershtein Summer Homework #5. Solutions

Iowa State University. Instructor: Alex Roitershtein Summer Homework #5. Solutions Math 50 Iowa State University Introduction to Real Analysis Department of Mathematics Instructor: Alex Roitershtein Summer 205 Homework #5 Solutions. Let α and c be real numbers, c > 0, and f is defined

More information

Problem Sheet 1. You may assume that both F and F are σ-fields. (a) Show that F F is not a σ-field. (b) Let X : Ω R be defined by 1 if n = 1

Problem Sheet 1. You may assume that both F and F are σ-fields. (a) Show that F F is not a σ-field. (b) Let X : Ω R be defined by 1 if n = 1 Problem Sheet 1 1. Let Ω = {1, 2, 3}. Let F = {, {1}, {2, 3}, {1, 2, 3}}, F = {, {2}, {1, 3}, {1, 2, 3}}. You may assume that both F and F are σ-fields. (a) Show that F F is not a σ-field. (b) Let X :

More information

The tail does not determine the size of the giant

The tail does not determine the size of the giant The tail does not determine the size of the giant arxiv:1710.01208v2 [math.pr] 20 Jun 2018 Maria Deijfen Sebastian Rosengren Pieter Trapman June 2018 Abstract The size of the giant component in the configuration

More information

CS 6820 Fall 2014 Lectures, October 3-20, 2014

CS 6820 Fall 2014 Lectures, October 3-20, 2014 Analysis of Algorithms Linear Programming Notes CS 6820 Fall 2014 Lectures, October 3-20, 2014 1 Linear programming The linear programming (LP) problem is the following optimization problem. We are given

More information

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3 Brownian Motion Contents 1 Definition 2 1.1 Brownian Motion................................. 2 1.2 Wiener measure.................................. 3 2 Construction 4 2.1 Gaussian process.................................

More information

P (A G) dp G P (A G)

P (A G) dp G P (A G) First homework assignment. Due at 12:15 on 22 September 2016. Homework 1. We roll two dices. X is the result of one of them and Z the sum of the results. Find E [X Z. Homework 2. Let X be a r.v.. Assume

More information

A Conceptual Proof of the Kesten-Stigum Theorem for Multi-type Branching Processes

A Conceptual Proof of the Kesten-Stigum Theorem for Multi-type Branching Processes Classical and Modern Branching Processes, Springer, New Yor, 997, pp. 8 85. Version of 7 Sep. 2009 A Conceptual Proof of the Kesten-Stigum Theorem for Multi-type Branching Processes by Thomas G. Kurtz,

More information

Sample Spaces, Random Variables

Sample Spaces, Random Variables Sample Spaces, Random Variables Moulinath Banerjee University of Michigan August 3, 22 Probabilities In talking about probabilities, the fundamental object is Ω, the sample space. (elements) in Ω are denoted

More information

Lecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past.

Lecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past. 1 Markov chain: definition Lecture 5 Definition 1.1 Markov chain] A sequence of random variables (X n ) n 0 taking values in a measurable state space (S, S) is called a (discrete time) Markov chain, if

More information

1. Let X and Y be independent exponential random variables with rate α. Find the densities of the random variables X 3, X Y, min(x, Y 3 )

1. Let X and Y be independent exponential random variables with rate α. Find the densities of the random variables X 3, X Y, min(x, Y 3 ) 1 Introduction These problems are meant to be practice problems for you to see if you have understood the material reasonably well. They are neither exhaustive (e.g. Diffusions, continuous time branching

More information

Math 564 Homework 1. Solutions.

Math 564 Homework 1. Solutions. Math 564 Homework 1. Solutions. Problem 1. Prove Proposition 0.2.2. A guide to this problem: start with the open set S = (a, b), for example. First assume that a >, and show that the number a has the properties

More information

Random trees and branching processes

Random trees and branching processes Random trees and branching processes Svante Janson IMS Medallion Lecture 12 th Vilnius Conference and 2018 IMS Annual Meeting Vilnius, 5 July, 2018 Part I. Galton Watson trees Let ξ be a random variable

More information

Notes on uniform convergence

Notes on uniform convergence Notes on uniform convergence Erik Wahlén erik.wahlen@math.lu.se January 17, 2012 1 Numerical sequences We begin by recalling some properties of numerical sequences. By a numerical sequence we simply mean

More information

Lecture 21 Representations of Martingales

Lecture 21 Representations of Martingales Lecture 21: Representations of Martingales 1 of 11 Course: Theory of Probability II Term: Spring 215 Instructor: Gordan Zitkovic Lecture 21 Representations of Martingales Right-continuous inverses Let

More information

Selected Exercises on Expectations and Some Probability Inequalities

Selected Exercises on Expectations and Some Probability Inequalities Selected Exercises on Expectations and Some Probability Inequalities # If E(X 2 ) = and E X a > 0, then P( X λa) ( λ) 2 a 2 for 0 < λ

More information

SMSTC (2007/08) Probability.

SMSTC (2007/08) Probability. SMSTC (27/8) Probability www.smstc.ac.uk Contents 12 Markov chains in continuous time 12 1 12.1 Markov property and the Kolmogorov equations.................... 12 2 12.1.1 Finite state space.................................

More information

Chapter 11. Min Cut Min Cut Problem Definition Some Definitions. By Sariel Har-Peled, December 10, Version: 1.

Chapter 11. Min Cut Min Cut Problem Definition Some Definitions. By Sariel Har-Peled, December 10, Version: 1. Chapter 11 Min Cut By Sariel Har-Peled, December 10, 013 1 Version: 1.0 I built on the sand And it tumbled down, I built on a rock And it tumbled down. Now when I build, I shall begin With the smoke from

More information

. Find E(V ) and var(v ).

. Find E(V ) and var(v ). Math 6382/6383: Probability Models and Mathematical Statistics Sample Preliminary Exam Questions 1. A person tosses a fair coin until she obtains 2 heads in a row. She then tosses a fair die the same number

More information

Lecture 12. F o s, (1.1) F t := s>t

Lecture 12. F o s, (1.1) F t := s>t Lecture 12 1 Brownian motion: the Markov property Let C := C(0, ), R) be the space of continuous functions mapping from 0, ) to R, in which a Brownian motion (B t ) t 0 almost surely takes its value. Let

More information

Lectures 6: Degree Distributions and Concentration Inequalities

Lectures 6: Degree Distributions and Concentration Inequalities University of Washington Lecturer: Abraham Flaxman/Vahab S Mirrokni CSE599m: Algorithms and Economics of Networks April 13 th, 007 Scribe: Ben Birnbaum Lectures 6: Degree Distributions and Concentration

More information

ACO Comprehensive Exam October 14 and 15, 2013

ACO Comprehensive Exam October 14 and 15, 2013 1. Computability, Complexity and Algorithms (a) Let G be the complete graph on n vertices, and let c : V (G) V (G) [0, ) be a symmetric cost function. Consider the following closest point heuristic for

More information

Almost sure asymptotics for the random binary search tree

Almost sure asymptotics for the random binary search tree AofA 10 DMTCS proc. AM, 2010, 565 576 Almost sure asymptotics for the rom binary search tree Matthew I. Roberts Laboratoire de Probabilités et Modèles Aléatoires, Université Paris VI Case courrier 188,

More information

X n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2)

X n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2) 14:17 11/16/2 TOPIC. Convergence in distribution and related notions. This section studies the notion of the so-called convergence in distribution of real random variables. This is the kind of convergence

More information

9 Brownian Motion: Construction

9 Brownian Motion: Construction 9 Brownian Motion: Construction 9.1 Definition and Heuristics The central limit theorem states that the standard Gaussian distribution arises as the weak limit of the rescaled partial sums S n / p n of

More information

1. Stochastic Processes and filtrations

1. Stochastic Processes and filtrations 1. Stochastic Processes and 1. Stoch. pr., A stochastic process (X t ) t T is a collection of random variables on (Ω, F) with values in a measurable space (S, S), i.e., for all t, In our case X t : Ω S

More information

Figure 10.1: Recording when the event E occurs

Figure 10.1: Recording when the event E occurs 10 Poisson Processes Let T R be an interval. A family of random variables {X(t) ; t T} is called a continuous time stochastic process. We often consider T = [0, 1] and T = [0, ). As X(t) is a random variable

More information

Statistics 253/317 Introduction to Probability Models. Winter Midterm Exam Monday, Feb 10, 2014

Statistics 253/317 Introduction to Probability Models. Winter Midterm Exam Monday, Feb 10, 2014 Statistics 253/317 Introduction to Probability Models Winter 2014 - Midterm Exam Monday, Feb 10, 2014 Student Name (print): (a) Do not sit directly next to another student. (b) This is a closed-book, closed-note

More information

CASE STUDY: EXTINCTION OF FAMILY NAMES

CASE STUDY: EXTINCTION OF FAMILY NAMES CASE STUDY: EXTINCTION OF FAMILY NAMES The idea that families die out originated in antiquity, particilarly since the establishment of patrilineality (a common kinship system in which an individual s family

More information

Solutions to Problem Set 4

Solutions to Problem Set 4 UC Berkeley, CS 174: Combinatorics and Discrete Probability (Fall 010 Solutions to Problem Set 4 1. (MU 5.4 In a lecture hall containing 100 people, you consider whether or not there are three people in

More information

In terms of measures: Exercise 1. Existence of a Gaussian process: Theorem 2. Remark 3.

In terms of measures: Exercise 1. Existence of a Gaussian process: Theorem 2. Remark 3. 1. GAUSSIAN PROCESSES A Gaussian process on a set T is a collection of random variables X =(X t ) t T on a common probability space such that for any n 1 and any t 1,...,t n T, the vector (X(t 1 ),...,X(t

More information

Bootstrap Percolation on Periodic Trees

Bootstrap Percolation on Periodic Trees Bootstrap Percolation on Periodic Trees Milan Bradonjić Iraj Saniee Abstract We study bootstrap percolation with the threshold parameter θ 2 and the initial probability p on infinite periodic trees that

More information