Abstract

These lecture notes are largely based on Jim Pitman's textbook Combinatorial Stochastic Processes, Bertoin's textbook Random Fragmentation and Coagulation Processes, and various papers. It is part of an ongoing effort to update Jim's textbook and the status of the open problems in that text.

Lecture notes: Combinatorial Stochastic Processes

Ngoc M Tran
University of Bonn
Summer 2016
Last updated: June 12,

Contents

1 Random partitions
  1.1 Introduction
    1.1.1 Different ways to think about partitions
    1.1.2 Orderings of a partition
  1.2 Size-biased permutation of a finite i.i.d sequence
    1.2.1 Joint distribution
    1.2.2 Stick-breaking representation
  1.3 Size-biased permutation of an infinite i.i.d sequence
    1.3.1 Subordinator as limit of i.i.d
  1.4 Random mass partition with independent stick-breaks. Stable subordinators and Poisson-Dirichlet distributions
  1.5 Exchangeable random partitions. Kingman's correspondence
    1.5.1 Partition of [n]
    1.5.2 Exchangeable partitions
    1.5.3 EPPF
    1.5.4 Kingman's correspondence in terms of EPPF
    1.5.5 EPPF for independent stick-break family
  1.6 Various applications of the GEM process
    1.6.1 Random permutations
    1.6.2 Kingman coalescent with mutation
    1.6.3 A branching process description of the GEM distribution
  1.7 Number of blocks in a CRP
    1.7.1 Almost sure limits
    1.7.2 Power law
  Indicator of blocks
  Structural distribution
  Poisson-Dirichlet from local times
    Running example: Brownian motion
    PD from excursions of strong Markov processes
    Subordinator from local time of Brownian motion
    Stable process and subordinators
    Brownian excursions revisited
  Exercises

2 Random walks and random trees
  2.1 Trees and walks
    2.1.1 Galton-Watson trees
    2.1.2 GW and Lukasciewicz walk
    2.1.3 GW and the height function
    2.1.4 GW and contour walks
  Convergence to Brownian motions
    Convergence of the L-walk
    Convergence of the height function
    Convergence of the contour function
  GW tree conditioned on its size
    Combinatorics of GW trees conditioned on n nodes
    Kemperman lemma
    Lagrange inversion
    Vervaat transform
  Cayley tree formula
    Forest volume formula
  p-forests, coalescence and fragmentation

Chapter 1

Random partitions

1.1 Introduction

1.1.1 Different ways to think about partitions

In general, for a sequence $x = (x(1), x(2), \ldots)$, write $x^\downarrow$ to denote the same sequence presented in decreasing order. Let
$$\Delta = \{x = (x(1), x(2), \ldots) : x(i) \geq 0,\ \textstyle\sum_i x(i) \leq 1\} \quad \text{and} \quad \Delta^\downarrow = \{x^\downarrow : x \in \Delta\}$$
be closed infinite simplices; the latter contains the sequences with non-increasing terms. Denote their boundaries by $\Delta_1 = \{x \in \Delta : \sum_i x(i) = 1\}$ and $\Delta_1^\downarrow = \{x \in \Delta^\downarrow : \sum_i x(i) = 1\}$ respectively. Any finite sequence can be associated with an element of $\Delta_1$ after being normalized by its sum and extended with zeros.

Example (Mass partitions). Consider a mass $T$ split into countably many smaller masses $s(1), s(2), \ldots$ If $\sum_i s(i) = T$, we say that the partition is conservative (i.e. total mass is conserved). If $\sum_i s(i) < T$, we say that the partition is dissipative. One can imagine that when the mass splits, a sizeable chunk of it becomes dust (isolated infinitesimal particles) and dissipates into the air. The normalized sequence $(s(1)/T, s(2)/T, \ldots)$ is in $\Delta$. Thus, $\Delta$ is also called the space of mass partitions.

Example (Interval partitions). Represent a mass $T$ as the interval $[0, T]$; each point of the interval is an infinitesimal particle making up the mass. Given $x \in \Delta$, represent $T x(1), T x(2), \ldots$ as lengths of disjoint open intervals in $[0, T]$. This is called an interval partition representation of $T x$. A point $u \in [0, T]$ that does not belong to the union of such intervals represents a dust particle.

Example (Partition of n). For an integer $n$, a partition of $n$ is an unordered set $\{n_1, \ldots, n_k\}$ of integers summing up to $n$: $\sum_i n_i = n$. The normalized sequence $(n_1/n, \ldots, n_k/n)$ is a mass partition. In this case, one can think of the mass as made up of $n$ indivisible units, each of mass 1.

Example (Random mass partition). Here is a simple way to get a random mass partition. Fix an integer $n$. Let $F$ be a distribution on $(0, \infty)$ with mean $\mu < \infty$. Let $X(1), \ldots, X(n)$ be independent and identically distributed (i.i.d) random variables with distribution $F$. Define $T_n = \sum_{i=1}^n X(i)$. Then $(X(1)/T_n, \ldots, X(n)/T_n)$ is a random mass partition with $n$ parts.
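The construction in the last example is easy to simulate. The following is a minimal sketch in Python/NumPy; the choice $F =$ Exponential(1) and the function name are ours, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mass_partition(n, sample_F):
    """Draw X(1), ..., X(n) i.i.d. from F and normalize by their sum T_n,
    giving a random mass partition with n parts."""
    x = sample_F(n)
    return x / x.sum()

# Illustration with F = Exponential(1); any positive distribution works.
p = random_mass_partition(10, lambda n: rng.exponential(1.0, size=n))
print(np.sort(p)[::-1])   # the decreasing rearrangement p^down
print(p.sum())            # equals 1
```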

Example (Stick-breaking). Here is a simple way to get a random interval partition. Start with a stick of length 1. Choose a point on the stick according to some distribution $F_1$ supported on $[0, 1]$, break the stick into two pieces, discard the left-hand piece, and rescale the remaining piece to have length 1. Repeat this procedure with distribution $F_2$, and so on. This gives a random mass partition $X = (X(1), X(2), \ldots)$, where
$$X(k) = W_k \prod_{i=1}^{k-1} (1 - W_i),$$
where the $W_i$'s are independent, and $W_i$ is distributed as $F_i$.

1.1.2 Orderings of a partition

Let $x = (x(1), \ldots, x(n))$ be a mass partition with $n$ parts, $t = \sum_i x(i)$. There are three natural ways to order the elements of $x$.

Decreasing order: $x^\downarrow$.

Exchangeable random order: let $\sigma$ be a uniformly distributed random permutation of $[n]$. The mass partition $x$ presented in exchangeable random order is the random mass partition $X = (x(\sigma_1), \ldots, x(\sigma_n))$.

Size-biased order: here $X = (x(\sigma_1), \ldots, x(\sigma_n))$, where $\sigma$ is the random permutation with $P(\sigma_1 = i) = x(i)/t$, and for $k$ distinct indices $i_1, \ldots, i_k$,
$$P(\sigma_k = i_k \mid \sigma_1 = i_1, \ldots, \sigma_{k-1} = i_{k-1}) = \frac{x(i_k)}{t - (x(i_1) + \cdots + x(i_{k-1}))}. \tag{1.1}$$
An index $i$ with bigger size $x(i)$ tends to appear earlier in the permutation, hence the name size-biased. Call $\sigma$ the size-biased order, and call the random sequence $X$ the size-biased permutation of $x$. The size-biased order is important because it is the order in which new elements appear in a sampling without replacement scheme. For this reason, it is sometimes called the order of appearance.

Example (Size-biased order and sampling without replacement). Let $(n_1, \ldots, n_k)$ be a vector of integers, with $n_i$ interpreted as the number of balls of color $i$ (or the number of animals of species $i$). Let $n = \sum_{i=1}^k n_i$ be the total number of balls. Sample without replacement from the set of $n$ balls, and let $\sigma$ denote the order in which the colors appear. This is a size-biased order.

Example (Kingman's paintbox and size-biased permutation). Kingman's paintbox [?] is a useful way to describe and extend size-biased permutations. For $x \in \Delta^\downarrow$, let $s_k$ be the sum of the first $k$ terms, with $s_0 = 0$ and $s_\infty = \sum_i x(i)$. Note that $x$ defines a partition $\varphi(x)$ of the unit interval $[0, 1]$, consisting of components which are intervals of the form $[s_k, s_{k+1})$ for $k = 0, 1, 2, \ldots$, and the interval $[s_\infty, 1]$, which we call the zero component. Sample points $\xi_1, \xi_2, \ldots$ one by one from the uniform distribution on $[0, 1]$. Each time a sample point discovers a new component that is not in $[s_\infty, 1]$, write down its size. If the sample point discovers a new point of $[s_\infty, 1]$, write 0. Let $X = (X(1), X(2), \ldots)$ be the random sequence of sizes. Since the probability of discovering a particular (non-zero) component is proportional to its length, the non-zero terms of $X$ form the size-biased permutation of the non-zero terms of $x$ as defined by (1.1). In the paintbox terminology, the components correspond to different colors used to paint the balls with labels $1, 2, \ldots$. Two balls $i, j$ have the same paint color if and only if $\xi_i$ and $\xi_j$ fall in the same component. The size-biased permutation $X$ records the sizes of the newly discovered components, or paint colors. The zero component represents a continuum of distinct paint colors, each of which can be represented at most once.

Example (Kingman's paintbox and random partition of n). Kingman's paintbox also gives a random partition of $n$ (or, more precisely, a random partition of $[n]$). Consider the previous setup. Say that $i \sim j$ iff $\xi_i$ and $\xi_j$ fall in the same interval (i.e. if the two balls have the same color). Then for each $n \in \mathbb{N}$, we get a random partition of the $n$ balls by colors.

We end this section with some questions on the objects we have introduced so far. We will answer these in the next couple of lectures.

1. Take the size-biased permutation of a random mass partition $X$ (e.g. one obtained from i.i.d terms as above). What is its distribution?

2. When does a random mass partition have the stick-breaking form with independent stick lengths? When does a size-biased permutation have the stick-breaking form with independent stick lengths?

1.2 Size-biased permutation of a finite i.i.d sequence

The size-biased permutation of a random sequence $X$ is defined conditionally on the sequence's values. We now focus on the size-biased permutation of an i.i.d sequence $X_n = (X_n(1), \ldots, X_n(n))$ with finite length $n$. We will use square brackets $(X_n[1], \ldots, X_n[n])$ to denote the size-biased permutation, to avoid having to list out the terms. Assume that $F$ has density $\nu_1$. Let $T_{n-k} = X_n[k+1] + \cdots + X_n[n]$ denote the sum of the last $n - k$ terms in an i.i.d. size-biased permutation of length $n$. For $1 \leq k \leq n$, let $\nu_k$ be the density of $S_k$, the sum of $k$ i.i.d. random variables with distribution $F$. We shall write gamma$(a, \lambda)$ for the Gamma distribution whose density at $x$ is $\lambda^a x^{a-1} e^{-\lambda x}/\Gamma(a)$ for $x > 0$, and beta$(a, b)$ for the Beta distribution whose density at $x$ is $\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} x^{a-1}(1-x)^{b-1}$ for $x \in (0, 1)$.

1.2.1 Joint distribution

We first derive the joint distribution of the first $k$ terms $X_n[1], \ldots, X_n[k]$.

Proposition (Barouch-Kaufman [?]). We have
$$P(X_n[1] \in dx_1, \ldots, X_n[k] \in dx_k) = \frac{n!}{(n-k)!} \prod_{j=1}^k x_j \nu_1(x_j)\, dx_j \int_0^\infty \nu_{n-k}(s) \prod_{j=1}^k (x_j + \cdots + x_k + s)^{-1}\, ds \tag{1.2}$$
$$= \frac{n!}{(n-k)!} \prod_{j=1}^k x_j \nu_1(x_j)\, dx_j \; E\!\left[\prod_{j=1}^k \frac{1}{x_j + \cdots + x_k + S_{n-k}}\right]. \tag{1.3}$$

Proof. Let $\sigma$ denote the random permutation on $n$ letters defined by size-biased permutation as in (1.1). Then there are $\frac{n!}{(n-k)!}$ distinct possible values for $(\sigma_1, \ldots, \sigma_k)$. By exchangeability of the underlying i.i.d. random variables $X_n(1), \ldots, X_n(n)$, it is sufficient to consider $\sigma_1 = 1, \ldots, \sigma_k = k$. Note that
$$P\Big((X_n(1), \ldots, X_n(k)) \in dx_1 \ldots dx_k,\ \sum_{j=k+1}^n X_n(j) \in ds\Big) = \nu_{n-k}(s)\, ds \prod_{j=1}^k \nu_1(x_j)\, dx_j.$$
Thus, restricted to $\sigma_1 = 1, \ldots, \sigma_k = k$, the probability of observing $(X_n[1], \ldots, X_n[k]) \in dx_1 \ldots dx_k$ and $T_{n-k} \in ds$ is precisely
$$\frac{x_1}{x_1 + \cdots + x_k + s} \cdot \frac{x_2}{x_2 + \cdots + x_k + s} \cdots \frac{x_k}{x_k + s}\; \nu_{n-k}(s) \prod_{j=1}^k \nu_1(x_j)\, dx_j\, ds.$$
By summing over the $\frac{n!}{(n-k)!}$ possible values of $(\sigma_1, \ldots, \sigma_k)$ and integrating out the sum $T_{n-k}$, we arrive at (1.2). Equation (1.3) follows by rewriting.

Note that $X_n[k] = T_{n-k+1} - T_{n-k}$ for $k = 1, \ldots, n-1$. Thus we can rewrite (1.2) in terms of the joint law of $(T_n, T_{n-1}, \ldots, T_{n-k})$:
$$P(T_n \in dt_0, \ldots, T_{n-k} \in dt_k) = \frac{n!}{(n-k)!} \left(\prod_{i=0}^{k-1} \frac{t_i - t_{i+1}}{t_i}\, \nu_1(t_i - t_{i+1})\right) \nu_{n-k}(t_k)\, dt_0 \ldots dt_k. \tag{1.4}$$
Rearranging (1.4) yields the following result, which appeared as an exercise in [?, 2.3].

Corollary (Chaumont-Yor [?]). The sequence $(T_n, T_{n-1}, \ldots, T_1)$ is an inhomogeneous Markov chain with transition probability
$$P(T_{n-k} \in ds \mid T_{n-k+1} = t) = (n-k+1)\, \frac{t - s}{t}\, \nu_1(t - s)\, \frac{\nu_{n-k}(s)}{\nu_{n-k+1}(t)}\, ds, \tag{1.5}$$
for $k = 1, \ldots, n-1$. Together with $T_n \stackrel{d}{=} S_n$, equation (1.5) specifies the joint law of $(X_n[1], \ldots, X_n[n])$, and vice versa.

1.2.2 Stick-breaking representation

For $k \geq 1$, conditioned on $T_{n-k+1} = t$, $X_n[k]$ is distributed as the first size-biased pick out of $n - k + 1$ i.i.d. random variables conditioned to have sum $S_{n-k+1} = t$. This provides a recursive way to generate a finite i.i.d. size-biased permutation: first generate $T_n$ (which is distributed as $S_n$). Conditioned on the value of $T_n$, generate $T_{n-1}$, and let $X_n[1]$ be the difference. Now conditioned on the value of $T_{n-1}$, generate $T_{n-2}$ via (1.5), let $X_n[2]$ be the difference, and so on.

Let us explore this recursion from a different angle by considering the ratio $W_{n,k} := \frac{X_n[k]}{T_{n-k+1}}$ and its complement $\bar W_{n,k} = 1 - W_{n,k} = \frac{T_{n-k}}{T_{n-k+1}}$. For $k \geq 2$, note that
$$\frac{X_n[k]}{T_n} = \frac{X_n[k]}{T_{n-k+1}} \cdot \frac{T_{n-k+1}}{T_{n-k+2}} \cdots \frac{T_{n-1}}{T_n} = W_{n,k} \prod_{i=1}^{k-1} \bar W_{n,i}. \tag{1.6}$$
The variables $W_{n,i}$ can be interpreted as residual fractions in a stick-breaking scheme: start with a stick of length 1. Choose a point on the stick according to the distribution of $W_{n,1}$, break the stick into two pieces, discard the piece of length $W_{n,1}$ and rescale the remaining piece to have length 1. Repeat this procedure $k$ times; then (1.6) is the fraction broken off at step $k$ relative to the original stick length. Together with $T_n \stackrel{d}{=} S_n$, one could use (1.6) to compute the marginal distribution of $X_n[k]$ in terms of the ratios $W_{n,i}$. In general the $W_{n,i}$ are not necessarily independent, and their joint distribution needs to be worked out from (1.5).

Lukacs [?] proved that if $X, Y$ are non-degenerate, positive, independent random variables, then $X + Y$ is independent of $\frac{X}{X+Y}$ if and only if $X \sim$ gamma$(a, \lambda)$ and $Y \sim$ gamma$(b, \lambda)$ for some parameters $a, b, \lambda$. In this case, $\frac{X}{X+Y} \sim$ beta$(a, b)$ and $X + Y \sim$ gamma$(a+b, \lambda)$. This leads to the following.

Proposition (Patil-Taillie [?]). Consider the stick-breaking representation (1.6) of the size-biased permutation of an i.i.d sequence with distribution $F$. The random variables $T_n$ and $W_{n,1}, \ldots, W_{n,n-1}$ in (1.6) are mutually independent if and only if $F$ is gamma$(a, \lambda)$ for some $a, \lambda > 0$. In this case,
$$X_n[1] = \gamma_0 \beta_1, \quad X_n[2] = \gamma_0 \bar\beta_1 \beta_2, \quad \ldots, \quad X_n[n-1] = \gamma_0 \bar\beta_1 \bar\beta_2 \cdots \bar\beta_{n-2} \beta_{n-1}, \quad X_n[n] = \gamma_0 \bar\beta_1 \bar\beta_2 \cdots \bar\beta_{n-1},$$
where $\gamma_0$ has distribution gamma$(an, \lambda)$, $\beta_k$ has distribution beta$(a+1, (n-k)a)$, $\bar\beta_k = 1 - \beta_k$ for $1 \leq k \leq n-1$, and the random variables $\gamma_0, \beta_1, \ldots, \beta_{n-1}$ are independent.

Proof. It is sufficient to consider the first stick-break. Note that
$$P(X_n[1]/T_n \in du,\ T_n \in dt) = n\, u\, P\!\left(\frac{X_n(1)}{X_n(1) + (X_n(2) + \cdots + X_n(n))} \in du,\ T_n \in dt\right). \tag{1.7}$$
Suppose $F =$ gamma$(a, \lambda)$. Since $X_n(1) \stackrel{d}{=}$ gamma$(a, \lambda)$ and $S_{n-1} = X_n(2) + \cdots + X_n(n) \stackrel{d}{=}$ gamma$(a(n-1), \lambda)$, independent of $X_n(1)$, the ratio $\frac{X_n(1)}{X_n(1) + S_{n-1}}$ has distribution beta$(a, a(n-1))$ and is independent of $T_n$. Thus
$$P(X_n[1]/T_n \in du) = n\, u\, \frac{\Gamma(an)}{\Gamma(a)\Gamma(a(n-1))}\, u^{a-1}(1-u)^{a(n-1)-1}\, du = \frac{\Gamma(an+1)}{\Gamma(a+1)\Gamma(a(n-1))}\, u^{a}(1-u)^{a(n-1)-1}\, du.$$
In other words, $X_n[1]/T_n \stackrel{d}{=}$ beta$(a+1, a(n-1))$, independent of $T_n$. This proves the "if" direction. Now suppose $W_{n,1} = X_n[1]/T_n$ is independent of $T_n$. Reversing the argument via (1.7), this implies that $X_n(1)/T_n$ is independent of $T_n$. Applying Lukacs' theorem with $X = X_n(1)$ and $Y = X_n(2) + \cdots + X_n(n)$, we see that $F$ must be a gamma distribution.
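The size-biased permutation in (1.1) and the Patil-Taillie proposition are both easy to check numerically. Below is a minimal sketch (Python/NumPy); the function name, the parameter values and the crude mean-comparison are ours, not from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

def size_biased_permutation(x):
    """Return a size-biased permutation of the positive sequence x: sample
    indices without replacement with probabilities proportional to the
    remaining sizes, as in (1.1)."""
    x = np.asarray(x, dtype=float)
    out, remaining = [], list(range(len(x)))
    while remaining:
        w = x[remaining]
        i = rng.choice(len(remaining), p=w / w.sum())
        out.append(x[remaining.pop(i)])
    return np.array(out)

# Patil-Taillie: for F = gamma(a, 1), W_{n,1} = X_n[1]/T_n is beta(a+1, (n-1)a),
# independent of T_n.  Crude Monte Carlo check of E W_{n,1} = (a+1)/(a+1+(n-1)a):
a, n, reps = 2.0, 5, 20000
w1 = np.empty(reps)
for r in range(reps):
    x = rng.gamma(a, 1.0, size=n)
    w1[r] = size_biased_permutation(x)[0] / x.sum()
print(w1.mean(), (a + 1) / (a + 1 + (n - 1) * a))
```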

10 distribution beta(a, a(n 1)) and is independent of T n. Thus Γ(a + a(n 1)) P(X n [1]/T n du) = n u Γ(a)Γ(a(n 1)) ua 1 (1 u) a(n 1) 1 Γ(a a(n 1)) = Γ(a + 1)Γ(a(n 1)) ua (1 u) a(n 1) 1. d In other words, X n [1]/T n = beta(a, a(n 1)). This proves the if direction. Now suppose W n,1 = X n [1]/T n is independent of T n. Reverse the argument, this implies that X n (1) is independent of X n (1)/T n. Apply Lukacs theorem for X = X n (1), Y = X n (2) X n (n), we see that F must be the gamma distribution. 1.3 Size-biased permutation of an infinite i.i.d sequence Subordinator as limit of i.i.d We now want to send n, and derive the analogue of the above results on the infinite simplex. First, we need an infinite sequence of i.i.d random variables X = (X(1), X(2),...) that is a.s. summable T := i X(i) <. This condition is necessary since we will divide by T to obtain a distribution on. Let (X n, n 1) be an i.i.d. positive triangular array, that is, X n = (X n (1),..., X n (n)), where X n (i), i = 1,..., n are i.i.d. and a.s. positive. Write T n for n i=1 X d n(i). A classical result in probability states that T n T if and only if T = T (1) for some Lévy process T, which in this case is a subordinator. Definition A Lévy process T in R is a stochastic process with rightcontinuous left-limits paths, stationary independent increments, and T (0) = 0. A subordinator T is a Lévy process, with real, finite, non-negative increments. Positive increments of an subordinator are called its jumps. Here is one way to cook up a subordinator with infinitely many jumps but the sum of all jumps is finite. Consider a measure Λ on (0, ) such that and 0 (1 x)λ(dx) <, (1.8) Λ((0, )) =. (1.9) Let X be a Poisson point process on (0, ) with intensity measure Λ. Equation (1.8) tells us that this point process has a.s. finitely many points above 1, so it is possible to list them in decreasing order. Let X = (X (1), X (2),...) denote the sequence of points of P in decreasing order. Equation (1.9) tells us that P has a.s. infinitely many points, so X is an infinite sequence. Now let us list X in exchangeable random order. Introduce an independent sequence U(1), U(2),... of i.i.d uniform random variables on [0, 1]. Order the 10

11 pairs (X (i), U(i)) in increasing U-coordinate, and let this be the list of (jumpsize, jump-time) description of our subordinator. 1 That is, define the process T : [0, 1] R 0 by ( T )(t) = d t + i X(i)1 {U(i) t} for 0 t 1, and d 0 a fixed number called the drift coefficient. One can check that T is indeed a subordinator restricted to [0, 1]. Furthermore, all subordinators satisfying T (1) < a.s. is of this form. The measure Λ is called the Lévy measure of T. Finally, we need a classical result on convergence of i.i.d. positive triangular arrays to subordinators (see [?, 15]). Theorem Let (X n, n 1) be an i.i.d. positive triangular array, T n = n i=1 X d n(i). Then T n T for some random variable T, T < a.s. if and only if T = T (1) for some subordinator T whose Lévy measure Λ satisfies (1.8). We would also want the following result on convergence of densities. Theorem Consider the setup of Theorem Assume T n d T, T < a.s. Let µ n be the measure of X n (i). If µ n, Λ have densities ρ n, ρ, respectively, then we have pointwise convergence for all x > 0 nρ n (x) ρ(x). Consider a subordinator with Lévy measure Λ, drift d = 0. Let T 0 be the subordinator at time 1. Assume Λ(1, ) <, Λ(0, ) =, 1 xλ(dx) <, 0 and Λ(dx) = ρ(x) dx for some density ρ. Note that T 0 < a.s, and it has a density determined by ρ via its Laplace transform, which we denote ν. Let T k denote the remaining sum after removing the first k terms of the size-biased permutation of the sequence X of ranked jumps. Proposition ([?]). The sequence ( T 0, T 1,...) is a Markov chain with stationary transition probabilities P( T 1 dt 1 T 0 = t) = t t 1 t ρ(t t 1 ) ν(t 1) ν(t) dt 1. Proof. Note the similarity to (1.5). Starting with (1.4) and send n, for any finite k, we have ν n k ν pointwise, and by Theorem 1.3.1, (n k)ν 1 ρ pointwise over R, since there is no drift term. Thus the analogue of (1.4) in the limit is P( T 0 dt 0,..., T k dt k ) = ( k 1 i=0 t i t i+1 t i ρ(t i t i+1 ) Rearranging gives the transition probability in Proposition ) ν(t k ) dt 0... dt k. (1.10) 1 We can t really speak of the smallest U(i) since there are infinitely many of them. However, the expression for T below still makes sense. 11
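The Poisson point process construction of the jumps is also easy to simulate once the small jumps are truncated. Below is a minimal sketch, assuming the power-law Lévy density $\rho(x) = c x^{-(1+\alpha)}$ that will be used in the next section; the cutoff eps and the function name are ours (the omitted jumps below eps have finite total mass $c\,\varepsilon^{1-\alpha}/(1-\alpha)$).

```python
import numpy as np

rng = np.random.default_rng(2)

def stable_subordinator_jumps(alpha, c=1.0, eps=1e-8):
    """Jumps of a subordinator with Levy density rho(x) = c * x**(-(1+alpha)),
    truncated at eps: the jumps above eps form a Poisson point process with
    mean number of points Lambda((eps, inf)) = c * eps**(-alpha) / alpha,
    and each such jump is drawn by inverting the normalized tail."""
    mean_count = c * eps ** (-alpha) / alpha
    n = rng.poisson(mean_count)
    # conditional tail: P(X > x | X > eps) = (x / eps)**(-alpha)
    jumps = eps * rng.uniform(size=n) ** (-1.0 / alpha)
    return np.sort(jumps)[::-1]

x = stable_subordinator_jumps(alpha=0.5)
print(x[:5], x.sum())          # ranked jumps and (an approximation of) T(1)
print((x / x.sum())[:5])       # the induced random mass partition
```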

1.4 Random mass partition with independent stick-breaks. Stable subordinators and Poisson-Dirichlet distributions

The stick-breaking representation (1.6) in the limit as $n \to \infty$ takes the form
$$\frac{X[k]}{\tilde T_0} = W_k \prod_{i=1}^{k-1} \bar W_i, \tag{1.11}$$
where $X[k]$ is the $k$th size-biased pick from the jumps of the subordinator $\tilde T$, and $W_i = \frac{X[i]}{\tilde T_{i-1}}$, $\bar W_i = 1 - W_i = \frac{\tilde T_i}{\tilde T_{i-1}}$.

Let us for a moment forget about the subordinator and consider (1.11). As long as the $W_i$'s are random, we have a random mass partition $(P_1, P_2, \ldots)$, with
$$P_i = \bar W_1 \cdots \bar W_{i-1} W_i \tag{1.12}$$
for $i = 1, 2, \ldots$. Let us look for nice random partitions. Specifically, we want the $W_i$'s to be independent, and $P$ given by (1.12) to be the size-biased permutation of some mass partition $Q$. But if $P \stackrel{d}{=} Q^*$ for some $Q$, then $P^* \stackrel{d}{=} (Q^*)^* \stackrel{d}{=} Q^* \stackrel{d}{=} P$. So this is equivalent to saying that the distribution of $P$ is invariant under size-biased permutation. Pitman [?] proved a complete characterization.

Theorem ([?]). Let $P$ be a random mass partition with $P_1 < 1$ and $P_n = \bar W_1 \cdots \bar W_{n-1} W_n$ for independent $W_i$. Then $P \stackrel{d}{=} P^*$ if and only if one of the four following conditions holds.

1. $P_n \neq 0$ a.s. for all $n$, in which case the distribution of $W_n$ is beta$(1 - \alpha, \theta + n\alpha)$ for every $n = 1, 2, \ldots$, for some $0 \leq \alpha < 1$, $\theta > -\alpha$.
2. For some integer constant $m$, $P_n \neq 0$ a.s. for all $1 \leq n \leq m$, and $P_n = 0$ a.s. otherwise. Then either
(a) for some $\alpha > 0$, $W_n$ has distribution beta$(1 + \alpha, (m - n)\alpha)$ for $n = 1, \ldots, m$; or
(b) $W_n = 1/(m - n + 1)$ a.s., that is, $P_n = 1/m$ a.s. for $n = 1, \ldots, m$; or
(c) $m = 2$, and the distribution $F$ on $(0, 1)$ defined by $F(dw) = w\, P(W_1 \in dw)/E(W_1)$ is symmetric about $1/2$.

The Patil-Taillie proposition above is case 2(a). Case 2(b) is the limit of 2(a) as $\alpha \to \infty$. Case 2(c) is the special situation where the random mass partition has only two parts.

In case 1, such a distribution $P$ is known as the GEM$(\alpha, \theta)$ distribution. The abbreviation GEM, which stands for Griffiths-Engen-McCloskey, was introduced by Ewens. If $P$ is GEM$(\alpha, \theta)$, then the law of $P^\downarrow$ is called the Poisson-Dirichlet distribution with parameters $(\alpha, \theta)$, denoted PD$(\alpha, \theta)$ [?]. This is an important family of ranked random mass partitions, with applications in fragmentation and coalescence, Bayesian statistics, and machine learning. See [?] and references therein.

We now prove one direction of this theorem by deriving PD$(\alpha, \theta)$ as ranked jumps of a subordinator.

Definition. The subordinator $\tilde T$ with Lévy measure $\Lambda(dx) = \theta x^{-1} e^{-cx}\, dx$ for $x > 0$ is called a gamma subordinator with parameter $(\theta, c)$. The parameter $c > 0$ is called the scaling parameter. Indeed, if $\tilde T$ is a gamma subordinator with parameter $(\theta, 1)$, then $\tilde T / c$ is a gamma subordinator with parameter $(\theta, c)$. Thus, the parameter $c$ plays no role in the random mass partition defined by a gamma subordinator.

Proposition 1.4.2 (PD$(0, \theta)$). Let $\tilde T$ be a gamma$(\theta, c)$ subordinator, and $X^\downarrow$ its sequence of ranked jumps. Then the random mass partition $X^\downarrow / \tilde T(1)$ has distribution PD$(0, \theta)$.

Definition. A stable process is a real-valued Lévy process $(Y(s), s \geq 0)$ with initial value $Y(0) = 0$ that has the self-similarity property $Y(s) \stackrel{d}{=} s^{1/\alpha} Y(1)$ for all $s \geq 0$. The parameter $\alpha$ is the exponent of the process. Stable processes are important in probability. A stable subordinator is a subordinator that is also a stable process.

Lemma. Consider a subordinator $\tilde T$ with Lévy measure $\Lambda$ with density
$$\rho(x) = c\, x^{-(1+\alpha)} \tag{1.13}$$
for some constant $c > 0$ and $x > 0$. Then $\tilde T$ satisfies (1.8) if and only if $\alpha \in (0, 1)$.

Proof. Plug (1.13) into (1.8) and integrate.

Proposition 1.4.4 (PD$(\alpha, 0)$). For $\alpha \in (0, 1)$, let $\tilde T$ be a stable$(\alpha)$ subordinator with Lévy density (1.13), and $X^\downarrow$ its sequence of ranked jumps. Then the random mass partition $X^\downarrow / \tilde T(1)$ has distribution PD$(\alpha, 0)$.

To obtain PD$(\alpha, \theta)$, we will do a change of measure from PD$(\alpha, 0)$. First we need a technical lemma.

Lemma. For $\alpha \in (0, 1)$, let $\tilde T$ be a stable$(\alpha)$ subordinator with Lévy density (1.13). Then $\tilde T(1) > 0$ a.s. and $E(\tilde T(1)^{-\theta}) < \infty$ for all $\theta > -\alpha$.

Proof. For $\theta > 0$, start with the identity $x^{-\theta} = \frac{1}{\Gamma(\theta)} \int_0^\infty e^{-xt} t^{\theta-1}\, dt$. Applying the Fubini-Tonelli theorem for nonnegative functions,
$$E(\tilde T(1)^{-\theta}) = \frac{1}{\Gamma(\theta)} \int_0^\infty E\big[e^{-\tilde T(1) t}\big]\, t^{\theta-1}\, dt.$$
Since $\tilde T$ is $\alpha$-stable, $E\, e^{-\tilde T(1) t} = e^{-c' t^\alpha}$ for some constant $c' > 0$ depending on $c$ and $\alpha$. (One can also use the Lévy-Khintchine formula for subordinators.) Plugging in and simplifying, we get
$$E(\tilde T(1)^{-\theta}) = \frac{\Gamma(\theta/\alpha)}{\alpha\, c'^{\theta/\alpha}\, \Gamma(\theta)}.$$
For $\theta < 0$, express the $\Gamma$ function as an integral and use analytic continuation to extend the above computation. Note that $\Gamma(\theta/\alpha) < \infty$ for $\theta > -\alpha$.

Proposition 1.4.6 (PD$(\alpha, \theta)$). For $\alpha \in (0, 1)$, let $\tilde T$ be a stable$(\alpha)$ subordinator with Lévy density (1.13), and let $X^\downarrow$ be its sequence of ranked jumps. Let $P_\alpha$ denote the distribution of $X^\downarrow$. For $\theta > -\alpha$, define the probability measure $P_{\alpha,\theta}$, absolutely continuous with respect to $P_\alpha$, with density
$$\frac{dP_{\alpha,\theta}}{dP_\alpha} = \frac{\tilde T(1)^{-\theta}}{E\, \tilde T(1)^{-\theta}}.$$
Then under $P_{\alpha,\theta}$, the random mass partition $X^\downarrow / \tilde T(1)$ has distribution PD$(\alpha, \theta)$.

Proof of Propositions 1.4.2, 1.4.4 and 1.4.6. From (1.10), for general $\rho$, the joint density of $W_1$ and $\tilde T_1$ is
$$f_{W_1, \tilde T_1}(w_1, t_1) = f_{\tilde T_0, \tilde T_1}(t_1 \bar w_1^{-1}, t_1)\; t_1 \bar w_1^{-2} = w_1\, \rho\!\Big(\frac{t_1 w_1}{\bar w_1}\Big)\, t_1 \bar w_1^{-2}\, \nu(t_1),$$
where $\bar w_1 = 1 - w_1$. For Propositions 1.4.2 and 1.4.4, plug in the corresponding formulas for $\rho$ and simplify. For instance, consider the $\alpha$-stable subordinator of Proposition 1.4.4. Plugging (1.13) into the expression above gives
$$f_{W_1, \tilde T_1}(w_1, t_1) = c\, w_1^{-\alpha}\, \bar w_1^{\alpha - 1}\, t_1^{-\alpha}\, \nu(t_1). \tag{1.14}$$
Since $c$ is a constant independent of $w_1$ and $t_1$, we conclude that $W_1$ and $\tilde T_1$ are independent. In particular, $W_1$ has the beta$(1-\alpha, \alpha)$ distribution. Repeatedly applying (1.10) as above shows that the $W_i$'s are independent and have the desired distributions.

For Proposition 1.4.6, note that $\tilde T(1) = \tilde T_0 = t_1 \bar w_1^{-1}$. So the density $f_{W_1, \tilde T_1}(w_1, t_1)$ under $P_{\alpha,\theta}$ is $C\, f_{W_1, \tilde T_1}(w_1, t_1)\, (t_1/\bar w_1)^{-\theta}$ for some constant $C$ depending on $\alpha$ and $\theta$, where $f_{W_1, \tilde T_1}(w_1, t_1)$ is given in (1.14). Simplifying, we get
$$C c\, w_1^{-\alpha}\, \bar w_1^{\alpha + \theta - 1}\, t_1^{-(\alpha + \theta)}\, \nu(t_1).$$
So under $P_{\alpha,\theta}$, $W_1$ and $\tilde T_1$ are independent, $W_1$ has the beta$(1-\alpha, \alpha+\theta)$ distribution, and $\tilde T_1$ is distributed like $\tilde T_0$ under $P_{\alpha,\theta+\alpha}$. Recursing this computation gives the desired result for the distribution of the $W_i$'s.

We have shown that the size-biased permutations of the jumps of the subordinators defined in Propositions 1.4.2, 1.4.4 and 1.4.6 follow the GEM$(0, \theta)$, GEM$(\alpha, 0)$ and GEM$(\alpha, \theta)$ distributions, respectively. Thus, the ranked jumps follow the Poisson-Dirichlet distributions with the corresponding parameters.
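The stick-breaking description of GEM$(\alpha, \theta)$ with $W_k \sim$ beta$(1-\alpha, \theta + k\alpha)$ (case 1 of the theorem above) gives a direct sampler; ranking the resulting sticks gives an approximate PD$(\alpha, \theta)$ sample. The following is a minimal sketch; the truncation level and function name are ours.

```python
import numpy as np

rng = np.random.default_rng(3)

def gem(alpha, theta, n_sticks=1000):
    """Stick-breaking sketch of GEM(alpha, theta):
    P_k = W_k * prod_{i<k} (1 - W_i) with independent W_k ~ beta(1-alpha, theta + k*alpha),
    truncated after n_sticks pieces."""
    w = rng.beta(1 - alpha, theta + alpha * np.arange(1, n_sticks + 1))
    return w * np.concatenate(([1.0], np.cumprod(1 - w)[:-1]))

p = gem(0.5, 1.0)
pd = np.sort(p)[::-1]          # ranking gives (a truncation of) PD(0.5, 1.0)
print(p[:5])
print(pd[:5], p.sum())         # total mass close to 1 for large n_sticks
```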

1.5 Exchangeable random partitions. Kingman's correspondence

1.5.1 Partition of [n]

Let $n \in \mathbb{N} \cup \{\infty\}$. Define $[n] = \{1, 2, \ldots, n\}$, with the convention that $[\infty] = \mathbb{N}$. For a set $A$, let $|A|$ denote its cardinality (number of elements).

Definition. A partition $\pi_n$ of $[n]$ into $k$ blocks is an unordered collection of non-empty disjoint sets $\{A_1, \ldots, A_k\}$ whose union is $[n]$. The unordered multiset of block sizes $\{|A_1|, \ldots, |A_k|\}$ is a partition of $n$, which we denote $|\pi_n|$.

The canonical way to order the blocks of a partition of $[n]$ is by increasing least element of each block. This is called the order of appearance. A partition of $[n]$ induces a partition of $[m]$ for all $m \leq n$ by restriction to the first $m$ elements.

Definition. A sequence $(\pi_{[n]}, n \geq 1)$ of partitions of $[n]$ is called consistent or compatible if the restriction of $\pi_{[n]}$ to $[m]$ equals $\pi_{[m]}$ for all $m \leq n$.

Lemma. A sequence $(\pi_{[n]}, n \geq 1)$ of partitions of $[n]$ is consistent if and only if $\pi_{[n]}$ equals the restriction of $\pi$ to $[n]$ for some partition $\pi$ of $\mathbb{N}$.

Proof. Suppose $\pi$ is a partition of $\mathbb{N}$. Then clearly its restrictions to $[n]$ form a consistent sequence. Conversely, suppose we have a consistent sequence. Label the blocks by order of appearance. Let $\pi_{[n]}(i)$ be the block of $\pi_{[n]}$ containing $i$. For each fixed $i \in \mathbb{N}$, $(\pi_{[n]}(i), n \geq 1)$ is a non-decreasing sequence of sets. Define $\pi(i) = \bigcup_{n \in \mathbb{N}} \pi_{[n]}(i)$. Then $(\pi(i), i \in \mathbb{N})$ is a partition of $\mathbb{N}$; call it $\pi$. Clearly $\pi$ restricted to $[n]$ equals $\pi_{[n]}$.

1.5.2 Exchangeable partitions

In many applications we want to forget about the labels of the individual elements of $[n]$, treating them as exchangeable objects. Effectively we want to forget about the partition of $[n]$ and concentrate on the partition of $n$. A permutation is a bijection $\sigma : [n] \to [n]$. For $n = \infty$, a permutation of $\mathbb{N}$ is a bijection $\sigma : \mathbb{N} \to \mathbb{N}$ that leaves all but finitely many elements of $\mathbb{N}$ fixed. The group of permutations of $[n]$ acts naturally on a partition of $[n]$ by permuting the labels of the elements. That is,
$$\sigma\{A_1, \ldots, A_k\} = \{\sigma A_1, \ldots, \sigma A_k\}, \quad \text{where if } A_1 = \{i, j, k, \ldots\}, \text{ then } \sigma A_1 = \{\sigma(i), \sigma(j), \sigma(k), \ldots\}.$$

Example. Order by appearance, action of the symmetric group, exchangeability; size-biased permutation, sampling without replacement and order by appearance.

Definition (Exchangeable random partitions). Let $n \in \mathbb{N} \cup \{\infty\}$. A random partition $\Pi$ of $[n]$ is called exchangeable if for every permutation $\sigma$ of $[n]$, $\sigma \Pi \stackrel{d}{=} \Pi$.

Corollary. A random partition $\Pi$ of $\mathbb{N}$ is exchangeable if and only if its restrictions to $[n]$ are exchangeable for all $n \in \mathbb{N}$.

Example (Random partitions from Kingman's paintbox). Fix $p = (p_1, p_2, \ldots)$. Represent it as an interval partition of $[0, 1]$, viewed as paint buckets. Use Kingman's paintbox to color balls numbered $i = 1, 2, \ldots$. Say that $i \sim j$ if $i$ and $j$ have the same color. This defines a random partition $\Pi$ of $[\infty]$.

Definition. Call the random partition $\Pi$ in the previous example the paintbox based on $p$.

Lemma. Fix $p = (p_1, p_2, \ldots)$. The paintbox based on $p$ is an exchangeable random partition.

Proof. Let $(U_i, i \geq 1)$ be the sequence of i.i.d uniform random variables used to construct the paintbox. By definition, $\sigma \Pi$ has the same distribution as the paintbox constructed with the sequence $(U_{\sigma_i}, i \geq 1)$. But the $U_i$'s are i.i.d, so $(U_{\sigma_i}, i \geq 1) \stackrel{d}{=} (U_i, i \geq 1)$. Thus $\sigma \Pi \stackrel{d}{=} \Pi$, so $\Pi$ is exchangeable.

Example (Finite paintbox). For $k \in \mathbb{N}$, let $\pi = (n_1, \ldots, n_k)$ be a partition of $n$ into $k$ parts. Represent them as $k$ disjoint intervals containing $n_1, \ldots, n_k$ integer points of $[1, n]$, respectively. Use a discrete version of Kingman's paintbox to paint $n$ balls as follows: for each $i \in [n]$, pick an integer in $[1, n]$ uniformly at random out of the remaining integers, and color ball $i$ according to the interval that the chosen integer falls in. Then remove this integer from $[1, n]$. This gives a random partition $\Pi_n$ of $[n]$.

Definition. Call the random partition $\Pi_n$ in the previous example the finite paintbox based on $\pi$.

Lemma. Fix $\pi = (n_1, \ldots, n_k)$. The finite paintbox based on $\pi$ is exchangeable.

The following says that exchangeable finite partitions arise as mixtures of finite paintboxes. It sets up a bijection between exchangeable finite partitions and random mass partitions supported on indivisible units. It is a finite version of Kingman's correspondence.

Proposition (Finite Kingman's correspondence). Let $\Pi_n$ be an exchangeable partition of $[n]$. Then the distribution of $\Pi_n$ is a mixture of finite paintboxes. That is, let $|\Pi_n|$ be the corresponding partition of $n$, arranged in decreasing block sizes. Then the distribution of $\Pi_n$ conditioned on $|\Pi_n| = \pi$ equals the distribution of the finite paintbox based on $\pi$.

Proof. Exercise.

Theorem (Kingman's correspondence). Let $\Pi$ be an exchangeable random partition of $\mathbb{N}$. Then the law of $\Pi$ is a mixture of paintboxes. That is,
$$P(\Pi \in \cdot\,) = \int P(|\Pi|^\downarrow \in dp)\, \rho_p(\,\cdot\,),$$
where the mixing law is the distribution of the ranked asymptotic frequencies $|\Pi|^\downarrow$ of $\Pi$ (defined below), and $\rho_p$ stands for the law of the paintbox based on $p$.

Proof. For a given partition $\pi$ of $\mathbb{N}$, let $b_\pi : \mathbb{N} \to \mathbb{N}$ be a map such that all elements of the same block of $\pi$ receive the same value. That is, $i$ and $j$ are in the same block of $\pi$ if and only if $b_\pi(i) = b_\pi(j)$. Such a map is called a selection map. Let $U_1, \ldots$ be a sequence of i.i.d uniform random variables, independent of $\Pi$. Conditioned on the value of $\Pi$, define $X(i) = U_{b_\Pi(i)}$. Note that $X$ determines $\Pi$ a.s. via $i \sim j$ iff $X(i) = X(j)$.

We claim that $X = (X(1), \ldots)$ is exchangeable. Indeed, for a permutation $\sigma$ of $\mathbb{N}$,
$$\sigma(X)(i) = X(\sigma_i) = U_{b_\Pi(\sigma_i)} = U_{b_\Pi \circ \sigma(i)} = U'_{\sigma^{-1} \circ b_\Pi \circ \sigma(i)},$$
where $U'_j = U_{\sigma_j}$. Now, note that $\sigma^{-1} \circ b_\Pi \circ \sigma$ is a selection map of the partition $\sigma^{-1}(\Pi)$. Since $\Pi \stackrel{d}{=} \sigma^{-1}(\Pi)$, independent of $U$, and the $U'_i$'s are again i.i.d uniform, we conclude that $\sigma(X) = (X(\sigma_1), \ldots) \stackrel{d}{=} X = (X(1), \ldots)$. So $X$ is exchangeable.

By de Finetti's theorem, there exists some random probability measure $\mu$ on $[0, 1]$ such that the $X(i)$'s are i.i.d conditioned on $\mu$. Now, conditioned on $\mu$, let $F$ be its distribution function. Introduce an independent sequence $V_1, \ldots$ of i.i.d uniform random variables on $[0, 1]$. Then conditioned on $\mu$, $(F^{-1}(V_1), \ldots) \stackrel{d}{=} X$. Recover $\Pi$ a.s. from $X$: then $i$ and $j$ belong to the same block of $\Pi$ if and only if $V_i$ and $V_j$ belong to the same interval in the domain of $F^{-1}$. So conditioned on $\mu$, $\Pi$ is distributed as a paintbox based on the intervals in the domain of $F^{-1}$. So $\Pi$ is a mixture of paintboxes (with mixing law determined by $\mu$), as required.

Let us now consider some consequences of Kingman's correspondence.

Definition. Let $(\pi_n)$ be a sequence of partitions. Write the block sizes of $\pi_n$ as $(N_{n,1}, N_{n,2}, \ldots)$ in the order of least element. For each $i = 1, 2, \ldots$, the limit $\lim_{n \to \infty} \frac{N_{n,i}}{n}$, if it exists, is called the asymptotic frequency of block $i$. If all blocks of $(\pi_n)$ have asymptotic frequencies, say that $(\pi_n)$ has asymptotic frequencies.

Fix $p$ and suppose $\sum_i p(i) = 1$. Let $\Pi$ be a paintbox based on $p$, and $\Pi_n$ its restriction to $[n]$. Write the block sizes of $\Pi_n$ as $(N_{n,1}, N_{n,2}, \ldots)$ in the order of least element and as $(N^\downarrow_{n,1}, N^\downarrow_{n,2}, \ldots)$ in decreasing order. Then as $n \to \infty$, by the law of large numbers, for each $i$,
$$\lim_{n \to \infty} \frac{N^\downarrow_{n,i}}{n} \stackrel{a.s.}{=} p^\downarrow(i) \tag{1.15}$$
and
$$\lim_{n \to \infty} \frac{N_{n,i}}{n} \stackrel{a.s.}{=} P^*(i), \tag{1.16}$$
where $P^*$ is the size-biased permutation of $p$. If $\sum_i p(i) < 1$, then we still have (1.15). For each $i$, either (1.16) holds, or
$$\lim_{n \to \infty} \frac{N_{n,i}}{n} \stackrel{a.s.}{=} 0.$$

The latter case occurs if and only if $\Pi(i)$ is a singleton. The set of singletons $\Pi(0) := \{i \in \mathbb{N} : \Pi(i) = \{i\}\}$ is a random set with mass $p(0) := 1 - \sum_i p(i)$. That is, paintboxes (and hence mixtures of paintboxes) have asymptotic frequencies. So by Kingman's theorem:

Corollary. All exchangeable random partitions of $\mathbb{N}$ have asymptotic frequencies.

1.5.3 EPPF

Lemma. Let $n \in \mathbb{N}$. A random partition $\Pi_n$ of $[n]$ is exchangeable if and only if
$$P(\Pi_n = \{A_1, \ldots, A_k\}) = p(|A_1|, \ldots, |A_k|)$$
for some function $p$ of the block sizes, defined on partitions of $n$ into a finite (but not fixed) number of parts, that is symmetric in its arguments.

The function $p$ of Lemma 1.5.8 is called the exchangeable partition probability function (EPPF) of $\Pi_n$. Since $p$ is symmetric, it is customary to list its arguments in decreasing order. Let $\mathcal{C}_n$ denote the set of partitions of $n$ ordered in decreasing sizes (these are called compositions of $n$). So $p$ maps $\mathcal{C}_n$ to $[0, 1]$. Since $\Pi_n$ is exchangeable, it induces a sequence of consistent exchangeable random partitions by restriction to $[m]$ for $m < n$. Thus, one can regard $p$ as a map from $\bigcup_{m=1}^n \mathcal{C}_m$ to $[0, 1]$. Consistency implies that $p$ satisfies the following addition rule: for each composition $(n_1, \ldots, n_k)$ of $m < n$,
$$p(n_1, \ldots, n_k) = p(n_1, \ldots, n_k, 1) + \sum_{j=1}^k p(n_1, \ldots, n_{j-1}, n_j + 1, n_{j+1}, \ldots, n_k). \tag{1.17}$$
In addition,
$$p(1) = 1. \tag{1.18}$$
Conversely, if $p : \bigcup_{m=1}^n \mathcal{C}_m \to [0, 1]$ satisfies (1.17) and (1.18), then by Lemma 1.5.8 it is the EPPF of some exchangeable partition $\Pi_n$ of $[n]$.

Definition (Infinite EPPF). An infinite EPPF is a function $p : \bigcup_{m=1}^\infty \mathcal{C}_m \to [0, 1]$ that satisfies (1.17) and (1.18). By Lemma 1.5.8 and the definition of infinite exchangeable partitions, each infinite EPPF specifies the law of an exchangeable partition $\Pi$ of $\mathbb{N}$.

1.5.4 Kingman's correspondence in terms of EPPF

Our goal is to cast Kingman's result in terms of the EPPF. This leads us to Theorem 1.5.9 of Pitman. Fix $p$. Consider the paintbox $\Pi$ based on $p$. What is its EPPF? The answer is given by the Chinese Restaurant Process, introduced by Dubins and Pitman and later generalized by Pitman [?, 3].

Definition (Partially exchangeable Chinese Restaurant Process). Start with an initially empty restaurant with an unlimited number of tables numbered $1, 2, \ldots$, each capable of seating an unlimited number of customers. Customers numbered $1, 2, \ldots$ arrive one by one and choose their seats according to the following rules.

1. Customer 1 sits at table 1.
2. Given that the first $n$ customers have sat at $k$ tables, the $(n+1)$-st customer will:
   join table $i$ with probability $p(i)$;
   join a new table with probability $1 - \sum_{j=1}^k p(j)$.

Identify the seating configuration of $n$ customers with a partition of $[n]$, where $i \sim j$ if and only if customers $i$ and $j$ sit at the same table. This process defines a consistent sequence $(\Pi_n)$ of partitions of $[n]$ with asymptotic block frequencies $p$. For blocks $A_i$ listed in order of appearance, the law of $\Pi_n$ is given by
$$P(\Pi_n = \{A_1, \ldots, A_k\}) = p(n_1, \ldots, n_k) = \prod_{i=1}^k p(i)^{n_i - 1} \prod_{i=1}^{k-1} \Big(1 - \sum_{j=1}^i p(j)\Big), \tag{1.19}$$
where $n_i = |A_i|$. For general $p$, the function defined in (1.19) is not symmetric in its arguments. Thus, $(\Pi_n)$ is in general not exchangeable.

Now let $P$ be a random mass partition on $\Delta$. Conditionally on the value of $P$, draw a random partition $(\Pi_n)$ of $[n]$ by the Chinese Restaurant Process above. This gives a consistent sequence of partitions $(\Pi_n)$, with asymptotic block frequencies $P$, and law
$$P(\Pi_n = \{A_1, \ldots, A_k\}) = p(n_1, \ldots, n_k) = E\left[\prod_{i=1}^k P(i)^{n_i - 1} \prod_{i=1}^{k-1} \Big(1 - \sum_{j=1}^i P(j)\Big)\right]. \tag{1.20}$$
The key theorem of this section states that this sequence $(\Pi_n)$ is exchangeable if and only if $p$ in (1.20) is symmetric in its arguments. Roughly, this says that exchangeable partitions of $\mathbb{N}$ are in bijection with exchangeable mixtures of Chinese Restaurants.

Theorem 1.5.9 (Pitman's EPPF theorem). Let $P$ be a random mass partition on $\Delta$. Define $p : \bigcup_{m=1}^\infty \mathcal{C}_m \to [0, 1]$ by
$$p(n_1, \ldots, n_k) = E\left[\prod_{i=1}^k P(i)^{n_i - 1} \prod_{i=1}^{k-1} \Big(1 - \sum_{j=1}^i P(j)\Big)\right]. \tag{1.21}$$
There exists an exchangeable partition $\Pi$ of $\mathbb{N}$ whose asymptotic block frequencies in order of appearance have the same distribution as $P$ if and only if $p$ is a symmetric function of its arguments. In this case, the EPPF of $\Pi$ is the function $p$ defined in (1.21).

1.5.5 EPPF for independent stick-break family

Let $P$ be a random mass partition on $\Delta$, invariant under size-biased permutation, such that its stick-breaking representation (1.12) consists of independent $W_i$'s. For each fixed $k$, by Theorem 1.5.9, one can compute the EPPF $p$ using (1.21): condition on $P(k)$ and take the expectation, then condition on $P(k-1)$ and take the expectation, and so on. The independence property simplifies the formula. Recall from the theorem of Section 1.4 that there are only a few families satisfying the independent stick-break criterion. The more interesting one is the GEM$(\alpha, \theta)$ family. The EPPF for $P \sim$ GEM$(\alpha, \theta)$ is the Pitman sampling formula
$$p_{\alpha,\theta}(n_1, \ldots, n_k) = \frac{(\theta + \alpha)_{k-1 \uparrow \alpha}}{(\theta + 1)_{n-1 \uparrow 1}} \prod_{i=1}^k (1 - \alpha)_{n_i - 1 \uparrow 1}, \tag{1.22}$$
where for an integer $m$ and real numbers $x, a$,
$$(x)_{m \uparrow a} = \prod_{i=0}^{m-1} (x + i a).$$
For example, for $\alpha = 0$, $\theta > 0$, this is the Ewens sampling formula
$$p_{0,\theta}(n_1, \ldots, n_k) = \frac{\theta^k}{\theta(\theta + 1) \cdots (\theta + n - 1)} \prod_{i=1}^k (n_i - 1)!. \tag{1.23}$$
The original Ewens sampling formula is expressed in terms of $K_{n,j}$, the number of tables with $j$ people at stage $n$. In this form, (1.22) becomes
$$P(K_{n,j} = c_j,\ j \in \{1, \ldots, n\}) = \frac{(\theta + \alpha)_{(\sum_j c_j) - 1 \uparrow \alpha}}{(\theta + 1)_{n-1 \uparrow 1}} \cdot \frac{n!}{\prod_{j=1}^n c_j!\, (j!)^{c_j}} \prod_{j=1}^n \big((1 - \alpha)_{j-1 \uparrow 1}\big)^{c_j}. \tag{1.24}$$
The seating plan for the Chinese Restaurant Process has the following simple description [?]. Customer 1 sits at table 1. Given that $n$ customers have sat at $k$ tables, with $n_i$ customers at table $i$, customer $n + 1$ will
   sit at table $i$ with probability $\frac{n_i - \alpha}{n + \theta}$;
   sit at a new table with probability $\frac{\theta + k\alpha}{n + \theta}$.

1.6 Various applications of the GEM process

The Chinese Restaurant description of the GEM process gives a simple way to check whether some random partition is GEM. Coupling with the CRP gives an induction tool. We give some examples.

1.6.1 Random permutations

Let $\sigma_n$ be a random permutation of $[n]$. Write $\sigma_n$ as a product of cycles. This defines a random partition of $[n]$; call it $\Pi_n$. There is an abundance of connections between random permutations and the GEM family. Here are some basic results.
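These connections, and the $(\alpha, \theta)$ seating plan above, are easy to explore by simulation. The following is a minimal sketch (Python/NumPy) of the seating plan; the function name and parameter values are ours, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(4)

def crp(n, alpha, theta):
    """Seating plan of the (alpha, theta) Chinese Restaurant Process: customer
    n+1 joins table i with probability (n_i - alpha)/(n + theta) and opens a
    new table with probability (theta + k*alpha)/(n + theta).
    Returns the table sizes (n_1, ..., n_k) in order of appearance."""
    sizes = [1]                                   # customer 1 sits at table 1
    for n_cust in range(1, n):
        probs = np.array([ni - alpha for ni in sizes] + [theta + alpha * len(sizes)])
        j = rng.choice(len(probs), p=probs / (n_cust + theta))
        if j == len(sizes):
            sizes.append(1)
        else:
            sizes[j] += 1
    return sizes

print(crp(100, 0.5, 1.0))
```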

Proposition. Let $(\sigma_n, n \geq 1)$ be a sequence of random permutations of $[n]$ with the following properties:
   $\sigma_n$ is a uniform random permutation of $[n]$;
   conditioned on $\sigma_n$, $\sigma_{n+1}$ is distributed as $\sigma_n$ with the element $n + 1$ inserted in one of its cycles.
Let $\Pi_n$ be the corresponding sequence of random partitions of $[n]$. Then $\Pi_n$ is distributed as the restriction to $[n]$ of a GEM$(0, 1)$ partition.

Corollary. Let $K_n$ be the number of cycles in a uniform random permutation of $[n]$. Then
$$K_n \stackrel{d}{=} \sum_{i=1}^n \mathrm{Bernoulli}(1/i)$$
for independent Bernoulli random variables. In particular,
$$\frac{K_n - \log n}{\sqrt{\log n}} \stackrel{d}{\to} N(0, 1) \quad \text{as } n \to \infty,$$
where $N(0, 1)$ is the standard Gaussian.

1.6.2 Kingman coalescent with mutation

As an application of our theory so far, we derive some properties of the Kingman coalescent.

Definition. Fix $n \in \mathbb{N}$. The Kingman coalescent on $[n]$ is a continuous-time Markov chain with values in the space of partitions of $[n]$, defined as follows. The trivial partition $1_{[n]} = ([n])$ is an absorbing state. If $\pi, \pi'$ are two partitions of $[n]$ such that $\pi'$ can be obtained from $\pi$ by merging (coagulation of) two of its blocks, then the jump rate from $\pi$ to $\pi'$ is one. All other jump rates are zero.

By the definition of a continuous-time Markov chain, to say that the jump rate from $\pi$ to $\pi'$ is one means that the transition time from $\pi$ to $\pi'$ is an exponential random variable with rate 1. So in terms of independent exponentials, here is an equivalent description of the $n$-coalescent.

Corollary. Start at an initial state $\pi$ which has $k$ blocks. For each of the $\binom{k}{2}$ pairs of blocks, introduce an independent exponential clock with rate 1. When the first clock rings, merge the corresponding pair. Then the resulting partition $\pi'$ is distributed as the next state of the Kingman coalescent on $[n]$ started at $\pi$.

In the $n$-coalescent, there are $\binom{k}{2}$ different states that $\pi$ can jump to, and each of them happens at equal rate. In other words, one picks a pair of blocks to merge uniformly at random.

There are two ways to get a random partition of $[n]$ from the line-of-descent tree. One is to chop the tree at some time $t$; this we will do when studying coalescents as continuous-time Markov processes. Another is to introduce mutations.

Proposition. Let $T$ be the line-of-descent tree of Kingman's coalescent on $[n]$. On each branch of the tree, introduce an independent Poisson point process of marks, called mutations, with rate $\theta/2$ per unit length. Say that $i \sim j$ if and only if the unique path in $T$ connecting $i$ and $j$ carries no mutations. This defines a random partition $\Pi_n$. Then $\Pi_n$ is a GEM$(0, \theta)$ partition.

Proof. We use a coupling with the CRP for GEM$(0, \theta)$. Condition on $\Pi_{n-1}$ and the tree $T_{n-1}$, and consider element $n$. By Kingman's coalescent, for each $i = 1, \ldots, n-1$ there is an independent exponential clock for the pair $(i, n)$, and the first clock that rings determines the point on the tree $T_{n-1}$ that $n$ joins. Let $S$ be the time at which the first clock rings; note that $S \sim$ exponential$(n-1)$. For a table $j$, the event $\{n \text{ joins table } j\}$ is the event that clock $(i, n)$ is the minimum for some $i$ in table $j$, AND there is no mutation on $[0, 2S]$ for an independent PPP$(\theta/2)$. Note that these two events are independent. The probability of the first event is $n_j/(n-1)$. The probability of the second event is the probability that an exponential$(\theta)$ exceeds an exponential$(n-1)$, for two independent exponentials; this happens with probability $\frac{n-1}{n-1+\theta}$. Thus, the probability that $n$ joins table $j$ is $\frac{n_j}{n-1+\theta}$. This gives the required coupling with the CRP for GEM$(0, \theta)$.

1.6.3 A branching process description of the GEM distribution

There is an interesting interpretation of the GEM$(\alpha, \theta)$ distribution in terms of a branching process in continuous time. Let $Z(t)$ be the population at time $t$. Each individual carries independent exponential clocks; when a clock rings, she produces offspring.

Definition (GEM$(\alpha, \theta)$ branching). Fix $\alpha \in [0, 1]$, $\theta > 0$. Our population has two types of individuals, novel and clone. Each individual has a color and an infinite lifetime. The reproduction rates are:
   a novel produces a novel at rate $\alpha$, and independently produces a clone at rate $1 - \alpha$;
   a clone produces a clone at rate 1.
Furthermore, novel individuals immigrate at rate $\theta$, independently of the reproduction. The population starts with one novel individual at time $t = 0$. The coloring rules are:
   each novel individual gets a new, unique color;
   each clone has the same color as its parent.
Number the individuals $1, 2, \ldots$ in their order of appearance in the population. Let $\Pi$ be the random partition of $\mathbb{N}$ generated by their colors, that is, $i \sim j$ if and only if $i$ and $j$ have the same color.

Proposition. The random partition $\Pi$ of $\mathbb{N}$ from the branching process in the definition above is a GEM$(\alpha, \theta)$ partition.

Proof. The induced partition $\Pi_n$ on $[n]$ evolves according to an $(\alpha, \theta)$ Chinese Restaurant Process. This uniquely determines the law of $\Pi$. We are done.

With this description, one can use standard results from branching processes to derive properties of the GEM$(\alpha, \theta)$ process. For example:

Lemma. Let $\Pi$ be a GEM$(\alpha, 0)$ partition and $\Pi_n$ its restriction to $[n]$. Let $K_n$ be the number of blocks of $\Pi_n$. Then $\lim_{n \to \infty} \frac{K_n}{n^\alpha}$ exists a.s. Denote this limit by $S$. Then the distribution of $S$ is specified by the identity $\bar W = S W^\alpha$, where $W$ is an exponential(1) random variable independent of $S$, and $\bar W \stackrel{d}{=}$ exponential(1).

Proof. Let $N(t)$ be the number of individuals at time $t$, and $N'(t)$ the number of novel individuals at time $t$. Let $T_n$ be the first time at which the $n$-th individual is born. By definition, $N(T_n) = n$ and $N'(T_n) = K_n$, so
$$\frac{K_n}{n^\alpha} = \frac{N'(T_n)/e^{\alpha T_n}}{N(T_n)^\alpha / e^{\alpha T_n}}.$$
Note that $K_n$ is independent of $T_n$; hence $\frac{K_n}{n^\alpha}$ is independent of $\frac{n^\alpha}{e^{\alpha T_n}} = \frac{N(T_n)^\alpha}{e^{\alpha T_n}}$. Now, $(N'(t), t \geq 0)$ is a Yule process with rate $\alpha$, that is, a pure birth process with transition rate $i\alpha$ from state $i$ to state $i + 1$. And $(N(t), t \geq 0)$ is a Yule process with rate 1. By known results for Yule processes,
$$\frac{N(t)}{e^{t}} \stackrel{a.s.}{\to} W \quad \text{and} \quad \frac{N'(t)}{e^{\alpha t}} \stackrel{a.s.}{\to} \bar W,$$
where $W$ and $\bar W$ are exponentially distributed with mean 1. Since $T_n \to \infty$ a.s. as $n \to \infty$, we have
$$\frac{N(T_n)^\alpha}{e^{\alpha T_n}} \stackrel{a.s.}{\to} W^\alpha \quad \text{and} \quad \frac{N'(T_n)}{e^{\alpha T_n}} \stackrel{a.s.}{\to} \bar W.$$
So $\frac{K_n}{n^\alpha} \stackrel{a.s.}{\to} \frac{\bar W}{W^\alpha} =: S$, independent of $W$, as needed.

By computing moments, one finds that
$$E S^p = \frac{\Gamma(p + 1)}{\Gamma(p\alpha + 1)}.$$
In other words, $S$ has the Mittag-Leffler distribution with parameter $\alpha$.

The branching process description also reveals an interesting coupling between the GEM$(0, \theta)$ and GEM$(\alpha, \theta)$ distributions for $\theta \geq 0$. If we ignore the distinction between clones and novels, then this is just a birth process with rate 1 and immigration at rate $\theta$; so it gives a GEM$(0, \theta)$ partition, with $i \sim j$ if and only if $i$ and $j$ share a common ancestor. Conditioned on this partition, the GEM$(\alpha, \theta)$ partition is a refinement, where each family is broken up, by the coloring rules, according to a GEM$(\alpha, 0)$.

Proposition. Let $\alpha \in [0, 1]$, $\theta \geq 0$. Break a stick of length 1 according to the GEM$(0, \theta)$ distribution. Then break each of these sticks further, independently at random, according to the GEM$(\alpha, 0)$ distribution. Let $Q$ be a size-biased random permutation of the lengths in this array. Then $Q$ has the GEM$(\alpha, \theta)$ distribution.
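The refinement coupling in the last proposition can be explored numerically. Below is a sketch, assuming truncated stick-breaking at each level (so the check is only approximate); all function names, truncation levels and the crude mean comparison are ours. The first element of a GEM$(\alpha, \theta)$ sequence is beta$(1-\alpha, \theta+\alpha)$, with mean $(1-\alpha)/(1+\theta)$, which we compare against the first size-biased pick of the refined partition.

```python
import numpy as np

rng = np.random.default_rng(5)

def gem_sticks(alpha, theta, m):
    """First m stick-breaking pieces of GEM(alpha, theta)."""
    w = rng.beta(1 - alpha, theta + alpha * np.arange(1, m + 1))
    return w * np.concatenate(([1.0], np.cumprod(1 - w)[:-1]))

def refined_gem(alpha, theta, m=100, k=5):
    """Coupling in the proposition above (truncated at m sticks per level):
    break [0,1] by GEM(0, theta), break each piece independently by
    GEM(alpha, 0), then return the first k size-biased picks."""
    coarse = gem_sticks(0.0, theta, m)
    pieces = np.concatenate([c * gem_sticks(alpha, 0.0, m) for c in coarse])
    picks, w = [], pieces.copy()
    for _ in range(k):
        i = rng.choice(len(w), p=w / w.sum())
        picks.append(pieces[i])
        w[i] = 0.0                       # sampling without replacement
    return np.array(picks)

alpha, theta, reps = 0.5, 1.0, 1000
firsts = np.array([refined_gem(alpha, theta)[0] for _ in range(reps)])
print(firsts.mean(), (1 - alpha) / (1 + theta))   # close, up to truncation bias
```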

1.7 Number of blocks in a CRP

Let $\Pi$ be an infinite GEM$(\alpha, \theta)$ partition, and $\Pi_n$ the induced partition of $[n]$. Let $K_n$ be the number of blocks of $\Pi_n$.

Lemma. The distribution of $K_n$ is
$$P(K_n = k) = \frac{(\theta + \alpha)_{k-1 \uparrow \alpha}}{(\theta + 1)_{n-1 \uparrow 1}}\, S_\alpha(n, k), \tag{1.25}$$
where $S_\alpha(n, k) = S^{-1,-\alpha}_{n,k}$ is a generalized Stirling number of the first kind.

Proof. Note that
$$K_n = \sum_{i=1}^n X_i,$$
where $X_i$ is the indicator of the event that customer $i$ sits at a new table in the CRP description of $\Pi_n$. From the CRP description, the sequence $(K_n, n \geq 1)$ is a Markov chain, starting at $K_1 = 1$, with increments in $\{0, 1\}$ and inhomogeneous transition probabilities
$$P(K_{n+1} = k + 1 \mid K_n = k) = \frac{k\alpha + \theta}{n + \theta}, \tag{1.26}$$
$$P(K_{n+1} = k \mid K_n = k) = \frac{n - k\alpha}{n + \theta}. \tag{1.27}$$
One then verifies by induction that the marginal distributions of this Markov chain satisfy (1.25).

1.7.1 Almost sure limits

We saw from the above lemma that for $\alpha = 0$, $K_n$ is a sum of independent Bernoulli variables, with $X_i \stackrel{d}{=}$ Bernoulli$\big(\frac{\theta}{\theta + i - 1}\big)$. Hence for $\theta > 0$, as $n \to \infty$, by the strong law of large numbers,
$$\frac{K_n}{\log n} \stackrel{a.s.}{\to} \theta,$$
and by the classical central limit theorem (Lindeberg's),
$$\frac{K_n - \theta \log n}{\sqrt{\theta \log n}} \stackrel{d}{\to} N(0, 1).$$
The case $\theta = 1$ was discussed in the corollary of Section 1.6.1. When $\alpha \in (0, 1)$ and $\theta = 0$, we saw in the lemma of Section 1.6.3 that
$$\frac{K_n}{n^\alpha} \stackrel{a.s.}{\to} S,$$
where $S$ has the Mittag-Leffler density. We now discuss the last case, $\alpha \in (0, 1)$ and $\theta > 0$. As expected from the results of Section 1.4 and Kingman's correspondence, there is an almost-sure limit for $\frac{K_n}{n^\alpha}$, obtained by a change of measure for $S$. (In fact, in CSP, the theorem below is used to prove the corresponding proposition of Section 1.4 via Kingman's correspondence.)
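Before stating that theorem, here is a small simulation sketch of $K_n$ drawn directly from the Markov chain (1.26)-(1.27); the function name and parameter values are ours, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(6)

def sample_K(n, alpha, theta):
    """Simulate K_n from the Markov chain (1.26)-(1.27): K_1 = 1, and at step m
    the count increments by 1 with probability (k*alpha + theta)/(m + theta)."""
    k = 1
    for m in range(1, n):
        if rng.random() < (k * alpha + theta) / (m + theta):
            k += 1
    return k

n, reps = 20000, 200
for alpha, theta in [(0.0, 2.0), (0.5, 1.0)]:
    ks = np.array([sample_K(n, alpha, theta) for _ in range(reps)])
    scale = np.log(n) if alpha == 0 else n ** alpha
    # near theta when alpha = 0; near E S_alpha (the alpha-diversity mean) otherwise
    print(alpha, theta, (ks / scale).mean())
```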

Theorem 1.7.2. For $\alpha \in (0, 1)$, $\theta > -\alpha$, let $K_n$ be the number of blocks of a GEM$(\alpha, \theta)$ partition of $[n]$. Then, as $n \to \infty$,
$$\frac{K_n}{n^\alpha} \stackrel{a.s.}{\to} S_\alpha, \quad \text{and} \quad E\Big(\frac{K_n}{n^\alpha}\Big)^p \to E S_\alpha^p \ \text{ for all } p > 0,$$
where $S_\alpha$ is a strictly positive random variable with continuous density
$$\frac{d}{ds} P(S_\alpha \leq s) = g_{\alpha,\theta}(s) := \frac{\Gamma(\theta + 1)}{\Gamma(\theta/\alpha + 1)}\, s^{\theta/\alpha}\, g_\alpha(s), \quad s > 0,$$
where $g_\alpha = g_{\alpha,0}$ is the Mittag-Leffler density of the distribution of $S$ for the case $\theta = 0$ in the lemma of Section 1.6.3.

Proof. See [?, Theorem 3.8].

Question: Is there a proof for the case $\theta \neq 0$ via the coupling in Proposition 1.6.1?

1.7.2 Power law

Let $\{k_n\}$ be a sequence of real numbers. If for large $n$, $k_n = O(n^\gamma)$, then we say that the sequence $\{k_n\}$ obeys a power law with index $\gamma$. Power laws can be observed in many different real-world scenarios. The figure below (taken from [?]) shows the distribution of AOL users' visits to various sites on a December day. [?] observed that a few sites get upward of 2000 visitors, whereas most sites got only a few visits (70,000 sites received only a single visit). The linearity on the log-log scale suggests that the following is a reasonable model:
$$\text{number of sites} = C \cdot (\text{number of visitors})^{\gamma},$$
where $\gamma$ is the slope.

Definition. An exchangeable partition $\Pi$ such that $S_\alpha := \lim_{n \to \infty} \frac{K_n}{n^\alpha}$ exists a.s. is said to exhibit a power law with index $\alpha$, and $\alpha$-diversity $S_\alpha$.

Partition models with power law behavior have many applications in computer science. By Theorem 1.7.2, the partition of $\mathbb{N}$ induced by GEM$(\alpha, \theta)$ is an example of a power law partition. In general, we have the following characterization.

Theorem. Fix $\alpha \in (0, 1)$. For an exchangeable partition $\Pi$, let $(P_i)$ denote its sequence of ranked asymptotic frequencies. Then
$$\frac{K_n}{n^\alpha} \stackrel{a.s.}{\to} S_\alpha \ \text{ as } n \to \infty \quad \Longleftrightarrow \quad P_i \stackrel{a.s.}{\sim} Z\, i^{-1/\alpha} \ \text{ as } i \to \infty, \tag{1.28}$$
where the random variables $Z$ and $S_\alpha$ determine each other by
$$S_\alpha = \Gamma(1 - \alpha)\, Z^\alpha. \tag{1.29}$$

Proof sketch. Since the convergences involved are almost sure, it is sufficient to prove the statement for a fixed sequence $(p_j)$. The proof uses Poissonization, a method introduced by Freedman [?] to approximate a model by one with independent random variables. For our case, consider the Kingman paintbox description of $\Pi_n$. Let $X_{n,j}$ be the number of balls that fall in bin $j$, and write $X_n = (X_{n,1}, X_{n,2}, \ldots)$. Note that $K_n = \#\{j : X_{n,j} > 0\}$. The problem is that the $X_{n,j}$'s are not independent, since $\sum_j X_{n,j} = n$.

To Poissonize, consider an independent Poisson point process with rate 1 on $(0, \infty)$. A point of this p.p.p. is an event; at each event, throw a ball down on Kingman's paintbox. So in this model, balls are thrown in continuous time. This replaces $X_n$ by the independent Poisson processes $X(t) = (X_j(t), j = 1, 2, \ldots)$, where $X_j(t)$, the number of balls in box $j$ at time $t$, is a Poisson random variable with mean $t p_j$, independently over $j$. Let $K(t)$ be the number of occupied boxes at time $t$; now $K(t)$ is a sum of independent Bernoulli variables. After doing the analysis for the Poissonized model, one de-Poissonizes by showing that the corresponding finite-$n$ quantities are not far from the Poissonized quantities. In other words, the de-Poissonization step is to show that
$$K_n \approx K(n) \approx E(K(n)) \approx E(K_n). \tag{1.30}$$
Here we write $A_n \approx B_n$ to mean $A_n / B_n \stackrel{a.s.}{\to} 1$ as $n \to \infty$. Thus, it remains to show that the statement holds for the Poissonized model. That is, for a constant sequence $\{p_i\}$ and a constant $z > 0$, we need to show that
$$p_i \sim z\, i^{-1/\alpha} \ \text{ as } i \to \infty \quad \Longleftrightarrow \quad E K(t) \sim \Gamma(1 - \alpha)\, z^\alpha\, t^\alpha \ \text{ as } t \to \infty. \tag{1.31}$$
Now, $E(K(t)) = \sum_j (1 - e^{-t p_j})$, so (1.31) is a real-analysis statement (and thus can be proved by a variety of methods). Granted this result, de-Poissonizing, one proves the theorem.

Note that (1.29) is an equality, not just an equality in distribution. That is, $(P_i, i = 1, 2, \ldots)$ has a power law if and only if $(K_n, n = 1, 2, \ldots)$ has a power law, and their a.s. limits are, up to a constant, the same.

Using Poissonization, one also obtains a power law for the number of tables with $j$ people, for each fixed $j$.

Theorem ([?]). Fix $\alpha \in (0, 1)$, $\theta > -\alpha$. Run a CRP$(\alpha, \theta)$. Let $K_n$ denote the number of tables at step $n$, and $S_\alpha$ its $\alpha$-diversity. Let $K_{n,j}$ be the number of tables with $j$ people at step $n$. For each fixed $j \in \mathbb{N}$,
$$\lim_{n \to \infty} \frac{K_{n,j}}{n^\alpha} \stackrel{a.s.}{=} p_\alpha(j)\, S_\alpha, \tag{1.32}$$
where $\{p_\alpha(j), j \in \mathbb{N}\}$ is the discrete probability distribution defined by
$$p_\alpha(j) := (-1)^{j-1} \binom{\alpha}{j} = \frac{\alpha\, (1 - \alpha)_{j-1 \uparrow 1}}{j!},$$
and
$$\frac{K_{n,j}}{K_n} \stackrel{a.s.}{\to} p_\alpha(j) \ \text{ as } n \to \infty. \tag{1.33}$$
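The limiting table-size proportions in (1.33) can be checked by simulation. Below is a minimal sketch; it re-implements the $(\alpha, \theta)$ seating plan of Section 1.5.5, and the parameter values and run length are ours.

```python
import math
from collections import Counter
import numpy as np

rng = np.random.default_rng(7)

def crp_table_sizes(n, alpha, theta):
    """Table sizes of an (alpha, theta) Chinese Restaurant Process with n customers."""
    sizes = [1]
    for m in range(1, n):
        probs = np.array([s - alpha for s in sizes] + [theta + alpha * len(sizes)])
        j = rng.choice(len(probs), p=probs / (m + theta))
        if j == len(sizes):
            sizes.append(1)
        else:
            sizes[j] += 1
    return sizes

# Check (1.33): the proportion of tables with j customers approaches
# p_alpha(j) = alpha * (1-alpha)_{j-1, up 1} / j!  (no dependence on theta).
alpha, theta, n = 0.5, 1.0, 50000
sizes = crp_table_sizes(n, alpha, theta)
counts, k = Counter(sizes), len(sizes)
for j in range(1, 6):
    p_alpha_j = alpha * np.prod(1 - alpha + np.arange(j - 1)) / math.factorial(j)
    print(j, counts[j] / k, p_alpha_j)
```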

ON COMPOUND POISSON POPULATION MODELS

ON COMPOUND POISSON POPULATION MODELS ON COMPOUND POISSON POPULATION MODELS Martin Möhle, University of Tübingen (joint work with Thierry Huillet, Université de Cergy-Pontoise) Workshop on Probability, Population Genetics and Evolution Centre

More information

The strictly 1/2-stable example

The strictly 1/2-stable example The strictly 1/2-stable example 1 Direct approach: building a Lévy pure jump process on R Bert Fristedt provided key mathematical facts for this example. A pure jump Lévy process X is a Lévy process such

More information

Lecture 2. We now introduce some fundamental tools in martingale theory, which are useful in controlling the fluctuation of martingales.

Lecture 2. We now introduce some fundamental tools in martingale theory, which are useful in controlling the fluctuation of martingales. Lecture 2 1 Martingales We now introduce some fundamental tools in martingale theory, which are useful in controlling the fluctuation of martingales. 1.1 Doob s inequality We have the following maximal

More information

Introduction to self-similar growth-fragmentations

Introduction to self-similar growth-fragmentations Introduction to self-similar growth-fragmentations Quan Shi CIMAT, 11-15 December, 2017 Quan Shi Growth-Fragmentations CIMAT, 11-15 December, 2017 1 / 34 Literature Jean Bertoin, Compensated fragmentation

More information

The ubiquitous Ewens sampling formula

The ubiquitous Ewens sampling formula Submitted to Statistical Science The ubiquitous Ewens sampling formula Harry Crane Rutgers, the State University of New Jersey Abstract. Ewens s sampling formula exemplifies the harmony of mathematical

More information

Recent Progress in Coalescent Theory

Recent Progress in Coalescent Theory SOCIEDADE BRASILEIRA DE MATEMÁTICA ENSAIOS MATEMÁTICOS 200X, Volume XX, X XX arxiv:0909.3985v1 [math.pr] 22 Sep 2009 Recent Progress in Coalescent Theory Nathanaël Berestycki Abstract. Coalescent theory

More information

Modern Discrete Probability Branching processes

Modern Discrete Probability Branching processes Modern Discrete Probability IV - Branching processes Review Sébastien Roch UW Madison Mathematics November 15, 2014 1 Basic definitions 2 3 4 Galton-Watson branching processes I Definition A Galton-Watson

More information

De los ejercicios de abajo (sacados del libro de Georgii, Stochastics) se proponen los siguientes:

De los ejercicios de abajo (sacados del libro de Georgii, Stochastics) se proponen los siguientes: Probabilidades y Estadística (M) Práctica 7 2 cuatrimestre 2018 Cadenas de Markov De los ejercicios de abajo (sacados del libro de Georgii, Stochastics) se proponen los siguientes: 6,2, 6,3, 6,7, 6,8,

More information

Dynamics of the evolving Bolthausen-Sznitman coalescent. by Jason Schweinsberg University of California at San Diego.

Dynamics of the evolving Bolthausen-Sznitman coalescent. by Jason Schweinsberg University of California at San Diego. Dynamics of the evolving Bolthausen-Sznitman coalescent by Jason Schweinsberg University of California at San Diego Outline of Talk 1. The Moran model and Kingman s coalescent 2. The evolving Kingman s

More information

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3 Brownian Motion Contents 1 Definition 2 1.1 Brownian Motion................................. 2 1.2 Wiener measure.................................. 3 2 Construction 4 2.1 Gaussian process.................................

More information

Dependent hierarchical processes for multi armed bandits

Dependent hierarchical processes for multi armed bandits Dependent hierarchical processes for multi armed bandits Federico Camerlenghi University of Bologna, BIDSA & Collegio Carlo Alberto First Italian meeting on Probability and Mathematical Statistics, Torino

More information

n! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2

n! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2 Order statistics Ex. 4.1 (*. Let independent variables X 1,..., X n have U(0, 1 distribution. Show that for every x (0, 1, we have P ( X (1 < x 1 and P ( X (n > x 1 as n. Ex. 4.2 (**. By using induction

More information

Lecture 9 Classification of States

Lecture 9 Classification of States Lecture 9: Classification of States of 27 Course: M32K Intro to Stochastic Processes Term: Fall 204 Instructor: Gordan Zitkovic Lecture 9 Classification of States There will be a lot of definitions and

More information

Scaling limits for random trees and graphs

Scaling limits for random trees and graphs YEP VII Probability, random trees and algorithms 8th-12th March 2010 Scaling limits for random trees and graphs Christina Goldschmidt INTRODUCTION A taste of what s to come We start with perhaps the simplest

More information

The Λ-Fleming-Viot process and a connection with Wright-Fisher diffusion. Bob Griffiths University of Oxford

The Λ-Fleming-Viot process and a connection with Wright-Fisher diffusion. Bob Griffiths University of Oxford The Λ-Fleming-Viot process and a connection with Wright-Fisher diffusion Bob Griffiths University of Oxford A d-dimensional Λ-Fleming-Viot process {X(t)} t 0 representing frequencies of d types of individuals

More information

arxiv: v1 [math.pr] 29 Jul 2014

arxiv: v1 [math.pr] 29 Jul 2014 Sample genealogy and mutational patterns for critical branching populations Guillaume Achaz,2,3 Cécile Delaporte 3,4 Amaury Lambert 3,4 arxiv:47.772v [math.pr] 29 Jul 24 Abstract We study a universal object

More information

Lecture 4a: Continuous-Time Markov Chain Models

Lecture 4a: Continuous-Time Markov Chain Models Lecture 4a: Continuous-Time Markov Chain Models Continuous-time Markov chains are stochastic processes whose time is continuous, t [0, ), but the random variables are discrete. Prominent examples of continuous-time

More information

UPPER DEVIATIONS FOR SPLIT TIMES OF BRANCHING PROCESSES

UPPER DEVIATIONS FOR SPLIT TIMES OF BRANCHING PROCESSES Applied Probability Trust 7 May 22 UPPER DEVIATIONS FOR SPLIT TIMES OF BRANCHING PROCESSES HAMED AMINI, AND MARC LELARGE, ENS-INRIA Abstract Upper deviation results are obtained for the split time of a

More information

Sample Spaces, Random Variables

Sample Spaces, Random Variables Sample Spaces, Random Variables Moulinath Banerjee University of Michigan August 3, 22 Probabilities In talking about probabilities, the fundamental object is Ω, the sample space. (elements) in Ω are denoted

More information

Scribe Notes 11/30/07

Scribe Notes 11/30/07 Scribe Notes /3/7 Lauren Hannah reviewed salient points from Some Developments of the Blackwell-MacQueen Urn Scheme by Jim Pitman. The paper links existing ideas on Dirichlet Processes (Chinese Restaurant

More information

BRANCHING PROCESSES 1. GALTON-WATSON PROCESSES

BRANCHING PROCESSES 1. GALTON-WATSON PROCESSES BRANCHING PROCESSES 1. GALTON-WATSON PROCESSES Galton-Watson processes were introduced by Francis Galton in 1889 as a simple mathematical model for the propagation of family names. They were reinvented

More information

Evolution in a spatial continuum

Evolution in a spatial continuum Evolution in a spatial continuum Drift, draft and structure Alison Etheridge University of Oxford Joint work with Nick Barton (Edinburgh) and Tom Kurtz (Wisconsin) New York, Sept. 2007 p.1 Kingman s Coalescent

More information

PROBABILITY VITTORIA SILVESTRI

PROBABILITY VITTORIA SILVESTRI PROBABILITY VITTORIA SILVESTRI Contents Preface. Introduction 2 2. Combinatorial analysis 5 3. Stirling s formula 8 4. Properties of Probability measures Preface These lecture notes are for the course

More information

On trees with a given degree sequence

On trees with a given degree sequence On trees with a given degree sequence Osvaldo ANGTUNCIO-HERNÁNDEZ and Gerónimo URIBE BRAVO Instituto de Matemáticas Universidad Nacional Autónoma de México 3rd Bath-UNAM-CIMAT workshop 16-05-2016 Our discrete

More information

4 Sums of Independent Random Variables

4 Sums of Independent Random Variables 4 Sums of Independent Random Variables Standing Assumptions: Assume throughout this section that (,F,P) is a fixed probability space and that X 1, X 2, X 3,... are independent real-valued random variables

More information

Lecture Notes 7 Random Processes. Markov Processes Markov Chains. Random Processes

Lecture Notes 7 Random Processes. Markov Processes Markov Chains. Random Processes Lecture Notes 7 Random Processes Definition IID Processes Bernoulli Process Binomial Counting Process Interarrival Time Process Markov Processes Markov Chains Classification of States Steady State Probabilities

More information

Shifting processes with cyclically exchangeable increments at random

Shifting processes with cyclically exchangeable increments at random Shifting processes with cyclically exchangeable increments at random Gerónimo URIBE BRAVO (Collaboration with Loïc CHAUMONT) Instituto de Matemáticas UNAM Universidad Nacional Autónoma de México www.matem.unam.mx/geronimo

More information

Figure 10.1: Recording when the event E occurs

Figure 10.1: Recording when the event E occurs 10 Poisson Processes Let T R be an interval. A family of random variables {X(t) ; t T} is called a continuous time stochastic process. We often consider T = [0, 1] and T = [0, ). As X(t) is a random variable

More information

1. Let X and Y be independent exponential random variables with rate α. Find the densities of the random variables X 3, X Y, min(x, Y 3 )

1. Let X and Y be independent exponential random variables with rate α. Find the densities of the random variables X 3, X Y, min(x, Y 3 ) 1 Introduction These problems are meant to be practice problems for you to see if you have understood the material reasonably well. They are neither exhaustive (e.g. Diffusions, continuous time branching

More information

Bayesian Nonparametric Models

Bayesian Nonparametric Models Bayesian Nonparametric Models David M. Blei Columbia University December 15, 2015 Introduction We have been looking at models that posit latent structure in high dimensional data. We use the posterior

More information

Yaglom-type limit theorems for branching Brownian motion with absorption. by Jason Schweinsberg University of California San Diego

Yaglom-type limit theorems for branching Brownian motion with absorption. by Jason Schweinsberg University of California San Diego Yaglom-type limit theorems for branching Brownian motion with absorption by Jason Schweinsberg University of California San Diego (with Julien Berestycki, Nathanaël Berestycki, Pascal Maillard) Outline

More information

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R.

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R. Ergodic Theorems Samy Tindel Purdue University Probability Theory 2 - MA 539 Taken from Probability: Theory and examples by R. Durrett Samy T. Ergodic theorems Probability Theory 1 / 92 Outline 1 Definitions

More information

Exponential Distribution and Poisson Process

Exponential Distribution and Poisson Process Exponential Distribution and Poisson Process Stochastic Processes - Lecture Notes Fatih Cavdur to accompany Introduction to Probability Models by Sheldon M. Ross Fall 215 Outline Introduction Exponential

More information

Statistics 253/317 Introduction to Probability Models. Winter Midterm Exam Monday, Feb 10, 2014

Statistics 253/317 Introduction to Probability Models. Winter Midterm Exam Monday, Feb 10, 2014 Statistics 253/317 Introduction to Probability Models Winter 2014 - Midterm Exam Monday, Feb 10, 2014 Student Name (print): (a) Do not sit directly next to another student. (b) This is a closed-book, closed-note

More information

Lecture 19 : Chinese restaurant process

Lecture 19 : Chinese restaurant process Lecture 9 : Chinese restaurant process MATH285K - Spring 200 Lecturer: Sebastien Roch References: [Dur08, Chapter 3] Previous class Recall Ewens sampling formula (ESF) THM 9 (Ewens sampling formula) Letting

More information

Non-Parametric Bayes

Non-Parametric Bayes Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian

More information

Recap. Probability, stochastic processes, Markov chains. ELEC-C7210 Modeling and analysis of communication networks

Recap. Probability, stochastic processes, Markov chains. ELEC-C7210 Modeling and analysis of communication networks Recap Probability, stochastic processes, Markov chains ELEC-C7210 Modeling and analysis of communication networks 1 Recap: Probability theory important distributions Discrete distributions Geometric distribution

More information

Tree sets. Reinhard Diestel

Tree sets. Reinhard Diestel 1 Tree sets Reinhard Diestel Abstract We study an abstract notion of tree structure which generalizes treedecompositions of graphs and matroids. Unlike tree-decompositions, which are too closely linked

More information

Building Infinite Processes from Finite-Dimensional Distributions

Building Infinite Processes from Finite-Dimensional Distributions Chapter 2 Building Infinite Processes from Finite-Dimensional Distributions Section 2.1 introduces the finite-dimensional distributions of a stochastic process, and shows how they determine its infinite-dimensional

More information

n! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2

n! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2 Order statistics Ex. 4. (*. Let independent variables X,..., X n have U(0, distribution. Show that for every x (0,, we have P ( X ( < x and P ( X (n > x as n. Ex. 4.2 (**. By using induction or otherwise,

More information

Stochastic Processes, Kernel Regression, Infinite Mixture Models

Stochastic Processes, Kernel Regression, Infinite Mixture Models Stochastic Processes, Kernel Regression, Infinite Mixture Models Gabriel Huang (TA for Simon Lacoste-Julien) IFT 6269 : Probabilistic Graphical Models - Fall 2018 Stochastic Process = Random Function 2

More information

Homework 6: Solutions Sid Banerjee Problem 1: (The Flajolet-Martin Counter) ORIE 4520: Stochastics at Scale Fall 2015

Homework 6: Solutions Sid Banerjee Problem 1: (The Flajolet-Martin Counter) ORIE 4520: Stochastics at Scale Fall 2015 Problem 1: (The Flajolet-Martin Counter) In class (and in the prelim!), we looked at an idealized algorithm for finding the number of distinct elements in a stream, where we sampled uniform random variables

More information

FRINGE TREES, CRUMP MODE JAGERS BRANCHING PROCESSES AND m-ary SEARCH TREES

FRINGE TREES, CRUMP MODE JAGERS BRANCHING PROCESSES AND m-ary SEARCH TREES FRINGE TREES, CRUMP MODE JAGERS BRANCHING PROCESSES AND m-ary SEARCH TREES CECILIA HOLMGREN AND SVANTE JANSON Abstract. This survey studies asymptotics of random fringe trees and extended fringe trees

More information

P (A G) dp G P (A G)

P (A G) dp G P (A G) First homework assignment. Due at 12:15 on 22 September 2016. Homework 1. We roll two dices. X is the result of one of them and Z the sum of the results. Find E [X Z. Homework 2. Let X be a r.v.. Assume

More information

Lecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past.

Lecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past. 1 Markov chain: definition Lecture 5 Definition 1.1 Markov chain] A sequence of random variables (X n ) n 0 taking values in a measurable state space (S, S) is called a (discrete time) Markov chain, if

More information

PROBABILITY. Contents Preface 1 1. Introduction 2 2. Combinatorial analysis 5 3. Stirling s formula 8. Preface

PROBABILITY. Contents Preface 1 1. Introduction 2 2. Combinatorial analysis 5 3. Stirling s formula 8. Preface PROBABILITY VITTORIA SILVESTRI Contents Preface. Introduction. Combinatorial analysis 5 3. Stirling s formula 8 Preface These lecture notes are for the course Probability IA, given in Lent 09 at the University

More information

Lecture 9. d N(0, 1). Now we fix n and think of a SRW on [0,1]. We take the k th step at time k n. and our increments are ± 1

Lecture 9. d N(0, 1). Now we fix n and think of a SRW on [0,1]. We take the k th step at time k n. and our increments are ± 1 Random Walks and Brownian Motion Tel Aviv University Spring 011 Lecture date: May 0, 011 Lecture 9 Instructor: Ron Peled Scribe: Jonathan Hermon In today s lecture we present the Brownian motion (BM).

More information

Markov Chains CK eqns Classes Hitting times Rec./trans. Strong Markov Stat. distr. Reversibility * Markov Chains

Markov Chains CK eqns Classes Hitting times Rec./trans. Strong Markov Stat. distr. Reversibility * Markov Chains Markov Chains A random process X is a family {X t : t T } of random variables indexed by some set T. When T = {0, 1, 2,... } one speaks about a discrete-time process, for T = R or T = [0, ) one has a continuous-time

More information

Asymptotic Statistics-III. Changliang Zou

Asymptotic Statistics-III. Changliang Zou Asymptotic Statistics-III Changliang Zou The multivariate central limit theorem Theorem (Multivariate CLT for iid case) Let X i be iid random p-vectors with mean µ and and covariance matrix Σ. Then n (

More information

Department of Statistics. University of California. Berkeley, CA May 1998

Department of Statistics. University of California. Berkeley, CA May 1998 Prediction rules for exchangeable sequences related to species sampling 1 by Ben Hansen and Jim Pitman Technical Report No. 520 Department of Statistics University of California 367 Evans Hall # 3860 Berkeley,

More information

Random trees and applications

Random trees and applications Probability Surveys Vol. 2 25) 245 311 ISSN: 1549-5787 DOI: 1.1214/15495785114 Random trees and applications Jean-François Le Gall DMA-ENS, 45 rue d Ulm, 755 PARIS, FRANCE e-mail: legall@dma.ens.fr Abstract:

More information

Optional Stopping Theorem Let X be a martingale and T be a stopping time such

Optional Stopping Theorem Let X be a martingale and T be a stopping time such Plan Counting, Renewal, and Point Processes 0. Finish FDR Example 1. The Basic Renewal Process 2. The Poisson Process Revisited 3. Variants and Extensions 4. Point Processes Reading: G&S: 7.1 7.3, 7.10

More information

the convolution of f and g) given by

the convolution of f and g) given by 09:53 /5/2000 TOPIC Characteristic functions, cont d This lecture develops an inversion formula for recovering the density of a smooth random variable X from its characteristic function, and uses that

More information

Selected Exercises on Expectations and Some Probability Inequalities

Selected Exercises on Expectations and Some Probability Inequalities Selected Exercises on Expectations and Some Probability Inequalities # If E(X 2 ) = and E X a > 0, then P( X λa) ( λ) 2 a 2 for 0 < λ

More information

ELEMENTS OF PROBABILITY THEORY

ELEMENTS OF PROBABILITY THEORY ELEMENTS OF PROBABILITY THEORY Elements of Probability Theory A collection of subsets of a set Ω is called a σ algebra if it contains Ω and is closed under the operations of taking complements and countable

More information

Itô s excursion theory and random trees

Itô s excursion theory and random trees Itô s excursion theory and random trees Jean-François Le Gall January 3, 200 Abstract We explain how Itô s excursion theory can be used to understand the asymptotic behavior of large random trees. We provide

More information

4 Branching Processes

4 Branching Processes 4 Branching Processes Organise by generations: Discrete time. If P(no offspring) 0 there is a probability that the process will die out. Let X = number of offspring of an individual p(x) = P(X = x) = offspring

More information

Dirichlet Process. Yee Whye Teh, University College London

Dirichlet Process. Yee Whye Teh, University College London Dirichlet Process Yee Whye Teh, University College London Related keywords: Bayesian nonparametrics, stochastic processes, clustering, infinite mixture model, Blackwell-MacQueen urn scheme, Chinese restaurant

More information

Lecture Characterization of Infinitely Divisible Distributions

Lecture Characterization of Infinitely Divisible Distributions Lecture 10 1 Characterization of Infinitely Divisible Distributions We have shown that a distribution µ is infinitely divisible if and only if it is the weak limit of S n := X n,1 + + X n,n for a uniformly

More information

Markov processes and queueing networks

Markov processes and queueing networks Inria September 22, 2015 Outline Poisson processes Markov jump processes Some queueing networks The Poisson distribution (Siméon-Denis Poisson, 1781-1840) { } e λ λ n n! As prevalent as Gaussian distribution

More information

Section 27. The Central Limit Theorem. Po-Ning Chen, Professor. Institute of Communications Engineering. National Chiao Tung University

Section 27. The Central Limit Theorem. Po-Ning Chen, Professor. Institute of Communications Engineering. National Chiao Tung University Section 27 The Central Limit Theorem Po-Ning Chen, Professor Institute of Communications Engineering National Chiao Tung University Hsin Chu, Taiwan 3000, R.O.C. Identically distributed summands 27- Central

More information

9 Brownian Motion: Construction

9 Brownian Motion: Construction 9 Brownian Motion: Construction 9.1 Definition and Heuristics The central limit theorem states that the standard Gaussian distribution arises as the weak limit of the rescaled partial sums S n / p n of

More information

(b) What is the variance of the time until the second customer arrives, starting empty, assuming that we measure time in minutes?

(b) What is the variance of the time until the second customer arrives, starting empty, assuming that we measure time in minutes? IEOR 3106: Introduction to Operations Research: Stochastic Models Fall 2006, Professor Whitt SOLUTIONS to Final Exam Chapters 4-7 and 10 in Ross, Tuesday, December 19, 4:10pm-7:00pm Open Book: but only

More information

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015 Part IA Probability Theorems Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

More information

Infinitely iterated Brownian motion

Infinitely iterated Brownian motion Mathematics department Uppsala University (Joint work with Nicolas Curien) This talk was given in June 2013, at the Mittag-Leffler Institute in Stockholm, as part of the Symposium in honour of Olav Kallenberg

More information

Multivariate random variables

Multivariate random variables DS-GA 002 Lecture notes 3 Fall 206 Introduction Multivariate random variables Probabilistic models usually include multiple uncertain numerical quantities. In this section we develop tools to characterize

More information

6 Markov Chain Monte Carlo (MCMC)

6 Markov Chain Monte Carlo (MCMC) 6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution

More information

Cutting edges at random in large recursive trees

Cutting edges at random in large recursive trees Cutting edges at random in large recursive trees arxiv:1406.2238v1 [math.pr] 9 Jun 2014 Erich Baur and Jean Bertoin ENS Lyon and Universität Zürich February 21, 2018 Abstract We comment on old and new

More information

2.1 Elementary probability; random sampling

2.1 Elementary probability; random sampling Chapter 2 Probability Theory Chapter 2 outlines the probability theory necessary to understand this text. It is meant as a refresher for students who need review and as a reference for concepts and theorems

More information

Lecture 20: Reversible Processes and Queues

Lecture 20: Reversible Processes and Queues Lecture 20: Reversible Processes and Queues 1 Examples of reversible processes 11 Birth-death processes We define two non-negative sequences birth and death rates denoted by {λ n : n N 0 } and {µ n : n

More information