The Moran Process as a Markov Chain on Leaf-labeled Trees

David J. Aldous
University of California, Department of Statistics
367 Evans Hall # 3860, Berkeley CA 94720-3860
aldous@stat.berkeley.edu
http://www.stat.berkeley.edu/users/aldous

March 29, 1999

Abstract. The Moran process in population genetics may be reinterpreted as a Markov chain on a set of trees with leaves labeled by [n]. It provides an interesting worked example in the theory of mixing times and coupling from the past for Markov chains. Mixing times are shown to be of order n^2, as anticipated by the theory surrounding Kingman's coalescent.

Incomplete draft -- do not circulate

AMS 1991 subject classification: 05C05, 60C05, 60J10
Key words and phrases: coupling from the past, Markov chain, mixing time, phylogenetic tree.
Research supported by N.S.F. Grant DMS96-22859.

1 Introduction

The study of mixing times for Markov chains on combinatorial sets has attracted considerable interest over the last ten years [3, 5, 7, 14, 16, 18]. This paper provides another worked example. We must at the outset admit that the mathematics is fairly straightforward, but we do find the example instructive. Its analysis provides a simple but not quite obvious illustration of coupling from the past, reminiscent of the elementary analysis ([1] section 4) of riffle shuffle, and of the analysis of move-to-front list algorithms [12] and move-to-root algorithms for maintaining search trees [9]. The main result, Proposition 2, implies that while most combinatorial chains exhibit the cut-off phenomenon [8], this particular example has the opposite, diffusive, behavior. Our precise motivation for studying this model was as a simpler version of certain Markov chains on phylogenetic trees: see section 3(b) for further discussion. The model also fits a general framework studied in [4]: see section 3(a).

1.1 The Moran chain

The Moran model ([11] section 3.3) in population genetics models a population of constant size n. At each step, one randomly-chosen individual is killed and another randomly-chosen individual gives birth to a child. The feature of interest is the genealogy of the individuals alive at a given time, that is, how they are related to each other by descent. In population genetics these individuals are in fact genes, and there is also mutation and selection structure, but our interest goes in a different direction. There is some flexibility in how much information we choose to record in the genealogy of the current population, and we will make the choice that seems simplest from the combinatorial viewpoint. Label the individuals as [n] := {1, 2, ..., n}, and declare that if individual k is killed then the new child is given label k. The left diagram in figure 1 shows a possible genealogy, in which we keep track of the order of the times at which all the splits in all the lines of descent occurred, but not the absolute times of splits.
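In code, one step of this labeled Moran model is just a uniform ordered pair of distinct labels: the second individual is killed and its label is reused for the first individual's child. A minimal sketch (the function names are ours, for illustration only):

```python
import random

def moran_step(n, rng):
    # One step of the Moran model on labels {1, ..., n}: a uniform
    # ordered pair (j, k) of distinct individuals is chosen, individual
    # k is killed, individual j gives birth, and the new child is
    # given the freed label k.
    j, k = rng.sample(range(1, n + 1), 2)
    return (j, k)

def run_moran(n, steps, seed=0):
    # Record the sequence of (parent, replaced-label) events; this
    # sequence determines how the genealogy of the population evolves.
    rng = random.Random(seed)
    return [moran_step(n, rng) for _ in range(steps)]
```

The event sequence returned by `run_moran` is exactly the randomness driving the tree-valued chain defined below.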

Figure 1. A transition t → t' = f_7(t, 4, 7) in the Moran chain. [Diagrams not reproduced: left, a tree t with levels 0 through 7; center, t with leaf 7 deleted; right, the tree t' with leaf 7 reinserted to the right of leaf 4.]

Precisely, the left diagram shows a tree t with leaf-set [n] and height n, where at each level one downward edge splits into two downward edges, and where we distinguish between left and right branches. Such a tree has n(n+1)/2 edges of unit length. Write T_n for the set of such trees. The cardinality of this set is

#T_n = n! (n-1)!   (1)

We leave to the reader the (very easy) task of giving a bijective proof of (1); an inductive proof will fall out of the proof of Lemma 1 below. Interpreting the Moran model as a T_n-valued process gives a Markov chain on T_n which we call the Moran chain. Here is a careful definition. Take a tree t in T_n and a distinct ordered pair (j, k) from [n]. Delete leaf k from t, then insert leaf k into the edge incident at leaf j, placing it to the right of leaf j. This gives a new tree

t' = f_n(t, j, k).   (2)

Such a transition is illustrated in figure 1. Starting from the tree t in the left diagram, leaf 7 is deleted and the levels adjusted where necessary, to give the center diagram; then leaf 7 is inserted to the right of leaf 4, and levels adjusted, to give the tree t' in the right diagram. The Moran chain is now defined to be the chain with transition probabilities

p(t, t') = P(f_n(t, J, K) = t')

where (J, K) is a uniform random distinct ordered pair from [n]. It is easy to check this defines an aperiodic irreducible chain, which therefore has a limiting stationary distribution. Our choice of T_n as the precise state-space was motivated by

Lemma 1. The stationary distribution of the Moran chain is the uniform distribution on T_n.

Proof. From a tree t there are n(n-1) equally likely choices of (j, k) which define the possible transitions t → t'. To prove the lemma we need to show the chain is doubly stochastic, i.e. that for each tree t' there are n(n-1) such choices which get to t' from some starting trees t. For a tree t' (illustrated by the right diagram in figure 1) there is only one leaf k (leaf 7, in figure 1) which might have been inserted last, into a diagram like the tree in the center diagram. The trees t such that deleting leaf k from t gives that center tree are exactly the trees obtainable by attaching leaf k to any of the (n-1)n/2 edges of the center tree, to the right or the left of the existing edge, giving a total of (n-1)n/2 × 2 = n(n-1) choices, as required.

Remark. The final part of the argument says that the general element of T_n may be constructed from a tree in T_{n-1} by attaching leaf n to one of the (n-1)n/2 edges, to the right or the left of the existing edge. So #T_n = n(n-1) #T_{n-1}, establishing (1) by induction.

1.2 Mixing time for the Moran chain

Write (X_n(m), m = 0, 1, 2, ...) for the Moran chain on T_n. One of the standard ways of studying mixing is via the maximal variation distance

d_n(m) := max_t (1/2) Σ_{t'} | P(X_n(m) = t' | X_n(0) = t) − π_n(t') |   (3)

where π_n(t') = 1/#T_n is the stationary probability. Our result is most easily expressed in terms of the following random variables. Let (ξ_i, 2 ≤ i < ∞) be independent, and let ξ_i have the exponential distribution with mean 1/(i(i-1)); then let

L = Σ_{i=2}^∞ ξ_i.   (4)

Proposition 2. (a) lim sup_n d_n(zn^2) ≤ P(L > z) < 1 for each 0 < z < ∞.
(b) lim inf_n d_n(zn^2) ≥ φ(z), for some φ(z) → 1 as z → 0.

Thus the mixing time (as measured by variation distance) for the Moran chain is of order n^2, but variation distance does not exhibit the cut-off phenomenon of [8] which usually occurs with Markov chains on combinatorial sets.
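Since ξ_i has mean 1/(i(i-1)) = 1/(i-1) − 1/i, the sum in (4) telescopes in expectation: E L = Σ_{i≥2} (1/(i-1) − 1/i) = 1. A quick Monte Carlo check (our own sketch, truncating the infinite sum at a finite i_max) agrees with this:

```python
import random

def sample_L(rng, i_max=500):
    # Truncated version of L = sum_{i>=2} xi_i, where xi_i is
    # exponential with mean 1/(i(i-1)); expovariate takes the rate,
    # so the rate i(i-1) gives mean 1/(i(i-1)).
    return sum(rng.expovariate(i * (i - 1)) for i in range(2, i_max + 1))

def estimate_mean_L(n_samples=20000, i_max=500, seed=1):
    # Sample mean of the truncated L; its exact expectation is
    # 1 - 1/i_max by the telescoping sum, close to 1.
    rng = random.Random(seed)
    return sum(sample_L(rng, i_max) for _ in range(n_samples)) / n_samples
```

With i_max = 500 the truncation error in the mean is only 1/500, so the estimate concentrates near 1.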
As briefly sketched in the next section, what's really going on is that d_n(zn^2) → d_∞(z)

where d_∞(·) is the maximal variation distance associated with a certain limit continuous-time, continuous-space Markov process. We might call this diffusive behavior, from the case of simple random walks on the k-dimensional integers modulo n, whose n → ∞ limit behavior is of this form, with the limit process being Brownian motion on [0, 1]^k.

1.3 Remarks on a limit process

We now outline why Proposition 2 is not unexpected. In the original Moran model for a population, one can look back from the present to determine the number of steps L_n back until the last common ancestor of the present population. It is standard (see section 2.3) that n^{-2} L_n converges in distribution to L. Loosely speaking, this implies that the genealogy after n^2 L steps cannot depend on the initial genealogy, and Proposition 2(a) is a formalization of that idea. A more elaborate picture is given by the theory surrounding Kingman's coalescent [13, 17], which portrays the rescaled n → ∞ limit of the genealogy of the current size-n population as a genealogy C of an infinite population. Informally, what's really going on is that

d_n(zn^2) → d_∞(z),  0 < z < ∞

where d_∞(·) is the maximal variation distance associated with a certain continuous-time Markov process (C_t, 0 ≤ t < ∞) whose state space is a set of possible genealogies for an infinite population. However, defining the process (C_t) precisely and proving mixing time bounds via this weak convergence methodology is technically hard. It turns out to be fairly simple to give an analysis of Proposition 2 directly in the discrete setting, by combining the standard analysis of L_n with a coupling construction, so that is what we shall do.

2 Proof of Proposition 2

2.1 Coupling from the past

The proof is based on a standard elementary idea. Suppose a Markov chain (X(s)) can be represented in the form X(s+1) = f(X(s), U_{s+1}) for some function f, where the (U_s) are independent with common distribution μ. Then

X(m) = g_m(X(0), U_1, U_2, ..., U_m)

where g_m(x, u_1, ..., u_m) := f(... f(f(x, u_1), u_2) ..., u_m). Define d(m) as at (3).

Lemma 3. d(m) ≤ 1 − P(A(m)), where A(m) is the event

g_m(x, U_{-m+1}, U_{-m+2}, ..., U_0) = g_m(x', U_{-m+1}, U_{-m+2}, ..., U_0) for all x, x'

where the (U_i) are independent with distribution μ.

This form of coupling has in recent years become known as coupling from the past, so named because we are constructing X(0) from X(-m) and the intervening U's, and on the event A(m) the current state X(0) does not depend on the state X(-m). Recent interest has focused on the connection between coupling from the past and perfect sampling: see [15].

2.2 The coupling construction

First note the Moran chain fits into the setup above by setting

X_n(s+1) = f_n(X_n(s), J_s, K_s)   (5)

for f_n at (2) and (J_s, K_s) independent uniform random ordered pairs from [n]. Given a tree t in T_n, say that f is a forest consistent with t if it can be obtained by deleting some (maybe empty) set of edges of t and then taking the spanning forest on the leaves [n]. The left diagram in figure 2 illustrates a forest f consistent with the left tree in figure 1. The transition t → t' = f_n(t, j, k) on trees extends in the natural way to a transition f → f' = f_n(f, j, k) on forests. Figure 2 shows a transition consistent with the transition in figure 1.

Figure 2. A transition f → f' = f_7(f, 4, 7). [Diagrams not reproduced.]
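The coalescence event of Lemma 3 can be checked mechanically: drive every starting state forward with one shared sequence of innovations and see whether all trajectories merge. A toy illustration (the update rule here is a reflecting walk of our own choosing, not the Moran chain; the forest-valued chain below plays the analogous role for the Moran chain):

```python
import random

def coalesced(step, states, us):
    # Check the event of Lemma 3: drive every starting state x with the
    # same innovation sequence u_1, ..., u_m and report whether all
    # trajectories end in one common state.
    finals = set()
    for x in states:
        for u in us:
            x = step(x, u)
        finals.add(x)
    return len(finals) == 1

def reflecting_step(x, u):
    # Toy update rule: a walk on {0, ..., 4}, reflected at both ends,
    # with every copy driven by the same uniform coin u.
    return min(4, x + 1) if u < 0.5 else max(0, x - 1)

rng = random.Random(3)
us = [rng.random() for _ in range(500)]
all_merged = coalesced(reflecting_step, range(5), us)
```

Because this toy coupling is monotone, the five trajectories merge as soon as the extreme copies meet, which happens within 500 shared steps with overwhelming probability.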

Fix m. Consider the Moran chain (X_n(s), −m ≤ s ≤ 0) defined by (5) with some initial X_n(−m), and consider the forest-valued chain

Y_n(s+1) = f_n(Y_n(s), J_s, K_s)

where Y_n(−m) is the forest on [n] with no edges. It is easy to check that, for each realization of the joint process (X_n(s), Y_n(s)), we have that Y_n(s) is consistent with X_n(s). But Y_n(s) does not depend on X_n(−m), and if Y_n(s) is a single tree then Y_n(s) = X_n(s). Thus on the event

A_n(m) := {Y_n(0) is a single tree}

we have that X_n(0) does not depend on X_n(−m). So by Lemma 3,

d_n(m) ≤ 1 − P(A_n(m))   (6)

and we want to estimate the right side.

2.3 Analyzing the coupling

For a forest f on [n] write #f for the multi-set of the numbers of leaves in the tree-components. So #Y_n(−m) = {1, 1, 1, ..., 1} and A_n(m) is the event that #Y_n(0) = {n}. Now reconsider the original Moran model from the start of section 1.1, and run this model for time s = −m, −m+1, ..., 0. Give a different color to each individual in the initial time −m population, and then let children inherit the color of their parent. Write (C_n(s), −m ≤ s ≤ 0) for the multi-set of the numbers of individuals of each color at time s. It is easy to check these two processes are the same:

Lemma 4. The process (C_n(s), −m ≤ s ≤ 0) has the same distribution as the process (#Y_n(s), −m ≤ s ≤ 0).

But analyzing C_n(·) is a classical and elementary topic in population genetics (e.g. [6]), which we repeat for completeness. Define (N_n(s), 0 ≤ s ≤ m) as the number of individuals at time −s who have descendants at time 0. Then N_n(·) is the Markov chain on state-space [n] with N_n(0) = n and

P(N_n(s+1) = j−1 | N_n(s) = j) = j(j−1)/(n(n−1))
P(N_n(s+1) = j | N_n(s) = j) = 1 − j(j−1)/(n(n−1)).

This holds because in the Moran model in reversed time, at each step two different individuals are chosen at random to have the same parent, and

the other parents are distinct. Now the chain N_n(s) can be defined for 0 ≤ s < ∞. Write L_n = min{s : N_n(s) = 1}. The event {L_n ≤ m} is the event that all the individuals in the Moran process at time 0 are descendants of some one individual at time −m, and hence have the same color. So

P(A_n(m)) = P(#Y_n(0) = {n}) = P(C_n(0) = {n}) = P(L_n ≤ m).

So by (6),

d_n(m) ≤ P(L_n > m).

On the other hand we can represent L_n as a sum of independent random variables

L_n = Σ_{i=2}^n ξ_{n,i}

where ξ_{n,i} has the geometric distribution with mean n(n−1)/(i(i−1)). It easily follows that P(L_n > n^2 z) → P(L > z) for L defined at (4). This establishes part (a) of Proposition 2.

Part (b) is easy. For t in T_n write r(t) for the number of leaves in the right-hand branch of t from the top-level split (so r(t) = 4 for the tree in the left of figure 1). Write R_n(m) for the process r(X_n(m)), stopped upon reaching value 1 or n−1. Then R_n(·) is a certain Markov chain which jumps by at most 1 each time, and which is a martingale, so

E(R_n(m) − R_n(0))^2 ≤ m.

It follows that, for initial trees t_n satisfying n^{-1} r(t_n) → r(0) ∈ (0, 1), if m_n = o(n^2) then n^{-1} r(X_n(m_n)) → r(0) in probability. Part (b) follows easily.

3 Final remarks

(a) As with the analogous chains (riffle shuffle, move-to-front, move-to-root) mentioned in the Introduction, it seems plausible that one can do a more refined analysis of the Moran chain which exhibits all the eigenvalues; in another direction, one might be able to handle non-uniform distributions

on leaf-pairs. Indeed, Brown [4] section 6.3 observes that the non-uniform Moran chain fits into the general setting of random walks on left-regular bands (a type of semigroup), without analyzing this particular chain.

(b) One can alternatively define the state space of the Moran chain to be the (smaller) set T̃_n of n-leaf cladograms. A cladogram is also a binary tree with leaf-set [n], but now we do not distinguish between left and right branches, and we do not count the overall order of splits. The Moran chain on T̃_n has a certain non-uniform stationary distribution, but the conclusion of Proposition 2 remains true. Cladograms are used in biological systematics [10] to represent evolutionary relationships between species. Markov chain Monte Carlo methods to infer the true cladogram from data start with some base chain on cladograms, providing some remote motivation for studying their mixing times. A related chain designed to have uniform stationary distribution on T̃_n is studied in [2]; a coupling argument is used to bound its mixing time as order n^3, though we expect the correct mixing time is again order n^2. The proof in [2] uses a more intricate coupling argument of the same style as in this paper, but the absence of an analog of Lemma 4 makes its analysis rather harder.

References

[1] D.J. Aldous and P. Diaconis. Shuffling cards and stopping times. Amer. Math. Monthly, 93:333-348, 1986.
[2] D.J. Aldous and P. Diaconis. Longest increasing subsequences: from patience sorting to the Baik-Deift-Johansson theorem. Bull. Amer. Math. Soc., 36:413-432, 1999.
[3] D.J. Aldous and J.A. Fill. Reversible Markov Chains and Random Walks on Graphs. Book in preparation, 2001.
[4] K. Brown. Semigroups, rings and Markov chains. To appear in J. Theoretical Probability, 1999.
[5] F.R.K. Chung, R.L. Graham, and S.-T. Yau. On sampling with Markov chains. Random Struct. Alg., 9:55-77, 1996.
[6] P. Clifford and A. Sudbury. Looking backwards in time in the Moran model in population genetics. J. Appl. Probab., 22:437-442, 1985.
[7] P. Diaconis. Group Representations in Probability and Statistics. Institute of Mathematical Statistics, Hayward CA, 1988.

[8] P. Diaconis. The cut-off phenomenon in finite Markov chains. Proc. Nat. Acad. Sci. USA, 93:1659-1664, 1996.
[9] R.P. Dobrow and J.A. Fill. Rates of convergence for the move-to-root Markov chain for binary search trees. Ann. Appl. Probab., 5:20-36, 1995.
[10] N. Eldredge and J. Cracraft. Phylogenetic Patterns and the Evolutionary Process. Columbia University Press, New York, 1980.
[11] W.J. Ewens. Mathematical Population Genetics, volume 9 of Biomathematics. Springer-Verlag, Berlin, 1979.
[12] J.A. Fill. An exact formula for the move-to-front rule for self-organizing lists. J. Theoretical Probab., 9:113-160, 1996.
[13] J.F.C. Kingman. The coalescent. Stochastic Process. Appl., 13:235-248, 1982.
[14] L. Lovász and P. Winkler. Mixing times. In D. Aldous and J. Propp, editors, Microsurveys in Discrete Probability, number 41 in DIMACS Ser. Discrete Math. Theoret. Comp. Sci., pages 85-134, 1998.
[15] J. Propp and D. Wilson. Coupling from the past: a user's guide. In D. Aldous and J. Propp, editors, Microsurveys in Discrete Probability, number 41 in DIMACS Ser. Discrete Math. Theoret. Comp. Sci., pages 181-192, 1998.
[16] A.J. Sinclair. Algorithms for Random Generation and Counting. Birkhäuser, 1993.
[17] S. Tavaré. Line-of-descent and genealogical processes and their applications in population genetics models. Theoret. Population Biol., 26:119-164, 1984.
[18] U. Vazirani. Rapidly mixing Markov chains. In B. Bollobás, editor, Probabilistic Combinatorics and Its Applications, volume 44 of Proc. Symp. Applied Math., pages 99-122. American Math. Soc., 1991.