The Moran Process as a Markov Chain on Leaf-labeled Trees

David J. Aldous
University of California, Department of Statistics
367 Evans Hall # 3860, Berkeley CA 94720-3860
aldous@stat.berkeley.edu
http://www.stat.berkeley.edu/users/aldous

March 29, 1999

Abstract. The Moran process in population genetics may be reinterpreted as a Markov chain on a set of trees with leaves labeled by [n]. It provides an interesting worked example in the theory of mixing times and coupling from the past for Markov chains. Mixing times are shown to be of order n^2, as anticipated by the theory surrounding Kingman's coalescent.

Incomplete draft -- do not circulate

AMS 1991 subject classification: 05C05, 60C05, 60J10
Key words and phrases: coupling from the past, Markov chain, mixing time, phylogenetic tree.
Research supported by N.S.F. Grant DMS96-22859.

1 Introduction

The study of mixing times for Markov chains on combinatorial sets has attracted considerable interest over the last ten years [3, 5, 7, 14, 16, 18]. This paper provides another worked example. We must at the outset admit that the mathematics is fairly straightforward, but we do find the example instructive. Its analysis provides a simple but not quite obvious illustration of coupling from the past, reminiscent of the elementary analysis ([1] section 4) of riffle shuffle, and of the analysis of move-to-front list algorithms [12] and move-to-root algorithms for maintaining search trees [9]. The main result, Proposition 2, implies that while most combinatorial chains exhibit the cut-off phenomenon [8], this particular example has the opposite, diffusive, behavior. Our precise motivation for studying this model was as a simpler version of certain Markov chains on phylogenetic trees: see section 3(b) for further discussion. The model also fits a general framework studied in [4]: see section 3(a).

1.1 The Moran chain

The Moran model ([11] section 3.3) in population genetics models a population of constant size n. At each step, one randomly-chosen individual is killed and another randomly-chosen individual gives birth to a child. The feature of interest is the genealogy of the individuals alive at a given time, that is, how they are related to each other by descent. In population genetics these individuals are in fact genes, and there is also mutation and selection structure, but our interest goes in a different direction. There is some flexibility in how much information we choose to record in the genealogy of the current population, and we will make the choice that seems simplest from the combinatorial viewpoint. Label the individuals as [n] := {1, 2, ..., n}, and declare that if individual k is killed then the new child is given label k. The left diagram in figure 1 shows a possible genealogy, in which we keep track of the order of the times at which all the splits in all the lines of descent occurred, but not the absolute times of splits.
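In code, one step of this labeled Moran model is just a uniform ordered pair of distinct labels: the second individual is killed and its label is reused for the first individual's child. A minimal sketch (the function names are ours, for illustration only):

```python
import random

def moran_step(n, rng):
    # One step of the Moran model on labels {1, ..., n}: a uniform
    # ordered pair (j, k) of distinct individuals is chosen, individual
    # k is killed, individual j gives birth, and the new child is
    # given the freed label k.
    j, k = rng.sample(range(1, n + 1), 2)
    return (j, k)

def run_moran(n, steps, seed=0):
    # Record the sequence of (parent, replaced-label) events; this
    # sequence determines how the genealogy of the population evolves.
    rng = random.Random(seed)
    return [moran_step(n, rng) for _ in range(steps)]
```

The event sequence returned by `run_moran` is exactly the randomness driving the tree-valued chain defined below.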

Figure 1. A transition t → t' = f_7(t, 4, 7) in the Moran chain. [Diagrams not reproduced: left, a tree t with levels 0 through 7; center, t with leaf 7 deleted; right, the tree t' with leaf 7 reinserted to the right of leaf 4.]

Precisely, the left diagram shows a tree t with leaf-set [n] and height n, where at each level one downward edge splits into two downward edges, and where we distinguish between left and right branches. Such a tree has n(n+1)/2 edges of unit length. Write T_n for the set of such trees. The cardinality of this set is

#T_n = n! (n-1)!   (1)

We leave to the reader the (very easy) task of giving a bijective proof of (1); an inductive proof will fall out of the proof of Lemma 1 below. Interpreting the Moran model as a T_n-valued process gives a Markov chain on T_n which we call the Moran chain. Here is a careful definition. Take a tree t in T_n and a distinct ordered pair (j, k) from [n]. Delete leaf k from t, then insert leaf k into the edge incident at leaf j, placing it to the right of leaf j. This gives a new tree

t' = f_n(t, j, k).   (2)

Such a transition is illustrated in figure 1. Starting from the tree t in the left diagram, leaf 7 is deleted and the levels adjusted where necessary, to give the center diagram; then leaf 7 is inserted to the right of leaf 4, and levels adjusted, to give the tree t' in the right diagram. The Moran chain is now defined to be the chain with transition probabilities

p(t, t') = P(f_n(t, J, K) = t')

where (J, K) is a uniform random distinct ordered pair from [n]. It is easy to check this defines an aperiodic irreducible chain, which therefore has a limiting stationary distribution. Our choice of T_n as the precise state-space was motivated by

Lemma 1. The stationary distribution of the Moran chain is the uniform distribution on T_n.

Proof. From a tree t there are n(n-1) equally likely choices of (j, k) which define the possible transitions t → t'. To prove the lemma we need to show the chain is doubly stochastic, i.e. that for each tree t' there are n(n-1) such choices which get to t' from some starting trees t. For a tree t' (illustrated by the right diagram in figure 1) there is only one leaf k (leaf 7, in figure 1) which might have been inserted last, into a diagram like the tree in the center diagram. The trees t such that deleting leaf k from t gives that center tree are exactly the trees obtainable by attaching leaf k to any of the (n-1)n/2 edges of the center tree, to the right or the left of the existing edge, giving a total of (n-1)n/2 × 2 = n(n-1) choices, as required.

Remark. The final part of the argument says that the general element of T_n may be constructed from a tree in T_{n-1} by attaching leaf n to one of the (n-1)n/2 edges, to the right or the left of the existing edge. So #T_n = n(n-1) #T_{n-1}, establishing (1) by induction.

1.2 Mixing time for the Moran chain

Write (X_n(m), m = 0, 1, 2, ...) for the Moran chain on T_n. One of the standard ways of studying mixing is via the maximal variation distance

d_n(m) := max_t (1/2) Σ_{t'} | P(X_n(m) = t' | X_n(0) = t) − π_n(t') |   (3)

where π_n(t') = 1/#T_n is the stationary probability. Our result is most easily expressed in terms of the following random variables. Let (ξ_i, 2 ≤ i < ∞) be independent, and let ξ_i have the exponential distribution with mean 1/(i(i-1)); then let

L = Σ_{i=2}^∞ ξ_i.   (4)

Proposition 2. (a) lim sup_n d_n(zn^2) ≤ P(L > z) < 1 for each 0 < z < ∞.
(b) lim inf_n d_n(zn^2) ≥ φ(z), for some φ(z) → 1 as z → 0.

Thus the mixing time (as measured by variation distance) for the Moran chain is of order n^2, but variation distance does not exhibit the cut-off phenomenon of [8] which usually occurs with Markov chains on combinatorial sets.
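Since ξ_i has mean 1/(i(i-1)) = 1/(i-1) − 1/i, the sum in (4) telescopes in expectation: E L = Σ_{i≥2} (1/(i-1) − 1/i) = 1. A quick Monte Carlo check (our own sketch, truncating the infinite sum at a finite i_max) agrees with this:

```python
import random

def sample_L(rng, i_max=500):
    # Truncated version of L = sum_{i>=2} xi_i, where xi_i is
    # exponential with mean 1/(i(i-1)); expovariate takes the rate,
    # so the rate i(i-1) gives mean 1/(i(i-1)).
    return sum(rng.expovariate(i * (i - 1)) for i in range(2, i_max + 1))

def estimate_mean_L(n_samples=20000, i_max=500, seed=1):
    # Sample mean of the truncated L; its exact expectation is
    # 1 - 1/i_max by the telescoping sum, close to 1.
    rng = random.Random(seed)
    return sum(sample_L(rng, i_max) for _ in range(n_samples)) / n_samples
```

With i_max = 500 the truncation error in the mean is only 1/500, so the estimate concentrates near 1.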
As briefly sketched in the next section, what's really going on is that d_n(zn^2) → d_∞(z)

where d_∞(·) is the maximal variation distance associated with a certain limit continuous-time, continuous-space Markov process. We might call this diffusive behavior, from the case of simple random walks on the k-dimensional integers modulo n, whose n → ∞ limit behavior is of this form, with the limit process being Brownian motion on [0, 1]^k.

1.3 Remarks on a limit process

We now outline why Proposition 2 is not unexpected. In the original Moran model for a population, one can look back from the present to determine the number of steps L_n back until the last common ancestor of the present population. It is standard (see section 2.3) that n^{-2} L_n converges in distribution to L. Loosely speaking, this implies that the genealogy after n^2 L steps cannot depend on the initial genealogy, and Proposition 2(a) is a formalization of that idea. A more elaborate picture is given by the theory surrounding Kingman's coalescent [13, 17], which portrays the rescaled n → ∞ limit of the genealogy of the current size-n population as a genealogy C of an infinite population. Informally, what's really going on is that

d_n(zn^2) → d_∞(z),  0 < z < ∞

where d_∞(·) is the maximal variation distance associated with a certain continuous-time Markov process (C_t, 0 ≤ t < ∞) whose state space is a set of possible genealogies for an infinite population. However, defining the process (C_t) precisely and proving mixing time bounds via this weak convergence methodology is technically hard. It turns out to be fairly simple to give an analysis of Proposition 2 directly in the discrete setting, by combining the standard analysis of L_n with a coupling construction, so that is what we shall do.

2 Proof of Proposition 2

2.1 Coupling from the past

The proof is based on a standard elementary idea. Suppose a Markov chain (X(s)) can be represented in the form X(s+1) = f(X(s), U_{s+1}) for some function f, where the (U_s) are independent with common distribution μ. Then

X(m) = g_m(X(0), U_1, U_2, ..., U_m)

where g_m(x, u_1, ..., u_m) := f(... f(f(x, u_1), u_2) ..., u_m). Define d(m) as at (3).

Lemma 3. d(m) ≤ 1 − P(A(m)), where A(m) is the event

g_m(x, U_{-m+1}, U_{-m+2}, ..., U_0) = g_m(x', U_{-m+1}, U_{-m+2}, ..., U_0) for all x, x'

where the (U_i) are independent with distribution μ.

This form of coupling has in recent years become known as coupling from the past, so named because we are constructing X(0) from X(-m) and the intervening U's, and on the event A(m) the current state X(0) does not depend on the state X(-m). Recent interest has focused on the connection between coupling from the past and perfect sampling: see [15].

2.2 The coupling construction

First note the Moran chain fits into the setup above by setting

X_n(s+1) = f_n(X_n(s), J_s, K_s)   (5)

for f_n at (2) and (J_s, K_s) independent uniform random ordered pairs from [n]. Given a tree t in T_n, say that f is a forest consistent with t if it can be obtained by deleting some (maybe empty) set of edges of t and then taking the spanning forest on the leaves [n]. The left diagram in figure 2 illustrates a forest f consistent with the left tree in figure 1. The transition t → t' = f_n(t, j, k) on trees extends in the natural way to a transition f → f' = f_n(f, j, k) on forests. Figure 2 shows a transition consistent with the transition in figure 1.

Figure 2. A transition f → f' = f_7(f, 4, 7). [Diagrams not reproduced.]
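The coalescence event of Lemma 3 can be checked mechanically: drive every starting state forward with one shared sequence of innovations and see whether all trajectories merge. A toy illustration (the update rule here is a reflecting walk of our own choosing, not the Moran chain; the forest-valued chain below plays the analogous role for the Moran chain):

```python
import random

def coalesced(step, states, us):
    # Check the event of Lemma 3: drive every starting state x with the
    # same innovation sequence u_1, ..., u_m and report whether all
    # trajectories end in one common state.
    finals = set()
    for x in states:
        for u in us:
            x = step(x, u)
        finals.add(x)
    return len(finals) == 1

def reflecting_step(x, u):
    # Toy update rule: a walk on {0, ..., 4}, reflected at both ends,
    # with every copy driven by the same uniform coin u.
    return min(4, x + 1) if u < 0.5 else max(0, x - 1)

rng = random.Random(3)
us = [rng.random() for _ in range(500)]
all_merged = coalesced(reflecting_step, range(5), us)
```

Because this toy coupling is monotone, the five trajectories merge as soon as the extreme copies meet, which happens within 500 shared steps with overwhelming probability.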

Fix m. Consider the Moran chain (X_n(s), −m ≤ s ≤ 0) defined by (5) with some initial X_n(−m), and consider the forest-valued chain

Y_n(s+1) = f_n(Y_n(s), J_s, K_s)

where Y_n(−m) is the forest on [n] with no edges. It is easy to check that, for each realization of the joint process (X_n(s), Y_n(s)), we have that Y_n(s) is consistent with X_n(s). But Y_n(s) does not depend on X_n(−m), and if Y_n(s) is a single tree then Y_n(s) = X_n(s). Thus on the event

A_n(m) := {Y_n(0) is a single tree}

we have that X_n(0) does not depend on X_n(−m). So by Lemma 3,

d_n(m) ≤ 1 − P(A_n(m))   (6)

and we want to estimate the right side.

2.3 Analyzing the coupling

For a forest f on [n] write #f for the multi-set of the numbers of leaves in the tree-components. So #Y_n(−m) = {1, 1, 1, ..., 1} and A_n(m) is the event that #Y_n(0) = {n}. Now reconsider the original Moran model from the start of section 1.1, and run this model for time s = −m, −m+1, ..., 0. Give a different color to each individual in the initial time −m population, and then let children inherit the color of their parent. Write (C_n(s), −m ≤ s ≤ 0) for the multi-set of the numbers of individuals of each color at time s. It is easy to check these two processes are the same:

Lemma 4. The process (C_n(s), −m ≤ s ≤ 0) has the same distribution as the process (#Y_n(s), −m ≤ s ≤ 0).

But analyzing C_n(·) is a classical and elementary topic in population genetics (e.g. [6]), which we repeat for completeness. Define (N_n(s), 0 ≤ s ≤ m) as the number of individuals at time −s who have descendants at time 0. Then N_n(·) is the Markov chain on state-space [n] with N_n(0) = n and

P(N_n(s+1) = j−1 | N_n(s) = j) = j(j−1)/(n(n−1))
P(N_n(s+1) = j | N_n(s) = j) = 1 − j(j−1)/(n(n−1)).

This holds because in the Moran model in reversed time, at each step two different individuals are chosen at random to have the same parent, and

the other parents are distinct. Now the chain N_n(s) can be defined for 0 ≤ s < ∞. Write L_n = min{s : N_n(s) = 1}. The event {L_n ≤ m} is the event that all the individuals in the Moran process at time 0 are descendants of some one individual at time −m, and hence have the same color. So

P(A_n(m)) = P(#Y_n(0) = {n}) = P(C_n(0) = {n}) = P(L_n ≤ m).

So by (6),

d_n(m) ≤ P(L_n > m).

On the other hand we can represent L_n as a sum of independent random variables

L_n = Σ_{i=2}^n ξ_{n,i}

where ξ_{n,i} has the geometric distribution with mean n(n−1)/(i(i−1)). It easily follows that P(L_n > n^2 z) → P(L > z) for L defined at (4). This establishes part (a) of Proposition 2.

Part (b) is easy. For t in T_n write r(t) for the number of leaves in the right-hand branch of t from the top-level split (so r(t) = 4 for the tree in the left of figure 1). Write R_n(m) for the process r(X_n(m)), stopped upon reaching value 1 or n−1. Then R_n(·) is a certain Markov chain which jumps by at most 1 each time, and which is a martingale, so

E(R_n(m) − R_n(0))^2 ≤ m.

It follows that, for initial trees t_n satisfying n^{-1} r(t_n) → r(0) ∈ (0, 1), if m_n = o(n^2) then n^{-1} r(X_n(m_n)) → r(0) in probability. Part (b) follows easily.

3 Final remarks

(a) As with the analogous chains (riffle shuffle, move-to-front, move-to-root) mentioned in the Introduction, it seems plausible that one can do a more refined analysis of the Moran chain which exhibits all the eigenvalues; in another direction, one might be able to handle non-uniform distributions

on leaf-pairs. Indeed, Brown [4] section 6.3 observes that the non-uniform Moran chain fits into the general setting of random walks on left-regular bands (a type of semigroup), without analyzing this particular chain.

(b) One can alternatively define the state space of the Moran chain to be the (smaller) set T̃_n of n-leaf cladograms. A cladogram is also a binary tree with leaf-set [n], but now we do not distinguish between left and right branches, and we do not count the overall order of splits. The Moran chain on T̃_n has a certain non-uniform stationary distribution, but the conclusion of Proposition 2 remains true. Cladograms are used in biological systematics [10] to represent evolutionary relationships between species. Markov chain Monte Carlo methods to infer the true cladogram from data start with some base chain on cladograms, providing some remote motivation for studying their mixing times. A related chain designed to have uniform stationary distribution on T̃_n is studied in [2]; a coupling argument is used to bound its mixing time as order n^3, though we expect the correct mixing time is again order n^2. The proof in [2] uses a more intricate coupling argument of the same style as in this paper, but the absence of an analog of Lemma 4 makes its analysis rather harder.

References

[1] D.J. Aldous and P. Diaconis. Shuffling cards and stopping times. Amer. Math. Monthly, 93:333-348, 1986.
[2] D.J. Aldous and P. Diaconis. Longest increasing subsequences: from patience sorting to the Baik-Deift-Johansson theorem. Bull. Amer. Math. Soc., 36:413-432, 1999.
[3] D.J. Aldous and J.A. Fill. Reversible Markov Chains and Random Walks on Graphs. Book in preparation, 2001.
[4] K. Brown. Semigroups, rings and Markov chains. To appear in J. Theoretical Probability, 1999.
[5] F.R.K. Chung, R.L. Graham, and S.-T. Yau. On sampling with Markov chains. Random Struct. Alg., 9:55-77, 1996.
[6] P. Clifford and A. Sudbury. Looking backwards in time in the Moran model in population genetics. J. Appl. Probab., 22:437-442, 1985.
[7] P. Diaconis. Group Representations in Probability and Statistics. Institute of Mathematical Statistics, Hayward CA, 1988.

[8] P. Diaconis. The cut-off phenomenon in finite Markov chains. Proc. Nat. Acad. Sci. USA, 93:1659-1664, 1996.
[9] R.P. Dobrow and J.A. Fill. Rates of convergence for the move-to-root Markov chain for binary search trees. Ann. Appl. Probab., 5:20-36, 1995.
[10] N. Eldredge and J. Cracraft. Phylogenetic Patterns and the Evolutionary Process. Columbia University Press, New York, 1980.
[11] W.J. Ewens. Mathematical Population Genetics, volume 9 of Biomathematics. Springer-Verlag, Berlin, 1979.
[12] J.A. Fill. An exact formula for the move-to-front rule for self-organizing lists. J. Theoretical Probab., 9:113-160, 1996.
[13] J.F.C. Kingman. The coalescent. Stochastic Process. Appl., 13:235-248, 1982.
[14] L. Lovász and P. Winkler. Mixing times. In D. Aldous and J. Propp, editors, Microsurveys in Discrete Probability, number 41 in DIMACS Ser. Discrete Math. Theoret. Comp. Sci., pages 85-134, 1998.
[15] J. Propp and D. Wilson. Coupling from the past: a user's guide. In D. Aldous and J. Propp, editors, Microsurveys in Discrete Probability, number 41 in DIMACS Ser. Discrete Math. Theoret. Comp. Sci., pages 181-192, 1998.
[16] A.J. Sinclair. Algorithms for Random Generation and Counting. Birkhäuser, 1993.
[17] S. Tavaré. Line-of-descent and genealogical processes and their applications in population genetics models. Theoret. Population Biol., 26:119-164, 1984.
[18] U. Vazirani. Rapidly mixing Markov chains. In B. Bollobás, editor, Probabilistic Combinatorics and Its Applications, volume 44 of Proc. Symp. Applied Math., pages 99-122. American Math. Soc., 1991.