A representation for the semigroup of a two-level Fleming-Viot process in terms of the Kingman nested coalescent

Airam Blancas

June 11, 2017

Abstract

Simple nested coalescents were introduced in [1] to model, backwards in time, the genealogy of both species trees and gene trees. In this setting the Kingman case corresponds to binary coalescences in the species tree or binary coalescences in the gene tree. On the other hand, the two-level Fleming-Viot process with two-level selection arises in [2] as the limit in distribution of a multilevel multitype population undergoing mutation, selection, genetic drift and spatial migration. In this note, we establish a duality relation between Kingman nested coalescents and the two-level Fleming-Viot process associated with a two-level multitype population with genetic drift. Using this relation we can read off the genealogy, backwards in time, of a Kingman nested coalescent.

Keywords. Nested coalescent process, two-level Fleming-Viot process, function valued dual process.

1 Introduction models

1.1 The nested coalescent

In phylogenetics, gene trees provide a diagrammatic representation of evolutionary relationships. Branching points arise when a gene in the sample replicates and its copies are passed on to more than one offspring. In a corresponding evolution model, one can think of the genes as the individuals of a haploid population which reproduces in generations. On the other hand, species trees represent the genealogy of species, i.e. of populations of individuals that are capable of interbreeding. Since these individuals carry their own genetic information, gene trees can be viewed as being embedded in the species phylogeny. Therefore gene lineages are allowed to coalesce only while they are in the same branch of the species tree. In other words, if two genes $i$ and $j$ have not coalesced by time $t$ (backwards), they are not yet in ancestral relation.
In addition, if these genes are associated with different species at time $t$, then they are not yet related by species. Using this heuristic, a nested partition is formally defined in [1]. Moreover, with the aim of providing a probabilistic model for the dynamics of gene and species trees, a family of Markov processes with values in the space of nested bivariate partitions of $\mathbb{N}$, called simple nested exchangeable coalescents (snec for short), is defined there. A typical element in the state space of nested coalescents is denoted by $\pi = (\pi^s, \pi^g)$. We agree that the blocks of the species partition and of the gene partition are enumerated in increasing order of their least element.

Institut für Mathematik, Goethe Universität Frankfurt am Main, Box 11193, 60054 Frankfurt am Main, Germany. E-mail: blancas@math.uni-frankfurt.de

Proposition 1 of [1] establishes that there exists a partition $\bar\pi$, called the link partition, which allows us to write the species partition in terms of the labels of the gene partition through the coagulation operator, that is, $\pi^s = \mathrm{Coag}(\pi^g, \bar\pi)$. Namely, the elements of the $i$-th block of the link partition are the labels of the gene blocks nested in the $i$-th species, i.e. if $j \in \bar\pi_i$ then the block $\pi^g_j$ is nested in $\pi^s_i$.

Example 1. An example of a nested partition of $[10]$ is $\pi = (\pi^s, \pi^g)$, where
$$\pi^s = (\{1, 3, 5\}, \{2, 6, 8, 9\}, \{4, 7, 10\}),$$
$$\pi^g = (\{1\}, \{2\}, \{3, 5\}, \{4, 7\}, \{6, 9\}, \{8\}, \{10\}).$$
The corresponding link partition is $\bar\pi = (\{1, 3\}, \{2, 5, 6\}, \{4, 7\})$.

According to [1], the snec transitions are characterized by a measure $\Sigma$ on $[0, 1] \times \mathcal{M}_1([0, 1])$. The case $\Sigma = \delta_0 \otimes \delta_{\delta_0}$ is called the nested Kingman coalescent $(K_t : t \ge 0)$. In this process a pair of species merges or a pair of genes merges inside one species. More precisely, two types of transitions are possible. At rate $\gamma_1$, two blocks of the gene partition nested in the same species merge into a single block. At rate $\gamma_2$, two integers $i < j$ are picked and the species blocks $i$ and $j$ merge, that is, every gene block nested in the species block labeled $j$ has its species label changed from $j$ to $i$.

We point out that the Kingman nested coalescent describes the genealogy of a two-level coalescent (see Subsection 3.4.3 in [2] for a general description) whose state space is the set of marked partitions $\{(\rho, \zeta(\rho)) : \rho \in \mathcal{P},\ \zeta(\rho) \in \mathbb{N}^{|\rho|}\}$ (where $\mathcal{P}$ denotes the set of partitions of $\mathbb{N}$) and in which the blocks of the partition $\rho$ jump to a new unoccupied site at rate zero. In a two-level coalescent the marks represent the positions in $\mathbb{N}$ of the blocks of $\rho$; we can view the partition $\rho$ as the gene partition $\pi^g$ and the marks as the labels of the species blocks in which the gene blocks are nested.
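As a small illustration of the link partition and of the identity $\pi^s = \mathrm{Coag}(\pi^g, \bar\pi)$, the following Python sketch recomputes the link partition of Example 1. The function names `order_blocks`, `link_partition` and `coag` are hypothetical helpers introduced only for this illustration; they are not taken from [1].

```python
def order_blocks(blocks):
    """Sort each block and enumerate blocks by increasing least element."""
    return sorted((sorted(b) for b in blocks), key=lambda b: b[0])


def link_partition(species, genes):
    """Return the link partition as lists of 1-based gene-block labels.

    Block i of the result lists the labels of the gene blocks that are
    nested in the i-th species block.
    """
    species = order_blocks(species)
    genes = order_blocks(genes)
    link = [[] for _ in species]
    for label, gene_block in enumerate(genes, start=1):
        owners = [i for i, sp in enumerate(species) if set(gene_block) <= set(sp)]
        if len(owners) != 1:
            raise ValueError(f"gene block {gene_block} is not nested in exactly one species block")
        link[owners[0]].append(label)
    return link


def coag(genes, link):
    """Coagulate the gene blocks: block i is the union of genes[j - 1] for j in link[i]."""
    genes = order_blocks(genes)
    return order_blocks([[x for j in block for x in genes[j - 1]] for block in link])


# The nested partition of [10] from Example 1.
species = [{1, 3, 5}, {2, 6, 8, 9}, {4, 7, 10}]
genes = [{1}, {2}, {3, 5}, {4, 7}, {6, 9}, {8}, {10}]

link = link_partition(species, genes)
print(link)                                        # [[1, 3], [2, 5, 6], [4, 7]]
print(coag(genes, link) == order_blocks(species))  # True
```

The sketch relies on the enumeration convention stated above (blocks ordered by their least element), which is what makes the gene-block labels well defined.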
Moreover,
$$\{\zeta(i) : i = 1, \dots, |\rho|\} = \{\eta(\pi^g_i) : i = 1, \dots, |\pi^g|\},$$
$$|\zeta| = |\{k \in \mathbb{N} : \zeta(i) = k \text{ for some } i \in \{1, \dots, |\rho|\}\}| = |\pi^s| = |\bar\pi|, \qquad n_i(\rho) = |\bar\pi_i|,$$
where $\bar\pi$ is the link partition of $\pi$, $\eta$ is the nest function introduced in [1], that is, $\eta(\pi^g_i) = j$ implies $\pi^g_i \subseteq \pi^s_j$, and $n_i(\rho)$ denotes the number of blocks of $\rho$ with mark $i$.

The transitions of a two-level coalescent occur at each level. At level I, two blocks of the partition $\rho$ merge at rate $\gamma_1$ if they are at the same site. Besides, at rate $\gamma_2$, all the blocks at two sites combine to form a single site containing all these blocks. Therefore, given $(|\zeta(\rho)|, (n_1, \dots, n_{|\zeta(\rho)|})) = (m, (n_1, \dots, n_m))$, the possible transitions are
$$(m, (n_1, \dots, n_m)) \to (m, (n_1, \dots, n_i - 1, \dots, n_m)) \quad \text{at rate } \gamma_1 \frac{n_i (n_i - 1)}{2},$$
$$(m, (n_1, \dots, n_m)) \to (m - 1, (n_1, \dots, n_{i-1}, n_i + n_j, n_{i+1}, \dots, n_{j-1}, n_{j+1}, \dots, n_m)) \quad \text{at rate } \gamma_2 \frac{m (m - 1)}{2}.$$
These transitions correspond to the transitions of a sample with $m$ species and $n_i$ gene blocks in the $i$-th species, i.e.
$$(|\pi^s|, (|\bar\pi_1|, \dots, |\bar\pi_{|\pi^s|}|)) = (m, (n_1, \dots, n_m)).$$
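The block-counting chain above can be simulated with a standard Gillespie scheme. The sketch below is only an illustration under stated assumptions: each pair of gene blocks in the same species merges at rate `gamma1` and each pair of species merges at rate `gamma2` (so the total rates are `gamma1 * n_i * (n_i - 1) / 2` and `gamma2 * m * (m - 1) / 2`), both rates are strictly positive, and the function name `simulate_counts` is hypothetical.

```python
import random


def simulate_counts(n, gamma1, gamma2, rng):
    """Simulate the chain (m, (n_1, ..., n_m)) until the state (1, (1,)) is reached.

    n      : initial numbers of gene blocks per species
    gamma1 : merging rate per pair of gene blocks inside one species (assumed > 0)
    gamma2 : merging rate per pair of species (assumed > 0)
    Returns the list of (jump time, state) pairs, starting with (0.0, initial state).
    """
    n = list(n)
    t = 0.0
    path = [(t, tuple(n))]
    while len(n) > 1 or n[0] > 1:
        m = len(n)
        gene_rates = [gamma1 * k * (k - 1) / 2 for k in n]
        species_rate = gamma2 * m * (m - 1) / 2
        total = sum(gene_rates) + species_rate
        t += rng.expovariate(total)  # exponential waiting time with the total rate
        if rng.random() * total < species_rate:
            # Two species i < j merge: species j is absorbed into species i.
            i, j = sorted(rng.sample(range(m), 2))
            n[i] += n[j]
            del n[j]
        else:
            # Two gene blocks merge inside a species chosen proportionally to its rate.
            i = rng.choices(range(m), weights=gene_rates)[0]
            n[i] -= 1
        path.append((t, tuple(n)))
    return path


path = simulate_counts([2, 3, 2], gamma1=1.0, gamma2=1.0, rng=random.Random(0))
print(path[-1][1])  # (1,)
```

Every jump removes either one gene block or one species, so a start from $m$ species with $n_1 + \dots + n_m$ gene blocks always reaches the absorbing state after $(m - 1) + (n_1 + \dots + n_m - 1)$ jumps.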
Gene trees and species trees can be viewed as a hierarchy of two levels: individuals and colonies. Recently, a class of multilevel measure-valued processes, which model ensembles of subpopulations with mutation and selection at the subpopulation level and possible death and replacement of subpopulations, has been studied in [2]. The main goal of this work is to provide a representation for the semigroup of a two-level Fleming-Viot process in terms of the Kingman nested coalescent. In the following subsection we introduce the model described in [2].

1.2 The two-level Fleming-Viot

Dawson [2] analyzes a class of measure-valued processes which model a multilevel multitype population undergoing mutation, selection, genetic drift and spatial migration. In this manuscript we focus only on a two-level population with random genetic drift, via resampling, at both levels. To be precise, we consider a finite multitype population allocated in $M$ colonies of fixed size $N$. The set of types is $I = \{1, \dots, K\}$ and $N_i$ denotes the number of type $i$ individuals in a colony. Hence the size of a typical colony is
$$N = \sum_{i=1}^{K} N_i,$$
and the type distribution in a colony is
$$Z := \frac{1}{N} \sum_{i=1}^{K} N_i \delta_i,$$
which is an element of $\Delta$, the space of probability measures on $\{1, \dots, K\}$. Let us now describe the genetic drift.

Individuals level: at rate $\gamma_1$ an individual of type $i$ is chosen with probability $x_i$ and replaced by a type $j$ individual, chosen with probability $x_j$ from the remaining $N - 1$ individuals.

Colonies level: at rate $\gamma_2$ the $i$-th colony is chosen according to its type distribution
$$\mu_i(t) = \sum_{j=1}^{K} \frac{n_{i,j}(t)}{N} \delta_j,$$
where $n_{i,j}(t)$ denotes the number of type $j$ individuals in the $i$-th colony at time $t$, and replaced by one of the remaining $M - 1$ colonies.

Therefore, the dynamics of the individuals in each colony are given by a continuous-time Markov chain $(V(t) : t \ge 0)$ taking values in $\Delta$, with transitions
$$\nu \to \nu - \frac{1}{N} \delta_i + \frac{1}{N} \delta_j \quad \text{at rate } (N - 1) \gamma_1 x_i x_j, \qquad i, j \in I.$$
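To make the two resampling mechanisms concrete, here is a minimal Python sketch of a single event in a two-level population stored as an $M \times K$ table of type counts. It is a simplified illustration and not the exact rate structure of [2]: we assume that each colony experiences an individual-level event at rate `gamma1` and a colony-level replacement at rate `gamma2`, that $N \ge 2$ and $M \ge 2$, and the name `two_level_step` is hypothetical.

```python
import random


def two_level_step(colonies, gamma1, gamma2, rng):
    """Apply one resampling event in place and return the waiting time.

    colonies : list of M lists; colonies[i][j] is the number of type j
               individuals in colony i (every colony has the same size N >= 2).
    Assumed rates (illustration only): gamma1 per colony for individual-level
    resampling, gamma2 per colony for colony-level replacement.
    """
    M = len(colonies)
    ind_rate = gamma1 * M
    col_rate = gamma2 * M
    total = ind_rate + col_rate
    dt = rng.expovariate(total)
    if rng.random() * total < ind_rate:
        # Individuals level: one individual is removed and replaced by a copy
        # of an individual drawn from the remaining N - 1 in the same colony.
        c = colonies[rng.randrange(M)]
        K = len(c)
        i = rng.choices(range(K), weights=c)[0]  # type of the removed individual
        c[i] -= 1
        j = rng.choices(range(K), weights=c)[0]  # type of the copied individual
        c[j] += 1
    else:
        # Colonies level: colony i is replaced by a copy of another colony j.
        i, j = rng.sample(range(M), 2)
        colonies[i] = list(colonies[j])
    return dt


rng = random.Random(1)
colonies = [[3, 1], [0, 4], [2, 2]]  # M = 3 colonies, K = 2 types, N = 4
elapsed = sum(two_level_step(colonies, 1.0, 0.5, rng) for _ in range(200))
print([sum(c) for c in colonies])  # [4, 4, 4]
```

Both kinds of event preserve the colony size $N$ and the number of colonies $M$, which is why each row of the table can always be normalized into a type distribution.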
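In turn, the dynamics of the colonies are described by the continuous-time $\Delta^M$-valued Markov chain $(\mu(t) : t \ge 0)$, where $\mu(t) = (\mu_1(t), \dots, \mu_M(t))$. It will be convenient to think of $\mu(t)$ as the empirical measure
$$\Xi^{N,M}_t = \frac{1}{M} \sum_{i=1}^{M} \delta_{\mu_i(t)} \in \mathcal{P}(\Delta), \tag{1}$$
where $\mathcal{P}(\Delta)$ denotes the space of probability measures on $\Delta$.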
The transition rates of $(\Xi^{N,M}_t : t \ge 0)$ are given by
$$\xi \to \xi + \frac{1}{M} \left( -\delta_{\mu_i} + \delta_{\mu_j} \right) \quad \text{at rate } \gamma_2 (M - 1)\, \xi(d\mu_i)\, \xi(d\mu_j), \tag{2}$$
$$\xi \to \xi + \frac{1}{M} \left( \delta_{\mu - \frac{\delta_i}{N} + \frac{\delta_j}{N}} - \delta_{\mu} \right) \quad \text{at rate } (N - 1) \gamma_1 x_i x_j\, \xi(d\mu). \tag{3}$$
In [2], the asymptotic behavior of the process $(\Xi^{N,M}_t : t \ge 0)$ is investigated in three stages. First, for a single colony as its size $N \to \infty$. This leads in the limit to a $\Delta$-valued process called the $K$-type Fleming-Viot process, which is the unique solution of a well-posed martingale problem; see Proposition 2.1 of [2] for the description of its generator. In the next stage, a càdlàg strong Markov process with state space $(\Delta)^{N}$ arises as the limit of a population with a finite number of colonies of size $N$. Finally, letting both $N \to \infty$ and the number of colonies $M \to \infty$, the author shows that there exists a process $(\Xi_t : t \ge 0)$ with values in $\mathcal{P}(\Delta)$ whose law $\{P_\xi : \xi \in \mathcal{P}(\Delta)\}$ is the unique solution to the martingale problem for
$$\mathcal{A} H(\xi) = \int_{\Delta} A_1 \frac{\delta H(\xi)}{\delta \xi(\mu)}\, \xi(d\mu) + \frac{\gamma_2}{2} \int_{\Delta} \int_{\Delta} \frac{\delta^2 H(\xi)}{\delta \xi(\mu_1)\, \delta \xi(\mu_2)} \left( \xi(d\mu_1)\, \delta_{\mu_1}(d\mu_2) - \xi(d\mu_1)\, \xi(d\mu_2) \right),$$
where $A_1$ denotes the generator of a $K$-type Fleming-Viot process and
$$H(\xi) = \prod_{j=1}^{m} \int_{\Delta} \int_{I^{n_j}} h_j(x_1, \dots, x_{n_j})\, \mu_j^{n_j}(dx_1, \dots, dx_{n_j})\, \xi(d\mu_j), \tag{4}$$
with $m$ the number of colonies in the sample and $n_j$ the number of individuals sampled in the $j$-th colony. The process $\Xi$ is the two-level Fleming-Viot process. The reader is referred to Theorem 1 of [2] for the definition of the two-level Fleming-Viot process with two-level selection.

An important tool in our analysis is duality. Namely, we will need the dual process $(G_t : t \ge 0)$ of the two-level Fleming-Viot process. According to [2], it takes values in $\mathcal{G}$, the algebra containing functions of the form
$$G_{\mathbf{f}}(\xi) = \prod_{j=1}^{m} \int_{\Delta} \prod_{k=1}^{n_j} \int_{I} f_{jk}(x_k)\, \mu_j(dx_k)\, \xi(d\mu_j), \tag{5}$$
where $\mathbf{f} := (f_1, \dots, f_m)$ with $f_j(x_1, \dots, x_{n_j}) = (f_{j1}(x_1), \dots, f_{jn_j}(x_{n_j}))$ and $f_{jk} : I \to \mathbb{R}$. The dynamics of the dual include a jump at rate $\gamma_1$ to the state $G_{\mathbf{f}^j_{kl}}(\xi)$, where
$$\mathbf{f}^j_{kl} = (f_1, \dots, f_{j-1}, f^j_{kl}, f_{j+1}, \dots, f_m). \tag{6}$$
We take
$$f^j_{kl}(x_1, \dots, x_{n_j}) = (f_{j1}(x_1), \dots, f_{j,k-1}(x_{k-1}), f_{jk}(x_k) f_{jl}(x_k), f_{j,k+1}(x_{k+1}), \dots, f_{j,l-1}(x_{l-1}), f_{j,l+1}(x_{l+1}), \dots, f_{jn_j}(x_{n_j})). \tag{7}$$
Therefore,
$$G_{\mathbf{f}^j_{kl}}(\xi) = \int_{\Delta} \left( \int_{I} f_{jk}(x_k) f_{jl}(x_k)\, \mu_j(dx_k) \prod_{v \in [n_j] \setminus \{k, l\}} \int_{I} f_{jv}(x_v)\, \mu_j(dx_v) \right) \xi(d\mu_j) \times \prod_{u \in [m] \setminus \{j\}} \int_{\Delta} \prod_{v=1}^{n_u} \int_{I} f_{uv}(x_v)\, \mu_u(dx_v)\, \xi(d\mu_u).$$
Similarly, at rate $\gamma_2$ the process $(G_t : t \ge 0)$ jumps to the state $G_{\mathbf{f}_{ij}}(\xi)$, where
$$\mathbf{f}_{ij} = (f_1, \dots, f_{i-1}, f_{ij}, f_{i+1}, \dots, f_{j-1}, f_{j+1}, \dots, f_m), \tag{8}$$
with
$$f_{ij}(x_1, \dots, x_{n_i + n_j}) = (f_{i1}(x_1), \dots, f_{in_i}(x_{n_i}), f_{j1}(x_{n_i + 1}), \dots, f_{jn_j}(x_{n_i + n_j})). \tag{9}$$
Hence $G_{\mathbf{f}_{ij}}(\xi)$ is written as follows:
$$G_{\mathbf{f}_{ij}}(\xi) = \int_{\Delta} \left( \prod_{u=1}^{n_i} \int_{I} f_{iu}(x_u)\, \mu_i(dx_u) \prod_{u=1}^{n_j} \int_{I} f_{ju}(x_u)\, \mu_i(dx_u) \right) \xi(d\mu_i) \times \prod_{u \in [m] \setminus \{i, j\}} \int_{\Delta} \prod_{v=1}^{n_u} \int_{I} f_{uv}(x_v)\, \mu_u(dx_v)\, \xi(d\mu_u).$$
For the sake of completeness we now write the generator of the dual process $(G_t : t \ge 0)$:
$$\mathcal{A} G_{\mathbf{f}}(\xi) = \frac{\gamma_1}{2} \sum_{j=1}^{m} \sum_{k=1}^{n_j} \sum_{l \in [n_j] \setminus \{k\}} \left( G_{\mathbf{f}^j_{kl}}(\xi) - G_{\mathbf{f}}(\xi) \right) + \frac{\gamma_2}{2} \sum_{i=1}^{m} \sum_{j \in [m] \setminus \{i\}} \left( G_{\mathbf{f}_{ij}}(\xi) - G_{\mathbf{f}}(\xi) \right).$$

Remark 1. The transitions of the dual process $(G_t : t \ge 0)$ correspond to the transitions of the nested Kingman coalescent $(K_t : t \ge 0)$. Indeed, we can associate the arguments of $f_j(x_1, \dots, x_{n_j})$ with the $j$-th species block, and the argument of $f_{ij}(x_j)$ with the $j$-th gene block of species $i$. Hence the transition to (6) corresponds to the coalescence of genes $k$ and $l$ in species $j$. Similarly, the transition to (8) describes the lumping of species $i$ and $j$.

The duality relation between the two-level Fleming-Viot process $(\Xi_t : t \ge 0)$ and the function valued process $(G_t : t \ge 0)$ is formally established in Theorem 3 of [2], through the function $H : \mathcal{P}(\Delta) \times \mathcal{G} \to [0, 1]$ defined by
$$H(G_{\mathbf{f}}, \xi) = \prod_{j=1}^{m} \int_{\Delta} \prod_{k=1}^{n_j} \int_{I} f_{jk}(x_k)\, \mu_j(dx_k)\, \xi(d\mu_j),$$
where $G_{\mathbf{f}}$ is the function defined in (5). Namely, the following duality relation holds:
$$E_{\Xi_0}\left( H(G_0, \Xi_t) \right) = E_{G_0}\left( H(G_t, \Xi_0) \right). \tag{10}$$

2 Main result

In this section we use the duality relation (10) to establish that the two-level Fleming-Viot process $(\Xi_t : t \ge 0)$ can be conceived as the forward in time model for the Kingman nested coalescent $(K_t : t \ge 0)$. Namely, we define a function $h^\pi$, which is basically the function $h$ with its coordinates structured by a nested partition, an element of the state space of the Kingman nested coalescent. For fixed $t > 0$, assume that $K_t = \pi \in \mathcal{N}_n$, the set of nested partitions of $[n]$.
Let $\bar\pi$ be the link partition associated with $\pi$ and $(s, g) := (|\pi^s|, |\pi^g|)$. We propose to associate the $i$-th coordinate of the function $h$ with the element in position $i$ of the partition $\pi$. In this direction we define $h^\pi : I^g \to \mathbb{R}_+$ as follows:
$$h^\pi(y_{11}, \dots, y_{1|\bar\pi_1|}, \dots, y_{s1}, \dots, y_{s|\bar\pi_s|}) := h(x^\pi_1, \dots, x^\pi_n),$$
for $j = 1, \dots, s$ and $k_j = 1, \dots, |\bar\pi_j|$, where
$$x^\pi_i := y_{j(i)\kappa(i)}, \qquad i \in [n].$$
Here $j(i)$ denotes the index of the block $\sigma(i)$ of $\pi^s$ to which $i$ belongs, and $\kappa(i)$ denotes the index, among the blocks of $\pi^g$ contained in $\sigma(i)$, of the block to which $i$ belongs.

Example 2. For the partition $\pi$ in Example 1, observe that $x^\pi_1 = y_{11}$, $x^\pi_3 = y_{12}$, $x^\pi_8 = y_{23}$, $x^\pi_{10} = y_{32}$. Continuing with this procedure we obtain
$$h^\pi(y_{11}, y_{12}, y_{21}, y_{22}, y_{23}, y_{31}, y_{32}) = h(y_{11}, y_{21}, y_{12}, y_{31}, y_{12}, y_{22}, y_{31}, y_{23}, y_{22}, y_{32}).$$

We now observe that the functions in (4) and (5), and therefore $G_{\mathbf{f}}(\xi)$ and $h$, are related through the identity $h_j(x_1, \dots, x_{n_j}) = \prod_{k=1}^{n_j} f_{jk}(x_k)$. Putting the pieces together,
$$H(h^\pi, \xi) = \int \cdots \int h^\pi(y_{11}, \dots, y_{1|\bar\pi_1|}, \dots, y_{s1}, \dots, y_{s|\bar\pi_s|}) \prod_{j=1}^{s} \left( \prod_{k=1}^{|\bar\pi_j|} \mu_j(dy_{jk}) \right) \xi(d\mu_j). \tag{11}$$

Example 3. Continuing with Example 2,
$$H(h^\pi, \xi) = \int \cdots \int h^\pi(y_{11}, y_{12}, y_{21}, y_{22}, y_{23}, y_{31}, y_{32})\, \mu_1(dy_{11}) \mu_1(dy_{12})\, \mu_2(dy_{21}) \mu_2(dy_{22}) \mu_2(dy_{23})\, \mu_3(dy_{31}) \mu_3(dy_{32})\, \xi(d\mu_1) \xi(d\mu_2) \xi(d\mu_3).$$

We are now able to establish our main result.

Theorem 1. The basic duality between the processes $(K_t : t \ge 0)$ and $(\Xi_t : t \ge 0)$ is given by
$$E\left( H(h^{K_t}, \xi) \right) = E\left( H(h^{K_0}, \Xi_t) \right).$$

Thanks to the identity (10), it is not difficult to obtain the duality, so let us turn to describing what it shows. It will be convenient to view the species as colonies and the genes as individuals. We can now draw the species trees and gene trees in a single picture, namely a tree whose branches are painted using as many colors as there are colonies in the population; see Figure 1. Let us consider a Kingman nested coalescent $(K_t : t \ge 0)$ starting from a population completely fragmented into colonies.
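Returning to the coordinate map $i \mapsto (j(i), \kappa(i))$ used to define $h^\pi$, the following Python sketch computes it for the nested partition of Example 1 and evaluates $h^\pi$ for a generic $h$. The helper names `coordinate_map` and `h_pi` are hypothetical and serve only as an illustration of the definition.

```python
def coordinate_map(species, genes):
    """Map each i in [n] to the pair (j(i), kappa(i)), both 1-based.

    j(i) is the index of the species block containing i; kappa(i) is the
    index, among the gene blocks nested in that species, of the gene block
    containing i. Blocks are enumerated by increasing least element.
    """
    species = sorted((sorted(b) for b in species), key=lambda b: b[0])
    genes = sorted((sorted(b) for b in genes), key=lambda b: b[0])
    mapping = {}
    for j, sp in enumerate(species, start=1):
        nested = [g for g in genes if set(g) <= set(sp)]
        for kappa, g in enumerate(nested, start=1):
            for i in g:
                mapping[i] = (j, kappa)
    return mapping


def h_pi(h, mapping, y):
    """Evaluate h at (x_1, ..., x_n) with x_i = y[(j(i), kappa(i))]."""
    n = len(mapping)
    return h(*[y[mapping[i]] for i in range(1, n + 1)])


species = [{1, 3, 5}, {2, 6, 8, 9}, {4, 7, 10}]
genes = [{1}, {2}, {3, 5}, {4, 7}, {6, 9}, {8}, {10}]
coords = coordinate_map(species, genes)

# Symbolic arguments y_jk, one per gene block, to display the pattern.
y = {(j, k): f"y{j}{k}" for (j, k) in set(coords.values())}
print(h_pi(lambda *x: x, coords, y))
# ('y11', 'y21', 'y12', 'y31', 'y12', 'y22', 'y31', 'y23', 'y22', 'y32')
```

The printed tuple reproduces the argument pattern of $h$ obtained in the worked example above: individuals in the same gene block share one variable, and the first index records the colony.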
At time $t$, we choose at random, out of the total population, $n$ individuals from $m$ colonies, so that $K_t = \pi$, where $\pi^g$ is the trivial partition into singletons and $\pi^s$ is obtained by lumping together the individuals in the same colony. The Kingman nested coalescent describes the ancestral lineages, backwards in time, of a population split up into colonies. Nevertheless, thanks to the duality relation we can view the evolution of the Kingman nested coalescent in the forward direction, and moreover this forward genealogy corresponds to the two-level Fleming-Viot process described in Section 1.2. We observe that the duality function $h^\pi$ fully characterizes both processes, since the class of functions of the form (4) uniquely determines probability measures on $\mathcal{P}(\Delta)$, and using the partition $\pi$ we can read off the configuration of the Kingman nested coalescent at every time.

Acknowledgements

AB acknowledges support from CONACyT. This work was undertaken whilst AB was on a postdoctoral year at Goethe University Frankfurt; she gratefully acknowledges the kind hospitality of the Institute of Mathematics. AB would especially like to thank Anton Wakolbinger for his very helpful advice and encouragement.
Figure 1: This figure shows the time evolution of sample of 1 individuals which live in three different colonies.

References

[1] Airam Blancas, Amaury Lambert, Arno Siri-Jégousse (2017). Simple nested coalescents. Unpublished manuscript.

[2] Donald D. Dawson (2016). Multilevel mutation-selection systems and set-valued duals.