A representation for the semigroup of a two-level Fleming Viot process in terms of the Kingman nested coalescent

Airam Blancas
Institut für Mathematik, Goethe Universität Frankfurt am Main, Germany. E-mail: blancas@math.uni-frankfurt.de. Box 11193, 60054 Frankfurt am Main.

June 11, 2017

Abstract

Simple nested coalescents were introduced in [1] to model, backwards in time, the genealogy of both species trees and gene trees. In this setting the Kingman case corresponds to binary coalescences in the species tree or binary coalescences in the gene tree. On the other hand, the two-level Fleming-Viot process with two-level selection arises in [2] as the limit in distribution of a multilevel multitype population undergoing mutation, selection, genetic drift and spatial migration. In this note we establish a duality relation between the Kingman nested coalescent and the two-level Fleming-Viot process associated with a two-level multitype population with genetic drift. Using this relation we can read off the genealogy backwards in time of a Kingman nested coalescent.

Keywords. Nested coalescent process, two-level Fleming-Viot process, function-valued dual process.

1 Introduction: models

1.1 The nested coalescent

In phylogenetics, gene trees provide a diagrammatic representation of evolutionary relationships. In this case the branching points are generated because a gene in the sample replicates and its copies are passed on to more than one offspring. In a corresponding evolution model, one can think of the genes as the individuals of a haploid population which reproduces in generations. On the other hand, species trees represent the genealogy of species, i.e. of populations of individuals that are capable of interbreeding. Since these individuals carry their own genetic information, gene trees can be viewed as being embedded in the species phylogeny. Therefore gene lineages are allowed to coalesce only while they are in the same branch of the species tree. In other words, if two genes $i$ and $j$ have not coalesced by time $t$ (backwards), they are not yet in ancestral relation. In addition, if these genes are associated with different species at time $t$, then they are not yet in relation by species. Using this heuristic, a nested partition is formally defined in [1]. Moreover, with the aim of providing a probabilistic model for the dynamics of gene and species trees, the same work defines a family of Markov processes with values in the space of nested bivariate partitions of $\mathbb{N}$, called simple nested exchangeable coalescents (snec for short). A typical element in the state space of nested coalescents is denoted by $\pi = (\pi^s, \pi^g)$. We agree that the blocks of the species partition and of the gene partition are enumerated in increasing order of their least element. Proposition 1 of [1] establishes that there exists a partition $\bar\pi$, called the link partition, which allows us to write the species partition in terms of the labels of the gene partition through the coagulation operator, that is, $\pi^s = \mathrm{Coag}(\pi^g, \bar\pi)$. Namely, the elements of the $i$-th block of the link partition are the labels of the gene blocks nested in the $i$-th species, i.e. if $j \in \bar\pi_i$ then the block $\pi^g_j$ is nested in $\pi^s_i$.

Example 1. An example of a nested partition of $[10]$ is $\pi = (\pi^s, \pi^g)$ where
\[ \pi^s = (\{1, 3, 5\}, \{2, 6, 8, 9\}, \{4, 7, 10\}), \qquad \pi^g = (\{1\}, \{2\}, \{3, 5\}, \{4, 7\}, \{6, 9\}, \{8\}, \{10\}). \]
The corresponding link partition is $\bar\pi = (\{1, 3\}, \{2, 5, 6\}, \{4, 7\})$.

According to [1], the snec transitions are characterized by a measure $\Sigma$ on $[0, 1] \times \mathcal{M}_1([0, 1])$. The case $\Sigma = \delta_0 \otimes \delta_{\delta_0}$ is the so-called nested Kingman coalescent $(K_t : t \ge 0)$. In this process either a pair of species merges or a pair of genes merges inside one species. More precisely, two types of transitions are possible: (i) at rate $\gamma_1$, two blocks of the gene partition nested in the same species merge into a single block; (ii) at rate $\gamma_2$, two integers $i < j$ are picked and the species blocks labelled $i$ and $j$ merge, that is, every gene block nested in species $j$ has its label changed from $j$ to $i$.

We point out that the Kingman nested coalescent describes the genealogy of a two-level coalescent (see subsection 3.4.3 in [2] for a general description) whose state space is the set of marked partitions $\{(\rho, \zeta(\rho)) : \rho \in \mathcal{P}, \ \zeta(\rho) \in \mathbb{N}^{|\rho|}\}$, where $\mathcal{P}$ denotes the set of partitions of $\mathbb{N}$, and in which the blocks of the partition $\rho$ jump to a new unoccupied site at rate zero. In a two-level coalescent the marks represent the positions in $\mathbb{N}$ of the blocks of $\rho$. We can therefore view the partition $\rho$ as the gene partition $\pi^g$ and the marks as the labels of the species blocks in which the gene blocks are nested. Moreover, the two descriptions correspond as follows:
\[ \{\zeta(i) : i = 1, \dots, |\rho|\} \leftrightarrow \{\eta(\pi^g_i) : i = 1, \dots, |\pi^g|\}, \]
\[ |\zeta| = |\{k \in \mathbb{N} : \zeta(i) = k \text{ for some } i \in \{1, \dots, |\rho|\}\}| \leftrightarrow |\pi^s| = |\bar\pi|, \qquad n_i(\rho) \leftrightarrow |\bar\pi_i|, \]
where $\eta$ is the nest function introduced in [1], that is, $\eta(\pi^g_i) = j$ implies $\pi^g_i \subseteq \pi^s_j$, and $n_i(\rho)$ denotes the number of blocks of $\rho$ with mark $\zeta = i$.

The transitions of a two-level coalescent occur at each single level. At level I, two blocks of the partition $\rho$ merge at rate $\gamma_1$ if they are at the same site. At level II, all the blocks at two sites combine to form a single site containing all these blocks, at rate $\gamma_2$. Therefore, given $(|\zeta(\rho)|, (n_1, \dots, n_{|\zeta(\rho)|})) = (m, (n_1, \dots, n_m))$, the possible transitions are
\[ (m, (n_1, \dots, n_m)) \to (m, (n_1, \dots, n_i - 1, \dots, n_m)) \quad \text{at rate } \frac{\gamma_1}{2}\, n_i (n_i - 1), \]
\[ (m, (n_1, \dots, n_m)) \to (m - 1, (n_1, \dots, n_{i-1}, n_i + n_j, n_{i+1}, \dots, n_{j-1}, n_{j+1}, \dots, n_m)) \quad \text{at rate } \frac{\gamma_2}{2}\, m (m - 1). \]
These transitions coincide with the transitions of a sample with $m$ species and $n_i$ gene blocks in the $i$-th species, i.e. $(|\pi^s|, (|\bar\pi_1|, \dots, |\bar\pi_{|\pi^s|}|)) = (m, (n_1, \dots, n_m))$.
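The block-counting transitions above can be illustrated with a small Gillespie-type simulation. This is only a sketch under stated assumptions: the function names are hypothetical, gene mergers inside species $i$ are taken to occur at total rate $\gamma_1 n_i(n_i-1)/2$, and species mergers at total rate $\gamma_2 m(m-1)/2$ with the merging pair chosen uniformly, which is one reading of the rates in the text.

```python
import random

def transition_rates(n, gamma1, gamma2):
    """Rates out of the state (m, (n_1, ..., n_m)), with m = len(n).

    Assumption: each unordered pair of gene blocks in the same species
    merges at rate gamma1, each unordered pair of species at rate gamma2.
    """
    m = len(n)
    gene_rates = [gamma1 * k * (k - 1) / 2 for k in n]
    species_rate = gamma2 * m * (m - 1) / 2
    return gene_rates, species_rate

def step(n, gamma1, gamma2, rng):
    """One jump of the block-counting chain; returns (waiting time, new state)."""
    gene_rates, species_rate = transition_rates(n, gamma1, gamma2)
    # Keep only events with positive rate.
    events = [("gene", i, r) for i, r in enumerate(gene_rates) if r > 0]
    if species_rate > 0:
        events.append(("species", None, species_rate))
    if not events:
        return None  # absorbed: one species containing one gene block
    weights = [r for _, _, r in events]
    dt = rng.expovariate(sum(weights))
    kind, i, _ = rng.choices(events, weights=weights)[0]
    new = list(n)
    if kind == "gene":
        new[i] -= 1                      # two gene blocks of species i merge
    else:
        i, j = sorted(rng.sample(range(len(n)), 2))
        new[i] += new[j]                 # species j is absorbed into species i
        del new[j]
    return dt, new

def run_to_absorption(n, gamma1, gamma2, seed=0):
    """Simulate until no transition is possible; returns (time, jumps, state)."""
    rng = random.Random(seed)
    t, jumps, state = 0.0, 0, list(n)
    while True:
        out = step(state, gamma1, gamma2, rng)
        if out is None:
            return t, jumps, state
        dt, state = out
        t += dt
        jumps += 1
```

Starting from $m$ species with $n_1 + \dots + n_m$ gene blocks in total, every path needs exactly $(n_1 + \dots + n_m - 1) + (m - 1)$ jumps to reach the absorbing state, whatever the order of the mergers.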

Gene trees and species trees can be viewed as a hierarchy of two levels: individuals and colonies. Recently, a class of multilevel measure-valued processes, which model ensembles of subpopulations with mutation and selection at the subpopulation level and possible death and replacement of subpopulations, has been studied in [2]. The main goal of this work is to provide a representation for the semigroup of a two-level Fleming-Viot process in terms of the Kingman nested coalescent. In the following subsection we introduce the model described in [2].

1.2 The two-level Fleming-Viot

Dawson [2] analyzes a class of measure-valued processes which model a multilevel multitype population undergoing mutation, selection, genetic drift and spatial migration. In this manuscript we focus only on a two-level population with random genetic drift, via resampling, at both levels. To be precise, we consider a finite multitype population allocated in $M$ colonies of fixed size $N$. The type space is $I = \{1, \dots, K\}$ and $N_i$ denotes the number of individuals of type $i$ in a colony. Hence the size of a typical colony and its type distribution can be written as
\[ N = \sum_{i=1}^{K} N_i, \qquad Z := \frac{1}{N} \sum_{i=1}^{K} N_i \delta_i, \]
where $Z$ is an element of $\Delta$, the space of probability measures on $\{1, \dots, K\}$. Let us now describe the genetic drift, writing $x_i$ for the frequency of type $i$ in the colony under consideration.

Individual level: at rate $\gamma_1$ an individual of type $i$ is chosen with probability $x_i$ and replaced by a type $j$ individual, chosen with probability $x_j$ from the remaining $N - 1$ individuals.

Colony level: at rate $\gamma_2$ the $i$-th colony is chosen according to its type distribution
\[ \mu_i(t) = \frac{1}{N} \sum_{j=1}^{K} n_{ij}(t)\, \delta_j, \]
where $n_{ij}(t)$ denotes the number of individuals of type $j$ in the $i$-th colony at time $t$, and replaced by one of the remaining $M - 1$ colonies.

Therefore, the dynamics of the individuals in each colony is given by a continuous-time Markov chain $(V(t) : t \ge 0)$ taking values in $\Delta$, with transitions
\[ \nu \to \nu - \frac{1}{N} \delta_i + \frac{1}{N} \delta_j \quad \text{at rate } (N - 1) \frac{\gamma_1}{2}\, x_i x_j, \quad \text{for } i, j \in I. \]
In turn, the dynamics of the colonies is described by the continuous-time $(\Delta)^M$-valued Markov chain $\mu(t) = (\mu_1(t), \dots, \mu_M(t))$, $t \ge 0$. It will be convenient to think of $\mu(t)$ as the empirical measure
\[ \Xi^{N,M}_t = \frac{1}{M} \sum_{i=1}^{M} \delta_{\mu_i(t)} \in \mathcal{P}(\Delta). \tag{1} \]
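The finite two-level resampling dynamics and the empirical measure (1) can be sketched in a few lines. The sketch below is not the construction of [2]: it assumes, for illustration only, that each ordered pair of individuals in a colony resamples at rate $\gamma_1/2$ and each ordered pair of colonies at rate $\gamma_2/2$, and all function and parameter names are hypothetical.

```python
import random
from collections import Counter

def simulate_two_level_moran(M, N, K, gamma1, gamma2, t_end, seed=0):
    """Two-level Moran-type resampling with M colonies of N individuals.

    Assumed rates (illustrative): every ordered pair of individuals in the
    same colony resamples at rate gamma1/2, every ordered pair of colonies
    at rate gamma2/2. Types are the integers 0, ..., K-1.
    """
    rng = random.Random(seed)
    colonies = [[rng.randrange(K) for _ in range(N)] for _ in range(M)]
    rate_ind = gamma1 / 2 * M * N * (N - 1)   # total individual-level rate
    rate_col = gamma2 / 2 * M * (M - 1)       # total colony-level rate
    total = rate_ind + rate_col
    t = 0.0
    while total > 0:
        t += rng.expovariate(total)
        if t > t_end:
            break
        if rate_col == 0 or rng.random() * total < rate_ind:
            # Individual level: a copies the type of b inside one colony.
            c = rng.randrange(M)
            a, b = rng.sample(range(N), 2)
            colonies[c][a] = colonies[c][b]
        else:
            # Colony level: colony a is replaced by a copy of colony b.
            a, b = rng.sample(range(M), 2)
            colonies[a] = list(colonies[b])
    return colonies

def empirical_measure(colonies, K):
    """Empirical measure (1): maps each colony type distribution mu,
    stored as a tuple of K frequencies, to its weight among the colonies."""
    M, N = len(colonies), len(colonies[0])
    mus = []
    for col in colonies:
        counts = Counter(col)
        mus.append(tuple(counts[k] / N for k in range(K)))
    return {mu: c / M for mu, c in Counter(mus).items()}
```

For example, `empirical_measure(simulate_two_level_moran(4, 5, 3, 1.0, 0.5, 2.0), 3)` returns a dictionary whose keys are points of the simplex and whose values sum to one.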

Then its corresponding transitions are given by
\[ \xi \to \xi + \frac{1}{M} \left( -\delta_{\mu_i} + \delta_{\mu_j} \right) \quad \text{at rate } \frac{\gamma_2}{2} (M - 1)\, \xi(d\mu_i)\, \xi(d\mu_j), \tag{2} \]
\[ \xi \to \xi + \frac{1}{M} \left( \delta_{\mu - \frac{\delta_i}{N} + \frac{\delta_j}{N}} - \delta_{\mu} \right) \quad \text{at rate } (N - 1) \frac{\gamma_1}{2}\, x_i x_j\, \xi(d\mu). \tag{3} \]
In [2], the asymptotic behavior of the process $(\Xi^{N,M}_t : t \ge 0)$ is investigated in three different stages. Firstly, for a single colony when its size $N \to \infty$. This leads in the limit to a $\Delta$-valued process called the $K$-type Fleming-Viot process, which is the unique solution of a well-posed martingale problem; see Proposition 2.1 of [2] for the description of the generator. In the next stage, a càdlàg strong Markov process with state space $(\Delta)^{N}$ arises as the limit of a population with a finite number of colonies with size $N \to \infty$. Finally, letting both $N \to \infty$ and the number of colonies $M \to \infty$, the author shows that there exists a process $(\Xi_t : t \ge 0)$ with values in the space of functions on $\mathcal{P}(\Delta)$ which is the unique solution $\{P_\xi : \xi \in \mathcal{P}(\Delta)\}$ to the martingale problem for
\[ A H(\xi) = \int A_1 \frac{\delta H(\xi)}{\delta \xi(\mu)}\, \xi(d\mu) + \frac{\gamma_2}{2} \int\!\!\int \frac{\delta^2 H(\xi)}{\delta \xi(\mu_1)\, \delta \xi(\mu_2)} \left( \xi(d\mu_1)\, \delta_{\mu_1}(d\mu_2) - \xi(d\mu_1)\, \xi(d\mu_2) \right), \]
where $A_1$ denotes the generator of a $K$-type Fleming-Viot process and
\[ H(\xi) = \prod_{j=1}^{m} \int \left[ \int_{I^{n_j}} h_j(x_1, \dots, x_{n_j})\, \mu_j^{\otimes n_j}(dx_1, \dots, dx_{n_j}) \right] \xi(d\mu_j), \tag{4} \]
with $m$ the number of colonies in the sample and $n_j$ the number of individuals sampled in the $j$-th colony. The process $\Xi$ is the two-level Fleming-Viot process. The reader is referred to Theorem 1 of [2] for the definition of the two-level Fleming-Viot process with two-level selection.

An important tool in our analysis is duality. Namely, we will need the dual process $(G_t : t \ge 0)$ of the two-level Fleming-Viot process. According to [2] it takes values in $\mathcal{G}$, the algebra containing functions of the form
\[ G_{\mathbf{f}}(\xi) = \prod_{j=1}^{m} \int \left[ \prod_{k=1}^{n_j} \int_I f_{jk}(x_k)\, \mu_j(dx_k) \right] \xi(d\mu_j), \tag{5} \]
where $\mathbf{f} := (f_1, \dots, f_m)$ with $f_j(x_1, \dots, x_{n_j}) = (f_{j1}(x_1), \dots, f_{jn_j}(x_{n_j}))$ and $f_{jk} : I \to \mathbb{R}$. The dynamics of its generator includes a jump at rate $\gamma_1$ to the state $G_{\mathbf{f}^j_{kl}}(\xi)$, where
\[ \mathbf{f}^j_{kl} = (f_1, \dots, f_{j-1}, f^j_{kl}, f_{j+1}, \dots, f_m). \tag{6} \]
We take
\[ f^j_{kl}(x_1, \dots, x_{n_j}) = \big( f_{j1}(x_1), \dots, f_{j,k-1}(x_{k-1}),\ f_{jk}(x_k) f_{jl}(x_k),\ f_{j,k+1}(x_{k+1}), \dots, f_{j,l-1}(x_{l-1}),\ f_{j,l+1}(x_{l+1}), \dots, f_{jn_j}(x_{n_j}) \big). \tag{7} \]
Therefore,
\[ G_{\mathbf{f}^j_{kl}}(\xi) = \int \left[ \int_I f_{jk}(x_k) f_{jl}(x_k)\, \mu_j(dx_k) \prod_{v \in [n_j] \setminus \{k, l\}} \int_I f_{jv}(x_v)\, \mu_j(dx_v) \right] \xi(d\mu_j) \ \prod_{u \in [m] \setminus \{j\}} \int \left[ \prod_{v=1}^{n_u} \int_I f_{uv}(x_v)\, \mu_u(dx_v) \right] \xi(d\mu_u). \]
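To make the function-valued state concrete, a function of the form (5) can be evaluated against a finitely supported $\xi$, and the gene-level jump (6)-(7) applied to it. The following is a minimal sketch with hypothetical names and data layout: $\xi$ is a list of pairs (type distribution, weight), each $f_{jk}$ is a tuple of $K$ values, and indices are 0-based.

```python
from math import prod

def G(f, xi):
    """Evaluate (5) for f = [f_1, ..., f_m], f_j = [f_j1, ..., f_jn_j],
    against xi given as a list of (mu, weight) pairs, mu a tuple of K
    probabilities. Each f_jk is a tuple (f_jk(1), ..., f_jk(K))."""
    value = 1.0
    for f_j in f:
        # Integrate over the colony type mu_j drawn from xi.
        value *= sum(
            w * prod(sum(a * p for a, p in zip(f_jk, mu)) for f_jk in f_j)
            for mu, w in xi
        )
    return value

def merge_genes(f, j, k, l):
    """Gene-level jump (6)-(7): in species j, replace f_jk by the pointwise
    product f_jk * f_jl and drop f_jl (0-based indices, k != l)."""
    k, l = min(k, l), max(k, l)
    f_j = list(f[j])
    f_j[k] = tuple(a * b for a, b in zip(f_j[k], f_j[l]))
    del f_j[l]
    return f[:j] + [f_j] + f[j + 1:]
```

With $K = 2$, a single colony type $\mu = (1/2, 1/2)$ and $f_{11} = f_{12} = \mathbf{1}_{\{1\}}$, `G` returns $1/4$ before the jump (two independent draws of type 1) and $1/2$ after `merge_genes`, since the two sampled genes then share one ancestor.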

Similarly, at rate $\gamma_2$ the process $(G_t : t \ge 0)$ jumps to the state $G_{\mathbf{f}_{ij}}(\xi)$, where
\[ \mathbf{f}_{ij} = (f_1, \dots, f_{i-1}, f_{ij}, f_{i+1}, \dots, f_{j-1}, f_{j+1}, \dots, f_m), \tag{8} \]
with
\[ f_{ij}(x_1, \dots, x_{n_i + n_j}) = \big( f_{i1}(x_1), \dots, f_{in_i}(x_{n_i}),\ f_{j1}(x_{n_i + 1}), \dots, f_{jn_j}(x_{n_i + n_j}) \big). \tag{9} \]
Hence $G_{\mathbf{f}_{ij}}(\xi)$ is written as follows:
\[ G_{\mathbf{f}_{ij}}(\xi) = \int \left( \prod_{u=1}^{n_i} \int_I f_{iu}(x_u)\, \mu_i(dx_u) \prod_{u=1}^{n_j} \int_I f_{ju}(x_u)\, \mu_i(dx_u) \right) \xi(d\mu_i) \ \prod_{u \in [m] \setminus \{i, j\}} \int \left[ \prod_{v=1}^{n_u} \int_I f_{uv}(x_v)\, \mu_u(dx_v) \right] \xi(d\mu_u). \]
For the sake of completeness we now write the generator of the dual process $(G_t : t \ge 0)$:
\[ A G_{\mathbf{f}}(\xi) = \frac{\gamma_1}{2} \sum_{j=1}^{m} \sum_{k=1}^{n_j} \sum_{l \in [n_j] \setminus \{k\}} \left[ G_{\mathbf{f}^j_{kl}}(\xi) - G_{\mathbf{f}}(\xi) \right] + \frac{\gamma_2}{2} \sum_{i=1}^{m} \sum_{j \in [m] \setminus \{i\}} \left[ G_{\mathbf{f}_{ij}}(\xi) - G_{\mathbf{f}}(\xi) \right]. \]

Remark 1. The transitions of the dual process $(G_t : t \ge 0)$ correspond to the transitions of the nested Kingman coalescent $(K_t : t \ge 0)$. Indeed, we can associate the arguments of $f_j(x_1, \dots, x_{n_j})$ with the $j$-th species block, and the argument of $f_{ij}(x_j)$ with the $j$-th gene block of species $i$. Hence the transition to (6) corresponds to the coalescence of genes $k$ and $l$ in species $j$. Similarly, the transition to (8) describes the lumping of species $i$ and $j$.

The duality relation between the two-level Fleming-Viot process $(\Xi_t : t \ge 0)$ and the function-valued process $(G_t : t \ge 0)$ is formally established in Theorem 3 of [2], through the function $H : \mathcal{P}(\Delta) \times \mathcal{G} \to [0, 1]$ defined by
\[ H(G_{\mathbf{f}}, \xi) = \prod_{j=1}^{m} \int \left[ \prod_{k=1}^{n_j} \int_I f_{jk}(x_k)\, \mu_j(dx_k) \right] \xi(d\mu_j), \]
where $G_{\mathbf{f}}$ is the function defined in (5). Namely, the following duality relation holds:
\[ E_{\Xi_0}\big( H(G_0, \Xi_t) \big) = E_{G_0}\big( H(G_t, \Xi_0) \big). \tag{10} \]

2 Main result

In this section we use the duality relation (10) to establish that the two-level Fleming-Viot process $(\Xi_t : t \ge 0)$ can be conceived as the forward-in-time model for the Kingman nested coalescent $(K_t : t \ge 0)$. Namely, we define a function $h^\pi$, which is basically the function $h$ with coordinates structured by a nested partition, corresponding to the state space of the Kingman nested coalescent. For fixed $t > 0$, assume that $K_t = \pi \in \mathcal{N}_n$, the set of nested partitions of $[n]$.
Let $\bar\pi$ be the link partition associated with $\pi$ and $(s, g) := (|\pi^s|, |\pi^g|)$. We propose to associate the $i$-th coordinate of the function $h$ with the element in position $i$ of the partition $\pi$. In this direction we define $h^\pi : I^g \to \mathbb{R}_+$ as follows:
\[ h^\pi(y_{11}, \dots, y_{1|\bar\pi_1|}, \dots, y_{s1}, \dots, y_{s|\bar\pi_s|}) := h(x^\pi_1, \dots, x^\pi_n), \]
with variables $y_{jk}$, $j = 1, \dots, s$, $k = 1, \dots, |\bar\pi_j|$, where $x^\pi_i := y_{j(i)\kappa(i)}$, $i \in [n]$. Here $j(i)$ denotes the index of the block $\sigma(i)$ of $\pi^s$ to which $i$ belongs, and $\kappa(i)$ denotes the index, among the gene blocks nested in $\sigma(i)$, of the block of $\pi^g$ to which $i$ belongs.

Example 2. For the partition $\pi$ in Example 1, observe that $x^\pi_1 = y_{11}$, $x^\pi_3 = y_{12}$, $x^\pi_8 = y_{23}$, $x^\pi_{10} = y_{32}$. Continuing with this procedure we obtain
\[ h^\pi(y_{11}, y_{12}, y_{21}, y_{22}, y_{23}, y_{31}, y_{32}) = h(y_{11}, y_{21}, y_{12}, y_{31}, y_{12}, y_{22}, y_{31}, y_{23}, y_{22}, y_{32}). \]

We now observe that the equalities (4) and (5), and therefore $G_{\mathbf{f}}(\xi)$ and $h$, are related through the identity $h_j(x_1, \dots, x_{n_j}) = \prod_{k=1}^{n_j} f_{jk}(x_k)$. Putting the pieces together,
\[ H(h^\pi, \xi) = \int \cdots \int h^\pi(y_{11}, \dots, y_{1|\bar\pi_1|}, \dots, y_{s1}, \dots, y_{s|\bar\pi_s|}) \prod_{j=1}^{s} \left[ \prod_{k=1}^{|\bar\pi_j|} \mu_j(dy_{jk}) \right] \xi(d\mu_j). \tag{11} \]

Example 3. Continuing with Example 2,
\[ H(h^\pi, \xi) = \int \cdots \int h^\pi(y_{11}, y_{12}, y_{21}, y_{22}, y_{23}, y_{31}, y_{32})\, \mu_1(dy_{11}) \mu_1(dy_{12})\, \mu_2(dy_{21}) \mu_2(dy_{22}) \mu_2(dy_{23})\, \mu_3(dy_{31}) \mu_3(dy_{32})\, \xi(d\mu_1) \xi(d\mu_2) \xi(d\mu_3). \]

We are now able to establish our main result.

Theorem 1. The basic duality between the processes $(K_t : t \ge 0)$ and $(\Xi_t : t \ge 0)$ is given by
\[ E\big( H(h^{K_t}, \xi) \big) = E\big( H(h^{K_0}, \Xi_t) \big). \]

Thanks to the identity (10), it is not difficult to obtain the duality, so let us turn to describing what it shows. It will be convenient to view the species as colonies and the genes as individuals. We can now draw the species trees and gene trees in a single picture, namely a tree whose branches are painted with as many colours as there are colonies in the population, see Figure 1.

Let us consider a Kingman nested coalescent $(K_t : t \ge 0)$ starting from a population completely fragmented into colonies.
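The link partition of Example 1 and the index map $i \mapsto (j(i), \kappa(i))$ of Example 2 can be checked mechanically. The sketch below uses hypothetical function names and assumes the blocks of both partitions are given as Python sets, listed in increasing order of their least element as agreed in Section 1.1.

```python
def link_partition(species, genes):
    """Link partition: block i collects the (1-based) labels of the gene
    blocks nested in the i-th species block."""
    return [
        {label for label, g in enumerate(genes, start=1) if g <= s}
        for s in species
    ]

def index_map(species, genes):
    """Map each element i to (j(i), kappa(i)): the species index and the
    rank, within that species, of the gene block containing i."""
    x = {}
    for j, labels in enumerate(link_partition(species, genes), start=1):
        for kappa, label in enumerate(sorted(labels), start=1):
            for i in genes[label - 1]:
                x[i] = (j, kappa)
    return x

# Nested partition of [10] from Example 1.
species = [{1, 3, 5}, {2, 6, 8, 9}, {4, 7, 10}]
genes = [{1}, {2}, {3, 5}, {4, 7}, {6, 9}, {8}, {10}]

link = link_partition(species, genes)
x = index_map(species, genes)
# Arguments of h in Example 2, in the order x_1, ..., x_10.
arguments = [x[i] for i in range(1, 11)]
```

Here `link` equals `[{1, 3}, {2, 5, 6}, {4, 7}]`, and `arguments` lists the index pairs $(1,1), (2,1), (1,2), (3,1), (1,2), (2,2), (3,1), (2,3), (2,2), (3,2)$, in agreement with the right-hand side of Example 2.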
At time $t$, we choose at random, out of the total population, $n$ individuals from $m$ colonies, so that $K_t = \pi$, where $\pi^g$ is the trivial partition into singletons and $\pi^s$ is obtained by lumping together the individuals in the same colony. The Kingman nested coalescent describes the ancestral lineages, backward in time, of a population split up into colonies. Nevertheless, thanks to the duality relation we can view the evolution of the Kingman nested coalescent in the forward direction, and moreover this forward genealogy corresponds to the two-level Fleming-Viot process described in Section 1.2. We observe that the duality function $h^\pi$ fully characterizes both processes, because the class of functions of the form (4) uniquely determines probability measures on $\mathcal{P}(\Delta)$, and using the partition $\pi$ we can read off the configuration of the Kingman nested coalescent at every time.

Acknowledgements

AB acknowledges support from CONACyT. This work was undertaken whilst AB was on a postdoctoral year at Goethe University Frankfurt; she gratefully acknowledges the kind hospitality of the Institute of Mathematics. AB would especially like to thank Anton Wakolbinger for his very helpful advice and encouragement.

Figure 1: This figure shows the time evolution of a sample of 1 individuals which live in three different colonies.

References

[1] Airam Blancas, Amaury Lambert, Arno Siri-Jégousse (2017). Simple nested coalescents. Unpublished manuscript.
[2] Donald A. Dawson (2016). Multilevel mutation-selection systems and set-valued duals.