The nested Kingman coalescent: speed of coming down from infinity. by Jason Schweinsberg (University of California at San Diego)

Similar documents
Dynamics of the evolving Bolthausen-Sznitman coalescent. by Jason Schweinsberg University of California at San Diego.

A representation for the semigroup of a two-level Fleming Viot process in terms of the Kingman nested coalescent

Yaglom-type limit theorems for branching Brownian motion with absorption. by Jason Schweinsberg University of California San Diego

The genealogy of branching Brownian motion with absorption. by Jason Schweinsberg University of California at San Diego

The total external length in the evolving Kingman coalescent. Götz Kersting and Iulia Stanciu. Goethe Universität, Frankfurt am Main

arxiv: v1 [math.pr] 6 Mar 2018

Workshop on Stochastic Processes and Random Trees

Lecture 18 : Ewens sampling formula

ON COMPOUND POISSON POPULATION MODELS

Frequency Spectra and Inference in Population Genetics

6 Introduction to Population Genetics

6 Introduction to Population Genetics

Estimating Evolutionary Trees. Phylogenetic Methods

The Moran Process as a Markov Chain on Leaf-labeled Trees

The Combinatorial Interpretation of Formulas in Coalescent Theory

Evolution in a spatial continuum

Crump Mode Jagers processes with neutral Poissonian mutations

Challenges when applying stochastic models to reconstruct the demographic history of populations.

Endowed with an Extra Sense : Mathematics and Evolution

Supplementary Material to Full Likelihood Inference from the Site Frequency Spectrum based on the Optimal Tree Resolution

Mathematical Biology. Computing likelihoods for coalescents with multiple collisions in the infinitely many sites model. Matthias Birkner Jochen Blath

Asymptotic Genealogy of a Branching Process and a Model of Macroevolution. Lea Popovic. Hon.B.Sc. (University of Toronto) 1997

Comparative Genomics II

Probabilistic Graphical Models

Mean-field dual of cooperative reproduction

Surfing genes. On the fate of neutral mutations in a spreading population

arxiv: v1 [math.pr] 20 Dec 2017

From Individual-based Population Models to Lineage-based Models of Phylogenies

Demography April 10, 2015

The Λ-Fleming-Viot process and a connection with Wright-Fisher diffusion. Bob Griffiths University of Oxford

4.5 The critical BGW tree

Evolutionary Models. Evolutionary Models

Computational Systems Biology: Biology X

Statistics 992 Continuous-time Markov Chains Spring 2004

Measure-valued diffusions, general coalescents and population genetic inference 1

Mathematical models in population genetics II

Gene Genealogies Coalescence Theory. Annabelle Haudry Glasgow, July 2009

The Coalescence of Intrahost HIV Lineages Under Symmetric CTL Attack

Computing likelihoods under Λ-coalescents

Estimating effective population size from samples of sequences: inefficiency of pairwise and segregating sites as compared to phylogenetic estimates

Let (Ω, F) be a measureable space. A filtration in discrete time is a sequence of. F s F t

ASYMPTOTIC GENEALOGY OF A CRITICAL BRANCHING PROCESS. BY LEA POPOVIC University of California, Berkeley

STA 624 Practice Exam 2 Applied Stochastic Processes Spring, 2008

A phase-type distribution approach to coalescent theory

Reading for Lecture 13 Release v10

arxiv: v1 [math.pr] 29 Jul 2014

The range of tree-indexed random walk

Stochastic Demography, Coalescents, and Effective Population Size

L n = l n (π n ) = length of a longest increasing subsequence of π n.

Phylogenetic Networks, Trees, and Clusters

A comparison of two popular statistical methods for estimating the time to most recent common ancestor (TMRCA) from a sample of DNA sequences

3. The Voter Model. David Aldous. June 20, 2012

Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles

On trees with a given degree sequence

Taming the Beast Workshop

On the number of subtrees on the fringe of random trees (partly joint with Huilan Chang)

The mathematical challenge. Evolution in a spatial continuum. The mathematical challenge. Other recruits... The mathematical challenge

Statistics 150: Spring 2007

THE SPATIAL -FLEMING VIOT PROCESS ON A LARGE TORUS: GENEALOGIES IN THE PRESENCE OF RECOMBINATION

Kolmogorov Equations and Markov Processes

Theoretical Population Biology

Characterization of cutoff for reversible Markov chains

RW in dynamic random environment generated by the reversal of discrete time contact process

A CHARACTERIZATION OF ANCESTRAL LIMIT PROCESSES ARISING IN HAPLOID. Abstract. conditions other limit processes do appear, where multiple mergers of

ECE 301 Fall 2011 Division 1. Homework 1 Solutions.

Learning Session on Genealogies of Interacting Particle Systems

Lecture 4: State Estimation in Hidden Markov Models (cont.)

Evolutionary Tree Analysis. Overview

Incomplete Lineage Sorting: Consistent Phylogeny Estimation From Multiple Loci

Pathwise construction of tree-valued Fleming-Viot processes

By Jochen Blath 1, Adrián González Casanova 2, Noemi Kurt 1 and Maite Wilke-Berenguer 3 Technische Universität Berlin

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200 Spring 2018 University of California, Berkeley

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

The Heat Equation John K. Hunter February 15, The heat equation on a circle

C.DARWIN ( )

Discrete random structures whose limits are described by a PDE: 3 open problems

arxiv: v3 [math.pr] 30 May 2013

π b = a π a P a,b = Q a,b δ + o(δ) = 1 + Q a,a δ + o(δ) = I 4 + Qδ + o(δ),

Lecture 19 : Chinese restaurant process

stochnotes Page 1

The Contour Process of Crump-Mode-Jagers Branching Processes

Applied Stochastic Processes

ELEMENTS OF PROBABILITY THEORY

Counting All Possible Ancestral Configurations of Sample Sequences in Population Genetics

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Hard-Core Model on Random Graphs

Two viewpoints on measure valued processes

Bivariate Uniqueness in the Logistic Recursive Distributional Equation

The effects of a weak selection pressure in a spatially structured population

Endogeny for the Logistic Recursive Distributional Equation

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS

Bayesian Nonparametric Modelling of Genetic Variations using Fragmentation-Coagulation Processes

Hydrodynamics of the N-BBM process

MATH 425, FINAL EXAM SOLUTIONS

Introduction to Vector Functions

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley

Mathematical Models for Population Dynamics: Randomness versus Determinism

Branching Processes II: Convergence of critical branching to Feller s CSB

Lecture 3: Markov chains.

Wald Lecture 2 My Work in Genetics with Jason Schweinsbreg

Transcription:

The nested Kingman coalescent: speed of coming down from infinity by Jason Schweinsberg (University of California at San Diego) Joint work with: Airam Blancas Benítez (Goethe Universität Frankfurt) Tim Rogers (University of Bath) Arno Siri-Jégousse (Universidad Nacional Autónoma de México)

Coalescent Processes Coalescent processes: describe the genealogy of a collection of individuals, usually from the same species. Sample n individuals at random from a population. Follow their ancestral lines backwards in time. The lineages coalesce, until they are all traced back to a common ancestor. Represent by a stochastic process (Π(t), t 0) taking its values in the set P n of partitions of {1,..., n}. Kingman s Coalescent (Kingman, 198): Continuous-time Markov chain with state space P n. Only two lineages merge at a time. Each pair of lineages merges at rate one. 1 3 4 5 {1,, 3, 4, 5} {1, }, {3, 4, 5} {1, }, {3}, {4, 5} {1}, {}, {3}, {4, 5} {1}, {}, {3}, {4}, {5}

Number of lineages at small times Consider Kingman s coalescent started with n lineages. Let K n (t) be the number of lineages remaining after time t, which is number of blocks of Π(t). When there are j lineages, time until the next merger has an exponential distribution with rate ( ) j. For small t and large n, approximate K n (t) by the solution to which yields x (t) = x(t), x(0) = n, K n (t) t + /n. If n =, we have K (t) < for all t > 0, and (Aldous, 1991): lim tk (t) = a.s. t 0

Nested Coalescents Phylogenetics: the study of the evolutionary relationships among different species. Nested coalescents: sample individuals from different species and follow their ancestral lines backwards in time. Nested Kingman s coalescent: Sample n individuals from each of s species, with 1 n, 1 s. Each pair of lineages belonging to the same species merges at rate 1. Lineages from different species do not merge. Each pair of species merges into a single species at rate c > 0. Represent by a stochastic process (Π(t), t 0) taking its values in the set of labelled partitions of {1,..., ns}. Each block has a label, indicating the species to which the lineage belongs. A more general family of nested coalescents was introduced by Blancas Benítez, Duchamps, Lambert, and Siri-Jégousse (018), allowing for multiple mergers of ancestral lines.

Illustration of the nested Kingman s coalescent with s = 3 species and n = 4 individuals sampled from each species. The blue tree that records the mergers is sometimes called the species tree. The tree that records the genealogical relationship among the 1 sampled individuals is called the gene tree. Extensive literature on estimating species tree from gene tree.

Number of lineages at small times Let S(t) be the number of species remaining at time t. Then (S(t), t 0) has the same law as the number of blocks in Kingman s coalescent, with time scaled by c, so for small t, S(t) ct + /s. Let N(t) be the number of individual lineages at time t. We have N(t) = N 1 (t) + + N S(t) (t), where N i (t) is the number of individual lineages belonging to the ith species. How can we describe the behavior of N(t) when t is small? We focus on n = s =. If there were no species mergers, we would have N i (t) /t and N(t) 4/ct. The number of lineages belonging to a species makes a large upward jump after a species merger.

The distribution of N i (t) We try to approximate the distribution of N i (t) for small t by W/t, where W has some unknown distribution. The portion of the species tree between times 0 and t consists of S(t) subtrees. Consider the subtrees associated with the ith species at time t. time t time Ut The last branchpoint occurs at approximately time U t, where U has a uniform(0, 1) distribution. At that time, we merge two species carrying W 1 /(Ut) and W /(Ut) individual lineages.

A recursive distributional equation Those (W 1 + W )/Ut lineages merge according to Kingman s coalescent for time (1 U)t. Recalling that K n (t) t + /n, the number of these lineages remaining at time t is (1 U)t + Ut W 1 +W = 1 t 1 U ( 1 ). W 1 +W Thus, W satisfies the recursive distributional equation (RDE) W = d 1 U ( 1 ), W 1 +W where W 1, W, and U are independent, W 1 and W have the same distribution as W, and U has the uniform(0, 1) distribution. This RDE has unique solution (Aldous and Bandyopadhyay, 001).

The main result Let γ be the mean of the distribution satisfying the RDE. Then N(t) = N 1 (t) + + N S(t) (t) γ t S(t) γ t ct = γ ct. Simulations indicate γ 3.45. Theorem: Consider the nested Kingman s coalescent started with n = s =. We have t γ N(t) p, as t 0. c Theorem: Consider a sequence of nested Kingman s coalescent processes. The jth process has s j species and n j lineages sampled from each species, with n j s j. If 1/n j t j 1/s j, then t jn(t j ) s j p as j if 1/s j t j 1, then t j N(t j) p γ c as j.

Another approach Lambert and Schertzer (018) obtained our result by different methods. They approximate the distribution of N i (t) by writing down a partial differential equation for the density. When 1 s n, recall that N i (t) { /t if 1/n t 1/s W/t if 1/s t 1 We do not have results for t A/s. Lambert and Schertzer (018) have a PDE which should give the density of N i (t) for t A/s, but they have not yet rigorously proved the connection to the discrete model.