ON COMPOUND POISSON POPULATION MODELS


Martin Möhle, University of Tübingen (joint work with Thierry Huillet, Université de Cergy-Pontoise)
Workshop on Probability, Population Genetics and Evolution
Centre International de Rencontres Mathématiques (CIRM), Marseille-Luminy, France
June 11, 2012

Exchangeable population models (Cannings)
Non-overlapping generations $r \in \mathbb{Z} = \{\ldots, -1, 0, 1, \ldots\}$.
Population size $N$, i.e. $N$ individuals (genes, particles) in each generation.
$\nu_i^{(r)}$ := number of offspring of individual $i$ of generation $r$, so that $\nu_1^{(r)} + \cdots + \nu_N^{(r)} = N$.
Additional assumptions on the offspring:
Exchangeability: $(\nu_{\pi 1}^{(r)}, \ldots, \nu_{\pi N}^{(r)}) \overset{d}{=} (\nu_1^{(r)}, \ldots, \nu_N^{(r)})$ for all permutations $\pi$.
Homogeneity: the vectors $(\nu_1^{(r)}, \ldots, \nu_N^{(r)})$, $r \in \mathbb{Z}$, are i.i.d.
Avoid the trivial model ($\nu_i^{(r)} \equiv 1$). Define $(\nu_1, \ldots, \nu_N) := (\nu_1^{(0)}, \ldots, \nu_N^{(0)})$.
Examples.
Wright-Fisher: $(\nu_1, \ldots, \nu_N) \overset{d}{=} \mathrm{Multinomial}(N, \tfrac{1}{N}, \ldots, \tfrac{1}{N})$.
Moran: $(\nu_1, \ldots, \nu_N)$ = random permutation of $(2, \underbrace{1, \ldots, 1}_{N-2}, 0)$.

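Exchangeable population models (graphical representation)
[Figure: genealogy diagram for population size $N = 8$, with generations labelled 0 to 6 along an axis running between "Past" and "Future". Example offspring numbers shown: $\nu_3^{(6)} = 3$, $\nu_5^{(6)} = 1$, $\nu_6^{(6)} = 4$.]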

Conditional branching population models
Let $\xi_1, \xi_2, \ldots$ be independent random variables taking values in $\mathbb{N}_0 := \{0, 1, 2, \ldots\}$. Assume that $P(\xi_1 + \cdots + \xi_N = N) > 0$ for all $N \in \mathbb{N} := \{1, 2, \ldots\}$.
Perform the following two steps:
1. Conditioning: Let $\mu_1, \ldots, \mu_N$ be random variables whose joint distribution coincides with that of $\xi_1, \ldots, \xi_N$ conditioned on the event that $\xi_1 + \cdots + \xi_N = N$.
2. Shuffling: Let $\nu_1, \ldots, \nu_N$ be the $\mu_1, \ldots, \mu_N$ randomly permuted.
Conditional branching process models are particular Cannings models with offspring variables $\nu_1, \ldots, \nu_N$ constructed as above (Karlin and McGregor, 1964).
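
The two steps above can be sketched by plain rejection sampling. This is only an illustration under stated assumptions: the function names `sample_offspring` and `draw_geometric` are hypothetical, and the geometric law for $\xi_n$ is an arbitrary choice made so that the conditioning event is hit reasonably often.

```python
import random

def sample_offspring(N, draw_xi, rng, max_tries=100_000):
    """Return (nu_1, ..., nu_N) by conditioning on the sum and then shuffling."""
    for _ in range(max_tries):
        xi = [draw_xi(n, rng) for n in range(1, N + 1)]
        if sum(xi) == N:       # step 1: keep only draws with xi_1 + ... + xi_N = N
            rng.shuffle(xi)    # step 2: random permutation of (mu_1, ..., mu_N)
            return xi
    raise RuntimeError("conditioning event not hit; increase max_tries")

def draw_geometric(n, rng, p=0.5):
    """Hypothetical law for xi_n: geometric on {0, 1, ...} with success probability p."""
    k = 0
    while rng.random() > p:
        k += 1
    return k

rng = random.Random(2012)
nu = sample_offspring(8, draw_geometric, rng)
print(nu, sum(nu))
```

Rejection sampling is exact but becomes slow when $P(\xi_1 + \cdots + \xi_N = N)$ is small, so it is suitable only for small experiments.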

Compound Poisson population models
Compound Poisson population models are particular conditional branching process models for which $\xi_n$ has p.g.f.
$$E(x^{\xi_n}) = \exp\big(-\theta_n(\phi(z) - \phi(zx))\big), \quad |x| \le 1,\ n \in \mathbb{N},$$
with parameters $0 < \theta_n < \infty$ and with a power series $\phi$ of the form
$$\phi(z) = \sum_{m=1}^{\infty} \phi_m \frac{z^m}{m!}, \quad |z| < r,$$
with positive radius $r$ of convergence and $\phi_m \ge 0$, $\phi_1 > 0$.

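Distribution and factorial moments of µ
Notation: Taylor expansion
$$\exp(\theta\phi(z)) = \sum_{k=0}^{\infty} \sigma_k(\theta) \frac{z^k}{k!}, \quad |z| < r.$$
(The coefficients $\sigma_k(\theta)$ can be computed recursively.)
Distribution of $\mu$:
$$P(\mu_1 = j_1, \ldots, \mu_N = j_N) = \frac{N!}{j_1! \cdots j_N!} \, \frac{\sigma_{j_1}(\theta_1) \cdots \sigma_{j_N}(\theta_N)}{\sigma_N\big(\sum_{n=1}^{N} \theta_n\big)}$$
($j_1, \ldots, j_N \in \mathbb{N}_0$ with $j_1 + \cdots + j_N = N$).
Factorial moments of $\mu$:
$$E\big((\mu_1)_{k_1} \cdots (\mu_N)_{k_N}\big) = \frac{N!}{\sigma_N\big(\sum_{n=1}^{N} \theta_n\big)} \sum_{\substack{j_1 \ge k_1, \ldots, j_N \ge k_N \\ j_1 + \cdots + j_N = N}} \frac{\sigma_{j_1}(\theta_1) \cdots \sigma_{j_N}(\theta_N)}{(j_1 - k_1)! \cdots (j_N - k_N)!}$$
($k_1, \ldots, k_N \in \mathbb{N}_0$).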
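
The slide does not say which recursion is meant. One standard choice, assumed here, comes from differentiating $\exp(\theta\phi(z))$, which gives $\sigma_0(\theta) = 1$ and $\sigma_{k+1}(\theta) = \theta \sum_{l=0}^{k} \binom{k}{l} \phi_{k-l+1} \sigma_l(\theta)$. The sketch below uses this recursion to evaluate the distribution of $\mu$ exactly; the names `sigma` and `pmf_mu` are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def sigma(theta, phi, K):
    """Return [sigma_0(theta), ..., sigma_K(theta)]; phi[m-1] holds phi_m (needs K entries)."""
    s = [Fraction(1)]
    for k in range(K):
        # sigma_{k+1} = theta * sum_l C(k, l) * phi_{k-l+1} * sigma_l
        s.append(theta * sum(comb(k, l) * phi[k - l] * s[l] for l in range(k + 1)))
    return s

def pmf_mu(js, thetas, phi):
    """P(mu_1 = j_1, ..., mu_N = j_N) for j_1 + ... + j_N = N = len(js)."""
    N = sum(js)
    prob = Fraction(factorial(N))
    for j, th in zip(js, thetas):
        prob *= sigma(th, phi, N)[j] / factorial(j)
    return prob / sigma(sum(thetas), phi, N)[N]

# Wright-Fisher case phi(z) = z with N = 3 and equal parameters.
wf = pmf_mu([1, 1, 1], [Fraction(1)] * 3, [1, 0, 0])
# Coefficients for phi_m = (m-1)! and theta = 2.
dirichlet_sigma = sigma(Fraction(2), [1, 1, 2], 3)
print(wf, dirichlet_sigma)
```

Exact rational arithmetic via `fractions.Fraction` avoids rounding problems in the ratio of large coefficients, at the cost of speed for large $N$.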

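A subclass of compound Poisson models
We focus on the subclass of compound Poisson models satisfying
$$\frac{\sigma_{k+1}(\theta)}{\sigma_k(\theta)} + \frac{\sigma_{k'+1}(\theta')}{\sigma_{k'}(\theta')} = \frac{\sigma_{k+k'+1}(\theta + \theta')}{\sigma_{k+k'}(\theta + \theta')}, \quad k, k' \in \mathbb{N}_0,\ \theta, \theta' \in (0, \infty). \qquad (*)$$
Lemma. A compound Poisson model satisfies $(*)$ if and only if $\phi_m = (m-1)!\,\phi_1 (\phi_2/\phi_1)^{m-1}$ for all $m \in \mathbb{N}$. (These are essentially Wright-Fisher models and Dirichlet models.)
If $(*)$ holds then $\mu$ has factorial moments
$$E\big((\mu_1)_{k_1} \cdots (\mu_N)_{k_N}\big) = (N)_{k_1 + \cdots + k_N} \, \frac{\sigma_{k_1}(\theta_1) \cdots \sigma_{k_N}(\theta_N)}{\sigma_{k_1 + \cdots + k_N}(\theta_1 + \cdots + \theta_N)}$$
($k_1, \ldots, k_N \in \mathbb{N}_0$).
Assumption. In the following it is always assumed that $(*)$ holds.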
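
The lemma can be spot-checked numerically. The sketch below tests the identity $(*)$ for small indices only, so it illustrates the lemma rather than proves it. It assumes the recursion $\sigma_{k+1}(\theta) = \theta \sum_{l} \binom{k}{l} \phi_{k-l+1} \sigma_l(\theta)$ obtained by differentiating $\exp(\theta\phi(z))$; all function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def sigma(theta, phi, K):
    """Return [sigma_0(theta), ..., sigma_K(theta)]; phi[m-1] holds phi_m."""
    s = [Fraction(1)]
    for k in range(K):
        s.append(theta * sum(comb(k, l) * phi[k - l] * s[l] for l in range(k + 1)))
    return s

def lemma_phi(phi1, phi2, M):
    """Coefficients phi_m = (m-1)! * phi_1 * (phi_2/phi_1)^(m-1), m = 1..M."""
    ratio = Fraction(phi2) / phi1
    return [factorial(m - 1) * phi1 * ratio ** (m - 1) for m in range(1, M + 1)]

def star_holds(phi, th1, th2, K):
    """Check (*) for all 0 <= k, k' <= K; phi must have at least 2K+1 entries."""
    s1 = sigma(th1, phi, K + 1)
    s2 = sigma(th2, phi, K + 1)
    s12 = sigma(th1 + th2, phi, 2 * K + 1)
    return all(
        s1[k + 1] / s1[k] + s2[kp + 1] / s2[kp] == s12[k + kp + 1] / s12[k + kp]
        for k in range(K + 1)
        for kp in range(K + 1)
    )

ok_lemma = star_holds(lemma_phi(1, 1, 7), Fraction(1), Fraction(1, 2), 3)
ok_other = star_holds([1] * 7, Fraction(1), Fraction(1), 3)  # phi(z) = e^z - 1
print(ok_lemma, ok_other)
```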

Ancestral process
Take a sample of $n$ ($\le N$) individuals from some generation and consider their ancestors.
$(i, j) \in R_t = R_t^{(n)}$ :⇔ individuals $i$ and $j$ have a common ancestor $t$ generations backwards in time.
The ancestral process $R := (R_t)_{t = 0, 1, \ldots}$ is Markovian with state space $\mathcal{E}_n$ (the set of equivalence relations on $\{1, \ldots, n\}$).
Transition probabilities:
$$P(R_{t+1} = \eta \mid R_t = \xi) = \Phi_j^{(N)}(k_1, \ldots, k_j), \quad \xi, \eta \in \mathcal{E}_n \text{ with } \xi \subseteq \eta,$$
with
$$\Phi_j^{(N)}(k_1, \ldots, k_j) := \frac{1}{\sigma_{k_1 + \cdots + k_j}\big(\sum_{n=1}^{N} \theta_n\big)} \sum_{\substack{n_1, \ldots, n_j = 1 \\ \text{all distinct}}}^{N} \sigma_{k_1}(\theta_{n_1}) \cdots \sigma_{k_j}(\theta_{n_j}),$$
where $j := |\eta|$ = number of blocks of $\eta$, and $k_1, \ldots, k_j$ := group sizes of merging classes of $\xi$ (so $k_1 + \cdots + k_j = |\xi|$).

Two basic transition probabilities
Notation: For $N, k \in \mathbb{N}$ define $\Theta_k(N) := \sum_{n=1}^{N} \theta_n^k$.
$c_N$ := coalescence probability := $P$(2 individuals have same parent)
$$c_N = \Phi_1^{(N)}(2) = \frac{1}{\sigma_2(\Theta_1(N))} \sum_{n=1}^{N} \sigma_2(\theta_n) = \frac{\phi_2 \Theta_1(N) + \phi_1^2 \Theta_2(N)}{\phi_2 \Theta_1(N) + \phi_1^2 (\Theta_1(N))^2}.$$
We also need
$$d_N := P(\text{3 individuals have same parent}) = \Phi_1^{(N)}(3) = \frac{\phi_3 \Theta_1(N) + 3\phi_1\phi_2 \Theta_2(N) + \phi_1^3 \Theta_3(N)}{\phi_3 \Theta_1(N) + 3\phi_1\phi_2 (\Theta_1(N))^2 + \phi_1^3 (\Theta_1(N))^3}.$$
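
As a small worked example, the closed forms for $c_N$ and $d_N$ can be evaluated exactly for given parameters. This is a sketch assuming $(*)$ holds, as on the slides; the function names are hypothetical, and the parameter values (ten equal $\theta_n = 1$) are chosen only for illustration.

```python
from fractions import Fraction

def power_sum(thetas, k):
    """Theta_k(N) = sum of theta_n^k over n = 1..N."""
    return sum(t ** k for t in thetas)

def c_N(thetas, phi1, phi2):
    """Probability that 2 individuals share a parent (closed form from the slide)."""
    T1, T2 = power_sum(thetas, 1), power_sum(thetas, 2)
    return (phi2 * T1 + phi1**2 * T2) / (phi2 * T1 + phi1**2 * T1**2)

def d_N(thetas, phi1, phi2, phi3):
    """Probability that 3 individuals share a parent (closed form from the slide)."""
    T1, T2, T3 = (power_sum(thetas, k) for k in (1, 2, 3))
    num = phi3 * T1 + 3 * phi1 * phi2 * T2 + phi1**3 * T3
    den = phi3 * T1 + 3 * phi1 * phi2 * T1**2 + phi1**3 * T1**3
    return num / den

thetas = [Fraction(1)] * 10
c_wf, d_wf = c_N(thetas, 1, 0), d_N(thetas, 1, 0, 0)      # phi(z) = z
c_dir, d_dir = c_N(thetas, 1, 1), d_N(thetas, 1, 1, 2)    # phi_m = (m-1)!
print(c_wf, d_wf, c_dir, d_dir)
```

In the symmetric Wright-Fisher case this gives $c_N = 1/N$ and $d_N = 1/N^2$, the familiar values for the classical model.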

Exchangeable coalescent processes
Exchangeable coalescents are discrete-time or continuous-time Markov processes $\Pi = (\Pi_t)_t$ with state space $\mathcal{E}$, the set of equivalence relations (partitions) on $\mathbb{N} := \{1, 2, \ldots\}$.
During each transition, equivalence classes (blocks) merge together. Simultaneous multiple collisions of blocks are allowed.
Schweinsberg (2000) characterizes these processes via a finite measure $\Xi$ on the infinite simplex
$$\Delta := \Big\{x = (x_1, x_2, \ldots) : x_1 \ge x_2 \ge \cdots \ge 0,\ |x| := \sum_{i=1}^{\infty} x_i \le 1\Big\}.$$
These processes are therefore also called $\Xi$-coalescents.

Domain of attraction
For $n \in \mathbb{N}$ let $\varrho_n : \mathcal{E} \to \mathcal{E}_n$ denote the restriction map from $\mathcal{E}$ to $\mathcal{E}_n$, the set of equivalence relations on $\{1, \ldots, n\}$.
Definition. We say that the considered population model is in the domain of attraction of a continuous-time coalescent $\Pi = (\Pi_t)_{t \ge 0}$ if, for each sample size $n \in \mathbb{N}$, the time-scaled ancestral process $(R^{(n)}_{[t/c_N]})_{t \ge 0}$ converges weakly to $(\varrho_n \Pi_t)_{t \ge 0}$ as $N \to \infty$.
We say that the considered population model is in the domain of attraction of a discrete-time coalescent $\Pi = (\Pi_t)_{t = 0, 1, \ldots}$ if, for each sample size $n \in \mathbb{N}$, the ancestral process $(R^{(n)}_t)_{t = 0, 1, \ldots}$ converges weakly to $(\varrho_n \Pi_t)_{t = 0, 1, \ldots}$ as $N \to \infty$.

Results (Regime 1)
Theorem 1. Suppose that $(*)$ holds. If $\sum_{n=1}^{\infty} \theta_n < \infty$, then the compound Poisson population model is in the domain of attraction of a discrete-time $\Xi$-coalescent.
Characterization of $\Xi$. There exists a consistent sequence $(Q_j)_{j \in \mathbb{N}}$ of probability measures $Q_j$ on the $j$-simplex $\Delta_j := \{(x_1, \ldots, x_j) \in [0, 1]^j : x_1 + \cdots + x_j \le 1\}$, uniquely determined via their moments
$$\int_{\Delta_j} x_1^{k_1} \cdots x_j^{k_j} \, Q_j(dx_1, \ldots, dx_j) = \frac{\sigma_{k_1}(\theta_1) \cdots \sigma_{k_j}(\theta_j)}{\sigma_{k_1 + \cdots + k_j}\big(\sum_{n=1}^{\infty} \theta_n\big)}, \quad k_1, \ldots, k_j \in \mathbb{N}_0.$$
Let $Q$ denote the projective limit of $(Q_j)_{j \in \mathbb{N}}$, let $X_1, X_2, \ldots$ be random variables with joint distribution $Q$, and let $\nu$ be the joint distribution of the ordered variables $X_{(1)} \ge X_{(2)} \ge \cdots$. Then $\Xi$ has density $x \mapsto (x, x) := \sum_{n=1}^{\infty} x_n^2$ with respect to $\nu$.
The measure $\Xi$ is concentrated on the subset of points $x = (x_1, x_2, \ldots) \in \Delta$ satisfying $|x| := \sum_{n=1}^{\infty} x_n = 1$.

Regime 1 (continued)
Remark. The proof of Theorem 1 is based on general convergence theorems for ancestral processes of Cannings models (Möhle and Sagitov 2001) and on the moment problem for the $j$-dimensional simplex (Gupta).
Examples. Suppose that $\theta := \sum_{n=1}^{\infty} \theta_n < \infty$.
Wright-Fisher models. If $\phi(z) = \phi_1 z$, then $\sigma_k(\theta) = \theta^k \phi_1^k$. In this case $\nu$ is the Dirac measure at $p = (\theta_1/\theta, \theta_2/\theta, \ldots)$. The measure $\Xi$ assigns its total mass $\Xi(\Delta) = (p, p) = \big(\sum_{n=1}^{\infty} \theta_n^2\big)/\theta^2$ to the single point $p$.
Dirichlet models. If $\phi(z) = -\log(1 - z)$, then $\phi_m = (m-1)!$, $m \in \mathbb{N}$, and $\sigma_k(\theta) = [\theta]_k := \theta(\theta + 1) \cdots (\theta + k - 1)$, $k \in \mathbb{N}_0$. The limiting coalescent is the discrete-time Dirichlet-Kingman coalescent with parameter $(\theta_n)_{n \in \mathbb{N}}$.
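
The Wright-Fisher example can be made concrete with a summable parameter sequence. The sketch below assumes the hypothetical choice $\theta_n = q^n$ with $q = 1/2$, truncated at 200 terms (the neglected tail is far below floating-point precision), and evaluates the point $p$ and the total mass $\Xi(\Delta) = (p, p)$. The function name is hypothetical.

```python
def wright_fisher_limit(thetas):
    """Return the point p = (theta_n / theta)_n and the total mass (p, p)."""
    total = sum(thetas)
    p = [t / total for t in thetas]
    return p, sum(x * x for x in p)

q = 0.5                                        # hypothetical choice, theta_n = q^n
thetas = [q ** n for n in range(1, 201)]       # truncated summable sequence
p, mass = wright_fisher_limit(thetas)
print(p[:3], mass)
```

For this geometric choice the series can be summed by hand: $\theta = q/(1-q)$ and $\sum_n \theta_n^2 = q^2/(1-q^2)$, so $\Xi(\Delta) = (1-q)/(1+q)$.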

Results (Regime 2)
Theorem 2. Suppose that $(*)$ holds. If $\sum_{n=1}^{\infty} \theta_n = \infty$ and if $\sum_{n=1}^{\infty} \theta_n^2 < \infty$, then the compound Poisson population model is in the domain of attraction of the Kingman coalescent.
Remarks.
1. The time-scaling satisfies $c_N = \Theta_2(N)/(\Theta_1(N))^2$ if $\phi_2 = 0$ and $c_N \sim \phi_2/(\phi_1^2 \Theta_1(N))$ if $\phi_2 > 0$, where $\Theta_k(N) := \sum_{n=1}^{N} \theta_n^k$ for $k \in \mathbb{N}$.
2. In contrast to the situation in Theorem 1, the limiting coalescent in Theorem 2 does not depend on the function $\phi$ of the compound Poisson model.
Theorem 2 is for example applicable if $\theta_n = n^{-\alpha}$ with $\alpha \in (\tfrac{1}{2}, 1]$.
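
The time-scaling in Remark 1 can be illustrated numerically for $\theta_n = n^{-\alpha}$. The sketch below compares the closed form $c_N = (\phi_2\Theta_1(N) + \phi_1^2\Theta_2(N))/(\phi_2\Theta_1(N) + \phi_1^2(\Theta_1(N))^2)$ with the stated approximation. The function name and the values $\alpha = 0.75$, $\phi_1 = \phi_2 = 1$ are assumptions made for the illustration.

```python
def time_scaling(N, alpha, phi1=1.0, phi2=1.0):
    """Return (exact c_N, approximation from Remark 1) for theta_n = n^(-alpha)."""
    T1 = sum(n ** -alpha for n in range(1, N + 1))         # Theta_1(N)
    T2 = sum(n ** (-2 * alpha) for n in range(1, N + 1))   # Theta_2(N)
    exact = (phi2 * T1 + phi1**2 * T2) / (phi2 * T1 + phi1**2 * T1**2)
    approx = phi2 / (phi1**2 * T1) if phi2 > 0 else T2 / T1**2
    return exact, approx

for N in (10**3, 10**4, 10**5):
    exact, approx = time_scaling(N, alpha=0.75)
    print(N, exact, approx, exact / approx)
```

The ratio of the two quantities approaches 1 only slowly, because $\Theta_1(N)$ grows like a small power of $N$ (or logarithmically when $\alpha = 1$).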

Sketch of proof
The proof of Theorem 2 is based on the following technical lemma.
Lemma. If $(*)$ holds, then the following five conditions are equivalent.
(i) $\lim_{N \to \infty} \Theta_2(N)/(\Theta_1(N))^2 = 0$.
(ii) $\lim_{N \to \infty} \Theta_3(N)/(\Theta_1(N)\Theta_2(N)) = 0$.
(iii) $\lim_{N \to \infty} c_N = 0$.
(iv) $\lim_{N \to \infty} d_N/c_N = 0$.
(v) The compound Poisson model is in the domain of attraction of the Kingman coalescent.
Remark. The proof that (i) - (iii) are equivalent is technical but elementary. The equivalence of (iv) and (v) and the implication (iv) ⇒ (iii) hold even for arbitrary Cannings models (Möhle, 2000). The interesting point is that, for compound Poisson models, (iii) implies (iv). Note that this implication does not hold for arbitrary Cannings models.

Results (Regime 3)
Theorem 3. Suppose that $(*)$ holds, that $\sum_{n=1}^{\infty} \theta_n = \infty$ and that $\sum_{n=1}^{\infty} \theta_n^2 = \infty$. Then the compound Poisson model is in the domain of attraction of the Kingman coalescent if and only if $\Theta_2(N)/(\Theta_1(N))^2 \to 0$ as $N \to \infty$. In this case the time-scaling $c_N$ satisfies
$$c_N \sim \frac{\phi_2}{\phi_1^2 \Theta_1(N)} + \frac{\Theta_2(N)}{(\Theta_1(N))^2}.$$
Corollary. (unbiased case, Huillet and M., 2010) If $(*)$ holds and if $\theta_n = \theta$ does not depend on $n$, then the compound Poisson model is in the domain of attraction of the Kingman coalescent. The time-scaling $c_N$ satisfies $c_N \sim (1 + \phi_2/(\phi_1^2 \theta))/N$.

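Results (Regime 3, continued)
Theorem 4. Suppose that $(*)$ holds and that all the limits
$$p_1(k) := \lim_{N \to \infty} \frac{\Theta_k(N)}{(\Theta_1(N))^k}, \quad k \in \mathbb{N},$$
exist. Then all the limits
$$p_j(k_1, \ldots, k_j) := \lim_{N \to \infty} \frac{1}{(\Theta_1(N))^{k_1 + \cdots + k_j}} \sum_{\substack{n_1, \ldots, n_j = 1 \\ \text{all distinct}}}^{N} \theta_{n_1}^{k_1} \cdots \theta_{n_j}^{k_j}, \quad k_1, \ldots, k_j \in \mathbb{N},$$
exist. Suppose now in addition that $\sum_{n=1}^{\infty} \theta_n = \infty$ and that $p_1(2) > 0$. Then the compound Poisson model is in the domain of attraction of a discrete-time $\Xi$-coalescent $\Pi$. The characterizing measure $\nu(dx) := \Xi(dx)/(x, x)$ of $\Pi$ is the Dirac measure at $x = (x_1, x_2, \ldots) \in \Delta$, where $x_1 := \lim_{k \to \infty} (p_1(k))^{1/k}$ and
$$x_{n+1} := \lim_{k \to \infty} \big(p_1(k) - (x_1^k + \cdots + x_n^k)\big)^{1/k}, \quad n \in \mathbb{N}.$$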

Regime 3 (continued)
Remarks.
Let $Z_N$ be a random variable taking the value $\theta_n/\Theta_1(N)$ with probability $\theta_n/\Theta_1(N)$, $n \in \{1, \ldots, N\}$. The existence of all the limits $p_1(k)$, $k \in \mathbb{N}$, is equivalent to the convergence $Z_N \to Z$ in distribution, where $Z$ has characteristic function $t \mapsto \sum_{k=0}^{\infty} \big((\mathrm{i}t)^k/k!\big)\, p_1(k+1)$, $t \in \mathbb{R}$.
In contrast to the situation in Theorem 1, the limiting discrete-time $\Xi$-coalescent in Theorem 4 does not depend on the function $\phi$ of the compound Poisson model.
Example. Fix $\lambda > 1$ and suppose that $\theta_n = \lambda^n$, $n \in \mathbb{N}$. Then $p_1(k) = (\lambda - 1)^k/(\lambda^k - 1) > 0$, $k \in \mathbb{N}$. In this case the measure $\Xi$ of the limiting $\Xi$-coalescent assigns its total mass $\Xi(\Delta) = p_1(2) = (\lambda - 1)/(\lambda + 1)$ to the single point $x = (x_1, x_2, \ldots) \in \Delta$ defined via $x_n := (\lambda - 1)/\lambda^n = (1 - 1/\lambda)(1/\lambda)^{n-1}$, $n \in \mathbb{N}$.
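
The example with $\theta_n = \lambda^n$ can be checked numerically. The sketch below computes the finite-$N$ ratio $\Theta_k(N)/(\Theta_1(N))^k$ exactly and compares it with the stated limit $(\lambda-1)^k/(\lambda^k-1)$, then confirms that the point $x$ has $\sum_n x_n = 1$ and $\sum_n x_n^2 = p_1(2)$. The function names and the values $\lambda = 2$, $N = 40$ are illustrative assumptions.

```python
from fractions import Fraction

def p1_finite(lam, k, N):
    """Theta_k(N) / Theta_1(N)^k for theta_n = lam^n, computed exactly."""
    thetas = [Fraction(lam) ** n for n in range(1, N + 1)]
    return sum(t ** k for t in thetas) / sum(thetas) ** k

def p1_limit(lam, k):
    """Limit stated on the slide: (lam - 1)^k / (lam^k - 1)."""
    return Fraction(lam - 1) ** k / (Fraction(lam) ** k - 1)

lam = 2
gaps = [abs(float(p1_finite(lam, k, 40) - p1_limit(lam, k))) for k in (2, 3, 4)]

# Limit point x_n = (lam - 1) / lam^n, truncated at 60 coordinates.
x = [Fraction(lam - 1) / Fraction(lam) ** n for n in range(1, 61)]
total_mass = float(sum(x))
xi_mass = float(sum(v * v for v in x))
print(gaps, total_mass, xi_mass, float(p1_limit(lam, 2)))
```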

Generalization: Assume that $(*)$ does not hold
Theorem. (Huillet, M. 2011) Fix $\theta \in (0, \infty)$ and suppose that the equation $\theta z \phi'(z) = 1$ has a real solution $z(\theta) \in (0, r)$. Then $\mu_1 \to X$ in distribution as $N \to \infty$, where $X$ has distribution
$$P(X = k) = \sigma_k(\theta) \frac{(z(\theta))^k}{k!} e^{-\theta\phi(z(\theta))}, \quad k \in \{0, 1, 2, \ldots\}.$$
The associated symmetric compound Poisson model is in the domain of attraction of the Kingman coalescent. The effective population size $N_e := 1/c_N$ satisfies $N_e \sim \varrho N$ as $N \to \infty$, where
$$\varrho := 1/E((X)_2) = 1/\big(1 + \theta (z(\theta))^2 \phi''(z(\theta))\big) \in (0, 1].$$
Remark. The proof uses the saddle point method to establish the asymptotics of $\sigma_N(N\theta)$.
Open cases.
a) Symmetric models without a solution $z(\theta)$, condensation (work in progress).
b) Non-symmetric models.
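
The quantities $z(\theta)$ and $\varrho$ in the theorem can be computed numerically once $\phi'$ and $\phi''$ are available. The sketch below uses bisection, assuming $z \mapsto z\phi'(z)$ is increasing on $(0, r)$ (which holds when all $\phi_m \ge 0$). As a sanity check it uses $\phi(z) = -\log(1-z)$, a case where $z(\theta) = 1/(1+\theta)$ and $\varrho = \theta/(1+\theta)$ can be derived by hand. The function names are hypothetical.

```python
def solve_z(theta, dphi, r, tol=1e-12):
    """Bisection for theta * z * phi'(z) = 1 on (0, r), assuming z*phi'(z) is increasing."""
    lo, hi = 0.0, r
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if theta * mid * dphi(mid) < 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def rho(theta, z, d2phi):
    """rho = 1 / (1 + theta * z^2 * phi''(z)), the factor in N_e ~ rho * N."""
    return 1.0 / (1.0 + theta * z**2 * d2phi(z))

# Sanity check with phi(z) = -log(1 - z), radius of convergence r = 1.
dphi = lambda z: 1.0 / (1.0 - z)
d2phi = lambda z: 1.0 / (1.0 - z) ** 2

theta = 2.0
z = solve_z(theta, dphi, r=1.0)
print(z, rho(theta, z, d2phi))
```

For this choice $1/\varrho = 1 + 1/\theta$, which agrees with the time-scaling $c_N \sim (1 + \phi_2/(\phi_1^2\theta))/N$ of the unbiased case with $\phi_1 = \phi_2 = 1$.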

Conclusions
Asymptotics of the ancestry for some compound Poisson population models analyzed.
Results essentially based on convergence theorems (Möhle 2000 and Möhle and Sagitov 2001) for ancestral processes of exchangeable Cannings population models.
Convergence to the Kingman coalescent if and only if $\Theta_2(N)/(\Theta_1(N))^2 \to 0$.
Compound Poisson models satisfying $(*)$ are never in the domain of attraction of a continuous-time coalescent different from Kingman's coalescent; discrete-time $\Xi$-coalescents (with simultaneous multiple collisions) arise if the parameters $\theta_n$ are unbalanced.
Three regimes depending on the behavior of the series $\sum_n \theta_n$ and $\sum_n \theta_n^2$; complete convergence results for the first two regimes; partial convergence results for the third regime (when both series diverge).
Convergence to the Kingman coalescent holds even for more general symmetric compound Poisson models, which do not necessarily satisfy the restriction $(*)$.

References
CANNINGS, C. (1974) The latent roots of certain Markov chains arising in genetics: a new approach, I. Haploid models. Adv. Appl. Probab. 6, 260-290.
GUPTA, J. C. (1999) The moment problem for the standard k-dimensional simplex. Sankhyā A 61, 286-291.
HUILLET, T. AND MÖHLE, M. (2011) Population genetics models with skewed fertilities: a forward and backward analysis. Stochastic Models 27, 521-554.
HUILLET, T. AND MÖHLE, M. (2012) Correction on Population genetics models with skewed fertilities: a forward and backward analysis. Stochastic Models, to appear.
KARLIN, S. AND MCGREGOR, J. (1964) Direct product branching processes and related Markov chains. Proc. Nat. Acad. Sci. U.S.A. 51, 598-602.
KINGMAN, J. F. C. (1982) The coalescent. Stochastic Process. Appl. 13, 235-248.
MÖHLE, M. (2000) Total variation distances and rates of convergence for ancestral coalescent processes in exchangeable population models. Adv. Appl. Probab. 32, 983-993.
MÖHLE, M. (2011) Coalescent processes derived from some compound Poisson population models. Electron. Comm. Probab. 16, 567-582.
MÖHLE, M. AND SAGITOV, S. (2001) A classification of coalescent processes for haploid exchangeable population models. Ann. Probab. 29, 1547-1562.
SCHWEINSBERG, J. (2000) Coalescents with simultaneous multiple collisions. Electron. J. Probab. 5, 1-50.