ON COMPOUND POISSON POPULATION MODELS

ON COMPOUND POISSON POPULATION MODELS Martin Möhle, University of Tübingen (joint work with Thierry Huillet, Université de Cergy-Pontoise) Workshop on Probability, Population Genetics and Evolution Centre International de Rencontres Mathématiques (CIRM) Marseille-Luminy, France June 11, 2012 1

Exchangeable population models (Cannings) Non-overlapping generations r Z = {..., 1, 0, 1,...} Population size N, i.e. N individuals (genes, particles) in each generation ν (r) i := number of offspring of individual i of generation r, ν (r) 1 + +ν (r) N = N Additional assumptions on the offspring Exchangeability: (ν (r) π1,..., ν (r) πn ) d = (ν (r) 1,..., ν (r) N ) Homogeneity: (ν (r) 1,..., ν (r) N ) r i.i.d. Avoid the trivial model (ν (r) i 1). Define (ν 1,..., ν N ) := (ν (0) 1,..., ν (0) N ). π Examples. Wright-Fisher: (ν 1,..., ν N ) d = Multinomial(N, 1 N,..., 1 N ) Moran: (ν 1,..., ν N ) = random permutation of (2, 1,..., 1, 0) }{{} N 2 2

Exchangeable population models (graphical representation) 6 5 4 3 2 Past Example: Population size N = 8 ν (6) 3 = 3 ν (6) 5 = 1 ν (6) 6 = 4 1 0 Future 3

Conditional branching population models Let ξ 1, ξ 2,... be independent random variables taking values in N 0 := {0, 1, 2,...}. Assume that P(ξ 1 + + ξ N = N) > 0 for all N N := {1, 2,...}. Perform the following two steps: 1. Conditioning: Let µ 1,..., µ N be random variables such that the distribution of them coincides with that of ξ 1,..., ξ N conditioned on the event that ξ 1 + + ξ N = N. 2. Shuffling: Let ν 1,..., ν N be the µ 1,..., µ N randomly permutated. Conditional branching process models are particular Cannings models with offspring variables ν 1,..., ν N constructed as above (Karlin and McGregor, 1964). 4

Compound Poisson population models Compound Poisson population models are particular conditional branching process models for which ξ n has p.g.f. E(x ξ n ) = exp ( θ n (φ(z) φ(zx))), x 1, n N, with parameters 0 < θ n < and with a power series φ of the form φ(z) = z < r, with positive radius r of convergence and φ m 0, φ 1 > 0. m=1 φ m z m m!, 5

Distribution and factorial moments of µ Notation: Taylor expansion exp(θφ(z)) = k=0 σ k (θ) z k, z < r. k! (The coefficients σ k (θ) can be computed recursively.) Distribution of µ: P(µ 1 = j 1,..., µ N = j N ) = N! σ j1 (θ 1 ) σ jn (θ N ) j 1! j N! σ N ( N n=1 θ n) (j 1,..., j N N 0 with j 1 + + j N = N ) Factorial moments of µ: E((µ 1 ) k1 (µ N ) kn ) = N! σ N ( N n=1 θ n) j 1 k 1,...,j N k N j 1 + +j N =N σ j1 (θ 1 ) σ jn (θ N ) (j 1 k 1 )! (j N k N )! (k 1,..., k N N 0 ) 6

A subclass of compound Poisson models We focus on the subclass of compound Poisson models satisfying σ k+1 (θ) σ k (θ) + σ k +1(θ ) σ k (θ ) = σ k+k +1(θ + θ ) σ k+k (θ + θ ) k, k N 0, θ, θ (0, ). ( ) Lemma. A compound Poisson model satisfies ( ) if and only if φ m = (m 1)!φ 1 (φ 2 /φ 1 ) m 1 for all m N. (These are essentially Wright Fisher models and Dirichlet models.) If ( ) holds then µ has factorial moments E((µ 1 ) k1 (µ N ) kn ) = (N) k1 + +k N σ k1 (θ 1 ) σ kn (θ N ) σ k (θ 1 + + θ N ) (k 1,..., k N N 0 ) Assumption. In the following it is always assumed that ( ) holds. 7

Ancestral process Take a sample of n ( N ) individuals from some generation and consider their ancestors. (i, j) R t = R (n) t : individuals i and j have a common parent t generations backwards in time. The ancestral process R := (R t ) t=0,1,... is Markovian with state space E n (set of equivalence relations on {1,..., n}). Transition probabilities: P(R t+1 = η R t = ξ) = Φ (N) j (k 1,..., k j ), ξ, η E n with ξ η with Φ (N) j (k 1,..., k j ) := 1 σ k1 + +k j ( N n=1 θ n) N n 1,...,n j =1 all distinct σ k1 (θ n1 ) σ kj (θ nj ) where j := η = number of blocks of η, k 1,..., k j := group sizes of merging classes of ξ. ( k 1 + + k j = ξ ) 8

Two basic transition probabilities Notation: For N, k N define Θ k (N) := N n=1 θ k n. c N := coalescence probability := P(2 individuals have same parent) = Φ (N) 1 N 1 (2) = σ 2 (θ n ) = φ 2Θ 1 (N) + φ 2 1Θ 2 (N) σ 2 (Θ 1 (N)) φ 2 Θ 1 (N) + φ 2 1(Θ 1 (N)). 2 We also need n=1 d N := P(3 individuals have same parent) = Φ (N) 1 (3) = φ 3 Θ 1 (N) + 3φ 1 φ 2 Θ 2 (N) + φ 3 1Θ 3 (N) φ 3 Θ 1 (N) + 3φ 1 φ 2 (Θ 1 (N)) 2 + φ 3 1(Θ 1 (N)) 3. 9

Exchangeable coalescent processes Exchangeable coalescents are discrete-time or continuous-time Markov processes Π = (Π t ) t with state space E, the set of equivalence relations (partitions) on N := {1, 2,...}. During each transition, equivalence classes (blocks) merge together. Simultaneous multiple collisions of blocks are allowed. Schweinsberg (2000) characterizes these processes via a finite measure Ξ on the infinite simplex := {x = (x 1, x 2,...) : x 1 x 2 0, x := x i 1}. i=1 These processes are therefore also called Ξ-coalescents. 10

Domain of attraction For n N let ϱ n : E E n denote the restriction of E to E n, the set of equivalence relations on {1,..., n}. Definition. We say that the considered population model is in the domain of attraction of a continuoustime coalescent Π = (Π t ) t 0, if, for each sample size n N, the time-scaled ancestral process (R (n) [t/c N ] ) t 0 weakly converges to (ϱ n Π t ) t 0 as N. We say that the considered population model is in the domain of attraction of a discretetime coalescent Π = (Π t ) t=0,1,..., if, for each sample size n N, the ancestral process (R (n) t ) t=0,1,... weakly converges to (ϱ n Π t ) t=0,1,... as N. 11

Results (Regime 1) Theorem 1. Suppose that ( ) holds. If n=1 θ n <, then the compound Poisson population model is in the domain of attraction of a discrete-time Ξ-coalescent. Characterization of Ξ. There exists a consistent sequence (Q j ) j N of probability measures Q j on the j-simplex j := {(x 1,..., x j ) [0, 1] j : x 1 + + x j 1} uniquely determined via their moments x k 1 1 x k j j Q j (dx 1,..., dx j ) = σ k 1 (θ 1 ) σ kj (θ j ) j σ k1 + +k j ( n=1 θ n), k 1,..., k j N 0. Let Q denote the projective limit of (Q j ) j N, let X 1, X 2,... be random variables with joint distribution Q, and let ν be the joint distribution of the ordered variables X (1) X (2). Then, Ξ has density x (x, x) := n=1 x2 n with respect to ν. The measure Ξ is concentrated on the subset of points x = (x 1, x 2,...) satisfying x := n=1 x n = 1. 12

Regime 1 (continued) Remark. The proof of Theorem 1 is based on general convergence theorems for ancestral processes of Cannings models (M. and Sagitov 2001) and on the moment problem for the j-dimensional simplex (Gupta). Examples. Suppose that θ := n=1 θ n <. Wright-Fisher models. If φ(z) = φ 1 z, then σ k (θ) = θ k φ k 1. In this case ν is the Dirac measure at p = (θ 1 /θ, θ 2 /θ,...). The measure Ξ assigns its total mass Ξ( ) = (p, p) = ( n=1 θ2 n)/θ 2 to the single point p. Dirichlet models. If φ(x) = log(1 x), then φ m = (m 1)!, m N, and σ k (θ) = [θ] k := θ(θ + 1) (θ + k 1), k N 0. The limiting coalescent is the discrete-time Dirichlet-Kingman coalescent with parameter (θ n ) n N. 13

Results (Regime 2) Theorem 2. Suppose that ( ) holds. If n=1 θ n = and if n=1 θ2 n <, then the compound Poisson population model is in the domain of attraction of the Kingman coalescent. Remarks. 1. Time-scaling satisfies c N = Θ 2 (N)/(Θ 1 (N)) 2 if φ 2 = 0 and c N φ 2 /(φ 2 1Θ 1 (N)) if φ 2 > 0, where Θ k (N) := N n=1 θk n for k N. 2. In contrast to the situation in Theorem 1, the limiting coalescent in Theorem 2 does not depend on the function φ of the compound Poisson model. Theorem 2 is for example applicable if θ n = n α with α ( 1 2, 1]. 14

Sketch of proof The proof of Theorem 2 is based on the following technical lemma. Lemma. If ( ) holds, then the following five conditions are equivalent. (i) lim N Θ 2 (N) = 0. (ii) lim (Θ 1 (N)) 2 N (iii) lim N c N = 0. (iv) lim N Θ 3 (N) Θ 1 (N) Θ 2 (N) = 0. d N c N = 0. (v) The compound Poisson model is in the domain of attraction of the Kingman coalescent. Remark. The proof that (i) - (iii) are equivalent is technical but elementary. The equivalence of (iv) and (v) and the implication (iv) (iii) hold even for arbitrary Cannings models (M., 2000). The interesting point is that, for compound Poisson models, (iii) implies (iv). Note that this implication does not hold for arbitrary Cannings models. 15

Results (Regime 3) Theorem 3. Suppose that ( ) holds, that n=1 θ n = and that n=1 θ2 n =. Then the compound Poisson model is in the domain of attraction of the Kingman coalescent if and only if Θ 2 (N)/(Θ 1 (N)) 2 c N φ 2 /(φ 2 1Θ 1 (N)) + Θ 2 (N)/(Θ 1 (N)) 2. 0 as N. In this case the time-scaling c N satisfies Corollary. (unbiased case, Huillet and M., 2010) If ( ) holds and if θ n = θ does not depend on n, then the compound Poisson model is in the domain of attraction of the Kingman coalescent. The time-scaling c N satisfies c N (1 + φ 2 /(φ 2 1θ))/N. 16

Results (Regime 3, continued) Theorem 4. Suppose that ( ) holds and that all the limits p 1 (k) := lim N Θ k (N) (Θ 1 (N)) k, k N, exist. Then all the limits p j (k 1,..., k j ) := lim N 1 (Θ 1 (N)) k 1+ +k j N n 1,...,n j =1 all distinct θ k 1 n 1 θ k j n j, k 1,..., k j N, exist. Suppose now in addition that n=1 θ n = and that p 1 (2) > 0. Then, the compound Poisson model is in the domain of attraction of a discrete-time Ξ- coalescent Π. The characterizing measure ν(dx) := Ξ(dx)/(x, x) of Π is the Diracmeasure at x = (x 1, x 2,...), where x 1 := lim k (p 1 (k)) 1/k and x n+1 := lim k (p 1 (k) (x k 1 + + x k n)) 1/k, n N. 17

Regime 3 (continued) Remarks. Let Z N be a random variable taking the value θ n /Θ 1 (N) with probability θ n /Θ 1 (N), n {1,..., N}. The existence of all the limits p 1 (k), k N, is equivalent to the convergence Z N Z in distribution, where Z has characteristic function t k=0 (tk /k!)p 1 (k + 1), t R. In contrast to the situation in Theorem 1, the limiting discrete-time Ξ-coalescent in Theorem 4 does not depend on the function φ of the compound Poisson model. Example. Fix λ > 1 and suppose that θ n = λ n, n N. Then, p 1 (k) = (λ 1) k /(λ k 1) > 0, k N. In this case the measure Ξ of the limiting Ξ-coalescent assigns its total mass Ξ( ) = p 1 (2) = (λ 1)/(λ + 1) to the single point x = (x 1, x 2,...) defined via x n := (λ 1)/λ n = (1 1/λ)(1/λ) n 1, n N. 18

Generalization: Assume that ( ) does not hold Theorem. (Huillet, M. 2011) Fix θ (0, ) and suppose that the equation θzφ (z) = 1 has a real solution z(θ) (0, r). Then µ 1 X in distribution as N, where X has distribution P(X = k) = σ k (θ) (z(θ))k k! e θφ(z(θ)) k {0, 1, 2,...}. The associated symmetric compound Poisson model is in the domain of attraction of the Kingman coalescent. The effective population size N e := 1/c N satisfies N e ϱn as N, where ϱ := 1/E((X) 2 ) = 1/(1 + θ(z(θ)) 2 φ (z(θ))) (0, 1]. Remark. Proof uses the saddle point method to establish the asymptotics of σ N (Nθ). Open cases. a) Symmetric models without a solution z(θ), condensation (work in progress). b) Non symmetric models. 19

Conclusions Asymptotics of the ancestry for some compound Poisson population models analyzed Results essentially based on convergence theorems (M. 2000 and M. and Sagitov 2001) for ancestral processes of exchangeable Cannings population models Convergence to the Kingman coalescent if and only if Θ 2 (N)/(Θ 1 (N)) 2 0 Compound Poisson models satisfying ( ) are never in the domain of attraction of a continuous-time coalescent different from Kingman s coalescent; discrete-time Ξ-coalescents (with simultaneous multiple collisions) arise if the parameters θ n are unbalanced Three regimes depending on the behavior of the series n θ n and n θ2 n ; complete convergence results for the first two regimes; partial convergence results for the third regime (when both series diverge) Convergence to the Kingman coalescent holds even for more general symmetric compound Poisson models, which do not necessarily satisfy the restriction ( ) 20

References CANNINGS, C. (1974) The latent roots of certain Markov chains arising in genetics: a new approach, I. Haploid models. Adv. Appl. Probab. 6, 260 290. GUPTA, J. C. (1999) The moment problem for the standard k-dimensional simplex. Sankhyā A 61, 286 291. HUILLET, T. AND MÖHLE, M. (2011) Population genetics models with skewed fertilities: a forward and backward analysis. Stochastic Models 27, 521 554. HUILLET, T. AND MÖHLE, M. (2012) Correction on Population genetics models with skewed fertilities: a forward and backward analysis. Stochastic Models, to appear. KARLIN, S. AND MCGREGOR, J. (1964) Direct product branching processes and related Markov chains. Proc. Nat. Acad. Sci. U.S.A. 51, 598 602. 21

References (continued) KINGMAN, J. F. C. (1982) The coalescent. Stochastic Process. Appl. 13, 235 248. MÖHLE, M. (2000) Total variation distances and rates of convergence for ancestral coalescent processes in exchangeable population models. Adv. Appl. Probab. 32, 983 993. MÖHLE, M. (2011) Coalescent processes derived from some compound Poisson population models. Electron. Comm. Probab. 16, 567 582. MÖHLE, M. AND SAGITOV, S. (2001) A classification of coalescent processes for haploid exchangeable population models. Ann. Probab. 29, 1547 1562. SCHWEINSBERG, J. (2000) Coalescents with simultaneous multiple collisions. Electron. J. Probab. 5, 1 50. 22