Computing likelihoods under Λ-coalescents

Size: px
Start display at page:

Download "Computing likelihoods under Λ-coalescents"

Transcription

1 Computing likelihoods under Λ-coalescents Matthias Birkner LMU Munich, Dept. Biology II Joint work with Jochen Blath and Matthias Steinrücken, TU Berlin New Mathematical Challenges from Molecular Biology, Banff International Research Station, September 6-11, 2009 Matthias Birkner (LMU Munich) Banff, Sept / 42

2 Outline 1 Introduction 2 Infinitely-many-sites model 3 Computing likelihoods 4 Illustration and outlook Matthias Birkner (LMU Munich) Banff, Sept / 42

3 Introduction Wright-Fisher model and Kingman s coalescent Wright-Fisher model: The fundamental model for genetic drift A (haploid) population of N individuals per generation, each individual in the present generation picks a parent at random from the previous generation, genetic types are inherited (possibly with a small probability of mutation). past Matthias Birkner (LMU Munich) Banff, Sept / 42

4 Introduction Wright-Fisher model and Kingman s coalescent Wright-Fisher model: The fundamental model for genetic drift A (haploid) population of N individuals per generation, each individual in the present generation picks a parent at random from the previous generation, genetic types are inherited (possibly with a small probability of mutation). past Matthias Birkner (LMU Munich) Banff, Sept / 42

5 Introduction Wright-Fisher model and Kingman s coalescent Wright-Fisher model: The fundamental model for genetic drift A (haploid) population of N individuals per generation, each individual in the present generation picks a parent at random from the previous generation, genetic types are inherited (possibly with a small probability of mutation). past Matthias Birkner (LMU Munich) Banff, Sept / 42

6 Introduction Wright-Fisher model and Kingman s coalescent Wright-Fisher model: The fundamental model for genetic drift A (haploid) population of N individuals per generation, each individual in the present generation picks a parent at random from the previous generation, genetic types are inherited (possibly with a small probability of mutation). past Matthias Birkner (LMU Munich) Banff, Sept / 42

7 Introduction Genealogical point of view Wright-Fisher model and Kingman s coalescent Sample n ( N) individuals from the present generation present past Matthias Birkner (LMU Munich) Banff, Sept / 42

8 Kingman s coalescent Introduction Wright-Fisher model and Kingman s coalescent Theorem (Kingman (& Hudson, Griffiths), 1982) In the limit N, the genealogy of an n- sample, measured in units of N generations, is described by a continuous-time Markov chain where each pair of lineages merges at rate 1. Matthias Birkner (LMU Munich) Banff, Sept / 42

9 Kingman s coalescent Introduction Wright-Fisher model and Kingman s coalescent Theorem (Kingman (& Hudson, Griffiths), 1982) In the limit N, the genealogy of an n- sample, measured in units of N generations, is described by a continuous-time Markov chain where each pair of lineages merges at rate 1. Robustness. The same limit appears for any exchangeable offspring vectors (ν 1,...,ν N ), (independent over generations), if time is measured in units of N σ 2 generations, where σ2 = lim N Var(ν 1) (under a third moment condition on ν 1 ). Matthias Birkner (LMU Munich) Banff, Sept / 42

10 Introduction Wright-Fisher model and Kingman s coalescent Modeling neutral variation: Superimposing types on the coalescent Assume that the considered genetic types do not affect their bearer s reproductive succes. If as population size N, N mutation prob. per ind. per generation r, σ2 the type configuration in the sample can be described by putting mutations with rate r along the genealogy. Matthias Birkner (LMU Munich) Banff, Sept / 42

11 Question Introduction Multiple merger coalescents What if the variability of surviving offspring numbers across individuals is so large that reasonably individual offspring variance σ 2? This might happen e.g. in marine species (so-called reproduction sweepstakes). Matthias Birkner (LMU Munich) Banff, Sept / 42

12 Introduction Multiple merger coalescents Coalescents with multiple collisions, aka Λ-coalescents While n lineages, any k coalesce at rate λ n,k = [0,1] x k 2 (1 x) n k Λ(dx), where Λ is a finite measure on [0,1]. (Sagitov, 1999; Pitman, 1999). Matthias Birkner (LMU Munich) Banff, Sept / 42

13 Introduction Multiple merger coalescents Coalescents with multiple collisions, aka Λ-coalescents While n lineages, any k coalesce at rate λ n,k = [0,1] x k 2 (1 x) n k Λ(dx), where Λ is a finite measure on [0,1]. (Sagitov, 1999; Pitman, 1999). Interpretation: re-write λ n,k = [0,1] xk (1 x) n k 1 x 2 Λ(dx) to see: at rate 1 x 2 Λ([x,x + dx]), an x-resampling event occurs. Thinking forwards in time, this corresponds to an event in which the fraction x of the total population is replaced by the offspring of a single individual. Matthias Birkner (LMU Munich) Banff, Sept / 42

14 Introduction Multiple merger coalescents Coalescents with multiple collisions, aka Λ-coalescents While n lineages, any k coalesce at rate λ n,k = [0,1] x k 2 (1 x) n k Λ(dx), where Λ is a finite measure on [0,1]. (Sagitov, 1999; Pitman, 1999). Interpretation: re-write λ n,k = [0,1] xk (1 x) n k 1 x 2 Λ(dx) to see: at rate 1 x 2 Λ([x,x + dx]), an x-resampling event occurs. Thinking forwards in time, this corresponds to an event in which the fraction x of the total population is replaced by the offspring of a single individual. Form of rates stems from λ n,k = λ n+1,k + λ n+1,k+1 (consistency condition). Matthias Birkner (LMU Munich) Banff, Sept / 42

15 Introduction Multiple merger coalescents Coalescents with multiple collisions, aka Λ-coalescents While n lineages, any k coalesce at rate λ n,k = [0,1] x k 2 (1 x) n k Λ(dx), where Λ is a finite measure on [0,1]. (Sagitov, 1999; Pitman, 1999). Interpretation: re-write λ n,k = [0,1] xk (1 x) n k 1 x 2 Λ(dx) to see: at rate 1 x 2 Λ([x,x + dx]), an x-resampling event occurs. Thinking forwards in time, this corresponds to an event in which the fraction x of the total population is replaced by the offspring of a single individual. Form of rates stems from λ n,k = λ n+1,k + λ n+1,k+1 (consistency condition). Note: Λ = δ 0 corresponds to Kingman s coalescent. Matthias Birkner (LMU Munich) Banff, Sept / 42

16 Introduction Multiple merger coalescents Cannings models in the domain of attraction of a Λ-coalescent Fixed population size N, exchangeable offspring numbers in one generation ( ν1,ν 2,...,ν N ). Sagitov (1999), Möhle & Sagitov (2001) clarify under which conditions the genealogies of a sequence of exchangeable finite population models are described by a Λ-coalescent: c N := pair coalescence probability over one generation 0 ( c N = 1 N 1 E[ν 1(ν 1 1)] ) two double mergers asymptotically negligible compared to one triple merger Nc N Pr ( a given family has size Nx ) 1 x y 2 Λ(dy) Time is measured in 1/c N generations (in general 1/pop. size) Matthias Birkner (LMU Munich) Banff, Sept / 42

17 Introduction The Beta(2 α, α)-class Note: There are many Λ-coalescents. Maybe a natural first candidate (considered by Eldon & Wakeley, Genetics 2006): Λ = wδ 0 + (1 w)δ ψ with w,ψ (0,1) Matthias Birkner (LMU Munich) Banff, Sept / 42

18 Introduction The Beta(2 α, α)-class Note: There are many Λ-coalescents. Maybe a natural first candidate (considered by Eldon & Wakeley, Genetics 2006): Λ = wδ 0 + (1 w)δ ψ with w,ψ (0,1) Simple, few parameters But: No realistic microscopic population model Matthias Birkner (LMU Munich) Banff, Sept / 42

19 Introduction The Beta(2 α, α)-class A heavy-tailed Cannings model and Beta-coalescents Haploid population of size N. Individual i has X i potential offspring, X 1,X 2,...,X N are i.i.d. with mean m := E [ X 1 ] > 1, Pr ( X 1 k ) Const. k α with α (1,2). Note: infinite variance. Sample N without replacement from all potential offspring to form the next generation. Matthias Birkner (LMU Munich) Banff, Sept / 42

20 Introduction The Beta(2 α, α)-class A heavy-tailed Cannings model and Beta-coalescents Haploid population of size N. Individual i has X i potential offspring, X 1,X 2,...,X N are i.i.d. with mean m := E [ X 1 ] > 1, Pr ( X 1 k ) Const. k α with α (1,2). Note: infinite variance. Sample N without replacement from all potential offspring to form the next generation. Theorem (Schweinsberg, 2003) Let c N = prob. of pair coalescence one generation back in N-th model. c N const. N 1 α, measured in units of 1/c N generations, the genealogy of a sample from the N-th model is approximately described by a ( Λ-coalescent with Λ = Beta(2 α, α). ) 1 Beta(2 α, α)(dx) = ½ [0,1] (x) Γ(2 α)γ(α) x1 α (1 x) α 1 dx Matthias Birkner (LMU Munich) Banff, Sept / 42

21 Introduction The Beta(2 α, α)-class The family Beta(2 α, α), α (1, 2] Kingman s coalescent included as boundary case: Beta(2 α,α) δ 0 weakly as α 2. Smaller α means tendency towards more extreme resampling events. For α 1, corresponding coalescents do not come down from infinity Beta(2 α, α)-coalescents appear as genealogies of α-stable continuous mass branching process (via a time-change). Scaling relation of mutation rate per generation relative to population size depends on α! Matthias Birkner (LMU Munich) Banff, Sept / 42

22 Neutral mutations Infinitely many sites Infinitely many sites model (Kimura 1971) Model genetic locus as infinite sequence of completely linked sites, mutations always hit a new site. All mutations are neutral. Mathematical abstraction: a gene is [0,1] a type is a configuration of points on [0,1] Ethier & Griffiths (1987) parametrisation: type space E = [0,1] N mutation operator Bf ( (x 1,x 2,... ) ) = r 1 0 f ( (u,x 1,x 2,... ) ) f ( (x 1,x 2,... ) ) du Matthias Birkner (LMU Munich) Banff, Sept / 42

23 Neutral mutations Infinitely many sites Berestycki, Berestycki & Schweinsberg (2007) and Berestycki, Berestycki & Limic (2009) describe precisely the asymptotic behaviour of the frequency spectrum under a Λ-coalescent Matthias Birkner (LMU Munich) Banff, Sept / 42 Infinitely many sites model (Kimura 1971) Model genetic locus as infinite sequence of completely linked sites, mutations always hit a new site. All mutations are neutral. Mathematical abstraction: a gene is [0,1] a type is a configuration of points on [0,1] Ethier & Griffiths (1987) parametrisation: type space E = [0,1] N mutation operator Bf ( (x 1,x 2,... ) ) = r 1 0 f ( (u,x 1,x 2,... ) ) f ( (x 1,x 2,... ) ) du

24 Neutral mutations Infinitely many sites Combinatorics of the Infinitely-many-sites model Model genetic locus as infinite sequence of completely linked sites, mutations always hit a new site Example: segr. site Seq (0=wild type, 1=mutant assume known ancestral types) Obs. fit IMS no sub-matrix (and no row permutation) a b c a b c Matthias Birkner (LMU Munich) Banff, Sept / 42

25 Neutral mutations Infinitely many sites Combinatorics of the infinitely-many-sites model, II If the infinitely-many-sites model applies, the observations correspond to a unique rooted perfect phylogeny (or genetree ). Sequences, Genetree, obs. types segr. site Seq ,4 5 Construct e.g. using Gusfield s (1991) algorithm. 4 3 type multiplicity (1, 0) 1 (2, 1, 0) 1 (4, 3, 0) 2 (3, 0) 1 Note: purely combinatorial, does not depend on a probabilistic model for the observations. Matthias Birkner (LMU Munich) Banff, Sept / 42

26 Neutral mutations (Λ-) Ethier-Griffiths urn Simulating samples under the IMS model Literally, the data arises (in our model) in a stochastic two-step procedure First generate genealogy acc. to coalescent. Then superimpose mutations. Matthias Birkner (LMU Munich) Banff, Sept / 42

27 Neutral mutations (Λ-) Ethier-Griffiths urn Simulating samples under the IMS model Literally, the data arises (in our model) in a stochastic two-step procedure First generate genealogy acc. to coalescent. Then superimpose mutations. The Ethier-Griffiths urn (1987) can be used to generate a random sample of size n under Kingman s coalescent (with mutation rate r per line) in one pass : Start with 2 leaves. When there are k leaves: Add a mutation to a leaf w. prob. 2r 2r+(k 1), split one leaf w. prob. k 1 2r+(k 1) (leaf picked uniformly among the k). Stop when n + 1 leaves, delete last leaf. Matthias Birkner (LMU Munich) Banff, Sept / 42

28 Neutral mutations (Λ-) Ethier-Griffiths urn Simulating samples under the IMS model: Λ-case (Y (n) t ) 0 block counting process of Λ-coalescent starting from n blocks: Jump from i to j {1,2,...,i 1} at rate q ij := ( i i j+1) λi,i j+1. τ 1 := inf{t 0 : Y (n) t = 1}. Matthias Birkner (LMU Munich) Banff, Sept / 42

29 Neutral mutations (Λ-) Ethier-Griffiths urn Simulating samples under the IMS model: Λ-case (Y (n) t ) 0 block counting process of Λ-coalescent starting from n blocks: Jump from i to j {1,2,...,i 1} at rate q ij := ( i i j+1) λi,i j+1. Ỹ (n) t τ 1 := inf{t 0 : Y (n) t = 1}. := Y (n) (τ 1 t) time-reversed block counting process (Ỹ (n) t = for t τ 1 ). Jump rates q (n) ji = g niq i j g nj, q (n) n = q nn = n 1 j=1 q nj (Nagasawa s formula) P(Ỹ (n) 0 = k) = P(Y (n) τ 1 = k) = g nkq k1. g ni := E t = i)dt is the Green function (in general, not known explicitly, but easy recursion). 0 1(Y (n) Matthias Birkner (LMU Munich) Banff, Sept / 42

30 Neutral mutations (Λ-) Ethier-Griffiths urn Simulating samples under the IMS model: Λ-case, cont. The n-λ- Ethier-Griffiths urn (mutation rate r). Begin with K leaves, P(K = k) = P(Ỹ (n) 0 = k). While there are k leaves: Add a mutation to a leaf w. prob. split one leaf into l if k = n goto stop (leaf picked uniformly among the k). Stop. w. prob. eq(n) w. prob. r kr eq (n) kk, k,k+l 1 eq (n) kk eq nn (n) kr eq nn (n), Matthias Birkner (LMU Munich) Banff, Sept / 42

31 Computing likelihoods Tree probabilities Recursion for tree probabilities Can calculate P (r,λ) ( observed sequence data (t,n) ) =: p(t,n) via Matthias Birkner (LMU Munich) Banff, Sept / 42

32 Computing likelihoods Recursion for tree probabilities Tree probabilities Can calculate P (r,λ) ( observed sequence data (t,n) ) =: p(t,n) via p(t,n) = 1 rn + λ n r + rn + λ n + n i i:n i 2 k=2 r 1 rn + λ n d ( n k i:n i =1,x i0 unique, s(x i ) x j j i:n i =1, x i0 unique ) λ n,k n i k + 1 n k + 1 p( t,n (k 1)e i ) p ( s i (t),n ) ( (n j + 1)p ri (t), r i (n + e j ) ). j:s(x i)=x j Extends Ethier & Griffiths (1987) to Λ-coalescents and Möhle s recursion (2005) to IMS model (and is a true recursion in the complexity of the sample). Matthias Birkner (LMU Munich) Banff, Sept / 42

33 Computing likelihoods Tree probabilities Compute probabilities of observed data naïve approach: simulate (often), est. via empirical frequency of observed data (infeasible: req. prob. may be ) Use exact recursions for moderate sample complexities. Approach more complex samples by version of Griffiths & Tavaré s (1994) Monte Carlo method [ τ 1 p(t,n) = E (t,n) i=0 ] f (r,λ) (X i ) For suitable Markov chain X i on sample configurations. Estimate expectation via empirical mean of independent runs. extension to Λ-coalescents by B. & Blath (2008) Matthias Birkner (LMU Munich) Banff, Sept / 42

34 Computing likelihoods Tree probabilities Analyzing a simulated sample of size 15: sd b estimate ) log10( Griffiths & Tavaré log10(#runs) Matthias Birkner (LMU Munich) Banff, Sept / 42

35 Histories Computing likelihoods Importance sampling Interpret genealogy as sequence of historical states: H = ( H τ = ((1),(0)),H τ 1,...,H 1,H 0 = (t,n) ) ( ((3,1,0),(1,0),(2,1,0),(0) ),(1,2,1,1) ) Matthias Birkner (LMU Munich) Banff, Sept / 42

36 Histories Computing likelihoods Importance sampling Interpret genealogy as sequence of historical states: H = ( H τ = ((1),(0)),H τ 1,...,H 1,H 0 = (t,n) ) 0 H 6 1 H 5 H H 3 H 2 H 1 H 0 ( ((3,1,0),(1,0),(2,1,0),(0) ),(1,2,1,1) ) Matthias Birkner (LMU Munich) Banff, Sept / 42

37 Histories Computing likelihoods Importance sampling Interpret genealogy as sequence of historical states: H = ( H τ = ((1),(0)),H τ 1,...,H 1,H 0 = (t,n) ) ( ((3,1,0),(1,0),(2,1,0),(0) ),(1,2,1,1) ) Matthias Birkner (LMU Munich) Banff, Sept / 42

38 Histories Computing likelihoods Importance sampling Interpret genealogy as sequence of historical states: H = ( H τ = ((1),(0)),H τ 1,...,H 1,H 0 = (t,n) ) ( ((3,1,0),(1,0),(2,1,0),(0) ),(1,2,1,1) ) Matthias Birkner (LMU Munich) Banff, Sept / 42

39 Histories Computing likelihoods Importance sampling Interpret genealogy as sequence of historical states: H = ( H τ = ((1),(0)),H τ 1,...,H 1,H 0 = (t,n) ) different histories can lead to same sample ( ((3,1,0),(1,0),(2,1,0),(0) ),(1,2,1,1) ) Matthias Birkner (LMU Munich) Banff, Sept / 42

40 Importance sampling Computing likelihoods Importance sampling We have p(t,n) = P (r,λ) ( H0 = (t,n) ) = = H:H 0 =(t,n) H:H 0 =(t,n) ( ) P (r,λ) H Q ( H ) Q(H), }{{} =:w(h) importance weight P (r,λ) ( H ) for any law Q on histories s.th. P (r,λ) {H0 =(t,n)} Q. Thus, p(t,n) 1 R R w(h (i) ), i=1 where H (1),..., H (R) are independent samples from Q Matthias Birkner (LMU Munich) Banff, Sept / 42

41 Computing likelihoods Importance sampling (Theoretical) optimal solution p(t,n) = H:H 0 =(t,n) ( ) P (r,λ) H Q ( H ) Q( H ) 1 R R w(h (i) ) i=1 Q opt ( ) := P (r,λ) ( H 0 = (t,n) ) is optimal: Variance of estimator is zero since w(h (i) ) p(t,n). Finding Q opt is as hard as the original problem. H 0,H 1,... is Markov chain under Q opt (time reversal of Λ-Ethier-Griffiths urn). Remark: Transition probabilities q GT (H i H i+1 ) P (r,λ) (H i+1 H i ) gives (Λ-)Griffiths-Tavaré method. Matthias Birkner (LMU Munich) Banff, Sept / 42

42 Computing likelihoods Importance sampling Stephens and Donnelly s (2000) IMS candidate Kingman case: Choose individual uniformly: If type is unique in sample, remove outmost mutation, if at least two individuals with this type, merge two lines. (this would be optimal for parent-independent mutations) Heuristic extension to Λ case: ( si (t),n ) w.p. 1 if n i = 1, x i0 unique, s i (x i ) x j j ( (t,n) ri (t,r i (n + e j )) w.p. 1 if n i = 1, x i0 unique, s i (x i ) x j ( ) t,n (k 1)ei where q ni (k) = w.p. n i q ni (k) if 2 k n i, q n,n k+1 P ni, jump probabilities of block counting process. l=2 q n,n l+1 Matthias Birkner (LMU Munich) Banff, Sept / 42

43 Computing likelihoods Importance sampling Analyzing a simulated sample of size 15: sd b estimate ) log10( Griffiths & Tavaré Stephens & Donnelly log10(#runs) Matthias Birkner (LMU Munich) Banff, Sept / 42

44 Computing likelihoods Importance sampling Hobolth, Uyenoyama & Wiuf s (2008) idea Sample of size n where exactly one mutation is visible (in d copies). 0 n-d 1 d { } p (1) (r,λ) (n,d) = P most recent event involves individual bearing (r,λ) mutation Probability can be computed Kingman case: explicit formula (HUW (2008)) Λ case: numerically, using recursion Matthias Birkner (LMU Munich) Banff, Sept / 42

45 Computing likelihoods Importance sampling Hobolth, Uyenoyama & Wiuf s (2008) idea contd. For a general sample (t,n) put i u i,m = { where mutation m is present in d m individuals. Matthias Birkner (LMU Munich) Banff, Sept / 42

46 Computing likelihoods Importance sampling Hobolth, Uyenoyama & Wiuf s (2008) idea contd. For a general sample (t,n) carrying: = d put i u i,m = { p (1) (r,λ) (n,d m) ni d m if i bears m where mutation m is present in d m individuals. Matthias Birkner (LMU Munich) Banff, Sept / 42

47 Computing likelihoods Importance sampling Hobolth, Uyenoyama & Wiuf s (2008) idea contd. For a general sample (t,n) carrying: put i u i,m = { p (1) (r,λ) (n,d m) ni d m if i bears m where mutation m is present in d m individuals. Matthias Birkner (LMU Munich) Banff, Sept / 42

48 Computing likelihoods Importance sampling Hobolth, Uyenoyama & Wiuf s (2008) idea contd. For a general sample (t,n) carrying: not carrying: put i { (1) p (r,λ) u i,m = (n,d m) ni d ( m (1) 1 p (r,λ) (n,d m) ) n i n d m where mutation m is present in d m individuals. if i bears m otherwise, Matthias Birkner (LMU Munich) Banff, Sept / 42

49 Computing likelihoods Importance sampling Hobolth, Uyenoyama & Wiuf s (2008) idea contd. For a general sample (t,n) carrying: not carrying: put i { (1) p (r,λ) u i,m = (n,d m) ni d ( m (1) 1 p (r,λ) (n,d m) ) n i n d m where mutation m is present in d m individuals if i bears m otherwise, Matthias Birkner (LMU Munich) Banff, Sept / 42

50 Computing likelihoods Importance sampling Hobolth, Uyenoyama & Wiuf s (2008) idea contd. For a general sample (t,n) carrying: not carrying: put i { (1) p (r,λ) u i,m = (n,d m) ni d ( m (1) 1 p (r,λ) (n,d m) ) n i n d m if i bears m otherwise, where mutation m is present in d m individuals. Propose type i according to { ( ) q Λ-HUW i (t,n) m u i,m if i is allowed to act 0 otherwise. Matthias Birkner (LMU Munich) Banff, Sept / 42

51 Computing likelihoods Importance sampling Hobolth, Uyenoyama & Wiuf s (2008) idea contd. If proposed type i is singleton: remove outmost mutation, has n i 2: merger inside type i. Kingman case: merge two lines Λ-case: propose l + 1-merger w.p. P r,λ { 0 n 1 d o l 0 n 1 d o } Matthias Birkner (LMU Munich) Banff, Sept / 42

52 Computing likelihoods Importance sampling Analyzing a simulated sample of size 15: sd b estimate ) log10( Griffiths & Tavaré Stephens & Donnelly Hobolth, Uyenoyama & Wiuf log10(#runs) Matthias Birkner (LMU Munich) Banff, Sept / 42

53 Computing likelihoods Using Pairs of Mutations Importance sampling For sample (t,n) denote possible steps by { {decrease n i by l} if n i > 1,l > 0, (i,l) := {remove o(i)} if n i = 1,l = 0 and o(i) leaf. 0 0 m(i) m(i) 1 6 i Let m(i) be type corresponding to i in induced subtree. Set { } decrease multiplicity of m(i) by n P (r,λ) i l in {m v {m1,m 2 }(i,l) = 1, m 2 }-genetree #m(i) if l > 1, { } last event origin of o(m(i)) in P (r,λ) l = 0. {m 1, m 2 }-genetree 4 2 Matthias Birkner (LMU Munich) Banff, Sept / 42

54 Computing likelihoods Using Pairs of Mutations cont d Importance sampling Summing over all pairs of mutations turned out to be inefficient. Sum over pairs where one mutation is o(i). For general (t,n) choose step (i,l) according to { m o(i) Q Λ-HUW2 (i,l t,n) v {o(i),m}(i,l) if i is allowed to act, 0 otherwise. Matthias Birkner (LMU Munich) Banff, Sept / 42

55 Computing likelihoods Importance sampling Analyzing a simulated sample of size 15: sd b estimate ) log10( Griffiths & Tavaré Stephens & Donnelly Hobolth, Uyenoyama & Wiuf Hobolth, Uyenoyama & Wiuf II log10(#runs) Matthias Birkner (LMU Munich) Banff, Sept / 42

56 Performance Computing likelihoods Importance sampling Simulated 100 samples of size 15 with random r and α. Analysed with (new) random r and α. Time needed to get relative error below 0.01: (a) measured in log10(# runs of MC) (b) measured in log10(seconds) black = Λ-GT, green = Λ-SD, red = Λ-HUW, blue = Λ-HUW-II Matthias Birkner (LMU Munich) Banff, Sept / 42

57 Computing likelihoods Importance sampling Remarks Potential for speed-up: Compute likelihoods for several parameter sets (r 1,Λ 1 ),...,(r m,λ m ) simulataneously using sampled histories H (1),..., H (R) from one driving value (r 0,Λ 0 ) Pre-compute and store p (r,λ) (t,n)) for small trees (e.g. at most m mutations), stop chain when entering set of small trees Matthias Birkner (LMU Munich) Banff, Sept / 42

58 Computing likelihoods Importance sampling A simulation study: Comparing three estimators (sample size n = 100) Three (ML) estimators for α α SS based on summary statistic (no. mutations,no. singletons) (computation fast via recursion, Eldon & Wakeley s approach) α data based on full data α fulltree based on knowledge of full genealogical tree (in reality: unknown) Matthias Birkner (LMU Munich) Banff, Sept / 42

59 Computing likelihoods Importance sampling A simulation study: Comparing three estimators (sample size n = 100) Three (ML) estimators for α α SS based on summary statistic (no. mutations,no. singletons) (computation fast via recursion, Eldon & Wakeley s approach) α data based on full data α fulltree based on knowledge of full genealogical tree (in reality: unknown) True α = 1 α SS α data α fulltree mean MSE True α = 1.4 α SS α data α fulltree mean MSE True α = 2 α SS α data α fulltree mean MSE (Estimated values, based on 100 replicates. True r = 1.0) Matthias Birkner (LMU Munich) Banff, Sept / 42

60 Computing likelihoods Importance sampling Estimating aspects of the genealogy Method allows to estimate times, conditional on data, e.g. E (r,λ) [ TMRCA of sample obs. data = (t,n) ], P (r,λ) ( TMRCA of sample t obs. data = (t,n) ) : For any test function ϕ E (r,λ) [ ϕ(h) obs. data = (t,n) ] 1 R R i=1 ϕ(h (i) ) P ( (r,λ) H (i) ) Q ( H (i)) with H (1),...,H (R) i.i.d. draws from Q( ) (compatible with P (r,λ) ( obs. data)). (Literally: Plus additional independent exponential waiting times between events in the genealogy) Matthias Birkner (LMU Munich) Banff, Sept / 42

61 Illustration and outlook Illustration Dataset from Ward et al, Extensive Mitochondrial Diversity Within a Single Amerindian Tribe, PNAS 1991 Analysis with Beta-Coalescent: r e 25 1e 24 1e 23 1e e 21 1e 20 2e 204e 20 9e 20 8e 20 6e α Mitochondrial control region from 55 female Nuu-Chah-Nulth: α ML = 1.9, r ML = 2.2 (Sample as edited in Griffiths & Tavaré, Stat. Sci., 1994) 1e 21 1e 22 1e 23 1e 24 1e 25 Matthias Birkner (LMU Munich) Banff, Sept / 42

62 r r Illustration and outlook r r Illustration r r Genetic variation at the mitochondrial cyt b locus of Atlantic cod: log-likelihood surfaces Carr & Marshall 1991 Pepin & Carr 1993 Árnason & Palsson α α α bα ML = 1.4,br ML = 0.8, bα BBS = 1.4 bα ML = 1.4,br ML = 0.6, bα BBS = 1.5 bα ML = 1.65,br ML = 0.7, bα BBS = Árnason et al 1998 Árnason et al 2000 Sigurgíslason & Árnason α α α bα ML = 1.55,br ML = 0.6, bα BBS = 1.8 bα ML = 1.4,br ML = 0.8, bα BBS = 1.5 bα ML = 1.4,br ML = 0.8, bα BBS = Matthias Birkner (LMU Munich) Banff, Sept / 42

63 Illustration and outlook Illustration α-effective population size can the figures make sense? In Schweinsberg s model, we have pair coalescence prob. c N C N 1 α (C = αγ(α)γ(2 α)m α, where m = mean of X 1 ) Matthias Birkner (LMU Munich) Banff, Sept / 42

64 Illustration and outlook Illustration α-effective population size can the figures make sense? In Schweinsberg s model, we have pair coalescence prob. c N C N 1 α (C = αγ(α)γ(2 α)m α, where m = mean of X 1 ) Let µ = mutation rate per gen. at the considered locus, then µ c N r Matthias Birkner (LMU Munich) Banff, Sept / 42

65 Illustration and outlook Illustration α-effective population size can the figures make sense? In Schweinsberg s model, we have pair coalescence prob. c N C N 1 α (C = αγ(α)γ(2 α)m α, where m = mean of X 1 ) Let µ = mutation rate per gen. at the considered locus, then µ r, ( c N hence rαγ(α)γ(2 α)m α) 1/(α 1). N eff,α µ Matthias Birkner (LMU Munich) Banff, Sept / 42

66 Illustration and outlook Illustration α-effective population size can the figures make sense? In Schweinsberg s model, we have pair coalescence prob. c N C N 1 α (C = αγ(α)γ(2 α)m α, where m = mean of X 1 ) Let µ = mutation rate per gen. at the considered locus, then µ r, ( c N hence rαγ(α)γ(2 α)m α) 1/(α 1). N eff,α µ Using µ = (Árnason, 2004), α = 1.5, r = 1 (and, ad hoc, m = 2), this gives N eff,α= Árnason (2004) writes:... the actual population size [of atlantic cod] is not < 10 9 and probably one or two orders of magnitude larger. Matthias Birkner (LMU Munich) Banff, Sept / 42

67 Illustration and outlook Illustration α-effective population size can the figures make sense? In Schweinsberg s model, we have pair coalescence prob. c N C N 1 α (C = αγ(α)γ(2 α)m α, where m = mean of X 1 ) Let µ = mutation rate per gen. at the considered locus, then µ r, ( c N hence rαγ(α)γ(2 α)m α) 1/(α 1). N eff,α µ Using µ = (Árnason, 2004), α = 1.5, r = 1 (and, ad hoc, m = 2), this gives N eff,α= Árnason (2004) writes:... the actual population size [of atlantic cod] is not < 10 9 and probably one or two orders of magnitude larger. By contrast, using a Wright-Fisher model and N eff,α=2 µ = r, we have (using r α=2 = 2): Neff,α= Matthias Birkner (LMU Munich) Banff, Sept / 42

68 Summary & Outlook Illustration and outlook Outlook Eldon & Wakeley, Genetics 2006, wrote For many species, the coalescent with multiple mergers might be a better null model than Kingman s coalescent. Matthias Birkner (LMU Munich) Banff, Sept / 42

69 Illustration and outlook Outlook Summary & Outlook Eldon & Wakeley, Genetics 2006, wrote For many species, the coalescent with multiple mergers might be a better null model than Kingman s coalescent. For panmictic fixed-size discrete generations populations, haploid neutral one-locus theory is mathematically complete. Tools for estimation exist, results point towards non-kingman-ness in certain cases. Statistical properties of estimators? (Further) speed-up of computer-intensive methods? A good class of alternative models? In particular, true diploid models? Application to scenarios with selection? Matthias Birkner (LMU Munich) Banff, Sept / 42

70 Illustration and outlook Outlook Thank you for your attention! Matthias Birkner (LMU Munich) Banff, Sept / 42

Mathematical Biology. Computing likelihoods for coalescents with multiple collisions in the infinitely many sites model. Matthias Birkner Jochen Blath

Mathematical Biology. Computing likelihoods for coalescents with multiple collisions in the infinitely many sites model. Matthias Birkner Jochen Blath J. Math. Biol. DOI 0.007/s00285-008-070-6 Mathematical Biology Computing likelihoods for coalescents with multiple collisions in the infinitely many sites model Matthias Birkner Jochen Blath Received:

More information

The Λ-Fleming-Viot process and a connection with Wright-Fisher diffusion. Bob Griffiths University of Oxford

The Λ-Fleming-Viot process and a connection with Wright-Fisher diffusion. Bob Griffiths University of Oxford The Λ-Fleming-Viot process and a connection with Wright-Fisher diffusion Bob Griffiths University of Oxford A d-dimensional Λ-Fleming-Viot process {X(t)} t 0 representing frequencies of d types of individuals

More information

Stochastic Demography, Coalescents, and Effective Population Size

Stochastic Demography, Coalescents, and Effective Population Size Demography Stochastic Demography, Coalescents, and Effective Population Size Steve Krone University of Idaho Department of Mathematics & IBEST Demographic effects bottlenecks, expansion, fluctuating population

More information

Wald Lecture 2 My Work in Genetics with Jason Schweinsbreg

Wald Lecture 2 My Work in Genetics with Jason Schweinsbreg Wald Lecture 2 My Work in Genetics with Jason Schweinsbreg Rick Durrett Rick Durrett (Cornell) Genetics with Jason 1 / 42 The Problem Given a population of size N, how long does it take until τ k the first

More information

Evolution in a spatial continuum

Evolution in a spatial continuum Evolution in a spatial continuum Drift, draft and structure Alison Etheridge University of Oxford Joint work with Nick Barton (Edinburgh) and Tom Kurtz (Wisconsin) New York, Sept. 2007 p.1 Kingman s Coalescent

More information

The mathematical challenge. Evolution in a spatial continuum. The mathematical challenge. Other recruits... The mathematical challenge

The mathematical challenge. Evolution in a spatial continuum. The mathematical challenge. Other recruits... The mathematical challenge The mathematical challenge What is the relative importance of mutation, selection, random drift and population subdivision for standing genetic variation? Evolution in a spatial continuum Al lison Etheridge

More information

Dynamics of the evolving Bolthausen-Sznitman coalescent. by Jason Schweinsberg University of California at San Diego.

Dynamics of the evolving Bolthausen-Sznitman coalescent. by Jason Schweinsberg University of California at San Diego. Dynamics of the evolving Bolthausen-Sznitman coalescent by Jason Schweinsberg University of California at San Diego Outline of Talk 1. The Moran model and Kingman s coalescent 2. The evolving Kingman s

More information

Measure-valued diffusions, general coalescents and population genetic inference 1

Measure-valued diffusions, general coalescents and population genetic inference 1 Measure-valued diffusions, general coalescents and population genetic inference 1 Matthias Birkner 2, Jochen Blath 3 Abstract: We review recent progress in the understanding of the interplay between population

More information

ON COMPOUND POISSON POPULATION MODELS

ON COMPOUND POISSON POPULATION MODELS ON COMPOUND POISSON POPULATION MODELS Martin Möhle, University of Tübingen (joint work with Thierry Huillet, Université de Cergy-Pontoise) Workshop on Probability, Population Genetics and Evolution Centre

More information

Theoretical Population Biology

Theoretical Population Biology Theoretical Population Biology 74 008 104 114 Contents lists available at ScienceDirect Theoretical Population Biology journal homepage: www.elsevier.com/locate/tpb A coalescent process with simultaneous

More information

BIRS workshop Sept 6 11, 2009

BIRS workshop Sept 6 11, 2009 Diploid biparental Moran model with large offspring numbers and recombination Bjarki Eldon New mathematical challenges from molecular biology and genetics BIRS workshop Sept 6, 2009 Mendel s Laws The First

More information

The Combinatorial Interpretation of Formulas in Coalescent Theory

The Combinatorial Interpretation of Formulas in Coalescent Theory The Combinatorial Interpretation of Formulas in Coalescent Theory John L. Spouge National Center for Biotechnology Information NLM, NIH, DHHS spouge@ncbi.nlm.nih.gov Bldg. A, Rm. N 0 NCBI, NLM, NIH Bethesda

More information

How robust are the predictions of the W-F Model?

How robust are the predictions of the W-F Model? How robust are the predictions of the W-F Model? As simplistic as the Wright-Fisher model may be, it accurately describes the behavior of many other models incorporating additional complexity. Many population

More information

A representation for the semigroup of a two-level Fleming Viot process in terms of the Kingman nested coalescent

A representation for the semigroup of a two-level Fleming Viot process in terms of the Kingman nested coalescent A representation for the semigroup of a two-level Fleming Viot process in terms of the Kingman nested coalescent Airam Blancas June 11, 017 Abstract Simple nested coalescent were introduced in [1] to model

More information

The Wright-Fisher Model and Genetic Drift

The Wright-Fisher Model and Genetic Drift The Wright-Fisher Model and Genetic Drift January 22, 2015 1 1 Hardy-Weinberg Equilibrium Our goal is to understand the dynamics of allele and genotype frequencies in an infinite, randomlymating population

More information

The genealogy of branching Brownian motion with absorption. by Jason Schweinsberg University of California at San Diego

The genealogy of branching Brownian motion with absorption. by Jason Schweinsberg University of California at San Diego The genealogy of branching Brownian motion with absorption by Jason Schweinsberg University of California at San Diego (joint work with Julien Berestycki and Nathanaël Berestycki) Outline 1. A population

More information

6 Introduction to Population Genetics

6 Introduction to Population Genetics Grundlagen der Bioinformatik, SoSe 14, D. Huson, May 18, 2014 67 6 Introduction to Population Genetics This chapter is based on: J. Hein, M.H. Schierup and C. Wuif, Gene genealogies, variation and evolution,

More information

Gene Genealogies Coalescence Theory. Annabelle Haudry Glasgow, July 2009

Gene Genealogies Coalescence Theory. Annabelle Haudry Glasgow, July 2009 Gene Genealogies Coalescence Theory Annabelle Haudry Glasgow, July 2009 What could tell a gene genealogy? How much diversity in the population? Has the demographic size of the population changed? How?

More information

6 Introduction to Population Genetics

6 Introduction to Population Genetics 70 Grundlagen der Bioinformatik, SoSe 11, D. Huson, May 19, 2011 6 Introduction to Population Genetics This chapter is based on: J. Hein, M.H. Schierup and C. Wuif, Gene genealogies, variation and evolution,

More information

Estimating effective population size from samples of sequences: inefficiency of pairwise and segregating sites as compared to phylogenetic estimates

Estimating effective population size from samples of sequences: inefficiency of pairwise and segregating sites as compared to phylogenetic estimates Estimating effective population size from samples of sequences: inefficiency of pairwise and segregating sites as compared to phylogenetic estimates JOSEPH FELSENSTEIN Department of Genetics SK-50, University

More information

Yaglom-type limit theorems for branching Brownian motion with absorption. by Jason Schweinsberg University of California San Diego

Yaglom-type limit theorems for branching Brownian motion with absorption. by Jason Schweinsberg University of California San Diego Yaglom-type limit theorems for branching Brownian motion with absorption by Jason Schweinsberg University of California San Diego (with Julien Berestycki, Nathanaël Berestycki, Pascal Maillard) Outline

More information

Mathematical models in population genetics II

Mathematical models in population genetics II Mathematical models in population genetics II Anand Bhaskar Evolutionary Biology and Theory of Computing Bootcamp January 1, 014 Quick recap Large discrete-time randomly mating Wright-Fisher population

More information

Endowed with an Extra Sense : Mathematics and Evolution

Endowed with an Extra Sense : Mathematics and Evolution Endowed with an Extra Sense : Mathematics and Evolution Todd Parsons Laboratoire de Probabilités et Modèles Aléatoires - Université Pierre et Marie Curie Center for Interdisciplinary Research in Biology

More information

Computational Systems Biology: Biology X

Computational Systems Biology: Biology X Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA Human Population Genomics Outline 1 2 Damn the Human Genomes. Small initial populations; genes too distant; pestered with transposons;

More information

Stochastic flows associated to coalescent processes

Stochastic flows associated to coalescent processes Stochastic flows associated to coalescent processes Jean Bertoin (1) and Jean-François Le Gall (2) (1) Laboratoire de Probabilités et Modèles Aléatoires and Institut universitaire de France, Université

More information

Lecture 18 : Ewens sampling formula

Lecture 18 : Ewens sampling formula Lecture 8 : Ewens sampling formula MATH85K - Spring 00 Lecturer: Sebastien Roch References: [Dur08, Chapter.3]. Previous class In the previous lecture, we introduced Kingman s coalescent as a limit of

More information

Counting All Possible Ancestral Configurations of Sample Sequences in Population Genetics

Counting All Possible Ancestral Configurations of Sample Sequences in Population Genetics 1 Counting All Possible Ancestral Configurations of Sample Sequences in Population Genetics Yun S. Song, Rune Lyngsø, and Jotun Hein Abstract Given a set D of input sequences, a genealogy for D can be

More information

MANY questions in population genetics concern the role

MANY questions in population genetics concern the role INVESTIGATION The Equilibrium Allele Frequency Distribution for a Population with Reproductive Skew Ricky Der 1 and Joshua B. Plotkin Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania

More information

A comparison of two popular statistical methods for estimating the time to most recent common ancestor (TMRCA) from a sample of DNA sequences

A comparison of two popular statistical methods for estimating the time to most recent common ancestor (TMRCA) from a sample of DNA sequences Indian Academy of Sciences A comparison of two popular statistical methods for estimating the time to most recent common ancestor (TMRCA) from a sample of DNA sequences ANALABHA BASU and PARTHA P. MAJUMDER*

More information

Quantitative trait evolution with mutations of large effect

Quantitative trait evolution with mutations of large effect Quantitative trait evolution with mutations of large effect May 1, 2014 Quantitative traits Traits that vary continuously in populations - Mass - Height - Bristle number (approx) Adaption - Low oxygen

More information

Frequency Spectra and Inference in Population Genetics

Frequency Spectra and Inference in Population Genetics Frequency Spectra and Inference in Population Genetics Although coalescent models have come to play a central role in population genetics, there are some situations where genealogies may not lead to efficient

More information

Demography April 10, 2015

Demography April 10, 2015 Demography April 0, 205 Effective Population Size The Wright-Fisher model makes a number of strong assumptions which are clearly violated in many populations. For example, it is unlikely that any population

More information

Closed-form sampling formulas for the coalescent with recombination

Closed-form sampling formulas for the coalescent with recombination 0 / 21 Closed-form sampling formulas for the coalescent with recombination Yun S. Song CS Division and Department of Statistics University of California, Berkeley September 7, 2009 Joint work with Paul

More information

The nested Kingman coalescent: speed of coming down from infinity. by Jason Schweinsberg (University of California at San Diego)

The nested Kingman coalescent: speed of coming down from infinity. by Jason Schweinsberg (University of California at San Diego) The nested Kingman coalescent: speed of coming down from infinity by Jason Schweinsberg (University of California at San Diego) Joint work with: Airam Blancas Benítez (Goethe Universität Frankfurt) Tim

More information

I of a gene sampled from a randomly mating popdation,

I of a gene sampled from a randomly mating popdation, Copyright 0 1987 by the Genetics Society of America Average Number of Nucleotide Differences in a From a Single Subpopulation: A Test for Population Subdivision Curtis Strobeck Department of Zoology, University

More information

CLOSED-FORM ASYMPTOTIC SAMPLING DISTRIBUTIONS UNDER THE COALESCENT WITH RECOMBINATION FOR AN ARBITRARY NUMBER OF LOCI

CLOSED-FORM ASYMPTOTIC SAMPLING DISTRIBUTIONS UNDER THE COALESCENT WITH RECOMBINATION FOR AN ARBITRARY NUMBER OF LOCI Adv. Appl. Prob. 44, 391 407 (01) Printed in Northern Ireland Applied Probability Trust 01 CLOSED-FORM ASYMPTOTIC SAMPLING DISTRIBUTIONS UNDER THE COALESCENT WITH RECOMBINATION FOR AN ARBITRARY NUMBER

More information

An introduction to mathematical modeling of the genealogical process of genes

An introduction to mathematical modeling of the genealogical process of genes An introduction to mathematical modeling of the genealogical process of genes Rikard Hellman Kandidatuppsats i matematisk statistik Bachelor Thesis in Mathematical Statistics Kandidatuppsats 2009:3 Matematisk

More information

A modified lookdown construction for the Xi-Fleming-Viot process with mutation and populations with recurrent bottlenecks

A modified lookdown construction for the Xi-Fleming-Viot process with mutation and populations with recurrent bottlenecks Alea 6, 25 61 (2009) A modified lookdown construction for the Xi-Fleming-Viot process with mutation and populations with recurrent bottlenecks Matthias Birkner, Jochen Blath, Martin Möhle, Matthias Steinrücken

More information

Two viewpoints on measure valued processes

Two viewpoints on measure valued processes Two viewpoints on measure valued processes Olivier Hénard Université Paris-Est, Cermics Contents 1 The classical framework : from no particle to one particle 2 The lookdown framework : many particles.

More information

Population Genetics I. Bio

Population Genetics I. Bio Population Genetics I. Bio5488-2018 Don Conrad dconrad@genetics.wustl.edu Why study population genetics? Functional Inference Demographic inference: History of mankind is written in our DNA. We can learn

More information

π b = a π a P a,b = Q a,b δ + o(δ) = 1 + Q a,a δ + o(δ) = I 4 + Qδ + o(δ),

π b = a π a P a,b = Q a,b δ + o(δ) = 1 + Q a,a δ + o(δ) = I 4 + Qδ + o(δ), ABC estimation of the scaled effective population size. Geoff Nicholls, DTC 07/05/08 Refer to http://www.stats.ox.ac.uk/~nicholls/dtc/tt08/ for material. We will begin with a practical on ABC estimation

More information

Estimating Evolutionary Trees. Phylogenetic Methods

Estimating Evolutionary Trees. Phylogenetic Methods Estimating Evolutionary Trees v if the data are consistent with infinite sites then all methods should yield the same tree v it gets more complicated when there is homoplasy, i.e., parallel or convergent

More information

Solutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin

Solutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin Solutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin CHAPTER 1 1.2 The expected homozygosity, given allele

More information

The two-parameter generalization of Ewens random partition structure

The two-parameter generalization of Ewens random partition structure The two-parameter generalization of Ewens random partition structure Jim Pitman Technical Report No. 345 Department of Statistics U.C. Berkeley CA 94720 March 25, 1992 Reprinted with an appendix and updated

More information

Theoretical Population Biology

Theoretical Population Biology Theoretical Population Biology 80 (20) 80 99 Contents lists available at ScienceDirect Theoretical Population Biology journal homepage: www.elsevier.com/locate/tpb Generalized population models and the

More information

Population Genetics: a tutorial

Population Genetics: a tutorial : a tutorial Institute for Science and Technology Austria ThRaSh 2014 provides the basic mathematical foundation of evolutionary theory allows a better understanding of experiments allows the development

More information

The fate of beneficial mutations in a range expansion

The fate of beneficial mutations in a range expansion The fate of beneficial mutations in a range expansion R. Lehe (Paris) O. Hallatschek (Göttingen) L. Peliti (Naples) February 19, 2015 / Rutgers Outline Introduction The model Results Analysis Range expansion

More information

Diffusion Models in Population Genetics

Diffusion Models in Population Genetics Diffusion Models in Population Genetics Laura Kubatko kubatko.2@osu.edu MBI Workshop on Spatially-varying stochastic differential equations, with application to the biological sciences July 10, 2015 Laura

More information

A CHARACTERIZATION OF ANCESTRAL LIMIT PROCESSES ARISING IN HAPLOID. Abstract. conditions other limit processes do appear, where multiple mergers of

A CHARACTERIZATION OF ANCESTRAL LIMIT PROCESSES ARISING IN HAPLOID. Abstract. conditions other limit processes do appear, where multiple mergers of A CHARACTERIATIO OF ACESTRAL LIMIT PROCESSES ARISIG I HAPLOID POPULATIO GEETICS MODELS M. Mohle, Johannes Gutenberg-University, Mainz and S. Sagitov 2, Chalmers University of Technology, Goteborg Abstract

More information

Genetic Variation in Finite Populations

Genetic Variation in Finite Populations Genetic Variation in Finite Populations The amount of genetic variation found in a population is influenced by two opposing forces: mutation and genetic drift. 1 Mutation tends to increase variation. 2

More information

Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles

Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles John Novembre and Montgomery Slatkin Supplementary Methods To

More information

M A. Consistency and intractable likelihood for jump diffusions and generalised coalescent processes. Jere Juhani Koskela. Thesis

M A. Consistency and intractable likelihood for jump diffusions and generalised coalescent processes. Jere Juhani Koskela. Thesis M A D O S C Consistency and intractable likelihood for jump diffusions and generalised coalescent processes by Jere Juhani Koskela Thesis Submitted for the degree of Doctor of Philosophy Mathematics Institute

More information

Introduction to population genetics & evolution

Introduction to population genetics & evolution Introduction to population genetics & evolution Course Organization Exam dates: Feb 19 March 1st Has everybody registered? Did you get the email with the exam schedule Summer seminar: Hot topics in Bioinformatics

More information

From Individual-based Population Models to Lineage-based Models of Phylogenies

From Individual-based Population Models to Lineage-based Models of Phylogenies From Individual-based Population Models to Lineage-based Models of Phylogenies Amaury Lambert (joint works with G. Achaz, H.K. Alexander, R.S. Etienne, N. Lartillot, H. Morlon, T.L. Parsons, T. Stadler)

More information

122 9 NEUTRALITY TESTS

122 9 NEUTRALITY TESTS 122 9 NEUTRALITY TESTS 9 Neutrality Tests Up to now, we calculated different things from various models and compared our findings with data. But to be able to state, with some quantifiable certainty, that

More information

Inventory Model (Karlin and Taylor, Sec. 2.3)

Inventory Model (Karlin and Taylor, Sec. 2.3) stochnotes091108 Page 1 Markov Chain Models and Basic Computations Thursday, September 11, 2008 11:50 AM Homework 1 is posted, due Monday, September 22. Two more examples. Inventory Model (Karlin and Taylor,

More information

URN MODELS: the Ewens Sampling Lemma

URN MODELS: the Ewens Sampling Lemma Department of Computer Science Brown University, Providence sorin@cs.brown.edu October 3, 2014 1 2 3 4 Mutation Mutation: typical values for parameters Equilibrium Probability of fixation 5 6 Ewens Sampling

More information

So in terms of conditional probability densities, we have by differentiating this relationship:

So in terms of conditional probability densities, we have by differentiating this relationship: Modeling with Finite State Markov Chains Tuesday, September 27, 2011 1:54 PM Homework 1 due Friday, September 30 at 2 PM. Office hours on 09/28: Only 11 AM-12 PM (not at 3 PM) Important side remark about

More information

Crump Mode Jagers processes with neutral Poissonian mutations

Crump Mode Jagers processes with neutral Poissonian mutations Crump Mode Jagers processes with neutral Poissonian mutations Nicolas Champagnat 1 Amaury Lambert 2 1 INRIA Nancy, équipe TOSCA 2 UPMC Univ Paris 06, Laboratoire de Probabilités et Modèles Aléatoires Paris,

More information

Evolution of cooperation in finite populations. Sabin Lessard Université de Montréal

Evolution of cooperation in finite populations. Sabin Lessard Université de Montréal Evolution of cooperation in finite populations Sabin Lessard Université de Montréal Title, contents and acknowledgements 2011 AMS Short Course on Evolutionary Game Dynamics Contents 1. Examples of cooperation

More information

Computer Intensive Methods in Mathematical Statistics

Computer Intensive Methods in Mathematical Statistics Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of

More information

Sequential Monte Carlo Methods for Bayesian Computation

Sequential Monte Carlo Methods for Bayesian Computation Sequential Monte Carlo Methods for Bayesian Computation A. Doucet Kyoto Sept. 2012 A. Doucet (MLSS Sept. 2012) Sept. 2012 1 / 136 Motivating Example 1: Generic Bayesian Model Let X be a vector parameter

More information

Mathematical Biology. Sabin Lessard

Mathematical Biology. Sabin Lessard J. Math. Biol. DOI 10.1007/s00285-008-0248-1 Mathematical Biology Diffusion approximations for one-locus multi-allele kin selection, mutation and random drift in group-structured populations: a unifying

More information

Stat 516, Homework 1

Stat 516, Homework 1 Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball

More information

Challenges when applying stochastic models to reconstruct the demographic history of populations.

Challenges when applying stochastic models to reconstruct the demographic history of populations. Challenges when applying stochastic models to reconstruct the demographic history of populations. Willy Rodríguez Institut de Mathématiques de Toulouse October 11, 2017 Outline 1 Introduction 2 Inverse

More information

RW in dynamic random environment generated by the reversal of discrete time contact process

RW in dynamic random environment generated by the reversal of discrete time contact process RW in dynamic random environment generated by the reversal of discrete time contact process Andrej Depperschmidt joint with Matthias Birkner, Jiří Černý and Nina Gantert Ulaanbaatar, 07/08/2015 Motivation

More information

Asymptotics and Simulation of Heavy-Tailed Processes

Asymptotics and Simulation of Heavy-Tailed Processes Asymptotics and Simulation of Heavy-Tailed Processes Department of Mathematics Stockholm, Sweden Workshop on Heavy-tailed Distributions and Extreme Value Theory ISI Kolkata January 14-17, 2013 Outline

More information

arxiv: v1 [math.pr] 29 Jul 2014

arxiv: v1 [math.pr] 29 Jul 2014 Sample genealogy and mutational patterns for critical branching populations Guillaume Achaz,2,3 Cécile Delaporte 3,4 Amaury Lambert 3,4 arxiv:47.772v [math.pr] 29 Jul 24 Abstract We study a universal object

More information

Surfing genes. On the fate of neutral mutations in a spreading population

Surfing genes. On the fate of neutral mutations in a spreading population Surfing genes On the fate of neutral mutations in a spreading population Oskar Hallatschek David Nelson Harvard University ohallats@physics.harvard.edu Genetic impact of range expansions Population expansions

More information

Bisection Ideas in End-Point Conditioned Markov Process Simulation

Bisection Ideas in End-Point Conditioned Markov Process Simulation Bisection Ideas in End-Point Conditioned Markov Process Simulation Søren Asmussen and Asger Hobolth Department of Mathematical Sciences, Aarhus University Ny Munkegade, 8000 Aarhus C, Denmark {asmus,asger}@imf.au.dk

More information

Coalescent based demographic inference. Daniel Wegmann University of Fribourg

Coalescent based demographic inference. Daniel Wegmann University of Fribourg Coalescent based demographic inference Daniel Wegmann University of Fribourg Introduction The current genetic diversity is the outcome of past evolutionary processes. Hence, we can use genetic diversity

More information

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research education use, including for instruction at the authors institution

More information

Ancestral Inference in Molecular Biology. Simon Tavaré. Saint-Flour, 2001

Ancestral Inference in Molecular Biology. Simon Tavaré. Saint-Flour, 2001 Ancestral Inference in Molecular Biology This is page i Printer: Opaque this Simon Tavaré Departments of Biological Sciences, Mathematics and Preventive Medicine, University of Southern California Saint-Flour,

More information

Robert Collins CSE586, PSU Intro to Sampling Methods

Robert Collins CSE586, PSU Intro to Sampling Methods Intro to Sampling Methods CSE586 Computer Vision II Penn State Univ Topics to be Covered Monte Carlo Integration Sampling and Expected Values Inverse Transform Sampling (CDF) Ancestral Sampling Rejection

More information

Mean-field dual of cooperative reproduction

Mean-field dual of cooperative reproduction The mean-field dual of systems with cooperative reproduction joint with Tibor Mach (Prague) A. Sturm (Göttingen) Friday, July 6th, 2018 Poisson construction of Markov processes Let (X t ) t 0 be a continuous-time

More information

Learning Session on Genealogies of Interacting Particle Systems

Learning Session on Genealogies of Interacting Particle Systems Learning Session on Genealogies of Interacting Particle Systems A.Depperschmidt, A.Greven University Erlangen-Nuremberg Singapore, 31 July - 4 August 2017 Tree-valued Markov processes Contents 1 Introduction

More information

Stochastic Population Models: Measure-Valued and Partition-Valued Formulations

Stochastic Population Models: Measure-Valued and Partition-Valued Formulations Stochastic Population Models: Measure-Valued and Partition-Valued Formulations Diplomarbeit Humboldt-Universität zu Berlin Mathematisch-aturwissenschaftliche Fakultät II Institut für Mathematik eingereicht

More information

ANCESTRAL PROCESSES WITH SELECTION: BRANCHING AND MORAN MODELS

ANCESTRAL PROCESSES WITH SELECTION: BRANCHING AND MORAN MODELS **************************************** BANACH CENTER PUBLICATIONS, VOLUME ** INSTITUTE OF MATHEMATICS Figure POLISH ACADEMY OF SCIENCES WARSZAWA 2* ANCESTRAL PROCESSES WITH SELECTION: BRANCHING AND MORAN

More information

Intertwining of Markov processes

Intertwining of Markov processes January 4, 2011 Outline Outline First passage times of birth and death processes Outline First passage times of birth and death processes The contact process on the hierarchical group 1 0.5 0-0.5-1 0 0.2

More information

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis

More information

Markov processes and queueing networks

Markov processes and queueing networks Inria September 22, 2015 Outline Poisson processes Markov jump processes Some queueing networks The Poisson distribution (Siméon-Denis Poisson, 1781-1840) { } e λ λ n n! As prevalent as Gaussian distribution

More information

Mathematical Population Genetics II

Mathematical Population Genetics II Mathematical Population Genetics II Lecture Notes Joachim Hermisson March 20, 2015 University of Vienna Mathematics Department Oskar-Morgenstern-Platz 1 1090 Vienna, Austria Copyright (c) 2013/14/15 Joachim

More information

Competing selective sweeps

Competing selective sweeps Competing selective sweeps Sebastian Bossert Dissertation zur Erlangung des Doktorgrades der Fakultät für Mathematik und Physik der Albert-Ludwigs-Universität Freiburg im Breisgau Pr r t r rö r r t Pr

More information

Applications of Analytic Combinatorics in Mathematical Biology (joint with H. Chang, M. Drmota, E. Y. Jin, and Y.-W. Lee)

Applications of Analytic Combinatorics in Mathematical Biology (joint with H. Chang, M. Drmota, E. Y. Jin, and Y.-W. Lee) Applications of Analytic Combinatorics in Mathematical Biology (joint with H. Chang, M. Drmota, E. Y. Jin, and Y.-W. Lee) Michael Fuchs Department of Applied Mathematics National Chiao Tung University

More information

Hidden Markov models in population genetics and evolutionary biology

Hidden Markov models in population genetics and evolutionary biology Hidden Markov models in population genetics and evolutionary biology Gerton Lunter Wellcome Trust Centre for Human Genetics Oxford, UK April 29, 2013 Topics for today Markov chains Hidden Markov models

More information

Pathwise construction of tree-valued Fleming-Viot processes

Pathwise construction of tree-valued Fleming-Viot processes Pathwise construction of tree-valued Fleming-Viot processes Stephan Gufler November 9, 2018 arxiv:1404.3682v4 [math.pr] 27 Dec 2017 Abstract In a random complete and separable metric space that we call

More information

Computer Intensive Methods in Mathematical Statistics

Computer Intensive Methods in Mathematical Statistics Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 7 Sequential Monte Carlo methods III 7 April 2017 Computer Intensive Methods (1) Plan of today s lecture

More information

The effects of a weak selection pressure in a spatially structured population

The effects of a weak selection pressure in a spatially structured population The effects of a weak selection pressure in a spatially structured population A.M. Etheridge, A. Véber and F. Yu CNRS - École Polytechnique Motivations Aim : Model the evolution of the genetic composition

More information

The Moran Process as a Markov Chain on Leaf-labeled Trees

The Moran Process as a Markov Chain on Leaf-labeled Trees The Moran Process as a Markov Chain on Leaf-labeled Trees David J. Aldous University of California Department of Statistics 367 Evans Hall # 3860 Berkeley CA 94720-3860 aldous@stat.berkeley.edu http://www.stat.berkeley.edu/users/aldous

More information

J. Stat. Mech. (2013) P01006

J. Stat. Mech. (2013) P01006 Journal of Statistical Mechanics: Theory and Experiment Genealogies in simple models of evolution Éric Brunet and Bernard Derrida Laboratoire de Physique Statistique, École ormale Supérieure, UPMC, Université

More information

Robert Collins CSE586, PSU Intro to Sampling Methods

Robert Collins CSE586, PSU Intro to Sampling Methods Intro to Sampling Methods CSE586 Computer Vision II Penn State Univ Topics to be Covered Monte Carlo Integration Sampling and Expected Values Inverse Transform Sampling (CDF) Ancestral Sampling Rejection

More information

EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

More information

process on the hierarchical group

process on the hierarchical group Intertwining of Markov processes and the contact process on the hierarchical group April 27, 2010 Outline Intertwining of Markov processes Outline Intertwining of Markov processes First passage times of

More information

3. The Voter Model. David Aldous. June 20, 2012

3. The Voter Model. David Aldous. June 20, 2012 3. The Voter Model David Aldous June 20, 2012 We now move on to the voter model, which (compared to the averaging model) has a more substantial literature in the finite setting, so what s written here

More information

Computational statistics

Computational statistics Computational statistics Combinatorial optimization Thierry Denœux February 2017 Thierry Denœux Computational statistics February 2017 1 / 37 Combinatorial optimization Assume we seek the maximum of f

More information

Problems for 3505 (2011)

Problems for 3505 (2011) Problems for 505 (2011) 1. In the simplex of genotype distributions x + y + z = 1, for two alleles, the Hardy- Weinberg distributions x = p 2, y = 2pq, z = q 2 (p + q = 1) are characterized by y 2 = 4xz.

More information

Reinforcement Learning: Part 3 Evolution

Reinforcement Learning: Part 3 Evolution 1 Reinforcement Learning: Part 3 Evolution Chris Watkins Department of Computer Science Royal Holloway, University of London July 27, 2015 2 Cross-entropy method for TSP Simple genetic style methods can

More information

A phase-type distribution approach to coalescent theory

A phase-type distribution approach to coalescent theory A phase-type distribution approach to coalescent theory Tuure Antonangeli Masteruppsats i matematisk statistik Master Thesis in Mathematical Statistics Masteruppsats 015:3 Matematisk statistik Maj 015

More information

On the inadmissibility of Watterson s estimate

On the inadmissibility of Watterson s estimate On the inadmissibility of Watterson s estimate A. Futschik F. Gach Institute of Statistics and Decision Support Systems University of Vienna ISDS-Kolloquium, 2007 Outline 1 Motivation 2 Inadmissibility

More information

Stopping-Time Resampling for Sequential Monte Carlo Methods

Stopping-Time Resampling for Sequential Monte Carlo Methods Stopping-Time Resampling for Sequential Monte Carlo Methods Yuguo Chen and Junyi Xie Duke University, Durham, USA Jun S. Liu Harvard University, Cambridge, USA July 27, 2004 Abstract Motivated by the statistical

More information