Gene Genealogies: Coalescence Theory
Annabelle Haudry
Glasgow, July 2009
What can a gene genealogy tell us?
- How much diversity is there in the population?
- Has the demographic size of the population changed? How? When?
- Is the gene under selection?
- Is there, or was there, gene flow between sub-populations?
- Can I describe the evolutionary model under which the species has been evolving?
Wright-Fisher: a simple model
- Finite and constant population size: N diploid individuals, 2N genes
- Random mating
- Non-overlapping generations
- No selection, no population structure: only drift
Example: N = 5, 2N = 10
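The model above can be illustrated with a minimal sketch of drift in a population of 2N = 10 genes. The function name `wright_fisher`, the starting frequency of 0.5, and the seed are assumptions chosen for illustration; they are not from the slides.

```python
import random

def wright_fisher(two_n, generations, p0=0.5, seed=None):
    """Track the copy number of one allele among 2N genes under pure drift."""
    rng = random.Random(seed)
    count = round(p0 * two_n)
    trajectory = [count]
    for _ in range(generations):
        freq = count / two_n
        # Each gene of the next generation picks its parent gene at random
        # (random mating, non-overlapping generations, no selection).
        count = sum(1 for _ in range(two_n) if rng.random() < freq)
        trajectory.append(count)
    return trajectory

# Hypothetical run: N = 5 diploid individuals, 2N = 10 genes, 50 generations.
trajectory = wright_fisher(two_n=10, generations=50, seed=1)
print(trajectory)
```

With such a small population the allele is usually lost (count 0) or fixed (count 10) within a few dozen generations, after which the count no longer changes.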
Genetic Drift
[Figure: gene evolution from past to present in a population of 2N = 10 genes]
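Backward in Time
[Figure: lineages of a sample of n = 3 genes traced backward from present to past, 2N = 10]
Coalescent Events
[Figure: coalescent events among the n = 3 sampled genes, from present to past, 2N = 10]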
Mathematical Theory: Kingman
Wright-Fisher model
Coalescence probability of 2 genes in a given generation: p(2, 2N) = 1/(2N)
Coalescence probability of 2 genes exactly j generations back:
$P(T_2 = j) = \left(1 - \frac{1}{2N}\right)^{j-1} \frac{1}{2N}$
- $\left(1 - \frac{1}{2N}\right)^{j-1}$: probability that the 2 genes have not coalesced during the first j-1 generations
- $\frac{1}{2N}$: p(2, 2N)
Within a sample of n genes, there are $\frac{n(n-1)}{2}$ possible pairs of genes, so the coalescence probability per generation is approximately $\frac{n(n-1)}{2} \cdot \frac{1}{2N}$.
T_2 ~ geometric distribution. For n << N, it is approximated by an exponential distribution.
T_n: time until the first coalescence of 2 genes among n lineages. T_n ~ exponential distribution with E(T_n) = 4N / (n(n-1)), so E(T_2) = 2N.
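The expectation E(T_n) = 4N / (n(n-1)) can be checked numerically. This is a minimal sketch using the exponential approximation (n << N); the function names, the population size 2N = 10,000, and the number of replicates are assumptions for illustration.

```python
import random

def coalescence_times(n, two_n, rng):
    """Draw T_n, ..., T_2 (in generations) for a sample of n genes."""
    times = {}
    for k in range(n, 1, -1):
        pairs = k * (k - 1) / 2
        # Exponential approximation: coalescence rate = pairs / 2N per generation.
        times[k] = rng.expovariate(pairs / two_n)
    return times

def expected_time(k, two_n):
    """E(T_k) = 4N / (k(k-1)), written with 2N as the argument."""
    return 2 * two_n / (k * (k - 1))

rng = random.Random(42)
two_n, n, reps = 10_000, 5, 20_000
draws = [coalescence_times(n, two_n, rng) for _ in range(reps)]
mean_t2 = sum(d[2] for d in draws) / reps
mean_t5 = sum(d[5] for d in draws) / reps
print(mean_t2, expected_time(2, two_n))  # both close to 2N
print(mean_t5, expected_time(5, two_n))  # both close to 4N / 20
```

The last interval T_2 is on average the longest, which is why the deepest part of the tree dominates its total height.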
Highly Variable Process
- Same T_n, T_{n-1}, ..., T_3 but different T_2: distribution of T_2
- Variable T_n, T_{n-1}, ..., T_3, T_2
- All trees have the same probability
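Coalescent with Mutations
Mutations are added on each branch of the tree following a Poisson process.
- m_i: number of mutations on branch i, with E(m_i) = µt
- µ: mutation rate per gene per generation
- t: length of the branch
[Figure: genealogy with coalescence intervals T_2, T_3, T_4, T_5 on a time axis marked 2N and 4N]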
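As a rough illustration of E(m_i) = µt, the sketch below drops mutations on a single branch. The `poisson` helper is written here because Python's standard `random` module has no Poisson sampler; the values of µ and t are hypothetical.

```python
import random

def poisson(rng, mean):
    """Draw a Poisson variate by counting rate-1 arrivals in [0, mean]."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count

rng = random.Random(7)
mu = 1e-4      # hypothetical mutation rate per gene per generation
t = 20_000     # hypothetical branch length in generations
reps = 20_000

# Number of mutations on the branch, repeated over many replicates.
mutation_counts = [poisson(rng, mu * t) for _ in range(reps)]
mean_mutations = sum(mutation_counts) / reps
print(mean_mutations, mu * t)  # the mean should be close to mu * t = 2
```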
How to summarize a genealogy?
Infinite site model
[Figure: genealogy of 15 genes/individuals carrying 5 mutations, and the resulting alignment with 5 segregating sites]
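How to summarize a genealogy?
In the example: π = 0.37, S = 5, θ_S = 0.31
Estimators of 4Nµ:
- Nucleotide diversity π (Tajima 1983): π = average number of pairwise differences, with E(π) = µ·t = µ·2·2N = 4Nµ, since two genes are separated by two branches of expected length E(T_2) = 2N each
- θ_S (Watterson 1975): S = number of segregating sites, $\theta_S = S \Big/ \sum_{i=1}^{n-1} \frac{1}{i}$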
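Both estimators can be computed directly from an alignment. The sketch below uses a made-up alignment of 4 sequences (not the 15-gene example of the slide) and returns values per sequence; dividing by the alignment length would give per-site values. Function names are illustrative.

```python
from itertools import combinations

def segregating_sites(alignment):
    """S: number of alignment columns with more than one state."""
    return sum(1 for column in zip(*alignment) if len(set(column)) > 1)

def pairwise_diversity(alignment):
    """pi: average number of differences over all pairs of sequences."""
    pairs = list(combinations(alignment, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)

def watterson_theta(alignment):
    """theta_S = S / sum_{i=1}^{n-1} 1/i."""
    n = len(alignment)
    a_n = sum(1 / i for i in range(1, n))
    return segregating_sites(alignment) / a_n

# Hypothetical toy alignment: 4 sequences, 5 sites.
alignment = ["ACGTA", "ACGTT", "ACCTA", "TCGTA"]
s = segregating_sites(alignment)
pi = pairwise_diversity(alignment)
theta_s = watterson_theta(alignment)
print(s, pi, round(theta_s, 3))
```

Here every mutation is carried by a single sequence, so π is lower than θ_S; an excess of rare variants produces this pattern.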
And a set of genealogies?
Expected distributions of summary statistics in a population under a standard model (10,000 simulations with θ = 5)
[Figure: histograms of θ_S (θ_W) and π; x-axis: estimate value, 0.5 to 30.5; y-axis: number of simulations, 0 to 2500]
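Distributions of this kind can be approximated with a small coalescent simulator. This is a sketch under the standard neutral model with infinite sites, not the program used for the slide; time is measured in units of 2N generations, and the sample size n = 15 and the 2,000 replicates (fewer than the slide's 10,000, to keep the run short) are assumptions.

```python
import random

def poisson(rng, mean):
    """Draw a Poisson variate by counting rate-1 arrivals in [0, mean]."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count

def simulate_sample(n, theta, rng):
    """Return (pi, theta_S) for one neutral genealogy of n genes."""
    # Each lineage is stored as its number of descendant sampled genes.
    lineages = [1] * n
    n_pairs = n * (n - 1) / 2
    seg_sites = 0
    pairwise_diffs = 0
    while len(lineages) > 1:
        k = len(lineages)
        # Time spent with k lineages, in units of 2N generations.
        t_k = rng.expovariate(k * (k - 1) / 2)
        for d in lineages:
            # mu * (2N * t_k) generations = (theta / 2) * t_k, with theta = 4N mu.
            m = poisson(rng, theta / 2 * t_k)
            seg_sites += m
            # A mutation carried by d of n genes separates d * (n - d) pairs.
            pairwise_diffs += m * d * (n - d)
        # Coalescence: merge two lineages chosen at random.
        i, j = rng.sample(range(k), 2)
        merged = lineages[i] + lineages[j]
        lineages = [d for idx, d in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    a_n = sum(1 / i for i in range(1, n))
    return pairwise_diffs / n_pairs, seg_sites / a_n

rng = random.Random(2009)
results = [simulate_sample(n=15, theta=5.0, rng=rng) for _ in range(2000)]
mean_pi = sum(r[0] for r in results) / len(results)
mean_theta_s = sum(r[1] for r in results) / len(results)
print(mean_pi, mean_theta_s)  # both estimators should average close to theta = 5
```

Both estimators are unbiased for θ under this model, but individual genealogies give widely scattered values, which is what the histograms show.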
Effect of Demography
Gene genealogies are affected by variations in N.
[Figure: genealogies under four scenarios: standard, bottleneck, structured population, population expansion]
Formalization of the evolutionary history of populations: coalescence theory describes the expected diversity in a sample under different evolutionary scenarios.
Demography affects polymorphism patterns (θ_S and π).
Molecular signatures of variations in population size
θ_S and π are differentially affected.
[Figure: 4N_eµ estimated by π (Pi) and by θ_S (Theta Watterson, from S) over time (0 to 0.14), through equilibrium, bottleneck, and demographic expansion phases, with population sizes N_0, N_1, N_2; y-axis: 4Nµ, 0 to 2.5]
Tajima's D
$D = \frac{\pi - \theta_S}{\sqrt{\mathrm{Var}(\pi - \theta_S)}}$
D ~ N(0,1) under the standard coalescent
[Figure: 4N_eµ estimated by π (Pi) and by θ_S (Theta Watterson, from S) over time (0 to 0.14), through equilibrium, bottleneck, and demographic expansion phases, with the sign of D relative to 0 marked for each phase]
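A minimal sketch of the statistic follows. The slide only gives the general form; the variance term below uses the standard estimator from Tajima (1989), which is an addition here. In this form `pi` is the average number of pairwise differences per sequence (not per site), and the input values are made up.

```python
import math

def tajimas_d(n, seg_sites, pi):
    """Tajima's D from sample size n, segregating sites S, and pi per sequence."""
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Estimated variance of (pi - theta_S); undefined when S = 0.
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    theta_s = seg_sites / a1
    return (pi - theta_s) / math.sqrt(variance)

# Hypothetical sample: n = 10 sequences, S = 16, pi = 3.888889.
d = tajimas_d(n=10, seg_sites=16, pi=3.888889)
print(round(d, 3))
```

A negative value means π is smaller than θ_S (excess of rare variants); a positive value means the opposite.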
Coalescent Simulations
Hudson (2002): program ms, based on coalescence theory, generates simulated gene samples.
Simplest model: Wright-Fisher population
More complex models add:
- changes in population size
- population structure
- migration
- recombination
- partial selfing
- selection
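Define the best model's parameter
- Real samples: observed values of π and θ_S
- Simulated samples: π and θ_S computed for each tested value α_i of the model parameter α
- Estimated likelihood L_i(α_i): share of simulated samples whose π and θ_S fall within ± 20% of the observed values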
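The acceptance scheme above can be sketched as a grid search. Everything named here is hypothetical: `toy_simulator` is a Gaussian stand-in for a real coalescent simulator such as ms, and the observed values and the grid of α are made up.

```python
import random

def within_tolerance(simulated, observed, tolerance=0.2):
    """True if the simulated value lies within +/- tolerance of the observed one."""
    return abs(simulated - observed) <= tolerance * observed

def estimated_likelihood(simulate, alpha, observed, n_sims, rng, tolerance=0.2):
    """Share of simulations whose statistics all fall near the observed values."""
    accepted = 0
    for _ in range(n_sims):
        stats = simulate(alpha, rng)
        if all(within_tolerance(s, o, tolerance) for s, o in zip(stats, observed)):
            accepted += 1
    return accepted / n_sims

def toy_simulator(alpha, rng):
    """Hypothetical stand-in for a coalescent simulator: returns (pi, theta_S)."""
    return rng.gauss(alpha, 0.5), rng.gauss(alpha, 0.5)

rng = random.Random(3)
observed = (3.0, 3.1)          # made-up observed (pi, theta_S)
alphas = [1.0, 2.0, 3.0, 4.0, 5.0]
likelihoods = {
    a: estimated_likelihood(toy_simulator, a, observed, 2000, rng) for a in alphas
}
best_alpha = max(likelihoods, key=likelihoods.get)
print(likelihoods, best_alpha)
```

The parameter value with the highest acceptance share is taken as the best estimate; a finer grid around it would refine the estimate.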
Example of Maize Domestication
Wright et al. (2005)
- Domestication bottleneck (k = 2.45)
- More severe bottleneck at some genes (k1 = 0.15, k2 = 2.45): candidate selected genes?
Coalescence and phylogeny
Coalescence: evolutionary history of genes
Phylogeny: evolutionary history of species!
- Highly diverged species [Figure labels: speciation history, population history]
- Closely related species: incomplete lineage sorting [Figure labels: gene history, species history]
More details in: Hein, J., Schierup, M.H. & Wiuf, C. Gene Genealogies, Variation and Evolution: A Primer in Coalescent Theory. (Oxford University Press, USA: 2005).