Gene Genealogies Coalescence Theory. Annabelle Haudry Glasgow, July 2009

Similar documents
6 Introduction to Population Genetics

6 Introduction to Population Genetics

Estimating effective population size from samples of sequences: inefficiency of pairwise and segregating sites as compared to phylogenetic estimates

Population Genetics I. Bio

The Wright-Fisher Model and Genetic Drift

Lecture 18 - Selection and Tests of Neutrality. Gibson and Muse, chapter 5 Nei and Kumar, chapter 12.6 p Hartl, chapter 3, p.

Demography April 10, 2015

A comparison of two popular statistical methods for estimating the time to most recent common ancestor (TMRCA) from a sample of DNA sequences

Challenges when applying stochastic models to reconstruct the demographic history of populations.

Genetic Variation in Finite Populations

The Combinatorial Interpretation of Formulas in Coalescent Theory

Estimating Evolutionary Trees. Phylogenetic Methods

Lecture 18 : Ewens sampling formula

How robust are the predictions of the W-F Model?

122 9 NEUTRALITY TESTS

Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate.

7. Tests for selection

Solutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin

Coalescent based demographic inference. Daniel Wegmann University of Fribourg

Neutral Theory of Molecular Evolution

ms a program for generating samples under neutral models

Hidden Markov models in population genetics and evolutionary biology

Statistical Tests for Detecting Positive Selection by Utilizing High. Frequency SNPs

Processes of Evolution

Selection and Population Genetics

Computational Systems Biology: Biology X

I of a gene sampled from a randomly mating popdation,

Lecture 14 Chapter 11 Biology 5865 Conservation Biology. Problems of Small Populations Population Viability Analysis

Expected coalescence times and segregating sites in a model of glacial cycles

Stochastic Demography, Coalescents, and Effective Population Size

Intensive Course on Population Genetics. Ville Mustonen

Mathematical models in population genetics II

Genetic Drift in Human Evolution

Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium. November 12, 2012

Statistical Tests for Detecting Positive Selection by Utilizing. High-Frequency Variants

Classical Selection, Balancing Selection, and Neutral Mutations

Introduction to Advanced Population Genetics

GENETICS - CLUTCH CH.22 EVOLUTIONARY GENETICS.

Application of a time-dependent coalescence process for inferring the history of population size changes from DNA sequence data

arxiv: v1 [q-bio.pe] 14 Jan 2016

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #

Frequency Spectra and Inference in Population Genetics

Australian bird data set comparison between Arlequin and other programs

Taming the Beast Workshop

The mathematical challenge. Evolution in a spatial continuum. The mathematical challenge. Other recruits... The mathematical challenge

An introduction to mathematical modeling of the genealogical process of genes

The Impact of Sampling Schemes on the Site Frequency Spectrum in Nonequilibrium Subdivided Populations

EVOLUTIONARY DISTANCES

AEC 550 Conservation Genetics Lecture #2 Probability, Random mating, HW Expectations, & Genetic Diversity,

Introduction to population genetics & evolution

The coalescent process

SEQUENCE DIVERGENCE,FUNCTIONAL CONSTRAINT, AND SELECTION IN PROTEIN EVOLUTION

Theoretical Population Biology

Rapid speciation following recent host shift in the plant pathogenic fungus Rhynchosporium

Incomplete Lineage Sorting: Consistent Phylogeny Estimation From Multiple Loci

Selection against A 2 (upper row, s = 0.004) and for A 2 (lower row, s = ) N = 25 N = 250 N = 2500

STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization)

Fitness landscapes and seascapes

The Genealogy of a Sequence Subject to Purifying Selection at Multiple Sites

Reproduction- passing genetic information to the next generation

Population Genetics: a tutorial

Big Idea #1: The process of evolution drives the diversity and unity of life

Supplementary Material to Full Likelihood Inference from the Site Frequency Spectrum based on the Optimal Tree Resolution

Robust demographic inference from genomic and SNP data

Neutral behavior of shared polymorphism

Evolution in a spatial continuum

Question: If mating occurs at random in the population, what will the frequencies of A 1 and A 2 be in the next generation?

arxiv: v1 [math.pr] 29 Jul 2014

Evolutionary Genetics: Part 0.2 Introduction to Population genetics

URN MODELS: the Ewens Sampling Lemma

Heaving Toward Speciation

Effective population size and patterns of molecular evolution and variation

Population Genetics & Evolution

Computing likelihoods under Λ-coalescents

POPULATION GENETICS MODELS FOR THE STATISTICS OF DNA SAMPLES UNDER DIFFERENT DEMOGRAPHIC SCENARIOS MAXIMUM LIKELIHOOD VERSUS APPROXIMATE METHODS

DNA-based species delimitation

There are 3 parts to this exam. Use your time efficiently and be sure to put your name on the top of each page.

π b = a π a P a,b = Q a,b δ + o(δ) = 1 + Q a,a δ + o(δ) = I 4 + Qδ + o(δ),

AUTHORIZATION TO LEND AND REPRODUCE THE THESIS. Date Jong Wha Joanne Joo, Author

Concepts and Methods in Molecular Divergence Time Estimation

Compartmentalization detection

ROBUST METHODS FOR ESTIMATING ALLELE FREQUENCIES SHU-PANG HUANG

A representation for the semigroup of a two-level Fleming Viot process in terms of the Kingman nested coalescent

Unit 7: Evolution Guided Reading Questions (80 pts total)

Evolutionary Theory. Sinauer Associates, Inc. Publishers Sunderland, Massachusetts U.S.A.

Genetic variation in natural populations: a modeller s perspective

Ancestral Inference in Molecular Biology. Simon Tavaré. Saint-Flour, 2001

Microevolution Changing Allele Frequencies

I. Short Answer Questions DO ALL QUESTIONS

SARAH P. OTTO and TROY DAY

Phylogenomics of closely related species and individuals

Evolutionary Models. Evolutionary Models

Macroevolution: Part III Sympatric Speciation

Darwinian Selection. Chapter 6 Natural Selection Basics 3/25/13. v evolution vs. natural selection? v evolution. v natural selection

Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination)

From Individual-based Population Models to Lineage-based Models of Phylogenies

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

A phase-type distribution approach to coalescent theory

Understanding How Stochasticity Impacts Reconstructions of Recent Species Divergent History. Huateng Huang

The Structure of Genealogies in the Presence of Purifying Selection: a "Fitness-Class Coalescent"

Transcription:

Gene Genealogies Coalescence Theory Annabelle Haudry Glasgow, July 2009

What could tell a gene genealogy? How much diversity in the population? Has the demographic size of the population changed? How? When? Is that gene under selection? Is/was there gene flow between sub-populations? Could I describe the evolutionary model under which the species has been evolving?

Wright-Fisher: a simple model Finite and constant population size N diploid individuals, 2N genes Random mating Non overlapping generations No selection, structure Only Drift N =5 2N =10 N =5 2N =10

Genetic Drift gene evolution past 2N =10 present

Genetic Drift gene evolution past 2N =10 present

Genetic Drift gene evolution past 2N =10 present

Genetic Drift gene evolution past 2N =10 present

Genetic Drift past 2N =10 present

Genetic Drift gene evolution past 2N =10 present

Genetic Drift gene evolution past 2N =10 present

Genetic Drift gene evolution past 2N =10 present

Backward in Time past 2N =10 present n =3

Coalescent Events past 2N =10 N =10 present n =3

Mathematical Theory: Kingman Wright-Fisher Model Coalescence probability of 2 genes at a given generation p(2, 2N) = 1/2N

Mathematical Theory: Kingman Wright-Fisher Model Coalescence probability of 2 genes at a given generation p(2, 2N) = 1/2N j generations Coalescence probability of 2 genes j generations back, within a sample of n genes p( T 2 j 1 1 = j) = 1 2N 2N 1 n ( n 1) 2

Mathematical Theory: Kingman Wright-Fisher Model Coalescence probability of 2 genes at a given generation p(2, 2N) = 1/2N j generations Coalescence probability of 2 genes j generations back, within a sample of n genes p( T 2 j 1 1 = j) = 1 2N 2N p(2, 2N) ( n 1) 2 probability 2 genes haven t coalesced during the first j-1 generations 1 n number of possibles pairs of 2 genes within n genes

Mathematical Theory: Kingman Wright-Fisher Model Coalescence probability of 2 genes at a given generation p(2, 2N) = 1/2N j generations Coalescence probability of 2 genes j generations back, within a sample of n genes p( T 2 j 1 1 = j) = 1 2N 2N 1 n ( n 1) 2 T 2 ~ geometric distribution For n << N, approximation by an exponential distribution Tn coalescence time between 2 genes: Tn ~ exponential distribution: E (Tn )= 4N / (n(n-1)) so E(T 2 ) = 2N

Highly Variable Process Same T n, T n-1,, T 3, but different T 2 distribution of T 2 Variable T n, T n-1,, T 3,T 2 All trees have same probability

Coalescent with Mutations 4N # mutations added on each branch of the tree following a Poisson process mi : E(mi)= µt µ : mutation rate/gene/generation t : length of branch T 2 N 2 T 3 T 4 T 5

How to summarize a genealogy? Infinite Site Model 5 mutations 5 segregating sites 15 genes/individuals

How to summarize a genealogy? Infinite Site Model 5 mutations alignment 5 segregating sites 15 genes/individuals

How to summarize a genealogy? π = 0.37 S = 5 θ S = 0.31 Estimations de 4Nµ - nucleotide diversity π (Tajima 1983) π = average # of pairwise differences ( π ) = µ. t = µ.2.2n = Nµ E 4 - thetaθ S (Watterson 1975) S = # segregating sites θ S = n 1 i= 1 S 1 i

And a set of genealogies? Expected distributions of summary statistics in a population under a standard model, 10,000 simulations with θ = 5 2500 θw π 2000 θ S 1500 π 1000 500 0 0.5 2.5 4.5 6.5 8.5 10.5 12.5 14.5 16.5 18.5 20.5 22.5 24.5 26.5 28.5 30.5

And a set of genealogies? Expected distributions of summary statistics in a population under a standard model, 10,000 simulations with θ = 5 2500 θw π 2000 θ S 1500 π 1000 500 0 0.5 2.5 4.5 6.5 8.5 10.5 12.5 14.5 16.5 18.5 20.5 22.5 24.5 26.5 28.5 30.5

Effect of Demography Gene genealogy affected by N variations standard bottleneck structured population population expansion Formalization of the evolutionary history of populations Coalescence theory will describe expected diversity in a sample under different evolution scenarii

Effect of Demography standard bottleneck structured population population expansion Affects polymorphism patterns ( θ S and π)

2.5 4Nµ 2 θ S 4Neµ Pi and π differentially affected Theta Watterson (from S) demographic expansion N 2 1.5 equilibrium bottleneck 1 N 0 θ S 0.5 N 1 π 0 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 time Molecular signatures of variations in population size

Tajima s D D π Var θs = D ~ N(0,1) under ( π θ ) S standard coalescent

Tajima s D D π Var θs = D ~ N(0,1) under ( π θ ) S standard coalescent 4Nµ 2.5 D 0 D 0 D 0 4Neµ Pi Theta Watterson (from S) 2 1.5 equilibrium bottleneck demographic expansion 1 0.5 0 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 θ S π time

Coalescent Simulations Hudson (2002): Program ms based on coalescence theory to generate simulated gene samples. Simplest model = Wright-Fisher population + complex: - changes in population size - population structure - migration - recombination - partial selfing - selection

Define the best model s parameter Real samples α parameter tested in model Simulated samples π π π α θ S θ S θ S Observed values ± 20% Estimated Likelihood i (α i )

Example of Maize Domestication Wright et al. (2005) Domestication bottleneck (k = 2.45) More severe bottleneck at some genes candidate selected genes? (k1 = 0.15, k2 = 2.45)

Coalescence and phylogeny Coalescence: evolutionary history of genes Phylogeny: evolutionary history of species! Highly diverged species Speciation history Population history

Coalescence and phylogeny Coalescence: evolutionary history of genes Phylogeny: evolutionary history of species! Close related species Incomplete lineage sorting Gene history Species history

More details in: Hein, J., Schierup, M.H. & Wiuf, C. Gene Genealogies, Variation and Evolution: A Primer in Coalescent Theory. (Oxford University Press, USA: 2005).