Challenges when applying stochastic models to reconstruct the demographic history of populations.

Challenges when applying stochastic models to reconstruct the demographic history of populations. Willy Rodríguez Institut de Mathématiques de Toulouse October 11, 2017

Outline
1 Introduction
2 Inverse Instantaneous Coalescence Rate (IICR)
3 Applications of the IICR
4 Conclusions

1 Introduction

Goals of Population Genetics
Build models that help us understand the main evolutionary events that gave rise to the observed patterns of genetic variation.
Evolutionary forces

Reconstructing demographic history from DNA

Wright-Fisher model (1930-1931)
Non-overlapping generations.
Constant population size (2N genes; 1 individual = one gene).
Random mating (panmixia).

Wright-Fisher model and key concepts
Coalescence events: individuals $I_3$ and $I_5$ coalesce (have a common ancestor) 4 generations before the present.
Probability that two individuals coalesce in the previous generation: $\frac{1}{2N}$.
Probability that they do not coalesce: $1 - \frac{1}{2N}$.
$G_2$: number of generations to reach the common ancestor of 2 genes, with $P(G_2 > l) = \left(1 - \frac{1}{2N}\right)^l$.
For large $2N$, $\frac{G_2}{2N} \xrightarrow{D} T_2$, with $T_2 \sim \mathrm{Exp}(1)$.
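The convergence of $G_2 / 2N$ to an exponential variable can be checked numerically. The sketch below is only an illustration: the population size, the number of replicates and the helper names are arbitrary choices that do not come from the talk. It simulates the number of generations until two lineages pick the same parent and compares the tail probability with $\left(1 - \frac{1}{2N}\right)^l$ and with $e^{-1}$.

```python
import math
import random

def sample_g2(two_n, rng):
    """Generations until two lineages share a parent among two_n genes."""
    generations = 1
    # Going back one generation, both lineages pick the same parent
    # with probability 1 / two_n; otherwise we move one generation further.
    while rng.random() >= 1.0 / two_n:
        generations += 1
    return generations

def scaled_survival(two_n, t, replicates, rng):
    """Monte Carlo estimate of P(G_2 / two_n > t)."""
    hits = sum(1 for _ in range(replicates) if sample_g2(two_n, rng) / two_n > t)
    return hits / replicates

rng = random.Random(1)                 # fixed seed, illustrative only
two_n = 200                            # hypothetical population size (genes)
estimate = scaled_survival(two_n, 1.0, 5000, rng)
exact = (1 - 1 / two_n) ** two_n       # P(G_2 > 2N) in the discrete model
limit = math.exp(-1.0)                 # P(T_2 > 1) for T_2 ~ Exp(1)
print(estimate, exact, limit)
```

Time is measured in units of $2N$ generations here, which is the scaling used for $T_2$ in the rest of the talk.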

The Coalescent and some approximations
Kingman, J., 1982. The coalescent. A continuous-time Markov chain that describes family relationships between individuals backward in time.
Hudson, R. R., 1983. The coalescent with recombination. A process that traces lineages back in time while incorporating recombination events.
Herbots, H. M. J. D., 1994. The structured coalescent. An extension of Kingman's coalescent to subdivided populations.
McVean & Cardin, 2005. The Sequentially Markovian Coalescent. An approximation to the coalescent with recombination using a Markov chain along the genome.

Variable population size. Relation with $T_2$
Griffiths & Tavaré, 1994. Distribution of $T_2$ as a function of population size change.
Population evolving with deterministically varying size: $\lambda(t) = \frac{N(t)}{2N}$, with $2N$ the present population size (genes), and $\Lambda(t) = \int_0^t \frac{1}{\lambda(u)}\,du$.
Distribution of the coalescence time of two genes ($T_2$):
$P(T_2 > t) = 1 - F_{T_2}(t) = e^{-\Lambda(t)}$
$f_{T_2}(t) = \left(F_{T_2}(t)\right)' = \frac{1}{\lambda(t)} e^{-\Lambda(t)}$
Objective: reconstruct the function $\lambda$.
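To make the relation between $\lambda$ and the distribution of $T_2$ concrete, here is a minimal sketch for a piecewise-constant $\lambda$. The breakpoints, the size ratios and all function names are hypothetical choices for illustration. It evaluates $\Lambda(t)$, the survival function $e^{-\Lambda(t)}$ and the density, and samples $T_2$ by solving $\Lambda(T_2) = E$ for $E \sim \mathrm{Exp}(1)$.

```python
import math
import random

def _intervals(breakpoints, lambdas):
    """Yield (start, end, lambda) triples; the last interval is unbounded."""
    for i, lam in enumerate(lambdas):
        end = breakpoints[i + 1] if i + 1 < len(breakpoints) else math.inf
        yield breakpoints[i], end, lam

def cumulative_rate(t, breakpoints, lambdas):
    """Lambda(t): integral of 1 / lambda(u) over [0, t]."""
    total = 0.0
    for start, end, lam in _intervals(breakpoints, lambdas):
        if t <= start:
            break
        total += (min(t, end) - start) / lam
    return total

def lambda_at(t, breakpoints, lambdas):
    """Value of the step function lambda at time t."""
    for start, end, lam in _intervals(breakpoints, lambdas):
        if start <= t < end:
            return lam
    raise ValueError("t must be non-negative")

def survival(t, breakpoints, lambdas):
    """P(T_2 > t) = exp(-Lambda(t))."""
    return math.exp(-cumulative_rate(t, breakpoints, lambdas))

def density(t, breakpoints, lambdas):
    """f_{T_2}(t) = exp(-Lambda(t)) / lambda(t)."""
    return survival(t, breakpoints, lambdas) / lambda_at(t, breakpoints, lambdas)

def sample_t2(breakpoints, lambdas, rng):
    """Inverse-transform sampling: solve Lambda(T_2) = E with E ~ Exp(1)."""
    target = rng.expovariate(1.0)
    for start, end, lam in _intervals(breakpoints, lambdas):
        mass = (end - start) / lam     # Lambda accumulated over this interval
        if target <= mass:
            return start + target * lam
        target -= mass

# Hypothetical history: size ratio 1 until t = 0.5, then 5 further in the past.
breakpoints = [0.0, 0.5]
lambdas = [1.0, 5.0]
rng = random.Random(2)
draws = [sample_t2(breakpoints, lambdas, rng) for _ in range(20000)]
empirical = sum(1 for t in draws if t > 1.5) / len(draws)
print(empirical, survival(1.5, breakpoints, lambdas))
```

Dividing the survival function by the density at any $t$ returns $\lambda(t)$, which is the identity that the reconstruction problem relies on.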

Methods for estimating past population size changes
MSVAR (1999). Skyline plot (2000). Bayesian skyline plot (2005). dadi (2009). PSMC (2011). DiCal (2013). VarEff (2014). MSMC (2014). stairway plot (2015). PopSizeABC (2016). SMC++ (2017).

The PSMC. Markov chain along the genome
Li and Durbin, 2011. Pairwise Sequentially Markovian Coalescent.
The PSMC method:
Input data: one diploid genome (one diploid individual).
Based on the Sequentially Markovian Coalescent (SMC; McVean & Cardin, 2005).
Uses a relation between recombinations, mutations and $\lambda_k$ in a model of variable population size.
Describes a Hidden Markov Model along the genome, which makes it possible to estimate population sizes in the past.
Developed for long DNA sequences (e.g. one chromosome).

PSMC. Markov Chain along the genome

PSMC. Hidden Markov chain along the genome
Discrete state space, discrete-time HMM.
Hidden states: coalescence time ($T_2$) at position $a$, discretized as $T_2 \in [t_k, t_{k+1}]$.
Observed states: homozygous or heterozygous at position $a$.
Parameters: scaled mutation rate ($\theta$), scaled recombination rate ($\rho$), demographic history ($\lambda_k$).
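The structure of such an HMM can be sketched with a generic scaled forward algorithm. This is not the PSMC implementation. The transition matrix and the representative times below are hypothetical placeholders (in PSMC they would be derived from $\rho$ and $\lambda_k$), and the emission probability $1 - e^{-\theta t}$ for a heterozygous site is an assumed form that depends on how $\theta$ is scaled.

```python
import math

def emission(observation, t, theta):
    """P(observation | coalescence time t); 1 = heterozygous, 0 = homozygous."""
    p_het = 1.0 - math.exp(-theta * t)   # assumed emission form
    return p_het if observation == 1 else 1.0 - p_het

def forward_log_likelihood(observations, times, initial, transition, theta):
    """Scaled forward algorithm over positions along the genome."""
    n_states = len(times)
    alpha = [initial[k] * emission(observations[0], times[k], theta)
             for k in range(n_states)]
    log_lik = 0.0
    for position in range(len(observations)):
        if position > 0:
            # Propagate the hidden state to the next position, then emit.
            alpha = [
                sum(alpha[j] * transition[j][k] for j in range(n_states))
                * emission(observations[position], times[k], theta)
                for k in range(n_states)
            ]
        scale = sum(alpha)               # rescale to avoid underflow
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik

# Hypothetical toy setup: three time intervals represented by one time each.
times = [0.1, 0.5, 2.0]
initial = [1 / 3, 1 / 3, 1 / 3]
transition = [[0.98, 0.01, 0.01],
              [0.01, 0.98, 0.01],
              [0.01, 0.01, 0.98]]
theta = 0.05
observations = [0] * 40 + [1] + [0] * 20 + [1, 1] + [0] * 10
print(forward_log_likelihood(observations, times, initial, transition, theta))
```

Maximizing this log-likelihood over the demographic parameters is the general idea behind HMM-based inference. The optimization itself is not shown here.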

One example (the PSMC)
The PSMC method (Heng Li & Richard Durbin, 2011) can be affected by population structure.

One example (the PSMC). A simple structured model: the n-island model.

One example (the PSMC). A simple structured model. What is going on here?

2 Inverse Instantaneous Coalescence Rate (IICR)

Coalescence times and population size changes. The panmixia hypothesis (random mating). What if we do not have random mating?

Instantaneous Coalescence Rate
Griffiths & Tavaré (1994). Variable population size ($N_e$): $N_e(t) = N_0 \lambda(t)$ and $P(T_2 > t) = e^{-\int_0^t \frac{1}{\lambda(u)}\,du}$.
Inverse Instantaneous Coalescence Rate (IICR). Mazet et al., 2015: $-\left(\log P(T_2 > t)\right)' = \frac{1}{\lambda(t)}$, hence $\lambda(t) = \frac{P(T_2 > t)}{f_{T_2}(t)} = \frac{1 - F_{T_2}(t)}{f_{T_2}(t)}$.
$\lambda$ can be evaluated at any $t$ using only the distribution of $T_2$. Valid for any model.
$T_2$ can be interpreted as a lifetime.
$\frac{1}{\lambda(t)}$: instantaneous coalescence rate (hazard function or failure rate).
$\lambda(t)$ may be unrelated to actual population size changes.
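A short worked example may help to see why $\lambda(t)$ need not reflect a size change. Suppose, purely as a hypothetical illustration, that $T_2$ follows a mixture of two exponentials with weight $a$ and rates $\alpha < \beta$ (these symbols are not from the talk). No parameter of this distribution varies with time, yet the ratio of survival function to density is not constant:

```latex
\begin{align*}
f_{T_2}(t) &= a\,\alpha\,e^{-\alpha t} + (1-a)\,\beta\,e^{-\beta t},
  \qquad 0 < a < 1,\quad 0 < \alpha < \beta, \\
P(T_2 > t) &= a\,e^{-\alpha t} + (1-a)\,e^{-\beta t}, \\
\lambda(t) &= \frac{P(T_2 > t)}{f_{T_2}(t)}
  = \frac{a\,e^{-\alpha t} + (1-a)\,e^{-\beta t}}
         {a\,\alpha\,e^{-\alpha t} + (1-a)\,\beta\,e^{-\beta t}}, \\
\lambda(0) &= \frac{1}{a\,\alpha + (1-a)\,\beta},
  \qquad \lim_{t \to \infty} \lambda(t) = \frac{1}{\alpha}.
\end{align*}
```

Since $a\alpha + (1-a)\beta > \alpha$, the curve starts below $1/\alpha$ and rises toward it. A method that assumes panmixia would read this as a population that was larger in the past.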

Based on $T_2$, it is not possible to distinguish structure from a panmictic model with population size change.
We can estimate the IICR using any method based on the panmixia hypothesis (e.g. psmc).
We can predict the IICR for any model by doing simulations: take a vector of independent values of $T_2$, then $\hat{\lambda}(t) = \frac{1 - \hat{F}_{T_2}(t)}{\hat{f}_{T_2}(t)}$.
https://github.com/willyrv/iicrestimator
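The simulation-based estimator can be sketched with a plain histogram. This is an illustrative reading of the formula and not the code from the repository above: the binning scheme, the function name and the test data are all assumptions. The density is estimated from the fraction of $T_2$ values in each bin and the survival function from the fraction exceeding the bin midpoint.

```python
import math
import random

def estimate_iicr(t2_values, bin_edges):
    """Histogram estimate of (1 - F(t)) / f(t) at each bin midpoint."""
    n = len(t2_values)
    results = []
    for left, right in zip(bin_edges[:-1], bin_edges[1:]):
        midpoint = 0.5 * (left + right)
        in_bin = sum(1 for t in t2_values if left <= t < right)
        surviving = sum(1 for t in t2_values if t > midpoint)
        if in_bin == 0:
            # Empty bin: the density estimate is zero, so the ratio is undefined.
            results.append((midpoint, math.nan))
            continue
        density = in_bin / (n * (right - left))
        results.append((midpoint, (surviving / n) / density))
    return results

# Sanity check on a panmictic, constant-size model: T_2 ~ Exp(1),
# for which the IICR should be close to 1 at every time.
rng = random.Random(7)
t2_sample = [rng.expovariate(1.0) for _ in range(50000)]
edges = [0.0, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0]
iicr = estimate_iicr(t2_sample, edges)
for midpoint, value in iicr:
    print(round(midpoint, 3), round(value, 3))
```

Wide bins bias the estimate slightly and narrow bins make it noisy, so the bin edges are a tuning choice rather than part of the method.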

Explicit expression for the IICR under the n-island model.
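The explicit expression itself is not reproduced in this transcript. As a numerical stand-in, the sketch below integrates the two-lineage Markov chain of a symmetric n-island model and reads the IICR off the state probabilities. The rate parametrization is an assumption (coalescence at rate 1 when both lineages share a deme, departure from the same-deme state at rate $M$, return at rate $M/(n-1)$, time in coalescent units), and the function name and parameter values are illustrative.

```python
def island_iicr(n_islands, migration, t_max, step=1e-3):
    """Return [(t, IICR(t))] for two lineages sampled in the same deme.

    p_same: probability both lineages are in one deme and not yet coalesced.
    p_diff: probability the lineages are in different demes.
    Survival is p_same + p_diff and the density of T_2 is 1 * p_same,
    so the IICR is (p_same + p_diff) / p_same.
    """
    back = migration / (n_islands - 1)   # assumed rate diff -> same

    def deriv(p_same, p_diff):
        d_same = -(1.0 + migration) * p_same + back * p_diff
        d_diff = migration * p_same - back * p_diff
        return d_same, d_diff

    p_same, p_diff, t = 1.0, 0.0, 0.0
    curve = [(t, (p_same + p_diff) / p_same)]
    for _ in range(int(round(t_max / step))):
        # Classical fourth-order Runge-Kutta step for the linear ODE system.
        k1 = deriv(p_same, p_diff)
        k2 = deriv(p_same + 0.5 * step * k1[0], p_diff + 0.5 * step * k1[1])
        k3 = deriv(p_same + 0.5 * step * k2[0], p_diff + 0.5 * step * k2[1])
        k4 = deriv(p_same + step * k3[0], p_diff + step * k3[1])
        p_same += step * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        p_diff += step * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        t += step
        curve.append((t, (p_same + p_diff) / p_same))
    return curve

# Hypothetical parameters: 10 islands, scaled migration rate M = 1.
curve = island_iicr(10, 1.0, 20.0)
print(curve[0], curve[len(curve) // 2], curve[-1])
```

Under these assumptions the curve starts at 1 and increases to a plateau of roughly 18.5, although every deme has constant size. Read as a population size history, this would look like a large ancestral population followed by a decline.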

3 Applications of the IICR

Plotting the IICR for non-panmictic models
L. Chikhi et al. The IICR (inverse instantaneous coalescence rate) as a summary of genomic diversity: insights into demographic inference and model choice (to appear in Heredity).

Considering alternative scenarios
Curve fitting (https://github.com/maxhalford/stsicmr-inference)

Considering alternative scenarios
PSMC in human data. Alternative scenario: n-island model with changes in gene flow and recent growth.
Summary statistic in an ABC framework.
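How the IICR might serve as a summary statistic in ABC can be sketched with a rejection sampler on a deliberately simple toy model. Everything below is hypothetical: the model (constant size, so $T_2$ is exponential with mean equal to a size factor), the uniform prior, the bins, the squared-distance criterion and the acceptance fraction are illustrative choices, not the procedure used in the talk.

```python
import random

BINS = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]

def iicr_summary(t2_values):
    """Histogram IICR estimate, one value per bin in BINS."""
    n = len(t2_values)
    summary = []
    for left, right in BINS:
        mid = 0.5 * (left + right)
        in_bin = sum(1 for t in t2_values if left <= t < right)
        surviving = sum(1 for t in t2_values if t > mid)
        density = max(in_bin, 1) / (n * (right - left))  # guard for empty bins
        summary.append((surviving / n) / density)
    return summary

def simulate_t2(size_factor, n_samples, rng):
    """Toy model: constant size, T_2 exponential with mean size_factor."""
    return [rng.expovariate(1.0 / size_factor) for _ in range(n_samples)]

def abc_rejection(observed_summary, n_draws, keep, rng):
    """Keep the prior draws whose simulated IICR is closest to the observed one."""
    scored = []
    for _ in range(n_draws):
        candidate = rng.uniform(0.5, 5.0)            # prior on the size factor
        simulated = iicr_summary(simulate_t2(candidate, 2000, rng))
        distance = sum((s - o) ** 2
                       for s, o in zip(simulated, observed_summary))
        scored.append((distance, candidate))
    scored.sort()
    return [candidate for _, candidate in scored[:keep]]

rng = random.Random(3)
observed = iicr_summary(simulate_t2(2.0, 2000, rng))  # pseudo-observed data
posterior = abc_rejection(observed, 300, 30, rng)
posterior_mean = sum(posterior) / len(posterior)
print(posterior_mean)
```

For a structured model, `simulate_t2` would be replaced by a simulator of coalescence times under that model, and the candidate would become a vector of parameters such as the number of islands and the migration rate.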

4 Conclusions

Conclusions
Identifiability problem when using $T_2$ ($T_k$) based methods.
Reconstructed skyline plots are not meaningless, but in the general case they should be interpreted carefully (as an IICR).
For some models it is possible to obtain an explicit expression of the IICR.
The IICR can be used as a summary statistic to explain the data under any model, provided we can simulate coalescence times.
The IICR can be used as a summary statistic in an ABC framework.
