Inférence en génétique des populations IV.

Similar documents
Lecture 18 : Ewens sampling formula

Solutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin

How robust are the predictions of the W-F Model?

Population Genetics I. Bio

Mathematical Population Genetics II

Population Genetics: a tutorial

STAT 536: Genetic Statistics

The Λ-Fleming-Viot process and a connection with Wright-Fisher diffusion. Bob Griffiths University of Oxford

Evolutionary Theory. Sinauer Associates, Inc. Publishers Sunderland, Massachusetts U.S.A.

Computational Systems Biology: Biology X

Frequency Spectra and Inference in Population Genetics

Mathematical Population Genetics II

Population Structure

Gene Genealogies Coalescence Theory. Annabelle Haudry Glasgow, July 2009

Genetic Variation in Finite Populations

Methods for Cryptic Structure. Methods for Cryptic Structure

Selection and Population Genetics

THE CENTRAL CONCEPTS OF INCLUSIVE FITNESS 3000 word article in the Oxford University Press Encyclopedia of Evolution, January 2002.

Evolutionary dynamics of populations with genotype-phenotype map

Quantitative trait evolution with mutations of large effect

6 Introduction to Population Genetics

Mathematical models in population genetics II

Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles

The Wright-Fisher Model and Genetic Drift

Demography April 10, 2015

Stochastic Demography, Coalescents, and Effective Population Size

Statistics & Data Sciences: First Year Prelim Exam May 2018

I of a gene sampled from a randomly mating popdation,

Diffusion Models in Population Genetics

Lecture 9. QTL Mapping 2: Outbred Populations

6 Introduction to Population Genetics

Cooperation Achieved by Migration and Evolution in a Multilevel Selection Context

Stationary Distribution of the Linkage Disequilibrium Coefficient r 2

Evolution of cooperation in finite populations. Sabin Lessard Université de Montréal

Evolution of quantitative traits

Mathematical Population Genetics. Introduction to the Stochastic Theory. Lecture Notes. Guanajuato, March Warren J Ewens

Lecture 32: Infinite-dimensional/Functionvalued. Functions and Random Regressions. Bruce Walsh lecture notes Synbreed course version 11 July 2013

BIOL Evolution. Lecture 9

Contrasts for a within-species comparative method

Breeding Values and Inbreeding. Breeding Values and Inbreeding

19. Genetic Drift. The biological context. There are four basic consequences of genetic drift:

Association studies and regression

Problems on Evolutionary dynamics

Overview. Background

Population genetics snippets for genepop

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

Challenges when applying stochastic models to reconstruct the demographic history of populations.

arxiv: v1 [q-bio.pe] 14 Jan 2016

Supplementary Figures.

Microevolution 2 mutation & migration

Introductory seminar on mathematical population genetics

Unit roots in vector time series. Scalar autoregression True model: y t 1 y t1 2 y t2 p y tp t Estimated model: y t c y t1 1 y t1 2 y t2

Lecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017

Index. Causality concept of, 128 selection and, 139, 298. Adaptation, 6, 7. See also Biotic adaptation. defining, 55, 133, 301

COPYRIGHTED MATERIAL CONTENTS. Preface Preface to the First Edition

Chapter 8: Introduction to Evolutionary Computation

Computational statistics

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015

DNA polymorphisms such as SNP and familial effects (additive genetic, common environment) to

Time Series: Theory and Methods

Endowed with an Extra Sense : Mathematics and Evolution

URN MODELS: the Ewens Sampling Lemma

Statistical population genetics

Mathematical Biology. Sabin Lessard

Darwinian Selection. Chapter 7 Selection I 12/5/14. v evolution vs. natural selection? v evolution. v natural selection

Quantitative Genetics I: Traits controlled my many loci. Quantitative Genetics: Traits controlled my many loci

Inbreeding depression due to stabilizing selection on a quantitative character. Emmanuelle Porcher & Russell Lande

Surfing genes. On the fate of neutral mutations in a spreading population

Case-Control Association Testing. Case-Control Association Testing

Hierarchical Linear Models. Hierarchical Linear Models. Much of this material already seen in Chapters 5 and 14. Hyperprior on K parameters α:

Lecture 4: Allelic Effects and Genetic Variances. Bruce Walsh lecture notes Tucson Winter Institute 7-9 Jan 2013

Proportional Variance Explained by QLT and Statistical Power. Proportional Variance Explained by QTL and Statistical Power

A representation for the semigroup of a two-level Fleming Viot process in terms of the Kingman nested coalescent

(Genome-wide) association analysis

The co-evolution of culturally inherited altruistic helping and cultural transmission under random group formation

Covariance function estimation in Gaussian process regression

Closed-form sampling formulas for the coalescent with recombination

Fitness landscapes and seascapes

MIXED MODELS THE GENERAL MIXED MODEL

... x. Variance NORMAL DISTRIBUTIONS OF PHENOTYPES. Mice. Fruit Flies CHARACTERIZING A NORMAL DISTRIBUTION MEAN VARIANCE

The Admixture Model in Linkage Analysis

Recovery of a recessive allele in a Mendelian diploid model

STAT 730 Chapter 4: Estimation

Inference For High Dimensional M-estimates. Fixed Design Results

EVOLUTIONARY DYNAMICS AND THE EVOLUTION OF MULTIPLAYER COOPERATION IN A SUBDIVIDED POPULATION

Measurement Error and Linear Regression of Astronomical Data. Brandon Kelly Penn State Summer School in Astrostatistics, June 2007

1.3 Convergence of Regular Markov Chains

Brief history of The Prisoner s Dilemma (From Harman s The Price of Altruism)

Coalescent based demographic inference. Daniel Wegmann University of Fribourg

Selection on multiple characters

Lecture 2: Introduction to Quantitative Genetics

DNA-based species delimitation

From Individual-based Population Models to Lineage-based Models of Phylogenies

Chapter 3 - Temporal processes

Effective population size and patterns of molecular evolution and variation

OVERVIEW. L5. Quantitative population genetics

Partitioning the Genetic Variance

The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies

REGRESSION WITH SPATIALLY MISALIGNED DATA. Lisa Madsen Oregon State University David Ruppert Cornell University

STAT 536: Migration. Karin S. Dorman. October 3, Department of Statistics Iowa State University

Transcription:

Inférence en génétique des populations IV. François Rousset & Raphaël Leblois M2 Biostatistiques 2015 2016 FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 1 / 33

Modeling and analysis of covariances in allele frequencies Spatial Intra-family FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 2 / 33

Modeling and analysis of covariances in allele frequencies Modeling: important parameters in selection processes FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 3 / 33

Modeling and analysis of covariances in allele frequencies Modeling: important parameters in selection processes Artificial selection, or how to increase milk production by selecting bulls Wright s shifting balance theory General formal theory of selection with interactions between relatives FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 3 / 33

Modeling and analysis of covariances in allele frequencies Modeling: important parameters in selection processes Artificial selection, or how to increase milk production by selecting bulls Wright s shifting balance theory General formal theory of selection with interactions between relatives Statistical analysis: relatively simple (and routine) methods of inference FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 3 / 33

Modeling and analysis of covariances in allele frequencies Modeling: important parameters in selection processes Artificial selection, or how to increase milk production by selecting bulls Wright s shifting balance theory General formal theory of selection with interactions between relatives Statistical analysis: relatively simple (and routine) methods of inference Two main spatial models (Wright): island model, and isolation-by-distance Estimate Nm or the neighborhood size 4Dπσ 2 A few statistically desirable properties Analytical understanding FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 3 / 33

Modeling and analysis of covariances in allele frequencies Modeling: important parameters in selection processes Artificial selection, or how to increase milk production by selecting bulls Wright s shifting balance theory General formal theory of selection with interactions between relatives Statistical analysis: relatively simple (and routine) methods of inference Two main spatial models (Wright): island model, and isolation-by-distance Estimate Nm or the neighborhood size 4Dπσ 2 A few statistically desirable properties Analytical understanding Huge literature, often based on very naive ideas FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 3 / 33

Outline An example of the role of spatial structure in evolution Modeling of the role of spatial structure Formal modelling of covariances: diffusion vs. coalescence Genealogical interpretations of covariances and regressions An example of statistical inference based on covariances FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 4 / 33

Outline An example of the role of spatial structure in evolution Modeling of the role of spatial structure Formal modelling of covariances: diffusion vs. coalescence Genealogical interpretations of covariances and regressions An example of statistical inference based on covariances FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 5 / 33

Example: evolution of social behaviour Siderophore production as public good D Pseudomonas aeruginosa C D A Defector that does not produce the public good still benefits from those produced by the Cooperators. Public good production should be counter-selected. FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 6 / 33

Biologists test theories about the importance of dispersal Controlling bacterial dispersal Different agar concentrations: FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 7 / 33

Public goods production mean relative mutant fitness ± s.e. (v) 1.6 1.4 1.2 1.0 0.8 0.6 0.4 0.2 0 0% 0.25% 0.5% 1% agar concentration Viscosity Siderophore-defective strain wins Siderophore-producing strain wins Kümmerli et al, PRSB, 2009 FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 8 / 33

The effect of dispersal almost explained FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 9 / 33

The effect of dispersal almost explained FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 9 / 33

The effect of dispersal almost explained FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 9 / 33

The effect of dispersal almost explained FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 9 / 33

The effect of dispersal almost explained FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 9 / 33

The effect of dispersal almost explained FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 9 / 33

The effect of dispersal almost explained FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 9 / 33

The effect of dispersal almost explained FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 9 / 33

The effect of dispersal almost explained FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 9 / 33

The effect of dispersal almost explained FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 9 / 33

The effect of dispersal almost explained Same logic important for virtually any trait evolving in a population spatially subdivided in (small) subpopulations. FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 9 / 33

Outline An example of the role of spatial structure in evolution Modeling of the role of spatial structure Formal modelling of covariances: diffusion vs. coalescence Genealogical interpretations of covariances and regressions An example of statistical inference based on covariances FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 10 / 33

Formulation in terms of partial regression We have seen that (for deterministic models) the change in allele frequency can be written in terms of allelic fitnesses w and w, as p p =(w w )p (1 p ) =β w,1 Var(1 ) = Cov[w i, 1 (i))] where β w,1 is the regression coefficient of an individual s allelic fitness w i on her genotype 1 (i) FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 11 / 33

Formulation in terms of partial regression We have seen that (for deterministic models) the change in allele frequency can be written in terms of allelic fitnesses w and w, as p p =(w w )p (1 p ) =β w,1 Var(1 ) = Cov[w i, 1 (i))] where β w,1 is the regression coefficient of an individual s allelic fitness w i on her genotype 1 (i) We can apply the standard recursive relationship for regression coefficients between a response variable W and two predictors F and N: β W,F = β W,F.N + β N,F β W,N.F. here for F := 1, the indicator variable for the (focal) individual s genotype, and N := the mean value of the indicator variable for a group of social partners FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 11 / 33

Formulation in terms of partial regression We have seen that (for deterministic models) the change in allele frequency can be written in terms of allelic fitnesses w and w, as p p =(w w )p (1 p ) =β w,1 Var(1 ) = Cov[w i, 1 (i))] where β w,1 is the regression coefficient of an individual s allelic fitness w i on her genotype 1 (i) This entails the two-predictor regression expression for the change in allele frequency: p p = (β W,F.N + β N,F β W,N.F ) Var(1 ) where β N,F is the regression coefficient of the allele frequency among social partners to allele presence in the focal individual. β N,F is (one definition of) relatedness in the genetic literature. FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 11 / 33

Outline An example of the role of spatial structure in evolution Modeling of the role of spatial structure Formal modelling of covariances: diffusion vs. coalescence Genealogical interpretations of covariances and regressions An example of statistical inference based on covariances FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 12 / 33

Probability of identity Probability that two gene copies are of same allelic state = Identity in state FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 13 / 33

Probability of identity Probability that two gene copies are of same allelic state = Identity in state Cf Wright-Fisher model with two alleles a u A v FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 13 / 33

Probability of identity Probability that two gene copies are of same allelic state = Identity in state Cf Wright-Fisher model with two alleles a u A v pdf of allele frequency f (p) =Beta(α = 2Nu, β = 2Nv) FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 13 / 33

Probability of identity Probability that two gene copies are of same allelic state = Identity in state Cf Wright-Fisher model with two alleles a u A v pdf of allele frequency f (p) =Beta(α = 2Nu, β = 2Nv) 1 P(identity) = 0 f (x) dx FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 13 / 33

Probability of identity Probability that two gene copies are of same allelic state = Identity in state Cf Wright-Fisher model with two alleles a u A v pdf of allele frequency f (p) =Beta(α = 2Nu, β = 2Nv) 1 P(identity) = [x 2 + (1 x) 2 ]f (x) dx = u + 2Nu2 + v + 2Nv 2 0 (u + v)[1 + 2N(u + v)] 1 1 + 4Nu if u = v FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 13 / 33

Probability of identity Probability that two gene copies are of same allelic state = Identity in state Cf Wright-Fisher model with two alleles a u A v pdf of allele frequency f (p) =Beta(α = 2Nu, β = 2Nv) 1 P(identity) = [x 2 + (1 x) 2 ]f (x) dx = u + 2Nu2 + v + 2Nv 2 0 (u + v)[1 + 2N(u + v)] 1 1 + 4Nu if u = v Mathematical overkill! FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 13 / 33

Probability of identity: coalescent argument Infinite allele model identity by descent FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 14 / 33

Probability of identity: coalescent argument Infinite allele model identity by descent For c t the probability that coalescence of a a given pair of genes occurred t generations ago, P(identity) = Q = c t...? t=0 FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 14 / 33

Probability of identity: coalescent argument Infinite allele model identity by descent For c t the probability that coalescence of a a given pair of genes occurred t generations ago, P(identity) = Q = c t (1 µ) 2t = t=0 c t γ t for γ (1 µ) 2 t=0 FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 14 / 33

Probability of identity: coalescent argument Infinite allele model identity by descent For c t the probability that coalescence of a a given pair of genes occurred t generations ago, P(identity) = Q = c t (1 µ) 2t = t=0 c t γ t for γ (1 µ) 2 t=0 Generating function FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 14 / 33

Probability of identity: coalescent argument Infinite allele model identity by descent For c t the probability that coalescence of a a given pair of genes occurred t generations ago, P(identity) = Q = Generating function c t (1 µ) 2t = t=0 For Wright-Fisher model: Q = 1 c t γ t for γ (1 µ) 2 t=0 ( 1 1 1 ) t 1 γ t γ = N N γ + N(1 γ) 1 1 + 2Nµ when N and µ 0 FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 14 / 33

Infinite island model; Wright s F ST A very simple, easily solved model Immigration m Frequency of some focal allele locally, p = p i for given island i, and globally, p. FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 15 / 33

Infinite island model; Wright s F ST Diffusion approximation for frequency in each island i: P i Const p 2Nm p 1 i (1 p i ) 2Nm(1 p) 1 =Beta(2Nm p, 2Nm(1 p)) FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 16 / 33

Infinite island model; Wright s F ST Diffusion approximation for frequency in each island i: P i Const p 2Nm p 1 i (1 p i ) 2Nm(1 p) 1 =Beta(2Nm p, 2Nm(1 p)) We have seen that the stationarity distribution satisfies: 0 = f (p, t) t a(p)f (p, t) = + 2 b(p)f (p, t) p 2 p 2 f (p) exp ( 2 p a(x)/b(x) dx ) b(p) FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 16 / 33

Infinite island model; Wright s F ST Diffusion approximation for frequency in each island i: P i Const p 2Nm p 1 i (1 p i ) 2Nm(1 p) 1 =Beta(2Nm p, 2Nm(1 p)) Wright-Fisher model: b(p) = p(1 p) and a(p) = N( vp + u(1 p)) = N[(u + v)(p u/(u + v))] Frequency P Const p 2Nu 1 (1 p) 2Nv 1 =Beta(α = 2Nu, β = 2Nv) FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 16 / 33

Infinite island model; Wright s F ST Diffusion approximation for frequency in each island i: P i Const p 2Nm p 1 i (1 p i ) 2Nm(1 p) 1 =Beta(2Nm p, 2Nm(1 p)) Wright-Fisher model: b(p) = p(1 p) and a(p) = N( vp + u(1 p)) = N[(u + v)(p u/(u + v))] Frequency P Const p 2Nu 1 (1 p) 2Nv 1 =Beta(α = 2Nu, β = 2Nv) Island model b(p i ) = p i (1 p i ) and a(p i ) = N(m(p i p)) Frequency P Beta[α = 2Nm p, β = 2Nm(1 p)] FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 16 / 33

Infinite island model; Wright s F ST Diffusion approximation for frequency in each island i: P i Const p 2Nm p 1 i (1 p i ) 2Nm(1 p) 1 =Beta(2Nm p, 2Nm(1 p)) F ST Var p i p(1 p) = E(p2 i ) p2 p(1 p) = 1 1 + 2Nm More generally variances and covariances of allele frequencies can be expressed in terms of probabilities of identity of pair of genes. Here locally, P( p i ; p) = F ST p i + (1 F ST ) p 2 (with mutation at a low enough rate: proven in next slides); hence globally. P( ; p) = F ST p + (1 F ST ) p 2 FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 16 / 33

Outline An example of the role of spatial structure in evolution Modeling of the role of spatial structure Formal modelling of covariances: diffusion vs. coalescence Genealogical interpretations of covariances and regressions Infinite island model F ST, separation of time scales Finite island model coalescence times, separation of time scales Isolation by distance separation of time scales Robustness of regression coefficients An example of statistical inference based on covariances FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 17 / 33

Genealogical interpretation of F ST R R R R R R R R R R R R R R R R R R R R R R R R R FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 18 / 33

Genealogical interpretation of F ST past generations -2-1 R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 18 / 33

Genealogical interpretation of F ST past generations p (or p i ) -2-1 R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 18 / 33

Genealogical interpretation of F ST past generations p (or p i ) p 2-2 -1 R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 18 / 33

Genealogical interpretation of F ST past generations F ST p (or p i ) (1 F ST ) p 2-2 -1 R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 18 / 33

Wright s F ST redefined (For any genetic marker) One can describe the correlation of allele types within some given sub-population relative to the allele types of random genes in the total population as F ST = Q w Q b 1 Q b in terms of probabilities of identity Q w and Q b within and between subpopulations. This leads to relatively simple analysis. FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 19 / 33

General facts about Wright s F -statistics ˆF ST = ˆQ w ˆQ b 1 ˆQ b F IS = Q 0 Q w 1 Q w FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 20 / 33

General facts about Wright s F -statistics Naive but straightforward estimators ˆF ST = ˆQ w ˆQ b 1 ˆQ b F IS = Q 0 Q w 1 Q w FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 20 / 33

General facts about Wright s F -statistics ˆF ST = ˆQ w ˆQ b 1 ˆQ b F IS = Q 0 Q w 1 Q w Relationship with average coalescence times Q = c t γ t for γ (1 µ) 2 = 1 2µ E(T ) + O(µ 2 ) t=0 F ST E(T b) E(T w ) E(T b ) when µ 0. FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 20 / 33

Modelling in terms of probabilities of identity Q Example: finite island model Relationship of Q w with Q w, Q b one generation earlier? where and FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 21 / 33

Modelling in terms of probabilities of identity Q Example: finite island model Relationship of Q w with Q w, Q b one generation earlier? where Q(t + 1) = γa[q(t) + c(t)] and FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 21 / 33

Modelling in terms of probabilities of identity Q Example: finite island model Relationship of Q w with Q w, Q b one generation earlier? Q(t + 1) = γa[q(t) + c(t)] where and Q(t) = ( Qw (t) Q b (t) ) and c(t) = ( 1 Qw(t) N 0 ) FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 21 / 33

Modelling in terms of probabilities of identity Q Example: finite island model Relationship of Q w with Q w, Q b one generation earlier? where and Q(t) = A = ( Q(t + 1) = γa[q(t) + c(t)] ( Qw (t) Q b (t) ) and c(t) = ( 1 Qw(t) N 0 ) a 11 = (1 m) 2 + m2 n d 1 1 a 11 1 a 11 a 11 n d 1 n d 1 is the transition matrix for the random walk of two gene lineages between the state within same deme and the complementary state. ) FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 21 / 33

The stationary solution in an informative form Q = γ(i γa) 1 Ac Eigensystem of A λ 1 = 1, e 1 = (1, 1) ( ) 2, λ 2 = 1 m n d n d 1 e2 = (n d 1, 1) ( 1 Qw ) c = N = 1 Q w 1 (e 0 1 + e 2 ) N n d FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 22 / 33

The stationary solution in an informative form Q = γ(i γa) 1 Ac Eigensystem of A λ 1 = 1, e 1 = (1, 1) ( ) 2, λ 2 = 1 m n d n d 1 e2 = (n d 1, 1) ( 1 Qw ) c = N = 1 Q w 1 (e 0 1 + e 2 ) N n d Q = 1 Q w N 1 n d ( γ 1 γ e 1 + γλ ) 2 e 2 1 γλ 2 FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 22 / 33

The stationary solution in an informative form Q = γ(i γa) 1 Ac Eigensystem of A λ 1 = 1, e 1 = (1, 1) ( ) 2, λ 2 = 1 m n d n d 1 e2 = (n d 1, 1) ( 1 Qw ) c = N = 1 Q w 1 (e 0 1 + e 2 ) N n d Q = 1 Q w N 1 n d ( γ 1 γ e 1 + γλ ) 2 e 2 1 γλ 2 Q w Q b 1 Q w = 1 N γλ 2 1 γλ 2 FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 22 / 33

The stationary solution in an informative form Q = γ(i γa) 1 Ac Eigensystem of A λ 1 = 1, e 1 = (1, 1) ( ) 2, λ 2 = 1 m n d n d 1 e2 = (n d 1, 1) ( 1 Qw ) c = N = 1 Q w 1 (e 0 1 + e 2 ) N n d Q = 1 Q w N 1 n d ( γ 1 γ e 1 + γλ ) 2 e 2 1 γλ 2 F ST Q w Q b 1 Q b = γλ 2 γλ 2 + N(1 γλ 2 ) 1 1 + 2Nm FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 22 / 33

The stationary solution in another informative form in terms of eigenvalues of another matrix G: Q = γa(q + c) = γ[gq + (A G)1] = γ[gq + (I G)1] where G is the matrix whose ijth element is the coefficient of the jth element of Q in the ith element of A[Q + c]. Denote l j and u j the eigenvalues and eigenvectors of G; and write (I G)1 as j x ju j. Then FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 23 / 33

The stationary solution in another informative form in terms of eigenvalues of another matrix G: Q = γa(q + c) = γ[gq + (A G)1] = γ[gq + (I G)1] where G is the matrix whose ijth element is the coefficient of the jth element of Q in the ith element of A[Q + c]. Denote l j and u j the eigenvalues and eigenvectors of G; and write (I G)1 as j x ju j. Then Q = γ(i γg) 1 j x j u j = j γx j u j = 1 γl j t γ t j l t 1 j x j u j i.e. for ith element of Q, c i,t = j lt 1 j x j u ji. FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 23 / 33

Genealogical interpretation of F -statistics (N = 100, m = 10 3, n d = 10) Probability of coalescence 10 2 10 4 10 6 FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 24 / 33

Genealogical interpretation of F -statistics Interpretation of F ST from comparison of two distributions of coalescence times FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 24 / 33

Genealogical interpretation of F -statistics robustness: ancient events have no effect on F ST FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 24 / 33

Genealogical interpretation of F -statistics (N = 100, m = 10 3, n d = 10) Probability of coalescence 10 2 10 4 10 6 FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 24 / 33

Genealogical interpretation of F -statistics (N = 100, m = 10 3, n d = 1000) Probability of coalescence 10 2 10 4 10 6 FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 24 / 33

Genealogical interpretation of F -statistics (N = 100, m = 10 3, n d = 100000) Probability of coalescence 10 2 10 4 10 6 FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 24 / 33

Genealogical interpretation of F ST past generations F ST p (1 F ST ) p 2-2 -1 R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 25 / 33

Island model with self-fertilization ( selfing ) Three eigenvalues of G = different time scales dependent on magnitude of n d N vs. N vs selfing rate. FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 26 / 33

Isolation by distance Pattern of isolation by distance Genealogical interpretation FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 27 / 33

Isolation by distance: a quantitative approximation Generating function of one-lineage dispersal ψ(z) := k m kz k ; FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 28 / 33

Isolation by distance: a quantitative approximation Generating function of one-lineage dispersal ψ(z) := k m kz k ; Generating function of two-lineage increments in distance dispersal ψ 2 (z); FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 28 / 33

Isolation by distance: a quantitative approximation Generating function of one-lineage dispersal ψ(z) := k m kz k ; Generating function of two-lineage increments in distance dispersal ψ 2 (z); In terms of the eigensystem of A for any distribution of dispersal distance: Q = 1 1 Q w Nn d j γλ j 1 γλ j e j λ k = ψ 2 ( e ı2πk/nx ) and e k = (cos(2πki/n x )) for (i = 0,..., n x 1); FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 28 / 33

Isolation by distance: a quantitative approximation Generating function of one-lineage dispersal ψ(z) := k m kz k ; Generating function of two-lineage increments in distance dispersal ψ 2 (z); In terms of the eigensystem of A for any distribution of dispersal distance: Q = 1 1 Q w Nn d j γλ j 1 γλ j e j λ k = ψ 2 ( e ı2πk/nx ) and e k = (cos(2πki/n x )) for (i = 0,..., n x 1); Two dimensional λ kl = ψ 2 ( e ı2πk/nx ) ψ 2 ( e ı2πl/ny ) ; FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 28 / 33

Isolation by distance: a quantitative approximation Q w Q ij 1 Q w = 1 Nn d 1 Nπ 2 k l π π Q w Q d 1 Q w log(d) 2Nπσ 2 + constant 0 0 γλ kl 1 γλ kl (e kl,0 e kl,ij ) γψ 2 (e ıx ) ψ 2 (e ıy ) 1 γψ 2 (e ıx ) ψ 2 (e ıy [1 cos(kx) cos(ly)] dx dy ) FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 28 / 33

Isolation by distance: a quantitative approximation Q w Q ij 1 Q w = 1 Nn d 1 Nπ 2 k l π π Q w Q d 1 Q w log(d) 2Nπσ 2 + constant 0 0 γλ kl 1 γλ kl (e kl,0 e kl,ij ) γψ 2 (e ıx ) ψ 2 (e ıy ) 1 γψ 2 (e ıx ) ψ 2 (e ıy [1 cos(kx) cos(ly)] dx dy ) 2Nπσ 2 Q w Q d 1 Q w FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 28 / 33

Outline An example of the role of spatial structure in evolution Modeling of the role of spatial structure Formal modelling of covariances: diffusion vs. coalescence Genealogical interpretations of covariances and regressions An example of statistical inference based on covariances FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 29 / 33

A simple inference under isolation by distance 2Nπσ 2 Q w Q d 1 Q w regress ˆQ w ˆQ d 1 ˆQ w against log(distance) to obtain an estimate of 1/2Nπσ 2. FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 30 / 33

A simple inference under isolation by distance 2Nπσ 2 Q w Q d 1 Q w regress ˆQ w ˆQ d 1 ˆQ w against log(distance) to obtain an estimate of 1/2Nπσ 2. FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 30 / 33

Statistical inference: an illustration FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 31 / 33

Statistical inference: an illustration FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 31 / 33

Statistical inference: an illustration FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 31 / 33

Statistical inference: an illustration 1/regression slope = estimate of neighborhood size Nb 4Dπσ 2 Here one finds Nb ˆ=753 with CI 319 3162 obtained by bootstrapping over loci. (mark-recapture experiments yielded Nb ˆ=555) FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 31 / 33

Moment vs. likelihood method Nb 200 500 1000 2000 5000 10000 50000 Regression PAC likelihood FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 32 / 33

Summary Various evolutionary processes can be understood in terms of covariances in allele frequencies; Formal analysis in terms of probabilities of identity and distributions of coalescence times; Wright s F ST and its variants can characterize recent events; They can provide some reasonable data analyses, although not the most precise ones. FR & RL Inférence en génétique des populations IV. M2 Biostatistiques 2015 2016 33 / 33