Quantitative trait evolution with mutations of large effect


May 1, 2014

Quantitative traits
Traits that vary continuously in populations:
- Mass
- Height
- Bristle number (approx.)
Adaptation:
- Low oxygen tolerance
Disease:
- Obesity
Agriculture:
- Fruit yield

Fisher's quantitative trait model
$n$ biallelic loci impacting the trait
- Allele $j$ at locus $i$ has effect $a_{ij}$
- Phenotype is $X_k = \sum_{i,j} a_{ij} z_{ijk}$, where $z_{ijk}$ is the number of copies of allele $j$ at locus $i$ in individual $k$
- $E(X) = 2n\,E(p)\,E(a)$
- $E(\mathrm{Var}(X)) = 2n\,E(p(1-p))\,E(a^2)$
- Environmental variation induces extra variability

[Figure: histograms of the trait distribution for 1 locus, 5 loci, and 100 loci]

Neutral genetic variance
Equilibrium genetic variance:
- $E(V_A(\infty)) = 2N V_m$, where $V_m$ is the additive variance of new mutations
- $V_m \approx 2\mu\sigma^2$
  - $\mu$: trait-wide mutation rate
  - $\sigma^2$: variance of the mutant effect size distribution
- $\mathrm{Var}(V_A(\infty)) \approx 4N\mu\sigma^4$ (Lynch and Hill 1986)

Lande's Brownian motion model
Approximation to the full model:
- Don't keep track of individual loci
- Assume genetic variance is constant in time
- Mutations have small effect sizes
Can incorporate selection via an Ornstein-Uhlenbeck model
Extremely influential in comparative biology
- cf. Felsenstein: independent contrasts

[Figure: simulated Brownian motion trajectories of the trait X over time]

Evidence for non-Brownian evolution

A coalescent model
- Haploid population of size $N$
- Trait governed by $n$ loci
- Each locus has an independent coalescent tree
- Each locus has mutation rate $\theta/2$ (in coalescent time units)
- When a mutation happens, its effect $Y$ is drawn from a distribution with density $p(y)$
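
The model above is straightforward to simulate. Below is a minimal sketch (not from the talk): the function names (simulate_locus, simulate_trait) and the choice of a normal effect distribution are illustrative assumptions, and trait values are reported relative to the root, anticipating the normalization used later in the talk.

import numpy as np

rng = np.random.default_rng(1)

def simulate_locus(n, theta, effect_sampler, rng=rng):
    # One locus: Kingman coalescent for n lineages, Poisson(theta/2 * branch length)
    # mutations per branch, and each mutation's effect added to every leaf below it.
    lineages = [frozenset([i]) for i in range(n)]
    birth = {l: 0.0 for l in lineages}     # time at which each lineage arose
    trait = np.zeros(n)                    # trait of each leaf, relative to the root
    t = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        t += rng.exponential(2.0 / (k * (k - 1)))        # next coalescence, rate k*(k-1)/2
        i, j = sorted(rng.choice(k, size=2, replace=False))
        a, b = lineages[j], lineages[i]
        for branch in (a, b):
            n_mut = rng.poisson(theta / 2.0 * (t - birth[branch]))
            for _ in range(n_mut):
                trait[list(branch)] += effect_sampler()
        merged = a | b
        del lineages[j], lineages[i]
        lineages.append(merged)
        birth[merged] = t
    return trait

def simulate_trait(n, n_loci, theta, effect_sampler, rng=rng):
    # Phenotypes are sums over independent loci.
    return sum(simulate_locus(n, theta, effect_sampler, rng) for _ in range(n_loci))

# Example: 5 sampled individuals, 100 loci, standard normal mutational effects.
print(simulate_trait(5, 100, theta=0.1, effect_sampler=lambda: rng.normal()))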

Example

Mutational effects
Common assumption: many loci of very small effect
- Infinitesimal model

Large effect mutations
In some cases, large effect mutations may occur:
- Transcription factor binding sites
- Null mutants upstream in pathways

Characteristic functions

Characteristic function: for a random variable $X$ drawn from the probability measure $\mu(\cdot)$, the function
$$\varphi_X(k) = E(e^{ikX}) = \int e^{ikx}\,d\mu(x)$$
is called the characteristic function.

Sums of random variables: if $X_1, \ldots, X_n$ are independent random variables and $X = \sum_i X_i$, then
$$\varphi_X(k) = \prod_i \varphi_{X_i}(k).$$
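
A quick numerical illustration of the product rule for sums of independent variables (the two distributions below are arbitrary stand-ins):

import numpy as np

rng = np.random.default_rng(2)
k, n_rep = 0.9, 500_000

x1 = rng.exponential(1.0, size=n_rep)      # X1: arbitrary choice
x2 = rng.uniform(-1.0, 1.0, size=n_rep)    # X2: independent of X1

ecf = lambda s: np.mean(np.exp(1j * k * s))   # empirical characteristic function at k
print(ecf(x1 + x2))                           # CF of the sum...
print(ecf(x1) * ecf(x2))                      # ...approximately matches the product of the CFs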

Compound Poisson process

Definition: let $N(t)$ be a rate $\lambda$ Poisson process and $(Y_i,\ i \ge 1)$ a sequence of independent and identically distributed random variables. Then
$$X(t) = \sum_{i=1}^{N(t)} Y_i$$
is called a compound Poisson process.

[Figure: a sample path of X(t)]
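
The value of a compound Poisson process at a fixed time is simple to simulate; a short sketch, assuming standard normal jumps purely for illustration:

import numpy as np

rng = np.random.default_rng(3)

def compound_poisson_value(t, lam, jump_sampler, rng=rng):
    # X(t): draw N(t) ~ Poisson(lam * t), then sum that many i.i.d. jumps.
    n_jumps = rng.poisson(lam * t)
    return jump_sampler(n_jumps).sum()

x_t = compound_poisson_value(t=2.0, lam=1.5, jump_sampler=lambda n: rng.normal(size=n))
print(x_t)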

Characteristic function of a compound Poisson process

CF of a CP process: if $X(t)$ is a rate $\lambda$ compound Poisson process and $\psi(k)$ is the characteristic function of the jump distribution, then the characteristic function of $X(t)$ is
$$\varphi_t(k) = e^{\lambda t(\psi(k) - 1)}.$$
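
This formula is easy to verify by Monte Carlo. A sketch assuming standard normal jumps, so that $\psi(k) = e^{-k^2/2}$:

import numpy as np

rng = np.random.default_rng(4)
lam, t, k, n_rep = 1.5, 2.0, 0.7, 200_000

counts = rng.poisson(lam * t, size=n_rep)            # N(t) for each replicate
# With N(0,1) jumps, the sum of c jumps is N(0, c), so X(t) can be sampled directly.
samples = rng.normal(size=n_rep) * np.sqrt(counts)

psi = np.exp(-k**2 / 2.0)                            # CF of a single jump
print(np.mean(np.exp(1j * k * samples)))             # empirical CF of X(t)
print(np.exp(lam * t * (psi - 1.0)))                 # exp(lam * t * (psi(k) - 1))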

One locus, sample of size 2
Condition on $T_2$, the coalescence time:
- $X_1$ and $X_2$ are independent CP($\theta/2$) processes run for time $T_2$
- The joint distribution depends on the root value
Consider $Z = X_2 - X_1$:
- Doesn't depend on the root

One locus, sample of size 2 (CF)
$$\begin{aligned}
\varphi_Z(k) &= \int_0^\infty E\!\left(e^{ikZ}\right) e^{-t}\,dt \\
&= \int_0^\infty E\!\left(e^{ik(X_2 - X_1)}\right) e^{-t}\,dt \\
&= \int_0^\infty \varphi_t(k)\,\varphi_t(-k)\, e^{-t}\,dt \\
&= \int_0^\infty e^{\frac{\theta}{2}t(\psi(k) + \psi(-k) - 2)}\, e^{-t}\,dt \\
&= \frac{1}{1 - \frac{\theta}{2}\left(\psi(k) + \psi(-k) - 2\right)}
\end{aligned}$$
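
A Monte Carlo check of the closed form, again assuming standard normal mutational effects: draw $T_2 \sim \mathrm{Exp}(1)$, run two independent CP($\theta/2$) processes for time $T_2$, and compare the empirical CF of $Z = X_2 - X_1$ with the expression above.

import numpy as np

rng = np.random.default_rng(5)
theta, k, n_rep = 1.0, 0.8, 200_000

t2 = rng.exponential(1.0, size=n_rep)                # pairwise coalescence times
n_mut = rng.poisson(theta / 2.0 * t2) + rng.poisson(theta / 2.0 * t2)   # mutations on both lineages
z = rng.normal(size=n_rep) * np.sqrt(n_mut)          # with N(0,1) effects, Z | counts ~ N(0, total count)

psi = np.exp(-k**2 / 2.0)                            # symmetric effects: psi(k) = psi(-k)
print(np.mean(np.exp(1j * k * z)))                   # empirical CF of Z
print(1.0 / (1.0 - theta / 2.0 * (2.0 * psi - 2.0))) # 1 / (1 - theta/2 (psi(k) + psi(-k) - 2))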

Phenotype at the root
- The root of the tree must be specified
- A new individual could coalesce more anciently than the current root
  - For each $n$, there is a fixed root
  - Ensuring consistency could be hard

The root problem

Getting rid of the root

Normalization: if $X = (X_0, \ldots, X_{n-1})$ are the trait values in the sample, then the random vector
$$Z = (Z_1, \ldots, Z_{n-1}) = (X_1 - X_0, \ldots, X_{n-1} - X_0)$$
does not depend on the root value.

Normalizing by an arbitrary individual makes $n$ unimportant
- Intuition: now every sample has to follow the path through the tree to individual 0, so the root doesn't matter
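
A toy check (arbitrary numbers) that $Z$ is unchanged when every trait value is shifted by the same unknown root value:

import numpy as np

x = np.array([1.3, 0.7, 2.1, 1.8])                   # hypothetical trait values X_0, ..., X_3
z = x[1:] - x[0]                                     # Z_i = X_i - X_0
z_shifted = (x + 5.0)[1:] - (x + 5.0)[0]             # shift everything by an arbitrary root value
print(np.allclose(z, z_shifted))                     # True: Z does not depend on the root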

One locus, sample of size 3

Characteristic function for $n = 3$ and symmetric mutational effects:
$$\varphi_{Z_1,Z_2}(k_1, k_2) = \frac{\dfrac{1}{1+\theta(1-\psi(k_1))} + \dfrac{1}{1+\theta(1-\psi(k_2))} + \dfrac{1}{1+\theta(1-\psi(k_1+k_2))}}{3 - \dfrac{\theta}{2}\left(\psi(k_1) + \psi(k_2) + \psi(k_1+k_2) - 3\right)}$$

Sketch of derivation:
- Condition on the topology (each of the 3 topologies is equally likely)
- Compute the characteristic function for each topology while conditioning on coalescence times
- Integrate over coalescence times
- Take a weighted sum of the characteristic functions for each topology

Sending the number of loci off to infinity
We would like some nice limit as the number of loci increases
- Trivial result: uniform on $\mathbb{R}$ or $\delta(x)$
Need to decrease the effect size or mutation rate of each locus
Three nontrivial limits:
- Mutation rate per locus decreases but effect sizes remain constant
- Mutation rate per locus remains constant but effect sizes decrease and the effect size distribution does not have fat tails
- Mutation rate per locus remains constant but effect sizes decrease and the effect size distribution has fat tails

Decreasing mutation rate

Correlated CPP limit: as $n \to \infty$ and $\theta \to 0$ such that $n\theta \to \Theta$,
$$\varphi_{Z_1,Z_2}(k_1, k_2) \to e^{\frac{\Theta}{2}(\psi(k_1) + \psi(k_2) + \psi(k_1+k_2) - 3)},$$
which is the characteristic function of two correlated compound Poisson processes.

- Perhaps not very biologically relevant
- Will ignore for the rest of the talk

Small effect mutations

Bivariate Gaussian limit: as $n \to \infty$ and the second moment of the effect distribution, $\tau^2$, goes to 0 such that $n\tau^2 \to \sigma^2$,
$$\varphi_{Z_1,Z_2}(k_1, k_2) \to e^{-\frac{\theta}{2}\sigma^2(k_1^2 + k_1 k_2 + k_2^2)},$$
which is the characteristic function of a bivariate Gaussian distribution with mean vector $(0, 0)$ and variance-covariance matrix
$$\Sigma = \theta\sigma^2 \begin{bmatrix} 1 & 1/2 \\ 1/2 & 1 \end{bmatrix}$$
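
A direct check of this covariance matrix, simulating the model for a sample of size 3 with many loci of small effect (normal effects with per-locus variance $\tau^2 = \sigma^2/n_{\mathrm{loci}}$; all parameter values below are illustrative):

import numpy as np

rng = np.random.default_rng(6)

def one_locus_z(theta, tau, rng):
    # Sample of size 3 at one locus: return (Z1, Z2) = (X1 - X0, X2 - X0).
    t3 = rng.exponential(1.0 / 3.0)                  # first coalescence (rate 3)
    t2 = rng.exponential(1.0)                        # second coalescence (rate 1)
    pair = rng.choice(3, size=2, replace=False)      # the two lineages that coalesce first
    lengths = np.full(3, t3 + t2)                    # external branch lengths up to the root
    lengths[pair] = t3
    x = np.zeros(3)
    for i in range(3):                               # mutations on each external branch
        x[i] += rng.normal(0.0, tau, rng.poisson(theta / 2.0 * lengths[i])).sum()
    # Mutations on the internal branch (length t2) affect both members of `pair`.
    x[pair] += rng.normal(0.0, tau, rng.poisson(theta / 2.0 * t2)).sum()
    return x[1] - x[0], x[2] - x[0]

theta, sigma2, n_loci, n_rep = 1.0, 1.0, 200, 2000
tau = np.sqrt(sigma2 / n_loci)                       # small-effect regime: n_loci * tau^2 = sigma^2
z = np.zeros((n_rep, 2))
for r in range(n_rep):
    for _ in range(n_loci):
        z[r] += one_locus_z(theta, tau, rng)
print(np.cov(z.T))                                   # ~ theta * sigma2 * [[1, 0.5], [0.5, 1]]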

Large effect mutations

Bivariate stable distribution: assume that $p(y) \sim \kappa |y|^{-(\alpha+1)}$ and set $t = \kappa\pi\left(\sin(\alpha\pi/2)\,\Gamma(\alpha)\,\alpha\right)^{-1}$. As $n \to \infty$ and $t \to 0$ such that $nt \to c$,
$$\varphi_{Z_1,Z_2}(k_1, k_2) \to e^{-\frac{1}{2}\theta c\left(|k_1|^\alpha + |k_2|^\alpha + |k_1+k_2|^\alpha\right)},$$
which is the characteristic function of a bivariate $\alpha$-stable distribution.
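
Setting $k_2 = 0$ gives the marginal CF $\exp(-\theta c |k_1|^\alpha)$, a symmetric $\alpha$-stable law with scale $(\theta c)^{1/\alpha}$. A small numerical check of that statement, assuming scipy.stats.levy_stable with $\beta = 0$ follows the usual symmetric-stable CF $\exp(-|\mathrm{scale}\cdot k|^\alpha)$:

import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(7)
alpha, theta, c, k = 1.5, 1.0, 0.8, 0.6

scale = (theta * c) ** (1.0 / alpha)                 # marginal scale implied by the bivariate CF
z1 = levy_stable.rvs(alpha, 0.0, loc=0.0, scale=scale, size=200_000, random_state=rng)

print(np.mean(np.exp(1j * k * z1)))                  # empirical CF of the marginal
print(np.exp(-theta * c * abs(k) ** alpha))          # exp(-theta * c * |k|^alpha)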

de Finetti's theorem

Theorem: if $(X_1, X_2, \ldots)$ is an infinitely exchangeable sequence of random variables, then the probability density of $(X_1 = x_1, \ldots, X_n = x_n)$ is a mixture over i.i.d. probability densities. That is,
$$p(x_1, \ldots, x_n) = \int \prod_i p_\theta(x_i)\,\nu(d\theta)$$

Suggests that we can find a de Finetti measure such that all our normalized samples are i.i.d.

A guess in the normal case
- Given the mean, the trait is distributed $\mathcal{N}(0, \frac{\theta}{2}\sigma^2)$
- The mean is itself distributed $\mathcal{N}(0, \frac{\theta}{2}\sigma^2)$
- Law of total variance gives $\mathrm{Var}(X) = 2N\mu\sigma^2$
  - Same as the classical derivation

$$\begin{aligned}
\varphi(k_1, k_2) &= \int \left( \prod_{l=1}^{2} \int e^{ik_l x_l}\, \frac{1}{\sqrt{\pi\theta\sigma^2}}\, e^{-\frac{(x_l - m)^2}{\theta\sigma^2}}\, dx_l \right) \frac{1}{\sqrt{\pi\theta\sigma^2}}\, e^{-\frac{m^2}{\theta\sigma^2}}\, dm \\
&= e^{-\frac{\theta}{4}\sigma^2(k_1^2 + k_2^2)} \int e^{im(k_1+k_2)}\, \frac{1}{\sqrt{\pi\theta\sigma^2}}\, e^{-\frac{m^2}{\theta\sigma^2}}\, dm \\
&= e^{-\frac{\theta}{4}\sigma^2(k_1^2 + k_2^2)}\, e^{-\frac{\theta}{4}\sigma^2(k_1+k_2)^2} \\
&= e^{-\frac{\theta}{2}\sigma^2(k_1^2 + k_1 k_2 + k_2^2)}
\end{aligned}$$
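
The law-of-total-variance step can be checked numerically: draw the mean $m \sim \mathcal{N}(0, \theta\sigma^2/2)$, then two conditionally independent traits $\mathcal{N}(m, \theta\sigma^2/2)$; the sample covariance matrix should match $\Sigma$ from the Gaussian limit (parameter values are arbitrary):

import numpy as np

rng = np.random.default_rng(8)
theta, sigma2, n_rep = 1.0, 1.0, 500_000

sd = np.sqrt(theta * sigma2 / 2.0)
m = rng.normal(0.0, sd, size=n_rep)                        # random mean
x = m[:, None] + rng.normal(0.0, sd, size=(n_rep, 2))      # two traits, i.i.d. given the mean
print(np.cov(x.T))                                         # ~ theta * sigma2 * [[1, 0.5], [0.5, 1]]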

Tempting interpretation and conjecture
- Conditioning on the amount of evolution exclusive to sample 0
  - Integrate over whether sample 1 or sample 2 coalesces with sample 0 first
- Suggests that this can be extended to arbitrary sample sizes

Conjecture about larger samples (Gaussian limit): when the mutation kernel has only small effect mutations, the limit distribution is normal with variance $\frac{\theta}{2}\sigma^2$ and a random mean.

Testing the Gaussian conjecture
- Simulate quantitative traits according to the coalescent model
- Use a KS test to assess convergence to the limit

[Figure: "Normal, KS D" heat map of the KS statistic by number of loci (x-axis) and number of samples (y-axis)]

Conjecture for the stable limit
Bivariate case:
- Independently $\alpha$-stable with a random median
- Analogous to the Gaussian limit

Conjecture about larger samples (Stable limit): when the mutation kernel has large mutational effects, the limit distribution is $\alpha$-stable with scale $\left(\frac{\theta}{2}c\right)^{1/\alpha}$ and a random median.

Testing the stable conjecture
- Simulate quantitative traits according to the coalescent model
- Use a KS test to assess convergence to the limit

[Figure: "Stable, KS D" heat map of the KS statistic by number of loci (x-axis) and number of samples (y-axis)]

More randomness than expected

[Figure: density of trait values]

Stable limit for sample of size 4
Tedious computation
- Need to average over 18 trees

The conjecture would require
$$\varphi_{Z_1,Z_2,Z_3}(k_1, k_2, k_3) \to e^{-\frac{\theta}{2}c\left(|k_1|^\alpha + |k_2|^\alpha + |k_3|^\alpha + |k_1+k_2+k_3|^\alpha\right)}$$

Trivariate characteristic function: for a sample of size four, under the same conditions as the bivariate limit,
$$\varphi_{Z_1,Z_2,Z_3}(k_1, k_2, k_3) \to \exp\left\{-\frac{\theta}{3}c\left(|k_1|^\alpha + |k_2|^\alpha + |k_3|^\alpha + \tfrac{1}{2}|k_1+k_2|^\alpha + \tfrac{1}{2}|k_1+k_3|^\alpha + \tfrac{1}{2}|k_2+k_3|^\alpha + |k_1+k_2+k_3|^\alpha\right)\right\}$$

Extremely weak conjecture for large effect mutations

Conjecture (Stable limit): when the mutation kernel has large mutational effects, the limit distribution is $\alpha$-stable with random parameters.

Not even sure that this is true
- A mixture of stable distributions?

Empirical data
Neurospora crassa mRNA-seq
- Ellison et al. (2011), PNAS
- Restrict to genes with FPKM > 1
For each gene, fit a normal distribution and an $\alpha$-stable distribution
Bootstrap likelihood ratio test ($H_0$: $\alpha = 2$ vs. $H_1$: $\alpha < 2$)
Preliminary!
- Only 346 genes analyzed
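
A sketch of this style of test for a single gene, with loud caveats: the data below are synthetic stand-ins, scipy.stats.levy_stable.fit is assumed for the stable MLE (it is slow, and the talk's actual fitting procedure is not specified), and the null distribution is approximated with a very small parametric bootstrap from the fitted normal.

import numpy as np
from scipy.stats import norm, levy_stable

rng = np.random.default_rng(9)

def lrt_stat(x):
    # 2 * (log-likelihood under alpha-stable - log-likelihood under normal).
    mu, sd = norm.fit(x)
    alpha, beta, loc, scale = levy_stable.fit(x)          # slow numerical MLE
    ll_norm = norm.logpdf(x, mu, sd).sum()
    ll_stable = levy_stable.logpdf(x, alpha, beta, loc, scale).sum()
    return 2.0 * (ll_stable - ll_norm), (mu, sd)

x = rng.standard_t(df=3, size=40)                         # synthetic "log(FPKM)" values for one gene
observed, (mu0, sd0) = lrt_stat(x)

# Parametric bootstrap under H0 (normal, i.e. alpha = 2).
n_boot = 20                                               # tiny, for illustration only
null = np.array([lrt_stat(rng.normal(mu0, sd0, size=x.size))[0] for _ in range(n_boot)])
p_value = (1 + np.sum(null >= observed)) / (1 + n_boot)
print(observed, p_value)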

Distribution of p-values

[Figure: histogram of p-values across genes]

54 genes significant at an FDR of 32%

Distribution of $\hat\alpha$ in significant genes

[Figure: histogram of $\hat\alpha$ (roughly 1.4 to 1.9) among the significant genes]

Cherry-picked examples

[Figure: densities of log(FPKM) for four genes: NCU09477, NCU00484, NCU02435, NCU07659]

Conclusions
Introduced a coalescent model of quantitative trait evolution
- Provided exact formulas for the characteristic functions for samples of size 2, 3, and 4
- Found limiting distributions as the number of loci becomes large in each case
- Conjectured about extending these limits to larger samples
Extensions
- Samples not contemporaneous
- Population structure
- Diploids
Why a neutral model?
- Analytically tractable calculations
- Intuition for what to expect with weak selection
- Null model