The Combinatorial Interpretation of Formulas in Coalescent Theory

Similar documents
6 Introduction to Population Genetics

6 Introduction to Population Genetics

Computational Systems Biology: Biology X

A comparison of two popular statistical methods for estimating the time to most recent common ancestor (TMRCA) from a sample of DNA sequences

Gene Genealogies Coalescence Theory. Annabelle Haudry Glasgow, July 2009

How robust are the predictions of the W-F Model?

Rare Alleles and Selection

Lecture 18 : Ewens sampling formula

ON COMPOUND POISSON POPULATION MODELS

Mathematical models in population genetics II

Endowed with an Extra Sense : Mathematics and Evolution

Stochastic Demography, Coalescents, and Effective Population Size

Evolution in a spatial continuum

Lecture 19 : Chinese restaurant process

Genetic Variation in Finite Populations

Selection and Population Genetics

The Moran Process as a Markov Chain on Leaf-labeled Trees

The mathematical challenge. Evolution in a spatial continuum. The mathematical challenge. Other recruits... The mathematical challenge

Dynamics of the evolving Bolthausen-Sznitman coalescent. by Jason Schweinsberg University of California at San Diego.

Frequency Spectra and Inference in Population Genetics

The Wright-Fisher Model and Genetic Drift

Population Genetics I. Bio

classes with respect to an ancestral population some time t

BIRS workshop Sept 6 11, 2009

An introduction to mathematical modeling of the genealogical process of genes

Populations in statistical genetics

Wald Lecture 2 My Work in Genetics with Jason Schweinsbreg

Demography April 10, 2015

Statistical population genetics

CLOSED-FORM ASYMPTOTIC SAMPLING DISTRIBUTIONS UNDER THE COALESCENT WITH RECOMBINATION FOR AN ARBITRARY NUMBER OF LOCI

Processes of Evolution

J. Stat. Mech. (2013) P01006

Theoretical Population Biology

A phase-type distribution approach to coalescent theory

Solutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin

Evolution & Natural Selection

Haploid & diploid recombination and their evolutionary impact

A CHARACTERIZATION OF ANCESTRAL LIMIT PROCESSES ARISING IN HAPLOID. Abstract. conditions other limit processes do appear, where multiple mergers of

The effects of a weak selection pressure in a spatially structured population

Ancestral Inference in Molecular Biology. Simon Tavaré. Saint-Flour, 2001

The two-parameter generalization of Ewens random partition structure

Major questions of evolutionary genetics. Experimental tools of evolutionary genetics. Theoretical population genetics.

Australian bird data set comparison between Arlequin and other programs

I of a gene sampled from a randomly mating popdation,

Challenges when applying stochastic models to reconstruct the demographic history of populations.

Chapter 16: Evolutionary Theory

GENETICS - CLUTCH CH.22 EVOLUTIONARY GENETICS.

A representation for the semigroup of a two-level Fleming Viot process in terms of the Kingman nested coalescent

ANCESTRAL PROCESSES WITH SELECTION: BRANCHING AND MORAN MODELS

Notes for MCTP Week 2, 2014

Estimating effective population size from samples of sequences: inefficiency of pairwise and segregating sites as compared to phylogenetic estimates

Coalescent based demographic inference. Daniel Wegmann University of Fribourg

Crump Mode Jagers processes with neutral Poissonian mutations

URN MODELS: the Ewens Sampling Lemma

The nested Kingman coalescent: speed of coming down from infinity. by Jason Schweinsberg (University of California at San Diego)

Similarity Measures and Clustering In Genetics

Properties of normal phylogenetic networks

Counting All Possible Ancestral Configurations of Sample Sequences in Population Genetics

Computational statistics

The Genealogy of a Sequence Subject to Purifying Selection at Multiple Sites

Closed-form sampling formulas for the coalescent with recombination

π b = a π a P a,b = Q a,b δ + o(δ) = 1 + Q a,a δ + o(δ) = I 4 + Qδ + o(δ),

Mathematical Population Genetics II

From Individual-based Population Models to Lineage-based Models of Phylogenies

RW in dynamic random environment generated by the reversal of discrete time contact process

Microevolution Changing Allele Frequencies

1. The diagram below shows two processes (A and B) involved in sexual reproduction in plants and animals.

LIFE SCIENCE CHAPTER 5 & 6 FLASHCARDS

Computing likelihoods under Λ-coalescents

Mathematical Population Genetics II

On The Mutation Parameter of Ewens Sampling. Formula

Introduction to population genetics & evolution

Pathwise construction of tree-valued Fleming-Viot processes

NATURAL SELECTION FOR WITHIN-GENERATION VARIANCE IN OFFSPRING NUMBER JOHN H. GILLESPIE. Manuscript received September 17, 1973 ABSTRACT

Stirling Numbers of the 1st Kind

The Evolution of Gene Dominance through the. Baldwin Effect

WXML Final Report: Chinese Restaurant Process

It is indeed a pleasure for me to contribute to this dedicatory volume

Incomplete Lineage Sorting: Consistent Phylogeny Estimation From Multiple Loci

Introduction to Digital Evolution Handout Answers

The Λ-Fleming-Viot process and a connection with Wright-Fisher diffusion. Bob Griffiths University of Oxford

Lecture 1. References

Evolution. Species Changing over time

The coalescent process

Modeling Evolution DAVID EPSTEIN CELEBRATION. John Milnor. Warwick University, July 14, Stony Brook University

Concepts and Methods in Molecular Divergence Time Estimation

REVIEW 6: EVOLUTION. 1. Define evolution: Was not the first to think of evolution, but he did figure out how it works (mostly).

REVERSIBLE MARKOV STRUCTURES ON DIVISIBLE SET PAR- TITIONS

Some mathematical models from population genetics

mrna Codon Table Mutant Dinosaur Name: Period:

Regents Biology REVIEW 6: EVOLUTION. 1. Define evolution:

Tree sets. Reinhard Diestel

Coalescent Theory for Seed Bank Models

Name Period. 3. How many rounds of DNA replication and cell division occur during meiosis?

Chapter 5 Evolution of Biodiversity. Sunday, October 1, 17

Follow links Class Use and other Permissions. For more information, send to:

Neutral behavior of shared polymorphism

Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate.

Chapter 13- Reproduction, Meiosis, and Life Cycles. Many plants and other organisms depend on sexual reproduction.

Evolutionary dynamics on graphs

Transcription:

The Combinatorial Interpretation of Formulas in Coalescent Theory John L. Spouge National Center for Biotechnology Information NLM, NIH, DHHS spouge@ncbi.nlm.nih.gov Bldg. A, Rm. N 0 NCBI, NLM, NIH Bethesda MD 09

Motivation Evolution is fundamental to understanding biological networks. The infinite alleles model was arguably the most important model of evolution until about 990. yields many simple solutions, mostly proved by induction. still provides useful approximations (e.g., in LDHat, a program for estimating recombination rates along human chromosomes) If a difficult problem has a solution with a simple form, the solution had to be simple, and something is missing. In 990, a simple solution was discovered, but not pursued. The solution M Kimura & JF Crow (9) Genetics 9: P Joyce & S Tavare (990) Stoch Proc Appl : might indicate techniques useful in newer evolutionary realistic models. provides people of a combinatorial bent with an instant understanding of a significant part of almost thirty years of evolutionary literature.

Overview Population Genetics Models Forward Time Wright-Fisher Model Moran Model Yule Model The Death Process for the Moran Model Backward Time The waiting times are independent of the combinatorics. Invariance theorems: the ancestral tree of a population sample converges to a single limit for many genetics models (the coalescent process). The death process maps to random permutations, so counting permutations provides many results for the infinite alleles model. The mapping to random permutations also gives insight into deep results for the entire population.

Life Reproduction Mutation Selection C Neuhauser & SM Krone (99) Genetics :9 F Crick (9) Life Itself: Its Origin and Nature

Wright-Fisher Model Without Mutation The population is: composed of haploid individuals; of constant size. M (No sex!) Individuals reproduce simultaneously. Every offspring chooses a single parent uniformly at random. RA Fisher (90) The Genetical Theory of Natural Selection S Wright (9) Genetics :9

Wright-Fisher Model Without Mutation M t 0

Wright-Fisher Model Without Mutation Some lineages die out by chance. Some lineages expand by chance. Some lineages do nothing much at all. M t 0

Moran Model Without Mutation The population is: composed of haploid individuals; of constant size. At each step Two individuals are chosen uniformly at random with replacement. One reproduces; the other (possibly the same individual) dies.

Moran Model Without Mutation M 9 t 0

Moran Model With Mutation Infinite Alleles Model The population is: composed of haploid individuals; of constant size. At each step Two individuals are chosen uniformly at random with replacement. One reproduces; the other (possibly the same individual) dies. With probability u, the new individual is a mutant type never seen before. 0

Infinite Alleles & Infinite Sites Models Infinite alleles models gel Every mutation is a type never seen before, but nothing is said about the structure of the mutation. Infinite sites models DNA Every mutation is a type never seen before, occuring in a new place on the DNA.

Moran Model Without Mutation M t 0

Moran Model With Mutation u u M t 0

Moran Model With Mutation Looking Backward a Death Process Let us temporarily ignore changes that do not affect the ancestry of the present population. set of rooted trees ordered by height M nt 0

Moran Model With Mutation Looking Backward a Death Process M Ancestral Tree n 0

Moran Model With Mutation Looking Backward a Death Process M canonical representation of a permutation by its cycle structure n 0

Moran Model With Mutation Looking Backward a Death Process M n 0

Moran Model With Mutation Looking Backward a Death Process M Each permutation on M objects corresponds to exactly one realization of the death process. n 0

Moran Model With Mutation 9 Looking Backward a Death Process M Ancestral Permutation n 0

Moran Model With Mutation Looking Backward M 0 k n k n tj t u u u u j j ( j ) M M j= M M P t The variates Π and T are independent. ( Π, T) = ( π, t) { } t t j+ t k cycles k = t 0 - - 0 t t t t π

P Moran Model With Mutation ( Π, T) = ( π, t) { } Looking Backward k n k n tj t u u u u j j ( j ) M M j= M M The variates Π and T are independent. j+ P{ Π=π} π P k k k u u Mu k θ = = M M u k { Π= π } θ S n k cycles k =

Moran Model With Mutation Looking Backward from a Population M n 0

Moran Model With Mutation Mn Looking Backward from a Sample n Each permutation on n objects corresponds to exactly one realization of the death process. 0

Moran Model With Mutation Looking Backward from a Sample k n k n tj t u u u u j j ( j ) M M j= M M P ( Π, T) = ( π, t) { } M u = θ M + θ Fix P j+ { T= t} Mu θ = u By suitably scaling time, the waiting times between deaths are independent exponential variates. There are invariance theorems showing that when their times are suitably scaled, the Wright-Fisher Model, the Moran Model, and many other genetics models converge to limits, so-called coalescent processes. π JFC Kingman (9) Stoch Proc Appl :

Moran Model With Mutation n M The Death Process The distribution of the death process for the population at any time has the same form as for a sample. 0 WJ Ewens & GA Watterson, p. in NM Bingham & CM Goldie (00) Probability and Mathematical Genetics

Sample Path of the Death Process & the Cycle Structure of Permutations π = ( )( )( )( ) π The Fundamental Theorem P of Infinite Alleles Mutation Models k { Π= π } θ To calculate the probability of any particular ancestral tree, enumerate the corresponding ancestral permutations with θ marking the cycles and then normalize to a probability. In a coalescent limit (infinite population with suitable scaling of time), the waiting times between events are independent exponential variates. S n P Joyce & S Tavare (990) Stoch Proc Appl :

The Ancestral Permutation and the Sampling Permutation π = ( )( )( )( ) In practice, individuals are sampled from the population before they are placed into the ancestral tree. Without loss of generality, let the sampling be sequential, and without replacement. The sampling order of the n individuals requires a second, sampling permutation, to put the individuals into the unique order required for the correspondence between the ancestral graph and the ancestral permutation. P { Σ= σ S } = / n! σ = n The variates Π, Σ, and T are independent. π

A Light Warm-up

Moran Model Without Mutation (θ=0) Looking Backward a Death Process 9 n M n 0

Yule Process Without Mutation (θ=0) Looking Forward What is the distribution of the number of leaves on either side of the first split? 0 n M n 0

Moran Model Without Mutation (θ=0) Looking Backward a Death Process n M n The Each distribution cyclic permutation of the number of n of leaves objects on corresponds either side of to the exactly first one realization split is uniform. of the death process (or the Yule process). 0 J Hein, MH Schierup, and C Wiuf (00) Gene Genealogies, Variation, and Evolution p.

Combinatorial Notations

Combinatorial Notations n θ = θ( θ + )...( θ + n ) = θ θ k= 0 k = π Sn n n k k Rising factorial Stirling number of the first kind Stirling cycle number The number of permutations of n objects with k cycles The rising factorial is the ordinary generating function of permutations, with θ marking the number of cycles. ( )...( n ) n θ θ θ θ = + Falling factorial ξ #(elements in a set ξ) R Graham, D Knuth, and O Patashnik (99) Concrete Mathematics: A Foundation for Computer Science

Something a Little Heavier WJ Ewens (9) Theor Pop Biol : P Donnelly & S Tavare (9) Adv Appl Prob : J Hein et al. (00) Gene Genealogies, Variation and Evolution

Probability of k Types in the Sample With Mutation n θ P { k types} n k = θ k The number of permutations of n objects with k cycles, marked with θ k Stirling number of the first kind Stirling cycle number k cycles k = π P Donnelly & S Tavare (9) Adv Appl Prob :

Ewens Sampling Formula n θ P { a } a an... n = n i= partition by types n! ( a! ) i ai i k θ 0 0 0 0 0 π k n = a i= i n n = ia i= i k cycles k = WJ Ewens (9) Theor Pop Biol :

Most Recent Common Ancestor (MRCA) Without Mutation What is the probably that in a sample of n individuals, a random subset of m coalesces to its MRCA before coalescing with any other individuals? In the (circular) sample permutation, the individuals are consecutive. In the (circular) ancestral permutation, relative to the m individuals in the sample permutation, both end numbers are smaller than the internal numbers. n n m m + m = n = J Hein et al. (00) Gene Genealogies, Variation and Evolution π

Something Heavy

Distribution of Ancestors Without Mutation π k= ancestors 9 0

Distribution of Ancestors 0 ( ) P = { } Without Mutation { } n! n! A ξ,..., ξ k #(realizations producing A k ) ( ) ( ) ξ!...!!!! ξk k k n k k= ancestors k 0 π JFC Kingman (9) Stoch Proc Appl :

Distribution of Ancestors With Mutation { = ({ } ( ))} ( ) ( ) n n! θ P A k ξ,..., ξ k, η,..., η j #(realizations marked by θ producing A k ) ξ!... ξ! k! η!... η! k ( ) ( j) j θ ( n k) ( η( ) +... + η ) ( )... η j ( j) k= ancestors! k θ oldest to youngest 0 π P Donnelly & S Tavare (9) Adv Appl Prob :

Distribution of Sample Types P { A ( ) } 0 = ( ),..., ( ) plus k j younger types j n θ η η #(realizations marked by θ producing A 0 ) s j n n sj k θ n n s n s k j ( )...( ) j oldest to youngest π si = η +... + η ( ) ( i) total length of first i cycles P Donnelly & S Tavare (9) Adv Appl Prob :

Distribution of All Sample Types { A ( )} 0 = ( ),..., ( k ) n θ P η η #(realizations producing A 0, marked by θ) n! n n s n s ( )...( ) k k θ ( η,..., η ) = (,,, ) ( ) ( k ) oldest to youngest π si = η +... + η ( ) ( i) P Donnelly & S Tavare (9) Adv Appl Prob :

Something Really Heavy

Moran Model With Mutation Nested Samples Embed a death process for a known sample of size n in the death process for a super-sample of size M. Use the correspondence with random permutations in S M to derive results for the super-sample, conditioned on the sample. It will be expected that various exact results hold for the Moran model WJ Ewens & GA Watterson in NM Bingham & CM Goldie (00) Probability and Mathematical Genetics, p. Let M to derive results relating the sample to an infinite population. Because of invariance theorems, such results often pertain to several different genetics models.

Distribution of Sample Types Within an Infinite Population P P n! { A ( )} 0 = η( ),..., η ( ) = k n( n s )...( n s ) k oldest to youngest types within sample k θ n θ k n! θ n θ θ θ θ { A ( )} 0 = η( ),..., η ( ) = k ( + n)( + n s )...( + n s ) k oldest to youngest types within sample are in fact the k oldest types in the population k = P Donnelly & S Tavare (9) Adv Appl Prob : si = η +... + η ( ) ( i) Species Deletion Property

Summary Population Genetics Models Forward Time Models without mutation and infinite-alleles infinite-sites models Wright-Fisher Model Moran Model Yule Model The Death Process for the Models Backward Time The waiting times are independent of the combinatorics. Invariance theorems: the ancestral tree of a population sample converges to a single limit for many genetics models (the coalescent process). The death process maps to random permutations, so counting permutations provides many results for genetic samples. The mapping to random permutations also gives insight into results for the entire population Species Deletion Property

Laszlo Szekely 9 Eva Czabarka