The Combinatorial Interpretation of Formulas in Coalescent Theory John L. Spouge National Center for Biotechnology Information NLM, NIH, DHHS spouge@ncbi.nlm.nih.gov Bldg. A, Rm. N 0 NCBI, NLM, NIH Bethesda MD 09
Motivation Evolution is fundamental to understanding biological networks. The infinite alleles model was arguably the most important model of evolution until about 990. yields many simple solutions, mostly proved by induction. still provides useful approximations (e.g., in LDHat, a program for estimating recombination rates along human chromosomes) If a difficult problem has a solution with a simple form, the solution had to be simple, and something is missing. In 990, a simple solution was discovered, but not pursued. The solution M Kimura & JF Crow (9) Genetics 9: P Joyce & S Tavare (990) Stoch Proc Appl : might indicate techniques useful in newer evolutionary realistic models. provides people of a combinatorial bent with an instant understanding of a significant part of almost thirty years of evolutionary literature.
Overview Population Genetics Models Forward Time Wright-Fisher Model Moran Model Yule Model The Death Process for the Moran Model Backward Time The waiting times are independent of the combinatorics. Invariance theorems: the ancestral tree of a population sample converges to a single limit for many genetics models (the coalescent process). The death process maps to random permutations, so counting permutations provides many results for the infinite alleles model. The mapping to random permutations also gives insight into deep results for the entire population.
Life Reproduction Mutation Selection C Neuhauser & SM Krone (99) Genetics :9 F Crick (9) Life Itself: Its Origin and Nature
Wright-Fisher Model Without Mutation The population is: composed of haploid individuals; of constant size. M (No sex!) Individuals reproduce simultaneously. Every offspring chooses a single parent uniformly at random. RA Fisher (90) The Genetical Theory of Natural Selection S Wright (9) Genetics :9
Wright-Fisher Model Without Mutation M t 0
Wright-Fisher Model Without Mutation Some lineages die out by chance. Some lineages expand by chance. Some lineages do nothing much at all. M t 0
Moran Model Without Mutation The population is: composed of haploid individuals; of constant size. At each step Two individuals are chosen uniformly at random with replacement. One reproduces; the other (possibly the same individual) dies.
Moran Model Without Mutation M 9 t 0
Moran Model With Mutation Infinite Alleles Model The population is: composed of haploid individuals; of constant size. At each step Two individuals are chosen uniformly at random with replacement. One reproduces; the other (possibly the same individual) dies. With probability u, the new individual is a mutant type never seen before. 0
Infinite Alleles & Infinite Sites Models Infinite alleles models gel Every mutation is a type never seen before, but nothing is said about the structure of the mutation. Infinite sites models DNA Every mutation is a type never seen before, occuring in a new place on the DNA.
Moran Model Without Mutation M t 0
Moran Model With Mutation u u M t 0
Moran Model With Mutation Looking Backward a Death Process Let us temporarily ignore changes that do not affect the ancestry of the present population. set of rooted trees ordered by height M nt 0
Moran Model With Mutation Looking Backward a Death Process M Ancestral Tree n 0
Moran Model With Mutation Looking Backward a Death Process M canonical representation of a permutation by its cycle structure n 0
Moran Model With Mutation Looking Backward a Death Process M n 0
Moran Model With Mutation Looking Backward a Death Process M Each permutation on M objects corresponds to exactly one realization of the death process. n 0
Moran Model With Mutation 9 Looking Backward a Death Process M Ancestral Permutation n 0
Moran Model With Mutation Looking Backward M 0 k n k n tj t u u u u j j ( j ) M M j= M M P t The variates Π and T are independent. ( Π, T) = ( π, t) { } t t j+ t k cycles k = t 0 - - 0 t t t t π
P Moran Model With Mutation ( Π, T) = ( π, t) { } Looking Backward k n k n tj t u u u u j j ( j ) M M j= M M The variates Π and T are independent. j+ P{ Π=π} π P k k k u u Mu k θ = = M M u k { Π= π } θ S n k cycles k =
Moran Model With Mutation Looking Backward from a Population M n 0
Moran Model With Mutation Mn Looking Backward from a Sample n Each permutation on n objects corresponds to exactly one realization of the death process. 0
Moran Model With Mutation Looking Backward from a Sample k n k n tj t u u u u j j ( j ) M M j= M M P ( Π, T) = ( π, t) { } M u = θ M + θ Fix P j+ { T= t} Mu θ = u By suitably scaling time, the waiting times between deaths are independent exponential variates. There are invariance theorems showing that when their times are suitably scaled, the Wright-Fisher Model, the Moran Model, and many other genetics models converge to limits, so-called coalescent processes. π JFC Kingman (9) Stoch Proc Appl :
Moran Model With Mutation n M The Death Process The distribution of the death process for the population at any time has the same form as for a sample. 0 WJ Ewens & GA Watterson, p. in NM Bingham & CM Goldie (00) Probability and Mathematical Genetics
Sample Path of the Death Process & the Cycle Structure of Permutations π = ( )( )( )( ) π The Fundamental Theorem P of Infinite Alleles Mutation Models k { Π= π } θ To calculate the probability of any particular ancestral tree, enumerate the corresponding ancestral permutations with θ marking the cycles and then normalize to a probability. In a coalescent limit (infinite population with suitable scaling of time), the waiting times between events are independent exponential variates. S n P Joyce & S Tavare (990) Stoch Proc Appl :
The Ancestral Permutation and the Sampling Permutation π = ( )( )( )( ) In practice, individuals are sampled from the population before they are placed into the ancestral tree. Without loss of generality, let the sampling be sequential, and without replacement. The sampling order of the n individuals requires a second, sampling permutation, to put the individuals into the unique order required for the correspondence between the ancestral graph and the ancestral permutation. P { Σ= σ S } = / n! σ = n The variates Π, Σ, and T are independent. π
A Light Warm-up
Moran Model Without Mutation (θ=0) Looking Backward a Death Process 9 n M n 0
Yule Process Without Mutation (θ=0) Looking Forward What is the distribution of the number of leaves on either side of the first split? 0 n M n 0
Moran Model Without Mutation (θ=0) Looking Backward a Death Process n M n The Each distribution cyclic permutation of the number of n of leaves objects on corresponds either side of to the exactly first one realization split is uniform. of the death process (or the Yule process). 0 J Hein, MH Schierup, and C Wiuf (00) Gene Genealogies, Variation, and Evolution p.
Combinatorial Notations
Combinatorial Notations n θ = θ( θ + )...( θ + n ) = θ θ k= 0 k = π Sn n n k k Rising factorial Stirling number of the first kind Stirling cycle number The number of permutations of n objects with k cycles The rising factorial is the ordinary generating function of permutations, with θ marking the number of cycles. ( )...( n ) n θ θ θ θ = + Falling factorial ξ #(elements in a set ξ) R Graham, D Knuth, and O Patashnik (99) Concrete Mathematics: A Foundation for Computer Science
Something a Little Heavier WJ Ewens (9) Theor Pop Biol : P Donnelly & S Tavare (9) Adv Appl Prob : J Hein et al. (00) Gene Genealogies, Variation and Evolution
Probability of k Types in the Sample With Mutation n θ P { k types} n k = θ k The number of permutations of n objects with k cycles, marked with θ k Stirling number of the first kind Stirling cycle number k cycles k = π P Donnelly & S Tavare (9) Adv Appl Prob :
Ewens Sampling Formula n θ P { a } a an... n = n i= partition by types n! ( a! ) i ai i k θ 0 0 0 0 0 π k n = a i= i n n = ia i= i k cycles k = WJ Ewens (9) Theor Pop Biol :
Most Recent Common Ancestor (MRCA) Without Mutation What is the probably that in a sample of n individuals, a random subset of m coalesces to its MRCA before coalescing with any other individuals? In the (circular) sample permutation, the individuals are consecutive. In the (circular) ancestral permutation, relative to the m individuals in the sample permutation, both end numbers are smaller than the internal numbers. n n m m + m = n = J Hein et al. (00) Gene Genealogies, Variation and Evolution π
Something Heavy
Distribution of Ancestors Without Mutation π k= ancestors 9 0
Distribution of Ancestors 0 ( ) P = { } Without Mutation { } n! n! A ξ,..., ξ k #(realizations producing A k ) ( ) ( ) ξ!...!!!! ξk k k n k k= ancestors k 0 π JFC Kingman (9) Stoch Proc Appl :
Distribution of Ancestors With Mutation { = ({ } ( ))} ( ) ( ) n n! θ P A k ξ,..., ξ k, η,..., η j #(realizations marked by θ producing A k ) ξ!... ξ! k! η!... η! k ( ) ( j) j θ ( n k) ( η( ) +... + η ) ( )... η j ( j) k= ancestors! k θ oldest to youngest 0 π P Donnelly & S Tavare (9) Adv Appl Prob :
Distribution of Sample Types P { A ( ) } 0 = ( ),..., ( ) plus k j younger types j n θ η η #(realizations marked by θ producing A 0 ) s j n n sj k θ n n s n s k j ( )...( ) j oldest to youngest π si = η +... + η ( ) ( i) total length of first i cycles P Donnelly & S Tavare (9) Adv Appl Prob :
Distribution of All Sample Types { A ( )} 0 = ( ),..., ( k ) n θ P η η #(realizations producing A 0, marked by θ) n! n n s n s ( )...( ) k k θ ( η,..., η ) = (,,, ) ( ) ( k ) oldest to youngest π si = η +... + η ( ) ( i) P Donnelly & S Tavare (9) Adv Appl Prob :
Something Really Heavy
Moran Model With Mutation Nested Samples Embed a death process for a known sample of size n in the death process for a super-sample of size M. Use the correspondence with random permutations in S M to derive results for the super-sample, conditioned on the sample. It will be expected that various exact results hold for the Moran model WJ Ewens & GA Watterson in NM Bingham & CM Goldie (00) Probability and Mathematical Genetics, p. Let M to derive results relating the sample to an infinite population. Because of invariance theorems, such results often pertain to several different genetics models.
Distribution of Sample Types Within an Infinite Population P P n! { A ( )} 0 = η( ),..., η ( ) = k n( n s )...( n s ) k oldest to youngest types within sample k θ n θ k n! θ n θ θ θ θ { A ( )} 0 = η( ),..., η ( ) = k ( + n)( + n s )...( + n s ) k oldest to youngest types within sample are in fact the k oldest types in the population k = P Donnelly & S Tavare (9) Adv Appl Prob : si = η +... + η ( ) ( i) Species Deletion Property
Summary Population Genetics Models Forward Time Models without mutation and infinite-alleles infinite-sites models Wright-Fisher Model Moran Model Yule Model The Death Process for the Models Backward Time The waiting times are independent of the combinatorics. Invariance theorems: the ancestral tree of a population sample converges to a single limit for many genetics models (the coalescent process). The death process maps to random permutations, so counting permutations provides many results for genetic samples. The mapping to random permutations also gives insight into results for the entire population Species Deletion Property
Laszlo Szekely 9 Eva Czabarka