Time-dependent Molecular Rates: A Very Rough Guide
Background / Introduction
- Time dependency of morphological evolution: inverse correlation with the time interval over which the rate was measured => rate of change between successive generations exceeded macroevolutionary rates by several orders of magnitude (Kurtén 1959)
  o Confirmed in a number of subsequent studies
- The molecular clock: proteins experience amino acid replacements at a surprisingly consistent rate across very different species => presumed single, uniform rate of genetic evolution (Zuckerkandl & Pauling 1962)
  o Versatile tool for dating genetic events
- The updated molecular clock: rates can vary across various dimensions (Lee & Ho 2016)
  o Site effects: different parts of the genome
  o Lineage effects: taxa
  o And, seemingly, across time: inverse correlation between measured rate and timescale of measurement => time-dependent molecular rate (TDMR)
[Figure: four schematic panels (A) Site effect, (B) Lineage effect, (C) Epoch effect, (D) Site & lineage effects; time axis in My ago (0 to 200); clade X highlighted; key: gene 1, gene 2, fast evolutionary rate, slow evolutionary rate.]
Figure 2 (Current Biology). Modern molecular clocks can accommodate complex variation in rates of genetic change across the tree of life. (A) Rate variation across sites: gene 1 evolves rapidly but gene 2 evolves slowly, across all lineages. (B) Rate variation across lineages: genes 1 and 2 both evolve rapidly in clade X. (C) Rate variation across time periods or epochs: genes 1 and 2 both evolved rapidly between 140 and 80 …
What (is TDMR)
- Disparity between spontaneous mutation rates measured over short timescales and substitution rates measured over geological timescales
- In other words: mutation rate (µ) vs. substitution rate (σ)
- If all mutations are strictly neutral => σ = µ (Kimura 1968)
- => Expect µ and σ to differ when many mutations aren't neutral
- Rates on different timescales reflect different biological processes:
  o Very short (e.g. between successive generations): can include all but the most detrimental mutations => approaches µ (discounting lethal mutations)
  o Very long (e.g. between distantly related species): usually dominated by substitutions (fixed mutations) => approximates σ, which is < µ due to purifying (negative) selection
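Why σ = µ under strict neutrality, and σ < µ once mutations are deleterious, can be illustrated with Kimura's diffusion approximation for the fixation probability of a new mutation: 2Nµ new mutations arise per generation and each fixes with probability u(s), so σ = 2Nµ·u(s), which collapses to µ when s = 0. This is a sketch under stated assumptions: a diploid population whose effective size equals its census size N, a single selection coefficient per mutation, and hypothetical function names.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Kimura's approximation for the probability that a new mutation (initial
    frequency 1/(2N)) with selection coefficient s fixes in a diploid population
    of effective size N (assumed here to equal census size)."""
    if s == 0:
        return 1.0 / (2 * n)  # neutral case: each gene copy is equally likely to fix
    if s > 0:
        # u = (1 - e^{-2s}) / (1 - e^{-4Ns}), written with expm1 for precision
        return math.expm1(-2 * s) / math.expm1(-4 * n * s)
    # Deleterious case: rearranged so that a large 4N|s| underflows instead of overflowing
    a = -s
    return math.expm1(2 * a) * math.exp(-4 * n * a) / -math.expm1(-4 * n * a)


def substitution_rate(mu: float, s: float, n: int) -> float:
    """Expected substitution rate: 2N*mu new mutations per generation, each
    fixing with probability u(s)."""
    return 2 * n * mu * fixation_probability(s, n)


if __name__ == "__main__":
    mu, n = 1e-8, 10_000  # hypothetical mutation rate and population size
    for s in (0.0, -1e-5, -1e-4, -1e-3):
        print(f"s = {s:>8}: sigma/mu = {substitution_rate(mu, s, n) / mu:.3g}")
```

Mildly deleterious mutations (4N|s| below about 1) still fix at a substantial fraction of the neutral rate, while more strongly deleterious ones almost never do.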
[Figure: genealogy at a single locus, present at top and past at bottom, with four present-day individuals A, B, C and D; t1 = coalescent history of extant individuals; t2 = period between the intraspecific genealogies and the speciation event; t3 = coalescent history of the ancestral population prior to speciation.]
Fig. 3. A simplified representation of genealogical history at a single locus in a pair of species. Each circle represents a randomly mating individual and each row represents one generation. The ancestral species (spanning the time period t3) splits into two descendant species (spanning the periods t2 and t1). Four contemporary individuals labelled A, B, C and D are referred to in the text.
- A and B are closely related individuals: observed differences dominated by mutations
- A and D are from different species: differences likely dominated by substitutions (fixed during t2, sometimes t3), though some may be polymorphisms (generally arising in t1)
- Proportions of t1, t2, t3 => relative influence of µ and σ on the rate estimate from the A-D comparison, e.g.
  o t2 long relative to t1, t3 => estimate approximates σ
- Another factor: effective population size
  o E.g. if large and most mutations are deleterious: estimate from A-C < estimate from A-B
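A deliberately crude way to see how the proportions of t1, t2 and t3 shape a between-species estimate is to split a lineage's history into a recent "polymorphic" phase that accumulates differences at µ (roughly t1) and an older phase that keeps only fixed differences, accumulating at σ (roughly t2, plus any of t3). The two-phase split, the example rates and the function name are assumptions for illustration, not a model taken from the cited work.

```python
def apparent_rate(mu: float, sigma: float, t_poly: float, t_fixed: float) -> float:
    """Apparent per-lineage rate for a pairwise comparison in a toy two-phase model.

    t_poly:  time over which differences are still mostly polymorphisms (~t1),
             accumulating at the mutation rate mu
    t_fixed: time over which only fixed differences persist (~t2, plus any of t3),
             accumulating at the substitution rate sigma
    """
    total = t_poly + t_fixed
    if total <= 0:
        raise ValueError("total divergence time must be positive")
    # Time-weighted average of the two rates
    return (mu * t_poly + sigma * t_fixed) / total


if __name__ == "__main__":
    mu, sigma = 1e-7, 1e-8  # hypothetical rates (per site per year)
    # Same recent polymorphic phase, increasingly long fixed phase (t2 >> t1)
    for t_fixed in (0, 1e4, 1e5, 1e6, 1e7):
        print(t_fixed, apparent_rate(mu, sigma, t_poly=1e4, t_fixed=t_fixed))
```

As the fixed phase comes to dominate, the estimate converges on σ, matching the "t2 long relative to t1, t3" case above.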
How
The degree of divergence between two sequences is determined by:
a. The rate of molecular change
b. The TMRCA (time to most recent common ancestor)
=> To estimate the rate (a), independent information about the evolutionary timescale (b) is needed.
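The dependence of (a) on (b) is usually written d = 2rT: d is the pairwise divergence, r the per-lineage rate and T the TMRCA, with the factor 2 because differences accumulate independently along both lineages. A minimal sketch, assuming d is already corrected for multiple hits and r is constant on both lineages; the function name is hypothetical.

```python
def rate_from_divergence(d: float, tmrca: float) -> float:
    """Per-lineage rate r from pairwise divergence d (substitutions/site) and the
    TMRCA (years), assuming d = 2 * r * TMRCA."""
    if tmrca <= 0:
        raise ValueError("TMRCA must be positive")
    return d / (2.0 * tmrca)


# Hypothetical example: 2% divergence, common ancestor 1 million years ago
print(rate_from_divergence(0.02, 1e6))  # substitutions/site/year
```

Any error in T passes straight into r, which is why the calibration sources below matter.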
Typical calibration techniques for the shortest timescales:
- Laboratory mutation-accumulation lines (Keightley et al. 2009)
- Studies of pedigrees (Haag-Liautard et al. 2008)
- Heterochronous sampling: sampled sequences of different ages (Duchêne et al. 2015)
  o Also used for longer timescales, e.g. with aDNA (ancient DNA)
  o Calibration comes from the sampling dates themselves: the temporal structure of the data eliminates the need to calibrate by fixing an internal node at a point in time
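One common exploratory way to use heterochronous samples is a root-to-tip regression: in a rooted tree, the genetic distance from the root to each tip should grow linearly with sampling date, so the slope estimates the rate and the x-intercept the root date. The permutation function is a simplified, regression-based version of the date-randomisation idea; Duchêne et al. (2015) assess that test within full phylogenetic analyses, not this shortcut. Inputs and function names are hypothetical.

```python
import random


def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance on sampling date.

    Returns (rate, root_date): the slope estimates the substitution rate
    (substitutions/site/year) and the x-intercept estimates the date of the root.
    """
    n = len(dates)
    if n < 2 or n != len(distances):
        raise ValueError("need at least two paired observations")
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    if sxx == 0:
        raise ValueError("all samples share one date: no temporal structure")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    root_date = -intercept / slope if slope != 0 else float("nan")
    return slope, root_date


def permuted_slopes(dates, distances, n_reps=1000, seed=1):
    """Simplified date-randomisation check: refit after shuffling the sampling dates.

    If the observed slope is not clearly larger than most permuted slopes, the
    data may lack enough temporal signal to calibrate the rate.
    """
    rng = random.Random(seed)
    shuffled = list(dates)
    slopes = []
    for _ in range(n_reps):
        rng.shuffle(shuffled)
        slopes.append(root_to_tip_regression(shuffled, distances)[0])
    return slopes


if __name__ == "__main__":
    dates = [1990, 1995, 2000, 2005, 2010]       # hypothetical sampling years
    dists = [0.010, 0.015, 0.020, 0.025, 0.030]  # hypothetical root-to-tip distances
    rate, root = root_to_tip_regression(dates, dists)
    null = permuted_slopes(dates, dists)
    share = sum(s >= rate for s in null) / len(null)
    print(f"rate = {rate:.2e}/site/year, root = {root:.0f}, permuted share >= observed = {share:.3f}")
```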
Calibration techniques for longer timescales:
- Dated geological events (Burridge et al. 2008)
- Archaeological and anthropological evidence (Henn et al. 2009)
- Fossil record: usually used for calibrations at least several million years in age
  o Evidence for the earliest appearance of members of separate lineages => minimum age constraint for the divergence event
  o Usually insufficient morphological variation on shorter timescales for reliable diagnostic use
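Because a fossil supplies only a minimum age for a divergence, a rate computed from it with d = 2rT is best read as an upper bound: any older true divergence implies a slower rate. A minimal sketch, assuming a multiple-hit-corrected pairwise distance d; the optional maximum age (from some other line of evidence) is an assumption not covered in these notes, and the function name is hypothetical.

```python
from typing import Optional, Tuple


def rate_bounds_from_fossil(d: float, min_age: float,
                            max_age: Optional[float] = None) -> Tuple[float, float]:
    """Bounds on the per-lineage rate r implied by d = 2 * r * T when the
    divergence time T is only known to lie in [min_age, max_age].

    A minimum age (e.g. oldest fossil of either lineage) caps the rate from above;
    without a maximum age the rate has no lower bound other than 0.
    """
    if min_age <= 0:
        raise ValueError("minimum age must be positive")
    upper = d / (2.0 * min_age)
    if max_age is None:
        return 0.0, upper
    if max_age < min_age:
        raise ValueError("maximum age must not be less than the minimum age")
    return d / (2.0 * max_age), upper


# Hypothetical example: 10% divergence, oldest fossil 5 My, no maximum age known
print(rate_bounds_from_fossil(0.1, 5e6))
```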
[Figure: two panels, "Animals and plants" and "Bacteria and viruses"; x-axis: years before present, 10^0 to 10^7 (log scale); calibration types shown: lab line, pedigree, serial sampling, ancient DNA, archaeology, geology/biogeography, fossil record, population co-divergence, species co-divergence.]
Fig. 2. Typical age ranges of different forms of calibrating information.
Where
- mtDNA in:
  o Humans: pedigree rate of mtDNA sequence divergence higher than phylogenetic rate (meta-analysis by Howell et al. 2003)
  o Fish (Burridge et al. 2008)
  o Birds (Subramanian et al. 2009)
  o Insects (Papadopoulou et al. 2010)
- Bacteria (Comas et al. 2013)
- Viruses (Aiewsakun & Katzourakis 2015)
It's Complicated
- The role of purifying selection in removing some mutations, and the existence of some degree of TDMR, are uncontroversial
- But: the magnitude of TDMR is debated
  o Support for order-of-magnitude differences
  o But that is disputed (e.g. on methodological grounds; Emerson & Hickerson 2015)
- Two illustrations of order-of-magnitude differences and one of a sub-order-of-magnitude difference:
[Figure: four panels (a) to (d) plotting evolutionary rate (substitutions/site/year) against calibration age (years), both on log scales.]
Fig. 1. Time-dependent patterns in rate estimates have been observed in a variety of taxonomic groups, including: (a) noncoding mitochondrial DNA from amniotes (data from Fig. 1b in Molak & Ho 2015), (b) mitochondrial DNA from insects (data from Fig. 2 in Ho & Lo 2013), (c) genomic DNA from bacteria (data from Fig. 4 in Comas et al. 2013) and (d) genomic RNA and DNA from viruses (data from Fig. 1a in Duchêne et al. 2014). Trend lines are based on those estimated in the original analyses of these data sets. In panel d, separate trend lines are given for RNA viruses (solid) and DNA viruses (dashed).
FIG. 4. Difference between mtDNA mutation rate estimates and rates expected under the K2P + Γ model. Phylogeny-based mutation rate estimates are drawn from tables 1 and 2. Solid black line: effective mutation rate under our general model of K2P + Γ assuming the published pedigree-based mutation rate (0.95/bp/My) taken from Howell et al. (2003), ts/tv = 20/1, α = 0.4. We explored the sensitivity of the model to its parameters; examples for each parameter are displayed. Solid gray lines: effective mutation rate given alternative instantaneous pedigree mutation rates of 1.20 and 0.75/bp/My. Dotted line: effective mutation rate given ts/tv = 100/1. Dashed lines: effective mutation rate given alternative α = 0.8 and 0.2.
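Time-dependent patterns such as those in Fig. 1 can be summarised as a straight line on log-log axes, i.e. a power law rate ∝ age^b, where b < 0 indicates time dependency. The original analyses may have used other curve forms; this sketch simply fits log10(rate) on log10(age) by least squares and checks itself on synthetic points generated from a known power law (not real data). Function names are hypothetical.

```python
import math


def fit_log_log(ages, rates):
    """Least-squares fit of log10(rate) = a + b * log10(age); returns (a, b).

    b < 0 means rate estimates fall as calibration age rises; b = -1 would mean
    a tenfold older calibration gives a tenfold lower rate.
    """
    if len(ages) != len(rates) or len(ages) < 2:
        raise ValueError("need at least two paired observations")
    xs = [math.log10(t) for t in ages]
    ys = [math.log10(r) for r in rates]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("calibration ages must differ")
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def orders_of_magnitude(rate_short, rate_long):
    """Number of orders of magnitude separating two rate estimates."""
    return math.log10(rate_short / rate_long)


if __name__ == "__main__":
    # Synthetic points from a known power law (exponent -0.3), not real data
    ages = [1e1, 1e3, 1e5, 1e7]
    rates = [1e-6 * (t / 10) ** -0.3 for t in ages]
    a, b = fit_log_log(ages, rates)
    print(f"intercept = {a:.2f}, slope = {b:.2f}")
    print(f"span: {orders_of_magnitude(rates[0], rates[-1]):.1f} orders of magnitude")
```

The `orders_of_magnitude` helper puts the "order-of-magnitude vs sub-order-of-magnitude" distinction on a single number.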
Why
Roughly classifiable into biological and methodological factors
- Note: only one purely biological factor, i.e. purifying selection
- Most methodological factors involve inadequate modelling of biological processes
- Relative importance of factors likely varies across cases
Natural selection: purifying (negative) selection
- A substantial proportion of slightly deleterious mutations is lost continuously from the mtDNA gene pool over a prolonged period (Kivisild et al. 2006)
- Time-dependent decline in the ratio of non-synonymous to synonymous changes in coding sequences (Peterson & Masel 2009)
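The qualitative effect of purifying selection can be reproduced with a toy model (an assumption for illustration, not the model of Kivisild et al. or Peterson & Masel): a fraction f of new mutations is neutral and accumulates at µ, while each remaining, slightly deleterious mutation persists for an exponentially distributed time with mean 1/λ before being purged. Counting mutations from the last T years that are still present gives an apparent rate µ[f + (1 - f)(1 - e^(-λT))/(λT)], which tends to µ for short windows and to fµ (playing the role of σ here) for long ones. Parameter names and values are hypothetical.

```python
import math


def rate_under_purifying_selection(t: float, mu: float, f_neutral: float,
                                   purge_rate: float) -> float:
    """Apparent per-lineage rate measured over a window of t years (toy model).

    mu:         mutation rate (per site per year)
    f_neutral:  fraction of new mutations that are neutral (accumulate at mu)
    purge_rate: lambda; deleterious mutations persist for an exponential time
                with mean 1/lambda before removal
    """
    if t <= 0:
        raise ValueError("t must be positive")
    x = purge_rate * t
    # Share of deleterious mutations from the window still present: (1 - e^{-x}) / x
    survival = -math.expm1(-x) / x if x > 0 else 1.0
    return mu * (f_neutral + (1.0 - f_neutral) * survival)


if __name__ == "__main__":
    mu, f, lam = 1e-7, 0.1, 1e-4  # hypothetical: 10% neutral, mean persistence 10 kyr
    for t in (1e1, 1e3, 1e5, 1e7):
        print(f"window = {t:.0e} yr: rate = {rate_under_purifying_selection(t, mu, f, lam):.2e}")
```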
Methodological factors
Calibration error
- For calibrations based on the presumed timing of a population- or species-divergence event, genetic divergence is typically assumed to coincide with population divergence
- Genetic divergence often precedes population/species divergence => underestimate of the time since genetic divergence => rate overestimation
- Can result in rate underestimation when genetic divergence postdates the calibration event
- Magnitude of bias greatest on short timescales, where the difference between genetic and population divergence times is an appreciable proportion of the total time separating the two populations or species
[Figure: schematic of divergence times T and T split for Lineage 1 / Lineage 2 and for Species A / Species B.]
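The direction of this bias, and why it is worst on short timescales, follows from d = 2rT. If genetic divergence predates the population split by Δ years, then d = 2r(T_split + Δ), and calibrating with T_split alone returns r(1 + Δ/T_split): the relative error shrinks as T_split grows, and a negative Δ gives an underestimate. A minimal sketch; Δ, the rates and the function name are hypothetical.

```python
def calibrated_rate(true_rate: float, t_split: float, delta: float) -> float:
    """Rate recovered when divergence d = 2 * true_rate * (t_split + delta)
    is calibrated against the population/species split time t_split alone.

    delta > 0: genetic divergence predates the split (rate overestimated)
    delta < 0: genetic divergence postdates the split (rate underestimated)
    """
    if t_split <= 0:
        raise ValueError("split time must be positive")
    if t_split + delta < 0:
        raise ValueError("genetic divergence time cannot be negative")
    d = 2.0 * true_rate * (t_split + delta)
    return d / (2.0 * t_split)


if __name__ == "__main__":
    r, delta = 1e-8, 50_000  # hypothetical: gene lineages split 50 kyr before the populations
    for t_split in (1e4, 1e5, 1e6, 1e7):
        ratio = calibrated_rate(r, t_split, delta) / r
        print(f"T_split = {t_split:.0e} yr: estimate / true = {ratio:.3f}")
```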
Mutational saturation
- E.g. when one nucleotide site undergoes multiple substitutions from its initial state, and only the first and last states are observed
- Unobserved substitutions bias rate estimates downward
- Likely to be less of a problem over very short time frames, but important over longer time frames
- Under-correction for saturation can contribute to time-dependent rates
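Saturation can be made concrete with the Jukes-Cantor (JC69) model, used here because it is the simplest substitution model; richer models such as K2P add further parameters. Under JC69, a true distance D gives an expected proportion of differing sites p = 3/4 (1 - e^(-4D/3)), and the correction inverts this as D = -3/4 ln(1 - 4p/3). The sketch compares an uncorrected rate, p/(2T), with the corrected one; the true rate is hypothetical.

```python
import math


def jc69_expected_p(distance: float) -> float:
    """Expected proportion of differing sites after `distance` substitutions/site (JC69)."""
    return -0.75 * math.expm1(-4.0 * distance / 3.0)


def jc69_distance(p: float) -> float:
    """JC69-corrected distance from an observed proportion of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log1p(-4.0 * p / 3.0)


if __name__ == "__main__":
    r = 1e-8  # hypothetical true per-lineage rate (substitutions/site/year)
    for t in (1e5, 1e6, 1e7, 1e8):
        p = jc69_expected_p(2 * r * t)  # observed differences after 2*r*t substitutions/site
        naive = p / (2 * t)             # ignores multiple hits
        corrected = jc69_distance(p) / (2 * t)
        print(f"T = {t:.0e} yr: naive = {naive:.2e}, corrected = {corrected:.2e}")
```

The naive estimate falls with T even though the true rate is fixed; the correction removes the decline only if the model matches the data.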
Rate heterogeneity across sites (RHAS)
- Genomic sites can evolve at significantly different rates
  o E.g. mtDNA exhibits up to a 1,000-fold difference between the fastest- and slowest-evolving sites
- Unaccounted-for RHAS decreases rate estimates over time through increasing saturation (more unobserved substitutions) at fast-evolving sites
- Time-dependent underestimate more pronounced both when the proportion of mutational hotspots increases and when the ratio between high and low rates increases in the data (Soubrier et al. 2012)
FIG. 3. Mathematical exploration of the relationship between divergence time, actual substitution rate, and inferred rate from the mathematical model. (A) Rates for various proportions of fast sites, when the ratio of fast to slow rates is 1,000. (B) Rates for various ratios of fast to slow rates, when the proportion of fast sites is fixed at 10%. * represents the parameter values chosen in subsequent simulations.
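The pattern in FIG. 3 can be reproduced qualitatively with a two-class site model: a proportion of fast sites evolving at a fixed multiple of the slow rate. Expected differences are computed under JC69 for each class, pooled, and then deliberately analysed with a JC69 correction that assumes one rate for all sites. The two-class setup, defaults (10% fast sites, 1,000-fold ratio, echoing the caption) and function names are assumptions for this sketch; Soubrier et al. (2012) did not necessarily use this exact model.

```python
import math


def jc_p(distance):
    """Expected proportion of differing sites under JC69 for a given distance."""
    return -0.75 * math.expm1(-4.0 * distance / 3.0)


def inferred_rate(t, mean_rate, prop_fast=0.10, ratio=1000.0):
    """Rate inferred by a single-rate JC69 analysis of two-class site data.

    t:         time since divergence (years); both lineages evolve for t years
    mean_rate: true per-lineage rate averaged over all sites (substitutions/site/year)
    prop_fast: proportion of fast-evolving sites
    ratio:     fast rate divided by slow rate
    """
    # Slow rate chosen so that the site-averaged rate equals mean_rate
    slow = mean_rate / (prop_fast * ratio + (1.0 - prop_fast))
    fast = slow * ratio
    # Expected proportion of differing sites, pooled over the two site classes
    p = prop_fast * jc_p(2 * fast * t) + (1.0 - prop_fast) * jc_p(2 * slow * t)
    if p >= 0.75:
        raise ValueError("sequences are fully saturated")
    # JC69 correction that (wrongly) assumes one rate for all sites
    d = -0.75 * math.log1p(-4.0 * p / 3.0)
    return d / (2 * t)


if __name__ == "__main__":
    true_rate = 1e-8  # hypothetical site-averaged rate
    for t in (1e3, 1e5, 1e6, 1e7, 1e8):
        print(f"t = {t:.0e} yr: inferred / true = {inferred_rate(t, true_rate) / true_rate:.3f}")
```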
Demographic factors
- E.g. population expansions
  o For humans, purifying selection was further weakened by population expansions associated with the out-of-Africa migration and the end of the Last Ice Age (Henn et al. 2009)
  o => Higher rate estimates for timescales since then compared with before
References
Burridge CP, Craw D, Fletcher D, Waters JM (2008). Geological dates and molecular rates: fish DNA sheds light on time dependency. Molecular Biology and Evolution 25:624-633.
Comas I, Coscolla M, Luo T et al. (2013). Out-of-Africa migration and Neolithic coexpansion of Mycobacterium tuberculosis with modern humans. Nature Genetics 45:1176-1182.
Duchêne S, Duchêne D, Holmes EC, Ho SYW (2015). The performance of the date-randomisation test in phylogenetic analyses of time-structured virus data. Molecular Biology and Evolution 32:1895-1906.
Haag-Liautard C, Coffey N, Houle D et al. (2008). Direct estimation of the mitochondrial DNA mutation rate in Drosophila melanogaster. PLoS Biology 6:e204.
Henn BM, Gignoux CR, Feldman MW, Mountain JL (2009). Characterizing the time dependency of human mitochondrial DNA mutation rate estimates. Molecular Biology and Evolution 26:217-230.
Howell N, Smejkal CB, Mackey DA, Chinnery PF, Turnbull DM, Herrnstadt C (2003). The pedigree rate of sequence divergence in the human mitochondrial genome: there is a difference between phylogenetic and pedigree rates. American Journal of Human Genetics 72:659-670.
Keightley PD, Trivedi U, Thomson M et al. (2009). Analysis of the genome sequences of three Drosophila melanogaster spontaneous mutation accumulation lines. Genome Research 19:1195-1201.
Kimura M (1968). Evolutionary rate at the molecular level. Nature 217:624-626.
Kivisild T, Shen P, Wall DP, Do B, Sung R, Davis K et al. (2006). The role of selection in the evolution of human mitochondrial genomes. Genetics 172:373-387.
Lee MSY, Ho SYW (2016). Molecular clocks. Current Biology 26:R387-R407.
Loogväli EL, Kivisild T, Margus T, Villems R (2009). Explaining the imperfection of the molecular clock of hominid mitochondria. PLoS ONE 4:e8260.
Papadopoulou A, Anastasiou I, Vogler AP (2010). Revisiting the insect mitochondrial molecular clock: the mid-Aegean trench calibration. Molecular Biology and Evolution 27:1659-1672.
Peterson GI, Masel J (2009). Quantitative prediction of molecular clock and Ka/Ks at short timescales. Molecular Biology and Evolution 26:2595-2603.
Soubrier J, Steel M, Lee MSY et al. (2012). The influence of rate heterogeneity among sites on the time dependence of molecular rates. Molecular Biology and Evolution 29:3345-3358.
Subramanian S, Denver DR, Millar CD et al. (2009). High mitogenomic evolutionary rates and time dependency. Trends in Genetics 25:482-486.
Zuckerkandl E, Pauling L (1962). Molecular disease, evolution, and genic heterogeneity. In Horizons in Biochemistry, M. Kasha and B. Pullman, eds. (New York: Academic Press): 189-225.