Temporal Trails of Natural Selection in Human Mitogenomes Author Sankarasubramanian, Sankar Published 2009 Journal Title Molecular Biology and Evolution DOI https://doi.org/10.1093/molbev/msp005 Copyright Statement This is a pre-copy-editing, author-produced PDF of an article accepted for publication in Molecular Biology and Evolution following peer review. The definitive publisher-authenticated version Molecular Biology and Evolution Volume 26, Issue 4, 2009, Pages 715-717 is available online at: http://dx.doi.org/10.1093/molbev/msp005. Downloaded from http://hdl.handle.net/10072/30190 Griffith Research Online https://research-repository.griffith.edu.au
Letter (Revised) December 30, 2008 Temporal trails of natural selection in human mitogenomes Sankar Subramanian 1,2 1 Allan Wilson Centre for Molecular Ecology and Evolution, Institute of Molecular BioSciences, Massey University, Auckland, New Zealand 2 Griffith School of Environment, Griffith University, Nathan Qld 4111, Australia Keywords: human evolution, haplogroups, natural selection, deleterious mutations, coalescence time and the neutral theory Running head: Natural selection in human mitochondrial genomes Abbreviation: Mitogenomes Mitochondrial genomes; tmrca- Time of the Most Recent Common Ancestor Title length: 57 characters (including spaces) Abstract length: 113 words Total length of text: 10,412 (including spaces) Total page requirement: 3.75 pages Number of figures: 2 Number of tables: 1 Number of references: 20 Address for correspondence: Dr. Sankar Subramanian Griffith School of Environment Griffith University 170 Kessels Road Nathan QLD 4111 Australia Phone: +61-7-3735 7495 Fax: +61-7-3735 7459 E-mail: s.subramanian@griffith.edu.au
Abstract Mildly deleterious mutations initially contribute to the diversity of a population but later they are selected against at high frequency and are eliminated eventually. Using over 1500 complete human mitochondrial genomes along with those of Neanderthal and Chimpanzee, I provide empirical evidence for this prediction by tracing the footprints of natural selection over time. The results show a highly significant inverse relationship between the ratio of nonsynonymous- to synonymous divergence (dn/ds) and the age of human haplogroups. Furthermore this study suggests that slightly deleterious mutations constitute up to 80% of the mitochondrial amino acid replacement mutations detected in human populations and that over the last 500,000 years these mutations have been gradually removed.
The neutral theory of molecular evolution suggests that the genetic polymorphisms in a population and the long-term evolutionary changes in the genomes are two different aspects of the same evolutionary process (Kimura 1983). According to this theory the rate of protein evolution and the extent of amino acid polymorphisms are largely determined by the rate of neutral amino acid mutations. Therefore, the ratio of nonsynonymous divergence (dn) to synonymous divergence (ds) between species is expected to be equal to the ratio of nonsynonymous diversity to synonymous diversity within the population of a species. Contrary to this prediction studies on animal mitochondrial protein coding genes showed that this ratio (ω = dn/ds) observed within species was much higher than that estimated between species (Nachman et al. 1996; Rand and Kann 1996; Hasegawa et al. 1998). The presence of slightly deleterious mutations in populations was attributed to explain this discrepancy. Furthermore, variation in the rate of evolution between short- and long-term evolutionary histories of species has also been suggested (Ho et al 2005). Although these studies highlight the difference in the evolutionary process at population and species levels, such dichotomous information is insufficient to understand the actual shift in the evolutionary process over time. Therefore in the present study I examine the relationship between the footprints of selection and the temporal history of human populations. For this purpose I assembled a dataset of 1512 complete mitochondrial genomes belonging to 18 haplogroups and the genomes of Neanderthal, and chimpanzee (Supplementary figure S1). The coalescence time (tmrca) of each haplogroup was estimated by the program, MCMCcoal using a rate of 3.2 10-8 substitution per site per year (Rannala and Yang 2003). The ωs (dn/ds) were estimated using (Yang 2007)
(Table1). For human-neanderthal and human-chimpanzee comparisons pair-wise divergences at nonsynonymous- and synonymous sites were used. The ωs of haplogroups and species comparisons were plotted as the function of the coalescence/divergence times. A highly significant inverse relationship of ω with time is shown in Figure 1A (Spearman s coefficient, ρ = -0.89, P = 0.0002, n = 20). This correlation remains significant even after excluding the human-chimpanzee comparison (ρ = -0.87, P = 0.0002, n = 19). To examine any methodological biases that may have influenced the results, the number of nonsynonymous (N) and synonymous changes (S) in each haplogroup tree were counted using PAUP (Swofford 2003) (Table 1). The ratio of nonsynoymous- to synonymous changes (N/S) also produced a similar and highly significant result (ρ = -0.80, P = 0.0006, n=19) when plotted against the coalescence times (Figure 1B). Human-chimpanzee comparison was not included in this analysis because of the high divergence between them, which leads to underestimation of synonymous changes resulting from multiple hits. To estimate the coalescence time independent of synonymous positions the mitochondrial hyper-variable region (HVR I) was used. The relationship of ω with coalescence times obtained using HVR I is qualitatively similar and highly significant (Supplementary figure S2). The average ω observed for the young haplogroups of < 50 Kyr is 5.8 times higher than that estimated for the human-chimpanzee pair (0.23 vs 0.04) (Figure 2 and Table 1). Within human populations the average ω of the young haplogroups is 28% higher than that of the groups aged between 50-100 Kyr (0.18) and 64% higher than that of the >100 Kyr old African haplogroups (0.14). The ω for the human-neanderthal comparison (0.12) is only slightly higher than that of the African haplogroups. However
this estimate could be influenced by effects of the much smaller population sizes of Neanderthals (Green et al. 2006). Previous studies on animal mitochondrial genes (Nachman et al. 1996; Rand and Kann 1996; Hasegawa et al. 1998) and on bacterial (Rocha et al. 2006) and viral (Holmes 2003) genomes observed a high ω within the population of a species or between closely related species. This general pattern that is present in widely different organisms cannot be explained by any lineage or population specific effects such as relaxed selective constraints or reduction/expansion in population size. However these results are in fact predicted by the neutral theory, which suggests that deleterious alleles can contribute significantly to the heterozygosity but they are selected against at high frequency and being prevented from becoming fixed (p46, Kimura 1983). Therefore the presence of mild to moderately deleterious mutations in the population is the most likely explanation for the elevated ω. The present study elucidates this and shows a gradual decline in the fraction of such mutations over time. Apparently the young populations harbor a mixture of mutations with varying degrees of detrimental effect on fitness and the relative time of removal of these mutations depends on the magnitude of the effect. The ω of the human-chimpanzee comparison suggests that ~4% of the nonsynonymous mutations are neutral as they had been fixed in these species. This implies that on an average 83% of the nonsynonymous mutations present in the young haplogroups (t < 50 Kyr, ω = 0.23) are slightly deleterious and even after 500 Kyr (human-neanderthal divergence, ω = 0.12) these mutations constitute ~67%. The ω estimates are suggestive of the strength of selection based on the assumption of neutral evolution in synonymous sites. Although selection in synonymous sites of human genes
is known to be weak due to the small population size (Subramanian 2008), a recent study suggests that over 98% of the synonymous sites of human mitochondrial genes are largely free from selection (Yang and Nielsen 2008). Therefore assuming a weak selection in synonymous sites is unlikely to influence the results significantly. The present study provides a neutral or nearly neutral explanation (Kimura 1983; Ohta 1992) for the ephemeral presence of slightly deleterious replacement mutations in a population. Such mutations result in high ω estimates in recent human haplogroups (see Kivisild et al. 2006). Therefore investigations suggesting adaptive evolution should consider the relative age of the population as a much higher ω is generally expected for very young haplogroups. Methods The mitochondrial genome sequences of human haplogroups were obtained from GenBank. Only mitogenomes that have been explicitly mentioned as a particular haplogroup were included. Genomes that did not form a monophyletic group with the respective haplogroup were excluded. The sequences were aligned by CLUSTAL-W (Larkin et al. 2007). Concatenated cdna and RNA alignments of 1535 mitogenomes belonging to 22 haplogroups, Neanderthal, Chimpanzee and Bonobo were used to construct a Neighbor-Joining tree (Saitou and Nei 1987) based on the Tamura-Nei distance (Tamura and Nei 1993) using MEGA (Tamura et al. 2007) (Supplementary figure S1). However for further analysis only haplogroups with > 10 genomes were included and the final dataset contained 1512 mitogenomes genomes belonging to 18 haplogroups (Table 1).
The coalescence distance (root height) of the most recent common ancestor (MRCA) for each haplogroup was estimated by MCMCcoal (Rannala and Yang 2003), using the synonymous sites (fourfold degenerate sites) of the 12 protein coding genes coded by the heavy strand of the mitochondrial DNA. The gene coded by the light strand, ND6 was not included due to the difference in mutation patterns between the strands (Hasegawa et al. 1998). Based on the synonymous divergence between humanchimpanzee (0.45 substitutions per site) and assuming a 7 Mya divergence time the evolutionary rate was estimated as 3.2 10-8 substitution per site per year. Using this rate, the time of the most recent common ancestor (tmrca) of each haplogroup was determined (Table 1). Although the tmrca estimates are based on simple Jukes-Cantor model, using the GTR+Gamma model employed in the program BEAST (Drummond and Rambaut) produced similar relative estimates (Supplementary table S1). Note that the results of this study only depend on the relative ages of the haplogroups rather than absolute time. The ratio of nonsynonymous- to synonymous divergence for each haplogroup was estimated by the single dn/ds ratio model of PAML (Yang 2007) using an NJ model tree. For human-neanderthal and human-chimpanzee comparisons pair-wise divergence at synonymous and nonsynonymous sites were estimated using PAML. To obtain the list of synonymous changes (S) and nonsynoymous changes (N) in each haplogroup tree, the parsimony setting of PAUP was used (Swofford 2003). For this purpose the zerofold degenerate sites and fourfold degenerate sites were used respectively. Supplementary Material Supplementary materials are available at Molecular Biology and Evolution online.
Acknowledgements The author is grateful to David Lambert and acknowledges the support from the Marsden Fund and University of Auckland. I thank John Waugh, Leon Huynen and two anonymous referees for comments.
Literature Cited Drummond, A., and A. Rambaut. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol 7:214. Green, R., J. Krause, S. Ptak, A. Briggs, M. Ronan, J. Simons, L. Du, M. Egholm, J. Rothberg, M. Paunovic, and S. Pääbo. 2006. Analysis of one million base pairs of Neanderthal DNA. Nature 444:330-336. Hasegawa, M., Y. Cao, and Z. Yang. 1998. Preponderance of slightly deleterious polymorphism in mitochondrial DNA: nonsynonymous/synonymous rate ratio is much higher within species than between species. Mol Biol Evol 15:1499-1505. Ho, S., M. Phillips, A. Cooper, and A. Drummond. 2005. Time dependency of molecular rate estimates and systematic overestimation of recent divergence times. Mol Biol Evol 22:1561-1568. Holmes, E. 2003. Patterns of intra- and interhost nonsynonymous variation reveal strong purifying selection in dengue virus. J Virol 77:11296-11298. Kimura, M. 1983. The neutral theory of molecular evolution. Cambridge University press, Cambridge. Kivisild, T., P. Shen, D. Wall, B. Do, R. Sung, K. Davis, G. Passarino, P. Underhill, C. Scharfe, A. Torroni, R. Scozzari, D. Modiano, A. Coppa, P. de Knijff, M. Feldman, L. Cavalli-Sforza, and P. Oefner. 2006. The role of selection in the evolution of human mitochondrial genomes. Genetics 172:373-387. Larkin, M., G. Blackshields, N. Brown, R. Chenna, P. McGettigan, H. McWilliam, F. Valentin, I. Wallace, A. Wilm, R. Lopez, J. Thompson, T. Gibson, and D. Higgins. 2007. Clustal W and Clustal X version 2.0. Bioinformatics 23:2947-
2948. Nachman, M., W. Brown, M. Stoneking, and C. Aquadro. 1996. Nonneutral mitochondrial DNA variation in humans and chimpanzees. Genetics 142:953-963. Ohta, T. 1992. The nearly neutral theory of molecular evolution. Annual review of ecology and systematics 23:263-286. Rand, D., and L. Kann. 1996. Excess amino acid polymorphism in mitochondrial DNA: contrasts among genes from Drosophila, mice, and humans. Mol Biol Evol 13:735-748. Rannala, B., and Z. Yang. 2003. Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics 164:1645-1656. Rocha, E., J. Smith, L. Hurst, M. Holden, J. Cooper, N. Smith, and E. Feil. 2006. Comparisons of dn/ds are time dependent for closely related bacterial genomes. J Theor Biol 239:226-235. Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406-425. Subramanian, S. 2008. Nearly neutrality and the evolution of codon usage bias in eukaryotic genomes. Genetics 178:2429-2432. Swofford, D. L. 2003. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Sinauer Associates, Sunderland, Massachusetts. Tamura, K., J. Dudley, M. Nei, and S. Kumar. 2007. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol 24:1596-1599. Tamura, K., and M. Nei. 1993. Estimation of the number of nucleotide substitutions in
the control region of mitochondrial DNA in humans and chimpanzees. Mol Biol Evol 10:512-526. Yang, Z. 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24:1586-1591. Yang, Z., and R. Nielsen. 2008. Mutation-selection models of codon substitution and their use to estimate selective strengths on codon usage. Mol Biol Evol 25:568-579.
Figure Legend Figure 1. (A) The log-log relationship between the coalescence/divergence time and the ratio of nonsynonymous- to synonymous divergence (ω=dn/ds). Each data point denotes a haplogroup and the two points at the far right are the human-neanderthal and human-chimpanzee comparisons. The relationship is highly significant (Spearman s coefficient, ρ = -0.89, P = 0.0002, n=20), which is true even after excluding the humanchimpanzee comparison (ρ = -0.87, P = 0.0002, n=19). (B) The log-log relationship between the coalescence/divergence times and the ratio of the number of nonsynonymous- to synoymous changes (N/S). Spearman s coefficient, ρ = -0.80, P = 0.0006, n=19. The human-chimpanzee comparison is not included in this analysis. The best fitting linear regression lines are shown. Figure 2. Average ω (dn/ds) estimated for the haplogroups with coalescence times of < 50 Kyr (X, V, W, I, G, J and B), 100-150 Kyr (A, C, D, T, H, U and K-UK) and >150 Kyr (L0, L1, L2 and L3). Errors bars are 95% confidence intervals. The differences between the means (of any two groups) are highly significant (P < 0.01) using a two tailed Mann-Whitney U test.
Table 1. Estimates of coalescence times, ω (dn/ds) and the ratio of nonsynonymous (N) to synonymous changes (S) Haplogroup/ Number of tmrca (Kyr) ω=dn/ds N/S Species pair Genomes A 97 50.2 0.17 1.26 B 31 49.6 0.21 1.50 C 48 58.0 0.20 1.21 D 28 61.2 0.17 0.91 G 11 36.7 0.21 1.00 H 357 73.4 0.17 1.10 I 29 32.1 0.27 1.91 J 106 47.5 0.24 1.71 K (UK) 195 89.3 0.17 1.00 L0 35 158.2 0.16 0.92 L1 33 125.9 0.15 0.93 L2 94 123.1 0.14 0.81 L3 92 101.4 0.13 0.76 T 97 67.2 0.16 0.90 U 127 77.8 0.20 1.19 V 61 20.3 0.22 2.22 W 51 28.0 0.17 1.17 X 20 18.4 0.31 3.50 Human-Neanderthal - 492.6 0.12 0.57 Human-Chimpanzee - 7000.0 0.04 -
A 1 Ratio of nonsynonymous- to synonymous divergence (ω = dn/ds) 0.1 0.01 10000 100000 1000000 10000000 Coalescence/divergence time (Years) B Ratio of nonsynonymous- to synoymous changes (N/S) 10 1 0.1 10000 100000 1000000 Coalescence/divergence time (years) Figure 1 Subramanian
0.3 0.26 Average ω (dn/ds) 0.22 0.18 0.14 0.1 <50 50-100 >100 Coalescence time (Kyr) Figure 2 Subramanian