The abundance of deleterious polymorphisms in humans

Genetics: Published Articles Ahead of Print, published on February 23, 2012 as 10.1534/genetics.111.137893

Note

February 3, 2011

Sankar Subramanian
Griffith School of Environment, Griffith University, 170 Kessels Road, Nathan Qld 4111, Australia

Copyright 2012.

Running head: Temporal pattern of deleterious polymorphisms

Keywords: human evolution, deleterious mutations, disease-associated mutations, genome-wide association and population genetics theory

Address for correspondence:
Dr. Sankar Subramanian
Environmental Futures Centre
School of Environment
Griffith University
170 Kessels Road
Nathan QLD 4111
Australia
Phone: +61-7-3735 7495
Fax: +61-7-3735 7459
E-mail: s.subramanian@griffith.edu.au

ABSTRACT

Here I show a gradual decline in the proportion of deleterious nonsynonymous SNPs (nsSNPs) from tip to root of the human population tree. This study reveals that up to 48% of nsSNPs specific to a single genome are deleterious in nature, which underscores the abundance of deleterious polymorphisms in humans.

The neutral theory of molecular evolution predicts that amino acid replacement polymorphisms are neutral and largely governed by mutation and genetic drift (KIMURA and OHTA 1971). However, Ohta (1973) suggested that some amino acid polymorphisms could be slightly deleterious and that they are modulated by purifying selection and drift. According to her prediction, rare amino acid polymorphisms are, on average, slightly deleterious, and the number of these variants in a population is generally greater than that expected under the strict neutral hypothesis (OHTA 1973). Later, Kimura (1983) proposed that deleterious mutations can contribute significantly to diversity, but at high frequencies they are selected against and prevented from becoming fixed. These predictions were examined in a number of studies using RFLP (EXCOFFIER 1990; MERRIWETHER et al. 1991; WHITTAM et al. 1986) or DNA sequence data (TAJIMA 1989), which indeed showed an excess of rare variants over that expected using Watterson's test (WATTERSON 1978) and/or Tajima's test (TAJIMA 1989).

Using a different approach, studies on mitochondrial genes compared the diversity and divergence at synonymous and nonsynonymous sites and showed that the ratio of intra-specific diversities at nonsynonymous to synonymous positions ($\omega = d_N/d_S$) was significantly higher than the ratio of inter-specific divergence at these sites (BALLARD and KREITMAN 1994; NACHMAN et al. 1994; RAND et al. 1994). This suggests an excess of nonsynonymous polymorphisms (nsSNPs) in populations compared to fixed replacement mutations. The excess fraction of nsSNPs could be enriched with rare alleles, which are potentially deleterious, and natural selection eliminates them from the population over time. This pattern appears to be universal, as it was observed by a number of studies on mitochondrial genes of human (HASEGAWA et al. 1998; NACHMAN et al. 1996; RAND and KANN 1996), mouse (NACHMAN et al. 1994; RAND and KANN 1996) and fruit fly (BALLARD and KREITMAN 1994; RAND et al. 1994), as well as on nuclear genes of bacteria (ROCHA et al. 2006) and viruses (HOLMES 2003). Using a number of nuclear loci, Hughes et al. (2003, 2005) showed a reduced diversity at nonsynonymous sites of human genes, suggesting purifying selection, which in turn implies that slightly deleterious nsSNPs are prevalent in human populations. Large-scale genome-wide studies on mitochondrial (KIVISILD et al. 2006; RUIZ-PESINI et al. 2004; SUBRAMANIAN 2009) and nuclear genes (BUSTAMANTE et al. 2005; CARGILL et al. 1999; WONG et al. 2003) also suggested the presence of low-frequency deleterious amino acid SNPs in humans. Using the allele frequency spectrum of synonymous and nonsynonymous sites, several studies estimated the strength of selection on amino acid mutations and thus determined the proportion of deleterious amino acid mutations in humans (BOYKO et al. 2008; EYRE-WALKER et al. 2006).

However, it is important to estimate the fraction of deleterious nsSNPs at different depths of the human population tree. This is because a much higher fraction of deleterious nsSNPs is expected if individuals from the same population (or from closely related populations) are compared, as opposed to comparisons involving genomes from distantly related populations. For instance, we anticipate a higher fraction of deleterious SNPs among the polymorphisms shared between two Europeans than among those shared between a European and an Asian or a European and an African. This is evident from a previous study that showed a significantly smaller fraction of deleterious nsSNPs among those shared between African-Americans and European-Americans than among those present in only one of the populations (LOHMUELLER et al. 2008). This is because the fraction of deleterious SNPs is expected to be inversely proportional to the time of separation between populations, as they are removed by purifying selection over time. Hence I examined the temporal pattern of deleterious nsSNPs at all internal and terminal branches of the human population tree using complete genomes belonging to four distinct human populations. To quantify the proportion of deleterious polymorphisms, two independent methods were used.

Polymorphism data from ten complete human genomes belonging to Europeans, Asians, Africans and a Khoisan (an ancient African lineage) were obtained from different databanks (see supporting information for materials and methods). Synonymous (sSNPs) and nonsynonymous SNPs (nsSNPs) were grouped based on the pattern of sharing between the genomes, following the known phylogenetic relationship between the human populations (TISHKOFF et al. 2009). If a SNP was shared between the Khoisan and any other genome, it was considered to be the oldest (see branch A in Figure 1). If a SNP was shared between a European and an African and not shared by the Khoisan, it was considered to be common to Africans and non-Africans (branch B). Similarly, if a SNP was shared between a European and an Asian but not shared by an African or the Khoisan, it was considered to be ancestral to Eurasians (branch C). Likewise, if a SNP was shared only between any two Europeans and not shared with any other genome, it was considered to be ancestral to Europeans (branch D). Finally, if a SNP was present in only one genome and not shared with any other genome, it was considered to be specific to that individual (all terminal branches such as E). This grouping assumes that convergent or parallel mutations are rare.

Typically, nsSNPs are expected to be more abundant at terminal branches than at internal branches of the tree, as a fraction of them are removed over time. In contrast, sSNPs are expected to be present at roughly equal proportions at all nodes of the tree. Therefore, the ratio of nsSNPs (A) to sSNPs (S) normalizes for the time elapsed, and this ratio could be used to estimate the excess fraction of nsSNPs in each branch at various depths of the tree. Table 1 shows that A/S ratios are highest among unshared genome-specific SNPs (branches E, G, I and J) and lowest among those SNPs that are shared with the Khoisan genome (branch A). I then estimated the ratio of diversity at nonsynonymous to synonymous sites ($\omega$) for each branch. This measure is much clearer, as it is on a scale of 0-1. Figure 1 shows a steady decline in $\omega$ from the tip to the root of the tree, with the estimate for the terminal branches (0.34-0.48) being up to twofold higher than that of the ancestral branch (0.26). The estimate of $\omega$ was smallest for the inter-species human-chimp comparison (0.25), which is significantly smaller than the $\omega$ values estimated using intra-species comparisons (P < 0.001, using a Z test) (Table 1). These results conform to the theoretical predictions suggesting that intra-species $\omega$ is expected to be higher than inter-species $\omega$ when $N_e s < 0$ (KIMURA 1983; KRYAZHIMSKIY and PLOTKIN 2008; PETERSON and MASEL 2009).

McDonald and Kreitman (MCDONALD and KREITMAN 1991) showed that under neutrality the $\omega$ estimated for human populations ($\omega_H$) is expected to be equal to the $\omega_{HC}$ estimated for the interspecific (human-chimp) comparison, i.e. $\omega_H = \omega_{HC}$. It is clear from Table 1 that the $\omega$ estimated for the human population is always higher than that estimated for the human-chimp comparison, i.e. $\omega_H > \omega_{HC}$. This is due to the presence of a fraction of deleterious nsSNPs ($\delta$) in the populations. To subtract the fraction of deleterious SNPs, the equation could be written as $\omega_H - \delta\omega_H = \omega_{HC}$. This equation could be simplified to estimate the fraction of deleterious nonsynonymous SNPs as $\delta = 1 - (\omega_{HC}/\omega_H)$. The measure $\delta$ is the fraction of deleterious nsSNPs that are segregating in the population. I estimated $\delta$ for each branch of the tree and, as expected, this measure systematically declined from the tip to the root of the tree (Figure 1). This is also clear from the negative relationship between $\delta$ and the relative age of the SNPs (Figure 2). Importantly, the deleterious proportion is up to 7.9 times higher among the nsSNPs specific to individual genomes (0.48 - branch E) than among those shared with the ancient Khoisan genome (0.06 - branch A).

To obtain an independent estimate of the fraction of deleterious nsSNPs, I used PolyPhen-2, which predicts the possible impact of an amino acid substitution on the structure and function of a human protein using physicochemical and comparative information (ADZHUBEI et al. 2010). Based on severity, this software predicts an amino acid change as either benign or damaging. The proportion of nsSNPs damaging to the protein was then estimated using nsSNPs specific to each branch of the tree. It is interesting to note that the pattern of this proportion on the human population tree is almost identical to that of $\delta$ (Table 1 and Figures 1 and 2). Among the oldest nsSNPs that are shared with the Khoisan, only 10% were predicted to be damaging to protein structure/function (branch A), while this fraction was 33% for the genome-specific nsSNPs.

Although the declining patterns of $\delta$ from tip to root are very clear in African as well as non-African lineages, the $\delta$ estimates of non-African branches (0.29-0.48) (Figure 2A and B) are higher than those of African ones (0.21-0.38) (Figure 2C). This could be due to the effect of population bottlenecks that presumably occurred in European and Asian populations (LI and DURBIN 2011; MARTH et al. 2004). During this period some of the deleterious SNPs might have drifted to higher frequencies, which are expected to decrease during population expansion [see (HUGHES et al. 2005)]. Since the African population has had an uninterrupted (or mildly bottlenecked) population history (LI and DURBIN 2011), such an effect is expected to be minimal.

The results show that the deleterious proportion of nsSNPs can be up to 48% of the amino acid variants specific to a single genome. However, this is an underestimate, as this analysis was based only on a small dataset of 10 genomes. Adding more genomes to this analysis will redefine the pattern of sharing of SNPs, as a number of nsSNPs that are currently identified as lineage- or genome-specific will likely be shared among diverse populations. However, a large proportion of these shared nsSNPs are expected to be neutral, as deleterious SNPs are unlikely to persist in a population for a long time. Hence, by adding more genomes, a large proportion of neutral nsSNPs are likely to be shifted to deeper internal branches of the human population tree. Although this will lead to a reduction in the number of nsSNPs (observed in this study) in the terminal branches of the population tree, a much higher (than reported here) proportion of them are likely to be deleterious in nature. Furthermore, the fraction of deleterious SNPs estimated here is based on the assumption that adaptive evolution in hominids is rare. If this assumption is not valid, then the fractions of deleterious SNPs reported here are underestimates. This is because adaptive evolution results in a higher rate of nonsynonymous substitution compared to nonsynonymous diversity. Hence, excluding the adaptive fraction while estimating $\omega_{HC}$ will lead to a lower inter-species $\omega$ (than that reported here), which will result in higher $\delta$ estimates.

A previous study using two populations (African-Americans and European-Americans) showed that the deleterious fraction of population-specific nsSNPs was 52-77% higher compared to that shared between the two populations (LOHMUELLER et al. 2008). However, using a much deeper human population tree, this study showed that the deleterious fraction of genome-specific nsSNPs is 3.3-7.9 times higher compared to that shared between all humans. Although the temporal pattern reported in this study is based on nonsynonymous coding SNPs, an identical pattern is expected for all noncoding SNPs, such as those present in promoters, silencers, enhancers, splice junctions and untranslated regions (UTRs), that are under selective constraint. The findings of this study suggest that SNPs that are exclusively present in a single genome or closely related genomes are more likely to be associated with a disease, as they are under strong purifying selection (HUGHES et al. 2003). Hence genome-wide association studies could focus on these SNPs, particularly when the disease-causing variants have large phenotypic effects (CIRULLI and GOLDSTEIN 2010).

ACKNOWLEDGEMENTS

The author is grateful to David Lambert and thanks Leon Huynen, Robert Friedman, Michael Nachman and two anonymous reviewers for valuable comments.

FOOTNOTES

Supporting information is available online at: http://www.genetics.org/.. Polyphen data is available at http://www.mediafire.com/?5jila01rh49dmgm.

LITERATURE CITED

ADZHUBEI, I. A., S. SCHMIDT, L. PESHKIN, V. E. RAMENSKY, A. GERASIMOVA et al., 2010 A method and server for predicting damaging missense mutations. Nat Methods 7: 248-249.
BALLARD, J. W., and M. KREITMAN, 1994 Unraveling selection in the mitochondrial genome of Drosophila. Genetics 138: 757-772.

BOYKO, A. R., S. H. WILLIAMSON, A. R. INDAP, J. D. DEGENHARDT, R. D. HERNANDEZ et al., 2008 Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet 4: e1000083.
BUSTAMANTE, C. D., A. FLEDEL-ALON, S. WILLIAMSON, R. NIELSEN, M. T. HUBISZ et al., 2005 Natural selection on protein-coding genes in the human genome. Nature 437: 1153-1157.
CARGILL, M., D. ALTSHULER, J. IRELAND, P. SKLAR, K. ARDLIE et al., 1999 Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat Genet 22: 231-238.
CIRULLI, E. T., and D. B. GOLDSTEIN, 2010 Uncovering the roles of rare variants in common disease through whole-genome sequencing. Nat Rev Genet 11: 415-425.
EXCOFFIER, L., 1990 Evolution of human mitochondrial DNA: evidence for departure from a pure neutral model of populations at equilibrium. J Mol Evol 30: 125-139.
EYRE-WALKER, A., M. WOOLFIT and T. PHELPS, 2006 The distribution of fitness effects of new deleterious amino acid mutations in humans. Genetics 173: 891-900.
HASEGAWA, M., Y. CAO and Z. YANG, 1998 Preponderance of slightly deleterious polymorphism in mitochondrial DNA: nonsynonymous/synonymous rate ratio is much higher within species than between species. Mol Biol Evol 15: 1499-1505.
HOLMES, E. C., 2003 Patterns of intra- and interhost nonsynonymous variation reveal strong purifying selection in dengue virus. J Virol 77: 11296-11298.
HUGHES, A. L., B. PACKER, R. WELCH, A. W. BERGEN, S. J. CHANOCK et al., 2003 Widespread purifying selection at polymorphic sites in human protein-coding loci. Proc Natl Acad Sci U S A 100: 15754-15757.
HUGHES, A. L., B. PACKER, R. WELCH, A. W. BERGEN, S. J. CHANOCK et al., 2005 Effects of natural selection on interpopulation divergence at polymorphic sites in human protein-coding loci. Genetics 170: 1181-1187.
KIMURA, M., 1983 The neutral theory of molecular evolution. Cambridge University Press, Cambridge.
KIMURA, M., and T. OHTA, 1971 Protein polymorphism as a phase of molecular evolution. Nature 229: 467-469.
KIVISILD, T., P. SHEN, D. P. WALL, B. DO, R. SUNG et al., 2006 The role of selection in the evolution of human mitochondrial genomes. Genetics 172: 373-387.
KRYAZHIMSKIY, S., and J. B. PLOTKIN, 2008 The population genetics of dN/dS. PLoS Genet 4: e1000304.
LI, H., and R. DURBIN, 2011 Inference of human population history from individual whole-genome sequences. Nature 475: 493-496.
LOHMUELLER, K. E., A. R. INDAP, S. SCHMIDT, A. R. BOYKO, R. D. HERNANDEZ et al., 2008 Proportionally more deleterious genetic variation in European than in African populations. Nature 451: 994-997.
MARTH, G. T., E. CZABARKA, J. MURVAI and S. T. SHERRY, 2004 The allele frequency spectrum in genome-wide human variation data reveals signals of differential demographic history in three large world populations. Genetics 166: 351-372.
MCDONALD, J. H., and M. KREITMAN, 1991 Adaptive protein evolution at the Adh locus in Drosophila. Nature 351: 652-654.
MERRIWETHER, D. A., A. G. CLARK, S. W. BALLINGER, T. G. SCHURR, H. SOODYALL et al., 1991 The structure of human mitochondrial DNA variation. J Mol Evol 33: 543-555.
NACHMAN, M. W., S. N. BOYER and C. F. AQUADRO, 1994 Nonneutral evolution at the mitochondrial NADH dehydrogenase subunit 3 gene in mice. Proc Natl Acad Sci U S A 91: 6364-6368.
NACHMAN, M. W., W. M. BROWN, M. STONEKING and C. F. AQUADRO, 1996 Nonneutral mitochondrial DNA variation in humans and chimpanzees. Genetics 142: 953-963.
OHTA, T., 1973 Slightly deleterious mutant substitutions in evolution. Nature 246: 96-98.
PETERSON, G. I., and J. MASEL, 2009 Quantitative prediction of molecular clock and Ka/Ks at short timescales. Mol Biol Evol 26: 2595-2603.
RAND, D. M., M. DORFSMAN and L. M. KANN, 1994 Neutral and non-neutral evolution of Drosophila mitochondrial DNA. Genetics 138: 741-756.
RAND, D. M., and L. M. KANN, 1996 Excess amino acid polymorphism in mitochondrial DNA: contrasts among genes from Drosophila, mice, and humans. Mol Biol Evol 13: 735-748.
ROCHA, E. P., J. M. SMITH, L. D. HURST, M. T. HOLDEN, J. E. COOPER et al., 2006 Comparisons of dN/dS are time dependent for closely related bacterial genomes. J Theor Biol 239: 226-235.
RUIZ-PESINI, E., D. MISHMAR, M. BRANDON, V. PROCACCIO and D. C. WALLACE, 2004 Effects of purifying and adaptive selection on regional variation in human mtDNA. Science 303: 223-226.
SUBRAMANIAN, S., 2009 Temporal trails of natural selection in human mitogenomes. Mol Biol Evol 26: 715-717.
TAJIMA, F., 1989 Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: 585-595.
TISHKOFF, S. A., F. A. REED, F. R. FRIEDLAENDER, C. EHRET, A. RANCIARO et al., 2009 The genetic structure and history of Africans and African Americans. Science 324: 1035-1044.
WATTERSON, G. A., 1978 The homozygosity test of neutrality. Genetics 88: 405-417.
WHITTAM, T. S., A. G. CLARK, M. STONEKING, R. L. CANN and A. C. WILSON, 1986 Allelic variation in human mitochondrial genes based on patterns of restriction site polymorphism. Proc Natl Acad Sci U S A 83: 9611-9615.
WONG, G. K., Z. YANG, D. A. PASSEY, M. KIBUKAWA, M. PADDOCK et al., 2003 A population threshold for functional polymorphisms. Genome Res 13: 1873-1879.

Table 1. Genome-wide human SNPs and parameter estimates

Branch on the tree | Nonsynonymous SNPs (A) | Synonymous SNPs (S) | A/S  | $\omega = d_N/d_S$ | Fraction of deleterious nsSNPs ($\delta$) | Damaging nsSNPs | Benign nsSNPs | Fraction of damaging nsSNPs
HC *               | 62373                  | 92246               | 0.68 | 0.25 (0.001)       | -                                         | -               | -             | -
A                  | 6494                   | 9019                | 0.72 | 0.26 (0.004)       | 0.06 (0.017)                              | 503             | 4598          | 0.10 (0.004)
B                  | 3028                   | 3723                | 0.81 | 0.30 (0.007)       | 0.17 (0.025)                              | 382             | 1982          | 0.16 (0.008)
C                  | 2003                   | 2181                | 0.92 | 0.33 (0.010)       | 0.26 (0.032)                              | 329             | 1221          | 0.21 (0.010)
D                  | 1210                   | 1232                | 0.98 | 0.36 (0.014)       | 0.31 (0.043)                              | 217             | 758           | 0.22 (0.013)
E                  | 5172                   | 3956                | 1.31 | 0.48 (0.010)       | 0.48 (0.024)                              | 1364            | 2782          | 0.33 (0.007)
F                  | 293                    | 309                 | 0.95 | 0.35 (0.028)       | 0.29 (0.085)                              | 55              | 186           | 0.23 (0.027)
G                  | 2333                   | 1843                | 1.27 | 0.46 (0.014)       | 0.47 (0.034)                              | 576             | 1327          | 0.30 (0.011)
H                  | 411                    | 478                 | 0.86 | 0.31 (0.021)       | 0.21 (0.069)                              | 77              | 261           | 0.23 (0.023)
I                  | 4013                   | 4284                | 0.94 | 0.34 (0.007)       | 0.28 (0.023)                              | 878             | 2382          | 0.27 (0.008)
J                  | 5565                   | 5140                | 1.08 | 0.39 (0.008)       | 0.38 (0.021)                              | 1535            | 3118          | 0.33 (0.007)

* Human-Chimpanzee comparison
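As a rough check, the A/S ratios, the fraction of deleterious nsSNPs and the fraction of damaging nsSNPs in Table 1 can be recomputed from the raw counts. The sketch below is not the author's code. It assumes that the numbers of nonsynonymous and synonymous sites are the same in every comparison, so that the ratio of the inter-species to the intra-species dN/dS reduces to the ratio of the corresponding A/S values. The names `TABLE1`, `a_to_s`, `deleterious_fraction` and `damaging_fraction` are illustrative only.

```python
# Counts copied from Table 1:
# branch -> (nonsynonymous SNPs A, synonymous SNPs S, damaging nsSNPs, benign nsSNPs).
# "HC" is the human-chimpanzee comparison, which has no PolyPhen-2 counts.
TABLE1 = {
    "HC": (62373, 92246, None, None),
    "A": (6494, 9019, 503, 4598),
    "B": (3028, 3723, 382, 1982),
    "C": (2003, 2181, 329, 1221),
    "D": (1210, 1232, 217, 758),
    "E": (5172, 3956, 1364, 2782),
    "F": (293, 309, 55, 186),
    "G": (2333, 1843, 576, 1327),
    "H": (411, 478, 77, 261),
    "I": (4013, 4284, 878, 2382),
    "J": (5565, 5140, 1535, 3118),
}


def a_to_s(branch):
    """Ratio of nonsynonymous to synonymous SNP counts for one branch."""
    nonsyn, syn, _, _ = TABLE1[branch]
    return nonsyn / syn


def deleterious_fraction(branch, reference="HC"):
    """1 - (inter-species ratio / intra-species ratio).

    Assumption: site counts are identical across comparisons, so A/S
    can stand in for dN/dS in the ratio.
    """
    return 1.0 - a_to_s(reference) / a_to_s(branch)


def damaging_fraction(branch):
    """Share of classified nsSNPs that were predicted to be damaging."""
    _, _, damaging, benign = TABLE1[branch]
    return damaging / (damaging + benign)


if __name__ == "__main__":
    for branch in TABLE1:
        if branch == "HC":
            continue  # reference comparison, nothing to estimate
        print(
            f"{branch}: A/S={a_to_s(branch):.2f} "
            f"deleterious={deleterious_fraction(branch):.2f} "
            f"damaging={damaging_fraction(branch):.2f}"
        )
```

Under this assumption the recomputed values agree with the tabulated ones to two decimal places, for example 0.06 for branch A and 0.48 for branch E.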

FIGURE LEGENDS

Figure 1. Population phylogeny of human genomes used in this study. At each branch, estimates of the following are given in parentheses: i) ratio of nonsynonymous to synonymous SNPs per site ($\omega$); ii) fraction of deleterious nonsynonymous SNPs ($\delta$); iii) proportion of damaging nonsynonymous SNPs. *Estimates for terminal branches are the averages of five, two and two corresponding genomes of Europeans, Asians and Africans, respectively. Materials and methods are provided in the supporting information.

Figure 2. The proportion of deleterious nonsynonymous polymorphisms ($\delta$) and the fraction of damaging amino acid polymorphisms estimated for each branch of the human population tree shown in Figure 1. The negative relationship between deleterious nsSNPs and relative time is shown for the A) European lineage, B) Asian lineage and C) African lineage. Error bars indicate the standard error of the mean.
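The assignment of SNPs to branches A to E of Figure 1, as described in the text, can be sketched as a small rule-based function. This is a hypothetical illustration, not the procedure used in the study: the function name `assign_branch`, the population labels and the "unassigned" fallback are assumptions, and the rule for branch B is generalized here from "a European and an African" to "any non-African and an African". Sharing patterns that the text does not describe (for example, SNPs shared only among Asians or only among Africans) are left unassigned rather than guessed.

```python
from collections import Counter


def assign_branch(carrier_populations):
    """Assign a SNP to a branch from the populations of the genomes carrying it.

    carrier_populations: list of labels, one per carrier genome, drawn from
    "European", "Asian", "African" and "Khoisan" (illustrative labels).
    """
    counts = Counter(carrier_populations)
    n_carriers = sum(counts.values())
    if n_carriers == 0:
        raise ValueError("a SNP must be carried by at least one genome")
    if n_carriers == 1:
        return "terminal"  # genome-specific SNP, e.g. branch E
    if counts["Khoisan"] > 0:
        return "A"  # shared between the Khoisan and any other genome
    non_african = counts["European"] + counts["Asian"]
    if counts["African"] > 0 and non_african > 0:
        return "B"  # common to Africans and non-Africans (generalized rule)
    if counts["European"] > 0 and counts["Asian"] > 0:
        return "C"  # ancestral to Eurasians
    if counts["European"] == n_carriers:
        return "D"  # shared only among Europeans
    return "unassigned"  # pattern not described in the text


if __name__ == "__main__":
    print(assign_branch(["European", "Khoisan"]))
    print(assign_branch(["European", "European"]))
```

As in the text, the rules are checked from the deepest branch outward, and they rely on the assumption that convergent or parallel mutations are rare.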
