The Genealogy of a Sequence Subject to Purifying Selection at Multiple Sites


Scott Williamson and Maria E. Orive
Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence

We investigate the effect of purifying selection at multiple sites on both the shape of the genealogy and the distribution of mutations on the tree. We find that the primary effect of purifying selection on a genealogy is to shift the distribution of mutations on the tree, whereas the shape of the tree remains largely unchanged. This result is relevant to the large number of coalescent estimation procedures, which generally assume neutrality for segregating polymorphisms; applying these estimators to evolutionarily constrained sequences could lead to a significant degree of bias. We also estimate the statistical power of several neutrality tests in detecting weak to moderate purifying selection and find that the power is quite good for some parameter combinations. This result contrasts with previous studies, which predicted low statistical power because of the minor effect that weak purifying selection has on the shape of a genealogy. Finally, we investigate the effect of Hill-Robertson interference among linked deleterious mutations on patterns of molecular variation. We find that dependence among selected loci can substantially reduce the efficacy of even fairly strong purifying selection.

Introduction

A gene genealogy represents the historical relationships among DNA sequences. As such, genealogies are closely related to many patterns of molecular variation, such as the sampling distribution of segregating sites (Watterson 1975; Fu 1995) and nucleotide diversity (Tajima 1983). Also, because of their historical representation, genealogies provide a link between population genetic patterns of variation and phylogenetic patterns of evolution.
Key words: purifying selection, coalescence theory, molecular evolution.
Address for correspondence and reprints: Scott Williamson, Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, Kansas. E-mail: scottw@ku.edu.
Mol. Biol. Evol. 19(8): by the Society for Molecular Biology and Evolution. ISSN:

Coalescent theory, which describes the statistical properties of genealogies, has been very successful in providing estimators for important population parameters and in distinguishing various models of evolution (reviewed in Hudson 1990; Fu and Li 1999). Most of these successes are based on the assumption that all new mutations are selectively neutral. For sequences subject to selection, the validity of these methods is contingent on selection having an insignificant effect on the genealogy.

A number of studies have investigated the statistical properties of genealogies for sequences subject to selection. (A brief note on terminology: in keeping with related studies such as Fu and Li 1993, we use the term branch to refer to both internodes and leaves. This definition of branch is different from the formal graph theory definition. Further, internal branch refers to an internode, and external branch refers to a leaf, i.e., a branch that connects to a tip of the tree. Finally, we use tree shape as shorthand for the distribution of branch lengths among internal and external branches, without specific reference to tree topology.) For example, Charlesworth B, Morgan, and Charlesworth D (1993) and Hudson and Kaplan (1994, 1995) have shown that background selection (i.e., very strong purifying selection at linked loci) can appreciably reduce the overall tree length, analogous to a reduction in effective population size. In contrast, for the case of balancing selection, Kaplan, Darden, and Hudson (1988) have found that the total tree length can be considerably increased.
Further, Kelly (1997) and Kelly and Wade (2000) found that balancing selection also affects the shape of the tree, giving rise to long internal branches. For weak purifying selection, Krone and Neuhauser (1997) derived a new representation of genealogies called the ancestral selection graph (ASG), which is the weak selection analog of Kingman's (1982) neutral coalescent. Using their ASG methodology, Neuhauser and Krone (1997) found that weak purifying selection has a negligible effect on the time back to the most recent common ancestor (MRCA) of all the sequences in a sample. Przeworski, Charlesworth, and Wall (1999) and Slade (2000) expanded this result to investigate the effect of weak purifying selection on tree shape, again finding virtually no effect of this type of selection. Further, using forward simulations and a correspondingly different set of assumptions, Golding (1997) also found that weak purifying selection has a negligible effect on tree shape.

All these studies share two dominant themes. First, they focus on structural changes in the genealogy; e.g., changes in tree shape, tree length, or MRCA time. But because structural changes to the true genealogy are generally unobservable, these studies all assume that observed variation is neutral, whereas the variation on which selection acts is unseen. Under this assumption, mutations are distributed randomly over the entire tree; i.e., the expected number of mutations on any branch is simply proportional to the length of the branch. Thus, the number of mutations on a branch is an unbiased estimate of relative branch length. However, if the selected variation is observed segregating in the sample, selection could alter both the shape of the genealogy and the distribution of mutations on the tree (Golding, Aquadro, and Langley 1986; Przeworski, Charlesworth, and Wall 1999). For example, in a sequence subject to weak purifying selection, observed mutations are expected to be recent because older deleterious mutations would have already been lost from the population. Hence, the expected number of mutations on a branch would depend on the age of the branch (fig. 1).

FIG. 1. The potential action of selection on a genealogy. (a) A neutral genealogy, in which the expected number of mutations on a branch is proportional to its length. (b) A genealogy in which selection has altered the shape of the tree, in this case, increasing the relative length of the external branches. (c) A genealogy with an apparently neutral shape but with a nonrandom distribution of mutations caused by selection. Observed mutations tend to be recent and occur on the external branches.

A second theme of these earlier studies is that they generally consider selection at a single locus, which can be either completely or partially linked to the sequence sampled. These studies cannot be readily expanded to multiple selected sites because, with low levels of recombination, distinct selected sites within a sequence do not evolve independently. For example, the buildup of negative linkage disequilibrium in regions of low recombination reduces the efficacy of directional selection, a process known as Hill-Robertson interference (Hill and Robertson 1966; McVean and Charlesworth 2000). Przeworski, Charlesworth, and Wall (1999) attempted to deal with this problem by using the ASG to simulate an infinitely many sites mutation model. However, because of the limitations of the ASG methodology, they could only simulate very weak selection (2Ns 0.2).

The study of genealogies with selection leaves two open questions: (1) how does selection at multiple sites affect the distribution of mutations on the tree? and (2) how does mutual dependence among selected sites affect the shape of a genealogy and other patterns of molecular variation? This study uses simulations to address these two questions for the specific case of weak and moderate purifying selection.
Because mildly deleterious mutations are eliminated slowly from a population, they are likely to be observed segregating in a sample. Therefore, if a large proportion of new mutations are mildly deleterious, weak purifying selection could lead to an appreciable shift in the distribution of observed mutations on a tree. Also, the long sojourn times of mildly deleterious mutations allow multiple selected sites to segregate simultaneously, which can lead to Hill-Robertson interference. McVean and Charlesworth (2000) have investigated the effect of interference on sojourn times, fixation probabilities, and the sampling distribution of segregating sites. They use a reversible-mutation model (Bulmer 1991) with identical mutational fitness effects at each site, which is generally used as a basis for investigations of codon bias. However, this mutation model is probably inappropriate for nonsynonymous changes and mutations in noncoding DNA. Therefore, we investigate the important limiting case of unconditionally deleterious mutation with varying distributions of mutational fitness effects.

We also use our simulations to assess the power of several neutrality tests (Tajima's [1989] D test and Fu and Li's [1993] D, D*, F, and F* tests) for detecting weak and intermediate strengths of purifying selection. Under the assumption that selected variation is unobserved, Golding (1997) and Przeworski, Charlesworth, and Wall (1999) suggest that statistical power should be low because weak purifying selection at a single locus has only a small effect on the shape of a genealogy. However, if (1) selection at multiple sites has a synergistic effect on tree shape compared with selection at a single site or if (2) the primary effect of selection is to shift the distribution of mutations on the tree, then Tajima's and Fu and Li's tests might be able to detect purifying selection.
McVean and Charlesworth (2000) and Tachida (2000) both use simulations of weak selection at multiple sites to estimate the power of Tajima's (1989) D statistic and some of Fu and Li's (1993) statistics. However, both studies allow adaptive, as well as deleterious, mutation, which could potentially override the effect of weak purifying selection on the statistics.

Przeworski, Charlesworth, and Wall (1999) pointed out the limitations of current retrospective simulation methods (e.g., Krone and Neuhauser 1997; Neuhauser and Krone 1997; Slade 2000) for modeling selection at multiple linked sites; the necessary assumptions regarding the genetic system are very restrictive, and expanding to multiple sites is labor-intensive. Therefore, following Golding (1997), we use forward simulations that track ancestry in each generation. But, rather than using Golding's single-locus, two-allele model with symmetrical mutation, we consider an infinitely many sites mutation model within a nonrecombining sequence. In addition, we track the mutational history of each individual in the population. This allows us to determine the distribution of mutations on the tree.

Materials and Methods

We ran stochastic simulations of a single nonrecombining sequence forward in time; in each generation, we kept track of the ancestry of each gene in all previous generations. A summary of all simulation parameters and output statistics is given in table 1. Selection and reproduction followed the Wright-Fisher model for diploids. Individual fitness was determined as the average fitness of the two parental genes, thus assuming no dominance. Mutation occurred at rate μ per sequence per generation. The deleterious fitness effect, s, of each new mutation was drawn from a gamma distribution of mutational effects with mean s̄ and shape parameter β. The gamma distribution is represented by the density function

$$g(s; \alpha, \beta) = \frac{\alpha^{-\beta}\, s^{\beta-1}\, e^{-s/\alpha}}{\Gamma(\beta)}, \qquad 0 \le s < \infty,$$

where the scale parameter α is fixed by the mean and shape (α = s̄/β).

The gamma distribution was used because it can approximate a wide variety of distributions. For instance, when the shape parameter β = 1, the gamma reduces to an exponential distribution with mean s̄. For reasonably large β (5), the gamma is an approximately symmetrical, bell-shaped curve. And as β tends to infinity, the gamma approximates an equal-effects model of mutation. Thus, as β increases, the variance decreases.

Table 1
Simulation Parameters and Output Statistics

Parameter or statistic   Definition
N                        Diploid population size
n                        Sample size
μ                        Per-locus mutation rate
s̄                        Mean selection coefficient of new mutations
β                        Shape parameter of the gamma distribution
S                        Number of segregating sites in the sample
η_e                      Number of external mutations in the sample
π                        Average number of pairwise differences in the sample
J_n                      Total tree length of the sample genealogy
L_n                      Summed length of external branches in the sample genealogy
MRCA                     Time back to the most recent common ancestor of the sample

There was no epistasis; fitness effects of multiple mutations were combined multiplicatively. We assumed that each new mutation is unique, following an infinitely many sites model. At the end of each generation, the absolute mean fitness of the entire population was reset to unity; i.e., we used a soft selection model. Otherwise, with unconditionally deleterious mutation and no recombination, Muller's ratchet (Muller 1964) could have driven the mean fitness close to zero.

At the beginning of each run, the population was allowed 8N generations to reach selective equilibrium; during this time, we did not follow ancestry. This established the founding generation. (We have done simulations with a variable number of generations [2N, 4N, 8N, and 16N] to establish mutation-selection-drift equilibrium; we find that the length of time the population is run before the founding generation makes no difference in output statistics.) Next, the population was run until coalescence occurred; that is, until all individual genes were descendants of a single gene in the founding generation.
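The simulation step described above can be illustrated with a minimal Python sketch. This is not the authors' C program: the function names, the pairing of gene copies 2k and 2k+1 into individual k, the cap of each effect at 1, and the use of Knuth's Poisson sampler are all assumptions made for illustration. The sketch draws gamma-distributed effects with a given mean and shape, combines them multiplicatively, samples parents in proportion to individual fitness, records the parent of each gene copy (the ancestry information), and rescales mean fitness to one.

```python
import math
import random

def draw_effect(mean_s, shape, rng):
    """Draw one deleterious effect from a gamma with the given mean and shape."""
    # random.gammavariate takes (shape, scale); the mean is shape * scale.
    # Capping at 1 keeps fitness non-negative (an assumption of this sketch).
    return min(rng.gammavariate(shape, mean_s / shape), 1.0)

def poisson(lam, rng):
    """Knuth's Poisson sampler; adequate for small per-sequence mutation rates."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def next_generation(gene_fitness, mu, mean_s, shape, rng):
    """One Wright-Fisher generation for 2N gene copies (N diploid individuals).

    gene_fitness[i] is the multiplicative fitness of gene copy i; copies 2k and
    2k+1 form individual k. Returns (new_fitness, parent), where parent[j] is
    the index of the gene copy in the previous generation that j descends from.
    """
    n_genes = len(gene_fitness)
    n_ind = n_genes // 2
    # Individual fitness is the average of its two genes (no dominance).
    ind_fit = [(gene_fitness[2 * k] + gene_fitness[2 * k + 1]) / 2
               for k in range(n_ind)]
    # Parents are sampled in proportion to individual fitness.
    chosen = rng.choices(range(n_ind), weights=ind_fit, k=n_genes)
    new_fitness, parent = [], []
    for k in chosen:
        src = 2 * k + rng.randrange(2)        # one of the parent's two genes
        w = gene_fitness[src]
        for _ in range(poisson(mu, rng)):     # new mutations on this copy
            w *= 1.0 - draw_effect(mean_s, shape, rng)
        new_fitness.append(w)
        parent.append(src)
    # Soft selection: rescale so that mean fitness is exactly one.
    mean_w = sum(new_fitness) / n_genes
    return [w / mean_w for w in new_fitness], parent
```

Storing the `parent` list from every generation is enough to trace any sampled gene copy back to the founding generation, which is how a genealogy could be rebuilt from such a run.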
At this point, it would not have been appropriate to end the run; the distribution of all statistics would have been conditional on the fact that coalescence of the entire population had just occurred. Therefore, the population was run for an additional 7.5N generations to eliminate any conditionality. We do not know of any theory that predicts the necessary time to eliminate this sort of conditionality. We used 7.5N generations after coalescence because, at this level, simulations of the neutral case achieved reasonable accordance with the neutral expectations for all statistics (data not shown).

At the end of each run, n sequences were randomly sampled from the population. For this sample, a genealogy was constructed from the ancestral information stored with each sequence. Three statistics were recorded to summarize the shape of the simulated trees: (1) the sum of the external branch lengths, L_n; (2) the total tree length, J_n; and (3) the time to the most recent common ancestor (MRCA). Three mutation-based statistics were also calculated: (1) the total number of segregating sites, S; (2) nucleotide diversity, π; and (3) the number of external mutations, η_e. External mutations are defined as mutations occurring on branches that connect to the tips of the genealogy. For large sample sizes, the number of external mutations is generally equal to or very close to the number of singletons (i.e., the number of polymorphic sites for which the less frequent nucleotide is represented only once in the sample).

We conducted power analyses for Tajima's (1989) D test and Fu and Li's (1993) D, D*, F, and F* tests. For Tajima's D test, we used the critical values of Simonsen, Churchill, and Aquadro (1995), which are contingent on the observed number of segregating sites. Fu and Li's (1993) D and F tests require a single out-group sequence to determine character polarity at segregating sites.
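As an illustration of the mutation-based statistics named above, the following Python sketch computes S, π, and the number of derived singletons from a sample coded as rows of 0/1 alleles (1 = derived), and then Tajima's (1989) D from its standard published formula. The data layout and function names are assumptions of this sketch, not the authors' code. With ancestral states known and an infinitely many sites model, derived singletons coincide with external mutations.

```python
import math

def summary_stats(haplotypes):
    """Return (S, pi, singletons) for rows of 0/1 alleles (1 = derived)."""
    n = len(haplotypes)
    counts = [sum(col) for col in zip(*haplotypes)]   # derived count per site
    seg = [c for c in counts if 0 < c < n]            # segregating sites only
    s_sites = len(seg)
    # Each site contributes c*(n-c) differing pairs out of n*(n-1)/2 pairs.
    pi = sum(2 * c * (n - c) for c in seg) / (n * (n - 1))
    singletons = sum(1 for c in seg if c == 1)        # derived allele seen once
    return s_sites, pi, singletons

def tajima_d(n, s_sites, pi):
    """Tajima's D; returns nan when there are no segregating sites."""
    if s_sites == 0:
        return math.nan
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * s_sites + e2 * s_sites * (s_sites - 1)
    return (pi - s_sites / a1) / math.sqrt(variance)
```

A sample in which every segregating site is a singleton gives a negative D, which is the direction expected when mutations are concentrated on external branches.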
However, because we kept track of the history of new mutations, character polarity was implicit in the information stored by the program. Thus, we did not need to simulate a sister population to generate an out-group sequence.

We ran 1,000 simulations for each combination of parameters: mutation rate, mean fitness effect of mutations, and shape parameter. Unless otherwise noted, simulations were run with a sample size of n = 50 and a diploid population size of N = 250. The simulations were written in the C programming language; copies of the source code are available upon request from the corresponding author. For a random number generator, we used the function ran1 from Press et al. (1992, p. 280).

To assess the realism of our simulations, we ran simulations with solely neutral mutations and compared these with the neutral expectations. The means of all the mutation-based and tree-based summary statistics we considered were within 2.5 standard errors of their respective neutral expectations (table 2); most were actually much closer than this. The most substantial difference between our simulations and the neutral coalescent was that the standard deviations of many of the statistics were somewhat lower than expected. This is possibly because we only ran our simulations 7.5N generations after coalescence, and therefore we might have missed a few very extreme events.

To determine whether our results were scalable to larger populations with smaller mutation rates and mean selective effects, we also ran simulations with variable population sizes but with constant values of 4Nμ and 2Ns̄. Consistent with diffusion theory (Crow and Kimura 1970; Sawyer and Hartl 1992), the neutral coalescent (Kingman 1982; Hudson 1990), and similar simulations (e.g., McVean and Charlesworth 2000), we found that the actual population size is unimportant when the mutation rate, selection coefficient, and time are scaled by N (table 3). Over the range of diploid population sizes we considered (N = 125, 500), all the mutational statistics (S, η_e, and π) and their standard deviations were virtually constant. Also, when branch length is scaled by N, all the tree statistics were constant.

Table 2
A Comparison of Average Mutation and Tree Statistics with Their Neutral Expectations for Simulations Without Selection

MUTATION STATISTICS
            4Nμ = 1                   4Nμ = 5                   4Nμ =
            Observed    Expected      Observed    Expected      Observed    Expected
S (a)       (2.327)     (2.471)       (7.574)     (7.938)       (13.270)    (14.397)
η_e (b)     (1.114)     1 (1.052)     (2.872)     5 (2.771)     (4.360)     (4.551)
π (c)       (0.722)     1 (0.761)     (2.561)     5 (2.743)     (4.839)     (5.160)

TREE STATISTICS
            Observed        Expected
J_n (b)     4,445 (1192)    4,497 (1278)
L_n (b)     1,019 (239)     1,000 (278)
MRCA (d)    970 (498)       980 (538)

NOTE. For all simulations, N = 250 and n = 50. Values in parentheses are simulated standard deviations over 1,000 replicates or expected standard deviations.
(a) Neutral expectations from Watterson (1975).
(b) Neutral expectation from Fu and Li (1993).
(c) Neutral expectation from Tajima (1983).
(d) Neutral expectation from Hudson (1990).

Results and Discussion

Tree Shape Versus Distribution of Mutations

The results for both tree and mutation statistics are presented in table 4. Among mutation statistics, selection had the largest effect on nucleotide diversity; for the larger values of β (i.e., symmetrical distributions of fitness effects with lower variance), the average nucleotide diversity was reduced by almost -fold for every -fold increase in the strength of selection. Selection also had a strong effect on the number of segregating sites.
The number of external mutations was not substantially affected except in the case of strong selection. In contrast to the mutation statistics, selection only had a moderate effect on the tree statistics, which is consistent with the single-locus results of Golding (1997), Neuhauser and Krone (1997), Przeworski, Charlesworth, and Wall (1999), and Slade (2000). The total tree length and MRCA time were affected the most by intermediate strengths of selection. But this departure was minor relative to the stochasticity inherent to genealogies. No tree statistic strayed from its neutral expectation by much more than one neutral standard deviation.

Table 3
Average Tree-Based and Mutation-Based Statistics for Simulations with Constant Values of 4Nμ and 2Ns̄ but with Variable Population Sizes

N       J_n/N         L_n/N         MRCA/N        S             η_e           π
Neutral
125     (4.99)        3.95 (1.66)   3.84 (2.11)   (13.45)       9.97 (5.31)   9.79 (5.06)
250     (4.86)        3.95 (1.54)   3.65 (2.02)   (13.59)       9.75 (4.80)   9.78 (5.15)
500                                               (14.79)       .20 (4.82)    .07 (5.58)
2N,
125     (3.06)        3.69 (1.26)   2.44 (1.18)   (5.17)        7.65 (3.39)   3.23 (1.37)
250     (2.98)        3.67 (1.27)   2.43 (1.15)   (5.20)        7.40 (3.43)   3.15 (1.36)
500                                               (5.22)        7.55 (3.16)   3.16 (1.43)
2N,
125     (2.65)        3.68 (1.15)   2.12 (0.97)   (4.41)        7.31 (3.30)   2.19 (0.87)
250     .28 (2.55)    3.60 (1.)     2.06 (0.90)   (4.15)        7.22 (3.08)   2. (0.81)
500                                               (4.23)        7.24 (3.29)   2.19 (0.90)

NOTE. For all simulations, 4N and n = 25. Values in parentheses are standard deviations over 1,000 replicates for N 125 and 250, and 0 replicates for N 500. Tree-based statistics are not given for N = 500 because, with such a large population, tracking ancestry was too computationally intensive.

Table 4
Average Tree-Based and Mutation-Based Statistics for Simulations with Variable Mean Selective Coefficients and Variable Distributions of Fitness Effects

2N      J_n             L_n           MRCA        S             η_e           π
        ,411 (1,202)    1,009 (313)   944 (494)   (12.92)       .12 (4.43)    9.28 (4.54)
        4,349 (1,119)   1,001 (309)   925 (455)   (12.36)       9.90 (4.41)   9.05 (4.22)
        4,439 (1,178)   1,005 (320)   967 (506)   (12.89)       9.89 (4.37)   9.37 (4.48)
        4,169 (1,032)   1,015 (330)   850 (408)   (.72)         9.92 (4.66)   7.62 (3.37)
        4,185 (1,035)   997 (286)     860 (414)   (.89)         9.77 (4.16)   7.68 (3.51)
        4,2 (1,080)     1,009 (307)   861 (431)   (.84)         .05 (4.31)    7.62 (3.45)
        3,789 (836)     964 (260)     704 (326)   (7.66)        9.32 (3.76)   5.40 (2.20)
        3,767 (857)     992 (290)     700 (332)   (7.81)        9.41 (3.87)   5.02 (2.11)
        3,757 (822)     992 (303)     694 (308)   (7.84)        9.47 (4.08)   4.98 (2.01)
        3,505 (784)     973 (262)     630 (293)   (5.91)        8.68 (3.37)   3.18 (1.29)
        3,320 (679)     963 (253)     564 (249)   (4.96)        8.35 (3.17)   2.28 (0.86)
        3,202 (643)     960 (249)     523 (235)   (4.96)        8.41 (3.30)   2.07 (0.80)
        3,539 (840)     959 (287)     670 (329)   (4.26)        6.93 (2.89)   1.71 (0.75)
        3,460 (821)     944 (260)     661 (337)   .49 (3.38)    6.44 (2.73)   0.74 (0.29)
        3,512 (858)     938 (264)     687 (347)   .35 (3.33)    6.61 (2.69)   0.69 (0.28)
        3,823 (981)     971 (288)     781 (402)   8.15 (2.83)   4.79 (2.19)   0.76 (0.39)
        4,041 (43)      950 (283)     859 (441)   4.61 (2.24)   3.78 (1.99)   0.23 (0.12)
        4,090 (18)      971 (301)     882 (468)   4.35 (2.13)   3.68 (1.95)   0.21 (0.11)

NOTE. For all simulations, N = 250, 4N, and n = 50. Values in parentheses are standard deviations over 1,000 replicates.

A straightforward way to describe the shift in the distribution of mutations is by contrasting the average proportion of the tree length composed of external branches (L_n/J_n) with the average proportion of new mutations that are external (η_e/S). For neutral mutations, these two ratios are expected to be equal; η_e/S > L_n/J_n indicates that the distribution of mutations has shifted toward the tips of the external branches, and η_e/S < L_n/J_n indicates a shift toward the root of the tree.
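The neutral benchmark for these two ratios can be sketched from the standard neutral coalescent expectations cited in table 2 (Watterson 1975; Tajima 1983; Hudson 1990; Fu and Li 1993). The Python below is only an illustration: the function and key names are hypothetical, times are measured in generations for a diploid population of size N, and `theta` stands for the scaled mutation rate 4Nμ. Note that it reports a ratio of expectations, which need not equal the average of per-replicate ratios obtained from simulations.

```python
def harmonic(n_sample):
    """a_n = 1 + 1/2 + ... + 1/(n-1) for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n_sample))

def neutral_expectations(n_diploid, n_sample, theta):
    """Neutral expectations for the tree and mutation statistics of table 1."""
    a_n = harmonic(n_sample)
    four_n = 4 * n_diploid
    return {
        "J_n": four_n * a_n,                    # total tree length
        "L_n": four_n,                          # summed external branch length
        "MRCA": four_n * (1 - 1 / n_sample),    # time to the MRCA
        "S": theta * a_n,                       # segregating sites
        "eta_e": theta,                         # external mutations
        "pi": theta,                            # pairwise differences
        "external_fraction": 1 / a_n,           # E[L_n]/E[J_n] = E[eta_e]/E[S]
    }
```

For N = 250 and n = 50 this gives an expected external branch length of 1,000 generations and an expected MRCA time of 980 generations, and the same value of `external_fraction` applies to both ratios, which is the sense in which they are expected to be equal under neutrality.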
We find that L_n/J_n changes very little with increasing levels of selection for all distributions of mutational fitness effects. The largest, although still modest, departure from the neutral value occurs at intermediate strengths of selection with the higher values of β and 4Nμ (fig. 2). In sharp contrast, η_e/S changes markedly with increasing strengths of selection. With 4N and 2N 0, more than 80% of all mutations are external, compared with just 24% in the neutral case (fig. 2). Clearly, purifying selection at multiple sites can cause a major shift in the distribution of mutations toward the tips of the tree.

FIG. 2. The average proportion of tree length composed of external branches (circles) and the average proportion of segregating mutations that are external (squares) for different mutation rates and distributions of mutational effects.

The shift in the distribution of mutations on the tree is particularly relevant to the large number of coalescent estimation procedures that have been developed over the last decade. All these methods assume that observed segregating mutations are selectively neutral, and applying these methods to constrained regions of the genome could lead to considerable bias. Methods have been developed to estimate migration rate (Slatkin and Maddison 1989; Nath and Griffiths 1996; Wakeley 1998; Beerli and Felsenstein 1999), effective population size (Orive 1993; Li and Fu 1994; Kuhner, Yamato, and Felsenstein 1995), population growth rate (Kuhner, Yamato, and Felsenstein 1998), divergence time for isolated populations (Wakeley and Hey 1997), ancestral population size (Wakeley and Hey 1997), admixture proportions (Bertorelle and Excoffier 1998), and the per-site recombination rate (Hey and Wakeley 1997; Kuhner, Yamato, and Felsenstein 2000). Wakeley and Hey's (1997) method for estimating divergence time provides a good example of how weak purifying selection could bias coalescent estimators.
They divide the total number of segregating sites into (1) sites with polymorphisms shared by both populations, (2) sites polymorphic in only one population, and (3) fixed differences between populations. If one assumes no homoplasy, shared polymorphisms must have arisen in the ancestral population, whereas the other two classes of segregating sites could have arisen before or after divergence. Our results indicate that weak purifying selection can cause a substantial reduction in the number of old mutations, which, in this example, would lead to a lower than expected estimate for the number of shared polymorphisms and would have a lesser effect on other classes of polymorphism. Using Wakeley and Hey's (1997) method, weak purifying selection against slightly deleterious mutations could lead to a substantial overestimate of divergence time. This is just one example of how selection is an important source of bias for studies that use genealogical information from constrained regions of the genome.

Evolutionary studies of deleterious mutation generally focus on either very slightly deleterious mutation (s 1/2N; e.g., Ohta 1973) or strongly deleterious mutation (s 1%; e.g., Charlesworth B, Morgan, and Charlesworth D 1993), while paying fairly little attention to the transition between the two. Our results indicate that some predictions for the effect of strongly deleterious mutations hold for even relatively weak selection. Studies of background selection (Charlesworth B, Morgan, and Charlesworth D 1993; Hudson and Kaplan 1994, 1995) predict that the effect of strong selection against recurrent deleterious mutation should be roughly equivalent to the effect of a reduction in effective population size. Specifically, for an equal-effects mutation model, Charlesworth B, Morgan, and Charlesworth D (1993, Eq. 4) predict that the expected length of each branch should be equal to the neutral branch length multiplied by a factor of $f_0 = e^{-\mu/(2h\bar{s})}$, where h is the dominance coefficient (for our simulations, h = 1/2). For 2N 30 and 2N 0, our results agree reasonably well with the background selection predictions (fig. 3), especially for mutation models with low variance (the background selection predictions are based on an equal-effects mutation model). Although 2N 30 and 2N 0 are strong in the context of our simulations, these values represent unmeasurable strengths of selection for a reasonably large effective population size. For example, if N 0,000, the selection coefficients are and 0.001, respectively.

FIG. 3. Tree length as a function of the strength of selection for different distributions of mutational effects. The hatched line is the neutral expectation, and the solid line is the background selection expectation.
The consequence of variation in mutational fitness effects is rather complex. The relative impact of different mutation models depends on the strength of selection. For example, tree length was affected the most under the equal-effects mutation model (β = ∞) for weak and intermediate selection (fig. 3). But for strong selection, tree length was most affected under an exponential distribution of mutational effects (β = 1). We suggest that, relative to the equal-effects case, increasing the variance in the distribution of mutational effects is analogous to a simultaneous reduction in selective coefficient and population size. For distributions with high variance (low β), a large class of mutations will be so strongly selected against that they can contribute little to polymorphism. This class of mutations will have an effect analogous to a reduction in population size (Charlesworth B, Morgan, and Charlesworth D 1993; Hudson and Kaplan 1994, 1995). In addition, among mutations that contribute to polymorphism, weakly selected mutations will be disproportionately represented in the population. Consequently, this class of mutations will disproportionately affect tree statistics, which is analogous to a reduction in mean fitness effect.

Interference Between Sites

Sawyer and Hartl (1992) and Hartl, Moriyama, and Sawyer (1994) develop predictions for the frequency distribution of segregating sites subject to selection under the assumption of free recombination. We can compare these predictions to our no-recombination results to explore the effect of interference. Sawyer and Hartl (1992) find that the limiting density function for population mutant frequencies is

$$f(x) = 4N\mu \, \frac{1 - e^{4N\bar{s}(1-x)}}{1 - e^{4N\bar{s}}} \, \frac{1}{x(1-x)}.$$

They used this result to derive the expected number of segregating sites in a sample of size n:

$$H(n) = 4N\mu \int_0^1 \frac{1 - x^n - (1-x)^n}{x(1-x)} \, \frac{1 - e^{4N\bar{s}(1-x)}}{1 - e^{4N\bar{s}}} \, dx.$$

Also, Hartl, Moriyama, and Sawyer (1994) expanded this result to the entire frequency distribution of segregating sites, represented as the expected number of sites with new mutations shared by r sequences in a sample of size n:

$$M(r, n) = 4N\mu \int_0^1 \binom{n}{r} \frac{x^r (1-x)^{n-r}}{x(1-x)} \, \frac{1 - e^{4N\bar{s}(1-x)}}{1 - e^{4N\bar{s}}} \, dx.$$

All these results assume that all mutations have equal fitness effects, and H(50), H(2), and M(1,50) are directly comparable to our equal-effects (β = ∞) simulation results for S, π, and η_e.

Free-recombination expectations for the frequency distribution of segregating sites with variable mutational fitness effects can be achieved through a simple modification to Sawyer and Hartl's (1992) results. Namely, the limiting density function for population mutant frequencies is simply the equal-effects density function f of selective effect s multiplied by the distribution of mutational fitness effects (in our case, the gamma distribution g), then integrated over all possible s:

$$f^*(x) = 4N\mu \int_0^1 g(s; \alpha, \beta) \, \frac{1 - e^{4Ns(1-x)}}{1 - e^{4Ns}} \, \frac{1}{x(1-x)} \, ds.$$

The arguments of Sawyer and Hartl (1992) and Hartl, Moriyama, and Sawyer (1994) which lead from the population frequency density to the distribution of segregating sites in a sample apply similarly here. Therefore, the expected number of segregating sites is

$$H^*(n) = 4N\mu \int_0^1 \!\! \int_0^1 g(s; \alpha, \beta) \, \frac{1 - x^n - (1-x)^n}{x(1-x)} \, \frac{1 - e^{4Ns(1-x)}}{1 - e^{4Ns}} \, ds \, dx$$

and the frequency distribution of segregating sites is

$$M^*(r, n) = 4N\mu \int_0^1 \!\! \int_0^1 g(s; \alpha, \beta) \binom{n}{r} \frac{x^r (1-x)^{n-r}}{x(1-x)} \, \frac{1 - e^{4Ns(1-x)}}{1 - e^{4Ns}} \, ds \, dx.$$

H*(50), H*(2), and M*(1,50) can be compared with our results for S, π, and η_e with 1 and.

A comparison of our results with the free-recombination expectations is shown in figure 4. For all selection coefficients, interference increased the level of observed variation, reflected in S, π, and η_e. In contrast, under a reversible mutation, finite-sites model, McVean and Charlesworth (2000, Fig. 3) find that interference decreases standing variation for weak selection. Further, McVean and Charlesworth (2000) reasoned that interference should have the largest effect with weak and intermediate strengths of selection; strongly deleterious mutations should not be appreciably affected because they are maintained at such low frequencies, reducing the opportunity for negative linkage disequilibrium. For the situation we consider, this reasoning depends on whether the effect of interference is considered relatively or absolutely.

FIG. 4. The relative and absolute effects of interference on S, π, and η_e. In the top row of graphs, the magnitude of interference is represented as the ratio of the no-recombination mean divided by the free-recombination expectation. For this representation, interference increases monotonically with the strength of selection. In the bottom row, interference is represented as the absolute difference between the no-recombination mean and the free-recombination expectation. For this representation, interference is strongest for intermediate strengths of selection.
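The equal-effects free-recombination expectations H(n) and M(r, n) can be evaluated numerically. The Python below is a minimal sketch and rests on assumptions: `theta` stands for 4Nμ and `a` for 4Ns̄ with s̄ taken as a positive magnitude for a deleterious effect, the function names are hypothetical, and a simple midpoint rule replaces whatever integration method the authors used. Setting `a = 0` gives the neutral limit, in which the selection term reduces to (1 - x).

```python
import math

def selection_weight(x, a):
    """(1 - e^{a(1-x)}) / (1 - e^{a}); tends to (1 - x) as a -> 0.

    math.expm1 overflows for very large a (roughly a > 700).
    """
    if a == 0.0:
        return 1.0 - x
    return math.expm1(a * (1.0 - x)) / math.expm1(a)

def integrate(func, steps=20000):
    """Midpoint rule on (0, 1); the endpoints are never evaluated."""
    h = 1.0 / steps
    return h * sum(func((i + 0.5) * h) for i in range(steps))

def expected_segregating(n, theta, a):
    """H(n): expected number of segregating sites in a sample of size n."""
    def integrand(x):
        sampled = (1.0 - x**n - (1.0 - x)**n) / (x * (1.0 - x))
        return sampled * selection_weight(x, a)
    return theta * integrate(integrand)

def expected_shared(r, n, theta, a):
    """M(r, n): expected number of sites whose mutation is carried by r of n."""
    def integrand(x):
        # x^r (1-x)^(n-r) / (x (1-x)), simplified to avoid dividing by x.
        return x**(r - 1) * (1.0 - x)**(n - r - 1) * selection_weight(x, a)
    return theta * math.comb(n, r) * integrate(integrand)
```

Two checks follow from the formulas themselves: with `a = 0` the functions recover the neutral values θ·a_n for H(n) and θ/r for M(r, n), and summing M(r, n) over r = 1, ..., n-1 reproduces H(n). In this notation, H(2) corresponds to π and M(1, n) to η_e.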
When the effect of interference is represented as the ratio of the observed mean divided by the free-recombination expectation, as in McVean and Charlesworth (2000, Figs. 1–3), we found that the magnitude of interference increased monotonically with the strength of selection. It is unclear whether this trend would continue for stronger selection. However, when the effect of interference is represented as the absolute difference between the observed mean and the free-recombination expectation, interference had the maximum effect for weak to intermediate levels of selection.

Statistical Power

For most parameter combinations, the statistical power of all tests considered was quite low (fig. 5). But, for intermediate and strong selection with relatively high mutation rates, Fu and Li's (1993) and Tajima's (1989) tests achieved good power. For a two-tailed test at the 95% confidence level, power was as high as 80%. It should be noted, however, that this power estimation is based on the assumption that all mutations segregating in the sample are at least slightly deleterious. If neutral mutations were added, statistical power could be diluted. Because purifying selection has only a minor effect on the shape of the tree, neutral mutations would continue to look neutral even though they were linked to deleterious mutations; i.e., with close to the neutral expectation for nucleotide diversity, number of segregating sites, and number of external mutations. Also, because deleterious mutations are generally maintained at low frequencies, they would be underrepresented in a sample of sequences with both neutral and deleterious mutations. For example, in our simulations, statistical power was the highest for 4N and 2N 30. For these parameters, if neutral mutations occurred at the same rate, then, extrapolating from tree statistics, one would expect roughly 77% of all mutations segregating in the sample to be neutral. Therefore, the evolutionary signal caused by purifying selection at multiple sites could be partially obscured by neutral mutation.

FIG. 5. Statistical power to detect purifying selection for Tajima's D test and Fu and Li's D, D*, F, and F* tests. Statistical power is measured as the fraction, out of 1,000 simulated samples, of two-tailed tests rejecting neutrality for P < 0.05. For all simulations, N = 250. The different graphs show results for different mutation models, 1,, and, and for different scaled mutation rates, 4Nμ = 3 and .

Conclusions

Our most important result is that purifying selection at multiple sites has a much stronger effect on the distribution of mutations than on the shape of the genealogy. Although previous studies have presented heuristic arguments that weak purifying selection should shift the distribution of mutations toward the tips of the genealogy (e.g., Fu and Li 1993; Akashi 1999), to our knowledge, ours is the first study to quantify this shift. The shift in the distribution of mutations could considerably bias the many coalescent estimators when applied to constrained regions of the genome. This potential bias is particularly important in the light of studies that suggest that most nonsynonymous mutations (e.g., Fay, Wyckoff, and Wu 2001) and many synonymous mutations (reviewed in Sharp et al. 1995; Akashi and Eyre-Walker 1998) are subject to purifying selection.

Acknowledgments

We would like to thank John Kelly for many invaluable discussions and suggestions on the manuscript. We also thank Brian Golding, Peter Waddell, and one anonymous reviewer for helpful comments on the manuscript. This work was supported by a National Science Foundation Predoctoral Fellowship to S.W. and by a University of Kansas New Faculty General Research Fund Grant and National Science Foundation Grant DEB to M.E.O.

LITERATURE CITED

AKASHI, H. Within- and between-species DNA sequence variation and the footprint of natural selection. Gene 238:
AKASHI, H., and A. EYRE-WALKER. Translational selection and molecular evolution. Curr. Opin. Genet. Dev. 8:
BEERLI, P., and J. FELSENSTEIN. Maximum-likelihood estimation of migration rates and effective population numbers in two populations using a coalescent approach. Genetics 152:
BERTORELLE, G., and L. EXCOFFIER. Inferring admixture proportions from molecular data. Mol. Biol. Evol. 15:
BULMER, M. G. The selection-mutation-drift theory of synonymous codon usage. Genetics 129:
CHARLESWORTH, B., M. T. MORGAN, and D. CHARLESWORTH. The effect of deleterious mutation on neutral molecular variation. Genetics 134:
CROW, J. F., and M. KIMURA. An introduction to population genetics theory. Burgess, Minneapolis, Minn.
FAY, J. C., G. J. WYCKOFF, and C. I. WU. Positive and negative selection on the human genome. Genetics 158:
FU, Y. X. Statistical properties of segregating sites. Theor. Popul. Biol. 48:
FU, Y. X., and W. H. LI. Statistical test of neutrality of mutations. Genetics 133:
FU, Y. X., and W. H. LI. Coalescing into the 21st century: an overview and prospects of coalescent theory. Theor. Popul. Biol. 56:1.
GOLDING, G. B. The effect of purifying selection on genealogies. Pp. in P. DONNELLY and S. TAVARE, eds. Progress in population genetics and human evolution, Vol. 87. IMA volumes in mathematics and its applications. Springer-Verlag, New York.
GOLDING, G. B., C. F. AQUADRO, and C. H. LANGLEY. Sequence evolution within populations under multiple types of mutation. Proc. Natl. Acad. Sci. USA 83:
HARTL, D. L., E. N. MORIYAMA, and S. A. SAWYER. Selection intensity for codon bias. Genetics 138:
HEY, J., and J. WAKELEY. A coalescent estimator of the population recombination rate. Genetics 145:
HILL, W. G., and A. ROBERTSON. The effect of linkage on limits to artificial selection. Genet. Res. 8:
HUDSON, R. R. Gene genealogies and the coalescent process. Oxf. Surv. Evol. Biol. 1:1–14.
HUDSON, R. R., and N. L. KAPLAN. Gene trees with background selection. Pp. in B. GOLDING, ed. Non-neutral evolution: theories and molecular data. Chapman and Hall, New York.
HUDSON, R. R., and N. L. KAPLAN. The coalescent process and background selection. Philos. Trans. R. Soc. Lond. B 349:
KAPLAN, N. L., T. DARDEN, and R. R. HUDSON. The coalescent process in models with selection. Genetics 120:
KELLY, J. K. A test of neutrality based on interlocus associations. Genetics 146:
KELLY, J. K., and M. J. WADE. Molecular evolution near a two-locus balanced polymorphism. J. Theor. Biol. 204:
KINGMAN, J. F. C. The coalescent. Stochastic Process. Appl. 13:
KRONE, S. K., and C. NEUHAUSER. Ancestral processes with selection. Theor. Popul. Biol. 51:
KUHNER, M. K., J. YAMATO, and J. FELSENSTEIN. Estimating effective population size and mutation rate from sequence data using Metropolis-Hastings sampling. Genetics 140:
KUHNER, M. K., J. YAMATO, and J. FELSENSTEIN. Maximum likelihood estimation of population growth rates based on the coalescent. Genetics 149:
KUHNER, M. K., J. YAMATO, and J. FELSENSTEIN. Maximum likelihood estimation of recombination rates from population data. Genetics 156:
LI, W.-H., and Y.-X. FU. Estimation of population parameters and detection of natural selection from DNA sequences. Pp. in B. GOLDING, ed. Non-neutral evolution: theories and molecular data. Chapman and Hall, New York.
MCVEAN, G. A. T., and B. CHARLESWORTH. The effects of Hill-Robertson interference between weakly selected mutations on patterns of molecular evolution and variation. Genetics 155:
MULLER, H. J. The relation of recombination to mutational advance. Mutat. Res. 1:2–9.
NATH, H., and R. GRIFFITHS. Estimation in an island model using simulation. Theor. Popul. Biol. 50:
NEUHAUSER, C., and S. K. KRONE. The genealogy of samples in models with selection. Genetics 145:
OHTA, T. Slightly deleterious mutant substitutions in evolution. Nature 246:
ORIVE, M. E. Effective population size in organisms with complex life-histories. Theor. Popul. Biol. 44:
PRESS, W. H., B. P. FLANNERY, S. A. TEUKOLSKY, and W. T. VETTERLING. Numerical recipes in C: the art of scientific computing. Cambridge University Press, Cambridge, U.K.
PRZEWORSKI, M., B. CHARLESWORTH, and J. D. WALL. Genealogies and weak purifying selection. Mol. Biol. Evol. 16:
SAWYER, S. A., and D. L. HARTL. Population genetics of polymorphism and divergence. Genetics 132:
SHARP, P. M., M. AVEROF, A. T. LLOYD, G. MATASSI, and J. F. PEDEN. DNA sequence evolution: the sounds of silence. Philos. Trans. R. Soc. Lond. B 349:
SIMONSEN, K. L., G. A. CHURCHILL, and C. F. AQUADRO. Properties of statistical tests of neutrality for DNA polymorphism data. Genetics 141:
SLADE, P. F. Simulation of selected genealogies. Theor. Popul. Biol. 57:
SLATKIN, M., and W. P. MADDISON. A cladistic measure of gene flow inferred from the phylogenies of alleles. Genetics 123:
TACHIDA, H. Molecular evolution in a multisite nearly neutral model. J. Mol. Evol. 50:
TAJIMA, F. Evolutionary relationship of DNA sequences in finite populations. Genetics 5:
TAJIMA, F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:
WAKELEY, J. Segregating sites in Wright's island model. Theor. Popul. Biol. 53:
WAKELEY, J., and J. HEY. Estimating ancestral population parameters. Genetics 145:
WATTERSON, G. A. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7:

BRIAN GOLDING, reviewing editor

Accepted April 24, 2002

I of a gene sampled from a randomly mating popdation,

I of a gene sampled from a randomly mating popdation, Copyright 0 1987 by the Genetics Society of America Average Number of Nucleotide Differences in a From a Single Subpopulation: A Test for Population Subdivision Curtis Strobeck Department of Zoology, University

More information

Genetic Variation in Finite Populations

Genetic Variation in Finite Populations Genetic Variation in Finite Populations The amount of genetic variation found in a population is influenced by two opposing forces: mutation and genetic drift. 1 Mutation tends to increase variation. 2

More information

Gene Genealogies Coalescence Theory. Annabelle Haudry Glasgow, July 2009

Gene Genealogies Coalescence Theory. Annabelle Haudry Glasgow, July 2009 Gene Genealogies Coalescence Theory Annabelle Haudry Glasgow, July 2009 What could tell a gene genealogy? How much diversity in the population? Has the demographic size of the population changed? How?

More information

Estimating effective population size from samples of sequences: inefficiency of pairwise and segregating sites as compared to phylogenetic estimates

Estimating effective population size from samples of sequences: inefficiency of pairwise and segregating sites as compared to phylogenetic estimates Estimating effective population size from samples of sequences: inefficiency of pairwise and segregating sites as compared to phylogenetic estimates JOSEPH FELSENSTEIN Department of Genetics SK-50, University

More information

The Structure of Genealogies in the Presence of Purifying Selection: a "Fitness-Class Coalescent"

The Structure of Genealogies in the Presence of Purifying Selection: a Fitness-Class Coalescent The Structure of Genealogies in the Presence of Purifying Selection: a "Fitness-Class Coalescent" The Harvard community has made this article openly available. Please share how this access benefits you.

More information

Statistical Tests for Detecting Positive Selection by Utilizing High. Frequency SNPs

Statistical Tests for Detecting Positive Selection by Utilizing High. Frequency SNPs Statistical Tests for Detecting Positive Selection by Utilizing High Frequency SNPs Kai Zeng *, Suhua Shi Yunxin Fu, Chung-I Wu * * Department of Ecology and Evolution, University of Chicago, Chicago,

More information

Sequence evolution within populations under multiple types of mutation

Sequence evolution within populations under multiple types of mutation Proc. Natl. Acad. Sci. USA Vol. 83, pp. 427-431, January 1986 Genetics Sequence evolution within populations under multiple types of mutation (transposable elements/deleterious selection/phylogenies) G.

More information

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information # Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Details of PRF Methodology In the Poisson Random Field PRF) model, it is assumed that non-synonymous mutations at a given gene are either

More information

Major questions of evolutionary genetics. Experimental tools of evolutionary genetics. Theoretical population genetics.

Major questions of evolutionary genetics. Experimental tools of evolutionary genetics. Theoretical population genetics. Evolutionary Genetics (for Encyclopedia of Biodiversity) Sergey Gavrilets Departments of Ecology and Evolutionary Biology and Mathematics, University of Tennessee, Knoxville, TN 37996-6 USA Evolutionary

More information

SWEEPFINDER2: Increased sensitivity, robustness, and flexibility

SWEEPFINDER2: Increased sensitivity, robustness, and flexibility SWEEPFINDER2: Increased sensitivity, robustness, and flexibility Michael DeGiorgio 1,*, Christian D. Huber 2, Melissa J. Hubisz 3, Ines Hellmann 4, and Rasmus Nielsen 5 1 Department of Biology, Pennsylvania

More information

Selection and Population Genetics

Selection and Population Genetics Selection and Population Genetics Evolution by natural selection can occur when three conditions are satisfied: Variation within populations - individuals have different traits (phenotypes). height and

More information

Statistical Tests for Detecting Positive Selection by Utilizing. High-Frequency Variants

Statistical Tests for Detecting Positive Selection by Utilizing. High-Frequency Variants Genetics: Published Articles Ahead of Print, published on September 1, 2006 as 10.1534/genetics.106.061432 Statistical Tests for Detecting Positive Selection by Utilizing High-Frequency Variants Kai Zeng,*

More information

Solutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin

Solutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin Solutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin CHAPTER 1 1.2 The expected homozygosity, given allele

More information

How robust are the predictions of the W-F Model?

How robust are the predictions of the W-F Model? How robust are the predictions of the W-F Model? As simplistic as the Wright-Fisher model may be, it accurately describes the behavior of many other models incorporating additional complexity. Many population

More information

A comparison of two popular statistical methods for estimating the time to most recent common ancestor (TMRCA) from a sample of DNA sequences

A comparison of two popular statistical methods for estimating the time to most recent common ancestor (TMRCA) from a sample of DNA sequences Indian Academy of Sciences A comparison of two popular statistical methods for estimating the time to most recent common ancestor (TMRCA) from a sample of DNA sequences ANALABHA BASU and PARTHA P. MAJUMDER*

More information

The Combinatorial Interpretation of Formulas in Coalescent Theory

The Combinatorial Interpretation of Formulas in Coalescent Theory The Combinatorial Interpretation of Formulas in Coalescent Theory John L. Spouge National Center for Biotechnology Information NLM, NIH, DHHS spouge@ncbi.nlm.nih.gov Bldg. A, Rm. N 0 NCBI, NLM, NIH Bethesda

More information

arxiv: v2 [q-bio.pe] 26 May 2011

arxiv: v2 [q-bio.pe] 26 May 2011 The Structure of Genealogies in the Presence of Purifying Selection: A Fitness-Class Coalescent arxiv:1010.2479v2 [q-bio.pe] 26 May 2011 Aleksandra M. Walczak 1,, Lauren E. Nicolaisen 2,, Joshua B. Plotkin

More information

Australian bird data set comparison between Arlequin and other programs

Australian bird data set comparison between Arlequin and other programs Australian bird data set comparison between Arlequin and other programs Peter Beerli, Kevin Rowe March 7, 2006 1 Data set We used a data set of Australian birds in 5 populations. Kevin ran the program

More information

Estimating Evolutionary Trees. Phylogenetic Methods

Estimating Evolutionary Trees. Phylogenetic Methods Estimating Evolutionary Trees v if the data are consistent with infinite sites then all methods should yield the same tree v it gets more complicated when there is homoplasy, i.e., parallel or convergent

More information

Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles

Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles John Novembre and Montgomery Slatkin Supplementary Methods To

More information

Metapopulation models for historical inference

Metapopulation models for historical inference Molecular Ecology (2004) 13, 865 875 doi: 10.1111/j.1365-294X.2004.02086.x Metapopulation models for historical inference Blackwell Publishing, Ltd. JOHN WAKELEY Department of Organismic and Evolutionary

More information

SEQUENCE DIVERGENCE,FUNCTIONAL CONSTRAINT, AND SELECTION IN PROTEIN EVOLUTION

SEQUENCE DIVERGENCE,FUNCTIONAL CONSTRAINT, AND SELECTION IN PROTEIN EVOLUTION Annu. Rev. Genomics Hum. Genet. 2003. 4:213 35 doi: 10.1146/annurev.genom.4.020303.162528 Copyright c 2003 by Annual Reviews. All rights reserved First published online as a Review in Advance on June 4,

More information

Application of a time-dependent coalescence process for inferring the history of population size changes from DNA sequence data

Application of a time-dependent coalescence process for inferring the history of population size changes from DNA sequence data Proc. Natl. Acad. Sci. USA Vol. 95, pp. 5456 546, May 998 Statistics Application of a time-dependent coalescence process for inferring the history of population size changes from DNA sequence data ANDRZEJ

More information

Processes of Evolution

Processes of Evolution 15 Processes of Evolution Forces of Evolution Concept 15.4 Selection Can Be Stabilizing, Directional, or Disruptive Natural selection can act on quantitative traits in three ways: Stabilizing selection

More information

122 9 NEUTRALITY TESTS

122 9 NEUTRALITY TESTS 122 9 NEUTRALITY TESTS 9 Neutrality Tests Up to now, we calculated different things from various models and compared our findings with data. But to be able to state, with some quantifiable certainty, that

More information

Fitness landscapes and seascapes

Fitness landscapes and seascapes Fitness landscapes and seascapes Michael Lässig Institute for Theoretical Physics University of Cologne Thanks Ville Mustonen: Cross-species analysis of bacterial promoters, Nonequilibrium evolution of

More information

Estimating selection on non-synonymous mutations. Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh,

Estimating selection on non-synonymous mutations. Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Genetics: Published Articles Ahead of Print, published on November 19, 2005 as 10.1534/genetics.105.047217 Estimating selection on non-synonymous mutations Laurence Loewe 1, Brian Charlesworth, Carolina

More information

Lecture Notes: BIOL2007 Molecular Evolution

Lecture Notes: BIOL2007 Molecular Evolution Lecture Notes: BIOL2007 Molecular Evolution Kanchon Dasmahapatra (k.dasmahapatra@ucl.ac.uk) Introduction By now we all are familiar and understand, or think we understand, how evolution works on traits

More information

GENETICS - CLUTCH CH.22 EVOLUTIONARY GENETICS.

GENETICS - CLUTCH CH.22 EVOLUTIONARY GENETICS. !! www.clutchprep.com CONCEPT: OVERVIEW OF EVOLUTION Evolution is a process through which variation in individuals makes it more likely for them to survive and reproduce There are principles to the theory

More information

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Ziheng Yang Department of Biology, University College, London An excess of nonsynonymous substitutions

More information

7. Tests for selection

7. Tests for selection Sequence analysis and genomics 7. Tests for selection Dr. Katja Nowick Group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute for Brain Research www. nowicklab.info

More information

Effective population size and patterns of molecular evolution and variation

Effective population size and patterns of molecular evolution and variation FunDamental concepts in genetics Effective population size and patterns of molecular evolution and variation Brian Charlesworth Abstract The effective size of a population,, determines the rate of change

More information

The Wright-Fisher Model and Genetic Drift

The Wright-Fisher Model and Genetic Drift The Wright-Fisher Model and Genetic Drift January 22, 2015 1 1 Hardy-Weinberg Equilibrium Our goal is to understand the dynamics of allele and genotype frequencies in an infinite, randomlymating population

More information

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the

More information

Wright-Fisher Models, Approximations, and Minimum Increments of Evolution

Wright-Fisher Models, Approximations, and Minimum Increments of Evolution Wright-Fisher Models, Approximations, and Minimum Increments of Evolution William H. Press The University of Texas at Austin January 10, 2011 1 Introduction Wright-Fisher models [1] are idealized models

More information

Neutral behavior of shared polymorphism

Neutral behavior of shared polymorphism Proc. Natl. Acad. Sci. USA Vol. 94, pp. 7730 7734, July 1997 Colloquium Paper This paper was presented at a colloquium entitled Genetics and the Origin of Species, organized by Francisco J. Ayala (Co-chair)

More information

Neutral Theory of Molecular Evolution

Neutral Theory of Molecular Evolution Neutral Theory of Molecular Evolution Kimura Nature (968) 7:64-66 King and Jukes Science (969) 64:788-798 (Non-Darwinian Evolution) Neutral Theory of Molecular Evolution Describes the source of variation

More information

Contrasts for a within-species comparative method

Contrasts for a within-species comparative method Contrasts for a within-species comparative method Joseph Felsenstein, Department of Genetics, University of Washington, Box 357360, Seattle, Washington 98195-7360, USA email address: joe@genetics.washington.edu

More information

6 Introduction to Population Genetics

6 Introduction to Population Genetics Grundlagen der Bioinformatik, SoSe 14, D. Huson, May 18, 2014 67 6 Introduction to Population Genetics This chapter is based on: J. Hein, M.H. Schierup and C. Wuif, Gene genealogies, variation and evolution,

More information

Frequency Spectra and Inference in Population Genetics

Frequency Spectra and Inference in Population Genetics Frequency Spectra and Inference in Population Genetics Although coalescent models have come to play a central role in population genetics, there are some situations where genealogies may not lead to efficient

More information

LINKAGE DISEQUILIBRIUM, SELECTION AND RECOMBINATION AT THREE LOCI

LINKAGE DISEQUILIBRIUM, SELECTION AND RECOMBINATION AT THREE LOCI Copyright 0 1984 by the Genetics Society of America LINKAGE DISEQUILIBRIUM, SELECTION AND RECOMBINATION AT THREE LOCI ALAN HASTINGS Defartinent of Matheinntics, University of California, Davis, Calijornia

More information

6 Introduction to Population Genetics

6 Introduction to Population Genetics 70 Grundlagen der Bioinformatik, SoSe 11, D. Huson, May 19, 2011 6 Introduction to Population Genetics This chapter is based on: J. Hein, M.H. Schierup and C. Wuif, Gene genealogies, variation and evolution,

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

Population Genetics I. Bio

Population Genetics I. Bio Population Genetics I. Bio5488-2018 Don Conrad dconrad@genetics.wustl.edu Why study population genetics? Functional Inference Demographic inference: History of mankind is written in our DNA. We can learn

More information

MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS. Masatoshi Nei"

MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS. Masatoshi Nei MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS Masatoshi Nei" Abstract: Phylogenetic trees: Recent advances in statistical methods for phylogenetic reconstruction and genetic diversity analysis were

More information

Supporting Information

Supporting Information Supporting Information Hammer et al. 10.1073/pnas.1109300108 SI Materials and Methods Two-Population Model. Estimating demographic parameters. For each pair of sub-saharan African populations we consider

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium. November 12, 2012

Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium. November 12, 2012 Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium November 12, 2012 Last Time Sequence data and quantification of variation Infinite sites model Nucleotide diversity (π) Sequence-based

More information

Estimating the Distribution of Selection Coefficients from Phylogenetic Data with Applications to Mitochondrial and Viral DNA

Estimating the Distribution of Selection Coefficients from Phylogenetic Data with Applications to Mitochondrial and Viral DNA Estimating the Distribution of Selection Coefficients from Phylogenetic Data with Applications to Mitochondrial and Viral DNA Rasmus Nielsen* and Ziheng Yang *Department of Biometrics, Cornell University;

More information

Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate.

Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate. OEB 242 Exam Practice Problems Answer Key Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate. First, recall

More information

Lecture 18 - Selection and Tests of Neutrality. Gibson and Muse, chapter 5 Nei and Kumar, chapter 12.6 p Hartl, chapter 3, p.

Lecture 18 - Selection and Tests of Neutrality. Gibson and Muse, chapter 5 Nei and Kumar, chapter 12.6 p Hartl, chapter 3, p. Lecture 8 - Selection and Tests of Neutrality Gibson and Muse, chapter 5 Nei and Kumar, chapter 2.6 p. 258-264 Hartl, chapter 3, p. 22-27 The Usefulness of Theta Under evolution by genetic drift (i.e.,

More information

p(d g A,g B )p(g B ), g B

p(d g A,g B )p(g B ), g B Supplementary Note Marginal effects for two-locus models Here we derive the marginal effect size of the three models given in Figure 1 of the main text. For each model we assume the two loci (A and B)

More information

Expected coalescence times and segregating sites in a model of glacial cycles

Expected coalescence times and segregating sites in a model of glacial cycles F.F. Jesus et al. 466 Expected coalescence times and segregating sites in a model of glacial cycles F.F. Jesus 1, J.F. Wilkins 2, V.N. Solferini 1 and J. Wakeley 3 1 Departamento de Genética e Evolução,

More information

Population Genetics: a tutorial

Population Genetics: a tutorial : a tutorial Institute for Science and Technology Austria ThRaSh 2014 provides the basic mathematical foundation of evolutionary theory allows a better understanding of experiments allows the development

More information

How should we organize the diversity of animal life?

How should we organize the diversity of animal life? How should we organize the diversity of animal life? The difference between Taxonomy Linneaus, and Cladistics Darwin What are phylogenies? How do we read them? How do we estimate them? Classification (Taxonomy)

More information

The coalescent process

The coalescent process The coalescent process Introduction Random drift can be seen in several ways Forwards in time: variation in allele frequency Backwards in time: a process of inbreeding//coalescence Allele frequencies Random

More information

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?
