The effect of strong purifying selection on genetic diversity


Genetics: Early Online, published in May 2018 as 10.1534/genetics

Ivana Cvijović [1,4], Benjamin H. Good [2,4], and Michael M. Desai [1,3,4]

[1] Department of Organismic and Evolutionary Biology and FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138
[2] Departments of Physics and Bioengineering, University of California, Berkeley, CA 94720
[3] Department of Physics, Harvard University, Cambridge, MA 02138
[4] Kavli Institute for Theoretical Physics, University of California, Santa Barbara, CA 93106

Purifying selection reduces genetic diversity, both at sites under direct selection and at linked neutral sites. This process, known as background selection, is thought to play an important role in shaping genomic diversity in natural populations. Yet despite its importance, the effects of background selection are not fully understood. Previous theoretical analyses of this process have taken a backwards-time approach based on the structured coalescent. While they provide some insight, these methods are either limited to very small samples or are computationally prohibitive. Here, we present a new forward-time analysis of the trajectories of both neutral and deleterious mutations at a nonrecombining locus. We find that strong purifying selection leads to remarkably rich dynamics: neutral mutations can exhibit sweep-like behavior, and deleterious mutations can reach substantial frequencies even when they are guaranteed to eventually go extinct. Our analysis of these dynamics allows us to calculate analytical expressions for the full site frequency spectrum. We find that whenever background selection is strong enough to lead to a reduction in genetic diversity, it also results in substantial distortions to the site frequency spectrum, which can mimic the effects of population expansions or positive selection. Because these distortions are most pronounced in the low and high frequency ends of the spectrum, they become particularly important in larger samples, but may have small effects in smaller samples. We also apply our forward-time framework to calculate other quantities, such as the ultimate fates of polymorphisms or the fitnesses of their ancestral backgrounds.

Purifying selection against newly arising deleterious mutations is essential to preserving biological function. It is ubiquitous across all natural populations, and is responsible for genomic sequence conservation across long evolutionary timescales. In addition to preserving function at directly selected sites, negative selection also leaves signatures in patterns of diversity at linked neutral sites that have been observed in a wide range of organisms (Begun and Aquadro, 1992; Charlesworth, 1996; Comeron, 2014; Cutter and Payseur, 2003; Elyashiv et al., 2016; Flowers et al., 2012; McVicker et al., 2009). This process is known as background selection, and understanding its effects is essential to characterizing the evolutionary pressures that have shaped a population, as well as to distinguishing its effects from less ubiquitous events such as population expansions or the positive selection of new adaptive traits.

At a qualitative level, the effects of background selection are well-known: it reduces linked neutral diversity by reducing the number of individuals that are able to contribute descendants in the long run. Since individuals that carry strongly deleterious mutations cannot leave descendants on long timescales, all diversity that persists in the population must have arisen in individuals that were free of deleterious mutations. Since all of these individuals are equivalent in fitness, this suggests that diversity should resemble that expected in a neutral population of smaller size: specifically, with size equal to the number of mutation-free individuals (Charlesworth et al., 1993).
However, an extensive body of work has shown that this intuition is not correct, and that background selection against strongly deleterious mutations can lead to non-neutral distortions in diversity statistics (Charlesworth et al., 1993, 1995; Good et al., 2014; Gordo et al., 2002; Hudson and Kaplan, 1994; Nicolaisen and Desai, 2012; O'Fallon et al., 2010; Tachida, 2000; Walczak et al., 2012; Williamson and Orive, 2002). The reason for this is simple: even strong selection cannot purge deleterious alleles instantly. Instead, deleterious haplotypes persist in the population on short timescales, allowing neutral variants that arise on their backgrounds to reach modest frequencies. This is most readily apparent in statistics based on the site frequency spectrum (the number, $p(f)$, of polymorphisms which are at frequency $f$ in the population), such as the number of singletons or Tajima's $D$ (Tajima, 1989). As we show below, even when deleterious mutations have a strong effect on fitness, the site frequency spectrum shows an enormous excess of rare variants compared to the expectation for a neutral population of reduced effective size. These signatures in genetic diversity are qualitatively similar to those we expect from population expansions and positive selection (Keinan and Clark, 2012; Rannala, 1997; Sawyer and Hartl, 1992; Slatkin and Hudson, 1991). A detailed quantitative understanding of background selection is therefore essential if we are to disentangle its signatures from those of other evolutionary processes.

The traditional approach to analyzing the effects of purifying selection has been to use backwards-time approaches based on the structured coalescent (Hudson and Kaplan, 1994, 1988). This offers an approximate framework to model how background selection affects the statistics of genealogical histories of a sample, and hence the expected patterns of genetic diversity. The approximations underlying this method are valid when selection is sufficiently strong that deleterious mutations rarely fix (Neher and Shraiman, 2012), the same regime we will consider in this work. However, while these backwards-time structured coalescent methods make it possible to rapidly simulate genealogies, they are essentially numerical methods and do not lead to analytical predictions. Furthermore, they give limited intuition as to the conditions under which their approximations are valid. A more technical but crucial limitation is that they rapidly become very computationally demanding in larger samples. This is becoming an increasingly important problem as advances in sequencing technology now make it possible to study sample sizes of thousands (or even hundreds of thousands) of individuals. The poor scaling of coalescent methods with sample size is of particular importance in studying background selection: since purifying selection is expected to result in an excess of rare variants, its effects increase in magnitude as sample size increases. This can reveal deviations from neutrality in large samples that are not seen in smaller samples.

Copyright 2018.
Here, we use an alternative, forward-time approach to analyze how purifying selection affects patterns of genetic variation at a nonrecombining genomic segment. Our method is based on the observation that to predict single-locus statistics, such as the site frequency spectrum, it is not necessary to model the entire genealogy. Instead, we model the frequency of the lineage descended from a single mutation as it changes over time due to the combined forces of selection and genetic drift, and as it accumulates additional deleterious mutations. We then use these allele frequency trajectories to predict the site frequency spectrum, from which any other single-site statistic of interest can then be calculated (note however that multi-site statistics such as linkage disequilibrium or correlations between allele frequencies at different sites cannot be calculated from the site frequency spectrum).

We show that background selection creates large distortions in the frequency spectrum at linked neutral sites whenever there is significant fitness variation in the population. These distortions are concentrated in the high and low frequency ends of the frequency spectrum, and hence are particularly important in large samples. We provide analytical expressions for the frequencies at which these distortions occur, and can therefore predict at what sample sizes they can be seen in data.

Aside from single-timepoint statistics such as the site frequency spectrum, we also obtain analytical forms for the statistics of allele frequency trajectories. These trajectories have a very non-neutral character, which reflects the underlying linked selection. Our approach offers an intuitive explanation for how these non-neutral behaviors arise in the presence of substantial linked fitness variation, which explains the origins of the distortions in the site frequency spectrum. The statistics of allele frequency trajectories can also be used to calculate any time-dependent single-site statistic. For example, we analyze how the future trajectory of a mutation can be predicted from the frequency at which we initially observe it, and we discuss the extent to which the observed frequency of a polymorphism can inform us about the fitness of the background on which it arose.

We emphasize that we focus throughout on modeling a perfectly linked genomic region. In the presence of recombination, our results offer insights about the effects of linked selection on diversity within regions that are effectively fully linked on the relevant timescales. In the Discussion, we discuss how our results can be used to provide a lower bound on the length of these segments, and therefore on the amount of linked selection relevant in sexually reproducing populations, and comment on possible future extensions of our analysis to include recombination explicitly.

We begin in the next section by providing an intuitive explanation for the origins of the distortions in the site frequency spectrum in the presence of strong background selection, and explain why these distortions always accompany a reduction in diversity. This section summarizes the importance of correctly accounting for background selection, particularly when analyzing large samples, and should be accessible to all readers. We next define a specific model of background selection and summarize our main quantitative results.

FIG. 1 (A) The (unfolded) average site frequency spectrum of neutral alleles along a nonrecombining genomic segment experiencing strong background selection deviates strongly from the prediction of neutral theory. The purple line shows the simulated neutral site frequency spectrum in Wright-Fisher simulations of an asexual population of $N$ individuals, where deleterious mutations occur at rate $NU_d = 5000$ and all have the same effect on fitness $Ns = 1000$. The black lines show the neutral expectation for the site frequency spectrum of a population of $N$ and $N_e = Ne^{-U_d/s}$ individuals (full and dashed line, respectively). The inset shows the same data, but with the x-axis linearly scaled to emphasize intermediate frequencies. Simulated site frequency spectra were obtained by measuring whole-population neutral site frequency spectra in $10^5$ Wright-Fisher simulations in which neutral mutations were set to occur at rate $NU_n = 10^3$, and by averaging and then smoothing the obtained curve using a box kernel smoother of width much smaller than the scale on which the site frequency spectrum varies (the kernel width was set to < […]0% of the minor allele frequency). (B, C) Statistics of the simulated site frequency spectrum (purple) can deviate from predictions of neutral theory (black) by many orders of magnitude in large samples, even though the effect of background selection will be small in small samples. Here shown: (B) the average ratio of the number of polymorphisms present as derived singletons ($f = […]$, full lines) or ancestral singletons ($f = […]$, dotted lines) in the sample to the number present at 50% frequency, and (C) the average minor allele frequency of the sampled alleles.

We then present the analysis of our model. We begin by reviewing how dynamical aspects of allele frequency trajectories can be related to site frequency spectra, using the trajectories of isolated loci as an example.
Readers already familiar with this intuition may choose to skip ahead, but those less interested in the technical details may find that this section provides useful intuition for the calculations in a simpler context. We then explain how this approach must be modified to account for linkage between multiple selected sites, and present an intuitive description of the key features of allele frequency trajectories. These sections may be of interest to readers who wish to understand the intuitive origins of non-neutral behaviors of alleles in the presence of strong background selection. Finally, in the Analysis, we turn to a formal stochastic treatment of the trajectories of neutral and deleterious mutations. In the last section, we use these trajectories to calculate the site frequency spectrum and other statistics describing genetic diversity within the population.

STRONG BACKGROUND SELECTION DISTORTS THE SITE FREQUENCY SPECTRUM

We begin by presenting a more detailed description of the effects of background selection on linked neutral alleles. We focus on analyzing the allele frequency spectrum, defined as the expected number, $p(f)$, of mutations that are present at frequency $f$ within the population in steady state. This allele frequency spectrum contains all relevant information about single-site statistics: any such statistic of interest can be calculated by subsampling appropriately from $p(f)$.

In Figure 1A, we show an example of the site frequency spectrum of neutral mutations at a locus experiencing strong background selection, generated by Wright-Fisher forward-time simulations. This example shows several key generic features of background selection. First, at intermediate frequencies the site frequency spectrum has a neutral shape, $p(f) \sim f^{-1}$, with the total number of such intermediate-frequency polymorphisms consistent with the simple reduced effective population size prediction (Charlesworth et al., 1993). However, at both low and high frequencies $p(f)$ is significantly distorted. At low frequencies, we see an enormous excess of rare alleles, qualitatively similar to what we expect in expanding populations (Rannala, 1997; Slatkin and Hudson, 1991). We also see a large excess of very high frequency variants, leading to a non-monotonic site frequency spectrum. This is reminiscent of the non-monotonicity seen in the presence of positive selection (Sawyer and Hartl, 1992). Notably, these distortions at both high and low frequencies arise in populations of constant size in which all variation is either neutral or deleterious.

The excess of rare derived alleles arises because selection takes a finite amount of time to purge deleterious genotypes. Thus we expect that there can be substantial neutral variation linked to deleterious alleles that, though doomed to be eventually purged from the population, can still reach modest frequencies. At the very lowest frequencies, we expect that neutral mutations arising in all individuals in the population (independent of the number of deleterious mutations they carry) can contribute. Thus, at the lowest frequencies, the site frequency spectrum should be unaffected by selection, and should agree with the neutral site frequency spectrum of a population of size $N$. On the other hand, as argued above, the total number of common alleles must reflect the (much smaller) number of deleterious-mutation-free individuals, because only neutral mutations arising in such individuals can reach such high frequencies. Since the overall number of very rare alleles is proportional to the census population size $N$, and the number of common alleles reflects a much smaller deleterious-mutation-free subpopulation, there must be a transition between these two: between these extremes the site frequency spectrum must fall off more rapidly than the neutral prediction $p(f) \sim f^{-1}$.
This transition reflects the fact that as frequency increases, the effect of selection will be more strongly felt, and neutral mutations arising in genotypes of increasingly lower fitnesses will become increasingly unlikely.

As the frequency increases even further, we see from our simulations that the total number of polymorphisms increases again, until at very high frequencies it matches the prediction for a neutral population of size equal to the census size $N$. Note that at these frequencies, the total number of backgrounds contributing to the diversity is constant (i.e. all mutations reaching these frequencies must arise in the small subpopulation of mutation-free individuals). This suggests that fundamentally non-neutral behaviors must be dominating the dynamics of these high frequency neutral polymorphisms. To understand this, as well as the details of the rapid fall-off at very low frequencies, we will need to develop a more detailed description of the trajectories of neutral alleles in the population; we analyze this in quantitative detail in a later section.

However, a simple argument can explain the agreement with the neutral prediction at the highest frequencies. Polymorphisms observed at these very high frequencies correspond to neutral variants that have almost reached fixation. The ancestral allele is still present in the population, but at a very low frequency. In principle, the dynamics of the derived and ancestral alleles should depend on the fitnesses of their backgrounds. However, once the frequency of the ancestral allele is sufficiently low, the effects of drift will once again dominate over the effects of selection. Thus, at extremely high frequencies of the derived allele, its dynamics must become neutral. In addition to having neutral dynamics, the overall rate at which neutral mutations enter this high frequency regime also agrees with the rate in a neutral population at the census population size.
This is because at steady state, the total rate at which neutral mutations fix is equal to the product of the rate at which they enter the population at any point in time ($NU_n$) and their fixation probability, $1/N$ (Birky and Walsh, 1988). Thus, since the total rate at which alleles enter this high frequency regime is unaffected by selection, and since their dynamics within this regime are neutral, we expect that the site frequency spectrum should also agree with the neutral prediction for a population of size $N$.

Although these simple arguments do not provide a full quantitative explanation of the site frequency spectrum, they already offer some intuition about the presence and magnitude of the distortions due to background selection. First, these distortions arise in part as a result of the difference in the number of backgrounds on which mutations that remain at the lowest frequencies and mutations that reach substantial frequencies can arise. Thus, they will always occur when background selection is strong enough to cause a substantial reduction in the effective population size: if the pairwise diversity $\pi$ is at all reduced compared to the neutral expectation $\pi_0$ ($\pi/\pi_0 < 1$, or, in terms of McVicker's $B$-statistic, $B < 1$ (McVicker et al., 2009)), these distortions exist (see Figure 1A). Second, because the distortions from the neutral shape are limited to the high and low ends of the frequency spectrum, they will have limited effect on site frequency spectra of small samples, but will have dramatic consequences as the sample size increases (see Figure 1B,C). On a practical level, this means that extrapolating conclusions from small samples about the effects of background selection can be grossly misleading.

MODEL AND RESULTS

In the next few sections, we will analyze the dynamics of neutral mutations under background selection in detail. We focus on the simplest possible model of purifying selection at a perfectly linked genetic locus in a population of $N$ individuals. We assume neutral mutations occur at a per-locus per-generation rate $U_n$, and deleterious mutations occur at rate $U_d$ ($U_d$ […]). Throughout the bulk of the analysis, we will assume that all deleterious mutations reduce the (log) fitness of the individual by the same amount $s$, though we analyze the effects of relaxing this assumption in a later section. We assume that $s \ll 1$, since this is the interesting case for biologically relevant mutation rates, though we also consider the effects of more strongly deleterious (or lethal) mutations in the Discussion. We neglect epistasis throughout, so that the fitness of an individual with $k$ deleterious mutations at this locus is $-ks$. For simplicity we consider haploid individuals, but our analysis also applies to diploids in the case of semidominance ($h = 1/2$).

We assume that selection is sufficiently strong that alleles carrying deleterious mutations cannot fix in the population ($Nse^{-U_d/s} \gg 1$). The opposite case, in which deleterious mutations are weak enough to routinely fix ($Nse^{-U_d/s} \ll 1$), has been the subject of earlier work (Good and Desai, 2013; Good et al., 2014; Neher and Hallatschek, 2013). In the Discussion we comment on the connection between these earlier weak-selection results and the strong-selection case we study here.

Our model is equivalent to the non-epistatic case of the model formulated by Kimura and Maruyama (1966) and Haigh (1978) and to the $h = 1/2$ case of the model considered by Charlesworth et al. (1993) and Hudson and Kaplan (1994), and later studied by many other authors (Gordo et al., 2002; Nicolaisen and Desai, 2012; Seger et al., 2010; Walczak et al., 2012).
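The model defined above can be made concrete with a short forward-time simulation. The following is a minimal sketch under stated assumptions, not the simulation code behind the figures: it tracks only the number $k$ of deleterious mutations carried by each haploid individual, resamples parents with weights $e^{-ks}$ (Wright-Fisher reproduction), and adds a Poisson-distributed number of new deleterious mutations with mean $U_d$ to each offspring. The function names, the parameter values, and the simple Poisson sampler are illustrative choices. When selection is strong enough that mutation-free individuals are not lost, the mean number of deleterious mutations per individual should settle close to $U_d/s$, the classical mutation-selection balance of Haigh (1978).

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate by Knuth's multiplication method (adequate for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_mutation_classes(N, U_d, s, generations, seed=0):
    """Return the per-generation mean number of deleterious mutations per individual.

    Each individual is represented only by its mutation count k; its (log) fitness is -k*s.
    The population starts free of deleterious mutations.
    """
    rng = random.Random(seed)
    population = [0] * N
    mean_k_history = []
    for _ in range(generations):
        # Selection and drift: sample N parents with probability proportional to exp(-k*s).
        weights = [math.exp(-s * k) for k in population]
        parents = rng.choices(population, weights=weights, k=N)
        # Mutation: each offspring gains a Poisson(U_d) number of new deleterious mutations.
        population = [k + poisson(rng, U_d) for k in parents]
        mean_k_history.append(sum(population) / N)
    return mean_k_history


if __name__ == "__main__":
    history = simulate_mutation_classes(N=2000, U_d=0.2, s=0.1, generations=200, seed=1)
    late_mean = sum(history[-100:]) / 100
    print(f"mean k over the last 100 generations: {late_mean:.2f} (U_d/s = {0.2 / 0.1:.1f})")
```

Tracking neutral lineages on top of this sketch would require tagging individuals with a lineage label in addition to $k$; that extension is omitted here to keep the example short.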
However, instead of modeling the genealogies of a sample of individuals from the population backwards in time, we offer a forward-time analysis of this model, in which we analyze the full frequency trajectory of alleles.

In the presence of strongly selected deleterious mutations ($Nse^{-U_d/s} \gg 1$), we find that the magnitude of the effects of background selection critically depends on the ratio, $\lambda$, of the deleterious mutation rate, $U_d$, to the selective cost of each deleterious mutation, $s$, $\lambda = U_d/s$ (Figure 2). This ratio controls the overall variance in the number of deleterious mutations carried by individuals in the population, which is equal to $\lambda = U_d/s$ (Kimura and Maruyama, 1966). Whenever $\lambda \ll 1$, both the overall genetic diversity and the full neutral site frequency spectrum $p(f)$ are unaffected by background selection, and the site frequency spectrum is to leading order equal to

$$p(f) \approx \frac{2NU_n}{f}, \qquad \text{when } \lambda \ll 1. \tag{1}$$

This prediction agrees with the results of forward-time simulations (see Figure 2). The intuition behind this result is simple: in the limit that $\lambda \ll 1$, a majority of individuals in the population are free of deleterious mutations; neutral alleles are therefore rarely linked to deleterious mutations. This results in a neutral site frequency spectrum. However, we will show that the site frequency spectrum of neutral mutations follows a very different form when $\lambda \gg 1$:

$$
p(f) \approx
\begin{cases}
\dfrac{2NU_n}{f}, & \text{for } f \ll \dfrac{1}{N\sigma}, \\[2ex]
[\ldots], & \text{for } \dfrac{1}{N\sigma} \ll f \ll \dfrac{e^{\lambda}}{Ns}, \\[2ex]
\dfrac{2NU_n e^{-\lambda}}{f}, & \text{for } \dfrac{e^{\lambda}}{Ns} \ll f \ll 1 - \dfrac{e^{\lambda}}{Ns}, \\[2ex]
[\ldots], & \text{for } \dfrac{1}{N\sigma} \ll 1 - f \ll \dfrac{e^{\lambda}}{Ns}, \\[2ex]
\dfrac{2NU_n}{f}, & \text{for } 1 - f \ll \dfrac{1}{N\sigma},
\end{cases}
\tag{2}
$$

where $\sigma = \sqrt{U_d s}$ represents the standard deviation in fitness in the population, and line […] in Eq. 2 is valid up to a constant factor (see Appendix I.[…] for details). Comparisons between Eq. 2 and simulations of the model are shown in Figure 2. We note that $p(f)$ matches the site frequency spectrum of a neutral population with a smaller effective population size $N_e = Ne^{-\lambda}$ for $\frac{1}{N_e s} < f < 1 - \frac{1}{N_e s}$, but deviates strongly outside this frequency range. This implies that summary statistics based on the site frequency spectrum (e.g. the average minor allele frequency) will start to deviate from the neutral expectation in samples larger than $n = N_e s$ individuals, but not in smaller samples (Fig. 1B,C).

FIG. 2 Comparison between the theoretical predictions for the site frequency spectrum and Wright-Fisher simulations. In all simulations, $Nse^{-U_d/s} = […]$, while the parameter $\lambda = U_d/s$ varies from 5 to 0.[…] (values shown on figure). Dashed lines show the expectations for a neutral population at reduced effective population size $N_e = Ne^{-\lambda}$. At frequencies smaller than $\frac{1}{N\sigma}$ (where $\sigma = \sqrt{U_d s}$) and larger than $1 - \frac{1}{N\sigma}$, the theoretical predictions agree with the predictions for a neutral population with census size $N$ (black line). Within the range $\frac{1}{N\sigma} < f < 1 - \frac{1}{N\sigma}$, the theoretical predictions (Eq. 2) are given by colored lines. A single theory curve was constructed from Eq. 2 by joining the piecewise forms using sigmoid functions (for details see Appendix I.4). Note that this involves fitting $O(1)$ constants to the curve, for reasons explained in Appendix I.[…] and Appendix I.4. The values of the constants used are tabulated in Appendix I.4. In simulations in which $\lambda$ […], $N = 10^5$, whereas $N = 10^4$ for smaller $\lambda$. In all simulations, the per-individual per-generation neutral mutation rate is $U_n = 0.[…]$, and site frequency spectra were obtained from these simulations as described in the caption of Figure 1.

Our results also offer an intuitive interpretation of the origins of these distortions, which are summarized in Figure 3. When $U_d \gg s$, a large majority of individuals in the population will carry some deleterious mutations at the locus, which results in substantial fitness variation within the population. However, the majority of neutral alleles are present on backgrounds that are within $O(\sigma)$ of the mean of the distribution. Thus, at frequencies $f \ll \frac{1}{N\sigma}$ and $1 - f \ll \frac{1}{N\sigma}$ the effects of genetic drift dominate over any effects of linked selection for the majority of neutral alleles. At these frequencies, the site frequency spectrum agrees with that of a neutral population of size $N$ (see Figure 3).

In contrast, the effects of linked selection have a crucial impact on allele frequency trajectories at frequencies for which $\frac{1}{N\sigma} \ll f \ll 1 - \frac{1}{N\sigma}$. As we show in a later section, this region of the site frequency spectrum is dominated by alleles that arise on unusually fit backgrounds (with fitness with respect to the mean larger than $O(\sigma)$).
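The frequency scales that separate these regimes depend only on the scaled parameters $NU_d$ and $Ns$, so they are easy to tabulate. The sketch below is an illustration rather than part of the original analysis: the function and key names are hypothetical, and the sharp cutoffs in `classify` stand in for the "much less than" conditions in the text, so frequencies near a boundary should not be read too literally.

```python
import math


def frequency_regimes(NU_d, Ns):
    """Characteristic scales of the model from the scaled rates N*U_d and N*s."""
    lam = NU_d / Ns                      # lambda = U_d / s
    N_sigma = math.sqrt(NU_d * Ns)       # N * sigma, with sigma = sqrt(U_d * s)
    Ns_fittest = Ns * math.exp(-lam)     # N * s * exp(-lambda)
    return {
        "lambda": lam,
        "N_sigma": N_sigma,
        "drift_below": 1.0 / N_sigma,            # drift dominates for f or 1 - f below this
        "neutral_shape_above": 1.0 / Ns_fittest, # 1 / (N s e^{-lambda}) = 1 / (N_e s)
        "strong_selection": Ns_fittest > 1.0,    # crude check of N s e^{-lambda} >> 1
    }


def classify(f, NU_d, Ns):
    """Label a derived-allele frequency f as drift-dominated or shaped by linked selection."""
    cutoff = frequency_regimes(NU_d, Ns)["drift_below"]
    if f < cutoff or 1.0 - f < cutoff:
        return "drift-dominated"
    return "linked selection"


if __name__ == "__main__":
    # Parameter values quoted in the caption of Figure 1: N*U_d = 5000, N*s = 1000.
    for name, value in frequency_regimes(5000, 1000).items():
        print(name, value)
```

For the Figure 1 parameters this gives $\lambda = 5$, $1/(N\sigma) \approx 4.5 \times 10^{-4}$ and $1/(Nse^{-\lambda}) \approx 0.15$.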
For these alleles, a crucial distinction arises between their short-term and long-term behavior: although genotypes that carry any polymorphic strongly deleterious variants are guaranteed to be eventually purged from the population, those that contain fewer than average deleterious mutations are still positively selected on shorter timescales. This results in strong non-neutral features in the frequency trajectories of these alleles. Their trajectories are characterized by rapid initial expansions, followed by a peak, and eventual exponential decline (Figure 4). These deterministic aspects of allele frequency trajectories are similar to those seen by Neher and Shraiman (2011) in models of linked selection in large facultatively sexual populations. We describe them in detail in the section titled Key features of lineage trajectories.

FIG. 3 A summary of the dominant effects shaping the site frequency spectrum. The site frequency spectrum and theoretical predictions are reproduced from Fig. 1 and Fig. 2 ($\lambda = 5$). At frequencies below $\frac{1}{N\sigma}$ and above $1 - \frac{1}{N\sigma}$ the allele frequency trajectories of the majority of neutral alleles are dominated by drift, resulting in neutral site frequency spectra corresponding to a population of size $N$. In contrast, linked selection has a crucial impact for $\frac{1}{N\sigma} < f < 1 - \frac{1}{N\sigma}$. The rapid fall-off of the site frequency spectrum for $\frac{1}{N\sigma} < f < \frac{1}{Nse^{-\lambda}}$ is primarily a result of allele frequency trajectories having fundamentally non-neutral properties. In this regime, the number of backgrounds on which neutral alleles can arise also declines with the frequency. As we show later, for $f \ll \frac{1}{NU_d e^{-\lambda}}$, the site frequency spectrum is dominated by neutral mutations originating on deleterious backgrounds. In contrast to the rapid decline at lower frequencies, the site frequency spectrum has a neutral shape between $f = \frac{1}{Nse^{-\lambda}}$ and $f = 1 - \frac{1}{Nse^{-\lambda}}$. In this regime, both the neutral and wild-type allele are in approximate mutation-selection balance (see blue inset, showing the fitness distribution of such alleles), and large fluctuations of the allele frequency mirror the neutral fluctuations of the most-fit individuals. At frequencies larger than $1 - \frac{1}{Nse^{-\lambda}}$, the relative number of polymorphisms increases with the frequency. This pattern results from the effective positive selection of neutral alleles that fix among the fittest individuals (see red inset), and are as a result linked to fewer deleterious mutations than the wild type.

A part of the rapid falloff in the site frequency spectrum between $f = \frac{1}{N\sigma}$ and $f = \frac{1}{Nse^{-\lambda}}$ results from these deterministic effects: alleles arising on backgrounds with more deleterious variants can reach more limited frequencies than alleles arising on backgrounds with fewer deleterious variants. Thus, the number of backgrounds on which neutral alleles could have arisen declines with the frequency, leading to a falloff of the site frequency spectrum.

However, these deterministic aspects of the allele frequency trajectory are not sufficient to produce the site frequency spectrum in Eq. 2, even if stochastic effects in the early phase of the trajectory are taken into account (i.e. during establishment, see Desai and Fisher (2007) and Neher and Shraiman (2011)). This is because fluctuations in the numbers of most-fit individuals that occur after establishment continue to drive fluctuations in the overall allele frequency. This is closely related to the fluctuations in the population fitness distribution studied by Neher and Shraiman (2012) in an analysis of Muller's ratchet. In the Analysis, we quantify how these fluctuations propagate to shape the statistics of allele frequency trajectories, finding that fluctuations in the number of most-fit individuals that happen on a timescale shorter than $1/s$ are smoothed out due to the finite timescale on which selection can respond.
In contrast, fluctuations that happen on timescales longer than $1/s$ are faithfully reproduced in the allele frequency trajectory, which leads to quasi-neutral statistics of allele frequency trajectories at frequencies between $\frac{1}{Nse^{-\lambda}}$ and $1 - \frac{1}{Nse^{-\lambda}}$ (see Figure 3). The smoothing of fluctuations on a finite timescale introduces an additional fundamentally non-neutral feature in the total allele frequency trajectory. This distorts the site frequency spectrum at frequencies below $\frac{1}{Nse^{-\lambda}}$ above and beyond what would be predicted if we asserted a simple frequency-dependent effective population size equal to the number of backgrounds that can contribute to a given frequency.

Finally, we will demonstrate that the non-monotonicity in the site frequency spectrum at frequencies between $1 - \frac{1}{Nse^{-\lambda}}$ and $1 - \frac{1}{N\sigma}$ arises as a result of sweep-like behaviors of neutral alleles that have fixed among the most-fit individuals in the population (see Figure 3). Because these derived alleles carry on average fewer deleterious mutations than the wild type, they are positively selected despite having no inherent benefit. We will show that this difference in the average number of linked deleterious mutations gives rise to an effective frequency-dependent selection coefficient $s_e(f)$. This selection coefficient changes with the frequency of the mutation as high-fitness wild-type individuals ratchet to extinction:

$$s_e(f) = \log_\lambda\!\left(\,[\ldots]\,\right) s, \qquad \text{if } [\ldots]. \tag{3}$$

In the next sections, we derive the form of the site frequency spectrum in Eq. 2 and explain these effects in more detail. We begin by presenting background necessary for understanding these results. We first revisit the intuition behind the shape of the site frequency spectra of isolated loci (Ewens, 1963; Sawyer and Hartl, 1992). We show that in the absence of linkage between multiple selected sites, background selection does not lead to a site frequency spectrum of the form in Eq. 2. Next, we explain how linkage between multiple selected sites modifies allele frequency trajectories. We revisit the key deterministic aspects of allele frequency trajectories in the presence of background selection, previously studied by Etheridge et al. (2007) and others, and extend these results to identify the key timescales important for understanding this problem. Finally, we turn to a full stochastic treatment of allele frequency trajectories in the Analysis, where we also derive the expressions for the site frequency spectra of neutral and deleterious mutations. In the Discussion, we comment on the practical implications of our results, as well as on connections to previous work and other models.

BACKGROUND

Isolated loci

To gain insight into the more complicated case of linked selection, we begin by reviewing the simplest case of a single locus isolated from any other selected loci. The probability that an allele at that locus is present at frequency $f$ at time $t$, $p(f, t)$, is described by the diffusion equation

$$\frac{\partial p}{\partial t} = -\frac{\partial}{\partial f}\left[ s f (1 - f)\, p \right] + \frac{\partial^2}{\partial f^2}\left[ \frac{f (1 - f)}{2N}\, p \right]. \tag{4}$$

Ewens (1963) showed that the expected site frequency spectrum (SFS) can be obtained from this forward-time description of the allele frequency trajectory: because mutations are arising uniformly in time, and the time at which a mutation is observed is random, the SFS is proportional to the average time an allele is expected to spend in a given frequency window.
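The link between sojourn times and the site frequency spectrum can be checked numerically in the neutral case. The sketch below is an illustration under stated assumptions, not a calculation from the original analysis: it treats a haploid neutral Wright-Fisher population of $N$ individuals as a Markov chain on the number of copies $i$ of an allele, starts from a single copy, and accumulates the expected number of generations spent at each copy number before loss or fixation. The function names and the small population size are illustrative. The neutral diffusion result corresponds to roughly $2/i$ generations at $i$ copies, which is the $1/f$ shape of the neutral spectrum.

```python
import math


def wright_fisher_row(N, i):
    """Transition probabilities from i copies to j = 0..N copies under neutral resampling."""
    p = i / N
    return [math.comb(N, j) * p**j * (1.0 - p) ** (N - j) for j in range(N + 1)]


def expected_sojourn_generations(N, generations):
    """Expected number of generations a new neutral mutation spends at each copy number.

    The chain starts with one copy; probability mass that reaches 0 or N copies
    (loss or fixation) is dropped, so only time spent while polymorphic is counted.
    """
    rows = [wright_fisher_row(N, i) for i in range(N + 1)]
    dist = [0.0] * (N + 1)
    dist[1] = 1.0
    sojourn = [0.0] * (N + 1)
    for _ in range(generations):
        nxt = [0.0] * (N + 1)
        for i in range(1, N):
            weight = dist[i]
            if weight == 0.0:
                continue
            sojourn[i] += weight
            row = rows[i]
            for j in range(1, N):
                nxt[j] += weight * row[j]
        dist = nxt
    return sojourn


if __name__ == "__main__":
    N = 50
    sojourn = expected_sojourn_generations(N, generations=1000)
    for i in (10, 20, 30):
        print(f"i = {i}: i * (generations at i copies) = {i * sojourn[i]:.3f}")
```

Multiplying these sojourn times by the rate $NU_n$ at which new neutral mutations enter the population gives the expected number of polymorphisms at each copy number.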
In this section, we show that the low and high frequency ends of the site frequency spectrum of isolated loci can be obtained from a simple heuristic argument that emphasizes this connection between allele frequency trajectories and the site frequency spectrum. These calculations are not intended to be exact (the resulting frequency spectra are only valid up to $O(1)$ factors), but they provide intuition for the origins of key features of the site frequency spectrum that we will return to more formally below.

Consider the simplest case of isolated purely neutral loci. Neutral mutations will arise in the population at rate $NU_n$. In the absence of selection, the trajectories of these mutations are governed by genetic drift. At steady state, the number of mutations we expect to see at frequency $f$ is simply proportional to the number of mutations that reach that frequency and the typical time each of these mutations spends at that frequency before fixing or going extinct. In the absence of selection, a new mutation that arises at initial frequency $f_0 = 1/N$ will reach frequency $f$ before going extinct with probability $f_0/f = 1/(Nf)$. Standard branching process calculations (Fisher, 2007) show that, given that it reaches frequency $f$, the mutation will spend about $Nf$ generations around that frequency (defined as $\log(f)$ not changing by more than $O(1)$), provided that $f$ is small ($f \ll 1$).

By combining these results, we can calculate the expected site frequency spectrum for small $f$. The rate at which new mutations reach frequency $f$ is $NU_n \cdot \frac{1}{Nf}$. Those that do will remain around $f$ (in the sense defined above) for about $Nf$ generations. Thus the total number of neutral mutations within $df$ of frequency $f$ is $p(f)\,df \approx NU_n \cdot \frac{1}{Nf} \cdot Nf \cdot d(\log f)$. In other words, we have

$$p(f) \approx \frac{NU_n}{f}. \tag{5}$$

This argument is valid when the mutant is rare, but will start to break down at intermediate frequencies. However, because the wild type is rare when the mutant approaches fixation, an analogous argument can be used to describe the site frequency spectrum at high frequencies.
The mutant trajectory still reaches frequency $f$ with probability $1/(Nf)$. It will then spend roughly $N(1-f)$ generations around this frequency (i.e. within $O(1)$ of $\log(1-f)$). This gives $p(f) \approx NU_n/f \approx NU_n$ in the high frequency end of the spectrum. This simple forward-time heuristic argument reproduces a well-known result of coalescent theory (Wakeley, 2009) and agrees with the more formal calculation of sojourn times in the Wright-Fisher process (Ewens, 1963).

We can use a similar argument to calculate the frequency spectrum of strongly selected deleterious mutations with fitness effect $-s$ (with $Ns \gg 1$) that occur at a locus that is isolated from any other selected locus. Provided that the deleterious mutation is rare (below the drift barrier frequency, $f < 1/(Ns)$), its trajectory is dominated by drift. Thus

for $f < 1/(Ns)$, the mutation trajectory will be the same as for a neutral mutation, and the frequency spectrum will therefore be neutral. In contrast, at frequencies larger than $1/(Ns)$ selection is stronger than drift, which prevents the mutation from exceeding this frequency. Combining these two expressions, we find that the frequency spectrum of an isolated deleterious mutation is to a rough approximation given by

$$p(f) \approx \begin{cases} \dfrac{NU_d}{f} & \text{if } f < \dfrac{1}{Ns}, \\ 0 & \text{otherwise.} \end{cases} \tag{6}$$

For completeness, we also show how a similar argument can be used to obtain the frequency spectrum of beneficial mutations. Though it is not immediately obvious that this is relevant to background selection, we will later see how similar trajectories emerge in the case of strong purifying selection. Just like deleterious alleles, strongly beneficial alleles with fitness effect $s$ (with $Ns \gg 1$) will not feel the effects of selection as long as they do not exceed the drift barrier ($f < 1/(Ns)$). Their trajectory and frequency spectrum will therefore be neutral below the drift barrier. As a result, only a small fraction $s$ of beneficial mutations will reach frequency $1/(Ns)$. However, those that do will be destined to fix, since at frequencies larger than $1/(Ns)$, selection dominates over drift. Above this threshold, selection will cause the frequency of the mutation to grow logistically at rate $s$ ($df/dt = s f(1-f)$), spending $1/(s(1-f))$ generations near frequency $f$. This is valid as long as $f < 1 - 1/(Ns)$, at which point the effects of drift become dominant due to the wild type being rare, and the trajectory of the mutant is once again the same as the trajectory of a neutral mutation. Combining these expressions, we obtain a rough approximation for the frequency spectrum of an isolated beneficial mutation

$$p(f) \approx \begin{cases} \dfrac{NU_b}{f}, & \text{if } f < \dfrac{1}{Ns}, \\ \dfrac{NU_b}{f(1-f)}, & \text{if } \dfrac{1}{Ns} < f < 1 - \dfrac{1}{Ns}, \\ NU_b \cdot Ns, & \text{if } f > 1 - \dfrac{1}{Ns}. \end{cases} \tag{7}$$

Linked loci under background selection

We now turn to the analysis of background selection.
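Before moving on, the heuristic spectra for isolated loci derived above (Eqs. 5 to 7) can be summarized as piecewise functions. The sketch below is illustrative only and is not taken from the paper: the function names and the parameter values in the usage lines are assumptions, and the expressions are meant to hold only up to $O(1)$ factors, as stressed in the text.

```python
def neutral_sfs(f, N, U_n):
    """Eq. 5 heuristic: p(f) ~ N * U_n / f (valid up to O(1) factors)."""
    return N * U_n / f


def deleterious_sfs(f, N, U_d, s):
    """Eq. 6 heuristic: neutral below the drift barrier 1/(N s), zero above it."""
    return N * U_d / f if f < 1.0 / (N * s) else 0.0


def beneficial_sfs(f, N, U_b, s):
    """Eq. 7 heuristic for an isolated, strongly beneficial mutation (N s >> 1)."""
    barrier = 1.0 / (N * s)
    if f < barrier:                      # drift-dominated, neutral-like
        return N * U_b / f
    if f < 1.0 - barrier:                # logistic growth at rate s
        return N * U_b / (f * (1.0 - f))
    return N * U_b * N * s               # wild type is rare and drifts neutrally


# Hypothetical parameter values, chosen only for illustration.
N, s, U = 10_000, 0.01, 1e-3
for f in (0.001, 0.5, 0.999):
    print(f, neutral_sfs(f, N, U), deleterious_sfs(f, N, U, s),
          beneficial_sfs(f, N, U, s))
```

The middle and upper branches of `beneficial_sfs` agree to within about one percent at $f = 1 - 1/(Ns)$, which is the matching one expects from a heuristic of this kind.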
Since we assume that all mutations have the same effect on fitness, the population can be partitioned into discrete fitness classes according to the number of deleterious mutations each individual carries at the locus. When the fitness effect of each mutation is sufficiently strong, the population assumes a steady-state fitness distribution in which the expected fraction of individuals with $k$ deleterious mutations, $h_k$, follows a Poisson distribution with mean $\bar{k} = \lambda$ (Haigh, 1978; Kimura and Maruyama, 1966),

$$h_k = \frac{\lambda^k}{k!}\, e^{-\lambda}. \tag{8}$$

A new allele in such a population will arise on a background with $k$ existing mutations with probability $h_k$. From the form of $h_k$, we see that depending on the value of $\lambda$, the population can be in one of two regimes.

In the first regime, the rate at which mutations are generated is smaller than the rate at which selection can purge them ($\lambda \ll 1$). In this case, the majority of individuals in the population carry no deleterious mutations ($h_0 \approx 1$), with only a small proportion, $1 - h_0 \approx \lambda$, of backgrounds in the population carrying some deleterious variants. To leading order in $\lambda$, all new neutral mutations will arise in a mutation-free background and will remain at the same fitness as the founding genotype. Their trajectories are thus the same as the trajectories of mutants at isolated genetic loci of the same fitness as the founding genotype (see Appendix D for details). This means that the full site frequency spectrum can be calculated by summing the contributions of site frequency spectra of isolated loci that we calculated above. The neutral and deleterious site frequency spectra are, to leading order in $\lambda$, given by Equations 5 and 6, respectively (for details see Appendix H). Thus, background selection has a negligible impact on mutational trajectories and diversity when $\lambda \ll 1$.

In the opposite regime where $\lambda \gg 1$, mutations are generated faster than selection can purge them and there will be substantial fitness variation at the locus. Consider a new allele (i.e.
a new mutation at some site within the locus) that arises in this population. A short time after arising, individuals that carry this allele will accumulate newer deleterious mutations, which will lead the allele to spread through the fitness distribution. The fundamental difficulty in calculating the frequency trajectory of this allele, $f(t)$, stems from the fact that a short time after arising, individuals that carry the allele will have accumulated different numbers of newer deleterious mutations. The total strength of selection against the allele depends on the average number of deleterious mutations that the individuals that carry the allele have. This will change over time in a complicated stochastic way, as the lineage purges old deleterious mutations, accumulates new ones, and changes in frequency due to drift and selection. To

calculate the distribution of allele frequency trajectories in this regime, we will need to model these changes in the fitness distribution of individuals carrying the allele. Although we will formally be treating $\lambda$ as a large parameter, in practice our results will also adequately describe allele frequency trajectories in the cases of moderate $\lambda$ (i.e. λ, see Figure ).

FIG. 4 (A) The average fitness of a lineage composed of individuals carrying $k$ deleterious mutations at time $t = 0$ (blue dot and blue inset). As the descendants of these individuals accumulate further deleterious mutations, the fitness of the lineage declines until the individuals accumulate an average of $\lambda$ deleterious mutations (red dot) and reach their own mutation-selection balance, which is a steady-state Poisson profile with mean $\lambda$ that has been shifted by the initial deleterious load, $-ks$ (red inset). (B) In the absence of genetic drift, the lineage will increase at a rate proportional to its relative fitness. Lineages arising in the class with $k = 0$ deleterious mutations reach mutation-selection balance about $t_d = \log(\lambda)/s$ after arising, after which the size of the lineage asymptotes to $n_0(t=0)\,e^{\lambda}$. The fitness of these lineages changes on a shorter timescale, $1/s$. (C) In contrast, lineages arising in classes with $k > 0$ deleterious mutations peak in size after a time $t_d^{(k)}$, when their average relative fitness is zero, after which they decline exponentially at rate $ks$.

To make progress, we classify individuals carrying this allele (the "labeled lineage") according to the number of deleterious mutants they have at the locus. We denote the total frequency of the labeled individuals that have $i$ deleterious mutations as $f_i(t)$, so that the total frequency of the lineage, $f(t)$, is given by

$$f(t) = \sum_i f_i(t). \tag{9}$$

The time evolution of the allele frequency in a Wright-Fisher process is commonly described by a diffusion equation for the probability density of the allele frequency (Ewens, 2004).
Instead, for our purposes it will be more convenient to consider the equivalent Langevin equation (Van Kampen, 2007),

$$\frac{df_i}{dt} = \left(-is + \bar{k}(t)\,s\right) f_i - U_d f_i + U_d f_{i-1} + C_i(t). \tag{10}$$

Here, $C_i(t)$ is a noise term with a complicated correlation structure necessary to keep the total size of the population fixed (see Good and Desai (2013) for details) and $\bar{k}(t)$ is the mean number of mutations per individual in the entire population at time $t$. In the strong selection limit that we are interested in here ( ), fluctuations in the mean of the fitness distribution of the population are small and $\bar{k}(t) \approx \lambda$ (Neher and Shraiman, 2012).

Key features of lineage trajectories

Before turning to a detailed analysis of Eq. 10, it is helpful to consider some of the key features of lineage trajectories that we will model more formally below. To begin, imagine a lineage founded by a neutral mutation in an individual with $k$ deleterious mutations. Let the lineage be composed of $n_k(0)$ individuals at some time $t = 0$ shortly after arising, all of which carry $k$ deleterious mutations (see blue inset in Figure 4A). At this time, the relative fitness of this lineage is simply $-ks - (-\bar{k}s) = U_d - ks$. Thus, lineages founded in classes with $k > \lambda$ will tend to decline in size. In contrast, the more interesting case arises if $k < \lambda$, since these lineages will tend to increase in size.
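As a small numerical illustration of the steady-state fitness distribution (Eq. 8) and of the initial relative fitness $U_d - ks$ just described, the sketch below tabulates both. It is not from the paper: the function names and parameter values are hypothetical, and it assumes $\lambda = U_d/s$, which is what the identity $\bar{k}s = U_d$ used above implies.

```python
import math


def fitness_class_fractions(lam, k_max):
    """Steady-state fractions h_k = exp(-lam) * lam**k / k! for k = 0..k_max (Eq. 8)."""
    return [math.exp(-lam) * lam**k / math.factorial(k) for k in range(k_max + 1)]


def founding_relative_fitness(k, U_d, s):
    """Relative fitness U_d - k*s of a lineage newly founded in class k."""
    return U_d - k * s


# Hypothetical values: U_d = 0.05 and s = 0.01 give lam = U_d / s = 5.
U_d, s = 0.05, 0.01
lam = U_d / s
h = fitness_class_fractions(lam, 60)
for k in (0, 3, 5, 7):
    print(k, round(h[k], 4), founding_relative_fitness(k, U_d, s))
```

Classes with $k < \lambda$ have positive initial relative fitness and classes with $k > \lambda$ have negative initial relative fitness, matching the statement in the text.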

However, though the overall number of individuals that carry the allele will tend to increase when $k < \lambda$, the part of the lineage in the founding class $k$ (the "founding genotype") will tend to decline in size, because it loses individuals through new deleterious mutations (at per-individual rate $U_d$). As a result, the founding genotype feels an effective selection pressure of $U_d - ks - U_d = -ks$, which is negative for all $k > 0$ and 0 for $k = 0$. This means that the lineage will increase in frequency not through an increase in size of the founding genotype, but rather through the appearance of a large number of deleterious descendants in classes of lower fitness. The lineage must therefore decline in fitness as it increases in size.

In the absence of genetic drift, we can calculate how the size and fitness of the lineage change in time by dropping the stochastic terms in Eq. 10 (subject to the initial condition $n_k(t=0) = n_k(0)$ and $n_{k+i}(t=0) = 0$ for all $i \neq 0$). These deterministic dynamics of the lineage have been analyzed previously by Etheridge et al. (2007), who showed that the number of additional mutations that an individual in the lineage carries at some later time $t$ is Poisson-distributed with mean $\lambda(1 - e^{-st})$. Thus the average number of additional deleterious mutations eventually approaches $\lambda$ after $t \sim t_d = \log(\lambda)/s$ generations. At this point, the lineage has reached its own mutation-selection balance: the fitness distribution of the lineage has the same shape as the distribution of the population (i.e. $n_{i+k}(t) = h_i\, n_k(t)$), but is shifted by $-ks$ compared to the distribution of the population (see red inset in Figure 4A). The average relative fitness $x(t)$ of individuals in the lineage (Fig. 4A) is therefore equal to

$$x(t) = U_d\, e^{-st} - ks, \tag{11}$$

and the total number of individuals in the lineage is simply $n(t) = n_k(t=0)\, e^{\int_0^t x(t')\,dt'} \equiv n_k(t=0)\, g_k(t)$, where we have defined

$$g_k(t) = e^{-kst + \lambda\left(1 - e^{-st}\right)}. \tag{12}$$

Thus, we can see from Equations 11 and 12 that lineages founded in the 0-class will on average steadily increase in size at a declining rate until they asymptote at a total size equal to $n_k(t=0)\,e^{\lambda}$ roughly $t_d = \log(\lambda)/s$ generations later (see Figure 4B). In contrast, lineages founded in the $k$-class will increase in size for only

$$t_d^{(k)} = \frac{\log(\lambda/k)}{s} \tag{13}$$

generations, when they peak at a size of $n_k(t=0)\, g_k$ individuals (see Figure 4C), where we have defined

$$g_k \equiv e^{\lambda}\left(\frac{k}{e\lambda}\right)^{k}. \tag{14}$$

The lineages remain near this peak size for about

$$\Delta t^{(k)} = \frac{1}{\sqrt{k}\,s} \tag{15}$$

generations (Fig. 4C). At longer times, they exponentially decline at rate $ks$ (Fig. 4C).

These simple deterministic calculations capture the average behavior of an allele and show that all alleles founded in classes with $k > 0$ are likely to be extinct on timescales much longer than $t_d^{(k)}$, whereas sufficiently large lineages founded in the 0-class should simply reflect the frequency in the founding class about $t_d$ generations earlier, $f(t) \approx e^{\lambda} f_0(t - t_d)$. This is the forward-time analogue of the intuition presented by Charlesworth et al. (1993).

Of course, this deterministic solution neglects the effects of genetic drift, which will be crucial, particularly because drift in each class propagates to affect the frequency of the lineage in all lower-fitness classes (for a more detailed heuristic describing why drift can never be ignored, see Appendix B.). Although these effects are complex, there is a hierarchy in the fluctuation terms which we can exploit to gain some intuition. From the deterministic solution above, we can see that a fluctuation of size $\delta f_i$ in class $i$ will on average eventually cause a change in the total size of the lineage proportional to $\delta f_i\, g_i$ after a time delay $t_d^{(i)}$. Thus, the fluctuations that have the largest effect on the total size of the lineage are those that occur in the class of highest fitness (i.e. the founding class $k$).
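The deterministic expressions above for the lineage fitness $x(t)$, the growth factor $g_k(t)$, the peak time $t_d^{(k)}$ and the peak size can be checked numerically. The following sketch is an illustration under stated assumptions rather than code from the paper: the function names are hypothetical, $U_d = \lambda s$ is assumed, and `integrate_growth` uses a simple midpoint rule that stands in for an exact solution of $dn/dt = x(t)\,n$.

```python
import math


def relative_fitness(k, t, s, lam):
    """x(t) = U_d * exp(-s t) - k s, with U_d = lam * s."""
    return lam * s * math.exp(-s * t) - k * s


def g(k, t, s, lam):
    """Deterministic growth factor g_k(t) = exp(-k s t + lam (1 - exp(-s t)))."""
    return math.exp(-k * s * t + lam * (1.0 - math.exp(-s * t)))


def peak_time(k, s, lam):
    """Time t_d^(k) = log(lam / k) / s at which x(t) = 0 (requires k >= 1)."""
    return math.log(lam / k) / s


def peak_size(k, lam):
    """Peak value of g_k(t): exp(lam) * (k / (e lam))**k."""
    return math.exp(lam) * (k / (math.e * lam)) ** k


def integrate_growth(k, t_end, s, lam, dt=0.1):
    """Midpoint-rule integration of d(log n)/dt = x(t), starting from n(0) = 1."""
    log_n, t = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        log_n += relative_fitness(k, t + 0.5 * dt, s, lam) * dt
        t += dt
    return math.exp(log_n)


# Hypothetical values for illustration.
s, lam, k = 0.01, 10.0, 3
t_pk = peak_time(k, s, lam)
print(t_pk, g(k, t_pk, s, lam), peak_size(k, lam))
print(integrate_growth(k, 200.0, s, lam), g(k, 200.0, s, lam))
```

For $k = 0$ there is no peak: `g(0, t, s, lam)` simply saturates at $e^{\lambda}$ once $t$ is several multiples of $1/s$.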
These fluctuations will turn out to be the most important to describing the frequency trajectory of the entire allele, though fluctuations in classes of lower fitness will still matter in lineages of small enough size.

One could imagine that this result means that fluctuations in the total size of the lineage simply mirror the fluctuations in the founding class, amplified by a factor $g_k$ and after a time delay $t_d^{(k)}$. If fluctuations in the founding class are sufficiently slow, this is indeed the case. However, for fluctuations that occur on shorter timescales, this is not true. Consider for example the case where a neutral mutation is founded in the mutation-free ($k = 0$) class. Imagine that the frequency of the allele in the founding class changes by a small amount from $f_0$ to $f_0 + \delta f_0$ as a result of genetic drift (shown in the first panel of Figure 5). Based on the deterministic solution, this fluctuation will lead to a proportional

change in the frequency of the portion of the lineage in the 1-class, and this change will take place over $1/s$ generations (see Appendix A for details). During this time, the change in the 1-class begins to lead to a shift in the frequency in the 2-class, which will mirror the change in the 0-class a further $1/(2s)$ generations later (see Fig. 5). This change will then propagate in turn to lower classes, and ultimately results in a proportional change in the total allele frequency a total of $\sum_{i=1}^{\lambda} \frac{1}{is} \approx \frac{\log(\lambda)}{s}$ generations later (see Fig. 5). Now consider what happens if there is another change in the frequency in the founding class. If this change occurs within the initial $1/s$ generations, it will influence the 1-class simultaneously with the first fluctuation, and thus the effect of these two fluctuations on the overall lineage frequency will be smoothed out. In contrast, if the changes are separated by more than $1/s$ generations, they will propagate sequentially through the fitness distribution, and are ultimately mirrored in the total allele frequency. Similar arguments apply to lineages founded in other fitness classes, though the relevant timescales and scale of amplification are different.

FIG. 5 A schematic showing how a change in the frequency of the lineage in the mutation-free class propagates to affect the frequency in all classes of lower fitness (panels, plotted against time: change in frequency of founding genotype; change reflected in 1-class; change reflected in 2-class; change reflected in allele frequency). At time $t = 0$, the lineage is in mutation-selection balance at total frequency $f$, when the frequency of the portion of the lineage in the zero class changes suddenly from $f_0$ to $f_0 + \delta f_0$. This change is felt in the 1-class $1/s$ generations later, and propagates to the 2-class yet another $1/(2s)$ generations later. The lineage reaches a new equilibrium about $\log(\lambda)/s$ generations later, when the total allele frequency is proportional to $(f_0 + \delta f_0)\,e^{\lambda}$.
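The propagation argument above can be sketched numerically by dropping the noise terms and holding the founding-class frequency fixed, so that each lower class obeys $df_i/dt = -is f_i + U_d f_{i-1}$ (using $\bar{k}s = U_d$). Because these equations are linear, the response to a step $\delta f_0$ is the same as the response obtained by starting all lower classes at zero with $f_0 = \delta f_0$, which is what the code does. This is a minimal illustration under those assumptions; the function names, the forward-Euler scheme and the parameter values are not from the paper.

```python
import math


def propagate_step(f0, lam, s, n_classes, t_end, dt=1.0):
    """Forward-Euler integration of df_i/dt = -i s f_i + U_d f_{i-1}, i >= 1.

    f_0 is held fixed at f0 and all lower classes start empty.
    Returns the class frequencies [f_0, f_1, ..., f_{n_classes}] at time t_end.
    """
    U_d = lam * s
    f = [f0] + [0.0] * n_classes
    for _ in range(int(round(t_end / dt))):
        new = f[:]
        for i in range(1, n_classes + 1):
            new[i] = f[i] + dt * (-i * s * f[i] + U_d * f[i - 1])
        f = new
    return f


def relay_time(lam, s):
    """Sum of the per-class relaxation times 1/(i s) for i = 1..lam."""
    return sum(1.0 / (i * s) for i in range(1, int(lam) + 1))


# Hypothetical values: a step of size 1e-4 in the founding class, lam = 3.
f = propagate_step(1e-4, 3.0, 0.01, 30, 2000.0)
print(sum(f), 1e-4 * math.exp(3.0))          # total approaches f0 * exp(lam)
print(relay_time(100, 0.01), math.log(100) / 0.01)
```

At long times the classes settle at $f_i = f_0 \lambda^i / i!$, so the total is amplified by $e^{\lambda}$, and the harmonic sum in `relay_time` differs from $\log(\lambda)/s$ only by a term of order $1/s$.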
Together, these arguments suggest that fluctuations in the founding class will have the largest impact on overall fluctuations in the lineage frequency, and these overall fluctuations will represent an amplified but smoothed-out mirror of the fluctuations in the founding class. This smoothing will be crucial: the size of the lineage in the founding class will typically fluctuate neutrally, but the smoothed-out and amplified versions will have non-neutral statistics. As we will see below, this smoothing ultimately leads to distortions in the site frequency spectrum at low frequencies ( ).

ANALYSIS

Formally, we analyze all of the effects described above by computing the distribution of the frequency trajectories $f(t)$ of the allele, $p(f,t)$, from Eq. 10 for an allele arising in class $k$. This process is complicated by the correlation structure in the $C_i(t)$ terms required to keep the population size constant. These correlations are important once the lineage reaches a high frequency, and in the presence of strong selection they result in a complicated hierarchy of the moments of $f$, which do not close (Good and Desai, 2013; Higgs and Woodcock, 1995). However, we can simplify the problem by considering low, high, and intermediate-frequency lineages separately. First, at sufficiently low frequencies ($f \ll 1$), the $C_i(t)$ in Eq. 10 reduce to simple uncorrelated white noise. At these low frequencies, Eq. 10 thus

simplifies to

$$\frac{df_i}{dt} = (-is + \lambda s)\, f_i - U_d f_i + U_d f_{i-1} + \sqrt{\frac{f_i}{N}}\,\eta_i(t), \tag{16}$$

where the noise terms have $\langle \eta_i(t) \rangle = 0$ and covariances $\langle \eta_i(t)\,\eta_j(t') \rangle = \delta_{ij}\,\delta(t - t')$ and should be interpreted in the Itô sense. At very high frequencies ($1 - f \ll 1$), a similar simplification arises. In this case, the wild-type lineage is at low frequency, and we can model the wild-type frequency using an analogous coupled branching process with uncorrelated white noise terms. Finally, at intermediate frequencies, we cannot simplify the noise terms in this way. Fortunately, for the case of strong selection we consider here, we will show that for , lineage trajectories have neutral statistics on relevant timescales. As we will see below, these low, intermediate, and high frequency solutions can then be asymptotically matched, giving us allele frequency trajectories and site frequency spectra at all frequencies. In the next several subsections, we focus on the analysis of the distribution of trajectories at low and high frequencies ($f \ll 1$ or $1 - f \ll 1$), where Eq. 16 is valid. We then return in a later subsection to the analysis of trajectories at intermediate frequencies.

The dynamics of the lineage within each fitness class

To obtain the distribution of trajectories of the allele $p(f,t)$ at low frequencies ($f \ll 1$) from Eq. 16, we will first compute the generating function of $f(t)$. This generating function is defined as

$$H_f(z,t) = \left\langle e^{-z f(t)} \right\rangle, \tag{17}$$

where angle brackets denote the expectation over the probability distribution of the frequency trajectory $f(t)$. $H_f(z,t)$ is simply the Laplace transform of the probability distribution of $f(t)$, and it therefore contains all of the relevant information about the probability distribution of $f(t)$. As we have already anticipated from our discussion above, the time evolution of $f(t)$ depends on the distribution of the lineage among different fitness classes.
In order to understand how this distribution changes under the influence of drift, mutation and selection in these classes, we can consider the joint generating function for the $f_i(t)$,

$$H(\{z_i\},t) = \left\langle e^{-\sum_i z_i f_i(t)} \right\rangle. \tag{18}$$

The generating function for the total allele frequency $H_f(z,t)$ can then be obtained from this joint generating function by setting $z_i = z$. We will use this relationship between the two generating functions to evaluate the importance of drift, mutation and selection within each of the fitness classes on the total allele frequency. By taking a time derivative of Eq. 18 and substituting the time derivatives $df_i/dt$ from Eq. 16 (where the stochastic terms should be interpreted in the Itô sense, see Appendix C), we can obtain a partial differential equation describing the evolution of the joint generating function,

$$\frac{\partial H}{\partial t} = -\sum_i \left( i s z_i - U_d z_{i+1} + \frac{z_i^2}{2N} \right) \frac{\partial H}{\partial z_i}. \tag{19}$$

We see from Equation 19 that the joint generating function is constant along the characteristics $z_i(t - t')$ defined by

$$\frac{dz_i}{dt'} = i s z_i - U_d z_{i+1} + \frac{z_i^2}{2N}. \tag{20}$$

Thus, the joint generating function can be obtained by integrating along the characteristic backwards in time from $t' = 0$ to $t' = t$, subject to the boundary condition $z_i(t) = z$. Note that the linear terms in the characteristic equations arise from selection and mutation out of the $i$ class, and that the nonlinear term arises from drift in class $i$. In Appendix E., we show that when considering the distribution of trajectories $p(f,t)$ at frequencies ) i = Nsi g i, the nonlinear terms in ( Nsi eλ i eλ Eq. 20 are of negligible magnitude uniformly in time in all classes containing $i$ or more deleterious mutations per individual as long as i λ, and λ. Here, $g_i$ represents the peak of the expected number of individuals in a lineage founded by a single individual in class $i$ (see Eq. 14 and Fig. 4C). Thus, when Nsi g i, the effect of genetic drift is negligible in classes with $i$ or more deleterious mutations.
Conversely, when Nsi g i, genetic drift in the class with $i$ deleterious mutations does affect the overall allele frequency. Since drift is negligible in classes with $i$ or more mutations, total allele frequencies of (t) Nsi g i require that i (t t (i) d ). This threshold is Nsi reminiscent of the drift barrier, but its origin for classes below the founding class ($i > k$) is more subtle. We offer an intuitive explanation for this threshold in Appendix B.. Thus, drift in class $i$ has an important impact on the overall frequency trajectory as long as i Nsi. However, once i exceeds Nsi, the effect of genetic drift in that class,

as well as all classes below $i$, becomes negligible, because the frequencies of the parts of the lineage in all classes below $i$ are then also guaranteed to exceed the corresponding thresholds. Note that the frequency of the founding genotype $f_k$ is exponentially unlikely to substantially exceed $\frac{1}{Nsk}$. This is because, as we explained earlier, the frequency trajectory of the founding genotype $f_k(t)$ has the same statistics as the trajectory of a mutation of fitness $-ks$ at an isolated locus (see Equation 6 and Appendix F). Thus, because $f_k$ is unlikely to exceed $\frac{1}{Nsk}$, the overall allele frequency of an allele founded in class $k$ is exponentially unlikely to substantially exceed $\frac{1}{Nsk}\, g_k$.

In summary, by analyzing the generating function for the components of the lineage in different fitness classes, we have found that there is a clear separation between high fitness classes in which mutation and drift are the primary forces, and classes of lower relative fitness in which mutation and selection dominate. The boundary between the stochastic and deterministic classes can be determined from the total allele frequency, allowing us to reduce a complicated problem involving a large number of coupled stochastic terms to what we will see is a small number of stochastic terms feeding an otherwise deterministic population.

Statistics of trajectories with $f \gg \frac{1}{Ns}\, g_1$

At this point, we are in a position to calculate a piecewise form for the generating function $H_f(z,t)$ valid near any frequency $f$. For example, consider the allele frequency trajectory in the vicinity of some frequency $f \gg \frac{1}{Ns}\, g_1$. As we have explained above, at these frequencies contributions from mutations arising in class $k \geq 1$ are exponentially small, since they would require the frequency of the lineage in that class to substantially exceed $\frac{1}{Ns}$, which happens only exponentially rarely. Thus, in this frequency range we will only see mutations arising in the mutation-free class ($k = 0$).
FIG. 6 Schematic of (A) the trajectory of the total allele frequency, and (B) the trajectory of the frequency of the portion of the allele that remains in the founding class (phases marked in the figure: spreading, peak phase, extinction). Soon after arising in the founding class, the allele frequency rapidly increases in the spreading phase of the trajectory. Early in this phase, the total allele frequency (purple) becomes much larger than the frequency of the founding genotype (black; left inset, panel A). In the peak phase of the trajectory, the total allele frequency trajectory represents a smoothed out and amplified version of the trajectory in the founding class (the relationship between the total allele frequency (purple curve) and the founding genotype frequency (black curve) is based on Eq. 22). In the extinction phase of the trajectory, the allele frequency declines at an increasing rate (right inset, panel A). We describe the extinction phase in more detail in Appendix E and in the section of the main text titled The trajectories of high frequency alleles,, where we also explain how the rate of extinction changes with the frequency.

In addition to this, we have shown that at these frequencies genetic drift can be neglected in all classes but the 0-class. To obtain the generating function at these frequencies, we can therefore integrate the characteristic equations by dropping the nonlinear terms in Eq. 20 for all $i > 0$ (see Appendix E. for details). This yields the generating function for the frequency of the labeled lineage,

$$H_f(z,t) = \left\langle \exp\left[ -z \left( f_0(t) + U_d \int_{-\infty}^{t} d\tau\, f_0(\tau)\, g_1(t - \tau) \right) \right] \right\rangle, \tag{21}$$

where the average is taken over all possible realizations of the trajectory in the founding class $f_0(t)$. As before, $g_1(t - \tau)$ represents the expected number of individuals descended from an individual present in the 1-class $t - \tau$ generations earlier (see Eq. 12). Thus, the two terms in the exponent in Eq. 21 represent the frequency of the lineage in the founding class $f_0$ and the total frequency of the deleterious descendants of that lineage.
The latter are seeded into the 1-class at rate $NU_d f_0(\tau)$, and each of these deleterious descendants founds a lineage that $t - \tau$ generations later contains $g_1(t - \tau)$ individuals, so that the total frequency of the allele is simply

$$f(t) = f_0(t) + U_d \int_{-\infty}^{t} d\tau\, f_0(\tau)\, g_1(t - \tau). \tag{22}$$

Thus, we have obtained a simple expression for the frequency of the entire allele in which all of the stochastic effects have been reduced to a single stochastic component, $f_0(t)$. Furthermore, the stochastic dynamics of $f_0(t)$ are those of a simple, isolated, neutral mutation (see schematic of such a trajectory in Figure 6B). Note however that the statistics of the fluctuations in $f(t)$ are not necessarily the same as the statistics of the trajectory in the founding class (see Figure 6A). This is because $f(t)$ depends on an integral of $f_0(t)$ (see Eq. 22), and therefore has different stochastic properties than $f_0(t)$ itself.

From Eq. 22, we can see that the frequency trajectory of the allele still has the same qualitative features as those we have seen in the deterministic behavior of mutations. Shortly after being founded, the lineage will become dominated by the deleterious descendants of the founding class, which are captured by the second term in Eq. 22 (see left inset in Fig. 6A). At early times ($t \ll t_d = \log(\lambda)/s$), the total allele frequency must rapidly grow, as the lineage spreads through the fitness distribution and approaches mutation-selection balance (see Figure 6A). About $t_d$ generations after founding, the peak phase of the trajectory begins (see Fig. 6A). During this phase, the average fitness of the lineage is approximately 0 and the allele traces out a smoothed-out and amplified version of the trajectory in the founding class (Fig. 6B). Finally, $t_d$ generations after the descendants of the last individuals present in the founding class have peaked, the average fitness of the lineage will fall significantly below 0, and the extinction phase of the trajectory begins.

As we show in Appendix I, the peak phase of the trajectory is the most important for understanding the site frequency spectrum. This is also the phase during which the trajectory of the mutation spends the longest time near a given frequency. In contrast, the spreading phase (see Fig. 6A) has a negligible effect on the site frequency spectrum: by this we mean that the site frequency spectrum at a given frequency will always be dominated by the peak phase of trajectories that peak around that frequency, and not influenced by the spreading phase of trajectories that peak at much higher frequencies. We will therefore not consider the spreading phase in the main text, but discuss it in Appendix I.3. The extinction phase of the trajectory can also be neglected for a similar reason, except when considering the very highest frequencies, (Appendix I.). At these frequencies, the wild type frequency is small and the mutant is in the process of fixation. To analyze the allele frequency trajectory at these frequencies, we model the wild type using the coupled branching process in Eq. 16, and hence describe these trajectories by the extinction phase of the wild type.

To calculate the distribution of $f(t)$ in the peak phase, we need to calculate the distribution of the time integral of $f_0(t)$ in Eq. 22. We can simplify this integral by observing that $g_1(t)$ is highly peaked in time between $t_d^{(1)} - \Delta t^{(1)}/2$ and $t_d^{(1)} + \Delta t^{(1)}/2$, where $t_d^{(1)}$ and $\Delta t^{(1)}$ are given by Equations 13 and 15 and annotated in Figure 4C. In other words, starting at times around $t_d^{(1)}$ generations after the lineage reaches a substantial frequency in the founding class, the labeled lineage is dominated by the deleterious descendants of individuals extant in the founding class between $t_d^{(1)} - \Delta t^{(1)}/2$ and $t_d^{(1)} + \Delta t^{(1)}/2$ generations earlier, with individuals extant in the founding class at other times having exponentially smaller contributions (see Appendix E. for details). Thus, the total size of the lineage will be proportional not to the frequency $f_0(t - t_d^{(1)})$ in the founding class $t_d^{(1)}$ generations earlier, but to the total time-integrated frequency within some window of width $\Delta t^{(1)}$ centered around that time. We call this quantity the weight and denote it by $W_{\Delta t^{(1)}}$, where

$$W_{\Delta t^{(1)}}(t) = \int_{t - \Delta t^{(1)}/2}^{t + \Delta t^{(1)}/2} f_0(t')\, dt'. \tag{23}$$

The total allele frequency in the peak phase is therefore equal to

$$f(t) \approx U_d\, W_{\Delta t^{(1)}}(t - t_d)\, g_1(t_d). \tag{24}$$

Thus, to calculate the distribution of the allele trajectory, we only need to calculate the distribution of the weight in the founding class over a window of specified width, $\Delta t^{(1)}$. It is informative to consider the time-integrated form of the distribution of this weight, $p(W_{\Delta t^{(1)}}) = \int dt\, p(W_{\Delta t^{(1)}}, t)$, since this form is also directly relevant to the site frequency spectrum (for a discussion of the time-dependent distribution of $W_{\Delta t^{(1)}}(t)$, see Appendix F). In Appendix F we show that $p(W_{\Delta t^{(1)}})$ is given by t () Nπ, W W p(w t ()) 3/ t () t () N, t () W, W t() t () t () N. (25)

This distribution has a form that can be simply understood in terms of the trajectory in the founding class. Since genetic drift takes order $N f_0$ generations to change $f_0$ substantially, drift will not change $f_0$ significantly within $\Delta t^{(1)}$ generations when the frequency in the founding class exceeds $\Delta t^{(1)}/N = \frac{1}{Ns}$. As a result, the weight, $W_{\Delta t^{(1)}}$, will be approximately equal to $W_{\Delta t^{(1)}} \approx f_0\, \Delta t^{(1)} = f_0/s$. Therefore, at these large frequencies, the weight simply traces the feeding class frequency, and the two quantities have the same distributions. At lower frequencies, $f_0 \ll \frac{1}{Ns}$, the founding genotype will typically have arisen and gone extinct in a time of order $N f_{0,\max}$ generations (where $f_{0,\max}$ is the maximal frequency the lineage reaches over the course of its lifetime). By assumption, this time is much shorter than $1/s$. Thus, the weight in a window of width $1/s$ that contains this trajectory is simply $W_{\Delta t^{(1)}} \sim f_{0,\max} \cdot N f_{0,\max}$. This large a trajectory is obtained with probability $\frac{1}{N f_{0,\max}}$, from which it follows (by a change of variable) that the distribution of weights in the founding class scales as $W_{\Delta t^{(1)}}^{-3/2}$.

As we anticipated in our discussion of the propagation of fluctuations of the founding genotype through the fitness distribution (Fig. 5), we have found that the trajectory of the allele in the peak phase looks like a smoothed out, time-delayed and amplified version of the trajectory in the founding class (Fig. 6). At sufficiently high frequencies, (t) U g Ns, the timescale of the smoothing is shorter than the typical timescale of the fluctuations in the founding class. At these frequencies the statistics of the fluctuations of the allele simply mirror the statistics of the fluctuations in the founding class, with a time delay equal to $t_d^{(1)} = \log(\lambda)/s$. At lower frequencies, Ns g, the timescale of smoothing is much longer than the typical lifetime of the founding genotype. As a result, the deleterious descendants of the entire original genotype rise and fall simultaneously and fluctuations in the founding class are not reproduced in detail.
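The relationship between the founding-class trajectory and the total allele frequency discussed above (the founding-class frequency plus $U_d$ times its convolution with $g_1$) can be explored numerically. The sketch below is an illustration and not the authors' code: the function names, the trapezoid rule, the parameter values, and the assumption that $f_0$ vanishes before $t = 0$ (so the integral starts at 0 rather than at $-\infty$) are all choices made here. It checks two limits: a constant founding-class frequency is amplified by $e^{\lambda}$, and a pulse much shorter than $1/s$ produces a peak of height roughly $U_d\, W\, g_1(t_d)$, where $W$ is the time-integrated weight of the pulse.

```python
import math


def g1(t, s, lam):
    """Growth factor g_1(t) = exp(-s t + lam (1 - exp(-s t))) for class-1 founders."""
    return math.exp(-s * t + lam * (1.0 - math.exp(-s * t)))


def total_frequency(f0_traj, n, dt, s, lam):
    """f(t_n) = f0(t_n) + U_d * integral_0^{t_n} f0(tau) g1(t_n - tau) dtau.

    f0_traj[m] is the founding-class frequency at time m*dt; f0 is assumed
    to be zero before t = 0. The integral uses the trapezoid rule.
    """
    if n == 0:
        return f0_traj[0]
    U_d = lam * s
    acc = 0.0
    for m in range(n + 1):
        w = 0.5 if m in (0, n) else 1.0
        acc += w * f0_traj[m] * g1((n - m) * dt, s, lam)
    return f0_traj[n] + U_d * dt * acc


# Hypothetical values for illustration.
s, lam, dt = 0.05, 3.0, 0.5

# (1) Slow limit: constant f0 is amplified by exp(lam).
const = [1e-3] * 801
print(total_frequency(const, 800, dt, s, lam), 1e-3 * math.exp(lam))

# (2) Fast limit: a one-generation pulse yields a peak ~ U_d * W * g1(t_d).
pulse = [0.0] * 200
pulse[20] = pulse[21] = 1e-3
W = 2 * 1e-3 * dt
t_d = math.log(lam) / s
print(total_frequency(pulse, 64, dt, s, lam), lam * s * W * g1(t_d, s, lam))
```

In the second case only the integrated weight of the pulse matters, not its detailed shape, which is the smoothing effect the text describes.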
Instead, the peak phase of the allele frequency trajectory consists of a single peak with size proportional to the total lifetime weight of the founding genotype, $W = \int f_0(\tau)\,d\tau$. As we calculated above, the distribution of these peak sizes falls off more rapidly than neutrally. This gives us a complete description of the statistics of the peaks of allele frequency trajectories in the frequency range Ns g.

Statistics of trajectories with Ns g

So far, we have only considered trajectories of lineages that reach a maximal allele frequency larger than Ns g, all of which must have arisen in the mutation-free class. At lower frequencies, 4Ns g Ns g, the effects of genetic drift in class $i = 1$ must also be considered, but the behavior in classes with $i \geq 2$ is deterministic. In this case, by repeating our earlier procedure, we obtain a slightly different form for the generating function $H(z, t)$,

$$H(z, t) = \left\langle e^{-z\left(f_0 + f_1 + U_d \int^{t} d\tau\, f_1(\tau)\, g(t - \tau)\right)} \right\rangle, \qquad (26)$$

so that the total allele frequency is

$$f(t) = f_0(t) + f_1(t) + U_d \int^{t} d\tau\, f_1(\tau)\, g(t - \tau). \qquad (27)$$

The total allele frequency is once again dominated by the last term, which represents the bulk of the deleterious descendants. Thus, by an analogous argument, the peak size of the lineage is proportional to the weight in the 1-class in a window of width $\Delta t^{(2)} = \frac{1}{2s}$,

$$f(t) \approx U_d\, W_{\Delta t^{(2)}}\left(t - t_d^{(2)}\right) g. \qquad (28)$$

There are two types of trajectories that can reach these frequencies: trajectories that arise in the 1-class and reach a sufficiently large frequency in their founding class ( λ/ NUd, see Appendix G), and trajectories that arise in the 0-class and reach a smaller frequency in their founding class ( 0 λ / NUd ), but still leave behind enough deleterious descendants in the 1-class that the overall frequency in that class exceeds λ/ NUd. By the argument that we outlined before, this ensures that genetic drift will be negligible in classes of lower fitness (i.e. for $i \geq 2$), and is guaranteed to happen if 0 λ/4 NU d (see Appendix G).
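The $-3/2$ power law for the founding-class weight quoted above can be checked with a short change of variables. The sketch below assumes the standard neutral result that a lineage reaches a maximal frequency of at least $x$ with probability of order $1/(Nx)$; the symbol $f_{0,\max}$ for the maximal founding-class frequency and the omission of $O(1)$ constants are simplifications made here for illustration.

```latex
\begin{align}
W &\sim f_{0,\max}\cdot N f_{0,\max} = N f_{0,\max}^{2}
  &&\Rightarrow\quad f_{0,\max} \sim \sqrt{W/N},\\
\Pr\left[f_{0,\max} > x\right] &\sim \frac{1}{N x}
  &&\Rightarrow\quad \Pr\left[W > w\right] \sim \frac{1}{N\sqrt{w/N}} = \frac{1}{\sqrt{N w}},\\
p(w) &= -\frac{d}{dw}\Pr\left[W > w\right] \sim \frac{1}{2\sqrt{N}}\, w^{-3/2}.
\end{align}
```

The same steps applied to the frequency itself give a density proportional to $x^{-2}$, which is why the weight distribution falls off with a different exponent than the distribution of maximal frequencies.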
The trajectories of the former type are simple to understand, since in this case, the trajectory $f_1(t)$ is that of a simple, isolated, deleterious locus with fitness $-s$ (and $f_0(t) = 0$ at all times). By repeating the same procedure as above, we find that the time-integrated distribution of the weights in the 1-class is

p(w t ()) t () e Ns W t (). 3/ πnw t () (29)

Note that since the trajectory of a mutation in the founding 1-class is longer than $1/s$ generations only exponentially rarely, a window of length $\Delta t^{(2)}$ nearly always contains the entire founding class trajectory (see Appendix F). This is reflected in the form of the weight distribution in Equation 29, which falls as $W^{-3/2}$ with an exponential cutoff at t () Ns. Thus, the frequency trajectory of an allele that arises in the 1-class will not mirror the fluctuations in the founding genotype. Instead, the peak phase of the allele frequency trajectory will nearly always consist of a single peak, just as we have seen in the case of alleles peaking at frequencies Ns g.

We now return to the other type of trajectory that can peak in this range: alleles arising in the 0-class, but reaching a small enough frequency that the effects of genetic drift in the 1-class cannot be ignored ( 0 λ NU d ). Because the trajectory of these alleles in the 1-class represents the combined trajectory of multiple clonal sub-lineages each founded by a mutational event in the 0-class, the distribution of weights in the 1-class will be different ($p(W_{\Delta t^{(2)}}) \sim W_{\Delta t^{(2)}}^{-5/4}$, see Appendix G), which leads to a different distribution of overall allele frequencies. However, as we show in Appendix I., these trajectories have a negligible impact on the site frequency spectrum: because the overall number of mutations arising in class 1 is substantially larger than the overall number of mutations arising in class 0, trajectories that arise in class 0 and peak in the same frequency range as mutations originating in class 1 are less frequent by a large factor ($\lambda^{3/4}$, see Appendix G).

Similarly, at even lower frequencies in the range Ns(i+) g i+ Nsi g i, we will see the peaks of trajectories arising on backgrounds with $i$ or fewer deleterious mutations. These trajectories all have a single peak of width equal to $\Delta t^{(i+1)} = \frac{1}{(i+1)s}$.
The maximal peak sizes are, once again, proportional to the total weight in the $i$-class, which will be distributed according to a different power law depending on the difference in the number of deleterious mutations = k i between the founding class $k$ and the $i$-class (see Appendix G for details). As we show in Appendix I., the most numerous of these mutations are those that arise in the $i$-class ($k = i$). The index of this most numerous class is a quantity that we return to at multiple points, and we denote it with $k_c(f)$. We can obtain an explicit form for how $k_c(f)$ depends on the frequency $f$ by solving the implicit condition Ns(k g c+) k c+ Nsk c g kc for $k_c$. We show in Appendix I. that, to leading order,

$$k_c(f) + 1 \approx \log_\lambda\left(\frac{1}{N s f e^{-\lambda}}\right), \quad \text{when } k_c(f) \gg 1. \qquad (30)$$

Finally, at the very lowest frequencies, $f \ll \sqrt{\lambda}/(N U_d) = 1/(N\sigma)$, the site frequency spectrum is dominated by the trajectories of lineages that arise in a class that is within a standard deviation $\sigma$ of the mean of the fitness distribution (i.e. lineages with $|\lambda - k| \lesssim \sigma/s = \sqrt{\lambda}$). Unlike the trajectories of lineages that arise in classes of higher fitness that we discussed above, allele frequency trajectories of lineages arising within a standard deviation of the mean are typically dominated by drift throughout their lifetime (see Appendix E.3). This is because the timescale on which these lineages remain above the mean of the fitness distribution (which is limited by $t_d(k)$) is shorter than the timescale that it takes them to drift to a frequency large enough for the effect of selection to be felt ( N(U d sk) t d(k)). Lineages arising in these classes do not reach frequencies substantially larger than $1/(N\sigma)$, and have largely neutral trajectories at frequencies that remain below this threshold.

The mirrored fluctuations of the allele at intermediate frequencies

We have seen that the effects of genetic drift in multiple fitness classes may be important at low frequencies, but that at sufficiently large frequencies, genetic drift in all classes apart from the 0-class can be neglected.
At these frequencies, the trajectory of the allele mirrors the fluctuations in the 0-class that occur on timescales longer than $1/s$ generations. We have also seen that overall allele frequencies larger than $1/(N s e^{-\lambda})$ correspond to 0-class frequencies $f_0 \gg 1/(Ns)$. At more substantial allele frequencies (for which the condition that $f \ll 1$ is not satisfied), the coupled branching process in Eq. 6 cannot be used to adequately model the allele frequency trajectory. This is because at these frequencies the correlations between fluctuations in the frequencies of the mutant and of the wild type, which are imposed by the finite size constraint of the population, become important. However, we can account for these correlations simply by making use of the fact that the effect of genetic drift in all classes but the 0-class will remain negligible as long as both the mutant and the wild type remain at sufficiently large frequencies. Thus, to model the overall allele frequency trajectory at these intermediate frequencies, we can use a simple, neutral model to describe the frequency of the mutant in the 0-class, $f_0(t)$, and the frequency of the wild type in the 0-class, $f_{wt,0} = h_0 - f_0(t) = e^{-\lambda} - f_0(t)$,

d 0 dt = 0 (e λ 0 ) η 0 (t) N (31)

and treating the remainder of the population deterministically (which yields an expression for the relationship between $f_0(t)$ and $f(t)$ that is identical to Eq. ). Furthermore, since we have assumed that $N s e^{-\lambda} \gg 1$, an additional simplification arises. In this frequency range, the frequencies of both the mutant and the wild type in the 0-class exceed $1/(Ns)$. Thus, large fluctuations in the frequency of the mutant and of the wild type occur on timescales that are longer than $1/s$ generations. Because this timescale is longer than the timescale on which selection in lower classes responds ($1/s$), large fluctuations in the 0-class are mirrored by the overall frequency trajectory, after a time delay. In other words, on timescales longer than $1/s$ generations, we can expand the exponent in the integrand in Eq. around its peak and approximate the total allele frequency of the mutant and the wild type alleles as $f(t) \approx e^{\lambda} f_0\left(t - \frac{\log\lambda}{s}\right)$ and $f_{wt}(t) \approx e^{\lambda}\left[h_0 - f_0\left(t - \frac{\log\lambda}{s}\right)\right]$, which yields a model for the total allele frequency of the mutant

$$\frac{df}{dt} = \sqrt{\frac{f(1-f)}{N e^{-\lambda}}}\,\eta(t), \qquad (32)$$

where $\eta(t)$ is an effective noise term with mean $\langle \eta(t) \rangle = 0$, variance $\langle \eta(t)^2 \rangle = 1$ and auto-correlation $\langle \eta(t)\eta(t') \rangle$ that vanishes on timescales longer than $1/s$. Thus, on timescales longer than $1/s$, the allele frequency trajectory is just like that of a neutral mutation in a population of smaller size $N e^{-\lambda}$. On shorter timescales, the allele frequency trajectory will be more correlated in time than the frequency trajectory of a neutral mutation in a population of that size and will appear smoother. However, since large frequency changes of alleles at these frequencies will only occur on a timescale of order $N e^{-\lambda}$, which is much longer than $1/s$, this description will be sufficient for describing site frequency spectra. We emphasize that Eq.
32 relies on the overall fluctuations in the fitness distribution of the population being negligible on relevant timescales, so that the average number of deleterious mutations per individual, $\bar{k}(t)$, is approximately equal to $\lambda$ (and, crucially, independent of $f$). We expect that this approximation is valid when $N s e^{-\lambda} \gg 1$, because the overall fluctuations in $\bar{k}(t)$ are small compared to $\lambda$ in this limit (Neher and Shraiman, 2012). However, it is less clear whether this approximation continues to be appropriate as $N s e^{-\lambda}$ approaches more moderate, $O(1)$ values. A more detailed exploration of these effects would require a path integral approach similar to that of Neher and Shraiman (2012), and is beyond the scope of this work.

The trajectories of high frequency alleles

The neutral model from the previous section breaks down when the allele frequency of the mutant exceeds $1 - 1/(N s e^{-\lambda})$. These total allele frequencies are attained when the frequency of the founding genotype $f_0$ exceeds the frequency $h_0 - 1/(Ns)$. When this occurs, the frequency of the wild type in the founding class will fall below $1/(Ns)$ and fluctuations that occur on timescales shorter than $1/s$ generations will once again become important. Mutant lineages that reach such high frequencies are almost certain to fix in the 0-class. Once this happens, all individuals that carry the wild type allele at the locus will also be linked to a deleterious variant. Thus, though the mutant carries no inherent fitness benefit, it will thereafter appear fitter than the wild type, because it has fixed among the most fit individuals in the population. The mutant will therefore proceed to perform a true selective sweep and will drive the wild type allele to extinction. At these high allele frequencies, we can once again use the coupled branching process in Eq. 6 to describe the allele frequency trajectory of the wild type, $f_{wt}(t) = 1 - f(t)$. Seen from the point of view of the wild type, the fixation phase of the mutant corresponds to the extinction phase of the wild type (see right inset in Figure 3).
To obtain a description of the allele frequency trajectory of the wild type at these times, we can expand the generating function in Eq. at long times, which yields

$$H_{wt}(z, t) \approx \exp\left[-z\, e^{-(k+1)s(t - t_0)}\right], \qquad (33)$$

for some choice of $t_0$ (see Appendix E. for details). Note that, as before, Eq. 33 is valid only as long as wt Ns g (i.e. as long as the size of the lineage in the 1-class exceeds $1/(Ns)$). Once the frequency of the wild type in the 1-class falls below $1/(Ns)$, we can no longer treat this class deterministically. Once this happens, the part of the wild type that is in the 1-class will drift to extinction within about $1/s$ generations, whereas its bulk will continue to decay at a rate proportional to its average fitness, $2s$. This will go on for as long as the frequency in the 2-class is larger than $1/(4Ns)$, corresponding to the total frequency of the lineage being larger than 4Ns g. Once the frequency of the wild type in the 2-class also falls below $1/(4Ns)$, the bulk of the lineage will continue to decay even more rapidly, at rate $3s$, and so on. In general, once the frequency of the lineage in class $k_c$, but not in class $k_c + 1$, falls below Nsk c, which corresponds to the total frequency of the wild type being in the range Ns(k c+) g k c+ wt Nsk c g kc, the average fitness of the bulk of the wild type will be equal to $s_{\mathrm{eff}} = s(k_c + 1)$ (see Fig. 3). Thus, the wild type goes extinct in a staggered fashion, dying out in classes of higher fitness first, and declining in relative fitness in this process. As a result, the effective negative fitness of the wild type increases as its frequency declines, leading to an increasingly rapid exponential decay of the allele frequency (see right inset in Fig. 6A). By solving the implicit condition for $k_c(f_{wt})$ above as previously (see Appendix E., Eq. E9 Eq. E), we find that the average fitness of the bulk of the wild type distribution $s_{\mathrm{eff}}(f_{wt})$ is to leading order equal to

$$s_{\mathrm{eff}}(f_{wt}) = s\left(k_c(f_{wt}) + 1\right) \approx s \log_\lambda\left(\frac{1}{N s f_{wt} e^{-\lambda}}\right), \qquad (34)$$

for sufficiently small $f_{wt}$. This means that the frequency trajectory of the wild type in this phase obeys

$$f_{wt}(t) \sim e^{\log_\lambda\left(N f_{wt} s e^{-\lambda}\right) s t}. \qquad (35)$$

THE SITE FREQUENCY SPECTRUM IN THE PRESENCE OF BACKGROUND SELECTION

Having obtained a distribution of allele frequency trajectories, we are now in a position to evaluate the site frequency spectrum. Since the trajectory of any lineage depends on the fitness of the background that it arose on, we will find it convenient to divide the total site frequency spectrum $p(f)$ into the site frequency spectra of mutations with different ancestral background fitnesses, $p(f, k)$. By definition, the total site frequency spectrum $p(f)$ is the sum over these single-class frequency spectra,

$$p(f) = \sum_k p(f, k). \qquad (36)$$

We evaluate the site frequency spectrum in three overlapping regimes: rare alleles, intermediate frequencies, and high frequencies.
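The staggered extinction of the wild type described by Eqs. 34 and 35 can be illustrated numerically. The sketch below integrates $df_{wt}/dt = -s_{\mathrm{eff}}(f_{wt})\, f_{wt}$ with a forward Euler step, taking the leading-order decay rate $s_{\mathrm{eff}} = s \log_\lambda\left(1/(N s f_{wt} e^{-\lambda})\right)$. The function names, the parameter values, and the floor of the decay rate at $s$ (used so that the rate stays sensible outside the regime where the leading-order expression holds) are assumptions made for illustration, not part of the analysis above.

```python
import math


def effective_decay_rate(f_wt, N, s, lam):
    """Leading-order decay rate s * log_lam(1 / (N s f_wt e^{-lam})).

    Floored at s (an assumption of this sketch): while the wild type is
    still present in the 1-class, its bulk decays at rate s.
    """
    arg = 1.0 / (N * s * f_wt * math.exp(-lam))
    return s * max(1.0, math.log(arg) / math.log(lam))


def wild_type_trajectory(f_start, N, s, lam, generations):
    """Forward-Euler integration of df/dt = -s_eff(f) * f, one generation per step."""
    f = f_start
    trajectory = [f]
    for _ in range(generations):
        f -= effective_decay_rate(f, N, s, lam) * f
        trajectory.append(f)
    return trajectory


# Hypothetical parameters with N s e^{-lam} >> 1 (strong selection).
N, s, lam = 10**7, 0.01, 5.0
traj = wild_type_trajectory(1e-3, N, s, lam, generations=600)
early_rate = effective_decay_rate(traj[0], N, s, lam)
late_rate = effective_decay_rate(traj[-1], N, s, lam)
print(f"decay rate: {early_rate:.4f} at f_wt = {traj[0]:.1e}, "
      f"{late_rate:.4f} at f_wt = {traj[-1]:.1e}")
```

The decay rate grows as the wild type becomes rarer, so the logarithm of the frequency falls faster than linearly in time, which is the accelerating decline referred to in the text.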
The site frequency spectrum of rare alleles

The rare end of the frequency spectrum consists of neutral alleles that (because they are rare) occurred on different genetic backgrounds. These alleles thus have independent allele frequency trajectories that can be described by the coupled branching process, Eq. 6. As long as $f(t) \ll 1/(N\sigma)$, most lineage trajectories are dominated by genetic drift. Intuitively this result is simple: provided that the lineage is rare enough, selection pressures in any fitness class (or more precisely, the bulk of the fitness classes where the vast majority of such alleles arise) can be neglected compared to drift. Thus, the resulting site frequency spectrum is

$$p(f, k) \approx \frac{N U_n h_k}{f}, \quad \text{for } f \ll \frac{1}{N\sigma}. \qquad (37)$$

At these frequencies, the total site frequency spectrum is equal to

$$p(f) = \sum_k p(f, k) \approx \frac{N U_n}{f}, \quad \text{for } f \ll \frac{1}{N\sigma}. \qquad (38)$$

This agrees with our earlier intuition that at the lowest frequencies the entire population contributes to the site frequency spectrum, and also with the results of Wright-Fisher simulations (see Figure 3). Since the effects of selection are negligible, each fitness class contributes proportionally to its size, with the largest fitness classes contributing the most (see Figure 7). The deleterious mutation-free ($k = 0$) class has a negligible effect on the site frequency spectrum, contributing only a small proportion (proportional to its total frequency, $h_0 = e^{-\lambda}$) of all variants seen at these frequencies.

At larger frequencies, $f \gg 1/(N\sigma)$, selection plays an important role in shaping allele trajectories and the site frequency spectrum. The overall contribution of mutations originating in class $k$ near some frequency $f$ is determined not only by the overall rate $N U_n h_k$ at which such mutations arise, but also by the probability that these mutations reach $f$, which declines with the initial deleterious load $k$. As a result, as $f$ increases, the site frequency spectrum will become increasingly enriched for alleles arising in unusually fit backgrounds (see Figure 7).
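The decomposition in Eqs. 37 and 38 can be checked with a few lines of arithmetic. The sketch below assumes the standard Poisson mutation-selection balance $h_k = e^{-\lambda}\lambda^k/k!$ with $\lambda = U_d/s$ (consistent with $h_0 = e^{-\lambda}$ above) and $\sigma = s\sqrt{\lambda}$; the function names and parameter values are hypothetical.

```python
import math


def class_fractions(lam, k_max):
    """Poisson class sizes h_k = e^{-lam} lam^k / k! for k = 0..k_max."""
    return [math.exp(-lam) * lam**k / math.factorial(k) for k in range(k_max + 1)]


def rare_sfs_by_class(f, N, U_n, lam, k_max):
    """Single-class spectra p(f, k) ~ N U_n h_k / f in the drift-dominated regime."""
    return [N * U_n * h_k / f for h_k in class_fractions(lam, k_max)]


# Hypothetical parameters: lam = U_d / s is close to 5.
N, U_n, U_d, s = 10**7, 1e-3, 0.05, 0.01
lam = U_d / s
sigma = s * math.sqrt(lam)
f = 0.01 / (N * sigma)  # well below the drift threshold 1 / (N sigma)

per_class = rare_sfs_by_class(f, N, U_n, lam, k_max=60)
total = sum(per_class)
print(f"sum over classes / (N U_n / f) = {total / (N * U_n / f):.6f}")
print(f"share of the k = 0 class       = {per_class[0] / total:.6f}")
```

Summing the single-class spectra recovers the neutral form $N U_n / f$, and the share contributed by the mutation-free class equals $h_0 = e^{-\lambda}$, which is small whenever $\lambda$ is large.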
The contributions to the site frequency spectra $p(f, k)$ are straightforwardly obtained by integrating in time the distributions of allele frequency trajectories that we have described in the Analysis section,

$$p(f, k) = N U_n h_k \int p(f, k, t)\,dt. \qquad (39)$$

FIG. 7 Proportions of polymorphisms at a given frequency that arose in genotypes that had a specified number of additional deleterious mutations compared to the most fit genotype at the locus at the time they arose. The parameters in these simulations are the same as the parameters used to generate site frequency spectra in Fig. and Fig. 3. Note that for frequencies $f > 1/(N s e^{-\lambda})$, the entire site frequency spectrum is composed of lineages that arose due to neutral mutations in the $k = 0$ class (with only a few exceptions that arise due to rare ratchet events).

This integral is dominated by the peak phase of allele frequency trajectories, during which we have seen that the allele frequency is simply proportional to the weight in class $k_c(f) \approx \log_\lambda\left(\frac{1}{N s f e^{-\lambda}}\right)$, which corresponds to the class of lowest fitness in which the dynamics are not deterministic,

$$p(f, k, t) = \begin{cases} p\left[f = U_d\, g_{k_c} W_{\Delta t^{(k_c)}}\left(t - t_d^{(k_c)}\right)\right], & k \le k_c(f),\\ 0, & \text{otherwise.} \end{cases} \qquad (40)$$

The overall site frequency spectrum is equal to the sum of these terms (Eq. 36). In Appendix I. we show that this sum is well-approximated by the last term, corresponding to $k = k_c(f)$, and obtain that the site frequency spectrum in the rare end is, to leading order

NU n p() Ns λ log λ ( /Nse λ ) for λ NU d, NU n e λ for, (41)

where the form of the frequency spectrum for is valid up to a constant factor (see Appendix I. for details). A comparison between these predictions and site frequency spectra obtained in Wright-Fisher simulations of the model is shown in Figure .

These results reproduce much of what we may have anticipated from our analysis of allele frequency trajectories. At frequencies $f \gg 1/(N s e^{-\lambda})$, these peaks represent the mirrored and amplified trajectories in the mutation-free ($k = 0$) founding class. To reach these frequencies, mutations need to arise in the mutation-free class (which happens at rate $N U_n e^{-\lambda}$) and drift to substantial frequencies ($f_0 \gg 1/(Ns)$).
Since fluctuations in the founding class of lineages that exceed this frequency are slow compared to the timescale on which their deleterious descendants remain at their peak ($\Delta t = 1/s$), the entire allele frequency trajectory reproduces the fluctuations of the neutral founding class. Thus, neutral site frequency spectra proportional to $f^{-1}$ emerge.

At smaller frequencies, allele frequency trajectories reflect the smoothed out fluctuations in the high-fitness classes. At these frequencies, the site frequency spectrum is composed of a rapidly increasing number of polymorphisms as the frequency decreases for three reasons (see Fig. 3 and Fig. 7). First, lower frequencies correspond to smaller feeding class weights, which are more likely simply due to the effects of drift. Note that because in this frequency range the overall allele frequency is proportional to the total weight in the founding class and not the frequency, this effect leads to the site frequency spectrum falling off at a faster rate than the baseline expectation of $f^{-1}$, which would occur in the absence of smoothing of fluctuations in the founding class due to the finite timescale of selection. Second, the number of individuals with $k$ deleterious mutations in the locus increases with $k$ (for $k < \lambda$), causing an increase in the overall rate at which alleles peaking at lower frequencies arise. This variation in the overall number of such alleles gives rise to the steeper power law, $f^{-2}$, which can be compared to the distribution of peaks of individual lineages, which decays at most as $f^{-3/2}$. Finally, peaks that occur at frequencies Nsk g c() k c() Ns(k g c() ) k c() have duration of order $\Delta t^{(k_c(f))} = \frac{1}{k_c(f)\, s}$, which declines with the frequency, giving rise to the root logarithm factor. By comparing our predictions to the results of Wright-Fisher simulations, we can see that this argument correctly predicts the form of the site frequency spectrum in these regimes, as well as the frequencies at which these transitions happen (see Figure 3).

The site frequency spectrum at intermediate and high frequencies

At frequencies much larger than $1/(N s e^{-\lambda})$ but still smaller than $1 - 1/(N s e^{-\lambda})$, the allele frequency trajectory is described by an effective neutral model on coarse enough timescales ($\gg 1/s$). At these frequencies, the site frequency spectrum is

$$p(f) \approx \frac{N U_n e^{-\lambda}}{f}, \qquad (42)$$

for $1/(N s e^{-\lambda}) \ll f \ll 1 - 1/(N s e^{-\lambda})$. Note that this agrees with the result of the branching process calculation, which is valid in a part of this range, at frequencies corresponding to $f \gg 1/(N s e^{-\lambda})$ and $f \ll 1$. This breaks down at even higher frequencies. These frequencies correspond to the extinction phase of the wild type, during which the allele frequency no longer mirrors the frequency in the 0-class, but instead declines exponentially at an accelerating rate (see Eq. 35). Equation 35 can be straightforwardly integrated in time (see Appendix I. for details), which yields the form of the site frequency spectrum at these high frequencies

$$p(f) \approx \frac{U_n}{s(1-f)\log_\lambda\left(\frac{1}{N s (1-f) e^{-\lambda}}\right)}, \qquad (43)$$

for $1 - f \gg 1/(N\sigma)$. Finally, once the wild type frequency falls below $1/(N\sigma)$, it will be in an analogous situation as the mutant at very low frequencies: independent of how it is distributed among the fitness classes, its trajectory will be dominated by drift, since most individuals in the population have fitness that does not differ from the mean fitness by more than $\sigma$. Thus, at these frequencies, the site frequency spectrum will once again agree with the site frequency spectrum of neutral loci isolated from any selected sites in the genome

p() U n, for Nσ.
(44)

The deleterious site frequency spectrum

So far, we have focused primarily on describing the trajectories and site frequency spectra of neutral mutations. However, because the trajectory of a neutral mutation that arises in an individual with $k$ deleterious mutations is equivalent to the trajectory of a deleterious mutation that arises in an individual with $k - 1$ deleterious mutations, descriptions of trajectories of deleterious mutations follow without modification from our descriptions of trajectories of neutral mutations. The deleterious site frequency spectrum can thus be constructed from the single-class site frequency spectra of neutral mutations by a simple modification of the total rates at which new deleterious mutations arise (specifically, with the contribution to the deleterious site frequency spectrum of mutations arising in class $k$ being equal to $p_{del}(f, k) = \frac{N U_d h_k}{N U_n h_{k+1}}\, p_{neutral}(f, k+1)$). By summing these contributions, we find that the deleterious site frequency spectrum is to leading order

NU d, for Nσ p del () =, if λ log λ ( /Nse λ ) Nσ NU d e λ 0, otherwise (45)

where the form proportional to log( /Nse λ ) is once again valid up to a constant factor (for the same reason as described in Appendix I.).

DISTRIBUTIONS OF EFFECT SIZES

The model we have thus far considered assumes that all deleterious mutations have the same effect on fitness, $s$. In reality, different deleterious mutations will have different fitness effects. In Appendix J, we show that as long as the variation in the distribution of fitness effects (DFE) is small enough that var(s) s log(U d / s), the effects of background selection are well-captured by a single-$s$ model. In practice, this means that when considering moderate values of $U_d/\bar{s} \approx e^{5}$, fractional differences in selection coefficients up to $1/\log(U_d/\bar{s}) \approx 20\%$ will not substantively alter allele frequency trajectories. In this case, the combined effects of these mutations are well-described by our single-$s$ model.

However, when mutational effect sizes vary over multiple orders of magnitude, properties of the DFE will have an important impact on the quantitative details of the mutational trajectories that are not captured by our single-$s$ model. The qualitative properties of allele frequency trajectories will remain the same (see Appendix J): alleles arising on unusually fit backgrounds will rapidly spread through the fitness distribution, peak for a finite amount of time about $t_d$ generations later, and then proceed to go extinct at a rate proportional to their average fitness cost. However, the quantitative aspects of these trajectories will be different. For instance, small differences in the fitness effects of mutations, $\delta s \sim 1/t_d$, that do not impact the early stages of trajectories will be revealed on timescales of order $t_d$, and affect the size and the width of the peak of the allele frequency trajectory. We have seen that these two quantities play an important role in determining the properties of allele frequency trajectories and of the site frequency spectrum. Because weaker effects and smaller differences in effect sizes play a more important role in later parts of the allele frequency trajectory, the DFE relevant during the early phases of the trajectory may be different from the DFE relevant in the later phases of the trajectory. Furthermore, since longer-lived trajectories are also those that reach higher frequencies (having originated in backgrounds of higher fitness), the DFE that is relevant at larger frequencies can differ from the DFE relevant at lower frequencies. As a result, it is possible that for certain DFEs, no single effective effect size can be used to describe the trajectories at all frequencies. The full analysis of a model of background selection in which mutational effect sizes have a broad distribution remains an interesting avenue for future work.
DISCUSSION

In this work, we have analyzed how linked purifying selection changes patterns of neutral genetic diversity in a process known as background selection. We have found that whenever background selection reduces neutral genetic diversity, it also leads to significant distortions in the neutral site frequency spectrum that cannot be explained by a simple reduction in effective population size (see Figure ). These distortions become increasingly important in larger samples and have more limited effects in smaller samples (Fig. B,C). In this sense, the sample size represents a crucial parameter in populations experiencing background selection.

By introducing a forward-time analysis of the trajectories of individual alleles in a fully linked genetic locus experiencing neutral and strongly selected deleterious mutations, we derived analytical formulas for the whole-population site frequency spectrum (see Equations and ). These results can be used to calculate any diversity statistic based on the site frequency spectrum in samples of arbitrary size. Our results also offer intuitive explanations of the dynamics that underlie these distortions, and give simple analytical conditions that predict when such distortions occur (Figure 3).

In addition to single-timepoint statistics such as the site frequency spectrum, our analysis also yields time-dependent trajectories of alleles. We suggest that these may be crucial for distinguishing between evolutionary models that may remain indistinguishable based on site frequency spectra alone. We explain how this intuition about the time-dependent behavior can, in principle, be used to make simple predictions about the history and future of alleles, and we explain that it suggests new statistics of time-resolved samples that can be used to distinguish between different evolutionary models. We discuss these implications in turn below.
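Because the formulas above describe the whole-population spectrum $p(f)$, statistics for a finite sample follow from binomial sampling: the expected number of sites at which $i$ of $n$ sampled individuals carry the derived allele is $\int p(f) \binom{n}{i} f^i (1-f)^{n-i}\,df$. The sketch below is a generic numerical projection, not the authors' code; the function name, the log-spaced midpoint rule, and the lower cutoff `f_min` are choices made here for illustration. It is exercised on the neutral spectrum $p(f) = \theta/f$, for which the projection is known to be $\theta/i$.

```python
import math


def project_sfs(p, n, f_min=1e-9, steps=20000):
    """Expected number of sites with i = 1..n-1 derived copies in a sample of n.

    p: callable returning the whole-population spectrum p(f).
    Integrates over f in [f_min, 1) with a midpoint rule in log-frequency.
    """
    log_lo = math.log(f_min)
    dx = -log_lo / steps
    binom = [math.comb(n, i) for i in range(1, n)]
    expected = [0.0] * (n - 1)
    for j in range(steps):
        f = math.exp(log_lo + (j + 0.5) * dx)  # midpoint of the j-th log bin
        weight = p(f) * f * dx                 # p(f) df, with df = f dx
        for i in range(1, n):
            expected[i - 1] += weight * binom[i - 1] * f**i * (1.0 - f) ** (n - i)
    return expected


# Neutral check with a hypothetical theta: each entry should be close to theta / i.
theta = 2.0
sample_sfs = project_sfs(lambda f: theta / f, n=10)
for i, value in enumerate(sample_sfs, start=1):
    print(i, round(value, 4), round(theta / i, 4))
```

Any other whole-population spectrum can be passed in as `p`; for spectra that diverge faster than $1/f$ at low frequency, the singleton class depends on the cutoff, so `f_min` should then be set to the smallest frequency at which the spectrum is meant to apply.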
The frequency of a mutation tells us about its history and future

In addition to describing the expected site frequency spectrum at a single time point, our analysis of allele frequency trajectories allows us to calculate time-dependent quantities such as the posterior distribution of the past frequency trajectory of polymorphisms seen at a particular frequency, their ages, and their future behavior. For example, since the maximal frequency a mutation can attain strongly depends on the fitness of the background in which it arose (with lower-fitness backgrounds constraining trajectories to lower frequencies), observing an allele at a given frequency places a lower bound on the fitness of the background on which it arose. This in turn is informative about its past frequency trajectory. For example, alleles observed at frequencies $f \gg \frac{1}{N U_d e^{-\lambda}}$ almost certainly arose in an individual that was among the most fit individuals in the population and experienced a rapid initial exponential expansion at rate $U_d$, while alleles observed at frequencies k k Nske eλ λ very likely arose on backgrounds with fewer than $k$ deleterious mutations compared to the most fit individual at the time. We emphasize that these thresholds are substantially smaller than the naïve thresholds obtained by assuming that a mutation arising on a background with $k$ mutations can only reach the drift barrier $1/(N_e k s)$ corresponding to isolated deleterious mutations of fitness $-ks$ in a population of effective size $N_e = N e^{-\lambda}$.

The fitness of the ancestral background that a mutation arose on is not only interesting in terms of characterizing the history of a mutation, but is also informative of its future behavior. In the strong selection limit of background selection that we have considered here ($N s e^{-\lambda} \gg 1$), deleterious mutations can fix in the population only exponentially rarely (Neher and Shraiman, 2012). Thus, mutations arising on backgrounds already carrying deleterious mutations must eventually go extinct. We have shown that the site frequency spectrum at sufficiently low frequencies is dominated by mutations arising on deleterious backgrounds. Furthermore, we have shown that most polymorphisms seen at these frequencies are at the peak of their frequency trajectory. This means that we expect the frequency of such polymorphisms to decline on average. Thus, if we were to observe the population at some later timepoint, we expect that the polymorphisms present at such low frequencies should on average be observed at a lower frequency.

In Figure 8 we show how the average change in frequency after $N e^{-\lambda}$ generations depends on the original frequency that a mutation was sampled at, $f$. Note that the expectation for a neutral population of any size is that the average allele frequency change is exactly equal to zero. In the presence of background selection, this is no longer true for neutral mutations previously observed at low and at high frequencies (see Fig. 8). In contrast, since polymorphisms observed at intermediate frequencies must have originated in a mutation-free background, and since their dynamics reflect neutral evolution in this 0-class, the overall dynamics of such alleles are neutral.
Therefore, though drift will lead to variation in the outcomes of individual alleles in this range, the average expected frequency change is equal to zero. This expectation is confirmed by simulations (Fig. 8). Finally, we have seen that polymorphisms seen at the highest frequencies will typically already have replaced the wild type allele within the 0-class. Thus, the wild type allele must eventually go extinct (except for exponentially rare ratchet events). In other words, polymorphisms seen at these frequencies are certain to fix, replacing the ancestral allele at some later point in time (see Fig. 8).

FIG. 8 The average change in frequency of an allele observed at frequency $f$ in an earlier sample. [Axes: original frequency (horizontal), average scaled frequency change (vertical); regions labeled effectively deleterious, neutral, and sweep-like.] The pairs of frequencies were sampled in two consecutive sampling steps in Wright-Fisher simulations in which $N = 10^5$, $N U_d = 4Ns = 10^4$. In the first step, the frequencies of all polymorphisms in the population and their unique identifiers were recorded. In the second sample, which was taken $\delta t = N e^{-U_d/s}$ generations later, the frequencies of all polymorphisms seen in the first sample were recorded (if a polymorphism had gone extinct, or fixed, a frequency $f(\delta t) = 0$, or $1$, was recorded). The average represents the average frequency change over many distinct polymorphisms, and the curve has been smoothed using a Gaussian kernel with width < 5% of the minor allele frequency.

Together, these results show that the site frequency spectrum can be divided into three regimes, in which the dynamics of individual neutral alleles are effectively negatively selected, effectively neutral, and effectively positively selected. These effective selection pressures arise indirectly, as a result of the fitnesses of the variants a neutral mutation at a given frequency is likely to be linked to.
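The two-timepoint comparison summarized in Fig. 8 can be sketched as a simple binned average. The code below is an illustrative reimplementation, not the authors' analysis pipeline: the function name and the logarithmic binning are assumptions, the Gaussian smoothing mentioned in the caption is replaced by plain binning, and the change is left unscaled because the scaling used in the figure is not specified here. Extinct and fixed alleles enter with a second frequency of 0 or 1, as in the caption.

```python
import math
from collections import defaultdict


def mean_frequency_change(pairs, bins_per_decade=2):
    """Average f2 - f1, grouped into logarithmic bins of the original frequency f1.

    pairs: iterable of (f1, f2) with 0 < f1 < 1 and 0 <= f2 <= 1.
    Returns a list of (bin_lower_edge, mean_change, count), sorted by bin.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for f1, f2 in pairs:
        b = math.floor(math.log10(f1) * bins_per_decade)
        sums[b] += f2 - f1
        counts[b] += 1
    return [(10 ** (b / bins_per_decade), sums[b] / counts[b], counts[b])
            for b in sorted(sums)]


# Toy input: two rare alleles that declined and one common allele that did not move.
pairs = [(0.012, 0.006), (0.02, 0.01), (0.5, 0.5)]
binned = mean_frequency_change(pairs)
for edge, change, count in binned:
    print(f"f1 >= {edge:.3g}: mean change {change:+.4f} over {count} alleles")
```

Under strict neutrality the mean change in every bin is expected to be zero up to sampling noise, so systematic negative values at low frequency or positive values at high frequency are the signal described in the text.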
This effect is important to bear in mind when analyzing time-resolved samples, where these effective selection pressures could naïvely be misinterpreted as evidence of direct negative selection on low-frequency derived neutral alleles, and direct positive selection on high-frequency derived neutral alleles.

The distinguishability of models based on site frequency spectra

As has long been appreciated, background selection can lead to signatures in the site frequency spectrum that are qualitatively similar to population expansions and selective sweeps (Charlesworth et al., 1993, 1995; Good et al., 2014; Gordo et al., 2002; Hudson and Kaplan, 1994; Nicolaisen and Desai, 2012; O'Fallon et al., 2010; Tachida, 2000; Walczak et al., 2012; Williamson and Orive, 2002). Here we have

The Structure of Genealogies in the Presence of Purifying Selection: a "Fitness-Class Coalescent"

The Structure of Genealogies in the Presence of Purifying Selection: a Fitness-Class Coalescent The Structure of Genealogies in the Presence of Purifying Selection: a "Fitness-Class Coalescent" The Harvard community has made this article openly available. Please share how this access benefits you.

More information

arxiv: v2 [q-bio.pe] 26 May 2011

arxiv: v2 [q-bio.pe] 26 May 2011 The Structure of Genealogies in the Presence of Purifying Selection: A Fitness-Class Coalescent arxiv:1010.2479v2 [q-bio.pe] 26 May 2011 Aleksandra M. Walczak 1,, Lauren E. Nicolaisen 2,, Joshua B. Plotkin

More information

The achievable limits of operational modal analysis. * Siu-Kui Au 1)

The achievable limits of operational modal analysis. * Siu-Kui Au 1) The achievable limits o operational modal analysis * Siu-Kui Au 1) 1) Center or Engineering Dynamics and Institute or Risk and Uncertainty, University o Liverpool, Liverpool L69 3GH, United Kingdom 1)

More information

Major questions of evolutionary genetics. Experimental tools of evolutionary genetics. Theoretical population genetics.

Major questions of evolutionary genetics. Experimental tools of evolutionary genetics. Theoretical population genetics. Evolutionary Genetics (for Encyclopedia of Biodiversity) Sergey Gavrilets Departments of Ecology and Evolutionary Biology and Mathematics, University of Tennessee, Knoxville, TN 37996-6 USA Evolutionary

More information

CHAPTER 8 ANALYSIS OF AVERAGE SQUARED DIFFERENCE SURFACES

CHAPTER 8 ANALYSIS OF AVERAGE SQUARED DIFFERENCE SURFACES CAPTER 8 ANALYSS O AVERAGE SQUARED DERENCE SURACES n Chapters 5, 6, and 7, the Spectral it algorithm was used to estimate both scatterer size and total attenuation rom the backscattered waveorms by minimizing

More information

Aggregate Growth: R =αn 1/ d f

Aggregate Growth: R =αn 1/ d f Aggregate Growth: Mass-ractal aggregates are partly described by the mass-ractal dimension, d, that deines the relationship between size and mass, R =αn 1/ d where α is the lacunarity constant, R is the

More information

Objectives. By the time the student is finished with this section of the workbook, he/she should be able

Objectives. By the time the student is finished with this section of the workbook, he/she should be able FUNCTIONS Quadratic Functions......8 Absolute Value Functions.....48 Translations o Functions..57 Radical Functions...61 Eponential Functions...7 Logarithmic Functions......8 Cubic Functions......91 Piece-Wise

More information

CHAPTER 1: INTRODUCTION. 1.1 Inverse Theory: What It Is and What It Does

CHAPTER 1: INTRODUCTION. 1.1 Inverse Theory: What It Is and What It Does Geosciences 567: CHAPTER (RR/GZ) CHAPTER : INTRODUCTION Inverse Theory: What It Is and What It Does Inverse theory, at least as I choose to deine it, is the ine art o estimating model parameters rom data

More information

AP* Bonding & Molecular Structure Free Response Questions page 1

AP* Bonding & Molecular Structure Free Response Questions page 1 AP* Bonding & Molecular Structure Free Response Questions page 1 Essay Questions 1991 a) two points ΔS will be negative. The system becomes more ordered as two gases orm a solid. b) two points ΔH must

More information

THE vast majority of mutations are neutral or deleterious.

THE vast majority of mutations are neutral or deleterious. Copyright Ó 2007 by the Genetics Society of America DOI: 10.1534/genetics.106.067678 Beneficial Mutation Selection Balance and the Effect of Linkage on Positive Selection Michael M. Desai*,,1 and Daniel

More information

Roberto s Notes on Differential Calculus Chapter 8: Graphical analysis Section 1. Extreme points

Roberto s Notes on Differential Calculus Chapter 8: Graphical analysis Section 1. Extreme points Roberto s Notes on Dierential Calculus Chapter 8: Graphical analysis Section 1 Extreme points What you need to know already: How to solve basic algebraic and trigonometric equations. All basic techniques

More information

Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate.

Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate. OEB 242 Exam Practice Problems Answer Key Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate. First, recall

More information

The Deutsch-Jozsa Problem: De-quantization and entanglement

The Deutsch-Jozsa Problem: De-quantization and entanglement The Deutsch-Jozsa Problem: De-quantization and entanglement Alastair A. Abbott Department o Computer Science University o Auckland, New Zealand May 31, 009 Abstract The Deustch-Jozsa problem is one o the

More information

Wright-Fisher Models, Approximations, and Minimum Increments of Evolution

Wright-Fisher Models, Approximations, and Minimum Increments of Evolution Wright-Fisher Models, Approximations, and Minimum Increments of Evolution William H. Press The University of Texas at Austin January 10, 2011 1 Introduction Wright-Fisher models [1] are idealized models

More information

Genetic hitch-hiking in a subdivided population

Genetic hitch-hiking in a subdivided population Genet. Res., Camb. (1998), 71, pp. 155 160. With 3 figures. Printed in the United Kingdom 1998 Cambridge University Press 155 Genetic hitch-hiking in a subdivided population MONTGOMERY SLATKIN* AND THOMAS

More information

Feasibility of a Multi-Pass Thomson Scattering System with Confocal Spherical Mirrors

Feasibility of a Multi-Pass Thomson Scattering System with Confocal Spherical Mirrors Plasma and Fusion Research: Letters Volume 5, 044 200) Feasibility o a Multi-Pass Thomson Scattering System with Conocal Spherical Mirrors Junichi HIRATSUKA, Akira EJIRI, Yuichi TAKASE and Takashi YAMAGUCHI

More information

( 1) ( 2) ( 1) nan integer, since the potential is no longer simple harmonic.

( 1) ( 2) ( 1) nan integer, since the potential is no longer simple harmonic. . Anharmonic Oscillators Michael Fowler Landau (para 8) considers a simple harmonic oscillator with added small potential energy terms mα + mβ. We ll simpliy slightly by dropping the term, to give an equation

More information

Internal thermal noise in the LIGO test masses: A direct approach

Internal thermal noise in the LIGO test masses: A direct approach PHYSICAL EVIEW D VOLUME 57, NUMBE 2 15 JANUAY 1998 Internal thermal noise in the LIGO test masses: A direct approach Yu. Levin Theoretical Astrophysics, Caliornia Institute o Technology, Pasadena, Caliornia

More information

Surfing genes. On the fate of neutral mutations in a spreading population

Surfing genes. On the fate of neutral mutations in a spreading population Surfing genes On the fate of neutral mutations in a spreading population Oskar Hallatschek David Nelson Harvard University ohallats@physics.harvard.edu Genetic impact of range expansions Population expansions

More information

TLT-5200/5206 COMMUNICATION THEORY, Exercise 3, Fall TLT-5200/5206 COMMUNICATION THEORY, Exercise 3, Fall Problem 1.

TLT-5200/5206 COMMUNICATION THEORY, Exercise 3, Fall TLT-5200/5206 COMMUNICATION THEORY, Exercise 3, Fall Problem 1. TLT-5/56 COMMUNICATION THEORY, Exercise 3, Fall Problem. The "random walk" was modelled as a random sequence [ n] where W[i] are binary i.i.d. random variables with P[W[i] = s] = p (orward step with probability

More information

RATIONAL FUNCTIONS. Finding Asymptotes..347 The Domain Finding Intercepts Graphing Rational Functions

RATIONAL FUNCTIONS. Finding Asymptotes..347 The Domain Finding Intercepts Graphing Rational Functions RATIONAL FUNCTIONS Finding Asymptotes..347 The Domain....350 Finding Intercepts.....35 Graphing Rational Functions... 35 345 Objectives The ollowing is a list o objectives or this section o the workbook.

More information

Online Appendix: The Continuous-type Model of Competitive Nonlinear Taxation and Constitutional Choice by Massimo Morelli, Huanxing Yang, and Lixin Ye

Online Appendix: The Continuous-type Model of Competitive Nonlinear Taxation and Constitutional Choice by Massimo Morelli, Huanxing Yang, and Lixin Ye Online Appendix: The Continuous-type Model o Competitive Nonlinear Taxation and Constitutional Choice by Massimo Morelli, Huanxing Yang, and Lixin Ye For robustness check, in this section we extend our

More information

An Ensemble Kalman Smoother for Nonlinear Dynamics

An Ensemble Kalman Smoother for Nonlinear Dynamics 1852 MONTHLY WEATHER REVIEW VOLUME 128 An Ensemble Kalman Smoother or Nonlinear Dynamics GEIR EVENSEN Nansen Environmental and Remote Sensing Center, Bergen, Norway PETER JAN VAN LEEUWEN Institute or Marine

More information

How robust are the predictions of the W-F Model?

How robust are the predictions of the W-F Model? How robust are the predictions of the W-F Model? As simplistic as the Wright-Fisher model may be, it accurately describes the behavior of many other models incorporating additional complexity. Many population

More information

Least-Squares Spectral Analysis Theory Summary

Least-Squares Spectral Analysis Theory Summary Least-Squares Spectral Analysis Theory Summary Reerence: Mtamakaya, J. D. (2012). Assessment o Atmospheric Pressure Loading on the International GNSS REPRO1 Solutions Periodic Signatures. Ph.D. dissertation,

More information

EVOLUTIONARY DYNAMICS AND THE EVOLUTION OF MULTIPLAYER COOPERATION IN A SUBDIVIDED POPULATION

EVOLUTIONARY DYNAMICS AND THE EVOLUTION OF MULTIPLAYER COOPERATION IN A SUBDIVIDED POPULATION Friday, July 27th, 11:00 EVOLUTIONARY DYNAMICS AND THE EVOLUTION OF MULTIPLAYER COOPERATION IN A SUBDIVIDED POPULATION Karan Pattni karanp@liverpool.ac.uk University of Liverpool Joint work with Prof.

More information

The Genealogy of a Sequence Subject to Purifying Selection at Multiple Sites

The Genealogy of a Sequence Subject to Purifying Selection at Multiple Sites The Genealogy of a Sequence Subject to Purifying Selection at Multiple Sites Scott Williamson and Maria E. Orive Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence We investigate

More information

Model for the spacetime evolution of 200A GeV Au-Au collisions

Model for the spacetime evolution of 200A GeV Au-Au collisions PHYSICAL REVIEW C 70, 021903(R) (2004) Model or the spacetime evolution o 200A GeV Au-Au collisions Thorsten Renk Department o Physics, Duke University, P.O. Box 90305, Durham, North Carolina 27708, USA

More information

Classical Selection, Balancing Selection, and Neutral Mutations

Classical Selection, Balancing Selection, and Neutral Mutations Classical Selection, Balancing Selection, and Neutral Mutations Classical Selection Perspective of the Fate of Mutations All mutations are EITHER beneficial or deleterious o Beneficial mutations are selected

More information

Anderson impurity in a semiconductor

Anderson impurity in a semiconductor PHYSICAL REVIEW B VOLUME 54, NUMBER 12 Anderson impurity in a semiconductor 15 SEPTEMBER 1996-II Clare C. Yu and M. Guerrero * Department o Physics and Astronomy, University o Caliornia, Irvine, Caliornia

More information

Selection and Population Genetics

Selection and Population Genetics Selection and Population Genetics Evolution by natural selection can occur when three conditions are satisfied: Variation within populations - individuals have different traits (phenotypes). height and

More information

The Wright-Fisher Model and Genetic Drift

The Wright-Fisher Model and Genetic Drift The Wright-Fisher Model and Genetic Drift January 22, 2015 1 1 Hardy-Weinberg Equilibrium Our goal is to understand the dynamics of allele and genotype frequencies in an infinite, randomlymating population

More information

( x) f = where P and Q are polynomials.

( x) f = where P and Q are polynomials. 9.8 Graphing Rational Functions Lets begin with a deinition. Deinition: Rational Function A rational unction is a unction o the orm ( ) ( ) ( ) P where P and Q are polynomials. Q An eample o a simple rational

More information

Gaussian Process Regression Models for Predicting Stock Trends

Gaussian Process Regression Models for Predicting Stock Trends Gaussian Process Regression Models or Predicting Stock Trends M. Todd Farrell Andrew Correa December 5, 7 Introduction Historical stock price data is a massive amount o time-series data with little-to-no

More information

Philadelphia University Faculty of Engineering Communication and Electronics Engineering

Philadelphia University Faculty of Engineering Communication and Electronics Engineering Module: Electronics II Module Number: 6503 Philadelphia University Faculty o Engineering Communication and Electronics Engineering Ampliier Circuits-II BJT and FET Frequency Response Characteristics: -

More information

Population Genetics: a tutorial

Population Genetics: a tutorial : a tutorial Institute for Science and Technology Austria ThRaSh 2014 provides the basic mathematical foundation of evolutionary theory allows a better understanding of experiments allows the development

More information

Genetic Variation in Finite Populations

Genetic Variation in Finite Populations Genetic Variation in Finite Populations The amount of genetic variation found in a population is influenced by two opposing forces: mutation and genetic drift. 1 Mutation tends to increase variation. 2

More information

Analysis Scheme in the Ensemble Kalman Filter

Analysis Scheme in the Ensemble Kalman Filter JUNE 1998 BURGERS ET AL. 1719 Analysis Scheme in the Ensemble Kalman Filter GERRIT BURGERS Royal Netherlands Meteorological Institute, De Bilt, the Netherlands PETER JAN VAN LEEUWEN Institute or Marine

More information

2. ETA EVALUATIONS USING WEBER FUNCTIONS. Introduction

2. ETA EVALUATIONS USING WEBER FUNCTIONS. Introduction . ETA EVALUATIONS USING WEBER FUNCTIONS Introduction So ar we have seen some o the methods or providing eta evaluations that appear in the literature and we have seen some o the interesting properties

More information

Mathematical models in population genetics II

Mathematical models in population genetics II Mathematical models in population genetics II Anand Bhaskar Evolutionary Biology and Theory of Computing Bootcamp January 1, 014 Quick recap Large discrete-time randomly mating Wright-Fisher population

More information

OPTIMAL PLACEMENT AND UTILIZATION OF PHASOR MEASUREMENTS FOR STATE ESTIMATION

OPTIMAL PLACEMENT AND UTILIZATION OF PHASOR MEASUREMENTS FOR STATE ESTIMATION OPTIMAL PLACEMENT AND UTILIZATION OF PHASOR MEASUREMENTS FOR STATE ESTIMATION Xu Bei, Yeo Jun Yoon and Ali Abur Teas A&M University College Station, Teas, U.S.A. abur@ee.tamu.edu Abstract This paper presents

More information

Micro-canonical ensemble model of particles obeying Bose-Einstein and Fermi-Dirac statistics

Micro-canonical ensemble model of particles obeying Bose-Einstein and Fermi-Dirac statistics Indian Journal o Pure & Applied Physics Vol. 4, October 004, pp. 749-757 Micro-canonical ensemble model o particles obeying Bose-Einstein and Fermi-Dirac statistics Y K Ayodo, K M Khanna & T W Sakwa Department

More information

AH 2700A. Attenuator Pair Ratio for C vs Frequency. Option-E 50 Hz-20 khz Ultra-precision Capacitance/Loss Bridge

AH 2700A. Attenuator Pair Ratio for C vs Frequency. Option-E 50 Hz-20 khz Ultra-precision Capacitance/Loss Bridge 0 E ttenuator Pair Ratio or vs requency NEEN-ERLN 700 Option-E 0-0 k Ultra-precision apacitance/loss ridge ttenuator Ratio Pair Uncertainty o in ppm or ll Usable Pairs o Taps 0 0 0. 0. 0. 07/08/0 E E E

More information

Letters to the editor

Letters to the editor Letters to the editor Dear Editor, I read with considerable interest the article by Bruno Putzeys in Linear udio Vol 1, The F- word - or, why there is no such thing as too much eedback. In his excellent

More information

CONVECTIVE HEAT TRANSFER CHARACTERISTICS OF NANOFLUIDS. Convective heat transfer analysis of nanofluid flowing inside a

CONVECTIVE HEAT TRANSFER CHARACTERISTICS OF NANOFLUIDS. Convective heat transfer analysis of nanofluid flowing inside a Chapter 4 CONVECTIVE HEAT TRANSFER CHARACTERISTICS OF NANOFLUIDS Convective heat transer analysis o nanoluid lowing inside a straight tube o circular cross-section under laminar and turbulent conditions

More information

First-principles calculation of defect free energies: General aspects illustrated in the case of bcc-fe. D. Murali, M. Posselt *, and M.

First-principles calculation of defect free energies: General aspects illustrated in the case of bcc-fe. D. Murali, M. Posselt *, and M. irst-principles calculation o deect ree energies: General aspects illustrated in the case o bcc-e D. Murali, M. Posselt *, and M. Schiwarth Helmholtz-Zentrum Dresden - Rossendor, Institute o Ion Beam Physics

More information

A Systematic Approach to Frequency Compensation of the Voltage Loop in Boost PFC Pre- regulators.

A Systematic Approach to Frequency Compensation of the Voltage Loop in Boost PFC Pre- regulators. A Systematic Approach to Frequency Compensation o the Voltage Loop in oost PFC Pre- regulators. Claudio Adragna, STMicroelectronics, Italy Abstract Venable s -actor method is a systematic procedure that

More information

Numerical Solution of Ordinary Differential Equations in Fluctuationlessness Theorem Perspective

Numerical Solution of Ordinary Differential Equations in Fluctuationlessness Theorem Perspective Numerical Solution o Ordinary Dierential Equations in Fluctuationlessness Theorem Perspective NEJLA ALTAY Bahçeşehir University Faculty o Arts and Sciences Beşiktaş, İstanbul TÜRKİYE TURKEY METİN DEMİRALP

More information

THE FATE OF A RECESSIVE ALLELE IN

THE FATE OF A RECESSIVE ALLELE IN THE FATE OF A RECESSIVE ALLELE IN A MENDELIAN DIPLOID MODEL Dissertation zur Erlangung des Doktorgrades Dr. rer. nat. der Mathematisch-Naturwissenschatlichen Fakultät der Rheinischen Friedrich-Wilhelms-Universität

More information

( ) ( ) ( ) + ( ) Ä ( ) Langmuir Adsorption Isotherms. dt k : rates of ad/desorption, N : totally number of adsorption sites.

( ) ( ) ( ) + ( ) Ä ( ) Langmuir Adsorption Isotherms. dt k : rates of ad/desorption, N : totally number of adsorption sites. Langmuir Adsorption sotherms... 1 Summarizing Remarks... Langmuir Adsorption sotherms Assumptions: o Adsorp at most one monolayer o Surace is uniorm o No interaction between adsorbates All o these assumptions

More information

A Method for Assimilating Lagrangian Data into a Shallow-Water-Equation Ocean Model

A Method for Assimilating Lagrangian Data into a Shallow-Water-Equation Ocean Model APRIL 2006 S A L M A N E T A L. 1081 A Method or Assimilating Lagrangian Data into a Shallow-Water-Equation Ocean Model H. SALMAN, L.KUZNETSOV, AND C. K. R. T. JONES Department o Mathematics, University

More information

Formalizing the gene centered view of evolution

Formalizing the gene centered view of evolution Chapter 1 Formalizing the gene centered view of evolution Yaneer Bar-Yam and Hiroki Sayama New England Complex Systems Institute 24 Mt. Auburn St., Cambridge, MA 02138, USA yaneer@necsi.org / sayama@necsi.org

More information

Neutral Theory of Molecular Evolution

Neutral Theory of Molecular Evolution Neutral Theory of Molecular Evolution Kimura Nature (968) 7:64-66 King and Jukes Science (969) 64:788-798 (Non-Darwinian Evolution) Neutral Theory of Molecular Evolution Describes the source of variation

More information

(One Dimension) Problem: for a function f(x), find x 0 such that f(x 0 ) = 0. f(x)

(One Dimension) Problem: for a function f(x), find x 0 such that f(x 0 ) = 0. f(x) Solving Nonlinear Equations & Optimization One Dimension Problem: or a unction, ind 0 such that 0 = 0. 0 One Root: The Bisection Method This one s guaranteed to converge at least to a singularity, i not

More information

Analytical expressions for field astigmatism in decentered two mirror telescopes and application to the collimation of the ESO VLT

Analytical expressions for field astigmatism in decentered two mirror telescopes and application to the collimation of the ESO VLT ASTRONOMY & ASTROPHYSICS MAY II 000, PAGE 57 SUPPLEMENT SERIES Astron. Astrophys. Suppl. Ser. 44, 57 67 000) Analytical expressions or ield astigmatism in decentered two mirror telescopes and application

More information

Curve Sketching. The process of curve sketching can be performed in the following steps:

Curve Sketching. The process of curve sketching can be performed in the following steps: Curve Sketching So ar you have learned how to ind st and nd derivatives o unctions and use these derivatives to determine where a unction is:. Increasing/decreasing. Relative extrema 3. Concavity 4. Points

More information

Scattered Data Approximation of Noisy Data via Iterated Moving Least Squares

Scattered Data Approximation of Noisy Data via Iterated Moving Least Squares Scattered Data Approximation o Noisy Data via Iterated Moving Least Squares Gregory E. Fasshauer and Jack G. Zhang Abstract. In this paper we ocus on two methods or multivariate approximation problems

More information

Solutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin

Solutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin Solutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin CHAPTER 1 1.2 The expected homozygosity, given allele

More information

SEPARATED AND PROPER MORPHISMS

SEPARATED AND PROPER MORPHISMS SEPARATED AND PROPER MORPHISMS BRIAN OSSERMAN The notions o separatedness and properness are the algebraic geometry analogues o the Hausdor condition and compactness in topology. For varieties over the

More information

Linking levels of selection with genetic modifiers

Linking levels of selection with genetic modifiers Linking levels of selection with genetic modifiers Sally Otto Department of Zoology & Biodiversity Research Centre University of British Columbia @sarperotto @sse_evolution @sse.evolution Sally Otto Department

More information

Effective population size and patterns of molecular evolution and variation

Effective population size and patterns of molecular evolution and variation FunDamental concepts in genetics Effective population size and patterns of molecular evolution and variation Brian Charlesworth Abstract The effective size of a population,, determines the rate of change

More information

Robust Residual Selection for Fault Detection

Robust Residual Selection for Fault Detection Robust Residual Selection or Fault Detection Hamed Khorasgani*, Daniel E Jung**, Gautam Biswas*, Erik Frisk**, and Mattias Krysander** Abstract A number o residual generation methods have been developed

More information

Some mathematical models from population genetics

Some mathematical models from population genetics Some mathematical models from population genetics 5: Muller s ratchet and the rate of adaptation Alison Etheridge University of Oxford joint work with Peter Pfaffelhuber (Vienna), Anton Wakolbinger (Frankfurt)

More information

OBSERVER/KALMAN AND SUBSPACE IDENTIFICATION OF THE UBC BENCHMARK STRUCTURAL MODEL

OBSERVER/KALMAN AND SUBSPACE IDENTIFICATION OF THE UBC BENCHMARK STRUCTURAL MODEL OBSERVER/KALMAN AND SUBSPACE IDENTIFICATION OF THE UBC BENCHMARK STRUCTURAL MODEL Dionisio Bernal, Burcu Gunes Associate Proessor, Graduate Student Department o Civil and Environmental Engineering, 7 Snell

More information

Long-Term Response and Selection limits

Long-Term Response and Selection limits Long-Term Response and Selection limits Bruce Walsh lecture notes Uppsala EQG 2012 course version 5 Feb 2012 Detailed reading: online chapters 23, 24 Idealized Long-term Response in a Large Population

More information

CHAPTER 5 Reactor Dynamics. Table of Contents

CHAPTER 5 Reactor Dynamics. Table of Contents 1 CHAPTER 5 Reactor Dynamics prepared by Eleodor Nichita, UOIT and Benjamin Rouben, 1 & 1 Consulting, Adjunct Proessor, McMaster & UOIT Summary: This chapter addresses the time-dependent behaviour o nuclear

More information

An Alternative Poincaré Section for Steady-State Responses and Bifurcations of a Duffing-Van der Pol Oscillator

An Alternative Poincaré Section for Steady-State Responses and Bifurcations of a Duffing-Van der Pol Oscillator An Alternative Poincaré Section or Steady-State Responses and Biurcations o a Duing-Van der Pol Oscillator Jang-Der Jeng, Yuan Kang *, Yeon-Pun Chang Department o Mechanical Engineering, National United

More information

The concept of limit

The concept of limit Roberto s Notes on Dierential Calculus Chapter 1: Limits and continuity Section 1 The concept o limit What you need to know already: All basic concepts about unctions. What you can learn here: What limits

More information

Supplementary material for Continuous-action planning for discounted infinite-horizon nonlinear optimal control with Lipschitz values

Supplementary material for Continuous-action planning for discounted infinite-horizon nonlinear optimal control with Lipschitz values Supplementary material or Continuous-action planning or discounted ininite-horizon nonlinear optimal control with Lipschitz values List o main notations x, X, u, U state, state space, action, action space,

More information

University of Washington Department of Chemistry Chemistry 453 Winter Quarter 2014

University of Washington Department of Chemistry Chemistry 453 Winter Quarter 2014 Lecture 10 1/31/14 University o Washington Department o Chemistry Chemistry 453 Winter Quarter 014 A on-cooperative & Fully Cooperative inding: Scatchard & Hill Plots Assume binding siteswe have derived

More information

Muller s Ratchet and the Pattern of Variation at a Neutral Locus

Muller s Ratchet and the Pattern of Variation at a Neutral Locus Copyright 2002 by the Genetics Society of America Muller s Ratchet and the Pattern of Variation at a Neutral Locus Isabel Gordo,*,1 Arcadio Navarro and Brian Charlesworth *Instituto Gulbenkian da Ciência,

More information

SWEEPFINDER2: Increased sensitivity, robustness, and flexibility

SWEEPFINDER2: Increased sensitivity, robustness, and flexibility SWEEPFINDER2: Increased sensitivity, robustness, and flexibility Michael DeGiorgio 1,*, Christian D. Huber 2, Melissa J. Hubisz 3, Ines Hellmann 4, and Rasmus Nielsen 5 1 Department of Biology, Pennsylvania

More information

Bias-Variance Error Bounds for Temporal Difference Updates

Bias-Variance Error Bounds for Temporal Difference Updates Bias-Variance Bounds for Temporal Difference Updates Michael Kearns AT&T Labs mkearns@research.att.com Satinder Singh AT&T Labs baveja@research.att.com Abstract We give the first rigorous upper bounds

More information

Aerodynamic Admittance Function of Tall Buildings

Aerodynamic Admittance Function of Tall Buildings Aerodynamic Admittance Function o Tall Buildings in hou a Ahsan Kareem b a alou Engineering Int l, Inc., 75 W. Campbell Rd, Richardson, T, USA b Nataz odeling Laboratory, Uniersity o Notre Dame, Notre

More information

Fitness landscapes and seascapes

Fitness landscapes and seascapes Fitness landscapes and seascapes Michael Lässig Institute for Theoretical Physics University of Cologne Thanks Ville Mustonen: Cross-species analysis of bacterial promoters, Nonequilibrium evolution of

More information

Interatomic potentials for monoatomic metals from experimental data and ab initio calculations

Interatomic potentials for monoatomic metals from experimental data and ab initio calculations PHYSICAL REVIEW B VOLUME 59, NUMBER 5 1 FEBRUARY 1999-I Interatomic potentials or monoatomic metals rom experimental data and ab initio calculations Y. Mishin and D. Farkas Department o Materials Science

More information

NON-EQUILIBRIUM REACTION RATES IN THE MACROSCOPIC CHEMISTRY METHOD FOR DSMC CALCULATIONS. March 19, 2007

NON-EQUILIBRIUM REACTION RATES IN THE MACROSCOPIC CHEMISTRY METHOD FOR DSMC CALCULATIONS. March 19, 2007 NON-EQUILIBRIUM REACTION RATES IN THE MACROSCOPIC CHEMISTRY METHOD FOR DSMC CALCULATIONS M.J. GOLDSWORTHY, M.N. MACROSSAN & M.M ABDEL-JAWAD March 9, 7 The Direct Simulation Monte Carlo (DSMC) method is

More information

9.1 The Square Root Function

9.1 The Square Root Function Section 9.1 The Square Root Function 869 9.1 The Square Root Function In this section we turn our attention to the square root unction, the unction deined b the equation () =. (1) We begin the section

More information

Population Genetics I. Bio

Population Genetics I. Bio Population Genetics I. Bio5488-2018 Don Conrad dconrad@genetics.wustl.edu Why study population genetics? Functional Inference Demographic inference: History of mankind is written in our DNA. We can learn

More information

Calculated electron dynamics in an electric field

Calculated electron dynamics in an electric field PHYSICAL REVIEW A VOLUME 56, NUMBER 1 JULY 1997 Calculated electron dynamics in an electric ield F. Robicheaux and J. Shaw Department o Physics, Auburn University, Auburn, Alabama 36849 Received 18 December

More information

NONLINEAR CONTROL OF POWER NETWORK MODELS USING FEEDBACK LINEARIZATION

NONLINEAR CONTROL OF POWER NETWORK MODELS USING FEEDBACK LINEARIZATION NONLINEAR CONTROL OF POWER NETWORK MODELS USING FEEDBACK LINEARIZATION Steven Ball Science Applications International Corporation Columbia, MD email: sball@nmtedu Steve Schaer Department o Mathematics

More information

The dynamics of complex adaptation

The dynamics of complex adaptation The dynamics of complex adaptation Daniel Weissman Mar. 20, 2014 People Michael Desai, Marc Feldman, Daniel Fisher Joanna Masel, Meredith Trotter; Yoav Ram Other relevant work: Nick Barton, Shahin Rouhani;

More information

Quadratic Functions. The graph of the function shifts right 3. The graph of the function shifts left 3.

Quadratic Functions. The graph of the function shifts right 3. The graph of the function shifts left 3. Quadratic Functions The translation o a unction is simpl the shiting o a unction. In this section, or the most part, we will be graphing various unctions b means o shiting the parent unction. We will go

More information

Chapter 4. Improved Relativity Theory (IRT)

Chapter 4. Improved Relativity Theory (IRT) Chapter 4 Improved Relativity Theory (IRT) In 1904 Hendrik Lorentz ormulated his Lorentz Ether Theory (LET) by introducing the Lorentz Transormations (the LT). The math o LET is based on the ollowing assumptions:

More information

arxiv: v1 [q-bio.pe] 5 Feb 2013

arxiv: v1 [q-bio.pe] 5 Feb 2013 Genetic draft, selective interference, and population genetics of rapid adaptation Richard A. Neher Max Planck Institute for Developmental Biology, Tübingen, 72070, Germany. richard.neher@tuebingen.mpg.de

More information

URN MODELS: the Ewens Sampling Lemma

URN MODELS: the Ewens Sampling Lemma Department of Computer Science Brown University, Providence sorin@cs.brown.edu October 3, 2014 1 2 3 4 Mutation Mutation: typical values for parameters Equilibrium Probability of fixation 5 6 Ewens Sampling

More information

The Relation between Multilocus Population Genetics and Social Evolution Theory

The Relation between Multilocus Population Genetics and Social Evolution Theory vol. 169, no. the american naturalist ebruary 007 The Relation between Multilocus Population Genetics and Social Evolution Theory Andy Gardner, 1,,* Stuart A. West,, and Nicholas H. Barton, 1. Department

More information

Partially fluidized shear granular flows: Continuum theory and molecular dynamics simulations

Partially fluidized shear granular flows: Continuum theory and molecular dynamics simulations Partially luidized shear granular lows: Continuum theory and molecular dynamics simulations Dmitri Volson, 1 Lev S. Tsimring, 1 and Igor S. Aranson 2 1 Institute or Nonlinear Science, University o Caliornia,

MEASUREMENT UNCERTAINTIES

What distinguishes science from philosophy is that it is grounded in experimental observations. These observations are most striking when they take the form of a quantitative measurement.

Chem 406 Biophysical Chemistry Lecture 1 Transport Processes, Sedimentation & Diffusion

I. Introduction A. There are a group of biophysical techniques that are based on transport processes. 1. Transport processes

Remote Sensing ISSN

Remote Sens. 2012, 4, 1162-1189; doi:10.3390/rs4051162. Article, open access. Remote Sensing, ISSN 2072-4292, www.mdpi.com/journal/remotesensing. Dielectric and Radiative Properties of Sea Foam at Microwave Frequencies:

Light, Quantum Mechanics and the Atom

Light. Light is something that is familiar to most of us. It is the way in which we are able to see the world around us. Light can be thought of as a wave and, as a consequence,

Reliability of Axially Loaded Fiber-Reinforced-Polymer Confined Reinforced Concrete Circular Columns

American J. of Engineering and Applied Sciences 2 (1): 31-38, 2009. ISSN 1941-7020. 2009 Science Publications. Venkatarman

CHAPTER 23 THE EVOLUTIONS OF POPULATIONS. Section C: Genetic Variation, the Substrate for Natural Selection

1. Genetic variation occurs within and between populations 2. Mutation and sexual recombination

Problems on Evolutionary dynamics

Doctoral Programme in Physics. José A. Cuesta. Lausanne, June 10-13, 2014. Replication 1. Consider the Galton-Watson process defined by the offspring distribution p_0 =

Chapter 2 Lecture 7 Longitudinal stick fixed static stability and control 4 Topics

Topics: 2.4.6 Revised expression for Cmcgt; 2.4.7 Cmαt in stick-fixed case; 2.5 Contributions of fuselage to Cmcg and Cmα; 2.5.1 Contribution

On the Flow-level Dynamics of a Packet-switched Network

Ciamac C. Moallemi and Devavrat Shah. Abstract: The packet is the fundamental unit of transportation in modern communication networks such as the Internet.

Scattering of Solitons of Modified KdV Equation with Self-consistent Sources

Commun. Theor. Phys. (Beijing, China) 49 (2008) pp. 809-814. (c) Chinese Physical Society. Vol. 49, No. 4, April 15, 2008. ZHANG Da-Jun and WU Hua

Strain and Stress Measurements with a Two-Dimensional Detector

Copyright (C) JCPDS-International Centre for Diffraction Data 1999. ISSN 1097-0002, Advances in X-ray Analysis, Volume 41. Baoping Bob He and Kingsley

Evolutionary Theory. Sinauer Associates, Inc. Publishers Sunderland, Massachusetts U.S.A.

Evolutionary Theory: Mathematical and Conceptual Foundations. Sean H. Rice. Contents: Preface; Introduction; Chapter 1: Selection on One
