The Impact of Sampling Schemes on the Site Frequency Spectrum in Nonequilibrium Subdivided Populations


Copyright © 2009 by the Genetics Society of America
DOI: /genetics
Genetics 182: (May 2009)

Thomas Städler,*,1 Bernhard Haubold, Carlos Merino, Wolfgang Stephan and Peter Pfaffelhuber

*Institute of Integrative Biology, Plant Ecological Genetics, ETH Zurich, CH-8092 Zurich, Switzerland; Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Biology, D Plön, Germany; Department of Biology II, Section of Evolutionary Biology, University of Munich, D Planegg-Martinsried, Germany; and Faculty of Mathematics and Physics, University of Freiburg, D Freiburg, Germany

Manuscript received August 6, 2008
Accepted for publication February 17, 2009

ABSTRACT

Using coalescent simulations, we study the impact of three different sampling schemes on patterns of neutral diversity in structured populations. Specifically, we are interested in two summary statistics based on the site frequency spectrum as a function of migration rate, demographic history of the entire substructured population (including timing and magnitude of specieswide expansions), and the sampling scheme. Using simulations implementing both finite-island and two-dimensional stepping-stone spatial structure, we demonstrate strong effects of the sampling scheme on Tajima's D (D_T) and Fu and Li's D (D_FL) statistics, particularly under specieswide (range) expansions. Pooled samples yield average D_T and D_FL values that are generally intermediate between those of local and scattered samples. Local samples (and to a lesser extent, pooled samples) are influenced by local, rapid coalescence events in the underlying coalescent process. These processes result in lower proportions of external branch lengths and hence lower proportions of singletons, explaining our finding that the sampling scheme affects D_FL more than it does D_T.
Under specieswide expansion scenarios, these effects of spatial sampling may persist up to very high levels of gene flow (Nm > 25), implying that local samples cannot be regarded as being drawn from a panmictic population. Importantly, many data sets on humans, Drosophila, and plants contain signatures of specieswide expansions and effects of sampling scheme that are predicted by our simulation results. This suggests that validating the assumption of panmixia is crucial if robust demographic inferences are to be made from local or pooled samples. However, future studies should consider adopting a framework that explicitly accounts for the genealogical effects of population subdivision and empirical sampling schemes.

Supporting information is available online at http: / cgi/content/full/genetics /dc1.

1 Corresponding author: Institute of Integrative Biology, Plant Ecological Genetics, ETH Zurich, Universitätstrasse 16, CH-8092 Zurich, Switzerland. thomas.staedler@env.ethz.ch

ALTHOUGH most (if not all) species are characterized by various degrees of population subdivision, population geneticists continue to employ models of panmictic populations of constant size (the neutral equilibrium, NE, model) when testing for selection and/or demographic changes through time. Motivated by empirical data sets that were found to be incompatible with expectations under the NE model, recent large-scale studies have employed coalescent simulations of demographic changes such as bottlenecks and/or population expansions in attempts to estimate parameters of such (arguably) biologically more realistic scenarios (e.g., Marth et al. 2004; Nordborg et al. 2005; Ometto et al. 2005; Schmid et al. 2005; Heuertz et al. 2006; Li and Stephan 2006; Pyhäjärvi et al. 2007; Ross-Ibarra et al. 2008). However, such simulations still assume unstructured populations, while the empirical data may derive from subdivided species and a variety of sampling schemes.
The data typically collected in such projects consist of multilocus DNA sequences from which genealogical information is extracted, e.g., the number of segregating sites (S), the average pairwise sequence diversity (π), and either the full site frequency spectrum or summary statistics based on it, such as Tajima's D (Tajima 1989), Fu and Li's D (Fu and Li 1993), and Fay and Wu's H (Fay and Wu 2000). Here, we are concerned with estimates of specieswide demographic history from such summary statistics and particularly with the joint impact of population subdivision, population expansion, and the sampling scheme on statistics summarizing the site frequency spectrum.

Our motivation stems largely from striking patterns in empirical data sets that, we believe, need to be evaluated for their general implications. For example, in empirical studies that include several genuine population samples, it becomes possible to contrast the site frequency spectra of local population samples with those obtained for the pooled (combined) samples. Although this possibility has not been exploited in all such studies, the common observation has been that site frequency spectra are more skewed toward low-frequency polymorphisms in pooled samples than in the means of genuine population samples; we refer to this phenomenon as the pooling effect. Such genealogical signals have been obtained in a variety of organisms such as humans (Alonso and Armour 2001; Ptak and Przeworski 2002; Hammer et al. 2003), Drosophila (Pool and Aquadro 2006), and several plant species (Ingvarsson 2005; Arunyawat et al. 2007; Moeller et al. 2007; Pyhäjärvi et al. 2007). At least partly due to the putatively confounding effects of population subdivision and population expansion first identified by Ptak and Przeworski (2002), several studies have explicitly favored the analysis and interpretation of data obtained from large samples of single populations or geographic regions (e.g., Adams and Hudson 2004; Marth et al. 2004; Garrigan et al. 2007). Similarly, a recent multilocus study of several populations of teosinte (the wild ancestor of maize) reiterated the notion that genealogical signals for population expansion (e.g., negative values of Tajima's D) are confounded with effects of population structure (Moeller et al. 2007).

In our recent study of patterns of polymorphism and population subdivision in two closely related species of wild tomatoes, however, we arrived at quite different conclusions (Arunyawat et al. 2007). Inspired by the results of two complementary simulation studies of subdivided populations (Ray et al. 2003; De and Durrett 2007; see below), we interpreted this apparently widespread empirical pattern (see references cited above) as a consequence of the effect of the sampling scheme on sample genealogies. Specifically, Arunyawat et al.
(2007) envisioned the impact of population subdivision as masking the true genealogical signal of the specieswide demography via loss of low-frequency polymorphisms in local samples, rather than as creating an excess of low-frequency variants in pooled samples. The difference in interpretation compared to the teosinte study by Moeller et al. (2007) is not due to the underlying data. In our study on wild tomatoes (four populations per species), the site frequency spectra of pooled population samples also showed a marked excess of singletons compared to the average spectra for individual populations. This resulted in significantly negative values of Tajima's D and/or Fu and Li's D (Arunyawat et al. 2007; Städler et al. 2008).

Using parameter values designed to reflect patterns of variation in human mitochondrial DNA, Ray et al. (2003) simulated a range expansion under stepping-stone structure and sampled 30 sequences from a single deme. Under fairly low levels of gene flow between demes, sample genealogies were shaped by a mixture of local coalescent events (leading to near-zero branch lengths) and ancestral lineages coalescing in the founding deme at the onset (looking forward in time) of the specieswide range expansion. The overall shape of such genealogies yielded values of Tajima's D that were not significant under NE expectations for low to moderate migration rates. Only under very high levels of gene flow did the single-deme samples exhibit genealogies expected for strong population expansions (Ray et al. 2003), reflecting the low number of coalescence events within the sampled deme (Wakeley 1999, 2001, 2004; Wakeley and Aliacar 2001). Analogous phenomena were recently uncovered by De and Durrett (2007), who simulated an equilibrium stepping-stone model of 100 demes, using scaled mutation parameters appropriate for nuclear loci in humans.
Compared to expectations for a panmictic equilibrium population, these authors found lower average S, more extended linkage disequilibrium, and average frequency spectra skewed toward intermediate-frequency polymorphisms (i.e., positive Tajima's D values) in simulated samples from single demes. A notable limitation of these studies is their focus on samples obtained from single demes (Ray et al. 2003), although De and Durrett (2007) also considered random samples and showed that their genealogical properties approach those of samples from random-mating populations. Arunyawat et al. (2007) qualitatively interpreted the pooling effect identified in their wild tomato data (and observed in several other empirical studies cited above) as resulting from the changed genealogical structure of combined samples, analogous to increasing the proportion of migrants in single-deme samples. In both circumstances, scattering-phase coalescence events ought to decrease in favor of higher numbers of ancestral lines remaining at the start of the collecting phase (looking backward in time) of the coalescent process, thus yielding higher proportions of external branches in the genealogies.

In this study, we set out to test the interpretations of Arunyawat et al. (2007), in particular their qualitative prediction that the genealogies of pooled samples (composed of several local population samples treated as one entity), as reflected in the site frequency spectrum, should be intermediate between those of local and scattered samples. The latter comprise single sequences from each of many demes and are known to possess the genealogical structure of samples from unstructured populations (assuming a large number of demes; Wakeley 1999, 2001, 2004; Wakeley and Aliacar 2001). More generally, we assess the genealogical properties of these three types of samples under both the symmetric finite-island model and the stepping-stone model of population structure. Given that Arunyawat et al.
(2007) interpreted the frequent observation of negative Tajima's D values in pooled samples of diverse taxa (e.g., Ptak and Przeworski 2002; Hammer et al. 2003; Pool and Aquadro 2006; Moeller et al. 2007) as signatures of specieswide (range) expansions, our emphasis is on models of structured populations undergoing an overall expansion in population size. To this end, our coalescent simulations encompass a wide range of migration rates and of magnitudes and timings of expansions. We find a strong effect of the sampling scheme on two commonly used summary statistics based on the site frequency spectrum, with the pooling effect gradually decreasing with increasing levels of gene flow. Our results imply that sequence data for most species should not be evaluated under a panmixia framework and that attention to sampling is paramount even for weakly subdivided taxa. Moreover, disentangling the effects of demography from those of positive selection appears to be an even more challenging task than currently assumed.

MODELS AND METHODS

Coalescent simulations under two models of population structure: All patterns of sequence diversity were generated using the coalescent simulation software ms (Hudson 2002) to model the following evolutionary scenario. At the time of sampling, the population consists of I demes, each containing N_0 diploid individuals. For the first set of simulations, the subdivided population is at equilibrium with constant population size. Along every line, mutations accumulate at rate μ, and θ = 4N_0μ. We chose to simulate mainly with a fixed value of θ = 1 (for simulations implementing 100 demes). We did not fix the number of segregating sites S, nor did we choose different θ values for different scenarios to obtain realistic values of S and/or π. Simulating with a fixed S has been shown to yield inaccurate results under nonequilibrium demography (Ramos-Onsins et al. 2007), and the critical values for Tajima's D and Fu and Li's D depend only weakly on θ. Moreover, the effects we describe in this article are orders of magnitude larger than what could be generated by choosing a different θ or fixing S.
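To see what a fixed θ = 1 implies for the amount of variation per simulated locus, recall the standard NE expectation E[S] = θ·a_n with a_n = Σ_{i=1}^{n−1} 1/i. The short Python sketch below evaluates it for n = 20, the sample size used in the numerical examples. The function names are hypothetical. The formula applies only to the panmictic equilibrium case, not to the structured or expanding scenarios simulated here.

```python
def watterson_a(n: int) -> float:
    """a_n = sum_{i=1}^{n-1} 1/i, the normalizing constant for S."""
    return sum(1.0 / i for i in range(1, n))


def expected_segregating_sites(theta: float, n: int) -> float:
    """E[S] = theta * a_n for a sample of n sequences under the NE model."""
    if n < 2:
        raise ValueError("at least two sequences are required")
    return theta * watterson_a(n)


# theta = 4 * N0 * mu = 1 and n = 20, as in the numerical examples
print(f"E[S] = {expected_segregating_sites(1.0, 20):.3f}")  # E[S] = 3.548
```

Under panmixia, a sample of 20 sequences is thus expected to carry about 3.5 segregating sites per locus at θ = 1. Subdivision and expansion then shift this number and its frequency spectrum away from the NE baseline.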
The demes exchange (haploid) migrants under either an island model or a two-dimensional stepping-stone model at rate m, and we consider a broad range of gene flow: 0.1 ≤ 4N_0m ≤ 100. Under the island model, an ancestral line in deme j switches its location to deme i at rate m/(I − 1). Under the stepping-stone model, we assume that I = a², i.e., the population is arranged in a square lattice of a × a demes, and we assume periodic boundary conditions. This means that the migration rate is m/4 if i = (i_1, i_2) and j = (j_1, j_2) are neighboring demes and 0 otherwise. Here, i and j are neighbors if

$$|(i_1 - j_1) \bmod a| + |(i_2 - j_2) \bmod a| = 1,$$

where each residue mod a is taken in the range (−a/2, a/2]. In other words, an individual at location (a, i_2) can migrate to (1, i_2), and one at (i_1, a) can migrate to (i_1, 1), and vice versa. We modified ms to be able to efficiently sample sequences from randomly chosen demes rather than fixed demes. Specifically, in each iteration the modified version of ms shuffles the entries in the sample configuration array inconfig at the beginning of the function segtre_mig (Hudson 2002). The C code of this program is available from

Implementing range expansions under population subdivision: For an additional set of coalescent simulations, we assume that the structure in the population was created some time t in the past (t is measured in units of 4N_0 generations); i.e., before time t the population was panmictic and of size N_A. This scheme ought to be plausible under range expansions. Examples include the migration out of Africa by both humans and Drosophila melanogaster with subsequent colonization of expansive areas, and temperate-zone populations expanding from glacial refugia. A third case is the aftermath of a speciation event such as that inferred for the two wild tomato species Solanum peruvianum and S. chilense (Städler et al. 2005, 2008).
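The island and stepping-stone migration rules defined above can be written as matrices of scaled pairwise rates (4N_0 times the per-generation rate between two demes). The NumPy sketch below is our own illustration, not the authors' modified ms code. Two conventions are assumptions: the function names, and storing deme (i_1, i_2) at index i_1·a + i_2 with 0-based coordinates. On the torus, Python's % operator always returns a value in 0, ..., a − 1, so the wrap-around neighbors of the lattice edges come out automatically.

```python
import numpy as np


def island_matrix(num_demes: int, M: float) -> np.ndarray:
    """Scaled pairwise rates for the symmetric finite-island model.

    M = 4*N0*m is the total scaled migration rate of a lineage; it is split
    equally among the other I - 1 demes, i.e. M / (I - 1) per pair.
    """
    mat = np.full((num_demes, num_demes), M / (num_demes - 1))
    np.fill_diagonal(mat, 0.0)
    return mat


def stepping_stone_matrix(a: int, M: float) -> np.ndarray:
    """Scaled pairwise rates for an a x a stepping-stone lattice on a torus.

    Deme (i1, i2), with 0-based coordinates, is stored at index i1 * a + i2.
    Each deme exchanges migrants only with its four nearest neighbours, at
    rate M / 4 each; the modulo operation wraps the lattice edges around.
    """
    if a < 3:
        raise ValueError("a >= 3 is needed for four distinct neighbours")
    mat = np.zeros((a * a, a * a))
    for i1 in range(a):
        for i2 in range(a):
            for d1, d2 in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                j1, j2 = (i1 + d1) % a, (i2 + d2) % a  # Python % is never negative
                mat[i1 * a + i2, j1 * a + j2] = M / 4.0
    return mat


# I = 100 demes as a 10 x 10 lattice with 4*N0*m = 10
ss = stepping_stone_matrix(10, 10.0)
print(ss.sum(axis=1)[:3])  # each row sums to 4*N0*m
```

Both matrices are symmetric, so the row/column convention (source vs. destination deme) does not matter here. In both models each deme's total scaled migration rate equals 4N_0m; only its distribution among demes differs.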
Moreover, this range-expansion scenario is essentially a generalized isolation-with-migration (IM) model of divergence with a large number of extant demes (Nielsen and Wakeley 2001; Hey and Nielsen 2004; Wilkinson-Herbots 2008). Looking forward in time, at time t the ancestral population splits into I demes of equal size. The expansion factor for the total population at time t is thus given by

$$b = \frac{I \times N_0}{N_A}. \qquad (1)$$

Note that a value of b = 1 implies constant population size: a panmictic population at time t in the past split into I demes, each of size N_0 (= N_A/I), without changing the total census size of the entire, now subdivided population. This particular scenario can be seen as a form of range fragmentation, albeit without decline in total population size.

Sampling schemes and descriptors of diversity and differentiation: Simulated samples of total size n (n = 20 in our numerical examples) from the structured population were implemented as local, pooled, or scattered. Local samples contain n sequences from a single island; i.e., only one arbitrarily chosen deme is sampled. Pooled samples contain several lines each from several demes (we take five lines from each of four demes in our simulations). Scattered samples encompass single sequences from each of n different demes, i.e., only one sequence per sampled deme. A commonly used statistic to quantify population structure from patterns of diversity within and among local populations is F_ST (e.g., Hudson et al. 1992). In particular, if the number of demes is large and the population is in equilibrium,

$$E[F_{ST}] = \frac{1}{1 + 4N_0 m} \qquad (2)$$

(e.g., Wright 1951), and thus migration rates can, in principle, be estimated from observed values of F_ST (but see Whitlock and McCauley 1999 for numerous caveats and Jost 2008 for a more fundamental critique of F_ST-based estimates of differentiation and gene flow).
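The expansion scenario of Equation 1 and the three sampling schemes can be made concrete as an argument list for Hudson's ms. The Python sketch below is an assumption-laden illustration, not the authors' command line. It relies on our reading of the ms documentation, which should be checked against it, and it covers only the island model. The flags work as follows:

- -I sets up a symmetric island model whose pairwise rate is 4N_0m/(I − 1), matching m/(I − 1) above.
- -ej moves all lineages of one deme into another at a given time.
- -en resets one deme's size, in units of N_0.
- -eM resets all migration rates.

The helper names and the fixed seed are hypothetical.

```python
import random

# (sequences per sampled deme, number of sampled demes) for n = 20
SCHEMES = {"local": (20, 1), "pooled": (5, 4), "scattered": (1, 20)}


def sample_config(scheme: str, num_demes: int, rng: random.Random) -> list[int]:
    """Per-deme sample sizes, with the sampled demes drawn at random."""
    per_deme, k = SCHEMES[scheme]
    config = [0] * num_demes
    for deme in rng.sample(range(num_demes), k):  # k distinct demes
        config[deme] = per_deme
    return config


def ms_expansion_args(config: list[int], theta: float, M: float,
                      b: float, t: float, reps: int = 1000) -> list[str]:
    """Hypothetical ms arguments: island model founded t (x 4*N0 gens) ago.

    Before t, all lineages sit in one panmictic deme of size
    N_A = I * N0 / b (Equation 1), i.e. I / b in units of N0.
    """
    num_demes = len(config)
    args = ["ms", str(sum(config)), str(reps), "-t", str(theta),
            "-I", str(num_demes), *map(str, config), str(M)]
    for deme in range(2, num_demes + 1):          # ms numbers demes from 1
        args += ["-ej", str(t), str(deme), "1"]   # move lineages into deme 1
    args += ["-en", str(t), "1", str(num_demes / b)]  # ancestral size / N0
    args += ["-eM", str(t), "0"]                  # no migration after merging
    return args


rng = random.Random(42)
cmd = ms_expansion_args(sample_config("pooled", 100, rng),
                        theta=1.0, M=10.0, b=10.0, t=2.0)
print(" ".join(cmd[:6]), "...", " ".join(cmd[-7:]))
```

Each call draws new demes, so building one command per replicate would mimic the per-iteration randomization. A single ms run with many replicates would keep the same demes throughout, which is the limitation the modified ms described above avoids.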

In our simulations, F_ST can be computed only under the pooled sampling scheme. We used the formula $F_{ST} = 1 - \pi_w/\pi_b$, where π_b is the average number of differences for pairs of sequences taken from different demes, and π_w is the average number of differences for pairs sampled within demes. Here, the average is taken only over all pairs of sequences within a simulated sample, not over simulation runs; thus, for every simulation run, we recorded exactly one F_ST value. For the stepping-stone model, we used the same computations; i.e., F_ST does not take isolation by distance into account. To describe sequence diversity patterns, we focus on the site frequency spectrum, as summarized by statistics such as the widely used Tajima's D (Tajima 1989) and Fu and Li's D (Fu and Li 1993). For clarity, we denote these distinct D statistics as D_T and D_FL, respectively. Using these particular summary statistics enables us to assess the power to reject the standard NE model. Moreover, we chose to include D_FL because the singleton class appeared to be the major reason for the lower/more negative D_T values in pooled vs. local samples of wild tomatoes (Arunyawat et al. 2007). The statistical package R was used to drive our modified version of ms and to compute these statistics from its output; the corresponding R scripts are available at evolbio.mpg.de/sampling.

Demographic signatures in multilocus data from wild tomatoes: Our recent studies on population subdivision and speciation in two wild tomato species provided the motivation for our current simulation study (Roselius et al. 2005; Städler et al. 2005, 2008; Arunyawat et al. 2007; see Introduction). We extracted the biallelic, synonymous and noncoding single-nucleotide polymorphisms (SNPs) contained in the data of Arunyawat et al.
(2007) to confine analyses of the site frequency spectrum to putatively neutral polymorphisms and to those fitting the infinite-sites assumption implemented in Hudson's (2002) ms program. Tajima's D_T and Fu and Li's D_FL values of these pruned data sets were computed in the SITES program. Specifically, we computed the average D_T and D_FL values across the four single-population samples per species, as well as the corresponding statistics obtained by treating all sequences within species as a single sample; we refer to these latter entities as the pooled samples (see Arunyawat et al. 2007). Devising estimators of the demographic parameters b and t is beyond the scope of the present article and may require multilocus data obtained from scattered samples. Instead, we implemented coalescent simulations reflecting the actual sampling scheme used by Arunyawat et al. (2007) and fixed other parameters according to aspects of the empirical data and perceived biological context. Both stepping-stone and island-model simulations were performed with 400 local demes, θ = 0.25, and sampling 11 sequences from each of 4 local demes (recall that demes were randomly chosen for each of the 1000 iterations). The migration rate was set to 4N_0m = 5, reflecting the average F_ST estimates found for both species (Arunyawat et al. 2007; and see Figure 5 below). Simulations explored the joint effects of varying the time (t) and magnitude (b) of demographic expansions concomitant with establishment of population subdivision. From these simulations, we recorded estimates of D_T and D_FL for two sampling schemes, local (means of the 4 local samples) and pooled (all 44 sequences treated as one sample). Average estimates of D_T and D_FL obtained in this manner for many simulated combinations of b and t are presented as contour plots.
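The statistics above were computed with R scripts and SITES. As a cross-check, the Python sketch below computes D_T and D_FL from an unfolded site frequency spectrum, using the standard formulas of Tajima (1989) and Fu and Li (1993). It assumes that the ancestral state of every SNP is known, as in ms output where the derived allele is coded 1. Fu and Li's external mutations η_e are then simply the derived singletons; with unknown ancestral states, the D* variant would be needed instead. This is an illustrative sketch with hypothetical function names, not the authors' code.

```python
from math import comb, sqrt


def tajimas_d(sfs: list[float], n: int) -> float:
    """Tajima's D_T from an unfolded SFS; sfs[i - 1] = sites with i derived copies."""
    if len(sfs) != n - 1:
        raise ValueError("sfs must have n - 1 entries")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    S = sum(sfs)
    if S == 0:
        raise ValueError("D_T is undefined without segregating sites")
    # pi = mean pairwise differences = sum_i i * (n - i) * xi_i / C(n, 2)
    pi = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / comb(n, 2)
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


def fu_li_d(sfs: list[float], n: int) -> float:
    """Fu and Li's D_FL with known ancestral states (requires n >= 3).

    eta = total mutations (segregating sites under infinite sites);
    eta_e = mutations on external branches = derived singletons.
    """
    if n < 3 or len(sfs) != n - 1:
        raise ValueError("need n >= 3 and an SFS with n - 1 entries")
    an = sum(1.0 / i for i in range(1, n))
    bn = sum(1.0 / i**2 for i in range(1, n))
    cn = 2 * (n * an - 2 * (n - 1)) / ((n - 1) * (n - 2))
    vd = 1 + an**2 / (bn + an**2) * (cn - (n + 1) / (n - 1))
    ud = an - 1 - vd
    eta, eta_e = sum(sfs), sfs[0]
    if eta == 0:
        raise ValueError("D_FL is undefined without segregating sites")
    return (eta - an * eta_e) / sqrt(ud * eta + vd * eta**2)


# An SFS dominated by singletons (excess of rare variants) gives negative values.
n = 20
rare = [10] + [0] * (n - 2)
print(tajimas_d(rare, n), fu_li_d(rare, n))
```

For a spectrum exactly proportional to 1/i (the NE expectation), both statistics are zero up to rounding. Moving mass to singletons makes them negative, as in pooled samples after an expansion. Moving it to intermediate frequencies makes them positive, as in local samples under subdivision.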
RESULTS

Our coalescent simulations yield results for levels of nucleotide diversity, summary statistics based on the site frequency spectrum, and population differentiation under both equilibrium and nonequilibrium demographic history, all obtained for three different sampling schemes: local, pooled, and scattered. All our findings are consistent between the island model and the stepping-stone model. In this article we focus on quantitative results obtained under stepping-stone spatial structure, and results for the island model are presented as supporting figures online.

The site frequency spectrum under population subdivision: The simplest demographic scenario we analyzed is an equilibrium population subdivided into 100 demes; i.e., we first focus on the effects of population structure per se without any past changes of population size. As expected, characteristics of the site frequency spectra depend strongly on the sampling scheme. For the stepping-stone model of population structure, Figure 1 shows the simulation results under various levels of gene flow. For a migration rate of 4N_0m = 10, local samples produce values of Tajima's D_T (Fu and Li's D_FL) that are significantly different from values expected under the NE model (two-tailed test, P < 0.05) in 16% (39%) of all cases, while scattered samples give significant results in only 6% (7%) of all simulations. In particular, we see that local samples generate values for both statistics that are higher than expected for samples from panmictic populations, reflecting a site frequency distribution skewed toward intermediate-frequency mutations; this result mirrors the recent work of De and Durrett (2007). For scaled migration rates 4N_0m between 2 and 50, pooled samples exhibit site frequency spectra that are broadly intermediate between those of local and scattered samples.
The differences in sample genealogies (as reflected in estimates of D_T and D_FL) gradually diminish with higher levels of gene flow, but some differences among sampling schemes are still apparent at fairly high migration rates (e.g., 4N_0m > 50 for D_T and 4N_0m ≈ 100 for D_FL; Figure 1). Importantly, pooling data from several subpopulations does not generate negative values of D_T or D_FL without an expansion of the total population (see Discussion). These observations also hold for the island model, albeit with smaller discrepancies between the summary statistics for scattered samples and those for the two other sampling schemes (i.e., a less pronounced skew toward intermediate-frequency mutations for both pooled and local samples; supporting information, Figure S1). For lower levels of gene flow (4N_0m < 2), the site frequency spectra of local samples gradually shift toward NE expectations, while those of pooled samples yield increasingly positive values of both D_T and D_FL under decreasing levels of migration.

Figure 1. Averages of Tajima's D_T (A) and Fu and Li's D_FL (B) under three sampling schemes as a function of levels of gene flow. We simulated an equilibrium stepping-stone model with I = 100 islands; the simulations were carried out without recombination. Every plotted point is based on 1000 independently generated data sets, and standard errors of the means are indicated by vertical lines.

We checked our simulations of this equilibrium model against analytical results. These results show that the nucleotide diversity, π, of local samples ought to be invariant to the level of symmetrical interdeme migration in an island model (Slatkin 1987; Strobeck 1987). For both models of population structure, we found approximately invariant mean π values for local samples over the entire range of simulated migration rates (results not shown).

The impact of population/range expansions on the site frequency spectrum: Next, we considered scenarios of (range) expansions under concomitant establishment of subdivision of the total species range, as described in MODELS AND METHODS.
In particular, we simulated a single ancestral population that at time t before the present experienced a fragmentation into I demes. The total number of individuals across the subdivided population could vary from N_A (the size of the single ancestral population, i.e., expansion factor b = 1) to 100 N_A (b = 100). Under a stepping-stone model with a 10-fold population expansion 8N_0 generations in the past, local samples still exhibit values of D_T and D_FL that would be expected under NE conditions, as long as gene flow is fairly low (Figure 2); these findings are consistent with simulation results by Ray et al. (2003). The corresponding simulation results for the island model are shown in Figure S2. Scattered samples, however, contain a clear signal of the specieswide expansion at any level of gene flow. The reason is that the coalescent of scattered samples behaves almost like a neutral coalescent with a population size proportional to the number of demes; hence, it picks up the signal of expansion as in the panmictic case. The same should hold for any sampling scheme under sufficiently high migration, but Figure 2 shows that even under high levels of gene flow (4N_0m = 100), the site frequency spectrum of local samples is still different from those of pooled and scattered samples. Moreover, the simulation results plotted in Figure 2 were obtained under a 10-fold expansion, and we may expect discrepancies between local and scattered samples to extend to even higher migration rates with higher expansion factors (see Figure 3). Again, D_T and D_FL values obtained for pooled samples are intermediate between those of local and scattered samples except for very low migration rates (4N_0m < 0.5), similar to the case of equilibrium subdivided populations (Figures 1 and 2).

Figure 2. Averages of Tajima's D_T (A) and Fu and Li's D_FL (B) under specieswide expansion (b = 10, t = 2) as a function of migration rates between demes. We simulated a stepping-stone model with 100 demes without recombination. Every plotted point is based on 1000 independently generated data sets; error bars represent standard errors.

Figure 3. Averages of Tajima's D_T (A) and Fu and Li's D_FL (B) as functions of the expansion factor b (with fixed migration rate and time of expansion), with the power of these statistics evaluated in the top part of each plot (right y-axis; power was assessed at a level of P = 0.05). The same simulation scheme as in Figure 2 was used.

These simulation results imply that both local and pooled samples may be expected to underestimate the extent of any specieswide (range) expansion to various degrees, depending on levels of gene flow connecting the demes and the age and magnitude of the expansion. The latter aspect, however, is influenced by our choice of simulating an instantaneous expansion followed by a period of constant population size until the time of sampling; an exponential expansion scheme until the present would yield higher detectability of expansion from local and pooled samples.

We point out that quantitative details of our simulation results for the three sampling schemes depend to some extent on our choice of sampling 20 sequences distributed over one (local), four (pooled), and 20 demes (scattered), respectively. Generally speaking, decreasing the number of sequences sampled per deme and/or increasing the number of demes that sequences are sampled from shifts the site frequency spectrum more toward that of scattered samples, reflecting the diminished impact of the scattering phase of the coalescent process on such samples. The exact numerical composition of local and pooled samples also affects the migration rate at which the expected D_T and D_FL values for pooled samples drop below those for local samples (see Figures 1 and 2).
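Figure 3 reports power at a level of P = 0.05, but this section does not spell out how the critical values were obtained. One common approach, sketched below as an assumption rather than as the authors' procedure, works in two steps. First, take the empirical 2.5% and 97.5% quantiles of a statistic (e.g., D_T) simulated under the NE model. Then count how often statistics from a structured-expansion scenario fall outside them. The function name is hypothetical, and the toy inputs are random draws, not results from this study.

```python
import numpy as np


def empirical_power(null_stats, alt_stats, alpha: float = 0.05) -> float:
    """Two-tailed rejection rate at level alpha.

    The critical values are the alpha/2 and 1 - alpha/2 quantiles of the
    statistic simulated under the null (NE) model; power is the fraction
    of statistics from the alternative scenario beyond either bound.
    """
    lower, upper = np.quantile(np.asarray(null_stats, dtype=float),
                               [alpha / 2, 1 - alpha / 2])
    alt = np.asarray(alt_stats, dtype=float)
    return float(np.mean((alt < lower) | (alt > upper)))


# Toy usage with made-up numbers: a null centred on 0 and an alternative
# shifted toward negative values, as expected after an expansion.
rng = np.random.default_rng(0)
null = rng.normal(0.0, 1.0, size=1000)
shifted = rng.normal(-1.5, 1.0, size=1000)
print(empirical_power(null, shifted))
```

In a real analysis, null_stats would come from ms runs under the NE model with the same n and θ. alt_stats would come from one sampling scheme under a given combination of 4N_0m, b, and t.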
Power of D_T and D_FL to detect expansion under different sampling schemes and expansion times: Using an intermediate level of interdeme migration (4N_0m = 10) as an illustration, we assessed the power of the test statistics D_T and D_FL under a range of expansion factors and times of expansion. For the three sampling schemes, Figure 3 summarizes the power of D_T and D_FL assuming an expansion 8N_0 generations ago (t = 2). The low power of local samples to detect significant departures from the NE model, regardless of the magnitude of the specieswide expansion, is striking; qualitatively, this is consistent with results by Ray et al. (2003). Even if the specieswide expansion was 100-fold, local samples deviate from NE expectations in only 8% (for D_T) and 12% (for D_FL) of all cases under these conditions. In sharp contrast, scattered samples deviate from NE expectations in 98% (99%) of all cases for b = 100. The corresponding simulation results for the island model are presented in Figure S3.

Next, we illustrate the effect of the timing of the expansion for a fixed 10-fold expansion and a migration rate of 4N_0m = 10. Under these conditions, expansion times in the approximate range 1 < t < 15 can be detected in principle, but again with striking differences in power between local and scattered samples (Figure 4). The shape of the curves in Figure 4 can be explained intuitively. If t is very small (i.e., establishment of population structure was very recent), samples appear to be drawn from a panmictic population of constant size. If t is very large, samples appear to be drawn from an equilibrium subdivided population, and hence the expansion cannot be detected. Under the island model, all results are qualitatively the same but with even smaller power to reject stable population size for local samples (at most 10% less power; Figure S4). As these power assessments are based on simulated sample genealogies of single loci, the actual power available with empirical multilocus data would be correspondingly higher.

Figure 4. The power of Tajima's D_T (A) and Fu and Li's D_FL (B) as a function of the expansion time t (with fixed migration rate and expansion factor). The same simulation scheme as in Figure 2 was used; power was assessed at a level of P = 0.05.

Figure 5. F_ST as a function of 4N_0m. (A) The expansion time is fixed at 8N_0 generations and b varies; (B) b = 10 is fixed and t varies. The same simulations as in Figure 2 were used (i.e., stepping-stone model with 100 demes).

All simulation results appear to depend only weakly on the number of demes, as long as I ≫ n. However, an increase in the number of demes carries important connotations, as the genealogical signatures of past expansions remain detectable for longer time periods than with fewer demes. For example, we simulated with a constant ratio t/I = 0.02 (i.e., t = 2 for I = 100, t = 10 for I = 500, etc.) under otherwise equal demographic conditions. This yielded fairly constant but sampling-specific estimates of D_T and D_FL, as illustrated for the island model in Figure S5. One interpretation is that the effective population size is proportional to I but depends on the sampling scheme. These observations imply approximately equal detectability of expansions (for a given sampling scheme) over a large range of I, provided that t is scaled with I; the relevant values of t thus depend on I. This latter effect is a direct consequence of lengthening the collecting phase of the coalescent process with increasing numbers of demes in the total population.

Estimates of F_ST under nonequilibrium conditions: We studied the behavior of F_ST in expanding populations (cf. MODELS AND METHODS) under both the stepping-stone and the island model of population structure.
Given such scenarios, estimates of F_ST depend strongly on the migration rate 4N_0m but only weakly on the expansion time τ and the expansion factor β, as long as 4N_0m ≥ 2 (Figure 5 and Figure S6). Under moderate and high levels of gene flow, Equation 2 gives a reasonable approximation of the simulated results, although the level of population structure is mostly higher than expected under equilibrium conditions; actual levels of gene flow would thus mostly be underestimated using F_ST calculated from nonequilibrium populations. Under low levels of gene flow (4N_0m < 2), however, strong and/or recent expansions yield less population differentiation than expected under equilibrium conditions, and levels of gene flow would thus mostly be overestimated (Figure 5; see also Excoffier 2004).

Reanalyzing the pooling effect in two species of wild tomatoes: Our reanalysis of the multilocus wild tomato data, limited to silent sites and biallelic SNPs, confirms the patterns originally identified by Arunyawat et al. (2007) based on all SNPs; with the exception of one locus in S. peruvianum (CT066, D_FL) and one locus in S. chilense (CT268, D_T), the site frequency spectra at individual loci yield lower/more negative values of D_T and D_FL in the pooled samples than the means of four samples representing local populations (Table 1). For the pooled samples, multilocus average D_T (D_FL) values are −1.14 (−1.38) in S. peruvianum and −0.30 (−0.76) in S. chilense; the drop from single-population means in D_T (D_FL) is 0.89 (1.04) in S. peruvianum and 0.78 (1.31) in S. chilense (Table 1). Given our simulation results under both equilibrium and expansion scenarios, the observed summary statistics for pooled samples of both species are incompatible with equilibrium assumptions but compatible with specieswide expansions (Figures 1–3). On the basis of the observed levels of population subdivision (F_ST; Arunyawat et al.
2007), a rough estimate of average migration rates is 4N_0m ≈ 5 (Figure 5). Our simulation results presented above convey how summary statistics such as D_T and D_FL obtained from local or pooled samples ought to be affected jointly by migration rates and by both the magnitude and the time of expansion. Figure 6 presents results of coalescent simulations tailored to fit the empirical wild tomato data (i.e., sampling 11 sequences from each of four local demes and fixing the migration rate at 4N_0m = 5). Even when restricting the analysis to two parameters that were allowed to vary widely, it is clear that the observed values of D_T and D_FL alone are not sufficient for robust estimation of the demographic parameters τ and β. The simulated average D_T values for pooled samples shown in the contour plot (Figure 6B) suggest that the observed pooled D_T values are compatible with ≥25-fold expansion in S. peruvianum and with ≥4-fold expansion in S. chilense.
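To make the local vs. pooled comparison concrete, the Python sketch below computes Tajima's D_T (Tajima 1989) directly from per-site allele counts of biallelic SNPs, together with nucleotide diversity π and Watterson's θ_W. It assumes an infinite-sites sample of n sequences without missing data; the function names and the toy inputs are hypothetical illustrations, not the code used for the simulations or for the tomato reanalysis.

```python
import math
from typing import Sequence, Tuple


def diversity(counts: Sequence[int], n: int) -> Tuple[float, float, int]:
    """Return (pi, theta_W, S) for biallelic SNPs in a sample of n sequences.

    counts[i] is the number of sequences carrying one of the two alleles at
    site i; sites with 0 or n copies are monomorphic and ignored.
    """
    sites = [k for k in counts if 0 < k < n]
    s = len(sites)
    a1 = sum(1.0 / i for i in range(1, n))
    # Mean pairwise differences: k(n - k) differing pairs per site,
    # divided by the n(n - 1)/2 pairs in the sample.
    pi = sum(2.0 * k * (n - k) for k in sites) / (n * (n - 1))
    return pi, s / a1, s


def tajimas_d(counts: Sequence[int], n: int) -> float:
    """Tajima's (1989) D = (pi - theta_W) / sqrt(Var); NaN if S == 0."""
    if n < 4:
        raise ValueError("Tajima's D is undefined for fewer than 4 sequences")
    pi, theta_w, s = diversity(counts, n)
    if s == 0:
        return float("nan")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy data, not the tomato data:
# a local sample of 10 sequences whose 10 SNPs are all singletons ...
print(tajimas_d([1] * 10, 10))   # negative: excess of rare variants
# ... and two demes of 10 sequences fixed for different alleles at 5 sites,
# analyzed as one pooled sample of 20 sequences.
print(tajimas_d([10] * 5, 20))   # positive: intermediate-frequency SNPs
```

The second toy case corresponds to the low-migration situation taken up in the DISCUSSION: sites fixed for different alleles in two demes become intermediate-frequency SNPs once the demes are pooled, which pushes D_T upward.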

TABLE 1
Tajima's D and Fu and Li's D values in pooled vs. population-specific tomato samples
[Columns: Locus | S. peruvianum: mean of four populations, pooled | S. chilense: mean of four populations, pooled. Rows: individual CT loci and the multilocus average. Numerical entries not recoverable.]
For each locus, the first row shows values of Tajima's D_T, while the second row shows values of Fu and Li's D_FL. Data were taken from Arunyawat et al. (2007), but here all nonsynonymous SNPs and those segregating more than two nucleotides (i.e., showing evidence of recurrent mutations) were eliminated before estimating D_T and D_FL. The pooled columns show values based on the combined (total) sample within each species, treated as a single entity. Note that almost all loci individually exhibit lower/more negative D_T and D_FL values in the pooled samples. Locus CT208 was not included for S. chilense because patterns of polymorphism were suggestive of natural selection (Arunyawat et al. 2007).

When viewed in isolation, the observed local D_T values are consistent with a very high expansion factor for S. peruvianum (β > 200) and with β ≈ 1 for S. chilense. However, considering the observed differences between local and pooled samples, >40-fold and <4-fold expansions appear to be inconsistent with the data for both species (Figure 6; Table 1). Similarly, the observed D_FL values would seem to be compatible with >150-fold expansion in S. peruvianum and with >3-fold expansion in S. chilense, whereas on the basis of the observed differences between local and pooled samples, only >120-fold expansions appear to be compatible with the data for both species (Table 1; Figure S7). Assuming that the expansion time of both species mirrors their speciation time, we note that these β values are larger than those obtained by Städler et al. (2008) under the model of Wakeley and Hey (1997).
This apparent discrepancy can be explained at least partly by our findings, since the Wakeley–Hey model treats any empirical sample as one from a panmictic population and hence picks up signals of expansion only partly, leading to smaller estimates of the expansion factor. We emphasize that changing some of the simulated parameter values, such as the number of demes and the migration rate among demes, would suggest other β values as being consistent with the empirical data. Similarly, assuming an island model rather than a stepping-stone structure would result in a site frequency spectrum more biased toward low-frequency variants, hence implying signatures of more moderate expansions than suggested by some features of the data mentioned above (Figure S8, Figure S9). Moreover, these simulations were performed with a uniform migration rate across the entire subdivided population, whereas the empirical data may reflect (perhaps dramatic) differences in immigration rates among local samples. Irrespective of the inherent difficulties of inferring credible demographic parameters under population subdivision and nonstationarity, the generally lower/more negative D_T and D_FL values for both local and pooled S. peruvianum samples strongly suggest a more pronounced expansion than in S. chilense (Table 1; Arunyawat et al. 2007; Städler et al. 2008).

DISCUSSION

Although models of subdivided populations have been studied extensively, the lack of analytical results for local samples of size >2 may have restricted their utility for genealogical inference. Wilkinson-Herbots (2008) recently obtained expressions for the expected mean and variance of pairwise sequence divergence (samples of size 2 drawn either from the same subpopulation or from different subpopulations in a finite island model) in a generalized IM model (see also Excoffier 2004).
As our simulations were motivated mainly by patterns in empirical data, we have focused on larger sample sizes and on aspects other than pairwise sequence diversity, but the underlying model of population subdivision with variable timing and magnitude of population expansion is identical to that studied by Wilkinson-Herbots (2008). For the most part, our simulations implemented 100 demes and a sample size of 20, and thus should approach the many-demes limit utilized in Wakeley's studies of the island model (e.g., Wakeley 1999, 2001, 2004; Wakeley and Aliacar 2001) and more recently established by De and Durrett (2007) for the stepping-stone model. Simulating local samples under both island and stepping-stone spatial structure, De and Durrett (2007) found lower numbers of segregating sites, higher linkage disequilibrium (LD), and median site frequency spectra skewed toward intermediate-frequency polymorphisms (i.e., yielding positive D_T values) in comparison with samples from randomly mating populations. However, they did not investigate the behavior of samples pooled across several demes. Wakeley and Lessard (2003) have shown that levels of LD can be significantly elevated in local samples, depending on migration rates

9 Genealogies and Population Subdivision 213 Figure 6. Contour plots showing average values of D T obtained in simulations designed to mimic the empirical sampling of Arunyawat et al. (2007). (A) Single-deme samples (n ¼ 11 sequences); (B) pooled samples (n ¼ 44 sequences). For each of the 1000 simulations for a given combination of expansion factor (b) and time since expansion (t), 11 sequences were drawn from each of 4 randomly chosen demes, and the 44 sequences were also evaluated as a pooled sample (B). We simulated 400 demes under a stepping-stone structure [u (per deme) ¼ 0.25, 4N 0 m ¼ 5]. The empirical average D T values for S. peruvianum are 0.25 (local) and 1.14 (pooled), while they are 0.48 (local) and 0.30 (pooled) for S. chilense ; see Table 1. and the total number of demes. Their analyses highlighted the potential impact of sampling scheme and demography on the validity of evolutionary inferences drawn under unrealistic assumptions, a topic to which we shall return below. Site frequency spectra under different sampling schemes: Using coalescent simulations, we evaluated three sampling schemes of sequences drawn from subdivided populations under different levels of migration among local demes; all three sampling schemes have corresponding examples in empirical studies of DNA sequence diversity designed to infer aspects of demographic history and natural selection from the site frequency spectrum and/or patterns of LD. Our simulations of subdivided populations of constant total size (Figure 1) have identified a wide range of migration rates where the pooling effect, as interpreted by Arunyawat et al. (2007), appears to hold: local samples are characterized by higher values of Tajima s D T and Fu and Li s D FL than pooled samples for Nm (immigrants per generation). 0.5, and the underlying genealogies of such samples converge in shape only at very high levels of gene flow (Nm. 25). 
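The three sampling schemes can be stated compactly as per-deme sample sizes. The Python sketch below builds such a configuration for local, pooled and scattered samples; the function name `sample_config`, its defaults and the uniformly random choice of demes are assumptions made for illustration (in particular, it ignores the spatial arrangement of demes in the stepping-stone model). The pooled call mirrors the design described for Figure 6 (11 sequences from each of 4 randomly chosen demes out of 400).

```python
import random
from typing import List, Optional


def sample_config(scheme: str, n_demes: int, n: int, k_demes: int = 4,
                  rng: Optional[random.Random] = None) -> List[int]:
    """Return the number of sampled sequences in each deme.

    local:     all n sequences from a single randomly chosen deme
    pooled:    n // k_demes sequences from each of k_demes random demes
    scattered: one sequence from each of n randomly chosen demes
    """
    rng = rng or random.Random()
    config = [0] * n_demes
    if scheme == "local":
        config[rng.randrange(n_demes)] = n
    elif scheme == "pooled":
        if n % k_demes != 0 or k_demes > n_demes:
            raise ValueError("need n divisible by k_demes and k_demes <= n_demes")
        for deme in rng.sample(range(n_demes), k_demes):
            config[deme] = n // k_demes
    elif scheme == "scattered":
        if n > n_demes:
            raise ValueError("scattered sampling needs at least n demes")
        for deme in rng.sample(range(n_demes), n):
            config[deme] = 1
    else:
        raise ValueError(f"unknown sampling scheme: {scheme!r}")
    return config


rng = random.Random(1)
pooled = sample_config("pooled", n_demes=400, n=44, k_demes=4, rng=rng)
print([size for size in pooled if size > 0])  # [11, 11, 11, 11]
```

A vector of this form has the shape of the per-subpopulation sample sizes that follow the `-I` option of Hudson's (2002) ms; migration among demes, including any stepping-stone arrangement, would still have to be specified separately.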
With low migration levels (Nm < 0.5), the pattern is reversed in that pooled samples show an excess of highly positive D_T and D_FL values. At these low levels of gene flow, local samples are expected to undergo a series of coalescent events within the sampled demes before entering the collecting phase with a single ancestral lineage per deme. The resulting long internal branches of genealogies encompassing lineages from several local demes (i.e., pooled samples) typically carry multiple mutations that constitute fixed differences between demes, leading to intermediate-frequency SNPs in pooled samples. Pannell (2003; his Table 3) has addressed these effects of local vs. pooled samples in the context of metapopulation dynamics. That these genealogical effects are mostly in the opposite direction, i.e., that pooling several local samples is expected to increase the proportion of low-frequency polymorphisms under intermediate and high levels of migration, has until recently been obscure, even though simulation results by Pannell (2003; his Table 3) indicated such an effect. Analyzing previously published human data sets encompassing a variety of sampling schemes, Ptak and Przeworski (2002) identified a pattern of increasing skew toward low-frequency mutations (i.e., negative D_T) with more geographically heterogeneous sampling (equivalent to pooling across demes) and concluded that signatures of population expansion may be confounded with effects of population subdivision. Our coalescent simulations of equilibrium island/stepping-stone models under a reasonable range of migration rates show that pooling does not lead to negative D_T values and thus do not support this particular interpretation of a confounding effect of subdivision.
On the contrary, pooled samples (albeit to a lesser extent than local samples) still exhibit the signatures of local (scattering-phase) coalescent events (i.e., a lower proportion of external branch length in the sample genealogies), compared to scattered samples, whose genealogies should be roughly equivalent in structure to those from panmictic populations (but see below). This intermediate frequency spectrum of pooled samples (between the extremes of local and scattered samples) is what Arunyawat et al. (2007) predicted on the basis of their genealogical interpretation of the pooling effect. For nonequilibrium populations, one caveat ought to be mentioned: signatures of expansion will be seen in scattered samples under low levels of gene flow even without any increase in the total number of individuals (i.e., β ≤ 1). The reason is that at time τ, the establishment of subdivision increases the effective population size by a factor of 1 + 1/(4N_0m) (Wakeley 2000). Consequently, an apparent expansion may be seen under strong subdivision (e.g., 4N_0m < 2); the increasingly negative values for the scattered samples shown in Figure 2 with decreasing migration rate appear to reflect this phenomenon.

Site frequency spectra and sampling in nonequilibrium subdivided populations: We found effects of the sampling scheme on sample genealogies up to very high migration rates, especially under strong specieswide expansions (Figures 2 and 3). In other words, even

10 214 T. Städler et al. under high migration rates (e.g., Nm ¼ 25), local samples may not adequately reflect the specieswide demography. The relevance of our simulation findings is essentially an empirical issue that should be judged by its predictive power and relevant data. Our simulations predict a pooling effect over a wide range of migration rates, implying one way to test the null hypothesis of panmixia, i.e., that the genealogies of local samples are indistinguishable from those of scattered samples. Very few published studies have performed separate analyses of local population samples and the pooled sample consisting of several local samples, but many studies can, in principle, be used to test our predictions because several local samples have been included [Drosophila (Baudry et al. 2004, 2006; Nolte and Schlötterer 2008), humans (Marth et al. 2004; Voight et al. 2005; Keinan et al. 2007; Garrigan et al. 2007), and plants (Heuertz et al. 2006; Pyhäjärvi et al. 2007)]. Pool and Aquadro (2006) demonstrated the pooling effect in sub-saharan D. melanogaster, as have several studies in plants (Ingvarsson 2005; Arunyawat et al. 2007; Moeller et al. 2007) and humans (e.g., Ptak and Przeworski 2002; Hammer et al. 2003). In all these cases, the site frequency spectra of pooled samples were characterized by an excess of low-frequency variants, resulting in (more) negative D T values. Some of these studies explicitly considered the possible contribution of purifying selection to these patterns, but concluded that demographic (or range) expansions were at least partly responsible for the skew in the site frequency spectra (Pool and Aquadro 2006; Moeller et al. 2007; see also Nordborg et al. 2005). As far as neutral polymorphisms are concerned, our simulation results fully concur with these interpretations, as only global expansions and not the effects of population subdivision per se can generate substantially negative D T values (Figures 1 3). 
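Because the sampling scheme affects D_FL more than D_T, it may help to see how this statistic weighs singletons. The Python sketch below implements Fu and Li's (1993) D in its outgroup form, D_FL = (η − a_n η_e)/sqrt(u_D η + v_D η²), where η is the total number of mutations and η_e the number of external mutations. It assumes infinite sites and correctly polarized (unfolded) derived-allele counts, so that η_e equals the number of derived singletons; whether the simulations used exactly this variant rather than the outgroup-free D* is not stated here, and the function name and toy counts are hypothetical.

```python
import math
from typing import Sequence


def fu_li_d(derived_counts: Sequence[int], n: int) -> float:
    """Fu and Li's (1993) D from unfolded derived-allele counts.

    Under infinite sites every segregating site carries one mutation, so
    eta = number of segregating sites and eta_e = number of derived singletons.
    Returns NaN when there are no segregating sites.
    """
    if n < 3:
        raise ValueError("Fu and Li's D needs at least 3 sequences")
    sites = [k for k in derived_counts if 0 < k < n]
    eta = len(sites)
    if eta == 0:
        return float("nan")
    eta_e = sum(1 for k in sites if k == 1)
    a_n = sum(1.0 / i for i in range(1, n))
    b_n = sum(1.0 / i**2 for i in range(1, n))
    c_n = 2.0 * (n * a_n - 2.0 * (n - 1)) / ((n - 1) * (n - 2))
    v_d = 1.0 + a_n**2 / (b_n + a_n**2) * (c_n - (n + 1.0) / (n - 1))
    u_d = a_n - 1.0 - v_d
    return (eta - a_n * eta_e) / math.sqrt(u_d * eta + v_d * eta**2)


# Hypothetical counts for n = 10: the first set has three singletons and
# gives a negative D_FL; the second has one singleton and gives a positive D_FL.
print(fu_li_d([1, 1, 1, 2, 3, 5], 10))
print(fu_li_d([1, 2, 3, 5, 5, 7], 10))
```

A smaller share of external (singleton) mutations, as produced by local scattering-phase coalescences, raises D_FL; pooling demes restores part of that share and lowers it again.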
Like many other researchers before us, we have investigated the properties of general models of population structure in the context of sampling schemes and their effects on sample genealogies. In contrast, Hammer et al. (2003) presented coalescent simulations of a nonstandard continuous-splitting model of human population structure (in which demes do not exchange migrants after being isolated from other demes) and concluded that site frequency spectra of human data sets may be explained without invoking global expansion.

Levels of gene flow and within-species panmixia: It may seem surprising to observe the pooling effect even in highly mobile species such as D. melanogaster (Pool and Aquadro 2006), which provides clear-cut evidence for deviations from panmixia at seemingly low levels of population subdivision (e.g., as quantified by F_ST). The impression of low levels of population substructure may, at least in part, be a consequence of the F_ST estimator most widely used for DNA sequence data, F_ST = 1 − π_w/π_b, which treats each SNP as a separate locus (Hudson et al. 1992). In empirical as well as simulated data sets, many segregating sites that appear as singletons in local population samples remain singletons in pooled samples. Compared to SNPs with intermediate frequencies, these sites contribute little to nucleotide diversity in either local or total samples, but they fuel a more pronounced increase in the number of segregating sites in pooled samples [i.e., Watterson's (1975) estimator θ_W increases much more than π; see also Ray et al. 2003]. These underlying features explain the effect of pooling under apparently low levels of population subdivision. Moreover, they also highlight the inherent dangers of equating low F_ST estimates with the (near) absence of population subdivision.
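The point about singletons and F_ST can be checked with a small sketch of the estimator F_ST = 1 − π_w/π_b of Hudson et al. (1992). Here π_w is taken as the unweighted mean, over demes, of the average pairwise differences within each deme, and π_b as the average number of differences between sequences drawn from different demes; other weightings are possible, and the function names and toy alignment are hypothetical.

```python
from itertools import combinations, product
from statistics import mean
from typing import Dict, List


def n_diffs(a: str, b: str) -> int:
    """Number of differing positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def hudson_fst(demes: Dict[str, List[str]]) -> float:
    """F_ST = 1 - pi_w / pi_b; every deme needs at least two sequences."""
    # pi_w: average pairwise differences within each deme, averaged over demes.
    pi_w = mean(mean(n_diffs(a, b) for a, b in combinations(seqs, 2))
                for seqs in demes.values())
    # pi_b: average differences over all pairs of sequences from different demes.
    pi_b = mean(n_diffs(a, b)
                for d1, d2 in combinations(demes.values(), 2)
                for a, b in product(d1, d2))
    return 1.0 - pi_w / pi_b


# Hypothetical toy alignment: every SNP is private to one deme (a singleton),
# yet the estimator reports no differentiation at all.
toy = {"deme1": ["AAAA", "AAAT"], "deme2": ["AATA", "TAAA"]}
print(hudson_fst(toy))  # 0.0
```

With two demes of equal size, each private singleton adds the same amount to π_w and to π_b, so such sites pull the estimate toward zero even though none of these alleles is shared between demes.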
In the framework of the infinite-alleles model, Jost (2008) recently argued that F_ST (and G_ST) are not at all appropriate measures of differentiation and pinpointed mathematical misconceptions underlying the standard approach. In particular, he identified as erroneous the classical additive partitioning of total heterozygosity (H_T, the heterozygosity of the pooled subpopulations) into mean within-subpopulation heterozygosity (H_S) and a between-subpopulation component (D_ST = H_T − H_S; Nei 1973), because H_S and D_ST are not truly independent but rather related through an incomplete partitioning (Jost 2008). Because classical results concerning the absolute number of migrants per generation required to prevent significant differentiation among local demes in the finite-island and stepping-stone models (e.g., Nm > 1, Equation 2 above; Wright 1951; Maruyama 1971) are based on the interpretation of F_ST as a measure of differentiation, Jost (2008) concluded that such rules of thumb must be considered invalid. Although Jost (2008) did not formally evaluate the infinite-sites model of sequence evolution, this conclusion goes hand in hand with our interpretations in the preceding paragraph. Importantly, the marked effects of sampling scheme on the site frequency spectrum in our simulations despite high levels of gene flow are not restricted to expanding populations but were also found for subdivided populations at demographic equilibrium (e.g., Nm > 10; Figures 1 and 2). Similar results were previously obtained by De and Durrett (2007) for equilibrium populations and, implicitly, by Ray et al. (2003) for expanding populations.

Empirical considerations and implications for statistical inference: Large-scale population-genetic data sets are commonly evaluated in the framework of panmictic populations undergoing temporal changes in population size [humans (e.g., Marth et al. 2004; Voight et al. 2005; Garrigan et al. 2007), Drosophila (e.g., Ometto et al.
2005; Li and Stephan 2006; Thornton and Andolfatto 2006), and plants (Heuertz et al. 2006; Pyhäjärvi et al. 2007)]. In conjunction with the empirical data discussed above, our results suggest that a more appropriate approach would acknowledge the subdivided nature of these species explicitly, which

11 Genealogies and Population Subdivision 215 would require paying particular attention to sampling schemes and their genealogical consequences. Especially if empirical sampling is from a local population (as has often been the case), it is inappropriate to compare patterns of diversity with expectations under the classical NE conditions embodied in commonly used tests of neutrality. Generally, local population samples should not be modeled as coming from a panmictic (e.g., continental) population but rather as being drawn from one of many demes composing the total metapopulation. Put another way, the traditional almost exclusive focus on the temporal trajectory of population-size changes that best explains properties of local/regional samples ought to be questioned, especially in systems with moderate-to-high levels of gene flow where sample genealogies are not regionally monophyletic, but rather are deeply embedded in specieswide sample genealogies (Wakeley 2001; Wakeley and Aliacar 2001). In such cases, the properties of local samples (e.g., the number of segregating sites, nucleotide diversity, the site frequency spectrum, and the extent and decay of LD) arguably reflect the level of connectivity with (degree of isolation from) other demes during the scattering phase of the coalescent process (Wakeley 2001; Ray et al. 2003; Wakeley and Lessard 2003; De and Durrett 2007). Our simulation scheme has been simplistic in assuming equal migration rates and deme sizes through time and across the entire metapopulation, but it is straightforward to explain empirical patterns, such as those found for various human population samples, in terms of differences in local population size and immigration rates during the recent past (Wakeley 2001; Ray et al. 2003; Excoffier 2004), arguably mediated by environmental heterogeneity (Wegmann et al. 2006). 
Guided by notions of the special importance of population structure and the potential for local adaptation in plants, several recent studies have emphasized the need to include genuine local population samples in population-genetic studies, rather than relying exclusively on scattered samples (Wright and Gaut 2005; Moeller et al. 2007; Ross-Ibarra et al. 2008). While we agree that sampling locally has particular merit for studying signatures of local adaptation (see Arunyawat et al. 2007), our present results caution against analyzing such local samples naively, i.e., in the conventional framework of panmictic populations. In principle, sampling several local demes offers the opportunity to contrast empirically the properties of local samples (e.g., site frequency spectrum, decay of LD) with those of the pooled sample, analogous to two of our three simulation sampling schemes. At the very least, this exercise has the potential to reveal general features of the specieswide demographic history despite the biases inherent in local samples. It would also (at least partially) guard against inferring spurious signatures of natural selection from levels and patterns of nucleotide polymorphism or from the extent of haplotype structure (for examples, see Wakeley and Lessard 2003; De and Durrett 2007). However, only the genealogical structure of scattered samples is comparable to that of an elusive panmictic population that has experienced the same temporal demographic history (e.g., expansion), whereas such samples preclude finding molecular evidence of local adaptation.

We thank Tanja Pyhäjärvi for information on her multideme data set on Pinus sylvestris and Aurelien Tellier for valuable discussions on metapopulation dynamics and the coalescent in subdivided populations. Two anonymous referees provided constructive criticism that helped to improve the final version of this article.
Furthermore, we thank Dick Hudson for allowing us to redistribute a modified version of his coalescent simulation program ms. This work was funded by the Deutsche Forschungsgemeinschaft through its Priority Program Radiations Origins of Biological Diversity (SPP-1127), grant Ste 325/5-3 (to W.S.), and through Research Unit Natural Selection in Structured Populations (FOR-1078), grant Ste 325/12-1 (to W.S.) and grant Pf 672/1-1 (to P.P.). P.P. acknowledges additional support by the German Federal Ministry of Education and Research through the Freiburg Initiative for Systems Biology, grant

LITERATURE CITED

Adams, A. M., and R. R. Hudson, 2004 Maximum-likelihood estimation of demographic parameters using the frequency spectrum of unlinked single-nucleotide polymorphisms. Genetics 168:
Alonso, S., and J. A. L. Armour, 2001 A highly variable segment of human subterminal 16p reveals a history of population growth for modern humans outside Africa. Proc. Natl. Acad. Sci. USA 98:
Arunyawat, U., W. Stephan and T. Städler, 2007 Using multilocus sequence data to assess population structure, natural selection, and linkage disequilibrium in wild tomatoes. Mol. Biol. Evol. 24:
Baudry, E., B. Viginier and M. Veuille, 2004 Non-African populations of Drosophila melanogaster have a unique origin. Mol. Biol. Evol. 21:
Baudry, E., N. Derome, M. Huet and M. Veuille, 2006 Contrasted polymorphism patterns in a large sample of populations from the evolutionary genetics model Drosophila simulans. Genetics 173:
De, A., and R. Durrett, 2007 Stepping-stone spatial structure causes slow decay of linkage disequilibrium and shifts the site frequency spectrum. Genetics 176:
Excoffier, L., 2004 Patterns of DNA sequence diversity and genetic structure after a range expansion: lessons from the infinite-island model. Mol. Ecol. 13:
Fay, J. C., and C.-I Wu, 2000 Hitchhiking under positive Darwinian selection. Genetics 155:
Fu, Y.-X., and W.-H. Li, 1993 Statistical tests of neutrality of mutations. Genetics 133:
Garrigan, D., S. B. Kingan, M. M. Pilkington, J. A. Wilder, M. P. Cox et al., 2007 Inferring human population sizes, divergence times and rates of gene flow from mitochondrial, X and Y chromosome resequencing data. Genetics 177:
Hammer, M. F., F. Blackmer, D. Garrigan, M. W. Nachman and J. A. Wilder, 2003 Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164:
Heuertz, M., E. De Paoli, T. Källman, H. Larsson, I. Jurman et al., 2006 Multilocus patterns of nucleotide diversity, linkage disequilibrium and demographic history of Norway Spruce [Picea abies (L.) Karst]. Genetics 174:
Hey, J., and R. Nielsen, 2004 Multilocus methods for estimating population sizes, migration rates and divergence time, with applications to the divergence of Drosophila pseudoobscura and D. persimilis. Genetics 167:

12 216 T. Städler et al. Hudson, R. R., 2002 Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18: Hudson, R. R., M. Slatkin and W. P. Maddison, 1992 Estimation of levels of gene flow from DNA sequence data. Genetics 132: Ingvarsson, P. K., 2005 Nucleotide polymorphism and linkage disequilibrium within and among natural populations of European aspen (Populus tremula L., Salicaceae). Genetics 169: Jost, L., 2008 G ST and its relatives do not measure differentiation. Mol. Ecol. 17: Keinan, A., J. C. Mullikin,N.Patterson and D. Reich, 2007 Measurement of the human allele frequency spectrum demonstrates greater genetic drift in East Asians than in Europeans. Nat. Genet. 39: Li, H., and W. Stephan, 2006 Inferring the demographic history and rate of adaptive substitution in Drosophila. PLoS Genet. 2: e166. Marth, G. T., E. Czabarka, J. Murvai and S. T. Sherry, 2004 The allele frequency spectrum in genome-wide human variation data reveals signals of differential demographic history in three large world populations. Genetics 166: Maruyama, T., 1971 Analysis of population structure II. Two-dimensional stepping stone models of finite length and other geographically structuredpopulations.ann.hum.genet.35: Moeller, D. A., M. I. Tenaillon and P. Tiffin, 2007 Population structure and its effects on patterns of nucleotide polymorphism in teosinte (Zea mays ssp. parviglumis). Genetics 176: Nei, M., 1973 Analysis of gene diversity in subdivided populations. Proc. Natl. Acad. Sci. USA 70: Nielsen, R., andj. Wakeley, 2001 Distinguishing migration from isolation: a Markov chain Monte Carlo approach. Genetics 158: Nolte, V., and C. Schlötterer, 2008 African Drosophila melanogaster and D. simulans populations have similar levels of sequence variability, suggesting comparable effective population sizes. Genetics 178: Nordborg, M., T. T. Hu, Y. Ishino, J. Jhaveri, C. Toomajian et al., 2005 The pattern of polymorphism in Arabidopsis thaliana. 
PLoS Biol. 3: e196. Ometto, L., S. Glinka, D. De Lorenzo and W. Stephan, 2005 Inferring the effects of demography and selection on Drosophila melanogaster populations from a chromosome-wide scan of DNA variation. Mol. Biol. Evol. 22: Pannell, J. R., 2003 Coalescence in a metapopulation with recurrent local extinction and recolonization. Evolution 57: Pool, J. E., and C. F. Aquadro, 2006 History and structure of sub- Saharan populations of Drosophila melanogaster. Genetics 174: Ptak, S. E., and M. Przeworski, 2002 Evidence for population growth in humans is confounded by fine-scale population structure. Trends Genet. 18: Pyhäjärvi, T., M. R. García-Gil, T. Knürr, M. Mikkonen, W. Wachowiak et al., 2007 Demographic history has influenced nucleotide diversity in European Pinus sylvestris populations. Genetics 177: Ramos-Onsins, S. E., S. Mousset, T. Mitchell-Olds and W. Stephan, 2007 Population genetic inference using a fixed number of segregating sites: a reassessment. Genet. Res. 89: Ray, N., M. Currat and L. Excoffier, 2003 Intra-deme molecular diversity in spatially expanding populations. Mol. Biol. Evol. 20: Roselius, K., W. Stephan and T. Städler, 2005 The relationship of nucleotide polymorphism, recombination rate and selection in wild tomato species. Genetics 171: Ross-Ibarra, J., S. I. Wright, J. P. Foxe, A. Kawabe, L. DeRose- Wilson et al., 2008 Patterns of polymorphism and demographic history in natural populations of Arabidopsis lyrata. PLoS One 3: e2411. Schmid, K. J., S. E. Ramos-Onsins, H. Ringys-Beckstein, B. Weisshaar and T. Mitchell-Olds, 2005 A multilocus sequence survey in Arabidopsis thaliana reveals a genome-wide departure from a neutral model of DNA sequence polymorphism. Genetics 169: Slatkin, M., 1987 The average number of sites separating DNA sequences drawn from a subdivided population. Theor. Popul. Biol. 32: Städler, T., K. Roselius and W. 
Stephan, 2005 Genealogical footprints of speciation processes in wild tomatoes: demography and evidence for historical gene flow. Evolution 59: Städler, T., U. Arunyawat and W. Stephan, 2008 Population genetics of speciation in two closely related wild tomatoes (Solanum section Lycopersicon). Genetics 178: Strobeck, C., 1987 Average number of nucleotide differences in a sample from a single subpopulation: a test for population subdivision. Genetics 117: Tajima, F., 1989 Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: Thornton, K., and P. Andolfatto, 2006 Approximate Bayesian inference reveals evidence for a recent, severe bottleneck in a Netherlands population of Drosophila melanogaster. Genetics 172: Voight, B. F., A. M. Adams, L.A.Frisse, Y.Qian, R.R.Hudson et al., 2005 Interrogating multiple aspects of variation in a full resequencing data set to infer human population size changes. Proc. Natl. Acad. Sci. USA 102: Wakeley, J., 1999 Nonequilibrium migration in human history. Genetics 153: Wakeley, J., 2000 The effects of subdivision on the genetic divergence of populations and species. Evolution 54: Wakeley, J., 2001 The coalescent in an island model of population subdivision with variation among demes. Theor. Popul. Biol. 59: Wakeley, J., 2004 Metapopulations and coalescent theory, pp in Ecology, Genetics, and Evolution of Metapopulations, edited by I. Hanski and O. Gaggiotti. Elsevier, Oxford. Wakeley, J., and N. Aliacar, 2001 Gene genealogies in a metapopulation. Genetics 159: Wakeley, J., and J. Hey, 1997 Estimating ancestral population parameters. Genetics 145: Wakeley, J., and S. Lessard, 2003 Theory of the effects of population structure and sampling on patterns of linkage disequilibrium applied to genomic data from humans. Genetics 164: Watterson, G. A., 1975 On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7: Wegmann, D., M. Currat and L. 
Excoffier, 2006 Molecular diversity after a range expansion in heterogeneous environments. Genetics 174: Whitlock, M. C., and D. E. McCauley, 1999 Indirect measures of gene flow and migration: F ST doesn t equal 1/(4Nm 1 1). Heredity 82: Wilkinson-Herbots, H. M., 2008 The distribution of the coalescent time and the number of pairwise nucleotide differences in the isolation with migration model. Theor. Popul. Biol. 73: Wright, S., 1951 The genetical structure of populations. Ann. Eugen. 15: Wright, S. I., and B. S. Gaut, 2005 Molecular population genetics and the search for adaptive evolution in plants. Mol. Biol. Evol. 22: Communicating editor: N. Takahata

Supporting Information

The Impact of Sampling Schemes on the Site Frequency Spectrum in Nonequilibrium Subdivided Populations

Thomas Städler, Bernhard Haubold, Carlos Merino, Wolfgang Stephan and Peter Pfaffelhuber

Copyright 2009 by the Genetics Society of America
doi: /genetics

FIGURE S1. The same parameters were used as in Figure 1 of the main text. Here, we used an island model with 100 demes.

FIGURE S2. The same parameters were used as in Figure 2 of the main text. Here, we used an island model with 100 demes.

FIGURE S3. The same parameters were used as in Figure 3 of the main text. Here, we used an island model with 100 demes.

FIGURE S4. The same parameters were used as in Figure 4 of the main text. Here, we used an island model with 100 demes.

FIGURE S5. The dependence of $D_T$ and $D_{FL}$ on the number of islands $I$ in an island model. We fix $\tau/I = 0.02$, $\beta = 10$ and $4N_0 m = 10$. In addition, $\theta I = 100$ is held constant.

FIGURE S6. The same parameters were used as in Figure 5 of the main text. Here, we used an island model with 100 demes.

FIGURE S7. Same as Figure 6 in the main text, but for $D_{FL}$.

FIGURE S8. Same as Figure 6 in the main text, but for an island model with $I = 400$ islands.

FIGURE S9. Same as Figure S8, but for $D_{FL}$.
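The supplementary figures report Tajima's D ($D_T$) and Fu and Li's D ($D_{FL}$). The sketch below shows how both statistics can be computed from an unfolded (derived-allele) site frequency spectrum. It uses the standard formulas of Tajima (1989) and Fu and Li (1993, the version with an outgroup). Several assumptions apply. Ancestral states are taken as known, as they are in coalescent simulations, and the infinite-sites model is assumed. The function names `tajimas_d`, `fu_li_d` and `harmonic_sums` are hypothetical. This is not presented as the authors' own pipeline.

```python
import math
from typing import Sequence, Tuple


def harmonic_sums(n: int) -> Tuple[float, float]:
    """Return (a_n, b_n) = (sum 1/i, sum 1/i^2) for i = 1, ..., n - 1."""
    a = sum(1.0 / i for i in range(1, n))
    b = sum(1.0 / (i * i) for i in range(1, n))
    return a, b


def _check(sfs: Sequence[float], n: int) -> None:
    # Tajima's variance term vanishes at n = 3, so require n >= 4 for both statistics.
    if n < 4:
        raise ValueError("sample size n must be at least 4")
    if len(sfs) != n - 1:
        raise ValueError("sfs must have n - 1 entries (derived counts 1..n-1)")


def tajimas_d(sfs: Sequence[float], n: int) -> float:
    """Tajima's D from an unfolded SFS; sfs[i - 1] counts sites with i derived copies."""
    _check(sfs, n)
    s = sum(sfs)  # number of segregating sites S
    if s == 0:
        return float("nan")
    # A site with i derived copies differs in i(n - i) of the n(n - 1)/2 sequence pairs.
    pi = sum(2.0 * i * (n - i) * x for i, x in enumerate(sfs, start=1)) / (n * (n - 1))
    a1, a2 = harmonic_sums(n)
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1, e2 = c1 / a1, c2 / (a1 * a1 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def fu_li_d(sfs: Sequence[float], n: int) -> float:
    """Fu and Li's D with known ancestral states (outgroup version)."""
    _check(sfs, n)
    eta = sum(sfs)  # total number of mutations (infinite sites)
    eta_e = sfs[0]  # mutations on external branches, i.e. derived singletons
    if eta == 0:
        return float("nan")
    a, b = harmonic_sums(n)
    c = 2.0 * (n * a - 2.0 * (n - 1)) / ((n - 1) * (n - 2))
    v = 1.0 + (a * a / (b + a * a)) * (c - (n + 1) / (n - 1))
    u = a - 1.0 - v
    return (eta - a * eta_e) / math.sqrt(u * eta + v * eta * eta)
```

The spectrum enters $D_{FL}$ only through the total number of mutations and the singleton class. A change in the proportion of singletons, and hence in external branch length, therefore shifts $D_{FL}$ directly. $D_T$ weights all frequency classes through $\pi$. For a spectrum proportional to $1/i$, the neutral equilibrium expectation, both numerators are exactly zero.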


Surfing genes. On the fate of neutral mutations in a spreading population Surfing genes On the fate of neutral mutations in a spreading population Oskar Hallatschek David Nelson Harvard University ohallats@physics.harvard.edu Genetic impact of range expansions Population expansions

More information

J. Hey - Genetic Species

J. Hey - Genetic Species long been recognized (Mayr, 1982), the definition of the word "species" and the identification of species January 20, 1997 remain problematic. One advance is the understanding that the word "species",

More information

Selection and Population Genetics

Selection and Population Genetics Selection and Population Genetics Evolution by natural selection can occur when three conditions are satisfied: Variation within populations - individuals have different traits (phenotypes). height and

More information

Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination)

Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination) 12/5/14 Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination) Linkage Disequilibrium Genealogical Interpretation of LD Association Mapping 1 Linkage and Recombination v linkage equilibrium ²

More information

reciprocal altruism by kin or group selection can be analyzed by using the same approach (6).

reciprocal altruism by kin or group selection can be analyzed by using the same approach (6). Proc. Nati. Acad. Sci. USA Vol. 81, pp. 6073-6077, October 1984 Evolution Group selection for a polygenic behavioral trait: Estimating the degree of population subdivision (altruism/kin selection/population

More information

Genetics. Metapopulations. Dept. of Forest & Wildlife Ecology, UW Madison

Genetics. Metapopulations. Dept. of Forest & Wildlife Ecology, UW Madison Genetics & Metapopulations Dr Stacie J Robinson Dr. Stacie J. Robinson Dept. of Forest & Wildlife Ecology, UW Madison Robinson ~ UW SJR OUTLINE Metapopulation impacts on evolutionary processes Metapopulation

More information

Classical Selection, Balancing Selection, and Neutral Mutations

Classical Selection, Balancing Selection, and Neutral Mutations Classical Selection, Balancing Selection, and Neutral Mutations Classical Selection Perspective of the Fate of Mutations All mutations are EITHER beneficial or deleterious o Beneficial mutations are selected

More information

Formalizing the gene centered view of evolution

Formalizing the gene centered view of evolution Chapter 1 Formalizing the gene centered view of evolution Yaneer Bar-Yam and Hiroki Sayama New England Complex Systems Institute 24 Mt. Auburn St., Cambridge, MA 02138, USA yaneer@necsi.org / sayama@necsi.org

More information

GENETICS - CLUTCH CH.22 EVOLUTIONARY GENETICS.

GENETICS - CLUTCH CH.22 EVOLUTIONARY GENETICS. !! www.clutchprep.com CONCEPT: OVERVIEW OF EVOLUTION Evolution is a process through which variation in individuals makes it more likely for them to survive and reproduce There are principles to the theory

More information

ACCORDING to current estimates of spontaneous deleterious

ACCORDING to current estimates of spontaneous deleterious GENETICS INVESTIGATION Effects of Interference Between Selected Loci on the Mutation Load, Inbreeding Depression, and Heterosis Denis Roze*,,1 *Centre National de la Recherche Scientifique, Unité Mixte

More information

Lecture Notes: BIOL2007 Molecular Evolution

Lecture Notes: BIOL2007 Molecular Evolution Lecture Notes: BIOL2007 Molecular Evolution Kanchon Dasmahapatra (k.dasmahapatra@ucl.ac.uk) Introduction By now we all are familiar and understand, or think we understand, how evolution works on traits

More information