Genetics: Early Online, published on May 5, 2017 as /genetics

Size: px
Start display at page:

Download "Genetics: Early Online, published on May 5, 2017 as /genetics"

Transcription

1 Genetics: Early Online, published on May 5, 07 as 0.534/genetics GENETICS INVESTIGATION Mathematical constraints on F ST : biallelic markers in arbitrarily many populations Nicolas Alcala and Noah A. Rosenberg Department of Biology, Stanford University, Stanford, CA , USA ABSTRACT F ST is one of the most widely used statistics in population genetics. Recent mathematical studies have identified constraints that challenge interpretations of F ST as a measure with potential to range from 0 for genetically similar populations to for divergent populations. We generalize results obtained for population pairs to arbitrarily many populations, characterizing the mathematical relationship between F ST, the frequency M of the more frequent allele at a polymorphic biallelic marker, and the number of subpopulations. We show that for fixed, F ST has a peculiar constraint as a function of M, with a maximum of only if M = i/, for integers i with / i. For fixed M, as grows large, the range of F ST becomes the full closed or half-open unit interval. For fixed, however, some M < ( / always exists at which the upper bound on F ST lies below 84. We use coalescent simulations to show that under weak migration, F ST depends strongly on M when is small, but not when is large. Finally, examining data on human genetic variation, we use our results to explain the generally smaller F ST values between pairs of continents relative to global F ST values. We discuss implications for the interpretation and use of F ST. EYWORDS allele frequency; F ST ; genetic differentiation; migration; population structure GENETIC differentiation, in which individuals from the same subpopulation are more genetically similar than are individuals from different subpopulations, is a central concept in population genetics. It can arise from a large variety of processes, including from aspects of the physical environment such as geographic barriers, variable permeability to migrants, and spatially heterogenous selection pressures, as well as from biotic phenomena such as assortative mating and self-fertilization. Genetic differentiation among populations is thus a pervasive feature of population-genetic variation. To measure genetic differentiation, Wright (95 introduced the fixation index F ST, defined as the correlation between random gametes, drawn from the same subpopulation, relative to the total. Many definitions of F ST and related statistics have since been proposed (reviewed by Holsinger and Weir 009. F ST is often defined in terms of a ratio involving mean heterozygosity of a set of subpopulations, H S, and total heterozygosity of a population formed by pooling the alleles of the subpopulations, doi: 0.534/genetics.XXX.XXXXXX Manuscript compiled: Friday 5 th May, 07% Correspondence to Nicolas Alcala. nalcala@stanford.edu. H T (Nei 973: F ST = H T H S H T. ( For a polymorphic biallelic marker whose more frequent allele has mean frequency M across subpopulations, denoting by p k the frequency of the allele in subpopulation k, H S = k= [p k + ( p k ] and H T = [M + ( M ]. F ST and related statistics have a wide range of applications. For example, F ST is used as a descriptive statistic whose values are routinely reported in empirical population-genetic studies (Holsinger and Weir 009. It is considered as a test statistic for spatially divergent selection, either acting on a locus (Lewontin and rakauer 973; Bonhomme et al. 00 or, using comparisons to a corresponding phenotypic statistic Q ST, on a trait (Leinonen et al. 03. F ST is also used as a summary statistic for demographic inference, to measure gene flow between subpopulations (Slatkin 985, or via approximate Bayesian computation, to estimate demographic parameters (Cornuet et al Applications of F ST generally assume that values near 0 indicate that there are almost no genetic differences among subpopulations, and that values near indicate that subpopulations are genetically different (Hartl and Clark 997; Frankham et al. 00; Holsinger and Weir 009. Mathematical studies, however, have Copyright 07. Genetics, Vol. XXX, XXXX XXXX May 07

2 challenged the simplicity of this interpretation, commenting that the range of values that F ST can take is considerably restricted by the allele frequency distribution (Table. Such studies have highlighted a direct relationship between allele frequencies and constraints on the range of F ST, through functions of the allele frequency distribution such as the mean heterozygosity across subpopulations, H S. The maximal F ST has been shown to decrease as a function of H S, both for an infinite (Hedrick 999 and for a fixed finite number of subpopulations (Long and ittles 003; Hedrick 005. Consequently, if subpopulations differ in their alleles but separately have high heterozygosity, then H S can be high and F ST can be low; F ST can be near 0 even if subpopulations are completely genetically different in the sense that no allele occurs in more than one subpopulation. Detailed mathematical results have clarified the relationship between allele frequencies and F ST in the case of = subpopulations. Considering a biallelic marker, Maruki et al. (0 evaluated the constraint on F ST by the frequency M of the most frequent allele: the maximal F ST decreases monotonically from to 0 with increasing M, M <. Jakobsson et al. (03 extended this result to multiallelic markers with an unspecified number of distinct alleles, showing that the maximal F ST increases from 0 to as a function of M when 0 < M <, and decreases from to 0 when M < in the manner reported by Maruki et al. (0. Edge and Rosenberg (04 generalized these results to the case of a fixed finite number of alleles, showing that the maximal F ST differs slightly from the unspecified case when the fixed number of distinct alleles is odd. In this study, we characterize the relationship between F ST and the frequency M of the most frequent allele, for a biallelic marker and an arbitrary number of subpopulations. We derive the mathematical upper bound on F ST in terms of M, extending the biallelic -subpopulation result to arbitrary. To assist in interpreting the bound, we simulate the joint distribution of F ST and M in the island migration model, describing its properties as a function of the number of subpopulations and the migration rate. The -population upper bound on F ST as a function of M facilitates an explanation of counterintuitive aspects of global human genetic differentiation. We discuss the importance of the results for applications of F ST more generally. Mathematical constraints Model Our goal is to derive the range of values F ST can take the lower and upper bounds on F ST as a function of the frequency M of the most frequent allele for a biallelic marker when the number of subpopulations is a fixed finite value greater than or equal to. We consider a polymorphic locus with two alleles, A and a, in a setting with subpopulations contributing equally to the total population. We denote the frequency of allele A in subpopulation k by p k. The frequency of allele a in subpopulation k is p k. Each allele frequency p k lies in the interval [0, ]. The mean frequency of allele A across the subpopulations is M = k= p k, and the mean frequency of allele a is M. Without loss of generality, we assume that allele A is the more frequent allele in the total population, so that M M. Because by assumption the locus is polymorphic, M =. We assume that the allele frequencies M and p k are parametric allele frequencies of the total population and subpopulations, and not estimated values computed from data. F ST as a function of M Eq. expresses F ST as a ratio involving within-subpopulation heterozygosity, H S, and total heterozygosity, H T. We substitute H S and H T in eq. with their respective expressions in terms of allele frequencies: F ST = k= [p k + ( p k ] [M + ( M ] [M + ( M ]. ( Simplifying eq. by noting that k= p k = M leads to: ( p k M k= F ST =. (3 M( M For fixed M, we seek the vectors (p, p,..., p, with p k [0, ] and k= p k = M, that minimize and maximize F ST. We clarify here that in mathematical analysis of the relationship between F ST and allele frequencies, we adopt an interpretation of F ST as a statistic that describes a mathematical function of allele frequencies rather than as a parameter that describes coancestry of individuals in a population. Multiple interpretations of F ST exist, giving rise to different expressions for computing it (e.g. Nei 973; Nei and Chesser 983; Weir and Cockerham 984; Holsinger and Weir 009. Our interest is in disentangling properties of the mathematical function by which true allele frequencies are used to compute F ST from the population-genetic relationships among individuals and sampling phenomena that could be viewed as affecting the computation. As a result, it is natural to follow the statistic interpretation that has been used in earlier scenarios involving F ST bounds in relation to allele frequencies viewed as parameters rather than as estimates or outcomes of a stochastic process (Table and in which such a disentanglement is possible. We return to this topic in the Discussion. Lower bound From eq. 3, for all M [,, setting p k = M in all subpopulations k yields F ST = 0. The Cauchy-Schwarz inequality guarantees that k= p k ( k= p k, with equality if and only if p = p =... = p. Hence, k= p k = ( k= p k, or, dividing both sides by, k= p k = M, requires p = p =... = p = M. Examining eq. 3, (p, p,..., p = (M, M,..., M is thus the only vector that yields F ST = 0. We can conclude that the lower bound on F ST is equal to 0 irrespective of M, for any value of the number of subpopulations. Upper bound To derive the upper bound on F ST in terms of M, we must maximize F ST in eq. 3, assuming that M and are constant. Because all terms in eq. 3 depend only on M and except the positive term k= p k in the numerator, maximizing F ST corresponds to maximizing k= p k at fixed M and. Denote by x the greatest integer less than or equal to x, and by {x} = x x the fractional part of x. Using a result from Rosenberg and Jakobsson (008, Theorem from Appendix A states that the maximum for k= p k satisfies p k M + {M}, (4 k= with equality if and only if allele A has frequency in M subpopulations, frequency {M} in a single subpopulation, and Alcala and Rosenberg

3 Table Studies describing the mathematical constraints on F ST. H S and H T denote the within-subpopulation and total heterozygosities, respectively, δ denotes the absolute difference in the frequency of a specific allele between two subpopulations, and M denotes the frequency of the most frequent allele. Reference Number of alleles Number of subpopulations Variable in terms of which constraints are reported Hedrick 999 unspecified value H S Long and ittles 003 unspecified value fixed finite value H S Rosenberg et al. 003 δ Hedrick 005 unspecified value fixed finite value H S Maruki et al. 0 H S, M Jakobsson et al. 03 unspecified value H T, M Edge and Rosenberg 04 fixed finite value H T, M This paper fixed finite value M Instead of heterozygosities H S or H T, some studies consider homozygosities J S = H S or J T = J T. frequency 0 in all other subpopulations. Substituting eq. 4 into eq. 3, we obtain the upper bound for F ST : F ST M + {M} M. (5 M( M The upper bound on F ST in terms of M has a piecewise structure, with changes in shape occurring when M is an integer. For i = /, / +,...,, define the interval I i by [, i+ for i = / in the case that is odd and by [ i, i+ for all other (i,. For M I i, M has a constant value i. Writing x = M M = M i so that M = i+x, for each interval I i, the upper bound on F ST is a smooth function Q i (x = (i + x (i + x (i + x( i x, (6 where x lies in [0, (or in [, for odd and i = /, and i lies in [, ]. The conditions under which the upper bound is reached illuminate its interpretation. The maximum requires the most frequent allele to have frequency or 0 in all except possibly one subpopulation, so that the locus is polymorphic in at most a single subpopulation. Thus, F ST is maximal when fixation is achieved in as many subpopulations as possible. Figure shows the upper bound on F ST in terms of M for various values of. It has peaks at values i, where it is possible for the allele to be fixed in all subpopulations and for F ST to reach a value of. Between i i+ and, the function reaches a local minimum, eventually decreasing to 0 as M approaches. The upper bound is not differentiable at the peaks, and it is smooth and strictly below between the peaks. If is even, then the upper bound begins from a local maximum at M =, whereas if is odd, it begins from a local minimum at M =. Properties of the upper bound Local maxima. We explore properties of the upper bound on F ST as a function of M for fixed by examining the local maxima and minima. The upper bound is equal to on interval I i if and only if the numerator and denominator in eq. 6 are equal. Noting that, this condition is equivalent to x = x and hence, because 0 x <, x = {M} = 0. Thus, on interval I i for M, the maximal F ST is if and only if M is an integer. M has exactly integer values for M [,. Consequently, given, there are exactly maxima at which F ST can equal, at M = +, +3,..., if is odd and at M =, +,..., if is even. This analysis finds that F ST is only unconstrained within the unit interval for a finite set of values of the frequency M of the most frequent allele. The size of this set increases with the number of subpopulations. Local minima. Equality of the upper bound at the right endpoint of each interval I i and the left endpoint of I i+ for each i from to demonstrates that the upper bound on F ST is a continuous function of M. Consequently, local minima necessarily occur between the local maxima. If is even, then the upper bound on F ST possesses local minima, each inside an interval I i, i =, +,...,. If is odd, then the upper bound has local minima, the first in interval [, +, and each of the others in an interval I i, with i = +, +3,...,. Note that because we restrict attention to M [,, we do not count the point at M = and F ST = 0 as a local minimum. Theorem from Appendix B describes the relative positions of the local minima within intervals I i, as a function of the number of subpopulations. From Proposition of Appendix B, for fixed, the relative position of the local minimum within interval I i increases with i; as a result, the leftmost dips in the upper bound (those near M = are less tilted toward the right endpoints of their associated intervals than are the subsequent dips (nearer M =. The unique local minimum in interval I i lies either exactly at M = i+/ = for the leftmost dip for odd (Proposition or slightly to the right of the midpoint i+/ of interval I i in other intervals, but no farther from the center than M = i+ i (Proposition 3. The values of the upper bound on F ST at the local minima as a function of i are computed in Appendix B (eq. B.5 by substituting the positions M of the local minima into eq. 5. From Proposition 4 of Appendix B, for fixed, the value of F ST at the local minimum in interval I i decreases as i increases. The maximal Mathematical constraints on F ST 3

4 A B C D E = = 3 = 7 = 0 = M M Figure Bounds on F ST as a function of the frequency of the most frequent allele, M, for different numbers of subpopulations. (A =. (B = 3. (C = 7. (D = 0. (E = 40. The shaded region represents the space between the upper and lower bounds on F ST. The upper bound is computed from eq. 5; for each, the lower bound is 0 for all values of M. F ST among local minima increases as increases (Proposition 5. The upper bound on F ST at the local minimum closest to M = tends to as (Proposition 5. The upper bound on F ST at the local minimum closest to M =, however, is always smaller than 84 (Proposition 6. In conclusion, although F ST is constrained below for all values of M in the interior of intervals I i = [ i, i+, the constraint is reduced as and in the limit, it even completely disappears in the interval I i closest to M =. Nevertheless, there always exists a value of M < for which the upper bound on F ST is lower than 84. Mean range of possible F ST values. We now evaluate how strongly M constrains the range of F ST as a function of. Following similar computations for other settings where F ST is considered in relation to a quantity that constrains it (Jakobsson et al. 03; Edge and Rosenberg 04, we compute the mean maximum F ST across all possible M values. This quantity, denoted A(, is the area between the lower and upper bounds on F ST divided by the length of the domain for M, or. A( near 0 indicates a strong constraint. Because the lower bound on F ST is 0 for all M between and, A( corresponds to the area under the upper bound on F ST divided by, or twice the integral of eq. 5 between and : A( = M= M + {M} M dm. (7 M( M The integral is computed in Appendix C. We obtain A( = + (+ ln 4 i ln i. (8 i= We also obtain an asymptotic approximation Ã( A( in Appendix C, where Ã( = ln 3 4 ln C. (9 Here, C.84 represents the Glaisher-inkelin constant. A( = ln , in accord with the = case of Jakobsson et al. (03. Interestingly, the constraint on the mean range of F ST disappears as. Indeed, from eq. 9, we immediately see that lim A( = (Figure. As a mean of indicates that F ST ranges from 0 to for all M (except possibly on a set of measure 0, for large, the range of F ST is approximately invariant with respect to M. A( Figure The mean A( of the upper bound on F ST over the interval M [,, as a function of the number of subpopulations. A( is computed from eq. 8 (black line. The approximation Ã( is computed from eq. 9 (gray dashed line. A numerical computation of the relative error of the approximation as a function of, A( Ã( /A(, finds that the maximal error for 000 is 074, achieved when =. The x-axis is plotted on a logarithmic scale. The increase of A( with is monotonic (Theorem 3 of Appendix C. By numerically evaluating eq. 8, we find that although A( , for 7, A( > 0.75, and for 46, A( > Nevertheless, although the mean of the upper bound on F ST approaches, we have shown in Proposition 6 from Appendix B that for large, values of M continue to exist at which the upper bound is constrained substantially below. Evolutionary processes and the joint distribution of M and F ST for a biallelic marker and subpopulations Thus far, we have described the mathematical constraint imposed on F ST by M without respect to the frequency with which particular values of M arise in evolutionary scenarios. As an assessment of the bounds in evolutionary models can illuminate the settings in which they are most salient in population-genetic data analysis (Hedrick 005; Whitlock 0; Rousset 03; Alcala et al. 04; Wang 05, we simulated the joint distribution of F ST and M in three migration models, in each case relating the distribution to the mathematical bounds on F ST. This analysis considers allele frequency distributions, and hence values of M and F ST, generated by evolutionary models. Simulations We simulated independent single-nucleotide polymorphisms (SNPs under the coalescent, using the software MS (Hudson 4 Alcala and Rosenberg

5 00. We considered a population of total size N diploid individuals subdivided into subpopulations of equal size N. At each generation, a proportion m of the individuals in a subpopulation originated from another subpopulation. Thus, the scaled migration rate is 4Nm, and it corresponds to twice the number of individuals in a subpopulation that originate elsewhere. We focus on the finite island model (Maruyama 970; Wakeley 998, in which migrants have the same probability m to come from any specific other subpopulation. The finite rectangular and linear stepping-stone models are considered in Supplementary File S and generate similar results (Figures S-S4. We examined three values of (, 7, 40 and three values of 4Nm (0.,, 0. Note that in MS, time is scaled in units of 4N generations, so there is no need to specify the subpopulation sizes N. To obtain independent SNPs, we used the MS command -s to fix the number of segregating sites S to. For each parameter pair (, 4Nm, we performed 00,000 replicate simulations, sampling 00 sequences per subpopulation in each replicate. F ST values were computed from the parametric allele frequencies. Fixing S = and accepting all coalescent genealogies entails an implicit assumption that all genealogies have equal potential to produce exactly one segregating site. We therefore also considered a different approach to generating SNPs, assuming an infinitely-many-sites model with a specified scaled mutation rate θ and discarding simulations leading to S >. We chose θ so that the expected number of segregating sites in a constant-sized population, or N θ i= i, was. This approach produces similar results to the fixed-s simulation (Figure S5. MS commands appear in Supplementary File S. Weak migration Under the island model with weak migration (4Nm = 0., the joint distribution of M and F ST is highest near the upper bound on F ST in terms of M, for all (Figure 3A-C. For =, most SNPs have M near 0.5, representing fixation of the major allele in one subpopulation and absence in the other, and F ST near (Figure 3A. The mean F ST in sliding windows for M closely follows the upper bound on F ST. For = 7, most SNPs have M near 4 7, 5 7, or 6 7, representing fixation of the major allele in 4, 5, or 6 subpopulations and absence in the others, and F ST (Figure 3B. The mean F ST closely follows the upper bound. For = 40, most SNPs either have M near 37 40, , or 40, and F ST, or M < and F ST 0.9 (Figure 3C. The mean F ST follows the upper bound for M > For M < 40, it lies below the upper bound and does not possess its characteristic peaks. We can interpret these patterns using the model of Wakeley (999, which showed that when migration is infrequent compared to coalescence, coalescence follows two phases. In the scattering phase, lineages coalesce in each subpopulation, leading to a state with a single lineage per subpopulation. In the collecting phase, lineages from different subpopulations coalesce. As a result, considering subpopulations with equal sample size n, when 4Nm, genealogies tend to have long branches close to the root, each corresponding to a subpopulation and each leading to n shorter terminal branches. The long branches coalesce as pairs accumulate by migration in shared ancestral subpopulations. A random mutation on such a genealogy is likely to occur in one of two places. It can occur on a long branch during the collecting phase, in which case the derived allele will have frequency in all subpopulations whose lineages descend from the branch, and 0 in the others. Alternatively, it can occur toward the terminal branches in the scattering phase, 4Nm = 0. 4Nm = 4Nm = 0 F ST F ST F ST A.0 0. D.0 0. G = M B.0 0. E H = 7 = 40 C M Figure 3 Joint density of the frequency M of the most frequent allele and F ST in the island migration model, for different numbers of subpopulations and scaled migration rates 4Nm (where N is the subpopulation size and m the migration rate. (A =, 4Nm = 0.. (B = 7, 4Nm = 0.. (C = 40, 4Nm = 0.. (D =, 4Nm =. (E = 7, 4Nm =. (F = 40, 4Nm =. (G =, 4Nm = 0. (H = 7, 4Nm = 0. (I = 40, 4Nm = 0. The black solid line represents the upper bound on F ST in terms of M (eq. 5; the red dashed line represents the mean F ST in sliding windows of M of size (plotted from 0.5 to Colors represent the density of SNPs, estimated using a Gaussian kernel density estimate with a bandwidth of 07, with density set to 0 outside of the bounds. SNPs are simulated using coalescent software MS, assuming an island model of migration and conditioning on segregating site. See Figure S5 for an alternative algorithm for simulating SNPs. Each panel considers 00,000 replicate simulations, with 00 lineages sampled per subpopulation. Figures S and S3 present similar results under finite rectangular and linear stepping-stone migration models. in which case the mutation will have frequency p k > 0 in one subpopulation and 0 in all others. These scenarios that are likely under weak migration one allele fixed in some subpopulations or present only in one subpopulation correspond closely to conditions under which the upper bound on F ST is reached at fixed M. Thus, the properties of likely genealogies explain the proximity of F ST to its upper bound. Intermediate and strong migration With intermediate migration (4Nm =, for all, the joint density of M and F ST is highest at lower values of F ST than with 4Nm = 0. (Figure 3D-F. For =, most SNPs have M >, and the mean F ST is almost equidistant from the upper and lower bounds on F ST, nearing the upper bound as M increases (Figure 3D. For = 7, most SNPs have M > 0.9; as was seen for =, the mean F ST is almost equidistant from the upper and lower bounds, moving toward the upper bound as M increases (Figure 3E. For = 40, the pattern is similar, most SNPs having M > 0.95 (Figure 3F F.0 0. I.0 0. M proportion of SNPs Mathematical constraints on F ST 5

6 Under intermediate migration, migration is sufficient that more mutations than in the weak-migration case generate polymorphism in multiple subpopulations. A random mutation is likely to occur on a branch that leads to many terminal branches from the same subpopulation, but also to branches from other subpopulations. Thus, the allele is likely to have intermediate frequency in multiple subpopulations. This setting does not generate the conditions under which the upper bound on F ST is reached, so that except at the largest M, intermediate migration leads to values farther from the upper bound than in the weakmigration case. For large M, the rarer allele is likely to be only in one subpopulation, so that F ST is nearer to the upper bound. With strong migration (4Nm = 0, the joint density of M and F ST nears the lower bound (Figure 3G-I. For each, most SNPs have M > 0.9 and F ST 0, with the mean F ST increasing somewhat as increases. Under strong migration, because lineages can migrate between subpopulations quickly, they can also coalesce quickly, irrespective of their subpopulations of origin. As a result, a random mutation is likely to occur on a branch that leads to terminal branches in many subpopulations. The allele is expected to have comparable frequency in all subpopulations, so that F ST is likely to be small. This scenario corresponds to the conditions under which the lower bound on F ST is approached. Proximity of the joint density of M and F ST to the upper bound To summarize features of the relationship of F ST to the upper bound seen in Figure 3, we can quantify the proximity of the joint density of M and F ST to the bounds on F ST. For a set of Z loci, denote by F z and M z the values of F ST and M at locus z. The mean F ST for the set, or F ST, is F ST = Z Z F z, (0 z= Using eq. 5, a corresponding mean maximum F ST given the observed M z, z =,,..., Z, denoted F max, is F max = Z Z z= M z + {M z } Mz. ( M z ( M z The ratio F ST / F max gives a sense of the proximity of the F ST values to their upper bounds: it ranges from 0, when F ST values at all SNPs equal their lower bounds, to, when F ST values at all SNPs equal their upper bounds. Figure 4 shows the ratio F ST / F max under the island model, for different values of and 4Nm. For each each value of the number of subpopulations, F ST / F max decreases with 4Nm. This result summarizes the influence of the migration rate observed in Figure 3: F ST values tend to be close to the upper bound under weak migration, and near the lower bound under strong migration. F ST / F max is only minimally influenced by the number of subpopulations (Figure 4. Even though the upper bound on F ST in terms of M is strongly affected by, the proximity of F ST to the upper bound is similar across values. Application to human genomic data We now use our theoretical results to explain observed patterns of human genetic differentiation, and in particular, to explain the impact of the number of subpopulations. We examine data from Li et al. (008 on 577,489 SNPs from 938 individuals of the Human Genome Diversity Panel (HGDP; Cann et al. 00, as compiled by Pemberton et al. (0. We use the same division of F ST / F max.0 0. = =7 =40 4Nm = 0. 4Nm = 0 4Nm =.0 Figure 4 F ST / F max, the ratio of the the mean F ST to the mean maximal F ST given the observed frequency M of the most frequent allele, as a function of the number of subpopulations and the scaled migration rate 4Nm, for the island migration model. Colors represent values of. F ST values are computed from coalescent simulations using MS, for 0,000 independent SNPs and 00 lineages sampled per subpopulation. F max is computed from eq.. Figure S4 presents similar results under rectangular and linear stepping-stone migration models. the individuals into seven geographic regions that was examined by Li et al. (008 (Africa, Middle East, Europe, Central and South Asia, East Asia, Oceania, America. Previous studies of these individuals have used F ST to compare differentiation in regions with different numbers of subpopulations sampled (Rosenberg et al. 00; Ramachandran et al. 004; Rosenberg 0. We computed the parametric allele frequencies for each region, averaging across regions to obtain the frequency M of the most frequent allele. We then computed F ST for each SNP, averaging F ST values across SNPs to obtain the overall F ST for the full SNP set. To assess the impact of the number of subpopulations on the relationship between M and F ST, we computed F ST for all 0 sets of two or more geographic regions (Figure 5. The pairwise F ST values range from 07 (between Middle East and Europe to (Africa and America, with a mean of 57, standard deviation of 7, and median of 6. F ST is substantially larger for sets of three geographic regions. The smallest value is larger, (Middle East, Europe, Central/South Asia, as is the largest value, 0.33 (Africa, Oceania, America, the mean of 76, and the median of 89. Among the 5 = 05 ways of adding a third region to a pair of regions, 83 produce an increase in F ST. For 7 sets of three regions, the value of F ST exceeds that for each of its three component pairs. The pattern of increase of F ST with the inclusion of additional subpopulations can be seen in Figure 6A, which plots the F ST values from Figure 5 as a function of. The magnitude of the increase is greatest from = to = 3, decreasing with increasing. From = 3 to 4, 8 of 40 additions of a region increase F ST ; 54 of 05 produce an increase from = 4 to 5, of 4 from = 5 to 6, and 3 of 7 from = 6 to 7. The seven-region F ST of exceeds all the pairwise F ST values. The larger F ST values with increasing can be explained by the difference in constraints on F ST in terms of M (Figure 7. For fixed M, as we saw in the increase of A( with (Figure, the permissible range of F ST values is smaller on average for F ST values computed among smaller sets of populations than among larger sets. For example, the maximal F ST value at the mean M of 0.76 observed in pairwise comparisons is 0.33 for = (black line in Figure 7A, while the maximal F ST value at the mean M of 6 Alcala and Rosenberg

7 Number of regions A B 55 XX X.X X..X X...X.. X...X. X...X 07.XX X.X X..X.. 7.X...X. 64.X...X 09..XX X.X X..X. 64..X...X 9...XX X.X. 5...X..X 6...XX X.X 93...XX 59 XXX XX.X XX..X.. XX...X. 99 XX...X 63 X.XX X.X.X.. X.X..X. 6 X.X...X 77 X..XX.. 99 X..X.X. 96 X..X..X 9 X...XX. 3 X...X.X 0.33 X...XX.XXX... 4.XX.X.. 64.XX..X. 57.XX...X 36.X.XX.. 60.X.X.X. 53.X.X..X 78.X..XX. 69.X..X.X 0.X...XX 36..XXX.. 6..XX.X. 53..XX..X 8..X.XX. 70..X.X.X 3..X..XX 67...XXX XX.X 89...X.XX 87...XXX 55 XXXX XXX.X.. 95 XXX..X. 9 XXX...X 7 XX.XX.. 90 XX.X.X. 87 XX.X..X 6 XX..XX. XX..X.X XX...XX 76 X.XXX.. 96 X.XX.X. 9 X.XX..X 0. X.X.XX. 5 X.X.X.X 0.9 X.X..XX X..XXX. 96 X..XX.X 0.9 X..X.XX X...XXX 35.XXXX.. 55.XXX.X. 49.XXX..X 74.XX.XX. 66.XX.X.X 90.XX..XX 67.X.XXX. 59.X.XX.X 84.X.X.XX 95.X..XXX 69..XXXX. 60..XXX.X 85..XX.XX 97..X.XXX 84...XXXX 69 XXXXX.. 85 XXXX.X. 8 XXXX..X 0 XXX.XX. 95 XXX.X.X 0.5 XXX..XX 94 XX.XXX. 90 XX.XX.X 9 XX.X.XX 0.9 XX..XXX 98 X.XXXX. 93 X.XXX.X 0.3 X.XX.XX 0.3 X.X.XXX X..XXXX 64.XXXXX. 57.XXXX.X 78.XXX.XX 89.XX.XXX 83.X.XXXX 84..XXXXX 90 XXXXXX. 86 XXXXX.X XXXX.XX 0. XXX.XXX 7 XX.XXXX X.XXXXX 79.XXXXXX XXXXXXX Figure 5 Mean F ST values across loci for sets of geographic regions. Each box represents a particular combination of, 3, 4, 5, 6, or all 7 geographic regions. Within a box, the numerical value shown is F ST among the regions. The regions considered are indicated by the pattern of "." and "X" symbols within the box, with "X" indicating inclusion and "." indicating exclusion. From left to right, the regions are Africa, Middle East, Europe, Central/South Asia, East Asia, Oceania, America. Thus, for example, X...X.. indicates the subset {Africa, East Asia}. Lines are drawn between boxes that represent nested subsets. A line is colored red if the larger subset has a higher F ST value, and blue if it has a lower F ST. Computations rely on 577,489 SNPs from the Human Genome Diversity Panel. F ST Figure 6 F ST values for sets of geographic regions as a function of, the number of regions considered. (A F ST, computed using eq. 0. (B F ST / F max, computed using eq.. For each subset of populations, the value of F ST is taken from Figure 5. The mean across subsets for a fixed appears as a solid red line, and the median as a dashed red line observed for the global comparison of seven regions is 6 for = 7 (Figure 7B. Given the stronger constraint in pairwise calculations, it is not unexpected that pairwise F ST values would be smaller than the values computed with more regions, such as in the 7-region computation. Interestingly, the effect of on F ST is largely eliminated when F ST values are normalized by their maxima (Figure 6B. The normalization, which takes both and M into account, generates nearly constant means and medians of F ST as functions of, with higher values for =. Discussion We have evaluated the constraint imposed by the frequency M of the most frequent allele at a biallelic locus on the range of F ST, for arbitrarily many subpopulations. Although F ST is unconstrained in the unit interval when M = i for integers i satisfying i, it is constrained below for all other M. We have found that the number of subpopulations has considerable impact on the range of F ST, with a weaker constraint on F ST as increases. As shown by Jakobsson et al. (03 for =, across possible values of M, F ST is restricted to 38.63% of the possible space. For = 00, however, F ST can occupy 97.47% of the space. Although the mean over M values of the permissible interval for F ST approaches the full unit interval as, for any, an allele frequency M < ( / exists for which the maximal F ST is lower than. Multiple studies have highlighted the relationship between F ST and M in two subpopulations, for biallelic markers (Rosenberg et al. 003; Maruki et al. 0, and more generally, for an unspecified (Jakobsson et al. 03 or specified number of alleles (Edge and Rosenberg 04. We have extended these results to the case of biallelic markers in a specified but arbitrary number of subpopulations, comprehensively describing the relationship between F ST and M for the biallelic case. The study is part of an increasing body of work characterizing the mathematical relationship of population-genetic statistics with quantities that constrain them (Hedrick 999, 005; Rosenberg and Jakobsson 008; Reddy and Rosenberg 0. As we have seen, such relationships contribute to understanding the behavior of the statistics in evolutionary models and to interpreting counterintuitive results in human population genetics. F ST / F max Mathematical constraints on F ST 7

8 Properties of F ST in evolutionary models. Our work extends classical results about the impact of evolutionary processes on F ST values. Wright (95 showed that in an equilibrium population, F ST is expected to be near if migration is weak, and near 0 if migration is strong. On the basis of our simulations, we can more precisely formulate this proposition: considering a SNP at frequency M in an equilibrium population, F ST is expected to be near its upper bound in terms of M if migration is weak and near 0 if migration is strong. This formulation of Wright s proposition makes it possible to explain why SNPs subject to the same migration process can display a variety of F ST patterns; indeed, under weak migration, we expect F ST values to mirror the considerable variation in the upper bound on F ST in terms of M. Lower F ST values in pairwise comparisons than in comparisons of more subpopulations. F ST values have often been compared across computations with different numbers of subpopulations. Such comparisons appear frequently, for example, in studies of domesticated animals such as horses, pigs, and sheep (Cañon et al. 000; im et al. 005; Lawson Handley et al In human populations, Table of the microsatellite study of Rosenberg et al. (00 presents comparisons of F ST values for scenarios with ranging from to 5. Table 3 of Rosenberg et al. (006 compares F ST values for microsatellites and biallelic indels in population sets with ranging from to 8. Major SNP studies have also compared F ST values for scenarios with = and = 3 groups (Hinds et al. 005; International HapMap Consortium 005. Our results suggest that such comparisons between F ST values with different can hide an effect of the number of subpopulations, especially when some of the comparisons involve the most strongly constrained case of =. For human data, we found that owing to a difference in the F ST constraint for different values, pairwise F ST values between continental regions were consistently lower than F ST computed using three or more regions, and sets of three regions were identified for which the F ST value exceeded the values for all three pairs of regions in the set. The effect of might help illuminate why SNP-based pairwise human F ST values (Table S of 000 Genomes Project Consortium et al. 0 are generally smaller than estimates that use all populations together (.% of genetic variance due to between-region or between-population differences; Li et al We find that comparing F ST values with different choices of can generate as much difference twofold as comparing F ST with different marker types (Holsinger and Weir 009. This substantial impact of on F ST merits further attention. Consequences for the use of F ST as a test statistic. The effects of constraints on F ST extend beyond the use of F ST as a statistic for genetic differentiation. In F ST -based genome scans for local adaptation, tracing to the work of Lewontin and rakauer (973, a hypothesis of spatially divergent selection at a candidate locus is evaluated by comparing F ST at the locus with the F ST distribution estimated from a set of putatively neutral loci. Under this test, F ST values smaller or larger than expected by chance are interpreted as being under stabilizing or divergent selection, respectively. Modern versions of this approach compare F ST values at single loci with the distribution across the genome (Beaumont and Nichols 996; Akey et al. 00; Foll and Gaggiotti 008; Bonhomme et al. 00; Günther and Coop 03. The constraints on F ST in our work and the work of Jakobsson et al. (03 and Edge and Rosenberg (04 suggest that F ST values strongly depend on the frequency of the most frequent allele. Consequently, we expect that F ST outlier tests that do F ST A mean M M B mean M Figure 7 Joint density of the frequency M of the most frequent allele and F ST in human population-genetic data, considering 577,489 SNPs. (A F ST computed for pairs of geographic regions. The density is evaluated from the set of F ST values for all pairs of regions. (B F ST computed among = 7 geographic regions. The figure design follows Figure 3. not explicitly take into account this constraint will result in a deficit of power at loci with high- and low-frequency alleles. Because pairwise F ST and F ST values in many populations have different constraints, we predict that the effect of the constraint on outlier tests relying on a single global F ST (e.g., Beaumont and Nichols 996; Foll and Gaggiotti 008 will be smaller than in tests relying on pairwise F ST (e.g., Günther and Coop 03. M proportion of SNPs F ST as a statistic or as a parameter The perspective we used in obtaining F ST bounds treats F ST as a mathematical function of allele frequencies rather than as a population-genetic parameter. Thus, the starting point for our mathematical analysis (eq. 3 is that the allele frequencies are mathematical constants rather than random outcomes of an evolutionary process. In the alternative perspective that F ST is a parameter rather than a statistic, both the sample of alleles drawn from a set of subpopulations and the sample of subpopulations drawn from a larger collection of subpopulations are treated as random. An analysis of mathematical bounds analogous to our analysis of eq. 3 in terms of M would then investigate bounds on estimators of F ST, where the value of the estimator is bounded in terms of the largest sample allele frequency. In this perspective, the estimator of Weir and Cockerham (984 for a biallelic locus ( ˆθ, Weir 996, p. 73 under an assumption of equal sample sizes in the subpopulations and either haploid data or a random union of gametes in diploids is ˆθ = s n [ M( M s ]. ( M( M + s Here, n is the sample size per subpopulation (n diploid individuals or n haploids, M is the mean sample frequency of the most frequent allele across subpopulations, and s = k= ( p k M is the empirical variance of the sample frequency p k of the most frequent allele across subpopulations. Although eq. has more terms than eq. 3, it can be shown that for fixed M with M < and fixed and n, s and hence ˆθ are minimized and maximized under corresponding conditions to those that minimize and maximize eq. 3. In particular, Theorem applies to { p k } k=, with k= p k = M. We then expect that corresponding mathematical results to those seen for F ST as computed in eq. 3 will hold for ˆθ from eq.. Such computations indicate that our statistic perspective on F ST generates mathematical results of interest to a parameter interpretation of F ST. 8 Alcala and Rosenberg

9 Conclusion. Many recent articles have noted that F ST often behaves counterintuitively (Whitlock 0; Alcala et al. 04; Wang 05 for example, indicating low differentiation in cases in which populations do not share any alleles (Balloux et al. 000; Jost 008 or suggesting less divergence among populations than is visible in clustering analyses (Tishkoff et al. 009; Algee-Hewitt et al. 06. It has thus become clear that observed F ST patterns often trace to peculiar mathematical properties of F ST in particular its relationship to other statistics such as homozygosity or allele frequency instead of to biological phenomena of interest. Our work here, extending approaches of Jakobsson et al. (03 and Edge and Rosenberg (04, seeks to characterize those properties, so that the influence of mathematical constraints on F ST can be disentangled from biological phenomena. One response to the dependence of F ST on M may be to compute F ST only when allele frequencies lie in a specific class, such as M Such choices can potentially avoid a misleading interpretation that a genetic differentiation measure is low in scenarios when minor alleles, though rare, are in fact private to single populations. We note, however, that the dependence of F ST spans the full range of values of M, and exists for values of M both above and below a choice of cutoff. In addition, this dependence varies with the number of subpopulations, so that use of the same cutoff could have a different effect on F ST values in scenarios with different numbers of subpopulations. In a potentially more informative approach, addressing the mathematical dependence of F ST on the within-subpopulation mean heterozygosity H S, Wang (05 has proposed plotting the joint distribution of H S and F ST to assess the correlation between the two statistics. Using the island model, Wang (05 argued that when H S and F ST are uncorrelated, F ST is expected to be more revealing about the demographic history of a species than when they are strongly correlated and F ST merely reflects the within-subpopulation diversity. Our results suggest a related framework: studies can compare plots of the joint distribution of M and F ST with the bounds on F ST in terms of M. This framework, which examines constraints on F ST in terms of allele frequencies in the total population, complements that of Wang (05, which considers constraints in terms of subpopulation allele frequencies. Such analyses, considering F ST together with additional measures of allele frequencies, are desirable in diverse scenarios for explaining counterintuitive F ST phenomena, for avoiding overinterpretation of F ST values, and for making sense of F ST comparisons across settings that have a substantial difference in the nature of one or more underlying parameters. Data availability. MS commands for the coalescent simulations appear in Supplementary File S. See Li et al. (008 for the human SNP data. Acknowledgements. We thank the editor M. Beaumont,. Holsinger, and an anonymous reviewer for comments on an earlier version of the manuscript. Support was provided by NSF grants BCS-557 and DBI , NIJ grant 04-DN- BX-05, a postdoctoral fellowship from the Stanford Center for Computational, Evolutionary, and Human Genomics, and Swiss National Science Foundation Early Postdoc.Mobility fellowship PLAP3_6869. Literature Cited 000 Genomes Project Consortium et al., 0 An integrated map of genetic variation from,09 human genomes. Nature 49: Akey, J. M., G. Zhang,. Zhang, L. Jin, and M. D. Shriver, 00 Interrogating a high-density SNP map for signatures of natural selection. Genome Res. : Alcala, N., J. Goudet, and S. Vuilleumier, 04 On the transition of genetic differentiation from isolation to panmixia: what we can learn from G ST and D. Theor. Pop. Biol. 93: Algee-Hewitt, B. F. B., M. D. Edge, J. im, J. Z. Li, and N. A. Rosenberg, 06 Individual identifiability predicts population identifiability in forensic microsatellite markers. Curr. Biol. 6: Balloux, F., H. Brünner, N. Lugon-Moulin, J. Hausser, and J. Goudet, 000 Microsatellites can be misleading: an empirical and simulation study. Evolution 54: Beaumont, M. A. and R. A. Nichols, 996 Evaluating loci for use in the genetic analysis of population structure. Proc. R. Soc. Lond. Ser. B Biol. Sci. 63: Bonhomme, M., C. Chevalet, B. Servin, S. Boitard, J. Abdallah, S. Blott, and M. SanCristobal, 00 Detecting selection in population trees: the Lewontin and rakauer test extended. Genetics 86: 4 6. Cann, H. M., C. De Toma, L. Cazes, M.-F. Legrand, V. Morel, L. Piouffre, J. Bodmer, W. F. Bodmer, B. Bonne-Tamir, A. Cambon- Thomsen, et al., 00 A human genome diversity cell line panel. Science 96: 6 6. Cañon, J., M. L. Checa, C. Carleos, J. L. Vega-Pla, M. Vallejo, and S. Dunner, 000 The genetic structure of Spanish Celtic horse breeds inferred from microsatellite data. Anim. Genet. 3: Cornuet, J.-M., F. Santos, M. A. Beaumont, C. P. Robert, J.-M. Marin, D. J. Balding, T. Guillemaud, and A. Estoup, 008 Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 4: Edge, M. D. and N. A. Rosenberg, 04 Upper bounds on F ST in terms of the frequency of the most frequent allele and total homozygosity: the case of a specified number of alleles. Theor. Pop. Biol. 97: Foll, M. and O. Gaggiotti, 008 A genome-scan method to identify selected loci appropriate for both dominant and codominant markers: a Bayesian perspective. Genetics 80: Frankham, R., J. D. Ballou, and D. A. Briscoe, 00 Introduction to Conservation Genetics. Cambridge University Press, Cambridge. Günther, T. and G. Coop, 03 Robust identification of local adaptation from allele frequencies. Genetics 95: Hartl, D. L. and A. G. Clark, 997 Principles of Population Genetics. Sinauer, Sunderland, MA. Hedrick, P. W., 999 Highly variable loci and their interpretation in evolution and conservation. Evolution 53: Hedrick, P. W., 005 A standardized genetic differentiation measure. Evolution 59: Hinds, D. A., L. L. Stuve, G. B. Nilsen, E. Halperin, E. Eskin, D. G. Ballinger,. A. Frazer, and D. R. Cox, 005 Wholegenome patterns of common DNA variation in three human populations. Science 307: Holsinger,. E. and B. S. Weir, 009 Genetics in geographically structured populations: defining, estimating and interpreting F ST. Nature Rev. Genet. 0: Hudson, R. R., 00 Generating samples under a Wright Fisher neutral model of genetic variation. Bioinformatics 8: International HapMap Consortium, 005 A haplotype map of the human genome. Nature 437: Mathematical constraints on F ST 9

10 Jakobsson, M., M. D. Edge, and N. A. Rosenberg, 03 The relationship between F ST and the frequency of the most frequent allele. Genetics 93: Jost, L., 008 G ST and its relatives do not measure differentiation. Mol. Ecol. 7: im, T. H.,. S. im, B. H. Choi, D. H. Yoon, G. W. Jang,. T. Lee, H. Y. Chung, H. Y. Lee, H. S. Park, and J. W. Lee, 005 Genetic structure of pig breeds from orea and China using microsatellite loci analysis. J. Anim. Sci. 83: Lawson Handley, L. J.,. Byrne, F. Santucci, S. Townsend, M. Taylor, M. W. Bruford, and G. Hewitt, 007 Genetic structure of European sheep breeds. Heredity 99: Leinonen, T., R. J. S. McCairns, R. B. O Hara, and J. Merilä, 03 Q ST F ST comparisons: evolutionary and ecological insights from genomic heterogeneity. Nature Rev. Genet. 4: Lewontin, R. and J. rakauer, 973 Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics 74: Li, J. Z., D. M. Absher, H. Tang, A. M. Southwick, A. M. Casto, S. Ramachandran, H. M. Cann, G. S. Barsh, M. Feldman, L. L. Cavalli-Sforza, et al., 008 Worldwide human relationships inferred from genome-wide patterns of variation. Science 39: Long, J. C. and R. A. ittles, 003 Human genetic diversity and the nonexistence of biological races. Hum. Biol. 75: Maruki, T., S. umar, and Y. im, 0 Purifying selection modulates the estimates of population differentiation and confounds genome-wide comparisons across single-nucleotide polymorphisms. Mol. Biol. Evol. 9: Maruyama, T., 970 Effective number of alleles in a subdivided population. Theor. Pop. Biol. : Nei, M., 973 Analysis of gene diversity in subdivided populations. Proc. Natl. Acad. Sci. USA 70: Nei, M. and R.. Chesser, 983 Estimation of fixation indices and gene diversities. Ann. Hum. Genet. 47: Pemberton, T. J., D. Absher, M. W. Feldman, R. M. Myers, N. A. Rosenberg, and J. Z. Li, 0 Genomic patterns of homozygosity in worldwide human populations. Am. J. Hum. Genet. 9: Ramachandran, S., N. A. Rosenberg, L. A. Zhivotovsky, and M. W. Feldman, 004 Robustness of the inference of human population structure: A comparison of X-chromosomal and autosomal microsatellites. Hum. Genomics : Reddy, S. B. and N. A. Rosenberg, 0 Refining the relationship between homozygosity and the frequency of the most frequent allele. J. Math. Biol. 64: Rosenberg, N. A., 0 A population-genetic perspective on the similarities and differences among worldwide human populations. Hum. Biol. 83: Rosenberg, N. A. and M. Jakobsson, 008 The relationship between homozygosity and the frequency of the most frequent allele. Genetics 79: Rosenberg, N. A., L. M. Li, R. Ward, and J.. Pritchard, 003 Informativeness of genetic markers for inference of ancestry. Am. J. Hum. Genet. 73: Rosenberg, N. A., S. Mahajan, C. Gonzalez-Quevedo, M. G. B. Blum, L. Nino-Rosales, V. Ninis, P. Das, M. Hegde, L. Molinari, G. Zapata, et al., 006 Low levels of genetic divergence across geographically and linguistically diverse populations from India. PLoS Genet. : e5. Rosenberg, N. A., J.. Pritchard, J. L. Weber, H. M. Cann,.. idd, L. A. Zhivotovsky, and M. W. Feldman, 00 Genetic structure of human populations. Science 98: Rousset, F., 03 Exegeses on maximum genetic differentiation. Genetics 94: Slatkin, M., 985 Rare alleles as indicators of gene flow. Evolution 39: Tishkoff, S. A., F. A. Reed, F. R. Friedlaender, C. Ehret, A. Ranciaro, A. Froment, J. B. Hirbo, A. A. Awomoyi, J.-M. Bodo, O. Doumbo, M. Ibrahim, A. T. Juma, M. J. otze, G. Lema, J. H. Moore, H. Mortensen, T. B. Nyambo, S. A. Omar,. Powell, G. S. Pretorius, M. W. Smith, M. A. Thera, C. Wambebe, J. L. Weber, and S. M. Williams, 009 The genetic structure and history of Africans and African Americans. Science 34: Wakeley, J., 998 Segregating sites in Wright s island model. Theor. Pop. Biol. 53: Wakeley, J., 999 Nonequilibrium migration in human history. Genetics 53: Wang, J., 05 Does G ST underestimate genetic differentiation from marker data? Mol. Ecol. 4: Weir, B. S., 996 Genetic Data Analysis II. Sinauer, Sunderland, MA. Weir, B. S. and C. C. Cockerham, 984 Estimating F-statistics for the analysis of population structure. Evolution 38: Whitlock, M. C., 0 G ST and D do not replace F ST. Mol. Ecol. 0: Wright, S., 95 The genetical structure of populations. Ann. Eugen. 5: Appendix A. Demonstration of eq. 4 This appendix provides the derivation of the upper bound on k= p k as a function of and M. Theorem. Suppose σ > 0 and σ + are specified, where is an integer. Considering all sequences {p k } k= with p k [0, ], k= p k = σ, and k < l implies p k < p l, k= p k is maximal if and only if p k = for k σ, p σ + = σ σ, and p k = 0 for k > σ +, and its maximum is (σ σ + σ. Proof. This theorem is a special case of Lemma 3 from Rosenberg and Jakobsson (008, which states (changing notation for some of the variables to avoid confusion: Suppose A > 0 and C > 0 and that C/A is denoted L. Considering all sequences {p i } i= with p i [0, A], i= p i = C, and i < j implies p i p j, H(p = i= p i is maximal if and only if p i = A for i L, p L = C (L A, and p i = 0 for i > L, and its maximum is L(L A C(L A + C. In our special case, we apply the lemma with A = and C = σ. We also restrict consideration to sequences of finite rather than infinite length; however, our condition σ + for the number of terms in the sequence guarantees that the maximum in the case of infinite sequences which requires σ σ + nonzero terms is attainable with sequences of the finite length we consider. For convenience in numerical computations, we state our result using the floor function rather than the ceiling function, requiring some bookkeeping to obtain our corollary. If σ is not an integer, then in Lemma 3 of Rosenberg and Jakobsson (008, L = σ +, and the maximum occurs with p = p =... = p σ =, p σ + = σ σ, and p k = 0 for k > σ +, equaling L(L A C(L A + C = ( σ + σ σ σ + σ. 0 Alcala and Rosenberg

Genetic Drift in Human Evolution

Genetic Drift in Human Evolution Genetic Drift in Human Evolution (Part 2 of 2) 1 Ecology and Evolutionary Biology Center for Computational Molecular Biology Brown University Outline Introduction to genetic drift Modeling genetic drift

More information

Major questions of evolutionary genetics. Experimental tools of evolutionary genetics. Theoretical population genetics.

Major questions of evolutionary genetics. Experimental tools of evolutionary genetics. Theoretical population genetics. Evolutionary Genetics (for Encyclopedia of Biodiversity) Sergey Gavrilets Departments of Ecology and Evolutionary Biology and Mathematics, University of Tennessee, Knoxville, TN 37996-6 USA Evolutionary

More information

Population Structure

Population Structure Ch 4: Population Subdivision Population Structure v most natural populations exist across a landscape (or seascape) that is more or less divided into areas of suitable habitat v to the extent that populations

More information

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information # Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Details of PRF Methodology In the Poisson Random Field PRF) model, it is assumed that non-synonymous mutations at a given gene are either

More information

MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS. Masatoshi Nei"

MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS. Masatoshi Nei MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS Masatoshi Nei" Abstract: Phylogenetic trees: Recent advances in statistical methods for phylogenetic reconstruction and genetic diversity analysis were

More information

Lecture 13: Population Structure. October 8, 2012

Lecture 13: Population Structure. October 8, 2012 Lecture 13: Population Structure October 8, 2012 Last Time Effective population size calculations Historical importance of drift: shifting balance or noise? Population structure Today Course feedback The

More information

p(d g A,g B )p(g B ), g B

p(d g A,g B )p(g B ), g B Supplementary Note Marginal effects for two-locus models Here we derive the marginal effect size of the three models given in Figure 1 of the main text. For each model we assume the two loci (A and B)

More information

Population Genetics I. Bio

Population Genetics I. Bio Population Genetics I. Bio5488-2018 Don Conrad dconrad@genetics.wustl.edu Why study population genetics? Functional Inference Demographic inference: History of mankind is written in our DNA. We can learn

More information

SWEEPFINDER2: Increased sensitivity, robustness, and flexibility

SWEEPFINDER2: Increased sensitivity, robustness, and flexibility SWEEPFINDER2: Increased sensitivity, robustness, and flexibility Michael DeGiorgio 1,*, Christian D. Huber 2, Melissa J. Hubisz 3, Ines Hellmann 4, and Rasmus Nielsen 5 1 Department of Biology, Pennsylvania

More information

Testing for spatially-divergent selection: Comparing Q ST to F ST

Testing for spatially-divergent selection: Comparing Q ST to F ST Genetics: Published Articles Ahead of Print, published on August 17, 2009 as 10.1534/genetics.108.099812 Testing for spatially-divergent selection: Comparing Q to F MICHAEL C. WHITLOCK and FREDERIC GUILLAUME

More information

Detecting selection from differentiation between populations: the FLK and hapflk approach.

Detecting selection from differentiation between populations: the FLK and hapflk approach. Detecting selection from differentiation between populations: the FLK and hapflk approach. Bertrand Servin bservin@toulouse.inra.fr Maria-Ines Fariello, Simon Boitard, Claude Chevalet, Magali SanCristobal,

More information

Mathematical models in population genetics II

Mathematical models in population genetics II Mathematical models in population genetics II Anand Bhaskar Evolutionary Biology and Theory of Computing Bootcamp January 1, 014 Quick recap Large discrete-time randomly mating Wright-Fisher population

More information

Introduction to population genetics & evolution

Introduction to population genetics & evolution Introduction to population genetics & evolution Course Organization Exam dates: Feb 19 March 1st Has everybody registered? Did you get the email with the exam schedule Summer seminar: Hot topics in Bioinformatics

More information

How robust are the predictions of the W-F Model?

How robust are the predictions of the W-F Model? How robust are the predictions of the W-F Model? As simplistic as the Wright-Fisher model may be, it accurately describes the behavior of many other models incorporating additional complexity. Many population

More information

Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate.

Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate. OEB 242 Exam Practice Problems Answer Key Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate. First, recall

More information

BECAUSE microsatellite DNA loci or short tandem

BECAUSE microsatellite DNA loci or short tandem Copyright Ó 2008 by the Genetics Society of America DOI: 10.1534/genetics.107.081505 Empirical Tests of the Reliability of Phylogenetic Trees Constructed With Microsatellite DNA Naoko Takezaki*,1 and Masatoshi

More information

What is genetic differentiation, and how should we measure it--gst, D, neither or both?

What is genetic differentiation, and how should we measure it--gst, D, neither or both? What is genetic differentiation, and how should we measure it--gst, D, neither or both? Verity, R; Nichols, RA "This is the peer reviewed version of the following article which has been published in final

More information

Theoretical Population Biology

Theoretical Population Biology Theoretical Population Biology 87 (013) 6 74 Contents lists available at SciVerse ScienceDirect Theoretical Population Biology journal homepage: www.elsevier.com/locate/tpb Genotype imputation in a coalescent

More information

6 Introduction to Population Genetics

6 Introduction to Population Genetics Grundlagen der Bioinformatik, SoSe 14, D. Huson, May 18, 2014 67 6 Introduction to Population Genetics This chapter is based on: J. Hein, M.H. Schierup and C. Wuif, Gene genealogies, variation and evolution,

More information

Introduction to Advanced Population Genetics

Introduction to Advanced Population Genetics Introduction to Advanced Population Genetics Learning Objectives Describe the basic model of human evolutionary history Describe the key evolutionary forces How demography can influence the site frequency

More information

Match probabilities in a finite, subdivided population

Match probabilities in a finite, subdivided population Match probabilities in a finite, subdivided population Anna-Sapfo Malaspinas a, Montgomery Slatkin a, Yun S. Song b, a Department of Integrative Biology, University of California, Berkeley, CA 94720, USA

More information

Frequency Spectra and Inference in Population Genetics

Frequency Spectra and Inference in Population Genetics Frequency Spectra and Inference in Population Genetics Although coalescent models have come to play a central role in population genetics, there are some situations where genealogies may not lead to efficient

More information

The Wright-Fisher Model and Genetic Drift

The Wright-Fisher Model and Genetic Drift The Wright-Fisher Model and Genetic Drift January 22, 2015 1 1 Hardy-Weinberg Equilibrium Our goal is to understand the dynamics of allele and genotype frequencies in an infinite, randomlymating population

More information

Genetic Variation in Finite Populations

Genetic Variation in Finite Populations Genetic Variation in Finite Populations The amount of genetic variation found in a population is influenced by two opposing forces: mutation and genetic drift. 1 Mutation tends to increase variation. 2

More information

6 Introduction to Population Genetics

6 Introduction to Population Genetics 70 Grundlagen der Bioinformatik, SoSe 11, D. Huson, May 19, 2011 6 Introduction to Population Genetics This chapter is based on: J. Hein, M.H. Schierup and C. Wuif, Gene genealogies, variation and evolution,

More information

I of a gene sampled from a randomly mating popdation,

I of a gene sampled from a randomly mating popdation, Copyright 0 1987 by the Genetics Society of America Average Number of Nucleotide Differences in a From a Single Subpopulation: A Test for Population Subdivision Curtis Strobeck Department of Zoology, University

More information

Lecture 13: Variation Among Populations and Gene Flow. Oct 2, 2006

Lecture 13: Variation Among Populations and Gene Flow. Oct 2, 2006 Lecture 13: Variation Among Populations and Gene Flow Oct 2, 2006 Questions about exam? Last Time Variation within populations: genetic identity and spatial autocorrelation Today Variation among populations:

More information

Supporting Information

Supporting Information Supporting Information Hammer et al. 10.1073/pnas.1109300108 SI Materials and Methods Two-Population Model. Estimating demographic parameters. For each pair of sub-saharan African populations we consider

More information

Learning ancestral genetic processes using nonparametric Bayesian models

Learning ancestral genetic processes using nonparametric Bayesian models Learning ancestral genetic processes using nonparametric Bayesian models Kyung-Ah Sohn October 31, 2011 Committee Members: Eric P. Xing, Chair Zoubin Ghahramani Russell Schwartz Kathryn Roeder Matthew

More information

Classical Selection, Balancing Selection, and Neutral Mutations

Classical Selection, Balancing Selection, and Neutral Mutations Classical Selection, Balancing Selection, and Neutral Mutations Classical Selection Perspective of the Fate of Mutations All mutations are EITHER beneficial or deleterious o Beneficial mutations are selected

More information

Estimating Evolutionary Trees. Phylogenetic Methods

Estimating Evolutionary Trees. Phylogenetic Methods Estimating Evolutionary Trees v if the data are consistent with infinite sites then all methods should yield the same tree v it gets more complicated when there is homoplasy, i.e., parallel or convergent

More information

Surfing genes. On the fate of neutral mutations in a spreading population

Surfing genes. On the fate of neutral mutations in a spreading population Surfing genes On the fate of neutral mutations in a spreading population Oskar Hallatschek David Nelson Harvard University ohallats@physics.harvard.edu Genetic impact of range expansions Population expansions

More information

Robust demographic inference from genomic and SNP data

Robust demographic inference from genomic and SNP data Robust demographic inference from genomic and SNP data Laurent Excoffier Isabelle Duperret, Emilia Huerta-Sanchez, Matthieu Foll, Vitor Sousa, Isabel Alves Computational and Molecular Population Genetics

More information

Estimating effective population size from samples of sequences: inefficiency of pairwise and segregating sites as compared to phylogenetic estimates

Estimating effective population size from samples of sequences: inefficiency of pairwise and segregating sites as compared to phylogenetic estimates Estimating effective population size from samples of sequences: inefficiency of pairwise and segregating sites as compared to phylogenetic estimates JOSEPH FELSENSTEIN Department of Genetics SK-50, University

More information

Statistical Tests for Detecting Positive Selection by Utilizing High. Frequency SNPs

Statistical Tests for Detecting Positive Selection by Utilizing High. Frequency SNPs Statistical Tests for Detecting Positive Selection by Utilizing High Frequency SNPs Kai Zeng *, Suhua Shi Yunxin Fu, Chung-I Wu * * Department of Ecology and Evolution, University of Chicago, Chicago,

More information

A simple genetic model with non-equilibrium dynamics

A simple genetic model with non-equilibrium dynamics J. Math. Biol. (1998) 36: 550 556 A simple genetic model with non-equilibrium dynamics Michael Doebeli, Gerdien de Jong Zoology Institute, University of Basel, Rheinsprung 9, CH-4051 Basel, Switzerland

More information

Long-term adaptive diversity in Levene-type models

Long-term adaptive diversity in Levene-type models Evolutionary Ecology Research, 2001, 3: 721 727 Long-term adaptive diversity in Levene-type models Éva Kisdi Department of Mathematics, University of Turku, FIN-20014 Turku, Finland and Department of Genetics,

More information

Application of a time-dependent coalescence process for inferring the history of population size changes from DNA sequence data

Application of a time-dependent coalescence process for inferring the history of population size changes from DNA sequence data Proc. Natl. Acad. Sci. USA Vol. 95, pp. 5456 546, May 998 Statistics Application of a time-dependent coalescence process for inferring the history of population size changes from DNA sequence data ANDRZEJ

More information

Detecting Selection in Population Trees: The Lewontin and Krakauer Test Extended

Detecting Selection in Population Trees: The Lewontin and Krakauer Test Extended Copyright Ó 2010 by the Genetics Society of America DOI: 10.1534/genetics.110.117275 Detecting Selection in Population Trees: The Lewontin and Krakauer Test Extended Maxime Bonhomme,* Claude Chevalet,*

More information

A comparison of two popular statistical methods for estimating the time to most recent common ancestor (TMRCA) from a sample of DNA sequences

A comparison of two popular statistical methods for estimating the time to most recent common ancestor (TMRCA) from a sample of DNA sequences Indian Academy of Sciences A comparison of two popular statistical methods for estimating the time to most recent common ancestor (TMRCA) from a sample of DNA sequences ANALABHA BASU and PARTHA P. MAJUMDER*

More information

Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium. November 12, 2012

Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium. November 12, 2012 Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium November 12, 2012 Last Time Sequence data and quantification of variation Infinite sites model Nucleotide diversity (π) Sequence-based

More information

Expected coalescence times and segregating sites in a model of glacial cycles

Expected coalescence times and segregating sites in a model of glacial cycles F.F. Jesus et al. 466 Expected coalescence times and segregating sites in a model of glacial cycles F.F. Jesus 1, J.F. Wilkins 2, V.N. Solferini 1 and J. Wakeley 3 1 Departamento de Genética e Evolução,

More information

Evolutionary Theory. Sinauer Associates, Inc. Publishers Sunderland, Massachusetts U.S.A.

Evolutionary Theory. Sinauer Associates, Inc. Publishers Sunderland, Massachusetts U.S.A. Evolutionary Theory Mathematical and Conceptual Foundations Sean H. Rice Sinauer Associates, Inc. Publishers Sunderland, Massachusetts U.S.A. Contents Preface ix Introduction 1 CHAPTER 1 Selection on One

More information

A Test of the Influence of Continental Axes of Orientation on Patterns of Human Gene Flow

A Test of the Influence of Continental Axes of Orientation on Patterns of Human Gene Flow AMERICAN JOURNAL OF PHYSICAL ANTHROPOLOGY 146:515 529 (2011) A Test of the Influence of Continental Axes of Orientation on Patterns of Human Gene Flow Sohini Ramachandran 1,2 * and Noah A. Rosenberg 3,4,5

More information

The Impact of Sampling Schemes on the Site Frequency Spectrum in Nonequilibrium Subdivided Populations

The Impact of Sampling Schemes on the Site Frequency Spectrum in Nonequilibrium Subdivided Populations Copyright Ó 2009 by the Genetics Society of America DOI: 10.1534/genetics.108.094904 The Impact of Sampling Schemes on the Site Frequency Spectrum in Nonequilibrium Subdivided Populations Thomas Städler,*,1

More information

Endowed with an Extra Sense : Mathematics and Evolution

Endowed with an Extra Sense : Mathematics and Evolution Endowed with an Extra Sense : Mathematics and Evolution Todd Parsons Laboratoire de Probabilités et Modèles Aléatoires - Université Pierre et Marie Curie Center for Interdisciplinary Research in Biology

More information

Gene Genealogies Coalescence Theory. Annabelle Haudry Glasgow, July 2009

Gene Genealogies Coalescence Theory. Annabelle Haudry Glasgow, July 2009 Gene Genealogies Coalescence Theory Annabelle Haudry Glasgow, July 2009 What could tell a gene genealogy? How much diversity in the population? Has the demographic size of the population changed? How?

More information

19. Genetic Drift. The biological context. There are four basic consequences of genetic drift:

19. Genetic Drift. The biological context. There are four basic consequences of genetic drift: 9. Genetic Drift Genetic drift is the alteration of gene frequencies due to sampling variation from one generation to the next. It operates to some degree in all finite populations, but can be significant

More information

Processes of Evolution

Processes of Evolution 15 Processes of Evolution Forces of Evolution Concept 15.4 Selection Can Be Stabilizing, Directional, or Disruptive Natural selection can act on quantitative traits in three ways: Stabilizing selection

More information

7. Tests for selection

7. Tests for selection Sequence analysis and genomics 7. Tests for selection Dr. Katja Nowick Group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute for Brain Research www. nowicklab.info

More information

Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles

Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles John Novembre and Montgomery Slatkin Supplementary Methods To

More information

Non-independence in Statistical Tests for Discrete Cross-species Data

Non-independence in Statistical Tests for Discrete Cross-species Data J. theor. Biol. (1997) 188, 507514 Non-independence in Statistical Tests for Discrete Cross-species Data ALAN GRAFEN* AND MARK RIDLEY * St. John s College, Oxford OX1 3JP, and the Department of Zoology,

More information

Demography April 10, 2015

Demography April 10, 2015 Demography April 0, 205 Effective Population Size The Wright-Fisher model makes a number of strong assumptions which are clearly violated in many populations. For example, it is unlikely that any population

More information

Neutral Theory of Molecular Evolution

Neutral Theory of Molecular Evolution Neutral Theory of Molecular Evolution Kimura Nature (968) 7:64-66 King and Jukes Science (969) 64:788-798 (Non-Darwinian Evolution) Neutral Theory of Molecular Evolution Describes the source of variation

More information

AEC 550 Conservation Genetics Lecture #2 Probability, Random mating, HW Expectations, & Genetic Diversity,

AEC 550 Conservation Genetics Lecture #2 Probability, Random mating, HW Expectations, & Genetic Diversity, AEC 550 Conservation Genetics Lecture #2 Probability, Random mating, HW Expectations, & Genetic Diversity, Today: Review Probability in Populatin Genetics Review basic statistics Population Definition

More information

Solutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin

Solutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin Solutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin CHAPTER 1 1.2 The expected homozygosity, given allele

More information

Supplementary Figures.

Supplementary Figures. Supplementary Figures. Supplementary Figure 1 The extended compartment model. Sub-compartment C (blue) and 1-C (yellow) represent the fractions of allele carriers and non-carriers in the focal patch, respectively,

More information

Coalescent based demographic inference. Daniel Wegmann University of Fribourg

Coalescent based demographic inference. Daniel Wegmann University of Fribourg Coalescent based demographic inference Daniel Wegmann University of Fribourg Introduction The current genetic diversity is the outcome of past evolutionary processes. Hence, we can use genetic diversity

More information

Microsatellite data analysis. Tomáš Fér & Filip Kolář

Microsatellite data analysis. Tomáš Fér & Filip Kolář Microsatellite data analysis Tomáš Fér & Filip Kolář Multilocus data dominant heterozygotes and homozygotes cannot be distinguished binary biallelic data (fragments) presence (dominant allele/heterozygote)

More information

NATURAL SELECTION FOR WITHIN-GENERATION VARIANCE IN OFFSPRING NUMBER JOHN H. GILLESPIE. Manuscript received September 17, 1973 ABSTRACT

NATURAL SELECTION FOR WITHIN-GENERATION VARIANCE IN OFFSPRING NUMBER JOHN H. GILLESPIE. Manuscript received September 17, 1973 ABSTRACT NATURAL SELECTION FOR WITHIN-GENERATION VARIANCE IN OFFSPRING NUMBER JOHN H. GILLESPIE Department of Biology, University of Penmyluania, Philadelphia, Pennsyluania 19174 Manuscript received September 17,

More information

LINKAGE DISEQUILIBRIUM, SELECTION AND RECOMBINATION AT THREE LOCI

LINKAGE DISEQUILIBRIUM, SELECTION AND RECOMBINATION AT THREE LOCI Copyright 0 1984 by the Genetics Society of America LINKAGE DISEQUILIBRIUM, SELECTION AND RECOMBINATION AT THREE LOCI ALAN HASTINGS Defartinent of Matheinntics, University of California, Davis, Calijornia

More information

Population genetics snippets for genepop

Population genetics snippets for genepop Population genetics snippets for genepop Peter Beerli August 0, 205 Contents 0.Basics 0.2Exact test 2 0.Fixation indices 4 0.4Isolation by Distance 5 0.5Further Reading 8 0.6References 8 0.7Disclaimer

More information

ALLELIC DIVERSITY AND ITS IMPLICATIONS FOR THE RATE OF ADAPTATION

ALLELIC DIVERSITY AND ITS IMPLICATIONS FOR THE RATE OF ADAPTATION Genetics: Early Online, published on October, 203 as 0.534/genetics.3.5840 ALLELIC DIVERSITY AND ITS IMPLICATIONS FOR THE RATE OF ADAPTATION Armando Caballero and Aurora García-Dorado 2 Dep. Bioquímica,

More information

Intraspecific gene genealogies: trees grafting into networks

Intraspecific gene genealogies: trees grafting into networks Intraspecific gene genealogies: trees grafting into networks by David Posada & Keith A. Crandall Kessy Abarenkov Tartu, 2004 Article describes: Population genetics principles Intraspecific genetic variation

More information

Package FinePop. October 26, 2018

Package FinePop. October 26, 2018 Type Package Title Fine-Scale Population Analysis Version 1.5.1 Date 2018-10-25 Package FinePop October 26, 2018 Author Reiichiro Nakamichi, Hirohisa Kishino, Shuichi Kitada Maintainer Reiichiro Nakamichi

More information

BIOL 502 Population Genetics Spring 2017

BIOL 502 Population Genetics Spring 2017 BIOL 502 Population Genetics Spring 2017 Lecture 1 Genomic Variation Arun Sethuraman California State University San Marcos Table of contents 1. What is Population Genetics? 2. Vocabulary Recap 3. Relevance

More information

A consideration of the chi-square test of Hardy-Weinberg equilibrium in a non-multinomial situation

A consideration of the chi-square test of Hardy-Weinberg equilibrium in a non-multinomial situation Ann. Hum. Genet., Lond. (1975), 39, 141 Printed in Great Britain 141 A consideration of the chi-square test of Hardy-Weinberg equilibrium in a non-multinomial situation BY CHARLES F. SING AND EDWARD D.

More information

Fitness landscapes and seascapes

Fitness landscapes and seascapes Fitness landscapes and seascapes Michael Lässig Institute for Theoretical Physics University of Cologne Thanks Ville Mustonen: Cross-species analysis of bacterial promoters, Nonequilibrium evolution of

More information

Notes on Population Genetics

Notes on Population Genetics Notes on Population Genetics Graham Coop 1 1 Department of Evolution and Ecology & Center for Population Biology, University of California, Davis. To whom correspondence should be addressed: gmcoop@ucdavis.edu

More information

Choosing the Summary Statistics and the Acceptance Rate in Approximate Bayesian Computation

Choosing the Summary Statistics and the Acceptance Rate in Approximate Bayesian Computation Choosing the Summary Statistics and the Acceptance Rate in Approximate Bayesian Computation COMPSTAT 2010 Revised version; August 13, 2010 Michael G.B. Blum 1 Laboratoire TIMC-IMAG, CNRS, UJF Grenoble

More information

122 9 NEUTRALITY TESTS

122 9 NEUTRALITY TESTS 122 9 NEUTRALITY TESTS 9 Neutrality Tests Up to now, we calculated different things from various models and compared our findings with data. But to be able to state, with some quantifiable certainty, that

More information

Populations in statistical genetics

Populations in statistical genetics Populations in statistical genetics What are they, and how can we infer them from whole genome data? Daniel Lawson Heilbronn Institute, University of Bristol www.paintmychromosomes.com Work with: January

More information

Shane s Simple Guide to F-statistics

Shane s Simple Guide to F-statistics Pop. g. stats2.doc - 9/3/05 Shane s Simple Guide to F-statistics Intro The aim here is simple & very focussed: im : a very brief introduction to the most common statistical methods of analysis of population

More information

OVERVIEW. L5. Quantitative population genetics

OVERVIEW. L5. Quantitative population genetics L5. Quantitative population genetics OVERVIEW. L1. Approaches to ecological modelling. L2. Model parameterization and validation. L3. Stochastic models of population dynamics (math). L4. Animal movement

More information

Genetic hitch-hiking in a subdivided population

Genetic hitch-hiking in a subdivided population Genet. Res., Camb. (1998), 71, pp. 155 160. With 3 figures. Printed in the United Kingdom 1998 Cambridge University Press 155 Genetic hitch-hiking in a subdivided population MONTGOMERY SLATKIN* AND THOMAS

More information

Big Idea #1: The process of evolution drives the diversity and unity of life

Big Idea #1: The process of evolution drives the diversity and unity of life BIG IDEA! Big Idea #1: The process of evolution drives the diversity and unity of life Key Terms for this section: emigration phenotype adaptation evolution phylogenetic tree adaptive radiation fertility

More information

Evaluating the performance of a multilocus Bayesian method for the estimation of migration rates

Evaluating the performance of a multilocus Bayesian method for the estimation of migration rates University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Publications, Agencies and Staff of the U.S. Department of Commerce U.S. Department of Commerce 2007 Evaluating the performance

More information

Genetics. Metapopulations. Dept. of Forest & Wildlife Ecology, UW Madison

Genetics. Metapopulations. Dept. of Forest & Wildlife Ecology, UW Madison Genetics & Metapopulations Dr Stacie J Robinson Dr. Stacie J. Robinson Dept. of Forest & Wildlife Ecology, UW Madison Robinson ~ UW SJR OUTLINE Metapopulation impacts on evolutionary processes Metapopulation

More information

Multivariate analysis of genetic data an introduction

Multivariate analysis of genetic data an introduction Multivariate analysis of genetic data an introduction Thibaut Jombart MRC Centre for Outbreak Analysis and Modelling Imperial College London Population genomics in Lausanne 23 Aug 2016 1/25 Outline Multivariate

More information

Formalizing the gene centered view of evolution

Formalizing the gene centered view of evolution Chapter 1 Formalizing the gene centered view of evolution Yaneer Bar-Yam and Hiroki Sayama New England Complex Systems Institute 24 Mt. Auburn St., Cambridge, MA 02138, USA yaneer@necsi.org / sayama@necsi.org

More information

(Write your name on every page. One point will be deducted for every page without your name!)

(Write your name on every page. One point will be deducted for every page without your name!) POPULATION GENETICS AND MICROEVOLUTIONARY THEORY FINAL EXAMINATION (Write your name on every page. One point will be deducted for every page without your name!) 1. Briefly define (5 points each): a) Average

More information

Contrasts for a within-species comparative method

Contrasts for a within-species comparative method Contrasts for a within-species comparative method Joseph Felsenstein, Department of Genetics, University of Washington, Box 357360, Seattle, Washington 98195-7360, USA email address: joe@genetics.washington.edu

More information

STABILIZING SELECTION ON HUMAN BIRTH WEIGHT

STABILIZING SELECTION ON HUMAN BIRTH WEIGHT STABILIZING SELECTION ON HUMAN BIRTH WEIGHT See Box 8.2 Mapping the Fitness Landscape in Z&E FROM: Cavalli-Sforza & Bodmer 1971 STABILIZING SELECTION ON THE GALL FLY, Eurosta solidaginis GALL DIAMETER

More information

Sequence evolution within populations under multiple types of mutation

Sequence evolution within populations under multiple types of mutation Proc. Natl. Acad. Sci. USA Vol. 83, pp. 427-431, January 1986 Genetics Sequence evolution within populations under multiple types of mutation (transposable elements/deleterious selection/phylogenies) G.

More information

Challenges when applying stochastic models to reconstruct the demographic history of populations.

Challenges when applying stochastic models to reconstruct the demographic history of populations. Challenges when applying stochastic models to reconstruct the demographic history of populations. Willy Rodríguez Institut de Mathématiques de Toulouse October 11, 2017 Outline 1 Introduction 2 Inverse

More information

Evolutionary Dynamics and Extensive Form Games by Ross Cressman. Reviewed by William H. Sandholm *

Evolutionary Dynamics and Extensive Form Games by Ross Cressman. Reviewed by William H. Sandholm * Evolutionary Dynamics and Extensive Form Games by Ross Cressman Reviewed by William H. Sandholm * Noncooperative game theory is one of a handful of fundamental frameworks used for economic modeling. It

More information

The E-M Algorithm in Genetics. Biostatistics 666 Lecture 8

The E-M Algorithm in Genetics. Biostatistics 666 Lecture 8 The E-M Algorithm in Genetics Biostatistics 666 Lecture 8 Maximum Likelihood Estimation of Allele Frequencies Find parameter estimates which make observed data most likely General approach, as long as

More information

Disentangling the effects of geographic and. ecological isolation on genetic differentiation

Disentangling the effects of geographic and. ecological isolation on genetic differentiation Disentangling the effects of geographic and ecological isolation on genetic differentiation arxiv:1302.3274v4 [q-bio.pe] 11 Sep 2013 Gideon S. Bradburd 1,a, Peter L. Ralph 2,b, Graham M. Coop 1,c 1 Center

More information

There are 3 parts to this exam. Use your time efficiently and be sure to put your name on the top of each page.

There are 3 parts to this exam. Use your time efficiently and be sure to put your name on the top of each page. EVOLUTIONARY BIOLOGY EXAM #1 Fall 2017 There are 3 parts to this exam. Use your time efficiently and be sure to put your name on the top of each page. Part I. True (T) or False (F) (2 points each). Circle

More information

Effective population size and patterns of molecular evolution and variation

Effective population size and patterns of molecular evolution and variation FunDamental concepts in genetics Effective population size and patterns of molecular evolution and variation Brian Charlesworth Abstract The effective size of a population,, determines the rate of change

More information

Conservation Genetics. Outline

Conservation Genetics. Outline Conservation Genetics The basis for an evolutionary conservation Outline Introduction to conservation genetics Genetic diversity and measurement Genetic consequences of small population size and extinction.

More information

arxiv: v1 [q-bio.qm] 7 Aug 2017

arxiv: v1 [q-bio.qm] 7 Aug 2017 HIGHER ORDER EPISTASIS AND FITNESS PEAKS KRISTINA CRONA AND MENGMING LUO arxiv:1708.02063v1 [q-bio.qm] 7 Aug 2017 ABSTRACT. We show that higher order epistasis has a substantial impact on evolutionary

More information

Statistical Tests for Detecting Positive Selection by Utilizing. High-Frequency Variants

Statistical Tests for Detecting Positive Selection by Utilizing. High-Frequency Variants Genetics: Published Articles Ahead of Print, published on September 1, 2006 as 10.1534/genetics.106.061432 Statistical Tests for Detecting Positive Selection by Utilizing High-Frequency Variants Kai Zeng,*

More information

Heaving Toward Speciation

Heaving Toward Speciation Temporal Waves of Genetic Diversity in a Spatially Explicit Model of Evolution: Heaving Toward Speciation Guy A. Hoelzer 1, Rich Drewes 2 and René Doursat 2,3 1 Department of Biology, 2 Brain Computation

More information

Incomplete Lineage Sorting: Consistent Phylogeny Estimation From Multiple Loci

Incomplete Lineage Sorting: Consistent Phylogeny Estimation From Multiple Loci University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 1-2010 Incomplete Lineage Sorting: Consistent Phylogeny Estimation From Multiple Loci Elchanan Mossel University of

More information

BIOINFORMATICS. StructHDP: Automatic inference of number of clusters and population structure from admixed genotype data

BIOINFORMATICS. StructHDP: Automatic inference of number of clusters and population structure from admixed genotype data BIOINFORMATICS Vol. 00 no. 00 2011 Pages 1 9 StructHDP: Automatic inference of number of clusters and population structure from admixed genotype data Suyash Shringarpure 1, Daegun Won 1 and Eric P. Xing

More information

Mathematical Population Genetics II

Mathematical Population Genetics II Mathematical Population Genetics II Lecture Notes Joachim Hermisson March 20, 2015 University of Vienna Mathematics Department Oskar-Morgenstern-Platz 1 1090 Vienna, Austria Copyright (c) 2013/14/15 Joachim

More information

Population Genetics. with implications for Linkage Disequilibrium. Chiara Sabatti, Human Genetics 6357a Gonda

Population Genetics. with implications for Linkage Disequilibrium. Chiara Sabatti, Human Genetics 6357a Gonda 1 Population Genetics with implications for Linkage Disequilibrium Chiara Sabatti, Human Genetics 6357a Gonda csabatti@mednet.ucla.edu 2 Hardy-Weinberg Hypotheses: infinite populations; no inbreeding;

More information

Supporting information for Demographic history and rare allele sharing among human populations.

Supporting information for Demographic history and rare allele sharing among human populations. Supporting information for Demographic history and rare allele sharing among human populations. Simon Gravel, Brenna M. Henn, Ryan N. Gutenkunst, mit R. Indap, Gabor T. Marth, ndrew G. Clark, The 1 Genomes

More information

SEQUENCE DIVERGENCE,FUNCTIONAL CONSTRAINT, AND SELECTION IN PROTEIN EVOLUTION

SEQUENCE DIVERGENCE,FUNCTIONAL CONSTRAINT, AND SELECTION IN PROTEIN EVOLUTION Annu. Rev. Genomics Hum. Genet. 2003. 4:213 35 doi: 10.1146/annurev.genom.4.020303.162528 Copyright c 2003 by Annual Reviews. All rights reserved First published online as a Review in Advance on June 4,

More information