Positively versus Negatively Frequency-Dependent Selection

Robert Morris and Tim Watson
Faculty of Technology, De Montfort University, Leicester, United Kingdom, LE1 9BH

Abstract. Frequency-dependent selection (FDS) refers to situations where individual fitnesses depend (to some degree) on where the individual's alleles lie in the proximate allele frequency distribution. If the dependence is negative (that is, if alleles become increasingly detrimental to fitness as they become increasingly common at a given locus), then genetic diversity may be maintained. If the dependence is positive, then alleles may converge at given loci. A hypothetical evolutionary model of FDS is presented here, in which the individuals themselves determined, by means of a gene, whether their fitnesses were positively or negatively frequency-dependent. The population ratio of the two types of individual was monitored in runs with different parameters, and explanations of what happened are offered.

Key words: Frequency-dependent selection, multiple alleles, meta gene

1 Introduction

Ridley's textbook Evolution [6] contains a good entry on FDS, the opening of which is reproduced here.

"Frequency-dependent selection occurs when the fitness of a genotype depends on its frequency. It is possible for the fitness of a genotype to increase (positively frequency-dependent) or decrease (negatively frequency-dependent) as the genotype frequency in the population increases."

Many abstract models of FDS have been studied, with the principal aims being to strengthen the theory underpinning these phenomena and to explore the surrounding space of possibilities. Curtsinger [3] looked at many different selection modes, and found a condition that determined whether or not the system would stably converge. Asmussen and Basnayake [1] studied several models, focussing on the potential for the maintenance of genetic diversity, and Roff [7] looked at maintaining both phenotypic and additive variation via FDS.
Bürger [2] performed an extensive analysis of a general model (of which previously known models could be considered special cases) and gave a near-complete characterisation of the equilibrium structure. Schneider [8] carried out a similar study, with multiple alleles and loci, and found no equilibria possible with more than two alleles at a locus. Finally, Trotter and Spencer [9] investigated the potential for maintaining polymorphism, taking into account the presence of positive FDS.
In every previous simulation of FDS, the selection regime has been imposed on the individuals by (essentially) the environment, in order to reproduce real-world conditions. In the present FDS model, the novel step is taken of putting the form of the frequency dependence under the control of the individuals. In more precise terms, the individuals have a meta-gene in their genotype that dictates whether their fitness will be proportional to how similar they are to their neighbours, or to how different they are from them. (Cf. the meta-genes in [4] and [5].) The purpose here is not to model any known natural phenomenon, but to speculatively extend the domain of theoretical FDS in an interesting "what if?" way.

2 Experiments and Analysis

A standard genetic algorithm was written whose genotype consisted of one meta gene plus a chromosome. The meta gene was a bit, and the chromosome consisted of non-negative integers in a given range. A zero allele in the meta gene told the fitness function to reward that individual for similarity, and a one told the function to reward it for difference. The measures were based on the concept of Hamming distance, and were implemented here in two different ways: pairwise and population-wide.

In the pairwise method, an individual's fitness was calculated by comparing it to a randomly chosen individual from the population (which could have been itself). If the individual in question had a meta gene of zero, its fitness was given by 1 plus the total number of genes, by locus, which it had in common with the other individual, ignoring the meta genes (see footnote 1). If the meta gene was a one, the fitness was given by 1 plus the total number of genes which they did not have in common. For example, for the individuals < > and < >, the fitness of the first w.r.t. the second would be 4 (= 1 + 3), whereas the fitness of the second w.r.t. the first would be 3.
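The pairwise rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `pairwise_fitness`, the `(meta_gene, chromosome)` tuple representation, and the two example genotypes are all assumptions. The genotypes in particular are invented so as to reproduce the fitness values 4 and 3 quoted in the text; they are not the ones from the original example.

```python
def pairwise_fitness(individual, other):
    """Fitness of `individual` with respect to `other` (pairwise method).

    Each genotype is assumed to be a (meta_gene, chromosome) pair, where
    meta_gene is 0 (rewarded for similarity) or 1 (rewarded for difference)
    and chromosome is a tuple of non-negative integers.
    """
    meta, chrom = individual
    _, other_chrom = other  # the other individual's meta gene is ignored
    # Count the loci at which the two chromosomes carry the same allele.
    matches = sum(a == b for a, b in zip(chrom, other_chrom))
    if meta == 0:
        return 1 + matches                 # positively frequency-dependent
    return 1 + (len(chrom) - matches)      # negatively frequency-dependent


# Hypothetical genotypes: three loci in common, two loci different.
pfd = (0, (0, 1, 1, 0, 1))
nfd = (1, (0, 1, 1, 1, 0))

print(pairwise_fitness(pfd, nfd))  # 4 (= 1 + 3 shared genes)
print(pairwise_fitness(nfd, pfd))  # 3 (= 1 + 2 differing genes)
```

Because an individual may be compared with itself, a meta-gene-0 individual can score the maximum (1 plus the chromosome length) against itself, while a meta-gene-1 individual scores the minimum of 1.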
In the population-wide method, an individual's fitness was calculated by comparing it to every individual in the population (including itself). This was performed by applying the pairwise method down the whole population, and tallying up the points. These were the only components of the fitnesses. (Footnote 1: The minimal possible fitness was 1, so no individual could have a zero probability of being selected.)

Six runs were executed for every combination of the following parameter ranges: population size = 10 and 100; chromosome length = 1, 4, and 40; allele range = 2 (i.e. bitstrings), 3, and 4; fitness calculation = pairwise and population-wide; mutation rate = zero, low, and high; crossover = off and on (70% across each new population). The chromosomes were initialised randomly, but the meta bits were set alternately to 0 and 1, to prevent biased starts.

The main datum that was tracked during every run was the ratio of the two types of individual in the population per generation; this is what the vertical axes represent in most of the figures. A value of 1 indicates that the population is dominated by individuals whose fitness is negatively frequency-dependent (hereafter "NFDs"); a value of 0
indicates domination by individuals with positively frequency-dependent fitness (PFDs); a value of 0.5 indicates a 50:50 mixture.

2.1 Two Alleles

When the bitstring results were plotted and compared, it was seen that neither the presence of crossover nor the choice of fitness function (pairwise or population-wide) made much difference. The mutation rate was not particularly important either, so long as it was neither vanishingly small nor excessively high. And the population size and the chromosome length only changed the destiny of the system when they were very small.

Figure 1 shows what usually happened in the experiments for all but the extreme settings. The PFDs made more copies of themselves than the NFDs from the outset, and the population soon became dominated by PFDs. The system entered an evolutionarily stable state, which mutation and crossover could not overturn.

Fig. 1. How the ratio of binary PFDs (meta gene = 0) to NFDs (meta gene = 1) varied over the first 200 generations of two sets of six runs. Plot (a) shows a noisy case, and plot (b) shows a more representative case.

Why the PFDs always defeated the NFDs in base 2 (in non-extreme conditions) can be understood in the following terms. If two bitstrings are generated completely at random, the Hamming distance between them will lie somewhere between zero and their length, according to a bell-shaped probability distribution with a peak at half their length. In other words, they will probably have about half their bits in common. This means that in a random initial population of 50% PFDs and 50% NFDs, the mixture of fitnesses found in the PFD group will probably be the same as that of the NFD group. In the first few generations, therefore, selection will effectively be random.
This means that different individuals will make varying numbers of copies of themselves, so after the first few generations, the population will comprise a number of (near-)homogeneous groups of PFDs, a number of (near-)homogeneous groups of NFDs, and a mixture of unique individuals of both types. (Crossover is temporarily being ignored here, but when it is factored in it does not change the result.) In the pairwise fitness method, a member of a group will be compared to one of three other kinds of individual: a fellow group member, a member of another group, or one of the singletons. A PFD will get maximum points from a fellow group member, and (essentially) random points from the others. An NFD, on the other hand, will
get minimum points from a fellow group member, and random points from the others. This means that as the population evolves, the PFDs get progressively fitter and the NFDs get progressively less fit, and at the same time, PFD groups that are similar to each other will grow faster than PFD groups that are more genetically isolated. The inevitable outcome is that the NFDs all die out as the population drives towards uniformity. The same occurs with the population-wide fitness method, because the same fitness mixture is there.

A converged PFD-only population cannot be invaded by an NFD mutant, because such a mutant would have a minimal (or near-minimal) fitness compared to the maximal (or near-maximal) fitnesses of the PFDs. A diverse PFD-only population is resistant to NFD mutants by an amount that correlates negatively with its diversity: if it is approaching convergence, it will have a high resistance, but if it is very mixed, then an NFD mutant could arise with a fitness comparable to those of the PFDs. However, if an NFD mutant does manage to get into a mixed population and starts spreading, the mutant group will die down, for the same reason that NFDs die out from a random start.

2.2 Three Alleles

When the base-3 results were plotted, it was seen that the mutation rate and fitness function mattered as little as they did for base 2, but that the chromosome length and the population size were important, as was crossover in certain circumstances.

One of the most important differences between the two-allele and the three-allele systems was how they behaved initially, from a random start. Whereas with two alleles the population generally converged quickly to zeroes at the meta-locus (i.e. the PFDs dominated), with three alleles the population generally did the opposite, and converged to ones at the meta-locus.
This was because in every initial population, any two individuals could expect to have on average only 1/3 of their genes in common (as opposed to the bitstring case, where it was 1/2). The NFDs could thus expect fitnesses of 2/3 of the maximum, while the PFDs could only expect 1/3 of the maximum. Consequently, the NFDs made around twice as many copies of themselves during the first few generations, and so quickly wiped out the PFDs. Hence in every run (excepting some with extreme parameters) the populations took themselves into a negatively frequency-dependent selection regime. (This is also what happens for all larger allele alphabets, so the two-allele situation is exceptional in this regard.)

These NFD convergences did not always last: they often turned out to be merely the first of two phases. When the population was large, and particularly when the chromosome was short as well, the NFD domination seemed very stable. Figure 2 shows two examples of this stability, as well as the fast convergences mentioned previously. No PFDs could invade during the timescales of the experiments, some of which went as far as 5000 generations, so these situations were the complements, in terms of the PFD:NFD ratio, of the two-allele situations (though they were not complementary in terms of diversity, because NFD populations stay diverse while PFD populations converge).
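The expected-fitness argument at the start of this subsection can be made concrete with a short calculation. The sketch below is illustrative only: the function name `expected_initial_fitness` is hypothetical, and it assumes two independently and uniformly random chromosomes (so it ignores the chance of an individual being compared with itself). Under that assumption each locus matches with probability 1/k for an alphabet of k alleles.

```python
from fractions import Fraction


def expected_initial_fitness(length, alleles):
    """Expected pairwise fitness (PFD, NFD) for two random chromosomes.

    Assumes each of the `length` loci is drawn uniformly and independently
    from `alleles` possible values, so a locus matches with probability
    1/alleles. The leading 1 is the fitness offset used in the paper.
    """
    p_match = Fraction(1, alleles)
    pfd = 1 + length * p_match          # rewarded for shared genes
    nfd = 1 + length * (1 - p_match)    # rewarded for differing genes
    return pfd, nfd


for k in (2, 3, 4):
    pfd, nfd = expected_initial_fitness(40, k)
    print(f"alleles={k}: PFD={float(pfd):.2f}, NFD={float(nfd):.2f}")
```

With two alleles the two expectations coincide, which is consistent with early selection being effectively random in the binary case; with three or more alleles the NFD expectation is the larger one, and the gap widens as the alphabet grows.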
Fig. 2. How the ratio of PFDs to NFDs varied during the first and last 100 generations of two sets of six 2000-generation runs. The NFDs dominate in both cases, partly because the chromosomes were short and the populations large.

Two examples of the second phase are plotted in figure 3. In (a), which is representative of most small-population cases, the NFDs died out nearly as quickly as they took over, whereas in (b), where the populations were large, it took varying lengths of time for the PFDs to successfully invade, with the longest being around 300 generations. As stated earlier, PFD-domination states, where the genotypes are converged or converging, are global attractors in this kind of system, and a return to NFD domination may only occur via a vastly improbable mutation and/or selection sequence.

Fig. 3. How the ratio of PFDs to NFDs varied over the first 200 and 400 generations of two sets of six runs. The PFDs eventually dominate in both cases.

2.3 Multiple Alleles

The sets of results gathered for base-4 chromosomes were almost the same as their counterparts in base 3, and preliminary runs in even higher bases indicated that the patterns continue. Explanations of why certain multi-allele populations can support long-term NFD domination, but others cannot, are now offered.

It was found that the configurations most conducive to stable NFD domination were those with very large populations and allele alphabets, but very short chromosomes, ideally single-locus. Reducing the population size, reducing the number of alleles, and increasing the chromosome length all tended to reduce the length of time the NFDs could survive before being displaced by PFDs.

This result can be explained with an example. In a population of 200 NFD individuals with base-5 single-gene chromosomes (i.e. the genotype comprises 2 genes: the meta gene [0..1] + the chromosomal gene [0..4]) where there are 40 of each allele, every
individual's fitness as measured by the inclusive population-wide method is 161. If an individual experienced a mutation in its meta gene, its fitness would be 41: with only about 1/4 the fitness of its neighbours, it would struggle to survive. And if it did survive and spread a little, its group would continue to struggle, because no matter how many copies it made, as long as there were at least two other alleles in the population, those others would be fitter. The experimental reality would be that the mutant would disappear within a generation or two, as would any copies it managed to make. An observer would have to wait a long time to see a PFD takeover.

Now, if the allele range is increased, the expected fitness of a PFD mutant decreases, and if the population size is increased, the amount of work it must do to dominate the population increases with it. This is why those settings have the effect they do.

Regarding the chromosome length, the effect of increasing it is perhaps counter-intuitive; one might have thought that longer chromosomes would have more capacity to be different from each other, thereby making NFD domination stabler and longer lasting. Not only is this not the case, it is the opposite of the case, for the following reason. In a well-spaced-out population, at each locus there should be approximately equal ratios of the alleles across the population (so for example, in a population of 100 base-4 individuals, 25 individuals should have a 0 as their first gene, 25 should have a 1, etc.). When the chromosome is short, the low capacity for difference means that every gene is important, so every gene has healthy selective pressure on it. But when the chromosome is long, a given gene is less important, as it represents only a very small portion of the distance between individuals, so the selective pressure applied to it is relatively light.
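As an aside, the fitness values 161 and 41 in the base-5 example above can be checked numerically. The following is a sketch under stated assumptions rather than the authors' code: the name `population_fitness` and the tuple representation are hypothetical, and the offset of 1 is assumed to be added once to the population-wide tally, since that is the reading that reproduces the quoted figures.

```python
def population_fitness(individual, population):
    """Population-wide fitness: 1 plus a tally over the whole population.

    Genotypes are assumed to be (meta_gene, chromosome) pairs. The tally
    counts shared genes for meta gene 0 and differing genes for meta gene 1,
    summed over every individual in `population`, including the individual
    itself. Adding the offset of 1 once is an assumption made to match the
    values quoted in the text.
    """
    meta, chrom = individual
    tally = 0
    for _, other_chrom in population:
        matches = sum(a == b for a, b in zip(chrom, other_chrom))
        tally += matches if meta == 0 else len(chrom) - matches
    return 1 + tally


# 200 NFD individuals, one chromosomal gene each, 40 copies of each of 5 alleles.
population = [(1, (allele,)) for allele in range(5) for _ in range(40)]
resident = population[0]
print(population_fitness(resident, population))  # 161

# Flip the meta gene of one individual, leaving its chromosome unchanged.
mutant = (0, resident[1])
mutated_population = [mutant] + population[1:]
print(population_fitness(mutant, mutated_population))  # 41
```

The resident differs from 160 of the 200 individuals, giving 161, whereas the PFD mutant matches only the 40 carriers of its own allele (itself included), giving 41.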
Consequently, whereas for short chromosomes the local allele ratios are kept under quite tight control, for long chromosomes they can drift and become skewed. This tends to reduce the genetic distance between individuals, making it easier (to some degree) for PFD mutants to establish themselves in the population.

2.4 Crossover

The last parameter to be discussed is crossover. This operator did not discernibly change the results in most of the runs, but when the populations and chromosomes were (relatively) large and the number of alleles was greater than two, it made a difference. The disappearance of NFDs shown in figure 3(b) did not happen in that same time period in runs where the only difference in settings was the absence of crossover. This may seem strange at first, when it is considered that crossover does not change population-wide gene frequencies at any loci, and that it is those frequencies that control the fitnesses. Something subtle was happening.

In a mutation-only system (with a big population, long chromosomes, and multiple alleles) where NFDs are dominant, the population is usually made up of several roughly equally fit groups, within each of which there is homogeneity or near homogeneity. If a group happens to expand, its members' fitnesses dip, and the fitnesses of the non-members rise, so the group
usually shrinks back down. (This is a standard dynamic that one finds in populations subject to negative FDS.) The crucial observation is that when a group expands, causing certain alleles to become undesirably frequent at several loci, the undesirable genes are carried by identifiable individuals. Evolution can therefore remove these genes by selecting against the individuals that carry them.

When crossover of any type is added, it breaks up the group structure of the population, but this in itself does not have much impact on the system's behaviour. Crossing over can be described as dispersing or mixing up the alleles at each locus across the population. Thus, if any individuals now make extra copies of themselves, the alleles they add to the population are dispersed up and down the loci, so there are no identifiably bad individuals that can be selected against. In other words, instead of selection having guilty individuals it can remove, the guilt is spread across the population, so there are no longer any outstandingly guilty individuals.

Fig. 4. The mean fitnesses over the first 300 generations of three sets of six runs.

To see the exact effect crossover had, the mean fitnesses were plotted for the relevant base-3 runs. Figure 4 shows how they varied for 0%, 10%, and 70% uniform crossover, where the other parameters were the same as those of plot (b) in figure 3. Figure 4(a) shows a case where the groups scenario was played out. The fitnesses quickly rose to around 2/3 of the maximum (which was to be expected with three alleles) and stayed there. The variability of the values reflects the group sizes changing as well as the drifting of the groups themselves. Plot (c), which represents the exact same runs as those in figure 3(b), differs from (a) in two key ways.
Firstly, there is a gradual drop after the initial rise, and secondly, the curves take off at those times that correspond to the PFD-mutant invasions, as PFD populations are fitter than NFD ones. Plot (b) shows that a very small amount of crossover changes the system's behaviour. It is roughly the same as (c), with the key differences being higher resistance to PFD invasions, and shorter durations of those invasions when they occur. The former can be attributed to less dispersion of unwanted alleles; the latter shows that crossover slows down invasions, because the dispersals make the PFDs less similar to each other.

Further runs with more alleles and other population sizes suggested that crossover causes the mean fitness to decay geometrically from its high early value to a stable value somewhere above the halfway value. (Also, the dip depth increased with the number of alleles, so the dips in figure 4 are the least severe examples of their kind.) It appears that it is in those resultant regions of stable
fitness that selection can positively promote the rarer alleles as effectively as crossover can assist the commoner ones.

3 Conclusion

This paper has sought to present and explain a novel hypothetical model of frequency-dependent selection in which the individuals determine for themselves whether the frequency dependence of their fitness is positive or negative. It was found that in this particular artificial evolutionary system, the important parameters are the population size, the chromosome length, the allele range, and the presence or absence of crossover. When there are only two alleles (the binary case), a random initial population will become stably dominated by individuals whose fitness is positively frequency-dependent. When there are more than two alleles, the population will become dominated by individuals whose fitness is negatively frequency-dependent. These converged states vary in their stability, with their durations depending on the parameters. When they end, they end with the imposition of stable positive FDS across the population, a state whose arrival can be hastened by making any of the following parameter changes: reducing the population size, increasing the chromosome length, and enabling crossover.

References

1. Asmussen, M. A., Basnayake, E.: Frequency-Dependent Selection: The High Potential for Permanent Genetic Variation in the Diallelic, Pairwise Interaction Model. Genetics. 125, (1990)
2. Bürger, R.: A Multilocus Analysis of Intraspecific Competition and Stabilizing Selection on a Quantitative Trait. Journal of Math. Biology. 50, (2005)
3. Curtsinger, J. W.: Evolutionary Principles for Polynomial Models of Frequency-Dependent Selection. Proc. Natl. Acad. Sci. USA. 81, (1984)
4. Grefenstette, J. J.: Evolvability in Dynamic Fitness Landscapes: A Genetic Algorithm Approach. Proc. of the 1999 Conf. on Ev. Comp (1999)
5. Hillman, J. P. A., Hinde, C.
J.: Evolving UAV Tactics with an Infected Genome. Proc. of the 5th UK Workshop on Comp. Intel (2005)
6. Ridley, M.: Evolution.
7. Roff, D. A.: The Maintenance of Phenotypic and Genetic Variation in Threshold Traits by Frequency-Dependent Selection. Journal of Evolutionary Biology. 11, (1998)
8. Schneider, K. A.: A Multilocus-Multiallele Analysis of Frequency-Dependent Selection Induced by Intraspecific Competition. Journal of Math. Biology. 52, (2006)
9. Trotter, M. V., Spencer, H. G.: Frequency-Dependent Selection and the Maintenance of Genetic Variation: Exploring the Parameter Space of the Multiallelic Pairwise Interaction Model. Genetics. 176, (2007)