Haploid-Diploid Algorithms

Similar documents
A Simple Haploid-Diploid Evolutionary Algorithm

The Evolution of Gene Dominance through the. Baldwin Effect

The Evolution of Sex Chromosomes through the. Baldwin Effect

Evolving Boolean Regulatory Networks with. Epigenetic Control

Development. biologically-inspired computing. lecture 16. Informatics luis rocha x x x. Syntactic Operations. biologically Inspired computing

1. The diagram below shows two processes (A and B) involved in sexual reproduction in plants and animals.

Chapter 8: Introduction to Evolutionary Computation

Lesson Overview Meiosis

Lesson Overview 11.4 Meiosis

Unit 6 : Meiosis & Sexual Reproduction

Repeated Occurrences of the Baldwin Effect Can Guide Evolution on Rugged Fitness Landscapes

9 Genetic diversity and adaptation Support. AQA Biology. Genetic diversity and adaptation. Specification reference. Learning objectives.

Linking levels of selection with genetic modifiers

What is a sex cell? How are sex cells made? How does meiosis help explain Mendel s results?

Classification of Random Boolean Networks

Haploid & diploid recombination and their evolutionary impact

Chapter 13 Meiosis and Sexual Reproduction

Evolutionary Computation

Essential Questions. Meiosis. Copyright McGraw-Hill Education

Ch. 10 Sexual Reproduction and Genetics. p

THINK ABOUT IT. Lesson Overview. Meiosis. As geneticists in the early 1900s applied Mendel s laws, they wondered where genes might be located.

Formalizing the gene centered view of evolution

The Dynamic Changes in Roles of Learning through the Baldwin Effect

that does not happen during mitosis?

Classification of Random Boolean Networks

Introduction. Key Concepts I: Mitosis. AP Biology Laboratory 3 Mitosis & Meiosis

Cell division / Asexual reproduction

Cellular Automata. ,C ) (t ) ,..., C i +[ K / 2] Cellular Automata. x > N : C x ! N. = C x. x < 1: C x. = C N+ x.

Ch 11.Introduction to Genetics.Biology.Landis

A Note on Crossover with Interval Representations

Parents can produce many types of offspring. Families will have resemblances, but no two are exactly alike. Why is that?

Sexual Reproduction. Page by: OpenStax

An Analysis of Diploidy and Dominance in Genetic Algorithms

Binary fission occurs in prokaryotes. parent cell. DNA duplicates. cell begins to divide. daughter cells

Introduction to Genetics

Meiosis. Two distinct divisions, called meiosis I and meiosis II

UNIT 8 BIOLOGY: Meiosis and Heredity Page 148

The Cell Cycle. The Basis for Heritability: Mitosis and Meiosis. Let s start with a banana.

genome a specific characteristic that varies from one individual to another gene the passing of traits from one generation to the next

Variation of Traits. genetic variation: the measure of the differences among individuals within a population

On Sex, Mate Selection and the Red Queen

Cell division / Asexual reproduction. Asexual reproduction. How about the rest of us? 1/24/2017. Meiosis & Sexual Reproduction.

CELL BIOLOGY - CLUTCH CH MEIOSIS AND SEXUAL REPRODUCTION.

Objective 3.01 (DNA, RNA and Protein Synthesis)

Chapter 10 Sexual Reproduction and Genetics

MEIOSIS C H A P T E R 1 3

Diploid versus Haploid Organisms

Evolutionary computation

The Role of Crossover in Genetic Algorithms to Solve Optimization of a Function Problem Falih Hassan

Sexual Reproduction *

Sexual Reproduction and Genetics

Meiosis and Sexual Reproduction. Chapter 9

V. Evolutionary Computing. Read Flake, ch. 20. Genetic Algorithms. Part 5A: Genetic Algorithms 4/10/17. A. Genetic Algorithms

Evolution of Genotype-Phenotype mapping in a von Neumann Self-reproduction within the Platform of Tierra

Evolutionary Computation. DEIS-Cesena Alma Mater Studiorum Università di Bologna Cesena (Italia)

Name Class Date. Pearson Education, Inc., publishing as Pearson Prentice Hall. 33

Sex accelerates adaptation

Meiosis produces haploid gametes.

Mitosis & Meiosis. PPT Questions. 4. Why must each new cell get a complete copy of the original cell s DNA?

Ohio Tutorials are designed specifically for the Ohio Learning Standards to prepare students for the Ohio State Tests and end-ofcourse

Evolutionary Design I

Unit 6 Test: The Cell Cycle

KEY: Chapter 9 Genetics of Animal Breeding.

Name Class Date. KEY CONCEPT Gametes have half the number of chromosomes that body cells have.

Science Unit Learning Summary

Major questions of evolutionary genetics. Experimental tools of evolutionary genetics. Theoretical population genetics.

Introduction to Genetics

Lecture 9 Evolutionary Computation: Genetic algorithms

Chapter 11 Meiosis and Sexual Reproduction

Chapter 13. Meiosis & Sexual Reproduction. AP Biology

Lesson Overview Meiosis

Sexual and Asexual Reproduction. Cell Reproduction TEST Friday, 11/13

Exam 1 PBG430/

Biology Kevin Dees. Chapter 13 Meiosis and Sexual Life Cycles

Chapter 13 Meiosis and Sexual Life Cycles. Reproduction

6-10 Sexual reproduction requires special cells (gametes) made by meiosis.

V. Evolutionary Computing. Read Flake, ch. 20. Assumptions. Genetic Algorithms. Fitness-Biased Selection. Outline of Simplified GA

Section 11 1 The Work of Gregor Mendel

A GENETIC ALGORITHM FOR FINITE STATE AUTOMATA

Outline for today s lecture (Ch. 13)

Chapter 10.2 Notes. Genes don t exist free in the nucleus but lined up on a. In the body cells of animals and most plants, chromosomes occur in

IV. Evolutionary Computing. Read Flake, ch. 20. Assumptions. Genetic Algorithms. Fitness-Biased Selection. Outline of Simplified GA

Mutation, Selection, Gene Flow, Genetic Drift, and Nonrandom Mating Results in Evolution

Learning Objectives:

UNIT 3: GENETICS 1. Inheritance and Reproduction Genetics inheritance Heredity parent to offspring chemical code genes specific order traits allele

Measures for information propagation in Boolean networks

Sexual Reproduction and Genetics

Lecture 9: Readings: Chapter 20, pp ;

Meiosis and Fertilization Understanding How Genes Are Inherited 1

Sorting Network Development Using Cellular Automata

Complex Systems Theory and Evolution

1. If a eukaryotic cell has a single set of chromosomes, it is called A. haploid B. diploid C. polypoid

For a species to survive, it must REPRODUCE! Ch 13 NOTES Meiosis. Genetics Terminology: Homologous chromosomes

CHROMOSOMAL BASIS OF INHERITANCE

Reading Assignments. A. Systems of Cell Division. Lecture Series 5 Cell Cycle & Cell Division

Lecture Series 5 Cell Cycle & Cell Division

Name: Period: EOC Review Part F Outline

Origin of Sexual Reproduction

Ladies and Gentlemen.. The King of Rock and Roll

Evolving plastic responses to external and genetic environments

Transcription:

Haploid-Diploid Algorithms Larry Bull Department of Computer Science & Creative Technologies University of the West of England Bristol BS16 1QY, U.K. +44 (0)117 3283161 Larry.Bull@uwe.ac.uk LETTER Abstract This short paper uses the recently presented idea that the fundamental haploid-diploid lifecycle of all eukaryote organisms exploits a rudimentary form of the Baldwin effect. The general approach presented here differs from all previous known work using diploid representations within evolutionary computation. The role of recombination is also changed from that previously considered in both natural and artificial evolution under the new theory. Using the NK model of fitness landscapes and the RBNK model of gene regulatory networks it is here shown that varying landscape ruggedness varies the benefit of a haploid-diploid approach in comparison to the traditional haploid representation in both cases. Keywords: Baldwin effect, diploid, NK model, RBNK model, recombination, regulatory network.

1. Introduction The vast majority of work within evolutionary computation has used a haploid representation scheme; individuals carry one solution to the given problem. Prokaryotic organisms similarly contain one set of genes whereas the more complex eukaryotes are diploid. A small body of work exists using a diploid representation scheme within evolutionary computation; individuals carry two solutions to the given problem. In all but one known example, a dominance scheme is utilized to reduce the diploid down to a traditional haploid solution for evaluation (see [Bhasin & Mehta, 2016] for a recent review - which misses the early work by Holland [1975]). The exception is work by Hillis [1991] on sorting networks wherein non-identical alleles at a given gene loci were both used to form the haploid solution for evaluation, thereby enabling solutions of up to the combined length of the two constituent genomes. Eukaryotes exploit a so-called haploid-diploid cycle where the diploid organisms form haploid gametes that (may) join with a haploid gamete from another organism to form a diploid, etc. That is, each of the two genomes in an organism is replicated, typically twice each, with one copy of each genome also being crossed over. In this way copies of the original pair of genomes can be passed on, mutations aside, along with two versions containing a portion of genes from each. A number of explanations have previously been presented for why diploidy - or higher ploidy in general - is beneficial to eukaryotes, typically based around the potential for hiding mutations within extra copies of the genome (eg, see [Otto, 2007] for an overview). Similarly, there have previously been explanations for the emergence of the alternation between the haploid and diploid states, typically based upon its being driven by changes in the environment (after [Margulis & Sagan, 1986]). Recently, an explanation for the haploid-diploid cycle in eukaryotes has been presented which also explains the other aspects of their sexual reproduction, including the use of recombination [Bull, 2016], based on the Baldwin effect [Baldwin, 1896][Lloyd-Morgan, 1896][Osborn, 1896]. The Baldwin effect is here defined as the existence of phenotypic plasticity which enables an organism to reach different (better) regions of the fitness landscape than its genome directly represents. Over time, as evolution is guided towards better regions under selection, higher fitness alleles/genomes which rely less upon the phenotypic plasticity to find the regions can be discovered by mutation and become assimilated into the population (eg, see [Sznajder et al., 2012] for a recent overview). A change in ploidy can potentially alter gene expression - and hence the phenotype even if no

mutations occur between the lower and higher ploidy states since both genomes are active, through epigenetic mechanisms, through rates of changes in gene product concentrations, no or partial or co-dominance, etc. (eg, see [Chen & Ni, 2006]). In all cases, the fitness of the cell/organism is a non-trivial combination of the fitness contributions of the composite haploid genomes. If the cell/organism subsequently remains diploid and reproduces asexually, there is no scope for a rudimentary Baldwin effect. However, if there is a reversion to a haploid state, there is potential for a mismatch between the utility of the haploids compared to that of the diploid; individual haploids do not contain all of the genetic material over which selection operated. That is, the effects of genome combination can be seen as a simple form of phenotypic plasticity for the individual genomes before they revert to a solitary state and hence the Baldwin effect may occur. The Baldwin effect working in this way can be seen to alter the characteristics of the evolutionary process. In the haploid case, variation operators essentially generate a new genome at a point in the fitness landscape for evaluation. Under the haploid-diploid cycle, the variation operators create the bounds for sampling a region within the fitness landscape by specifying two end points, ie, each haploid genome to be partnered in the diploid. The actual position of the fitness point for the diploid taken from within that region then depends upon the percentage of the lifecycle the diploid state occupies - the larger, the closer to the midpoint (with all other things being equal). The effects of the recombination process then become clear with this insight: recombination moves the current end points which define the sample region either closer together or further apart, thereby altering the potential amount of learning under a given lifecycle percentage of the diploid state. Previous explanations for the recombination stage in eukaryotes vary from the removal of deleterious mutations (after [Kondrashov, 1982]) to avoiding parasites (after [Hamilton, 1980]) (eg, see [Bernstein & Bernstein, 2010] for an overview). The Baldwin effect view does not contradict such ideas but provides an underpinning mechanism for all aspects of sexual reproduction in eukaryotes. Similarly, the role of recombination in evolutionary computation has long been questioned, with many explanations proposed (eg, see [Spears, 1998]). The adjustment of end points defining a region of which sampling occurs applies equally well to evolutionary computation. Again, without contradicting pervious explanations. This paper presents a new class of evolutionary algorithm which exploits this new understanding of the natural phenomenon in eukaryotic organisms, using the NK [Kauffman & Levin, 1987] and RBNK [Bull, 2012] models.

2. The NK and RBNK Models Kauffman and Levin [1987] introduced the NK model of fitness landscapes to allow the systematic study of various aspects of evolution (see [Kauffman, 1993]). In the model an individual is represented by a set of N (binary) genes or traits, each of which depends upon its own value and that of K randomly chosen others in the individual. Thus increasing K, with respect to N, increases the epistatic linkage. This increases the ruggedness of the fitness landscapes by increasing the number of fitness peaks. The model assumes all epistatic interactions are so complex that it is only appropriate to assign (uniform) random values to their effects on fitness. Therefore for each of the possible K interactions, a table of 2 (K+1) fitnesses is created, with all entries in the range 0.0 to 1.0, such that there is one fitness value for each combination of traits. The fitness contribution of each trait is found from its individual table. These fitnesses are then summed and normalised by N to give the selective fitness of the individual. Exhaustive search of NK landscapes [Smith & Smith, 1999] suggests three general classes exist: unimodal when K=0; uncorrelated, multi-peaked when K>3; and, a critical regime around 0<K<4, where multiple peaks are correlated. Kauffman [1969] introduced the RBN model to explore genetic regulatory networks. A network of R nodes, each with B directed connections from other nodes in the network all update synchronously based upon the current state of those B nodes. Hence those B nodes are seen to have a regulatory effect upon the given node, specified by the given Boolean function attributed to it. Nodes can also be self-connected. Since they have a finite number of possible states and they are deterministic, such networks eventually fall into an attractor. It is well-established that the value of B affects the emergent behaviour of RBN wherein attractors typically contain an increasing number of states with increasing B. Three phases of behaviour were originally suggested by Kauffman through observation of the typical dynamics: ordered when B=1, with attractors consisting of one or a few states; chaotic when B>3, with a very large number of states per attractor; and, a critical regime around 1<B<4, where similar states lie on trajectories that tend to neither diverge nor converge (see [Kauffman, 1993] for discussions of this critical regime, e.g., with respect to perturbations). Subsequent formal analysis using an annealed approximation of behaviour identified B=2 as the critical value of connectivity for behaviour change [Derrida & Pomeau, 1986].

The combination of the RBN and NK models enables the exploration of the relationship between phenotypic traits and the genetic regulatory network by which they are produced the RBNK model [Bull, 2012]. In this paper, the following simple scheme is adopted: N phenotypic traits are attributed to randomly chosen nodes within the network of R genes (Figure 1). Thereafter all aspects of the two models remain as described, with simulated evolution used to evolve the RBN on NK landscapes. Hence the NK element creates a tuneable component to the overall RBN fitness landscape. Fig. 1: Example RBNK model. Each network consists of R nodes, each node containing B integers in the range [1, R] to indicate input connections and a binary string of length 2 B to indicate the Boolean logic function over those connections. The identity of the N trait nodes is assigned at random per model.

3. A Haploid-Diploid Evolutionary Algorithm A diploid representation consists of two genomes, each a complete solution to the given task. In contrast to previous studies, within a haploid-diploid evolutionary algorithm (HD-EA), a dominance scheme is not a prerequisite - both solutions within an individual are evaluated. Fitness is then a weighted average of the two evaluations: f = f A.(1- ) + f B. After selection, a parent copies each of its haploid genomes twice. One copy of each of the original two genomes is crossed over to create two new variants. One of the four resulting haploid genomes is then picked at random. Finally, mutation is applied to the chosen genome. This process is repeated for both parents and the new individual formed by the two genomes. The details of the selection process, recombination and mutation operators are not specific to the general approach. Specific examples are next described for the chosen tasks. 4. Experimentation 4.1 NK Model A standard steady state evolutionary process is used here: reproduction selection is via a binary tournament (size 2), replacement selection uses the worst individual, the population contains 50 diploid individuals, the constituent haploid genomes are binary strings of length N=50, mutation is deterministic at rate 1/N, single-point crossover is used, and =0.5. Since each diploid individual requires two evaluations, the comparative traditional haploid scheme is run for twice as many generations, with all other details the same. The results presented are the average fitness of the best individual after 20,000 generations (40,000 for the haploid), for various K. Each experiment consists of running ten random populations on each of ten fitness landscapes, ie, results are the average of 100 runs. As can be seen in Figure 2, for all K the HD-EA is better than or equal to the traditional haploid approach (T-test, p<0.05 for K=2, 10, 15). That the Baldwin effect is generally beneficial with increased fitness landscape ruggedness is well-established [Bull, 1999].

Fig. 2: Comparative performance of the best individual evolved using the haploid-diploid algorithm to the traditional haploid approach, over a range of fitness landscapes (N=50). Error bars show max and min values. 4.2 RBNK Model Most details of the experimentation remain unchanged from the NK model above. Individual solutions are now integer strings of length R=100 and mutation is applied deterministically at a rate 1/R. In contrast to bit-flipping, mutation can either alter the Boolean function of the randomly chosen node or alter a randomly chosen connection for that node (equal probability). Following the known behavior of B=2 RBN discussed above, only B=2 is used here. A fitness evaluation is ascertained by first assigning each node to a randomly chosen start state and updating each node synchronously for T cycles (T=50). Here T is chosen such that the networks have typically reached an attractor. At update cycle T, the value of each of the N trait nodes is then used to calculate fitness on the given NK landscape. This process is repeated ten times on the given NK landscape iteration. Each experiment again consists of running ten random populations on each of ten fitness landscapes, ie, results are here the average of 1000 runs (after [Bull, 2012]). Figure 3 shows how for all K>4 the HD-EA is better than the traditional haploid approach (T-test, p<0.05), otherwise performance is equal (T-test, p 0.05).

Fig. 3: Comparative performance of the best RBN individual evolved using the haploid-diploid algorithm to the traditional haploid approach, over a range of fitness landscapes (R=100, B=2, N=50). 5. Conclusion This short paper has presented a new form of evolutionary algorithm which exploits the haploid-diploid cycle seen in eukaryotic organisms. In contrast to previous work on diploid representations, both candidate solutions are evaluated and a combined fitness attributed to the individual. Since the diploid is reduced to a haploid under reproduction, there is a mismatch between the actual utility of the haploid and the fitness attributed to it. This is seen as exploiting a rudimentary form of the Baldwin effect, with the diploid phase seen as the learning step [Bull, 2016]. The scheme has been shown beneficial as the ruggedness of the fitness landscape increases for both a traditional binary-encoded optimization task and a dynamical network design task. No significant change in performance was seen for either task from varying the weighting of the haploid genomes (not shown) but this may not be true for other classes of problem. Similarly, every generation was assumed sexual here, whereas some eukaryotes can also reproduce asexually. The effects of varying the number of copies of genomes at the point of reproduction should also be explored the typical natural case of premeiotic doubling was used here. The use of many recombination schemes would also seem to hold the potential to improve performance.

References Baldwin, J. M. (1896). A new factor in evolution. American Naturalist, 30, 441 451. Bernstein, H. & Bernstein, C. (2010) Evolutionary origin of recombination during meiosis. BioScience, 60, 498-505. Bhasin, H. & Mehta, S. (2016) On the applicability of diploid genetic algorithms. AI & Society, 31(2), 265-274. Bull, L. (1999). On the Baldwin effect. Artificial Life, 5, 241 246. Bull, L. (2012) Evolving Boolean networks on tuneable fitness landscapes. IEEE Transactions on Evolutionary Computation, 16(6), 817-828 Bull, L. (2106) The evolution of sexual reproduction through the Baldwin effect. Arxiv preprint: https://arxiv.org/abs/1607.00318 Chen, Z.J., and Ni, Z. (2006). Mechanisms of genomic rearrangements and gene expression changes in plant polyploids. Bioessays, 28, 240 252. Derrida, B. & Pomeau, Y. (1986) Random networks of automata: a simple annealed approximation. Europhys. Lett. 1: 45-49. Hamilton, W.D. (1980) Sex verus non-sex versus parasite. Oikos, 35, 282-290. Hillis, D. (1991) Coevolving parasites improve simulated evolution as an optimization procedure. In C. Langton (ed.) Artificial Life II. San Mateo, CA: Addison-Wesley, pp313-324. Holland, J.H. (1975) Adaptation in natural and artificial systems. Ann Arbor: University of Michigan Press.

Kauffman, S. (1969) Metabolic stability and epigenesis in Randomly Constructed Genetic Nets. J. Theoretical Biology, 22, 437-467. Kauffman, S. A. (1993). The Origins of Order: Self-Organisation and Selection in Evolution. New York, NY: Oxford University Press. Kauffman, S. A., & Levin, S. (1987). Towards a general theory of adaptive walks on rugged landscapes. Journal of Theoretical Biology, 128, 11 45. Kondrashov, A.S. (1982) Selection against harmful mutations in large sexual and asexual populations. Genetical Research, 40, 325-332. Lloyd-Morgan, C. (1896). On modification and variation. Science, 4, 733 740. Margulis, L. & Sagan, D. (1986) Origins of Sex: Three Billion Years Recombination. Yale University Press, New Haven. Osborn, H.F. (1896). Ontogenic and phylogenic variation. Science, 4, 786-789. Otto, S. (2007) The evolutionary consequences of polyploidy. Cell, 131. Smith, J., & Smith, R. E. (1999). An examination of tuneable random search landscapes. In W. Banzhaf & C. Reeves (Eds.), Foundations of genetic algorithms V (pp. 165 182). San Mateo, CA: Morgan Kauffman. Spears, W. (1998) Evolutionary Algorithms: The Role of Mutation and Recombination. Springer. Sznajder, B., Sabelis, M.W. & Egas, M. (2012) How adaptive learning affects evolution: Reviewing theory on the Baldwin effect. Evolutionary Biology, 39, 301-310.