Evolved Representations and Their Use in Computational Creativity


Thorsten Schnier

Thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy, 1999

Key Centre of Design Computing
Department of Architectural and Design Science
University of Sydney, NSW, 2006, Australia
thorsten@arch.usyd.edu.au

Acknowledgements

I would like to thank my supervisor, Prof. John Gero, for his support and guidance throughout the period of this Ph.D. work, and for providing a very stimulating research environment. I would also like to thank the staff and fellow students in the Key Centre of Design Computing for interesting and open discussions, and for the help with numerous administrative and computing issues, especially in my early time at the Key Centre. Thanks also go to the authors of all the free software that has been used in the research and in the preparation of this thesis, among others GNU Common Lisp and CMU Lisp, LaTeX, Xfig, Xemacs, AucTeX, gnuplot, and the Linux/GNU operating system. Finally, I would like to thank my partner for her love, support and encouragement, philosophical arguments about creativity, listening to and helping me express my ideas, and the preparation of Figures 8.15 and 8.22(a).

This work is supported by a grant of the Australian Research Council and by a University of Sydney Postgraduate Research Award. Computing and other resources have been provided by the Key Centre of Design Computing.

Abstract

In any computational process, the representation used plays an important role. Depending on how the representation is chosen, some designs will generally not be representable, restricting the size of the search space. However, the representation can also change the topology of the search space, making some designs more likely to be the outcome of the design process than others. Generally, representations are designed to minimize this bias caused by the representation. This thesis explores how it is possible to develop specific representations that influence a design process in a useful, predictable way, and the possible use of such a method in creative computational processes.

The work is based on evolutionary algorithms. These algorithms use an initial representation, which can be referred to as the "basic representation", using symbols from an initial alphabet, the "basic genes". This representation is developed into an "evolved representation" by adding a number of "evolved genes". These evolved genes encapsulate fixed combinations of basic genes, and protect them from disruption in the evolutionary design process. As a consequence, design processes using the evolved representation will be biased in favour of these gene combinations. If the gene combinations are associated with certain specific features in designs, then a focus is introduced into the design process, centred around designs which show these features.

To create appropriate evolved genes with minimal user interaction, a machine learning approach is developed. An evolutionary algorithm creates sample individuals from a set of user-provided example designs, and evolved genes are created from successful gene combinations in those sample individuals.

The thesis shows that the creation of a focus using an evolved representation can fulfil the procedural definitions of creativity found in the literature, if certain finite system restrictions are taken into account.
The only additional requirement is the ability to modify the focus in the design process. Since the focus is created by the evolved representation, modifying the evolved representation provides a way to induce these changes in the focus.

Creation, use and transformation of evolved representations are demonstrated using two different example applications. In the first example, floor plans of Frank Lloyd Wright's Prairie Houses are used to create an evolved representation; designs produced using this representation show similarities to the example floor plans. The possibility for a transformation of the search space is demonstrated by a modification of the evolved genes, which changes the proportions of living and service areas in the designs created. In the second example, a tree structure is used for the basic representation, requiring a more complex notion of evolved genes. The evolved representation is created from a set of paintings by the Dutch painter Piet Mondrian and from a window design by Frank Lloyd Wright. Again, designs produced using the representations show similarities to the paintings and the window design respectively. Two different transformations of the focus are demonstrated in this example: mixing the two representations created from the paintings and the window, and adapting the evolved representations for use in a different application domain.

Analysis of data gathered during the example runs is used to gain quantitative measures of the distribution of the evolved genes created, the transformation of the search space, the interaction of the evolved representation with the fitness, and other characteristic values.

Contents

Acknowledgements
Abstract
1 Introduction: Goal of the Thesis; Organization of the Thesis
2 Background: Evolutionary Algorithms; Background; Variable-length Representation; Genotype-Phenotype Distinction; Multi-level Representation of Designs; Human and Computational Creativity; Creativity as Introduction of Variables; Creativity as Exploration and Transformation
3 Evolved Representations: Using the Representation to Focus Design Processes; Basic Implementation Schema; Use of Evolutionary Algorithms; Influencing Search Space using Evolved Representations; Transformation of the Search Space; Creating Evolved Representations; Feedback into the evolutionary system; Creating New Designs Using Evolved Representations
4 Computational Creativity: Finite Systems and Closed Worlds; Focussing: Creativity in a Finite System; Is Focussing Creative?; Soft Focus versus Hard Focus; Moving the Focus; Humans and Finite Systems; Requirements for a Creative Design Process
5 Evolved Representations in Computational Creativity: Focussing and Transformation using Evolved Representations; Methods for Transforming the Representation; Mixing Evolved Representations; Filtering Genetic Material; Direct Manipulation of Evolved Representation; Control of Transformation Operations
6 Other Interpretations of Evolved Representations: Evolved Representations and Schema Theorem; Evolved Representations and Genetic Engineering; Evolved Representations and Style; Evolved Representations and Case-based Design; Evolved Representation and Cross-Breeding; Combining Genetic Material; The Unit of Inheritance; Evolved Representations and Artificial Life; Evolved Representations in Other Work; Evolved Representations in Genetic Programming; Evolved Representations using other Representations; Other Work on Computational Creativity
7 Implementing Evolved Representations: Formal Notation; Influence of Evolved Genes on Evolutionary Algorithm; Evolved Representation: More than a Collection of Evolved Genes; Creation of Evolved Genes; Other training methods; Non-consecutive Evolved Genes; Interlocking Evolved Genes; Using Dominant/Recessive Genes
8 Example Applications: General Implementation; Selection; Design Example 1: Generating Floor Plans; Background; The Basic Representation; Learning Representation; Using evolved genes to produce new designs; Shifting focus by manipulation of evolved genes; Discussion of Results; Design Example 2: Generating Paintings; Background; Representing Mondrian Paintings; Representing a Frank Lloyd Wright Window; Definition of Evolved Genes; Creating Evolved Representation; Using the Evolved Representation; Transforming the Evolved Representation; Discussion of Results
9 Quantitative Analysis of Examples: Generation of Evolved Genes; Population Dynamics; Fitness sharing; Produced genes; Effect of Gene Creation on Population; Use of dominant/recessive gene pairs; Use of evolved genes; Focussing effect of evolved genes; Usage of evolved genes in new designs; Influence of evolved representation on convergence time and results
10 Discussion: Results of the Example Runs; Evolved Representations can be Created using Machine Learning; Evolved Representations Incorporate Knowledge; Evolved Representations Transform the Search Space; Evolved Representations Focus the Design Process; Evolved Representation Interacts with Fitness; Transformation of the Representation Changes the Focus; Is it Creative?; Future Work; Automatic Control of Transformations; Higher-Level Evolved Genes; Conclusion
Abstracts of Published Papers

Chapter 1 Introduction

In any kind of design activity, the design worked on has to be represented in some way. For a human designer, designs are represented using, for example, models, drawings, or verbal descriptions. If a computer is used for design work, designs are usually represented by groups of pixels (paintbrush programs), lines and shapes (general-purpose CAD programs) or higher-level objects like walls and rooms (purpose-specific CAD programs). In general language, the term "representation method" can be used to refer to these different ways to represent designs, while "representation" refers to a particular instance. In the computer science context, however, the word "representation" itself is often used to refer to the method of representation, as in "floating point representation". In this thesis, this second meaning of representation is used; for example, "evolved representations" in the title of the thesis is to be interpreted as "evolved methods of representation".

A human designer usually has a large number of representations available, and can use the representation most suitable for what he or she is working on. Humans can also introduce new representations and thereby represent objects that are not part of the world they experience with their sensory organs, for example vector representations of four- and five-dimensional objects. In design computing, on the other hand, the representation or representations used have to be explicitly defined. Many different representations have been suggested, often optimized for specific design domains or design methods (see for example Shih, 1991; Bentley and Wakefield, 1996; Olivier et al., 1996; Stal and Turkiyyah, 1996), but each individual computational design system has only one or very few different representations available. Whatever the choice of the representation, it is likely to influence the outcome of the design process.
In any representation, some designs may be more difficult to represent than others, and some designs may not be representable at all. For example, a computer might allow the design of a perfectly sinusoidal-shaped roof, while it is difficult for it to represent a roof that follows a freehand form. In a cardboard model, on the other hand, a freehand form would be much easier to generate than a perfectly sinusoidal shape. Anecdotal evidence also suggests that architects who design on paper tend to produce different kinds of designs than those who always use 3D models to design.

The same applies if the design process is implemented in a computer program. If a design cannot be represented with a given representation, it cannot be the outcome of a design process using this representation. As is the case for human designers, it is also possible that the representation influences a computational design process such that it is easier for the program to find some designs than others. Depending on the design process used, this might make those designs a more likely outcome of the design process. This is for example the case with stochastic optimization processes, like evolutionary systems and simulated annealing. In these cases, the representation is likely to introduce a bias into the design process.

The selection of the representation is therefore of high importance in the development of a computational design system. Obviously, while choosing the representation the programmer has to ensure that all, or as many as possible, potentially interesting designs can be represented. But it is also generally desirable to minimize the bias introduced by the representation. As opposed to the user-provided design criteria, the bias caused by the representation influences the outcome of the design process in an implicit way which is not obvious to the user, and difficult to predict and control.

The idea developed in this thesis is that it is possible to turn the bias caused by the representation into a virtue, by deliberately choosing or modifying the representation to influence the design process in a certain, desired way. The resulting focussing of the search process is connected to the idea of expansion of search spaces, a notion used in some definitions of computational creativity. Both focussing and expansion of search space will be explored in this thesis, and demonstrated in example implementations.

1.1 Goal of the Thesis

The goal of this thesis is to develop a method to create representations for evolutionary systems that can be used to introduce a user-defined bias into an evolutionary design process, and to show how such a method can be used to implement a creative computational process.
More specifically, this method should:

- allow the creation of appropriate representations with minimal user interaction,
- allow use of the representations in the production of new designs,
- allow modification of the bias,
- be applicable in a range of design domains, and
- fulfil a set of minimum criteria for a creative computational process which will be defined in the thesis.

1.2 Organization of the Thesis

The first chapter after the introduction (Chapter 2) gives a short overview of the research areas that have inspired this work. The three following chapters describe the idea of using evolved representations to focus a search process, and its possible use in computational creativity. The first of these chapters (Chapter 3) introduces the idea of evolving representations, and describes how they can be used to influence a search process. The following chapter (Chapter 4) contains general considerations about current definitions of computational creativity and the different ways computational processes can search a search space. The ideas from these two chapters are brought together in Chapter 5, which shows how evolved representations can be used in the implementation of a creative computational process.

However, creative processes are not the only possible use of evolved representations, and Chapter 6 discusses the use of evolved representations from a number of other perspectives. It also describes the work of other authors that is in some ways similar, either in means or in goals, to this work.

The different elements required in the implementation of systems using evolved representations are described in more detail in Chapter 7. The two following chapters present the example implementations: Chapter 8 shows the implementation and the qualitative results, and Chapter 9 presents the quantitative data that can be gained from those examples. Finally, Chapter 10 discusses both the qualitative and quantitative results of the examples, suggests some directions for further development, and presents the conclusions that can be drawn from this work.

Chapter 2 Background

This work draws ideas and methodologies from a number of different research areas. Design computing research, with the existing research on computational creativity, is a very important part of the background. Since the notion of evolved representation, which is developed in this thesis, is based on evolutionary algorithms, the literature on evolutionary algorithms also forms an important part of the background.

This section contains only the background material necessary to underpin the following three chapters of the thesis. Additional background information and references will appear in the individual sections of Chapter 6, where other research areas and their connections to the methods and ideas used in this work will be presented. Further background material, which is related only to the particular example applications, will be described in short introductory sections in Chapter 8.

2.1 Evolutionary Algorithms

Background

Neo-Darwinism, that is, Darwin's theory of evolution through natural selection (Darwin, 1979) combined with the modern knowledge of genetics, is nearly universally accepted as the theory of the development of the species that exist or have existed on Earth. This theory has been summarized as "survival of the fittest", a term first used by a contemporary of Charles Darwin, Alfred Wallace (Bronowski, 1973). If evolution is seen as an unsupervised search for adapted species in an unknown fitness landscape, it becomes a useful metaphor for computational systems, which can be described as population-based search algorithms. Evolutionary algorithms, evolutionary systems and evolutionary computation are generic terms for a number of algorithms that follow this metaphor (Fogel, 1997).
Genetic algorithms (Holland, 1992), evolutionary strategies (Schwefel, 1995), evolutionary programming (Fogel, 1995) and genetic programming (Koza, 1992) are the main instances of evolutionary algorithms (for a history of evolutionary computation, see De Jong et al., 1997).

Figure 2.1 shows the basic flowchart of an evolutionary algorithm. It is a cyclic process, which is usually terminated either when sufficiently good results have been produced, or when the algorithm has failed to improve results for a number of cycles. The loop contains the four key elements of an evolutionary algorithm.

- Population of individuals: each individual represents a point in the search space; this point is defined by a so-called genotype.
- Selection is used to choose the individuals used as parents, and to decide which individuals in the population survive into the next population. To do this, an evaluation function is used to calculate a fitness value; individuals with higher fitness are more likely to be selected as parents and less likely to be removed from the population.
- Recombination is used to generate new individuals by combining the genotypes of two or more individuals (parents) in the population. The usual operation is the crossover operator, which assembles the new genotype from sections taken from the parent genotypes. The simplest crossover operator, the single-point crossover, is shown in Figure 2.2.
- Mutation introduces additional variety into individuals, by randomly changing parts of the genotype.

[Figure 2.1: Basic flowchart of evolutionary algorithms. Create initial population; test the termination condition; if it is not met, select parents, apply recombination (*) and mutation (*), select the new population from offspring and current population, and repeat. *: see text]

Some versions of evolutionary algorithms use only either recombination or mutation; for example, evolutionary programming/strategies only employ mutation.

[Figure 2.2: Single-point crossover operator (parents, crossover point, offspring)]

For more details on particular algorithms, and on many other aspects of evolutionary algorithms, see Bäck et al. (1997).

Evolutionary algorithms have found application in many different areas of design, for example electronic circuit design (Koza et al., 1998), aircraft design (Wright and Holden, 1998; Cvetkovic et al., 1998), layout design (Rosenman, 1996; Hower et al., 1996), and design of frame structures (Ghasemi and Hinton, 1998) (for more examples of evolutionary algorithm applications, see Parmee, 1998b).

The main advantage of evolutionary algorithms compared with other search algorithms is that they are well suited to large search spaces, where only little domain knowledge is available. But they also allow for very flexible representations. Both reasons make evolutionary algorithms a key tool in the research reported in this thesis.

Variable-length Representation

In classic evolutionary algorithm applications, all individuals in the population are represented by genotypes of identical length. For some applications, however, it is more appropriate to allow different genotype lengths, requiring modifications in the genetic operations. Figure 2.3 shows a modified version of the single-point crossover: in each parent genotype, a random crossover point is selected, and the start or end segment is then swapped. The resulting genotype can be shorter or longer than the genotype of either of the parents, allowing the size of the genotypes in the population to adapt to the requirements of the fitness.

[Figure 2.3: Variable length single-point crossover (parents, crossover points, offspring)]

Genotype-Phenotype Distinction

In natural reproduction, a distinction can be made between genotypes and phenotypes (Langton, 1991). The word genotype refers to the genetic material of an individual, while the word phenotype refers to its physical appearance. This distinction is important because the genetic operations operate only on the genotype, while the fitness of an individual is determined by the performance of the phenotype. A transformation occurs from genotype to phenotype; in natural systems this is usually called development (for example De Robertis et al., 1990). It is strongly influenced by the environment the transformation occurs in; the same genotype can therefore lead to different phenotypes. Genotype-phenotype development has been studied as part of Artificial Life research (for example in the work on L-systems, Lindenmayer and Prusinkiewicz, 1988; de Boer et al., 1991). In evolutionary algorithms, the genotype-phenotype distinction is rarely made.
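The cycle of Figure 2.1 and the two crossover operators of Figures 2.2 and 2.3 can be illustrated with a minimal sketch, assuming genotypes are plain Python lists of at least two genes. The function names, the binary tournament used for parent selection, the truncation used for survivor selection, and the fixed number of cycles used as termination condition are all assumptions made for this illustration; they are not taken from the thesis implementation.

```python
import random

def single_point_crossover(parent_a, parent_b, rng=random):
    """Fixed-length single-point crossover (Figure 2.2): one shared cut point."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have identical length")
    cut = rng.randint(1, len(parent_a) - 1)  # interior cut; needs >= 2 genes
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]

def variable_length_crossover(parent_a, parent_b, rng=random):
    """Variable-length variant (Figure 2.3): an independent cut in each parent."""
    cut_a = rng.randint(1, len(parent_a) - 1)
    cut_b = rng.randint(1, len(parent_b) - 1)
    # Swapping the end segments lets offspring be shorter or longer than parents.
    return parent_a[:cut_a] + parent_b[cut_b:], parent_b[:cut_b] + parent_a[cut_a:]

def mutate(genotype, alphabet, rate, rng=random):
    """Replace each gene by a random alphabet symbol with probability `rate`."""
    return [rng.choice(alphabet) if rng.random() < rate else gene
            for gene in genotype]

def evolve(population, fitness, alphabet, generations=20, rate=0.05, rng=random):
    """One possible reading of the cycle in Figure 2.1 (all schemes assumed)."""
    size = len(population)
    for _ in range(generations):  # assumed termination: fixed number of cycles
        def pick_parent():
            # Assumed parent selection: binary tournament on fitness.
            first, second = rng.sample(population, 2)
            return first if fitness(first) >= fitness(second) else second

        offspring = []
        while len(offspring) < size:
            children = variable_length_crossover(pick_parent(), pick_parent(), rng)
            for child in children:
                offspring.append(mutate(child, alphabet, rate, rng))
        # Assumed survivor selection: keep the fittest of parents plus offspring.
        population = sorted(population + offspring, key=fitness, reverse=True)[:size]
    return population
```

With the variable-length operator the combined length of the two offspring equals the combined length of the parents, but each offspring can be shorter or longer than either parent, which is what allows genotype size to adapt under selection.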
Since environmental influences are generally not modelled in evolutionary algorithms, the genotype-phenotype transformation is a direct mapping and can therefore be considered as part of a fitness function that works directly on the genotype or a directly mapped phenotype.

Multi-level Representation of Designs

The representation of a design used by the computer will not generally be readily interpretable for a human user. A transformation will therefore be used to transform the design from the internal representation, for example a list of points, into a human-readable representation, for example a drawing (Figure 2.4). As described earlier (Section 2.1.2), a distinction can be made in evolutionary algorithm applications between the genotypes and the phenotypes of individuals. In this case, up to three different representations can be involved:

- the genotype representation
- the phenotype representation
- a human-readable representation

[Figure 2.4: Internal representation and human-readable representation. Internal representation: ((0,0) (4,0)) ((4,0) (4,4)) ((4,4) (0,4)) ((0,4) (0,0)); human-readable representation: a drawing]

In most applications of evolutionary algorithms in design, two of these representations are identical. In systems where user feedback is used for fitness calculation (for example Graf and Banzhaf, 1995; Nagasaka et al., 1995) the phenotype has to be human-readable. In many other systems, no distinction between genotype and phenotype is made; the fitness is calculated directly from the genotype.

The transformations between genotype space and phenotype space and between phenotype space and human-readable phenotype space can be one-to-one or n-to-one. If non-linear transformations are used, they are generally not reversible; for example, it might not be possible to determine what genotype would be required to produce a certain phenotype (see Section 4.1). Figure 2.5 shows a schematic illustration of the different spaces. The genotype in this case is composed from the alphabet {0, 1, 2, 3}; transformations (which the user does not have to understand) are used to map the genotype into a phenotype and into a human-readable phenotype. In this figure, the genotype-phenotype transformation is an n-to-one transformation, while the transformation into human-readable form is one-to-one. N-to-one genotype-phenotype transformations are often generative procedures (e.g. grammars), where the genotype can be described as a "recipe", as opposed to a "blueprint" in the one-to-one mapping.
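The three representation levels and an n-to-one genotype-phenotype transformation can be illustrated with a small sketch. The interpretation of the alphabet {0, 1, 2, 3} as unit moves of a drawing pen is an assumption made for this example only; the thesis does not specify the transformation behind Figure 2.5, and the names `MOVES`, `develop` and `render` are hypothetical.

```python
# Hypothetical "recipe" interpretation: each gene moves a pen by one unit.
MOVES = {0: (1, 0), 1: (0, 1), 2: (-1, 0), 3: (0, -1)}

def develop(genotype, start=(0, 0)):
    """Genotype -> phenotype: the set of undirected unit segments drawn."""
    x, y = start
    segments = set()
    for gene in genotype:
        dx, dy = MOVES[gene]
        nxt = (x + dx, y + dy)
        # Sorting the end points discards the drawing direction.
        segments.add(tuple(sorted([(x, y), nxt])))
        x, y = nxt
    return frozenset(segments)

def render(phenotype):
    """Phenotype -> human-readable form: one segment per line, sorted."""
    return "\n".join(f"{a} -- {b}" for a, b in sorted(phenotype))

# Two different recipes for the same unit square (n-to-one mapping).
counter_clockwise = [0, 1, 2, 3]  # right, up, left, down
clockwise = [1, 0, 3, 2]          # up, right, down, left
assert develop(clockwise) == develop(counter_clockwise)
```

Because the set of segments records neither drawing order nor direction, a phenotype in this sketch cannot be mapped back to a unique genotype, which mirrors the non-reversibility noted above; the rendering step, in contrast, is one-to-one.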
[Figure 2.5: Different representation spaces involved in design using evolutionary computations: (a) transformations between the spaces (panel labels: genetic operations, fitness, user; genotype space, phenotype space, human-readable space); (b) example members of the spaces (genotype; phenotype ((0,0) (4,0)) ((4,0) (4,4)) ((4,4) (0,4)) ((0,4) (0,0)); human-readable phenotype)]

The search space therefore consists of up to three different spaces: a genotype search space, a phenotype search space, and a search space of human-readable designs.

The size and extent of the genotype search space is defined by the genotype representation. If only one-to-one transformations are used, the number of elements in each of the search spaces will be identical. In other cases, the phenotype and human-readable search spaces can have fewer elements. It is also possible to have a one-to-n genotype-phenotype transformation, where the phenotype produced is either selected randomly from a set of phenotypes (for example in stochastic L-systems, see Prusinkiewicz et al., undated), or where interaction with the environment is taken into account in the genotype-phenotype translation.

A general recommendation for representations used in evolutionary algorithms is that the representation should be suggested by the problem at hand (Fogel and Angeline, 1997); in other words, the units of representation should be meaningful for the particular application. Two more specific criteria that influence the choice are (Hammel, 1997):

- the representation should not exclude any possibly interesting designs, and
- the representation should allow only possibly interesting designs to be represented.

"Possibly interesting" in this context means that there might exist a set of conditions where the user would want this design as output of the design process. The second condition is intended to restrict the size of the search space and therefore the complexity of the search process.

2.2 Human and Computational Creativity

Human creativity is an extremely interesting phenomenon. While every person has a notion of creativity, it proves to be very hard to define. Expressions like "a creative solution" are commonly used, but it is difficult to give a reason for exactly why something is considered to be creative. When attempts are made, two characteristic requirements are usually mentioned: an item has to be novel and it has to be useful. The usefulness requirement is intended to exclude simple "random novelty": a creative design has to fulfil a purpose.
The requirement of novelty needs a few more considerations. On one side, novelty can only be relative; a frame of reference is required. For creative processes, this reference could be either the creator of an artefact, or the observer. Since each observer would have different experiences, the judgement of creativity relative to the observer could be different as well. A consistent, and therefore useful, judgement can only be made if the novelty is judged with respect to the creator, a concept called psychological creativity or P-creativity in Boden (1995).

On the other side, the concept of novelty itself is not restricted enough. The sentences in this paragraph are (hopefully) useful, and many of them are probably novel; however, they would not be called creative. As Boden points out: "A merely novel idea is one which can be described and/or produced by the same set of generative rules as are other, familiar ideas. A genuinely original, or creative, idea is one which cannot" (Boden, 1995). As she says later, transformations of the conceptual space are therefore required before any genuinely novel object can be produced. In computational terms, this requirement has been defined as the addition of variables and the extension of the search space (Gero, 1994).

Given that creativity itself is very difficult to define, it is not surprising that precise computational definitions are rare. Two attempts to define necessary conditions for a creative process will be summarized in this section. Additional work on creative computational processes will be discussed in Chapter 6.

Creativity as Introduction of Variables

Creative design has been characterized in computational terms as that design activity which occurs when a new variable is introduced into the design (Gero, 1990). This is opposed to routine design, where knowledge about variables, objectives expressed in terms of those variables, constraints expressed in terms of those variables and the processes needed to find values for those variables, are all known a priori. A third alternative, innovative design, occurs when no new variables are introduced, but one or more variables are used with values outside the usual scope.

In computational terms, routine design can also be seen as search within a fixed, predefined search space (the n-dimensional design space). This is also sometimes referred to as exploitation. For creative design, where the search proceeds outside the boundaries of a predefined search space, the term exploration can be used (see Gero, 1990).

Creative design according to this definition requires the addition of one or more variables, but it does not require that the other variables remain unchanged. There are therefore two cases that can be distinguished:

- one or more variables are added, while the other variables remain part of the design (additive case), and
- one or more variables are added, while at the same time one or more other variables are deleted from the design (substitutive case).

The two cases can best be visualized in terms of design space. If a design has n variables, then these variables can be seen as coordinates in an n-dimensional space. Every possible design can thus be mapped onto a point in this space, the design space.
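The distinctions drawn above between routine, innovative and creative design, and between the additive and substitutive cases, can be expressed as a small classification sketch. It assumes that a design space is described by a mapping from variable names to (low, high) value ranges, and that "values outside the usual scope" corresponds to a widened range. This encoding, the function name and the example variables are illustrative assumptions and are not taken from Gero (1990).

```python
def classify_change(old_space, new_space):
    """Classify a change of design space.

    Both arguments map a variable name to its (low, high) value range.
    """
    old_vars, new_vars = set(old_space), set(new_space)
    added = new_vars - old_vars
    removed = old_vars - new_vars
    if added:
        # New variables: creative design, additive or substitutive case.
        return "creative (substitutive)" if removed else "creative (additive)"
    if removed:
        # Only removing variables is not covered by the definitions above.
        return "unclassified"
    widened = any(
        new_space[name][0] < old_space[name][0]
        or new_space[name][1] > old_space[name][1]
        for name in old_vars
    )
    return "innovative" if widened else "routine"

# Hypothetical two-variable design space for a rectangular room (metres).
room = {"width": (2.0, 10.0), "depth": (2.0, 10.0)}
wider = {"width": (2.0, 15.0), "depth": (2.0, 10.0)}
curved = {"width": (2.0, 10.0), "depth": (2.0, 10.0), "wall_radius": (1.0, 5.0)}
swapped = {"width": (2.0, 10.0), "wall_radius": (1.0, 5.0)}
```

In this sketch `classify_change(room, wider)` corresponds to expanding the borders of the search space, `classify_change(room, curved)` to adding a dimension, and `classify_change(room, swapped)` to moving the search space by substitution.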
[Figure 2.6: Changing search space: (a) original search space, (b) search space with expanded borders, (c) moved search space due to variable substitution, (d) expanded search space due to variable addition]

Figure 2.6 illustrates the concepts described above in a simplified, schematic way. The original search space with two dimensions is illustrated in Figure 2.6(a). Figure 2.6(b) shows an instance of innovative design (as defined above), where no new variables are introduced, but the existing ones use values beyond the usual range. Figure 2.6(c) and (d) illustrate the two possibilities of creative design: moving the search space by substituting variables (c) and expanding the search space by adding variables (d).

[Figure 2.7: Variable addition: (a) homogeneous addition, (b) heterogeneous addition]

With respect to the addition of variables, again two cases are distinguished:

- homogeneous addition: the variable added is of the same type as other variables already in the design (for example, transforming an irregular pentagon into a hexagon by adding a side, see Figure 2.7(a), but see Section 4.2), and
- heterogeneous addition: the variable added is of a different type from those already in the design (for example, changing one side of a triangle into a section of a circle; the variable added is the radius of the circle, see Figure 2.7(b)).

Creativity as Exploration and Transformation

Maher, Boulanger, Poon and Gomez (1995) characterize the way designs are generated by two variables: exploration and transformation. Exploration describes the motion through an existing design space. This variable can assume the values mundane, novel and original. Mundane exploration uses mostly converging operations, whereas original exploration uses diverging operations to a large extent. Transformation characterizes the way the design space is changed. The values mundane, novel and original for transformation are defined corresponding roughly to the definitions of routine, innovative and creative design in Gero (1990).
Figure 2.8: Framework to classify design processes in terms of exploration and transformation (from Maher, Boulanger, Poon and Gomez, 1995). Zones marked in the figure: creative zone, transition zone, non-creative zone.

                     Transformation (T)
Exploration (E)      mundane (m)   novel (n)   original (o)
original (o)         oe/mt         oe/nt       oe/ot
novel (n)            ne/mt         ne/nt       ne/ot
mundane (m)          me/mt         me/nt       me/ot

Figure 2.8 shows how in this framework the two variables define a matrix that can be used to characterize design processes. To be classified as creative, a process needs to use non-mundane levels of exploration or transformation. The area of original exploration and transformation is excluded, because it is assumed to be highly unstable. Maher, Boulanger, Poon and Gomez point out that because of the inherent difficulty in measuring creativity, the framework is only intended to give a measure of the potential of being able to produce creative designs.

Chapter 3 Evolved Representations

As described in the introduction, the choice of a representation will have an influence on the result of a design process using this representation. Any particular representation might make some designs impossible to generate, and make some designs less likely to be produced than others. The first of these effects is often used to restrict the size of the search space; the second, however, is generally avoided, because it is implicit and difficult to predict and control. In this chapter, it will be shown that it is possible to create representations that bias a search process in a predictable and controllable way. This means that the representation introduces a user-controllable focus into the design process. Designs inside this focus, those showing user-defined features, have a higher probability of being the result of the design process than designs from other areas of the search space.

3.1 Using the Representation to Focus Design Processes

To make use of the influence that representations can have on a design process, any implicit bias introduced by the representation has to be replaced by a bias that is both predictable and controllable by the user. The goal is therefore to find a way to create a representation for an evolutionary system that transforms the search space such that designs showing certain preferred attributes are more likely to be generated. This goal can be divided into two parts.

Identify a method to influence a design process in a predictable way by modifying the representation used in this process. Many different representations are used for design, and variations on these representations influence the design process in different ways. The method should be applicable to different representations in different applications.

Design a mechanism that allows the creation of such a representation for particular design problems. Ideally, this mechanism would require little user interaction.
The following section presents an intuitive view of the algorithm that has been developed, and the sections following it describe how this algorithm achieves the two parts of the goal described here.

3.2 Basic Implementation Schema

The general schema of the algorithm is shown in Figure 3.1. The first step is the same as in any design computing application: the definition of a representation. In this case, however, this is only an initial representation, referred to as the basic representation. It is designed to allow a very large search space, including as many possibly interesting designs as possible, and is therefore usually very basic and low-level. In the example, the basic representation is based on squares that can be connected to form larger shapes (Figure 3.1(a)).

Figure 3.1: Basic concept of use of evolved representations in creative design. Panels: (a) basic representation, (b) design examples, (c) basic and evolved representation, (d) new design using evolved representation. Stages: creation of evolved representation, use of evolved representation.

In the second step, the system is put into a training situation, where a search algorithm using the initial representation is set to solve a simple design task: to produce phenotypes that are (partial) copies of a set of examples given to the system. During this phase, a meta-level process observes how the basic representation is used. It identifies patterns in the genotypes of the individuals that are particularly successful, and modifies the representation used by the system by adding symbols for these patterns. The result is a new, complex or evolved representation, biased in favour of common features in the designs produced in the training session. In Figure 3.1, L-shapes appear in the design examples (Figure 3.1(b)); the representation is therefore expanded by adding a symbol for this shape (Figure 3.1(c)).

The designs produced in this step, copies of the examples, are discarded. However, the evolved representation provides the required focus, centred around the examples. A regular search algorithm, using this evolved representation, can then be used to produce new designs that are likely to be similar to the examples.
The effect of the evolved representation depends on whether the basic representation is replaced by the evolved representation or whether the evolved representation is added to the basic representation. In this example, the L-shape is only added, and the new designs can use both the original square and the L-shape (Figure 3.1(d)).

This basic algorithm solves the two parts of the goal specified in Section 3.1.

The representation is manipulated by adding new symbols to the alphabet used in the representation. These additional symbols represent certain features, and by introducing them into the representation, designs containing these features are favoured; in other words, the additional symbols create a focus in the search space.

The additional symbols can be identified automatically, using machine learning. No user interaction is required, only the provision of a set of example designs.

Both points will be elaborated in the following sections. A system as described here also allows for the transformation of the focus thus created, simply by modification of the representation. This feature is important in connection with creative design, as will be shown in a later chapter.

Use of Evolutionary Algorithms

The creation of an evolved representation from a training situation requires two main features: the ability to produce copies or partial copies of the examples without additional knowledge or supervision, and a flexible and easily modifiable representation. At the same time, the evolved representation is intended to be used to produce new designs. Therefore, it has to be compatible with the search method that will be used to produce these new designs.

Evolutionary algorithms seem to fit these requirements very well. They require only a fitness value, which can easily be calculated from a comparison between phenotypes and the example designs. As will be shown in the following sections, they also allow for the manipulation of the representation during the search. Finally, as the large body of existing design systems using evolutionary algorithms shows, they are also very well suited to the generation of the new designs. This means that the same type of algorithm can be used for the creation and for the use of the evolved representation, and the compatibility of the representation is therefore ensured.

3.3 Influencing Search Space using Evolved Representations

In evolutionary algorithms, a bias towards particular designs can be introduced either in the genotype representation, or in the genotype-phenotype transformation. For example, using evolutionary algorithms with variable-length genotypes, individuals with short genotypes are generally easier to find than individuals with long genotypes. Similarly, if the genotype-phenotype transformation is such that particular phenotypes can be represented by many genotypes, these phenotypes would be expected to be easier to find than phenotypes that are represented by only one genotype.
In the method presented here, the biggest influence on the search process comes from the first effect: designs with certain desired features are represented with shorter genotypes, and are therefore easier to find. However, the method also introduces new ways to represent these designs, which again improves the chances of these designs in the design process.

To illustrate the creation of a focus, an evolutionary system with a string representation is used. In such a system, the genotypes are strings of fixed or variable length, constructed from symbols of a predefined alphabet (see Section 2.1). To create a focus in the search space of such an evolutionary algorithm, the representation used in this algorithm is modified by the introduction of additional symbols to the original alphabet. To distinguish the introduced symbols, they will be referred to as evolved genes, while the original symbols will be called basic genes.

Evolved genes can be used together with the basic genes to produce new genotypes. As a result, two different kinds of genotypes can be distinguished: genotypes that use only basic genes (referred to as basic-level genotypes), and genotypes that also use evolved genes (referred to as evolved-level genotypes). This introduces another representation level, as the genotype representation is now split into a basic genotype representation (or basic representation) and an evolved genotype representation (or evolved representation), as shown in Figure 3.2.

Figure 3.2: Additional layer of representation caused by the use of evolved genes. Layers shown: evolved-level genotype and evolved representation (A = < 0 0 >, C = < 2 2 2 2 >), basic-level genotype, phenotype ((0,0) (4,0)) ((4,0) (4,4)) ((4,4) (0,4)) ((0,4) (0,0)), and human-readable phenotype.

The evolved genes are defined such that each evolved gene represents a certain combination of basic genes (see Section 7.1 for details). As Figure 3.2 shows, evolved-level genotypes can therefore be transformed into basic-level genotypes by replacing each evolved gene with the set of basic genes it represents. For example, if the gene A in the figure appears in an evolved-level genotype, it indicates that in the corresponding basic-level genotype, this and the next position are filled by basic gene 0; evolved gene C indicates a sequence of four basic genes 2. The original genotype-phenotype transformation can then be used to generate a phenotype. In the general case, the evolved-level genotype to basic-level genotype transformation is an n-to-one transformation: there will be many different ways to represent the same basic-level genotype using evolved genes.

Transformation of the Search Space

Since the evolved genes are represented by single symbols in the genotype, they are atoms for the evolutionary operations. At the same time, they represent a combination of basic genes, effectively encapsulating this gene combination. As a result, this gene combination cannot be broken up by any genetic operation. It can still be removed as a whole, but its chances of surviving a genetic operation are much higher than for all other gene combinations that occupy the same positions on the basic-level genotype, but are not encapsulated into an evolved gene. The more basic genes an evolved gene encapsulates, the stronger this effect is. Similarly, if evolved genes are used in the creation of an initial population, then the gene combinations represented in the evolved genes will have a higher chance of being represented in an individual than any other, random combinations of basic genes.
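The translation from evolved-level to basic-level genotypes, and the protection that encapsulation gives, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene table follows the A and C genes of Figure 3.2, the function names are hypothetical, and the survival estimate assumes a one-point crossover with a uniformly chosen cut point, an operator this section does not itself prescribe.

```python
from fractions import Fraction

# Hypothetical gene table following Figure 3.2: basic genes are integers,
# evolved genes are single uppercase symbols.
EVOLVED_GENES = {
    "A": (0, 0),        # two consecutive basic genes 0
    "C": (2, 2, 2, 2),  # four consecutive basic genes 2
}


def expand(genotype, evolved_genes=EVOLVED_GENES):
    """Translate an evolved-level genotype into a basic-level genotype."""
    basic = []
    for gene in genotype:
        if gene in evolved_genes:
            # An evolved gene may itself be built from evolved genes,
            # so its definition is expanded recursively.
            basic.extend(expand(evolved_genes[gene], evolved_genes))
        else:
            basic.append(gene)
    return basic


def survival_probability(block_length, genotype_length):
    """Chance that a block of adjacent genes is not split by one-point
    crossover (assumed operator: one cut, chosen uniformly between genes)."""
    cut_points = genotype_length - 1
    internal_cuts = block_length - 1
    return 1 - Fraction(internal_cuts, cut_points)


# Several evolved-level genotypes map onto one basic-level genotype.
print(expand(["A", "C"]))             # [0, 0, 2, 2, 2, 2]
print(expand([0, 0, "C"]))            # [0, 0, 2, 2, 2, 2]

# Four unprotected basic genes in a genotype of length 14, compared with
# the same four genes encapsulated as one symbol in a genotype of length 11.
print(survival_probability(4, 14))    # 10/13
print(survival_probability(1, 11))    # 1
```

An encapsulated combination occupies a single position, so no cut point falls inside it; the longer the unprotected combination, the more cut points it exposes, which is the effect described above.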
The effect of the introduction of evolved genes is, therefore, that certain combinations of basic genes will be advantaged in the genetic search. It follows that evolved genes can be used to bias the search of the evolutionary system in favour of a certain feature if a combination of basic genes can be identified such that the probability that the feature is present in the phenotype is higher if the gene combination is present in the genotype than if it is not.

The introduction of evolved genes can be seen as a transformation of the search space, as illustrated in Figure 3.3. The example assumes a variable-length representation where each basic gene in the genotype is directly translated into a movement of a pen in a certain direction. The original alphabet therefore has four members, shown in the figure as a, b, c and d. The genotype-phenotype transformation transforms each letter into the movement of a pen; for example, each occurrence of the symbol a in a genotype results in an upward movement of the pen of one unit length. The genotype dacb describes a simple square, constructed by the movement of the pen one unit to the left, one unit upwards, one unit to the right, and one unit down. The genotype dbca represents the same square; however, the pen ends up at a different corner of the square. In the figure, the endpoint of the movement is indicated with an arrow.

The search space can be illustrated by a number of concentric circles, each defining the space of designs that can be defined by a genotype of a certain length. The inner circle contains the designs represented by genotypes of length one, in other words the basic genes translated into phenotypes. The further away a design (or part of a design) is from the centre, the longer the genotype required to represent it, and with it the larger the space that has to be searched to arrive at this design.

Figure 3.3: Example of an evolved representation: (a) original representation; (b) representation with evolved genes. Some of the corresponding genotypes are given; capital letters denote evolved genes. The transformation from phenotype to genotype is not always unique, e.g. the genotypes ABc and BAc produce the same phenotype. Arc segments indicate that only part of the space is shown.

The original search space is illustrated in Figure 3.3(a), with the four basic genes in the centre. The second circle shows all designs that can be derived from genotypes of length two (i.e., using two vectors). The other circles give some examples of designs using genotypes of length three, four and five. Every time an evolved gene is created, the structure of the search space is changed. The state of the new gene in the search space is moved into the centre, all design states in the next circle that can be derived from that state are moved into the second circle, and so on.
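The pen-movement example can be made concrete with a small Python sketch. The direction of each basic gene is an assumption inferred from the description of dacb (left, up, right, down), and the function names are hypothetical. Shapes are shifted so that their lower-left corner lies at the origin, which allows two drawings to be compared independently of where the pen started.

```python
# Assumed directions, inferred from the text: "dacb" is described as
# left, up, right, down, so a = up, b = down, c = right, d = left.
MOVES = {"a": (0, 1), "b": (0, -1), "c": (1, 0), "d": (-1, 0)}


def draw(genotype):
    """Decode a basic-level genotype into (segments, end_point).

    The drawing is shifted so that its lower-left corner is at (0, 0);
    segments are stored without direction, so retracing a line or drawing
    it in the opposite direction gives the same shape.
    """
    x, y = 0, 0
    points = [(x, y)]
    for gene in genotype:
        dx, dy = MOVES[gene]
        x, y = x + dx, y + dy
        points.append((x, y))
    min_x = min(p[0] for p in points)
    min_y = min(p[1] for p in points)
    shifted = [(px - min_x, py - min_y) for px, py in points]
    segments = {frozenset(pair) for pair in zip(shifted, shifted[1:])}
    return segments, shifted[-1]


def genotypes_of_length(alphabet_size, length):
    """Number of distinct genotypes on one circle of the search space."""
    return alphabet_size ** length


square_1, end_1 = draw("dacb")
square_2, end_2 = draw("dbca")
print(square_1 == square_2)   # True: both genotypes draw the same square
print(end_1, end_2)           # (1, 0) (1, 1): the pen stops at different corners

# The circles grow quickly with genotype length for the four basic genes.
print([genotypes_of_length(4, n) for n in range(1, 6)])  # [4, 16, 64, 256, 1024]
```

The last line shows why designs far from the centre are expensive to reach: each additional basic gene multiplies the number of candidate genotypes by the size of the alphabet.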
As an example of this restructuring, if an evolved gene is introduced for each of the combinations of four consecutive basic genes that represent the two closed shapes in the fourth circle, the search space changes as shown in Figure 3.3(b). The squares are now represented directly by an evolved gene, and the shapes on the fifth circle that are derived from the squares can now be found in the second circle. The more evolved genes a design state involves, the more it is moved towards the centre. For example, the shape with the four squares that is now on the fifth circle (that is, can be constructed from genotypes of length five) would have been on the fourteenth circle before (fourteen vectors, because the shape cannot be drawn without drawing two lines twice).

Since the introduction of a new gene increases the size of the alphabet of the representation, more genotypes of a given evolved-level genotype length exist, and the size of the search space for a given length increases. This is illustrated by using larger circles in Figure 3.3(b). However, the reduction in genotype length has a much stronger, search-space-reducing effect, as can be shown for the four-square shape. To produce this shape using basic genes only, the search space consists of all basic-level genotypes of length fourteen, with four basic genes, containing 4^14 = 268,435,456 elements. To produce the same designs using basic and evolved genes, the search space would consist of all genotypes of length five with six symbols in the alphabet, containing 6^5 = 7,776 elements.

Creating complex representations from simpler, basic representations has been shown in other work, but generally for a different purpose. See Section 6 for a discussion of such work.

3.4 Creating Evolved Representations

The previous section showed how evolved genes can be used to influence an evolutionary search in such a way that certain features are favoured. The second task is to find a way to identify the appropriate combinations of basic genes, so that the evolved representation can be created.

Creating an appropriate evolved representation is straightforward in the case where the features that are intended to be included are explicitly known, and the genotype-phenotype representation is such that it is possible to directly map those features onto gene combinations. However, neither of those conditions is usually fulfilled. Explicitly enumerating all desired features requires a large amount of user input, and the genotype-phenotype translation can be such that a reverse translation is difficult or impossible. It is therefore necessary to find a different method to create the evolved representation.

Machine learning can provide such a method. Figure 3.4 shows a schematic outline of a system employing machine learning to create the evolved genes. The central element is a user-provided example.
The features present in this example will provide the centre of the focus created by the evolved representation. The only user input required is the provision of this example.

Figure 3.4: Schematic representation of the proposed system to create evolved genes. Labels in the figure: Select, Create offspring, Compare, Example, Gene combination statistics, Best gene combination, Maybe introduce, Introduce, Convergence control, Population; the two loops are the sample creation cycle and the gene creation cycle.

The loop on the left of Figure 3.4 (red) is based on a conventional evolutionary system. Individuals are taken from the population (which initially is generated randomly), offspring are produced, fitnesses are calculated, and the new individuals are either discarded or introduced into the population. The fitness function in this system is a comparison between the phenotypes produced and the example. This comparison returns a high value for phenotypes that are similar to the example, and lower values for phenotypes that are less similar; the exact implementation depends on the application domain. This fitness will be referred to as similarity fitness ρ.

At the start, the individuals produced will have hardly any similarities to the example; at the end, the system might have found an identical copy of the example. In between, the system produces a large number of individuals that are similar to the example in some features. The goal of this system is not to produce the final individual, but to generate a range of individuals that contain a large variety of features from the example in a variety of combinations. Some additional control may therefore be necessary to prevent convergence of the population; this control usually influences the fitness and the way new individuals are inserted into the population.

The population generated by the evolutionary system is then used as a pool of samples to create the evolved representation. This is done in the right (blue) loop in Figure 3.4. Gene combinations that appear predominantly in sample individuals that are very similar to the example can be used to create the set of evolved genes. The assumption behind this is that the genotype-phenotype transformation is defined in a way that features in the phenotype are correlated to subsets of genes on the genotype. This does not have to be a direct mapping; it is sufficient when:

the probability that the feature exists in the phenotype is higher if the gene combination can be found in the genotype than if not, and

the probability that the gene combination can be found in the genotype is higher if the feature exists in the phenotype than if not.

Throughout this thesis, expressions like "a feature represented by a gene combination" refer to this kind of correlation. They do not generally imply that it is always possible to directly translate the gene combination into a feature, though in the two example implementations this is indeed the case.

A result of the probabilistic nature of the evolutionary process is that the evolved representation created for an example is not unique. Instead, different runs will produce different evolved representations, each creating a focus around the example, but each slightly different from the others.

To reduce the computational cost of identifying the best new gene combination, it is possible to take only combinations of two existing genes into account. These existing genes can be either basic genes or evolved genes; any new evolved gene can therefore be composed of either two basic genes, two evolved genes, or a basic gene and an evolved gene. Section 7.3 will describe this in more detail.
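The restriction to combinations of two existing genes can be sketched as follows. This is a minimal illustration, not the procedure of Section 7.3: the scoring rule (summing the similarity fitness of every sample individual that contains an adjacent gene pair), the function names and the sample data are all assumptions made for the example.

```python
from collections import defaultdict


def best_adjacent_pair(samples):
    """Pick the adjacent gene pair found in the most similar individuals.

    samples: list of (genotype, similarity_fitness) tuples, where a genotype
    is a list of gene symbols (basic or evolved).
    Assumed scoring rule: a pair collects the fitness of every individual
    it occurs in, counted once per individual.
    """
    scores = defaultdict(float)
    for genotype, fitness in samples:
        for pair in set(zip(genotype, genotype[1:])):
            scores[pair] += fitness
    return max(scores, key=scores.get)


def introduce_gene(genotype, pair, symbol):
    """Rewrite a genotype so that each occurrence of `pair` uses `symbol`."""
    result, i = [], 0
    while i < len(genotype):
        if tuple(genotype[i:i + 2]) == pair:
            result.append(symbol)
            i += 2
        else:
            result.append(genotype[i])
            i += 1
    return result


# Hypothetical sample pool: genotypes with their similarity fitness.
samples = [
    (["d", "a", "c", "b"], 0.9),
    (["d", "a", "b", "b"], 0.6),
    (["c", "c", "d", "a"], 0.5),
    (["b", "b", "c", "c"], 0.1),
]

pair = best_adjacent_pair(samples)
print(pair)                                              # ('d', 'a')
print(introduce_gene(["d", "a", "c", "b"], pair, "A"))   # ['A', 'c', 'b']
```

Because the new symbol can itself take part in later pairs, repeating the two steps builds evolved genes from two basic genes, two evolved genes, or one of each, as described above.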
In cases where the genotype-phenotype transformation allows converting gene combinations directly into features, this construction of high-order genotypes can be interpreted as a creation of large building blocks by combining smaller ones. Figure 3.5 shows how a building block with seven elements can be assembled in four steps, from both basic blocks and lower-order building blocks. The building block that is added in the third step would in turn have been created from basic blocks in a similar process.

Figure 3.5: Interpretation of the creation of high-order evolved genes as the combination of building blocks (legend: basic gene, evolved gene).

Creating complex evolved genes by combining simpler evolved genes can be described as a bottom-up process, in the sense of artificial life research, where complex behaviours and structures are the result of the interaction of a number of simpler behaviours or structures.

Feedback into the evolutionary system

It is possible to run the evolutionary system until a sufficient number of samples are generated, and then run the gene extraction to create all evolved genes. However, given the bottom-up construction of complex evolved genes, a different approach offers itself: phases of sample creation and gene extraction can be interwoven. The evolved genes created in the gene extraction can be added to the representation and introduced into the population. In the early stages, where the individuals produced contain only little knowledge about the examples, simple evolved genes are produced and introduced; in later stages, more complex evolved genes will be introduced. The evolved genes thereby continually improve the representation used in the evolutionary process, helping it to produce larger and better fitting individuals (see also Sections 6.1 and 9.1.4).

If this strategy is adopted, it is especially important to calculate the similarity fitness ρ in a way that ensures a gene combination is in fact related to a feature, because the first gene extractions will occur in early phases of the run of the evolutionary system, where the individuals still differ strongly from the examples.
Any gene combination, even if it occurs in high-fitness individuals, could otherwise simply reflect random influence from the initial population, instead of features in the example.

3.5 Creating New Designs Using Evolved Representations

When a set of evolved genes has been created, it can be used to provide a focus for the generation of new designs. The evolved genes are used to create a new representation, and a new random initial population is created using this representation. A conventional evolutionary system can then be run to produce new designs, using a fitness function that represents user-defined design criteria.

Depending on how the evolved genes are used, their effect on the search space can be different. If the evolved genes are added to the basic representation, the system can still use basic genes at any place in the genotype if the fitness requires it. Therefore, the set of genotypes in the basic-level genotype search space and the resulting set of designs in the phenotype space are not changed, only the probability that some designs are found. This will be referred to later as soft focus. If, on the other hand, the evolved genes replace the basic genes, only some basic-level genotypes can be produced, and therefore the set of designs in the phenotype space is restricted to a subset of all phenotypes possible with the basic representation. This will be referred to as hard focus.

In most applications, the soft focus approach is more appropriate, since it still allows the adaptation of the design to any specific design criteria. In those cases, the outcome of the design is influenced by two forces: the influence of evolved genes on the initial population and on the genetic operations, and the selection for or against phenotypes containing certain features. Kauffman (1993) shows that on general rugged, multi-peaked fitness landscapes, large bands of near-constant fitness often exist, where selection therefore has no influence on the population. Released from random points in the fitness landscape, the population usually ends up in those bands. Inside a band, other forces, usually weaker than selection, can influence the population. Local optima inside regions favoured by such a force are then more likely to be reached than other local optima in other regions.
Rugged fitness landscapes usually result when the influence of a gene in the genotype on the fitness of the phenotype depends on a number of other genes in the genotype, a condition that certainly holds for most situations where evolutionary algorithms are used in design. The evolved representations can then be seen as a force that controls the neutral drift of the population towards the focus in the search space. The evolved genes will only introduce those features that are positive or neutral with respect to the user-provided fitness.

If no basic genes are used in the evolved representation, the evolutionary design system has no choice but to use the evolved genes. The choice of evolved genes, however, is again dependent on how the use of the specific evolved genes interacts with the fitness function.
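The difference between soft and hard focus comes down to which alphabet the new initial population is drawn from. The following Python sketch illustrates this for the pen-movement alphabet of Section 3.3; the two evolved genes A and B, their definitions and all function names are hypothetical choices for the example.

```python
import random

BASIC_GENES = ["a", "b", "c", "d"]
# Hypothetical evolved genes, each standing for four basic genes.
EVOLVED_GENES = {"A": "dacb", "B": "dbca"}


def alphabet(focus):
    """Soft focus adds the evolved genes; hard focus replaces the basic genes."""
    if focus == "soft":
        return BASIC_GENES + list(EVOLVED_GENES)
    if focus == "hard":
        return list(EVOLVED_GENES)
    raise ValueError("focus must be 'soft' or 'hard'")


def random_genotype(length, focus, rng):
    """Draw one member of a random initial population."""
    symbols = alphabet(focus)
    return [rng.choice(symbols) for _ in range(length)]


def expand(genotype):
    """Translate an evolved-level genotype into its basic-level string."""
    return "".join(EVOLVED_GENES.get(gene, gene) for gene in genotype)


rng = random.Random(0)  # fixed seed so that runs are repeatable
soft = random_genotype(5, "soft", rng)
hard = random_genotype(5, "hard", rng)

print(len(alphabet("soft")), len(alphabet("hard")))  # 6 2
print(expand(soft))  # any basic-level genotype remains reachable
print(expand(hard))  # always a concatenation of "dacb" and "dbca" blocks
```

Under soft focus every basic-level genotype can still be produced, only with changed probability; under hard focus the expanded genotypes are restricted to sequences of the encapsulated combinations.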

Chapter 4 Computational Creativity

When can a computational process be called creative? It seems there are two ways to give a claim of creativity some foundation. One is to derive the process directly from observations of particular processes of human creative design, for example the use of analogies and emergence. Both of these processes are assumed to play a role in human creative behaviour, and a number of computational processes have subsequently been developed that use analogies (Qian and Gero, 1992; Bhatta et al., 1994; Wolverton and Hayes-Roth, 1995; Qian and Gero, 1995) and emergence (Gero and Jun, 1995; Nagasaka et al., 1995; Poon and Maher, 1996; Grabska and Borkowski, 1996) to create or facilitate creative design.

The second way is to try to define a general characterization in computational terms of human creative design activity, and use this to guide the development of a computational process. This top-down approach allows the use of computational techniques and methods that are not related to any specific human cognitive behaviour, as long as they correspond to the general characterization. It might, for example, allow the use of evolutionary algorithms and neural networks, which most likely do not play any role in human creativity.

Section 2.2 has already described some attempts to develop procedural definitions of creativity. The definitions involve concepts related to transformation or expansion of search spaces (for computers) and conceptual spaces (for humans). None of the definitions, however, says anything about how such behaviour can be achieved. Where do new rules come from? How can the conceptual space be transformed? Or, in computational terms, where do the new variables come from? This chapter will look at search spaces in the context of a finite system, and how a computational process can expand a search space.
4.1 Finite Systems and Closed Worlds

The fact that a process is running as a program inside a computer introduces a set of theoretical limitations. The two most important in the context of design computing are:

1. The size of memory available to the program is limited; this means that the total number of different states the program can assume is limited (finite system).

2. The computing power available to run the program is limited; this means that the number of different states a program can assume in any specific time span is limited.

These limitations have a strong influence on what a computational design process can and cannot do. The main implications are:

1. Since each different design produced is connected to a different state of the machine, and the number of states is limited (limitation one), the total set of different designs that a program can generate is limited and defined a priori. Another way of putting this restriction is that every design has to be represented by design variables, and each variable has to be stored in memory, limiting the number of variables and therefore designs.

2. Due to limitation two, the set of different designs that a program can produce and evaluate in an acceptable time frame is limited. In practice, this number is usually much smaller than the set of possible designs. This means that the space searched usually has to be much smaller than the space of possible designs.

3. The set of designs a program could produce in a limited time frame is not generally known a priori. As Langton (1988a) observes, Turing's halting theorem can be expanded to show that it is impossible in the general case to determine any nontrivial property of the future behaviour of a sufficiently powerful computer from a mere inspection of its program and initial state alone. Even with a knowledge of the representation and the program, it might therefore not be possible to predict which points in the search space the system can assume, and therefore which designs can possibly be the outcome of the design process.

4. Due to limitations one and two, the evaluation of the design can take into account only a certain, limited set of interactions between a design and its environment. For example, individuals in an artificial life application can develop vision only if (a) the individuals have access to some kind of optical sensory organs, and (b) in every time-step of the evolution, a simulation is run to calculate what each individual would see (as is done for example in Yaeger, 1992). However, even this would not allow individuals to develop, for example, flight.
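A rough numerical illustration of the first two implications is given below. The memory size, evaluation rate and time budget are arbitrary assumptions chosen for the example; only the search space of 4^14 basic-level genotypes is taken from Section 3.3, and the function names are hypothetical.

```python
def max_states(memory_bits):
    """Upper bound on the distinct states of a memory with the given size."""
    return 2 ** memory_bits


def evaluable_designs(evaluations_per_second, seconds):
    """Designs that can be produced and evaluated within a time budget."""
    return evaluations_per_second * seconds


# Limitation one: even one byte of state bounds the set of designs a priori.
print(max_states(8))                      # 256

# Limitation two: assumed budget of 1000 evaluations per second for one hour,
# compared with the 4^14 basic-level genotypes of the four-square example.
search_space = 4 ** 14
budget = evaluable_designs(1000, 3600)
print(search_space, budget)               # 268435456 3600000
print(round(budget / search_space, 4))    # 0.0134
```

Under these assumed figures, little more than one percent of the space can be examined, which is the situation implication 2 describes: the space actually searched has to be much smaller than the space of possible designs.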
The main consequences of the limitations are therefore that the total set of designs that can be produced is fixed, and that usually only a small part of it can be tested by the design process in acceptable time. If the design process is seen as a search process, this means that the search will always proceed inside a predetermined search space, which can be referred to as the meta search space, and that of the designs in this meta search space, only a small fraction can be tested by the search process.

However, there are still a number of methods by which this search space can be searched by the design process. The following discussion and the illustration in Figure 4.1 assume that one design exists in the meta search space that can be considered the optimum in terms of design performance; but the same methods are also applicable if more than one design with equally acceptable performance exists.

1. A process can search only a small search space, accepting the outcome of this search, even if it represents only a local optimum, and better designs lie outside the search space. For example, the search space can be restricted to designs where methods to optimize them analytically are known. This method is illustrated in Figure 4.1(a).

2. Using domain knowledge, a search space can be created that is known to contain the desired design. This could be the case in a situation where, say, theoretical analysis shows that all designs outside a certain space give results that violate one or more of the design restrictions. The method is illustrated in Figure 4.1(b).

3. Searching for the global optimum in a large space without heuristics. This could either use a random search or attempt to enumerate all possible designs, Figure 4.1(c).

4. Using forms of domain knowledge, including soft knowledge and heuristics, to guide the search for the best design in a very large search space. Search strategies such as following the local gradient (hill-climbing), simulated annealing (Kirkpatrick et al., 1983) and evolutionary algorithms fall under this method, as well as guaranteed methods such as logic programming. This use of local knowledge is illustrated by arrows in Figure 4.1(d).

5. Focussing the search onto a sub-space. The sub-space is searched with any of the other methods. Domain knowledge is used to change the focus inside the total search space, as is shown in Figure 4.1(e).

Figure 4.1: Methods to search a large search space (black dot: global design optimum; broken line: meta search space, bounded by restriction one; continuous line: search space searched): (a) focus on a subspace, possibly excluding optimum; (b) use domain knowledge to set focus to include optimum; (c) search whole search space; (d) search whole search space using domain knowledge; (e) focus on subspace, but move focus

In practice, the methods will be combined, so that what appears as the meta search space in any of the last three methods is in fact a subset of the set of theoretically possible designs, resulting from the limitation of the search space by either method one or two. With focussing, the search becomes a three-layered process, as shown in Figure 4.2: the search space is restricted from the meta search space to a smaller search space, usually using domain knowledge. In this search space, focussing creates smaller sub-spaces, which in turn are searched using local knowledge.
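Methods 4 and 5 can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the thesis: the names `hill_climb` and `focussed_search`, the box-shaped focus, and the rule of re-centring the focus on the best design found so far are all assumptions chosen for brevity. In the text the focus is moved using domain knowledge; the re-centring rule merely stands in for that knowledge.

```python
import random

def hill_climb(fitness, start, bounds, steps=200, step_size=0.1, rng=None):
    """Method 4 (sketch): local search that never leaves the box `bounds`."""
    rng = rng or random.Random(0)
    best = list(start)
    for _ in range(steps):
        # Perturb every variable slightly and clamp it to the allowed range.
        cand = [min(hi, max(lo, x + rng.uniform(-step_size, step_size)))
                for x, (lo, hi) in zip(best, bounds)]
        if fitness(cand) > fitness(best):
            best = cand
    return best

def focussed_search(fitness, meta_bounds, width=1.0, rounds=10, rng=None):
    """Method 5 (sketch): search a small box, then move the box and repeat."""
    rng = rng or random.Random(0)
    centre = [rng.uniform(lo, hi) for lo, hi in meta_bounds]
    for _ in range(rounds):
        # The focus is a box around the current centre, clipped to the
        # meta search space, so the search can never leave that space.
        focus = [(max(lo, c - width / 2), min(hi, c + width / 2))
                 for c, (lo, hi) in zip(centre, meta_bounds)]
        # Assumed rule for moving the focus: re-centre on the best design.
        centre = hill_climb(fitness, centre, focus, rng=rng)
    return centre
```

A local observer of any single round sees only the small box; a global observer sees that every box lies inside `meta_bounds`, which plays the role of the meta search space.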
While the limitations of a finite state process are related to the concept of a closed world, as used in artificial life (see for example Ackley, 1997), where it implies total reproducibility, and in logic programming (see for example Jäger, 1990), where the term implies complete knowledge, it is important to note that a closed world is not necessary for the above restrictions to apply. For example, it would be possible to replace the pseudo-random generator used in many programs with a physical device, based on thermal noise or on radioactive decay. This results in a process that is neither reproducible nor allows complete knowledge; however, the rest of the computer is still a finite state machine, and the openness of the world will have no effect on the qualitative outcome of the process.

Figure 4.2: Hierarchy of search spaces

4.2 Focussing: Creativity in a Finite System

The focussing in method 5 is very similar to the introduction of new variables and modification of search space, described in Section . In fact, the difference in definition may be seen as a difference in perspective. To a local observer, who sees only the currently accessible part of the search space, the move of focus in this method appears as a move of the search space. However, a global observer will be able to tell that all successively searched search spaces are in fact part of the larger meta search space.

An example from the literature can be used to illustrate this point. In Gero et al. (1994), an evolutionary system is used to generate beam sections, with perimeter and moment of inertia as two competing design criteria. The sections are represented using a shape grammar; the initial search space S_0 is the set of all designs that can be generated using this grammar. The authors then allow the shape grammar itself to change, so that at the end of the evolutionary process a new shape grammar is learned. With this new grammar, a different set of sections can be produced; the system is therefore using a new search space, S_n. The authors observe that S_n neither equals nor simply extends S_0, and therefore argue that the change in the shape grammar led to a substitutive change in search space. However, a global observer would be able to see that both S_0 and S_n are in fact part of a larger search space S (S_0 ⊂ S, S_n ⊂ S), which may also hold many other designs that are part of neither S_0 nor S_n. In terms of focussing, S_0 and S_n both represent a focus inside the space of all designs that can be represented by all possible sets of shape grammars, S. Another example can be seen in the variable addition shown in Figure 2.7(a).
If the space of all possible pentagons is considered the original search space S_0, then adding a variable and thereby introducing hexagons creates a new, expanded search space S_n. But it is also possible to argue that both are subsets of the space of all possible polygons, S.

4.2.1 Is Focussing Creative?

In terms of Gero's (1994) classifications introduced in Section 2.2.1, most of the methods illustrated in Figure 4.1 would have to be classified as routine design, since the search space that is searched remains constant. Focussing on a sub-space, however, requires that some of the total set of variables are restricted in their range or set to constant values. Moving the focus, then, introduces new variables, and/or uses variables with values outside their current scope. This fulfils the condition for creative or innovative design, not for the entire process, but for the local view onto the focussed search.

The position of a search using focussing in Poon and Maher's (1996) transformation-exploration matrix (Figure 2.8) depends on the way the sub-space is searched. As argued above, the process certainly can be classified as novel or original in terms of transformation. If, for example, an evolutionary search is then used inside the focus area, strong diverging elements are introduced, giving a value of novel or original for exploration. The resulting process is then inside the area of processes with the potential for creativity. A simple hill climbing inside the focus area, on the other hand, would be entirely convergent; the value for exploration would therefore be mundane. Such a process would then be classified as having only a low potential for creativity.

4.2.2 Soft Focus versus Hard Focus

The focus does not have to be as clear-cut as in Figure 4.1. It is also possible to have a soft focus, where all variables can be modified, but some are much more likely to be modified than others, and/or variables are much more likely to assume values in one range than in a different range. In other words, certain points in the search space, those inside the focus, are much more likely to be found than other points. Moving the focus would then correspond to changing the probabilities of the search process. In Figure 4.3, the soft focus is shown as regions of higher and lower probability inside a search space. As the figure shows, the regions of higher and lower probability do not have to be connected; indeed it is also possible that neighbouring designs have very different probabilities, and no distinct regions exist at all. In the literal sense, moving or expanding a soft focus constitutes neither an addition of variables nor an expansion of the search space, not even from a local perspective, since every design inside the meta search space can always be produced, independent of the position and shape of the focus.
However, this is only a difference of degree: a system in which some designs that previously were impossible to find are now available (moving a hard focus) will behave very similarly to one in which some designs were very unlikely to be found and are now much more likely (moving a soft focus). For this reason, it can be argued that moving a soft focus equally well fulfils this requirement for creativity.

Figure 4.3: Transforming a soft focus inside a search space; darker shades represent higher probability for a design to appear as result of the search

4.2.3 Moving the Focus

Neither of the two procedural definitions of creativity described in Section 2.2 (Gero, 1990; Maher, Boulanger, Poon and Gomez, 1995) defines any necessary attributes of the mechanism that drives the transformation or exploration. An entirely random mechanism seems not very useful: if there are only a few acceptable designs in the meta search space, then a randomly positioned focus will have a low probability of containing one of them. This point is also made by Boden when she says that without hunches, a creative robot would waste a lot of time in following up new ideas that anyone could have seen would lead to a dead end (Boden, 1995). Two sources for such hunches have already been discussed: the use of analogy and of emergence. Other sources seem possible; the important aspect is that some connection exists between the current and the transformed focus that improves the probability of finding good designs in the new focus above that of random moves.

Humans and Finite Systems

The previous section has argued that the possibilities for expanding and moving the search space in a computational process are limited by the fact that they have to work inside a finite system. It is an ongoing philosophical debate whether the human mind is essentially nothing more than a complex computational process, or if other, fundamentally different, processes are involved (see for example Penrose, 1989). However, it is possible to argue that human designers also only focus onto subspaces of a larger meta search space, without requiring any assumptions on the fundamental nature of the human mind. For example, as argued in Maher et al. (1989), a very good knowledge of a field, both in breadth and in depth, is a necessary condition for creativity in humans. Apart from limited knowledge, humans are also restricted by the number of design alternatives they can consider in a limited time. For complex design tasks this might easily be more restricting for a human than for a computer.

Focussing in human design processes can be directly shown using protocol analysis. In McNeill et al. (1998), the authors analyse the design behaviour of designers during conceptual design.
Among other things, they classify the design activities into analysis of the problem, synthesis of solutions, and evaluation of the solutions with regard to the problem. Only during the analysis phase, where the problem is taken into account, does the designer define the search space. The authors report that, as could be expected, the designers observed usually proceeded from analysis to synthesis, and from synthesis to evaluation. However, even in the early stages of the design process, the designers often went from evaluation back to synthesis without a new problem analysis, and therefore without a change in search space. After the initial phase, the designers were about five to six times more likely to proceed from evaluation to synthesis than to analysis. This can be interpreted as a focussed search of the search space, interrupted by analysis phases in which the focus can be changed.

Evidence of focussing occurring in human design can also be found in the observation that designers tend to reproduce both adequate and inadequate design features of examples if they are given such examples in a design brief, a phenomenon referred to as design fixation (Purcell et al., 1994). Seeing an example is sufficient to create a focussing effect in the following design activity.

The notion of focussing as an essential component in a creative computational process, as presented in this chapter, has been derived entirely from general ideas about creativity and computational processes, quite specifically without looking at particular instances of human creative behaviour. It is therefore especially encouraging to find this evidence of focussing in human creative design. As in the computational processes, the focus can be clear-cut, for example as a result of a decision not to modify some design variables, or soft, as a tendency towards or preference for certain types of design.
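The distinction between a clear-cut (hard) focus and a soft focus can be illustrated with a short sketch. It is an assumption-laden toy, not the mechanism used in the thesis: a focus is modelled as a table of sampling weights per design variable, and the names `sample_design`, `move_focus`, `hard_focus` and `soft_focus`, as well as the example variables, are hypothetical.

```python
import random

def sample_design(focus, rng):
    """Draw one value per design variable according to the focus weights."""
    design = {}
    for variable, weights in focus.items():
        values = list(weights)
        design[variable] = rng.choices(
            values, weights=[weights[v] for v in values])[0]
    return design

def move_focus(focus, variable, new_weights):
    """Return a copy of the focus with the weights of one variable replaced."""
    moved = {var: dict(w) for var, w in focus.items()}
    moved[variable] = dict(new_weights)
    return moved

# Hard focus: values outside the focus have weight zero and never appear.
hard_focus = {"sides": {5: 1.0, 6: 0.0}, "colour": {"red": 1.0, "blue": 0.0}}
# Soft focus: every value stays reachable, but some are far more likely.
soft_focus = {"sides": {5: 0.9, 6: 0.1}, "colour": {"red": 0.95, "blue": 0.05}}
```

In this toy model, moving a hard focus makes previously impossible designs available, while moving a soft focus only changes probabilities; in both cases the set of variables and values, the meta search space, stays the same.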

Requirements for a Creative Design Process

From the previous sections, a number of criteria can be identified that a potentially creative computational process should fulfil. The process should:

- be able to define a non-trivial sub-space, or focus, inside a very large search space (the meta search space);
- use divergent and convergent elements in a search process, such that the search is either entirely bounded by the focus, or that designs inside the focus are far more likely to be sampled by the search than designs outside the focus;
- allow for transformations of the focus inside the meta search space; and
- allow for goal-oriented control of the modifications of the focus.

This characterization differs from those used by Gero (1994) and Maher, Boulanger, Poon and Gomez (1995) in three points: it specifically acknowledges that every search space S_n that is searched will be a subspace of a larger, predefined search space, the meta search space S; it specifically allows for soft focus; and it requires that the control for the modification of the focus is not random.

The requirements form a set of necessary conditions for a creative computational process. Whether they are also sufficient conditions depends on what measure is used to judge the creativity. They are sufficient according to the two definitions used, but as mentioned, Maher, Boulanger, Poon and Gomez (1995) carefully limit the definition to the potential for creativity. It is very likely that processes exist that fulfil the criteria, but where the designs produced appear not to be creative. Additional requirements, which narrow down the definition, might emerge in future research.

Chapter 5

Evolved Representations in Computational Creativity

The previous chapter has derived a number of characteristics that a creative computational design process could be expected to have. This chapter will show how such a process could be implemented. As described in Chapter 1, a variety of representations exists that can be used to represent a design space. There is also a variety of methods to search those design spaces; evolutionary algorithms (Section 2.1) are only one example. However, they generally have been designed for optimization, and not specifically for creativity, and therefore search the search space using one of the first four methods illustrated in Figure 4.1. Compared with the requirements listed in the previous section, the key elements of a creative process, creating a focus inside a given search space and moving this focus, are not present. Both operations require domain knowledge.

One way to implement them is for the user to specify which variables are constant, and what range the other variables can assume. During the search, the user can then periodically change these settings. This, essentially, requires the user to be able to foresee what exact influence each variable will have, ruling out complex, nonlinear genotype-phenotype transformations. It also requires a large amount of interaction from the user. Systems like this are found in practical applications where computationally expensive fitness calculations are used, and therefore only small parts of the design space are to be searched (for example Parmee, 1998a).

In this thesis, however, a mechanism that is able to provide all the required parts has been introduced: focussing a search space using evolved representations. As has already been demonstrated, it allows the creation of a focus around a user-provided example, employing a machine-learning process.
The following sections will show that it also allows for the second operation, the goal-oriented manipulation of the focus.

5.1 Focussing and Transformation using Evolved Representations

The basic idea is very simple: if the evolved representation provides the focus, then manipulating the representation will modify this focus. Figure 5.1 shows the whole schema: the first steps are the same as in Figure 3.1; a basic representation is established, and using some designs as examples, an evolved representation is created. In the additional step, a transformation operation, applied to the evolved representation, is then used to shift the focus. This is demonstrated in the example by changing the L-shape in the representation into a T-shape (Figure 5.1(e)). New designs generated using this representation are now biased towards incorporating T-shapes (Figure 5.1(f)).

Figure 5.1: Basic concept of use of evolved representations in creative design. Panels: (a) basic representation; (b) design examples; (c) evolved representation; (d) new design using evolved representation; (e) modified evolved representation; (f) new design using modified evolved representation. Steps between panels: creation of evolved representation, use of evolved representation, modification of evolved representation, use of modified evolved representation.

Since in this example the evolved genes are added to the basic genes instead of replacing them, the system can still use the basic genes to produce designs; in other words, the representation creates a soft focus (see Section 4.2.2). The system could therefore produce any of the designs in Figure 5.1(b), (d) and (f) with any of the representations (a), (c) and (e) if the design criteria required it, simply by not using the evolved genes. However, if the design criteria do not select against them, designs like (f) are far more likely to be the outcome of a design process if representation (e) is used, while designs like (d) are far more likely if representation (c) is used.

Figure 5.2 shows the same process, but this time focussed not on the data, but on the operations. It also indicates which parts of the system would be provided by the programmer, where in the system user input is required, and where it can optionally be used to support the system. The programmer would supply one or more basic representations, appropriate for the class of designs the program is intended for.
The programmer also has to write a function to learn an evolved representation from a set of examples, a function that uses this evolved representation to generate new designs, and one or more functions to transform the evolved representation. The user provides a set of examples, which is used by the system to generate the initial evolved representation. The user then provides a set of evaluation criteria (a formalized design brief), which the system uses to produce new designs based on the evolved representation. If the resulting designs are not satisfactory, a modification of the representation, or in other words a shift of the focus, can be triggered. The selection of the function used to modify the representation can be done either by the user or by the system itself. This new representation is then again used with the user-specified design criteria to produce new designs. Additional aspects of the control of the transformations and a suggestion for an implementation are provided in Section .

Figure 5.2: Flowchart of the suggested use of transformed representations in a design system. The flowchart shows the definition of the basic representation, machine learning of the evolved representation from examples, repeated searches that produce designs from the design specification and the evolved representation, and a control or manipulation step between searches, with each part marked as supplied by the programmer, the user or the system.

5.2 Methods for Transforming the Representation

Thanks to the flexibility of evolutionary algorithms, a variety of modifications of the evolved representation are possible. This section introduces three operations: combining evolved representations from different sources, filtering of evolved representations, and transforming representations by direct manipulation of the evolved genes. All three methods can be combined, thereby offering a very large set of possible modifications.

5.2.1 Mixing Evolved Representations

A simple way to modify evolved representations, and thereby move the focus, is to combine evolved representations from two sources. As long as all evolved representations are based on the same basic representation, it is possible to simply combine two or more sets of evolved genes. It is only required that the evolved genes are relabelled to ensure that no two evolved genes from different sources have identical labels.
It is then possible to use the combined representations to produce new designs, starting with the creation of the initial individuals using evolved genes from both sources. If the system has a number of different sets of evolved genes based on the same basic representation available, it can create a large number of combinations of representations, and therefore a large number of foci in the search space. The focus produced by mixing two representations is not simply the union of the two original foci, as the new focus contains designs that were part of neither original focus: those designs combining evolved genes, and thereby features, from both foci. An important advantage of this method of transformation is that it requires no knowledge of the basic representation. An example of the combining of two representations is shown in Section .

5.2.2 Filtering Genetic Material

The second possible modification of the evolved representation is the removal of some of the evolved genes. The criteria that are used to decide which genes to remove might require knowledge of the basic representation and interpretation of the evolved genes. A simple criterion that does not require detailed knowledge is the order in which the genes are generated; for example, keeping only the first few generated genes will generally exclude any more complex features. The results of such an exclusion can be seen in Section . Interesting results can also be found if the representation is reduced to very few, possibly randomly selected, genes. The resulting designs will then show a strong influence of those few genes.

5.2.3 Direct Manipulation of Evolved Representation

The third method requires the largest amount of knowledge about the basic representation, but also allows the most powerful manipulations. The possibilities of direct manipulation include the modification, addition and removal of genetic material from each evolved gene. The easiest version of these modifications is a random one; this operation could be called a meta-mutation. While this modification can lead to interesting results, it does not allow any goal-orientedness in the process.
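The mixing and filtering operations can be sketched in a few lines. The data model is an assumption made for illustration and is not the thesis's implementation: a set of evolved genes is a dictionary mapping a label to a tuple of basic genes, the insertion order of the dictionary stands in for the order in which the genes were generated, and evolved genes contain only basic genes. The function names `mix`, `keep_first` and `expand` are hypothetical.

```python
def mix(*representations):
    """Combine sets of evolved genes, relabelling so that labels stay unique."""
    mixed = {}
    for source, genes in enumerate(representations):
        for label, body in genes.items():
            # Prefix each label with its source index to avoid clashes.
            mixed[f"s{source}:{label}"] = body
    return mixed

def keep_first(genes, k):
    """Filter by creation order: keep only the k earliest evolved genes."""
    return dict(list(genes.items())[:k])

def expand(genotype, genes):
    """Rewrite a genotype of basic and evolved genes into basic genes only."""
    basic = []
    for gene in genotype:
        # An evolved gene stands for its fixed sequence of basic genes;
        # any other symbol is taken to be a basic gene already.
        basic.extend(genes[gene] if gene in genes else [gene])
    return basic
```

Neither `mix` nor `keep_first` inspects the content of the evolved genes, which mirrors the observation that these two operations need little or no knowledge of the basic representation.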
Goal-oriented manipulations require the ability to relate a part of an evolved gene to a feature. Replacement operations allow the global replacement of one feature with a different one. For example, if it is known that a certain basic gene represents a red line, then it is possible to replace all occurrences of this gene with a different basic gene which encodes lines of a different colour. Removal of parts of the evolved gene can allow the total removal of some information from the evolved representation, for example the removal of all colour information. This information will be supplied by basic genes in the creation of new individuals, or by evolved genes from a second example, if more than one evolved representation is available. An example of this is shown in Section . Finally, addition of genetic material is possible as well. The additional material can be fixed, random, or computed from other features of the evolved gene which is being manipulated. This can, for example, insert additional line segments into floor plans; a similar manipulation is demonstrated in Section .

Direct manipulation also allows the change of the basic representation, and thereby the transfer of genetic material from one domain to another. Removal and addition of genetic material, for example, allow the transfer to a lower or higher dimensional problem, as demonstrated in Section . In other cases, it might be possible to rewrite the evolved genes using a new representation, which would then allow mixing with evolved genes from a different domain which also uses this representation.

While these operations require knowledge about the basic representation, it is important to note that for any given basic representation, they have to be implemented only once and then can be used by a designer or by the system without this knowledge, for any application using this basic representation.

5.3 Control of Transformation Operations

The final questions are when to apply a transformation, and which transformation to select. These questions have not been investigated in depth as part of this thesis; however, one possible scenario will be suggested in Section .
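The replacement, removal and random (meta-mutation) operations described in Section 5.2 can be sketched as follows. As before, this is a hypothetical model rather than the thesis's code: evolved genes are a dictionary from label to tuple of basic genes, the gene names in the usage note are invented, and dropping evolved genes that become empty after removal is a design choice of this sketch.

```python
import random

def replace_basic(genes, old, new):
    """Globally replace one basic gene (one feature) with another."""
    return {label: tuple(new if g == old else g for g in body)
            for label, body in genes.items()}

def strip_basic(genes, unwanted):
    """Remove all occurrences of the unwanted basic genes."""
    stripped = {label: tuple(g for g in body if g not in unwanted)
                for label, body in genes.items()}
    # Assumed policy: an evolved gene with nothing left is discarded.
    return {label: body for label, body in stripped.items() if body}

def meta_mutate(genes, alphabet, rate, rng):
    """Randomly overwrite basic genes inside evolved genes (not goal-oriented)."""
    return {label: tuple(rng.choice(alphabet) if rng.random() < rate else g
                         for g in body)
            for label, body in genes.items()}
```

For example, with `genes = {"E1": ("red_line", "turn_left", "red_line")}`, the call `replace_basic(genes, "red_line", "blue_line")` recolours every line in the evolved gene, while `strip_basic(genes, {"red_line"})` leaves only the turn. Only the first two functions need to know what a basic gene means, and once written for a basic representation they can be reused by any application built on it.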

Chapter 6

Other Interpretations of Evolved Representations

The previous three chapters have described how evolved representations can be used to create a focus in the design space, and how, by adding transformations of the representation, this can be used to implement a design process that shows important characteristics of a creative process. However, it turns out that the use of evolved representations allows a number of other interpretations, and has connections with other work. These will be described, with some background information, in this chapter.

6.1 Evolved Representations and Schema Theorem

An important idea in evolutionary algorithms, and especially genetic algorithms, is the schema theorem. A schema is defined for binary-string genotypes of length n by a ternary vector of the same length. The third symbol, commonly represented by a *, is referred to as a don't care symbol. The schema thus represents all gene combinations in the population that agree with the schema at every location not occupied by a *. For example, the schema *01* can represent the gene combinations 0010, 0011, 1010, 1011. The schema theorem says that, under certain conditions about selection and evolutionary operations, the increase in the expected number of occurrences of each schema in the population is roughly proportional to its observed fitness. While the mathematical proof is not directly applicable here for a number of reasons (variable length representation, non-constant alphabet for the genotype, different selection) and the meaningfulness of the schema theorem is under dispute, it is interesting to compare some of the ideas and results of work on the schema theorem (Radcliffe, 1997) with the work presented here. Depending on how exactly the evolved genes are defined in the application at hand, schemata might be directly compared to evolved genes.
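The schema notation can be made concrete with a short sketch. The helper names `matches` and `instances` are assumptions introduced here; the code only restates the standard definition of a schema over binary strings with `*` as the don't care symbol.

```python
from itertools import product

def matches(schema, genotype):
    """True if the genotype agrees with the schema at every non-* position."""
    return len(schema) == len(genotype) and all(
        s in ("*", g) for s, g in zip(schema, genotype))

def instances(schema):
    """List every binary string represented by the schema."""
    # A * may be filled with 0 or 1; a fixed position keeps its own value.
    options = ["01" if s == "*" else s for s in schema]
    return ["".join(bits) for bits in product(*options)]
```

For the schema used in the text, `instances("*01*")` returns `['0010', '0011', '1010', '1011']`, the four gene combinations it represents.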
Fixed-position evolved genes (which will be defined in Section 7.1) are essentially frozen schemata, in that they fix the values of the genes at all positions where the schema does not have a don't care symbol, and protect them from further disruption. Movable evolved genes are similar, with the difference that they represent a different schema for each position on the genotype where they appear.

One interpretation of schemata is as building blocks, which solve subproblems of the fitness function used to evaluate the genotypes. In the case of evolved genes in design applications, this interpretation can often be taken nearly literally. The only difference is that the phenotypes represented by the evolved genes constitute the larger building blocks, instead of the evolved genes themselves; evolved genes of different sizes give rise to building blocks of different complexity. A condition for the building block interpretation to apply is that the problem is indeed to some degree decomposable into subproblems with little interdependence. The generation of evolved genes relies on a similar assumption (see Section 3.4), and in cases where evolved genes can be directly translated into building block phenotypes, this is obviously the case. It is, however, not automatically true when the evolved genes are used in the creation of new designs, with a potentially very different fitness function. Here, depending on the details of the fitness function, the evolved genes are not necessarily directly related to any sub-problems in the new fitness function. For example, in the floor plan example shown in Section 8.2, creating closed rooms of correct size requires a number of evolved genes that are assembled in the right order; any change in a single evolved gene can drastically alter the fitness. This differs from the use of evolved genes in optimization applications as in Gero and Kazakov (1996a) (see also Section 6.2 below), where the re-use of the evolved genes is successful because of the similarity between the problems during gene creation and gene use.

The no free lunch theorem (Wolpert and Macready, 1995) is related to this: it shows that any search algorithm can outperform linear search only if it incorporates knowledge about the problem domain. As Radcliffe (1997) points out, this means for the choice of representation that it is important to use meaningful genes. Using evolved genes, the system itself creates these meaningful genes, incorporating domain knowledge into the representation.

As an alternative to the schema theorem, it is possible to analyse the behaviour of evolutionary algorithms from an information processing viewpoint.
Kargupta (1994, 1995) has done this: the individuals can be treated as signals, and the genetic operations as transmission channels. This allows for the application of measures such as signal-to-noise ratio and crosstalk. Interesting in the context of this thesis is that information theory predicts that the representation used determines how much of the signal is transmitted, and that different channels have different optimal representations. Gene extraction can drastically reduce the redundancy in the individuals, as will be shown in measures of search space size in Section .

6.2 Evolved Representations and Genetic Engineering

An interesting analogy that is related to work in this thesis is that of genetic engineering. In genetic engineering in biology, the genetic material of living organisms is modified to introduce new characteristics, or to remove unwanted characteristics. This technique has been used with much success in a large variety of biological applications, from the pesticide-resistance of plants (Plucknett and Winkelmann, 1995) to the treatment of medical conditions in humans (Anderson, 1995; Felgner, 1997). The applicability of this as an analogy for evolved representations becomes obvious when evolved genes are used in an optimization context: evolved genes are those that are found to be common in high-fitness genotypes, and incorporating these evolved genes into new designs can help the optimization of different, but related problems. Work that uses this analogy can be found in Gero and Kazakov (1996b) and Gero et al. (1997).

6.3 Evolved Representations and Style

In Chan (1995), the author explains that the distinct style of an architectural design is a result of both common features and common procedures used by the architect. Describing a style as a function of how it is generated therefore requires a deep understanding of the design process, usually supported by comments from the designer. Shape grammars (Stiny, 1980a; Stiny, 1980b) usually take this approach; they represent both the common procedures (in the rules and the sequences of rules that are possible), and the common features (in the shapes manipulated by the rules). However, Chan also notes that common features present in an architect's work are indeed used by viewers to categorize the architect's style:

"... a style is said to be the function of common features." (Chan, 1995)

"Style is identified by grouped features... More features tend to make the style coherent and strongly hold the style together... the degree of a style is related proportionally to the number of common features that appear in the artefacts... when the number of features in an artefact is reduced to three or fewer, interaction occurs and a style is no longer perceptible" (Chan, 1994).

In other words, style as recognized by a viewer is a result only of the visual features, and the more common features are present, the better the style is recognizable. It should therefore be possible to infer important aspects of a style common to a set of designs without knowledge about the design process, and to use evolved representations for this purpose. If the style is a result of common features among a set of examples, then by providing such a set of examples to the creation of evolved genes, and using a similarity function that returns higher values for phenotypes that have similarities with more than one example, it is possible to create an evolved representation with evolved genes that incorporate knowledge about the style of the examples. If such a representation is used to produce new designs, it produces a focus containing designs that share style features with the examples.
Both examples presented in the next chapter use multiple examples this way, and style features from the examples are clearly visible in the results. However, as the analysis explains, because of the particular example designs used, it is difficult to say if the features are learned because they are represented in more than one example, or because they are represented more than once in a single example (see Section ).

6.4 Evolved Representations and Case-based Design

Case-based reasoning has been introduced in design to allow the reuse of knowledge from design cases, rather than having to start from first principles or compiled knowledge with every new design (Maher, Balachandran and Zhang, 1995). Contrary to knowledge-based design systems (Coyne et al., 1990), the expert knowledge is not stored as an explicit rule set, but is implicit in a database of previous design cases. Since requirements and environmental conditions will usually vary between the retrieved and the actual case, the design has to be changed to fit the new conditions. Design adaptation is therefore an important step in the application of case-based design. Case adaptation can be simply stated as making changes to a recalled case so that it can be used in the current situation. Recognizing what needs to change and how these changes are made are the major considerations. Adapting design cases is more than the surface considerations of making changes to the previous design, it is a design process itself (Maher, Balachandran and Zhang, 1995).

Automatic adaptation of design cases has been studied for example by Dave et al. (1994) in building design. These authors use two different adaptation operations: dimensional modification and topological modification. Dimensional modification changes the dimensions of the design elements without changing their number. It uses a subset of parameters that is developed from the original three-dimensional model by a process called data reduction. If dimensional modification is not sufficient to adapt the case to the new requirements, the topology of the model has to be modified as well. In Schmitt (1993), the author investigates the use of string grammars to automate topological modification. Here, the most interesting aspects are extracted from the design case and are then subject to modifications. Other representations used in case-based design are case-specific rules and a wall representation developed by Flemming et al. (1988).

The work of Dave et al. (1994) shows that the potential for adaptation of a case very much depends on its representation. Every adaptation operation first requires the transformation from a general case representation (for example a three-dimensional model) into a different representation, specialized for the intended adaptation. The adaptation is applied only to this special representation, and the result is then transformed back into the general case representation. Usually, the transformation into a special representation also includes a strong parameter reduction. This reduces the size of the search space and is necessary to keep the computational complexity within bounds. However, it also often means that some desirable design solutions are excluded.
It is apparent that there are two competing goals: the representation has to create a search space that, on the one hand, has few enough degrees of freedom to preserve the knowledge from the case and allow an adaptation with reasonable computational cost but, on the other hand, does not exclude any possibly desirable design solutions. Evolved representations can solve this problem: they contain knowledge in different levels of detail and complexity, and the design process that is using this knowledge is free to use or disregard any of it. This is especially so if the basic representation is used together with the evolved representation: the resulting soft focus (Section 4.2) allows the process to find good solutions based on the design case with reasonable computational effort, while not excluding any design solution that can be represented with the basic representation.

6.5 Evolved Representation and Cross-Breeding

One of the most interesting methods for transforming the focus created by the evolved representation is to combine evolved genes from two or more evolved representations (Section 5.2.1). Cross-breeding between races can be seen as the biological equivalent of this function. A comparison shows that evolved representations are especially suitable to model cross-breeding.

Combining Genetic Material

In nature, it is sometimes possible to combine the genetic material of two individuals from two different groups of animals; the resulting offspring includes features from both groups. The most common example is cross-breeding between different breeds, for example in dogs; the results are generally referred to as hybrids. In the general case, cross-breeding is not possible; in fact, the definition of species involves being not normally able to interbreed with other such groups (Henderson, 1989). A number of biological conditions have to be met before successful interbreeding is possible. The three most important of them are:

1. The genetic material of both parents has to be such that the same genotype-phenotype transformation can be used to transform it into a living individual. For life on earth, this is rarely a problem, since the vast majority of life forms use the same universal, RNA or DNA based genetic material.
2. The environment in which the transformation engine works has to be compatible. For example, mixing dog races of different sizes is generally only successful if the female dog belongs to the larger race, otherwise its womb might not be able to support the developing embryo. For similar reasons, it would be difficult to interbreed water-bound with land-bound species.
3. The genetic materials have to be compatible. In other words, the transformation engine has to be able to transform a genotype consisting of material from both sources into a functioning individual. This, in nature, is the most important obstacle in inter-breeding.

While biological systems use a common representation and achieve a huge variety of organisms by a highly interactive multi-level development process, most evolutionary algorithms use a specialized representation, with a simple, usually linear genotype-phenotype transformation. The three conditions therefore have very different importance for evolutionary algorithms.

1. Contrary to biological systems, evolutionary algorithms use many different genotype-phenotype transformations, often designed for specific applications. As a result, this condition prevents cross-breeding in most cases.
2. In the vast majority of evolutionary algorithm implementations, the genotype-phenotype transformation is very simple, without any interaction with the environment. This point is therefore usually unimportant for evolutionary algorithms.
3.
To make the evolutionary search as efficient as possible, the genotype-phenotype transformation is usually designed so that most or all of the possible genotypes can be transformed into phenotypes. If all the genetic material used to produce a new individual comes from genotypes that use the same genotype-phenotype transformation, the offspring is equally likely to be a valid individual. However, this does not guarantee that the offspring will have a high fitness.

In evolutionary algorithms, the most important condition is therefore that the sources of the genetic material are systems that use the same genotype-phenotype transformation.

The Unit of Inheritance

To simulate cross-breeding in an evolutionary algorithm environment, the representations used are of specific importance. If, as defined in the previous section, the species that are to be combined use the same genotype-phenotype translation, then they will also use the same basic representation. As a result, an initial random population, which consists of individuals that are random combinations of basic genes, will look the same in any of the species. Features specific to a species can only be present in the form of certain gene configurations in individuals in later stages of the evolutionary process. The combination of genetic material from different species is in this case therefore only possible by combining individuals, for example with a cross-over operation. The resulting individual

will show features of both individuals, and therefore both sources; however, in following generations the genetic operations can destroy any such features. Evolved representations present an alternative: evolved representations can be created for each of the species, representing features of the species in the evolved genes. Each species has the same basic representation, but very different evolved representations. As a result, random individuals generated from different evolved representations will look different, and if the evolved representations are combined, the random initial individuals will show features from all the sources. During the evolutionary process, the evolved features are protected: while it is still possible that the evolutionary process leads to individuals that use only evolved genes, and therefore features, from one source, the genetic operations cannot disrupt the evolved genes. Using evolved genes this way, the different species can use the same basic representation, while still having their individual features preserved through the evolutionary process. Evolved genes create a difference between unit of representation and unit of inheritance, which can also be found in nature: the base pairs on the DNA (or any variant of RNA) are the equivalent of the "basic coding"; but the units of inheritance are long sequences of base pairs, the genes.

6.6 Evolved Representations and Artificial Life

Artificial Life (Langton, 1988b; Langton et al., 1991; Langton, 1992; Brooks and Maes, 1994; Langton and Shimohara, 1996) studies natural life by attempting to recreate biological phenomena from scratch within computers and other artificial media. ... rather than studying biological phenomena by taking apart living organisms to see how they work, one attempts to put together systems that behave like living organisms (MIT Press Alife WWW, undated).
Though at first glance they appear very different, there are a number of connections between Alife and computational design creativity that suggest an approach to incorporate ideas from the one into the other. Ideas from artificial life research are therefore incorporated in a number of places in this thesis. The most important connection is that most studies in artificial life research are done as computer simulations, and as such are subject to certain limitations. Those limitations are the subject of Section 4.1. Another connection can be seen in the use of the evolved representation. In nature, genotype-phenotype expression is a complex, hierarchical process. Sequences of base pairs make up the genes, some of the genes regulate the expression of other genes or gene sequences, some genetic material is reused in different contexts, and a large portion of genetic material is not expressed at all. In contrast, evolutionary algorithm applications generally use very simple, linear expressions of the genotype. The use of evolved representations can be seen as a step towards creating richer representations, which are more comparable to natural systems. This is also reflected in the discussion of the unit of inheritance, see Section .

The creation of evolved genes makes use of the bottom-up paradigm. This paradigm has been developed as an alternative approach to robot control in animat research (for example Meyer and Wilson, 1991; Meyer et al., 1993). In bottom-up designed systems, the interaction of a number of simple subsystems leads to a complex, emerging behaviour. Steels (1991) shows in a comparison how, compared with hierarchical systems, emergent systems can be more fault-tolerant and robust against disturbances

and environmental changes. Connections to evolved representations can be seen in two places. First, the evolved genes are created in a bottom-up fashion; the difference is that not complex behaviours, but complex structures are created. Second, the focus can be said to emerge from the interaction of all the evolved genes in the evolved representation (see also Section 7.2.1). The evolved representation shows the same robustness to changes in the environment as bottom-up designed control systems: the evolved representation is created from the example using a similarity-based fitness and is then used to generate new individuals with very different fitness functions. Thanks to the large pool of evolved genes, the evolved representation can successfully be used in these different environments. Other connections to artificial life can be found in Section 3.5, where the connection to Kauffman's (1993) work on the connections of self-organization and evolution is discussed; in the use of the cross-breeding metaphor in Section 6.5; and in the fitness sharing mechanism used in the first example implementation (see Section ).

6.7 Evolved Representations in Other Work

Evolved representations have parallels to the way representations are modified in other work. In all cases, this is done to increase the performance of an evolutionary algorithm as an optimization process, as described in Section 6.1 above.

Evolved Representations in Genetic Programming

Genetic Programming (Koza, 1992) is a derivative of the genetic algorithm, in which genotypes form executable LISP s-expressions. The genotypes are treated as rooted trees, and tree-based versions of the cross-over and mutation operation are used. To improve the performance of genetic programming, a number of different methods have been developed that can all be described as the creation and use of subroutines in the LISP expressions.
Koza (1994) introduces so-called Automatically Defined Functions or ADFs. These functions are part of the overall program tree, and can therefore be automatically evolved by the evolutionary process. The advantage introduced by ADFs is twofold: they can reduce the memory requirements for the storage of the genotype, and they allow the search to perform larger jumps in the search space, since modification to the functions can introduce large changes to the overall behaviour. Since functions can call each other, some minimal level of function hierarchy exists; however, the number of functions that each individual defines is fixed and generally very small. Moreover, as each individual has its own defined functions, the overall representation does not change.

In Angeline and Pollack (1992), a different approach is used. In a random process, sections of trees of individuals are collapsed into modules. These modules are stored outside the population, much like evolved genes in this thesis, and are therefore protected from evolutionary operations. To prevent loss of genetic material, an additional process is used that in a random fashion replaces existing modules with the corresponding original section of the tree.

In both processes, the functions or modules are created using random procedures, and therefore are not necessarily meaningful subfunctions for the fitness function used. Subfunctions that occur in the population at a higher-than-average rate are more likely to become part of a defined function, and the schema theorem, though not directly applicable to genetic programming, can be used to suggest that higher-fitness blocks appear at a higher rate. However, as Rosca and Ballard (1994a) observe, the converse of the

schema theorem does not hold true, and poor blocks may appear at a high rate in the population. Rosca and Ballard therefore suggest that a fitness value be computed for each possible new function block, using either the overall fitness of the individual, a special fitness function, or user input (Rosca and Ballard, 1994a; 1994b; 1994c). To keep the computational complexity down, the authors also use hierarchical definitions, creating larger blocks from smaller ones in a bottom-up process much like the creation of larger evolved genes in this thesis (though the blocks in Rosca and Ballard's (1994c) work are always produced from the terminals of the trees). For these reasons, their method is most closely related to this work. The intention, however, is still different: Rosca and Ballard try to improve the performance of Genetic Programming for sparse search spaces. The work of Rosca and Ballard is also interesting because it uses a measure called the Minimum Defining Length to characterize the complexity of the individuals in the population. A very similar measure, the length of an equivalent bit string, will be used in Chapter 9 to produce qualitative information about the search space. In a later paper, Rosca (1995) suggests the use of entropy as a measure of diversity in the population, which can then be used to control the creation of new function blocks.

Evolved Representations using other Representations

An interesting approach to the evolution of a representation is presented in Paredis (1995): in addition to the population of individuals which represent problem solutions, a second population exists that contains individuals that represent permutations. These permutations are applied to the genotypes of the individuals in the first population to re-order the genes.
This brings some genes closer together on the genotype, and therefore reduces the probability that they are separated during cross-over, while the distance between other genes is increased. The idea is that successful permutations group functionally related genes, which helps to decompose the problem. Permutations that are used in the production of successful offspring receive positive feedback; this way a fitness can be calculated for the population of permutations. While the algorithm is developed to optimize the behaviour of evolutionary algorithms, and not with creativity in mind, the idea of using a second population of representations that co-evolves with the solutions is interesting. Instead of identifying the best gene combinations by calculating a gene combination fitness over all combinations as is suggested in this thesis, it might be possible to use a similar approach of co-evolving the representation together with the sample individuals. Other researchers have experimented with adding genetic material that controls genetic operations to the genotype, guided by the fact that in natural systems, different locations on the genotype have different cross-over probabilities (for example Levenick, 1995). While this is not directly modifying the representation, it has a similar effect of grouping sets of genes and protecting them from disruption. Another approach to decomposition of the genotype space is demonstrated in the gene expression messy genetic algorithm (GEMGA, Kargupta, 1996b; Kargupta, 1996a). In this algorithm, relations are formulated that partition the genotype space into classes. Reproduction operations are then controlled in a way that ensures that better classes receive proportionally higher numbers of offspring. These offspring generate new knowledge about the genotype space, and allow the creation of new relations.
This work is interesting in the optimization context, as it derives mathematically the optimal distribution of relations, classes and individuals therein. But as in Paredis (1995), it might be possible to use the knowledge contained in the relations as an alternative means of creating a focus in the search space.
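The effect of gene re-ordering described above for Paredis (1995) can be illustrated with a small sketch. It is not the algorithm of that paper: it only assumes a one-point cross-over whose cut point is chosen uniformly among the gaps between neighbouring positions, in which case two genes are separated with a probability proportional to their distance on the genotype. The function names and the example permutation are hypothetical.

```python
from fractions import Fraction


def separation_probability(pos_a, pos_b, length):
    """Probability that a uniformly chosen one-point cross-over cut falls
    between two 0-based positions on a genotype of the given length.
    (Assumed cross-over model, used here for illustration only.)"""
    if length < 2:
        raise ValueError("a cut point needs at least two positions")
    return Fraction(abs(pos_a - pos_b), length - 1)


def reorder(genotype, permutation):
    """Apply a permutation: position i of the result holds the gene that
    was at position permutation[i] of the original genotype."""
    return [genotype[i] for i in permutation]


# Two functionally related genes, 'a' and 'h', start at opposite ends.
genotype = list("abcdefgh")
before = separation_probability(genotype.index("a"), genotype.index("h"), len(genotype))

# A hypothetical permutation that moves 'h' next to 'a'.
reordered = reorder(genotype, [0, 7, 1, 2, 3, 4, 5, 6])
after = separation_probability(reordered.index("a"), reordered.index("h"), len(reordered))
```

Under these assumptions the two genes are always separated before the re-ordering and only in one of seven cuts afterwards, which is the kind of protection that evolved genes provide without changing the gene order.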

A different approach to hierarchical assembly of complex structures is demonstrated in Rosenman (1996; 1997). Here, the evolutionary process is separated into distinct stages, and in each stage the material produced in the previous stage is used to create higher-level objects. For example, in the creation of floor plans, individual units of floor area are first assembled into rooms of different shapes. In the next step, a selection of these room shapes is then used as the basic material to create functional units. Finally, these units are assembled into floor plans in a third processing step. In each of these steps, a different fitness function is used. The main advantage of this approach is that it reduces the computational complexity of the evolutionary optimization process. The size of the genotype in each of the stages is far smaller than it would be if the complete floor plan was assembled directly from floor area units; in other words, the search space searched in each step is much smaller. Also, the fitness function that is used in each step relates only to the requirements of the objects created in that step.

6.8 Other Work on Computational Creativity

Navinchandra does not talk about creativity, but about innovation. He states that innovation requires an exploratory process; his definition of innovation encompasses both innovative and creative designs as defined above. In the CYCLOPS model presented, exploration is in the first place realized by constraint relaxation, which in terms of search space equals an additive expansion of the search space. However, as the author explains, this expansion of the search space can lead to the emergence of new criteria, which can sometimes change the focus of the problem solving task and lead to solutions completely different from what a normal synthesis technique would produce.
(Navinchandra, 1991) The author also asks where new criteria could come from, and concludes that the user would be the best source. This seems similar to the question asked about search spaces and new variables in Section 4, but is more related to the question about control of the transformation of the focus, see Section .

Goldberg does not give a definition of creativity, but suggests a mechanism that can be used to make an evolutionary system creative: transfer of information from a different domain. This could be done in two ways: modification of the underlying representation, and transfer of deep, whole building blocks from other fields (Goldberg, 1998a; Goldberg, 1998b). The author does not elaborate very much on the reasons for these suggestions, or on possible implementations. The work presented in this thesis provides both an explanation of the role that representations and the transfer of building blocks can play in a creative process, and example implementations.

Chapter 7 Implementing Evolved Representations

The previous chapters have presented the idea of using evolved representations to focus a design process. They have also shown how this idea can be used in creative computational processes, as well as connections with other research areas. Many of the details on how evolved representations can be created and used, and how exactly the evolved representation influences a search process, depend on a number of implementation details, like the basic representation used, the implementation of the evolved genes, and the genetic operations used. However, it is possible to make some general observations. This chapter will describe some of the issues involved in implementing algorithms using evolved representations.

7.1 Formal Notation

Evolutionary algorithms use a number of different data structures for their genotypes (Michalewicz, 1992). The simplest, and most commonly used, data structure is a string of length $n$. This string has positions $p_1, \ldots, p_n$; each position is filled with a symbol $b_n$ from an alphabet $B$. These symbols are comparable to genes in DNA; they will be referred to as basic genes. For example, in the typical genetic algorithm implementation, a genotype is made up from $n$ basic genes, each either 0 or 1 ($B = \{0, 1\}$). To construct a representation that influences the evolutionary search in a particular way, the notion of evolved genes is introduced. Evolved genes $e$, $e \in E$, $E \cap B = \emptyset$, represent combinations of basic genes in a genotype. Every evolved gene represents a set of position/symbol groups, which indicate that a certain basic gene appears at a certain position in the genotype: $e = \{(p_{x_1} = b_1), (p_{x_2} = b_2), \ldots\}$; $p_{x_1}, p_{x_2} \in \{1, \ldots, n\}$; $b_1, b_2 \in B$. Evolved genes can be represented in a notation that is borrowed from schema theorems (Section 6.1), by introducing a "don't care" symbol $*$.
With the help of this symbol, strings can be used to indicate the positions of basic genes in evolved genes, for example the evolved gene $e_1 = \{(p_2 = 0), (p_4 = 1)\}$ can be written as $e_1 = [*\ 0\ *\ 1]$. The order of an evolved gene $O\{e\}$ is defined as the number of position/symbol combinations in the evolved gene. A variation of the evolved gene as defined above is the movable or position independent evolved gene. Movable evolved genes will be indicated by using angular brackets. They differ from regular evolved genes in that positions indicated in the evolved gene are relative to the position in which the evolved gene appears on the genotype. For example, if $e_2 = \langle *\ 1\ *\ 1 \rangle$ appears in position 4 on a genotype, it indicates that positions 5 and 7 on the corresponding basic genotype will be filled with the symbol 1. Other

versions of evolved genes, adapted to different data structures used in the genotype, or defined to include domain knowledge from the specific application, are possible as well. One version of evolved genes defined for tree-structure genotypes is used in the second example application (Section 8.3).

7.2 Influence of Evolved Genes on Evolutionary Algorithm

To be able to influence the search space in favour of certain features, it has to be possible to find combinations of basic genes that, if they are present in a genotype, improve the likelihood that a certain feature is found in the corresponding phenotype (see Section 3.4). Whether such combinations of basic genes can be found or not depends on the particular feature and on the genotype-phenotype transformation used. The requirement is very weak: no direct mapping from feature to gene combination is required. Looking at particular cases, it turns out that evolved genes can generally be found unless every possible gene combination is equally probable to be present in a genotype that is translated into a phenotype with this particular feature. Some particular cases are listed below:

Features that are similar to the parity property: To define the parity property, a binary alphabet $\{0, 1\}$ is given for the basic genes and the genotype-phenotype transformation is the identity, that is, the genotype is equal to the phenotype. The parity property is fulfilled if and only if the number of 1s in the phenotype is even. Since exactly half of the genotypes will have this property, and the fitness depends on every basic gene in the genotype, it will not be possible to create evolved genes that, when present, increase the probability of the parity property. Evolved genes will therefore not be usable to promote the parity feature.
Features that depend on a value of a single gene in a genotype: This example assumes a genotype with a binary alphabet, where the feature exists if the first gene on the genotype is 1. In this case it is for example possible to create two evolved genes $e_1 = [1\ 0]$, $e_2 = [1\ 1]$. If a set of initial individuals is produced, using the two basic and two evolved genes with equal probability, the first symbol in the corresponding basic gene is three times as likely to be a 1 as a 0. The evolved representation will protect the 1 at the start from genetic operations, and if the mutation operation is implemented to choose a replacement with equal probability from basic and evolved genes, a 1 at the first position will again be favoured.

Features that depend on the relative values of two genes: For example, if in an alphabet with the symbols (1..9), the value of $p_1$ has to be three times the value of $p_2$ for the feature to exist in the corresponding phenotype, it is possible to create three evolved genes: $e_1 = [3\ 1]$, $e_2 = [6\ 2]$, $e_3 = [9\ 3]$. The existence of any of these evolved genes in a genotype will then increase the likelihood of the feature in the phenotype. If the dependence on relative values is typical for the application, it would also be possible to define a specialized type of evolved genes for this purpose.

Features that depend on the relative position of genes in the genotype: For example, if in a genotype of length four, using a binary alphabet, the feature is present if two successive 1s exist on the genotype, it is possible to create three evolved genes $e_1 = [1\ 1\ *\ *]$, $e_2 = [*\ 1\ 1\ *]$, $e_3 = [*\ *\ 1\ 1]$. Alternatively, it is possible to use a movable evolved gene. In this case, a single gene $e_1 = \langle 1\ 1 \rangle$ is sufficient.
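The notation of Section 7.1 and the cases listed above can be made concrete with a minimal sketch. It assumes that genotypes and evolved genes are plain strings, that `*` stands for the "don't care" symbol, and that a fixed evolved gene is simply a movable one anchored at position 1; the function names are hypothetical and not taken from the thesis implementation.

```python
from fractions import Fraction

DONT_CARE = "*"  # assumed spelling of the "don't care" symbol


def apply_movable(pattern, position, genotype):
    """Overlay a movable evolved gene on a basic genotype.
    `position` is the 1-based position at which the gene appears."""
    symbols = list(genotype)
    for offset, symbol in enumerate(pattern):
        if symbol != DONT_CARE:
            symbols[position - 1 + offset] = symbol
    return "".join(symbols)


def apply_fixed(pattern, genotype):
    """Overlay a fixed evolved gene, whose positions are absolute."""
    return apply_movable(pattern, 1, genotype)


def first_symbol_odds(basic_genes, evolved_genes):
    """Chance of each symbol at the first position when one gene is drawn
    with equal probability from the basic and the evolved genes."""
    pool = list(basic_genes) + [gene[0] for gene in evolved_genes]
    return {s: Fraction(pool.count(s), len(pool)) for s in sorted(set(pool))}
```

With these definitions, `apply_fixed("*0*1", "1111")` sets positions 2 and 4, and `apply_movable("*1*1", 4, "0000000")` fills positions 5 and 7 with 1, as in the example of a movable gene appearing in position 4. `first_symbol_odds("01", ["10", "11"])` gives 3/4 for the symbol 1 and 1/4 for the symbol 0, the three-to-one bias described for the single-gene feature. Placing the movable gene `"11"` at positions 1, 2 and 3 of a genotype of length four covers the same cases as the three fixed genes for two successive 1s.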

Evolved Representation: More than a Collection of Evolved Genes

Apart from protection of gene sequences from genetic operations, evolved genes have a second effect: due to the random generation of the initial population and the random replacement of genes in mutation operations, features that are represented by more evolved genes are more likely to be represented in the population. For example, a feature might be present mainly in phenotypes represented by genotypes containing mostly odd-numbered basic genes. This kind of global information cannot be represented directly in an evolved gene, unless a special definition of evolved genes is used that incorporates some high-level knowledge. However, if a large number of evolved genes is created, each containing only knowledge about some specific successful gene combinations, these gene combinations are likely to reflect the ratio of odd to even basic genes. Using this evolved representation will then bias any new designs towards assuming this ratio, too. The accumulation of evolved genes can therefore contain additional knowledge that cannot be represented in single evolved genes. It is an emergent feature of the evolved representation, an effect that can be compared to gestalt effects in cognitive science (Frisby, 1979), and to emergent behaviour as used in artificial life (see Section 6.6). A good example of this effect can be found in the second example implementation, see Section .

Creation of Evolved Genes

As described in Section 3.4, an evolutionary algorithm is used to generate a population of individuals that encode phenotypes that show similarities to an example design. To create evolved genes from this pool of sample genotypes, the most successful gene combinations in those genotypes have to be identified.
Since the similarity fitness $\rho$, used in the evolutionary algorithm, gives a numerical measure of how similar each sample individual is to the example, it can be used to calculate which gene combinations are the most common in sample individuals that are very similar to the example. To do this, it is assumed that each gene combination that occurs on a genotype contributes equally to its similarity fitness. It is then possible to calculate the average contribution to the similarity fitness over all sample individuals for every possible gene combination, and then use this value to identify the best gene combinations. Given a similarity fitness $\rho_p$ for each of the phenotypes $p$ in the sample population, the gene combination fitness $\kappa_c$ for a gene combination $c$ can be calculated as $\kappa_c = \sum_{P=0}^{n} \rho_P$ or as $\kappa_c = \frac{1}{n} \sum_{P=0}^{n} \rho_P$, with $P_0 \ldots P_n$ all phenotypes in the sample set that are produced by genotypes that contain $c$. The difference between the two is that the second form is normalized by the number of genotypes in the sample set that contain the gene combination. This way, a bias introduced by partial convergence in the population, which results in some features being over-represented, can be removed from the gene fitness. The value $\kappa_c$ can be calculated for all possible combinations of two, three, four, or more genes. Unfortunately, the number of possible gene combinations grows exponentially with the number of elements in the gene combination (the order of the evolved gene). Unless only movable evolved genes are created, it also grows linearly with the number of positions in the genotype the gene combination can assume, in other words, with the length of the genotype. Calculating the gene combination fitness for all possible gene combinations therefore quickly becomes impossible if combinations of more than a few basic genes are considered. This holds even in cases where domain knowledge is available to reduce the set of possible gene combinations, for example when it is

known from the genotype-phenotype translation that only relative positions of genes are important.

However, all gene combinations of order $O\{c\} > 2$ can be divided into two gene combinations of lower order that have to occur together in a genotype. It is therefore possible to initially create only evolved genes for gene combinations of order $O\{c\} = 2$, and create all gene combinations of order $O\{c\} = n$, $n > 2$, by either combining a gene combination of order $O\{c_1\} = m$, $m = n - 1$, with a single gene, or by combining two gene combinations of lower order: $O\{c_2\} = k$ and $O\{c_3\} = l$, $l = n - k$. This allows the search for the best gene combinations to be limited to combinations of two basic genes, a basic and an evolved gene, or two evolved genes.

In many applications, domain knowledge allows further simplification of the search by taking only directly neighbouring positions on the genotype into account. In this case, if $n_b$ different basic genes and $n_e$ different evolved genes exist in the population, exactly $n_b^2$ combinations of two basic genes can exist, together with $2 n_b n_e$ combinations of a basic gene with an evolved gene, and $n_e^2$ different combinations of evolved genes. If only movable genes are produced that are on directly neighbouring positions on the genotype, there are a maximum of $n_b^2 + 2 n_b n_e + n_e^2$ different gene combinations, or candidates for a new evolved gene.

For the creation of complex evolved genes from simpler ones to work, it is crucial that all the lower-order evolved genes required to create a higher-order evolved gene can be identified. Whether this is possible or not depends on the way the gene fitness is calculated. The fact that a high-order gene combination $c$ has a high contribution to the fitness does not guarantee that any lower-order gene combination $c_n$ that is part of $c$ will have a high contribution as well.
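The gene combination fitness for movable, directly neighbouring pairs can be sketched as follows. This is only an illustrative sketch: the function names `pair_fitness` and `max_candidates` and the sample data layout (a list of genotype and ρ pairs) are assumptions made for this example, not part of the implementation described in this thesis. The sketch returns both the plain sum and the normalized form of κ, and counts a combination once per genotype that contains it.

```python
from collections import defaultdict


def pair_fitness(samples):
    """Gene combination fitness for directly neighbouring gene pairs.

    samples: list of (genotype, rho), where genotype is a sequence of gene
    symbols (basic or evolved) and rho is its similarity fitness.
    Returns {(g1, g2): (kappa_sum, kappa_normalized)}.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for genotype, rho in samples:
        # Assumption: a pair counts once per genotype that contains it,
        # regardless of how often or where it occurs (movable genes).
        for pair in set(zip(genotype, genotype[1:])):
            totals[pair] += rho
            counts[pair] += 1
    return {pair: (totals[pair], totals[pair] / counts[pair]) for pair in totals}


def max_candidates(n_basic, n_evolved):
    """Upper bound on neighbouring-pair candidates: n_b^2 + 2 n_b n_e + n_e^2."""
    return n_basic ** 2 + 2 * n_basic * n_evolved + n_evolved ** 2


samples = [("abab", 4.0), ("abba", 2.0), ("bbbb", 0.0)]
fitness = pair_fitness(samples)
best = max(fitness, key=lambda pair: fitness[pair][1])
```

Because only pairs are tabulated, the table size is bounded by `max_candidates`, independent of the order of the evolved genes already present.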
Table 7.1 shows an example case, where all individuals containing gene combination c1: [* 1 1 1 * *] achieve a high similarity fitness ρ, while individuals containing the gene combination c2: [ ], c3: [ ] or c4: [* 1 1 0 * *] receive a lower-than-average ρ. Since c1 is of order O{c1} = 3, a lower-order evolved gene has to be created first. In this case, this could be c5: [* 1 1 * * *], c6: [* * 1 1 * *] or c7: [* 1 * 1 * *].

    genotype       similarity fitness   gene combination
    * 1 1 1 * *    10                   c1
    * * *          0                    c2
    * * *          0                    c3
    * 1 1 0 * *    0                    c4
    all other      5

Table 7.1: Example set of genotypes and assumed similarity fitness

Assuming the set of samples contains all gene combinations in equal numbers, the gene fitness can be calculated for the relevant gene combinations as in Table 7.2. For example, half of the genotypes that contain c5 would contain c1 and receive a ρ of 10.0, the other half would contain c4 and would receive a ρ of 0.0, resulting in a κ of 5.0, the same as most of the other gene combinations. As can be seen, c5, c6 and c7 have a gene fitness equal to most of the other gene combinations; they will therefore not be converted into an evolved gene, and no evolved gene for c1 can be created. The exact result obviously depends on the particular combination of similarity fitness for individuals containing c1 to c4; the gene combinations c5 to c7 can receive gene fitnesses below or above the average.

    genotype       κ      gene combination
    * 0 0 * * *    5
    * 0 1 * * *    2.5
    * 1 0 * * *    2.5
    * 1 1 * * *    5      c5
    * * 0 0 * *    5
    * * 0 1 * *    5
    * * 1 0 * *    2.5
    * * 1 1 * *    5      c6
    * 0 * 0 * *    5
    * 0 * 1 * *    5
    * 1 * 0 * *    2.5
    * 1 * 1 * *    5      c7

Table 7.2: Gene fitness for different gene combinations of order 2

If, on the other hand, the samples are generated using an evolutionary system, they are not likely to be evenly distributed. Since genotypes containing c1 achieve a much higher similarity fitness, which is used as the fitness in the evolutionary system, than those containing c2, c3 or c4, they are likely to be more common in the sample set. As a result, c5, c6 and c7 will have increased κ-values. For example, if the relative ratios of the genotypes are as given in Table 7.3, the κ-values are as shown in Table 7.4. Here, c5, c6 and c7 have the highest κ-value, and can therefore be identified and used to create a new evolved gene, which will then in turn allow the creation of an evolved gene for c1.

    genotype       ratio   gene combination
    * 1 1 1 * *    3       c1
    * * *          1       c2
    * * *          1       c3
    * 1 1 0 * *    1       c4
    other          2

Table 7.3: Sample distribution of genotypes in population

    genotype       κ      gene combination
    * 0 0 * * *    5
    * 0 1 * * *    3.33
    * 1 0 * * *    5
    * 1 1 * * *    7.5    c5
    * * 0 0 * *    5
    * * 0 1 * *    5
    * * 1 0 * *    3.33
    * * 1 1 * *    7.5    c6
    * 0 * 0 * *    5
    * 0 * 1 * *    5
    * 1 * 0 * *    3.33
    * 1 * 1 * *    7.5    c7

Table 7.4: Gene fitness for different gene combinations of order 2, if different genotype ratios are taken into account
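The effect described with Tables 7.1 to 7.4 can be checked with a small calculation. The sketch below is illustrative only. It assumes that c1 fixes positions 2 to 4 of the genotype to 1 1 1, and that c2 to c4 are the three patterns with exactly one 0 in those positions, which is what a κ of 5.0 for c5, c6 and c7 under a uniform distribution implies. These patterns are an assumption of the sketch, so rows other than c5 to c7 may differ from the tables. The names `kappa`, `uniform` and `skewed` are hypothetical.

```python
from itertools import product

# Similarity fitness rho, keyed by the values at genotype positions 2 to 4.
RHO = {
    (1, 1, 1): 10.0,   # c1
    (0, 1, 1): 0.0,    # assumed pattern for one of c2..c4
    (1, 0, 1): 0.0,    # assumed pattern for one of c2..c4
    (1, 1, 0): 0.0,    # c4, the "other half" of genotypes containing c5
}
DEFAULT_RHO = 5.0      # all other genotypes


def kappa(pattern, weight):
    """Normalized gene fitness of a pattern; None marks a wildcard position."""
    numerator = denominator = 0.0
    for bits in product((0, 1), repeat=3):
        if all(p is None or p == b for p, b in zip(pattern, bits)):
            w = weight(bits)
            numerator += w * RHO.get(bits, DEFAULT_RHO)
            denominator += w
    return numerator / denominator


def uniform(bits):
    """All genotypes equally common in the sample set."""
    return 1.0


def skewed(bits):
    """Relative ratios 3 : 1 : 1 : 1 : 2 as in Table 7.3."""
    if bits == (1, 1, 1):
        return 3.0
    return 1.0 if bits in RHO else 2.0


C5, C6, C7 = (1, 1, None), (None, 1, 1), (1, None, 1)
for name, c in (("c5", C5), ("c6", C6), ("c7", C7)):
    print(name, kappa(c, uniform), kappa(c, skewed))
```

Under the uniform distribution the three order-2 combinations are indistinguishable from the background, while the fitness-proportional distribution lifts them above it.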

Other training methods

Instead of using an evolutionary system for the task of creating the samples, one could also attempt to enumerate all individuals, or use random samples. However, due to the size of the search space, enumerating is usually impossible. Random creation of genotypes has the same problem, as in a very large search space randomly generated phenotypes are highly unlikely to show any of the more complex features. An evolutionary system, requiring only an example and a measure of similarity, is therefore the best choice. Also, the inherent parallelism in the evolutionary system helps to generate a large variety of individuals, and the basic representation for the individuals is already designed for use with evolutionary systems. Finally, as described above, the approximately fitness-proportional distribution of individuals in the population of an evolutionary system is important for the ability to create evolved genes from smaller sub-genes.

Also, instead of using a single example, it is possible to use a set of examples. In this case, the similarity fitness would be calculated against all the examples, and either the average or the maximum could be used. The result would be a somewhat broader focus, centred around the common features of the examples.

An entirely different possibility is not to specify an example, but an example design fitness. While the evolutionary system is running to produce individuals optimized with regard to this design fitness, evolved genes are created from these individuals. The resulting evolved representation then creates a focus on designs that are suited to optimize the example design fitness.

As long as the phenotypes are only similar, and not identical, to the example, any feature in the genotype can also be simply caused by random influences in the search.
As a result, it is only likely, but not guaranteed, that a gene combination with a high κ is in fact related to a feature of the example. In some applications, it might be possible to create a similarity fitness that is 0 if the individual contains only approximations of the features, but no exact copies. This would guarantee the existence of features in the phenotype. This fitness could be used to prevent the extraction of genes that do not correspond to features, but experiments with such a fitness indicated that using it in the evolutionary process can strongly slow down the creation of new individuals. This is similar to some conventional evolutionary algorithm applications, where experimental evidence shows that it can be more efficient to keep individuals that violate constraints in the population and reduce their fitness with a penalty function, rather than removing them immediately. Since the similarity fitness used in the evolutionary system does not have to be the same as the one used for the computation of κ, it is possible to use one function to compute a similarity-based fitness for the evolutionary process, and a different function to derive another similarity-based value ρ, which is adapted for the gene creation. The example shown in Section 8.3 uses two similarity fitnesses in this manner.

7.4 Non-consecutive Evolved Genes

As defined in Section 3.3, evolved genes protect sets of basic genes in genotypes from disruption by the genetic operations, like mutation and cross-over. Implementing such a protection, however, can be difficult if the protection leads to interlocking chains of evolved genes on the genotypes. Use of a diploid representation can help in this case, as shown in the following section.

Interlocking Evolved Genes

As described in Section 2.1, cross-over operations work by cutting the genotypes of two parents into two or more parts, and then swapping the parts between the parents. Evolved genes, however, can interlock under certain conditions, and lead to situations where no point can be found where cutting is possible without destroying one or more evolved genes. Interlocking of evolved genes can always happen if evolved genes are used that do not consist entirely of consecutive basic genes, in other words if they contain holes, indicated by * symbols in the schema representation. Figure 7.1(a) shows an example of four interlocking evolved genes: e1 = af c, e2 = a ab a, e3 = d c d b, e4 = a a. If the operation was allowed to cut through an evolved gene, the protection of the combinations of basic genes would be violated and the evolved genes would lose their functions.

Figure 7.1: (a) Interlocking evolved genes, (b) non-interlocking evolved genes; evolved genes indicated by connecting lines, broken lines indicate positions for possible cuts.

Interlocking can be prevented in some cases by allowing only evolved genes that consist of directly adjacent, consecutive basic genes. These genes cannot interlock, and conventional genetic operations thus work with little change, Figure 7.1(b). Section 8.2 shows an example of an application using evolved genes in this manner.

Using Dominant/Recessive Genes

While for some applications the genotype-phenotype transformation is such that non-consecutive gene combinations are unlikely to be connected to features in the phenotypes, this cannot be assumed in the general case. To allow the use of arbitrary evolved genes, the genetic operations and the method used to generate new individuals from evolved genes have to be modified.
In doing this it is important to make sure that evolved genes will still be protected.

Cross-over

The cross-over operations consist of two parts: cutting the genotypes and re-assembling the two parts. To deal with interlocking evolved genes, the cutting operation has to be modified. It has to cut around the evolved genes. Given a random cutting site, the complex genes spanning the cross-over site can be identified and the genotype cut so that each evolved gene remains completely on one of the two pieces. Figure 7.2 shows that the genotype in Figure 7.1(a) can be cut in three different ways into two parts without destroying any of the evolved genes. The cut can be done either between evolved genes e1 and e2, between evolved genes e2 and e3, or between evolved genes e3 and e4.
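Cutting around interlocking evolved genes can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in this work: evolved genes are given as lists of genotype positions ordered by their first position, a cut is identified by the index `k` of the first evolved gene that goes to the right-hand piece, uncovered basic genes are split at the first position of that gene, and holes are represented by `None`. The names `cut_around` and `_strip` are hypothetical.

```python
def _strip(segment):
    """Drop holes before the first and after the last occupied position."""
    occupied = [i for i, s in enumerate(segment) if s is not None]
    return segment[occupied[0]:occupied[-1] + 1] if occupied else []


def cut_around(genotype, evolved, k):
    """Split a genotype so that every evolved gene stays on one piece.

    genotype: list of basic gene symbols.
    evolved:  list of position lists, one per evolved gene, ordered by
              first position and not sharing positions.
    k:        evolved genes 0..k-1 go left, genes k.. go right (0 < k < len).
    Returns (left, right); positions taken by the other piece become holes.
    """
    owner = {pos: i for i, gene in enumerate(evolved) for pos in gene}
    boundary = evolved[k][0]  # assumed split point for uncovered basic genes
    left, right = [], []
    for pos, symbol in enumerate(genotype):
        if pos in owner:
            goes_left = owner[pos] < k
        else:
            goes_left = pos < boundary
        left.append(symbol if goes_left else None)
        right.append(None if goes_left else symbol)
    return _strip(left), _strip(right)


# Three interlocking evolved genes on a six-gene genotype.
genotype = list("abcdef")
evolved = [[0, 2], [1, 4], [3, 5]]
cuts = [cut_around(genotype, evolved, k) for k in (1, 2)]
```

With three interlocking genes there are two such cuts; both leave holes at the ends of the pieces, which is what makes the re-assembly step non-trivial.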

Figure 7.2: Possible cross-over cuts for the genotype shown in Figure 7.1(a); evolved genes indicated by connecting lines.

The difficulty arises in the reassembly step: if at least one evolved gene is spanning the cutting site, the ends of the two sections that have to be combined into a new genotype will contain holes. If neither the end of the first segment nor the beginning of the second segment contains holes, the new genotype can be simply constructed by appending one segment to the other, as demonstrated in Figure 7.3(a). Similarly, if the holes in the two segments happen to fit, they can be assembled in a zipper-like manner, see Figure 7.3(b).

Figure 7.3: Different ways to recombine two segments in a cross-over operation, dominant/recessive gene pairs produced in (c) and (d) are indicated by two genes sharing one position.

In the general case, genes at the end of one segment will only partially overlap with holes at the beginning of the other segment. The solution suggested here borrows the notion of recessive and dominant genes in sexual reproduction: it allows pairs of genes in some locations on the genotype, with one gene shadowing the other, see Figure 7.3(c). When the genotype is transformed into a phenotype, one of the genes is expressed while the other is ignored. Which gene is expressed is decided when the cross-over operation is executed. Possible criteria are: the gene of the larger evolved gene; the gene of the evolved gene with the higher gene combination fitness (see Section 3.4); or one of the two genes randomly selected.

Since some evolved genes are not entirely expressed in the phenotype, the protection of evolved genes is violated. However, this violation is small, since the rest of the evolved gene is still expressed. The important advantage of allowing overlapping, dominant and recessive genes, however, is that all evolved genes involved are still complete units and protected as evolved genes. In further genetic operations they will still be protected from being permuted by the genetic operation. It is also possible that the shadowed parts will be expressed again in an offspring after a further cross-over or a mutation.

An interesting special case exists where the holes do not fit but the values of the conflicting genes are equal. This continuity between the two segments, as in Figure 7.3(d), indicates that the new individual might have a higher chance of being successful.

An alternative to allowing dominant and recessive genes is to always append the segments so that holes in the genotype remain, and fill the holes with random basic genes, Figure 7.3(e). This is only possible if variable-length representations are used, and it results in very long genotypes being produced, especially once the evolved genes start getting relatively large (maybe spanning 30 or more). It also adds a larger amount of random basic genes, which softens the focus created by the evolved genes.
Equally unsatisfying is the option of just rejecting the cross-over, since with a high percentage of evolved genes in the population, this would lead to a very large number of rejects and a very strong restriction in possible cross-overs, thereby reducing the performance of the system.

Mutation

In a mutation operation, genes in the genotype are randomly swapped with different genes from the current set of basic and evolved genes. While replacing a basic gene by a new basic gene is straightforward, replacing a complex evolved gene with another complex evolved gene is likely to create an overlap. If an evolved gene is removed from a genotype, it leaves a pattern of holes; any different evolved gene will probably have genes at different places, and therefore will collide with genes in the genotype.

As with cross-over, using dominant and recessive genes is the best solution. At all places where the new evolved gene collides with an existing evolved gene, both conflicting genes are kept. The same set of criteria as for cross-over can be used to decide which gene is the dominant and which the recessive. If a collision is caused by a basic gene in the genotype, the basic gene can simply be removed in favour of the evolved gene. To minimize the number of shadowed genes, one can optionally select the replacement gene as the best fitting one out of a random selection of evolved genes. Remaining holes are filled with basic genes, see Figure 7.4. This figure also shows how a recessive gene (basic gene d of evolved gene e2) can become expressed again as the result of a mutation.

Figure 7.4: Mutation using complex evolved genes: Evolved gene e1 is removed from the genotype, and replaced by evolved gene e3, one recessive gene of evolved gene e2 becomes visible, while another one becomes recessive.
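The handling of overlaps with dominant and recessive genes, as used in cross-over re-assembly and in mutation, can be sketched as follows. This is an illustrative sketch only: segments are lists in which `None` marks a hole, a conflicting position is stored as a `(dominant, recessive)` tuple, and the hook `left_dominates` stands in for whichever criterion is chosen (size of the evolved gene, gene combination fitness, or random choice). The names `join`, `express` and `left_dominates` are hypothetical.

```python
def join(left, right, overlap, left_dominates=lambda a, b: True):
    """Join two segments so that the last `overlap` slots of `left`
    coincide with the first `overlap` slots of `right`."""
    n = len(left) - overlap
    out = list(left[:n])
    for a, b in zip(left[n:], right[:overlap]):
        if a is None or b is None:
            # A hole meets a gene (or another hole): zipper-like fit.
            out.append(a if b is None else b)
        elif a == b:
            # Equal values: the segments continue each other, no conflict.
            out.append(a)
        else:
            # Conflict: keep both genes, one shadows the other.
            out.append((a, b) if left_dominates(a, b) else (b, a))
    out.extend(right[overlap:])
    return out


def express(slots, fill):
    """Genes seen by the genotype-phenotype transformation: the dominant
    gene of each pair; remaining holes are filled by calling fill()."""
    result = []
    for slot in slots:
        if slot is None:
            result.append(fill())
        elif isinstance(slot, tuple):
            result.append(slot[0])
        else:
            result.append(slot)
    return result


appended = join(list("abac"), list("acad"), 0)
zipped = join(["a", "b", None, "d"], ["c", None, "e"], 2)
paired = join(["a", "b", "d"], ["c", "e"], 1)
```

Keeping the recessive gene in the tuple, rather than discarding it, is what allows a shadowed gene to be expressed again after a later operation.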

Creating new individuals

At the start of any design process using the evolved representation, a new initial set of individuals has to be generated from this representation. If a set of complex evolved genes is randomly selected from the representation, it is unlikely that they can fit together without holes and overlaps. Re-ordering the evolved genes can minimize the overlaps, but for the remaining overlaps in the individuals the best solution is again to use dominant and recessive genes. Remaining holes can be filled with basic genes.

Dominant/Recessive Genes in Other Work

The notion of diploid genetic representations with dominant and recessive genes has been used in genetic algorithms before, but for very different purposes: a recessive set of genes can act as a memory of previously useful gene sequences, and help the genetic algorithm to adapt in applications with non-stationary fitness functions (Goldberg and Smith, 1987; Ng and Wong, 1995). Other work has shown that diploidy can help protect genetic algorithms against premature convergence (Yukiko and Nobue, 1994; Greene, 1996).

Chapter 8 Example Applications

This chapter will present two example implementations, demonstrating how evolved representations can be created and used in different domains. The first section will describe some general implementation schemes that are used for both implementations; the following sections will then describe the two examples. This chapter describes only the implementation and the resulting evolved genes and designs. Quantitative analysis of data collected during the runs and comparisons between the two applications are discussed in Chapter 9, and the results are summarized in a subsequent chapter.

General Implementation

All implementations consist of two independent programs: one to create the evolved representation, and one to use a given evolved representation to produce new designs. It is important to note that none of the systems is designed to optimize the performance in terms of convergence speed, or to find a globally optimal design. Instead, the goal is always to produce results that show how the evolved representation can be used to influence the design process.

Figure 8.1 shows the flowchart of the gene creation programs. The shaded elements in the top are part of a conventional evolutionary system; the functions below are added for the gene extraction. Some functions are only executed when certain conditions are met; the exact conditions vary between the implementations and will be described in the context of each application. The flowchart of the programs used to produce new designs is the flowchart of a standard evolutionary system (see Figure 2.1).

In all cases, different runs with different parameters and functions have been carried out. Different parameters generally resulted in differences in the quality of the result, but not in a change in general behaviour.
The parameters that have been shown to produce the best results are used for the examples presented here; best results means in this case an evenly distributed set of evolved genes for the gene creation, and highest fitness for the creation of new designs. In the text, the actual parameter values used are indicated in brackets, in an italicised typeface. If not indicated otherwise, random selections use equidistributed random functions.

Selection

The selection method employed for both the generation of evolved genes and of new designs is very similar in both implementations.

Figure 8.1: Flowchart of the gene creation programs (steps: create initial population; check termination condition; select parents; recombination; mutation; select new population from offspring and current population; if it is time for gene extraction: create table of gene combinations, find best gene combination, maybe introduce gene combination into representation and population).

In every cycle, only two parents are selected, and only one or two offspring are generated. The systems are therefore generation-less, or steady-state, evolutionary systems. With larger population sizes, steady-state evolutionary algorithms perform comparably with non-overlapping population systems (Sarma and De Jong, 1997). They are preferred for practical reasons, as limiting the process to the addition and removal of single individuals allows simpler book-keeping of gene-combination and other statistics. Steady-state evolutionary algorithms are automatically elitist as well, since there is no restriction on the time a good individual can survive in the population (Sarma and De Jong, 1997). The systems employ either tournament selection or niched Pareto selection as the selection method.

Tournament Selection

To select parents from a population using tournament selection (Bickle, 1997), q individuals are selected randomly from the population, and ranked according to their fitness. The highest-ranked individual is the individual selected for this tournament. For each parent required, a separate tournament is used. The tournament size q can be used to control the selection pressure; a higher q produces a higher pressure. To select individuals that are removed from the population, the lowest-ranking individual in the tournament is selected; the tournament size can be different for the selection of parent individuals and for the selection of individuals that are removed from the population.
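A steady-state cycle with tournament selection for both parents and removal can be sketched as follows. The sketch is illustrative and makes several assumptions that are not taken from the implementations described here: contestants are drawn without replacement, exactly one offspring is produced per cycle, and the offspring always takes the place of the loser of a removal tournament. The names `tournament`, `steady_state_step` and `breed` are hypothetical.

```python
import random


def tournament(population, fitness, q, rng=random, worst=False):
    """Draw q distinct individuals at random and return the highest-ranked
    one, or the lowest-ranked one if `worst` is set."""
    contestants = rng.sample(population, q)
    pick = min if worst else max
    return pick(contestants, key=fitness)


def steady_state_step(population, fitness, breed,
                      q_parent=2, q_remove=2, rng=random):
    """One generation-less cycle: two parents, one offspring, one removal."""
    mother = tournament(population, fitness, q_parent, rng)
    father = tournament(population, fitness, q_parent, rng)
    child = breed(mother, father)
    loser = tournament(population, fitness, q_remove, rng, worst=True)
    population.remove(loser)
    population.append(child)
    return child


# Toy usage: individuals are numbers and the fitness is the number itself.
population = [1, 5, 3, 9, 7]
child = steady_state_step(population, lambda x: x, lambda a, b: a + b,
                          q_parent=5, q_remove=5)
```

Only the rank order inside a tournament is used, so rescaling the fitness function does not change which individual is picked.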

Tournament selection has the advantage that it is well suited for the selection of single individuals, as used in steady-state evolutionary algorithms. Also, since only the rank order of the individuals in the tournament is important, tournament selection is invariant to scaling of the fitness, and therefore gives the design of the fitness function a higher degree of freedom. Finally, it does not require the ordering of all individuals in the population, and is therefore independent of the population size.

Niched Pareto Selection

Creation of a rank ordering for individuals in tournament selection is straightforward if only one fitness criterion is used. If more than one fitness criterion is used, the fitnesses can be integrated into one fitness, for example by calculating a value between 0 and 1 for every individual fitness, and adding or multiplying them into a single fitness value. Unfortunately, by integrating all fitnesses into one value, the information about what fitness conditions are fulfilled and what conditions are not is lost to the system. As a result, the individuals that are produced are often those that achieve the highest overall fitness by optimizing with respect to a subset of the fitnesses, and sacrifice performance with respect to the other fitnesses. The population converges towards a small subset of potentially desirable solutions.

A better way to handle a high number of fitness criteria is therefore to utilize Pareto optimization (see for example Radford and Gero, 1988). In a Pareto optimization process, only a partial ranking between two individuals is established. If two individuals are compared, one individual is better than the other (dominates it) if and only if it is better in at least one fitness criterion and not worse in any of the others. Often, each of the two individuals compared has at least one fitness criterion where it is better than the other individual; in this case no individual dominates the other.
To select individuals that are used as parents to produce offspring in the genetic operations, two individuals are picked randomly from the population and compared to a randomly picked reference set. If one of the individuals is dominated by one of the reference individuals while the other is not, the second individual is selected as the parent. Otherwise, neither of the individuals is preferred. In this case, either of the two individuals is used randomly.

This selection alone is not sufficient to prevent all individuals clustering in a small subset of possible, good solutions. As an additional measure to prevent convergence, niching is used (Horn and Nafpliotis, 1993; Mahfoud, 1997). Here, candidate individuals are compared with a number of other individuals in the population. For every individual, the distance between the fitness values is calculated. The number of individuals with a distance smaller than a threshold value is called the niche-count (Mahfoud, 1993). In niching Pareto optimization, in order to select between two individuals that either both dominate the reference set or are both dominated by at least one individual in the reference set, the individual is chosen that has the smaller niche-count.

If Pareto fitnesses are used, the treatment of a newly generated individual depends on its domination of individuals in the population. If a newly generated individual dominates another individual in the population, it can replace it. It might also dominate more than one individual, in which case it is implementation dependent whether one, some, or all of the dominated individuals are removed from the population. If the new individual does not dominate at least one individual in the population, and it is dominated by at least one other individual in the population, it is rejected. The last possibility is that the new individual is neither dominated nor dominates another individual, and therefore populates a new part of what is called the Pareto optimal front.
In this case, the individual has to be added to the population without replacing another individual.
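The niched Pareto selection and the insertion rule described above can be sketched as follows. This is a minimal sketch under stated assumptions rather than the implementation used in this work: individuals are represented directly by their fitness tuples, higher values are better, the niche-count uses the Euclidean distance to all individuals in the population, and a new individual that dominates others replaces all of them. The names `dominates`, `niche_count`, `select_parent` and `insert` are hypothetical.

```python
import random


def dominates(a, b):
    """a dominates b: not worse in any criterion, better in at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def niche_count(f, population, radius):
    """Number of individuals whose fitness lies within `radius` of f."""
    return sum(
        1 for g in population
        if sum((x - y) ** 2 for x, y in zip(f, g)) ** 0.5 < radius
    )


def select_parent(population, ref_size, radius, rng=random):
    """Pick two candidates; prefer the one not dominated by the reference
    set, and fall back to the smaller niche-count on a tie."""
    a, b = rng.sample(population, 2)
    reference = rng.sample(population, ref_size)
    a_dominated = any(dominates(r, a) for r in reference)
    b_dominated = any(dominates(r, b) for r in reference)
    if a_dominated != b_dominated:
        return b if a_dominated else a
    return min((a, b), key=lambda f: niche_count(f, population, radius))


def insert(population, new):
    """Apply the insertion rule; returns False if `new` is rejected."""
    dominated = [p for p in population if dominates(new, p)]
    if dominated:
        for p in dominated:          # assumption: remove all dominated ones
            population.remove(p)
        population.append(new)
        return True
    if any(dominates(p, new) for p in population):
        return False                 # dominated and dominating nothing
    population.append(new)           # new part of the Pareto front
    return True


front = [(1, 3), (3, 1)]
added = insert(front, (2, 2))        # neither dominates: population grows
rejected = insert(front, (1, 1))     # dominated by (2, 2): rejected
```

Because insertion can add without removing and remove several at once, the length of `front` changes from call to call.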

The population size of an evolutionary system using Pareto fitnesses is therefore not constant; it can both grow and shrink over time.

8.2 Design Example 1: Generating Floor Plans

The first of the two example implementations is taken from the domain of floor plan generation. The idea is to use a set of floor plans with a clear, recognizable style, and generate an evolved representation for those floor plans. Using this evolved representation in the creation of new designs will then focus the design process on designs which show style similarities to the examples. Shifting the focus is possible by manipulation of the evolved genes, leading to designs of a different appearance.

The example floor plans used were created by the American architect Frank Lloyd Wright; they are all Prairie Houses designed between 1901 and 1910 (Wright, 1983). These houses have been selected for a number of reasons: they show a very clear and distinctive style, they are well documented and analysed, and they are easy to represent in an encoding suitable for evolutionary algorithms.

Background

The unique and distinctive style and the large number of designs available have made the work of Frank Lloyd Wright a popular example for researchers who are interested in formal aspects of style. Among the publications of special interest in the context of this thesis are the works of Chan (1992) and Koning and Eizenberg (1981). Both chose the Prairie Houses as the group of designs studied.

Frank Lloyd Wright Floor plans

Chan (1992) has analysed the style of Frank Lloyd Wright's Prairie Houses, in terms of the design process and design features. He identifies the first four design steps:

1. An abstract of the space is developed.
2. From this, a geometric pattern is created.
3. The functional requirements are integrated into the basic pattern.
4. The elevations follow from application of an elevation grammar to the plan.
After step three, a first basic outline of the floor plan, together with the functional organization of the building, is established. This basic outline might never be put onto paper by the designer. However, it exists implicitly in the final designs, and can generally be reconstructed from the floor plans of the actual buildings. As step four shows, further design work follows from this outline. The outline shows many of the characteristics of the buildings without incorporating too much detail. For these reasons, this outline of shapes and functional organization is very well suited as the base of the examples used to create an evolved representation.

Chan also identifies a number of common features in the designs, which can be identified directly from the set of floor plans without requiring any additional knowledge. The features that are represented in the basic outline defined above are:

1. Floor plans are always based on a grid; the grid size depends on the project.
2. The fireplace is at the centre of the composition; all spaces extend from there.

3. One major shape in the floor plan is long and narrow; much of the house is only one room in depth.
4. The Prairie Houses have similar topological arrangements.

An evolved representation created from example designs given as a basic outline could in principle learn all these features. Which of these features are learned in an actual implementation depends on a number of factors, for example the basic representation and the set of examples used.

Shape Grammar representation of Frank Lloyd Wright's Prairie Houses

Based on the layouts of eleven Prairie Houses, Koning and Eizenberg (1981) develop a shape grammar that can be used to construct ten of these houses, as well as many others that show a similar style. Their work is based on what could be described as a top-down analysis of the design process. Roughly, the following phases can be distinguished: starting with the fireplace, a basic composition is created (18 rules). This composition is further elaborated by adding corners and porches, and detailing the interior layout (16 rules). More exterior details are added (22 rules), and the design is extended into the third dimension (12 rules). The roof is established (19 rules), together with some more details (4 rules). With an additional 8 rules to manipulate labels, 99 rules are necessary to generate the ten different layouts.

The focus of the work described in this thesis is the designs that are created by the first 34 rules: 2-dimensional layouts, with a developed basic layout, organized into function zones, and some detailing. This is very similar in level of detail to the basic outline created in step three in Chan's (1992) analysis described above.

Semantics

An important aspect of the designs used is the distinction between different function zones. The layouts have zones representing living space, service space and porches. Of central importance is the location of the fireplace.
In the shape grammar used in Koning and Eizenberg (1981), the zones for service and living space are established around the fireplace with the first rules, and detailed at the end of the first 34 rules. At the same time, porches are added. Labels are used to distinguish between the different function zones.

The Basic Representation

The choice of the basic representation for the floor plans is influenced by a number of criteria. The basic representation has to be able to represent the floor plans in the level of detail chosen for the examples. As described, both shapes and functional organization can be important aspects of the style of a set of designs. The basic representation therefore has to be able to represent both shape and function knowledge. The basic representation should be able to represent a large set of designs other than the examples; it should not exclude any designs that could potentially be interesting to the user. The basic representation should try not to introduce a bias towards specific designs or design features.

The size of the search space that is created by the basic representation should be small enough so that for the creation of the evolved representation, the evolutionary process is able to find a large enough variety of samples that are sufficiently similar to the examples.

The last criterion, which is in conflict with the others, is not an absolute measure; it depends on a number of factors like the computational difficulty of the gene creation fitness function, the computational power available and the time the user is prepared to wait. However, it does represent a real limit to the basic representation. In general terms, it can be paraphrased together with the second criterion as: "while the search space created by the basic representation should include all potentially interesting designs, it should not be unnecessarily large".

One of the style features identified by Chan for Frank Lloyd Wright's Prairie House floor plans is that the floor plans generally are laid out on a uniform grid. A comparison of the plans drawn up in Koning and Eizenberg (1981) also shows that all inner walls and nearly all of the outer walls can be represented by straight lines on an orthogonal grid. A simple representation that is able to approximately represent the plans shown in Koning and Eizenberg (1981), as well as a very large set of other designs, can therefore be created by using sequences of north-south and east-west unit length vectors. This can be encoded into a genotype as a sequence of movements and turns, with the following set of instructions: turn 90° left, turn 90° right, draw a line of one unit length in the current direction, and move one unit length in the current direction without drawing. Phenotypes of different complexity require a different number of turns and lines, and therefore differently sized genotypes, requiring a variable-length representation (Section 2.1.1).

To add function knowledge, colour information can be used.
For this purpose, a table is used that associates different functions with different colours. The basic representation is then changed so that each basic gene representing a segment also contains information about its colour, which can be any of the colours used in the table. Functions that can be represented by a colour in this way could be, for example, fireplace, wall between service and living space, and outer wall of a porch.

Using n different colours, two possible turns, and a move without drawing, n + 3 different instructions are required to represent the drawings. These instructions can be represented using m basic genes in a number of ways. Figure 8.2(a) shows possible representations for m = 2 and m = n + 3. To define the initial direction, the first gene in a genotype is used. For this gene, the n + 3 basic genes are mapped onto the four possible initial directions using a modulo operation, as shown in Figure 8.2(b). Figure 8.2(c) shows an example of a phenotype and its genotype representation including a gene for the initial direction, using both types of encoding. For m = 2, more than one basic gene is required to represent a feature; as a result, the function of a basic gene is different in different positions in the genotype. Using m = n + 3, a single basic gene represents exactly one instruction; the connection between basic genes and features in the phenotype can therefore be expected to be strongest. For that reason, this representation is used in the implementation described here.

The use of colours to represent the functions is not actually required; the functions could equally well be directly encoded into the genotype. For example, instead of having a code for a red line segment, it would be possible to have a code for an outer living space wall segment.
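The translation from an m = n + 3 genotype to a phenotype can be illustrated with a short sketch. It is not the implementation used in this work; it assumes the instruction codes of Figure 8.2(a) (0 = right turn, 1 = left turn, 2 = step ahead, 3 and above = coloured line) and the modulo mapping of Figure 8.2(b) for the first gene. The coordinate convention, the start at the origin, the rule that a later line overwrites an earlier one, and the name `decode` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]
Segment = Tuple[Point, Point]      # unit segment, endpoints in sorted order

# Assumed conventions: x grows to the right, y grows upwards.
RIGHT, LEFT, UP, DOWN = (1, 0), (-1, 0), (0, 1), (0, -1)
INITIAL_HEADING = [RIGHT, LEFT, UP, DOWN]   # first gene modulo 4, as in Figure 8.2(b)

TURN_RIGHT, TURN_LEFT, STEP = 0, 1, 2       # codes >= 3 draw a line in colour (code - 3)


def decode(genotype: List[int]) -> Dict[Segment, int]:
    """Translate an m = n + 3 genotype into a phenotype: unit segment -> colour index."""
    phenotype: Dict[Segment, int] = {}
    if not genotype:
        return phenotype
    dx, dy = INITIAL_HEADING[genotype[0] % 4]
    x, y = 0, 0                              # assumed start at the origin
    for gene in genotype[1:]:
        if gene == TURN_RIGHT:
            dx, dy = dy, -dx                 # rotate heading 90 degrees clockwise
        elif gene == TURN_LEFT:
            dx, dy = -dy, dx                 # rotate heading 90 degrees counter-clockwise
        else:
            nx, ny = x + dx, y + dy
            if gene != STEP:
                a, b = sorted([(x, y), (nx, ny)])
                # Assumption: a later line overwrites an earlier one on the same segment.
                phenotype[(a, b)] = gene - 3
            x, y = nx, ny
    return phenotype
```

With this sketch, a genotype consisting only of turns after the direction gene decodes to an empty phenotype, and variable-length genotypes need no special handling because the loop simply runs over however many genes are present.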
The use of colours as an intermediate representation however provides an additional layer of abstraction between the example domain and the gene creation, and helps the preparation of the examples and understanding of the results by a program user. Since the colours are used only to represent function, the actual colours used can be chosen arbitrarily. It is interesting to note the difference to the use of colour in shape

grammars (Knight, 1989a). In the so-called colour grammars, the actual colours are the final goal, and are in turn represented by markers in the grammar.

(a)
Instruction    Code for m=2      Code for m=n+3
Right turn     (000) or (001)    (0)
Left turn      (010) or (011)    (1)
Step ahead     (100)             (2)
Blue line      (101)             (3)
Red line       (110)             (4)
Green line     (111)             (5)

(b)
Initial direction    Code for m=2    Code for m=n+3
Right                (00)            (0) or (4)
Left                 (01)            (1) or (5)
Up                   (10)            (2)
Down                 (11)            (3)

(c) Code for m=2: ( ); code for m=n+3: ( )

Figure 8.2: Basic representation used to represent Frank Lloyd Wright floor plans: (a) encoding of line segments and turns, (b) encoding of initial direction, (c) representation of an example phenotype.

Learning Representation

The system used to create the floor plan follows the flowchart shown in Figure 8.1, but uses an additional function in the evolutionary algorithm loop that is used to manipulate the fitness assignment. The fitness calculation is described in detail in the following section; details of the other function blocks of the evolutionary system follow.

Example Floor Plans

To learn the evolved representation, four examples derived from the Prairie House floor plans illustrated in Koning and Eizenberg (1981) are used: the Henderson house, the Thomas house, the Martin house and the Baker house. The representation does not allow diagonal lines; two diagonal line segments in the Henderson house therefore had to be replaced by corners. The plans are fitted onto a grid. All four plans consist of a similar number of line segments: Henderson house 111, Martin house 101, Baker house 103, and Thomas house 108. Any room in the floor plan is assigned one of three functions: porch, living area or service area. Every wall segment separates either two rooms of identical function, two rooms of different function, or a room and the outside area.
Nine colours are used to represent the possible combinations, though two combinations are not used in the examples: porches can only have walls to the outside and living areas, but not to other

porch or service rooms. A tenth colour is used to represent the fireplace. Figure 8.3 shows the four examples and the colours used.

Figure 8.3: Frank Lloyd Wright houses used to create the evolved coding: (a) Henderson house, (b) Martin house, (c) Baker house, (d) Thomas house. Colour legend: Service-Living, Porch-Outside, Porch-Living, Service-Outside, Service-Service, Living-Living, Living-Outside, Fireplace, Porch-Porch (unused), Porch-Service (unused).

Fitness Calculation

The task of the fitness calculation during the creation of the evolved representation is to produce a measure of how many features from the set of examples are present in an individual. To do this, the phenotypes may be thought of as coloured rubber stamps. A stamp is tried at all grid points in the example; it is said to fit at a certain position in an example if the mark it would produce at this position exactly overlaps with the drawing. In other words, for every segment in the phenotype there exists a corresponding segment of identical colour in the example. Figure 8.4 demonstrates this for a simplified case without colours: given are an example drawing, Figure 8.4(a), and three different phenotypes, Figure 8.4(b). For phenotype A, no position exists in the example where a corresponding segment exists for all segments in the phenotype. Phenotype B fits at four positions, as shown in Figure 8.4(c).

An additional restriction is placed on phenotypes that consist either entirely of horizontal segments of the same colour or entirely of vertical segments of the same colour. For these phenotypes, at least one end of any of the lines (sequences of segments) in the phenotype has to coincide with a corner or intersection in the example. Figure 8.4 shows this for the one-segment phenotype C. Without this restriction, these phenotypes would have a strong advantage over phenotypes that contain corners and are therefore

anchored to positions in the example with corners. For example, in the example plan in Figure 8.4(a), phenotypes producing only a horizontal line of length one, two or three would each cover all horizontal segments in the example.

Figure 8.4: Fitness calculation for floor plan examples: (a) example plan, (b) phenotypes A, B and C, (c) phenotype B at different positions in the example, (d) total segments covered by phenotype B, (e) segments covered by phenotype C anchored at corners and intersections.

From this definition of fit, a number of measures can be derived for a phenotype:

1. the number of positions in the example where the individual fits (4 for phenotype B),
2. the number of segments covered at any single place, which is identical to the number of segments in the phenotype (3 for phenotype B),
3. the total number of segments covered by the phenotype at all positions where it fits, with segments covered multiple times counted once (11 for phenotype B).

Fitness sharing

Experiments with these fitnesses show that with any of them the system shows a very strong tendency towards convergence, producing populations where nearly all individuals cover a small section of only one of the examples. These populations therefore do not provide the variety of samples required for gene creation, and an additional mechanism is required to improve the distribution of the samples. To actively discourage convergence in the population, a mechanism is introduced that borrows an idea from artificial life research. The fitness for each single segment in the examples is seen as a resource that has to be shared between all individuals that cover this segment. The more individuals cover a segment, the less fitness each individual receives for covering it.
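The fit test and measures 1 to 3 can be made concrete with a small sketch. It is a hedged illustration, not the code used for the runs: drawings are assumed to be dictionaries from unit segments (sorted endpoint pairs) to colour indices, the names `fit_positions` and `measures` are hypothetical, and the anchoring restriction for single-colour straight-line phenotypes is deliberately left out.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]
Seg = Tuple[Point, Point]          # unit segment, endpoints in sorted order
Drawing = Dict[Seg, int]           # segment -> colour index


def shift(seg: Seg, ox: int, oy: int) -> Seg:
    """Translate a segment; translation keeps the endpoints in sorted order."""
    (x1, y1), (x2, y2) = seg
    return ((x1 + ox, y1 + oy), (x2 + ox, y2 + oy))


def fit_positions(stamp: Drawing, example: Drawing) -> List[Tuple[int, int]]:
    """Offsets at which every stamp segment lands on an example segment of equal colour."""
    if not stamp:
        return []
    # Any fit maps the first stamp segment onto some example segment,
    # so only offsets derived from example segments need to be tried.
    (sx, sy), _ = next(iter(stamp))
    offsets = {(ex - sx, ey - sy) for ((ex, ey), _) in example}
    return sorted(
        (ox, oy) for ox, oy in offsets
        if all(example.get(shift(s, ox, oy)) == c for s, c in stamp.items())
    )


def measures(stamp: Drawing, example: Drawing) -> Tuple[int, int, int]:
    """Measures 1-3: fit positions, segments per position, distinct segments covered."""
    positions = fit_positions(stamp, example)
    covered = {shift(s, ox, oy) for ox, oy in positions for s in stamp}
    per_position = len(stamp) if positions else 0
    return len(positions), per_position, len(covered)
```

For a unit square drawn in a single colour, a one-segment horizontal stamp fits at two positions and covers two distinct segments, while an L-shaped two-segment stamp fits at exactly one position.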
Segments covered by only a few individuals will provide niches; individuals covering those segments receive higher fitnesses and will therefore have a higher probability of being selected as parents. To implement this fitness sharing, a weight is associated with every segment. This weight $W_S$ could be calculated as $W_S = C/n$, where $C$ is a constant and $n$ is the number of individuals covering the segment. However, experiments with the implementation described here have shown that this linear sharing is still not sufficient. For the runs shown here, a quadratic sharing was therefore used: $W_S = C/n^2$. With weights associated with the segments, two more measures can be derived for the fitness of an individual:

4. the sum of weights for all segments covered at all positions; for example, with weights as in Figure 8.5(a), phenotype B receives 27 (Figure 8.5(b)),

5. the maximum of the sum of weights for segments covered at a single position, over all positions where the phenotype fits (9 for phenotype B, see Figure 8.5(c)).

Figure 8.5: Fitness sharing: (a) weights associated with a design, (b) weighted segments covered by phenotype B, (c) maximum set of weighted segments covered by the phenotype at a single position.

The two methods of fitness sharing proved to be comparable in preventing convergence; however, fitness 5 seemed to encourage the production of new individuals slightly more than fitness 4. The examples presented here therefore use this fitness.

This Artificial Life inspired fitness sharing is different from Pareto niching. Pareto niching usually works with a moderate number of fitness values that are each floating-point numbers or integers with a large number of possible values. If the total number of segments in the design is N, then the process used here can be described as using N fitnesses, each corresponding to one of the segments (about 400 in this case), which can assume only the values 0 and 1. The mechanism could be a useful alternative to Pareto fitnesses in other applications with similar fitnesses as well.

Other implementation details

Creation of initial population. The initial individuals are generated as a random sequence of basic genes; the length of the genotype is selected randomly with equal probability between a minimum (3) and a maximum (9) length. The initial population size is 150. The initial individuals are very small compared with the size required to represent a full copy of one of the examples. But since the individuals are generated randomly, only very small individuals have a chance to fit anywhere in the examples. This is not a problem, though, since the crossover operation will gradually produce larger individuals as more and more information about the examples is learned by the individuals.

Selection of parents. Parents are selected using tournament selection.
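Tournament selection over the shared fitness values can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the runs: each individual is assumed to be described by a list of segment sets, one set per position where it fits; `segment_weights`, `fitness4`, `fitness5` and `tournament` are hypothetical names; and drawing the contestants without replacement is an assumption, since the sampling scheme is not specified.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Set

Positions = List[Set[Hashable]]    # one set of covered segments per fit position


def segment_weights(coverage: Sequence[Positions], C: float = 1.0,
                    power: int = 2) -> Dict[Hashable, float]:
    """W_S = C / n**power, with n the number of individuals covering segment S."""
    counts: Counter = Counter()
    for positions in coverage:
        for seg in set().union(*positions):   # each individual counts once per segment
            counts[seg] += 1
    return {seg: C / n ** power for seg, n in counts.items()}


def fitness4(positions: Positions, w: Dict[Hashable, float]) -> float:
    """Sum of weights of all distinct segments covered at all fit positions."""
    return sum(w[s] for s in set().union(*positions))


def fitness5(positions: Positions, w: Dict[Hashable, float]) -> float:
    """Largest sum of weights covered at any single fit position."""
    return max((sum(w[s] for s in pos) for pos in positions), default=0.0)


def tournament(fitnesses: Sequence[float], k: int, rng: random.Random) -> int:
    """Index of the fittest of k individuals drawn without replacement (assumed scheme)."""
    contestants = rng.sample(range(len(fitnesses)), k)
    return max(contestants, key=lambda i: fitnesses[i])
```

Because the weights depend on how many individuals cover each segment, they have to be recomputed whenever the population changes; how often this was done in the actual runs is not stated here.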
The tournament size is comparatively high (19); this increases the selection pressure and thereby strengthens the effect of the fitness sharing in preventing convergence. Since a steady state evolutionary algorithm is used, only two parents are selected each time.

Production of offspring. The two parents are used to generate two offspring individuals using variable-length single-point crossover (Section 2.1.1). With a certain mutation probability (0.3), one of the offspring is mutated. The mutation operation selects one gene randomly from the evolved-level genotype. If an evolved gene is picked, it is replaced by another evolved gene, selected randomly from the set of evolved genes available at that time. A basic gene is replaced by a different basic gene. The length of the genotype of the offspring depends on the location of the crossover points; it can be shorter or longer than the parents.

Inserting offspring into the population. Since the goal of the system is to produce a set of samples for the gene extraction, all offspring are kept if:

- they are not empty, that is, if the phenotype contains at least one segment;
- they fit at least at one position in one of the examples;
- there is no individual in the population that has an identical phenotype.

No individuals already in the population are removed. As a result, the population grows continually during the run of the evolutionary system.

Creation of gene table. To create new evolved genes, a gene extraction is run periodically. In the example runs presented here, this is done whenever a certain number of individuals (20) has been added to the population. Section 7.3 has shown that it is necessary to consider only gene combinations of two genes for the creation of new evolved genes. However, the number of gene combinations of only two genes is still much too high to allow tabulation of them all in order to find the best combination. Fortunately, the specific genotype-phenotype translation used here allows further restrictions on the set of combinations. Since basic genes represent the same type of segment independent of their position on the genotype, only the relative position between genes has to be considered. Additionally, any non-consecutive genes in the genotype will produce very different features in the phenotype, depending on the genes between them. For example, two genes coding for line segments will produce a very different phenotype depending on whether the gene in between is a left turn, right turn, pen movement or another segment. Non-consecutive gene combinations are therefore not very likely to be connected to any feature in the phenotype, and a table of consecutive gene-pairs is sufficient to identify the most promising gene combinations in the population.
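A table of consecutive gene pairs can be sketched in a few lines. The scoring shown (each individual's fitness divided by its genotype length, accumulated per pair and averaged over the pair's occurrences) follows the procedure described for this table; counting every occurrence of a pair within one genotype separately, and the name `best_pair`, are assumptions of this illustration.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Optional, Sequence, Tuple

Pair = Tuple[Hashable, Hashable]


def best_pair(
    population: Iterable[Tuple[Sequence[Hashable], float]]
) -> Optional[Tuple[Pair, float, int]]:
    """Return (pair, mean per-gene fitness, occurrence count) for the best consecutive pair.

    population holds (genotype, fitness) tuples; genotypes are lists or tuples of
    gene symbols, and fitness is assumed to be the unweighted segment count.
    """
    total = defaultdict(float)     # accumulated gene-combination fitness per pair
    count = defaultdict(int)       # occurrence counter per pair
    for genotype, fitness in population:
        if len(genotype) < 2:
            continue
        share = fitness / len(genotype)          # contribution of each gene
        for pair in zip(genotype, genotype[1:]): # consecutive genes only
            total[pair] += share
            count[pair] += 1                     # assumption: every occurrence counts
    if not count:
        return None
    pair = max(count, key=lambda p: total[p] / count[p])
    return pair, total[pair] / count[pair], count[pair]
```

Returning the occurrence count alongside the pair lets the caller decide whether the pair is common enough to be turned into an evolved gene.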
Since the weights associated with the segments in the examples have only been introduced to prevent convergence in the population, but have no relevance for the comparison between phenotype and example, they are ignored for the calculation of the table of gene combinations. Instead, the fitness used is the number of all segments covered by the phenotype (fitness 3 above). To calculate the contribution of each gene combination to the fitness, this value is divided by the number of genes in the evolved-level genotype. The calculation of the best gene combination therefore requires the following steps:

1. Create a table of all different pairs of successive genes that occur in the population, with an entry for the gene-combination fitness and an occurrence counter.
2. For all individuals in the population: divide the fitness (calculated with method 3) of the individual by the length of the evolved-level genotype. In the table, add this value to the gene fitness of all pairs of successive genes occurring in the genotype of that individual. Also, increase the occurrence counter for each pair.
3. For all combinations, divide the gene fitness by the occurrence counter.
4. Find the pair with the highest quotient in the table.

Introduction of evolved gene into population. To ensure that the new gene combination occurs in a number of individuals, the occurrence counter is tested against a set threshold (2). If the occurrence counter is higher, a new symbol is added to the evolved representation, representing the gene combination. The new evolved gene is introduced into the population by replacing the gene combination with the new symbol in most of the genotypes it occurs in. A small randomly selected fraction of individuals (10%)

remains unchanged to ensure that no genetic material is entirely removed from the population.

Results

A number of runs have been performed; the results from one are presented here. This run, referred to in the following as flw-create-1, is also used to provide the evolved genes for the creation of new designs in the following sections. The creation of an evolved representation has no obvious termination condition (such as a certain minimum fitness). This run was therefore terminated once a sufficient number of evolved genes had been created. In this run, offspring were produced 1,668,720 times, each time creating two new individuals, resulting in a final population size of 22,116 (offspring are only kept if they fit the example and are not duplicates, see previous section), and producing 743 evolved genes.

Figure 8.6: Largest individuals in the population covering the four example floor plans: (a) Henderson house, (b) Martin house, (c) Baker house, (d) Thomas house.

Individuals in the population. Figure 8.6 shows some of the individuals produced in the run. Selected in this figure are the largest phenotypes that cover the four example floor plans. As can be seen in a comparison with Figure 8.3, the evolutionary system produced individuals that cover the example set very well. The Baker house is covered completely; one single segment (yellow) is missing from the Martin house, a four-segment line (black) from the Henderson house, and a six-segment line (brown) plus a single segment (yellow) from the Thomas house. The fitness sharing worked very

well, preventing a convergence towards individuals and genes clustering on only one of the examples (for more detail see Section 9.1.2).

Evolved genes. Figures 8.7 and 8.8 show two different selections from the set of genes produced in flw-create-1. Figure 8.7 shows the first 20 evolved genes produced; the numbers represent the order in which the genes are created. Since the first gene in a genotype is interpreted as the initial direction, a constant basic gene has been prepended to each evolved gene before transforming it into a phenotype for display in this figure. By default, all genotypes therefore start off in the horizontal direction. However, in some evolved genes the first basic gene represents a turn; those therefore start off vertically, like genes 2, 5, 10, 11, and 15 in Figure 8.7. Gene 4 codes for two left turns only; it therefore produces no visible phenotype. Some evolved genes also have a basic gene representing a turn as the last gene in the genotype; this turn is not visible in this illustration.

Figure 8.7: First genes created in the example run.

The genes shown demonstrate how evolved genes incorporating segments of different colour can contain additional topological information. For example, gene 3 implies that porches occur next to living areas, and gene 8 implies that a fireplace is located in the wall between a service zone room and a living zone room. Figure 8.8 shows the last 20 genes produced in the run. The genes show how larger shapes are represented by the evolved representation; good examples of shapes that could very easily be used in new designs are those produced by genes 732 and 738. The largest of the evolved genes, gene 728, is composed of 40 basic genes.
Translated into a binary string, a 133-bit string would be required to define this genotype.

Using evolved genes to produce new designs

The evolved representation, created using the example floor plans, can be used in the second step to produce new floor plans. The evolved genes are used in the creation of the initial population, and preserved throughout the evolutionary process, thereby creating a focus on designs that are in some way similar to the example designs. A conventional evolutionary system with steady state Pareto selection is used, with the fitness function now incorporating design criteria.

Fitness calculation

To produce satisfactory new individuals, a number of different fitness functions, collected into a Pareto fitness vector, have been used. The fitnesses can be divided into three groups. One group of fitness functions, the design brief fitnesses, is required to allow a designer to specify certain aspects of the new designs. This would include, for example, the number and the size of rooms in the design. Another group of fitnesses, representation fitnesses, particular to the use of the evolved representation, is required to ensure that the representation is used properly.

In this case, this means that all segments should be of the appropriate colour, for example that lines representing a wall between a living and a service room (brown in Figure 8.3) are in fact used between two rooms that are classified accordingly. The third group of fitnesses measures other design criteria that are usually implicitly assumed, but that have to be explicitly added to the fitness to produce acceptable designs. These implicit design fitnesses can be used to ensure, for example, that all rooms are completely enclosed, or that only one fireplace exists.

Figure 8.8: Last genes created in the example run.

All fitnesses are mapped onto a value between 0.0 and 1.0, with 1.0 being perfect fitness. Since the different fitness criteria do not contradict each other, individuals can reach perfect fitness in all criteria. The sum of the individual fitness values can be used to identify the best individual in the population. However, due to the size of the search space and the ruggedness of the fitness landscape, a perfect design will not always be found.

The first step in the fitness calculation is to classify each room into porch, service or living space, depending on the line colour of the walls surrounding the room. This is done for the individual rooms, starting with the top-left and ending with the bottom-right. The type of the room is decided using a "majority vote": each segment of a colour separating either two rooms of identical type or a room and an outer space votes for this room type. A segment of a colour that separates two different rooms (dark green,

brown and pink in Figure 8.3) votes for both room types, unless the neighbouring room has already been classified and is of one of the two types, in which case the segment votes for only the remaining function. The function that is voted for by the largest number of segments is then assigned to the room. An example is shown in Figure 8.9: the left room has already been classified as living, and the right room receives three votes for porch and two votes for living. It is therefore classified as porch, producing two segments with wrong line colours (marked with * in the figure). This room-by-room classification will not always produce the optimal function assignment. However, a global optimization is computationally much more expensive, and would result in different assignments only in cases with many contradicting votes, where any assignment would result in a phenotype with a high fraction of wrongly coloured lines. These phenotypes would be classified as badly optimized anyway, and would receive a relatively low overall fitness.

Figure 8.9: Room classification procedure by voting; * marks segments that have the wrong colour after classification. Colour legend: Porch-Outside, Living-Outside, Porch-Living.

After classification of the rooms, a number of measures can be generated from the designs. All measures are mapped onto a fitness value between 0.0 and 1.0. For binary measures, the fitness is set to either 1.0 or 0.0; for the other measures, a mapping function is used with three parameters: a minimum value min, a maximum value max, and a cutoff value cut. If the measure $M_i$ lies between the minimum and maximum values ($\mathit{min} \le M_i \le \mathit{max}$), a fitness of 1 is assigned. On the other hand, if $M_i$ is either below $\mathit{min} - \mathit{cut}$ or above $\mathit{max} + \mathit{cut}$, a fitness of 0 results.
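A minimal sketch of this three-parameter mapping is given below. It assumes a linear ramp across each cut-off band and treats a zero-width band as a hard step; the function and argument names are illustrative only.

```python
def map_to_fitness(measure: float, lo: float, hi: float, cut: float) -> float:
    """Map a raw measure onto [0.0, 1.0] using the (min, max, cut) parameters.

    lo and hi stand for the min and max parameters (renamed to avoid shadowing
    Python's built-ins). Inside [lo, hi] the fitness is 1.0; it falls off
    linearly to 0.0 at lo - cut and hi + cut (assumed ramp shape).
    """
    if lo <= measure <= hi:
        return 1.0
    if cut <= 0:
        return 0.0                      # assumption: no band means a hard step
    distance = lo - measure if measure < lo else measure - hi
    return max(0.0, 1.0 - distance / cut)
```

A binary measure needs no special case in such a scheme: it can be passed through directly as 0.0 or 1.0 without using the mapping.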
In between those values, the fitness is assigned linearly, see Figure 8.10.

Figure 8.10: Mapping from generic measure to fitness value (horizontal axis: raw measurement, with marks at min-cut, min, max and max+cut; vertical axis: fitness, from 0.0).

The following lists the 17 different fitnesses that are calculated, with min, max, cut given in brackets:

B1 Total area enclosed by rooms classified as porch (in square units) (9/12/5);
B2 Total area enclosed by rooms classified as living (in square units) (55/70/10);
B3 Total area enclosed by rooms classified as service (in square units) (45/60/10);

B4 Total area enclosed by all rooms (in square units) (110/130/110);
B5 Number of rooms classified as porch (1/1/1);
B6 Number of rooms classified as living (2/4/2);
B7 Number of rooms classified as service (2/4/2);
I1 Exactly one porch block (a block is a group of rooms of the same type, connected by internal walls) (binary);
I2 Exactly one living block (binary);
I3 Exactly one service block (binary);
I4 Number of islands (an island is a group of rooms of unspecified types, connected by internal walls) (0/0/5);
I5 Porch island connected to living island by internal walls (binary);
I6 Porch island not connected to service island by internal walls (binary);
I7 At least one valid fireplace (two connected segments classified as fireplace, between service and living room) (binary);
I8 Number of additional fireplace segments (0/0/3);
I9 Segments in loose ends (line segments that do not completely enclose a room), calculated as a fraction of the total number of segments (0/0/1.0);
R1 Segments of wrong colour, calculated as a fraction of the total number of segments (0/0/1.0).

A number of runs using different subsets of these measures have been performed. These are described, together with the results that were produced, later in this chapter.

Other implementation details

Initial population. The initial individuals (700) are generated from basic and evolved genes using a random process. The length of the genotype is selected randomly between a minimum (6) and a maximum (20) value. For each gene, a random decision is used to decide whether it is a basic gene or an evolved gene; this decision is biased in favour of evolved genes (80%). The gene is then selected randomly from the appropriate set of genes.

Population control. In Pareto selection, the offspring is accepted into the population if it is not dominated by an individual in the population. The accepted individual itself can dominate any or no individuals in the population.
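The acceptance test can be sketched as follows, as an illustration rather than the code used in the runs. Individuals are reduced to their fitness vectors, higher values are assumed to be better in every component, and the names `dominates` and `accept` are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b in every fitness and better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def accept(population: List[Sequence[float]],
           child: Sequence[float]) -> Tuple[bool, List[int]]:
    """Return (accepted, indices of population members the child dominates).

    The child is rejected as soon as any member dominates it. A child with a
    fitness vector identical to a member's is accepted, since neither
    dominates the other (a consequence of this sketch's strict definition).
    """
    if any(dominates(member, child) for member in population):
        return False, []
    dominated = [i for i, member in enumerate(population) if dominates(child, member)]
    return True, dominated
```

The list of dominated indices is returned rather than acted upon, so that the caller can choose how many of those individuals to remove.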
A non-dominated offspring that does not dominate any other individual leads to an increase in population size. With offspring dominating more than one individual in the population, it is possible to either remove only one individual from the population (keeping the population size constant), or to remove all individuals dominated by the offspring (reducing the size of the population). Reduction in population size can be beneficial, because it counteracts the growth in population from non-dominated, non-dominating individuals, thereby keeping the computation overhead associated with large populations low, and increasing the selection

pressure by removing non-optimal individuals. At the same time, it can lead to a strong loss in genetic material, especially right after the initial population is generated. Some new individuals might dominate a large fraction of the initial random population. A compromise method has therefore been adopted: as long as the population size is over a minimum size (1000), all dominated individuals are removed from the population. If the population size falls below the minimum size, only one dominated individual is removed.

Genetic operations. The same genetic operations as in the gene creation runs described above are used.

Designs Produced

A number of runs have been executed. All runs use the number of wrongly coloured segments as representation fitness, but they use different sets of design brief fitnesses and implicit design fitnesses. Most runs shown here use the evolved representation produced in flw-create-1. They have been run until the system seemed to have reached a stable state, without improving the best design for a high number of cycles.

The fitness landscape created by the interaction of representation and fitness function, together with the very large search space, proves to be a very difficult problem for the evolutionary system. For example, the system cannot easily improve solutions by adding rooms, because in most cases this would mean that an outer wall would become an inner wall, requiring a change of line colour in all segments of that wall. Additionally, the fitness calculation is computationally expensive, posing a practical limit to the number of reproduction cycles that could be executed in a run. As a result, only one of the runs produced an individual that achieved a perfect value for all fitness criteria used.

The first three runs executed, flw-use-1, flw-use-2 and flw-use-3, use nine different fitness functions.
As design brief fitnesses, it was specified that the designs should have one porch room, between two and four living rooms, between two and four service rooms, and a total enclosed area of between 110 and 130 square units (fitnesses B5, B6, B7, B4). These fitnesses provide a very generic specification for the designs the system is expected to produce; very similar criteria could be expected in a brief given to a human designer. The implicit design fitnesses used specify that no lines occur that do not enclose any room ("loose ends"), that exactly one fireplace exists, consisting of two segments and located between service and living area, and that all rooms are connected through internal walls (fitnesses I9, I7, I8, I4). As described above, these criteria are used to explicitly enforce certain aspects of the designs that a human designer would usually implicitly use in any design work. The final fitness is the representation fitness, requiring that all segments have appropriate colours (fitness R1).

Figure 8.11 shows the best two designs produced for the three different runs with this fitness. All three runs managed to produce individuals with nearly perfect fitness, the first run producing the only perfect result (Figure 8.11(a)). In the other designs, the only flaws are a few segments with incorrect colour. In all designs, a number of topographical features not specified in the fitness function can be noticed: the fireplace consists of two straight segments, the porch is attached to a living zone room, the size of the porch is comparable to the size of the porches in the examples, and most of the rooms are long and not very wide. On the other hand, the size of the service areas is considerably smaller than in the examples. Noticeable in all designs is that they use few internal walls; this is a result of the structure of the fitness landscape, as described above. Some of the room shapes would be very unlikely in a building designed by a

human designer; however, it is important to remember that there is no fitness criterion for the shapes of the rooms. The fact that most of the rooms have acceptable shapes is a direct result of the use of the evolved representation.

Figure 8.11: Designs produced using evolved representation, with 9 fitness functions.

Also noticeable in the results is that gene 738 (see Figure 8.8) is used in the best designs in all three runs. This gene produces the service-room section that is identical in all designs. The best two designs of the first and third run also use gene 728; inspection of the genotype shows that the best two designs from the second run use a similar, but slightly smaller gene (a predecessor of gene 728). An interesting feature of the genotypes produced is the repeated occurrence of gene 738, as can be seen in the genotype for the design in Figure 8.11(c): ( c ) (c indicates a basic gene). This is possible because the starting point and end point of the phenotype represented by this gene, and the initial and final direction, happen to be identical. Repeating the gene therefore only leads to a repeated drawing of the same section, without producing any changes in the phenotype. While there is nothing to indicate that this has occurred in these runs, it is interesting to note that gene duplication and subsequent modification of one copy is seen as an important mechanism in evolution in nature.

Generally, it can be observed that overwriting of existing segments occurs to quite a large extent in the individuals in the final population. The genotypes contain far more basic genes than required to produce a given phenotype. Some of the genetic material is entirely redundant, as in the example of the repeated evolved gene above. The source of this redundant material is either the initial random creation of the individuals, or any of the genetic operations. As the fitness function does not consider genotype length, redundant genetic material is neutral with respect to selection. If it occurs in an otherwise successful individual, it is likely to be passed on to at least some of the offspring of the individual. Most of the genetic material involved in overwriting, however, is not redundant, but helps the evolutionary system to assemble large evolved genes into genotypes that produce valid phenotypes, without loose ends and with correctly coloured segments. The genotype shown above also demonstrates the variety of evolved genes used, created in all phases of the gene creation run. All genotypes of the designs shown use one or more basic genes.

Figure 8.12: Designs produced using evolved representation, with 9 fitness functions, using a different evolved representation.

Figure 8.12 (flw-use-4) shows results from a run with identical fitness functions, but using 252 evolved genes from a second gene-creation run, flw-create-2. The minimization of internal walls is especially noticeable in these results.
Finally, Figure 8.13 shows designs produced using only the basic representation (run flw-use-9). The size of the initial individuals, now only composed from basic genes, is set to be between 10 and 60. Figures 8.13(a) to 8.13(c) show the designs with the best total fitness. The colour of the segments is used in the same way as in the other runs

to determine the function of each room. As can be seen, only a very small fraction of segments actually has the correct colour; the evolutionary process seems unable to correct the colour of the segments. None of the runs with evolved representation shows this problem: the longer sequences of single-coloured segments provided by the evolved representation allow the system to produce long walls of correctly coloured segments.

Figure 8.13: Designs produced using basic representation only, with 9 fitness functions: (a) to (c) best overall individuals; (d) to (f) best individuals when representation fitness is ignored.

However, enforcing a correct segment colour is only meaningful with evolved genes, which can actually provide topological information. Figures 8.13(d) to 8.13(f) therefore show the best results if the number of wrongly coloured segments is not taken into account. The design in Figure 8.13(d) actually achieves perfect fitness in all other fitness criteria. However, only a few of the topological features identified in the designs in Figure 8.11 can be found in these designs. Some of the designs have huge porch areas; in others the porch is connected to the service area. In some, the fireplace is located in a corner. Some rooms are very small, and function areas are split into unconnected rooms. The fireplace is always located between living area and service area; this is however

only because one of the fitness values used checks for this. Since the number of wrongly coloured segments is already very high, there is no advantage in creating designs with a minimal number of inner walls; the designs are therefore more compact than the other designs shown in this section.

Influence of the Fitness Function

Additional fitness functions can improve the topology of the resulting designs. For the results in Figure 8.14 (runs flw-use-4 and flw-use-5), the fitness function used for runs flw-use-1,2,3 has been modified in two ways. Instead of specifying only a total size, individual sizes are specified for the three different function types (size in square units: porch 9-12, living 55-70, service 45-60). Also, all living rooms have to be connected to each other by internal walls, as do all service rooms; at least one living room has to be connected to one service room, and the porch has to be connected to a living room. Together with the fitness functions used for flw-use-1,2,3, 15 different fitness functions are used (fitnesses B1, B2, B3, B5, B6, B7, I1, I2, I3, I5, I6, I7, I8, I9, R1). While the additional topological fitnesses force the system to produce more internal walls, and better balanced area sizes for service and living, many of the segments now have wrong colours. Again, gene 738 proves very successful: it is used in all four designs.

Figure 8.14: Designs produced using evolved representation, with 15 fitness functions (panels (a) to (d)).

The runs with this set of 15 fitness functions produced the best overall results; the best design of the first run (flw-use-4), shown in Figure 8.14(a), has therefore been selected to be manually extended from a floor plan into a three-dimensional design for a house. Figure 8.15 shows the result of this process.

Figure 8.15: View of a house, created manually from the floor plan shown in Figure 8.14(a).

Influence of the Evolved Representation

A number of runs have been performed to demonstrate the influence of the evolved representation. In all these runs, the fitness function is the 9-fitness-Pareto function used for flw-use-1,2,3,4. The runs also use the same parameters, unless noted otherwise.

Restricting the set of evolved genes used to the first 150 produced in flw-create-1 produces the designs shown in Figure 8.16 (runs flw-use-5 and flw-use-6). The system was able to produce designs where most of the segments have the correct colours; as in the runs with the full set of genes, this was mostly achieved by minimizing the number of internal wall segments. However, the shapes produced are noticeably different. Simple rectangular shapes dominate, and only one design (Figure 8.16(d)) uses a porch which can be found in one of the examples. On the other hand, the proportions of living/service/porch areas are consistent, and not very different from the results produced with the full set of genes. Also, the topology of the results is very good, with the porch always connected to the living area, and single blocks of living and service areas in all designs but one (Figure 8.16(d)). The genes used in these two runs, the first 150 genes produced in flw-create-1, mostly represent features consisting of only a small number of segments. These genes obviously already carry a large amount of information about the topology of the examples, as well as information on shapes. Information about more complex shapes, however, is only contained in the larger genes that are produced later in flw-create-1.

Influence of the evolved representation on initial population

A different experiment shows the influence of the evolved representation on the initial population.
In these two runs (flw-use-7 and flw-use-8, results shown in Figure 8.17), the initial population was created using the evolved representation, but then all evolved genes in the genotypes of the population were replaced with their equivalent basic-level gene sequence. This operation did not change the phenotypes of the individuals, but it did remove the protection of the gene sequences from disruption in the evolutionary design process. The results show that some of the gene sequences survive, but often in a modified form. The most obvious example is the service area shape, produced by gene 738, which appears in all four designs, however the location of the fireplace is shifted in the designs of the second

run, Figures 8.17(c) and 8.17(d). Similarly, the shape marked with an asterisk in Figures 8.17(a) and 8.17(b) is likely to be created from the Henderson house, with two red segments inserted in the middle.

Figure 8.16: Designs produced using evolved representation, with 9 fitness functions, using a reduced set of evolved genes (panels (a) to (d)).

Shifting focus by manipulation of evolved genes

The evolved representation generated in run flw-create-1 creates a focus around designs with similarities to the examples, as shown in the previous section. Creative design, however, requires that this focus can be modified towards different designs. This section shows an example of how such a shift can be generated simply by modification of the evolved representation.

Modification of the evolved representation

The goal of the modification used in this example is to change the proportional area that the service rooms occupy in the designs. For this purpose, the sequence of basic genes represented by each of the evolved genes has been modified in the following way:

Every basic gene that produces a black segment (outer service area) is replaced by two such genes.

Every second basic gene in a series of more than one basic gene that produce red segments (outer living area) is removed.
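The two rewriting rules above can be illustrated with a short Python sketch. The encoding is an assumption made only for this illustration: each basic gene is written as a hypothetical (colour, move) pair, and a "series" of red genes is read as a run of consecutive red-producing genes. The function name `modify_gene` is likewise invented here and is not taken from the implementation.

```python
from typing import List, Tuple

# Hypothetical encoding: a basic gene is a (segment colour, move) pair.
BasicGene = Tuple[str, str]


def modify_gene(basic_genes: List[BasicGene]) -> List[BasicGene]:
    """Apply the two rules to the basic-gene sequence of one evolved gene.

    - every gene producing a black segment is emitted twice;
    - in a run of consecutive red genes, every second gene is dropped.
    """
    result: List[BasicGene] = []
    red_run = 0  # position inside the current run of red genes
    for gene in basic_genes:
        colour = gene[0]
        if colour == "red":
            red_run += 1
            if red_run % 2 == 0:
                continue  # 2nd, 4th, ... gene of the run is removed
            result.append(gene)
        else:
            red_run = 0  # any other colour ends the red run
            result.append(gene)
            if colour == "black":
                result.append(gene)  # duplicate black segments
    return result


if __name__ == "__main__":
    gene = [("black", "F"), ("red", "F"), ("red", "L"), ("red", "F"),
            ("green", "F"), ("red", "R")]
    print(modify_gene(gene))
```

A single red gene on its own is left unchanged, which matches the restriction to series of more than one gene. Genes of any other colour pass through untouched, consistent with the observation that the porch sections are not affected.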

Figure 8.17: Designs produced using evolved representation, with 9 fitness functions, using evolved genes only for the initial population (panels (a) to (d)); * marks a sequence surviving from the Henderson example.

Other implementation details

The implementation used is exactly the same as for the runs done with the unmodified evolved representation, as described in Section. The runs presented also use the same 9-fitness-Pareto fitness function that has been used for most of those runs.

Designs produced

Three runs have been performed with the modified evolved representation. The best two designs of each run are shown in Figure 8.18. The results are as expected: the service areas are the dominant areas in all the designs. Small versions of some of the shapes used in the examples can be recognized for the living areas. The porch areas are unchanged; this is as expected, since parts of evolved genes coding for porches have been left unmodified. The performance of the designs is slightly worse compared with the designs from flw-use-1,2,3,4: the designs contain more dead ends and wrongly-coloured segments. The modification of the evolved genes seems to make it more difficult to combine the evolved genes into a high-fitness genotype.

It is important to notice that the change in proportions is not influenced by the fitness, as the fitness function remains unchanged from the runs with the original evolved representation. The modification of the representation alone causes the change in features in the resulting designs, providing the shift of focus.

Figure 8.18: Designs produced using a modified evolved representation, with 9 fitness functions (panels (a) to (f)).

Discussion of Results

The results shown demonstrate all the steps for a creative process, according to the definition in Section 4.3:

The meta search space is the space of all designs that can be generated from the basic representation.

An evolved representation can be created from example designs (a set of Prairie House floor plans in this example), using an evolutionary process.

Using this representation in a design process has a strong influence on the outcome of the designs: the new designs share many features with the examples. The evolved representation therefore creates a focussing effect on the search process that is used to create the new designs.

Since the new designs are created using an evolutionary search, diverging elements are used in this process, fulfilling the next criterion.

Manipulation of the evolved representation, in this case by adding and removing basic genes, allows the transformation of the search space; new designs will be distinctively different, while no change in the fitness function has occurred.

Finally, the shift in focus is goal oriented: it was intended to change the relative size of living and service areas, and did exactly that.

The comparison of the runs with the full set of evolved genes and with the reduced set of evolved genes shows that evolved genes can contain different amounts of knowledge, and that this difference can be used to control how much information from the examples is used in the new designs. The importance of the continued protection of gene combinations inside evolved genes throughout the creation of new designs is shown by the fact that, when this protection is removed, features in the final design appear only as modified versions of the original features. Finally, the results point out that an interaction between the evolved genes and the fitness occurs: the fitness function influences which evolved genes appear in the final design (for example genes 728 and 738), and the evolved genes influence the fitness landscape, as noticeable in the minimized inner wall length in most of the designs.

8.3 Design Example 2: Generating Paintings

The second example demonstrating the use of evolved representations will draw from two different design domains. First, a set of evolved genes is created from a set of paintings by the Dutch De Stijl painter Piet Mondrian. The same basic representation is then used to create evolved genes from a window design by Frank Lloyd Wright. The resulting evolved representations can be combined and manipulated in a number of ways, allowing the production of new designs that combine features from the two sources. The process creating the new designs can be classified as creative according to the necessary requirements stated in Section.

Background

The choice of De Stijl paintings as a source for an evolved representation is inspired by the work of Knight. In Knight (1989b) and Knight (1994), she analyses how shape grammars can be used to describe the style, and the change in style over time, of two De Stijl artists, Fritz Glarner and Georges Vantongerloo. As with the use of shape grammars for Prairie Houses in the previous section, differences exist between shape grammars and evolved representations in both the way the knowledge is generated and in how it is used.
In Knight's (1989b) work, the shape grammars are created by hand, and used mainly as descriptive tools. Using evolved representations as demonstrated here, the knowledge is generated by machine learning, and used directly to create new designs. An interesting parallel with Knight's (1989b) work is the importance of transformation. However, while Knight uses shape grammars to describe the transformations of the style of the artists between different periods in their artistic career, transformations are used here in a generative way to modify existing style knowledge.

Representing Mondrian Paintings

From the large body of paintings by Piet Mondrian, a considerable subset can be described as containing single-coloured, rectangular areas, separated by black lines of different width, on a white background. Colours used are mainly red, blue, yellow, black and grey. This is a simplifying description, ignoring details like colour variations and lines that do not extend the whole length of the side of a rectangle. However, images reproduced to fit the description are still very much recognizable as Mondrian paintings, as Figure 8.19 shows. For the design of the initial representation, the same criteria as in Section apply. The tree-coding, which is used in this example, has been adopted for two reasons: the hierarchical representation seems to fit the structure of the paintings very well, and

tree representations are used extensively in evolutionary systems, especially genetic programming applications; this example can show how evolved representations can be used with such a representation.

Figure 8.19: Simplified Mondrian paintings; numbers correspond to Bois et al.'s (1995) reference of Mondrian's work.

In the tree representation used, the root of the tree represents the original, white canvas. Every node, including the root node, corresponds to a division of a rectangular canvas into two smaller rectangles. In a recursive process, the resulting rectangles can then be interpreted as the whole canvas, and divided into smaller rectangles. Each node has four attributes: the position and direction of the division, the thickness of the dividing line, and colour information. Every node can have a left and a right subtree; the left subtree is always associated with the smaller of the two sub-rectangles. The colour information in the node is used to determine the colour of the smaller sub-rectangle; the colour of the larger sub-rectangle is inherited from the node above. One rectangle will inherit the canvas colour; this rectangle can be found by following the right subtrees (larger sub-rectangles) from the root node until no further right subtree exists. The canvas colour is set outside the evolutionary system; this restricts the set of possible designs to designs that have at least one rectangle of the same colour as the canvas.

Figures 8.20(a) and 8.20(b) show an example of how a Mondrian painting (Number 131 in Figure 8.19) can be described in this way. In the first step, the original canvas is divided by a thin vertical line, and the smaller (in this case left) rectangle is coloured grey. The larger rectangle inherits the canvas colour white. The left subtree (node 2) then treats the grey rectangle as if it was the original canvas, and causes a horizontal split, where the smaller (lower) rectangle is coloured white.
Similarly, node 3 causes a vertical division at the same position, also producing a white smaller rectangle, while the larger, top rectangle inherits the original (white) canvas colour. The lower rectangle is then divided vertically, producing a red and a white sub-rectangle (node 4), and those sub-rectangles finally divided vertically to produce the final painting. The division in node 6 is the only one in this painting where a thick line is used. Because the left branch always represents the division of the smaller rectangle, nodes 5 and 6 are swapped in Figure 8.20(b) compared with 8.20(a). The illustration in Figure 8.20(b) is slightly simplified, as the location of the division is encoded in two attributes in the implementation. One attribute describes the part of the rectangle where the division is done (top half, bottom half, left half, right half), the other specifies the exact location as a value in the range ]0, 0.5]. While the genotype-phenotype transformation is unique, there can be multiple ways to represent a given phenotype in a genotype. Figures 8.20(c) to 8.20(e) show the three

topological features that can be represented in different ways. If a horizontal and a vertical line intersect, either of them can be the first division; the other line is then produced by a division at the same location in the two sub-nodes. In Figure 8.20(c), one could therefore either use a vertical line to start with, and then two horizontal divisions in the sub-nodes at 1/3 from the bottom; or a horizontal split at 1/3 from the bottom followed by a central vertical split in both of the sub-nodes. Two parallel divisions of a rectangle can be produced in either order: Figure 8.20(d) could be produced by a tree where the top node represents a horizontal split either 1/3 from the top or 1/3 from the bottom; the right sub-node (larger sub-rectangle) would in both cases represent a centred horizontal split. Finally, divisions in the centre of a rectangle can be represented in two ways: the line in Figure 8.20(e) could be represented as either 1/2 from the bottom or as 1/2 from the top.

Figure 8.20: Representation of a Mondrian painting: (a) as tree-structured series of rectangle divisions; (b) as tree of nodes with three attributes; (c), (d), (e) shapes with multiple representations; (f) representation of a pinwheel. Node labels in (b): 1 vertical, grey, thin; 2 horizontal, white, thin; 3 horizontal, white, thin; 4 vertical, red, thin; 5 horizontal, white, thin; 6 horizontal, white, thick.

The representation allows the creation of a large set of rectangle-based two-dimensional designs. Some additional designs, including for example the pinwheel shapes used in paintings by Vantongerloo (Knight, 1989b), can be represented as well if invisible lines are allowed, as shown in Figure 8.20(f). The invisible line, which can be located anywhere in the painting as long as it intersects the middle rectangle, divides the design into two halves that can then each be represented by tree-based divisions.
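The genotype-phenotype transformation for this tree coding can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the runs: the names `Node`, `split` and `decode` are invented here, dividing lines are treated as having no thickness, the canvas is the unit square, and the y coordinate is assumed to grow downwards so that 'bottom' refers to the larger y values.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # x, y, width, height


@dataclass
class Node:
    region: str       # 'left', 'right', 'top' or 'bottom': where the smaller rectangle lies
    fraction: float   # size of the smaller rectangle relative to the canvas, in ]0, 0.5]
    line_width: str   # 'thin' or 'thick'; stored but not drawn in this sketch
    colour: str       # colour given to the smaller sub-rectangle
    left: Optional["Node"] = None    # subtree dividing the smaller sub-rectangle
    right: Optional["Node"] = None   # subtree dividing the larger sub-rectangle


def split(rect: Rect, node: Node) -> Tuple[Rect, Rect]:
    """Return (smaller, larger) sub-rectangle produced by the node's division."""
    x, y, w, h = rect
    f = node.fraction
    if node.region == "left":
        return (x, y, f * w, h), (x + f * w, y, (1 - f) * w, h)
    if node.region == "right":
        return (x + (1 - f) * w, y, f * w, h), (x, y, (1 - f) * w, h)
    if node.region == "top":
        return (x, y, w, f * h), (x, y + f * h, w, (1 - f) * h)
    if node.region == "bottom":
        return (x, y + (1 - f) * h, w, f * h), (x, y, w, (1 - f) * h)
    raise ValueError(f"unknown region: {node.region}")


def decode(node: Optional[Node],
           rect: Rect = (0.0, 0.0, 1.0, 1.0),
           colour: str = "white") -> List[Tuple[Rect, str]]:
    """Recursively turn a tree into a list of (rectangle, colour) areas."""
    if node is None:
        return [(rect, colour)]  # undivided rectangle keeps its inherited colour
    smaller, larger = split(rect, node)
    # The smaller rectangle takes the node colour; the larger one inherits.
    return (decode(node.left, smaller, node.colour)
            + decode(node.right, larger, colour))


if __name__ == "__main__":
    tree = Node("left", 0.25, "thin", "grey",
                left=Node("bottom", 0.4, "thin", "white"))
    for area in decode(tree):
        print(area)
```

Following only right subtrees from the root always ends in a rectangle that still carries the canvas colour, which is the restriction on representable designs noted above.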
The representation is extensible: a version for three-dimensional objects will be shown in Section. Figure 8.21 shows the three example paintings used to create the evolved representation.

Figure 8.21: Mondrian paintings used to create evolved representation.

Representing a Frank Lloyd Wright Window

As a second example, a window design, created by Frank Lloyd Wright for Hollyhock House (Hanks, 1989), has been transformed into a representation compatible with the one used for the Mondrian paintings. The original design contains five repetitions of a frame, containing together more than 100 panels, plus a simple frame at the top and two simple frames at the bottom (see Figure 8.22(a)). The version used for the gene creation has been simplified: it contains three identical frames, containing 66 panels (see Figure 8.22(b)). The different widths of the connecting segments have been approximated by using two different line widths. The design is then rescaled, so that the sides of the outer rectangle have unit length, as shown in Figure 8.22(c). The line widths and colours used are different from the ones used for the Mondrian paintings. For comparison, a shape grammar representation of Frank Lloyd Wright window designs can be found in Rollo (1995).

Definition of Evolved Genes

With the representation defined, the next task is to define the type of evolved genes appropriate for the representation. Using a tree representation, a simple clustering of neighbouring evolved genes, as in the floor plan example (Section 8.2), is not possible. This section will describe the type of evolved genes used, and the diploid representation used in conjunction with these genes.

The representation has been designed with gene creation in mind. It has two properties that are important for the gene extraction.

Locality

All divisions caused by nodes at any depth in the tree will always occur in the inside of the rectangle produced by the next-higher node, and divisions that are created by two sub-nodes of a common node will always occur in neighbouring rectangles in the phenotype.
As a result, nodes close to each other in the tree will always produce effects that are close to each other in the phenotype. However, the inverse relation does not always hold; for example, the lower rectangle created by node 2 in Figure 8.20(a) is a neighbour of the rectangles created in node 5.

Position independence

The effect produced by a node or a combination of nodes is similar, independent of where the node is located in the tree. Only the current canvas colour and the external shape depend on the nodes above.

The tree representation makes it difficult to exactly define what a basic gene is. Two kinds of information exist in any genotype: local information, contained in the values

of the attributes in the nodes of the tree (colour, position, fraction, line-width), and topological information, contained in the structure of the tree. Both kinds of information are important, and evolved genes therefore have to be able to include both of them. Instead of basic genes, the following discussion will refer to free values, that is, values of the attributes that are not part of any evolved genes. Because of the position independence, only relative position information has to be incorporated into evolved genes.

Figure 8.22: Representing a Frank Lloyd Wright window design: (a) original design (copied from Hanks (1989)); (b) simplified version; (c) design rescaled for gene creation.

Generally, evolved genes therefore consist of combinations of one or more values on one or more nodes, with the nodes in some relation to each other. For example, genes that could be derived from the genotype in Figure 8.20(b) are:

(horizontal, white) on a single node: this evolved gene could be used at nodes 2, 3, 5 and 6 in Figure 8.20(b);

(vertical on one node, and horizontal in the left and right sub-node): this evolved gene could be used at nodes 1 and 3 in Figure 8.20(b).

Evolved genes themselves form trees, with the highest node defined as the root node. Topological information without local information does not convey much information. The requirement, therefore, is that nodes are only added to an evolved gene if at least

one attribute on this node is part of the evolved gene, or if the node is required to connect to another node that fulfils this condition.

As in the floor plan application (see Section 8.2), evolved genes are created bottom up, by always combining two smaller units (free values or evolved genes). Ideally, all combinations of values on all nodes should be considered. However, as before, the number of possible combinations quickly gets too large with growing genotype size, and again, the locality of the representation is used to restrict the search to nodes that are direct neighbours in the tree. For the possible combinations, this means:

Free value + Free value: An evolved gene can be created for two free values if they are on the same node, on a node and one of its two sub-nodes, or on the two sub-nodes of a common node, in which case the common node becomes the root node of the evolved gene.

Evolved gene + Free value: A free value can be added to an evolved gene if it occurs on any of the nodes that are already part of the evolved gene, or on any of the direct sub-nodes of those nodes.

Evolved gene + Evolved gene: Two evolved genes can be combined if the root node of one evolved gene is located on one of the nodes that are part of the other evolved gene, or on any of the direct sub-nodes of those nodes. An additional condition is that there is no conflict between the evolved genes; see the discussion of dominant and recessive genes below.

Even with a restriction to only neighbouring nodes, the number of possible new genes can easily become too large. Fortunately, only a small fraction of the possible combinations will occur in the population at any given moment. Since any new evolved gene will obviously have to occur in the population, it is possible to restrict the search to the combinations existing in the population. A hash-table can therefore be used to allow efficient access to the different gene combinations.
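The idea of counting only those combinations that actually occur in the population can be sketched with a Python `Counter`, which is a hash table from keys to counts. The sketch covers only the first case (free value + free value) and treats every attribute value as a free value. The dictionary-based node layout and the names `candidate_pairs` and `count_combinations` are assumptions made for this illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterator, List, Tuple

# Hypothetical node layout: {'attrs': {...}, 'left': node or absent, 'right': node or absent}
Node = Dict[str, object]
Value = Tuple[str, str]  # (attribute name, attribute value)


def node_values(node: Node) -> List[Value]:
    """All attribute values of a node, in a fixed order so keys are reproducible."""
    return sorted(node["attrs"].items())


def candidate_pairs(node: Node) -> Iterator[tuple]:
    """Yield a hashable key for every pair of values on neighbouring nodes."""
    own = node_values(node)
    # Case 1: two values on the same node.
    for a, b in combinations(own, 2):
        yield ("same", a, b)
    left, right = node.get("left"), node.get("right")
    # Case 2: one value on a node, one on a direct sub-node.
    for side, child in (("left", left), ("right", right)):
        if child is not None:
            for a in own:
                for b in node_values(child):
                    yield (side, a, b)
    # Case 3: values on the two sub-nodes of a common node.
    if left is not None and right is not None:
        for a in node_values(left):
            for b in node_values(right):
                yield ("siblings", a, b)
    # Recurse into the sub-trees.
    for child in (left, right):
        if child is not None:
            yield from candidate_pairs(child)


def count_combinations(population: List[Node]) -> Counter:
    """Count how often each neighbouring combination occurs in the population."""
    table: Counter = Counter()
    for root in population:
        table.update(candidate_pairs(root))
    return table


if __name__ == "__main__":
    hw = {"attrs": {"direction": "horizontal", "colour": "white"}}
    tree = {"attrs": {"direction": "vertical", "colour": "grey"},
            "left": dict(hw), "right": dict(hw)}
    for key, n in count_combinations([tree, dict(hw)]).most_common(3):
        print(n, key)
```

Because only combinations that are actually present are ever inserted, the table size is bounded by the content of the population rather than by the number of theoretically possible combinations.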
For the Mondrian application, the table usually contains about 3,000-4,000 entries, and the most common new genes occur about times in a population of 1,000 individuals.

Dominant and Recessive Genes

Evolved genes as defined above need to contain information about only some of the values on the different nodes. As a result, overlapping evolved genes can occur, for example caused by cross-over. To use evolved genes in this basic representation, dominant and recessive genes have to be used as described in Section 7.4, requiring special genetic operations, and influencing the creation of evolved genes.

Cross-over

The function of the cross-over operation is illustrated in Figure 8.23. In this figure, the relevant sections of the tree structures used to represent the paintings are shown; dotted lines indicate other parts of the tree that are not shown. The values at the nodes are represented by four circles; white and grey circles represent attributes containing free values, coloured circles represent attributes that are part of evolved genes. The cross-over operation is an asymmetric operation, where a branch of one tree (the goal tree, Figure 8.23(a)) is replaced by a branch from another tree (the source tree, Figure 8.23(b)). The first step is to randomly determine a cross-over point in each tree, as marked in Figure 8.23. Both trees have to be prepared for cross-over. In the goal tree, all nodes below the cross-over point are removed, with the exception of those nodes that contain values from evolved genes whose root is located above the crossover

point, like the red evolved gene in the illustration. Evolved genes with a root below the cross-over point (blue gene) are removed (Figure 8.23(c)). In the source tree, all nodes above the cutting point are removed, and values that are part of evolved genes with the root above the cross-over point (green gene) are replaced by the corresponding free values (Figure 8.23(d)). The branch from the source tree is then merged into the goal tree, preserving all evolved genes, with genes from the source tree shadowing genes from the goal tree in case of conflicts. Remaining holes are filled in from the source tree (Figure 8.23(e)).

Figure 8.23: Implementation of a cross-over operation in tree coding with dominant and recessive genes: (a) goal tree; (b) source tree; (c) to (e) steps of the operation.

Mutation

For mutation, a random attribute at a random node in the individual to be mutated is selected. If this attribute contains a free value, it is simply replaced by a new, random value from within the range allowed for that attribute. If the value is part of an evolved gene, the first step is to find the root of that evolved gene. In Figure 8.24(a), the value selected is part of the yellow gene, indicated by an arrow. The evolved gene is removed by replacing all its parts in the genotype by the corresponding value (Figure 8.24(d)). A new evolved gene is selected randomly, and tried at the place of the old gene. In some cases, the new evolved gene will contain values at nodes where no node exists in the current tree; in this case the evolved gene is rejected. For example, an attempt to insert the red gene (Figure 8.24(b)) at the node where the root of the yellow gene was located would fail, because it would require the addition of a node at the bottom right of the tree (Figure 8.24(e)). A number of randomly selected genes (5) are tried. If none of them fits, no new evolved gene is inserted and the holes are filled in by random

values. If a new gene is inserted, it replaces existing free values, and dominates existing evolved genes. In the illustration, the purple gene (Figure 8.24(c)) is inserted into the tree, creating an overlap with the existing green gene (Figure 8.24(f)).

Figure 8.24: Implementation of a mutation operation in tree coding with dominant and recessive genes (panels (a) to (f); the mutation point is marked in (a)).

Creating Evolved Representation

The evolved representation is created along the lines of the algorithm described in Section 3.4. The implementation of a fitness function to create the proximity value is the major task of the implementation. This is described in the following section; other implementation details are listed in Section. The evolved genes produced from the two domains are shown in Section.

Fitness Calculation

The fitness function used to create samples for the gene creation has to produce a measure of similarity between the individual and the examples. A relatively simple way to do this would be to compare the final images, for example on a pixel-by-pixel basis. However, given that a number of attributes for the paintings have already been identified and incorporated into the basic representation, it seems more useful to create individual similarity measures for those attributes. To do this, the fitness function does not directly use the final phenotype, but instead works in parallel with the genotype-phenotype transformation. It has to match divisions in the phenotype with divisions in the example. Starting with the root node of the genotype, it matches the division caused by the node with a division in the original. Both divisions have to be in the same direction, and the division in the original has to span the whole canvas. If no such division is found, the fitness calculation stops here; if more than

one such division is found in the original, the one closest in position to the one created by the node is used. For example, to compare the phenotype in Figure 8.20 with any of the originals in Figure 8.19, the fitness function would look for a vertical line spanning the canvas. All of the originals have exactly one such line, though at different positions on the canvas.

If a line can be matched, three fitness values can be derived:
- a measure of the difference in the positions of the divisions, measured as a fraction of the total width (for vertical divisions) or height (for horizontal divisions) of the current canvas;
- a test for the line width, which is either 1 (lines have the same width) or 0 (lines have different widths);
- a test for the correct colour. The colour value in the node influences the colour of the smaller sub-rectangle, but further divisions can introduce additional colours. The colour is therefore accepted as correct if at least one of the colours in the smaller sub-rectangle in the original is the same as the colour defined by the node.

If a matching division has been found, the original is split along this division into two parts. In a recursive process, the smaller sub-rectangle with all its further subdivisions is then used as a new original and compared with the left sub-tree of the genotype, and the larger sub-rectangle is compared with the right sub-tree. The fitness values are summed for all nodes, and finally normalized by dividing them by the number of divisions that could be successfully matched. For example, if six divisions have been matched, and three of them have the correct line width, the line width fitness is 0.5. Figure 8.25 shows the pseudo-code for the recursive fitness calculation.
    proc calculate-fitness(node, original, fitnesses)
      match division in node with division in original
      if match found then
        update fitnesses position
        update fitnesses colour
        update fitnesses linewidth
        if node has sub-nodes then
          suboriginal1, suboriginal2 = divide-original(node, original)
          calculate-fitness(left subnode, suboriginal1, fitnesses)
          calculate-fitness(right subnode, suboriginal2, fitnesses)
        fi
      fi

Figure 8.25: Pseudo-code for the comparison of an example with a genotype. The arguments of the initial call are the root node of the genotype, the example, and an empty fitness structure.

Three additional fitnesses are generated at the end:
- (number of matched divisions) / (number of areas in the example - 1): this gives a measure of how completely the phenotype describes the example,
- 1 - (number of nodes not matched, including sub-nodes) / (total number of nodes): this gives a measure of how many divisions are in the phenotype that could not be matched with the example, and

- maximum position offset of two matched divisions (as opposed to the average calculated above).

The fitness computation produces six fitness values for the comparison between an individual and an example; with three examples, a Pareto fitness vector with 18 values is used. Niching is used to prevent convergence in the population.

Other implementation details

Initial population

To create the initial individuals, the size of the individuals is determined randomly first (2 to 10). Starting with an empty node, an appropriate number of nodes are then added at random places into the growing tree. Finally, all attributes in all nodes are filled with random values from their respective set of possible values:
- direction: top, bottom, left, right
- fraction: 1/2, 1/3, 1/4, 1/5, 2/5, 1/6, 1/7, 2/7, 3/7, 1/8, 3/8, 1/9, 2/9, 1/10, 3/10
- line width: none, thin, medium, thick
- colour: black, blue, green, cyan, red, magenta, yellow, white, blue4, blue3, blue2, light blue

Population control

In the floor plan application, all individuals are kept that have a fitness larger than 0 and are not already in the population. In this application, however, the fitness functions are defined in such a way that nearly all individuals created will have at least some fitness values better than 0, leading to a very fast growing population containing many badly fitting individuals. For that reason, individuals have to pass an additional test: during the comparison with the three examples, counts are made of how often the line width fitness is 1.0, the colour fitness is 1.0, the number of wrong nodes is 0, or the maximum position error is below a threshold value (1/27). If at least a certain number (3) of those conditions are fulfilled, the individual is accepted into the population. For example, an individual that has correct colours for two examples, and correct positions for one of them, will be accepted into the population.
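The acceptance test described above can be illustrated with a minimal Python sketch. The class `ExampleComparison` and the functions `fulfilled_conditions` and `is_accepted` are hypothetical names introduced only for this illustration; the thesis implementation was written in Lisp and is not reproduced here. The threshold 1/27 and the required count of 3 are taken from the text.

```python
from dataclasses import dataclass
from typing import Sequence

POSITION_THRESHOLD = 1 / 27  # maximum position error threshold given in the text
REQUIRED_CONDITIONS = 3      # minimum number of fulfilled conditions given in the text


@dataclass
class ExampleComparison:
    """Hypothetical record of comparing one individual with one example."""
    line_width_fitness: float  # 1.0 means all matched lines have the correct width
    colour_fitness: float      # 1.0 means all matched colours are correct
    wrong_nodes: int           # nodes that could not be matched with the example
    max_position_error: float  # largest position offset of a matched division


def fulfilled_conditions(comparison: ExampleComparison) -> int:
    """Count how many of the four conditions hold for one example."""
    return sum([
        comparison.line_width_fitness == 1.0,
        comparison.colour_fitness == 1.0,
        comparison.wrong_nodes == 0,
        comparison.max_position_error < POSITION_THRESHOLD,
    ])


def is_accepted(comparisons: Sequence[ExampleComparison]) -> bool:
    """Accept the individual if enough conditions hold across all examples."""
    total = sum(fulfilled_conditions(c) for c in comparisons)
    return total >= REQUIRED_CONDITIONS


# The case from the text: correct colours for two examples,
# correct positions for one of them.
first = ExampleComparison(0.5, 1.0, 2, 0.01)
second = ExampleComparison(0.5, 1.0, 1, 0.2)
third = ExampleComparison(0.0, 0.5, 3, 0.3)
print(is_accepted([first, second, third]))
```

The counts are summed over the three comparisons, so the conditions need not all be met for the same example.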
A stricter test, using fits as described below for the gene fitness, is not useful since it would reject too many individuals for a successful run of the evolutionary system. Also, the Pareto fitness vector would be reduced to a single value, making convergence control through niching impossible. If an offspring dominates more than one individual already in the population, all of them are removed from the population as long as the population size is above 500; otherwise only one individual is removed.

Gene Combination Fitness

For the identification of the best new gene combinations, a single fitness value is required. The Pareto fitness vector used for selection in the evolutionary algorithm therefore has to be reduced to a single fitness value. Since most individuals in the population will fit any of the examples only partly, simply adding up the fitness values could lead to cases where the individuals with the highest sum are ones that are close, but not quite correct, in all of the features. This could lead to the creation of evolved genes containing features that do not occur in any of the examples. To contribute to the gene combination fitness, an individual therefore has to be an exact fit of a subset of at least one of the examples. To fit a subset of an example, all colours, line widths and divisions have to be correct for all nodes of the genotype; however, the genotype can

have fewer nodes than needed to produce the full example. Evolved genes created from these individuals are guaranteed not to contain features that are not part of any of the examples. The gene fitness used is the number of nodes of a genotype, multiplied by the number of examples that the individual fits.

Gene Creation

The use of dominant and recessive genes creates a complication in the gene extraction. If an evolved gene is created from two existing evolved genes that overlap (one dominant, one recessive), the new evolved gene will end up with duplicate values for one or more attributes. Which of those values are dominant and which recessive can be different in every individual where this combination of evolved genes occurs. There is therefore no constant mapping from the evolved gene to the values it represents. In the implementation used here, overlapping evolved genes are therefore not combined into a new evolved gene unless the duplicate values are identical (in which case it does not matter which one dominates). An alternative solution would be to use a static dominance scheme, for example one in which older evolved genes always dominate younger evolved genes.

Results of Gene Creation

Evolved genes have been created for both the three Mondrian paintings and the Frank Lloyd Wright window design. In both cases, a number of gene creation runs have been performed, and for both of them, one representation has been selected to be used in all further examples.

Figure 8.26: Evolved genes number 103 and 93 as schemata.

Display Convention

As expected, most of the evolved genes produced in the runs specify only part of the set of attributes at each node. As an example, Figure 8.26 shows two evolved genes created from Mondrian paintings (number 103 and 93, compare Figure 8.27) as a tree.
In both evolved genes, only some of the attributes in the root node are determined by the gene. To transform a tree into a displayable figure, however, a value has to exist for all attributes at each node. All attributes in the nodes that are not part of evolved genes therefore have to be filled with default values. To indicate which values are part of the evolved gene and which are default, the following convention is used.
- Default colour: the area is coloured pink, a colour not available for genes, as for example in gene 104 in Figure 8.27.
- Default line width: a line width is used that is between the widths used for thin and medium, as in gene 81 in Figure 8.27. None of the genes in Figure 8.27 uses thin lines.
- Default direction: direction is assumed to be vertical, and the line is drawn stippled, as in gene 95 in Figure 8.27.

- Default position: a position of 1/3 is used, and the line is coloured red, as in gene 80 in Figure 8.27.

Figure 8.27: Last 30 genes created in run mo-create-1. See text for presentation details.

Mondrian paintings

The last 30 out of 109 genes produced in one run (mo-create-1) creating evolved genes from the examples in Figure 8.21 are shown in Figure 8.27. None of these evolved genes incorporates a red rectangle in the top left; this, however, is only a result of the dynamics of the evolutionary production of samples for the gene creation. Early in this particular run, many individuals started to incorporate red rectangles, and therefore a large number of evolved genes with red rectangles were created at that stage. In later stages of the run, the evolutionary system incorporated more of the other rectangles into the individuals, resulting in evolved genes being produced that use these colours, as shown in this figure. The evolved genes produced also grow in complexity over time: many of the evolved genes created in the final phase of the gene creation span three nodes.

A large number of evolved genes produced throughout the whole run specify the colour white. This reflects the fact that the largest fraction of rectangles in the examples is also white. As a result, the designs produced using this representation will also have a large fraction of white rectangles, see Section. Individual evolved genes, which as defined in this application can represent only local knowledge, cannot contain this information about the examples; the evolved representation as the collection of evolved genes, however, can. This is a manifestation of emergence from the interaction of many evolved genes, see Section.

While the individuals generated by the evolutionary algorithm are used only as samples and discarded at the end, they give a good indication of the quality of the samples. Figure 8.28 shows the best individuals created in mo-create-1. The individual with the

highest overall sum of the Pareto fitness values (Figure 8.28(a)) incorporates features from all three examples. The other three individuals shown are the ones with the highest sum of fitnesses for the comparison with each of the three examples. The only visible flaws are the slight offset in the vertical division in the top rectangle in Figure 8.28(b), and the missing vertical division of the yellow rectangle in Figure 8.28(b) and (d), a feature which exists in the individual shown in Figure 8.28(a).

Figure 8.28: (a) Best overall copy and (b) to (d) best copies of the individual examples, created in run mo-create-1.

Figure 8.29: Evolved genes created from window design.

Frank Lloyd Wright Window

To create evolved genes from the window design, the system is run with the same settings as for the Mondrian examples. In this run (referred to as win-create-1), 159 genes were created. The best individual in the final population differed from the window design only in the position of one dividing line. Some of the genes produced are shown in Figure 8.29. Since only a single but very complex example was used for this application, the evolved genes are considerably more complex than those produced from the Mondrian paintings. The most complex gene shown, gene 156, contains information about values on seven nodes. Again, the genes specify mainly white areas.

Using the Evolved Representation

Once created, the evolved representation can be used to produce new designs. Both representations (those created by mo-create-1 and win-create-1) will be used in two tasks: to create new painting-like designs, and to create new window-like designs.

Implementation Details

Initial population

To maximize the influence of the evolved representation on the results, the initial population in this example is created using only evolved genes. The evolved representation therefore provides a hard focus (see Section 4.2.2). Creating an initial individual by randomly adding evolved genes into a growing tree will usually lead to strong overlaps between the genes, while leaving unfilled attributes at many of the nodes. For this reason, a three-stage process is used, designed to fill all attributes with evolved genes while minimizing overlap. After randomly determining the number of nodes (3 to 10) in the final tree, an initial empty node is created. In the first phase, random terminal nodes of the current tree are selected and random evolved genes added to those nodes until this is not possible without producing a tree with more nodes than allowed. In the second phase, evolved genes are randomly selected and tested at all nodes in the tree, and added at the one node where they fill the highest number of remaining empty attributes in the tree, without enlarging the tree beyond the maximum number of nodes. Finally, when all attributes are filled, evolved genes that are entirely shadowed by other evolved genes are removed from the tree.

Fitness calculation

To create new individuals, a fitness has to be specified to evaluate the designs. For real-world designs of this kind, aesthetic measures would be the appropriate criteria. However, few such measures exist that can be automated, and relying on the user to select nicer paintings is both tedious and carries the danger of hiding the influence of the evolved representation behind a bias in the user's selection. Instead, a very simple fitness function is used that is sufficiently difficult for the system, while being intended to be neutral with respect to the features learned from the examples.
It consists of a number of fitness measurements, which are used independently of each other in a Pareto fitness vector. As in Section , the measurements are mapped between 0 and 1. The fitnesses used for the results presented here require (with mapping parameters in brackets):
- seven rectangles in the painting (7/7/7),
- a shortest side of any rectangle in the painting, relative to canvas size, of not less than 0.07 (0.07/1.0/0.07),
- a white rectangle of given size in the top-right corner of the canvas. The colour white is used since this is the canvas colour and therefore has to appear in the painting in at least one rectangle anyway. Two measurements are used for this: the fraction of the white goal rectangle that is covered by a corresponding white rectangle in the painting (1.0/1.0/1.0), and the area of the white rectangle in the painting which does not overlap with the white goal rectangle (0.0/0.0/1.0).

Genetic operations

Offspring are produced with the same tree-based mutation and crossover operators, using dominant and recessive genes, as developed for the gene creation (see Section ).

Population control

The initial population is 100 individuals, and an offspring never replaces more than one individual.
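The two measurements for the white goal rectangle can be sketched as follows. This is a minimal Python illustration under stated assumptions: rectangles are axis-aligned and given by corner coordinates, the goal rectangle has positive area, and exactly one white rectangle in the painting is compared with the goal. The names `Rect`, `overlap_area` and `white_rectangle_measures` are hypothetical, and the subsequent mapping of the raw values to the range 0 to 1 with the bracketed parameters is not reproduced because the text here does not define it.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    """Axis-aligned rectangle in canvas coordinates (assumed layout)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def overlap_area(a: Rect, b: Rect) -> float:
    """Area shared by two rectangles, or 0.0 if they do not overlap."""
    width = min(a.x1, b.x1) - max(a.x0, b.x0)
    height = min(a.y1, b.y1) - max(a.y0, b.y0)
    return max(0.0, width) * max(0.0, height)


def white_rectangle_measures(goal: Rect, painted: Rect) -> Tuple[float, float]:
    """Return (fraction of the goal covered, painted area outside the goal)."""
    shared = overlap_area(goal, painted)
    covered_fraction = shared / goal.area()  # assumes goal.area() > 0
    excess_area = painted.area() - shared
    return covered_fraction, excess_area


# Goal: a white rectangle in one corner of a unit canvas (coordinates assumed).
goal = Rect(0.5, 0.0, 1.0, 0.5)
too_small = Rect(0.75, 0.0, 1.0, 0.5)  # covers only half of the goal
too_large = Rect(0.5, 0.0, 1.0, 1.0)   # covers the goal but extends beyond it
print(white_rectangle_measures(goal, too_small))
print(white_rectangle_measures(goal, too_large))
```

Using two separate measurements keeps both failure modes visible in the Pareto vector: a white rectangle that is too small scores badly on coverage, one that is too large scores badly on excess area.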

New Painting Designs

Since the initial individuals are created without basic genes, the focus created by the evolved genes is a hard focus. For example, all initial individuals will use only the five colours used in the examples. Other colours, though part of the basic representation, cannot be present in the initial population. However, mutation is still able to introduce basic genes, thereby slightly softening the focus. The influence of the evolved representations can easily be seen in the initial, random individuals.

The fitness function is not very difficult; the system usually finds perfect phenotypes within a few minutes. Figure 8.30 shows the results of eight runs with the evolved representation created by mo-create-1 (Figure 8.30(a)), eight runs with the evolved representation created in win-create-1 (Figure 8.30(b)), and eight runs with only basic genes, but an otherwise identical system (Figure 8.30(c)). The influence of the evolved genes is obvious: the results using the evolved representation use the colours and line widths of the examples, use fewer colours per painting, have larger white areas, and use proportions similar to those of the respective examples. An exception is the fourth painting generated from mo-create-1, where mutation evidently changed one colour into light blue and one line width to thin. This particular run needed about twice the average number of cycles (see Section ), so mutation had a higher chance to influence the designs in this way. A similar effect can be found in one of the designs using genes from win-create-1; in this case the colours light green and turquoise were introduced by mutation.

New Window Designs

In this example, new window designs, similar to a single frame in Frank Lloyd Wright's window design (Figure 8.22), are created.
The fitness function is similar to the one used before; in this case 22 panes are required, and the design has to have a vertical line in the middle of the canvas. Again, the fitness function is intentionally chosen in a way that is neutral to the use of the style features: it neither prevents nor promotes them. Again, with the exception of the runs using no evolved genes, the initial individuals are always created so that they contain no basic genes. The results, using evolved genes from mo-create-1, evolved genes from win-create-1, or no evolved genes, are shown in Figure 8.31. Again, the influence of the evolved representations is obvious.

Transforming the Evolved Representation

Section 5.2 described how evolved representations can be transformed, by either combining evolved genes from different sources inside the same domain, or by transformation of evolved genes into different domains. Using the evolved representations created in mo-create-1 and win-create-1, both types of transformation have been implemented and will be shown here.

Mixing Genetic Material

To show the influence of mixed sets of evolved genes, the sets of evolved genes from mo-create-1 and from win-create-1 are mixed in two different ways. The resulting sets of genes are used to create new designs for windows, with the same fitness as described above for the new window designs.

Combining gene pools

Pools of evolved genes can be mixed by simply creating initial individuals using evolved genes randomly from both pools, and then running a process to

create new designs as usual. In this implementation, both sources for the evolved genes use exactly the same transformation from basic-level genotype to phenotype; this transformation therefore does not require any modification of the evolved representations. However, the transformation from evolved gene to basic gene has to be able to identify the correct set of basic genes represented by an evolved gene. The sets of symbols used for the evolved genes should therefore not overlap. Since in this implementation evolved genes are represented by numbers, this can be ensured by simply re-numbering one of the sets of evolved genes.

Figure 8.30: New designs using (a) evolved genes created in mo-create-1, (b) evolved genes created in win-create-1, (c) basic genes only.

Designs created using the mixed pool of 269 genes are shown in Figure 8.32(a). The influence of the genes is clearly visible in the designs; the different features can be directly mapped onto either of the examples. The initial individuals and the final designs contain more genes from win-create-1 than from mo-create-1; this reflects the fact that

more evolved genes from the window design were available.

Figure 8.31: New designs using (a) evolved genes created in win-create-1, (b) evolved genes created in mo-create-1, (c) basic genes only.

Selectively combining gene pools

Instead of just combining the gene pools, it is possible to create a new pool that contains only selected features from the two original pools. In this example, this is done by removing all information about position and fraction of the rectangle division from all nodes in the genes from mo-create-1, and removing all information about colour and line thickness from all nodes in the genes from win-create-1. In some cases, this leads to genes with only empty nodes; these genes are removed from the set of evolved genes. Because of this, the set of evolved genes combined as above contains only 264 genes. The individuals in the initial population created using these genes therefore have information about topological features defined by evolved genes from win-create-1, and

information about colour and line thickness defined by evolved genes from mo-create-1. This information is preserved by the evolved genes throughout the evolution, producing window designs like the ones shown in Figure 8.32(b). Only the occasional mutation can introduce different features, like the thin line in the first design in Figure 8.32(b). The imbalance in the number of evolved genes does not matter in this case, since the different sets of evolved genes are responsible for different aspects of the new designs. Figure 8.33 shows how the fourth design in Figure 8.32(b) can be assembled into a complete window, similar to the original Frank Lloyd Wright design. A rendered version of this design is also shown on the inner cover of this thesis.

Figure 8.32: Window designs created from two combined evolved representations: (a) simple combination, (b) selective combination.

Figure 8.33: One of the designs in Figure 8.32(b), assembled into a full window (shown rotated).

Using evolved genes in a different domain

As mentioned in Section 5.2.3, it is possible to use evolved representations in different domains from those in which they were initially created if either the genotype-phenotype

transformation or the evolved representations are adapted for this purpose. The new domain used in this example is the creation of coloured cubes. To create the three-dimensional objects, a basic representation is used that is similar to the one used for the paintings. However, instead of rectangles divided by lines, the nodes specify cubes intersected by planes. Each node has an additional attribute, specifying whether the intersecting plane is perpendicular to the x-y plane, to the y-z plane or to the z-x plane. In other words, to divide a cube, its projection into one of those three planes is used and the resulting two-dimensional shape is then cut as in the Mondrian painting. As Figure 8.34 shows, every cut can be represented in two different ways.

Figure 8.34: Representation of cube intersections. Values in brackets denote plane and direction; the fraction is always 3/4.

Due to the similar nature of the representations, it is easy to adapt the evolved genes created from the paintings and the window. Obviously, none of the evolved genes provides a value for the cutting plane. One possibility is to assemble initial individuals without this value, and then provide a random value for this at every node. However, this makes the topology features learned from the window less recognizable in the phenotype. The reason for this is that genes specifying topological features span a number of connected nodes, and these features are only easily recognizable in the resulting phenotype if all the divisions specified in these nodes are perpendicular to the same plane. For example, a series of parallel divisions only remains parallel if all divisions are perpendicular to the same plane or, in other words, have the same value for the added attribute. If this value were randomly assigned, this would generally not be the case.
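The last point can be made concrete with a small calculation. The sketch below assumes that the plane value is drawn independently and uniformly from the three possibilities at every node; the text only says the value is random, so this distribution is an assumption, and the function names are hypothetical. Under that assumption, the chance that all nodes of a gene receive the same plane shrinks by a factor of three with every additional node.

```python
import random
from fractions import Fraction
from typing import List

PLANES = ("x-y", "y-z", "z-x")  # the three values of the added attribute


def probability_all_same_plane(node_count: int) -> Fraction:
    """Chance that independent uniform draws give every node the same plane."""
    if node_count < 1:
        raise ValueError("a gene spans at least one node")
    # The first node is free; each further node must repeat its value.
    return Fraction(1, len(PLANES)) ** (node_count - 1)


def assign_planes_per_node(node_count: int, rng: random.Random) -> List[str]:
    """Assumed per-node scheme: one independent random plane for each node."""
    return [rng.choice(PLANES) for _ in range(node_count)]


# A gene spanning seven nodes, the size reported for gene 156.
print(probability_all_same_plane(7))
print(assign_planes_per_node(7, random.Random(0)))
```

For a three-node gene the chance is 1/9, so even the smaller topological genes would usually lose their parallel divisions under per-node assignment.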
Better results can be produced by adapting the evolved genes before use in the three-dimensional application. For every gene, a value for the intersection plane is randomly chosen, and added to every node of that gene. For the example, all topology features were removed from the evolved genes produced in mo-create-1, and all colour and line-width information was removed from the evolved genes created in win-create-1. Then, information about the intersection plane was added to all window genes in the way described above. The resulting genes were

then used to create the initial population. The fitness function is similar to the one used to create new window designs: 70 sub-cubes, a plane intersecting the resulting cube in the middle, and no edges shorter than 7% of unit length. The result is shown in Figure 8.35(a). As a comparison, Figure 8.35(b) shows a cube created with the same fitness function and genes, but assigning the planes randomly to the individual nodes. While similarities in colouring and line width to the Mondrian paintings exist in both cubes, the cube in Figure 8.35(a) shows more topological features relating to the Frank Lloyd Wright window example.

Figure 8.35: 3d-objects created from 2-d representation, views show opposing corners. (a) Plane information added to evolved genes, (b) plane information added to nodes.

Discussion of Results

Like the floor plan example described in Section 8.2, this example demonstrates all steps required for a creative process, as defined in Section 4.3: production of evolved genes to create a focus, evolutionary search using this focus, and goal-oriented transformation of the focus. Compared with the floor plan example, the use of an evolved representation here is far more complex, as a tree representation is used, with non-consecutive genes that require diploid values. As the resulting genes show, it is possible to create evolved genes that contain knowledge about both the shape of the tree and about values at the nodes. The resulting designs very strongly reflect the use of the evolved representation. One of the reasons for this is that a hard focus was used, and therefore the colours, line widths, and topology features are restricted to those occurring in the examples; another reason is that the examples have very strong visual style characteristics that are easily represented
