Starrylink Editrice. Tesi e Ricerca. Informatica


Table of Contents

Measures to Characterize Search in Evolutionary Algorithms
Giancarlo Mauri and Leonardo Vanneschi
  1 Introduction
  2 Previous and Related Work
  3 Fitness Distance Correlation
    3.1 Structural Tree Distance
    3.2 Structural Mutations (Property 1: Distance/Operator Consistency)
    3.3 Experimental Results on fdc: W Multimodal Trap Functions (Series 1, 2 and 3), Royal Trees, MAX Problem
    3.4 Counterexample
  4 Negative Slope Coefficient
    4.1 Experimental Results on nsc: the binomial-3 problem, the even parity k problem
    4.2 Summing Up
  5 Subtree Crossover Distance
    5.1 Experimental Results: Fitness Distance Correlation (Syntactic Trees, Trap Functions), Fitness Sharing, Diversity, Summing Up
  6 Conclusions and Future Work


Measures to Characterize Search in Evolutionary Algorithms

Giancarlo Mauri and Leonardo Vanneschi
Dipartimento di Informatica, Sistemistica e Comunicazione (D.I.S.Co.), University of Milano-Bicocca, Milan, Italy

Abstract. The ability of evolutionary algorithms to solve a combinatorial optimization problem is often very hard to verify. Thus, it would be very useful to have one or more numeric measures able to quantify the ability of evolutionary algorithms to find good quality solutions to a given problem from its high level specifications. In this paper, two difficulty measures are presented: fitness distance correlation and negative slope coefficient. Advantages and drawbacks of both these measures are presented from both a theoretical and an empirical point of view for genetic programming. Furthermore, to analyse various properties of the search process of evolutionary algorithms, it is useful to quantify the distance between two individuals. Using operator-based distance measures can make this analysis more accurate and reliable than using distance measures which have no relationship with the genetic operators. This paper also presents a pseudo-distance measure based on subtree crossover for genetic programming. Empirical studies are presented that show the suitability of this measure to dynamically calculate the fitness distance correlation during the evolution, to construct a fitness sharing system for genetic programming and to measure genotypic diversity in the population. Experiments have been performed on a set of well-known hand-tailored problems and real-life-like GP benchmarks.

1 Introduction

For classical algorithms a well-developed theory exists to categorise problems into complexity classes. Problems in the same class have roughly the same complexity, i.e., they consume (asymptotically) the same amount of computational resources, usually time [36]. Although, properly speaking, Evolutionary Algorithms (EAs) are randomised heuristics and not algorithms, it would be useful to be able to likewise classify problems according to some measure of difficulty. Difficulty studies in Genetic Algorithms (GAs) have been pioneered by Goldberg and coworkers (e.g., see [14, 9, 21]). Their approach consisted in constructing functions that should a priori be easy or hard for GAs to solve. These ideas have been followed by many others (e.g. [33, 12]) and have been at least partly successful, in the sense that they have been the source of a considerable amount of other work on what makes a problem easy or difficult for GAs.

One concept that underlies many approaches is the notion of fitness landscape, which originated with the work of Wright in genetics [53]. The fitness landscape metaphor can be helpful to understand the difficulty of a problem for a searcher that is trying to find the optimal solution for that problem. For example, imagine a very smooth and regular landscape with a single hill top. This is the typical fitness landscape of an easy problem: most search strategies (hill climbing, simulated annealing, tabu search, EAs, etc.) are able to find the top of the hill in a straightforward manner. The opposite is true for a very rugged landscape, with many hills and local optima which are not as high as the best one. In this case, even approaches based on populations of individuals, like GAs or Genetic Programming (GP), might have problems. The graphical visualisation of fitness landscapes, whenever possible, can give an indication about the difficulty of a problem for a searching agent like EAs. However, even assuming that one is able to draw a fitness landscape (which is generally not the case, given the huge sizes of typical search spaces and neighbourhoods), the mere observation of its graph surely lacks formality. The ideal situation would be to have a numeric measure able to condense useful information on fitness landscapes.

This work presents two numeric indicators of problem hardness: fitness distance correlation (fdc) and negative slope coefficient (nsc). Their ability to measure the difficulty of fitness landscapes will be investigated for tree-based GP [25] in this paper. Nevertheless, all the concepts can be applied to other search heuristics, like local search, simulated annealing or tabu search, and to other kinds of EAs, like GAs.

Tree-based GP uses transformation operators on tree structures [26] to carry out search. These operators define a neighbourhood structure over the trees. To analyse various dynamics of the GP search process, it is often useful to quantify the distance between two trees in this topological space. For example, the distance between trees is useful if we want to monitor population diversity (see for instance [17, 16, 4, 32, 43, 11]) or if we want to calculate the fdc, as in the first part of this work (see among others [23, 42, 48, 44]). Operator-based distance measures can make calculating distance and the analysis of the search process more accurate [48, 44, 17, 16, 4, 32]. The difficulty of defining operator-based distance measures was highlighted in [18]. Defining a distance measure, or a measure of similarity, that is in some sense bound to (or consistent with) the genetic operators informally means that if two trees are close to each other, or similar, one can be transformed into the other in a few applications of the operator(s). Mutation-based distance measures for GP have been defined, the most common being some variations on the Levenshtein edit distance [16] and the structural distance [11]. In the second part of this paper, a subtree crossover based pseudo-distance measure for GP is defined and its usefulness to analyse some properties of the search process is experimentally shown.

This paper is structured as follows: Section 2 presents the main results and investigations related to the present work that can be found in the literature. Section 3 defines the fdc, presents some experimental results and discusses the main advantages and drawbacks of this measure. Section 4 presents the nsc as a measure
defined to overcome the main limitations of the fdc. A set of experimental results shows the suitability of the nsc for some standard and real-life-like GP benchmarks. Section 5 contains the definition of the subtree crossover pseudo-distance for GP. A large set of experiments demonstrates its usefulness to describe some important dynamics of GP using the crossover operator. Finally, Section 6 concludes this work and proposes some hints for future research activities.

2 Previous and Related Work

The usual empirical approach to problem difficulty in GP has been to run a more or less agreed upon set of test problems that have their origin in Koza's work [25], such as the even-n parity problem, various kinds of symbolic regression and the artificial ant on the Santa Fe trail. However, this point of view, while useful for practical benchmarking purposes, lacks generality, since results are problem-dependent and it is difficult to infer more general issues pertaining to intrinsic GP difficulty by just looking at statistics derived from running these problems a number of times.

There have been few attempts to date to characterize GP difficulty by means of a single measure. One early approach has been proposed by Koza [25] and consists in calculating, for a given problem, the number of individuals that must be processed in order for a solution to be found with a given probability P (usually P = 0.99). This gives a number characterizing the required computational effort, but it cannot be relied upon for distinguishing easy from hard in GP. In particular, it cannot give any clue as to what causes difficulty for GP on a given problem. However, empirical performance measures like this one are required in any case to assess the reliability of other synthetic hardness indicators, and will be used later in this work.

A potentially more fruitful approach would be to transfer to GP some of the considerations that have proved useful for studying the difficulty of GAs, in particular investigations that make use of the concept of fitness landscape. One early example is the work of Kinnear [24], where GP difficulty was related to the shape of the fitness landscape and analysed through the use of the fitness autocorrelation function, as first proposed by Weinberger [52] and later used in GA work by Manderick et al. [30]. While fitness autocorrelation analysis has been useful in the study of NK landscapes for GAs [52], Kinnear found his results inconclusive and difficult to interpret for GP: essentially no simple relationship was found between correlation length values and GP hardness. As well, correlation analysis was found to be unreliable in another study on the MAX3SAT problem, where the measure predicts the problem to be an easy one while it is actually difficult [40].

The work of Nikolaev and Slavov [35] is the only previous one that makes use of the concept of fitness distance correlation in GP. They define a suitable distance for trees and apply fdc to a problem of regular expression induction. However, their main goal was to determine which mutation operator, among a
few that they propose, sees a smoother landscape on this problem, rather than a general study of problem difficulty in GP, which is the aim here.

In the same vein, Punch and coworkers [37, 38] proposed a new synthetic benchmark problem of tunable difficulty dubbed the Royal Tree problem, which was inspired by the well-known Royal Road problem used in GA theory [33]. However, Royal Trees were used by Punch and coworkers to test the effectiveness of multipopulation GP as compared to standard single population GP and not to gauge intrinsic GP difficulty. Nevertheless, Royal Trees will be used to that end in the present work.

A recent, and atypical, attempt to quantify GP problem difficulty also deserves to be mentioned. In [7, 8], Daida and coworkers claim that, although a fitness landscape description of what is seen by a GP searcher may be legitimate, there are other factors that are not taken into account by this view and that are related to structural mechanisms such as tree shapes and depths. Their approach is based on the exhaustive study of the dynamics of the binomial-3 problem [7], and on purely structural constructive problems [8]. These are of tunable difficulty due to the existence of a range of ephemeral random constants for the binomial problem and other parameters. This approach is interesting, but it is different from the one used here and, in some sense, it goes counter to the main goal of difficulty studies, namely trying to find broad classes of problem types. In fact, it effectively restricts the scope of the search by limiting itself to contingencies and problem-specific aspects. However, this point of view is not necessarily contradictory with the fitness landscape view and could yield useful insights of general value.

Langdon and Poli take an experimental view of GP fitness landscapes in several works summarized in their book [29]. After selecting important and typical classes of GP problems, such as boolean problems, the ant problem, and the MAX problem, they study these fitness landscapes either exhaustively, whenever possible, or by randomly sampling the program space when enumeration becomes unfeasible. Their work highlights several important characteristics of GP spaces, such as density and size of solutions and their distribution. This is useful work and, even if the authors' goals are not openly aimed at establishing problem difficulty, it certainly has a bearing on it. More recently, Langdon has extended his studies of convergence rates in GP for simple machine models (which are amenable to quantitative analysis by Markov chain techniques) to convergence of program fitness landscapes for the same machine models using genetic operators and search strategies to traverse the space [27]. This approach is rigorous because the models are simple enough to be mathematically treatable. The ideas are thus welcome, although their extension to standard GP might prove difficult. A middle ground approach, based on what Goldberg has aptly called facetwise models [15], still has something to offer in the effort to better understand the behavior of real programs on real program spaces using standard, or at least typical, GP operators and machine models.

In fact, probably, an ensemble of techniques, rather than a single magic silver bullet, would prove useful to advance the understanding of GP problem
difficulty. To this aim, two indicators, which are different but not contradictory, will be proposed in Sections 3 and 4 respectively.

3 Fitness Distance Correlation

Fitness distance correlation has previously been studied for GAs by Jones [22] and some preliminary results on GP have been presented in [44, 41, 46, 47, 5]. Jones's approach to GA problem difficulty states that what makes a problem easy or hard is the relationship between the fitness of individuals in the search space and their distance to the global optima. The easiest way to measure the extent to which the fitness function values are correlated to the distance to a global optimum is to examine a problem with known optima, take a sample of individuals and compute the correlation of the set of (fitness, distance) pairs. Thus, given a sample F = {f_1, f_2, ..., f_n} of n individual fitnesses and a corresponding sample D = {d_1, d_2, ..., d_n} of the n distances to the nearest global optimum, fdc is defined as:

    fdc = C_FD / (σ_F σ_D)

where

    C_FD = (1/n) Σ_{i=1}^{n} (f_i − f̄)(d_i − d̄)

is the covariance of F and D, and σ_F, σ_D, f̄ and d̄ are the standard deviations and means of F and D. As shown in [22], GA problems can be classified in three classes, depending on the value of the fdc coefficient: misleading (fdc ≥ 0.15), unknown (−0.15 < fdc < 0.15) and straightforward (fdc ≤ −0.15), where the threshold values −0.15 and 0.15 have been determined empirically by Jones [22]. The second class corresponds to problems for which the difficulty cannot be estimated, because there is virtually no correlation between fitness and distance. This section of the paper contains an empirical demonstration that this problem classification also holds for GP.

3.1 Structural Tree Distance

The distance metric used to calculate the fdc should be defined with regard to the neighborhood produced by the genetic operators, so as to assure the conservation of the genetic material between neighbors [22, 39]. This is generally an easy task for GAs, where the well-known Hamming distance has a clear and simple relationship with standard GA mutation [20, 14]. For GP, defining a distance bound to the main genetic operators is obviously more difficult, given that genotypes are trees and genetic operators are more complex. In the first part of this paper, the well-known structural distance (see [11]) is used for GP. This distance is the most suitable among the different definitions found in the literature, and the transformations on which this distance is based make it possible to define two new mutation genetic operators in a very simple way. The new resulting evolutionary process
will be called structural mutation genetic programming (SMGP), to distinguish it from GP based on the standard Koza crossover (which will be referred to as standard GP). Later in this paper, a new pseudo-distance measure bound to GP crossover will be defined (see Section 5).

According to structural distance, given the sets F and T of functions and terminal symbols that are used to build the trees that are evolved by GP, a coding function c must be defined such that c : {F ∪ T} → ℕ. The distance of two trees T_1 and T_2 with roots R_1 and R_2 is defined as follows:

    d(T_1, T_2) = d(R_1, R_2) + k · Σ_{i=1}^{m} d(child_i(R_1), child_i(R_2))    (1)

where d(R_1, R_2) = (|c(R_1) − c(R_2)|)^z with z ∈ ℕ, child_i(Y) is the i-th of the m possible children of a generic node Y if i ≤ m, or the empty tree otherwise, and c evaluated on the root of an empty tree is 0. The constant k is used to give different weights to nodes belonging to different levels. In most of this section of the paper, individuals will be coded using the same syntax as in [5] and [37], i.e. considering a set of functions A, B, C, etc. with increasing arity (i.e. arity(A) = 1, arity(B) = 2, and so on) and a single terminal X (i.e. arity(X) = 0), as follows: F = {A, B, C, D, ...}, T = {X}. The c function, for this particular language, is defined as follows: for all x ∈ {F ∪ T}, c(x) = arity(x) + 1. In the experiments presented here, the constant k will always be set to 1/2.

3.2 Structural Mutations

Given the sets F and T and the coding function c defined in Section 3.1, we define c_max (resp. c_min) as the maximum (resp. the minimum) value assumed by c on the domain {F ∪ T}. Moreover, given a symbol n (resp. m) such that n ∈ {F ∪ T} (resp. m ∈ {F ∪ T}) and c(n) < c_max (resp. c(m) > c_min), we define succ(n) (resp. pred(m)) as a node such that c(succ(n)) = c(n) + 1 (resp. c(pred(m)) = c(m) − 1). Then we can define the following operators on a generic tree T [44]:

inflate mutation: a node labelled with a symbol n such that c(n) < c_max is selected in T and replaced by succ(n). A new random terminal node is added to this new node in a random position (i.e. the new terminal becomes the i-th son of succ(n), where i is comprised between 0 and arity(n)).

deflate mutation: a node labelled with a symbol m such that c(m) > c_min, and such that at least one of its sons is a leaf, is selected in T and replaced by pred(m). A random leaf, chosen among the sons of this node, is deleted from T.
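To make the definition of equation (1) concrete, the following minimal sketch computes the coding function c and the structural distance for the A, B, C, ..., X language of Section 3.1. Python, the nested-tuple tree representation and the function names are assumptions of this sketch, not part of the original work; the default k = z = 1 matches Property 1 below, while the experiments in the text use k = 1/2.

    # Trees are nested tuples whose first element is the node symbol,
    # e.g. ('B', ('A', ('X',)), ('X',)); () denotes the empty tree.

    def c(symbol):
        # Coding function of Section 3.1: c(x) = arity(x) + 1,
        # with arity(X) = 0, arity(A) = 1, arity(B) = 2, and so on.
        return 1 if symbol == 'X' else ord(symbol) - ord('A') + 2

    def structural_distance(t1, t2, k=1.0, z=1):
        # Equation (1): d(T1, T2) = d(R1, R2) + k * sum_i d(child_i(R1), child_i(R2)),
        # with d(R1, R2) = |c(R1) - c(R2)|^z and c of an empty tree equal to 0.
        r1 = c(t1[0]) if t1 else 0
        r2 = c(t2[0]) if t2 else 0
        dist = abs(r1 - r2) ** z
        ch1 = list(t1[1:]) if t1 else []
        ch2 = list(t2[1:]) if t2 else []
        for i in range(max(len(ch1), len(ch2))):
            a = ch1[i] if i < len(ch1) else ()
            b = ch2[i] if i < len(ch2) else ()
            dist += k * structural_distance(a, b, k, z)
        return dist

For example, structural_distance(('B', ('X', ), ('X', )), ('A', ('X', ))) returns 2.0 with the defaults: one unit for the different roots plus one unit for the extra leaf.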

Given these definitions, the following property holds.

Property 1 (Distance/Operator Consistency). Consider the sets F and T and the coding function c defined in Section 3.1, and let T_1 and T_2 be two trees composed of symbols belonging to {F ∪ T}. Let the constants k and z of definition (1) both be equal to 1. If d(T_1, T_2) = D, then T_2 can be obtained from T_1 by a sequence of D/2 editing operations, where an editing operation can be an inflate mutation or a deflate mutation. (This property has been formally proven in [44, 48].)

From this property, we conclude that the operators of inflate and deflate mutation are consistent with the notion of structural distance: applying these operators makes it possible to move through the search space from a tree to its neighbors according to that distance.

3.3 Experimental Results on fdc

To test the suitability of the fdc as a hardness measure for GP, we have used a large set of hand-tailored GP benchmarks that all share the property of having a tunable difficulty (i.e. the difficulty of the problem for GP can be changed by simply varying the value of some parameters, which makes these problems particularly interesting for the study presented here). These benchmarks are: unimodal trap functions, multimodal trap functions, Royal Trees and the MAX problem. Only results concerning multimodal trap functions, Royal Trees and the MAX problem are shown here, given that unimodal trap functions can be seen as a particular case of multimodal ones. Before showing the experimental results, these problems are briefly presented in the respective paragraphs. For a more detailed introduction, see for instance [44].

In all experiments, fdc has been calculated via a sampling of randomly chosen individuals without repetitions. For the GP simulations, the total population size was 200 and generational GP was used with tournament selection of size 10. Both standard GP (i.e. GP using standard subtree crossover [25]) and SMGP (i.e. GP using the structural mutation operators presented in Section 3.2 as the sole genetic operators) will be used in the experiments. In both cases, the GP process was stopped either when a perfect solution was found (global optimum) or when 500 generations were executed. All experiments have been performed 100 times.

Once a measure of hardness and the way to compute it have been chosen, the problem remains of finding a means to validate the prediction of the measure with respect to the problem instance and the algorithm. This is a necessary step in this approach, for otherwise the whole argument might become circular, since there would be no way to relate different values of the difficulty measure among themselves. The easiest way is to use a performance measure. Naudts and Kallel [34] have a good discussion of that point. For the purposes of the present work, performance is defined as the proportion of the runs for which the global optimum has been found in less than 500 generations over 100 runs. Even if this definition is informal and prone to criticism, good or bad performance values correspond to our intuition of what easy or hard means in practice.¹

¹ McPhee and Poli [31] defined some other performance measures for a trivial problem (the 1-then-0s problem) and showed how one can use the GP schema equation to find optimum choices of operators and parameters.
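As an illustration of how the fdc values reported below are obtained, here is a minimal sketch that computes the fdc of Section 3 from lists of sampled fitnesses and distances to the nearest known global optimum, together with Jones's empirical classification. Python and the function names are assumptions of the sketch.

    def fdc(fitnesses, distances):
        # fdc = C_FD / (sigma_F * sigma_D); assumes neither sample is constant.
        n = len(fitnesses)
        f_mean = sum(fitnesses) / n
        d_mean = sum(distances) / n
        c_fd = sum((f - f_mean) * (d - d_mean)
                   for f, d in zip(fitnesses, distances)) / n
        sigma_f = (sum((f - f_mean) ** 2 for f in fitnesses) / n) ** 0.5
        sigma_d = (sum((d - d_mean) ** 2 for d in distances) / n) ** 0.5
        return c_fd / (sigma_f * sigma_d)

    def classify(fdc_value):
        # Jones's empirically determined thresholds.
        if fdc_value >= 0.15:
            return "misleading"
        if fdc_value <= -0.15:
            return "straightforward"
        return "unknown"

In the experiments of this section, the distances are structural distances to the nearest of the known global optima.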

W Multimodal Trap Functions. Multimodal trap functions, first proposed in [10] (informally called W trap functions, given their typical shape shown in Figure 1), are typical EA benchmarks, where the fitness of each individual can be calculated as a function of its distance to one of the global optima. They are characterized by the presence of several global optima. They depend on 5 variables called B1, B2, B3, R1 and R2, each one belonging to the range [0, 1], and they can be expressed by the following formula:

    f(d) = 1 − d/B1                        if d ≤ B1
    f(d) = R1 (d − B1) / (B2 − B1)         if B1 < d ≤ B2
    f(d) = R1 (B3 − d) / (B3 − B2)         if B2 < d ≤ B3
    f(d) = R2 (d − B3) / (1 − B3)          otherwise

where the property B1 ≤ B2 ≤ B3 must hold.

[Fig. 1. Graphical representation of a W trap function with B1 = 0.1, B2 = 0.3, B3 = 0.7, R1 = 1, R2 = 0.7. Note that distances and fitness are normalized into the range [0, 1].]

To build a GP landscape for multimodal trap functions, one has to choose a particular tree T_o as the origin of the fitness/distance plane. All the distances lying on the abscissas of this plane are calculated with respect to T_o, and thus T_o is a global optimum, since its fitness is equal to 1. All the trees having a distance to T_o equal to B2 (if R1 = 1) or equal to 1 (if R2 = 1) are global optima, since they have the same fitness as T_o.
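A direct transcription of this piecewise definition into code may help. The sketch below (Python and the function name are assumed choices) takes a normalized distance d and the five parameters, and explicitly guards the B1 = 0 case used in several of the experiments below to avoid a division by zero.

    def w_trap_fitness(d, b1, b2, b3, r1, r2):
        # d and b1 <= b2 <= b3 are normalized distances in [0, 1].
        if d <= b1:
            return 1.0 if b1 == 0.0 else 1.0 - d / b1
        if d <= b2:
            return r1 * (d - b1) / (b2 - b1)
        if d <= b3:
            return r1 * (b3 - d) / (b3 - b2)
        return r2 * (d - b3) / (1.0 - b3)

The fitness of a tree is then w_trap_fitness applied to its normalized structural distance to the origin tree T_o.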

The term multimodal has been used here to indicate the presence of multiple global optima (while this term is often used in the literature to indicate the presence of multiple optima, either local or global). In this work, a particular tree belonging to the search space is chosen as origin (it will be called T_o). The maximum fitness (i.e. a fitness value equal to 1) is arbitrarily assigned to T_o, so as to make it a global optimum. Now, if the R1 constant is set to 1, all the trees having a normalized distance equal to B2 to the origin have a fitness equal to 1 and thus are global optima too. In the same way, if R2 is set to 1, all the trees at distance 1 from the origin are global optima. Let S_o be the set of all these global optima different from T_o. One tree (that will be indicated as T_1) belonging to S_o is arbitrarily chosen, and all the other trees belonging to S_o, but different from T_1, receive a random normalized fitness different from 1. To calculate the fdc, the minimum of the distances from each sampled tree to T_o and T_1 is considered (as suggested for GAs by Jones in [22]).

Three series of different experiments have been performed, using both SMGP and standard GP and the coding language introduced in Section 3.1: in the first two series (series 1 and series 2), the distance between T_o and T_1 has been chosen inside the range (0, 1) (precisely, it is equal to 0.25 in series 1 and equal to 0.75 in series 2); in the third series (series 3), the distance between T_o and T_1 has been chosen to be equal to 1. Results of these experiments are discussed below.

Series 1. Trees T_o and T_1 have been chosen randomly among all the couples of trees in the search space that respect the property that the distance between them is equal to 0.25. Table 1 shows a subset of the experimental results of this first series of tests, on various W trap functions, obtained by changing the values of B1, B3 and R2, while R1 has been kept constantly equal to 1 and B2 has been set to the value of the normalized distance between T_o and T_1. Results shown have been chosen in the following way: values for which fdc gives a result included in the range [−0.15, 0.15] have been discarded, since no correlation between fitness and distance exists. Among all the other cases, a subset of triples (B1, B3, R2) has been randomly chosen. They are shown in the first column of Table 1. The value of the fdc for the corresponding multimodal trap function is shown in the second column of Table 1. The third column of Table 1 classifies each multimodal trap function according to Jones's terminology, i.e. problems for which fdc ≤ −0.15 are labelled as straightforward and problems for which fdc ≥ 0.15 are labelled as misleading. For each multimodal trap function, corresponding to different triple values of B1, B3 and R2, 100 independent GP runs have been executed and performance has been calculated, both for SMGP (reported in the fourth column of Table 1) and standard GP (reported in the fifth column of Table 1).

[Table 1. Results of fdc using SMGP and standard GP for the first series of experiments with W trap functions; p stands for performance. For each selected triple (B1, B3, R2) the table reports the fdc value, the fdc prediction and the performance of SMGP and of standard GP; all the selected triples with B1 = 0 (B3 ranging from 0.3 to 1) are predicted to be misleading, while all the triples with B1 between 0.1 and 0.25 are predicted to be straightforward.]

As this table clearly shows, for each one of the considered multimodal trap functions, when the fdc value is lower than −0.15, i.e. the problem is classified as an easy one, the corresponding performance value is remarkably higher than 0.5 both for standard GP and SMGP, i.e. the global optimum has been found in the large majority of the runs that have been executed. This clearly corresponds
to our intuition of what an easy problem is. Analogously, for each multimodal trap function with an fdc value larger than 0.15, i.e. for each problem that is classified as a hard one by the fdc, performance is remarkably lower than 0.5, and often equal or approximately equal to zero, i.e. the problem is indeed hard
(the global optimum has been found very rarely over the runs that have been performed). Results shown in Table 1 are encouraging and suggest that fdc could be a reasonable measure to predict problem difficulty for some typical W trap functions. Finally, we remark that this is true both for SMGP (which uses genetic operators bound to the distance metric used to compute the fdc) and for standard GP (which uses standard subtree crossover). This probably means that a relationship also exists between structural distance and subtree crossover, even though we are not able to formalize it.

Series 2. In this case, T_o and T_1 have been chosen randomly with the constraint that the distance between them has to be equal to 0.75. Table 2 shows a subset of the experimental results of this second series of tests, on various W trap functions, obtained by changing the values of B1, B3 and R2. This table has to be interpreted exactly as Table 1. Once again, fdc seems to be a fairly good indicator of problem difficulty for each one of the functions that we have tested, both for standard GP and for SMGP.

[Table 2. Results of fdc using SMGP and standard GP for the second series of experiments with W trap functions; p stands for performance. For each selected triple (B1, B3, R2) the table reports the fdc value, the fdc prediction and the performance of SMGP and of standard GP; all the selected triples with B1 = 0 are predicted to be misleading, while all the triples with B1 ≥ 0.05 are predicted to be straightforward.]

Series 3. This last series of experiments consists in choosing as global optima two trees with a normalized distance equal to 1 between them. Table 3 shows a subset of the results obtained with this set of experiments. These results are encouraging too, thus confirming the suitability of fdc as an indicator of problem hardness for multimodal trap functions, both for standard GP and for SMGP.

[Table 3. Results of fdc using SMGP and standard GP for the third series of experiments with W trap functions; p stands for performance. For each selected tuple (B1, B2, B3, R1) the table reports the fdc value, the fdc prediction and the performance of SMGP and of standard GP; the three tuples with B1 = 0 are predicted to be misleading (performance 0 for both algorithms), while the tuples with B1 ≥ 0.1 are predicted to be straightforward (performance 1 for both algorithms).]

Royal Trees. This set of functions uses the same language to code individuals as the one presented in Section 3.1. It was first introduced in [37] and it is based on the concept of perfect tree. For instance, if only the function symbols A and B are allowed (i.e. the maximum allowed arity for the nodes is equal to 2), the perfect tree has B as root (node at level 0), A as sons of the root (nodes at level 1) and leaves (X nodes) as nodes of level 2. This tree is shown at the left side of Figure 2. If the set of function nodes is composed of A, B and C (i.e. the maximum arity allowed is 3), then the perfect tree is the one shown in the right part of Figure 2. Note that this tree has C as root and three optimum trees of root B as subtrees. By the same argument, the form of the optimum trees of any
other root can be deduced.

[Fig. 2. Optimum trees for F = {A, B} (left) and for F = {A, B, C} (right).]

The fitness of a tree (or any subtree) is defined as the score of its root. Each function calculates its score by summing the weighted scores of its direct children. If the child is a perfect tree of the appropriate level (for instance, a complete level-C tree beneath a D node), then the score of that
subtree, times a FullBonus weight, is added to the score of the root. If the child has a correct root but is not a perfect tree, then the weight is PartialBonus. If the child's root is incorrect, then the weight is Penalty. After scoring the root, if the function is itself the root of a perfect tree, the final sum is multiplied by CompleteBonus. Usual values for these constants are as follows: FullBonus = 2, PartialBonus = 1, Penalty = 1/3, CompleteBonus = 2 (see [37] for a more detailed explanation and for some examples of fitness calculations for some simple trees). According to this algorithm, the global optima for the Royal Trees of a given arity are the perfect trees having as root the node with the maximum arity.

Different experiments, considering different nodes as the node with maximum arity, have been performed. The values of the Royal Tree constants used here are, as in [37], FullBonus = 2, PartialBonus = 1, Penalty = 1/3, CompleteBonus = 2. Results are shown in Table 4.

[Table 4. Results of fdc for the Royal Trees; p stands for performance. Roots B, C and D are classified as straightforward (with performance 1 for both SMGP and standard GP for roots B and C), root E as unknown, and roots F and G as misleading, with fdc values of 0.44 and 0.73 respectively and performance 0 for both SMGP and standard GP.]

Predictions made by fdc for level-A, level-B, level-C and level-D functions are correct. For the level-E function, no correlation between fitness and distance is observed and performance is either equal (for SMGP) or near (for standard GP) to zero. Finally, level-F and level-G functions are predicted to be misleading (in accord with Punch in [37]) and they really are, since the global optimum is never found before 500 generations.
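For concreteness, the scoring rule described above can be sketched as follows. Python, the nested-tuple tree representation and the convention that a leaf X scores 1 are assumptions of this sketch, since the description above only specifies the child weights and bonuses.

    FULL_BONUS, PARTIAL_BONUS, PENALTY, COMPLETE_BONUS = 2.0, 1.0, 1.0 / 3.0, 2.0

    def arity(sym):
        # X is the only terminal; A has arity 1, B arity 2, and so on.
        return 0 if sym == 'X' else ord(sym) - ord('A') + 1

    def correct_child(sym):
        # The symbol expected directly beneath sym in a perfect tree.
        return 'X' if sym == 'A' else chr(ord(sym) - 1)

    def is_perfect(tree):
        sym, children = tree[0], tree[1:]
        if sym == 'X':
            return len(children) == 0
        return (len(children) == arity(sym)
                and all(c[0] == correct_child(sym) and is_perfect(c) for c in children))

    def royal_tree_score(tree):
        sym, children = tree[0], tree[1:]
        if sym == 'X':
            return 1.0  # assumed score of a leaf
        total = 0.0
        for child in children:
            if child[0] == correct_child(sym) and is_perfect(child):
                weight = FULL_BONUS
            elif child[0] == correct_child(sym):
                weight = PARTIAL_BONUS
            else:
                weight = PENALTY
            total += weight * royal_tree_score(child)
        if is_perfect(tree):
            total *= COMPLETE_BONUS
        return total

For example, royal_tree_score(('B', ('A', ('X',)), ('A', ('X',)))) scores the perfect level-B tree of Figure 2 (32 under these assumptions).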

The Royal Trees problem spans all the classes of difficulty described by the fdc, and fdc works for these problems, both for standard GP and for SMGP.

MAX Problem. Contrarily to trap functions and to the Royal Tree problem, the MAX problem does not use the coding language defined in Section 3.1, but sets of arithmetic functions and constants. The task of the MAX problem for GP, defined in [13] and [28], is to find the program which returns the largest value for a given terminal and function set with a depth limit d, where the root node counts as depth 0. The choice of the sets F and T strongly influences the ability of GP to generate the optimum tree. For a deeper introduction to this problem and a study of the GP behavior on it, see [28]. Three series of experiments have been performed for the MAX problem and are presented here: in the first series, F = {+} and T = {1} have been used; in the second series, F = {+} and T = {1, 2} have been used; and in the third series F = {+, *} and T = {0.3} have been used (where the particular constant value has been chosen to avoid the presence of multiple global optima). Table 5 shows the fdc and p values for these three series (each line corresponds to a series).

[Table 5. Results of fdc for the MAX problem using SMGP and standard GP; the first column shows the sets of functions and terminals used in the experiments; p stands for performance. With F = {+}, T = {1} and with F = {+}, T = {1, 2} the problem is predicted to be straightforward and performance is 1 for both SMGP and standard GP; with F = {+, *}, T = {0.3} the fdc value is about 0.08 and performance is 0 for both.]

Two problems are correctly classified as straightforward by fdc, both for SMGP and standard GP. For the third problem, which is difficult since solutions are never found over the 100 runs performed, no correlation between fitness and distance to the global optimum has been detected for the individuals of the sample used (the fdc value is approximately equal to 0). In conclusion, also for the MAX problem the fdc seems a reasonable measure to quantify difficulty.

3.4 Counterexample

Until now, we have only shown test problems for which the fdc succeeds in correctly quantifying difficulty for GP. A hand-tailored problem, built to contradict the fdc conjecture, is presented here. This problem is based on the Royal Trees and is inspired by the technique used in [39] to build a counterexample for fdc in GAs. The technique basically consists in assigning to all the trees of the search
space the same fitness as for the Royal Tree problem, except for all the trees containing only the nodes A and X. To these trees, a fitness equal to the optimal Royal Tree's fitness times their own depth is assigned. It is clear that the optimal Royal Tree is now a local optimum, while the global optimum is the tree containing only A and X symbols having the maximum possible depth. Moreover, it is clear that a very specific path {A(X), A(A(X)), A(A(A(X))), A(A(A(A(X)))), ...} has been defined. Each tree belonging to this path has a fitness greater than or equal to that of the optimal Royal Tree, while all the trees that do not belong to this path have a fitness smaller than or equal to that of the optimal Royal Tree. The value of fdc for this function is 0.88. Thus, according to the fdc conjecture, this function should be difficult to solve. Nevertheless, over 100 independent runs, the global optimum has been found 100 times before generation 500 (i.e. p = 1) both by standard GP and by SMGP. These results obviously contradict the fdc conjecture. This clearly happens because individuals belonging to the path are easy to build by means of genetic operators and, once at least one of them has been generated, it is easy to obtain the global optimum by composition or by simple mutations. In conclusion, the fdc is a reasonable hardness measure for many test functions, but it is not infallible: functions can be built for which it fails to correctly measure the difficulty.

4 Negative Slope Coefficient

As shown in the previous section, fdc is a rather reliable indicator of problem hardness. However, it has some flaws: the existence of counterexamples casts a shadow on its usefulness, although such cases are contrived ones and they do not seem to appear often among natural problems. But the most severe drawback of fdc, and its main weakness, is that the optimal solution (or solutions) must be known beforehand, which is obviously unrealistic in applied search and optimization problems, and prevents one from applying fdc to more usual GP benchmarks and real-life applications. Thus, although the study of fdc is useful to understand EA dynamics, it is also important to try other approaches based on quantities that can be measured without any explicit knowledge of the genotype of optimal solutions.

The measure presented in this section, the negative slope coefficient (nsc), is based on the concepts of evolvability and fitness clouds. Evolvability is a feature that is intuitively related, although not exactly identical, to problem difficulty. It has been defined as the ability of genetic operators to improve fitness quality [1]. The most natural way to study evolvability is, probably, to plot the fitness values of individuals against the fitness values of their neighbours, where a neighbour is obtained by applying one step of a genetic operator to the individual. Such a plot has been presented in [51, 6, 50, 3] and it is called a fitness cloud. Since high-fitness points tend to be much more important than low-fitness ones in determining the behaviour of EAs, an alternative algorithm to generate fitness clouds was proposed in [45]. The main steps of this algorithm can be informally summarised as follows:
Generate a set of individuals Γ = {γ_1, ..., γ_n} by sampling the search space and let f_i = f(γ_i), where f(·) is the fitness function. For each γ_j ∈ Γ generate k neighbours v_j^1, ..., v_j^k by applying a genetic operator to γ_j, and let f'_j = max_i f(v_j^i). Finally, take C = {(f_1, f'_1), ..., (f_n, f'_n)} as the fitness cloud. This is the interpretation of fitness cloud used in this paper. Note how this algorithm essentially corresponds to the sampling produced by a set of n stochastic hill-climbers at their first iteration after initialisation.

The fitness cloud can be of help in determining some characteristics of the fitness landscape related to evolvability and problem difficulty. But the mere observation of the scatterplot is not sufficient to quantify these features. The nsc has been defined to capture with a single number some interesting characteristics of fitness clouds. It can be calculated as follows: let us partition C into a certain number of separate ordered bins C_1, ..., C_m such that (f_a, f'_a) ∈ C_j and (f_b, f'_b) ∈ C_k with j < k implies f_a < f_b. Consider the average fitnesses f̄_i = (1/|C_i|) Σ_{(f, f') ∈ C_i} f and f̄'_i = (1/|C_i|) Σ_{(f, f') ∈ C_i} f'. The points (f̄_i, f̄'_i) can be seen as the vertices of a polyline, which effectively represents the skeleton of the fitness cloud. For each of the segments of this polyline we can define a slope S_i = (f̄'_{i+1} − f̄'_i) / (f̄_{i+1} − f̄_i). Finally, the negative slope coefficient is defined as:

    nsc = Σ_{i=1}^{m−1} min(0, S_i).    (2)

The hypothesis that is proposed in this paper is that nsc should classify problems in the following way: if nsc = 0, the problem is easy; if nsc < 0 the problem is difficult, and the value of nsc quantifies this difficulty: the smaller its value, the more difficult the problem. The justification for this hypothesis is that the presence of a segment with negative slope would indicate bad evolvability for individuals having fitness values contained in that segment, as neighbours are, on average, worse than their parents in that segment.

The definition of nsc is very general and has many degrees of freedom. In particular, a question must be answered to be able to calculate the nsc: how should we partition the abscissas of a fitness cloud into bins? A partitioning technique called size driven bisection, inspired by the well-known bisection algorithm, was proposed and justified in [49]. This technique is used in this paper, given its strong theoretical foundation based on statistics.
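A minimal sketch of this procedure may help. Python, the function names and the use of simple equal-size bins (in place of the size driven bisection actually used in the paper) are assumptions of the sketch; fitness and mutate are user-supplied fitness and one-step mutation functions.

    def fitness_cloud(sample, fitness, mutate, k):
        # One point (f, f') per sampled individual, where f' is the best fitness
        # among k mutated neighbours (the hill-climber-like sampling described above).
        cloud = []
        for g in sample:
            f = fitness(g)
            f_prime = max(fitness(mutate(g)) for _ in range(k))
            cloud.append((f, f_prime))
        return cloud

    def nsc(cloud, m=10):
        # Partition the cloud into m ordered, equal-size bins by parent fitness,
        # average each bin, and sum the negative slopes of the resulting polyline
        # as in equation (2).
        pts = sorted(cloud)
        bins = [pts[i * len(pts) // m:(i + 1) * len(pts) // m] for i in range(m)]
        bins = [b for b in bins if b]
        centers = [(sum(f for f, _ in b) / len(b), sum(fp for _, fp in b) / len(b))
                   for b in bins]
        total = 0.0
        for (f1, fp1), (f2, fp2) in zip(centers, centers[1:]):
            if f2 != f1:  # skip degenerate segments to avoid a zero division
                total += min(0.0, (fp2 - fp1) / (f2 - f1))
        return total

Equal-size bins keep the sketch short; the size driven bisection of [49] chooses the bins adaptively and is the variant whose results are reported below.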

4.1 Experimental Results on nsc

In the last few years, the nsc has been tested as a measure of problem hardness for GP over a large set of problems. These problems include hand-tailored functions such as the ones used to test the fdc (trap functions, Royal Trees, MAX functions); but, given that the nsc can be calculated without prior knowledge of the global optima, it can also be applied to more realistic, real-life-like GP benchmarks, like various forms of symbolic regression, the even parity problem, the artificial ant on the Santa Fe trail, the multiplexer problem and the intertwined spirals problem (all these problems are described and discussed in detail in [25]). Tests of the ability of the nsc to predict difficulty have been performed for all these problems and have given remarkably positive results. In this paper, results on two of these problems are presented: the binomial-3 problem (an instance of the symbolic regression problem, which is particularly interesting for this work because its difficulty can be tuned by simply changing the value of a parameter) and the even parity problem. Both these problems are briefly introduced below in the respective paragraphs, before showing and discussing the experimental results.

The binomial-3 problem. This benchmark (first introduced by Daida et al. in [7]) is an instance of the well-known symbolic regression problem [25]. The function to be approximated is f(x) = 1 + 3x + 3x² + x³. Fitness cases are 50 equidistant points over the range [−1, 0). Fitness is the sum of absolute errors over all fitness cases. A hit is defined as being within 0.01 in ordinate for each one of the 50 fitness cases. The function set is F = {+, −, *, //}, where // is the protected division, i.e. it returns 1 if the denominator is 0. The terminal set is T = {x, R}, where x is the symbolic variable and R is the set of ephemeral random constants (ERCs). ERCs are uniformly distributed over a specified interval of the form [−a_R, a_R]; they are generated once at population initialization and they are not changed in value during the course of a GP run. According to Daida and coworkers, difficulty tuning is achieved by varying the value of a_R.

Figure 3 shows the scatterplots and the set of segments {S_1, S_2, ..., S_m} as defined in the previous section (with m = 10) for the binomial-3 problem with a_R = 1 (Figure 3(a)), a_R = 10 (Figure 3(b)), a_R = 100 (Figure 3(c)) and a_R = 1000 (Figure 3(d)). Parameters used are as follows: maximum tree depth = 26 and a sample of randomly chosen individuals. Table 6 shows some data about these experiments: column one indicates the corresponding scatterplot in Figure 3, column two contains the a_R value, column three contains performance and column four contains the value of the nsc.

[Table 6. Binomial-3 problem. Some data related to the scatterplots of Figure 3: for each value of a_R (1, 10, 100 and 1000) the table reports the performance p and the nsc value.]

These results show that nsc values get smaller as the problem becomes harder, and the nsc is zero when the problem is easy (a_R = 1). Moreover, the points in the scatterplots seem to cluster around good (i.e. small) fitness values as the problem gets easier.
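As an illustration of this setup, a minimal sketch of the binomial-3 fitness computation could look as follows. Python, the names and the exact placement of the 50 equidistant points within [−1, 0) are assumptions of the sketch.

    def binomial3_target(x):
        # Target polynomial of the binomial-3 problem: (1 + x)^3.
        return 1.0 + 3.0 * x + 3.0 * x ** 2 + x ** 3

    # 50 equidistant fitness cases over [-1, 0)
    CASES = [-1.0 + i * (1.0 / 50.0) for i in range(50)]

    def binomial3_fitness(program):
        # program: a callable candidate; fitness is the sum of absolute errors (lower is better).
        return sum(abs(program(x) - binomial3_target(x)) for x in CASES)

    def hits(program, tol=0.01):
        # Number of fitness cases on which the candidate is within 0.01 in ordinate.
        return sum(1 for x in CASES if abs(program(x) - binomial3_target(x)) <= tol)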

[Fig. 3. Binomial-3 results. (a): a_R = 1. (b): a_R = 10. (c): a_R = 100. (d): a_R = 1000.]

The even parity k problem. The boolean even parity k function [25] of k boolean arguments returns true if an even number of its boolean arguments evaluates to true, and otherwise it returns false. The number of fitness cases to be checked is 2^k. Fitness is computed as 2^k minus the number of hits over the 2^k cases. Thus a perfect individual has fitness 0, while the worst individual has fitness 2^k. The set of functions we employed is F = {NAND, NOR}. The terminal set is composed of k different boolean variables. Difficulty tuning is achieved by varying the value of k. Figure 4 shows the scatterplots and the set of segments for the even parity 3, even parity 5, even parity 7 and even parity 9 problems. Parameters used are as follows: maximum tree depth = 10 and a sample of randomly chosen individuals.
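A minimal sketch of this fitness computation (again in Python; the calling convention of the candidate program is an assumption) is:

    from itertools import product

    def even_parity_fitness(program, k):
        # program: a callable taking k booleans and returning a boolean.
        # The target returns True iff an even number of inputs is True;
        # fitness is 2^k minus the number of matching cases (0 is a perfect score).
        hits = 0
        for bits in product([False, True], repeat=k):
            target = (sum(bits) % 2 == 0)
            if program(*bits) == target:
                hits += 1
        return 2 ** k - hits

For instance, even_parity_fitness(lambda a, b, c: not (a ^ b ^ c), 3) returns 0, since the negated XOR of three inputs is exactly the even parity function.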

[Fig. 4. Results for the even parity k problem. (a): even parity 3. (b): even parity 5. (c): even parity 7. (d): even parity 9.]

Table 7 shows some data about these experiments, with the same notation and meaning as in Table 6, except that column two now refers to the problem rank.

[Table 7. Even parity. Indicators related to the scatterplots of Figure 4: for each of the even parity 3, 5, 7 and 9 problems the table reports the performance p and the nsc value.]

Analogously to what happens for the binomial-3 problem, nsc values get smaller as the problem becomes harder; they are always negative for hard problems, and zero for easy ones.

4.2 Summing Up

The goal of the study of GP problem hardness is to provide the final user with a tool, or a set of tools, to measure the ability of GP systems to find good solutions to a given problem. Before being able to define and develop these tools, it is necessary to identify the features that make a problem easy or hard for GP to solve. The attempt to answer this question, in order to develop a theory of problem hardness, has led to the hypothesis that the relationship between fitness and distance to the goal is one of the main features that make a problem easy or hard, and to the definition of the fdc. In Section 3, it has been shown that the fdc is an adequate descriptor of GP landscape statistics and can be used to measure the difficulty of many test functions, but also that it has some known drawbacks: functions can be built for which fdc fails to correctly measure the difficulty and, even more importantly, fdc is not predictive: global optima must be known beforehand to be able to calculate it. Even though sometimes some genotypic characteristics of the global optima can be known even in real-life applications, and this can allow one to approximate fdc values, this is not the case in general, and this prevents fdc from being used in a wide set of practical cases. For this reason, in the present section, the negative slope coefficient (nsc) has been presented. It is based on the concept of fitness cloud and it is a predictive measure, in the sense that no prior knowledge of the global optima's genotypes is required to calculate it. For this reason, nsc can be applied to any GP problem (and also to other EAs different from GP). The suitability of nsc as an indicator of GP problem hardness has been experimentally shown on a set of rather diverse standard GP benchmarks and synthetic hand-tailored test functions. The main drawback of the nsc is that it uses only mutation as a variation operator to generate neighborhoods. A measure to model and capture the main properties of subtree crossover, the most common and most frequently used GP genetic operator, is presented below.

5 Subtree Crossover Distance

In the previous section, only mutation has been considered as a variation operator to obtain neighbors and generate fitness clouds. Indeed, since mutation is a unary operator, it is much easier to use it, rather than crossover, in theoretical models and measures. Nevertheless, crossover is the most common and the most frequently used genetic operator in EAs. In particular, in GP, Koza's subtree crossover is often considered as one of the main search engines [25, 2]. Thus, it would be a relevant result to define a distance, or a similarity/dissimilarity measure, which could be bound to standard subtree crossover in a similar sense as structural distance is related to structural mutation (see Section 3.2).

Following the same notation as in [18], let P be a population of trees, T_1 be the tree we want to compute a distance from (or the parent tree) and T_2 be the tree into which we would like to transform T_1. The subtree crossover
distance² (SCD) between T_1 and T_2 depends not only on T_1 and T_2 themselves, but also on the population P. Let diff(T_1, T_2) be an operator that returns the set S = {(s^1_{T_1}, s^1_{T_2}), (s^2_{T_1}, s^2_{T_2}), ..., (s^n_{T_1}, s^n_{T_2})} such that, for all i ∈ [1, n], if we replace s^i_{T_1} with s^i_{T_2} in T_1 we obtain T_2; diff(T_1, T_2) returns the empty set if T_1 and T_2 share no genetic material. Now, the new SCD can be defined by the following algorithm:

    func SCD(T_1, T_2, P) {
        S = diff(T_1, T_2)
        res = 0
        for i = 1 to cardinality(S) do
            ps1 = probSelecting(s^i_{T_1}, T_1)
            ps2 = probCreating(s^i_{T_2}, P)
            res = res + (ps1 * ps2)
        endfor
        return (1 - res)
    }

Given the subtrees s^i_{T_2} that need to replace the s^i_{T_1} in T_1, the distance is defined in terms of the probability of selecting the subtrees s^i_{T_1} in T_1 and the probability of creating (or selecting) the subtrees s^i_{T_2} from P. Both functions, probSelecting(·) and probCreating(·), require knowledge of the selection probabilities used in the algorithm, but can be calculated in linear computational time, as proven in [19].

² The subtree crossover distance that we consider in this paper is a probability and thus it is clearly not a metric. Furthermore, it is not just a function of two trees, but also of the population they belong to, and in general it does not respect the properties of metrics (like, for instance, the triangle inequality). Thus, the term pseudo-distance (in the sense that it indicates how far apart the two items are) would be more appropriate than the term distance. In some sense, we could say that our measure is more like a similarity/dissimilarity measure than a proper (Euclidean) distance metric: it conveys information about how likely it is to make two trees equal, which largely depends on their similarity. Nevertheless, we use the term distance for the sake of brevity.

5.1 Experimental Results

The goal of this section is to show the suitability of the new definition of SCD for monitoring various properties of the GP search process. In particular, it is shown below how this distance can be used to calculate the fitness distance correlation (fdc) inside the population during the search process, how fitness sharing can be implemented using the SCD, and how the SCD can be used to measure the genotypic diversity of populations.

Fitness Distance Correlation. Given that no bound has ever been proven between subtree crossover and structural distance, large samples of individuals (and not just the individuals composing a normal population) have been
used in Section 3 to calculate the fdc. On the other hand, the study of the trend of the fdc in the population during the evolution would be very interesting, since this study would be more dynamic than studying the fdc once and for all on a single large sample of individuals. In fact, this investigation would allow us to study how the fdc gets modified during the evolution, and this information could allow us to draw some conclusions on the dynamics of the GP search process. In particular, if the fdc value decreases during the evolution and tends towards −1, the population should be converging towards the global optimum (individuals are approaching the global optimum as fitness is improving). On the other hand, if the fdc value increases during the evolution, or it remains static at some initial positive level or at zero, this probably means that the population is converging towards a local optimum (fitness is improving, but the distance to the global optimum is not decreasing). The following experiments have been done to confirm this hypothesis and to test the suitability of SCD to calculate the fdc.

Syntactic Trees. In the syntactic trees problem, as used in [18], trees are represented using the set of functions F = {N}, where N is a binary operator (N stands for "Non-terminal"), and the set of terminal symbols T = {L} (L stands for "Leaf"). No content is associated with the nodes, and fitness is simply equal to the structural distance to a fixed global optimum. The global optimum of an instance is generated using a random tree growing algorithm described in [18]. Figure 5(a) shows the tree chosen as optimum for the experiments in Figure 6, and Figure 5(b) shows the tree chosen as optimum for the experiments in Figure 7.

[Fig. 5. (a) The tree used as optimum for the experiments in Figure 6. (b) The tree used as optimum for the experiments in Figure 7.]

These experiments have been performed using the following set of parameters: generational GP, population size of 30 individuals, standard subtree crossover as the only genetic operator, tournament selection of size 5, ramped half-and-half initialisation, maximum depth of individuals for the initialisation phase equal to 4, maximum depth of individuals for crossover equal to 8. All the runs have
been stopped at generation 100.

[Fig. 6. Syntactic Trees Problem. Average values (a) and average values with their standard deviations (b) of average fitness, best fitness and fdc in the population against generations over 50 independent GP runs. In all these runs the optimum has been found before generation 100. The tree used as optimum in these experiments was the tree in Figure 5(a).]

Figure 6 reports the average values (with their standard deviations in Figure 6(b)) of the best fitness, the average fitness and the fdc (calculated using SCD) in the population (against generations) over 50 independent GP runs in which the global optimum has been found before generation 100 (successful runs). Producing 50 successful runs has been easy, probably because of the very simple shape of the tree that we have used as optimum (shown in Figure 5(a)). The method that has been used to collect 50 successful runs was simply to execute a sequence of GP runs until 50 successful ones were found. It has been sufficient to execute 52 runs to get 50 successful ones. Figure 7 reports the same information, but this time for 50 unsuccessful runs. Collecting 50 unsuccessful runs has also been easy, probably because of the particular shape of the tree that has been used as optimum, shown in Figure 5(b) (over 61 runs, 50 were unsuccessful).

These figures show that, in case of success, the fdc decreases until the global optimum is found and then remains negative until the end of the run. In case of unsuccessful runs, the fdc value always stays around zero, independently of the fact that fitness is slightly improving. The interpretation is that, in this last case, the evolutionary process is leading the population towards a local optimum, which probably has a rather large crossover distance from the global one. For successful runs, the fact that fdc is negative indicates that evolution is leading the population towards the global optimum.

Trap Functions. Unimodal trap functions are used here. They are defined like multimodal trap functions, but they have only one global optimum. They can be considered a particular case of multimodal W trap functions where B2 = B3
Fig. 7. Syntactic Trees Problem. Average values (a) and average values with their standard deviations (b) of average fitness, best fitness and fdc in the population against generations over 50 independent GP runs. In all these runs the optimum has not been found before generation 100. The tree used as optimum in these experiments was the tree in Figure 5(b).

Trap Functions. Unimodal trap functions are used here. They are defined like the multimodal trap functions, but they have only one global optimum; they can be considered a particular case of the multimodal W trap functions in which B2 = B3 and R1 = R2. Figure 8(a) shows the tree chosen as optimum for the experiments in Figure 9, and Figure 8(b) shows the tree chosen as optimum for the experiments in Figure 10. The parameters used in these experiments are as follows: generational GP, population size of 100 individuals, standard subtree crossover as the sole genetic operator, tournament selection of size 10, ramped half-and-half initialisation, maximum depth of individuals for the initialisation phase equal to 6, and maximum depth of individuals for crossover equal to 10. Larger trees than in the case of the syntactic trees problem discussed in the previous section have been used here (arity-3 nodes and deeper trees have been considered), because the hypotheses presented in this section have to be tested under different conditions.

Figure 9 reports the average values (with their standard deviations in Figure 9(b)) of the best fitness, the average fitness and the fdc (calculated using the SCD) in the population, against generations, over 50 independent successful GP runs. The method used to collect 50 successful runs was the same as the one discussed above. In these experiments, the B2, B3, R1 and R2 trap function parameters have been set as follows: B2 = B3 = 0.9 and R1 = R2 = 0.1. With this setting, the fitness landscape is easy to search for GP [44], and thus it is easy to obtain successful runs. Figure 10 reports the same information as Figure 9, but for 50 independent unsuccessful GP runs. In this case, B2 and B3 were both set to 0.1 and R1 and R2 to 0.9, in order to make the fitness landscape difficult to search for GP [44]. The method used to collect 50 unsuccessful runs was the same as the one discussed in the previous section.
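For reference, the following is a sketch of the usual unimodal trap function over a distance d normalised to [0, 1]. The parameterisation actually used in the paper is obtained from the multimodal W traps by setting B2 = B3 and R1 = R2 and is defined in an earlier section, so this code is only illustrative of the easy (b = 0.9, r = 0.1) and hard (b = 0.1, r = 0.9) settings mentioned above; the names `unimodal_trap`, `b` and `r` are assumptions of this sketch.

```python
def unimodal_trap(d, b, r):
    """Textbook unimodal trap over a normalised distance d in [0, 1].

    Fitness is 1 at the global optimum (d = 0), falls to 0 at d = b, and
    then rises again to r at d = 1 (the deceptive attractor). This is an
    illustrative sketch; the paper's own definition (via the multimodal W
    traps) should be taken as authoritative.
    """
    if d <= b:
        return 1.0 - d / b              # basin of the global optimum
    return r * (d - b) / (1.0 - b)      # deceptive slope towards d = 1
```

With b = 0.9 and r = 0.1 most of the landscape slopes towards the global optimum and the deceptive peak is low, which matches the easy setting; with b = 0.1 and r = 0.9 the situation is reversed, which matches the hard one.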

Fig. 8. (a) The tree used as optimum for the experiments in Figure 9. (b) The tree used as optimum for the experiments in Figure 10.

Fig. 9. Trap Functions. Average values (a) and average values with their standard deviations (b) of average fitness, best fitness and fdc in the population against generations over 50 independent GP runs. In all these runs the optimum has been found before generation 100. The tree used as optimum in these experiments was the tree in Figure 8(a).

Fig. 10. Trap Functions. Average values (a) and average values with their standard deviations (b) of average fitness, best fitness and fdc in the population against generations over 50 independent GP runs. In all these runs the optimum has not been found before generation 100. The tree used as optimum in these experiments was the tree in Figure 8(b).

In Figure 10(b), the scale on the ordinate axis has been restricted in order to enlarge the graph and make it clearer and more readable. These figures show that for successful runs the fdc decreases until the global optimum is found and then remains negative until the end of the run, while in case of failure the fdc is always positive. Here the phenomenon is even more marked than in the case of the syntactic trees. In fact, for successful runs the fdc rapidly stabilises at approximately -0.6, while for unsuccessful runs it always remains approximately equal to 0.8. Once again, the conclusion is that the value of the fdc in the population (calculated using the SCD) is a good indicator of the direction in which the search process is leading the population: negative values of the fdc mean that the search is moving towards the global optimum, while positive values mean that the search is moving towards local ones.

Fitness Sharing. In the previous section, it has been shown that the SCD can appropriately be used to dynamically calculate the population fdc during the evolution. However, as discussed previously, the fdc is not a predictive measure (i.e. the global optima must be known in order to calculate it), which makes it almost unusable in practice. Besides being a measure of diversity, can the SCD be useful to practitioners? In this section, we discuss fitness sharing,
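As an illustration of the direction this section takes, the following is a minimal sketch of the classical fitness sharing scheme, assuming the standard triangular sharing function and a maximisation setting, with an SCD-like `distance` callable, a niche radius `sigma` and an exponent `alpha`; all of these names and choices are assumptions of the sketch, not the paper's exact fitness sharing system.

```python
def shared_fitnesses(population, fitnesses, distance, sigma, alpha=1.0):
    """Classical fitness sharing: each raw fitness is divided by the
    individual's niche count.

    Assumes fitness is maximised and that `distance` behaves like the SCD
    pseudo-distance; `sigma` (niche radius) and `alpha` are parameters the
    practitioner must choose. Illustrative sketch only.
    """
    shared = []
    for i, ind_i in enumerate(population):
        niche_count = 0.0
        for ind_j in population:
            d = distance(ind_i, ind_j)
            if d < sigma:
                niche_count += 1.0 - (d / sigma) ** alpha
        # niche_count >= 1 because distance(ind_i, ind_i) == 0
        shared.append(fitnesses[i] / niche_count)
    return shared
```

For the minimisation problems used in the experiments above, the raw fitness would first have to be transformed (e.g. inverted) before being shared; the point of the sketch is only that an operator-based measure such as the SCD can directly play the role of `distance`.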
