
Genetic Algorithms and permutation problems: a comparison of recombination operators for neural net structure specification

Peter J.B. Hancock
Centre for Cognitive and Computational Neuroscience
Departments of Psychology and Computing Science
University of Stirling, Scotland, FK9 4LA

Abstract

The specification of neural net architectures by genetic algorithm is thought to be hampered by difficulties with crossover. This is the "permutation" or "competing conventions" problem: similar nets may have the hidden units defined in different orders, so that they have very dissimilar genetic strings, preventing successful recombination of building blocks. Previous empirical tests of a number of recombination operators using a simulated net-building task indicated the superiority of one that sorts hidden unit definitions by overlap prior to crossover. However, simple crossover also fared well, suggesting that the permutation problem is not serious in practice. This is supported by an observed reduction in performance when the permutation problem is removed. The GA is shown to be able to resolve the permutations, so that the advantages of an increase in the number of maxima outweigh the difficulties of recombination.

1 Introduction

It is well established that the performance of Backprop-trained neural nets is strongly dependent on their internal structure, one reason being over-fitting of the training data if there are too many weights. It is quite possible to halve the error rate on a test set by appropriately pruning connections (Hancock, 1992b). Unfortunately, there are few established guidelines for deciding a priori which connections are important for a given task. Genetic algorithms offer one possible method of exploring the space of connectivities.

Miller et al (1989) suggest a classification of methods for coding nets on a genetic string. Strong codings specify each connection individually, while weak codings are more like growth rules, that may specify many units and connections simultaneously, perhaps stochastically. This paper is concerned with the precise definition of relatively small nets, so strong coding, also known as direct coding, is appropriate. The simplest such coding uses one bit per connection.

Miller et al (1989) use a complete connection matrix, allowing any unit to connect to any other. While this achieved their aim of freeing the GA of their preconceptions about likely patterns of connectivity, it does allow nets with recurrent connections to be specified, which causes problems for a training algorithm such as Backprop. In this work a more limited model is considered, that of a net with a single hidden layer, and forward connections only between adjacent layers. The genetic string then consists of a simple concatenation of hidden unit definitions, each specified by one bit for each input and output unit.

With such a net specification there is a potentially severe problem, known variously as the competing conventions, permutation, isomorphism and the structural/functional mapping problem. With a purely feed-forward net, there is no significance to the order of the hidden units. This means that the genetic coding is redundant, in the sense that many different strings will code for the same net. There will be n! possible codings of a net with n hidden units. If crossover attempts to combine two strings that have a different order, the result is liable to contain two copies of some units, and none of others. Figure 1 illustrates the potential problem with a net that has a 2-dimensional input, such as an image, and hidden units that have localised receptive fields (RFs). Two good parents may combine to give two poor children. The same potential problem affects efforts to train net weights by GA. Note, however, that this paper is concerned only with the specification of the connectivity of a net, and not the weights associated with the connections. Once the specified net has been built, weights are assumed to be initialised randomly prior to training with an algorithm such as Backprop.

[Figure 1 diagram: two parent nets whose hidden units cover the same four localised receptive fields, ordered 1 2 3 4 on parent 1 and A B C D on parent 2. Parent 1: 1234; Parent 2: ABCD; Child 1: AB34; Child 2: 12CD.]

Figure 1: Recombination of nets with localised receptive fields. The parent nets each have 4 similar hidden units, but defined in different orders on the genetic string, so unit 2 on parent 1 is most like unit C on parent 2. Here, there is a single crossover point, between the second and third unit definitions. The resulting nets each have two copies of similar hidden units, and do not cover the input space.

While this problem is widely recognised (e.g. Belew et al (1990)), there have been few attempts to solve it. Indeed, a number of workers have simply removed crossover from the algorithm, e.g. (Bornholdt & Graudenz, 1992; de Garis, 1990; Nolfi et al., 1990), which seems drastic given the centrality of recombination to the GA model. Whitley et al (1991) report successful results for training the weights of nets by GA, using what they term a genetic hill-climber, which is essentially mutation driven. This is indicated by a decrease in the number of evaluations required as the population size is reduced to one.
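To make the failure mode of Figure 1 concrete, the following minimal sketch (an illustration, not the paper's code; the unit encodings and crossover point are chosen to match the figure) recombines two strings that encode the same four hidden units in different orders, using a single crossover point between unit definitions:

    # A minimal sketch of the Figure 1 failure mode. Each hidden unit
    # definition is a tuple of input-connection bits (its receptive field).
    # Both parents encode the same four localised RFs, in different orders.
    RFS = [
        (1, 1, 0, 0, 0, 0, 0, 0),
        (0, 0, 1, 1, 0, 0, 0, 0),
        (0, 0, 0, 0, 1, 1, 0, 0),
        (0, 0, 0, 0, 0, 0, 1, 1),
    ]
    parent1 = [RFS[0], RFS[1], RFS[2], RFS[3]]   # units 1 2 3 4
    parent2 = [RFS[0], RFS[3], RFS[1], RFS[2]]   # units A B C D: a permutation

    def crossover(p1, p2, cut):
        """Single crossover point between unit definitions, as in Figure 1."""
        return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

    def covered_inputs(net):
        """Set of inputs reached by at least one hidden unit."""
        return {i for unit in net for i, bit in enumerate(unit) if bit}

    child1, child2 = crossover(parent1, parent2, cut=2)
    for name, net in [("child1", child1), ("child2", child2)]:
        dups = len(net) - len(set(net))
        print(name, "duplicate units:", dups,
              "inputs covered:", len(covered_inputs(net)), "of", len(RFS[0]))
    # Both parents cover all 8 inputs with no duplicated units; each child
    # contains a duplicate and leaves part of the input space uncovered.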

Recently, Radcliffe (1993) has suggested a matching recombination operator that attempts to overcome the permutation problem. Preliminary tests on this operator, and an extension proposed by the author, indicated that both tended to perform less well than a simple form of crossover (Hancock, 1992c). This paper explores further the causes and implications of this observation.

2 Addressing the permutation problem

The permutation problem stems from the possibility that units which are equivalent in the net are defined in different places on two parent strings. Ideally, we should like a method of identifying the equivalent units prior to crossover. Montana and Davis (1989) suggested such a method for use in the context of training net weights by GA. They matched hidden units by their responses to a number of trial inputs applied to the net.

Radcliffe (1993) has suggested a method that applies to net structure definition. This treats net specifications as a multiset, where the elements are the possible hidden unit connectivities. The hidden layer is a multiset, because it is possible to have more than one copy of a unit with given connectivity (in the limit, all the units might be identical). Radcliffe suggests searching through the units defined in each parent to match those which are identical. Such units are transmitted to the child string with a high probability. A limitation of this algorithm is that it only matches units which are identical: if they differ in one connection they are treated just as if they differ in every connection. An extension was therefore proposed, which matches units on the basis of the number of connections they have in common (Hancock, 1992a; Hancock, 1992c).

Since the unit definitions take the form of binary strings, Hamming distance might seem an appropriate measure of distance. However, the aim of the matching process is to identify units that might play a similar role in the final trained net. Suppose we are specifying the connections to hidden units from 8 inputs. It seems unlikely that a unit specified by 11000000 would be at all similar to one specified by 00000011, but, because of the zeros in common, Hamming distance shows them to be quite close, and relatively more so as the number of unconnected inputs increases. A more plausible measure of overlap is given by counting only connections in common, for units defined by binary strings k and l:

    overlap = (inputs in common) / (total inputs connected) = |k AND l| / |k OR l|    (1)

Hancock (1992c) compared Radcliffe's operator, and a modified version that matched unit definitions on overlap, with a simple crossover, which will be referred to as Uniform. In Uniform, multiple cross points are allowed, occurring at the boundary of unit definitions with (fairly high) probability p_u, and within unit definitions with (lower) probability p_b. The results, on a simulated net-building task (see section 3), showed that Uniform often performed better than either of the more complex operators. However, a modified version of Uniform, referred to as Sort, was best. The recombination phase of this is just like Uniform, but the strings are sorted prior to crossover. An overlap matrix is built and the two parent strings are then reordered such that units in equivalent positions are paired in order of decreasing similarity. This allows crossover to select between equivalent units in the two parents, as estimated by the overlap measure.
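A short sketch of how these pieces might fit together (an interpretation of the description above, not the paper's code; the function names and the greedy reading of the overlap-matrix matching are assumptions):

    import random

    def overlap(k, l):
        """Equation 1: connections in common / connections in either unit."""
        common = sum(a & b for a, b in zip(k, l))
        either = sum(a | b for a, b in zip(k, l))
        return common / either if either else 0.0

    def sort_pair(parent1, parent2):
        """Reorder parent2 so units pair with parent1's in order of
        decreasing overlap: a greedy reading of the Sort operator."""
        scores = sorted(((overlap(u, v), i, j)
                         for i, u in enumerate(parent1)
                         for j, v in enumerate(parent2)), reverse=True)
        used_i, used_j, pairing = set(), set(), {}
        for s, i, j in scores:          # best-matching pairs claimed first
            if i not in used_i and j not in used_j:
                pairing[i] = j
                used_i.add(i); used_j.add(j)
        return [parent2[pairing[i]] for i in range(len(parent1))]

    def uniform_crossover(p1, p2, pu=0.5):
        """Multiple cross points at unit-definition boundaries, each with
        probability pu. pu=0.5 picks each unit from either parent at random;
        pu=1.0 alternates parents every unit. (Crossover within unit
        definitions, probability pb, is omitted for brevity.)"""
        c1, c2 = [], []
        swapped = False
        for u, v in zip(p1, p2):
            if random.random() < pu:    # a cross point at this boundary
                swapped = not swapped
            c1.append(v if swapped else u)
            c2.append(u if swapped else v)
        return c1, c2

Under this reading, Sort simply applies uniform_crossover to (parent1, sort_pair(parent1, parent2)), so that equivalent positions carry matched units.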

Note that the relative complexity of the various operators is not an issue when it comes to specifying neural nets: the evaluation time of any practical net will far outweigh the generation time taken by any of these algorithms. The next sections present some more detailed comparisons of these operators. Since Radcliffe's operator and the author's modification of it again performed relatively badly, their results are omitted for clarity: they may be found in (Hancock, 1992a).

3 Testing the operators

Ideally, the various operators should be compared by the full process of specifying, training and testing nets, using a variety of data sets. Since the performance of both nets and GAs is inherently stochastic (because of the random starting weights for Backprop-trained nets), several runs are required for statistically significant results. At the time of writing, such runs are in progress, and are expected to be so for several more cpu-months. Some faster method of comparison is clearly desirable.

The method used here is a simulated net-building task. Suppose we know the ideal connectivity for a given task. Then we can set the GA the problem of matching that design, with an evaluation given by some measure of the distance from the target net. As with a real net, the units may be defined in any order on the genetic string. The evaluation function again builds an overlap matrix, this time between test and target unit definitions. In the simplest case the final evaluation is simply a sum of the individual unit match scores, minus an arbitrary penalty of one per unit defined in excess of the number in the target net. No specific penalty is required for having too few units, since such nets will be penalised by being unable to match all the target units. This match may be evaluated in a fraction of a second, allowing detailed comparisons between the algorithms to be made. The method also allows the operators to be compared on different types of net. It might be, for instance, that one operator fares best if the receptive fields of the hidden units do not overlap, but another is better when they do, or when there are multiple copies of the same hidden unit type.

For the purposes of this paper, the simulated evaluation has another advantage over the real thing: it allows the effects of the permutation problem to be assessed. If the unit matching procedure is removed from the evaluation algorithm, the task is reduced to a simple bit matching of a target string. Results are shown below for Uniform, labelled NP.

There are a number of potentially significant differences between this method and the evaluation of real nets.

1. The connections of a real net typically differ in importance, whereas the overlap measure treats all equally.

2. There are likely to be local minima with real net designs.

3. It is likely that a real net design will show a significant degree of epistasis, i.e. an interdependence between the various units and connections. The value of one connection may only become realised in the presence of one or more others. Note that, at the string level, the simulated problem already is epistatic, because of the permutation effect: the value of a given bit depends on which unit it ends up being part of.

4. Linked to this is the possibility of deception: one configuration for a particular unit may be good on its own, but a different configuration much more so only when combined with another particular unit type. For example, one large receptive field might be improved on by two smaller ones, but be better than either small one on its own.

5. The evaluation of real nets will be noisy.

All of these factors may be added in to the evaluation procedure, albeit not with the subtlety that is likely to be present in a real net.

The initial evaluation algorithm used the same measure of overlap to compare target and test unit definitions as the Sort operator uses for its matching. It was felt that this might give it undue advantage, since it is specifically attempting to propagate the same similarities that would then be used to evaluate the result. Two other overlap measures were therefore tried. The first simply replaced the measure from equation 1 with Hamming distance. The same penalty of 1 per unit defined in excess of the target number was applied. The second matched each pair of target and test units purely on the basis of the fraction of the target connections present in the test unit. A perfect match for any target unit would therefore be obtained by switching on all the connections in the test unit. However, the test string was then penalised for every connection in excess of the total in the target string. If this penalty was too large, the GA tended to respond by reducing the number of units specified in the test string, since this is the quickest way to reduce the total number of connections. A similar effect might be produced with real nets if the cpu time penalty component of the evaluation function was too large. The penalty used in the tests reported here was 0.2(1 - target_total/test_total). This replaces the penalty for excess units used in the other two procedures. The maximum score in this case is 1.0, while in the other two it is equal to the number of target units.

The three evaluation procedures may be summarised as follows; in each case the summation is over the pairs of test and target unit definitions after they have been matched:

1. E1:  F = Sum_{i=1}^{n_target} overlap_i - (n_test - n_target)

2. E2:  F = Sum_{i=1}^{n_target} H_i - (n_test - n_target)

3. E3:  F = Sum_{i=1}^{n_target} |test_i AND target_i| / target_total - 0.2(1 - target_total/test_total)

where F is the performance, and the over-size penalty only applies if the test size is bigger than the target size.
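A sketch of how E1 might be computed (an interpretation of the description above, not the paper's code; it reuses the overlap function and the greedy matching idea from the earlier sketch):

    def evaluate_e1(test_units, target_units):
        """E1: sum of matched overlap scores, minus 1 per unit in excess of
        the target count. No penalty is needed for too few units: unmatched
        target units simply contribute nothing."""
        # Build the overlap matrix and match greedily, best pairs first.
        scores = sorted(((overlap(t, u), i, j)
                         for i, t in enumerate(target_units)
                         for j, u in enumerate(test_units)), reverse=True)
        used_t, used_u, total = set(), set(), 0.0
        for s, i, j in scores:
            if i not in used_t and j not in used_u:
                total += s
                used_t.add(i); used_u.add(j)
        excess = max(len(test_units) - len(target_units), 0)
        return total - excess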

Noise was introduced by addition of a Gaussian random variable to the final score for a string. The results below used a standard deviation of 5% of the maximum evaluation score, which is similar in size to the evaluation noise of Backprop-trained nets found in (Hancock & Smith, 1991).

Epistasis and a form of deception were introduced by giving the GA two target strings. The first was scored as usual. For the second, the target units were grouped, with the score for the group being given by the product of the scores of the constituent units. This is epistatic because the score for one unit depends upon the match of others within the same group. It can be made deceptive by increasing the value of the group so that it exceeds the combined score of the equivalent individual units in the first string. The building blocks given by the easy task of matching the units of the first string do not combine to give the better score obtainable by the harder job of matching the second string. Although not necessarily formally deceptive in the sense defined by Goldberg (1987), such an evaluation function may be expected to cause problems for the GA. In the experiments reported below, the units were gathered in four groups of three. Since each unit scores 1 if correct, the maximum score for a group, being the product of three unit scores, is also 1. The sum of the group scores was multiplied by 9, to give a maximum evaluation of 36, compared with 12 for matching the first string. The rather high bonus for finding the second string was set so as to ensure that it was in fact found (by all but one of the operators, see below).

There are two points of interest in comparing the recombination operators: are there differences in the frequency of finding the second string, and, having found it, are there differences in the rate of convergence on it? At lower bonus values, all the operators sometimes failed to find the second string. Despite averaging over 100 runs, no significant differences could be found between the operators in their ability to find the second string. The bonus was therefore increased to test convergence, since stochastic variations in the number of failures might cause differences in the average performance bigger than those produced by the differences in convergence.

3.1 Target nets

While a variety of target net designs have been used, results from just three are reported here. These are:

1. Net-1 A net with 10 hidden units and 30 inputs. Each hidden unit receives input from three adjacent inputs, such that RFs do not overlap. Target string length l = 300.

2. Net-2 A net with 10 hidden units and 18 inputs, arranged in a 6x3 matrix. The hidden unit RFs tile all the 2x2 squares of the input matrix and are therefore heavily overlapping. Target string length l = 180.

3. Net-3 A net with 12 hidden units and 10 inputs. The input connections of 10 hidden units were generated at random, with a connection probability of 0.3. Two of these units were then duplicated. This was the design used for the deceptive problem: a second target being produced in the same way, using a different random number seed. Target string length l = 120.

Note that some combinations of net and evaluation method ought to be quite straightforward, being soluble in a single pass by the simple algorithm of flipping each bit in turn and keeping the change if it gives an improvement. This requires just l evaluations. Such an algorithm will probably fail in the presence of noise or deception, but in the absence of these can solve problems using E2 or E3. E1, the overlap measure, may require more than one pass, because a target and test unit pair which do not overlap will have a score of zero, which is unaffected by changing any bits not set in the target.
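For reference, the single-pass bit-flipping algorithm mentioned in the note above might look like this (a minimal sketch; evaluate stands for any of E1-E3 applied to the decoded string):

    def one_pass_hillclimb(bits, evaluate):
        """Flip each bit in turn, keeping the change if it improves the
        evaluation: l evaluations for a string of length l."""
        bits = list(bits)
        best = evaluate(bits)
        for i in range(len(bits)):
            bits[i] ^= 1
            trial = evaluate(bits)
            if trial > best:
                best = trial
            else:
                bits[i] ^= 1          # revert the unhelpful flip
        return bits, best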

4 Results

There are three different operators to compare, with three target nets and three evaluation procedures. To each evaluation, noise and/or deception may be added. In addition to regular GA parameters such as population and generation size, selection scaling factor and mutation rate, each of the recombination operators has parameters such as crossover probability p_u. It is clearly not possible to report detailed results for all the possible combinations. A number of extensive runs were made, stepping through the key parameter values to get a feel for their interactions. The results were used to guide the parameter settings used in the runs that are reported. The main points will be summarised.

All the experiments used rank-based selection, with a geometric scaling factor. In the absence of noise, the selection pressure could be very high, with a scaling factor as low as 0.85, which gives over half the reproduction opportunities to the top five strings. This underlines the essential simplicity of the task. For all the experiments reported below, the scaling factor was set at 0.96, a value more like those used in earlier work with real nets, reported in (Hancock & Smith, 1991).

The mutation probability p_m has a marked effect on convergence rates, the optimal value being of the order of the inverse of the string length. Uniform and Sort are not sensitive to variations in p_b, the probability of crossover in between unit definition boundaries. It was set at 0.1. For Uniform, p_u, the probability of crossover at unit boundaries, should be 0.5, corresponding to picking each unit from either parent at random. For Sort, p_u was best set to 1.0, which implies picking from the two parents alternately. This odd finding provoked much checking of code, and was tested for many combinations of target net and evaluation procedure, with consistent results. It appears that maximal mixing of the two parents confers some real advantage, but quite how is unclear.
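The rank-based selection with a geometric scaling factor described above might be implemented as below (a sketch under the usual interpretation of geometric ranking; the paper does not spell out the scheme). With factor s, the string of rank i (0 = best) receives weight proportional to s^i, so at s = 0.85 the top five of 100 strings take roughly 1 - 0.85^5, about 56%, of the reproduction opportunities, matching the "over half" figure quoted above.

    import random

    def rank_select(population, fitnesses, s=0.96):
        """Rank-based selection with geometric scaling factor s: rank i
        (0 = fittest) gets weight s**i. Smaller s means higher pressure."""
        ranked = [x for _, x in sorted(zip(fitnesses, population),
                                       key=lambda pair: -pair[0])]
        weights = [s ** i for i in range(len(ranked))]
        return random.choices(ranked, weights=weights, k=1)[0]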

Figures 2 to 4 show the results for the main sequence of tests of the three target net designs, with the three evaluation procedures. All used a population size of 100, with a generation size of 10 (a generation gap of 0.1, in the terminology of DeJong (1975)) for the runs without noise, and 100 for those with. The results shown are for the best individual in the population, averaged over 20 runs from different starting populations. When noise was added to the evaluation, the results are shown for the true evaluation of the string, without the noise. Note that, in order to maximise distinguishability of the lines, the axes of the graphs are not uniform. Figure 2 is for Net-1 (with non-overlapping RFs); mutation rate p_m was set at 0.01. Figure 3 is for Net-2 (with overlapping RFs), with p_m 0.02. Figure 4 is for Net-3 (random connectivity), with deception. The third method of evaluation is not shown because it is not compatible with the deceptive problem. The evaluation rates the whole string at once, while the deception works at the level of individual units. Mutation rate for this problem was set at 0.01. The results agree with those in Hancock (1992c), with Sort out-performing Uniform.

[Figure 2 plots: six panels (E1, E1+noise, E2, E2+noise, E3, E3+noise), best evaluation against number of evaluations, for the Sort, Uniform and NP operators.]

Figure 2: Performance of the various recombination algorithms on the 10 hidden unit, non-overlapping receptive field problem Net-1.

[Figure 3 plots: six panels (E1, E1+noise, E2, E2+noise, E3, E3+noise), best evaluation against number of evaluations, for the Uniform, Sort and NP operators.]

Figure 3: Net-2: overlapping RF problem.

[Figure 4 plots: four panels (E1, E1+noise, E2, E2+noise), best evaluation against number of evaluations, for the Sort, Uniform and NP operators.]

Figure 4: Net-3: deceptive problem. The failure of NP is discussed in section 5.

Sometimes the difference is quite marked, for instance for Net-1 in the presence of noise, figure 2. Note that the gently asymptotic curves can hide quite significant differences in evaluations required to reach a given performance: often a factor of two, which might be several days of cpu time with real net evaluations!

What may be surprising is the effect of removing the permutation "problem": the results are quite consistently worse! The explanation for this is pointed to by the consistently poorer starting evaluation for NP in all the results. When a string is compared to the target in NP, it has to match each unit with whatever is in that position on the string. When the matching algorithm is added to the evaluation, there are n! ways for each string to be arranged to match the target. With 10 hidden units, there are about 3.6 x 10^6 extra maxima, while the search space remains constant, defined by the string length. On average, therefore, the initial population with a permutation scores significantly better than NP. The permutation problem is that the strings corresponding to these maxima are all in different orders, so a simple GA should have difficulty combining them.

These results suggest that the benefits outweigh the drawbacks, and in most cases Uniform is able to keep its advantage over NP. That there is indeed a permutation problem is confirmed by the consistently superior performance of Sort. This reaps the advantages of the permutations, but then seeks to reorder the unit definitions so that they may be recombined advantageously. It does not appear to be the case that the original evaluation procedure, E1, was unduly favourable to Sort: the results from the three evaluation procedures are broadly similar.

5 Solving the permutation problem

The most unexpected result here was that permutations are apparently more of a help than a hindrance. That permutations should give an initially better evaluation seems clear enough; the surprise is that Uniform is able to maintain its advantage over NP. It appears that Uniform is able to resolve the permutation problem in practice. An obvious question is how: does it solve the permutation and then get on with the target problem, or do both concurrently? It is possible to observe it in action, by counting how many times a given target string unit appears in each of the possible positions in the population of test strings. If it is in the same position in (nearly) every string, the GA has (effectively) solved that bit of the permutation problem. The test unit definition does not have to be fully correct, or the same in every string, merely closer in each case to the same target unit than to any other. Displaying the data for all the unit definitions gives an indecipherable graph, so figure 5 shows the first and last unit to be solved, averaged over five runs, using Net-1 and E1.

This and many other runs (not shown) indicate that solution of the permutations is fairly gradual, and certainly does not precede improvement on the target problem. Resolving the permutations appears to be a semi-sequential process, with each position becoming fixed independently, apart from the last two which must go together. As positions become fixed, the size of the permutation problem is rapidly reduced. However, the results do not suggest that the process of solving it accelerates. Figure 5 indicates that the final pair of positions becomes fixed more gradually than the first one. This is probably because there is more effective competition between the remaining permuted strings. What evidently happens is that the whole population gradually gets fixed in the order of the best few individuals.

The average initial score is about 3 out of 10. If it is supposed that this results from having got three units correct, and seven with no score (obviously not the case), then it is possible to calculate the probability of combining two random individuals and producing an offspring with four or more units correct. It is in the order of 0.2, quite high enough for the GA to make progress. Monte Carlo simulations would be required to estimate the probabilities for the real problem. However, it is evident that the permutation problem is not as severe as had been thought. It is not necessary to solve it all at once: provided there is a reasonable chance of bringing good alleles together, the GA will do the rest.
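That back-of-envelope probability can be checked by simulation. The sketch below (an illustration, not from the paper) assumes the simplified model just described: each parent has three correct units, each matching a distinct random target, placed at random positions among ten, and crossover picks each position from either parent at random (Uniform with p_u = 0.5):

    import random

    def offspring_correct_count(n_units=10, n_correct=3):
        """One trial of the simplified model: count the distinct targets
        matched by a child of two random three-correct parents."""
        def random_parent():
            slots = [None] * n_units                  # None = scores nothing
            positions = random.sample(range(n_units), n_correct)
            targets = random.sample(range(n_units), n_correct)
            for pos, tgt in zip(positions, targets):
                slots[pos] = tgt
            return slots
        a, b = random_parent(), random_parent()
        child = [random.choice(pair) for pair in zip(a, b)]
        return len({t for t in child if t is not None})

    trials = 100_000
    hits = sum(offspring_correct_count() >= 4 for _ in range(trials))
    print(hits / trials)   # a few tenths under these assumptions, of the
                           # order of the 0.2 quoted above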

[Figure 5 plot: evaluations against the number of strings agreeing on a unit's position, for the first and last units to be resolved.]

Figure 5: Uniform operator solving the permutation problem. In addition to performance of the best string, the number of strings that have the same unit definition in the same location on the string are shown, for the first and last units to be resolved.

A sceptic might suppose that crossover is not contributing to the solution, and that the permutation problem is overcome by a mixture of selection and mutation alone, i.e. the system is acting as a "genetic hill-climber". This is easily checked: figure 6 shows Uniform, using Net-1 and E1, with and without crossover enabled, at three mutation rates. Mutation alone evidently can solve the problem, but even with this very simple problem the addition of crossover causes a marked improvement.

Resolving the permutations is aided by high selection pressure: by increasing the dominance of the top-ranked string, it is better able to enforce its order on the population. It was therefore thought that deceptive problems would pose a challenge, since too high a rate of convergence would appear to reduce the chances of finding the deceptive solution. As figure 4 shows, NP was actually worse than Uniform, failing to solve the deceptive problem in three out of four cases. The reason is the same as before: when many permutations can be tried there is a better chance of finding a combination of units that scores well on the harder task. If the value of the second string is increased sufficiently, NP will also find it every time. NP is successful when using Hamming distance, E2, in the presence of evaluation noise. The noise appears to inhibit convergence long enough to allow the better strings to emerge.

[Figure 6 plot: performance curves for Uniform with crossover disabled and enabled (p_x = 0 and p_x = 0.5), each at three mutation rates.]

Figure 6: Uniform on Net-1 with E1, with and without crossover enabled.

The situation can be changed by increasing the population size, and reducing selection pressure. Figure 7 shows the performance of Uniform, with and without a permutation problem, on a simple net matching task, with 12 hidden units. With low selection pressure, a scaling factor of 0.995, NP is able to overtake Uniform, because the latter is unable to resolve the permutations fast enough. If the selection pressure is increased, Uniform does better.

6 Conclusions

The initial aim of this work was to compare Radcliffe's proposed method for overcoming the permutation problem with a proposed extension. A simple crossover operator was intended as a baseline. This paper has explored the finding that the simple crossover often worked better than the more sophisticated recombination algorithms. It appears that, in practice, the permutation or competing conventions problem is not as severe as had been supposed. With the population size and selection pressures used here, a GA is quite capable of resolving the permutations, even with deceptive problems. The increased number of ways of solving the problem outweighs the difficulties of bringing the building blocks together. That the GA is not working purely by mutation and selection was demonstrated by showing that performance improves when crossover is enabled. Resolution of the permutation problem is assisted by sorting the strings appropriately before crossover. On the evidence presented here, Sort offers a useful improvement over simple crossover.

The obvious question is the extent to which these findings apply to real nets, as opposed to the simulated problem used here.

[Figure 7 plot: performance curves for Uniform and NP at selection pressures sp = 0.99 and sp = 0.995.]

Figure 7: Results for Uniform, with and without permutation problem, on a net with 12 hidden units, using E1 and two levels of selection pressure. Population size was 1000.

There are two aspects to this: is the proposed measure of overlap useful for identifying similar hidden units, and will it be as easy to resolve the permutations with a real, noisy net to evaluate? The Sort operator is currently being tested on real nets. The extent of the permutation problem may be assessed by comparing the performance of GAs with and without crossover enabled. Menczer and Parisi (1990) report that adding crossover at a probability of 0.25 improves a GA used for optimising the weights of a net: further testing seems appropriate.

Acknowledgements

I thank Nick Radcliffe and Leslie Smith for helpful discussions, and the referees for useful comments. This work was partly funded by grant no GR/F97393 from the UK Science and Engineering Research Council Image Interpretation Initiative.

References

Belew, R.K., McInerney, J., & Schraudolph, N.N. 1990. Evolving networks: using the genetic algorithm with connectionist learning. Tech. rept. CS90-174. UCSD.

Bornholdt, S., & Graudenz, D. 1992. General asymmetric neural networks and structure design by genetic algorithms. Neural Networks, 5, 327-334.

de Garis, H. 1990. Genetic Programming: Modular neural evolution for Darwin machines. In: Proceedings of IJCNN Washington Jan 1990.

DeJong, K.A. 1975. An analysis of the behavior of a class of genetic adaptive systems. Ph.D. thesis, University of Michigan, Dissertation Abstracts International 36(1), 514B.

Goldberg, D.E. 1987. Simple genetic algorithms and the minimal deceptive problem. Pages 74-88 of: Davis, L. (ed), Genetic Algorithms and Simulated Annealing. Pitman, London.

Hancock, P.J.B. 1992a. Coding strategies for genetic algorithms and neural nets. Ph.D. thesis, Department of Computing Science and Mathematics, University of Stirling.

Hancock, P.J.B. 1992b. Pruning neural nets by genetic algorithm. Pages 991-994 of: Aleksander, I., & Taylor, J.G. (eds), Proceedings of the International Conference on Artificial Neural Networks, Brighton. Elsevier.

Hancock, P.J.B. 1992c. Recombination operators for the design of neural nets by genetic algorithm. Pages 441-450 of: Männer, R., & Manderick, B. (eds), Parallel Problem Solving from Nature 2. Elsevier, North Holland.

Hancock, P.J.B., & Smith, L.S. 1991. GANNET: Genetic design of a neural net for face recognition. Pages 292-296 of: Schwefel, H-P., & Männer, R. (eds), Parallel Problem Solving from Nature. Lecture Notes in Computer Science 496, Springer Verlag.

Menczer, F., & Parisi, D. 1990. 'Sexual' reproduction in neural networks. Tech. rept. PCIA-90-6. C.N.R. Rome.

Miller, G.F., Todd, P.M., & Hegde, S.U. 1989. Designing neural networks using Genetic Algorithms. Pages 379-384 of: Schaffer, J.D. (ed), Proceedings of the Third International Conference on Genetic Algorithms. Morgan Kaufmann.

Montana, D.J., & Davis, L. 1989. Training feedforward neural networks using Genetic Algorithms. Pages 762-767 of: Proceedings of the Eleventh IJCAI.

Nolfi, S., Parisi, D., Vallar, G., & Burani, C. 1990. Recall of sequences of items by a neural network. In: Touretzky, D., Hinton, G., & Sejnowski, T. (eds), Proceedings of the 1990 Connectionist Models Summer School. Morgan Kaufmann.

Radcliffe, N. 1993. Genetic set recombination and its application to neural network topology optimisation. Neural Computing and Applications, 1, 67-90.

Whitley, D., Dominic, S., & Das, R. 1991. Genetic reinforcement learning with multilayer neural networks. Pages 562-569 of: Belew, R.K., & Booker, L.B. (eds), Proceedings of the Fourth International Conference on Genetic Algorithms. Morgan Kaufmann.

DeJong, K.A. 1975. An analysis of the behavior of a class of genetic adaptive systems. Ph.D. thesis, University of Michigan, Dissertation Abstracts International 36(1), 514B. Goldberg, D.E. 1987. Simple genetic algorithms and the minimal deceptive problem. Pages 74{88 of: Davis, L. (ed), Genetic Algorithms and Simulated Annealing. Pitman, London. Hancock, P.J.B. 1992a. Coding strategies for genetic algorithms and neural nets. Ph.D. thesis, Department of Computing Science and Mathematics, University of Stirling. Hancock, P.J.B. 1992b. Pruning neural nets by genetic algorithm. Pages 991{994 of: Aleksander, I., & Taylor, J.G. (eds), Proceedings of the International Conference on Articial Neural Networks, Brighton. Elsevier. Hancock, P.J.B. 1992c. Recombination operators for the design of neural nets by genetic algorithm. Pages 441{45 of: M}anner, R., & Manderick, B. (eds), Parallel Problem Solving from Nature 2. Elsevier, North Holland. Hancock, P.J.B., & Smith, L.S. 1991. GANNET: Genetic design of a neural net for face recognition. Pages 292{296 of: Schwefel, H-P., & M}anner, R. (eds), Parallel problem solving from nature. Lecture notes in Computer Science 496, Springer Verlag. Menczer, F., & Parisi, D. 199. `Sexual' reproduction in neural networks. Tech. rept. PCIA- 9-6. C.N.R.Rome. Miller, G.F., Todd, P.M., & Hegde, S.U. 1989. Designing neural networks using Genetic Algorithms. Pages 379{384 of: Schaer, J.D. (ed), Proceedings of the third international conference on Genetic Algorithms. Morgan Kaufmann. Montana, D.J., & Davis, L. 1989. Training feedforward neural networks using Genetic Algorithms. Pages 762{767 of: Proceedings of the Eleventh IJCAI. Nol, S., Parisi, D., Vallar, G., & Burani, C. 199. Recall of sequences of items by a neural network. In: Touretzky, D., Hinton, G., & Sejnowski, T. (eds), Proceedings of the 199 Connectionist models summer school. Morgan Kaufmann. Radclie, N. 1993. Genetic set recombination and its application to neural network topology optimisation. Neural computing and applications, 1, 67{9. Whitley, D., Dominic, S., & Das, R. 1991. Genetic reinforcement learning with multilayer neural networks. Pages 562{569 of: Belew, R.K., & Booker, LB. (eds), Proceedings of the fourth international conference on Genetic Algorithms. Morgan Kaufmann. 15