Hierarchical Evolution of Neural Networks

David E. Moriarty and Risto Miikkulainen
Department of Computer Sciences
The University of Texas at Austin
Austin, TX 78712

Technical Report AI, January 1996

Abstract

In most applications of neuro-evolution, each individual in the population represents a complete neural network. Recent work on the SANE system, however, has demonstrated that evolving individual neurons often produces a more efficient genetic search. This paper explores the merits of neuro-evolution both at the neuron level and at the network level. While SANE can solve easy tasks in just a few generations, in tasks that require high precision its progress often stalls and is exceeded by a standard, network-level evolution. In this paper, a new approach called Hierarchical SANE is presented that combines the advantages of both approaches by integrating two levels of evolution in a single framework. Hierarchical SANE couples the early explorative quality of SANE's neuron-level search with the late exploitative quality of a more standard network-level evolution. In a sophisticated robot arm manipulation task, Hierarchical SANE significantly outperformed both SANE and a standard, network-level neuro-evolution approach, suggesting that it can more efficiently solve a broad range of tasks.

1 Introduction

Artificial evolution has proven to be an effective method for forming neural networks in tasks where sparse reinforcement precludes normal supervised learning methods such as backpropagation. The evolutionary framework frees the implementor from generating training examples and provides a highly adaptive mechanism for dynamic environments. Recent work has shown evolved neurocontrollers to be effective in unstable, dynamic control tasks (Cliff et al. 1993; Moriarty and Miikkulainen 1996; Nolfi et al. 1994; Whitley et al. 1993; Yamauchi and Beer 1993). The bane of evolutionary methods, however, has been the large number of fitness evaluations that must be performed to achieve a high level of performance. Recently, Moriarty and Miikkulainen (1996) have developed a more efficient neuro-evolution system called SANE (Symbiotic, Adaptive Neuro-Evolution) with promising scale-up properties. SANE was designed as a method for efficiently forming effective decision policies in sequential decision tasks. Unlike most approaches to neuro-evolution, where each individual in the population is a complete neural network, SANE's individuals are single neurons (hidden neurons in a 3-layer network).

Evolution at the neuron level promotes population diversity and allows SANE to better evaluate subcomponents of the final solution. In the pole balancing benchmark, SANE compared favorably to temporal difference methods such as the Adaptive Heuristic Critic (Barto et al. 1983) and Q-learning (Watkins 1989), and to network-level neuro-evolution systems such as GENITOR (Whitley et al. 1993).

The purpose of this paper is to compare and contrast the merits of evolution at the neuron level and evolution at the network level. SANE's neuron-level approach is very adept at finding quick, effective solutions, but often has difficulty pinpointing the best solutions. The diverse neuron population leads to many different neuron combinations, resulting in a faster exploration of the neural network space. However, SANE does not effectively exploit the best areas of the network space, and thus often cannot fine-tune the best neural networks. SANE's search strategy is thus most effective in tasks where solutions are plentiful, such as pole balancing, or where solutions do not require precise optimization. In contrast, evolution at the network level explores the neural network space much more slowly. The population may prematurely converge, and it may take a very long time to find an area of the neural network space that contains good solutions. However, once such an area is located, the network-level evolution can converge to the best networks very efficiently. An analogy can be drawn to the famous competition between the tortoise (evolution at the network level) and the hare (SANE). Thus, an important research focus is how to combine the early search efficiency of SANE's neuron evolution with the fine-tuning of a more standard network evolution.

A hierarchical approach to neuro-evolution is presented where the two levels of evolution are integrated in a single framework. An outer-loop network-level evolution is incorporated on top of SANE's neuron population. Thus, two separate populations are maintained: a population of neurons and a population of network blueprints. The neuron population provides efficient evaluation of the genetic building blocks, while the population of network blueprints learns effective combinations of these building blocks. Initially, the population of blueprints is random, producing random neuron combinations much like those in SANE. As the blueprint population is evolved, the neuron combinations become focused on the best networks. Thus, the hierarchical approach combines the early efficient exploration of SANE with the late exploitation of the network-level approaches.

For clarity, the following terms are adopted for the remainder of the paper. The SANE neuro-evolution system described by Moriarty and Miikkulainen (1996) will be referred to as SANE1. The neuro-evolution system described in this paper, which extends SANE1, will be referred to as Hierarchical SANE. The terms network-level and standard neuro-evolution will denote the more common approach of evolving individuals that represent complete neural networks.

Experiments were conducted using a sophisticated robot arm manipulation task to compare SANE1, Hierarchical SANE, and the standard neuro-evolution approach. SANE1 found good solutions quickly, but was inconsistent at reaching the desired level of proficiency. The network-level approach learned much more slowly, but eventually surpassed SANE1's stalled neuron-level evolution and reached a solution in every simulation.
Hierarchical SANE matched SANE1's early search efficiency and found a solution in half as many generations as the network-level approach. The combined search efficiency and focusing qualities of the hierarchical approach suggest that Hierarchical SANE should be applicable to a wide range of reinforcement learning problems where standard approaches are ineffective.

The body of this paper is organized as follows. The next section describes the advantages of evolving at the neuron level and at the network level, followed by the description of the hierarchical approach in section 3. The experimental comparisons of SANE1, Hierarchical SANE, and the standard neuro-evolution approach are presented in section 4, and the final section summarizes our conclusions.
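Before the detailed description in sections 3 and 4, the two-population arrangement can be sketched in a few lines of Python. Everything in this sketch (the class names, the stand-in evaluation function, and the sizes) is illustrative only and not the original implementation; the actual parameter settings appear in section 4.2.

    import random

    class Neuron:
        """One hidden neuron: a bitwise chromosome plus a fitness value."""
        def __init__(self, n_bits=12 * 24):            # 12 connections x 24 bits
            self.bits = [random.randint(0, 1) for _ in range(n_bits)]
            self.fitness = 0.0

    class Blueprint:
        """One network blueprint: pointers into the neuron population."""
        def __init__(self, neurons, size=8):           # 8 hidden units per network
            self.pointers = [random.choice(neurons) for _ in range(size)]
            self.fitness = 0.0

    def evaluate_network(hidden_neurons):
        """Stand-in for the task evaluation; returns a score in [0, 1]."""
        return random.random()

    neurons = [Neuron() for _ in range(800)]
    blueprints = [Blueprint(neurons) for _ in range(100)]

    # One generation: every blueprint assembles a network from the neurons it
    # points to; the score is credited to the blueprint and, per section 3.1,
    # later also to the participating neurons.
    for bp in blueprints:
        bp.fitness = evaluate_network(bp.pointers)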

2 Neuron vs. Network Level Evolution

In most approaches to neuro-evolution, each individual represents a complete neural network that is evaluated independently of the other networks in the population (Belew et al. 1991; Koza and Rice 1991; Nolfi and Parisi 1992; Whitley et al. 1993). By treating each member as a separate, full solution, the genetic algorithm focuses the search towards a single dominant individual. Such concentration leads the population to converge on a single solution (Goldberg 1989). Once the population has converged, further movement is generally possible only through random mutation. Crossover operations become ineffective, since the same genetic building blocks are joined in the new offspring.

In contrast, in a neuron-level evolution individuals are only partial neural networks; complete networks are built by decoding several individuals' chromosomes. Individual neurons are evaluated by how well they perform when combined with other neurons in the population. Since a single neuron usually cannot perform the whole task alone, effective neural networks must contain neurons that perform different functions. By basing neuron fitness on the performance of the neural networks, SANE1 introduces evolutionary pressure to evolve several different types, or specializations[1], of neurons. Thus, unlike the network-level evolution, the population does not converge to a single individual, but instead contains a diverse collection of neurons that optimize different aspects of a neural network.

Diversity promotes quick exploration of the solution space. Whereas a genetic search in a converged population is directed by the mutation operator and progresses very slowly, a genetic search with a diverse population can continue to use its crossover operations to generate new structures and make larger traversals of the solution space in shorter periods of time. Such an advantage is important not only in early generations, but also when changes occur in the domain. Often in neuro-control tasks, controllers must be quickly revised to avoid the costly effects of an outdated control policy. A diverse population can more adeptly make the modifications necessary to compensate for domain changes.

In addition to maintaining diverse populations, evolution at the neuron level evaluates the genetic building blocks more accurately. In a network-level evolution, each neuron is evaluated only with the other neurons encoded on the same chromosome. With such a representation, a very good neuron may exist on a chromosome but subsequently be lost because the other neurons on the chromosome are poor. In a neuron-level evolution, neurons are continually recombined with many different neurons in the population, which produces a more accurate evaluation of fitness. Essentially, a neuron-level evolution takes advantage of the a priori knowledge that individual neurons constitute basic components of neural networks. A neuron-level evolution explicitly promotes genetic building blocks in the population that may be useful in building other networks. A network-level evolution does so only implicitly, along with various other sub- and superstructures (Goldberg 1989). In other words, by evolving at the neuron level, the genetic algorithm is no longer relied upon to identify neurons as important building blocks, since neurons are the object of evolution.
[1] We chose the term specialization rather than species since each neuron does not represent a full solution to the problem.
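As a concrete illustration of this evaluation scheme, the following sketch implements SANE1-style fitness: each neuron's fitness is the average score of the many randomly assembled networks it joins, so its rating does not depend on any one chromosome's other neurons. The function and callback names are assumptions, not SANE's actual code; the averaging rule is SANE1's original one, which Hierarchical SANE replaces in section 3.1.2.

    import random

    class Neuron:
        """Placeholder neuron carrying only a fitness value."""
        def __init__(self):
            self.fitness = 0.0

    def sane1_evaluate(neurons, score_network, networks_per_gen=100, subpop_size=8):
        """SANE1-style neuron evaluation: draw many random subpopulations,
        score each assembled network, and rate every neuron by the average
        score of the networks it participated in."""
        scores = {id(n): [] for n in neurons}
        for _ in range(networks_per_gen):
            team = random.sample(neurons, subpop_size)   # random combination
            s = score_network(team)                      # task-specific evaluation
            for n in team:
                scores[id(n)].append(s)
        for n in neurons:
            played = scores[id(n)]
            n.fitness = sum(played) / len(played) if played else 0.0

    # Usage with a dummy scorer: neurons are combined with many different
    # partners, so each accumulates a partner-independent fitness estimate.
    population = [Neuron() for _ in range(200)]
    sane1_evaluate(population, lambda team: random.random())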

The neuron evolution in SANE1 (Moriarty and Miikkulainen 1996) achieves each of these advantages, but also contains an Achilles' heel. In SANE1, networks are formed from randomly selected subpopulations of neurons, which is undesirable for two reasons. First, the top neurons may not be combined with neurons that work well together. Thus, a very good neuron may be lost because it was ineffectively combined during a generation. The second problem is that the quality of the networks varies greatly throughout evolution. In early generations this works to SANE1's advantage, since it is able to try many different types of networks to find the most effective neurons. In later generations, however, when the search should focus on the best networks, SANE1's inconsistent networks often stall the search and prevent the global optima from being located. In easy tasks such as pole balancing, SANE1 performs quite well, since solutions are plentiful and generally found in the first 10 generations (Moriarty and Miikkulainen 1996). Experiments presented in section 4, however, show that in significantly more difficult tasks that require high precision within the solution space, SANE1 is unable to build upon the best networks to improve the population in later generations. The converged network-level population, operating through mutation, eventually catches up to SANE1 and finds better solutions. This phenomenon motivated the development of a hierarchical approach, which preserves the quick explorative feature of SANE1 while incorporating the fine-tuning ability of a network-level search.

3 Hierarchical Evolution

Hierarchical SANE incorporates an outer-loop network-level evolution on top of SANE1's neuron evolution. Two populations are maintained and evolved: a population of neurons and a population of network blueprints. Each individual in the neuron population specifies a set of connections to be made within a neural network. Each individual in the network blueprint population specifies a set of neurons to include in a neural network. Together, the neuron evolution searches for effective partial networks, while the blueprint evolution searches for effective combinations of the partial networks.

3.1 The Neuron Population

The neuron evolution in Hierarchical SANE is very similar to the strategy introduced by Moriarty and Miikkulainen (1996), except for a few modifications. The neuron-level evolution is described in two parts. First, the basic strategy is presented, which includes the fitness calculation and the specifics of the genetic algorithm. Second, some performance improvements over the neuron evolution in SANE1 are described and discussed.

3.1.1 Core Strategy

Each individual in the neuron population represents a hidden neuron in a given type of architecture, such as a 3-layer feedforward network. Neurons are defined in bitwise chromosomes that encode a series of connection definitions, each consisting of an 8-bit label field and a 16-bit weight field. The value of the label determines where the connection is to be made. The neurons connect only to the input and the output layer. If the decimal value of the label, D, is greater than 127, then the connection is made to output unit D mod O, where O is the total number of output units. Similarly, if D is less than or equal to 127, then the connection is made to input unit D mod I, where I is the total number of input units. The weight field encodes a floating-point weight for the connection. Figure 1 shows how a neural network is formed from three sample hidden neuron definitions.

Figure 1: Forming a simple 8-input, 3-hidden, 5-output-unit neural network from three hidden neuron definitions. The chromosomes of the hidden neurons (label and weight fields) are shown on the left and the corresponding network on the right. In this example, each hidden neuron has 3 connections.

The basic steps in one generation of the neuron evolution are as follows (table 1):

1. Clear the fitness value of each neuron.
2. Select neurons from the population using a blueprint.
3. Create a neural network from the selected neurons.
4. Evaluate the network in the given task.
5. Repeat steps 2-4 for each individual in the blueprint population.
6. Assign each neuron the summed score of the best 5 networks in which it participated.
7. Perform crossover operations on the population based on the fitness value of each neuron.

Table 1: The basic steps in one generation of the neuron evolution.

During the evaluation stage, neuron subpopulations of a fixed size are selected and combined to form a neural network. The selection method depends on the blueprint population and will be described in section 3.2. Each neuron receives the summed fitness evaluations of the best five networks in which it participated. After the evaluation stage, the neuron population is ranked based on the fitness values. For each neuron in the top 25% of the population, a mate is selected randomly from among the top 25%. Each mating operation creates two offspring: a copy of one of the mates, and a child created through one-point crossover. The second child created by crossover is discarded. Copying one of the parents reduces the effect of adverse neuron mutation on the blueprint-level evolution, as will be further explained in section 3.2.2. The two offspring replace the worst-performing neurons (according to rank) in the population. Mutation at the rate of 0.1% per bit position is performed on the entire population as the last step in each generation. Such an aggressive, elitist breeding strategy is not normally used in evolutionary applications, since it leads to quick convergence of the population. A neuron evolution, however, performs quite well with such an aggressive selection strategy, since it contains strong evolutionary pressures against convergence.
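The decoding rule can be stated compactly in code. The sketch below (the helper name decode_neuron is assumed, and the mapping of the 16-bit weight field to a floating-point value is not specified in the paper, so the scaling into [-10.0, 10.0] is an assumption) builds the connection list for one hidden neuron:

    import random

    def decode_neuron(chromosome_bits, n_in, n_out, conns=12):
        """Decode a bitwise neuron chromosome into connection definitions.
        Each definition is an 8-bit label plus a 16-bit weight (section 3.1.1).
        The float mapping of the weight field is an assumption here."""
        connections = []
        for c in range(conns):
            base = c * 24
            label_bits = chromosome_bits[base:base + 8]
            weight_bits = chromosome_bits[base + 8:base + 24]
            D = int("".join(map(str, label_bits)), 2)        # 0..255
            w_raw = int("".join(map(str, weight_bits)), 2)   # 0..65535
            weight = (w_raw / 65535.0) * 20.0 - 10.0         # assumed scaling
            if D > 127:                                      # output connection
                connections.append(("output", D % n_out, weight))
            else:                                            # input connection
                connections.append(("input", D % n_in, weight))
        return connections

    # Example: one 12-connection neuron for a 6-input, 7-output network.
    bits = [random.randint(0, 1) for _ in range(12 * 24)]
    print(decode_neuron(bits, n_in=6, n_out=7)[:3])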

3.1.2 Performance Improvements

Some important modifications were made to the neuron evolution of Moriarty and Miikkulainen (1996) to make it a strong conduit to its blueprint-level counterpart. Previously, a neuron's fitness was determined by the average score of all the networks in which it participated. Under such a system, however, neurons that are crucial in the best networks, but ineffective in poor networks, receive lower fitness scores and are often selected against. For example, in the robot arm manipulation task, a neuron that specializes in small movements near the target would be effective in networks that position the arm close to the target, but useless in networks that never get near the target. Such neurons are very important to the population and should be propagated to future generations. Assigning fitness based on the performance of the best networks helps ensure that these neurons will not be selected against.

A further modification to the original fitness assignment is to sum the scores of the best networks, rather than take the average. A common problem occurs when a neuron "hitchhikes", that is, participates without significantly contributing, in a single network that performs well. Hitchhiking turns out to be particularly prevalent in the hierarchical approach, since selecting neurons based on the network blueprints biases neuron participation towards the top neurons, and many sub-par neurons are evaluated in only a single network. By taking averages, these seldom-participating neurons could receive a very high fitness evaluation from a single lucky network. Summing the scores removes the hitchhiking problem by creating a fitness bias towards neurons that participate in more networks. For example, a neuron that participates in five networks will almost always receive a higher fitness evaluation than a neuron that participates in a single network. In addition, summing fitness scores more closely integrates the two levels of evolution by preferring neurons that are popular components of the network-level evolution.

3.2 The Network Blueprint Population

The purpose of the blueprint population is twofold: to organize the neurons into workable groups and to assign more trials to the better-performing neurons. In SANE1, networks are formed by randomly combining subpopulations of neurons. In Hierarchical SANE, the network blueprints specify which neurons should be connected together. As effective neuron combinations are found, they are preserved in the blueprint chromosomes and propagated to future generations.

Maintaining network blueprints produces more accurate neuron evaluations and concentrates the search on the best neural networks. Since neurons are combined systematically based on past performance, they are more consistently combined with other neurons that perform well together. Additionally, better-performing neurons garner more pointers from the blueprint population and thus participate in a greater number of networks. Biasing neuron participation towards the historically better-performing neurons provides more accurate evaluations of the top neurons. The sacrifice, however, is that newer neurons may not receive enough trials to be accurately evaluated. In practice, allocating more trials to the top neurons produces a significant improvement over uniform neuron participation. The primary advantage of evolving network blueprints, however, is the exploitation of the best networks found during evolution.
SANE1 keeps no memory of the previous networks formed, and good neuron combinations found in one generation often never occur again in subsequent generations. Such behavior causes the quick, explorative search to stall, because it cannot exploit the best neuron combinations. Hierarchical SANE maintains the proficient collections of neurons in the blueprint chromosomes and ensures that the best networks are reconstructed. By evolving the blueprint population, the best neuron combinations are also recombined to form new, potentially better, collections of neurons. Hierarchical SANE thus provides a more exploitative search that can build upon the best networks found during evolution and focus the search in later generations.
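Returning to the fitness assignment of section 3.1.2, a small sketch shows why summing the best five scores suppresses hitchhikers under this blueprint-biased participation. The function name and the score-dictionary format are assumptions for illustration only:

    def assign_neuron_fitness(participation, top_k=5):
        """participation: dict neuron_id -> scores of the networks that
        neuron appeared in this generation. Fitness is the SUM of its best
        five network scores (section 3.1.2), not the average, so a neuron
        that does well in many networks outranks a one-network hitchhiker."""
        return {nid: sum(sorted(scores, reverse=True)[:top_k])
                for nid, scores in participation.items()}

    # A lucky hitchhiker in one 0.9 network loses to a neuron earning 0.8
    # in five networks: 0.9 < 4.0.
    print(assign_neuron_fitness({"hitchhiker": [0.9],
                                 "workhorse": [0.8] * 6}))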

Figure 2: An overview of the network blueprint population in relation to the neuron population. Each member of the neuron population specifies a series of connections (labels and weights) to be made within a neural network. Each member of the blueprint population specifies a series of pointers to specific neurons, which are used to build a neural network.

3.2.1 Core Strategy

Each individual in the blueprint population contains a series of neuron pointers. More specifically, a blueprint chromosome is an array of address pointers to neuron structures, one pointer for each hidden unit in the network. Figure 2 illustrates how the blueprint population is integrated with the neuron population. Initially, the chromosome pointers are randomly assigned to neurons in the neuron population. During the neuron evaluation stage, subpopulations of neurons are selected based on each blueprint's array of pointers. Thus, unlike SANE1, which forms networks by randomly selecting subpopulations of neurons, Hierarchical SANE forms networks by following the pointers in each blueprint's chromosome. Blueprints receive the fitness score of the neural network that they specify.

After each blueprint has been evaluated, the blueprint population is ranked, and crossover operations are applied to the top blueprints. Since the chromosomes are made up of address pointers instead of bits, crossover occurs only between pointers. The new offspring receive the same address pointers that the parent chromosomes contained. In other words, if a parent chromosome contains a pointer to a specific neuron, one of its offspring will point to that same neuron (barring mutation). The current genetic algorithm at the blueprint level is identical to the aggressive strategy used at the neuron level; however, the similarity is not essential, and a more standard genetic algorithm could be used. Empirically, the aggressive strategy at the blueprint level, coupled with the strong mutation strategy described in the next section, has outperformed many of the more standard genetic algorithms.
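The blueprint-level operators can be sketched as follows, with pointers represented as indices into the neuron population. The crossover is the pointer-level one-point operation just described; the mutation function previews the two-component strategy of section 3.2.2 below. The function names and the offspring_of helper (parent index to child indices) are assumptions:

    import random

    def blueprint_crossover(a, b):
        """One-point crossover over two pointer arrays (section 3.2.1).
        Offspring inherit the parents' neuron pointers unchanged."""
        cut = random.randrange(1, len(a))
        return a[:cut] + b[cut:]

    def mutate_offspring_blueprint(pointers, pop_size, offspring_of):
        """Two-component mutation, applied to offspring blueprints only
        (section 3.2.2): reassign one pointer to a random neuron, then with
        50% probability redirect each pointer at a breeding neuron to one
        of that neuron's offspring."""
        pointers = list(pointers)
        pointers[random.randrange(len(pointers))] = random.randrange(pop_size)
        return [random.choice(offspring_of[p])
                if p in offspring_of and random.random() < 0.5 else p
                for p in pointers]

    # Tiny usage: an 8-pointer blueprint in an 800-neuron population, where
    # breeding neuron 3 produced offspring 700 (a copy) and 701 (a crossover).
    child = blueprint_crossover([1, 2, 3, 4, 5, 6, 7, 8],
                                [9, 3, 11, 3, 13, 14, 15, 16])
    print(mutate_offspring_blueprint(child, 800, {3: [700, 701]}))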

3.2.2 Performance Improvements

To avoid convergence problems at the blueprint level, a two-component mutation strategy is employed. First, one pointer in each offspring blueprint is randomly reassigned to another member of the neuron population. This strategy promotes participation of neurons other than the top neurons in subsequent networks. Thus, a neuron that does not participate in any networks can acquire a pointer and participate in the next generation. Since the mutation occurs only in the blueprint offspring, the neuron pointers in the top blueprints are always preserved.

The second mutation component is a selective strategy designed to take advantage of the new structures created by the neuron evolution. Recall that a breeding neuron produces two offspring: a copy of itself and the result of a crossover operation with another breeding neuron. Each neuron offspring is thus similar to, and potentially better than, its parent neurons. The blueprint evolution can use this knowledge by occasionally replacing pointers to breeding neurons with pointers to offspring neurons. In the experiments described in this paper, pointers are switched from breeding neurons to one of their offspring with a 50% probability. Again, this mutation is performed only in the offspring blueprints, and the pointers in the top blueprints are preserved.

This selective mutation mechanism has two advantages. First, because pointers are reassigned to neuron offspring resulting from crossover, the blueprint evolution can explore new neuron structures. Second, because pointers are also reassigned to offspring that were formed by copying the parent, the blueprints become more resilient to adverse mutation in the neuron evolution. If pointers were not reassigned to copies, many blueprints would point to the exact same neuron, and any mutation to that neuron would affect every blueprint pointing to it. When pointers are occasionally reassigned to copies, however, such mutation is limited to only a few blueprints. The effect is similar to schema promotion in standard genetic algorithms: as the population evolves, highly fit schemata (i.e., neurons) become more prevalent in the population, and mutation to one copy of a schema does not affect the other copies in the population.

4 Evaluation: Manipulating a Robot Arm

Much of our research is currently directed at applications of neuro-evolution to robot control. Specifically, we are evolving neural networks to control a robot arm manipulator using visual input. Most neural network applications to this problem learn hand-eye coordination through supervised training methods such as backpropagation or conjugate gradient descent (Kawato 1990; Miller 1989; van der Smagt 1995; Werbos 1992). Supervised learning, however, requires training examples that demonstrate correct mappings from input to output. The current approaches for generating training examples for robot arm control are very limited and ineffective in uncertain or obstacle-filled domains. Thus, neuro-controllers that learn through supervised techniques cannot integrate hand-eye coordination with obstacle avoidance. Neuro-evolution, however, does not require input/output examples and can learn the intermediate joint rotations necessary for avoiding obstacles in uncertain environments. Neuro-evolution therefore provides a promising framework for the automatic development of robot arm controllers in realistic domains.
SANE1, Hierarchical SANE, and a standard neuro-evolution approach were evaluated experimentally in a robot arm control task that requires hand-eye coordination, but not obstacle avoidance. The purpose of these experiments is to provide a sophisticated, real-world benchmark with which to illustrate the advantages of the different levels of neuro-evolution. The particular robot arm simulator used in these experiments was the Simderella 2.0 package written by van der Smagt (1994).

Figure 3: The Simderella simulation of the OSCAR robot arm. The numbers indicate the three joints to be controlled. The goal of the controller is to position the hand within 10 cm of a randomly placed target object, shown in the right quadrant.

Simderella is a simulation of the OSCAR 6-degree-of-freedom anthropomorphic arm. van der Smagt (1995) has reported that controllers that perform well in Simderella exhibit very similar performance when applied to the actual physical robot, OSCAR. Figure 3 illustrates the Simderella robot. The goal of the network controller is to maneuver the arm to a position within ten centimeters of a randomly placed target object (a secondary network, not described in this paper, can learn the smaller, finite movements necessary to grasp the object). The controller moves the arm by specifying rotations of the first three joints at each time step. While OSCAR contains a total of six joints, van der Smagt (1995) has shown that for most industrial tasks, controlling the first three joints is sufficient. The arm contains a camera, located in the end effector (hand), that provides the x, y, and z distances of the target object from the current end-effector position. At each time step, the neural network controller receives its current joint positions and the target distances as its input, and generates as its output the number of degrees by which each of its three controllable joints is to be rotated.

4.1 The Neuro-Control Network

The neural network controller contains 6 input, 8 hidden, and 7 output units (figure 4). The input units correspond to the x, y, and z relative distances returned by the hand camera and to the current positions of the first three joints. The joint positions are normalized between 0 and 1. This encoding relieves the network of the task of determining the maximum and minimum positions of each joint, since for each joint the maximum position is represented as 1 and the minimum as 0. The input from the hand camera, however, is not normalized. Normalization would reduce the amount of information received from the hand camera and consequently reduce the accuracy of the networks' joint rotations. Simulations have confirmed that networks perform best with the absolute data from the hand camera.

The rotation of each joint is interpreted from two dedicated output units. The first unit specifies the direction of rotation (positive or negative) based on the sign of its total activation. The second unit is a sigmoidal unit that specifies the amount of rotation, normalized between 0.0 and 5.0 by multiplying the sigmoid output by 5.0. Limiting each joint rotation to +/- 5 degrees forces the network to make several small joint rotations to reach the target. Thus, a bad rotation in one time step can be more easily corrected in a subsequent time step.
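A hedged sketch of this output interpretation, which also uses the override (stop-arm) unit introduced just after figure 4 below. The function names are illustrative, and the exact sigmoid and the tie-breaking for a zero direction activation are not given in the paper, so they are assumptions here:

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def interpret_outputs(direction_acts, magnitude_acts, override_act):
        """Map the seven output activations to a control action (section 4.1):
        per joint, a sign unit gives the rotation direction and a sigmoidal
        unit scaled by 5.0 gives the magnitude, keeping each rotation within
        +/- 5 degrees; a positive override activation freezes the arm."""
        if override_act > 0.0:
            return None                                  # stop the arm
        return [math.copysign(5.0 * sigmoid(m), d if d != 0 else 1.0)
                for d, m in zip(direction_acts, magnitude_acts)]

    # Example: joint 1 rotates about -2.5 degrees; joints 2 and 3 rotate
    # forward by roughly 2.9 and 2.3 degrees.
    print(interpret_outputs([-0.4, 1.2, 0.9], [0.0, 0.3, -0.2], -1.0))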

Figure 4: The architecture of the neural network controller for the Simderella robot. Inputs are the hand-camera x, y, and z distances and the positions of joints 1-3; outputs are the rotations of joints 1-3 and a stop-arm unit. The dark arrows indicate the activation propagation from the input to the hidden layer and from the hidden layer to the output layer. The connections and weights between the hidden layer and the input and output layers are evolved by the genetic algorithm.

Smaller joint movements also allow the network to visit more of the state space during a trial, which should produce a more global control policy. A final threshold output unit is included as an override unit that can prevent movement regardless of the activations of the other output units. If the activation of the override unit is positive, the arm is not moved. If it is negative, the joint rotations are made based on the other output units. The override unit allows networks to stop the arm easily once the target position has been reached. Otherwise, stopping the arm would require setting the activation of the three sigmoidal units to exactly 0.5. Since genetic algorithms do not make systematic, small weight changes, it is very difficult to evolve neurons that compute such exact values.

4.2 Experimental Setup

To test the advantages of each approach, SANE1, Hierarchical SANE, and a standard neuro-evolution approach were implemented in the Simderella simulator to evolve the hidden-layer connections and weights of a neuro-control network. Each network contained 8 hidden units with 12 connections per unit. Connection definitions were decoded from the chromosomes as described in section 3.1. In the neuron evolution, each individual consisted of a single hidden unit, while in the standard approach, an individual consisted of 8 hidden units concatenated on a single chromosome.

To focus the comparison on the different strategies of neuro-evolution, rather than on the choice of parameter settings, several preliminary experiments were run to discover effective parameter values for each approach. Table 2 summarizes the parameter choices for each method. With the standard approach, a population size of 100 networks was found to be more effective than populations of 50 or 200. Keeping the number of network evaluations per generation constant across the approaches, a neuron population size of 200 for SANE1 and 800 for Hierarchical SANE performed well.

                             SANE1        Standard     Hierarchical SANE
    Neuron population size   200          -            800
    Network population size  -            100          100
    Selection strategy       Elite/Rank   Tournament   Elite/Rank

Table 2: Summary of the parameter settings that were found effective for each approach.

SANE1 requires a smaller population than Hierarchical SANE because its neurons are evaluated through random combinations. The population must be small enough to allow neurons to participate in several networks per generation. For example, randomly selecting 8 neurons for each of 100 networks from a 200-neuron population gives each neuron an expected participation rate of 4 networks per generation. In Hierarchical SANE, the neuron population is not as restricted, since neuron participation is dictated by the network individuals. Hierarchical SANE skews the participation rate towards the best neurons and leaves many neurons unevaluated in each generation. An unevaluated neuron is normally garbage, since no network individual uses it to build a network.

Several mutation rates were tried for the standard approach; the conservative rate of 0.1% per bit performed best. The same mutation rate was used for SANE1 and Hierarchical SANE. Not surprisingly, the aggressive genetic selection strategy described in section 3.1.1 performed poorly in the standard neuro-evolution approach. Some good solutions were found quickly; however, the populations often converged to suboptimal solutions and essentially became stuck. A 2-way tournament selection strategy produced more consistent and better average results for the standard neuro-evolution approach.

Each neural network evaluation began with random, but legal, joint positions and a random target position. The same reach space as van der Smagt (1995) was employed, where targets were placed within a 180-degree rotation of the first joint. A total of 450 target positions were created and separated into a 400-position training set and a 50-position test set. During evolution, a target position was randomly selected from the training set before each network evaluation. During each trial, a network was allowed to move the arm until one of the following conditions occurred:

1. The network stopped the arm.
2. The network placed the arm in an illegal position (e.g., hitting the floor).
3. The number of moves exceeded 25.

The score for each trial was computed as the percentage of the distance that the arm covered from its initial starting point to the target position. For example, if the arm started 120 cm from the target and its final position was 20 cm from the target, the network received a score of (120 - 20)/120 = 0.83. The percentage of distance covered, instead of the absolute final distance, provides a fairer comparison between a network that receives a close target and a network that receives a distant target. Each network was evaluated over a single randomly selected target.
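A sketch of this trial-scoring scheme, with network_move standing in for the evolved controller plus one simulator step (an assumed callback, not part of Simderella's API):

    def run_trial(network_move, start_dist, max_moves=25):
        """Score one evaluation trial (section 4.2). `network_move` is an
        assumed callback that advances the arm one time step and returns the
        new distance to the target, or None if the network stopped the arm
        or reached an illegal position. The score is the fraction of the
        starting distance that was covered."""
        dist = start_dist
        for _ in range(max_moves):              # condition 3: 25-move limit
            new_dist = network_move(dist)
            if new_dist is None:                # conditions 1 and 2
                break
            dist = new_dist
        return (start_dist - dist) / start_dist

    # The paper's example: starting 120 cm away and stopping 20 cm away
    # scores (120 - 20) / 120 = 0.83.
    moves = iter([60.0, 20.0, None])
    print(round(run_trial(lambda d: next(moves), 120.0), 2))   # -> 0.83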

4.3 Learning Speed Results

Twenty simulations of each approach were run, each for 100 generations, which requires 10,000 network evaluations. The best network of each generation (according to fitness) was then tested on the 50-target test set. Table 3 shows the average number of generations necessary to evolve a network controller that, on average, places the arm within 30 cm and within 10 cm of the targets in the test set. Simulations that could not find a network with average performance below 10 cm were allowed to evolve beyond 100 generations.

Table 3: The average number of generations (mean, best, worst, and standard deviation over 20 simulations) required to find a network that averaged less than 30 cm and less than 10 cm distance from the 50 test targets, for SANE1, standard neuro-evolution, and Hierarchical SANE. SANE1 found a solution that averages less than 10 cm in only 9 out of 20 simulations. The hierarchical approach required only about half as many generations as the standard approach.

SANE1 found solutions averaging less than 30 cm faster than the standard neuro-evolution approach; however, it was ineffective at the 10 cm level, where it found solutions in only 9 out of 20 simulations. The hierarchical approach proved the most efficient, requiring half the generations of the standard approach at both the 30 cm and 10 cm levels.

Figure 5 plots the average distance from the targets in the test set per generation. The distance refers to the best distance found at or before each generation during a single simulation; the average distance refers to these best distances averaged over 20 simulations. The graph effectively illustrates the problems of the neuron evolution in SANE1. Good solutions are found early, but the population eventually stalls and cannot generate better solutions. The slower standard approach eventually surpasses SANE1 and continues to improve towards the best solutions. Hierarchical SANE, however, achieves the same early efficiency as SANE1 and continues to generate better solutions throughout the evolution.

4.4 Adaptation Results

The second set of experiments tested the ability to adapt to changes in the domain. Populations were evolved as described above until a network was found that averaged less than 10 cm over the test set. The domain was then changed by removing the information about the position of the first joint. van der Smagt (1995) showed that correct control decisions can be generated without knowledge of the first joint position; there is sufficient information in the hand camera and the remaining joints to compute the position of the first joint. Thus, by evolving with the first-joint information and then removing it, the population must adapt its network controllers to infer the first-joint information from the other input units.

Figure 6 shows the adaptive performance of the network-level and hierarchical approaches. SANE1 is not plotted, since it failed to reliably find solutions averaging less than 10 cm. While Hierarchical SANE was able to adapt its control policy to reach the 10 cm target within 10 generations, the standard approach adapted much more slowly, requiring 24 generations on average. Increasing the mutation rate did produce faster adaptation in the standard approach, but it hindered the original learning rate.

Figure 5: The average distance from 50 randomly placed targets per generation for each level of evolution, averaged over 20 simulations.

Figure 6: Adaptation comparison between Hierarchical SANE and the standard approach. After the first-joint information is removed, Hierarchical SANE requires an average of 10 generations to recover to the 10 cm mark, while the standard approach requires 24.

Figure 7: The number of network participations for each neuron in the neuron population, with neurons ranked by fitness. Under the hierarchical approach, the rate of participation is biased towards the best neurons.

4.5 Analyzing Blueprint Evolution

As described earlier, one of the advantages of the hierarchical approach is that neuron participation is biased towards the top-performing neurons. In other words, the best neurons participate in more networks than the newer or poorer neurons. Figure 7 shows the typical rate of participation in the neuron population in the last generation. The data was collected from a single simulation; however, other simulations exhibited very similar behavior. Of the 800 opportunities for participation[2], 40% were filled by neurons ranked in the top 6% of the population and 78% by neurons in the top 25%. Neurons ranked from 300 to 750 did not participate in any networks. Neurons ranked at the bottom of the population were evaluated but performed so poorly (e.g., moving the arm away from the target) that they were ranked below neurons with no participations.

[2] 100 networks are formed per generation, each with 8 neurons.
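The participation counts behind figure 7 can be recomputed from the blueprint pointers alone. A small sketch (all names assumed):

    from collections import Counter

    def participation_by_rank(blueprints, ranked_neuron_ids):
        """Count how many networks each neuron joined in one generation
        (cf. figure 7). `blueprints` is a list of pointer lists (neuron ids),
        and `ranked_neuron_ids` lists neuron ids best-first, so the result
        aligns participation counts with fitness rank."""
        counts = Counter(nid for bp in blueprints for nid in bp)
        return [counts.get(nid, 0) for nid in ranked_neuron_ids]

    # Toy usage: 3 blueprints over 6 ranked neurons; with 100 blueprints of
    # 8 pointers there would be 800 participation slots, skewed to the top.
    print(participation_by_rank([[0, 1, 2], [0, 1, 3], [0, 2, 4]],
                                [0, 1, 2, 3, 4, 5]))   # -> [3, 2, 2, 1, 1, 0]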

Table 4 shows the neurons pointed to by the top 20 network blueprints during the last generation of a single simulation. The neuron pointer numbers refer to the rank of each neuron in the population after the fitness had been distributed. For example, the top blueprint included the 122nd-ranked neuron, the 135th-ranked neuron, the top-ranked neuron, and so on.

Table 4: The neurons in the top 20 network blueprints in the last generation of a simulation using Hierarchical SANE. The neuron numbers refer to their rank in the population after their fitness was calculated. The table shows a very diverse collection of neurons and several examples of offspring neurons present in the best networks.

The first conclusion from the table is that the blueprint population is quite diverse. Blueprints 1 and 8 are the most similar, but they share only 5 of their 8 neuron pointers. Thus, the mutation strategy presented in section 3.2 appears to provide sufficient diversity, allowing the blueprint population to sample many different combinations of neuron pointers. Table 4 also shows that many neurons ranked well below the top are included in the top networks. Interestingly, these neurons participate in only 1 or 2 networks per generation (figure 7). Closer inspection reveals that the majority of these neurons are new offspring neurons generated during the previous generation. Thus, from the prevalent use of these neurons in the top blueprints, it can be concluded that the blueprint evolution is making effective use of the new neural structures created by the neuron evolution.

5 Conclusion

Evolving neural networks at the neuron level offers important advantages over the more standard evolution of complete neural networks. Neuron evolution maintains population diversity and provides more accurate evaluation of the genetic building blocks, which produces a faster, more efficient genetic search. Unfortunately, neuron evolution alone, as in the SANE system, often cannot build upon the top neural networks in later generations and has difficulty pinpointing the best solutions. In contrast, the slower standard neuro-evolution approach is more proficient at fine-tuning and can reach the best solutions more reliably. Incorporating an outer-loop network-blueprint evolution on top of SANE's neuron evolution combines the advantages of both approaches by focusing the search on the best neuron combinations. As demonstrated in a sophisticated robot arm manipulation task, Hierarchical SANE consistently reached the desired level of proficiency in half as many generations as a standard neuro-evolution approach. Hierarchical SANE is an important extension of the SANE system, since it makes it possible to tackle more challenging tasks that require both timely and precise solutions. In the future, Hierarchical SANE will be applied to such tasks, including robot arm control in uncertain environments and under sensor noise.

Acknowledgments

The authors would like to thank Patrick van der Smagt for making his Simderella simulator available. This research was supported in part by the National Science Foundation under grant #IRI.

References

Barto, A. G., Sutton, R. S., and Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13:834-846.

Belew, R. K., McInerney, J., and Schraudolph, N. N. (1991). Evolving networks: Using the genetic algorithm with connectionist learning. In Farmer, J. D., Langton, C., Rasmussen, S., and Taylor, C., editors, Artificial Life II. Reading, MA: Addison-Wesley.

Cliff, D., Harvey, I., and Husbands, P. (1993). Explorations in evolutionary robotics. Adaptive Behavior, 2:73-110.

Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley.

Kawato, M. (1990). Computational schemes and neural network models for formation and control of multijoint arm trajectory. In Neural Networks for Control. Cambridge, MA: MIT Press.

Koza, J. R., and Rice, J. P. (1991). Genetic generation of both the weights and architecture for a neural network. In International Joint Conference on Neural Networks, vol. 2, 397-404. New York, NY: IEEE.

Miller, W. T. (1989). Real-time application of neural networks for sensor-based control of robots with vision. IEEE Transactions on Systems, Man, and Cybernetics, 19(4):825-831.

Moriarty, D. E., and Miikkulainen, R. (1996). Efficient reinforcement learning through symbiotic evolution. Machine Learning, 22:11-32.

Nolfi, S., Floreano, D., Miglino, O., and Mondada, F. (1994). How to evolve autonomous robots: Different approaches in evolutionary robotics. In Artificial Life IV. Cambridge, MA: MIT Press.

Nolfi, S., and Parisi, D. (1992). Growing neural networks. In Artificial Life III. Reading, MA: Addison-Wesley.

van der Smagt, P. (1994). Simderella: A robot simulator for neuro-controller design. Neurocomputing, 6(2).

van der Smagt, P. (1995). Visual Robot Arm Guidance using Neural Networks. PhD thesis, University of Amsterdam, Amsterdam, The Netherlands.

Watkins, C. J. C. H. (1989). Learning from Delayed Rewards. PhD thesis, University of Cambridge, England.

Werbos, P. J. (1992). Neurocontrol and supervised learning: An overview and evaluation. In Handbook of Intelligent Control, 65-89. New York: Van Nostrand Reinhold.

Whitley, D., Dominic, S., Das, R., and Anderson, C. W. (1993). Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13:259-284.

Yamauchi, B. M., and Beer, R. D. (1993). Sequential behavior and learning in evolved dynamical neural networks. Adaptive Behavior, 2:219-246.


More information

Rule Acquisition for Cognitive Agents by Using Estimation of Distribution Algorithms

Rule Acquisition for Cognitive Agents by Using Estimation of Distribution Algorithms Rule cquisition for Cognitive gents by Using Estimation of Distribution lgorithms Tokue Nishimura and Hisashi Handa Graduate School of Natural Science and Technology, Okayama University Okayama 700-8530,

More information

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Jeff Clune Assistant Professor Evolving Artificial Intelligence Laboratory Announcements Be making progress on your projects! Three Types of Learning Unsupervised Supervised Reinforcement

More information

Artificial Intelligence (AI) Common AI Methods. Training. Signals to Perceptrons. Artificial Neural Networks (ANN) Artificial Intelligence

Artificial Intelligence (AI) Common AI Methods. Training. Signals to Perceptrons. Artificial Neural Networks (ANN) Artificial Intelligence Artificial Intelligence (AI) Artificial Intelligence AI is an attempt to reproduce intelligent reasoning using machines * * H. M. Cartwright, Applications of Artificial Intelligence in Chemistry, 1993,

More information

Choosing Variables with a Genetic Algorithm for Econometric models based on Neural Networks learning and adaptation.

Choosing Variables with a Genetic Algorithm for Econometric models based on Neural Networks learning and adaptation. Choosing Variables with a Genetic Algorithm for Econometric models based on Neural Networks learning and adaptation. Daniel Ramírez A., Israel Truijillo E. LINDA LAB, Computer Department, UNAM Facultad

More information

Q-Learning in Continuous State Action Spaces

Q-Learning in Continuous State Action Spaces Q-Learning in Continuous State Action Spaces Alex Irpan alexirpan@berkeley.edu December 5, 2015 Contents 1 Introduction 1 2 Background 1 3 Q-Learning 2 4 Q-Learning In Continuous Spaces 4 5 Experimental

More information

An Effective Chromosome Representation for Evolving Flexible Job Shop Schedules

An Effective Chromosome Representation for Evolving Flexible Job Shop Schedules An Effective Chromosome Representation for Evolving Flexible Job Shop Schedules Joc Cing Tay and Djoko Wibowo Intelligent Systems Lab Nanyang Technological University asjctay@ntuedusg Abstract As the Flexible

More information

Data Mining Part 5. Prediction

Data Mining Part 5. Prediction Data Mining Part 5. Prediction 5.5. Spring 2010 Instructor: Dr. Masoud Yaghini Outline How the Brain Works Artificial Neural Networks Simple Computing Elements Feed-Forward Networks Perceptrons (Single-layer,

More information

Evolutionary Design I

Evolutionary Design I Evolutionary Design I Jason Noble jasonn@comp.leeds.ac.uk Biosystems group, School of Computing Evolutionary Design I p.1/29 This lecture Harnessing evolution in a computer program How to construct a genetic

More information

Pengju

Pengju Introduction to AI Chapter04 Beyond Classical Search Pengju Ren@IAIR Outline Steepest Descent (Hill-climbing) Simulated Annealing Evolutionary Computation Non-deterministic Actions And-OR search Partial

More information

Designer Genetic Algorithms: Genetic Algorithms in Structure Design. Inputs. C 0 A0 B0 A1 B1.. An Bn

Designer Genetic Algorithms: Genetic Algorithms in Structure Design. Inputs. C 0 A0 B0 A1 B1.. An Bn A B C Sum Carry Carry Designer Genetic Algorithms: Genetic Algorithms in Structure Design Sushil J. Louis Department of Computer Science Indiana University, Bloomington, IN 47405 louis@iuvax.cs.indiana.edu

More information

Simple Model of Evolution. Reed College FAX (505) Abstract. of evolution. The model consists of a two-dimensional world with a

Simple Model of Evolution. Reed College FAX (505) Abstract. of evolution. The model consists of a two-dimensional world with a Adaptation of Mutation Rates in a Simple Model of Evolution Mark A. Bedau Reed College Robert Seymour 3213 SE Woodstock Blvd., Portland, OR 97202, USA FAX (505) 777-7769 mab@reed.edu rseymour@reed.edu

More information

Electric Load Forecasting Using Wavelet Transform and Extreme Learning Machine

Electric Load Forecasting Using Wavelet Transform and Extreme Learning Machine Electric Load Forecasting Using Wavelet Transform and Extreme Learning Machine Song Li 1, Peng Wang 1 and Lalit Goel 1 1 School of Electrical and Electronic Engineering Nanyang Technological University

More information

Artificial Neural Networks Examination, June 2005

Artificial Neural Networks Examination, June 2005 Artificial Neural Networks Examination, June 2005 Instructions There are SIXTY questions. (The pass mark is 30 out of 60). For each question, please select a maximum of ONE of the given answers (either

More information

Approximating Q-values with Basis Function Representations. Philip Sabes. Department of Brain and Cognitive Sciences

Approximating Q-values with Basis Function Representations. Philip Sabes. Department of Brain and Cognitive Sciences Approximating Q-values with Basis Function Representations Philip Sabes Department of Brain and Cognitive Sciences Massachusetts Institute of Technology Cambridge, MA 39 sabes@psyche.mit.edu The consequences

More information

Function Optimization Using Connectionist Reinforcement Learning Algorithms Ronald J. Williams and Jing Peng College of Computer Science Northeastern

Function Optimization Using Connectionist Reinforcement Learning Algorithms Ronald J. Williams and Jing Peng College of Computer Science Northeastern Function Optimization Using Connectionist Reinforcement Learning Algorithms Ronald J. Williams and Jing Peng College of Computer Science Northeastern University Appears in Connection Science, 3, pp. 241-268,

More information

The Bias-Variance dilemma of the Monte Carlo. method. Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel

The Bias-Variance dilemma of the Monte Carlo. method. Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel The Bias-Variance dilemma of the Monte Carlo method Zlochin Mark 1 and Yoram Baram 1 Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel fzmark,baramg@cs.technion.ac.il Abstract.

More information

arxiv: v1 [cs.ai] 1 Jul 2015

arxiv: v1 [cs.ai] 1 Jul 2015 arxiv:507.00353v [cs.ai] Jul 205 Harm van Seijen harm.vanseijen@ualberta.ca A. Rupam Mahmood ashique@ualberta.ca Patrick M. Pilarski patrick.pilarski@ualberta.ca Richard S. Sutton sutton@cs.ualberta.ca

More information

Gaussian Process Regression: Active Data Selection and Test Point. Rejection. Sambu Seo Marko Wallat Thore Graepel Klaus Obermayer

Gaussian Process Regression: Active Data Selection and Test Point. Rejection. Sambu Seo Marko Wallat Thore Graepel Klaus Obermayer Gaussian Process Regression: Active Data Selection and Test Point Rejection Sambu Seo Marko Wallat Thore Graepel Klaus Obermayer Department of Computer Science, Technical University of Berlin Franklinstr.8,

More information

Lecture 9 Evolutionary Computation: Genetic algorithms

Lecture 9 Evolutionary Computation: Genetic algorithms Lecture 9 Evolutionary Computation: Genetic algorithms Introduction, or can evolution be intelligent? Simulation of natural evolution Genetic algorithms Case study: maintenance scheduling with genetic

More information

In Advances in Neural Information Processing Systems 6. J. D. Cowan, G. Tesauro and. Convergence of Indirect Adaptive. Andrew G.

In Advances in Neural Information Processing Systems 6. J. D. Cowan, G. Tesauro and. Convergence of Indirect Adaptive. Andrew G. In Advances in Neural Information Processing Systems 6. J. D. Cowan, G. Tesauro and J. Alspector, (Eds.). Morgan Kaufmann Publishers, San Fancisco, CA. 1994. Convergence of Indirect Adaptive Asynchronous

More information

The Markov Decision Process Extraction Network

The Markov Decision Process Extraction Network The Markov Decision Process Extraction Network Siegmund Duell 1,2, Alexander Hans 1,3, and Steffen Udluft 1 1- Siemens AG, Corporate Research and Technologies, Learning Systems, Otto-Hahn-Ring 6, D-81739

More information

Artificial Neural Networks Examination, March 2004

Artificial Neural Networks Examination, March 2004 Artificial Neural Networks Examination, March 2004 Instructions There are SIXTY questions (worth up to 60 marks). The exam mark (maximum 60) will be added to the mark obtained in the laborations (maximum

More information

On the Structure of Low Autocorrelation Binary Sequences

On the Structure of Low Autocorrelation Binary Sequences On the Structure of Low Autocorrelation Binary Sequences Svein Bjarte Aasestøl University of Bergen, Bergen, Norway December 1, 2005 1 blank 2 Contents 1 Introduction 5 2 Overview 5 3 Denitions 6 3.1 Shift

More information

CS 6501: Deep Learning for Computer Graphics. Basics of Neural Networks. Connelly Barnes

CS 6501: Deep Learning for Computer Graphics. Basics of Neural Networks. Connelly Barnes CS 6501: Deep Learning for Computer Graphics Basics of Neural Networks Connelly Barnes Overview Simple neural networks Perceptron Feedforward neural networks Multilayer perceptron and properties Autoencoders

More information

AND AND AND AND NOT NOT

AND AND AND AND NOT NOT Genetic Programming Using a Minimum Description Length Principle Hitoshi IBA 1 Hugo de GARIS 2 Taisuke SATO 1 1)Machine Inference Section, Electrotechnical Laboratory 1-1-4 Umezono, Tsukuba-city, Ibaraki,

More information

by the size of state/action space (and by the trial length which maybeproportional to this). Lin's oine Q() (993) creates an action-replay set of expe

by the size of state/action space (and by the trial length which maybeproportional to this). Lin's oine Q() (993) creates an action-replay set of expe Speeding up Q()-learning In Proceedings of the Tenth European Conference on Machine Learning (ECML'98) Marco Wiering and Jurgen Schmidhuber IDSIA, Corso Elvezia 36 CH-69-Lugano, Switzerland Abstract. Q()-learning

More information

Representation and Hidden Bias II: Eliminating Defining Length Bias in Genetic Search via Shuffle Crossover

Representation and Hidden Bias II: Eliminating Defining Length Bias in Genetic Search via Shuffle Crossover Representation and Hidden Bias II: Eliminating Defining Length Bias in Genetic Search via Shuffle Crossover Abstract The traditional crossover operator used in genetic search exhibits a position-dependent

More information

A Logarithmic Neural Network Architecture for Unbounded Non-Linear Function Approximation

A Logarithmic Neural Network Architecture for Unbounded Non-Linear Function Approximation 1 Introduction A Logarithmic Neural Network Architecture for Unbounded Non-Linear Function Approximation J Wesley Hines Nuclear Engineering Department The University of Tennessee Knoxville, Tennessee,

More information

PRIME GENERATING LUCAS SEQUENCES

PRIME GENERATING LUCAS SEQUENCES PRIME GENERATING LUCAS SEQUENCES PAUL LIU & RON ESTRIN Science One Program The University of British Columbia Vancouver, Canada April 011 1 PRIME GENERATING LUCAS SEQUENCES Abstract. The distribution of

More information

Finding Succinct. Ordered Minimal Perfect. Hash Functions. Steven S. Seiden 3 Daniel S. Hirschberg 3. September 22, Abstract

Finding Succinct. Ordered Minimal Perfect. Hash Functions. Steven S. Seiden 3 Daniel S. Hirschberg 3. September 22, Abstract Finding Succinct Ordered Minimal Perfect Hash Functions Steven S. Seiden 3 Daniel S. Hirschberg 3 September 22, 1994 Abstract An ordered minimal perfect hash table is one in which no collisions occur among

More information

The Constrainedness Knife-Edge. Toby Walsh. APES Group. Glasgow, Scotland.

The Constrainedness Knife-Edge. Toby Walsh. APES Group. Glasgow, Scotland. Abstract A general rule of thumb is to tackle the hardest part of a search problem rst. Many heuristics therefore try to branch on the most constrained variable. To test their eectiveness at this, we measure

More information

AN EVOLUTIONARY ALGORITHM TO ESTIMATE UNKNOWN HEAT FLUX IN A ONE- DIMENSIONAL INVERSE HEAT CONDUCTION PROBLEM

AN EVOLUTIONARY ALGORITHM TO ESTIMATE UNKNOWN HEAT FLUX IN A ONE- DIMENSIONAL INVERSE HEAT CONDUCTION PROBLEM Proceedings of the 5 th International Conference on Inverse Problems in Engineering: Theory and Practice, Cambridge, UK, 11-15 th July 005. AN EVOLUTIONARY ALGORITHM TO ESTIMATE UNKNOWN HEAT FLUX IN A

More information

Evolutionary Computation Theory. Jun He School of Computer Science University of Birmingham Web: jxh

Evolutionary Computation Theory. Jun He School of Computer Science University of Birmingham Web:   jxh Evolutionary Computation Theory Jun He School of Computer Science University of Birmingham Web: www.cs.bham.ac.uk/ jxh Outline Motivation History Schema Theorem Convergence and Convergence Rate Computational

More information

An Adaptive Clustering Method for Model-free Reinforcement Learning

An Adaptive Clustering Method for Model-free Reinforcement Learning An Adaptive Clustering Method for Model-free Reinforcement Learning Andreas Matt and Georg Regensburger Institute of Mathematics University of Innsbruck, Austria {andreas.matt, georg.regensburger}@uibk.ac.at

More information

Proceedings of the Seventh Oklahoma Conference on Articial Intelligence, pp , November, Objective Function

Proceedings of the Seventh Oklahoma Conference on Articial Intelligence, pp , November, Objective Function Proceedings of the Seventh Oklahoma Conference on Articial Intelligence, pp. 200-206, November, 1993. The Performance of a Genetic Algorithm on a Chaotic Objective Function Arthur L. Corcoran corcoran@penguin.mcs.utulsa.edu

More information

Improving Coordination via Emergent Communication in Cooperative Multiagent Systems: A Genetic Network Programming Approach

Improving Coordination via Emergent Communication in Cooperative Multiagent Systems: A Genetic Network Programming Approach Proceedings of the 2009 IEEE International Conference on Systems, Man, and Cybernetics San Antonio, TX, USA - October 2009 Improving Coordination via Emergent Communication in Cooperative Multiagent Systems:

More information

Search. Search is a key component of intelligent problem solving. Get closer to the goal if time is not enough

Search. Search is a key component of intelligent problem solving. Get closer to the goal if time is not enough Search Search is a key component of intelligent problem solving Search can be used to Find a desired goal if time allows Get closer to the goal if time is not enough section 11 page 1 The size of the search

More information

EVOLUTIONARY COMPUTATION. Th. Back, Leiden Institute of Advanced Computer Science, Leiden University, NL and

EVOLUTIONARY COMPUTATION. Th. Back, Leiden Institute of Advanced Computer Science, Leiden University, NL and EVOLUTIONARY COMPUTATION Th. Back, Leiden Institute of Advanced Computer Science, Leiden University, NL and divis digital solutions GmbH, Dortmund, D. Keywords: adaptation, evolution strategy, evolutionary

More information

STATE GENERALIZATION WITH SUPPORT VECTOR MACHINES IN REINFORCEMENT LEARNING. Ryo Goto, Toshihiro Matsui and Hiroshi Matsuo

STATE GENERALIZATION WITH SUPPORT VECTOR MACHINES IN REINFORCEMENT LEARNING. Ryo Goto, Toshihiro Matsui and Hiroshi Matsuo STATE GENERALIZATION WITH SUPPORT VECTOR MACHINES IN REINFORCEMENT LEARNING Ryo Goto, Toshihiro Matsui and Hiroshi Matsuo Department of Electrical and Computer Engineering, Nagoya Institute of Technology

More information

Bias-Variance Error Bounds for Temporal Difference Updates

Bias-Variance Error Bounds for Temporal Difference Updates Bias-Variance Bounds for Temporal Difference Updates Michael Kearns AT&T Labs mkearns@research.att.com Satinder Singh AT&T Labs baveja@research.att.com Abstract We give the first rigorous upper bounds

More information

ASHANTHI MEENA SERALATHAN. Haverford College, Computer Science Department Advisors: Dr. Deepak Kumar Dr. Douglas Blank Dr.

ASHANTHI MEENA SERALATHAN. Haverford College, Computer Science Department Advisors: Dr. Deepak Kumar Dr. Douglas Blank Dr. USING ADAPTIVE LEARNING ALGORITHMS TO MAKE COMPLEX STRATEGICAL DECISIONS ASHANTHI MEENA SERALATHAN Haverford College, Computer Science Department Advisors: Dr. Deepak Kumar Dr. Douglas Blank Dr. Stephen

More information

V. Evolutionary Computing. Read Flake, ch. 20. Genetic Algorithms. Part 5A: Genetic Algorithms 4/10/17. A. Genetic Algorithms

V. Evolutionary Computing. Read Flake, ch. 20. Genetic Algorithms. Part 5A: Genetic Algorithms 4/10/17. A. Genetic Algorithms V. Evolutionary Computing A. Genetic Algorithms 4/10/17 1 Read Flake, ch. 20 4/10/17 2 Genetic Algorithms Developed by John Holland in 60s Did not become popular until late 80s A simplified model of genetics

More information

CSC 4510 Machine Learning

CSC 4510 Machine Learning 10: Gene(c Algorithms CSC 4510 Machine Learning Dr. Mary Angela Papalaskari Department of CompuBng Sciences Villanova University Course website: www.csc.villanova.edu/~map/4510/ Slides of this presenta(on

More information

Ajit Narayanan and Mark Moore. University of Exeter

Ajit Narayanan and Mark Moore. University of Exeter Quantum-inspired genetic algorithms Ajit Narayanan and Mark Moore Department of Computer Science University of Exeter Exeter, United Kingdom, EX4 4PT ajit@dcs.exeter.ac.uk Abstract A novel evolutionary

More information

Abstract: Q-learning as well as other learning paradigms depend strongly on the representation

Abstract: Q-learning as well as other learning paradigms depend strongly on the representation Ecient Q-Learning by Division of Labor Michael Herrmann RIKEN, Lab. f. Information Representation 2-1 Hirosawa, Wakoshi 351-01, Japan michael@sponge.riken.go.jp Ralf Der Universitat Leipzig, Inst. f. Informatik

More information

Evolving Neural Networks through Augmenting Topologies

Evolving Neural Networks through Augmenting Topologies Evolving Neural Networks through Augmenting Topologies Kenneth O. Stanley kstanley@cs.utexas.edu Department of Computer Sciences, The University of Texas at Austin, Austin, TX 78712, USA Risto Miikkulainen

More information

Abstract In this paper we consider a parallel and distributed computation of genetic algorithms on loosely-coupled multiprocessor systems. Genetic Alg

Abstract In this paper we consider a parallel and distributed computation of genetic algorithms on loosely-coupled multiprocessor systems. Genetic Alg A Parallel and Distributed Genetic Algorithm on Loosely-coupled Multiprocessor Systems 1998 Graduate School of Engineering Electrical and Information Engineering University of the Ryukyus Takashi MATSUMURA

More information

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)

More information

ECE521 Lecture 7/8. Logistic Regression

ECE521 Lecture 7/8. Logistic Regression ECE521 Lecture 7/8 Logistic Regression Outline Logistic regression (Continue) A single neuron Learning neural networks Multi-class classification 2 Logistic regression The output of a logistic regression

More information

Perceptron. (c) Marcin Sydow. Summary. Perceptron

Perceptron. (c) Marcin Sydow. Summary. Perceptron Topics covered by this lecture: Neuron and its properties Mathematical model of neuron: as a classier ' Learning Rule (Delta Rule) Neuron Human neural system has been a natural source of inspiration for

More information

Journal of Algorithms Cognition, Informatics and Logic. Neuroevolution strategies for episodic reinforcement learning

Journal of Algorithms Cognition, Informatics and Logic. Neuroevolution strategies for episodic reinforcement learning J. Algorithms 64 (2009) 152 168 Contents lists available at ScienceDirect Journal of Algorithms Cognition, Informatics and Logic www.elsevier.com/locate/jalgor Neuroevolution strategies for episodic reinforcement

More information

Reinforcement Learning: the basics

Reinforcement Learning: the basics Reinforcement Learning: the basics Olivier Sigaud Université Pierre et Marie Curie, PARIS 6 http://people.isir.upmc.fr/sigaud August 6, 2012 1 / 46 Introduction Action selection/planning Learning by trial-and-error

More information

Aijun An and Nick Cercone. Department of Computer Science, University of Waterloo. methods in a context of learning classication rules.

Aijun An and Nick Cercone. Department of Computer Science, University of Waterloo. methods in a context of learning classication rules. Discretization of Continuous Attributes for Learning Classication Rules Aijun An and Nick Cercone Department of Computer Science, University of Waterloo Waterloo, Ontario N2L 3G1 Canada Abstract. We present

More information

Open Theoretical Questions in Reinforcement Learning

Open Theoretical Questions in Reinforcement Learning Open Theoretical Questions in Reinforcement Learning Richard S. Sutton AT&T Labs, Florham Park, NJ 07932, USA, sutton@research.att.com, www.cs.umass.edu/~rich Reinforcement learning (RL) concerns the problem

More information

Neural Networks. Nicholas Ruozzi University of Texas at Dallas

Neural Networks. Nicholas Ruozzi University of Texas at Dallas Neural Networks Nicholas Ruozzi University of Texas at Dallas Handwritten Digit Recognition Given a collection of handwritten digits and their corresponding labels, we d like to be able to correctly classify

More information

IVR: Introduction to Control (IV)

IVR: Introduction to Control (IV) IVR: Introduction to Control (IV) 16/11/2010 Proportional error control (P) Example Control law: V B = M k 2 R ds dt + k 1s V B = K ( ) s goal s Convenient, simple, powerful (fast and proportional reaction

More information

Gaussian Processes for Regression. Carl Edward Rasmussen. Department of Computer Science. Toronto, ONT, M5S 1A4, Canada.

Gaussian Processes for Regression. Carl Edward Rasmussen. Department of Computer Science. Toronto, ONT, M5S 1A4, Canada. In Advances in Neural Information Processing Systems 8 eds. D. S. Touretzky, M. C. Mozer, M. E. Hasselmo, MIT Press, 1996. Gaussian Processes for Regression Christopher K. I. Williams Neural Computing

More information

epochs epochs

epochs epochs Neural Network Experiments To illustrate practical techniques, I chose to use the glass dataset. This dataset has 214 examples and 6 classes. Here are 4 examples from the original dataset. The last values

More information

Unicycling Helps Your French: Spontaneous Recovery of Associations by. Learning Unrelated Tasks

Unicycling Helps Your French: Spontaneous Recovery of Associations by. Learning Unrelated Tasks Unicycling Helps Your French: Spontaneous Recovery of Associations by Learning Unrelated Tasks Inman Harvey and James V. Stone CSRP 379, May 1995 Cognitive Science Research Paper Serial No. CSRP 379 The

More information

Basics of reinforcement learning

Basics of reinforcement learning Basics of reinforcement learning Lucian Buşoniu TMLSS, 20 July 2018 Main idea of reinforcement learning (RL) Learn a sequential decision policy to optimize the cumulative performance of an unknown system

More information

Revisiting the Edge of Chaos: Evolving Cellular Automata to Perform. Santa Fe Institute Working Paper (Submitted to Complex Systems)

Revisiting the Edge of Chaos: Evolving Cellular Automata to Perform. Santa Fe Institute Working Paper (Submitted to Complex Systems) Revisiting the Edge of Chaos: Evolving Cellular Automata to Perform Computations Melanie Mitchell 1, Peter T. Hraber 1, and James P. Crutcheld 2 Santa Fe Institute Working Paper 93-3-14 (Submitted to Complex

More information