Hierarchical Evolution of Neural Networks

David E. Moriarty and Risto Miikkulainen
Department of Computer Sciences
The University of Texas at Austin
Austin, TX 78712

Technical Report AI, January 1996

Abstract

In most applications of neuro-evolution, each individual in the population represents a complete neural network. Recent work on the SANE system, however, has demonstrated that evolving individual neurons often produces a more efficient genetic search. This paper explores the merits of neuro-evolution both at the neuron level and at the network level. While SANE can solve easy tasks in just a few generations, in tasks that require high precision its progress often stalls and is exceeded by a standard, network-level evolution. In this paper, a new approach called Hierarchical SANE is presented that combines the advantages of both approaches by integrating two levels of evolution in a single framework. Hierarchical SANE couples the early explorative quality of SANE's neuron-level search with the late exploitative quality of a more standard network-level evolution. In a sophisticated robot arm manipulation task, Hierarchical SANE significantly outperformed both SANE and a standard, network-level neuro-evolution approach, suggesting that it can more efficiently solve a broad range of tasks.

1 Introduction

Artificial evolution has proven to be an effective method for forming neural networks in tasks where sparse reinforcement precludes normal supervised learning methods such as backpropagation. The evolutionary framework frees the implementor from generating training examples and provides a highly adaptive mechanism for dynamic environments. Recent work has shown evolved neurocontrollers to be effective in unstable, dynamic control tasks (Cliff et al. 1993; Moriarty and Miikkulainen 1996; Nolfi et al. 1994; Whitley et al. 1993; Yamauchi and Beer 1993). The bane of evolutionary methods, however, has been the large number of fitness evaluations that must be performed to achieve a high level of performance. Recently, Moriarty and Miikkulainen (1996) have developed a more efficient neuro-evolution system called SANE (Symbiotic, Adaptive Neuro-Evolution) with promising scale-up properties. SANE was designed as a method for efficiently forming effective decision policies in sequential decision tasks. Unlike most approaches to neuro-evolution, where each individual in the population is a complete neural network, SANE's individuals are single neurons (hidden neurons in a 3-layer network).

Evolution at the neuron level promotes population diversity and allows SANE to better evaluate subcomponents of the final solution. In the pole balancing benchmark, SANE compared favorably to temporal difference methods such as the Adaptive Heuristic Critic (Barto et al. 1983) and Q-learning (Watkins 1989), and to network-level neuro-evolution systems such as GENITOR (Whitley et al. 1993).

The purpose of this paper is to compare and contrast the merits of evolution at the neuron level and evolution at the network level. SANE's neuron-level approach is very adept at finding quick, effective solutions, but often has difficulty pinpointing the best solutions. The diverse neuron population leads to many different neuron combinations, resulting in a faster exploration of the neural network space. However, SANE does not effectively exploit the best areas of the network space, and thus often cannot fine-tune the best neural networks. SANE's search strategy is thus most effective in tasks where solutions are plentiful, such as pole balancing, or where solutions do not require precise optimization. In contrast, evolution at the network level explores the neural network space much more slowly. The population may prematurely converge, and it may take a very long time to find an area of the neural network space that contains good solutions. However, once such an area is located, the network-level evolution can converge to the best networks very efficiently. An analogy can be drawn to the famous competition between the tortoise (evolution at the network level) and the hare (SANE). Thus, an important research focus is how to combine the early search efficiency of SANE's neuron evolution with the fine-tuning of a more standard network evolution.

A hierarchical approach to neuro-evolution is presented where the two levels of evolution are integrated in a single framework. An outer-loop network-level evolution is incorporated on top of SANE's neuron population. Thus, two separate populations are maintained: a population of neurons and a population of network blueprints. The neuron population provides efficient evaluation of the genetic building blocks, while the population of network blueprints learns effective combinations of these building blocks. Initially, the population of blueprints is random, producing random neuron combinations much like those in SANE. As the blueprint population is evolved, the neuron combinations become focused on the best networks. Thus, the hierarchical approach combines the early efficient exploration of SANE with the late exploitation of the network-level approaches.

For clarity, the following terms are adopted for the remainder of the paper. The SANE neuro-evolution system described by Moriarty and Miikkulainen (1996) will be referred to as SANE1. The neuro-evolution system described in this paper, which extends SANE1, will be referred to as Hierarchical SANE. The terms network-level and standard neuro-evolution will denote the more common approach of evolving individuals that represent complete neural networks.

Experiments were conducted using a sophisticated robot arm manipulation task to compare SANE1, Hierarchical SANE, and the standard neuro-evolution approach. SANE1 found good solutions quickly, but was inconsistent at reaching the desired level of proficiency. The network-level approach learned much more slowly, but eventually surpassed SANE1's stalled neuron-level evolution and reached a solution in every simulation.
Hierarchical SANE matched SANE1's early search efficiency and found a solution in half as many generations as the network-level approach. The combined search efficiency and focusing qualities of the hierarchical approach suggest that Hierarchical SANE should be applicable to a wide range of reinforcement learning problems where standard approaches are ineffective.

The body of this paper is organized as follows. The next section describes the advantages of evolving at the neuron level and at the network level, followed by the description of the hierarchical approach in section 3. The experimental comparisons of SANE1, Hierarchical SANE, and the standard neuro-evolution approach are presented in section 4, and the final section summarizes our conclusions.
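Before the detailed description in sections 3 and 4, the two-population arrangement can be sketched in a few lines of Python. Everything in this sketch (the class names, the stand-in evaluation function, and the sizes) is illustrative only and not the original implementation; the actual parameter settings appear in section 4.2.

    import random

    class Neuron:
        """One hidden neuron: a bitwise chromosome plus a fitness value."""
        def __init__(self, n_bits=12 * 24):            # 12 connections x 24 bits
            self.bits = [random.randint(0, 1) for _ in range(n_bits)]
            self.fitness = 0.0

    class Blueprint:
        """One network blueprint: pointers into the neuron population."""
        def __init__(self, neurons, size=8):           # 8 hidden units per network
            self.pointers = [random.choice(neurons) for _ in range(size)]
            self.fitness = 0.0

    def evaluate_network(hidden_neurons):
        """Stand-in for the task evaluation; returns a score in [0, 1]."""
        return random.random()

    neurons = [Neuron() for _ in range(800)]
    blueprints = [Blueprint(neurons) for _ in range(100)]

    # One generation: every blueprint assembles a network from the neurons it
    # points to; the score is credited to the blueprint and, per section 3.1,
    # later also to the participating neurons.
    for bp in blueprints:
        bp.fitness = evaluate_network(bp.pointers)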

2 Neuron vs. Network Level Evolution

In most approaches to neuro-evolution, each individual represents a complete neural network that is evaluated independently of the other networks in the population (Belew et al. 1991; Koza and Rice 1991; Nolfi and Parisi 1992; Whitley et al. 1993). By treating each member as a separate, full solution, the genetic algorithm focuses the search towards a single dominant individual. Such concentration leads the population to converge on a single solution (Goldberg 1989). Once the population has converged, further movement is generally possible only through random mutation. Crossover operations become ineffective, since the same genetic building blocks are joined in the new offspring.

In contrast, in a neuron-level evolution individuals are only partial neural networks; complete networks are built by decoding several individuals' chromosomes. Individual neurons are evaluated by how well they perform when combined with other neurons in the population. Since a single neuron usually cannot perform the whole task alone, effective neural networks must contain neurons that perform different functions. By basing neuron fitness on the performance of the neural networks, SANE1 introduces evolutionary pressure to evolve several different types, or specializations[1], of neurons. Thus, unlike the network-level evolution, the population does not converge to a single individual, but instead contains a diverse collection of neurons that optimize different aspects of a neural network.

Diversity promotes quick exploration of the solution space. Whereas a genetic search in a converged population is directed by the mutation operator and progresses very slowly, a genetic search with a diverse population can continue to use its crossover operations to generate new structures and make larger traversals of the solution space in shorter periods of time. Such an advantage is important not only in early generations, but also when changes occur in the domain. Often in neuro-control tasks, controllers must be quickly revised to avoid the costly effects of an outdated control policy. A diverse population can more adeptly make the modifications necessary to compensate for domain changes.

In addition to maintaining diverse populations, evolution at the neuron level evaluates the genetic building blocks more accurately. In a network-level evolution, each neuron is evaluated only with the other neurons encoded on the same chromosome. With such a representation, a very good neuron may exist on a chromosome but subsequently be lost because the other neurons on the chromosome are poor. In a neuron-level evolution, neurons are continually recombined with many different neurons in the population, which produces a more accurate evaluation of fitness. Essentially, a neuron-level evolution takes advantage of the a priori knowledge that individual neurons constitute basic components of neural networks. A neuron-level evolution explicitly promotes genetic building blocks in the population that may be useful in building other networks. A network-level evolution does so only implicitly, along with various other sub- and superstructures (Goldberg 1989). In other words, by evolving at the neuron level, the genetic algorithm is no longer relied upon to identify neurons as important building blocks, since neurons are the object of evolution.
[1] We chose the term specialization rather than species since each neuron does not represent a full solution to the problem.
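As a concrete illustration of this evaluation scheme, the following sketch implements SANE1-style fitness: each neuron's fitness is the average score of the many randomly assembled networks it joins, so its rating does not depend on any one chromosome's other neurons. The function and callback names are assumptions, not SANE's actual code; the averaging rule is SANE1's original one, which Hierarchical SANE replaces in section 3.1.2.

    import random

    class Neuron:
        """Placeholder neuron carrying only a fitness value."""
        def __init__(self):
            self.fitness = 0.0

    def sane1_evaluate(neurons, score_network, networks_per_gen=100, subpop_size=8):
        """SANE1-style neuron evaluation: draw many random subpopulations,
        score each assembled network, and rate every neuron by the average
        score of the networks it participated in."""
        scores = {id(n): [] for n in neurons}
        for _ in range(networks_per_gen):
            team = random.sample(neurons, subpop_size)   # random combination
            s = score_network(team)                      # task-specific evaluation
            for n in team:
                scores[id(n)].append(s)
        for n in neurons:
            played = scores[id(n)]
            n.fitness = sum(played) / len(played) if played else 0.0

    # Usage with a dummy scorer: neurons are combined with many different
    # partners, so each accumulates a partner-independent fitness estimate.
    population = [Neuron() for _ in range(200)]
    sane1_evaluate(population, lambda team: random.random())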

The neuron evolution in SANE1 (Moriarty and Miikkulainen 1996) achieves each of these advantages, but also contains an Achilles' heel. In SANE1, networks are formed from randomly selected subpopulations of neurons, which is undesirable for two reasons. First, the top neurons may not be combined with neurons that work well together. Thus, a very good neuron may be lost because it was ineffectively combined during a generation. The second problem is that the quality of the networks varies greatly throughout evolution. In early generations this works to SANE1's advantage, since it is able to try many different types of networks to find the most effective neurons. In later generations, however, when the search should focus on the best networks, SANE1's inconsistent networks often stall the search and prevent the global optima from being located. In easy tasks such as pole balancing, SANE1 performs quite well, since solutions are plentiful and generally found in the first 10 generations (Moriarty and Miikkulainen 1996). Experiments presented in section 4, however, show that in significantly more difficult tasks that require high precision within the solution space, SANE1 is unable to build upon the best networks to improve the population in later generations. The converged network-level population, operating through mutation, eventually catches up to SANE1 and finds better solutions. This phenomenon motivated the development of a hierarchical approach, which preserves the quick explorative feature of SANE1 while incorporating the fine-tuning ability of a network-level search.

3 Hierarchical Evolution

Hierarchical SANE incorporates an outer-loop network-level evolution on top of SANE1's neuron evolution. Two populations are maintained and evolved: a population of neurons and a population of network blueprints. Each individual in the neuron population specifies a set of connections to be made within a neural network. Each individual in the network blueprint population specifies a set of neurons to include in a neural network. Together, the neuron evolution searches for effective partial networks, while the blueprint evolution searches for effective combinations of the partial networks.

3.1 The Neuron Population

The neuron evolution in Hierarchical SANE is very similar to the strategy introduced by Moriarty and Miikkulainen (1996), except for a few modifications. The neuron-level evolution is described in two parts. First, the basic strategy is presented, which includes the fitness calculation and the specifics of the genetic algorithm. Second, some performance improvements over the neuron evolution in SANE1 are described and discussed.

3.1.1 Core Strategy

Each individual in the neuron population represents a hidden neuron in a given type of architecture, such as a 3-layer feedforward network. Neurons are defined in bitwise chromosomes that encode a series of connection definitions, each consisting of an 8-bit label field and a 16-bit weight field. The value of the label determines where the connection is to be made. The neurons connect only to the input and the output layer. If the decimal value of the label, D, is greater than 127, then the connection is made to output unit D mod O, where O is the total number of output units. Similarly, if D is less than or equal to 127, then the connection is made to input unit D mod I, where I is the total number of input units. The weight field encodes a floating-point weight for the connection. Figure 1 shows how a neural network is formed from three sample hidden neuron definitions.

Figure 1: Forming a simple 8-input, 3-hidden, 5-output-unit neural network from three hidden neuron definitions. The chromosomes of the hidden neurons (label and weight fields) are shown on the left and the corresponding network on the right. In this example, each hidden neuron has 3 connections.

The basic steps in one generation of the neuron evolution are as follows (table 1):

1. Clear the fitness value of each neuron.
2. Select neurons from the population using a blueprint.
3. Create a neural network from the selected neurons.
4. Evaluate the network in the given task.
5. Repeat steps 2-4 for each individual in the blueprint population.
6. Assign each neuron the summed score of the best 5 networks in which it participated.
7. Perform crossover operations on the population based on the fitness value of each neuron.

Table 1: The basic steps in one generation of the neuron evolution.

During the evaluation stage, neuron subpopulations of a fixed size are selected and combined to form a neural network. The selection method depends on the blueprint population and will be described in section 3.2. Each neuron receives the summed fitness evaluations of the best five networks in which it participated. After the evaluation stage, the neuron population is ranked based on the fitness values. For each neuron in the top 25% of the population, a mate is selected randomly from among the top 25%. Each mating operation creates two offspring: a copy of one of the mates, and a child created through one-point crossover. The second child created by crossover is discarded. Copying one of the parents reduces the effect of adverse neuron mutation on the blueprint-level evolution, as will be further explained in section 3.2.2. The two offspring replace the worst-performing neurons (according to rank) in the population. Mutation at the rate of 0.1% per bit position is performed on the entire population as the last step in each generation. Such an aggressive, elitist breeding strategy is not normally used in evolutionary applications, since it leads to quick convergence of the population. A neuron evolution, however, performs quite well with such an aggressive selection strategy, since it contains strong evolutionary pressures against convergence.
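The decoding rule can be stated compactly in code. The sketch below (the helper name decode_neuron is assumed, and the mapping of the 16-bit weight field to a floating-point value is not specified in the paper, so the scaling into [-10.0, 10.0] is an assumption) builds the connection list for one hidden neuron:

    import random

    def decode_neuron(chromosome_bits, n_in, n_out, conns=12):
        """Decode a bitwise neuron chromosome into connection definitions.
        Each definition is an 8-bit label plus a 16-bit weight (section 3.1.1).
        The float mapping of the weight field is an assumption here."""
        connections = []
        for c in range(conns):
            base = c * 24
            label_bits = chromosome_bits[base:base + 8]
            weight_bits = chromosome_bits[base + 8:base + 24]
            D = int("".join(map(str, label_bits)), 2)        # 0..255
            w_raw = int("".join(map(str, weight_bits)), 2)   # 0..65535
            weight = (w_raw / 65535.0) * 20.0 - 10.0         # assumed scaling
            if D > 127:                                      # output connection
                connections.append(("output", D % n_out, weight))
            else:                                            # input connection
                connections.append(("input", D % n_in, weight))
        return connections

    # Example: one 12-connection neuron for a 6-input, 7-output network.
    bits = [random.randint(0, 1) for _ in range(12 * 24)]
    print(decode_neuron(bits, n_in=6, n_out=7)[:3])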

3.1.2 Performance Improvements

Some important modifications were made to the neuron evolution of Moriarty and Miikkulainen (1996) to make it a strong conduit to its blueprint-level counterpart. Previously, a neuron's fitness was determined by the average score of all the networks in which it participated. Under such a system, however, neurons that are crucial in the best networks, but ineffective in poor networks, receive lower fitness scores and are often selected against. For example, in the robot arm manipulation task, a neuron that specializes in small movements near the target would be effective in networks that position the arm close to the target, but useless in networks that never get near the target. Such neurons are very important to the population and should be propagated to future generations. Assigning fitness based on the performance of the best networks helps ensure that these neurons will not be selected against.

A further modification to the original fitness assignment is to sum the scores of the best networks, rather than take the average. A common problem occurs when a neuron "hitchhikes", that is, participates without significantly contributing, in a single network that performs well. Hitchhiking turns out to be particularly prevalent in the hierarchical approach, since selecting neurons based on the network blueprints biases neuron participation towards the top neurons, and many sub-par neurons are evaluated in only a single network. By taking averages, these seldom-participating neurons could receive a very high fitness evaluation from a single lucky network. Summing the scores removes the hitchhiking problem by creating a fitness bias towards neurons that participate in more networks. For example, a neuron that participates in five networks will almost always receive a higher fitness evaluation than a neuron that participates in a single network. In addition, summing fitness scores more closely integrates the two levels of evolution by preferring neurons that are popular components of the network-level evolution.

3.2 The Network Blueprint Population

The purpose of the blueprint population is twofold: to organize the neurons into workable groups and to assign more trials to the better-performing neurons. In SANE1, networks are formed by randomly combining subpopulations of neurons. In Hierarchical SANE, the network blueprints specify which neurons should be connected together. As effective neuron combinations are found, they are preserved in the blueprint chromosomes and propagated to future generations.

Maintaining network blueprints produces more accurate neuron evaluations and concentrates the search on the best neural networks. Since neurons are combined systematically based on past performance, they are more consistently combined with other neurons that perform well together. Additionally, better-performing neurons garner more pointers from the blueprint population and thus participate in a greater number of networks. Biasing neuron participation towards the historically better-performing neurons provides more accurate evaluations of the top neurons. The sacrifice, however, is that newer neurons may not receive enough trials to be accurately evaluated. In practice, allocating more trials to the top neurons produces a significant improvement over uniform neuron participation. The primary advantage of evolving network blueprints, however, is the exploitation of the best networks found during evolution.
SANE1 keeps no memory of the previous networks formed, and good neuron combinations found in one generation often never occur again in subsequent generations. Such behavior causes the quick, explorative search to stall, because it cannot exploit the best neuron combinations. Hierarchical SANE maintains the proficient collections of neurons in the blueprint chromosomes and ensures that the best networks are reconstructed. By evolving the blueprint population, the best neuron combinations are also recombined to form new, potentially better, collections of neurons. Hierarchical SANE thus provides a more exploitative search that can build upon the best networks found during evolution and focus the search in later generations.
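Returning to the fitness assignment of section 3.1.2, a small sketch shows why summing the best five scores suppresses hitchhikers under this blueprint-biased participation. The function name and the score-dictionary format are assumptions for illustration only:

    def assign_neuron_fitness(participation, top_k=5):
        """participation: dict neuron_id -> scores of the networks that
        neuron appeared in this generation. Fitness is the SUM of its best
        five network scores (section 3.1.2), not the average, so a neuron
        that does well in many networks outranks a one-network hitchhiker."""
        return {nid: sum(sorted(scores, reverse=True)[:top_k])
                for nid, scores in participation.items()}

    # A lucky hitchhiker in one 0.9 network loses to a neuron earning 0.8
    # in five networks: 0.9 < 4.0.
    print(assign_neuron_fitness({"hitchhiker": [0.9],
                                 "workhorse": [0.8] * 6}))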

Figure 2: An overview of the network blueprint population in relation to the neuron population. Each member of the neuron population specifies a series of connections (labels and weights) to be made within a neural network. Each member of the blueprint population specifies a series of pointers to specific neurons, which are used to build a neural network.

3.2.1 Core Strategy

Each individual in the blueprint population contains a series of neuron pointers. More specifically, a blueprint chromosome is an array of address pointers to neuron structures, one pointer for each hidden unit in the network. Figure 2 illustrates how the blueprint population is integrated with the neuron population. Initially, the chromosome pointers are randomly assigned to neurons in the neuron population. During the neuron evaluation stage, subpopulations of neurons are selected based on each blueprint's array of pointers. Thus, unlike SANE1, which forms networks by randomly selecting subpopulations of neurons, Hierarchical SANE forms networks by following the pointers in each blueprint's chromosome. Blueprints receive the fitness score of the neural network that they specify.

After each blueprint has been evaluated, the blueprint population is ranked, and crossover operations are applied to the top blueprints. Since the chromosomes are made up of address pointers instead of bits, crossover occurs only between pointers. The new offspring receive the same address pointers that the parent chromosomes contained. In other words, if a parent chromosome contains a pointer to a specific neuron, one of its offspring will point to that same neuron (barring mutation). The current genetic algorithm at the blueprint level is identical to the aggressive strategy used at the neuron level; however, the similarity is not essential, and a more standard genetic algorithm could be used. Empirically, the aggressive strategy at the blueprint level, coupled with the strong mutation strategy described in the next section, has outperformed many of the more standard genetic algorithms.
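The blueprint-level operators can be sketched as follows, with pointers represented as indices into the neuron population. The crossover is the pointer-level one-point operation just described; the mutation function previews the two-component strategy of section 3.2.2 below. The function names and the offspring_of helper (parent index to child indices) are assumptions:

    import random

    def blueprint_crossover(a, b):
        """One-point crossover over two pointer arrays (section 3.2.1).
        Offspring inherit the parents' neuron pointers unchanged."""
        cut = random.randrange(1, len(a))
        return a[:cut] + b[cut:]

    def mutate_offspring_blueprint(pointers, pop_size, offspring_of):
        """Two-component mutation, applied to offspring blueprints only
        (section 3.2.2): reassign one pointer to a random neuron, then with
        50% probability redirect each pointer at a breeding neuron to one
        of that neuron's offspring."""
        pointers = list(pointers)
        pointers[random.randrange(len(pointers))] = random.randrange(pop_size)
        return [random.choice(offspring_of[p])
                if p in offspring_of and random.random() < 0.5 else p
                for p in pointers]

    # Tiny usage: an 8-pointer blueprint in an 800-neuron population, where
    # breeding neuron 3 produced offspring 700 (a copy) and 701 (a crossover).
    child = blueprint_crossover([1, 2, 3, 4, 5, 6, 7, 8],
                                [9, 3, 11, 3, 13, 14, 15, 16])
    print(mutate_offspring_blueprint(child, 800, {3: [700, 701]}))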

3.2.2 Performance Improvements

To avoid convergence problems at the blueprint level, a two-component mutation strategy is employed. First, one pointer in each offspring blueprint is randomly reassigned to another member of the neuron population. This strategy promotes participation of neurons other than the top neurons in subsequent networks. Thus, a neuron that does not participate in any networks can acquire a pointer and participate in the next generation. Since the mutation occurs only in the blueprint offspring, the neuron pointers in the top blueprints are always preserved.

The second mutation component is a selective strategy designed to take advantage of the new structures created by the neuron evolution. Recall that a breeding neuron produces two offspring: a copy of itself and the result of a crossover operation with another breeding neuron. Each neuron offspring is thus similar to, and potentially better than, its parent neurons. The blueprint evolution can use this knowledge by occasionally replacing pointers to breeding neurons with pointers to offspring neurons. In the experiments described in this paper, pointers are switched from breeding neurons to one of their offspring with a 50% probability. Again, this mutation is performed only in the offspring blueprints, and the pointers in the top blueprints are preserved.

This selective mutation mechanism has two advantages. First, because pointers are reassigned to neuron offspring resulting from crossover, the blueprint evolution can explore new neuron structures. Second, because pointers are also reassigned to offspring that were formed by copying the parent, the blueprints become more resilient to adverse mutation in the neuron evolution. If pointers were not reassigned to copies, many blueprints would point to the exact same neuron, and any mutation to that neuron would affect every blueprint pointing to it. When pointers are occasionally reassigned to copies, however, such mutation is limited to only a few blueprints. The effect is similar to schema promotion in standard genetic algorithms: as the population evolves, highly fit schemata (i.e., neurons) become more prevalent in the population, and mutation to one copy of a schema does not affect the other copies in the population.

4 Evaluation: Manipulating a Robot Arm

Much of our research is currently directed at applications of neuro-evolution to robot control. Specifically, we are evolving neural networks to control a robot arm manipulator using visual input. Most neural network applications to this problem learn hand-eye coordination through supervised training methods such as backpropagation or conjugate gradient descent (Kawato 1990; Miller 1989; van der Smagt 1995; Werbos 1992). Supervised learning, however, requires training examples that demonstrate correct mappings from input to output. The current approaches for generating training examples for robot arm control are very limited and ineffective in uncertain or obstacle-filled domains. Thus, neuro-controllers that learn through supervised techniques cannot integrate hand-eye coordination with obstacle avoidance. Neuro-evolution, however, does not require input/output examples and can learn the intermediate joint rotations necessary for avoiding obstacles in uncertain environments. Neuro-evolution therefore provides a promising framework for the automatic development of robot arm controllers in realistic domains.
SANE1, Hierarchical SANE, and a standard neuro-evolution approach were evaluated experimentally in a robot arm control task that requires hand-eye coordination, but not obstacle avoidance. The purpose of these experiments is to provide a sophisticated, real-world benchmark with which to illustrate the advantages of the different levels of neuro-evolution. The particular robot arm simulator used in these experiments was the Simderella 2.0 package written by van der Smagt (1994).

Figure 3: The Simderella simulation of the OSCAR robot arm. The numbers indicate the three joints to be controlled. The goal of the controller is to position the hand within 10 cm of a randomly placed target object, shown in the right quadrant.

Simderella is a simulation of the OSCAR 6-degree-of-freedom anthropomorphic arm. van der Smagt (1995) has reported that controllers that perform well in Simderella exhibit very similar performance when applied to the actual physical robot, OSCAR. Figure 3 illustrates the Simderella robot. The goal of the network controller is to maneuver the arm to a position within ten centimeters of a randomly placed target object (a secondary network, not described in this paper, can learn the smaller, finite movements necessary to grasp the object). The controller moves the arm by specifying rotations of the first three joints at each time step. While OSCAR contains a total of six joints, van der Smagt (1995) has shown that for most industrial tasks, controlling the first three joints is sufficient. The arm contains a camera, located in the end effector (hand), that provides the x, y, and z distances of the target object from the current end-effector position. At each time step, the neural network controller receives its current joint positions and the target distances as its input, and generates as its output the number of degrees by which each of its three controllable joints is to be rotated.

4.1 The Neuro-Control Network

The neural network controller contains 6 input, 8 hidden, and 7 output units (figure 4). The input units correspond to the x, y, and z relative distances returned by the hand camera and to the current positions of the first three joints. The joint positions are normalized between 0 and 1. This encoding relieves the network of the task of determining the maximum and minimum positions of each joint, since for each joint the maximum position is represented as 1 and the minimum as 0. The input from the hand camera, however, is not normalized. Normalization would reduce the amount of information received from the hand camera and consequently reduce the accuracy of the networks' joint rotations. Simulations have confirmed that networks perform best with the absolute data from the hand camera.

The rotation of each joint is interpreted from two dedicated output units. The first unit specifies the direction of rotation (positive or negative) based on the sign of its total activation. The second unit is a sigmoidal unit that specifies the amount of rotation, normalized between 0.0 and 5.0 by multiplying the sigmoid output by 5.0. Limiting each joint rotation to +/- 5 degrees forces the network to make several small joint rotations to reach the target. Thus, a bad rotation in one time step can be more easily corrected in a subsequent time step.
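A hedged sketch of this output interpretation, which also uses the override (stop-arm) unit introduced just after figure 4 below. The function names are illustrative, and the exact sigmoid and the tie-breaking for a zero direction activation are not given in the paper, so they are assumptions here:

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def interpret_outputs(direction_acts, magnitude_acts, override_act):
        """Map the seven output activations to a control action (section 4.1):
        per joint, a sign unit gives the rotation direction and a sigmoidal
        unit scaled by 5.0 gives the magnitude, keeping each rotation within
        +/- 5 degrees; a positive override activation freezes the arm."""
        if override_act > 0.0:
            return None                                  # stop the arm
        return [math.copysign(5.0 * sigmoid(m), d if d != 0 else 1.0)
                for d, m in zip(direction_acts, magnitude_acts)]

    # Example: joint 1 rotates about -2.5 degrees; joints 2 and 3 rotate
    # forward by roughly 2.9 and 2.3 degrees.
    print(interpret_outputs([-0.4, 1.2, 0.9], [0.0, 0.3, -0.2], -1.0))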

Figure 4: The architecture of the neural network controller for the Simderella robot. Inputs are the hand-camera x, y, and z distances and the positions of joints 1-3; outputs are the rotations of joints 1-3 and a stop-arm unit. The dark arrows indicate the activation propagation from the input to the hidden layer and from the hidden layer to the output layer. The connections and weights between the hidden layer and the input and output layers are evolved by the genetic algorithm.

Smaller joint movements also allow the network to visit more of the state space during a trial, which should produce a more global control policy. A final threshold output unit is included as an override unit that can prevent movement regardless of the activations of the other output units. If the activation of the override unit is positive, the arm is not moved. If it is negative, the joint rotations are made based on the other output units. The override unit allows networks to stop the arm easily once the target position has been reached. Otherwise, stopping the arm would require setting the activation of the three sigmoidal units to exactly 0.5. Since genetic algorithms do not make systematic, small weight changes, it is very difficult to evolve neurons that compute such exact values.

4.2 Experimental Setup

To test the advantages of each approach, SANE1, Hierarchical SANE, and a standard neuro-evolution approach were implemented in the Simderella simulator to evolve the hidden-layer connections and weights of a neuro-control network. Each network contained 8 hidden units with 12 connections per unit. Connection definitions were decoded from the chromosomes as described in section 3.1. In the neuron evolution, each individual consisted of a single hidden unit, while in the standard approach, an individual consisted of 8 hidden units concatenated on a single chromosome.

To focus the comparison on the different strategies of neuro-evolution, rather than on the choice of parameter settings, several preliminary experiments were run to discover effective parameter values for each approach. Table 2 summarizes the parameter choices for each method. With the standard approach, a population size of 100 networks was found to be more effective than populations of 50 or 200. Keeping the number of network evaluations per generation constant across the approaches, a neuron population size of 200 for SANE1 and 800 for Hierarchical SANE performed well.

                             SANE1        Standard     Hierarchical SANE
    Neuron population size   200          -            800
    Network population size  -            100          100
    Selection strategy       Elite/Rank   Tournament   Elite/Rank

Table 2: Summary of the parameter settings that were found effective for each approach.

SANE1 requires a smaller population than Hierarchical SANE because its neurons are evaluated through random combinations. The population must be small enough to allow neurons to participate in several networks per generation. For example, randomly selecting 8 neurons for each of 100 networks from a 200-neuron population gives each neuron an expected participation rate of 4 networks per generation. In Hierarchical SANE, the neuron population is not as restricted, since neuron participation is dictated by the network individuals. Hierarchical SANE skews the participation rate towards the best neurons and leaves many neurons unevaluated in each generation. An unevaluated neuron is normally garbage, since no network individual uses it to build a network.

Several mutation rates were tried for the standard approach; the conservative rate of 0.1% per bit performed best. The same mutation rate was used for SANE1 and Hierarchical SANE. Not surprisingly, the aggressive genetic selection strategy described in section 3.1.1 performed poorly in the standard neuro-evolution approach. Some good solutions were found quickly; however, the populations often converged to suboptimal solutions and essentially became stuck. A 2-way tournament selection strategy produced more consistent and better average results for the standard neuro-evolution approach.

Each neural network evaluation began with random, but legal, joint positions and a random target position. The same reach space as van der Smagt (1995) was employed, where targets were placed within a 180-degree rotation of the first joint. A total of 450 target positions were created and separated into a 400-position training set and a 50-position test set. During evolution, a target position was randomly selected from the training set before each network evaluation. During each trial, a network was allowed to move the arm until one of the following conditions occurred:

1. The network stopped the arm.
2. The network placed the arm in an illegal position (e.g., hitting the floor).
3. The number of moves exceeded 25.

The score for each trial was computed as the percentage of the distance that the arm covered from its initial starting point to the target position. For example, if the arm started 120 cm from the target and its final position was 20 cm from the target, the network received a score of (120 - 20)/120 = 0.83. The percentage of distance covered, instead of the absolute final distance, provides a fairer comparison between a network that receives a close target and a network that receives a distant target. Each network was evaluated over a single randomly selected target.
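A sketch of this trial-scoring scheme, with network_move standing in for the evolved controller plus one simulator step (an assumed callback, not part of Simderella's API):

    def run_trial(network_move, start_dist, max_moves=25):
        """Score one evaluation trial (section 4.2). `network_move` is an
        assumed callback that advances the arm one time step and returns the
        new distance to the target, or None if the network stopped the arm
        or reached an illegal position. The score is the fraction of the
        starting distance that was covered."""
        dist = start_dist
        for _ in range(max_moves):              # condition 3: 25-move limit
            new_dist = network_move(dist)
            if new_dist is None:                # conditions 1 and 2
                break
            dist = new_dist
        return (start_dist - dist) / start_dist

    # The paper's example: starting 120 cm away and stopping 20 cm away
    # scores (120 - 20) / 120 = 0.83.
    moves = iter([60.0, 20.0, None])
    print(round(run_trial(lambda d: next(moves), 120.0), 2))   # -> 0.83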

4.3 Learning Speed Results

Twenty simulations of each approach were run, each for 100 generations, which requires 10,000 network evaluations. The best network of each generation (according to fitness) was then tested on the 50-target test set. Table 3 shows the average number of generations necessary to evolve a network controller that, on average, places the arm within 30 cm and within 10 cm of the targets in the test set. Simulations that could not find a network with average performance below 10 cm were allowed to evolve beyond 100 generations.

Table 3: The average number of generations (mean, best, worst, and standard deviation over 20 simulations) required to find a network that averaged less than 30 cm and less than 10 cm distance from the 50 test targets, for SANE1, standard neuro-evolution, and Hierarchical SANE. SANE1 found a solution that averages less than 10 cm in only 9 out of 20 simulations. The hierarchical approach required only about half as many generations as the standard approach.

SANE1 found solutions averaging less than 30 cm faster than the standard neuro-evolution approach; however, it was ineffective at the 10 cm level, where it found solutions in only 9 out of 20 simulations. The hierarchical approach proved the most efficient, requiring half the generations of the standard approach at both the 30 cm and 10 cm levels.

Figure 5 plots the average distance from the targets in the test set per generation. The distance refers to the best distance found at or before each generation during a single simulation; the average distance refers to these best distances averaged over 20 simulations. The graph effectively illustrates the problems of the neuron evolution in SANE1. Good solutions are found early, but the population eventually stalls and cannot generate better solutions. The slower standard approach eventually surpasses SANE1 and continues to improve towards the best solutions. Hierarchical SANE, however, achieves the same early efficiency as SANE1 and continues to generate better solutions throughout the evolution.

4.4 Adaptation Results

The second set of experiments tested the ability to adapt to changes in the domain. Populations were evolved as described above until a network was found that averaged less than 10 cm over the test set. The domain was then changed by removing the information about the position of the first joint. van der Smagt (1995) showed that correct control decisions can be generated without knowledge of the first joint position; there is sufficient information in the hand camera and the remaining joints to compute the position of the first joint. Thus, by evolving with the first-joint information and then removing it, the population must adapt its network controllers to infer the first-joint information from the other input units.

Figure 6 shows the adaptive performance of the network-level and hierarchical approaches. SANE1 is not plotted, since it failed to reliably find solutions averaging less than 10 cm. While Hierarchical SANE was able to adapt its control policy to reach the 10 cm target within 10 generations, the standard approach adapted much more slowly, requiring 24 generations on average. Increasing the mutation rate did produce faster adaptation in the standard approach, but it hindered the original learning rate.

Figure 5: The average distance from 50 randomly placed targets per generation for each level of evolution, averaged over 20 simulations.

Figure 6: Adaptation comparison between Hierarchical SANE and the standard approach. After the first-joint information is removed, Hierarchical SANE requires an average of 10 generations to recover to the 10 cm mark, while the standard approach requires 24.

Figure 7: The number of network participations for each neuron in the neuron population, with neurons ranked by fitness. Under the hierarchical approach, the rate of participation is biased towards the best neurons.

4.5 Analyzing Blueprint Evolution

As described earlier, one of the advantages of the hierarchical approach is that neuron participation is biased towards the top-performing neurons. In other words, the best neurons participate in more networks than the newer or poorer neurons. Figure 7 shows the typical rate of participation in the neuron population in the last generation. The data was collected from a single simulation; however, other simulations exhibited very similar behavior. Of the 800 opportunities for participation[2], 40% were filled by neurons ranked in the top 6% of the population and 78% by neurons in the top 25%. Neurons ranked from 300 to 750 did not participate in any networks. Neurons ranked at the bottom of the population were evaluated but performed so poorly (e.g., moving the arm away from the target) that they were ranked below neurons with no participations.

[2] 100 networks are formed per generation, each with 8 neurons.
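The participation counts behind figure 7 can be recomputed from the blueprint pointers alone. A small sketch (all names assumed):

    from collections import Counter

    def participation_by_rank(blueprints, ranked_neuron_ids):
        """Count how many networks each neuron joined in one generation
        (cf. figure 7). `blueprints` is a list of pointer lists (neuron ids),
        and `ranked_neuron_ids` lists neuron ids best-first, so the result
        aligns participation counts with fitness rank."""
        counts = Counter(nid for bp in blueprints for nid in bp)
        return [counts.get(nid, 0) for nid in ranked_neuron_ids]

    # Toy usage: 3 blueprints over 6 ranked neurons; with 100 blueprints of
    # 8 pointers there would be 800 participation slots, skewed to the top.
    print(participation_by_rank([[0, 1, 2], [0, 1, 3], [0, 2, 4]],
                                [0, 1, 2, 3, 4, 5]))   # -> [3, 2, 2, 1, 1, 0]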

Table 4 shows the neurons pointed to by the top 20 network blueprints during the last generation of a single simulation. The neuron pointer numbers refer to the rank of each neuron in the population after the fitness had been distributed. For example, the top blueprint included the 122nd-ranked neuron, the 135th-ranked neuron, the top-ranked neuron, and so on.

Table 4: The neurons in the top 20 network blueprints in the last generation of a simulation using Hierarchical SANE. The neuron numbers refer to their rank in the population after their fitness was calculated. The table shows a very diverse collection of neurons and several examples of offspring neurons present in the best networks.

The first conclusion from the table is that the blueprint population is quite diverse. Blueprints 1 and 8 are the most similar, but they share only 5 of their 8 neuron pointers. Thus, the mutation strategy presented in section 3.2 appears to provide sufficient diversity, allowing the blueprint population to sample many different combinations of neuron pointers. Table 4 also shows that many neurons ranked well below the top are included in the top networks. Interestingly, these neurons participate in only 1 or 2 networks per generation (figure 7). Closer inspection reveals that the majority of these neurons are new offspring neurons generated during the previous generation. Thus, from the prevalent use of these neurons in the top blueprints, it can be concluded that the blueprint evolution is making effective use of the new neural structures created by the neuron evolution.

5 Conclusion

Evolving neural networks at the neuron level offers important advantages over the more standard evolution of complete neural networks. Neuron evolution maintains population diversity and provides more accurate evaluation of the genetic building blocks, which produces a faster, more efficient genetic search. Unfortunately, neuron evolution alone, as in the SANE system, often cannot build upon the top neural networks in later generations and has difficulty pinpointing the best solutions. In contrast, the slower standard neuro-evolution approach is more proficient at fine-tuning and can reach the best solutions more reliably. Incorporating an outer-loop network-blueprint evolution on top of SANE's neuron evolution combines the advantages of both approaches by focusing the search on the best neuron combinations. As demonstrated in a sophisticated robot arm manipulation task, Hierarchical SANE consistently reached the desired level of proficiency in half as many generations as a standard neuro-evolution approach. Hierarchical SANE is an important extension of the SANE system, since it makes it possible to tackle more challenging tasks that require both timely and precise solutions. In the future, Hierarchical SANE will be applied to such tasks, including robot arm control in uncertain environments and under sensor noise.

Acknowledgments

The authors would like to thank Patrick van der Smagt for making his Simderella simulator available. This research was supported in part by the National Science Foundation under grant #IRI.

References

Barto, A. G., Sutton, R. S., and Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13:834-846.

Belew, R. K., McInerney, J., and Schraudolph, N. N. (1991). Evolving networks: Using the genetic algorithm with connectionist learning. In Farmer, J. D., Langton, C., Rasmussen, S., and Taylor, C., editors, Artificial Life II. Reading, MA: Addison-Wesley.

Cliff, D., Harvey, I., and Husbands, P. (1993). Explorations in evolutionary robotics. Adaptive Behavior, 2:73-110.

Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley.

Kawato, M. (1990). Computational schemes and neural network models for formation and control of multijoint arm trajectory. In Neural Networks for Control. Cambridge, MA: MIT Press.

Koza, J. R., and Rice, J. P. (1991). Genetic generation of both the weights and architecture for a neural network. In International Joint Conference on Neural Networks, vol. 2, 397-404. New York, NY: IEEE.

Miller, W. T. (1989). Real-time application of neural networks for sensor-based control of robots with vision. IEEE Transactions on Systems, Man, and Cybernetics, 19(4):825-831.

Moriarty, D. E., and Miikkulainen, R. (1996). Efficient reinforcement learning through symbiotic evolution. Machine Learning, 22:11-32.

Nolfi, S., Floreano, D., Miglino, O., and Mondada, F. (1994). How to evolve autonomous robots: Different approaches in evolutionary robotics. In Artificial Life IV. Cambridge, MA: MIT Press.

Nolfi, S., and Parisi, D. (1992). Growing neural networks. In Artificial Life III. Reading, MA: Addison-Wesley.

van der Smagt, P. (1994). Simderella: A robot simulator for neuro-controller design. Neurocomputing, 6(2).

van der Smagt, P. (1995). Visual Robot Arm Guidance using Neural Networks. PhD thesis, University of Amsterdam, Amsterdam, The Netherlands.

Watkins, C. J. C. H. (1989). Learning from Delayed Rewards. PhD thesis, University of Cambridge, England.

Werbos, P. J. (1992). Neurocontrol and supervised learning: An overview and evaluation. In Handbook of Intelligent Control, 65-89. New York: Van Nostrand Reinhold.

Whitley, D., Dominic, S., Das, R., and Anderson, C. W. (1993). Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13:259-284.

Yamauchi, B. M., and Beer, R. D. (1993). Sequential behavior and learning in evolved dynamical neural networks. Adaptive Behavior, 2:219-246.


More information

Rule Acquisition for Cognitive Agents by Using Estimation of Distribution Algorithms

Rule Acquisition for Cognitive Agents by Using Estimation of Distribution Algorithms Rule cquisition for Cognitive gents by Using Estimation of Distribution lgorithms Tokue Nishimura and Hisashi Handa Graduate School of Natural Science and Technology, Okayama University Okayama 700-8530,

More information

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Jeff Clune Assistant Professor Evolving Artificial Intelligence Laboratory Announcements Be making progress on your projects! Three Types of Learning Unsupervised Supervised Reinforcement

More information

Artificial Intelligence (AI) Common AI Methods. Training. Signals to Perceptrons. Artificial Neural Networks (ANN) Artificial Intelligence

Artificial Intelligence (AI) Common AI Methods. Training. Signals to Perceptrons. Artificial Neural Networks (ANN) Artificial Intelligence Artificial Intelligence (AI) Artificial Intelligence AI is an attempt to reproduce intelligent reasoning using machines * * H. M. Cartwright, Applications of Artificial Intelligence in Chemistry, 1993,

More information

Choosing Variables with a Genetic Algorithm for Econometric models based on Neural Networks learning and adaptation.

Choosing Variables with a Genetic Algorithm for Econometric models based on Neural Networks learning and adaptation. Choosing Variables with a Genetic Algorithm for Econometric models based on Neural Networks learning and adaptation. Daniel Ramírez A., Israel Truijillo E. LINDA LAB, Computer Department, UNAM Facultad

More information

Q-Learning in Continuous State Action Spaces

Q-Learning in Continuous State Action Spaces Q-Learning in Continuous State Action Spaces Alex Irpan alexirpan@berkeley.edu December 5, 2015 Contents 1 Introduction 1 2 Background 1 3 Q-Learning 2 4 Q-Learning In Continuous Spaces 4 5 Experimental

More information

An Effective Chromosome Representation for Evolving Flexible Job Shop Schedules

An Effective Chromosome Representation for Evolving Flexible Job Shop Schedules An Effective Chromosome Representation for Evolving Flexible Job Shop Schedules Joc Cing Tay and Djoko Wibowo Intelligent Systems Lab Nanyang Technological University asjctay@ntuedusg Abstract As the Flexible

More information

Data Mining Part 5. Prediction

Data Mining Part 5. Prediction Data Mining Part 5. Prediction 5.5. Spring 2010 Instructor: Dr. Masoud Yaghini Outline How the Brain Works Artificial Neural Networks Simple Computing Elements Feed-Forward Networks Perceptrons (Single-layer,

More information

Evolutionary Design I

Evolutionary Design I Evolutionary Design I Jason Noble jasonn@comp.leeds.ac.uk Biosystems group, School of Computing Evolutionary Design I p.1/29 This lecture Harnessing evolution in a computer program How to construct a genetic

More information

Pengju

Pengju Introduction to AI Chapter04 Beyond Classical Search Pengju Ren@IAIR Outline Steepest Descent (Hill-climbing) Simulated Annealing Evolutionary Computation Non-deterministic Actions And-OR search Partial

More information

Designer Genetic Algorithms: Genetic Algorithms in Structure Design. Inputs. C 0 A0 B0 A1 B1.. An Bn

Designer Genetic Algorithms: Genetic Algorithms in Structure Design. Inputs. C 0 A0 B0 A1 B1.. An Bn A B C Sum Carry Carry Designer Genetic Algorithms: Genetic Algorithms in Structure Design Sushil J. Louis Department of Computer Science Indiana University, Bloomington, IN 47405 louis@iuvax.cs.indiana.edu

More information

Simple Model of Evolution. Reed College FAX (505) Abstract. of evolution. The model consists of a two-dimensional world with a

Simple Model of Evolution. Reed College FAX (505) Abstract. of evolution. The model consists of a two-dimensional world with a Adaptation of Mutation Rates in a Simple Model of Evolution Mark A. Bedau Reed College Robert Seymour 3213 SE Woodstock Blvd., Portland, OR 97202, USA FAX (505) 777-7769 mab@reed.edu rseymour@reed.edu

More information

Electric Load Forecasting Using Wavelet Transform and Extreme Learning Machine

Electric Load Forecasting Using Wavelet Transform and Extreme Learning Machine Electric Load Forecasting Using Wavelet Transform and Extreme Learning Machine Song Li 1, Peng Wang 1 and Lalit Goel 1 1 School of Electrical and Electronic Engineering Nanyang Technological University

More information

Artificial Neural Networks Examination, June 2005

Artificial Neural Networks Examination, June 2005 Artificial Neural Networks Examination, June 2005 Instructions There are SIXTY questions. (The pass mark is 30 out of 60). For each question, please select a maximum of ONE of the given answers (either

More information

Approximating Q-values with Basis Function Representations. Philip Sabes. Department of Brain and Cognitive Sciences

Approximating Q-values with Basis Function Representations. Philip Sabes. Department of Brain and Cognitive Sciences Approximating Q-values with Basis Function Representations Philip Sabes Department of Brain and Cognitive Sciences Massachusetts Institute of Technology Cambridge, MA 39 sabes@psyche.mit.edu The consequences

More information

Function Optimization Using Connectionist Reinforcement Learning Algorithms Ronald J. Williams and Jing Peng College of Computer Science Northeastern

Function Optimization Using Connectionist Reinforcement Learning Algorithms Ronald J. Williams and Jing Peng College of Computer Science Northeastern Function Optimization Using Connectionist Reinforcement Learning Algorithms Ronald J. Williams and Jing Peng College of Computer Science Northeastern University Appears in Connection Science, 3, pp. 241-268,

More information

The Bias-Variance dilemma of the Monte Carlo. method. Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel

The Bias-Variance dilemma of the Monte Carlo. method. Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel The Bias-Variance dilemma of the Monte Carlo method Zlochin Mark 1 and Yoram Baram 1 Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel fzmark,baramg@cs.technion.ac.il Abstract.

More information

arxiv: v1 [cs.ai] 1 Jul 2015

arxiv: v1 [cs.ai] 1 Jul 2015 arxiv:507.00353v [cs.ai] Jul 205 Harm van Seijen harm.vanseijen@ualberta.ca A. Rupam Mahmood ashique@ualberta.ca Patrick M. Pilarski patrick.pilarski@ualberta.ca Richard S. Sutton sutton@cs.ualberta.ca

More information

Gaussian Process Regression: Active Data Selection and Test Point. Rejection. Sambu Seo Marko Wallat Thore Graepel Klaus Obermayer

Gaussian Process Regression: Active Data Selection and Test Point. Rejection. Sambu Seo Marko Wallat Thore Graepel Klaus Obermayer Gaussian Process Regression: Active Data Selection and Test Point Rejection Sambu Seo Marko Wallat Thore Graepel Klaus Obermayer Department of Computer Science, Technical University of Berlin Franklinstr.8,

More information

Lecture 9 Evolutionary Computation: Genetic algorithms

Lecture 9 Evolutionary Computation: Genetic algorithms Lecture 9 Evolutionary Computation: Genetic algorithms Introduction, or can evolution be intelligent? Simulation of natural evolution Genetic algorithms Case study: maintenance scheduling with genetic

More information

In Advances in Neural Information Processing Systems 6. J. D. Cowan, G. Tesauro and. Convergence of Indirect Adaptive. Andrew G.

In Advances in Neural Information Processing Systems 6. J. D. Cowan, G. Tesauro and. Convergence of Indirect Adaptive. Andrew G. In Advances in Neural Information Processing Systems 6. J. D. Cowan, G. Tesauro and J. Alspector, (Eds.). Morgan Kaufmann Publishers, San Fancisco, CA. 1994. Convergence of Indirect Adaptive Asynchronous

More information

The Markov Decision Process Extraction Network

The Markov Decision Process Extraction Network The Markov Decision Process Extraction Network Siegmund Duell 1,2, Alexander Hans 1,3, and Steffen Udluft 1 1- Siemens AG, Corporate Research and Technologies, Learning Systems, Otto-Hahn-Ring 6, D-81739

More information

Artificial Neural Networks Examination, March 2004

Artificial Neural Networks Examination, March 2004 Artificial Neural Networks Examination, March 2004 Instructions There are SIXTY questions (worth up to 60 marks). The exam mark (maximum 60) will be added to the mark obtained in the laborations (maximum

More information

On the Structure of Low Autocorrelation Binary Sequences

On the Structure of Low Autocorrelation Binary Sequences On the Structure of Low Autocorrelation Binary Sequences Svein Bjarte Aasestøl University of Bergen, Bergen, Norway December 1, 2005 1 blank 2 Contents 1 Introduction 5 2 Overview 5 3 Denitions 6 3.1 Shift

More information

CS 6501: Deep Learning for Computer Graphics. Basics of Neural Networks. Connelly Barnes

CS 6501: Deep Learning for Computer Graphics. Basics of Neural Networks. Connelly Barnes CS 6501: Deep Learning for Computer Graphics Basics of Neural Networks Connelly Barnes Overview Simple neural networks Perceptron Feedforward neural networks Multilayer perceptron and properties Autoencoders

More information

AND AND AND AND NOT NOT

AND AND AND AND NOT NOT Genetic Programming Using a Minimum Description Length Principle Hitoshi IBA 1 Hugo de GARIS 2 Taisuke SATO 1 1)Machine Inference Section, Electrotechnical Laboratory 1-1-4 Umezono, Tsukuba-city, Ibaraki,

More information

by the size of state/action space (and by the trial length which maybeproportional to this). Lin's oine Q() (993) creates an action-replay set of expe

by the size of state/action space (and by the trial length which maybeproportional to this). Lin's oine Q() (993) creates an action-replay set of expe Speeding up Q()-learning In Proceedings of the Tenth European Conference on Machine Learning (ECML'98) Marco Wiering and Jurgen Schmidhuber IDSIA, Corso Elvezia 36 CH-69-Lugano, Switzerland Abstract. Q()-learning

More information

Representation and Hidden Bias II: Eliminating Defining Length Bias in Genetic Search via Shuffle Crossover

Representation and Hidden Bias II: Eliminating Defining Length Bias in Genetic Search via Shuffle Crossover Representation and Hidden Bias II: Eliminating Defining Length Bias in Genetic Search via Shuffle Crossover Abstract The traditional crossover operator used in genetic search exhibits a position-dependent

More information

A Logarithmic Neural Network Architecture for Unbounded Non-Linear Function Approximation

A Logarithmic Neural Network Architecture for Unbounded Non-Linear Function Approximation 1 Introduction A Logarithmic Neural Network Architecture for Unbounded Non-Linear Function Approximation J Wesley Hines Nuclear Engineering Department The University of Tennessee Knoxville, Tennessee,

More information

PRIME GENERATING LUCAS SEQUENCES

PRIME GENERATING LUCAS SEQUENCES PRIME GENERATING LUCAS SEQUENCES PAUL LIU & RON ESTRIN Science One Program The University of British Columbia Vancouver, Canada April 011 1 PRIME GENERATING LUCAS SEQUENCES Abstract. The distribution of

More information

Finding Succinct. Ordered Minimal Perfect. Hash Functions. Steven S. Seiden 3 Daniel S. Hirschberg 3. September 22, Abstract

Finding Succinct. Ordered Minimal Perfect. Hash Functions. Steven S. Seiden 3 Daniel S. Hirschberg 3. September 22, Abstract Finding Succinct Ordered Minimal Perfect Hash Functions Steven S. Seiden 3 Daniel S. Hirschberg 3 September 22, 1994 Abstract An ordered minimal perfect hash table is one in which no collisions occur among

More information

The Constrainedness Knife-Edge. Toby Walsh. APES Group. Glasgow, Scotland.

The Constrainedness Knife-Edge. Toby Walsh. APES Group. Glasgow, Scotland. Abstract A general rule of thumb is to tackle the hardest part of a search problem rst. Many heuristics therefore try to branch on the most constrained variable. To test their eectiveness at this, we measure

More information

AN EVOLUTIONARY ALGORITHM TO ESTIMATE UNKNOWN HEAT FLUX IN A ONE- DIMENSIONAL INVERSE HEAT CONDUCTION PROBLEM

AN EVOLUTIONARY ALGORITHM TO ESTIMATE UNKNOWN HEAT FLUX IN A ONE- DIMENSIONAL INVERSE HEAT CONDUCTION PROBLEM Proceedings of the 5 th International Conference on Inverse Problems in Engineering: Theory and Practice, Cambridge, UK, 11-15 th July 005. AN EVOLUTIONARY ALGORITHM TO ESTIMATE UNKNOWN HEAT FLUX IN A

More information

Evolutionary Computation Theory. Jun He School of Computer Science University of Birmingham Web: jxh

Evolutionary Computation Theory. Jun He School of Computer Science University of Birmingham Web:   jxh Evolutionary Computation Theory Jun He School of Computer Science University of Birmingham Web: www.cs.bham.ac.uk/ jxh Outline Motivation History Schema Theorem Convergence and Convergence Rate Computational

More information

An Adaptive Clustering Method for Model-free Reinforcement Learning

An Adaptive Clustering Method for Model-free Reinforcement Learning An Adaptive Clustering Method for Model-free Reinforcement Learning Andreas Matt and Georg Regensburger Institute of Mathematics University of Innsbruck, Austria {andreas.matt, georg.regensburger}@uibk.ac.at

More information

Proceedings of the Seventh Oklahoma Conference on Articial Intelligence, pp , November, Objective Function

Proceedings of the Seventh Oklahoma Conference on Articial Intelligence, pp , November, Objective Function Proceedings of the Seventh Oklahoma Conference on Articial Intelligence, pp. 200-206, November, 1993. The Performance of a Genetic Algorithm on a Chaotic Objective Function Arthur L. Corcoran corcoran@penguin.mcs.utulsa.edu

More information

Improving Coordination via Emergent Communication in Cooperative Multiagent Systems: A Genetic Network Programming Approach

Improving Coordination via Emergent Communication in Cooperative Multiagent Systems: A Genetic Network Programming Approach Proceedings of the 2009 IEEE International Conference on Systems, Man, and Cybernetics San Antonio, TX, USA - October 2009 Improving Coordination via Emergent Communication in Cooperative Multiagent Systems:

More information

Search. Search is a key component of intelligent problem solving. Get closer to the goal if time is not enough

Search. Search is a key component of intelligent problem solving. Get closer to the goal if time is not enough Search Search is a key component of intelligent problem solving Search can be used to Find a desired goal if time allows Get closer to the goal if time is not enough section 11 page 1 The size of the search

More information

EVOLUTIONARY COMPUTATION. Th. Back, Leiden Institute of Advanced Computer Science, Leiden University, NL and

EVOLUTIONARY COMPUTATION. Th. Back, Leiden Institute of Advanced Computer Science, Leiden University, NL and EVOLUTIONARY COMPUTATION Th. Back, Leiden Institute of Advanced Computer Science, Leiden University, NL and divis digital solutions GmbH, Dortmund, D. Keywords: adaptation, evolution strategy, evolutionary

More information

STATE GENERALIZATION WITH SUPPORT VECTOR MACHINES IN REINFORCEMENT LEARNING. Ryo Goto, Toshihiro Matsui and Hiroshi Matsuo

STATE GENERALIZATION WITH SUPPORT VECTOR MACHINES IN REINFORCEMENT LEARNING. Ryo Goto, Toshihiro Matsui and Hiroshi Matsuo STATE GENERALIZATION WITH SUPPORT VECTOR MACHINES IN REINFORCEMENT LEARNING Ryo Goto, Toshihiro Matsui and Hiroshi Matsuo Department of Electrical and Computer Engineering, Nagoya Institute of Technology

More information

Bias-Variance Error Bounds for Temporal Difference Updates

Bias-Variance Error Bounds for Temporal Difference Updates Bias-Variance Bounds for Temporal Difference Updates Michael Kearns AT&T Labs mkearns@research.att.com Satinder Singh AT&T Labs baveja@research.att.com Abstract We give the first rigorous upper bounds

More information

ASHANTHI MEENA SERALATHAN. Haverford College, Computer Science Department Advisors: Dr. Deepak Kumar Dr. Douglas Blank Dr.

ASHANTHI MEENA SERALATHAN. Haverford College, Computer Science Department Advisors: Dr. Deepak Kumar Dr. Douglas Blank Dr. USING ADAPTIVE LEARNING ALGORITHMS TO MAKE COMPLEX STRATEGICAL DECISIONS ASHANTHI MEENA SERALATHAN Haverford College, Computer Science Department Advisors: Dr. Deepak Kumar Dr. Douglas Blank Dr. Stephen

More information

V. Evolutionary Computing. Read Flake, ch. 20. Genetic Algorithms. Part 5A: Genetic Algorithms 4/10/17. A. Genetic Algorithms

V. Evolutionary Computing. Read Flake, ch. 20. Genetic Algorithms. Part 5A: Genetic Algorithms 4/10/17. A. Genetic Algorithms V. Evolutionary Computing A. Genetic Algorithms 4/10/17 1 Read Flake, ch. 20 4/10/17 2 Genetic Algorithms Developed by John Holland in 60s Did not become popular until late 80s A simplified model of genetics

More information

CSC 4510 Machine Learning

CSC 4510 Machine Learning 10: Gene(c Algorithms CSC 4510 Machine Learning Dr. Mary Angela Papalaskari Department of CompuBng Sciences Villanova University Course website: www.csc.villanova.edu/~map/4510/ Slides of this presenta(on

More information

Ajit Narayanan and Mark Moore. University of Exeter

Ajit Narayanan and Mark Moore. University of Exeter Quantum-inspired genetic algorithms Ajit Narayanan and Mark Moore Department of Computer Science University of Exeter Exeter, United Kingdom, EX4 4PT ajit@dcs.exeter.ac.uk Abstract A novel evolutionary

More information

Abstract: Q-learning as well as other learning paradigms depend strongly on the representation

Abstract: Q-learning as well as other learning paradigms depend strongly on the representation Ecient Q-Learning by Division of Labor Michael Herrmann RIKEN, Lab. f. Information Representation 2-1 Hirosawa, Wakoshi 351-01, Japan michael@sponge.riken.go.jp Ralf Der Universitat Leipzig, Inst. f. Informatik

More information

Evolving Neural Networks through Augmenting Topologies

Evolving Neural Networks through Augmenting Topologies Evolving Neural Networks through Augmenting Topologies Kenneth O. Stanley kstanley@cs.utexas.edu Department of Computer Sciences, The University of Texas at Austin, Austin, TX 78712, USA Risto Miikkulainen

More information

Abstract In this paper we consider a parallel and distributed computation of genetic algorithms on loosely-coupled multiprocessor systems. Genetic Alg

Abstract In this paper we consider a parallel and distributed computation of genetic algorithms on loosely-coupled multiprocessor systems. Genetic Alg A Parallel and Distributed Genetic Algorithm on Loosely-coupled Multiprocessor Systems 1998 Graduate School of Engineering Electrical and Information Engineering University of the Ryukyus Takashi MATSUMURA

More information

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)

More information

ECE521 Lecture 7/8. Logistic Regression

ECE521 Lecture 7/8. Logistic Regression ECE521 Lecture 7/8 Logistic Regression Outline Logistic regression (Continue) A single neuron Learning neural networks Multi-class classification 2 Logistic regression The output of a logistic regression

More information

Perceptron. (c) Marcin Sydow. Summary. Perceptron

Perceptron. (c) Marcin Sydow. Summary. Perceptron Topics covered by this lecture: Neuron and its properties Mathematical model of neuron: as a classier ' Learning Rule (Delta Rule) Neuron Human neural system has been a natural source of inspiration for

More information

Journal of Algorithms Cognition, Informatics and Logic. Neuroevolution strategies for episodic reinforcement learning

Journal of Algorithms Cognition, Informatics and Logic. Neuroevolution strategies for episodic reinforcement learning J. Algorithms 64 (2009) 152 168 Contents lists available at ScienceDirect Journal of Algorithms Cognition, Informatics and Logic www.elsevier.com/locate/jalgor Neuroevolution strategies for episodic reinforcement

More information

Reinforcement Learning: the basics

Reinforcement Learning: the basics Reinforcement Learning: the basics Olivier Sigaud Université Pierre et Marie Curie, PARIS 6 http://people.isir.upmc.fr/sigaud August 6, 2012 1 / 46 Introduction Action selection/planning Learning by trial-and-error

More information

Aijun An and Nick Cercone. Department of Computer Science, University of Waterloo. methods in a context of learning classication rules.

Aijun An and Nick Cercone. Department of Computer Science, University of Waterloo. methods in a context of learning classication rules. Discretization of Continuous Attributes for Learning Classication Rules Aijun An and Nick Cercone Department of Computer Science, University of Waterloo Waterloo, Ontario N2L 3G1 Canada Abstract. We present

More information

Open Theoretical Questions in Reinforcement Learning

Open Theoretical Questions in Reinforcement Learning Open Theoretical Questions in Reinforcement Learning Richard S. Sutton AT&T Labs, Florham Park, NJ 07932, USA, sutton@research.att.com, www.cs.umass.edu/~rich Reinforcement learning (RL) concerns the problem

More information

Neural Networks. Nicholas Ruozzi University of Texas at Dallas

Neural Networks. Nicholas Ruozzi University of Texas at Dallas Neural Networks Nicholas Ruozzi University of Texas at Dallas Handwritten Digit Recognition Given a collection of handwritten digits and their corresponding labels, we d like to be able to correctly classify

More information

IVR: Introduction to Control (IV)

IVR: Introduction to Control (IV) IVR: Introduction to Control (IV) 16/11/2010 Proportional error control (P) Example Control law: V B = M k 2 R ds dt + k 1s V B = K ( ) s goal s Convenient, simple, powerful (fast and proportional reaction

More information

Gaussian Processes for Regression. Carl Edward Rasmussen. Department of Computer Science. Toronto, ONT, M5S 1A4, Canada.

Gaussian Processes for Regression. Carl Edward Rasmussen. Department of Computer Science. Toronto, ONT, M5S 1A4, Canada. In Advances in Neural Information Processing Systems 8 eds. D. S. Touretzky, M. C. Mozer, M. E. Hasselmo, MIT Press, 1996. Gaussian Processes for Regression Christopher K. I. Williams Neural Computing

More information

epochs epochs

epochs epochs Neural Network Experiments To illustrate practical techniques, I chose to use the glass dataset. This dataset has 214 examples and 6 classes. Here are 4 examples from the original dataset. The last values

More information

Unicycling Helps Your French: Spontaneous Recovery of Associations by. Learning Unrelated Tasks

Unicycling Helps Your French: Spontaneous Recovery of Associations by. Learning Unrelated Tasks Unicycling Helps Your French: Spontaneous Recovery of Associations by Learning Unrelated Tasks Inman Harvey and James V. Stone CSRP 379, May 1995 Cognitive Science Research Paper Serial No. CSRP 379 The

More information

Basics of reinforcement learning

Basics of reinforcement learning Basics of reinforcement learning Lucian Buşoniu TMLSS, 20 July 2018 Main idea of reinforcement learning (RL) Learn a sequential decision policy to optimize the cumulative performance of an unknown system

More information

Revisiting the Edge of Chaos: Evolving Cellular Automata to Perform. Santa Fe Institute Working Paper (Submitted to Complex Systems)

Revisiting the Edge of Chaos: Evolving Cellular Automata to Perform. Santa Fe Institute Working Paper (Submitted to Complex Systems) Revisiting the Edge of Chaos: Evolving Cellular Automata to Perform Computations Melanie Mitchell 1, Peter T. Hraber 1, and James P. Crutcheld 2 Santa Fe Institute Working Paper 93-3-14 (Submitted to Complex

More information