Reconfigurable Self-Replicating Robotics


Reconfigurable Self-Replicating Robotics

Abhishek Dutta

Master of Science
School of Informatics
University of Edinburgh
2007

Abstract

Self-replication has to date been demonstrated only in a pre-planned manner, via cellular automata, or as deterministic assembly of machines, and then only sparsely, so the area lacks significant momentum. In this thesis I propose to fill this void by adopting a novel approach: growing a neural network topology to control a modular robot morphology and attaining full-fledged self-replication through processes of fusion and fission in a simulated world with sophisticated physics. I have developed a new model, the Reinforced Central Pattern Generated Focussed Time Delay Neural Network, and deployed it over a stable and dockable cylindrical modular morphology. The model also develops novel manoeuvring capabilities directed towards self-replication. The vision is to auto-colonise vacant parts of space using these exponentially self-replicating creatures.

Acknowledgements

I am deeply indebted to my supervisor Eric McKenzie for accepting this project, believing in me and supporting me throughout. I am also grateful to Corrado Priami for his guidance when I was at the University of Trento, Italy. I extend my gratitude to God almighty and my loving parents for always being there for me. Finally, I acknowledge Erasmus Mundus for funding this research.

Declaration

I declare that this thesis was composed by myself, that the work contained herein is my own except where explicitly stated otherwise in the text, and that this work has not been submitted for any other degree or professional qualification except as specified.

(Abhishek Dutta)

Table of Contents

1 Introduction
2 Background
   2.1 Creature Morphology
   2.2 Creature Control
   2.3 Creature Evolution
   2.4 Self Replication
3 Design
   3.1 Creature Morphology
   3.2 Recognition of Self & Homing
   3.3 Creature Control
      The RCPGFTDNN
      The Focussed Time Delay Dynamic Neural Network
      Central Pattern Generated Training
      Reinforcement Based Fixed Stochastic Behaviour
4 Implementation
   Creature Morphology
   Evolving Controller With Morphology For Self Replication
   Architecture
   Self Replication (Stage-1)
   Self Replication (Stage-2)
   Self Replication (Stage-3)
   Self Replication (Stage-4)
   Physical Simulation
5 Evaluation
   5.1 Creature Morphology
   Network Controller
   Manoeuvring Innovation
   State Space
6 Conclusions & Future Work
A Code Snippets
   A.1 Robot Server (Stage-1)
   A.2 Client Controller (Stage-1)
Bibliography

Chapter 1

Introduction

The morphologies of creatures and the neural systems controlling their muscle forces can both be generated automatically using genetic algorithms. Different fitness evaluation functions can be used to direct simulated evolution towards specific behaviours such as walking, following and replicating. A genetic language that uses nodes and connections as its primitive elements to represent directed graphs can be used to describe both the morphology and the neural circuitry of these creatures. This genetic language defines a hyperspace containing an indefinite number of possible creatures and behaviours, and when it is searched using optimisation techniques, a variety of successful and interesting locomotion strategies emerge, some of which would be difficult to invent or build by design (Sims, 1994). The evolutionary algorithms employed in evolving an embodied agent's brain allow it to learn via a reinforcement style of trial and error known as neuroevolution (NE). An embodied agent is an autonomous living creature, subject to the constraints of its environment (Ruebsamen, 2002). However, a common problem with using evolutionary computation techniques to evolve intelligent behaviours in embodied agents is that the simplicity of the environment and overall system often precludes any life-like behaviours from emerging.

Self-replication is a process critical to both natural and artificial life, but to date it has been investigated mostly in the context of replicating programs and cellular automata (Moore, 1970). Self-repair and self-reconfiguration have been explored by many authors from the perspective of cellular machines, using electromagnets and diffusion (Murata et al., 1994) (Yoshida et al., 1999). In contrast, we see self-replication (or

self-reproduction) as a continuum, quantifiable by the amount of information being replicated. Physical machines capable of self-reproduction have been scarcely discussed, because it is often difficult to build interesting or realistic virtual entities and still maintain control over them. Self-replication can be defined as the emergence of a system given that one instance of the system is already present in the environment. A machine is considered replicated only when the new copy is identical to and detached from its parent. Self-replication leads to exponential growth, and would allow as few as one initial factory to spawn lunar production of materials and energy on a massive scale.

In this thesis I shall use a modular robotic framework to attain self-replication. Modular reconfigurable robotic systems composed of many modules hold three promises: versatility, robustness, and low cost. Versatility is the ability to form a large variety of shapes with large numbers of degrees of freedom. Robustness is the ability of the system to self-repair: if one module fails, another can replace it. Finally, economies of scale come into play and the per-module cost goes down (Yim et al., 2000).

In this dissertation, the first section covers the background work done so far, from the vast area of evolutionary robotics and artificial life to reconfigurable self-replicating modular robots. The next section covers the design of my model: the cylindrical modular morphology and the novel Reinforced Central Pattern Generated Focussed Time Delay Neural Network, as a controller over the growing morphology directed towards self-replication, are discussed in depth there.
I then present the implementation of full-fledged self-replication in terms of the architecture used to model the combined morphology and controller through all its stages of formation and division into replicas, emphasising yet another novel approach to manoeuvring based on channelising CPGs. Finally I conclude with a critical evaluation of my model against many others, highlighting the results in terms of successes as well as bottlenecks.

Chapter 2

Background

2.1 Creature Morphology

The genetic representation of the morphology can be a directed graph of nodes and connections, as shown in figure 2.1. Each graph contains the developmental instructions for growing a creature, and provides a way of reusing instructions to make similar or recursive components within the creature. The graph can be recurrent, with recursive limits. Each node in the graph contains information describing a rigid part: the dimensions determine the physical shape of the part, and a joint type determines the constraints on the relative motion between this part and its parent by defining the number of degrees of freedom of the joint and the movement allowed for each degree of freedom via joint limits. Each connection also carries information. The nodes contain the sensors, neurons, and effectors, and the connections define the flow of signals between these nodes. The placement of a child part relative to its parent is decomposed into position, orientation, scale, and reflection, so each can be mutated independently (Sims, 1994).

2.2 Creature Control

Here a developmental process is used to generate the creatures and their control systems, which allows similar components, including their local neural circuitry, to be defined once and then replicated. However, creatures may have some neurons not associated

Figure 2.1: Genotype & Phenotype (Sims, 1994)

Figure 2.2: Evolved creatures

with any part, thus producing global synchronisation or centralised control. The neurons and effectors within a part can receive signals from sensors or neurons in their parent part or in their child parts. Some functions compute an output directly from their inputs, while others, such as the oscillators, retain some state and can give time-varying outputs even when their inputs are constant. The nodes contain the sensors, neurons, and effectors, and the connections define the flow of signals between these nodes, which can be recurrent in nature.

A Lindenmayer system (L-system) can be used as the generative specification for both body and brain, and is optimised by an evolutionary algorithm (EA). To realise this, for connecting neuron A to neuron B, the following commands could be used with weight parameter n:

decrease-weight(n) subtracts n from the weight of the current link.
duplicate(n) creates a new link.
increase-weight(n) adds n to the weight of the current link.
loop(n) creates a new link from a neuron to itself.
merge(n) merges neuron A into B.
output(n) creates an output neuron with the desired transfer function.
reverse() reverses the link.
split(n) inserts a new neuron C, with a link of weight n from B to C.

To give the neural controller control of the body, for every morphological connection the output(1) command is called. Using these rules, the network of the creature is shown in figure 2.2. It moves by twisting itself to move sideways (Hornby and Pollack, 2001).

Figure 2.3: CPG (Sims, 1994)

Another approach is for the controller to be a central pattern generator (CPG), which can produce the patterns of oscillation necessary for locomotion without oscillating input (it needs only a simple excitatory signal), either from the brain or from sensory feedback (Ijspeert et al., 1998). Lamprey swimming thrust is produced by propagation of a laterally directed wave along the entire length of the body. These mechanical waves are caused by consecutive contractions of the muscles located on each side, driven directly by motoneurons located in the spinal cord. The spinal pattern generator is organised as a series of coupled local segmental oscillators, each influencing the phase of its neighbours (Sims, 1994). Figure 2.3 portrays the local (segmental) network responsible for the generation of the basic rhythmicity. The network is symmetrically organised, with motoneurons (MN) providing output to the muscles on the two sides of the body and stretch receptor neurons (SR) receiving information on the local curvature of the body. Filled circles represent inhibitory synapses while open circles represent excitation. When the brainstem input to the right-side motoneurons is increased, the lamprey takes a right turn. If the tonic input is increased only for the lower half of the network, the result is a downward-directed pitch turn (Ekeberg et al., 1995).

There exists a motion adaptation mechanism at the cerebellum. It adapts the motion signals and parameters based on signals from peripheral sensors and other mechanisms. Stable and flexible biped walking can be realised as a global limit cycle generated by a global entrainment between the rhythmic activities of the neural oscillator (N.O.) and the rhythmic movements of a musculoskeletal system (M.S.S.).
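A minimal sketch of how a few of these L-system edit commands might act on the current link from neuron A to neuron B. The Link class and the Python spellings of the commands are my own illustrative assumptions, not Hornby and Pollack's actual implementation.

```python
# Toy model of a few L-system edit commands acting on the current link A -> B.
class Link:
    def __init__(self, src, dst, weight):
        self.src, self.dst, self.weight = src, dst, weight

def increase_weight(link, n):
    link.weight += n                              # increase-weight(n)

def decrease_weight(link, n):
    link.weight -= n                              # decrease-weight(n)

def split(link, n, new_neuron="C"):
    # split(n): insert a new neuron C with a link of weight n from B to C
    return Link(link.dst, new_neuron, n)

def reverse(link):
    link.src, link.dst = link.dst, link.src       # reverse()

link = Link("A", "B", 0.5)
increase_weight(link, 0.25)       # weight becomes 0.75
new_link = split(link, 0.1)       # B -> C with weight 0.1
reverse(link)                     # link now runs B -> A
```

A rewriting system would apply sequences of such commands, generated by the L-system rules, to grow the full neural network alongside the body.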
Each neuron in this model is represented by the following non-linear differential equations (Taga et al.):

τ du_i/dt = −u_i − β v_i + Σ_{j=1..n} w_ij y_j + u_0 + Feed_i   (2.1)

τ′ dv_i/dt = −v_i + y_i   (2.2)

y_i = max(0, u_i)   (2.3)

where u_i is the inner state of the i-th neuron; v_i is a variable representing the degree of the self-inhibition effect of the i-th neuron; y_i is the output of the i-th neuron; u_0 is a constant external input; Feed_i is a feedback signal from the M.S.S., that is, a joint angle; and β is a constant representing the degree of the self-inhibition influence on the inner state. The quantities τ and τ′ are the time constants of u_i and v_i, and w_ij is the connecting weight between the i-th and j-th neurons. Each N.O. consists of two mutually inhibiting neurons. These two neurons alternately induce torque proportional to the inner state u_i in opposite directions, namely the directions of contraction of the flexor and extensor muscles. The N.O. and M.S.S. are mutually entrained and oscillate with the same period and phase (Kimura et al., 1999). This is illustrated in figure 2.4.

Broadly speaking, there are two major categories of networks used by neuronists, known as layered feedforward and recurrent. Another common type is the fully connected network, which includes feedback: the outputs depend not only on the initial inputs but on the history of the internal signal strengths, so the dynamics are far more complicated. This greater complexity may be a disadvantage in terms of analysability, but the greater number of degrees of freedom may actually be useful for the functioning of GenNets. A GenNet (Genetically Programmed Neural Net) is a neural net that has been evolved by a GA to perform some function or behaviour, where the fitness is the quality measure of that behaviour. A GenNet is illustrated in figure 2.5. GenNets are of two kinds, either behavioural (i.e. functional) or control. Control is also of two kinds, either direct or indirect.
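To make the oscillator dynamics concrete, the following sketch integrates equations (2.1)-(2.3) for a single N.O. of two mutually inhibiting neurons using forward Euler. The parameter values are illustrative choices, not Taga's published values, and the Feed_i term is set to zero (no M.S.S. in the loop).

```python
import numpy as np

# Forward-Euler integration of equations (2.1)-(2.3) for one neural
# oscillator: two neurons coupled by mutual inhibition (w_ij < 0).
tau, tau_p, beta, u0 = 0.25, 0.5, 2.5, 1.0     # illustrative constants
w = np.array([[0.0, -2.0],
              [-2.0, 0.0]])                    # mutual inhibition w_ij
u = np.array([0.1, 0.0])                       # inner states, asymmetric start
v = np.zeros(2)                                # self-inhibition states
dt, ys = 0.01, []
for _ in range(2000):
    y = np.maximum(0.0, u)                     # y_i = max(0, u_i), eq. (2.3)
    du = (-u - beta * v + w @ y + u0) / tau    # eq. (2.1), Feed_i = 0
    dv = (-v + y) / tau_p                      # eq. (2.2)
    u, v = u + dt * du, v + dt * dv
    ys.append(y.copy())
ys = np.array(ys)                              # outputs over time, shape (2000, 2)
```

Plotting the two columns of `ys` against time would show the alternating flexor/extensor firing that drives the torque in opposite directions.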
Firstly the behavioural GenNets are evolved. Once they perform as well as desired, their signs and weights are frozen. Then the control GenNets are evolved; since this is done in a hierarchical modular fashion, the weights of the joint module are found first. These weights are then frozen, and the weights of the control circuit are found so that the arm moves as close as possible to any specified goal point. Knowing the values of the angles and the angular velocities at the beginning of a cycle, which form the input, one can calculate the values of the angles

Figure 2.4: Neural Oscillator (Kimura et al., 1999)

Figure 2.5: Fully connected GenNet (de Garis, 1990)

and the angular velocities at the end of that cycle, based on the angular accelerations output by the network (de Garis, 1990).

Recurrent Neural Network (RNN) controllers are quite often used in these situations. Sensory input is first fed into the RNN, as illustrated in figure 2.6. The RNN then calculates the activations based upon the sensory inputs and the context layers. The values produced at the output layer are then fed directly into the embodied agent's effectors. The agent's effectors in turn control its appendages, thus producing movement. For every time step in the simulation, this process is repeated (Ruebsamen, 2002). Drawbacks of RNNs include: they are computationally more intensive than feedforward ANNs, and the standard method of learning via backpropagation of error does not work with them.

2.3 Creature Evolution

The evolutionary algorithms approach employs stochastic processes to generate results that significantly outperform results that would otherwise be obtained through

Figure 2.6: RNN controller (Ruebsamen, 2002)

a random search or conventional optimisation techniques. Artificial life (A-Life) is the scientific field of study that attempts to model living biological systems through evolutionary computation. The three-dimensional physical structure of a creature can adapt to its control system, and vice versa, as they evolve together. The nervous systems of creatures are also completely determined by the optimisation: the number of internal nodes, the connectivity, and the type of function each neural node performs are included in the genetic description of each creature, and can grow in complexity as an evolution proceeds. Creatures grown from their genetic descriptions survive in proportion to their fitness on the target behaviour. They are mutated by adding several random numbers drawn from a Gaussian-like distribution, so that small adjustments are more likely than drastic ones. New random nodes and connections can be added to the graph, and the parameters of each connection are subject to change. Grafting and crossover operations are performed by sexual/asexual reproduction of the nodes. Alternatively, fitness could be defined in a more biologically realistic way by allowing populations of virtual creatures to compete against each other, engaging in social interactions (Sims, 1994). Fitness and mutations should both be scaled to avoid premature convergence and remove bias, pruning illegal configurations due to collisions and self-locking. Interactive evolution allows procedurally generated results to be explored by simply choosing those that are the most aesthetically desirable (Sims, 1970). The fitness function positively reinforces agents who are able to travel great distances along the x-axis while

penalising change in course. If a GenNet contains N neurons, there will be N × N interconnections. Each connection is specified by its sign (where a positive value represents an excitatory synapse, and a negative value represents an inhibitory synapse) and its weight value, with modulus less than 1. The number of bits in the chromosome is N × N × (P + 1), with P binary bits for each weight and 1 for the sign. Nanorobots ("nanots") could be made in huge numbers and evolved in parallel. It would be nice if these techniques could be accelerated by putting them directly into hardware, e.g. VLSI accelerator chips for GenNet development (de Garis, 1990). However, one aspect of GP which needs serious consideration is the concept of evolvability, i.e. the capacity of a system to evolve rapidly enough to be interesting, which is by no means guaranteed.

2.4 Self Replication

At the core of biological self-replication lies the fact that nucleic acids (in particular DNAs) can produce copies of themselves when the required chemical building blocks and catalysts are present. The concept of artificial self-replicating systems was originated by von Neumann in the 1950s in his theory of automata. A self-replicating system reads instructions and converts these into assembly commands that result in the assembly of replicas of the original machine together with a copy of the assembly instructions (so that the replica also has the ability to replicate) (Sipper, 1998). Deterministic self-reproduction of robotic systems has only recently been demonstrated, where a Lego™ robot composed of three modules was able to assemble three other modules into a new identical robot. To do so, the base module of the robot followed a path drawn on the ground, pushing the other modules and joining them into an assembly using magnetic connections or with the assistance of an external passive joining rig (Chirikjian et al., 2002).
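As an aside to the GenNet chromosome layout of section 2.3 (N × N connections, each a sign bit plus P weight bits of modulus less than 1), a simple decoder can make the encoding concrete. The bit ordering here (sign first, then the magnitude most-significant-bit first) is an assumption for illustration, not de Garis's documented format.

```python
# Sketch of a GenNet-style chromosome decoder: N*N connections, each encoded
# as 1 sign bit plus P magnitude bits, giving N*N*(P+1) bits in total.
def decode(bits, N, P):
    assert len(bits) == N * N * (P + 1)
    weights = []
    for i in range(N * N):
        chunk = bits[i * (P + 1):(i + 1) * (P + 1)]
        sign = 1.0 if chunk[0] == 1 else -1.0        # excitatory vs inhibitory
        mag = sum(b << k for k, b in enumerate(reversed(chunk[1:])))
        weights.append(sign * mag / (1 << P))        # modulus scaled into [0, 1)
    return weights

N, P = 2, 3
bits = [1,1,1,1,  0,1,0,0,  1,0,0,1,  0,0,0,0]       # 2*2*(3+1) = 16 bits
w = decode(bits, N, P)                               # [0.875, -0.5, 0.125, 0.0]
```

A GA would mutate and recombine `bits` directly, with the decoded weights wired into the fully connected net before fitness evaluation.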
When self-replicating robotic factories take hold, the moon could be transformed into an industrial dynamo. The key for this overall system to be self-replicating is the interior closed loop indicating robot self-replication when cast robot parts are made available. The original robot is remotely controlled to relocate the various subsystems,

like grippers and motors, from a storage area and start assembling. In this concept, the original robot is unable to make copies of itself directly (Chirikjian et al., 2002). The following four modes of replication can be identified:

1. Direct reproduction: a machine reconfigures to pick cubes from a dispenser and place them in a new location, gradually building a copy from the ground up.

2. Self-assisted reproduction: the machine being constructed reconfigures during the construction process to facilitate its own construction.

3. Multi-stage reproduction: intermediate constructions are required before the target machine can be made. The intermediate machine is then discarded as a waste product, or can be used to catalyse the production of additional machines.

4. Tandem reproduction: multiple machines are required to produce a single copy. One machine may place cubes while the other reorients the constructed machine.

One approach (Mytilinaios et al., 2004) is to evolve morphologies of machines that are capable of reaching an area large enough to contain a detached copy of themselves (the percentage of coverage provides a gradient), and to evolve controllers that make a given morphology pick modules from dispensers and place them at the correct positions (the number of dispensers needed provides a gradient). Morphologies can be represented as a series of code-pairs: the first code moves the cursor in one of the four cardinal directions, while the second code defines the type of module to try to place. Controllers can be represented as a series of code-triplets, each triplet describing a command ("Swivel", "Attach", or "Detach") and a module number.
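The code-pair morphology encoding can be sketched with a toy cursor interpreter. The direction names, module type labels and grid layout below are hypothetical stand-ins, not the encoding of Mytilinaios et al.

```python
# Hypothetical interpreter for a code-pair morphology: each pair is
# (direction, module_type); a cursor walks a 2-D grid placing modules.
MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

def build(code_pairs):
    cursor, placed = (0, 0), {}
    for direction, module_type in code_pairs:
        dx, dy = MOVES[direction]
        cursor = (cursor[0] + dx, cursor[1] + dy)
        placed.setdefault(cursor, module_type)    # first placement at a cell wins
    return placed

body = build([("E", "actuated"), ("E", "structural"), ("N", "actuated")])
# body maps grid cells to module types:
# {(1, 0): 'actuated', (2, 0): 'structural', (2, 1): 'actuated'}
```

A genome is then just the flat list of pairs, which mutation and crossover can manipulate without ever producing a syntactically invalid body plan.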

Chapter 3

Design

3.1 Creature Morphology

The genotype is the specific genetic make-up (the genome) of an individual, in the form of DNA. Together with the environmental variation that influences the individual, it codes for the phenotype of that individual. The phenotype of an individual organism is either its total physical appearance and constitution or a specific manifestation of a trait, such as size, eye colour, or behaviour, that varies between individuals. For our purpose of replication, let us consider the following genotype, represented as a directed graph containing the information about the morphology and controller of the modular organism, as illustrated in figure 3.1. Each segment or DNA consists of all the information required to code for the organism. In our case, it stores the following information:

Translation and rotation with respect to its parent module; appearance, such as material and texture; and geometry, such as cylinder and box.

Physics properties such as density, bounce, coulomb friction, force-dependent slip, and centre of mass.

Specifications of the actuators, in this case embedded servos: type (rotational/linear), joint location, spatial orientation, and force, acceleration, velocity and position limits.

Sensors, such as a global positioning system and touch sensors.

Figure 3.1: Genotype

Special alleles might also contain camera specifications, such as field of view and type. Finally, the genotype houses the code for the neural controller.

The above gene or DNA undergoes transcription and translation in the physically constrained world of dynamics to form protein molecules, which join up to form the phenotypes of the organism, such as the ones shown in figures 3.2 and 3.3. After their birth, these are then subjected to the Darwinian laws of evolution, i.e. survival of the fittest. A chain of Molecubes, when used for replication, faces many hurdles: inefficient locomotion strategies; instability, since the centre of mass frequently falls outside the body, leading the organism to collapse; and constrained servo movements because of their rigid shapes. Docking is yet another issue, given their limited number of edges. The second form of organism, a chain of connected cylindrical modules as illustrated in figure 3.3, is highly stable with a low centre of mass, and its modules can swivel through each other, giving it both clockwise and anticlockwise movements. Serpentine locomotion, accurate directional manoeuvrability and universal docking abilities make these the highly fit organisms, directed towards replication, which survive and evolve through generations.

The above organism, Mr. Adam, is assumed to exist from time zero in our world, where we intend to carry out self-replication. The absence of Ms. Eve directs us towards

Figure 3.2: Molecube morphology

Figure 3.3: Cylindrical morphology

Figure 3.4: Snapshot

asexual reproduction! Equation 3.1 describes this morphology.

Module1 - Servo1 - Module2 - Servo2 - Module3   (3.1)

3.2 Recognition of Self & Homing

The parent organism has to scan the environment to track the detached copy of itself. Thus recognition of self is required for visual homing. The onboard camera, or eye, of the organism captures an image of its potential offspring, which would look like figure 3.4. The RGB pixel values are converted to their corresponding greyscale intensities. Then a histogram is plotted and smoothed with a Gaussian filter window, as shown in the plot of figure 3.5. The blob corresponds to the pixel intensities of the detached organism we are interested in, and hence can serve as a suitable threshold. Adaptive thresholding can be done by exploiting a bimodal distribution; but with histograms like the one above, we need to resort to some other method, such as background removal using illumination and reflectance. The other two options are localised thresholding and colour segmentation.

Figure 3.5: Histogram

The next step is to form the binary image and derive the largest connected region. We then form the feature vector (X) of compactness (C), the family of translation-invariant central moments (M), scale-invariant moments (µ), and rotation-invariant moments (I):

X = (C, M, µ, I)   (3.2)

Then the class-conditional multivariate Gaussian density (p) is calculated over the above distribution with its corresponding mean (m_c) and covariance (A_c) matrix:

p(x|c) = 1 / ((2π)^(n/2) det(A_c)^(1/2)) exp(−(1/2) (x − m_c)' A_c^(−1) (x − m_c))   (3.3)

Then we use a Bayes classifier to evaluate the posterior:

p(c|d) = p(d|c) p(c) / p(d)   (3.4)

This forms the decision boundary for the organism to sense and approach its copy. Our organism is capable of performing basic obstacle avoidance by using stereoscopic vision and avoiding the direction in which the optical flow field shows more motion, i.e. the vector field with higher variation in intensities across image sequences.
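The classifier of equations (3.3)-(3.4) can be sketched directly. The two-dimensional feature vectors, class means, covariances and priors below are invented stand-ins for the (C, M, µ, I) features, chosen purely to illustrate the computation.

```python
import numpy as np

# Multivariate Gaussian likelihood, eq. (3.3), combined with priors via
# Bayes' rule, eq. (3.4). Class statistics here are illustrative only.
def gaussian(x, m, A):
    n = len(m)
    d = x - m
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(A))
    return np.exp(-0.5 * d @ np.linalg.inv(A) @ d) / norm

def posterior(x, classes):
    """classes: {name: (mean, cov, prior)} -> normalised posterior p(c|x)."""
    joint = {c: gaussian(x, m, A) * p for c, (m, A, p) in classes.items()}
    z = sum(joint.values())                      # the evidence p(d)
    return {c: v / z for c, v in joint.items()}

classes = {
    "self":  (np.array([0.8, 0.2]), 0.01 * np.eye(2), 0.5),
    "other": (np.array([0.2, 0.7]), 0.01 * np.eye(2), 0.5),
}
p = posterior(np.array([0.78, 0.22]), classes)   # feature vector near "self"
```

Deciding for the class with the highest posterior realises the decision boundary the organism uses to approach its copy.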

Figure 3.6: Neural Controller

3.3 Creature Control

We need to design and learn a control mechanism that directs organisms such as the above to locomote towards a detached part of themselves, then dock and eventually detach. The basic neural controller circuitry which has been used to address such problems is illustrated in figure 3.6. The initial sensory inputs, such as the joint angles of the appendages, form the set of input signals, which are processed through the layers of the neural network to produce motor commands for the actuators. These desired positions, either directly or after preprocessing, are then fed back over the next time step, forming the new inputs, and so on. The network can either be trained on desired trajectories, in which case it is a supervised problem, or be allowed to evolve its own path planning by evolving the weights against a desired fitness function. The known problem of such multi-layer perceptrons or feedforward networks is that, even with varied starting weights or many generations of evolution, they end up in fixed points, as shown in figure 3.7, which is fatal in this case as it causes the modular robot to stop moving. Hence, ideally a creature should be able to have an internal state beyond its sensor values, or be affected by its history. The approach taken by many to alleviate this problem, in order to get interesting landscapes and behaviours with modular robots, is to use time-varying transfer functions, such as oscillators, which can give dynamic output even with static input, together with a host of other functions. This, however, to

Figure 3.7: Attractors

Figure 3.8: Limit cycles

me is not biologically plausible. The other very common approach (mostly combined with the previous method) is to add a layer of recurrence in the hidden layer, also known as context neurons. These RNNs are artificial neural networks with adaptive feedback connections, also known as an Elman network (Elman and Jeffrey, 1990). The use of feedback connections allows the RNN to have a memory of past events. The idea here is to find some structure in time, in order to get rhythmic motor signals in the form of limit cycles, as illustrated in figure 3.8. One more commonly used network along the same lines is the fully connected Hopfield network. This has no doubt been successful in generating rhythmic patterns for various locomotions, but again suffers from a lack of analysability and higher complexity. It also lacks a hidden layer, and its biological plausibility remains questionable.

Figure 3.9: Quiver plot

Feedback looping over function compositions readily gives rise to fixed points. For example, consider the following system, in equations 3.5 and 3.6:

(s(n+1), y(n)) = update(s(n), y(n))   (3.5)

y(n) = output(s(n), y(n))   (3.6)

where s(n) and y(n) are the state and output at the n-th instant. Solving this fixed-point problem is the key. The solution is to use a well-formed feedback composition, i.e. the output should be state-deterministic, which means all arcs leaving a state of the Moore machine must have the same output. This guarantees oscillatory dynamics. It could be made more complex with more states, outputs and compositions, but it still remains simple and deterministic.

Having said all that, with respect to the morphology described above and the task at hand, second-order non-linear systems such as central pattern generators seem a perfect match. Figure 3.9 illustrates their velocity diagram. As discussed before, CPGs are very low-level neural computation models and also present an open-loop controller paradigm, as the output motoneurons of lampreys have marginal feedback. One more drawback is the positioning of the motoneurons, which lie on both sides of the nervous system, whereas in our case we have a centralised servo controller. All this makes their direct application unsuited to the task at hand.
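The claim that a state-deterministic output yields oscillatory dynamics under feedback can be demonstrated with a two-state toy machine, entirely my own illustration. Because the output depends only on the state, feeding it back introduces no circular constraint to solve, and the closed loop falls into a period-2 limit cycle rather than a fixed point.

```python
# A two-state Moore-style machine in feedback composition: the output is a
# function of the state alone, so the feedback loop is well formed.
def update(s, y):
    # next state depends on the current state; y is fed back but, being
    # determined by s, introduces no circularity
    return "down" if s == "up" else "up"

def output(s):
    return 1.0 if s == "up" else -1.0

s, ys = "up", []
for n in range(6):
    y = output(s)          # y(n) = output(s(n))
    s = update(s, y)       # s(n+1) = update(s(n), y(n))
    ys.append(y)
# ys alternates: a period-2 limit cycle, never a fixed point
```

The same construction scales to more states and richer outputs while remaining deterministic and easy to analyse.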

Figure 3.10: Open loop with feedback

Figure 3.11: FTDNN (Mathworks, 2007)

The RCPGFTDNN

RCPGFTDNN stands for Reinforced Central Pattern Generated Focussed Time-Delay Neural Network. The approach I take draws clues from a mixture of all the techniques presented above. But first, let us look at the problem from a classical control perspective. This is an obvious inverse model, as given the behaviour we need to predict the control. One might also be tempted to call it open-loop, since it neglects the possibility of disturbances; but bearing in mind the initial training of the trajectory this model undergoes, I would rather describe the control process as open loop with feedback, as illustrated in figure 3.10.

The Focussed Time Delay Dynamic Neural Network

This is quite a straightforward dynamic network, consisting of a feedforward network with a tapped delay line at the input. It is part of a general class of dynamic networks, called focused networks, in which the dynamics appear only at the input layer of a static multilayer feedforward network. Figure 3.11 illustrates a two-layer FTDNN (Mathworks, 2007).

Figure 3.12: Tan-Sigmoid Transfer Function

This network is well suited to time-series prediction. One nice feature of the FTDNN is that it does not require dynamic backpropagation to compute the network gradient, because the tapped delay line appears only at the input of the network and contains no feedback loops or adjustable parameters. For this reason, this network trains faster than other dynamic networks. Essentially, the output responds only once the network has seen a series of inputs buffered in its delay lines. I have used two delays for each input, meaning that initially the network wants to see 2 (delay) + 1 (input) values before computing the associated output in the respective time series. At the next time step the current input is pushed into the delay line and the earliest stored value is popped out. This process continues sequentially until the entire pulse train has been run through. I have used tansig (figure 3.12) for the transfer functions throughout, since it has a convenient range of -1 to +1, and, combined with weights sampled from a Gaussian distribution with mean 0 and variance 1/n, it should ideally not saturate any neuron. In addition, it directly encodes a bidirectional movement pattern, ideal for our task. The positive and negative firing rates can be considered excitatory and inhibitory neurons, to justify biological plausibility. The network can be trained with backpropagation, updating weights and bias values according to Levenberg-Marquardt optimisation; learning occurs via gradient descent with a momentum weight and bias learning function.
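The tapped-delay mechanism can be sketched in a few lines. The random weights below stand in for the Levenberg-Marquardt-trained ones, and the hidden layer size is my own choice; only the delay-line bookkeeping and the tansig (tanh) layers follow the description above.

```python
import numpy as np

# Focused time-delay sketch: a length-3 buffer (2 delays + current input)
# feeds a static two-layer tanh network; no feedback loops anywhere.
rng = np.random.default_rng(1)
delays, n_hid = 2, 5
W1 = rng.normal(0, 1 / np.sqrt(delays + 1), (n_hid, delays + 1))  # 0 mean, 1/n var
W2 = rng.normal(0, 1 / np.sqrt(n_hid), (1, n_hid))

def ftdnn(signal):
    buf = [0.0] * (delays + 1)               # tapped delay line
    out = []
    for x in signal:
        buf = buf[1:] + [x]                  # newest value in, oldest popped out
        h = np.tanh(W1 @ np.array(buf))      # tansig hidden layer
        out.append(float(np.tanh(W2 @ h)[0]))
    return out

wave = np.sin(np.linspace(0, 2 * np.pi, 20))
y = ftdnn(wave)                              # one output per input sample
```

Because the only memory is the fixed input buffer, the gradient can be computed with ordinary static backpropagation, which is exactly why the FTDNN trains faster than fully recurrent alternatives.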

Figure 3.13: Motoneuron

Central Pattern Generated Training

Lamprey swimming is classified as anguilliform (i.e. eel-like) swimming, which roughly means that the forward thrust is produced by the propagation of an undulatory wave along the entire length of the body. These laterally directed (i.e. horizontal) waves continuously move from head to tail with an approximately constant velocity and a linearly increasing amplitude. The frequency of oscillation affects the speed. The spatial wavelength may be equal to or less than the length of the body. All of this, with a bounded stochasticity, can be implemented on our model network. This can be achieved by giving out-of-phase oscillatory motor commands to consecutive servo joints, but with a small time lag, as illustrated in figures 3.13 and 3.14. This lag is the most important part, as it simulates the travelling wave through the body. Hence, based on these waves, we need to train our network. The phases and lags can be tweaked before feeding in to obtain the desired behaviour. This methodology can be extended to any number of modules.
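The head-to-tail wave can be sketched as follows (a toy Python sketch; the frequency, lag and amplitude parameters are illustrative, not the values used in the project): each joint receives the same sinusoid delayed by a fixed lag, with the amplitude growing linearly towards the tail.

```python
import math

def cpg_commands(t, n_joints, freq=1.0, lag=0.1, base_amp=0.2, amp_grow=0.1):
    """Out-of-phase oscillatory commands for consecutive servo joints.

    Each joint sees the same wave delayed by `lag` seconds, with a linearly
    increasing amplitude towards the tail (the travelling head-to-tail wave).
    """
    cmds = []
    for j in range(n_joints):
        amp = base_amp + amp_grow * j           # linearly increasing amplitude
        phase = 2 * math.pi * freq * (t - j * lag)
        cmds.append(amp * math.sin(phase))
    return cmds
```

Sampling this function over time yields the lagged pulse trains with which the network is trained.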

Figure 3.14: Motoneuron2

Figure 3.15: Lorenz Attractor

Reinforcement Based Fixed Stochastic Behaviour

With the above time-delay network topology and supervised control-trajectory training framework, we can generate different chaotic behaviours by perturbing the weights and the performance goal of the neural network, as illustrated in figure 3.15. The motoneuron waves can also be fed in a more stochastic manner by mixing the frequencies, amplitudes and waveforms. Once the desired behaviour is achieved, as measured by the corresponding fitness function, this reinforcement is learned by freezing the corresponding weights and network topology. The fitness function can be tracking the detached copy of itself, efficient docking, or something else. Sometimes early stopping in the training phase, with degraded mean squared performance, may also generate interesting behaviours which one may never think of designing by hand. The number of layered neurons, the network connections and the weights can all be evolved together with the corresponding creature morphologies.
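The perturb-evaluate-freeze loop can be sketched as a simple stochastic hill climb over the weight vector (a Python sketch with illustrative parameters; the project's actual fitness functions, such as tracking or docking quality, are discussed later):

```python
import random

def perturb_and_freeze(weights, fitness_fn, sigma=0.05, trials=50, seed=0):
    """Perturb the weight vector with Gaussian noise; keep a perturbation
    only if the fitness improves, then freeze the best weights found."""
    rng = random.Random(seed)
    best, best_fit = list(weights), fitness_fn(weights)
    for _ in range(trials):
        cand = [w + rng.gauss(0, sigma) for w in best]
        f = fitness_fn(cand)
        if f > best_fit:                        # reinforcement: keep improvements
            best, best_fit = cand, f
    return best, best_fit
```

The network topology (layer sizes, connections) can be searched in the same accept-if-better fashion alongside the weights.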

Chapter 4

Implementation

4.1 Creature Morphology

The morphology of the robot and the corresponding world dynamics are represented with a hierarchical tree structure, in which objects can contain other objects, using VRML97, a 3D modelling description language, as shown in figure 4.1. These correspond to our initial world: the left tree describes the 3-module organism and the right tree the detached copy of itself, which is manufactured by the factory. The first step is to model the world dynamics, such as the terrain and its associated physics, the lighting, and the basic simulation time step. In this case the terrain is modelled as a bounded object so that the robots cannot pass through it. The next step is to create the modular robots and link them together with the appropriate servos and joints. We start off with the first Module(1), which is defined to be a cylinder of the required dimensions. Cylindrical modules are appropriate for our task because one can swivel across another in both directions, up to certain angular limits imposed by the shape of the bounding object, as illustrated in figure 4.2. Along with this is defined the Servo(1) node, which sits between the above module and its connecting child through a joint allowing one degree of freedom. The servo can be either rotational or linear. Typically, rotational servos are used to simulate rotational motors or hinges, while linear servos are used to simulate linear motors, pistons, springs, etc. Both types of servo apply a change of coordinate system between the servo's parent and children.

Figure 4.1: VRML

Figure 4.2: Cylinder (Cyberbotics, 2006)

A rotational servo, which we make use of, increases and decreases the value of its rotation angle while keeping a constant translation and rotation axis, as illustrated in figure 4.3. Apart from this, we need to specify the desired parameters for the Servo, i.e. hard and soft limits on position, velocity, acceleration, force/torque, and the proportional controller. A detailed discussion of all these can be found later under the Physical Simulation section. As can be seen from figure 4.1, the next Module(2) is the child of this Servo(1), whose parent in turn is Module(1). This methodology can be continued, growing the organism as desired. Since all these modular robots are considered impervious, or hard bound, we need to define the corresponding physics parameters for each of them. The bounding object defines the shape used for collision detection and to automatically compute the inertia matrix of a Solid from its Physics. As can be seen in figure 4.1, the Physics node describes the density, bounce, force dependent slip, coulomb friction, and centre of mass. A detailed description is again provided under the Physics section.

The last node, or rigid body, corresponds to that of the detached copy of the organism, modelled here as a Supervisor aided with a touch sensor for the purpose of docking.

Figure 4.3: Servo (Cyberbotics, 2006)

Both the custom modular Robot and the detached Supervisor have their corresponding controllers defined.

4.2 Evolving Controller With Morphology For Self Replication

To date, self-replication has been attempted in the following ways:

1. Many authors have used cellular automata for replication.

2. Others have used planning to demonstrate self-assembly of machines.

3. Deterministic self-replication has been carried out for lunar development by remote-controlled assembly of robot parts.

4. Finally, a set of connected Cornell cubes could swivel around and grow a copy of itself.

The approach I take in this paper is quite radical, in the sense that I apply my model of the Reinforced Central Pattern Generated Focused Time-Delay Neural Network (RCPGFTDNN) to attain a full-fledged real-time replication on 3D

Figure 4.4: Architecture

homogeneous modular robots in a sophisticated world of physics simulation.

Architecture

The proprietary architecture I use for realizing self-replication is illustrated in figure 4.4. ODE is an open-source, high-performance library for simulating rigid body dynamics with an easy-to-use C/C++ API. It has advanced joint types and integrated collision detection with friction, based on an LCP solver. Simulation is carried out using the equations of motion, which are derived from a Lagrange-multiplier velocity-based model. Compensation for internal disturbances is done using global inverse dynamics to determine the joint forces. ODE uses a highly stable integrator, so that simulation errors should not grow out of control, and it is designed to be used in interactive or real-time simulation. More on this can be found in the Physical Simulation section. VRML hierarchical 3D modelling was discussed in depth in the previous section. OpenGL (Open Graphics Library) is a software interface to graphics hardware. The interface consists of a set of several hundred procedures and functions that allow a programmer to specify the objects and operations involved in producing high-quality

Figure 4.5: OpenGL (SGI, 2007)

graphical images, specifically colour images of three-dimensional objects. A typical program that uses OpenGL begins with calls to open a window into the frame buffer into which the program will draw. Then, calls are made to allocate a GL context and associate it with the window. Each primitive is a point, line segment, polygon, or pixel rectangle. Primitives are defined by a group of one or more vertices. The model for interpretation of GL commands is client-server. GL commands are functions or procedures (SGI, 2007). A block diagram of the GL is illustrated in figure 4.5.

The Robot Server sits at the heart of this architecture and essentially acts as the interface between all the other bubbles, channelising information from the Controller in order to articulate the Morphology using ODE physics and displaying the actuation using OpenGL. The sequence of steps of its operation is:

1. Create the socket server using a specified port.

2. Listen for and accept connections on this port from clients, Matlab in our case.

3. Enable the sensor devices.

4. Read the buffer for matching commands and carry out the actuation.

Matlab is used to realize and embed the dynamic neural controllers (RCPGFTDNN) for the reconfiguring morphologies, using the Neural Network Toolbox (Mathworks, 2007). Control over the morphology is achieved by setting the desired combination of position, velocity, acceleration, force, and control parameters of the corresponding servo.
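The four server steps above can be sketched as follows (a minimal Python sketch, whereas the project's server was written against the C/C++ simulator API; the newline-terminated command format shown is hypothetical):

```python
import socket
import threading

def make_server(port=0):
    """Step 1: create the socket server on the given port (0 = any free port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(1)
    return srv

def serve_commands(srv, handle_command, max_commands=1):
    """Steps 2 and 4: accept one controller client (e.g. Matlab) and read
    newline-terminated commands from the buffer, dispatching each one to
    the actuation handler."""
    conn, _ = srv.accept()
    buf = b""
    handled = 0
    while handled < max_commands:
        data = conn.recv(1024)
        if not data:
            break
        buf += data
        while b"\n" in buf and handled < max_commands:
            line, buf = buf.split(b"\n", 1)
            handle_command(line.decode())       # e.g. "servo1 0.35" (hypothetical)
            handled += 1
    conn.close()
```

Step 3 (enabling sensor devices) is simulator-specific and is omitted from the sketch.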

Figure 4.6: CPG signal

There are three very important things I would like to highlight about the controller at this stage:

1. Network Training

Let us assume the CPG signal of figure 4.6 is being used to train an input sensor neuron and its corresponding output motor neuron pair. Then we must ensure that the training sets are presented to our model network, with its 2 tapped delay lines, in the following fashion. The first point, say [0], marked by a blob in the figure, goes through to the first delay-line buffer, and then the second point [0.3] moves to the second delay-line buffer. Now the third point/blob [0.6] forms the first input, representing, in our language, the current joint angle between two modules of the robot, the past two being the past joint positions the robot joints have been through. Hence the output should be the future target joint angle the robot should be in, which in our case is the fourth blob, or [0.8] to be precise. This training cycle goes on and, as I claim, the oscillations can be reproduced effectively by this model.

2. Travelling Waves

To accurately simulate travelling waves like those of the lamprey in our modular robot chain, we need to give the tonic pulse to the servos with a small lag as we go down the body. Our robot server can only accept consecutive commands from the neural controller client with an inherent delay of 0.1 seconds, or else the network gets clogged. So, without doing anything explicit to create the moving waves, our architecture implicitly takes care of it.

3. Behavioural Circuit

Figure 4.7: Halfer network

As we shall see further on, the modules can swivel into each other only between -1/2 and +1/2 radians; hence we need an additional behavioural model sitting on top of the controller module. This can be any standard halfer circuit, such as the Hopfield network illustrated in figure 4.7.

Self Replication (Stage-1)

Shown in figures 4.8 and 4.9 is the morphology of the first instance of the system at the very beginning of the world, and a factory which manufactures the cells or proteins, i.e. the detached constituent copies of the parent organism. One can infer from the two figures that this cylindrical modular morphology enables one module to swivel past another in both directions, but with the position limited to between +0.5 and -0.5 radians. Please note that here the parent is a chain of 3 modules connected via 2 servos. The task of this organism is to get to the factory and dock with the new module, for which it utilises its vision capabilities of object recognition, with the help of the onboard camera and the Gaussian classifier described in the Recognition section.
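The delay-line presentation of training pairs described under Network Training above can be sketched as follows (a Python sketch, using the sample values [0, 0.3, 0.6, 0.8] from figure 4.6; the project itself prepared these sets in Matlab):

```python
def make_training_pairs(signal, n_delays=2):
    """Slide a window of (n_delays + 1) past samples over the signal;
    the target is the next sample in the series (the future joint angle)."""
    pairs = []
    for i in range(len(signal) - n_delays - 1):
        window = signal[i : i + n_delays + 1]   # [delayed values..., current input]
        target = signal[i + n_delays + 1]       # future target joint angle
        pairs.append((window, target))
    return pairs
```

For the four blobs of figure 4.6 this produces exactly one pair: inputs [0, 0.3, 0.6] with target 0.8, matching the cycle described above.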

Figure 4.8: Stage-1 Morphology (a)

Figure 4.9: Stage-1 Morphology (b)

Figure 4.10: Stage-1 Controller

The cylindrical morphology also aids universal docking capabilities. The two sequential morphological patterns shown make the creature move forward. In order to do so, it makes use of the RCPGFTDNN model. The tapped delay neural network for the specific task at hand is shown in figure 4.10. SN1 and SN2 are the two sensory inputs, which provide the current rotational joint angles of the two servos, Servo1 and Servo2 respectively, and directly form the feed of the network. MN1 and MN2 give the desired target rotational joint angles for the next time step. After actuation of this desired configuration, the current joint angles are again fed back in through the input of the network, and so on. D0 and D-1 are the tapped delay lines, which act as the buffer for recognising patterns in time. H1 to H5 are the hidden neurons. The activation function used is tansig, as shown. W and V are the first- and second-layer weight vectors. This network is trained with the central-pattern-generated out-of-phase signals to set it on the track of forward navigation like that of lamprey motion. Please refer to the previous sections for more on the training phase. The bounded stochasticity is achieved through evolution. Reinforcement then comes in the form of a fitness function for assessing the trajectory

of its motion:

Fitness = α(X_f − X_i) − β(Y_f − Y_i)²        (4.1)

where X_f, X_i, Y_f and Y_i are the final and initial distances from the origin in the X and Y directions respectively; α rewards positive movement and β penalises off-trajectory motion. The above fitness function is optimised by evolving the weights as well as adjusting the mean squared performance goal. Apart from these, the network topology, such as the number of hidden layers and neurons, can also be tweaked. Once we obtain near-optimum homing, the entire parameter space is frozen.

Docking of the child with its parent can be achieved in many ways. Specific bonding and patterning mechanisms depend on the scale of the implementation, and can be magnetic, electrostatic, or hydrophilic/hydrophobic, for example. This paper makes use of the touch-sensor-equipped Supervisor node, assuming any or all of the above forces come into play. However, transmitters and receivers installed on the modules could facilitate an exact docking process. Advanced vision computation could also help to obtain the right geometries.

Self Replication (Stage-2)

After docking, the creature grows, successfully incorporating the new module into its own body. As can now be seen, the morphology has four modules and three servos connected in the form of a linked list. Its task is to find a factory or dispenser in the vicinity to grab another module. For this, the newly formed organism needs to scan the environment by rotating about its centre of mass, making discrete turns and capturing snapshots. Once the new protein cylinder is identified by the Gaussian classifier, the turning behaviour stops and the forward homing takes over. Figures 4.11 and 4.12 show the sequential turns the servo-enabled modules take to achieve the desired anti-clockwise trajectory. The density of the material is kept constant and low throughout.
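As a sketch of equation (4.1)'s behaviour (a Python sketch; the reconstructed exponents follow the surrounding description, with a linear reward on forward X progress and a squared penalty on lateral Y drift, and the gain values are illustrative):

```python
def fitness(x_i, y_i, x_f, y_f, alpha=1.0, beta=2.0):
    """Equation (4.1): alpha rewards positive forward movement along X,
    beta penalises off-trajectory drift in Y."""
    return alpha * (x_f - x_i) - beta * (y_f - y_i) ** 2
```

Moving further forward raises the fitness, while any drift off the X-axis lowers it, which is the selection pressure used to freeze the homing behaviour.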
With the morphological growth, our RCPGFTDNN controller also grows, adding one extra input sensor neuron, SN3, and its paired motor neuron, MN3, as illustrated in figure 4.13. Having said that, the million-dollar question now is how to achieve direction control. The lamprey team solves this problem by increasing the frequency of the waves on the desired side of the turn. But our organism is controlled by only one centrally located servo

Figure 4.11: Stage-2 Morphology (a)

Figure 4.12: Stage-2 Morphology (b)

Figure 4.13: Stage-2 Controller

between each pair of connected modules. Others have handled this issue either by changing the morphology altogether, for example to a quadruped, or by evolving the weights until some unusual behaviour is produced. But the approach I propose here is new and different from the ones mentioned above. There are two ways in which this can be achieved:

1. Compared to generating and feeding the entire length of the organism with a single travelling wave, the modification here is to first feed the second servo joint with a propagating wave and then, keeping the first servo joint static, feed the third servo joint with an out-of-phase oscillating pulse, of course not forgetting the very important lag between the deployments. This is shown in the network controller diagram of figure 4.13.

2. Another way of achieving this is to evolve the weights or limit the performance to optimise the same fitness function of equation 4.1, where all the variables have the same meaning as before. Though this approach lacks rationality, it arrives at the same result.

The rest of the network dynamics remain the same as

the previous network.

Self Replication (Stage-3)

The individual embodied agents start with no knowledge of their environment; however, with each subsequent generation, knowledge of how to survive within the environment is passed to the offspring. The number of input and output neurons of the nervous system keeps growing with the creature, and through the network connections each module becomes more aware of the others' existence; hence the position of one has an effect on the orientation of another. However, the disproportionately large and sometimes static hidden-layer neurons provide the scope for centralised control. With a combination of the above two basic network topologies, any subsequent nervous system to control the growing morphologies can be created:

New.Nervous.System = α · (2.SensoryMotor.Topology) + β · (3.SensoryMotor.Topology)        (4.2)

For example, to control the stage-3 morphology we need 4 pairs of sensorimotor control, and hence the above parameters can be set to α = 2 and β = 0. This stage is just an extension of the above two stages, the morphology and controller both growing by one more layer. The same kind of network dynamics and evolutionary strategies can be formulated and reinforced on top of a mixture of CPG signals, which forms the basis for a wide range of guided trajectories. Docking remains the same.

Self Replication (Stage-4)

This is the very last stage of the entire process, as the morphology has now grown to six modules connected by five servos. Inspired by mitosis and meiosis, our organism gets ready for binary fission. In the process it passes a copy of its chromosomal DNA to the newborn child, which in this case is an exact replica of the parent organism, as shown in figure 4.14. Hence, the newborn child will be superior to its previous generation, as it will have already inherited the information on the various weight, topology, and signal-processing combinations of its nervous system.
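The first turning strategy described for Stage-2 above can be sketched as follows (a toy Python sketch with illustrative frequency, lag and amplitude values: the first servo is held static while the second and third receive out-of-phase, lagged oscillations):

```python
import math

def turning_commands(t, freq=1.0, lag=0.1, amp=0.4):
    """Direction-control strategy 1: servo 1 stays static to bias the turn,
    while servos 2 and 3 oscillate out of phase, servo 3 lagging servo 2."""
    s1 = 0.0                                               # static first joint
    s2 = amp * math.sin(2 * math.pi * freq * t)            # propagating wave
    s3 = -amp * math.sin(2 * math.pi * freq * (t - lag))   # out of phase, lagged
    return [s1, s2, s3]
```

Sampled over time, this drives the body around its static joint instead of straight ahead.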
The release or anti-docking can be achieved by demagnetising the servo joint.
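As a toy sketch of the bookkeeping behind equation (4.2) (Python here; the function name and representation are illustrative, not from the project code), counting sensorimotor pairs in the composed nervous system:

```python
def compose_nervous_system(alpha, beta):
    """Equation (4.2): build a controller for a grown morphology as a
    weighted combination of the 2- and 3-sensorimotor base topologies,
    returning the total number of sensorimotor pairs."""
    base2, base3 = 2, 3          # sensorimotor pairs in each base topology
    return alpha * base2 + beta * base3
```

With α = 2 and β = 0 this gives the 4 sensorimotor pairs required for the stage-3 morphology, as stated above.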

Figure 4.14: Stage-4 Morphology

4.3 Physical Simulation

The virtual environment employs the Newtonian model of physics as it applies to the natural world. The embodied agents that live within this virtual environment are not able to violate any of the constraints of the environment. The virtual environment utilises a sophisticated physics engine to accurately simulate rigid body dynamics, joints, contacts/collisions, friction, inertia and gravity. The physics engine acts directly upon the rigid bodies of the agents, which can be further constrained by the use of a joint; a joint can connect two or more rigid bodies and can be permanent (such as a hinge or slider joint), or temporary, resulting from the collision of two rigid bodies (a contact joint). Gravity in the virtual world is set at approximately 9.8 m/s². Ground friction is necessary for locomotion, and is modelled in the temporary contact joint created during a collision. A rigid body has various properties from the point of view of the simulation, as illustrated in figure 4.15. Four properties of rigid bodies change with time: (1) the position vector (x, y, z) of the body's point of reference, corresponding to the body's centre of mass; (2) the linear velocity vector (vx, vy, vz) of the point of reference; (3) the orientation of a

Figure 4.15: Rigid body (Smith, 2006)

body, represented by a quaternion (qs, qx, qy, qz) or a 3x3 rotation matrix; and (4) the angular velocity vector (wx, wy, wz), which describes how the orientation changes with respect to time. Rigid body properties that remain constant over time include: (1) the mass of the body, (2) the position of the centre of mass, and (3) the inertia matrix describing how the body's mass is distributed around the centre of mass. Contact joints simulate friction at the contact by applying forces in the two friction directions that are perpendicular to the normal (Smith, 2006). These properties are used internally within the physics engine to calculate the forces and torques that affect the rigid bodies, as illustrated in figure 4.16. The hinge joint has two sensors that can provide the current angle and angular velocity to the embodied agent. The joint also has one effector that accepts the desired velocity as its input, as illustrated in figure 4.17. The embodied agents' effectors allow them to control the relative angular or linear velocities of two bodies connected via a joint, thus enabling them to control their appendages and produce motion. The effector applies torque to a joint's degree(s) of freedom to get it to pivot or slide at the desired speed. Using equations 4.4 to 4.9, the physics engine can quickly determine the acceleration experienced by the bodies attached to the joint based upon the desired speed set by the RNN. The other rotational force the physics engine must consider is the moment of inertia,

I = M · r²        (4.3)

as illustrated in figure 4.18. The rigid bodies of this simulation have a homogeneous distribution of mass about the body's centre of mass.

Figure 4.16: Contact joint (Smith, 2006)

Figure 4.17: Hinge joint (Smith, 2006)

Figure 4.18: Torque (Smith, 2006)

The servo control is carried out in three steps, as depicted in figure 4.19:

1. The first step is performed by the user-defined controller program, which decides which position, velocity, and force must be used.

2. The second step is performed by Webots's servo controller, which computes the current rotational or linear velocity Vc of the servo.

3. Finally, the third step is carried out by the physics simulation, which computes the integration of all the forces present (Cyberbotics, 2006).

At each simulation step, the servo controller recomputes the current velocity Vc according to the following algorithm:

    Vc = P · (Pt − Pc)            (4.4)
    if (Vc > Vd) Vc = Vd          (4.5)
    if (A != −1) {                (4.6)
        a = (Vc − Vp) / ts        (4.7)
        if (a > A) a = A          (4.8)
        Vc = Vp + a · ts          (4.9)
    }

where Vc is the current velocity in rad/s or m/s, P is the control parameter specified by the controlP field, Pt is the target servo position set by servo_set_position(), Pc is the current servo position as reflected by the position field, Vd is the desired motor velocity as specified by servo_set_velocity(), a is the acceleration that would be

Figure 4.19: Servo control (Cyberbotics, 2006)

required to reach the current speed, Vp is the motor velocity in the previous time step, ts is the duration of the simulation time step as specified by the basicTimeStep field of the WorldInfo node, and A is the acceleration of the servo motor as specified by the acceleration field. Both the controlP and acceleration fields are discussed in the following paragraphs.

The maxVelocity field specifies the default and maximum value for the desired motor speed (Vd). The desired motor speed is the motion speed that the servo will reach unless physical forces (external, spring, damping or custom) prevent it from doing so. The desired motor speed can also be changed at run-time with servo_set_velocity(). The maxForce field specifies the default and maximum motor torque/force F that is sent to the physics simulator; see figure 4.18. The torque/force can also be changed at run-time with servo_set_force(). Too small a value of maxForce may result in a servo unable to move or to maintain a target position because of the weight it has to hold. Note that setting the force to zero is equivalent to calling the function servo_motor_off().

The controlP field specifies the default value of the P parameter of the proportional controller. P is used to compute the current speed Vc from the current position Pc and the target position Pt, such that Vc = P · (Pt − Pc); see the complete algorithm (4.4 to 4.9). With a small P, a long time is needed to reach the target position, while too large a P leads to instabilities. The value of P can also be changed at run-time with servo_set_control_p().

The acceleration field defines the default motor acceleration A used by the servo controller to compute the current speed Vc. As shown in the algorithm (4.4 to 4.9), the

parameter A fixes an upper limit on the rate of change of the current speed Vc. The value of A can also be changed at run-time with servo_set_acceleration(). If A is set to -1, then the acceleration is infinite and Vc reaches the desired motor speed Vd immediately.

However, we could add our own custom forces if the above mechanism does not produce accurate results in the real world. This could be achieved by implementing a Proportional Integral Derivative (PID) controller given by

u(t) = K_p e(t) + K_i ∫₀ᵗ e(τ) dτ + K_d de/dt        (4.10)

where e(t) is the error dynamics, u(t) the control signal, K_p the proportional gain on the error, K_i the gradual integral gain and K_d the derivative gain for small changes.

The Physics node allows us to define a number of physics parameters to be used by the physics simulation engine. The Physics node is useful when simulating legged robots, to define mass repartition and friction parameters, thus allowing the physics engine to simulate a legged robot accurately, making it fall down when necessary. If the density field is a positive value and the mass field is set to -1, the actual mass of the Solid node is computed from the specified density and the volume defined in the boundingObject of the Solid node. The bounce field defines the bounciness of a solid. This restitution parameter is a floating-point value ranging from 0 to 1.0. When two solids hit each other, the resulting bounciness is the average of the bounce parameters of the two solids. The bounceVelocity field defines the minimum incoming velocity necessary for a bounce; incoming velocities below this will effectively have a bounce parameter of 0. The coulombFriction field defines the friction parameter which applies to the solid regardless of its velocity. It ranges from 0 to infinity; setting coulombFriction to -1 means infinity.
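The two controller computations above, the servo velocity algorithm of equations (4.4) to (4.9) and the PID of equation (4.10), can be sketched as follows (a Python sketch; the clamps are written sign-symmetrically for robustness, the PID uses a simple rectangle-rule integral and backward-difference derivative, and all gain values in the usage are illustrative):

```python
def servo_velocity(p, p_t, p_c, v_d, v_p, a_max, t_s):
    """One step of the servo P-controller (equations 4.4 to 4.9): compute the
    current velocity from the position error, clamp it to the desired speed,
    then limit its rate of change by the maximum acceleration."""
    v_c = p * (p_t - p_c)                        # proportional control (4.4)
    v_c = max(-v_d, min(v_d, v_c))               # clamp to desired speed (4.5)
    if a_max != -1:                              # -1 means infinite acceleration
        a = (v_c - v_p) / t_s                    # acceleration required (4.7)
        a = max(-a_max, min(a_max, a))           # clamp to max acceleration (4.8)
        v_c = v_p + a * t_s                      # rate-limited velocity (4.9)
    return v_c

def make_pid(kp, ki, kd, dt):
    """Discrete PID controller, equation (4.10)."""
    state = {"integral": 0.0, "prev_e": 0.0}
    def step(e):
        state["integral"] += e * dt              # rectangle-rule integral term
        derivative = (e - state["prev_e"]) / dt  # backward-difference derivative
        state["prev_e"] = e
        return kp * e + ki * state["integral"] + kd * derivative
    return step
```

For example, with P = 10 and a large position error, the velocity is first clamped to Vd and then ramped up no faster than A per time step, exactly the behaviour described above.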
The forceDependentSlip field defines the Force Dependent Slip (FDS) for friction, as explained in the ODE documentation. FDS is an effect that causes the contacting surfaces to slide past each other with a velocity that is proportional to the force being applied tangentially to the surface. It is especially useful to combine FDS with an infinite coulomb friction parameter. The inertiaMatrix field defines the inertia matrix as specified by ODE. If it contains exactly 9 floating-point values, and if the mass field is different from -1, then it is used as follows: the 9 parameters are the same as the ones used by the dMassSetParameters

ODE function. The parameters given in the inertiaMatrix are: cgx, cgy, cgz, I11, I22, I33, I12, I13, I23, where (cgx, cgy, cgz) is the centre-of-gravity position in the body frame. The Ixx values are the elements of the inertia matrix, expressed in kg·m²:

    [ I11  I12  I13 ]
    [ I12  I22  I23 ]
    [ I13  I23  I33 ]

The centerOfMass field defines the position of the centre of mass of the solid. It is expressed in metres in the relative coordinate system of the Solid node, and is affected by the orientation field as well. The orientation field defines the orientation of the local coordinate system in which the position of the centre of mass (centerOfMass) and the inertia matrix (inertiaMatrix) are defined.
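The expansion of the 9-value inertiaMatrix field into the symmetric 3x3 matrix above can be sketched as follows (a Python sketch; the helper name is illustrative, while the parameter order matches the dMassSetParameters convention quoted above):

```python
def inertia_matrix(params):
    """Expand the 9-value inertiaMatrix field
    (cgx, cgy, cgz, I11, I22, I33, I12, I13, I23) into the centre of
    gravity and the symmetric 3x3 inertia matrix (in kg*m^2)."""
    cg = params[:3]
    i11, i22, i33, i12, i13, i23 = params[3:]
    matrix = [[i11, i12, i13],
              [i12, i22, i23],      # symmetric: off-diagonal terms repeat
              [i13, i23, i33]]
    return cg, matrix
```

Note that the field packs the three diagonal elements first, then the three off-diagonal ones, so the expansion is not simply row-major order.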

Chapter 5

Evaluation

5.1 Creature Morphology

One of the basic criteria for high performance is that the body of the modular reconfigurable robot should be light enough for efficient manoeuvrability. The next is that all the modules should be homogeneous, that is, the physics of one module should be an exact copy of another's. Having said that, the modules should be dense enough so that, when they form a morphology, the corresponding organism has a stable and low centre of mass. Keeping all of this in mind, the effective density was chosen to be 0.2 kg/m³.

The last consideration, having a low and stable centre of mass, forms the basis of the proposed bonding mechanism of the cylindrical modules as against the Molecubes. The design of the cylindrical modules itself, however, is to facilitate bidirectional swivelling through each other, as against the restricted unidirectional joint movements of the Molecubes. This also ensures that the movements are in the horizontal plane rather than the vertical plane, which makes the Molecubes highly unstable, because their centre of mass often tends to fall outside the body, making them fall down and hence rendering them ineffective for locomotion. The cylindrical morphology also enables universal docking, since it presents far more edges than box shapes, and hence is highly fit for the task of reconfiguration and self-replication.

The bounce parameters should be very low, while at the same time maintaining physical plausibility. Using this kind of special moulding material prevents the modules from bouncing against the terrain, as well as each other, while

exercising movements. For the desired serpentine type of motion, we need to get sufficient reactive force from the terrain, caused by friction. This also calls for the tangential force dependent slip which would have occurred naturally. The entire combination should be such that the net vector sum directs the organism towards the desired behaviour. I achieved this with a combination of infinite coulomb friction and some k times force dependent slip. For generating the spontaneous and large torques required to drive the organism, the acceleration should be infinite as well, so that the desired servo velocity is reached almost instantly. This can be physically realised using high-gain circuits.

5.2 Network Controller

For the Stage-1 replicator network topology, the number of delay lines must be two and the number of hidden units fixed at five for generating optimum complementary oscillations. The delay lines are the essence of the autonomous generation of oscillations, as based on them the network recognises patterns in time. Five hidden neurons signify the variation of the objective function in five orthogonal basis directions of the vector space. The training function used is backpropagation, with the number of epochs set to 200. The mean squared error curve is plotted in figure 5.1. The training is quite fast. However, it could be optimised by Stochastic Local Search (SLS) over the vector space of combinatorial weights, using methods such as A*, Simulated Annealing, or Tabu Search. This is when the network is trained with out-of-phase CPG waves.

The physics simulation step must be quite low for a continuous flow of state variables, ideally around 64 ms. Another very important parameter is the rotational servo velocity with which the target positions generated by the network are to be realised. This is directly proportional to the velocity with which the organism swirls forward.
Of course, we must not forget the lag in the travelling wave, which should ideally increase or decrease together with the servo velocity. For a typical run of 100 iterations with the above parameters, the organism moves a distance of 0.05 m. Something to really cheer about is that the locomotion is dead

55 Chapter 5. Evaluation 49 Figure 5.1: Network performance straight even with so many swirls. Now lets put some stochasticity in the network by training the same network with the same dynamics but this time with a mixture of CPG waves i.e. mixing periodic signals with varied frequencies and amplitudes of oscillation. The equation I used is 5.1 and corresponding plots are shown in figure 5.2. F = 1 3 sin(2 π 5 X)+ 2 cos(2 π 12 X) (5.1) 3 The results are outstanding as the network is highly successful in generating bounded oscillatory stochastic signals. Bounded because its between the rotational limits, oscillatory because in consecutive time steps the two motor outputs are out of phase relative to itself as well as each other. And finally stochastic because the amplitudes of the generated motor commands vary significantly. This circuit can be tuned by evolving the weights or initialising with different Gaussian samples or limiting the performance. All these produce varied behaviours which could be used for localization and tracking of its detached copy. To compare the results presented with other standard neural network topologies. Lets
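Equation 5.1 can be reproduced in a few lines to confirm its bounded-but-varying character; treating X as a discrete sample index is an assumption here, since the units are not fixed in the text:

```python
import numpy as np

# The CPG mixture of equation 5.1: two periodic components with
# different frequencies and amplitudes summed into one training wave.
# X is treated as a discrete sample index (an assumption).

def cpg_mixture(X):
    return (1/3) * np.sin(2 * np.pi / 5 * X) + (2/3) * np.cos(2 * np.pi / 12 * X)

X = np.arange(200)
F = cpg_mixture(X)

# The wave stays bounded by the sum of the component amplitudes (1/3 + 2/3),
# while its local amplitude varies as the components beat against each
# other, giving the irregular-looking envelope of figure 5.2.
print(float(np.max(np.abs(F))) <= 1.0)
```

The boundedness follows directly from the triangle inequality on the two amplitudes, which is why the motor commands never exceed the rotational limits.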

Figure 5.2: CPG mixture

First, consider how a Multi-Layer Feedforward Network would perform over the same data set in predicting the desired control trajectory. As expected, it is a complete disaster: soon after training, the output settles at the limits of -1 or +1 and completely fails to generate any form of oscillation whatsoever. This result is in line with expectations, since the network has no provision for memory and hence can neither learn nor reproduce patterns in time.

Now let's compare our RCPGFTDNN network with an old friend, the Layered Recurrent Network (LRN), shown in figure 5.3. It has 8 hidden units, and the recurrence dynamics occur in this layer, as can be observed. Training this network is a pain: we use Bayesian regularisation backpropagation, which is expensive in CPU resources and takes a long time even with only 50 cycles. While predicting control it does well initially, generating out-of-phase motor command outputs; in this respect it is better than the feedforward network. But eventually even the LRN settles into its attractor states; reaching a limit cycle cannot be guaranteed and would probably require long evolution runs.

Figure 5.3: LRN (Mathworks, 2007)

Now let's discuss the performance of the Stage-3 replicator. Recall that it is a combination of four modules linked together by three servos, and hence controlled by a three-sensory-motor neural controller, discussed in depth in the previous chapters.

5.3 Manoeuvring Innovation

The important invention I propose here is that we can manoeuvre the organism precisely by rearranging the sequence in which the wave is discharged along the length of its body. This is novel compared with approaches that require evolution, which consume a lot of time and computing resources without guaranteeing an optimised fitness, and which also lack analysability of the solution. Let's analyse the behaviours one by one.

1. Swirl Forward. We train this network with three consecutive out-of-phase CPG waves; it has 5 hidden neurons, and training runs for 200 epochs. For the forward tracking behaviour, the control signal is first propagated through the first servo; then, after a small lag, the third signal is sent through the third servo; and only after another lag is the second servo driven. This cycle repeats until the goal is reached.
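The dispatch scheme just described, and the backward and turning variants that follow, amount to permuting which servo receives its wave first, with a fixed lag between activations. A small sketch; the servo indexing (0 to 2), the lag and the cycle period are illustrative assumptions, not values from the thesis:

```python
# Sketch of the wave-dispatch scheduler: each behaviour is a permutation
# of the order in which the three servos receive their control waves,
# with a fixed lag between successive activations. LAG and PERIOD are
# illustrative assumptions.

LAG = 3        # timesteps between successive servo activations (assumed)
PERIOD = 12    # timesteps per full dispatch cycle (assumed)

DISPATCH_ORDER = {
    "swirl_forward":  [0, 2, 1],  # first, then third, then second servo
    "swirl_backward": [2, 0, 1],  # third, then first, then second servo
    "clockwise":      [1, 0, 2],  # second, then first, then third servo
    # anticlockwise uses the same order but retrains the network with a
    # constant pulse on the first sensory-motor channel
    "anticlockwise":  [1, 0, 2],
}

def dispatch_schedule(behaviour, n_cycles):
    """(timestep, servo) activation events for a behaviour."""
    order = DISPATCH_ORDER[behaviour]
    return [(c * PERIOD + k * LAG, servo)
            for c in range(n_cycles)
            for k, servo in enumerate(order)]

print(dispatch_schedule("swirl_forward", 2))
# [(0, 0), (3, 2), (6, 1), (12, 0), (15, 2), (18, 1)]
```

Keeping the behaviour repertoire as a lookup of permutations is what makes this approach analysable, in contrast to an evolved controller whose gait cannot easily be inspected.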

Figure 5.4: Network performance

2. Swirl Backward. All the network dynamics remain the same except for the sequence of wave propagation. Here the third control signal passes through the third servo first; then, after a lag, the first is discharged; and finally, after another lag, the second. This motion too is highly linear, even over large distances.

3. Clockwise Turn. Let's first consider the approach of evolution, keeping the training of the network on the same set of consecutive out-of-phase CPG signals. All we then need is to limit the network performance and freeze the corresponding weight and bias matrices, as shown in the following matrices and in figure 5.5.

InputWeights =

LayerWeights =

Biases =

Figure 5.5: Turn

The order in which the waves are dispensed in this case is two, followed by one, followed by three. Let's study the second, planned approach next, with the anti-clockwise turn.

4. Anti-clockwise Turn. For this we need to modify the network training dynamics: the first sensory-motor feed should be a constant pulse, followed by the two out-of-phase CPG signals for the next two neuron pairs. Once trained, the waves generated by this network must be dispensed first through the second servo, then through the first, followed by the third. The creature makes a sharp 30-degree turn in a standard 100 iterations, with its centre of mass remaining at the same point throughout, as can be seen in figure 5.5, which is an excellent result for our purpose.

Figure 5.6: Forward

5.4 State Space

One more thing to note is that the motion dynamics change significantly with increasing creature length: there is a sharp increase in the distance covered by the centre of mass of the entire modular robot. The Stage-4 replicator covers almost double the distance of its Stage-2 predecessor, that is 0.1 m, as can be seen in figure 5.6, with all other dynamics scaling up but otherwise remaining the same. Two possible reasons up front are:

1. As the creature acquires new modules, there is a linear increase in the number of servos, so the net forward thrust increases as well.

2. However, the loading on the intermediate servos also increases, as each becomes responsible for moving a longer chain of attached modules.

To assess the stability of the motion dynamics, the controller can be viewed as a continuous-time Linear Time Invariant system, with the signals as inputs to the system. Such a system can be represented by equation 5.2:

D^(n) y(t) + Σ_{i=0}^{n-1} α_i(t) D^(i) y(t) = β u(t)    (5.2)
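Assuming constant coefficients α_i (the time-invariant case named in the text), the stability of equation 5.2 can be checked from the eigenvalues of the companion-form state matrix. A minimal sketch with illustrative coefficients, not values measured from the robot:

```python
import numpy as np

# Companion (controllable canonical) form of
#   D^(n) y + sum_{i<n} alpha_i D^(i) y = beta * u
# with constant alpha_i. The system is asymptotically stable iff all
# eigenvalues of A have negative real part. Coefficients below are
# illustrative assumptions.

def companion(alphas):
    n = len(alphas)
    A = np.zeros((n, n))
    A[:-1, 1:] = np.eye(n - 1)        # chain of integrators
    A[-1, :] = -np.asarray(alphas)    # last row from the ODE coefficients
    return A

def is_stable(alphas):
    return bool(np.all(np.linalg.eigvals(companion(alphas)).real < 0))

print(is_stable([2.0, 3.0]))   # y'' + 3y' + 2y = b*u: roots -1, -2 -> True
print(is_stable([-1.0, 0.0]))  # y'' - y = b*u: roots +1, -1 -> False
```

The same check extends to longer creatures by enlarging the coefficient vector, so the growing servo chain can be screened for unstable modes before a gait is deployed.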


More information

Appendix A Prototypes Models

Appendix A Prototypes Models Appendix A Prototypes Models This appendix describes the model of the prototypes used in Chap. 3. These mathematical models can also be found in the Student Handout by Quanser. A.1 The QUANSER SRV-02 Setup

More information

Last update: October 26, Neural networks. CMSC 421: Section Dana Nau

Last update: October 26, Neural networks. CMSC 421: Section Dana Nau Last update: October 26, 207 Neural networks CMSC 42: Section 8.7 Dana Nau Outline Applications of neural networks Brains Neural network units Perceptrons Multilayer perceptrons 2 Example Applications

More information

CS:4420 Artificial Intelligence

CS:4420 Artificial Intelligence CS:4420 Artificial Intelligence Spring 2018 Neural Networks Cesare Tinelli The University of Iowa Copyright 2004 18, Cesare Tinelli and Stuart Russell a a These notes were originally developed by Stuart

More information

Artificial Neural Networks. Edward Gatt

Artificial Neural Networks. Edward Gatt Artificial Neural Networks Edward Gatt What are Neural Networks? Models of the brain and nervous system Highly parallel Process information much more like the brain than a serial computer Learning Very

More information

Artificial Intelligence Hopfield Networks

Artificial Intelligence Hopfield Networks Artificial Intelligence Hopfield Networks Andrea Torsello Network Topologies Single Layer Recurrent Network Bidirectional Symmetric Connection Binary / Continuous Units Associative Memory Optimization

More information

CS 6501: Deep Learning for Computer Graphics. Basics of Neural Networks. Connelly Barnes

CS 6501: Deep Learning for Computer Graphics. Basics of Neural Networks. Connelly Barnes CS 6501: Deep Learning for Computer Graphics Basics of Neural Networks Connelly Barnes Overview Simple neural networks Perceptron Feedforward neural networks Multilayer perceptron and properties Autoencoders

More information

Chapter 8: Introduction to Evolutionary Computation

Chapter 8: Introduction to Evolutionary Computation Computational Intelligence: Second Edition Contents Some Theories about Evolution Evolution is an optimization process: the aim is to improve the ability of an organism to survive in dynamically changing

More information

COMP9444 Neural Networks and Deep Learning 11. Boltzmann Machines. COMP9444 c Alan Blair, 2017

COMP9444 Neural Networks and Deep Learning 11. Boltzmann Machines. COMP9444 c Alan Blair, 2017 COMP9444 Neural Networks and Deep Learning 11. Boltzmann Machines COMP9444 17s2 Boltzmann Machines 1 Outline Content Addressable Memory Hopfield Network Generative Models Boltzmann Machine Restricted Boltzmann

More information

Neural Networks. Fundamentals Framework for distributed processing Network topologies Training of ANN s Notation Perceptron Back Propagation

Neural Networks. Fundamentals Framework for distributed processing Network topologies Training of ANN s Notation Perceptron Back Propagation Neural Networks Fundamentals Framework for distributed processing Network topologies Training of ANN s Notation Perceptron Back Propagation Neural Networks Historical Perspective A first wave of interest

More information

Neural Network Control of an Inverted Pendulum on a Cart

Neural Network Control of an Inverted Pendulum on a Cart Neural Network Control of an Inverted Pendulum on a Cart VALERI MLADENOV, GEORGI TSENOV, LAMBROS EKONOMOU, NICHOLAS HARKIOLAKIS, PANAGIOTIS KARAMPELAS Department of Theoretical Electrical Engineering Technical

More information

Multilayer Perceptron Tutorial

Multilayer Perceptron Tutorial Multilayer Perceptron Tutorial Leonardo Noriega School of Computing Staffordshire University Beaconside Staffordshire ST18 0DG email: l.a.noriega@staffs.ac.uk November 17, 2005 1 Introduction to Neural

More information

Artificial Neural Network

Artificial Neural Network Artificial Neural Network Contents 2 What is ANN? Biological Neuron Structure of Neuron Types of Neuron Models of Neuron Analogy with human NN Perceptron OCR Multilayer Neural Network Back propagation

More information

Ch.8 Neural Networks

Ch.8 Neural Networks Ch.8 Neural Networks Hantao Zhang http://www.cs.uiowa.edu/ hzhang/c145 The University of Iowa Department of Computer Science Artificial Intelligence p.1/?? Brains as Computational Devices Motivation: Algorithms

More information

Artificial Neural Networks

Artificial Neural Networks Artificial Neural Networks 鮑興國 Ph.D. National Taiwan University of Science and Technology Outline Perceptrons Gradient descent Multi-layer networks Backpropagation Hidden layer representations Examples

More information

Course 395: Machine Learning - Lectures

Course 395: Machine Learning - Lectures Course 395: Machine Learning - Lectures Lecture 1-2: Concept Learning (M. Pantic) Lecture 3-4: Decision Trees & CBC Intro (M. Pantic & S. Petridis) Lecture 5-6: Evaluating Hypotheses (S. Petridis) Lecture

More information

ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92

ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92 ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92 BIOLOGICAL INSPIRATIONS Some numbers The human brain contains about 10 billion nerve cells (neurons) Each neuron is connected to the others through 10000

More information

Artificial Neural Networks Examination, March 2002

Artificial Neural Networks Examination, March 2002 Artificial Neural Networks Examination, March 2002 Instructions There are SIXTY questions (worth up to 60 marks). The exam mark (maximum 60) will be added to the mark obtained in the laborations (maximum

More information

Dynamics of Ocean Structures Prof. Dr. Srinivasan Chandrasekaran Department of Ocean Engineering Indian Institute of Technology, Madras

Dynamics of Ocean Structures Prof. Dr. Srinivasan Chandrasekaran Department of Ocean Engineering Indian Institute of Technology, Madras Dynamics of Ocean Structures Prof. Dr. Srinivasan Chandrasekaran Department of Ocean Engineering Indian Institute of Technology, Madras Module - 01 Lecture - 09 Characteristics of Single Degree - of -

More information

Lecture 9 Evolutionary Computation: Genetic algorithms

Lecture 9 Evolutionary Computation: Genetic algorithms Lecture 9 Evolutionary Computation: Genetic algorithms Introduction, or can evolution be intelligent? Simulation of natural evolution Genetic algorithms Case study: maintenance scheduling with genetic

More information

EEE 241: Linear Systems

EEE 241: Linear Systems EEE 4: Linear Systems Summary # 3: Introduction to artificial neural networks DISTRIBUTED REPRESENTATION An ANN consists of simple processing units communicating with each other. The basic elements of

More information

Revision: Neural Network

Revision: Neural Network Revision: Neural Network Exercise 1 Tell whether each of the following statements is true or false by checking the appropriate box. Statement True False a) A perceptron is guaranteed to perfectly learn

More information

Chapter 9: The Perceptron

Chapter 9: The Perceptron Chapter 9: The Perceptron 9.1 INTRODUCTION At this point in the book, we have completed all of the exercises that we are going to do with the James program. These exercises have shown that distributed

More information