Active Guidance for a Finless Rocket using Neuroevolution

Gomez, F.J. & Miikkulainen, R. (2003). Active Guidance for a Finless Rocket using Neuroevolution. Genetic and Evolutionary Computation (GECCO), 2724, 2084-2095.
Introduction
Sounding rockets are used to make scientific measurements of the Earth's upper atmosphere. They serve an invaluable role in many areas of scientific research, including high-g-force testing, meteorology, radio astronomy, environmental sampling and micro-gravity experimentation, and have been in use for over 40 years. Today, they are the most cost-effective platform for upper-atmosphere experiments. To maintain stability and keep a relatively straight path, sounding rockets are equipped with fins. Fins, however, increase both the mass and the drag of the rocket, which lowers the final altitude, or apogee, that can be reached with a given amount of fuel. A rocket with smaller fins or no fins could fly higher than a full-finned design, but it is unstable and therefore needs active attitude control, or guidance. Early guidance systems for finless rockets were based on classical feedback control such as Proportional-Integral-Derivative (PID) methods, which control the thrust angle of the engines. Because rocket flight dynamics are highly non-linear, engineers must make simplifying assumptions to apply these linear methods, and these assumptions must not be violated during flight. This requires detailed knowledge of the rocket's dynamics, which can be very costly to acquire. Non-linear approaches such as neural networks have also been explored. Neural networks can implement arbitrary non-linear mappings that can make control much more accurate and robust, but unfortunately they still require significant domain knowledge to train. The authors propose a way to develop more economical finless sounding rockets by using Enforced SubPopulations (ESP) to evolve a neural network guidance system. The test case is a finless version of the RSX-2 sounding rocket, which uses differential thrust of its four engines to control its attitude.
By evolving a neural network controller that maps the state of the rocket to thrust commands, the guidance problem can be solved without analytical modelling of the rocket's dynamics and without prior knowledge of the appropriate control strategy. A sufficiently accurate simulator and a fitness function are required to test ESP.
Cognitive Robotics\Applications Summary of: Gomez (2003) Esar van Hal 1

Enforced SubPopulations (ESP)
ESP is a neuroevolution method that extends the Symbiotic, Adaptive Neuroevolution algorithm (SANE). Both ESP and SANE evolve partial solutions, i.e. neurons, instead of complete networks, and a subset of these neurons is put together to form a complete network. In ESP, the subpopulations are predefined and a neuron can only be recombined with members of its own subpopulation (explicit subtasks). This way, the neurons in each subpopulation can evolve independently and rapidly specialize into good network sub-functions. Evolution in ESP proceeds as follows:
1. Initialization. For each hidden unit of the networks to be formed, a subpopulation of neuron chromosomes is created. Each chromosome encodes the input and output connection weights of a neuron as a random string of real numbers.
2. Evaluation. A set of neurons is selected at random, one from each subpopulation, to form the hidden layer of a complete network. This network is submitted to a trial and awarded a fitness score, which is added to the cumulative fitness of each neuron that participated in it. The process is repeated until each neuron has participated in an average of, e.g., 10 trials.
3. Recombination. The average fitness of each neuron is calculated by dividing its cumulative fitness by the number of trials in which it participated. The neurons are then ranked by average fitness, and each neuron in the top quartile is recombined with a higher-ranking neuron. The offspring replace the lowest-ranking half of the subpopulation.
4. Repetition. The evaluation-recombination cycle is repeated until a network that performs sufficiently well is found.
Evolving neural networks at the neuron level is an efficient way of solving reinforcement learning tasks. ESP is more efficient than SANE for two reasons. First, because the subpopulations are defined from the start, the neurons do not have to organize themselves into specializations within one single large population, and their progressive specialization is not hindered by recombination across specializations.
Second, because each network is formed from one representative of every specialization, each neuron is always evaluated on how well it performs its role in the context of the other neurons in the network. As in a standard genetic algorithm (GA), diversity declines over the course of evolution in ESP. To counter this premature convergence, ESP is combined with burst mutation: when performance has stagnated for a predetermined number of generations, new subpopulations are created by adding noise to each of the neurons in the best solution found so far. Evolution then resumes, searching the neighbourhood of the previous best solution. The noise is drawn from a Cauchy distribution, which ensures that most changes are small while still allowing larger changes to some weights:
f(x) = \frac{a}{\pi (a^2 + x^2)}
where a is the scale parameter of the distribution. Recharging the subpopulations in this way maintains diversity, so that solutions can be found even during prolonged evolution.
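The ESP cycle described above, including burst mutation, can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy task (`toy_trial`, which rewards a small network for approximating y = x² instead of flying a rocket), the population sizes, the one-point crossover, the mutation rate, the stagnation limit and the Cauchy scale `a = 0.3` are all hypothetical choices. Each mating yields two offspring, so when the subpopulation size is a multiple of four the top quartile produces exactly enough children to replace the bottom half.

```python
import math
import random

# Hypothetical toy task (not the rocket): approximate y = x**2 on nine points.
SAMPLES = [(x / 4.0, (x / 4.0) ** 2) for x in range(-4, 5)]
GENES = 3  # each neuron chromosome: [input weight, bias, output weight]


def network_output(neurons, x):
    """One input, one tanh hidden unit per neuron, linear output unit."""
    return sum(w_out * math.tanh(w_in * x + b) for w_in, b, w_out in neurons)


def toy_trial(neurons):
    """Fitness of a complete network: negative mean squared error (higher is better)."""
    err = sum((network_output(neurons, x) - y) ** 2 for x, y in SAMPLES)
    return -err / len(SAMPLES)


def cauchy(a):
    """Sample Cauchy noise with scale a via the inverse CDF."""
    return a * math.tan(math.pi * (random.random() - 0.5))


def esp(h=4, n=20, trials_per_neuron=10, generations=100,
        stagnation_limit=20, a=0.3, seed=0):
    random.seed(seed)
    # 1. Initialization: one subpopulation of n random chromosomes per hidden unit.
    subpops = [[[random.uniform(-1, 1) for _ in range(GENES)] for _ in range(n)]
               for _ in range(h)]
    best_net, best_fit, stale = None, -math.inf, 0
    for _ in range(generations):
        # 2. Evaluation: networks built from one random neuron per subpopulation.
        fit_sum = [[0.0] * n for _ in range(h)]
        count = [[0] * n for _ in range(h)]
        improved = False
        for _ in range(trials_per_neuron * n):
            idx = [random.randrange(n) for _ in range(h)]
            net = [subpops[i][j] for i, j in enumerate(idx)]
            f = toy_trial(net)
            for i, j in enumerate(idx):
                fit_sum[i][j] += f
                count[i][j] += 1
            if f > best_fit:
                best_fit, best_net, improved = f, [list(w) for w in net], True
        # 3. Recombination, separately inside each subpopulation.
        for i in range(h):
            avg = [fit_sum[i][j] / count[i][j] if count[i][j] else -math.inf
                   for j in range(n)]
            order = sorted(range(n), key=lambda j: avg[j], reverse=True)
            ranked = [subpops[i][j] for j in order]
            offspring = []
            for k in range(max(1, n // 4)):  # top quartile
                # Mate with a random higher-ranked neuron; the best one, having
                # none, mates with the runner-up (an assumption of this sketch).
                mate = ranked[random.randrange(k)] if k > 0 else ranked[1]
                cut = random.randrange(1, GENES)  # one-point crossover
                offspring.append(ranked[k][:cut] + mate[cut:])
                offspring.append(mate[:cut] + ranked[k][cut:])
            for child in offspring:  # low-level mutation (assumed rate and size)
                if random.random() < 0.3:
                    child[random.randrange(GENES)] += random.gauss(0.0, 0.3)
            new = offspring[: n // 2]
            subpops[i] = ranked[: n - len(new)] + new  # replace the bottom half
        # Burst mutation: after stagnation, rebuild every subpopulation around
        # the corresponding neuron of the best network using Cauchy noise.
        stale = 0 if improved else stale + 1
        if stale >= stagnation_limit:
            subpops = [[[w + cauchy(a) for w in best_net[i]] for _ in range(n)]
                       for i in range(h)]
            stale = 0
    # 4. A fixed generation budget stands in for "until a good network is found".
    return best_net, best_fit


if __name__ == "__main__":
    net, fit = esp()
    print(f"best fitness (negative MSE): {fit:.4f}")
```

The fixed generation budget replaces the open-ended stopping criterion of step 4 so that the sketch always terminates.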

The Finless Rocket Guidance Task
The motion of a rocket is defined by the translation of its center of gravity (CG) and the rotation of the body about the CG in the pitch, yaw and roll axes. Four forces act upon the rocket in flight: (1) the thrust of the engines, (2) the drag of the atmosphere, exerted at the center of pressure (CP), (3) the lift force generated by the fins along the yaw axis, and (4) the side force generated by the fins along the pitch axis. The angle between the direction in which the rocket is flying and the longitudinal axis of the rocket in the yaw-roll plane is known as the angle of attack, α; the corresponding angle in the pitch-roll plane is known as the sideslip angle, β. When α or β is greater than 0 degrees, the rocket will start to tumble if it is not stable. With fins, the lift or side force of the fins generates a torque that counteracts the drag torque, so α and β are minimized. Without fins, the CP lies ahead of the CG, which makes the rocket unstable: any nonzero α or β tends to grow during flight. The interactions between the rocket and the atmosphere are highly non-linear and complex, and the behaviour changes continuously throughout flight. From 0 to about 22,000 ft the drag rises sharply, increasing the torque on the rocket and making attitude control increasingly difficult. During this period the distance between the CG and the CP also increases, because the consumption of fuel causes the CG to move back. Both effects make the rocket increasingly unstable. Above 22,000 ft the air becomes less dense, so drag decreases and the CP steadily migrates back towards the CG, making the rocket easier to control. For ESP, this means that the task becomes progressively harder as the population improves and the rocket is controlled to higher altitudes. Even above 22,000 ft, evolutionary progress remains difficult because the controller is constantly entering unfamiliar regions of the state space.
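The effect of the CP position on stability can be illustrated with a deliberately simplified, linearized model of the angle of attack. This model is an assumption of the sketch, not the flight model used in the paper: the aerodynamic torque is taken to be proportional to α, with its sign set by a hypothetical `margin` (positive when the CP lies behind the CG, negative when it lies ahead), plus a small damping term. The constants `k` and `c` are made up for illustration.

```python
def simulate_alpha(margin, alpha0_deg=1.0, k=4.0, c=0.5, dt=0.01, t_end=5.0):
    """Toy linearized attitude model: alpha'' = -k * margin * alpha - c * alpha'.

    margin > 0: CP behind the CG (as with fins), so the torque restores alpha.
    margin < 0: CP ahead of the CG (finless), so the torque amplifies alpha.
    k and c are hypothetical constants; integration is semi-implicit Euler.
    """
    alpha, rate = alpha0_deg, 0.0
    for _ in range(int(round(t_end / dt))):
        accel = -k * margin * alpha - c * rate
        rate += accel * dt
        alpha += rate * dt
    return alpha


if __name__ == "__main__":
    for margin in (1.0, -1.0):
        print(f"margin {margin:+.1f}: alpha after 5 s = {simulate_alpha(margin):.3g} deg")
```

With a positive margin a 1° disturbance decays; with a negative margin the same disturbance grows exponentially, which is why the finless rocket needs active guidance.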
The control task proved too hard to solve directly: evolution stalls and converges to a local maximum. Therefore, an incremental evolution approach is applied, in which the initial task is made easier by first evolving the controller on a more stable version of the rocket.
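Incremental evolution can be illustrated on a toy problem. In the sketch below, which is hypothetical throughout, a (1+1) hill climber stands in for ESP, and a made-up `task_fitness` plays the role of the rocket: its best controller "gain" shifts slightly as `fin_size` shrinks, while the region of the search space that yields any fitness at all narrows sharply. Started at `gain = 0.0`, the finless task alone gives zero fitness everywhere nearby, so there is nothing to climb; evolving on the quarter-finned task first carries the solution into the narrow region where the finless task can be optimized.

```python
import random


def task_fitness(gain, fin_size):
    """Hypothetical toy task standing in for the rocket. fin_size: 1.0 = full
    fins, 0.25 = quarter fins, 0.0 = finless. The best gain shifts slightly as
    the fins shrink, and the region with nonzero fitness narrows sharply."""
    target = 2.0 + 0.1 * (1.0 - fin_size)
    tolerance = 0.05 + 10.0 * fin_size
    return max(0.0, tolerance - abs(gain - target))


def hill_climb(fin_size, start, rng, steps=2000, sigma=0.05):
    """(1+1) hill climber used here as a simple stand-in for ESP."""
    best, best_fit = start, task_fitness(start, fin_size)
    for _ in range(steps):
        cand = best + rng.gauss(0.0, sigma)
        fit = task_fitness(cand, fin_size)
        if fit >= best_fit:
            best, best_fit = cand, fit
    return best, best_fit


def incremental(schedule=(0.25, 0.0), start=0.0, seed=0):
    """Evolve on each task in turn, seeding every stage with the previous result."""
    rng = random.Random(seed)
    gain, fit = start, task_fitness(start, schedule[0])
    for fin_size in schedule:
        gain, fit = hill_climb(fin_size, gain, rng)
    return gain, fit


if __name__ == "__main__":
    gain, fit = incremental()
    print(f"finless gain {gain:.3f}, fitness {fit:.3f}")
```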

Rocket Control Experiments
Simulation environment
An adapted version of the JSBSim Flight Dynamics Model is used as the evolution environment. The aerodynamic forces and moments on the rocket were calculated using a detailed geometric model of the RSX-2. Four fin configurations were used: full fins, half-sized fins, quarter-sized fins and no fins.
Control architecture
The controller is a feedforward neural network with one hidden layer. The control timestep is 0.05 seconds. At each timestep, the controller receives an input vector describing the current orientation, the rate of change of orientation, α, β, the current throttle positions of the four thrusters, the altitude and the velocity. This input vector is propagated through the network to produce a new throttle position for each engine:
u_i = 1.0 - \frac{o_i}{\delta}, \quad i = 1, \dots, 4
where u_i is the throttle position of thruster i, o_i is the value of network output unit i, 0 ≤ u_i, o_i ≤ 1, and δ ≥ 1.0. δ controls how far the controller is permitted to throttle back an engine from 100% thrust.
Experimental setup
The objective was for ESP to evolve a controller that modulates the thrust of the rocket's engines so as to keep α and β within ±5 degrees. Each network was therefore evaluated in a single trial consisting of the following four phases:
1. At t_0, the rocket is on a 50 ft rail and the engines are ignited.
2. At t_1 > t_0, the rocket begins its ascent with the engines at full thrust.
3. At t_2 > t_1, the rocket leaves the rail and the controller begins to modulate the thrust.
4. At t_f > t_2, one of two events occurs:
   a. α or β exceeds ±5 degrees: failure.
   b. Burnout is reached: success.
The rocket's altitude at t_f is its fitness score. The network weight space was found to contain a large locally maximal region corresponding to controllers that keep all four engines at full throttle. This policy is likely to appear in the initial populations, but it does not solve the task. Therefore, all controllers that exhibited this policy were penalized by setting their fitness score to zero.
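The trial rules above can be expressed as a scoring function over a recorded flight. Because the real evaluation runs inside the adapted JSBSim model, this sketch assumes a hypothetical per-timestep record, `FlightStep`, produced by some simulator. Only the ±5° bound on α and β, the burnout success condition, the altitude-at-t_f fitness and the zero score for the all-full-throttle policy come from the text; detecting that policy as "every throttle stayed at exactly 1.0 for the whole controlled flight" is an assumption, as is using the last record when neither event occurs.

```python
from dataclasses import dataclass
from typing import List, Sequence

ALPHA_BETA_LIMIT_DEG = 5.0


@dataclass
class FlightStep:
    """One 0.05 s control step after the rocket has left the rail (hypothetical record)."""
    alpha_deg: float
    beta_deg: float
    altitude_ft: float
    throttles: Sequence[float]  # four throttle positions u_i in [0, 1]
    burnout: bool = False


def trial_fitness(steps: List[FlightStep]) -> float:
    """Return the altitude at t_f, or 0.0 for the penalized full-throttle policy."""
    if not steps:
        return 0.0
    t_f = len(steps) - 1  # assumption: last record if neither event occurs
    for t, s in enumerate(steps):
        out_of_bounds = (abs(s.alpha_deg) > ALPHA_BETA_LIMIT_DEG
                         or abs(s.beta_deg) > ALPHA_BETA_LIMIT_DEG)
        if out_of_bounds or s.burnout:  # failure or success ends the trial
            t_f = t
            break
    flown = steps[: t_f + 1]
    if all(u == 1.0 for s in flown for u in s.throttles):
        return 0.0  # penalize the "all engines at full throttle" local maximum
    return flown[-1].altitude_ft


if __name__ == "__main__":
    flight = [FlightStep(0.5, -0.3, 1000.0, [1.0, 0.95, 1.0, 0.97]),
              FlightStep(1.0, 0.2, 70000.0, [0.98, 1.0, 0.99, 1.0], burnout=True)]
    print(trial_fitness(flight))
```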
The simulation used 10 subpopulations (i.e., 10 hidden units) of 200 neurons each, and δ was set to 10, meaning that the throttle of each engine ranges between 90% and 100%. This range proved sufficient in early testing. For the incremental evolution approach described earlier, the quarter-finned rocket was used as the initial, easier task.
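The controller's forward pass and throttle mapping can be sketched as follows. The single hidden layer of 10 units, the four outputs and δ = 10 follow the text; the sigmoid activations, the absence of biases, the 14-element input layout and the random weights are assumptions for illustration. Because the sigmoid keeps each o_i in [0, 1], the mapping u_i = 1.0 - o_i/δ confines every throttle to the 90% to 100% range described above.

```python
import math
import random

N_INPUTS = 14   # assumed: orientation (3), orientation rates (3), alpha, beta,
                # four throttle positions, altitude, velocity
N_HIDDEN = 10   # 10 subpopulations -> 10 hidden units (from the text)
N_OUTPUTS = 4   # one output o_i per engine
DELTA = 10.0    # delta = 10 -> throttles limited to the 0.9..1.0 range


def sigmoid(z):
    """Numerically stable logistic function; output lies in [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(state, w_in, w_out):
    """One-hidden-layer feedforward pass (biases omitted for brevity)."""
    hidden = [sigmoid(sum(w * s for w, s in zip(row, state))) for row in w_in]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_out]


def throttles(outputs, delta=DELTA):
    """u_i = 1.0 - o_i / delta for i = 1..4, with 0 <= o_i <= 1 and delta >= 1.0."""
    if delta < 1.0:
        raise ValueError("delta must be >= 1.0")
    return [1.0 - o / delta for o in outputs]


if __name__ == "__main__":
    rng = random.Random(0)
    w_in = [[rng.uniform(-1, 1) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
    w_out = [[rng.uniform(-1, 1) for _ in range(N_HIDDEN)] for _ in range(N_OUTPUTS)]
    state = [rng.uniform(-1, 1) for _ in range(N_INPUTS)]  # stand-in normalized state
    print([round(u, 3) for u in throttles(forward(state, w_in, w_out))])
```

In ESP terms, row j of `w_in` together with the j-th weight of every row of `w_out` would form the chromosome of hidden neuron j.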

Results
ESP solved the quarter-finned control task in approximately 600,000 evaluations. Another 50,000 evaluations were required to transition to the finless rocket. The full-finned rocket without guidance reached burnout at approximately 70,000 ft; the guided quarter-finned and finless rockets exceed this altitude by 10,000 ft and 15,000 ft, respectively. The final altitude of the finless rocket is about 20 miles higher than that of the finned rocket. The controller makes smooth changes to the thrust of the engines throughout the flight. With guidance, α and β are kept at very small values up to burnout, whereas the unguided quarter-finned and half-finned rockets start to tumble as soon as α and β diverge from 0 degrees. Compared to the quarter-finned controller, the finless controller not only solves a more difficult task but also achieves better performance.
Discussion/Conclusion
The rocket control task is representative of complex non-linear control tasks. The advantage of ESP over traditional engineering approaches is that it requires neither formal knowledge of the system's behaviour nor prior knowledge of the correct control behaviour. Furthermore, the experiments show that the differential thrust approach is feasible, and the simulation has provided valuable information about the behaviour of the rocket. In future research, the task will be made more realistic in two ways: (1) the controller will no longer receive α and β as inputs, and (2) instead of generating a continuous control signal, the network will output a binary vector specifying which engines to throttle back. After this, work will focus on varying environmental parameters and incorporating noise and wind. The authors conclude that the guidance system evolved through ESP is able to stabilize the finless rocket and greatly improve its final altitude compared to the full-finned, stable version of the rocket, and that neuroevolution is a promising approach for difficult non-linear control tasks in the real world.

Own Discussion
In general, the article explains processes only superficially. When describing the ESP method, the authors do not go into detail about the algorithm underlying ESP. This not only hinders insight into the ESP algorithm, but also compromises the ability to reproduce the study. The authors claim that ESP can be generalised to other tasks; the examples given in the article are closed-loop tasks, such as pole-balancing. Although the authors also mention tasks that could be controlled in a more open-loop manner, such as robot arm control, I doubt that ESP in its current form will be able to perform these sorts of tasks, i.e. without formal knowledge of the behaviour of the system and without prior knowledge of the correct control behaviour. Furthermore, tests are performed with quarter-finned and finless rockets only. I wonder why the authors did not use a gradually decreasing fin size when implementing the incremental evolution approach. This seems a more natural way of implementing incremental neuroevolution and avoids arbitrary choices, in this case the choice among discrete fin sizes (full, half, quarter and no fins). The authors stop the simulations when burnout is reached, and in the case of success the fitness score is the altitude reached at burnout. However, after burnout the finless rocket can no longer be controlled, yet it is still unstable. The authors neither mention this nor provide possible solutions to this problem, e.g. retractable fins that unfold directly after burnout. In the conclusion, the authors state that neuroevolution is a promising approach for difficult non-linear control tasks in the real world. As this is based on simulation experiments only, I find it a rather preliminary conclusion.