
Application of Neural Networks for Control of Inverted Pendulum

VALERI MLADENOV
Department of Theoretical Electrical Engineering
Technical University of Sofia
Sofia, Kliment Ohridski blvd. 8; BULGARIA
valerim@tu-sofia.bg

Abstract: - The balancing of an inverted pendulum by moving a cart along a horizontal track is a classic problem in the area of automatic control. In this paper two neural network controllers that swing a pendulum attached to a cart from an initial downward position to an upright position and maintain that state are described. Both controllers are able to learn the demonstrated behaviour, namely swinging up and balancing the inverted pendulum in the upright position starting from different initial angles.

Key-Words: - neural networks, inverted pendulum, nonlinear control, neural network controller

1 Introduction
Inverted pendulum control is an old and challenging problem which quite often serves as a test-bed for a broad range of engineering applications. It is a classic problem in dynamics and control theory and is widely used as a benchmark for testing control algorithms (PID controllers, neural networks, fuzzy control, genetic algorithms, etc.). There are many methods in the literature (see [1], [2], [3] and the references therein) that are used to control an inverted pendulum on a cart, ranging from classical control to machine-learning-based techniques. Rocket guidance systems, robotics, and crane control are common areas where these methods are applicable. The largest implemented uses are on huge lifting cranes in shipyards. When moving shipping containers back and forth, the cranes move the box so that it never swings or sways: it stays positioned under the operator even when moving or stopping quickly.

The inverted pendulum system inherently has two equilibria, one of which is stable while the other is unstable. The stable equilibrium corresponds to the state in which the pendulum is pointing downwards. In the absence of any control force, the system naturally returns to this state. Since the stable equilibrium requires no control input, it is uninteresting from a control perspective. The unstable equilibrium corresponds to the state in which the pendulum points strictly upwards and thus requires a control force to maintain this position. The basic control objective of the inverted pendulum problem is to maintain the unstable equilibrium position when the pendulum initially starts in an upright position.

In this paper we utilize two neural network controllers, based on a Multilayer Feedforward Neural Network (MLFF NN) and a Radial Basis Function Network (RBFN), to solve the control problem. The objective is achieved by means of supervised learning. In practice, the teacher (or supervisor) would be a human performing the task. Since a human teacher is not available in simulation, a nonlinear controller has been selected as the teacher.

The paper is organized as follows. In the next chapter a brief description of the mathematical model of the system and of the teacher controller is given. The control problem is then explained in chapter 3, and in chapter 4 the neural networks that control the pendulum are described. Simulation results are presented in chapter 5 and the conclusions are given in the last chapter.

2 Mathematical model
The inverted pendulum considered in this paper is shown in Figure 1. A cart equipped with a motor provides horizontal motion, while the cart position x and the joint angle θ can be measured via a quadrature encoder.
Figure 1: Inverted pendulum system

The pendulum's mass m is assumed to be equally distributed along its body, so its centre of gravity lies at the middle of its length, i.e. at l = L/2. No damping or any other kind of friction has been considered when deriving the model. The parameters of the model used in the simulations are listed in Table 1.

Table 1: Table of parameters

    Parameter                                           Value
    Mass of pendulum, m                                 [kg]
    Mass of cart, M                                     [kg]
    Length of bar, L                                    [m]
    Standard gravity, g                                 9.8 [m/s²]
    Moment of inertia of bar w.r.t. its CoG, I = ml²/3  [kg·m²]

The system is underactuated: there are two degrees of freedom, the pendulum angle and the cart position, but only the cart is actuated, by a horizontal force acting on it. The equations of motion can be derived either by the Lagrangian method or by the free-body diagram method. Using the latter, the system is divided into the two separate free-body diagrams shown in Figure 2. The coordinates of the centre of gravity of the pendulum can be written as

$$x_p = x + l\sin\theta, \qquad y_p = l\cos\theta .$$

The velocities and accelerations in these directions are

$$\dot{x}_p = \dot{x} + l\dot{\theta}\cos\theta, \qquad \ddot{x}_p = \ddot{x} + l\ddot{\theta}\cos\theta - l\dot{\theta}^2\sin\theta ,$$

$$\dot{y}_p = -l\dot{\theta}\sin\theta, \qquad \ddot{y}_p = -l\ddot{\theta}\sin\theta - l\dot{\theta}^2\cos\theta .$$

The force equilibrium for the pendulum in the x and y directions gives

$$m\ddot{x}_p = H, \qquad m\ddot{y}_p = V - mg ,$$

where H and V are the horizontal and vertical reaction forces at the joint. The torque equilibrium for the pendulum and the force equilibrium for the cart are

$$I\ddot{\theta} = Vl\sin\theta - Hl\cos\theta, \qquad M\ddot{x} = F - H .$$

Figure 2: Free body diagrams of cart and pendulum

Combining the above equations we obtain

$$(I + ml^2)\ddot{\theta} - mgl\sin\theta + ml\ddot{x}\cos\theta = 0 ,$$

$$(M + m)\ddot{x} + ml\ddot{\theta}\cos\theta - ml\dot{\theta}^2\sin\theta = F ,$$

or, in matrix form,

$$\begin{bmatrix} I + ml^2 & ml\cos\theta \\ ml\cos\theta & M + m \end{bmatrix} \begin{bmatrix} \ddot{\theta} \\ \ddot{x} \end{bmatrix} = \begin{bmatrix} mgl\sin\theta \\ F + ml\dot{\theta}^2\sin\theta \end{bmatrix} .$$

Solving for $\ddot{\theta}$ and $\ddot{x}$ gives

$$\ddot{\theta} = \frac{(M+m)mgl\sin\theta - m^2l^2\dot{\theta}^2\sin\theta\cos\theta - ml\cos\theta\, F}{(I+ml^2)(M+m) - m^2l^2\cos^2\theta} ,$$

$$\ddot{x} = \frac{-m^2l^2 g\sin\theta\cos\theta + (I+ml^2)\,ml\,\dot{\theta}^2\sin\theta + (I+ml^2)\,F}{(I+ml^2)(M+m) - m^2l^2\cos^2\theta} ,$$

and introducing the state variables $x_1 = \theta$, $x_2 = \dot{\theta}$, $x_3 = x$, $x_4 = \dot{x}$ and $u = F$, the inverted pendulum system can be described in state-space form as

$$\begin{aligned}
\dot{x}_1 &= x_2 \\
\dot{x}_2 &= \frac{(M+m)mgl\sin x_1 - m^2l^2 x_2^2\sin x_1\cos x_1 - ml\cos x_1\, u}{(I+ml^2)(M+m) - m^2l^2\cos^2 x_1} \\
\dot{x}_3 &= x_4 \\
\dot{x}_4 &= \frac{-m^2l^2 g\sin x_1\cos x_1 + (I+ml^2)\,ml\,x_2^2\sin x_1 + (I+ml^2)\,u}{(I+ml^2)(M+m) - m^2l^2\cos^2 x_1}
\end{aligned} \qquad (1)$$

The equilibrium points of the inverted pendulum are the pending position (the stable equilibrium point, i.e. $x_{eq} = [\pi \;\; 0 \;\; x_3 \;\; 0]^T$, where $x_3$ is any point on the track along which the cart moves) and the upright position (the unstable equilibrium point, i.e. $x_{eq} = [0 \;\; 0 \;\; x_3 \;\; 0]^T$).

The state-space equations linearized around $x = [\theta \;\; \dot{\theta} \;\; x \;\; \dot{x}]^T = [0 \;\; 0 \;\; 0 \;\; 0]^T$ are

$$\delta\dot{x} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ \dfrac{(M+m)mgl}{(I+ml^2)(M+m)-m^2l^2} & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ \dfrac{-m^2l^2 g}{(I+ml^2)(M+m)-m^2l^2} & 0 & 0 & 0 \end{bmatrix}\delta x + \begin{bmatrix} 0 \\ \dfrac{-ml}{(I+ml^2)(M+m)-m^2l^2} \\ 0 \\ \dfrac{I+ml^2}{(I+ml^2)(M+m)-m^2l^2} \end{bmatrix} u \qquad (2)$$

The eigenvalues of this linearized system, with the parameter values given in Table 1, form a symmetric real pair $\lambda_1 = -\lambda_2 > 0$ together with a double eigenvalue $\lambda_3 = \lambda_4 = 0$. Since $\lambda_1$ is positive (i.e. in the right half-plane), the upright equilibrium point is unstable.

The objective of the control problem is to swing the pendulum up from its pending position (or from another state) to the upright position and to balance it in that condition by using supervised learning. The control problem has two main parts: swinging up and balancing the inverted pendulum. Among classical techniques, energy-based methods (i.e. regulating the mechanical energy of the pendulum) [1], [2] are common for swing-up control, and linear control methods (PD, PID, state feedback, LQR, etc.) are common for balancing control. Artificial neural networks, fuzzy logic algorithms and reinforcement learning [3], [4], [5] are widely used in machine-learning-based approaches, for both the swing-up and the balancing phases.
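To make the model concrete, a minimal Matlab sketch of the nonlinear dynamics (1) and the linearization (2) is given below. It is not part of the original paper: the parameter values are illustrative placeholders (the numerical entries of Table 1 did not survive in the source), but the expressions follow (1) and (2) directly, and the eig call reproduces the eigenvalue pattern discussed above.

```matlab
% Minimal sketch of the cart-pole model (1) and its linearization (2).
% All parameter values are illustrative placeholders, not the paper's.
m = 0.2; M = 0.5; L = 0.6; l = L/2; g = 9.81; I = m*l^2/3;

% Nonlinear state-space model (1): x = [theta; theta_dot; pos; vel], u = F
D = @(th) (I + m*l^2)*(M + m) - m^2*l^2*cos(th)^2;   % common denominator
f = @(x, u) [ x(2);
             ((M+m)*m*g*l*sin(x(1)) - m^2*l^2*x(2)^2*sin(x(1))*cos(x(1)) ...
              - m*l*cos(x(1))*u) / D(x(1));
              x(4);
             (-m^2*l^2*g*sin(x(1))*cos(x(1)) + (I+m*l^2)*m*l*x(2)^2*sin(x(1)) ...
              + (I+m*l^2)*u) / D(x(1)) ];

% Linearized model (2) around the upright equilibrium [0 0 0 0]'
D0 = (I + m*l^2)*(M + m) - m^2*l^2;
A = [ 0               1 0 0;
      (M+m)*m*g*l/D0  0 0 0;
      0               0 0 1;
     -m^2*l^2*g/D0    0 0 0 ];
B = [0; -m*l/D0; 0; (I + m*l^2)/D0];
disp(eig(A))  % a +/- real pair and a double zero: the upright point is unstable
```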

3 The control problem
A simple controller to swing the pendulum up from its pendant position can be developed by means of physical reasoning: a constant force is applied to the cart according to the direction of the angular velocity of the pendulum, so that the pendulum swings up to the vicinity of the upright position. The balancing controller is a stabilizing linear controller, which can be a PD or PID controller, or can be designed by pole-placement techniques (e.g. Ackermann's formula) or by LQR. Here a PD controller is selected. The switch from the swinging controller to the linear controller is made when the pendulum angle is between θ = [0, π/9] or when θ = [17π/9, 2π]. This controller can formally be presented as

$$u(\theta, \dot{\theta}) = \begin{cases} K_1(0 - \theta) + K_2(0 - \dot{\theta}), & 0 \le \theta \le \pi/9 \\ K_1(2\pi - \theta) + K_2(0 - \dot{\theta}), & 17\pi/9 \le \theta \le 2\pi \\ F_0\,\mathrm{sgn}(\dot{\theta}), & \text{else} \end{cases} \qquad (3)$$

where $F_0$ is the constant swing-up force and $K_1$, $K_2$ are the PD gains. The closed-loop poles obtained with these gains are two poles on the negative real axis together with a double pole at the origin, corresponding to the cart position and velocity, which are not fed back. The results of the simulations performed with this controller, with the initial condition $x_0 = [\pi \;\; 0 \;\; 0 \;\; 0]^T$ and without any kind of sensor noise or actuator disturbance, are given in Figure 3.

Figure 3: Simulation results with the controller for infinite track length, for $x_0 = [\pi \;\; 0 \;\; 0 \;\; 0]^T$ (panels: pendulum angle [rad], pendulum angular velocity [rad/s], cart position [m], cart velocity [m/s], force acting on the cart [N])

It can be observed from Figure 3 that the pendulum is stabilized in the upright position, whereas the cart continues its travel with a constant velocity.
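For illustration, the teacher policy (3) and one swing-up run can be sketched in a few lines of Matlab, reusing the dynamics function f from the previous sketch. The gains F0, K1 and K2 are invented placeholders (the paper's numerical values did not survive extraction), and a simple forward-Euler loop stands in for the fixed-step Simulink solver.

```matlab
% Sketch of the hybrid teacher controller (3): PD balancing near upright
% (theta close to 0 or 2*pi), constant-force swing-up elsewhere.
% F0, K1 and K2 are placeholder gains, not the values used in the paper.
F0 = 5; K1 = 25; K2 = 4;
teacher = @(th, thd) ...
      (th <= pi/9)               * (K1*(0    - th) + K2*(0 - thd)) ...
    + (th >= 17*pi/9)            * (K1*(2*pi - th) + K2*(0 - thd)) ...
    + (th > pi/9 & th < 17*pi/9) * (F0 * sign(thd));

% One closed-loop swing-up from the pending position (a small initial push
% is added so that sign(theta_dot) is nonzero and the swing-up can start)
dt = 0.01; Tf = 20; x = [pi; 0.01; 0; 0]; X = zeros(4, round(Tf/dt));
for k = 1:round(Tf/dt)
    u = teacher(mod(x(1), 2*pi), x(2));   % wrap the angle into [0, 2*pi)
    x = x + dt * f(x, u);                 % forward-Euler step of model (1)
    X(:, k) = x;
end
plot((1:round(Tf/dt))*dt, X(1,:)), xlabel('time [s]'), ylabel('\theta [rad]')
```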

4 Neural network controllers
The swing-up part of the teacher is developed by means of physical reasoning, as described above: a constant force is applied according to the direction of the angular velocity of the pendulum. The teacher controller used here has only two inputs (the pendulum angle and the angular velocity), so it is not possible to keep the cart's travel limited, since no cart position (or velocity) feedback is used. However, with this choice of teacher controller it is easier to visualize, and thus to compare, the input/output mappings of the teacher and of the neural network. The input-output map (also called the policy) of this nonlinear controller is given in Figure 4. Here the horizontal axis represents the pendulum angle, the vertical axis represents the pendulum angular velocity, and the colours represent the actuator forces corresponding to each angle/angular-velocity pair. From the teacher, sensor values (i.e. pendulum angle and angular velocity) and actuator values (i.e. the input force acting on the cart) can be recorded.

Figure 4: Input-output map of the teacher controller

The simulations are performed using Matlab/Simulink and the Neural Networks Toolbox, and the design can be separated into two phases, the teaching (demonstration) phase and the execution phase. In both phases the simulations use a fixed-step solver (ode5, Dormand-Prince) with a fixed step size. The teaching phase for each demonstration can be presented graphically as shown in Figure 5a. Sensor noise is present in the simulations, selected as normally distributed random numbers with a different initial seed for each demonstration. Band-limited white noise is also added at the output of the controller, to represent the different choices a human demonstrator could make in the teaching phase; this makes the demonstrations less consistent and therefore a more realistic representation of human behaviour.

Figure 5a: Teaching phase for supervised learning

When the teaching phase is finished, the neural network is generated and added to the model using Matlab's gensim command. After that the execution phase, shown in Figure 5b, starts with the nonlinear controller replaced by the neural network and with the actuator disturbance removed. The Simulink model is shown in Figure 5c.

Figure 5b: Execution phase for supervised learning

Figure 5c: Simulation scheme for the execution phase

During the teaching phase five examples (or demonstrations) are used. The variance of the normally distributed sensor noise in each demonstration is selected as σ² = (π/18)² rad², which corresponds to a standard deviation of 10°. The actuator disturbance, representing the varying actions of the demonstrator, is band-limited white noise with a sampling time of 1 sec, which is close to the free swinging period of the pendulum; this means that the demonstrator can make an inconsistent action every second. Each demonstration starts from the pending initial position, $x_0 = [\pi \;\; 0 \;\; 0 \;\; 0]^T$, and lasts a fixed number of seconds. The pendulum angles for these demonstrations are given in Figure 6.

Figure 6: Pendulum angles obtained from the demonstrations
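The teaching phase can be mimicked outside Simulink with the loop below, which records noisy (θ, θ̇) measurements together with the teacher's disturbed action as training pairs. This is a reconstruction under stated assumptions: the noise levels and demonstration length are placeholders, and the piecewise-constant random offset is a crude stand-in for the band-limited white noise block.

```matlab
% Sketch of the teaching phase: five noisy demonstrations of the teacher,
% recording inputs P = [theta; theta_dot] and targets T = u.
% sigma_s and sigma_a are assumed noise levels, not the paper's values.
nDemo = 5; dt = 0.01; Tf = 20; N = round(Tf/dt);
sigma_s = pi/18;    % sensor-noise standard deviation (assumed)
sigma_a = 0.5;      % actuator-disturbance standard deviation [N] (assumed)
P = zeros(2, nDemo*N); T = zeros(1, nDemo*N); idx = 0;
for d = 1:nDemo
    rng(d);                              % different noise seed per demonstration
    x = [pi; 0.01; 0; 0]; dist = 0;
    for k = 1:N
        th  = mod(x(1), 2*pi) + sigma_s*randn;   % noisy angle measurement
        thd = x(2)            + sigma_s*randn;   % noisy rate measurement
        if mod(k-1, round(1/dt)) == 0            % new disturbance every 1 s
            dist = sigma_a*randn;                % demonstrator inconsistency
        end
        u = teacher(th, thd) + dist;
        idx = idx + 1; P(:, idx) = [th; thd]; T(idx) = u;
        x = x + dt * f(x, u);                    % plant sees the noisy action
    end
end
```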

Figure. Figure : Pendulum angles obtained from demonstrations It can be observed from Figure that in all demonstrations the pendulum is stabilized in the upright position. The input output (I/O) map of the teacher controller with the demonstrated trajectories in the θ, θ ɺ portion of the state space on top of it, is given in Figure 7. This figure can give a rough idea about which parts of the input-output space was visited during demonstrations. trial-and-error according to the success of the network in the testing phase. The activation function that is used in the hidden layer is tan-sigmoid, whereas a linear activation function is used in the output layer. The network has been trained using a Levenberg-Marquardt backpropagation (i.e. trainlm training algorithm from Matlab) with a constant learning rate of.. The weights are updated for epochs. The desired m.s.e. levels are selected as the variance of the actuator disturbances, since the neural network would start fitting the noise after that value. The performance (mean square error) obtained with this network is given in Figure 8. It can be observed that the m.s.e. has converged but the desired mse level has not been reached although the m.s.e. level at th epoch is close (approx..) to the desired one (.). The plant input data (i.e. the target for the neural network), and its approximation obtained by the neural network at the end of the training session which is related to these examples are presented in Figure 9. Simulation results In this chapter we present the results obtained by using the training data given in the last chapter.. Training of the Neural Networks.. Multilayer Feedforward Neural Network The MLFF NN which was selected in order to approximate the behaviour of the teacher (controller) has two inputs (pendulum angle and angular velocity) and a single hidden layer with hidden neurons in it. Figure 8: Training results with MLFF NN The green plot is obtained by using Matlab s sim command in order to observe how well the neural network will fit the desired output with the pendulum angles and angular velocities from the demonstrations. Figure 7: θ, θ ɺ trajectories on top of I/O map The number of hidden layer neurons are selected by Figure 9: Actuator forces from demonstrations and MLFF NN approximation A closer look might be taken to the approximation of the rd demonstration in Figure 9 and it can be observed ISSN: 9-7 Issue, Volume, February

5.1.2 Radial Basis Function Network
The RBFN selected to approximate the behaviour of the teacher (controller) has two inputs (pendulum angle and angular velocity). The centres of the Gaussian functions are selected in a supervised manner: the training starts with no Gaussian functions in the hidden layer, and Gaussian functions are then added incrementally such that the error between the target and the network's output becomes smaller. The spread (width of the Gaussian functions) is selected by trial-and-error, according to the success of the network in the testing phase. The desired m.s.e. level is again selected as the variance of the actuator disturbance, since beyond that value the neural network would start fitting the noise. The performance (mean squared error) obtained with this network is given in Figure 11. It can be observed that the m.s.e. has converged; the desired m.s.e. level has not been reached, although the m.s.e. level at the end of training is close to the desired one. The training stopped automatically after the radial basis neurons started overlapping.

Figure 11: Training results with RBFN

The plant input data (i.e. the target for the neural network) and its approximation by the neural network at the end of the training session are presented in Figure 12. The green plot is again obtained by using Matlab's sim command, in order to observe how well the neural network fits the desired output given the pendulum angles and angular velocities from the demonstrations.

Figure 12: Actuator forces from demonstrations and RBFN approximation

A closer look may be taken at the approximation of the 3rd demonstration from Figure 12: it can be observed from Figure 13 that the neural network has made a fair approximation.

Figure 13: Actuator forces from the 3rd demonstration and RBFN approximation
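The incremental construction described above is essentially what Matlab's newrb command does: it adds one radial basis neuron at a time until the m.s.e. goal (or a neuron limit) is reached. The goal, spread and neuron limit below are placeholders, not the paper's values.

```matlab
% Sketch of the RBFN training on the same demonstration data. newrb adds
% Gaussian neurons incrementally; 'goal', 'spread' and the neuron limit
% are placeholder values, not those used in the paper.
goal   = 0.25;   % desired mse, taken as the actuator-disturbance variance
spread = 1.0;    % width of the Gaussian functions (found by trial-and-error)
rbf = newrb(P, T, goal, spread, 100);   % stop at 100 neurons at the latest
Yr  = sim(rbf, P);
fprintf('final mse: %.3f\n', mean((T - Yr).^2));
```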

5.2 Testing of the Neural Networks
The neural networks trained from the demonstration data are tested in two ways. First, a data set that was not encountered during the training phase is used: the inverted pendulum is simulated with initial pendulum angles incremented by π/180 (i.e. 1°) over θ = [0, 2π], with the same fixed duration for every simulation in the execution phase. Secondly, the input-output maps of the neural networks are compared, in order to determine how well each network has learned and generalized the control policy of the teacher. This also helps in assessing the extrapolation capabilities of the designed networks.

5.2.1 Multilayer Feedforward Neural Network
The pendulum angles for several initial pendulum angles from the simulations in the execution phase are shown in Figure 14. The MLFF NN managed to swing up and stabilize the pendulum for nearly all of the tested initial angles.

Figure 14: Several pendulum angle plots from the execution phase (panels for four different initial pendulum angles)

The input-output map of this neural controller is given in Figure 15. It can be observed that the controller has learned most of the swinging region that was demonstrated to it, and that it has made generalization errors in some regions, marked with circles. The linear control region on the right, and the angle at which to switch from swinging control to linear control, have been learned better than those on the left. This might be due to an insufficiency of demonstrated data in those regions.

Figure 15: Input-output map of MLFF NN

5.2.2 Radial Basis Function Network
The pendulum angles for several initial pendulum angles from the simulations in the execution phase are shown in Figure 16. The RBFN managed to swing up and stabilize the pendulum in 359 out of the 360 tested initial angles, which corresponds to a success rate of 99.7%.

Figure 16: Several pendulum angle plots from the execution phase (panels for four different initial pendulum angles)

The input-output map of this neural controller is given in Figure 17. The circles at the corners of the plot have nearly zero output; this is most probably related to the fact that radial basis functions give nearly zero output outside their input range. The network has learned to separate the swing-up region into two, even though there are some generalization errors.

Figure 17: Input-output map of RBFN
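The first test can be sketched as a sweep over the initial angle, counting the runs that end near the upright position. The success criterion below (final angle within 0.1 rad of upright) is an assumption of mine; the paper does not state its exact check.

```matlab
% Sketch of the execution-phase test: sweep the initial pendulum angle in
% 1-degree steps, run the trained network in closed loop, count successes.
% The 0.1 rad success threshold is an assumed criterion.
angles = 0:pi/180:2*pi - pi/180; ok = 0;
for th0 = angles
    x = [th0; 0.01; 0; 0];                        % small push, as before
    for k = 1:round(20/0.01)
        u = sim(net, [mod(x(1), 2*pi); x(2)]);    % neural controller action
        x = x + 0.01 * f(x, u);
    end
    thEnd = mod(x(1), 2*pi);
    ok = ok + (min(thEnd, 2*pi - thEnd) < 0.1);   % pendulum ended upright?
end
fprintf('success rate: %.1f %%\n', 100*ok/numel(angles));
```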

5.3 Comparison of the Results
The first thing that can be noticed from the results is that the interpolation capabilities of the networks are quite close to each other. This can be observed from the fact that all networks can swing up and balance the inverted pendulum starting from any initial pendulum angle between θ = [0, 2π]. It can further be examined from the input-output maps, since the parts of the input-output space related to the demonstrated data were learned quite well by all networks. The least training error was obtained by the MLFF NN. The extrapolation capabilities of the MLFF NN are better than those of the RBFN, while the RBFN contains more hidden neurons (Gaussian functions) than the MLFF NN. A better generalization would also be possible for the MLFF NN by taking more, and more diverse, training data.

6 Conclusion
The supervised learning method of the neural network controllers consists of two phases, demonstration and execution. In the demonstration phase the necessary data is collected from the supervisor in order to design a neural network. The objective of the neural network is to imitate the control law of the teacher. In the execution phase the neural controller is tested by replacing the teacher.

The neural network control strategy is tested by using a supervisor with a control policy that ignores the finite track length of the cart. In this way it is easier to understand what the neural controller that imitates the teacher does, by comparing the input/output (state-action) mappings using 2D colour plots. The policy of a demonstrator (i.e. a designed controller in this case) is not a function that can be directly measured by giving inputs and measuring outputs, so learning the complete I/O map and making a perfect generalization is not possible. The complexity of the function to be approximated is also important, since the function may include both continuous and discontinuous components (such as switches between different control laws).

All of the designed neural network controllers (MLFF NN and RBFN) are able to learn the demonstrated behaviour, which was swinging up and balancing the inverted pendulum in the upright position starting from different initial angles. The MLFF NN did a better job in terms of generalization and obtained the lowest training error compared to the RBFN.

References:
[1] M. Bugeja, Non-Linear Swing-Up and Stabilizing Control of an Inverted Pendulum System, Proc. of EUROCON 2003: Computer as a Tool, Ljubljana, Slovenia, vol. 2, pp. 437-441, 2003.
[2] K. Yoshida, Swing-up control of an inverted pendulum by energy based methods, Proceedings of the American Control Conference, pp. 4045-4047, 1999.
[3] D. Maravall, C. Zhou, J. Alonso, Hybrid Fuzzy Control of the Inverted Pendulum via Vertical Forces, International Journal of Intelligent Systems, vol. 20, 2005.
[4] G. Neumann, The Reinforcement Learning Toolbox: Reinforcement Learning for Optimal Control Tasks, Diploma thesis, TU Graz, May 2005. http://www.igi.tugraz.at/riltoolbox/thesis/diplomarbeit.pdf
[5] H. Miyamoto, J. Morimoto, K. Doya, M. Kawato, Reinforcement learning with via-point representation, Neural Networks, vol. 17, pp. 299-305, 2004.