
Application of Neural Networks for Control of Inverted Pendulum

VALERI MLADENOV
Department of Theoretical Electrical Engineering
Technical University of Sofia
Sofia, Kliment Ohridski blvd. 8; BULGARIA
valerim@tu-sofia.bg

Abstract: - The balancing of an inverted pendulum by moving a cart along a horizontal track is a classic problem in the area of automatic control. In this paper two neural network controllers that swing a pendulum attached to a cart from an initial downward position to an upright position and maintain that state are described. Both controllers are able to learn the demonstrated behaviour, namely swinging up and balancing the inverted pendulum in the upright position starting from different initial angles.

Key-Words: - neural networks, inverted pendulum, nonlinear control, neural network controller

1 Introduction
Inverted pendulum control is an old and challenging problem which quite often serves as a test-bed for a broad range of engineering applications. It is a classic problem in dynamics and control theory and is widely used as a benchmark for testing control algorithms (PID controllers, neural networks, fuzzy control, genetic algorithms, etc.). There are many methods in the literature (see [1], [2], [3] and the references therein) that are used to control an inverted pendulum on a cart, ranging from classical control to machine-learning-based techniques. Rocket guidance systems, robotics, and crane control are common areas where these methods are applicable. The largest implemented uses are on huge lifting cranes in shipyards. When moving shipping containers back and forth, the cranes move the box so that it never swings or sways: it stays positioned under the operator even when moving or stopping quickly.

The inverted pendulum system inherently has two equilibria, one of which is stable while the other is unstable. The stable equilibrium corresponds to the state in which the pendulum is pointing downwards. In the absence of any control force, the system naturally returns to this state. Since the stable equilibrium requires no control input, it is uninteresting from a control perspective. The unstable equilibrium corresponds to the state in which the pendulum points strictly upwards and thus requires a control force to maintain this position. The basic control objective of the inverted pendulum problem is to maintain the unstable equilibrium position when the pendulum initially starts in an upright position.

In this paper we utilize two neural network controllers, based on a Multilayer Feedforward Neural Network (MLFF NN) and a Radial Basis Function Network (RBFN), to solve the control problem. The objective is achieved by means of supervised learning. In practice, the teacher (or supervisor) would be a human performing the task. Since a human teacher is not available in simulation, a nonlinear controller has been selected as the teacher.

The paper is organized as follows. In the next chapter a brief description of the mathematical model of the system and of the teacher controller is given. The control problem is then explained in chapter 3, and in chapter 4 the neural networks that control the pendulum are described. Simulation results are presented in chapter 5 and the conclusions are given in the last chapter.

2 Mathematical model
The inverted pendulum considered in this paper is shown in Figure 1. A cart equipped with a motor provides horizontal motion, while the cart position x and the joint angle θ can be measured via a quadrature encoder.
Figure 1: Inverted pendulum system

The pendulum's mass m is assumed to be equally distributed along its body, so its centre of gravity lies at the middle of its length, i.e. at l = L/2. No damping or any other kind of friction has been considered when deriving the model. The parameters of the model used in the simulations are listed in Table 1.

Table 1: Table of parameters

    Parameter                                           Value
    Mass of pendulum, m                                 [kg]
    Mass of cart, M                                     [kg]
    Length of bar, L                                    [m]
    Standard gravity, g                                 9.8 [m/s²]
    Moment of inertia of bar w.r.t. its CoG, I = ml²/3  [kg·m²]

The system is underactuated: there are two degrees of freedom, the pendulum angle and the cart position, but only the cart is actuated, by a horizontal force acting on it. The equations of motion can be derived either by the Lagrangian method or by the free-body diagram method. Using the latter, the system is divided into the two separate free-body diagrams shown in Figure 2. The coordinates of the centre of gravity of the pendulum can be written as

$$x_p = x + l\sin\theta, \qquad y_p = l\cos\theta .$$

The velocities and accelerations in these directions are

$$\dot{x}_p = \dot{x} + l\dot{\theta}\cos\theta, \qquad \ddot{x}_p = \ddot{x} + l\ddot{\theta}\cos\theta - l\dot{\theta}^2\sin\theta ,$$

$$\dot{y}_p = -l\dot{\theta}\sin\theta, \qquad \ddot{y}_p = -l\ddot{\theta}\sin\theta - l\dot{\theta}^2\cos\theta .$$

The force equilibrium for the pendulum in the x and y directions gives

$$m\ddot{x}_p = H, \qquad m\ddot{y}_p = V - mg ,$$

where H and V are the horizontal and vertical reaction forces at the joint. The torque equilibrium for the pendulum and the force equilibrium for the cart are

$$I\ddot{\theta} = Vl\sin\theta - Hl\cos\theta, \qquad M\ddot{x} = F - H .$$

Figure 2: Free body diagrams of cart and pendulum

Combining the above equations we obtain

$$(I + ml^2)\ddot{\theta} - mgl\sin\theta + ml\ddot{x}\cos\theta = 0 ,$$

$$(M + m)\ddot{x} + ml\ddot{\theta}\cos\theta - ml\dot{\theta}^2\sin\theta = F ,$$

or, in matrix form,

$$\begin{bmatrix} I + ml^2 & ml\cos\theta \\ ml\cos\theta & M + m \end{bmatrix} \begin{bmatrix} \ddot{\theta} \\ \ddot{x} \end{bmatrix} = \begin{bmatrix} mgl\sin\theta \\ F + ml\dot{\theta}^2\sin\theta \end{bmatrix} .$$

Solving for $\ddot{\theta}$ and $\ddot{x}$ gives

$$\ddot{\theta} = \frac{(M+m)mgl\sin\theta - m^2l^2\dot{\theta}^2\sin\theta\cos\theta - ml\cos\theta\, F}{(I+ml^2)(M+m) - m^2l^2\cos^2\theta} ,$$

$$\ddot{x} = \frac{-m^2l^2 g\sin\theta\cos\theta + (I+ml^2)\,ml\,\dot{\theta}^2\sin\theta + (I+ml^2)\,F}{(I+ml^2)(M+m) - m^2l^2\cos^2\theta} ,$$

and introducing the state variables $x_1 = \theta$, $x_2 = \dot{\theta}$, $x_3 = x$, $x_4 = \dot{x}$ and $u = F$, the inverted pendulum system can be described in state-space form as

$$\begin{aligned}
\dot{x}_1 &= x_2 \\
\dot{x}_2 &= \frac{(M+m)mgl\sin x_1 - m^2l^2 x_2^2\sin x_1\cos x_1 - ml\cos x_1\, u}{(I+ml^2)(M+m) - m^2l^2\cos^2 x_1} \\
\dot{x}_3 &= x_4 \\
\dot{x}_4 &= \frac{-m^2l^2 g\sin x_1\cos x_1 + (I+ml^2)\,ml\,x_2^2\sin x_1 + (I+ml^2)\,u}{(I+ml^2)(M+m) - m^2l^2\cos^2 x_1}
\end{aligned} \qquad (1)$$

The equilibrium points of the inverted pendulum are the pending position (the stable equilibrium point, i.e. $x_{eq} = [\pi \;\; 0 \;\; x_3 \;\; 0]^T$, where $x_3$ is any point on the track along which the cart moves) and the upright position (the unstable equilibrium point, i.e. $x_{eq} = [0 \;\; 0 \;\; x_3 \;\; 0]^T$).

The state-space equations linearized around $x = [\theta \;\; \dot{\theta} \;\; x \;\; \dot{x}]^T = [0 \;\; 0 \;\; 0 \;\; 0]^T$ are

$$\delta\dot{x} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ \dfrac{(M+m)mgl}{(I+ml^2)(M+m)-m^2l^2} & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ \dfrac{-m^2l^2 g}{(I+ml^2)(M+m)-m^2l^2} & 0 & 0 & 0 \end{bmatrix}\delta x + \begin{bmatrix} 0 \\ \dfrac{-ml}{(I+ml^2)(M+m)-m^2l^2} \\ 0 \\ \dfrac{I+ml^2}{(I+ml^2)(M+m)-m^2l^2} \end{bmatrix} u \qquad (2)$$

The eigenvalues of this linearized system, with the parameter values given in Table 1, form a symmetric real pair $\lambda_1 = -\lambda_2 > 0$ together with a double eigenvalue $\lambda_3 = \lambda_4 = 0$. Since $\lambda_1$ is positive (i.e. in the right half-plane), the upright equilibrium point is unstable.

The objective of the control problem is to swing the pendulum up from its pending position (or from another state) to the upright position and to balance it in that condition by using supervised learning. The control problem has two main parts: swinging up and balancing the inverted pendulum. Among classical techniques, energy-based methods (i.e. regulating the mechanical energy of the pendulum) [1], [2] are common for swing-up control, and linear control methods (PD, PID, state feedback, LQR, etc.) are common for balancing control. Artificial neural networks, fuzzy logic algorithms and reinforcement learning [3], [4], [5] are widely used in machine-learning-based approaches, for both the swing-up and the balancing phases.
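To make the model concrete, a minimal Matlab sketch of the nonlinear dynamics (1) and the linearization (2) is given below. It is not part of the original paper: the parameter values are illustrative placeholders (the numerical entries of Table 1 did not survive in the source), but the expressions follow (1) and (2) directly, and the eig call reproduces the eigenvalue pattern discussed above.

```matlab
% Minimal sketch of the cart-pole model (1) and its linearization (2).
% All parameter values are illustrative placeholders, not the paper's.
m = 0.2; M = 0.5; L = 0.6; l = L/2; g = 9.81; I = m*l^2/3;

% Nonlinear state-space model (1): x = [theta; theta_dot; pos; vel], u = F
D = @(th) (I + m*l^2)*(M + m) - m^2*l^2*cos(th)^2;   % common denominator
f = @(x, u) [ x(2);
             ((M+m)*m*g*l*sin(x(1)) - m^2*l^2*x(2)^2*sin(x(1))*cos(x(1)) ...
              - m*l*cos(x(1))*u) / D(x(1));
              x(4);
             (-m^2*l^2*g*sin(x(1))*cos(x(1)) + (I+m*l^2)*m*l*x(2)^2*sin(x(1)) ...
              + (I+m*l^2)*u) / D(x(1)) ];

% Linearized model (2) around the upright equilibrium [0 0 0 0]'
D0 = (I + m*l^2)*(M + m) - m^2*l^2;
A = [ 0               1 0 0;
      (M+m)*m*g*l/D0  0 0 0;
      0               0 0 1;
     -m^2*l^2*g/D0    0 0 0 ];
B = [0; -m*l/D0; 0; (I + m*l^2)/D0];
disp(eig(A))  % a +/- real pair and a double zero: the upright point is unstable
```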

3 The control problem
A simple controller to swing the pendulum up from its pendant position can be developed by means of physical reasoning: a constant force is applied to the cart according to the direction of the angular velocity of the pendulum, so that the pendulum swings up to the vicinity of the upright position. The balancing controller is a stabilizing linear controller, which can be a PD or PID controller, or can be designed by pole-placement techniques (e.g. Ackermann's formula) or by LQR. Here a PD controller is selected. The switch from the swinging controller to the linear controller is made when the pendulum angle is between θ = [0, π/9] or when θ = [17π/9, 2π]. This controller can formally be presented as

$$u(\theta, \dot{\theta}) = \begin{cases} K_1(0 - \theta) + K_2(0 - \dot{\theta}), & 0 \le \theta \le \pi/9 \\ K_1(2\pi - \theta) + K_2(0 - \dot{\theta}), & 17\pi/9 \le \theta \le 2\pi \\ F_0\,\mathrm{sgn}(\dot{\theta}), & \text{else} \end{cases} \qquad (3)$$

where $F_0$ is the constant swing-up force and $K_1$, $K_2$ are the PD gains. The closed-loop poles obtained with these gains are two poles on the negative real axis together with a double pole at the origin, corresponding to the cart position and velocity, which are not fed back. The results of the simulations performed with this controller, with the initial condition $x_0 = [\pi \;\; 0 \;\; 0 \;\; 0]^T$ and without any kind of sensor noise or actuator disturbance, are given in Figure 3.

Figure 3: Simulation results with the controller for infinite track length, for $x_0 = [\pi \;\; 0 \;\; 0 \;\; 0]^T$ (panels: pendulum angle [rad], pendulum angular velocity [rad/s], cart position [m], cart velocity [m/s], force acting on the cart [N])

It can be observed from Figure 3 that the pendulum is stabilized in the upright position, whereas the cart continues its travel with a constant velocity.
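For illustration, the teacher policy (3) and one swing-up run can be sketched in a few lines of Matlab, reusing the dynamics function f from the previous sketch. The gains F0, K1 and K2 are invented placeholders (the paper's numerical values did not survive extraction), and a simple forward-Euler loop stands in for the fixed-step Simulink solver.

```matlab
% Sketch of the hybrid teacher controller (3): PD balancing near upright
% (theta close to 0 or 2*pi), constant-force swing-up elsewhere.
% F0, K1 and K2 are placeholder gains, not the values used in the paper.
F0 = 5; K1 = 25; K2 = 4;
teacher = @(th, thd) ...
      (th <= pi/9)               * (K1*(0    - th) + K2*(0 - thd)) ...
    + (th >= 17*pi/9)            * (K1*(2*pi - th) + K2*(0 - thd)) ...
    + (th > pi/9 & th < 17*pi/9) * (F0 * sign(thd));

% One closed-loop swing-up from the pending position (a small initial push
% is added so that sign(theta_dot) is nonzero and the swing-up can start)
dt = 0.01; Tf = 20; x = [pi; 0.01; 0; 0]; X = zeros(4, round(Tf/dt));
for k = 1:round(Tf/dt)
    u = teacher(mod(x(1), 2*pi), x(2));   % wrap the angle into [0, 2*pi)
    x = x + dt * f(x, u);                 % forward-Euler step of model (1)
    X(:, k) = x;
end
plot((1:round(Tf/dt))*dt, X(1,:)), xlabel('time [s]'), ylabel('\theta [rad]')
```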

4 Neural network controllers
The swing-up part of the teacher is developed by means of physical reasoning, as described above: a constant force is applied according to the direction of the angular velocity of the pendulum. The teacher controller used here has only two inputs (the pendulum angle and the angular velocity), so it is not possible to keep the cart's travel limited, since no cart position (or velocity) feedback is used. However, with this choice of teacher controller it is easier to visualize, and thus to compare, the input/output mappings of the teacher and of the neural network. The input-output map (also called the policy) of this nonlinear controller is given in Figure 4. Here the horizontal axis represents the pendulum angle, the vertical axis represents the pendulum angular velocity, and the colours represent the actuator forces corresponding to each angle/angular-velocity pair. From the teacher, sensor values (i.e. pendulum angle and angular velocity) and actuator values (i.e. the input force acting on the cart) can be recorded.

Figure 4: Input-output map of the teacher controller

The simulations are performed using Matlab/Simulink and the Neural Networks Toolbox, and the design can be separated into two phases, the teaching (demonstration) phase and the execution phase. In both phases the simulations use a fixed-step solver (ode5, Dormand-Prince) with a fixed step size. The teaching phase for each demonstration can be presented graphically as shown in Figure 5a. Sensor noise is present in the simulations, selected as normally distributed random numbers with a different initial seed for each demonstration. Band-limited white noise is also added at the output of the controller, to represent the different choices a human demonstrator could make in the teaching phase; this makes the demonstrations less consistent and therefore a more realistic representation of human behaviour.

Figure 5a: Teaching phase for supervised learning

When the teaching phase is finished, the neural network is generated and added to the model using Matlab's gensim command. After that the execution phase, shown in Figure 5b, starts with the nonlinear controller replaced by the neural network and with the actuator disturbance removed. The Simulink model is shown in Figure 5c.

Figure 5b: Execution phase for supervised learning

Figure 5c: Simulation scheme for the execution phase

During the teaching phase five examples (or demonstrations) are used. The variance of the normally distributed sensor noise in each demonstration is selected as σ² = (π/18)² rad², which corresponds to a standard deviation of 10°. The actuator disturbance, representing the varying actions of the demonstrator, is band-limited white noise with a sampling time of 1 sec, which is close to the free swinging period of the pendulum; this means that the demonstrator can make an inconsistent action every second. Each demonstration starts from the pending initial position, $x_0 = [\pi \;\; 0 \;\; 0 \;\; 0]^T$, and lasts a fixed number of seconds. The pendulum angles for these demonstrations are given in Figure 6.

Figure 6: Pendulum angles obtained from the demonstrations
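The teaching phase can be mimicked outside Simulink with the loop below, which records noisy (θ, θ̇) measurements together with the teacher's disturbed action as training pairs. This is a reconstruction under stated assumptions: the noise levels and demonstration length are placeholders, and the piecewise-constant random offset is a crude stand-in for the band-limited white noise block.

```matlab
% Sketch of the teaching phase: five noisy demonstrations of the teacher,
% recording inputs P = [theta; theta_dot] and targets T = u.
% sigma_s and sigma_a are assumed noise levels, not the paper's values.
nDemo = 5; dt = 0.01; Tf = 20; N = round(Tf/dt);
sigma_s = pi/18;    % sensor-noise standard deviation (assumed)
sigma_a = 0.5;      % actuator-disturbance standard deviation [N] (assumed)
P = zeros(2, nDemo*N); T = zeros(1, nDemo*N); idx = 0;
for d = 1:nDemo
    rng(d);                              % different noise seed per demonstration
    x = [pi; 0.01; 0; 0]; dist = 0;
    for k = 1:N
        th  = mod(x(1), 2*pi) + sigma_s*randn;   % noisy angle measurement
        thd = x(2)            + sigma_s*randn;   % noisy rate measurement
        if mod(k-1, round(1/dt)) == 0            % new disturbance every 1 s
            dist = sigma_a*randn;                % demonstrator inconsistency
        end
        u = teacher(th, thd) + dist;
        idx = idx + 1; P(:, idx) = [th; thd]; T(idx) = u;
        x = x + dt * f(x, u);                    % plant sees the noisy action
    end
end
```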

Figure. Figure : Pendulum angles obtained from demonstrations It can be observed from Figure that in all demonstrations the pendulum is stabilized in the upright position. The input output (I/O) map of the teacher controller with the demonstrated trajectories in the θ, θ ɺ portion of the state space on top of it, is given in Figure 7. This figure can give a rough idea about which parts of the input-output space was visited during demonstrations. trial-and-error according to the success of the network in the testing phase. The activation function that is used in the hidden layer is tan-sigmoid, whereas a linear activation function is used in the output layer. The network has been trained using a Levenberg-Marquardt backpropagation (i.e. trainlm training algorithm from Matlab) with a constant learning rate of.. The weights are updated for epochs. The desired m.s.e. levels are selected as the variance of the actuator disturbances, since the neural network would start fitting the noise after that value. The performance (mean square error) obtained with this network is given in Figure 8. It can be observed that the m.s.e. has converged but the desired mse level has not been reached although the m.s.e. level at th epoch is close (approx..) to the desired one (.). The plant input data (i.e. the target for the neural network), and its approximation obtained by the neural network at the end of the training session which is related to these examples are presented in Figure 9. Simulation results In this chapter we present the results obtained by using the training data given in the last chapter.. Training of the Neural Networks.. Multilayer Feedforward Neural Network The MLFF NN which was selected in order to approximate the behaviour of the teacher (controller) has two inputs (pendulum angle and angular velocity) and a single hidden layer with hidden neurons in it. Figure 8: Training results with MLFF NN The green plot is obtained by using Matlab s sim command in order to observe how well the neural network will fit the desired output with the pendulum angles and angular velocities from the demonstrations. Figure 7: θ, θ ɺ trajectories on top of I/O map The number of hidden layer neurons are selected by Figure 9: Actuator forces from demonstrations and MLFF NN approximation A closer look might be taken to the approximation of the rd demonstration in Figure 9 and it can be observed ISSN: 9-7 Issue, Volume, February

5.1.2 Radial Basis Function Network
The RBFN selected to approximate the behaviour of the teacher (controller) has two inputs (pendulum angle and angular velocity). The centres of the Gaussian functions are selected in a supervised manner: the training starts with no Gaussian functions in the hidden layer, and Gaussian functions are then added incrementally such that the error between the target and the network's output becomes smaller. The spread (width of the Gaussian functions) is selected by trial-and-error, according to the success of the network in the testing phase. The desired m.s.e. level is again selected as the variance of the actuator disturbance, since beyond that value the neural network would start fitting the noise. The performance (mean squared error) obtained with this network is given in Figure 11. It can be observed that the m.s.e. has converged; the desired m.s.e. level has not been reached, although the m.s.e. level at the end of training is close to the desired one. The training stopped automatically after the radial basis neurons started overlapping.

Figure 11: Training results with RBFN

The plant input data (i.e. the target for the neural network) and its approximation by the neural network at the end of the training session are presented in Figure 12. The green plot is again obtained by using Matlab's sim command, in order to observe how well the neural network fits the desired output given the pendulum angles and angular velocities from the demonstrations.

Figure 12: Actuator forces from demonstrations and RBFN approximation

A closer look may be taken at the approximation of the 3rd demonstration from Figure 12: it can be observed from Figure 13 that the neural network has made a fair approximation.

Figure 13: Actuator forces from the 3rd demonstration and RBFN approximation
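The incremental construction described above is essentially what Matlab's newrb command does: it adds one radial basis neuron at a time until the m.s.e. goal (or a neuron limit) is reached. The goal, spread and neuron limit below are placeholders, not the paper's values.

```matlab
% Sketch of the RBFN training on the same demonstration data. newrb adds
% Gaussian neurons incrementally; 'goal', 'spread' and the neuron limit
% are placeholder values, not those used in the paper.
goal   = 0.25;   % desired mse, taken as the actuator-disturbance variance
spread = 1.0;    % width of the Gaussian functions (found by trial-and-error)
rbf = newrb(P, T, goal, spread, 100);   % stop at 100 neurons at the latest
Yr  = sim(rbf, P);
fprintf('final mse: %.3f\n', mean((T - Yr).^2));
```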

5.2 Testing of the Neural Networks
The neural networks trained from the demonstration data are tested in two ways. First, a data set that was not encountered during the training phase is used: the inverted pendulum is simulated with initial pendulum angles incremented by π/180 (i.e. 1°) over θ = [0, 2π], with the same fixed duration for every simulation in the execution phase. Secondly, the input-output maps of the neural networks are compared, in order to determine how well each network has learned and generalized the control policy of the teacher. This also helps in assessing the extrapolation capabilities of the designed networks.

5.2.1 Multilayer Feedforward Neural Network
The pendulum angles for several initial pendulum angles from the simulations in the execution phase are shown in Figure 14. The MLFF NN managed to swing up and stabilize the pendulum for nearly all of the tested initial angles.

Figure 14: Several pendulum angle plots from the execution phase (panels for four different initial pendulum angles)

The input-output map of this neural controller is given in Figure 15. It can be observed that the controller has learned most of the swinging region that was demonstrated to it, and that it has made generalization errors in some regions, marked with circles. The linear control region on the right, and the angle at which to switch from swinging control to linear control, have been learned better than those on the left. This might be due to an insufficiency of demonstrated data in those regions.

Figure 15: Input-output map of MLFF NN

5.2.2 Radial Basis Function Network
The pendulum angles for several initial pendulum angles from the simulations in the execution phase are shown in Figure 16. The RBFN managed to swing up and stabilize the pendulum in 359 out of the 360 tested initial angles, which corresponds to a success rate of 99.7%.

Figure 16: Several pendulum angle plots from the execution phase (panels for four different initial pendulum angles)

The input-output map of this neural controller is given in Figure 17. The circles at the corners of the plot have nearly zero output; this is most probably related to the fact that radial basis functions give nearly zero output outside their input range. The network has learned to separate the swing-up region into two, even though there are some generalization errors.

Figure 17: Input-output map of RBFN
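The first test can be sketched as a sweep over the initial angle, counting the runs that end near the upright position. The success criterion below (final angle within 0.1 rad of upright) is an assumption of mine; the paper does not state its exact check.

```matlab
% Sketch of the execution-phase test: sweep the initial pendulum angle in
% 1-degree steps, run the trained network in closed loop, count successes.
% The 0.1 rad success threshold is an assumed criterion.
angles = 0:pi/180:2*pi - pi/180; ok = 0;
for th0 = angles
    x = [th0; 0.01; 0; 0];                        % small push, as before
    for k = 1:round(20/0.01)
        u = sim(net, [mod(x(1), 2*pi); x(2)]);    % neural controller action
        x = x + 0.01 * f(x, u);
    end
    thEnd = mod(x(1), 2*pi);
    ok = ok + (min(thEnd, 2*pi - thEnd) < 0.1);   % pendulum ended upright?
end
fprintf('success rate: %.1f %%\n', 100*ok/numel(angles));
```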

5.3 Comparison of the Results
The first thing that can be noticed from the results is that the interpolation capabilities of the networks are quite close to each other. This can be observed from the fact that all networks can swing up and balance the inverted pendulum starting from any initial pendulum angle between θ = [0, 2π]. It can further be examined from the input-output maps, since the parts of the input-output space related to the demonstrated data were learned quite well by all networks. The least training error was obtained by the MLFF NN. The extrapolation capabilities of the MLFF NN are better than those of the RBFN, while the RBFN contains more hidden neurons (Gaussian functions) than the MLFF NN. A better generalization would also be possible for the MLFF NN by taking more, and more diverse, training data.

6 Conclusion
The supervised learning method of the neural network controllers consists of two phases, demonstration and execution. In the demonstration phase the necessary data is collected from the supervisor in order to design a neural network. The objective of the neural network is to imitate the control law of the teacher. In the execution phase the neural controller is tested by replacing the teacher.

The neural network control strategy is tested by using a supervisor with a control policy that ignores the finite track length of the cart. In this way it is easier to understand what the neural controller that imitates the teacher does, by comparing the input/output (state-action) mappings using 2D colour plots. The policy of a demonstrator (i.e. a designed controller in this case) is not a function that can be directly measured by giving inputs and measuring outputs, so learning the complete I/O map and making a perfect generalization is not possible. The complexity of the function to be approximated is also important, since the function may include both continuous and discontinuous components (such as switches between different control laws).

All of the designed neural network controllers (MLFF NN and RBFN) are able to learn the demonstrated behaviour, which was swinging up and balancing the inverted pendulum in the upright position starting from different initial angles. The MLFF NN did a better job in terms of generalization and obtained the lowest training error compared to the RBFN.

References:
[1] M. Bugeja, Non-Linear Swing-Up and Stabilizing Control of an Inverted Pendulum System, Proc. of EUROCON 2003: Computer as a Tool, Ljubljana, Slovenia, vol. 2, pp. 437-441, 2003.
[2] K. Yoshida, Swing-up control of an inverted pendulum by energy based methods, Proceedings of the American Control Conference, pp. 4045-4047, 1999.
[3] D. Maravall, C. Zhou, J. Alonso, Hybrid Fuzzy Control of the Inverted Pendulum via Vertical Forces, International Journal of Intelligent Systems, vol. 20, 2005.
[4] G. Neumann, The Reinforcement Learning Toolbox: Reinforcement Learning for Optimal Control Tasks, Diploma thesis, TU Graz, May 2005. http://www.igi.tugraz.at/riltoolbox/thesis/diplomarbeit.pdf
[5] H. Miyamoto, J. Morimoto, K. Doya, M. Kawato, Reinforcement learning with via-point representation, Neural Networks, vol. 17, pp. 299-305, 2004.