Stability of backpropagation learning rule


Petr Krupanský, Petr Pivoňka, Jiří Dohnal
Department of Control and Instrumentation, Brno University of Technology
Božetěchova 2, 612 66 Brno, Czech Republic
krupan, pivonka, dohnalj@feec.vutbr.cz

Abstract

Control of real processes requires a different approach to neural network learning. The presented modification of the backpropagation learning algorithm changes the meaning of the learning constants. The modification is based on a stability condition for the learning dynamics.

Keywords: Neural networks, ARMA model, control, backpropagation, stability, the largest singular value, Euclidean norm.

1 Introduction

The backpropagation learning algorithm has suitable properties for on-line adaptation: it consumes little time and memory. Its disadvantages are a relatively low convergence rate and possible instability if the learning constants are chosen improperly. On-line learning running simultaneously with control can bring about a number of problems that substantially influence the criteria common in neural network learning. One of the main criteria of the quality of a learning process is fast convergence. The specific problems of controlling real processes require a different approach to the learning algorithm: an optimum learning speed must be chosen, and it is crucial that the algorithm completes the modification of the weights within a limited time period. In connection with the control of real processes, the stability of the learning algorithm must also be considered.

2 Backpropagation algorithm

The learning algorithm is based on minimization of the error E_k between the output y_k and the desired value d_k:

E_k = \frac{1}{2} \sum_{j=1}^{m} [y_k(j) - d_k(j)]^2    (1)

where y_k(j) is the response of the network to the j-th input pattern in step k, d_k(j) is the desired response to the j-th input pattern in step k, and m is the number of input patterns.

The gradient of the error with respect to a network weight (with a sigmoid as the output transfer function) is

\frac{\partial E_k}{\partial w_k(i,j)} = \frac{\partial E_k}{\partial y_k(i)} \frac{\partial y_k(i)}{\partial u_k(i)} \frac{\partial u_k(i)}{\partial w_k(i,j)} = [y_k(i) - d_k(i)]\, y_k(i)[1 - y_k(i)]\, x_k(j) = \delta_k(i)\, x_k(j)    (2)

where ∂E_k/∂y_k(i) is the derivative of the error, i.e. the difference between the neuron output and the desired value; for an inner layer it is the weighted sum of the δ terms of the following layer, ∂E_k/∂y_k(i) = Σ_{j=1}^{n} δ_k(j) w_k(j,i). Further, ∂y_k(i)/∂u_k(i) is the derivative of the output transfer function of the neuron, ∂u_k(i)/∂w_k(i,j) is the derivative of the summed input with respect to the relevant weight, i.e. the previous neuron output, x_k(j) is the previous neuron output (the neuron input), and δ_k(i) is the derivative of the error with respect to the summed input.

Two learning parameters α and β are used: α represents the speed of movement along the gradient, and β the inertia of the previous value. The resulting formula for the change of the weights is

\Delta w_k(i,j) = -\alpha\, \delta_{k-1}(i)\, x_{k-1}(j) + \beta\, \Delta w_{k-1}(i,j)    (3)

These changes Δw_k are calculated for the whole network, and the weights are then changed for the next step.
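For illustration of equations (1) to (3), the following minimal MATLAB sketch applies the momentum update to a single sigmoid neuron; the variable names and data (alpha, beta, x, d, W) are made up for this example and are not taken from the paper.

```matlab
% Minimal sketch of equations (1)-(3) for a single sigmoid output neuron.
% All names and values (alpha, beta, x, d, W) are illustrative only.
alpha = 0.1;                  % speed of movement along the gradient
beta  = 0.9;                  % inertia of the previous weight change
x = [0.2; 0.7; 1.0];          % one input pattern (last element used as bias)
d = 0.4;                      % desired output
W = 0.1 * randn(1, numel(x)); % single sigmoid neuron, random initial weights
dW_prev = zeros(size(W));     % previous weight change

for k = 1:200
    y     = 1 / (1 + exp(-W * x));                % neuron output
    E     = 0.5 * (y - d)^2;                      % error (1)
    delta = (y - d) * y * (1 - y);                % delta term of (2)
    dW    = -alpha * delta * x' + beta * dW_prev; % weight change (3)
    W     = W + dW;
    dW_prev = dW;
end
fprintf('final error E = %g\n', E);
```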

3 Modification for ARMA model

If the neural network is used to construct an ARMA model of the system, the network configuration is reduced to one neuron with a linear output function; this structure corresponds exactly to the ARMA model. For learning on one pattern, the input x, the system parameters a and the system output d are constants, independent of the learning dynamics.

The gradient of the local error is

\frac{\partial E_k}{\partial w_k(i)} = \frac{\partial E_k}{\partial y_k} \frac{\partial y_k}{\partial u_k} \frac{\partial u_k}{\partial w_k(i)} = (y_k - d) \cdot 1 \cdot x(i)    (4)

where d is the constant value of the system output and x(i) is the constant value of the input vector.

The formula for Δw_k is

\Delta w_k(i) = -\alpha (y_k - d)\, x(i) + \beta\, \Delta w_{k-1}(i)    (5)

and the weights become

w_{k+1}(i) = w_k(i) + \Delta w_k(i) = w_k(i) - \alpha (y_k - d)\, x(i) + \beta [w_k(i) - w_{k-1}(i)]    (6)

In vector form,

w_k = w_{k-1} - \alpha [y_{k-1} - d]\, x + \beta [w_{k-1} - w_{k-2}]    (7)

4 Stability conditions

The network output is

y_k = w_k x^T    (8)

Let us assume the system can be approximated by

d = a x^T    (9)

where a is the vector of the system parameters. Substituting formulas (8) and (9) into (7), we get

w_k = w_{k-1} - \alpha (w_{k-1} x^T - a x^T)\, x + \beta (w_{k-1} - w_{k-2})    (10)

Rearranging and applying the Z-transform gives

w(z) = w(z) z^{-1} - \alpha [w(z) z^{-1} x^T - a x^T]\, x + \beta [w(z) z^{-1} - w(z) z^{-2}]    (11)

and finally

w(z) = (\alpha\, a x^T x) \left( I + [\alpha x^T x - \beta I - I]\, z^{-1} + \beta I z^{-2} \right)^{-1} = N(z) D(z)^{-1}    (12)

where I is the unit matrix of suitable dimension.

For simplicity of the derivation it is possible to replace the matrix α x^T x by a scalar q and the unit matrices I by the scalar 1. This transfer function is stable if all roots of the characteristic polynomial lie inside the unit circle in the Z plane. The characteristic polynomial D_1 (for a system with one input) is

D_1(z) = 1 + [q - \beta - 1]\, z^{-1} + \beta z^{-2}    (13)

and its roots are

z_{1,2} = \frac{-q + \beta + 1 \pm \sqrt{(q - \beta)^2 - 2(q + \beta) + 1}}{2}    (14)

The absolute values of the roots must be smaller than 1:

|z_{1,2}| < 1    (15)
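The root condition (13) to (15) is easy to check numerically. The following MATLAB sketch tests one illustrative pair of beta and q (q standing for α x x^T); the values are assumptions for the example only, and the closed-form condition derived next makes the same test analytic.

```matlab
% Sketch of the stability test (13)-(15): D1(z) = 1 + (q - beta - 1)z^-1 + beta*z^-2,
% i.e. z^2 + (q - beta - 1)z + beta after multiplying by z^2.
beta = 0.9;
q    = 0.5 * (2 + 2*beta);              % illustrative q, half of the limit (16)
r    = roots([1, q - beta - 1, beta]);  % roots of the characteristic polynomial
stable = all(abs(r) < 1);               % condition (15)
fprintf('q = %.3f, |z| = [%.3f %.3f], stable = %d\n', q, abs(r(1)), abs(r(2)), stable);
```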

By substituting (14) into (15), we get the condition for q:

0 < q < 2 + 2\beta    (16)

For a system with more inputs the condition becomes

0 < \|\alpha x^T x\|_S < \|2I + 2\beta I\|_S    (17)

where ‖·‖_S denotes the largest-singular-value norm. The solution for α has the lower limit α > 0; the upper limit is

\alpha < \frac{\|2I + 2\beta I\|_S}{\|x^T x\|_S} = \frac{2 + 2\beta}{\|x^T x\|_S} = \frac{2 + 2\beta}{x x^T}    (18)

5 Batch learning on a number of patterns

If the network is learnt on h patterns of the learning set, then for i = 1, ..., h the network output is

y_{k,i} = w_k x_i^T    (19)

and the system output is

d_i = a x_i^T    (20)

The modification of the weights is then

w_k = w_{k-1} - \alpha \sum_{i=1}^{h} \left[ (w_{k-1} x_i^T - a x_i^T)\, x_i \right] + \beta (w_{k-1} - w_{k-2})    (21)

After transforming to the Z-domain and rearranging, we get the final formula for the weights:

w(z) = \left( \alpha \sum_{i=1}^{h} a x_i^T x_i \right) \left[ I + \left( \alpha \sum_{i=1}^{h} x_i^T x_i - \beta I - I \right) z^{-1} + \beta I z^{-2} \right]^{-1}    (22)

This relationship can be compared with equation (12). After the same modifications we get the stability condition for a number of patterns:

0 < \left\| \alpha \sum_{i=1}^{h} x_i^T x_i \right\|_S < \|2I + 2\beta I\|_S    (23)

The upper limit for α is therefore

\alpha < \frac{\|2I + 2\beta I\|_S}{\left\| \sum_{i=1}^{h} x_i^T x_i \right\|_S} = \frac{2 + 2\beta}{\left\| \sum_{i=1}^{h} x_i^T x_i \right\|_S}    (24)

If the patterns in the learning set are not too far apart, the largest singular value can be replaced by the much simpler Euclidean norm

\|A\|_E = \sqrt{\sum_i \sum_j a_{i,j}^2}    (25)

and the simplified computation is

\alpha < \frac{2 + 2\beta}{\left\| \sum_{i=1}^{h} x_i^T x_i \right\|_E}    (26)

6 Examples of a learning behaviour

6.1 Learning dynamics

The example shows the learning behaviour of a neuron with two weights. Figure 1 shows the zeros and poles of the characteristic polynomial of the learning dynamics with α at the edge of stability, i.e. equal to the stability limit, and figure 2 shows the corresponding step response of the learning dynamics.

Figure 1: The characteristic polynomial on edge of stability (pole-zero map)
Figure 2: The learning dynamics on edge of stability (step response of weight 1 and weight 2)

Figures 3 and 4 show the learning dynamics for α equal to half of the stability limit, and figures 5 and 6 show the behaviour for α equal to 1.1 times the stability limit, i.e. slightly above it.

Figure 3: Stable characteristic polynomial (pole-zero map)
Figure 4: Stable learning dynamics (step response of weight 1 and weight 2)
Figure 5: Non-stable characteristic polynomial (pole-zero map)
Figure 6: Non-stable learning dynamics (step response of weight 1 and weight 2)
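The limits (24) and (26) can be evaluated directly. The sketch below uses a made-up batch of patterns and relies on the fact that MATLAB's norm(M) returns the largest singular value of a matrix and norm(M, 'fro') its Euclidean (Frobenius) norm.

```matlab
% Sketch of the alpha limits (24) and (26) for a batch of h training patterns.
% The pattern matrix X is made up for illustration; its rows are the patterns x_i.
beta = 0.9;
h = 15;                          % number of patterns
n = 8;                           % pattern length (inputs, past outputs, bias)
X = [randn(h, n-1), ones(h, 1)]; % last column plays the role of the bias input

M = X' * X;                                   % equals sum_i x_i' * x_i
alpha_max_S = (2 + 2*beta) / norm(M);         % limit (24), largest singular value
alpha_max_E = (2 + 2*beta) / norm(M, 'fro');  % simplified limit (26), Euclidean norm
fprintf('alpha < %.4f (singular value), alpha < %.4f (Euclidean)\n', ...
        alpha_max_S, alpha_max_E);
```

Because the Euclidean norm is never smaller than the largest singular value, the simplified limit (26) is always the more conservative of the two.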

6.2 Norm comparison

For simplicity of computation in a real-time control process it is convenient to use the Euclidean norm instead of the largest-singular-value norm. The graphs show the behaviour of the two norm values during the process. The modelled system is described by the transfer function

F_S(p) = \frac{1.5}{10 p^2 + 0.7 p + 1}    (27)

The sample period is T = 1 s, the level of noise is 0.5, and there are 15 patterns in the training set. The patterns S(i) have the form

S(i) = \big( x(i),\, x(i-1),\, x(i-2),\, x(i-3),\, y(i-1),\, y(i-2),\, y(i-3),\, 1 \big)    (28)

where i is the index of the pattern in the training set, x(i) is the i-th system input, y(i) is the i-th system output, and 1 is the bias.
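The patterns (28) can be assembled directly from recorded input and output sequences. In the sketch below the signals x and y are simulated stand-ins, not the data used in the paper.

```matlab
% Sketch of building the training patterns (28) from recorded data.
% x and y are illustrative input/output sequences of the modelled system.
N = 50;
x = randn(N, 1);                    % system input samples
y = filter(1, [1 -0.8], x);         % stand-in for the measured system output
h = 15;                             % number of patterns, as in the example above

S = zeros(h, 8);
d = zeros(h, 1);
for i = 4:(3 + h)                   % three past samples of x and y are needed
    S(i-3, :) = [x(i) x(i-1) x(i-2) x(i-3) y(i-1) y(i-2) y(i-3) 1];
    d(i-3)    = y(i);               % corresponding desired output of the ARMA model
end
```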

Figure 7 shows the system response to the input signal with noise, figure 8 shows the difference between the two norms, and figure 9 shows the norm values themselves. The difference is approximately 10 times smaller than the norm values, so the learning constant can be chosen with an accuracy of one decimal place.

Figure 7: The response to input signal with noise (system input and output versus time)
Figure 8: The difference between norms
Figure 9: The norm values (Euclidean norm and the largest singular value norm)

6.3 Algorithm comparison

For demonstration, the following backpropagation-based algorithms implemented in MATLAB were tested:

1. GD - Gradient descent backpropagation.
2. GDA - Gradient descent with adaptive learning rate backpropagation.
3. GDM - Gradient descent with momentum backpropagation.
4. GDX - Gradient descent with momentum and adaptive learning rate backpropagation.
5. RP - Resilient backpropagation.
6. SBP - Backpropagation modified by the stability condition.

All simulations used the same learning set and the same learning constant β = 0.9. The learning constant α was set by the MATLAB function maxlinlr; for the modified BP algorithm, α was set to 0.9 times the stability limit given by (24). The criterion of learning quality was a mean square error of MSE = 1·10^-8. The main learning parameters were the number of epochs, the learning time of one epoch, and a quality rate calculated as

\text{Quality rate} = \frac{1 \cdot 10^{6}}{\text{Learning time of one epoch} \cdot \text{Number of epochs}}    (29)

The following table shows the simulation results of these learning algorithms:

Table 1: Algorithm comparison.

Algorithm   Number of epochs   Learning time of one epoch (10^-3 s)   Quality rate
GD          967                1.674                                  1.73
GDA         72                 1.741                                  2.42
GDM         934                1.763                                  1.89
GDX         226                2.164                                  9.58
RP          269                2.31                                   8.55
SBP         145                1.778                                  12.26
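As a sketch of the modified algorithm (SBP), the loop below applies the batch update (21) to the single linear neuron of the ARMA model, with alpha fixed at 0.9 times the stability limit (24) as in the comparison above. It reuses the illustrative patterns S and targets d from the previous sketch and is not the authors' implementation.

```matlab
% Sketch of the SBP variant: batch update (21) for one linear neuron, with
% alpha chosen from the stability condition. S and d come from the sketch above.
beta  = 0.9;
alpha = 0.9 * (2 + 2*beta) / norm(S' * S);   % 0.9 times the limit (24)
w = zeros(1, size(S, 2));                    % weights of the linear neuron
dw_prev = zeros(size(w));

for epoch = 1:50000
    e = w * S' - d';                         % errors y_i - d_i over the batch
    if mean(e.^2) <= 1e-8, break; end        % MSE goal used in the comparison
    dw = -alpha * (e * S) + beta * dw_prev;  % update (21): gradient step + momentum
    w  = w + dw;
    dw_prev = dw;
end
fprintf('SBP finished after %d epochs, MSE = %g\n', epoch, mean(e.^2));
```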

7 Conclusion

Stability conditions were derived and, on their basis, limits for the learning constants. This principle changes the meaning of the learning constants: their magnitude expresses not only the speed of the learning algorithm but also the degree of its stability. The principle makes it possible to set the constants so that learning is faster for different patterns, independently of the initial network values, than with the common modifications of BP.

Acknowledgements

The paper has been prepared as a part of the solution of GAČR project No. 12/1/1485, with the support of the research plan CEZ: MSM 2613 and the Ph.D. research grant FR: MSMT IS 432164 - Adaptive Controllers based on Neural Networks.