Backpropagation Neural Net


As is the case with most neural networks, the aim of backpropagation is to train the net to achieve a balance between the ability to respond correctly to the input patterns that are used for training (memorization) and the ability to give reasonable (good) responses to input that is similar, but not identical, to that used in training (generalization). After training, application of the net involves only the computations of the feedforward phase: even if training is slow, a trained net can produce its output very rapidly.

The training of a network by backpropagation involves three stages:
1- The feedforward of the input training pattern.
2- The calculation and backpropagation of the associated error.
3- The adjustment of the weights.

1- Architecture
A multilayer neural network with one layer of hidden units (the Z units) is shown in Figure (1). The output units (the Y units) and the hidden units may also have biases (as shown). The bias on a typical output unit Y_k is denoted by w_0k; the bias on a typical hidden unit Z_j is denoted v_0j. Only the direction of information flow for the feedforward phase of operation is shown; during the backpropagation phase of learning, signals are sent in the reverse direction. The algorithm is presented for one hidden layer, which is adequate for a large number of applications.

2- Algorithm
As mentioned earlier, training a network by backpropagation involves three stages: the feedforward of the input training pattern, the backpropagation of the associated error, and the adjustment of the weights.

Figure (1): Backpropagation neural network with one hidden layer.

During feedforward:
- Each input unit (X_i) receives an input signal and broadcasts this signal to each of the hidden units Z_1, ..., Z_p.
- Each hidden unit then computes its activation and sends its signal z_j to each output unit.
- Each output unit Y_k computes its activation y_k to form the response of the net for the given input pattern.

During training:
- Each output unit compares its computed activation y_k with its target value t_k to determine the associated error for that pattern with that unit.
- Based on this error, the factor δ_k (k = 1, ..., m) is computed.
- δ_k is used to distribute the error at output unit Y_k back to all units in the previous layer (the hidden units that are connected to Y_k).

- It is also used (later) to update the weights between the output layer and the hidden layer.
- In a similar manner, the factor δ_j (j = 1, ..., p) is computed for each hidden unit Z_j.
- It is not necessary to propagate the error back to the input layer, but δ_j is used to update the weights between the hidden layer and the input layer.

Adjustment of the weights:
- After all of the δ factors have been determined, the weights for all layers are adjusted simultaneously.
- The adjustment to the weight w_jk (from hidden unit Z_j to output unit Y_k) is based on the factor δ_k and the activation z_j of the hidden unit Z_j.
- The adjustment to the weight v_ij (from input unit X_i to hidden unit Z_j) is based on the factor δ_j and the activation x_i of the input unit.

The nomenclature we use in the training algorithm for the backpropagation net is as follows:
x      Input training vector: x = (x_1, ..., x_i, ..., x_n).
t      Output target vector: t = (t_1, ..., t_k, ..., t_m).
δ_k    Portion of the error correction weight adjustment for w_jk that is due to an error at output unit Y_k; also, the information about the error at unit Y_k that is propagated back to the hidden units that feed into unit Y_k.
δ_j    Portion of the error correction weight adjustment for v_ij that is due to the backpropagation of error information from the output layer to hidden unit Z_j.
α      Learning rate.
X_i    Input unit i: for an input unit, the input signal and the output signal are the same, namely x_i.
v_0j   Bias on hidden unit Z_j.
Z_j    Hidden unit j: the net input to Z_j is denoted z_in_j:
           z_in_j = v_0j + Σ_i (x_i v_ij)
       The output signal (activation) of Z_j is denoted z_j:
           z_j = f(z_in_j)
w_0k   Bias on output unit Y_k.
Y_k    Output unit k: the net input to Y_k is denoted y_in_k:
           y_in_k = w_0k + Σ_j (z_j w_jk)
       The output signal (activation) of Y_k is denoted y_k:
           y_k = f(y_in_k)

Activation function
An activation function for a backpropagation net should have several important characteristics: it should be continuous, differentiable, and nondecreasing. Furthermore, for computational efficiency, it is desirable that its derivative be easy to compute. One of the most typical activation functions is the binary sigmoid, which has range (0, 1) and is defined as
    f1(x) = 1 / (1 + e^(-x))
with
    f1'(x) = f1(x) [1 - f1(x)]
Another common activation function is the bipolar sigmoid, which has range (-1, 1) and is defined as
    f2(x) = 2 / (1 + e^(-x)) - 1
with
    f2'(x) = ½ [1 + f2(x)] [1 - f2(x)]
Note that the bipolar sigmoid is closely related to the function
    tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
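As a concrete illustration, the two activation functions and their derivatives can be written in Python directly from the formulas above. This is a minimal sketch; the function names are illustrative and not part of the text's notation.

import math

def binary_sigmoid(x):
    # f1(x) = 1 / (1 + e^(-x)), range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def binary_sigmoid_deriv(x):
    # f1'(x) = f1(x) [1 - f1(x)]
    f = binary_sigmoid(x)
    return f * (1.0 - f)

def bipolar_sigmoid(x):
    # f2(x) = 2 / (1 + e^(-x)) - 1, range (-1, 1); equal to tanh(x / 2)
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

def bipolar_sigmoid_deriv(x):
    # f2'(x) = 0.5 [1 + f2(x)] [1 - f2(x)]
    f = bipolar_sigmoid(x)
    return 0.5 * (1.0 + f) * (1.0 - f)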

Training algorithm
Either of the activation functions defined in the previous section can be used in the standard backpropagation algorithm given here. The form of the data (especially the target values) is an important factor in choosing the appropriate function. The algorithm is as follows:

1- Initialize weights (set to small random values).
2- For each training pair:

Feedforward:
2.1 Each input unit (X_i, i = 1, ..., n) receives input signal x_i and broadcasts this signal to all units in the layer above (the hidden units).
2.2 Each hidden unit (Z_j, j = 1, ..., p) sums its weighted input signals,
        z_in_j = v_0j + Σ_{i=1..n} (x_i v_ij)
    applies its activation function to compute its output signal,
        z_j = f(z_in_j)
    and sends this signal to all units in the layer above (the output units).
2.3 Each output unit (Y_k, k = 1, ..., m) sums its weighted input signals,
        y_in_k = w_0k + Σ_{j=1..p} (z_j w_jk)
    and applies its activation function to compute its output signal,
        y_k = f(y_in_k)

Backpropagation of error:
2.4 Each output unit (Y_k, k = 1, ..., m) receives a target pattern corresponding to the input training pattern and computes its error information term,
        δ_k = (t_k - y_k) f'(y_in_k)
    calculates its weight correction term (used to update w_jk later),
        Δw_jk = α δ_k z_j
    calculates its bias correction term (used to update w_0k later),
        Δw_0k = α δ_k
    and sends δ_k to units in the layer below.
2.5 Each hidden unit (Z_j, j = 1, ..., p) sums its delta inputs (from units in the layer above),
        δ_in_j = Σ_{k=1..m} (δ_k w_jk)
    multiplies by the derivative of its activation function to calculate its error information term,
        δ_j = δ_in_j f'(z_in_j)
    calculates its weight correction term (used to update v_ij later),
        Δv_ij = α δ_j x_i
    and calculates its bias correction term (used to update v_0j later),
        Δv_0j = α δ_j

Update weights and biases:
2.6 Each output unit (Y_k, k = 1, ..., m) updates its bias and weights (j = 0, ..., p):
        w_jk(new) = w_jk(old) + Δw_jk
    Each hidden unit (Z_j, j = 1, ..., p) updates its bias and weights (i = 0, ..., n):
        v_ij(new) = v_ij(old) + Δv_ij

Note that in implementing this algorithm, separate arrays should be used for the deltas of the output units (δ_k) and the deltas of the hidden units (δ_j).
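To make the algorithm concrete, the following is a minimal Python sketch of one pass through steps 2.1-2.6 for a single training pair, using plain lists. The function and variable names are our own, not part of the standard notation.

def train_step(x, t, v, w, alpha, f, f_deriv):
    # x : input vector (x_1, ..., x_n);  t : target vector (t_1, ..., t_m)
    # v[i][j] : weight from input X_i to hidden Z_j (row 0 holds the biases v_0j)
    # w[j][k] : weight from hidden Z_j to output Y_k (row 0 holds the biases w_0k)
    # f, f_deriv : activation function and its derivative (e.g. bipolar_sigmoid above)
    n, p, m = len(x), len(w) - 1, len(t)
    # Feedforward (steps 2.1-2.3)
    z_in = [v[0][j] + sum(x[i] * v[i + 1][j] for i in range(n)) for j in range(p)]
    z = [f(a) for a in z_in]
    y_in = [w[0][k] + sum(z[j] * w[j + 1][k] for j in range(p)) for k in range(m)]
    y = [f(a) for a in y_in]
    # Backpropagation of error (steps 2.4-2.5)
    delta_out = [(t[k] - y[k]) * f_deriv(y_in[k]) for k in range(m)]
    delta_in = [sum(delta_out[k] * w[j + 1][k] for k in range(m)) for j in range(p)]
    delta_hid = [delta_in[j] * f_deriv(z_in[j]) for j in range(p)]
    # Update weights and biases (step 2.6)
    for k in range(m):
        w[0][k] += alpha * delta_out[k]                  # bias w_0k
        for j in range(p):
            w[j + 1][k] += alpha * delta_out[k] * z[j]   # weight w_jk
    for j in range(p):
        v[0][j] += alpha * delta_hid[j]                  # bias v_0j
        for i in range(n):
            v[i + 1][j] += alpha * delta_hid[j] * x[i]   # weight v_ij
    return y

Note that all the delta terms are computed from the old weights before any weight is changed, which matches the remark above about keeping the deltas for the output units and for the hidden units in separate arrays.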

Choice of initial weights and biases
Random Initialization: A common procedure is to initialize the weights (and biases) to random values between -0.5 and 0.5 (or between -1 and 1, or some other suitable interval).

Nguyen-Widrow Initialization: The following simple modification of the common random weight initialization just presented typically gives much faster learning. Weights from the hidden units to the output units (and the biases on the output units) are initialized to random values between -0.5 and 0.5, as is commonly the case. The initialization of the weights from the input units to the hidden units is designed to improve the ability of the hidden units to learn. The definitions we use are as follows:
n    number of input units
p    number of hidden units
β    scale factor: β = 0.7 (p)^(1/n)
The procedure consists of the following steps. For each hidden unit (j = 1, ..., p):
- Initialize its weight vector (from the input units):
      v_ij(old) = random number between -0.5 and 0.5 (or between -γ and γ).
- Compute the Euclidean norm (length) of the vector v_j:
      ||v_j(old)|| = sqrt( v_1j(old)^2 + v_2j(old)^2 + ... + v_nj(old)^2 )
- Reinitialize the weights:
      v_ij = β v_ij(old) / ||v_j(old)||
- Set the bias:
      v_0j = random number between -β and β.
The Nguyen-Widrow analysis is based on the activation function tanh(x).

Application procedure
After training, a backpropagation neural net is applied by using only the feedforward phase of the training algorithm. The application procedure is as follows:
1- Initialize weights (from the training algorithm).
2- For each input vector:
2.1 For i = 1, ..., n: set the activation of input unit X_i.
2.2 For j = 1, ..., p:
        z_in_j = v_0j + Σ_{i=1..n} (x_i v_ij)
        z_j = f(z_in_j)
2.3 For k = 1, ..., m:
        y_in_k = w_0k + Σ_{j=1..p} (z_j w_jk)
        y_k = f(y_in_k)
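The Nguyen-Widrow procedure is straightforward to implement. The sketch below is an illustrative helper (names are ours) that initializes the input-to-hidden weights v for n input units and p hidden units, following the steps just listed.

import math
import random

def nguyen_widrow_init(n, p):
    # Scale factor beta = 0.7 * p^(1/n)
    beta = 0.7 * p ** (1.0 / n)
    # v[i][j]: weight from input X_i to hidden Z_j; row 0 holds the biases v_0j
    v = [[random.uniform(-0.5, 0.5) for _ in range(p)] for _ in range(n + 1)]
    for j in range(p):
        # Euclidean norm of the weight vector v_j (bias excluded)
        norm = math.sqrt(sum(v[i + 1][j] ** 2 for i in range(n)))
        # Reinitialize: v_ij = beta * v_ij(old) / ||v_j(old)||
        for i in range(n):
            v[i + 1][j] = beta * v[i + 1][j] / norm
        # Bias v_0j drawn between -beta and beta
        v[0][j] = random.uniform(-beta, beta)
    return v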

Example-1: Find the new weights when the net illustrated in Figure (2) is presented the input pattern (-1, 1) and the target output is 1. Use a learning rate of α = 0.25 and the bipolar sigmoid activation function.

Sol:

Z_j   x_1   x_2   v_0j   v_1j   v_2j   z_in_j   z_j      δ_in_j   δ_j    Δv_0j   Δv_1j    Δv_2j
Z_1   -1    1     0.4    0.7    -0.2   -0.5     -0.245   0.285    0.13   0.03    -0.03    0.03
Z_2   -1    1     0.6    -0.4   0.3    1.3      0.57     0.057    0.02   0.005   -0.005   0.005

Y_k   z_1      z_2    w_0    w_1   w_2   y_in_k   y_k     δ_k    Δw_0    Δw_1     Δw_2
Y_1   -0.245   0.57   -0.3   0.5   0.1   -0.365   -0.18   0.57   0.143   -0.035   0.08

Feedforward:
z_in_j = v_0j + Σ_{i=1..2} (x_i v_ij)
z_in_1 = 0.4 + (-1)(0.7) + (1)(-0.2) = -0.5
z_in_2 = 0.6 + (-1)(-0.4) + (1)(0.3) = 1.3
z_j = f(z_in_j), where the activation function is the bipolar sigmoid,
    f(z_in_j) = 2 / (1 + e^(-z_in_j)) - 1

z_1 = f(-0.5) = -0.245
z_2 = f(1.3) = 0.57
y_in_k = w_0k + Σ_{j=1..2} (z_j w_jk)
y_in = -0.3 + (-0.245)(0.5) + (0.57)(0.1) = -0.365
y_k = f(y_in_k) = f(-0.365)
y_k = -0.18

Backpropagation of error:
δ_k = (t_k - y_k) f'(y_in_k) = (t_k - y_k) [0.5 {1 + f(y_in_k)} {1 - f(y_in_k)}]
    = (1 - (-0.18)) (0.5) (1 + (-0.18)) (1 - (-0.18)) = 0.57
Δw_jk = α δ_k z_j
Δw_1 = 0.25 * 0.57 * (-0.245) = -0.035
Δw_2 = 0.25 * 0.57 * 0.57 = 0.08
Δw_0k = α δ_k
Δw_0 = 0.25 * 0.57 = 0.143
δ_in_j = Σ_k (δ_k w_jk)
δ_in_1 = 0.57 * 0.5 = 0.285
δ_in_2 = 0.57 * 0.1 = 0.057
δ_j = δ_in_j f'(z_in_j) = δ_in_j * 0.5 * (1 + f(z_in_j)) (1 - f(z_in_j))
δ_1 = 0.285 * 0.5 * (1 + (-0.245)) * (1 - (-0.245)) = 0.13
δ_2 = 0.057 * 0.5 * (1 + 0.57) * (1 - 0.57) = 0.02
Δv_ij = α δ_j x_i
Δv_11 = 0.25 * 0.13 * (-1) = -0.03
Δv_21 = 0.25 * 0.13 * (1) = 0.03
Δv_12 = 0.25 * 0.02 * (-1) = -0.005
Δv_22 = 0.25 * 0.02 * (1) = 0.005
Δv_0j = α δ_j
Δv_01 = 0.25 * 0.13 = 0.03
Δv_02 = 0.25 * 0.02 = 0.005

Update weights and biases:
w_jk(new) = w_jk(old) + Δw_jk
w_1(new) = 0.5 - 0.035 = 0.465
w_2(new) = 0.1 + 0.08 = 0.18
w_0(new) = -0.3 + 0.143 = -0.157
v_ij(new) = v_ij(old) + Δv_ij
v_11(new) = 0.7 - 0.03 = 0.67
v_21(new) = -0.2 + 0.03 = -0.17
v_12(new) = -0.4 - 0.005 = -0.405
v_22(new) = 0.3 + 0.005 = 0.305
v_01(new) = 0.4 + 0.03 = 0.43
v_02(new) = 0.6 + 0.005 = 0.605
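As a check on the arithmetic, this example can be run through the train_step sketch given earlier (together with the bipolar_sigmoid helpers); the following usage snippet reproduces the updated weights to within rounding of the hand computation.

# Hypothetical check of Example-1 using the sketches above.
v = [[0.4, 0.6],    # biases v_01, v_02
     [0.7, -0.4],   # v_11, v_12 (from input X_1)
     [-0.2, 0.3]]   # v_21, v_22 (from input X_2)
w = [[-0.3],        # bias w_0
     [0.5],         # w_1 (from hidden Z_1)
     [0.1]]         # w_2 (from hidden Z_2)
train_step([-1, 1], [1], v, w, 0.25, bipolar_sigmoid, bipolar_sigmoid_deriv)
print(w)  # roughly [[-0.157], [0.465], [0.182]]
print(v)  # roughly [[0.434, 0.605], [0.666, -0.405], [-0.166, 0.305]]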

The Use of Momentum as an Alternative Weight Update Procedure
In backpropagation, convergence is sometimes faster if a momentum term is added to the weight update formulas. In order to use momentum, weights (or weight updates) from one or more previous training patterns must be saved. For example, in the simplest form of backpropagation with momentum, the new weights for training step t + 1 are based on the weights at training steps t and t - 1. The weight update formulas for backpropagation with momentum are
    w_jk(t + 1) = w_jk(t) + α δ_k z_j + μ [w_jk(t) - w_jk(t - 1)]
    v_ij(t + 1) = v_ij(t) + α δ_j x_i + μ [v_ij(t) - v_ij(t - 1)]
where the momentum parameter μ is constrained to be in the range from 0 to 1, exclusive of the end points.

Batch updating of weights
In some cases it is advantageous to accumulate the weight correction terms over several patterns (or even an entire epoch, if there are not too many patterns) and make a single weight adjustment (equal to the average of the weight correction terms) for each weight, rather than updating the weights after each pattern is presented. This procedure has a smoothing effect on the correction terms. In some cases, this smoothing may increase the chances of convergence to a local minimum.
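To illustrate the momentum rule, here is a minimal sketch (our own helper, not part of the standard algorithm's notation) of the update for one weight matrix. It assumes the corrections α δ z for the current pattern have already been computed as in step 2.6, and it stores the step actually applied so that the next update can use w(t) - w(t - 1).

def momentum_update(weights, corrections, prev_steps, mu):
    # weights[j][k]     : current weight w_jk(t)
    # corrections[j][k] : alpha * delta_k * z_j for the current pattern
    # prev_steps[j][k]  : step applied at the previous update, i.e. w_jk(t) - w_jk(t-1)
    # mu                : momentum parameter, 0 < mu < 1
    for j in range(len(weights)):
        for k in range(len(weights[j])):
            step = corrections[j][k] + mu * prev_steps[j][k]
            weights[j][k] += step
            prev_steps[j][k] = step   # saved for the next pattern
    return weights

For batch updating, the correction terms would instead be accumulated over the patterns of an epoch and their average added to each weight once per epoch.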