Training Multi-Layer Neural Networks - the Back-Propagation Method. (c) Marcin Sydow


Plan
- training a single neuron with a continuous activation function
- training a 1-layer network of continuous neurons
- training a multi-layer network - the back-propagation method

single neuron with continuous activation function

Reminder: neuron with continuous activation function

sigmoid unipolar activation function:
f(net) = \frac{1}{1 + e^{-net}}

sigmoid bipolar activation function:
f(net) = \frac{2}{1 + e^{-net}} - 1

where: net = w^T x
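An informal illustration (not part of the original slides): the two activation functions written in NumPy; the function names are my own.

```python
import numpy as np

def sigmoid_unipolar(net):
    """Unipolar sigmoid: f(net) = 1 / (1 + exp(-net)), values in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-net))

def sigmoid_bipolar(net):
    """Bipolar sigmoid: f(net) = 2 / (1 + exp(-net)) - 1, values in (-1, 1)."""
    return 2.0 / (1.0 + np.exp(-net)) - 1.0

# example: a single neuron with weight vector w and input x
w = np.array([0.5, -1.0, 0.25])
x = np.array([1.0, 2.0, -1.0])
net = w @ x                                  # net = w^T x
print(sigmoid_unipolar(net), sigmoid_bipolar(net))
```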

Error of a continuous neuron

Let's define the following error measure for a single neuron:
E = \frac{1}{2}(d - y)^2 = \frac{1}{2}(d - f(w^T x))^2

where:
d - desired output (continuous)
y - actual output (continuous), y = f(net)

(the coefficient 1/2 is chosen to simplify the subsequent computations)

Goal: minimisation of the error

We wish to modify the weight vector w so that the error E is minimised.

Gradient: the direction of steepest descent of the function (towards the minimum) is opposite to the gradient vector (of partial derivatives of the error as a function of the weight vector):
\nabla E(w) = \frac{\partial E}{\partial w}
\nabla E(w) = -(d - y) f'(net) \left(\frac{\partial net}{\partial w_1}, \ldots, \frac{\partial net}{\partial w_p}\right)^T = -(d - y) f'(net) x

Derivatives of the sigmoid functions

Let's observe that:
for the unipolar sigmoid function: f'(net) = f(net)(1 - f(net)) = y(1 - y)
for the bipolar sigmoid function: f'(net) = \frac{1}{2}(1 - f^2(net)) = \frac{1}{2}(1 - y^2)

Thus, the derivative of f can be easily expressed in terms of f itself. (Now we can see why this particular form of activation function was selected.)
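A quick numerical sanity check of these identities (a sketch, not from the original slides; it assumes NumPy and the function names used in the snippet above):

```python
import numpy as np

def sigmoid_unipolar(net):
    return 1.0 / (1.0 + np.exp(-net))

def sigmoid_bipolar(net):
    return 2.0 / (1.0 + np.exp(-net)) - 1.0

net, eps = 0.7, 1e-6

# finite-difference approximations of f'(net)
num_uni = (sigmoid_unipolar(net + eps) - sigmoid_unipolar(net - eps)) / (2 * eps)
num_bi = (sigmoid_bipolar(net + eps) - sigmoid_bipolar(net - eps)) / (2 * eps)

# the same derivatives expressed through the output y = f(net)
y_uni, y_bi = sigmoid_unipolar(net), sigmoid_bipolar(net)
print(num_uni, y_uni * (1 - y_uni))      # the two numbers should agree
print(num_bi, 0.5 * (1 - y_bi ** 2))     # the two numbers should agree
```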

Learning rule for a continuous neuron

To sum up, the weights of a continuous neuron are modified as follows:
unipolar: w_{new} = w_{old} + \eta (d - y) y (1 - y) x
bipolar: w_{new} = w_{old} + \frac{1}{2} \eta (d - y)(1 - y^2) x

where \eta is the learning rate coefficient.

Notice the remarkable analogy to the delta rule for the discrete perceptron.
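A minimal sketch of one such training step for the unipolar case (the input, target, and learning rate values are illustrative, assuming the sigmoid definition above):

```python
import numpy as np

def sigmoid_unipolar(net):
    return 1.0 / (1.0 + np.exp(-net))

def train_step_unipolar(w, x, d, eta):
    """One update: w_new = w_old + eta * (d - y) * y * (1 - y) * x."""
    y = sigmoid_unipolar(w @ x)              # actual output of the neuron
    return w + eta * (d - y) * y * (1 - y) * x

w = np.array([0.1, -0.2, 0.05])              # initial weights
x = np.array([1.0, 0.5, -1.0])               # input vector
d = 1.0                                      # desired output
w = train_step_unipolar(w, x, d, eta=0.5)
print(w)
```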

1-layer of neurons with continuous activation function

1-layer of neurons with continuous activation functions

Assume the network has J inputs and K continuous neurons. Let's introduce the following notation:
input vector: y^T = (y_1, \ldots, y_J)
output vector: z^T = (z_1, \ldots, z_K)
weight matrix: W = [w_{kj}] (w_{kj}: k-th neuron, j-th weight)
matrix of activation functions: \Gamma = diag[f(\cdot)] (size: K \times K)

The output vector is computed as follows:
z = \Gamma[Wy]

1-layer of neurons with continuous activation functions, cont.

Let's introduce additional notation:
desired output vector: d^T = (d_1, \ldots, d_K)
output error for a single input vector:
E = \frac{1}{2} \sum_{k=1}^{K} (d_k - z_k)^2 = \frac{1}{2} \|d - z\|^2

Again, the gradient method will be applied (as in the case of a single neuron). The modification of a single weight is as follows:
\Delta w_{kj} = -\eta \frac{\partial E}{\partial w_{kj}}

1-layer NN, cont.

Thus, we obtain:
\frac{\partial E}{\partial w_{kj}} = \frac{\partial E}{\partial net_k} \cdot \frac{\partial net_k}{\partial w_{kj}}

The error signal (delta) of the k-th neuron of the last layer:
unipolar: \delta_{zk} = -\frac{\partial E}{\partial net_k} = (d_k - z_k) z_k (1 - z_k)
bipolar: \delta_{zk} = -\frac{\partial E}{\partial net_k} = \frac{1}{2} (d_k - z_k)(1 - z_k^2)

Notice that: \frac{\partial net_k}{\partial w_{kj}} = y_j

We obtain the following matrix weight modification formula:
W_{new} = W_{old} + \eta \delta_z y^T

Algorithm for training a 1-layer NN

1. select \eta and E_{max}, initialise the weights W randomly, set E = 0
2. for each case from the training set:
   - compute the output z
   - modify the weights of the k-th neuron (unipolar/bipolar):
     unipolar: w_k \leftarrow w_k + \eta (d_k - z_k) z_k (1 - z_k) y
     bipolar: w_k \leftarrow w_k + \frac{1}{2} \eta (d_k - z_k)(1 - z_k^2) y
   - accumulate the error: E \leftarrow E + \frac{1}{2} \sum_{k=1}^{K} (d_k - z_k)^2
3. if all training cases were presented and E < E_{max}, then complete the training phase. Else, reset the error E and repeat training on the whole training set.
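A compact sketch of this loop for the unipolar case, written in matrix form (the toy data set, the stopping parameters, and the function names are illustrative additions, not from the slides):

```python
import numpy as np

def sigmoid_unipolar(net):
    return 1.0 / (1.0 + np.exp(-net))

def train_one_layer(X, D, eta=0.5, E_max=0.01, max_epochs=10000, seed=0):
    """Train one layer of K unipolar neurons on inputs X (N x J) and targets D (N x K)."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-0.5, 0.5, size=(D.shape[1], X.shape[1]))   # K x J weight matrix
    for _ in range(max_epochs):
        E = 0.0
        for y, d in zip(X, D):                  # y: input vector, d: desired output
            z = sigmoid_unipolar(W @ y)         # output of the layer
            delta_z = (d - z) * z * (1 - z)     # error signal of each neuron
            W += eta * np.outer(delta_z, y)     # W_new = W_old + eta * delta_z * y^T
            E += 0.5 * np.sum((d - z) ** 2)     # accumulate the error
        if E < E_max:                           # stop when a whole epoch's error is small enough
            break
    return W

# toy example: learn the OR function of two inputs (third input is a constant bias)
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
D = np.array([[0], [1], [1], [1]], dtype=float)
print(train_one_layer(X, D))
```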

multi-layer network - the back-propagation method

Multi-layer network

A 1-layer NN can only split the input space into linearly separable regions. Each subsequent layer can further transform the space. As a result, a multi-layer network is a universal tool that can, in theory, approximate arbitrarily well any transformation of the input space into the output space.

Training of a multi-layer network

We will illustrate the training of a multi-layer network on a 2-layer example. To this end we will prepend one additional layer in front of the output layer and demonstrate how to train it. Each layer except the output one is called hidden, since it is not known what the correct output vector of such a layer should be. A method for training multi-layer networks was discovered only in the 1970s and has been applied since the 1980s. It is known as the back-propagation method, since the weights are modified backwards, starting from the last layer. The method extends naturally from 2 layers to any number of hidden layers.

2-layer neural network

Let's introduce the following notation:
input vector: x^T = (x_1, \ldots, x_I)
weight matrix of the 1st layer: V = [v_{ji}] (v_{ji}: j-th neuron, i-th weight)
output vector of the 1st layer (input to the 2nd layer): y^T = (y_1, \ldots, y_J)
output vector of the 2nd layer (final output): z^T = (z_1, \ldots, z_K)
weight matrix of the 2nd layer: W = [w_{kj}] (w_{kj}: k-th neuron, j-th weight)
activation function operator: \Gamma = diag[f(\cdot)] (size: J \times J or K \times K)

Computing the final output can be expressed in matrix form as:
z = \Gamma[Wy] = \Gamma[W \Gamma[Vx]]
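In code, this forward pass is just two matrix-vector products with the activation function applied elementwise (a sketch with arbitrary example sizes and random weights, not taken from the slides):

```python
import numpy as np

def sigmoid_unipolar(net):
    return 1.0 / (1.0 + np.exp(-net))

I, J, K = 3, 4, 2                        # number of inputs, hidden neurons, output neurons
rng = np.random.default_rng(0)
V = rng.uniform(-0.5, 0.5, size=(J, I))  # weight matrix of the 1st (hidden) layer
W = rng.uniform(-0.5, 0.5, size=(K, J))  # weight matrix of the 2nd (output) layer

x = np.array([0.2, -0.7, 1.0])           # input vector
y = sigmoid_unipolar(V @ x)              # y = Gamma[V x], output of the hidden layer
z = sigmoid_unipolar(W @ y)              # z = Gamma[W y], final output
print(y, z)
```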

Multi-layer network

Back-propagation method: after computing the output vector z, the weights are modified starting from the last layer and moving towards the first one (backwards).

Earlier, it was demonstrated how to modify the weights of the last layer. After modifying the weights of the last layer, the weights of the first layer are modified. We again apply the gradient method to modify the weights of the first (hidden) layer:
\Delta v_{ji} = -\eta \frac{\partial E}{\partial v_{ji}}

Back-propagation method, cont.

By analogy, the weight matrix V is modified as follows:
V_{new} = V_{old} + \eta \delta_y x^T

where \delta_y denotes the error signal vector of the hidden layer: \delta_y^T = (\delta_{y1}, \ldots, \delta_{yJ})

The error signal of the hidden layer is computed as follows:
\delta_{yj} = -\frac{\partial E}{\partial y_j} \cdot \frac{\partial y_j}{\partial net_j} = -\frac{\partial E}{\partial y_j} f'(net_j) = \left( \sum_{k=1}^{K} \delta_{zk} w_{kj} \right) f'(net_j)

Algorithm for training a multi-layer neural network

1. select \eta and E_{max}, initialise the weights W and V randomly, set E = 0
2. for each case from the training set:
   - compute the output vectors y and z
   - accumulate the error: E \leftarrow E + \frac{1}{2} \sum_{k=1}^{K} (d_k - z_k)^2
   - compute the error signals (for the last and the first layer):
     unipolar: \delta_{zk} = (d_k - z_k) z_k (1 - z_k), \delta_{yj} = y_j (1 - y_j) \sum_{k=1}^{K} \delta_{zk} w_{kj}
     bipolar: \delta_{zk} = \frac{1}{2} (d_k - z_k)(1 - z_k^2), \delta_{yj} = \frac{1}{2} (1 - y_j^2) \sum_{k=1}^{K} \delta_{zk} w_{kj}
   - modify the weights of the last layer: w_{kj} \leftarrow w_{kj} + \eta \delta_{zk} y_j
   - modify the weights of the first (hidden) layer: v_{ji} \leftarrow v_{ji} + \eta \delta_{yj} x_i
3. if all cases from the training set were presented and E < E_{max}, then complete the training phase. Else, reset the error E and repeat training on the whole training set.
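The whole procedure for the unipolar case fits in a short function. This is a sketch following the update rules above; the XOR data set, the number of hidden neurons, and all parameter values are illustrative additions of mine, and convergence for a particular random seed is not guaranteed.

```python
import numpy as np

def sigmoid_unipolar(net):
    return 1.0 / (1.0 + np.exp(-net))

def train_two_layer(X, D, J=4, eta=0.5, E_max=0.01, max_epochs=20000, seed=0):
    """Back-propagation training of a 2-layer network with J hidden unipolar neurons."""
    rng = np.random.default_rng(seed)
    V = rng.uniform(-0.5, 0.5, size=(J, X.shape[1]))   # hidden-layer weights
    W = rng.uniform(-0.5, 0.5, size=(D.shape[1], J))   # output-layer weights
    for _ in range(max_epochs):
        E = 0.0
        for x, d in zip(X, D):
            y = sigmoid_unipolar(V @ x)                # output of the hidden layer
            z = sigmoid_unipolar(W @ y)                # final output
            E += 0.5 * np.sum((d - z) ** 2)            # accumulate the error
            delta_z = (d - z) * z * (1 - z)            # error signal of the output layer
            delta_y = y * (1 - y) * (W.T @ delta_z)    # error signal of the hidden layer
            W += eta * np.outer(delta_z, y)            # modify the last-layer weights
            V += eta * np.outer(delta_y, x)            # modify the hidden-layer weights
        if E < E_max:
            break
    return V, W

# toy example: XOR (not linearly separable), third input is a constant bias
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)
V, W = train_two_layer(X, D)
for x in X:
    print(x[:2], sigmoid_unipolar(W @ sigmoid_unipolar(V @ x)))
```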

Summary
- training a single neuron with a continuous activation function
- training a 1-layer network of continuous neurons
- training a multi-layer network - the back-propagation method

Thank you for your attention.