Multilayer Feedforward Networks. Berlin Chen, 2002


Introduction
The single-layer perceptron classifiers discussed previously can only deal with linearly separable sets of patterns.
The multilayer networks introduced here are the most widespread neural network architecture.
They were of limited use until the 1980s because of the lack of efficient training algorithms (McClelland and Rumelhart, 1986).

Introduction
Supervised error back-propagation training: the mechanism of backward error transmission (the delta learning rule) is used to modify the synaptic weights of the internal (hidden) and output layers.
The mapping error can be propagated back into the hidden layers.
Such networks can implement arbitrarily complex input/output mappings or decision surfaces separating pattern classes, for which the explicit derivation of mappings and discovery of relationships is almost impossible.
They can produce surprising results and generalizations.

Linearly Non-separable Pattern Classification
Linearly non-separable dichotomization: for two training sets C1 and C2 of augmented patterns, if no weight vector w exists such that
    w^t y > 0 for each y in C1
    w^t y < 0 for each y in C2
then the pattern sets C1 and C2 are linearly non-separable.
(Figures: two-dimensional input pattern space; three-dimensional input pattern space.)

Linearly Non-separable Pattern Classification
Map the patterns of the original pattern space into the so-called image space, such that a two-layer network can classify them.
Example: the 2-dimensional pattern space is mapped into a 3-dimensional image space; the images are cube vertices such as (-1, 1, -1), (-1, 1, 1), (1, 1, -1), (-1, -1, 1), (1, -1, 1).
The output perceptron computes o_4 = sgn(o_1 + o_2 + o_3), with o_4 > 0 indicating class 1 and o_4 < 0 indicating class 2.

Linearly Non-separable Pattern Classification
Patterns mapped into the three-dimensional cube produce linearly separable images in the image space.
The output perceptron again computes o_4 = sgn(o_1 + o_2 + o_3): > 0 means class 1, < 0 means class 2.

Linearly Non-separable Pattern Classification
Example 4.1: the XOR function implemented with a simple layered classifier (with parameters found by inspection).
Two bipolar discrete perceptrons, o_1 = sgn(w_1^t x) and o_2 = sgn(w_2^t x), map the inputs into the image space, and an output perceptron produces the final class decision.
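As a concrete illustration of this kind of "classifier by inspection", here is a minimal sketch of a two-layer threshold network for XOR. It uses unipolar (0/1) step units and hand-picked weights of my own choosing, not the bipolar discrete perceptrons and parameters of the slide's example.

```python
# XOR realized by a two-layer threshold classifier with hand-picked weights.
# Unipolar (0/1) step units are used here for simplicity; the slide's example
# uses bipolar discrete perceptrons with different parameters.
def step(net):
    return 1.0 if net > 0 else 0.0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)      # first hidden unit: OR(x1, x2)
    h2 = step(x1 + x2 - 1.5)      # second hidden unit: AND(x1, x2)
    return step(h1 - h2 - 0.5)    # output unit: OR and not AND  ->  XOR

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", int(xor_net(x1, x2)))
```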

Linearly Non-separable Pattern Classification
Example 4.1 (continued): the mapping produced by discrete perceptrons, o = sgn(w^t x), compared with the mapping produced by continuous perceptrons; a contour map of o_3(x_1, x_2) is shown.

Linearly Non-separable Pattern Classification
Another example: classification of planar patterns.
For class 1 patterns, only one set of the hidden units will be activated at the same time.

Linearly Non-separable Pattern Classification
The layered networks with discrete perceptrons described here are also called committee networks (committee = voting).
The mapping goes from the input pattern space to the image space to the class membership; each image is a vertex of the N-dimensional cube with coordinates 1 and -1.

Error Back-propagation Training for Multilayer Feedforward Networks
The error back-propagation training algorithm has reawakened the scientific and engineering community to the modeling of many quantitative phenomena using neural networks.
The delta learning rule is applied.
Each neuron has a nonlinear and differentiable activation function (a sigmoid function).
The neurons' (synaptic) weights are adjusted based on the least-mean-square (LMS) criterion.

Error Back-propagation Training for Multilayer Feedforward Networks
Training is the experiential acquisition of input/output mapping knowledge within multilayer networks.
Input patterns are submitted sequentially during training.
Synaptic weights and thresholds are adjusted to reduce the mean-square classification error.
The weight adjustments proceed backward from the output layer through the hidden layers toward the input layer.
Training continues until the network is within an acceptable overall error for the whole training set.

Error Back-propagation Training for Multilayer Feedforward Networks
Revisit the delta learning rule for the single-layer network with continuous activation functions, trained by gradient-descent search:
    o = Γ[W y],   net_k = w_k^t y
Derivation for a specific neuron k:
    E = (1/2) Σ_{k=1}^{K} (d_k − o_k)²,   o_k = f(net_k) = f(Σ_{j=1}^{J} w_kj y_j)
    Δw_kj = −η ∂E/∂w_kj   (negative gradient-descent formula)
    Δw_kj = η (d_k − o_k) f'(net_k) y_j

Error Back-propagation Training for Multilayer Feedforward Networks
Revisit the delta learning rule for the single-layer network: define the error signal term for a specific neuron k,
    δ_ok = −∂E/∂net_k = (d_k − o_k) f'(net_k)
so that
    Δw_kj = −η ∂E/∂w_kj = −η (∂E/∂net_k)(∂net_k/∂w_kj) = η δ_ok y_j
    w_kj ← w_kj + η δ_ok y_j

Error Back-propagation Training for Multilayer Feedforward Networks
Revisit the delta learning rule for the single-layer network.
Unipolar continuous activation function:
    f(net) = 1 / (1 + exp(−net)),   f'(net) = o (1 − o)
    δ_ok = (d_k − o_k) o_k (1 − o_k),   Δw_kj = η (d_k − o_k) o_k (1 − o_k) y_j
Bipolar continuous activation function:
    f(net) = 2 / (1 + exp(−net)) − 1,   f'(net) = (1/2)(1 − o²)
    δ_ok = (1/2)(1 − o_k²)(d_k − o_k)

Error Back-propagation Training for Multilayer Feedforward Networks
Revisit the delta learning rule for the single-layer network, in matrix form:
    W ← W + η δ_o y^t
where W is the K×J weight matrix, δ_o = [δ_o1, δ_o2, ..., δ_oK]^t, and y = [y_1, y_2, ..., y_J]^t.
The δ_ok are local error signals that depend only on d_k and o_k.
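A minimal NumPy sketch of one delta-rule step for a single-layer network with a unipolar continuous activation, using the matrix-form update above. The layer sizes, learning constant, and data are illustrative assumptions.

```python
import numpy as np

# One delta-rule step for a single-layer network, matrix form W <- W + eta*delta_o*y^t,
# with a unipolar continuous activation so that f'(net) = o(1 - o).
rng = np.random.default_rng(0)
J, K, eta = 4, 3, 0.1                        # inputs incl. bias, outputs, learning constant
W = rng.uniform(-0.5, 0.5, size=(K, J))      # weight matrix
y = rng.uniform(-1, 1, size=J)               # augmented input pattern (assumed data)
d = np.array([1.0, 0.0, 1.0])                # desired outputs (assumed data)

net = W @ y
o = 1.0 / (1.0 + np.exp(-net))               # unipolar continuous activation
delta_o = (d - o) * o * (1.0 - o)            # local error signals delta_ok
W += eta * np.outer(delta_o, y)              # W <- W + eta * delta_o y^t
```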

Error Back-propagation Training for Multilayer Feedforward Networks
Generalized delta learning rule for hidden layers.

Error Back-propagation Training for Multilayer Feedforward Networks
Apply the negative gradient-descent formula to the hidden layer:
    Δv_ji = −η ∂E/∂v_ji = η δ_yj z_i,   net_j = Σ_{i=1}^{I} v_ji z_i,   y_j = f(net_j)
where δ_yj = −∂E/∂net_j is the error signal term of the hidden-layer neuron j having output y_j.

Error Back-propagation Training for Multilayer Feedforward Networks
Apply the negative gradient-descent formula to the hidden layer:
    δ_yj = −∂E/∂net_j = −(∂E/∂y_j)(∂y_j/∂net_j),   ∂y_j/∂net_j = f'(net_j)
    −∂E/∂y_j = Σ_{k=1}^{K} (d_k − o_k) f'(net_k) w_kj = Σ_{k=1}^{K} δ_ok w_kj

Error Back-propagation Training for Multilayer Feedforward Networks
Generalized delta learning rule for hidden layers:
    δ_yj = f'(net_j) Σ_{k=1}^{K} δ_ok w_kj
    Δv_ji = η δ_yj z_i = η f'(net_j) z_i Σ_{k=1}^{K} δ_ok w_kj
    v_ji ← v_ji + η f'(net_j) z_i Σ_{k=1}^{K} δ_ok w_kj

Error Back-propagation Training for Multilayer Feedforward Networks
Generalized delta learning rule for hidden layers.
Bipolar continuous activation function:
    v_ji ← v_ji + η (1/2)(1 − y_j²) z_i Σ_{k=1}^{K} (d_k − o_k)(1/2)(1 − o_k²) w_kj
Unipolar continuous activation function:
    v_ji ← v_ji + η y_j (1 − y_j) z_i Σ_{k=1}^{K} (d_k − o_k) o_k (1 − o_k) w_kj
The adjustment of the weights leading to neuron j in the hidden layer is proportional to the weighted sum of all δ values of the adjacent following layer of nodes connecting neuron j with the output.
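The generalized delta rule above can be sketched as follows for one hidden layer with bipolar continuous activations. Layer sizes, the learning constant, and the data are assumptions made for illustration only.

```python
import numpy as np

# One generalized-delta-rule step for a two-layer network with bipolar
# continuous activations, f'(net) = (1/2)(1 - f(net)^2).
rng = np.random.default_rng(1)
I, J, K, eta = 3, 4, 2, 0.1                  # assumed layer sizes and learning constant
V = rng.uniform(-0.5, 0.5, size=(J, I))      # hidden-layer weights
W = rng.uniform(-0.5, 0.5, size=(K, J))      # output-layer weights
z = rng.uniform(-1, 1, size=I)               # input pattern (assumed data)
d = np.array([1.0, -1.0])                    # desired outputs (assumed data)

f = lambda net: 2.0 / (1.0 + np.exp(-net)) - 1.0   # bipolar continuous activation
y = f(V @ z)                                 # hidden-layer outputs
o = f(W @ y)                                 # network outputs

delta_o = 0.5 * (1.0 - o ** 2) * (d - o)             # output error signals
delta_y = 0.5 * (1.0 - y ** 2) * (W.T @ delta_o)     # back-propagated hidden error signals
V += eta * np.outer(delta_y, z)              # v_ji <- v_ji + eta * delta_yj * z_i
W += eta * np.outer(delta_o, y)              # w_kj <- w_kj + eta * delta_ok * y_j
```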

Error Back-propagation Training for Multilayer Feedforward Networks
(Figure: the bipolar continuous activation function.)

Error Back-propagation Training for Multilayer Feedforward Networks
The two-layer feedforward mapping:
    o = Γ[W Γ[V z]]

Error Back-propagation Training for Multilayer Feedforward Networks
The incremental learning of the weight matrices in the output and hidden layers is obtained by the outer-product rule:
    ΔW = η δ_o y^t,   ΔV = η δ_y z^t
where δ is the error signal vector of a layer and y (or z) is the input signal to that layer.
The network is nonlinear in the feedforward mode, while the error back-propagation is computed using the linearized activation, i.e., the slope of each neuron's activation function.

Examples of Error Back-Propagation Training
Example: the XOR function.
Network architecture: two inputs (plus a fixed bias input), two hidden neurons, and one output neuron, with hidden-layer weight matrix V and output-layer weight vector W.
The training set is the truth table of the XOR function.

Examples of Error Back-Propagation Training
Example: the XOR function.
The first sample run with random initial weight values converged in 44 steps (learning constant η), yielding a trained output-layer weight vector W (three weights) and hidden-layer weight matrix V (two rows of three weights).
Although continuous neurons are used for training, they are afterwards replaced with bipolar binary neurons.

Examples of Error Back-Propagation Training
Example: the XOR function.
The second sample run with random initial weight values converged in 8 steps (learning constant η), again yielding trained weight matrices W and V.
If the network fails to learn the training set successfully, training should be restarted with different initial weights.
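A rough sketch of such a sample run: training the XOR function with error back-propagation. The architecture follows the example (two hidden neurons, one output neuron, augmented bias inputs of -1), but the unipolar sigmoids, learning constant, stopping threshold, and random seed are my assumptions, not the slide's settings.

```python
import numpy as np

# A sketch of training XOR with error back-propagation: 2 inputs + bias,
# 2 hidden neurons, 1 output neuron, unipolar sigmoids, online updates.
rng = np.random.default_rng(42)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([0.0, 1.0, 1.0, 0.0])           # XOR targets
eta = 0.5                                    # learning constant (assumed)

V = rng.uniform(-1, 1, size=(2, 3))          # hidden layer: 2 neurons x (2 inputs + bias)
W = rng.uniform(-1, 1, size=(1, 3))          # output layer: 1 neuron x (2 hidden + bias)
f = lambda net: 1.0 / (1.0 + np.exp(-net))   # unipolar continuous activation

for cycle in range(20000):
    E = 0.0
    for x, d in zip(X, D):
        z = np.append(x, -1.0)               # augmented input (bias input fixed at -1)
        y = np.append(f(V @ z), -1.0)        # augmented hidden-layer output
        o = f(W @ y)
        E += 0.5 * np.sum((d - o) ** 2)
        delta_o = (d - o) * o * (1 - o)                         # output error signal
        delta_y = y[:2] * (1 - y[:2]) * (W[:, :2].T @ delta_o)  # hidden error signals
        W += eta * np.outer(delta_o, y)      # outer-product rule
        V += eta * np.outer(delta_y, z)
    if E < 0.01:
        break

# If the run fails to converge, restart with different initial weights (another seed).
print("cumulative error", round(float(E), 4), "after", cycle + 1, "training cycles")
for x in X:
    o = f(W @ np.append(f(V @ np.append(x, -1.0)), -1.0))
    print(x, "->", round(float(o[0]), 3))
```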

Training Errors
For the purpose of assessing the quality and success of training, the joint (cumulative) error must be computed for the entire batch of training patterns:
    E = (1/2) Σ_{p=1}^{P} Σ_{k=1}^{K} (d_pk − o_pk)²
This is not very useful for comparing networks trained with different numbers of training patterns or having different numbers of output neurons, so the root-mean-square normalized error is also used:
    E_rms = sqrt( (1/(P K)) Σ_{p=1}^{P} Σ_{k=1}^{K} (d_pk − o_pk)² )

Training Errors
For some classification applications, outputs below one threshold (0.1) are interpreted as 0, while outputs above another threshold (0.9) are interpreted as 1.
In such cases the decision error more adequately reflects the accuracy of neural-network classifiers:
    E_d = N_err / (P K)
where N_err is the number of bit errors, so E_d is the average number of bit errors per output.
Networks in classification applications may exhibit zero decision error while still yielding substantial E and E_rms.
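A small sketch computing the three error measures above for a batch of network outputs; the output and target matrices are made-up illustrative values, and the 0.9/0.1 thresholds follow the slide.

```python
import numpy as np

# Cumulative error E, root-mean-square error E_rms and decision error E_d
# for illustrative network outputs O and desired outputs Dm (P = 3, K = 2).
O = np.array([[0.93, 0.04], [0.08, 0.96], [0.55, 0.91]])   # network outputs (assumed)
Dm = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])        # desired outputs (assumed)
P, K = Dm.shape

E = 0.5 * np.sum((Dm - O) ** 2)                      # joint (cumulative) error
E_rms = np.sqrt(np.sum((Dm - O) ** 2) / (P * K))     # RMS normalized error

# Decision error: outputs above 0.9 count as 1, below 0.1 as 0, otherwise undecided;
# undecided outputs are counted as bit errors here (a design choice of this sketch).
O_bin = np.where(O > 0.9, 1.0, np.where(O < 0.1, 0.0, np.nan))
N_err = np.sum(O_bin != Dm)                          # NaN compares unequal, so undecided -> error
E_d = N_err / (P * K)                                # average number of bit errors per output
print(E, E_rms, E_d)
```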

Multilayer Feedforward Networks as Function Approximators
Example: a function h(x) approximated by H(w, x).

Multilayer Feedforward Networks as Function Approximators
There are P + 1 sample points {x_1, x_2, ..., x_{P+1}}, which are examples of the function values on the interval (a, b), with x_{i+1} > x_i for i = 1, ..., P.
Each subinterval [x_i, x_{i+1}], i = 1, 2, ..., P, has length (b − a)/P, with x_1 = a and x_{P+1} = b.

Multilayer Feedforward Networks as Function Approximators
Define a unit step function built from sgn:
    ζ(x) = (1/2)[sgn(x) + 1] = 1 for x > 0, 0 for x < 0 (undefined for x = 0)
Use a staircase approximation H(w, x) of the continuous-valued function h(x):
    H(w, x) = Σ_{i=1}^{P} h_i [ζ(x − x_i) − ζ(x − x_{i+1})]
where h_i is the value of h used on the i-th subinterval.
The network will have P binary (nonlinear) perceptrons with TLUs in the input layer.
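A sketch of this staircase construction for an illustrative target function h(x) = sin(x) on (0, π); the choice of h, the interval, P, and the use of the subinterval midpoint for the held value are assumptions of the sketch.

```python
import numpy as np

# Staircase approximation H(w, x) of h(x) built from unit-step (TLU) units,
# one "bump" (difference of two steps) per subinterval of (a, b).
h = np.sin                                   # assumed target function
a, b, P = 0.0, np.pi, 20                     # assumed interval and number of steps
x_pts = np.linspace(a, b, P + 1)             # x_1 = a, ..., x_{P+1} = b
zeta = lambda x: 0.5 * (np.sign(x) + 1.0)    # unit step built from sgn

def H(x):
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for i in range(P):
        h_i = h(0.5 * (x_pts[i] + x_pts[i + 1]))   # value held on subinterval i (midpoint)
        out += h_i * (zeta(x - x_pts[i]) - zeta(x - x_pts[i + 1]))
    return out

xs = np.linspace(a, b, 200)
print("max |h - H| =", float(np.max(np.abs(h(xs) - H(xs)))))
```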

Multilayer Feedforward Networks as Function Approximators
If we replace the TLUs with continuous activation functions, each window becomes a "bump" function (which may not be the best choice for a particular problem).

Multilayer Feedforward Networks as Function Approximators
The output layer in the above example can also be replaced with a perceptron with a nonlinear activation function.
Such a network architecture can approximate virtually any multivariable function, provided sufficiently many hidden neurons are available.

Learning Factors
Error curve during training (figure).

Learning Factors
Initial weights: the weights of the network are typically initialized to small random values.
The initialization strongly affects the ultimate solution. Equal initial weights? If training fails, select another set of initial weights and restart.
Incremental updating versus cumulative weight adjustment.
Incremental updating: weight adjustments do not need to be stored, but training may be skewed toward the most recent patterns in the training cycle.

Learning Factors
Incremental updating versus cumulative weight adjustment.
Cumulative weight adjustment: Δw = Σ_{p=1}^{P} Δw_p.
Provided that the learning constant is small enough, the cumulative weight-adjustment procedure can still implement the algorithm close to gradient-descent minimization.
We may present the training examples in random order in each training cycle.
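A short sketch contrasting cumulative weight adjustment with incremental updating for the single-layer delta rule; the data, sizes, and learning constant are illustrative.

```python
import numpy as np

# Cumulative (batch) weight adjustment for the single-layer delta rule:
# the per-pattern adjustments are accumulated over one training cycle and
# applied once. Data, sizes and learning constant are illustrative.
rng = np.random.default_rng(3)
Y = rng.uniform(-1, 1, size=(8, 4))                # P = 8 augmented input patterns
D = rng.integers(0, 2, size=(8, 3)).astype(float)  # desired outputs
W = rng.uniform(-0.5, 0.5, size=(3, 4))
eta = 0.05
f = lambda net: 1.0 / (1.0 + np.exp(-net))

dW = np.zeros_like(W)
for y, d in zip(Y, D):
    o = f(W @ y)
    dW += eta * np.outer((d - o) * o * (1 - o), y)   # accumulate Delta w_p
W += dW                                              # apply once per cycle

# Incremental updating would instead apply each adjustment immediately;
# presenting patterns in random order each cycle reduces the skew toward
# the most recently seen patterns.
```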

Learning Factors
Steepness of the activation function: for the bipolar continuous activation function
    f(net) = 2 / (1 + exp(−λ net)) − 1
the slope is
    f'(net) = 2 λ exp(−λ net) / [1 + exp(−λ net)]²
which has a maximum value of λ/2 at net = 0.
A large λ may yield results similar to those of a large learning constant η.
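A tiny sketch of the bipolar activation with steepness λ and its slope, confirming the peak slope of λ/2 at net = 0; λ = 2 is an arbitrary illustrative value.

```python
import numpy as np

# Bipolar continuous activation with steepness lambda and its slope;
# the slope peaks at net = 0 with value lambda / 2. lam = 2.0 is arbitrary.
lam = 2.0
f = lambda net: 2.0 / (1.0 + np.exp(-lam * net)) - 1.0
f_prime = lambda net: 0.5 * lam * (1.0 - f(net) ** 2)   # = 2*lam*e^(-lam*net)/(1+e^(-lam*net))^2

print(f_prime(0.0), lam / 2)   # both print 1.0 for lam = 2.0
```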

Learning Factors
Momentum method: supplement the current weight adjustment with a fraction of the most recent weight adjustment,
    Δw(t) = −η ∇E(t) + α Δw(t − 1)
After a total of N steps with the momentum method,
    Δw(t) = −η Σ_{n=0}^{N} αⁿ ∇E(t − n)
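A minimal sketch of the momentum update; the gradient here is a random placeholder standing in for the back-propagated error gradient, and the values of η and α are illustrative.

```python
import numpy as np

# Momentum update: Delta w(t) = -eta * grad E(t) + alpha * Delta w(t-1).
# The gradient is a random placeholder for the back-propagated gradient.
rng = np.random.default_rng(4)
eta, alpha = 0.1, 0.8                        # assumed learning and momentum constants
W = np.zeros((3, 4))
dW_prev = np.zeros_like(W)

for t in range(5):
    grad_E = rng.normal(size=W.shape)        # placeholder for dE/dW at step t
    dW = -eta * grad_E + alpha * dW_prev     # current adjustment
    W += dW
    dW_prev = dW
```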

Learning Factors
Momentum method (illustration).

Summary of the Error Back-propagation Network
Given a set of P training pairs {(z_p, d_p), p = 1, ..., P}, minimize the total error
    E = (1/2) Σ_{p=1}^{P} ||d_p − o(W, V, z_p)||²

Network Architecture vs. Data Representation
The target vectors d^t for the classes (for example, the characters C, I, and T) can be chosen in different ways; the choice of data representation affects the network architecture required.

Necessary Number of Hidden Neurons
For a two-layer feedforward network, if the n-dimensional non-augmented input space is to be separated into M disjoint regions, the necessary number of hidden neurons J satisfies
    M(J, n) = Σ_{k=0}^{n} C(J, k)   (with C(J, k) = 0 for J < k)
so that for n ≥ J the maximum number of separable regions is M = 2^J, i.e. J = log₂ M.
(Mirchandani and Cao, 1989)
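A small sketch of this counting argument (assuming the standard Mirchandani-Cao region-counting formula): it computes the maximum number of regions M(J, n) and the smallest J that yields at least M regions.

```python
from math import comb

# Region counting for J hidden threshold units in n-dimensional input space:
# M(J, n) = sum_{k=0..n} C(J, k); for n >= J this equals 2**J, i.e. J = log2(M).
def max_regions(J, n):
    return sum(comb(J, k) for k in range(n + 1))

def hidden_neurons_needed(M, n):
    J = 0
    while max_regions(J, n) < M:
        J += 1
    return J

print(max_regions(3, 2))             # 3 hidden units in 2-D separate at most 7 regions
print(hidden_neurons_needed(7, 2))   # -> 3
```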

Character Recognition Application
Each point of the character is projected onto its three closest vertical, horizontal, and diagonal bars, and the bar values are then normalized to lie between 0 and 1.
The input vector consists of these normalized bar values, and the activation function is the unipolar continuous function.
Accuracy of roughly 90-95% was achieved.
(Table: the network sizes I, J, K used in the experiments.)

Character Recognition Application
Example 4.5.

Digit Recognition Application
Accuracy of roughly 96-98% was achieved.

Expert System Applications
Explanation function: neural-network expert systems are typically unable to provide the user with the reasons for the decisions made.

Learning Time Sequences

Functional Link Network
Enhance the representation of the input data, for example by augmenting the input vector with products of any two elements and of any three elements.
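A sketch of this kind of input enhancement: the input vector is expanded with the products of any two elements and of any three elements. The function name and the expansion order are illustrative choices, not part of the slide.

```python
import numpy as np
from itertools import combinations

# Functional-link style enhancement of the input representation: augment the
# input vector with products of any two (and any three) of its elements.
def functional_link_expand(x, order=3):
    x = np.asarray(x, dtype=float)
    terms = list(x)                                   # original elements
    for r in range(2, order + 1):
        terms += [float(np.prod(x[list(idx)]))        # product of the chosen elements
                  for idx in combinations(range(len(x)), r)]
    return np.array(terms)

print(functional_link_expand([1.0, -1.0, 0.5]))
# 3 original elements + 3 pairwise products + 1 triple product = 7 terms
```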