LIMITATIONS OF THE PERCEPTRON

XOR Problem
The failure of the perceptron to successfully solve a simple problem such as XOR (Minsky and Papert).

x  y  z = x XOR y
0  0  0
0  1  1
1  0  1
1  1  0

Fig. 14. The exclusive-OR logic symbol and function table.

Fig. 15. The XOR problem in pattern space (inputs X and Y, output representation z): the points with output 1 cannot be separated from the points with output 0 by a single straight line.
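The non-separability claim can be checked by brute force. The sketch below (an illustration added here, not part of the original text) searches a grid of weights and thresholds for a single threshold unit and finds settings that realise AND and OR, but none that realises XOR.

```python
import itertools

PATTERNS = [(0, 0), (0, 1), (1, 0), (1, 1)]
XOR = [0, 1, 1, 0]
AND = [0, 0, 0, 1]
OR  = [0, 1, 1, 1]

def unit(x, y, w1, w2, theta):
    """A single threshold unit: output 1 if the weighted sum exceeds the threshold."""
    return 1 if w1 * x + w2 * y > theta else 0

def find_weights(targets, grid):
    """Return a (w1, w2, theta) triple realising `targets` on PATTERNS, or None."""
    for w1, w2, theta in itertools.product(grid, repeat=3):
        if all(unit(x, y, w1, w2, theta) == t for (x, y), t in zip(PATTERNS, targets)):
            return (w1, w2, theta)
    return None

grid = [i / 4 for i in range(-8, 9)]        # -2.0 to 2.0 in steps of 0.25
print("AND:", find_weights(AND, grid))      # prints a valid (w1, w2, theta): AND is separable
print("OR: ", find_weights(OR, grid))       # likewise separable
print("XOR:", find_weights(XOR, grid))      # prints None: no unit in the grid computes XOR
```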

Summary
Perceptron - an artificial neuron. It takes a weighted sum of its inputs and outputs +1 if the sum is greater than the threshold, else it outputs 0. Hebbian learning (increasing the effectiveness of active junctions) is the predominant approach. Learning corresponds to adjusting the values of the weights. Feedforward supervised networks. Can use +1, -1 instead of 0, 1 values. Can only solve problems that are linearly separable - therefore fails on XOR.

Further Reading
1. Parallel Distributed Processing, Volume 1. J.L. McClelland & D.E. Rumelhart. MIT Bradford Press, 1986. An excellent, broad-ranging book that covers many areas of neural networks. It was the book that signalled the resurgence of interest in neural systems.
2. The Organization of Behavior. Donald Hebb, 1949. Contains Hebb's original ideas regarding learning by reinforcement of active neurons.
3. Perceptrons. M. Minsky & S. Papert. MIT Press, 1969. The criticism of single-layer perceptrons is laid out in this book. A very interesting read, if a little too mathematical in places for some tastes.
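As a concrete illustration of "learning corresponds to adjusting the values of the weights", the sketch below (added here; it uses the classic perceptron learning rule rather than the pure Hebbian rule mentioned above) nudges the weights towards an input whenever the unit fires too weakly and away from it when it fires too strongly. It finds weights for the linearly separable AND function but never converges on XOR.

```python
def train_perceptron(patterns, targets, epochs=25, eta=0.25):
    """Perceptron learning rule with a bias weight w[0] on a constant input of 1."""
    w = [0.0, 0.0, 0.0]                        # [bias, w_x, w_y]
    for _ in range(epochs):
        errors = 0
        for (x, y), t in zip(patterns, targets):
            inputs = [1, x, y]
            out = 1 if sum(wi * xi for wi, xi in zip(w, inputs)) > 0 else 0
            if out != t:
                errors += 1
                # Adjust only the weights on active (non-zero) inputs.
                w = [wi + eta * (t - out) * xi for wi, xi in zip(w, inputs)]
        if errors == 0:                        # every pattern classified correctly
            return w
    return None                                # failed to converge

patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
print("AND:", train_perceptron(patterns, [0, 0, 0, 1]))   # converges to a weight vector
print("XOR:", train_perceptron(patterns, [0, 1, 1, 0]))   # None - not linearly separable
```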

XOR Problem: THE MULTILAYER PERCEPTRON
An initial approach would be to use more than one perceptron, each set up to identify small, linearly separable sections of the inputs, and then to combine their outputs in another perceptron, which would produce a final indication of the class to which the input belongs.

Fig. 16. Combining perceptrons can solve the XOR problem.

Perceptron 1 detects when the pattern corresponding to (0,1) is present, and perceptron 2 detects when (1,0) is there. Combined, these two facts allow perceptron 3 to classify the input correctly.
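A minimal sketch of this two-stage arrangement (the particular weights and thresholds are illustrative choices, not values taken from the figure): perceptron 1 fires only for (0,1), perceptron 2 fires only for (1,0), and perceptron 3 simply ORs their outputs.

```python
def threshold_unit(inputs, weights, theta):
    """Output 1 if the weighted sum of the inputs exceeds the threshold theta."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > theta else 0

def xor_by_combination(x, y):
    # First layer: two detectors for the linearly separable sub-patterns.
    p1 = threshold_unit([x, y], [-1, 1], 0.5)    # fires only for (0, 1)
    p2 = threshold_unit([x, y], [1, -1], 0.5)    # fires only for (1, 0)
    # Second layer: perceptron 3 combines the two detectors (an OR).
    return threshold_unit([p1, p2], [1, 1], 0.5)

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x, y), "->", xor_by_combination(x, y))   # 0, 1, 1, 0
```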

Note
The concept proposed seems fine on first examination, but we have to modify the perceptron model for the following reasons. Each neuron in the structure takes the weighted sum of its inputs, thresholds it, and outputs a one or a zero. For the perceptrons in the first layer the inputs come from the actual inputs to the network, while the perceptrons in the second layer take as their inputs the outputs from the first layer. In consequence, the perceptrons in the second layer do not know which of the real inputs were on or not. Since learning corresponds to strengthening the connections between active inputs and active units, it is impossible to strengthen the correct parts of the network: the actual inputs are effectively masked off from the second-layer units by the intermediate layer. The hard-limiting threshold function removes the information that is needed if the network is to learn successfully - this is the credit assignment problem.

The solution
If we smooth the thresholding process out so that it still more or less turns on or off, as before, but has a sloping region in the middle that gives us some information about the inputs, we will be able to determine when we need to strengthen or weaken the relevant weights - the network will be able to learn.

Fig. 17. Two possible thresholding functions: the linear (hard-limiting) threshold and the sigmoidal threshold.

THE NEW MODEL
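To see what the "sloping region" buys us, the sketch below (an added illustration; the gain value k = 1 is an arbitrary choice) compares the hard-limiting step with the sigmoid. The step carries no gradient information, whereas the sigmoid has a well-defined slope that is largest in the middle of its range.

```python
import math

def step(net):
    """Hard-limiting threshold: 1 above zero, 0 otherwise."""
    return 1.0 if net > 0 else 0.0

def sigmoid(net, k=1.0):
    """Smooth threshold with a sloping region in the middle."""
    return 1.0 / (1.0 + math.exp(-k * net))

def sigmoid_slope(net, k=1.0):
    """Derivative of the sigmoid: k * f(net) * (1 - f(net))."""
    f = sigmoid(net, k)
    return k * f * (1.0 - f)

for net in [-4.0, -1.0, 0.0, 1.0, 4.0]:
    print(f"net={net:+.1f}  step={step(net):.0f}  "
          f"sigmoid={sigmoid(net):.3f}  slope={sigmoid_slope(net):.3f}")
```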

The adapted perceptron units are arranged in layers, and so the new model is naturally enough termed the multilayer perceptron.

Fig. 18. The multilayer perceptron: our new model.

Our model has three layers: an input layer, a hidden layer and an output layer. Each unit in the hidden layer and the output layer is like a perceptron unit, except that the thresholding function is the one shown in figure 17 - the sigmoid function, not the step function as before. The units in the input layer serve only to distribute the values they receive to the next layer, and so do not perform a weighted sum or threshold. We are therefore forced to alter our learning rule.
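A minimal forward pass through such a network might look like the sketch below (the layer sizes, random weight range and gain k are illustrative assumptions): the input units merely pass their values on, while hidden and output units apply the sigmoid to a weighted sum minus their threshold.

```python
import math, random

def sigmoid(net, k=1.0):
    return 1.0 / (1.0 + math.exp(-k * net))

def layer_output(inputs, weights, thresholds):
    """One layer of sigmoid units: weights[j][i] connects input i to unit j."""
    outs = []
    for w_j, theta_j in zip(weights, thresholds):
        net = sum(w * x for w, x in zip(w_j, inputs)) - theta_j
        outs.append(sigmoid(net))
    return outs

random.seed(0)
n_in, n_hidden, n_out = 2, 2, 1
w_hidden  = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
th_hidden = [random.uniform(-1, 1) for _ in range(n_hidden)]
w_out     = [[random.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
th_out    = [random.uniform(-1, 1) for _ in range(n_out)]

x = [0, 1]                                    # the input layer just distributes x
hidden = layer_output(x, w_hidden, th_hidden)
output = layer_output(hidden, w_out, th_out)
print("hidden:", hidden, "output:", output)
```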

The New Learning Model
The learning rule for the multilayer perceptron is called the generalised delta rule, or the back-propagation rule (Rumelhart, McClelland and Williams 1986; Parker 1982; Werbos 1974). The operation of the network is similar to that of the single-layer perceptron, but the learning rule is a little more complex than the previous one. We need to define an error function that represents the difference between the network's current output and the correct output that we want it to produce. Because we need to know the correct pattern, this type of learning is known as supervised learning. In order to learn successfully we want to continually reduce the value of the error function; this is achieved by adjusting the weights on the links between units. The generalised delta rule does this by calculating the value of the error function for a particular input pattern and then back-propagating the error from one layer to the previous one in order to adjust the weights. For units actually on the output layer, their output and desired output are both known, so adjusting the weights is relatively simple; for units in the middle layer the adjustment is not so obvious.

The Mathematics

Notation:
E_p  - the error function for pattern p
t_pj - the target output for pattern p on node j
o_pj - the actual output for pattern p on node j
w_ij - the weight from node i to node j

The error function:

    E_p = \frac{1}{2} \sum_j (t_{pj} - o_{pj})^2

The activation of each unit j, for pattern p:

    net_{pj} = \sum_i w_{ij} o_{pi}

The output from each unit j is the threshold function f_j acting on the weighted sum:

    o_{pj} = f_j(net_{pj})
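These definitions translate almost directly into code. The sketch below (with example weights, inputs and targets chosen arbitrarily for illustration) computes net_pj, o_pj and E_p for a single pattern p.

```python
import math

def f(net, k=1.0):
    """Sigmoid threshold function."""
    return 1.0 / (1.0 + math.exp(-k * net))

# One pattern p: the outputs o_pi of the previous layer feed two units j = 0, 1.
o_prev = [0.0, 1.0, 1.0]                   # o_pi for the nodes i feeding this layer
w = [[0.5, -0.4, 0.9],                     # w[j][i]: weight from node i to node j
     [-0.3, 0.8, 0.2]]
t = [1.0, 0.0]                             # target outputs t_pj for this pattern

net = [sum(w[j][i] * o_prev[i] for i in range(len(o_prev))) for j in range(len(w))]
o   = [f(n) for n in net]                                    # o_pj = f_j(net_pj)
E   = 0.5 * sum((t[j] - o[j]) ** 2 for j in range(len(o)))   # error for pattern p

print("net_pj =", net)
print("o_pj   =", o)
print("E_p    =", E)
```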

The problem is to find weights that minimise the error function. By the chain rule,

    \frac{\partial E_p}{\partial w_{ij}} = \frac{\partial E_p}{\partial net_{pj}} \cdot \frac{\partial net_{pj}}{\partial w_{ij}}

The second term in the above equation can be calculated as

    \frac{\partial net_{pj}}{\partial w_{ij}} = \frac{\partial}{\partial w_{ij}} \sum_k w_{kj} o_{pk} = \sum_k \frac{\partial w_{kj}}{\partial w_{ij}} o_{pk} = o_{pi}

since \partial w_{kj} / \partial w_{ij} = 0 except when k = i, when it equals 1. Writing the change of error as a function of the change in the net input to a unit as

    \delta_{pj} = -\frac{\partial E_p}{\partial net_{pj}}

gives

    -\frac{\partial E_p}{\partial w_{ij}} = \delta_{pj} o_{pi}

Decreasing the value of E_p therefore means making the weight changes proportional to \delta_{pj} o_{pi}, i.e.

    \Delta_p w_{ij} = \eta \delta_{pj} o_{pi}

We now need to know what \delta_{pj} is for each of the units - if we know this, then we can decrease E. By the chain rule,

    \delta_{pj} = -\frac{\partial E_p}{\partial net_{pj}} = -\frac{\partial E_p}{\partial o_{pj}} \cdot \frac{\partial o_{pj}}{\partial net_{pj}}

Now, considering o_{pj} = f_j(net_{pj}), the second term is

    \frac{\partial o_{pj}}{\partial net_{pj}} = f_j'(net_{pj})

and for output units

    \frac{\partial E_p}{\partial o_{pj}} = -(t_{pj} - o_{pj})

so for output units we get

    \delta_{pj} = f_j'(net_{pj}) (t_{pj} - o_{pj})

If a unit is not an output unit we can write, by the chain rule again,

    \frac{\partial E_p}{\partial o_{pj}} = \sum_k \frac{\partial E_p}{\partial net_{pk}} \cdot \frac{\partial net_{pk}}{\partial o_{pj}} = \sum_k \frac{\partial E_p}{\partial net_{pk}} \cdot \frac{\partial}{\partial o_{pj}} \sum_i w_{ik} o_{pi} = -\sum_k \delta_{pk} w_{jk}

and finally

    \delta_{pj} = f_j'(net_{pj}) \sum_k \delta_{pk} w_{jk}

The two enclosed equations above - the output-unit and hidden-unit forms of \delta_{pj} - represent the change in the error function with respect to the weights in the network.
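These two formulas can be sanity-checked numerically. The sketch below (a self-contained check on a tiny 2-2-1 sigmoid network with gain k = 1, one training pattern and arbitrary weights, all chosen for illustration) compares the analytic gradients -\delta_{pj} o_{pi} with central-difference estimates of \partial E_p / \partial w.

```python
import math

def f(net):                                    # sigmoid, gain k = 1
    return 1.0 / (1.0 + math.exp(-net))

def forward(x, wh, wo):
    h = [f(sum(wh[j][i] * x[i] for i in range(len(x)))) for j in range(len(wh))]
    o = f(sum(wo[j] * h[j] for j in range(len(h))))
    return h, o

def error(x, t, wh, wo):
    _, o = forward(x, wh, wo)
    return 0.5 * (t - o) ** 2

x, t = [1.0, 0.0], 1.0                         # one training pattern
wh = [[0.3, -0.8], [-0.2, 0.5]]                # input -> hidden weights wh[j][i]
wo = [0.7, -0.4]                               # hidden -> output weights wo[j]

h, o = forward(x, wh, wo)
delta_o = o * (1 - o) * (t - o)                                     # output unit
delta_h = [h[j] * (1 - h[j]) * delta_o * wo[j] for j in range(2)]   # hidden units

eps = 1e-6
# Hidden -> output weight wo[0]: analytic dE/dwo[0] = -delta_o * h[0]
wo_plus, wo_minus = wo[:], wo[:]
wo_plus[0] += eps
wo_minus[0] -= eps
numeric = (error(x, t, wh, wo_plus) - error(x, t, wh, wo_minus)) / (2 * eps)
print("dE/dwo[0]:", -delta_o * h[0], "numeric:", numeric)

# Input -> hidden weight wh[0][0]: analytic dE/dwh[0][0] = -delta_h[0] * x[0]
wh_plus = [row[:] for row in wh]
wh_minus = [row[:] for row in wh]
wh_plus[0][0] += eps
wh_minus[0][0] -= eps
numeric = (error(x, t, wh, wh_plus) - error(x, t, wh, wh_minus)) / (2 * eps)
print("dE/dwh[0][0]:", -delta_h[0] * x[0], "numeric:", numeric)
```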

The sigmoid function
With the sigmoid

    o_{pj} = f(net_{pj}) = \frac{1}{1 + \exp(-k \, net_{pj})}

the derivative is

    f'(net) = \frac{k \exp(-k \, net)}{(1 + \exp(-k \, net))^2} = k f(net) (1 - f(net))

and finally

    f'(net_{pj}) = k \, o_{pj} (1 - o_{pj})

Note
The error \delta_{pj} for a unit is proportional to the errors \delta_{pk} in the subsequent units, so the error should be calculated in the output units first. The error is then passed back through the net to the earlier units, to allow them to alter their connection weights. It is the passing back of this error value that leads to such networks being referred to as back-propagation networks.
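The convenient form f'(net) = k o (1 - o) is easy to verify numerically; the short check below (values of k and net chosen arbitrarily) compares it with a central-difference estimate of the derivative.

```python
import math

def f(net, k):
    return 1.0 / (1.0 + math.exp(-k * net))

eps = 1e-6
for k in (0.5, 1.0, 2.0):
    for net in (-2.0, 0.0, 1.5):
        numeric  = (f(net + eps, k) - f(net - eps, k)) / (2 * eps)
        analytic = k * f(net, k) * (1.0 - f(net, k))     # k * o * (1 - o)
        print(f"k={k}  net={net:+.1f}  numeric={numeric:.6f}  analytic={analytic:.6f}")
```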

THE MULTILAYER PERCEPTRON ALGORITHM

1. Initialise weights and thresholds to small random values.

2. Present the input X = x_0, x_1, x_2, ..., x_n and the target output T = t_0, t_1, t_2, ..., t_m, where n is the number of input nodes and m is the number of output nodes. Set w_0 = θ, the bias, and x_0 = 1. For pattern association, X and T represent the patterns to be associated. For classification, T is set to zero except for one element, set to 1, that corresponds to the class that X is in.

3. Calculate the actual output. Each layer calculates

    y_{pj} = f\left( \sum_{i=0}^{n} w_{ij} x_i \right)

and passes that as input to the next layer. The final layer outputs the values o_{pj}.

4. Adapt the weights. Start from the output layer and work backwards:

    w_{ij}(t+1) = w_{ij}(t) + \eta \delta_{pj} o_{pi}

where w_{ij}(t) represents the weight from node i to node j at time t, \eta is a gain term, and \delta_{pj} is an error term for pattern p on node j.

For output units:

    \delta_{pj} = k o_{pj} (1 - o_{pj}) (t_{pj} - o_{pj})

For hidden units:

    \delta_{pj} = k o_{pj} (1 - o_{pj}) \sum_k \delta_{pk} w_{jk}

where the sum is over the k nodes in the layer above node j.
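Putting steps 1-4 together, the sketch below (a from-scratch illustration; the layer sizes, gain term, learning rate and epoch count are my own choices, not values from the text) trains a 2-2-1 network on the XOR patterns with exactly these update rules. Depending on the initial random weights the network can occasionally settle into a stable configuration that does not solve the problem (compare Fig. 22).

```python
import math
import random

def f(net, k=1.0):                             # sigmoid threshold function
    return 1.0 / (1.0 + math.exp(-k * net))

def train_xor(eta=0.5, k=1.0, epochs=20000, seed=1):
    random.seed(seed)
    rnd = lambda: random.uniform(-0.5, 0.5)
    # Step 1: initialise weights and biases to small random values.
    # Biases are stored as weight index 0 on a constant input of 1.
    w_hidden = [[rnd() for _ in range(3)] for _ in range(2)]   # 2 hidden units
    w_out = [rnd() for _ in range(3)]                          # 1 output unit

    patterns = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

    for _ in range(epochs):
        for x, t in patterns:                  # Step 2: present input and target.
            # Step 3: forward pass, layer by layer.
            xin = [1] + x
            hidden = [f(sum(w * v for w, v in zip(wj, xin)), k) for wj in w_hidden]
            hin = [1] + hidden
            o = f(sum(w * v for w, v in zip(w_out, hin)), k)

            # Step 4: adapt weights, output layer first, then the hidden layer.
            delta_o = k * o * (1 - o) * (t - o)
            delta_h = [k * h * (1 - h) * delta_o * w_out[j + 1]
                       for j, h in enumerate(hidden)]
            w_out = [w + eta * delta_o * v for w, v in zip(w_out, hin)]
            for j in range(2):
                w_hidden[j] = [w + eta * delta_h[j] * v
                               for w, v in zip(w_hidden[j], xin)]

    # Report the trained network's outputs on the four patterns.
    for x, t in patterns:
        xin = [1] + x
        hidden = [f(sum(w * v for w, v in zip(wj, xin)), k) for wj in w_hidden]
        o = f(sum(w * v for w, v in zip(w_out, [1] + hidden)), k)
        print(x, "target", t, "output", round(o, 3))

train_xor()
```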

The XOR Problem Revisited
The two-layer net shown in figure 19 is able to produce the correct output. The connection weights are shown on the links, and the threshold of each unit is shown inside the unit.

Fig. 19. A solution to the XOR problem: the hidden unit has a threshold of 1.5 and receives a weight of +1 from each input; the output unit has a threshold of 0.5 and receives -2 from the hidden unit and +1 directly from each input.

Another solution to the XOR problem is shown in figure 20.

Fig. 20. Weights and thresholds of a network that has learnt to solve the XOR problem.
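The hand-built solution of figure 19 can be checked directly with hard-limiting units (the sketch assumes the weights and thresholds read off the figure as described in the caption above).

```python
def threshold(net, theta):
    """Hard-limiting unit: fires if the weighted sum exceeds its threshold."""
    return 1 if net > theta else 0

def xor_net(x, y):
    hidden = threshold(x + y, 1.5)                 # +1 weights from both inputs
    return threshold(x + y - 2 * hidden, 0.5)      # +1 direct links, -2 from the hidden unit

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x, y), "->", xor_net(x, y))             # prints 0, 1, 1, 0
```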

Fig. 21. An XOR-solving network with no direct input-output connections.

Fig. 22. A stable solution that does not work.