LIMITATIONS OF THE PERCEPTRON

XOR Problem

The perceptron fails to solve even a simple problem such as XOR (Minsky and Papert).

x  y  z
0  0  0
0  1  1
1  0  1
1  1  0

Fig. 14. The exclusive-or logic symbol and function table.

Coordinate representation (X, Y) and output z: (0,0) gives 0, (0,1) gives 1, (1,0) gives 1, (1,1) gives 0.

Fig. 15. The XOR problem in pattern space.
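The failure can be illustrated with a small brute-force search. The sketch below is only an illustration under stated assumptions: the function names and the search grid (-2.0 to 2.0 in steps of 0.25) are hypothetical choices, not part of the original material. Finding nothing on a grid illustrates the limitation but does not prove it.

```python
from itertools import product

# XOR function table: (x, y) -> z
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def perceptron(x, y, w1, w2, theta):
    """Output 1 if the weighted sum exceeds the threshold, else 0."""
    return 1 if w1 * x + w2 * y > theta else 0

def solves_xor(w1, w2, theta):
    """True if a single perceptron with these parameters reproduces XOR."""
    return all(perceptron(x, y, w1, w2, theta) == z for (x, y), z in XOR.items())

# Assumed search grid: every parameter ranges over -2.0, -1.75, ..., 2.0.
grid = [i / 4 for i in range(-8, 9)]
solutions = [p for p in product(grid, repeat=3) if solves_xor(*p)]
print(len(solutions))  # prints 0: no setting on the grid solves XOR
```

The same search finds many settings for linearly separable functions such as AND or OR, which is the contrast the XOR problem is meant to show.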

Summary

The perceptron is an artificial neuron. It takes the weighted sum of its inputs and outputs +1 if the sum is greater than the threshold, else it outputs 0.
Hebbian learning (increasing the effectiveness of active junctions) is the predominant approach.
Learning corresponds to adjusting the values of the weights.
Perceptrons form feedforward supervised networks.
Values of +1, -1 can be used instead of 0, 1.
A perceptron can only solve problems that are linearly separable, and therefore fails on XOR.

Further Reading

1. Parallel Distributed Processing, Volume 1. J.L. McClelland & D.E. Rumelhart. MIT Bradford Press, 1986. An excellent, broad-ranging book that covers many areas of neural networks. It was the book that signalled the resurgence of interest in neural systems.
2. Organisation and Behaviour. Donald Hebb. 1949. Contains Hebb's original ideas regarding learning by reinforcement of active neurons.
3. Perceptrons. M. Minsky & S. Papert. MIT Press, 1969. The criticisms of single-layer perceptrons are laid out in this book. A very interesting read, if a little too mathematical in places for some tastes.

THE MULTILAYER PERCEPTRON

An initial approach would be to use more than one perceptron, each set up to identify small, linearly separable sections of the inputs, and then to combine their outputs in another perceptron, which would produce a final indication of the class to which the input belongs.

Fig. 16. Combining perceptrons can solve the XOR problem (perceptrons 1 and 2 feed perceptron 3).

Perceptron 1 detects when the pattern corresponding to (0,1) is present, and perceptron 2 detects when (1,0) is there. Combined, these two facts allow perceptron 3 to classify the input correctly.
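A minimal sketch of this arrangement follows. The weights and thresholds are assumed values chosen for illustration, since none are given in the text; other choices work equally well.

```python
def perceptron(inputs, weights, threshold):
    """Hard-limiting unit: 1 if the weighted sum exceeds the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

def xor_net(x, y):
    # Assumed weights: perceptron 1 fires only for the pattern (0, 1).
    p1 = perceptron((x, y), (-1, 1), 0.5)
    # Assumed weights: perceptron 2 fires only for the pattern (1, 0).
    p2 = perceptron((x, y), (1, -1), 0.5)
    # Perceptron 3 fires if either detector fired (a logical OR).
    return perceptron((p1, p2), (1, 1), 0.5)

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, y, xor_net(x, y))
```

Here the weights are set by hand. How such a network could learn them is the difficulty taken up next.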

Note

The concept proposed seems fine on first examination, but for the following reasons we have to modify the perceptron model.

Each neuron in the structure takes the weighted sum of its inputs, thresholds it and outputs a one or a zero. For the perceptrons in the first layer the inputs come from the actual inputs to the network, while the perceptrons in the second layer take as their inputs the outputs from the first layer. In consequence the perceptrons in the second layer do not know which of the real inputs were on or not.

Since learning corresponds to strengthening the connections between active inputs and active units, it is impossible to strengthen the correct parts of the network, because the actual inputs are effectively masked off from the output units.

The hard-limiting threshold function removes the information that is needed if the network is to learn successfully. This is the credit assignment problem.

The solution

If we smooth the thresholding process out so that it more or less turns on or off, as before, but has a sloping region in the middle that gives us some information on the inputs, we will be able to determine when we need to strengthen or weaken the relevant weights, and the network will be able to learn.

Fig. 17. Two possible thresholding functions: a linear threshold and a sigmoidal threshold, each rising from 0 to 1.

THE NEW MODEL

The adapted perceptron units are arranged in layers, and so the new model is naturally enough termed the multilayer perceptron.

Fig. 18. The multilayer perceptron: our new model.

Our model has three layers: an input layer, an output layer and a hidden layer.

Each unit in the hidden layer and the output layer is like a perceptron unit, except that the thresholding function is the one shown in figure 17, the sigmoid function, not the step function as before. The units in the input layer serve to distribute the values they receive to the next layer, and so do not perform a weighted sum or threshold.

We are forced to alter our learning rule.
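The difference between the two kinds of unit can be sketched as follows. This is only an illustration: the function names, the gain k = 1 and the sample net inputs are assumptions. The step unit gives the same output for every net input on one side of the threshold, while the sigmoid unit still distinguishes a net input that is just past the threshold from one that is far past it.

```python
import math

def step(net, theta=0.0):
    """Hard-limiting threshold: output carries no information about how far net is from theta."""
    return 1 if net > theta else 0

def sigmoid(net, k=1.0):
    """Smooth threshold with a sloping middle region; k (assumed 1.0) controls the steepness."""
    return 1.0 / (1.0 + math.exp(-k * net))

# Assumed sample net inputs on either side of the threshold.
for net in (-2.0, -0.5, 0.5, 2.0):
    print(net, step(net), round(sigmoid(net), 3))
```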

The New Learning Model

The learning rule for multilayer perceptrons is called the generalised delta rule, or the backpropagation rule (Rumelhart, McClelland and Williams 1986, Parker 1982, Werbos 1974).

The operation of the network is similar to that of the single-layer perceptron. The learning rule is a little more complex than the previous one.

We need to define an error function that represents the difference between the network's current output and the correct output that we want it to produce. Because we need to know the correct pattern, this type of learning is known as supervised learning.

In order to learn successfully we want to continually reduce the value of the error function. This is achieved by adjusting the weights on the links between units.

The generalised delta rule does this by calculating the value of the error function for a particular input, and then backpropagating the error from one layer to the previous one in order to adjust the weights. For units actually on the output, their output and desired output are known, so adjusting the weights is relatively simple. For units in the middle layer, the adjustment is not obvious.

The Mathematics

Notation

$E_p$ - the error function for pattern $p$.
$t_{pj}$ - the target output for pattern $p$ on node $j$.
$o_{pj}$ - the actual output for pattern $p$ at node $j$.
$w_{ij}$ - the weight from node $i$ to node $j$.

The error function

$$E_p = \frac{1}{2} \sum_j (t_{pj} - o_{pj})^2$$

The activation of each unit $j$, for pattern $p$

$$\mathrm{net}_{pj} = \sum_i w_{ij} o_{pi}$$

The output from each unit $j$ is the threshold function $f_j$ acting on the weighted sum

$$o_{pj} = f_j(\mathrm{net}_{pj})$$

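The problem is to find weights that minimise the error function. By the chain rule

$$\frac{\partial E_p}{\partial w_{ij}} = \frac{\partial E_p}{\partial \mathrm{net}_{pj}} \, \frac{\partial \mathrm{net}_{pj}}{\partial w_{ij}}$$

The second term in the above equation can be calculated

$$\frac{\partial \mathrm{net}_{pj}}{\partial w_{ij}} = \frac{\partial}{\partial w_{ij}} \sum_k w_{kj} o_{pk} = \sum_k \frac{\partial w_{kj}}{\partial w_{ij}} o_{pk} = o_{pi}$$

since $\partial w_{kj} / \partial w_{ij} = 0$ except when $k = i$, when it equals 1.

Defining the change of error as a function of the change in the net inputs to a unit as

$$-\frac{\partial E_p}{\partial \mathrm{net}_{pj}} = \delta_{pj}$$

gives

$$-\frac{\partial E_p}{\partial w_{ij}} = \delta_{pj} o_{pi}$$

Decreasing the value of $E_p$ therefore means making the weight changes proportional to $\delta_{pj} o_{pi}$, i.e.

$$\Delta_p w_{ij} = \eta \, \delta_{pj} o_{pi}$$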

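We now need to know what $\delta_{pj}$ is for each of the units. If we know this, then we can decrease $E_p$.

$$\delta_{pj} = -\frac{\partial E_p}{\partial \mathrm{net}_{pj}} = -\frac{\partial E_p}{\partial o_{pj}} \, \frac{\partial o_{pj}}{\partial \mathrm{net}_{pj}}$$

Now considering

$$\frac{\partial o_{pj}}{\partial \mathrm{net}_{pj}} = f_j'(\mathrm{net}_{pj}) \quad \text{and} \quad \frac{\partial E_p}{\partial o_{pj}} = -(t_{pj} - o_{pj})$$

for output units we get

$$\boxed{\delta_{pj} = f_j'(\mathrm{net}_{pj}) \, (t_{pj} - o_{pj})}$$

If a unit is not an output one we can write, by the chain rule again

$$\frac{\partial E_p}{\partial o_{pj}} = \sum_k \frac{\partial E_p}{\partial \mathrm{net}_{pk}} \, \frac{\partial \mathrm{net}_{pk}}{\partial o_{pj}} = \sum_k \frac{\partial E_p}{\partial \mathrm{net}_{pk}} \, \frac{\partial}{\partial o_{pj}} \sum_i w_{ik} o_{pi} = -\sum_k \delta_{pk} w_{jk}$$

and finally

$$\boxed{\delta_{pj} = f_j'(\mathrm{net}_{pj}) \sum_k \delta_{pk} w_{jk}}$$

The two boxed equations above represent the change in the error function with respect to the weights in the network.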
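The result of the derivation, that the negative error gradient for a weight equals the error term of the receiving unit times the output of the sending unit, can be checked numerically. The sketch below is an illustration under stated assumptions: a 2-2-1 network without bias terms, arbitrary weights, a single pattern, and a sigmoid threshold function with gain K = 1, whose slope is taken as K o (1 - o). None of these values come from the original material.

```python
import math

K = 1.0  # assumed sigmoid gain

def f(net):
    return 1.0 / (1.0 + math.exp(-K * net))

def forward(x, w_hid, w_out):
    """w_hid[i][j]: weight from input i to hidden j; w_out[j]: weight from hidden j to the output."""
    hid = [f(sum(w_hid[i][j] * x[i] for i in range(len(x)))) for j in range(len(w_out))]
    out = f(sum(w_out[j] * hid[j] for j in range(len(w_out))))
    return hid, out

def error(x, t, w_hid, w_out):
    """Error function E_p = 1/2 (t - o)^2 for the single output unit."""
    _, out = forward(x, w_hid, w_out)
    return 0.5 * (t - out) ** 2

def deltas(x, t, w_hid, w_out):
    hid, out = forward(x, w_hid, w_out)
    d_out = K * out * (1 - out) * (t - out)  # output-unit error term
    d_hid = [K * h * (1 - h) * d_out * w_out[j] for j, h in enumerate(hid)]  # hidden-unit error terms
    return hid, d_out, d_hid

# Arbitrary illustrative pattern and weights (assumptions).
x, t = [1.0, 0.0], 1.0
w_hid = [[0.3, -0.2], [0.1, 0.4]]
w_out = [0.5, -0.6]

hid, d_out, d_hid = deltas(x, t, w_hid, w_out)

# Compare delta_j * o_i with a central finite difference of -dE/dw_ij.
eps = 1e-6
i, j = 0, 1
w_hid[i][j] += eps
e_plus = error(x, t, w_hid, w_out)
w_hid[i][j] -= 2 * eps
e_minus = error(x, t, w_hid, w_out)
w_hid[i][j] += eps  # restore the weight
numeric = -(e_plus - e_minus) / (2 * eps)
analytic = d_hid[j] * x[i]
print(analytic, numeric)
```

The two printed values should agree to several decimal places. A mismatch would point to a sign or index error in the error terms.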

The sigmoid function

With

$$o_{pj} = f(\mathrm{net}) = \frac{1}{1 + \exp(-k \, \mathrm{net})}$$

the derivative is

$$f'(\mathrm{net}) = \frac{k \exp(-k \, \mathrm{net})}{\left(1 + \exp(-k \, \mathrm{net})\right)^2} = k \, f(\mathrm{net}) \left(1 - f(\mathrm{net})\right)$$

and finally

$$f'(\mathrm{net}) = k \, o_{pj} (1 - o_{pj})$$

Note

The error term of a unit is proportional to the errors $\delta_{pk}$ in subsequent units. The error should therefore be calculated in the output units first. Next the error should be passed back through the net to the earlier units to allow them to alter their connection weights. It is the passing back of this error value that leads to these networks being referred to as back-propagation networks.
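The derivative identity for the sigmoid can be sanity-checked against a finite difference. This is a small sketch with assumed function names and assumed sample values of net and k.

```python
import math

def sigmoid(net, k):
    return 1.0 / (1.0 + math.exp(-k * net))

def sigmoid_slope(net, k):
    """Slope written in terms of the unit's own output: k * o * (1 - o)."""
    o = sigmoid(net, k)
    return k * o * (1.0 - o)

def numeric_slope(net, k, eps=1e-6):
    """Central finite-difference estimate of the slope."""
    return (sigmoid(net + eps, k) - sigmoid(net - eps, k)) / (2 * eps)

# Assumed sample points and gains.
for k in (0.5, 1.0, 2.0):
    for net in (-3.0, 0.0, 1.5):
        print(k, net, sigmoid_slope(net, k), numeric_slope(net, k))
```

Writing the slope in terms of the output is convenient because each unit's output is already available from the forward pass, so no extra exponential has to be evaluated when the error is passed back.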

THE MULTILAYER PERCEPTRON ALGORITHM

1. Initialise weights and thresholds to small random values.

2. Present input and target output. Present input $X_p = x_0, x_1, x_2, \ldots, x_n$ and target output $T_p = t_0, t_1, t_2, \ldots, t_m$, where $n$ is the number of input nodes and $m$ is the number of output nodes. Set $w_0 = -\theta$, the bias, and $x_0 = 1$. For pattern association, $X_p$ and $T_p$ represent the patterns to be associated. For classification, $T_p$ is set to zero except for one element set to 1 that corresponds to the class that $X_p$ is in.

3. Calculate actual output. Each layer calculates

$$y_{pj} = f\left(\sum_{i=0}^{n} w_i x_i\right)$$

and passes that as an input to the next layer. The final layer outputs values $o_{pj}$.

4. Adapt weights. Start from the output layer, and work backwards.

$$w_{ij}(t+1) = w_{ij}(t) + \eta \, \delta_{pj} o_{pi}$$

$w_{ij}(t)$ represents the weights from node $i$ to node $j$ at time $t$, $\eta$ is a gain term and $\delta_{pj}$ is an error term for pattern $p$ on node $j$.

For output units

$$\delta_{pj} = k \, o_{pj} (1 - o_{pj}) (t_{pj} - o_{pj})$$

For hidden units

$$\delta_{pj} = k \, o_{pj} (1 - o_{pj}) \sum_k \delta_{pk} w_{jk}$$

where the sum is over the $k$ nodes in the layer above node $j$.
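The four steps can be sketched in plain Python as follows. This is a minimal illustration, not a definitive implementation. The network size (2-2-1), the weight range, the gain term eta = 0.5, the sigmoid gain K = 1, the number of passes and the random seed are all assumptions, and every function name is hypothetical. The bias is stored as weight index 0 of each unit and is paired with a constant input of 1.

```python
import math
import random

K = 1.0  # assumed sigmoid gain

def f(net):
    return 1.0 / (1.0 + math.exp(-K * net))

def init(n_in, n_hid, n_out, rng):
    """Step 1: small random weights; index 0 of each row is the bias weight."""
    w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hid)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid + 1)] for _ in range(n_out)]
    return w_hid, w_out

def layer(weights, inputs):
    """Step 3 for one layer: prepend x_0 = 1, then apply f to each unit's weighted sum."""
    xs = [1.0] + list(inputs)
    return [f(sum(w * x for w, x in zip(row, xs))) for row in weights]

def forward(x, w_hid, w_out):
    hid = layer(w_hid, x)
    return hid, layer(w_out, hid)

def pattern_error(x, t, w_hid, w_out):
    _, out = forward(x, w_hid, w_out)
    return 0.5 * sum((tj - oj) ** 2 for tj, oj in zip(t, out))

def train_step(x, t, w_hid, w_out, eta=0.5):
    """Step 4: adapt the weights in place, working back from the output layer."""
    hid, out = forward(x, w_hid, w_out)
    d_out = [K * o * (1 - o) * (tj - o) for o, tj in zip(out, t)]
    # Hidden error terms use the output weights as they were before this update.
    d_hid = [K * h * (1 - h) * sum(d_out[m] * w_out[m][j + 1] for m in range(len(out)))
             for j, h in enumerate(hid)]
    for m, row in enumerate(w_out):
        for i, o_i in enumerate([1.0] + hid):
            row[i] += eta * d_out[m] * o_i
    for j, row in enumerate(w_hid):
        for i, o_i in enumerate([1.0] + list(x)):
            row[i] += eta * d_hid[j] * o_i

# Step 2: the XOR patterns as (input, target) pairs.
patterns = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (0,))]
rng = random.Random(0)  # assumed seed, for repeatability only
w_hid, w_out = init(2, 2, 1, rng)
before = sum(pattern_error(x, t, w_hid, w_out) for x, t in patterns)
for _ in range(5000):
    for x, t in patterns:
        train_step(x, t, w_hid, w_out)
after = sum(pattern_error(x, t, w_hid, w_out) for x, t in patterns)
print(before, after)
```

The run prints the total error before and after training. Whether the error falls close to zero depends on the starting weights, since the procedure only follows the error downhill and can settle in a stable state that does not solve XOR.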

The XOR Problem Revisited

The two-layer net shown in figure 19 is able to produce the correct output. The connection weights are shown on the links. The threshold of each unit is shown inside the unit.

Fig. 19. A solution to the XOR problem. (Output unit: threshold 0.5, with weight +1 from each input and weight -2 from the hidden unit. Hidden unit: threshold 1.5, with weight +1 from each input.)

Another solution to the XOR problem is shown in figure 20.

Fig. 20. Weights and thresholds of a network that has learnt to solve the XOR problem. (Values shown in the figure: output unit -6.3, with links -4.2, -9.4, -4.2; hidden unit -2.2, with links -6.4, -6.4 from the inputs.)
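The first of these networks can be checked by hand or with a few lines of code. The sketch below assumes hard-limiting units and reads the figure as follows: the hidden unit has threshold 1.5 and weight +1 from each input, and the output unit has threshold 0.5, weight +1 from each input and weight -2 from the hidden unit. This reading of the figure and the function names are assumptions.

```python
def unit(inputs, weights, threshold):
    """Hard-limiting unit: 1 if the weighted sum exceeds the threshold, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0

def xor_fig(x, y):
    # Hidden unit fires only when both inputs are on.
    hidden = unit((x, y), (1, 1), 1.5)
    # Output unit sees both inputs directly, plus the inhibiting hidden unit.
    return unit((x, hidden, y), (1, -2, 1), 0.5)

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, y, xor_fig(x, y))
```

The hidden unit acts as a detector for the (1,1) pattern, and its weight of -2 cancels the two direct input contributions in exactly that case.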

Fig. 21. An XOR-solving network with no direct input-output connections. (Output unit: threshold 0.5, with weights +1 and -1 from the two hidden units. Hidden units: thresholds 0.5 and 1.5, each with weight +1 from both inputs.)

Fig. 22. A stable solution that does not work.