Introduction to Artificial Neural Networks


1 Facultés Universitaires Notre-Dame de la Paix, 27 March 2007

2 Outline
1 Introduction
2 Fundamentals: Biological neuron, Artificial neuron, Artificial Neural Network

3 Outline
3 Single-layer ANN: Perceptron, Adaline, Limitations
4 Multi-layer ANN: Topology, Generalised Delta Rule, Deficiencies
5 Recurrent ANN: Jordan Network, Hopfield Network

4 Outline
6 Application: Smart sweepers
7 Conclusion

5 Outline: 1 Introduction; 2 Fundamentals (Biological neuron, Artificial neuron, Artificial Neural Network)

6 History
1943: W. McCulloch & W. Pitts: first model of the artificial neuron
1949: D. Hebb describes the first learning rule (Hebb's Law)
1957: F. Rosenblatt designs the Perceptron
1965: Nils J. Nilsson publishes Learning Machines (foundations of automated learning)

7 History
1969: M. Minsky & S. Papert expose limitations of the Perceptron (XOR problem)
1975: first multi-layer ANN with a training algorithm (Cognitron)
1982: Hopfield networks (J. Hopfield), Self-Organizing Map (T. Kohonen)
1986: backpropagation algorithm (R. Williams, D. Rumelhart & G. Hinton)

8 Types of problems
ANNs can be used to solve certain types of problems:
classification and pattern recognition (Artificial Intelligence);
approximation of unknown functions and modelling of complicated functions (e.g. stock-exchange estimation);
data processing (filtering, clustering, ...).

9 Outline: 1 Introduction; 2 Fundamentals (Biological neuron, Artificial neuron, Artificial Neural Network)

10 A view of the biological neuron
The human nervous system is composed of about $10^{11}$ neurons. The synapses are the connections between axon terminals and dendrites. Each synapse is characterized by a level of effectiveness.

11 A view of the biological neuron
The impulses received at each dendrite are summed together. If the sum exceeds the stimulation threshold, the nucleus emits a spike down the axon.

12 A view of the artificial neuron
$x_i$: input from unit $i$
$w_{ij}$: weight of the connection from unit $i$ to unit $j$
$\theta_j$: bias of unit $j$
$\varphi$: activation function
$o_j$: state of activation of unit $j$

13 Propagation rule
Sigma unit type: $\text{net}_j = \sum_{i=1}^{n} w_{ij} x_i$
Sigma-pi unit type (Feldman & Ballard): $\text{net}_j = \sum_{i=1}^{n} w_{ij} \prod_{k=1}^{m} x_{ik}$
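To make the two propagation rules concrete, here is a minimal Python sketch; the function names and example values are illustrative, not from the slides:

```python
import numpy as np

def net_sigma(w, x):
    """Sigma unit: net_j = sum_i w_ij * x_i."""
    return np.dot(w, x)

def net_sigma_pi(w, x_groups):
    """Sigma-pi unit: each weight multiplies the product of a group of inputs,
    net_j = sum_i w_ij * prod_k x_ik."""
    return sum(w_i * np.prod(g) for w_i, g in zip(w, x_groups))

w = np.array([0.5, -1.0])
print(net_sigma(w, np.array([1.0, 2.0])))          # 0.5*1 - 1*2 = -1.5
print(net_sigma_pi(w, [[1.0, 2.0], [3.0, 0.5]]))   # 0.5*(1*2) - 1*(3*0.5) = -0.5
```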

14 Activation function
Threshold function (Heaviside or sgn):
$\varphi(v) = \begin{cases} 1 & \text{if } v \ge 0 \\ 0 & \text{if } v < 0 \end{cases}$
Semi-linear function:
$\varphi(v) = \begin{cases} 1 & \text{if } v \ge \tfrac{1}{2} \\ v + \tfrac{1}{2} & \text{if } -\tfrac{1}{2} < v < \tfrac{1}{2} \\ 0 & \text{if } v \le -\tfrac{1}{2} \end{cases}$
Sigmoid function:
$\varphi(v) = \dfrac{1}{1 + e^{-kv}}$
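These three functions are easy to implement directly. A small sketch, assuming the semi-linear function ramps linearly from 0 at $v = -\tfrac{1}{2}$ to 1 at $v = \tfrac{1}{2}$ as reconstructed above:

```python
import numpy as np

def threshold(v):
    """Heaviside step: 1 if v >= 0, else 0."""
    return np.where(v >= 0, 1.0, 0.0)

def semi_linear(v):
    """Linear ramp from 0 to 1 between v = -1/2 and v = 1/2, clipped outside."""
    return np.clip(v + 0.5, 0.0, 1.0)

def sigmoid(v, k=1.0):
    """Logistic sigmoid with steepness parameter k."""
    return 1.0 / (1.0 + np.exp(-k * v))

v = np.linspace(-2.0, 2.0, 5)
for phi in (threshold, semi_linear, sigmoid):
    print(phi.__name__, phi(v))
```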

15 Topology
A separation can be made between two types of ANN structure:
Feed-forward: acyclic and layer-decomposable graph (Perceptron, Adaline);
Recurrent: graph containing cycles (Jordan, Hopfield, Kohonen).

16 What is an ANN?
An ANN is a mapping $I \to O$, where $I$ is the set of inputs and $O$ the set of outputs.
Initially, the ANN will not respond correctly to given inputs, because the weights and biases are not yet adapted.

17 How does an ANN learn?
The most interesting characteristic of an ANN is its capacity to generalize information from samples. This generalisation occurs through the learning process.
Main idea: successive weight adjustments (gradient descent method).
Supervised learning (or associative learning)
Unsupervised learning (or self-organisation)

18 Supervised learning
Idea:
generate a population of input-output pairs;
feed the ANN with the inputs;
re-adjust the weights and biases whenever the ANN does not output what is expected.
The representation of the data is imposed on the ANN.

19 Unsupervised learning
Idea:
generate a population of input patterns;
feed the population to the ANN;
let it extract statistical properties from the population.
The representation of the data is defined by the ANN itself.

20 Hebbian Learning Rule
$w_{ij} \leftarrow w_{ij} + \delta w_{ij}$, with $\delta w_{ij} = \gamma\, o_i o_j$
where:
$w_{ij}$: weight from unit $i$ to unit $j$
$\gamma$: learning rate
$o_i$: state of activation of unit $i$
Virtually all learning rules can be considered as variants of the HLR.
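A minimal sketch of the Hebbian update applied to a whole weight matrix at once; the matrix layout and the example activations are assumptions for illustration:

```python
import numpy as np

def hebbian_update(W, o_pre, o_post, gamma=0.1):
    """Hebbian Learning Rule: delta w_ij = gamma * o_i * o_j,
    where W[i, j] is the weight from unit i to unit j."""
    return W + gamma * np.outer(o_pre, o_post)

W = np.zeros((3, 2))
o_i = np.array([1.0, 0.0, 1.0])   # activations of the sending units
o_j = np.array([1.0, -1.0])       # activations of the receiving units
W = hebbian_update(W, o_i, o_j)
print(W)  # weights grow where o_i * o_j > 0: units active together get connected
```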

21 Learning Rate
Defines the speed at which the ANN learns.
Usually a constant $0 < \gamma \le 1$:
$\gamma \to 0$: slow convergence but stable solution;
$\gamma \to 1$: fast convergence but unstable solution.

22 Over-fitting
Excessive training or an ill-suited training set can lead to an over-fitted network. An over-fitted ANN is specialised for the set that was used to train it: it has lost a large part of its generalisation capability.

23 Two issues
Representational power: the ability of an ANN to represent a desired function. Since an ANN is built from standard functions, it can only approximate the desired function, even for an optimal set of weights; hence the approximation error will in general not be exactly 0.
Learning algorithm: given that there exists a set of optimal weights (i.e. weights which minimize the approximation error), is there a procedure to compute them?

24 Outline: 3 Single-layer ANN (Perceptron, Adaline, Limitations); 4 Multi-layer ANN (Topology, Generalised Delta Rule, Deficiencies); 5 Recurrent ANN (Jordan Network, Hopfield Network)

25 Perceptron
Proposed by F. Rosenblatt in 1957. A Perceptron is a single-layer ANN, composed of one or more output neurons, each connected to all inputs. Typically used as a linear classifier.

26 Simple case
Consider the following Perceptron: 1 neuron, 2 inputs, 1 output, with a threshold-type activation function:
$\varphi(v) = \begin{cases} 1 & \text{if } v > 0 \\ -1 & \text{otherwise} \end{cases}$
We can use it as a classifier with a separation line: $w_1 x_1 + w_2 x_2 + \theta = 0$

27 Simple case
The separation line can be rewritten as: $x_2 = -\dfrac{w_1}{w_2} x_1 - \dfrac{\theta}{w_2}$

28 Perceptron learning
Learning consists of successive weight adjustments:
$w_{ij} \leftarrow w_{ij} + \Delta w_{ij}$
$\theta_j \leftarrow \theta_j + \Delta \theta_j$
Problem: how to compute $\Delta w_{ij}$ and $\Delta \theta_j$?

29 Perceptron Learning Rule
Consider a set of learning samples $(x, d(x))$, with:
$x$: input vector
$d(x)$: desired output
Learning method:
1. Start with random weights for the connections;
2. Select an input vector $x$ from the set of training samples;
3. If $o \neq d(x)$ (the perceptron gives an incorrect response), modify all connections $w_{ij}$ according to: $\Delta w_{ij} = d_j(x)\, x_i$ and $\Delta \theta_j = d_j(x)$;
4. Go back to 2.
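A minimal implementation of this procedure for a single output unit might look as follows; the fixed epoch count and the bipolar-AND sample set are illustrative choices, not from the slides:

```python
import numpy as np

def train_perceptron(samples, w, theta, epochs=20):
    """Perceptron Learning Rule for one unit with phi(v) = 1 if v > 0 else -1."""
    for _ in range(epochs):
        for x, d in samples:
            o = 1 if np.dot(w, x) + theta > 0 else -1
            if o != d:               # step 3: update only on an incorrect response
                w = w + d * x        # Delta w_i = d(x) * x_i
                theta = theta + d    # Delta theta = d(x)
    return w, theta

# bipolar AND: only (1, 1) maps to +1 -- a linearly separable task
samples = [(np.array([0.0, 0.0]), -1), (np.array([0.0, 1.0]), -1),
           (np.array([1.0, 0.0]), -1), (np.array([1.0, 1.0]), 1)]
w, theta = train_perceptron(samples, w=np.zeros(2), theta=0.0)
print(w, theta)
```

Since the task is linearly separable, the convergence theorem below guarantees that this loop stops making updates after finitely many steps.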

30 Convergence Theorem
Theorem 1. If there exists a set of connection weights $w^*$ which is able to perform the transformation $o = d(x)$, the perceptron learning rule will converge to some solution (which may or may not be the same as $w^*$) in a finite number of steps, for any initial choice of the weights.

31 Numerical example
Initial parameters: $w_1 = 1$, $w_2 = 2$, $\theta = -2$
Set of samples:
Sample A: $x = (0.5, 1.5)$; $d(x) = 1$
Sample B: $x = (-0.5, 0.5)$; $d(x) = -1$
Sample C: $x = (0.5, 0.5)$; $d(x) = 1$

32 Numerical example (cont'd)
Sample A: $\text{net} = 1 \cdot 0.5 + 2 \cdot 1.5 - 2 = 1.5 > 0 \Rightarrow o = 1$ (correct)
Sample B: $\text{net} = 1 \cdot (-0.5) + 2 \cdot 0.5 - 2 = -1.5 < 0 \Rightarrow o = -1$ (correct)

33 Numerical example (cont'd)
Sample C: $\text{net} = 1 \cdot 0.5 + 2 \cdot 0.5 - 2 = -0.5 \Rightarrow o = -1 \neq d(x)$, so the weights and bias are updated:
$w_1 \leftarrow w_1 + \Delta w_1 = 1 + 0.5 = 1.5$
$w_2 \leftarrow w_2 + \Delta w_2 = 2 + 0.5 = 2.5$
$\theta \leftarrow \theta + \Delta \theta = -2 + 1 = -1$
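The slide's computations can be checked with a short script; all values are taken from the example above:

```python
import numpy as np

w = np.array([1.0, 2.0]); theta = -2.0
samples = [("A", np.array([0.5, 1.5]), 1),
           ("B", np.array([-0.5, 0.5]), -1),
           ("C", np.array([0.5, 0.5]), 1)]

for name, x, d in samples:
    net = np.dot(w, x) + theta
    o = 1 if net > 0 else -1
    print(f"Sample {name}: net = {net:+.1f}, o = {o:+d}")
    if o != d:                       # only sample C triggers an update
        w = w + d * x
        theta = theta + d

print("updated:", w, theta)          # [1.5 2.5] -1.0, as on the slide
```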

34 Numerical example (cont'd) [figure: the separation line before and after the update]

35 Adaptive Linear Element (Adaline)
Proposed by B. Widrow and T. Hoff in 1960. Uses a generalised version of the PLR, known as the Delta Rule. The focus is put on $\text{net}_j$ instead of $o_j$.

36 Delta Rule (Widrow-Hoff)
Main idea: minimize the error in the output through gradient descent.
$\Delta w_i = \gamma (d^p - y^p)\, x_i$
with:
$\gamma$: learning rate
$d^p$: expected output for input $p$
$y^p$: obtained output for input $p$

37 Delta Rule derivation
Consider a single-layer ANN with an output unit using a linear activation function:
$y = \sum_i w_i x_i + \theta$
The objective is to minimize the total error, given by:
$E = \frac{1}{2} \sum_p (d^p - y^p)^2$
The idea is to adjust each weight proportionally to the negative of the derivative of the error with respect to that weight:
$\Delta_p w_i = -\gamma \dfrac{\partial E^p}{\partial w_i}$

38 Delta Rule derivation (cont'd)
We can split the derivative following the chain rule:
$\dfrac{\partial E^p}{\partial w_i} = \dfrac{\partial E^p}{\partial y^p} \dfrac{\partial y^p}{\partial w_i}$
The right factor can be rewritten as:
$\dfrac{\partial y^p}{\partial w_i} = x_i$
because of the linearity of the activation function.

39 Delta Rule derivation (cont'd)
The left factor can be rewritten as:
$\dfrac{\partial E^p}{\partial y^p} = -(d^p - y^p)$
We obtain the Delta Rule:
$\Delta_p w_i = \gamma (d^p - y^p)\, x_i$
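A minimal sketch of Delta-Rule training on a linear unit, reusing the sample set from the Perceptron example; the learning rate and epoch count are illustrative:

```python
import numpy as np

def train_adaline(X, d, gamma=0.05, epochs=200):
    """Delta Rule (Widrow-Hoff): gradient descent on E = 1/2 sum_p (d^p - y^p)^2
    for a linear unit y = sum_i w_i x_i + theta."""
    w, theta = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for x_p, d_p in zip(X, d):
            y_p = np.dot(w, x_p) + theta     # linear activation
            w += gamma * (d_p - y_p) * x_p   # Delta w_i = gamma (d^p - y^p) x_i
            theta += gamma * (d_p - y_p)     # bias updated like a weight on input 1
    return w, theta

X = np.array([[0.5, 1.5], [-0.5, 0.5], [0.5, 0.5]])
d = np.array([1.0, -1.0, 1.0])
w, theta = train_adaline(X, d)
print(w, theta, X @ w + theta)   # the outputs approach the targets
```

Note the difference with the PLR: the error $(d^p - y^p)$ is measured on the raw net input rather than on the thresholded output, so the weights keep moving until the linear outputs match the targets.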

40 XOR Problem
If no linear separation exists, a single-layer ANN cannot classify properly. This limitation was exposed by Minsky and Papert through the XOR Problem: it is impossible to teach a single-layer ANN to solve it. Solution: add hidden layers to the ANN, as in the sketch below.
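To see why hidden layers fix this, here is a two-layer threshold network that computes XOR with hand-picked weights; the particular weights are one common choice, not from the slides:

```python
def step(v):
    """Threshold unit: 1 if v > 0, else 0."""
    return 1 if v > 0 else 0

def xor_net(x1, x2):
    # hidden layer: h1 acts as an OR gate, h2 as an AND gate
    h1 = step(x1 + x2 - 0.5)
    h2 = step(x1 + x2 - 1.5)
    # output fires for "OR but not AND", i.e. exactly one input active
    return step(h1 - h2 - 0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))   # 0, 1, 1, 0
```

Each hidden unit draws one line in the input plane; the output unit combines the two half-planes into a region that no single line could carve out.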

41 Outline: 3 Single-layer ANN (Perceptron, Adaline, Limitations); 4 Multi-layer ANN (Topology, Generalised Delta Rule, Deficiencies); 5 Recurrent ANN (Jordan Network, Hopfield Network)

42 Topology
A multi-layer ANN is composed of:
an input layer;
one or more hidden layer(s);
an output layer.
In most applications, a single hidden layer is used, with sigmoid activation functions.
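As an illustration, a forward pass through such a network with one sigmoid hidden layer can be sketched as follows; the layer sizes and values are placeholders:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def forward(x, W1, b1, W2, b2):
    """Forward pass: input -> one sigmoid hidden layer -> sigmoid output."""
    y_hidden = sigmoid(x @ W1 + b1)
    return sigmoid(y_hidden @ W2 + b2)

x = np.array([0.5, -1.0])                        # 2 inputs
W1 = 0.1 * np.ones((2, 3)); b1 = np.zeros(3)     # 3 hidden units
W2 = 0.1 * np.ones((3, 1)); b2 = np.zeros(1)     # 1 output unit
print(forward(x, W1, b1, W2, b2))
```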

43 Topology [figure: a multi-layer feed-forward network]

44 Generalised Delta Rule
An important assumption made for the Delta Rule was the linearity of the activation function. In a multi-layer ANN, this assumption no longer holds. We must find a way to generalize the Delta Rule, so that weight adaptation is not restricted to the output layer.

45 Derivation
Consider units with a non-linear activation function:
$y_k^p = \varphi(\text{net}_k^p)$, where $\text{net}_k^p = \sum_i w_{ik}\, y_i^p + \theta_k$

46 Derivation (cont'd)
The modification we should apply to each weight is given by:
$\Delta_p w_{ik} = -\gamma \dfrac{\partial E^p}{\partial w_{ik}}$
in which the total error $E^p$ is defined by:
$E^p = \frac{1}{2} \sum_{o=1}^{N_o} (d_o^p - y_o^p)^2$
By using the chain rule we obtain:
$\Delta_p w_{ik} = -\gamma \dfrac{\partial E^p}{\partial \text{net}_k^p} \dfrac{\partial \text{net}_k^p}{\partial w_{ik}}$

47 Derivation (cont'd)
The right factor can be rewritten as:
$\dfrac{\partial \text{net}_k^p}{\partial w_{ik}} = y_i^p$
If we define $\delta_k^p = -\dfrac{\partial E^p}{\partial \text{net}_k^p}$, we obtain an update rule which is similar to the Delta Rule:
$\Delta_p w_{ik} = \gamma\, \delta_k^p\, y_i^p$
The problem now is to define this $\delta_k^p$ for the different units $k$ in the network.

48 Derivation (cont'd)
By using the chain rule, we rewrite $\delta_k^p$:
$\delta_k^p = -\dfrac{\partial E^p}{\partial \text{net}_k^p} = -\dfrac{\partial E^p}{\partial y_k^p} \dfrac{\partial y_k^p}{\partial \text{net}_k^p}$
The right factor can be rewritten as:
$\dfrac{\partial y_k^p}{\partial \text{net}_k^p} = \varphi'(\text{net}_k^p)$
For the left factor, we must consider two cases. If $k$ is an output unit $o$:
$\delta_o^p = (d_o^p - y_o^p)\, \varphi'(\text{net}_o^p)$

49 Derivation (cont'd)
If $k$ is a hidden unit $h$:
$-\dfrac{\partial E^p}{\partial y_h^p} = \sum_{o=1}^{N_o} \delta_o^p\, w_{ho}$
We can use this to write:
$\delta_h^p = \varphi'(\text{net}_h^p) \sum_{o=1}^{N_o} \delta_o^p\, w_{ho}$

50 Derivation (cont'd)
The two equations:
$\delta_o^p = (d_o^p - y_o^p)\, \varphi'(\text{net}_o^p)$ (1)
$\delta_h^p = \varphi'(\text{net}_h^p) \sum_{o=1}^{N_o} \delta_o^p\, w_{ho}$ (2)
define a recursive procedure which can be used to adjust the weights of the network. This constitutes the Generalised Delta Rule for a feed-forward network of non-linear units.
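Equations (1) and (2) translate directly into code. A compact sketch for one sigmoid hidden layer, trained online on the XOR samples from earlier; the network size, initialisation, learning rate and epoch count are illustrative, and, as the deficiencies slide notes, training can occasionally stall in a local minimum:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def dsigmoid(v):
    s = sigmoid(v)
    return s * (1.0 - s)                  # phi'(net) for the sigmoid

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 1.0, (2, 2)); b1 = np.zeros(2)   # input -> hidden
W2 = rng.normal(0.0, 1.0, (2, 1)); b2 = np.zeros(1)   # hidden -> output
gamma = 0.5

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)        # XOR targets

for epoch in range(5000):
    for x, d in zip(X, D):
        # forward pass
        net_h = x @ W1 + b1;  y_h = sigmoid(net_h)
        net_o = y_h @ W2 + b2; y_o = sigmoid(net_o)
        # equation (1): delta for the output unit
        delta_o = (d - y_o) * dsigmoid(net_o)
        # equation (2): deltas for hidden units, propagated back through W2
        delta_h = dsigmoid(net_h) * (W2 @ delta_o)
        # update rule: Delta_p w_ik = gamma * delta_k^p * y_i^p
        W2 += gamma * np.outer(y_h, delta_o); b2 += gamma * delta_o
        W1 += gamma * np.outer(x, delta_h);   b1 += gamma * delta_h

print(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).round(2))  # should approach 0, 1, 1, 0
```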

51 Learning rate and momentum
In order to have fast convergence with a stable solution, a momentum term is added to the weight change:
$\Delta w_{jk}(t+1) = \gamma\, \delta_k^p\, y_j^p + \alpha\, \Delta w_{jk}(t)$
Instability of the solution is countered because the change in the weights depends on the previous change. It then becomes possible to increase the learning rate $\gamma$ without causing oscillation of the solution.
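A sketch of the update with a momentum term; all values here are illustrative placeholders:

```python
import numpy as np

gamma, alpha = 0.5, 0.9              # learning rate and momentum coefficient

def momentum_step(W, dW_prev, delta_k, y_j):
    """Delta w_jk(t+1) = gamma * delta_k^p * y_j^p + alpha * Delta w_jk(t)."""
    dW = gamma * np.outer(y_j, delta_k) + alpha * dW_prev
    return W + dW, dW

W = np.zeros((2, 1))                 # hidden -> output weights
dW_prev = np.zeros_like(W)           # Delta w_jk(t), initially zero
for _ in range(3):                   # repeated steps in the same direction build up speed
    W, dW_prev = momentum_step(W, dW_prev, np.array([0.1]), np.array([1.0, 0.5]))
print(W)
```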

52 Learning rate and momentum [figure: gradient-descent trajectories for (a) small $\gamma$, (b) large $\gamma$, (c) large $\gamma$ with a momentum term added]

53 Deficiencies
Network paralysis: as the network is trained, the weights can grow to very large values (positive or negative), and so does $\text{net}_j$. Because of the sigmoid function, the activations then get very close to zero or to one, where $\varphi'$ is almost zero; in that case, the back-propagation algorithm may come to a standstill.
Local minima: because of the shape of the error surface of a complex network, the gradient method can get trapped in a local minimum. Some (probabilistic) methods can avoid this problem but are very slow. It is also possible to increase the number of hidden units, though only up to a certain point.

54 Outline: 3 Single-layer ANN (Perceptron, Adaline, Limitations); 4 Multi-layer ANN (Topology, Generalised Delta Rule, Deficiencies); 5 Recurrent ANN (Jordan Network, Hopfield Network)

55 Jordan Network
Proposed by Jordan in 1986. The activation values of the output units are fed back into the input layer through so-called state units. The weights from the output units to these state units are fixed to +1. Thus, the learning rules which apply to multi-layer ANNs can be used to train Jordan Networks.

56 Jordan Network [figure: network with output-to-state feedback connections]

57 Hopfield Network
Proposed by J. Hopfield in 1982. Consists of a fully-interconnected network of N neurons which serve as both input and output. Updates are made asynchronously and independently. Activation values are binary (+1/-1). Can be used as an associative memory or for optimisation problems (travelling salesman problem).

58 Hopfield Auto-Associator [figure]
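A minimal auto-associator sketch: patterns are stored with a Hebbian prescription and recalled through asynchronous updates. The patterns, storage rule and step count are illustrative assumptions; the slides do not specify them:

```python
import numpy as np

rng = np.random.default_rng(1)

# store two +/-1 patterns with the Hebbian prescription W = sum_p x_p x_p^T
patterns = np.array([[ 1, -1,  1, -1,  1, -1],
                     [ 1,  1,  1, -1, -1, -1]], dtype=float)
W = sum(np.outer(p, p) for p in patterns)
np.fill_diagonal(W, 0)               # no self-connections

def recall(state, W, steps=100):
    """Asynchronous updates: pick a random unit, set it to the sign of its net input."""
    state = state.copy()
    for _ in range(steps):
        i = rng.integers(len(state))
        state[i] = 1.0 if W[i] @ state >= 0 else -1.0
    return state

noisy = np.array([1, -1, 1, 1, 1, -1], dtype=float)   # pattern 0 with one bit flipped
print(recall(noisy, W))              # typically settles back on the stored pattern
```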

59 Outline: 6 Application (Smart sweepers); 7 Conclusion

60 Smart sweepers
Objective: train minesweepers to pick up mines in a two-dimensional field.
Parameters of the network:
Topology: feed-forward multi-layer ANN with 4 input units, 6 hidden units on one layer, and 2 output units.
Activation function: sigmoid.
Learning rule: genetic algorithm.

61 Smart sweepers
The input is composed of two vectors:
a vector defining the direction of the closest mine;
a vector defining the direction towards which the minesweeper is pointing.
The output is composed of two components: the speeds of the left and right tracks.

62 Smart sweepers
Each minesweeper has its own set of weights. The ANN runs for a certain amount of time T; during this period, each mine found increases the fitness of the sweeper. Afterwards, the GA runs to create the new generation of weight sets, as sketched below.
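The slides do not detail the genetic algorithm, so the following is only a sketch of one generation step over flat weight vectors, using commonly seen operators (fitness-proportional selection, one-point crossover, Gaussian mutation); the encoding and all parameters are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N_WEIGHTS = 4 * 6 + 6 + 6 * 2 + 2    # 4-6-2 topology: weights plus biases (assumed encoding)

def evolve(population, fitness, mut_rate=0.1, mut_scale=0.3):
    """One GA generation over flat weight vectors."""
    probs = fitness / fitness.sum()
    children = []
    for _ in range(len(population)):
        pa, pb = population[rng.choice(len(population), size=2, p=probs)]
        cut = rng.integers(1, N_WEIGHTS)            # one-point crossover
        child = np.concatenate([pa[:cut], pb[cut:]])
        mask = rng.random(N_WEIGHTS) < mut_rate     # mutate a few genes
        child[mask] += rng.normal(0.0, mut_scale, mask.sum())
        children.append(child)
    return np.array(children)

population = rng.normal(0.0, 1.0, (30, N_WEIGHTS))
fitness = rng.random(30)             # stand-in for "mines collected during T"
population = evolve(population, fitness)
print(population.shape)
```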

63-64 Smart sweepers [figures: simulation screenshots]

65 Conclusion
Single-layer ANN:
limited representational power: restricted to linear classifiers;
linearity of the system guarantees convergence to the optimal solution (optimal weight vector).
Multi-layer ANN:
unlimited representational power: can model non-linear problems;
non-linearity means convergence to an optimal solution is no longer guaranteed.

66 Conclusion
The choice of a representative learning sample is essential to obtain the expected behavior and to avoid over-fitting. Combination with other approaches can prove effective (genetic algorithms, for instance).

67 References
B. Kröse and P. van der Smagt. An Introduction to Neural Networks, eighth edition.
S. Singh. Neural Network Recognition of Hand-printed Characters.
