Ch4: Perceptron Learning Rule


1 Ch4: Perceptron Learning Rule. Learning Rule (or Training Algorithm): a procedure for modifying the weights and biases of a network. Learning rules fall into three classes: Supervised Learning, Reinforcement Learning, and Unsupervised Learning.

2 Learning Rules. Supervised Learning: the network is provided with a set of examples of proper network behavior (input/target pairs) {p1, t1}, {p2, t2}, ..., {pQ, tQ} (e.g., perceptron training). Reinforcement Learning: the network is only provided with a grade, or score, which indicates network performance. Unsupervised Learning: only the network inputs are available to the learning algorithm; the network learns to categorize (cluster) the inputs (useful in applications such as vector quantization).

3 Learning Rules. Supervised Learning (Learning with a Teacher): the network is provided with a set of examples of proper network behavior (input/target pairs) {p1, t1}, {p2, t2}, ..., {pQ, tQ}. The desired response (target) represents the "optimum" action to be performed by the neural network. The network parameters are adjusted under the combined influence of the training vector and the error signal, where the error signal is defined as the difference between the desired response and the actual response of the network.

4 [Figure 24: Block diagram of learning with a teacher; the part of the figure printed in red constitutes a feedback loop. From Haykin, Neural Networks and Learning Machines, 3rd ed., Pearson, 2009.]

5 Learning without a Teacher: Reinforcement Learning. The network is only provided with a grade, or score, which indicates network performance. The learning of an input-output mapping is performed through continued interaction with the environment in order to minimize a scalar index of performance. The goal of reinforcement learning is to minimize a cost-to-go function, defined as the expectation of the cumulative cost of actions taken over a sequence of steps, instead of simply the immediate cost.

6 [Figure 25: Block diagram of reinforcement learning; the learning system and the environment are both inside the feedback loop. From Haykin, Neural Networks and Learning Machines, 3rd ed., Pearson, 2009.]

7 Learning without a Teacher: Unsupervised Learning. Only network inputs are available to the learning algorithm; there is no external teacher or critic to oversee the learning process. Rather, provision is made for a task-independent measure of the quality of the representation that the network is required to learn, and the free parameters of the network are optimized with respect to that measure. We may use a competitive-learning rule.

8 [Figure 26: Block diagram of unsupervised learning. From Haykin, Neural Networks and Learning Machines, 3rd ed., Pearson, 2009.]

9 LEARNING TASKS. 1- Pattern Association (the linking or association of ideas). An associative memory is a brainlike distributed memory that learns by association. Association takes one of two forms: autoassociation and heteroassociation. In autoassociation, a neural network is required to store a set of patterns (vectors) by repeatedly presenting them to the network. The network is subsequently presented with a partial description or distorted (noisy) version of an original pattern stored in it, and the task is to retrieve (recall) that particular pattern.

10 Heteroassociation differs from autoassociation in that an arbitrary set of input patterns is paired with another arbitrary set of output patterns. Autoassociation involves the use of unsupervised learning, whereas the type of learning involved in heteroassociation is supervised. The pattern association performed by the network is described by x_k -> y_k, k = 1, 2, ..., q, where q is the number of patterns stored in the network. Operation proceeds in two phases: a storage phase and a recall phase.
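
As a concrete illustration of the two phases, the following minimal sketch realizes a heteroassociative mapping x_k -> y_k with a linear associator and Hebbian (outer-product) storage; the patterns, the bipolar coding, and the storage rule are illustrative assumptions, not prescribed by the slides.

```python
import numpy as np

# Minimal sketch of heteroassociative storage/recall using a linear
# associator with Hebbian (outer-product) storage. This is one classic
# way to realize the x_k -> y_k mapping; treat it as an illustration.

# Two stored pairs (q = 2); bipolar coding keeps the recall clean.
X = np.array([[1, -1, 1], [-1, -1, 1]])   # input patterns x_k (rows)
Y = np.array([[1, -1], [-1, 1]])          # paired outputs y_k (rows)

# Storage phase: accumulate outer products, W = sum_k y_k x_k^T
W = Y.T @ X

# Recall phase: present a stored (or noisy) input and threshold.
x_probe = np.array([1, -1, 1])
y_recalled = np.sign(W @ x_probe)
print(y_recalled)   # recovers the paired output [1, -1]
```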

11 [Figure 27: Input-output relation of pattern associator. From Haykin, Neural Networks and Learning Machines, 3rd ed., Pearson, 2009.]

12 2- Pattern Recognition. Defined as the process whereby a received pattern/signal is assigned to one of a prescribed number of classes. 3- Function Approximation. Consider a nonlinear input-output mapping described by the functional relationship d = f(x), where the vector x is the input and the vector d is the output. The vector-valued function f(.) is assumed to be unknown; we are given only a set of N labeled examples {(x_i, d_i)}, i = 1, ..., N.

13 The requirement is to design a neural network that approximates the unknown function f(.) such that the function F(.), describing the input-output mapping actually realized by the network, is close enough to f(.) in a Euclidean sense over all inputs, as shown by ||F(x) - f(x)|| < epsilon for all x, where epsilon is a small positive number. The ability of a neural network to approximate an unknown input-output mapping may be exploited in two important ways: system identification and inverse modeling.
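
The epsilon-closeness criterion can be checked empirically on sampled inputs. In this hypothetical sketch, f stands in for the unknown mapping (known here only to make the demo self-contained) and F for the mapping realized by a trained network:

```python
import numpy as np

# Sketch of checking ||F(x) - f(x)|| < eps on sampled inputs. Both f
# (the "unknown" mapping) and F (a stand-in for the trained network)
# are hypothetical placeholders chosen for illustration.

def f(x):                      # the "unknown" mapping d = f(x)
    return np.sin(x)

def F(x):                      # stand-in for the network's realized mapping
    return x - x**3 / 6        # cubic Taylor approximation of sin near 0

eps = 0.05
xs = np.linspace(-1.0, 1.0, 201)
worst = max(abs(F(x) - f(x)) for x in xs)
print(f"max |F(x) - f(x)| = {worst:.4f}, within eps: {worst < eps}")
```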

14 [Figure 28: Illustration of the classical approach to pattern classification. From Haykin, Neural Networks and Learning Machines, 3rd ed., Pearson, 2009.]

15 [Figure 29: Block diagram of system identification; the neural network, doing the identification, is part of the feedback loop. From Haykin, Neural Networks and Learning Machines, 3rd ed., Pearson, 2009.]

16 [Figure 30: Block diagram of inverse system modeling; the neural network, acting as the inverse model, is part of the feedback loop. From Haykin, Neural Networks and Learning Machines, 3rd ed., Pearson, 2009.]

17 4- Control. The control of a plant is another learning task that is well suited for neural networks; by a "plant" we mean a process or critical part of a system that is to be maintained in a controlled condition. The system involves the use of unity feedback around the plant to be controlled; that is, the plant output is fed back directly to the input. Thus, the plant output y is subtracted from a reference signal d supplied by an external source, and the error signal e so produced is applied to a neural controller for the purpose of adjusting its free parameters.
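
A minimal sketch of this loop, under assumed dynamics: the plant is a static gain y = K*u with K unknown to the controller, and a single free parameter w is adjusted by the error signal so that the controller learns to invert the plant's gain. All numbers are illustrative, not a method from the slides.

```python
# Sketch of the unity-feedback loop above: the plant output y is fed
# back, the error e = d - y is formed, and e adjusts the controller's
# free parameter w. The static plant y = K*u (K = 2, unknown to the
# controller) is a hypothetical choice; the controller learns the
# inverse gain 1/K, so that y tracks the reference d.

K = 2.0          # plant gain (unknown to the controller)
d = 1.0          # reference signal
w = 0.0          # controller's free parameter
lr = 0.05        # adaptation step size

for step in range(100):
    u = w * d            # controller output
    y = K * u            # plant output
    e = d - y            # error signal from unity feedback
    w += lr * e * d      # adjust the free parameter with the error

print(f"learned gain w = {w:.3f} (ideal 1/K = {1/K}), error e = {e:.5f}")
```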

18 [Figure 31: Block diagram of feedback control system. The primary objective of the controller is to supply appropriate inputs to the plant to make its output y track the reference signal d; in other words, the controller has to invert the plant's input-output behavior. From Haykin, Neural Networks and Learning Machines, 3rd ed., Pearson, 2009.]

19 5- Beamforming. Beamforming is used to distinguish between the spatial properties of a target signal and background noise; the device used to do the beamforming is called a beamformer. The task of beamforming is compatible, for example, with feature mapping in the cortical layers of the auditory systems of echolocating bats. Beamforming is commonly used in radar and sonar systems, where the primary task is to detect and track a target of interest in the combined presence of receiver noise and interfering signals (e.g., jammers).

20 [Figure 32: Block diagram of generalized sidelobe canceller, built on an array of antenna elements; its purpose is to cancel interference that leaks through the sidelobes of the radiation pattern of the spatial filter representing the linear combiner. From Haykin, Neural Networks and Learning Machines, 3rd ed., Pearson, 2009.]

21 Perceptron. The perceptron was introduced by Frank Rosenblatt in the late 1950's, together with a learning algorithm for it: Rosenblatt, F. (1958). "The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain." Psychological Review, 65(6): 386-408. The perceptron may have continuous-valued inputs.

22 Perceptron Architecture. The weight matrix is

    W = [ w_{1,1}  w_{1,2}  ...  w_{1,R}
          w_{2,1}  w_{2,2}  ...  w_{2,R}
          ...
          w_{S,1}  w_{S,2}  ...  w_{S,R} ]

Collecting the elements of the ith row of W into the vector iw = [w_{i,1}; w_{i,2}; ...; w_{i,R}] lets us write W = [1w^T; 2w^T; ...; Sw^T]. The ith network output is then a_i = hardlim(n_i) = hardlim(iw^T p + b_i).
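
In code, the forward pass of this architecture is a single matrix operation. The sketch below (with made-up weights, biases, and input) computes a_i = hardlim(iw^T p + b_i) for all S neurons at once:

```python
import numpy as np

# Sketch of the multi-neuron perceptron's forward pass.
# Weight, bias, and input values are made-up examples.

def hardlim(n):
    """Hard-limit transfer function: 1 if n >= 0, else 0."""
    return (n >= 0).astype(int)

W = np.array([[1.0, -0.8],      # row i of W is iw^T
              [0.5,  0.5]])     # S = 2 neurons, R = 2 inputs
b = np.array([0.0, -0.25])
p = np.array([1.0, 2.0])

a = hardlim(W @ p + b)          # a_i = hardlim(iw^T p + b_i)
print(a)                        # -> [0 1]
```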

23 Single-Neuron Perceptron. With w_{1,1} = 1, w_{1,2} = 1, and b = -1, the decision boundary is 1w^T p + b = 0, and the output is a = hardlim(1w^T p + b) = hardlim(w_{1,1} p_1 + w_{1,2} p_2 + b).

24 Decision Boundary. The boundary is 1w^T p + b = 0, i.e., 1w^T p = -b. All points on the decision boundary have the same inner product with the weight vector; therefore they have the same projection onto the weight vector, and they must lie on a line orthogonal to the weight vector.

25 Example - OR. The training set is: p1 = [0; 0], t1 = 0; p2 = [0; 1], t2 = 1; p3 = [1; 0], t3 = 1; p4 = [1; 1], t4 = 1.

26 OR Solution. The weight vector should be orthogonal to the decision boundary; pick, for example, 1w = [0.5; 0.5]. Then pick a point on the decision boundary to find the bias, e.g., p = [0; 0.5]: setting 1w^T p + b = 0 gives 0.25 + b = 0, so b = -0.25.
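
A quick check (a sketch, using the weights and bias derived above) confirms that this single neuron reproduces all four OR targets:

```python
import numpy as np

# Verify the OR solution: 1w = [0.5, 0.5]^T, b = -0.25.
w = np.array([0.5, 0.5])
b = -0.25
patterns = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

for p, t in patterns:
    a = int(w @ np.array(p) + b >= 0)   # hardlim(1w^T p + b)
    print(p, "->", a, "target", t)
```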

27 Multiple-Neuron Perceptron. Each neuron has its own decision boundary iw^T p + b_i = 0. A single neuron can classify input vectors into two categories; a multi-neuron perceptron with S neurons can classify input vectors into 2^S categories.

28 Learning Rule Test Problem. Given a training set {p1, t1}, {p2, t2}, ..., {pQ, tQ}, we test the rule without a bias: p1 = [1; 2], t1 = 1; p2 = [-1; 2], t2 = 0; p3 = [0; -1], t3 = 0. (Filled circles mark the class t = 1; hollow circles mark the class t = 0.)

29 Starting Point. Random initial weight: 1w = [1.0; -0.8]. Present p1 to the network: a = hardlim(1w^T p1) = hardlim([1.0 -0.8][1; 2]) = hardlim(-0.6) = 0. Incorrect classification. Remedy: the weight vector must be modified in a direction that moves the decision boundary toward the vector p1.

30 Tentative Learning Rule. Setting 1w equal to p1 is not stable: if 1w is simply replaced by p1 (or later by p2), no stable solution is obtained. Instead, add p1 to 1w. Tentative Rule: If t = 1 and a = 0, then 1w_new = 1w_old + p. Here: 1w_new = 1w_old + p1 = [1.0; -0.8] + [1; 2] = [2.0; 1.2].

31 Second Input Vector. a = hardlim(1w^T p2) = hardlim([2.0 1.2][-1; 2]) = hardlim(0.4) = 1 (incorrect classification). The weight vector must be moved away from p2. Modification to the Rule: If t = 0 and a = 1, then 1w_new = 1w_old - p. Here: 1w_new = 1w_old - p2 = [2.0; 1.2] - [-1; 2] = [3.0; -0.8].

32 Third Input Vector. a = hardlim(1w^T p3) = hardlim([3.0 -0.8][0; -1]) = hardlim(0.8) = 1 (incorrect classification). 1w_new = 1w_old - p3 = [3.0; -0.8] - [0; -1] = [3.0; 0.2]. All three patterns are now correctly classified. Third case: If t = a, then 1w_new = 1w_old.

33 Unified Learning Rule. The set of rules so far:
If t = 1 and a = 0, then 1w_new = 1w_old + p.
If t = 0 and a = 1, then 1w_new = 1w_old - p.
If t = a, then 1w_new = 1w_old.
Defining the error e = t - a, these become:
If e = 1, then 1w_new = 1w_old + p.
If e = -1, then 1w_new = 1w_old - p.
If e = 0, then 1w_new = 1w_old.
Unified rule: 1w_new = 1w_old + e p = 1w_old + (t - a) p, and b_new = b_old + e. (A bias is simply a weight with a constant input of 1.)
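
The unified rule is only a few lines of code. The sketch below applies it to the test problem of slide 28 (no bias) and reproduces the hand computation of slides 29-32, ending at 1w = [3.0; 0.2]:

```python
import numpy as np

# Unified perceptron rule, e = t - a and 1w_new = 1w_old + e*p,
# applied to the test problem above (no bias term).

hardlim = lambda n: int(n >= 0)

P = [np.array([1.0, 2.0]), np.array([-1.0, 2.0]), np.array([0.0, -1.0])]
T = [1, 0, 0]
w = np.array([1.0, -0.8])              # random initial weight from slide 29

for epoch in range(10):                # repeat passes until no errors
    errors = 0
    for p, t in zip(P, T):
        e = t - hardlim(w @ p)         # e = t - a
        w = w + e * p                  # unified learning rule
        errors += abs(e)
    if errors == 0:
        break

print("final weights:", w)             # [3.0, 0.2], as on slide 32
```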

34 Multiple-Neuron Perceptrons. To update the ith row of the weight matrix: iw_new = iw_old + e_i p and b_i_new = b_i_old + e_i. In matrix form: W_new = W_old + e p^T and b_new = b_old + e.

35 Apple/Banana Example. Training set: {p1 = [-1; 1; -1], t1 = 1}, {p2 = [1; 1; -1], t2 = 0}. Initial weights: W = [0.5 -1 -0.5], b = 0.5. First iteration: a = hardlim(W p1 + b) = hardlim([0.5 -1 -0.5][-1; 1; -1] + 0.5) = hardlim(-0.5) = 0. e = t1 - a = 1 - 0 = 1. W_new = W_old + e p1^T = [0.5 -1 -0.5] + [-1 1 -1] = [-0.5 0 -1.5]. b_new = b_old + e = 0.5 + 1 = 1.5.

36 Second Iteration. a = hardlim(W p2 + b) = hardlim([-0.5 0 -1.5][1; 1; -1] + 1.5) = hardlim(2.5) = 1. e = t2 - a = 0 - 1 = -1. W_new = W_old + e p2^T = [-0.5 0 -1.5] - [1 1 -1] = [-1.5 -1 -0.5]. b_new = b_old + e = 1.5 - 1 = 0.5.

37 Check. a = hardlim(W p1 + b) = hardlim([-1.5 -1 -0.5][-1; 1; -1] + 0.5) = hardlim(1.5) = 1 = t1. a = hardlim(W p2 + b) = hardlim([-1.5 -1 -0.5][1; 1; -1] + 0.5) = hardlim(-1.5) = 0 = t2. Demonstration program: nnd4pr.
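
The same two iterations can be written with the matrix form of the rule from slide 34; this sketch uses the training set and initial weights given above:

```python
import numpy as np

# Reproduce the apple/banana iterations with the matrix-form rule:
# W_new = W_old + e*p^T, b_new = b_old + e.

hardlim = lambda n: (n >= 0).astype(int)

P = [np.array([-1.0, 1.0, -1.0]), np.array([1.0, 1.0, -1.0])]
T = [np.array([1]), np.array([0])]
W = np.array([[0.5, -1.0, -0.5]])
b = np.array([0.5])

for p, t in zip(P, T):                 # one pass = the two iterations shown
    e = t - hardlim(W @ p + b)
    W = W + np.outer(e, p)             # W_new = W_old + e p^T
    b = b + e                          # b_new = b_old + e

print(W, b)                            # [[-1.5 -1.  -0.5]] [0.5]
```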

38 [Figure 1.4: (a) A pair of linearly separable patterns. (b) A pair of non-linearly separable patterns. From Haykin, Neural Networks and Learning Machines, 3rd ed., Pearson, 2009.]

39 The types of decision regions that can be formed by single-layer and multilayer perceptrons with one and two hidden layers are given in Figure 6.5 [Lippmann 87].

40 Learning Rule (Batch). The unified rule 1w_new = 1w_old + e p, b_new = b_old + e, with e = t - a, updates the weights after each input presentation. The batch version instead performs gradient descent on the perceptron criterion
J(1w, b) = - SUM over misclassified p's of e (1w^T p + b),
whose gradient with respect to 1w is - SUM over misclassified p's of e p. With learning rate alpha, one batch update is therefore
1w_new = 1w_old + alpha * SUM over misclassified p's of e p, and b_new = b_old + alpha * SUM over misclassified p's of e.
(As before, a bias is a weight with an input of 1.)
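
A sketch of the batch variant on the earlier test problem: the terms e*p are accumulated over all currently misclassified inputs and applied as one summed update per pass (here with alpha = 1, and a bias included, initialized to zero):

```python
import numpy as np

# Batch perceptron rule: accumulate e*p over all misclassified inputs,
# then apply a single summed update per pass.

hardlim = lambda n: int(n >= 0)

P = [np.array([1.0, 2.0]), np.array([-1.0, 2.0]), np.array([0.0, -1.0])]
T = [1, 0, 0]
w = np.array([1.0, -0.8])
b = 0.0
alpha = 1.0                            # learning rate

for epoch in range(20):
    dw, db = np.zeros(2), 0.0
    for p, t in zip(P, T):
        e = t - hardlim(w @ p + b)
        dw += e * p                    # sum of e*p over misclassified p's
        db += e
    if db == 0 and not dw.any():       # no errors: training is done
        break
    w, b = w + alpha * dw, b + alpha * db

print("batch solution:", w, b)
```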

41 [Figure 1.8: The double-moon classification problem. From Haykin, Neural Networks and Learning Machines, 3rd ed., Pearson, 2009.]

42 [Figure 1.9: Perceptron with the double-moon set at distance d = 1. From Haykin, Neural Networks and Learning Machines, 3rd ed., Pearson, 2009.]

43 [Figure 1.10: Perceptron with the double-moon set at distance d = 4. From Haykin, Neural Networks and Learning Machines, 3rd ed., Pearson, 2009.]

44 Guarantee of Success: Novikoff (1963). Theorem 2.1: Given training samples from two linearly separable classes, the perceptron training algorithm terminates after a finite number of steps, and correctly classifies all elements of the training set, irrespective of the initial random non-zero weight vector w_0. Let w_k be the current weight vector; we need to prove that there is an upper bound on k. Let p_j be a misclassified input vector, and let x_k = class(p_j) p_j, implying that w_{k-1} . x_k < 0.

45 Guarantee of Success: Novikoff (1963). Proof: Assume the learning rate alpha = 1, without loss of generality. After k steps of the learning algorithm, the current weight vector is
w_k = w_0 + x_1 + x_2 + ... + x_k. (2.1)
Since the two classes are linearly separable, there must be a vector of weights w* that correctly classifies them, that is, sgn(w* . p_k) = class(p_k). Multiplying each side of eq. 2.1 by w*, we get:
w* . w_k = w* . w_0 + w* . x_1 + w* . x_2 + ... + w* . x_k.

46 Guarantee of Success: Novikoff (1963). From the previous slide, w* . w_k = w* . w_0 + w* . x_1 + w* . x_2 + ... + w* . x_k. For each input vector p_j, the dot product w* . p_j has the same sign as class(p_j). Since the corresponding element of the training sequence is x = class(p_j) p_j, we can be assured that w* . x = w* . (class(p_j) p_j) > 0. Therefore, there exists a delta > 0 such that w* . x_i > delta for every member x_i of the training sequence. Hence:
w* . w_k > w* . w_0 + k delta. (2.2)

47 Guarantee of Success: Novikoff (1963). Recall w* . w_k > w* . w_0 + k delta (2.2). By the Cauchy-Schwarz inequality:
(w* . w_k)^2 <= ||w*||^2 ||w_k||^2. (2.3)
We may assume that ||w*|| = 1, since the unit-length vector w*/||w*|| also correctly classifies the same samples. Using this assumption and eqs. 2.2 and 2.3, we obtain a lower bound for the squared length of w_k:
||w_k||^2 > (w* . w_0 + k delta)^2. (2.4)

48 Guarantee of Success: Novikoff (1963). Since w_j = w_{j-1} + x_j, the following upper bound can be obtained for this vector's squared length:
||w_j||^2 = w_j . w_j = w_{j-1} . w_{j-1} + 2 w_{j-1} . x_j + x_j . x_j = ||w_{j-1}||^2 + 2 w_{j-1} . x_j + ||x_j||^2.
Since w_{j-1} . x_j < 0 whenever a weight change is required by the algorithm, we have:
||w_j||^2 - ||w_{j-1}||^2 < ||x_j||^2.
Summing these inequalities over j = 1, ..., k gives the upper bound:
||w_k||^2 - ||w_0||^2 < k max_j ||x_j||^2.

49 Guarantee of Success: Novikoff (1963). Combining the upper bound ||w_k||^2 - ||w_0||^2 < k max_j ||x_j||^2 with the lower bound of inequality 2.4, ||w_k||^2 > (w* . w_0 + k delta)^2, gives:
(w* . w_0 + k delta)^2 < ||w_k||^2 < ||w_0||^2 + k max_j ||x_j||^2.
The lower bound on ||w_k||^2 increases at the rate of k^2, while its upper bound increases at the rate of k. Therefore there must be a finite value of k at which
(w* . w_0 + k delta)^2 > ||w_0||^2 + k max_j ||x_j||^2,
which is impossible while updates continue. This means that k cannot increase without bound, so the algorithm must eventually terminate.
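
The bound can also be watched empirically. The following sketch (with made-up, linearly separable data and a non-zero w_0) asserts after every update that ||w_k||^2 stays below ||w_0||^2 + k max_j ||x_j||^2, which is why the number of updates k is finite:

```python
import numpy as np

# Empirical illustration of the Novikoff upper bound during training.
# Data, classes, and w_0 are made-up but linearly separable.

X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
cls = np.array([1, 1, -1, -1])         # class(p_j) in {+1, -1}
w = np.array([0.1, -0.1])              # w_0, non-zero starting vector
w0_sq = w @ w
max_x_sq = max(x @ x for x in X)

k = 0
changed = True
while changed:                          # loop until a full error-free pass
    changed = False
    for p, c in zip(X, cls):
        x = c * p                      # x = class(p_j) * p_j
        if w @ x <= 0:                 # misclassified: apply an update
            w = w + x
            k += 1
            changed = True
            # upper bound from the proof must hold after each update
            assert w @ w < w0_sq + k * max_x_sq

print(f"converged after k = {k} updates, w = {w}")
```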

50 Perceptron Rule Capability. The perceptron rule will always converge to weights that accomplish the desired classification, assuming that such weights exist. Perceptron Limitations: the decision boundary is linear (a hyperplane), 1w^T p + b = 0, so a single-layer perceptron cannot solve linearly inseparable problems; a multi-layer perceptron can (the XOR derivation below shows why a single linear boundary fails). HW1 - Ch 4: E 2, 4, 6, 7.
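
To see why a linear boundary is a real restriction, consider XOR. The following short derivation (using the hardlim convention a = 1 iff n >= 0) shows that the four constraints a single neuron would have to satisfy are contradictory:

```latex
% Suppose a single neuron a = hardlim(w_1 p_1 + w_2 p_2 + b) computed XOR,
% with hardlim(n) = 1 iff n >= 0. The four patterns require:
\begin{align*}
(0,0) \mapsto 0 &:\quad b < 0 \\
(0,1) \mapsto 1 &:\quad w_2 + b \ge 0 \\
(1,0) \mapsto 1 &:\quad w_1 + b \ge 0 \\
(1,1) \mapsto 0 &:\quad w_1 + w_2 + b < 0
\end{align*}
% Adding the middle two gives w_1 + w_2 + 2b >= 0, i.e.
% w_1 + w_2 + b >= -b > 0, contradicting the last line.
% Hence no single hyperplane separates XOR.
```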
