CS407 Neural Computation


1 CS407 Neural Computation Lecture 5: The Multi-Layer Perceptron (MLP) and Backpropagation Lecturer: A/Prof. M. Bennamoun

2 What is a perceptron and what is a Multi-Layer Perceptron (MLP)? 2

3 What is a perceptron? Input signals x_1, ..., x_m are multiplied by synaptic weights w_k1, ..., w_km and combined at a summing junction together with a bias b_k:

v_k = Σ_{j=1..m} w_kj x_j + b_k

An activation function then produces the output: y_k = φ(v_k).
Discrete perceptron: φ(·) = sign(·). Continuous perceptron: φ(·) = S-shaped (sigmoid). 3
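As a concrete illustration (a sketch, not from the original slides; the input, weight, and bias values below are made up), the perceptron computation looks like this in MATLAB:

```matlab
% Perceptron: weighted sum of inputs plus bias, passed through an activation.
x = [0.5; -1.0; 2.0];            % input signals x_1..x_m (illustrative values)
w = [0.2;  0.4; -0.1];           % synaptic weights w_k1..w_km (illustrative values)
b = 0.3;                         % bias b_k

v = w' * x + b;                  % summing junction: v_k = sum_j w_kj*x_j + b_k

y_discrete   = sign(v);          % discrete perceptron: phi(.) = sign(.)
y_continuous = 1/(1 + exp(-v));  % continuous perceptron: S-shaped (sigmoid) phi(.)
```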

4 Activation Function of a perceptron [Figure: the signum (hard-limiting) activation and an S-shaped activation plotted against v_i] Discrete perceptron: φ(·) = sign(·) (signum function). Continuous perceptron: φ(v) = S-shaped (sigmoid) function. 4

5 MLP Architecture The Multi-Layer Perceptron was first introduced by M. Minsky and S. Papert in 1969. Type: Feedforward. Neuron layers: 1 input layer, 1 or more hidden layers, 1 output layer. Learning Method: Supervised 5

6 Terminology/Conventions Arrows indicate the direction of data flow. The first layer, termed input layer, just contains the input vector and does not perform any computations. The second layer, termed hidden layer, receives input from the input layer and sends its output to the output layer. After applying their activation function, the neurons in the output layer contain the output vector. 6

7 Why the MLP? The single-layer perceptron classifiers discussed previously can only deal with linearly separable sets of patterns. The multilayer networks to be introduced here are the most widespread neural network architecture. They were not made useful until the 1980s, because of the lack of an efficient training algorithm, until the introduction of the backpropagation training algorithm (McClelland and Rumelhart 1986). 7

8 Different Non-Linearly Separable Problems
Structure / Types of decision regions (illustrated for the Exclusive-OR problem, classes with meshed regions, and the most general region shapes, for two classes A and B):
Single-Layer: half plane bounded by a hyperplane.
Two-Layer: convex open or closed regions.
Three-Layer: arbitrary regions (complexity limited by the number of nodes). 8

9 What is backpropagation Training and how does it work? 9

10 What is Backpropagation? Supervised Error Back-propagation Training. The mechanism of backward error transmission (delta learning rule) is used to modify the synaptic weights of the internal (hidden) and output layers: the mapping error can be propagated back into the hidden layers. It can implement arbitrarily complex input/output mappings or decision surfaces to separate pattern classes, for which the explicit derivation of mappings and discovery of relationships is almost impossible. It produces surprising results and generalizations. 10

11 Architecture: Backpropagation Network The Backpropagation Net was first introduced by D.E. Rumelhart, G.E. Hinton and R.J. Williams in 1986. Type: Feedforward. Neuron layers: 1 input layer, 1 or more hidden layers, 1 output layer. Learning Method: Supervised. Reference: Clara Boyd 11

12 Backpropagation Preparation Training Set A collection of input-output patterns that are used to train the network Testing Set A collection of input-output patterns that are used to assess network performance Learning Rate-α A scalar parameter, analogous to step size in numerical integration, used to set the rate of adjustments 2

13 Backpropagation training cycle (Reference: Eric Plummer) 1/ Feedforward of the input training pattern 2/ Backpropagation of the associated error 3/ Adjustment of the weights 13

14 Backpropagation Neural Networks: Architecture; BP training Algorithm; Generalization; Examples (Example 1, Example 2); Uses (applications) of BP networks; Options/Variations on BP (Momentum, Sequential vs. batch, Adaptive learning rates); Appendix; References and suggested reading 14

15 BP NN With Single Hidden Layer (Reference: Dan St. Clair; Fausett: Chapter 6) [Figure: three-layer network with an I/P layer, a hidden layer reached through weights v_{i,j}, and an O/P layer reached through weights w_{j,k}.] Source: Fausett, L., Fundamentals of Neural Networks, Prentice Hall, 1994. Notation as on p. 1 of Fausett. 15

16 Notation x = input training vector. t = output target vector. δ_k = portion of the error correction weight adjustment for w_jk that is due to an error at output unit Y_k; also the information about the error at unit Y_k that is propagated back to the hidden units that feed into unit Y_k. δ_j = portion of the error correction weight adjustment for v_ij that is due to the backpropagation of error information from the output layer to hidden unit Z_j. α = learning rate. v_0j = bias on hidden unit j. w_0k = bias on output unit k. 16

17 Activation Functions The activation function should be continuous, differentiable, and monotonically non-decreasing. Plus, its derivative should be easy to compute. Examples: binary step, binary sigmoid, hyperbolic tangent. Binary sigmoid:

f(x) = 1 / (1 + exp(-x)),   f'(x) = f(x)·[1 - f(x)]

Source: Fausett, L., Fundamentals of Neural Networks, Prentice Hall, 1994. 17
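A small MATLAB sketch (not part of the original slides) of the binary sigmoid and of the property that its derivative can be computed from its own output:

```matlab
% Binary sigmoid and its derivative, f'(x) = f(x).*(1 - f(x)).
f      = @(x) 1 ./ (1 + exp(-x));
fprime = @(x) f(x) .* (1 - f(x));

x = -5:0.1:5;                       % sample points
plot(x, f(x), x, fprime(x));        % the derivative peaks at x = 0 with value 0.25
legend('f(x)', 'f''(x)');
```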

18 Backpropagation Neural Networks: Architecture; BP training Algorithm; Generalization; Examples (Example 1, Example 2); Uses (applications) of BP networks; Options/Variations on BP (Momentum, Sequential vs. batch, Adaptive learning rates); Appendix; References and suggested reading 18

19 22 [Figures: the steps of the BP training algorithm (feedforward of the input pattern, backpropagation of the error, adjustment of the weights) illustrated on a network with input units X_i, hidden units Z_j and output unit Y_k. Source: Fausett, L.]

23 Let's examine the Training Algorithm Equations. Vectors & matrices make computation easier. [Figure: network with inputs X_1..X_n, hidden units Z_1..Z_p and output Y_1.]

X = [x_1 ... x_n]
V = [v_{1,1} ... v_{1,p}; ... ; v_{n,1} ... v_{n,p}],   V_0 = [v_{0,1} ... v_{0,p}]
W = [w_{1,1} ... w_{1,m}; ... ; w_{p,1} ... w_{p,m}],   W_0 = [w_{0,1} ... w_{0,m}]

Step 4 computation becomes: Z_in = V_0 + X V,   Z = [f(z_in_1) ... f(z_in_p)]
Step 5 computation becomes: Y_in = W_0 + Z W,   Y = [f(y_in_1) ... f(y_in_m)] 23
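In MATLAB, Steps 4 and 5 then reduce to two matrix expressions each. The following is a sketch with made-up layer sizes and random weights, using the matrix layout defined above:

```matlab
% Vectorized feedforward pass (Steps 4 and 5 of the BP training algorithm).
f = @(x) 1 ./ (1 + exp(-x));   % logistic activation

n = 4; p = 3; m = 2;           % layer sizes (illustrative)
X  = rand(1, n);               % input row vector [x_1 ... x_n]
V  = rand(n, p) - 0.5;         % input-to-hidden weights, V(i,j) = v_ij
V0 = rand(1, p) - 0.5;         % hidden biases [v_01 ... v_0p]
W  = rand(p, m) - 0.5;         % hidden-to-output weights, W(j,k) = w_jk
W0 = rand(1, m) - 0.5;         % output biases [w_01 ... w_0m]

Z_in = V0 + X*V;  Z = f(Z_in); % Step 4: hidden-layer activations
Y_in = W0 + Z*W;  Y = f(Y_in); % Step 5: output-layer activations
```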

24 Backpropagation Neural Networks: Architecture; BP training Algorithm; Generalization; Examples (Example 1, Example 2); Uses (applications) of BP networks; Options/Variations on BP (Momentum, Sequential vs. batch, Adaptive learning rates); Appendix; References and suggested reading 24

25 Generalisation Once trained, weights are held constant, and input patterns are applied in feedforward mode - commonly called recall mode. We wish the network to generalize, i.e. to make sensible choices about input vectors which are not in the training set. Commonly we check generalization of a network by dividing the known patterns into a training set, used to adjust the weights, and a test set, used to evaluate the performance of the trained network. 25

26 Generalisation Generalisation can be improved by Using a smaller number of hidden units (network must learn the rule, not just the examples) Not overtraining (occasionally check that error on test set is not increasing) Ensuring training set includes a good mixture of examples No good rule for deciding upon good network size (# of layers, # units per layer) Usually use one input/output per class rather than a continuous variable or binary encoding 26

27 Backpropagation Neural Networks: Architecture; BP training Algorithm; Generalization; Examples (Example 1, Example 2); Uses (applications) of BP networks; Options/Variations on BP (Momentum, Sequential vs. batch, Adaptive learning rates); Appendix; References and suggested reading 27

28 Reference: R. Spillman. Example 1 The XOR function could not be solved by a single-layer perceptron network. The function is:
X Y F
0 0 0
0 1 1
1 0 1
1 1 0 28

29 XOR Architecture [Figure: a 2-2-1 network for XOR. Inputs x_1, x_2 feed two hidden units through weights v_11, v_21 and v_12, v_22 with biases v_01, v_02; the hidden outputs feed a single output unit y through weights w_1, w_2 with bias w_0. Each unit computes a weighted sum Σ followed by the activation f.] 29

30 Initial Weights Randomly assign small weight values: [Figure: the network of slide 29 annotated with its initial weights; the values used in the first feedforward pass on the next slide are v_01 = -.3, v_11 = .2, v_21 = .5, v_02 = .25, v_12 = -.4, v_22 = .1, w_0 = -.4, w_1 = -.2, w_2 = .3.] 30

31 Feedforward 1st Pass Training case: (x_1, x_2, t) = (0, 0, 0). Activation function f: f(x) = 1 / (1 + e^(-x)).
z_in_1 = -.3(1) + .2(0) + .5(0) = -.3,   z_1 = f(-.3) = .43
z_in_2 = .25(1) - .4(0) + .1(0) = .25,   z_2 = f(.25) = .56
y_in = -.4(1) - .2(.43) + .3(.56) = -.318,   y_1 = f(y_in) = .42 (not 0) 31

32 Backpropagate
δ_1 = (t_1 - y_1) f'(y_in_1) = (t_1 - y_1) f(y_in_1)[1 - f(y_in_1)] = (0 - .42)(.42)(1 - .42) = -.102
δ_in_1 = δ_1 w_1 = -.102(-.2) = .02 ;   δ_1 (hidden) = δ_in_1 f'(z_in_1) = .02(.43)(1 - .43) = .005
δ_in_2 = δ_1 w_2 = -.102(.3) = -.03 ;   δ_2 (hidden) = δ_in_2 f'(z_in_2) = -.03(.56)(1 - .56) = -.007 32

33 Calculate the Weight Changes First Pass
Δw_jk = α δ_k z_j and Δw_0k = α δ_k (k = 1);   Δv_ij = α δ_j x_i and Δv_0j = α δ_j (i, j = 1, 2). With α = 1:
Δv_11 = δ_1 x_1 = (.005)(0) = 0    Δv_12 = δ_2 x_1 = (-.007)(0) = 0
Δv_21 = δ_1 x_2 = (.005)(0) = 0    Δv_22 = δ_2 x_2 = (-.007)(0) = 0
Δv_01 = δ_1 = .005                 Δv_02 = δ_2 = -.007
Δw_1 = δ_1 z_1 = (-.102)(.43) = -.044    Δw_2 = δ_1 z_2 = (-.102)(.56) = -.057
Δw_0 = δ_1 = -.102 33
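The numbers of this first pass can be reproduced with a few lines of MATLAB. This is a sketch, assuming the initial weights read off slides 30-31 and α = 1 as in the example:

```matlab
% One feedforward/backprop pass of the XOR example for input (0,0), target 0.
f = @(x) 1 ./ (1 + exp(-x));

x = [0 0];  t = 0;                         % training case
v0 = [-.3 .25]; v = [.2 -.4; .5 .1];       % hidden weights: v(i,j) goes from x_i to z_j
w0 = -.4;       w = [-.2; .3];             % output weights from z_1, z_2

z_in = v0 + x*v;        z = f(z_in);       % z = [.43 .56]
y_in = w0 + z*w;        y = f(y_in);       % y = .42
delta1   = (t - y) * y*(1-y);              % output delta = -.102
delta_in = delta1 * w';                    % [.02 -.03]
deltaZ   = delta_in .* z .* (1-z);         % hidden deltas = [.005 -.007]

dw = delta1 * z';  dw0 = delta1;           % weight changes for w, w0 (alpha = 1)
dv = x' * deltaZ;  dv0 = deltaZ;           % weight changes for v, v0 (zero here except the biases)
```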

34 Update the Weights First Pass [Figure: the network redrawn with the updated weights, e.g. w_2 becomes .3 + (-.057) = .243.] 34

35 Final Result After about 500 iterations: [Figure: the trained XOR network with its final weights.] 35
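Putting the whole cycle together, the following MATLAB sketch trains a 2-2-1 network on XOR with plain sequential backpropagation. It is an illustration in the spirit of the example, not the code behind the slides; the learning rate, iteration count and random initialisation are arbitrary, and a run that stalls in a local minimum may need a fresh random start:

```matlab
% Train a 2-2-1 MLP on XOR with sequential (per-pattern) backpropagation.
f  = @(x) 1 ./ (1 + exp(-x));
X  = [0 0; 0 1; 1 0; 1 1];   T = [0; 1; 1; 0];   % XOR truth table
v  = rand(2,2) - 0.5;  v0 = rand(1,2) - 0.5;     % small random initial weights
w  = rand(2,1) - 0.5;  w0 = rand     - 0.5;
alpha = 0.5;                                     % learning rate (illustrative)

for it = 1:5000                                  % several thousand sweeps through the 4 patterns
    for p = 1:4
        x = X(p,:);  t = T(p);
        z = f(v0 + x*v);                         % feedforward
        y = f(w0 + z*w);
        d1 = (t - y) * y*(1-y);                  % output delta
        dz = (d1 * w') .* z .* (1-z);            % hidden deltas
        w  = w + alpha * d1 * z';   w0 = w0 + alpha * d1;   % update weights
        v  = v + alpha * x' * dz;   v0 = v0 + alpha * dz;
    end
end
disp(f(w0 + f(repmat(v0,4,1) + X*v)*w))          % outputs should approach [0; 1; 1; 0]
```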

36 Backpropagation Neural Networks: Architecture; BP training Algorithm; Generalization; Examples (Example 1, Example 2); Uses (applications) of BP networks; Options/Variations on BP (Momentum, Sequential vs. batch, Adaptive learning rates); Appendix; References and suggested reading 36

37 Reference: Vamsi Pegatraju and Aparna Patsa. Example 2: a 3-3-1 network (n = 3 inputs X_1..X_3, p = 3 hidden units Z_1..Z_3, m = 1 output Y_1) with activation f(x) = 1 / (1 + e^(-x)). Input vector X and hidden-layer weights V with biases V_0 as given on the next slide; output-layer weights W = [-1 1 2]^T with bias W_0 = [-1]. Desired output for this X input: t = 0.9. Learning rate α = 0.3. 37

38 Primary Values: Inputs to Epoch 1 X = [...]; W = [-1 1 2]; W_0 = [-1]; V = [...]; V_0 = [0 0 -1]; Target t = 0.9; α = 0.3. 38

39 Epoch 1 Step 4: Z_in = V_0 + XV = [...]; Z = f(Z_in) = [...]; Step 5: Y_in = W_0 + ZW = [0.34]; Y = f(Y_in) = 0.5772; Sum of Squares Error obtained originally: (0.9 - 0.5772)^2 = 0.1042. 39

40 Step 6: Error = t_k - Y_k = 0.9 - 0.5772 = 0.3228. Now we have only one output and hence the value of k = 1. δ_1 = (t_1 - y_1) f'(Y_in_1). We know that f'(x) for the sigmoid = f(x)(1 - f(x)). δ_1 = (0.9 - 0.5772)(0.5772)(1 - 0.5772) = 0.0788. 40

41 For the intermediate (hidden-to-output) weights we have (j = 1,2,3): ΔW_{j,1} = α δ_1 Z_j. ΔW = (0.3)(0.0788)[Z_1 Z_2 Z_3] = [...]. Bias: ΔW_{0,1} = α δ_1 = (0.3)(0.0788) = 0.0236. 41

42 Step 7: Backpropagation to the first hidden layer. For Z_j (j = 1,2,3), we have δ_in_j = Σ_{k=1..m} δ_k W_{j,k} = δ_1 W_{j,1}. δ_in_1 = -0.0788; δ_in_2 = 0.0788; δ_in_3 = 0.1576. δ_j = δ_in_j f'(Z_in_j) => δ_1 = ...; δ_2 = 0.007; δ_3 = 0.036. 42

43 X = [...]. ΔV_{i,j} = α δ_j X_i. ΔV_1 = [...]; ΔV_2 = [...]; ΔV_3 = [...]; ΔV_0 = α[δ_1 δ_2 δ_3] = [...]. 43

44 Step 8: Updating of W, V, W_0, V_0. W_new = W_old + ΔW = [...]; V_new = V_old + ΔV = [...]; W_0new = ...; V_0new = [...]. Completion of the first epoch. 44

45 Primary Values: Inputs to Epoch 2 X = [...]; W = [...]; W_0 = [...]; V = [...]; V_0 = [...]; Target t = 0.9; α = 0.3. 45

46 Epoch 2 Step 4: Z_in = V_0 + XV = [...]; Z = f(Z_in) = [...]; Step 5: Y_in = W_0 + ZW = [0.3925]; Y = f(Y_in) = 0.5969; Sum of Squares Error obtained from the first epoch: (0.9 - 0.5969)^2 = 0.0919. 46

47 Step 6: Error = t_k - Y_k = 0.9 - 0.5969 = 0.3031. Now again, as we have only one output, the value of k = 1. δ_1 = (t_1 - y_1) f'(Y_in_1) => δ_1 = (0.9 - 0.5969)(0.5969)(1 - 0.5969) = 0.0729. 47

48 For the intermediate (hidden-to-output) weights we have (j = 1,2,3): ΔW_{j,1} = α δ_1 Z_j. ΔW = (0.3)(0.0729)[Z_1 Z_2 Z_3] = [...]. Bias: ΔW_{0,1} = α δ_1 = (0.3)(0.0729) = 0.0219. 48

49 Step 7: Backpropagation to the first hidden layer. For Z_j (j = 1,2,3), we have δ_in_j = Σ_{k=1..m} δ_k W_{j,k} = δ_1 W_{j,1}. δ_in_1 = -0.074; δ_in_2 = 0.0745; δ_in_3 = 0.1469. δ_j = δ_in_j f'(Z_in_j) => δ_1 = ...; δ_2 = 0.0067; δ_3 = 0.0334. 49

50 ΔV_{i,j} = α δ_j X_i. ΔV_1 = [...]; ΔV_2 = [...]; ΔV_3 = [...]; ΔV_0 = α[δ_1 δ_2 δ_3] = [...]. 50

51 Step 8: Updating of W, V, W_0, V_0. W_new = W_old + ΔW = [...]; V_new = V_old + ΔV = [...]; W_0new = ...; V_0new = [...]. Completion of the second epoch. 51

52 Z_in = V_0 + XV = [...]; => Z = f(Z_in) = [...]; Step 5: Y_in = W_0 + ZW = [0.4684]; => Y = f(Y_in) = 0.6150; Sum of Squares Error at the end of the second epoch: (0.9 - 0.6150)^2 = 0.0812. From the last two values of the Sum of Squares Error, we see that the value is gradually decreasing as the weights are getting updated. 52
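For reference, one epoch of the procedure used in this example (Steps 4 to 8, single output unit) can be collected into a short MATLAB function. The function below is a sketch in the slide's notation, not taken from the original notes; save it as bp_epoch.m:

```matlab
function [V, V0, W, W0, sse] = bp_epoch(X, t, V, V0, W, W0, alpha)
% One training pass of backpropagation for a single-output MLP (Fausett notation).
f = @(x) 1 ./ (1 + exp(-x));

Z_in = V0 + X*V;      Z = f(Z_in);        % Step 4: hidden layer
Y_in = W0 + Z*W;      Y = f(Y_in);        % Step 5: output layer
sse  = (t - Y)^2;                         % sum-of-squares error before the update

delta1 = (t - Y) * Y*(1 - Y);             % Step 6: output error term
dW  = alpha * delta1 * Z';                % weight corrections for W and its bias
dW0 = alpha * delta1;

delta_in = delta1 * W';                   % Step 7: backpropagate to the hidden layer
deltaZ   = delta_in .* Z .* (1 - Z);
dV  = alpha * (X' * deltaZ);              % weight corrections for V and its biases
dV0 = alpha * deltaZ;

W = W + dW;  W0 = W0 + dW0;               % Step 8: update all weights
V = V + dV;  V0 = V0 + dV0;
end
```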

53 Backpropagation Neural Networks: Architecture; BP training Algorithm; Generalization; Examples (Example 1, Example 2); Uses (applications) of BP networks; Options/Variations on BP (Momentum, Sequential vs. batch, Adaptive learning rates); Appendix; References and suggested reading 53

54 Functional Approximation Multi-Layer Perceptrons can approximate any continuous function with a two-layer network with squashing activation functions. If the activation functions can vary with the function, one can show that an n-input, m-output function requires at most 2n+1 hidden units. See Fausett for more details. 54

55 Function Approximators Example: a function h(x) approximated by H(w,x) 55

56 Applications We look at a number of applications for backpropagation MLPs. In each case we'll examine: the problem to be solved, the architecture used, and the results. Reference: J. Hertz, A. Krogh, R.G. Palmer, Introduction to the Theory of Neural Computation, Addison Wesley, 1991. 56

57 NETtalk - Specifications Problem is to convert written text to speech. Conventionally, this is done by hand-coded linguistic rules, such as the DECtalk system. NETtalk uses a neural network to achieve similar results Input is written text Output is choice of phoneme for speech synthesiser 57

58 NETtalk - architecture 26 output units, 1-of-26 code representing the most likely phoneme. 80 hidden units, fully interconnected. A 7-letter sliding window (e.g. "T h e _ c a t"), generating the phoneme for the centre character. Input units use a 1-of-29 code => 203 input units (= 29 x 7). 58

59 NETtalk - Results 1024-word training set. After 10 epochs - intelligible speech. After 50 epochs - 95% correct on the training set, 78% correct on the test set. Note that this network must generalise - many input combinations are not in the training set. Results not as good as DECtalk, but significantly less effort to code up. 59

60 Sonar Classifier Task - distinguish between a rock and a metal cylinder from the sonar return off the bottom of a bay. Convert the time-varying input signal to the frequency domain to reduce the input dimension (this is a linear transform and could be done with a fixed-weight neural network). Used a 60-x-2 network with x from 0 to 24. Training took about 200 epochs. With no hidden units it classified about 80% of the training set; with hidden units it classified 100% of the training set and 85% of the test set. 60

61 ALVINN Drives 70 mph on a public highway. 30 outputs for steering. 4 hidden units. 30x32 pixels as inputs. 30x32 weights into each one of the four hidden units. 61

62 Navigation of a Car Task is to control a car on a winding road. Inputs are a 30x32 pixel image from a video camera on the roof and an 8x32 image from a range finder => 1216 inputs. 29 hidden units. 45 output units arranged in a line, 1-of-45 code representing hard-left .. straight-ahead .. hard-right. 62

63 Navigation of Car - Results Training set of 1200 simulated road images. Trained for 40 epochs. Could drive at 5 km/hr on the road, limited by the calculation speed of the feed-forward network. Twice as fast as the best non-net solution. 63

64 Backgammon Trained on 3000 example board scenarios of (position, dice, move) rated from -100 (very bad) to +100 (very good) by a human expert. Some important information such as pip-count and degree-of-trapping was included as input. Some noise was added to the input set (scenarios with random scores). Handcrafted examples were added to the training set to correct obvious errors. 64

65 Backgammon results 459 inputs, 2 hidden layers of 24 units each, plus 1 output for the score (all possible moves evaluated). Won 59% against a conventional backgammon program (41% without the extra info, 45% without noise in the training set). Won the computer olympiad, 1989, but lost to a human expert (not surprising since it was trained on human-scored examples). 65

66 Encoder / Image Compression Wish to encode a number of input patterns in an efficient number of bits for storage or transmission We can use an autoassociative network, i.e. an M-N-M network, where we have M inputs, and N<M hidden units, M outputs, trained with target outputs same as inputs Hidden units need to encode inputs in fewer signals in the hidden layers. Outputs from hidden layer are encoded signal 66

67 Encoders We can store/transmit the hidden values using the first half of the network and decode using the second half. We may need to truncate hidden unit values to fixed precision, which must be considered during training. Cottrell et al. tried 8x8 blocks (8 bits each) of images, encoded in 16 units, giving results similar to conventional approaches. Works best with similar images. 67
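A sketch of such an M-N-M autoassociative network in MATLAB, using the 8x8-block / 16-hidden-unit figures quoted above (the data and weights here are random placeholders, and training proceeds by ordinary backpropagation with target output equal to the input):

```matlab
% Autoassociative (M-N-M) encoder: M = 64 inputs (an 8x8 image block), N = 16 hidden units.
f = @(x) 1 ./ (1 + exp(-x));
M = 64;  N = 16;
V  = rand(M, N) - 0.5;   V0 = rand(1, N) - 0.5;   % encoder half (input -> hidden)
W  = rand(N, M) - 0.5;   W0 = rand(1, M) - 0.5;   % decoder half (hidden -> output)

block   = rand(1, M);                 % one 8x8 image block, flattened (illustrative data)
code    = f(V0 + block*V);            % compressed representation: the N hidden outputs
decoded = f(W0 + code*W);             % reconstruction of the block from the code
% Training: backpropagation with target output = input block, so the hidden layer
% is forced to encode the block in fewer signals.
```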

68 Neural network for OCR: a feedforward network trained using backpropagation. [Figure: an input layer of pixel units, a hidden layer, and an output layer with one unit per letter A, B, C, D, E.] 68

69 Pattern Recognition Post-code (or ZIP code) recognition is a good example - hand-written characters need to be classified. One interesting network used a 16x16 pixel map input of handwritten digits already found and scaled by another system. 3 hidden layers plus a 1-of-10 output layer. The first two hidden layers were feature detectors. 69

70 ZIP code classifier The first hidden layer had the same feature detector connected to 5x5 blocks of the input, at 2-pixel intervals => an 8x8 array of the same detector, each with the same weights but connected to different parts of the input. Twelve such feature detector arrays. Same for the second hidden layer, but 4x4 arrays connected to 5x5 blocks of the first hidden layer, with 12 different features. Conventional 30-unit 3rd hidden layer. 70

71 ZIP Code Classifier - Results Note that the 8x8 and 4x4 arrays of feature detectors use the same weights => many fewer weights to train. Trained on 7300 digits, tested on 2000. Error rates: 1% on the training set, 5% on the test set. If cases with no clear winner are rejected (i.e. the largest output not much greater than the second largest output), then, with 12% rejection, the error rate on the test set is reduced to 1%. Performance improved further by removing more weights: optimal brain damage. 71

72 Backpropagation Neural Networks: Architecture; BP training Algorithm; Generalization; Examples (Example 1, Example 2); Uses (applications) of BP networks; Options/Variations on BP (Momentum, Sequential vs. batch, Adaptive learning rates); Appendix; References and suggested reading 72

73 Heuristics for making BP Better Training with BP is more an art than a science - the result of one's own experience. Normalizing the inputs: preprocess so that the mean value of each input is closer to zero (see the prestd function in Matlab). Input variables should be uncorrelated, e.g. by Principal Component Analysis (PCA); see the prepca and trapca functions in Matlab. 73

74 Sequential vs. Batch update Sequential learning means that a given input pattern is forward propagated, the error is determined and back-propagated, and the weights are updated. Then the same procedure is repeated for the next pattern. Batch learning means that the weights are updated only after the entire set of training patterns has been presented to the network. In other words, all patterns are forward propagated, and the error is determined and back-propagated, but the weights are only updated when all patterns have been processed. Thus, the weight update is only performed every epoch. If P = # patterns in one epoch:

Δw = (1/P) Σ_{p=1..P} Δw_p 74
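The following MATLAB sketch (not from the original notes) shows the two schemes side by side on a deliberately tiny single-unit example; only the point at which the update is applied differs:

```matlab
% Sequential vs. batch updating, illustrated on a single sigmoid unit (delta rule).
f = @(x) 1 ./ (1 + exp(-x));
X = [0 0; 0 1; 1 0; 1 1];   T = [0; 0; 0; 1];    % a small training set (AND function)
w = rand(2,1) - 0.5;  b = rand - 0.5;  alpha = 0.5;
P = size(X,1);

% --- Sequential (on-line): update after every pattern ---
for p = 1:P
    y = f(X(p,:)*w + b);
    d = (T(p) - y) * y*(1-y);
    w = w + alpha * d * X(p,:)';   b = b + alpha * d;
end

% --- Batch: accumulate the corrections over the epoch, apply the average once ---
dw = zeros(2,1);  db = 0;
for p = 1:P
    y  = f(X(p,:)*w + b);
    d  = (T(p) - y) * y*(1-y);
    dw = dw + alpha * d * X(p,:)';  db = db + alpha * d;
end
w = w + dw/P;   b = b + db/P;       % Delta_w = (1/P) * sum_p Delta_w_p
```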

75 Sequential vs. Batch update i.e. in some cases, it is advantageous to accumulate the weight correction terms for several patterns (or even an entire epoch if there are not too many patterns) and make a single weight adjustment (equal to the average of the weight correction terms) for each weight, rather than updating the weights after each pattern is presented. This procedure has a smoothing effect (because of the use of the average) on the correction terms. In some cases, this smoothing may increase the chances of convergence to a local minimum. 75

76 Initial weights Initial weights will influence whether the net reaches a global (or only a local) minimum of the error and, if so, how quickly it converges. The values for the initial weights must not be too large, otherwise the initial input signals to each hidden or output unit will be likely to fall in the region where the derivative of the sigmoid function has a very small value (f'(net) ~ 0): the so-called saturation region. On the other hand, if the initial weights are too small, the net input to a hidden or output unit will be close to zero, which also causes extremely slow learning. Best to set the initial weights (and biases) to random numbers between -0.5 and 0.5 (or between -1 and 1, or some other suitable interval). The values may be +ve or -ve because the final weights after training may be of either sign also. 76
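A minimal MATLAB sketch of this initialisation, using the [-0.5, 0.5] interval suggested above (the layer sizes are arbitrary):

```matlab
% Initialize weights and biases uniformly in [-0.5, 0.5].
n = 3;  p = 4;  m = 1;               % input, hidden, output sizes (illustrative)
V  = rand(n, p) - 0.5;               % input-to-hidden weights
V0 = rand(1, p) - 0.5;               % hidden biases
W  = rand(p, m) - 0.5;               % hidden-to-output weights
W0 = rand(1, m) - 0.5;               % output biases
% Too-large values push units into the sigmoid's saturation region (f'(net) ~ 0);
% too-small values make every net input ~0, which also slows learning.
```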

77 Memorization vs. generalization How long to train the net: since the usual motivation for applying a backprop net is to achieve a balance between memorization and generalization, it is not necessarily advantageous to continue training until the error actually reaches a minimum. Use 2 disjoint sets of data during training: 1/ a set of training patterns and 2/ a set of training-testing patterns (or validation set). Weight adjustments are based on the training patterns; however, at intervals during training, the error is computed using the validation patterns. As long as the error for the validation set decreases, training continues. When the error begins to increase, the net is starting to memorize the training patterns too specifically (it starts to lose its ability to generalize). At this point, training is terminated. 77

78 L. Studer, IPHE-UNIL Early stopping [Figure: error vs. training time. The error on the training set (which changes w_ij) keeps decreasing, while the error on the validation set (which does not change w_ij) eventually starts to rise - stop training there.] 78
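In code, early stopping is a monitoring loop around ordinary training. The MATLAB sketch below uses a deliberately simple single-unit learner and a made-up task purely to show the bookkeeping; it is not from the original notes:

```matlab
% Early stopping: train on one set, monitor the error on a separate validation set.
f = @(x) 1 ./ (1 + exp(-x));
Xtr = rand(20,2);  Ttr = double(sum(Xtr,2) > 1);   % training patterns (illustrative task)
Xva = rand(10,2);  Tva = double(sum(Xva,2) > 1);   % validation patterns (never change the weights)
w = rand(2,1) - 0.5;  b = rand - 0.5;  alpha = 0.5;

best_err = inf;  best_w = w;  best_b = b;  worse = 0;
for epoch = 1:1000
    for p = 1:size(Xtr,1)                          % one epoch of sequential updates
        y = f(Xtr(p,:)*w + b);
        d = (Ttr(p) - y) * y*(1-y);
        w = w + alpha*d*Xtr(p,:)';  b = b + alpha*d;
    end
    val_err = mean((Tva - f(Xva*w + b)).^2);       % error on the validation set
    if val_err < best_err
        best_err = val_err;  best_w = w;  best_b = b;  worse = 0;
    else
        worse = worse + 1;
        if worse >= 10, break; end                 % stop once the validation error keeps rising
    end
end
w = best_w;  b = best_b;                           % keep the weights with lowest validation error
```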

79 Backpropagation with momentum Backpropagation with momentum: the weight change is in a direction that is a combination of 1/ the current gradient and 2/ the previous gradient. Momentum can be added so weights tend to change more quickly if they keep changing in the same direction for several training cycles:

Δw_ij(t+1) = α δ_j x_i + µ Δw_ij(t)

µ is called the momentum factor and ranges over 0 < µ < 1. When subsequent changes are in the same direction, it increases the rate (accelerated descent). When subsequent changes are in opposite directions, it decreases the rate (stabilizes). 79

80 Backpropagation with momentum Weight update equation with momentum:

w_jk(t+1) = w_jk(t) + α δ_k z_j + µ [w_jk(t) - w_jk(t-1)]

Source: Fausett, L., Fundamentals of Neural Networks, Prentice Hall, 1994. 80
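As a sketch (not from the original notes), the momentum term only requires remembering the previous weight change; here it is applied to a single sigmoid unit with made-up values:

```matlab
% Weight update with momentum on a single sigmoid unit (illustrative values).
f = @(x) 1 ./ (1 + exp(-x));
x = [1; 0];  t = 1;                    % one training pattern
w = [0.1; -0.2];  b = 0.05;            % current weights and bias
alpha = 0.2;  mu = 0.9;                % learning rate and momentum factor (0 < mu < 1)
dw_prev = zeros(2,1);  db_prev = 0;    % previous weight changes, Delta_w(t)

for step = 1:100
    y  = f(w'*x + b);
    d  = (t - y) * y*(1-y);
    dw = alpha*d*x + mu*dw_prev;       % Delta_w(t+1) = alpha*delta*x + mu*Delta_w(t)
    db = alpha*d   + mu*db_prev;
    w = w + dw;   b = b + db;
    dw_prev = dw; db_prev = db;
end
```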

81 BP training algorithm: Adaptive Learning Rate. Source: Fausett, L., Fundamentals of Neural Networks, Prentice Hall, 1994. 81

82 Adaptive Learning rate Adaptive Parameters: vary the learning rate during training, accelerating learning slowly if all is well (error E decreasing), but reducing it quickly if things go unstable (E increasing). For example:

α(t+1) = α(t) + a        if ΔE < 0 for the last few epochs
α(t+1) = (1 - b)·α(t)    if ΔE > 0
α(t+1) = α(t)            otherwise

Typically, a = 0.1, b = 0.5. 82
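A MATLAB sketch of this rule (simplified to compare only consecutive epochs rather than the last few; the error sequence is a made-up illustration):

```matlab
% Adaptive learning rate: grow alpha slowly while the error falls, cut it quickly when it rises.
a = 0.1;  b = 0.5;  alpha = 0.01;
E = [1.0 0.9 0.85 0.8 0.82 0.7 0.65];     % example error values, one per epoch (illustrative)

for t = 2:length(E)
    dE = E(t) - E(t-1);
    if dE < 0
        alpha = alpha + a;                % error decreasing: accelerate slowly
    elseif dE > 0
        alpha = (1 - b) * alpha;          % error increasing: cut the rate quickly
    end                                   % otherwise leave alpha unchanged
end
```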

83 Matlab BP NN Architecture A neuron with a single R-element input vector is shown below. Here the individual element inputs are multiplied by weights and the weighted values are fed to the summing junction. Their sum is simply Wp, the dot product of the (single row) matrix W and the vector p. The neuron has a bias b, which is summed with the weighted inputs to form the net input n. This sum, n, is the argument of the transfer function f. This expression can, of course, be written in MATLAB code as: n = W*p + b However, the user will seldom be writing code at this low level, for such code is already built into functions to define and simulate entire networks. 83

84 Matlab BP NN Architecture 84
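For example, the single-neuron computation described on the previous slide can be evaluated directly; the numeric values and the choice of a log-sigmoid transfer function below are illustrative, not from the original notes:

```matlab
% Single neuron with an R-element input vector: n = W*p + b, a = f(n).
p = [2; -1; 0.5];             % input vector (R = 3 elements, illustrative values)
W = [0.3 -0.2 0.1];           % single-row weight matrix
b = 0.4;                      % bias

n = W*p + b;                  % net input: dot product of W and p, plus the bias
a = 1/(1 + exp(-n));          % output through a log-sigmoid transfer function
```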

85 Backpropagation Neural Networks: Architecture; BP training Algorithm; Generalization; Examples (Example 1, Example 2); Uses (applications) of BP networks; Options/Variations on BP (Momentum, Sequential vs. batch, Adaptive learning rates); Appendix; References and suggested reading 85

86 Learning Rule (Fausett, section 6.3, p. 324) Similar to the Delta Rule. Our goal is to minimize the error E, which is the difference between the targets t_k and our outputs y_k, using a least squares error measure:

E = 1/2 Σ_k (t_k - y_k)^2

To find out how to change w_jk and v_ij to reduce E, we need to find ∂E/∂w_jk and ∂E/∂v_ij. 86

87 Delta Rule Derivation: Hidden-to-Output

E = 0.5 Σ_k (t_k - y_k)^2, hence

∂E/∂w_JK = -(t_K - y_K) ∂y_K/∂w_JK

where y_k = f(y_in_k) and y_in_k = Σ_j z_j w_jk.

Notice the difference between the subscript k (which ranges over all nodes between the hidden and output layers) and K (which represents the particular node K of interest).

∂E/∂w_JK = -(t_K - y_K) f'(y_in_K) ∂(y_in_K)/∂w_JK = -(t_K - y_K) f'(y_in_K) z_J 87

88 Delta Rule Derivation: Hidden-to-Output It is convenient to define:

δ_K = (t_K - y_K) f'(y_in_K)

Thus -∂E/∂w_jk = (t_k - y_k) f'(y_in_k) z_j = δ_k z_j, and Δw_jk = -α ∂E/∂w_jk = α δ_k z_j.

In summary: Δw_jk = α δ_k z_j, with δ_K = (t_K - y_K) f'(y_in_K). 88

89 Delta Rule Derivation: Input to Hidden

E = 0.5 Σ_k (t_k - y_k)^2, hence

∂E/∂v_IJ = -Σ_k (t_k - y_k) ∂y_k/∂v_IJ = -Σ_k (t_k - y_k) f'(y_in_k) ∂(y_in_k)/∂v_IJ

where y_k = f(y_in_k) and y_in_k = Σ_j z_j w_jk.

∂E/∂v_IJ = -Σ_k δ_k ∂(y_in_k)/∂v_IJ = -Σ_k δ_k w_Jk ∂z_J/∂v_IJ = -Σ_k δ_k w_Jk f'(z_in_J) x_I

Notice the difference between the subscripts j and J, and i and I. It is convenient to define:

δ_J = Σ_k δ_k w_Jk f'(z_in_J)

so that Δv_ij = -α ∂E/∂v_ij = α f'(z_in_j) x_i Σ_k δ_k w_jk = α δ_j x_i 89

90 Delta Rule Derivation: Input to Hidden In summary: Δv_ij = α δ_j x_i, where δ_J = Σ_k δ_k w_Jk f'(z_in_J). 90
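The two summary formulas translate almost line-for-line into code. The MATLAB sketch below applies them to one training pattern with made-up random data, assuming sigmoid activations so that f'(u) = f(u)(1 - f(u)):

```matlab
% Delta-rule updates for both layers, for one pattern (x, t).
f = @(u) 1 ./ (1 + exp(-u));
n = 3; p = 2; m = 2;  alpha = 0.25;               % sizes and learning rate (illustrative)
x = rand(1,n);  t = rand(1,m);                    % input and target (illustrative)
v = rand(n,p) - 0.5;  w = rand(p,m) - 0.5;        % weights (biases omitted for brevity)

z_in = x*v;        z = f(z_in);                   % hidden layer
y_in = z*w;        y = f(y_in);                   % output layer

delta_k = (t - y) .* y .* (1-y);                  % delta_K = (t_K - y_K) f'(y_in_K)
delta_j = (delta_k * w') .* z .* (1-z);           % delta_J = sum_k delta_k w_Jk f'(z_in_J)

w = w + alpha * (z' * delta_k);                   % Delta_w_jk = alpha * delta_k * z_j
v = v + alpha * (x' * delta_j);                   % Delta_v_ij = alpha * delta_j * x_i
```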

91 Backpropagation Neural Networks: Architecture; BP training Algorithm; Generalization; Examples (Example 1, Example 2); Uses (applications) of BP networks; Options/Variations on BP (Momentum, Sequential vs. batch, Adaptive learning rates); Appendix; References and suggested reading 91

92 Suggested Reading 1. L. Fausett, Fundamentals of Neural Networks, Prentice-Hall, 1994, Chapter 6. 92

93 References: These lecture notes were based on the reference of the previous slide, and the following references. 1. Eric Plummer, University of Wyoming 2. Clara Boyd, Columbia Univ. N.Y., comet.ctr.columbia.edu/courses/elen_e40/2002/artificial.ppt 3. Dan St. Clair, University of Missouri-Rolla, 404_fall200/Lectures/Lect09_0230/ 4. Vamsi Pegatraju and Aparna Patsa: web.umr.edu/~stclair/class/classfiles/cs404_fs02/Lectures/Lect09_02902/Lect8_Homework/L8_3.ppt 5. Richard Spillman, Pacific Lutheran University 6. Khurshid Ahmad and Matthew Casey, Univ. Surrey 93
