CS407 Neural Computation


1 CS407 Neural Computation Lecture 5: The Multi-Layer Perceptron (MLP) and Backpropagation Lecturer: A/Prof. M. Bennamoun

2 What is a perceptron and what is a Multi-Layer Perceptron (MLP)? 2

3 What is a perceptron? Input signals x_1, ..., x_m are multiplied by synaptic weights w_k1, ..., w_km and combined at a summing junction together with a bias b_k:

v_k = Σ_{j=1..m} w_kj x_j + b_k

An activation function then produces the output: y_k = φ(v_k).
Discrete perceptron: φ(·) = sign(·). Continuous perceptron: φ(·) = S-shaped (sigmoid). 3
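As a concrete illustration (a sketch, not from the original slides; the input, weight, and bias values below are made up), the perceptron computation looks like this in MATLAB:

```matlab
% Perceptron: weighted sum of inputs plus bias, passed through an activation.
x = [0.5; -1.0; 2.0];            % input signals x_1..x_m (illustrative values)
w = [0.2;  0.4; -0.1];           % synaptic weights w_k1..w_km (illustrative values)
b = 0.3;                         % bias b_k

v = w' * x + b;                  % summing junction: v_k = sum_j w_kj*x_j + b_k

y_discrete   = sign(v);          % discrete perceptron: phi(.) = sign(.)
y_continuous = 1/(1 + exp(-v));  % continuous perceptron: S-shaped (sigmoid) phi(.)
```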

4 Activation Function of a perceptron [Figure: the signum (hard-limiting) activation and an S-shaped activation plotted against v_i] Discrete perceptron: φ(·) = sign(·) (signum function). Continuous perceptron: φ(v) = S-shaped (sigmoid) function. 4

5 MLP Architecture The Multi-Layer Perceptron was first introduced by M. Minsky and S. Papert in 1969. Type: Feedforward. Neuron layers: 1 input layer, 1 or more hidden layers, 1 output layer. Learning Method: Supervised 5

6 Terminology/Conventions Arrows indicate the direction of data flow. The first layer, termed input layer, just contains the input vector and does not perform any computations. The second layer, termed hidden layer, receives input from the input layer and sends its output to the output layer. After applying their activation function, the neurons in the output layer contain the output vector. 6

7 Why the MLP? The single-layer perceptron classifiers discussed previously can only deal with linearly separable sets of patterns. The multilayer networks to be introduced here are the most widespread neural network architecture. They were not made useful until the 1980s, because of the lack of an efficient training algorithm, until the introduction of the backpropagation training algorithm (McClelland and Rumelhart 1986). 7

8 Different Non-Linearly Separable Problems
Structure / Types of decision regions (illustrated for the Exclusive-OR problem, classes with meshed regions, and the most general region shapes, for two classes A and B):
Single-Layer: half plane bounded by a hyperplane.
Two-Layer: convex open or closed regions.
Three-Layer: arbitrary regions (complexity limited by the number of nodes). 8

9 What is backpropagation Training and how does it work? 9

10 What is Backpropagation? Supervised Error Back-propagation Training. The mechanism of backward error transmission (delta learning rule) is used to modify the synaptic weights of the internal (hidden) and output layers: the mapping error can be propagated back into the hidden layers. It can implement arbitrarily complex input/output mappings or decision surfaces to separate pattern classes, for which the explicit derivation of mappings and discovery of relationships is almost impossible. It produces surprising results and generalizations. 10

11 Architecture: Backpropagation Network The Backpropagation Net was first introduced by D.E. Rumelhart, G.E. Hinton and R.J. Williams in 1986. Type: Feedforward. Neuron layers: 1 input layer, 1 or more hidden layers, 1 output layer. Learning Method: Supervised. Reference: Clara Boyd 11

12 Backpropagation Preparation Training Set A collection of input-output patterns that are used to train the network Testing Set A collection of input-output patterns that are used to assess network performance Learning Rate-α A scalar parameter, analogous to step size in numerical integration, used to set the rate of adjustments 2

13 Backpropagation training cycle (Reference: Eric Plummer) 1/ Feedforward of the input training pattern 2/ Backpropagation of the associated error 3/ Adjustment of the weights 13

14 Backpropagation Neural Networks: Architecture; BP training Algorithm; Generalization; Examples (Example 1, Example 2); Uses (applications) of BP networks; Options/Variations on BP (Momentum, Sequential vs. batch, Adaptive learning rates); Appendix; References and suggested reading 14

15 BP NN With Single Hidden Layer (Reference: Dan St. Clair; Fausett: Chapter 6) [Figure: three-layer network with an I/P layer, a hidden layer reached through weights v_{i,j}, and an O/P layer reached through weights w_{j,k}.] Source: Fausett, L., Fundamentals of Neural Networks, Prentice Hall, 1994. Notation as on p. 1 of Fausett. 15

16 Notation x = input training vector. t = output target vector. δ_k = portion of the error correction weight adjustment for w_jk that is due to an error at output unit Y_k; also the information about the error at unit Y_k that is propagated back to the hidden units that feed into unit Y_k. δ_j = portion of the error correction weight adjustment for v_ij that is due to the backpropagation of error information from the output layer to hidden unit Z_j. α = learning rate. v_0j = bias on hidden unit j. w_0k = bias on output unit k. 16

17 Activation Functions The activation function should be continuous, differentiable, and monotonically non-decreasing. Plus, its derivative should be easy to compute. Examples: binary step, binary sigmoid, hyperbolic tangent. Binary sigmoid:

f(x) = 1 / (1 + exp(-x)),   f'(x) = f(x)·[1 - f(x)]

Source: Fausett, L., Fundamentals of Neural Networks, Prentice Hall, 1994. 17
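A small MATLAB sketch (not part of the original slides) of the binary sigmoid and of the property that its derivative can be computed from its own output:

```matlab
% Binary sigmoid and its derivative, f'(x) = f(x).*(1 - f(x)).
f      = @(x) 1 ./ (1 + exp(-x));
fprime = @(x) f(x) .* (1 - f(x));

x = -5:0.1:5;                       % sample points
plot(x, f(x), x, fprime(x));        % the derivative peaks at x = 0 with value 0.25
legend('f(x)', 'f''(x)');
```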

18 Backpropagation Neural Networks: Architecture; BP training Algorithm; Generalization; Examples (Example 1, Example 2); Uses (applications) of BP networks; Options/Variations on BP (Momentum, Sequential vs. batch, Adaptive learning rates); Appendix; References and suggested reading 18

19 22 [Figures: the steps of the BP training algorithm (feedforward of the input pattern, backpropagation of the error, adjustment of the weights) illustrated on a network with input units X_i, hidden units Z_j and output unit Y_k. Source: Fausett, L.]

23 Let's examine the Training Algorithm Equations. Vectors & matrices make computation easier. [Figure: network with inputs X_1..X_n, hidden units Z_1..Z_p and output Y_1.]

X = [x_1 ... x_n]
V = [v_{1,1} ... v_{1,p}; ... ; v_{n,1} ... v_{n,p}],   V_0 = [v_{0,1} ... v_{0,p}]
W = [w_{1,1} ... w_{1,m}; ... ; w_{p,1} ... w_{p,m}],   W_0 = [w_{0,1} ... w_{0,m}]

Step 4 computation becomes: Z_in = V_0 + X V,   Z = [f(z_in_1) ... f(z_in_p)]
Step 5 computation becomes: Y_in = W_0 + Z W,   Y = [f(y_in_1) ... f(y_in_m)] 23
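In MATLAB, Steps 4 and 5 then reduce to two matrix expressions each. The following is a sketch with made-up layer sizes and random weights, using the matrix layout defined above:

```matlab
% Vectorized feedforward pass (Steps 4 and 5 of the BP training algorithm).
f = @(x) 1 ./ (1 + exp(-x));   % logistic activation

n = 4; p = 3; m = 2;           % layer sizes (illustrative)
X  = rand(1, n);               % input row vector [x_1 ... x_n]
V  = rand(n, p) - 0.5;         % input-to-hidden weights, V(i,j) = v_ij
V0 = rand(1, p) - 0.5;         % hidden biases [v_01 ... v_0p]
W  = rand(p, m) - 0.5;         % hidden-to-output weights, W(j,k) = w_jk
W0 = rand(1, m) - 0.5;         % output biases [w_01 ... w_0m]

Z_in = V0 + X*V;  Z = f(Z_in); % Step 4: hidden-layer activations
Y_in = W0 + Z*W;  Y = f(Y_in); % Step 5: output-layer activations
```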

24 Backpropagation Neural Networks: Architecture; BP training Algorithm; Generalization; Examples (Example 1, Example 2); Uses (applications) of BP networks; Options/Variations on BP (Momentum, Sequential vs. batch, Adaptive learning rates); Appendix; References and suggested reading 24

25 Generalisation Once trained, weights are held constant, and input patterns are applied in feedforward mode - commonly called recall mode. We wish the network to generalize, i.e. to make sensible choices about input vectors which are not in the training set. Commonly we check generalization of a network by dividing the known patterns into a training set, used to adjust the weights, and a test set, used to evaluate the performance of the trained network. 25

26 Generalisation Generalisation can be improved by Using a smaller number of hidden units (network must learn the rule, not just the examples) Not overtraining (occasionally check that error on test set is not increasing) Ensuring training set includes a good mixture of examples No good rule for deciding upon good network size (# of layers, # units per layer) Usually use one input/output per class rather than a continuous variable or binary encoding 26

27 Backpropagation Neural Networks: Architecture; BP training Algorithm; Generalization; Examples (Example 1, Example 2); Uses (applications) of BP networks; Options/Variations on BP (Momentum, Sequential vs. batch, Adaptive learning rates); Appendix; References and suggested reading 27

28 Reference: R. Spillman. Example 1 The XOR function could not be solved by a single-layer perceptron network. The function is:
X Y F
0 0 0
0 1 1
1 0 1
1 1 0 28

29 XOR Architecture [Figure: a 2-2-1 network for XOR. Inputs x_1, x_2 feed two hidden units through weights v_11, v_21 and v_12, v_22 with biases v_01, v_02; the hidden outputs feed a single output unit y through weights w_1, w_2 with bias w_0. Each unit computes a weighted sum Σ followed by the activation f.] 29

30 Initial Weights Randomly assign small weight values: [Figure: the network of slide 29 annotated with its initial weights; the values used in the first feedforward pass on the next slide are v_01 = -.3, v_11 = .2, v_21 = .5, v_02 = .25, v_12 = -.4, v_22 = .1, w_0 = -.4, w_1 = -.2, w_2 = .3.] 30

31 Feedforward 1st Pass Training case: (x_1, x_2, t) = (0, 0, 0). Activation function f: f(x) = 1 / (1 + e^(-x)).
z_in_1 = -.3(1) + .2(0) + .5(0) = -.3,   z_1 = f(-.3) = .43
z_in_2 = .25(1) - .4(0) + .1(0) = .25,   z_2 = f(.25) = .56
y_in = -.4(1) - .2(.43) + .3(.56) = -.318,   y_1 = f(y_in) = .42 (not 0) 31

32 Backpropagate
δ_1 = (t_1 - y_1) f'(y_in_1) = (t_1 - y_1) f(y_in_1)[1 - f(y_in_1)] = (0 - .42)(.42)(1 - .42) = -.102
δ_in_1 = δ_1 w_1 = -.102(-.2) = .02 ;   δ_1 (hidden) = δ_in_1 f'(z_in_1) = .02(.43)(1 - .43) = .005
δ_in_2 = δ_1 w_2 = -.102(.3) = -.03 ;   δ_2 (hidden) = δ_in_2 f'(z_in_2) = -.03(.56)(1 - .56) = -.007 32

33 Calculate the Weight Changes First Pass
Δw_jk = α δ_k z_j and Δw_0k = α δ_k (k = 1);   Δv_ij = α δ_j x_i and Δv_0j = α δ_j (i, j = 1, 2). With α = 1:
Δv_11 = δ_1 x_1 = (.005)(0) = 0    Δv_12 = δ_2 x_1 = (-.007)(0) = 0
Δv_21 = δ_1 x_2 = (.005)(0) = 0    Δv_22 = δ_2 x_2 = (-.007)(0) = 0
Δv_01 = δ_1 = .005                 Δv_02 = δ_2 = -.007
Δw_1 = δ_1 z_1 = (-.102)(.43) = -.044    Δw_2 = δ_1 z_2 = (-.102)(.56) = -.057
Δw_0 = δ_1 = -.102 33
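The numbers of this first pass can be reproduced with a few lines of MATLAB. This is a sketch, assuming the initial weights read off slides 30-31 and α = 1 as in the example:

```matlab
% One feedforward/backprop pass of the XOR example for input (0,0), target 0.
f = @(x) 1 ./ (1 + exp(-x));

x = [0 0];  t = 0;                         % training case
v0 = [-.3 .25]; v = [.2 -.4; .5 .1];       % hidden weights: v(i,j) goes from x_i to z_j
w0 = -.4;       w = [-.2; .3];             % output weights from z_1, z_2

z_in = v0 + x*v;        z = f(z_in);       % z = [.43 .56]
y_in = w0 + z*w;        y = f(y_in);       % y = .42
delta1   = (t - y) * y*(1-y);              % output delta = -.102
delta_in = delta1 * w';                    % [.02 -.03]
deltaZ   = delta_in .* z .* (1-z);         % hidden deltas = [.005 -.007]

dw = delta1 * z';  dw0 = delta1;           % weight changes for w, w0 (alpha = 1)
dv = x' * deltaZ;  dv0 = deltaZ;           % weight changes for v, v0 (zero here except the biases)
```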

34 Update the Weights First Pass [Figure: the network redrawn with the updated weights, e.g. w_2 becomes .3 + (-.057) = .243.] 34

35 Final Result After about 500 iterations: [Figure: the trained XOR network with its final weights.] 35
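Putting the whole cycle together, the following MATLAB sketch trains a 2-2-1 network on XOR with plain sequential backpropagation. It is an illustration in the spirit of the example, not the code behind the slides; the learning rate, iteration count and random initialisation are arbitrary, and a run that stalls in a local minimum may need a fresh random start:

```matlab
% Train a 2-2-1 MLP on XOR with sequential (per-pattern) backpropagation.
f  = @(x) 1 ./ (1 + exp(-x));
X  = [0 0; 0 1; 1 0; 1 1];   T = [0; 1; 1; 0];   % XOR truth table
v  = rand(2,2) - 0.5;  v0 = rand(1,2) - 0.5;     % small random initial weights
w  = rand(2,1) - 0.5;  w0 = rand     - 0.5;
alpha = 0.5;                                     % learning rate (illustrative)

for it = 1:5000                                  % several thousand sweeps through the 4 patterns
    for p = 1:4
        x = X(p,:);  t = T(p);
        z = f(v0 + x*v);                         % feedforward
        y = f(w0 + z*w);
        d1 = (t - y) * y*(1-y);                  % output delta
        dz = (d1 * w') .* z .* (1-z);            % hidden deltas
        w  = w + alpha * d1 * z';   w0 = w0 + alpha * d1;   % update weights
        v  = v + alpha * x' * dz;   v0 = v0 + alpha * dz;
    end
end
disp(f(w0 + f(repmat(v0,4,1) + X*v)*w))          % outputs should approach [0; 1; 1; 0]
```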

36 Backpropagation Neural Networks: Architecture; BP training Algorithm; Generalization; Examples (Example 1, Example 2); Uses (applications) of BP networks; Options/Variations on BP (Momentum, Sequential vs. batch, Adaptive learning rates); Appendix; References and suggested reading 36

37 Reference: Vamsi Pegatraju and Aparna Patsa. Example 2: a 3-3-1 network (n = 3 inputs X_1..X_3, p = 3 hidden units Z_1..Z_3, m = 1 output Y_1) with activation f(x) = 1 / (1 + e^(-x)). Input vector X and hidden-layer weights V with biases V_0 as given on the next slide; output-layer weights W = [-1 1 2]^T with bias W_0 = [-1]. Desired output for this X input: t = 0.9. Learning rate α = 0.3. 37

38 Primary Values: Inputs to Epoch 1 X = [...]; W = [-1 1 2]; W_0 = [-1]; V = [...]; V_0 = [0 0 -1]; Target t = 0.9; α = 0.3. 38

39 Epoch 1 Step 4: Z_in = V_0 + XV = [...]; Z = f(Z_in) = [...]; Step 5: Y_in = W_0 + ZW = [0.34]; Y = f(Y_in) = 0.5772; Sum of Squares Error obtained originally: (0.9 - 0.5772)^2 = 0.1042. 39

40 Step 6: Error = t_k - Y_k = 0.9 - 0.5772 = 0.3228. Now we have only one output and hence the value of k = 1. δ_1 = (t_1 - y_1) f'(Y_in_1). We know that f'(x) for the sigmoid = f(x)(1 - f(x)). δ_1 = (0.9 - 0.5772)(0.5772)(1 - 0.5772) = 0.0788. 40

41 For the intermediate (hidden-to-output) weights we have (j = 1,2,3): ΔW_{j,1} = α δ_1 Z_j. ΔW = (0.3)(0.0788)[Z_1 Z_2 Z_3] = [...]. Bias: ΔW_{0,1} = α δ_1 = (0.3)(0.0788) = 0.0236. 41

42 Step 7: Backpropagation to the first hidden layer. For Z_j (j = 1,2,3), we have δ_in_j = Σ_{k=1..m} δ_k W_{j,k} = δ_1 W_{j,1}. δ_in_1 = -0.0788; δ_in_2 = 0.0788; δ_in_3 = 0.1576. δ_j = δ_in_j f'(Z_in_j) => δ_1 = ...; δ_2 = 0.007; δ_3 = 0.036. 42

43 X = [...]. ΔV_{i,j} = α δ_j X_i. ΔV_1 = [...]; ΔV_2 = [...]; ΔV_3 = [...]; ΔV_0 = α[δ_1 δ_2 δ_3] = [...]. 43

44 Step 8: Updating of W, V, W_0, V_0. W_new = W_old + ΔW = [...]; V_new = V_old + ΔV = [...]; W_0new = ...; V_0new = [...]. Completion of the first epoch. 44

45 Primary Values: Inputs to Epoch 2 X = [...]; W = [...]; W_0 = [...]; V = [...]; V_0 = [...]; Target t = 0.9; α = 0.3. 45

46 Epoch 2 Step 4: Z_in = V_0 + XV = [...]; Z = f(Z_in) = [...]; Step 5: Y_in = W_0 + ZW = [0.3925]; Y = f(Y_in) = 0.5969; Sum of Squares Error obtained from the first epoch: (0.9 - 0.5969)^2 = 0.0919. 46

47 Step 6: Error = t_k - Y_k = 0.9 - 0.5969 = 0.3031. Now again, as we have only one output, the value of k = 1. δ_1 = (t_1 - y_1) f'(Y_in_1) => δ_1 = (0.9 - 0.5969)(0.5969)(1 - 0.5969) = 0.0729. 47

48 For the intermediate (hidden-to-output) weights we have (j = 1,2,3): ΔW_{j,1} = α δ_1 Z_j. ΔW = (0.3)(0.0729)[Z_1 Z_2 Z_3] = [...]. Bias: ΔW_{0,1} = α δ_1 = (0.3)(0.0729) = 0.0219. 48

49 Step 7: Backpropagation to the first hidden layer. For Z_j (j = 1,2,3), we have δ_in_j = Σ_{k=1..m} δ_k W_{j,k} = δ_1 W_{j,1}. δ_in_1 = -0.074; δ_in_2 = 0.0745; δ_in_3 = 0.1469. δ_j = δ_in_j f'(Z_in_j) => δ_1 = ...; δ_2 = 0.0067; δ_3 = 0.0334. 49

50 ΔV_{i,j} = α δ_j X_i. ΔV_1 = [...]; ΔV_2 = [...]; ΔV_3 = [...]; ΔV_0 = α[δ_1 δ_2 δ_3] = [...]. 50

51 Step 8: Updating of W, V, W_0, V_0. W_new = W_old + ΔW = [...]; V_new = V_old + ΔV = [...]; W_0new = ...; V_0new = [...]. Completion of the second epoch. 51

52 Z_in = V_0 + XV = [...]; => Z = f(Z_in) = [...]; Step 5: Y_in = W_0 + ZW = [0.4684]; => Y = f(Y_in) = 0.6150; Sum of Squares Error at the end of the second epoch: (0.9 - 0.6150)^2 = 0.0812. From the last two values of the Sum of Squares Error, we see that the value is gradually decreasing as the weights are getting updated. 52
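For reference, one epoch of the procedure used in this example (Steps 4 to 8, single output unit) can be collected into a short MATLAB function. The function below is a sketch in the slide's notation, not taken from the original notes; save it as bp_epoch.m:

```matlab
function [V, V0, W, W0, sse] = bp_epoch(X, t, V, V0, W, W0, alpha)
% One training pass of backpropagation for a single-output MLP (Fausett notation).
f = @(x) 1 ./ (1 + exp(-x));

Z_in = V0 + X*V;      Z = f(Z_in);        % Step 4: hidden layer
Y_in = W0 + Z*W;      Y = f(Y_in);        % Step 5: output layer
sse  = (t - Y)^2;                         % sum-of-squares error before the update

delta1 = (t - Y) * Y*(1 - Y);             % Step 6: output error term
dW  = alpha * delta1 * Z';                % weight corrections for W and its bias
dW0 = alpha * delta1;

delta_in = delta1 * W';                   % Step 7: backpropagate to the hidden layer
deltaZ   = delta_in .* Z .* (1 - Z);
dV  = alpha * (X' * deltaZ);              % weight corrections for V and its biases
dV0 = alpha * deltaZ;

W = W + dW;  W0 = W0 + dW0;               % Step 8: update all weights
V = V + dV;  V0 = V0 + dV0;
end
```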

53 Backpropagation Neural Networks: Architecture; BP training Algorithm; Generalization; Examples (Example 1, Example 2); Uses (applications) of BP networks; Options/Variations on BP (Momentum, Sequential vs. batch, Adaptive learning rates); Appendix; References and suggested reading 53

54 Functional Approximation Multi-Layer Perceptrons can approximate any continuous function with a two-layer network with squashing activation functions. If the activation functions can vary with the function, one can show that an n-input, m-output function requires at most 2n+1 hidden units. See Fausett for more details. 54

55 Function Approximators Example: a function h(x) approximated by H(w,x) 55

56 Applications We look at a number of applications for backpropagation MLPs. In each case we'll examine: the problem to be solved, the architecture used, and the results. Reference: J. Hertz, A. Krogh, R.G. Palmer, Introduction to the Theory of Neural Computation, Addison Wesley, 1991. 56

57 NETtalk - Specifications Problem is to convert written text to speech. Conventionally, this is done by hand-coded linguistic rules, such as the DECtalk system. NETtalk uses a neural network to achieve similar results Input is written text Output is choice of phoneme for speech synthesiser 57

58 NETtalk - architecture 26 output units, 1-of-26 code representing the most likely phoneme. 80 hidden units, fully interconnected. A 7-letter sliding window (e.g. "T h e _ c a t"), generating the phoneme for the centre character. Input units use a 1-of-29 code => 203 input units (= 29 x 7). 58

59 NETtalk - Results 1024-word training set. After 10 epochs - intelligible speech. After 50 epochs - 95% correct on the training set, 78% correct on the test set. Note that this network must generalise - many input combinations are not in the training set. Results not as good as DECtalk, but significantly less effort to code up. 59

60 Sonar Classifier Task - distinguish between a rock and a metal cylinder from the sonar return off the bottom of a bay. Convert the time-varying input signal to the frequency domain to reduce the input dimension (this is a linear transform and could be done with a fixed-weight neural network). Used a 60-x-2 network with x from 0 to 24. Training took about 200 epochs. With no hidden units it classified about 80% of the training set; with hidden units it classified 100% of the training set and 85% of the test set. 60

61 ALVINN Drives 70 mph on a public highway. 30 outputs for steering. 4 hidden units. 30x32 pixels as inputs. 30x32 weights into each one of the four hidden units. 61

62 Navigation of a Car Task is to control a car on a winding road. Inputs are a 30x32 pixel image from a video camera on the roof and an 8x32 image from a range finder => 1216 inputs. 29 hidden units. 45 output units arranged in a line, 1-of-45 code representing hard-left .. straight-ahead .. hard-right. 62

63 Navigation of Car - Results Training set of 1200 simulated road images. Trained for 40 epochs. Could drive at 5 km/hr on the road, limited by the calculation speed of the feed-forward network. Twice as fast as the best non-net solution. 63

64 Backgammon Trained on 3000 example board scenarios of (position, dice, move) rated from -100 (very bad) to +100 (very good) by a human expert. Some important information such as pip-count and degree-of-trapping was included as input. Some noise was added to the input set (scenarios with random scores). Handcrafted examples were added to the training set to correct obvious errors. 64

65 Backgammon results 459 inputs, 2 hidden layers of 24 units each, plus 1 output for the score (all possible moves evaluated). Won 59% against a conventional backgammon program (41% without the extra info, 45% without noise in the training set). Won the computer olympiad, 1989, but lost to a human expert (not surprising since it was trained on human-scored examples). 65

66 Encoder / Image Compression Wish to encode a number of input patterns in an efficient number of bits for storage or transmission We can use an autoassociative network, i.e. an M-N-M network, where we have M inputs, and N<M hidden units, M outputs, trained with target outputs same as inputs Hidden units need to encode inputs in fewer signals in the hidden layers. Outputs from hidden layer are encoded signal 66

67 Encoders We can store/transmit the hidden values using the first half of the network and decode using the second half. We may need to truncate hidden unit values to fixed precision, which must be considered during training. Cottrell et al. tried 8x8 blocks (8 bits each) of images, encoded in 16 units, giving results similar to conventional approaches. Works best with similar images. 67
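A sketch of such an M-N-M autoassociative network in MATLAB, using the 8x8-block / 16-hidden-unit figures quoted above (the data and weights here are random placeholders, and training proceeds by ordinary backpropagation with target output equal to the input):

```matlab
% Autoassociative (M-N-M) encoder: M = 64 inputs (an 8x8 image block), N = 16 hidden units.
f = @(x) 1 ./ (1 + exp(-x));
M = 64;  N = 16;
V  = rand(M, N) - 0.5;   V0 = rand(1, N) - 0.5;   % encoder half (input -> hidden)
W  = rand(N, M) - 0.5;   W0 = rand(1, M) - 0.5;   % decoder half (hidden -> output)

block   = rand(1, M);                 % one 8x8 image block, flattened (illustrative data)
code    = f(V0 + block*V);            % compressed representation: the N hidden outputs
decoded = f(W0 + code*W);             % reconstruction of the block from the code
% Training: backpropagation with target output = input block, so the hidden layer
% is forced to encode the block in fewer signals.
```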

68 Neural network for OCR: a feedforward network trained using backpropagation. [Figure: an input layer of pixel units, a hidden layer, and an output layer with one unit per letter A, B, C, D, E.] 68

69 Pattern Recognition Post-code (or ZIP code) recognition is a good example - hand-written characters need to be classified. One interesting network used a 16x16 pixel map input of handwritten digits already found and scaled by another system. 3 hidden layers plus a 1-of-10 output layer. The first two hidden layers were feature detectors. 69

70 ZIP code classifier The first hidden layer had the same feature detector connected to 5x5 blocks of the input, at 2-pixel intervals => an 8x8 array of the same detector, each with the same weights but connected to different parts of the input. Twelve such feature detector arrays. Same for the second hidden layer, but 4x4 arrays connected to 5x5 blocks of the first hidden layer, with 12 different features. Conventional 30-unit 3rd hidden layer. 70

71 ZIP Code Classifier - Results Note that the 8x8 and 4x4 arrays of feature detectors use the same weights => many fewer weights to train. Trained on 7300 digits, tested on 2000. Error rates: 1% on the training set, 5% on the test set. If cases with no clear winner are rejected (i.e. the largest output not much greater than the second largest output), then, with 12% rejection, the error rate on the test set is reduced to 1%. Performance improved further by removing more weights: optimal brain damage. 71

72 Backpropagation Neural Networks: Architecture; BP training Algorithm; Generalization; Examples (Example 1, Example 2); Uses (applications) of BP networks; Options/Variations on BP (Momentum, Sequential vs. batch, Adaptive learning rates); Appendix; References and suggested reading 72

73 Heuristics for making BP Better Training with BP is more an art than a science - the result of one's own experience. Normalizing the inputs: preprocess so that the mean value of each input is closer to zero (see the prestd function in Matlab). Input variables should be uncorrelated, e.g. by Principal Component Analysis (PCA); see the prepca and trapca functions in Matlab. 73

74 Sequential vs. Batch update Sequential learning means that a given input pattern is forward propagated, the error is determined and back-propagated, and the weights are updated. Then the same procedure is repeated for the next pattern. Batch learning means that the weights are updated only after the entire set of training patterns has been presented to the network. In other words, all patterns are forward propagated, and the error is determined and back-propagated, but the weights are only updated when all patterns have been processed. Thus, the weight update is only performed every epoch. If P = # patterns in one epoch:

Δw = (1/P) Σ_{p=1..P} Δw_p 74
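The following MATLAB sketch (not from the original notes) shows the two schemes side by side on a deliberately tiny single-unit example; only the point at which the update is applied differs:

```matlab
% Sequential vs. batch updating, illustrated on a single sigmoid unit (delta rule).
f = @(x) 1 ./ (1 + exp(-x));
X = [0 0; 0 1; 1 0; 1 1];   T = [0; 0; 0; 1];    % a small training set (AND function)
w = rand(2,1) - 0.5;  b = rand - 0.5;  alpha = 0.5;
P = size(X,1);

% --- Sequential (on-line): update after every pattern ---
for p = 1:P
    y = f(X(p,:)*w + b);
    d = (T(p) - y) * y*(1-y);
    w = w + alpha * d * X(p,:)';   b = b + alpha * d;
end

% --- Batch: accumulate the corrections over the epoch, apply the average once ---
dw = zeros(2,1);  db = 0;
for p = 1:P
    y  = f(X(p,:)*w + b);
    d  = (T(p) - y) * y*(1-y);
    dw = dw + alpha * d * X(p,:)';  db = db + alpha * d;
end
w = w + dw/P;   b = b + db/P;       % Delta_w = (1/P) * sum_p Delta_w_p
```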

75 Sequential vs. Batch update i.e. in some cases, it is advantageous to accumulate the weight correction terms for several patterns (or even an entire epoch if there are not too many patterns) and make a single weight adjustment (equal to the average of the weight correction terms) for each weight, rather than updating the weights after each pattern is presented. This procedure has a smoothing effect (because of the use of the average) on the correction terms. In some cases, this smoothing may increase the chances of convergence to a local minimum. 75

76 Initial weights Initial weights will influence whether the net reaches a global (or only a local) minimum of the error and, if so, how quickly it converges. The values for the initial weights must not be too large, otherwise the initial input signals to each hidden or output unit will be likely to fall in the region where the derivative of the sigmoid function has a very small value (f'(net) ~ 0): the so-called saturation region. On the other hand, if the initial weights are too small, the net input to a hidden or output unit will be close to zero, which also causes extremely slow learning. Best to set the initial weights (and biases) to random numbers between -0.5 and 0.5 (or between -1 and 1, or some other suitable interval). The values may be +ve or -ve because the final weights after training may be of either sign also. 76
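A minimal MATLAB sketch of this initialisation, using the [-0.5, 0.5] interval suggested above (the layer sizes are arbitrary):

```matlab
% Initialize weights and biases uniformly in [-0.5, 0.5].
n = 3;  p = 4;  m = 1;               % input, hidden, output sizes (illustrative)
V  = rand(n, p) - 0.5;               % input-to-hidden weights
V0 = rand(1, p) - 0.5;               % hidden biases
W  = rand(p, m) - 0.5;               % hidden-to-output weights
W0 = rand(1, m) - 0.5;               % output biases
% Too-large values push units into the sigmoid's saturation region (f'(net) ~ 0);
% too-small values make every net input ~0, which also slows learning.
```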

77 Memorization vs. generalization How long to train the net: since the usual motivation for applying a backprop net is to achieve a balance between memorization and generalization, it is not necessarily advantageous to continue training until the error actually reaches a minimum. Use 2 disjoint sets of data during training: 1/ a set of training patterns and 2/ a set of training-testing patterns (or validation set). Weight adjustments are based on the training patterns; however, at intervals during training, the error is computed using the validation patterns. As long as the error for the validation set decreases, training continues. When the error begins to increase, the net is starting to memorize the training patterns too specifically (it starts to lose its ability to generalize). At this point, training is terminated. 77

78 L. Studer, IPHE-UNIL Early stopping [Figure: error vs. training time. The error on the training set (which changes w_ij) keeps decreasing, while the error on the validation set (which does not change w_ij) eventually starts to rise - stop training there.] 78
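In code, early stopping is a monitoring loop around ordinary training. The MATLAB sketch below uses a deliberately simple single-unit learner and a made-up task purely to show the bookkeeping; it is not from the original notes:

```matlab
% Early stopping: train on one set, monitor the error on a separate validation set.
f = @(x) 1 ./ (1 + exp(-x));
Xtr = rand(20,2);  Ttr = double(sum(Xtr,2) > 1);   % training patterns (illustrative task)
Xva = rand(10,2);  Tva = double(sum(Xva,2) > 1);   % validation patterns (never change the weights)
w = rand(2,1) - 0.5;  b = rand - 0.5;  alpha = 0.5;

best_err = inf;  best_w = w;  best_b = b;  worse = 0;
for epoch = 1:1000
    for p = 1:size(Xtr,1)                          % one epoch of sequential updates
        y = f(Xtr(p,:)*w + b);
        d = (Ttr(p) - y) * y*(1-y);
        w = w + alpha*d*Xtr(p,:)';  b = b + alpha*d;
    end
    val_err = mean((Tva - f(Xva*w + b)).^2);       % error on the validation set
    if val_err < best_err
        best_err = val_err;  best_w = w;  best_b = b;  worse = 0;
    else
        worse = worse + 1;
        if worse >= 10, break; end                 % stop once the validation error keeps rising
    end
end
w = best_w;  b = best_b;                           % keep the weights with lowest validation error
```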

79 Backpropagation with momentum Backpropagation with momentum: the weight change is in a direction that is a combination of 1/ the current gradient and 2/ the previous gradient. Momentum can be added so weights tend to change more quickly if they keep changing in the same direction for several training cycles:

Δw_ij(t+1) = α δ_j x_i + µ Δw_ij(t)

µ is called the momentum factor and ranges over 0 < µ < 1. When subsequent changes are in the same direction, it increases the rate (accelerated descent). When subsequent changes are in opposite directions, it decreases the rate (stabilizes). 79

80 Backpropagation with momentum Weight update equation with momentum:

w_jk(t+1) = w_jk(t) + α δ_k z_j + µ [w_jk(t) - w_jk(t-1)]

Source: Fausett, L., Fundamentals of Neural Networks, Prentice Hall, 1994. 80
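As a sketch (not from the original notes), the momentum term only requires remembering the previous weight change; here it is applied to a single sigmoid unit with made-up values:

```matlab
% Weight update with momentum on a single sigmoid unit (illustrative values).
f = @(x) 1 ./ (1 + exp(-x));
x = [1; 0];  t = 1;                    % one training pattern
w = [0.1; -0.2];  b = 0.05;            % current weights and bias
alpha = 0.2;  mu = 0.9;                % learning rate and momentum factor (0 < mu < 1)
dw_prev = zeros(2,1);  db_prev = 0;    % previous weight changes, Delta_w(t)

for step = 1:100
    y  = f(w'*x + b);
    d  = (t - y) * y*(1-y);
    dw = alpha*d*x + mu*dw_prev;       % Delta_w(t+1) = alpha*delta*x + mu*Delta_w(t)
    db = alpha*d   + mu*db_prev;
    w = w + dw;   b = b + db;
    dw_prev = dw; db_prev = db;
end
```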

81 BP training algorithm: Adaptive Learning Rate. Source: Fausett, L., Fundamentals of Neural Networks, Prentice Hall, 1994. 81

82 Adaptive Learning rate Adaptive Parameters: vary the learning rate during training, accelerating learning slowly if all is well (error E decreasing), but reducing it quickly if things go unstable (E increasing). For example:

α(t+1) = α(t) + a        if ΔE < 0 for the last few epochs
α(t+1) = (1 - b)·α(t)    if ΔE > 0
α(t+1) = α(t)            otherwise

Typically, a = 0.1, b = 0.5. 82
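A MATLAB sketch of this rule (simplified to compare only consecutive epochs rather than the last few; the error sequence is a made-up illustration):

```matlab
% Adaptive learning rate: grow alpha slowly while the error falls, cut it quickly when it rises.
a = 0.1;  b = 0.5;  alpha = 0.01;
E = [1.0 0.9 0.85 0.8 0.82 0.7 0.65];     % example error values, one per epoch (illustrative)

for t = 2:length(E)
    dE = E(t) - E(t-1);
    if dE < 0
        alpha = alpha + a;                % error decreasing: accelerate slowly
    elseif dE > 0
        alpha = (1 - b) * alpha;          % error increasing: cut the rate quickly
    end                                   % otherwise leave alpha unchanged
end
```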

83 Matlab BP NN Architecture A neuron with a single R-element input vector is shown below. Here the individual element inputs are multiplied by weights and the weighted values are fed to the summing junction. Their sum is simply Wp, the dot product of the (single row) matrix W and the vector p. The neuron has a bias b, which is summed with the weighted inputs to form the net input n. This sum, n, is the argument of the transfer function f. This expression can, of course, be written in MATLAB code as: n = W*p + b However, the user will seldom be writing code at this low level, for such code is already built into functions to define and simulate entire networks. 83

84 Matlab BP NN Architecture 84
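For example, the single-neuron computation described on the previous slide can be evaluated directly; the numeric values and the choice of a log-sigmoid transfer function below are illustrative, not from the original notes:

```matlab
% Single neuron with an R-element input vector: n = W*p + b, a = f(n).
p = [2; -1; 0.5];             % input vector (R = 3 elements, illustrative values)
W = [0.3 -0.2 0.1];           % single-row weight matrix
b = 0.4;                      % bias

n = W*p + b;                  % net input: dot product of W and p, plus the bias
a = 1/(1 + exp(-n));          % output through a log-sigmoid transfer function
```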

85 Backpropagation Neural Networks: Architecture; BP training Algorithm; Generalization; Examples (Example 1, Example 2); Uses (applications) of BP networks; Options/Variations on BP (Momentum, Sequential vs. batch, Adaptive learning rates); Appendix; References and suggested reading 85

86 Learning Rule (Fausett, section 6.3, p. 324) Similar to the Delta Rule. Our goal is to minimize the error E, which is the difference between the targets t_k and our outputs y_k, using a least squares error measure:

E = 1/2 Σ_k (t_k - y_k)^2

To find out how to change w_jk and v_ij to reduce E, we need to find ∂E/∂w_jk and ∂E/∂v_ij. 86

87 Delta Rule Derivation: Hidden-to-Output

E = 0.5 Σ_k (t_k - y_k)^2, hence

∂E/∂w_JK = -(t_K - y_K) ∂y_K/∂w_JK

where y_k = f(y_in_k) and y_in_k = Σ_j z_j w_jk.

Notice the difference between the subscript k (which ranges over all nodes between the hidden and output layers) and K (which represents the particular node K of interest).

∂E/∂w_JK = -(t_K - y_K) f'(y_in_K) ∂(y_in_K)/∂w_JK = -(t_K - y_K) f'(y_in_K) z_J 87

88 Delta Rule Derivation: Hidden-to-Output It is convenient to define:

δ_K = (t_K - y_K) f'(y_in_K)

Thus -∂E/∂w_jk = (t_k - y_k) f'(y_in_k) z_j = δ_k z_j, and Δw_jk = -α ∂E/∂w_jk = α δ_k z_j.

In summary: Δw_jk = α δ_k z_j, with δ_K = (t_K - y_K) f'(y_in_K). 88

89 Delta Rule Derivation: Input to Hidden

E = 0.5 Σ_k (t_k - y_k)^2, hence

∂E/∂v_IJ = -Σ_k (t_k - y_k) ∂y_k/∂v_IJ = -Σ_k (t_k - y_k) f'(y_in_k) ∂(y_in_k)/∂v_IJ

where y_k = f(y_in_k) and y_in_k = Σ_j z_j w_jk.

∂E/∂v_IJ = -Σ_k δ_k ∂(y_in_k)/∂v_IJ = -Σ_k δ_k w_Jk ∂z_J/∂v_IJ = -Σ_k δ_k w_Jk f'(z_in_J) x_I

Notice the difference between the subscripts j and J, and i and I. It is convenient to define:

δ_J = Σ_k δ_k w_Jk f'(z_in_J)

so that Δv_ij = -α ∂E/∂v_ij = α f'(z_in_j) x_i Σ_k δ_k w_jk = α δ_j x_i 89

90 Delta Rule Derivation: Input to Hidden In summary: Δv_ij = α δ_j x_i, where δ_J = Σ_k δ_k w_Jk f'(z_in_J). 90
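The two summary formulas translate almost line-for-line into code. The MATLAB sketch below applies them to one training pattern with made-up random data, assuming sigmoid activations so that f'(u) = f(u)(1 - f(u)):

```matlab
% Delta-rule updates for both layers, for one pattern (x, t).
f = @(u) 1 ./ (1 + exp(-u));
n = 3; p = 2; m = 2;  alpha = 0.25;               % sizes and learning rate (illustrative)
x = rand(1,n);  t = rand(1,m);                    % input and target (illustrative)
v = rand(n,p) - 0.5;  w = rand(p,m) - 0.5;        % weights (biases omitted for brevity)

z_in = x*v;        z = f(z_in);                   % hidden layer
y_in = z*w;        y = f(y_in);                   % output layer

delta_k = (t - y) .* y .* (1-y);                  % delta_K = (t_K - y_K) f'(y_in_K)
delta_j = (delta_k * w') .* z .* (1-z);           % delta_J = sum_k delta_k w_Jk f'(z_in_J)

w = w + alpha * (z' * delta_k);                   % Delta_w_jk = alpha * delta_k * z_j
v = v + alpha * (x' * delta_j);                   % Delta_v_ij = alpha * delta_j * x_i
```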

91 Backpropagation Neural Networks: Architecture; BP training Algorithm; Generalization; Examples (Example 1, Example 2); Uses (applications) of BP networks; Options/Variations on BP (Momentum, Sequential vs. batch, Adaptive learning rates); Appendix; References and suggested reading 91

92 Suggested Reading 1. L. Fausett, Fundamentals of Neural Networks, Prentice-Hall, 1994, Chapter 6. 92

93 References: These lecture notes were based on the reference of the previous slide, and the following references. 1. Eric Plummer, University of Wyoming 2. Clara Boyd, Columbia Univ. N.Y., comet.ctr.columbia.edu/courses/elen_e40/2002/artificial.ppt 3. Dan St. Clair, University of Missouri-Rolla, 404_fall200/Lectures/Lect09_0230/ 4. Vamsi Pegatraju and Aparna Patsa: web.umr.edu/~stclair/class/classfiles/cs404_fs02/Lectures/Lect09_02902/Lect8_Homework/L8_3.ppt 5. Richard Spillman, Pacific Lutheran University 6. Khurshid Ahmad and Matthew Casey, Univ. Surrey 93
