Neural Networks: Perceptrons and Backpropagation
Silke Bussen-Heyen
Universität Bremen, Fachbereich 3
5th of November 2012
Neural Networks 1 / 17
Contents
1 Introduction
2 Units
3 Network structure
4 Single-layer feed-forward neural network
5 Multilayer feed-forward neural network
6 Backpropagation
Introduction
motivated by neurons in the brain: collection, processing, dissemination of electrical signals
assumption: information processing in humans emerges from networks of neurons
idea in AI: create artificial neural networks
synonyms: connectionism, parallel distributed processing, neural computation
Units
neural networks consist of units
units are connected by directed links
a link from unit j to unit i propagates the activation a_j
each link has a weight W_j,i
Figure 1: mathematical model for a neuron
Processing in a unit
1 process input: compute the weighted sum of the inputs: in_i = Σ_{j=0}^n W_j,i a_j
2 derive output: apply the activation function g: a_i = g(in_i) = g(Σ_{j=0}^n W_j,i a_j)
Figure 2: mathematical model for a neuron
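The two steps above can be sketched directly in code. This is a minimal illustration, assuming the usual convention that index 0 is a fixed bias input a_0 = −1; the weights and inputs are arbitrary example values.

```python
import math

def unit_output(weights, inputs, g):
    """One unit: a_i = g(sum_j W_j,i * a_j).

    weights[0] is the bias weight W_0,i, paired with the fixed input a_0 = -1.
    """
    in_i = sum(w * a for w, a in zip(weights, inputs))  # step 1: weighted sum
    return g(in_i)                                      # step 2: activation

sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))

# bias weight 0.5, two input weights of 1.0; inputs a_0 = -1, a_1 = 1, a_2 = 0
print(unit_output([0.5, 1.0, 1.0], [-1, 1, 0], sigmoid))  # sigmoid(0.5) ≈ 0.622
```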
Activation function g
requirements:
1 unit active (near +1) when the combination of features is detected
2 unit inactive (near 0) when the combination of features is not detected
possible functions: Figure 3: (a) threshold function, (b) sigmoid function
the sigmoid function is differentiable
the threshold is given by the bias weight W_0,i (with fixed input a_0 = −1): a_i = g(Σ_{j=1}^n W_j,i a_j + a_0 W_0,i)
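The two candidate activation functions from Figure 3 can be written out as follows; this is a small sketch, with the sigmoid's derivative included since differentiability is what matters for learning later on.

```python
import math

def threshold(x):
    """Hard threshold: 1 when the weighted input is non-negative, else 0."""
    return 1 if x >= 0 else 0

def sigmoid(x):
    """Differentiable soft threshold."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    """Derivative of the sigmoid: g'(x) = g(x) * (1 - g(x))."""
    g = sigmoid(x)
    return g * (1.0 - g)

print(threshold(-0.3), threshold(0.3))  # 0 1
print(sigmoid(0.0))                     # 0.5
print(sigmoid_prime(0.0))               # 0.25
```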
Units as Boolean gates
units can represent Boolean gates
Boolean functions can be computed
Figure 4: Boolean gates
AND: a_1 = 1, a_2 = 1: in_and = 1·1 + 1·1 − 1.5 = 0.5 > 0, so a_and = 1
Network structure
The network consists of
1 input units I
2 hidden units H
3 output units O
Figure 5: simple neural network
inputs x = (x_1, x_2) = (a_1, a_2)
a_5 = g(W_3,5 a_3 + W_4,5 a_4)
a_5 = g(W_3,5 g(W_1,3 a_1 + W_2,3 a_2) + W_4,5 g(W_1,4 a_1 + W_2,4 a_2))
the function h_W(x) is computed
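The nested expression for a_5 can be evaluated layer by layer. A minimal sketch of h_W(x) for the 2-2-1 network of Figure 5; the concrete weight values are arbitrary, chosen only for illustration.

```python
import math

def g(x):
    """Sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))

def h_W(x1, x2, W):
    """Forward pass: a_5 = g(W_3,5 g(W_1,3 a_1 + W_2,3 a_2) + W_4,5 g(W_1,4 a_1 + W_2,4 a_2))."""
    a1, a2 = x1, x2
    a3 = g(W[(1, 3)] * a1 + W[(2, 3)] * a2)   # hidden unit 3
    a4 = g(W[(1, 4)] * a1 + W[(2, 4)] * a2)   # hidden unit 4
    a5 = g(W[(3, 5)] * a3 + W[(4, 5)] * a4)   # output unit 5
    return a5

# example weights (arbitrary values for illustration)
W = {(1, 3): 0.5, (2, 3): -0.5, (1, 4): 0.3, (2, 4): 0.8,
     (3, 5): 1.0, (4, 5): -1.0}
print(h_W(1.0, 0.0, W))
```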
Network structure
units are arranged in layers
Figure 6: simple neural network
neural networks are used for classification or regression
1 binary classification
2 k-class classification
Single-layer feed-forward neural network: Perceptron
single-layer network
majority function with n inputs: W_j = 1, W_0 = n/2
a = g(in) = g(Σ_{j=0}^n W_j a_j)
the perceptron returns 1 if the weighted sum is greater than 0: Σ_{j=0}^n W_j x_j > 0, i.e. W · x > 0
W · x = 0 defines a hyperplane in input space
Figure 7: perceptron network
Figure 8: two-variate function
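The majority function above makes a concrete example: with all weights 1 and bias weight n/2 (on the fixed input a_0 = −1), the perceptron fires exactly when more than half the inputs are 1. A small sketch:

```python
def majority(inputs):
    """Perceptron for the majority function: W_j = 1, bias weight W_0 = n/2,
    so W · x = sum(inputs) - n/2 and the unit fires iff that sum is > 0."""
    n = len(inputs)
    s = sum(inputs) - n / 2   # W · x with the bias folded in
    return 1 if s > 0 else 0

print(majority([1, 1, 0]))        # 1: two of three inputs are on
print(majority([1, 0, 0]))        # 0: only one of three
print(majority([1, 1, 1, 0, 0]))  # 1: three of five
```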
Linear separator
the perceptron is a linear separator
Figure 9: separability
Perceptron learning
the weights W_j constitute the weight space
the squared error is the measure for the error: E = 1/2 Err² = 1/2 (y − h_W(x))²
derivative of E with respect to W_j:
∂E/∂W_j = Err · ∂Err/∂W_j = Err · ∂/∂W_j (y − g(Σ_{j=0}^n W_j x_j)) = −Err · g′(in) · x_j
weight update, where α is the learning rate:
W_j ← W_j + α · Err · g′(in) · x_j
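The update rule W_j ← W_j + α · Err · g′(in) · x_j can be sketched for a single sigmoid unit; the training example, initial weights, and learning rate below are assumed values for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def perceptron_update(W, x, y, alpha=0.5):
    """One step of the perceptron learning rule with a sigmoid unit:
    W_j <- W_j + alpha * Err * g'(in) * x_j, where Err = y - h_W(x)."""
    in_ = sum(w * xj for w, xj in zip(W, x))
    out = sigmoid(in_)
    err = y - out
    g_prime = out * (1.0 - out)   # g'(in) for the sigmoid
    return [w + alpha * err * g_prime * xj for w, xj in zip(W, x)]

# repeated updates on one example drive the output toward the target y = 1
W = [0.0, 0.0]
x = [1.0, 1.0]
for _ in range(1000):
    W = perceptron_update(W, x, 1.0)
print(sigmoid(W[0] * x[0] + W[1] * x[1]))  # close to 1
```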
Multilayer feed-forward neural network
many output units are possible: h_W(x)
each example has an output vector y
Err = y − h_W(x)
but how can the error at the hidden layers be computed?
Figure 10: multilayer feed-forward network
Backpropagation
back-propagate the error from the output layer
multiple output units: Err_i is the i-th component of y − h_W
error term: Δ_i = Err_i · g′(in_i)
weight update: W_j,i ← W_j,i + α · a_j · Δ_i
updating the weights of the hidden units: node j is responsible for a fraction of Δ_i
divide according to W_j,i and back-propagate: Δ_j = g′(in_j) Σ_i W_j,i Δ_i
weight update: W_k,j ← W_k,j + α · a_k · Δ_j
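One full backpropagation step on the 2-2-1 network from Figure 5 makes the two update rules concrete. The weights, example, and learning rate below are assumed values; Δ terms are written as d3, d4, d5.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

alpha = 0.5                                   # learning rate (assumed)
a1, a2, y = 1.0, 0.0, 1.0                     # training example (assumed)
W13, W23, W14, W24, W35, W45 = 0.5, -0.5, 0.3, 0.8, 1.0, -1.0  # assumed weights

# forward pass
in3 = W13 * a1 + W23 * a2; a3 = sigmoid(in3)
in4 = W14 * a1 + W24 * a2; a4 = sigmoid(in4)
in5 = W35 * a3 + W45 * a4; a5 = sigmoid(in5)

# output layer: Delta_5 = Err_5 * g'(in_5)
d5 = (y - a5) * a5 * (1 - a5)
# hidden layer: Delta_j = g'(in_j) * sum_i W_j,i * Delta_i  (only i = 5 here)
d3 = a3 * (1 - a3) * W35 * d5
d4 = a4 * (1 - a4) * W45 * d5

# weight updates: W_j,i <- W_j,i + alpha * a_j * Delta_i
W35 += alpha * a3 * d5; W45 += alpha * a4 * d5
W13 += alpha * a1 * d3; W23 += alpha * a2 * d3
W14 += alpha * a1 * d4; W24 += alpha * a2 * d4
print(a5, d5)
```

After the update, a repeated forward pass yields a smaller squared error on this example, which is exactly what gradient descent promises for a small enough step.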
Backpropagation, mathematical: output layer
squared error: E = 1/2 Σ_i (y_i − a_i)²
derivative of E with respect to W_j,i:
∂E/∂W_j,i = −(y_i − a_i) ∂a_i/∂W_j,i
= −(y_i − a_i) ∂g(in_i)/∂W_j,i
= −(y_i − a_i) g′(in_i) ∂in_i/∂W_j,i
= −(y_i − a_i) g′(in_i) ∂/∂W_j,i (Σ_j W_j,i a_j)
= −(y_i − a_i) g′(in_i) a_j = −a_j Δ_i
Backpropagation, mathematical: hidden layer
derivative of E with respect to W_k,j:
∂E/∂W_k,j = −Σ_i (y_i − a_i) ∂a_i/∂W_k,j
= −Σ_i (y_i − a_i) ∂g(in_i)/∂W_k,j
= −Σ_i (y_i − a_i) g′(in_i) ∂in_i/∂W_k,j
= −Σ_i Δ_i ∂/∂W_k,j (Σ_j W_j,i a_j)
= −Σ_i Δ_i W_j,i ∂a_j/∂W_k,j
= −Σ_i Δ_i W_j,i ∂g(in_j)/∂W_k,j
= −Σ_i Δ_i W_j,i g′(in_j) ∂in_j/∂W_k,j
= −Σ_i Δ_i W_j,i g′(in_j) ∂/∂W_k,j (Σ_k W_k,j a_k)
= −Σ_i Δ_i W_j,i g′(in_j) a_k = −a_k Δ_j
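The derivation can be checked numerically: the analytic gradient ∂E/∂W_3,5 = −a_3 Δ_5 should match a central-difference estimate of E. A small sketch on the 2-2-1 network with assumed weights:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def error(W, x, y):
    """Squared error E = 1/2 (y - a_5)^2 for the 2-2-1 network (weights assumed)."""
    a3 = sigmoid(W["13"] * x[0] + W["23"] * x[1])
    a4 = sigmoid(W["14"] * x[0] + W["24"] * x[1])
    a5 = sigmoid(W["35"] * a3 + W["45"] * a4)
    return 0.5 * (y - a5) ** 2

W = {"13": 0.5, "23": -0.5, "14": 0.3, "24": 0.8, "35": 1.0, "45": -1.0}
x, y = (1.0, 0.0), 1.0

# analytic gradient dE/dW_3,5 = -a_3 * Delta_5, as derived above
a3 = sigmoid(W["13"] * x[0] + W["23"] * x[1])
a4 = sigmoid(W["14"] * x[0] + W["24"] * x[1])
in5 = W["35"] * a3 + W["45"] * a4
a5 = sigmoid(in5)
d5 = (y - a5) * a5 * (1 - a5)
analytic = -a3 * d5

# numerical gradient via central differences
eps = 1e-6
Wp = dict(W); Wp["35"] += eps
Wm = dict(W); Wm["35"] -= eps
numeric = (error(Wp, x, y) - error(Wm, x, y)) / (2 * eps)

print(abs(analytic - numeric) < 1e-6)  # True
```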
Overfitting
what network structure is appropriate?
how large should the layers be?
the more parameters, the more precise the prediction on the training data
but then new examples are not predicted well