Lecture: Multilayer neural networks
Milos Hauskrecht, milos@cs.pitt.edu, 5329 Sennott Square

Midterm exam: Monday, March 2, 2015. In-class (75 minutes), closed book; covers the material through February 25, 2015.
Multilayer neural networks: another way of modeling nonlinearities for regression and classification problems.

Classification with the linear model. The logistic regression model defines a linear decision boundary. Example: 2 classes (blue and red points).
[Figure: two-class scatter plot with a linear decision boundary]
Linear decision boundary: the logistic regression model is not optimal here, but not that bad.
[Figure: two-class data separated reasonably well by a linear boundary]

When does logistic regression fail? An example in which the logistic regression model fails:
[Figure: two-class data that no linear boundary separates well]
Limitations of linear units. Logistic regression does not work for parity functions: no linear decision boundary exists.
[Figure: XOR-like data, with no separating line]

Solution: a model of a non-linear decision boundary.

Extensions of simple linear units: use feature (basis) functions to model nonlinearities.

Linear regression:   f(x, w) = w_0 + \sum_{i=1}^{m} w_i \phi_i(x)
Logistic regression: f(x, w) = g( w_0 + \sum_{i=1}^{m} w_i \phi_i(x) )

where \phi_i(x) is an arbitrary function of x.
Learning with extended linear units. Feature (basis) functions model nonlinearities:

Linear regression:   f(x, w) = w_0 + \sum_{i=1}^{m} w_i \phi_i(x)
Logistic regression: f(x, w) = g( w_0 + \sum_{i=1}^{m} w_i \phi_i(x) )

Important property: this is the same problem as learning the weights of linear units; the input has changed, but the model remains linear in the weights with respect to the new input.
Problem: too many weights to learn.

Multilayered neural networks: an alternative way to introduce nonlinearities into regression/classification models.
Key idea: cascade several simple neural models with logistic units, much like neuron connections.
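The extended linear unit above can be illustrated with a minimal sketch: fitting a linear-in-the-weights model with polynomial basis functions \phi_i(x) = x^i by least squares. The polynomial choice, the function names, and the target coefficients are illustrative assumptions, not from the slides.

```python
# Sketch: linear regression with feature (basis) functions.
# Assumption: polynomial features phi_i(x) = x**i (any basis would do).
import numpy as np

def polynomial_features(x, m):
    """Map scalar inputs x to the feature vectors [1, x, x^2, ..., x^m]."""
    return np.vstack([x**i for i in range(m + 1)]).T

def fit_linear_basis(x, y, m):
    """Least-squares weights for f(x) = sum_i w_i * phi_i(x).
    The problem is linear in w even though f is nonlinear in x."""
    Phi = polynomial_features(x, m)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

x = np.linspace(-1, 1, 50)
y = 2 - 3 * x + 0.5 * x**2          # a quadratic target (no noise)
w = fit_linear_basis(x, y, m=2)
print(w)                             # close to [2, -3, 0.5]
```

The point of the slide's "important property": the same least-squares machinery used for plain linear units works unchanged once the inputs are replaced by the basis-function outputs.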
Multilayer neural network. Also called a multilayer perceptron (MLP). Cascades multiple logistic regression units. Example: a (two-layer) classifier with non-linear decision boundaries.
[Figure: input layer x_1, x_2, ..., x_d; hidden layer with weights w_{i,j}^{(1)} and outputs z_1^{(1)}, z_2^{(1)}; output layer computing p(y = 1 | x)]

Multilayer neural network. Models non-linearity through logistic regression units. Can be applied to both regression and binary classification problems:
  regression:     output f(x, w)
  classification: output p(y = 1 | x, w)
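The cascade of logistic units can be sketched as a forward pass. This is a minimal illustration with assumed shapes and names (W1, b1, w2, b2 are my notation, not the slides'); the hidden units and the output unit are all sigmoids, matching the classification case.

```python
# Minimal forward pass of a two-layer MLP classifier:
# a hidden layer of logistic units feeding one logistic output unit.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, b1, w2, b2):
    """Return p(y=1|x) for a network with one hidden layer."""
    z1 = W1 @ x + b1        # inputs to the hidden sigmoids
    h = sigmoid(z1)         # hidden-layer outputs
    z2 = w2 @ h + b2        # input to the output sigmoid
    return sigmoid(z2)      # p(y = 1 | x, w)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2)); b1 = rng.normal(size=3)   # 2 inputs, 3 hidden units
w2 = rng.normal(size=3);      b2 = 0.0
p = mlp_forward(np.array([0.5, -1.0]), W1, b1, w2, b2)
print(p)                    # a probability in (0, 1)
```

For regression the final `sigmoid` would simply be dropped, leaving a linear output unit.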
Multilayer neural network. Non-linearities are modeled using multiple hidden logistic regression units (organized in layers). The output layer determines whether it is a regression or a binary classification problem.
[Figure: input layer, hidden layers, output layer; regression output f(x, w) or classification output p(y = 1 | x, w)]

Learning with an MLP. How to learn the parameters of the neural network? Gradient descent. Weight updates are based on the error J(D, w):
  w \leftarrow w - \alpha \nabla_w J(D, w)
We need to compute gradients for the weights in all units. They can be computed in one backward sweep through the net! The process is called back-propagation.
Backpropagation. Notation, for levels (k-1), k, (k+1):
  x_i^{(k)}     - output of unit i on level k
  z_i^{(k)}     - input to the sigmoid function of unit i on level k:
                    z_i^{(k)} = w_{i,0}^{(k)} + \sum_j w_{i,j}^{(k)} x_j^{(k-1)},   x_i^{(k)} = g(z_i^{(k)})
  w_{i,j}^{(k)} - weight between unit j on level (k-1) and unit i on level k

Backpropagation. Update weight w_{i,j}^{(k)} using a data point D_u = {x, y}:
  w_{i,j}^{(k)} \leftarrow w_{i,j}^{(k)} - \alpha \frac{\partial J(D_u, w)}{\partial w_{i,j}^{(k)}}
Let
  \delta_i^{(k)} = \frac{\partial J(D_u, w)}{\partial z_i^{(k)}}
Then:
  \frac{\partial J(D_u, w)}{\partial w_{i,j}^{(k)}} = \delta_i^{(k)} x_j^{(k-1)}
such that \delta_i^{(k)} is computed from x_i^{(k)} and the \delta_l^{(k+1)} of the next layer:
  \delta_i^{(k)} = x_i^{(k)} (1 - x_i^{(k)}) \sum_l \delta_l^{(k+1)} w_{l,i}^{(k+1)}
Last unit (the same as for the regular linear units):
  \delta^{(K)} = -(y_u - f(x_u, w))
It is the same for classification with the log-likelihood measure of fit and for linear regression with the least-squares error!
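The delta recursion above can be checked numerically. The sketch below implements the two backpropagation equations for a one-hidden-layer regression network with squared-error loss J = (1/2)(y - f)^2 (bias terms omitted for brevity), and compares one backprop gradient entry against a finite-difference estimate. All names are my own.

```python
# Backpropagation deltas for one hidden sigmoid layer and a linear output,
# verified against a finite-difference gradient.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, w2):
    h = sigmoid(W1 @ x)       # hidden-layer outputs x^(1)
    f = w2 @ h                # linear output unit (regression)
    return h, f

def backprop(x, y, W1, w2):
    h, f = forward(x, W1, w2)
    delta_out = -(y - f)                        # delta at the last unit
    grad_w2 = delta_out * h                     # dJ/dw = delta * x^(k-1)
    delta_h = h * (1 - h) * (delta_out * w2)    # delta recursion
    grad_W1 = np.outer(delta_h, x)
    return grad_W1, grad_w2

rng = np.random.default_rng(1)
x = np.array([0.3, -0.7]); y = 1.0
W1 = rng.normal(size=(3, 2)); w2 = rng.normal(size=3)
gW1, gw2 = backprop(x, y, W1, w2)

# finite-difference check of dJ/dW1[0,0]
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
num = (0.5 * (y - forward(x, W1p, w2)[1])**2
       - 0.5 * (y - forward(x, W1, w2)[1])**2) / eps
print(abs(num - gW1[0, 0]))   # should be tiny
```

Because the last-unit delta has the same form for least-squares regression and log-likelihood classification, only `forward` and `delta_out` would change for the classification case.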
Learning with an MLP. Online gradient descent algorithm. Weight update:
  w_{i,j}^{(k)} \leftarrow w_{i,j}^{(k)} - \alpha \frac{\partial J_{online}(D_u, w)}{\partial w_{i,j}^{(k)}}
  \frac{\partial J_{online}(D_u, w)}{\partial w_{i,j}^{(k)}} = \delta_i^{(k)} x_j^{(k-1)}
where
  x_j^{(k-1)}    - the j-th output of the (k-1)-th layer
  \delta_i^{(k)} - a derivative computed via backpropagation
  \alpha         - a learning rate

Online gradient descent algorithm for an MLP:
  Online-gradient-descent(D, number of iterations)
    initialize all weights w_{i,j}^{(k)}
    for i = 1 : number of iterations do
      select a data point D_u = <x, y> from D
      set the learning rate \alpha
      compute the outputs x_j^{(k)} for each unit
      compute the derivatives \delta_i^{(k)} via backpropagation
      update all weights (in parallel): w_{i,j}^{(k)} \leftarrow w_{i,j}^{(k)} - \alpha \delta_i^{(k)} x_j^{(k-1)}
    end for
    return weights w
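The pseudocode can be translated into a short Python sketch for one hidden layer, assuming a regression output with squared-error loss and random selection of data points; the function signature, hidden-layer size, and toy target below are illustrative assumptions.

```python
# Hedged sketch of Online-gradient-descent(D, number of iterations)
# for a one-hidden-layer regression MLP (biases omitted for brevity).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def online_gradient_descent(D, n_iters, alpha=0.05, n_hidden=5, seed=0):
    rng = np.random.default_rng(seed)
    d = D[0][0].shape[0]
    W1 = rng.normal(scale=0.5, size=(n_hidden, d))   # initialize all weights
    w2 = rng.normal(scale=0.5, size=n_hidden)
    for _ in range(n_iters):
        x, y = D[rng.integers(len(D))]               # select a data point D_u
        h = sigmoid(W1 @ x)                          # compute outputs x^(k)
        f = w2 @ h
        d_out = -(y - f)                             # delta at the output unit
        d_hid = h * (1 - h) * (d_out * w2)           # deltas via backpropagation
        w2 -= alpha * d_out * h                      # update all weights
        W1 -= alpha * np.outer(d_hid, x)
    return W1, w2

# toy usage: learn f(x) = x1 - x2 from 50 random points
rng = np.random.default_rng(1)
D = [(x, x[0] - x[1]) for x in rng.normal(size=(50, 2))]
W1, w2 = online_gradient_descent(D, n_iters=20000)
err = np.mean([(y - w2 @ sigmoid(W1 @ x))**2 for x, y in D])
print(err)   # should be far below the error of predicting 0
```

A fixed learning rate is used here for simplicity; the pseudocode's "set learning rate" step also allows a schedule that decays over iterations.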
XOR example: a linear decision boundary does not exist.
[Figure: XOR data, two classes at opposite corners]

XOR example: a linear unit (fails to separate the classes).
[Figure: linear unit output surface]
XOR example: a neural network with 2 hidden units.
[Figure: learned non-linear decision boundary]

XOR example: a neural network with 10 hidden units.
[Figure: learned non-linear decision boundary]
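The XOR case can be reproduced with the online gradient descent algorithm from the previous slide: a 2-hidden-unit MLP with a logistic output and the log-likelihood measure of fit. The seed, learning rate, and iteration count below are illustrative assumptions; with an unlucky initialization such a small network can still get stuck, so the sketch only demonstrates that the loss decreases.

```python
# Training a 2-hidden-unit MLP on XOR by online gradient descent,
# logistic output with log-likelihood (cross-entropy) fit.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([0., 1., 1., 0.])                 # XOR labels

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 2)); b1 = np.zeros(2)
w2 = rng.normal(size=2);      b2 = 0.0

def loss():
    p = sigmoid(sigmoid(X @ W1.T + b1) @ w2 + b2)
    return -np.mean(Y * np.log(p) + (1 - Y) * np.log(1 - p))

loss0 = loss()
alpha = 0.5
for it in range(5000):
    x, y = X[it % 4], Y[it % 4]                # cycle through the data points
    h = sigmoid(W1 @ x + b1)
    p = sigmoid(w2 @ h + b2)
    d2 = p - y                                 # output delta (log-likelihood)
    dh = h * (1 - h) * (d2 * w2)               # hidden deltas via backprop
    w2 -= alpha * d2 * h;           b2 -= alpha * d2
    W1 -= alpha * np.outer(dh, x);  b1 -= alpha * dh
print(loss0, loss())                            # loss should drop
```

Two hidden units are the minimum for XOR; with 10 hidden units (as on the slide) the optimization becomes much less sensitive to initialization.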
MLP in practice. Optical character recognition of digits (20x20 pixels); automatic sorting of mail. A 5-layer network with multiple output functions: 10 outputs (0, 1, ..., 9); 20x20 = 400 inputs.

  Layer | Neurons | Weights
  ------+---------+--------
    5   |    10   |  3000
    4   |   300   |  1200
    3   |  1200   | 50000
    2   |   784   |  3136
    1   |  3136   | 78400