Lecture: Multi-layer neural networks
Milos Hauskrecht
milos@cs.pitt.edu
5329 Sennott Square

Linear regression (linear unit):
  f(x) = w^T x
Logistic regression (logistic unit):
  f(x) = p(y = 1 | x, w) = g(w^T x)

Gradient update:
  w <- w + α Σ_{i=1}^n (y_i − f(x_i)) x_i
Online gradient update:
  w <- w + α (y − f(x)) x
The update has the same form for both models.
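The shared online update rule can be sketched in a few lines of Python (an illustrative sketch, not from the lecture; the function names and the `logistic` flag are my own):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def online_update(w, x, y, alpha, logistic=False):
    """One online gradient step: w <- w + alpha * (y - f(x)) * x.
    The same form covers a linear unit (f = w.x, least squares) and a
    logistic unit (f = g(w.x), log-likelihood objective)."""
    f = sum(wi * xi for wi, xi in zip(w, x))
    if logistic:
        f = sigmoid(f)
    return [wi + alpha * (y - f) * xi for wi, xi in zip(w, x)]
```

Only the definition of f(x) changes between the two models; the error term (y − f(x)) and the update itself are identical.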
Limitations of basic linear units

Linear regression:
  f(x) = w_0 + Σ_{i=1}^d w_i x_i
Logistic regression:
  f(x) = p(y = 1 | x, w) = g(w_0 + Σ_{i=1}^d w_i x_i)
The function is linear in the inputs! The decision boundary is linear!

Extensions of simple linear units: use feature (basis) functions to model nonlinearities.
Linear regression:
  f(x) = w_0 + Σ_{j=1}^m w_j φ_j(x)
  where φ_j(x) is an arbitrary function of x
Logistic regression:
  f(x) = g(w_0 + Σ_{j=1}^m w_j φ_j(x))
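Basis functions keep the model linear in the weights w while making it nonlinear in the inputs x. A minimal sketch for a quadratic model of two inputs (the feature ordering is my own choice):

```python
def quadratic_features(x1, x2):
    """phi(x) for a quadratic model of two inputs. The model built on
    these features is quadratic in x but still linear in w, so the
    same gradient updates as for a plain linear unit apply."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]

def f(w, x1, x2):
    """Linear regression on the expanded features: w . phi(x)."""
    return sum(wi * pi for wi, pi in zip(w, quadratic_features(x1, x2)))
```

Feeding the same features through a sigmoid instead gives a logistic regression with a quadratic decision boundary.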
Regression with a quadratic model. Quadratic decision boundary.
[Figure: data with a quadratic fit and the resulting quadratic decision boundary]
Multi-layered neural networks

Offer an alternative way to introduce nonlinearities into regression/classification models.
Idea: cascade several simple logistic regression units.
Motivation: a model of a neuron and its synaptic connections.
[Figure: model of a neuron; inputs x_i weighted by w_i are summed into z and passed through a threshold function to produce the output y]
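The neuron model above can be sketched as a simple threshold unit (illustrative code; the weights in the usage note are my own):

```python
def threshold_neuron(x, w, w0):
    """Threshold unit: fires (outputs 1) when the weighted sum of the
    inputs plus the bias w0 exceeds 0, otherwise outputs 0."""
    z = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if z > 0 else 0
```

For example, with w = [1, 1] and w0 = −1.5 the unit computes logical AND of two binary inputs. Replacing the hard threshold with a sigmoid gives the smooth logistic unit used in the rest of the lecture.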
Multilayer neural network

Also called a multilayer perceptron (MLP).
Cascades multiple logistic regression units.
Example: a (2-layer) classifier with non-linear decision boundaries.
[Figure: input layer (x_1, x_2), hidden layer (units z_1(x), z_2(x) with weights w_{k,1}, w_{k,2}), output layer producing p(y = 1 | x)]

Multilayer neural network
Models non-linearities through logistic regression units.
Can be applied to both regression and binary classification problems.
Output unit options:
  regression:     f(x, w)
  classification: f(x) = p(y = 1 | x, w)
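The forward pass of such a 2-layer classifier can be sketched as follows (an illustrative sketch; the bias-first weight layout is my own convention):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, W_hidden, w_out):
    """Forward pass of a 2-layer MLP classifier.
    Each row of W_hidden is [bias, w_1, ..., w_d] for one hidden
    logistic unit; w_out is [bias, v_1, ..., v_m] for the output
    logistic unit. Returns p(y = 1 | x)."""
    xb = [1.0] + list(x)                      # prepend the constant bias input
    h = [sigmoid(sum(w * u for w, u in zip(row, xb))) for row in W_hidden]
    hb = [1.0] + h
    return sigmoid(sum(w * u for w, u in zip(w_out, hb)))
```

For regression, the final sigmoid would simply be dropped, leaving a linear output unit on top of the hidden layer.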
Multilayer neural network

Non-linearities are modeled using multiple hidden logistic regression units (organized in layers).
The output layer determines whether it is a regression or a binary classification problem:
  regression:     f(x, w)
  classification: f(x) = p(y = 1 | x, w)

Learning with an MLP
How do we learn the parameters of the neural network? The gradient descent algorithm.
Online version: weight updates are based on J_online(D_u, w):
  w <- w − α ∂J_online(D_u, w)/∂w
We need to compute gradients for the weights in all units.
They can be computed in one backward sweep through the net!!!
This process is called back-propagation.
Backpropagation

Notation for unit i on level k (levels k−1, k, k+1):
  z_i^(k) = Σ_j w_{i,j}^(k) x_j^(k−1) + w_{i,0}^(k)   input to the sigmoid function on level k
  x_i^(k) = g(z_i^(k))                                output of unit i on level k
  w_{i,j}^(k)   weight between unit j on level (k−1) and unit i on level k

Update weight w_{i,j}^(k) using a data point D_u = <x, y>:
  w_{i,j}^(k) <- w_{i,j}^(k) − α ∂J_online/∂w_{i,j}^(k)
Let δ_i^(k) = ∂J_online/∂z_i^(k). Then:
  ∂J_online/∂w_{i,j}^(k) = δ_i^(k) x_j^(k−1)
such that δ_i^(k) is computed from x_i^(k) and the next-layer deltas δ_l^(k+1):
  δ_i^(k) = ( Σ_l δ_l^(k+1) w_{l,i}^(k+1) ) x_i^(k) (1 − x_i^(k))
Last unit (same as for the regular linear units):
  δ^(K) = −(y − f(x))
It is the same for classification with the log-likelihood measure of fit and for linear regression with the least-squares error!!!
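For a network with a single hidden layer and one output unit, these formulas reduce to a few lines (a sketch under that assumption; function names are mine):

```python
def output_delta(y, out):
    """delta^(K) = -(y - f(x)): the last-unit delta, identical for the
    log-likelihood classifier and least-squares linear regression."""
    return -(y - out)

def hidden_deltas(delta_out, w_out, h):
    """delta_i = (sum over next-layer units l of delta_l * w_{l,i})
    * x_i * (1 - x_i). Here the next layer is the single output unit,
    and h holds the hidden sigmoid outputs x_i."""
    return [delta_out * w * hi * (1.0 - hi) for w, hi in zip(w_out, h)]
```

The gradient of any weight w_{i,j} is then just delta_i times the input x_j feeding that weight, which is what makes the single backward sweep sufficient.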
Learning with an MLP

Online gradient descent algorithm. Weight update:
  w_{i,j}^(k) <- w_{i,j}^(k) − α ∂J_online(D_u, w)/∂w_{i,j}^(k)
  ∂J_online/∂w_{i,j}^(k) = δ_i^(k) x_j^(k−1)
so:
  w_{i,j}^(k) <- w_{i,j}^(k) − α δ_i^(k) x_j^(k−1)
where
  x_j^(k−1)  is the j-th output of layer (k−1)
  δ_i^(k)    is the derivative computed via backpropagation
  α          is a learning rate

Online gradient descent algorithm for an MLP:
  Online-gradient-descent(D, number of iterations)
    initialize all weights w_{i,j}^(k)
    for i = 1 to number of iterations do
      select a data point D_u = <x, y> from D
      set α = 1/i
      compute outputs x_j^(k) for each unit
      compute derivatives δ_i^(k) via backpropagation
      update all weights (in parallel):  w_{i,j}^(k) <- w_{i,j}^(k) − α δ_i^(k) x_j^(k−1)
    end for
    return weights w
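Putting the pieces together, the online algorithm might look like this for one hidden layer (an illustrative sketch, not the lecture's code; it cycles through the data for a fixed number of epochs with a fixed learning rate rather than α = 1/i, and the weight layout is my own):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(W, v, x):
    """Forward pass: hidden sigmoid layer W, output logistic unit v."""
    xb = [1.0] + list(x)
    h = [sigmoid(sum(w * u for w, u in zip(row, xb))) for row in W]
    return sigmoid(sum(w * u for w, u in zip(v, [1.0] + h)))

def train_mlp(data, n_hidden, alpha, n_epochs):
    """Online gradient descent for a 1-hidden-layer MLP classifier
    trained with the log-likelihood objective (delta_out = -(y - f))."""
    d = len(data[0][0])
    W = [[0.0] * (d + 1) for _ in range(n_hidden)]   # hidden weights, bias first
    v = [0.0] * (n_hidden + 1)                       # output weights, bias first
    for _ in range(n_epochs):
        for x, y in data:
            xb = [1.0] + list(x)
            h = [sigmoid(sum(w * u for w, u in zip(row, xb))) for row in W]
            hb = [1.0] + h
            out = sigmoid(sum(w * u for w, u in zip(v, hb)))
            d_out = -(y - out)                       # last-unit delta
            d_hid = [d_out * v[i + 1] * h[i] * (1.0 - h[i])
                     for i in range(n_hidden)]       # backpropagated deltas
            for i in range(n_hidden + 1):            # update the output unit
                v[i] -= alpha * d_out * hb[i]
            for i in range(n_hidden):                # update the hidden units
                for j in range(d + 1):
                    W[i][j] -= alpha * d_hid[i] * xb[j]
    return W, v
```

Note that the deltas are computed from the weights before the update, matching the "update all weights in parallel" step of the pseudocode.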
Xor example. No linear decision boundary.
[Figure: the four XOR data points; no single line separates the classes]

Xor example. Linear unit.
[Figure: decision surface of a single linear unit trained on XOR]
Xor example. Neural network with 2 hidden units.
[Figure: non-linear decision surface learned with 2 hidden units]

Xor example. Neural network with 10 hidden units.
[Figure: decision surface learned with 10 hidden units]
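A 2-hidden-unit network that solves XOR can even be written down by hand (the weights below are my own illustrative choice, not learned or taken from the lecture): one hidden unit approximates OR, the other approximates AND, and the output fires for OR-but-not-AND.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def xor_net(x1, x2):
    """Hand-picked weights: h1 ~ OR(x1, x2), h2 ~ AND(x1, x2),
    output ~ (h1 AND NOT h2), which is exactly XOR."""
    h1 = sigmoid(20.0 * x1 + 20.0 * x2 - 10.0)   # saturates to OR
    h2 = sigmoid(20.0 * x1 + 20.0 * x2 - 30.0)   # saturates to AND
    return sigmoid(20.0 * h1 - 20.0 * h2 - 10.0)
```

This shows why two hidden units suffice in principle; gradient descent still has to find such a solution, which is where the local-optima and initialization issues discussed next come in.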
Problems with learning MLPs
  The decision about the number of units must be made in advance.
  Learning converges to a local optimum.
  The result is sensitive to the initial set of weights.

MLP in practice
Optical character recognition of digits (20x20 pixels), used for automatic sorting of mail.
A 5-layer network with 10 outputs (digits 0, 1, ..., 9):

  Layer | Neurons | Weights
    5   |   10    |  3000
    4   |  300    |  1200
    3   | 1200    | 50000
    2   |  784    |  3136
    1   | 3136    | 78400
  Input: 20x20 = 400 inputs