Lecture 12: Multilayer perceptrons II


Lecture 12: Multilayer perceptrons II

- Bayes discriminants and MLPs
- The role of hidden units
- An example

Introduction to Pattern Recognition, Ricardo Gutierrez-Osuna, Wright State University

Bayes discriminants and MLPs (1)

As we have seen throughout the course, the classifier that minimizes the probability of error can be expressed as a family of discriminant functions defined by the maximum a posteriori rule:

$$g_i(x) = P(\omega_i|x); \quad \text{choose } \omega_i \text{ with } i = \arg\max_i \{P(\omega_i|x)\}$$

How does the output of an MLP relate to this optimal classifier?

Assume an MLP with a one-of-C encoding for the targets:

$$t_k(x) = \begin{cases} 1 & x \in \omega_k \\ 0 & \text{otherwise} \end{cases}$$

The contribution to the error of the k-th output neuron is

$$J_k(W) = \sum_x \left[g_k(x;W) - t_k(x)\right]^2 = \sum_{x \in \omega_k} \left[g_k(x;W) - 1\right]^2 + \sum_{x \notin \omega_k} \left[g_k(x;W) - 0\right]^2$$

where $g_k(x;W)$ is the discriminant function computed by the MLP for the k-th class and the set of weights W.
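The one-of-C encoding and the per-output error $J_k$ can be sketched in a few lines of numpy. The data below is a hypothetical stand-in; `G` plays the role of the MLP outputs $g_k(x;W)$:

```python
import numpy as np

# Hypothetical toy setup: N examples, C classes, one-of-C target encoding.
rng = np.random.default_rng(0)
N, C = 6, 3
labels = rng.integers(0, C, size=N)   # class index of each example
T = np.eye(C)[labels]                 # t_k(x) = 1 iff x belongs to class k

# Stand-in for the MLP outputs g(x; W), one row per example.
G = rng.random((N, C))

# Contribution of the k-th output neuron to the sum-squared error:
# J_k = sum over x in omega_k of (g_k - 1)^2 + sum over x not in omega_k of (g_k - 0)^2
k = 0
in_k = labels == k
J_k = np.sum((G[in_k, k] - 1.0) ** 2) + np.sum(G[~in_k, k] ** 2)

# This equals the ordinary squared error against the one-of-C target column.
assert np.isclose(J_k, np.sum((G[:, k] - T[:, k]) ** 2))
```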

Bayes discriminants and MLPs (2)

For an infinite number of examples, the previous criterion function becomes

$$\lim_{N\to\infty} \frac{1}{N} J_k(W) = P(\omega_k) \int \left[g_k(x;W) - 1\right]^2 p(x|\omega_k)\,dx + P(\omega_{\bar k}) \int g_k(x;W)^2\, p(x|\omega_{\bar k})\,dx$$

$$= \int \left[g_k(x;W) - 1\right]^2 p(x,\omega_k)\,dx + \int g_k(x;W)^2\, p(x,\omega_{\bar k})\,dx$$

Expanding the squares, using $p(x,\omega_k) + p(x,\omega_{\bar k}) = p(x)$ and $P(\omega_k|x) = p(x,\omega_k)/p(x)$, and completing the square:

$$= \int \left[g_k(x;W) - P(\omega_k|x)\right]^2 p(x)\,dx + \underbrace{\int P(\omega_k|x)\,P(\omega_{\bar k}|x)\,p(x)\,dx}_{\text{independent of } W}$$

Bayes discriminants and MLPs (3)

The back-propagation rule changes W to minimize J(W), so in fact it is minimizing

$$\int \left[g_k(x;W) - P(\omega_k|x)\right]^2 p(x)\,dx$$

Summing over all classes (output neurons), we conclude that back-prop also minimizes

$$\sum_{k=1}^{C} \int \left[g_k(x;W) - P(\omega_k|x)\right]^2 p(x)\,dx$$

So, in the limit of infinite examples, the outputs of the MLP will approximate (in a least-squares sense) the true a posteriori probabilities: $g_k(x;W) \cong P(\omega_k|x)$.

Notice that nothing said here is specific to MLPs: any discriminant function with adaptive parameters trained to minimize the sum-squared error at the output of a one-of-C encoding will approximate the a posteriori probabilities.

This result will be true if and only if
- the MLP has enough hidden units to represent the a posteriori densities,
- we have an infinite number of examples, and
- the MLP does not get trapped in a local minimum.

In practice we will have a limited number of examples, so the outputs will not always represent probabilities; for instance, there is no guarantee that they will sum up to 1.

We can use this result to determine if the network has trained properly: if the sum of the outputs differs significantly from 1, it is an indication that the MLP is not modeling the a posteriori densities properly and that we may have to change the MLP (topology, number of hidden units, etc.)
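The claim that any least-squares-trained discriminant approximates the posteriors can be checked numerically. The sketch below uses an assumed setup: two 1-D Gaussian classes with unit variance, means at ±1, and equal priors, for which the true posterior is the logistic $P(\omega_1|x) = 1/(1+e^{-2x})$. A linear-in-parameters polynomial model fit by least squares to 0/1 targets comes close to that posterior:

```python
import numpy as np

# Two 1-D Gaussian classes (assumed example): N(-1,1) for class 0, N(+1,1) for class 1.
rng = np.random.default_rng(1)
n = 20000
x = np.concatenate([rng.normal(-1.0, 1.0, n), rng.normal(+1.0, 1.0, n)])
t = np.concatenate([np.zeros(n), np.ones(n)])   # one-of-2 target for class 1

# Linear-in-parameters model (degree-5 polynomial features), fit by least squares.
Phi = np.vander(x, 6, increasing=True)
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)

# True posterior for this setup: P(w1|x) = 1 / (1 + exp(-2x)).
xs = np.linspace(-2.0, 2.0, 41)
true_post = 1.0 / (1.0 + np.exp(-2.0 * xs))
model = np.vander(xs, 6, increasing=True) @ w
max_err = np.max(np.abs(model - true_post))     # small approximation error
```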

The role of hidden units (1)

Let us assume an MLP with non-linear activation functions for the hidden layer(s) and a linear activation function for the output layer.

[Figure: feed-forward network with inputs $x_1..x_D$, hidden layers with weights $w^{<m>}$, $m = 1..M-1$, and a linear output layer $w^{<M>}$ producing outputs $\omega_1..\omega_C$.]

If we hold constant the set of hidden weights $w^{<m>}$, $m = 1..M-1$, the minimization of the objective function J(W) with respect to the output weights $w^{<M>}$ becomes a linear optimization problem and can, therefore, be solved in closed form:

$$W^{<M>\star} = \arg\min_{W^{<M>}}\ \frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{C} \left( \sum_{j=1}^{H_{M-1}} w_{kj}^{<M>} y_j^{(n)} - t_k^{(n)} \right)^2$$

It can be shown [Bishop, 1992] that the role of the output biases is to compensate for the difference between the averages (over the data set) of the target values and the weighted sum of the averages of the hidden unit outputs:

$$w_{0k}^{<M>} = E[t_k] - \sum_{j=1}^{H_{M-1}} w_{kj}^{<M>} E[y_j]$$

This allows us to ignore the means of the outputs and targets and express the objective function in terms of zero-mean variables (dropping the index $(n)$ for clarity):

$$\tilde J(W) = \frac{1}{2} \sum_{k=1}^{C} \left( \sum_{j=1}^{H_{M-1}} w_{kj}^{<M>}\,\tilde y_j - \tilde t_k \right)^2 \quad\text{where}\quad \tilde y_j = y_j - E[y_j],\ \ \tilde t_k = t_k - E[t_k]$$

The role of hidden units (2)

To find the optimal output weights $w^{<M>}$ we form the partial derivative and set it to zero:

$$\frac{\partial \tilde J}{\partial w_{kj}^{<M>}} = \sum_n \left( \sum_{j'} w_{kj'}^{<M>}\,\tilde y_{j'}^{(n)} - \tilde t_k^{(n)} \right) \tilde y_j^{(n)} = 0$$

We introduce the following matrix notation:
- $W^{<M>}$ denotes the weights of the linear layer,
- $\tilde Y$ denotes the zero-mean outputs of the last hidden layer (each column is an example, each row is a hidden unit), and
- $\tilde T$ denotes the zero-mean targets (each column is an example, each row is an output).

Using matrix notation we can again express the objective function as

$$\tilde J(W) = \frac{1}{2}\,\mathrm{Tr}\!\left[ \left( W^{<M>} \tilde Y - \tilde T \right) \left( W^{<M>} \tilde Y - \tilde T \right)^T \right]$$

So the previous minimization problem becomes

$$\left( W^{<M>} \tilde Y - \tilde T \right) \tilde Y^T = 0$$

and the optimal set of weights $W^{<M>}$ becomes

$$W^{<M>} = \tilde T\,\tilde Y^{\dagger} \quad\text{where}\quad \tilde Y^{\dagger} = \tilde Y^T \left( \tilde Y\,\tilde Y^T \right)^{-1} \ \text{is the pseudo-inverse of } \tilde Y$$

It is important to notice that this solution can be calculated explicitly; no iterative procedure (i.e., steepest descent) is necessary. We now turn our attention to the hidden layer(s).
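The closed-form output-layer solve can be sketched in numpy. The matrices below are random stand-ins; shapes follow the convention above (one example per column):

```python
import numpy as np

# Random stand-ins: H hidden units, C outputs, N examples.
rng = np.random.default_rng(2)
H, C, N = 4, 5, 100
Y = rng.random((H, N))                       # outputs of the last hidden layer
T = np.eye(C)[rng.integers(0, C, N)].T       # one-of-C targets, one column per example

# Remove the means, as in the derivation above.
Yt = Y - Y.mean(axis=1, keepdims=True)
Tt = T - T.mean(axis=1, keepdims=True)

# Optimal output weights via the pseudo-inverse: W = T~ Y~^+
W = Tt @ np.linalg.pinv(Yt)

# The biases compensate for the means: w0_k = E[t_k] - sum_j w_kj E[y_j]
w0 = T.mean(axis=1) - W @ Y.mean(axis=1)

# Check: W solves the least-squares problem, so the gradient (W Y~ - T~) Y~^T vanishes.
grad = (W @ Yt - Tt) @ Yt.T
assert np.allclose(grad, 0, atol=1e-8)
```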

The role of hidden units (3)

Substituting the optimal value of $W^{<M>}$ into the objective function yields

$$\tilde J = \frac{1}{2}\,\mathrm{Tr}\!\left[ \tilde T \tilde T^T \right] - \frac{1}{2}\,\mathrm{Tr}\!\left[ S_T^{-1} S_B \right] \quad\text{where}\quad S_T = \tilde Y \tilde Y^T,\ \ S_B = \tilde Y \tilde T^T \tilde T \tilde Y^T$$

Since $\mathrm{Tr}[\tilde T \tilde T^T]$ is independent of W, the minimization of J(W) is equivalent to maximizing

$$J'(W) = \mathrm{Tr}\!\left[ S_T^{-1} S_B \right]$$

Since we are using a one-of-C encoding in the output layer, it can be shown that $S_B$ becomes

$$S_B = \sum_{k=1}^{C} N_k^2 \left( \bar y_k - \bar y \right) \left( \bar y_k - \bar y \right)^T \quad\text{where}\quad \bar y_k = E[y\,|\,\omega_k]$$

Notice that this $S_B$ differs from the conventional between-class covariance matrix by having $N_k^2$ instead of $N_k$. This means that the MLP will have a strong bias in favor of classes that have a large number of examples.

CONCLUSION
- Choosing the optimum weights of an MLP to minimize the squared error at the output layer forces the weights of the hidden layer(s) to be chosen so that the transformation from the input data to the output of the (last) hidden layer maximizes the discriminant function $\mathrm{Tr}[S_T^{-1} S_B]$ measured at the output of the (last) hidden layer.
- This is precisely why MLPs have been demonstrated to perform classification tasks so well.
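The $N_k^2$ weighting can be made concrete with a small numpy sketch (hypothetical hidden-layer outputs; class sizes deliberately unbalanced so the large class dominates $S_B$):

```python
import numpy as np

# Hypothetical hidden-layer outputs for C classes with unbalanced sizes.
rng = np.random.default_rng(3)
H, C = 4, 3
counts = [10, 20, 70]                        # strongly unbalanced classes
Y = [rng.normal(k, 1.0, (n, H)) for k, n in enumerate(counts)]

# Modified between-class matrix from the derivation above: N_k^2, not N_k.
mean_all = np.concatenate(Y).mean(axis=0)
S_B = np.zeros((H, H))
for n, Yk in zip(counts, Y):
    d = Yk.mean(axis=0) - mean_all
    S_B += n**2 * np.outer(d, d)             # note the squared class count

# Total scatter S_T of the (centered) hidden-unit outputs.
Z = np.concatenate(Y) - mean_all
S_T = Z.T @ Z

# The criterion that the hidden layer implicitly maximizes.
criterion = np.trace(np.linalg.inv(S_T) @ S_B)
```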

An example

We train a two-layer MLP to classify five odors from an array of sixty gas sensors:
- The MLP has sixty inputs, one for each gas sensor.
- The MLP has five outputs, one for each odor; the output neurons use the one-of-C encoding of classes.
- Four hidden neurons are used (as many as LDA projections).
- The hidden layer has the logistic sigmoidal activation function; the output layer has a linear activation function.

Training:
- Hidden weights and biases trained with the steepest descent rule.
- Output weights and biases trained with the pseudo-inverse rule.

[Figure: sensor response traces for the five odors: orange, apple, cherry, fruit-punch, tropical-punch.]
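The hybrid training scheme just described (steepest descent for the hidden weights, pseudo-inverse for the output weights) can be sketched as follows. The data here is a random stand-in, not the odor data set; only the dimensions (60 sensor inputs, 4 hidden units, 5 odor classes) follow the example:

```python
import numpy as np

# Random stand-in data with the dimensions of the example above.
rng = np.random.default_rng(4)
D, H, C, N = 60, 4, 5, 90                    # 60 sensors, 4 hidden units, 5 odors
X = rng.normal(size=(D, N))
T = np.eye(C)[rng.integers(0, C, N)].T       # one-of-C targets, one column per example

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
W1 = rng.normal(scale=0.1, size=(H, D))      # hidden weights (biases omitted for brevity)

lr = 0.01
for _ in range(50):
    Y = sigmoid(W1 @ X)                      # hidden-layer outputs
    W2 = T @ np.linalg.pinv(Y)               # output layer: closed-form pseudo-inverse solve
    E = W2 @ Y - T                           # residual of the linear output layer
    # Back-propagate the squared error through the sigmoid to the hidden weights.
    G = (W2.T @ E) * Y * (1.0 - Y)
    W1 -= lr * (G @ X.T) / N                 # steepest descent step on the hidden weights

# Final forward pass with a matching output-layer solve.
Y = sigmoid(W1 @ X)
W2 = T @ np.linalg.pinv(Y)
final_err = np.mean((W2 @ Y - T) ** 2)
```

Solving the output layer exactly at every step keeps the gradient updates focused on the hidden-layer transformation, which is the part the closed-form solve cannot reach.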

An example: results

[Figures: scatter plots of hidden-neuron activations (hidden neuron 1 vs. 2, and hidden neuron 3 vs. 4) showing the class separation found by the hidden layer; sensor response traces; output-neuron activations across the examples for the five odors: orange, apple, cherry, fruit-punch, tropical-punch.]