Supervised Learning. Neural Networks and Back-Propagation Learning. Credit Assignment Problem. Feedforward Network. Adaptive System.

Transcription:

Part 7: Neural Networks & Learning

Supervised Learning
- Produce the desired outputs for the training inputs.
- Generalize reasonably and appropriately to other inputs.
- Good example: pattern recognition.
- Method: feedforward multilayer networks.

Feedforward Network
[Figure: input layer, hidden layers, output layer.]

Credit Assignment Problem
- How do we adjust the weights of the hidden layers?
[Figure: desired output compared with the output layer; input layer, hidden layers, output layer.]

Adaptive System
[Figure: system S with control parameters P_1, ..., P_m, evaluation function F (fitness, figure of merit), and control algorithm C.]

Gradient
- \partial F / \partial P_k measures how F is altered by variation of P_k.
- \nabla F = (\partial F / \partial P_1, \ldots, \partial F / \partial P_m)^T
- \nabla F points in the direction of maximum increase in F.
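The adaptive-system picture can be made concrete with a small sketch. The following is a minimal illustration, not part of the lecture: it assumes a made-up fitness function, estimates the gradient by finite differences, and steps uphill.

    import numpy as np

    # Hypothetical fitness function of a parameter vector P; any smooth F would do.
    def fitness(P):
        return -np.sum((P - 1.0) ** 2)        # maximum at P = (1, ..., 1)

    def numerical_gradient(F, P, eps=1e-6):
        """Finite-difference estimate of dF/dP_k for each parameter."""
        grad = np.zeros_like(P)
        for k in range(P.size):
            dP = np.zeros_like(P)
            dP[k] = eps
            grad[k] = (F(P + dP) - F(P - dP)) / (2 * eps)
        return grad

    P = np.zeros(3)                           # initial control parameters
    eta = 0.1                                 # step size
    for _ in range(100):
        P += eta * numerical_gradient(fitness, P)   # gradient *ascent* on fitness
    print(P)                                  # approaches (1, 1, 1)

With a small enough step size, each update moves the parameters toward a maximum of F, which is exactly the ascent process described on the following slides.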

Gradient Ascent on Fitness Surface
[Figure: fitness surface with gradient arrows.]

Gradient Ascent by Discrete Steps
[Figure: stepwise ascent path on the fitness surface.]

Gradient Ascent is Local But Not Shortest
[Figure: ascent path compared with the direct path to the peak.]

Gradient Ascent Process
- \dot{P} = \eta \nabla F(P)
- Change in fitness: \dot{F} = dF/dt = \sum_{k=1}^{m} (\partial F / \partial P_k)(dP_k/dt) = \sum_{k=1}^{m} (\nabla F)_k \dot{P}_k = \nabla F \cdot \dot{P}
- \dot{F} = \nabla F \cdot \eta \nabla F = \eta \|\nabla F\|^2 \ge 0
- Therefore gradient ascent increases fitness (until it reaches a zero gradient).

General Ascent in Fitness
- Note that any adaptive process P(t) will increase fitness provided
  0 < \dot{F} = \nabla F \cdot \dot{P} = \|\nabla F\| \|\dot{P}\| \cos\varphi,
  where \varphi is the angle between \nabla F and \dot{P}.
- Hence we need \cos\varphi > 0, i.e. \varphi < 90°.

General Ascent on Fitness Surface
[Figure: ascent paths within 90° of the gradient.]
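A quick numerical check of the general-ascent condition, using the same toy fitness function as in the sketch above (an assumption, not from the slides): a small random step raises F exactly when its angle with the gradient is below 90°.

    import numpy as np

    rng = np.random.default_rng(0)
    F = lambda P: -np.sum((P - 1.0) ** 2)     # toy fitness with maximum at (1, ..., 1)
    P = rng.normal(size=3)
    grad = -2.0 * (P - 1.0)                   # closed-form gradient of this particular F

    for _ in range(5):
        d = rng.normal(size=3)                # a random step direction, playing the role of P-dot
        cos_phi = d @ grad / (np.linalg.norm(d) * np.linalg.norm(grad))
        dF = F(P + 1e-4 * d) - F(P)           # change in fitness for a small step along d
        print(f"cos(phi) = {cos_phi:+.3f}   dF = {dF:+.2e}")
        # for a sufficiently small step, the sign of dF matches the sign of cos(phi)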

Fitness as Minimum Error
- Suppose for Q different inputs we have target outputs t^1, \ldots, t^Q.
- Suppose for parameters P the corresponding actual outputs are y^1, \ldots, y^Q.
- Suppose D(t, y) \in [0, \infty) measures the difference between target and actual outputs.
- Let E^q = D(t^q, y^q) be the error on the q-th sample.
- Let F(P) = -\sum_{q=1}^{Q} E^q(P) = -\sum_{q=1}^{Q} D[t^q, y^q(P)].

Gradient of Fitness
- \nabla F = -\nabla \sum_q E^q = -\sum_q \nabla E^q
- \partial E^q / \partial P_k = \partial D(t^q, y^q) / \partial P_k = \sum_j [\partial D(t^q, y^q) / \partial y_j^q](\partial y_j^q / \partial P_k) = \nabla_{y^q} D(t^q, y^q) \cdot \partial y^q / \partial P_k

Jacobian Matrix
- Define the Jacobian matrix J^q with rows indexed by outputs and columns by parameters:
  J^q_{jk} = \partial y_j^q / \partial P_k, for j = 1, \ldots, n and k = 1, \ldots, m.
- Note J^q \in \mathbb{R}^{n \times m} and \nabla_{y^q} D(t^q, y^q) \in \mathbb{R}^n.
- Since (\nabla E^q)_k = \partial E^q / \partial P_k = \sum_j (\partial y_j^q / \partial P_k)[\partial D(t^q, y^q) / \partial y_j^q], we have \nabla E^q = (J^q)^T \nabla_{y^q} D(t^q, y^q).

Derivative of Squared Euclidean Distance
- Suppose D(t, y) = \|t - y\|^2 = \sum_i (t_i - y_i)^2.
- \partial D(t, y) / \partial y_j = \partial / \partial y_j \sum_i (t_i - y_i)^2 = d(t_j - y_j)^2 / dy_j = -2(t_j - y_j) = 2(y_j - t_j)

Gradient of Error on the q-th Input
- \partial E^q / \partial P_k = \sum_j [dD(t^q, y^q)/dy_j^q](\partial y_j^q / \partial P_k) = \sum_j 2(y_j^q - t_j^q)(\partial y_j^q / \partial P_k) = 2(y^q - t^q) \cdot \partial y^q / \partial P_k
- \nabla E^q = 2 (J^q)^T (y^q - t^q)

Recap
- \dot{P} = \eta \sum_q (J^q)^T (t^q - y^q)
- To know how to decrease the difference between actual and desired outputs, we need to know the elements of the Jacobian, \partial y_j^q / \partial P_k, which say how the j-th output varies with the k-th parameter (given the q-th input).
- The Jacobian depends on the specific form of the system; in this case, a feedforward neural network.
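As an illustration of the recap, here is a small numerical sketch (a toy example of my own, not from the slides): the "system" is a single sigmoid layer, the Jacobian \partial y_j / \partial P_k is estimated by finite differences, and the identity \nabla E^q = 2 (J^q)^T (y^q - t^q) is checked against a directly differenced gradient.

    import numpy as np

    rng = np.random.default_rng(1)
    sigma = lambda h: 1.0 / (1.0 + np.exp(-h))

    # Toy "system": one sigmoid layer y = sigma(W x); the flattened W plays the role of P.
    x = rng.normal(size=4)                  # the q-th input pattern
    t = np.array([0.0, 1.0, 0.0])           # its target output t^q
    P = rng.normal(size=(3, 4)).ravel()     # parameters

    def outputs(P):                         # y^q as a function of the parameters
        return sigma(P.reshape(3, 4) @ x)

    def error(P):                           # E^q = D(t, y) = ||t - y||^2
        return np.sum((t - outputs(P)) ** 2)

    # Finite-difference Jacobian J_{jk} = dy_j/dP_k and gradient dE/dP_k
    eps = 1e-6
    J = np.zeros((3, P.size))
    gradE = np.zeros(P.size)
    for k in range(P.size):
        dP = np.zeros(P.size); dP[k] = eps
        J[:, k] = (outputs(P + dP) - outputs(P - dP)) / (2 * eps)
        gradE[k] = (error(P + dP) - error(P - dP)) / (2 * eps)

    y = outputs(P)
    print(np.allclose(gradE, 2 * J.T @ (y - t), atol=1e-6))   # True: grad E^q = 2 (J^q)^T (y^q - t^q)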

Equations
- Net input: h_i = \sum_{j=1}^{n} w_{ij} x_j, i.e. h = W x.
- Neuron output: s_i = \sigma(h_i), i.e. s = \sigma(h).
[Figure: typical artificial neuron.]

Multilayer Notation
[Figure: layers s^1 = x through s^L = y, connected by weight matrices W^1, \ldots, W^{L-1}.]

Notation
- L layers of neurons, labeled 1, \ldots, L.
- N_\ell neurons in layer \ell.
- s^\ell = vector of outputs from the neurons in layer \ell.
- Input layer: s^1 = x^q (the input pattern).
- Output layer: s^L = y^q (the actual output).
- W^\ell = weights between layers \ell and \ell + 1.
- Problem: find how the outputs y_i^q vary with the weights W_{ij}^\ell (\ell = 1, \ldots, L - 1).

Typical Neuron
[Figure: neuron i in layer \ell receives s_1^{\ell-1}, \ldots, s_{N_{\ell-1}}^{\ell-1} through the weights W_{ij}^{\ell-1}, forms the net input h_i^\ell, and outputs s_i^\ell = \sigma(h_i^\ell).]

Error Back-Propagation
- We will compute \partial E^q / \partial W_{ij}^{\ell} starting with the last layer (\ell = L - 1) and working back to earlier layers (\ell = L - 2, \ldots, 1).

Delta Values
- Convenient to break the derivatives up by the chain rule:
  \partial E^q / \partial W_{ij}^{\ell-1} = (\partial E^q / \partial h_i^\ell)(\partial h_i^\ell / \partial W_{ij}^{\ell-1})
- Let \delta_i^\ell = \partial E^q / \partial h_i^\ell.
- So \partial E^q / \partial W_{ij}^{\ell-1} = \delta_i^\ell \, \partial h_i^\ell / \partial W_{ij}^{\ell-1}.
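A short forward-pass sketch in this notation may help fix the indexing; the layer sizes and random weights below are arbitrary assumptions for illustration, not values from the lecture.

    import numpy as np

    sigma = lambda h: 1.0 / (1.0 + np.exp(-h))       # logistic sigmoid (alpha = 1)

    def forward(x, weights):
        """s[0] = x is layer 1; each weight matrix W^l maps s^l to the net input of layer l+1."""
        s = [x]
        for W in weights:                            # weights[l-1] plays the role of W^l
            s.append(sigma(W @ s[-1]))
        return s                                     # s[-1] is the actual output y

    # Example: 4 inputs -> 5 hidden units -> 3 outputs (sizes chosen arbitrarily)
    rng = np.random.default_rng(2)
    weights = [rng.normal(size=(5, 4)), rng.normal(size=(3, 5))]
    s = forward(rng.normal(size=4), weights)
    print(s[-1])                                     # network output y^q for this input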

Output-Layer Neuron
[Figure: output neuron i with net input h_i^L, output s_i^L, target t_i, and error E^q.]

Output-Layer Derivatives (1)
- \delta_i^L = \partial E^q / \partial h_i^L = \partial / \partial h_i^L \sum_k (t_k - s_k^L)^2 = d(t_i - s_i^L)^2 / dh_i^L = -2(t_i - s_i^L) \, ds_i^L / dh_i^L = 2(s_i^L - t_i) \, \sigma'(h_i^L)

Output-Layer Derivatives (2)
- \partial h_i^L / \partial W_{ij}^{L-1} = s_j^{L-1}
- So \partial E^q / \partial W_{ij}^{L-1} = \delta_i^L s_j^{L-1}, where \delta_i^L = 2(s_i^L - t_i) \, \sigma'(h_i^L).

Hidden-Layer Neuron
[Figure: hidden neuron i in layer \ell feeding the N_{\ell+1} neurons of the next layer, all of which contribute to E^q.]

Hidden-Layer Derivatives (1)
- Recall \partial E^q / \partial W_{ij}^{\ell-1} = \delta_i^\ell \, \partial h_i^\ell / \partial W_{ij}^{\ell-1}.
- \delta_i^\ell = \partial E^q / \partial h_i^\ell = \sum_k (\partial E^q / \partial h_k^{\ell+1})(\partial h_k^{\ell+1} / \partial h_i^\ell) = \sum_k \delta_k^{\ell+1} \, \partial h_k^{\ell+1} / \partial h_i^\ell
- \partial h_k^{\ell+1} / \partial h_i^\ell = \partial (\sum_m W_{km}^\ell s_m^\ell) / \partial h_i^\ell = W_{ki}^\ell \, ds_i^\ell / dh_i^\ell = W_{ki}^\ell \, \sigma'(h_i^\ell)
- So \delta_i^\ell = \sum_k \delta_k^{\ell+1} W_{ki}^\ell \, \sigma'(h_i^\ell) = \sigma'(h_i^\ell) \sum_k \delta_k^{\ell+1} W_{ki}^\ell.

Hidden-Layer Derivatives (2)
- \partial h_i^\ell / \partial W_{ij}^{\ell-1} = d(W_{ij}^{\ell-1} s_j^{\ell-1}) / dW_{ij}^{\ell-1} = s_j^{\ell-1}
- So \partial E^q / \partial W_{ij}^{\ell-1} = \delta_i^\ell s_j^{\ell-1}, where \delta_i^\ell = \sigma'(h_i^\ell) \sum_k \delta_k^{\ell+1} W_{ki}^\ell.
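The delta recursion can be exercised on a tiny two-weight-layer network. The sketch below is only an illustration under assumed layer sizes, random weights, and a single made-up training pair; it takes \alpha = 1 so that \sigma'(h) = s(1 - s), computes the deltas exactly as in the last two slides, and checks one weight derivative against a finite difference.

    import numpy as np

    sigma = lambda h: 1.0 / (1.0 + np.exp(-h))               # alpha = 1, so sigma'(h) = s (1 - s)
    rng = np.random.default_rng(3)
    W = [rng.normal(size=(5, 4)), rng.normal(size=(3, 5))]   # W[0] = W^1, W[1] = W^2 (so L = 3)
    x = rng.normal(size=4)                                   # input pattern x^q
    t = np.array([1.0, 0.0, 1.0])                            # target output t^q

    def forward(weights):
        s = [x]
        for Wl in weights:
            s.append(sigma(Wl @ s[-1]))
        return s                                             # s[0] = s^1, s[1] = s^2, s[2] = s^3 = y

    def error(weights):
        return np.sum((t - forward(weights)[-1]) ** 2)       # E^q = ||t - y||^2

    # Backward pass
    s = forward(W)
    delta3 = 2 * (s[2] - t) * s[2] * (1 - s[2])              # output layer: 2 (s^L - t) sigma'(h^L)
    dW2 = np.outer(delta3, s[1])                             # dE/dW^2_ij = delta^3_i s^2_j
    delta2 = (W[1].T @ delta3) * s[1] * (1 - s[1])           # hidden layer: sigma'(h^2) sum_k delta^3_k W^2_ki
    dW1 = np.outer(delta2, s[0])                             # dE/dW^1_ij = delta^2_i s^1_j

    # Check one weight derivative against a central difference
    eps = 1e-6
    Wp = [W[0].copy(), W[1].copy()]; Wp[0][2, 1] += eps
    Wm = [W[0].copy(), W[1].copy()]; Wm[0][2, 1] -= eps
    print(np.isclose(dW1[2, 1], (error(Wp) - error(Wm)) / (2 * eps)))   # True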

Derivative of the Sigmoid
- Suppose s = \sigma(h) = 1 / [1 + \exp(-\alpha h)] (logistic sigmoid).
- D_h \sigma = D_h [1 + \exp(-\alpha h)]^{-1} = -[1 + \exp(-\alpha h)]^{-2} D_h (1 + e^{-\alpha h}) = -(1 + e^{-\alpha h})^{-2} (-\alpha e^{-\alpha h}) = \alpha \, e^{-\alpha h} / (1 + e^{-\alpha h})^2 = \alpha \, [1 / (1 + e^{-\alpha h})] \, [e^{-\alpha h} / (1 + e^{-\alpha h})] = \alpha s (1 - s)

Summary of Back-Propagation Algorithm
- Output layer: \delta_i^L = 2 \alpha s_i^L (1 - s_i^L)(s_i^L - t_i), and \partial E^q / \partial W_{ij}^{L-1} = \delta_i^L s_j^{L-1}.
- Hidden layers: \delta_i^\ell = \alpha s_i^\ell (1 - s_i^\ell) \sum_k \delta_k^{\ell+1} W_{ki}^\ell, and \partial E^q / \partial W_{ij}^{\ell-1} = \delta_i^\ell s_j^{\ell-1}.

Output-Layer Computation
[Figure: computation of \delta_i^L and of the output-layer weight derivatives from s^{L-1}, h_i^L, s_i^L, and t_i.]

Hidden-Layer Computation
[Figure: computation of \delta_i^\ell from the \delta_k^{\ell+1} of the next layer and of the hidden-layer weight derivatives.]

Training Procedures
- Batch learning: on each epoch (one pass through all the training pairs), the weight changes for all patterns are accumulated; the weight matrices are updated at the end of the epoch; accurate computation of the gradient.
- Online learning: the weights are updated after back-propagation of each training pair; usually the order is randomized for each epoch; an approximation of the gradient.
- In practice it doesn't make much difference.

Summation of Error Surfaces
[Figure: total error surface E as the sum of the per-pattern surfaces E^1, E^2, \ldots]
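A quick numerical check of the sigmoid-derivative identity; the slope \alpha below is an arbitrary choice for illustration.

    import numpy as np

    alpha = 1.7                                               # arbitrary slope parameter
    sigma = lambda h: 1.0 / (1.0 + np.exp(-alpha * h))

    h = np.linspace(-3.0, 3.0, 7)
    eps = 1e-6
    numeric = (sigma(h + eps) - sigma(h - eps)) / (2 * eps)   # central difference of sigma
    analytic = alpha * sigma(h) * (1 - sigma(h))              # alpha s (1 - s), as derived above
    print(np.allclose(numeric, analytic))                     # True

In code, the difference between batch and online learning is only where the weight update sits: inside the loop over training pairs (online) or after the loop, using the accumulated gradients (batch).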

Gradient Computation in Batch Learning
[Figure: the per-pattern gradients of E^1, E^2, \ldots are summed before a step is taken on E.]

Gradient Computation in Online Learning
[Figure: a step is taken on each per-pattern surface E^1, E^2, \ldots in turn.]

The Golden Rule of Neural Nets
- "Neural Networks are the second-best way to do everything!"