CSE 556: Introduction to Neural Networks
Regression and the LMS Algorithm
Problem statement
Linear regression with one variable
Given a set of N pairs of data $\{x_i, d_i\}$, approximate $d$ by a linear function of $x$ (the regressor), i.e.
$d \approx y = \varphi(wx + b) = wx + b$
where the activation function $\varphi$ is a linear function, corresponding to a linear neuron. $y$ is the output of the neuron, and $\varepsilon = d - y$ is called the regression (expectational) error, so that
$d = wx + b + \varepsilon$
Linear regression (cont.)
The problem of regression with one variable is how to choose $w$ and $b$ to minimize the regression error. The least squares method aims to minimize the square error:
$E = \frac{1}{2}\sum_{i=1}^{N} \varepsilon_i^2 = \frac{1}{2}\sum_{i=1}^{N} (d_i - y_i)^2$
Linear regression (cont.)
To minimize the two-variable square function, set
$\frac{\partial E}{\partial w} = 0, \qquad \frac{\partial E}{\partial b} = 0$
Linear regression (cont.)
$\frac{\partial E}{\partial b} = -\sum_{i=1}^{N} (d_i - w x_i - b) = 0$
$\frac{\partial E}{\partial w} = -\sum_{i=1}^{N} (d_i - w x_i - b)\, x_i = 0$
Analytic solution approaches
1. Solve one equation for $b$ in terms of $w$; substitute into the other equation and solve for $w$; substitute the solution for $w$ back into the equation for $b$.
2. Set up the system of equations in matrix notation; solve the matrix equation.
3. Rewrite the problem in matrix form; compute the matrix gradient; solve for $\mathbf{w}$.
Linear regression (cont.)
Hence
$w = \dfrac{\sum_{i=1}^{N}(x_i - \bar{x})(d_i - \bar{d})}{\sum_{i=1}^{N}(x_i - \bar{x})^2}, \qquad b = \bar{d} - w\bar{x}$
where an overbar (e.g. $\bar{x}$) indicates the mean. Derive it yourself!
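As a quick sketch of the closed-form solution above (the data here is illustrative, not from the slides; variable names follow the notation $x$, $d$, $w$, $b$):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
d = 2.0 * x + 1.0          # noiseless targets generated by d = 2x + 1

# Closed-form least squares: w from centered cross- and auto-covariance,
# b from the means, exactly as in the formulas above.
x_bar, d_bar = x.mean(), d.mean()
w = np.sum((x - x_bar) * (d - d_bar)) / np.sum((x - x_bar) ** 2)
b = d_bar - w * x_bar      # recovers w = 2, b = 1
```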
Linear regression in matrix notation
Let $X = [\mathbf{x}_1\; \mathbf{x}_2\; \cdots\; \mathbf{x}_N]^T$. Then the model predictions are $\mathbf{y} = X\mathbf{w}$, and the mean square error can be written
$E = \lVert \mathbf{d} - X\mathbf{w} \rVert^2$
To find the optimal $\mathbf{w}$, set the gradient of the error with respect to $\mathbf{w}$ equal to 0 and solve for $\mathbf{w}$:
$\nabla_{\mathbf{w}} E = -2X^T(\mathbf{d} - X\mathbf{w}) = 0$
See The Matrix Cookbook (Petersen & Pedersen).
Linear regression in matrix notation (cont.)
$E = (\mathbf{d} - X\mathbf{w})^T(\mathbf{d} - X\mathbf{w}) = \mathbf{d}^T\mathbf{d} - \mathbf{d}^T X\mathbf{w} - \mathbf{w}^T X^T \mathbf{d} + \mathbf{w}^T X^T X \mathbf{w}$
$\nabla_{\mathbf{w}} E = -2X^T\mathbf{d} + 2X^T X\mathbf{w} = 0$
$\mathbf{w} = (X^T X)^{-1} X^T \mathbf{d}$
Much cleaner!
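The normal equations above can be sketched in NumPy as follows (illustrative data; I assume the first column of $X$ is a column of ones so the bias is absorbed into $\mathbf{w} = [b, w_1]^T$):

```python
import numpy as np

# Design matrix with a bias column of ones (assumption: w = [b, w1]).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
d = np.array([1.0, 3.0, 5.0, 7.0])   # targets from d = 1 + 2x

# w = (X^T X)^{-1} X^T d; solving the linear system is preferred
# numerically to forming the explicit inverse.
w = np.linalg.solve(X.T @ X, X.T @ d)   # gives [b, w1] = [1, 2]
```

In practice `np.linalg.lstsq(X, d, rcond=None)` does the same job and also handles rank-deficient $X$.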
Finding optimal parameters via search
Often there is no closed-form solution for $\nabla E = 0$. We can still use the gradient in a numerical solution. We'll still use the same example to permit comparison. For simplicity's sake, set $b = 0$:
$E = \frac{1}{2}\sum_{i=1}^{N}(d_i - w x_i)^2$
$E$ is called a cost function.
Cost function
[Figure: cost function $E$ versus weight $w$, with minimum $E_{\min}$ at $w^*$.]
Question: how can we update $w$ from $w(0)$ to minimize $E$?
Gradient and directional derivatives
Consider a two-variable function $f(x, y)$. Its gradient at the point $(x, y)^T$ is defined as
$\nabla f(x,y) = \left[\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right]^T = \frac{\partial f}{\partial x}\mathbf{u}_x + \frac{\partial f}{\partial y}\mathbf{u}_y$
where $\mathbf{u}_x$ and $\mathbf{u}_y$ are unit vectors in the $x$ and $y$ directions, and
$\frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{f(x+h,\, y) - f(x,\, y)}{h}, \qquad \frac{\partial f}{\partial y} = \lim_{h \to 0} \frac{f(x,\, y+h) - f(x,\, y)}{h}$
Gradient and directional derivatives (cont.)
In a given direction $\mathbf{u} = a\mathbf{u}_x + b\mathbf{u}_y$, with $a^2 + b^2 = 1$, the directional derivative at $(x, y)^T$ along the unit vector $\mathbf{u}$ is
$D_{\mathbf{u}} f(x,y) = \lim_{h \to 0} \frac{f(x+ha,\, y+hb) - f(x,\, y)}{h} = a\frac{\partial f}{\partial x} + b\frac{\partial f}{\partial y} = \left[\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right] \cdot [a,\, b]^T = \nabla f \cdot \mathbf{u}$
Which direction has the greatest slope? The gradient, because of the dot product!
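The identity $D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$ can be checked numerically; the function below is an arbitrary example chosen for this sketch, not one from the slides:

```python
import numpy as np

def f(x, y):                       # example two-variable function
    return x**2 + 3.0 * y**2

def grad_f(x, y):                  # its analytic gradient [2x, 6y]
    return np.array([2.0 * x, 6.0 * y])

x0, y0 = 1.0, 2.0
u = np.array([3.0, 4.0]) / 5.0     # unit vector: a^2 + b^2 = 1
h = 1e-6

# Forward-difference estimate of the directional derivative ...
numeric = (f(x0 + h * u[0], y0 + h * u[1]) - f(x0, y0)) / h
# ... agrees with the dot product grad(f) . u
analytic = grad_f(x0, y0) @ u
```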
Gradient and directional derivatives (cont.)
Example: [Figure: surface plot of a two-variable function $f(x, y)$ and its gradient field.]
Gradient and directional derivatives (cont.)
The level curves of a function $f(x, y)$ are curves such that $f(x, y) = k$. Thus, the directional derivative along a level curve is
$D_{\mathbf{u}} f(x,y) = \nabla f \cdot \mathbf{u} = 0$
and the gradient vector is perpendicular to the level curve.
Gradient and directional derivatives (cont.)
The gradient of a cost function is a vector with the dimension of $\mathbf{w}$ that points in the direction of maximum $E$ increase, and with a magnitude equal to the slope of the tangent of the cost function along that direction. Can the slope be negative?
Gradient illustration
[Figure: cost function $E$ versus $w$, minimum $E_{\min}$ at $w^*$, with the tangent slope $\Delta E / \Delta w$ shown at a point $w$.]
$\text{Gradient} = \lim_{\Delta w \to 0} \frac{E(w + \Delta w) - E(w)}{\Delta w}$
Gradient descent
Minimize the cost function via gradient (steepest) descent, a case of hill-climbing:
$w(n+1) = w(n) - \eta \nabla E(w(n))$
$n$: iteration number; $\eta$: learning rate. (See previous figure.)
Gradient descent (cont.)
For the mean-square-error cost function and linear neurons,
$E(n) = \frac{1}{2} e(n)^2 = \frac{1}{2}\,[d(n) - w(n)\,x(n)]^2$
$\frac{\partial E}{\partial w} = \frac{\partial E}{\partial e}\,\frac{\partial e}{\partial w} = -\,e(n)\,x(n)$
Gradient descent (cont.)
Hence
$w(n+1) = w(n) + \eta\, e(n)\, x(n) = w(n) + \eta\,[d(n) - w(n)\,x(n)]\,x(n)$
This is the least-mean-square (LMS) algorithm, or the Widrow-Hoff rule.
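A minimal sketch of the one-weight LMS rule above (synthetic data with true weight $w^* = 3$ and $b = 0$; the learning rate is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)
d = 3.0 * x                        # targets from a linear neuron, w* = 3

w, eta = 0.0, 0.1
for n in range(x.size):            # one online pass over the data
    e = d[n] - w * x[n]            # instantaneous error e(n)
    w += eta * e * x[n]            # Widrow-Hoff update
```

After this single pass, `w` is already close to the true value 3.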
Stochastic gradient descent
If the cost function is of the form
$E = \sum_{i=1}^{N} E_i$
then one gradient descent step requires computing
$\Delta \mathbf{w} = -\eta \sum_{i=1}^{N} \nabla E_i$
which means computing $E_i$ or its gradient for every data point. Many steps may be required to reach an optimum.
Stochastic gradient descent (cont.)
It is generally much more computationally efficient to use
$\mathbf{w}(n+1) = \mathbf{w}(n) - \eta \sum_{i=1}^{b} \nabla E_i$
for small values of $b$ (a mini-batch of the data). This update rule may converge in many fewer passes through the data (epochs).
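The mini-batch update can be sketched as below; the batch size, learning rate, and epoch count are illustrative assumptions, and the per-batch gradient is averaged rather than summed (a common scaling choice):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(256, 2))
w_true = np.array([2.0, -1.0])
d = X @ w_true                     # noiseless linear targets

w, eta, batch = np.zeros(2), 0.05, 16
for epoch in range(20):
    perm = rng.permutation(len(X))         # reshuffle each epoch
    for start in range(0, len(X), batch):
        idx = perm[start:start + batch]
        e = d[idx] - X[idx] @ w            # errors on this mini-batch
        w += eta * X[idx].T @ e / batch    # averaged LMS step
```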
Stochastic gradient descent: example [figure]
Stochastic gradient descent: error functions [figure]
Stochastic gradient descent: gradients [figure]
Stochastic gradient descent: animation [figure]
Gradient descent: animation [figure]
Multi-variable LMS
The analysis for the one-variable case extends to the multi-variable case:
$E = \frac{1}{2}\,[d - \mathbf{w}^T \mathbf{x}]^2$
$\nabla E = \left[\frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \ldots, \frac{\partial E}{\partial w_m}\right]^T$
where $w_0 = b$ (bias) and $x_0 = 1$, as done for perceptron learning.
Multi-variable LMS (cont.)
The LMS algorithm:
$\mathbf{w}(n+1) = \mathbf{w}(n) - \eta \nabla E = \mathbf{w}(n) + \eta\, e(n)\, \mathbf{x}(n) = \mathbf{w}(n) + \eta\,[d(n) - \mathbf{w}(n)^T \mathbf{x}(n)]\,\mathbf{x}(n)$
LMS algorithm: remarks
The LMS rule is exactly the same equation as the perceptron learning rule. Perceptron learning is for nonlinear (M-P) neurons, whereas LMS learning is for linear neurons; i.e., perceptron learning is for classification and LMS is for function approximation. LMS should be less sensitive to noise in the input data than perceptrons. On the other hand, LMS learning converges slowly. Newton's method changes weights in the direction of the minimum $E$ and leads to fast convergence, but it is not online and is computationally expensive.
Stability of adaptation
When $\eta$ is too small, learning converges slowly. When $\eta$ is too large, learning doesn't converge.
Learning rate annealing
Basic idea: start with a large rate but gradually decrease it. Stochastic approximation:
$\eta(n) = \frac{c}{n}$
$c$ is a positive parameter.
Learning rate annealing (cont.)
Search-then-converge:
$\eta(n) = \frac{\eta_0}{1 + n/\tau}$
$\eta_0$ and $\tau$ are positive parameters. When $n$ is small compared to $\tau$, the learning rate is approximately constant. When $n$ is large compared to $\tau$, the schedule roughly follows stochastic approximation.
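The two schedules are one-liners; the parameter values below ($c$, $\eta_0$, $\tau$) are illustrative, not from the slides:

```python
c, eta0, tau = 1.0, 0.5, 100.0

def stochastic_approx(n):
    return c / n                      # eta(n) = c / n

def search_then_converge(n):
    return eta0 / (1.0 + n / tau)     # eta(n) = eta0 / (1 + n/tau)

# For n << tau the rate stays near eta0; for n >> tau it behaves
# like eta0 * tau / n, i.e. the stochastic-approximation schedule.
early = search_then_converge(1)
late = search_then_converge(10000)
```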
Rate annealing illustration [figure]
Nonlinear neurons
To extend the LMS algorithm to nonlinear neurons, consider a differentiable activation function $\varphi$. At iteration $n$,
$E = \frac{1}{2}\left[d - \varphi\Big(\sum_j w_j x_j\Big)\right]^2$
Nonlinear neurons (cont.)
By the chain rule of differentiation,
$\frac{\partial E}{\partial w_j} = \frac{\partial E}{\partial e}\,\frac{\partial e}{\partial y}\,\frac{\partial y}{\partial v}\,\frac{\partial v}{\partial w_j} = -\,e\,\varphi'(v)\,x_j, \qquad v = \sum_j w_j x_j$
Nonlinear neurons (cont.)
Gradient descent gives
$w_j(n+1) = w_j(n) + \eta\, e(n)\, \varphi'(v(n))\, x_j(n) = w_j(n) + \eta\, \delta(n)\, x_j(n)$
The above is called the delta ($\delta$) rule. If we choose a logistic sigmoid for $\varphi$,
$\varphi(v) = \frac{1}{1 + \exp(-a v)}$
then
$\varphi'(v) = a\,\varphi(v)\,[1 - \varphi(v)]$
(see textbook).
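A sketch of the delta rule with the logistic sigmoid (targets are generated from an assumed "true" weight vector so convergence can be checked; the learning rate and epoch count are illustrative):

```python
import numpy as np

def phi(v, a=1.0):                 # logistic sigmoid
    return 1.0 / (1.0 + np.exp(-a * v))

def phi_prime(v, a=1.0):           # phi'(v) = a * phi(v) * (1 - phi(v))
    s = phi(v, a)
    return a * s * (1.0 - s)

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 2))
w_true = np.array([1.5, -0.5])
d = phi(X @ w_true)                # targets from a sigmoid neuron

w, eta = np.zeros(2), 0.5
for epoch in range(50):
    for xn, dn in zip(X, d):
        v = w @ xn
        delta = (dn - phi(v)) * phi_prime(v)   # delta = e * phi'(v)
        w += eta * delta * xn                  # delta rule update
```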
Role of activation function
[Figure: $\varphi(v)$ and $\varphi'(v)$ versus $v$.]
The role of $\varphi'$: the weight update is most sensitive when $v$ is near zero.