CSE 556: Introduction to Neural Networks
Linear Regression, Part II
Problem statement
Linear regression with one variable

Given a set of N pairs of data ⟨xᵢ, dᵢ⟩, approximate d by a linear function of the regressor x, i.e.

  d ≈ wx + b

where the activation function φ is a linear function; this corresponds to a linear neuron:

  y = φ(wx + b) = wx + b
  d = y + ε = wx + b + ε

y is the output of the neuron, and ε is called the regression (expectational) error.
Linear regression (cont.)

The problem of regression with one variable is how to choose w and b to minimize the regression error. The least squares method aims to minimize the square error E:

  E = (1/2) Σᵢ₌₁ᴺ εᵢ² = (1/2) Σᵢ₌₁ᴺ (dᵢ − yᵢ)²
Linear regression (cont.)

To minimize this two-variable square function, set both partial derivatives to zero:

  ∂E/∂w = 0
  ∂E/∂b = 0
Linear regression (cont.)

  ∂E/∂b = −Σᵢ₌₁ᴺ (dᵢ − wxᵢ − b) = 0
  ∂E/∂w = −Σᵢ₌₁ᴺ (dᵢ − wxᵢ − b) xᵢ = 0
Linear regression (cont.)

Hence

  w = Σᵢ₌₁ᴺ (xᵢ − x̄)(dᵢ − d̄) / Σᵢ₌₁ᴺ (xᵢ − x̄)²
  b = d̄ − w x̄

where an overbar (e.g. x̄) indicates the mean: x̄ = (1/N) Σᵢ₌₁ᴺ xᵢ. Derive yourself!
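A minimal sketch of the closed-form least-squares solution above, on a small made-up data set (the values of x and d are assumed for illustration):

```python
import numpy as np

# Hypothetical data generated from d = 2x + 1, so we can check the fit.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
d = 2.0 * x + 1.0

# Closed-form least-squares solution from the slides:
#   w = sum((x_i - xbar)(d_i - dbar)) / sum((x_i - xbar)^2),  b = dbar - w*xbar
xbar, dbar = x.mean(), d.mean()
w = np.sum((x - xbar) * (d - dbar)) / np.sum((x - xbar) ** 2)
b = dbar - w * xbar
```

For this noise-free data the recovered parameters are exactly w = 2 and b = 1.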
Linear regression (cont.)

This method gives an optimal solution, but as a batch solution it can be time- and memory-consuming.
Finding optimal parameters via search

Without loss of generality, set b = 0:

  E(w) = (1/2) Σᵢ₌₁ᴺ (dᵢ − w xᵢ)²

E is called a cost function.
Cost function

[Figure: the cost E(w) plotted against w, with its minimum Eₘᵢₙ at w*.]

Question: how can we update w to minimize E?
Gradient and directional derivatives

Without loss of generality, consider a two-variable function f(x, y). The gradient of f(x, y) at a given point (x₀, y₀)ᵀ is

  ∇f(x₀, y₀) = (∂f/∂x, ∂f/∂y)ᵀ |₍ₓ₀, y₀₎ = fₓ(x₀, y₀) uₓ + f_y(x₀, y₀) u_y

where uₓ and u_y are unit vectors in the x and y directions, and fₓ = ∂f/∂x, f_y = ∂f/∂y.
Gradient and directional derivatives (cont.)

For any given direction u = a uₓ + b u_y with a² + b² = 1, the directional derivative at (x₀, y₀)ᵀ along the unit vector u is

  D_u f(x₀, y₀) = lim_{h→0} [f(x₀ + ha, y₀ + hb) − f(x₀, y₀)] / h
    = lim_{h→0} {[f(x₀ + ha, y₀ + hb) − f(x₀, y₀ + hb)] + [f(x₀, y₀ + hb) − f(x₀, y₀)]} / h
    = a fₓ(x₀, y₀) + b f_y(x₀, y₀)
    = ∇f(x₀, y₀)ᵀ u

Which direction has the greatest slope? The gradient, because of the dot product!
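A small numerical check of the claim above, using an assumed example function f(x, y) = x² + 3y² at the point (1, 1), where the analytic gradient is (2, 6): the slope along the gradient direction equals ‖∇f‖ and is larger than the slope along any other direction, such as uₓ.

```python
import math

def f(x, y):
    # Assumed example function; gradient is (2x, 6y).
    return x * x + 3 * y * y

def directional_derivative(x0, y0, a, b, h=1e-6):
    # Finite-difference estimate of D_u f along the unit vector u = (a, b).
    return (f(x0 + h * a, y0 + h * b) - f(x0, y0)) / h

gx, gy = 2.0, 6.0                       # analytic gradient at (1, 1)
norm = math.hypot(gx, gy)               # |grad f| = sqrt(40)
d_grad = directional_derivative(1.0, 1.0, gx / norm, gy / norm)
d_x = directional_derivative(1.0, 1.0, 1.0, 0.0)   # along u_x
```

Here d_grad ≈ ‖∇f‖ ≈ 6.32, while the slope along uₓ is only about 2.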
Gradient and directional derivatives (cont.)

Example: see blackboard.
Gradient and directional derivatives (cont.)

To find the gradient at a particular point (x₀, y₀)ᵀ, first find the level curve (or contour) of f(x, y) through that point, C(x₀, y₀). A tangent vector u to C satisfies

  D_u f(x₀, y₀) = ∇f(x₀, y₀)ᵀ u = 0

because f(x, y) is constant on a level curve. Hence the gradient vector is perpendicular to the tangent vector.
An illustration of level curves

[Figure: level curves of a two-variable function, with gradient vectors perpendicular to them.]
Gradient and directional derivatives (cont.)

The gradient of a cost function is a vector with the dimension of w. It points in the direction of maximum increase of E, with a magnitude equal to the slope of the tangent of the cost function along that direction.

Can the slope be negative?
Gradient illustration

[Figure: E(w) versus w, showing the minimum Eₘᵢₙ at w* and the slope at a point w₀:

  ∇E(w₀) = lim_{Δw→0} [E(w₀ + Δw) − E(w₀)] / Δw ]
Gradient descent

Minimize the cost function via gradient (steepest) descent, a case of hill-climbing:

  w(n + 1) = w(n) − η ∇E(n)

n: iteration number; η: learning rate. See previous figure.
Gradient descent (cont.)

For the mean-square-error cost function and linear neurons:

  E(n) = (1/2) e²(n) = (1/2) [d(n) − y(n)]²
  e(n) = d(n) − y(n) = d(n) − w(n) x(n)
  ∇E(n) = ∂E/∂w = −e(n) x(n)
Gradient descent (cont.)

Hence

  w(n + 1) = w(n) + η e(n) x(n) = w(n) + η [d(n) − y(n)] x(n)

This is the least-mean-square (LMS) algorithm, or the Widrow-Hoff rule.
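A minimal sketch of the one-variable LMS update above; the target weight, learning rate, and sample count are assumed values for illustration, and the targets are noise-free so the weight should approach the true value:

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = 1.5                        # assumed target weight
x = rng.uniform(-1.0, 1.0, 500)    # assumed input samples
d = true_w * x                     # targets from a linear neuron with b = 0

w, eta = 0.0, 0.1                  # initial weight and learning rate (assumed)
for n in range(len(x)):
    y = w * x[n]                   # linear neuron output y(n)
    e = d[n] - y                   # error e(n) = d(n) - y(n)
    w += eta * e * x[n]            # LMS / Widrow-Hoff update
```

After 500 online updates, w has converged to roughly 1.5.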
Multi-variable case

The analysis for the one-variable case extends to the multi-variable case:

  E(n) = (1/2) [d(n) − wᵀ(n) x(n)]²
  ∇E = (∂E/∂w₀, ∂E/∂w₁, ..., ∂E/∂w_m)ᵀ

where w₀ = b (bias) and x₀ = 1, as done for perceptron learning.
Multi-variable case (cont.)

The LMS algorithm:

  w(n + 1) = w(n) − η ∇E(n)
    = w(n) + η e(n) x(n)
    = w(n) + η [d(n) − y(n)] x(n)
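A minimal sketch of the multi-variable LMS update above, with the bias absorbed as w₀ via x₀ = 1 as on the slide; the target weights, learning rate, and sample count are assumed values:

```python
import numpy as np

rng = np.random.default_rng(1)
true_w = np.array([0.5, -1.0, 2.0])        # assumed [b, w1, w2]
X = rng.uniform(-1.0, 1.0, (2000, 2))      # assumed input samples
Xa = np.hstack([np.ones((len(X), 1)), X])  # augment with x0 = 1 for the bias
d = Xa @ true_w                            # noise-free linear targets

w = np.zeros(3)
eta = 0.1                                  # assumed learning rate
for n in range(len(Xa)):
    e = d[n] - w @ Xa[n]                   # e(n) = d(n) - w^T(n) x(n)
    w += eta * e * Xa[n]                   # w(n+1) = w(n) + eta e(n) x(n)
```

With noise-free targets the online updates drive w to the true weight vector, bias included.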
LMS algorithm: remarks

- The LMS rule has exactly the same mathematical form as the perceptron learning rule.
- Perceptron learning is for McCulloch-Pitts neurons, which are nonlinear, whereas LMS learning is for linear neurons. In other words, perceptron learning is for classification and LMS is for function approximation.
- LMS should be less sensitive to noise in the input data than perceptrons. On the other hand, LMS learning converges slowly.
- Newton's method changes weights in the direction of the minimum of E and leads to fast convergence, but it is not an online version and is computationally expensive.
Stability of adaptation

When η is too small, learning converges slowly.
Stability of adaptation (cont.)

When η is too large, learning does not converge.
Learning rate annealing

Basic idea: start with a large rate but gradually decrease it.

Stochastic approximation:

  η(n) = c / n

where c is a positive parameter.
Learning rate annealing (cont.)

Search-then-converge:

  η(n) = η₀ / (1 + n/τ)

where η₀ and τ are positive parameters.

- When n is small compared to τ, the learning rate is approximately constant.
- When n is large compared to τ, the learning rate schedule roughly follows stochastic approximation.
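A small sketch of the two annealing schedules above; the parameter values (c, η₀, τ) are assumed for illustration. Early on the search-then-converge rate stays near η₀, and for n ≫ τ it decays like c/n with c = η₀τ:

```python
def stochastic_approx(n, c=1.0):
    # eta(n) = c / n, with c an assumed positive parameter.
    return c / n

def search_then_converge(n, eta0=0.1, tau=100.0):
    # eta(n) = eta0 / (1 + n/tau), with assumed eta0 and tau.
    return eta0 / (1.0 + n / tau)

early = search_then_converge(1)       # n << tau: roughly eta0
late = search_then_converge(10_000)   # n >> tau: roughly (eta0 * tau) / n
```

Here early ≈ 0.099 (close to η₀ = 0.1), while late ≈ 0.00099, within a few percent of stochastic_approx(10_000, c=10.0).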
Rate annealing illustration

[Figure: learning rate η(n) versus iteration n for the two schedules.]
Nonlinear neurons

To extend the LMS algorithm to nonlinear neurons, consider a differentiable activation function φ. At iteration n:

  E = (1/2) [d(n) − y(n)]² = (1/2) [d(n) − φ(Σⱼ wⱼ(n) xⱼ(n))]²
Nonlinear neurons (cont.)

By the chain rule of differentiation:

  ∂E/∂wⱼ = (∂E/∂y)(∂y/∂v)(∂v/∂wⱼ)
    = −[d(n) − y(n)] φ′(v(n)) xⱼ(n)
    = −e(n) φ′(v(n)) xⱼ(n)
Nonlinear neurons (cont.)

Gradient descent then gives

  wⱼ(n + 1) = wⱼ(n) + η e(n) φ′(v(n)) xⱼ(n) = wⱼ(n) + η δ(n) xⱼ(n)

The above is called the delta (δ) rule. If we choose a logistic sigmoid for φ,

  φ(v) = 1 / (1 + exp(−av))

then (see textbook)

  φ′(v) = a φ(v) [1 − φ(v)]
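A minimal sketch of one delta-rule update for a single sigmoidal neuron; the weights, inputs, target, and learning rate are assumed values for illustration:

```python
import math

def phi(v, a=1.0):
    # Logistic sigmoid from the slide: phi(v) = 1 / (1 + exp(-a v)).
    return 1.0 / (1.0 + math.exp(-a * v))

def phi_prime(v, a=1.0):
    # phi'(v) = a * phi(v) * (1 - phi(v))
    s = phi(v, a)
    return a * s * (1.0 - s)

# One delta-rule update (all numeric values assumed):
w = [0.2, -0.4]
x = [1.0, 0.5]
d, eta = 1.0, 0.5
v = sum(wj * xj for wj, xj in zip(w, x))   # v = w^T x = 0
y = phi(v)                                  # y = 0.5
delta = (d - y) * phi_prime(v)              # delta(n) = e(n) * phi'(v(n))
w = [wj + eta * delta * xj for wj, xj in zip(w, x)]
```

Note that φ′(0) = 0.25 is the sigmoid's maximum slope (for a = 1), which is where the update is largest.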
Role of the activation function φ

[Figure: φ(v) and φ′(v) plotted against v.]

The role of φ′: the weight update is most sensitive when v is near zero.