CSE 5526: Introduction to Neural Networks. Linear Regression, Part II

Problem statement [figure]

Problem statement (cont.) [figure]

Linear regression with one variable

Given a set of N pairs of data <x_i, d_i>, approximate d by a linear function of the regressor x, i.e.

d ≈ y = φ(x) = wx + b

or

d = wx + b + ε = y + ε

where the activation function φ is a linear function, and it corresponds to a linear neuron. y is the output of the neuron, and ε is called the regression (expectational) error.

Linear regression (cont.)

The problem of regression with one variable is how to choose w and b to minimize the regression error. The least squares method aims to minimize the square error E:

E = (1/2) Σ_{i=1..N} ε_i² = (1/2) Σ_{i=1..N} (d_i − y_i)²

Linear regression (cont.)

To minimize the two-variable square function, set

∂E/∂w = 0
∂E/∂b = 0

Linear regression (cont.)

∂E/∂b = −Σ_{i=1..N} (d_i − w x_i − b) = 0
∂E/∂w = −Σ_{i=1..N} (d_i − w x_i − b) x_i = 0

Linear regression (cont.)

Hence

w = (Σ_{i=1..N} x_i d_i − N x̄ d̄) / (Σ_{i=1..N} x_i² − N x̄²)
b = d̄ − w x̄

where an overbar (e.g. x̄) indicates the mean. Derive yourself!
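A minimal NumPy sketch of this closed-form solution (the synthetic data and variable names are illustrative, not from the slides):

```python
import numpy as np

# Illustrative synthetic data: d = 2x + 1 + noise
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100)
d = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(100)

N = len(x)
x_bar, d_bar = x.mean(), d.mean()

# Closed-form least squares, matching the formulas above
w = (np.sum(x * d) - N * x_bar * d_bar) / (np.sum(x**2) - N * x_bar**2)
b = d_bar - w * x_bar
print(w, b)  # should come out close to 2 and 1
```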

Linear regression (cont.)

This method gives an optimal solution, but it can be time- and memory-consuming as a batch solution.

Finding optimal parameters via search

Without loss of generality, set b = 0:

E(w) = (1/2) Σ_{i=1..N} (d_i − w x_i)²

E is called a cost function.

Cost function

[figure: E(w), with minimum E_min at w = w*]

Question: how can we update w to minimize E?

Gradient and directional derivatives

Without loss of generality, consider a two-variable function f(x, y). The gradient of f(x, y) at a given point (x_0, y_0)^T is

∇f(x_0, y_0) = (∂f/∂x, ∂f/∂y)^T |_(x_0, y_0) = f_x(x_0, y_0) u_x + f_y(x_0, y_0) u_y

where u_x and u_y are unit vectors in the x and y directions, and f_x = ∂f/∂x, f_y = ∂f/∂y.

Gradient and directional derivatives (cont.)

At any given direction, u = a u_x + b u_y with a² + b² = 1, the directional derivative at (x_0, y_0)^T along the unit vector u is

D_u f(x_0, y_0) = lim_{h→0} [f(x_0 + ha, y_0 + hb) − f(x_0, y_0)] / h
= lim_{h→0} {[f(x_0 + ha, y_0 + hb) − f(x_0, y_0 + hb)] + [f(x_0, y_0 + hb) − f(x_0, y_0)]} / h
= a f_x(x_0, y_0) + b f_y(x_0, y_0)
= ∇f(x_0, y_0)^T u

Which direction has the greatest slope? The gradient, because of the dot product!
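A quick numeric check of this identity, with a made-up function, point, and direction (everything here is illustrative):

```python
import numpy as np

f = lambda x, y: x**2 + 3.0 * x * y        # example two-variable function
x0, y0 = 1.0, 2.0                          # point (x_0, y_0)
a, b = np.cos(0.3), np.sin(0.3)            # unit direction: a^2 + b^2 = 1

# Analytic partials of f: f_x = 2x + 3y, f_y = 3x
f_x, f_y = 2.0 * x0 + 3.0 * y0, 3.0 * x0

h = 1e-6
numeric = (f(x0 + h * a, y0 + h * b) - f(x0, y0)) / h  # D_u f by its limit definition
analytic = a * f_x + b * f_y                           # gradient dotted with u
print(numeric, analytic)   # both approximately 8.53
```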

Gradient and directional derivatives (cont.)

Example: see blackboard.

Gradient and directional derivatives (cont.)

To find the gradient at a particular point (x_0, y_0)^T, first find the level curve or contour of f(x, y) at that point, C(x_0, y_0). A tangent vector u to C satisfies

D_u f(x_0, y_0) = ∇f(x_0, y_0)^T u = 0

because f(x, y) is constant on a level curve. Hence the gradient vector is perpendicular to the tangent vector.

An illustration of level curves [figure]

Gradient and directional derivatives (cont.)

The gradient of a cost function is a vector with the dimension of w that points in the direction of maximum E increase, and with a magnitude equal to the slope of the tangent of the cost function along that direction.

Can the slope be negative?

Gradient illustration

[figure: E(w), with minimum E_min at w = w*, and the gradient as the slope of the tangent]

∂E/∂w = lim_{Δw→0} [E(w + Δw) − E(w)] / Δw

Gradient descent

Minimize the cost function via gradient (steepest) descent, a case of hill-climbing:

w(n+1) = w(n) − η ∇E(n)

n: iteration number
η: learning rate

See the previous figure.

Gradient descent (cont.)

For the mean-square-error cost function:

E(n) = (1/2) e²(n) = (1/2) [d(n) − y(n)]²

For linear neurons, y(n) = w(n) x(n), so

∂E(n)/∂w(n) = e(n) ∂e(n)/∂w(n) = −e(n) x(n), where e(n) = d(n) − y(n)

Gradient descent (cont.)

Hence

w(n+1) = w(n) + η e(n) x(n) = w(n) + η [d(n) − y(n)] x(n)

This is the least-mean-square (LMS) algorithm, or the Widrow-Hoff rule.

Multi-variable case

The analysis for the one-variable case extends to the multi-variable case:

E(n) = (1/2) [d(n) − w^T(n) x(n)]²
∇E = (∂E/∂w_0, ∂E/∂w_1, ..., ∂E/∂w_m)^T

where w_0 = b (bias) and x_0 = 1, as done for perceptron learning.

Multi-variable case (cont.)

The LMS algorithm:

w(n+1) = w(n) − η ∇E(n)
= w(n) + η e(n) x(n)
= w(n) + η [d(n) − y(n)] x(n)
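A sketch of this online LMS loop in NumPy; the data stream, target weights, and learning rate are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
w_true = np.array([1.0, 2.0, -0.5])   # assumed target weights; w_true[0] is the bias

w = np.zeros(3)                       # w[0] = b, with x_0 = 1 as on the slide
eta = 0.05                            # learning rate

for n in range(2000):
    x = np.concatenate(([1.0], rng.uniform(-1.0, 1.0, 2)))  # x_0 = 1 for the bias
    d = w_true @ x + 0.01 * rng.standard_normal()           # noisy target d(n)
    y = w @ x                                               # linear neuron output
    e = d - y                                               # error e(n)
    w += eta * e * x                                        # LMS / Widrow-Hoff update

print(w)  # converges toward w_true
```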

LMS algorithm: Remarks

The LMS rule is exactly the same in mathematical form as the perceptron learning rule.

Perceptron learning is for McCulloch-Pitts neurons, which are nonlinear, whereas LMS learning is for linear neurons. In other words, perceptron learning is for classification and LMS is for function approximation.

LMS should be less sensitive to noise in the input data than perceptrons. On the other hand, LMS learning converges slowly.

Newton's method changes weights in the direction of the minimum E and leads to fast convergence. But it is not an online version, and it is computationally expensive.

Stability of adaptation

When η is too small, learning converges slowly. [figure]

Stability of adaptation (cont.)

When η is too large, learning doesn't converge. [figure]

Learning rate annealing

Basic idea: start with a large rate but gradually decrease it.

Stochastic approximation:

η(n) = c/n

where c is a positive parameter.

Learning rate annealing (cont.)

Search-then-converge:

η(n) = η_0 / (1 + n/τ)

η_0 and τ are positive parameters. When n is small compared to τ, the learning rate is approximately constant. When n is large compared to τ, the learning rate schedule roughly follows stochastic approximation.
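The two schedules side by side, as a small sketch (the constants c, η_0, and τ are illustrative choices, not values from the slides):

```python
def eta_stochastic(n, c=1.0):
    # Stochastic approximation: eta(n) = c / n
    return c / n

def eta_search_then_converge(n, eta0=0.1, tau=100.0):
    # Roughly constant for n << tau; behaves like eta0 * tau / n for n >> tau
    return eta0 / (1.0 + n / tau)

for n in (1, 10, 100, 1000, 10000):
    print(n, eta_stochastic(n), eta_search_then_converge(n))
```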

Rate annealing illustration [figure]

Nonlinear neurons

To extend the LMS algorithm to nonlinear neurons, consider a differentiable activation function φ at iteration n:

E(n) = (1/2) [d(n) − y(n)]² = (1/2) [d(n) − φ(Σ_j w_j(n) x_j(n))]²

Nonlinear neurons (cont.)

By the chain rule of differentiation:

∂E/∂w_j = (∂E/∂y)(∂y/∂v)(∂v/∂w_j) = −[d(n) − y(n)] φ′(v(n)) x_j(n) = −e(n) φ′(v(n)) x_j(n)

where v(n) = Σ_j w_j(n) x_j(n) is the induced local field.

Nonlinear neurons (cont.)

The gradient descent gives

w_j(n+1) = w_j(n) + η e(n) φ′(v(n)) x_j(n) = w_j(n) + η δ(n) x_j(n)

The above is called the delta (δ) rule.

If we choose a logistic sigmoid for φ, i.e.

φ(v) = 1 / (1 + exp(−a v))

then

φ′(v) = a φ(v) [1 − φ(v)] (see textbook)
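A sketch of the delta rule with the logistic sigmoid (taking a = 1); the data stream and target weights are made up for illustration:

```python
import numpy as np

def phi(v, a=1.0):
    return 1.0 / (1.0 + np.exp(-a * v))   # logistic sigmoid

def phi_prime(v, a=1.0):
    s = phi(v, a)
    return a * s * (1.0 - s)              # phi'(v) = a * phi(v) * [1 - phi(v)]

rng = np.random.default_rng(2)
w_true = np.array([0.5, 1.5, -1.0])       # assumed target weights
w = np.zeros(3)
eta = 0.5

for n in range(5000):
    x = np.concatenate(([1.0], rng.uniform(-1.0, 1.0, 2)))
    d = phi(w_true @ x)                   # target from an assumed "true" neuron
    v = w @ x                             # induced local field v(n)
    y = phi(v)                            # nonlinear neuron output
    delta = (d - y) * phi_prime(v)        # delta(n) = e(n) * phi'(v(n))
    w += eta * delta * x                  # delta rule update

print(w)  # approaches w_true
```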

Role of activation function φ

[figure: φ(v) and φ′(v) plotted against v]

The role of φ′: the weight update is most sensitive when v is near zero.