CS 2750 Machine Learning. Lecture 8: Linear regression.

CS 2750 Machine Learning
Lecture 8: Linear regression
Milos Hauskrecht, milos@cs.pitt.edu, 5329 Sennott Square

Linear regression

Function f : X -> Y is a linear combination of the input components:

    f(x) = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_d x_d = w_0 + \sum_{j=1}^{d} w_j x_j

w_0, w_1, \dots, w_d are the parameters (weights), w_0 is the bias term, and x = (x_1, \dots, x_d) is the input vector.
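A minimal numeric sketch of this linear combination, assuming numpy; the weight and input values are made up for illustration:

```python
import numpy as np

# f(x) = w_0 + sum_j w_j * x_j: a linear combination of input components
def predict(w0, w, x):
    return w0 + np.dot(w, x)

x = np.array([1.0, 2.0, 3.0])    # input vector, d = 3
w = np.array([0.5, -1.0, 2.0])   # weights (illustrative values)
print(predict(0.1, w, x))        # 0.1 + 0.5 - 2.0 + 6.0 = 4.6
```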

Linear regression. Error.

Data: D = { <x_i, y_i> }, i = 1, \dots, n. Function: f(x_i). We would like to have y_i \approx f(x_i) for all i = 1, \dots, n. An error function measures how much our predictions deviate from the desired answers. Mean-squared error:

    J_n = \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2

Learning: we want to find the weights minimizing the error!

Linear regression. Example.
[Figure: a linear fit for a 1-dimensional input.]
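A small sketch of the mean-squared error, assuming numpy arrays X (one example per row) and y:

```python
import numpy as np

def mse(w0, w, X, y):
    """J_n = (1/n) * sum_i (y_i - f(x_i))^2, with f(x) = w0 + w . x."""
    return np.mean((y - (w0 + X @ w)) ** 2)
```

Learning then amounts to searching for the (w0, w) minimizing this quantity, which the next slides do first exactly and then iteratively.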

Linear regression. Example.
[Figure: a plane fit for a 2-dimensional input.]

Solving linear regression

The optimal set of weights satisfies:

    \nabla_w J_n(w) = 0

This leads to a system of linear equations (SLE) with d + 1 unknowns, A w = b, whose j-th equation has the form:

    w_0 \sum_{i=1}^{n} x_{i,j} + w_1 \sum_{i=1}^{n} x_{i,1} x_{i,j} + \dots + w_d \sum_{i=1}^{n} x_{i,d} x_{i,j} = \sum_{i=1}^{n} y_i x_{i,j}

Solution to the SLE: matrix inversion, w = A^{-1} b.
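A sketch of the exact solution in numpy: build the system from a design matrix with a leading column of ones for the bias and solve it directly (np.linalg.solve is preferable to forming A^{-1} explicitly). The synthetic data are made up for illustration:

```python
import numpy as np

def fit_linear(X, y):
    """Solve the normal equations A w = b for the d+1 weights."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend 1s for the bias w_0
    A = Xb.T @ Xb
    b = Xb.T @ y
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=100)
print(fit_linear(X, y))   # approximately [1.0, 2.0, -3.0]
```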

Gradient descent solution

Goal: the weight optimization in the linear regression model,

    Error(w) = J_n = \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i, w))^2

Iterative solution: gradient descent, a first-order method. Idea: adjust the weights in the direction that improves the error; the gradient tells us what the right direction is:

    w \leftarrow w - \alpha \nabla_w Error(w)

where \alpha > 0 is a learning rate that scales the gradient changes.

Gradient descent method

Descend using the gradient information: the direction of the descent is the negative gradient, and the value of w is changed according to

    w \leftarrow w - \alpha \nabla_w Error(w)

[Figure: a single descent step on the error curve, moving w toward the minimizer w*.]
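A sketch of batch gradient descent on J_n, assuming X already carries a leading column of ones for the bias; the learning rate and step count are illustrative choices, not prescriptions:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, steps=1000):
    """Repeatedly step against the gradient of J_n(w)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = -(2.0 / len(y)) * X.T @ (y - X @ w)  # gradient of the MSE
        w -= alpha * grad                            # w <- w - alpha * grad
    return w
```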

Gradient descent method

Iteratively approaches the optimum of the error function.
[Figure: successive gradient steps w^(1), w^(2), w^(3), ... converging on the error curve.]

Online gradient method

Linear model: f(x) = w^T x. On-line error on the i-th example:

    J_{online} = \frac{1}{2} (y_i - f(x_i))^2

On-line algorithm: generates a sequence of online updates; the i-th update step uses the single example D_i = <x_i, y_i>. The update of the j-th weight is:

    w_j \leftarrow w_j + \alpha (y_i - f(x_i)) x_{i,j}

Fixed learning rate: use a small constant \alpha = C.
Annealed learning rate: \alpha \approx 1/i, which gradually rescales the changes.
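A sketch of the online (LMS-style) update with both learning-rate schedules; the constant C is a made-up value:

```python
import numpy as np

def online_update(X, y, C=0.5, annealed=True):
    """One pass of online updates w <- w + alpha * (y_i - w.x_i) * x_i."""
    w = np.zeros(X.shape[1])
    for i, (x_i, y_i) in enumerate(zip(X, y), start=1):
        alpha = C / i if annealed else C   # annealed ~ 1/i vs. fixed rate
        w += alpha * (y_i - w @ x_i) * x_i
    return w
```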

On-line learning. Example.
[Figure: four snapshots of the on-line fit after successive updates.]

Extensions of the simple linear model

Replace the inputs to the linear units with feature (basis) functions to model nonlinearities:

    f(x) = w_0 + \sum_{j=1}^{m} w_j \phi_j(x)

where \phi_j(x) is an arbitrary function of x. The same techniques as before can be used to learn the weights.

Additive linear models

Models linear in the parameters w we want to fit:

    f(x) = w_0 + \sum_{k=1}^{m} w_k \phi_k(x)

w_0, w_1, \dots, w_m are the parameters; \phi_1, \phi_2, \dots, \phi_m are the feature (basis) functions.

Examples of basis functions:
- a higher-order polynomial of a one-dimensional input: \phi_1(x) = x, \phi_2(x) = x^2, \phi_3(x) = x^3
- a multidimensional quadratic: \phi_1(x) = x_1, \phi_2(x) = x_1^2, \phi_3(x) = x_2, \phi_4(x) = x_2^2, \phi_5(x) = x_1 x_2
- other types of basis functions: \phi_1(x) = \sin x, \phi_2(x) = \cos x

Fitting additive linear models

Error function:

    J_n = \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2

Assume \phi_0(x) = 1. Setting \partial J_n / \partial w_j = 0 leads to a system of m + 1 linear equations, the j-th of which has the form:

    w_0 \sum_{i=1}^{n} \phi_0(x_i) \phi_j(x_i) + \dots + w_m \sum_{i=1}^{n} \phi_m(x_i) \phi_j(x_i) = \sum_{i=1}^{n} y_i \phi_j(x_i)

It can be solved exactly like the linear case.
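A sketch of fitting an additive model, using the multidimensional quadratic basis above as a hypothetical choice; the fit reduces to ordinary least squares on the expanded design matrix:

```python
import numpy as np

def quadratic_features(X):
    """phi(x) = (1, x1, x1^2, x2, x2^2, x1*x2) for 2-dimensional inputs."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x1**2, x2, x2**2, x1 * x2])

def fit_additive(X, y):
    Phi = quadratic_features(X)                    # rows: phi_k(x_i)
    return np.linalg.lstsq(Phi, y, rcond=None)[0]  # solves the m+1 equations
```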

Example: regression with polynomials

Regression with polynomials of degree m. Data points: pairs <x, y>. Feature functions: m feature functions \phi_i(x) = x^i, i = 1, \dots, m. Function to learn:

    f(x, w) = w_0 + \sum_{i=1}^{m} w_i \phi_i(x) = w_0 + w_1 x + w_2 x^2 + \dots + w_m x^m

Learning with feature functions

Function to learn: f(x, w) = w_0 + \sum_{j=1}^{m} w_j \phi_j(x). On-line gradient update for the <x, y> pair:

    w_0 \leftarrow w_0 + \alpha (y - f(x, w))
    w_k \leftarrow w_k + \alpha (y - f(x, w)) \phi_k(x)

The gradient updates are of the same form as in the linear and logistic regression models.
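A sketch of the exact polynomial fit, using numpy's Vandermonde matrix to build the feature functions x^i:

```python
import numpy as np

def fit_polynomial(x, y, m):
    """Least-squares fit of f(x) = w_0 + w_1 x + ... + w_m x^m."""
    Phi = np.vander(x, m + 1, increasing=True)     # columns: 1, x, ..., x^m
    return np.linalg.lstsq(Phi, y, rcond=None)[0]
```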

Example: regression with polynomials

For regression with polynomials of degree m, f(x, w) = w_0 + \sum_{j=1}^{m} w_j x^j, the on-line update for an <x, y> pair is:

    w_0 \leftarrow w_0 + \alpha (y - f(x, w))
    w_j \leftarrow w_j + \alpha (y - f(x, w)) x^j

Multidimensional additive model example
[Figure: an additive model fit to data with a 2-dimensional input.]

Multidimensional additive model example
[Figure: the fitted surface of the multidimensional additive model.]

Statistical model of regression

A generative model:

    y = f(x) + \varepsilon

where f(x) is a deterministic function and \varepsilon is random noise; it represents the things we cannot capture with f(x), e.g. \varepsilon \sim N(0, \sigma^2).
[Figure: data scattered around a deterministic linear function of a 1-dimensional input.]

Statistical model of regression

Assume the generative model y = f(x) + \varepsilon, where f(x) = w^T x is a linear model and \varepsilon \sim N(0, \sigma^2). Then E[y | x] = f(x) models the mean of the outputs for x, and the noise \varepsilon models deviations from that mean. The model defines the conditional density of y given x:

    p(y | x, w) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y - f(x))^2}{2\sigma^2} \right)

ML estimation of the parameters

Likelihood of predictions: the probability of observing the outputs y in D given w and x:

    L(D, w) = \prod_{i=1}^{n} p(y_i | x_i, w)

Maximum likelihood estimation: choose the parameters maximizing the likelihood of the predictions,

    w^* = \arg\max_w L(D, w)

Log-likelihood trick for the ML optimization: maximizing the log-likelihood is equivalent to maximizing the likelihood,

    l(D, w) = \log L(D, w) = \sum_{i=1}^{n} \log p(y_i | x_i, w)
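A sketch of the generative view: sample data as y = w^T x + noise and evaluate the Gaussian log-likelihood; w_true and sigma are made-up values:

```python
import numpy as np

rng = np.random.default_rng(1)
w_true, sigma = np.array([1.0, 2.0]), 0.5

# y = f(x) + eps with f(x) = w.x and eps ~ N(0, sigma^2)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ w_true + rng.normal(scale=sigma, size=50)

def log_likelihood(w, sigma, X, y):
    """l(D, w) = sum_i log p(y_i | x_i, w) under Gaussian noise."""
    resid = y - X @ w
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - resid**2 / (2 * sigma**2))

print(log_likelihood(w_true, sigma, X, y))
```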

ML estimation of the parameters

Using the conditional density

    p(y | x, w) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(y - f(x))^2}{2\sigma^2} \right]

we can rewrite the log-likelihood as

    l(D, w) = \sum_{i=1}^{n} \log p(y_i | x_i, w) = -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - f(x_i))^2 + C

Maximizing with regard to w is therefore equivalent to minimizing the squared-error function.

ML estimation of parameters

The criteria based on the mean squared error function and on the log-likelihood of the output are related:

    l(D, w) = -\frac{n}{2\sigma^2} J_n + c

We already know how to optimize the weights: the same approach as used for the least-squares fit. But what is the ML estimate of the variance of the noise? Maximizing the log-likelihood with respect to the variance gives

    \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i, w^*))^2

the mean squared prediction error for the best predictor.
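A short sketch of that variance estimate: fit the weights by least squares, then take the mean squared residual:

```python
import numpy as np

def ml_noise_variance(X, y):
    """ML estimate of sigma^2: mean squared residual of the LS fit."""
    w = np.linalg.lstsq(X, y, rcond=None)[0]
    return np.mean((y - X @ w) ** 2)
```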

Regularized linear regression

If the number of parameters is large relative to the number of data points used to train the model, we face the threat of overfitting: the generalization error of the model goes up. The prediction accuracy can often be improved by setting some coefficients to zero, which increases the bias but reduces the variance of the estimates.

Solutions:
- subset selection
- ridge regression
- principal component regression

Next: ridge regression.

Ridge regression

Error function for the standard least-squares estimates:

    J_n(w) = \frac{1}{n} \sum_{i=1}^{n} (y_i - w^T x_i)^2,    and we seek w^* = \arg\min_w J_n(w)

Ridge regression:

    J_n(w) = \frac{1}{n} \sum_{i=1}^{n} (y_i - w^T x_i)^2 + \lambda \|w\|^2

where \|w\|^2 = \sum_{j=0}^{d} w_j^2 and \lambda \ge 0. What does the new error function do?

Ridge regression

Standard regression:  J_n(w) = \frac{1}{n} \sum_{i=1}^{n} (y_i - w^T x_i)^2
Ridge regression:     J_n(w) = \frac{1}{n} \sum_{i=1}^{n} (y_i - w^T x_i)^2 + \lambda \|w\|^2

The added term penalizes non-zero weights with a cost proportional to \lambda, a shrinkage coefficient. If an input attribute has a small effect on improving the error function, it is "shut down" by the penalty term. The inclusion of a shrinkage penalty is often referred to as regularization.

Regularized linear regression

How do we solve the least-squares problem when the error function is enriched by the regularization term \lambda w^T w? Answer: the optimal set of weights is again obtained by solving a set of linear equations.

Standard linear regression solution:

    w^* = (X^T X)^{-1} X^T y

Regularized linear regression:

    w^* = (\lambda I + X^T X)^{-1} X^T y

where X is an n x d matrix with rows corresponding to examples and columns to inputs.
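A sketch of the regularized closed form. Whether the bias weight is penalized depends on whether X carries a leading column of ones; the slide's formula penalizes everything in w:

```python
import numpy as np

def fit_ridge(X, y, lam):
    """w* = (lambda*I + X^T X)^{-1} X^T y, solved without explicit inversion."""
    d = X.shape[1]
    return np.linalg.solve(lam * np.eye(d) + X.T @ X, X.T @ y)
```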

Regularized linear regression

Problem: how do we determine the parameter \lambda that controls the amount of over-fit? Overfitting is tied to the ML estimate; a Bayesian approach alleviates the problem.

Bias and Variance

Expected error = Bias^2 + Variance.
The expected error is the expected discrepancy between the estimated and the true function:

    E[ (\hat{f}(X) - E[f(X)])^2 ]

The bias is the squared discrepancy between the averaged estimated function and the true function:

    ( E[\hat{f}(X)] - E[f(X)] )^2

The variance is the expected divergence of the estimated function from its average value:

    E[ (\hat{f}(X) - E[\hat{f}(X)])^2 ]

Bias and Variance

Expected error = Bias^2 + Variance:

    E[ (\hat{f}(X) - E[f(X)])^2 ]
      = E[ (\hat{f}(X) - E[\hat{f}(X)] + E[\hat{f}(X)] - E[f(X)])^2 ]
      = E[ (\hat{f}(X) - E[\hat{f}(X)])^2 ] + 2 E[ \hat{f}(X) - E[\hat{f}(X)] ] ( E[\hat{f}(X)] - E[f(X)] ) + ( E[\hat{f}(X)] - E[f(X)] )^2

The cross term vanishes because E[ \hat{f}(X) - E[\hat{f}(X)] ] = 0, leaving

    = ( E[\hat{f}(X)] - E[f(X)] )^2 + E[ (\hat{f}(X) - E[\hat{f}(X)])^2 ] = bias^2 + variance

Under-fitting and over-fitting

Under-fitting:
- high bias: the models are not accurate
- small variance: the examples in the training set have a smaller influence

Over-fitting:
- small bias: the models are flexible enough to fit the training data well
- large variance: the models depend very much on the training set
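A simulation sketch of this trade-off: repeatedly refit polynomials of different degrees to fresh noisy training sets and measure the bias^2 and variance of the prediction at one test point. The true function, noise level, sample size, and degrees are all hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(2)
f_true = np.sin            # hypothetical true function
x_test = 1.0               # point at which bias/variance are measured

def fit_and_predict(m):
    """Fit a degree-m polynomial to a fresh training set; predict at x_test."""
    x = rng.uniform(-3, 3, size=20)
    y = f_true(x) + rng.normal(scale=0.3, size=20)
    w = np.linalg.lstsq(np.vander(x, m + 1, increasing=True), y, rcond=None)[0]
    return np.polyval(w[::-1], x_test)   # polyval wants highest degree first

for m in (1, 3, 9):        # under-fit, reasonable fit, over-fit
    preds = np.array([fit_and_predict(m) for _ in range(500)])
    bias2 = (preds.mean() - f_true(x_test)) ** 2
    print(f"degree {m}: bias^2 = {bias2:.4f}, variance = {preds.var():.4f}")
```

Low degrees should show high bias and low variance; high degrees the reverse, matching the under-fitting and over-fitting picture above.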