CS 2750 Machine Learning. Lecture 7. Linear regression.


CS 2750 Machine Learning, Lecture 7: Linear regression. Milos Hauskrecht, milos@cs.pitt.edu, 5329 Sennott Square.

Linear regression. A function f : X -> Y is a linear combination of the input components:

f(x) = w_0 + w_1 x_1 + w_2 x_2 + ... + w_d x_d = w_0 + Σ_{j=1}^d w_j x_j

where w_0, w_1, ..., w_d are the parameters (weights), w_0 is the bias term, and x = (x_1, ..., x_d) is the input vector.

Linear regression. Error. Data: D = {<x_1, y_1>, ..., <x_n, y_n>}. Function: f(x). We would like to have y_i ≈ f(x_i) for all i = 1, ..., n. An error function measures how much our predictions deviate from the desired answers. Mean-squared error:

J_n = (1/n) Σ_{i=1}^n (y_i − f(x_i))²

Learning: we want to find the weights minimizing the error.

Linear regression. Example: one-dimensional input. (Figure: data points and the fitted regression line.)
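The mean-squared error above is straightforward to compute directly; a minimal NumPy sketch, with made-up toy data and candidate weights chosen purely for illustration:

```python
import numpy as np

# Toy 1-D data lying exactly on y = 1 + 2*x (noise omitted for clarity).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

def mse(w0, w1, x, y):
    """Mean-squared error J_n = (1/n) * sum_i (y_i - f(x_i))^2 for f(x) = w0 + w1*x."""
    pred = w0 + w1 * x
    return np.mean((y - pred) ** 2)

print(mse(1.0, 2.0, x, y))  # exact fit here, so the error is 0.0
print(mse(0.0, 2.0, x, y))  # every prediction is off by 1, so the error is 1.0
```

Learning then amounts to searching over (w0, w1) for the pair minimizing this quantity.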

Linear regression. Example: two-dimensional input. (Figure: fitted plane over two input dimensions.)

Solving linear regression. The optimal set of weights satisfies ∇_w J_n(w) = 0. This leads to a system of linear equations (SLE) with d+1 unknowns, of the form A w = b; with x_{i,0} = 1, the j-th equation is

Σ_{i=1}^n x_{i,j} (w_0 + w_1 x_{i,1} + ... + w_d x_{i,d}) = Σ_{i=1}^n x_{i,j} y_i,   for j = 0, ..., d.

Solution to the SLE: matrix inversion, w = A⁻¹ b.
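The normal-equation system A w = b can be assembled and solved directly; a small sketch on synthetic noise-free data (the augmenting column of ones carries the bias w_0, and `np.linalg.solve` is used rather than forming A⁻¹ explicitly):

```python
import numpy as np

# Synthetic data generated from y = 1 + 2*x1 - 3*x2 (no noise, so the fit is exact).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Augment with x_{i,0} = 1 so that w[0] plays the role of the bias w_0.
Xa = np.hstack([np.ones((len(X), 1)), X])

# Normal equations: A w = b with A = X^T X and b = X^T y.
A = Xa.T @ Xa
b = Xa.T @ y
w = np.linalg.solve(A, b)

print(w)  # close to [1, 2, -3]
```

Solving the linear system is numerically preferable to explicit matrix inversion, though both express the same w = A⁻¹ b from the slide.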

Gradient descent solution. Goal: weight optimization in the linear regression model with Error(w) = (1/n) Σ_{i=1}^n (y_i − f(x_i))². Iterative solution: gradient descent, a first-order method. Idea: adjust the weights in the direction that improves the error; the gradient tells us what the right direction is:

w ← w − α ∇_w Error(w)

where α > 0 is a learning rate that scales the gradient changes.

Gradient descent method. Descend using the gradient information: from the current point w*, move in the direction of descent, changing the value of w according to w ← w − α ∇_w Error(w).
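The batch update rule above can be sketched in a few lines; a minimal illustration on noise-free toy data, with the step size α and iteration count chosen arbitrarily for the example:

```python
import numpy as np

# Fit f(x) = w0 + w1*x to data from y = 1 + 2*x by batch gradient descent.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

w = np.zeros(2)      # [w0, w1], starting from zero
alpha = 0.05         # learning rate
for _ in range(5000):
    err = y - (w[0] + w[1] * x)
    # Gradient of (1/n) sum_i (y_i - f(x_i))^2 with respect to w0 and w1.
    grad = np.array([-2 * err.mean(), -2 * (err * x).mean()])
    w = w - alpha * grad     # w <- w - alpha * grad Error(w)

print(w)  # approaches [1, 2]
```

Each pass uses the full dataset to compute the gradient, in contrast with the online method that follows.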

Gradient descent method. Iteratively approaches the optimum of the error function. For the j-th weight: w_j ← w_j − α ∂Error(w)/∂w_j.

Online gradient method. Linear model f(x) = wᵀx. Online error: Error_i(w) = (1/2)(y_i − f(x_i))². The online algorithm generates a sequence of online updates; the i-th update step, with D_i = <x_i, y_i>:

w_j ← w_j + α (y_i − f(x_i)) x_{i,j}

Fixed learning rate: use a small constant α = C. Annealed learning rate: α ∝ 1/i, which gradually rescales the changes.
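The online update can be sketched as follows; a minimal example with a fixed learning rate and a made-up noise-free data stream (one sample drawn per step):

```python
import numpy as np

# Online gradient updates for f(x) = w^T x, with x_{i,0} = 1 carrying the bias.
rng = np.random.default_rng(1)
true_w = np.array([1.0, 2.0])            # [bias, slope] of the generating model

w = np.zeros(2)
alpha = 0.1                              # fixed learning rate (alpha = C)
for i in range(2000):
    xi = np.array([1.0, rng.uniform(-1, 1)])
    yi = true_w @ xi                     # noise-free sample from the model
    # i-th update step: w_j <- w_j + alpha * (y_i - f(x_i)) * x_{i,j}
    w = w + alpha * (yi - w @ xi) * xi

print(w)  # approaches [1, 2]
```

With noisy data, the annealed schedule α ∝ 1/i from the slide is what lets the sequence of updates settle rather than jitter around the optimum.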

Online learning. Example. (Figure: sequence of fits after successive online updates.)

Extensions of the simple linear model. Replace the inputs to the linear units with feature (basis) functions to model nonlinearities:

f(x) = w_0 + Σ_{j=1}^m w_j φ_j(x)

where φ_j(x) is an arbitrary function of x. The same techniques as before are used to learn the weights.

Additive linear models. Models linear in the parameters we want to fit:

f(x) = w_0 + Σ_{k=1}^m w_k φ_k(x)

w_0, w_1, ..., w_m are the parameters; φ_1, φ_2, ..., φ_m are feature (or basis) functions. Basis function examples: a higher-order polynomial of a one-dimensional input, φ_1(x) = x, φ_2(x) = x², φ_3(x) = x³; a multidimensional quadratic, φ_1(x) = x_1, φ_2(x) = x_1², φ_3(x) = x_2, φ_4(x) = x_2², φ_5(x) = x_1 x_2; other types of basis functions, φ_1(x) = sin x, φ_2(x) = cos x.

Fitting additive linear models. Error function: J_n = (1/n) Σ_{i=1}^n (y_i − f(x_i))². Assume φ_0(x) = 1. Setting the gradient to zero leads to a system of m+1 linear equations,

Σ_{i=1}^n φ_j(x_i) (w_0 φ_0(x_i) + w_1 φ_1(x_i) + ... + w_m φ_m(x_i)) = Σ_{i=1}^n φ_j(x_i) y_i,   for j = 0, ..., m,

which can be solved exactly like the linear case.
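Because the model stays linear in the parameters, the fit reduces to ordinary least squares on a design matrix of basis-function values. A minimal sketch with the polynomial basis φ_k(x) = x^k (the target function here is a made-up cubic so the recovered weights are checkable):

```python
import numpy as np

# Additive model with basis functions phi_k(x) = x^k, k = 0..m (phi_0(x) = 1).
m = 3
x = np.linspace(-1, 1, 30)
y = 0.5 - x + 2 * x**3                       # target is itself a cubic

# Design matrix: columns 1, x, x^2, x^3 — one column per basis function.
Phi = np.vander(x, m + 1, increasing=True)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # solve the least-squares system

print(w)  # approaches [0.5, -1, 0, 2]
```

Any other basis (quadratic cross-terms, sines and cosines) only changes how the columns of the design matrix are computed; the solve step is identical.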

Example: regression with polynomials. Regression with polynomials of degree m. Data points: pairs <x, y>. Feature functions: m feature functions φ_k(x) = x^k, k = 1, ..., m. Function to learn:

f(x, w) = w_0 + Σ_{k=1}^m w_k φ_k(x) = w_0 + w_1 x + w_2 x² + ... + w_m x^m

Learning with feature functions. For the function f(x, w) = w_0 + Σ_k w_k φ_k(x), the online gradient update for the pair <x, y> is

w_0 ← w_0 + α (y − f(x, w)),   w_k ← w_k + α (y − f(x, w)) φ_k(x).

Gradient updates are of the same form as in the linear and logistic regression models.

Example: regression with polynomials of degree m, f(x, w) = w_0 + Σ_{k=1}^m w_k x^k. Online update for the pair <x, y>:

w_0 ← w_0 + α (y − f(x, w)),   w_k ← w_k + α (y − f(x, w)) x^k.

Multidimensional additive model example. (Figure: fitted nonlinear surface over two inputs.)

Multidimensional additive model example. (Figure continued.)

Statistical model of regression. A generative model: y = f(x) + ε, where f(x) is a deterministic function and ε is random noise; it represents things we cannot capture with f(x), e.g. ε ~ N(0, σ²). (Figure: one-dimensional data scattered around the regression line.)

Statistical model of regression. Assume the generative model y = f(x) + ε, where f(x) = wᵀx is a linear model and ε ~ N(0, σ²). Then E[y | x] = f(x) models the mean of the outputs for x, and the noise ε models deviations from the mean. The model defines the conditional density of y given x:

p(y | x, w) = (1/√(2πσ²)) exp( −(y − f(x))² / (2σ²) )

ML estimation of the parameters. Likelihood of predictions: the probability of observing the outputs y in D given X, w, and σ²:

L(D, w) = ∏_{i=1}^n p(y_i | x_i, w)

Maximum likelihood estimation of the parameters: choose the parameters maximizing the likelihood of the predictions, w* = arg max_w L(D, w). Log-likelihood trick for the ML optimization: maximizing the log-likelihood is equivalent to maximizing the likelihood,

l(D, w) = log L(D, w) = Σ_{i=1}^n log p(y_i | x_i, w)

ML estimation of the parameters. Using the conditional density p(y | x, w) = (1/√(2πσ²)) exp(−(y − f(x))² / (2σ²)), we can rewrite the log-likelihood as

l(D, w) = −(1/(2σ²)) Σ_{i=1}^n (y_i − f(x_i))² + c

where c does not depend on w. Maximizing with regard to w is therefore equivalent to minimizing the squared error function.

ML estimation of parameters. The criteria based on the mean-squared error function and on the log-likelihood of the outputs are related: we already know how to optimize the parameters, using the same approach as for the least-squares fit. But what is the ML estimate of the variance of the noise? Maximizing the log-likelihood with respect to the variance gives

σ̂² = (1/n) Σ_{i=1}^n (y_i − f(x_i, w*))²

the mean squared prediction error for the best predictor.
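The variance estimate σ̂² can be checked numerically; a sketch on synthetic data generated with a known noise level (the true weights and σ here are made up for the illustration):

```python
import numpy as np

# Data from y = 1 + 2*x + eps, eps ~ N(0, sigma^2) with sigma = 0.5.
rng = np.random.default_rng(2)
n = 50000
x = rng.uniform(-1, 1, n)
sigma = 0.5
y = 1.0 + 2.0 * x + rng.normal(0.0, sigma, n)

# Best (least-squares / ML) predictor w*.
Xa = np.column_stack([np.ones(n), x])
w, *_ = np.linalg.lstsq(Xa, y, rcond=None)

# ML estimate of the noise variance: mean squared prediction error of w*.
sigma2_hat = np.mean((y - Xa @ w) ** 2)
print(sigma2_hat)  # close to sigma**2 = 0.25
```

With a large sample, the mean squared residual of the fitted model recovers the generating σ², matching the slide's formula for σ̂².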

Regularized linear regression. If the number of parameters is large relative to the number of data points used to train the model, we face the threat of overfitting (the generalization error of the model goes up). The prediction accuracy can often be improved by setting some coefficients to zero; this increases the bias but reduces the variance of the estimates. Solutions: subset selection; ridge regression; principal component regression. Next: ridge regression.

Ridge regression. Error function for the standard least-squares estimates:

J_n(w) = (1/n) Σ_{i=1}^n (y_i − wᵀx_i)²,   and we seek w* = arg min_w J_n(w).

Ridge regression:

J_n(w) = (1/n) Σ_{i=1}^n (y_i − wᵀx_i)² + λ‖w‖²

where ‖w‖² = w_0² + w_1² + ... + w_d² and λ ≥ 0. What does the new error function do?

Ridge regression. Standard regression minimizes (1/n) Σ_{i=1}^n (y_i − wᵀx_i)²; ridge regression minimizes (1/n) Σ_{i=1}^n (y_i − wᵀx_i)² + λ‖w‖², which penalizes non-zero weights with a cost proportional to λ, a shrinkage coefficient. If an input attribute x_j has a small effect on improving the error function, it is shut down by the penalty term. The inclusion of a shrinkage penalty is often referred to as regularization.

Regularized linear regression. How do we solve the least-squares problem when the error function is enriched by the regularization term λ‖w‖²? Answer: the solution for the optimal set of weights is obtained again by solving a set of linear equations. Standard linear regression:

w* = (XᵀX)⁻¹ Xᵀ y

where X is an n × d matrix with rows corresponding to examples and columns to inputs. Regularized linear regression:

w* = (λI + XᵀX)⁻¹ Xᵀ y
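Both closed-form solutions differ only by the λI term; a minimal sketch comparing them on synthetic data (no bias column, and made-up weights and noise level, purely for illustration):

```python
import numpy as np

# Synthetic data: y = X @ [2, -1, 0.5] + small Gaussian noise.
rng = np.random.default_rng(3)
X = rng.normal(size=(30, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0.0, 0.1, 30)

def ridge(X, y, lam):
    """Closed-form ridge solution w* = (lam*I + X^T X)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(lam * np.eye(d) + X.T @ X, X.T @ y)

w_ls = ridge(X, y, 0.0)      # lam = 0 recovers standard least squares
w_ridge = ridge(X, y, 10.0)  # shrunk toward zero by the penalty

print(np.linalg.norm(w_ridge), "<", np.linalg.norm(w_ls))
```

The shrinkage effect is visible directly: the norm of the ridge weights is smaller than that of the unregularized least-squares weights, and grows back toward it as λ → 0.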