Linear Regression Linear Regression with Shrinkage. Some slides are due to Tommi Jaakkola, MIT AI Lab

Lear Regresso Lear Regresso th Shrkage Some sldes are due to Tomm Jaakkola, MIT AI Lab

Itroducto The goal of regresso s to make quattatve real valued predctos o the bass of a vector of features or attrbutes. Examples: house prces, stock values, survval tme, fuel effcec of cars, etc. Predctg vehcle fuel effcec mpg from 8 attrbutes:

A geerc regresso problem The put attrbutes are gve as fxed legth vectors that could come from dfferet sources: puts, trasformato of puts log, square root, etc or bass fuctos. The outputs are assumed to be real valued R th some possble restrctos. Gve d trag samples D {x,...x, } from uko dstrbuto Px,, the goal s to mmze the predcto error loss o e examples x, dra at radom from the same Px,. A a example of a loss fucto: L f x, f x squared loss our predcto for x M x R

Regresso Fucto We eed to defe a class of fuctos tpes of predctos e make. Lear predcto: f x;, + x here, are the parameters e eed estmate.

Lear Regresso Tpcall e have a set of trag data x,... x, from hch e estmate the parameters,. x x,.., xm Each s a vector of measuremets for th case. or h x { h x,..., h x } a bass expaso of x M x f x; M t + jh j x h x j defe h x

Bass Fuctos There are ma bass fuctos e ca use e.g. Polomal h x Radal bass fuctos Sgmodal h j j x x j h j x x µ j σ s Sples, Fourer, Wavelets, etc exp xµ j s

Estmato Crtero We eed a fttg/estmato crtero to select approprate values for the parameters, based o the trag set D { x,... x, } For example, e ca use the emprcal loss: J, f x;, ote: the loss s the same as evaluato

Emprcal loss: motvato Ideall, e ould lke to fd the parameters, that mmze the expected loss ulmted trag data: here the expectato s over samples from Px,. ~,, ;, x f E J P x, Whe the umber of trag examples s large: P x x f x f E ~,, ;, ; Expected loss Emprcal loss

Lear Regresso Estmato Mmze the emprcal squared loss x f J, ;, x B settg the dervatves th respect to to zero e get ecessar codtos for the optmal parameter values., x x J, x J,

Iterpretato The optmalt codtos x x x ε esure that the predcto error x s decorrelated th a lear fucto of the puts.

...... X Lear Regresso: Matrx Form samples M+ M+... x x X X, t x J

Lear Regresso Soluto B settg the dervatves of to zero, X t X t e get the soluto: t X X X ˆ t t X X X The soluto s a lear fucto of the outputs.

Statstcal ve of lear regresso I a statstcal regresso model e model both the fucto ad ose Observed output fucto + ose x f x; +ε here, e.g., ε ~ N, σ Whatever e caot capture th our chose faml of fuctos ll be terpreted as ose

Statstcal ve of lear regresso fx; s trg to capture the mea of the observatos gve the put x: E [ x] E[ f x; + ε x] f x; here E[ x] s the codtoal expectato of gve x, evaluated accordg to the model ot accordg to the uderlg dstrbuto of X

Statstcal ve of lear regresso Accordg to our statstcal model x f x; +ε, ε ~ N, σ the outputs gve x are ormall dstrbuted th mea f x; ad varace σ : p x,, σ exp f x; πσ σ e model the ucertat the predctos, ot just the mea

Maxmum lkelhood estmato Gve observatos e fd the parameters that maxmze the lkelhood of the outputs: { },,...,, x x D x p L,,, σ σ Maxmze log-lkelhood k k x f ; exp σ πσ k k x f L ; log, log σ πσ σ mmze

Maxmum lkelhood estmato Thus MLE arg m f x; But the emprcal squared loss s J f x; Least-squares Lear Regresso s MLE for Gaussa ose!!!

Lear Regresso s t good? Smple model Straghtforard soluto BUT MLS s ot a good estmator for predcto error The matrx X T X could be ll codtoed Iputs are correlated Iput dmeso s large Trag set s small

Lear Regresso - Example I ths example the output s geerated b the model here ε s a small hte Gaussa ose:.8x+. x +ε Three trag sets th dfferet correlatos Three trag sets th dfferet correlatos betee the to puts ere radoml chose, ad the lear regresso soluto as appled.

Lear Regresso Example Ad the results are x RSS.9.8,.9.5.5 -.5 - -.5 RSS.9465 b8.8776,.93 < - - 3 x x 3 - - RSS.4 RSS.4955.4, b8.43483.56,.56833 < - - 3 x x RSS.47 3.36, -9. 3 RSS.75469 b83.363,-9.96 < - - 3 x x,x ucorrelated x,x correlated x,x strogl correlated Strog correlato ca cause the coeffcets to be ver large, ad the ll cacel each other to acheve a good RSS. - -

Lear Regresso What ca be doe?

Shrkage methods Rdge Regresso Lasso PCR Prcpal Compoets Regresso PLS Partal Least Squares

Shrkage methods Before e proceed: Sce the follog methods are ot varat uder put scale, e ll assume the put s ormalzed mea, varace : The offset x j x j x j x j s alas estmated as ad e ll ork th cetered meag - x σ j j / N

Rdge Regresso Rdge regresso shrks the regresso coeffcets b mposg a pealt o ther sze also called eght deca I rdge regresso, e add a quadratc pealt o the eghts: N M M J xj j + λ j j j here λ s a tug parameter that cotrols the amout of shrkage. The sze costrat prevets the pheomeo of ldl large coeffcets that cacel each other from occurrg.

Rdge Regresso Soluto Rdge regresso matrx form: J The soluto s ˆ t t X X +λ t t X X + λ I X LS t ˆ X X X t rdge M The soluto adds a postve costat to the dagoal of X T X before verso. Ths makes the problem o sgular eve f X does ot have full colum rak. For orthogoal puts the rdge estmates are the scaled verso of least squares estmates: rdge LS ˆ γ ˆ γ

Rdge Regresso sghts The matrx X ca be represeted b t s SVD: X UDV U s a N*M matrx, t s colums spa the colum space of X D s a M*M dagoal matrx of sgular values V s a M*M matrx, t s colums spa the ro space of X Lets see ho ths method looks the Prcpal Compoets coordates of X T

Rdge Regresso sghts X ' XV X T XV V X ' ' The least squares soluto s gve b: T T T T VDU UDV VDU D U T ls T ˆ ' V ˆ V ˆ ' ls The Rdge Regresso s smlarl gve b: rdge V T ˆ rdge V T T T VDU UDV + λi T ls D + λi DU D + λi D ˆ' VDU ˆ ls T Dagoal t X X X t

Rdge Regresso sghts rdge ˆ' j d j d j ˆ ' +λ I the PCA axes, the rdge coeffcets are just scaled LS coeffcets! The coeffcets that correspod to smaller put varace drectos are scaled do more. ls j d d

Rdge Regresso I the follog smulatos, quattes are plotted versus the quatt df λ d M j j d j + λ Ths mootoc decreasg fucto s the effectve degrees of freedom of the rdge regresso ft.

Rdge Regresso Smulato results: 4 dmesos,,3,4 No correlato b 6 5 4 3 Rdge Regresso Strog correlato b 6 5 4 3 Rdge Regresso RSS 6 5 4 3.5.5.5 3 3.5 Effectve Dmeso.5.5.5 3 3.5 Effectve Dmeso RSS 6 5 4 3 3 4 Effectve Dmeso.5.5.5 3 3.5 4 Effectve Dmeso

Rdge regresso s MAP th Gaussa pror J log P D P t log Ν x, σ Ν, τ t t X X + + cost σ τ Ths s the same objectve fucto that rdge solves, usg λσ /τ Rdge: J t t X X +λ

Lasso Lasso s a shrkage method lke rdge, th a subtle but mportat dfferece. It s defed b ˆ lasso arg m β N M x j j j subject to M There s a smlart to the rdge regresso problem: the L rdge regresso pealt s replaced b the L lasso pealt. j The L pealt makes the soluto o-lear ad requres a quadratc programmg algorthm to compute t. j t

Lasso If t s chose to be larger the t : t t M ls ˆ j the the lasso estmato s detcal to the least squares. O the other had, for sa tt /, the least squares coeffcets are shruk b about 5% o average. For coveece e ll plot the smulato results versus shrkage factor s: s t t M ls ˆ j t

Lasso Smulato results: No correlato b 6 5 4 3 Lasso Strog correlato b 6 5 4 3 Lasso RSS 6 5 4 3..4.6.8 Shrkage factor..4.6.8 Shrkage factor RSS 4 3..4.6.8 Shrkage factor..4.6.8 Shrkage factor

L vs L pealtes Cotours of the LS error fucto L L + β t β β β +β t I Lasso the costrat rego has corers; he the soluto hts a corer the correspodg coeffcets becomes he M> more tha oe.

Problem: Fgure plots lear regresso results o the bass of ol three data pots. We used varous tpes of regularzato to obta the plots see belo but got cofused about hch plot correspods to hch regularzato method. Please assg each plot to oe ad ol oe of the follog regularzato method.