Machine Learning. Introduction to Regression. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012


Machine Learning, CSE6740/CS7641/ISYE6740, Fall 2012. Introduction to Regression. Le Song. Lecture 4, August 30, 2012. Based on slides from Eric Xing, CMU. Reading: Chap. 3, CB

Machine learning for apartment hunting. Suppose you are to move to LA!! And you want to find the most reasonably priced apartment satisfying your needs: square-ft., # of bedrooms, distance to campus.

Living area (ft²) | # bedroom | Rent ($)
230 | 1 | 600
506 | 2 | 1000
433 | 2 | 1100
109 | 1 | 500
150 | 1 | ?
270 | 1.5 | ?

The learning problem. Features: living area, distance to campus, # bedroom. Denote them as x = [x_1, x_2, ..., x_k]. Target: rent. Denoted as y. Training set: X with rows x_i = (x_i1, ..., x_ik) (living area, location, ...), Y = (y_1, ..., y_n) (the rents), or jointly D = {(x_i, y_i), i = 1, ..., n}.

Linear Regression. Assume the target y is a linear function of the features x: ŷ = θ_0 + θ_1 x_1 + ... + θ_k x_k = θ^T x (defining x_0 = 1 for the intercept term). Our goal is to pick the optimal θ: we seek the θ that minimizes the cost function J(θ) = (1/2) Σ_i (ŷ(x_i) − y_i)².

The Least-Mean-Square (LMS) method. Cost function: J(θ) = (1/2) Σ_{i=1}^n (x_i^T θ − y_i)². Consider a gradient descent algorithm: θ_j^{t+1} = θ_j^t − α ∂J(θ)/∂θ_j.

The Least-Mean-Square (LMS) method. For a single training example, taking the derivative gives the online update rule: θ_j^{t+1} = θ_j^t + α (y_i − x_i^T θ^t) x_{ij}, i.e., θ^{t+1} = θ^t + α (y_i − x_i^T θ^t) x_i.
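A minimal NumPy sketch of this online update (the function name lms_step and the step size alpha are illustrative, not from the slides):

import numpy as np

def lms_step(theta, x_i, y_i, alpha=0.01):
    # One online LMS update: theta <- theta + alpha * (y_i - x_i . theta) * x_i
    return theta + alpha * (y_i - x_i @ theta) * x_i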

The Least-Mean-Square (LMS) method: steepest descent. Note that ∇_θ J = [∂J/∂θ_1, ..., ∂J/∂θ_k]^T = − Σ_{i=1}^n (y_i − x_i^T θ) x_i, so the update is θ^{t+1} = θ^t + α Σ_{i=1}^n (y_i − x_i^T θ^t) x_i. This is a batch gradient descent algorithm.
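For comparison, a sketch of the batch (steepest descent) version, assuming a fixed step size alpha and a made-up iteration count:

import numpy as np

def batch_gradient_descent(X, y, alpha=0.01, iters=1000):
    # Minimize J(theta) = 0.5 * sum_i (x_i . theta - y_i)^2 by steepest descent
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        residual = y - X @ theta                  # y_i - x_i . theta, for all i at once
        theta = theta + alpha * (X.T @ residual)  # theta <- theta + alpha * sum_i (y_i - x_i.theta) x_i
    return theta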

Some matrix derivatives. For f: R^{m×n} → R, define the gradient ∇_A f(A) as the m×n matrix with entries [∇_A f(A)]_{ij} = ∂f/∂A_{ij}. Trace: tr A = Σ_i A_{ii} for square A, tr a = a for a scalar a, and tr ABC = tr CAB = tr BCA. Some facts about matrix derivatives (without proof): ∇_A tr AB = B^T; ∇_A tr ABA^T C = CAB + C^T A B^T; ∇_A |A| = |A| (A^{-1})^T.

The normal equations. Write the cost function in matrix form: J(θ) = (1/2) Σ_i (x_i^T θ − y_i)² = (1/2) (Xθ − y)^T (Xθ − y) = (1/2) (θ^T X^T X θ − θ^T X^T y − y^T X θ + y^T y), where X is the n×k design matrix (rows x_i^T) and y is the vector of targets. To minimize J(θ), take the derivative and set it to zero: ∇_θ J = (1/2) ∇_θ tr(θ^T X^T X θ − θ^T X^T y − y^T X θ + y^T y) = X^T X θ − X^T y = 0, which gives the normal equations X^T X θ = X^T y, and hence θ* = (X^T X)^{-1} X^T y.
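A small sketch of solving the normal equations with NumPy; in practice a linear solve (or np.linalg.lstsq) is preferred over forming an explicit inverse:

import numpy as np

def normal_equations(X, y):
    # Solve X^T X theta = X^T y directly
    return np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent, and more robust when X^T X is (near-)singular:
# theta, *_ = np.linalg.lstsq(X, y, rcond=None)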

A recap. LMS update rule: θ_j^{t+1} = θ_j^t + α (y_i − x_i^T θ^t) x_{ij}. Pros: on-line, low per-step cost. Cons: coordinate-wise, maybe slow-converging. Steepest descent: θ^{t+1} = θ^t + α Σ_i (y_i − x_i^T θ^t) x_i. Pros: fast-converging, easy to implement. Cons: a batch method. Normal equations: θ* = (X^T X)^{-1} X^T y. Pros: a single-shot algorithm! Easiest to implement. Cons: need to compute the pseudo-inverse (X^T X)^{-1}, which is expensive and has numerical issues (e.g., when the matrix is singular).

Geometric Interpretation of LMS. The predictions on the training data are: ŷ = X θ* = X (X^T X)^{-1} X^T y. Note that ŷ − y = (X (X^T X)^{-1} X^T − I) y, and X^T (ŷ − y) = X^T (X (X^T X)^{-1} X^T − I) y = 0, so ŷ is the orthogonal projection of y onto the space spanned by the columns of X!!
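A quick numerical check of this projection property on made-up data (the design matrix X and targets y below are random, purely for illustration):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                  # made-up design matrix
y = rng.normal(size=50)                       # made-up targets
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ theta
print(X.T @ (y_hat - y))                      # ~ zero vector: the residual is orthogonal to the columns of X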

Probabilistic Interpretation of LMS. Let us assume that the target variable and the inputs are related by the equation: y_i = θ^T x_i + ε_i, where ε is an error term of unmodeled effects or random noise. Now assume that ε follows a Gaussian N(0, σ²); then we have: p(y_i | x_i; θ) = (1/(√(2π) σ)) exp(−(y_i − θ^T x_i)² / (2σ²)). By the independence assumption: L(θ) = Π_{i=1}^n p(y_i | x_i; θ) = (1/(√(2π) σ))^n exp(−Σ_{i=1}^n (y_i − θ^T x_i)² / (2σ²)).

Probabilistic Interpretation of LMS, cont. Hence the log-likelihood is: l(θ) = n log(1/(√(2π) σ)) − (1/(2σ²)) Σ_{i=1}^n (y_i − θ^T x_i)². Do you recognize the last term? Yes, it is: J(θ) = (1/2) Σ_{i=1}^n (x_i^T θ − y_i)². Thus, under the independence assumption, LMS is equivalent to MLE of θ!

Beyond basic LR: LR with non-linear basis functions; locally weighted linear regression; regression trees and multilinear interpolation.

LR with non-linear basis functions. LR does not mean we can only deal with linear relationships. We are free to design (non-linear) features under LR: y = θ_0 + Σ_{j=1}^m θ_j f_j(x) = θ^T f(x), where the f_j(x) are fixed basis functions (and we define f_0(x) = 1). Example: polynomial regression: f(x) := [1, x, x², x³]. We will be concerned with estimating (distributions over) the weights θ and choosing the model order M.
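A sketch of polynomial regression as ordinary least squares on designed features (poly_features is a hypothetical helper; the data below are made up):

import numpy as np

def poly_features(x, degree=3):
    # Map each scalar x to [1, x, x^2, ..., x^degree]
    return np.vander(x, degree + 1, increasing=True)

x = np.linspace(0, 1, 20)                               # made-up 1-D inputs
y = np.sin(2 * np.pi * x) + 0.1 * np.random.randn(20)   # made-up noisy targets
F = poly_features(x, degree=3)
theta, *_ = np.linalg.lstsq(F, y, rcond=None)           # same least-squares machinery as before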

Basis functions. There are many basis functions, e.g.: polynomial: f_j(x) = x^{j−1}; radial basis functions: f_j(x) = exp(−(x − µ_j)² / (2s²)); sigmoidal: f_j(x) = σ((x − µ_j)/s); splines, Fourier, wavelets, etc.

1D and 2D RBFs. (Figures: a 1D RBF basis and the resulting curve after the fit.)

Good and Bad RBFs. (Figures: a good 2D RBF and two bad 2D RBFs.)

Locally weighted linear regression. Overfitting and underfitting: compare fits of y = θ_0 + θ_1 x, y = θ_0 + θ_1 x + θ_2 x², and y = Σ_{j=0}^5 θ_j x^j.

Bias and variance. We define the bias of a model to be the expected generalization error even if we were to fit it to a very (say, infinitely) large training set. By fitting "spurious" patterns in the training set, we might again obtain a model with large generalization error. In this case, we say the model has large variance.

Locally weighted linear regression. The algorithm: instead of minimizing J(θ) = (1/2) Σ_i (x_i^T θ − y_i)², now we fit θ to minimize J(θ) = (1/2) Σ_i w_i (x_i^T θ − y_i)². Where do the w_i's come from? w_i = exp(−(x_i − x)² / (2τ²)), where x is the query point for which we'd like to know its corresponding y. Essentially we put higher weights on (errors on) training examples that are close to the query point than on those that are further away from the query. Do we also have a probabilistic interpretation here (as we did for LR)?
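A sketch of locally weighted linear regression at a single query point, assuming the Gaussian kernel above with bandwidth tau (function and variable names are illustrative):

import numpy as np

def lwr_predict(X, y, x_query, tau=0.5):
    # Gaussian weights: w_i = exp(-||x_i - x_query||^2 / (2 tau^2))
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(w)
    # Weighted normal equations: (X^T W X) theta = X^T W y
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return x_query @ theta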

Parametric vs. non-parametric. Locally weighted linear regression is the first example we are running into of a non-parametric algorithm. The (unweighted) linear regression algorithm that we saw earlier is known as a parametric learning algorithm, because it has a fixed, finite number of parameters (the θ), which are fit to the data. Once we've fit the θ and stored them away, we no longer need to keep the training data around to make future predictions. In contrast, to make predictions using locally weighted linear regression, we need to keep the entire training set around. The term "non-parametric" (roughly) refers to the fact that the amount of stuff we need to keep in order to represent the hypothesis grows linearly with the size of the training set.

Robust Regression. The best fit from a quadratic regression. But this is probably better. How can we do this?

LOESS-based Robust Regression. Remember what we do in "locally weighted linear regression"? We "score" each point for its importance. Now we score each point according to its "fitness". (Courtesy of Andrew Moore)

Robust regression. For k = 1 to R: let (x_k, y_k) be the kth datapoint; let y_k^est be the predicted value of y_k; let w_k be a weight for data point k that is large if the data point fits well and small if it fits badly: w_k = f(y_k − y_k^est). Then redo the regression using the weighted data points. Repeat the whole thing until converged!
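A sketch of this iterative reweighting loop, with one possible choice of the weight function f (a Gaussian of the residual); the scale parameter is an assumption, not from the slides:

import numpy as np

def robust_fit(X, y, n_iter=10, scale=1.0):
    # Iteratively reweighted least squares: downweight points with large residuals
    w = np.ones(len(y))
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        W = np.diag(w)
        theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)  # weighted least-squares refit
        residual = y - X @ theta
        w = np.exp(-residual ** 2 / (2 * scale ** 2))      # "fitness" weight: small for poorly fit points
    return theta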

Robust regression: probabilistic interpretation. What regular regression does: assume y_k was originally generated using the following recipe: y_k = θ^T x_k + N(0, σ²). The computational task is to find the Maximum Likelihood estimate of θ.

Robust regression: probabilistic interpretation. What LOESS-based robust regression does: assume y_k was originally generated using the following recipe: with probability p, y_k = θ^T x_k + N(0, σ²), but otherwise y_k ~ N(µ, σ²_huge). The computational task is to find the Maximum Likelihood estimates of θ, p, µ and σ_huge. The algorithm you saw with iterative reweighting/refitting does this computation for us. Later you will find that it is an instance of the famous E.M. algorithm.

Regression Tree. Decision tree for regression.

Gender | Rich? | Num. Children | # travel per yr. | Age
F | No | 2 | 5 | 38
M | No | 0 | 2 | 25
M | Yes | 1 | 0 | 72
: | : | : | : | :

Tree: Gender? Female → predicted age = 39; Male → predicted age = 36.

A conceptual picture. Assuming regular regression trees, can you sketch a graph of the fitted function y*(x) over this diagram? (Diagram: a tree of threshold questions on x.)

How about this one? Partition the space, and in each partition fit a constant. Each cell can be reached by asking a set of questions.
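A minimal sketch of one such constant-fit partition on 1-D data: a depth-1 regression tree (a stump) that picks the split minimizing squared error and predicts the mean of y in each cell (names are illustrative):

import numpy as np

def fit_stump(x, y):
    # Assumes x contains at least two distinct values
    best = None
    for t in np.unique(x):                      # candidate thresholds: the observed x values
        left, right = y[x <= t], y[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, mu_left, mu_right = best
    # Predict the per-cell mean: ask one question (x <= t?) and return a constant
    return lambda q: np.where(q <= t, mu_left, mu_right)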

Take home message: gradient descent (on-line and batch); normal equations; equivalence of LMS and MLE; LR does not mean fitting linear relations, but a linear combination of basis functions (which can be non-linear); weighting points by importance versus by fitness.