UVA CS 6316, Fall 2015 Graduate: Machine Learning. Lecture 5: Non-Linear Regression Models


UVA CS 6316, Fall 2015 Graduate: Machine Learning
Lecture 5: Non-Linear Regression Models
Dr. Yanjun Qi, University of Virginia, Department of Computer Science

Where are we? Five major sections of this course:
- Regression (supervised)
- Classification (supervised)
- Unsupervised models
- Learning theory
- Graphical models

Today: Regression (supervised)
- Four ways to train / perform optimization for linear regression models:
  - Normal equation
  - Gradient descent (GD)
  - Stochastic GD
  - Newton's method
- Supervised regression models:
  - Linear regression (LR)
  - LR with non-linear basis functions
  - Locally weighted LR
  - LR with regularizations

Today:
- Machine learning method in a nutshell
- Regression models beyond linear:
  - LR with non-linear basis functions
  - Locally weighted linear regression
  - Regression trees and multilinear interpolation (later)

Traditional programming: Data + Program -> Computer -> Output.
Machine learning: Data + Output -> Computer -> Program.

Machine learning in a nutshell: Task, Representation, Score function, Search/Optimization, Models & Parameters. ML grew out of work in AI: optimize a performance criterion using example data or past experience, aiming to generalize to unseen data.

(1) Multivariate linear regression in a nutshell:
- Task: regression
- Representation: y = weighted linear sum of the x's
- Score function: least squares
- Search/Optimization: linear algebra / GD / SGD (a gradient-descent sketch follows below)
- Models, Parameters: regression coefficients

$\hat{y} = f(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$

Today:
- Machine learning method in a nutshell
- Regression models beyond linear:
  - LR with non-linear basis functions
  - Locally weighted linear regression
  - Regression trees and multilinear interpolation (later)
- Linear regression model with regularizations:
  - Ridge regression
  - Lasso regression
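To make the search step concrete, here is a minimal NumPy sketch of batch gradient descent for this model; the dataset, step size, and iteration count are illustrative assumptions, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2-feature data; a leading column of ones absorbs the intercept theta_0.
X = np.hstack([np.ones((100, 1)), rng.standard_normal((100, 2))])
true_theta = np.array([1.0, 2.0, -0.5])        # assumed "ground truth" for the demo
y = X @ true_theta + 0.1 * rng.standard_normal(100)

theta = np.zeros(3)
alpha = 0.1                                    # step size (an assumed value)
for _ in range(500):                           # batch gradient descent
    grad = X.T @ (X @ theta - y) / len(y)      # gradient of the average squared-error cost
    theta -= alpha * grad

# theta now approximates the normal-equation solution (X^T X)^{-1} X^T y.
```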

LR with non-linear basis functions

LR does not mean we can only deal with linear relationships. We can write

$y = \theta_0 + \sum_{j=1}^{m} \theta_j \varphi_j(x) = \theta^T \varphi(x)$

We are free to design (non-linear) features (e.g., basis-function derived) under LR, where the $\varphi_j(x)$ are fixed basis functions (and we define $\varphi_0(x) = 1$).

E.g. (1) polynomial regression: introduce the basis functions $\varphi(x) := (1, x, x^2, x^3)^T$; the fitted coefficients are again given by the normal equation $\theta^* = (\Phi^T \Phi)^{-1} \Phi^T \vec{y}$. (Dr. Nando de Freitas's tutorial slide.)
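A minimal sketch of this polynomial fit, assuming a small toy dataset (the slides do not specify one); it builds the cubic basis expansion and solves the normal equation directly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data (purely illustrative).
x = np.linspace(-1.0, 1.0, 20)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(x.shape)

# Basis expansion phi(x) = (1, x, x^2, x^3)^T, one row per training point.
Phi = np.vander(x, N=4, increasing=True)

# Normal equation on the expanded design matrix: theta* = (Phi^T Phi)^{-1} Phi^T y.
theta = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# Prediction is still linear in theta even though it is cubic in x.
y_hat = Phi @ theta
```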

e.g. (1) polynomial regression. KEY: if the bases are given, the problem of learning the parameters is still linear.

Many possible basis functions, e.g.:
- Polynomial: $\varphi_j(x) = x^{j-1}$
- Radial basis functions: $\varphi_j(x) = \exp\left(-\frac{(x - \mu_j)^2}{2 s^2}\right)$
- Sigmoidal: $\varphi_j(x) = \sigma\left(\frac{x - \mu_j}{s}\right)$
- Splines, Fourier, wavelets, etc.

e.g. (2) LR with radial basis functions. E.g., RBF regression:

$\hat{y} = \theta_0 + \sum_{j=1}^{m} \theta_j \varphi_j(x) = \varphi(x)^T \theta$

$\varphi(x) := \left(1,\; K_{\lambda_1}(x, r_1),\; K_{\lambda_2}(x, r_2),\; K_{\lambda_3}(x, r_3),\; K_{\lambda_4}(x, r_4)\right)^T$

$\theta^* = (\Phi^T \Phi)^{-1} \Phi^T \vec{y}$

RBF = radial basis function: a function which depends only on the radial distance from a centre point. Gaussian RBF: as the distance from the centre $r$ increases, the output of the RBF decreases:

$K_\lambda(x, r) = \exp\left(-\frac{(x - r)^2}{2 \lambda^2}\right)$

[Figures: 1D case and 2D case.]
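The same normal-equation machinery works once the features are RBFs. Below is a hedged sketch; the four centres r_j and the shared width lambda are assumed values, since the slide only shows them pictorially:

```python
import numpy as np

def rbf_features(x, centres, lam):
    """Build phi(x) = (1, K_lam(x, r_1), ..., K_lam(x, r_m))^T, one row per point."""
    K = np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2.0 * lam ** 2))
    return np.hstack([np.ones((x.shape[0], 1)), K])

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 40)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(x.shape)

# Four predefined centres and a shared width (the actual numbers are assumptions).
centres = np.array([-0.75, -0.25, 0.25, 0.75])
lam = 0.3

Phi = rbf_features(x, centres, lam)
theta = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)   # same normal equation as before
y_hat = Phi @ theta
```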

e.g. another linear regression with 1D RBF basis functions (4 predefined centres and widths):

$\varphi(x) := \left(1,\; K_{\lambda_1}(x, r_1),\; K_{\lambda_2}(x, r_2),\; K_{\lambda_3}(x, r_3),\; K_{\lambda_4}(x, r_4)\right)^T$, $\theta^* = (\Phi^T \Phi)^{-1} \Phi^T \vec{y}$

e.g. a LR with 1D RBFs (3 predefined centres and widths). [Figures: the 1D RBF bases, and the curve after fitting.]

e.g. 2D good and bad RBFs. [Figures: a good 2D RBF; two bad 2D RBFs.]

Two main issues:
- Learning the parameter $\theta$: almost the same as LR, just change $X$ to $\varphi(x)$, a linear combination of basis functions (that can be non-linear).
- How to choose the model order, e.g., the polynomial degree for polynomial regression.

Issue: overfitting and underfitting.
- Underfit: $y = \theta_0 + \theta_1 x$
- Looks good: $y = \theta_0 + \theta_1 x + \theta_2 x^2$
- Overfit: $y = \sum_{j=0}^{5} \theta_j x^j$

Generalisation: learn a function/hypothesis from past data in order to "explain", "predict", "model" or "control" new data examples. Use K-fold cross validation (see the sketch after this slide)!

(2) Multivariate linear regression with basis expansion, in a nutshell:
- Task: regression
- Representation: y = weighted linear sum of (basis expansion of x)
- Score function: least squares
- Search/Optimization: linear algebra
- Models, Parameters: regression coefficients

$\hat{y} = \theta_0 + \sum_{j=1}^{m} \theta_j \varphi_j(x) = \varphi(x)^T \theta$
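A small sketch of K-fold cross validation used to pick the polynomial degree, as the slide suggests; the fold count, candidate degrees, and toy data are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 60)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(x.shape)

def kfold_mse(x, y, degree, k=5):
    """Average held-out squared error of a degree-`degree` polynomial fit."""
    idx = rng.permutation(len(x))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        Phi_tr = np.vander(x[train], degree + 1, increasing=True)
        theta, *_ = np.linalg.lstsq(Phi_tr, y[train], rcond=None)
        Phi_te = np.vander(x[fold], degree + 1, increasing=True)
        errs.append(np.mean((Phi_te @ theta - y[fold]) ** 2))
    return float(np.mean(errs))

# Choose the model order with the smallest cross-validated error.
best_degree = min(range(1, 8), key=lambda d: kfold_mse(x, y, d))
```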

11 Today " MachineLearningMethodinanutshell " RegressionModelsBeyondLinear LRwithnonWlinearbasisfuncHons Locallyweightedlinearregression RegressiontreesandMulHlinear InterpolaHon(later) 9/1/15 1 Locallyweightedlinearregression Thealgorithm: Insteadofminimizing nowwefittominimize J(θ) = 1 n θ w i (x it θ y i ) i=1 Wheredow i 'scomefrom? w i = K(x i,x )= exp (x x i ) τ! wherex_isthequerypointforwhichwe'dliketoknowitscorresponding y #EssenHallyweputhigherweightson(thoseerrors from)trainingexamplesthatareclosetothequery pointx_(thanthosethatarefurtherawayfromthe 9/1/15 querypoint) n 1 T J ( θ ) = ( xi θ yi ) i= 1

Locally weighted regression, aka locally weighted linear regression, LOESS. A linear fit of x -> y weighted by $K_\lambda(x_i, x_0)$ can represent just the neighbourhood of $x_0$: we use the RBF function to pick out / emphasize the neighbour region of the query point $x_0$. [Figure: locally weighted linear regression.]

Locally weighted linear regression: a separate weighted least squares problem at each target point $x_0$:

$\min_{\alpha(x_0), \beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_i, x_0) \left[ y_i - \alpha(x_0) - \beta(x_0) x_i \right]^2$

$\hat{f}(x_0) = \hat{\alpha}(x_0) + \hat{\beta}(x_0) x_0$

LEARNING of locally weighted linear regression: a separate weighted least squares fit at each target point $x_0$.

Locally weighted linear regression, e.g., when there is only one feature variable. Separate weighted least squares at each target point $x_0$:

$\min_{\alpha(x_0), \beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_i, x_0) \left[ y_i - \alpha(x_0) - \beta(x_0) x_i \right]^2$, with $\hat{f}(x_0) = \hat{\alpha}(x_0) + \hat{\beta}(x_0) x_0$

Let $b(x)^T = (1, x)$; let $B$ be the $N \times 2$ regression matrix with $i$-th row $b(x_i)^T$; and let $W(x_0) = \mathrm{diag}\left(K_\lambda(x_i, x_0)\right)$, $i = 1, \dots, N$, be the $N \times N$ weight matrix. Then

LWR: $\hat{f}(x_0) = b(x_0)^T \left(B^T W(x_0) B\right)^{-1} B^T W(x_0) y$

versus LR: $\hat{f}(x_q) = x_q^T \theta^* = x_q^T (X^T X)^{-1} X^T \vec{y}$

More: local weighted polynomial regression, i.e., local polynomial fits of any degree $d$:

$\min_{\alpha(x_0), \beta_j(x_0),\, j=1,\dots,d} \sum_{i=1}^{N} K_\lambda(x_i, x_0) \left[ y_i - \alpha(x_0) - \sum_{j=1}^{d} \beta_j(x_0) x_i^j \right]^2$

$\hat{f}(x_0) = \hat{\alpha}(x_0) + \sum_{j=1}^{d} \hat{\beta}_j(x_0) x_0^j$

[Figure: blue = true function, green = estimated.]
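A minimal sketch of the closed-form LWR predictor above for the one-feature case; the Gaussian kernel bandwidth and the toy data are assumed. Note that the weighted system is re-solved at every query point:

```python
import numpy as np

def lwr_predict(x0, x, y, lam=0.3):
    """f_hat(x0) = b(x0)^T (B^T W(x0) B)^{-1} B^T W(x0) y, for 1-D inputs."""
    B = np.column_stack([np.ones_like(x), x])        # i-th row is b(x_i)^T = (1, x_i)
    w = np.exp(-(x - x0) ** 2 / (2.0 * lam ** 2))    # K_lam(x_i, x0) for every i
    W = np.diag(w)                                   # N x N diagonal weight matrix
    ab = np.linalg.solve(B.T @ W @ B, B.T @ W @ y)   # local (alpha(x0), beta(x0))
    return np.array([1.0, x0]) @ ab

rng = np.random.default_rng(0)
x = np.linspace(0.0, 3.0, 50)
y = np.sin(2.0 * x) + 0.1 * rng.standard_normal(x.shape)

# A fresh weighted fit is solved at every query point: the non-parametric cost.
f_hat = np.array([lwr_predict(x0, x, y) for x0 in x])
```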

Parametric vs. non-parametric. Locally weighted linear regression is a non-parametric algorithm. The (unweighted) linear regression algorithm that we saw earlier is known as a parametric learning algorithm because it has a fixed, finite number of parameters (the $\theta$), which are fit to the data; once we've fit the $\theta$ and stored them away, we no longer need to keep the training data around to make future predictions. In contrast, to make predictions using locally weighted linear regression, we need to keep the entire training set around. The term "non-parametric" (roughly) refers to the fact that the amount of knowledge we need to keep in order to represent the hypothesis grows linearly with the size of the training set.

(3) Locally weighted / kernel linear regression, in a nutshell:
- Task: regression
- Representation: y = weighted linear sum of the x's
- Score function: weighted least squares
- Search/Optimization: linear algebra
- Models, Parameters: local regression coefficients (conditioned on each test point)

$\min_{\alpha(x_0), \beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_i, x_0) \left[ y_i - \alpha(x_0) - \beta(x_0) x_i \right]^2$, $\hat{f}(x_0) = \hat{\alpha}(x_0) + \hat{\beta}(x_0) x_0$

Today's recap:
- Machine learning method in a nutshell
- Regression models beyond linear:
  - LR with non-linear basis functions
  - Locally weighted linear regression
  - Regression trees and multilinear interpolation (later)

Probabilistic interpretation of linear regression (LATER). Let us assume that the target variable and the inputs are related by the equation

$y_i = \theta^T x_i + \varepsilon_i$

where $\varepsilon_i$ is an error term of unmodeled effects or random noise. Now assume that $\varepsilon$ follows a Gaussian $N(0, \sigma^2)$; then we have

$p(y_i \mid x_i; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y_i - \theta^T x_i)^2}{2\sigma^2}\right)$

By the independence (among samples) assumption:

$L(\theta) = \prod_{i=1}^{n} p(y_i \mid x_i; \theta) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \exp\left(-\frac{\sum_{i=1}^{n} (y_i - \theta^T x_i)^2}{2\sigma^2}\right)$

Many more variations of LR follow from this perspective, e.g., binomial/Poisson (LATER).
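The slide stops at the likelihood; the standard next step (not shown here) is to take logs, which makes the link to least squares explicit:

```latex
\log L(\theta)
  = \sum_{i=1}^{n} \log p(y_i \mid x_i; \theta)
  = n \log \frac{1}{\sqrt{2\pi}\,\sigma}
    - \frac{1}{2\sigma^{2}} \sum_{i=1}^{n} \left( y_i - \theta^{T} x_i \right)^{2}
```

The first term is constant in $\theta$, so maximizing the log-likelihood is equivalent to minimizing the least-squares cost $J(\theta) = \frac{1}{2} \sum_{i} (y_i - \theta^T x_i)^2$.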

References:
- ... for allowing me to reuse some of his slides
- Prof. Nando de Freitas's tutorial slides
