Lecture 9: Regression: Regressogram and Kernel Regression

STAT 425: Introduction to Nonparametric Statistics, Winter 2018
Instructor: Yen-Chi Chen
Reference: Chapter 5 of "All of Nonparametric Statistics"

9.1 Introduction

Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a bivariate random sample. In regression analysis, we are often interested in the regression function
$$m(x) = \mathbb{E}(Y \mid X = x).$$
Sometimes we will write
$$Y_i = m(X_i) + \epsilon_i,$$
where $\epsilon_i$ is a mean-0 noise. The simple linear regression model assumes that $m(x) = \beta_0 + \beta_1 x$, where $\beta_0$ and $\beta_1$ are the intercept and slope parameters. In this lecture, we will talk about methods that directly estimate the regression function $m(x)$ without imposing any parametric form on $m(x)$.

9.2 Regressogram (Binning)

We start with a very simple but extremely popular method. This method is called the regressogram, but people often call it the binning approach. You can view it as: regressogram = regression + histogram. For simplicity, we assume that the covariates $X_i$ are from a distribution over $[0, 1]$. Similar to the histogram, we first choose $M$, the number of bins. Then we partition the interval $[0, 1]$ into $M$ equal-width bins:
$$B_1 = \left[0, \frac{1}{M}\right),\quad B_2 = \left[\frac{1}{M}, \frac{2}{M}\right),\quad \ldots,\quad B_{M-1} = \left[\frac{M-2}{M}, \frac{M-1}{M}\right),\quad B_M = \left[\frac{M-1}{M}, 1\right].$$
When $x \in B_\ell$, we estimate $m(x)$ by
$$\widehat{m}_M(x) = \frac{\sum_{i=1}^n Y_i\, I(X_i \in B_\ell)}{\sum_{i=1}^n I(X_i \in B_\ell)} = \text{average of the responses whose covariates are in the same bin as } x.$$

Bias. The bias of the regressogram estimator is
$$\mathrm{bias}(\widehat{m}_M(x)) = O\!\left(\frac{1}{M}\right).$$

Variance. The variance of the regressogram estimator is
$$\mathrm{Var}(\widehat{m}_M(x)) = O\!\left(\frac{M}{n}\right).$$

Therefore, the MSE and MISE are at rate
$$\mathrm{MSE} = O\!\left(\frac{1}{M^2}\right) + O\!\left(\frac{M}{n}\right), \qquad \mathrm{MISE} = O\!\left(\frac{1}{M^2}\right) + O\!\left(\frac{M}{n}\right),$$
leading to the optimal number of bins $M \asymp n^{1/3}$ and the optimal convergence rate $O(n^{-2/3})$, the same as the histogram. Similar to the histogram, the regressogram has a slower convergence rate than many other competitors (we will introduce several other candidates). However, both the histogram and the regressogram remain very popular because the construction of the estimator is very simple and intuitive; practitioners with little mathematical training can easily master these approaches.
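
As a concrete illustration of the binning construction, here is a minimal Python sketch of the regressogram; the toy data, the random seed, the bin count, and the helper name regressogram are illustrative choices, not prescriptions from these notes.

    import numpy as np

    def regressogram(x0, X, Y, M=10):
        """Regressogram estimate of m at the points x0 (all values in [0, 1], M equal-width bins)."""
        x0 = np.atleast_1d(x0)
        # Bin index of each query point and of each covariate; clip so that x = 1 lands in the last bin.
        bin_x0 = np.clip((x0 * M).astype(int), 0, M - 1)
        bin_X = np.clip((X * M).astype(int), 0, M - 1)
        mhat = np.full(x0.shape, np.nan)
        for j, b in enumerate(bin_x0):
            in_bin = bin_X == b
            if in_bin.any():
                # Average of the responses whose covariates share the bin with x0[j].
                mhat[j] = Y[in_bin].mean()
        return mhat

    rng = np.random.default_rng(0)
    n = 200
    X = rng.uniform(0, 1, n)
    Y = np.sin(2 * np.pi * X) + rng.normal(scale=0.3, size=n)  # mean-zero noise
    M = round(n ** (1 / 3))                                    # M of the optimal order n^{1/3}
    print(regressogram(np.linspace(0, 1, 11), X, Y, M=M))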

9.3 Kernel Regression

Given a point $x_0$, assume that we are interested in the value $m(x_0)$. Here is a simple method to estimate that value. When $m$ is smooth, an observation with $X_i \approx x_0$ implies $m(X_i) \approx m(x_0)$. Thus, the response value $Y_i = m(X_i) + \epsilon_i \approx m(x_0) + \epsilon_i$. Using this observation, to reduce the noise $\epsilon_i$ we can use the sample average. Thus, an estimator of $m(x_0)$ is the average of those responses whose covariates are close to $x_0$. To make this concrete, let $h > 0$ be a threshold. The above procedure suggests using
$$\widehat{m}_{\mathrm{loc}}(x_0) = \frac{\sum_{i:\,|X_i - x_0| \le h} Y_i}{n_h(x_0)} = \frac{\sum_{i=1}^n Y_i\, I(|X_i - x_0| \le h)}{\sum_{i=1}^n I(|X_i - x_0| \le h)}, \qquad (9.1)$$
where $n_h(x_0) = \sum_{i=1}^n I(|X_i - x_0| \le h)$ is the number of observations whose covariate satisfies $|X_i - x_0| \le h$. This estimator, $\widehat{m}_{\mathrm{loc}}$, is called the local average estimator. Indeed, to estimate $m(x)$ at any given point $x$, we are using a local average as an estimator.

The local average estimator can be rewritten as
$$\widehat{m}_{\mathrm{loc}}(x_0) = \sum_{i=1}^n \frac{I(|X_i - x_0| \le h)}{\sum_{\ell=1}^n I(|X_\ell - x_0| \le h)}\, Y_i = \sum_{i=1}^n W_i(x_0)\, Y_i, \qquad (9.2)$$
where
$$W_i(x_0) = \frac{I(|X_i - x_0| \le h)}{\sum_{\ell=1}^n I(|X_\ell - x_0| \le h)}$$
is a weight for each observation. Note that $\sum_{i=1}^n W_i(x_0) = 1$ and $W_i(x_0) \ge 0$ for all $i = 1, \ldots, n$; this implies that the $W_i(x_0)$'s are indeed weights. Equation (9.2) shows that the local average estimator can be written as a weighted average, so the $i$-th weight $W_i(x_0)$ determines the contribution of the response $Y_i$ to the estimator $\widehat{m}_{\mathrm{loc}}(x_0)$.

In constructing the local average estimator, we are placing a hard threshold on the neighboring points: those within a distance $h$ are given equal weight, while those outside the threshold are ignored completely. This hard thresholding leads to an estimator that is not continuous. To avoid this problem, we consider another construction of the weights. Ideally, we want to give more weight to the observations that are close to $x_0$, and we want the weights to vary smoothly. The Gaussian function $G(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ seems to be a good candidate. We now use the Gaussian function to construct an estimator. We first construct the weight
$$W_i^G(x_0) = \frac{G\!\left(\frac{x_0 - X_i}{h}\right)}{\sum_{\ell=1}^n G\!\left(\frac{x_0 - X_\ell}{h}\right)}. \qquad (9.3)$$
The quantity $h > 0$ plays a role similar to the threshold $h$ in the local average, but now it acts as the smoothing bandwidth of the Gaussian. After constructing the weights, our new estimator is
$$\widehat{m}_G(x_0) = \sum_{i=1}^n W_i^G(x_0)\, Y_i = \frac{\sum_{i=1}^n Y_i\, G\!\left(\frac{x_0 - X_i}{h}\right)}{\sum_{\ell=1}^n G\!\left(\frac{x_0 - X_\ell}{h}\right)}. \qquad (9.4)$$
This new estimator has weights that change more smoothly than those of the local average, and it is smooth, as we desire.

Comparing equations (9.1) and (9.4), one may notice that these local estimators are all of a similar form:
$$\widehat{m}_h(x_0) = \frac{\sum_{i=1}^n Y_i\, K\!\left(\frac{x_0 - X_i}{h}\right)}{\sum_{\ell=1}^n K\!\left(\frac{x_0 - X_\ell}{h}\right)} = \sum_{i=1}^n W_i(x_0)\, Y_i, \qquad W_i(x_0) = \frac{K\!\left(\frac{x_0 - X_i}{h}\right)}{\sum_{\ell=1}^n K\!\left(\frac{x_0 - X_\ell}{h}\right)}, \qquad (9.5)$$
where $K$ is some function. When $K$ is Gaussian, we obtain estimator (9.4); when $K$ is uniform over $[-1, 1]$, we obtain the local average (9.1). The estimator in equation (9.5) is called the kernel regression estimator or Nadaraya-Watson estimator (see https://en.wikipedia.org/wiki/Kernel_regression). The function $K$ plays a role similar to the kernel function in the KDE and thus is also called the kernel function. The quantity $h > 0$ is similar to the smoothing bandwidth in the KDE, so it is also called the smoothing bandwidth.
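
The weighted-average form (9.5) translates directly into a few lines of code. Below is a minimal sketch of the Nadaraya-Watson estimator with a Gaussian kernel; the toy data, the seed, and the bandwidth value $h = 0.1$ are illustrative assumptions, not recommendations from these notes.

    import numpy as np

    def nadaraya_watson(x0, X, Y, h):
        """Nadaraya-Watson estimate (9.5) of m at the points x0, using a Gaussian kernel."""
        x0 = np.atleast_1d(x0)
        # K((x0 - X_i)/h) for every (query point, observation) pair: shape (len(x0), n).
        U = (x0[:, None] - X[None, :]) / h
        Kmat = np.exp(-0.5 * U ** 2) / np.sqrt(2 * np.pi)
        W = Kmat / Kmat.sum(axis=1, keepdims=True)  # weights W_i(x0); each row sums to 1
        return W @ Y                                # weighted average of the responses

    rng = np.random.default_rng(0)
    n = 200
    X = rng.uniform(0, 1, n)
    Y = np.sin(2 * np.pi * X) + rng.normal(scale=0.3, size=n)
    print(nadaraya_watson(np.linspace(0, 1, 11), X, Y, h=0.1))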

9.3.1 Theory

Now we study some statistical properties of the estimator $\widehat{m}_h$. We skip the details of the derivations (if you are interested, check http://www.ssc.wisc.edu/~bhansen/718/NonParametrics2.pdf and http://www.maths.manchester.ac.uk/~peterf/MATH38011/NPR%20N-W%20Estimator.pdf).

Bias. The bias of the kernel regression at a point $x$ is
$$\mathrm{bias}(\widehat{m}_h(x)) = \frac{h^2}{2}\, \mu_K \left( m''(x) + 2\, \frac{m'(x)\, p'(x)}{p(x)} \right) + o(h^2),$$
where $p(x)$ is the probability density function of the covariates $X_1, \ldots, X_n$ and $\mu_K = \int x^2 K(x)\, dx$ is the same constant of the kernel function as in the KDE. The bias has two components: a curvature component $m''(x)$ and a design component $2\, m'(x)\, p'(x)/p(x)$. The curvature component is similar to the one in the KDE; when the regression function curves a lot, kernel smoothing will smooth out the structure, introducing some bias. The second component, also known as the design bias, is a new component compared to the bias in the KDE. It depends on the density of the covariates, $p(x)$. Note that in some studies we can choose the values of the covariates, so the density $p(x)$ is also called the design (this is why this term is known as the design bias).

Variance. The variance of the estimator is
$$\mathrm{Var}(\widehat{m}_h(x)) = \frac{\sigma^2\, \sigma_K^2}{n\, h\, p(x)} + o\!\left(\frac{1}{nh}\right),$$
where $\sigma^2 = \mathrm{Var}(\epsilon_i)$ is the error variance of the regression model and $\sigma_K^2 = \int K^2(x)\, dx$ is a constant of the kernel function (the same as in the KDE). This expression tells us the possible sources of variance. First, the variance increases when $\sigma^2$ increases. This makes perfect sense because $\sigma^2$ is the noise level; when the noise level is large, we expect the estimation error to increase. Second, the density of the covariates $p(x)$ is inversely related to the variance. This is also very reasonable: when $p(x)$ is large, there tend to be more data points around $x$, increasing the size of the sample that we are averaging over. Last, the variance is of order $O\!\left(\frac{1}{nh}\right)$, the same rate as in the KDE.

MSE and MISE. Using the expressions for the bias and variance, the MSE at a point $x$ is
$$\mathrm{MSE}(\widehat{m}_h(x)) = \frac{h^4}{4}\, \mu_K^2 \left( m''(x) + 2\, \frac{m'(x)\, p'(x)}{p(x)} \right)^2 + \frac{\sigma^2\, \sigma_K^2}{n\, h\, p(x)} + o(h^4) + o\!\left(\frac{1}{nh}\right)$$
and the MISE is
$$\mathrm{MISE}(\widehat{m}_h) = \frac{h^4}{4}\, \mu_K^2 \int \left( m''(x) + 2\, \frac{m'(x)\, p'(x)}{p(x)} \right)^2 dx + \frac{\sigma^2\, \sigma_K^2}{nh} \int \frac{dx}{p(x)} + o(h^4) + o\!\left(\frac{1}{nh}\right). \qquad (9.6)$$
Optimizing the major components in equation (9.6) (the AMISE), we obtain the optimal value of the smoothing bandwidth
$$h_{\mathrm{opt}} = C\, n^{-1/5},$$
where $C$ is a constant that depends on the kernel $K$ and on the underlying functions $p$ and $m$ (through the AMISE constants).
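
To see where the $n^{-1/5}$ rate comes from, write the two leading AMISE terms as $A h^4 + \frac{B}{nh}$, with $A = \frac{\mu_K^2}{4} \int \left( m''(x) + 2\, \frac{m'(x)\, p'(x)}{p(x)} \right)^2 dx$ and $B = \sigma^2\, \sigma_K^2 \int \frac{dx}{p(x)}$. Minimizing over $h$,
$$\frac{d}{dh}\left( A h^4 + \frac{B}{nh} \right) = 4 A h^3 - \frac{B}{n h^2} = 0 \quad\Longrightarrow\quad h_{\mathrm{opt}} = \left( \frac{B}{4 A n} \right)^{1/5} = C\, n^{-1/5},$$
and plugging $h_{\mathrm{opt}}$ back in shows that the AMISE is of order $n^{-4/5}$, faster than the regressogram's $n^{-2/3}$ rate.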

9.3.2 Uncertainty and Confidence Intervals

How do we assess the quality of our estimator $\widehat{m}_h(x)$? We can use the bootstrap to do it. In this case, the empirical bootstrap, the residual bootstrap, and the wild bootstrap can all be applied, but note that each of them relies on slightly different assumptions. Let $(X_1^*, Y_1^*), \ldots, (X_n^*, Y_n^*)$ be a bootstrap sample. Applying the bootstrap sample to equation (9.5), we obtain a bootstrap kernel regression, denoted by $\widehat{m}_h^*$. Now repeat the bootstrap procedure $B$ times; this yields
$$\widehat{m}_h^{*(1)}, \ldots, \widehat{m}_h^{*(B)},$$
$B$ bootstrap kernel regression estimators. Then we can estimate the variance of $\widehat{m}_h(x)$ by the sample variance
$$\widehat{\mathrm{Var}}_B(\widehat{m}_h(x)) = \frac{1}{B-1} \sum_{\ell=1}^{B} \left( \widehat{m}_h^{*(\ell)}(x) - \bar{m}^*_{h,B}(x) \right)^2, \qquad \bar{m}^*_{h,B}(x) = \frac{1}{B} \sum_{\ell=1}^{B} \widehat{m}_h^{*(\ell)}(x).$$
Similarly, we can estimate the MSE as we did in Lectures 5 and 6. However, when using the bootstrap to estimate the uncertainty, one has to be very careful: when $h$ is either too small or too large, the bootstrap estimate may fail to converge to its target. When we choose $h = O(n^{-1/5})$, the bootstrap estimate of the variance is consistent, but the bootstrap estimate of the MSE might not be. The main reason is that it is easier for the bootstrap to estimate the variance than the bias. When we choose $h$ in this way, both the bias and the variance contribute substantially to the MSE, so we cannot ignore the bias; but in this case the bootstrap cannot estimate the bias consistently, so the estimate of the MSE is not consistent.

Confidence interval. To construct a confidence interval for $m(x)$, we will use the following property of the kernel regression:
$$\sqrt{nh}\left( \widehat{m}_h(x) - \mathbb{E}(\widehat{m}_h(x)) \right) \overset{D}{\to} N\!\left(0, \frac{\sigma^2\, \sigma_K^2}{p(x)}\right) \quad\Longrightarrow\quad \frac{\widehat{m}_h(x) - \mathbb{E}(\widehat{m}_h(x))}{\sqrt{\mathrm{Var}(\widehat{m}_h(x))}} \overset{D}{\to} N(0, 1).$$
The variance depends on three quantities: $\sigma_K^2$, $\sigma^2$, and $p(x)$. The quantity $\sigma_K^2$ is known because it is just a characteristic of the kernel function. The density of the covariates $p(x)$ can be estimated using a KDE. So what remains unknown is the noise level $\sigma^2$. The good news is that we can estimate it using the residuals. Recall that the residuals are
$$e_i = Y_i - \widehat{Y}_i = Y_i - \widehat{m}_h(X_i).$$
When $\widehat{m}_h \approx m$, the residual $e_i$ becomes an approximation to the noise $\epsilon_i$. The quantity $\sigma^2 = \mathrm{Var}(\epsilon_1)$, so we can use the sample variance of the residuals to estimate it (note that the average of the residuals is approximately 0):
$$\widehat{\sigma}^2 = \frac{1}{n - 2\nu + \tilde{\nu}} \sum_{i=1}^n e_i^2, \qquad (9.7)$$
where $\nu$ and $\tilde{\nu}$ are quantities acting as degrees of freedom, which we will explain later. Thus, a $1-\alpha$ confidence interval can be constructed as
$$\widehat{m}_h(x) \pm z_{1-\alpha/2}\, \frac{\widehat{\sigma}\, \sigma_K}{\sqrt{n\, h\, \widehat{p}_h(x)}},$$
where $\widehat{p}_h(x)$ is the KDE of the covariates.

Bias issue. Note that the limiting distribution above is centered at $\mathbb{E}(\widehat{m}_h(x))$ rather than at $m(x)$, so the interval accounts for the variance but not for the smoothing bias; the bias has to be handled separately (for instance, by undersmoothing).
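
As an illustration of the empirical bootstrap described in this section, here is a minimal sketch that resamples the pairs $(X_i, Y_i)$ with replacement, refits the Nadaraya-Watson estimator on each bootstrap sample, and takes the pointwise sample variance. The toy data, the seed, $B = 500$, and $h = 0.1$ are illustrative assumptions carried over from the earlier sketches.

    import numpy as np

    def nadaraya_watson(x0, X, Y, h):
        U = (np.atleast_1d(x0)[:, None] - X[None, :]) / h
        Kmat = np.exp(-0.5 * U ** 2) / np.sqrt(2 * np.pi)  # Gaussian kernel
        return (Kmat / Kmat.sum(axis=1, keepdims=True)) @ Y

    rng = np.random.default_rng(0)
    n, h, B = 200, 0.1, 500
    X = rng.uniform(0, 1, n)
    Y = np.sin(2 * np.pi * X) + rng.normal(scale=0.3, size=n)
    grid = np.linspace(0.05, 0.95, 19)

    # Empirical bootstrap: resample (X_i, Y_i) pairs, refit, and take the pointwise sample variance.
    boot = np.empty((B, grid.size))
    for b in range(B):
        idx = rng.integers(0, n, size=n)
        boot[b] = nadaraya_watson(grid, X[idx], Y[idx], h)
    var_hat = boot.var(axis=0, ddof=1)
    print(np.sqrt(var_hat))  # bootstrap standard error of m_hat(x) on the grid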

9.3.3 Resampling Techniques

Cross-validation
Bootstrap approach

Reference: http://faculty.washington.edu/yenchic/17Sp_403/Lec8-NPreg.pdf

9.3.4 Relation to the KDE

Many theoretical results for the KDE apply to nonparametric regression. For instance, we can generalize the MISE to other types of error measurement between $\widehat{m}_h$ and $m$. We can also use derivatives of $\widehat{m}_h$ as estimators of the corresponding derivatives of $m$. Moreover, when we have a multivariate covariate, we can use either a radial basis kernel or a product kernel to generalize the kernel regression to the multivariate case.

The KDE and the kernel regression have a very interesting relationship. Using the given bivariate random sample $(X_1, Y_1), \ldots, (X_n, Y_n)$, we can estimate the joint PDF $p(x, y)$ by
$$\widehat{p}_h(x, y) = \frac{1}{n h^2} \sum_{i=1}^n K\!\left(\frac{X_i - x}{h}\right) K\!\left(\frac{Y_i - y}{h}\right).$$
This joint density estimator also leads to a marginal density estimator of $X$:
$$\widehat{p}_h(x) = \int \widehat{p}_h(x, y)\, dy = \frac{1}{nh} \sum_{i=1}^n K\!\left(\frac{X_i - x}{h}\right).$$
Now recall that the regression function is the conditional expectation
$$m(x) = \mathbb{E}(Y \mid X = x) = \int y\, p(y \mid x)\, dy = \int y\, \frac{p(x, y)}{p(x)}\, dy = \frac{\int y\, p(x, y)\, dy}{p(x)}.$$
Replacing $p(x, y)$ and $p(x)$ by their corresponding estimators $\widehat{p}_h(x, y)$ and $\widehat{p}_h(x)$, we obtain an estimate of $m(x)$:
$$\widehat{m}(x) = \frac{\int y\, \widehat{p}_h(x, y)\, dy}{\widehat{p}_h(x)} = \frac{\frac{1}{nh^2} \sum_{i=1}^n K\!\left(\frac{X_i - x}{h}\right) \int y\, K\!\left(\frac{Y_i - y}{h}\right) dy}{\frac{1}{nh} \sum_{i=1}^n K\!\left(\frac{X_i - x}{h}\right)} = \frac{\sum_{i=1}^n K\!\left(\frac{X_i - x}{h}\right) Y_i}{\sum_{i=1}^n K\!\left(\frac{X_i - x}{h}\right)} = \widehat{m}_h(x).$$
Note that the second-to-last equality uses the fact that when $K(x)$ is symmetric, $\frac{1}{h} \int y\, K\!\left(\frac{Y_i - y}{h}\right) dy = Y_i$. Namely, we may understand the kernel regression as an estimator obtained by inverting the KDE of the joint PDF into a regression estimator.
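
The identity above is easy to check numerically: integrating $y\, \widehat{p}_h(x_0, y)$ over a fine grid of $y$ values and dividing by $\widehat{p}_h(x_0)$ should reproduce the Nadaraya-Watson value at $x_0$. The sketch below does this check with a Gaussian kernel; the evaluation point, grid resolution, toy data, and bandwidth are illustrative assumptions.

    import numpy as np

    def gauss(u):
        return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

    rng = np.random.default_rng(0)
    n, h = 200, 0.1
    X = rng.uniform(0, 1, n)
    Y = np.sin(2 * np.pi * X) + rng.normal(scale=0.3, size=n)

    x0 = 0.3                                           # evaluation point
    ygrid = np.linspace(Y.min() - 1, Y.max() + 1, 2001)
    dy = ygrid[1] - ygrid[0]

    # Joint KDE p_hat(x0, y) on the y-grid and marginal KDE p_hat(x0).
    p_joint = (gauss((X[:, None] - x0) / h) * gauss((Y[:, None] - ygrid[None, :]) / h)).sum(axis=0) / (n * h ** 2)
    p_marg = gauss((X - x0) / h).sum() / (n * h)

    # m_hat(x0) via the ratio of (numerically integrated) KDEs vs. the Nadaraya-Watson formula.
    m_via_kde = (ygrid * p_joint).sum() * dy / p_marg
    m_nw = (gauss((X - x0) / h) * Y).sum() / gauss((X - x0) / h).sum()
    print(m_via_kde, m_nw)  # the two numbers agree up to the numerical integration error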

9.4 Linear Smoother

Now we are going to introduce a very important notion called the linear smoother. Linear smoothers form a collection of regression estimators with nice properties. A linear smoother is an estimator of the regression function of the form
$$\widehat{m}(x) = \sum_{i=1}^n \ell_i(x)\, Y_i, \qquad (9.8)$$
where $\ell_i(x)$ is some function depending on $X_1, \ldots, X_n$ but not on any of $Y_1, \ldots, Y_n$. The residual of the $j$-th observation can be written as
$$e_j = Y_j - \widehat{m}(X_j) = Y_j - \sum_{i=1}^n \ell_i(X_j)\, Y_i.$$
Let $e = (e_1, \ldots, e_n)^T$ be the vector of residuals and define the matrix $L$ with entries $L_{ij} = \ell_j(X_i)$:
$$L = \begin{pmatrix} \ell_1(X_1) & \ell_2(X_1) & \ell_3(X_1) & \cdots & \ell_n(X_1) \\ \ell_1(X_2) & \ell_2(X_2) & \ell_3(X_2) & \cdots & \ell_n(X_2) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \ell_1(X_n) & \ell_2(X_n) & \ell_3(X_n) & \cdots & \ell_n(X_n) \end{pmatrix}.$$
Then the vector of predicted values is $\widehat{Y} = (\widehat{Y}_1, \ldots, \widehat{Y}_n)^T = LY$, where $Y = (Y_1, \ldots, Y_n)^T$ is the vector of observed $Y_i$'s, and
$$e = Y - \widehat{Y} = Y - LY = (I - L)\, Y.$$

Example: Linear Regression. For linear regression, let $\mathbf{X}$ denote the data matrix (the first column is all 1's and the second column is $X_1, \ldots, X_n$). We know that $\widehat{\beta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T Y$ and $\widehat{Y} = \mathbf{X}\widehat{\beta} = \mathbf{X}(\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T Y$. This implies that the matrix $L$ is
$$L = \mathbf{X}(\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T,$$
which is also the projection matrix in linear regression. Thus, linear regression is a linear smoother.

Example: Regressogram. The regressogram is also a linear smoother. Let $B_1, \ldots, B_M$ be the bins of the covariate and let $B(x)$ denote the bin that $x$ belongs to. Then
$$\ell_j(x) = \frac{I(X_j \in B(x))}{\sum_{i=1}^n I(X_i \in B(x))}.$$

Example: Kernel Regression. As you may expect, the kernel regression is also a linear smoother. Recall from equation (9.5) that
$$\widehat{m}_h(x_0) = \sum_{i=1}^n W_i(x_0)\, Y_i, \qquad W_i(x_0) = \frac{K\!\left(\frac{x_0 - X_i}{h}\right)}{\sum_{\ell=1}^n K\!\left(\frac{x_0 - X_\ell}{h}\right)},$$
so
$$\ell_j(x) = \frac{K\!\left(\frac{x - X_j}{h}\right)}{\sum_{\ell=1}^n K\!\left(\frac{x - X_\ell}{h}\right)}.$$
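
Here is a minimal sketch of the smoothing matrix $L$ for the Nadaraya-Watson smoother, checking that $\widehat{Y} = LY$ and that each row of $L$ sums to 1; the toy data, the seed, and the bandwidth $h = 0.1$ are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n, h = 200, 0.1
    X = rng.uniform(0, 1, n)
    Y = np.sin(2 * np.pi * X) + rng.normal(scale=0.3, size=n)

    # Smoothing matrix: L[i, j] = l_j(X_i) = K((X_i - X_j)/h) / sum_l K((X_i - X_l)/h), Gaussian K.
    Kmat = np.exp(-0.5 * ((X[:, None] - X[None, :]) / h) ** 2) / np.sqrt(2 * np.pi)
    L = Kmat / Kmat.sum(axis=1, keepdims=True)

    Y_hat = L @ Y                            # fitted values at the observed covariates
    residuals = Y - Y_hat                    # e = (I - L) Y
    print(np.allclose(L.sum(axis=1), 1.0))   # each row of L sums to 1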

9.4.1 Variance of a Linear Smoother

The linear smoother has an (approximately) unbiased estimator of the underlying noise level $\sigma^2$. Recall that the noise level is $\sigma^2 = \mathrm{Var}(\epsilon_i)$. We need two tricks about variances and covariance matrices. First, for a fixed matrix $A$ and a random vector $X$,
$$\mathrm{Cov}(AX) = A\, \mathrm{Cov}(X)\, A^T.$$
Thus, the covariance matrix of the residual vector is
$$\mathrm{Cov}(e) = \mathrm{Cov}\big((I - L)\, Y\big) = (I - L)\, \mathrm{Cov}(Y)\, (I - L)^T.$$
Second, because $Y_1, \ldots, Y_n$ are IID, $\mathrm{Cov}(Y) = \sigma^2 I$, where $I$ is the identity matrix. This implies
$$\mathrm{Cov}(e) = (I - L)\, \mathrm{Cov}(Y)\, (I - L)^T = \sigma^2 \left( I - L - L^T + LL^T \right).$$
Now, taking the matrix trace on both sides,
$$\mathrm{Tr}(\mathrm{Cov}(e)) = \sum_{i=1}^n \mathrm{Var}(e_i) = \sigma^2\, \mathrm{Tr}\!\left( I - L - L^T + LL^T \right) = \sigma^2 \left( n - 2\nu + \tilde{\nu} \right),$$
where $\nu = \mathrm{Tr}(L)$ and $\tilde{\nu} = \mathrm{Tr}(LL^T)$. Because the squared residual is approximately $\mathrm{Var}(e_i)$ (the residuals have mean approximately 0), we have
$$\sum_{i=1}^n e_i^2 \approx \sum_{i=1}^n \mathrm{Var}(e_i) = \sigma^2 \left( n - 2\nu + \tilde{\nu} \right).$$
Thus, we can estimate $\sigma^2$ by
$$\widehat{\sigma}^2 = \frac{1}{n - 2\nu + \tilde{\nu}} \sum_{i=1}^n e_i^2, \qquad (9.9)$$
which is what we did in equation (9.7). The quantity $\nu$ is called the degrees of freedom. In the linear regression case, $\nu = \tilde{\nu} = p + 1$, where $p$ is the number of covariates, so the variance estimator becomes $\widehat{\sigma}^2 = \frac{1}{n - p - 1} \sum_{i=1}^n e_i^2$. If you have learned the variance estimator of linear regression, you should be familiar with this estimator. The degrees of freedom $\nu$ is easy to interpret in linear regression, and the power of equation (9.9) is that it works for every linear smoother as long as the errors $\epsilon_i$ are IID. This shows how we can define an effective degrees of freedom for other, more complicated regression estimators.
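
Continuing the smoothing-matrix sketch from the previous section, the effective degrees of freedom and the residual-based variance estimate (9.9) take only a few more lines. The toy data and seed are the same illustrative assumptions as before; the bandwidth here is set to a smallish $h = 0.05$ so that the smoothing bias does not inflate the estimate too much, and since the simulated noise standard deviation is 0.3, $\widehat{\sigma}$ should come out near that value.

    import numpy as np

    rng = np.random.default_rng(0)
    n, h = 200, 0.05
    X = rng.uniform(0, 1, n)
    Y = np.sin(2 * np.pi * X) + rng.normal(scale=0.3, size=n)

    Kmat = np.exp(-0.5 * ((X[:, None] - X[None, :]) / h) ** 2) / np.sqrt(2 * np.pi)
    L = Kmat / Kmat.sum(axis=1, keepdims=True)          # smoothing matrix of the kernel smoother

    e = Y - L @ Y                                       # residuals e = (I - L) Y
    nu = np.trace(L)                                    # nu = Tr(L)
    nu_tilde = np.trace(L @ L.T)                        # nu~ = Tr(L L^T)
    sigma2_hat = (e ** 2).sum() / (n - 2 * nu + nu_tilde)   # equation (9.9)
    print(nu, nu_tilde, np.sqrt(sigma2_hat))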