LECTURE 2 LEAST SQUARES CROSS-VALIDATION FOR KERNEL DENSITY ESTIMATION

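January 2007

Nonparametric kernel estimation is extremely sensitive to the choice of the bandwidth $h$: larger values of $h$ result in averaging over more observations, smoother estimates, and more bias, while smaller values of $h$ result in rougher estimates and bigger variances. Thus, it is desirable to have an automatic procedure that picks the optimal value of $h$ (in some sense) from the data. Here we consider one such procedure, least squares cross-validation, in the case of univariate density estimation. As the optimality criterion, we use the integrated MSE (IMSE) discussed in the previous lecture:

\[
\mathrm{IMSE} = \int E\left(\hat f(x) - f(x)\right)^2 dx = b h^4 + \frac{c}{nh} + o\left(h^4 + \frac{1}{nh}\right), \tag{1}
\]

where

\[
b = \frac{1}{4}\int c_1^2(x)\,dx = \frac{1}{4}\int \left(f''(x)\right)^2 dx \left(\int u^2 K(u)\,du\right)^2,
\]

as $c_1(x)$ was defined in Lecture 1 as $c_1(x) = f''(x)\int u^2 K(u)\,du$; and

\[
c = \int c_2(x)\,dx = \int K^2(u)\,du,
\]

as $c_2(x)$ was defined in Lecture 1 as $c_2(x) = f(x)\int K^2(u)\,du$. As shown in Lecture 1, the IMSE-optimal bandwidth is given by

\[
h_{opt} = \left(\frac{c}{4b}\right)^{1/5} n^{-1/5}
= \frac{\left(\int K^2(u)\,du\right)^{1/5}}{\left(\int \left(f''(x)\right)^2 dx\right)^{1/5}\left(\int u^2 K(u)\,du\right)^{2/5}}\, n^{-1/5}.
\]

The discussion below follows Li and Racine (2007), see Section 1.3.

Cross-validation

Ideally, one would like to compute the IMSE for different values of $h$; however, $f(x)$ appearing in the expression of IMSE is unknown. Thus, we should attempt to approximate the IMSE using data. Consider the integrated squared difference between $\hat f$ and $f$:

\[
\int \left(\hat f(x) - f(x)\right)^2 dx = \int \hat f^2(x)\,dx - 2\int \hat f(x) f(x)\,dx + \int f^2(x)\,dx. \tag{2}
\]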

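The last term does not depend on $h$ and therefore can be ignored. For the first term, we have

\[
\begin{aligned}
\int \hat f^2(x)\,dx
&= \int \left(\frac{1}{nh}\sum_{i=1}^n K\left(\frac{X_i - x}{h}\right)\right)^2 dx \\
&= \frac{1}{n^2h^2}\sum_{i=1}^n\sum_{j=1}^n \int K\left(\frac{X_i - x}{h}\right)K\left(\frac{X_j - x}{h}\right) dx \\
&= \frac{1}{n^2h}\sum_{i=1}^n\sum_{j=1}^n \int K(u)\,K\left(\frac{X_j - X_i}{h} + u\right) du \\
&= \frac{1}{n^2h}\sum_{i=1}^n\sum_{j=1}^n \bar K\left(\frac{X_i - X_j}{h}\right),
\end{aligned}
\]

where we used the change of variable $u = (X_i - x)/h$, $x = X_i - hu$, $(X_j - x)/h = (X_j - X_i)/h + u$, $dx = -h\,du$ (the sign is absorbed by reversing the limits of integration), and defined

\[
\bar K(v) = \int K(u)K(v + u)\,du.
\]

The last equality uses the symmetry of $\bar K$, which follows from the symmetry of $K$. The function $\bar K(v)$ is called the convolution kernel and can be computed for any value $v$, since $K$ is known and chosen by the econometrician.

The second term in the decomposition (2) of the integrated squared difference involves the unknown $f$ and therefore must be estimated. Since for a function $\psi(x)$ we have $\int \psi(x)f(x)\,dx = E\psi(X_i)$, one can estimate $\int \psi(x)f(x)\,dx$ by $n^{-1}\sum_{i=1}^n \psi(X_i)$. Hence, a naive estimator of $\int \hat f(x)f(x)\,dx$ would be

\[
\frac{1}{n}\sum_{i=1}^n \hat f(X_i) = \frac{1}{n^2h}\sum_{i=1}^n\sum_{j=1}^n K\left(\frac{X_i - X_j}{h}\right).
\]

This approach, however, does not approximate $\int \hat f(x)f(x)\,dx$ well, as the expression above contains the terms with $X_i - X_j = 0$ (that is, $j = i$), each of which produces $K(0)$. Thus, the expression must be modified to eliminate the terms with $X_i - X_j = 0$:

\[
\frac{1}{n}\sum_{i=1}^n \hat f_{-i}(X_i) = \frac{1}{n(n-1)h}\sum_{i=1}^n\sum_{j\neq i} K\left(\frac{X_i - X_j}{h}\right),
\]

where

\[
\hat f_{-i}(x) = \frac{1}{(n-1)h}\sum_{j\neq i} K\left(\frac{X_j - x}{h}\right)
\]

is the so-called leave-one-out estimator of $f(x)$. One can show that using the leave-one-out estimator of $f$ produces an unbiased estimator of the leading term of $E\int \hat f(x)f(x)\,dx$ when the data are iid and the bandwidth is fixed:

\[
\begin{aligned}
E\,\frac{1}{n}\sum_{i=1}^n \hat f_{-i}(X_i)
&= E\hat f_{-i}(X_i) \\
&= E\,E\left(\hat f_{-i}(X_i) \mid X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_n\right) \\
&= E\int \hat f_{-i}(x)\, f_{X_i \mid X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_n}\left(x \mid X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_n\right) dx \\
&= E\int \hat f_{-i}(x) f(x)\,dx \quad \text{(by independence of } X_1,\ldots,X_n\text{)} \\
&= \int E\hat f_{-i}(x)\, f(x)\,dx \\
&= E\int \hat f(x) f(x)\,dx,
\end{aligned}
\]

where the last equality holds since $E\hat f(x) = E\,h^{-1}K\left((X_j - x)/h\right)$ does not depend on the number of observations when $h$ is fixed.

The least squares cross-validation criterion becomes

\[
CV(h) = \frac{1}{n^2h}\sum_{i=1}^n\sum_{j=1}^n \bar K\left(\frac{X_i - X_j}{h}\right) - \frac{2}{n(n-1)h}\sum_{i=1}^n\sum_{j\neq i} K\left(\frac{X_i - X_j}{h}\right),
\]

which must be minimized numerically with respect to $h$ to find the optimal bandwidth. One can show that the leading term of $E\,CV(h)$ captures that of the IMSE in (1).

Lemma 1. Suppose that the assumptions on the kernel function in Lecture 1 hold. Suppose that the data are iid with the PDF $f$, which is four times continuously differentiable with uniformly bounded derivatives. Then

\[
E\,CV(h) = -\int f^2(x)\,dx + b h^4 + \frac{c}{nh} + o\left(h^4 + \frac{1}{nh}\right).
\]

Remark. (i) Note that minimizing $CV(h)$ is equivalent to minimizing $CV(h) + \int f^2(x)\,dx$. Hence, the leading term of $E\,CV(h)$ captures exactly the leading term of $\mathrm{IMSE} = E\int (\hat f(x) - f(x))^2 dx$. (ii) If one does not use the leave-one-out estimator, the third term in the expression for $E\,CV(h)$ becomes $(c - 2K(0))/(nh)$, which is negative for typical kernels. As a result, minimizing $CV(h)$ would produce $h = 0$; see Exercise 1.6 in Li and Racine (2007).

Proof. This is a part of Exercise 1.6 in Li and Racine (2007). First, re-write $CV(h)$ as

\[
CV(h) = \frac{\bar K(0)}{nh} + CV_1(h) + R, \tag{3}
\]

where

\[
\begin{aligned}
CV_1(h) &= \frac{1}{n(n-1)h}\sum_{i=1}^n\sum_{j\neq i} \bar K\left(\frac{X_i - X_j}{h}\right) - \frac{2}{n(n-1)h}\sum_{i=1}^n\sum_{j\neq i} K\left(\frac{X_i - X_j}{h}\right), \\
R &= \frac{1}{n^2h}\sum_{i=1}^n\sum_{j\neq i} \bar K\left(\frac{X_i - X_j}{h}\right) - \frac{1}{n(n-1)h}\sum_{i=1}^n\sum_{j\neq i} \bar K\left(\frac{X_i - X_j}{h}\right)
= -\frac{1}{n^2(n-1)h}\sum_{i=1}^n\sum_{j\neq i} \bar K\left(\frac{X_i - X_j}{h}\right).
\end{aligned}
\]

Note that the first term in (3) captures one of the leading terms of the IMSE, as $\bar K(0) = \int K^2(u)\,du = c$. Since $\bar K(u) \geq 0$, one can easily show that $R = O_p(n^{-1})$: by Markov's inequality $P(|R| > \epsilon) \leq E|R|/\epsilon$, while

\[
E|R| = \frac{1}{nh}E\bar K\left(\frac{X_i - X_j}{h}\right)
= \frac{1}{nh}\int\int \bar K\left(\frac{u - v}{h}\right) f(u)f(v)\,du\,dv
= \frac{1}{n}\int\int \bar K(y) f(v + hy) f(v)\,dy\,dv = O(n^{-1}).
\]

We can re-write the $CV_1$ term as

\[
CV_1(h) = \frac{1}{n(n-1)h}\sum_{i=1}^n\sum_{j\neq i} \kappa\left(\frac{X_i - X_j}{h}\right), \quad \text{where } \kappa(u) = \bar K(u) - 2K(u).
\]

Next, by the usual change of variable argument,

\[
\begin{aligned}
E\,CV_1(h) &= \int\int \kappa(y) f(v + hy) f(v)\,dv\,dy \\
&= \int\int \kappa(y)\left(f(v) + \sum_{j=1}^{3}\frac{1}{j!} f^{(j)}(v) h^j y^j + \frac{1}{4!} f^{(4)}(\tilde v_y) h^4 y^4\right) f(v)\,dv\,dy,
\end{aligned}
\]

where $\tilde v_y$ lies between $v$ and $v + hy$, and $f^{(j)}(x) = d^j f(x)/dx^j$.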

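Using the dominated convergence theorem and the assumption that $f^{(4)}$ is uniformly bounded, i.e. $\sup_x |f^{(4)}(x)| < C$ for some constant $C > 0$, we obtain:

\[
E\,CV_1(h) = \sum_{j=0}^{4} \frac{h^j}{j!}\int y^j \kappa(y)\,dy \int f^{(j)}(v) f(v)\,dv + o(h^4).
\]

The proof can now be completed by showing that

\[
\int f^{(j)}(v) f(v)\,dv =
\begin{cases}
\int f^2(x)\,dx, & j = 0, \\
0, & j = 1, 3, \\
\int \left(f''(x)\right)^2 dx, & j = 4,
\end{cases}
\]

which can be shown using integration by parts, and that

\[
\int y^j \kappa(y)\,dy =
\begin{cases}
-1, & j = 0, \\
0, & j = 1, 2, 3, \\
6\left(\int y^2 K(y)\,dy\right)^2, & j = 4.
\end{cases}
\]

The case $j = 2$ of the first display is not needed, since the corresponding moment of $\kappa$ is zero. Combining the two displays, the $j = 0$ term gives $-\int f^2(x)\,dx$ and the $j = 4$ term gives $\frac{h^4}{4!}\cdot 6\left(\int y^2K(y)\,dy\right)^2\int (f''(x))^2dx = bh^4$. The result for $j = 4$ can be shown using the binomial formula: for any positive integer $p$,

\[
(a + b)^p = \sum_{i=0}^{p}\binom{p}{i} a^i b^{p-i}.
\]

Lemma 1 can be used to show that

\[
CV(h) = E\,CV(h) + \text{s.o.p.}, \tag{4}
\]

where s.o.p. stands for smaller-order terms in probability. The formula for $CV(h)$ involves expressions with double summations over $i \neq j$:

\[
\frac{1}{n(n-1)}\sum_{i=1}^n\sum_{j\neq i} H(X_i, X_j).
\]

Such statistics are called U-statistics. Theorem 4 in the Appendix shows how U-statistics can be approximated by usual averages of observations involving only a single sum $\sum_i$, which in turn can be used to show (4).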
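The criterion $CV(h)$ above can be evaluated directly from the data once a kernel is fixed. The following is a minimal sketch, assuming a Gaussian kernel (a choice made here for illustration, not taken from the lecture), for which the convolution kernel $\bar K$ is the $N(0,2)$ density. The function names, the simulated sample, and the grid search over $h$ are all hypothetical; a numerical optimizer could replace the grid.

```python
import numpy as np

def gaussian_kernel(u):
    # Assumed kernel K: standard normal density.
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def gaussian_convolution_kernel(v):
    # Convolution kernel for the Gaussian K: the N(0, 2) density.
    return np.exp(-0.25 * v**2) / np.sqrt(4.0 * np.pi)

def lscv(h, x):
    # Least squares cross-validation criterion CV(h) for a sample x.
    x = np.asarray(x, dtype=float)
    n = x.size
    d = (x[:, None] - x[None, :]) / h        # all pairwise (X_i - X_j) / h
    # First term: double sum of the convolution kernel over all i, j.
    term1 = gaussian_convolution_kernel(d).sum() / (n**2 * h)
    # Second term: leave-one-out, so the diagonal (j == i) is dropped.
    k = gaussian_kernel(d)
    off_diagonal_sum = k.sum() - np.trace(k)
    term2 = 2.0 * off_diagonal_sum / (n * (n - 1) * h)
    return term1 - term2

def select_bandwidth(x, grid):
    # Grid search: return the h in `grid` with the smallest CV(h).
    values = np.array([lscv(h, x) for h in grid])
    return grid[int(np.argmin(values))], values

# Illustrative data: 200 draws from N(0, 1) with a fixed seed.
rng = np.random.default_rng(0)
sample = rng.normal(size=200)
grid = np.linspace(0.05, 1.5, 60)
h_cv, cv_values = select_bandwidth(sample, grid)
```

The pairwise-difference matrix makes the cost quadratic in $n$, which is acceptable for a sketch but may call for binning or other approximations on large samples.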

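A  Appendix: U-statistics

Let $X_1,\ldots,X_n$ be iid random variables. Let $H$ be a symmetric function, i.e. $H : R^2 \to R$, $H(a,b) = H(b,a)$, and $H$ is allowed to change with $n$. A second-order U-statistic is defined as

\[
U = \binom{n}{2}^{-1}\sum_{i<j} H(X_i, X_j),
\]

where $\binom{n}{2} = \frac{n!}{2!(n-2)!} = \frac{n(n-1)}{2}$ denotes the number of distinct combinations of two elements out of $n$ elements. Note that a second-order U-statistic is constructed by considering all such combinations, computing $H(X_i, X_j)$ for each combination, and averaging over the combinations, since the sum is taken over $i < j$. Note also that the order of $X_i$ and $X_j$ is unimportant, as $H$ is symmetric.

We are interested in limiting theorems (WLLNs and CLTs) for U-statistics. The key step is transforming a U-statistic into averages of the form $n^{-1}\sum_{i=1}^n g(X_i)$ for some function $g : R \to R$, after which the usual WLLNs and CLTs can be applied. Such a transformation is known as Hájek's projection. The function $g(x)$ is constructed as

\[
g(x) = E\left(H(X_i, X_j) \mid X_i = x\right) = EH(x, X_j) = EH(X_i, x),
\]

where the second equality holds by independence of $X_i$ and $X_j$, and the third equality holds by the symmetry of $H$. Moreover,

\[
\mu = Eg(X_i) = E\,E\left(H(X_i, X_j) \mid X_i\right) = EH(X_i, X_j).
\]

Projecting the U-statistic $U$ on observation $X_i$ produces:

\[
E(U \mid X_i) = \binom{n}{2}^{-1}\sum_{k<l} E\left(H(X_k, X_l) \mid X_i\right).
\]

Consider $E(H(X_k, X_l) \mid X_i)$ for some fixed $i$. When $i \neq k$ and $i \neq l$,

\[
E\left(H(X_k, X_l) \mid X_i\right) = EH(X_k, X_l) = \mu.
\]

The number of such terms is $\binom{n-1}{2} = \frac{(n-1)(n-2)}{2}$. The remaining $n - 1$ terms contain $X_i$ and therefore satisfy

\[
E\left(H(X_k, X_l) \mid X_i\right) = g(X_i).
\]

Hence,

\[
\begin{aligned}
E(U \mid X_i) &= \binom{n}{2}^{-1}\left(\binom{n-1}{2}\mu + (n-1)g(X_i)\right) \\
&= \frac{n-2}{n}\mu + \frac{2}{n}g(X_i) \\
&= \mu + \frac{2}{n}\left(g(X_i) - \mu\right). 
\end{aligned}
\tag{5}
\]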
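As a concrete instance of the definition above, the kernel $H(a,b) = (a-b)^2/2$ yields the unbiased sample variance. The sketch below computes a second-order U-statistic by brute force over all pairs $i < j$; the function names and the four data points are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import itertools
import numpy as np

def u_statistic(x, H):
    # Average H(X_i, X_j) over all n(n-1)/2 combinations with i < j.
    pairs = list(itertools.combinations(range(len(x)), 2))
    return sum(H(x[i], x[j]) for i, j in pairs) / len(pairs)

def half_squared_difference(a, b):
    # Symmetric kernel H(a, b) = (a - b)^2 / 2.
    return 0.5 * (a - b) ** 2

x = np.array([1.0, 2.0, 4.0, 7.0])
u = u_statistic(x, half_squared_difference)

# The six pairwise values are 0.5, 4.5, 18, 2, 12.5, 4.5; their average is 7,
# which matches the unbiased sample variance np.var(x, ddof=1).
```

For this kernel, $g(x) = EH(x, X_j) = \left((x - EX_j)^2 + Var(X_j)\right)/2$ and $\mu = Var(X_j)$, which follows directly from expanding the square.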

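The projection $\hat U$ of $U$ is defined as

\[
\hat U = \sum_{i=1}^n E(U \mid X_i) - (n-1)EU = \mu + \frac{2}{n}\sum_{i=1}^n \left(g(X_i) - \mu\right). \tag{6}
\]

Note that $E\hat U = EU$ is achieved by the subtraction of $(n-1)EU$. The scaling by 2 in the second term ensures that $Var(\hat U)/Var(U) \to 1$, as we will see from the following lemma.

Lemma 2. Suppose that there is $c > 0$ such that $\xi_1 = Var(g(X_i)) > c$, that $\xi_1$ and $\xi_2 = Var(H(X_1, X_2))$ are finite, and that $\xi_2/(n\xi_1) \to 0$ as $n \to \infty$. Then $Var(\hat U)/Var(U) \to 1$.

Remark. The condition $Var(g(X_i)) > 0$ rules out degeneracy of $\hat U$. A U-statistic satisfying such a condition is called non-degenerate.

Proof. First,

\[
Var(\hat U) = \frac{4}{n}Var(g(X_i)) = \frac{4}{n}\xi_1. \tag{7}
\]

For the U-statistic,

\[
Var(U) = \binom{n}{2}^{-2}\sum_{i<j}\sum_{k<l} Cov\left(H(X_i, X_j), H(X_k, X_l)\right).
\]

The covariance terms are zero when $i, j, k, l$ are all different, by independence. Thus, a covariance term can be non-zero when (i) the two variables in $H(X_i, X_j)$ coincide with the two variables in $H(X_k, X_l)$, i.e. $X_i = X_k$ and $X_j = X_l$, or (ii) $H(X_i, X_j)$ and $H(X_k, X_l)$ have only one variable in common. The contribution from (i) to the variance of $U$ is

\[
\binom{n}{2}^{-2}\binom{n}{2}Var(H(X_1, X_2)) = \binom{n}{2}^{-1}Var(H(X_1, X_2)).
\]

To find the contribution from (ii), fix a pair $(i, j)$. The shared variable can be either $X_i$ or $X_j$ (two choices), and each choice leaves $n - 2$ possibilities for the remaining index. Hence, summing over the pairs $(k, l)$ that share exactly one index with $(i, j)$,

\[
\sum_{k<l} Cov\left(H(X_i, X_j), H(X_k, X_l)\right)
= 2(n-2)\,E\left(H(X_i, X_j) - \mu\right)\left(H(X_i, X_l) - \mu\right)
= 2(n-2)\left(EH(X_i, X_j)H(X_i, X_l) - \mu^2\right)
\]

for some $l \neq j$. Thus, the contribution from (ii) to the variance of $U$ is

\[
\binom{n}{2}^{-2}\binom{n}{2}\,2(n-2)\left(EH(X_i, X_j)H(X_i, X_l) - \mu^2\right)
= \frac{4(n-2)}{n(n-1)}\left(EH(X_i, X_j)H(X_i, X_l) - \mu^2\right).
\]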

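Next, since $H(X_i, X_j)$ and $H(X_i, X_l)$ are independent conditionally on $X_i$ when $j \neq l$,

\[
\begin{aligned}
EH(X_i, X_j)H(X_i, X_l) - \mu^2
&= E\,E\left(H(X_i, X_j)H(X_i, X_l) \mid X_i\right) - \mu^2 \\
&= E\left[E\left(H(X_i, X_j) \mid X_i\right)E\left(H(X_i, X_l) \mid X_i\right)\right] - \mu^2 \\
&= Eg^2(X_i) - \mu^2 \\
&= Var(g(X_i)).
\end{aligned}
\]

We have

\[
Var(U) = \binom{n}{2}^{-1}Var(H(X_1, X_2)) + \frac{4(n-2)}{n(n-1)}Var(g(X_i)) = \frac{2}{n(n-1)}\xi_2 + \frac{4(n-2)}{n(n-1)}\xi_1,
\]

and therefore

\[
\frac{Var(\hat U)}{Var(U)} = \frac{4\xi_1/n}{\dfrac{2\xi_2}{n(n-1)} + \dfrac{4(n-2)\xi_1}{n(n-1)}}
= \frac{1}{\dfrac{\xi_2}{2(n-1)\xi_1} + \dfrac{n-2}{n-1}} = 1 + o(1).
\]

The asymptotic equivalence of the variances of the U-statistic and its projection, together with the equality of their means, implies the asymptotic equivalence of $U$ and $\hat U$.

Lemma 3. Suppose that $Var(\hat U)/Var(U) \to 1$. Then

\[
R = \frac{U - EU}{\sqrt{Var(U)}} - \frac{\hat U - E\hat U}{\sqrt{Var(\hat U)}} = o_p(1).
\]

Proof. By construction, $ER = 0$. Thus, if we show that $Var(R) \to 0$, the result follows by Chebyshev's (Markov's) inequality. We have

\[
Var(R) = 2 - 2\frac{Cov(U, \hat U)}{\sqrt{Var(U)}\sqrt{Var(\hat U)}}. \tag{8}
\]

Consider

\[
\begin{aligned}
E(U - \hat U)(\hat U - \mu)
&= \frac{2}{n}\sum_{i=1}^n E(U - \hat U)\left(g(X_i) - \mu\right) \\
&= \frac{2}{n}\sum_{i=1}^n E\left[E\left(U - \hat U \mid X_i\right)\left(g(X_i) - \mu\right)\right] \\
&= 0,
\end{aligned}
\tag{9}
\]

where the last equality follows by (5) and since, by construction,

\[
E(\hat U \mid X_i) = \mu + \frac{2}{n}\left(g(X_i) - \mu\right).
\]

The result in (9) implies that

\[
EU(\hat U - \mu) = E\hat U(\hat U - \mu), \quad \text{or} \quad Cov(U, \hat U) = Var(\hat U). \tag{10}
\]

It follows now from (8) and (10) that

\[
Var(R) = 2 - 2\sqrt{\frac{Var(\hat U)}{Var(U)}} \to 0.
\]

We can now state the desired projection result, which can be used to establish LLNs and CLTs for the U-statistic $U$.

Theorem 4 (Projection of non-degenerate second-order U-statistics). Suppose that the conditions of Lemma 2 hold. Then

\[
n^{1/2}(U - EU) = 2n^{-1/2}\sum_{i=1}^n \left(g(X_i) - Eg(X_i)\right) + o_p(1).
\]

Moreover, if $\xi_1 < d$ for some $d > 0$ and all large $n$, then $U = EU + O_p(n^{-1/2})$.

Proof. The first result follows from Lemmas 2 and 3 and the definition of $\hat U$ in (6), since the result of Lemma 3 can be re-written as

\[
n^{1/2}(U - EU) = \left(\frac{\dfrac{2\xi_2}{n-1} + \dfrac{4(n-2)\xi_1}{n-1}}{4\xi_1}\right)^{1/2} n^{1/2}(\hat U - E\hat U) + o_p(1).
\]

The second result follows from the first by Chebyshev's inequality and since

\[
Var\left(n^{-1/2}\sum_{i=1}^n \left(g(X_i) - Eg(X_i)\right)\right) = \xi_1 < d \quad \text{for all large } n.
\]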
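The variance comparison behind Lemma 2 can be checked numerically. The sketch below evaluates the two variance formulas from the proof, $Var(U) = \frac{2}{n(n-1)}\xi_2 + \frac{4(n-2)}{n(n-1)}\xi_1$ and $Var(\hat U) = \frac{4}{n}\xi_1$, for an illustrative case chosen here (not taken from the lecture): standard normal data with $H(a,b) = (a-b)^2/2$, for which $\xi_1 = 1/2$ and $\xi_2 = 2$. The function names are hypothetical.

```python
def var_u(n, xi1, xi2):
    # Variance of a second-order U-statistic from the proof of Lemma 2.
    return 2.0 * xi2 / (n * (n - 1)) + 4.0 * (n - 2) * xi1 / (n * (n - 1))

def var_projection(n, xi1):
    # Variance of the Hajek projection, equation (7).
    return 4.0 * xi1 / n

def variance_ratio(n, xi1, xi2):
    # Var(projection) / Var(U); Lemma 2 states that this tends to 1.
    return var_projection(n, xi1) / var_u(n, xi1, xi2)

# Assumed example: N(0, 1) data, H(a, b) = (a - b)^2 / 2,
# so that xi1 = Var(g(X)) = 1/2 and xi2 = Var(H(X1, X2)) = 2.
ratios = {n: variance_ratio(n, 0.5, 2.0) for n in (10, 100, 1000)}
```

In this example the ratio simplifies to $(n-1)/n$, so it approaches 1 at rate $1/n$, in line with the lemma.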