Kernel density estimator


January 2017

NONPARAMETRIC KERNEL DENSITY ESTIMATION

In this lecture, we discuss kernel estimation of probability density functions (PDFs). Nonparametric density estimation is one of the central problems in statistics. In economics, nonparametric density estimation plays important roles in various areas such as, for example, industrial organization (Guerre et al., 2000) and empirical finance (Aït-Sahalia, 1996). These notes borrow from the following sources: Li and Racine (2007), Pagan and Ullah (1999), and Härdle and Linton (1994).

1 Kernel density estimator

Assumption 1. (a) $\{X_i : i = 1, \dots, n\}$ is a collection of iid random variables drawn from a distribution with CDF $F$ and PDF $f$. (b) In the neighborhood $N_x$ of $x$, $f$ is bounded and twice continuously differentiable with bounded derivatives.

When discussing $f''(x)$, we will implicitly assume that $f''(x)$ exists at $x$. The econometrician's objective is to estimate $f$ without imposing any functional form (parametric) assumptions on the PDF.

First, consider estimation of $F$. Since $F(x) = E\,1\{X_i \le x\}$, an estimator of $F$ can be constructed as
\[
\hat F(x) = \frac{1}{n} \sum_{i=1}^{n} 1\{X_i \le x\}. \tag{1}
\]
The function $\hat F(x)$ is called the empirical CDF of the $X_i$'s. The WLLN implies that for all $x$, $\hat F(x) \to_p F(x)$. As a matter of fact, a stronger result can be established (Glivenko-Cantelli Theorem, see Chapter 19 of van der Vaart, 1998):
\[
\sup_{x \in \mathbb{R}} \left| \hat F(x) - F(x) \right| \to_{a.s.} 0.
\]
Next, by the CLT,
\[
n^{1/2} \left( \hat F(x) - F(x) \right) \to_d N\left( 0, F(x)(1 - F(x)) \right).
\]
Furthermore, for any $x_1, x_2 \in \mathbb{R}$, $n^{1/2}(\hat F(x_1) - F(x_1))$ and $n^{1/2}(\hat F(x_2) - F(x_2))$ are jointly asymptotically normal with mean zero and covariance $F(x_1 \wedge x_2) - F(x_1)F(x_2)$, where $x_1 \wedge x_2$ denotes the minimum of $x_1$ and $x_2$.

Since
\[
f(x) = \frac{dF(x)}{dx} = \lim_{h \to 0} \frac{F(x + h) - F(x - h)}{2h},
\]
from (1), one can consider the following estimator for the PDF $f$:
\[
\hat f(x) = \frac{\hat F(x + h_n) - \hat F(x - h_n)}{2 h_n} = \frac{1}{2 n h_n} \sum_{i=1}^{n} 1\{x - h_n \le X_i \le x + h_n\},
\]
where $h_n$ is a small number (note that we consider continuously distributed random variables, so that $P(X_i = x \pm h_n) = 0$). We write $h_n$ instead of just $h$ because, typically, it will be a function of the sample size $n$ such that $\lim_{n \to \infty} h_n = 0$.

Now, define the following kernel function:
\[
k(u) = \frac{1}{2} \cdot 1\{|u| \le 1\}. \tag{2}
\]
Then, the kernel PDF estimator is given by
\[
\hat f(x) = \frac{1}{n h_n} \sum_{i=1}^{n} k\!\left( \frac{X_i - x}{h_n} \right). \tag{3}
\]
Thus, with the kernel function defined according to (2), the kernel density estimator is an average number of observations in a small neighborhood of $x$, as defined by the smoothing parameter or bandwidth $h_n$ (also called the kernel window). The kernel function in (2) is called uniform, because it corresponds to the uniform distribution; we have $\int k(u)\,du = 1$. It has the disadvantage of giving equal weights to all observations inside the $h_n$-window centered at $x$, regardless of how close they are to the center. Also, if one considers $\hat f(x)$ as a function of $x$, it is rough: it has jumps at the points $X_i \pm h_n$ and a zero derivative everywhere else. Those problems can be resolved if one considers alternative kernel functions, for example, the quadratic kernel:
\[
k(u) = \frac{15}{16} \left( 1 - u^2 \right)^2 \cdot 1\{|u| \le 1\}.
\]
The class of estimators (3) with a kernel satisfying $\int k(u)\,du = 1$ is referred to as the Rosenblatt-Parzen kernel estimator.

2 Small sample properties of the kernel density estimator

We will make the following assumption concerning $k$:

Assumption 2. (a) $\int k(u)\,du = 1$. (b) $k(u) = k(-u)$. (c) $k$ is compactly supported on $[-1, 1]$ and bounded. (d) $\int u^2 k(u)\,du \ne 0$.

The kernel density estimator is biased:

Lemma 1. Under Assumptions 1(a) and 2(a),
\[
E \hat f(x) - f(x) = \int k(u) \left( f(x + u h_n) - f(x) \right) du.
\]

Proof.
\[
E \hat f(x) = \frac{1}{h_n} E\, k\!\left( \frac{X_i - x}{h_n} \right) = \frac{1}{h_n} \int k\!\left( \frac{u - x}{h_n} \right) f(u)\,du.
\]
Next, using the change of variable $y = (u - x)/h_n$, $u = x + y h_n$, and $du = h_n\,dy$, we obtain $E \hat f(x) = \int k(u) f(x + u h_n)\,du$, and the result follows since $\int f(x) k(u)\,du = f(x)$ by Assumption 2(a). $\blacksquare$
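Before turning to the sampling properties, here is a minimal computational sketch (an illustration added to these notes, not part of the original), implementing the kernel density estimator with the uniform and quadratic kernels above on simulated N(0,1) data:

```python
# Illustrative sketch: Rosenblatt-Parzen estimator
#   f_hat(x) = (1/(n h)) * sum_i k((X_i - x) / h)
# with the uniform and quadratic (compactly supported) kernels.
import numpy as np

def k_uniform(u):
    # k(u) = (1/2) 1{|u| <= 1}
    return 0.5 * (np.abs(u) <= 1.0)

def k_quadratic(u):
    # k(u) = (15/16) (1 - u^2)^2 1{|u| <= 1}
    return (15.0 / 16.0) * (1.0 - u**2) ** 2 * (np.abs(u) <= 1.0)

def kde(x, data, h, kernel=k_quadratic):
    """Kernel density estimate at the point(s) x with bandwidth h."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = (data[None, :] - x[:, None]) / h      # shape (len(x), n)
    return kernel(u).mean(axis=1) / h

rng = np.random.default_rng(1)
X = rng.standard_normal(2000)
grid = np.linspace(-3.0, 3.0, 7)
print(kde(grid, X, h=0.4))                    # roughly traces the N(0,1) density
```

With `k_uniform`, `kde` reproduces the windowed-count estimator above; the quadratic kernel downweights observations far from $x$ and yields a smoother estimate.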

Lemma 2. Under Assumptions 1(a) and 2(a), the variance of $\hat f(x)$ is given by
\[
Var(\hat f(x)) = \frac{1}{n h_n} \int k(u)^2 f(x + u h_n)\,du - \frac{1}{n} \left( \int k(u) f(x + u h_n)\,du \right)^2.
\]

Proof. Since the data are iid,
\[
Var(\hat f(x)) = \frac{1}{n h_n^2} Var\, k\!\left( \frac{X_i - x}{h_n} \right)
= \frac{1}{n h_n^2} \left[ E\, k\!\left( \frac{X_i - x}{h_n} \right)^2 - \left( E\, k\!\left( \frac{X_i - x}{h_n} \right) \right)^2 \right].
\]
By the same change of variable argument as in the proof of Lemma 1, we obtain
\[
E\, k\!\left( \frac{X_i - x}{h_n} \right)^2 = \int k\!\left( \frac{u - x}{h_n} \right)^2 f(u)\,du = h_n \int k(u)^2 f(x + u h_n)\,du. \;\blacksquare
\]

From Lemmas 1 and 2, one can expect that the bias increases with $h_n$: a bigger bandwidth implies that more observations away from $x$ receive non-zero weights, which contributes to the bias. On the other hand, the variance decreases with $h_n$, as the estimator averages over more observations. The theorem below establishes more formally the bias-variance trade-off for the kernel estimator. Let $f^{(s)}$ denote the $s$-th order derivative of $f$: $f^{(s)}(x) = d^s f(x)/dx^s$.

Theorem 1. Suppose that $h_n \to 0$ and $n h_n \to \infty$ as $n \to \infty$. Then, under Assumptions 1 and 2,
(a) $E \hat f(x) - f(x) = c_1(x) h_n^2 + o(h_n^2)$, where $c_1(x) = f''(x) \int u^2 k(u)\,du / 2$;
(b) $Var(\hat f(x)) = c_2(x)/(n h_n) + O(1/n)$, where $c_2(x) = f(x) \int k(u)^2\,du$.

Proof. Since the first two derivatives of $f$ exist by Assumption 1(b), consider the following expansion of $f(x + u h_n)$:
\[
f(x + u h_n) = f(x) + f'(x) u h_n + \frac{1}{2} f''(x_u) u^2 h_n^2,
\]
where $x_u$ lies between $x$ and $x + u h_n$. From Lemma 1 we have
\[
E \hat f(x) - f(x) = \int k(u) \left( f'(x) u h_n + \frac{1}{2} f''(x_u) u^2 h_n^2 \right) du
= \frac{h_n^2}{2} \int u^2 k(u) f''(x_u)\,du
= c_1(x) h_n^2 + \frac{h_n^2}{2} \int u^2 k(u) \left( f''(x_u) - f''(x) \right) du. \tag{4}
\]
The second equality follows because, by Assumption 2(b), $\int u\, k(u)\,du = 0$. We will show next that
\[
\int u^2 k(u) \left( f''(x_u) - f''(x) \right) du = o(1), \tag{5}
\]
and therefore the second summand in (4) is $o(h_n^2)$. By Assumption 2(c), we only need to consider $|u| \le 1$; by Assumption 1(b) and since $x_u$ lies between $x$ and $x + u h_n$,
\[
\left| f''(x_u) - f''(x) \right| \le 2 \sup_{z \in N_x} \left| f''(z) \right| < \infty.
\]
Next, since $h_n \to 0$, $\lim_{n \to \infty} x_u = x$. Now, by the dominated convergence theorem,
\[
\lim_{n \to \infty} \int u^2 k(u) \left( f''(x_u) - f''(x) \right) du
= \int u^2 k(u) \lim_{n \to \infty} \left( f''(x_u) - f''(x) \right) du = 0,
\]
which establishes (5) and concludes the proof of part (a) of the theorem.

For part (b),
\[
\int k(u)^2 f(x + u h_n)\,du
= f(x) \int k(u)^2\,du + h_n f'(x) \int u\, k(u)^2\,du + \frac{h_n^2}{2} \int u^2 k(u)^2 f''(x_u)\,du
= c_2(x) + O(h_n^2), \tag{6}
\]
since $\int u\, k(u)^2\,du = 0$ by symmetry (Assumption 2(b)), and $\int u^2 k(u)^2 f''(x_u)\,du = O(1)$ as in the proof of part (a). The result of part (b) follows from (6) and Lemma 2. $\blacksquare$

Again, Theorem 1 shows the bias-variance trade-off. The optimal choice of bandwidth can be found by minimizing some function that combines the bias and variance, for example, the mean squared error (MSE):
\[
MSE(\hat f(x)) = E \left( \hat f(x) - f(x) \right)^2 = Var(\hat f(x)) + \left( E \hat f(x) - f(x) \right)^2
= \frac{c_2(x)}{n h_n} + c_1(x)^2 h_n^4 + O(1/n) + o(h_n^4). \tag{7}
\]
Minimization of the leading term of the MSE gives the first-order condition
\[
4 c_1(x)^2 h_n^3 = \frac{c_2(x)}{n h_n^2},
\]
or
\[
h_n^* = \left( \frac{c_2(x)}{4 c_1(x)^2} \right)^{1/5} n^{-1/5}
= \frac{ \left( f(x) \int k(u)^2\,du \right)^{1/5} }{ \left( f''(x) \int u^2 k(u)\,du \right)^{2/5} } \, n^{-1/5}.
\]
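The $h_n^2$ bias rate and $1/(n h_n)$ variance rate of Theorem 1 can be checked by simulation. The sketch below (an illustration added to these notes) estimates the bias and variance of $\hat f(0)$ for N(0,1) data over Monte Carlo replications, using the quadratic kernel:

```python
# Monte Carlo sketch of the bias-variance trade-off at x = 0 for N(0,1) data:
# shrinking h should shrink the bias (rate h^2) and inflate the variance
# (rate 1/(n h)).  True value: f(0) = 1/sqrt(2*pi) ~ 0.3989.
import numpy as np

def kde_at0(data, h):
    u = data / h
    k = (15.0 / 16.0) * (1.0 - u**2) ** 2 * (np.abs(u) <= 1.0)  # quadratic kernel
    return k.mean() / h

def mc_bias_var(n, h, reps=4000, seed=0):
    rng = np.random.default_rng(seed)
    est = np.array([kde_at0(rng.standard_normal(n), h) for _ in range(reps)])
    return est.mean() - 1.0 / np.sqrt(2.0 * np.pi), est.var()

b_wide, v_wide = mc_bias_var(n=500, h=0.6)
b_narrow, v_narrow = mc_bias_var(n=500, h=0.2)
print(b_wide, b_narrow)   # magnitude of the bias shrinks roughly like h^2
print(v_wide, v_narrow)   # variance grows as h shrinks
```

Since $f''(0) < 0$ for the standard normal density, the estimator at the mode is biased downward, consistent with the sign of $c_1(0)$ in Theorem 1(a).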

When the optimal (in the MSE sense) bandwidth is selected, neither the bias nor the variance component of the MSE dominates the other asymptotically, as $Var = Bias^2 = O(n^{-4/5})$. When the integrated MSE criterion $\int MSE(\hat f(x))\,dx$ is employed, the optimal bandwidth becomes
\[
h_n^* = \frac{ \left( \int k(u)^2\,du \right)^{1/5} }{ \left( \int f''(x)^2\,dx \right)^{1/5} \left( \int u^2 k(u)\,du \right)^{2/5} } \, n^{-1/5}.
\]
Let $\hat\sigma^2$ denote the sample variance of the data. The following rules of thumb are often used in practice:
\[
h_n = 1.364\, \hat\sigma \left( \int k(u)^2\,du \right)^{1/5} \left( \int u^2 k(u)\,du \right)^{-2/5} n^{-1/5},
\]
which is optimal for $f(x) \sim N(\mu, \sigma^2)$, and
\[
h_n = 1.06\, \hat\sigma\, n^{-1/5},
\]
which is optimal for $f(x) \sim N(\mu, \sigma^2)$ and when $k$ is the standard normal density.

3 Consistency of the kernel density estimator

Consistency of $\hat f(x)$ follows immediately from Theorem 1 by Chebyshev's inequality.

Corollary 1. Suppose that $h_n \to 0$ and $n h_n \to \infty$ as $n \to \infty$. Then, under Assumptions 1 and 2, $\hat f(x) \to_p f(x)$.

Proof. By Chebyshev's inequality,
\[
P\left( \left| \hat f(x) - f(x) \right| > \varepsilon \right) \le \frac{ E \left( \hat f(x) - f(x) \right)^2 }{ \varepsilon^2 }
= \frac{1}{\varepsilon^2} \left( \frac{c_2(x)}{n h_n} + c_1(x)^2 h_n^4 + O(1/n) + o(h_n^4) \right) \to 0,
\]
where the second equality is by (7). $\blacksquare$

A stronger result can be given (see Newey, 1994). Suppose that $f$ admits at least $m$ continuous derivatives on some interval $[x_1, x_2]$, and that $k$ has at least $m$ continuous derivatives, is compactly supported, and is of order $m$: $\int u^j k(u)\,du = 0$ for all $j = 1, \dots, m-1$, $\int u^m k(u)\,du \ne 0$, and $\int k(u)\,du = 1$. Then
\[
\sup_{x \in [x_1, x_2]} \left| \hat f(x) - f(x) \right| = O_p\left( \left( \frac{\log n}{n h_n} \right)^{1/2} + h_n^m \right).
\]
The derivatives of $f$ can be estimated by the derivatives of $\hat f$, however with slower rates of convergence. Newey (1994) shows that
\[
\sup_{x \in [x_1, x_2]} \left| \hat f^{(k)}(x) - f^{(k)}(x) \right| = O_p\left( \left( \frac{\log n}{n h_n^{1+2k}} \right)^{1/2} + h_n^m \right).
\]
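The second rule of thumb above is easy to apply in code. A sketch (an illustration added to these notes; the Gaussian kernel does not satisfy the compact-support part of Assumption 2, but is standard in practice):

```python
# Sketch: rule-of-thumb bandwidth h = 1.06 * sigma_hat * n^(-1/5) with a
# Gaussian kernel; the estimate at a point stabilizes as n grows (consistency).
import numpy as np

def rule_of_thumb_h(data):
    return 1.06 * data.std(ddof=1) * len(data) ** (-1.0 / 5.0)

def gaussian_kde(x, data, h):
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = (data[None, :] - x[:, None]) / h
    return (np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)).mean(axis=1) / h

rng = np.random.default_rng(42)
for n in (100, 1000, 10000):
    X = rng.standard_normal(n)
    h = rule_of_thumb_h(X)
    # h shrinks like n^(-1/5); the estimate approaches f(0) = 0.3989...
    print(n, h, gaussian_kde(0.0, X, h)[0])
```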

4 Asymptotic normality of the kernel density estimator

Write
\[
\hat f(x) = \frac{1}{n h_n} \sum_{i=1}^{n} k\!\left( \frac{X_i - x}{h_n} \right) = \frac{1}{n} \sum_{i=1}^{n} v_{n,i},
\quad \text{where} \quad
v_{n,i} = \frac{1}{h_n} k\!\left( \frac{X_i - x}{h_n} \right).
\]
Note that $h_n$, and consequently $v_{n,i}$, depend on $n$. The collection $\{\{v_{n,i} : i = 1, \dots, n\} : n \in \mathbb{N}\}$ is called a triangular array. In our case, under Assumption 1, the $v_{n,i}$'s are iid. The following CLT is available for independent triangular arrays (Lehmann and Romano, 2005).

Lemma 3 (Lyapounov CLT). Suppose that for each $n$, $w_{n,1}, \dots, w_{n,n}$ are independent. Assume that $E w_{n,i} = 0$ and $\sigma_{n,i}^2 = E w_{n,i}^2 < \infty$, and define $s_n^2 = \sum_{i=1}^{n} \sigma_{n,i}^2$. Suppose further that for some $\delta > 0$ the following condition holds:
\[
\lim_{n \to \infty} \frac{1}{s_n^{2+\delta}} \sum_{i=1}^{n} E \left| w_{n,i} \right|^{2+\delta} = 0. \tag{8}
\]
Then $\sum_{i=1}^{n} w_{n,i} / s_n \to_d N(0, 1)$.

The condition (8) is called Lyapounov's condition. When the data are not just independent but iid, Lyapounov's condition can be simplified as follows (Davidson, 1994).

Lemma 4. Lyapounov's condition is satisfied when $w_{n,1}, \dots, w_{n,n}$ are iid, $\sigma_n^2 = E w_{n,i}^2 > 0$ uniformly in $n$, and $\lim_{n \to \infty} E |w_{n,i}|^{2+\delta} / n^{\delta/2} = 0$ for some $\delta > 0$.

Proof. Since the data are iid,
\[
s_n^2 = n \sigma_n^2, \quad \text{and} \quad s_n^{2+\delta} = \left( n^{1/2} \sigma_n \right)^{2+\delta} = n^{1+\delta/2} \sigma_n^{2+\delta}.
\]
We have
\[
\frac{1}{s_n^{2+\delta}} \sum_{i=1}^{n} E |w_{n,i}|^{2+\delta}
= \frac{n}{n^{1+\delta/2} \sigma_n^{2+\delta}} E |w_{n,i}|^{2+\delta}
= \sigma_n^{-(2+\delta)} \frac{ E |w_{n,i}|^{2+\delta} }{ n^{\delta/2} }.
\]
Therefore, Lyapounov's condition is satisfied if $E |w_{n,i}|^{2+\delta} / n^{\delta/2} \to 0$, since $\sigma_n$ is uniformly bounded away from zero by assumption. $\blacksquare$

Assuming that $\lim_{n \to \infty} \sigma_n^2$ exists, in the iid case the result of the Lyapounov CLT can be stated as follows.

Corollary 2. Suppose that for each $n$, $w_{n,1}, \dots, w_{n,n}$ are iid, $E w_{n,i} = 0$, $\lim_{n \to \infty} E w_{n,i}^2$ is positive and finite, and $\lim_{n \to \infty} E |w_{n,i}|^{2+\delta} / n^{\delta/2} = 0$ for some $\delta > 0$. Then
\[
n^{-1/2} \sum_{i=1}^{n} w_{n,i} \to_d N\left( 0, \lim_{n \to \infty} E w_{n,i}^2 \right).
\]

Next, we prove asymptotic normality of the kernel density estimator.

Theorem 2. Suppose that $n h_n \to \infty$ and $(n h_n)^{1/2} h_n^2 \to 0$. Assume further that $f(x) > 0$. Then, under Assumptions 1 and 2,
\[
(n h_n)^{1/2} \left( \hat f(x) - f(x) \right) \to_d N\left( 0, f(x) \int k(u)^2\,du \right). \tag{9}
\]
Furthermore, for $x_1 \ne x_2$, $(n h_n)^{1/2}(\hat f(x_1) - f(x_1))$ and $(n h_n)^{1/2}(\hat f(x_2) - f(x_2))$ are asymptotically independent.

Proof. By Theorem 1(a),
\[
(n h_n)^{1/2} \left( \hat f(x) - f(x) \right)
= (n h_n)^{1/2} \left( \hat f(x) - E \hat f(x) \right) + (n h_n)^{1/2} \left( E \hat f(x) - f(x) \right)
= (n h_n)^{1/2} \left( \hat f(x) - E \hat f(x) \right) + O\left( (n h_n)^{1/2} h_n^2 \right). \tag{10}
\]
Define
\[
w_{n,i} = h_n^{-1/2} \left( k\!\left( \frac{X_i - x}{h_n} \right) - E\, k\!\left( \frac{X_i - x}{h_n} \right) \right).
\]
Then
\[
(n h_n)^{1/2} \left( \hat f(x) - f(x) \right) = n^{-1/2} \sum_{i=1}^{n} w_{n,i} + o(1),
\]
where the equality is by the assumption that $(n h_n)^{1/2} h_n^2 \to 0$. It is now left to verify the conditions of Corollary 2. By the definition of $w_{n,i}$, $E w_{n,i} = 0$. Next,
\[
E w_{n,i}^2 = \frac{1}{h_n} E\, k\!\left( \frac{X_i - x}{h_n} \right)^2 - \frac{1}{h_n} \left( E\, k\!\left( \frac{X_i - x}{h_n} \right) \right)^2. \tag{11}
\]
As in the proof of Lemma 1 and by the dominated convergence theorem,
\[
E\, k\!\left( \frac{X_i - x}{h_n} \right) = h_n \int k(u) f(x + u h_n)\,du = O(h_n),
\]
so that the second summand in (11) is $O(h_n)$ and asymptotically negligible. For the first term in (11), we can use the change of variable argument again:
\[
\frac{1}{h_n} E\, k\!\left( \frac{X_i - x}{h_n} \right)^2
= \frac{1}{h_n} \int k\!\left( \frac{u - x}{h_n} \right)^2 f(u)\,du
= \int k(u)^2 f(x + u h_n)\,du \to f(x) \int k(u)^2\,du, \tag{12}
\]
where the last result is by the dominated convergence theorem. The results in (11)-(12) together imply that
\[
\lim_{n \to \infty} E w_{n,i}^2 = f(x) \int k(u)^2\,du.
\]
Lastly, we show that $E |w_{n,i}|^{2+\delta} / n^{\delta/2} \to 0$. We will use the $c_r$ inequality (Davidson, 1994) in order to deal with $E |w_{n,i}|^{2+\delta}$: for $r > 0$,
\[
E \left| \sum_{i=1}^{m} X_i \right|^r \le c_r \sum_{i=1}^{m} E |X_i|^r,
\]
where $c_r = 1$ when $r \le 1$, and $c_r = m^{r-1}$ when $r > 1$. Now, by the $c_r$ inequality,
\[
E |w_{n,i}|^{2+\delta} \le \frac{2^{1+\delta}}{h_n^{1+\delta/2}} \left( E \left| k\!\left( \frac{X_i - x}{h_n} \right) \right|^{2+\delta} + \left| E\, k\!\left( \frac{X_i - x}{h_n} \right) \right|^{2+\delta} \right).
\]
By the bound $E\, k((X_i - x)/h_n) = O(h_n)$ above, the second term in the parentheses contributes only $O(h_n^{1+\delta/2}) = o(1)$. Further,
\[
\frac{1}{h_n^{1+\delta/2}} E \left| k\!\left( \frac{X_i - x}{h_n} \right) \right|^{2+\delta}
= \frac{1}{h_n^{1+\delta/2}} \int \left| k\!\left( \frac{u - x}{h_n} \right) \right|^{2+\delta} f(u)\,du
= h_n^{-\delta/2} \int |k(u)|^{2+\delta} f(x + u h_n)\,du
= O\left( h_n^{-\delta/2} \right), \tag{13}
\]
where the equality in the last line is again by the dominated convergence theorem. Hence,
\[
\frac{ E |w_{n,i}|^{2+\delta} }{ n^{\delta/2} } = O\left( \frac{1}{(n h_n)^{\delta/2}} \right) \to 0.
\]
This completes the proof of (9).

In order to show asymptotic independence of $\hat f(x_1)$ and $\hat f(x_2)$, consider their asymptotic covariance:
\[
\frac{1}{h_n} E\, k\!\left( \frac{X_i - x_1}{h_n} \right) k\!\left( \frac{X_i - x_2}{h_n} \right)
= \frac{1}{h_n} \int k\!\left( \frac{u - x_1}{h_n} \right) k\!\left( \frac{u - x_2}{h_n} \right) f(u)\,du
= \int k(u)\, k\!\left( u + \frac{x_1 - x_2}{h_n} \right) f(x_1 + u h_n)\,du.
\]
Since the kernel function is compactly supported and $\lim_{n \to \infty} |x_1 - x_2| / h_n = \infty$,
\[
\lim_{n \to \infty} k\!\left( u + \frac{x_1 - x_2}{h_n} \right) = 0,
\]
and by the dominated convergence theorem,
\[
\lim_{n \to \infty} \int k(u)\, k\!\left( u + \frac{x_1 - x_2}{h_n} \right) f(x_1 + u h_n)\,du = 0.
\]
Asymptotic independence then follows by the Cramér-Wold device. $\blacksquare$
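The theorem suggests the pointwise asymptotic confidence interval $\hat f(x) \pm z_{1-\alpha/2} \bigl( \hat f(x) \int k(u)^2\,du / (n h_n) \bigr)^{1/2}$ under undersmoothing. A Monte Carlo sketch of its coverage (an illustration added to these notes, using the Gaussian kernel, which is not compactly supported but satisfies $\int k(u)^2\,du = 1/(2\sqrt{\pi})$):

```python
# Monte Carlo sketch: coverage of the pointwise 95% CI for f(0) implied by
# (n h)^(1/2) (f_hat(x) - f(x)) -> N(0, f(x) * int k^2).  Undersmoothed
# bandwidth h = n^(-1/3) (alpha = 1/3 > 1/5), N(0,1) data, Gaussian kernel.
import numpy as np

def gauss_kde_at0(data, h):
    u = data / h
    return (np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)).mean() / h

def ci_coverage(n, reps=2000, seed=0):
    rng = np.random.default_rng(seed)
    rk = 1.0 / (2.0 * np.sqrt(np.pi))    # int k(u)^2 du for the Gaussian kernel
    f0 = 1.0 / np.sqrt(2.0 * np.pi)      # true N(0,1) density at x = 0
    h = n ** (-1.0 / 3.0)                # undersmoothing: bias is negligible
    hits = 0
    for _ in range(reps):
        fhat = gauss_kde_at0(rng.standard_normal(n), h)
        se = np.sqrt(fhat * rk / (n * h))
        hits += abs(fhat - f0) <= 1.96 * se
    return hits / reps

print(ci_coverage(2000))   # empirical coverage should be close to 0.95
```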

From (10), one can see that the assumption $(n h_n)^{1/2} h_n^2 \to 0$ is used to make the bias asymptotically negligible. Consequently, there is under-smoothing relative to the MSE-optimal bandwidth, and the bias goes to zero at a faster rate than the variance. Suppose that the bandwidth is chosen according to $h_n = c\, n^{-\alpha}$. Then,
\[
(n h_n)^{1/2} h_n^2 \propto n^{1/2} n^{-\alpha/2} n^{-2\alpha} = n^{(1 - 5\alpha)/2},
\]
and for $(n h_n)^{1/2} h_n^2 \to 0$ to hold, we need $1 - 5\alpha < 0$, or $\alpha > 1/5$. Thus, for asymptotic normality, the bandwidth must be $o(n^{-1/5})$, while the MSE-optimal bandwidth is $h_n^* = c\, n^{-1/5}$.

A more general statement of the asymptotic normality result that also includes the bias (i.e. without imposing under-smoothing) is
\[
(n h_n)^{1/2} \left( \hat f(x) - f(x) - 0.5\, h_n^2 f''(x) \int u^2 k(u)\,du \right) \to_d N\left( 0, f(x) \int k(u)^2\,du \right). \tag{14}
\]
The result in (14) holds provided that $n h_n \to \infty$ and does not require that $(n h_n)^{1/2} h_n^2 \to 0$. In particular, if one chooses $h_n = a\, n^{-1/5}$, then
\[
(n h_n)^{1/2} \left( \hat f(x) - f(x) \right) \to_d N\left( 0.5\, a^{5/2} f''(x) \int u^2 k(u)\,du,\; f(x) \int k(u)^2\,du \right),
\]
and the kernel density estimator is asymptotically biased.

5 Multivariate kernel density estimation and the curse of dimensionality

Suppose now that $\{X_i : i = 1, \dots, n\}$ is a collection of iid random $d$-vectors drawn from a distribution with a joint PDF $f(x_1, \dots, x_d)$. The univariate kernel density estimator can be extended to the multivariate case as follows:
\[
\hat f(x_1, \dots, x_d) = \frac{1}{n h_n^d} \sum_{i=1}^{n} \prod_{j=1}^{d} k\!\left( \frac{X_{ij} - x_j}{h_n} \right)
\]
(note $h_n^d$ in the denominator instead of $h_n$). One can see that the multivariate kernel density estimator is an extension of univariate kernel smoothing to $d$ dimensions or $d$ variables. In the multivariate case, one can establish results similar to those of the univariate case. To simplify the notation, let
\[
x = (x_1, \dots, x_d)' \in \mathbb{R}^d,
\]
and write $f(x_1, \dots, x_d) = f(x)$. Also, for $u = (u_1, \dots, u_d)' \in \mathbb{R}^d$, let
\[
k_d(u) = \prod_{j=1}^{d} k(u_j),
\]

so that
\[
\hat f(x_1, \dots, x_d) = \hat f(x) = \frac{1}{n h_n^d} \sum_{i=1}^{n} k_d\!\left( \frac{X_i - x}{h_n} \right).
\]
Note that
\[
\int k_d(u)\,du = \int k(u_1)\,du_1 \cdots \int k(u_d)\,du_d = \left( \int k(u)\,du \right)^d = 1,
\]
where the second equality follows by Assumption 2(a). Similar results to those shown for the univariate estimator can be established in the multivariate case.

Assumption 3. (a) $\{X_i : i = 1, \dots, n\}$ is a collection of iid random vectors drawn from a distribution with a joint PDF $f$. (b) In the neighborhood $N_x$ of $x$, $f$ is bounded and twice continuously differentiable with bounded partial derivatives.

Theorem 3. Suppose that $h_n \to 0$ and $n h_n^d \to \infty$ as $n \to \infty$. Then, under Assumptions 2 and 3,
(a) $E \hat f(x) - f(x) = \dfrac{h_n^2}{2} \displaystyle\sum_{j=1}^{d} \dfrac{\partial^2 f(x)}{\partial x_j^2} \int u^2 k(u)\,du + o(h_n^2)$;
(b) $Var(\hat f(x)) = \dfrac{f(x) \int k_d(u)^2\,du}{n h_n^d} + O\left( \dfrac{1}{n h_n^{d-1}} + \dfrac{1}{n} \right)$.

Proof. For part (a),
\[
E \hat f(x) = \frac{1}{h_n^d} \int k_d\!\left( \frac{u - x}{h_n} \right) f(u)\,du = \int k_d(v) f(x + h_n v)\,dv,
\]
where we used the change of variables $v = (u - x)/h_n$, $u = x + h_n v$, $du_j = h_n\,dv_j$ for $j = 1, \dots, d$, $du = du_1 \cdots du_d$, and $dv = dv_1 \cdots dv_d$. Next,
\[
f(x + h_n v) = f(x) + h_n v' \frac{\partial f(x)}{\partial x} + \frac{h_n^2}{2} v' \frac{\partial^2 f(x_v)}{\partial x \partial x'} v,
\]
where $x_v$ denotes the mean value satisfying $\|x_v - x\| \le h_n \|v\|$, i.e. it lies between $x$ and $x + h_n v$. Since the kernel function is symmetric around zero, $\int v\, k_d(v)\,dv = 0$. By Assumption 3(b) and the same arguments as in the proof of Theorem 1(a),
\[
\frac{\partial^2 f(x_v)}{\partial x \partial x'} \to \frac{\partial^2 f(x)}{\partial x \partial x'}.
\]
Hence,
\[
E \hat f(x) = f(x) + \frac{h_n^2}{2} \int v' \frac{\partial^2 f(x_v)}{\partial x \partial x'} v\, k_d(v)\,dv + o(h_n^2)
\]
\[
= f(x) + \frac{h_n^2}{2} \sum_{i=1}^{d} \sum_{j=1}^{d} \frac{\partial^2 f(x)}{\partial x_i \partial x_j} \int v_i v_j k_d(v)\,dv + o(h_n^2)
\]
\[
= f(x) + \frac{h_n^2}{2} \sum_{j=1}^{d} \frac{\partial^2 f(x)}{\partial x_j^2} \int v_j^2\, k(v_j)\,dv_j
+ \frac{h_n^2}{2} \sum_{i=1}^{d} \sum_{j \ne i} \frac{\partial^2 f(x)}{\partial x_i \partial x_j} \int v_i k(v_i)\,dv_i \int v_j k(v_j)\,dv_j + o(h_n^2)
\]
\[
= f(x) + \frac{h_n^2}{2} \sum_{j=1}^{d} \frac{\partial^2 f(x)}{\partial x_j^2} \int u^2 k(u)\,du + o(h_n^2),
\]

where the equality in the last line holds due to the symmetry of the kernel function $k(u)$ around zero, which makes the cross terms vanish.

For part (b),
\[
Var(\hat f(x)) = \frac{1}{n h_n^{2d}} Var\, k_d\!\left( \frac{X_i - x}{h_n} \right)
= \frac{1}{n h_n^{2d}} \left[ E\, k_d\!\left( \frac{X_i - x}{h_n} \right)^2 - \left( E\, k_d\!\left( \frac{X_i - x}{h_n} \right) \right)^2 \right]
\]
\[
= \frac{1}{n h_n^{2d}} \left[ E\, k_d\!\left( \frac{X_i - x}{h_n} \right)^2 - \left( h_n^d \left( f(x) + O(h_n^2) \right) \right)^2 \right]
\qquad \text{(by the result in part (a))}
\]
\[
= \frac{1}{n h_n^{2d}}\, E\, k_d\!\left( \frac{X_i - x}{h_n} \right)^2 + O\left( \frac{1}{n} \right)
= \frac{1}{n h_n^d} \int k_d(u)^2 f(x + h_n u)\,du + O\left( \frac{1}{n} \right)
\]
\[
= \frac{f(x) \int k_d(u)^2\,du}{n h_n^d} + O\left( \frac{1}{n h_n^{d-1}} + \frac{1}{n} \right),
\]
where the last line uses the expansion $f(x + h_n u) = f(x) + h_n u' \partial f(x_u)/\partial x$ with bounded gradient, and $\int k_d(u)^2\,du = \left( \int k(u)^2\,du \right)^d$. $\blacksquare$

The bias and variance calculations imply that
\[
\hat f(x) = f(x) + O_p\left( h_n^2 + \frac{1}{(n h_n^d)^{1/2}} \right),
\]
and therefore the rate of convergence slows down with the number of variables $d$. In the nonparametric literature, this is referred to as the curse of dimensionality. One can derive the MSE-point-optimal bandwidth considering the leading terms in the bias and variance expressions implied by Theorem 3:
\[
MSE(x) = \frac{h_n^4}{4} \left( \int u^2 k(u)\,du \right)^2 \left( \sum_{j=1}^{d} \frac{\partial^2 f(x)}{\partial x_j^2} \right)^2
+ \frac{f(x) \left( \int k(u)^2\,du \right)^d}{n h_n^d}.
\]
Minimizing the MSE with respect to $h_n$, we obtain the first-order condition
\[
h_n^3 \left( \int u^2 k(u)\,du \right)^2 \left( \sum_{j=1}^{d} \frac{\partial^2 f(x)}{\partial x_j^2} \right)^2
= \frac{d\, f(x) \left( \int k(u)^2\,du \right)^d}{n h_n^{d+1}}.
\]
Therefore, the MSE-optimal bandwidth is given by
\[
h_n^* = \left( \frac{d\, f(x) \left( \int k(u)^2\,du \right)^d}
{ \left( \int u^2 k(u)\,du \right)^2 \left( \sum_{j=1}^{d} \frac{\partial^2 f(x)}{\partial x_j^2} \right)^2 } \right)^{1/(d+4)} n^{-1/(d+4)}.
\]

One can see that the rate of the optimal bandwidth, $n^{-1/(d+4)}$, increases with the number of variables $d$; i.e. one should use larger values of the bandwidth when there are more variables. One can extend the univariate CLT to the multivariate case as follows:
\[
(n h_n^d)^{1/2} \left( \hat f(x) - f(x) - \frac{h_n^2}{2} \sum_{j=1}^{d} \frac{\partial^2 f(x)}{\partial x_j^2} \int u^2 k(u)\,du \right)
\to_d N\left( 0,\; f(x) \left( \int k(u)^2\,du \right)^d \right).
\]
To eliminate the asymptotic bias, one has to choose an under-smoothing bandwidth so that $(n h_n^d)^{1/2} h_n^2 \to 0$. One can also incorporate different bandwidth values for different variables:
\[
\hat f(x_1, \dots, x_d) = \frac{1}{n h_{1,n} \cdots h_{d,n}} \sum_{i=1}^{n} \prod_{j=1}^{d} k\!\left( \frac{X_{ij} - x_j}{h_{j,n}} \right).
\]
The bias and variance results in this case take the following form:
\[
E \hat f(x) = f(x) + \frac{\int u^2 k(u)\,du}{2} \sum_{j=1}^{d} \frac{\partial^2 f(x)}{\partial x_j^2}\, h_{j,n}^2 + o\left( h_{1,n}^2 + \cdots + h_{d,n}^2 \right),
\]
\[
Var(\hat f(x)) = \frac{f(x) \left( \int k(u)^2\,du \right)^d}{n h_{1,n} \cdots h_{d,n}}
+ O\left( \frac{h_{1,n} + \cdots + h_{d,n}}{n h_{1,n} \cdots h_{d,n}} + \frac{1}{n} \right).
\]
Analogously, the CLT statement can be modified as
\[
(n h_{1,n} \cdots h_{d,n})^{1/2} \left( \hat f(x) - f(x) - \frac{\int u^2 k(u)\,du}{2} \sum_{j=1}^{d} \frac{\partial^2 f(x)}{\partial x_j^2}\, h_{j,n}^2 \right)
\to_d N\left( 0,\; f(x) \left( \int k(u)^2\,du \right)^d \right).
\]
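A computational sketch of the product-kernel estimator (an illustration added to these notes), using a Gaussian kernel in each coordinate; note the $h_n^d$ normalization and the $n^{-1/(d+4)}$ bandwidth rate:

```python
# Sketch: multivariate product-kernel density estimator
#   f_hat(x) = (1/(n h^d)) * sum_i prod_j k((X_ij - x_j) / h)
# with Gaussian k, evaluated at the origin for N(0, I_d) data.
import numpy as np

def mv_kde_at(x, data, h):
    """Density estimate at a point x in R^d; data has shape (n, d)."""
    u = (data - x[None, :]) / h                       # shape (n, d)
    kvals = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    d = data.shape[1]
    return np.prod(kvals, axis=1).mean() / h**d

rng = np.random.default_rng(0)
n, d = 4000, 2
X = rng.standard_normal((n, d))
h = n ** (-1.0 / (d + 4))          # MSE-rate bandwidth n^(-1/(d+4))
# true N(0, I_2) density at the origin: 1/(2*pi) ~ 0.1592
print(mv_kde_at(np.zeros(d), X, h))
```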

Bibliography

Aït-Sahalia, Y. (1996): "Testing Continuous-Time Models of the Spot Interest Rate," The Review of Financial Studies, 9, 385-426.

Davidson, J. (1994): Stochastic Limit Theory, New York: Oxford University Press.

Guerre, E., I. Perrigne, and Q. Vuong (2000): "Optimal Nonparametric Estimation of First-Price Auctions," Econometrica, 68, 525-574.

Härdle, W. and O. Linton (1994): "Applied Nonparametric Methods," in Handbook of Econometrics, ed. by R. F. Engle and D. L. McFadden, Amsterdam: Elsevier, vol. 4, chap. 38, 2295-2339.

Lehmann, E. L. and J. P. Romano (2005): Testing Statistical Hypotheses, New York: Springer, third ed.

Li, Q. and J. S. Racine (2007): Nonparametric Econometrics: Theory and Practice, Princeton, New Jersey: Princeton University Press.

Newey, W. (1994): "Kernel Estimation of Partial Means and a General Variance Estimator," Econometric Theory, 10, 233-253.

Pagan, A. and A. Ullah (1999): Nonparametric Econometrics, New York: Cambridge University Press.

van der Vaart, A. W. (1998): Asymptotic Statistics, Cambridge: Cambridge University Press.