Contents. 2 Distribution function estimation Multivariate density estimation Multivariate Kernel Density Estimation Example...


Gerard Hensley
5 years ago

Contents
1 Univariate density estimation
1.1 The Rosenblatt density estimator and basic asymptotic results
1.2 Univariate Kernel Density Estimation Example
1.3 Bias approximation
2 Distribution function estimation
3 Multivariate density estimation
3.1 Multivariate Kernel Density Estimation Example
4 Bandwidth Choice
4.1 Minimizing Integrated Squared Error
4.2 Minimizing Mean Integrated Squared Error

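1 Univariate density estimation

1.1 The Rosenblatt density estimator and basic asymptotic results

Let $X_i : \Omega \to \mathbb{R}$ for $i = 1, 2, \dots$ be a sequence of independent and identically distributed (iid) random variables defined on the probability space $(\Omega, \mathcal{F}, P)$. We denote the distribution function of $X_i$ by $F(x) = P(X_i \le x)$ and assume that $F(x) = \int_{-\infty}^{x} f(u)\,du$, where $f$ is the density function of $X_i$. Depending on the context, $\{X_i\}_{i=1}^{n}$ may also represent a sequence of values (observations) for the random variables $X_i$.

Definition 1. Let $h_n$ be a nonstochastic positive scalar depending on $n = 1, 2, \dots$, let $x \in \mathbb{R}$, and let $K(x) : \mathbb{R} \to \mathbb{R}$ be a measurable function. Then, given a set of observations $\{X_i\}_{i=1,2,\dots,n}$, we define
$$\hat f(x) = \frac{1}{n h_n} \sum_{i=1}^{n} K\left(\frac{X_i - x}{h_n}\right).$$

Remarks.

1. $\hat f(x)$ is called the Rosenblatt density estimator or the kernel density estimator.

2. The estimator can be motivated using the empirical distribution function and Calculus. Given a set of observations on a sequence of iid random variables $\{X_i\}_{i=1,2,\dots,n}$, the empirical distribution function is given by $F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I\{\omega \in \Omega : X_i \le x\}$, where $I(A)$ is the indicator function for the set $A$. Since $E(I\{\omega \in \Omega : X_i \le x\}) = F(x)$, by the measurability of $I$ and Khinchin's Law of Large Numbers we have $F_n(x) \to_p F(x)$ for all $x \in \mathbb{R}$. From Calculus we have that
$$\lim_{h \to 0} \frac{F(x+h) - F(x-h)}{2h} = f(x)$$
for every point $x$ at which $f$ is continuous. Hence, for $h_n$ close to zero we have $f(x) \approx \frac{F(x+h_n) - F(x-h_n)}{2h_n}$. Replacing $F$ with $F_n$ we have
$$f_s(x) = \frac{F_n(x+h_n) - F_n(x-h_n)}{2h_n} = \frac{1}{n h_n}\sum_{i=1}^{n} \frac{1}{2} I\{\omega \in \Omega : x - h_n < X_i \le x + h_n\} = \frac{1}{n h_n}\sum_{i=1}^{n} K\left(\frac{X_i - x}{h_n}\right),$$
where $K(u) = \frac{1}{2} I(-1 < u \le 1)$. The estimator in Definition 1 allows $K$ to be a generic measurable function. Restrictions on $K$ that go beyond measurability are imposed to obtain properties of $\hat f$.

3. Commonly used kernels are:
   1. $K(u) = \frac{1}{2} I(|u| \le 1)$ (rectangular)
   2. $K(u) = (1 - |u|)\, I(|u| \le 1)$ (triangular)
   3. $K(u) = \frac{3}{4}(1 - u^2)\, I(|u| \le 1)$ (Epanechnikov/parabolic)
   4. $K(u) = \frac{15}{16}(1 - u^2)^2\, I(|u| \le 1)$ (biweight)
   5. $K(u) = \frac{1}{\sqrt{2\pi}} \exp(-u^2/2)$ (Gaussian)
   6. $K(u) = \frac{1}{2} \exp(-|u|/\sqrt{2}) \sin\left(|u|/\sqrt{2} + \pi/4\right)$ (Silverman's)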

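4. Shapes of commonly used kernels.

[Figure: six panels plotting $K(u)$ for the rectangular, triangular, Epanechnikov, biweight, Gaussian and Silverman's kernels. The horizontal axis runs from $x - h_n$ to $x + h_n$ for the first four kernels and from $x - 2h_n$ to $x + 2h_n$ for the Gaussian and Silverman's kernels.]

Theorem 1. Assume that: (a) $\{X_i\}_{i=1}^{n}$ is an iid sequence of random variables; (b.1) $K : \mathbb{R} \to \mathbb{R}$ is measurable; (b.2) $E\left|K\left(\frac{X_i - x}{h_n}\right)\right| < \infty$; (b.3) $E\left(K^2\left(\frac{X_i - x}{h_n}\right)\right) < \infty$. Then,
$$E(\hat f(x)) = \int K(\gamma) f(x + h_n\gamma)\,d\gamma$$
and
$$V(\hat f(x)) = \frac{1}{n h_n}\int K^2(\gamma) f(x + h_n\gamma)\,d\gamma - \frac{1}{n}\left(\int K(\gamma) f(x + h_n\gamma)\,d\gamma\right)^2.$$

Proof. For a fixed $n$, since $K$ is a measurable function and $\{X_i\}_{i=1}^{n}$ is an iid sequence, $\left\{K\left(\frac{X_i - x}{h_n}\right)\right\}_{i=1}^{n}$ forms an iid sequence of random variables. Thus, given (b.2),
$$E(\hat f(x)) = \frac{1}{n h_n}\sum_{i=1}^{n} E\left(K\left(\frac{X_i - x}{h_n}\right)\right) = \frac{1}{h_n} E\left(K\left(\frac{X_1 - x}{h_n}\right)\right) = \frac{1}{h_n}\int K\left(\frac{\alpha - x}{h_n}\right) f(\alpha)\,d\alpha.$$
Letting $\gamma = \frac{\alpha - x}{h_n}$, we have
$$E(\hat f(x)) = \frac{1}{h_n}\int K(\gamma) f(x + h_n\gamma)\, h_n\,d\gamma = \int K(\gamma) f(x + h_n\gamma)\,d\gamma.$$
Given (b.3),
$$V(\hat f(x)) = \frac{1}{n^2 h_n^2}\sum_{i=1}^{n} V\left(K\left(\frac{X_i - x}{h_n}\right)\right) = \frac{1}{n h_n^2} V\left(K\left(\frac{X_1 - x}{h_n}\right)\right).$$
Therefore,
$$V(\hat f(x)) = \frac{1}{n h_n^2}\left(\int K^2\left(\frac{\gamma - x}{h_n}\right) f(\gamma)\,d\gamma - \left(\int K\left(\frac{\gamma - x}{h_n}\right) f(\gamma)\,d\gamma\right)^2\right) = \frac{1}{n h_n}\int K^2(\gamma) f(x + h_n\gamma)\,d\gamma - \frac{1}{n}\left(\int K(\gamma) f(x + h_n\gamma)\,d\gamma\right)^2.$$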
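The estimator of Definition 1 and five of the kernels listed in the remarks can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`kde`, `integrate`) and the sample values are assumptions made for this example and do not come from the notes.

```python
import math


# Kernels from the list of commonly used kernels.
def rectangular(u):
    return 0.5 if abs(u) <= 1 else 0.0


def triangular(u):
    return 1.0 - abs(u) if abs(u) <= 1 else 0.0


def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1 else 0.0


def biweight(u):
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1 else 0.0


def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, sample, h, kernel=gaussian):
    """Rosenblatt estimator: (1 / (n h)) * sum_i K((X_i - x) / h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(sample)
    return sum(kernel((xi - x) / h) for xi in sample) / (n * h)


def integrate(func, lo, hi, steps=20000):
    """Midpoint rule, used here only to check that a kernel integrates to one."""
    width = (hi - lo) / steps
    return width * sum(func(lo + (k + 0.5) * width) for k in range(steps))


if __name__ == "__main__":
    data = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]  # hypothetical observations
    for name, k in [("rectangular", rectangular),
                    ("Epanechnikov", epanechnikov),
                    ("Gaussian", gaussian)]:
        print(name, kde(0.0, data, h=0.5, kernel=k))
```

With the rectangular kernel the estimate at $x$ is simply the share of observations within $h_n$ of $x$, divided by $2h_n$, which is the motivation given in remark 2.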

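Note that in general $E(\hat f(x)) \ne f(x)$, indicating that $\hat f(x)$ is a biased estimator for $f(x)$. The behavior of the bias and the variance as $n \to \infty$ can be investigated by using the following theorem.

Theorem 2. Assume that: (a) $|K(\gamma)| \le M$ for all $\gamma \in \mathbb{R}$; (b) $\int |K(\gamma)|\,d\gamma < \infty$; (c) $|\gamma||K(\gamma)| \to 0$ as $|\gamma| \to \infty$; (d) $h_n > 0$ for all $n$ and $h_n \to 0$ as $n \to \infty$. Let $f(x) : \mathbb{R} \to \mathbb{R}$ be such that (e) $\int |f(\gamma)|\,d\gamma < \infty$. Then, for every point of continuity $x$ of $f$,
$$\frac{1}{h_n} E\left(K\left(\frac{X_i - x}{h_n}\right)\right) \to f(x)\int K(\gamma)\,d\gamma \quad \text{as } n \to \infty.$$

Proof. We need to show that for all $\epsilon > 0$ there exists $N_\epsilon$ such that whenever $n > N_\epsilon$ we have $\left|\int (f(x + h_n\gamma) - f(x)) K(\gamma)\,d\gamma\right| < \epsilon$. By the triangle inequality
$$\left|\int (f(x + h_n\gamma) - f(x)) K(\gamma)\,d\gamma\right| \le \int |f(x + h_n\gamma) - f(x)||K(\gamma)|\,d\gamma,$$
and since $x$ is a point of continuity of $f$, for all $\epsilon > 0$ there exists some $\delta_{\epsilon,x} > 0$ such that $|f(x + h_n\gamma) - f(x)| < \epsilon$ provided that $|h_n\gamma| < \delta_{\epsilon,x}$, or $|\gamma| < \delta_{\epsilon,x}/h_n$. Hence, by partitioning the range of integration and using the triangle inequality we have
$$\int |f(x + h_n\gamma) - f(x)||K(\gamma)|\,d\gamma \le \epsilon \int_{|\gamma| < \delta_{\epsilon,x}/h_n} |K(\gamma)|\,d\gamma + \int_{|\gamma| \ge \delta_{\epsilon,x}/h_n} |f(x + h_n\gamma)||K(\gamma)|\,d\gamma + |f(x)| \int_{|\gamma| \ge \delta_{\epsilon,x}/h_n} |K(\gamma)|\,d\gamma.$$
By (b), $\int_{|\gamma| \ge \delta_{\epsilon,x}/h_n} |K(\gamma)|\,d\gamma \to 0$ as $n \to \infty$ since $h_n \to 0$. Furthermore, since $x$ is a point of continuity of $f$, $|f(x)| < C$ for some constant $C < \infty$. Also, by (b), $\int_{|\gamma| < \delta_{\epsilon,x}/h_n} |K(\gamma)|\,d\gamma < C$. Thus, the first and the third terms on the right-hand side of the inequality can be made arbitrarily small. Now, by (a) and the change of variables $h_n\gamma = \phi$,
$$\int_{|\gamma| \ge \delta_{\epsilon,x}/h_n} |f(x + h_n\gamma)||K(\gamma)|\,d\gamma = \int_{|\phi| \ge \delta_{\epsilon,x}} |f(x + \phi)| \frac{|\phi|}{|\phi|} \frac{1}{h_n}\left|K\left(\frac{\phi}{h_n}\right)\right| d\phi \le \frac{1}{\delta_{\epsilon,x}} \int_{|\phi| \ge \delta_{\epsilon,x}} |f(x + \phi)| \frac{|\phi|}{h_n}\left|K\left(\frac{\phi}{h_n}\right)\right| d\phi \le \frac{1}{\delta_{\epsilon,x}} \sup_{|\phi| \ge \delta_{\epsilon,x}} \frac{|\phi|}{h_n}\left|K\left(\frac{\phi}{h_n}\right)\right| \int_{|\phi| \ge \delta_{\epsilon,x}} |f(x + \phi)|\,d\phi.$$
By (e), $\int_{|\phi| \ge \delta_{\epsilon,x}} |f(x + \phi)|\,d\phi < C$, hence we need $\sup_{|\phi| \ge \delta_{\epsilon,x}} \frac{|\phi|}{h_n}\left|K\left(\frac{\phi}{h_n}\right)\right| \to 0$ as $n \to \infty$. Since $h_n \to 0$ as $n \to \infty$, it suffices to have $|x||K(x)| \to 0$ as $|x| \to \infty$, which is assumed by (c).

Remark.

1. Note that for $r > 1$, $K(\phi)^r$ satisfies assumptions (a)-(c) in Theorem 2. Thus,
$$\frac{1}{h_n} E\left(K^r\left(\frac{X_i - x}{h_n}\right)\right) = \int K^r(\phi) f(x + h_n\phi)\,d\phi \to f(x)\int K^r(\phi)\,d\phi \quad \text{as } n \to \infty.$$

2. Under the assumptions of Theorems 1 and 2 we have that $V(\hat f(x)) = \frac{1}{n h_n} f(x)\int K^2(\gamma)\,d\gamma + o\left(\frac{1}{n h_n}\right)$.

The next theorem is a direct consequence of Theorem 2.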
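The convergence in Theorem 2 can be checked numerically in a case where the limit is known. The sketch below assumes, purely for illustration, that both $f$ and $K$ are standard normal densities; then $\int K(\gamma) f(x + h\gamma)\,d\gamma$ is the $N(0, 1 + h^2)$ density at $x$, which approaches $f(x)$ as $h \to 0$. The function names and the evaluation point are hypothetical.

```python
import math


def normal_pdf(z, var=1.0):
    """Density of N(0, var) evaluated at z."""
    return math.exp(-0.5 * z * z / var) / math.sqrt(2.0 * math.pi * var)


def expected_kde(x, h, steps=40000, cutoff=10.0):
    """Midpoint approximation of the integral of K(g) f(x + h g) dg,
    with K and f both standard normal (an assumption of this sketch)."""
    width = 2.0 * cutoff / steps
    total = 0.0
    for k in range(steps):
        g = -cutoff + (k + 0.5) * width
        total += normal_pdf(g) * normal_pdf(x + h * g)
    return total * width


if __name__ == "__main__":
    x = 0.5
    for h in (1.0, 0.5, 0.1, 0.01):
        mean = expected_kde(x, h)
        print(f"h={h:<5} E[f_hat(x)]={mean:.6f} bias={mean - normal_pdf(x):+.6f}")
```

The printed bias shrinks toward zero as the bandwidth decreases, in line with the statement that the bias vanishes at continuity points when $h_n \to 0$.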

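Theorem 3. Let $\{X_i\}_{i=1}^{n}$ be a sequence of iid random variables with density $f(x)$. Assume that (a) $|K(\gamma)| \le M$ for all $\gamma \in \mathbb{R}$; (b) $\int |K(\gamma)|\,d\gamma < \infty$ and $\int K(\gamma)\,d\gamma = 1$; (c) $|\gamma||K(\gamma)| \to 0$ as $|\gamma| \to \infty$; (d) $h_n > 0$ for all $n$ and $h_n \to 0$ as $n \to \infty$. Then, $|E(\hat f(x)) - f(x)| \to 0$ as $n \to \infty$ and, if $n h_n \to \infty$ as $n \to \infty$, $V(\hat f(x)) \to 0$ as $n \to \infty$ for every $x$ which is a point of continuity of $f$.

Remark. If $h_n = C n^{\delta}$, then it must be that $\delta < 0$ since $h_n \to 0$ as $n \to \infty$. Furthermore, in this case $n h_n = C n^{1+\delta}$ and, if $n h_n \to \infty$ as $n \to \infty$, then it must be that $1 + \delta > 0$. Combining these two inequalities we conclude that $-1 < \delta < 0$.

Theorem 4. Let $\{X_i\}_{i=1}^{n}$ be a sequence of iid random variables with density $f$. Assume that (a) $|K(\gamma)| \le M$ for all $\gamma \in \mathbb{R}$; (b) $\int |K(\gamma)|\,d\gamma < \infty$ and $\int K(\gamma)\,d\gamma = 1$; (c) $|\gamma||K(\gamma)| \to 0$ as $|\gamma| \to \infty$. If $h_n > 0$ for all $n$, $h_n \to 0$ and $n h_n \to \infty$ as $n \to \infty$, then $E(\hat f(x) - f(x))^2 \to 0$ for every $x$ which is a point of continuity of $f$.

Proof. Note that
$$E(\hat f(x) - f(x))^2 = E(\hat f(x) - E\hat f(x) + E\hat f(x) - f(x))^2 = E(\hat f(x) - E\hat f(x))^2 + (E\hat f(x) - f(x))^2 + 2E(\hat f(x) - E\hat f(x))(E\hat f(x) - f(x)) = V(\hat f(x)) + (E\hat f(x) - f(x))^2.$$
By Theorem 3, $(E\hat f(x) - f(x))^2 \to 0$ and $V(\hat f(x)) \to 0$ as $n \to \infty$.

Remark. Since convergence in quadratic mean implies convergence in probability, a direct consequence of Theorem 4 is that $\hat f(x) - f(x) \to_p 0$.

In many contexts it is desirable to obtain uniform convergence of $\hat f$ over a specified subset of the range of $X_i$. The next theorem gives conditions under which $\hat f(x)$ converges to $f(x)$ uniformly in probability over a compact subset of $\mathbb{R}$.

Theorem 5. Assume that: (a) $|K(\gamma)| \le M$ for all $\gamma \in \mathbb{R}$; (b) $\int |K(\gamma)|\,d\gamma < \infty$; (c) $|\gamma||K(\gamma)| \to 0$ as $|\gamma| \to \infty$; (d) for any $u, u' \in \mathbb{R}$ with $u \ne u'$ we have $|K(u) - K(u')| < C|u - u'|$ for some arbitrary constant $C < \infty$ (Lipschitz condition of order 1 in $\mathbb{R}$); (e) $h_n \to 0$ and $\frac{n h_n^3}{\log n} \to \infty$ as $n \to \infty$. Then, if $f$ is continuous in $G$, a compact subset of $\mathbb{R}$,
$$\sup_{x \in G} |\hat f(x) - E\hat f(x)| = O_p\left(\left(\frac{\log n}{n h_n}\right)^{1/2}\right).$$

Proof. Since $G$ is a compact subset of $\mathbb{R}$, it is closed and bounded. Therefore, there exist $x_0 \in \mathbb{R}$ and some $r > 0$ such that $G \subset B(x_0, r) = \{x \in \mathbb{R} : |x - x_0| < r\}$. In addition, for any two $x, x' \in G$ we have $|x - x'| < 2r$. By the Heine-Borel Theorem every open cover of $G$ contains a finite subcover. Hence, for every $n$ we can write $G \subset \bigcup_{k=1}^{l_n} B(x_k, r_n)$ with $r > l_n r_n$, or $l_n < r/r_n$. Now, given (d), for $x \in B(x_k, r_n)$ we have
$$|\hat f(x) - \hat f(x_k)| \le \frac{C}{h_n^2}|x_k - x| < \frac{C r_n}{h_n^2} \quad \text{and} \quad |E\hat f(x) - E\hat f(x_k)| \le \frac{C}{h_n^2}|x_k - x| < \frac{C r_n}{h_n^2},$$
given that $f$ is a density function. Hence, by the triangle inequality,
$$|\hat f(x) - E\hat f(x)| \le |\hat f(x_k) - E\hat f(x_k)| + \frac{2C r_n}{h_n^2}.$$
Letting $r_n = h_n^{3/2}/n^{1/2}$ we have
$$\left(\frac{n h_n}{\log n}\right)^{1/2} \sup_{x \in G} |\hat f(x) - E\hat f(x)| \le \left(\frac{n h_n}{\log n}\right)^{1/2} \max_{1 \le k \le l_n} |\hat f(x_k) - E\hat f(x_k)| + \frac{2C}{(\log n)^{1/2}},$$
and to prove the theorem it suffices to show that for all $\epsilon > 0$ there exists $\Delta_\epsilon > 0$ such that
$$P\left(\left(\frac{n h_n}{\log n}\right)^{1/2} \max_{1 \le k \le l_n} |\hat f(x_k) - E\hat f(x_k)| > \Delta_\epsilon\right) < \epsilon$$
for all sufficiently large $n$. Note that
$$P\left(\left(\frac{n h_n}{\log n}\right)^{1/2} \max_{1 \le k \le l_n} |\hat f(x_k) - E\hat f(x_k)| > \Delta_\epsilon\right) \le \sum_{k=1}^{l_n} P\left(\left(\frac{n h_n}{\log n}\right)^{1/2} |\hat f(x_k) - E\hat f(x_k)| > \Delta_\epsilon\right)$$
and, by Bernstein's Inequality (Bennett 1962),
$$P\left(\left(\frac{n h_n}{\log n}\right)^{1/2} |\hat f(x_k) - E\hat f(x_k)| > \Delta_\epsilon\right) < 2\exp\left(-\frac{\Delta_\epsilon^2 \log n}{\frac{2 h_n}{n}\sum_{i=1}^{n} V(W_{in}) + \frac{2}{3} M \Delta_\epsilon \left(\frac{\log n}{n h_n}\right)^{1/2}}\right),$$
where $W_{in} = \frac{1}{h_n} K\left(\frac{X_i - x_k}{h_n}\right) - \frac{1}{h_n} E\left(K\left(\frac{X_i - x_k}{h_n}\right)\right)$ with $E(W_{in}) = 0$, $V(W_{in}) = h_n^{-2} E\left(K^2\left(\frac{X_i - x_k}{h_n}\right)\right) - h_n^{-2}\left(E K\left(\frac{X_i - x_k}{h_n}\right)\right)^2$ and $|W_{in}| < M/h_n$ for all $i$ and $M < \infty$. Since $f$ is uniformly continuous (continuity on a compact set implies uniform continuity), $h_n V(W_{in}) \to f(x_k)\int K^2(\psi)\,d\psi$ for all $x_k$ and, provided that $\frac{n h_n}{\log n} \to \infty$, we have
$$c_n = \frac{2 h_n}{n}\sum_{i=1}^{n} V(W_{in}) + \frac{2}{3} M \Delta_\epsilon \left(\frac{\log n}{n h_n}\right)^{1/2} \to 2 f(x_k)\int K^2(\psi)\,d\psi.$$
Hence, provided $\Delta_\epsilon^2 > 2 f(x_k)\int K^2(\psi)\,d\psi$ and $n h_n^3 \to \infty$,
$$P\left(\left(\frac{n h_n}{\log n}\right)^{1/2} \max_{1 \le k \le l_n} |\hat f(x_k) - E\hat f(x_k)| > \Delta_\epsilon\right) < l_n\, 2\exp\left(-\frac{\Delta_\epsilon^2 \log n}{c_n}\right) < \frac{2 r\, n^{1/2}}{h_n^{3/2}\, n^{\Delta_\epsilon^2/c_n}} < \frac{2r}{(n h_n^3)^{1/2}} < \epsilon.$$

Remarks.

1. A direct consequence of Theorem 5 is that $\sup_{x \in G} |\hat f(x) - E\hat f(x)| = h_n\, O_p\left(\left(\frac{\log n}{n h_n^3}\right)^{1/2}\right)$ and, if $\frac{n h_n^3}{\log n} \to \infty$, then $\sup_{x \in G} |\hat f(x) - E\hat f(x)| = o_p(h_n)$.

2. If $\frac{n h_n^3}{\log n} \to \infty$, then by the triangle inequality
$$\sup_{x \in G} |\hat f(x) - f(x)| \le \sup_{x \in G} |\hat f(x) - E\hat f(x)| + \sup_{x \in G} |E\hat f(x) - f(x)| = o_p(h_n) + \sup_{x \in G} |E\hat f(x) - f(x)|.$$
If $f$ satisfies a Lipschitz condition of order 1 in $G$, that is, $|f(x) - f(x')| < C|x - x'|$ for all $x, x' \in G$, and $\int K(\gamma)\,d\gamma = 1$, then
$$|E\hat f(x) - f(x)| \le \int |f(x + h_n\psi) - f(x)||K(\psi)|\,d\psi \le h_n C \int |\psi||K(\psi)|\,d\psi.$$
Hence, if $\int |\psi||K(\psi)|\,d\psi < \infty$, $\sup_{x \in G} |E\hat f(x) - f(x)| = O(h_n)$ and $\sup_{x \in G} |\hat f(x) - f(x)| = O_p(h_n)$. The condition $\int |\psi||K(\psi)|\,d\psi < \infty$ is verified for all kernels such that $K \ge 0$ and $\int |\gamma| K(\gamma)\,d\gamma < \infty$.

3. If $f$ satisfies the Lipschitz condition in remark 2 and $\int |\gamma| K(\gamma)\,d\gamma < \infty$, then for any given $n$ the mean squared error in Theorem 4 is bounded above as
$$E(\hat f(x) - f(x))^2 \le h_n^2 C_1 + \frac{1}{n h_n} C_2.$$
The first term is a bound on the square of the bias and the second is a bound on the variance. The bias bound increases as $h_n$ increases whereas the variance bound diminishes as $h_n$ increases. This is the bias-variance tradeoff associated with the choice of $h_n$ for fixed $n$. The bound on the right-hand side of the inequality is minimized at $h_n^* = \left(\frac{C_2}{2 C_1}\right)^{1/3} n^{-1/3}$. Hence, substituting $h_n^*$ in the expression for the bound we have $E(\hat f(x) - f(x))^2 = O(n^{-2/3})$.

4. If $h_n = C n^{\delta}$ with $-1 < \delta < 0$, then $\frac{n h_n}{\log n} = \frac{C n^{1+\delta}}{\log n}$. By L'Hôpital's rule and the fact that $-1 < \delta < 0$ we have $\frac{n h_n}{\log n} \to \infty$ as $n \to \infty$.

5. If $h_n = C n^{\delta}$, then $n h_n^3 = C^3 n^{1+3\delta} \to \infty$ if $-1/3 < \delta$. As in the previous remark, by L'Hôpital's rule and the fact that $-1/3 < \delta < 0$ we have $\frac{n h_n^3}{\log n} \to \infty$ as $n \to \infty$.

Remarks 1 and 2 together with Theorem 5 provide the following corollary.

Corollary 1. Assume (a) $|K(\gamma)| \le M$ for all $\gamma \in \mathbb{R}$; (b) $\int |K(\gamma)|\,d\gamma < \infty$; (c) $|\gamma||K(\gamma)| \to 0$ as $|\gamma| \to \infty$; (d) for any $u, u' \in \mathbb{R}$ with $u \ne u'$ we have $|K(u) - K(u')| < C|u - u'|$ for some arbitrary constant $C < \infty$ (Lipschitz condition of order 1 in $\mathbb{R}$); (e) $K \ge 0$ and $\int |\gamma| K(\gamma)\,d\gamma < \infty$; (f) $h_n \to 0$ and $\frac{n h_n^3}{\log n} \to \infty$ as $n \to \infty$. Then, if $f$ is continuous and satisfies a Lipschitz condition of order 1 in $G$, a compact subset of $\mathbb{R}$,
$$\sup_{x \in G} |\hat f(x) - f(x)| = O_p(h_n).$$

Theorem 6. Assume that: (a) $\{X_i\}_{i=1}^{n}$ forms an iid sequence of random variables with density $f$; (b) $|K(\gamma)| \le M$ for all $\gamma \in \mathbb{R}$; (c) $\int |K(\gamma)|\,d\gamma < \infty$; (d) $|\gamma||K(\gamma)| \to 0$ as $|\gamma| \to \infty$; (e) $\int |K(\psi)|^{2+\delta}\,d\psi < \infty$ for some $\delta > 0$; (f) $h_n > 0$ for all $n$, $h_n \to 0$ and $n h_n \to \infty$ as $n \to \infty$. Then,
$$\frac{\hat f(x) - E(\hat f(x))}{\sqrt{V(\hat f(x))}} \to_d N(0, 1),$$
where $x$ is a point of continuity of $f$.

Proof. $\hat f(x) - E(\hat f(x)) = \sum_{t=1}^{n} \frac{1}{n h_n}\left(K\left(\frac{X_t - x}{h_n}\right) - E\left(K\left(\frac{X_t - x}{h_n}\right)\right)\right)$. Let $Z_{tn} = \frac{1}{n h_n} K\left(\frac{X_t - x}{h_n}\right)$, $E(Z_{tn}) = \mu_n$ and $s_n^2 = \sum_{t=1}^{n} E(Z_{tn} - \mu_n)^2$. Note that
$$s_n^2 = \sum_{t=1}^{n} \frac{1}{n^2 h_n^2} E\left(K\left(\frac{X_t - x}{h_n}\right) - E\left(K\left(\frac{X_t - x}{h_n}\right)\right)\right)^2 = \frac{1}{n h_n^2} V\left(K\left(\frac{X_t - x}{h_n}\right)\right) = V(\hat f(x)).$$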

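Hence,
$$\frac{\hat f(x) - E(\hat f(x))}{\sqrt{V(\hat f(x))}} = \sum_{t=1}^{n} \frac{Z_{tn} - \mu_n}{s_n} = \sum_{t=1}^{n} X_{tn}$$
with $E(X_{tn}) = 0$, $E(X_{tn}^2) = s_n^{-2} E(Z_{tn} - \mu_n)^2$ and $\sum_{t=1}^{n} E(X_{tn}^2) = 1$. Consequently, by Liapounov's CLT (see Davidson 1994, p. 372), $\frac{\hat f(x) - E(\hat f(x))}{\sqrt{V(\hat f(x))}} \to_d N(0, 1)$ provided $\lim_{n \to \infty} \sum_{t=1}^{n} E|X_{tn}|^{2+\delta} = 0$ for some $\delta > 0$. We now verify that this limit is in fact zero.
$$\sum_{t=1}^{n} E|X_{tn}|^{2+\delta} = \sum_{t=1}^{n} E\left|\frac{Z_{tn} - \mu_n}{s_n}\right|^{2+\delta} = V(\hat f(x))^{-1-\delta/2} \sum_{t=1}^{n} E|Z_{tn} - \mu_n|^{2+\delta} = n\, V(\hat f(x))^{-1-\delta/2} E|Z_{tn} - \mu_n|^{2+\delta}.$$
By the $c_r$ inequality, $E|Z_{tn} - \mu_n|^{2+\delta} \le 2^{1+\delta}\left(E|Z_{tn}|^{2+\delta} + |\mu_n|^{2+\delta}\right)$, and since $\mu_n = O(n^{-1})$ the second term is of smaller order than the first. Therefore, up to a term of smaller order,
$$\sum_{t=1}^{n} E|X_{tn}|^{2+\delta} \le n\, V(\hat f(x))^{-1-\delta/2}\, 2^{1+\delta} E|Z_{tn}|^{2+\delta} = n\, V(\hat f(x))^{-1-\delta/2}\, 2^{1+\delta} (n h_n)^{-2-\delta} E\left|K\left(\frac{X_t - x}{h_n}\right)\right|^{2+\delta} = 2^{1+\delta} \left(n h_n V(\hat f(x))\right)^{-1-\delta/2} (n h_n)^{-\delta/2}\, \frac{1}{h_n} E\left|K\left(\frac{X_t - x}{h_n}\right)\right|^{2+\delta}.$$
By Theorem 2, as $n \to \infty$, $n h_n V(\hat f(x)) \to f(x)\int K^2(\phi)\,d\phi$ and $\frac{1}{h_n} E\left|K\left(\frac{X_t - x}{h_n}\right)\right|^{2+\delta} \to f(x)\int |K(\psi)|^{2+\delta}\,d\psi$. Also, by assumption, $(n h_n)^{-\delta/2} \to 0$. Therefore, $\sum_{t=1}^{n} E|X_{tn}|^{2+\delta} \to 0$ as $n \to \infty$.

Remarks.

1. Since $n h_n V(\hat f(x)) \to f(x)\int K^2(\psi)\,d\psi$ we can write
$$(n h_n)^{1/2}\, \frac{\hat f(x) - E(\hat f(x))}{\left(f(x)\int K^2(\psi)\,d\psi\right)^{1/2}} \to_d N(0, 1) \quad \text{or} \quad (n h_n)^{1/2}\left(\hat f(x) - E(\hat f(x))\right) \to_d N\left(0, f(x)\int K^2(\psi)\,d\psi\right).$$

1.2 Univariate Kernel Density Estimation Example

A good example of the empirical usage of kernel density estimation is Pittau and Zelli (2006), in which the authors implement a population-weighted kernel density estimation strategy using a Gaussian kernel to study the income distribution of the EU between 1977 and 1996. The data are mean regional per capita GDP, calculated in constant 1990 dollars and converted to Purchasing Power Standards. The data were collected for Functional Urban Regions, according to the Nomenclature of Territorial Units for Statistics (NUTS).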
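A population-weighted estimator of the kind mentioned in this example can be sketched by replacing the equal weights $1/n$ in Definition 1 with weights proportional to population. The code below is a generic illustration under that assumption, not the procedure actually used by the authors of the cited paper; the regional income and population figures are invented.

```python
import math


def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def weighted_kde(x, values, weights, h):
    """Weighted Gaussian kernel estimate: sum_i w_i K((X_i - x)/h) / h,
    with the weights rescaled so that they sum to one."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    total = sum(weights)
    return sum(w / total * gaussian((v - x) / h)
               for v, w in zip(values, weights)) / h


if __name__ == "__main__":
    # Hypothetical regional per capita incomes and populations (millions).
    income = [9500.0, 12000.0, 14500.0, 18000.0, 21000.0]
    population = [2.1, 5.4, 3.3, 8.0, 1.2]
    for x in (10000.0, 15000.0, 20000.0):
        print(x, weighted_kde(x, income, population, h=2500.0))
```

With equal weights the function reduces to the unweighted estimator, so the weighting only changes how much each region contributes to the estimated density.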

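[Figure: kernel density estimates of real per capita GDP in PPS (horizontal axis from 5,000 to 30,000; vertical axis: density, scaled by $10^{-4}$), shown for individual years and as a surface over real per capita GDP in PPS and year.]

With this graph one could argue that in certain years the EU's income distribution is bimodal or even trimodal, suggesting that this distribution is in fact the sum of two or more underlying distributions. In their paper, Pittau and Zelli (2009) use various techniques to test for bimodality and trimodality.

1.3 Bias approximation

It is desirable to provide approximations for the bias of $\hat f(x)$. In the next definition, for $\beta \in \mathbb{R}$, $\lfloor\beta\rfloor$ denotes the largest integer strictly smaller than $\beta$.

Definition 2. Let $T$ be an interval in $\mathbb{R}$ and let $\beta$ and $C$ be two positive numbers. The Hölder class $\Sigma(\beta, C)$ on $T$ is defined as the set of $\lfloor\beta\rfloor$ times differentiable functions $f : T \to \mathbb{R}$ whose derivative $f^{(\lfloor\beta\rfloor)}$ satisfies
$$|f^{(\lfloor\beta\rfloor)}(x) - f^{(\lfloor\beta\rfloor)}(x')| \le C|x - x'|^{\beta - \lfloor\beta\rfloor}, \quad \forall\, x \ne x',\ x, x' \in T.$$

Remarks:

1. Note that $0 < \beta - \lfloor\beta\rfloor \le 1$. Since $\beta - \lfloor\beta\rfloor > 0$, $f^{(\lfloor\beta\rfloor)}(x)$ is continuous at every $x \in T$.

2. If $f \in \Sigma(\beta, C)$, then all of its derivatives of order up to $\lfloor\beta\rfloor$ exist for all $x \in T$.

3. If $\beta - \lfloor\beta\rfloor = 1$, then $f^{(\lfloor\beta\rfloor)}$ satisfies a Lipschitz condition of order 1 in $T$.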

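Definition 3. We say that the kernel function $K$ is of order $m$, with $m \in \mathbb{Z}$, if

1. $\int K(u)\,du = 1$;
2. $\int u^j K(u)\,du = 0$ for $j = 1, \dots, m$.

Theorem 7. Assume that $f \in \Sigma(\beta, C)$ for $\beta \ge 2$ and $K$ is a kernel of order $\lfloor\beta\rfloor$. Then, if $\int |K(\gamma)||\gamma|^{\beta}\,d\gamma < C$,
$$|E(\hat f(x)) - f(x)| = O(h_n^{\beta}). \tag{2}$$

Proof. Since $f \in \Sigma(\beta, C)$, by Taylor's Theorem, for some $\lambda \in [0, 1]$,
$$f(x + h_n\gamma) - f(x) = \sum_{j=1}^{\lfloor\beta\rfloor - 1} \frac{1}{j!} f^{(j)}(x)(h_n\gamma)^j + \frac{1}{\lfloor\beta\rfloor!} f^{(\lfloor\beta\rfloor)}(x + h_n\gamma\lambda)(h_n\gamma)^{\lfloor\beta\rfloor}.$$
Since $K$ is of order $\lfloor\beta\rfloor$ and $f \in \Sigma(\beta, C)$,
$$E(\hat f(x)) - f(x) = \int K(\gamma)(f(x + h_n\gamma) - f(x))\,d\gamma = \int K(\gamma)\left(\sum_{j=1}^{\lfloor\beta\rfloor - 1} \frac{1}{j!} f^{(j)}(x)(h_n\gamma)^j + \frac{1}{\lfloor\beta\rfloor!} f^{(\lfloor\beta\rfloor)}(x + h_n\gamma\lambda)(h_n\gamma)^{\lfloor\beta\rfloor}\right) d\gamma$$
$$= \int K(\gamma) \frac{1}{\lfloor\beta\rfloor!} f^{(\lfloor\beta\rfloor)}(x + h_n\gamma\lambda)(h_n\gamma)^{\lfloor\beta\rfloor}\,d\gamma = \frac{1}{\lfloor\beta\rfloor!}\int K(\gamma)\left(f^{(\lfloor\beta\rfloor)}(x + h_n\gamma\lambda) - f^{(\lfloor\beta\rfloor)}(x)\right)(h_n\gamma)^{\lfloor\beta\rfloor}\,d\gamma.$$
By the Triangle Inequality, $f \in \Sigma(\beta, C)$ and the fact that $\lambda \in [0, 1]$, we have
$$|E(\hat f(x)) - f(x)| \le \frac{C}{\lfloor\beta\rfloor!}\int |K(\gamma)||h_n\gamma\lambda|^{\beta - \lfloor\beta\rfloor}|h_n\gamma|^{\lfloor\beta\rfloor}\,d\gamma \le \frac{C}{\lfloor\beta\rfloor!}\, h_n^{\beta}\int |K(\gamma)||\gamma|^{\beta - \lfloor\beta\rfloor}|\gamma|^{\lfloor\beta\rfloor}\,d\gamma \le C h_n^{\beta},$$
where the last inequality follows from the assumption that $\int |K(\gamma)||\gamma|^{\beta}\,d\gamma < C$ and $h_n > 0$.

Remark:

1. It is common to assume that $\beta = 2$. In this case, $|E(\hat f(x)) - f(x)| = O(h_n^2)$ and the moment condition is $\int \gamma^2 |K(\gamma)|\,d\gamma < C$. If $K(\gamma) \ge 0$, then the last condition becomes $\int \gamma^2 K(\gamma)\,d\gamma < C$.

2. Given that
$$(n h_n)^{1/2}(\hat f(x) - f(x)) = (n h_n)^{1/2}(\hat f(x) - E(\hat f(x))) + (n h_n)^{1/2}(E(\hat f(x)) - f(x)),$$
we have, under the conditions of Theorem 7, that
$$(n h_n)^{1/2}(\hat f(x) - f(x)) = (n h_n)^{1/2}(\hat f(x) - E(\hat f(x))) + (n h_n^{2\beta+1})^{1/2} O(1).$$
Hence, provided $n h_n^{2\beta+1} \to 0$, we can write
$$(n h_n)^{1/2}(\hat f(x) - f(x)) = (n h_n)^{1/2}(\hat f(x) - E(\hat f(x))) + o(1).$$

Using Theorems 6, 7 and Remark 2 we have,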

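Theorem 8. Under the assumptions of Theorems 6 and 7, if $n h_n^{2\beta+1} \to 0$ as $n \to \infty$, then
$$(n h_n)^{1/2}(\hat f(x) - f(x)) \to_d N\left(0, f(x)\int K^2(\gamma)\,d\gamma\right)$$
for every point of continuity $x$ of $f$.

Remark: Theorem 6 requires that $h_n \to 0$ and $n h_n \to \infty$, and the condition in Remark 2 above requires $n h_n^{2\beta+1} \to 0$. If $h_n = C n^{\theta}$, then it is easy to see that any $-1 < \theta < -\frac{1}{2\beta+1}$ produces a bandwidth sequence that satisfies all speed requirements. If $\beta$ is large, i.e., if the Hölder class includes functions with a large degree of smoothness, the upper bound on the range of values for $\theta$ is close to zero, therefore relaxing the constraints imposed on the choice of $h_n$.

2 Distribution function estimation

It is natural to define an estimator for the distribution function at $x$, $F(x)$, by integrating $\hat f$. Hence, we have

Definition 4. Let $\hat f(x)$ be the kernel density estimator. The kernel distribution function estimator is given by
$$\hat F(x) = \int_{-\infty}^{x} \hat f(z)\,dz. \tag{3}$$

Theorem 9. Assume that $\{X_i\}_{i=1}^{n}$ is an iid sequence and that $K$ is a density function. Then,
$$E(\hat F(x)) - F(x) = \int (F(x + h_n\gamma) - F(x)) K(\gamma)\,d\gamma \tag{4}$$
and
$$V(\hat F(x)) = \frac{1}{n}\left(2\int F(x + h_n\gamma)(1 - G(\gamma)) K(\gamma)\,d\gamma - \left(\int F(x + h_n\gamma) K(\gamma)\,d\gamma\right)^2\right), \tag{5}$$
where $G(\gamma) = \int_{-\infty}^{\gamma} K(z)\,dz$.

Proof. By definition $\hat F(x) = \int_{-\infty}^{x} \frac{1}{n h_n}\sum_{i=1}^{n} K\left(\frac{X_i - y}{h_n}\right) dy$. By a change of variable in integration we write $\hat F(x) = \frac{1}{n}\sum_{i=1}^{n} \int_{\frac{X_i - x}{h_n}}^{\infty} K(z)\,dz$. Consequently, by the assumption that the sequence $\{X_i\}_{i=1}^{n}$ is iid, we have
$$E(\hat F(x)) = \frac{1}{n}\sum_{i=1}^{n} \int\int_{\frac{y - x}{h_n}}^{\infty} K(z)\,dz\, f(y)\,dy = \int\int_{\frac{y - x}{h_n}}^{\infty} K(z)\,dz\, f(y)\,dy = \int\left(1 - \int_{-\infty}^{\frac{y - x}{h_n}} K(z)\,dz\right) f(y)\,dy,$$
given that $\int K(z)\,dz = 1$,
$$= \int\left(1 - \int_{-\infty}^{\gamma} K(z)\,dz\right) f(x + h_n\gamma)\, h_n\,d\gamma, \quad \text{by letting } \gamma = \frac{y - x}{h_n},$$
$$= \int (1 - G(\gamma))\,dF(x + h_n\gamma).$$
Integrating by parts, we have
$$\int (1 - G(\gamma))\,dF(x + h_n\gamma) = \lim_{\gamma \to \infty} (1 - G(\gamma)) F(x + h_n\gamma) - \lim_{\gamma \to -\infty} (1 - G(\gamma)) F(x + h_n\gamma) + \int F(x + h_n\gamma)\,dG(\gamma) = \int F(x + h_n\gamma) K(\gamma)\,d\gamma.$$
Thus, $E(\hat F(x)) = \int F(x + h_n\gamma) K(\gamma)\,d\gamma$ and, given that $\int K(\gamma)\,d\gamma = 1$,
$$E(\hat F(x)) - F(x) = \int (F(x + h_n\gamma) - F(x)) K(\gamma)\,d\gamma. \tag{6}$$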

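Now, writing $\bar G_i = 1 - G\left(\frac{X_i - x}{h_n}\right)$,
$$V(\hat F(x)) = E\left(\frac{1}{n}\sum_{i=1}^{n}\left(\bar G_i - E(\bar G_i)\right)\right)^2 = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} E\left(\left(\bar G_i - E(\bar G_i)\right)\left(\bar G_j - E(\bar G_j)\right)\right) = \frac{1}{n} V(\bar G_1) = \frac{1}{n}\left(E(\bar G_1^2) - (E(\bar G_1))^2\right),$$
where the penultimate equality follows from the assumption that $\{X_i\}_{i=1}^{n}$ is iid. Using integration by parts and a change of variable as in obtaining the expression for $E(\hat F(x))$ we have
$$E\left(1 - G\left(\frac{X_i - x}{h_n}\right)\right)^2 = 2\int F(x + h_n\gamma) K(\gamma)(1 - G(\gamma))\,d\gamma.$$
Hence, given that $E\left(1 - G\left(\frac{X_i - x}{h_n}\right)\right) = \int F(x + h_n\gamma) K(\gamma)\,d\gamma$, we have
$$V(\hat F(x)) = \frac{1}{n}\left(2\int F(x + h_n\gamma)(1 - G(\gamma)) K(\gamma)\,d\gamma - \left(\int F(x + h_n\gamma) K(\gamma)\,d\gamma\right)^2\right). \tag{7}$$

Remarks.

1. Note that $E(\hat F(x)) - F(x)$ is expressed with the same algebraic structure as the right-hand side of $E(\hat f(x)) - f(x) = \int (f(x + h_n\gamma) - f(x)) K(\gamma)\,d\gamma$, with $F$ replacing $f$. In the case of $\hat f$ we used Theorem 2, which required $\int |f(\gamma)|\,d\gamma < \infty$, to show that $E(\hat f(x)) - f(x) \to 0$ as $n \to \infty$. In the case of $\hat F$, it is not generally true that $\int |F(\gamma)|\,d\gamma < \infty$, thus we cannot make use of Theorem 2. Additional assumptions are needed to study the behavior of the bias and the variance of $\hat F$.

Theorem 10. Assume that (a) $\{X_i\}_{i=1}^{n}$ is an iid sequence; (b) $K$ is a density function with $\int \gamma K(\gamma)\,d\gamma = 0$ and $\int \gamma^2 K(\gamma)\,d\gamma < \infty$; (c) $F(x)$ has a bounded second derivative on $\mathbb{R}$ and $F'(x)$ is continuous on $\mathbb{R}$; (d) $0 < h_n \to 0$ as $n \to \infty$. Then,
$$E(\hat F(x)) = F(x) + O(h_n^2) \quad \text{and} \quad V(\hat F(x)) = \frac{F(x)(1 - F(x))}{n} - \frac{h_n}{n} f(x)\alpha + o\left(\frac{h_n}{n}\right),$$
where $\alpha = 2\int \gamma K(\gamma) G(\gamma)\,d\gamma$.

Proof. By (c), for any $x \in \mathbb{R}$ and some $0 \le \lambda \le 1$ we have $F(x + h_n\gamma) = F(x) + F'(x) h_n\gamma + \frac{1}{2} F^{(2)}(x + \lambda h_n\gamma) h_n^2\gamma^2$. Then, from Theorem 9,
$$\int F(x + h_n\gamma) K(\gamma)\,d\gamma = F(x) + \frac{h_n^2}{2}\int F^{(2)}(x + \lambda h_n\gamma)\gamma^2 K(\gamma)\,d\gamma = F(x) + O(h_n^2)$$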

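where the last integral exists given that $F^{(2)}$ is bounded and $\int \gamma^2 K(\gamma)\,d\gamma < \infty$. Similarly,
$$\int 2F(x + h_n\gamma) K(\gamma)(1 - G(\gamma))\,d\gamma = F(x)\int 2K(\gamma)(1 - G(\gamma))\,d\gamma + h_n F'(x)\int 2\gamma K(\gamma)(1 - G(\gamma))\,d\gamma + h_n^2\int F^{(2)}(x + \lambda h_n\gamma)(1 - G(\gamma))\gamma^2 K(\gamma)\,d\gamma$$
$$= 2F(x) - F(x)\, 2\int G(\gamma) K(\gamma)\,d\gamma + 2 h_n F'(x)\int \gamma K(\gamma)\,d\gamma - h_n F'(x)\, 2\int \gamma K(\gamma) G(\gamma)\,d\gamma + h_n^2\int F^{(2)}(x + \lambda h_n\gamma)\gamma^2 K(\gamma)\,d\gamma - h_n^2\int F^{(2)}(x + \lambda h_n\gamma)\gamma^2 K(\gamma) G(\gamma)\,d\gamma.$$
Note that all integrals on the right-hand side of the last equality exist since: integrating by parts we have that $2\int G(\gamma) K(\gamma)\,d\gamma = 1$; by (b), $\int \gamma K(\gamma)\,d\gamma = 0$; $\left|\int \gamma K(\gamma) G(\gamma)\,d\gamma\right| \le \int |\gamma| K(\gamma)\,d\gamma < \infty$ by (b) and the fact that $|G(\gamma)| \le 1$; and $\left|\int F^{(2)}(x + \lambda h_n\gamma)\gamma^2 K(\gamma) G(\gamma)\,d\gamma\right| \le C\int \gamma^2 K(\gamma)\,d\gamma < \infty$ by (b), (c) and $|G(\gamma)| \le 1$. Hence, noting that $F'(x) = f(x)$ and letting $\alpha = 2\int \gamma K(\gamma) G(\gamma)\,d\gamma$, we have
$$\int 2F(x + h_n\gamma) K(\gamma)(1 - G(\gamma))\,d\gamma = F(x) - h_n f(x)\alpha + O(h_n^2). \tag{8}$$
Consequently, $E(\hat F(x)) - F(x) = O(h_n^2)$ and
$$V(\hat F(x)) = \frac{1}{n}\left(F(x) - h_n f(x)\alpha + O(h_n^2) - (F(x) + O(h_n^2))^2\right) = \frac{F(x)(1 - F(x))}{n} - \frac{h_n}{n} f(x)\alpha + \frac{h_n^2}{n} O(1) + \frac{h_n^4}{n} O(1) = \frac{F(x)(1 - F(x))}{n} - \frac{h_n}{n} f(x)\alpha + o\left(\frac{h_n}{n}\right)$$
since $h_n \to 0$.

Remarks.

1. Under the conditions of Theorem 10, $\hat F(x)$ converges in quadratic mean to $F(x)$ and consequently $\hat F(x) \to_p F(x)$ for all $x$.

2. Different from the case of density estimation, there is no requirement that $n h_n \to \infty$ in the case of distribution estimation.

3. Given that $\{X_i\}_{i=1}^{n}$ is iid, the empirical distribution function $F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x)$ is such that $E(F_n(x)) = F(x)$, $V(F_n(x)) = \frac{F(x)(1 - F(x))}{n}$ and, using Lévy's central limit theorem, $\sqrt{n}(F_n(x) - F(x)) \to_d N(0, F(x)(1 - F(x)))$. Hence, the use of $\hat F(x)$ as an estimator for $F(x)$ introduces a bias term compared to $F_n(x)$. However, since $V(F_n(x)) - V(\hat F(x)) = \frac{h_n}{n} f(x)\alpha - o\left(\frac{h_n}{n}\right)$, it is possible to obtain a reduction of the variance. Consequently, $E(\hat F(x) - F(x))^2$ could be smaller than $E(F_n(x) - F(x))^2$. An advantage of $\hat F(x)$ over $F_n(x)$ is its smoothness.

4. $E(\hat F(x) - F(x))^2 \le \frac{F(x)(1 - F(x))}{n} - \frac{h_n}{n} f(x)\alpha + o\left(\frac{h_n}{n}\right) + h_n^4 C$. Choosing $h_n$ to minimize the bound on the right-hand side of the inequality we obtain $h_n = \left(\frac{f(x)\alpha}{4C}\right)^{1/3} n^{-1/3}$. This rate of decay for $h_n$ is different from that obtained for the estimation of $f$ when $\beta \ge 2$ in Theorem 8.

Theorem 11. Assume that (a) $\{X_i\}_{i=1}^{n}$ is an iid sequence; (b) $K$ is a density function with $\int \gamma K(\gamma)\,d\gamma = 0$ and $\int \gamma^2 K(\gamma)\,d\gamma < \infty$; (c) $F(x)$ has a bounded second derivative on $\mathbb{R}$ and $F'(x)$ is continuous on $\mathbb{R}$; (d) $0 < h_n \to 0$ as $n \to \infty$. Then,
$$\sqrt{n}\left(\hat F(x) - E(\hat F(x))\right) \to_d N(0, F(x)(1 - F(x))). \tag{9}$$

Proof. $\hat F(x) - E(\hat F(x)) = \sum_{i=1}^{n} \frac{1}{n}\left(\left(1 - G\left(\frac{X_i - x}{h_n}\right)\right) - E\left(1 - G\left(\frac{X_i - x}{h_n}\right)\right)\right)$. Let $Z_{in} = \frac{1}{n}\left(1 - G\left(\frac{X_i - x}{h_n}\right)\right)$, $E(Z_{in}) = \mu_n$ and $s_n^2 = \sum_{i=1}^{n} E(Z_{in} - \mu_n)^2$. Note that
$$s_n^2 = \sum_{i=1}^{n} \frac{1}{n^2} E\left(\left(1 - G\left(\frac{X_i - x}{h_n}\right)\right) - E\left(1 - G\left(\frac{X_i - x}{h_n}\right)\right)\right)^2 = \frac{1}{n} V\left(1 - G\left(\frac{X_i - x}{h_n}\right)\right) = V(\hat F(x)).$$
Hence,
$$\frac{\hat F(x) - E(\hat F(x))}{\sqrt{V(\hat F(x))}} = \sum_{i=1}^{n} \frac{Z_{in} - \mu_n}{s_n} = \sum_{i=1}^{n} X_{in}$$
with $E(X_{in}) = 0$, $E(X_{in}^2) = s_n^{-2} E(Z_{in} - \mu_n)^2$ and $\sum_{i=1}^{n} E(X_{in}^2) = 1$. Consequently, by Liapounov's CLT (see Davidson 1994, p. 372), $\frac{\hat F(x) - E(\hat F(x))}{\sqrt{V(\hat F(x))}} \to_d N(0, 1)$ provided $\lim_{n \to \infty} \sum_{i=1}^{n} E|X_{in}|^{2+\delta} = 0$ for some $\delta > 0$. We now verify that this limit is in fact zero.
$$\sum_{i=1}^{n} E|X_{in}|^{2+\delta} = \sum_{i=1}^{n} E\left|\frac{Z_{in} - \mu_n}{s_n}\right|^{2+\delta} = V(\hat F(x))^{-1-\delta/2}\sum_{i=1}^{n} E|Z_{in} - \mu_n|^{2+\delta} = n\, V(\hat F(x))^{-1-\delta/2} E|Z_{in} - \mu_n|^{2+\delta}.$$
By the $c_r$ inequality, $E|Z_{in} - \mu_n|^{2+\delta} \le 2^{1+\delta}\left(E|Z_{in}|^{2+\delta} + |\mu_n|^{2+\delta}\right)$, where by Theorem 10 $\mu_n = O(n^{-1})$, so the second term is at most of the same order as the first. Therefore, up to a constant factor,
$$\sum_{i=1}^{n} E|X_{in}|^{2+\delta} \le n\, V(\hat F(x))^{-1-\delta/2}\, 2^{1+\delta} E|Z_{in}|^{2+\delta} = n\, V(\hat F(x))^{-1-\delta/2}\, 2^{1+\delta}\, n^{-2-\delta} E\left|1 - G\left(\frac{X_i - x}{h_n}\right)\right|^{2+\delta} \le 2^{1+\delta}\, V(\hat F(x))^{-1-\delta/2}\, n^{-1-\delta},$$
since $|1 - G(x)| \le 1$ and $\int f(\gamma)\,d\gamma = 1$,
$$= 2^{1+\delta}\left(n V(\hat F(x))\right)^{-1-\delta/2} n^{-\delta/2}.$$
By Theorem 10, as $n \to \infty$, $n V(\hat F(x)) \to F(x)(1 - F(x))$ and, since $n^{-\delta/2} \to 0$ for all $\delta > 0$, we have $\sum_{i=1}^{n} E|X_{in}|^{2+\delta} = o(1)$. Since $V(\hat F(x)) - \frac{F(x)(1 - F(x))}{n} = -\frac{h_n}{n} f(x)\alpha + o\left(\frac{h_n}{n}\right) = o(n^{-1})$, we have
$$\frac{\sqrt{n}\left(\hat F(x) - E(\hat F(x))\right)}{\sqrt{F(x)(1 - F(x))}} \to_d N(0, 1). \tag{10}$$

Remark. $\sqrt{n}(\hat F(x) - F(x)) = \sqrt{n}(\hat F(x) - E(\hat F(x))) + \sqrt{n}(E(\hat F(x)) - F(x)) = \sqrt{n}(\hat F(x) - E(\hat F(x))) + n^{1/2} h_n^2 O(1)$, where the last equality follows from Theorem 10. Hence, if $n^{1/2} h_n^2 \to 0$ we have $\sqrt{n}(\hat F(x) - F(x)) \to_d N(0, F(x)(1 - F(x)))$. If $h_n$ is chosen as in remark 4 following Theorem 10, then $n^{1/2} h_n^2 = C n^{-1/6} \to 0$ as $n \to \infty$ and $\sqrt{n}(\hat F(x) - F(x)) \to_d N(0, F(x)(1 - F(x)))$. Hence, under a suitable choice for the order of the bandwidth, $F_n$ and $\hat F$ have the same asymptotic distribution.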
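The kernel distribution function estimator and its relation to the empirical distribution function can be illustrated with a short sketch. It assumes a Gaussian kernel, for which $G$ is the standard normal distribution function; the function names, the simulated sample and the bandwidth $n^{-1/3}$ (the rate from remark 4 following Theorem 10, with the constant set to one) are choices made only for this example.

```python
import math
import random


def std_normal_cdf(z):
    """G(z) for the Gaussian kernel, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kernel_cdf(x, sample, h):
    """F_hat(x) = (1/n) * sum_i [1 - G((X_i - x) / h)]."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    return sum(1.0 - std_normal_cdf((xi - x) / h) for xi in sample) / len(sample)


def empirical_cdf(x, sample):
    """F_n(x) = (1/n) * sum_i I(X_i <= x)."""
    return sum(1 for xi in sample if xi <= x) / len(sample)


if __name__ == "__main__":
    random.seed(0)
    data = [random.gauss(0.0, 1.0) for _ in range(200)]  # simulated sample
    h = len(data) ** (-1.0 / 3.0)
    for x in (-1.0, 0.0, 1.0):
        print(x, empirical_cdf(x, data), kernel_cdf(x, data, h), std_normal_cdf(x))
```

Unlike the step function $F_n$, the kernel version is smooth in $x$, which is the advantage noted in remark 3.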

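3 Multivariate density estimation

Let $X_i(\omega) : \Omega \to \mathbb{R}^D$ for $i = 1, 2, \dots$, with $D$ a finite positive integer, be a sequence of random vectors defined on the probability space $(\Omega, \mathcal{F}, P)$ and suppose that the elements of the sequence are identically and independently distributed. We assume that $F(x) = \int_{u \le x} f(u)\,du$, where $x \in \mathbb{R}^D$ and the inequality $X_i \le x$ is taken element-wise. As in the univariate case, $f$ is called the density function of $X_i$.

Definition 5. Let $H_n = \mathrm{diag}\{h_{d,n}\}_{d=1}^{D}$ be a nonstochastic matrix with $0 < h_{d,n}$ for all $d$, let $x \in \mathbb{R}^D$, and let $K : \mathbb{R}^D \to \mathbb{R}$ be a measurable function. Then, we define
$$\hat f(x) = \frac{1}{n \det(H_n)}\sum_{i=1}^{n} K\left(H_n^{-1}(X_i - x)\right),$$
where $\det(H_n)$ denotes the determinant of the matrix $H_n$ and $H_n^{-1}$ denotes its inverse.

Remark. Since $0 < h_{d,n}$ for all $d$, $\det(H_n) > 0$ and $H_n^{-1}$ exists. Under the assumption that $K : \mathbb{R}^D \to \mathbb{R}$ is a measurable function and that $E\left|K\left(H_n^{-1}(X_i - x)\right)\right| < \infty$ and $E\left(K\left(H_n^{-1}(X_i - x)\right)\right)^2 < \infty$, it is straightforward to prove a multivariate version of Theorem 1. Hence, we have
$$E(\hat f(x)) = \int K(\gamma) f(x + H_n\gamma)\,d\gamma$$
and
$$V(\hat f(x)) = \frac{1}{n \det(H_n)}\int K^2(\gamma) f(x + H_n\gamma)\,d\gamma - \frac{1}{n}\left(\int K(\gamma) f(x + H_n\gamma)\,d\gamma\right)^2.$$

We now state an extension of Theorem 2. Throughout, $\|x\|_E$ represents the Euclidean norm of the vector $x$.

Theorem 12. Assume that: (a) $|K(\phi)| \le M$ for all $\phi \in \mathbb{R}^D$; (b) $\int_{\mathbb{R}^D} |K(\phi)|\,d\phi < \infty$; (c) $\|\phi\|_E |K(\phi)| \to 0$ as $\|\phi\|_E \to \infty$; (d) $0 < h_{d,n} \to 0$ for all $d$ as $n \to \infty$. Let $f(x) : \mathbb{R}^D \to \mathbb{R}$ be such that (e) $\int_{\mathbb{R}^D} |f(\phi)|\,d\phi < \infty$. Then, for every continuity point $x$ of $f$,
$$\frac{1}{\det(H_n)} E\left(K\left(H_n^{-1}(X_i - x)\right)\right) \to f(x)\int K(\phi)\,d\phi \quad \text{as } n \to \infty.$$

The remark that follows Theorem 2 applies to the multivariate case and therefore $\frac{1}{\det(H_n)} E\left(K^r\left(H_n^{-1}(X_i - x)\right)\right) \to f(x)\int K^r(\phi)\,d\phi$ as $n \to \infty$ for $r > 1$. Hence, we have the following multivariate version of Theorem 3.

Theorem 13. Let $\{X_i\}_{i=1}^{n}$ be a sequence of iid random vectors with density $f(x)$. Assume that (a) $|K(\gamma)| \le M$ for all $\gamma \in \mathbb{R}^D$; (b) $\int |K(\gamma)|\,d\gamma < \infty$ and $\int K(\gamma)\,d\gamma = 1$; (c) $\|\gamma\|_E |K(\gamma)| \to 0$ as $\|\gamma\|_E \to \infty$; (d) $h_{d,n} > 0$ for all $n$ and $h_{d,n} \to 0$ as $n \to \infty$. Then, $|E(\hat f(x)) - f(x)| \to 0$ as $n \to \infty$ and, if $n \det(H_n) \to \infty$ as $n \to \infty$, $V(\hat f(x)) \to 0$ as $n \to \infty$ for every $x$ which is a point of continuity of $f$.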
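Definition 5 translates directly into code when $H_n$ is diagonal. The following sketch assumes a product Gaussian kernel, which is one admissible choice of $K$ and is not prescribed by the notes; the function names and the sample points are hypothetical.

```python
import math


def product_gaussian(u):
    """Product of standard normal densities, one per coordinate of u."""
    d = len(u)
    return math.exp(-0.5 * sum(c * c for c in u)) / (2.0 * math.pi) ** (d / 2.0)


def multivariate_kde(x, sample, bandwidths, kernel=product_gaussian):
    """f_hat(x) = (1 / (n det H)) * sum_i K(H^{-1}(X_i - x)),
    with H = diag(bandwidths)."""
    if any(h <= 0 for h in bandwidths):
        raise ValueError("all bandwidths must be positive")
    det_h = math.prod(bandwidths)  # determinant of a diagonal matrix
    total = 0.0
    for xi in sample:
        # H^{-1}(X_i - x) is a coordinate-wise division for diagonal H.
        u = [(a - b) / h for a, b, h in zip(xi, x, bandwidths)]
        total += kernel(u)
    return total / (len(sample) * det_h)


if __name__ == "__main__":
    points = [(0.2, 1.1), (-0.5, 0.3), (1.4, 2.0), (0.9, 0.8)]  # made-up data
    print(multivariate_kde((0.5, 1.0), points, bandwidths=(0.6, 0.8)))
```

Because the normalization is $n \det(H_n)$, the number of observations effectively used at each point shrinks with the product of the bandwidths, which is why the conditions of Theorem 13 involve $n \det(H_n) \to \infty$.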

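Remarks:

1. Theorem 13 implies that $\hat f(x)$ converges in quadratic mean to $f(x)$ and consequently $\hat f(x) \to_p f(x)$.

2. Suppose $f$ satisfies a Lipschitz condition of order 1 in $\mathbb{R}^D$, i.e., for all $x, x' \in \mathbb{R}^D$ with $x \ne x'$ we have that, for some $C < \infty$, $|f(x) - f(x')| \le C\|x - x'\|_E$. Then
$$|E(\hat f(x)) - f(x)| \le \int |f(x + H_n\gamma) - f(x)||K(\gamma)|\,d\gamma \le C\int \|H_n\gamma\|_E |K(\gamma)|\,d\gamma \le C\int \sum_{d=1}^{D} h_{d,n}|\gamma_d||K(\gamma)|\,d\gamma,$$
since $\|x\|_E \le \sum_{d=1}^{D} |x_d|$ for all $x \in \mathbb{R}^D$,
$$= C\sum_{d=1}^{D} h_{d,n}\int |\gamma_d||K(\gamma)|\,d\gamma \le C\, \mathrm{trace}(H_n) \quad \text{if } \int |\gamma_d||K(\gamma)|\,d\gamma < C.$$
The condition that $\int |\gamma_d||K(\gamma)|\,d\gamma < C$ is verified if $K \ge 0$ and $\int |\gamma_d| K(\gamma)\,d\gamma < C$.

Let $D_{d_1,\dots,d_m} f(x) = \frac{\partial^m}{\partial x_{d_1} \cdots \partial x_{d_m}} f(x)$ for $d_1, \dots, d_m = 1, \dots, D$ whenever the partial derivatives on the right-hand side of the equality exist. We now define

Definition 6. Let $T \subset \mathbb{R}^D$ and let $\beta$ and $C$ be two positive numbers. The Hölder class $\Sigma(\beta, C)$ on $T$ is defined as the set of $\lfloor\beta\rfloor$ times differentiable functions $f : T \to \mathbb{R}$ whose partial derivatives of order $\lfloor\beta\rfloor$ satisfy
$$|D_{d_1,\dots,d_{\lfloor\beta\rfloor}} f(x) - D_{d_1,\dots,d_{\lfloor\beta\rfloor}} f(x')| \le C\|x - x'\|_E^{\beta - \lfloor\beta\rfloor}, \quad \forall\, x \ne x',\ x, x' \in T.$$

Theorem 14. Assume that $f \in \Sigma(\beta, C)$ for $\beta \ge 2$ and $K$ is a multivariate kernel of order $\lfloor\beta\rfloor$, i.e., $\int \gamma_{d_1} \cdots \gamma_{d_{\lfloor\beta\rfloor}} K(\gamma)\,d\gamma = 0$ for $d_1, \dots, d_{\lfloor\beta\rfloor} = 1, \dots, D$. Then, if $\int |K(\gamma)||\gamma_d|^{\beta}\,d\gamma < C$ for $d = 1, \dots, D$,
$$|E(\hat f(x)) - f(x)| = O\left(\sum_{d=1}^{D} h_{d,n}^{\beta}\right) = O\left(\mathrm{trace}(H_n^{\beta})\right). \tag{12}$$

Proof. Since $f \in \Sigma(\beta, C)$, by Taylor's Theorem, for some $\lambda \in [0, 1]$,
$$f(x + H_n\gamma) - f(x) = \sum_{d=1}^{D} D_d f(x) h_{d,n}\gamma_d + \frac{1}{2}\sum_{d_1=1}^{D}\sum_{d_2=1}^{D} D_{d_1,d_2} f(x) h_{d_1,n} h_{d_2,n}\gamma_{d_1}\gamma_{d_2} + \cdots$$
$$+ \frac{1}{(\lfloor\beta\rfloor - 1)!}\sum_{d_1=1}^{D} \cdots \sum_{d_{\lfloor\beta\rfloor - 1}=1}^{D} D_{d_1,\dots,d_{\lfloor\beta\rfloor - 1}} f(x)\, h_{d_1,n} \cdots h_{d_{\lfloor\beta\rfloor - 1},n}\, \gamma_{d_1} \cdots \gamma_{d_{\lfloor\beta\rfloor - 1}}$$
$$+ \frac{1}{\lfloor\beta\rfloor!}\sum_{d_1=1}^{D} \cdots \sum_{d_{\lfloor\beta\rfloor}=1}^{D} D_{d_1,\dots,d_{\lfloor\beta\rfloor}} f(x + \lambda H_n\gamma)\, h_{d_1,n} \cdots h_{d_{\lfloor\beta\rfloor},n}\, \gamma_{d_1} \cdots \gamma_{d_{\lfloor\beta\rfloor}}.$$
Since $K$ is of order $\lfloor\beta\rfloor$ and $f \in \Sigma(\beta, C)$,
$$E(\hat f(x)) - f(x) = \int (f(x + H_n\gamma) - f(x)) K(\gamma)\,d\gamma = \frac{1}{\lfloor\beta\rfloor!}\sum_{d_1=1}^{D} \cdots \sum_{d_{\lfloor\beta\rfloor}=1}^{D} \int D_{d_1,\dots,d_{\lfloor\beta\rfloor}} f(x + \lambda H_n\gamma)\, h_{d_1,n} \cdots h_{d_{\lfloor\beta\rfloor},n}\, \gamma_{d_1} \cdots \gamma_{d_{\lfloor\beta\rfloor}} K(\gamma)\,d\gamma$$
$$= \frac{1}{\lfloor\beta\rfloor!}\sum_{d_1=1}^{D} \cdots \sum_{d_{\lfloor\beta\rfloor}=1}^{D} \int \left(D_{d_1,\dots,d_{\lfloor\beta\rfloor}} f(x + \lambda H_n\gamma) - D_{d_1,\dots,d_{\lfloor\beta\rfloor}} f(x)\right) h_{d_1,n} \cdots h_{d_{\lfloor\beta\rfloor},n}\, \gamma_{d_1} \cdots \gamma_{d_{\lfloor\beta\rfloor}} K(\gamma)\,d\gamma.$$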

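By the Triangle Inequality, $f \in \Sigma(\beta, C)$ and the fact that $\lambda \in [0, 1]$, we have
$$|E(\hat f(x)) - f(x)| \le \frac{C}{\lfloor\beta\rfloor!}\sum_{d_1=1}^{D} \cdots \sum_{d_{\lfloor\beta\rfloor}=1}^{D} \int \|\lambda H_n\gamma\|_E^{\beta - \lfloor\beta\rfloor}\, h_{d_1,n} \cdots h_{d_{\lfloor\beta\rfloor},n}\, |\gamma_{d_1} \cdots \gamma_{d_{\lfloor\beta\rfloor}}||K(\gamma)|\,d\gamma$$
$$\le \frac{C}{\lfloor\beta\rfloor!}\sum_{d_1=1}^{D} \cdots \sum_{d_{\lfloor\beta\rfloor}=1}^{D} \int \left(\sum_{d=1}^{D} h_{d,n}^{\beta - \lfloor\beta\rfloor}|\gamma_d|^{\beta - \lfloor\beta\rfloor}\right) h_{d_1,n} \cdots h_{d_{\lfloor\beta\rfloor},n}\, |\gamma_{d_1} \cdots \gamma_{d_{\lfloor\beta\rfloor}}||K(\gamma)|\,d\gamma,$$
where the last inequality follows from the fact that $h_{d,n}, |\gamma_d| \ge 0$ and $\beta - \lfloor\beta\rfloor \le 1$. The largest power on $h_{d,n}$ occurs when $d_1 = \cdots = d_{\lfloor\beta\rfloor} = d$, and we have
$$|E(\hat f(x)) - f(x)| \le \frac{C_D}{\lfloor\beta\rfloor!}\sum_{d=1}^{D} h_{d,n}^{\beta}\int |\gamma_d|^{\beta}|K(\gamma)|\,d\gamma = O\left(\sum_{d=1}^{D} h_{d,n}^{\beta}\right),$$
provided $\int |K(\gamma)||\gamma_d|^{\beta}\,d\gamma < C$ and $h_{d,n} > 0$.

The following theorem is an extension of Theorem 6 to the multivariate setting.

Theorem 15. Assume that: (a) $\{X_i\}_{i=1}^{n}$ forms an iid sequence of random vectors with density $f$; (b) $|K(\gamma)| \le M$ for all $\gamma \in \mathbb{R}^D$; (c) $\int |K(\gamma)|\,d\gamma < \infty$; (d) $\|\gamma\|_E |K(\gamma)| \to 0$ as $\|\gamma\|_E \to \infty$; (e) $\int |K(\psi)|^{2+\delta}\,d\psi < \infty$ for some $\delta > 0$; (f) $h_{d,n} > 0$ for all $n$, $h_{d,n} \to 0$ and $n \det(H_n) \to \infty$ as $n \to \infty$. Then,
$$(n \det(H_n))^{1/2}\left(\hat f(x) - E(\hat f(x))\right) \to_d N\left(0, f(x)\int K^2(\psi)\,d\psi\right),$$
where $x$ is a point of continuity of $f$.

Remark. Theorem 15 shows that one of the consequences of estimating multivariate densities is that the rate of convergence in distribution slows exponentially as $D$ grows. This is known as the curse of dimensionality.

3.1 Multivariate Kernel Density Estimation Example

The following figure is simply an attempt to estimate a multivariate probability density function with a multivariate kernel. The figure also foreshadows the importance of our next topic, bandwidth selection. As the bottom graph shows, choosing a bandwidth which is too small can distort the estimation of the density function as much as choosing too large a bandwidth. The distribution estimated is the following: $(x, y) \sim N(\mu, \Sigma)$, where µ = [ 4 ], Σ = [ 3 ]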

4 Bandwidth Choice

4.1 Minimizing Integrated Squared Error

It should be apparent by now that bandwidth choice is an important part of nonparametric density estimation. Intuitively, we would like to select a sequence $h_n$ so that a suitably defined distance between $\hat f(x)$ and $f(x)$ is as small as possible over the set where the random variable $X_i$ takes values. We are here searching for a global criterion for the choice of $h_n$.

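Distances in function spaces can be defined in various ways, subject to some mathematical technicalities. Here, we search for the $h_n$ that will minimize
$$\int (f(x) - \hat f(x))^2\,dx. \tag{13}$$
In the statistical literature this integral is called the Integrated Squared Error (ISE) of $\hat f$, and we write $ISE(\hat f)$. Note that $ISE(\hat f)$ does not depend on where the estimator $\hat f$ is being evaluated, but it does depend on the sample $\{X_i\}_{i=1}^{n}$; hence $ISE(\hat f)$ is itself a random variable. It is clear that minimization of $ISE(\hat f)$ is equivalent to the minimization of
$$\int \hat f^2(x)\,dx - 2\int f(x)\hat f(x)\,dx.$$
From the definition of $\hat f(x)$,
$$\int \hat f^2(x)\,dx = \frac{1}{n^2 h_n}\sum_{i=1}^{n}\sum_{j=1}^{n} (K * K)\left(\frac{X_i - X_j}{h_n}\right),$$
where $(K * K)(u) = \int K(u - t) K(t)\,dt$. The integral $\int f(x)\hat f(x)\,dx$ clearly depends on $f$, which is unknown; hence, to render the criterion for choosing $h_n$ operational we need a suitable approximation for $\int f(x)\hat f(x)\,dx$. Note that
$$\int f(x)\hat f(x)\,dx = \frac{1}{n}\sum_{i=1}^{n} Z_i, \quad Z_i = \frac{1}{h_n}\int K\left(\frac{X_i - x}{h_n}\right) f(x)\,dx,$$
and that
$$E(Z_i) = \frac{1}{h_n}\int\int K\left(\frac{\alpha - x}{h_n}\right) f(x) f(\alpha)\,dx\,d\alpha = \frac{1}{h_n} E\left(K\left(\frac{\alpha - x}{h_n}\right)\right),$$
where the expectation is taken with respect to the joint density of two independent and identically distributed random variables. Note also that $\frac{1}{h_n} E\left(K\left(\frac{\alpha - x}{h_n}\right)\right)$ can be estimated by
$$\frac{1}{n(n-1) h_n}\sum_{i=1}^{n}\sum_{j=1, j \ne i}^{n} K\left(\frac{X_i - X_j}{h_n}\right),$$
suggesting the following approximation for $\int \hat f^2(x)\,dx - 2\int f(x)\hat f(x)\,dx$, which can be minimized with respect to $h_n$:
$$M(\hat f) = \frac{1}{n^2 h_n}\sum_{i=1}^{n}\sum_{j=1}^{n} (K * K)\left(\frac{X_j - X_i}{h_n}\right) - \frac{2}{n(n-1) h_n}\sum_{i=1}^{n}\sum_{j=1, j \ne i}^{n} K\left(\frac{X_i - X_j}{h_n}\right).$$
Since $K$ is known, the $h_n$ that minimizes $M(\hat f)$ can be chosen by a numerical optimization procedure.

Example: Suppose $K(\psi) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}\psi^2\right)$. Then, $(K * K)(\psi) = \frac{1}{\sqrt{2\pi}\sqrt{2}}\exp\left(-\frac{1}{4}\psi^2\right)$, and
$$M(\hat f) = \frac{1}{n^2 h_n}\sum_{i=1}^{n}\sum_{j=1}^{n} \frac{1}{\sqrt{2\pi}\sqrt{2}}\exp\left(-\frac{1}{4}\left(\frac{X_i - X_j}{h_n}\right)^2\right) - \frac{2}{n(n-1) h_n}\sum_{t=1}^{n}\sum_{i=1, i \ne t}^{n} K\left(\frac{X_t - X_i}{h_n}\right).$$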
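For the Gaussian kernel of this example, the criterion $M(\hat f)$ can be evaluated directly and minimized over a grid of candidate bandwidths. The sketch below assumes the $n(n-1)$ normalization of the leave-one-out term and uses a plain grid search in place of a proper numerical optimizer; the function names, the simulated sample and the grid are all illustrative choices.

```python
import math
import random


def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def gaussian_conv(u):
    """(K * K)(u) for the Gaussian kernel, i.e. the N(0, 2) density."""
    return math.exp(-0.25 * u * u) / (math.sqrt(2.0 * math.pi) * math.sqrt(2.0))


def cv_criterion(h, sample):
    """M(f_hat): the integral of f_hat^2 minus twice a leave-one-out
    estimate of the integral of f * f_hat."""
    n = len(sample)
    first = 0.0
    second = 0.0
    for i, xi in enumerate(sample):
        for j, xj in enumerate(sample):
            u = (xi - xj) / h
            first += gaussian_conv(u)
            if i != j:
                second += gaussian(u)
    return first / (n * n * h) - 2.0 * second / (n * (n - 1) * h)


def select_bandwidth(sample, grid):
    """Grid search for the bandwidth with the smallest criterion value."""
    return min(grid, key=lambda h: cv_criterion(h, sample))


if __name__ == "__main__":
    random.seed(1)
    data = [random.gauss(0.0, 1.0) for _ in range(150)]  # simulated sample
    grid = [0.05 * k for k in range(2, 31)]               # 0.10, 0.15, ..., 1.50
    print("selected bandwidth:", select_bandwidth(data, grid))
```

The double loop costs $O(n^2)$ kernel evaluations per candidate bandwidth, which is acceptable for small samples but would need a faster scheme for large ones.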

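4.2 Minimizing Mean Integrated Squared Error

One of the remarks made regarding ISE was that it depends on the sample $\{X_i\}_{i=1}^{n}$. One can imagine a process of repeated sampling that induces a distribution on ISE. Hence, we might consider
$$MISE(\hat f) = E\int (\hat f(x) - f(x))^2\,dx = E\int (\hat f(x) - E(\hat f(x)) + E(\hat f(x)) - f(x))^2\,dx$$
$$= \int E\left\{(\hat f(x) - E(\hat f(x)))^2 + (E(\hat f(x)) - f(x))^2 + 2(\hat f(x) - E(\hat f(x)))(E(\hat f(x)) - f(x))\right\} dx = \int \left\{V(\hat f(x)) + (E(\hat f(x)) - f(x))^2\right\} dx.$$
Note that $V(\hat f(x)) = \frac{1}{n h_n} f(x)\int K^2(\psi)\,d\psi + o\left(\frac{1}{n h_n}\right)$ and, if $f \in \Sigma(2, C)$, we have that $E(\hat f(x)) - f(x) = \frac{h_n^2}{2}\mu_2 f^{(2)}(x) + o(h_n^2)$ with $\mu_2 = \int x^2 K(x)\,dx$. Hence, $MISE(\hat f)$ can be written as
$$MISE(\hat f) = \frac{1}{n h_n}\int K^2(\psi)\,d\psi + \frac{h_n^4}{4}\mu_2^2\int (f^{(2)}(x))^2\,dx + o\left(\frac{1}{n h_n}\right) + o(h_n^4).$$
Let $\int K^2(\psi)\,d\psi = \lambda_2$ and $\mu_2^2\int (f^{(2)}(x))^2\,dx = \lambda_1$, so that
$$MISE(\hat f) = \frac{\lambda_2}{n h_n} + \frac{h_n^4}{4}\lambda_1 + o\left(\frac{1}{n h_n}\right) + o(h_n^4).$$
Let the $h_n$ that minimizes the first two terms of $MISE(\hat f)$, which we call the asymptotic MISE (AMISE), be denoted by $\hat h_n$. It satisfies $-\frac{\lambda_2}{n \hat h_n^2} + \hat h_n^3\lambda_1 = 0$, or $\hat h_n^5\lambda_1 = \frac{\lambda_2}{n}$, which implies
$$\hat h_n = \left(\frac{\lambda_2}{\lambda_1}\right)^{1/5} n^{-1/5} = \left(\int K^2(\psi)\,d\psi\right)^{1/5}\left(\int \psi^2 K(\psi)\,d\psi\right)^{-2/5}\left(\int (f^{(2)}(x))^2\,dx\right)^{-1/5} n^{-1/5}.$$

Remarks:

1. Although $\lambda_2$ can be obtained from knowledge of $K$, $\lambda_1$ depends on $f^{(2)}(x)$. Hence, in practice $\hat h_n$ depends on obtaining an estimate for $\int (f^{(2)}(x))^2\,dx$.

2. Taking $\left(\frac{\lambda_2}{\lambda_1}\right)^{1/5} = c$ we see that $\hat h_n \propto n^{-1/5}$. Hence, according to the AMISE criterion, there is an optimal rate of convergence of the bandwidth to zero, namely $n^{-1/5}$. Recall that $(n h_n)^{1/2}(\hat f(x) - f(x)) = (n h_n)^{1/2}(\hat f(x) - E(\hat f(x))) + (n h_n)^{1/2}(E(\hat f(x)) - f(x))$ and, since the first term after the equality converges in distribution to a normal distribution, normality of the left side of the equality results if $(n h_n)^{1/2}(E(\hat f(x)) - f(x))$ approaches zero as $n \to \infty$. But
$$(n h_n)^{1/2}(E(\hat f(x)) - f(x)) = n^{1/2} h_n^{5/2}\frac{1}{2}\mu_2 f^{(2)}(x) + n^{1/2} h_n^{5/2}\frac{R_n}{h_n^2},$$
where $R_n = o(h_n^2)$. Hence, for $(n h_n)^{1/2}(E(\hat f(x)) - f(x))$ to vanish asymptotically, we need $n h_n^5 \to 0$ as $n \to \infty$, which implies that $h_n \propto n^{\delta}$ where $-1 < \delta < -1/5$. Consequently, asymptotic normality of $(n h_n)^{1/2}(\hat f(x) - f(x))$ centered at zero cannot result when the optimal choice is based on AMISE; undersmoothing, that is, a bandwidth that shrinks faster than $n^{-1/5}$, is necessary.

3. It should be clear that both minimization of the cross validation function and AMISE can be extended to multivariate density estimation with little conceptual effort. For a random vector $X : \Omega \to \mathbb{R}^D$ we have
$$M(\hat f) = \frac{1}{n^2 \det(H_n)}\sum_{i=1}^{n}\sum_{j=1}^{n} (K * K)\left(H_n^{-1}(X_j - X_i)\right) - \frac{2}{n(n-1)\det(H_n)}\sum_{i=1}^{n}\sum_{j=1, j \ne i}^{n} K\left(H_n^{-1}(X_i - X_j)\right).$$
Also, in the case where $f \in \Sigma(2, C)$, $\int \gamma_d K(\gamma)\,d\gamma = 0$ and $\int \gamma_{d_1}\gamma_{d_2} K(\gamma)\,d\gamma = 0$ when $d_1 \ne d_2$, we can write
$$AMISE(\hat f) = \frac{1}{n \det(H_n)}\int K^2(\psi)\,d\psi + \frac{\mu_2^2}{4}\int \left(\sum_{i=1}^{D} D_{ii} f(x) h_{i,n}^2\right)^2 dx = \frac{1}{n \det(H_n)}\int K^2(\psi)\,d\psi + \frac{\mu_2^2}{4}\sum_{i=1}^{D}\sum_{j=1}^{D} h_{i,n}^2 h_{j,n}^2\int D_{ii} f(x) D_{jj} f(x)\,dx.$$
Taking derivatives with respect to each of the elements in $H_n$ we have that $\hat H_n$ must satisfy
$$\mu_2^2\left(\hat h_{d,n}^3\int (D_{dd} f(x))^2\,dx + \hat h_{d,n}\sum_{i=1, i \ne d}^{D} \hat h_{i,n}^2\int D_{dd} f(x) D_{ii} f(x)\,dx\right) = \frac{\lambda_2}{n\, \hat h_{d,n}\det(\hat H_n)}$$
for all $d$. When $H_n = h_n I_D$ we have
$$\hat h_n^{D+4} = \frac{D\lambda_2}{n\, \mu_2^2\sum_{i=1}^{D}\sum_{d=1}^{D}\int D_{dd} f(x) D_{ii} f(x)\,dx},$$
which implies that $\hat h_n \propto n^{-1/(D+4)}$.

4. The simplest procedure to estimate $\int (f^{(2)}(x))^2\,dx$ is to assume, at this stage, that $f(x)$ belongs to a parametric family of densities. For example, if we assume that $X_i \sim N(\mu, \sigma^2)$, then
$$h_{AMISE} = \begin{cases} 1.06\,\sigma n^{-1/5} & \text{if } K \text{ is a Gaussian kernel} \\ 2.34\,\sigma n^{-1/5} & \text{if } K \text{ is an Epanechnikov kernel} \\ 1.84\,\sigma n^{-1/5} & \text{if } K \text{ is a uniform kernel} \end{cases}$$
where the constants are rounded to two decimals and $\sigma$ can be estimated by the sample standard deviation. The resulting bandwidths are normally referred to as rule-of-thumb bandwidths.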
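The rule-of-thumb constants follow from the formula for $\hat h_n$ once $\int (f^{(2)}(x))^2\,dx$ is evaluated for a normal density, where it equals $\frac{3}{8\sqrt{\pi}\sigma^5}$ (a standard result that is not derived in these notes). The sketch below recomputes the constants from $\int K^2$ and $\mu_2$ for three kernels; the dictionary, the function names and the sample are assumptions made for illustration.

```python
import math
import statistics

# (integral of K^2, second moment mu_2) for each kernel.
KERNEL_CONSTANTS = {
    "gaussian": (1.0 / (2.0 * math.sqrt(math.pi)), 1.0),
    "epanechnikov": (3.0 / 5.0, 1.0 / 5.0),
    "uniform": (1.0 / 2.0, 1.0 / 3.0),
}


def rule_of_thumb_constant(kernel):
    """c such that h = c * sigma * n^(-1/5) minimizes AMISE when the data
    are normal: c^5 = (int K^2) * 8 sqrt(pi) / (3 mu_2^2)."""
    roughness, mu2 = KERNEL_CONSTANTS[kernel]
    return (8.0 * math.sqrt(math.pi) * roughness / (3.0 * mu2 ** 2)) ** 0.2


def rule_of_thumb_bandwidth(sample, kernel="gaussian"):
    """Plug the sample standard deviation into the normal-reference rule."""
    sigma = statistics.stdev(sample)
    return rule_of_thumb_constant(kernel) * sigma * len(sample) ** (-0.2)


if __name__ == "__main__":
    for name in KERNEL_CONSTANTS:
        print(name, round(rule_of_thumb_constant(name), 3))
    data = [2.1, 3.4, 1.9, 4.2, 3.3, 2.8, 3.9, 2.5]  # hypothetical sample
    print("bandwidth:", rule_of_thumb_bandwidth(data))
```

The Epanechnikov and uniform constants are larger than the Gaussian one because those kernels have smaller second moments, so a wider window is needed to produce the same amount of smoothing.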

References

Bennett, G., 1962. Probability inequalities for the sum of independent random variables. Journal of the American Statistical Association 57.

Davidson, J., 1994. Stochastic Limit Theory. Oxford University Press, New York.
