Lecture 15: Density estimation

Why do we estimate a density?
Suppose that X_1, ..., X_n are i.i.d. random variables from F and that F is unknown but has a Lebesgue p.d.f. f. Estimation of F can be done by estimating f. Note that the estimators of F derived in §5.1.1 and §5.1.2 do not have Lebesgue p.d.f.'s. Having a density estimator f̂, F can be estimated by F̂(x) = ∫_{−∞}^{x} f̂(t) dt, which may be better than F_n. Also, f itself may be of interest.

Difference quotient
Since f(t) = F′(t) a.e., a simple estimator of f(t) is the difference quotient
    f_n(t) = [F_n(t + λ_n) − F_n(t − λ_n)] / (2λ_n),  t ∈ R,
where F_n is the empirical c.d.f. and {λ_n} is a sequence of positive constants.

UW-Madison (Statistics), Stat 710, Lecture 15, Jan 2018
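The difference-quotient estimator is easy to implement directly from the empirical c.d.f. The sketch below is a minimal illustration; the sample, the bandwidth lam, and the evaluation point are arbitrary choices, not from the lecture.

```python
import numpy as np

def ecdf(sample, x):
    """Empirical c.d.f.: F_n(x) = (1/n) * #{i : X_i <= x}."""
    return np.mean(sample <= x)

def diff_quotient(sample, t, lam):
    """Difference-quotient density estimate
    f_n(t) = [F_n(t + lam) - F_n(t - lam)] / (2 * lam)."""
    return (ecdf(sample, t + lam) - ecdf(sample, t - lam)) / (2 * lam)

# Illustration: standard normal sample; f(0) = 1/sqrt(2*pi) ~ 0.3989
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
f0 = diff_quotient(x, t=0.0, lam=0.3)
```

Note that 2nλ_n f_n(t) is simply the number of observations falling in (t − λ_n, t + λ_n], which is the binomial count exploited in the analysis of this estimator.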
Properties of the difference quotient
Since 2nλ_n f_n(t) has the binomial distribution Bi(F(t + λ_n) − F(t − λ_n), n),
E[f_n(t)] → f(t) if λ_n → 0 as n → ∞, and Var(f_n(t)) → 0 if λ_n → 0 and nλ_n → ∞.
Thus, we should choose λ_n converging to 0 slower than n^{−1}.
If we assume that λ_n → 0, nλ_n → ∞, and f is continuously differentiable at t, then it can be shown (exercise) that
    mse_{f_n(t)}(F) = [f(t)/(2nλ_n)] [1 + o(1)] + O(λ_n^2)
and, under the additional condition that nλ_n^3 → 0,
    √(nλ_n) [f_n(t) − f(t)] →_d N(0, f(t)/2).
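The variance term f(t)/(2nλ_n) in the mse expansion above can be checked by a quick simulation using the binomial-count representation of 2nλ_n f_n(t). The sketch below uses arbitrary choices of n, λ_n, and the number of replications.

```python
import numpy as np

rng = np.random.default_rng(1)
n, lam, t, reps = 2000, 0.1, 0.0, 2000
f_true = 1 / np.sqrt(2 * np.pi)      # N(0,1) density at t = 0

# 2*n*lam*f_n(t) = #{i : t - lam < X_i <= t + lam} ~ Bi(F(t+lam) - F(t-lam), n)
ests = np.empty(reps)
for r in range(reps):
    x = rng.standard_normal(n)
    count = np.sum((x > t - lam) & (x <= t + lam))
    ests[r] = count / (2 * n * lam)

var_mc = ests.var()                  # Monte Carlo variance of f_n(t)
var_theory = f_true / (2 * n * lam)  # leading variance term f(t)/(2*n*lam)
```

The two variances should agree up to a factor 1 + o(1); the average of ests minus f_true estimates the bias, which is O(λ_n^2) here because the quotient is symmetric about t.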
Kernel density estimators
A useful class of estimators is the class of kernel density estimators of the form
    f̂(t) = (1/(nλ_n)) Σ_{i=1}^{n} w((t − X_i)/λ_n),
where w is a known Lebesgue p.d.f. on R and is called the kernel.
If we choose w(t) = ½ I_{[−1,1]}(t), then f̂(t) is essentially the same as the so-called histogram.

Properties of the kernel density estimator
f̂ is a Lebesgue density on R, since
    ∫ f̂(t) dt = (1/(nλ_n)) Σ_{i=1}^{n} ∫ w((t − X_i)/λ_n) dt = ∫ w(y) dy = 1.
The bias of f̂(t) as an estimator of f(t) is
    E[f̂(t)] − f(t) = (1/λ_n) ∫ w((t − z)/λ_n) f(z) dz − f(t) = ∫ w(y) [f(t − λ_n y) − f(t)] dy.
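The kernel estimator above translates directly into code. The following NumPy sketch uses the histogram kernel from the slide as the default; the sample, grid, and bandwidth are illustrative choices.

```python
import numpy as np

def hist_kernel(u):
    """w(t) = (1/2) I_[-1,1](t), the histogram kernel from the slide."""
    return 0.5 * (np.abs(u) <= 1)

def kde(sample, t, lam, w=hist_kernel):
    """Kernel density estimate f_hat(t) = (1/(n*lam)) * sum_i w((t - X_i)/lam)."""
    t = np.atleast_1d(t).astype(float)
    u = (t[:, None] - sample[None, :]) / lam      # shape (len(t), n)
    return np.mean(w(u), axis=1) / lam

# Illustration on an arbitrary N(0,1) sample
rng = np.random.default_rng(2)
x = rng.standard_normal(800)
grid = np.linspace(-3, 3, 61)                     # spacing 0.1
fhat = kde(x, grid, lam=0.4)
```

Because w is itself a p.d.f., f̂ is a Lebesgue density: a Riemann sum of fhat over a grid covering the data should be close to 1.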
Properties of the kernel density estimator
If f is bounded and continuous at t, then, by the dominated convergence theorem, the bias of f̂(t) converges to 0 as λ_n → 0.
If f′ is bounded and continuous at t and ∫ |t| w(t) dt < ∞, then the bias of f̂(t) is O(λ_n).
If f is bounded and continuous at t and w_0 = ∫ [w(t)]^2 dt < ∞, then the variance of f̂(t) is
    Var(f̂(t)) = (1/(nλ_n^2)) Var(w((t − X_1)/λ_n))
               = (1/(nλ_n^2)) ∫ [w((t − z)/λ_n)]^2 f(z) dz − (1/n) [(1/λ_n) ∫ w((t − z)/λ_n) f(z) dz]^2
               = (1/(nλ_n)) ∫ [w(y)]^2 f(t − λ_n y) dy + O(1/n)
               = w_0 f(t)/(nλ_n) + o(1/(nλ_n)).
Properties of the kernel density estimator
Hence, if λ_n → 0, nλ_n → ∞, and f is bounded and continuous at t, then
    mse_{f̂(t)}(F) = w_0 f(t)/(nλ_n) [1 + o(1)] + O(λ_n^2).
If λ_n → 0, nλ_n → ∞, f is bounded and continuous at t, and w_0 = ∫ [w(t)]^2 dt < ∞, then
    √(nλ_n) { f̂(t) − E[f̂(t)] } →_d N(0, w_0 f(t)).
This can be shown as follows. Let Y_i = w((t − X_i)/λ_n). Then Y_1, ..., Y_n are independent and identically distributed with
    E(Y_1) = ∫ w((t − x)/λ_n) f(x) dx = λ_n ∫ w(y) f(t − λ_n y) dy = O(λ_n)
and
Properties of the kernel density estimator
    Var(Y_1) = ∫ [w((t − x)/λ_n)]^2 f(x) dx − [∫ w((t − x)/λ_n) f(x) dx]^2
             = λ_n ∫ [w(y)]^2 f(t − λ_n y) dy + O(λ_n^2)
             = λ_n w_0 f(t) + o(λ_n),
since f is bounded and continuous at t and w_0 = ∫ [w(t)]^2 dt < ∞. Then
    Var(f̂(t)) = (1/(n^2 λ_n^2)) Σ_{i=1}^{n} Var(Y_i) = w_0 f(t)/(nλ_n) + o(1/(nλ_n)).
Note that f̂(t) − E[f̂(t)] = Σ_{i=1}^{n} [Y_i − E(Y_i)] / (nλ_n).
To apply Lindeberg's central limit theorem to f̂(t), we find, for ε > 0,
    E(Y_1^2 I_{|Y_1 − E(Y_1)| > ε√(nλ_n)}) = λ_n ∫_{|w(y) − E(Y_1)| > ε√(nλ_n)} [w(y)]^2 f(t − λ_n y) dy,
which, even after division by λ_n, converges to 0 under the given conditions, since the region of integration shrinks to a null set as nλ_n → ∞.
Properties of the kernel density estimator
This proves √(nλ_n) { f̂(t) − E[f̂(t)] } →_d N(0, w_0 f(t)).
Furthermore,
    E[f̂(t)] − f(t) = λ_n^{−1} E(Y_1) − f(t) = ∫ w(y) [f(t − λ_n y) − f(t)] dy = −λ_n ∫ y w(y) f′(ξ_{t,y,n}) dy,
where |ξ_{t,y,n} − t| ≤ λ_n |y|.
If f′ is bounded and continuous at t, ∫ |t| w(t) dt < ∞, and nλ_n^3 → 0, then
    √(nλ_n) { E[f̂(t)] − f(t) } = O(√(nλ_n^3)) → 0
and
    √(nλ_n) { f̂(t) − f(t) } →_d N(0, w_0 f(t)).
Example 5.4
An i.i.d. sample of size n = 200 was generated from N(0,1). Density curve estimates, the difference quotient f_n and the kernel estimate f̂, are plotted in Figure 5.1 together with the curve of the true p.d.f. For the kernel estimate, w(t) = ½ e^{−|t|} is used and λ_n = 0.4. From Figure 5.1, it seems that the kernel estimate is much better than the difference quotient.

[Figure 5.1. Density estimates in Example 5.4: the true p.d.f. f(t), the difference-quotient estimator (5.26), and the kernel estimator (5.29), plotted for t ∈ [−2, 2].]
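Example 5.4 is easy to reproduce in outline. The sketch below draws a fresh N(0,1) sample of size 200 and applies the double-exponential kernel with λ_n = 0.4; since the random sample differs from the one behind Figure 5.1, the resulting curve will not match the figure exactly.

```python
import numpy as np

rng = np.random.default_rng(5)
n, lam = 200, 0.4
x = rng.standard_normal(n)                    # i.i.d. sample from N(0,1)

def w(u):
    """Double-exponential kernel from the example: w(t) = (1/2) exp(-|t|)."""
    return 0.5 * np.exp(-np.abs(u))

def true_pdf(t):
    return np.exp(-t ** 2 / 2) / np.sqrt(2 * np.pi)

grid = np.linspace(-2, 2, 41)
fhat = np.array([np.mean(w((t - x) / lam)) / lam for t in grid])
max_err = np.max(np.abs(fhat - true_pdf(grid)))
```

Plotting fhat against true_pdf over the grid reproduces the qualitative message of Figure 5.1: at this sample size the kernel estimate tracks the true density closely.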
Nonparametric regression
Assume that x_i is a realization of a univariate random variable X_i, and we want to estimate the regression function
    µ(t) = E(Y_i | X_i = t)
based on a random sample (Y_1, X_1), ..., (Y_n, X_n) from a population with a p.d.f. f(x, y).
In nonparametric regression, we do not specify any form of µ(t) except that it is a smooth function of t.
A nonparametric estimator of µ(t) based on a kernel w(t) is
    µ̂(t) = [Σ_{i=1}^{n} Y_i w((t − X_i)/λ_n)] / [Σ_{i=1}^{n} w((t − X_i)/λ_n)],  t ∈ R.
From the previous discussion on the kernel estimation of the p.d.f. of X_i, f(t), the denominator divided by nλ_n converges in probability to f(t) if λ_n → 0 and nλ_n → ∞. Hence, for the consistency of µ̂(t) as an estimator of µ(t), it suffices to show that, for any t ∈ R,
    h_n(t) = (1/(nλ_n)) Σ_{i=1}^{n} Y_i w((t − X_i)/λ_n)
converges in probability to ∫ y f(t, y) dy.
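A minimal sketch of this kernel regression estimator follows; the Gaussian kernel, the sine regression function, and all constants are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def kernel_regression(xs, ys, t, lam, w=None):
    """mu_hat(t) = sum_i Y_i w((t - X_i)/lam) / sum_i w((t - X_i)/lam)."""
    if w is None:
        w = lambda u: np.exp(-u ** 2 / 2)   # Gaussian kernel (illustrative choice)
    weights = w((t - xs) / lam)
    return np.sum(weights * ys) / np.sum(weights)

# Illustration with mu(t) = sin(t) and a uniform design on [-2, 2]
rng = np.random.default_rng(3)
xs = rng.uniform(-2, 2, 500)
ys = np.sin(xs) + 0.2 * rng.standard_normal(500)
mu_hat = kernel_regression(xs, ys, t=1.0, lam=0.3)   # true value sin(1) ~ 0.841
```

Note that the factor 1/(nλ_n) cancels between numerator and denominator, which is why it does not appear in the code.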
Consider first the expectation:
    E[h_n(t)] = (1/λ_n) E[Y_i w((t − X_i)/λ_n)] = (1/λ_n) ∫∫ y w((t − x)/λ_n) f(x, y) dx dy = ∫∫ y w(z) f(t − λ_n z, y) dz dy.
Suppose that f(x, y) is continuous in x and f(x, y) ≤ c(y) g(y), where g(y) is the p.d.f. of Y_i and c(y) is a function of y satisfying
    E[|Y_i| c(Y_i)] = ∫ |y| c(y) g(y) dy < ∞.
Then, if λ_n → 0 as n → ∞, by the dominated convergence theorem,
    lim_n E[h_n(t)] = lim_n ∫∫ y w(z) f(t − λ_n z, y) dz dy = ∫∫ y w(z) f(t, y) dz dy = ∫ w(z) dz ∫ y f(t, y) dy = ∫ y f(t, y) dy.
Thus, it remains to show that the variance of h_n(t) converges to 0 under some conditions.
For the variance,
    Var(h_n(t)) = (1/(nλ_n^2)) Var(Y_i w((t − X_i)/λ_n))
                ≤ (1/(nλ_n^2)) E[Y_i w((t − X_i)/λ_n)]^2
                = (1/(nλ_n^2)) ∫∫ y^2 [w((t − x)/λ_n)]^2 f(x, y) dx dy
                = (1/(nλ_n)) ∫∫ y^2 [w(z)]^2 f(t − λ_n z, y) dz dy.
Suppose that f(x, y) is continuous in x and f(x, y) ≤ c(y) g(y), where g(y) is the p.d.f. of Y_i and c(y) is a function of y satisfying
    E[Y_i^2 c(Y_i)] = ∫ y^2 c(y) g(y) dy < ∞,
and w_0 = ∫ [w(z)]^2 dz < ∞. Then
    lim_n ∫∫ y^2 [w(z)]^2 f(t − λ_n z, y) dz dy = ∫∫ y^2 [w(z)]^2 f(t, y) dz dy = ∫ [w(z)]^2 dz ∫ y^2 f(t, y) dy < ∞.
Hence, the variance of h_n(t) converges to 0 if nλ_n → ∞.
Under some more conditions, similar to the estimation of f(t), we can show that, for any t ∈ R and some function σ^2(t),
    √(nλ_n) [µ̂(t) − µ(t)] converges in distribution to N(0, σ^2(t)).
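The limit of h_n(t) can also be checked numerically. In the sketch below, (X, Y) is taken to be bivariate with X ~ N(0,1) and Y = X + noise, so that ∫ y f(t, y) dy = f_X(t) E(Y | X = t) = φ(t)·t, where φ is the N(0,1) density; all constants are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)
n, lam, t = 20000, 0.1, 1.0
xs = rng.standard_normal(n)
ys = xs + 0.5 * rng.standard_normal(n)       # E(Y | X = t) = t

def w(u):
    """Histogram kernel w(t) = (1/2) I_[-1,1](t)."""
    return 0.5 * (np.abs(u) <= 1)

# h_n(t) = (1/(n*lam)) * sum_i Y_i w((t - X_i)/lam)
h_n = np.sum(ys * w((t - xs) / lam)) / (n * lam)

# Target: int y f(t, y) dy = f_X(t) * E(Y | X = t) = phi(t) * t
target = np.exp(-t ** 2 / 2) / np.sqrt(2 * np.pi) * t
```

With nλ_n = 2000, h_n should already be close to the target, consistent with the expectation and variance calculations above.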