Density Estimation. Chapter 1


Chapter 1

Density Estimation

The estimation of probability density functions (PDFs) and cumulative distribution functions (CDFs) are cornerstones of applied data analysis in the social sciences. Testing for the equality of two distributions (or moments thereof) is perhaps the most basic test in all of applied data analysis. Economists, for instance, devote a great deal of attention to the study of income distributions and how they vary across regions and over time. Though the PDF and CDF are often the objects of direct interest, their estimation also serves as an important building block for other objects being modeled such as a conditional mean (i.e., a "regression function"), which may be directly modeled using nonparametric or semiparametric methods (a conditional mean is a function of a conditional PDF, which is itself a ratio of unconditional PDFs). After mastering the principles underlying the nonparametric estimation of a PDF, the nonparametric estimation of the workhorse of applied data analysis, the conditional mean function considered in Chapter 2, progresses in a fairly straightforward manner. Careful study of the approaches developed in Chapter 1 will be most helpful for understanding material presented in later chapters.

We begin with the estimation of a univariate PDF in Sections 1.1 through 1.3, turn to the estimation of a univariate CDF in Sections 1.4 and 1.5, and then move on to the more general multivariate setting in Sections 1.6 through 1.8. Asymptotic normality, uniform rates of convergence, and bias reduction methods appear in Sections 1.9 through 1.12. Numerous illustrative applications appear in Section 1.13, while theoretical and applied exercises can be found in Section 1.14. We now proceed with a discussion of how to estimate the PDF

f_X(x) of a random variable X. For notational simplicity we drop the subscript X and simply use f(x) to denote the PDF of X. Some of the treatments of the kernel estimation of a PDF discussed in this chapter are drawn from the two excellent monographs by Silverman (1986) and Scott (1992).

1.1 Univariate Density Estimation

To best appreciate why one might consider using nonparametric methods to estimate a PDF, we begin with an illustrative example, the parametric estimation of a PDF.

Example 1.1. Suppose X_1, X_2, ..., X_n represent independent and identically distributed (i.i.d.) draws from a normal distribution with mean µ and variance σ². We wish to estimate the normal PDF f(x). By assumption, f(x) has a known parametric functional form (i.e., univariate normal) given by

f(x) = (2πσ²)^{-1/2} exp[-(x - µ)²/(2σ²)],

where the mean µ = E(X) and variance σ² = E[(X - E(X))²] = var(X) are the only unknown parameters to be estimated. One could estimate µ and σ² by the method of maximum likelihood as follows. Under the i.i.d. assumption, the joint PDF of (X_1, ..., X_n) is simply the product of the univariate PDFs, which may be written as

f(X_1, ..., X_n) = ∏_{i=1}^n (2πσ²)^{-1/2} e^{-(X_i - µ)²/(2σ²)} = (2πσ²)^{-n/2} e^{-(1/(2σ²)) ∑_{i=1}^n (X_i - µ)²}.

Conditional upon the observed sample and taking the logarithm, this gives us the log-likelihood function

L(µ, σ²) ≡ ln f(X_1, ..., X_n; µ, σ²) = -(n/2) ln(2π) - (n/2) ln σ² - (1/(2σ²)) ∑_{i=1}^n (X_i - µ)².

The method of maximum likelihood proceeds by choosing those parameters that make it most likely that we observed the sample at hand given our distributional assumption. Thus, the likelihood function (or a monotonic transformation thereof, e.g., ln) expresses the plausibility of different values of µ and σ² given the observed sample. We then maximize the likelihood function with respect to these two unknown parameters.
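A minimal sketch of Example 1.1 in Python: maximizing the log-likelihood above yields the sample mean and the average squared deviation (with a 1/n divisor) as the estimates of µ and σ², which can then be plugged into the normal PDF. The simulated sample and the seed below are ours, for illustration only, not from the text.

```python
import math
import random

def normal_mle(x):
    """Maximum likelihood estimates for an i.i.d. normal sample:
    mu_hat = sample mean, sigma2_hat = average squared deviation
    (note the 1/n divisor rather than 1/(n-1))."""
    n = len(x)
    mu = sum(x) / n
    sigma2 = sum((xi - mu) ** 2 for xi in x) / n
    return mu, sigma2

def normal_pdf(x, mu, sigma2):
    """Parametric density estimate f_hat(x) with the MLEs plugged in."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

random.seed(42)
sample = [random.gauss(0.0, 1.0) for _ in range(5000)]
mu_hat, sigma2_hat = normal_mle(sample)
```

With 5,000 standard normal draws the estimates land close to the true values µ = 0 and σ² = 1, illustrating the parametric approach the text goes on to critique: the quality of f̂ rests entirely on normality being the right family.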

The necessary first order conditions for a maximization of the log-likelihood function are ∂L(µ, σ²)/∂µ = 0 and ∂L(µ, σ²)/∂σ² = 0. Solving these first order conditions for the two unknown parameters µ and σ² yields

µ̂ = (1/n) ∑_{i=1}^n X_i and σ̂² = (1/n) ∑_{i=1}^n (X_i - µ̂)².

µ̂ and σ̂² above are the maximum likelihood estimators of µ and σ², respectively, and the resulting estimator of f(x) is

f̂(x) = (1/√(2πσ̂²)) exp[-(1/2)((x - µ̂)/σ̂)²].

The Achilles heel of any parametric approach is of course the requirement that, prior to estimation, the analyst must specify the exact parametric functional form for the object being estimated. Upon reflection, the parametric approach is somewhat circular since we initially set out to estimate an unknown density but must first assume that the density is in fact known (up to a handful of unknown parameters, of course). Having based our estimate on the assumption that the density is a member of a known parametric family, we must then naturally confront the possibility that the parametric model is misspecified, i.e., not consistent with the population from which the data was drawn. For instance, by assuming that X is drawn from a normally distributed population in the above example, we in fact impose a number of potentially quite restrictive assumptions: symmetry, unimodality, monotonic decline away from the mode, and so on. If the true density were in fact asymmetric, possessed multiple modes, or was nonmonotonic away from the mode, then the presumption of distributional normality may provide a misleading characterization of the true density and could thereby produce erroneous estimates and lead to unsound inference.

At this juncture many readers will no doubt be pointing out that, having estimated a parametric PDF, one can always test whether the underlying distributional assumption is valid. We are, of course, completely sympathetic toward such arguments. Often, however, the rejection of a distributional assumption fails to provide any clear alternative.
That is, we can reject the assumption of normality, but this rejection leaves us where we started, perhaps having ruled out but one of a large

number of candidate distributions. Against this backdrop, researchers might instead consider nonparametric approaches.

Nonparametric methods circumvent problems arising from the need to specify parametric functional forms prior to estimation. Rather than presume one knows the exact functional form of the object being estimated, one instead presumes that it satisfies some regularity conditions such as smoothness and differentiability. This does not, however, come without cost. By imposing less structure on the functional form of the PDF than do parametric methods, nonparametric methods require more data to achieve the same degree of precision as a correctly specified parametric model. Our primary focus in this text is on a class of estimators known as nonparametric kernel estimators (a kernel function is simply a weighting function), though in Chapters 14 and 15 we provide a treatment of alternative nonparametric methodologies including nearest neighbor and series methods.

Before proceeding to a formal theoretical analysis of nonparametric density estimation methods, we first consider a popular example, estimating the probability of a head on a toss of a coin, which is closely related to the nonparametric estimation of a CDF. This in turn will lead us to the nonparametric estimation of a PDF.

Example 1.2. Suppose we have a coin (perhaps an unfair one) and we want to estimate the probability of flipping the coin and having it land heads up. Let p = P(H) denote the (unknown) population probability of obtaining a head. Taking a relative frequency approach, we would flip the coin n times, count the frequency of heads in n trials, and compute the relative frequency given by

p̂ = (1/n){# of heads}, (1.1)

which provides an estimate of p. The p̂ defined in (1.1) is often referred to as a frequency estimator of p, and it is also the maximum likelihood estimator of p (see Exercise 1.2). The estimator p̂ is, of course, fully nonparametric. Intuitively, one would expect that, if n is large, then p̂ should be close to p.
Indeed, one can easily show that the mean squared error (MSE) of p̂ is given by (see Exercise 1.3)

MSE(p̂) ≝ E[(p̂ - p)²] = p(1 - p)/n,

so MSE(p̂) → 0 as n → ∞, which is termed "p̂ converges to p in mean square error"; see Appendix A for the definitions of various modes of convergence.
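The MSE formula above is easy to check by simulation. In this sketch (the probability p = 0.3, the sample size, and the seed are our own choices for illustration) a Monte Carlo average of (p̂ - p)² over repeated experiments is compared with the theoretical value p(1 - p)/n:

```python
import random

def mse_of_frequency_estimator(p, n, reps, rng):
    """Monte Carlo estimate of E[(p_hat - p)^2] for the frequency
    estimator p_hat = (# heads)/n over `reps` repeated experiments."""
    total = 0.0
    for _ in range(reps):
        heads = sum(1 for _ in range(n) if rng.random() < p)
        total += (heads / n - p) ** 2
    return total / reps

rng = random.Random(0)
p, n = 0.3, 200
mc = mse_of_frequency_estimator(p, n, reps=2000, rng=rng)
theory = p * (1 - p) / n  # the closed form above
```

The two quantities agree up to Monte Carlo noise, and rerunning with larger n shows the MSE shrinking at the 1/n rate, which is the convergence-in-MSE claim.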

We now discuss how to obtain an estimator of the CDF of X, which we denote by F(x). The CDF is defined as F(x) = P[X ≤ x]. With i.i.d. data X_1, ..., X_n (i.e., random draws from the distribution F(·)), one can estimate F(x) by

F_n(x) = (1/n){# of X_i's ≤ x}. (1.2)

Equation (1.2) has a nice intuitive interpretation. Going back to our coin-flip example, if a coin is such that the probability of obtaining a head when we flip it equals F(x) (F(x) is unknown), and if we treat the collection of data X_1, ..., X_n as flipping a coin n times and we say that a head occurs on the ith trial if X_i ≤ x, then P(H) = P(X_i ≤ x) = F(x). The familiar frequency estimator of P(H) is equal to the number of heads divided by the number of trials:

P̂(H) = (# of heads)/n = (1/n){# of X_i's ≤ x} ≡ F_n(x). (1.3)

Therefore, we call (1.2) a frequency estimator of F(x). Just as before when estimating P(H), we expect intuitively that as n gets large, P̂(H) should yield a more accurate estimate of P(H). By the same reasoning, one would expect that as n → ∞, F_n(x) yields a more accurate estimate of F(x). Indeed, one can easily show that F_n(x) → F(x) in MSE, which implies that F_n(x) converges to F(x) in probability and also in distribution as n → ∞. In Appendix A we introduce the concepts of convergence in mean square error, convergence in probability, convergence in distribution, and almost sure convergence. It is well established that F_n(x) indeed converges to F(x) in each of these various senses. These concepts of convergence are necessary as it is easy to show that the ordinary limit of F_n(x) does not exist, i.e., lim_{n→∞} F_n(x) does not exist (see Exercise 1.3, where the definition of an ordinary limit is provided). This example highlights the necessity of introducing convergence modes such as convergence in mean square error and convergence in probability.

Now we take up the question of how to estimate a PDF f(x) without making parametric presumptions about its functional form. From the

definition of f(x) we have¹

f(x) = dF(x)/dx. (1.4)

From (1.2) and (1.4), an obvious estimator of f(x) is²

f̂(x) = [F_n(x + h) - F_n(x - h)]/(2h), (1.5)

where h is a small positive increment. By substituting (1.2) into (1.5), we obtain

f̂(x) = (1/(2nh)){# of X_1, ..., X_n falling in the interval [x - h, x + h]}. (1.6)

If we define a uniform kernel function given by

k(z) = 1/2 if |z| ≤ 1, 0 otherwise, (1.7)

then it is easy to see that f̂(x) given by (1.5) can also be expressed as

f̂(x) = (1/(nh)) ∑_{i=1}^n k((X_i - x)/h). (1.8)

Equation (1.8) is called a uniform kernel estimator because the kernel function k(·) defined in (1.7) corresponds to a uniform PDF. In general, we refer to k(·) as a kernel function and to h as a smoothing parameter (or, alternatively, a bandwidth or window width). Equation (1.8) is sometimes referred to as a naïve kernel estimator. In fact one might use many other possible choices for the kernel function k(·) in this context. For example, one could use a standard normal kernel given by

k(v) = (1/√(2π)) e^{-v²/2}, -∞ < v < ∞. (1.9)

This class of estimators can be found in the first published paper on kernel density estimation by Rosenblatt (1956), while Parzen (1962) established a number of properties associated with this class of estimators

¹We only consider the continuous X case in this chapter. We deal with the discrete X case in Chapters 3 and 4.

²Recall that the definition of the derivative of a function g(x) is given by dg(x)/dx = lim_{h→0} [g(x + h) - g(x)]/h, or, equivalently, dg(x)/dx = lim_{h→0} [g(x + h) - g(x - h)]/(2h).
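The estimator (1.8) can be sketched in a few lines of Python; both the uniform kernel (1.7) and the standard normal kernel (1.9) are shown, and either can be passed to the same estimator:

```python
import math

def uniform_kernel(v):
    # Uniform kernel (1.7): 1/2 on [-1, 1], zero elsewhere.
    return 0.5 if abs(v) <= 1 else 0.0

def gaussian_kernel(v):
    # Standard normal kernel (1.9).
    return math.exp(-0.5 * v * v) / math.sqrt(2 * math.pi)

def kde(x, sample, h, kernel=gaussian_kernel):
    """Rosenblatt-Parzen kernel density estimator (1.8):
    f_hat(x) = (1/(n*h)) * sum_i k((X_i - x)/h)."""
    n = len(sample)
    return sum(kernel((xi - x) / h) for xi in sample) / (n * h)
```

With a single observation at 0 and h = 1, the estimate at x = 0 is simply k(0)/1, which makes the role of the kernel as a local weighting function transparent; with many observations the estimate averages one "bump" per data point.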

and relaxed the nonnegativity assumption in order to obtain estimators which are more efficient. For this reason, this approach is sometimes referred to as Rosenblatt-Parzen kernel density estimation.

We will prove shortly that the kernel estimator f̂(x) defined in (1.8), constructed from any general nonnegative bounded kernel function k(·) that satisfies

(i) ∫ k(v) dv = 1, (ii) k(v) = k(-v), (iii) ∫ v² k(v) dv = κ₂ > 0, (1.10)

is a consistent estimator of f(x). Note that the symmetry condition (ii) implies that ∫ v k(v) dv = 0. By consistency, we mean that f̂(x) → f(x) in probability (convergence in probability is defined in Appendix A). Note that k(·) defined in (1.10) is a (symmetric) PDF. For recent work on kernel methods with asymmetric kernels, see Abadir and Lawford (2004).

To define various modes of convergence, we first introduce the concept of the Euclidean norm ("Euclidean length") of a vector. Given a q × 1 vector x = (x_1, x_2, ..., x_q)' ∈ R^q, we use ‖x‖ to denote the Euclidean length of x, which is defined by ‖x‖ = [x'x]^{1/2} ≡ √(x_1² + x_2² + ⋯ + x_q²). When q = 1 (a scalar), ‖x‖ is simply the absolute value of x.

In the appendix we discuss the notation O(·) ("big O") and o(·) ("small o"). Let a_n be a nonstochastic sequence. We say that a_n = O(n^α) if |a_n| ≤ C n^α for all sufficiently large n, where α and C (> 0) are constants. Similarly, we say that a_n = o(n^α) if a_n/n^α → 0 as n → ∞. We are now ready to prove the MSE consistency of f̂(x).

Theorem 1.1. Let X_1, ..., X_n denote i.i.d. observations having a three-times differentiable PDF f(x), and let f^{(s)}(x) denote the sth order derivative of f(x) (s = 1, 2, 3). Let x be an interior point in the support of X, and let f̂(x) be that defined in (1.8). Assume that the kernel function k(·) is bounded and satisfies (1.10). Also, as n → ∞, h → 0 and nh → ∞. Then

MSE[f̂(x)] = (h⁴/4) κ₂² [f^{(2)}(x)]² + κ f(x)/(nh) + o(h⁴ + (nh)^{-1}) = O(h⁴ + (nh)^{-1}), (1.11)

where κ₂ = ∫ v² k(v) dv and κ = ∫ k²(v) dv.

Proof of Theorem 1.1.

MSE[f̂(x)] ≡ E{[f̂(x) - f(x)]²} = var[f̂(x)] + [E f̂(x) - f(x)]² = var[f̂(x)] + [bias(f̂(x))]².

We will evaluate the bias(f̂(x)) and var(f̂(x)) terms separately. For the bias calculation we will need to use the Taylor expansion formula. For a univariate function g(x) that is m times differentiable, we have

g(x) = g(x₀) + g^{(1)}(x₀)(x - x₀) + (1/2!) g^{(2)}(x₀)(x - x₀)² + ⋯ + (1/(m-1)!) g^{(m-1)}(x₀)(x - x₀)^{m-1} + (1/m!) g^{(m)}(ξ)(x - x₀)^m,

where g^{(s)}(x₀) = d^s g(x)/dx^s evaluated at x = x₀, and ξ lies between x and x₀.

The bias term is given by

bias[f̂(x)] = E{(1/(nh)) ∑_{i=1}^n k((X_i - x)/h)} - f(x)
= (1/h) E[k((X_1 - x)/h)] - f(x) (by identical distribution)
= (1/h) ∫ f(x₁) k((x₁ - x)/h) dx₁ - f(x)
= ∫ f(x + hv) k(v) dv - f(x) (change of variable, (x₁ - x)/h = v)
= ∫ {f(x) + f^{(1)}(x)hv + (1/2) f^{(2)}(x) h²v² + O(h³)} k(v) dv - f(x)
= {f(x) + (h²/2) f^{(2)}(x) ∫ v² k(v) dv + O(h³)} - f(x) (by (1.10))
= (h²/2) κ₂ f^{(2)}(x) + O(h³), (1.12)

where the O(h³) term comes from |(1/3!) h³ ∫ f^{(3)}(x̄) v³ k(v) dv| ≤ C h³ ∫ |v|³ k(v) dv = O(h³), where C is a positive constant, and where x̄ lies between x and x + hv. Note that in the above derivation we assume that f(x) is three-times differentiable. We can weaken this condition to f(x) being twice differentiable, resulting in (O(h³) becomes o(h²); see Exercise 1.5)

bias[f̂(x)] = E[f̂(x)] - f(x) = (h²/2) f^{(2)}(x) ∫ v² k(v) dv + o(h²). (1.13)
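The leading bias term in (1.13) can be checked numerically without any simulation noise: for a fixed h, E[f̂(x)] = (1/h) ∫ k((t - x)/h) f(t) dt can be computed by quadrature and compared with f(x) + (h²/2) κ₂ f^{(2)}(x). The sketch below (our own illustration, not from the text) takes f to be the standard normal density and k the Gaussian kernel, for which κ₂ = 1 and f^{(2)}(0) = -f(0):

```python
import math

def phi(x):
    """Standard normal PDF."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def expected_kde_at(x, h, steps=8000, lim=10.0):
    """E[f_hat(x)] = (1/h) * integral of k((t - x)/h) * f(t) dt,
    computed with a simple trapezoid rule (f = standard normal here,
    k = Gaussian kernel, which equals phi)."""
    dt = 2 * lim / steps
    total = 0.0
    for i in range(steps + 1):
        t = -lim + i * dt
        w = 0.5 if i in (0, steps) else 1.0
        total += w * phi((t - x) / h) * phi(t)
    return total * dt / h

h = 0.2
exact_bias = expected_kde_at(0.0, h) - phi(0.0)
# Leading term from (1.13): (h^2/2) * kappa_2 * f''(0), with
# kappa_2 = 1 for the Gaussian kernel and f''(0) = -phi(0).
approx_bias = 0.5 * h * h * (-phi(0.0))
```

At h = 0.2 the exact bias and the leading-term approximation agree to a few percent, and both are negative: smoothing a peak pulls the estimate down at the mode, exactly as the sign of f^{(2)} in (1.13) predicts.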

Next we consider the variance term. Observe that

var[f̂(x)] = var[(1/(nh)) ∑_{i=1}^n k((X_i - x)/h)]
= (1/(n²h²)) ∑_{i=1}^n var[k((X_i - x)/h)] (by independence)
= (1/(nh²)) var[k((X_1 - x)/h)] (by identical distribution)
= (1/(nh²)) {E[k²((X_1 - x)/h)] - (E[k((X_1 - x)/h)])²}
= (1/(nh²)) {∫ f(x₁) k²((x₁ - x)/h) dx₁ - [∫ f(x₁) k((x₁ - x)/h) dx₁]²}
= (1/(nh)) {∫ f(x + hv) k²(v) dv - h [∫ f(x + hv) k(v) dv]²}
= (1/(nh)) {∫ [f(x) + f^{(1)}(ξ)hv] k²(v) dv - O(h)}
= (1/(nh)) {f(x) ∫ k²(v) dv + O(h)}
= (1/(nh)) {κ f(x) + O(h)}, (1.14)

where κ = ∫ k²(v) dv. Equations (1.12) and (1.14) complete the proof of Theorem 1.1.

Theorem 1.1 implies that (by Theorem A.7 of Appendix A)

f̂(x) - f(x) = O_p(h² + (nh)^{-1/2}) = o_p(1).

By choosing h = c n^{-1/α} for some c > 0 and α > 1, the conditions required for consistent estimation of f(x), h → 0 and nh → ∞,

are clearly satisfied. The overriding question is what values of c and α should be used in practice. As can be seen, for a given sample size n, if h is small, the resulting estimator will have a small bias but a large variance. On the other hand, if h is large, then the resulting estimator will have a small variance but a large bias. To minimize MSE[f̂(x)], one should balance the squared bias and the variance terms. The optimal choice of h (in the sense that MSE[f̂(x)] is minimized) should satisfy dMSE[f̂(x)]/dh = 0. By using (1.11), it is easy to show that the optimal h that minimizes the leading term of MSE[f̂(x)] is given by

h_opt = c(x) n^{-1/5}, (1.15)

where c(x) = {κ f(x)/[κ₂ f^{(2)}(x)]²}^{1/5}. MSE[f̂(x)] is clearly a pointwise property, and by using this as the basis for bandwidth selection we are obtaining a bandwidth that is optimal when estimating a density at a point x. Examining c(x) in (1.15), we can see that a bandwidth which is optimal for estimation at a point x located in the tail of a distribution will differ from that which is optimal for estimation at a point located at, say, the mode.

Suppose that we are interested not in tailoring the bandwidth to the pointwise estimation of f(x) but instead in tailoring the bandwidth globally for all points x, that is, for all x in the support of f(·) (the support of x is defined as the set of points of x for which f(x) > 0, i.e., {x : f(x) > 0}). In this case we can choose h optimally by minimizing the integrated MSE (IMSE) of f̂(x). Using (1.11) we have

IMSE(f̂) ≝ ∫ E[f̂(x) - f(x)]² dx = (h⁴/4) κ₂² ∫ [f^{(2)}(x)]² dx + κ/(nh) + o(h⁴ + (nh)^{-1}). (1.16)

Again letting h_opt denote the optimal smoothing parameter that minimizes the leading terms of (1.16), we use simple calculus to get

h_opt = c₀ n^{-1/5}, (1.17)

where c₀ = κ^{1/5} κ₂^{-2/5} {∫ [f^{(2)}(x)]² dx}^{-1/5} > 0 is a positive constant. Note that if f^{(2)}(x) = 0 for (almost) all x, then c₀ is not well defined. For example, if X is, say, uniformly distributed over its support, then f^{(s)}(x) = 0 for all x and for all s ≥ 1, and (1.17) is not defined in this case. It can be shown that in this case (i.e., when X is uniformly

distributed), h_opt will have a different rate of convergence equal to n^{-1/3}; see the related discussion in Section 1.3.1 and Exercise 1.16.

An interesting extension of the above results can be found in Zinde-Walsh (2005), who examines the asymptotic process for the kernel density estimator by means of generalized functions and generalized random processes and presents novel results for characterizing the behavior of kernel density estimators when the density does not exist, i.e., when the density does not exist as a locally summable function.

1.2 Univariate Bandwidth Selection: Rule-of-Thumb and Plug-In Methods

Equation (1.17) reveals that the optimal smoothing parameter depends on the integrated second derivative of the unknown density through c₀. In practice, one might choose an initial "pilot" value of h to estimate ∫ [f^{(2)}(x)]² dx nonparametrically, and then use this value to obtain h_opt using (1.17). Such approaches are known as plug-in methods for obvious reasons. One popular way of choosing the initial h, suggested by Silverman (1986), is to assume that f(x) belongs to a parametric family of distributions, and then to compute h using (1.17). For example, if f(x) is a normal PDF with variance σ², then ∫ [f^{(2)}(x)]² dx = 3/[8π^{1/2}σ⁵]. If a standard normal kernel is used, then using (1.17) we get the pilot estimate

h_pilot = (4π)^{-1/10} [(3/8)π^{-1/2}]^{-1/5} σ n^{-1/5} ≈ 1.06 σ n^{-1/5}, (1.18)

which is then plugged into ∫ [f̂^{(2)}(x)]² dx, which then may be used to obtain h_opt using (1.17). A clearly undesirable property of the plug-in method is that it is not fully automatic because one needs to choose an initial value of h to estimate ∫ [f^{(2)}(x)]² dx (see Marron, Jones and Sheather (1996) and also Loader (1999) for further discussion).

Often, practitioners will use (1.18) itself for the bandwidth. This is known as the normal reference rule-of-thumb approach since it is the optimal bandwidth for a particular family of distributions, in this case the normal family. Should the underlying distribution be close to a normal distribution, then this will provide good results, and for exploratory purposes it is certainly computationally attractive.
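Formulas (1.17) and (1.18) are easy to sketch in code. Plugging the Gaussian-kernel constants κ = 1/(2√π), κ₂ = 1 and the normal-density roughness ∫[f^{(2)}]² dx = 3/(8√π σ⁵) into (1.17) recovers the 1.06 constant of (1.18), here shown for σ = 1 (the sample in the rule-of-thumb function is whatever data the user supplies):

```python
import math

def h_opt(n, kappa, kappa2, roughness):
    """Optimal global bandwidth (1.17): h = c0 * n^(-1/5), with
    c0 = [kappa / (kappa2^2 * R)]^(1/5) and R = integral of (f'')^2."""
    return (kappa / (kappa2 ** 2 * roughness)) ** 0.2 * n ** (-0.2)

def silverman_bandwidth(sample):
    """Normal reference rule-of-thumb (1.18): 1.06 * sigma * n^(-1/5),
    with sigma replaced by the sample standard deviation."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return 1.06 * sd * n ** (-0.2)

# Gaussian-kernel constants and unit-variance normal roughness:
kappa = 1.0 / (2.0 * math.sqrt(math.pi))   # integral of k(v)^2 dv
kappa2 = 1.0                               # integral of v^2 k(v) dv
R = 3.0 / (8.0 * math.sqrt(math.pi))       # integral of (f'')^2 dx, sigma = 1
```

Here h_opt(n, kappa, kappa2, R) equals (4/3)^{1/5} n^{-1/5} ≈ 1.059 n^{-1/5}, i.e., the constant quoted as 1.06 in (1.18).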
In practice, σ is replaced by the sample standard deviation of {X_i}_{i=1}^n, while Silverman (1986, p. 47) advocates using a more robust measure

of spread which replaces σ with A, an adaptive measure of spread given by

A = min(standard deviation, interquartile range/1.34).

We now turn our attention to a discussion of a number of fully automatic or data-driven methods for selecting h that are tailored to the sample at hand.

1.3 Univariate Bandwidth Selection: Cross-Validation Methods

In both theoretical and practical settings, nonparametric kernel estimation has been established as relatively insensitive to choice of kernel function. However, the same cannot be said for bandwidth selection. Different bandwidths can generate radically differing impressions of the underlying distribution. If kernel methods are used simply for exploratory purposes, then one might undersmooth the density by choosing a small value of h and let the eye do any remaining smoothing. Alternatively, one might choose a range of values for h and plot the resulting estimates. However, for sound analysis and inference, a principle having some known optimality properties must be adopted. One can think of choosing the bandwidth as being analogous to choosing the number of terms in a series approximation; the more terms one includes in the approximation, the more flexible the resulting model becomes, while the smaller the bandwidth of a kernel estimator, the more flexible it becomes. However, increasing flexibility (reducing potential bias) necessarily leads to increased variability (increasing potential variance). Seen in this light, one naturally appreciates how a number of methods discussed below are motivated by the need to balance the squared bias and variance of the resulting estimate.

1.3.1 Least Squares Cross-Validation

Least squares cross-validation is a fully automatic data-driven method of selecting the smoothing parameter h, originally proposed by Rudemo (1982), Stone (1984) and Bowman (1984) (see also Silverman (1986)). This method is based on the principle of selecting a bandwidth that minimizes the integrated squared error of the resulting estimate, that is, it provides an optimal bandwidth tailored to all x in the support of f(x).

The integrated squared difference between f̂ and f is

∫ [f̂(x) - f(x)]² dx = ∫ f̂(x)² dx - 2 ∫ f̂(x) f(x) dx + ∫ f(x)² dx. (1.19)

As the third term on the right-hand side of (1.19) is unrelated to h, choosing h to minimize (1.19) is therefore equivalent to minimizing

∫ f̂(x)² dx - 2 ∫ f̂(x) f(x) dx (1.20)

with respect to h. In the second term, ∫ f̂(x) f(x) dx can be written as E_X[f̂(X)], where E_X(·) denotes expectation with respect to X and not with respect to the random observations {X_j} used for computing f̂(·). Therefore, we may estimate E_X[f̂(X)] by n^{-1} ∑_{i=1}^n f̂_{-i}(X_i) (i.e., replacing E_X by its sample mean), where

f̂_{-i}(X_i) = (1/((n-1)h)) ∑_{j=1, j≠i}^n k((X_i - X_j)/h) (1.21)

is the leave-one-out kernel estimator of f(X_i).³ Finally, we estimate the first term ∫ f̂(x)² dx by

∫ f̂(x)² dx = (1/(n²h²)) ∑_{i=1}^n ∑_{j=1}^n ∫ k((X_i - x)/h) k((X_j - x)/h) dx = (1/(n²h)) ∑_{i=1}^n ∑_{j=1}^n k̄((X_i - X_j)/h), (1.22)

where k̄(v) = ∫ k(u) k(v - u) du is the twofold convolution kernel derived from k(·). If k(v) = exp(-v²/2)/√(2π), a standard normal kernel, then k̄(v) = exp(-v²/4)/√(4π), a normal kernel (i.e., normal PDF) with mean zero and variance two, which follows since two independent N(0, 1) random variables sum to a N(0, 2) random variable.

³Here we emphasize that it is important to use the leave-one-out kernel estimator for computing E_X(·) above. This is because the expectations operator presumes that the X and the X_j's are independent of one another. Without using the leave-one-out estimator, the cross-validation method will break down; see Exercise 1.6 (iii).
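Combining (1.21) and (1.22) gives the least squares cross-validation objective, which the next page denotes CV_f(h). A minimal sketch for the Gaussian kernel (the grid, sample size, and seed below are our own illustrative choices; in practice a numerical search routine would replace the crude grid):

```python
import math
import random

def cv_ls(h, x):
    """Least squares cross-validation objective for a Gaussian kernel k:
    (1/(n^2 h)) sum_ij kbar((Xi-Xj)/h)
      - (2/(n(n-1)h)) sum_{i != j} k((Xi-Xj)/h),
    where kbar, the twofold convolution of k, is the N(0, 2) density."""
    n = len(x)
    def k(v):
        return math.exp(-0.5 * v * v) / math.sqrt(2 * math.pi)
    def kbar(v):
        return math.exp(-0.25 * v * v) / math.sqrt(4 * math.pi)
    term1 = sum(kbar((xi - xj) / h) for xi in x for xj in x) / (n * n * h)
    term2 = 2 * sum(k((x[i] - x[j]) / h)
                    for i in range(n) for j in range(n) if j != i) / (n * (n - 1) * h)
    return term1 - term2

rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(100)]
grid = [0.05 * j for j in range(1, 41)]  # candidate h in (0, 2]
h_cv = min(grid, key=lambda h: cv_ls(h, data))
```

For standard normal data with n = 100 the minimizer lands in the neighborhood of the theoretical value 1.06 n^{-1/5} ≈ 0.42, though (as discussed shortly) the cross-validated bandwidth is quite variable from sample to sample.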

Least squares cross-validation therefore chooses h to minimize

CV_f(h) = (1/(n²h)) ∑_{i=1}^n ∑_{j=1}^n k̄((X_i - X_j)/h) - (2/(n(n-1)h)) ∑_{i=1}^n ∑_{j=1, j≠i}^n k((X_i - X_j)/h), (1.23)

which is typically undertaken using numerical search algorithms. It can be shown that the leading term of CV_f(h) is CV_{f0} given by (ignoring a term unrelated to h; see Exercise 1.6)

CV_{f0}(h) = B₁ h⁴ + κ/(nh), (1.24)

where B₁ = (κ₂²/4) ∫ [f^{(2)}(x)]² dx (κ₂ = ∫ v² k(v) dv, κ = ∫ k²(v) dv). Thus, as long as f^{(2)}(x) does not vanish for (almost) all x, we have B₁ > 0. Let h⁰ denote the value of h that minimizes CV_{f0}. Simple calculus shows that h⁰ = c⁰ n^{-1/5}, where

c⁰ = [κ/(4B₁)]^{1/5} = κ^{1/5} κ₂^{-2/5} {∫ [f^{(2)}(x)]² dx}^{-1/5}.

A comparison of h⁰ with h_opt in (1.17) reveals that the two are identical, i.e., h⁰ ≡ h_opt. This arises because h_opt minimizes ∫ E[f̂(x) - f(x)]² dx, while h⁰ minimizes E[CV_f(h)], the leading term of CV_f(h). It can be easily seen that E[CV_f(h)] + ∫ f(x)² dx is an alternative version of ∫ E[f̂(x) - f(x)]² dx; hence, E[CV_f(h)] + ∫ f(x)² dx also estimates ∫ E[f̂(x) - f(x)]² dx. Given that ∫ f(x)² dx is unrelated to h, one would expect that h⁰ and h_opt should be the same.

Let ĥ denote the value of h that minimizes CV_f(h). Given that CV_f(h) = CV_{f0}(h) + (s.o.), where (s.o.) denotes smaller order terms (than CV_{f0}) and terms unrelated to h, it can be shown that ĥ = h⁰ + o_p(h⁰), or, equivalently, that

ĥ/h⁰ → 1 in probability. (1.25)

Intuitively, (1.25) is easy to understand because CV_f(h) = CV_{f0}(h) + (s.o.), thus asymptotically an h that minimizes CV_f(h) should be

close to an h that minimizes CV_{f0}(h); therefore, we expect that ĥ and h⁰ will be close to each other in the sense of (1.25). Härdle, Hall and Marron (1988) showed that (ĥ - h⁰)/h⁰ = O_p(n^{-1/10}), which indeed converges to zero (in probability) but at an extremely slow rate.

We again underscore the need to use the leave-one-out kernel estimator when constructing CV_f as given in (1.23). If instead one were to use the standard kernel estimator, least squares cross-validation will break down, yielding ĥ = 0. Exercise 1.6 shows that if one does not use the leave-one-out kernel estimator when estimating f(X_i), then h = 0 minimizes the objective function, which of course violates the consistency condition that nh → ∞ as n → ∞.

Here we implicitly impose the restriction that f^{(2)}(x) is not a zero function, which rules out the case for which f(x) is a uniform PDF. In fact this condition can be relaxed. Stone (1984) showed that, as long as f(x) is bounded, then the least squares cross-validation method will select h optimally in the sense that

∫ [f̂(x, ĥ) - f(x)]² dx / inf_h ∫ [f̂(x, h) - f(x)]² dx → 1 almost surely, (1.26)

where f̂(x, ĥ) denotes the kernel estimator of f(x) with cross-validation selected ĥ, and f̂(x, h) is the kernel estimator with a generic h. Obviously, the ratio defined in (1.26) should be greater than or equal to one for any h. Therefore, Stone's (1984) result states that, asymptotically, cross-validated smoothing parameter selection is optimal in the sense of minimizing the estimation integrated square error. In Exercise 1.16 we further discuss the intuition underlying why ĥ → 0 even when f(x) is a uniform PDF.

1.3.2 Likelihood Cross-Validation

Likelihood cross-validation is another automatic data-driven method for selecting the smoothing parameter h. This approach yields a density estimate which has an entropy theoretic interpretation, since the estimate will be close to the actual density in a Kullback-Leibler sense. This approach was proposed by Duin (1976).

Likelihood cross-validation chooses h to maximize the (leave-one-out) log likelihood function given by

L = ln L = ∑_{i=1}^n ln f̂_{-i}(X_i),

where f̂_{-i}(X_i) is the leave-one-out kernel estimator of f(X_i) defined in (1.21). The main problem with likelihood cross-validation is that it is severely affected by the tail behavior of f(x) and can lead to inconsistent results for fat tailed distributions when using popular kernel functions (see Hall (1987a, 1987b)). For this reason the likelihood cross-validation method has elicited little interest in the statistical literature. However, the likelihood cross-validation method may work well for a range of standard (i.e., thin tailed) distributions. We consider the performance of likelihood cross-validation in Section 1.3.3, where we compare the impact of different bandwidth selection methods on the resulting density estimate, and in Section 1.13, where we consider empirical applications.

1.3.3 An Illustration of Data-Driven Bandwidth Selection

Figure 1.1 presents kernel estimates constructed from n = 500 observations drawn from a simulated bimodal distribution. The second order Gaussian (normal) kernel was used throughout, and least squares cross-validation was used to select the bandwidth ĥ_lscv for the estimate appearing in the upper left plot of the figure. We also plot the estimate based on the normal reference rule-of-thumb (ĥ_ref = 0.34) along with an undersmoothed estimate (1/5 × ĥ_lscv) and an oversmoothed estimate (5 × ĥ_lscv).⁴

Figure 1.1 reveals that least squares cross-validation appears to yield a reasonable density estimate for this data, while the reference rule-of-thumb is inappropriate as it oversmooths somewhat. Extreme oversmoothing can lead to a unimodal estimate which completely obscures the true bimodal nature of the underlying distribution. Also, undersmoothing leads to too many false modes. See Exercise 1.17 for an empirical application that investigates the effects of under- and oversmoothing on the resulting density estimate.
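The likelihood cross-validation objective of Section 1.3.2 can be sketched directly from (1.21); the small data set below is our own illustration (not the bimodal sample of Figure 1.1):

```python
import math

def log_likelihood_cv(h, x):
    """Leave-one-out log likelihood: sum_i ln f_hat_{-i}(X_i), where
    f_hat_{-i} is the leave-one-out estimator (1.21) with a Gaussian
    kernel. Likelihood cross-validation maximizes this over h."""
    n = len(x)
    def k(v):
        return math.exp(-0.5 * v * v) / math.sqrt(2 * math.pi)
    total = 0.0
    for i in range(n):
        f_loo = sum(k((x[i] - x[j]) / h)
                    for j in range(n) if j != i) / ((n - 1) * h)
        total += math.log(f_loo)
    return total

data = [-1.2, -0.5, 0.0, 0.4, 1.1, 2.0]
```

A moderate bandwidth beats both extremes here: a tiny h leaves each held-out point with almost no density (a large log-likelihood penalty), while a huge h flattens the estimate everywhere. This also hints at the tail-sensitivity problem noted above, since a single isolated observation can dominate the sum.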
1.4 Univariate CDF Estimation

In Section 1.1 we introduced the empirical CDF estimator F_n(x) given in (1.2), while Exercise 1.4 shows that it is a √n-consistent estimator

⁴Likelihood cross-validation yielded a bandwidth of ĥ_mlcv = 0.15, which results in a density estimate virtually identical to that based upon least squares cross-validation for this dataset.

[Figure 1.1 appears here: four panels (Least-Squares CV, Reference, Undersmoothed, Oversmoothed), each plotting f(x) against X.]

Figure 1.1: Univariate kernel estimates of a mixture of normals using least squares cross-validation, the normal reference rule-of-thumb, undersmoothing, and oversmoothing (n = 500). The correct parametric data generating process appears as the solid line, the kernel estimate as the dashed line.

of F(x). However, this empirical CDF F_n(x) is not smooth as it jumps by 1/n at each sample realization point. One can, however, obtain a smoothed estimate of F(x) by integrating f̂(x). Define

F̂(x) = ∫_{-∞}^x f̂(v) dv = (1/n) ∑_{i=1}^n G((x - X_i)/h), (1.27)

where G(x) = ∫_{-∞}^x k(v) dv is a CDF (which follows directly because k(·) is a PDF; see (1.10)). The next theorem provides the MSE of F̂(x).
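Both CDF estimators are one-liners. The sketch below implements the empirical CDF (1.2) and the smoothed estimator (1.27) with a Gaussian kernel, whose integral G is the standard normal CDF (available through `math.erf`); the tiny data set is our own illustration:

```python
import math

def ecdf(x, sample):
    """Empirical CDF (1.2): F_n(x) = (1/n) * #{X_i <= x}."""
    return sum(1 for xi in sample if xi <= x) / len(sample)

def smoothed_cdf(x, sample, h):
    """Smoothed estimator (1.27): F_hat(x) = (1/n) sum_i G((x - X_i)/h),
    with G the standard normal CDF (the integral of the Gaussian kernel)."""
    return sum(0.5 * (1.0 + math.erf((x - xi) / (h * math.sqrt(2.0))))
               for xi in sample) / len(sample)

data = [0.2, -1.0, 0.7, 1.5]
```

The empirical version jumps by 1/n at each observation, while the smoothed version is continuous and strictly increasing, which is precisely the motivation given in the text for integrating f̂.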

Theorem 1.2. Under conditions given in Bowman, Hall and Prvan (1998), in particular, assuming that F(x) is twice continuously differentiable, that k(v) = dG(v)/dv is bounded, symmetric, and compactly supported, that d²F(x)/dx² is Hölder-continuous, and that 0 ≤ h ≤ C n^{-ɛ} for some 0 < ɛ < 1, then as n → ∞,

MSE(F̂) = E[F̂(x) - F(x)]² = c₀(x) n^{-1} - c₁(x) h n^{-1} + c₂(x) h⁴ + o(hn^{-1} + h⁴),

where c₀(x) = F(x)(1 - F(x)), c₁(x) = 2α₀ f(x), α₀ = ∫ v G(v) k(v) dv, f(x) = dF(x)/dx, c₂(x) = [(κ₂/2) F^{(2)}(x)]², κ₂ = ∫ v² k(v) dv, and where F^{(s)}(x) = d^s F(x)/dx^s is the sth derivative of F(x).

Proof. Note that E[F̂(x)] = E[G((x - X_i)/h)]. Then we have

E[G((x - X_i)/h)] = ∫ G((x - z)/h) f(z) dz = h ∫ G(v) f(x - hv) dv = -∫ G(v) dF(x - hv)
= -[G(v) F(x - hv)]_{v=-∞}^{v=∞} + ∫ k(v) F(x - hv) dv
= ∫ k(v) [F(x) - F^{(1)}(x) hv + (1/2) F^{(2)}(x) h²v²] dv + o(h²)
= F(x) + (h²/2) κ₂ F^{(2)}(x) + o(h²), (1.28)

where at the second equality above we used the change of variable v = (x - z)/h. Also note that we did not use a Taylor expansion in ∫ G(v) F(x - hv) dv since ∫ v^m G(v) dv = +∞ for any m ≥ 0. We first used integration by parts to obtain the integral involving k(v), and then used the Taylor expansion since ∫ v^m k(v) dv is usually finite. For example, if k(v) has bounded support or k(v) is a standard normal kernel function, then ∫ |v|^m k(v) dv is finite for any m ≥ 0.

Similarly,

E[G²((x - X_i)/h)] = ∫ G²((x - z)/h) f(z) dz = h ∫ G²(v) f(x - hv) dv = -∫ G²(v) dF(x - hv)
= 2 ∫ G(v) k(v) F(x - hv) dv
= 2 ∫ G(v) k(v) [F(x) - F^{(1)}(x) hv] dv + O(h²)
= F(x) - 2α₀ h f(x) + O(h²),

where α₀ = ∫ v G(v) k(v) dv, and where we have used the fact that

2 ∫ G(v) k(v) dv = ∫ dG²(v) = G²(∞) - G²(-∞) = 1, (1.29)

because G(·) is a (user-specified) CDF kernel function. From (1.28) we have bias[F̂(x)] = (h²/2) κ₂ F^{(2)}(x) + o(h²), and from (1.28) and (1.29) we have

var[F̂(x)] = (1/n) var[G((x - X_i)/h)]
= (1/n) {E[G²((x - X_i)/h)] - (E[G((x - X_i)/h)])²}
= (1/n) F(x)[1 - F(x)] - 2α₀ f(x) h/n + o(h/n).

Hence,

E[F̂(x) - F(x)]² = [bias(F̂(x))]² + var[F̂(x)]
= (1/n) F(x)[1 - F(x)] + (h⁴/4) κ₂² [F^{(2)}(x)]² - 2α₀ f(x) h/n + o(h⁴ + h/n). (1.30)

This completes the proof of Theorem 1.2.
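Balancing the -c₁(x) h/n and c₂(x) h⁴ terms in Theorem 1.2 gives the bandwidth rate derived on the next page. A small sketch (the constants c1 and c2 below are placeholders standing in for C₁ and C₂, which depend on the unknown F):

```python
def cdf_bandwidth(n, c1, c2):
    """h0 = [c1/(4*c2)]^(1/3) * n^(-1/3): setting the derivative of
    -c1*h/n + c2*h^4 with respect to h to zero gives an n^(-1/3)
    rate, faster than the n^(-1/5) rate for PDF estimation."""
    return (c1 / (4.0 * c2)) ** (1.0 / 3.0) * n ** (-1.0 / 3.0)
```

Doubling n three times (n → 8n) halves the bandwidth, the signature of the n^{-1/3} rate.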

From Theorem 1.2 we immediately obtain the following result on the IMSE of F̂:

IMSE(F̂) = ∫ E[F̂(x) - F(x)]² dx = C₀ n^{-1} - C₁ h n^{-1} + C₂ h⁴ + o(hn^{-1} + h⁴), (1.31)

where C_j = ∫ c_j(x) dx (j = 0, 1, 2). Letting h₀ denote the value of h that minimizes the leading term of the IMSE, we obtain h₀ = a₀ n^{-1/3}, where a₀ = [C₁/(4C₂)]^{1/3}; hence the optimal smoothing parameter for estimating a univariate CDF has a faster rate of convergence than the optimal smoothing parameter for estimating a univariate PDF (n^{-1/3} versus n^{-1/5}).

With h ~ n^{-1/3}, we have h² = O(n^{-2/3}) = o(n^{-1/2}). Hence, √n[F̂(x) - F(x)] → N(0, F(x)[1 - F(x)]) in distribution by the Liapunov central limit theorem (CLT); see Theorem A.5 in Appendix A for this and a range of other useful CLTs.

As is the case for nonparametric PDF estimation, nonparametric CDF estimation has widespread potential application though it is not nearly as widely used. For instance, it can be used to test stochastic dominance without imposing parametric assumptions on the underlying CDFs; see, e.g., Barrett and Donald (2003) and Linton, Whang and Maasoumi (2005).

1.5 Univariate CDF Bandwidth Selection: Cross-Validation Methods

Bowman et al. (1998) suggest choosing h for F̂(x) by minimizing the following cross-validation function:

CV_F(h) = (1/n) ∑_{i=1}^n ∫ {1(X_i ≤ x) - F̂_{-i}(x)}² dx, (1.32)

where F̂_{-i}(x) = (n-1)^{-1} ∑_{j≠i} G((x - X_j)/h) is the leave-one-out estimator of F(x). Bowman et al. (1998) show that CV_F = E[CV_F] + (s.o.) and that

(see Exercise 1.9)

E[CV_F(h)] = ∫ F(x)[1 - F(x)] dx + (1/n) ∫ F(x)[1 - F(x)] dx - C₁ h n^{-1} + C₂ h⁴ + o(hn^{-1} + h⁴). (1.33)

We observe that (1.33) has the same leading term as IMSE(F̂) given in (1.31). Thus, asymptotically, selecting h via cross-validation leads to the same asymptotic optimality property for F̂(x) that would arise when using h₀, the optimal deterministic smoothing parameter. If we let ĥ denote the cross-validated smoothing parameter, then it can be shown that ĥ/h₀ → 1 in probability. Note that when using ĥ, the asymptotic distribution of F̂(x, ĥ) is the same as that of F̂(x, h₀) (by using a stochastic equicontinuity argument as outlined in Appendix A), that is,

√n (F̂(x) - F(x)) →d N(0, F(x)(1 - F(x))), (1.34)

where F̂(x) is defined in (1.27) with h replaced by ĥ. Note that no bias term appears in (1.34) since bias(F̂(x)) = O(h₀²) = O(n^{-2/3}) = o(n^{-1/2}), which was not the case for PDF estimation. Here the squared bias term has order smaller than the leading variance term of O(n^{-1}) (i.e., var(F̂(x)) = O(n^{-1})).

We now turn our attention to a generalization of the univariate kernel estimators developed above, namely multivariate kernel estimators. Again, we consider only the continuous case in this chapter; we tackle discrete and mixed continuous and discrete data cases in Chapters 3 and 4.

1.6 Multivariate Density Estimation

Suppose that X_1, ..., X_n constitute an i.i.d. q-vector (X_i ∈ R^q, for some q > 1) having a common PDF f(x) = f(x_1, x_2, ..., x_q). Let X_{is} denote the sth component of X_i (s = 1, ..., q). Using a product kernel function constructed from the product of univariate kernel functions, we estimate the PDF f(x) by

f̂(x) = (1/(n h_1 ⋯ h_q)) ∑_{i=1}^n K((X_i - x)/h), (1.35)

where $K\!\left(\frac{X_i - x}{h}\right) = k\!\left(\frac{X_{i1} - x_1}{h_1}\right) \times \cdots \times k\!\left(\frac{X_{iq} - x_q}{h_q}\right)$, and where $k(\cdot)$ is a univariate kernel function satisfying (1.10). The proof of MSE consistency of $\hat f(x)$ is similar to the univariate case. In particular, one can show that
$$\mathrm{bias}\left(\hat f(x)\right) = \frac{\kappa_2}{2}\sum_{s=1}^{q} h_s^2 f_{ss}(x) + O\!\left(\sum_{s=1}^{q} h_s^3\right), \quad (1.36)$$
where $f_{ss}(x)$ is the second order derivative of $f(x)$ with respect to $x_s$ and $\kappa_2 = \int v^2 k(v)\,dv$, and one can also show that
$$\mathrm{var}\left(\hat f(x)\right) = \frac{1}{n h_1 \cdots h_q}\left[\kappa^q f(x) + O\!\left(\sum_{s=1}^{q} h_s\right)\right] = O\!\left(\frac{1}{n h_1 \cdots h_q}\right), \quad (1.37)$$
where $\kappa = \int k^2(v)\,dv$. The proofs of (1.36) and (1.37), which are similar to the univariate case, are left as an exercise (see Exercise 1.11). Summarizing, we obtain the result
$$\mathrm{MSE}\left(\hat f(x)\right) = \left[\mathrm{bias}\left(\hat f(x)\right)\right]^2 + \mathrm{var}\left(\hat f(x)\right) = O\!\left(\sum_{s=1}^{q} h_s^4 + (n h_1 \cdots h_q)^{-1}\right).$$
Hence, if as $n \to \infty$, $\max_{1 \le s \le q} h_s \to 0$ and $n h_1 \cdots h_q \to \infty$, then we have $\hat f(x) \to f(x)$ in MSE, which implies that $\hat f(x) \to f(x)$ in probability. As we saw in the univariate case, the optimal smoothing parameters $h_s$ should balance the squared bias and variance terms, i.e., $h_s^4 = O\left((n h_1 \cdots h_q)^{-1}\right)$ for all $s$. Thus, we have $h_s = c_s n^{-1/(q+4)}$ for some positive constant $c_s$ ($s = 1, \dots, q$). The cross-validation methods discussed in Section 1.3 can be easily generalized to the multivariate data setting, and we can show that least squares cross-validation can optimally select the $h_s$'s in the sense outlined in Section 1.3 (see Section 1.8 below).

We briefly remark on the independence assumption invoked for the proofs presented above. Our assumption was that the data are independent across the $i$ index. Note that no restrictions were placed on the $s$ index for each component $X_{is}$ ($s = 1, \dots, q$). The product kernel is used simply for convenience, and it certainly does not require that the $X_{is}$'s

are independent across the $s$ index. In other words, the multivariate kernel density estimator (1.35) is capable of capturing general dependence among the different components of $X_i$. Furthermore, we shall relax the independence across observations assumption in Chapter 18, and will see that all of the results developed above carry over to the weakly dependent data setting.

1.7 Multivariate Bandwidth Selection: Rule-of-Thumb and Plug-In Methods

In Section 1.2 we discussed the use of the so-called normal reference rule-of-thumb and plug-in methods in a univariate setting. The generalization of the univariate normal reference rule-of-thumb to a multivariate setting is straightforward. Letting $q$ be the dimension of $X_i$, one can choose $h_s = c_s X_{s,sd}\, n^{-1/(4+q)}$ for $s = 1, \dots, q$, where $X_{s,sd}$ is the sample standard deviation of $\{X_{is}\}_{i=1}^{n}$ and $c_s$ is a positive constant. In practice one still faces the problem of how to choose $c_s$. The choice of $c_s = 1.06$ for all $s = 1, \dots, q$ is computationally attractive; however, this selection treats the different $X_{is}$'s symmetrically. In practice, should the joint PDF change rapidly in one dimension (say in $x_1$) but change slowly in another (say in $x_2$), then one should select a relatively small value of $c_1$ (hence a small $h_1$) and a relatively large value for $c_2$ ($h_2$). Unlike the cross-validation methods that we will discuss shortly, rule-of-thumb methods do not offer this flexibility.

For plug-in methods, on the other hand, the leading (squared) bias and variance terms of $\hat f(x)$ must be estimated, and then $h_1, \dots, h_q$ must be chosen to minimize the leading MSE term of $\hat f(x)$. However, the leading MSE term of $\hat f(x)$ involves the unknown $f(x)$ and its partial derivative functions, and pilot bandwidths must be selected for each variable in order to estimate these unknown functions. How to best select the initial pilot smoothing parameters can be tricky in high-dimensional settings, and the plug-in methods are not widely used in applied settings to the best of our knowledge, nor would we counsel their use other than for exploratory data analysis.
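The rule-of-thumb bandwidths and the product-kernel estimator (1.35) are straightforward to code. The sketch below is our own illustration, not code from the text: the function names are invented, and the Gaussian choice of $k(\cdot)$ is an assumption. It computes $h_s = 1.06\, X_{s,sd}\, n^{-1/(4+q)}$ and evaluates $\hat f(x)$ at a point:

```python
import numpy as np

def rule_of_thumb_bandwidths(X, c=1.06):
    """Normal reference rule-of-thumb: h_s = c * sd(X_s) * n^(-1/(4+q))."""
    n, q = X.shape
    return c * X.std(axis=0, ddof=1) * n ** (-1.0 / (4 + q))

def fhat(x, X, h):
    """Product-kernel density estimator (1.35) with a Gaussian univariate k."""
    n, q = X.shape
    u = (X - x) / h                                    # (n, q) scaled differences
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)   # univariate Gaussian kernel
    return np.prod(k, axis=1).sum() / (n * np.prod(h))

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))    # n = 500 draws of a bivariate N(0, I)
h = rule_of_thumb_bandwidths(X)
estimate = fhat(np.zeros(2), X, h)   # true density at 0 is 1/(2*pi), about 0.159
```

Using the single constant $c_s = 1.06$ for every component mirrors the computationally attractive (but inflexible) symmetric choice discussed above.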

1.8 Multivariate Bandwidth Selection: Cross-Validation Methods

Least Squares Cross-Validation

The univariate least squares cross-validation method discussed in Section 1.3 can be readily generalized to the multivariate density estimation setting. Replacing the univariate kernel function in the cross-validation objective function by a multivariate product kernel, the cross-validation objective function now becomes
$$CV_f(h_1, \dots, h_q) = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} \bar K_h(X_i, X_j) - \frac{2}{n(n-1)}\sum_{i=1}^{n}\sum_{j \ne i} K_h(X_i, X_j), \quad (1.38)$$
where
$$\bar K_h(X_i, X_j) = \prod_{s=1}^{q} h_s^{-1}\,\bar k\!\left(\frac{X_{is} - X_{js}}{h_s}\right), \qquad K_h(X_i, X_j) = \prod_{s=1}^{q} h_s^{-1}\, k\!\left(\frac{X_{is} - X_{js}}{h_s}\right),$$
and $\bar k(v)$ is the twofold convolution kernel based upon $k(\cdot)$ (i.e., $\bar k(v) = \int k(u) k(v - u)\,du$), where $k(\cdot)$ is a univariate kernel function satisfying (1.10). Exercise 1.12 shows that the leading term of $CV_f(h_1, \dots, h_q)$ is given by (ignoring a term unrelated to the $h_s$'s)
$$CV_{f0}(h_1, \dots, h_q) = \int\left[\sum_{s=1}^{q} B_s(x) h_s^2\right]^2 dx + \frac{\kappa^q}{n h_1 \cdots h_q}, \quad (1.39)$$
where $B_s(x) = (\kappa_2/2) f_{ss}(x)$. Defining $a_s$ via $h_s = a_s n^{-1/(q+4)}$ ($s = 1, \dots, q$), we have
$$CV_{f0}(h_1, \dots, h_q) = n^{-4/(q+4)}\,\chi_f(a_1, \dots, a_q), \quad (1.40)$$
where
$$\chi_f(a_1, \dots, a_q) = \int\left[\sum_{s=1}^{q} B_s(x) a_s^2\right]^2 dx + \frac{\kappa^q}{a_1 \cdots a_q}. \quad (1.41)$$
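A direct implementation of (1.38) is a useful check on one's understanding. The sketch below is our own illustration, not code from the text: it assumes Gaussian univariate kernels, for which the twofold convolution $\bar k = k * k$ is simply the $N(0, 2)$ density, and it selects a common bandwidth by grid search:

```python
import numpy as np

def cv_ls(X, h):
    """Least squares CV objective (1.38) with Gaussian product kernels.
    The twofold convolution of the N(0,1) kernel k is the N(0,2) density."""
    n, q = X.shape
    D = (X[:, None, :] - X[None, :, :]) / h                    # (n, n, q) differences
    Kbar = np.prod(np.exp(-0.25 * D ** 2) / np.sqrt(4 * np.pi), axis=2)
    K = np.prod(np.exp(-0.5 * D ** 2) / np.sqrt(2 * np.pi), axis=2)
    term1 = Kbar.sum() / (n ** 2 * np.prod(h))                 # first double sum
    off_diag = K.sum() - np.trace(K)                           # drop the j == i terms
    term2 = 2.0 * off_diag / (n * (n - 1) * np.prod(h))        # leave-one-out sum
    return term1 - term2

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
grid = np.linspace(0.15, 1.5, 28)
scores = [cv_ls(X, np.array([g, g])) for g in grid]
h_cv = grid[int(np.argmin(scores))]    # cross-validated common bandwidth
```

Here both components share one bandwidth for simplicity; letting each $h_s$ vary separately and minimizing over a multidimensional grid (or with a numerical optimizer) recovers the component-specific flexibility discussed in Section 1.7.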

Let the $a_s^0$'s be the values of the $a_s$'s that minimize $\chi_f(a_1, \dots, a_q)$. Under the same conditions used in the univariate case and, in addition, assuming that $f_{ss}(x)$ is not a zero function for all $s$, Li and Zhou (2005) show that each $a_s^0$ is uniquely defined, positive, and finite (see Exercise 1.10). Let $h_1^0, \dots, h_q^0$ denote the values of $h_1, \dots, h_q$ that minimize $CV_{f0}$. Then from (1.40) we know that $h_s^0 = a_s^0 n^{-1/(q+4)} = O\left(n^{-1/(q+4)}\right)$.

Exercise 1.12 shows that $CV_{f0}$ is also the leading term of $E[CV_f]$. Therefore, the nonstochastic smoothing parameters $h_s^0$ can be interpreted as optimal smoothing parameters that minimize the leading term of the IMSE. Let $\hat h_1, \dots, \hat h_q$ denote the values of $h_1, \dots, h_q$ that minimize $CV_f$. Using the fact that $CV_f = CV_{f0} + (s.o.)$, we can show that $\hat h_s = h_s^0 + o_p(h_s^0)$. Thus, we have
$$\frac{\hat h_s - h_s^0}{h_s^0} = \frac{\hat h_s}{h_s^0} - 1 \to 0 \text{ in probability, for } s = 1, \dots, q. \quad (1.42)$$
Therefore, smoothing parameters selected via cross-validation have the same asymptotic optimality properties as the nonstochastic optimal smoothing parameters. Note that if $f_{ss}(x) = 0$ almost everywhere (a.e.) for some $s$, then $B_s = 0$ and the above result does not hold. Stone (1984) shows that the cross-validation method still selects $h_1, \dots, h_q$ optimally in the sense that the integrated estimation square error is minimized; see also Ouyang et al. (2006) for a more detailed discussion of this case.

Likelihood Cross-Validation

Likelihood cross-validation for multivariate models follows directly via (multivariate) maximization of the likelihood function outlined in Section 1.3.2, hence we do not go into further details here. However, we do point out that, though straightforward to implement, it suffers from the same defects outlined for the univariate case in the presence of fat tail distributions (i.e., it has a tendency to oversmooth in such situations).

1.9 Asymptotic Normality of Density Estimators

In this section we show that $\hat f(x)$ has an asymptotic normal distribution. The most popular CLT is the Lindeberg-Levy CLT given in

Theorem A.3 of Appendix A, which states that $n^{1/2}\left[n^{-1}\sum_{i=1}^{n} Z_i\right] \to N(0, \sigma^2)$ in distribution, provided that the $Z_i$ are i.i.d. $(0, \sigma^2)$. Though the Lindeberg-Levy CLT can be used to derive the asymptotic distribution of various semiparametric estimators discussed in Chapters 7, 8, and 9, it cannot be used to derive the asymptotic distribution of $\hat f(x)$. This is because $\hat f(x) = n^{-1}\sum_i Z_{i,n}$, where the summand $Z_{i,n} = K_h(X_i, x)$ depends on $n$ (since $h = h(n)$). We shall make use of the Liapunov CLT given in Theorem A.5 of Appendix A.

Theorem 1.3. Let $X_1, \dots, X_n$ be i.i.d. $q$-vectors whose PDF $f(\cdot)$ has three-times bounded continuous derivatives. Let $x$ be an interior point of the support of $X$. If, as $n \to \infty$, $h_s \to 0$ for all $s = 1, \dots, q$, $n h_1 \cdots h_q \to \infty$, and $(n h_1 \cdots h_q)^{1/2}\sum_{s=1}^{q} h_s^3 \to 0$, then
$$\sqrt{n h_1 \cdots h_q}\left[\hat f(x) - f(x) - \frac{\kappa_2}{2}\sum_{s=1}^{q} h_s^2 f_{ss}(x)\right] \stackrel{d}{\to} N(0, \kappa^q f(x)).$$

Proof. Using (1.36) and (1.37), one can easily show that $\sqrt{n h_1 \cdots h_q}\left[\hat f(x) - f(x) - \frac{\kappa_2}{2}\sum_{s=1}^{q} h_s^2 f_{ss}(x)\right]$ has asymptotic mean zero and asymptotic variance $\kappa^q f(x)$, i.e.,
$$\begin{aligned}
&\sqrt{n h_1 \cdots h_q}\left[\hat f(x) - f(x) - \frac{\kappa_2}{2}\sum_{s=1}^{q} h_s^2 f_{ss}(x)\right] \\
&\quad= \sqrt{n h_1 \cdots h_q}\left[\hat f(x) - E\hat f(x)\right] + \sqrt{n h_1 \cdots h_q}\left[E\hat f(x) - f(x) - \frac{\kappa_2}{2}\sum_{s=1}^{q} h_s^2 f_{ss}(x)\right] \\
&\quad= \sqrt{n h_1 \cdots h_q}\left[\hat f(x) - E\hat f(x)\right] + O\!\left(\sqrt{n h_1 \cdots h_q}\,\sum_{s=1}^{q} h_s^3\right) \quad \text{(by (1.36))} \\
&\quad= \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{1}{(h_1 \cdots h_q)^{1/2}}\left[K\!\left(\frac{X_i - x}{h}\right) - E\,K\!\left(\frac{X_i - x}{h}\right)\right] + o(1) \\
&\quad\equiv \frac{1}{\sqrt{n}}\sum_{i=1}^{n} Z_{n,i} + o(1) \stackrel{d}{\to} N(0, \kappa^q f(x)),
\end{aligned}$$

by Liapunov's CLT, provided we can verify that Liapunov's CLT condition (A.1) holds, where
$$Z_{n,i} = (h_1 \cdots h_q)^{-1/2}\left[K\!\left(\frac{X_i - x}{h}\right) - E\,K\!\left(\frac{X_i - x}{h}\right)\right]$$
and
$$\sigma_{n,i}^2 \stackrel{\text{def}}{=} \mathrm{var}(Z_{n,i}) = \kappa^q f(x) + o(1)$$
by (1.37). Pagan and Ullah (1999, p. 40) show that (A.1) holds under the conditions given in Theorem 1.3. The condition that $\int |k(v)|^{2+\delta}\,dv < \infty$ for some $\delta > 0$ used in Pagan and Ullah is implied by our assumption that $k(v)$ is nonnegative and bounded, and that $\int k(v)\,dv = 1$, because $\int k(v)^{2+\delta}\,dv \le C \int k(v)\,dv = C$ is finite, where $C = \sup_v k(v)^{1+\delta}$.

1.10 Uniform Rates of Convergence

Up to now we have demonstrated only the case of pointwise and IMSE consistency (which implies consistency in probability). In this section we generalize pointwise consistency in order to obtain a stronger uniform consistency result. We will prove that nonparametric kernel estimators are uniformly almost surely consistent and derive their uniform almost sure rate of convergence. Almost sure convergence implies convergence in probability; however, the converse is not true, i.e., convergence in probability may not imply convergence almost surely; see Serfling (1980) for specific examples.

We have already established pointwise consistency for an interior point in the support of $X$. However, it turns out that popular kernel functions such as (1.9) may not lead to consistent estimation of $f(x)$ when $x$ is at the boundary of its support, hence we need to exclude the boundary ranges when considering the uniform convergence rate. This highlights an important aspect of kernel estimation in general, and a number of kernel estimators introduced in later sections are motivated by the desire to mitigate such boundary effects.

We first show that when $x$ is at (or near) the boundary of its support, $\hat f(x)$ may not be a consistent estimator of $f(x)$. Consider the case where $X$ is univariate having bounded support. For simplicity we assume that $X \in [0, 1]$. The pointwise consistency result $\hat f(x) - f(x) = o_p(1)$ obtained earlier requires that $x$ lie in the

interior of its support. Exercise 1.13 shows that, for $x$ at the boundary of its support, $\mathrm{MSE}(\hat f(x))$ may not be $o(1)$. Therefore, some modifications may be needed to consistently estimate $f(x)$ for $x$ at the boundary of its support. Typical modifications include the use of boundary kernels or data reflection (see Gasser and Müller (1979), Hall and Wehrly (1991), and Scott (1992)).

By way of example, consider the case where $x$ lies on its lowermost boundary, i.e., $x = 0$, hence $\hat f(0) = (nh)^{-1}\sum_{i=1}^{n} K((X_i - 0)/h)$. Exercise 1.13 shows that for this case, $E[\hat f(0)] = f(0)/2 + O(h)$. Therefore, $\mathrm{bias}[\hat f(0)] = E[\hat f(0)] - f(0) = -f(0)/2 + O(h)$, which will not converge to zero when $f(0) > 0$. In the literature, various boundary kernels are proposed to overcome the boundary (bias) problem. For example, a simple boundary corrected kernel is given by (assuming that $X \in [0, 1]$)
$$\bar k(x, y) = \begin{cases} h^{-1} k\!\left(\frac{y - x}{h}\right) \Big/ \int_{-x/h}^{\infty} k(v)\,dv & \text{if } x \in [0, h) \\[4pt] h^{-1} k\!\left(\frac{y - x}{h}\right) & \text{if } x \in [h, 1 - h] \\[4pt] h^{-1} k\!\left(\frac{y - x}{h}\right) \Big/ \int_{-\infty}^{(1 - x)/h} k(v)\,dv & \text{if } x \in (1 - h, 1], \end{cases} \quad (1.43)$$
where $k(\cdot)$ is a second order kernel satisfying (1.10). Now, we estimate $f(x)$ by
$$\hat f(x) = \frac{1}{n}\sum_{i=1}^{n} \bar k(x, X_i), \quad (1.44)$$
where $\bar k(x, X_i)$ is defined in (1.43). Exercise 1.14 shows that the above boundary corrected kernel successfully overcomes the boundary problem.

We now establish the uniform almost sure convergence rate of $\hat f(x) - f(x)$ for $x \in S$, where $S$ is a bounded set excluding the boundary range of the support of $X$. In the above example, where the support of $x$ is $[0, 1]$, we can choose $S = [\epsilon, 1 - \epsilon]$ for arbitrarily small positive $\epsilon$ ($0 < \epsilon < 1/2$). We assume that $f(x)$ is bounded below by a positive constant on $S$.

Theorem 1.4. Under smoothness conditions on $f(\cdot)$ given in Masry (1996b), and also assuming that $\inf_{x \in S} f(x) \ge \delta > 0$, we have
$$\sup_{x \in S}\left|\hat f(x) - f(x)\right| = O\!\left(\left(\frac{\ln(n)}{n h_1 \cdots h_q}\right)^{1/2} + \sum_{s=1}^{q} h_s^2\right) \text{ almost surely.}$$
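The boundary corrected kernel (1.43) is easy to implement when $k(\cdot)$ is Gaussian, since the normalizing integrals are Gaussian CDF values: $\int_{-x/h}^{\infty} k(v)\,dv = \Phi(x/h)$ and $\int_{-\infty}^{(1-x)/h} k(v)\,dv = \Phi((1-x)/h)$. The sketch below is our own illustration (the Gaussian choice of $k$ is an assumption, and the three branches of (1.43) are merged into a single renormalization by the kernel mass inside $[0, 1]$); it contrasts the uncorrected and corrected estimators at the boundary point $x = 0$ for data uniform on $[0, 1]$:

```python
import numpy as np
from math import erf, sqrt

def Phi(v):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(v / sqrt(2.0)))

def k(v):
    """Second order Gaussian kernel."""
    return np.exp(-0.5 * v ** 2) / np.sqrt(2.0 * np.pi)

def fhat_plain(x, X, h):
    """Uncorrected estimator: biased near the boundaries of [0, 1]."""
    return np.mean(k((X - x) / h)) / h

def fhat_boundary(x, X, h):
    """Boundary corrected estimator (1.44): renormalize by the kernel mass
    falling inside [0, 1], which merges the three branches of (1.43)."""
    mass = Phi((1.0 - x) / h) - Phi(-x / h)
    return fhat_plain(x, X, h) / mass

rng = np.random.default_rng(2)
X = rng.uniform(size=2000)            # f is uniform on [0, 1], so f(0) = 1
h = 0.1
plain = fhat_plain(0.0, X, h)         # expected value is roughly f(0)/2 = 0.5
corrected = fhat_boundary(0.0, X, h)  # roughly f(0) = 1
```

At $x = 0$ only half the Gaussian kernel's mass lies inside the support, so the uncorrected estimate is pulled toward $f(0)/2$ while the corrected estimate recovers $f(0)$.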

A detailed proof of Theorem 1.4 is given in Section 1.12. Since almost sure convergence implies convergence in probability, the uniform rate also holds in probability, i.e., under the same conditions as in Theorem 1.4, we have
$$\sup_{x \in S}\left|\hat f(x) - f(x)\right| = O_p\!\left(\left(\frac{\ln(n)}{n h_1 \cdots h_q}\right)^{1/2} + \sum_{s=1}^{q} h_s^2\right).$$
Using the results of (1.36) and (1.37), we can establish the following uniform MSE rate.

Theorem 1.5. Assuming that $f(x)$ is twice differentiable with bounded second derivatives, then we have
$$\sup_{x \in S} E\left\{\left[\hat f(x) - f(x)\right]^2\right\} = O\!\left(\sum_{s=1}^{q} h_s^4 + (n h_1 \cdots h_q)^{-1}\right).$$

Proof. This follows from (1.36) and (1.37), by noting that $\sup_{x \in S} f(x)$ and $\sup_{x \in S} |f_{ss}(x)|$ are both finite ($s = 1, \dots, q$).

Note that although convergence in MSE implies convergence in probability, one cannot derive the uniform convergence rate in probability from Theorem 1.5. This is because
$$E\left\{\sup_{x \in S}\left[\hat f(x) - f(x)\right]^2\right\} \ne \sup_{x \in S} E\left[\hat f(x) - f(x)\right]^2$$
and
$$P\left[\sup_{x \in S}\left|\hat f(x) - f(x)\right| > \epsilon\right] \ne \sup_{x \in S} P\left[\left|\hat f(x) - f(x)\right| > \epsilon\right].$$
The $\sup$ and the $E(\cdot)$ or $P(\cdot)$ operators do not commute with one another.

Cheng (1997) proposes alternative (local linear) density estimators that achieve automatic boundary corrections and enjoy some typical optimality properties. Cheng also suggests a data-based bandwidth selector (in the spirit of plug-in rules), and demonstrates that the bandwidth selector is very efficient regardless of whether there are nonsmooth boundaries in the support of the density.
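A quick simulation illustrates the uniform convergence in Theorem 1.4: as $n$ grows (with $h \propto n^{-1/5}$), the worst-case error over an interior set $S$ shrinks. This is our own illustrative check, not code from the text; the grid, the Gaussian kernel, and the constants are arbitrary choices:

```python
import numpy as np

def sup_error(n, rng):
    """Sup-norm error of a Gaussian-kernel density estimate of the N(0,1)
    density over the interior set S = [-2, 2]."""
    X = rng.standard_normal(n)
    h = 1.06 * n ** (-0.2)                 # h proportional to n^(-1/5)
    grid = np.linspace(-2.0, 2.0, 81)      # evaluation points in S
    u = (X[None, :] - grid[:, None]) / h
    fh = (np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)).mean(axis=1) / h
    f = np.exp(-0.5 * grid ** 2) / np.sqrt(2 * np.pi)   # true density on S
    return np.max(np.abs(fh - f))

rng = np.random.default_rng(3)
# average over a few replications so the comparison is not noise-driven
err_small = np.mean([sup_error(100, rng) for _ in range(5)])
err_large = np.mean([sup_error(10000, rng) for _ in range(5)])
```

With one hundred times as many observations, the average sup-norm error over $S$ drops markedly, consistent with the $((\ln n)/(nh))^{1/2} + h^2$ rate.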

1.11 Higher Order Kernel Functions

Recall that decreasing the bandwidth lowers the bias of a kernel estimator but increases its variance. Higher order kernel functions are devices used for bias reduction which are also capable of reducing the MSE of the resulting estimator. Many popular kernel functions such as the one defined in (1.10) are called second order kernels. The order of a kernel, $\nu$ ($\nu > 0$), is defined as the order of the first nonzero moment. For example, if $\int u k(u)\,du = 0$ but $\int u^2 k(u)\,du \ne 0$, then $k(\cdot)$ is said to be a second order kernel ($\nu = 2$). A general $\nu$th order kernel ($\nu$ is an integer) must therefore satisfy the following conditions:
$$\text{(i) } \int k(u)\,du = 1, \qquad \text{(ii) } \int u^l k(u)\,du = 0 \ (l = 1, \dots, \nu - 1), \qquad \text{(iii) } \int u^{\nu} k(u)\,du = \kappa_{\nu} \ne 0. \quad (1.45)$$
Obviously, when $\nu = 2$, (1.45) collapses to (1.10). If one replaces the second order kernel in $\hat f(x)$ of (1.35) by a $\nu$th order kernel function, then as was the case when using a second order kernel, under the assumption that $f(x)$ is $\nu$th order differentiable, and assuming that the $h_s$'s all have the same order of magnitude, one can show that
$$\mathrm{bias}\left(\hat f(x)\right) = O\!\left(\sum_{s=1}^{q} h_s^{\nu}\right) \quad (1.46)$$
and
$$\mathrm{var}\left(\hat f(x)\right) = O\!\left((n h_1 \cdots h_q)^{-1}\right) \quad (1.47)$$
(see Exercise 1.15). Hence, we have
$$\mathrm{MSE}\left(\hat f(x)\right) = O\!\left(\sum_{s=1}^{q} h_s^{2\nu} + (n h_1 \cdots h_q)^{-1}\right) \quad (1.48)$$
and
$$\hat f(x) - f(x) = O_p\!\left(\sum_{s=1}^{q} h_s^{\nu} + (n h_1 \cdots h_q)^{-1/2}\right).$$
Thus, by using a $\nu$th order kernel function ($\nu > 2$), one can reduce the order of the bias of $\hat f(x)$ from $O\left(\sum_{s=1}^{q} h_s^2\right)$ to $O\left(\sum_{s=1}^{q} h_s^{\nu}\right)$,

and the optimal value of $h_s$ may once again be obtained by balancing the squared bias and the variance, giving $h_s = O\left(n^{-1/(2\nu + q)}\right)$, while the rate of convergence is now $\hat f(x) - f(x) = O_p\left(n^{-\nu/(2\nu + q)}\right)$. Assuming that $f(x)$ is differentiable up to any finite order, then one can choose $\nu$ to be sufficiently large, and the resulting rate can be made arbitrarily close to $O_p(n^{-1/2})$. Note, however, that for $\nu > 2$, no nonnegative kernel exists that satisfies (1.45). This means that, necessarily, we have to assign negative weights to some range of the data, which implies that one may get negative density estimates, clearly an undesirable side effect. Furthermore, in finite-sample applications nonnegative second order kernels have often been found to yield more stable estimation results than their higher order counterparts. Therefore, higher order kernel functions are mainly used for theoretical purposes; for example, to achieve a $\sqrt{n}$-rate of convergence for some finite dimensional parameter in a semiparametric model, one often has to use high order kernel functions (see Chapter 7 for such an example).

Higher order kernel functions are quite easy to construct. Assuming that $k(u)$ is symmetric around zero (see footnote 5), i.e., $k(u) = k(-u)$, then $\int u^{2m+1} k(u)\,du = 0$ for all positive integers $m$. By way of example, in order to construct a simple fourth order kernel (i.e., $\nu = 4$), one could begin with, say, a second order kernel such as the standard normal kernel, set up a polynomial in its argument, and solve for the roots of the polynomial subject to the desired moment constraints. For example, letting $\phi(u) = (2\pi)^{-1/2}\exp(-u^2/2)$ be the second order Gaussian kernel, we could begin with the polynomial
$$k(u) = (a + b u^2)\,\phi(u), \quad (1.49)$$
where $a$ and $b$ are two constants which must satisfy the requirements of a fourth order kernel. Letting $k(u)$ satisfy (1.45) with $\nu = 4$ ($\int u^l k(u)\,du = 0$ for $l = 1, 3$ because $k(u)$ is an even function), we therefore only require $\int k(u)\,du = 1$ and $\int u^2 k(u)\,du = 0$. From these two restrictions, one can easily obtain the result $a = 3/2$ and $b = -1/2$.
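One can verify numerically that $k(u) = (3/2 - u^2/2)\,\phi(u)$ indeed satisfies the fourth order conditions in (1.45). The check below is our own sketch; it approximates the moments by a Riemann sum. For this kernel, $\kappa_4 = \int u^4 k(u)\,du = (3/2)\cdot 3 - (1/2)\cdot 15 = -3 \ne 0$:

```python
import numpy as np

def k4(u):
    """Fourth order Gaussian kernel (1.49) with a = 3/2, b = -1/2."""
    return (1.5 - 0.5 * u ** 2) * np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

# Riemann-sum approximation of the moment conditions in (1.45)
u = np.linspace(-10.0, 10.0, 400001)
du = u[1] - u[0]
m0 = (k4(u) * du).sum()             # condition (i):   should equal 1
m2 = (u ** 2 * k4(u) * du).sum()    # condition (ii):  should equal 0
m4 = (u ** 4 * k4(u) * du).sum()    # condition (iii): kappa_4 = -3, nonzero
```

The same check applied to the sixth order kernels below confirms that their moments up to order five vanish while the sixth moment does not.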
For readers requiring some higher order kernel functions, we provide a few examples based on the second order Gaussian and Epanechnikov kernels, perhaps the two most popular kernels in applied nonparametric estimation. As noted, the fourth order univariate Gaussian kernel

5. Typically, only symmetric kernel functions are used in practice, though see Abadir and Lawford (2004) for recent work involving optimal asymmetric kernels.

is given by the formula
$$k(u) = \left(\frac{3}{2} - \frac{1}{2}u^2\right)\frac{1}{\sqrt{2\pi}}\exp(-u^2/2),$$
while the sixth order univariate Gaussian kernel is given by
$$k(u) = \left(\frac{15}{8} - \frac{5}{4}u^2 + \frac{1}{8}u^4\right)\frac{1}{\sqrt{2\pi}}\exp(-u^2/2).$$
The second order univariate Epanechnikov kernel is the optimal kernel based on a calculus of variations solution to minimizing the IMSE of the kernel estimator (see Serfling (1980)). The univariate second order Epanechnikov kernel is given by the formula
$$k(u) = \begin{cases} \frac{3}{4\sqrt{5}}\left(1 - \frac{1}{5}u^2\right) & \text{if } u^2 < 5 \\ 0 & \text{otherwise,} \end{cases}$$
the fourth order univariate Epanechnikov kernel by
$$k(u) = \begin{cases} \frac{3}{4\sqrt{5}}\left(\frac{15}{8} - \frac{7}{8}u^2\right)\left(1 - \frac{1}{5}u^2\right) & \text{if } u^2 < 5 \\ 0 & \text{otherwise,} \end{cases}$$
while the sixth order univariate Epanechnikov kernel is given by
$$k(u) = \begin{cases} \frac{3}{4\sqrt{5}}\left(\frac{175}{64} - \frac{105}{32}u^2 + \frac{231}{320}u^4\right)\left(1 - \frac{1}{5}u^2\right) & \text{if } u^2 < 5 \\ 0 & \text{otherwise.} \end{cases}$$

Figure 1.2 plots the second, fourth, and sixth order Epanechnikov kernels defined above. Clearly, for $\nu > 2$, the kernels indeed assign negative weights, which can result in negative density estimates, not a desirable feature. For related work involving exact mean integrated squared error for higher order kernels in the context of univariate kernel density estimation, see Hansen (2005). Also, for related work using iterative methods to estimate transformation-kernel densities, see Yang and Marron (1999) and Yang (2000).

1.12 Proof of Theorem 1.4 (Uniform Almost Sure Convergence)

The proof below is based on the arguments presented in Masry (1996b), who establishes uniform almost sure rates for local polynomial regression with weakly dependent ($\alpha$-mixing) data; see Chapter 18 for further details on weakly dependent processes. Since the bias of the kernel


More information

Expectation and Variance of a random variable

Expectation and Variance of a random variable Chapter 11 Expectatio ad Variace of a radom variable The aim of this lecture is to defie ad itroduce mathematical Expectatio ad variace of a fuctio of discrete & cotiuous radom variables ad the distributio

More information

Stochastic Simulation

Stochastic Simulation Stochastic Simulatio 1 Itroductio Readig Assigmet: Read Chapter 1 of text. We shall itroduce may of the key issues to be discussed i this course via a couple of model problems. Model Problem 1 (Jackso

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5 CS434a/54a: Patter Recogitio Prof. Olga Veksler Lecture 5 Today Itroductio to parameter estimatio Two methods for parameter estimatio Maimum Likelihood Estimatio Bayesia Estimatio Itroducto Bayesia Decisio

More information

Frequentist Inference

Frequentist Inference Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for

More information

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f. Lecture 5 Let us give oe more example of MLE. Example 3. The uiform distributio U[0, ] o the iterval [0, ] has p.d.f. { 1 f(x =, 0 x, 0, otherwise The likelihood fuctio ϕ( = f(x i = 1 I(X 1,..., X [0,

More information

Problem Set 4 Due Oct, 12

Problem Set 4 Due Oct, 12 EE226: Radom Processes i Systems Lecturer: Jea C. Walrad Problem Set 4 Due Oct, 12 Fall 06 GSI: Assae Gueye This problem set essetially reviews detectio theory ad hypothesis testig ad some basic otios

More information

DEGENERACY AND ALL THAT

DEGENERACY AND ALL THAT DEGENERACY AND ALL THAT Te Nature of Termodyamics, Statistical Mecaics ad Classical Mecaics Termodyamics Te study of te equilibrium bulk properties of matter witi te cotext of four laws or facts of experiece

More information

On the convergence, consistence and stability of a standard finite difference scheme

On the convergence, consistence and stability of a standard finite difference scheme AMERICAN JOURNAL OF SCIENTIFIC AND INDUSTRIAL RESEARCH 2, Sciece Huβ, ttp://www.sciub.org/ajsir ISSN: 253-649X, doi:.525/ajsir.2.2.2.74.78 O te covergece, cosistece ad stabilit of a stadard fiite differece

More information

1 Introduction to reducing variance in Monte Carlo simulations

1 Introduction to reducing variance in Monte Carlo simulations Copyright c 010 by Karl Sigma 1 Itroductio to reducig variace i Mote Carlo simulatios 11 Review of cofidece itervals for estimatig a mea I statistics, we estimate a ukow mea µ = E(X) of a distributio by

More information

Tests of Hypotheses Based on a Single Sample (Devore Chapter Eight)

Tests of Hypotheses Based on a Single Sample (Devore Chapter Eight) Tests of Hypotheses Based o a Sigle Sample Devore Chapter Eight MATH-252-01: Probability ad Statistics II Sprig 2018 Cotets 1 Hypothesis Tests illustrated with z-tests 1 1.1 Overview of Hypothesis Testig..........

More information

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality

More information

Element sampling: Part 2

Element sampling: Part 2 Chapter 4 Elemet samplig: Part 2 4.1 Itroductio We ow cosider uequal probability samplig desigs which is very popular i practice. I the uequal probability samplig, we ca improve the efficiecy of the resultig

More information

Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 22

Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 22 CS 70 Discrete Mathematics for CS Sprig 2007 Luca Trevisa Lecture 22 Aother Importat Distributio The Geometric Distributio Questio: A biased coi with Heads probability p is tossed repeatedly util the first

More information

Chapter 6 Infinite Series

Chapter 6 Infinite Series Chapter 6 Ifiite Series I the previous chapter we cosidered itegrals which were improper i the sese that the iterval of itegratio was ubouded. I this chapter we are goig to discuss a topic which is somewhat

More information

CONCENTRATION INEQUALITIES

CONCENTRATION INEQUALITIES CONCENTRATION INEQUALITIES MAXIM RAGINSKY I te previous lecture, te followig result was stated witout proof. If X 1,..., X are idepedet Beroulliθ radom variables represetig te outcomes of a sequece of

More information

A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS

A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS J. Japa Statist. Soc. Vol. 41 No. 1 2011 67 73 A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS Yoichi Nishiyama* We cosider k-sample ad chage poit problems for idepedet data i a

More information

Pattern Classification, Ch4 (Part 1)

Pattern Classification, Ch4 (Part 1) Patter Classificatio All materials i these slides were take from Patter Classificatio (2d ed) by R O Duda, P E Hart ad D G Stork, Joh Wiley & Sos, 2000 with the permissio of the authors ad the publisher

More information

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals

More information

Statistics 511 Additional Materials

Statistics 511 Additional Materials Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability

More information

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara Poit Estimator Eco 325 Notes o Poit Estimator ad Cofidece Iterval 1 By Hiro Kasahara Parameter, Estimator, ad Estimate The ormal probability desity fuctio is fully characterized by two costats: populatio

More information

Introductory statistics

Introductory statistics CM9S: Machie Learig for Bioiformatics Lecture - 03/3/06 Itroductory statistics Lecturer: Sriram Sakararama Scribe: Sriram Sakararama We will provide a overview of statistical iferece focussig o the key

More information

Lecture 20: Multivariate convergence and the Central Limit Theorem

Lecture 20: Multivariate convergence and the Central Limit Theorem Lecture 20: Multivariate covergece ad the Cetral Limit Theorem Covergece i distributio for radom vectors Let Z,Z 1,Z 2,... be radom vectors o R k. If the cdf of Z is cotiuous, the we ca defie covergece

More information

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014.

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014. Product measures, Toelli s ad Fubii s theorems For use i MAT3400/4400, autum 2014 Nadia S. Larse Versio of 13 October 2014. 1. Costructio of the product measure The purpose of these otes is to preset the

More information

On Exact Finite-Difference Scheme for Numerical Solution of Initial Value Problems in Ordinary Differential Equations.

On Exact Finite-Difference Scheme for Numerical Solution of Initial Value Problems in Ordinary Differential Equations. O Exact Fiite-Differece Sceme for Numerical Solutio of Iitial Value Problems i Ordiar Differetial Equatios. Josua Suda, M.Sc. Departmet of Matematical Scieces, Adamawa State Uiversit, Mubi, Nigeria. E-mail:

More information

Computation Of Asymptotic Distribution For Semiparametric GMM Estimators

Computation Of Asymptotic Distribution For Semiparametric GMM Estimators Computatio Of Asymptotic Distributio For Semiparametric GMM Estimators Hideiko Icimura Departmet of Ecoomics Uiversity College Lodo Cemmap UCL ad IFS April 9, 2004 Abstract A set of su ciet coditios for

More information

Empirical Processes: Glivenko Cantelli Theorems

Empirical Processes: Glivenko Cantelli Theorems Empirical Processes: Gliveko Catelli Theorems Mouliath Baerjee Jue 6, 200 Gliveko Catelli classes of fuctios The reader is referred to Chapter.6 of Weller s Torgo otes, Chapter??? of VDVW ad Chapter 8.3

More information

Elements of Statistical Methods Lots of Data or Large Samples (Ch 8)

Elements of Statistical Methods Lots of Data or Large Samples (Ch 8) Elemets of Statistical Methods Lots of Data or Large Samples (Ch 8) Fritz Scholz Sprig Quarter 2010 February 26, 2010 x ad X We itroduced the sample mea x as the average of the observed sample values x

More information

TR/46 OCTOBER THE ZEROS OF PARTIAL SUMS OF A MACLAURIN EXPANSION A. TALBOT

TR/46 OCTOBER THE ZEROS OF PARTIAL SUMS OF A MACLAURIN EXPANSION A. TALBOT TR/46 OCTOBER 974 THE ZEROS OF PARTIAL SUMS OF A MACLAURIN EXPANSION by A. TALBOT .. Itroductio. A problem i approximatio theory o which I have recetly worked [] required for its solutio a proof that the

More information

Parameter, Statistic and Random Samples

Parameter, Statistic and Random Samples Parameter, Statistic ad Radom Samples A parameter is a umber that describes the populatio. It is a fixed umber, but i practice we do ot kow its value. A statistic is a fuctio of the sample data, i.e.,

More information

10. Comparative Tests among Spatial Regression Models. Here we revisit the example in Section 8.1 of estimating the mean of a normal random

10. Comparative Tests among Spatial Regression Models. Here we revisit the example in Section 8.1 of estimating the mean of a normal random Part III. Areal Data Aalysis 0. Comparative Tests amog Spatial Regressio Models While the otio of relative likelihood values for differet models is somewhat difficult to iterpret directly (as metioed above),

More information

Lecture 7: Density Estimation: k-nearest Neighbor and Basis Approach

Lecture 7: Density Estimation: k-nearest Neighbor and Basis Approach STAT 425: Itroductio to Noparametric Statistics Witer 28 Lecture 7: Desity Estimatio: k-nearest Neighbor ad Basis Approach Istructor: Ye-Chi Che Referece: Sectio 8.4 of All of Noparametric Statistics.

More information

Large Sample Theory. Convergence. Central Limit Theorems Asymptotic Distribution Delta Method. Convergence in Probability Convergence in Distribution

Large Sample Theory. Convergence. Central Limit Theorems Asymptotic Distribution Delta Method. Convergence in Probability Convergence in Distribution Large Sample Theory Covergece Covergece i Probability Covergece i Distributio Cetral Limit Theorems Asymptotic Distributio Delta Method Covergece i Probability A sequece of radom scalars {z } = (z 1,z,

More information

1 Covariance Estimation

1 Covariance Estimation Eco 75 Lecture 5 Covariace Estimatio ad Optimal Weightig Matrices I this lecture, we cosider estimatio of the asymptotic covariace matrix B B of the extremum estimator b : Covariace Estimatio Lemma 4.

More information

Notes On Nonparametric Density Estimation. James L. Powell Department of Economics University of California, Berkeley

Notes On Nonparametric Density Estimation. James L. Powell Department of Economics University of California, Berkeley Notes O Noarametric Desity Estimatio James L. Powell Deartmet of Ecoomics Uiversity of Califoria, Berkeley Uivariate Desity Estimatio via Numerical Derivatives Cosider te roblem of estimatig te desity

More information

5. Likelihood Ratio Tests

5. Likelihood Ratio Tests 1 of 5 7/29/2009 3:16 PM Virtual Laboratories > 9. Hy pothesis Testig > 1 2 3 4 5 6 7 5. Likelihood Ratio Tests Prelimiaries As usual, our startig poit is a radom experimet with a uderlyig sample space,

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Singular Continuous Measures by Michael Pejic 5/14/10

Singular Continuous Measures by Michael Pejic 5/14/10 Sigular Cotiuous Measures by Michael Peic 5/4/0 Prelimiaries Give a set X, a σ-algebra o X is a collectio of subsets of X that cotais X ad ad is closed uder complemetatio ad coutable uios hece, coutable

More information

ECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization

ECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization ECE 90 Lecture 4: Maximum Likelihood Estimatio ad Complexity Regularizatio R Nowak 5/7/009 Review : Maximum Likelihood Estimatio We have iid observatios draw from a ukow distributio Y i iid p θ, i,, where

More information

Machine Learning Brett Bernstein

Machine Learning Brett Bernstein Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio

More information

ECE 330:541, Stochastic Signals and Systems Lecture Notes on Limit Theorems from Probability Fall 2002

ECE 330:541, Stochastic Signals and Systems Lecture Notes on Limit Theorems from Probability Fall 2002 ECE 330:541, Stochastic Sigals ad Systems Lecture Notes o Limit Theorems from robability Fall 00 I practice, there are two ways we ca costruct a ew sequece of radom variables from a old sequece of radom

More information

Notes for Lecture 11

Notes for Lecture 11 U.C. Berkeley CS78: Computatioal Complexity Hadout N Professor Luca Trevisa 3/4/008 Notes for Lecture Eigevalues, Expasio, ad Radom Walks As usual by ow, let G = (V, E) be a udirected d-regular graph with

More information

Chapter 10: Power Series

Chapter 10: Power Series Chapter : Power Series 57 Chapter Overview: Power Series The reaso series are part of a Calculus course is that there are fuctios which caot be itegrated. All power series, though, ca be itegrated because

More information

STA Object Data Analysis - A List of Projects. January 18, 2018

STA Object Data Analysis - A List of Projects. January 18, 2018 STA 6557 Jauary 8, 208 Object Data Aalysis - A List of Projects. Schoeberg Mea glaucomatous shape chages of the Optic Nerve Head regio i aimal models 2. Aalysis of VW- Kedall ati-mea shapes with a applicatio

More information

1 Approximating Integrals using Taylor Polynomials

1 Approximating Integrals using Taylor Polynomials Seughee Ye Ma 8: Week 7 Nov Week 7 Summary This week, we will lear how we ca approximate itegrals usig Taylor series ad umerical methods. Topics Page Approximatig Itegrals usig Taylor Polyomials. Defiitios................................................

More information

11 THE GMM ESTIMATION

11 THE GMM ESTIMATION Cotets THE GMM ESTIMATION 2. Cosistecy ad Asymptotic Normality..................... 3.2 Regularity Coditios ad Idetificatio..................... 4.3 The GMM Iterpretatio of the OLS Estimatio.................

More information

Mathematical Statistics - MS

Mathematical Statistics - MS Paper Specific Istructios. The examiatio is of hours duratio. There are a total of 60 questios carryig 00 marks. The etire paper is divided ito three sectios, A, B ad C. All sectios are compulsory. Questios

More information

Chapter 5. Inequalities. 5.1 The Markov and Chebyshev inequalities

Chapter 5. Inequalities. 5.1 The Markov and Chebyshev inequalities Chapter 5 Iequalities 5.1 The Markov ad Chebyshev iequalities As you have probably see o today s frot page: every perso i the upper teth percetile ears at least 1 times more tha the average salary. I other

More information

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y

More information

1 Inferential Methods for Correlation and Regression Analysis

1 Inferential Methods for Correlation and Regression Analysis 1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet

More information

Chapter 2 The Monte Carlo Method

Chapter 2 The Monte Carlo Method Chapter 2 The Mote Carlo Method The Mote Carlo Method stads for a broad class of computatioal algorithms that rely o radom sampligs. It is ofte used i physical ad mathematical problems ad is most useful

More information

Introduction to Extreme Value Theory Laurens de Haan, ISM Japan, Erasmus University Rotterdam, NL University of Lisbon, PT

Introduction to Extreme Value Theory Laurens de Haan, ISM Japan, Erasmus University Rotterdam, NL University of Lisbon, PT Itroductio to Extreme Value Theory Laures de Haa, ISM Japa, 202 Itroductio to Extreme Value Theory Laures de Haa Erasmus Uiversity Rotterdam, NL Uiversity of Lisbo, PT Itroductio to Extreme Value Theory

More information

Asymptotic Results for the Linear Regression Model

Asymptotic Results for the Linear Regression Model Asymptotic Results for the Liear Regressio Model C. Fli November 29, 2000 1. Asymptotic Results uder Classical Assumptios The followig results apply to the liear regressio model y = Xβ + ε, where X is

More information