Kernel density estimator


January 2017

NONPARAMETRIC KERNEL DENSITY ESTIMATION

In this lecture, we discuss kernel estimation of probability density functions (PDFs). Nonparametric density estimation is one of the central problems in statistics. In economics, nonparametric density estimation plays important roles in various areas such as, for example, industrial organization (Guerre et al., 2000) and empirical finance (Aït-Sahalia, 1996). These notes borrow from the following sources: Li and Racine (2007), Pagan and Ullah (1999), and Härdle and Linton (1994).

1 Kernel density estimator

Assumption 1. (a) $\{X_i : i = 1, \dots, n\}$ is a collection of iid random variables drawn from a distribution with CDF $F$ and PDF $f$. (b) In the neighborhood $N_x$ of $x$, $f$ is bounded and twice continuously differentiable with bounded derivatives.

When discussing $f''(x)$, we will implicitly assume that $f''(x)$ exists at $x$. The econometrician's objective is to estimate $f$ without imposing any functional form (parametric) assumptions on the PDF.

First, consider estimation of $F$. Since $F(x) = E\,1\{X_i \le x\}$, an estimator of $F$ can be constructed as
\[
\hat F(x) = \frac{1}{n} \sum_{i=1}^{n} 1\{X_i \le x\}. \tag{1}
\]
The function $\hat F(x)$ is called the empirical CDF of the $X_i$'s. The WLLN implies that for all $x$, $\hat F(x) \to_p F(x)$. As a matter of fact, a stronger result can be established (Glivenko-Cantelli Theorem, see Chapter 19 of van der Vaart, 1998):
\[
\sup_{x \in \mathbb{R}} \left| \hat F(x) - F(x) \right| \to_{a.s.} 0.
\]
Next, by the CLT,
\[
n^{1/2} \left( \hat F(x) - F(x) \right) \to_d N\left( 0, F(x)(1 - F(x)) \right).
\]
Furthermore, for any $x_1, x_2 \in \mathbb{R}$, $n^{1/2}(\hat F(x_1) - F(x_1))$ and $n^{1/2}(\hat F(x_2) - F(x_2))$ are jointly asymptotically normal with mean zero and covariance $F(x_1 \wedge x_2) - F(x_1)F(x_2)$, where $x_1 \wedge x_2$ denotes the minimum of $x_1$ and $x_2$.

Since
\[
f(x) = \frac{dF(x)}{dx} = \lim_{h \to 0} \frac{F(x + h) - F(x - h)}{2h},
\]
from (1), one can consider the following estimator for the PDF $f$:
\[
\hat f(x) = \frac{\hat F(x + h_n) - \hat F(x - h_n)}{2 h_n} = \frac{1}{2 n h_n} \sum_{i=1}^{n} 1\{x - h_n \le X_i \le x + h_n\},
\]
where $h_n$ is a small number (note that we consider continuously distributed random variables, so that $P(X_i = x \pm h_n) = 0$). We write $h_n$ instead of just $h$ because, typically, it will be a function of the sample size $n$ such that $\lim_{n \to \infty} h_n = 0$.

Now, define the following kernel function:
\[
k(u) = \frac{1}{2} \cdot 1\{|u| \le 1\}. \tag{2}
\]
Then, the kernel PDF estimator is given by
\[
\hat f(x) = \frac{1}{n h_n} \sum_{i=1}^{n} k\!\left( \frac{X_i - x}{h_n} \right). \tag{3}
\]
Thus, with the kernel function defined according to (2), the kernel density estimator is an average number of observations in a small neighborhood of $x$, as defined by the smoothing parameter or bandwidth $h_n$ (also called the kernel window). The kernel function in (2) is called uniform, because it corresponds to the uniform distribution; we have $\int k(u)\,du = 1$. It has the disadvantage of giving equal weights to all observations inside the $h_n$-window centered at $x$, regardless of how close they are to the center. Also, if one considers $\hat f(x)$ as a function of $x$, it is rough: it has jumps at the points $X_i \pm h_n$ and a zero derivative everywhere else. Those problems can be resolved if one considers alternative kernel functions, for example, the quadratic kernel:
\[
k(u) = \frac{15}{16} \left( 1 - u^2 \right)^2 \cdot 1\{|u| \le 1\}.
\]
The class of estimators (3) with a kernel satisfying $\int k(u)\,du = 1$ is referred to as the Rosenblatt-Parzen kernel estimator.

2 Small sample properties of the kernel density estimator

We will make the following assumption concerning $k$:

Assumption 2. (a) $\int k(u)\,du = 1$. (b) $k(u) = k(-u)$. (c) $k$ is compactly supported on $[-1, 1]$ and bounded. (d) $\int u^2 k(u)\,du \ne 0$.

The kernel density estimator is biased:

Lemma 1. Under Assumptions 1(a) and 2(a),
\[
E \hat f(x) - f(x) = \int k(u) \left( f(x + u h_n) - f(x) \right) du.
\]

Proof.
\[
E \hat f(x) = \frac{1}{h_n} E\, k\!\left( \frac{X_i - x}{h_n} \right) = \frac{1}{h_n} \int k\!\left( \frac{u - x}{h_n} \right) f(u)\,du.
\]
Next, using the change of variable $y = (u - x)/h_n$, $u = x + y h_n$, and $du = h_n\,dy$, we obtain $E \hat f(x) = \int k(u) f(x + u h_n)\,du$, and the result follows since $\int f(x) k(u)\,du = f(x)$ by Assumption 2(a). $\blacksquare$
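Before turning to the sampling properties, here is a minimal computational sketch (an illustration added to these notes, not part of the original), implementing the kernel density estimator with the uniform and quadratic kernels above on simulated N(0,1) data:

```python
# Illustrative sketch: Rosenblatt-Parzen estimator
#   f_hat(x) = (1/(n h)) * sum_i k((X_i - x) / h)
# with the uniform and quadratic (compactly supported) kernels.
import numpy as np

def k_uniform(u):
    # k(u) = (1/2) 1{|u| <= 1}
    return 0.5 * (np.abs(u) <= 1.0)

def k_quadratic(u):
    # k(u) = (15/16) (1 - u^2)^2 1{|u| <= 1}
    return (15.0 / 16.0) * (1.0 - u**2) ** 2 * (np.abs(u) <= 1.0)

def kde(x, data, h, kernel=k_quadratic):
    """Kernel density estimate at the point(s) x with bandwidth h."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = (data[None, :] - x[:, None]) / h      # shape (len(x), n)
    return kernel(u).mean(axis=1) / h

rng = np.random.default_rng(1)
X = rng.standard_normal(2000)
grid = np.linspace(-3.0, 3.0, 7)
print(kde(grid, X, h=0.4))                    # roughly traces the N(0,1) density
```

With `k_uniform`, `kde` reproduces the windowed-count estimator above; the quadratic kernel downweights observations far from $x$ and yields a smoother estimate.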

Lemma 2. Under Assumptions 1(a) and 2(a), the variance of $\hat f(x)$ is given by
\[
Var(\hat f(x)) = \frac{1}{n h_n} \int k(u)^2 f(x + u h_n)\,du - \frac{1}{n} \left( \int k(u) f(x + u h_n)\,du \right)^2.
\]

Proof. Since the data are iid,
\[
Var(\hat f(x)) = \frac{1}{n h_n^2} Var\, k\!\left( \frac{X_i - x}{h_n} \right)
= \frac{1}{n h_n^2} \left[ E\, k\!\left( \frac{X_i - x}{h_n} \right)^2 - \left( E\, k\!\left( \frac{X_i - x}{h_n} \right) \right)^2 \right].
\]
By the same change of variable argument as in the proof of Lemma 1, we obtain
\[
E\, k\!\left( \frac{X_i - x}{h_n} \right)^2 = \int k\!\left( \frac{u - x}{h_n} \right)^2 f(u)\,du = h_n \int k(u)^2 f(x + u h_n)\,du. \;\blacksquare
\]

From Lemmas 1 and 2, one can expect that the bias increases with $h_n$: a bigger bandwidth implies that more observations away from $x$ receive non-zero weights, which contributes to the bias. On the other hand, the variance decreases with $h_n$, as the estimator averages over more observations. The theorem below establishes more formally the bias-variance trade-off for the kernel estimator. Let $f^{(s)}$ denote the $s$-th order derivative of $f$: $f^{(s)}(x) = d^s f(x)/dx^s$.

Theorem 1. Suppose that $h_n \to 0$ and $n h_n \to \infty$ as $n \to \infty$. Then, under Assumptions 1 and 2,
(a) $E \hat f(x) - f(x) = c_1(x) h_n^2 + o(h_n^2)$, where $c_1(x) = f''(x) \int u^2 k(u)\,du / 2$;
(b) $Var(\hat f(x)) = c_2(x)/(n h_n) + O(1/n)$, where $c_2(x) = f(x) \int k(u)^2\,du$.

Proof. Since the first two derivatives of $f$ exist by Assumption 1(b), consider the following expansion of $f(x + u h_n)$:
\[
f(x + u h_n) = f(x) + f'(x) u h_n + \frac{1}{2} f''(x_u) u^2 h_n^2,
\]
where $x_u$ lies between $x$ and $x + u h_n$. From Lemma 1 we have
\[
E \hat f(x) - f(x) = \int k(u) \left( f'(x) u h_n + \frac{1}{2} f''(x_u) u^2 h_n^2 \right) du
= \frac{h_n^2}{2} \int u^2 k(u) f''(x_u)\,du
= c_1(x) h_n^2 + \frac{h_n^2}{2} \int u^2 k(u) \left( f''(x_u) - f''(x) \right) du. \tag{4}
\]
The second equality follows because, by Assumption 2(b), $\int u\, k(u)\,du = 0$. We will show next that
\[
\int u^2 k(u) \left( f''(x_u) - f''(x) \right) du = o(1), \tag{5}
\]
and therefore the second summand in (4) is $o(h_n^2)$. By Assumption 2(c), we only need to consider $|u| \le 1$; by Assumption 1(b) and since $x_u$ lies between $x$ and $x + u h_n$,
\[
\left| f''(x_u) - f''(x) \right| \le 2 \sup_{z \in N_x} \left| f''(z) \right| < \infty.
\]
Next, since $h_n \to 0$, $\lim_{n \to \infty} x_u = x$. Now, by the dominated convergence theorem,
\[
\lim_{n \to \infty} \int u^2 k(u) \left( f''(x_u) - f''(x) \right) du
= \int u^2 k(u) \lim_{n \to \infty} \left( f''(x_u) - f''(x) \right) du = 0,
\]
which establishes (5) and concludes the proof of part (a) of the theorem.

For part (b),
\[
\int k(u)^2 f(x + u h_n)\,du
= f(x) \int k(u)^2\,du + h_n f'(x) \int u\, k(u)^2\,du + \frac{h_n^2}{2} \int u^2 k(u)^2 f''(x_u)\,du
= c_2(x) + O(h_n^2), \tag{6}
\]
since $\int u\, k(u)^2\,du = 0$ by symmetry (Assumption 2(b)), and $\int u^2 k(u)^2 f''(x_u)\,du = O(1)$ as in the proof of part (a). The result of part (b) follows from (6) and Lemma 2. $\blacksquare$

Again, Theorem 1 shows the bias-variance trade-off. The optimal choice of bandwidth can be found by minimizing some function that combines the bias and variance, for example, the mean squared error (MSE):
\[
MSE(\hat f(x)) = E \left( \hat f(x) - f(x) \right)^2 = Var(\hat f(x)) + \left( E \hat f(x) - f(x) \right)^2
= \frac{c_2(x)}{n h_n} + c_1(x)^2 h_n^4 + O(1/n) + o(h_n^4). \tag{7}
\]
Minimization of the leading term of the MSE gives the first-order condition
\[
4 c_1(x)^2 h_n^3 = \frac{c_2(x)}{n h_n^2},
\]
or
\[
h_n^* = \left( \frac{c_2(x)}{4 c_1(x)^2} \right)^{1/5} n^{-1/5}
= \frac{ \left( f(x) \int k(u)^2\,du \right)^{1/5} }{ \left( f''(x) \int u^2 k(u)\,du \right)^{2/5} } \, n^{-1/5}.
\]
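The $h_n^2$ bias rate and $1/(n h_n)$ variance rate of Theorem 1 can be checked by simulation. The sketch below (an illustration added to these notes) estimates the bias and variance of $\hat f(0)$ for N(0,1) data over Monte Carlo replications, using the quadratic kernel:

```python
# Monte Carlo sketch of the bias-variance trade-off at x = 0 for N(0,1) data:
# shrinking h should shrink the bias (rate h^2) and inflate the variance
# (rate 1/(n h)).  True value: f(0) = 1/sqrt(2*pi) ~ 0.3989.
import numpy as np

def kde_at0(data, h):
    u = data / h
    k = (15.0 / 16.0) * (1.0 - u**2) ** 2 * (np.abs(u) <= 1.0)  # quadratic kernel
    return k.mean() / h

def mc_bias_var(n, h, reps=4000, seed=0):
    rng = np.random.default_rng(seed)
    est = np.array([kde_at0(rng.standard_normal(n), h) for _ in range(reps)])
    return est.mean() - 1.0 / np.sqrt(2.0 * np.pi), est.var()

b_wide, v_wide = mc_bias_var(n=500, h=0.6)
b_narrow, v_narrow = mc_bias_var(n=500, h=0.2)
print(b_wide, b_narrow)   # magnitude of the bias shrinks roughly like h^2
print(v_wide, v_narrow)   # variance grows as h shrinks
```

Since $f''(0) < 0$ for the standard normal density, the estimator at the mode is biased downward, consistent with the sign of $c_1(0)$ in Theorem 1(a).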

When the optimal (in the MSE sense) bandwidth is selected, neither the bias nor the variance component of the MSE dominates the other asymptotically, as $Var = Bias^2 = O(n^{-4/5})$. When the integrated MSE criterion $\int MSE(\hat f(x))\,dx$ is employed, the optimal bandwidth becomes
\[
h_n^* = \frac{ \left( \int k(u)^2\,du \right)^{1/5} }{ \left( \int f''(x)^2\,dx \right)^{1/5} \left( \int u^2 k(u)\,du \right)^{2/5} } \, n^{-1/5}.
\]
Let $\hat\sigma^2$ denote the sample variance of the data. The following rules of thumb are often used in practice:
\[
h_n = 1.364\, \hat\sigma \left( \int k(u)^2\,du \right)^{1/5} \left( \int u^2 k(u)\,du \right)^{-2/5} n^{-1/5},
\]
which is optimal for $f(x) \sim N(\mu, \sigma^2)$, and
\[
h_n = 1.06\, \hat\sigma\, n^{-1/5},
\]
which is optimal for $f(x) \sim N(\mu, \sigma^2)$ and when $k$ is the standard normal density.

3 Consistency of the kernel density estimator

Consistency of $\hat f(x)$ follows immediately from Theorem 1 by Chebyshev's inequality.

Corollary 1. Suppose that $h_n \to 0$ and $n h_n \to \infty$ as $n \to \infty$. Then, under Assumptions 1 and 2, $\hat f(x) \to_p f(x)$.

Proof. By Chebyshev's inequality,
\[
P\left( \left| \hat f(x) - f(x) \right| > \varepsilon \right) \le \frac{ E \left( \hat f(x) - f(x) \right)^2 }{ \varepsilon^2 }
= \frac{1}{\varepsilon^2} \left( \frac{c_2(x)}{n h_n} + c_1(x)^2 h_n^4 + O(1/n) + o(h_n^4) \right) \to 0,
\]
where the second equality is by (7). $\blacksquare$

A stronger result can be given (see Newey, 1994). Suppose that $f$ admits at least $m$ continuous derivatives on some interval $[x_1, x_2]$, and that $k$ has at least $m$ continuous derivatives, is compactly supported, and is of order $m$: $\int u^j k(u)\,du = 0$ for all $j = 1, \dots, m-1$, $\int u^m k(u)\,du \ne 0$, and $\int k(u)\,du = 1$. Then
\[
\sup_{x \in [x_1, x_2]} \left| \hat f(x) - f(x) \right| = O_p\left( \left( \frac{\log n}{n h_n} \right)^{1/2} + h_n^m \right).
\]
The derivatives of $f$ can be estimated by the derivatives of $\hat f$, however with slower rates of convergence. Newey (1994) shows that
\[
\sup_{x \in [x_1, x_2]} \left| \hat f^{(k)}(x) - f^{(k)}(x) \right| = O_p\left( \left( \frac{\log n}{n h_n^{1+2k}} \right)^{1/2} + h_n^m \right).
\]
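The second rule of thumb above is easy to apply in code. A sketch (an illustration added to these notes; the Gaussian kernel does not satisfy the compact-support part of Assumption 2, but is standard in practice):

```python
# Sketch: rule-of-thumb bandwidth h = 1.06 * sigma_hat * n^(-1/5) with a
# Gaussian kernel; the estimate at a point stabilizes as n grows (consistency).
import numpy as np

def rule_of_thumb_h(data):
    return 1.06 * data.std(ddof=1) * len(data) ** (-1.0 / 5.0)

def gaussian_kde(x, data, h):
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = (data[None, :] - x[:, None]) / h
    return (np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)).mean(axis=1) / h

rng = np.random.default_rng(42)
for n in (100, 1000, 10000):
    X = rng.standard_normal(n)
    h = rule_of_thumb_h(X)
    # h shrinks like n^(-1/5); the estimate approaches f(0) = 0.3989...
    print(n, h, gaussian_kde(0.0, X, h)[0])
```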

4 Asymptotic normality of the kernel density estimator

Write
\[
\hat f(x) = \frac{1}{n h_n} \sum_{i=1}^{n} k\!\left( \frac{X_i - x}{h_n} \right) = \frac{1}{n} \sum_{i=1}^{n} v_{n,i},
\quad \text{where} \quad
v_{n,i} = \frac{1}{h_n} k\!\left( \frac{X_i - x}{h_n} \right).
\]
Note that $h_n$, and consequently $v_{n,i}$, depend on $n$. The collection $\{\{v_{n,i} : i = 1, \dots, n\} : n \in \mathbb{N}\}$ is called a triangular array. In our case, under Assumption 1, the $v_{n,i}$'s are iid. The following CLT is available for independent triangular arrays (Lehmann and Romano, 2005).

Lemma 3 (Lyapounov CLT). Suppose that for each $n$, $w_{n,1}, \dots, w_{n,n}$ are independent. Assume that $E w_{n,i} = 0$ and $\sigma_{n,i}^2 = E w_{n,i}^2 < \infty$, and define $s_n^2 = \sum_{i=1}^{n} \sigma_{n,i}^2$. Suppose further that for some $\delta > 0$ the following condition holds:
\[
\lim_{n \to \infty} \frac{1}{s_n^{2+\delta}} \sum_{i=1}^{n} E \left| w_{n,i} \right|^{2+\delta} = 0. \tag{8}
\]
Then $\sum_{i=1}^{n} w_{n,i} / s_n \to_d N(0, 1)$.

The condition (8) is called Lyapounov's condition. When the data are not just independent but iid, Lyapounov's condition can be simplified as follows (Davidson, 1994).

Lemma 4. Lyapounov's condition is satisfied when $w_{n,1}, \dots, w_{n,n}$ are iid, $\sigma_n^2 = E w_{n,i}^2 > 0$ uniformly in $n$, and $\lim_{n \to \infty} E |w_{n,i}|^{2+\delta} / n^{\delta/2} = 0$ for some $\delta > 0$.

Proof. Since the data are iid,
\[
s_n^2 = n \sigma_n^2, \quad \text{and} \quad s_n^{2+\delta} = \left( n^{1/2} \sigma_n \right)^{2+\delta} = n^{1+\delta/2} \sigma_n^{2+\delta}.
\]
We have
\[
\frac{1}{s_n^{2+\delta}} \sum_{i=1}^{n} E |w_{n,i}|^{2+\delta}
= \frac{n}{n^{1+\delta/2} \sigma_n^{2+\delta}} E |w_{n,i}|^{2+\delta}
= \sigma_n^{-(2+\delta)} \frac{ E |w_{n,i}|^{2+\delta} }{ n^{\delta/2} }.
\]
Therefore, Lyapounov's condition is satisfied if $E |w_{n,i}|^{2+\delta} / n^{\delta/2} \to 0$, since $\sigma_n$ is uniformly bounded away from zero by assumption. $\blacksquare$

Assuming that $\lim_{n \to \infty} \sigma_n^2$ exists, in the iid case the result of the Lyapounov CLT can be stated as follows.

Corollary 2. Suppose that for each $n$, $w_{n,1}, \dots, w_{n,n}$ are iid, $E w_{n,i} = 0$, $\lim_{n \to \infty} E w_{n,i}^2$ is positive and finite, and $\lim_{n \to \infty} E |w_{n,i}|^{2+\delta} / n^{\delta/2} = 0$ for some $\delta > 0$. Then
\[
n^{-1/2} \sum_{i=1}^{n} w_{n,i} \to_d N\left( 0, \lim_{n \to \infty} E w_{n,i}^2 \right).
\]

Next, we prove asymptotic normality of the kernel density estimator.

Theorem 2. Suppose that $n h_n \to \infty$ and $(n h_n)^{1/2} h_n^2 \to 0$. Assume further that $f(x) > 0$. Then, under Assumptions 1 and 2,
\[
(n h_n)^{1/2} \left( \hat f(x) - f(x) \right) \to_d N\left( 0, f(x) \int k(u)^2\,du \right). \tag{9}
\]
Furthermore, for $x_1 \ne x_2$, $(n h_n)^{1/2}(\hat f(x_1) - f(x_1))$ and $(n h_n)^{1/2}(\hat f(x_2) - f(x_2))$ are asymptotically independent.

Proof. By Theorem 1(a),
\[
(n h_n)^{1/2} \left( \hat f(x) - f(x) \right)
= (n h_n)^{1/2} \left( \hat f(x) - E \hat f(x) \right) + (n h_n)^{1/2} \left( E \hat f(x) - f(x) \right)
= (n h_n)^{1/2} \left( \hat f(x) - E \hat f(x) \right) + O\left( (n h_n)^{1/2} h_n^2 \right). \tag{10}
\]
Define
\[
w_{n,i} = h_n^{-1/2} \left( k\!\left( \frac{X_i - x}{h_n} \right) - E\, k\!\left( \frac{X_i - x}{h_n} \right) \right).
\]
Then
\[
(n h_n)^{1/2} \left( \hat f(x) - f(x) \right) = n^{-1/2} \sum_{i=1}^{n} w_{n,i} + o(1),
\]
where the equality is by the assumption that $(n h_n)^{1/2} h_n^2 \to 0$. It is now left to verify the conditions of Corollary 2. By the definition of $w_{n,i}$, $E w_{n,i} = 0$. Next,
\[
E w_{n,i}^2 = \frac{1}{h_n} E\, k\!\left( \frac{X_i - x}{h_n} \right)^2 - \frac{1}{h_n} \left( E\, k\!\left( \frac{X_i - x}{h_n} \right) \right)^2. \tag{11}
\]
As in the proof of Lemma 1 and by the dominated convergence theorem,
\[
E\, k\!\left( \frac{X_i - x}{h_n} \right) = h_n \int k(u) f(x + u h_n)\,du = O(h_n),
\]
so that the second summand in (11) is $O(h_n)$ and asymptotically negligible. For the first term in (11), we can use the change of variable argument again:
\[
\frac{1}{h_n} E\, k\!\left( \frac{X_i - x}{h_n} \right)^2
= \frac{1}{h_n} \int k\!\left( \frac{u - x}{h_n} \right)^2 f(u)\,du
= \int k(u)^2 f(x + u h_n)\,du \to f(x) \int k(u)^2\,du, \tag{12}
\]
where the last result is by the dominated convergence theorem. The results in (11)-(12) together imply that
\[
\lim_{n \to \infty} E w_{n,i}^2 = f(x) \int k(u)^2\,du.
\]
Lastly, we show that $E |w_{n,i}|^{2+\delta} / n^{\delta/2} \to 0$. We will use the $c_r$ inequality (Davidson, 1994) in order to deal with $E |w_{n,i}|^{2+\delta}$: for $r > 0$,
\[
E \left| \sum_{i=1}^{m} X_i \right|^r \le c_r \sum_{i=1}^{m} E |X_i|^r,
\]
where $c_r = 1$ when $r \le 1$, and $c_r = m^{r-1}$ when $r > 1$. Now, by the $c_r$ inequality,
\[
E |w_{n,i}|^{2+\delta} \le \frac{2^{1+\delta}}{h_n^{1+\delta/2}} \left( E \left| k\!\left( \frac{X_i - x}{h_n} \right) \right|^{2+\delta} + \left| E\, k\!\left( \frac{X_i - x}{h_n} \right) \right|^{2+\delta} \right).
\]
By the bound $E\, k((X_i - x)/h_n) = O(h_n)$ above, the second term in the parentheses contributes only $O(h_n^{1+\delta/2}) = o(1)$. Further,
\[
\frac{1}{h_n^{1+\delta/2}} E \left| k\!\left( \frac{X_i - x}{h_n} \right) \right|^{2+\delta}
= \frac{1}{h_n^{1+\delta/2}} \int \left| k\!\left( \frac{u - x}{h_n} \right) \right|^{2+\delta} f(u)\,du
= h_n^{-\delta/2} \int |k(u)|^{2+\delta} f(x + u h_n)\,du
= O\left( h_n^{-\delta/2} \right), \tag{13}
\]
where the equality in the last line is again by the dominated convergence theorem. Hence,
\[
\frac{ E |w_{n,i}|^{2+\delta} }{ n^{\delta/2} } = O\left( \frac{1}{(n h_n)^{\delta/2}} \right) \to 0.
\]
This completes the proof of (9).

In order to show asymptotic independence of $\hat f(x_1)$ and $\hat f(x_2)$, consider their asymptotic covariance:
\[
\frac{1}{h_n} E\, k\!\left( \frac{X_i - x_1}{h_n} \right) k\!\left( \frac{X_i - x_2}{h_n} \right)
= \frac{1}{h_n} \int k\!\left( \frac{u - x_1}{h_n} \right) k\!\left( \frac{u - x_2}{h_n} \right) f(u)\,du
= \int k(u)\, k\!\left( u + \frac{x_1 - x_2}{h_n} \right) f(x_1 + u h_n)\,du.
\]
Since the kernel function is compactly supported and $\lim_{n \to \infty} |x_1 - x_2| / h_n = \infty$,
\[
\lim_{n \to \infty} k\!\left( u + \frac{x_1 - x_2}{h_n} \right) = 0,
\]
and by the dominated convergence theorem,
\[
\lim_{n \to \infty} \int k(u)\, k\!\left( u + \frac{x_1 - x_2}{h_n} \right) f(x_1 + u h_n)\,du = 0.
\]
Asymptotic independence then follows by the Cramér-Wold device. $\blacksquare$
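The theorem suggests the pointwise asymptotic confidence interval $\hat f(x) \pm z_{1-\alpha/2} \bigl( \hat f(x) \int k(u)^2\,du / (n h_n) \bigr)^{1/2}$ under undersmoothing. A Monte Carlo sketch of its coverage (an illustration added to these notes, using the Gaussian kernel, which is not compactly supported but satisfies $\int k(u)^2\,du = 1/(2\sqrt{\pi})$):

```python
# Monte Carlo sketch: coverage of the pointwise 95% CI for f(0) implied by
# (n h)^(1/2) (f_hat(x) - f(x)) -> N(0, f(x) * int k^2).  Undersmoothed
# bandwidth h = n^(-1/3) (alpha = 1/3 > 1/5), N(0,1) data, Gaussian kernel.
import numpy as np

def gauss_kde_at0(data, h):
    u = data / h
    return (np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)).mean() / h

def ci_coverage(n, reps=2000, seed=0):
    rng = np.random.default_rng(seed)
    rk = 1.0 / (2.0 * np.sqrt(np.pi))    # int k(u)^2 du for the Gaussian kernel
    f0 = 1.0 / np.sqrt(2.0 * np.pi)      # true N(0,1) density at x = 0
    h = n ** (-1.0 / 3.0)                # undersmoothing: bias is negligible
    hits = 0
    for _ in range(reps):
        fhat = gauss_kde_at0(rng.standard_normal(n), h)
        se = np.sqrt(fhat * rk / (n * h))
        hits += abs(fhat - f0) <= 1.96 * se
    return hits / reps

print(ci_coverage(2000))   # empirical coverage should be close to 0.95
```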

From (10), one can see that the assumption $(n h_n)^{1/2} h_n^2 \to 0$ is used to make the bias asymptotically negligible. Consequently, there is under-smoothing relative to the MSE-optimal bandwidth, and the bias goes to zero at a faster rate than the variance. Suppose that the bandwidth is chosen according to $h_n = c\, n^{-\alpha}$. Then,
\[
(n h_n)^{1/2} h_n^2 \propto n^{1/2} n^{-\alpha/2} n^{-2\alpha} = n^{(1 - 5\alpha)/2},
\]
and for $(n h_n)^{1/2} h_n^2 \to 0$ to hold, we need $1 - 5\alpha < 0$, or $\alpha > 1/5$. Thus, for asymptotic normality, the bandwidth must be $o(n^{-1/5})$, while the MSE-optimal bandwidth is $h_n^* = c\, n^{-1/5}$.

A more general statement of the asymptotic normality result that also includes the bias (i.e. without imposing under-smoothing) is
\[
(n h_n)^{1/2} \left( \hat f(x) - f(x) - 0.5\, h_n^2 f''(x) \int u^2 k(u)\,du \right) \to_d N\left( 0, f(x) \int k(u)^2\,du \right). \tag{14}
\]
The result in (14) holds provided that $n h_n \to \infty$ and does not require that $(n h_n)^{1/2} h_n^2 \to 0$. In particular, if one chooses $h_n = a\, n^{-1/5}$, then
\[
(n h_n)^{1/2} \left( \hat f(x) - f(x) \right) \to_d N\left( 0.5\, a^{5/2} f''(x) \int u^2 k(u)\,du,\; f(x) \int k(u)^2\,du \right),
\]
and the kernel density estimator is asymptotically biased.

5 Multivariate kernel density estimation and the curse of dimensionality

Suppose now that $\{X_i : i = 1, \dots, n\}$ is a collection of iid random $d$-vectors drawn from a distribution with a joint PDF $f(x_1, \dots, x_d)$. The univariate kernel density estimator can be extended to the multivariate case as follows:
\[
\hat f(x_1, \dots, x_d) = \frac{1}{n h_n^d} \sum_{i=1}^{n} \prod_{j=1}^{d} k\!\left( \frac{X_{ij} - x_j}{h_n} \right)
\]
(note $h_n^d$ in the denominator instead of $h_n$). One can see that the multivariate kernel density estimator is an extension of univariate kernel smoothing to $d$ dimensions or $d$ variables. In the multivariate case, one can establish results similar to those of the univariate case. To simplify the notation, let
\[
x = (x_1, \dots, x_d)' \in \mathbb{R}^d,
\]
and write $f(x_1, \dots, x_d) = f(x)$. Also, for $u = (u_1, \dots, u_d)' \in \mathbb{R}^d$, let
\[
k_d(u) = \prod_{j=1}^{d} k(u_j),
\]

so that
\[
\hat f(x_1, \dots, x_d) = \hat f(x) = \frac{1}{n h_n^d} \sum_{i=1}^{n} k_d\!\left( \frac{X_i - x}{h_n} \right).
\]
Note that
\[
\int k_d(u)\,du = \int k(u_1)\,du_1 \cdots \int k(u_d)\,du_d = \left( \int k(u)\,du \right)^d = 1,
\]
where the second equality follows by Assumption 2(a). Similar results to those shown for the univariate estimator can be established in the multivariate case.

Assumption 3. (a) $\{X_i : i = 1, \dots, n\}$ is a collection of iid random vectors drawn from a distribution with a joint PDF $f$. (b) In the neighborhood $N_x$ of $x$, $f$ is bounded and twice continuously differentiable with bounded partial derivatives.

Theorem 3. Suppose that $h_n \to 0$ and $n h_n^d \to \infty$ as $n \to \infty$. Then, under Assumptions 2 and 3,
(a) $E \hat f(x) - f(x) = \dfrac{h_n^2}{2} \displaystyle\sum_{j=1}^{d} \dfrac{\partial^2 f(x)}{\partial x_j^2} \int u^2 k(u)\,du + o(h_n^2)$;
(b) $Var(\hat f(x)) = \dfrac{f(x) \int k_d(u)^2\,du}{n h_n^d} + O\left( \dfrac{1}{n h_n^{d-1}} + \dfrac{1}{n} \right)$.

Proof. For part (a),
\[
E \hat f(x) = \frac{1}{h_n^d} \int k_d\!\left( \frac{u - x}{h_n} \right) f(u)\,du = \int k_d(v) f(x + h_n v)\,dv,
\]
where we used the change of variables $v = (u - x)/h_n$, $u = x + h_n v$, $du_j = h_n\,dv_j$ for $j = 1, \dots, d$, $du = du_1 \cdots du_d$, and $dv = dv_1 \cdots dv_d$. Next,
\[
f(x + h_n v) = f(x) + h_n v' \frac{\partial f(x)}{\partial x} + \frac{h_n^2}{2} v' \frac{\partial^2 f(x_v)}{\partial x \partial x'} v,
\]
where $x_v$ denotes the mean value satisfying $\|x_v - x\| \le h_n \|v\|$, i.e. it lies between $x$ and $x + h_n v$. Since the kernel function is symmetric around zero, $\int v\, k_d(v)\,dv = 0$. By Assumption 3(b) and the same arguments as in the proof of Theorem 1(a),
\[
\frac{\partial^2 f(x_v)}{\partial x \partial x'} \to \frac{\partial^2 f(x)}{\partial x \partial x'}.
\]
Hence,
\[
E \hat f(x) = f(x) + \frac{h_n^2}{2} \int v' \frac{\partial^2 f(x_v)}{\partial x \partial x'} v\, k_d(v)\,dv + o(h_n^2)
\]
\[
= f(x) + \frac{h_n^2}{2} \sum_{i=1}^{d} \sum_{j=1}^{d} \frac{\partial^2 f(x)}{\partial x_i \partial x_j} \int v_i v_j k_d(v)\,dv + o(h_n^2)
\]
\[
= f(x) + \frac{h_n^2}{2} \sum_{j=1}^{d} \frac{\partial^2 f(x)}{\partial x_j^2} \int v_j^2\, k(v_j)\,dv_j
+ \frac{h_n^2}{2} \sum_{i=1}^{d} \sum_{j \ne i} \frac{\partial^2 f(x)}{\partial x_i \partial x_j} \int v_i k(v_i)\,dv_i \int v_j k(v_j)\,dv_j + o(h_n^2)
\]
\[
= f(x) + \frac{h_n^2}{2} \sum_{j=1}^{d} \frac{\partial^2 f(x)}{\partial x_j^2} \int u^2 k(u)\,du + o(h_n^2),
\]

where the equality in the last line holds due to the symmetry of the kernel function $k(u)$ around zero, which makes the cross terms vanish.

For part (b),
\[
Var(\hat f(x)) = \frac{1}{n h_n^{2d}} Var\, k_d\!\left( \frac{X_i - x}{h_n} \right)
= \frac{1}{n h_n^{2d}} \left[ E\, k_d\!\left( \frac{X_i - x}{h_n} \right)^2 - \left( E\, k_d\!\left( \frac{X_i - x}{h_n} \right) \right)^2 \right]
\]
\[
= \frac{1}{n h_n^{2d}} \left[ E\, k_d\!\left( \frac{X_i - x}{h_n} \right)^2 - \left( h_n^d \left( f(x) + O(h_n^2) \right) \right)^2 \right]
\qquad \text{(by the result in part (a))}
\]
\[
= \frac{1}{n h_n^{2d}}\, E\, k_d\!\left( \frac{X_i - x}{h_n} \right)^2 + O\left( \frac{1}{n} \right)
= \frac{1}{n h_n^d} \int k_d(u)^2 f(x + h_n u)\,du + O\left( \frac{1}{n} \right)
\]
\[
= \frac{f(x) \int k_d(u)^2\,du}{n h_n^d} + O\left( \frac{1}{n h_n^{d-1}} + \frac{1}{n} \right),
\]
where the last line uses the expansion $f(x + h_n u) = f(x) + h_n u' \partial f(x_u)/\partial x$ with bounded gradient, and $\int k_d(u)^2\,du = \left( \int k(u)^2\,du \right)^d$. $\blacksquare$

The bias and variance calculations imply that
\[
\hat f(x) = f(x) + O_p\left( h_n^2 + \frac{1}{(n h_n^d)^{1/2}} \right),
\]
and therefore the rate of convergence slows down with the number of variables $d$. In the nonparametric literature, this is referred to as the curse of dimensionality. One can derive the MSE-point-optimal bandwidth considering the leading terms in the bias and variance expressions implied by Theorem 3:
\[
MSE(x) = \frac{h_n^4}{4} \left( \int u^2 k(u)\,du \right)^2 \left( \sum_{j=1}^{d} \frac{\partial^2 f(x)}{\partial x_j^2} \right)^2
+ \frac{f(x) \left( \int k(u)^2\,du \right)^d}{n h_n^d}.
\]
Minimizing the MSE with respect to $h_n$, we obtain the first-order condition
\[
h_n^3 \left( \int u^2 k(u)\,du \right)^2 \left( \sum_{j=1}^{d} \frac{\partial^2 f(x)}{\partial x_j^2} \right)^2
= \frac{d\, f(x) \left( \int k(u)^2\,du \right)^d}{n h_n^{d+1}}.
\]
Therefore, the MSE-optimal bandwidth is given by
\[
h_n^* = \left( \frac{d\, f(x) \left( \int k(u)^2\,du \right)^d}
{ \left( \int u^2 k(u)\,du \right)^2 \left( \sum_{j=1}^{d} \frac{\partial^2 f(x)}{\partial x_j^2} \right)^2 } \right)^{1/(d+4)} n^{-1/(d+4)}.
\]

One can see that the rate of the optimal bandwidth, $n^{-1/(d+4)}$, increases with the number of variables $d$; i.e. one should use larger values of the bandwidth when there are more variables. One can extend the univariate CLT to the multivariate case as follows:
\[
(n h_n^d)^{1/2} \left( \hat f(x) - f(x) - \frac{h_n^2}{2} \sum_{j=1}^{d} \frac{\partial^2 f(x)}{\partial x_j^2} \int u^2 k(u)\,du \right)
\to_d N\left( 0,\; f(x) \left( \int k(u)^2\,du \right)^d \right).
\]
To eliminate the asymptotic bias, one has to choose an under-smoothing bandwidth so that $(n h_n^d)^{1/2} h_n^2 \to 0$. One can also incorporate different bandwidth values for different variables:
\[
\hat f(x_1, \dots, x_d) = \frac{1}{n h_{1,n} \cdots h_{d,n}} \sum_{i=1}^{n} \prod_{j=1}^{d} k\!\left( \frac{X_{ij} - x_j}{h_{j,n}} \right).
\]
The bias and variance results in this case take the following form:
\[
E \hat f(x) = f(x) + \frac{\int u^2 k(u)\,du}{2} \sum_{j=1}^{d} \frac{\partial^2 f(x)}{\partial x_j^2}\, h_{j,n}^2 + o\left( h_{1,n}^2 + \cdots + h_{d,n}^2 \right),
\]
\[
Var(\hat f(x)) = \frac{f(x) \left( \int k(u)^2\,du \right)^d}{n h_{1,n} \cdots h_{d,n}}
+ O\left( \frac{h_{1,n} + \cdots + h_{d,n}}{n h_{1,n} \cdots h_{d,n}} + \frac{1}{n} \right).
\]
Analogously, the CLT statement can be modified as
\[
(n h_{1,n} \cdots h_{d,n})^{1/2} \left( \hat f(x) - f(x) - \frac{\int u^2 k(u)\,du}{2} \sum_{j=1}^{d} \frac{\partial^2 f(x)}{\partial x_j^2}\, h_{j,n}^2 \right)
\to_d N\left( 0,\; f(x) \left( \int k(u)^2\,du \right)^d \right).
\]
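A computational sketch of the product-kernel estimator (an illustration added to these notes), using a Gaussian kernel in each coordinate; note the $h_n^d$ normalization and the $n^{-1/(d+4)}$ bandwidth rate:

```python
# Sketch: multivariate product-kernel density estimator
#   f_hat(x) = (1/(n h^d)) * sum_i prod_j k((X_ij - x_j) / h)
# with Gaussian k, evaluated at the origin for N(0, I_d) data.
import numpy as np

def mv_kde_at(x, data, h):
    """Density estimate at a point x in R^d; data has shape (n, d)."""
    u = (data - x[None, :]) / h                       # shape (n, d)
    kvals = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    d = data.shape[1]
    return np.prod(kvals, axis=1).mean() / h**d

rng = np.random.default_rng(0)
n, d = 4000, 2
X = rng.standard_normal((n, d))
h = n ** (-1.0 / (d + 4))          # MSE-rate bandwidth n^(-1/(d+4))
# true N(0, I_2) density at the origin: 1/(2*pi) ~ 0.1592
print(mv_kde_at(np.zeros(d), X, h))
```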

Bibliography

Aït-Sahalia, Y. (1996): "Testing Continuous-Time Models of the Spot Interest Rate," The Review of Financial Studies, 9, 385-426.

Davidson, J. (1994): Stochastic Limit Theory, New York: Oxford University Press.

Guerre, E., I. Perrigne, and Q. Vuong (2000): "Optimal Nonparametric Estimation of First-Price Auctions," Econometrica, 68, 525-574.

Härdle, W. and O. Linton (1994): "Applied Nonparametric Methods," in Handbook of Econometrics, ed. by R. F. Engle and D. L. McFadden, Amsterdam: Elsevier, vol. 4, chap. 38, 2295-2339.

Lehmann, E. L. and J. P. Romano (2005): Testing Statistical Hypotheses, New York: Springer, third ed.

Li, Q. and J. S. Racine (2007): Nonparametric Econometrics: Theory and Practice, Princeton, New Jersey: Princeton University Press.

Newey, W. (1994): "Kernel Estimation of Partial Means and a General Variance Estimator," Econometric Theory, 10, 233-253.

Pagan, A. and A. Ullah (1999): Nonparametric Econometrics, New York: Cambridge University Press.

van der Vaart, A. W. (1998): Asymptotic Statistics, Cambridge: Cambridge University Press.