LECTURE 2 LEAST SQUARES CROSS-VALIDATION FOR KERNEL DENSITY ESTIMATION

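January 2017

Nonparametric kernel estimation is extremely sensitive to the choice of the bandwidth $h$: larger values of $h$ result in averaging over more observations, smoother estimates, and more bias, while smaller values of $h$ result in rougher estimates and bigger variances. Thus, it is desirable to have an automatic procedure that picks the optimal value of $h$, in some sense, from data. Here we consider one such procedure, least squares cross-validation, in the case of univariate density estimation. Throughout, $\hat f(x) = (nh)^{-1}\sum_{i=1}^n K((X_i - x)/h)$ denotes the kernel density estimator with kernel $K$, as in the previous lecture.

As the optimality criterion, we use the integrated MSE (IMSE) discussed in the previous lecture:
\[
\mathrm{IMSE} = E\int \big(\hat f(x) - f(x)\big)^2 dx = b h^4 + \frac{c}{nh} + o\Big(h^4 + \frac{1}{nh}\Big), \tag{1}
\]
where
\[
b = \int c_1^2(x)\,dx = \frac{1}{4}\int \big(f''(x)\big)^2 dx \Big(\int u^2 K(u)\,du\Big)^2,
\]
as $c_1(x)$ was defined in Lecture 1 as $c_1(x) = \frac{1}{2} f''(x)\int u^2 K(u)\,du$; and
\[
c = \int c_2(x)\,dx = \int K^2(u)\,du,
\]
as $c_2(x)$ was defined in Lecture 1 as $c_2(x) = f(x)\int K^2(u)\,du$. As shown in Lecture 1, the IMSE-optimal bandwidth is given by
\[
h^* = \Big(\frac{c}{4b}\Big)^{1/5} n^{-1/5}
= \frac{\big(\int K^2(u)\,du\big)^{1/5}}{\big(\int (f''(x))^2 dx\big)^{1/5}\big(\int u^2 K(u)\,du\big)^{2/5}}\; n^{-1/5}.
\]
The discussion below follows Li and Racine (2007), see Section 1.3.

Cross-validation

Ideally, one would like to compute the IMSE for different values of $h$; however, $f(x)$ appearing in the expression for the IMSE is unknown. Thus, we should attempt to approximate the IMSE using data. Consider the integrated squared difference between $\hat f$ and $f$:
\[
\int \big(\hat f(x) - f(x)\big)^2 dx = \int \hat f^2(x)\,dx - 2\int \hat f(x) f(x)\,dx + \int f^2(x)\,dx. \tag{2}
\]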
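To get a feel for the size of the IMSE-optimal bandwidth $(c/(4b))^{1/5} n^{-1/5}$, the following sketch evaluates it numerically in one concrete case. It assumes a Gaussian kernel and a $N(0,\sigma^2)$ density for $f$; both choices, the midpoint-rule integrator, and all function names are illustrative assumptions and are not part of the lecture.

```python
import math

def gauss(u):
    """Standard normal density, used as the kernel K (an assumption of this sketch)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def integrate(fn, lo=-12.0, hi=12.0, steps=24000):
    """Midpoint-rule approximation of the integral of fn over [lo, hi]."""
    dx = (hi - lo) / steps
    return dx * sum(fn(lo + (k + 0.5) * dx) for k in range(steps))

def imse_optimal_bandwidth(n, sigma=1.0):
    """(c / (4 b))^(1/5) * n^(-1/5) for a Gaussian kernel and N(0, sigma^2) data."""
    c = integrate(lambda u: gauss(u) ** 2)        # c = int K^2(u) du
    k2 = integrate(lambda u: u * u * gauss(u))    # int u^2 K(u) du

    def f2(x):
        # Second derivative of the N(0, sigma^2) density.
        z = x / sigma
        return (z * z - 1.0) * gauss(z) / sigma ** 3

    rf2 = integrate(lambda x: f2(x) ** 2, -12.0 * sigma, 12.0 * sigma)
    b = 0.25 * k2 ** 2 * rf2                      # b = (1/4) (int u^2 K)^2 int (f'')^2
    return (c / (4.0 * b)) ** 0.2 * n ** (-0.2)

h_star = imse_optimal_bandwidth(n=100)
```

Under these assumptions the integrals have closed forms and the bandwidth reduces to $(4/3)^{1/5}\sigma n^{-1/5} \approx 1.06\,\sigma n^{-1/5}$. In practice $f''$ is unknown, which is what motivates the data-driven criterion of this lecture.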

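The last term in (2) does not depend on $h$ and therefore can be ignored. For the first term, we have
\[
\begin{aligned}
\int \hat f^2(x)\,dx
&= \frac{1}{n^2h^2}\int \Big(\sum_{i=1}^n K\Big(\frac{X_i - x}{h}\Big)\Big)^2 dx \\
&= \frac{1}{n^2h^2}\sum_{i=1}^n\sum_{j=1}^n \int K\Big(\frac{X_i - x}{h}\Big) K\Big(\frac{X_j - x}{h}\Big)\,dx \\
&= \frac{1}{n^2h}\sum_{i=1}^n\sum_{j=1}^n \int K(u)\, K\Big(\frac{X_j - X_i}{h} + u\Big)\,du \\
&= \frac{1}{n^2h}\sum_{i=1}^n\sum_{j=1}^n \bar K\Big(\frac{X_j - X_i}{h}\Big),
\end{aligned}
\]
where we used the change of variable $u = (X_i - x)/h$, so that $x = X_i - uh$, $(X_j - x)/h = (X_j - X_i)/h + u$, and $dx = -h\,du$ (the minus sign is absorbed by reversing the limits of integration), and defined
\[
\bar K(v) = \int K(u) K(v + u)\,du.
\]
The function $\bar K(v)$ is called the convolution kernel and can be computed for any value $v$, since $K$ is known and chosen by the econometrician.

The second term in (2) involves the unknown $f$ and therefore must be estimated. Since for a function $\psi(x)$, $\int \psi(x) f(x)\,dx = E\psi(X_i)$, one can estimate $\int \psi(x) f(x)\,dx$ by $n^{-1}\sum_{i=1}^n \psi(X_i)$. Hence, a naive estimator of $\int \hat f(x) f(x)\,dx$ would be
\[
\frac{1}{n}\sum_{i=1}^n \hat f(X_i) = \frac{1}{n^2h}\sum_{i=1}^n\sum_{j=1}^n K\Big(\frac{X_j - X_i}{h}\Big).
\]
This approach, however, does not approximate $\int \hat f(x) f(x)\,dx$ well, as the expression above contains the terms with $j = i$, for which $X_j - X_i = 0$ and which produce $K(0)$. Thus, the expression must be modified to eliminate the terms with $K(0)$:
\[
\frac{1}{n}\sum_{i=1}^n \hat f_{-i}(X_i) = \frac{1}{n(n-1)h}\sum_{i=1}^n\sum_{j\neq i} K\Big(\frac{X_j - X_i}{h}\Big),
\]
where
\[
\hat f_{-i}(x) = \frac{1}{(n-1)h}\sum_{j\neq i} K\Big(\frac{X_j - x}{h}\Big)
\]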

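is the so-called leave-one-out estimator of $f(x)$. One can show that using the leave-one-out estimator of $f$ produces an unbiased estimator of the leading term of $E\int \hat f(x) f(x)\,dx$ when data are iid and the bandwidth is fixed:
\[
\begin{aligned}
E\,\frac{1}{n}\sum_{i=1}^n \hat f_{-i}(X_i)
&= E \hat f_{-i}(X_i) \\
&= E\,E\big(\hat f_{-i}(X_i) \mid X_1,\dots,X_{i-1},X_{i+1},\dots,X_n\big) \\
&= E\int \hat f_{-i}(x)\, f_{X_i \mid X_1,\dots,X_{i-1},X_{i+1},\dots,X_n}(x \mid X_1,\dots,X_{i-1},X_{i+1},\dots,X_n)\,dx \\
&= E\int \hat f_{-i}(x) f(x)\,dx \quad \text{(by independence of } X_1,\dots,X_n\text{)} \\
&= \int E\hat f_{-i}(x)\, f(x)\,dx \\
&= E\int \hat f(x) f(x)\,dx,
\end{aligned}
\]
where the last equality holds since $E\hat f(x) = E\,h^{-1}K((X_j - x)/h)$ does not depend on the number of observations when $h$ is fixed.

The least squares cross-validation criterion becomes
\[
CV(h) = \frac{1}{n^2h}\sum_{i=1}^n\sum_{j=1}^n \bar K\Big(\frac{X_j - X_i}{h}\Big) - \frac{2}{n(n-1)h}\sum_{i=1}^n\sum_{j\neq i} K\Big(\frac{X_j - X_i}{h}\Big),
\]
which must be minimized numerically with respect to $h$ to find the optimal bandwidth. One can show that the leading term of $E\,CV(h)$ captures that of the IMSE in (1).

Lemma 1. Suppose that the Assumptions on the kernel function in Lecture 1 hold. Suppose that data are iid with the PDF $f$, which is four times continuously differentiable with uniformly bounded derivatives. Then
\[
E\,CV(h) = -\int f^2(x)\,dx + b h^4 + \frac{c}{nh} + o\Big(h^4 + \frac{1}{nh}\Big).
\]

Remark. (i) Note that minimizing $CV(h)$ is equivalent to minimizing $CV(h) + \int f^2(x)\,dx$. Hence, the leading term of $E\,CV(h)$ captures exactly the leading term of $\mathrm{IMSE} = E\int (\hat f(x) - f(x))^2 dx$. (ii) If one does not use the leave-one-out estimator, the third term in the expression for $E\,CV(h)$ becomes $(c - 2K(0))/(nh)$, which is negative for typical kernels. As a result, minimizing $CV(h)$ would produce $h = 0$; see Exercise 1.6 in Li and Racine (2007).

Proof. This is a part of Exercise 1.6 in Li and Racine (2007). First, re-write $CV(h)$ as
\[
CV(h) = \frac{\bar K(0)}{nh} + CV_1(h) + R, \tag{3}
\]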

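where
\[
\begin{aligned}
CV_1(h) &= \frac{1}{n(n-1)h}\sum_{i=1}^n\sum_{j\neq i} \bar K\Big(\frac{X_j - X_i}{h}\Big) - \frac{2}{n(n-1)h}\sum_{i=1}^n\sum_{j\neq i} K\Big(\frac{X_j - X_i}{h}\Big), \\
R &= \frac{1}{n^2h}\sum_{i=1}^n\sum_{j\neq i} \bar K\Big(\frac{X_j - X_i}{h}\Big) - \frac{1}{n(n-1)h}\sum_{i=1}^n\sum_{j\neq i} \bar K\Big(\frac{X_j - X_i}{h}\Big)
= -\frac{1}{n^2(n-1)h}\sum_{i=1}^n\sum_{j\neq i} \bar K\Big(\frac{X_j - X_i}{h}\Big).
\end{aligned}
\]
Note that the first term in (3) captures one of the leading terms of the IMSE, as $\bar K(0) = \int K^2(u)\,du = c$. Since $\bar K(u) \geq 0$, one can easily show that $R = O_p(n^{-1})$: $P(|R| > \epsilon) \leq E|R|/\epsilon$, while
\[
E|R| = \frac{1}{nh}\int\!\!\int \bar K\Big(\frac{u - v}{h}\Big) f(u) f(v)\,du\,dv = \frac{1}{n}\int\!\!\int \bar K(y) f(v + hy) f(v)\,dy\,dv = O(n^{-1}).
\]
We can re-write the $CV_1(h)$ term as
\[
CV_1(h) = \frac{1}{n(n-1)h}\sum_{i=1}^n\sum_{j\neq i} \kappa\Big(\frac{X_j - X_i}{h}\Big), \qquad \kappa(u) = \bar K(u) - 2K(u).
\]
Next, by the usual change of variable argument,
\[
\begin{aligned}
E\,CV_1(h) &= \int\!\!\int \kappa(y) f(v + hy) f(v)\,dv\,dy \\
&= \int\!\!\int \kappa(y)\Big( f(v) + \sum_{j=1}^{3} \frac{1}{j!} f^{(j)}(v)\, h^j y^j + \frac{1}{4!} f^{(4)}(\tilde v(y))\, h^4 y^4 \Big) f(v)\,dv\,dy,
\end{aligned}
\]
where $|\tilde v(y) - v| \leq h|y|$ and $f^{(j)}(x) = d^j f(x)/dx^j$.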
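The expansion above depends on $\kappa$ only through its moments $\int y^j \kappa(y)\,dy$, $j = 0,\dots,4$. As a sanity check, these can be evaluated numerically. The sketch below assumes a Gaussian kernel, for which the convolution kernel $\bar K$ is the $N(0,2)$ density; the kernel choice, the midpoint-rule integration, and the function names are assumptions made for illustration only.

```python
import math

def normal_pdf(y, var):
    """Density of N(0, var) at y."""
    return math.exp(-0.5 * y * y / var) / math.sqrt(2.0 * math.pi * var)

def kappa(y):
    # kappa = Kbar - 2K. For the Gaussian kernel K (N(0,1) density),
    # the convolution kernel Kbar is the N(0,2) density.
    return normal_pdf(y, 2.0) - 2.0 * normal_pdf(y, 1.0)

def kappa_moment(j, lo=-20.0, hi=20.0, steps=40000):
    """Midpoint-rule approximation of int y^j kappa(y) dy."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * dx
        total += y ** j * kappa(y)
    return dx * total

moments = [kappa_moment(j) for j in range(5)]
```

For the Gaussian kernel, `moments` should be close to $-1, 0, 0, 0, 6$: the zeroth moment is $\int \bar K - 2\int K = -1$, the first three higher moments vanish, and the fourth equals $6(\int y^2 K(y)\,dy)^2 = 6$.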

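Using the dominated convergence theorem and the assumption that $f^{(4)}$ is uniformly bounded, i.e. $\sup_x |f^{(4)}(x)| < C$ for some constant $C > 0$, we obtain:
\[
E\,CV_1(h) = \sum_{j=0}^{4} h^j \int y^j \kappa(y)\,dy\; \frac{1}{j!}\int f^{(j)}(v) f(v)\,dv + o(h^4).
\]
The proof can now be completed by showing that
\[
\int f^{(j)}(v) f(v)\,dv =
\begin{cases}
\int f^2(x)\,dx, & j = 0, \\
0, & j = 1, \\
-\int (f'(x))^2 dx, & j = 2, \\
0, & j = 3, \\
\int (f''(x))^2 dx, & j = 4,
\end{cases}
\]
which can be shown using integration by parts, and that
\[
\int y^j \kappa(y)\,dy =
\begin{cases}
-1, & j = 0, \\
0, & j = 1, \\
0, & j = 2, \\
0, & j = 3, \\
6\big(\int y^2 K(y)\,dy\big)^2, & j = 4,
\end{cases}
\]
where the result for $j = 4$ can be shown using the binomial formula: for any positive integer $p$,
\[
(a + b)^p = \sum_{i=0}^{p} \binom{p}{i} a^i b^{p-i}.
\]
Only the terms with $j = 0$ and $j = 4$ survive, so that $E\,CV_1(h) = -\int f^2(x)\,dx + \frac{1}{4}\big(\int y^2K(y)\,dy\big)^2\int (f''(x))^2 dx\; h^4 + o(h^4) = -\int f^2(x)\,dx + b h^4 + o(h^4)$.

Lemma 1 can be used to show that
\[
CV(h) = E\,CV(h) + \text{s.o.p.}, \tag{4}
\]
where s.o.p. stands for smaller-order terms in probability. The formula for $CV(h)$ involves expressions with double summations over $i \neq j$:
\[
\sum_{i=1}^n\sum_{j\neq i} H(X_i, X_j).
\]
Such statistics are called U-statistics. Theorem 4 in the Appendix shows how U-statistics can be approximated by usual averages of observations involving only a single sum $\sum_i$, which in turn can be used to show (4).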
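The criterion $CV(h)$ can be computed directly from data and minimized over a grid of bandwidths. The following is a minimal sketch, assuming a Gaussian kernel (so that $\bar K$ is the $N(0,2)$ density), simulated $N(0,1)$ data, and an arbitrary bandwidth grid; the function names and these choices are hypothetical and only illustrate the procedure.

```python
import math
import random

def normal_pdf(u, var=1.0):
    """Density of N(0, var) at u."""
    return math.exp(-0.5 * u * u / var) / math.sqrt(2.0 * math.pi * var)

def lscv(h, x):
    """Least squares cross-validation criterion CV(h) for a Gaussian kernel."""
    n = len(x)
    kbar_sum = n * normal_pdf(0.0, 2.0)   # the n terms with i == j in the first double sum
    k_sum = 0.0                           # leave-one-out sum over j != i
    for i in range(n):
        for j in range(i + 1, n):
            d = (x[j] - x[i]) / h
            # Both kernels are symmetric, so (i, j) and (j, i) contribute equally.
            kbar_sum += 2.0 * normal_pdf(d, 2.0)
            k_sum += 2.0 * normal_pdf(d)
    return kbar_sum / (n * n * h) - 2.0 * k_sum / (n * (n - 1) * h)

def lscv_bandwidth(x, grid):
    """Return the grid point minimizing CV(h), together with all CV values."""
    values = [lscv(h, x) for h in grid]
    best = min(range(len(grid)), key=values.__getitem__)
    return grid[best], values

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(100)]
grid = [0.05 + 0.05 * k for k in range(30)]   # 0.05, 0.10, ..., 1.50
h_cv, cv_values = lscv_bandwidth(x, grid)
```

A grid search is used here only for transparency; a one-dimensional numerical optimizer could be used instead. The double loop costs $O(n^2)$ kernel evaluations per bandwidth, which is the price of the double sums in $CV(h)$.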

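A Appendix: U-statistics

Let $X_1,\dots,X_n$ be iid random variables. Let $H$ be a symmetric function, i.e. $H:\mathbb{R}^2 \to \mathbb{R}$ with $H(a,b) = H(b,a)$, where $H$ is allowed to change with $n$. A second-order U-statistic is defined as
\[
U = \binom{n}{2}^{-1}\sum_{i<j} H(X_i, X_j),
\]
where $\binom{n}{2} = \frac{n!}{2!(n-2)!}$ denotes the number of distinct combinations of two elements out of $n$ elements. Note that a second-order U-statistic is constructed by considering all such combinations, computing $H(X_i, X_j)$ for each combination, and averaging over the combinations, since the sum is taken over $i < j$. Note also that the order of $X_i$ and $X_j$ is unimportant, as $H$ is symmetric.

We are interested in limiting theorems (WLLNs and CLTs) for U-statistics. The key step is transforming a U-statistic into averages of the form $n^{-1}\sum_{i=1}^n g(X_i)$ for some function $g:\mathbb{R}\to\mathbb{R}$, after which the usual WLLNs and CLTs can be applied. Such a transformation is known as Hájek's projection. The function $g(x)$ is constructed as
\[
g(x) = E\big(H(X_i, X_j) \mid X_i = x\big) = E H(x, X_j) = E H(X_i, x),
\]
where the second equality holds by independence of $X_i$ and $X_j$, and the third equality holds by the symmetry of $H$. Moreover,
\[
\mu = E g(X_i) = E\,E\big(H(X_i, X_j) \mid X_i\big) = E H(X_i, X_j).
\]
Projecting the U-statistic $U$ on observation $X_i$ produces:
\[
E(U \mid X_i) = \binom{n}{2}^{-1}\sum_{k<l} E\big(H(X_k, X_l) \mid X_i\big).
\]
Consider $E(H(X_k, X_l) \mid X_i)$ for some fixed $i$. When $i \neq k$ and $i \neq l$,
\[
E\big(H(X_k, X_l) \mid X_i\big) = E H(X_k, X_l) = \mu.
\]
The number of such terms is $\binom{n-1}{2} = \frac{(n-1)!}{2!(n-3)!}$. The remaining $n - 1$ terms contain $X_i$ and therefore satisfy $E(H(X_k, X_l) \mid X_i) = g(X_i)$. Hence,
\[
E(U \mid X_i) = \binom{n}{2}^{-1}\Big(\binom{n-1}{2}\mu + (n-1)\,g(X_i)\Big) = \frac{n-2}{n}\,\mu + \frac{2}{n}\,g(X_i) = \mu + \frac{2}{n}\big(g(X_i) - \mu\big). \tag{5}
\]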
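A familiar example of a second-order U-statistic, as defined above, is the sample variance, which corresponds to the symmetric function $H(a,b) = (a-b)^2/2$. The sketch below computes the U-statistic by averaging over all pairs $i<j$ and compares it with the usual unbiased sample variance; the function names and the small data set are illustrative assumptions, and in this example $H$ does not change with $n$.

```python
from itertools import combinations
from statistics import variance

def u_statistic(x, H):
    """Second-order U-statistic: average of H over all pairs i < j."""
    vals = [H(a, b) for a, b in combinations(x, 2)]
    return sum(vals) / len(vals)

def H_var(a, b):
    """Symmetric function whose U-statistic is the unbiased sample variance."""
    return 0.5 * (a - b) ** 2

x = [1.0, 2.0, 4.0, 7.0]
u = u_statistic(x, H_var)        # average over the 6 pairs
s2 = variance(x)                 # sample variance with divisor n - 1
```

For this data set both `u` and `s2` equal 7. With this $H$ and a population with mean $m$ and variance $\sigma^2$, the projection function is $g(x) = E H(x, X_j) = ((x - m)^2 + \sigma^2)/2$ and $\mu = \sigma^2$.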

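The projection $\hat U$ of $U$ is defined as
\[
\hat U = \sum_{i=1}^n E(U \mid X_i) - (n-1)\,EU = \mu + \frac{2}{n}\sum_{i=1}^n \big(g(X_i) - \mu\big). \tag{6}
\]
Note that $E\hat U = EU$ is achieved by the subtraction of $(n-1)EU$. The scaling by 2 in the second term ensures that $\mathrm{Var}(\hat U)/\mathrm{Var}(U) \to 1$, as we will see from the following lemma.

Lemma 2. Suppose that there is $c > 0$ such that $\xi_1 = \mathrm{Var}(g(X_i)) > c$, that $\xi_1$ and $\xi_2 = \mathrm{Var}(H(X_1, X_2))$ are finite, and that, as $n \to \infty$,
\[
\frac{\xi_2}{n\,\xi_1} \to 0.
\]
Then $\mathrm{Var}(\hat U)/\mathrm{Var}(U) \to 1$.

Remark. The condition $\mathrm{Var}(g(X_i)) > 0$ rules out degeneracy of $\hat U$. A U-statistic satisfying such a condition is called non-degenerate.

Proof. First,
\[
\mathrm{Var}(\hat U) = \frac{4}{n^2}\sum_{i=1}^n \mathrm{Var}(g(X_i)) = \frac{4}{n}\,\xi_1. \tag{7}
\]
For the U-statistic,
\[
\mathrm{Var}(U) = \binom{n}{2}^{-2}\sum_{i<j}\sum_{k<l} \mathrm{Cov}\big(H(X_i, X_j), H(X_k, X_l)\big).
\]
The covariance terms are zero when $i, j, k, l$ are all different, by independence. Thus, a covariance term can be non-zero when (i) the two variables in $H(X_i, X_j)$ coincide with the two variables in $H(X_k, X_l)$, i.e. $X_i = X_k$ and $X_j = X_l$, or (ii) $H(X_i, X_j)$ and $H(X_k, X_l)$ have only one variable in common. The contribution from (i) to the variance of $U$ is
\[
\binom{n}{2}^{-2}\binom{n}{2}\,\mathrm{Var}(H(X_1, X_2)) = \binom{n}{2}^{-1}\mathrm{Var}(H(X_1, X_2)).
\]
To find the contribution from (ii), for each pair $(i,j)$ select, say, $k = i$, which leaves $n - 2$ choices for $l \neq j$; the same count applies when the common variable is $X_j$, giving $2(n-2)$ terms for each pair $(i,j)$. For each such term,
\[
\mathrm{Cov}\big(H(X_i, X_j), H(X_i, X_l)\big) = E\big(H(X_i, X_j) - \mu\big)\big(H(X_i, X_l) - \mu\big) = E H(X_i, X_j) H(X_i, X_l) - \mu^2
\]
for some $l \neq j$. Thus, the contribution from (ii) to the variance of $U$ is
\[
\binom{n}{2}^{-2}\binom{n}{2}\,2(n-2)\big(E H(X_i, X_j) H(X_i, X_l) - \mu^2\big) = \frac{4(n-2)}{n(n-1)}\big(E H(X_i, X_j) H(X_i, X_l) - \mu^2\big).
\]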

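Next,
\[
\begin{aligned}
E H(X_i, X_j) H(X_i, X_l) - \mu^2
&= E\,E\big(H(X_i, X_j) H(X_i, X_l) \mid X_i\big) - \mu^2 \\
&= E\big[E\big(H(X_i, X_j) \mid X_i\big)\, E\big(H(X_i, X_l) \mid X_i\big)\big] - \mu^2 \\
&= E g^2(X_i) - \mu^2 \\
&= \mathrm{Var}(g(X_i)).
\end{aligned}
\]
We have
\[
\mathrm{Var}(U) = \binom{n}{2}^{-1}\mathrm{Var}(H(X_1, X_2)) + \frac{4(n-2)}{n(n-1)}\mathrm{Var}(g(X_i)) = \frac{2}{n(n-1)}\,\xi_2 + \frac{4(n-2)}{n(n-1)}\,\xi_1,
\]
and therefore
\[
\frac{\mathrm{Var}(\hat U)}{\mathrm{Var}(U)} = \frac{4\xi_1/n}{\dfrac{2\xi_2}{n(n-1)} + \dfrac{4(n-2)\xi_1}{n(n-1)}}
= \frac{1}{\dfrac{\xi_2}{2n\xi_1}\big(1 + o(1)\big) + 1 + o(1)} \to 1.
\]

The asymptotic equivalence of the variances of the U-statistic and its projection, together with the equality of their means, implies the asymptotic equivalence of $U$ and $\hat U$.

Lemma 3. Suppose that $\mathrm{Var}(\hat U)/\mathrm{Var}(U) \to 1$. Then
\[
R = \frac{U - EU}{\sqrt{\mathrm{Var}(U)}} - \frac{\hat U - E\hat U}{\sqrt{\mathrm{Var}(\hat U)}} = o_p(1).
\]

Proof. By construction, $ER = 0$. Thus, if we show that $\mathrm{Var}(R) \to 0$, the result follows by Chebyshev's (Markov's) inequality. We have
\[
\mathrm{Var}(R) = 2 - \frac{2\,\mathrm{Cov}(U, \hat U)}{\sqrt{\mathrm{Var}(U)\,\mathrm{Var}(\hat U)}}. \tag{8}
\]
Consider
\[
\begin{aligned}
E(U - \hat U)(\hat U - \mu)
&= \frac{2}{n}\sum_{i=1}^n E(U - \hat U)\big(g(X_i) - \mu\big) \\
&= \frac{2}{n}\sum_{i=1}^n E\big[E(U - \hat U \mid X_i)\big(g(X_i) - \mu\big)\big] \\
&= 0,
\end{aligned} \tag{9}
\]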

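where the last equality follows by (5) and since, by construction,
\[
E(\hat U \mid X_i) = \mu + \frac{2}{n}\big(g(X_i) - \mu\big).
\]
The result in (9) implies that
\[
E\,U(\hat U - \mu) = E\,\hat U(\hat U - \mu), \quad \text{or} \quad \mathrm{Cov}(U, \hat U) = \mathrm{Var}(\hat U). \tag{10}
\]
It follows now from (8) and (10) that
\[
\mathrm{Var}(R) = 2 - 2\sqrt{\frac{\mathrm{Var}(\hat U)}{\mathrm{Var}(U)}} \to 0.
\]

We can now state the desired projection result, which can be used to establish LLNs and CLTs for the U-statistic $U$.

Theorem 4 (Projection of non-degenerate second-order U-statistics). Suppose that the conditions of Lemma 2 hold. Then
\[
n^{1/2}(U - EU) = 2 n^{-1/2}\sum_{i=1}^n \big(g(X_i) - E g(X_i)\big) + o_p(1).
\]
Moreover, if $\xi_1 < d$ for some $d > 0$ and all large $n$,
\[
U = EU + O_p(n^{-1/2}).
\]

Proof. The first result follows from Lemmas 2 and 3, since the result of Lemma 3 can be re-written as
\[
\frac{n^{1/2}(U - EU)}{\sqrt{\dfrac{2\xi_2}{n-1} + \dfrac{4(n-2)\xi_1}{n-1}}} = \frac{n^{1/2}(\hat U - E\hat U)}{\sqrt{4\xi_1}} + o_p(1).
\]
The second result follows from the first by Chebyshev's inequality and since
\[
\mathrm{Var}\Big(n^{-1/2}\sum_{i=1}^n \big(g(X_i) - E g(X_i)\big)\Big) = \xi_1 < d
\]
for all large $n$.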
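The variance formulas from the proof of Lemma 2 and the approximation in Theorem 4 can be illustrated with the sample variance, i.e. $H(a,b) = (a-b)^2/2$, under $N(0,1)$ data. In that case $g(x) = (x^2+1)/2$, $\mu = 1$, $\xi_1 = 1/2$ and $\xi_2 = 2$. The example, the function names, and the simulation settings below are assumptions chosen for illustration and do not come from the lecture.

```python
import math
import random
from statistics import fmean, variance

# Values for H(a, b) = (a - b)^2 / 2 with N(0, 1) data (assumed example).
XI1, XI2 = 0.5, 2.0

def var_u(n, xi1=XI1, xi2=XI2):
    """Var(U) = 2 xi2 / (n (n-1)) + 4 (n-2) xi1 / (n (n-1))."""
    return 2.0 * xi2 / (n * (n - 1)) + 4.0 * (n - 2) * xi1 / (n * (n - 1))

def var_u_hat(n, xi1=XI1):
    """Var of the projection: 4 xi1 / n."""
    return 4.0 * xi1 / n

def projection_gap(x):
    """sqrt(n) * (U - U_hat) for the sample-variance kernel with mu = 1."""
    n = len(x)
    u = variance(x)                                        # U-statistic
    u_hat = 1.0 + 2.0 * fmean(0.5 * (v * v + 1.0) - 1.0 for v in x)
    return math.sqrt(n) * (u - u_hat)

ratios = {n: var_u_hat(n) / var_u(n) for n in (10, 100, 1000)}

random.seed(0)
gaps = {n: projection_gap([random.gauss(0.0, 1.0) for _ in range(n)])
        for n in (50, 5000)}
```

In this example $\mathrm{Var}(U) = 2/(n-1)$, the known variance of the sample variance for standard normal data, so each entry of `ratios` equals $(n-1)/n$ and approaches 1. The entries of `gaps` are single draws of $n^{1/2}(U - \hat U)$, the $o_p(1)$ remainder in Theorem 4, which shrinks at rate $n^{-1/2}$ here.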