Kernel Density Estimation


Univariate Density Estimation

Suppose that we have a random sample of data X_1, ..., X_n from an unknown continuous distribution with probability density function (pdf) f(x) and cumulative distribution function (cdf) F(x). We have that

f(x) = \lim_{h \to 0} \frac{F(x + h) - F(x)}{h}.

Let F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le x) be the empirical distribution function (edf) of the data. Consider a two-sided or central difference estimator of f, i.e.

\hat{f}(x) = \frac{F_n(x + h/2) - F_n(x - h/2)}{h},

which can be written as

\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - X_i),

where K is the U(-0.5, 0.5) pdf. Note the notation K_h(u) = \frac{1}{h} K(u/h).

The kernel density estimate inherits the smoothness properties of K, so we may replace the Uniform K above with any pdf that is symmetric about zero, i.e. a kernel K with support (-\tau, \tau) such that:

(i) \int_{-\tau}^{\tau} K(u) \, du = 1

(ii) \int_{-\tau}^{\tau} u K(u) \, du = 0, which is implied by the symmetry K(-u) = K(u)

(iii) \int_{-\tau}^{\tau} u^2 K(u) \, du = \sigma_K^2 < \infty

Such a kernel is said to be of second order, meaning that its first moment is zero and its second moment is finite. It is possible to define higher order kernels.

For estimation at a single point x, a natural measure of discrepancy is the mean square error (MSE), defined by

MSE[\hat{f}(x)] = E[\hat{f}(x) - f(x)]^2 = [E(\hat{f}(x)) - f(x)]^2 + V(\hat{f}(x)) = [\mathrm{bias}(\hat{f}(x))]^2 + V(\hat{f}(x)).
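The central-difference form and the kernel-sum form above are the same estimator when K is the Uniform pdf. A minimal sketch (Python with numpy assumed; the function names are illustrative, not from the notes) checks the identity on simulated data:

```python
import numpy as np

def uniform_kernel(u):
    # U(-0.5, 0.5) pdf: 1 on (-0.5, 0.5), 0 elsewhere
    return np.where(np.abs(u) < 0.5, 1.0, 0.0)

def kde(x, data, h, kernel=uniform_kernel):
    """Kernel density estimate: fhat(x) = (1/(n*h)) * sum_i K((x - X_i)/h)."""
    data = np.asarray(data)
    return kernel((x - data) / h).sum() / (len(data) * h)

def edf_central_difference(x, data, h):
    """Central difference of the empirical cdf: (F_n(x+h/2) - F_n(x-h/2)) / h."""
    data = np.asarray(data)
    def Fn(t):
        return np.sum(data <= t) / len(data)
    return (Fn(x + h / 2) - Fn(x - h / 2)) / h

rng = np.random.default_rng(0)
sample = rng.normal(size=200)
# For continuous data the two forms agree: ties at the window endpoints
# (where the open/closed interval conventions differ) have probability zero.
print(kde(0.3, sample, h=0.5), edf_central_difference(0.3, sample, h=0.5))
```

Both numbers printed are the same proportion of points in (x - h/2, x + h/2) divided by h.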

For a global measure of the discrepancy between \hat{f} and f we can use the mean integrated square error (MISE), defined by

MISE(\hat{f}) = E \int (\hat{f}(x) - f(x))^2 \, dx.

Since the integrand is non-negative, the order of expectation and integration can be reversed to give

MISE(\hat{f}) = \int E(\hat{f}(x) - f(x))^2 \, dx = \int MSE(\hat{f}(x)) \, dx = \int (E\hat{f}(x) - f(x))^2 \, dx + \int V(\hat{f}(x)) \, dx = ISB(\hat{f}) + IV(\hat{f}),

where ISB denotes integrated squared bias and IV integrated variance. Now,

E\hat{f}(x) = \int \frac{1}{h} K\left(\frac{x - t}{h}\right) f(t) \, dt.

Let u = \frac{x - t}{h}, so that du = -\frac{1}{h} dt; reversing the limits of integration absorbs the sign. Therefore,

E\hat{f}(x) = \int K(u) f(x - hu) \, du.

A Taylor series expansion gives

f(x - hu) = f(x) - hu f'(x) + \frac{1}{2} h^2 u^2 f''(x) + \ldots

Therefore,

E\hat{f}(x) = \int K(u) \left[ f(x) - hu f'(x) + \frac{1}{2} h^2 u^2 f''(x) + \ldots \right] du = f(x) + \frac{1}{2} h^2 \sigma_K^2 f''(x) + o(h^2).

Thus,

\mathrm{bias}(\hat{f}(x)) \approx \frac{h^2}{2} \sigma_K^2 f''(x), so that [\mathrm{bias}(\hat{f}(x))]^2 \approx \frac{h^4}{4} \sigma_K^4 (f''(x))^2.

Hence,

ISB(\hat{f}) \approx \frac{h^4}{4} \sigma_K^4 \int (f''(x))^2 \, dx = \frac{h^4}{4} \sigma_K^4 R(f''),

where R(f'') = \int (f''(x))^2 \, dx.

For the variance,

V(\hat{f}(x)) = \frac{1}{n} \int \frac{1}{h^2} K\left(\frac{x - t}{h}\right)^2 f(t) \, dt - \frac{1}{n} \left\{ \int \frac{1}{h} K\left(\frac{x - t}{h}\right) f(t) \, dt \right\}^2
= \frac{1}{nh} \int f(x - hu) K(u)^2 \, du - \frac{1}{n} \left\{ f(x) + \mathrm{bias}(\hat{f}(x)) \right\}^2
= \frac{1}{nh} \int f(x - hu) K(u)^2 \, du - \frac{1}{n} \left\{ f(x) + O(h^2) \right\}^2,

using the transformation u = (x - t)/h and the previously calculated approximation to \mathrm{bias}(\hat{f}(x)). If we now expand f(x - hu) as a Taylor series then we get

V(\hat{f}(x)) = \frac{1}{nh} \int [f(x) - hu f'(x) + \ldots] K(u)^2 \, du + O(n^{-1}) = \frac{1}{nh} f(x) \int K(u)^2 \, du + O(n^{-1}) \approx \frac{1}{nh} f(x) R(K),

where R(K) = \int K(u)^2 \, du. Hence,

IV(\hat{f}) = \int V(\hat{f}(x)) \, dx \approx \frac{R(K)}{nh}.

Therefore, the asymptotic MISE (AMISE) of \hat{f} is given by

AMISE(\hat{f}) = \frac{h^4}{4} \sigma_K^4 R(f'') + \frac{R(K)}{nh}.

Hence, our estimator \hat{f}(x) is consistent in MISE provided that, as n \to \infty, h \to 0 and nh \to \infty.

Multivariate Density Estimation

Suppose now that we have a random sample of p-variate data X_1, ..., X_n from an unknown continuous distribution with pdf f(x). Here X_i = (X_{i1}, ..., X_{ip})^T. We define the product kernel density estimator to be

\hat{f}(x) = \frac{1}{n h_1 \cdots h_p} \sum_{i=1}^{n} \prod_{j=1}^{p} K\left(\frac{x_j - X_{ij}}{h_j}\right),

where the point of estimation is x = (x_1, ..., x_p)^T. We will be referring to this estimator later in the module.

Examples

The Old Faithful geyser in Yellowstone National Park erupts regularly, and the data considered here are the duration times (in minutes) of n = 222 eruptions recorded over a 16-day period. Figures 1-3 show three kernel density estimates based on three different levels of smoothing. In each case a fixed Normal kernel was used. In Figure 1, h = 0.05 and the resulting estimate is noisy, but there is evidence of two main peaks corresponding to eruptions having short or long durations. The bimodal nature of the density estimate is much clearer in Figure 2, where h = 0.17: the two peaks are estimated more smoothly and they are not as high as in Figure 1. These features are carried on into Figure 3, where h = 0.4. The data are now over-smoothed but the bimodal characteristic is still clear.

[Plot omitted: probability density function against geyser$duration]

Figure 1: Old Faithful geyser eruption duration time data - a kernel density estimate based on a Normal kernel and h = 0.05. The tick marks on the x-axis correspond to the actual n = 222 data values.
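Estimates like those in Figures 1-3 are univariate Normal-kernel density estimates evaluated over a grid. The real geyser durations are not included in these notes, so the sketch below (numpy assumed; data, seed, and function names are illustrative stand-ins) uses a synthetic bimodal sample with modes near 2 and 4.3 minutes and the same three bandwidths:

```python
import numpy as np

def gaussian_kde_curve(grid, data, h):
    """Normal-kernel estimate fhat(x) = 1/(nh) * sum_i phi((x - X_i)/h),
    evaluated at every point of `grid`."""
    u = (grid[:, None] - data[None, :]) / h            # (m, n) matrix
    phi = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)   # standard normal pdf
    return phi.sum(axis=1) / (len(data) * h)

# Synthetic stand-in for the n = 222 duration times: a two-component
# normal mixture mimicking the short/long eruption groups.
rng = np.random.default_rng(2)
data = np.concatenate([rng.normal(2.0, 0.3, 80), rng.normal(4.3, 0.4, 142)])

grid = np.linspace(1.0, 6.0, 501)
for h in (0.05, 0.17, 0.40):                           # as in Figures 1-3
    fhat = gaussian_kde_curve(grid, data, h)
    print(h, round(grid[np.argmax(fhat)], 2))          # location of highest peak
```

As h grows the curve goes from noisy to over-smoothed, while the two peaks remain visible, mirroring the progression across the three figures.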

[Plot omitted: probability density function against geyser$duration]

Figure 2: Old Faithful geyser eruption duration time data - a kernel density estimate based on a Normal kernel and h = 0.17. The tick marks on the x-axis correspond to the actual n = 222 data values.

[Plot omitted: probability density function against geyser$duration]

Figure 3: Old Faithful geyser eruption duration time data - a kernel density estimate based on a Normal kernel and h = 0.40. The tick marks on the x-axis correspond to the actual n = 222 data values.

Also recorded for each eruption was the time interval (in minutes) until the next eruption, thus giving a bivariate set of data. Figure 4 shows a bivariate kernel density estimate based on product Normal kernels and h_1 = 0.44, h_2 = 5.20. The bivariate density estimate is bimodal and it is evident

that the time interval until the next eruption is positively correlated with the duration of the eruption.

[Plot omitted: density function surface over geyser.duration and geyser.interval]

Figure 4: Old Faithful geyser eruption duration time and interval time data - a bivariate kernel density estimate based on product Normal kernels with h_1 = 0.44, h_2 = 5.20.
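The surface in Figure 4 comes from the product kernel estimator defined in the multivariate section, with p = 2. Since the geyser (duration, interval) pairs are not reproduced here, the sketch below (numpy assumed; the synthetic positively correlated data and all names are illustrative) evaluates that estimator at a point using the same bandwidths h_1 = 0.44, h_2 = 5.20:

```python
import numpy as np

def normal_kernel(u):
    # standard normal pdf
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def product_kde(x, data, h):
    """Product kernel estimator:
    fhat(x) = 1/(n h_1...h_p) * sum_i prod_j K((x_j - X_ij) / h_j)."""
    data = np.asarray(data)            # shape (n, p)
    h = np.asarray(h, dtype=float)     # shape (p,)
    u = (x - data) / h                 # broadcasts to shape (n, p)
    return np.prod(normal_kernel(u), axis=1).sum() / (len(data) * np.prod(h))

# Synthetic stand-in for the (duration, interval) pairs: positively
# correlated bivariate data, as Figure 4 suggests for the real sample.
rng = np.random.default_rng(1)
z = rng.normal(size=(222, 2))
data = np.column_stack([3.5 + z[:, 0],
                        70.0 + 8.0 * (0.6 * z[:, 0] + 0.8 * z[:, 1])])

h = (0.44, 5.20)                       # bandwidths quoted for Figure 4
print(product_kde(np.array([3.5, 70.0]), data, h))
```

Evaluating `product_kde` over a grid of (duration, interval) points would yield a surface of the kind plotted in Figure 4.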