Kernel Density Estimation

Univariate Density Estimation

Suppose that we have a random sample of data $X_1, \dots, X_n$ from an unknown continuous distribution with probability density function (pdf) $f(x)$ and cumulative distribution function (cdf) $F(x)$. We have that

$$f(x) = \lim_{h \to 0} \frac{F(x + h) - F(x)}{h}.$$

Let $F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le x)$ be the empirical distribution function (edf) of the data. Consider a two-sided or central difference estimator of $f$, i.e.

$$\hat{f}(x) = \frac{F_n(x + h/2) - F_n(x - h/2)}{h},$$

which can be written as

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - X_i),$$

where $K$ is the $U(-0.5, 0.5)$ pdf. Note the notation $K_h(u) = \frac{1}{h} K(u/h)$.

The kernel density estimate inherits the smoothness properties of $K$, so we may replace the above Uniform $K$ with any pdf which is symmetric about zero, i.e. a kernel $K$ with support $(-\tau, \tau)$ such that:

(i) $\int_{-\tau}^{\tau} K(u)\,du = 1$
(ii) $\int_{-\tau}^{\tau} u K(u)\,du = 0$, which follows from the symmetry $K(-u) = K(u)$
(iii) $\int_{-\tau}^{\tau} u^2 K(u)\,du = \sigma_K^2 < \infty$

Such a kernel is said to be of second order, meaning that its first moment is zero and its second moment is finite. It is possible to define higher order kernels.

For estimation at a single point $x$, a natural measure of discrepancy is the mean square error (MSE) defined by

$$\mathrm{MSE}[\hat{f}(x)] = E[(\hat{f}(x) - f(x))^2] = [E(\hat{f}(x)) - f(x)]^2 + V(\hat{f}(x)) = [\mathrm{bias}(\hat{f}(x))]^2 + V(\hat{f}(x)).$$
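The equivalence between the central difference estimator and its kernel form can be checked numerically. The following is a minimal sketch in Python (standard library only), not part of the original notes: the function names, the simulated Normal sample, the bandwidth $h = 0.5$ and the evaluation grid are all illustrative assumptions. The Uniform kernel is coded on the half-open interval $[-0.5, 0.5)$ so that it matches the "$\le$" convention in the edf exactly.

```python
import random


def edf(sample, x):
    """Empirical distribution function F_n(x): proportion of observations <= x."""
    return sum(1 for xi in sample if xi <= x) / len(sample)


def central_difference_estimate(sample, x, h):
    """Two-sided difference estimator (F_n(x + h/2) - F_n(x - h/2)) / h."""
    return (edf(sample, x + h / 2) - edf(sample, x - h / 2)) / h


def uniform_kernel(u):
    """U(-0.5, 0.5) pdf; half-open so it mirrors the '<=' used in the edf."""
    return 1.0 if -0.5 <= u < 0.5 else 0.0


def kernel_estimate(sample, x, h, kernel):
    """Kernel form (1 / (n h)) * sum_i K((x - X_i) / h)."""
    return sum(kernel((x - xi) / h) for xi in sample) / (len(sample) * h)


# Simulated data (an assumption made for illustration only).
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(500)]
h = 0.5
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]

diff_values = [central_difference_estimate(sample, x, h) for x in grid]
kern_values = [kernel_estimate(sample, x, h, uniform_kernel) for x in grid]

for x, d, k in zip(grid, diff_values, kern_values):
    print(f"x = {x:5.1f}  difference form = {d:.4f}  kernel form = {k:.4f}")
```

Passing a different symmetric pdf as `kernel` gives the general estimator, which is the replacement of the Uniform $K$ described above.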
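For a global measure of the discrepancy between $\hat{f}$ and $f$ we can use the mean integrated square error (MISE) defined by

$$\mathrm{MISE}(\hat{f}) = E \int (\hat{f}(x) - f(x))^2\,dx.$$

Since the integrand is non-negative, the order of expectation and integration can be reversed to give

$$\mathrm{MISE}(\hat{f}) = \int E(\hat{f}(x) - f(x))^2\,dx = \int \mathrm{MSE}(\hat{f}(x))\,dx = \int (E\hat{f}(x) - f(x))^2\,dx + \int V(\hat{f}(x))\,dx = \mathrm{ISB}(\hat{f}) + \mathrm{IV}(\hat{f}),$$

where ISB denotes integrated squared bias and IV integrated variance. Now,

$$E\hat{f}(x) = \int \frac{1}{h} K\left(\frac{x - t}{h}\right) f(t)\,dt.$$

Let $u = \frac{x - t}{h}$ so that $\frac{du}{dt} = -\frac{1}{h}$ and $\left|\frac{dt}{du}\right| = h$. Therefore,

$$E\hat{f}(x) = \int K(u) f(x - hu)\,du.$$

A Taylor series expansion gives

$$f(x - hu) = f(x) - hu f'(x) + \frac{1}{2} h^2 u^2 f''(x) + \dots$$

Therefore,

$$E\hat{f}(x) = \int K(u) \left[ f(x) - hu f'(x) + \frac{1}{2} h^2 u^2 f''(x) + \dots \right] du = f(x) + \frac{1}{2} h^2 \sigma_K^2 f''(x) + o(h^2).$$

Thus,

$$\mathrm{bias}(\hat{f}(x)) \approx \frac{h^2}{2} \sigma_K^2 f''(x)$$

so that

$$[\mathrm{bias}(\hat{f}(x))]^2 \approx \frac{h^4}{4} \sigma_K^4 (f''(x))^2.$$

Hence,

$$\mathrm{ISB}(\hat{f}) \approx \frac{h^4}{4} \sigma_K^4 \int (f''(x))^2\,dx = \frac{h^4}{4} \sigma_K^4 R(f'')$$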
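where $R(f'') = \int (f''(x))^2\,dx$.

For the variance,

$$V(\hat{f}(x)) = \frac{1}{n} \int \frac{1}{h^2} K\left(\frac{x - t}{h}\right)^2 f(t)\,dt - \frac{1}{n} \left\{ \int \frac{1}{h} K\left(\frac{x - t}{h}\right) f(t)\,dt \right\}^2$$

$$= \frac{1}{n} \int \frac{1}{h^2} K\left(\frac{x - t}{h}\right)^2 f(t)\,dt - \frac{1}{n} \left\{ f(x) + \mathrm{bias}(\hat{f}(x)) \right\}^2$$

$$\approx \frac{1}{nh} \int f(x - hu) K(u)^2\,du - \frac{1}{n} \left\{ f(x) + O(h^2) \right\}^2$$

using the transformation $u = \frac{x - t}{h}$ and the previously calculated approximation to $\mathrm{bias}(\hat{f}(x))$. If we now expand $f(x - hu)$ as a Taylor series then we get

$$V(\hat{f}(x)) = \frac{1}{nh} \int [f(x) - hu f'(x) + \dots] K(u)^2\,du + O(n^{-1}) = \frac{1}{nh} f(x) \int K(u)^2\,du + O(n^{-1}) \approx \frac{1}{nh} f(x) \int K(u)^2\,du.$$

Hence, since $\int f(x)\,dx = 1$,

$$\mathrm{IV}(\hat{f}) = \int V(\hat{f}(x))\,dx \approx \frac{1}{nh} \int K(u)^2\,du = \frac{R(K)}{nh},$$

where $R(K) = \int K(u)^2\,du$. Therefore, the asymptotic MISE (AMISE) of $\hat{f}$ is given by

$$\mathrm{AMISE}(\hat{f}) = \frac{h^4}{4} \sigma_K^4 R(f'') + \frac{R(K)}{nh}.$$

Hence, our estimator $\hat{f}(x)$ is consistent in MISE provided that, as $n \to \infty$, $h \to 0$ and $nh \to \infty$.

Multivariate Density Estimation

Suppose now that we have a random sample of $p$-variate data $X_1, \dots, X_n$ from an unknown continuous distribution with pdf $f(x)$. Here $X_i = (X_{i1}, \dots, X_{ip})^T$. We define the product kernel density estimator to be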
$$\hat{f}(x) = \frac{1}{n h_1 \cdots h_p} \sum_{i=1}^{n} \prod_{j=1}^{p} K\left(\frac{x_j - X_{ij}}{h_j}\right)$$

where the point of estimation is $x = (x_1, \dots, x_p)^T$. We will be referring to this estimator later in the module.

Examples

The Old Faithful geyser in Yellowstone National Park erupts regularly and the data considered here are the duration times (in minutes) of $n = 222$ eruptions recorded over a 16 day period. Figures 1-3 show three kernel density estimates based on three different levels of smoothing. In each case a fixed Normal kernel was used. In figure 1, $h = 0.05$ and the resulting estimate is noisy, but there is evidence of two main peaks corresponding to eruptions having short or long durations. The bimodal nature of the density estimate is much clearer in figure 2, where the two peaks are estimated more smoothly and they are not as high as in figure 1. These features are carried on into figure 3, where $h = 0.4$. The data are now over-smoothed but the bimodal characteristic is still clear.

[Plot: geyser kde plot1.pdf; y-axis: Probability density function; x-axis: geyser$duration]

Figure 1: Old Faithful geyser eruption duration time data - a kernel density estimate based on a Normal kernel and $h = 0.05$. The tick marks on the x-axis correspond to the actual $n = 222$ data values.
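The effect of the bandwidth in the examples can be explored with a short Python sketch (standard library only). It is an illustration under stated assumptions, not the code used to produce the figures: the sample is simulated from a made-up two-component Normal mixture standing in for the geyser durations, and the function names, mixture parameters and grid are hypothetical. Only the three bandwidths are taken from the figures. For each $h$ the sketch counts the local maxima of the estimate on a grid, as a rough measure of noisiness, and checks that the estimate integrates to approximately one.

```python
import math
import random


def normal_kernel(u):
    """Standard Normal pdf, used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(sample, x, h):
    """Univariate kernel density estimate (1 / (n h)) * sum_i K((x - X_i) / h)."""
    return sum(normal_kernel((x - xi) / h) for xi in sample) / (len(sample) * h)


def count_modes(values):
    """Number of strict local maxima in a sequence of grid values."""
    return sum(1 for a, b, c in zip(values, values[1:], values[2:]) if b > a and b > c)


# Simulated bimodal sample: a stand-in for the durations, NOT the real data.
random.seed(1)
sample = [
    random.gauss(2.0, 0.25) if random.random() < 0.35 else random.gauss(4.3, 0.4)
    for _ in range(222)
]

step = 0.01
grid = [-1.0 + step * k for k in range(901)]  # evaluation points from -1 to 8

summary = {}
for h in (0.05, 0.17, 0.40):  # bandwidths quoted for figures 1-3
    values = [kde(sample, x, h) for x in grid]
    # Store (number of local maxima, Riemann-sum approximation of the integral).
    summary[h] = (count_modes(values), step * sum(values))

for h, (modes, area) in summary.items():
    print(f"h = {h:.2f}: {modes} local maxima, integral ~ {area:.3f}")
```

With a Normal kernel the number of modes cannot increase as $h$ grows, so the smallest bandwidth gives the most local maxima and the largest the fewest, which mirrors the progression from a noisy to an over-smoothed estimate described above.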
[Plot: geyser kde plot2.pdf; y-axis: Probability density function; x-axis: geyser$duration]

Figure 2: Old Faithful geyser eruption duration time data - a kernel density estimate based on a Normal kernel and $h = 0.17$. The tick marks on the x-axis correspond to the actual $n = 222$ data values.

[Plot: geyser kde plot3.pdf; y-axis: Probability density function; x-axis: geyser$duration]

Figure 3: Old Faithful geyser eruption duration time data - a kernel density estimate based on a Normal kernel and $h = 0.40$. The tick marks on the x-axis correspond to the actual $n = 222$ data values.

Also recorded for each eruption was the time interval (in minutes) until the next eruption, thus giving a bivariate set of data. Figure 4 shows a bivariate kernel density estimate based on product Normal kernels and $h_1 = 0.44$, $h_2 = 5.20$. The bivariate density estimate is bimodal and it is evident
that the time interval until the next eruption is positively correlated with the duration of the eruption.

[Plot: geyser kde plot4.pdf; axes: geyser.duration, geyser.interval; vertical axis: Density function]

Figure 4: Old Faithful geyser eruption duration time and interval time data - a bivariate kernel density estimate based on product Normal kernels with $h_1 = 0.44$, $h_2 = 5.20$.
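The product kernel density estimator used for the bivariate example can be sketched in a few lines of Python (standard library only). This is an illustrative sketch rather than the code behind Figure 4: the function names are hypothetical, and the data are simulated from a made-up model in which the interval increases with the duration, as a stand-in for the geyser data. Only the bandwidths $h_1 = 0.44$ and $h_2 = 5.20$ are taken from the text.

```python
import math
import random


def normal_kernel(u):
    """Standard Normal pdf, used as the univariate kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def product_kde(data, x, bandwidths):
    """Product kernel estimate at the point x.

    data       : list of p-dimensional observations X_i
    x          : point of estimation (x_1, ..., x_p)
    bandwidths : one bandwidth h_j per coordinate
    """
    total = 0.0
    for obs in data:
        term = 1.0
        for xj, xij, hj in zip(x, obs, bandwidths):
            term *= normal_kernel((xj - xij) / hj)  # product over coordinates j
        total += term  # sum over observations i
    scale = float(len(data))
    for hj in bandwidths:
        scale *= hj  # n * h_1 * ... * h_p
    return total / scale


# Simulated (duration, interval) pairs: a stand-in, NOT the real geyser data.
random.seed(2)
data = []
for _ in range(222):
    d = random.gauss(2.0, 0.25) if random.random() < 0.35 else random.gauss(4.3, 0.4)
    data.append((d, 32.0 + 10.0 * d + random.gauss(0.0, 5.0)))

h = (0.44, 5.20)  # bandwidths quoted for Figure 4
on_ridge = product_kde(data, (4.3, 75.0), h)   # long duration, long interval
off_ridge = product_kde(data, (2.0, 90.0), h)  # short duration, long interval

print(f"estimate at (4.3, 75): {on_ridge:.5f}")
print(f"estimate at (2.0, 90): {off_ridge:.5f}")
```

In this simulated sample the estimate is much larger where long durations are paired with long intervals than where short durations are paired with long intervals, which is the kind of positive association visible in the bivariate estimate. Using a separate bandwidth for each coordinate allows for the very different scales of the two variables.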