Econometric Theory, 21, 2005, 1031–1057. Printed in the United States of America. DOI: 10.1017/S0266466605050528

EXACT MEAN INTEGRATED SQUARED ERROR OF HIGHER ORDER KERNEL ESTIMATORS

BRUCE E. HANSEN
University of Wisconsin

The exact mean integrated squared error (MISE) of the nonparametric kernel density estimator is derived for the asymptotically optimal smooth polynomial kernels of Müller (1984, Annals of Statistics 12, 766–774) and the trapezoid kernel of Politis and Romano (1999, Journal of Multivariate Analysis 68, 1–25) and is used to contrast their finite-sample efficiency with the higher order Gaussian kernels of Wand and Schucany (1990, Canadian Journal of Statistics 18, 197–204). We find that these three kernels have similar finite-sample efficiency. Of greater importance is the choice of kernel order, as we find that kernel order can have a major impact on finite-sample MISE, even in small samples, but the optimal kernel order depends on the unknown density function. We propose selecting the kernel order by the criterion of minimax regret, where the regret (the loss relative to the infeasible optimum) is maximized over the class of two-component mixture-normal density functions. This minimax regret rule produces a kernel that is a function of sample size only and uniformly bounds the regret below 12% over this density class. The paper also provides new analytic results for the smooth polynomial kernels, including their characteristic function.

1. INTRODUCTION

This paper is concerned with the choice of kernel for univariate nonparametric density estimation. A variety of kernel functions have been proposed. Some references include Parzen (1962), Epanechnikov (1969), Müller (1984), Silverman (1986), Wand and Schucany (1990), Scott (1992), Politis and Romano (1999), and Abadir and Lawford (2004). A common measure to evaluate the efficiency of a density estimator is mean integrated squared error (MISE). As asymptotic MISE (AMISE)
is easier to calculate than exact MISE, it has been common to analyze AMISE and adopt kernel functions that minimize AMISE. Marron and Wand (1992) refocused attention on the exact finite-sample MISE in their study of the higher order Gaussian kernels of Wand and Schucany (1990), showing that AMISE can be quite misleading in many cases of interest.

This research was supported in part by the National Science Foundation. I thank Oliver Linton and a referee for helpful comments and suggestions that improved the paper. Address correspondence to Bruce E. Hansen, Department of Economics, 1180 Observatory Drive, University of Wisconsin, Madison, WI 53706, USA; e-mail: bhansen@ssc.wisc.edu. © 2005 Cambridge University Press 0266-4666/05 $12.00

We extend this literature by developing new expressions for the exact MISE of several important kernel density estimators, including the asymptotically optimal smooth symmetric polynomial kernels of Müller (1984), the infinite-order Dirichlet kernel, and the trapezoid kernel of Politis and Romano (1999). We compute the MISE when the sampling densities are normal mixtures as in Marron and Wand (1992). As an important by-product, we obtain convenient expressions for the characteristic functions of the smooth polynomial kernels, a class that includes the Epanechnikov, the biweight, and most higher order polynomial kernels used in empirical practice. Kernel characteristic functions are useful for many purposes, including computationally efficient density estimation as discussed in Silverman (1986).

Our calculations show that the choice between the higher order Gaussian kernels and the smooth polynomial kernels has only a small impact on MISE, whereas the choice of kernel order has a large impact on MISE. Efficiency gains (and losses) from higher order kernels can be quite substantial even in small samples. The practical problem is that the ideal kernel order depends on the unknown density.

To provide a practical solution to the problem of kernel order selection, we suggest the decision theoretic criterion of minimax regret. This criterion picks the kernel order to minimize the maximal percentage deviation between the actual MISE and the infeasible optimal MISE over a set of candidate density functions. The minimax regret kernel order is a function only of sample size and thus can be implemented in practice. We calculate the minimax regret rule by maximizing the regret over all two-component mixture-normal densities. We find that the minimax regret kernel order is increasing in sample size, and this rule uniformly bounds the regret below 12%.

We
should mention two caveats at the outset. First, higher order kernel density estimators are not constrained to be nonnegative and thus may violate the properties of a density function. A practical solution to this problem (when it arises) is to remove the negative part of the estimator and rescale the positive part to integrate to one. As shown by Hall and Murison (1993) this does not affect the AMISE. However, the impact of this procedure on the finite-sample MISE is unclear. Second, our MISE calculations implicitly assume that the density has support on the entire real line (our analysis is for mixture-normal densities). The existence of a boundary can have substantial effects on the estimation problem and hence on the MISE.

The remainder of the paper is organized as follows. Section 2 introduces notation. Section 3 presents new results concerning smooth polynomial kernels. Section 4 computes the characteristic functions and roughness of these kernels. Section 5 presents the exact MISE of the density estimates, when the underlying density is a mixture of normals as in Marron and Wand (1992). Section 6
presents a numerical analysis of the MISE of the various kernels. We find that the MISE can be greatly influenced by the order of the kernel, even in small samples, and that the selection of kernel order is essential to accurate estimation. We find that the higher order Gaussian kernels of Wand and Schucany (1990) are the most reasonable candidates for finite-sample efficient estimation. Section 7 proposes the selection of kernel order using minimax regret. Section 8 is a brief conclusion. Proofs are presented in the Appendix. Further numerical results and details, in addition to the computer code for the calculations, are available at http://www.ssc.wisc.edu/~bhansen/papers/mise.html.

2. NOTATION

For any function $g$, let $g^{(m)}(x) = (d^m/dx^m)\,g(x)$ denote its $m$th derivative. Let $\phi(x)$, $\phi_\sigma(x) = \sigma^{-1}\phi(x/\sigma)$, and $\phi_\sigma^{(m)}(x) = \sigma^{-(m+1)}\phi^{(m)}(x/\sigma)$ denote the standard normal density, the scaled normal density, and its $m$th derivative, respectively.

We call $K(x)$ a symmetric kernel function if $K(x) = K(-x)$ and $\int_{\mathbb{R}} K(x)\,dx = 1$. Following Müller (1984), we say that $K$ is $s$-smooth if $K^{(s-1)}$ is absolutely continuous on $\mathbb{R}$. If the kernel has bounded support, this requires that the $(s-1)$th derivative equals zero at the boundaries. For integer $r$, a kernel function $K$ is of order $2r$ if $\int x^k K(x)\,dx = 0$ for all $0 < k < 2r$ yet $\int x^{2r} K(x)\,dx \neq 0$. In particular, a kernel is second-order if $\int x^2 K(x)\,dx \neq 0$.

Pochhammer's symbol is denoted by

$$(a)_n = \prod_{j=0}^{n-1}(a+j) = \frac{\Gamma(a+n)}{\Gamma(a)}.$$

A useful result is Legendre's duplication formula:

$$(2m)! = \left(\tfrac12\right)_m m!\,2^{2m}. \tag{1}$$

We will be making extensive use of special functions, for which a useful reference is Magnus, Oberhettinger, and Soni (1966). Many can be represented as generalized hypergeometric functions

$${}_pF_q(a_1,\ldots,a_p;\,c_1,\ldots,c_q;\,x) = \sum_{j=0}^{\infty}\frac{(a_1)_j\cdots(a_p)_j}{(c_1)_j\cdots(c_q)_j}\,\frac{x^j}{j!}. \tag{2}$$

For a review of the latter see Abadir (1999). In particular, for integer $m \ge 0$, our results will make use of the Gegenbauer polynomials

$$C_m^{\lambda}(x) = \sum_{k=0}^{[m/2]}\frac{(-1)^k\,\Gamma(\lambda+m-k)\,(2x)^{m-2k}}{\Gamma(\lambda)\,k!\,(m-2k)!} \tag{3}$$
and the spherical Bessel function

$$j_m(x) = \sum_{k=0}^{\infty}\frac{(-1)^k}{k!\,(3/2)_{m+k}}\left(\frac{x}{2}\right)^{m+2k} = \frac{1}{(3/2)_m}\left(\frac{x}{2}\right)^{m}{}_0F_1\!\left(\tfrac32+m;\,-\frac{x^2}{4}\right). \tag{4}$$

The Gegenbauer polynomials have several hypergeometric representations, including

$$C_{2m+1}^{\lambda}(x) = (-1)^m\,\frac{2(\lambda)_{m+1}\,x}{m!}\;{}_2F_1\!\left(-m,\;m+\lambda+1;\;\tfrac32;\;x^2\right). \tag{5}$$

See Magnus et al. (1966, p. 220).

3. SMOOTH POLYNOMIAL KERNELS

Using a random sample with $n$ real observations $X_1,\ldots,X_n$ drawn independently from a distribution with density $f(x)$, the nonparametric Rosenblatt kernel estimator of $f(x)$ is

$$\hat f_n(x) = \frac{1}{nh}\sum_{i=1}^{n}K\!\left(\frac{x-X_i}{h}\right),$$

where $K$ is a kernel function and $h$ is a bandwidth. The MISE of $\hat f_n$ is

$$\mathrm{MISE}_n(h,K,f) = E\int\left(\hat f_n(x)-f(x)\right)^2dx.$$

For the moment, restrict attention to $s$-smooth, second-order kernels with bounded support. Müller (1984) and Granovsky and Müller (1991) found that the MISE of $\hat f_n^{(s-1)}$ is minimized by setting $K$ to equal

$$M_s(x) = \frac{(1/2)_{s+1}}{s!}\,(1-x^2)^s, \qquad |x|\le 1. \tag{6}$$

This class includes the uniform ($s=0$), Epanechnikov ($s=1$), biweight ($s=2$), and triweight ($s=3$). Furthermore, as $s$ diverges, the optimal smooth kernel approaches the Gaussian kernel, as
$$\lim_{s\to\infty}\frac{1}{\sqrt{2s}}\,M_s\!\left(\frac{x}{\sqrt{2s}}\right) = \phi(x).$$

Now consider the more general case of $s$-smooth bounded-support kernels of order $2r$. Müller (1984) and Granovsky and Müller (1991) found that the MISE of $\hat f_n^{(s-1)}$ is minimized by using a polynomial kernel of order $s+r-1$ in $x^2$. We now present several alternative representations for this kernel.

THEOREM 1. Müller's $s$-smooth, $2r$th-order kernel on $[-1,1]$ is

$$M_{2r,s}(x) = B_{r,s}(x)\,M_s(x), \tag{7}$$

where

$$B_{r,s}(x) = \frac{(3/2)_{r-1}\,(3/2+s)_{r-1}}{(s+1)_{r-1}}\sum_{k=0}^{r-1}\frac{(-1)^k\,(\tfrac12+s+r)_k\,x^{2k}}{k!\,(r-1-k)!\,(3/2)_k} \tag{8}$$

$$= (-1)^{r+1}\,\frac{(3/2)_{r-1}}{(s+1)_{r-1}\,(s+\tfrac12)\,2x}\;C_{2r-1}^{s+1/2}(x) \tag{9}$$

$$= \frac{1}{s+\tfrac12}\sum_{m=0}^{r-1}(-1)^m\,\frac{(\tfrac12)_m\,(\tfrac12+s+2m)}{(s+1)_m}\;C_{2m}^{s+1/2}(x). \tag{10}$$

Furthermore

$$M_{2r,s}(x) = \sum_{m=0}^{r-1}\frac{(-1)^m}{(\tfrac12+m+s)_m\,m!\,2^{2m}}\;M_{2m+s}^{(2m)}(x). \tag{11}$$

Equation (8) gives an explicit formula for the kernel, whereas (9)–(11) give alternatives. For numerical computation, the formula (7)–(8) is numerically reliable.
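As an illustration of Theorem 1, the following is a minimal sketch (not the author's code) that evaluates the kernel as the product of a polynomial factor and the second-order kernel, taking $M_s(x) = \{(1/2)_{s+1}/s!\}(1-x^2)^s$ and the explicit Pochhammer sum for $B_{r,s}(x)$, and then checks the defining moment conditions numerically. The function names (`poch`, `m_s`, `b_rs`, `m_2r_s`, `moment`) and the midpoint-rule integration are assumptions made for this example only.

```python
import math

def poch(a, n):
    """Pochhammer symbol (a)_n = a (a+1) ... (a+n-1), with (a)_0 = 1."""
    out = 1.0
    for j in range(n):
        out *= a + j
    return out

def m_s(x, s):
    """Second-order kernel M_s(x) = (1/2)_{s+1} / s! * (1 - x^2)^s on [-1, 1]."""
    if abs(x) > 1.0:
        return 0.0
    return poch(0.5, s + 1) / math.factorial(s) * (1.0 - x * x) ** s

def b_rs(x, r, s):
    """Polynomial factor B_{r,s}(x), written as the explicit sum over k."""
    lead = poch(1.5, r - 1) * poch(1.5 + s, r - 1) / poch(s + 1.0, r - 1)
    total = 0.0
    for k in range(r):
        num = (-1) ** k * poch(0.5 + s + r, k) * x ** (2 * k)
        den = math.factorial(k) * math.factorial(r - 1 - k) * poch(1.5, k)
        total += num / den
    return lead * total

def m_2r_s(x, r, s):
    """Kernel of order 2r and smoothness s: B_{r,s}(x) * M_s(x)."""
    return b_rs(x, r, s) * m_s(x, s)

def moment(j, r, s, n=20000):
    """Midpoint-rule approximation of the j-th moment of M_{2r,s} on [-1, 1]."""
    step = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * step
        total += x ** j * m_2r_s(x, r, s)
    return total * step

if __name__ == "__main__":
    # Fourth-order kernel built on the Epanechnikov (r = 2, s = 1).
    for j in (0, 2, 4):
        print(j, moment(j, 2, 1))
```

For $r = 2$, $s = 1$ the sketch reproduces the familiar fourth-order kernel $(45/32)(1-x^2)(1-\tfrac73x^2)$: it integrates to one, its second moment vanishes, and its fourth moment is nonzero, as the definition of a kernel of order $2r = 4$ requires.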
Granovsky and Müller (1991) examined the limit as $s\to\infty$. They showed that

$$\lim_{s\to\infty}\frac{1}{\sqrt{2s}}\,M_{2r,s}\!\left(\frac{x}{\sqrt{2s}}\right) = G_{2r}(x) = \frac{(-1)^r\,\phi^{(2r-1)}(x)}{2^{r-1}\,(r-1)!\,x}. \tag{12}$$

Here $G_{2r}$ are the higher order Gaussian kernels of Wand and Schucany (1990) and Marron and Wand (1992).

We now show that as the order $2r$ increases, both classes of kernels approach the infinite-order Dirichlet kernel $D(x) = \sin(x)/(\pi x)$.

THEOREM 2.

$$\lim_{r\to\infty}\frac{1}{2r}\,M_{2r,s}\!\left(\frac{x}{2r}\right) = D(x). \tag{13}$$

THEOREM 3.

$$\lim_{r\to\infty}\frac{1}{\sqrt{2r-\tfrac12}}\,G_{2r}\!\left(\frac{x}{\sqrt{2r-\tfrac12}}\right) = D(x). \tag{14}$$

4. CHARACTERISTIC FUNCTION AND ROUGHNESS

For any function $g$, let $\tilde g(t) = \int\exp(itx)\,g(x)\,dx$ denote the characteristic function of $g(x)$. For example, the characteristic function of the normal density is $\tilde\phi(t) = \exp(-t^2/2)$. For the higher order Gaussian kernels, the characteristic function

$$\tilde G_{2r}(t) = \exp(-t^2/2)\sum_{k=0}^{r-1}\frac{t^{2k}}{2^k\,k!}$$

is given by Wand and Schucany (1990, p. 200). However, the characteristic function has not been calculated previously for the smooth polynomial kernels $M_{2r,s}$. We now show that they consist of linear combinations of spherical Bessel functions as defined in (4).

THEOREM 4.

$$\tilde M_s(t) = \left(\tfrac32\right)_s\left(\frac{2}{t}\right)^{s}j_s(t).$$

THEOREM 5.

$$\tilde M_{2r,s}(t) = \frac{2}{\sqrt{\pi}}\left(\frac{2}{t}\right)^{s}\sum_{m=0}^{r-1}a_s(m)\,j_{s+2m}(t), \tag{15}$$
where

$$a_s(m) = \frac{\Gamma\!\left(\tfrac12+m+s\right)\left(\tfrac12+2m+s\right)}{m!}. \tag{16}$$

For computational purposes, when $t$ is large, the series (4) can be numerically unstable, in which case it is preferable to compute $j_m(t)$ using the recurrence

$$j_{m+1}(t) = \frac{2m+1}{t}\,j_m(t) - j_{m-1}(t) \tag{17}$$

with the initial conditions

$$j_0(t) = \frac{\sin(t)}{t} \tag{18}$$

and

$$j_1(t) = \frac{\sin(t)}{t^2} - \frac{\cos(t)}{t}.$$

Combining this with Parseval's equality, we can use (15) to calculate the roughness of $M_{2r,s}$.

THEOREM 6.

$$R(M_{2r,s}) = \int M_{2r,s}(x)^2\,dx = \frac{(2s)!}{\pi}\sum_{l=0}^{r-1}\sum_{m=0}^{r-1}\frac{1(|l-m|\le s)\;a_s(l)\,a_s(m)}{(s+l-m)!\,(s+m-l)!\,\left(\tfrac12+l+m\right)_{2s+1}}.$$

Politis and Romano (1999) introduced a class of infinite-order general flat-top kernels that have flat characteristic functions at the origin. Their preferred kernel is

$$\lambda_2(x) = \frac{2\left(\sin^2(x)-\sin^2(x/2)\right)}{\pi x^2},$$

which has the trapezoidal characteristic function

$$\tilde\lambda_2(t) = (2-|t|)_+ - (1-|t|)_+,$$
where $(t)_+ = \max(t,0)$. We thus call $\lambda_2$ the trapezoid kernel. It is easy to calculate that $R(\lambda_2) = 4/(3\pi)$.

Figure 1a displays the four kernels $M_{4,8}$, $M_{10,8}$, $M_{20,8}$, and $\lambda_2$. The kernels have been rescaled and normalized so that all have equal roughness. The basic shapes of the kernels are similar, with the waviness of the kernels increasing in kernel order.

Figure 1. (a) Polynomial and Trapezoid Kernel Functions. (b) Polynomial and Trapezoid Characteristic Functions.

The second panel displays the characteristic functions of the same kernels (again normalized to have equal roughness). Here it is easier to see the
contrasts. As the kernel order increases, the characteristic functions are increasingly flat at the origin, with the high-order polynomial characteristic function developing a significant negative lobe.

5. EXACT MISE

The exact MISE of the kernel density estimator $\hat f_n$ can be written as

$$\mathrm{MISE}_n(h,K,f) = \frac{R(K)}{nh} + \left(1-\frac1n\right)I_2(h,K,f) - 2I_1(h,K,f) + I_0(f), \tag{19}$$

where

$$I_j(h,K,f) = \frac{1}{\pi}\int_0^\infty\tilde K(ht)^j\,|\tilde f(t)|^2\,dt = \frac{1}{\pi h}\int_0^\infty\tilde K(t)^j\,\left|\tilde f\!\left(\frac{t}{h}\right)\right|^2dt \tag{20}$$

with $\tilde K$ and $\tilde f$ denoting the characteristic functions of $K$ and $f$. Equation (19) is shown, for example, in equation (3.1) of Chiu (1991). We write $I_0(h,K,f) = I_0(f)$ because it is independent of $K$ and $h$.

Marron and Wand (1992) proposed a test set of mixture-normal density functions for which (19) is feasible to calculate. The Marron–Wand densities take the form

$$f(x) = \sum_{k=1}^{q}w_k\,\phi_{\sigma_k}(x-\mu_k). \tag{21}$$

The characteristic function of (21) is

$$\tilde f(t) = \sum_{k=1}^{q}w_k\,e^{-\sigma_k^2t^2/2}\exp(it\mu_k)$$

with square

$$|\tilde f(t)|^2 = \sum_{i=1}^{q}\sum_{k=1}^{q}w_iw_k\,e^{-\sigma_{ik}^2t^2/2}\cos(t\mu_{ik}), \tag{22}$$

where $\sigma_{ik}^2 = \sigma_i^2+\sigma_k^2$ and $\mu_{ik} = \mu_i-\mu_k$. We follow Marron and Wand and restrict attention to this class of density functions for our calculation of exact MISE.

From (22) and equation (A.7) in the Appendix, or Theorem 2.1 of Marron and Wand (1992), we find that

$$I_0(f) = \sum_{i=1}^{q}\sum_{k=1}^{q}w_iw_k\,\phi_{\sigma_{ik}}(\mu_{ik}).$$

To calculate $\mathrm{MISE}_n(h,K,f)$ we also need the integrals $I_1(h,K,f)$ and $I_2(h,K,f)$, which may be calculated by numerical or analytic integration. Although we use numerical integration to calculate the MISE in the next section, we use the analytic expressions to verify its accuracy. Our first result, Theorem 7, gives the integrals for the smooth polynomial kernels.
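To make the decomposition of the exact MISE concrete, here is a minimal sketch, under stated assumptions, that evaluates $R(K)/(nh) + (1-1/n)I_2 - 2I_1 + I_0$ for the higher order Gaussian kernel $G_{2r}$ and a normal mixture, with $I_1$ and $I_2$ obtained by integrating the kernel characteristic function against the squared modulus of the density characteristic function. The Simpson rule, the truncation point `upper`, and all function names are choices made for this illustration; they are not the quadrature scheme used in the paper, and the roughness is computed here through Parseval's equality rather than a closed form.

```python
import math

def phi_sigma(x, sigma):
    """Normal density with mean zero and standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def cf_sq(t, w, mu, sig):
    """Squared modulus of the characteristic function of a normal mixture."""
    total = 0.0
    for wi, mi, si in zip(w, mu, sig):
        for wk, mk, sk in zip(w, mu, sig):
            var_ik = si * si + sk * sk
            total += wi * wk * math.exp(-0.5 * var_ik * t * t) * math.cos(t * (mi - mk))
    return total

def cf_gauss(t, r):
    """Characteristic function of the Gaussian kernel of order 2r."""
    z = 0.5 * t * t
    return math.exp(-z) * sum(z ** k / math.factorial(k) for k in range(r))

def simpson(g, upper, n=4000):
    """Composite Simpson rule on [0, upper]; n must be even."""
    step = upper / n
    total = g(0.0) + g(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(i * step)
    return total * step / 3.0

def exact_mise(n_obs, h, r, w, mu, sig, upper=40.0):
    """Exact MISE of the kernel estimator with kernel G_{2r} and bandwidth h.

    `upper` truncates the integrals over t; it is an assumption that must be
    large enough for the integrands to be negligible beyond it.
    """
    rough = simpson(lambda t: cf_gauss(t, r) ** 2, upper) / math.pi  # Parseval
    i1 = simpson(lambda t: cf_gauss(h * t, r) * cf_sq(t, w, mu, sig), upper) / math.pi
    i2 = simpson(lambda t: cf_gauss(h * t, r) ** 2 * cf_sq(t, w, mu, sig), upper) / math.pi
    i0 = 0.0
    for wi, mi, si in zip(w, mu, sig):
        for wk, mk, sk in zip(w, mu, sig):
            i0 += wi * wk * phi_sigma(mi - mk, math.sqrt(si * si + sk * sk))
    return rough / (n_obs * h) + (1.0 - 1.0 / n_obs) * i2 - 2.0 * i1 + i0

if __name__ == "__main__":
    # Standard normal density, second-order Gaussian kernel, n = 100.
    print(exact_mise(100, 0.5, 1, [1.0], [0.0], [1.0]))
```

For the standard normal density and the second-order Gaussian kernel the result can be compared with the closed form $\{1/(nh) + (1-1/n)(1+h^2)^{-1/2} + 1\}/(2\sqrt\pi) - 2\{2\pi(2+h^2)\}^{-1/2}$, which offers a quick check on the truncation point and grid size.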
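THEOREM 7. For the mixture-normal densities (21),

$$I_1(h,M_{2r,s},f) = \sum_{m=0}^{r-1}(-1)^m\,a_s(m)\,U_s(h,m),$$

$$I_2(h,M_{2r,s},f) = \sum_{l=0}^{r-1}\sum_{m=0}^{r-1}(-1)^{l+m}\,a_s(l)\,a_s(m)\,V_s(h,l,m),$$

where $a_s(m)$ are defined in (16), and

$$U_s(h,m) = \sum_{i}\sum_{k}w_iw_k\sum_{a=0}^{\infty}\frac{(h/2)^{2m+2a}\,\phi_{\sigma_{ik}}^{(2m+2a)}(\mu_{ik})}{a!\,\Gamma\!\left(s+2m+a+\tfrac32\right)}, \tag{23}$$

$$V_s(h,l,m) = \sum_{i}\sum_{k}\sum_{a=0}^{\infty}w_iw_k\,\frac{(2a+2s+2m+2l+1)!\,(h/2)^{2l+2m+2a}\,\phi_{\sigma_{ik}}^{(2l+2m+2a)}(\mu_{ik})}{(a+2s+2m+2l+1)!\,\Gamma\!\left(a+s+2l+\tfrac32\right)\Gamma\!\left(a+s+2m+\tfrac32\right)a!}. \tag{24}$$

Our second result is the integrals for the trapezoid and Dirichlet kernels. This extends the work of Davis (1981), who made similar calculations for $D$ when $f = \phi$.

THEOREM 8. For the mixture-normal densities (21),

$$I_1(h,\lambda_2,f) = 2C_{1/2}\!\left(\tfrac2h\right) - hC_1\!\left(\tfrac2h\right) - C_{1/2}\!\left(\tfrac1h\right) + hC_1\!\left(\tfrac1h\right),$$

$$I_2(h,\lambda_2,f) = 4C_{1/2}\!\left(\tfrac2h\right) - 4hC_1\!\left(\tfrac2h\right) + h^2C_{3/2}\!\left(\tfrac2h\right) - 3C_{1/2}\!\left(\tfrac1h\right) + 4hC_1\!\left(\tfrac1h\right) - h^2C_{3/2}\!\left(\tfrac1h\right),$$

$$I_1(h,D,f) = I_2(h,D,f) = C_{1/2}\!\left(\tfrac1h\right),$$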
where

$$C_\eta(u) = \frac{1}{2\pi}\sum_{i}\sum_{k}w_iw_k\,\frac{2^\eta}{\sigma_{ik}^{2\eta}}\sum_{a=0}^{\infty}\frac{\gamma\!\left(a+\eta,\;u^2\sigma_{ik}^2/2\right)}{a!\,\left(\tfrac12\right)_a}\left(-\frac{\mu_{ik}^2}{2\sigma_{ik}^2}\right)^{a} \tag{25}$$

and $\gamma(a,x) = \int_0^x t^{a-1}e^{-t}\,dt$ is the incomplete gamma function.

Theorems 7 and 8 express the integrals $I_1(h,K,f)$ and $I_2(h,K,f)$ as convergent infinite series. Methods to numerically evaluate infinite series and other special functions are discussed in detail in Zhang and Jin (1996). As they discuss, numerical evaluation of infinite series can fail as a result of accumulated round-off error when there are alternating signs, which occurs in our expressions. To investigate their accuracy, we numerically compared the expressions in Theorems 7 and 8 with numerical integration for the Marron–Wand densities. We used Gauss–Legendre quadrature with 8,000 grid points over $[0,T]$ where $T$ was selected so that the numerical integral $\pi^{-1}\int_0^T|\tilde f(t)|^2\,dt$ was within 0.01% of the exact value $\int f(x)^2\,dx$. The results were plotted against $h$. (See Hansen, 2003.) We found that in some cases the terms in the series expansions grow sufficiently large such that the numerical results are unreliable. This occurred for the polynomial kernels for large $h$, $r$, and/or $s$ and also for the trapezoid kernel for very small $h$ in a few cases. We concluded that numerical integration was more reliable for evaluation of $I_1(h,K,f)$ and $I_2(h,K,f)$ for these kernels.

6. ANALYSIS OF MISE

For each kernel we computed $\mathrm{MISE}_n(h,K,f)$ on a grid of 3,000 values of $h$ over the region $[0,3h^*]$, where $h^*$ is the Silverman rule-of-thumb bandwidth for a sample size of 50. Plots of $\mathrm{MISE}_n(h,K,f)$ against $h$ for each model and kernel indicated that this region appeared to contain the global minimum for all cases. (See Hansen, 2003.) For given $K$ and $f$ the optimal finite-sample MISE is

$$\mathrm{MISE}_n^*(K,f) = \min_h\,\mathrm{MISE}_n(h,K,f).$$

For the kernel classes $G_{2r}$ and $M_{2r,s}$, $\mathrm{MISE}_n^*(K,f)$
was plotted as a function of $r$. Figures 2 and 3 display these plots for the Marron–Wand densities #1 (the Gaussian density) and #3 (strongly skewed density). These densities are shown
here because they are extreme cases regarding the gain or loss from the use of higher order kernels. Plots for other densities can be seen in Hansen (2003).

Figure 2. MISE, Density 1.

Plotted are the MISE for the Gaussian kernels $G_{2r}$ and the smooth polynomial kernels $M_{2r,1}$, $M_{2r,5}$, and $M_{2r,8}$, for $2r = 2,\ldots,20$, and for the sample sizes of $n = 50$, 200, 500, and 1,000. The dotted flat lines are the MISE of $D$ and $\lambda_2$,
which are independent of $r$. For each plot, the $\mathrm{MISE}_n^*(K,f)$ is normalized relative to $\min_K\mathrm{MISE}_n^*(K,f)$.

Figure 3. MISE, Density 3.

First examine the flat-top kernels $D$ and $\lambda_2$. In nearly all cases, the trapezoid kernel $\lambda_2$ does much better than the Dirichlet kernel $D$, and in most cases $\lambda_2$ performs quite well relative to other kernel choices. However, it never has as low MISE as the best Gaussian or polynomial kernel.
Second, compare the smooth polynomial kernels $M_{2r,s}$ across the smoothness index $s$. We observe that increasing the smoothness order typically improves the MISE, but the difference diminishes in larger samples. Indeed, for very large samples (not shown), the best MISE is for small $s$, but the gain is minimal. There also appears to be little gain achieved by increasing $s$ above $s = 8$ (again not shown), so setting $s = 8$ emerges as a practical recommendation.

Third, compare the MISE of the smooth polynomial kernel $M_{2r,8}$ and the Gaussian kernel $G_{2r}$. In most cases the difference in MISE is quite small, with the Gaussian kernels achieving lower MISE in small samples and the reverse in large samples.

Examining the plots of $\mathrm{MISE}_n^*(K,f)$, it is clear that there is no uniformly optimal choice of kernel $K$ (including kernel order) across sample sizes $n$ and densities $f$. For any given $f$ there may be a best choice of kernel among those we consider, but this choice will vary across $f$. A data-dependent method such as cross-validation can possibly be used to select kernel order, which might enable estimation to adapt to the unknown density $f$. This is a difficult problem, however, and is not pursued further in this paper.

7. MINIMAX REGRET KERNELS

Let $\mathcal{K}$ denote a class of kernels, such as the Gaussian class $\{G_{2r} : r \ge 1\}$ or the polynomial class $\{M_{2r,s} : r \ge 1;\ s \ge 1\}$. The problem raised at the end of the previous section is how to select a kernel $K \in \mathcal{K}$ when it is known that the finite-sample optimal choice varies with the true underlying density $f$. This section proposes a pragmatic solution based on minimax regret, a concept in statistical decision theory dating back to Savage (1951).

For any density $f$ and sample size $n$, there is a best possible kernel choice that can be defined as

$$K_n(f) = \operatorname*{argmin}_{K\in\mathcal{K}}\;\ln\mathrm{MISE}_n^*(K,f).$$

Note that we implicitly define our loss function as $\ln\mathrm{MISE}_n^*(K,f)$
to eliminate scale effects. The regret associated with the use of an arbitrary kernel $K$ is

$$\mathrm{Regret}_n(f,K) = \ln\mathrm{MISE}_n^*(K,f) - \ln\mathrm{MISE}_n^*(K_n(f),f) = \ln\!\left(\frac{\mathrm{MISE}_n^*(K,f)}{\mathrm{MISE}_n^*(K_n(f),f)}\right),$$

the percentage deviation of the realized MISE from the optimal value. For a compact class of density functions $\mathcal{F}$ the maximal regret associated with $K$ is the maximum over $f\in\mathcal{F}$

$$\mathrm{Regret}_n(K) = \max_{f\in\mathcal{F}}\mathrm{Regret}_n(f,K).$$
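The regret and maximal regret just defined are simple to compute once a table of bandwidth-optimized MISE values is available. The sketch below assumes such a table is stored as a nested dictionary indexed by kernel label and density label; the labels and numbers in the example are hypothetical placeholders, not results from the paper.

```python
import math

def regret_table(mise):
    """Regret of each kernel for each density.

    `mise[kernel][density]` holds the bandwidth-optimized MISE. The regret is
    the log of that value minus the log of the smallest MISE attained by any
    kernel in the table for the same density.
    """
    kernels = list(mise)
    densities = list(mise[kernels[0]])
    best = {f: min(mise[k][f] for k in kernels) for f in densities}
    return {k: {f: math.log(mise[k][f] / best[f]) for f in densities} for k in kernels}

def max_regret(mise):
    """Maximal regret of each kernel over the densities in the table."""
    table = regret_table(mise)
    return {k: max(table[k].values()) for k in table}

def minimax_kernel(mise):
    """Kernel label whose maximal regret is smallest."""
    worst = max_regret(mise)
    return min(worst, key=worst.get)

if __name__ == "__main__":
    # Hypothetical MISE values for two kernels and two densities.
    example = {
        "G2": {"f1": 1.0, "f2": 2.0},
        "G4": {"f1": 1.2, "f2": 1.0},
    }
    print(max_regret(example))
    print(minimax_kernel(example))
```

In this toy table the second-order kernel is best for the first density but much worse for the second, so its maximal regret is larger and the fourth-order kernel has the smaller worst-case regret.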
The minimax regret rule is to pick $K$ to minimize the maximum regret

$$K_n^* = \operatorname*{argmin}_{K\in\mathcal{K}}\;\mathrm{Regret}_n(K)$$

with the associated minimized regret

$$\mathrm{Regret}_n^* = \mathrm{Regret}_n(K_n^*) = \min_{K\in\mathcal{K}}\max_{f\in\mathcal{F}}\mathrm{Regret}_n(f,K).$$

Note that whereas the small-sample optimal kernel $K_n(f)$ depends on the unknown density $f$ in addition to $n$, the minimax regret kernel $K_n^*$ only depends on the sample size $n$. The kernel $K_n^*$ guarantees that if the true density $f$ belongs to $\mathcal{F}$, the percentage deviation of the MISE from the optimum is bounded by $\mathrm{Regret}_n^*$.

For the density class $\mathcal{F}$ we use the Marron–Wand mixture-normal class (21) with $q = 2$:

$$f(x) = \frac{w}{\sigma_1}\,\phi\!\left(\frac{x-\mu_1}{\sigma_1}\right) + \frac{1-w}{\sigma_2}\,\phi\!\left(\frac{x-\mu_2}{\sigma_2}\right).$$

Many of the Marron–Wand test densities are in this class, and two normal components are sufficient to generate most density shapes of interest. Because the regret criterion is invariant to location and scale, we select $\mu_2$ and $\sigma_2$ so that $EX = 0$ and $\mathrm{Var}(X) = 1$, so the density class is fully described by the remaining parameters $(w,\mu_1,\sigma_1)$. We use a grid over these parameters, varying each in increments of 0.1 over all feasible values. This results in a density class with 1,887 elements.

Setting $\mathcal{K}$ to equal the Gaussian kernel class $G_{2r}$, Figure 4 plots $\mathrm{Regret}_n(G_{2r})$
as a function of $2r$ for $n = 25$, 100, 400, and 800. The minimax regret solution $2r_n^*$ is located at the lowest point of each curve. We can see that this solution varies from 2 to 8 for the four sample sizes.

We calculated the minimax regret $r_n^*$ for the Gaussian kernels $G_{2r}$ as a function of sample size, and we present the findings in Table 1. The optimal kernel order $2r_n^*$ is strictly increasing with $n$. The magnitudes may be surprising to many readers, as the minimax kernel orders are higher than typically used in practice.

Let $G_n^*$ denote the Gaussian higher order kernel with $r = r_n^*$, and similarly $M_{n,s}^*$. These are feasible kernel choices as they vary only with sample size. We now compare the regret of the feasible kernel choices $\phi$, $G_n^*$, $M_{n,2}^*$, and $\lambda_2$. Figure 5 plots the regret of these kernels as a function of $n$, where the general kernel class $\mathcal{K}$ includes $G_{2r}$, $M_{2r,2}$, and $\lambda_2$. We see that the standard normal kernel $\phi$ has regret that is increasing in $n$ and is clearly dominated by the other choices except when the sample size is very small. The trapezoid kernel $\lambda_2$ has high
regret for small samples but has comparable regret with the optimal Gaussian and polynomial kernels for large samples. The optimal Gaussian and polynomial kernels have similar regret for most sample sizes, with that for $G_n^*$ slightly lower. Because $G_n^*$ is somewhat easier to manipulate numerically, it is our recommendation for empirical practice. A sensible alternative choice is the trapezoidal kernel $\lambda_2$, as long as $n \ge 500$ (its regret is bounded below 12% for such samples).

Figure 4. Regret of Gaussian Kernels.

Table 1. Minimax regret Gaussian kernel order by sample size: Regret calculated over two-component normal mixture densities

  2r_n^*    n
  2         n < 40
  4         n ≥ 40
  6         n ≥ 250
  8         n ≥ 530
  10        n ≥ 900
  12        n ≥ 1,500
  14        n ≥ 2,900
  16        n ≥ 6,000
  18        n ≥ 16,000
  20        n ≥ 34,000

Figure 5. Regret across Kernel Classes.

Finally, from Figure 5 we find that the regret of $G_n^*$ is uniformly bounded below 12%. This means that this kernel choice guarantees that regardless of $f$ (in the class of two-component mixture-normal distributions) the $\mathrm{MISE}_n^*(G_n^*,f)$ cannot differ from the optimal $\mathrm{MISE}_n^*(K_n(f),f)$ by more than 12%.

8. CONCLUSION

We have found that finite-sample MISE varies considerably across kernel order $2r$ and underlying density $f$. If information about $f$ is known this can be used to help pick the kernel order. Otherwise, the minimax regret solution picks the kernel order to minimize the worst-case regret. We find that by focusing on the class of mixture-normal densities, we are able to deduce a practical rule for kernel order selection that bounds the regret below 12% for any sample size. Our analysis points to the excellent performance of the higher order Gaussian and smooth polynomial kernels.

An important caveat is that minimax regret is defined relative to a density class $\mathcal{F}$, and we restricted attention to two-component normal mixtures. If another set $\mathcal{F}$ is selected, the minimax regret kernel can change. Further investigation
of the sensitivity of these recommendations to alternative sets $\mathcal{F}$ would be a useful subject of future research.

REFERENCES

Abadir, K.M. (1999) An introduction to hypergeometric functions for economists. Econometric Reviews 3, 287–330.
Abadir, K.M. & S. Lawford (2004) Optimal asymmetric kernels. Economics Letters 83, 61–68.
Chiu, S.-T. (1991) Bandwidth selection for kernel density estimation. Annals of Statistics 19, 1883–1905.
Davis, K.B. (1981) Mean integrated square error properties of density estimates. Annals of Statistics 5, 530–535.
Epanechnikov, V.I. (1969) Non-parametric estimation of a multivariate probability density. Theory of Probability and Its Applications 14, 153–158.
Granovsky, B.L. & H.-G. Müller (1991) Optimizing kernel methods: A unifying variational principle. International Statistical Review 59, 373–388.
Hall, P. & R.D. Murison (1993) Correcting the negativity of high-order kernel density estimators. Journal of Multivariate Analysis 47, 103–122.
Hansen, B.E. (2003) Appendix: Exact mean integrated squared error of higher-order kernel estimators. www.ssc.wisc.edu/~bhansen/papers/mise.html.
Magnus, W., F. Oberhettinger, & R.P. Soni (1966) Formulas and Theorems for the Special Functions of Mathematical Physics. Springer-Verlag.
Marron, J.S. & M.P. Wand (1992) Exact mean integrated squared error. Annals of Statistics 20, 712–736.
Müller, H.-G. (1984) Smooth optimum kernel estimators of densities, regression curves and modes. Annals of Statistics 12, 766–774.
Parzen, E. (1962) On estimation of a probability density and mode. Annals of Mathematical Statistics 35, 1065–1076.
Politis, D.N. & J.P. Romano (1999) Multivariate density estimation with general flat-top kernels of infinite order. Journal of Multivariate Analysis 68, 1–25.
Savage, L.J. (1951) The theory of statistical decision. Journal of the American Statistical Association 46, 55–67.
Scott, D.W. (1992) Multivariate Density Estimation. Wiley.
Silverman, B.W. (1986) Density Estimation for Statistics and Data Analysis. Chapman and Hall.
Tranter, C.J. (1968) Bessel Functions with some Physical Applications. Hart.
Wand, M.P. & W.R. Schucany (1990) Gaussian-based kernels. Canadian Journal of Statistics 18, 197–204.
Zhang, S. & J. Jin (1996) Computation of Special Functions. Wiley.

APPENDIX: Proofs

Proof of Theorem 1. Müller (1984, Thm. 2.4) and Granovsky and Müller (1991, Cor. 2) show that the optimal kernel is the unique $2r$th-order kernel of the form (7) with $B_{r,s}(x)$ a polynomial of degree $r-1$ in $x^2$. Because (8) takes this form, it is sufficient to show that $B_{r,s}(x)M_s(x)$ is a valid $2r$th-order kernel, that (8)–(10) are equivalent, and (11).

We start by showing the equivalence of (8)–(10). First, because for $k \le m$, $(-m)_k = (-1)^k\,m!/(m-k)!$, then
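$$\frac{(3/2)_{r-1}(3/2+s)_{r-1}}{(s+1)_{r-1}}\sum_{k=0}^{r-1}\frac{(-1)^k(\tfrac12+s+r)_k\,x^{2k}}{k!\,(r-1-k)!\,(3/2)_k} = \frac{(3/2)_{r-1}(3/2+s)_{r-1}}{(s+1)_{r-1}\,(r-1)!}\sum_{k=0}^{r-1}\frac{(1-r)_k\,(\tfrac12+s+r)_k}{(3/2)_k}\,\frac{x^{2k}}{k!}$$

$$= \frac{(3/2)_{r-1}(3/2+s)_{r-1}}{(s+1)_{r-1}\,(r-1)!}\;{}_2F_1\!\left(1-r,\;\tfrac12+s+r;\;\tfrac32;\;x^2\right)$$

$$= (-1)^{r+1}\,\frac{(3/2)_{r-1}}{(s+1)_{r-1}\,(s+\tfrac12)\,2x}\;C_{2r-1}^{s+1/2}(x),$$

where the final equality is (5). This shows the equality of (8) and (9).

Second, let $\lambda = s+\tfrac12$. The Gegenbauer polynomials $C_m^\lambda(x)$ satisfy the recurrence (Magnus et al., 1966, p. 222)

$$(j+1)\,C_{j+1}^{\lambda}(x) = (2j+2s+1)\,x\,C_j^{\lambda}(x) - (j+2s)\,C_{j-1}^{\lambda}(x).$$

Thus for any $j \ge 1$,

$$\frac{(3/2)_j}{(s+1)_j}\,\frac1x\,C_{2j+1}^{\lambda}(x) = \frac{2\,(\tfrac12)_j\,(\tfrac12+s+2j)}{(s+1)_j}\,C_{2j}^{\lambda}(x) - \frac{(3/2)_{j-1}}{(s+1)_{j-1}}\,\frac1x\,C_{2j-1}^{\lambda}(x).$$

By back substitution and the fact $C_1^{\lambda}(x) = (2s+1)x$, we find

$$\frac{(3/2)_{r-1}}{(s+1)_{r-1}}\,\frac1x\,C_{2r-1}^{\lambda}(x) = (-1)^{r+1}\sum_{m=1}^{r-1}(-1)^m\,\frac{2\,(\tfrac12)_m\,(\tfrac12+s+2m)}{(s+1)_m}\,C_{2m}^{\lambda}(x) + (-1)^{r+1}\,\frac1x\,C_1^{\lambda}(x)$$

$$= (-1)^{r+1}\,2\sum_{m=0}^{r-1}(-1)^m\,\frac{(\tfrac12)_m\,(\tfrac12+s+2m)}{(s+1)_m}\,C_{2m}^{\lambda}(x).$$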
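Hence

$$(-1)^{r+1}\,\frac{(3/2)_{r-1}}{(s+1)_{r-1}\,(s+\tfrac12)\,2x}\;C_{2r-1}^{s+1/2}(x) = \frac{1}{s+\tfrac12}\sum_{m=0}^{r-1}(-1)^m\,\frac{(\tfrac12)_m\,(\tfrac12+s+2m)}{(s+1)_m}\;C_{2m}^{s+1/2}(x),$$

showing the equivalence of (9) and (10).

We now show that $B_{r,s}(x)M_s(x)$ is a valid $2r$th-order kernel. First, using (10) and the facts that $C_0^{\lambda}(x) = 1$ and $\int C_m^{\lambda}(x)M_s(x)\,dx = 0$ for all $m \ge 1$,

$$\int B_{r,s}(x)M_s(x)\,dx = \frac{1}{s+\tfrac12}\sum_{m=0}^{r-1}(-1)^m\,\frac{(\tfrac12)_m\,(\tfrac12+s+2m)}{(s+1)_m}\int C_{2m}^{\lambda}(x)M_s(x)\,dx = \frac{(\tfrac12)_0\,(\tfrac12+s)}{(s+\tfrac12)\,(s+1)_0}\int M_s(x)\,dx = 1.$$

Thus $B_{r,s}(x)M_s(x)$ is a valid kernel. Second, using (9) and the orthogonal properties of the Gegenbauer polynomials, for any $0 < j < 2r$

$$\int x^jB_{r,s}(x)M_s(x)\,dx = (-1)^{r+1}\,\frac{(3/2)_{r-1}}{(s+1)_{r-1}\,(s+\tfrac12)\,2}\int x^{j-1}\,C_{2r-1}^{s+1/2}(x)\,M_s(x)\,dx = 0.$$

Thus $B_{r,s}(x)M_s(x)$ is of order $2r$ as required.

Finally, we demonstrate (11). Rodrigues' formula for the Gegenbauer polynomials (Magnus et al., 1966, p. 221) is

$$C_{2m}^{s+1/2}(x) = \frac{(2s+1)_{2m}}{2^{2m}\,(2m)!\,(s+1)_{2m}}\,(1-x^2)^{-s}\,\frac{d^{2m}}{dx^{2m}}(1-x^2)^{2m+s}.$$

Using (1) and (6),

$$C_{2m}^{s+1/2}(x)\,M_s(x) = \frac{(s+1)_m\,(s+\tfrac12)}{(2m)!\,(\tfrac12+s+2m)\,(\tfrac12+m+s)_m}\;M_{2m+s}^{(2m)}(x).$$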
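Combined with (10) this yields

$$M_{2r,s}(x) = \frac{1}{s+\tfrac12}\sum_{m=0}^{r-1}(-1)^m\,\frac{(\tfrac12)_m\,(\tfrac12+s+2m)}{(s+1)_m}\;C_{2m}^{s+1/2}(x)\,M_s(x)$$

$$= \frac{1}{s+\tfrac12}\sum_{m=0}^{r-1}(-1)^m\,\frac{(\tfrac12)_m\,(\tfrac12+s+2m)}{(s+1)_m}\cdot\frac{(s+1)_m\,(s+\tfrac12)}{(2m)!\,(\tfrac12+s+2m)\,(\tfrac12+m+s)_m}\;M_{2m+s}^{(2m)}(x)$$

$$= \sum_{m=0}^{r-1}\frac{(-1)^m}{(\tfrac12+m+s)_m\,m!\,2^{2m}}\;M_{2m+s}^{(2m)}(x),$$

which is (11).

Proof of Theorem 2. Let $J_v(x)$ denote the Bessel functions and $P_m^{(a,b)}(x)$ denote the Jacobi polynomials. (See, e.g., Ch. 3 and Sect. 5.2 of Magnus et al., 1966.) In particular,

$$J_{s+1/2}(x) = \left(\frac{2x}{\pi}\right)^{1/2}j_s(x), \tag{A.1}$$

where $j_s(x)$ is the spherical Bessel function defined in (4). Using results from pp. 219 and 210 of Magnus et al. (1966) and (1),

$$(-1)^{r+1}\,\frac1x\,C_{2r-1}^{\lambda}(x) = (-1)^{r+1}\,\frac{2^{2r-1}\,(r-1)!\,\Gamma(r+s+\tfrac12)}{(2r-1)!\,\Gamma(s+\tfrac12)}\;P_{r-1}^{(s,1/2)}(2x^2-1) = \frac{2^{2r-1}\,(r-1)!\,\Gamma(r+s+\tfrac12)}{(2r-1)!\,\Gamma(s+\tfrac12)}\;P_{r-1}^{(1/2,s)}(1-2x^2). \tag{A.2}$$

By Magnus et al. (1966, p. 216), as $r\to\infty$

$$\frac{1}{\sqrt r}\,P_{r-1}^{(1/2,s)}\!\left(1-\frac{x^2}{2r^2}\right) \to \left(\frac2x\right)^{1/2}J_{1/2}(x) = \frac{2}{\sqrt\pi}\,j_0(x) = \frac{2\sin(x)}{\pi^{1/2}\,x}. \tag{A.3}$$

The equalities are (A.1) and (18). Furthermore, using Stirling's formula we can show that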
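$$\lim_{m\to\infty}\frac{\Gamma\!\left(\tfrac12+m\right)m^{1/2}}{\Gamma(m+1)} = 1. \tag{A.4}$$

Thus combining (9) with (A.2), and using the limits in (A.3) and (A.4), we obtain as $r\to\infty$

$$\frac{1}{2r}\,B_{r,s}\!\left(\frac{x}{2r}\right) = \frac{s!\;\Gamma(r+s+\tfrac12)}{2r\;\Gamma(s+\tfrac32)\,\Gamma(s+r)}\;P_{r-1}^{(1/2,s)}\!\left(1-\frac{x^2}{2r^2}\right) \to \frac{s!}{\Gamma(s+\tfrac32)}\,\frac{\sin(x)}{\pi^{1/2}\,x}.$$

Furthermore, as $r\to\infty$

$$M_s\!\left(\frac{x}{2r}\right) = \frac{\Gamma(s+\tfrac32)}{\pi^{1/2}\,s!}\left(1-\frac{x^2}{4r^2}\right)^{s} \to \frac{\Gamma(s+\tfrac32)}{\pi^{1/2}\,s!}.$$

Thus as $r\to\infty$

$$\frac{1}{2r}\,M_{2r,s}\!\left(\frac{x}{2r}\right) = \frac{1}{2r}\,B_{r,s}\!\left(\frac{x}{2r}\right)M_s\!\left(\frac{x}{2r}\right) \to \frac{\sin(x)}{\pi x} = D(x),$$

establishing (13).

Proof of Theorem 3. Let $He_m(x)$ denote the Hermite polynomials. By the Rodrigues formula and a limit property for the Hermite polynomials (Magnus et al., 1966, pp. 252, 255), as $r\to\infty$,

$$\frac{1}{\sqrt{2r-\tfrac12}}\,G_{2r}\!\left(\frac{x}{\sqrt{2r-\tfrac12}}\right) = \frac{(-1)^r\,\phi^{(2r-1)}\!\left(x/\sqrt{2r-\tfrac12}\right)}{2^{r-1}\,(r-1)!\,x} = \frac{(-1)^{r+1}\,He_{2r-1}\!\left(x/\sqrt{2r-\tfrac12}\right)\phi\!\left(x/\sqrt{2r-\tfrac12}\right)}{2^{r-1}\,(r-1)!\,x} \to \frac{\sin(x)}{\pi x},$$

establishing (14).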
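Proof of Theorem 4. From Magnus et al. (1966, p. 401)

$$\int_0^1\cos(tx)\,(1-x^2)^s\,dx = s!\left(\frac2t\right)^{s}\left(\frac{\pi}{2t}\right)^{1/2}J_{s+1/2}(t) = s!\left(\frac2t\right)^{s}j_s(t),$$

the second equality using (A.1). Then by the definition of the Fourier transform

$$\tilde M_s(t) = \frac{(1/2)_{s+1}}{s!}\int_{-1}^{1}e^{itx}\,(1-x^2)^s\,dx = \frac{(3/2)_s}{s!}\int_0^1\cos(tx)\,(1-x^2)^s\,dx = \left(\tfrac32\right)_s\left(\frac2t\right)^{s}j_s(t).$$

Proof of Theorem 5. Using (11), the derivative property of Fourier transformations, and Theorem 4,

$$\tilde M_{2r,s}(t) = \sum_{m=0}^{r-1}\frac{(-1)^m}{(\tfrac12+m+s)_m\,m!\,2^{2m}}\;\widetilde{M_{2m+s}^{(2m)}}(t) = \sum_{m=0}^{r-1}\frac{t^{2m}}{(\tfrac12+m+s)_m\,m!\,2^{2m}}\;\tilde M_{2m+s}(t)$$

$$= \sum_{m=0}^{r-1}\frac{(3/2)_{2m+s}}{(\tfrac12+m+s)_m\,m!\,2^{2m}}\;t^{2m}\left(\frac2t\right)^{2m+s}j_{2m+s}(t) = \frac{2}{\sqrt\pi}\left(\frac2t\right)^{s}\sum_{m=0}^{r-1}\frac{\Gamma(\tfrac12+m+s)\,(\tfrac12+2m+s)}{m!}\;j_{2m+s}(t)$$

as stated.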
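Proof of Theorem 6. By (A.1) and Magnus et al. (1966, p. 99),

$$\frac4\pi\int_0^\infty\left(\frac2t\right)^{2s}j_{s+2l}(t)\,j_{s+2m}(t)\,dt = 2^{2s+1}\int_0^\infty t^{-2s-1}\,J_{s+2l+1/2}(t)\,J_{s+2m+1/2}(t)\,dt$$

$$= \frac{\Gamma(2s+1)\,\Gamma(\tfrac12+l+m)}{\Gamma(\tfrac32+l+m+2s)\,\Gamma(1+s+l-m)\,\Gamma(1+s+m-l)} = \frac{(2s)!\;1(|l-m|\le s)}{(s+l-m)!\,(s+m-l)!\,\left(\tfrac12+l+m\right)_{2s+1}}. \tag{A.5}$$

By Parseval's equality, Theorem 5, and (A.5),

$$R(M_{2r,s}) = \int M_{2r,s}(x)^2\,dx = \frac{1}{2\pi}\int_{-\infty}^{\infty}\tilde M_{2r,s}(t)^2\,dt = \frac1\pi\sum_{l=0}^{r-1}\sum_{m=0}^{r-1}a_s(l)\,a_s(m)\,\frac4\pi\int_0^\infty\left(\frac2t\right)^{2s}j_{s+2l}(t)\,j_{s+2m}(t)\,dt$$

$$= \frac{(2s)!}{\pi}\sum_{l=0}^{r-1}\sum_{m=0}^{r-1}\frac{1(|l-m|\le s)\;a_s(l)\,a_s(m)}{(s+l-m)!\,(s+m-l)!\,\left(\tfrac12+l+m\right)_{2s+1}}.$$

Proof of Theorem 7. Using expressions (15) and (22) we find

$$I_1(h,M_{2r,s}) = \frac1\pi\int_0^\infty\tilde M_{2r,s}(ht)\,|\tilde f(t)|^2\,dt = \sum_{m=0}^{r-1}a_s(m)\,\frac{2}{\pi^{3/2}}\int_0^\infty\left(\frac{2}{ht}\right)^{s}j_{s+2m}(ht)\,|\tilde f(t)|^2\,dt$$

$$= \sum_{m=0}^{r-1}a_s(m)\sum_i\sum_kw_iw_k\,\frac{2}{\pi^{3/2}}\int_0^\infty\left(\frac{2}{ht}\right)^{s}j_{s+2m}(ht)\,e^{-\sigma_{ik}^2t^2/2}\cos(t\mu_{ik})\,dt. \tag{A.6}$$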
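Using Magnus et al. (1966, p. 402) we find that

$$\frac1\pi\int_0^\infty t^{2m}\,e^{-\sigma^2t^2/2}\cos(t\mu)\,dt = \frac{(-1)^m}{\sqrt{2\pi}\,\sigma^{2m+1}}\exp\!\left(-\frac{\mu^2}{2\sigma^2}\right)He_{2m}\!\left(\frac\mu\sigma\right) = (-1)^m\,\phi_\sigma^{(2m)}(\mu). \tag{A.7}$$

Using (A.1), (4), and (A.7),

$$\frac{2}{\pi^{3/2}}\int_0^\infty\left(\frac{2}{ht}\right)^{s}e^{-\sigma^2t^2/2}\cos(t\mu)\,j_{s+2m}(ht)\,dt = \frac1\pi\int_0^\infty\left(\frac{2}{ht}\right)^{s+1/2}e^{-\sigma^2t^2/2}\cos(t\mu)\,J_{s+2m+1/2}(ht)\,dt$$

$$= \sum_{a=0}^{\infty}\frac{(-1)^a}{\Gamma(s+2m+a+\tfrac32)\,a!}\left(\frac h2\right)^{2m+2a}\frac1\pi\int_0^\infty e^{-\sigma^2t^2/2}\cos(t\mu)\,t^{2m+2a}\,dt$$

$$= (-1)^m\sum_{a=0}^{\infty}\frac{1}{a!\,\Gamma(s+2m+a+\tfrac32)}\left(\frac h2\right)^{2m+2a}\phi_\sigma^{(2m+2a)}(\mu).$$

Combining this with (A.6) we obtain (23). Next,

$$I_2(h,M_{2r,s}) = \frac1\pi\int_0^\infty\tilde M_{2r,s}(ht)^2\,|\tilde f(t)|^2\,dt = \sum_{l=0}^{r-1}\sum_{m=0}^{r-1}a_s(l)\,a_s(m)\,\frac{4}{\pi^2}\int_0^\infty\left(\frac{2}{ht}\right)^{2s}j_{s+2l}(ht)\,j_{s+2m}(ht)\,|\tilde f(t)|^2\,dt$$

$$= \sum_{l=0}^{r-1}\sum_{m=0}^{r-1}a_s(l)\,a_s(m)\sum_i\sum_kw_iw_k\,\frac{4}{\pi^2}\int_0^\infty\left(\frac{2}{ht}\right)^{2s}j_{s+2l}(ht)\,j_{s+2m}(ht)\,e^{-\sigma_{ik}^2t^2/2}\cos(t\mu_{ik})\,dt. \tag{A.8}$$

By equation (2.20) of Tranter (1968)

$$\frac4\pi\,j_{s+2l}(t)\,j_{s+2m}(t) = \frac2t\,J_{s+2l+1/2}(t)\,J_{s+2m+1/2}(t) = \sum_{a=0}^{\infty}\frac{(-1)^a\,(2a+2s+2l+2m+1)!\;(t/2)^{2l+2m+2s+2a}}{a!\,\Gamma(a+s+2l+\tfrac32)\,\Gamma(a+s+2m+\tfrac32)\,(a+2s+2l+2m+1)!}.$$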
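Combining this with (A.7) we obtain

$$\frac{4}{\pi^2}\int_0^\infty\left(\frac{2}{ht}\right)^{2s}j_{s+2l}(ht)\,j_{s+2m}(ht)\,e^{-\sigma^2t^2/2}\cos(t\mu)\,dt$$

$$= \sum_{a=0}^{\infty}\frac{(-1)^a\,(2a+2s+2l+2m+1)!}{a!\,\Gamma(a+s+2l+\tfrac32)\,\Gamma(a+s+2m+\tfrac32)\,(a+2s+2l+2m+1)!}\left(\frac h2\right)^{2l+2m+2a}\frac1\pi\int_0^\infty t^{2l+2m+2a}\,e^{-\sigma^2t^2/2}\cos(t\mu)\,dt$$

$$= (-1)^{l+m}\sum_{a=0}^{\infty}\frac{(2a+2s+2l+2m+1)!}{a!\,\Gamma(a+s+2l+\tfrac32)\,\Gamma(a+s+2m+\tfrac32)\,(a+2s+2l+2m+1)!}\left(\frac h2\right)^{2l+2m+2a}\phi_\sigma^{(2l+2m+2a)}(\mu).$$

Combining this with (A.8) we obtain (24) as required.

Proof of Theorem 8. By a series expansion and (1)

$$\cos(x) = \sum_{a=0}^{\infty}\frac{(-x^2)^a}{(2a)!} = \sum_{a=0}^{\infty}\frac{(-x^2)^a}{a!\,\left(\tfrac12\right)_a2^{2a}}.$$

Thus for any $\eta \ge \tfrac12$

$$\int_0^ut^{2\eta-1}\cos(t\mu)\exp\!\left(-\frac{\sigma^2t^2}{2}\right)dt = \sum_{a=0}^{\infty}\frac{(-\mu^2)^a}{a!\,\left(\tfrac12\right)_a2^{2a}}\int_0^ut^{2\eta+2a-1}\exp\!\left(-\frac{\sigma^2t^2}{2}\right)dt = \frac{2^{\eta-1}}{\sigma^{2\eta}}\sum_{a=0}^{\infty}\frac{1}{a!\,\left(\tfrac12\right)_a}\left(-\frac{\mu^2}{2\sigma^2}\right)^{a}\gamma\!\left(a+\eta,\;\frac{u^2\sigma^2}{2}\right).$$

Summing, we find

$$\frac1\pi\int_0^ut^{2\eta-1}\,|\tilde f(t)|^2\,dt = \frac1\pi\sum_i\sum_kw_iw_k\int_0^ut^{2\eta-1}\cos(t\mu_{ik})\exp\!\left(-\frac{\sigma_{ik}^2t^2}{2}\right)dt$$

$$= \frac{1}{2\pi}\sum_i\sum_kw_iw_k\,\frac{2^\eta}{\sigma_{ik}^{2\eta}}\sum_{a=0}^{\infty}\frac{1}{a!\,\left(\tfrac12\right)_a}\left(-\frac{\mu_{ik}^2}{2\sigma_{ik}^2}\right)^{a}\gamma\!\left(a+\eta,\;\frac{u^2\sigma_{ik}^2}{2}\right) = C_\eta(u).$$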
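Hence

$$I_1(h,D) = I_2(h,D) = \frac1\pi\int_0^\infty\tilde D(ht)\,|\tilde f(t)|^2\,dt = \frac1\pi\int_0^{1/h}|\tilde f(t)|^2\,dt = C_{1/2}(1/h),$$

$$I_1(h,\lambda_2) = \frac1\pi\int_0^\infty\tilde\lambda_2(ht)\,|\tilde f(t)|^2\,dt = \frac1\pi\int_0^{2/h}(2-ht)\,|\tilde f(t)|^2\,dt - \frac1\pi\int_0^{1/h}(1-ht)\,|\tilde f(t)|^2\,dt$$

$$= 2C_{1/2}(2/h) - hC_1(2/h) - C_{1/2}(1/h) + hC_1(1/h),$$

and

$$I_2(h,\lambda_2) = \frac1\pi\int_0^\infty\tilde\lambda_2(ht)^2\,|\tilde f(t)|^2\,dt = \frac1\pi\int_0^\infty\left[(4-4ht+h^2t^2)\,1(ht\le2) - (3-4ht+h^2t^2)\,1(ht\le1)\right]|\tilde f(t)|^2\,dt$$

$$= 4C_{1/2}(2/h) - 4hC_1(2/h) + h^2C_{3/2}(2/h) - 3C_{1/2}(1/h) + 4hC_1(1/h) - h^2C_{3/2}(1/h).$$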
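The trapezoid-kernel expressions of Theorem 8 can be checked numerically in the simplest case. The sketch below assumes $f = \phi$ (a single standard normal component, so $\sigma_{11}^2 = 2$ and $\mu_{11} = 0$), in which case only the $a = 0$ term of the series survives and $C_\eta(u) = \gamma(\eta,u^2)/(2\pi)$; it then compares the closed forms for $I_1$ and $I_2$ with a direct midpoint-rule integration of the trapezoidal characteristic function against $|\tilde\phi(t)|^2 = e^{-t^2}$. The function names and the use of elementary closed forms for the incomplete gamma function at $\eta\in\{1/2, 1, 3/2\}$ are choices made for this illustration.

```python
import math

def lower_gamma(eta, x):
    """Incomplete gamma function gamma(eta, x) for eta in {1/2, 1, 3/2}."""
    if eta == 0.5:
        return math.sqrt(math.pi) * math.erf(math.sqrt(x))
    if eta == 1.0:
        return 1.0 - math.exp(-x)
    if eta == 1.5:
        half = math.sqrt(math.pi) * math.erf(math.sqrt(x))
        return 0.5 * half - math.sqrt(x) * math.exp(-x)
    raise ValueError("only eta = 0.5, 1.0, 1.5 are supported in this sketch")

def c_eta(eta, u):
    """C_eta(u) when f is the standard normal density."""
    return lower_gamma(eta, u * u) / (2.0 * math.pi)

def trapezoid_cf(t):
    """Trapezoidal characteristic function (2 - |t|)_+ - (1 - |t|)_+."""
    return max(2.0 - abs(t), 0.0) - max(1.0 - abs(t), 0.0)

def i1_closed(h):
    return (2.0 * c_eta(0.5, 2.0 / h) - h * c_eta(1.0, 2.0 / h)
            - c_eta(0.5, 1.0 / h) + h * c_eta(1.0, 1.0 / h))

def i2_closed(h):
    return (4.0 * c_eta(0.5, 2.0 / h) - 4.0 * h * c_eta(1.0, 2.0 / h)
            + h * h * c_eta(1.5, 2.0 / h) - 3.0 * c_eta(0.5, 1.0 / h)
            + 4.0 * h * c_eta(1.0, 1.0 / h) - h * h * c_eta(1.5, 1.0 / h))

def i_numeric(h, power, n=40000):
    """Midpoint rule for (1/pi) int_0^{2/h} trapezoid_cf(ht)^power exp(-t^2) dt.

    The integrand vanishes beyond t = 2/h, so no further truncation is needed.
    """
    upper = 2.0 / h
    step = upper / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * step
        total += trapezoid_cf(h * t) ** power * math.exp(-t * t)
    return total * step / math.pi

if __name__ == "__main__":
    for h in (0.5, 1.0):
        print(h, i1_closed(h), i_numeric(h, 1), i2_closed(h), i_numeric(h, 2))
```

Agreement between the two columns for each bandwidth confirms the sign pattern of the six terms in $I_2$ and the four terms in $I_1$; extending the check to a genuine mixture would require summing the series in $a$ for each pair of components.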