EXACT MEAN INTEGRATED SQUARED ERROR OF HIGHER ORDER KERNEL ESTIMATORS


Econometric Theory, 21, 2005, 1031-1057. Printed in the United States of America. DOI: 10.1017/S0266466605050528

EXACT MEAN INTEGRATED SQUARED ERROR OF HIGHER ORDER KERNEL ESTIMATORS

BRUCE E. HANSEN
University of Wisconsin

The exact mean integrated squared error (MISE) of the nonparametric kernel density estimator is derived for the asymptotically optimal smooth polynomial kernels of Müller (1984, Annals of Statistics 12, 766-774) and the trapezoid kernel of Politis and Romano (1999, Journal of Multivariate Analysis 68, 1-25) and is used to contrast their finite-sample efficiency with the higher order Gaussian kernels of Wand and Schucany (1990, Canadian Journal of Statistics 18, 197-204). We find that these three kernels have similar finite-sample efficiency. Of greater importance is the choice of kernel order, as we find that kernel order can have a major impact on finite-sample MISE, even in small samples, but the optimal kernel order depends on the unknown density function. We propose selecting the kernel order by the criterion of minimax regret, where the regret (the loss relative to the infeasible optimum) is maximized over the class of two-component mixture-normal density functions. This minimax regret rule produces a kernel that is a function of sample size only and uniformly bounds the regret below 12% over this density class. The paper also provides new analytic results for the smooth polynomial kernels, including their characteristic function.

(This research was supported in part by the National Science Foundation. I thank Oliver Linton and a referee for helpful comments and suggestions that improved the paper. Address correspondence to Bruce E. Hansen, Department of Economics, 1180 Observatory Drive, University of Wisconsin, Madison, WI 53706, USA; e-mail: bhansen@ssc.wisc.edu. © 2005 Cambridge University Press.)

1. INTRODUCTION

This paper is concerned with the choice of kernel for univariate nonparametric density estimation. A variety of kernel functions have been proposed. Some references include Parzen (1962), Epanechnikov (1969), Müller (1984), Silverman (1986), Wand and Schucany (1990), Scott (1992), Politis and Romano (1999), and Abadir and Lawford (2004).

A common measure used to evaluate the efficiency of a density estimator is mean integrated squared error (MISE). As asymptotic MISE (AMISE) is easier to calculate than exact MISE, it has been common to analyze AMISE and adopt kernel functions that minimize AMISE. Marron and Wand (1992) refocused attention on the exact finite-sample MISE in their study of the higher order Gaussian kernels of Wand and Schucany (1990), showing that AMISE can be quite misleading in many cases of interest.

We extend this literature by developing new expressions for the exact MISE of several important kernel density estimators, including the asymptotically optimal smooth symmetric polynomial kernels of Müller (1984), the infinite-order Dirichlet kernel, and the trapezoid kernel of Politis and Romano (1999). We compute the MISE when the sampling densities are normal mixtures as in Marron and Wand (1992). As an important by-product, we obtain convenient expressions for the characteristic functions of the smooth polynomial kernels, a class that includes the Epanechnikov, the biweight, and most higher order polynomial kernels used in empirical practice. Kernel characteristic functions are useful for many purposes, including computationally efficient density estimation as discussed in Silverman (1986).

Our calculations show that the choice between the higher order Gaussian kernels and the smooth polynomial kernels has only a small impact on MISE, whereas the choice of kernel order has a large impact on MISE. Efficiency gains (and losses) from higher order kernels can be quite substantial even in small samples. The practical problem is that the ideal kernel order depends on the unknown density.

To provide a practical solution to the problem of kernel order selection, we suggest the decision-theoretic criterion of minimax regret. This criterion picks the kernel order to minimize the maximal percentage deviation between the actual MISE and the infeasible optimal MISE over a set of candidate density functions. The minimax regret kernel order is a function only of sample size and thus can be implemented in practice. We calculate the minimax regret rule by maximizing the regret over all two-component mixture-normal densities. We find that the minimax regret kernel order is increasing in sample size, and this rule uniformly bounds the regret below 12%.

We should mention two caveats at the outset. First, higher order kernel density estimators are not constrained to be nonnegative and thus may violate the properties of a density function. A practical solution to this problem (when it arises) is to remove the negative part of the estimator and rescale the positive part to integrate to one. As shown by Hall and Murison (1993) this does not affect the AMISE. However, the impact of this procedure on the finite-sample MISE is unclear. Second, our MISE calculations implicitly assume that the density has support on the entire real line (our analysis is for mixture-normal densities). The existence of a boundary can have substantial effects on the estimation problem and hence on the MISE.

The remainder of the paper is organized as follows. Section 2 introduces notation. Section 3 presents new results concerning smooth polynomial kernels. Section 4 computes the characteristic functions and roughness of these kernels. Section 5 presents the exact MISE of the density estimates when the underlying density is a mixture of normals as in Marron and Wand (1992).

Section 6 presents a numerical analysis of the MISE of the various kernels. We find that the MISE can be greatly influenced by the order of the kernel, even in small samples, and that the selection of kernel order is essential to accurate estimation. We find that the higher order Gaussian kernels of Wand and Schucany (1990) are the most reasonable candidates for finite-sample efficient estimation. Section 7 proposes the selection of kernel order using minimax regret. Section 8 is a brief conclusion. Proofs are presented in the Appendix. Further numerical results and details, in addition to the computer code for the calculations, are available at http://www.ssc.wisc.edu/~bhansen/papers/mise.html.

2. NOTATION

For any function g, let $g^{(m)}(x) = (d^m/dx^m)\,g(x)$ denote its mth derivative. Let $\phi(x)$, $\phi_\sigma(x) = \sigma^{-1}\phi(x/\sigma)$, and $\phi_\sigma^{(m)}(x) = \sigma^{-1-m}\phi^{(m)}(x/\sigma)$ denote the standard normal density, the scaled normal density, and its mth derivative, respectively.

We call $K(x)$ a symmetric kernel function if $K(x) = K(-x)$ and $\int_{\mathbb{R}} K(x)\,dx = 1$. Following Müller (1984), we say that K is s-smooth if $K^{(s-1)}$ is absolutely continuous on $\mathbb{R}$. If the kernel has bounded support, this requires that the (s-1)th derivative equals zero at the boundaries. For integer r, a kernel function K is of order 2r if $\int x^k K(x)\,dx = 0$ for all $0 < k < 2r$ yet $\int x^{2r} K(x)\,dx \neq 0$. In particular, a kernel is second-order if $\int x^2 K(x)\,dx \neq 0$.

Pochhammer's symbol is denoted by

$$(a)_n = \prod_{j=0}^{n-1}(a+j) = \frac{\Gamma(a+n)}{\Gamma(a)}.$$

A useful result is Legendre's duplication formula:

$$(2m)! = 2^{2m}\,m!\left(\tfrac{1}{2}\right)_m. \tag{1}$$

We will be making extensive use of special functions, for which a useful reference is Magnus, Oberhettinger, and Soni (1966). Many can be represented as generalized hypergeometric functions

$$_pF_q(a_1,\ldots,a_p;\,c_1,\ldots,c_q;\,x) = \sum_{j=0}^{\infty}\frac{(a_1)_j\cdots(a_p)_j}{(c_1)_j\cdots(c_q)_j}\,\frac{x^j}{j!}. \tag{2}$$

For a review of the latter see Abadir (1999). In particular, for integer $m \ge 0$, our results will make use of the Gegenbauer polynomials

$$C_m^{\lambda}(x) = \sum_{k=0}^{[m/2]}\frac{(-1)^k\,\Gamma(\lambda+m-k)\,(2x)^{m-2k}}{\Gamma(\lambda)\,k!\,(m-2k)!}. \tag{3}$$
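As a quick sanity check on this notation, the explicit sum (3) can be verified against a library implementation. The following is a minimal sketch using SciPy; the function name and test point are ours, not the paper's:

```python
import numpy as np
from math import factorial
from scipy.special import eval_gegenbauer, gamma

def gegenbauer_sum(m, lam, x):
    # Gegenbauer polynomial C_m^lambda(x) via the explicit sum (3).
    return sum((-1)**k * gamma(lam + m - k) * (2 * x)**(m - 2 * k)
               / (gamma(lam) * factorial(k) * factorial(m - 2 * k))
               for k in range(m // 2 + 1))

# Check (3) against SciPy's implementation at an arbitrary point.
m, lam, x = 7, 2.5, 0.37
assert np.isclose(gegenbauer_sum(m, lam, x), eval_gegenbauer(m, lam, x))
```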

We will also use the spherical Bessel function

$$j_m(x) = \frac{\sqrt{\pi}}{2\,\Gamma\!\left(m+\frac{3}{2}\right)}\left(\frac{x}{2}\right)^{m}\,{}_0F_1\!\left(m+\tfrac{3}{2};\,-\frac{x^2}{4}\right) = \sum_{k=0}^{\infty}\frac{(-1)^k\,\sqrt{\pi}}{2\,k!\,\Gamma\!\left(k+m+\frac{3}{2}\right)}\left(\frac{x}{2}\right)^{2k+m}. \tag{4}$$

The Gegenbauer polynomials have several hypergeometric representations, including

$$C_{2m+1}^{\lambda}(x) = (-1)^m\,\frac{2\,(\lambda)_{m+1}}{m!}\,x\;{}_2F_1\!\left(-m,\;m+\lambda+1;\;\tfrac{3}{2};\;x^2\right). \tag{5}$$

See Magnus et al. (1966, p. 220).

3. SMOOTH POLYNOMIAL KERNELS

Using a random sample of n real observations $X_1,\ldots,X_n$ drawn independently from a distribution with density $f(x)$, the nonparametric Rosenblatt kernel estimator of $f(x)$ is

$$\hat f_n(x) = \frac{1}{nh}\sum_{i=1}^{n}K\!\left(\frac{x - X_i}{h}\right),$$

where K is a kernel function and h is a bandwidth. The MISE of $\hat f_n$ is

$$\mathrm{MISE}_n(h, K, f) = E\int_{-\infty}^{\infty}\left(\hat f_n(x) - f(x)\right)^2 dx.$$

For the moment, restrict attention to s-smooth, second-order kernels with bounded support. Müller (1984) and Granovsky and Müller (1991) found that the MISE of $\hat f_n$ is minimized by setting K equal to

$$M_s(x) = \frac{(2s+1)!}{2^{2s+1}(s!)^2}\,(1 - x^2)^s\,\mathbf{1}\{|x| \le 1\}. \tag{6}$$

This class includes the uniform (s = 0), Epanechnikov (s = 1), biweight (s = 2), and triweight (s = 3) kernels. Furthermore, as s diverges, the optimal smooth kernel approaches the Gaussian kernel:

$$\lim_{s\to\infty}\frac{1}{\sqrt{2s}}\,M_s\!\left(\frac{x}{\sqrt{2s}}\right) = \phi(x).$$
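To make the setup concrete, here is a minimal Python sketch of the Rosenblatt estimator with the kernels $M_s$ of (6). The function names and the illustrative sample are ours:

```python
import numpy as np
from math import factorial

def M_s(x, s):
    # Muller's smooth second-order kernel (6) on [-1, 1]; s = 0, 1, 2, 3
    # give the uniform, Epanechnikov, biweight, and triweight kernels.
    c = factorial(2 * s + 1) / (2 ** (2 * s + 1) * factorial(s) ** 2)
    u = np.clip(x, -1.0, 1.0)
    return np.where(np.abs(x) <= 1.0, c * (1.0 - u * u) ** s, 0.0)

def kde(x_grid, data, h, s=2):
    # Rosenblatt estimator: f_n(x) = (nh)^{-1} sum_i K((x - X_i)/h).
    u = (x_grid[:, None] - data[None, :]) / h
    return M_s(u, s).sum(axis=1) / (len(data) * h)

rng = np.random.default_rng(0)
data = rng.standard_normal(200)          # sample from N(0, 1)
grid = np.linspace(-4.0, 4.0, 201)
f_hat = kde(grid, data, h=0.5, s=2)      # biweight estimate
```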

Now consider the more general case of s-smooth bounded-support kernels of order 2r. Müller (1984) and Granovsky and Müller (1991) found that the MISE of $\hat f_n$ is minimized by a kernel equal to a polynomial of degree $r-1$ in $x^2$ multiplied by $M_s(x)$. We now present several alternative representations for this kernel.

THEOREM 1. Müller's s-smooth, 2rth-order kernel on $[-1, 1]$ is

$$M_{2r,s}(x) = B_{r,s}(x)\,M_s(x), \tag{7}$$

where

$$B_{r,s}(x) = \frac{\left(\frac{3}{2}\right)_{r-1}\left(s+\frac{3}{2}\right)_{r-1}}{(s+1)_{r-1}}\sum_{k=0}^{r-1}\frac{(-1)^k\left(r+s+\frac{1}{2}\right)_k}{k!\,(r-1-k)!\left(\frac{3}{2}\right)_k}\,x^{2k} \tag{8}$$

$$\phantom{B_{r,s}(x)} = (-1)^{r-1}\,\frac{\left(\frac{3}{2}\right)_{r-1}\left(s+\frac{3}{2}\right)_{r-1}}{2\,(s+1)_{r-1}\left(s+\frac{1}{2}\right)_r}\,\frac{C_{2r-1}^{s+1/2}(x)}{x} \tag{9}$$

$$\phantom{B_{r,s}(x)} = \sum_{m=0}^{r-1}(-1)^m\,b_s(m)\,C_{2m}^{s+1/2}(x), \qquad b_s(m) = \frac{2\,s!\,\Gamma\!\left(m+\frac{1}{2}\right)\Gamma\!\left(2m+s+\frac{3}{2}\right)\Gamma\!\left(s+\frac{3}{2}\right)}{(2s+1)\sqrt{\pi}\,\Gamma(m+s+1)\,\Gamma\!\left(m+s+\frac{1}{2}\right)\Gamma\!\left(m+s+\frac{3}{2}\right)}. \tag{10}$$

Furthermore,

$$M_{2r,s}(x) = \sum_{m=0}^{r-1}\frac{(-1)^m}{m!\;2^{2m}\left(s+\frac{3}{2}\right)_m}\,M_{2m+s}^{(2m)}(x). \tag{11}$$

Equation (8) gives an explicit formula for the kernel, whereas (9)-(11) give alternatives. For numerical computation, the formula (7)-(8) is numerically reliable.

Granovsky and Müller (1991) examined the limit as $s \to \infty$. They showed that

$$\lim_{s\to\infty}\frac{1}{\sqrt{2s}}\,M_{2r,s}\!\left(\frac{x}{\sqrt{2s}}\right) = G_{2r}(x) = \frac{(-1)^r\,\phi^{(2r-1)}(x)}{2^{r-1}(r-1)!\;x}. \tag{12}$$

Here $G_{2r}$ are the higher order Gaussian kernels of Wand and Schucany (1990) and Marron and Wand (1992).

We now show that as the order 2r increases, both classes of kernels approach the infinite-order Dirichlet kernel $D(x) = \sin(x)/(\pi x)$.

THEOREM 2. $\displaystyle\lim_{r\to\infty}\frac{1}{2r}\,M_{2r,s}\!\left(\frac{x}{2r}\right) = D(x). \tag{13}$

THEOREM 3. $\displaystyle\lim_{r\to\infty}\frac{1}{\sqrt{2r+2}}\,G_{2r}\!\left(\frac{x}{\sqrt{2r+2}}\right) = D(x). \tag{14}$

4. CHARACTERISTIC FUNCTION AND ROUGHNESS

For any function g, let $\hat g(t) = \int \exp(itx)\,g(x)\,dx$ denote the characteristic function of $g(x)$. For example, the characteristic function of the normal density is $\hat\phi(t) = \exp(-t^2/2)$. For the higher order Gaussian kernels, the characteristic function

$$\hat G_{2r}(t) = \exp(-t^2/2)\sum_{k=0}^{r-1}\frac{(t^2/2)^k}{k!}$$

is given by Wand and Schucany (1990, p. 200). However, the characteristic function has not been calculated previously for the smooth polynomial kernels $M_{2r,s}$. We now show that they consist of linear combinations of spherical Bessel functions as defined in (4).

THEOREM 4. $\displaystyle\hat M_s(t) = \frac{2^{s+1}\,\Gamma\!\left(s+\frac{3}{2}\right)}{\sqrt{\pi}}\,\frac{j_s(t)}{t^s}.$

THEOREM 5.

$$\hat M_{2r,s}(t) = \frac{2^{s+1}\,\Gamma\!\left(s+\frac{3}{2}\right)}{\sqrt{\pi}\;t^{s}}\sum_{m=0}^{r-1}a_s(m)\,j_{s+2m}(t), \tag{15}$$

where

$$a_s(m) = \frac{\Gamma\!\left(2m+s+\frac{3}{2}\right)}{m!\;\Gamma\!\left(m+s+\frac{3}{2}\right)}. \tag{16}$$
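Both characteristic functions are straightforward to evaluate numerically. The sketch below implements $\hat G_{2r}$ and, via our reconstruction of Theorem 4, $\hat M_s$, using SciPy's spherical Bessel function; the handling of t near zero is our own convention:

```python
import numpy as np
from math import factorial
from scipy.special import spherical_jn, gamma

def cf_gauss(t, r):
    # Characteristic function of the 2r-th order Gaussian kernel G_2r.
    z = np.asarray(t) ** 2 / 2.0
    return np.exp(-z) * sum(z**k / factorial(k) for k in range(r))

def cf_Ms(t, s):
    # Theorem 4 (as reconstructed):
    # M_s-hat(t) = 2^{s+1} Gamma(s+3/2) / sqrt(pi) * j_s(t) / t^s.
    t = np.where(np.asarray(t, dtype=float) == 0.0, 1e-12, t)  # cf(0) = 1
    c = 2 ** (s + 1) * gamma(s + 1.5) / np.sqrt(np.pi)
    return c * spherical_jn(s, t) / t**s
```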

For computational purposes, when t is large the power series (4) can be numerically unstable, in which case it is preferable to compute $j_m(t)$ using the recurrence

$$j_{m+1}(t) = \frac{2m+1}{t}\,j_m(t) - j_{m-1}(t) \tag{17}$$

with the initial conditions

$$j_0(t) = \frac{\sin(t)}{t} \tag{18}$$

and

$$j_1(t) = \frac{\sin(t)}{t^2} - \frac{\cos(t)}{t}.$$

Combining this with Parseval's equality, we can use (15) to calculate the roughness of $M_{2r,s}$.

THEOREM 6.

$$R(M_{2r,s}) = \int_{-1}^{1} M_{2r,s}(x)^2\,dx = \frac{\Gamma(2s+1)\,\Gamma\!\left(s+\frac{3}{2}\right)^2}{\pi}\sum_{l=0}^{r-1}\sum_{m=0}^{r-1}\frac{a_s(l)\,a_s(m)\,\Gamma\!\left(l+m+\frac{1}{2}\right)}{\Gamma(s+1+m-l)\,\Gamma(s+1+l-m)\,\Gamma\!\left(2s+l+m+\frac{3}{2}\right)}.$$
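The recurrence and Theorem 6 are easy to put to work. The sketch below computes $j_m$ by upward recurrence (17)-(18) and evaluates the roughness formula, checking it against the known Epanechnikov value $R(M_{2,1}) = 3/5$; the constants reflect our reconstruction of the theorem above:

```python
import numpy as np
from math import factorial
from scipy.special import gamma

def j_recur(m, t):
    # Spherical Bessel j_m(t) via the upward recurrence (17), started
    # from the initial conditions (18); adequate for moderate m when t
    # is not small relative to m.
    j0 = np.sin(t) / t
    if m == 0:
        return j0
    j1 = np.sin(t) / t**2 - np.cos(t) / t
    for k in range(1, m):
        j0, j1 = j1, (2 * k + 1) / t * j1 - j0
    return j1

def a_s(m, s):
    # Coefficients (16).
    return gamma(2 * m + s + 1.5) / (factorial(m) * gamma(m + s + 1.5))

def roughness_M(r, s):
    # Theorem 6; terms with |l - m| > s vanish automatically because
    # the Gamma factors in the denominator blow up there.
    c = gamma(2 * s + 1) * gamma(s + 1.5) ** 2 / np.pi
    return c * sum(a_s(l, s) * a_s(m, s) * gamma(l + m + 0.5)
                   / (gamma(s + 1 + m - l) * gamma(s + 1 + l - m)
                      * gamma(2 * s + l + m + 1.5))
                   for l in range(r) for m in range(r))

assert abs(roughness_M(1, 1) - 3 / 5) < 1e-12   # Epanechnikov roughness
```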

Politis and Romano (1999) introduced a class of infinite-order general flat-top kernels that have flat characteristic functions at the origin. Their preferred kernel is

$$\lambda_2(x) = \frac{2\left(\sin^2(x) - \sin^2(x/2)\right)}{\pi x^2},$$

which has the trapezoidal characteristic function

$$\hat\lambda_2(t) = (2 - |t|)_+ - (1 - |t|)_+,$$

where $(t)_+ = \max(t, 0)$. We thus call $\lambda_2(x)$ the trapezoid kernel. It is easy to calculate that $R(\lambda_2) = 4/(3\pi)$.

Figure 1a displays the four kernels $M_{4,8}$, $M_{10,8}$, $M_{20,8}$, and $\lambda_2$. The kernels have been rescaled and normalized so that all have equal roughness. The basic shapes of the kernels are similar, with the waviness of the kernels increasing in kernel order. The second panel displays the characteristic functions of the same kernels (again normalized to have equal roughness). Here it is easier to see the contrasts. As the kernel order increases, the characteristic functions are increasingly flat at the origin, with the high-order polynomial characteristic function developing a significant negative lobe.

Figure 1. (a) Polynomial and trapezoid kernel functions. (b) Polynomial and trapezoid characteristic functions.
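Since $\hat\lambda_2$ is piecewise linear, the roughness claim is simple to confirm by Parseval's equality. A one-off numerical check (our own, not from the paper):

```python
import numpy as np

# R(lambda_2) = (1/(2*pi)) * integral of the trapezoidal cf squared:
# the cf equals 1 on [-1, 1] and decays linearly to 0 at |t| = 2.
t = np.linspace(0.0, 2.0, 200001)
cf = np.minimum(1.0, 2.0 - t)
R = 2.0 * np.trapz(cf**2, t) / (2.0 * np.pi)   # factor 2: symmetry in t
assert abs(R - 4.0 / (3.0 * np.pi)) < 1e-8
```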

5. EXACT MISE

The exact MISE of the kernel density estimator $\hat f_n$ can be written as

$$\mathrm{MISE}_n(h, K, f) = \frac{R(K)}{nh} + \left(1 - \frac{1}{n}\right)I_2(h, K, f) - 2I_1(h, K, f) + I_0(f), \tag{19}$$

where

$$I_j(h, K, f) = \frac{1}{\pi}\int_0^{\infty}\hat K(ht)^j\,|\hat f(t)|^2\,dt = \frac{1}{\pi h}\int_0^{\infty}\hat K(t)^j\left|\hat f\!\left(\frac{t}{h}\right)\right|^2 dt \tag{20}$$

with $\hat K$ and $\hat f$ denoting the characteristic functions of K and f. Equation (19) is shown, for example, in equation (3.1) of Chiu (1991). We write $I_0(h, K, f) = I_0(f)$ because it is independent of K and h.

Marron and Wand (1992) proposed a test set of mixture-normal density functions for which (19) is feasible to calculate. The Marron-Wand densities take the form

$$f(x) = \sum_k w_k\,\phi_{\sigma_k}(x - \mu_k). \tag{21}$$

The characteristic function of (21) is

$$\hat f(t) = \sum_k w_k\,e^{-\sigma_k^2 t^2/2}\exp(it\mu_k)$$

with square

$$|\hat f(t)|^2 = \sum_i\sum_k w_i w_k\,e^{-\sigma_{ik}^2 t^2/2}\cos(t\mu_{ik}), \tag{22}$$

where $\sigma_{ik}^2 = \sigma_i^2 + \sigma_k^2$ and $\mu_{ik} = \mu_i - \mu_k$. We follow Marron and Wand and restrict attention to this class of density functions for our calculation of exact MISE. From (22) and equation (A.7) in the Appendix, or Theorem 2.1 of Marron and Wand (1992), we find that

$$I_0(f) = \sum_i\sum_k w_i w_k\,\phi_{\sigma_{ik}}(\mu_{ik}).$$

To calculate $\mathrm{MISE}_n(h, K, f)$ we also need the integrals $I_1(h, K, f)$ and $I_2(h, K, f)$, which may be calculated by numerical or analytic integration. Although we use numerical integration to calculate MISE in the next section, we use analytic integration to verify its accuracy.
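Equations (19), (20), and (22) translate directly into a numerical routine. The following sketch computes the exact MISE of any kernel with known characteristic function for a mixture-normal density, using simple trapezoid-rule quadrature on [0, T] as a stand-in for the quadrature the paper uses; names and defaults are ours:

```python
import numpy as np

def cf_mix_sq(t, w, mu, sig):
    # |f-hat(t)|^2 for a normal mixture, equation (22).
    w, mu, sig = (np.asarray(a, dtype=float) for a in (w, mu, sig))
    s2 = sig[:, None]**2 + sig[None, :]**2      # sigma_ik^2
    dm = mu[:, None] - mu[None, :]              # mu_ik
    ww = w[:, None] * w[None, :]
    t = np.asarray(t)[:, None, None]
    return (ww * np.exp(-s2 * t**2 / 2) * np.cos(t * dm)).sum(axis=(1, 2))

def exact_mise(h, n, K_hat, R_K, w, mu, sig, T=200.0, npts=20001):
    # Exact MISE via (19)-(20): R(K)/(nh) + (1 - 1/n) I_2 - 2 I_1 + I_0.
    t = np.linspace(1e-9, T, npts)
    f2 = cf_mix_sq(t, w, mu, sig)
    Kt = K_hat(h * t)
    I1 = np.trapz(Kt * f2, t) / np.pi
    I2 = np.trapz(Kt**2 * f2, t) / np.pi
    I0 = np.trapz(f2, t) / np.pi
    return R_K / (n * h) + (1 - 1 / n) * I2 - 2 * I1 + I0

# Example: standard normal kernel (R(phi) = 1/(2*sqrt(pi))) on a
# two-component mixture, with n = 100 and h = 0.4.
mise = exact_mise(0.4, 100, lambda u: np.exp(-u**2 / 2),
                  1 / (2 * np.sqrt(np.pi)),
                  [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```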

Our first result gives these integrals for the smooth polynomial kernels.

THEOREM 7. For the mixture-normal densities (21),

$$I_1(h, M_{2r,s}, f) = \sum_{m=0}^{r-1}(-1)^m\,a_s(m)\,U_s(h, m),$$

$$I_2(h, M_{2r,s}, f) = \sum_{l=0}^{r-1}\sum_{m=0}^{r-1}(-1)^{l+m}\,a_s(l)\,a_s(m)\,V_s(h, l, m),$$

where $a_s(m)$ are defined in (16), and

$$U_s(h, m) = \Gamma\!\left(s+\tfrac{3}{2}\right)\sum_i\sum_k w_i w_k\sum_{a=0}^{\infty}\frac{h^{2m+2a}\,\phi_{\sigma_{ik}}^{(2m+2a)}(\mu_{ik})}{a!\;\Gamma\!\left(s+2m+a+\frac{3}{2}\right)2^{2m+2a}}, \tag{23}$$

$$V_s(h, l, m) = \Gamma\!\left(s+\tfrac{3}{2}\right)^2\sum_i\sum_k w_i w_k\sum_{a=0}^{\infty}\frac{(2a+2s+2l+2m+1)!\;h^{2l+2m+2a}\,\phi_{\sigma_{ik}}^{(2l+2m+2a)}(\mu_{ik})}{a!\,(a+2s+2l+2m+1)!\;\Gamma\!\left(a+s+2l+\frac{3}{2}\right)\Gamma\!\left(a+s+2m+\frac{3}{2}\right)2^{2l+2m+2a}}. \tag{24}$$

Our second result gives the integrals for the trapezoid and Dirichlet kernels. This extends the work of Davis (1981), who made similar calculations for D when $f = \phi$.

THEOREM 8. For the mixture-normal densities (21),

$$I_1(h, \lambda_2, f) = 2C_{1/2}\!\left(\frac{2}{h}\right) - h\,C_1\!\left(\frac{2}{h}\right) - C_{1/2}\!\left(\frac{1}{h}\right) + h\,C_1\!\left(\frac{1}{h}\right),$$

$$I_2(h, \lambda_2, f) = 4C_{1/2}\!\left(\frac{2}{h}\right) - 4h\,C_1\!\left(\frac{2}{h}\right) + h^2 C_{3/2}\!\left(\frac{2}{h}\right) - 3C_{1/2}\!\left(\frac{1}{h}\right) + 4h\,C_1\!\left(\frac{1}{h}\right) - h^2 C_{3/2}\!\left(\frac{1}{h}\right),$$

$$I_1(h, D, f) = I_2(h, D, f) = C_{1/2}\!\left(\frac{1}{h}\right),$$

where

$$C_\eta(u) = \frac{1}{2\pi}\sum_i\sum_k w_i w_k\left(\frac{2}{\sigma_{ik}^2}\right)^{\eta}\sum_{a=0}^{\infty}\frac{(-1)^a}{a!\left(\frac{1}{2}\right)_a}\left(\frac{\mu_{ik}^2}{2\sigma_{ik}^2}\right)^{a}\gamma\!\left(a+\eta,\;\frac{u^2\sigma_{ik}^2}{2}\right) \tag{25}$$

and

$$\gamma(a, x) = \int_0^x t^{a-1}e^{-t}\,dt$$

is the incomplete gamma function.

Theorems 7 and 8 express the integrals $I_1(h, K, f)$ and $I_2(h, K, f)$ as convergent infinite series. Methods to numerically evaluate infinite series and other special functions are discussed in detail in Zhang and Jin (1996). As they discuss, numerical evaluation of infinite series can fail as a result of accumulated round-off error when there are alternating signs, which occurs in our expressions. To investigate their accuracy, we numerically compared the expressions in Theorems 7 and 8 with numerical integration for the Marron-Wand densities. We used Gauss-Legendre quadrature with 8,000 grid points over $[0, T]$, where T was selected so that the numerical integral $\frac{1}{\pi}\int_0^T |\hat f(t)|^2\,dt$ was within 0.01% of the exact value $\int_{-\infty}^{\infty} f(x)^2\,dx$. The results were plotted against h. (See Hansen, 2003.) We found that in some cases the terms in the series expansions grow sufficiently large that the numerical results are unreliable. This occurred for the polynomial kernels for large h, r, and/or s, and also for the trapezoid kernel for very small h in a few cases. We concluded that numerical integration was more reliable for evaluation of $I_1(h, K, f)$ and $I_2(h, K, f)$ for these kernels.

6. ANALYSIS OF MISE

For each kernel we computed $\mathrm{MISE}_n(h, K, f)$ on a grid of 3,000 values of h over the region $[0, 3h^*]$, where $h^*$ is the Silverman rule-of-thumb bandwidth for a sample size of 50. Plots of $\mathrm{MISE}_n(h, K, f)$ against h for each model and kernel indicated that this region appeared to contain the global minimum in all cases. (See Hansen, 2003.) For given K and f the optimal finite-sample MISE is

$$\mathrm{MISE}_n^*(K, f) = \min_h\,\mathrm{MISE}_n(h, K, f).$$
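The h-optimization is a one-dimensional grid search. Reusing exact_mise from the sketch above (with the same caveats), the optimal finite-sample MISE can be approximated as follows; the grid endpoints mirror the paper's stated choice, while the Silverman constant for a unit-variance density is our assumption:

```python
import numpy as np

def mise_star(n, K_hat, R_K, w, mu, sig, ngrid=3000):
    # MISE*_n(K, f) = min_h MISE_n(h, K, f), searched on (0, 3 h*],
    # where h* is the Silverman rule-of-thumb bandwidth at n = 50.
    h_silverman = 1.06 * 50 ** (-1 / 5)
    h_grid = np.linspace(1e-3, 3 * h_silverman, ngrid)
    vals = [exact_mise(h, n, K_hat, R_K, w, mu, sig) for h in h_grid]
    i = int(np.argmin(vals))
    return h_grid[i], vals[i]
```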

Figure 2. MISE, Density 1.

For the kernel classes $G_{2r}$ and $M_{2r,s}$, $\mathrm{MISE}_n^*(K, f)$ was plotted as a function of r. Figures 2 and 3 display these plots for the Marron-Wand densities 1 (the Gaussian density) and 3 (the strongly skewed density). These densities are shown here because they are extreme cases regarding the gain or loss from the use of higher order kernels. Plots for other densities can be seen in Hansen (2003). Plotted are the MISE for the Gaussian kernels $G_{2r}$ and the smooth polynomial kernels $M_{2r,1}$, $M_{2r,5}$, and $M_{2r,8}$, for $2r = 2,\ldots,20$, and for the sample sizes $n = 50$, 200, 500, and 1,000. The dotted flat lines are the MISE of D and $\lambda_2$, which are independent of r.

Figure 3. MISE, Density 3.

For each plot, $\mathrm{MISE}_n^*(K, f)$ is normalized relative to $\min_K \mathrm{MISE}_n^*(K, f)$.

First examine the flat-top kernels D and $\lambda_2$. In nearly all cases, the trapezoid kernel $\lambda_2$ does much better than the Dirichlet kernel D, and in most cases $\lambda_2$ performs quite well relative to other kernel choices. However, it never has as low MISE as the best Gaussian or polynomial kernel.

Second, compare the smooth polynomial kernels $M_{2r,s}$ across the smoothness index s. We observe that increasing the smoothness order typically improves the MISE, but the difference diminishes in larger samples. Indeed, for very large samples (not shown), the best MISE is for small s, but the gain is minimal. There also appears to be little gain achieved by increasing s above s = 8 (again not shown), so setting s = 8 emerges as a practical recommendation.

Third, compare the MISE of the smooth polynomial kernel $M_{2r,8}$ and the Gaussian kernel $G_{2r}$. In most cases the difference in MISE is quite small, with the Gaussian kernels achieving lower MISE in small samples and the reverse in large samples.

Examining the plots of $\mathrm{MISE}_n^*(K, f)$, it is clear that there is no uniformly optimal choice of kernel K (including kernel order) across sample sizes n and densities f. For any given f there may be a best choice of kernel among those we consider, but this choice will vary across f. A data-dependent method such as cross-validation could possibly be used to select the kernel order, which might enable estimation to adapt to the unknown density f. This is a difficult problem, however, and is not pursued further in this paper.

7. MINIMAX REGRET KERNELS

Let $\mathcal{K}$ denote a class of kernels, such as the Gaussian class $\{G_{2r} : r \ge 1\}$ or the polynomial class $\{M_{2r,s} : r \ge 1,\ s \ge 1\}$. The problem raised at the end of the previous section is how to select a kernel $K \in \mathcal{K}$ when it is known that the finite-sample optimal choice varies with the true underlying density f. This section proposes a pragmatic solution based on minimax regret, a concept in statistical decision theory dating back to Savage (1951).

For any density f and sample size n, there is a best possible kernel choice that can be defined as

$$K_n(f) = \operatorname*{argmin}_{K\in\mathcal{K}}\,\ln \mathrm{MISE}_n^*(K, f).$$

Note that we implicitly define our loss function as $\ln \mathrm{MISE}_n^*(K, f)$ to eliminate scale effects. The regret associated with the use of an arbitrary kernel K is

$$\mathrm{Regret}_n(f, K) = \ln \mathrm{MISE}_n^*(K, f) - \ln \mathrm{MISE}_n^*(K_n(f), f) = \ln\frac{\mathrm{MISE}_n^*(K, f)}{\mathrm{MISE}_n^*(K_n(f), f)},$$

the percentage deviation of the realized MISE from the optimal value. For a compact class of density functions $\mathcal{F}$, the maximal regret associated with K is the maximum over $f \in \mathcal{F}$:

$$\overline{\mathrm{Regret}}_n(K) = \max_{f\in\mathcal{F}}\,\mathrm{Regret}_n(f, K).$$

The minimax regret rule is to pick K to minimize the maximum regret:

$$K_n^* = \operatorname*{argmin}_{K\in\mathcal{K}}\,\overline{\mathrm{Regret}}_n(K),$$

with the associated minimized regret

$$\mathrm{Regret}_n^* = \overline{\mathrm{Regret}}_n(K_n^*) = \min_{K\in\mathcal{K}}\max_{f\in\mathcal{F}}\,\mathrm{Regret}_n(f, K).$$

Note that whereas the small-sample optimal kernel $K_n(f)$ depends on the unknown density f in addition to n, the minimax regret kernel $K_n^*$ depends only on the sample size n. The kernel $K_n^*$ guarantees that if the true density f belongs to $\mathcal{F}$, the percentage deviation of the MISE from the optimum is bounded by $\mathrm{Regret}_n^*$.

For the density class $\mathcal{F}$ we use the Marron-Wand mixture-normal class (21) with two components:

$$f(x) = \frac{w}{\sigma_1}\,\phi\!\left(\frac{x-\mu_1}{\sigma_1}\right) + \frac{1-w}{\sigma_2}\,\phi\!\left(\frac{x-\mu_2}{\sigma_2}\right).$$

Many of the Marron-Wand test densities are in this class, and two normal components are sufficient to generate most density shapes of interest. Because the regret criterion is invariant to location and scale, we select $\mu_2$ and $\sigma_2$ so that $EX = 0$ and $\mathrm{Var}(X) = 1$, so the density class is fully described by the remaining parameters $(w, \mu_1, \sigma_1)$. We use a grid over these parameters, varying each in increments of 0.1 over all feasible values. This results in a density class with 1,887 elements.

Setting $\mathcal{K}$ equal to the Gaussian kernel class $G_{2r}$, Figure 4 plots $\overline{\mathrm{Regret}}_n(G_{2r})$ as a function of 2r for n = 25, 100, 400, and 800. The minimax regret solution $2r_n^*$ locates the lowest value of each curve. We can see that this solution varies from 2 to 8 for the four sample sizes.
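Computationally, the minimax regret rule is a min-max over a finite kernel list and the density grid. The sketch below builds a standardized two-component grid and selects the order; the parameter ranges, the feasibility handling, and the assumed mise_star_fn interface are our own illustrative choices, not the paper's:

```python
import numpy as np
from itertools import product

def density_grid(step=0.1):
    # Standardized two-component mixtures (w, mu1, sig1, mu2, sig2)
    # with EX = 0 and Var(X) = 1; the grid bounds are assumptions.
    out = []
    for w, m1, s1 in product(np.arange(step, 1.0, step),
                             np.arange(-2.0, 2.0 + step, step),
                             np.arange(step, 1.5 + step, step)):
        m2 = -w * m1 / (1.0 - w)                          # enforce EX = 0
        v2 = (1.0 - w * (s1**2 + m1**2)) / (1.0 - w) - m2**2
        if v2 > 0:                                        # enforce Var = 1
            out.append((w, m1, s1, m2, np.sqrt(v2)))
    return out

def minimax_order(n, orders, mise_star_fn):
    # mise_star_fn(n, r, f) -> h-optimized MISE of G_2r at density f
    # (hypothetical interface). Returns the order r with minimal
    # worst-case log-regret over the density grid.
    grid = density_grid()
    best = [min(mise_star_fn(n, r, f) for r in orders) for f in grid]
    max_regret = {r: max(np.log(mise_star_fn(n, r, f) / b)
                         for f, b in zip(grid, best)) for r in orders}
    return min(max_regret, key=max_regret.get)
```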

Figure 4. Regret of Gaussian kernels.

We calculated the minimax regret order $r_n^*$ for the Gaussian kernels $G_{2r}$ as a function of sample size, and we present the findings in Table 1. The optimal kernel order $r_n^*$ is strictly increasing with n. The magnitudes may be surprising to many readers, as the minimax kernel orders are higher than typically used in practice.

Table 1. Minimax regret Gaussian kernel order by sample size (regret calculated over two-component normal mixture densities)

    2r*_n = 2     n < 40
    2r*_n = 4     40 <= n < 250
    2r*_n = 6     250 <= n < 530
    2r*_n = 8     530 <= n < 900
    2r*_n = 10    900 <= n < 1,500
    2r*_n = 12    1,500 <= n < 2,900
    2r*_n = 14    2,900 <= n < 6,000
    2r*_n = 16    6,000 <= n < 16,000
    2r*_n = 18    16,000 <= n < 34,000
    2r*_n = 20    n >= 34,000

Let $G_n^*$ denote the Gaussian higher order kernel with $r = r_n^*$, and similarly $M_{n,s}^*$. These are feasible kernel choices as they vary only with sample size. We now compare the regret of the feasible kernel choices $\phi$, $G_n^*$, $M_{n,2}^*$, and $\lambda_2$. Figure 5 plots $\mathrm{Regret}_n^*$ for these kernels as a function of n, where the general kernel class includes $G_{2r}$, $M_{2r,2}$, and $\lambda_2$. We see that the standard normal kernel $\phi$ has regret that is increasing in n and is clearly dominated by the other choices except when the sample size is very small. The trapezoid kernel $\lambda_2$ has high regret for small samples but has comparable regret to the optimal Gaussian and polynomial kernels for large samples. The optimal Gaussian and polynomial kernels have similar regret for most sample sizes, with that for $G_n^*$ slightly lower. Because $G_n^*$ is somewhat easier to manipulate numerically, it is our recommendation for empirical practice.
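For practical use, the rule in Table 1 reduces to a step function of the sample size. A direct transcription, with the thresholds as reconstructed in the table above:

```python
def minimax_gaussian_order(n):
    # Returns 2r for the minimax regret Gaussian kernel G_2r (Table 1).
    thresholds = (40, 250, 530, 900, 1500, 2900, 6000, 16000, 34000)
    return 2 + 2 * sum(n >= c for c in thresholds)

assert [minimax_gaussian_order(n) for n in (25, 100, 400, 800)] == [2, 4, 6, 8]
```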

Figure 5. Regret across kernel classes.

A sensible alternative choice is the trapezoidal kernel $\lambda_2$, as long as $n \ge 500$ (its regret is bounded below 12% for such samples). Finally, from Figure 5 we find that the regret of $G_n^*$ is uniformly bounded below 12%. This means that this kernel choice guarantees that, regardless of f (in the class of two-component mixture-normal distributions), $\mathrm{MISE}_n^*(G_n^*, f)$ cannot differ from the optimal $\mathrm{MISE}_n^*(K_n(f), f)$ by more than 12%.

8. CONCLUSION

We have found that finite-sample MISE varies considerably across kernel order 2r and underlying density f. If information about f is known, this can be used to help pick the kernel order. Otherwise, the minimax regret solution picks the kernel order to minimize the worst-case regret. We find that by focusing on the class of mixture-normal densities, we are able to deduce a practical rule for kernel order selection that bounds the regret below 12% for any sample size. Our analysis points to the excellent performance of the higher order Gaussian and smooth polynomial kernels.

An important caveat is that minimax regret is defined relative to a density class $\mathcal{F}$, and we restricted attention to two-component normal mixtures. If another set $\mathcal{F}$ is selected, the minimax regret kernel can change.

Further investigation of the sensitivity of these recommendations to alternative sets $\mathcal{F}$ would be a useful subject of future research.

REFERENCES

Abadir, K.M. (1999) An introduction to hypergeometric functions for economists. Econometric Reviews 18, 287-330.
Abadir, K.M. & S. Lawford (2004) Optimal asymmetric kernels. Economics Letters 83, 61-68.
Chiu, S.-T. (1991) Bandwidth selection for kernel density estimation. Annals of Statistics 19, 1883-1905.
Davis, K.B. (1981) Mean integrated square error properties of density estimates. Annals of Statistics 9, 530-535.
Epanechnikov, V.I. (1969) Non-parametric estimation of a multivariate probability density. Theory of Probability and Its Applications 14, 153-158.
Granovsky, B.L. & H.-G. Müller (1991) Optimizing kernel methods: A unifying variational principle. International Statistical Review 59, 373-388.
Hall, P. & R.D. Murison (1993) Correcting the negativity of high-order kernel density estimators. Journal of Multivariate Analysis 47, 103-122.
Hansen, B.E. (2003) Appendix: Exact mean integrated squared error of higher-order kernel estimators. www.ssc.wisc.edu/~bhansen/papers/mise.html.
Magnus, W., F. Oberhettinger, & R.P. Soni (1966) Formulas and Theorems for the Special Functions of Mathematical Physics. Springer-Verlag.
Marron, J.S. & M.P. Wand (1992) Exact mean integrated squared error. Annals of Statistics 20, 712-736.
Müller, H.-G. (1984) Smooth optimum kernel estimators of densities, regression curves and modes. Annals of Statistics 12, 766-774.
Parzen, E. (1962) On estimation of a probability density function and mode. Annals of Mathematical Statistics 33, 1065-1076.
Politis, D.N. & J.P. Romano (1999) Multivariate density estimation with general flat-top kernels of infinite order. Journal of Multivariate Analysis 68, 1-25.
Savage, L.J. (1951) The theory of statistical decision. Journal of the American Statistical Association 46, 55-67.
Scott, D.W. (1992) Multivariate Density Estimation. Wiley.
Silverman, B.W. (1986) Density Estimation for Statistics and Data Analysis. Chapman and Hall.
Tranter, C.J. (1968) Bessel Functions with Some Physical Applications. Hart.
Wand, M.P. & W.R. Schucany (1990) Gaussian-based kernels. Canadian Journal of Statistics 18, 197-204.
Zhang, S. & J. Jin (1996) Computation of Special Functions. Wiley.

APPENDIX: Proofs

Proof of Theorem 1. Müller (1984, Thm. 2.4) and Granovsky and Müller (1991, Cor. 2) show that the optimal kernel is the unique 2rth-order kernel of the form (7) with $B_{r,s}(x)$ a polynomial of degree $r-1$ in $x^2$. Because (8) takes this form, it is sufficient to show that $B_{r,s}(x)M_s(x)$ is a valid 2rth-order kernel, that (8)-(10) are equivalent, and that (11) holds.

We start by showing the equivalence of (8)-(10). First, because for $k \le m$, $(-m)_k = (-1)^k\,m!/(m-k)!$,

$$\frac{\left(\frac{3}{2}\right)_{r-1}\left(s+\frac{3}{2}\right)_{r-1}}{(s+1)_{r-1}}\sum_{k=0}^{r-1}\frac{(-1)^k\left(r+s+\frac{1}{2}\right)_k}{k!\,(r-1-k)!\left(\frac{3}{2}\right)_k}\,x^{2k} = \frac{\left(\frac{3}{2}\right)_{r-1}\left(s+\frac{3}{2}\right)_{r-1}}{(s+1)_{r-1}\,(r-1)!}\sum_{k=0}^{r-1}\frac{(1-r)_k\left(r+s+\frac{1}{2}\right)_k}{\left(\frac{3}{2}\right)_k}\,\frac{x^{2k}}{k!}$$

$$= \frac{\left(\frac{3}{2}\right)_{r-1}\left(s+\frac{3}{2}\right)_{r-1}}{(s+1)_{r-1}\,(r-1)!}\;{}_2F_1\!\left(1-r,\;r+s+\tfrac{1}{2};\;\tfrac{3}{2};\;x^2\right) = (-1)^{r-1}\,\frac{\left(\frac{3}{2}\right)_{r-1}\left(s+\frac{3}{2}\right)_{r-1}}{2\,(s+1)_{r-1}\left(s+\frac{1}{2}\right)_r}\,\frac{C_{2r-1}^{s+1/2}(x)}{x},$$

where the final equality is (5) with $\lambda = s+\frac{1}{2}$ and $m = r-1$. This shows the equality of (8) and (9).

Second, let $\lambda = s + \frac{1}{2}$. The Gegenbauer polynomials satisfy the recurrence (Magnus et al., 1966, p. 222)

$$(j+1)\,C_{j+1}^{\lambda}(x) = 2(j+\lambda)\,x\,C_j^{\lambda}(x) - (j+2\lambda-1)\,C_{j-1}^{\lambda}(x).$$

Thus for any $j \ge 1$, $x^{-1}C_{2j+1}^{\lambda}(x)$ can be written as a linear combination of $C_{2j}^{\lambda}(x)$ and $x^{-1}C_{2j-1}^{\lambda}(x)$. By back substitution and the fact that $C_1^{\lambda}(x) = (2s+1)x$, we find that $x^{-1}C_{2r-1}^{\lambda}(x)$ is a linear combination of $C_{2m}^{\lambda}(x)$, $m = 0,\ldots,r-1$, with the coefficients given in (10).

Hence

$$(-1)^{r-1}\,\frac{\left(\frac{3}{2}\right)_{r-1}\left(s+\frac{3}{2}\right)_{r-1}}{2\,(s+1)_{r-1}\left(s+\frac{1}{2}\right)_r}\,\frac{C_{2r-1}^{s+1/2}(x)}{x} = \sum_{m=0}^{r-1}(-1)^m\,b_s(m)\,C_{2m}^{s+1/2}(x),$$

showing the equivalence of (9) and (10).

We now show that $B_{r,s}(x)M_s(x)$ is a valid 2rth-order kernel. First, using (10) and the facts that $C_0^{\lambda}(x) = 1$ and $\int_{-1}^1 C_{2m}^{\lambda}(x)\,M_s(x)\,dx = 0$ for all $m \ge 1$ (by the orthogonality of the Gegenbauer polynomials with respect to the weight $(1-x^2)^s$, to which $M_s$ is proportional),

$$\int_{-1}^1 B_{r,s}(x)\,M_s(x)\,dx = b_s(0)\int_{-1}^1 M_s(x)\,dx = 1,$$

since $b_s(0) = 1$. Thus $B_{r,s}(x)M_s(x)$ is a valid kernel. Second, using (9) and the orthogonality properties of the Gegenbauer polynomials, for any $0 < j < 2r$,

$$\int_{-1}^1 x^j\,B_{r,s}(x)\,M_s(x)\,dx \propto \int_{-1}^1 x^{j-1}\,C_{2r-1}^{s+1/2}(x)\,(1-x^2)^s\,dx = 0,$$

because $x^{j-1}$ is a polynomial of degree less than $2r-1$. Thus $B_{r,s}(x)M_s(x)$ is of order 2r as required.

Finally, we demonstrate (11). Rodrigues' formula for the Gegenbauer polynomials (Magnus et al., 1966, p. 221) gives

$$C_{2m}^{s+1/2}(x) = \frac{2^{2m}\,\Gamma\!\left(2m+s+\frac{1}{2}\right)(2m+2s)!}{(2m)!\;\Gamma\!\left(s+\frac{1}{2}\right)(4m+2s)!}\,(1-x^2)^{-s}\,\frac{d^{2m}}{dx^{2m}}(1-x^2)^{2m+s}.$$

Using (1) and (6),

$$C_{2m}^{s+1/2}(x)\,M_s(x) = \frac{(2s+1)\,2^{2m}\,(2m+2s)!\,(2m+s)!}{s!\,(2m)!\,(4m+2s+1)!}\,M_{2m+s}^{(2m)}(x).$$

Combined with (10) this yields

$$M_{2r,s}(x) = \sum_{m=0}^{r-1}(-1)^m\,b_s(m)\,C_{2m}^{s+1/2}(x)\,M_s(x) = \sum_{m=0}^{r-1}\frac{(-1)^m}{m!\;2^{2m}\left(s+\frac{3}{2}\right)_m}\,M_{2m+s}^{(2m)}(x),$$

which is (11).

Proof of Theorem 2. Let $J_\nu(x)$ denote the Bessel function and $P_m^{(\alpha,\beta)}(x)$ the Jacobi polynomials. (See, e.g., Ch. 3 and Sect. 5.2 of Magnus et al., 1966.) In particular,

$$J_{s+1/2}(x) = \left(\frac{2x}{\pi}\right)^{1/2} j_s(x), \tag{A.1}$$

where $j_s(x)$ is the spherical Bessel function defined in (4). Using results from pp. 219 and 210 of Magnus et al. (1966) and (1), the odd Gegenbauer polynomials are related to the Jacobi polynomials by

$$C_{2r-1}^{s+1/2}(x) = \frac{\left(s+\frac{1}{2}\right)_r}{\left(\frac{1}{2}\right)_r}\,x\,P_{r-1}^{(s,\,1/2)}(2x^2-1) = (-1)^{r-1}\,\frac{\left(s+\frac{1}{2}\right)_r}{\left(\frac{1}{2}\right)_r}\,x\,P_{r-1}^{(1/2,\,s)}(1-2x^2). \tag{A.2}$$

By Magnus et al. (1966, p. 216), as $r\to\infty$,

$$\frac{1}{\sqrt{r}}\,P_r^{(1/2,\,s)}\!\left(1 - \frac{x^2}{2r^2}\right) \to \left(\frac{x}{2}\right)^{-1/2} J_{1/2}(x) = \frac{2\sin(x)}{\pi^{1/2}\,x}. \tag{A.3}$$

The equalities are (A.1) and (18). Furthermore, using Stirling's formula,

$$\lim_{m\to\infty}\frac{\Gamma\!\left(m+\frac{1}{2}\right)}{m^{1/2}\,\Gamma(m)} = 1. \tag{A.4}$$

Combining (9) with (A.2), and using the limits in (A.3) and (A.4) to evaluate the constants, we obtain as $r\to\infty$

$$\frac{1}{2r}\,B_{r,s}\!\left(\frac{x}{2r}\right) \to \frac{1}{M_s(0)}\,\frac{\sin(x)}{\pi x}.$$

Furthermore, as $r\to\infty$, $M_s(x/(2r)) \to M_s(0)$. Thus as $r\to\infty$

$$\frac{1}{2r}\,M_{2r,s}\!\left(\frac{x}{2r}\right) = \frac{1}{2r}\,B_{r,s}\!\left(\frac{x}{2r}\right)M_s\!\left(\frac{x}{2r}\right) \to \frac{\sin(x)}{\pi x} = D(x),$$

establishing (13).

Proof of Theorem 3. Let $He_m(x)$ denote the Hermite polynomials. By the Rodrigues formula $\phi^{(m)}(x) = (-1)^m He_m(x)\,\phi(x)$ and a limit property of the Hermite polynomials (Magnus et al., 1966, pp. 252, 255), as $r\to\infty$,

$$\frac{1}{\sqrt{2r+2}}\,G_{2r}\!\left(\frac{x}{\sqrt{2r+2}}\right) = \frac{(-1)^{r+1}\,He_{2r-1}\!\left(\frac{x}{\sqrt{2r+2}}\right)\phi\!\left(\frac{x}{\sqrt{2r+2}}\right)}{2^{r-1}\,(r-1)!\;x} \to \frac{\sin(x)}{\pi x},$$

establishing (14).

Proof of Theorem 4. From Magnus et al. (1966, p. 401),

$$\int_0^1\cos(tx)\,(1-x^2)^s\,dx = \frac{\sqrt{\pi}}{2}\,\Gamma(s+1)\left(\frac{2}{t}\right)^{s+1/2} J_{s+1/2}(t) = 2^s\,s!\;\frac{j_s(t)}{t^s},$$

the second equality using (A.1). Then by the definition of the Fourier transform and the symmetry of $M_s$,

$$\hat M_s(t) = \frac{(2s+1)!}{2^{2s+1}(s!)^2}\int_{-1}^{1}e^{itx}(1-x^2)^s\,dx = \frac{(2s+1)!}{2^{2s}(s!)^2}\int_0^1\cos(tx)(1-x^2)^s\,dx = \frac{2^{s+1}\,\Gamma\!\left(s+\frac{3}{2}\right)}{\sqrt{\pi}}\,\frac{j_s(t)}{t^s},$$

where the final equality uses (1).

Proof of Theorem 5. Using (11), the derivative property of Fourier transforms ($\widehat{g^{(2m)}}(t) = (-1)^m t^{2m}\hat g(t)$), and Theorem 4,

$$\hat M_{2r,s}(t) = \sum_{m=0}^{r-1}\frac{(-1)^m}{m!\,2^{2m}\left(s+\frac{3}{2}\right)_m}\,(-1)^m\,t^{2m}\,\hat M_{2m+s}(t) = \sum_{m=0}^{r-1}\frac{t^{2m}}{m!\,2^{2m}\left(s+\frac{3}{2}\right)_m}\cdot\frac{2^{2m+s+1}\,\Gamma\!\left(2m+s+\frac{3}{2}\right)}{\sqrt{\pi}}\,\frac{j_{2m+s}(t)}{t^{2m+s}}$$

$$= \frac{2^{s+1}\,\Gamma\!\left(s+\frac{3}{2}\right)}{\sqrt{\pi}\;t^{s}}\sum_{m=0}^{r-1}\frac{\Gamma\!\left(2m+s+\frac{3}{2}\right)}{m!\;\Gamma\!\left(m+s+\frac{3}{2}\right)}\,j_{s+2m}(t),$$

as stated, where the final step uses $\left(s+\frac{3}{2}\right)_m = \Gamma\!\left(m+s+\frac{3}{2}\right)/\Gamma\!\left(s+\frac{3}{2}\right)$.

Proof of Theorem 6. By (A.1) and the Weber-Schafheitlin integral (Magnus et al., 1966, p. 99),

$$\int_0^{\infty}\frac{j_{s+2l}(t)\,j_{s+2m}(t)}{t^{2s}}\,dt = \frac{\pi}{2}\int_0^{\infty}\frac{J_{s+2l+1/2}(t)\,J_{s+2m+1/2}(t)}{t^{2s+1}}\,dt = \frac{\pi\,\Gamma(2s+1)\,\Gamma\!\left(l+m+\frac{1}{2}\right)}{2^{2s+2}\,\Gamma(s+1+m-l)\,\Gamma(s+1+l-m)\,\Gamma\!\left(2s+l+m+\frac{3}{2}\right)}. \tag{A.5}$$

By Parseval's equality, Theorem 5, and (A.5),

$$R(M_{2r,s}) = \int_{-1}^{1}M_{2r,s}(x)^2\,dx = \frac{1}{2\pi}\int_{-\infty}^{\infty}\hat M_{2r,s}(t)^2\,dt = \frac{2^{2s+2}\,\Gamma\!\left(s+\frac{3}{2}\right)^2}{\pi^2}\sum_{l=0}^{r-1}\sum_{m=0}^{r-1}a_s(l)\,a_s(m)\int_0^{\infty}\frac{j_{s+2l}(t)\,j_{s+2m}(t)}{t^{2s}}\,dt,$$

which combined with (A.5) gives the stated formula.

Proof of Theorem 7. Using expressions (15) and (22) we find

$$I_1(h, M_{2r,s}, f) = \frac{1}{\pi}\int_0^{\infty}\hat M_{2r,s}(ht)\,|\hat f(t)|^2\,dt = \sum_{m=0}^{r-1}a_s(m)\sum_i\sum_k w_i w_k\,\frac{2^{s+1}\,\Gamma\!\left(s+\frac{3}{2}\right)}{\pi^{3/2}}\int_0^{\infty}\frac{j_{s+2m}(ht)}{(ht)^s}\,e^{-\sigma_{ik}^2 t^2/2}\cos(t\mu_{ik})\,dt. \tag{A.6}$$

Using Magnus et al. (1966, p. 402) we find that

$$\frac{1}{\pi}\int_0^{\infty}t^{2m}e^{-\sigma^2t^2/2}\cos(t\mu)\,dt = \frac{(-1)^m}{\sqrt{2\pi}\,\sigma^{2m+1}}\,e^{-\mu^2/(2\sigma^2)}\,He_{2m}\!\left(\frac{\mu}{\sigma}\right) = (-1)^m\,\phi_\sigma^{(2m)}(\mu). \tag{A.7}$$

Using (A.1), (4), and (A.7), term-by-term integration of the series (4) for $j_{s+2m}(ht)$ gives

$$\frac{2^{s+1}\,\Gamma\!\left(s+\frac{3}{2}\right)}{\pi^{3/2}}\int_0^{\infty}\frac{j_{s+2m}(ht)}{(ht)^s}\,e^{-\sigma^2t^2/2}\cos(t\mu)\,dt = (-1)^m\,\Gamma\!\left(s+\tfrac{3}{2}\right)\sum_{a=0}^{\infty}\frac{h^{2m+2a}\,\phi_\sigma^{(2m+2a)}(\mu)}{a!\;\Gamma\!\left(s+2m+a+\frac{3}{2}\right)2^{2m+2a}}.$$

Combining this with (A.6) we obtain (23). Next,

$$I_2(h, M_{2r,s}, f) = \frac{1}{\pi}\int_0^{\infty}\hat M_{2r,s}(ht)^2\,|\hat f(t)|^2\,dt = \sum_{l=0}^{r-1}\sum_{m=0}^{r-1}a_s(l)\,a_s(m)\sum_i\sum_k w_i w_k\,\frac{2^{2s+2}\,\Gamma\!\left(s+\frac{3}{2}\right)^2}{\pi^2}\int_0^{\infty}\frac{j_{s+2l}(ht)\,j_{s+2m}(ht)}{(ht)^{2s}}\,e^{-\sigma_{ik}^2t^2/2}\cos(t\mu_{ik})\,dt. \tag{A.8}$$

By equation (2.20) of Tranter (1968),

$$J_{s+2l+1/2}(t)\,J_{s+2m+1/2}(t) = \sum_{a=0}^{\infty}\frac{(-1)^a\,(2a+2s+2l+2m+1)!\;(t/2)^{2s+2l+2m+2a+1}}{a!\;\Gamma\!\left(a+s+2l+\frac{3}{2}\right)\Gamma\!\left(a+s+2m+\frac{3}{2}\right)(a+2s+2l+2m+1)!}.$$

Combining this with (A.7) we obtain

$$\frac{2^{2s+2}\,\Gamma\!\left(s+\frac{3}{2}\right)^2}{\pi^2}\int_0^{\infty}\frac{j_{s+2l}(ht)\,j_{s+2m}(ht)}{(ht)^{2s}}\,e^{-\sigma^2t^2/2}\cos(t\mu)\,dt$$
$$= (-1)^{l+m}\,\Gamma\!\left(s+\tfrac{3}{2}\right)^2\sum_{a=0}^{\infty}\frac{(2a+2s+2l+2m+1)!\;h^{2l+2m+2a}\,\phi_\sigma^{(2l+2m+2a)}(\mu)}{a!\,(a+2s+2l+2m+1)!\;\Gamma\!\left(a+s+2l+\frac{3}{2}\right)\Gamma\!\left(a+s+2m+\frac{3}{2}\right)2^{2l+2m+2a}}.$$

Combining this with (A.8) we obtain (24) as required.

Proof of Theorem 8. By a series expansion and (1),

$$\cos(x) = \sum_{a=0}^{\infty}\frac{(-1)^a x^{2a}}{(2a)!} = \sum_{a=0}^{\infty}\frac{(-1)^a}{a!\left(\frac{1}{2}\right)_a}\left(\frac{x^2}{4}\right)^{a}.$$

Thus for any $\eta \ge \frac{1}{2}$, using the substitution $v = \sigma^2 t^2/2$,

$$\int_0^{u}t^{2\eta-1}\cos(t\mu)\,e^{-\sigma^2t^2/2}\,dt = \sum_{a=0}^{\infty}\frac{(-1)^a}{a!\left(\frac{1}{2}\right)_a}\left(\frac{\mu^2}{4}\right)^{a}\int_0^{u}t^{2\eta+2a-1}e^{-\sigma^2t^2/2}\,dt = \frac{2^{\eta-1}}{\sigma^{2\eta}}\sum_{a=0}^{\infty}\frac{(-1)^a}{a!\left(\frac{1}{2}\right)_a}\left(\frac{\mu^2}{2\sigma^2}\right)^{a}\gamma\!\left(a+\eta,\;\frac{\sigma^2u^2}{2}\right).$$

Summing over the mixture components, we find

$$\frac{1}{\pi}\int_0^{u}t^{2\eta-1}\,|\hat f(t)|^2\,dt = \frac{1}{2\pi}\sum_i\sum_k w_i w_k\left(\frac{2}{\sigma_{ik}^2}\right)^{\eta}\sum_{a=0}^{\infty}\frac{(-1)^a}{a!\left(\frac{1}{2}\right)_a}\left(\frac{\mu_{ik}^2}{2\sigma_{ik}^2}\right)^{a}\gamma\!\left(a+\eta,\;\frac{u^2\sigma_{ik}^2}{2}\right) = C_\eta(u).$$

D D E MISE OF HIGHER ORDER KERNEL ESTIMATORS 057 Hence I ~h, D! I 2 ~h, D! p 0 D~ht! f D~t! 2 dt p 0 0h f D~t! 2 dt C ~0h!, I ~h, l 2! p 0 l 2 ~ht! f D~t! 2 dt p 0 20h ~2 ht! f D~t! 2 dt 0h ~ ht! f D~t! 2 dt p 0 2C 02 ~20h! hc ~20h! C 02 ~0h! hc ~0h!, and I 2 ~h, l 2! p 0 l 2 ~ht! 2 f D~t! 2 dt p 0 @~4 4t t 2!~t 2! ~3 4t t 2!~t!# f D~t! 2 dt 4C 02 ~20h! 4hC ~20h! h 2 C 302 ~20h! 3C 02 ~0h! 4hC ~0h! h 2 C 302 ~0h!+