
Biometrika (1990), 77, 2, pp. 377-81
Printed in Great Britain

The choice of weights in kernel regression estimation

BY THEO GASSER
Zentralinstitut für Seelische Gesundheit, J5, P.O.B. 5970, 6800 Mannheim 1, Federal Republic of Germany

AND JOACHIM ENGEL
Institut für Angewandte Mathematik, Universität Heidelberg, Im Neuenheimer Feld 294, 6900 Heidelberg, Federal Republic of Germany

SUMMARY

For kernel regression estimation, a weighting scheme due to Nadaraya and Watson has been associated with random design, and a convolution-type weighting scheme with fixed design. In terms of integrated mean square error, neither estimator is uniformly optimal under either design. However, the convolution-type weights are minimax optimal. Further advantages of this estimator can be seen in the structure of the bias.

Some key words: Fixed design; Kernel estimator; Minimax optimality; Nonparametric regression; Random design.

1. INTRODUCTION

The multitude of nonparametric regression estimators is an issue of considerable practical and theoretical importance. A wide class of estimators studied by Jennen-Steinmetz & Gasser (1988) included fixed-width kernel estimators, smoothing splines and nearest-neighbour estimators as particular cases. No estimator is uniformly best in terms of integrated mean square error, but the kernel estimator turns out to be minimax optimal. Since nonparametric methods are usually intended to be applicable to a broad variety of situations, the minimax property is an important safeguard. Two definitions of kernel weights enjoy particular popularity: the Nadaraya-Watson type (Nadaraya, 1964; Watson, 1964) and the convolution type (Priestley & Chao, 1972; Gasser & Müller, 1979). The Nadaraya-Watson method is intuitively motivated as an estimator of a conditional expectation, which suggests a context where the independent variable is random.
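The conditional-expectation motivation can be made explicit by a standard plug-in argument (a sketch added here for clarity, not taken from the paper; it assumes a product kernel in the joint density estimate, with the kernel in the y-direction integrating out to the observation):

```latex
\[
r(t) = E(Y \mid T = t) = \frac{\int y\, f(y, t)\, dy}{f(t)},
\qquad
\hat f(t) = \frac{1}{nb} \sum_{i=1}^{n} W\!\left(\frac{t - t_i}{b}\right),
\]
\[
\int y\, \hat f(y, t)\, dy
  = \frac{1}{nb} \sum_{i=1}^{n} Y_i\, W\!\left(\frac{t - t_i}{b}\right)
\quad\Longrightarrow\quad
\hat r(t)
  = \frac{\sum_{i=1}^{n} Y_i\, W\{(t - t_i)/b\}}
         {\sum_{i=1}^{n} W\{(t - t_i)/b\}} .
\]
```

Substituting kernel density estimates for the joint and marginal densities, and taking the ratio, yields exactly the Nadaraya-Watson form defined in §2.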
Hence this method seems suited to a situation of randomly selected design points, whose distribution is determined by the design density. The convolution method has traditionally been associated with the case of fixed design points. Jennen-Steinmetz & Gasser (1988) gave some arguments in favour of the convolution estimator for both designs, but their arguments are inconclusive; see also an unpublished report by C. K. Chu and J. S. Marron. In the present note we discuss further aspects supporting the superiority of the convolution weights.

2. PROPERTIES OF BIAS AND VARIANCE

The fixed design regression model for data (Y_i, t_i) (i = 1, ..., n) on [0,1] is

    Y_i = r(t_i) + \varepsilon_i \quad (i = 1, \dots, n),

with design points t_i chosen by the experimenter, regression function r, and random errors \varepsilon_1, ..., \varepsilon_n which are independent and identically distributed and satisfy E(\varepsilon_i) = 0, var(\varepsilon_i) = \sigma^2. For random design we assume independent and identically distributed data (Y_1, t_1), ..., (Y_n, t_n) satisfying E(Y_i | t_i) = r(t_i) and var(Y_i | t_i) = \sigma^2. For simplicity, the data are assumed to be ordered with respect to the second variable. The design density f may have support on the whole real line, but the estimators are studied on [0,1] only. The convolution estimator \hat r_1 (Gasser & Müller, 1979) is

    \hat r_1(t) = b^{-1} \sum_{i=1}^{n} Y_i \int_{s_{i-1}}^{s_i} W\{(t-u)/b\}\, du,

where the s_i are the midpoints between the design points t_i. The kernel W is assumed to be a symmetric probability density with compact support. The Nadaraya-Watson estimator is defined by

    \hat r_2(t) = \sum_{i=1}^{n} Y_i W\{(t-t_i)/b\} \Big/ \sum_{i=1}^{n} W\{(t-t_i)/b\}.

Following Jennen-Steinmetz & Gasser (1988), the asymptotic bias and variance are, for \hat r_1 and \hat r_2 respectively,

    bias:      \tfrac12 b^2 M_2(W) r''(t),        b^2 M_2(W)\{\tfrac12 r''(t) + r'(t) f'(t)/f(t)\},
    variance:  C \sigma^2 V(W) / \{n f(t) b\},    \sigma^2 V(W) / \{n f(t) b\},

where M_2(W) = \int u^2 W(u)\, du and V(W) = \int \{W(u)\}^2\, du. The factor C in the variance of the convolution estimator equals 1 for fixed and 1.5 for random design, as discussed by C. K. Chu and J. S. Marron in their unpublished report. Jennen-Steinmetz & Gasser (1988) derived this variance for s_i = t_i instead of s_i = (t_i + t_{i+1})/2 for technical convenience, which, however, leads to an inferior factor of C = 2. For fixed design, the variance is the same for both estimators. For random design it is of the same form but higher by the factor C = 1.5 for the convolution estimator. However, an additional bias term appears for the Nadaraya-Watson estimator under both designs. Since this term does not necessarily inflate the size of the bias, the integrated mean square error can be smaller for either estimator. However, the following qualitative arguments, together with the minimax result of §3, lead us to discourage the use of Nadaraya-Watson weights.
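The two estimators just defined, and the additional Nadaraya-Watson bias term, can be checked numerically. The following sketch (illustrative Python, not from the paper; the Epanechnikov kernel, the declining design density and all names are my choices) implements both estimators, evaluating the convolution weights exactly via the kernel's distribution function, and applies them to noiseless linear data on a nonuniform fixed design:

```python
import numpy as np

def epan_cdf(v):
    # Distribution function of the Epanechnikov kernel W(u) = 0.75*(1 - u^2) on [-1, 1].
    v = np.clip(v, -1.0, 1.0)
    return 0.25 * (2.0 + 3.0 * v - v**3)

def gasser_muller(t, x, y, b):
    # Convolution-type estimator: (1/b) * sum_i y_i * int_{s_{i-1}}^{s_i} W((t-u)/b) du,
    # with s_i the midpoints between the ordered design points (s_0 = 0, s_n = 1).
    # Each inner integral equals a difference of kernel CDF values.
    s = np.concatenate(([0.0], 0.5 * (x[:-1] + x[1:]), [1.0]))
    A = epan_cdf((t[:, None] - s[None, :-1]) / b) - epan_cdf((t[:, None] - s[None, 1:]) / b)
    return A @ y

def nadaraya_watson(t, x, y, b):
    # sum_i y_i W((t - x_i)/b) / sum_i W((t - x_i)/b).
    u = (t[:, None] - x[None, :]) / b
    W = np.where(np.abs(u) <= 1, 0.75 * (1.0 - u**2), 0.0)
    return (W @ y) / W.sum(axis=1)

# Noiseless linear data, r(t) = t, on a fixed design with declining density
# f(t) = 2(1 - t): quantile-spaced points x_i = F^{-1}((i - 1/2)/n).
n, b = 400, 0.2
x = 1.0 - np.sqrt(1.0 - (np.arange(n) + 0.5) / n)
y = x.copy()
t = np.linspace(0.25, 0.6, 8)   # interior evaluation points, away from the boundary
gm_err = np.max(np.abs(gasser_muller(t, x, y, b) - t))
nw_err = np.max(np.abs(nadaraya_watson(t, x, y, b) - t))
print(gm_err, nw_err)           # the convolution error is far smaller
```

Even without noise, the Nadaraya-Watson estimate deviates from the line by roughly b^2 M_2(W) r'(t) f'(t)/f(t), in line with the bias expressions above, whereas the convolution weights recover linear functions in the interior up to discretization error.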
(i) Nadaraya-Watson weights in general do not allow estimation of linear functions, or of linear parts, without bias, in contrast to convolution weights.
(ii) The bias of the Nadaraya-Watson estimator depends not only on the regression function r but also on the design density f.
(iii) The bias of the convolution weights is of a simple form and is conservative, i.e. it attenuates structure, and the estimate can provide qualitative information about the bias. The bias of the Nadaraya-Watson estimator is of a complicated, visually unpredictable form and can lead to qualitative errors.

Statements (i) and (ii) are illustrated in Fig. 1: assume a linear function r and a bimodal design density f. The finite-sample expectation of the convolution estimator coincides with the linear function within the accuracy of plotting. The expectation of the Nadaraya-Watson estimator is far from linear but rather a highly sigmoid function. A bandwidth

of b = 0.2 was chosen; higher bandwidths would aggravate the problem. Other nonuniform densities also lead to serious departures from the linearity of the regression function, a fact which may shake the user's confidence in these nonparametric techniques. Figure 2 illustrates statement (iii): assumed are a symmetric peak as regression function r and a declining design density f. The expectation of the convolution estimator is an attenuated peak which is still symmetric and at the right location. This qualitative agreement cannot be achieved with the Nadaraya-Watson estimator, which shows a displaced and asymmetric peak. Given the importance of these features for practical conclusions and for model building, this behaviour is hard to tolerate. These bias problems are particularly accentuated in the scientific process of many empirical sciences: studies are usually replicated by sticking to the design of the previously published study. In this way, qualitatively misleading phenomena as produced by the Nadaraya-Watson estimator will be attributed even more confidence. For random design, convolution estimators have to pay a price in variance. This can be better tolerated, since replications of studies can control for random phenomena.

Fig. 1. True curve and expectation of convolution estimator: linear, solid line. Expectation of Nadaraya-Watson estimator: sigmoid, dashed line. Design density: bimodal, solid line.

Fig. 2. True curve: Gauss peak, solid line. Expectation of convolution estimator: dotted line. Expectation of Nadaraya-Watson estimator: dashed line. Design density: declining, solid line.

3. A MINIMAX RESULT

The asymptotic integrated mean square error evaluated at the optimal bandwidth is given first for the convolution and then for the Nadaraya-Watson estimator:

    \mathrm{IMSE}_1(f, r) = \tfrac54 \Bigl[ 4 \int_0^1 \{\tfrac12 M_2(W) r''(t)\}^2 \, dt \Bigr]^{1/5} \{C \sigma^2 V(W) I(f)\}^{4/5} n^{-4/5},

    \mathrm{IMSE}_2(f, r) = \tfrac54 \Bigl[ 4 \int_0^1 \bigl[ M_2(W)\{\tfrac12 r''(t) + r'(t) f'(t)/f(t)\} \bigr]^2 \, dt \Bigr]^{1/5} \{\sigma^2 V(W) I(f)\}^{4/5} n^{-4/5}.

Here I(f) is defined as

    I(f) = \int_0^1 dt / f(t).
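As a check on the exponents, the expressions of this form follow from the usual bias-variance tradeoff; a sketch of the minimisation (added here, writing A for the integrated squared-bias coefficient and D for the integrated-variance coefficient, consistent with the asymptotic expressions of §2):

```latex
\[
\mathrm{IMSE}(b) = A\, b^{4} + \frac{D}{n b},
\qquad
A = \int_0^1 B(t)^2\, dt, \qquad D = C\,\sigma^2 V(W)\, I(f),
\]
\[
\frac{d}{db}\,\mathrm{IMSE}(b) = 4 A b^{3} - \frac{D}{n b^{2}} = 0
\;\Longrightarrow\;
b_{\mathrm{opt}} = \Bigl(\frac{D}{4 A n}\Bigr)^{1/5},
\]
\[
\mathrm{IMSE}(b_{\mathrm{opt}})
 = A\, b_{\mathrm{opt}}^{4} + \frac{D}{n\, b_{\mathrm{opt}}}
 = \tfrac{5}{4}\,(4A)^{1/5}\, D^{4/5}\, n^{-4/5}.
\]
```

Here B(t) denotes the bias divided by b^2, i.e. \tfrac12 M_2(W) r''(t) for the convolution weights and M_2(W)\{\tfrac12 r''(t) + r'(t) f'(t)/f(t)\} for the Nadaraya-Watson weights (with C = 1 in the latter case). The squared-bias integral thus enters with exponent 1/5, while I(f), \sigma^2 V(W) and C enter with exponent 4/5.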

The interplay between the design density f and the regression function r determines which estimator is to be preferred: for both types of design one can easily find combinations of f and r that favour either of the two estimators. However, the convolution-type scheme is optimal in a minimax sense.

THEOREM. For some positive real number L let

    F = \{ f \in C^1[a, b] : 0 < L \le f(t), \; t \in [0,1] \},  with  -\infty < a \le 0  and  b \ge 1,

where f is a density. Then, for all r \in C^2[0,1],

    \sup_{f \in F} \mathrm{IMSE}_1(f, r) \le \sup_{f \in F} \mathrm{IMSE}_2(f, r).

For \int_0^1 \{r'(t)\}^2 \, dt > 0 strict inequality holds.

Note that the conditions on f guarantee that there are no holes in the design on [0,1], where the curve is to be estimated.

Proof. It is sufficient to give the proof for the case of random design. Cancelling the factors common to both sides, we have to show that

    \sup_{f \in F} (1.5)^{4/5} \{I(f)\}^{4/5} \Bigl[ \int_0^1 \{\tfrac12 r''(t)\}^2 \, dt \Bigr]^{1/5}
      \le \sup_{f \in F} \{I(f)\}^{4/5} \Bigl[ \int_0^1 \{\tfrac12 r''(t) + r'(t) f'(t)/f(t)\}^2 \, dt \Bigr]^{1/5}.

The proof is given for the case when [0,1] is strictly contained in [a, b]; the case [0,1] = [a, b] is only slightly different. We prove the theorem by constructing a class of densities f_{\delta,\lambda} in F, bounded below by L on [0,1] and with ever faster oscillating derivative as \lambda increases, such that

    \sup_{\delta, \lambda} I(f_{\delta,\lambda}) = \sup_{f \in F} I(f) = 1/L,

i.e. we construct design densities in F such that the Nadaraya-Watson bias increases to infinity while the variance remains bounded. If \delta is sufficiently small, \int_0^1 f_{\delta,\lambda}(t)\, dt \le 1 and we can extend f_{\delta,\lambda} to a density in F on all of [a, b]. Without loss of generality we may assume the existence of an interval [\alpha, \beta] \subset [0,1] such that |r'(t)| \ge \eta for t \in [\alpha, \beta]; otherwise the statement above is trivial. Then

    \sup_{\delta, \lambda} \int_0^1 dt / f_{\delta,\lambda}(t) = \sup_{f \in F} \int_0^1 dt / f(t) = 1/L

and, provided \delta is sufficiently small,

    \int_0^1 \{\tfrac12 r''(t) + r'(t) f'_{\delta,\lambda}(t)/f_{\delta,\lambda}(t)\}^2 \, dt \to \infty

as \lambda \to \infty. Note that r'r'' \in L^1[\alpha, \beta], since

    \int_\alpha^\beta r'(t) r''(t)\, dt = \tfrac12 \{r'(\beta)^2 - r'(\alpha)^2\}.  \Box

4. CONCLUSIONS

When considering kernel regression estimators for application, one has to choose between two popular weighting schemes, one called Nadaraya-Watson weights, the other convolution weights. The bias of the convolution weights is rather simple, depending essentially on the second derivative of the regression function, which is easy to grasp qualitatively in an exploratory analysis. The Nadaraya-Watson estimator has a complex bias, involving first and second derivatives of the regression function as well as the design density and its first derivative. It may lead not only to a distortion of linear parts of the regression function but also to a qualitative distortion of its pattern, which is not easy to grasp intuitively, say from a graphical display. This type of bias also leads to more severe boundary problems and to difficulties with certain methods of bandwidth choice; both are problems of practical relevance. The variance is the same for fixed design, but is smaller for the Nadaraya-Watson estimator in the case of random design. A comparison of mean square error in a minimax sense shows that Nadaraya-Watson weights can become arbitrarily bad in an unfavourable situation, which is contrary to the spirit of nonparametric methods. Altogether, this makes us hesitate to recommend the use of Nadaraya-Watson weights.

ACKNOWLEDGEMENT

This work was performed as part of the research programme of the Sonderforschungsbereich 123 at the University of Heidelberg, and was made possible by financial support from the Deutsche Forschungsgemeinschaft.

REFERENCES

GASSER, T. & MÜLLER, H. G. (1979). Kernel estimation of regression functions. In Smoothing Techniques for Curve Estimation, Ed. T. Gasser and M. Rosenblatt. New York: Springer-Verlag.
JENNEN-STEINMETZ, C. & GASSER, T. (1988). A unifying approach to nonparametric regression estimation. J. Am.
Statist. Assoc. 83.
NADARAYA, E. A. (1964). On estimating regression. Theory Prob. Applic. 9.
PRIESTLEY, M. B. & CHAO, M. T. (1972). Nonparametric function fitting. J. R. Statist. Soc. B 34.
WATSON, G. S. (1964). Smooth regression analysis. Sankhyā A 26.

[Received August. Revised November 1989]


More information

Estimation of a quadratic regression functional using the sinc kernel

Estimation of a quadratic regression functional using the sinc kernel Estimation of a quadratic regression functional using the sinc kernel Nicolai Bissantz Hajo Holzmann Institute for Mathematical Stochastics, Georg-August-University Göttingen, Maschmühlenweg 8 10, D-37073

More information

On the Use of Nonparametric ICC Estimation Techniques For Checking Parametric Model Fit

On the Use of Nonparametric ICC Estimation Techniques For Checking Parametric Model Fit On the Use of Nonparametric ICC Estimation Techniques For Checking Parametric Model Fit March 27, 2004 Young-Sun Lee Teachers College, Columbia University James A.Wollack University of Wisconsin Madison

More information

Bias Correction and Higher Order Kernel Functions

Bias Correction and Higher Order Kernel Functions Bias Correction and Higher Order Kernel Functions Tien-Chung Hu 1 Department of Mathematics National Tsing-Hua University Hsinchu, Taiwan Jianqing Fan Department of Statistics University of North Carolina

More information

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1.1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner

More information

Local regression I. Patrick Breheny. November 1. Kernel weighted averages Local linear regression

Local regression I. Patrick Breheny. November 1. Kernel weighted averages Local linear regression Local regression I Patrick Breheny November 1 Patrick Breheny STA 621: Nonparametric Statistics 1/27 Simple local models Kernel weighted averages The Nadaraya-Watson estimator Expected loss and prediction

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

On robust and efficient estimation of the center of. Symmetry.

On robust and efficient estimation of the center of. Symmetry. On robust and efficient estimation of the center of symmetry Howard D. Bondell Department of Statistics, North Carolina State University Raleigh, NC 27695-8203, U.S.A (email: bondell@stat.ncsu.edu) Abstract

More information

Confidence intervals for kernel density estimation

Confidence intervals for kernel density estimation Stata User Group - 9th UK meeting - 19/20 May 2003 Confidence intervals for kernel density estimation Carlo Fiorio c.fiorio@lse.ac.uk London School of Economics and STICERD Stata User Group - 9th UK meeting

More information

Modelling Non-linear and Non-stationary Time Series

Modelling Non-linear and Non-stationary Time Series Modelling Non-linear and Non-stationary Time Series Chapter 2: Non-parametric methods Henrik Madsen Advanced Time Series Analysis September 206 Henrik Madsen (02427 Adv. TS Analysis) Lecture Notes September

More information

Nonparametric Regression Estimation for Nonlinear Systems: A Case Study of Sigmoidal Growths

Nonparametric Regression Estimation for Nonlinear Systems: A Case Study of Sigmoidal Growths Pakistan Journal of Social Sciences (PJSS) Vol. 31, No. 2 (December 2011), pp. 423-432 Nonparametric Regression Estimation for Nonlinear Systems: A Case Study of Sigmoidal Growths Atif Akbar Department

More information

Department of Economics, Vanderbilt University While it is known that pseudo-out-of-sample methods are not optimal for

Department of Economics, Vanderbilt University While it is known that pseudo-out-of-sample methods are not optimal for Comment Atsushi Inoue Department of Economics, Vanderbilt University (atsushi.inoue@vanderbilt.edu) While it is known that pseudo-out-of-sample methods are not optimal for comparing models, they are nevertheless

More information

A Novel Nonparametric Density Estimator

A Novel Nonparametric Density Estimator A Novel Nonparametric Density Estimator Z. I. Botev The University of Queensland Australia Abstract We present a novel nonparametric density estimator and a new data-driven bandwidth selection method with

More information

CURRENT STATUS LINEAR REGRESSION. By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University

CURRENT STATUS LINEAR REGRESSION. By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University CURRENT STATUS LINEAR REGRESSION By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University We construct n-consistent and asymptotically normal estimates for the finite

More information

ORTHOGONAL SERIES REGRESSION ESTIMATORS FOR AN IRREGULARLY SPACED DESIGN

ORTHOGONAL SERIES REGRESSION ESTIMATORS FOR AN IRREGULARLY SPACED DESIGN APPLICATIONES MATHEMATICAE 7,3(000), pp. 309 318 W.POPIŃSKI(Warszawa) ORTHOGONAL SERIES REGRESSION ESTIMATORS FOR AN IRREGULARLY SPACED DESIGN Abstract. Nonparametric orthogonal series regression function

More information

Variance Function Estimation in Multivariate Nonparametric Regression

Variance Function Estimation in Multivariate Nonparametric Regression Variance Function Estimation in Multivariate Nonparametric Regression T. Tony Cai 1, Michael Levine Lie Wang 1 Abstract Variance function estimation in multivariate nonparametric regression is considered

More information

Defect Detection using Nonparametric Regression

Defect Detection using Nonparametric Regression Defect Detection using Nonparametric Regression Siana Halim Industrial Engineering Department-Petra Christian University Siwalankerto 121-131 Surabaya- Indonesia halim@petra.ac.id Abstract: To compare

More information

Sliced Inverse Regression

Sliced Inverse Regression Sliced Inverse Regression Ge Zhao gzz13@psu.edu Department of Statistics The Pennsylvania State University Outline Background of Sliced Inverse Regression (SIR) Dimension Reduction Definition of SIR Inversed

More information

Uncertainty Quantification for Inverse Problems. November 7, 2011

Uncertainty Quantification for Inverse Problems. November 7, 2011 Uncertainty Quantification for Inverse Problems November 7, 2011 Outline UQ and inverse problems Review: least-squares Review: Gaussian Bayesian linear model Parametric reductions for IP Bias, variance

More information

3.0.1 Multivariate version and tensor product of experiments

3.0.1 Multivariate version and tensor product of experiments ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 3: Minimax risk of GLM and four extensions Lecturer: Yihong Wu Scribe: Ashok Vardhan, Jan 28, 2016 [Ed. Mar 24]

More information

ECO Class 6 Nonparametric Econometrics

ECO Class 6 Nonparametric Econometrics ECO 523 - Class 6 Nonparametric Econometrics Carolina Caetano Contents 1 Nonparametric instrumental variable regression 1 2 Nonparametric Estimation of Average Treatment Effects 3 2.1 Asymptotic results................................

More information

ASYMPTOTICS FOR PENALIZED SPLINES IN ADDITIVE MODELS

ASYMPTOTICS FOR PENALIZED SPLINES IN ADDITIVE MODELS Mem. Gra. Sci. Eng. Shimane Univ. Series B: Mathematics 47 (2014), pp. 63 71 ASYMPTOTICS FOR PENALIZED SPLINES IN ADDITIVE MODELS TAKUMA YOSHIDA Communicated by Kanta Naito (Received: December 19, 2013)

More information

EXTREMAL QUANTILES OF MAXIMUMS FOR STATIONARY SEQUENCES WITH PSEUDO-STATIONARY TREND WITH APPLICATIONS IN ELECTRICITY CONSUMPTION ALEXANDR V.

EXTREMAL QUANTILES OF MAXIMUMS FOR STATIONARY SEQUENCES WITH PSEUDO-STATIONARY TREND WITH APPLICATIONS IN ELECTRICITY CONSUMPTION ALEXANDR V. MONTENEGRIN STATIONARY JOURNAL TREND WITH OF ECONOMICS, APPLICATIONS Vol. IN 9, ELECTRICITY No. 4 (December CONSUMPTION 2013), 53-63 53 EXTREMAL QUANTILES OF MAXIMUMS FOR STATIONARY SEQUENCES WITH PSEUDO-STATIONARY

More information

Quantile Processes for Semi and Nonparametric Regression

Quantile Processes for Semi and Nonparametric Regression Quantile Processes for Semi and Nonparametric Regression Shih-Kang Chao Department of Statistics Purdue University IMS-APRM 2016 A joint work with Stanislav Volgushev and Guang Cheng Quantile Response

More information

Wavelet Regression Estimation in Longitudinal Data Analysis

Wavelet Regression Estimation in Longitudinal Data Analysis Wavelet Regression Estimation in Longitudinal Data Analysis ALWELL J. OYET and BRAJENDRA SUTRADHAR Department of Mathematics and Statistics, Memorial University of Newfoundland St. John s, NF Canada, A1C

More information

DISCUSSION: COVERAGE OF BAYESIAN CREDIBLE SETS. By Subhashis Ghosal North Carolina State University

DISCUSSION: COVERAGE OF BAYESIAN CREDIBLE SETS. By Subhashis Ghosal North Carolina State University Submitted to the Annals of Statistics DISCUSSION: COVERAGE OF BAYESIAN CREDIBLE SETS By Subhashis Ghosal North Carolina State University First I like to congratulate the authors Botond Szabó, Aad van der

More information

A nonparametric two-sample wald test of equality of variances

A nonparametric two-sample wald test of equality of variances University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 211 A nonparametric two-sample wald test of equality of variances David

More information

Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model.

Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model. Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model By Michael Levine Purdue University Technical Report #14-03 Department of

More information

Department of Statistics Purdue University West Lafayette, IN USA

Department of Statistics Purdue University West Lafayette, IN USA Variance Estimation in Nonparametric Regression Via the Difference Sequence Method by Lawrence D. Brown University of Pennsylvania M. Levine Purdue University Technical Report #06-07 Department of Statistics

More information

A comparative study of ordinary cross-validation, r-fold cross-validation and the repeated learning-testing methods

A comparative study of ordinary cross-validation, r-fold cross-validation and the repeated learning-testing methods Biometrika (1989), 76, 3, pp. 503-14 Printed in Great Britain A comparative study of ordinary cross-validation, r-fold cross-validation and the repeated learning-testing methods BY PRABIR BURMAN Division

More information

ERROR VARIANCE ESTIMATION IN NONPARAMETRIC REGRESSION MODELS

ERROR VARIANCE ESTIMATION IN NONPARAMETRIC REGRESSION MODELS ERROR VARIANCE ESTIMATION IN NONPARAMETRIC REGRESSION MODELS By YOUSEF FAYZ M ALHARBI A thesis submitted to The University of Birmingham for the Degree of DOCTOR OF PHILOSOPHY School of Mathematics The

More information

Kernel-based density. Nuno Vasconcelos ECE Department, UCSD

Kernel-based density. Nuno Vasconcelos ECE Department, UCSD Kernel-based density estimation Nuno Vasconcelos ECE Department, UCSD Announcement last week of classes we will have Cheetah Day (exact day TBA) what: 4 teams of 6 people each team will write a report

More information

Lecture 3: Statistical Decision Theory (Part II)

Lecture 3: Statistical Decision Theory (Part II) Lecture 3: Statistical Decision Theory (Part II) Hao Helen Zhang Hao Helen Zhang Lecture 3: Statistical Decision Theory (Part II) 1 / 27 Outline of This Note Part I: Statistics Decision Theory (Classical

More information

12 - Nonparametric Density Estimation

12 - Nonparametric Density Estimation ST 697 Fall 2017 1/49 12 - Nonparametric Density Estimation ST 697 Fall 2017 University of Alabama Density Review ST 697 Fall 2017 2/49 Continuous Random Variables ST 697 Fall 2017 3/49 1.0 0.8 F(x) 0.6

More information

Fast learning rates for plug-in classifiers under the margin condition

Fast learning rates for plug-in classifiers under the margin condition Fast learning rates for plug-in classifiers under the margin condition Jean-Yves Audibert 1 Alexandre B. Tsybakov 2 1 Certis ParisTech - Ecole des Ponts, France 2 LPMA Université Pierre et Marie Curie,

More information

On detection of unit roots generalizing the classic Dickey-Fuller approach

On detection of unit roots generalizing the classic Dickey-Fuller approach On detection of unit roots generalizing the classic Dickey-Fuller approach A. Steland Ruhr-Universität Bochum Fakultät für Mathematik Building NA 3/71 D-4478 Bochum, Germany February 18, 25 1 Abstract

More information

Understanding Generalization Error: Bounds and Decompositions

Understanding Generalization Error: Bounds and Decompositions CIS 520: Machine Learning Spring 2018: Lecture 11 Understanding Generalization Error: Bounds and Decompositions Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the

More information

Strictly Stationary Solutions of Autoregressive Moving Average Equations

Strictly Stationary Solutions of Autoregressive Moving Average Equations Strictly Stationary Solutions of Autoregressive Moving Average Equations Peter J. Brockwell Alexander Lindner Abstract Necessary and sufficient conditions for the existence of a strictly stationary solution

More information

Institut für Mathematik

Institut für Mathematik U n i v e r s i t ä t A u g s b u r g Institut für Mathematik Martin Rasmussen, Peter Giesl A Note on Almost Periodic Variational Equations Preprint Nr. 13/28 14. März 28 Institut für Mathematik, Universitätsstraße,

More information