Technische Universität Ilmenau Institut für Mathematik


Preprint No. M 03/14

Rates of consistency for nonparametric estimation of the mode in absence of smoothness assumptions

Herrmann, Eva; Ziegler, Klaus

2003

Imprint: Published by the Head of the Institut für Mathematik, Weimarer Straße 25, 98693 Ilmenau. Tel.: +49 3677 69 3621, Fax: +49 3677 69 3270, http://www.tu-ilmenau.de/ifm/, ISSN xxxx-xxxx

Rates of consistency for nonparametric estimation of the mode in absence of smoothness assumptions

Eva Herrmann and Klaus Ziegler
Technical University of Darmstadt and Technical University of Ilmenau

Abstract
Nonparametric estimation of the mode of a density or regression function via kernel methods is considered. It is shown that the rate of consistency of the mode estimator can be determined without the typical smoothness conditions. Only the uniform rate of the so-called stochastic part of the problem, together with some mild conditions characterizing the shape or acuteness of the mode, influences the rate of the mode estimator. In particular, outside the location of the mode, our assumptions do not even imply continuity. Overall, it turns out that the location of the mode can be estimated at a rate that is the better the peakier (and hence non-smooth) the mode is, while the contrary holds for estimation of the size of the mode.

AMS subject classification: 62G05, 62G07
Key words and phrases: nonparametric curve estimation, mode, kernel smoothing, rates of consistency, non-smooth curves

1 Introduction and assumptions
An important problem in nonparametric curve estimation consists in the estimation of the mode, i.e., the location of an isolated maximum of the unknown density or regression function. A number of distinguished papers deal with this topic, among others Parzen (1962), Rüschendorf (1977), Eddy (1980, 1982), Müller (1985, 1989), Romano (1988a,b), Grund and Hall (1995), Ehm (1996) and, most recently, Mokkadem and Pelletier (2003) as well as Abraham, Biau and Cadre (2003). In the following, we restrict ourselves to the univariate situation, but extensions to the multivariate case are possible.

The classical approach is as follows. Let $f$ be the unknown real-valued curve and $\theta$ the mode of $f$, i.e.
$$f(\theta) > \sup_{|x-\theta|>\varepsilon} f(x) \quad \text{for each } \varepsilon > 0, \tag{1}$$
which means that $\theta$ is the location of the unique global maximum of $f$. Then $\theta$ is estimated by the location $\hat\theta_n$ of a maximum of a curve estimator $\hat f_n$ for $f$.
Uniqueness of the maximum, or even a condition like (1), cannot be expected for $\hat f_n$, but in general this does not affect the validity of the asymptotic theory. It is well known that uniform consistency of $\hat f_n$ for $f$ suffices to ensure consistency of $\hat\theta_n$ for $\theta$ (see, e.g., Parzen, 1962; Rüschendorf, 1977; Nadaraya, 1989).

To obtain rates, however, one has to know more about the local geometry of $f$ around $\theta$. Müller (1985) showed, in essence, that if the uniform consistency of $\hat f_n$ is of order $\beta_n$, i.e.
$$\sup_x |\hat f_n(x) - f(x)| = O(\beta_n) \tag{2}$$
(in probability or a.s.), and if there are $\rho > 0$, $c > 0$ such that
$$f(\theta) - f(x) \ge c\,|x-\theta|^{\rho} \quad \text{in a neighborhood of } \theta, \tag{3}$$
then
$$|\hat\theta_n - \theta| = O\big(\beta_n^{1/\rho}\big) \tag{4}$$
in probability or a.s., depending on what holds in (2). A similar concept was used by Boularan et al. (1995) for the estimation of points $\theta$ with $f^{(p)}(\theta) = b$, i.e., where the $p$-th derivative of the curve takes a given value $b$. So (2) seems to be crucial for (4). But in order to obtain (2), global smoothness conditions have to be imposed on $f$. Most authors, among them Müller (1985), assume $f$ to be twice continuously differentiable. Differentiability, however, excludes the case $\rho \le 1$ in (3), (4), and hence his results do not apply to cusp-shaped modes ($\rho = 1$; see Ehm, 1996, where, however, $C^p$-smoothness, $p \ge 2$, of $f$ is assumed outside of $\theta$) or even to proper peaks ($\rho < 1$). In fact, Müller's conditions tacitly even imply $\rho \ge 2$, and the case $\rho < 2$ does not seem to have been considered explicitly so far. It should be mentioned here that Härdle et al. require only some uniform local Lipschitz condition instead of differentiability in order to obtain (2), but this is still a global smoothness condition.

Results of type (2) are always proven by splitting into a stochastic part $\hat f_n(x) - f_n(x)$ and a deterministic part $f_n(x) - f(x)$, where in density estimation $f_n(x) = E\hat f_n(x)$ and the analytic part is simply the bias. A look at the proofs available in the literature reveals that smoothness conditions are needed exclusively for handling the analytic part, whereas a rate of the stochastic part,
$$\sup_x |\hat f_n(x) - f_n(x)| = O(U_n) \tag{5}$$
(a.s. or in probability), can be determined without imposing any smoothness conditions. For example, if $\hat f_n$ is the Rosenblatt-Parzen kernel density estimator
$$\hat f_n(x) = \frac{1}{n h_n} \sum_{i=1}^{n} K\Big(\frac{x - X_i}{h_n}\Big) \tag{6}$$
based on i.i.d. observations $X_i$ having density $f$, then (5) can be established with the a.s. rate $U_n = \sqrt{\log n/(n h_n)}$ under some regularity conditions on the kernel $K$ and the bandwidth $h_n$. An analogous result holds for regression estimators, see, e.g., Härdle et al. (1988), and can also be derived from the results contained in Ziegler (2002). Similar results with somewhat worse rates are available in the case of dependent observations and in the multivariate case (see Györfi et al., 1989; Koshkin and Vasil'iev, 1998; Liebscher, 2001).

The aim of the present paper is to show that knowledge of the rate $U_n$ of the stochastic part in (5), together with some information about the shape or acuteness of the peak, already suffices to prove a result like (4), with no smoothness conditions being required at all. Apart from $\theta$, continuity of $f$ will at most be required for (5), which is sometimes the case for technical reasons; see, e.g., Einmahl and Mason (2000), where even the constant in the $O$-term is determined. In the sequel, we consider the slightly more general case of $\theta$ being the location of a local maximum of $f$, with (1) holding for $x$ in some neighborhood $I$ of $\theta$. In part, the local shape of the peak is characterized by (3) (holding for $x$ in some possibly smaller neighborhood $J \subset I$), but this gives only an upper bound for $f$ around $\theta$. A lower bound will be needed in addition. Therefore, we introduce the further condition that there are $\rho' > 0$, $d > 0$ such that
$$f(\theta) - f(x) \le d\,|x-\theta|^{\rho'} \quad \text{in a neighborhood of } \theta. \tag{7}$$
Note that (3) together with (7) implies $\rho' \le \rho$. Indeed, $\rho' > \rho$ would imply $d\,|x-\theta|^{\rho'} < c\,|x-\theta|^{\rho}$ for $|x-\theta|$ small enough, contradicting $c\,|x-\theta|^{\rho} \le f(\theta) - f(x) \le d\,|x-\theta|^{\rho'}$. If, e.g., $f$ is twice continuously differentiable in a neighborhood of $\theta$, then, by Taylor's theorem and $f'(\theta) = 0$, the choice $\rho = \rho' = 2$ is possible. Even in this case, our result still improves on known ones, because this is only a local smoothness assumption, while (2) always requires global smoothness. If $f$ has a cusp-shaped mode at $\theta$, i.e., the one-sided derivatives exist in $\theta$ with $f'(\theta-0) > 0$, $f'(\theta+0) < 0$, and if $f$ is continuously differentiable in left and right neighborhoods of $\theta$, then $\rho = \rho' = 1$. For $\rho < 1$ there is necessarily $f'(\theta-0) = \infty$, $f'(\theta+0) = -\infty$. We will return to a discussion of these special cases in the remark after the corollary below and in Section 3.
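To make conditions (3) and (7) concrete, the following Python sketch uses a hypothetical test density (our own choice for illustration, not taken from the paper) whose peak exponent can be tuned: $f(x) = \frac{\rho+1}{2\rho}(1 - |x|^\rho)$ on $[-1, 1]$. Its mode is $\theta = 0$ and $f(\theta) - f(x) = \frac{\rho+1}{2\rho}|x-\theta|^\rho$, so (3) and (7) hold with $\rho' = \rho$ and $c = d = \frac{\rho+1}{2\rho}$; $\rho = 2$ gives a smooth peak, $\rho = 1$ a cusp and $\rho < 1$ a proper peak. The function names and the rejection sampler are illustrative assumptions.

```python
import math
import random


def peak_density(x: float, rho: float) -> float:
    """Hypothetical test density f(x) = (rho+1)/(2 rho) * (1 - |x|^rho) on [-1, 1].

    Mode theta = 0; f(theta) - f(x) = c |x|^rho with c = (rho+1)/(2 rho),
    so (3) and (7) hold with rho' = rho and c = d.
    """
    if abs(x) > 1.0:
        return 0.0
    return (rho + 1.0) / (2.0 * rho) * (1.0 - abs(x) ** rho)


def sample_peak(n: int, rho: float, rng: random.Random) -> list[float]:
    """Draw n points from peak_density by rejection sampling.

    Proposal: X ~ U[-1, 1]; accept with probability 1 - |X|^rho, which is
    proportional to the target density on [-1, 1].
    """
    out: list[float] = []
    while len(out) < n:
        x = rng.uniform(-1.0, 1.0)
        if rng.random() < 1.0 - abs(x) ** rho:
            out.append(x)
    return out


def check_conditions(rho: float, points: list[float]) -> bool:
    """Check f(0) - f(x) == c |x|^rho at the given points, i.e. (3) and (7) with equality."""
    c = (rho + 1.0) / (2.0 * rho)
    return all(
        math.isclose(peak_density(0.0, rho) - peak_density(x, rho), c * abs(x) ** rho)
        for x in points
    )


if __name__ == "__main__":
    rng = random.Random(0)
    data = sample_peak(1000, rho=0.5, rng=rng)  # a proper peak (rho < 1)
    print(len(data), check_conditions(0.5, [-0.5, 0.1, 0.9]))
```

Samples drawn this way can serve as input for any implementation of the estimator (6) when experimenting with different peak shapes.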
In the next section, we will show for the density estimator (6) that (5), where again $f_n = E\hat f_n$, together with (1), (3) and (7) implies (4) with $\beta_n = U_n + h_n^{\rho'}$. This is the same rate as is obtained in (4) from (2) under the global smoothness assumption that $f$ is uniformly local Lipschitz of order $\rho'$. We will also prove a rate for the estimation of the size $f(\theta)$ of the peak. In Section 3 we compare our results to those available in the literature, and in Section 4 we indicate how to extend our techniques to regression analysis.

2 Main results
For simplicity, we give the results only for fixed, i.e., non-data-driven, bandwidths and compactly supported kernels, but we stress that they remain valid for data-dependent bandwidths and more general kernels. This can be achieved using the techniques described in Romano (1988a), Herrmann (2000) or Ziegler (2002). Our methods also apply to the estimation of modes of derivatives of $f$ using integration by parts as, e.g., in Ziegler (2002). Finally, we remark that extensions to the multivariate case are possible, too.
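The estimator (6) and the mode estimator defined by maximizing it over a neighborhood $I = [a, b]$ can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: we take the Epanechnikov kernel $K(u) = \frac34(1-u^2)$ on $[-1, 1]$ (bounded, symmetric, non-negative, compactly supported and integrating to one, as required below), a fixed bandwidth $h$, and we approximate the maximization over $I$ by a search over an equispaced grid. The grid size `m` and all function names are hypothetical choices. The returned maximal value is the natural estimate of the peak height $f(\theta)$.

```python
import random


def epanechnikov(u: float) -> float:
    """Bounded, symmetric kernel K >= 0 with support [-1, 1] and integral 1."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kde(x: float, data: list[float], h: float) -> float:
    """Rosenblatt-Parzen estimator (6): (1 / (n h)) * sum_i K((x - X_i) / h)."""
    return sum(epanechnikov((x - xi) / h) for xi in data) / (len(data) * h)


def mode_and_height(data: list[float], h: float, a: float, b: float,
                    m: int = 201) -> tuple[float, float]:
    """Grid approximation of the mode estimator on I = [a, b].

    Returns (theta_hat, f_hat(theta_hat)): the maximizing grid point and the
    maximal estimated density value.
    """
    grid = [a + (b - a) * k / (m - 1) for k in range(m)]
    values = [kde(x, data, h) for x in grid]
    best = max(range(m), key=values.__getitem__)
    return grid[best], values[best]


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # true mode 0, height about 0.399
    theta_hat, height_hat = mode_and_height(sample, h=0.8, a=-1.0, b=1.0)
    print(f"theta_hat = {theta_hat:.3f}, f_hat(theta_hat) = {height_hat:.3f}")
```

The grid resolution only needs to be fine relative to the rate one hopes to observe; a continuous optimizer could replace the grid search without changing the estimator's definition.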

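Theorem 1. Let $K \ge 0$ be a bounded and symmetric kernel function with $\int K(u)\,du = 1$ and compact support, $h_n > 0$, $h_n \to 0$ a bandwidth sequence, and $\hat f_n$ the density estimator (6). We assume that there are a neighborhood $I$ of $\theta$ and a numerical sequence $U_n$ such that
$$\sup_{x \in I} |\hat f_n(x) - E\hat f_n(x)| = O(U_n) \quad \text{a.s.} \tag{8}$$
Let $f$ and its mode $\theta$ satisfy (1) for all $x \in I$. We also assume that (3) and (7) hold and that $\hat\theta_n$ is defined by
$$\hat f_n(\hat\theta_n) = \max_{x \in I} \hat f_n(x). \tag{9}$$
Then
$$|\hat\theta_n - \theta| = O\big((U_n + h_n^{\rho'})^{1/\rho}\big) \quad \text{a.s.} \tag{10}$$

Outline of proof. Write again $f_n(x)$ for $E\hat f_n(x)$. In the proof of Theorem 2.1 in Grund and Hall (1995) it has been shown (see also Ziegler, 2002) that for each $\varepsilon > 0$ the inequality $|\hat\theta_n - \theta| > \varepsilon$ implies
$$\sup_{x \in I} |\hat f_n(x) - f_n(x)| \ge \frac12 \Big( f_n(\theta) - \sup_{|x-\theta| > \varepsilon} f_n(x) \Big).$$
Therefore, in order to derive (10) from (8), it suffices to find for each $\eta > 0$ some $\tau > 0$ such that, for $V_n = (U_n + h_n^{\rho'})^{1/\rho}$, it holds that
$$f_n(\theta) - \sup_{|x-\theta| > \tau V_n} f_n(x) \ge \eta\, U_n. \tag{11}$$
Let $x \in I$ with $|x - \theta| > \tau V_n$ be given (with $\tau$ to be specified later). According to (1) and (3), there exists $\delta > 0$ such that
$$f(\theta) - f(x - h_n u) \ge \min\big(c\,|\theta - x + h_n u|^{\rho}, \delta\big) \ge \min\big(c\,|\theta - x|^{\rho}, \delta\big)$$
for either $u > 0$ or $u < 0$, depending on the sign of $\theta - x$. Choose $M > 0$ such that $\operatorname{supp} K \subset [-M, M]$. Then, according to (7),
$$f(\theta - h_n u) - f(\theta) \ge -d\, h_n^{\rho'} |u|^{\rho'}$$
for $n$ large enough and $u \in [-M, M]$. Hence, with
$$\bar d = d \int_{-M}^{M} |u|^{\rho'} K(u)\,du < \infty \quad \text{and} \quad \bar c = c \int_0^M K(u)\,du = c \int_{-M}^0 K(u)\,du = \frac{c}{2},$$
we have for $n$ large enough, by $K \ge 0$, that
$$\begin{aligned} f_n(\theta) - f_n(x) &= \int_{-M}^{M} K(u)\big(f(\theta - h_n u) - f(x - h_n u)\big)\,du \\ &= \int_{-M}^{M} K(u)\big(f(\theta - h_n u) - f(\theta)\big)\,du + \int_{-M}^{M} K(u)\big(f(\theta) - f(x - h_n u)\big)\,du \\ &\ge -\bar d\, h_n^{\rho'} + \bar c\, \tau^{\rho} V_n^{\rho} = -\bar d\, h_n^{\rho'} + \bar c\, \tau^{\rho} h_n^{\rho'} + \bar c\, \tau^{\rho} U_n. \end{aligned}$$
If we now take $\tau$ such that $\bar c\, \tau^{\rho} \ge \max(\bar d, \eta)$, then (11) will be satisfied.

Remarks. (a) By quite the same proof, we see that if (8) holds in probability instead of a.s., the result (10) is also obtained in probability instead of a.s.
(b) The existence of a $\hat\theta_n$ with (9) is, e.g., automatically ensured if $K$ is taken to be continuous, since, due to the compact support, $\hat f_n$ is then also continuous and compactly supported.

Of course, the estimation of the height $f(\theta)$ of the peak is of interest, too. A natural estimator is $\hat f_n(\hat\theta_n) = \sup_{x \in \mathbb{R}} \hat f_n(x)$. See, e.g., Nadaraya (1989) for results under smoothness conditions. A rate for the consistency of $\hat f_n(\hat\theta_n)$ can be derived from (8) and (10) if we replace (7) by a locally uniform version.

Theorem 2. Let the conditions of Theorem 1 be fulfilled. Instead of (7), assume that
$$|f(x) - f(y)| \le d\,|x - y|^{\rho'} \quad \text{for all } x, y \text{ in a neighborhood of } \theta \tag{12}$$
holds for some $\rho', d > 0$. Then
$$|\hat f_n(\hat\theta_n) - f(\theta)| = O\big((U_n + h_n^{\rho'})^{\rho'/\rho}\big) \quad \text{a.s.}$$

Proof. First we note that from (12) it follows that
$$|E\hat f_n(x) - f(x)| = O(h_n^{\rho'}) \quad \text{uniformly in } x \text{ in a neighborhood of } \theta, \tag{13}$$
since
$$|E\hat f_n(x) - f(x)| \le \int_{-M}^{M} K(u)\,|f(x - u h_n) - f(x)|\,du \le d\, h_n^{\rho'} \int_{-M}^{M} |u|^{\rho'} K(u)\,du$$
for $h_n$ small enough, so that $x$ and $x - u h_n$ both lie in the neighborhood where (12) holds whenever $x$ is in a certain smaller neighborhood. Then, from (8), (10), (12) and (13),
$$\begin{aligned} |\hat f_n(\hat\theta_n) - f(\theta)| &= |\hat f_n(\hat\theta_n) - f(\hat\theta_n) + f(\hat\theta_n) - f(\theta)| \\ &= O(U_n + h_n^{\rho'}) + O(|\hat\theta_n - \theta|^{\rho'}) = O\big((U_n + h_n^{\rho'})^{\rho'/\rho}\big), \end{aligned}$$
where we have used $\rho' \le \rho$.

Remarks. (a) Again, we obtain the assertion of Theorem 2 in probability if (8) is assumed to hold in probability.
(b) Condition (12) means that $f$ is locally uniformly Lipschitz of order $\rho'$ in a neighborhood of $\theta$, which is stronger than (7). Indeed, it is known that for estimation of the size of the mode, stronger smoothness assumptions are required than for estimation of the location of the mode; see, e.g., Ziegler (2002).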
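The constants in the outline of proof of Theorem 1 can be made explicit for a concrete kernel. The following Python sketch assumes, purely for illustration, the Epanechnikov kernel $K(u) = \frac34(1-u^2)$ on $[-1, 1]$ (so $M = 1$), for which $\int_{-1}^{1}|u|^{p}K(u)\,du = 3/((p+1)(p+3))$. It computes $\bar c = c/2$, $\bar d = d\int|u|^{\rho'}K(u)\,du$ and the smallest $\tau$ with $\bar c\,\tau^\rho \ge \max(\bar d, \eta)$, and evaluates the lower bound $-\bar d h_n^{\rho'} + \bar c\,\tau^\rho (U_n + h_n^{\rho'})$ for $f_n(\theta) - f_n(x)$. The function names and the numerical values in the usage example are hypothetical.

```python
def epanechnikov_abs_moment(p: float) -> float:
    """Closed form of int_{-1}^{1} |u|^p * 0.75 * (1 - u^2) du, valid for p > -1."""
    return 3.0 / ((p + 1.0) * (p + 3.0))


def proof_constants(c: float, d: float, rho: float, rho_prime: float,
                    eta: float) -> tuple[float, float, float]:
    """Return (c_bar, d_bar, tau) as in the outline of proof of Theorem 1.

    c_bar = c * int_0^1 K = c / 2 because K is symmetric with integral 1;
    d_bar = d * int |u|^rho' K(u) du;
    tau is the smallest value with c_bar * tau**rho >= max(d_bar, eta).
    """
    c_bar = c / 2.0
    d_bar = d * epanechnikov_abs_moment(rho_prime)
    tau = (max(d_bar, eta) / c_bar) ** (1.0 / rho)
    return c_bar, d_bar, tau


def gap_lower_bound(c_bar: float, d_bar: float, tau: float, rho: float,
                    rho_prime: float, h: float, u_n: float) -> float:
    """Lower bound -d_bar h^rho' + c_bar tau^rho V^rho, where V^rho = U_n + h^rho'."""
    v_pow_rho = u_n + h ** rho_prime
    return -d_bar * h ** rho_prime + c_bar * tau ** rho * v_pow_rho


if __name__ == "__main__":
    # Cusp-shaped example: rho = rho' = 1, c = d = 1 (values chosen for illustration).
    c_bar, d_bar, tau = proof_constants(c=1.0, d=1.0, rho=1.0, rho_prime=1.0, eta=2.0)
    for h, u_n in [(0.1, 0.05), (0.01, 0.02)]:
        bound = gap_lower_bound(c_bar, d_bar, tau, 1.0, 1.0, h, u_n)
        print(h, u_n, bound >= 2.0 * u_n)  # inequality (11) with eta = 2
```

Since $\bar c\,\tau^\rho - \bar d \ge 0$, the bias contribution $h_n^{\rho'}$ can only help, and the bound is at least $\eta U_n$ whatever the relative sizes of $h_n^{\rho'}$ and $U_n$.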

As mentioned in the introduction, in the case of i.i.d. observations $X_i$, the a.s. rate $U_n = \sqrt{\log n/(n h_n)}$ for the stochastic part is familiar. Here, the rate of $U_n + h_n^{\rho'}$ becomes best if we choose
$$h_n = O\Big(\Big(\frac{\log n}{n}\Big)^{1/(2\rho'+1)}\Big),$$
which leads to
$$U_n + h_n^{\rho'} = O\Big(\Big(\frac{\log n}{n}\Big)^{\rho'/(2\rho'+1)}\Big).$$
Indeed, for minimization of $U_n + h_n^{\rho'}$ the rates of $U_n$ and $h_n^{\rho'}$ must coincide, whence $h_n^{\rho'} = O\big(\sqrt{\log n/(n h_n)}\big)$, i.e., $h_n^{2\rho'+1} = O(\log n/n)$. Hence we have the following corollary.

Corollary. Let the density estimator be based on i.i.d. samples. Then, under the assumptions of Theorem 1,
$$|\hat\theta_n - \theta| = O\Big(\Big(\frac{\log n}{n}\Big)^{\rho'/((2\rho'+1)\rho)}\Big) \quad \text{a.s.}, \tag{14}$$
while under the assumptions of Theorem 2 it holds that
$$|\hat f_n(\hat\theta_n) - f(\theta)| = O\Big(\Big(\frac{\log n}{n}\Big)^{(\rho')^2/((2\rho'+1)\rho)}\Big) \quad \text{a.s.} \tag{15}$$

Remarks. (a) Assume $\rho' = \rho$, which will cover most situations anyway. Then we see from (14) that the rate of $|\hat\theta_n - \theta|$ is $O\big((\log n/n)^{1/(2\rho+1)}\big) = O(h_n)$, which improves as $\rho$ decreases, i.e., as the peak gets acuter. In contrast, the rate of $|\hat f_n(\hat\theta_n) - f(\theta)|$ is $O\big((\log n/n)^{\rho/(2\rho+1)}\big)$, which worsens as $\rho$ gets smaller. Naturally, the latter is the same rate at which $f$ can be estimated uniformly if it is uniformly local Lipschitz of order $\rho$. This behavior is roughly what one would expect: a high and slim peak is located more exactly by the estimator, while its height is abraded by the smoothing process. Furthermore, it is natural that the bandwidth should be chosen the smaller, the acuter the peak is.
(b) Note that our assumptions in Theorem 1 do not even imply continuity except in $\theta$ itself. Indeed, the case $\rho' < \rho$ may allow $f$ to jump and oscillate quite heavily outside $\theta$. In Theorem 2, condition (12) implies continuity of $f$ in a neighborhood of $\theta$. However, the proof shows that Theorem 2 holds under the weaker condition (13), which might be valid without continuity in special situations.
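The exponent bookkeeping behind the bandwidth choice and the corollary is easy to get wrong, so here is a small Python sketch (function names hypothetical) that returns, with exact rational arithmetic, the exponents $e$ in $O((\log n/n)^{e})$ for $h_n$, $U_n + h_n^{\rho'}$, (14) and (15). It also checks numerically that $h_n = (\log n/n)^{1/(2\rho'+1)}$ makes $U_n = \sqrt{\log n/(nh_n)}$ equal to $h_n^{\rho'}$. Constants are ignored, as in the $O$-statements.

```python
import math
from fractions import Fraction


def corollary_exponents(rho: Fraction, rho_prime: Fraction) -> dict[str, Fraction]:
    """Exponents e such that each quantity is O((log n / n)^e), following (14) and (15)."""
    if rho_prime > rho:
        raise ValueError("conditions (3) and (7) together force rho' <= rho")
    s = 2 * rho_prime + 1
    return {
        "bandwidth": 1 / s,                     # h_n
        "stochastic_plus_bias": rho_prime / s,  # U_n + h_n^rho'
        "location": rho_prime / (s * rho),      # (14)
        "height": rho_prime ** 2 / (s * rho),   # (15)
    }


def balanced_bandwidth(n: int, rho_prime: float) -> float:
    """h_n = (log n / n)^(1 / (2 rho' + 1))."""
    return (math.log(n) / n) ** (1.0 / (2.0 * rho_prime + 1.0))


if __name__ == "__main__":
    for rho in (Fraction(2), Fraction(1), Fraction(1, 2)):
        e = corollary_exponents(rho, rho)
        print(rho, e["location"], e["height"])
    n, rp = 10**6, 2.0
    h = balanced_bandwidth(n, rp)
    u_n = math.sqrt(math.log(n) / (n * h))
    print(math.isclose(u_n, h ** rp))
```

For $\rho = \rho' = 2$, $1$ and $1/2$ this gives location exponents $1/5$, $1/3$ and $1/2$ and height exponents $2/5$, $1/3$ and $1/4$, in line with Remark (a): the location rate improves and the height rate worsens as the peak gets acuter.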

3 Brief discussion
(a) In the twice differentiable case $\rho = \rho' = 2$, our rates coincide with the known ones in Müller (1985; see also Vieu, 1996). However, our results are still a slight improvement in this case, since we need to impose differentiability only locally in a small neighborhood of $\theta$. On the other hand, under some additional requirements, the rate has been slightly improved by Leclerc and Pierre-Loti-Viaud (2000). In the degenerate case, i.e., $f''(\theta) = 0$ with some higher differentiability, the exact rate has recently been determined by Mokkadem and Pelletier (2003). In the case $\rho < 2$, no results seem to have been available so far. This has also been pointed out by Abraham et al. (2003), p. 7. See, however, Ehm (1996), where $f$ is assumed to be $C^p$-smooth, $p \ge 2$, except at $\theta$ itself, where it has a kink ($\rho = \rho' = 1$). In this case, our rate (14), i.e., $O\big((\log n/n)^{1/3}\big)$, can be improved by the very sophisticated construction of another estimator for $\theta$.
(b) In Abraham et al. (2003), for computational reasons, a different estimator is considered which maximizes $\hat f_n$ only over the values of $X_1, \dots, X_n$ (instead of maximizing over $x \in \mathbb{R}$ or an interval, as the classical mode estimator does). The authors compare the performance of their estimator to that of the classical one in the smooth case and state that it would be desirable to do the same in the non-smooth case. Now that we have results for the classical estimator in non-smooth cases, such a comparison is still difficult, since the conditions given in Abraham et al. (2003) do not directly correspond to ours. However, their $\beta$ clearly equals our $\rho'$, since they employ our condition (13), while their $\alpha$ should correspond to our $1/\rho$. Furthermore, we are in the univariate case $d = 1$. With i.i.d. observations available, the rate obtained from (14) is
$$O\Big(\Big(\frac{\log n}{n}\Big)^{\rho'/((2\rho'+1)\rho)}\Big) = O\Big(\Big(\frac{\log n}{n}\Big)^{\alpha\beta/(2\beta+d)}\Big).$$
The rate given in Cor. 2.1 of Abraham et al. (2003) is $O\big((\log n)^{2/(2\beta+d)}\, n^{-\alpha\beta/(2\beta+d)}\big)$. Both rates share the power $n^{-\alpha\beta/(2\beta+d)}$, and since $\alpha\beta \le 1 < 2$, the power $\alpha\beta/(2\beta+d)$ of $\log n$ in our rate is smaller than the power $2/(2\beta+d)$ in theirs, so our rate is slightly better. Hence, even in the non-smooth case, the classical estimator still seems to perform slightly better than the computationally advantageous one. However, we do not know whether either of these rates can still be improved.
(c) Note that our assumptions, even in the case $\rho = \rho'$, do not imply any local symmetry of $f$ around $\theta$. The situation changes dramatically as soon as we want to construct a confidence interval for $\theta$. For asymptotic normality of $\hat\theta_n - \theta$, the conditions $f'(\theta-0) = -f'(\theta+0)$ and $f''(\theta-0) = f''(\theta+0)$, i.e., local symmetry of $f$ around $\theta$ up to order 2, seem to be crucial. This will be shown in a forthcoming paper of the second author. However, even in non-symmetric situations of this kind, the mode can still be estimated asymptotically normally using a different estimator, whose construction is described in Ehm (1996).
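The difference between the classical estimator and the sample-point variant discussed in (b) lies only in the set over which $\hat f_n$ is maximized. The Python sketch below is a hypothetical illustration, not the code of either paper. It assumes the Epanechnikov kernel, a fixed bandwidth and, for the classical estimator, an equispaced grid on an interval $[a, b]$ as a stand-in for maximizing over an interval. The sample-point version needs no grid, but each of its $n$ candidate evaluations costs $O(n)$ kernel evaluations.

```python
import random


def epanechnikov(u: float) -> float:
    """Bounded, symmetric, compactly supported kernel with integral 1."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kde(x: float, data: list[float], h: float) -> float:
    """Kernel density estimator (6)."""
    return sum(epanechnikov((x - xi) / h) for xi in data) / (len(data) * h)


def classical_mode(data: list[float], h: float, a: float, b: float, m: int = 201) -> float:
    """Maximize the KDE over an equispaced grid on [a, b] (grid stand-in for an interval)."""
    grid = [a + (b - a) * k / (m - 1) for k in range(m)]
    return max(grid, key=lambda x: kde(x, data, h))


def sample_point_mode(data: list[float], h: float) -> float:
    """Maximize the KDE only over the observations X_1, ..., X_n, as described in (b)."""
    return max(data, key=lambda x: kde(x, data, h))


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.gauss(0.0, 1.0) for _ in range(500)]  # true mode 0
    print(classical_mode(sample, 0.8, -2.0, 2.0), sample_point_mode(sample, 0.8))
```

In one dimension the grid search is cheap; the computational advantage of restricting to sample points mainly matters in the multivariate setting, where grids grow exponentially with the dimension.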

4 Extensions to regression functions
Now we turn to regression analysis. For fixed design regression $Y_i = f(x_i) + \varepsilon_i$ with design points $x_1, \dots, x_n \in I = [a, b]$ satisfying $|x_i - x_{i-1} - \frac1n| = o(\frac1n)$ uniformly in $2 \le i \le n$, and i.i.d. error variables with zero mean and variance $E\varepsilon_i^2 = \sigma^2 < \infty$, the Gasser-Müller estimator
$$\hat f_n(x) = \frac{1}{h_n} \sum_{i=1}^{n} Y_i \int_{s_{i-1}}^{s_i} K\Big(\frac{x - u}{h_n}\Big)\,du$$
(with $s_{i-1} = \frac12(x_i + x_{i-1})$, $i = 2, \dots, n$, $s_0 = a$, $s_n = b$) is known to fulfill
$$E\hat f_n(x) = \int K(u)\, f(x - h_n u)\,du + O\Big(\frac{1}{n h_n}\Big)$$
(Müller, 1985). Therefore, our results carry over to this case quite straightforwardly as long as $1/(n h_n)$ is of smaller order than both $U_n$ and $h_n^{\rho'}$, which is the typical case.

In the case of random design, with $f(x) = E(Y \mid X = x)$ to be estimated via the Nadaraya-Watson estimator
$$\hat f_n(x) = \frac{\hat r_n(x)}{\hat g_n(x)}, \qquad \hat r_n(x) = \frac{1}{n h_n} \sum_{i=1}^{n} Y_i\, K\Big(\frac{x - X_i}{h_n}\Big), \qquad \hat g_n(x) = \frac{1}{n h_n} \sum_{i=1}^{n} K\Big(\frac{x - X_i}{h_n}\Big),$$
the quantity to which $\hat f_n$ is compared in the stochastic part (5) is $f_n(x) = E\hat r_n(x)/E\hat g_n(x)$ rather than $E\hat f_n(x)$. Note, however, that the asymptotic equivalence of the two quantities is shown in Ziegler (2001). For estimation from i.i.d. pairs of observations, (5) can still be proven with $U_n = \sqrt{\log n/(n h_n)}$ (see Härdle et al., 1988). For $U_n$ in the case of dependent observations, see Györfi et al. (1989) or Liebscher (1998), among others.

To prove an analogue of our theorem for the Nadaraya-Watson estimator, one has to find an appropriate lower bound for
$$f_n(\theta) - f_n(x) = \frac{E\hat r_n(\theta)}{E\hat g_n(\theta)} - \frac{E\hat r_n(x)}{E\hat g_n(x)}.$$
If $X$ has a design density $g$ bounded away from zero and infinity on $I$, i.e., $0 < C_1 \le g(x) \le C_2 < \infty$ for $x \in I$, then, for $n$ large enough,
$$\begin{aligned} f_n(\theta) - f_n(x) &= \frac{\int K(u) f(\theta - u h_n) g(\theta - u h_n)\,du}{\int K(u) g(\theta - u h_n)\,du} - \frac{\int K(u) f(x - u h_n) g(x - u h_n)\,du}{\int K(u) g(x - u h_n)\,du} \\ &= \frac{\int K(u)\big(f(\theta - u h_n) - f(\theta)\big) g(\theta - u h_n)\,du}{\int K(u) g(\theta - u h_n)\,du} + \frac{\int K(u)\big(f(\theta) - f(x - u h_n)\big) g(x - u h_n)\,du}{\int K(u) g(x - u h_n)\,du} \\ &\ge \frac{C_2}{C_1} \int K(u)\big(f(\theta - u h_n) - f(\theta)\big)\,du + \frac{C_1}{C_2} \int K(u)\big(f(\theta) - f(x - u h_n)\big)\,du, \end{aligned}$$
where the inequality uses $\int K(u)\,du = 1$ and the fact that, by (1), $f \le f(\theta)$ at all arguments involved. From now on one may proceed as in the proof of Theorem 1. Further extensions to local polynomial smoothers are possible in a similar way. Finally, we remark that our method should carry over to the estimation of points $\theta$ with $f^{(p)}(\theta) = b$, as mentioned in the introduction, by modifying assumptions (3) and (7) appropriately. This will be shown in a forthcoming paper of the second author.

References
Abraham, C., Biau, G. and Cadre, B. (2003). Simple estimation of the mode of a multivariate density. Preprint.
Boularan, J., Ferré, L. and Vieu, P. (1995). Location of particular points in non-parametric regression analysis. Aust. J. Stat. 37, 161-168.
Eddy, W. (1980). Optimal kernel estimators of the mode. Ann. Statist. 8, 870-882.
Eddy, W. (1982). The asymptotic distributions of kernel estimators of the mode. Z. Wahrsch. Verw. Gebiete 59, 279-290.
Ehm, W. (1996). Adaptive kernel estimation of a cusp-shaped mode. In: Fischer, Herbert (ed.) et al.: Applied Mathematics and Parallel Computing. Festschrift for Klaus Ritter, 109-120, Physica-Verlag, Heidelberg.
Einmahl, U. and Mason, D.M. (2000). An empirical process approach to the uniform consistency of kernel-type function estimators. J. Theoret. Probab. 13, 1-37.
Grund, B. and Hall, P. (1995). On the minimisation of $L^p$ error in mode estimation. Ann. Statist. 23, 2264-2284.
Györfi, L., Härdle, W., Sarda, P. and Vieu, P. (1989). Nonparametric Estimation from Time Series. Lecture Notes in Statistics 60, Springer-Verlag.
Härdle, W., Janssen, P. and Serfling, R. (1988). Strong uniform consistency rates for estimators of conditional functionals. Ann. Statist. 16, 1428-1449.
Herrmann, E. (2000). Data adaptive kernel regression estimation. Habilitationsschrift, Darmstadt Technical University.
Koshkin, G.M. and Vasil'iev, V.A. (1998). Nonparametric estimation of derivatives of a multivariate density from dependent observations. Mathematical Methods Statist. 7, 361-400.
Leclerc, J. and Pierre-Loti-Viaud, D. (2000). Vitesse de convergence presque sûre de l'estimateur à noyau du mode. Comptes Rendus de l'Académie des Sciences de Paris 331, 637-640.
Liebscher, E. (2001). Estimation of the density and the regression function under mixing condition. Statistics and Decisions 19, 9-26.
Mokkadem, A. and Pelletier, M. (2003). The law of the iterated logarithm for the multivariate kernel mode estimator. ESAIM: Probability and Statistics 7, 1-21.
Müller, H.-G. (1985). Kernel estimators of zeros and of location and size of extrema of regression functions. Scand. J. Statist. 12, 221-232.
Müller, H.-G. (1989). Adaptive nonparametric peak estimation. Ann. Statist. 17, 1053-1069.
Nadaraya, E.A. (1989). Nonparametric Estimation of Probability Densities and Regression Curves. Kluwer Academic Publishers, Dordrecht.
Parzen, E. (1962). On estimation of a probability density function and mode. Ann. Math. Statist. 33, 1065-1076.
Romano, J.P. (1988a). On weak convergence and optimality of kernel density estimates of the mode. Ann. Statist. 16, 629-647.
Romano, J.P. (1988b). Bootstrapping the mode. Ann. Inst. Statist. Math. 40, 565-586.
Rüschendorf, L. (1977). Consistency of estimators for multivariate density functions and for the mode. Sankhyā Ser. A 39, 243-250.
Vieu, P. (1996). A note on density mode estimation. Statistics and Probability Letters 26, 297-307.
Ziegler, K. (2001). On approximations to the bias of the Nadaraya-Watson regression estimator. J. Nonparametric Statistics 13, 583-589.
Ziegler, K. (2002). On nonparametric kernel estimation of the mode of the regression function in the random design model. J. Nonparametric Statistics 14, 749-774.