Technische Universität Ilmenau, Institut für Mathematik
Preprint No. M 03/14
Rates of consistency for nonparametric estimation of the mode in absence of smoothness assumptions
Herrmann, Eva; Ziegler, Klaus
2003
Imprint: Published by the Head of the Institut für Mathematik, Weimarer Straße 25, 98693 Ilmenau. Tel.: +49 3677 69 3621, Fax: +49 3677 69 3270, http://www.tu-ilmenau.de/ifm/, ISSN xxxx-xxxx
Rates of consistency for nonparametric estimation of the mode in absence of smoothness assumptions

Eva Herrmann and Klaus Ziegler
Technical University of Darmstadt and Technical University of Ilmenau

Abstract
Nonparametric estimation of the mode of a density or regression function via kernel methods is considered. It is shown that the rate of consistency of the mode estimator can be determined without the typical smoothness conditions. Only the uniform rate of the so-called stochastic part of the problem, together with some mild conditions characterizing the shape or acuteness of the mode, influences the rate of the mode estimator. In particular, outside the location of the mode, our assumptions do not even imply continuity. Overall, it turns out that the location of the mode can be estimated at a rate that is the better the peakier (and hence non-smooth) the mode is, while the contrary holds for estimation of the size of the mode.

AMS subject classification: 62G05, 62G07
Key words and phrases: nonparametric curve estimation, mode, kernel smoothing, rates of consistency, non-smooth curves

1 Introduction and assumptions

An important problem in nonparametric curve estimation consists in estimation of the mode, i.e., the location of an isolated maximum of the unknown density or regression function. A number of distinguished papers deal with this topic. There are, among others, Parzen (1962), Rüschendorf (1977), Eddy (1980, 1982), Müller (1985, 1989), Romano (1988a,b), Grund and Hall (1995), Ehm (1996), and, most recently, Mokkadem and Pelletier (2003) as well as Abraham, Biau and Cadre (2003). In the following, we will restrict ourselves to the univariate situation, but extensions to the multivariate case are possible.

The classical approach is as follows. Let $f$ be the unknown real-valued curve and $\theta$ the mode of $f$, i.e.
\[ f(\theta) > \sup_{|x-\theta|>\varepsilon} f(x) \quad \text{for each } \varepsilon > 0, \tag{1} \]
which means that $\theta$ is the location of the unique global maximum of $f$. Then $\theta$ is estimated from the location $\hat\theta_n$ of a maximum of a curve estimator $\hat f_n$ for $f$.
Uniqueness of the maximum or even a condition like (1) for $\hat f_n$ cannot be expected here, but, in general, this does not affect the validity of asymptotic theory. It is well-known that uniform consistency of $\hat f_n$ for $f$ is sufficient to ensure consistency of $\hat\theta_n$ for $\theta$ (see, e.g., Parzen, 1962; Rüschendorf, 1977; Nadaraya, 1989).
To obtain rates, however, one has to know some more about the local geometry of $f$ around $\theta$. Müller (1985) showed, in essence, that if the uniform consistency of $\hat f_n$ is of order $\beta_n$, i.e.
\[ \sup_x |\hat f_n(x) - f(x)| = O(\beta_n) \tag{2} \]
(in probability or a.s.), and if there are $\rho > 0$, $c > 0$ such that
\[ f(\theta) - f(x) \ge c\,|x-\theta|^{\rho} \quad \text{in a neighborhood of } \theta, \tag{3} \]
then
\[ \hat\theta_n - \theta = O(\beta_n^{1/\rho}) \tag{4} \]
in probability or a.s., depending on what holds in (2). A similar concept was used by Boularan et al. (1995) for estimation of points $\theta$ with $f^{(p)}(\theta) = b$, i.e. where some $p$-th derivative of the curve takes a given value $b$.

So, (2) seems to be crucial for (4). But in order to obtain (2), global smoothness conditions have to be imposed on $f$. Most authors, among them Müller (1985), assume $f$ to be twice continuously differentiable. Differentiability, however, excludes the case $\rho \le 1$ in (3), (4), and hence his results do not apply to cusp-shaped modes ($\rho = 1$, see Ehm, 1996, where, however, $C^p$-smoothness, $p \ge 2$, of $f$ is assumed outside of $\theta$) or even proper peaks ($\rho < 1$). In fact, Müller's conditions tacitly even imply $\rho \ge 2$, and the case $\rho < 2$ does not seem to have been considered explicitly so far. It should be mentioned here that Härdle et al. (1988) require only some uniform local Lipschitz condition instead of differentiability in order to obtain (2), but this is still a global smoothness condition.

Results of type (2) are always proven by splitting into a stochastic part $\hat f_n(x) - \bar f_n(x)$ and a deterministic (analytic) part $\bar f_n(x) - f(x)$, where in density estimation $\bar f_n(x) = E\hat f_n(x)$ and the analytic part is simply the bias. A look at the proofs available in the literature reveals that smoothness conditions are needed exclusively for handling the analytic part, whereas a rate of the stochastic part
\[ \sup_x |\hat f_n(x) - \bar f_n(x)| = O(U_n) \tag{5} \]
(a.s. or in probability) can be determined without imposing any smoothness conditions. For example, if $\hat f_n$ is the Rosenblatt-Parzen kernel density estimator
\[ \hat f_n(x) = \frac{1}{n h_n} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h_n}\right) \tag{6} \]
based on i.i.d. observations $X_i$ having density $f$, (5) can be established with a.s.
rate $U_n = \sqrt{\log n/(n h_n)}$ under some regularity conditions on the kernel $K$ and the bandwidth $h_n$. An analogous result holds for regression estimators, see, e.g., Härdle et al. (1988), and can also be derived
from the results contained in Ziegler (2002). Similar results with somewhat worse rates are available in the case of dependent observations and in the multivariate case (see Györfi et al., 1989; Koshkin and Vasil'iev, 1998; Liebscher, 2001).

The aim of the present paper is to show that the knowledge of the rate $U_n$ of the stochastic part in (5), together with some information about the shape or acuteness of the peak, already suffices to prove a result like (4), with no smoothness conditions being required at all. Apart from $\theta$, continuity of $f$ will at most be required for (5), which is sometimes the case due to technical reasons, see e.g. Einmahl and Mason (2000), where even the constant in the $O$-term is determined.

In the sequel, we consider the slightly more general case of $\theta$ being the location of a local maximum of $f$, with (1) holding for $x$ in some neighborhood $I$ of $\theta$. In part, the local shape of the peak is characterized by (3) (holding for $x$ in some possibly smaller neighborhood $J \subset I$), but this gives only an upper bound for $f$ around $\theta$. A lower bound will be needed in addition. Therefore, we introduce the further condition that there are $\tilde\rho > 0$, $d > 0$ such that
\[ f(\theta) - f(x) \le d\,|x-\theta|^{\tilde\rho} \quad \text{in a neighborhood of } \theta. \tag{7} \]
Note that (3) together with (7) implies $\tilde\rho \le \rho$. Indeed, $\tilde\rho > \rho$ would imply $d\,|x-\theta|^{\tilde\rho} < c\,|x-\theta|^{\rho}$ for $|x-\theta|$ small enough, contradicting (3) and (7).

If, e.g., $f$ is twice continuously differentiable in a neighborhood of $\theta$, then, by Taylor's theorem and $f'(\theta) = 0$, the choice $\rho = \tilde\rho = 2$ is possible. Even in this case, our result still improves on known ones, because this is only a local smoothness assumption while (2) always requires global smoothness. If $f$ has a cusp-shaped mode at $\theta$, i.e. the one-sided derivatives exist in $\theta$ with $f'(\theta-0) > 0$, $f'(\theta+0) < 0$, and if $f$ is continuously differentiable in left and right neighborhoods of $\theta$, then $\rho = \tilde\rho = 1$. For $\rho < 1$ there is necessarily $f'(\theta-0) = \infty$, $f'(\theta+0) = -\infty$. We will return to a discussion of these special cases in the remark after the corollary below and in Section 3.
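The three shapes just mentioned (smooth, cusp, proper peak) can be checked numerically against the two-sided bounds (3) and (7). The following is only a sketch under stated assumptions: the function name `shape_bounds_hold`, the test curves, the constants and the grid radius are all hypothetical choices made for illustration, and a check on a finite grid is a sanity test, not a proof that the conditions hold.

```python
import numpy as np

def shape_bounds_hold(f, theta, c, rho, d, rho_tilde, radius=0.1, num=2001):
    """Grid check of c|x-theta|^rho <= f(theta)-f(x) <= d|x-theta|^rho_tilde.

    Hypothetical helper: it only inspects a finite grid around theta.
    """
    x = np.linspace(theta - radius, theta + radius, num)
    drop = f(theta) - f(x)            # f(theta) - f(x), nonnegative near a maximum
    dist = np.abs(x - theta)
    tol = 1e-12                       # guards against floating-point rounding
    lower_ok = np.all(drop >= c * dist**rho - tol)         # condition (3)
    upper_ok = np.all(drop <= d * dist**rho_tilde + tol)   # condition (7)
    return bool(lower_ok and upper_ok)

# Illustrative curves with a maximum at theta = 0 (assumed examples).
smooth = lambda x: 1.0 - x**2               # rho = rho_tilde = 2
cusp = lambda x: 1.0 - np.abs(x)            # rho = rho_tilde = 1
peak = lambda x: 1.0 - np.sqrt(np.abs(x))   # rho = rho_tilde = 1/2

print(shape_bounds_hold(smooth, 0.0, 1.0, 2.0, 1.0, 2.0))
print(shape_bounds_hold(cusp, 0.0, 1.0, 1.0, 1.0, 1.0))
print(shape_bounds_hold(peak, 0.0, 1.0, 0.5, 1.0, 0.5))
```

Calling the helper on the smooth curve with $\rho = 1$ fails the lower bound, since $x^2 < |x|$ near zero; this is the sense in which differentiability excludes $\rho \le 1$.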
In the next section, we will show for the density estimator (6) that (5), where again $\bar f_n = E\hat f_n$, together with (1), (3) and (7) implies (4) with $\beta_n = U_n + h_n^{\tilde\rho}$. This is the same rate as is obtained in (4) from (2) under the global smoothness assumption that $f$ is uniform local Lipschitz of order $\tilde\rho$. We will also prove a rate for the estimation of the size $f(\theta)$ of the peak. In Section 3 we compare our results to those available in the literature, and in Section 4 we indicate how to extend our techniques to regression analysis.

2 Main results

For simplicity, we give the results only for fixed, i.e., non-data-driven bandwidths and compactly supported kernels, but we stress that they remain valid for data-dependent bandwidths and more general kernels. This can be achieved using the techniques described in Romano (1988a), Herrmann (2000) or Ziegler (2002). Our methods also apply to the estimation of modes of derivatives of $f$ using integration by parts as, e.g., in Ziegler (2002). And finally we remark that extensions to the multivariate case are possible, too.
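Theorem 1 Let $K \ge 0$ be a bounded and symmetric kernel function with $\int K(u)\,du = 1$ and compact support, $h_n > 0$, $h_n \to 0$ a bandwidth sequence, and $\hat f_n$ the density estimator (6). We assume that there are a neighborhood $I$ of $\theta$ and a numerical sequence $U_n$ such that
\[ \sup_{x \in I} |\hat f_n(x) - E\hat f_n(x)| = O(U_n) \quad \text{a.s.} \tag{8} \]
Let $f$ and its mode $\theta$ satisfy (1) for all $x \in I$. We also assume that (3), (7) hold and $\hat\theta_n$ is defined by
\[ \hat f_n(\hat\theta_n) = \max_{x \in I} \hat f_n(x). \tag{9} \]
Then
\[ \hat\theta_n - \theta = O\big((U_n + h_n^{\tilde\rho})^{1/\rho}\big) \quad \text{a.s.} \tag{10} \]

Outline of proof Write again $\bar f_n(x)$ for $E\hat f_n(x)$. In the proof of Theorem 2.1 in Grund and Hall (1995) it has been shown (see also Ziegler, 2002) that for each $\varepsilon > 0$ the inequality $|\hat\theta_n - \theta| > \varepsilon$ implies
\[ \sup_{x \in I} |\hat f_n(x) - \bar f_n(x)| \ge \frac{1}{2}\Big(\bar f_n(\theta) - \sup_{|x-\theta|>\varepsilon} \bar f_n(x)\Big). \]
Therefore, in order to derive (10) from (8), it suffices to find for each $\eta > 0$ some $\tau > 0$ such that, for $V_n = (U_n + h_n^{\tilde\rho})^{1/\rho}$, it holds that
\[ \bar f_n(\theta) - \sup_{|x-\theta|>\tau V_n} \bar f_n(x) \ge \eta\, U_n. \tag{11} \]
Let $x \in I$ with $|x-\theta| > \tau V_n$ be given (with $\tau$ to be specified later). According to (1) and (3), there exists $\delta > 0$ such that
\[ f(\theta) - f(x - h_n u) \ge \min(c\,|\theta - x + h_n u|^{\rho}, \delta) \ge \min(c\,|\theta - x|^{\rho}, \delta) \]
for either $u > 0$ or $u < 0$, depending on the sign of $\theta - x$. Choose $M > 0$ such that $\operatorname{supp} K \subset [-M, M]$. Then, according to (7),
\[ f(\theta - h_n u) - f(\theta) \ge -d\, h_n^{\tilde\rho} |u|^{\tilde\rho} \]
for $n$ large enough and $u \in [-M, M]$. Hence, with
\[ \tilde d = d \int_{-M}^{M} |u|^{\tilde\rho} K(u)\,du < \infty \quad \text{and} \quad \tilde c = c \int_{-M}^{0} K(u)\,du = c \int_{0}^{M} K(u)\,du = \frac{1}{2}\, c \]
we have for $n$ large enough, by $K \ge 0$, that
\[
\begin{aligned}
\bar f_n(\theta) - \bar f_n(x) &= \int_{-M}^{M} K(u)\big(f(\theta - h_n u) - f(x - h_n u)\big)\,du \\
&= \int_{-M}^{M} K(u)\big(f(\theta - h_n u) - f(\theta)\big)\,du + \int_{-M}^{M} K(u)\big(f(\theta) - f(x - h_n u)\big)\,du \\
&\ge -\tilde d\, h_n^{\tilde\rho} + \tilde c\, \tau^{\rho} V_n^{\rho} \\
&= -\tilde d\, h_n^{\tilde\rho} + \tilde c\, \tau^{\rho} h_n^{\tilde\rho} + \tilde c\, \tau^{\rho} U_n .
\end{aligned}
\]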
If we now take $\tau$ such that $\tilde c\,\tau^{\rho} \ge \max(\tilde d, \eta)$, then (11) will be satisfied.

Remarks
(a) By quite the same proof, we see that if (8) holds in probability instead of a.s., the result (10) is also obtained in probability instead of a.s.
(b) The existence of a $\hat\theta_n$ with (9) is, e.g., automatically ensured if $K$ is taken to be continuous, since, due to the compact support, $\hat f_n$ is then also continuous and compactly supported.

Of course, the estimation of the height $f(\theta)$ of the peak is of interest, too. A natural estimator is $\hat f_n(\hat\theta_n) = \sup_{x \in \mathbb{R}} \hat f_n(x)$. See, e.g., Nadaraya (1989) for results under smoothness conditions. A rate for the consistency of $\hat f_n(\hat\theta_n)$ can be derived from (8) and (10) if we replace (7) by a locally uniform version.

Theorem 2 Let the conditions of Theorem 1 be fulfilled. Instead of (7), assume that
\[ |f(x) - f(y)| \le d\,|x-y|^{\tilde\rho} \quad \forall\, x, y \text{ in a neighborhood of } \theta \tag{12} \]
holds for some $\tilde\rho, d > 0$. Then
\[ \hat f_n(\hat\theta_n) - f(\theta) = O\big((U_n + h_n^{\tilde\rho})^{\tilde\rho/\rho}\big) \quad \text{a.s.} \]

Proof First we note that from (12) it follows that
\[ E\hat f_n(x) - f(x) = O(h_n^{\tilde\rho}) \quad \text{uniformly in } x \text{ in a neighborhood of } \theta, \tag{13} \]
since
\[ |E\hat f_n(x) - f(x)| \le \int_{-M}^{M} K(u)\,|f(x - u h_n) - f(x)|\,du \le d\, h_n^{\tilde\rho} \int_{-M}^{M} |u|^{\tilde\rho} K(u)\,du \]
for $h_n$ small enough so that $x$ and $x - u h_n$ are both in the neighborhood where (12) holds if $x$ is in a certain smaller neighborhood. Then, from (8), (10), (12) and (13),
\[
\begin{aligned}
\hat f_n(\hat\theta_n) - f(\theta) &= \hat f_n(\hat\theta_n) - f(\hat\theta_n) + f(\hat\theta_n) - f(\theta) \\
&= O(U_n + h_n^{\tilde\rho}) + O(|\hat\theta_n - \theta|^{\tilde\rho}) \\
&= O\big((U_n + h_n^{\tilde\rho})^{\tilde\rho/\rho}\big),
\end{aligned}
\]
where we have used $\tilde\rho \le \rho$.

Remarks
(a) Again we obtain the assertion of Theorem 2 in probability if (8) is assumed to hold in probability.
(b) The condition (12) means that $f$ is locally uniform Lipschitz of order $\tilde\rho$ in a neighborhood of $\theta$, which is stronger than (7). Indeed, it is known that for estimation of the size of the mode, stronger smoothness assumptions are required than for estimation of the location of the mode. See, e.g., Ziegler (2002).
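To make the two estimators of Theorems 1 and 2 concrete, here is a minimal Python sketch of the kernel density estimator (6) together with the location $\hat\theta_n$ from (9) and the height $\hat f_n(\hat\theta_n)$. Several things are assumptions for illustration only and do not come from the paper: the Epanechnikov kernel (one admissible bounded, symmetric, compactly supported choice), the function names, the Laplace sample (whose density has a cusp-shaped mode at 0 with height 0.5), the bandwidth, and the fact that the maximum over $I$ is approximated on a finite grid.

```python
import numpy as np

def epanechnikov(u):
    """Bounded, symmetric, nonnegative kernel supported on [-1, 1]; integrates to 1."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kde(x, sample, h):
    """Rosenblatt-Parzen estimate (6), evaluated at every point of x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = (x[:, None] - sample[None, :]) / h      # shape (len(x), n)
    return epanechnikov(u).sum(axis=1) / (sample.size * h)

def mode_estimate(sample, h, lo, hi, num=2001):
    """Grid approximation of (9): maximise the estimate over I = [lo, hi].

    Returns the estimated location and the estimated height of the mode.
    """
    grid = np.linspace(lo, hi, num)
    values = kde(grid, sample, h)
    k = int(np.argmax(values))                  # first maximiser if there are ties
    return grid[k], values[k]

# Illustrative data (an assumption): Laplace density 0.5*exp(-|x|), mode at 0.
rng = np.random.default_rng(0)
sample = rng.laplace(loc=0.0, scale=1.0, size=2000)
theta_hat, height_hat = mode_estimate(sample, h=0.2, lo=-2.0, hi=2.0)
print(theta_hat, height_hat)
```

Because the smoothing averages the density over a window of width $2h_n$, the estimated height in such an experiment tends to fall somewhat below the true peak height, which is the qualitative effect behind the different rates for location and size.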
As we have mentioned in the introduction, in case of i.i.d. observations $X_i$, the rate $U_n = \sqrt{\log n/(n h_n)}$ being attained a.s. for the stochastic part is familiar. Here, the rate of $U_n + h_n^{\tilde\rho}$ becomes best if we choose
\[ h_n = O\Big(\Big(\frac{\log n}{n}\Big)^{\frac{1}{2\tilde\rho+1}}\Big), \]
which leads to
\[ U_n + h_n^{\tilde\rho} = O\Big(\Big(\frac{\log n}{n}\Big)^{\frac{\tilde\rho}{2\tilde\rho+1}}\Big). \]
Indeed, for minimization of $U_n + h_n^{\tilde\rho}$ the rates of $U_n$ and $h_n^{\tilde\rho}$ must coincide, whence $h_n^{\tilde\rho} = O\big(\sqrt{\log n/(n h_n)}\big)$. Hence we have the following corollary:

Corollary Let the density estimator be based on i.i.d. samples. Then, we have under the assumptions of Theorem 1
\[ \hat\theta_n - \theta = O\Big(\Big(\frac{\log n}{n}\Big)^{\frac{\tilde\rho}{(2\tilde\rho+1)\rho}}\Big) \tag{14} \]
(a.s.), while under the assumptions of Theorem 2 it holds that
\[ \hat f_n(\hat\theta_n) - f(\theta) = O\Big(\Big(\frac{\log n}{n}\Big)^{\frac{\tilde\rho^{2}}{(2\tilde\rho+1)\rho}}\Big) \tag{15} \]
(a.s.).

Remarks
(a) Assume $\rho = \tilde\rho$, which will cover most situations anyway. Then, we see from (14) that the rate of $\hat\theta_n - \theta$ is $O\big((\log n/n)^{1/(2\rho+1)}\big) = O(h_n)$, which improves as $\rho$ decreases, i.e., as the peak gets acuter. Instead, the rate of $\hat f_n(\hat\theta_n) - f(\theta)$ is $O\big((\log n/n)^{\rho/(2\rho+1)}\big)$, which worsens with $\rho$ getting smaller. Naturally, the latter is the same rate at which $f$ can be estimated uniformly if it is uniformly local Lipschitz of order $\rho$. This behavior is roughly what we should have expected before. A high and slim peak is met more exactly by the estimator, while its height will be abraded by the smoothing process. Furthermore, it is natural that the bandwidth should be chosen the smaller the acuter the peak is.
(b) Note that our assumptions in Theorem 1 do not even imply continuity except in $\theta$ itself. Indeed, the case $\tilde\rho < \rho$ may allow $f$ to jump and oscillate quite heavily outside $\theta$. In Theorem 2, the condition (12) implies continuity of $f$ in a neighborhood of $\theta$. However, the proof shows that Theorem 2 holds under the weaker condition (13), which might be valid without continuity in special situations.
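The arithmetic behind the corollary and remark (a) can be tabulated with a few lines of Python. This is a sketch only: the function names are hypothetical, all constants hidden in the $O$-terms are ignored, and the outputs are exponents of $\log n/n$ rather than actual error sizes.

```python
import math

def optimal_bandwidth(n, rho_tilde):
    """Bandwidth order (log n / n)^(1/(2*rho_tilde + 1)), constants ignored."""
    return (math.log(n) / n) ** (1.0 / (2.0 * rho_tilde + 1.0))

def rate_exponents(rho, rho_tilde):
    """Exponents of (log n / n) in (14) (location) and (15) (size)."""
    if rho_tilde > rho:
        raise ValueError("conditions (3) and (7) force rho_tilde <= rho")
    base = rho_tilde / (2.0 * rho_tilde + 1.0)    # exponent of U_n + h_n^rho_tilde
    return base / rho, base * rho_tilde / rho

# Larger exponent means faster convergence.
for rho in (2.0, 1.0, 0.5):                       # smooth, cusp, proper peak
    location, size = rate_exponents(rho, rho)
    print(rho, round(location, 4), round(size, 4))
```

For $\rho = \tilde\rho$ equal to 2, 1 and 1/2 the location exponent is 1/5, 1/3 and 1/2 while the size exponent is 2/5, 1/3 and 1/4, so the location rate improves and the size rate deteriorates as the peak sharpens.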
3 Brief discussion

(a) In the twice differentiable case $\rho = \tilde\rho = 2$ our rates coincide with the known ones in Müller (1985; see also Vieu, 1996). However, our results are still a slight improvement in this case since we need to impose differentiability only locally in a small neighborhood of $\theta$. On the other hand, under some additional requirements, the rate has been slightly improved by Leclerc and Pierre-Loti-Viaud (2000). In the degenerate case, i.e. $f''(\theta) = 0$ with some higher differentiability, the exact rate has been recently determined by Mokkadem and Pelletier (2003). In the case $\rho < 2$, no results seem to have been available so far. This has also been pointed out by Abraham et al. (2003), p. 7. See, however, Ehm (1996), where $f$ is assumed to be $C^p$-smooth, $p \ge 2$, except at $\theta$ itself, where it has a kink ($\rho = \tilde\rho = 1$). In this case, our rate (14), i.e. $O(n^{-1/3})$ up to a logarithmic factor, can be improved by the very sophisticated construction of another estimator for $\theta$.

(b) In Abraham et al. (2003), for computational reasons, a different estimator is considered which maximizes $\hat f_n$ only over the values of $X_1, \dots, X_n$ (instead of maximizing over $x \in \mathbb{R}$ or an interval, as the classical mode estimate does). The authors compare the performance of their estimator to that of the classical one in the smooth case, and state that it would be desirable to do likewise in the non-smooth case. Now even as we have results for the classical estimator in non-smooth cases, such a comparison is difficult since the conditions given in Abraham et al. (2003) do not directly correspond to ours. However, their $\beta$ clearly equals our $\tilde\rho$ since they employ our condition (13), while their $\alpha$ should correspond to our $1/\rho$. Furthermore, we are in the univariate case $d = 1$. With i.i.d. observations being available, the rate obtained from (14) is
\[ O\Big(\Big(\frac{\log n}{n}\Big)^{\frac{\tilde\rho}{(2\tilde\rho+1)\rho}}\Big) = O\Big(\Big(\frac{\log n}{n}\Big)^{\frac{\alpha\beta}{2\beta+d}}\Big). \]
Since $\alpha\beta \le 1$, this is slightly better than the rate given in Cor. 2.1 of Abraham et al. (2003), which is
\[ O\Big(\frac{(\log n)^{2/(2\beta+d)}}{n^{\alpha\beta/(2\beta+d)}}\Big). \]
Hence, even in the non-smooth case, the classical estimator still seems to perform slightly better than the computationally advantageous one. However, we do not know if one of those rates can still be improved.

(c) Note that our assumptions, even in the case $\rho = \tilde\rho$, do not imply any local symmetry of $f$ around $\theta$. The situation changes dramatically as soon as we want to construct a confidence interval for $\theta$. For asymptotic normality of $\hat\theta_n - \theta$ the conditions $f'(\theta-0) = -f'(\theta+0)$ and $f''(\theta-0) = f''(\theta+0)$, i.e., local symmetry of $f$ around $\theta$ up to order 2, seem to be crucial. This will be shown in a forthcoming paper of the second author. However, even in non-symmetric situations of this kind, the mode can still be estimated asymptotically normally using a different estimator. The construction of such an estimator is described in Ehm (1996).
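The two maximisation strategies compared in remark (b) can be put side by side in a short Python sketch. Everything concrete here is an assumption for illustration: the Epanechnikov kernel, the function names, the Laplace sample, the bandwidth and the grid. The second function only follows the verbal description given above (maximising $\hat f_n$ over the observed values); it is not claimed to reproduce the exact procedure of Abraham et al. (2003).

```python
import numpy as np

def kde(x, sample, h):
    """Kernel density estimate (6) with an Epanechnikov kernel (assumed choice)."""
    u = (np.asarray(x, dtype=float)[:, None] - sample[None, :]) / h
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    return k.sum(axis=1) / (sample.size * h)

def mode_over_grid(sample, h, lo, hi, num=2001):
    """Classical estimate: maximise over an interval, approximated by a grid."""
    grid = np.linspace(lo, hi, num)
    return grid[int(np.argmax(kde(grid, sample, h)))]

def mode_over_sample(sample, h):
    """Computationally cheaper variant: maximise only over the observations."""
    return sample[int(np.argmax(kde(sample, sample, h)))]

# Illustrative data (an assumption): Laplace sample with a cusp-shaped mode at 0.
rng = np.random.default_rng(1)
sample = rng.laplace(loc=0.0, scale=1.0, size=2000)
theta_grid = mode_over_grid(sample, h=0.2, lo=-2.0, hi=2.0)
theta_sample = mode_over_sample(sample, h=0.2)
print(theta_grid, theta_sample)
```

The sample-based variant needs no choice of interval or grid and always returns one of the observations, which is the computational advantage mentioned in the text; a single simulated sample like this one says nothing about which rate is better.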
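4 Extensions to regression functions

Now we turn to regression analysis. For fixed design regression $Y_i = f(x_i) + \varepsilon_i$ with design points $x_1, \dots, x_n \in I = [a, b]$ satisfying
\[ x_i - x_{i-1} - \frac{1}{n} = o\Big(\frac{1}{n^{2}}\Big) \quad \text{uniformly in } i \]
and i.i.d. error variables with zero mean and variance $E\varepsilon_i^2 = \sigma^2 < \infty$, the Gasser-Müller estimator
\[ \hat f_n(x) = \frac{1}{h_n} \sum_{i=1}^{n} Y_i \int_{s_{i-1}}^{s_i} K\Big(\frac{x-u}{h_n}\Big)\,du \]
(with $s_{i-1} = \frac{1}{2}(x_i + x_{i-1})$, $i = 2, \dots, n$, $s_0 = a$, $s_n = b$) is known to fulfill
\[ E\hat f_n(x) = \int K(u) f(x - h_n u)\,du + O\Big(\frac{1}{n h_n}\Big) \]
(Müller, 1985). Therefore, our results carry over to this case quite straightforwardly as long as $\frac{1}{n h_n}$ is of smaller order than both $U_n$ and $h_n^{\tilde\rho}$, which is the typical case.

In the case of random design, with $f(x) = E(Y \mid X = x)$ to be estimated via the Nadaraya-Watson estimator
\[ \hat f_n(x) = \frac{\hat r_n(x)}{\hat g_n(x)}, \]
where
\[ \hat r_n(x) = \frac{1}{n h_n} \sum_{i=1}^{n} Y_i\, K\Big(\frac{x - X_i}{h_n}\Big) \quad \text{and} \quad \hat g_n(x) = \frac{1}{n h_n} \sum_{i=1}^{n} K\Big(\frac{x - X_i}{h_n}\Big), \]
the quantity $\hat f_n$ is compared to in the stochastic part (5) is $\bar f_n(x) = \dfrac{E\hat r_n(x)}{E\hat g_n(x)}$ rather than $E\hat f_n(x)$. Note, however, that the asymptotic equivalence of the two quantities is shown in Ziegler (2001a). For estimation from i.i.d. pairs of observations, (5) can still be proven with
\[ U_n = \sqrt{\frac{\log n}{n h_n}} \]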
(see Härdle et al., 1988). For $U_n$ in the case of dependent observations, see Györfi et al. (1989) or Liebscher (1998), among others.

To prove an analogue to our theorem for the Nadaraya-Watson estimator, one has to find an appropriate lower bound of
\[ \bar f_n(\theta) - \bar f_n(x) = \frac{E\hat r_n(\theta)}{E\hat g_n(\theta)} - \frac{E\hat r_n(x)}{E\hat g_n(x)}. \]
If $X$ has a design density $g$ being bounded away from zero and infinity on $I$, i.e. $0 < C_1 \le g(x) \le C_2 < \infty$ for $x \in I$, we have
\[
\begin{aligned}
\bar f_n(\theta) - \bar f_n(x)
&= \frac{\int K(u) f(\theta - u h_n) g(\theta - u h_n)\,du}{\int K(u) g(\theta - u h_n)\,du} - \frac{\int K(u) f(x - u h_n) g(x - u h_n)\,du}{\int K(u) g(x - u h_n)\,du} \\
&= \frac{\int K(u)\big(f(\theta - u h_n) - f(\theta)\big) g(\theta - u h_n)\,du}{\int K(u) g(\theta - u h_n)\,du} + \frac{\int K(u)\big(f(\theta) - f(x - u h_n)\big) g(x - u h_n)\,du}{\int K(u) g(x - u h_n)\,du} \\
&\ge \frac{C_2}{C_1} \int K(u)\big(f(\theta - u h_n) - f(\theta)\big)\,du + \frac{C_1}{C_2} \int K(u)\big(f(\theta) - f(x - u h_n)\big)\,du,
\end{aligned}
\]
and from now on one may proceed as in the proof of the theorem. Further extensions to local polynomial smoothers are possible in a similar way.

Finally we remark that our method should carry over to the estimation of points $\theta$ with $f^{(p)}(\theta) = b$, as mentioned in the introduction, by modifying assumptions (3) and (7) appropriately. This will be shown in a forthcoming paper of the second author.

References

Abraham, C., Biau, G. and Cadre, B. (2003). Simple estimation of the mode of a multivariate density. Preprint.
Boularan, J., Ferré, L. and Vieu, P. (1995). Location of particular points in non-parametric regression analysis. Aust. J. Stat. 37 161-168.
Eddy, W. (1980). Optimal kernel estimators of the mode. Ann. Statist. 8 870-882.
Eddy, W. (1982). The asymptotic distributions of kernel estimators of the mode. Z. Wahrsch. Verw. Gebiete 59 279-290.
Ehm, W. (1996). Adaptive kernel estimation of a cusp-shaped mode. In: Fischer, Herbert (ed.) et al.: Applied mathematics and parallel computing. Festschrift for Klaus Ritter, 109-120, Physica-Verlag, Heidelberg.
Einmahl, U. and Mason, D.M. (2000). An empirical process approach to the uniform consistency of kernel-type function estimators. J. Theoret. Probab. 13 1-37.
Grund, B. and Hall, P. (1995). On the minimisation of $L^p$ error in mode estimation. Ann. Statist. 23 2264-2284.
Györfi, L., Härdle, W., Sarda, P. and Vieu, P. (1989). Nonparametric Estimation from Time Series. Lecture Notes in Statistics 60, Springer-Verlag.
Härdle, W., Janssen, P. and Serfling, R. (1988). Strong uniform consistency rates for estimators of conditional functionals. Ann. Statist. 16 1428-1449.
Herrmann, E. (2000). Data adaptive kernel regression estimation. Habilitationsschrift, Darmstadt Technical University.
Koshkin, G.M. and Vasil'iev, V.A. (1998). Nonparametric estimation of derivatives of a multivariate density from dependent observations. Mathematical Methods Statist. 7 361-400.
Leclerc, J. and Pierre-Loti-Viaud, D. (2000). Vitesse de convergence presque sûre de l'estimateur à noyau du mode. Comptes rendus de l'Académie des Sciences de Paris 331 637-640.
Liebscher, E. (2001). Estimation of the density and the regression function under mixing condition. Statistics and Decisions 19 9-26.
Mokkadem, A. and Pelletier, M. (2003). The law of the iterated logarithm for the multivariate kernel mode estimator. ESAIM: Probability and Statistics 7 1-21.
Müller, H.-G. (1985). Kernel estimators of zeros and of location and size of extrema of regression functions. Scand. J. Statist. 12 221-232.
Müller, H.-G. (1989). Adaptive nonparametric peak estimation. Ann. Statist. 17 1053-1069.
Nadaraya, E.A. (1989). Nonparametric Estimation of Probability Densities and Regression Curves. Kluwer Academic Publishers, Dordrecht.
Parzen, E. (1962). On estimation of a probability density function and mode. Ann. Math. Statist. 33 1065-1076.
Romano, J.P. (1988a). On weak convergence and optimality of kernel density estimates of the mode. Ann. Statist. 16 629-647.
Romano, J.P. (1988b). Bootstrapping the mode. Ann. Inst. Statist. Math. 40 565-586.
Rüschendorf, L. (1977). Consistency of estimators for multivariate density functions and for the mode. Sankhyā Ser. A 39 243-250.
Vieu, P. (1996). A note on density mode estimation. Statistics and Probability Letters 26 297-307.
Ziegler, K. (2001). On approximations to the bias of the Nadaraya-Watson regression estimator. J. Nonparametric Statistics 13 583-589.
Ziegler, K. (2002). On nonparametric kernel estimation of the mode of the regression function in the random design model. J. Nonparametric Statistics 14 749-774.