Estimation of the essential supremum of a regression function

Michael Kohler$^{1}$, Adam Krzyżak$^{2,*}$ and Harro Walk$^{3}$

$^{1}$ Fachbereich Mathematik, Technische Universität Darmstadt, Schlossgartenstr. 7, 64289 Darmstadt, Germany, email: kohler@mathematik.tu-darmstadt.de
$^{2}$ Department of Computer Science and Software Engineering, Concordia University, 1455 De Maisonneuve Blvd. West, Montreal, Quebec, Canada H3G 1M8, email: krzyzak@cs.concordia.ca
$^{3}$ Fachbereich Mathematik, Universität Stuttgart, Pfaffenwaldring 57, 70569 Stuttgart, Germany, email: walk@mathematik.uni-stuttgart.de
$^{*}$ Corresponding author. Tel: +1-514-848-2424 ext. 3007, Fax: +1-514-848-2830

Running title: Essential supremum of a regression function

October, 200

Abstract
Given an independent and identically distributed sample of the distribution of an $\mathbb{R}^d \times \mathbb{R}$-valued random vector $(X, Y)$, the problem of estimating the essential supremum of the corresponding regression function $m(x) = \mathbf{E}\{Y \mid X = x\}$ is considered. Estimates are constructed which converge almost surely to this value whenever the dependent variable $Y$ satisfies a weak integrability condition.

AMS classification: Primary 62G08; secondary 62G20.

Key words and phrases: Essential supremum, nonparametric regression, strong consistency.

1 Introduction

Let $(X, Y), (X_1, Y_1), (X_2, Y_2), \dots$ be independent and identically distributed random vectors with values in $\mathbb{R}^d \times \mathbb{R}$. Assume $\mathbf{E}|Y| < \infty$, let $m(x) = \mathbf{E}\{Y \mid X = x\}$ be the so-called
regression function, and let $\mu = \mathbf{P}_X$ be the distribution of the design variable $X$. Assume that $m$ is essentially bounded, i.e., assume that there exists $c \in \mathbb{R}$ such that $|m(x)| \le c$ for $\mu$-almost all $x$. The essential supremum of $m$ is defined by
$$\operatorname{ess\,sup}(m) := \inf\{t \in \mathbb{R} : m(x) \le t \text{ for } \mu\text{-almost all } x\}. \tag{1}$$
Replacing $m$ by $|m|$ leads to the essential supremum norm of $m$.

In this paper we consider the problem of estimating the essential supremum of the regression function $m$ from a sample of the underlying distribution. More precisely, given the data $\mathcal{D}_n = \{(X_1, Y_1), \dots, (X_n, Y_n)\}$ we want to construct estimates $\hat M_n = \hat M_n(\mathcal{D}_n)$ and $\hat X_n = \hat X_n(\mathcal{D}_n)$ such that
$$\hat M_n \to \operatorname{ess\,sup}(m) \quad (n \to \infty) \quad \text{a.s.} \tag{2}$$
and
$$m(\hat X_n) \to \operatorname{ess\,sup}(m) \quad (n \to \infty) \quad \text{a.s.} \tag{3}$$

One way of constructing such estimates is to use a plug-in approach. Here we first use the given data $\mathcal{D}_n$ to construct an estimate $m_n(x) = m_n(x, \mathcal{D}_n)$ of $m(x)$ for $x \in \mathbb{R}^d$. Next we construct our plug-in estimates by $\hat M_n = \operatorname*{ess\,sup}_{x \in \mathbb{R}^d} m_n(x) = \operatorname{ess\,sup}(m_n)$ and, if possible, $\hat X_n = \arg\operatorname*{ess\,sup}_{x \in \mathbb{R}^d} m_n(x)$. For the latter estimate it is necessary that the regression estimate attains the value of its essential supremum at one or more points. In case there exist several points with this property, we choose any one of them. It is easy to see that
$$\operatorname*{ess\,sup}_{x \in \mathbb{R}^d} |m_n(x) - m(x)| \to 0 \quad (n \to \infty) \quad \text{a.s.} \tag{4}$$
implies (2). So to get consistent estimates of the essential supremum, this approach requires that the regression estimate be consistent in essential supremum norm.
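The step "(4) implies (2)" can be spelled out by a routine two-sided bound. The following derivation is added for convenience; it assumes that all essential suprema involved (taken with respect to $\mu$) are finite and uses only that $g(x) \le \operatorname{ess\,sup}(g)$ holds mod $\mu$ and that a union of two $\mu$-null sets is $\mu$-null.

```latex
\begin{aligned}
m_n(x) &\le m(x) + |m_n(x) - m(x)|
  \le \operatorname{ess\,sup}(m) + \operatorname*{ess\,sup}_{z \in \mathbb{R}^d} |m_n(z) - m(z)| \quad \text{mod } \mu,\\
m(x) &\le m_n(x) + |m_n(x) - m(x)|
  \le \operatorname{ess\,sup}(m_n) + \operatorname*{ess\,sup}_{z \in \mathbb{R}^d} |m_n(z) - m(z)| \quad \text{mod } \mu,\\
\Longrightarrow \quad
\big|\hat M_n - \operatorname{ess\,sup}(m)\big|
  &= \big|\operatorname{ess\,sup}(m_n) - \operatorname{ess\,sup}(m)\big|
  \le \operatorname*{ess\,sup}_{z \in \mathbb{R}^d} |m_n(z) - m(z)|.
\end{aligned}
```

Taking the essential supremum of the left-hand side of each of the first two lines gives the two one-sided bounds, which combine into the last line.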
Various consistency results for regression estimates in (essential) supremum norm can be found, e.g., in Devroye (1978a), Härdle and Luckhaus (1984) and the references therein. Under rather weak assumptions (4) was shown for the nearest neighbor regression estimates in Devroye (1978b). The main result there is that for $X \in [-c, c]^d$ a.s. for some $c > 0$, $\sup_{x \in [-c,c]^d} \mathbf{E}\{|Y|^r \mid X = x\} < \infty$ for some $r > 2d + 1$ and for continuous $m$ the estimate satisfies
$$\sup_{x \in [-c,c]^d} |m_n(x) - m(x)| \to 0 \quad (n \to \infty) \quad \text{a.s.}$$
Hence under these assumptions the corresponding plug-in nearest neighbor estimates of $\operatorname{ess\,sup}(m)$ satisfy (2) and (3).

In this paper we show that a direct analysis of suitably defined estimates $\hat M_n$ and $\hat X_n$ yields consistency under essentially weaker assumptions. In particular, we show
$$\hat M_n \to \operatorname{ess\,sup}(m) \quad (n \to \infty) \quad \text{a.s.}$$
for all distributions of $(X, Y)$ such that the regression function is essentially bounded and
$$\mathbf{E}\{|Y| \log^+(|Y|)\} < \infty,$$
where
$$\log^+(z) := \begin{cases} \log(z) & \text{if } z \ge 1, \\ 0 & \text{if } z < 1. \end{cases}$$
If, in addition, $m$ is uniformly continuous, we also have
$$m(\hat X_n) \to \operatorname{ess\,sup}(m) \quad (n \to \infty) \quad \text{a.s.}$$

1.1 Notation

Throughout this paper we use the following notation: $\|x\|$ denotes the Euclidean norm of $x \in \mathbb{R}^d$, $\mu$ denotes the distribution of $X$, and $m(x) = \mathbf{E}\{Y \mid X = x\}$ is the regression function of $(X, Y)$. We write "mod $\mu$" in case an assertion holds for $\mu$-almost all $x \in \mathbb{R}^d$. The essential supremum of a function $g : \mathbb{R}^d \to \mathbb{R}$ is defined by
$$\operatorname{ess\,sup}(g) := \inf\{t \in \mathbb{R} : g(x) \le t \text{ mod } \mu\}.$$
Let $D \subseteq \mathbb{R}^d$ and let $f : \mathbb{R}^d \to \mathbb{R}$ be a real-valued function defined on $\mathbb{R}^d$. We write $x = \arg\max_{z \in D} f(z)$ if $\max_{z \in D} f(z)$ exists and if $x$ satisfies $x \in D$ and $f(x) = \max_{z \in D} f(z)$.
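For a measure with finitely many atoms, the definition of $\operatorname{ess\,sup}(g)$ above reduces to a maximum of $g$ over the atoms of positive mass; in particular, with respect to the empirical distribution $\mu_n$ it is the maximum of $g$ over the sample points. The Python sketch below illustrates this; the dictionary input format and the name `ess_sup_discrete` are hypothetical choices made only for illustration.

```python
import math

def ess_sup_discrete(atoms, g):
    """ess sup(g) w.r.t. a discrete measure given as {point: mass} (hypothetical format)."""
    # Points carrying zero mass form a null set and are ignored.
    values = [g(x) for x, mass in atoms.items() if mass > 0]
    # inf{t : g(x) <= t mod mu} is the largest value of g on the atoms of positive mass;
    # if no atom carries mass, every t qualifies and the infimum is -infinity.
    return max(values) if values else -math.inf

# Example: the atom 2 has mass zero, so its large value 100.0 is ignored.
mu = {0: 0.5, 1: 0.5, 2: 0.0}
g_values = {0: 1.0, 1: 3.0, 2: 100.0}
print(ess_sup_discrete(mu, g_values.get))  # 3.0
```

Changing $g$ on the zero-mass atom does not affect the result, mirroring the fact that the essential supremum ignores $\mu$-null sets.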
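Furthermore we define
$$\log^+(z) := \begin{cases} \log(z) & \text{if } z \ge 1, \\ 0 & \text{if } z < 1, \end{cases}$$
for $z \in \mathbb{R}_+$.

1.2 Outline

The estimates are defined in Section 2, the main result is formulated in Section 3 and proven in Section 4.

2 Definition of the estimate

Let $(h_n)_{n \in \mathbb{N}}$ be an arbitrary sequence of positive numbers satisfying $h_n \to 0$ $(n \to \infty)$, and let $K : \mathbb{R}^d \to \mathbb{R}$ be a kernel function satisfying
$$K(x) \ge c_1 \cdot 1_{S_r}(x) \quad (x \in \mathbb{R}^d)$$
for some constants $c_1 > 0$ and $r > 0$, where $S_r$ is the (closed) ball of radius $r$ centered at the origin and $1_A$ denotes the indicator function of $A$. Let $S_r(x) = x + S_r$ be the ball of radius $r$ centered at $x$, and let $\mu_n$ be the empirical distribution of $X_1, \dots, X_n$, i.e.,
$$\mu_n(A) = \frac{1}{n} \sum_{i=1}^n 1_A(X_i) \quad (A \subseteq \mathbb{R}^d).$$
We estimate the essential supremum of the regression function by the kernel estimate
$$\hat M_n := \sup_{x \in \mathbb{R}^d :\ \mu_n(S_{r \cdot h_n}(x)) \ge 1/\log(n)} \frac{\sum_{i=1}^n Y_i\, K\left(\frac{x - X_i}{h_n}\right)}{\sum_{j=1}^n K\left(\frac{x - X_j}{h_n}\right)}, \tag{5}$$
where $\sup \emptyset := -\infty$ and $0/0 := 0$. It follows from the proof of Theorem 1 below that, under the assumptions there, with probability one $\hat M_n \in \mathbb{R}$ for all sufficiently large $n$.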
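A minimal Python sketch of (5), together with the corresponding maximizer (6), is given below. It is not the authors' implementation: the supremum over $\mathbb{R}^d$ is replaced by a maximum over a user-supplied finite set of candidate points (the device described in Remark 2), and the kernel $K(u) = \max(1 - \|u\|^2, 0)$ is a hypothetical choice. This kernel has compact support and satisfies $K \ge 0.75 \cdot 1_{S_{0.5}}$, so $r = 0.5$ and $c_1 = 0.75$. The sketch assumes $n \ge 2$, so that $\log(n) > 0$.

```python
import numpy as np

def kernel_ess_sup_estimate(X, Y, h, candidates, r=0.5):
    """Sketch of estimates (5) and (6), maximizing over a finite candidate set only.

    Hypothetical kernel K(u) = max(1 - ||u||^2, 0), for which K >= 0.75 on S_{0.5}.
    Returns (M_hat, X_hat); X_hat is None if no candidate is admissible.
    """
    Y = np.asarray(Y, dtype=float)
    n = Y.size                                                       # assumes n >= 2
    X = np.asarray(X, dtype=float).reshape(n, -1)                    # (n, d)
    C = np.asarray(candidates, dtype=float).reshape(-1, X.shape[1])  # (m, d)
    # Squared distances ||x - X_i||^2 between candidates and data, shape (m, n).
    sq = ((C[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # Empirical mass mu_n(S_{r h}(x)) of the closed ball around each candidate.
    mass = (sq <= (r * h) ** 2).mean(axis=1)
    admissible = mass >= 1.0 / np.log(n)
    if not admissible.any():
        return -np.inf, None                                         # sup of the empty set
    W = np.maximum(1.0 - sq[admissible] / h ** 2, 0.0)               # K((x - X_i) / h)
    num, den = W @ Y, W.sum(axis=1)
    # 0/0 := 0 as in (5); den > 0 here, because every admissible ball holds a data point.
    ratios = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    best = int(np.argmax(ratios))
    return float(ratios[best]), C[admissible][best]

# Usage: noiseless data with m(x) = sin(pi x), so ess sup(m) = 1, attained at x = 0.5.
X = np.linspace(0.0, 1.0, 200)
Y = np.sin(np.pi * X)
M_hat, x_hat = kernel_ess_sup_estimate(X, Y, h=0.2, candidates=np.linspace(0.0, 1.0, 101))
```

For finite $n$ the kernel average smooths the peak, so $\hat M_n$ lies slightly below $1$ here; the admissibility condition $\mu_n(S_{r \cdot h_n}(x)) \ge 1/\log(n)$ discards candidates near the boundary whose balls contain too few data points.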
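Furthermore we estimate the mode of the regression function by the mode of the above modified kernel estimate:
$$\hat X_n := \arg\max_{x \in \mathbb{R}^d :\ \mu_n(S_{r \cdot h_n}(x)) \ge 1/\log(n)} \frac{\sum_{i=1}^n Y_i\, K\left(\frac{x - X_i}{h_n}\right)}{\sum_{j=1}^n K\left(\frac{x - X_j}{h_n}\right)}. \tag{6}$$
Here we assume for simplicity that the maximum above exists. In case it does not exist, it suffices to choose $\hat X_n \in \mathbb{R}^d$ with $\mu_n(S_{r \cdot h_n}(\hat X_n)) \ge 1/\log(n)$ such that
$$\frac{\sum_{i=1}^n Y_i\, K\left(\frac{\hat X_n - X_i}{h_n}\right)}{\sum_{j=1}^n K\left(\frac{\hat X_n - X_j}{h_n}\right)} \ge \sup_{x \in \mathbb{R}^d :\ \mu_n(S_{r \cdot h_n}(x)) \ge 1/\log(n)} \frac{\sum_{i=1}^n Y_i\, K\left(\frac{x - X_i}{h_n}\right)}{\sum_{j=1}^n K\left(\frac{x - X_j}{h_n}\right)} - \epsilon_n$$
for some $\epsilon_n \in \mathbb{R}$ satisfying $\epsilon_n > 0$ $(n \in \mathbb{N})$ and $\epsilon_n \to 0$ $(n \to \infty)$. It is easy to see that the proof of Theorem 1 remains valid with this modification of the definition of $\hat X_n$.

Remark 1. Assume $d = 1$ and let $K = 1_{[-1,1]}$ be the naive kernel. Then we have for $x, z \in \mathbb{R}$
$$\frac{\sum_{i=1}^n Y_i\, K\left(\frac{x - X_i}{h_n}\right)}{\sum_{j=1}^n K\left(\frac{x - X_j}{h_n}\right)} = \frac{\sum_{i=1}^n Y_i\, 1_{S_{h_n}(x)}(X_i)}{n\, \mu_n(S_{h_n}(x))} = \frac{\sum_{i=1}^n Y_i\, 1_{S_{h_n}(z)}(X_i)}{n\, \mu_n(S_{h_n}(z))} = \frac{\sum_{i=1}^n Y_i\, K\left(\frac{z - X_i}{h_n}\right)}{\sum_{j=1}^n K\left(\frac{z - X_j}{h_n}\right)}$$
whenever
$$\{i : X_i \in S_{h_n}(x)\} = \{i : X_i \in S_{h_n}(z)\}.$$
Hence if we want to compute all values of the estimate, it suffices to consider only those intervals $S_{h_n}(x) = [x - h_n, x + h_n]$ where one of the two borders $x - h_n$ and $x + h_n$ coincides with one of the data points. This implies that the estimates (5) and (6) can be computed in practice in finite time for $d = 1$ in case of the naive kernel.

Remark 2. Assume $d > 1$ and assume that the regression function is uniformly continuous. In this case it is easy to see that the proof of Theorem 1 remains valid if we restrict the arguments of our estimate to finitely many values on some equidistant grid, provided we decrease the grid size and enlarge the area covered by the grid as the sample size tends to infinity. In this case the estimates (5) and (6) can be computed in practice in finite time even for $d > 1$ and for a general kernel.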
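The finite computation in Remark 1 can be sketched in Python as follows; this is an illustration, not the authors' code, and the name `naive_kernel_estimates_1d` is hypothetical. The index set $\{i : |x - X_i| \le h_n\}$ can only change at the breakpoints $X_i \pm h_n$, so it is constant on each open gap between consecutive breakpoints. As a conservative choice, the sketch evaluates every breakpoint and, in addition, one midpoint per gap, which visits every attainable index set. Exact comparisons at the breakpoints assume data for which $X_i \pm h_n$ is computed exactly in floating point (for example, integer data), and $n \ge 2$ is assumed.

```python
import math

def naive_kernel_estimates_1d(X, Y, h):
    """Sketch of (5) and (6) for d = 1 and the naive kernel K = 1_{[-1, 1]} (so r = 1)."""
    n = len(X)
    threshold = 1.0 / math.log(n)                 # admissibility level 1/log(n); n >= 2
    breaks = sorted({x + s * h for x in X for s in (-1.0, 1.0)})
    mids = [(a + b) / 2 for a, b in zip(breaks, breaks[1:])]
    best_value, best_x = -math.inf, None          # sup of the empty set is -infinity
    for x in breaks + mids:
        inside = [y for xi, y in zip(X, Y) if abs(x - xi) <= h]
        if len(inside) / n < threshold:           # mu_n(S_h(x)) < 1/log(n): not admissible
            continue
        value = sum(inside) / len(inside)         # local average over the window
        if value > best_value:
            best_value, best_x = value, x
    return best_value, best_x

# Usage: 100 equispaced points with a bump of ten ones at positions 45, ..., 54.
X = list(range(100))
Y = [1 if 45 <= i <= 54 else 0 for i in range(100)]
value, x_best = naive_kernel_estimates_1d(X, Y, h=15)
```

In this example the best admissible window contains 30 points, including all ten ones, so the estimate (5) equals $1/3$; this illustrates that, for finite $n$, the estimate is an average over a window of non-negligible empirical mass rather than a pointwise maximum.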
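3 Main result

Let the estimate $\hat M_n$ be defined as in the previous section. Then the following result is valid:

Theorem 1. Let $(X, Y), (X_1, Y_1), (X_2, Y_2), \dots$ be independent and identically distributed random variables with values in $\mathbb{R}^d \times \mathbb{R}$. Assume
$$\mathbf{E}\{|Y| \log^+(|Y|)\} < \infty, \tag{7}$$
and assume that the regression function is essentially bounded. Let $K^* : \mathbb{R}_+ \to \mathbb{R}_+$ be a monotonically decreasing and left continuous function satisfying $K^*(+0) > 0$ and $t^d K^*(t^2) \to 0$ as $t \to \infty$. Define the kernel $K : \mathbb{R}^d \to \mathbb{R}$ by $K(u) = K^*(\|u\|^2)$ $(u \in \mathbb{R}^d)$ and let the estimates $\hat M_n$ and $\hat X_n$ be defined as in the previous section. Assume that $h_n > 0$ satisfies
$$h_n \to 0 \quad (n \to \infty) \quad \text{and} \quad \log(n)\, h_n^d \to \infty \quad (n \to \infty). \tag{8}$$
Then the following assertions hold:

a) $$\hat M_n \to \operatorname{ess\,sup}(m) \quad \text{a.s.} \tag{9}$$

b) If, in addition, $m$ is uniformly continuous and the kernel $K$ has compact support, then
$$m(\hat X_n) \to \operatorname{ess\,sup}(m) \quad \text{a.s.} \tag{10}$$

Remark 3. The proof of Theorem 1 below leads to the following extension of the strong uniform consistency result of Devroye (1978b) on regression estimation mentioned in Section 1: Let $m_n$ be the $k_n$-nearest neighbor ($k_n$-NN) regression estimator, i.e.,
$$m_n(x) = \frac{1}{k_n} \sum_{i=1}^n Y_i\, 1_{\{X_i \text{ is among the } k_n \text{ NNs of } x \text{ in } \{X_1, \dots, X_n\}\}}$$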
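with $k_n = \lceil n/\log(n) \rceil$, where $\lceil z \rceil$ denotes the ceiling of $z \in \mathbb{R}$. Let $A$ be an arbitrary compact subset of $\mathbb{R}^d$ and denote the support of the distribution of $X$ by $\operatorname{supp}(\mu)$. Then
$$\sup_{x \in \operatorname{supp}(\mu) \cap A} |m_n(x) - m(x)| \to 0 \quad \text{a.s.}$$
for all distributions of $(X, Y)$ with ties occurring with probability zero, continuous regression functions and $\mathbf{E}\{|Y| \log^+(|Y|)\} < \infty$. The ties condition is fulfilled if the distribution of $\|X - x\|$ is absolutely continuous for any $x \in \mathbb{R}^d$, which can be assumed without loss of generality (see, e.g., Györfi et al. (2002), pp. 86, 87). This consistency result can be obtained by a modification of the proof of Theorem 1 in Section 4. One considers a modified naive kernel estimate
$$m_n(x) = \frac{\sum_{i=1}^n Y_i\, 1_{S_{h_n(x)}(x)}(X_i)}{\sum_{j=1}^n 1_{S_{h_n(x)}(x)}(X_j)}$$
with a data-dependent local bandwidth
$$h_n(x) = \min\left\{h > 0 : \mu_n(S_h(x)) \ge \frac{1}{\log(n)}\right\}.$$
Further one inserts $\mathbf{E}\{Y 1_{S_{h_n(x)}(x)}(X)\} / \mathbf{P}\{X \in S_{h_n(x)}(x)\}$ between $m_n(x)$ and $m(x)$.

Remark 4. The proof of Theorem 1 below is based on (uniform) exponential inequalities for sums of independent random variables (Theorem 12.5 in Devroye, Györfi and Lugosi (1996) and Theorem 9.1 in Györfi et al. (2002)). By using instead Theorem 5.1 in Franke and Diagne (2006) and the remark there concerning Theorem 1.3 (i) of Bosq (1996), it is possible to show that Theorem 1 remains valid if we replace the assumption that the data are independent and identically distributed by the assumption that the data are stationary and $\alpha$-mixing with geometrically decreasing mixing coefficients $\alpha_n$, or even polynomially decreasing $\alpha_n$ satisfying $\alpha_n = O(n^{-\gamma})$ for some $\gamma > 1$.

Remark 5. Assume $\mathbf{E}|Y|^{1+\rho} < \infty$ for some $\rho \in (0, 1]$. Let $\delta \in (0, 1/4)$ be arbitrary. By replacing in the proof of Theorem 1 the inequality
$$1_{\{|y| > n^s\}} \le \frac{\log(|y|)}{\log(n^s)}\, 1_{\{|y| > n^s\}}$$
by
$$1_{\{|y| > n^s\}} \le \frac{|y|^\rho}{(n^s)^\rho}\, 1_{\{|y| > n^s\}}$$
with $s \in [\delta, 1/2 - \delta\rho)$, it is easy to see that in this case the estimate
$$\tilde M_n := \sup_{x \in \mathbb{R}^d :\ \mu_n(S_{r \cdot h_n}(x)) \ge n^{-\delta\rho}} \frac{\sum_{i=1}^n Y_i\, K\left(\frac{x - X_i}{h_n}\right)}{\sum_{j=1}^n K\left(\frac{x - X_j}{h_n}\right)}$$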
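satisfies $\tilde M_n \to \operatorname{ess\,sup}(m)$ a.s., provided we choose $h_n > 0$ such that $h_n \to 0$ $(n \to \infty)$ and $n^{\delta\rho} h_n^d \to \infty$ $(n \to \infty)$.

4 Proofs

Proof of Theorem 1. a) Let $s \in (0, 1/2)$ be arbitrary. Using well-known results from VC theory (cf., e.g., Theorem 12.5, Theorem 13.3 and Corollary 13.2 in Devroye, Györfi and Lugosi (1996)) we get
$$\mathbf{P}\left\{\sup_{x \in \mathbb{R}^d} \left|\mu_n(S_{r \cdot h_n}(x)) - \mathbf{P}_X(S_{r \cdot h_n}(x))\right| > \epsilon\right\} \le 8 n^{d+2} e^{-n\epsilon^2/32}.$$
Furthermore we can conclude from Theorem 9.1 in Györfi et al. (2002) and the proof of Lemma 3.2 in Kohler, Krzyżak and Walk (2003) that
$$\mathbf{P}\left\{\sup_{x \in \mathbb{R}^d} \left|\frac{1}{n}\sum_{i=1}^n K\left(\frac{x - X_i}{h_n}\right) - \mathbf{E}\left\{K\left(\frac{x - X}{h_n}\right)\right\}\right| > \epsilon\right\} \le 16 \left(\frac{32 e K(0)}{\epsilon}\right)^{2(d+3)} e^{-\frac{n\epsilon^2}{128 K(0)^2}}$$
and
$$\mathbf{P}\left\{\sup_{x \in \mathbb{R}^d} \left|\frac{1}{n}\sum_{i=1}^n Y_i 1_{\{|Y_i| \le n^s\}} K\left(\frac{x - X_i}{h_n}\right) - \mathbf{E}\left\{Y 1_{\{|Y| \le n^s\}} K\left(\frac{x - X}{h_n}\right)\right\}\right| > \epsilon\right\} \le 16 \left(\frac{32 e K(0) n^s}{\epsilon}\right)^{2(d+3)} e^{-\frac{n\epsilon^2}{128 K(0)^2 n^{2s}}}$$
for every $\epsilon > 0$. Application of the Borel-Cantelli lemma yields
$$\sup_{x \in \mathbb{R}^d} \left|\mu_n(S_{r \cdot h_n}(x)) - \mathbf{P}_X(S_{r \cdot h_n}(x))\right| \cdot \frac{n^{1/2}}{\log(n)} \to 0 \quad \text{a.s.}, \tag{11}$$
$$\sup_{x \in \mathbb{R}^d} \left|\frac{1}{n}\sum_{i=1}^n K\left(\frac{x - X_i}{h_n}\right) - \mathbf{E}\left\{K\left(\frac{x - X}{h_n}\right)\right\}\right| \cdot \frac{n^{1/2}}{\log(n)} \to 0 \quad \text{a.s.}, \tag{12}$$
and
$$\sup_{x \in \mathbb{R}^d} \left|\frac{1}{n}\sum_{i=1}^n Y_i 1_{\{|Y_i| \le n^s\}} K\left(\frac{x - X_i}{h_n}\right) - \mathbf{E}\left\{Y 1_{\{|Y| \le n^s\}} K\left(\frac{x - X}{h_n}\right)\right\}\right| \cdot \frac{n^{1/2 - s}}{\log(n)} \to 0 \quad \text{a.s.} \tag{13}$$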
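For the first of the three exponential bounds above, the passage to an almost sure rate via the Borel-Cantelli lemma can be made explicit. The worked step below is added for convenience and only substitutes a shrinking $\epsilon_n$ into the bound $8 n^{d+2} e^{-n\epsilon^2/32}$; it shows that $\frac{n^{1/2}}{\log(n)} \sup_{x} |\mu_n(S_{r \cdot h_n}(x)) - \mathbf{P}_X(S_{r \cdot h_n}(x))| \to 0$ a.s.

```latex
\begin{aligned}
&\text{Fix } \epsilon > 0 \text{ and put } \epsilon_n := \epsilon \log(n) / n^{1/2}. \text{ Then}\\
&\mathbf{P}\Big\{ \tfrac{n^{1/2}}{\log(n)} \sup_{x \in \mathbb{R}^d}
   \big|\mu_n(S_{r \cdot h_n}(x)) - \mathbf{P}_X(S_{r \cdot h_n}(x))\big| > \epsilon \Big\}
 \le 8 n^{d+2} e^{-n \epsilon_n^2 / 32}
 = 8 n^{d+2} e^{-\epsilon^2 \log(n)^2 / 32}
 = 8 n^{\,d + 2 - \epsilon^2 \log(n)/32},\\
&\text{and } d + 2 - \epsilon^2 \log(n)/32 \le -2 \iff n \ge \exp\!\big(32 (d+4) / \epsilon^2\big).
\end{aligned}
```

Hence the probabilities are summable, the Borel-Cantelli lemma gives $\limsup_{n} \frac{n^{1/2}}{\log(n)} \sup_x |\cdots| \le \epsilon$ a.s., and letting $\epsilon \downarrow 0$ along a countable sequence yields the claim. The two kernel sums are handled in the same way, with $\epsilon_n = \epsilon \log(n) n^{-1/2}$ for the untruncated sum and $\epsilon_n = \epsilon \log(n) n^{s - 1/2}$ for the truncated one; in both cases the exponent becomes $-\epsilon^2 \log(n)^2 / (128 K(0)^2)$ while the prefactor grows only polynomially in $n$.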
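Using $c_1 1_{S_r}(x) \le K(x) \le K(0) =: c_2$ we see that for any $x \in \mathbb{R}^d$ satisfying $\mu_n(S_{r \cdot h_n}(x)) \ge 1/\log(n)$ we have
$$\begin{aligned}
&\left|\frac{\sum_{i=1}^n Y_i K\left(\frac{x - X_i}{h_n}\right)}{\sum_{j=1}^n K\left(\frac{x - X_j}{h_n}\right)} - \frac{\mathbf{E}\left\{Y K\left(\frac{x - X}{h_n}\right)\right\}}{\mathbf{E}\left\{K\left(\frac{x - X}{h_n}\right)\right\}}\right|\\
&\le \frac{\sum_{i=1}^n |Y_i| 1_{\{|Y_i| > n^s\}} K\left(\frac{x - X_i}{h_n}\right)}{\sum_{j=1}^n K\left(\frac{x - X_j}{h_n}\right)}
+ \left|\frac{\sum_{i=1}^n Y_i 1_{\{|Y_i| \le n^s\}} K\left(\frac{x - X_i}{h_n}\right)}{\sum_{j=1}^n K\left(\frac{x - X_j}{h_n}\right)} - \frac{\mathbf{E}\left\{Y 1_{\{|Y| \le n^s\}} K\left(\frac{x - X}{h_n}\right)\right\}}{\mathbf{E}\left\{K\left(\frac{x - X}{h_n}\right)\right\}}\right|
+ \frac{\mathbf{E}\left\{|Y| 1_{\{|Y| > n^s\}} K\left(\frac{x - X}{h_n}\right)\right\}}{\mathbf{E}\left\{K\left(\frac{x - X}{h_n}\right)\right\}}\\
&\le \frac{c_2 \frac{1}{n}\sum_{i=1}^n |Y_i| 1_{\{|Y_i| > n^s\}}}{c_1 / \log(n)}
+ \frac{c_2\, \mathbf{E}\left\{|Y| 1_{\{|Y| > n^s\}}\right\}}{c_1\, \mathbf{P}_X(S_{r \cdot h_n}(x))}
+ \left|\frac{\sum_{i=1}^n Y_i 1_{\{|Y_i| \le n^s\}} K\left(\frac{x - X_i}{h_n}\right)}{\sum_{j=1}^n K\left(\frac{x - X_j}{h_n}\right)} - \frac{\mathbf{E}\left\{Y 1_{\{|Y| \le n^s\}} K\left(\frac{x - X}{h_n}\right)\right\}}{\mathbf{E}\left\{K\left(\frac{x - X}{h_n}\right)\right\}}\right|\\
&=: T_{1,n} + T_{2,n} + T_{3,n}.
\end{aligned}$$
Next we show
$$\sup_{x \in \mathbb{R}^d :\ \mu_n(S_{r \cdot h_n}(x)) \ge 1/\log(n)} T_{i,n} \to 0 \quad \text{a.s.} \tag{14}$$
for $i \in \{1, 2, 3\}$.

For $i = 1$ note that $T_{1,n}$ does not depend on $x$ and equals $c_2/c_1$ times the left-hand side below. For any $L > 1$ and all sufficiently large $n$ (so that $n^s > L$) we have
$$\frac{\frac{1}{n}\sum_{i=1}^n |Y_i| 1_{\{|Y_i| > n^s\}}}{1/\log(n)}
\le \frac{\frac{1}{n}\sum_{i=1}^n |Y_i| \frac{\log|Y_i|}{\log(n^s)} 1_{\{|Y_i| > n^s\}}}{1/\log(n)}
= \frac{1}{s} \cdot \frac{1}{n}\sum_{i=1}^n |Y_i| \log|Y_i|\, 1_{\{|Y_i| > n^s\}}
\le \frac{1}{s} \cdot \frac{1}{n}\sum_{i=1}^n |Y_i| \log|Y_i|\, 1_{\{|Y_i| > L\}}
\to \frac{1}{s}\, \mathbf{E}\left\{|Y| \log|Y|\, 1_{\{|Y| > L\}}\right\} \quad \text{a.s.}$$
by (7) and by the strong law of large numbers. Because of (7) we also have $\mathbf{E}\{|Y| \log|Y|\, 1_{\{|Y| > L\}}\} \to 0$ for $L \to \infty$, from which (14) follows for $i = 1$.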
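The key step for $T_{1,n}$ is the pointwise inequality $|y|\, 1_{\{|y| > n^s\}} \le |y| \frac{\log|y|}{s \log(n)}\, 1_{\{|y| > n^s\}}$, valid because $\log|y| > \log(n^s) > 0$ on the event $\{|y| > n^s\}$. The Python sketch below is only a numerical sanity check of that bound on a given sample; the function name `t1_and_bound` is hypothetical. It returns the left-hand side $\log(n) \cdot \frac{1}{n}\sum_i |Y_i| 1_{\{|Y_i| > n^s\}}$ and the right-hand side $\frac{1}{s} \cdot \frac{1}{n}\sum_i |Y_i| \log|Y_i|\, 1_{\{|Y_i| > n^s\}}$ of the first two steps in the display above.

```python
import numpy as np

def t1_and_bound(Y, s):
    """Return (lhs, rhs) of the truncation bound used for T_{1,n}; assumes n >= 2, s > 0."""
    a = np.abs(np.asarray(Y, dtype=float))
    n = a.size
    tail = a > n ** s                                   # event {|Y_i| > n^s}
    lhs = np.log(n) * a[tail].sum() / n                 # (1/n) sum |Y_i| 1{...} divided by 1/log(n)
    rhs = (a[tail] * np.log(a[tail])).sum() / (s * n)   # (1/s) (1/n) sum |Y_i| log|Y_i| 1{...}
    return lhs, rhs

# Usage: heavy-tailed sample (Student t with 3 degrees of freedom, so (7) holds).
rng = np.random.default_rng(0)
lhs, rhs = t1_and_bound(rng.standard_t(df=3, size=5000), s=0.25)
```

The right-hand side is, up to the indicator $1_{\{|Y_i| > n^s\}} \le 1_{\{|Y_i| > L\}}$, the sample mean to which the strong law of large numbers is applied in the argument above.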
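For $i = 2$ we observe that, whenever the denominator on the right-hand side below is positive (by (11), with probability one this is the case for all sufficiently large $n$),
$$\begin{aligned}
\sup_{x \in \mathbb{R}^d :\ \mu_n(S_{r \cdot h_n}(x)) \ge 1/\log(n)} \frac{\mathbf{E}\left\{|Y| 1_{\{|Y| > n^s\}}\right\}}{\mathbf{P}_X(S_{r \cdot h_n}(x))}
&\le \sup_{x \in \mathbb{R}^d :\ \mu_n(S_{r \cdot h_n}(x)) \ge 1/\log(n)} \frac{\mathbf{E}\left\{|Y| \frac{\log|Y|}{\log(n^s)} 1_{\{|Y| > n^s\}}\right\}}{\mu_n(S_{r \cdot h_n}(x)) - \left(\mu_n(S_{r \cdot h_n}(x)) - \mathbf{P}_X(S_{r \cdot h_n}(x))\right)}\\
&\le \frac{\frac{1}{s \log(n)}\, \mathbf{E}\left\{|Y| \log|Y|\, 1_{\{|Y| > n^s\}}\right\}}{\frac{1}{\log(n)} - \sup_{x \in \mathbb{R}^d} \left|\mu_n(S_{r \cdot h_n}(x)) - \mathbf{P}_X(S_{r \cdot h_n}(x))\right|}\\
&= \frac{1}{s} \cdot \frac{\mathbf{E}\left\{|Y| \log|Y|\, 1_{\{|Y| > n^s\}}\right\}}{1 - \log(n) \sup_{x \in \mathbb{R}^d} \left|\mu_n(S_{r \cdot h_n}(x)) - \mathbf{P}_X(S_{r \cdot h_n}(x))\right|}.
\end{aligned}$$
Because of (7) we have $\mathbf{E}\{|Y| \log|Y|\, 1_{\{|Y| > n^s\}}\} \to 0$ $(n \to \infty)$, and together with (11) this implies (14) for $i = 2$.

In order to show (14) for $i = 3$ we observe that for all sufficiently large $n$ and any $x \in \mathbb{R}^d$ satisfying $\mu_n(S_{r \cdot h_n}(x)) \ge 1/\log(n)$ we have
$$\begin{aligned}
T_{3,n} &\le \frac{\left|\frac{1}{n}\sum_{i=1}^n Y_i 1_{\{|Y_i| \le n^s\}} K\left(\frac{x - X_i}{h_n}\right) - \mathbf{E}\left\{Y 1_{\{|Y| \le n^s\}} K\left(\frac{x - X}{h_n}\right)\right\}\right|}{\frac{1}{n}\sum_{j=1}^n K\left(\frac{x - X_j}{h_n}\right)}\\
&\quad + \frac{\left|\mathbf{E}\left\{Y 1_{\{|Y| \le n^s\}} K\left(\frac{x - X}{h_n}\right)\right\}\right|}{\mathbf{E}\left\{K\left(\frac{x - X}{h_n}\right)\right\}} \cdot \frac{\left|\mathbf{E}\left\{K\left(\frac{x - X}{h_n}\right)\right\} - \frac{1}{n}\sum_{j=1}^n K\left(\frac{x - X_j}{h_n}\right)\right|}{\frac{1}{n}\sum_{j=1}^n K\left(\frac{x - X_j}{h_n}\right)}\\
&\le \frac{\log(n)}{c_1} \left|\frac{1}{n}\sum_{i=1}^n Y_i 1_{\{|Y_i| \le n^s\}} K\left(\frac{x - X_i}{h_n}\right) - \mathbf{E}\left\{Y 1_{\{|Y| \le n^s\}} K\left(\frac{x - X}{h_n}\right)\right\}\right|\\
&\quad + \frac{c_2\, \mathbf{E}\left\{|Y| 1_{\{|Y| \le n^s\}}\right\}}{c_1\, \mathbf{P}_X(S_{r \cdot h_n}(x))} \cdot \frac{\log(n)}{c_1} \left|\frac{1}{n}\sum_{j=1}^n K\left(\frac{x - X_j}{h_n}\right) - \mathbf{E}\left\{K\left(\frac{x - X}{h_n}\right)\right\}\right|\\
&\le \frac{1}{c_1} \cdot \frac{\log(n)^2}{n^{1/2 - s}} \cdot \frac{n^{1/2 - s}}{\log(n)} \sup_{z \in \mathbb{R}^d} \left|\frac{1}{n}\sum_{i=1}^n Y_i 1_{\{|Y_i| \le n^s\}} K\left(\frac{z - X_i}{h_n}\right) - \mathbf{E}\left\{Y 1_{\{|Y| \le n^s\}} K\left(\frac{z - X}{h_n}\right)\right\}\right|\\
&\quad + \frac{c_2}{c_1^2} \cdot \frac{\frac{\log(n)^3}{n^{1/2}}\, \mathbf{E}\left\{|Y| 1_{\{|Y| \le n^s\}}\right\}}{1 - \log(n) \sup_{z \in \mathbb{R}^d} \left|\mu_n(S_{r \cdot h_n}(z)) - \mathbf{P}_X(S_{r \cdot h_n}(z))\right|} \cdot \frac{n^{1/2}}{\log(n)} \sup_{z \in \mathbb{R}^d} \left|\frac{1}{n}\sum_{j=1}^n K\left(\frac{z - X_j}{h_n}\right) - \mathbf{E}\left\{K\left(\frac{z - X}{h_n}\right)\right\}\right|.
\end{aligned}$$
Because of
$$\frac{\log(n)^3}{n^{1/2}}\, \mathbf{E}\left\{|Y| 1_{\{|Y| \le n^s\}}\right\} \le \frac{\log(n)^3}{n^{1/2}}\, \mathbf{E}\{|Y|\} \to 0 \quad (n \to \infty)$$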
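and
$$\liminf_{n \to \infty} \left(1 - \log(n) \sup_{x \in \mathbb{R}^d} \left|\mu_n(S_{r \cdot h_n}(x)) - \mathbf{P}_X(S_{r \cdot h_n}(x))\right|\right) > 0$$
(which follows from (11)), we conclude from (12) and (13) that (14) also holds for $i = 3$. For the first term note that $\log(n)^2 / n^{1/2 - s} \to 0$, since $s < 1/2$.

Summarizing the above results we see that
$$\begin{aligned}
&\left|\hat M_n - \sup_{x \in \mathbb{R}^d :\ \mu_n(S_{r \cdot h_n}(x)) \ge 1/\log(n)} \frac{\mathbf{E}\left\{Y K\left(\frac{x - X}{h_n}\right)\right\}}{\mathbf{E}\left\{K\left(\frac{x - X}{h_n}\right)\right\}}\right|\\
&\le \sup_{x \in \mathbb{R}^d :\ \mu_n(S_{r \cdot h_n}(x)) \ge 1/\log(n)} \left|\frac{\sum_{i=1}^n Y_i K\left(\frac{x - X_i}{h_n}\right)}{\sum_{j=1}^n K\left(\frac{x - X_j}{h_n}\right)} - \frac{\mathbf{E}\left\{Y K\left(\frac{x - X}{h_n}\right)\right\}}{\mathbf{E}\left\{K\left(\frac{x - X}{h_n}\right)\right\}}\right| \to 0 \quad \text{a.s.},
\end{aligned}$$
hence it suffices to show
$$\sup_{x \in \mathbb{R}^d :\ \mu_n(S_{r \cdot h_n}(x)) \ge 1/\log(n)} \frac{\mathbf{E}\left\{Y K\left(\frac{x - X}{h_n}\right)\right\}}{\mathbf{E}\left\{K\left(\frac{x - X}{h_n}\right)\right\}} \to \operatorname{ess\,sup}(m) \quad \text{a.s.},$$
i.e. (because $\mu(A) = 0$ implies $\mu_n(A) = 0$ a.s.)
$$Z_n := \sup_{x \in \mathbb{R}^d :\ \mu(S_{r \cdot h_n}(x)) > 0,\ \mu_n(S_{r \cdot h_n}(x)) \ge 1/\log(n)} \frac{\mathbf{E}\left\{Y K\left(\frac{x - X}{h_n}\right)\right\}}{\mathbf{E}\left\{K\left(\frac{x - X}{h_n}\right)\right\}} \to \operatorname{ess\,sup}(m) \quad \text{a.s.} \tag{15}$$
To show this, we first observe that for any $n \in \mathbb{N}$ and any $x \in \mathbb{R}^d$ satisfying $\mu(S_{r \cdot h_n}(x)) > 0$ we have
$$\frac{\mathbf{E}\left\{Y K\left(\frac{x - X}{h_n}\right)\right\}}{\mathbf{E}\left\{K\left(\frac{x - X}{h_n}\right)\right\}} = \frac{\mathbf{E}\left\{m(X) K\left(\frac{x - X}{h_n}\right)\right\}}{\mathbf{E}\left\{K\left(\frac{x - X}{h_n}\right)\right\}} \le \operatorname{ess\,sup}(m) \cdot \frac{\mathbf{E}\left\{K\left(\frac{x - X}{h_n}\right)\right\}}{\mathbf{E}\left\{K\left(\frac{x - X}{h_n}\right)\right\}} = \operatorname{ess\,sup}(m).$$
Thus
$$\limsup_{n \to \infty} Z_n \le \operatorname{ess\,sup}(m). \tag{16}$$
Next we note that
$$\log(n)\, \mu_n(S_{r \cdot h_n}(x)) = \log(n) \left(\mu_n(S_{r \cdot h_n}(x)) - \mu(S_{r \cdot h_n}(x))\right) + \log(n)\, \mu(S_{r \cdot h_n}(x)) \to \infty \quad \text{a.s.} \mod \mu,$$
because according to (11)
$$\log(n) \left(\mu_n(S_{r \cdot h_n}(x)) - \mu(S_{r \cdot h_n}(x))\right) \to 0 \quad \text{a.s.}$$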
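and
$$\log(n)\, \mu(S_{r \cdot h_n}(x)) \ge c(x) \log(n)\, h_n^d \mod \mu$$
for some $c(x) > 0$ and all sufficiently large $n$ (cf. Devroye (1981) or Györfi et al. (2002), Lemma 24.6), and by (8) $\log(n)\, h_n^d \to \infty$ $(n \to \infty)$. Therefore we have with probability one
$$\log(n)\, \mu_n(S_{r \cdot h_n}(x)) \ge 1 \quad \text{for sufficiently large } n \mod \mu. \tag{17}$$
Define the random set $B$ by
$$B := \left\{x \in \mathbb{R}^d : \log(n)\, \mu_n(S_{r \cdot h_n}(x)) \ge 1 \text{ for sufficiently large } n\right\}.$$
Then, according to (17) and Fubini's theorem, we have $\mu(B) = 1$ with probability one. Set
$$H := \left\{z \in \mathbb{R}^d : \mu(S_{r \cdot h_n}(z)) > 0 \text{ for all } n \in \mathbb{N} \text{ and } \frac{\int m(x) K\left(\frac{x - z}{h_n}\right) \mu(dx)}{\int K\left(\frac{x - z}{h_n}\right) \mu(dx)} \to m(z) \ (n \to \infty)\right\}.$$
By Lemma 24.8 in Györfi et al. (2002) we get $\mu(H) = 1$. For every $z \in B \cap H$ we have
$$\liminf_{n \to \infty} Z_n \ge \liminf_{n \to \infty} \frac{\mathbf{E}\left\{Y K\left(\frac{z - X}{h_n}\right)\right\}}{\mathbf{E}\left\{K\left(\frac{z - X}{h_n}\right)\right\}} = \liminf_{n \to \infty} \frac{\mathbf{E}\left\{m(X) K\left(\frac{z - X}{h_n}\right)\right\}}{\mathbf{E}\left\{K\left(\frac{z - X}{h_n}\right)\right\}} = \liminf_{n \to \infty} \frac{\int m(x) K\left(\frac{x - z}{h_n}\right) \mu(dx)}{\int K\left(\frac{x - z}{h_n}\right) \mu(dx)} = m(z).$$
Because of $\mu(B \cap H) = 1$ this implies
$$\liminf_{n \to \infty} Z_n \ge m(z) \mod \mu.$$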
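But from this we can conclude
$$\liminf_{n \to \infty} Z_n \ge \operatorname{ess\,sup}(m) \quad \text{a.s.}, \tag{18}$$
since by definition (1) no real number $t < \operatorname{ess\,sup}(m)$ satisfies $m(z) \le t$ for $\mu$-almost all $z$. Thus (18) and (16) imply (15), which completes the proof of (9).

b) Because of (17) and the definition of $\hat X_n$ we can assume in the sequel w.l.o.g. that $\mu_n(S_{r \cdot h_n}(\hat X_n)) \ge 1/\log(n)$. Using the uniform continuity of $m$, $\mu(S_{r \cdot h_n}(\hat X_n)) > 0$ a.s., $K(u) = 0$ for $\|u\| > \delta$ for some $\delta > 0$, and $h_n \to 0$ $(n \to \infty)$, we get
$$\begin{aligned}
\left|\frac{\int m(x) K\left(\frac{\hat X_n - x}{h_n}\right) \mu(dx)}{\int K\left(\frac{\hat X_n - x}{h_n}\right) \mu(dx)} - m(\hat X_n)\right|
&\le \frac{\int |m(x) - m(\hat X_n)|\, K\left(\frac{\hat X_n - x}{h_n}\right) \mu(dx)}{\int K\left(\frac{\hat X_n - x}{h_n}\right) \mu(dx)}\\
&\le \sup_{x, z \in \mathbb{R}^d,\ \|x - z\| \le \delta h_n} |m(x) - m(z)| \to 0 \quad (n \to \infty),
\end{aligned}$$
hence in order to prove (10) it suffices to show
$$\frac{\int m(x) K\left(\frac{\hat X_n - x}{h_n}\right) \mu(dx)}{\int K\left(\frac{\hat X_n - x}{h_n}\right) \mu(dx)} \to \operatorname{ess\,sup}(m) \quad \text{a.s.}$$
As in the proof of (16) we get
$$\frac{\int m(x) K\left(\frac{\hat X_n - x}{h_n}\right) \mu(dx)}{\int K\left(\frac{\hat X_n - x}{h_n}\right) \mu(dx)} \le \operatorname{ess\,sup}(m) \cdot \frac{\int K\left(\frac{\hat X_n - x}{h_n}\right) \mu(dx)}{\int K\left(\frac{\hat X_n - x}{h_n}\right) \mu(dx)} = \operatorname{ess\,sup}(m).$$
Set
$$A_n := \sup_{x \in \mathbb{R}^d :\ \mu_n(S_{r \cdot h_n}(x)) \ge 1/\log(n)} \left|\frac{\sum_{i=1}^n Y_i K\left(\frac{x - X_i}{h_n}\right)}{\sum_{j=1}^n K\left(\frac{x - X_j}{h_n}\right)} - \frac{\mathbf{E}\left\{Y K\left(\frac{x - X}{h_n}\right)\right\}}{\mathbf{E}\left\{K\left(\frac{x - X}{h_n}\right)\right\}}\right|.$$
By definition of $\hat X_n$ we have for any $z \in \mathbb{R}^d$ satisfying $\mu_n(S_{r \cdot h_n}(z)) \ge 1/\log(n)$ and $\mu(S_{r \cdot h_n}(z)) > 0$
$$\begin{aligned}
\frac{\int m(x) K\left(\frac{\hat X_n - x}{h_n}\right) \mu(dx)}{\int K\left(\frac{\hat X_n - x}{h_n}\right) \mu(dx)}
&\ge \frac{\sum_{i=1}^n Y_i K\left(\frac{\hat X_n - X_i}{h_n}\right)}{\sum_{i=1}^n K\left(\frac{\hat X_n - X_i}{h_n}\right)} - A_n
\ge \frac{\sum_{i=1}^n Y_i K\left(\frac{z - X_i}{h_n}\right)}{\sum_{i=1}^n K\left(\frac{z - X_i}{h_n}\right)} - A_n\\
&\ge \frac{\int m(x) K\left(\frac{z - x}{h_n}\right) \mu(dx)}{\int K\left(\frac{z - x}{h_n}\right) \mu(dx)} - 2 A_n,
\end{aligned}$$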
which implies that we have with probability one
$$\frac{\int m(x) K\left(\frac{\hat X_n - x}{h_n}\right) \mu(dx)}{\int K\left(\frac{\hat X_n - x}{h_n}\right) \mu(dx)} \ge Z_n - 2 A_n.$$
From $A_n \to 0$ a.s. (cf. part a) of the proof of Theorem 1) and (18) we conclude
$$\liminf_{n \to \infty} \frac{\int m(x) K\left(\frac{\hat X_n - x}{h_n}\right) \mu(dx)}{\int K\left(\frac{\hat X_n - x}{h_n}\right) \mu(dx)} \ge \operatorname{ess\,sup}(m) \quad \text{a.s.},$$
which implies the assertion.

References

[1] Bosq, D. (1996). Nonparametric Statistics for Stochastic Processes. Lecture Notes in Statistics 110, Springer, New York.
[2] Devroye, L. (1978a). The uniform convergence of the Nadaraya-Watson regression function estimate. Canadian Journal of Statistics 6, pp. 179-191.
[3] Devroye, L. (1978b). The uniform convergence of nearest neighbor regression function estimators and their application in optimization. IEEE Transactions on Information Theory 24, pp. 142-151.
[4] Devroye, L. (1981). On the almost everywhere convergence of nonparametric regression function estimates. Annals of Statistics 9, pp. 1310-1319.
[5] Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer-Verlag, New York.
[6] Franke, J. and Diagne, M. (2006). Estimating market risk with neural networks. Statistics & Decisions 24, pp. 233-253.
[7] Györfi, L., Kohler, M., Krzyżak, A. and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer Series in Statistics, Springer-Verlag, New York.
[8] Härdle, W. and Luckhaus, S. (1984). Uniform consistency of a class of regression function estimates. Annals of Statistics 12, pp. 612-623.
[9] Kohler, M., Krzyżak, A. and Walk, H. (2003). Strong consistency of automatic kernel regression estimates. Annals of the Institute of Statistical Mathematics 55, pp. 287-308.