Nonparametric estimation of a maximum of quantiles


Georg C. Enss¹, Benedict Götz¹, Michael Kohler², Adam Krzyżak³ and Roland Platz⁴

¹ Fachgebiet Systemzuverlässigkeit und Maschinenakustik SzM, Technische Universität Darmstadt, Magdalenenstr. 4, Darmstadt, Germany, enss, goetz@szm.tu-darmstadt.de
² Fachbereich Mathematik, Technische Universität Darmstadt, Schlossgartenstr. 7, Darmstadt, Germany, kohler@mathematik.tu-darmstadt.de
³ Department of Computer Science and Software Engineering, Concordia University, 1455 De Maisonneuve Blvd. West, Montreal, Quebec, Canada H3G 1M8, krzyzak@cs.concordia.ca
⁴ Fraunhofer Institute for Structural Durability and System Reliability LBF, Bartningstr. 47, Darmstadt, Germany, roland.platz@lbf.fraunhofer.de

May 16, 2014

Abstract

A simulation model of a complex system is considered, for which the outcome is described by $m(p, X)$, where $p$ is a parameter of the system, $X$ is a random input of the system and $m$ is a real-valued function. The maximum (with respect to $p$) of the quantiles of $m(p, X)$ is estimated. The quantiles of $m(p, X)$ of a given level are estimated for various values of $p$ by an order statistic of values $m(p_i, X_i)$, where $X, X_1, X_2, \dots$ are independent and identically distributed and where $p_i$ is close to $p$, and the maximal quantile is estimated by the maximum of these quantile estimates. Under assumptions on the smoothness of the function describing the dependency of the values of the quantiles on the parameter $p$, the rate of convergence of this estimate is analyzed. The finite sample size behavior of the estimate is illustrated by simulated data and by applying it in a simulation model of a real mechanical system.

AMS classification: Primary 62G05; secondary 62G30.

Key words and phrases: Conditional quantile estimation, maximal quantile, rate of convergence, supremum norm error.

Corresponding author. Tel: ext. 3007, Fax:

Running title: Estimation of the maximal quantile

1 Introduction

We consider a simulation model of a complex system described by

$$Y = m(p, X),$$

where $p$ is a parameter of the system from some compact subset $\mathcal{P}$ of $\mathbb{R}^l$, $X$ is an $\mathbb{R}^d$-valued random variable with known distribution and $m: \mathcal{P} \times \mathbb{R}^d \to \mathbb{R}$ is a given (measurable) function. Let

$$G_p(y) = \mathbf{P}\{m(p, X) \le y\}$$

be the cumulative distribution function (cdf) of $Y$ for a given value of $p$. For $\alpha \in (0, 1)$ let

$$q_{p,\alpha} = \min\{y \in \mathbb{R} : G_p(y) \ge \alpha\}$$

be the $\alpha$-quantile of $m(p, X)$. Using at most $n$ function evaluations $m(p_i, x_i)$ of $m$ for arbitrarily chosen values $(p_1, x_1), \dots, (p_n, x_n) \in \mathcal{P} \times \mathbb{R}^d$, we are interested in estimating the maximal $\alpha$-quantile

$$\sup_{p \in \mathcal{P}} q_{p,\alpha}.$$

This problem is motivated by research carried out at the Collaborative Research Centre SFB 805 at the Technische Universität Darmstadt, which focuses on uncertainty in load-carrying systems such as truss structures. The truss structure under consideration is a typical system for load distribution in mechanical engineering (cf. Figure 1). It is made of ten nodes and 24 equal beams with circular cross sections and fixed boundary conditions. Four equal tetrahedron structures may be assembled into modules. The truss is supported statically determined at nodes 2, 3 and 4. A single force $F_z$ in $z$-direction acts at node 1 and two symmetric moments $M$ act at nodes 6 and 7 as shown in Figure 1.

Figure 1: Truss structure

Due to the non-symmetric loading, the maximum equivalent stress $\sigma_v$ (cf., e.g., Gross et al. (2007)) is not equal in every beam. It is influenced not only by the non-symmetric loading via $F_z$ and $M$ but also by the beam's material parameter Young's modulus $E$ as well as the geometric parameters radius $r$ and length $l$. We assume that from the manufacturing process of the truss structure we know for each beam parameter $E$, $r$ and $l$ the corresponding ranges $[a, b]$. The parameter vector $p = (E, r, l)$ describes the parameters for all 24 equal beams within the truss structure. The occurring random force $F_z$ is normally distributed and the moments $M$ have a truncated normal distribution restricted to the positive half-axis. A numerical finite element model is built that calculates the maximum equivalent stresses $Y = \sigma_{v,1}$, $Y = \sigma_{v,2}$ and $Y = \sigma_{v,3}$ in the beams 1, 2 and 3, resp., which are the members with the highest equivalent stresses in this specific truss structure. So our parameter set is

$$\mathcal{P} = [a_E, b_E] \times [a_r, b_r] \times [a_l, b_l].$$

The function $m: \mathcal{P} \times \mathbb{R}_+ \to \mathbb{R}_+$ describes the physical model of the truss structure and $Y = m(p, X)$ is the occurring maximum equivalent stress in beam 1, 2 and 3, resp., of the truss structure with load vector $X = (F_z, M)$.

We are interested in quantifying the uncertainty in the random quantity $Y = m(p, X)$, which we characterize by determining an interval in which the value of $Y$ is contained with high probability. Such an interval can be determined by computing quantiles of $Y$ for $\alpha$ close to one (leading to the right end point of the interval) and $\alpha$ close to zero (leading to the left end point of the interval). However, we do not know the exact value of $p \in \mathcal{P}$. Hence, instead of computing the right end point of the interval by a quantile of $Y$, we compute for each $p \in \mathcal{P}$ the $\alpha$-quantile $q_{p,\alpha}$ of $m(p, X)$ and use

$$\sup_{p \in \mathcal{P}} q_{p,\alpha}$$

for $\alpha$ close to one as the right end point of the interval. Similarly we can construct the left end point of the interval by computing

$$\inf_{p \in \mathcal{P}} q_{p,\alpha}$$

for $\alpha$ close to zero. In the sequel we describe how we can define estimates of these quantities. In order to simplify the notation we consider only $\sup_{p \in \mathcal{P}} q_{p,\alpha}$; the other value can be estimated similarly.

For $p \in \mathcal{P}$ we estimate the $\alpha$-quantile of $G_p$ by the $\alpha$-quantile of an empirical cumulative distribution function corresponding to data points $m(p_i, X_i)$, where $p_i$ is close to $p$ and $X, X_1, X_2, \dots$ are independent and identically distributed random variables. Then we estimate the maximum quantile by the maximum of these quantile estimates. Under suitable assumptions on the smoothness of the function describing the dependency of the quantiles on the parameter, we analyze the rate of convergence of this estimate. The finite sample size behavior of the estimate is illustrated using simulated data. In our proofs the main trick is to relate the error of our quantile estimates to the behavior of the order statistics of a fixed sequence of independent random variables uniformly distributed on $(0, 1)$.

The estimation problem in this paper is estimation of a quantile function in a fixed design regression problem. Usually this kind of problem is studied in the literature in the context of random design regression, see, e.g., Yu, Lu and Stander (2003), Koenker (2005), and the literature cited therein. In the context of random design regression, rate of convergence results for local averaging estimates of conditional quantiles have been derived in Bhattacharya and Gangopadhyay (1990). Results concerning estimates based on support vector machines can be found in Steinwart and Christmann (2011). The problem of estimating the maximal quantile was not considered in the papers mentioned above.

The concept of local averaging considered in this paper is also very popular in the context of nonparametric regression, cf., e.g., Nadaraya (1964, 1970), Watson (1964), Stone (1977), Devroye and Wagner (1980), Györfi (1981), Devroye (1982), Devroye and Krzyżak (1989), Devroye, Györfi, Krzyżak and Lugosi (1994) or Beirlant and Györfi (1998).

Throughout the paper we use the following notation: $\mathbb{N}$, $\mathbb{R}$ and $\mathbb{R}_+$ are the sets of positive integers, real numbers, and nonnegative real numbers, resp. For $z \in \mathbb{R}$ we denote the smallest integer greater than or equal to $z$ by $\lceil z \rceil$. For a set $A$ its indicator function (which takes on value one on $A$ and zero outside of $A$) is denoted by $I_A$. $\|x\|_\infty = \max\{|x^{(1)}|, \dots, |x^{(d)}|\}$ is the supremum norm of a vector $x = (x^{(1)}, \dots, x^{(d)})^T \in \mathbb{R}^d$. For $z_1, \dots, z_n \in \mathbb{R}$ the corresponding order statistics are denoted by $z_{1:n}, \dots, z_{n:n}$, i.e., $z_{1:n}, \dots, z_{n:n}$ is a permutation of $z_1, \dots, z_n$ satisfying $z_{1:n} \le \dots \le z_{n:n}$. Similarly, the order statistics of real-valued random variables are defined. If $Z_n$ are real-valued random variables and $\delta_n > 0$ are positive real numbers, we write $Z_n = O_P(\delta_n)$ if

$$\lim_{c \to \infty} \limsup_{n \to \infty} \mathbf{P}\{|Z_n| > c \cdot \delta_n\} = 0.$$

The definition of our estimate of the maximum quantile is given in Section 2, the main result is formulated in Section 3 and proven in Section 5. The finite sample size behaviour of the estimate is illustrated in Section 4 using simulated data.

2 Definition of the estimate

In order to simplify the notation we assume in the sequel for all of our theoretical considerations that the set of parameters is given by $\mathcal{P} = [0, 1]^l$. Let $X, X_1, X_2, \dots$ be independent and identically distributed and let $p_1, \dots, p_n$ be equidistantly chosen from $\mathcal{P}$. In the sequel we estimate the maximum quantile $\sup_{p \in \mathcal{P}} q_{p,\alpha}$ from the data

$$D_n = \{((p_1, X_1), m(p_1, X_1)), \dots, ((p_n, X_n), m(p_n, X_n))\}.$$
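As an illustration only, a data set of this form could be generated as in the following Python sketch. The midpoint grid, the standard normal inputs and the function `toy_model` are assumptions made for the example; the paper does not specify how the equidistant points are placed, and `toy_model` is merely a stand-in for the simulation model $m$.

```python
import itertools
import random


def equidistant_grid(k, l):
    """Return k**l points of [0, 1]^l (cell midpoints; an assumed layout)."""
    axis = [(j + 0.5) / k for j in range(k)]
    return list(itertools.product(axis, repeat=l))


def simulate_data(m, k, l, d, rng):
    """One evaluation m(p_i, X_i) per grid point, with independent X_i ~ N(0, I_d)."""
    data = []
    for p in equidistant_grid(k, l):
        x = tuple(rng.gauss(0.0, 1.0) for _ in range(d))
        data.append((p, x, m(p, x)))
    return data


def toy_model(p, x):
    """Hypothetical stand-in for the simulation model m(p, x)."""
    return sum(p) + sum(x)


# n = 10**2 = 100 triples ((p_i, X_i), m(p_i, X_i)) with l = 2 and d = 3.
data = simulate_data(toy_model, k=10, l=2, d=3, rng=random.Random(0))
print(len(data))
```

Each entry of `data` holds the parameter $p_i$, the input $X_i$ and the evaluation $m(p_i, X_i)$, so the number of model evaluations equals the number of grid points.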

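For $p \in \mathcal{P}$ we estimate the cumulative distribution function $G_p(y) = \mathbf{P}\{m(p, X) \le y\}$ by the empirical cumulative distribution function based on all those $m(p_i, X_i)$ where the distance between $p_i$ and $p$ in supremum norm is at most $h_n$. Here $h_n > 0$ is a parameter of the estimate. In order to define this estimate, let $K = I_{[-1,1]^l}$ be the naive kernel. Then

$$\hat{G}_p(y) = \frac{\sum_{i=1}^n I_{\{m(p_i, X_i) \le y\}} \cdot K\left(\frac{p - p_i}{h_n}\right)}{\sum_{j=1}^n K\left(\frac{p - p_j}{h_n}\right)}$$

is our estimate of $G_p(y)$. Next, the quantile

$$q_{p,\alpha} = \min\{y \in \mathbb{R} : G_p(y) \ge \alpha\}$$

is estimated by the plug-in estimate

$$\hat{q}_{p,\alpha} = \min\{y \in \mathbb{R} : \hat{G}_p(y) \ge \alpha\}.$$

Finally, we estimate the maximal quantile $\sup_{p \in \mathcal{P}} q_{p,\alpha}$ by the maximum of the estimated quantiles $\hat{q}_{p,\alpha}$ for $p \in \{p_1, \dots, p_n\}$, i.e., by

$$\hat{M}_n = \max_{i=1,\dots,n} \hat{q}_{p_i,\alpha}. \qquad (1)$$

3 Main results

Our main result is the following bound on the error of our maximal quantile estimate.

Theorem 1 Let $\alpha \in (0, 1)$, let $\mathcal{P} = [0, 1]^l$ and let $m: \mathcal{P} \times \mathbb{R}^d \to \mathbb{R}$ be a measurable function. Let $X$ be an $\mathbb{R}^d$-valued random variable and for $p \in \mathcal{P}$ let $G_p$ be the cumulative distribution function of $m(p, X)$. Let

$$q_{p,\alpha} = \min\{y \in \mathbb{R} : G_p(y) \ge \alpha\}$$

be the $\alpha$-quantile of $m(p, X)$. Assume that there exists $\delta > 0$ such that for some $c_1 > 0$ and for all $p \in \mathcal{P}$ the derivative of $G_p$ exists on $[q_{p,\alpha} - \delta, q_{p,\alpha} + \delta]$, is continuous on this interval and is greater than $c_1$. Let the estimate $\hat{M}_n$ be defined as in (1) for some $h_n > 0$ satisfying

$$h_n \to 0 \quad (n \to \infty) \quad \text{and} \quad n \cdot h_n^l \to \infty \quad (n \to \infty).$$

Then

$$\left| \hat{M}_n - \sup_{p \in \mathcal{P}} q_{p,\alpha} \right| = O_P\left( \frac{\log(n)}{\sqrt{n \cdot h_n^l}} + \sup_{p_1, p_2 \in \mathcal{P} : \|p_1 - p_2\|_\infty \le h_n} |q_{p_1,\alpha} - q_{p_2,\alpha}| \right).$$

If we impose some smoothness condition on the function describing the dependency of the values of the quantiles on the parameter $p$, we can derive from the above theorem the following rate of convergence result.

Corollary 1 Assume that the assumptions of Theorem 1 hold and that, in addition, $q_{p,\alpha}$ is $(r, C)$-smooth as a function of $p \in \mathcal{P}$ for some $r \in (0, 1]$ and $C > 0$, i.e.,

$$|q_{p_1,\alpha} - q_{p_2,\alpha}| \le C \cdot \|p_1 - p_2\|_\infty^r \quad (p_1, p_2 \in \mathcal{P}).$$

Set

$$h_n = c \cdot \left( \frac{\log^2(n)}{n} \right)^{1/(2r+l)}$$

and define the estimate $\hat{M}_n$ as in Section 2. Then

$$\left| \hat{M}_n - \sup_{p \in \mathcal{P}} q_{p,\alpha} \right| = O_P\left( \left( \frac{\log^2(n)}{n} \right)^{r/(2r+l)} \right).$$

Proof. Since $q_{p,\alpha}$ is $(r, C)$-smooth as a function of $p \in \mathcal{P}$, we can conclude from Theorem 1

$$\left| \hat{M}_n - \sup_{p \in \mathcal{P}} q_{p,\alpha} \right| = O_P\left( \frac{\log(n)}{\sqrt{n \cdot h_n^l}} + C \cdot h_n^r \right).$$

The definition of $h_n$ implies the result. Indeed, for this choice of $h_n$ both terms on the right-hand side are of order $(\log^2(n)/n)^{r/(2r+l)}$.

Remark 1. The rate of convergence in Corollary 1 is (up to some logarithmic factor) the same as the optimal minimax rate of convergence for estimation of an $(r, C)$-smooth function on a compact subset of $\mathbb{R}^l$ in supremum norm derived in Stone (1982).

Remark 2. In any application of the above estimate we have to select the bandwidth $h_n$ of the estimate in a data-driven way. We suggest choosing the bandwidth in an optimal way in view of estimation of $G_p(y)$ by $\hat{G}_p(y)$ $(p \in \mathcal{P})$ for suitably chosen $y \in \mathbb{R}$. To do this, we propose to use a version of the well-known splitting of the sample technique in nonparametric regression (cf., e.g., Chapter 7 in Györfi et al. (2002)). More precisely, let us assume that we have available additional random variables $\bar{X}_1, \dots, \bar{X}_n$ such that $X, X_1, \dots, X_n, \bar{X}_1, \dots, \bar{X}_n$ are independent and identically distributed, and that we observe $m(p_i, \bar{X}_i)$ for these random variables. We choose $y$ as the $\alpha$-quantile of the empirical cumulative distribution function corresponding to $m(p_1, X_1), \dots, m(p_n, X_n)$, and choose $h$ by minimizing

$$\frac{1}{n} \sum_{i=1}^n \left| I_{\{m(p_i, \bar{X}_i) \le y\}} - \hat{G}_{p_i}(y) \right|^2.$$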
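The estimate of Section 2 and the bandwidth choice of Remark 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: all helper names are hypothetical, the toy model $m(p, x) = p + x$ with standard normal $X$ and the sample size are chosen only for the demonstration, and the quadratic-time loops favour readability over speed.

```python
import math
import random


def sup_dist(a, b):
    """Supremum-norm distance between two parameter vectors."""
    return max(abs(s - t) for s, t in zip(a, b))


def neighbours(p, points, values, h):
    """Values m(p_i, X_i) whose parameter p_i lies within sup-norm distance h of p."""
    return [v for q, v in zip(points, values) if sup_dist(p, q) <= h]


def local_cdf(p, y, points, values, h):
    """Naive-kernel estimate of G_p(y): share of nearby values that are <= y."""
    near = neighbours(p, points, values, h)
    return sum(v <= y for v in near) / len(near)


def empirical_quantile(sample, alpha):
    """Smallest y with empirical cdf >= alpha, i.e. order statistic ceil(N * alpha)."""
    z = sorted(sample)
    return z[max(math.ceil(alpha * len(z)), 1) - 1]


def max_quantile(points, values, h, alpha):
    """Maximum over the design points of the local quantile estimates."""
    return max(empirical_quantile(neighbours(p, points, values, h), alpha)
               for p in points)


def choose_bandwidth(points, values, extra, bandwidths, alpha):
    """Sample splitting: extra[i] is the held-out evaluation m(p_i, X_bar_i)."""
    y_ref = empirical_quantile(values, alpha)   # pooled alpha-quantile

    def risk(h):
        return sum((float(e <= y_ref) - local_cdf(p, y_ref, points, values, h)) ** 2
                   for p, e in zip(points, extra)) / len(points)

    return min(bandwidths, key=risk)


# Toy model (an assumption of this demo): m(p, x) = p + x on P = [0, 1].
rng = random.Random(1)
n = 400
points = [((i + 0.5) / n,) for i in range(n)]
values = [p[0] + rng.gauss(0.0, 1.0) for p in points]
extra = [p[0] + rng.gauss(0.0, 1.0) for p in points]
h = choose_bandwidth(points, values, extra, [0.01, 0.05, 0.1, 0.2, 0.4], alpha=0.95)
print(h, max_quantile(points, values, h, alpha=0.95))
```

Because every design point is its own neighbour, the local samples are never empty, so `local_cdf` and `empirical_quantile` are always well defined at the points $p_1, \dots, p_n$.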

In the next section we will investigate the performance of this data-driven way of choosing the bandwidth using simulated data.

4 Application to simulated data

In this section we illustrate the finite sample size behaviour of our estimate (in particular in view of the data-driven choice of the bandwidth in Remark 2) with the aid of simulated data. In all simulations we choose the sample size for the estimate of the maximal quantile as $n = 20{,}000$, where we use half of the data to choose the bandwidth of the estimate from a finite set of bandwidths as described in Remark 2. In each simulation we compute the absolute difference between the true value of the maximal quantile and two estimates: first, the estimate with the data-driven choice of the bandwidth, and second, the estimate with sample size $n/2$ applied with that value of the bandwidth which leads to the minimal error for the given set of data. The level of the quantile is chosen as $\alpha = 0.95$. We repeat each simulation 100 times and report the mean values and the standard deviations of the resulting 100 error values for the two estimates.

In our first example we choose $X$ as a standard normally distributed random variable, $\mathcal{P} = [0, 1]$ and define

$$m(p, X) = 0.5 \cdot \sin(2 \pi p) + X.$$

Hence $m(p, X)$ is normally distributed with mean $0.5 \cdot \sin(2 \pi p)$ and variance 1. In this case the $\alpha = 0.95$ quantile of $m(p, X)$ is given by

$$q_{p,\alpha} = 0.5 \cdot \sin(2 \pi p) + z_\alpha,$$

where $z_\alpha \approx 1.645$ is the $0.95$-quantile of the standard normal distribution. We choose in this example the bandwidth from the set $\{0.01, 0.05, 0.1, 0.15, 0.2, 0.3, 0.4\}$. In Figure 2 we show a typical simulation for the two quantile estimates using the data-driven bandwidth and the optimal bandwidth, respectively. The mean values (standard deviations) of our errors are in case of our data-driven choice of the bandwidth approximately given by (0.034) as compared to (0.015) in case of the optimal bandwidth choice, which is not applicable in practice since it depends on the unknown maximal value.
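For this first model the true maximal quantile, against which the errors are measured, can be computed directly. The following sketch uses only the Python standard library; the grid resolution and the names `q` and `p_star` are arbitrary choices made for the illustration.

```python
import math
from statistics import NormalDist

alpha = 0.95
z_alpha = NormalDist().inv_cdf(alpha)   # alpha-quantile of N(0, 1), about 1.645


def q(p):
    """alpha-quantile of m(p, X) = 0.5 * sin(2 * pi * p) + X for standard normal X."""
    return 0.5 * math.sin(2 * math.pi * p) + z_alpha


# Maximize over a fine grid of P = [0, 1]; the maximum is attained at p = 0.25.
grid = [i / 1000 for i in range(1001)]
p_star = max(grid, key=q)
print(p_star, round(q(p_star), 4))   # 0.25 2.1449
```

Since the sine term is maximal at $p = 0.25$, the maximal quantile equals $0.5 + z_\alpha$, which is the reference value for the absolute errors in this model.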
Hence, in this first example, the mean absolute error of our estimate using the data-driven choice of the bandwidth is only approximately 2.5 times larger than the mean absolute error of the estimate using the optimal choice of the bandwidth, which is not known in practice.

Figure 2: Typical simulation in the first model (panels: with data-driven bandwidth 0.1; with optimal bandwidth)

In our second example we again choose a random variable $X$ with a standard normal distribution and $\mathcal{P} = [0, 1]$, but this time we define $m$ by

$$m(p, x) = \exp(p + x),$$

which implies that $m(p, X)$ is lognormally distributed and its logarithm has mean $p$ and variance 1. Again the $\alpha$-quantiles are known and can be computed with a statistics package. The bandwidth is chosen from the same set as in the first example. In Figure 3 we show a typical simulation for the two quantile estimates using the data-driven bandwidth and the optimal bandwidth, respectively. In this example the mean values (standard deviations) of our errors are in case of our data-driven choice of the bandwidth approximately given by (0.517) as compared to (0.285) in case of the optimal bandwidth choice. Hence the mean error of the data-driven bandwidth choice is within a factor of 2.5 of the mean value of the optimal (and in practice not available) bandwidth choice.

Figure 3: Typical simulation in the second model (panels: with data-driven bandwidth 0.1; with optimal bandwidth)

In our third example we choose $X = (X^{(1)}, X^{(2)}, X^{(3)})^T$, where $X^{(1)}$, $X^{(2)}$ and $X^{(3)}$ are independent standard normally distributed random variables, $\mathcal{P} = [-0.5, 0.5]^3$ and

$$m(p, x) = (p^{(1)} + x^{(1)})^2 + (p^{(2)} + x^{(2)})^2 + (p^{(3)} + x^{(3)})^2.$$

In this case $m(p, X)$ is a non-central chi-square random variable with 3 degrees of freedom and noncentrality parameter $(p^{(1)})^2 + (p^{(2)})^2 + (p^{(3)})^2$. Again the $\alpha$-quantiles are known and can be computed with a statistics package. The bandwidth is chosen from the same set as above. In this example the mean values (standard deviations) of our errors are in case of our data-driven choice of the bandwidth approximately given by (0.376) as compared to (0.144) in case of the optimal bandwidth choice. Hence the mean error of the data-driven bandwidth choice is within a factor of approximately 2 of the mean value of the optimal (not available in practice) bandwidth choice.

Finally we apply our newly proposed estimation method in the application described in Section 1. The numerical simulation model of the truss structure in Figure 1 is obtained by finite elements with the commercial software ANSYS. Each of the 24 beams is modeled with the same parameters $E$, $r$ and $l$. Timoshenko beam theory is applied to all beams by using BEAM189 elements in ANSYS. The 10 connecting nodes are modeled as rigid links using MPC184 elements. Deflection is constrained at node 2 in $x$-, $y$- and $z$-direction, at node 3 in $y$- and $z$-direction and at node 4 in $z$-direction such that the truss is supported statically determined. The beam parameters $E$, $r$ and $l$ are assumed to be in the intervals given in Table 1.

Table 1: Uncertain parameters

parameter p          value                   unit
Young's modulus E    [69.7, 73.7] · 10^9     N/m^2
radius r             [0.0049, ]              m
length l             [0.3995, ]              m

Random loading is applied to node 1 as a single force $F_z$ in $z$-direction and two symmetric moments with magnitude $M$ that act around axes in the $x$-$y$-plane at an angle of $+60°$ and $-60°$ from the $x$-direction at nodes 6 and 7, respectively. The load $F_z$ is normally distributed with mean value $\mu_{F_z}$ and standard deviation $\sigma_{F_z}$. The moments $M$ have a truncated normal distribution restricted to the positive half-axis, where the underlying normal distribution has mean value $\mu_M = 0$ and standard deviation $\sigma_M$, see Table 2.

Table 2: Uncertain loads

load X       value                             unit
load F_z     µ_{F_z} = 1000, σ_{F_z} =         N
moments M    µ_M = 0, σ_M = 3.33               Nm

To obtain the maximum equivalent stresses $\sigma_{v,1}$, $\sigma_{v,2}$ and $\sigma_{v,3}$, a static analysis is carried out for each parameter set and load set. Two independent random load sets were generated for each combination of 41 values of each parameter, distributed equidistantly in the parameter intervals. In total, deterministic calculations were carried out. Application of the newly proposed estimate of the maximum 0.95-quantile to this data results in predicted maximum stresses of N/m^2, N/m^2 and N/m^2 in the three considered beams, respectively. In contrast, if we ignore the uncertainty in the parameters and just compute 0.95-quantiles using the mean values of these parameters and a data set of $41^3$ corresponding values, we get instead N/m^2, N/m^2 and N/m^2, hence we get values which are 3.8%, 4.9% and 4.9% lower, respectively.

5 Proofs

5.1 Auxiliary results

In this subsection we formulate and prove two lemmas which we will need in the proof of Theorem 1. Our first lemma shows that if we mix coupled samples from several different distributions, the empirical quantile of the mixed sample lies with probability one between the minimal and the maximal quantiles of the individual samples. Here we use the fact that we do not change the joint distribution of our samples if we assume that the random variables are defined by applying the inverse of the cumulative distribution function to uniformly distributed random variables. More precisely, our setting is the following: Let $M \in \mathbb{N}$ be the number of samples which we mix, and for $k \in \{1, \dots, M\}$ let $H^{(k)}: (0, 1) \to \mathbb{R}$ be a monotonically increasing function (which we will choose later as the inverse function of a cumulative distribution function). Let $N \in \mathbb{N}$, let $u_1, \dots, u_N \in (0, 1)$ and denote by $u_{1:N}, \dots, u_{N:N}$ the corresponding order statistics. We define $M$ different coupled samples by

$$z_i^{(k)} = H^{(k)}(u_i) \quad (i = 1, \dots, N)$$

for $k \in \{1, \dots, M\}$, and we mix them by choosing

$$z_i \in \{z_i^{(1)}, \dots, z_i^{(M)}\}$$

for $i = 1, \dots, N$. Let

$$\hat{G}^{(k)}(y) = \frac{1}{N} \sum_{i=1}^N I_{\{z_i^{(k)} \le y\}} \quad \text{and} \quad \hat{G}(y) = \frac{1}{N} \sum_{i=1}^N I_{\{z_i \le y\}}$$

be the empirical cumulative distribution functions corresponding to $z_1^{(k)}, \dots, z_N^{(k)}$ and $z_1, \dots, z_N$, resp., and let

$$\hat{q}_\alpha^{(k)} = \min\{y \in \mathbb{R} : \hat{G}^{(k)}(y) \ge \alpha\} \quad \text{and} \quad \hat{q}_\alpha = \min\{y \in \mathbb{R} : \hat{G}(y) \ge \alpha\}$$

be the corresponding quantile estimates. Then the following result holds.

Lemma 1 Define $\hat{q}_\alpha^{(k)}$ and $\hat{q}_\alpha$ as above. Then

$$\min_{k=1,\dots,M} \hat{q}_\alpha^{(k)} \le \hat{q}_\alpha \le \max_{k=1,\dots,M} \hat{q}_\alpha^{(k)}.$$

Proof. By construction we have

$$\hat{q}_\alpha^{(k)} = H^{(k)}(u_{\lceil N \alpha \rceil : N}) \qquad (2)$$

for $k \in \{1, \dots, M\}$. Furthermore, for $k_1, \dots, k_N \in \{1, \dots, M\}$ suitably chosen we have that

$$H^{(k_1)}(u_{1:N}), \dots, H^{(k_N)}(u_{N:N})$$

is a permutation of $z_1, \dots, z_N$. Choose $k_{min}, k_{max} \in \{1, \dots, M\}$ such that

$$\min_{k=1,\dots,M} \hat{q}_\alpha^{(k)} = \hat{q}_\alpha^{(k_{min})} \quad \text{and} \quad \max_{k=1,\dots,M} \hat{q}_\alpha^{(k)} = \hat{q}_\alpha^{(k_{max})}.$$

Then by (2) we have for any $k \in \{1, \dots, M\}$

$$H^{(k_{min})}(u_{\lceil N \alpha \rceil : N}) \le H^{(k)}(u_{\lceil N \alpha \rceil : N}) \le H^{(k_{max})}(u_{\lceil N \alpha \rceil : N}).$$

Using this relation and the monotonicity of $H^{(k)}$ we get for any $i \in \{1, \dots, \lceil N \alpha \rceil\}$

$$H^{(k_i)}(u_{i:N}) \le H^{(k_i)}(u_{\lceil N \alpha \rceil : N}) \le H^{(k_{max})}(u_{\lceil N \alpha \rceil : N}) = \max_{k=1,\dots,M} \hat{q}_\alpha^{(k)}$$

and for any $i \in \{\lceil N \alpha \rceil, \dots, N\}$

$$H^{(k_i)}(u_{i:N}) \ge H^{(k_i)}(u_{\lceil N \alpha \rceil : N}) \ge H^{(k_{min})}(u_{\lceil N \alpha \rceil : N}) = \min_{k=1,\dots,M} \hat{q}_\alpha^{(k)}.$$

Now, let $z_{1:N}, \dots, z_{N:N}$ be the order statistics of $z_1, \dots, z_N$. Then the above inequalities imply

$$\min_{k=1,\dots,M} \hat{q}_\alpha^{(k)} \le z_{\lceil N \alpha \rceil : N} \le \max_{k=1,\dots,M} \hat{q}_\alpha^{(k)}.$$

Since $z_{\lceil N \alpha \rceil : N} = \hat{q}_\alpha$ this completes the proof.

Our second lemma relates the error of the quantile estimate to the behavior of order statistics corresponding to some fixed sequence of independent random variables distributed uniformly on $(0, 1)$.
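Both lemmas rest on the coupling of the samples through common uniform order statistics. As a numerical sanity check, the following Python sketch verifies relation (2) and the sandwich inequality of Lemma 1 on one simulated example. The monotone maps in `H`, the sample size and the random mixing rule are hypothetical choices made only for this illustration.

```python
import math
import random


def empirical_quantile(sample, alpha):
    """Smallest y with empirical cdf >= alpha, i.e. order statistic ceil(N * alpha)."""
    z = sorted(sample)
    return z[math.ceil(alpha * len(z)) - 1]


# Hypothetical monotonically increasing maps H^(k) on (0, 1), for illustration only.
H = [lambda u: u, lambda u: u ** 2, lambda u: 2.0 * u - 0.5, math.sqrt]

rng = random.Random(0)
N, alpha = 200, 0.95
u = [rng.random() for _ in range(N)]

# Coupled samples z_i^(k) = H^(k)(u_i) and a mixed sample picking one k per index i.
coupled = [[f(ui) for ui in u] for f in H]
mixed = [rng.choice([sample[i] for sample in coupled]) for i in range(N)]

q_k = [empirical_quantile(sample, alpha) for sample in coupled]
q_mixed = empirical_quantile(mixed, alpha)
u_order = sorted(u)[math.ceil(alpha * N) - 1]

# Relation (2): each coupled quantile is H^(k) applied to the same uniform order statistic.
relation_2_holds = all(qk == f(u_order) for qk, f in zip(q_k, H))
# Lemma 1: the quantile of the mixed sample lies between the extreme coupled quantiles.
lemma_1_holds = min(q_k) <= q_mixed <= max(q_k)
print(relation_2_holds, lemma_1_holds)   # True True
```

The check is deterministic given the values $u_1, \dots, u_N$, which mirrors the fact that Lemma 1 is a statement about fixed numbers and not a probabilistic one.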

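Lemma 2 Let $p \in \mathcal{P}$ be fixed and let the estimate $\hat{q}_{p,\alpha}$ be defined as in Section 2. Assume that there exists a constant $c_1 > 0$ such that for some $\delta > 0$ we have that $G_{\bar{p}}$ is continuously differentiable on $[q_{\bar{p},\alpha} - \delta, q_{\bar{p},\alpha} + \delta]$ with derivative bounded from below by $c_1$, for all $\bar{p} \in \mathcal{P}$ satisfying $\|\bar{p} - p\|_\infty \le \delta$. Let

$$N = |\{1 \le i \le n : \|p - p_i\|_\infty \le h_n\}|$$

be the number of parameters $p_i$ which have at most supremum norm distance $h_n$ to $p$. Let $U_1, \dots, U_N$ be independent random variables distributed uniformly on $(0, 1)$ and denote their order statistics by $U_{1:N}, \dots, U_{N:N}$. Then we have for any $0 < \epsilon < \delta$ satisfying $\delta \ge h_n$:

$$\mathbf{P}\left\{ |\hat{q}_{p,\alpha} - q_{p,\alpha}| > \epsilon + \sup_{\bar{p} \in \mathcal{P} : \|\bar{p} - p\|_\infty \le h_n} |q_{\bar{p},\alpha} - q_{p,\alpha}| \right\} \le 2 \cdot \mathbf{P}\left\{ |U_{\lceil N \alpha \rceil : N} - \alpha| > c_1 \cdot \epsilon \right\}.$$

Proof. For $\bar{p} \in \mathcal{P}$ let

$$G_{\bar{p}}^{-1}(u) = \min\{y \in \mathbb{R} : G_{\bar{p}}(y) \ge u\} \quad (u \in (0, 1))$$

be the generalized inverse of $G_{\bar{p}}$. Then it is well-known that $G_{\bar{p}}^{-1}(U_1)$ has the same distribution as $m(\bar{p}, X_1)$. Since $\hat{G}_p$ is by definition (observe that $K$ is the naive kernel) the empirical cumulative distribution function of

$$m(p_{i_1}, X_{i_1}), \dots, m(p_{i_N}, X_{i_N})$$

for suitably chosen pairwise distinct $i_1, \dots, i_N \in \{1, \dots, n\}$, we see that it has the same distribution as the empirical cumulative distribution function $\hat{G}_N$ of

$$G_{p_{i_1}}^{-1}(U_1), \dots, G_{p_{i_N}}^{-1}(U_N).$$

Consequently, $\hat{q}_{p,\alpha}$ has the same distribution as the quantile estimate $\hat{q}_\alpha^U$ corresponding to $\hat{G}_N$, which implies for any $\eta > 0$

$$\mathbf{P}\{|\hat{q}_{p,\alpha} - q_{p,\alpha}| > \eta\} = \mathbf{P}\{|\hat{q}_\alpha^U - q_{p,\alpha}| > \eta\}.$$

Let

$$\hat{G}^{(k)}(y) = \frac{1}{N} \sum_{i=1}^N I_{\{G_{p_{i_k}}^{-1}(U_i) \le y\}}$$

be the empirical cumulative distribution function of $G_{p_{i_k}}^{-1}(U_1), \dots, G_{p_{i_k}}^{-1}(U_N)$ and denote by $\hat{q}_\alpha^{(k)}$ the corresponding quantile estimate $(k = 1, \dots, N)$. Application of Lemma 1 yields for any $\eta > 0$

$$\mathbf{P}\{|\hat{q}_\alpha^U - q_{p,\alpha}| > \eta\} \le \mathbf{P}\left\{ \left| \min_{k=1,\dots,N} \hat{q}_\alpha^{(k)} - q_{p,\alpha} \right| > \eta \right\} + \mathbf{P}\left\{ \left| \max_{k=1,\dots,N} \hat{q}_\alpha^{(k)} - q_{p,\alpha} \right| > \eta \right\}$$

$$\le 2 \cdot \mathbf{P}\left\{ \max_{k=1,\dots,N} |\hat{q}_\alpha^{(k)} - q_{p_{i_k},\alpha}| + \max_{k=1,\dots,N} |q_{p_{i_k},\alpha} - q_{p,\alpha}| > \eta \right\}.$$

Summarizing the above results we see that we have shown

$$\mathbf{P}\left\{ |\hat{q}_{p,\alpha} - q_{p,\alpha}| > \epsilon + \sup_{\bar{p} \in \mathcal{P} : \|\bar{p} - p\|_\infty \le h_n} |q_{\bar{p},\alpha} - q_{p,\alpha}| \right\} \le 2 \cdot \mathbf{P}\left\{ \max_{k=1,\dots,N} |\hat{q}_\alpha^{(k)} - q_{p_{i_k},\alpha}| > \epsilon \right\}$$

$$= 2 \cdot \mathbf{P}\left\{ \max_{k=1,\dots,N} \left| G_{p_{i_k}}^{-1}(U_{\lceil N \alpha \rceil : N}) - G_{p_{i_k}}^{-1}(\alpha) \right| > \epsilon \right\}.$$

By the assumptions of Lemma 2, on $[q_{p_{i_k},\alpha} - \delta, q_{p_{i_k},\alpha} + \delta]$ the derivative of $G_{p_{i_k}}$ exists and is bounded from below by $c_1 > 0$. Consequently, on $[\alpha - c_1 \cdot \delta, \alpha + c_1 \cdot \delta]$, the inverse function of $G_{p_{i_k}}$ is continuously differentiable with derivative bounded from above by $1/c_1$. So in case that $|U_{\lceil N \alpha \rceil : N} - \alpha| < c_1 \cdot \delta$ we can conclude from the mean value theorem that we have

$$\left| G_{p_{i_k}}^{-1}(U_{\lceil N \alpha \rceil : N}) - G_{p_{i_k}}^{-1}(\alpha) \right| \le \frac{1}{c_1} \cdot |U_{\lceil N \alpha \rceil : N} - \alpha|.$$

This implies

$$\mathbf{P}\left\{ \max_{k=1,\dots,N} \left| G_{p_{i_k}}^{-1}(U_{\lceil N \alpha \rceil : N}) - G_{p_{i_k}}^{-1}(\alpha) \right| > \epsilon \right\} \le \mathbf{P}\left\{ |U_{\lceil N \alpha \rceil : N} - \alpha| > c_1 \cdot \epsilon \right\},$$

which completes the proof.

5.2 Proof of Theorem 1

Since

$$\left| \hat{M}_n - \sup_{p \in \mathcal{P}} q_{p,\alpha} \right| \le \left| \hat{M}_n - \max_{i=1,\dots,n} q_{p_i,\alpha} \right| + \left| \max_{i=1,\dots,n} q_{p_i,\alpha} - \sup_{p \in \mathcal{P}} q_{p,\alpha} \right|$$

it suffices to show

$$\left| \hat{M}_n - \max_{i=1,\dots,n} q_{p_i,\alpha} \right| = O_P\left( \frac{\log(n)}{\sqrt{n \cdot h_n^l}} + \sup_{p_1, p_2 \in \mathcal{P} : \|p_1 - p_2\|_\infty \le h_n} |q_{p_1,\alpha} - q_{p_2,\alpha}| \right) \qquad (3)$$

and

$$\left| \max_{i=1,\dots,n} q_{p_i,\alpha} - \sup_{p \in \mathcal{P}} q_{p,\alpha} \right| = O\left( \frac{\log(n)}{\sqrt{n \cdot h_n^l}} + \sup_{p_1, p_2 \in \mathcal{P} : \|p_1 - p_2\|_\infty \le h_n} |q_{p_1,\alpha} - q_{p_2,\alpha}| \right). \qquad (4)$$

Because of

$$\left| \max_{i=1,\dots,n} q_{p_i,\alpha} - \sup_{p \in \mathcal{P}} q_{p,\alpha} \right| = \sup_{p \in \mathcal{P}} q_{p,\alpha} - \max_{i=1,\dots,n} q_{p_i,\alpha} = \sup_{p \in \mathcal{P}} \min_{i=1,\dots,n} (q_{p,\alpha} - q_{p_i,\alpha}) \le \sup_{p_1, p_2 \in \mathcal{P} : \|p_1 - p_2\|_\infty \le 1/n^{1/l}} |q_{p_1,\alpha} - q_{p_2,\alpha}|,$$

(4) follows from $1/n^{1/l} \le h_n$ for $n$ sufficiently large, which is implied by $n \cdot h_n^l \to \infty$ $(n \to \infty)$.

In order to prove (3) we observe that for $a_i, b_i \in \mathbb{R}$ $(i = 1, \dots, n)$ we have

$$\left| \max_{i=1,\dots,n} a_i - \max_{i=1,\dots,n} b_i \right| \le \max_{i=1,\dots,n} |a_i - b_i|, \qquad (5)$$

which together with the union bound implies for arbitrary $\epsilon > 0$

$$\mathbf{P}\left\{ \left| \hat{M}_n - \max_{i=1,\dots,n} q_{p_i,\alpha} \right| > \epsilon + \sup_{p_1, p_2 \in \mathcal{P} : \|p_1 - p_2\|_\infty \le h_n} |q_{p_1,\alpha} - q_{p_2,\alpha}| \right\}$$

$$\le \mathbf{P}\left\{ \max_{i=1,\dots,n} |\hat{q}_{p_i,\alpha} - q_{p_i,\alpha}| > \epsilon + \sup_{p_1, p_2 \in \mathcal{P} : \|p_1 - p_2\|_\infty \le h_n} |q_{p_1,\alpha} - q_{p_2,\alpha}| \right\}$$

$$\le n \cdot \max_{i=1,\dots,n} \mathbf{P}\left\{ |\hat{q}_{p_i,\alpha} - q_{p_i,\alpha}| > \epsilon + \sup_{p_1, p_2 \in \mathcal{P} : \|p_1 - p_2\|_\infty \le h_n} |q_{p_1,\alpha} - q_{p_2,\alpha}| \right\}. \qquad (6)$$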

Let $U_i$ be defined as in Lemma 2, let $F_N$ be the empirical cumulative distribution function corresponding to $U_1, \dots, U_N$ and let $F$ be the cumulative distribution function of $U_1$. Then

$$\mathbf{P}\left\{ |U_{\lceil N \alpha \rceil : N} - \alpha| > c_1 \cdot \epsilon \right\} = \mathbf{P}\left\{ \left| F(U_{\lceil N \alpha \rceil : N}) - F_N(U_{\lceil N \alpha \rceil : N}) + \frac{\lceil N \alpha \rceil}{N} - \alpha \right| > c_1 \cdot \epsilon \right\} \le \mathbf{P}\left\{ \sup_{t \in \mathbb{R}} |F_N(t) - F(t)| > c_1 \cdot \epsilon - \frac{1}{N} \right\}.$$

In case that $c_1 \cdot \epsilon > \frac{1}{N}$ we can bound the last probability using well-known results from Vapnik-Chervonenkis theory (cf., e.g., Theorem 12.4 in Devroye, Györfi and Lugosi (1996)) and get

$$\mathbf{P}\left\{ |U_{\lceil N \alpha \rceil : N} - \alpha| > c_1 \cdot \epsilon \right\} \le 8 \cdot (N + 1) \cdot \exp\left( -N \cdot \left( c_1 \cdot \epsilon - \frac{1}{N} \right)^2 / 32 \right).$$

Using this, the fact that $N$ is of order $n \cdot h_n^l$, Lemma 2 and (6) we conclude

$$\mathbf{P}\left\{ \left| \hat{M}_n - \max_{i=1,\dots,n} q_{p_i,\alpha} \right| > \epsilon + \sup_{p_1, p_2 \in \mathcal{P} : \|p_1 - p_2\|_\infty \le h_n} |q_{p_1,\alpha} - q_{p_2,\alpha}| \right\} \le c_3 \cdot n \cdot (n \cdot h_n^l + 1) \cdot \exp\left( -c_3 \cdot n \cdot h_n^l \cdot \left( c_1 \cdot \epsilon - \frac{1}{n \cdot h_n^l} \right)^2 \right),$$

which implies

$$\left| \hat{M}_n - \max_{i=1,\dots,n} q_{p_i,\alpha} \right| = O_P\left( \frac{\log(n)}{\sqrt{n \cdot h_n^l}} + \sup_{p_1, p_2 \in \mathcal{P} : \|p_1 - p_2\|_\infty \le h_n} |q_{p_1,\alpha} - q_{p_2,\alpha}| \right).$$

This completes the proof.

6 Acknowledgements

The authors would like to thank the German Research Foundation (DFG) for funding this project within the Collaborative Research Centre SFB 805.

References

[1] Bhattacharya, P. K. and Gangopadhyay, A. K. (1990). Kernel and nearest-neighbor estimation of conditional quantile. Annals of Statistics, 18, pp.
[2] Beirlant, J. and Györfi, L. (1998). On the asymptotic L2-error in partitioning regression estimation. Journal of Statistical Planning and Inference, 71, pp.
[3] Devroye, L. (1982). Necessary and sufficient conditions for the almost everywhere convergence of nearest neighbor regression function estimates. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 61, pp.
[4] Devroye, L., Györfi, L., Krzyżak, A., and Lugosi, G. (1994). On the strong universal consistency of nearest neighbor regression function estimates. Annals of Statistics, 22, pp.
[5] Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer-Verlag, New York.
[6] Devroye, L. and Krzyżak, A. (1989). An equivalence theorem for L1 convergence of the kernel regression estimate. Journal of Statistical Planning and Inference, 23, pp.
[7] Devroye, L. and Wagner, T. J. (1980). Distribution-free consistency results in nonparametric discrimination and regression function estimation. Annals of Statistics, 8, pp.
[8] Gross, D., Hauger, W., Schröder, J. and Wall, W. A. (2007). Technische Mechanik Band 2: Elastostatik. Springer-Verlag, New York.
[9] Györfi, L. (1981). Recent results on nonparametric regression estimate and multiple classification. Problems of Control and Information Theory, 10, pp.
[10] Györfi, L., Kohler, M., Krzyżak, A. and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer Series in Statistics, Springer-Verlag, New York.
[11] Koenker, R. (2005). Quantile regression. Cambridge University Press.
[12] Nadaraya, E. A. (1964). On estimating regression. Theory of Probability and its Applications, 9, pp.
[13] Nadaraya, E. A. (1970). Remarks on nonparametric estimates for density functions and regression curves. Theory of Probability and its Applications, 15, pp.
[14] Steinwart, I. and Christmann, A. (2011). Estimating conditional quantiles with the help of the pinball loss. Bernoulli, 17, pp.
[15] Stone, C. J. (1977). Consistent nonparametric regression. Annals of Statistics, 5, pp.
[16] Stone, C. J. (1982). Optimal global rates of convergence for nonparametric regression. Annals of Statistics, 10, pp.
[17] Watson, G. S. (1964). Smooth regression analysis. Sankhya Series A, 26, pp.
[18] Yu, K., Lu, Z. and Stander, J. (2003). Quantile regression: application and current research areas. Journal of the Royal Statistical Society, Series D, 52, pp.
