Nonparametric estimation of conditional distributions

László Györfi (1) and Michael Kohler (2)

(1) Department of Computer Science and Information Theory, Budapest University of Technology and Economics, 1521 Stoczek U. 2, Budapest, Hungary, email: gyorfi@szit.bme.hu
(2) Fachrichtung 6.1-Mathematik, Universität des Saarlandes, Postfach 151150, D-66041 Saarbrücken, Germany, email: kohler@math.uni-sb.de

November 9, 2005

Abstract

Estimation of conditional distributions is considered. It is assumed that the conditional distribution is either discrete or that it has a density with respect to the Lebesgue-Borel measure. Partitioning estimates of the conditional distribution are constructed, and results concerning consistency and rate of convergence of the integrated total variation error of the estimates are presented.

Key words and phrases: conditional density, conditional distribution, confidence sets, partitioning estimate, Poisson regression, rate of convergence, universal consistency.

Running title: Estimation of conditional distributions

Please send correspondence and proofs to: Michael Kohler, Fachrichtung 6.1-Mathematik, Universität des Saarlandes, Postfach 151150, D-66041 Saarbrücken, Germany, email: kohler@math.uni-sb.de, phone +49-681-3022435, fax +49-681-3026583.

1 Introduction

One of the main tasks in statistics is to estimate a distribution from a given sample. Let $\mu$ be a probability distribution on $\mathbb{R}^d$ and let $X_1, X_2, \dots$ be independent and identically distributed random variables with distribution $\mu$. A simple but powerful estimate of $\mu$ is the empirical distribution

$$\mu_n(A) = \frac{1}{n} \sum_{i=1}^{n} I_A(X_i),$$

where $I_A$ denotes the indicator function of the set $A$. By the strong law of large numbers we have

$$\mu_n(A) \to \mu(A) \quad \text{a.s.} \tag{1}$$

for each Borel set $A$. If we want to make some statistical inference about $\mu$, it is not enough to have (1) for each set individually; instead we need convergence of $\mu_n$ to $\mu$ uniformly over classes of sets. By the Glivenko-Cantelli theorem the empirical distribution satisfies

$$\sup_{x \in \mathbb{R}^d} \left| \mu_n((-\infty, x]) - \mu((-\infty, x]) \right| \to 0 \quad \text{a.s.}, \tag{2}$$

where $(-\infty, x] = (-\infty, x^{(1)}] \times \dots \times (-\infty, x^{(d)}]$ for $x = (x^{(1)}, \dots, x^{(d)}) \in \mathbb{R}^d$. This is sufficient if we want to make statistical inference about intervals, but for more general investigations it would be much nicer to be able to control the error in total variation, defined as

$$\sup_{B \in \mathcal{B}_d} \left| \mu_n(B) - \mu(B) \right|, \tag{3}$$

where $\mathcal{B}_d$ are the Borel sets in $\mathbb{R}^d$. Clearly, for the empirical distribution the error (3) does not converge to zero in general, since if $\mu$ has a continuous distribution function we have $\mu(\{X_1, \dots, X_n\}) = 0$ and $\mu_n(\{X_1, \dots, X_n\}) = 1$. If we are able to construct estimates $\hat\mu_n$ of $\mu$ such that

$$\sup_{B \in \mathcal{B}_d} \left| \hat\mu_n(B) - \mu(B) \right| \to 0 \quad \text{a.s.}, \tag{4}$$

then it is easy to construct confidence sets $\hat B_n$ for the values of $X_1$ such that they have asymptotically level $\alpha$ for given $\alpha \in (0,1)$, i.e. such that

$$\liminf_{n \to \infty} \mu(\hat B_n) \ge 1 - \alpha \quad \text{a.s.}$$

Indeed, any set $\hat B_n$ with $\hat\mu_n(\hat B_n) \ge 1 - \alpha$ has this property since

$$\mu(\hat B_n) = \hat\mu_n(\hat B_n) - \left( \hat\mu_n(\hat B_n) - \mu(\hat B_n) \right) \ge 1 - \alpha - \sup_{B \in \mathcal{B}_d} \left| \hat\mu_n(B) - \mu(B) \right|.$$

Unfortunately, as was shown in Devroye and Györfi (1990), it is impossible to construct estimates $\hat\mu_n$ such that (4) holds for all distributions $\mu$. However, it follows from Barron et al. (1992) that if we restrict ourselves to distributions whose nonatomic part is absolutely continuous with respect to a known dominating measure, it is possible to construct estimates such that (4) holds for all such distributions. Special cases include discrete measures (where we assume for notational convenience that $\mu(\mathbb{N}_0) = 1$) and measures which have a density with respect to the Lebesgue-Borel measure. By Scheffé's theorem it suffices in these cases to construct estimates $(\hat\mu_n(\{k\}))_{k \in \mathbb{N}_0}$ of $(\mu(\{k\}))_{k \in \mathbb{N}_0}$ and estimates $\hat f_n$ of the density $f$ of $\mu$, resp., which satisfy

$$\sum_{k=0}^{\infty} \left| \hat\mu_n(\{k\}) - \mu(\{k\}) \right| \to 0 \quad \text{a.s.} \tag{5}$$

and

$$\int \left| \hat f_n(x) - f(x) \right| \lambda(dx) \to 0 \quad \text{a.s.}, \tag{6}$$

where $\lambda$ denotes the Lebesgue-Borel measure. Here one estimates $\mu(B)$ by

$$\hat\mu_n(B) = \sum_{k \in \mathbb{N}_0 \cap B} \hat\mu_n(\{k\}) \quad \text{and} \quad \hat\mu_n(B) = \int_B \hat f_n(x)\, dx,$$

resp.
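The confidence-set construction described above can be illustrated for the discrete case. The following is a minimal sketch, assuming that relative frequencies are used as the estimates $\hat\mu_n(\{k\})$ and that the set is built greedily from the points of largest estimated mass; the function names `empirical_pmf` and `confidence_set` and the sample are hypothetical and not taken from the paper. Any other set with estimated mass at least $1 - \alpha$ would serve equally well.

```python
from collections import Counter


def empirical_pmf(sample):
    """Relative frequencies of the observed values (used here as estimate of mu({k}))."""
    n = len(sample)
    return {k: c / n for k, c in Counter(sample).items()}


def confidence_set(pmf, alpha):
    """Add points in order of decreasing estimated mass until the mass reaches 1 - alpha.

    The greedy ordering is an illustrative choice, not prescribed by the text.
    """
    chosen, mass = [], 0.0
    for k, p in sorted(pmf.items(), key=lambda kv: (-kv[1], kv[0])):
        if mass >= 1.0 - alpha:
            break
        chosen.append(k)
        mass += p
    return sorted(chosen), mass


# Hypothetical sample with values in N_0.
sample = [0, 1, 1, 2, 1, 0, 3, 1, 2, 1]
pmf = empirical_pmf(sample)
conf_set, mass = confidence_set(pmf, alpha=0.2)
print(conf_set, mass)
```

The greedy ordering tends to give a small set, but the level guarantee in the text only uses the property that the estimated mass of the set is at least $1 - \alpha$.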

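Many estimates which satisfy (6) universally for all densities are constructed in Devroye and Györfi (1985a).

In this paper we want to apply the above ideas in the regression context. Here we are given independent and identically distributed random vectors $(X, Y), (X_1, Y_1), \dots$ with values in $\mathbb{R}^d \times \mathbb{R}^{d'}$. Given the sample $D_n = \{(X_1, Y_1), \dots, (X_n, Y_n)\}$ of the distribution of $(X, Y)$ we want to construct estimates $\hat P_n\{\cdot \mid x\}$ of the conditional distribution $P\{Y \in \cdot \mid X = x\}$ of $Y$ given $X$ such that

$$\int \sup_{B \in \mathcal{B}_{d'}} \left| \hat P_n\{B \mid x\} - P\{Y \in B \mid X = x\} \right| \mu(dx) \to 0 \quad \text{a.s.}, \tag{7}$$

where $\mu$ denotes again the distribution of $X$. In contrast to standard regression, where $d' = 1$ and where only the mean $E\{Y \mid X = x\}$ of the conditional distribution is estimated (cf., e.g., Györfi et al. (2002)), we can use estimates with the property (7) not only for prediction of the value of $Y$ for a given value of $X$, but also to construct confidence regions for the value of $Y$ given the value of $X$. Indeed, similarly as above one gets that (7) implies that any set $C_n(x)$ with $\hat P_n\{C_n(x) \mid x\} \ge 1 - \alpha$ satisfies

$$\liminf_{n \to \infty} P\{Y \in C_n(X) \mid D_n\} \ge 1 - \alpha \quad \text{a.s.},$$

since we have with $P_n\{\cdot\} = P\{\cdot \mid D_n\}$

$$\begin{aligned}
P\{Y \in C_n(X) \mid D_n\} &= \int P_n\{Y \in C_n(x) \mid X = x\}\, \mu(dx) \\
&\ge \int \hat P_n\{C_n(x) \mid x\}\, \mu(dx) - \int \left| \hat P_n\{C_n(x) \mid x\} - P_n\{Y \in C_n(x) \mid X = x\} \right| \mu(dx) \\
&\ge 1 - \alpha - \int \sup_{B \in \mathcal{B}_{d'}} \left| \hat P_n\{B \mid x\} - P\{Y \in B \mid X = x\} \right| \mu(dx).
\end{aligned}$$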

In order to construct estimates with the property (7), we consider two special cases. In the first case the conditional distribution of $Y$ given $X$ is discrete (and for notational convenience we assume again that the support is contained in $\mathbb{N}_0$). In the second case the conditional distribution of $Y$ given $X = x$ has a density $f(\cdot \mid x)$ with respect to the Lebesgue-Borel measure. In both cases Scheffé's theorem implies that in order to have (7) we have to construct estimates of $P\{Y = k \mid X = x\}$ and $f(\cdot \mid x)$ such that

$$\sum_{k=0}^{\infty} \int \left| \hat P_n\{k \mid x\} - P\{Y = k \mid X = x\} \right| \mu(dx) \to 0 \quad \text{a.s.} \tag{8}$$

and

$$\int\!\!\int \left| f_n(y \mid x) - f(y \mid x) \right| \lambda(dy)\, \mu(dx) \to 0 \quad \text{a.s.}, \tag{9}$$

resp.

In order to construct estimates with the property (8) in the first case, we use two different approaches. In the first approach we consider for each $y \in \mathbb{N}_0$

$$P\{Y = y \mid X = x\} = E\{ I_{\{Y = y\}} \mid X = x \}$$

as a regression function and estimate it by applying a partitioning estimate to a sample of $(X, I_{\{Y = y\}})$. In the second approach we consider Poisson regression, i.e., we make a parametric assumption on the way the conditional distribution of $Y$ given $X = x$ depends on $m(x)$ and assume that

$$P\{Y = y \mid X = x\} = \frac{m(x)^y}{y!}\, e^{-m(x)} \quad (y \in \mathbb{N}_0)$$

for some $m : \mathbb{R}^d \to (0, \infty)$, where $m$ is completely unknown. In this case we estimate $m(x) = E\{Y \mid X = x\}$ by a partitioning estimate $m_n(x)$ applied to a sample of $(X, Y)$, and consider the plug-in estimate

$$\hat P_n\{Y = y \mid X = x\} = \frac{m_n(x)^y}{y!}\, e^{-m_n(x)} \quad (y \in \mathbb{N}_0).$$
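The first approach (estimating each $P\{Y = y \mid X = x\}$ by local relative frequencies over the cell containing $x$) can be sketched as follows. This is only an illustration under stated assumptions: the partition consists of cubes of side length `h`, an empty cell yields the value 0 (a convention chosen here, not stated in the text), and the names `cell_index`, `fit_partitioning_estimate` and `cond_prob` as well as the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cell_index(x, h):
    """Index of the cube of side length h that contains the point x (a tuple of floats)."""
    return tuple(math.floor(c / h) for c in x)


def fit_partitioning_estimate(xs, ys, h):
    """Count, for every occupied cell, how often each label y in N_0 occurs."""
    counts = defaultdict(Counter)
    for x, y in zip(xs, ys):
        counts[cell_index(x, h)][y] += 1
    return counts


def cond_prob(counts, x, y, h):
    """Fraction of label y among the sample points in the cell of x.

    Returns 0.0 for an empty cell (assumed convention).
    """
    cell = counts.get(cell_index(x, h))
    if not cell:
        return 0.0
    return cell[y] / sum(cell.values())


# Hypothetical one-dimensional toy sample.
xs = [(0.1,), (0.2,), (0.3,), (0.7,), (0.8,)]
ys = [0, 1, 1, 2, 2]
counts = fit_partitioning_estimate(xs, ys, h=0.5)
p_left = cond_prob(counts, (0.25,), 1, h=0.5)   # cell [0, 0.5): labels 0, 1, 1
p_right = cond_prob(counts, (0.9,), 2, h=0.5)   # cell [0.5, 1): labels 2, 2
p_empty = cond_prob(counts, (5.0,), 0, h=0.5)   # no sample point in this cell
```

Within an occupied cell the estimated probabilities sum to one over $y$, so the estimate is itself a discrete distribution there.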

In both approaches we present results concerning universal consistency, i.e. we show (8) for all corresponding discrete conditional distributions, and we analyze the rate of convergence of the estimates.

Estimates of the conditional density in the second case are defined as suitable partitioning estimates. We present results concerning universal consistency, i.e., we show (9) for all conditional distributions with density, and we analyze the rate of convergence under regularity assumptions on the smoothness of the conditional density.

The paper is organized as follows: Our main results concerning estimation of discrete conditional distributions and conditional densities are described in Sections 2 and 3, resp. The proofs are given in Section 4.

2 The estimation of discrete conditional distributions

In this section we study partitioning estimates of discrete conditional distributions. In our first two theorems each conditional probability $P\{Y = y \mid X = x\}$ is estimated separately. We have the following result concerning consistency of the estimate.

Theorem 1 Let $\mathcal{P}_n = \{A_{n,j} : j\}$ be a partition of $\mathbb{R}^d$ and for $x \in \mathbb{R}^d$ denote by $A_n(x)$ that cell $A_{n,j}$ of $\mathcal{P}_n$ that contains $x$. Let

$$\hat P_n\{y \mid x\} = \frac{\sum_{i=1}^{n} I_{A_n(x)}(X_i)\, I_{\{Y_i = y\}}}{\sum_{j=1}^{n} I_{A_n(x)}(X_j)}$$

be the partitioning estimate of $P\{Y = y \mid X = x\}$. Assume that the underlying partition $\mathcal{P}_n = \{A_{n,j} : j\}$ satisfies for each sphere $S$ centered at the origin

$$\lim_{n \to \infty} \max_{j : A_{n,j} \cap S \ne \emptyset} \operatorname{diam}(A_{n,j}) = 0 \tag{10}$$

and

$$\lim_{n \to \infty} \frac{\left| \{ j : A_{n,j} \cap S \ne \emptyset \} \right|}{n} = 0, \tag{11}$$

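where $\operatorname{diam}(A)$ denotes the diameter of the set $A$. Then

$$\sum_{y=0}^{\infty} \int \left| \hat P_n\{y \mid x\} - P\{Y = y \mid X = x\} \right| \mu(dx) \to 0 \quad \text{a.s.}$$

Next we consider the rate of convergence of the above partitioning estimate. It is well known that in order to derive non-trivial rate of convergence results in nonparametric regression one needs smoothness assumptions on the underlying regression function (cf. Devroye (1982)). In our next result we assume that the conditional probabilities are locally Lipschitz continuous, such that the integral over the sum of the Lipschitz constants is finite.

Theorem 2 Assume $X$ is bounded a.s.,

$$\left| P\{Y = y \mid X = x\} - P\{Y = y \mid X = z\} \right| \le C_y(x)\, \|x - z\|$$

for all $x, z$ from the bounded support of $X$ and for some local Lipschitz constants $C_y(x)$ satisfying

$$\sum_{y=0}^{\infty} \int C_y(x)\, \mu(dx) = C^* < \infty,$$

and assume

$$\sum_{y=0}^{\infty} \sqrt{P\{Y = y\}} < \infty.$$

Let $\hat P_n\{y \mid x\}$ be the partitioning estimate of $P\{Y = y \mid X = x\}$ with respect to a partition of $\mathbb{R}^d$ consisting of cubes with side-length $h_n$. Then

$$E \sum_{y=0}^{\infty} \int \left| \hat P_n\{y \mid x\} - P\{Y = y \mid X = x\} \right| \mu(dx) \le c_1 \left( 1 + \sum_{y=0}^{\infty} \sqrt{P\{Y = y\}} \right) \frac{1}{\sqrt{n h_n^d}} + \sqrt{d}\, C^* h_n,$$

so for

$$h_n = c_2\, n^{-1/(d+2)}$$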

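we get

$$E \sum_{y=0}^{\infty} \int \left| \hat P_n\{y \mid x\} - P\{Y = y \mid X = x\} \right| \mu(dx) \le c_3\, n^{-\frac{1}{d+2}}.$$

In the next theorem we consider Poisson regression. Here the conditional distribution of $Y$ given $X$ is given by

$$P\{Y = y \mid X = x\} = \frac{m(x)^y}{y!}\, e^{-m(x)} \quad (y \in \mathbb{N}_0)$$

for some $m : \mathbb{R}^d \to (0, \infty)$. Because of $m(x) = E\{Y \mid X = x\}$ we can estimate it by applying a partitioning estimate to $D_n$ and use the plug-in estimate

$$\hat P_n\{y \mid x\} = \frac{m_n(x)^y}{y!}\, e^{-m_n(x)} \quad (y \in \mathbb{N}_0)$$

to estimate the conditional distribution of $Y$ given $X$. For this estimate we have the following result.

Theorem 3 Assume that $E\{Y\} < \infty$ and

$$P\{Y = y \mid X = x\} = \frac{m(x)^y}{y!}\, e^{-m(x)} \quad (y \in \mathbb{N}_0)$$

for some $m : \mathbb{R}^d \to (0, \infty)$. Let

$$m_n(x) = \begin{cases} \dfrac{\sum_{i=1}^{n} I_{A_n(x)}(X_i)\, Y_i}{\sum_{i=1}^{n} I_{A_n(x)}(X_i)} & \text{if } \sum_{i=1}^{n} I_{A_n(x)}(X_i) > \log n, \\[2ex] 0 & \text{otherwise} \end{cases}$$

be the (modified) partitioning estimate of $m$ with partition $\mathcal{P}_n = \{A_{n,j} : j\}$ and set

$$\hat P_n\{y \mid x\} = \frac{m_n(x)^y}{y!}\, e^{-m_n(x)} \quad (y \in \mathbb{N}_0).$$

a) Assume that the underlying partition $\mathcal{P}_n$ satisfies (10) and for each sphere $S$ centered at the origin

$$\lim_{n \to \infty} \frac{\left| \{ j : A_{n,j} \cap S \ne \emptyset \} \right| \log n}{n} = 0. \tag{12}$$

Then

$$\sum_{y=0}^{\infty} \int \left| \hat P_n\{y \mid x\} - P\{Y = y \mid X = x\} \right| \mu(dx) \to 0 \quad \text{a.s.}$$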

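b) Assume $X$ is bounded a.s. and assume that $E\{Y^2\} < \infty$ and $m$ is Lipschitz continuous, i.e.

$$|m(x) - m(z)| \le C\, \|x - z\|$$

for some constant $C \in \mathbb{R}_+$. Choose the underlying partition such that it consists of cubes of side-length $h_n$. Then

$$E \sum_{y=0}^{\infty} \int \left| \hat P_n\{y \mid x\} - P\{Y = y \mid X = x\} \right| \mu(dx) \le \frac{c_4}{\sqrt{n h_n^d}} + c_5 h_n,$$

so for $h_n = c_6\, n^{-1/(d+2)}$ we get

$$E \sum_{y=0}^{\infty} \int \left| \hat P_n\{y \mid x\} - P\{Y = y \mid X = x\} \right| \mu(dx) \le c_7\, n^{-\frac{1}{d+2}}.$$

Remark 1. Assume that the assumptions of Theorem 3 b) hold. The function $f(u) = u^y e^{-u} / y!$ satisfies for $u \in [0, B]$ and $y > 0$

$$|f'(u)| = \left| \frac{y\, u^{y-1}}{y!}\, e^{-u} - \frac{u^y}{y!}\, e^{-u} \right| \le (B + 1)\, \frac{B^{y-1}}{(y-1)!},$$

so, since the Lipschitz continuous regression function $m$ is bounded (by $B$) on the bounded support of $X$, we get for $y > 0$

$$\left| P\{Y = y \mid X = x\} - P\{Y = y \mid X = z\} \right| \le (B + 1)\, \frac{B^{y-1}}{(y-1)!}\, C\, \|x - z\|.$$

This implies that the conditional probabilities are Lipschitz continuous and that the integral over the sum of the Lipschitz constants is bounded by

$$\left( 1 + \sum_{y=1}^{\infty} (B + 1)\, \frac{B^{y-1}}{(y-1)!} \right) C = \left( 1 + (B + 1)\, e^{B} \right) C.$$

Hence under the assumptions of Theorem 3 b) the estimate in Theorem 2 achieves the same rate of convergence, although it does not depend on the particular form of the conditional distribution.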
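The modified partitioning estimate $m_n$ and the Poisson plug-in estimate of Theorem 3 can be sketched as follows. This is a minimal illustration, assuming $d = 1$ and a partition of the real line into intervals of length `h`; the names `fit_cells`, `m_n`, `poisson_pmf` and `l1_distance` and the toy data are hypothetical. The last function truncates the infinite sum over $y$ at 100 terms, which is an approximation chosen for this sketch; it is used to check numerically the bound $\sum_j |u^j e^{-u}/j! - v^j e^{-v}/j!| \le 2|u - v|$ that Lemma 1 in Section 4.3 provides.

```python
import math
from collections import defaultdict


def fit_cells(xs, ys, h):
    """Per-cell [number of points, sum of Y] for intervals [k*h, (k+1)*h)."""
    cells = defaultdict(lambda: [0, 0.0])
    for x, y in zip(xs, ys):
        cell = cells[math.floor(x / h)]
        cell[0] += 1
        cell[1] += y
    return cells


def m_n(cells, x, h, n):
    """Local mean of Y if the cell of x holds more than log(n) points, else 0."""
    count, total = cells.get(math.floor(x / h), (0, 0.0))
    return total / count if count > math.log(n) else 0.0


def poisson_pmf(mean, y):
    """Poisson probability mean^y / y! * exp(-mean); used as plug-in estimate."""
    return mean ** y / math.factorial(y) * math.exp(-mean)


def l1_distance(u, v, terms=100):
    """Truncated sum over j of |Poisson(u){j} - Poisson(v){j}| (approximation)."""
    return sum(abs(poisson_pmf(u, j) - poisson_pmf(v, j)) for j in range(terms))


# Hypothetical toy sample of size n = 5.
xs = [0.1, 0.2, 0.3, 0.4, 0.6]
ys = [2, 3, 1, 2, 5]
n = len(xs)
cells = fit_cells(xs, ys, h=0.5)
m_left = m_n(cells, 0.25, h=0.5, n=n)    # 4 points > log 5, local mean 8 / 4
m_right = m_n(cells, 0.7, h=0.5, n=n)    # 1 point <= log 5, estimate set to 0
p_hat = poisson_pmf(m_left, 2)           # plug-in estimate of P{Y = 2 | X = 0.25}
dist = l1_distance(2.0, 2.5)
```

The threshold $\log n$ suppresses cells with too few observations, at the price of estimating $m$ by 0 there.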

Remark 2. Under more restrictive regularity assumptions on the underlying distribution, consistency of a localized log-likelihood Poisson regression estimate was shown in Kohler and Krzyżak (2005).

3 The estimation of conditional densities

In this section assume that $Y$ takes values in $\mathbb{R}^{d'}$. Our aim is to estimate the conditional distribution of $Y$ given $X$ consistently in total variation. We assume that $Y$ has an absolutely continuous distribution, and the conditional density of $Y$ given $X$ is denoted by $f(y \mid x)$. For estimating $f(y \mid x)$, introduce a histogram estimate. Let $\mathcal{Q}_n = \{B_{n,j} : j\}$ be a partition of $\mathbb{R}^{d'}$ such that the Lebesgue measure $\lambda$ of each cell is positive and finite. Let $B_n(y)$ be the cell of $\mathcal{Q}_n$ into which $y$ falls. As before let $\mathcal{P}_n = \{A_{n,j} : j\}$ be a partition of $\mathbb{R}^d$ and denote the cell into which $x$ falls by $A_n(x)$. Put

$$\nu_n(A, B) = \frac{1}{n} \sum_{i=1}^{n} I_{\{X_i \in A,\, Y_i \in B\}},$$

then the histogram estimate is as follows:

$$f_n(y \mid x) = \frac{\nu_n(A_n(x), B_n(y))}{\mu_n(A_n(x))\, \lambda(B_n(y))}.$$

We will use the following conditions: assume that for each sphere $S$ centered at the origin we have

$$\lim_{n \to \infty} \max_{j : B_{n,j} \cap S \ne \emptyset} \operatorname{diam}(B_{n,j}) = 0 \tag{13}$$

and

$$\lim_{n \to \infty} \frac{\left| \{ j : B_{n,j} \cap S \ne \emptyset \} \right|}{n} = 0. \tag{14}$$

The next theorem extends the density-free strong consistency result of Abou-Jaoude (1976) to conditional density estimation.

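Theorem 4 Assume that the partitions $\mathcal{P}_n$ and $\mathcal{Q}_n$ satisfy (10), (11), and (13), (14), resp. Then

$$\int\!\!\int \left| f_n(y \mid x) - f(y \mid x) \right| \lambda(dy)\, \mu(dx) \to 0 \quad \text{a.s.}$$

Devroye and Györfi (1985a), and Beirlant and Györfi (1998) calculated the rate of convergence of the expected $L_1$ error of the histogram. Next we extend these results to the estimates of conditional densities.

Theorem 5 Assume $X$ and $Y$ are bounded a.s., and

$$|f(u \mid x) - f(y \mid x)| \le C_1(x)\, \|u - y\|$$

and

$$|f(y \mid z) - f(y \mid x)| \le C_2(y)\, \|x - z\|$$

for all $x, z$ from the bounded support of $X$ and for all $y, u$ from the bounded support of $Y$, such that

$$\int C_1(z)\, \mu(dz) < \infty \quad \text{and} \quad \int C_2(y)\, \lambda(dy) < \infty.$$

Let $f_n(y \mid x)$ be the histogram estimate of $f(y \mid x)$ with respect to partitions $\mathcal{P}_n$ and $\mathcal{Q}_n$ consisting of cubes with side-lengths $h_n$ and $H_n$, resp. Then

$$E \int\!\!\int \left| f_n(y \mid x) - f(y \mid x) \right| \lambda(dy)\, \mu(dx) \le \frac{c_8}{\sqrt{n h_n^d}} + \frac{c_9}{\sqrt{n h_n^d H_n^{d'}}} + c_{10}\, h_n + c_{11}\, H_n,$$

so for

$$h_n = c_{12}\, n^{-1/(d+d'+2)} \quad \text{and} \quad H_n = c_{13}\, n^{-1/(d+d'+2)}$$

we get

$$E \int\!\!\int \left| f_n(y \mid x) - f(y \mid x) \right| \lambda(dy)\, \mu(dx) \le c_{14}\, n^{-\frac{1}{d+d'+2}}.$$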
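The histogram estimate $f_n(y \mid x)$ and the side-length choice of Theorem 5 can be sketched as follows. This is an illustration only, assuming $d = d' = 1$, interval partitions with side lengths `h` and `H`, and the value 0 on $X$-cells that contain no sample point (a convention chosen here); the names `fit_histogram`, `f_n` and `bandwidth`, the constant `c = 1.0`, and the toy data are hypothetical.

```python
import math
from collections import Counter


def bandwidth(n, d=1, d_prime=1, c=1.0):
    """Side length c * n^(-1/(d + d' + 2)); the constant c is an arbitrary choice."""
    return c * n ** (-1.0 / (d + d_prime + 2))


def fit_histogram(xs, ys, h, H):
    """Counts of sample points per X-cell and per (X-cell, Y-cell) pair."""
    joint, marginal = Counter(), Counter()
    for x, y in zip(xs, ys):
        a, b = math.floor(x / h), math.floor(y / H)
        joint[(a, b)] += 1
        marginal[a] += 1
    return joint, marginal


def f_n(joint, marginal, y, x, h, H):
    """nu_n(A_n(x), B_n(y)) / (mu_n(A_n(x)) * lambda(B_n(y))).

    The factors 1/n in nu_n and mu_n cancel; lambda(B_n(y)) = H for d' = 1.
    Returns 0.0 if the X-cell is empty (assumed convention).
    """
    a, b = math.floor(x / h), math.floor(y / H)
    if marginal[a] == 0:
        return 0.0
    return joint[(a, b)] / (marginal[a] * H)


# Hypothetical toy sample; fixed side lengths keep the example easy to follow.
xs = [0.1, 0.2, 0.3, 0.6]
ys = [0.05, 0.15, 0.3, 0.9]
h, H = 0.5, 0.25
joint, marginal = fit_histogram(xs, ys, h, H)
density = f_n(joint, marginal, y=0.1, x=0.25, h=h, H=H)   # 2 / (3 * 0.25)
# Integral of f_n(. | x = 0.25) over y, evaluated cell by cell on [0, 1).
mass = sum(f_n(joint, marginal, (b + 0.5) * H, 0.25, h, H) * H for b in range(4))
```

For every occupied $X$-cell the estimate integrates to one in $y$, so it is a conditional density there; `bandwidth(n)` could replace the fixed values of `h` and `H` when the rate of Theorem 5 is of interest.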

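4 Proofs

4.1 Proof of Theorem 1

Using

$$|a - b| = 2 (b - a)_+ + (a - b)$$

(where $x_+ = \max\{x, 0\}$) we get

$$\begin{aligned}
\sum_{y=0}^{\infty} \int \left| \hat P_n\{y \mid x\} - P\{Y = y \mid X = x\} \right| \mu(dx)
&= 2 \sum_{y=0}^{\infty} \int \left( P\{Y = y \mid X = x\} - \hat P_n\{y \mid x\} \right)_+ \mu(dx) \\
&\quad + \sum_{y=0}^{\infty} \int \hat P_n\{y \mid x\}\, \mu(dx) - \sum_{y=0}^{\infty} \int P\{Y = y \mid X = x\}\, \mu(dx).
\end{aligned}$$

Using the Cauchy-Schwarz inequality and Theorem 23.1 in Györfi et al. (2002) we get for each fixed $y \in \mathbb{N}_0$

$$\begin{aligned}
\int \left( P\{Y = y \mid X = x\} - \hat P_n\{y \mid x\} \right)_+ \mu(dx)
&\le \int \left| \hat P_n\{y \mid x\} - P\{Y = y \mid X = x\} \right| \mu(dx) \\
&\le \sqrt{ \int \left( \hat P_n\{y \mid x\} - P\{Y = y \mid X = x\} \right)^2 \mu(dx) } \to 0 \quad \text{a.s.},
\end{aligned}$$

which implies, together with the dominated convergence theorem, that the first term on the right-hand side above converges to zero. Concerning the second term we observe

$$\begin{aligned}
\sum_{y=0}^{\infty} \int \hat P_n\{y \mid x\}\, \mu(dx) - \sum_{y=0}^{\infty} \int P\{Y = y \mid X = x\}\, \mu(dx)
&= \int \sum_{y=0}^{\infty} \frac{\sum_{i=1}^{n} I_{A_n(x)}(X_i)\, I_{\{Y_i = y\}}}{\sum_{j=1}^{n} I_{A_n(x)}(X_j)}\, \mu(dx) - \int 1\, \mu(dx) \\
&= \int \left( I_{\left\{ \sum_{j=1}^{n} I_{A_n(x)}(X_j) > 0 \right\}} - 1 \right) \mu(dx) \\
&= - \sum_{j} I_{\left\{ \sum_{i=1}^{n} I_{A_{n,j}}(X_i) = 0 \right\}}\, \mu(A_{n,j}).
\end{aligned}$$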

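Together with (11), it implies that

$$\left| \sum_{y=0}^{\infty} \int \hat P_n\{y \mid x\}\, \mu(dx) - \sum_{y=0}^{\infty} \int P\{Y = y \mid X = x\}\, \mu(dx) \right| \le \sum_{j} \left| \mu(A_{n,j}) - \mu_n(A_{n,j}) \right| \to 0 \quad \text{a.s.}$$

(cf. Lemma 1 in Devroye and Györfi (1985b) or, with a better constant in the exponential upper bound, the proof of Lemma 23.2 in Györfi et al. (2002)).

4.2 Proof of Theorem 2

In the sequel we use the notation

$$\nu_{n,y}(A) = \frac{1}{n} \sum_{i=1}^{n} I_{\{Y_i = y,\, X_i \in A\}},$$

and with this notation the partitioning estimate is given by

$$\hat P_n\{y \mid x\} = \frac{\nu_{n,y}(A_n(x))}{\mu_n(A_n(x))}.$$

Thus,

$$\begin{aligned}
\sum_{y=0}^{\infty} \int \left| \hat P_n\{y \mid x\} - P\{Y = y \mid X = x\} \right| \mu(dx)
&= \sum_{y=0}^{\infty} \int \left| \frac{\nu_{n,y}(A_n(x))}{\mu_n(A_n(x))} - P\{Y = y \mid X = x\} \right| \mu(dx) \\
&= \sum_{y=0}^{\infty} \sum_{A \in \mathcal{P}_n} \int_A \left| \frac{\nu_{n,y}(A)}{\mu_n(A)} - P\{Y = y \mid X = x\} \right| \mu(dx) \\
&\le \sum_{y=0}^{\infty} \sum_{A \in \mathcal{P}_n} \int_A \left| \frac{\nu_{n,y}(A)}{\mu_n(A)} - \frac{\nu_{n,y}(A)}{\mu(A)} \right| \mu(dx) \\
&\quad + \sum_{y=0}^{\infty} \sum_{A \in \mathcal{P}_n} \int_A \left| \frac{\nu_{n,y}(A)}{\mu(A)} - \frac{P\{Y = y, X \in A\}}{\mu(A)} \right| \mu(dx) \\
&\quad + \sum_{y=0}^{\infty} \sum_{A \in \mathcal{P}_n} \int_A \left| \frac{P\{Y = y, X \in A\}}{\mu(A)} - P\{Y = y \mid X = x\} \right| \mu(dx)
\end{aligned}$$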

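$$\begin{aligned}
&= \sum_{y=0}^{\infty} \sum_{A \in \mathcal{P}_n} \left| \frac{\nu_{n,y}(A)}{\mu_n(A)} - \frac{\nu_{n,y}(A)}{\mu(A)} \right| \mu(A)
+ \sum_{y=0}^{\infty} \sum_{A \in \mathcal{P}_n} \left| \frac{\nu_{n,y}(A)}{\mu(A)} - \frac{P\{Y = y, X \in A\}}{\mu(A)} \right| \mu(A) \\
&\quad + \sum_{y=0}^{\infty} \sum_{A \in \mathcal{P}_n} \int_A \left| \frac{P\{Y = y, X \in A\}}{\mu(A)} - P\{Y = y \mid X = x\} \right| \mu(dx) \\
&= \sum_{y=0}^{\infty} \sum_{A \in \mathcal{P}_n} \nu_{n,y}(A) \left| \frac{1}{\mu_n(A)} - \frac{1}{\mu(A)} \right| \mu(A)
+ \sum_{y=0}^{\infty} \sum_{A \in \mathcal{P}_n} \left| \nu_{n,y}(A) - P\{Y = y, X \in A\} \right| \\
&\quad + \sum_{y=0}^{\infty} \sum_{A \in \mathcal{P}_n} \int_A \left| \frac{P\{Y = y, X \in A\}}{\mu(A)} - P\{Y = y \mid X = x\} \right| \mu(dx) \\
&\le \sum_{A \in \mathcal{P}_n} \left| \mu_n(A) - \mu(A) \right|
+ \sum_{y=0}^{\infty} \sum_{A \in \mathcal{P}_n} \left| \nu_{n,y}(A) - P\{Y = y, X \in A\} \right| \\
&\quad + \sum_{y=0}^{\infty} \sum_{A \in \mathcal{P}_n} \int_A \left| \frac{P\{Y = y, X \in A\}}{\mu(A)} - P\{Y = y \mid X = x\} \right| \mu(dx),
\end{aligned}$$

where we have used for the last inequality that

$$\sum_{y=0}^{\infty} \nu_{n,y}(A) = \mu_n(A).$$

Since $n \mu_n(A)$ is binomially distributed with parameters $n$ and $\mu(A)$, we get by the Cauchy-Schwarz inequality

$$\sum_{A \in \mathcal{P}_n} E\{ |\mu_n(A) - \mu(A)| \} \le \sum_{A \in \mathcal{P}_n} \sqrt{ E\{ (\mu_n(A) - \mu(A))^2 \} } \le \sum_{A \in \mathcal{P}_n} \sqrt{ \frac{\mu(A)}{n} }.$$

By the Jensen inequality we have

$$\left( \frac{a_1 + \dots + a_l}{l} \right)^2 \le \frac{a_1^2 + \dots + a_l^2}{l},$$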

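which implies

$$a_1 + \dots + a_l \le \sqrt{ l\, (a_1^2 + \dots + a_l^2) }.$$

Using this inequality in the sum above for the $c_{15} / h_n^d$ many cells $A \in \mathcal{P}_n$ contained in the bounded support of $X$ (which are the only ones with $\mu(A) \ne 0$) we conclude

$$\sum_{A \in \mathcal{P}_n} E\{ |\mu_n(A) - \mu(A)| \} \le \sqrt{ \frac{c_{15}}{h_n^d} \sum_{A \in \mathcal{P}_n} \left( \sqrt{\mu(A)/n} \right)^2 } = \sqrt{ \frac{c_{15}}{n h_n^d} \sum_{A \in \mathcal{P}_n} \mu(A) } \le \sqrt{ \frac{c_{15}}{n h_n^d} }.$$

Similarly we get

$$\begin{aligned}
\sum_{A \in \mathcal{P}_n} E\{ |\nu_{n,y}(A) - P\{Y = y, X \in A\}| \}
&\le \sum_{A \in \mathcal{P}_n} \sqrt{ E\{ (\nu_{n,y}(A) - P\{Y = y, X \in A\})^2 \} } \\
&\le \sum_{A \in \mathcal{P}_n} \sqrt{ \frac{P\{Y = y, X \in A\}}{n} } \\
&\le \sqrt{ \frac{c_{15}}{n h_n^d} \sum_{A \in \mathcal{P}_n} P\{Y = y, X \in A\} } \\
&= \sqrt{ \frac{c_{15}}{n h_n^d} }\, \sqrt{ P\{Y = y\} }.
\end{aligned}$$

Finally

$$\begin{aligned}
&\sum_{y=0}^{\infty} \sum_{A \in \mathcal{P}_n} \int_A \left| \frac{P\{Y = y, X \in A\}}{\mu(A)} - P\{Y = y \mid X = x\} \right| \mu(dx) \\
&= \sum_{y=0}^{\infty} \sum_{A \in \mathcal{P}_n} \int_A \left| \frac{\int_A P\{Y = y \mid X = z\}\, \mu(dz)}{\mu(A)} - \frac{\int_A P\{Y = y \mid X = x\}\, \mu(dz)}{\mu(A)} \right| \mu(dx) \\
&\le \sum_{y=0}^{\infty} \sum_{A \in \mathcal{P}_n} \int_A \frac{\int_A \left| P\{Y = y \mid X = z\} - P\{Y = y \mid X = x\} \right| \mu(dz)}{\mu(A)}\, \mu(dx)
\end{aligned}$$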

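$$\begin{aligned}
&\le \sum_{y=0}^{\infty} \sum_{A \in \mathcal{P}_n} \int_A C_y(x)\, \operatorname{diam}(A)\, \frac{\mu(A)}{\mu(A)}\, \mu(dx) \\
&\le \sqrt{d}\, h_n \sum_{y=0}^{\infty} \int C_y(x)\, \mu(dx) \\
&= \sqrt{d}\, h_n\, C^*.
\end{aligned}$$

Summarizing the above results, the assertion follows.

4.3 Proof of Theorem 3

In the proof we will use the following lemma.

Lemma 1 For arbitrary $u, v \in \mathbb{R}_+$ we have

$$\sum_{j=0}^{\infty} \left| \frac{u^j}{j!}\, e^{-u} - \frac{v^j}{j!}\, e^{-v} \right| \le 2\, |u - v|.$$

Proof. W.l.o.g. assume $u < v$. Then

$$\begin{aligned}
\sum_{j=0}^{\infty} \left| \frac{u^j}{j!}\, e^{-u} - \frac{v^j}{j!}\, e^{-v} \right|
&\le \sum_{j=0}^{\infty} \left| \frac{u^j}{j!}\, e^{-u} - \frac{u^j}{j!}\, e^{-v} \right| + \sum_{j=0}^{\infty} \left| \frac{u^j}{j!}\, e^{-v} - \frac{v^j}{j!}\, e^{-v} \right| \\
&= \sum_{j=0}^{\infty} \left( \frac{u^j}{j!}\, e^{-u} - \frac{u^j}{j!}\, e^{-v} \right) + \sum_{j=0}^{\infty} \left( \frac{v^j}{j!}\, e^{-v} - \frac{u^j}{j!}\, e^{-v} \right) \\
&= e^{u} \left( e^{-u} - e^{-v} \right) + e^{-v} \left( e^{v} - e^{u} \right) \\
&= 2 \left( 1 - e^{-(v-u)} \right) \\
&\le 2\, |v - u|,
\end{aligned}$$

since $1 + x \le e^x$ $(x \in \mathbb{R})$.

Proof of Theorem 3. Proof of a): By Lemma 1 we get

$$\sum_{y=0}^{\infty} \int \left| \hat P_n\{y \mid x\} - P\{Y = y \mid X = x\} \right| \mu(dx) = \int \sum_{y=0}^{\infty} \left| \frac{m_n(x)^y}{y!}\, e^{-m_n(x)} - \frac{m(x)^y}{y!}\, e^{-m(x)} \right| \mu(dx)$$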

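$$\le 2 \int |m_n(x) - m(x)|\, \mu(dx) \tag{15}$$

$$\to 0 \quad \text{a.s.}$$

by Györfi (1991) (see also Theorem 23.3 in Györfi et al. (2002)).

Proof of Part b): Using (15),

$$E \sum_{y=0}^{\infty} \int \left| \hat P_n\{y \mid x\} - P\{Y = y \mid X = x\} \right| \mu(dx) \le 2 \sqrt{ E \left\{ \int |m_n(x) - m(x)|^2\, \mu(dx) \right\} } \le \frac{c_4}{\sqrt{n h_n^d}} + c_5 h_n,$$

where the last step can be done in a similar way as the proof of Theorem 4.3 in Györfi et al. (2002).

4.4 Proof of Theorem 4

Introduce the notation

$$\nu(A, B) = E\{\nu_n(A, B)\} = P\{X \in A, Y \in B\},$$

then

$$\begin{aligned}
&\int\!\!\int \left| f_n(y \mid x) - f(y \mid x) \right| \lambda(dy)\, \mu(dx) \\
&= \sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \int_A \int_B \left| \frac{\nu_n(A, B)}{\mu_n(A)\, \lambda(B)} - f(y \mid x) \right| \lambda(dy)\, \mu(dx) \\
&\le \sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \int_A \int_B \left| \frac{\nu_n(A, B)}{\mu_n(A)\, \lambda(B)} - \frac{\nu_n(A, B)}{\mu(A)\, \lambda(B)} \right| \lambda(dy)\, \mu(dx) \\
&\quad + \sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \int_A \int_B \left| \frac{\nu_n(A, B)}{\mu(A)\, \lambda(B)} - \frac{\nu(A, B)}{\mu(A)\, \lambda(B)} \right| \lambda(dy)\, \mu(dx) \\
&\quad + \sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \int_A \int_B \left| \frac{\nu(A, B)}{\mu(A)\, \lambda(B)} - f(y \mid x) \right| \lambda(dy)\, \mu(dx) \\
&= \sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \left| \frac{\nu_n(A, B)}{\mu_n(A)\, \lambda(B)} - \frac{\nu_n(A, B)}{\mu(A)\, \lambda(B)} \right| \mu(A)\, \lambda(B)
\end{aligned}$$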

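$$\begin{aligned}
&\quad + \sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \left| \frac{\nu_n(A, B)}{\mu(A)\, \lambda(B)} - \frac{\nu(A, B)}{\mu(A)\, \lambda(B)} \right| \mu(A)\, \lambda(B) \\
&\quad + \sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \int_A \int_B \left| \frac{\nu(A, B)}{\mu(A)\, \lambda(B)} - f(y \mid x) \right| \lambda(dy)\, \mu(dx),
\end{aligned}$$

therefore

$$\begin{aligned}
\int\!\!\int \left| f_n(y \mid x) - f(y \mid x) \right| \lambda(dy)\, \mu(dx)
&\le \sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \nu_n(A, B) \left| \frac{1}{\mu_n(A)} - \frac{1}{\mu(A)} \right| \mu(A) \\
&\quad + \sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \left| \nu_n(A, B) - \nu(A, B) \right| \\
&\quad + \sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \int_A \int_B \left| \frac{\nu(A, B)}{\mu(A)\, \lambda(B)} - f(y \mid x) \right| \lambda(dy)\, \mu(dx)
\end{aligned}$$

$$\le \sum_{A \in \mathcal{P}_n} \left| \mu_n(A) - \mu(A) \right| \tag{16}$$

$$+ \sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \left| \nu_n(A, B) - \nu(A, B) \right| \tag{17}$$

$$+ \sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \int_A \int_B \left| \frac{\nu(A, B)}{\mu(A)\, \lambda(B)} - f(y \mid x) \right| \lambda(dy)\, \mu(dx), \tag{18}$$

where we have used for the last inequality that

$$\sum_{B \in \mathcal{Q}_n} \nu_n(A, B) = \mu_n(A).$$

Because of (11), (16) tends to 0 a.s., while (11) and (14) imply that (17) tends to 0 a.s. (cf. Lemma 1 in Devroye and Györfi (1985b)). Concerning the convergence of the bias term (18), introduce the notation

$$\bar f_n(y \mid x) = \frac{\int_{A_n(x)} \int_{B_n(y)} f(u \mid z)\, \lambda(du)\, \mu(dz)}{\mu(A_n(x))\, \lambda(B_n(y))},$$

then

$$\begin{aligned}
&\sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \int_A \int_B \left| \frac{\nu(A, B)}{\mu(A)\, \lambda(B)} - f(y \mid x) \right| \lambda(dy)\, \mu(dx) \\
&= \sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \int_A \int_B \left| \frac{\int_A \int_B f(u \mid z)\, \lambda(du)\, \mu(dz)}{\mu(A)\, \lambda(B)} - f(y \mid x) \right| \lambda(dy)\, \mu(dx)
\end{aligned}$$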

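$$= \int\!\!\int \left| \bar f_n(y \mid x) - f(y \mid x) \right| \lambda(dy)\, \mu(dx) \to 0,$$

because of the conditions (10) and (13). This convergence is obvious if $f(y \mid x)$ is continuous and has compact support. In general, we use that $f(y \mid x) \in L_1(\mu \times \lambda)$, and refer to the denseness result that the set of continuous functions in $L_1(\mu \times \lambda)$ with compact support is dense in $L_1(\mu \times \lambda)$ (cf., e.g., Devroye and Györfi (2002)). An alternative technique would be the Lebesgue density theorem (cf., e.g., Lemma 24.5 in Györfi et al. (2002)), which yields pointwise convergence; together with Scheffé's theorem and the dominated convergence theorem this gives the assertion.

4.5 Proof of Theorem 5

Because of the proof of Theorem 4,

$$\begin{aligned}
E \left\{ \int\!\!\int \left| f_n(y \mid x) - f(y \mid x) \right| \lambda(dy)\, \mu(dx) \right\}
&\le \sum_{A \in \mathcal{P}_n} E\{ |\mu_n(A) - \mu(A)| \} + \sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} E\{ |\nu_n(A, B) - \nu(A, B)| \} \\
&\quad + \int\!\!\int \left| \frac{\nu(A_n(x), B_n(y))}{\mu(A_n(x))\, \lambda(B_n(y))} - f(y \mid x) \right| \lambda(dy)\, \mu(dx).
\end{aligned}$$

According to the proof of Theorem 2, the condition that $X$ is bounded implies that

$$\sum_{A \in \mathcal{P}_n} E\{ |\mu_n(A) - \mu(A)| \} \le \sqrt{ \frac{c_{15}}{n h_n^d} },$$

and, similarly, using that $X$ and $Y$ are bounded we can show

$$\sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} E\{ |\nu_n(A, B) - \nu(A, B)| \} \le \sqrt{ \frac{c_{16}}{n h_n^d H_n^{d'}} }.$$

Concerning the rate of convergence of the bias term we observe

$$\int\!\!\int \left| \frac{\nu(A_n(x), B_n(y))}{\mu(A_n(x))\, \lambda(B_n(y))} - f(y \mid x) \right| \lambda(dy)\, \mu(dx)$$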

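$$\begin{aligned}
&= \sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \int_A \int_B \left| \frac{\nu(A, B)}{\mu(A)\, \lambda(B)} - f(y \mid x) \right| \lambda(dy)\, \mu(dx) \\
&= \sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \int_A \int_B \left| \frac{\int_A \int_B f(u \mid z)\, \lambda(du)\, \mu(dz)}{\mu(A)\, \lambda(B)} - f(y \mid x) \right| \lambda(dy)\, \mu(dx) \\
&\le \sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \int_A \int_B \frac{\int_A \int_B |f(u \mid z) - f(y \mid x)|\, \lambda(du)\, \mu(dz)}{\mu(A)\, \lambda(B)}\, \lambda(dy)\, \mu(dx) \\
&\le \sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \int_A \int_B \frac{\int_A \int_B |f(u \mid z) - f(y \mid z)|\, \lambda(du)\, \mu(dz)}{\mu(A)\, \lambda(B)}\, \lambda(dy)\, \mu(dx) \\
&\quad + \sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \int_A \int_B \frac{\int_A \int_B |f(y \mid z) - f(y \mid x)|\, \lambda(du)\, \mu(dz)}{\mu(A)\, \lambda(B)}\, \lambda(dy)\, \mu(dx).
\end{aligned}$$

Applying the conditions of the theorem we get that

$$\begin{aligned}
&\int\!\!\int \left| \frac{\nu(A_n(x), B_n(y))}{\mu(A_n(x))\, \lambda(B_n(y))} - f(y \mid x) \right| \mu(dx)\, \lambda(dy) \\
&\le \sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \int_A \int_B \frac{\int_A \int_B C_1(z)\, \sqrt{d'}\, H_n\, \lambda(du)\, \mu(dz)}{\mu(A)\, \lambda(B)}\, \lambda(dy)\, \mu(dx) \\
&\quad + \sum_{A \in \mathcal{P}_n} \sum_{B \in \mathcal{Q}_n} \int_A \int_B \frac{\int_A \int_B C_2(y)\, \sqrt{d}\, h_n\, \lambda(du)\, \mu(dz)}{\mu(A)\, \lambda(B)}\, \lambda(dy)\, \mu(dx) \\
&\le \int C_1(z)\, \mu(dz)\, \lambda(S_Y)\, \sqrt{d'}\, H_n + \int C_2(y)\, \lambda(dy)\, \sqrt{d}\, h_n,
\end{aligned}$$

where $S_Y$ is the bounded support of $Y$.

References

[1] Abou-Jaoude, S. (1976). Conditions nécessaires et suffisantes de convergence L_1 en probabilité de l'histogramme pour une densité. Annales de l'Institut Henri Poincaré XII, pp. 213-231.

[2] Barron, A. R., Györfi, L. and van der Meulen, E. C. (1992). Distribution estimation consistent in total variation and two types of information divergence. IEEE Trans. Information Theory 38, pp. 1437-1454.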

[3] Beirlant, J. and Györfi, L. (1998). On the L_1 error in histogram density estimation: the multidimensional case. Nonparametric Statistics 9, pp. 197-216.

[4] Devroye, L. (1982). Any discrimination rule can have arbitrarily bad probability of error for finite sample size. IEEE Transactions on Pattern Analysis and Machine Intelligence 4, pp. 154-157.

[5] Devroye, L. and Györfi, L. (1985a). Nonparametric Density Estimation: The L_1 View. John Wiley, New York.

[6] Devroye, L. and Györfi, L. (1985b). Distribution-free exponential bound for the L_1 error of partitioning estimates of a regression function. In Probability and Statistical Decision Theory, F. Konecny, J. Mogyoródi, W. Wertz, Eds., D. Reidel, pp. 67-76.

[7] Devroye, L. and Györfi, L. (1990). No empirical measure can converge in the total variation sense for all distribution. Annals of Statistics 18, pp. 1496-1499.

[8] Devroye, L. and Györfi, L. (2002). Distribution and density estimation. In Principles of Nonparametric Learning, L. Györfi (Ed.), Springer-Verlag, Wien, pp. 223-286.

[9] Györfi, L. (1991). Universal consistency of a regression estimate for unbounded regression functions. In Nonparametric Functional Estimation and Related Topics, G. Roussas (Ed.), NATO ASI Series, Kluwer Academic Publishers, Dordrecht, pp. 329-338.

[10] Györfi, L., Kohler, M., Krzyżak, A. and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer Series in Statistics, Springer.

[11] Kohler, M. and Krzyżak, A. (2005). Asymptotic confidence intervals for Poisson regression. Submitted for publication.