Journal of Machine Learning Research 16 (2015) 1863-1877. Submitted 6/15; Published 9/15

On the Asymptotic Normality of an Estimate of a Regression Functional

László Györfi (gyorfi@cs.bme.hu)
Department of Computer Science and Information Theory
Budapest University of Technology and Economics
Magyar Tudósok körútja 2., H-1117 Budapest, Hungary

Harro Walk (walk@mathematik.uni-stuttgart.de)
Department of Mathematics
University of Stuttgart
Pfaffenwaldring 57, D-70569 Stuttgart, Germany

Editor: Alex Gammerman and Vladimir Vovk

Abstract

An estimate of the second moment of the regression function is introduced. Its asymptotic normality is proved such that the asymptotic variance depends neither on the dimension of the observation vector, nor on the smoothness properties of the regression function. The asymptotic variance is given explicitly.

Keywords: nonparametric estimation, regression functional, central limit theorem, partitioning estimate

1. Introduction

This paper considers a histogram-based estimate of the second moment of the regression function in multivariate problems. The interest in the second moment is motivated by the fact that by estimating it one obtains an estimate of the best possible achievable mean squared error, a quantity of obvious statistical interest. It is shown that the estimate is asymptotically normally distributed. It is remarkable that the asymptotic variance only depends on moments of the regression function, but neither on its smoothness, nor on the dimension of the space. The proof relies on a Poissonization technique that has been used successfully in related problems.

Let $Y$ be a real valued random variable with $\mathbb{E}\{Y^2\}<\infty$, and let $X=(X^{(1)},\dots,X^{(d)})$ be a $d$-dimensional random observational vector. In regression analysis one wishes to estimate $Y$ given $X$, i.e., one wants to find a function $g$ defined on the range of $X$ so that $g(X)$ is close to $Y$. Assume that the main aim of the analysis is to minimize the mean squared error:
$$\min_g\mathbb{E}\{(g(X)-Y)^2\}.$$
This research has been partially supported by the European Union and Hungary and co-financed by the European Social Fund through the project TÁMOP-4.2.2.C-11/1/KONV-2012-0004 - National Research Center for Development and Market Introduction of Advanced Information and Communication Technologies.

©2015 László Györfi and Harro Walk.
As is well-known, this minimum is achieved by the regression function $m(X)$, which is defined by
$$m(x)=\mathbb{E}\{Y\mid X=x\}. \tag{1}$$
For each measurable function $g$ one has
$$\mathbb{E}\{(g(X)-Y)^2\}=\mathbb{E}\{(m(X)-Y)^2\}+\mathbb{E}\{(m(X)-g(X))^2\}=\mathbb{E}\{(m(X)-Y)^2\}+\int|m(x)-g(x)|^2\,\mu(dx),$$
where $\mu$ stands for the distribution of the observation $X$.

It is of great importance to be able to estimate the minimum mean squared error
$$L^*=\mathbb{E}\{(m(X)-Y)^2\}$$
accurately, even before a regression estimate is applied. In a standard nonparametric regression design process, one considers a finite number of real-valued features $X^{(i)}$, $i\in I$, and evaluates whether these suffice to explain $Y$. If they suffice for the given explanatory task, an estimation method can be applied on the basis of the features already under consideration; if not, more or different features must be considered. The quality of a subvector $\{X^{(i)}, i\in I\}$ of $X$ is measured by the minimum mean squared error
$$L^*(I):=\mathbb{E}\Big\{\big(Y-\mathbb{E}\{Y\mid X^{(i)}, i\in I\}\big)^2\Big\}$$
that can be achieved using the features as explanatory variables. $L^*(I)$ depends upon the unknown distribution of $(Y,X^{(i)}: i\in I)$. The first phase of any regression estimation process therefore heavily relies on estimates of $L^*$ (even before a regression estimate is picked).

Concerning dimension reduction, the related testing problem is on the hypothesis $L^*=L^*(I)$. This testing problem can be managed such that we estimate both $L^*$ and $L^*(I)$, and accept the hypothesis if the two estimates are close to each other. (Cf. De Brabanter et al. 2014.)

Devroye et al. (2003), Evans and Jones (2008), Liitiäinen et al. (2008), Liitiäinen et al. (2009), Liitiäinen et al. (2010), and Ferrario and Walk (2012) introduced nearest neighbor based estimates of $L^*$, proved strong universal consistency, and calculated the (fast) rate of convergence.

Because of
$$L^*=\mathbb{E}\{Y^2\}-\mathbb{E}\{m(X)^2\}$$
and $\mathbb{E}\{Y^2\}<\infty$, estimating $L^*$ is equivalent to estimating the second moment $S^*$ of the regression function:
$$S^*=\mathbb{E}\{m(X)^2\}=\int m(x)^2\,\mu(dx).$$
In this paper we introduce a partitioning based estimator of $S^*$, and show its asymptotic normality.
It turns out that the asymptotic variance depends neither on the dimension of the observation vector, nor on the smoothness properties of the regression function. The asymptotic variance is given explicitly.
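The identities above can be checked on a toy example. The following Python sketch is our own illustration and is not part of the paper. It assumes a hypothetical discrete joint distribution of $(X,Y)$; the tables `P_X` and `P_Y_GIVEN_X` are invented for the example. Using exact rational arithmetic, it computes $m$, $L^*$ and $S^*$. It then confirms $L^*=\mathbb{E}\{Y^2\}-S^*$ and the decomposition $\mathbb{E}\{(g(X)-Y)^2\}=L^*+\int|m(x)-g(x)|^2\,\mu(dx)$ for one arbitrary predictor `g`.

```python
from fractions import Fraction as F

# Hypothetical joint law of (X, Y), chosen only for illustration:
# P_X gives P{X = x}; P_Y_GIVEN_X[x] gives P{Y = y | X = x}.
P_X = {0: F(1, 2), 1: F(1, 3), 2: F(1, 6)}
P_Y_GIVEN_X = {
    0: {-1: F(1, 2), 1: F(1, 2)},
    1: {0: F(1, 4), 2: F(3, 4)},
    2: {1: F(1, 3), 4: F(2, 3)},
}


def m(x):
    """Regression function m(x) = E{Y | X = x}."""
    return sum(y * p for y, p in P_Y_GIVEN_X[x].items())


def mse(g):
    """Mean squared error E{(g(X) - Y)^2} of a predictor g."""
    return sum(
        px * sum(py * (g(x) - y) ** 2 for y, py in P_Y_GIVEN_X[x].items())
        for x, px in P_X.items()
    )


def g(x):
    """An arbitrary competing predictor (hypothetical)."""
    return F(x)


L_star = mse(m)                                           # minimum mean squared error L*
S_star = sum(px * m(x) ** 2 for x, px in P_X.items())     # S* = E{m(X)^2}
EY2 = sum(px * sum(py * y ** 2 for y, py in P_Y_GIVEN_X[x].items())
          for x, px in P_X.items())                       # E{Y^2}
excess = sum(px * (m(x) - g(x)) ** 2 for x, px in P_X.items())  # integral of |m - g|^2 d(mu)

print(L_star, S_star, EY2 - S_star, mse(g), L_star + excess)
# prints: 13/12 9/4 13/12 4/3 4/3
```

Any other finite distribution can be plugged into the two tables. Exact fractions keep floating-point noise out of the equality checks.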
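2. A Splitting Estimate

We suppose that the regression estimation problem is based on a sequence $(X_1,Y_1),(X_2,Y_2),\dots$ of i.i.d. random vectors distributed as $(X,Y)$. Let $\mathcal{P}_n=\{A_{n,j},\ j=1,2,\dots\}$ be a cubic partition of $\mathbb{R}^d$ of size $h_n>0$. The partitioning estimator of the regression function $m$ is defined as
$$m_n(x)=\frac{\nu_n(A_{n,j})}{\mu_n(A_{n,j})}\quad\text{if } x\in A_{n,j} \tag{2}$$
(interpreting $0/0=0$), with
$$\nu_n(A)=\frac1n\sum_{i=1}^n I_{\{X_i\in A\}}Y_i\quad\text{and}\quad\mu_n(A)=\frac1n\sum_{i=1}^n I_{\{X_i\in A\}}.$$
(Here $I$ denotes the indicator function.) If for the cubic partition
$$h_n\to0\quad\text{and}\quad nh_n^d\to\infty \tag{3}$$
as $n\to\infty$, then the partitioning regression estimate (2) is weakly universally consistent, which means that
$$\lim_{n\to\infty}\mathbb{E}\Big\{\int(m_n(x)-m(x))^2\,\mu(dx)\Big\}=0 \tag{4}$$
for any distribution of $(X,Y)$ with $\mathbb{E}\{Y^2\}<\infty$, and for bounded $Y$ it holds that
$$\lim_{n\to\infty}\int(m_n(x)-m(x))^2\,\mu(dx)=0\quad\text{a.s.} \tag{5}$$
(Cf. Theorems 4.2 and 23.1 in Györfi et al. 2002.)

Assume splitting data
$$\mathcal{Z}_n=\{(X_1,Y_1),\dots,(X_n,Y_n)\}\quad\text{and}\quad\mathcal{D}_n=\{(X'_1,Y'_1),\dots,(X'_n,Y'_n)\}$$
such that $(X_1,Y_1),\dots,(X_n,Y_n),(X'_1,Y'_1),\dots,(X'_n,Y'_n)$ are i.i.d. The splitting data estimate of $S^*$ is defined as
$$S_n:=\frac1n\sum_{i=1}^nY'_i\,m_n(X'_i)=\sum_j\frac1n\sum_{i=1}^nI_{\{X'_i\in A_{n,j}\}}Y'_i\,\frac{\nu_n(A_{n,j})}{\mu_n(A_{n,j})}.$$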
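Below is a minimal NumPy sketch of the partitioning estimate (2) and of the splitting estimate $S_n$. It assumes $d=1$, observations in $[0,1)$, and the partition of $[0,1)$ into `k` cells of width $h_n=1/k$. These restrictions and all function names are choices made for the illustration. The last lines use hypothetical data to check that averaging $Y'_i m_n(X'_i)$ over $\mathcal{D}_n$ gives the same number as the cell-wise sum on the right-hand side above.

```python
import numpy as np


def cell_index(x, k):
    """Index j of the cell [j/k, (j+1)/k) of the partition of [0, 1) containing each x."""
    return np.minimum(np.floor(x * k).astype(int), k - 1)


def partition_means(x, y, k):
    """Return the arrays nu_n(A_j) and mu_n(A_j), j = 0, ..., k - 1."""
    n = len(x)
    idx = cell_index(x, k)
    nu = np.bincount(idx, weights=y, minlength=k) / n
    mu = np.bincount(idx, minlength=k) / n
    return nu, mu


def cell_ratios(nu, mu):
    """nu_n(A_j) / mu_n(A_j), interpreting 0/0 as 0."""
    return np.divide(nu, mu, out=np.zeros_like(nu), where=mu > 0)


def splitting_estimate(x, y, x2, y2, k):
    """S_n = (1/n) sum_i Y'_i m_n(X'_i): m_n is fitted on (x, y), averaged over (x2, y2)."""
    nu, mu = partition_means(x, y, k)
    m_n_at_x2 = cell_ratios(nu, mu)[cell_index(x2, k)]   # estimate (2) at the X'_i
    return float(np.mean(y2 * m_n_at_x2))


# Usage on hypothetical data with m(x) = x, so that S* = 1/3.
rng = np.random.default_rng(0)
n, k = 1000, 10
x, x2 = rng.uniform(size=n), rng.uniform(size=n)
y, y2 = x + rng.normal(size=n), x2 + rng.normal(size=n)

S_n = splitting_estimate(x, y, x2, y2, k)
nu, mu = partition_means(x, y, k)
nu2, _ = partition_means(x2, y2, k)
S_n_cells = float(np.sum(nu2 * cell_ratios(nu, mu)))     # cell-wise form of the same sum
print(S_n, S_n_cells)                                    # equal up to rounding
```

Working with per-cell sums via `np.bincount` keeps the cost linear in $n$. It also makes the cell-wise representation of $S_n$ immediate.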
Put
$$\nu'_n(A)=\frac1n\sum_{i=1}^nI_{\{X'_i\in A\}}Y'_i;$$
then $S_n$ has the equivalent form
$$S_n=\sum_j\nu'_n(A_{n,j})\frac{\nu_n(A_{n,j})}{\mu_n(A_{n,j})}. \tag{6}$$

Theorem 1 Assume (3) and that $\mu$ is non-atomic and has bounded support. Suppose that there is a finite constant $C$ such that
$$\mathbb{E}\{|Y|^3\mid X\}<C. \tag{7}$$
Then
$$\sqrt n\,(S_n-\mathbb{E}\{S_n\})/\sigma\xrightarrow{\mathcal D}N(0,1),$$
where
$$\sigma^2=2\int M_2(x)m(x)^2\,\mu(dx)-\Big(\int m(x)^2\,\mu(dx)\Big)^2-\int m(x)^4\,\mu(dx),$$
with $M_2(X)=\mathbb{E}\{Y^2\mid X\}$.

The estimation problem is motivated by the dimension reduction problem mentioned above. One estimates $S^*$ for the original observation vector and for the observation vector where some components are left out. If the two estimates are close to each other, then we decide that the left out components are ineffective. Theorem 1 is on the random part of the estimates. Therefore there is a further need to study the difference of the biases of the estimates. Under (3) we have $\lim_{n\to\infty}\mathbb{E}\{S_n\}=S^*$, and for Lipschitz continuous $m$ the rate of convergence can be of order $n^{-1/d}$ for suitable choice of $h_n$. (Cf. Devroye et al. 2013.) Similarly to De Brabanter et al. (2014), we conjecture that this difference of the biases has universally a fast rate of convergence.

Obviously, there are several other possibilities for defining partitioning based estimates and proving their asymptotic normality, for example,
$$\frac1n\sum_{i=1}^nm_n(X'_i)^2\quad\text{or}\quad\sum_j\frac{\nu_n(A_{n,j})^2}{\mu_n(A_{n,j})}.$$
Notice that both estimates have larger bias and variance than our estimate (6) has.

The proof of Theorem 1 works without any major modification for the consistent $k_n$ nearest neighbor ($k_n$-NN) estimate $m_n$ if $k_n\to\infty$ and $k_n/n\to0$. A delicate and important research problem is the case of the non-consistent 1-NN estimate $m_n$, because for the 1-NN estimate $m_n$ the bias is smaller. We conjecture that even in this case one has a CLT.

We prove Theorem 1 in the next section.
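To give a sense of the size of $\sigma^2$, the sketch below evaluates the integrals in Theorem 1 by the midpoint rule. It uses a hypothetical model of our own choosing: $X$ uniform on $[0,1]$, $m(x)=x$, and $Y=m(X)+\varepsilon$ with standard normal $\varepsilon$, so that $M_2(x)=x^2+1$ and (7) holds. It also reports the two parts whose sum is $\sigma^2$:

- $\int M_2m^2\,d\mu-(\int m^2\,d\mu)^2$;
- $\int M_2m^2\,d\mu-\int m^4\,d\mu$.

For this model they equal $19/45$ and $1/3$, giving $\sigma^2=34/45$.

```python
def midpoint_integral(f, steps=100_000):
    """Composite midpoint rule for the integral of f over [0, 1]."""
    h = 1.0 / steps
    return h * sum(f((i + 0.5) * h) for i in range(steps))


def asymptotic_variances(m, M2):
    """Return (first part, second part, sigma^2) from Theorem 1 for mu uniform on [0, 1]."""
    int_M2_m2 = midpoint_integral(lambda x: M2(x) * m(x) ** 2)
    int_m2 = midpoint_integral(lambda x: m(x) ** 2)
    int_m4 = midpoint_integral(lambda x: m(x) ** 4)
    part1 = int_M2_m2 - int_m2 ** 2     # int M2 m^2 - (int m^2)^2
    part2 = int_M2_m2 - int_m4          # int M2 m^2 - int m^4
    return part1, part2, part1 + part2


# Hypothetical model: m(x) = x and Y = m(X) + N(0, 1) noise, hence M_2(x) = x^2 + 1.
s1, s2, s = asymptotic_variances(lambda x: x, lambda x: x ** 2 + 1)
print(round(s1, 6), round(s2, 6), round(s, 6))   # about 19/45, 1/3 and 34/45
```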
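3. Proof of Theorem 1

Introduce the notations
$$U_n=\sqrt n\,(S_n-\mathbb{E}\{S_n\mid\mathcal{Z}_n\})\quad\text{and}\quad V_n=\sqrt n\,(\mathbb{E}\{S_n\mid\mathcal{Z}_n\}-\mathbb{E}\{S_n\});$$
then
$$\sqrt n\,(S_n-\mathbb{E}\{S_n\})=U_n+V_n.$$
We prove Theorem 1 by showing that for any $u,v\in\mathbb{R}$
$$\mathbb{P}\{U_n\le u,V_n\le v\}\to\Phi\Big(\frac u{\sigma_1}\Big)\Phi\Big(\frac v{\sigma_2}\Big), \tag{8}$$
where $\Phi$ denotes the standard normal distribution function, and
$$\sigma_1^2=\int M_2(x)m(x)^2\,\mu(dx)-\Big(\int m(x)^2\,\mu(dx)\Big)^2 \tag{9}$$
and
$$\sigma_2^2=\int M_2(x)m(x)^2\,\mu(dx)-\int m(x)^4\,\mu(dx). \tag{10}$$
Notice that $V_n$ is measurable with respect to $\mathcal{Z}_n$. Therefore
$$\begin{aligned}
\Big|\mathbb{P}\{U_n\le u,V_n\le v\}-\Phi\Big(\frac u{\sigma_1}\Big)\Phi\Big(\frac v{\sigma_2}\Big)\Big|
&=\Big|\mathbb{E}\big\{I_{\{V_n\le v\}}\mathbb{P}\{U_n\le u\mid\mathcal{Z}_n\}\big\}-\Phi\Big(\frac u{\sigma_1}\Big)\Phi\Big(\frac v{\sigma_2}\Big)\Big|\\
&\le\Big|\mathbb{E}\Big\{I_{\{V_n\le v\}}\Big(\mathbb{P}\{U_n\le u\mid\mathcal{Z}_n\}-\Phi\Big(\frac u{\sigma_1}\Big)\Big)\Big\}\Big|+\Big|\Big(\mathbb{P}\{V_n\le v\}-\Phi\Big(\frac v{\sigma_2}\Big)\Big)\Phi\Big(\frac u{\sigma_1}\Big)\Big|\\
&\le\mathbb{E}\Big\{\Big|\mathbb{P}\{U_n\le u\mid\mathcal{Z}_n\}-\Phi\Big(\frac u{\sigma_1}\Big)\Big|\Big\}+\Big|\mathbb{P}\{V_n\le v\}-\Phi\Big(\frac v{\sigma_2}\Big)\Big|.
\end{aligned}$$
Thus, (8) is satisfied if
$$\mathbb{P}\{U_n\le u\mid\mathcal{Z}_n\}-\Phi\Big(\frac u{\sigma_1}\Big)\to0 \tag{11}$$
in probability and
$$\mathbb{P}\{V_n\le v\}-\Phi\Big(\frac v{\sigma_2}\Big)\to0. \tag{12}$$

Proof of (11). Let us start with the representation
$$U_n=\sqrt n\Big(\frac1n\sum_{i=1}^nY'_im_n(X'_i)-\mathbb{E}\Big\{\frac1n\sum_{i=1}^nY'_im_n(X'_i)\,\Big|\,\mathcal{Z}_n\Big\}\Big)=\frac1{\sqrt n}\sum_{i=1}^n\big(Y'_im_n(X'_i)-\mathbb{E}\{Y'_im_n(X'_i)\mid\mathcal{Z}_n\}\big).$$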
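The decomposition $\sqrt n(S_n-\mathbb{E}\{S_n\})=U_n+V_n$ can be explored by simulation. The following sketch is an illustration under assumptions of our own:

- a hypothetical model with $X$ uniform on $[0,1]$ and $Y=X+\varepsilon$, $\varepsilon\sim N(0,1)$, so $m(x)=x$;
- `k` equal cells;
- arbitrary values of `n`, `k` and the number of replications.

Because $\nu(A)=\int_A x\,dx$ is known in this model, $\mathbb{E}\{S_n\mid\mathcal{Z}_n\}=\sum_j\nu(A_{n,j})\nu_n(A_{n,j})/\mu_n(A_{n,j})$ can be computed exactly. $\mathbb{E}\{S_n\}$ is replaced by a Monte Carlo average. If the limits (9) and (10) are already visible at this sample size, the variances of $U_n$ and $V_n$ should come out near $\sigma_1^2=19/45$ and $\sigma_2^2=1/3$, and their correlation near zero.

```python
import numpy as np


def simulate_U_V(n=2000, k=12, reps=500, seed=1):
    """Monte Carlo draws of U_n and V_n for the hypothetical model
    X ~ Uniform[0, 1], Y = X + N(0, 1), with k equal cells on [0, 1)."""
    rng = np.random.default_rng(seed)
    edges = np.arange(k + 1) / k
    nu_true = (edges[1:] ** 2 - edges[:-1] ** 2) / 2    # nu(A_j) = integral of x over A_j
    s_vals, cond_vals = np.empty(reps), np.empty(reps)
    for r in range(reps):
        x, x2 = rng.uniform(size=n), rng.uniform(size=n)
        y, y2 = x + rng.normal(size=n), x2 + rng.normal(size=n)
        idx = np.minimum((x * k).astype(int), k - 1)
        idx2 = np.minimum((x2 * k).astype(int), k - 1)
        nu = np.bincount(idx, weights=y, minlength=k) / n
        mu = np.bincount(idx, minlength=k) / n
        ratio = np.divide(nu, mu, out=np.zeros_like(nu), where=mu > 0)
        nu2 = np.bincount(idx2, weights=y2, minlength=k) / n
        s_vals[r] = np.sum(nu2 * ratio)          # S_n in the cell form (6)
        cond_vals[r] = np.sum(nu_true * ratio)   # E{S_n | Z_n}
    U = np.sqrt(n) * (s_vals - cond_vals)
    V = np.sqrt(n) * (cond_vals - cond_vals.mean())   # E{S_n} estimated by the Monte Carlo mean
    return U, V


U, V = simulate_U_V()
var_U, var_V = U.var(), V.var()
corr_UV = np.corrcoef(U, V)[0, 1]
print(var_U, var_V, corr_UV)   # var_U near 19/45, var_V near 1/3, corr_UV near 0
```

The near-zero correlation reflects that $\mathbb{E}\{U_n\mid\mathcal{Z}_n\}=0$ while $V_n$ is $\mathcal{Z}_n$-measurable.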
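Because of (7) and the Jensen inequality, for any $1\le s\le3$ we get
$$M_s(X):=\mathbb{E}\{|Y|^s\mid X\}=\big(\mathbb{E}\{|Y|^s\mid X\}^{1/s}\big)^s\le\big(\mathbb{E}\{|Y|^3\mid X\}^{1/3}\big)^s\le C^{s/3}. \tag{13}$$
In particular, for $s=1$,
$$|m(X)|\le M_1(X)\le C^{1/3},$$
and $\mathbb{E}\{|Y|^3\}\le C$. Next we apply a Berry-Esseen type central limit theorem (see Theorem 4 in Petrov 1975). It implies that
$$\Bigg|\mathbb{P}\{U_n\le u\mid\mathcal{Z}_n\}-\Phi\Bigg(\frac u{\sqrt{\mathrm{Var}(Y'_1m_n(X'_1)\mid\mathcal{Z}_n)}}\Bigg)\Bigg|\le\frac{c\,\mathbb{E}\{|Y'_1m_n(X'_1)|^3\mid\mathcal{Z}_n\}}{\sqrt n\,\mathrm{Var}(Y'_1m_n(X'_1)\mid\mathcal{Z}_n)^{3/2}}$$
with the universal constant $c>0$. Because of
$$\mathbb{E}\{Y'_1m_n(X'_1)\mid\mathcal{Z}_n\}=\int m(x)m_n(x)\,\mu(dx),$$
we get that
$$\mathrm{Var}(Y'_1m_n(X'_1)\mid\mathcal{Z}_n)=\mathbb{E}\{Y_1'^2m_n(X'_1)^2\mid\mathcal{Z}_n\}-\mathbb{E}\{Y'_1m_n(X'_1)\mid\mathcal{Z}_n\}^2=\int M_2(x)m_n(x)^2\,\mu(dx)-\Big(\int m(x)m_n(x)\,\mu(dx)\Big)^2.$$
Now (4), together with the boundedness of $M_2$ by (13), implies that
$$\mathrm{Var}(Y'_1m_n(X'_1)\mid\mathcal{Z}_n)\to\sigma_1^2$$
in probability, where $\sigma_1^2$ is defined by (9). Further,
$$\mathbb{E}\{|Y'_1m_n(X'_1)|^3\mid\mathcal{Z}_n\}\le C\int|m_n(x)|^3\,\mu(dx).$$
Put
$$A_n(x)=A_{n,j}\quad\text{if }x\in A_{n,j}.$$
Again, applying the Jensen inequality we get
$$|m_n(x)|^3\le\left(\frac{\sum_{i=1}^nI_{\{X_i\in A_n(x)\}}|Y_i|^{3/2}}{\sum_{i=1}^nI_{\{X_i\in A_n(x)\}}}\right)^2,$$
the right-hand side of which is the square of the regression estimate where $Y$ is replaced by $|Y|^{3/2}$. Thus, (4) together with $\mathbb{E}\{|Y|^3\}<\infty$ implies that
$$\int\left(\frac{\sum_{i=1}^nI_{\{X_i\in A_n(x)\}}|Y_i|^{3/2}}{\sum_{i=1}^nI_{\{X_i\in A_n(x)\}}}\right)^2\mu(dx)\to\mathbb{E}\big\{\mathbb{E}\{|Y|^{3/2}\mid X\}^2\big\}\le C$$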
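in probability. These limit relations imply (11).

Proof of (12). Assuming that the support $S$ of $\mu$ is bounded, let $l_n$ be such that $S\subseteq\bigcup_{j=1}^{l_n}A_{n,j}$. We also re-index the partition so that $\mu(A_{n,j})\ge\mu(A_{n,j+1})$, with $\mu(A_{n,j})>0$ for $j\le l_n$ and $\mu(A_{n,j})=0$ otherwise. Then
$$S_n=\sum_{j=1}^{l_n}\nu'_n(A_{n,j})\frac{\nu_n(A_{n,j})}{\mu_n(A_{n,j})}, \tag{14}$$
where $l_n\le c/h_n^d$. The condition $nh_n^d\to\infty$ implies that
$$\frac{l_n}n\to0.$$
Because of (14) we have that
$$V_n=\sqrt n\sum_{j=1}^{l_n}\Big(\mathbb{E}\{\nu'_n(A_{n,j})\mid\mathcal{Z}_n\}\frac{\nu_n(A_{n,j})}{\mu_n(A_{n,j})}-\mathbb{E}\Big\{\nu'_n(A_{n,j})\frac{\nu_n(A_{n,j})}{\mu_n(A_{n,j})}\Big\}\Big)=\sqrt n\sum_{j=1}^{l_n}\nu(A_{n,j})\Big(\frac{\nu_n(A_{n,j})}{\mu_n(A_{n,j})}-\mathbb{E}\Big\{\frac{\nu_n(A_{n,j})}{\mu_n(A_{n,j})}\Big\}\Big),$$
where
$$\nu(A)=\mathbb{E}\{\nu_n(A)\}.$$
Observe that we have to show the asymptotic normality for a finite sum of dependent random variables. In order to prove (12), we follow the lines of the proof in Beirlant and Györfi (1998) and use a Poissonization argument. To this end we introduce a modification $M_n$ of $V_n$ such that
$$\Delta_n:=V_n-M_n\to0,$$
the proof of which follows, starting from (23). Now we proceed arguing for $M_n$. Introduce the notation $N_n$ for a Poisson($n$) random variable independent of $(X_1,Y_1),(X_2,Y_2),\dots$. Moreover, put
$$\tilde\nu_n(A)=\frac1n\sum_{i=1}^{N_n}I_{\{X_i\in A\}}Y_i\quad\text{and}\quad\tilde\mu_n(A)=\frac1n\sum_{i=1}^{N_n}I_{\{X_i\in A\}}.$$
The key result in this step is the following property: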
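Proposition 2 (Beirlant and Mason 1995, Beirlant et al. 1994.) Put
$$M_n=\sqrt n\sum_{j=1}^{l_n}\nu(A_{n,j})\Big(\frac{\nu_n(A_{n,j})}{\mu_n(A_{n,j})}-\mathbb{E}\Big\{\frac{\tilde\nu_n(A_{n,j})}{\tilde\mu_n(A_{n,j})}\Big\}\Big) \tag{15}$$
and
$$\tilde M_n=\sqrt n\sum_{j=1}^{l_n}\nu(A_{n,j})\Big(\frac{\tilde\nu_n(A_{n,j})}{\tilde\mu_n(A_{n,j})}-\mathbb{E}\Big\{\frac{\tilde\nu_n(A_{n,j})}{\tilde\mu_n(A_{n,j})}\Big\}\Big). \tag{16}$$
Assume that
$$\Phi_n(t,v)=\mathbb{E}\Big\{\exp\Big(it\tilde M_n+iv\frac{N_n-n}{\sqrt n}\Big)\Big\}\to e^{-(t^2\rho^2+v^2)/2}$$
for a constant $\rho>0$, where $i=\sqrt{-1}$. Then
$$M_n/\rho\xrightarrow{\mathcal D}N(0,1).$$

Put
$$T_n=t\tilde M_n+v\frac{N_n-n}{\sqrt n},$$
for which a central limit result is to hold:
$$T_n\xrightarrow{\mathcal D}N\big(0,t^2\rho^2+v^2\big) \tag{17}$$
as $n\to\infty$. Remark that
$$\mathrm{Var}(T_n)=t^2\,\mathrm{Var}(\tilde M_n)+2tv\,\mathbb{E}\Big\{\tilde M_n\frac{N_n-n}{\sqrt n}\Big\}+v^2.$$
For a cell $A=A_{n,j}$ from the partition with $\mu(A)>0$, let $Y(A)$ be a random variable such that
$$\mathbb{P}\{Y(A)\in B\}=\mathbb{P}\{Y\in B\mid X\in A\},$$
where $B$ is an arbitrary Borel set. Introduce the notations
$$q_{n,k}=\mathbb{P}\{n\mu_n(A)=k\}=\binom nk\mu(A)^k(1-\mu(A))^{n-k}$$
and
$$\tilde q_{n,k}=\mathbb{P}\{n\tilde\mu_n(A)=k\}=\frac{(n\mu(A))^k}{k!}e^{-n\mu(A)}.$$
Concerning the expectation, with $(Y_1(A),Y_2(A),\dots)$ an i.i.d. sequence of random variables distributed as $Y(A)$, we find that
$$\begin{aligned}
\mathbb{E}\Big\{\frac{\tilde\nu_n(A)}{\tilde\mu_n(A)}\Big\}
&=\sum_{k=0}^\infty\mathbb{E}\Big\{\frac{\tilde\nu_n(A)}{\tilde\mu_n(A)}\,\Big|\,n\tilde\mu_n(A)=k\Big\}\mathbb{P}\{n\tilde\mu_n(A)=k\}
=\sum_{k=1}^\infty\mathbb{E}\Big\{\frac1k\sum_{i=1}^kY_i(A)\Big\}\tilde q_{n,k}\\
&=\mathbb{E}\{Y(A)\}(1-\tilde q_{n,0})=\frac{\nu(A)}{\mu(A)}(1-\tilde q_{n,0}).
\end{aligned}\tag{18}$$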
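Poissonization helps because, with $N_n\sim$ Poisson($n$) independent of the sample, the counts $n\tilde\mu_n(A_{n,j})$ in different cells are independent Poisson($n\mu(A_{n,j})$) variables. For a fixed sample size, by contrast, the counts are multinomial and negatively correlated. The NumPy sketch below illustrates this and also checks (18) by Monte Carlo for one cell. The following are hypothetical choices for the example:

- the cell probabilities;
- the value of `n`;
- the assumption $Y(A)\sim N(1,1)$, so that $\nu(A)/\mu(A)=\mathbb{E}\{Y(A)\}=1$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 20, 20_000
probs = np.array([0.2, 0.3, 0.5])   # hypothetical cell probabilities mu(A_j)

# Fixed sample size n: the cell counts are multinomial and hence dependent.
fixed = rng.multinomial(n, probs, size=reps)
# Poissonized sample size N_n ~ Poisson(n): the cell counts become independent Poisson(n mu(A_j)).
pois = np.array([rng.multinomial(N, probs) for N in rng.poisson(n, size=reps)])

corr_fixed = np.corrcoef(fixed[:, 0], fixed[:, 1])[0, 1]   # about -0.33
corr_pois = np.corrcoef(pois[:, 0], pois[:, 1])[0, 1]      # about 0

# Monte Carlo check of (18) for the cell A = A_1, assuming Y(A) ~ N(1, 1), so nu(A)/mu(A) = 1.
counts = pois[:, 0]                                            # n * mu~_n(A)
sums = rng.normal(loc=counts, scale=np.sqrt(counts))           # sum of `counts` i.i.d. N(1, 1) draws
ratio = np.divide(sums, counts, out=np.zeros(reps), where=counts > 0)   # nu~/mu~, with 0/0 = 0
lhs_18 = ratio.mean()
rhs_18 = 1.0 * (1 - np.exp(-n * probs[0]))                     # (nu(A)/mu(A)) (1 - q~_{n,0})
print(corr_fixed, corr_pois, lhs_18, rhs_18)
```

The multinomial correlation here is $-\sqrt{\mu_1\mu_2/((1-\mu_1)(1-\mu_2))}\approx-0.33$. That dependence is what the Poissonization argument removes.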
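Further, by (24),
$$\mathbb{E}\Big\{\frac{\nu_n(A)}{\mu_n(A)}\Big\}=n\,\mathbb{E}\Big\{\frac{I_{\{X_1\in A\}}Y_1}{1+\sum_{i=2}^nI_{\{X_i\in A\}}}\Big\}=n\,\nu(A)\,\mathbb{E}\Big\{\frac1{1+B(n-1,\mu(A))}\Big\}=\frac{\nu(A)}{\mu(A)}\big(1-(1-\mu(A))^n\big). \tag{19}$$
Moreover,
$$\begin{aligned}
\mathbb{E}\Big\{\frac{\tilde\nu_n(A)^2}{\tilde\mu_n(A)^2}\Big\}
&=\sum_{k=0}^\infty\mathbb{E}\Big\{\frac{\tilde\nu_n(A)^2}{\tilde\mu_n(A)^2}\,\Big|\,n\tilde\mu_n(A)=k\Big\}\mathbb{P}\{n\tilde\mu_n(A)=k\}
=\sum_{k=1}^\infty\frac1{k^2}\mathbb{E}\Big\{\Big(\sum_{i=1}^kY_i(A)\Big)^2\Big\}\tilde q_{n,k}\\
&=\sum_{k=1}^\infty\frac1{k^2}\Big(k\,\mathbb{E}\{Y(A)^2\}+k(k-1)\,\mathbb{E}\{Y(A)\}^2\Big)\tilde q_{n,k}
=\mathrm{Var}(Y(A))\sum_{k=1}^\infty\frac1k\tilde q_{n,k}+\mathbb{E}\{Y(A)\}^2(1-\tilde q_{n,0}),
\end{aligned}$$
and
$$\sum_{k=1}^\infty\frac1k\tilde q_{n,k}=\sum_{k=1}^\infty\frac1{k+1}\tilde q_{n,k}+\sum_{k=1}^\infty\frac1{k(k+1)}\tilde q_{n,k}\le\frac1{n\mu(A)}(1-\tilde q_{n,0})+\frac3{(n\mu(A))^2}(1-\tilde q_{n,0}).$$
The independence of the Poisson masses over different cells leads to
$$\begin{aligned}
\mathrm{Var}(\tilde M_n)
&=n\sum_{j=1}^{l_n}\nu(A_{n,j})^2\,\mathrm{Var}\Big(\frac{\tilde\nu_n(A_{n,j})}{\tilde\mu_n(A_{n,j})}\Big)\\
&\le n\sum_{j=1}^{l_n}\nu(A_{n,j})^2\Big(\mathrm{Var}(Y(A_{n,j}))\Big(\frac{1-e^{-n\mu(A_{n,j})}}{n\mu(A_{n,j})}+\frac{3(1-e^{-n\mu(A_{n,j})})}{n^2\mu(A_{n,j})^2}\Big)\\
&\qquad+\mathbb{E}\{Y(A_{n,j})\}^2\big(1-e^{-n\mu(A_{n,j})}\big)-\mathbb{E}\{Y(A_{n,j})\}^2\big(1-e^{-n\mu(A_{n,j})}\big)^2\Big)\\
&\le\sum_{j=1}^{l_n}\frac{\nu(A_{n,j})^2}{\mu(A_{n,j})^2}\mathrm{Var}(Y(A_{n,j}))\mu(A_{n,j})+\frac3n\sum_{j=1}^{l_n}\mathrm{Var}(Y(A_{n,j}))\frac{\nu(A_{n,j})^2}{\mu(A_{n,j})^2}+n\sum_{j=1}^{l_n}\nu(A_{n,j})^2\,\mathbb{E}\{Y(A_{n,j})\}^2e^{-n\mu(A_{n,j})}.
\end{aligned}$$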
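Two ingredients of this step can be checked numerically. The sketch below is our own and uses arbitrary parameter grids. It verifies Lemma 3 in exact rational arithmetic, including the shifted form $n\,\mathbb{E}\{1/(1+B(n-1,p))\}=(1-(1-p)^n)/p$ that gives (19). It also compares the series $\sum_{k\ge1}\tilde q_{n,k}/k$, truncated, with the bound $(1/\lambda+3/\lambda^2)(1-e^{-\lambda})$, where $\lambda=n\mu(A)$. The bound rests on $1/k=1/(k+1)+1/(k(k+1))$ and on $1/(k(k+1))\le3/((k+1)(k+2))$ for $k\ge1$.

```python
from fractions import Fraction
from math import comb, exp


def lemma3_lhs(n, p):
    """E{1/(1 + B(n, p))}, summed exactly over the binomial distribution."""
    return sum(Fraction(comb(n, k)) * p**k * (1 - p)**(n - k) / (1 + k) for k in range(n + 1))


def lemma3_rhs(n, p):
    """Closed form (24): (1 - (1 - p)^(n + 1)) / ((n + 1) p)."""
    return (1 - (1 - p)**(n + 1)) / ((n + 1) * p)


def poisson_harmonic(lam, terms=1000):
    """sum_{k >= 1} q_k / k with Poisson(lam) weights q_k, truncated after `terms` terms."""
    q, total = exp(-lam), 0.0        # q = P{Po(lam) = k}, updated recursively
    for k in range(1, terms):
        q *= lam / k
        total += q / k
    return total


def harmonic_bound(lam):
    """The bound (1/lam + 3/lam^2)(1 - exp(-lam)) used for Var(M~_n)."""
    return (1 / lam + 3 / lam**2) * (1 - exp(-lam))


# (19) uses (24) with n - 1 in place of n: n * E{1/(1 + B(n - 1, p))} = (1 - (1 - p)^n) / p.
n, p = 7, Fraction(2, 9)
print(n * lemma3_lhs(n - 1, p) == (1 - (1 - p)**n) / p)   # True
```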
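The bounding error in these inequalities is of order $O(l_n/n)$. Relation (4), together with the boundedness of $M_2$ and $m$, implies that
$$\begin{aligned}
\sum_{j=1}^{l_n}\frac{\nu(A_{n,j})^2}{\mu(A_{n,j})^2}\mathrm{Var}(Y(A_{n,j}))\mu(A_{n,j})
&=\int\left(\frac{\int_{A_n(x)}M_2(z)\,\mu(dz)}{\mu(A_n(x))}\Big(\frac{\int_{A_n(x)}m(z)\,\mu(dz)}{\mu(A_n(x))}\Big)^2-\Big(\frac{\int_{A_n(x)}m(z)\,\mu(dz)}{\mu(A_n(x))}\Big)^4\right)\mu(dx)\\
&=\sigma_2^2+o(1),
\end{aligned}$$
where $\sigma_2^2$ is defined by (10). Moreover,
$$\frac3n\sum_{j=1}^{l_n}\mathrm{Var}(Y(A_{n,j}))\frac{\nu(A_{n,j})^2}{\mu(A_{n,j})^2}\le3C^{4/3}\frac{l_n}n\to0.$$
Then
$$\begin{aligned}
n\sum_{j=1}^{l_n}\nu(A_{n,j})^2\,\mathbb{E}\{Y(A_{n,j})\}^2e^{-n\mu(A_{n,j})}
&=n\sum_{j=1}^{l_n}\frac{\nu(A_{n,j})^2}{\mu(A_{n,j})^2}\mathbb{E}\{Y(A_{n,j})\}^2\mu(A_{n,j})^2e^{-n\mu(A_{n,j})}\\
&\le C^{4/3}\frac1n\sum_{j=1}^{l_n}(n\mu(A_{n,j}))^2e^{-n\mu(A_{n,j})}\le C^{4/3}\max_{z>0}(z^2e^{-z})\frac{l_n}n\to0.
\end{aligned}$$
So we proved that
$$\mathrm{Var}(\tilde M_n)\to\sigma_2^2.$$
To complete the asymptotics for $\mathrm{Var}(T_n)$, it remains to show that
$$\mathbb{E}\Big\{\frac{N_n-n}{\sqrt n}\tilde M_n\Big\}\to0$$
as $n\to\infty$. Observe that
$$\frac{N_n-n}{\sqrt n}=\sqrt n\sum_j\big(\tilde\mu_n(A_{n,j})-\mu(A_{n,j})\big).$$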
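The convergence of the histogram-type sum to $\sigma_2^2$ can be seen explicitly in a hypothetical model of our own choosing. Take $X$ uniform on $[0,1]$, $Y=X+\varepsilon$ with $\varepsilon\sim N(0,1)$ (so $m(x)=x$ and $M_2(x)=x^2+1$), and $k$ equal cells of width $h=1/k$. Each cell with midpoint $c$ then has $\mu(A)=h$, $\nu(A)/\mu(A)=c$ and $\mathrm{Var}(Y(A))=1+h^2/12$. The sum therefore equals $(1/3-h^2/12)(1+h^2/12)$, which tends to $\sigma_2^2=1/3$. A minimal sketch:

```python
def sigma2_sq_histogram(k):
    """sum_j (nu(A_j)/mu(A_j))^2 Var(Y(A_j)) mu(A_j) for k equal cells of [0, 1),
    in the hypothetical model X ~ Uniform[0, 1], Y = X + N(0, 1)."""
    h = 1.0 / k
    total = 0.0
    for j in range(k):
        c = (j + 0.5) * h                 # nu(A_j)/mu(A_j) = E{Y | X in A_j} = cell midpoint
        var_cell = 1.0 + h * h / 12.0     # Var(Y(A_j)) = Var(X | X in A_j) + Var(noise)
        total += c * c * var_cell * h     # mu(A_j) = h
    return total


for k in (10, 100, 1000):
    print(k, abs(sigma2_sq_histogram(k) - 1 / 3))   # error shrinks roughly like h^2
```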
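Using this and the independence of the Poisson masses over different cells, we have that
$$\begin{aligned}
\mathbb{E}\Big\{\frac{N_n-n}{\sqrt n}\tilde M_n\Big\}
&=n\sum_{j=1}^{l_n}\nu(A_{n,j})\,\mathbb{E}\Big\{\Big(\frac{\tilde\nu_n(A_{n,j})}{\tilde\mu_n(A_{n,j})}-\mathbb{E}\Big\{\frac{\tilde\nu_n(A_{n,j})}{\tilde\mu_n(A_{n,j})}\Big\}\Big)\big(\tilde\mu_n(A_{n,j})-\mu(A_{n,j})\big)\Big\}\\
&=n\sum_{j=1}^{l_n}\nu(A_{n,j})\Big(\mathbb{E}\{\tilde\nu_n(A_{n,j})\}-\mathbb{E}\Big\{\frac{\tilde\nu_n(A_{n,j})}{\tilde\mu_n(A_{n,j})}\Big\}\mu(A_{n,j})\Big)\\
&=n\sum_{j=1}^{l_n}\nu(A_{n,j})\Big(\nu(A_{n,j})-\frac{\nu(A_{n,j})}{\mu(A_{n,j})}\big(1-e^{-n\mu(A_{n,j})}\big)\mu(A_{n,j})\Big)\\
&=n\sum_{j=1}^{l_n}\nu(A_{n,j})^2e^{-n\mu(A_{n,j})}\le C^{2/3}\max_{z>0}(z^2e^{-z})\frac{l_n}n\to0.
\end{aligned}$$
To finish the proof of (17) by Lyapunov's central limit theorem, it suffices to prove that
$$n^{3/2}\sum_{j=1}^{l_n}\mathbb{E}\Big\{\Big|t\,\nu(A_{n,j})\Big(\frac{\tilde\nu_n(A_{n,j})}{\tilde\mu_n(A_{n,j})}-\mathbb{E}\Big\{\frac{\tilde\nu_n(A_{n,j})}{\tilde\mu_n(A_{n,j})}\Big\}\Big)+v\big(\tilde\mu_n(A_{n,j})-\mu(A_{n,j})\big)\Big|^3\Big\}\to0.$$
By the $c_3$ inequality $|a+b|^3\le4(|a|^3+|b|^3)$, this in turn follows from
$$n^{3/2}\sum_{j=1}^{l_n}\mathbb{E}\Big\{\Big|\frac{\tilde\nu_n(A_{n,j})}{\tilde\mu_n(A_{n,j})}-\mathbb{E}\Big\{\frac{\tilde\nu_n(A_{n,j})}{\tilde\mu_n(A_{n,j})}\Big\}\Big|^3\Big\}|\nu(A_{n,j})|^3\to0 \tag{20}$$
and
$$n^{3/2}\sum_{j=1}^{l_n}\mathbb{E}\big\{|\tilde\mu_n(A_{n,j})-\mu(A_{n,j})|^3\big\}\to0. \tag{21}$$
In view of (20), because of (13) it suffices to prove
$$D_n:=n^{3/2}\sum_{j=1}^{l_n}\mathbb{E}\Big\{\Big|\frac{\tilde\nu_n(A_{n,j})}{\tilde\mu_n(A_{n,j})}-\mathbb{E}\Big\{\frac{\tilde\nu_n(A_{n,j})}{\tilde\mu_n(A_{n,j})}\Big\}\Big|^3\Big\}\mu(A_{n,j})^3\to0. \tag{22}$$
For a cell $A$, (18) implies that
$$\mathbb{E}\Big\{\Big|\frac{\tilde\nu_n(A)}{\tilde\mu_n(A)}-\mathbb{E}\Big\{\frac{\tilde\nu_n(A)}{\tilde\mu_n(A)}\Big\}\Big|^3\Big\}\le4\,\mathbb{E}\Big\{\Big|\frac{\tilde\nu_n(A)}{\tilde\mu_n(A)}-\frac{\nu(A)}{\mu(A)}I_{\{\tilde\mu_n(A)>0\}}\Big|^3\Big\}+4\,\mathbb{E}\Big\{\Big|\frac{\nu(A)}{\mu(A)}I_{\{\tilde\mu_n(A)>0\}}-\frac{\nu(A)}{\mu(A)}(1-\tilde q_{n,0})\Big|^3\Big\}.$$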
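The single-cell identity behind the covariance computation above is $\mathbb{E}\{(\tilde\nu_n(A)/\tilde\mu_n(A))(\tilde\mu_n(A)-\mu(A))\}=\nu(A)e^{-n\mu(A)}$. Conditionally on $n\tilde\mu_n(A)=k\ge1$, the ratio has mean $\mathbb{E}\{Y(A)\}=\nu(A)/\mu(A)$. The left-hand side is therefore a Poisson series that can be summed numerically. The sketch below does this for a few arbitrary (hypothetical) values of $n$, $\mu(A)$ and $\mathbb{E}\{Y(A)\}$, and compares the result with the closed form.

```python
from math import exp


def cross_moment(n, mu_A, mean_Y_A, terms=2000):
    """E{(nu~_n(A)/mu~_n(A)) (mu~_n(A) - mu(A))} by conditioning on K = n mu~_n(A) ~ Po(n mu(A)):
    given K = k >= 1 the ratio has mean E{Y(A)}; the k = 0 term vanishes (0/0 = 0)."""
    lam = n * mu_A
    q, total = exp(-lam), 0.0          # q = P{K = k}, updated recursively
    for k in range(1, terms):
        q *= lam / k
        total += mean_Y_A * (k / n - mu_A) * q
    return total


def closed_form(n, mu_A, mean_Y_A):
    """nu(A) exp(-n mu(A)), with nu(A) = E{Y(A)} mu(A)."""
    return mean_Y_A * mu_A * exp(-n * mu_A)


CASES = [(50, 0.02, 1.5), (100, 0.05, -2.0), (1000, 0.01, 0.7)]   # hypothetical (n, mu(A), E{Y(A)})
for case in CASES:
    print(case, cross_moment(*case), closed_form(*case))
```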
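On the one hand, (18), (13) and (25) imply that, for a constant $K$,
$$\begin{aligned}
\mathbb{E}\Big\{\Big|\frac{\tilde\nu_n(A)}{\tilde\mu_n(A)}-\frac{\nu(A)}{\mu(A)}I_{\{\tilde\mu_n(A)>0\}}\Big|^3\Big\}
&=\sum_{k=0}^\infty\mathbb{E}\Big\{\Big|\frac{\tilde\nu_n(A)}{\tilde\mu_n(A)}-\frac{\nu(A)}{\mu(A)}I_{\{\tilde\mu_n(A)>0\}}\Big|^3\,\Big|\,n\tilde\mu_n(A)=k\Big\}\mathbb{P}\{n\tilde\mu_n(A)=k\}\\
&=\sum_{k=1}^\infty\frac{\mathbb{E}\big\{\big|\sum_{i=1}^k\big(Y_i(A)-\mathbb{E}\{Y_i(A)\}\big)\big|^3\big\}}{k^3}\tilde q_{n,k}
\le K\sum_{k=1}^\infty k^{-3/2}\tilde q_{n,k}\le\frac{c_1}{n^{3/2}\mu(A)^{3/2}},
\end{aligned}$$
where we applied the Marcinkiewicz and Zygmund (1937) inequality for absolute central moments of sums of i.i.d. random variables. On the other hand,
$$\mathbb{E}\Big\{\Big|\frac{\nu(A)}{\mu(A)}I_{\{\tilde\mu_n(A)>0\}}-\frac{\nu(A)}{\mu(A)}(1-\tilde q_{n,0})\Big|^3\Big\}\le C\,\tilde q_{n,0}.$$
Therefore
$$\begin{aligned}
D_n&\le n^{3/2}\sum_{j=1}^{l_n}c_2\Big(\frac1{n^{3/2}\mu(A_{n,j})^{3/2}}+e^{-n\mu(A_{n,j})}\Big)\mu(A_{n,j})^3
\le c_2\sum_{j=1}^{l_n}\Big(\mu(A_{n,j})^{3/2}+n^{3/2}e^{-n\mu(A_{n,j})}\mu(A_{n,j})^3\Big)\\
&\le c_2\Big(1+\max_{z>0}z^{3/2}e^{-z}\Big)\sum_{j=1}^{l_n}\mu(A_{n,j})^{3/2}=c_3\int\mu(A_n(x))^{1/2}\,\mu(dx)\to0,
\end{aligned}$$
where we used the assumption that $\mu$ is non-atomic. Thus, (20) is proved. The proof of (21) is easier. Notice that (21) means
$$F_n:=n^{3/2}\sum_{j=1}^{l_n}\mathbb{E}\Big\{\Big|\frac1n\sum_{i=1}^{N_n}I_{\{X_i\in A_{n,j}\}}-\mu(A_{n,j})\Big|^3\Big\}\to0.$$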
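One way to pass from Lemma 4 to the bound $K\sum_kk^{-3/2}\tilde q_{n,k}\le c_1/(n\mu(A))^{3/2}$ used above is the Cauchy-Schwarz inequality. It gives $\mathbb{E}\{Po(\lambda)^{-3/2}I_{\{Po(\lambda)>0\}}\}\le\sqrt{24}\,\lambda^{-3/2}$. Both inequalities can be spot-checked by summing the Poisson series. This is a numerical sanity check on an arbitrary grid of $\lambda$ values, not a proof.

```python
from math import exp, sqrt


def poisson_inverse_moment(lam, power, terms=2000):
    """E{Po(lam)^(-power) I{Po(lam) > 0}}, via the Poisson series truncated after `terms` terms."""
    q, total = exp(-lam), 0.0          # q = P{Po(lam) = k}, updated recursively
    for k in range(1, terms):
        q *= lam / k
        total += q / k**power
    return total


grid = (0.05, 0.5, 1.0, 3.0, 10.0, 50.0, 200.0)
# Lemma 4 (25): E{Po^-3 I{Po > 0}} <= 24 / lam^3.
lemma4_ok = all(poisson_inverse_moment(lam, 3) <= 24 / lam**3 for lam in grid)
# Cauchy-Schwarz consequence: E{Po^-3/2 I{Po > 0}} <= sqrt(24) / lam^(3/2).
three_halves_ok = all(poisson_inverse_moment(lam, 1.5) <= sqrt(24) / lam**1.5 for lam in grid)
print(lemma4_ok, three_halves_ok)   # True True
```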
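One has
$$\begin{aligned}
\mathbb{E}\Big\{\Big|\frac1n\sum_{i=1}^{N_n}I_{\{X_i\in A_{n,j}\}}-\mu(A_{n,j})\Big|^3\Big\}
&\le4\,\mathbb{E}\Big\{\Big|\frac1n\sum_{i=1}^{N_n}\big(I_{\{X_i\in A_{n,j}\}}-\mu(A_{n,j})\big)\Big|^3\Big\}+4\,\mathbb{E}\Big\{\Big|\Big(\frac{N_n}n-1\Big)\mu(A_{n,j})\Big|^3\Big\}\\
&\le c_4\Big(\sum_{k=1}^\infty\frac{k^{3/2}}{n^3}\mu(A_{n,j})^{3/2}e^{-n}\frac{n^k}{k!}+\mathbb{E}\Big\{\Big|\frac{N_n}n-1\Big|^3\Big\}\mu(A_{n,j})^3\Big)\\
&\le c_5\big(n^{-3/2}\mu(A_{n,j})^{3/2}+n^{-3/2}\mu(A_{n,j})^3\big).
\end{aligned}$$
Therefore
$$F_n\le2c_5\sum_{j=1}^{l_n}\mu(A_{n,j})^{3/2}\to0,$$
so (21) is proved, too.

The remaining step in the proof of (12) is to show that
$$\Delta_n:=V_n-M_n=\sqrt n\sum_{j=1}^{l_n}\nu(A_{n,j})\Big(\mathbb{E}\Big\{\frac{\tilde\nu_n(A_{n,j})}{\tilde\mu_n(A_{n,j})}\Big\}-\mathbb{E}\Big\{\frac{\nu_n(A_{n,j})}{\mu_n(A_{n,j})}\Big\}\Big)\to0. \tag{23}$$
By (18) and (19) we have that
$$\begin{aligned}
|\Delta_n|&=\sqrt n\sum_{j=1}^{l_n}\nu(A_{n,j})\frac{\nu(A_{n,j})}{\mu(A_{n,j})}\big(e^{-n\mu(A_{n,j})}-(1-\mu(A_{n,j}))^n\big)\\
&=\sqrt n\sum_{j=1}^{l_n}\frac{\nu(A_{n,j})^2}{\mu(A_{n,j})^2}\big(e^{-n\mu(A_{n,j})}-(1-\mu(A_{n,j}))^n\big)\mu(A_{n,j})\\
&\le C^{2/3}\sqrt n\sum_{j=1}^{l_n}\big(e^{-n\mu(A_{n,j})}-(1-\mu(A_{n,j}))^n\big)\mu(A_{n,j}).
\end{aligned}$$
For $0\le z\le1$, using the elementary inequalities
$$1-z\le e^{-z}\le1-z+z^2,$$
we have that
$$0\le e^{-nz}-(1-z)^n=\big(e^{-z}-(1-z)\big)\sum_{k=0}^{n-1}e^{-kz}(1-z)^{n-1-k}\le nz^2e^{-(n-1)z}.$$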
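The elementary bound derived above can be spot-checked numerically. The sketch below evaluates both sides of $0\le e^{-nz}-(1-z)^n\le nz^2e^{-(n-1)z}$ on a grid of $z\in[0,1]$ for a few values of $n$; the grid and the values are arbitrary choices. It allows a tolerance of $10^{-15}$ for floating-point rounding.

```python
from math import exp


def gap(n, z):
    """Left-hand side e^{-nz} - (1 - z)^n."""
    return exp(-n * z) - (1 - z) ** n


def gap_bound(n, z):
    """Right-hand side n z^2 e^{-(n-1)z}."""
    return n * z * z * exp(-(n - 1) * z)


zs = [i / 1000 for i in range(1001)]
violations = [
    (n, z)
    for n in (1, 2, 5, 50, 500)
    for z in zs
    if not (-1e-15 <= gap(n, z) <= gap_bound(n, z) + 1e-15)
]
print(violations)   # []
```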
Thus we get that
$$\begin{aligned}
|\Delta_n|&\le C^{2/3}\sqrt n\sum_{j=1}^{l_n}\big(e^{-n\mu(A_{n,j})}-(1-\mu(A_{n,j}))^n\big)\mu(A_{n,j})
\le C^{2/3}\sqrt n\sum_{j=1}^{l_n}n\,\mu(A_{n,j})^3e^{-(n-1)\mu(A_{n,j})}\\
&\le\frac{C^{2/3}}{\sqrt n}\sum_{j=1}^{l_n}\big[n\mu(A_{n,j})\big]^2e^{-n\mu(A_{n,j})}e^{\mu(A_{n,j})}\mu(A_{n,j})
\le\frac{C^{2/3}}{\sqrt n}\max_{z\ge0}(z^2e^{-z})\,e\to0.
\end{aligned}$$
This ends the proof of (12), and so the proof of Theorem 1 is complete.

Next we give two lemmas, which are used above.

Lemma 3 If $B(n,p)$ is a binomial random variable with parameters $(n,p)$, then
$$\mathbb{E}\Big\{\frac1{1+B(n,p)}\Big\}=\frac{1-(1-p)^{n+1}}{(n+1)p}. \tag{24}$$

Lemma 4 If $Po(\lambda)$ is a Poisson random variable with parameter $\lambda$, then
$$\mathbb{E}\Big\{\frac1{Po(\lambda)^3}I_{\{Po(\lambda)>0\}}\Big\}\le\frac{24}{\lambda^3}. \tag{25}$$

References

J. Beirlant and L. Györfi. On the asymptotic $L_2$-error in partitioning regression estimation. Journal of Statistical Planning and Inference, 71:93-107, 1998.

J. Beirlant and D. M. Mason. On the asymptotic normality of $L_p$-norms of empirical functionals. Mathematical Methods of Statistics, 4:1-19, 1995.

J. Beirlant, L. Györfi, and G. Lugosi. On the asymptotic normality of the $L_1$- and $L_2$-errors in histogram density estimation. Canadian Journal of Statistics, 22:309-318, 1994.

K. De Brabanter, P. G. Ferrario, and L. Györfi. Detecting ineffective features for nonparametric regression. In J. A. K. Suykens, M. Signoretto, and A. Argyriou, editors, Regularization, Optimization, Kernels, and Support Vector Machines, pages 177-194. Chapman & Hall/CRC Machine Learning and Pattern Recognition Series, 2014.

L. Devroye, D. Schäfer, L. Györfi, and H. Walk. The estimation problem of minimum mean squared error. Statistics and Decisions, 21:15-28, 2003.
L. Devroye, P. Ferrario, L. Györfi, and H. Walk. Strong universal consistent estimate of the minimum mean squared error. In B. Schölkopf, Z. Luo, and V. Vovk, editors, Empirical Inference - Festschrift in Honor of Vladimir N. Vapnik, pages 143-160. Springer, Heidelberg, 2013.

D. Evans and A. J. Jones. Non-parametric estimation of residual moments and covariance. Proceedings of the Royal Society, A 464:2831-2846, 2008.

P. G. Ferrario and H. Walk. Nonparametric partitioning estimation of residual and local variance based on first and second nearest neighbors. Journal of Nonparametric Statistics, 24:1019-1039, 2012.

L. Györfi, M. Kohler, A. Krzyżak, and H. Walk. A Distribution-Free Theory of Nonparametric Regression. Springer Verlag, New York, 2002.

E. Liitiäinen, F. Corona, and A. Lendasse. On nonparametric residual variance estimation. Neural Processing Letters, 28:155-167, 2008.

E. Liitiäinen, M. Verleysen, F. Corona, and A. Lendasse. Residual variance estimation in machine learning. Neurocomputing, 72:3692-3703, 2009.

E. Liitiäinen, F. Corona, and A. Lendasse. Residual variance estimation using a nearest neighbor statistic. Journal of Multivariate Analysis, 101:811-823, 2010.

J. Marcinkiewicz and A. Zygmund. Sur les fonctions indépendantes. Fundamenta Mathematicae, 29:60-90, 1937.

V. V. Petrov. Sums of Independent Random Variables. Springer-Verlag, Berlin, 1975.