On the Asymptotic Normality of an Estimate of a Regression Functional

Journal of Machine Learning Research 16 (2015) 1863-1877. Submitted 6/15; Published 9/15.

On the Asymptotic Normality of an Estimate of a Regression Functional

László Györfi
Department of Computer Science and Information Theory
Budapest University of Technology and Economics
Magyar Tudósok körútja 2., H-1117 Budapest, Hungary
gyorfi@cs.bme.hu

Harro Walk
Department of Mathematics
University of Stuttgart
Pfaffenwaldring 57, D-70569 Stuttgart, Germany
walk@mathematik.uni-stuttgart.de

Editor: Alex Gammerman and Vladimir Vovk

Abstract

An estimate of the second moment of the regression function is introduced. Its asymptotic normality is proved such that the asymptotic variance depends neither on the dimension of the observation vector, nor on the smoothness properties of the regression function. The asymptotic variance is given explicitly.

Keywords: nonparametric estimation, regression functional, central limit theorem, partitioning estimate

1. Introduction

This paper considers a histogram-based estimate of the second moment of the regression function in multivariate problems. The interest in the second moment is motivated by the fact that by estimating it one obtains an estimate of the best achievable mean squared error, a quantity of obvious statistical interest. It is shown that the estimate is asymptotically normally distributed. It is remarkable that the asymptotic variance depends only on moments of the regression function, but neither on its smoothness nor on the dimension of the space. The proof relies on a Poissonization technique that has been used successfully in related problems.

Let Y be a real valued random variable with E{Y^2} < ∞ and let X = (X^(1), ..., X^(d)) be a d-dimensional random observation vector. In regression analysis one wishes to estimate Y given X, i.e., one wants to find a function g defined on the range of X so that g(X) is close to Y. Assume that the main aim of the analysis is to minimize the mean squared error:

    min_g E{(g(X) - Y)^2}.

This research has been partially supported by the European Union and Hungary and co-financed by the European Social Fund through the project TÁMOP-4.2.2.C-11/1/KONV-2012-0004 - National Research Center for Development and Market Introduction of Advanced Information and Communication Technologies.

(c) 2015 László Györfi and Harro Walk.

As is well known, this minimum is achieved by the regression function m(x), which is defined by

    m(x) = E{Y | X = x}.    (1)

For each measurable function g one has

    E{(g(X) - Y)^2} = E{(m(X) - Y)^2} + E{(m(X) - g(X))^2}
                    = E{(m(X) - Y)^2} + ∫ (m(x) - g(x))^2 µ(dx),

where µ stands for the distribution of the observation X. It is of great importance to be able to estimate the minimum mean squared error

    L* = E{(m(X) - Y)^2}

accurately, even before a regression estimate is applied: in a standard nonparametric regression design process, one considers a finite number of real-valued features X^(i), i ∈ I, and evaluates whether these suffice to explain Y. In case they suffice for the given explanatory task, an estimation method can be applied on the basis of the features already under consideration, and if not, more or different features must be considered. The quality of a subvector {X^(i), i ∈ I} of X is measured by the minimum mean squared error

    L*(I) := E{ (Y - E{Y | X^(i), i ∈ I})^2 }

that can be achieved using these features as explanatory variables. L*(I) depends upon the unknown distribution of (Y, X^(i) : i ∈ I). The first phase of any regression estimation process therefore heavily relies on estimates of L* (even before a regression estimate is picked). Concerning dimension reduction, the related testing problem is on the hypothesis L* = L*(I). This testing problem can be managed such that we estimate both L* and L*(I), and accept the hypothesis if the two estimates are close to each other. (Cf. De Brabanter et al. (2014).)

Devroye et al. (2003), Evans and Jones (2008), Liitiäinen et al. (2008), Liitiäinen et al. (2009), Liitiäinen et al. (2010), and Ferrario and Walk (2012) introduced nearest neighbor based estimates of L*, proved strong universal consistency and calculated the (fast) rate of convergence. Because of

    L* = E{Y^2} - E{m(X)^2} and E{Y^2} < ∞,

estimating L* is equivalent to estimating the second moment S of the regression function:

    S = E{m(X)^2} = ∫ m(x)^2 µ(dx).

In this paper we introduce a partitioning based estimator of S and show its asymptotic normality. It turns out that the asymptotic variance depends neither on the dimension of the observation vector, nor on the smoothness properties of the regression function. The asymptotic variance is given explicitly.
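The decomposition of the mean squared error above is easy to probe numerically. A minimal Monte Carlo sketch (the toy model X uniform on [0,1], Y = X^2 + Gaussian noise with m(x) = x^2 and L* = 0.01, and the competitor g are assumptions made here purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy check of E{(g(X)-Y)^2} = L* + integral of (m(x)-g(x))^2 mu(dx),
# with X ~ U[0,1], Y = X^2 + N(0, 0.1^2), so m(x) = x^2 and L* = 0.01.
n = 1_000_000
x = rng.uniform(0.0, 1.0, n)
y = x**2 + rng.normal(0.0, 0.1, n)

g = lambda t: 0.5 * t                      # an arbitrary competitor function
lhs = np.mean((g(x) - y) ** 2)             # E{(g(X) - Y)^2}
rhs = 0.01 + np.mean((x**2 - g(x)) ** 2)   # L* + integral of (m - g)^2
print(lhs, rhs)                            # the two sides agree up to MC error
```

The simulated left-hand side matches L* plus the excess term to within Monte Carlo error, illustrating why an estimate of L* (equivalently, of S) is useful before any particular g is fitted.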

2. A Splitting Estimate

We suppose that the regression estimation problem is based on a sequence (X_1, Y_1), (X_2, Y_2), ... of i.i.d. random vectors distributed as (X, Y). Let P_n = {A_{n,j}, j = 1, 2, ...} be a cubic partition of R^d of size h_n > 0. The partitioning estimator of the regression function m is defined (interpreting 0/0 = 0) as

    m_n(x) = ν_n(A_{n,j}) / µ_n(A_{n,j})  if x ∈ A_{n,j},    (2)

with

    ν_n(A) = (1/n) Σ_{i=1}^n I_{{X_i ∈ A}} Y_i  and  µ_n(A) = (1/n) Σ_{i=1}^n I_{{X_i ∈ A}}.

(Here I denotes the indicator function.) If for the cubic partition

    h_n → 0 and n h_n^d → ∞    (3)

as n → ∞, then the partitioning regression estimate (2) is weakly universally consistent, which means that

    lim_n E{ ∫ (m_n(x) - m(x))^2 µ(dx) } = 0    (4)

for any distribution of (X, Y) with E{Y^2} < ∞, and for bounded Y it holds that

    lim_n ∫ (m_n(x) - m(x))^2 µ(dx) = 0 a.s.    (5)

(Cf. Theorems 4.2 and 23.1 in Györfi et al. (2002).)

Assume splitting data

    Z_n = {(X_1, Y_1), ..., (X_n, Y_n)} and D_n = {(X'_1, Y'_1), ..., (X'_n, Y'_n)}

such that (X_1, Y_1), ..., (X_n, Y_n), (X'_1, Y'_1), ..., (X'_n, Y'_n) are i.i.d. The splitting data estimate of S is defined as

    S_n := (1/n) Σ_{i=1}^n Y'_i m_n(X'_i) = (1/n) Σ_{i=1}^n Σ_j I_{{X'_i ∈ A_{n,j}}} Y'_i ν_n(A_{n,j}) / µ_n(A_{n,j}).
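In code, the partitioning estimate (2) on a cubic partition of side h is simply a per-cell average of the responses; a minimal sketch (function and variable names are ours, not from the paper):

```python
import numpy as np

def partitioning_estimate(x_train, y_train, h):
    """Partitioning (histogram) regression estimate, formula (2), on the
    cubic partition of R^d with side length h; convention 0/0 := 0."""
    cells = [tuple(c) for c in np.floor(np.asarray(x_train) / h).astype(int)]
    sums, counts = {}, {}
    for cell, y in zip(cells, y_train):
        sums[cell] = sums.get(cell, 0.0) + float(y)
        counts[cell] = counts.get(cell, 0) + 1

    def m_n(x):
        cell = tuple(np.floor(np.asarray(x) / h).astype(int))
        k = counts.get(cell, 0)
        return sums.get(cell, 0.0) / k if k > 0 else 0.0  # 0/0 := 0
    return m_n
```

For a query point falling into an empty cell the estimate returns 0, matching the 0/0 = 0 convention in (2).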

Put

    ν'_n(A) = (1/n) Σ_{i=1}^n I_{{X'_i ∈ A}} Y'_i,

then S_n has the equivalent form

    S_n = Σ_j ν'_n(A_{n,j}) ν_n(A_{n,j}) / µ_n(A_{n,j}).    (6)

Theorem 1 Assume (3) and that µ is non-atomic and has bounded support. Suppose that there is a finite constant C such that

    E{|Y|^3 | X} < C.    (7)

Then

    √n (S_n - E{S_n}) / σ →_D N(0, 1),

where

    σ^2 = 2 ∫ M_2(x) m(x)^2 µ(dx) - ( ∫ m(x)^2 µ(dx) )^2 - ∫ m(x)^4 µ(dx),

with M_2(X) = E{Y^2 | X}.

The estimation problem is motivated by the above mentioned dimension reduction such that one estimates S for the original observation vector and for the observation vector where some components are left out. If the two estimates are close to each other, then we decide that the left-out components are ineffective. Theorem 1 is on the random part of the estimates. Therefore there is a further need to study the difference of the biases of the estimates. Under (3) we have lim_n E{S_n} = S, and for Lipschitz continuous m the rate of convergence can be of order n^{-1/d} for a suitable choice of h_n. (Cf. Devroye et al. (2013).) Similarly to De Brabanter et al. (2014), we conjecture that this difference of the biases universally has a fast rate of convergence.

Obviously, there are several other possibilities for defining partitioning based estimates and proving their asymptotic normality, for example,

    (1/n) Σ_{i=1}^n m_n(X'_i)^2  or  Σ_j ν_n(A_{n,j})^2 / µ_n(A_{n,j}).

Notice that both estimates have larger bias and variance than our estimate (6) has. The proof of Theorem 1 works without any major modification for the consistent k_n nearest neighbor (k_n-NN) estimate m_n if k_n → ∞ and k_n / n → 0. A delicate and important research problem is the case of the non-consistent 1-NN estimate m_n, because for the 1-NN estimate the bias is smaller. We conjecture that even in this case one has a CLT. We prove Theorem 1 in the next section.
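The behaviour claimed by Theorem 1 can be probed by simulation. A minimal Monte Carlo sketch of the splitting estimate (6) in d = 1 (the model Y = X + noise with X uniform on [0,1], so m(x) = x and S = 1/3, is an assumption made for illustration; this is not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

def splitting_estimate(n, h):
    """Splitting-data estimate S_n of S = E{m(X)^2}, formula (6): the
    first sample builds the partitioning estimate m_n on cells of side h,
    the second (independent) sample averages Y'_i * m_n(X'_i)."""
    x1 = rng.uniform(0.0, 1.0, n)            # Z_n: builds m_n
    y1 = x1 + rng.normal(0.0, 0.1, n)
    x2 = rng.uniform(0.0, 1.0, n)            # D_n: evaluates
    y2 = x2 + rng.normal(0.0, 0.1, n)
    c1 = np.floor(x1 / h).astype(int)        # cell index of each point
    c2 = np.floor(x2 / h).astype(int)
    s_n = 0.0
    for j in np.unique(c2):
        in1 = c1 == j
        if in1.any():                        # empty cell: 0/0 := 0
            s_n += (y2[c2 == j] * y1[in1].mean()).sum() / n
    return s_n

est = splitting_estimate(20000, 0.05)
print(est)  # should be close to S = 1/3
```

Repeating the call over many independent runs and standardizing by √n gives an empirical histogram that can be compared against the normal limit of Theorem 1.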

3. Proof of Theorem 1

Introduce the notations

    U_n = √n (S_n - E{S_n | Z_n}) and V_n = √n (E{S_n | Z_n} - E{S_n}),

then

    √n (S_n - E{S_n}) = U_n + V_n.

We prove Theorem 1 by showing that, for any u, v ∈ R,

    P{U_n ≤ u, V_n ≤ v} → Φ(u/σ_1) Φ(v/σ_2),    (8)

where Φ denotes the standard normal distribution function, and

    σ_1^2 = ∫ M_2(x) m(x)^2 µ(dx) - ( ∫ m(x)^2 µ(dx) )^2    (9)

and

    σ_2^2 = ∫ M_2(x) m(x)^2 µ(dx) - ∫ m(x)^4 µ(dx).    (10)

Notice that V_n is measurable with respect to Z_n, therefore

    | P{U_n ≤ u, V_n ≤ v} - Φ(u/σ_1) Φ(v/σ_2) |
        = | E{ I_{{V_n ≤ v}} P{U_n ≤ u | Z_n} } - Φ(u/σ_1) Φ(v/σ_2) |
        ≤ E{ I_{{V_n ≤ v}} | P{U_n ≤ u | Z_n} - Φ(u/σ_1) | } + Φ(u/σ_1) | P{V_n ≤ v} - Φ(v/σ_2) |
        ≤ E{ | P{U_n ≤ u | Z_n} - Φ(u/σ_1) | } + | P{V_n ≤ v} - Φ(v/σ_2) |.

Thus, (8) is satisfied if

    P{U_n ≤ u | Z_n} → Φ(u/σ_1)    (11)

in probability and

    P{V_n ≤ v} → Φ(v/σ_2).    (12)

Proof of (11). Let us start with the representation

    U_n = √n (1/n) Σ_{i=1}^n ( Y'_i m_n(X'_i) - E{ Y'_i m_n(X'_i) | Z_n } )
        = (1/√n) Σ_{i=1}^n ( Y'_i m_n(X'_i) - E{ Y'_i m_n(X'_i) | Z_n } ).

Because of (7) and the Jensen inequality, for any 1 ≤ s ≤ 3 we get

    M_s(X) := E{|Y|^s | X} = ( E{|Y|^s | X}^{1/s} )^s ≤ ( E{|Y|^3 | X}^{1/3} )^s ≤ C^{s/3},    (13)

especially, for s = 1,

    |m(X)| ≤ M_1(X) ≤ C^{1/3} and E{|Y|^3} ≤ C.

Next we apply a Berry-Esseen type central limit theorem (see Theorem 4 in Petrov (1975)). It implies that

    | P{U_n ≤ u | Z_n} - Φ( u / √Var(Y'_1 m_n(X'_1) | Z_n) ) |
        ≤ c E{ |Y'_1 m_n(X'_1)|^3 | Z_n } / ( √n Var(Y'_1 m_n(X'_1) | Z_n)^{3/2} )

with a universal constant c > 0. Because of

    E{ Y'_1 m_n(X'_1) | Z_n } = ∫ m(x) m_n(x) µ(dx),

we get that

    Var(Y'_1 m_n(X'_1) | Z_n) = E{ Y'_1^2 m_n(X'_1)^2 | Z_n } - E{ Y'_1 m_n(X'_1) | Z_n }^2
        = ∫ M_2(x) m_n(x)^2 µ(dx) - ( ∫ m(x) m_n(x) µ(dx) )^2.

Now (4), together with the boundedness of M_2 by (13), implies that

    Var(Y'_1 m_n(X'_1) | Z_n) → σ_1^2

in probability, where σ_1^2 is defined by (9). Further,

    E{ |Y'_1 m_n(X'_1)|^3 | Z_n } ≤ C ∫ |m_n(x)|^3 µ(dx).

Put

    A_n(x) = A_{n,j} if x ∈ A_{n,j}.

Again applying the Jensen inequality, we get

    |m_n(x)|^3 ≤ ( Σ_{i=1}^n I_{{X_i ∈ A_n(x)}} |Y_i|^{3/2} / Σ_{i=1}^n I_{{X_i ∈ A_n(x)}} )^2,

the right hand side of which is the square of the regression estimate where Y is replaced by |Y|^{3/2}. Thus, (4) together with E{|Y|^3} < ∞ implies that

    ∫ ( Σ_{i=1}^n I_{{X_i ∈ A_n(x)}} |Y_i|^{3/2} / Σ_{i=1}^n I_{{X_i ∈ A_n(x)}} )^2 µ(dx) → E{ E{|Y|^{3/2} | X}^2 } ≤ C^2,

in probability. These limit relations imply (11).

Proof of (12). Assuming that the support S of µ is bounded, let l_n be such that

    S ⊆ ∪_{j=1}^{l_n} A_{n,j}.

Also we re-index the partition so that µ(A_{n,j}) ≥ µ(A_{n,j+1}), with µ(A_{n,j}) > 0 for j ≤ l_n and µ(A_{n,j}) = 0 otherwise. Then,

    S_n = Σ_{j=1}^{l_n} ν'_n(A_{n,j}) ν_n(A_{n,j}) / µ_n(A_{n,j}).    (14)

The condition n h_n^d → ∞ implies that

    l_n ≤ c h_n^{-d}, hence l_n / n → 0.

Because of (14) we have that

    V_n = √n Σ_{j=1}^{l_n} ν(A_{n,j}) ( ν_n(A_{n,j}) / µ_n(A_{n,j}) - E{ ν_n(A_{n,j}) / µ_n(A_{n,j}) } ),

where

    ν(A) = E{ν_n(A)} = ∫_A m(x) µ(dx).

Observe that we have to show the asymptotic normality for a finite sum of dependent random variables. In order to prove (12), we follow the lines of the proof in Beirlant and Györfi (1998) and use a Poissonization argument. With this aim we introduce a modification M_n of V_n such that Δ_n := V_n - M_n → 0, the proof of which follows, starting from (23). Now we proceed arguing for M_n. Introduce the notation N_n for a Poisson(n) random variable independent of (X_1, Y_1), (X_2, Y_2), .... Moreover put

    ν̃_n(A) = (1/n) Σ_{i=1}^{N_n} I_{{X_i ∈ A}} Y_i and µ̃_n(A) = (1/n) Σ_{i=1}^{N_n} I_{{X_i ∈ A}}.

The key result in this step is the following property:

Proposition 2 (Beirlant and Mason (1995), Beirlant et al. (1994).) Put

    M_n = √n Σ_{j=1}^{l_n} ν(A_{n,j}) ( ν_n(A_{n,j}) / µ_n(A_{n,j}) - E{ ν̃_n(A_{n,j}) / µ̃_n(A_{n,j}) } )    (15)

and

    M̃_n = √n Σ_{j=1}^{l_n} ν(A_{n,j}) ( ν̃_n(A_{n,j}) / µ̃_n(A_{n,j}) - E{ ν̃_n(A_{n,j}) / µ̃_n(A_{n,j}) } ).    (16)

Assume that

    Φ_n(t, v) = E{ exp( it M̃_n + iv (N_n - n)/√n ) } → e^{-(t^2 ρ^2 + v^2)/2}

for a constant ρ > 0, where i = √-1. Then

    M_n / ρ →_D N(0, 1).

Put

    T_n = t M̃_n + v (N_n - n)/√n,

for which a central limit result is to hold:

    T_n →_D N(0, t^2 ρ^2 + v^2)    (17)

as n → ∞. Remark that

    Var(T_n) = t^2 Var(M̃_n) + 2tv E{ M̃_n (N_n - n)/√n } + v^2.

For a cell A = A_{n,j} from the partition with µ(A) > 0, let Y(A) be a random variable such that

    P{Y(A) ∈ B} = P{Y ∈ B | X ∈ A},

where B is an arbitrary Borel set. Introduce the notations

    q_{n,k} = P{ n µ_n(A) = k } = (n choose k) µ(A)^k (1 - µ(A))^{n-k}

and

    q̃_{n,k} = P{ n µ̃_n(A) = k } = (n µ(A))^k e^{-n µ(A)} / k!.

Concerning the expectation, with (Y_1(A), Y_2(A), ...) an i.i.d. sequence of random variables distributed as Y(A), we find that

    E{ ν̃_n(A) / µ̃_n(A) } = Σ_{k=0}^∞ E{ ν̃_n(A) / µ̃_n(A) | n µ̃_n(A) = k } P{ n µ̃_n(A) = k }
        = Σ_{k=1}^∞ E{ (1/k) Σ_{i=1}^k Y_i(A) } q̃_{n,k}
        = E{Y_1(A)} (1 - q̃_{n,0})
        = ( ν(A) / µ(A) ) (1 - q̃_{n,0}),    (18)

further, by (24),

    E{ ν_n(A) / µ_n(A) } = n E{ I_{{X_1 ∈ A}} Y_1 / (1 + Σ_{i=2}^n I_{{X_i ∈ A}}) }
        = n µ(A) E{Y_1(A)} E{ 1 / (1 + B(n-1, µ(A))) }
        = ( ν(A) / µ(A) ) (1 - (1 - µ(A))^n).    (19)

Moreover,

    E{ ν̃_n(A)^2 / µ̃_n(A)^2 } = Σ_{k=1}^∞ E{ ν̃_n(A)^2 / µ̃_n(A)^2 | n µ̃_n(A) = k } P{ n µ̃_n(A) = k }
        = Σ_{k=1}^∞ E{ ( (1/k) Σ_{i=1}^k Y_i(A) )^2 } q̃_{n,k}
        = Σ_{k=1}^∞ (1/k^2) ( k E{Y_1(A)^2} + k(k-1) E{Y_1(A)}^2 ) q̃_{n,k}
        = Var(Y_1(A)) Σ_{k=1}^∞ (1/k) q̃_{n,k} + E{Y_1(A)}^2 (1 - q̃_{n,0}),

and, using 1/k ≤ 1/(k+1) + 3/((k+1)(k+2)) together with q̃_{n,k}/(k+1) = q̃_{n,k+1}/(n µ(A)) and q̃_{n,k}/((k+1)(k+2)) = q̃_{n,k+2}/(n µ(A))^2,

    Σ_{k=1}^∞ (1/k) q̃_{n,k} ≤ ( 1 / (n µ(A)) ) (1 - q̃_{n,0}) + 3 / (n µ(A))^2.

The independence of the Poisson masses over different cells leads to

    Var(M̃_n) = n Σ_{j=1}^{l_n} ν(A_{n,j})^2 Var( ν̃_n(A_{n,j}) / µ̃_n(A_{n,j}) ),

with, by the above expressions,

    Var( ν̃_n(A_{n,j}) / µ̃_n(A_{n,j}) )
        ≤ Var(Y_1(A_{n,j})) ( (1 - e^{-n µ(A_{n,j})}) / (n µ(A_{n,j})) + 3 / (n µ(A_{n,j}))^2 )
          + E{Y_1(A_{n,j})}^2 (1 - e^{-n µ(A_{n,j})}) - ( ν(A_{n,j}) / µ(A_{n,j}) )^2 (1 - e^{-n µ(A_{n,j})})^2,

such that the bounding error in these inequalities is of order O(l_n / n). Now (4), together with the boundedness of M_2 and m, implies that

    Σ_{j=1}^{l_n} ( ν(A_{n,j})^2 / µ(A_{n,j})^2 ) Var(Y_1(A_{n,j})) µ(A_{n,j})
        = ∫ [ ( ∫_{A_n(x)} M_2(z) µ(dz) / µ(A_n(x)) ) ( ∫_{A_n(x)} m(z) µ(dz) / µ(A_n(x)) )^2
              - ( ∫_{A_n(x)} m(z) µ(dz) / µ(A_n(x)) )^4 ] µ(dx)
        = σ_2^2 + o(1),

where σ_2^2 is defined by (10). Moreover,

    n Σ_{j=1}^{l_n} 3 Var(Y_1(A_{n,j})) ν(A_{n,j})^2 / (n µ(A_{n,j}))^2 ≤ 3 C^{4/3} l_n / n → 0.

Then

    n Σ_{j=1}^{l_n} ν(A_{n,j})^2 E{Y_1(A_{n,j})}^2 e^{-n µ(A_{n,j})}
        = n Σ_{j=1}^{l_n} ( ν(A_{n,j})^2 / µ(A_{n,j})^2 ) E{Y_1(A_{n,j})}^2 µ(A_{n,j})^2 e^{-n µ(A_{n,j})}
        ≤ C^{4/3} (1/n) Σ_{j=1}^{l_n} (n µ(A_{n,j}))^2 e^{-n µ(A_{n,j})}
        ≤ C^{4/3} ( max_{z>0} z^2 e^{-z} ) l_n / n → 0.

So we proved that

    Var(M̃_n) → σ_2^2.

To complete the asymptotics for Var(T_n), it remains to show that

    E{ M̃_n (N_n - n)/√n } → 0

as n → ∞. Because of

    (N_n - n)/√n = √n Σ_{j=1}^{l_n} ( µ̃_n(A_{n,j}) - µ(A_{n,j}) ),

we have that

    E{ M̃_n (N_n - n)/√n }
        = n Σ_{j=1}^{l_n} ν(A_{n,j}) E{ ( ν̃_n(A_{n,j}) / µ̃_n(A_{n,j}) - E{ ν̃_n(A_{n,j}) / µ̃_n(A_{n,j}) } )
                                        ( µ̃_n(A_{n,j}) - µ(A_{n,j}) ) }
        = n Σ_{j=1}^{l_n} ν(A_{n,j}) ( E{ ν̃_n(A_{n,j}) } - µ(A_{n,j}) E{ ν̃_n(A_{n,j}) / µ̃_n(A_{n,j}) } )
        = n Σ_{j=1}^{l_n} ν(A_{n,j})^2 e^{-n µ(A_{n,j})}
        ≤ C^{2/3} ( max_{z>0} z^2 e^{-z} ) l_n / n → 0.

To finish the proof of (17) by Lyapunov's central limit theorem, it suffices to prove that

    n^{3/2} Σ_{j=1}^{l_n} E{ | t ν(A_{n,j}) ( ν̃_n(A_{n,j}) / µ̃_n(A_{n,j}) - E{ ν̃_n(A_{n,j}) / µ̃_n(A_{n,j}) } )
                              + v ( µ̃_n(A_{n,j}) - µ(A_{n,j}) ) |^3 } → 0

or, by invoking the c_3 inequality |a + b|^3 ≤ 4 (|a|^3 + |b|^3), that

    n^{3/2} Σ_{j=1}^{l_n} E{ | ν̃_n(A_{n,j}) / µ̃_n(A_{n,j}) - E{ ν̃_n(A_{n,j}) / µ̃_n(A_{n,j}) } |^3 } |ν(A_{n,j})|^3 → 0    (20)

and

    n^{3/2} Σ_{j=1}^{l_n} E{ | µ̃_n(A_{n,j}) - µ(A_{n,j}) |^3 } → 0.    (21)

In view of (20), because of (13) it suffices to prove

    D_n := n^{3/2} Σ_{j=1}^{l_n} E{ | ν̃_n(A_{n,j}) / µ̃_n(A_{n,j}) - E{ ν̃_n(A_{n,j}) / µ̃_n(A_{n,j}) } |^3 } µ(A_{n,j})^3 → 0.    (22)

For a cell A, (18) implies that

    E{ | ν̃_n(A) / µ̃_n(A) - E{ ν̃_n(A) / µ̃_n(A) } |^3 }
        ≤ 4 E{ | ν̃_n(A) / µ̃_n(A) - ν(A) / µ(A) |^3 I_{{µ̃_n(A) > 0}} }
          + 4 E{ | ( ν(A) / µ(A) ) ( I_{{µ̃_n(A) > 0}} - (1 - q̃_{n,0}) ) |^3 }.

On the one hand, (18), (13) and (25) imply that, for a constant K,

    E{ | ν̃_n(A) / µ̃_n(A) - ν(A) / µ(A) |^3 I_{{µ̃_n(A) > 0}} }
        = Σ_{k=1}^∞ E{ | (1/k) Σ_{i=1}^k ( Y_i(A) - E{Y_i(A)} ) |^3 } q̃_{n,k}
        ≤ K Σ_{k=1}^∞ k^{-3/2} q̃_{n,k}
        ≤ c_1 (n µ(A))^{-3/2},

where we applied the Marcinkiewicz-Zygmund (1937) inequality for absolute central moments of sums of i.i.d. random variables. On the other hand,

    E{ | ( ν(A) / µ(A) ) ( I_{{µ̃_n(A) > 0}} - (1 - q̃_{n,0}) ) |^3 } ≤ C q̃_{n,0}.

Therefore

    D_n ≤ 4 n^{3/2} Σ_{j=1}^{l_n} ( c_1 (n µ(A_{n,j}))^{-3/2} + C e^{-n µ(A_{n,j})} ) µ(A_{n,j})^3
        ≤ c_2 Σ_{j=1}^{l_n} µ(A_{n,j})^{3/2} + c_2 Σ_{j=1}^{l_n} (n µ(A_{n,j}))^{3/2} e^{-n µ(A_{n,j})} µ(A_{n,j})^{3/2}
        ≤ c_2 ( 1 + max_{z>0} z^{3/2} e^{-z} ) Σ_{j=1}^{l_n} µ(A_{n,j})^{3/2}
        = c_3 ∫ µ(A_n(x))^{1/2} µ(dx)
        → 0,

where we used the assumption that µ is non-atomic. Thus, (20) is proved.

The proof of (21) is easier. Notice that (21) means

    F_n := n^{3/2} Σ_{j=1}^{l_n} E{ | (1/n) Σ_{i=1}^{N_n} I_{{X_i ∈ A_{n,j}}} - µ(A_{n,j}) |^3 } → 0.

One has

    E{ | Σ_{i=1}^{N_n} I_{{X_i ∈ A_{n,j}}} - n µ(A_{n,j}) |^3 }
        ≤ 4 E{ | Σ_{i=1}^{N_n} ( I_{{X_i ∈ A_{n,j}}} - µ(A_{n,j}) ) |^3 } + 4 E{ |N_n - n|^3 } µ(A_{n,j})^3
        ≤ 4 ( c_4 Σ_{k=1}^∞ k^{3/2} µ(A_{n,j})^{3/2} n^k e^{-n} / k! + E{ |N_n - n|^3 } µ(A_{n,j})^3 )
        ≤ c_5 ( n^{3/2} µ(A_{n,j})^{3/2} + n^{3/2} µ(A_{n,j})^3 ).

Therefore

    F_n ≤ 2 c_5 Σ_{j=1}^{l_n} µ(A_{n,j})^{3/2} → 0,

so (21) is proved, too.

The remaining step in the proof of (12) is to show that

    Δ_n := V_n - M_n = √n Σ_{j=1}^{l_n} ν(A_{n,j}) ( E{ ν̃_n(A_{n,j}) / µ̃_n(A_{n,j}) } - E{ ν_n(A_{n,j}) / µ_n(A_{n,j}) } ) → 0.    (23)

By (18) and (19) we have that

    |Δ_n| = √n | Σ_{j=1}^{l_n} ( ν(A_{n,j})^2 / µ(A_{n,j}) ) ( (1 - µ(A_{n,j}))^n - e^{-n µ(A_{n,j})} ) |
        ≤ C^{2/3} √n Σ_{j=1}^{l_n} ( e^{-n µ(A_{n,j})} - (1 - µ(A_{n,j}))^n ) µ(A_{n,j}).

For 0 ≤ z ≤ 1, using the elementary inequalities

    1 - z ≤ e^{-z} ≤ 1 - z + z^2

we have that

    e^{-n z} - (1 - z)^n = ( e^{-z} - (1 - z) ) Σ_{k=0}^{n-1} e^{-k z} (1 - z)^{n-1-k} ≤ n z^2 e^{-(n-1) z},

thus we get that

    C^{2/3} √n Σ_{j=1}^{l_n} ( e^{-n µ(A_{n,j})} - (1 - µ(A_{n,j}))^n ) µ(A_{n,j})
        ≤ C^{2/3} √n Σ_{j=1}^{l_n} n µ(A_{n,j})^3 e^{-(n-1) µ(A_{n,j})}
        = ( C^{2/3} / √n ) Σ_{j=1}^{l_n} (n µ(A_{n,j}))^2 e^{-n µ(A_{n,j})} e^{µ(A_{n,j})} µ(A_{n,j})
        ≤ ( C^{2/3} / √n ) ( max_{z≥0} z^2 e^{-z} ) e
        → 0.

This ends the proof of (12), and so the proof of Theorem 1 is complete.

Next we give two lemmas, which are used above.

Lemma 3 If B(n, p) is a binomial random variable with parameters (n, p), then

    E{ 1 / (1 + B(n, p)) } = ( 1 - (1 - p)^{n+1} ) / ( (n + 1) p ).    (24)

Lemma 4 If Po(λ) is a Poisson random variable with parameter λ, then

    E{ Po(λ)^{-3} I_{{Po(λ) > 0}} } ≤ 24 / λ^3.    (25)

References

J. Beirlant and L. Györfi. On the asymptotic L_2-error in partitioning regression estimation. Journal of Statistical Planning and Inference, 71:93-107, 1998.

J. Beirlant and D. Mason. On the asymptotic normality of L_p-norms of empirical functionals. Mathematical Methods of Statistics, 4:1-19, 1995.

J. Beirlant, L. Györfi, and G. Lugosi. On the asymptotic normality of the L_1- and L_2-errors in histogram density estimation. Canadian Journal of Statistics, 22:309-318, 1994.

K. De Brabanter, P. G. Ferrario, and L. Györfi. Detecting ineffective features for nonparametric regression. In J. A. K. Suykens, M. Signoretto, and A. Argyriou, editors, Regularization, Optimization, Kernels, and Support Vector Machines, pages 177-194. Chapman & Hall/CRC Machine Learning and Pattern Recognition Series, 2014.

L. Devroye, D. Schäfer, L. Györfi, and H. Walk. The estimation problem of minimum mean squared error. Statistics and Decisions, 21:15-28, 2003.

L. Devroye, P. Ferrario, L. Györfi, and H. Walk. Strong universal consistent estimate of the minimum mean squared error. In B. Schölkopf, Z. Luo, and V. Vovk, editors, Empirical Inference - Festschrift in Honor of Vladimir N. Vapnik, pages 143-160. Springer, Heidelberg, 2013.

D. Evans and A. J. Jones. Non-parametric estimation of residual moments and covariance. Proceedings of the Royal Society, A 464:2831-2846, 2008.

P. G. Ferrario and H. Walk. Nonparametric partitioning estimation of residual and local variance based on first and second nearest neighbors. Journal of Nonparametric Statistics, 24:1019-1039, 2012.

L. Györfi, M. Kohler, A. Krzyżak, and H. Walk. A Distribution-Free Theory of Nonparametric Regression. Springer-Verlag, New York, 2002.

E. Liitiäinen, F. Corona, and A. Lendasse. On nonparametric residual variance estimation. Neural Processing Letters, 28:155-167, 2008.

E. Liitiäinen, M. Verleysen, F. Corona, and A. Lendasse. Residual variance estimation in machine learning. Neurocomputing, 72:3692-3703, 2009.

E. Liitiäinen, F. Corona, and A. Lendasse. Residual variance estimation using a nearest neighbor statistic. Journal of Multivariate Analysis, 101:811-823, 2010.

J. Marcinkiewicz and A. Zygmund. Sur les fonctions indépendantes. Fundamenta Mathematicae, 29:60-90, 1937.

V. V. Petrov. Sums of Independent Random Variables. Springer-Verlag, Berlin, 1975.