UNIFORM IN BANDWIDTH CONSISTENCY OF KERNEL REGRESSION ESTIMATORS AT A FIXED POINT

Similar documents
Uniform in bandwidth consistency of kernel regression estimators at a fixed point

Weighted uniform consistency of kernel density estimators with general bandwidth sequences

Weighted uniform consistency of kernel density estimators with general bandwidth sequences

A generalization of Strassen s functional LIL

arxiv:math/ v2 [math.pr] 16 Mar 2007

SOME CONVERSE LIMIT THEOREMS FOR EXCHANGEABLE BOOTSTRAPS

Gärtner-Ellis Theorem and applications.

Introduction to Empirical Processes and Semiparametric Inference Lecture 12: Glivenko-Cantelli and Donsker Results

Estimation of the Bivariate and Marginal Distributions with Censored Data

ROBUST - September 10-14, 2012

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University

UNIFORM IN BANDWIDTH CONSISTENCY OF KERNEL-TYPE FUNCTION ESTIMATORS

DISCUSSION: COVERAGE OF BAYESIAN CREDIBLE SETS. By Subhashis Ghosal North Carolina State University

Nonparametric regression with martingale increment errors

Introduction to Empirical Processes and Semiparametric Inference Lecture 08: Stochastic Convergence

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued

New Versions of Some Classical Stochastic Inequalities

A Law of the Iterated Logarithm. for Grenander s Estimator

CHAPTER 6. Differentiation

Real Analysis Problems

The Codimension of the Zeros of a Stable Process in Random Scenery

Empirical Processes: General Weak Convergence Theory

LECTURE 15: COMPLETENESS AND CONVEXITY

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R.

X n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2)

is a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications.

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9

Density estimators for the convolution of discrete and continuous random variables

Lecture Notes on Metric Spaces

Spectral Gap and Concentration for Some Spherically Symmetric Probability Measures

LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS. S. G. Bobkov and F. L. Nazarov. September 25, 2011

MATHS 730 FC Lecture Notes March 5, Introduction

Random Bernstein-Markov factors

Problem set 1, Real Analysis I, Spring, 2015.

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

Introductory Analysis I Fall 2014 Homework #9 Due: Wednesday, November 19

Pointwise convergence rates and central limit theorems for kernel density estimators in linear processes

Analysis Qualifying Exam

Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued

Tools from Lebesgue integration

On variable bandwidth kernel density estimation

Statistics 581, Problem Set 1 Solutions Wellner; 10/3/ (a) The case r = 1 of Chebychev s Inequality is known as Markov s Inequality

d(x n, x) d(x n, x nk ) + d(x nk, x) where we chose any fixed k > N

The Heine-Borel and Arzela-Ascoli Theorems

Verifying Regularity Conditions for Logit-Normal GLMM

On the Set of Limit Points of Normed Sums of Geometrically Weighted I.I.D. Bounded Random Variables

A LOCALIZATION PROPERTY AT THE BOUNDARY FOR MONGE-AMPERE EQUATION

Probability and Measure

Yunhi Cho and Young-One Kim

Your first day at work MATH 806 (Fall 2015)

Heat kernels of some Schrödinger operators

7 Convergence in R d and in Metric Spaces

Nonparametric estimation of log-concave densities

Measurable functions are approximately nice, even if look terrible.

Math 328 Course Notes

IEOR 6711: Stochastic Models I Fall 2013, Professor Whitt Lecture Notes, Thursday, September 5 Modes of Convergence

Real Analysis Notes. Thomas Goller

Bayesian Regularization

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product

Notes for Functional Analysis

Optimal global rates of convergence for interpolation problems with random design

STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song

Lecture 21: Convergence of transformations and generating a random variable

Weak and strong moments of l r -norms of log-concave vectors

Statistics 581 Revision of Section 4.4: Consistency of Maximum Likelihood Estimates Wellner; 11/30/2001

Probability and Measure

THEOREMS, ETC., FOR MATH 515

Summer Jump-Start Program for Analysis, 2012 Song-Ying Li

Solutions: Problem Set 4 Math 201B, Winter 2007

FUNCTIONAL LAWS OF THE ITERATED LOGARITHM FOR THE INCREMENTS OF THE COMPOUND EMPIRICAL PROCESS 1 2. Abstract

SOLUTIONS TO SOME PROBLEMS

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

Weighted Sums of Orthogonal Polynomials Related to Birth-Death Processes with Killing

1 Sequences of events and their limits

Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach

Bahadur representations for bootstrap quantiles 1

Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model.

3 Integration and Expectation

Convergence rates in weighted L 1 spaces of kernel density estimators for linear processes

STAT 7032 Probability Spring Wlodek Bryc

Math 104: Homework 7 solutions

1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989),

Lecture 11. Probability Theory: an Overveiw

Refining the Central Limit Theorem Approximation via Extreme Value Theory

Czechoslovak Mathematical Journal

McGill University Math 354: Honors Analysis 3

Estimates for probabilities of independent events and infinite series

7 Complete metric spaces and function spaces

Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations

MAJORIZING MEASURES WITHOUT MEASURES. By Michel Talagrand URA 754 AU CNRS

AN INEQUALITY FOR TAIL PROBABILITIES OF MARTINGALES WITH BOUNDED DIFFERENCES

Lecture 6 Basic Probability

Real Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi

L p Spaces and Convexity

Proofs for Large Sample Properties of Generalized Method of Moments Estimators

APPLICATIONS OF THE KANTOROVICH-RUBINSTEIN MAXIMUM PRINCIPLE IN THE THEORY OF MARKOV OPERATORS

Efficiency of Profile/Partial Likelihood in the Cox Model

Continuous Functions on Metric Spaces

Measure and Integration: Solutions of CW2

Principles of Real Analysis I Fall VII. Sequences of Functions

Transcription:

UNIFORM IN BANDWIDTH CONSISTENCY OF KERNEL RERESSION ESTIMATORS AT A FIXED POINT JULIA DONY AND UWE EINMAHL Abstract. We consider pointwise consistency properties of kernel regression function type estimators where the bandwidth sequence in not necessarily deterministic. In some recent papers uniform convergence rates over compact sets have been derived for such estimators via empirical process theory. We now show that it is possible to get optimal results in the pointwise case as well. The main new tool for the present work is a general moment bound for empirical processes which may be of independent interest. 1. Introduction Let X 1, Y 1, X 2, Y 2,... be independent random variables in R d R with joint density function f XY, and take t R d fixed. Let further F be a class of measurable functions ϕ : R R with Eϕ 2 Y <, and consider the regression function m ϕ t = E[ϕY X = t]. For any function ϕ F and a bandwidth 0 < h < 1, define the kernel type estimator ˆϕ n,h t := 1 nh d n t Xi ϕy i K, h where K is a kernel function, i.e. K is Borel measurable and Kxdx = 1. Such kernel estimators have been studied for many years, resulting in a considerable amount of literature and references. By choosing ϕ 1, one obtains an estimator for f X t, the marginal density of X in t R d. This kernel density estimator denoted by ˆf n,h t forms an important special case of the class of kernel estimators ˆϕ n,h t. It is well known that for suitable deterministic bandwidth sequences h n going to zero at an appropriate rate and assuming that the density f X is continuous, one obtains a strongly consistent estimator ˆf n,hn of f X, i.e. one has with probability 1 that ˆf n,hn t f X t for all t R d fixed. For proving such consistency results, one usually writes the difference ˆf n,hn t f X t as the sum of a probabilistic term ˆf n,hn t E ˆf n,hn t, and a deterministic term E ˆf n,hn t f X t, the so called bias. The order of the bias depends on smoothness properties of f X and of the kernel K, whereas the first random term can be studied via techniques based on empirical processes. Hall [14] proved an LIL type result for the probabilistic term corresponding to the kernel density estimator if d = 1. The d dimensional version of this result implies in particular that under suitable conditions on the The research of J. D. is financed by an FWO-Vlaanderen Postdoctoral Fellowship. The research of U. E. was partially supported by an FWO Vlaanderen grant. 1

2 J. DONY AND U. EINMAHL bandwidth sequence h n and the kernel function K, one has 1.1 ˆfn,hn t E ˆf log log n n,hn t = O nh d a.s. n Since this is an LIL type result with corresponding lower bounds, this gives us the precise convergence rate for the pointwise convergence of the probabilistic term. Deheuvels and Mason [2] later showed that this LIL holds whenever the bandwidth sequence satisfies 1.2 nh d n/ log log n, as n, which is the optimal condition under which 1.1 can hold. The work of [2] is based on a notion of a local empirical process indexed by sets. This was further generalized by Einmahl and Mason [7, 8] who looked at local empirical processes indexed by functions and who established strong invariance principles for such processes. They inferred LIL type results from the strong invariance principles and this not only for density estimators, but also for the Nadaraya Watson estimator for the regression function and conditional empirical processes. Recall that the Nadaraya Watson estimator ˆm n,h,ϕ t for m ϕ t = E[ϕY X = t], where ϕ : R R is Borel measurable, is defined as ˆm n,h,ϕ t := ˆϕ n,h t/ ˆf n,h t. There is also an LIL for this estimator see [12] for a first result in this direction which implies that 1.3 ˆm n,hn,ϕt Ê ˆm log log n n,h n,ϕt = O nh d a.s., n where Ê ˆm n,h,ϕt = E ˆϕ n,h t/e ˆf n,h t is a convenient centering term. If ϕ is a bounded function this holds again under the above condition 1.2. If E[ ϕy p X = x] is uniformly bounded in a neighborhood of t, where p > 2, one needs that h d n is at least of order On 1 log n q for some q > 2/p 2 see [7, 8]. From our main result it will actually follow that in this last case q 2/p 2 is already sufficient. Some related results have also been obtained for uniform convergence of kernel type estimators on compact subsets or even on R d. In this case one typically gets a slightly worse convergence rate of the probabilistic term of order O log n/nh d n and one needs that the bandwidth sequence satisfies nh d n/ log n in the bounded case, which is more restrictive than 1.2. For more details see [13, 9, 11] and the references in these papers. In practice, one has to choose a bandwidth sequence h n in such a way that the bias and the probabilistic part are reasonably balanced. The optimal choice for h n then will often depend on some unknown parameter of the distribution which one has to estimate. This can lead to bandwidth sequences depending on the data and the location t. This means that the above results do not apply if one is interested in estimators with such general bandwidth sequences. In [10] uniform in h versions

POINTWISE UNIFORM IN BANDWIDTH CONSISTENCY 3 of the results in [9, 11] were obtained. This makes it possible to establish consistency of kernel type estimators when the bandwidth h is allowed to range in an interval which may increase or decrease in length with the sample size. These kinds of results are immediately applicable to proving uniform consistency of kernel type estimators when the bandwidth h is a function of the data X 1,..., X n or the location t R d. A typical result of [10] is the following asymptotic result concerning the Nadaraya Watson estimator which holds under certain conditions on the distribution of X, Y and the kernel K. 1.4 lim sup n c log n n sup γ/d h<1 sup sup t I ϕ F nhd ˆm n,h,ϕ t Ê ˆm n,h,ϕt log h log log n <, a.s., where γ = 1 or γ = 1 2/p depending on whether F is a bounded class of functions, or if it has an envelope function with a finite p th moment p > 2. Here, I is a compact rectangle of R d on which f X has to be bounded and strictly positive. We call this an asymptotic uniform in bandwidth AUiB boundedness result. It implies that if one chooses the bandwidth depending on the data and/or the location as is usually done in practice, one keeps the same order of convergence as the one valid for a deterministic bandwidth sequence, given, for instance, in [9]. The purpose of the present paper is to establish a similar AUiB boundedness result in the pointwise setting, where in view of the aforementioned results, one can hope for a slightly smaller order and bigger intervals from which one can choose the bandwidth sequence. Pointwise AUiB boundedness results can be useful in various contexts. In particular, they can be used for deriving consistency results for generalized Hill type estimators introduced in [1]. See [5] and Chapter 6 in [3]. 2. Main result Before stating our main result, we have to impose several assumptions on the kernel function, the bandwidth and the class F. These assumptions are mainly technical, and will be listed below. We first recall some terminology. Let X, A be a measurable space. We say that a class of functions g : X R is pointwise measurable if there exists a countable subclass 0 of such that we can find for any function g a sequence of functions g m 0 for which g m z gz, z X. This property is usually assumed to avoid measurability problems, and is discussed in [17]. Next, we call a class of functions with envelope function : X [0, ] a VC class, if N ɛ, Cɛ ν for some constants C, ν > 0, that are called the characteristics of the class. As usual we define N ɛ, = sup N ɛ Q 2,, d Q, Q where the supremum is taken over all probability measures Q on X, A with Q 2 <. d Q is the L 2 Q metric and N ɛ,, d is the minimal number of d balls with radius ɛ which are needed to cover the function class. An envelope

4 J. DONY AND U. EINMAHL function is any A measurable function : X [0, ] such that sup gx x, x X. g We are now ready to state the conditions that need to be imposed for our results. Let F be a class of functions ϕ : R R satisfiying the following three conditions : F.i F is a pointwise measurable class, F.ii F has a measurable envelope function F y sup ϕ F ϕy, y R, F.iii F is a VC class. Next, a kernel function K : R d R will be any measurable function satisfying K.i K = κ < and Kxdx = 1, K.ii K has a support contained in [ 1/2, 1/2] d, K.iii K := x Kγt x : γ > 0 is a pointwise measurable VC class of functions from R d to R. Conditions K.i and K.ii are easy to verify. Many kernels satisfy also condition K.iii. See for instance Remark 1 in [8] for some discussion. Our main result is then as follows. Proposition 2.1 Pointwise AUiB boundedness of kernel type estimators. Let F and K satisfy F and K and assume that the envelope function F of F satisfies for some 0 < ɛ < 1 one of the following conditions on J := t+[ ɛ, ɛ] d : A p > 2 : sup x J E[F p Y X = x] =: µ p <. B s > 0 : sup x J E [expsf Y X = x] <. Then if f X is bounded on J it follows for any c > 0 that 2.1 lim sup n sup sup a nhb 0 ϕ F nhd ˆϕ n,h t E ˆϕ n,h t log log n <, a.s., where 0 < b 0 < 2ɛ is a positive constant and a d n = cn 1 log n 2 p 2 or a d n = cn 1 log log n depending on whether condition A or condition B holds. Note that Proposition 2.1B is more general than the corresponding result in [10] where one has to assume that the function class F is bounded. The above Proposition 2.1 provides a result for all classes whose envelope function admits a finite moment generating function. This improvement is possible since in the present case we can use an exponential Bernstein type inequality in terms of a strong second moment of the envelope function. For establishing an AUiB boundedness result uniformly on compact rectangles of R d one would need this inequality in terms of the weak second moment. Such an improvement of the Bernstein inequality, however, seems to be only known in the bounded case. This means that, concerning uniform in bandwidth pointwise consistency, there is no distinction between the bounded case and the case where the moment generating function of F is finite. Also in the case where one uses deterministic bandwidth sequences, this result seems to be new. We conclude this section with formulating two corollaries of our proposition. We first look at the kernel estimator for the density f X. If the kernel K satistisfies the

POINTWISE UNIFORM IN BANDWIDTH CONSISTENCY 5 above conditions, we can conclude for any sequence a n such that na d n/ log log n and a n b n, where b n 0, sup ˆf n,h t E ˆf log log n n,h t = O a nhb n na d a.s., n provided that f X is bounded on J. Assume now that this density is positive and continuous on J, and F is a class of functions satisfying conditions F and A. If lim inf n na d n/log n 2/p 2 > 0, and a n b n 0, we obtain by a standard argument that sup sup ˆm n,h,ϕ t Ê ˆm n,h,ϕt = O a nhb n ϕ F log log n na d n As in [10], the proof of Proposition 2.1 is based on the theory of empirical processes. We use again exponential deviation inequalities in combination with certain moment inequalities. The necessary exponential inequalities are available in the literature, but we need a new moment inequality. This will be stated and proved in Section 3. In Section 4 we shall finally prove Proposition 2.1. 3. Moment inequalities Let X, X 1,..., X n be i.i.d. random variables taking values in a measurable space X, A and let A A be a fixed set. It is our goal to derive a moment inequality for E αn g 1I A where α n is the empirical process based on X 1,..., X n and is a pointwise measurable class of functions g : X R for which Eg 2 X exists. Let : X [0, ] be an envelope function for the function class and assume that E 2 X <. We further assume that has the following property: For any sequence of i.i.d. X valued random variables Z 1, Z 2,... it holds that k E gzi EgZ 1 C 1 k Z1 2, 1 k n, where C 1 > 1 is a constant depending on only. From Theorem 3.2 below it will follow that VC classes always have this property. But we first prove our moment inequality. Theorem 3.1. Let be a pointwise measurable function class satisfying the above assumptions. Then we have for any A A, E αn g 1I A 2C 1 X1I A X 2. Proof. Similarly as in [7, 8] we shall use a special representation of the random variables X i, i 1. W.l.o.g. we assume that 0 < PA < 1. Consider independent random variables Y 1, Y 2,..., Y 1, Y 2,... such that for all B A and any i 1, PY i B = PX B X A and PY i B = PX B X A c. a.s.

6 J. DONY AND U. EINMAHL Let further ɛ 1, ɛ 2,... be independent BernoulliPX A variables, independent of the two other sequences, and set νn := n ɛ i. Finally, define for any i 1, Xi Yνi, if ɛ i = 1, = Y i νi, if ɛ i = 0. Then it is easy to see that this leads to a sequence of independent random variables with Xi d = X i, i 1. Consequently, it is sufficient to prove the moment bound for the empirical process αn based on the variables Xi, i 1. Moreover, it is readily seen that νn n gxi 1I A Xi = gy i, and also that E[gX 1I A X ] = EgY 1 PX A. Consequently, we have that E nαng 1I A νn = E gy i npx AEgY 1 νn E gyi EgY i νn + E EgY i npx AEgY 1 E νn α νn g + E νn npx A sup E gy 1, g where α n g denotes the empirical process based upon Y 1..., Y n. Recall that νn has Binomialn, PX A distribution so that Eνn = npx A, and thus E νn npx A Varνn 1/2 npx A. Recall also that is an envelope function of. Hence, we have that E nα ng 1I A E νn α νn g + npx AE 2 Y 1. We now look at the first term. Due to assumption and by independence of νn and the variables Y 1, Y 2,..., we can conclude that note that νn n, E νn α νn g n E k α k g Pνn = k k=1 n C 1 ke2 Y 1 Pνn = k k=1 = C 1 E2 Y 1 E[ν 1/2 n] C 1 npx AE2 Y 1, where we have used the trivial fact that E[ν 1/2 n] E[νn] 1/2 = npx A. Recalling that E 2 Y 1 = E[ 2 X1I A X]/PX A, we can conclude that E nα ng 1I A 2C 1 ne[2 X1I A X], proving that the moment bound holds, as claimed.

POINTWISE UNIFORM IN BANDWIDTH CONSISTENCY 7 The next result shows that condition is satisfied if we have a VC class. Theorem 3.2. Let be a pointwise measurable VC class of functions with envelope function and characteristics A, ν 1. If Z, Z 1, Z 2,... is a sequence of i.i.d. X valued random variables satisfying for some 0 < β <, E 2 Z β 2, then we have for some constant C depending on A, ν only that E n gz i EgZ C nβ 2, n 1. Proof. Let ε 1,..., ε n be i.i.d. Rademacher variables which are also independent of Z 1,..., Z n. Then we have, E n gz i EgZ 2E n ε i gz i, and it is sufficient to show that 3.1 E n ε i gz i C nβ 2, where C is a positive constant depending on A and ν only. From the Hoffmann Jørgensen inequality see Proposition 6.8. in [16] it follows that 3.2 E n where Observing that E [ max 1in Z i ] ε i gz i 6t 0 + 6E [ max 1in Z i ], n t 0 := inf P ε i gz i > t 1. t>0 24 E [ max 1in 2 Z i ] 1/2 we see that it is sufficient to show that 3.3 t 0 C nβ 2. Let µ be the distribution of the variable Z : Ω X and define n := n x X n : 2 x i 64nβ 2. Note that for any t > 0, n P ε i gz i > t = X n P µ n c n + P n =: α 1 + α 2. E [ n 2 Z i ] 1/2 nβ2, n ε i gx i > t µ n dx n ε i gx i > t µ n dx From Markov s inequality we obtain that α 1 = P n 2 Z i > 64nβ 2 1/64. To bound α 2, we use a well known inequality of Jain and Marcus [15] which is also stated as Corollary 2.2.8 in [17].

8 J. DONY AND U. EINMAHL We can conclude that for any x = x 1,..., x n X n and some absolute constant c 0 <, E n ε i gx i E n ε i g 0 x i + c0 n log N ɛ,, d 2,x dɛ, where g 0 is arbitrary and d 2 2,xg 1, g 2 := n 1 n g 1x i g 2 x i 2. Further, it is easy to infer that when x n, and for g 1, g 2, E n ε i g 0 x i n 1/2 g0x 2 i 8 nβ 2, d 2 2,xg 1, g 2 2 n n g1x 2 i + g2x 2 i 256β 2. Hence, if ɛ > 16β, one needs only one ball of d 2,x radius ɛ to cover the class. Therefore, N ɛ,, d 2,x = 1 whenever x n and ɛ > 16β. On the other hand, let Q n,x f := n 1 n fx i and note that Q n,x g 1 g 2 2 = d 2 2,xg 1, g 2. Then since Q n,x 2 256β 2 for x n, and recalling that N ɛ, = sup Q N ɛ Q 2,, d Q where d Q is the L 2 Q metric, the assumption that is a VC class gives us for any x n and whenever 0 < ɛ 16β, ɛ Qn,x 2 ɛ N ɛ,, d 2,x N,, d 2,x N 16β 16β, Aɛ ν 16β ν. Hence, we have for x n, log N ɛ,, d 2,x dɛ = 0 16β 0 16β 0 0 = νa 1/ν 16β log N ɛ,, d 2,x dɛ log A16β/ɛν dɛ A 1/ν 0 log 1 s ds, which can be bounded by 16c νa 1/ν β, with c := 1 0 log 1/sds <. Recall that A 1. Consequently, by Markov s inequality we have for any x n : P n ε i gx i > t t 1 8 nβ 2 + c 0 c νa 1/ν 16β = 64nβ 2 1 + 2c 0 c 1 /t, with c 1 = c νa 1/ν. Taking everything together, we obtain that n P ε i gz i > t 1 64nβ 64 + 2 1 + 2c 0 c 1. t To finish, recall from 3.3 that we need to find a range for t > 0 such that the above probability is bounded by 1/24. By solving the equation, we see that t should be such that 64nβ 2 1 + 2c 0 c 1 5t/3 2 6, or t 3 29 nβ 2 1 + 2c 0 c 1. 5

POINTWISE UNIFORM IN BANDWIDTH CONSISTENCY 9 Consequently, by setting C = 3 2 7 1 + 2c 0 c 1 and taking t C nβ 2, we have shown that P n ε igz i > t 1 24, proving the theorem through 3.3 and the Hoffmann Jørgensen inequality. Note that from our assumptions on F and the kernel K it follows that the subsequent class of functions on X = R d R t x 3.4 := x, y ϕyk : ϕ F, h > 0 is a VC class h with envelope function x, y = κf y. Use, for instance, Lemma A.1 in [9]. Consequently, Theorem 3.2 ensures the class to satisfy condition, and thereby all the conditions of Theorem 3.1. 4. Proof of Proposition 2.1 To start the proof of Proposition 2.1, we first show how the process in 2.1 can be expressed in terms of an empirical process indexed by a certain class of functions. To do so, consider the following classes of functions on X = R d R defined by n := x, y ϕyk t x and note that for any ϕ F and a n h b 0, ˆϕ n,h t E ˆϕ n,h t = =: h : ϕ F, a n h b 0, 1 [ n t Xi t X ] nh d ϕy i K neϕy K h h 1 nh d α ng ϕ,h, where g ϕ,h x, y := ϕykt x/h and α n g is the empirical process based upon the sample X 1, Y 1,..., X n, Y n. Further, set n k := 2 k, k 0 and define h d k,j := 2j a d n k. Then h k,0 = a nk and by setting Lk := maxj : h d k,j 2bd 0, it holds that h k,lk 1 b 0 < h k,lk so that [a nk, b 0 ] [h k,0, h k,lk ]. Further, consider for 1 j Lk the subclasses k,j := x, y ϕyk t x h : ϕ F, h k,j 1 h h k,j, and note that nk Lk j=1 k,j. Since a n is eventually non decreasing, we have for all n k 1 < n n k and for any ϕ F if k 1 is large enough, nhd ˆϕ n,h t E ˆϕ n,h t α n g nk sup sup a nhb 0 log log n a nk hb 0 hd log log n 4.1 max 1jLk 2 2 nα n g k,j. n k h d k,j log log n k Recall further that the class in 3.4 is a VC class with envelope x, y = κf y. This of course implies that also k,j is a VC class for this envelope function and with characteristics independent of k and j.

10 J. DONY AND U. EINMAHL In view of Theorem 3.2, the assumptions of Theorem 3.1 are satisfied and we have for any subset A of R d R, E αnk g 1I A k,j 2C X, Y 1I A X, Y 2, where C is a positive constant depending on the characteristics of only. Setting A k,j = t + [ h k,j /2, h k,j /2] d R, we can conclude that 4.2 E α nk g k,j = E α nk g 1I Ak,j k,j C E 2 k,jx, Y 1/2, where k,j x, y = κf y1ix t + [ h k,j /2, h k,j /2] d is another envelope function for k,j. 4.1. Proof under condition A. We use the empirical process version of a recent Fuk Nagaev type inequality in Banach spaces. See Theorem 3.1 in [6]. By a slight misuse of notation, we also write α n for the empirical process based on the sample Z 1,..., Z n in a general measurable space X, A. We shall apply this inequality on X = R d R and with Z i = X i, Y i, i 1. Fact 4.1 Fuk Nagaev type inequality. Let Z, Z 1,..., Z n be i.i.d. X valued random variables and consider a pointwise measurable class of functions g : X R with envelope function. Assume that for some p > 2, E p Z <. Then we have for 0 < η 1, δ > 0 and any t > 0, P max kα k g 1 + ηβ n + t 1kn t 2 exp 2 + δnσ 2 + nc 2 E p Z /t p, where σ 2 = sup g Eg 2 Z, β n = E nα n g and C 2 is a positive constant depending on η, δ and p. Let κ = sup x R d Kx and recall that by K.ii the support of K lies in [ 1/2, 1/2] d. Recall further that f X is bounded on J = t+[ ɛ, ɛ] d. Then for any g ϕ,h k,j with k large enough such that h k,j 2ɛ, it holds that [ Eg ϕ,h X, Y 2 = E ϕ 2 Y K 2 t X h ] h d κ 2 [ 12, 12 ]d E [ F 2 Y X = t uh ] f X t uhdu κ 2 f X J E [F p Y X = t + x] 2/p dx, [ h k,j 2, h k,j 2 ]d which on account of A implies that for k sufficiently large, In the same way, we obtain that sup Eg 2 X, Y h d k,jκ 2 f X J µ 2/p p =: σk,j. 2 g k,j E gx, Y q k,j E q k,j X, Y C qσ 2 k,j, 2 q p, where C q = κ q 2 µ q 2/p p. This gives in particular that E 2 k,j X, Y σ2 k,j and we can infer from 4.2, 4.3 E n k α nk g k,j C n k σ 2 k,j.

POINTWISE UNIFORM IN BANDWIDTH CONSISTENCY 11 Applying the Fuk Nagaev type inequality see Fact 4.1 with η = δ = 1, we find that for all x > 0, P max nα n g k,j x + 2C n k σ 2 n k 1 <nn k,j k exp x2 3n k σk,j 2 + C 2 x p n k E p k,j X, Y x 2 exp 3c 1 n k h d + c 2 x p n k h d k,j, k,j where c 1 = κ 2 f X J µ 2/p p and c 2 = C 2 κ p µ p f X J. Taking x = ρ n k h d k,j log log n k for ρ > 0 and recalling the condition on a n, we get for large enough k, P max nα n g k,j ρ n k h d n k 1 <nn k,j log log n k k exp ρ2 log log n k + c 3ρn k h d k,j 1 p/2 4c 1 log log n k p/2 log n k ρ 2 c 4c 4 ρ2 j p 2 1 1 + log n k log log n k. p/2 Finally, it is not too difficult to see that Lk 2 log n k for any k 1 since ɛ < 1. Therefore, recalling the empirical process representation in 4.1, this implies immediately that for k large enough, nhd ˆϕ n,h t E ˆϕ n,h t P max sup > 2 2ρ n k 1 <nn k log log n Lk j=1 sup a nhb 0 ϕ F P max n k 1 <nn k nα n g k,j 2log n k 1 ρ 2 4c 1 + ρ n k h d k,j log log n k p 2 1Lk c 4 ρ 1 2 log n k log log n k p/2 1 2 p 2 1 2log n k 1 ρ 2 c 4c 4 ρ1 + ξ 1 + log n k log log n k, p/2 for some ξ > 0. Finally, note that for any δ > 0, klog k 1+δ 1 < k=1 so that by taking ρ large enough so that ρ > 8c 1, the previous calculations yield 2.1 via the Borel Cantelli lemma by summing in k. 4.2. Proof under condition B. To start with the proof of Proposition 2.1 in this case, set µ p := sup E [F p Y X = x], p 1, x J so that obviously µ p is finite for all p 1. We consider the function classes k,j exactly as in the previous case and consider the envelope functions k,j x, y = κf y1i x t + [ h k,j 2, h k,j ] d. 2

12 J. DONY AND U. EINMAHL Our proof is similar to the one under condition A. The main difference is that we now use another exponential inequality which follows from a result of Yurinskii. See Theorem 3.3.1 and 3.3.7 in [18]. Fact 4.2 Bernstein type inequality. Let Z, Z 1,..., Z n be i.i.d. X valued random variables and consider a pointwise measurable class of functions g : X R with envelope function. Assume that for some H > 0, E m Z m! 2 σ2 H m 2, m 2, where σ 2 E 2 Z. Then we have for any t > 0, P max kα k g β n + t 1kn t 2 exp 2nσ 2 exp t2 + 2tH 4nσ 2 exp t, 4H where β n = E nα n g. Condition B implies that for some s > 0, and uniformly on x J, E[expsF Y X = x] M + 1 <, where M > 0. Hence, a simple Taylor expansion in combination with the monotone convergence theorem yields that for all x J, s m m! E[F m Y X = x] M, m=1 so that we can bound µ p for any p 2 as In particular, µ 2 2M/s 2. previous case that µ p = sup E [F p Y X = x] p!m x J s p, p 2. Furthermore, we obtain in the same way as in the E 2 k,jx, Y 2Ms 2 h d k,jκ 2 f X J =: σ 2 k,j. With A k,j = t + [ h k,j /2, h k,j /2] d, it then easily follows that for any m 1 and k large enough so that A k,j J, E m k,jx, Y = E[ m k,jx, Y X = x]f X xdx = κ m A k,j E [ F m Y X = x ] f X xdx κ m µ m h d k,j f X J m! 2 σ2 k,jκ m 2 s m 2. Using the same argument as in the previous case, we can find suitable constants A 1, A 2 > 0 such that E n k α nk g k,j A 1 n k σk,j 2 A 2 n k h d k,j.

POINTWISE UNIFORM IN BANDWIDTH CONSISTENCY 13 Hence all the conditions of the above Bernstein type inequality = Fact 4.2 are satisfied for k large enough with H = κ/s and βn 2 = On k h d k,j = on kh d k,j log log n k. This gives us for all 1 j Lk and ρ > 0 note that n k h d j,k log log n k, P max n k 1 <nn k nα n g k,j exp ρ2 s 2 log log n k 8Mκ 2 f X J log n k A3ρ2 + exp log n k A3ρ2 + log n k A4ρ, ρ n k h d k,j log log n k ρs n k h d k,j log log n k + exp ρs 2 j log log n k /4κ with A 3 = s 2 /8Mκ 2 f X J and A 4 = s/2 2κ. Consequently, by the empirical process representation in 4.1, we have for any positive constant ρ < recall also that Lk 2 log n k, k 1 that P max sup sup n k 1 <nn k a nhb 0 ϕ F Lk P j=1 nhd ˆϕ n,h t E ˆϕ n,h t 8 log log n max n k 1 <nn k nα n g k,j 2log n k 1 A3ρ2 + 2log n k 1 A4ρ. 4κ > ρ ρ n k h d k,j log log n k Finally, by taking ρ large enough such that ρ > 2/A 3 2/A 4, the result follows from the Borel Cantelli lemma by summing in k. References [1] Csörgő, S., Deheuvels, P. and Mason, D.M. 1985. Kernel estimates of the tail index of a distribution, Ann. Stat. 13, 1050 1077. [2] Deheuvels, P. and Mason, D.M. 1994. Functional laws of the iterated logarithm for local empirical processes indexed by sets. Ann. Probab. 22, 1619 1661. [3] Dony, J. 2008. Nonparametric regression estimation : An empirical process approach to the uniform in bandwidth consistency of kernel type estimators and conditional U statistics. Ph.D. Thesis, Vrije Universiteit Brussel, Belgium. [4] Dony, J., Einmahl, U. and Mason, D.M. 2006. Uniform in bandwidth consistency of local polynomial regression function estimators. Austr. J. Stat. 35, 105 120. [5] Dony, J. and Mason, D.M. 2009. Uniform in bandwidth consistency of kernel estimators of the tail index. In progress. [6] Einmahl, U. and Li, D. 2008. Characterization of LIL behavior in Banach space. Trans. Am. Math. Soc. 360, 6677-6693. [7] Einmahl, U. and Mason, D.M. 1997. aussian approximation of local empirical processes indexed by functions. Prob. Th. Rel.Fields 107, 283 311. [8] Einmahl, U. and Mason, D.M. 1998. Strong approximations to the local empirical process. Progr. Probab. 43, 75 92. [9] Einmahl, U. and Mason, D.M. 2000. An empirical process approach to the uniform consistency of kernel-type function estimators. J. Theor. Probab. 13, 1-37. [10] Einmahl, U. and Mason, D.M. 2005. Uniform in bandwidth consistency of kernel-type function estimators. Ann. Stat. 33, 1380 1403. [11] iné, E. and uillou, A. 2002. Rates of strong consistency for multivariate kernel density estimators. Ann. Inst. H. Poincaré. 38, 907-921.

14 J. DONY AND U. EINMAHL [12] Haerdle, W. 1984. A law of the iterated logarithm for nonparametric regression function estimators. Ann. Statist. 12 624-635. [13] Haerdle, W., Janssen, P. and Serfling, R. 1988. Strong uniform consistency rates for estimators of conditional functionals. Ann. Statist. 16, 1428-1449. [14] Hall, P. 1981. Laws of the iterated logarithm for nonparametric density estimators. Z. Wahrsch. verw. ebiete. 56, 47 61. [15] Jain, M.C. and Marcus, M.B. 1978. Continuity of sub aussian processes, Dekker, New York. Advances in Probability 4, 81 196. [16] Ledoux, M. and Talagrand, M. 1991. Probability on Banach Spaces. Springer-Verlag, Berlin. [17] van der Vaart, A.W. and Wellner, J.A. 1996. Weak convergence and empirical processes. With applications to statistics. Springer Series in Statistics. Springer Verlag, New York. [18] Yurinskii, V.V. 1995. Sums and aussian vectors. Lecture Notes in Mathematics 1617 Springer Verlag, Berlin. Department of Mathematics, Free University of Brussels VUB, Pleinlaan 2, B-1050 Brussels, Belgium E-mail address: jdony@vub.ac.be E-mail address: ueinmahl@vub.ac.be