Asymptotic normality of the $L_k$-error of the Grenander estimator

arXiv:math/ v1 [math.ST] 11 Feb 2006

Vladimir N. Kulikov and Hendrik P. Lopuhaä

Abstract: We investigate the limit behavior of the $L_k$-distance between a decreasing density $f$ and its nonparametric maximum likelihood estimator $\hat f_n$ for $k \ge 1$. Due to the inconsistency of $\hat f_n$ at zero, the case $k = 2.5$ turns out to be a kind of transition point. We extend asymptotic normality of the $L_1$-distance to the $L_k$-distance for $1 \le k < 2.5$, and obtain the analogous limiting result for a modification of the $L_k$-distance for $k \ge 2.5$. Since the $L_1$-distance is the area between $f$ and $\hat f_n$, which is also the area between the inverse $g$ of $f$ and the more tractable inverse $U_n$ of $\hat f_n$, the problem can be reduced immediately to deriving asymptotic normality of the $L_1$-distance between $U_n$ and $g$. Although we lose this easy correspondence for $k > 1$, we show that the $L_k$-distance between $f$ and $\hat f_n$ is asymptotically equivalent to the $L_k$-distance between $U_n$ and $g$.

EURANDOM, P.O. Box 513, 5600 MB Eindhoven, the Netherlands, kulikov@eurandom.nl
Delft University of Technology, Mekelweg 4, 2628 CD Delft, the Netherlands, h.p.lopuhaa@its.tudelft.nl

1 Introduction

Let $f$ be a non-increasing density with compact support; without loss of generality, assume this support to be the interval $[0,1]$. The nonparametric maximum likelihood estimator $\hat f_n$ of $f$ was discovered by Grenander (1956). It is defined as the left derivative of the least concave majorant (LCM) of the empirical distribution function $F_n$ constructed from a sample $X_1, \ldots, X_n$ from $f$. Prakasa Rao (1969) obtained the earliest result on the asymptotic pointwise behavior of the Grenander estimator. One immediately striking feature of this result is that the rate of convergence is of the same order as the rate of convergence of histogram estimators, and that the asymptotic distribution is not normal. It took much longer to develop distributional theory for global measures of performance for this estimator. The first distributional result for a global measure of deviation was the convergence to a normal distribution of the $L_1$-error, mentioned in Groeneboom (1985) (see Groeneboom, Hooghiemstra and Lopuhaä (1999) for a rigorous proof). A similar result in the regression setting was obtained by Durot (2002). In this paper we extend the result for the $L_1$-error to the $L_k$-error, for $k \ge 1$. We follow the same approach as Groeneboom, Hooghiemstra and Lopuhaä (1999), who, instead of comparing $\hat f_n$ to $f$, compared their inverses. The corresponding $L_1$-errors are the same, since they represent the area between the graphs of $\hat f_n$ and $f$ and the area between the graphs of the inverses. Clearly, for $k > 1$ we no longer have such an easy correspondence between the two $L_k$-errors. Nevertheless, we will show that the $L_k$-error between $\hat f_n$ and $f$ can still be approximated by a scaled version of the $L_k$-error between the two inverses, and that this scaled version is asymptotically normal. The main reason to do a preliminary inversion step is that we use results from Groeneboom, Hooghiemstra and Lopuhaä (1999) on the inverse process.
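The construction just described, taking the left derivative of the least concave majorant of $F_n$, can be sketched numerically. The following illustration is ours, not from the paper: it computes $\hat f_n$ from the upper convex hull of the points $(0,0), (X_{(1)}, 1/n), \ldots, (X_{(n)}, 1)$, assuming distinct sample values (which holds a.s. for a continuous density).

```python
import numpy as np

def grenander(sample):
    """Grenander estimator: left derivative of the least concave
    majorant (LCM) of the empirical distribution function F_n.

    Returns the LCM vertices x_1 < ... < x_r and the estimator's value
    on each interval (x_{j-1}, x_j] (a left-continuous, non-increasing
    step function).  Assumes distinct sample values."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    px = np.concatenate(([0.0], x))      # x-coordinates of F_n's vertices
    py = np.arange(n + 1) / n            # F_n jumps by 1/n at each point
    hull = [0]                           # indices of the LCM's vertices
    for i in range(1, n + 1):
        # pop the last vertex while it lies on or below the chord to point i
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            if (py[b] - py[a]) * (px[i] - px[a]) <= (py[i] - py[a]) * (px[b] - px[a]):
                hull.pop()
            else:
                break
        hull.append(i)
    hx, hy = px[hull], py[hull]
    slopes = np.diff(hy) / np.diff(hx)   # non-increasing by concavity
    return hx[1:], slopes
```

For the toy sample $\{0.1, 0.2, 0.4, 0.8\}$ this gives jump points $0.2, 0.4, 0.8$ with values $2.5, 1.25, 0.625$: the slopes are non-increasing, and the estimator integrates to one over $[0, X_{(n)}]$.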
But apart from this, we believe that working with $\hat f_n$ directly would not make life easier. For $a \in [0, f(0)]$, the (left continuous) inverse of $\hat f_n$ is
$$U_n(a) = \sup\{x \in [0,1] : \hat f_n(x) \ge a\}.$$
Since $\hat f_n(x)$ is the left continuous slope of the LCM of $F_n$ at the point $x$, a simple picture shows that it has the following more useful representation:
$$U_n(a) = \operatorname*{argmax}_{x \in [0,1]} \{F_n(x) - ax\}. \qquad (1.1)$$
Here the argmax function is the supremum of the times at which the maximum is attained. Since $U_n(a)$ can be seen as the $x$-coordinate of the point that is touched first when dropping a line with slope $a$ on $F_n$, with probability one, $\hat f_n(x) \ge a$ if and only if $U_n(a) \ge x$. Asymptotic normality of the $L_k$-error relies on embedding the process $F_n(x) - ax$ into a Brownian motion with drift. The fact that the difference between $F_n(x) - ax$ and the limit process is small directly implies that the difference between the locations of their maxima is small. However, it does not necessarily imply that the difference between the slopes of the LCMs of both processes is small. Similarly, convergence in distribution of suitably scaled finite-dimensional projections of $U_n$ follows immediately from distributional convergence of $F_n$ after suitable scaling, together with an argmax type of continuous mapping theorem (see for instance Kim and Pollard (1990)). When working with $\hat f_n$ directly, similar to Lemma 4.1 in Prakasa Rao (1969), one needs to bound the probability that the LCM of a Gaussian approximation of $F_n$ on $[0,1]$ differs from the one restricted to a shrinking interval, which is somewhat technical and tedious. Another important difference between the case $k > 1$ and the case $k = 1$ is the fact that for large $k$, the inconsistency of $\hat f_n$ at zero, as shown by Woodroofe and Sun (1993),
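The argmax representation (1.1) is easy to evaluate numerically, since the maximum of $F_n(x) - ax$ over $[0,1]$ is attained at a jump point of $F_n$ (or at $0$). A small sketch (our own illustration; the function name and interface are not from the paper):

```python
import numpy as np

def U_n(a, sample):
    """Inverse of the Grenander estimator via the representation (1.1):
    U_n(a) = argmax over [0,1] of F_n(x) - a*x, taking the LAST location
    at which the maximum is attained (the sup in the text).  Because
    F_n(x) - a*x decreases between jumps of F_n (for a > 0), it suffices
    to scan x = 0 and the order statistics."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    cand = np.concatenate(([0.0], x))   # candidate maximizers
    Fn = np.arange(n + 1) / n           # F_n at those points
    vals = Fn - a * cand
    # last index attaining the maximum, up to rounding tolerance
    return float(cand[np.nonzero(vals >= vals.max() - 1e-12)[0][-1]])
```

On the toy sample $\{0.1, 0.2, 0.4, 0.8\}$, the map $a \mapsto U_n(a)$ is non-increasing, as it must be, consistent with the switch relation $\hat f_n(x) \ge a \iff U_n(a) \ge x$.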

starts to dominate the behavior of the $L_k$-error. Using results from Kulikov and Lopuhaä (2002) on the behavior of $\hat f_n$ near the boundaries of the support of $f$, we will show that for $1 \le k < 2.5$ the $L_k$-error between $\hat f_n$ and $f$ is asymptotically normal. This result can be formulated as follows. Define for $c \in \mathbb{R}$,
$$V(c) = \operatorname*{argmax}_{t \in \mathbb{R}} \{W(t) - (t-c)^2\}, \qquad (1.2)$$
$$\xi(c) = V(c) - c, \qquad (1.3)$$
where $\{W(t) : -\infty < t < \infty\}$ denotes standard two-sided Brownian motion on $\mathbb{R}$ originating from zero (i.e., $W(0) = 0$).

Theorem 1.1 (Main theorem). Let $f$ be a decreasing density on $[0,1]$, satisfying:
(A1) $0 < f(y) \le f(x) \le f(0) < \infty$, for $0 \le x \le y \le 1$;
(A2) $f$ is twice continuously differentiable;
(A3) $\inf_{x \in (0,1)} |f'(x)| > 0$.
Then for $1 \le k < 2.5$, with
$$\mu_k = \left( \mathbb{E}|V(0)|^k \int_0^1 \left(4 f(x)\,|f'(x)|\right)^{k/3} dx \right)^{1/k},$$
the random variable
$$n^{1/6} \left( n^{1/3} \left( \int_0^1 |\hat f_n(x) - f(x)|^k \, dx \right)^{1/k} - \mu_k \right)$$
converges in distribution to a normal random variable with zero mean and variance
$$\sigma_k^2 = \frac{ 2 \int_0^1 (4 f(x))^{(2k+1)/3}\, |f'(x)|^{(2k-2)/3}\, dx \int_0^\infty \operatorname{cov}\!\left(|\xi(0)|^k, |\xi(c)|^k\right) dc }{ k^2 \left( \mathbb{E}|V(0)|^k \int_0^1 \left(4 f(x)\,|f'(x)|\right)^{k/3} dx \right)^{(2k-2)/k} }.$$
Note that the theorem holds under the same conditions as in Groeneboom et al. (1999). For $k \ge 2.5$, Theorem 1.1 is no longer true. However, the results from Kulikov and Lopuhaä (2002) enable us to show that an analogous limiting result still holds for a modification of the $L_k$-error. In Section 2 we introduce a Brownian approximation of $U_n$ and derive asymptotic normality of a scaled version of the $L_k$-distance between $U_n$ and the inverse $g$ of $f$. In Section 3 we show that on segments $[s,t]$ where the graph of $\hat f_n$ does not cross the graph of $f$, the difference
$$\int_s^t |\hat f_n(x) - f(x)|^k \, dx - \int_{f(t)}^{f(s)} \frac{|U_n(a) - g(a)|^k}{|g'(a)|^{k-1}} \, da$$
is of negligible order. Together with the behavior near the boundaries of the support of $f$, for $1 \le k < 2.5$, this establishes asymptotic normality of the $L_k$-distance between $\hat f_n$ and $f$ in Section 4. In Section 5 we investigate the case $k \ge 2.5$, and prove a result analogous to Theorem 1.1 for a modified $L_k$-error.
Remark 1.1. With almost no additional effort one can establish asymptotic normality of a weighted $L_k$-error
$$n^{k/3} \int_0^1 |\hat f_n(t) - f(t)|^k \, w(t)\, dt,$$
where $w$ is continuously differentiable on $[0,1]$. This may be of interest when one wants to use weights proportional to negative powers of the limiting standard deviation $\left(\tfrac12 f(t)\,|f'(t)|\right)^{1/3}$ of $\hat f_n(t)$. Moreover, when $w$ is estimated at a sufficiently fast rate, one may also replace $w$ by its estimate in the above integral. Similar results on a weighted $L_k$-error can be found in Kulikov and Lopuhaä (2004).

2 Brownian approximation

In this section we derive asymptotic normality of the $L_k$-error of the inverse process of the Grenander estimator. For this we follow the same line of reasoning as in Sections 3 and 4 of Groeneboom et al. (1999). Therefore, we only mention the main steps and defer all proofs to the appendix. Let $E_n$ denote the empirical process $\sqrt{n}(F_n - F)$. For $n \ge 1$, let $B_n$ be versions of the Brownian bridge constructed on the same probability space as the uniform empirical process $E_n \circ F^{-1}$ via the Hungarian embedding, and define versions $W_n$ of Brownian motion by
$$W_n(t) = B_n(t) + \xi_n t, \qquad t \in [0,1], \qquad (2.1)$$
where $\xi_n$ is a standard normal random variable, independent of $B_n$. For fixed $a \in (0, f(0))$ and $J = E, B, W$, define
$$V_n^J(a) = \operatorname*{argmax}_{t} \left[ X_n^J(a,t) + n^{2/3} \left( F(g(a) + n^{-1/3} t) - F(g(a)) \right) - n^{1/3} a t \right], \qquad (2.2)$$
where
$$X_n^E(a,t) = n^{1/6} \left( E_n(g(a) + n^{-1/3} t) - E_n(g(a)) \right),$$
$$X_n^B(a,t) = n^{1/6} \left( B_n(F(g(a) + n^{-1/3} t)) - B_n(F(g(a))) \right),$$
$$X_n^W(a,t) = n^{1/6} \left( W_n(F(g(a) + n^{-1/3} t)) - W_n(F(g(a))) \right). \qquad (2.3)$$
One can easily check that $V_n^E(a) = n^{1/3}(U_n(a) - g(a))$. A graphical interpretation and basic properties of $V_n^J$ are provided in Groeneboom et al. (1999). As $n$ tends to infinity, properly scaled versions of $V_n^J$ behave as $\xi(c)$ defined in (1.3). As a first step we prove asymptotic normality for a Brownian version of the $L_k$-distance between $U_n$ and $g$. This is an extension of Theorem 4.1 in Groeneboom et al. (1999).

Theorem 2.1. Let $V_n^W$ be defined as in (2.2) and $\xi$ by (1.3). Then for $k \ge 1$,
$$n^{1/6} \int_0^{f(0)} \frac{|V_n^W(a)|^k - \mathbb{E}|V_n^W(a)|^k}{|g'(a)|^{k-1}} \, da$$
converges in distribution to a normal random variable with zero mean and variance
$$\sigma^2 = 2 \int_0^1 (4 f(x))^{(2k+1)/3}\, |f'(x)|^{(2k-2)/3} \, dx \int_0^\infty \operatorname{cov}\!\left( |\xi(0)|^k, |\xi(c)|^k \right) dc.$$
The next lemma shows that the limiting expectation in Theorem 2.1 is equal to
$$\mu_k^k = \mathbb{E}|V(0)|^k \int_0^1 \left( 4 f(x)\,|f'(x)| \right)^{k/3} dx. \qquad (2.4)$$
Lemma 2.1. Let $V_n^W$ be defined by (2.2) and let $\mu_k$ be defined by (2.4). Then for $k \ge 1$,
$$\lim_{n \to \infty} n^{1/6} \left| \int_0^{f(0)} \frac{\mathbb{E}|V_n^W(a)|^k}{|g'(a)|^{k-1}} \, da - \mu_k^k \right| = 0.$$
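The limit variables $V(c)$ and $\xi(c)$ from (1.2)-(1.3) can be simulated by discretizing the two-sided Brownian motion. A rough sketch of ours (grid width and truncation level $T$ are arbitrary choices, not from the paper; the parabolic drift makes the truncation harmless in practice):

```python
import numpy as np

def xi_draw(c, rng, T=6.0, n_half=6000):
    """One approximate draw of xi(c) = V(c) - c, where
    V(c) = argmax_t { W(t) - (t - c)^2 } and W is standard two-sided
    Brownian motion with W(0) = 0, discretized on [-T, T]."""
    t = np.linspace(-T, T, 2 * n_half + 1)
    dt = T / n_half
    inc = rng.normal(scale=np.sqrt(dt), size=2 * n_half)
    # build W from independent halves, anchored at W(0) = 0
    right = np.concatenate(([0.0], np.cumsum(inc[:n_half])))        # W on [0, T]
    left = np.concatenate(([0.0], np.cumsum(inc[n_half:])))[::-1]   # W on [-T, 0]
    W = np.concatenate((left[:-1], right))
    V = t[np.argmax(W - (t - c) ** 2)]
    return V - c
```

By symmetry of $W$, the draw $\xi(0) = V(0)$ has a symmetric distribution, so a Monte Carlo average of many draws should be close to zero; such draws can in turn be used to approximate the covariances $\operatorname{cov}(|\xi(0)|^k, |\xi(c)|^k)$ appearing in $\sigma^2$.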

The next step is to transfer the result of Theorem 2.1 to the $L_k$-error of $V_n^E$. This can be done by means of the following lemma.

Lemma 2.2. For $J = E, B, W$, let $V_n^J$ be defined as in (2.2). Then for $k \ge 1$, we have
$$n^{1/6} \int_0^{f(0)} \left( |V_n^B(a)|^k - |V_n^W(a)|^k \right) da = o_p(1),$$
and
$$\int_0^{f(0)} \left| |V_n^E(a)|^k - |V_n^B(a)|^k \right| da = O_p\!\left( n^{-1/3} (\log n)^{k+2} \right).$$
From Theorem 2.1 and Lemmas 2.1 and 2.2 we immediately have the following corollary.

Corollary 2.1. Let $U_n$ be defined by (1.1) and let $\mu_k$ be defined by (2.4). Then for $k \ge 1$,
$$n^{1/6} \left( n^{k/3} \int_0^{f(0)} \frac{|U_n(a) - g(a)|^k}{|g'(a)|^{k-1}} \, da - \mu_k^k \right)$$
converges in distribution to a normal random variable with zero mean and variance $\sigma^2$ defined in Theorem 2.1.

3 Relating both $L_k$-errors

When $k = 1$, the $L_k$-error has an easy interpretation as the area between two graphs. In that case $\int |U_n(a) - g(a)|\, da$ is the same as $\int |\hat f_n(x) - f(x)|\, dx$, up to some boundary effects. This is precisely Corollary 2.1 in Groeneboom et al. (1999). In this section we show that a similar approximation holds for $\int_s^t |\hat f_n(x) - f(x)|^k\, dx$ on segments $[s,t]$ where the graphs of $\hat f_n$ and $f$ do not intersect. In order to avoid boundary problems, in subsequent sections we apply this approximation to a suitable cut-off version $\tilde f_n$ of $\hat f_n$.

Lemma 3.1. Let $\tilde f_n$ be a piecewise constant left-continuous non-increasing function on $[0,1]$ with a finite number of jumps. Suppose that $\tilde f_n \le f(0)$, and define its inverse function by $\tilde U_n(a) = \sup\{x \in [0,1] : \tilde f_n(x) \ge a\}$, for $a \in [0, f(0)]$. Suppose that
$$\sup_{x \in [s,t]} |\tilde f_n(x) - f(x)| < \frac{\left( \inf_{x \in [0,1]} |f'(x)| \right)^2}{2 \sup_{x \in [0,1]} |f''(x)|}, \qquad (3.1)$$
and that $[s,t] \subset [0,1]$ is such that one of the following situations applies:
1. $\tilde f_n(x) \ge f(x)$ for $x \in (s,t)$, with $\tilde f_n(s) = f(s)$ and $\tilde f_n(t+) \le f(t)$;
2. $\tilde f_n(x) \le f(x)$ for $x \in (s,t)$, with $\tilde f_n(t) = f(t)$ and $\tilde f_n(s) \ge f(s)$.
Then for $k \ge 1$,
$$\left| \int_s^t |\tilde f_n(x) - f(x)|^k \, dx - \int_{f(t)}^{f(s)} \frac{|\tilde U_n(a) - g(a)|^k}{|g'(a)|^{k-1}} \, da \right| \le C \int_{f(t)}^{f(s)} \frac{|\tilde U_n(a) - g(a)|^{k+1}}{|g'(a)|^k} \, da,$$
where $C > 0$ only depends on $f$ and $k$.

Proof: Let us first consider case 1. Let $\tilde f_n$ have $m$ points of jump on $(s,t)$. Denote them in increasing order by $\xi_1 < \cdots < \xi_m$, and write $s = \xi_0$ and $\xi_{m+1} = t$. Denote by $\alpha_1 > \cdots > \alpha_m$ the points of jump of $\tilde U_n$ on the interval $(f(t), f(s))$ in decreasing order, and write $f(s) = \alpha_0$ and $\alpha_{m+1} = f(t)$ (see Figure 1).

Figure 1: Segment $[s,t]$ where $\tilde f_n \ge f$.

We then have
$$\int_s^t |\tilde f_n(x) - f(x)|^k \, dx = \sum_{i=0}^m \int_{\xi_i}^{\xi_{i+1}} |\tilde f_n(\xi_{i+1}) - f(x)|^k \, dx.$$
Apply a Taylor expansion to $f$ at the point $g(\alpha_i)$ in each term, and note that $\tilde f_n(\xi_{i+1}) = \alpha_i$. Then, abbreviating $g_i = g(\alpha_i)$ for $i = 0, 1, \ldots, m$, we can write the right hand side as
$$\sum_{i=0}^m \int_{\xi_i}^{\xi_{i+1}} |f'(g_i)|^k (x - g_i)^k \left| 1 + \frac{f''(\theta_i)}{2 f'(g_i)} (x - g_i) \right|^k dx,$$
for some $\theta_i$ between $x$ and $g_i$, also using the fact that $g_i \le \xi_i < x \le \xi_{i+1}$. Due to condition (3.1) and the fact that $\tilde f_n(\xi_{i+1}) = \tilde f_n(x)$ for $x \in (\xi_i, \xi_{i+1}]$, we have
$$\left| \frac{f''(\theta_i)}{2 f'(g_i)} (x - g_i) \right| \le \frac{\sup |f''|}{2 \inf |f'|} \cdot \frac{|f(x) - f(g_i)|}{\inf |f'|} \le \frac{\sup |f''|}{2 (\inf |f'|)^2} \, |\tilde f_n(x) - f(x)| \le \frac12. \qquad (3.2)$$
Hence for $x \in (\xi_i, \xi_{i+1}]$, using that $|z^k - 1| \le \sup_{z \in [\frac12, \frac32]} k z^{k-1} |z - 1|$,
$$\left| 1 + \frac{f''(\theta_i)(x - g_i)}{2 f'(g_i)} \right|^k \le 1 + C_1 (x - g_i), \qquad \left| 1 + \frac{f''(\theta_i)(x - g_i)}{2 f'(g_i)} \right|^k \ge 1 - C_1 (x - g_i),$$
where $C_1 = \dfrac{\sup |f''|}{2 \inf |f'|}\, k \left( \tfrac32 \right)^{k-1}$.

Therefore we obtain the following inequality:
$$\left| \int_s^t |\tilde f_n(x) - f(x)|^k \, dx - \sum_{i=0}^m |f'(g_i)|^k \int_{\xi_i}^{\xi_{i+1}} (x - g_i)^k \, dx \right| \le C_1 \sum_{i=0}^m \int_{\xi_i}^{\xi_{i+1}} (x - g_i)^{k+1} \, dx.$$
After integration, we can rewrite this inequality in the following way:
$$\left| \int_s^t |\tilde f_n(x) - f(x)|^k \, dx - \frac{1}{k+1} \sum_{i=0}^m |f'(g_i)|^k \left( (\xi_{i+1} - g_i)^{k+1} - (\xi_i - g_i)^{k+1} \right) \right| \le \frac{C_1}{k+2} \sum_{i=0}^m \left( (\xi_{i+1} - g_i)^{k+2} - (\xi_i - g_i)^{k+2} \right). \qquad (3.3)$$
Next, let us consider the corresponding integral for the inverse $\tilde U_n$:
$$\int_{f(t)}^{f(s)} \frac{|\tilde U_n(a) - g(a)|^k}{|g'(a)|^{k-1}} \, da = \sum_{i=0}^m \int_{\alpha_{i+1}}^{\alpha_i} \frac{|\xi_{i+1} - g(a)|^k}{|g'(a)|^{k-1}} \, da = \sum_{i=0}^m \int_{g_i}^{g_{i+1}} (\xi_{i+1} - x)^k \, |f'(x)|^k \, dx,$$
now using that $g_i < x < g_{i+1} \le \xi_{i+1}$. Apply a Taylor expansion to $f'$ at the point $g_i$. For the right hand side, we then obtain
$$\sum_{i=0}^m \int_{g_i}^{g_{i+1}} (\xi_{i+1} - x)^k \, |f'(g_i) + f''(\theta_i)(x - g_i)|^k \, dx,$$
for some $\theta_i$ between $x$ and $g_i$. Using (3.2), by means of the same arguments as above we get the following inequality:
$$\left| \int_{f(t)}^{f(s)} \frac{|\tilde U_n(a) - g(a)|^k}{|g'(a)|^{k-1}} \, da - \sum_{i=0}^m |f'(g_i)|^k \int_{g_i}^{g_{i+1}} (\xi_{i+1} - x)^k \, dx \right| \le C_1 \sum_{i=0}^m \int_{g_i}^{g_{i+1}} (\xi_{i+1} - x)^k (x - g_i) \, dx. \qquad (3.4)$$
Since $g_i < x < g_{i+1} \le \xi_{i+1}$, for each term on the right hand side of (3.4) we have
$$\int_{g_i}^{g_{i+1}} (\xi_{i+1} - x)^k (x - g_i) \, dx \le (\xi_{i+1} - g_i) \int_{g_i}^{g_{i+1}} (\xi_{i+1} - x)^k \, dx = \frac{\xi_{i+1} - g_i}{k+1} \left( (\xi_{i+1} - g_i)^{k+1} - (\xi_{i+1} - g_{i+1})^{k+1} \right) \le \frac{1}{k+1} \left( (\xi_{i+1} - g_i)^{k+2} - (\xi_{i+1} - g_{i+1})^{k+2} \right).$$
Hence from (3.4) we find that
$$\left| \int_{f(t)}^{f(s)} \frac{|\tilde U_n(a) - g(a)|^k}{|g'(a)|^{k-1}} \, da - \frac{1}{k+1} \sum_{i=0}^m |f'(g_i)|^k \left( (\xi_{i+1} - g_i)^{k+1} - (\xi_{i+1} - g_{i+1})^{k+1} \right) \right| \le \frac{C_1}{k+1} \sum_{i=0}^m \left( (\xi_{i+1} - g_i)^{k+2} - (\xi_{i+1} - g_{i+1})^{k+2} \right). \qquad (3.5)$$

For the third integral in the statement of the lemma, similarly as before, we can write
$$\int_{f(t)}^{f(s)} \frac{|\tilde U_n(a) - g(a)|^{k+1}}{|g'(a)|^k} \, da = \sum_{i=0}^m |f'(g_i)|^{k+1} \int_{g_i}^{g_{i+1}} (\xi_{i+1} - x)^{k+1} \left| 1 + \frac{f''(\theta)}{f'(g_i)} (x - g_i) \right|^{k+1} dx.$$
According to (3.2) we have, for $x \in (g_i, g_{i+1})$, $\left| 1 + \frac{f''(\theta)}{f'(g_i)} (x - g_i) \right| \ge \frac12$, so that after integration we obtain
$$\int_{f(t)}^{f(s)} \frac{|\tilde U_n(a) - g(a)|^{k+1}}{|g'(a)|^k} \, da \ge \frac{C_2}{k+2} \sum_{i=0}^m \left( (\xi_{i+1} - g_i)^{k+2} - (\xi_{i+1} - g_{i+1})^{k+2} \right), \qquad (3.6)$$
where $C_2 = \left( \tfrac12 \right)^{k+1} (\inf |f'|)^{k+1}$. Now, let us define $\Delta$ as the difference between the first two integrals:
$$\Delta \overset{\text{def}}{=} \int_s^t |\tilde f_n(x) - f(x)|^k \, dx - \int_{f(t)}^{f(s)} \frac{|\tilde U_n(a) - g(a)|^k}{|g'(a)|^{k-1}} \, da.$$
By (3.3) and (3.5) and the fact that $\xi_0 = g_0$ and $\xi_{m+1} = g_{m+1}$, we find that
$$|\Delta| \le D \sum_{i=0}^m (\xi_{i+1} - g_{i+1})^{k+1} \left| |f'(g_i)|^k - |f'(g_{i+1})|^k \right| + D \sum_{i=0}^m \left( (\xi_{i+1} - g_i)^{k+2} - (\xi_i - g_i)^{k+2} \right) + D \sum_{i=0}^m \left( (\xi_{i+1} - g_i)^{k+2} - (\xi_{i+1} - g_{i+1})^{k+2} \right), \qquad (3.7)$$
where $D$ is some positive constant that depends only on the function $f$ and $k$. By a Taylor expansion, the first term on the right hand side of (3.7) can be bounded by
$$D \sum_{i=0}^m (\xi_{i+1} - g_{i+1})^{k+1} \left| |f'(g_i)|^k - |f'(g_i) + f''(\theta_i)(g_{i+1} - g_i)|^k \right| \le D \sum_{i=0}^m (\xi_{i+1} - g_{i+1})^{k+1} |f'(g_i)|^k \left| 1 - \left| 1 + \frac{f''(\theta_i)(g_{i+1} - g_i)}{f'(g_i)} \right|^k \right| \le C_3 \sum_{i=0}^m (\xi_{i+1} - g_{i+1})^{k+1} (g_{i+1} - g_i),$$
with $C_3$ only depending on $f$ and $k$, where we also use (3.2) and the fact that, according to (3.1), $(g_{i+1} - g_i) \sup |f''| / \inf |f'| < \frac12$. Since $g_i < g_{i+1} \le \xi_{i+1}$, this means that

the first term on the right hand side of (3.7) can be bounded by
$$C_3 \sum_{i=0}^m (\xi_{i+1} - g_{i+1})^{k+1} (g_{i+1} - g_i) \le C_3 \sum_{i=0}^m \left( (\xi_{i+1} - g_i) - (\xi_{i+1} - g_{i+1}) \right) (\xi_{i+1} - g_{i+1})^{k+1} \le C_3 \sum_{i=0}^m \left( (\xi_{i+1} - g_i)^{k+2} - (\xi_{i+1} - g_{i+1})^{k+2} \right).$$
Because $\xi_0 = g_0$ and $\xi_{m+1} = g_{m+1}$, for the second term on the right hand side of (3.7) we have
$$\sum_{i=0}^m \left( (\xi_{i+1} - g_i)^{k+2} - (\xi_i - g_i)^{k+2} \right) = \sum_{i=0}^m \left( (\xi_{i+1} - g_i)^{k+2} - (\xi_{i+1} - g_{i+1})^{k+2} \right).$$
Putting things together and using (3.6), we find that
$$|\Delta| \le C_4 \sum_{i=0}^m \left( (\xi_{i+1} - g_i)^{k+2} - (\xi_{i+1} - g_{i+1})^{k+2} \right) \le C_5 \int_{f(t)}^{f(s)} \frac{|\tilde U_n(a) - g(a)|^{k+1}}{|g'(a)|^k} \, da,$$
where $C_5$ only depends on $f$ and $k$. This proves the lemma for case 1.

Figure 2: Segment $[s,t]$ where $\tilde f_n \le f$.

For case 2, the proof is similar. The main differences are that $\tilde f_n(\xi_i) = \alpha_i$ (see Figure 2) and that the Taylor expansions are applied to $f$ at $g_{i+1}$ instead of $g_i$. Similar to (3.3), now using that $\xi_i < x \le \xi_{i+1} \le g_{i+1}$, we obtain
$$\left| \int_s^t |\tilde f_n(x) - f(x)|^k \, dx - \frac{1}{k+1} \sum_{i=0}^m |f'(g_{i+1})|^k \left( (g_{i+1} - \xi_i)^{k+1} - (g_{i+1} - \xi_{i+1})^{k+1} \right) \right| \le \frac{C_1}{k+2} \sum_{i=0}^m \left( (g_{i+1} - \xi_i)^{k+2} - (g_{i+1} - \xi_{i+1})^{k+2} \right). \qquad (3.8)$$

Similar to (3.5), now using that $g_i \le \xi_i < g_{i+1}$, we now obtain
$$\left| \int_{f(t)}^{f(s)} \frac{|\tilde U_n(a) - g(a)|^k}{|g'(a)|^{k-1}} \, da - \frac{1}{k+1} \sum_{i=0}^m |f'(g_{i+1})|^k \left( (g_{i+1} - \xi_i)^{k+1} - (g_i - \xi_i)^{k+1} \right) \right| \le \frac{C_1}{k+1} \sum_{i=0}^m \left( (g_{i+1} - \xi_i)^{k+2} - (g_i - \xi_i)^{k+2} \right), \qquad (3.9)$$
and similar to (3.6) we find
$$\int_{f(t)}^{f(s)} \frac{|\tilde U_n(a) - g(a)|^{k+1}}{|g'(a)|^k} \, da \ge \frac{C_2}{k+2} \sum_{i=0}^m \left( (g_{i+1} - \xi_i)^{k+2} - (g_i - \xi_i)^{k+2} \right). \qquad (3.10)$$
For the difference between the two integrals, again using that $\xi_0 = g_0$ and $\xi_{m+1} = g_{m+1}$, we now find
$$|\Delta| \le D \sum_{i=0}^m (g_i - \xi_i)^{k+1} \left| |f'(g_i)|^k - |f'(g_{i+1})|^k \right| + D \sum_{i=0}^m \left( (g_{i+1} - \xi_i)^{k+2} - (g_i - \xi_i)^{k+2} \right) + D \sum_{i=0}^m \left( (g_{i+1} - \xi_i)^{k+2} - (g_{i+1} - \xi_{i+1})^{k+2} \right), \qquad (3.11)$$
where $D$ is some positive constant that depends only on the function $f$ and $k$. The first two terms on the right hand side of (3.11) can be bounded similarly to the first two terms on the right hand side of (3.7), which yields
$$|\Delta| \le C_4 \sum_{i=0}^m \left( (g_{i+1} - \xi_i)^{k+2} - (g_i - \xi_i)^{k+2} \right) \le C_5 \int_{f(t)}^{f(s)} \frac{|\tilde U_n(a) - g(a)|^{k+1}}{|g'(a)|^k} \, da,$$
where $C_5$ only depends on $f$ and $k$. This proves the lemma for case 2.

4 Asymptotic normality of the $L_k$-error of $\hat f_n$

We will apply Lemma 3.1 to the following cut-off version of $\hat f_n$:
$$\tilde f_n(x) = \begin{cases} f(0) & \text{if } \hat f_n(x) \ge f(0), \\ \hat f_n(x) & \text{if } 0 \le \hat f_n(x) < f(0), \\ 0 & \text{if } \hat f_n(x) < 0. \end{cases} \qquad (4.1)$$
The next lemma shows that $\tilde f_n$ satisfies condition (3.1) with probability tending to one.

Lemma 4.1. Define the event
$$A_n = \left\{ \sup_{x \in [0,1]} |\tilde f_n(x) - f(x)| \le \frac{\left( \inf_{x \in [0,1]} |f'(x)| \right)^2}{2 \sup_{x \in [0,1]} |f''(x)|} \right\}.$$
Then $P\{A_n^c\} \to 0$.

Proof: It is sufficient to show that $\sup_x |\tilde f_n(x) - f(x)|$ tends to zero in probability. For this we can follow the line of reasoning in Section 5.4 of Groeneboom and Wellner (1992). Similar to their Lemma 5.9, we derive from our Lemma A.1 that for each $a \in (0, f(0))$,
$$P\left\{ |U_n(a) - g(a)| \ge n^{-1/3} \log n \right\} \le C_1 \exp\left\{ -C_2 (\log n)^3 \right\}.$$
By monotonicity of $U_n$ and the conditions on $f$, this means that there exists a constant $C_3 > 0$ such that
$$P\left\{ \sup_{a \in (0, f(0))} |U_n(a) - g(a)| \ge C_3 n^{-1/3} \log n \right\} \le C_1 \exp\left\{ -\tfrac12 C_2 (\log n)^3 \right\}.$$
This implies that the maximum distance between successive points of jump of $\hat f_n$ is of the order $O_p(n^{-1/3} \log n)$. Since both $\tilde f_n$ and $f$ are monotone and bounded by $f(0)$, this also means that the maximum distance between $\tilde f_n$ and $f$ is of the order $O_p(n^{-1/3} \log n)$.

The difference between the $L_k$-errors for $\hat f_n$ and $\tilde f_n$ is bounded as follows:
$$\left| \int_0^1 |\hat f_n(x) - f(x)|^k \, dx - \int_0^1 |\tilde f_n(x) - f(x)|^k \, dx \right| \le \int_0^{U_n(f(0))} |\hat f_n(x) - f(x)|^k \, dx + \int_{U_n(0)}^1 |\hat f_n(x) - f(x)|^k \, dx. \qquad (4.2)$$
The next lemma shows that the integrals on the right hand side are of negligible order.

Lemma 4.2. Let $U_n$ be defined in (1.1). Then
$$\int_0^{U_n(f(0))} |\hat f_n(x) - f(x)|^k \, dx = o_p(n^{-(2k+1)/6}), \quad \text{and} \quad \int_{U_n(0)}^1 |\hat f_n(x) - f(x)|^k \, dx = o_p(n^{-(2k+1)/6}).$$
Proof: Consider the first integral, which can be bounded by
$$2^k \int_0^{U_n(f(0))} |\hat f_n(x) - f(0)|^k \, dx + 2^k \int_0^{U_n(f(0))} |f(x) - f(0)|^k \, dx \le 2^k \int_0^{U_n(f(0))} |\hat f_n(x) - f(0)|^k \, dx + \frac{2^k}{k+1} \sup |f'|^k \, U_n(f(0))^{k+1}. \qquad (4.3)$$
Define the event $B_n = \{ U_n(f(0)) \le n^{-1/3} \log n \}$. Then $U_n(f(0))^{k+1} \mathbf{1}_{B_n} = o_p(n^{-(2k+1)/6})$. Moreover, according to Theorem 2.1 in Groeneboom et al. (1999), it follows that $P\{B_n^c\} \to 0$. Since for any $\eta > 0$,
$$P\left\{ n^{\frac{2k+1}{6}} \, U_n(f(0))^{k+1} \mathbf{1}_{B_n^c} > \eta \right\} \le P\{B_n^c\} \to 0,$$
this implies that the second term in (4.3) is of the order $o_p(n^{-(2k+1)/6})$. The first term in (4.3) can be written as
$$2^k \left( \int_0^{U_n(f(0))} |\hat f_n(x) - f(0)|^k \, dx \right) \mathbf{1}_{B_n} + 2^k \left( \int_0^{U_n(f(0))} |\hat f_n(x) - f(0)|^k \, dx \right) \mathbf{1}_{B_n^c}, \qquad (4.4)$$

where the second integral is of the order $o_p(n^{-(2k+1)/6})$ by the same reasoning as before. To bound the first integral in (4.4), we construct a suitable sequence $(a_i)_{i=1}^m$, such that the intervals $(0, n^{-a_1}]$ and $(n^{-a_i}, n^{-a_{i+1}}]$, for $i = 1, 2, \ldots, m-1$, cover the interval $(0, U_n(f(0))]$, and such that the integrals over these intervals can be bounded appropriately. First of all, let
$$1 > a_1 > a_2 > \cdots > a_{m-1} \ge 1/3 > a_m, \qquad (4.5)$$
and let $z_0 = 0$ and $z_i = n^{-a_i}$, $i = 1, \ldots, m$, so that $0 < z_1 < \cdots < z_{m-1} \le n^{-1/3} < z_m$. On the event $B_n$, for $n$ sufficiently large, the intervals $(0, n^{-a_1}]$ and $(n^{-a_i}, n^{-a_{i+1}}]$ cover $(0, U_n(f(0))]$. Hence, when we denote $J_i = [z_i \wedge U_n(f(0)),\, z_{i+1} \wedge U_n(f(0))]$, the first integral in (4.4) can be bounded by
$$\sum_{i=0}^{m-1} \left( \int_{J_i} \left( \hat f_n(x) - f(0) \right)^k dx \right) \mathbf{1}_{B_n} \le \sum_{i=0}^{m-1} (z_{i+1} - z_i) \, |\hat f_n(z_i) - f(0)|^k,$$
using that $\hat f_n$ is decreasing and the fact that $J_i \subset (0, U_n(f(0))]$, so that $|\hat f_n(z_i) - f(0)| \ge |\hat f_n(x) - f(0)|$ for $x \in J_i$. It remains to show that
$$\sum_{i=0}^{m-1} (z_{i+1} - z_i) \, |\hat f_n(z_i) - f(0)|^k = o_p(n^{-(2k+1)/6}). \qquad (4.6)$$
From Woodroofe and Sun (1993), we have that
$$\frac{\hat f_n(0)}{f(0)} \to \sup_{1 \le j < \infty} \frac{j}{\Gamma_j}, \qquad (4.7)$$
in distribution, where the $\Gamma_j$ are partial sums of standard exponential random variables. Therefore
$$z_1 \, |\hat f_n(0) - f(0)|^k = O_p(n^{-a_1}). \qquad (4.8)$$
According to Theorem 3.1 in Kulikov and Lopuhaä (2002), for $1/3 \le \alpha < 1$,
$$n^{(1-\alpha)/2} \left( \hat f_n(n^{-\alpha}) - f(n^{-\alpha}) \right) \to Z, \qquad (4.9)$$
in distribution, where $Z$ is a non-degenerate random variable. Since for any $i = 1, \ldots, m-1$ we have $1/3 \le a_i < 1$, it follows that
$$|\hat f_n(z_i) - f(0)| \le |\hat f_n(z_i) - f(z_i)| + \sup |f'| \, z_i = O_p(n^{-(1 - a_i)/2}) + O_p(n^{-a_i}) = O_p(n^{-(1 - a_i)/2}).$$
This implies that for $i = 1, \ldots, m-1$,
$$(z_{i+1} - z_i) \, |\hat f_n(z_i) - f(0)|^k = O_p\!\left( n^{-a_{i+1} - k(1 - a_i)/2} \right). \qquad (4.10)$$
Therefore, if we can construct a sequence $(a_i)$ satisfying (4.5), as well as
$$a_1 > \frac{2k+1}{6}, \qquad (4.11)$$
$$a_{i+1} + \frac{k(1 - a_i)}{2} > \frac{2k+1}{6}, \quad \text{for all } i = 1, \ldots, m-1, \qquad (4.12)$$

then (4.6) follows from (4.8) and (4.10). One may take
$$a_1 = \frac{2k+7}{12}, \qquad a_{i+1} = \frac{k(a_i - 1)}{2} + \frac{2k+3}{8}, \quad \text{for } i = 1, \ldots, m-1.$$
Since $k < 2.5$, it immediately follows that (4.11) and (4.12) are satisfied. To show that (4.5) holds, first note that $1 > a_1 > 1/3$, because $k < 2.5$. It remains to show that the described sequence strictly decreases and reaches $1/3$ in finitely many steps. As long as $a_i > 1/3$, it follows that
$$a_i - a_{i+1} = \frac{2-k}{2}\, a_i + \frac{2k-3}{8}.$$
When $k = 2$, this equals $1/8$. For $1 \le k < 2$, use that $a_i > 1/3$ to find that $a_i - a_{i+1} > 1/24$, and for $2 < k < 2.5$, use that $a_i \le a_1 = (2k+7)/12$ to find that $a_i - a_{i+1} \ge (k+1)(2.5 - k)/12$. This means that the sequence $(a_i)$ also satisfies (4.5), which proves (4.6).

Similar to (4.3), the second integral can be bounded by
$$2^k \int_{U_n(0)}^1 \hat f_n(x)^k \, dx + \frac{2^k}{k+1} \sup |f'|^k \, (1 - U_n(0))^{k+1}.$$
From here the proof is similar as above. We can use the same sequence $(a_i)$ as before, and take $B_n = \{ 1 - U_n(0) \le n^{-1/3} \log n \}$. If we now define $z_0 = 1$, $z_i = 1 - n^{-a_i}$, for $i = 1, 2, \ldots, m-1$, then similar to the argument above, we are left with considering
$$\left( \int_{U_n(0)}^1 \hat f_n(x)^k \, dx \right) \mathbf{1}_{B_n} \le \sum_{i=0}^{m-1} (z_i - z_{i+1}) \, \hat f_n(z_i)^k. \qquad (4.13)$$
The first term is $\left( 1 - (1 - n^{-a_1}) \right) \hat f_n(1)^k = n^{-a_1} \hat f_n(1)^k$. According to Theorem 4.1 in Kulikov and Lopuhaä (2002), for $1/3 < \alpha < 1$,
$$n^{(1-\alpha)/2} \left( f(1 - n^{-\alpha}) - \hat f_n(1 - n^{-\alpha}) \right) \qquad (4.14)$$
converges in distribution to a non-degenerate limit, expressed there in terms of $\operatorname*{argmax}_{t \in [0,\infty)} \{ W(t) - t^2 \}$. Hence, for $i = 1, 2, \ldots, m-1$, each term is of the order $O_p(n^{-a_{i+1} - k(1 - a_i)/2})$. As before, the sequence $(a_i)$ chosen above satisfies (4.11) and (4.12), which implies that (4.13) is of the order $o_p(n^{-(2k+1)/6})$. This proves the lemma.

We are now able to prove our main result concerning the asymptotic normality of the $L_k$-error, for $1 \le k < 2.5$.

Proof of Theorem 1.1: First consider the difference
$$\int_0^1 |\hat f_n(x) - f(x)|^k \, dx - \int_0^{f(0)} \frac{|U_n(a) - g(a)|^k}{|g'(a)|^{k-1}} \, da, \qquad (4.15)$$
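The recursion for the exponents $a_i$ in the proof of Lemma 4.2 is easy to check numerically. A small sketch of ours (function names are not from the paper) verifying (4.5), (4.11) and (4.12) for a given $k$:

```python
def exponent_sequence(k):
    """Exponents from the proof of Lemma 4.2:
    a_1 = (2k+7)/12,  a_{i+1} = k(a_i - 1)/2 + (2k+3)/8,
    stopped as soon as a value drops to 1/3 or below."""
    a = [(2 * k + 7) / 12]
    while a[-1] > 1 / 3:
        a.append(k * (a[-1] - 1) / 2 + (2 * k + 3) / 8)
    return a

def conditions_hold(k):
    """Check (4.5) (strict decrease), (4.11) and (4.12) numerically."""
    a = exponent_sequence(k)
    bound = (2 * k + 1) / 6
    decreasing = all(x > y for x, y in zip(a, a[1:]))
    c_411 = a[0] > bound
    c_412 = all(a[i + 1] + k * (1 - a[i]) / 2 > bound for i in range(len(a) - 1))
    return decreasing and c_411 and c_412
```

Note that the recursion makes $a_{i+1} + k(1 - a_i)/2$ constant in $i$, equal to $(2k+3)/8$, which exceeds $(2k+1)/6$ precisely when $k < 2.5$; this is where the restriction $k < 2.5$ enters.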

which can be bounded by
$$\left| \int_0^1 |\hat f_n(x) - f(x)|^k \, dx - \int_0^1 |\tilde f_n(x) - f(x)|^k \, dx \right| + R_n, \qquad (4.16)$$
where
$$R_n = \left| \int_0^1 |\tilde f_n(x) - f(x)|^k \, dx - \int_0^{f(0)} \frac{|U_n(a) - g(a)|^k}{|g'(a)|^{k-1}} \, da \right|.$$
Let $A_n$ be the event defined in Lemma 4.1, so that $P\{A_n^c\} \to 0$. As in the proof of Lemma 4.2, this means that $R_n \mathbf{1}_{A_n^c} = o_p(n^{-(2k+1)/6})$. Note that on the event $A_n$, the function $\tilde f_n$ satisfies the conditions of Lemma 3.1, and that for any $a \in [0, f(0)]$,
$$U_n(a) = \sup\{t \in [0,1] : \hat f_n(t) > a\} = \sup\{t \in [0,1] : \tilde f_n(t) > a\} = \tilde U_n(a).$$
Moreover, we can construct a partition $[0, s_1], (s_1, s_2], \ldots, (s_l, 1]$ of $[0,1]$ in such a way that on each element of the partition, $\tilde f_n$ satisfies either condition 1 or condition 2 of Lemma 3.1. This means that we can apply Lemma 3.1 to each element of the partition. Putting things together, it follows that $R_n \mathbf{1}_{A_n}$ is bounded from above by
$$C \int_0^{f(0)} \frac{|U_n(a) - g(a)|^{k+1}}{|g'(a)|^k} \, da.$$
Corollary 2.1 implies that this integral is of the order $O_p(n^{-(k+1)/3})$, so that $R_n \mathbf{1}_{A_n} = o_p(n^{-(2k+1)/6})$. Finally, the first difference in (4.16) can be bounded as in (4.2), which means that according to Lemma 4.2 it is of the order $o_p(n^{-(2k+1)/6})$. Together with Corollary 2.1, this implies that
$$n^{1/6} \left( n^{k/3} \int_0^1 |\hat f_n(x) - f(x)|^k \, dx - \mu_k^k \right) \to N(0, \sigma^2),$$
where $\sigma^2$ is defined in Theorem 2.1. An application of the $\delta$-method then yields that
$$n^{1/6} \left( n^{1/3} \left( \int_0^1 |\hat f_n(x) - f(x)|^k \, dx \right)^{1/k} - \mu_k \right)$$
converges to a normal random variable with mean zero and variance
$$\left( \frac{1}{k} \mu_k^{1-k} \right)^2 \sigma^2 = \frac{\sigma^2}{k^2 \mu_k^{2k-2}} = \sigma_k^2.$$

5 Asymptotic normality of a modified $L_k$-error for large $k$

For large $k$, the inconsistency of $\hat f_n$ at zero starts to dominate the behavior of the $L_k$-error. The following lemma indicates that for $k > 2.5$ the result of Theorem 1.1 does not hold. For $k > 3$, the $L_k$-error tends to infinity, whereas for $2.5 < k \le 3$, we are only able to prove that the variance of the integral near zero tends to infinity.
In the latter case, it is in principle possible that the behavior of the process $\hat f_n - f$ on $[0, z_n]$ depends on the behavior of the process on $[z_n, 1]$ in such a way that the variance of the whole integral stabilizes, but this seems unlikely. The proof of this lemma is deferred to the appendix.

Lemma 5.1 Let $z_n = 1/(2nf(0))$. Then we have the following.
(i) If $k>3$, then $n^{k/3}\,E\int_0^{z_n} |\hat f_n(x)-f(x)|^k\,dx \to\infty$.
(ii) If $k>2.5$, then $\mathrm{var}\bigl( n^{(2k+1)/6}\int_0^{z_n} |\hat f_n(x)-f(x)|^k\,dx \bigr) \to\infty$.

Although Lemma 5.1 indicates that for $k>2.5$ the result of Theorem 1.1 will not hold for the usual $L_k$-error, a similar result can be derived for a modified version. For $k\geq 2.5$ we will consider a modified $L_k$-error of the form
\[
n^{1/6}\biggl( n^{1/3}\Bigl( \int_{n^{-\varepsilon}}^{1-n^{-\varepsilon}} |\hat f_n(x)-f(x)|^k\,dx \Bigr)^{1/k} - \mu_k \biggr), \tag{5.1}
\]
where $\mu_k$ is the constant defined in Theorem 1.1. In this way, for suitable choices of $\varepsilon$, we avoid a region where the Grenander estimator is inconsistent, in such a way that we are still able to determine its global performance.

We first determine for what values of $\varepsilon$ we cannot expect asymptotic normality of (5.1). First of all, for $\varepsilon>1$, similar to the proof of Lemma 5.1, it follows that
\[
\mathrm{var}\Bigl( n^{(2k+1)/6}\int_{n^{-\varepsilon}}^{z_n} |\hat f_n(x)-f(x)|^k\,dx \Bigr) \to\infty.
\]
For $\varepsilon<1/6$, in view of Lemma 3.1 and the Brownian approximation discussed in Section 2, we have that
\[
n^{(2k+1)/6}\int_{1-n^{-\varepsilon}}^{1} |\hat f_n(x)-f(x)|^k\,dx
\]
will behave as
\[
n^{1/6}\int_{f(1)}^{f(1-n^{-\varepsilon})} n^{k/3}\,\frac{|U_n^W(a)-g(a)|^k}{|g'(a)|^{k-1}}\,da,
\]
which, according to Lemma 2.1, is of the order $O(n^{1/6-\varepsilon})$. Hence, we also cannot expect asymptotic normality of (5.1) for $\varepsilon<1/6$. Finally, for $(k-1)/(3k-6)<\varepsilon<1$, a more tedious argument in the same spirit as the proof of Lemma 5.1 yields that
\[
\mathrm{var}\Bigl( n^{(2k+1)/6}\int_{n^{-\varepsilon}}^{2n^{-\varepsilon}} |\hat f_n(x)-f(x)|^k\,dx \Bigr) \to\infty.
\]
Hence, in order to obtain a proper limit distribution for (5.1) for $k\geq 2.5$, we will choose $\varepsilon$ between $1/6$ and $(k-1)/(3k-6)$.

To prove a result analogous to Theorem 1.1, we define another cut-off version of the Grenander estimator:
\[
f_n^\varepsilon(x) =
\begin{cases}
f(n^{-\varepsilon}) & \text{if } \hat f_n(x)\geq f(n^{-\varepsilon}),\\
\hat f_n(x) & \text{if } f(1-n^{-\varepsilon})\leq \hat f_n(x) < f(n^{-\varepsilon}),\\
f(1-n^{-\varepsilon}) & \text{if } \hat f_n(x) < f(1-n^{-\varepsilon}),
\end{cases}
\]
and its inverse function
\[
U_n^\varepsilon(a) = \sup\{x\in[0,1]: \hat f_n(x)\geq a\}, \tag{5.2}
\]
for $a\in[f(1-n^{-\varepsilon}), f(n^{-\varepsilon})]$. The next lemma is the analogue of Lemma 4.1.

Lemma 5.2 Define the event
\[
A_n^\varepsilon = \Bigl\{ \sup_{x\in[0,1]} |f_n^\varepsilon(x)-f(x)| \leq \frac{\inf_{x\in[0,1]} |f'(x)|^2}{2\sup_{x\in[0,1]} |f'(x)|} \Bigr\}.
\]
Then $P(A_n^\varepsilon)\to 1$.

Proof: It suffices to show that $\sup_{x\in[0,1]} |f_n^\varepsilon(x)-f(x)| \to 0$ in probability. Using the definition of $f_n^\varepsilon$ we can bound
\[
\sup_{x\in[0,1]} |f_n^\varepsilon(x)-f(x)| \leq \sup_{x\in[0,1]} |f_n^\varepsilon(x)-\tilde f_n(x)| + \sup_{x\in[0,1]} |\tilde f_n(x)-f(x)|. \tag{5.3}
\]
The first term on the right hand side of (5.3) is smaller than $n^{-\varepsilon}\sup_x |f'(x)|$, which, together with Lemma 4.1, implies that $\sup_{x\in[0,1]} |f_n^\varepsilon(x)-f(x)| = o_p(n^{-1/6})$. □

Similar to (4.2), the difference between the modified $L_k$-errors for $\hat f_n$ and $f_n^\varepsilon$ is bounded as follows:
\[
\Bigl| \int_{n^{-\varepsilon}}^{1-n^{-\varepsilon}} |\hat f_n(x)-f(x)|^k\,dx - \int_{n^{-\varepsilon}}^{1-n^{-\varepsilon}} |f_n^\varepsilon(x)-f(x)|^k\,dx \Bigr| \tag{5.4}
\]
\[
\leq \int_{n^{-\varepsilon}}^{U_n^\varepsilon(f(n^{-\varepsilon}))} |\hat f_n(x)-f(x)|^k\,dx + \int_{U_n^\varepsilon(f(1-n^{-\varepsilon}))}^{1-n^{-\varepsilon}} |\hat f_n(x)-f(x)|^k\,dx.
\]
The next lemma is the analogue of Lemma 4.2 and shows that both integrals on the right hand side are of negligible order.

Lemma 5.3 For $k\geq 2.5$ and $1/6<\varepsilon<(k-1)/(3k-6)$, let $U_n^\varepsilon$ be defined in (5.2). Then
\[
\int_{n^{-\varepsilon}}^{U_n^\varepsilon(f(n^{-\varepsilon}))} |\hat f_n(x)-f(x)|^k\,dx = o_p(n^{-(2k+1)/6}),
\]
and
\[
\int_{U_n^\varepsilon(f(1-n^{-\varepsilon}))}^{1-n^{-\varepsilon}} |\hat f_n(x)-f(x)|^k\,dx = o_p(n^{-(2k+1)/6}).
\]

Proof: Consider the first integral. Similar to (4.3), it is bounded by
\[
2^k \int_{n^{-\varepsilon}}^{U_n^\varepsilon(f(n^{-\varepsilon}))} |\hat f_n(x)-f(n^{-\varepsilon})|^k\,dx + 2^k \int_{n^{-\varepsilon}}^{U_n^\varepsilon(f(n^{-\varepsilon}))} |f(n^{-\varepsilon})-f(x)|^k\,dx \tag{5.5}
\]
\[
\leq 2^k \int_{n^{-\varepsilon}}^{U_n^\varepsilon(f(n^{-\varepsilon}))} |\hat f_n(x)-f(n^{-\varepsilon})|^k\,dx + \frac{2^k}{k+1}\,\sup_x |f'(x)|^k\, \bigl( U_n^\varepsilon(f(n^{-\varepsilon})) - n^{-\varepsilon} \bigr)^{k+1}.
\]
If we define the event $B_n^\varepsilon = \{ U_n^\varepsilon(f(n^{-\varepsilon})) - n^{-\varepsilon} \leq n^{-1/3}\log n \}$, then by a similar reasoning as in the proof of Lemma 4.2, it follows that $\bigl( U_n^\varepsilon(f(n^{-\varepsilon})) - n^{-\varepsilon} \bigr)^{k+1} = o_p(n^{-(2k+1)/6})$. The first integral on the right hand side of (5.5) can be written as
\[
\Bigl( \int_{n^{-\varepsilon}}^{U_n^\varepsilon(f(n^{-\varepsilon}))} |\hat f_n(x)-f(n^{-\varepsilon})|^k\,dx \Bigr) 1_{B_n^\varepsilon}
+ \Bigl( \int_{n^{-\varepsilon}}^{U_n^\varepsilon(f(n^{-\varepsilon}))} |\hat f_n(x)-f(n^{-\varepsilon})|^k\,dx \Bigr) 1_{(B_n^\varepsilon)^c},
\]

where the second term is of the order $o_p(n^{-(2k+1)/6})$ by the same reasoning as before. To bound
\[
\Bigl( \int_{n^{-\varepsilon}}^{U_n^\varepsilon(f(n^{-\varepsilon}))} |\hat f_n(x)-f(n^{-\varepsilon})|^k\,dx \Bigr) 1_{B_n^\varepsilon} \tag{5.6}
\]
we distinguish between two cases:
(i) $1/6<\varepsilon\leq 1/3$;
(ii) $1/3<\varepsilon<(k-1)/(3k-6)$.

In case (i), the integral (5.6) can be bounded by $|\hat f_n(n^{-\varepsilon})-f(n^{-\varepsilon})|^k\, n^{-1/3}\log n$. According to Theorem 3.1 in Kulikov and Lopuhaä (2002), for $0<\alpha<1/3$,
\[
n^{1/3}\bigl( \hat f_n(n^{-\alpha}) - f(n^{-\alpha}) \bigr) \to \bigl( 4f(0)|f'(0)| \bigr)^{1/3}\, V(0), \tag{5.7}
\]
in distribution, where $V(0)$ is defined in (1.2). It follows that $\hat f_n(n^{-\varepsilon})-f(n^{-\varepsilon}) = O_p(n^{-1/3})$ and therefore (5.6) is of the order $o_p(n^{-(2k+1)/6})$.

In case (ii), similar to Lemma 4.2, we will construct a suitable sequence $(a_i)_{i=1}^m$, such that the intervals $(n^{-a_i}, n^{-a_{i+1}}]$, for $i=1,2,\ldots,m-1$, cover the interval $(n^{-\varepsilon}, U_n^\varepsilon(f(n^{-\varepsilon}))]$, and such that the integrals over these intervals can be bounded appropriately. First of all, let
\[
\varepsilon = a_1 > a_2 > \cdots > a_{m-1} \geq 1/3 > a_m, \tag{5.8}
\]
and let $z_i = n^{-a_i}$, $i=1,\ldots,m$, so that $n^{-\varepsilon} = z_1 < \cdots < z_{m-1} \leq n^{-1/3} < z_m$. Then, similar to the proof of Lemma 4.2, we can bound (5.6) as follows:
\[
\Bigl( \int_{n^{-\varepsilon}}^{U_n^\varepsilon(f(n^{-\varepsilon}))} |\hat f_n(x)-f(n^{-\varepsilon})|^k\,dx \Bigr) 1_{B_n^\varepsilon}
\leq \sum_{i=1}^{m-1} (z_{i+1}-z_i)\, |\hat f_n(z_i)-f(n^{-\varepsilon})|^k.
\]
Since $1/3\leq a_i\leq\varepsilon<1$, for $i=1,\ldots,m-1$, we can apply (4.9) and conclude that each term is of the order $O_p\bigl( n^{-a_{i+1}-k(1-a_i)/2} \bigr)$. Therefore, it suffices to construct a sequence $(a_i)$ satisfying (5.8), as well as
\[
a_{i+1} + \frac{k(1-a_i)}{2} > \frac{2k+1}{6}, \quad\text{for all } i=1,\ldots,m-1. \tag{5.9}
\]
One may take $a_1=\varepsilon$ and
\[
a_{i+1} = \frac{k(a_i-1)}{2} + \frac{2k+1}{6} + \frac18\Bigl( \frac{k-1}{3(k-2)} - \varepsilon \Bigr), \quad\text{for } i=1,\ldots,m-1.
\]
Then (5.9) is satisfied and it remains to show that the described sequence strictly decreases and reaches $1/3$ in finitely many steps. This follows from the fact that $a_i\leq\varepsilon$ and $k\geq 2.5$, since in that case
\[
a_i - a_{i+1} = \frac{k-2}{2}\Bigl( \frac{k-1}{3(k-2)} - a_i \Bigr) - \frac18\Bigl( \frac{k-1}{3(k-2)} - \varepsilon \Bigr)
\geq \frac{4k-9}{8}\Bigl( \frac{k-1}{3(k-2)} - \varepsilon \Bigr) > 0.
\]
As in the proof of Lemma 4.2, the argument for the second integral is similar. Now take $B_n^\varepsilon = \{ 1 - U_n^\varepsilon(f(1-n^{-\varepsilon})) \leq n^{-1/3}\log n \}$. The case $1/6<\varepsilon\leq 1/3$ can be treated in the

same way as before. For the case $1/3<\varepsilon<(k-1)/(3k-6)$, we can use the same sequence $(a_i)$ as above, but now define $z_i = 1-n^{-a_i}$, $i=1,\ldots,m$, so that $1-n^{-\varepsilon} = z_1 > \cdots > z_{m-1} \geq 1-n^{-1/3} > z_m$. Then we are left with considering
\[
\Bigl( \int_{U_n^\varepsilon(f(1-n^{-\varepsilon}))}^{1-n^{-\varepsilon}} |f(1-n^{-\varepsilon})-\hat f_n(x)|^k\,dx \Bigr) 1_{B_n^\varepsilon}
\leq \sum_{i=1}^{m-1} (z_i - z_{i+1})\, |f(1-n^{-\varepsilon})-\hat f_n(z_i)|^k.
\]
As before, each term in the sum is of the order $O_p\bigl( n^{-a_{i+1}-k(1-a_i)/2} \bigr)$, for $i=1,\ldots,m-1$. The sequence chosen above satisfies (5.9) and (5.8), which implies that the sum above is of the order $o_p(n^{-(2k+1)/6})$. □

Apart from (5.4) we also need to bound the difference between the integrals for $U_n$ and its cut-off version $U_n^\varepsilon$:
\[
\Bigl| \int_0^{f(0)} \frac{|U_n(a)-g(a)|^k}{|g'(a)|^{k-1}}\,da - \int_{f(1-n^{-\varepsilon})}^{f(n^{-\varepsilon})} \frac{|U_n^\varepsilon(a)-g(a)|^k}{|g'(a)|^{k-1}}\,da \Bigr| \tag{5.10}
\]
\[
\leq \int_{\tilde f_n(n^{-\varepsilon})}^{f(0)} \frac{|U_n(a)-g(a)|^k}{|g'(a)|^{k-1}}\,da + \int_{f(1)}^{\tilde f_n(1-n^{-\varepsilon})} \frac{|U_n(a)-g(a)|^k}{|g'(a)|^{k-1}}\,da.
\]
The next lemma shows that both integrals on the right hand side are of negligible order.

Lemma 5.4 For $k\geq 2.5$, let $1/6<\varepsilon<(k-1)/(3k-6)$. Furthermore, let $U_n$ be defined in (1.1) and let $\tilde f_n$ be defined in (4.1). Then
\[
\int_{\tilde f_n(n^{-\varepsilon})}^{f(0)} \frac{|U_n(a)-g(a)|^k}{|g'(a)|^{k-1}}\,da = o_p(n^{-(2k+1)/6}),
\]
and
\[
\int_{f(1)}^{\tilde f_n(1-n^{-\varepsilon})} \frac{|U_n(a)-g(a)|^k}{|g'(a)|^{k-1}}\,da = o_p(n^{-(2k+1)/6}).
\]

Proof: Consider the first integral and define the event $A_n = \{ f(0)-\tilde f_n(n^{-\varepsilon}) < n^{-1/6}/\log n \}$. For $1/6<\varepsilon\leq 1/3$, according to (5.7) we have that
\[
f(0)-\tilde f_n(n^{-\varepsilon}) \leq |\hat f_n(n^{-\varepsilon})-f(0)|
\leq |\hat f_n(n^{-\varepsilon})-f(n^{-\varepsilon})| + n^{-\varepsilon}\sup_x |f'(x)|
= O_p(n^{-1/3}) + O(n^{-\varepsilon}) = o_p(n^{-1/6}/\log n).
\]
This means that if $1/6<\varepsilon\leq 1/3$, the probability $P(A_n^c)\to 0$. For $1/3<\varepsilon<1$,
\[
P(A_n^c) \leq P\bigl( f(0)-\tilde f_n(n^{-\varepsilon}) \geq n^{-1/6}/\log n \bigr)
\leq P\Bigl( |\hat f_n(n^{-\varepsilon})-f(n^{-\varepsilon})| \geq n^{-1/6}/\log n - n^{-\varepsilon}\sup_x |f'(x)| \Bigr) \to 0,
\]
since according to (4.9), $\hat f_n(n^{-\varepsilon})-f(n^{-\varepsilon})$ is of the order $n^{-(1-\varepsilon)/2}$. Next, write the first integral as
\[
\Bigl( \int_{\tilde f_n(n^{-\varepsilon})}^{f(0)} \frac{|U_n(a)-g(a)|^k}{|g'(a)|^{k-1}}\,da \Bigr) 1_{A_n}
+ \Bigl( \int_{\tilde f_n(n^{-\varepsilon})}^{f(0)} \frac{|U_n(a)-g(a)|^k}{|g'(a)|^{k-1}}\,da \Bigr) 1_{A_n^c}. \tag{5.11}
\]

Similar to the argument used in Lemma 4.2, the second integral in (5.11) is of the order $o_p(n^{-(2k+1)/6})$. The expectation of the first integral is bounded by
\[
E\int_{f(0)-n^{-1/6}/\log n}^{f(0)} \frac{|U_n(a)-g(a)|^k}{|g'(a)|^{k-1}}\,da
\leq C_1 n^{-k/3} \int_{f(0)-n^{-1/6}/\log n}^{f(0)} E|V_n^E(a)|^k\,da
\leq C_2\, n^{-(2k+1)/6}/\log n,
\]
using Lemma A.1. The Markov inequality implies that the first term in (5.11) is of the order $o_p(n^{-(2k+1)/6})$. For the second integral the proof is similar. □

Theorem 5.1 Suppose conditions (A1)-(A3) of Theorem 1.1 are satisfied. Then for $k\geq 2.5$ and for any $\varepsilon$ such that $1/6<\varepsilon<(k-1)/(3k-6)$,
\[
n^{1/6}\biggl( n^{1/3}\Bigl( \int_{n^{-\varepsilon}}^{1-n^{-\varepsilon}} |\hat f_n(x)-f(x)|^k\,dx \Bigr)^{1/k} - \mu_k \biggr)
\]
converges in distribution to a normal random variable with zero mean and variance $\sigma_k^2$, where $\mu_k$ and $\sigma_k^2$ are defined in Theorem 1.1.

Proof: As in the proof of Theorem 1.1, it suffices to show that the difference
\[
\Bigl| \int_{n^{-\varepsilon}}^{1-n^{-\varepsilon}} |\hat f_n(x)-f(x)|^k\,dx - \int_0^{f(0)} \frac{|U_n(a)-g(a)|^k}{|g'(a)|^{k-1}}\,da \Bigr|
\]
is of the order $o_p(n^{-(2k+1)/6})$. We can bound this difference by
\[
\Bigl| \int_{n^{-\varepsilon}}^{1-n^{-\varepsilon}} |\hat f_n(x)-f(x)|^k\,dx - \int_{n^{-\varepsilon}}^{1-n^{-\varepsilon}} |f_n^\varepsilon(x)-f(x)|^k\,dx \Bigr| \tag{5.12}
\]
\[
+ \Bigl| \int_0^{f(0)} \frac{|U_n(a)-g(a)|^k}{|g'(a)|^{k-1}}\,da - \int_{f(1-n^{-\varepsilon})}^{f(n^{-\varepsilon})} \frac{|U_n^\varepsilon(a)-g(a)|^k}{|g'(a)|^{k-1}}\,da \Bigr| \tag{5.13}
\]
\[
+ \Bigl| \int_{n^{-\varepsilon}}^{1-n^{-\varepsilon}} |f_n^\varepsilon(x)-f(x)|^k\,dx - \int_{f(1-n^{-\varepsilon})}^{f(n^{-\varepsilon})} \frac{|U_n^\varepsilon(a)-g(a)|^k}{|g'(a)|^{k-1}}\,da \Bigr|. \tag{5.14}
\]
Differences (5.12) and (5.13) can be bounded as in (5.4) and (5.10), so that Lemmas 5.3 and 5.4 imply that these terms are of the order $o_p(n^{-(2k+1)/6})$. Finally, Lemma 3.1 implies that (5.14) is bounded by
\[
C\int_{f(1-n^{-\varepsilon})}^{f(n^{-\varepsilon})} \frac{|U_n^\varepsilon(a)-g(a)|^{k+1}}{|g'(a)|^{k}}\,da.
\]
Write the integral as
\[
\int_0^{f(0)} \frac{|U_n(a)-g(a)|^{k+1}}{|g'(a)|^{k}}\,da
+ \Bigl( \int_{f(1-n^{-\varepsilon})}^{f(n^{-\varepsilon})} \frac{|U_n^\varepsilon(a)-g(a)|^{k+1}}{|g'(a)|^{k}}\,da - \int_0^{f(0)} \frac{|U_n(a)-g(a)|^{k+1}}{|g'(a)|^{k}}\,da \Bigr).
\]
Then Corollary 2.1 and Lemma 5.4 imply that both terms are of the order $o_p(n^{-(2k+1)/6})$. This proves the theorem. □

A Appendix

The proofs in Section 2 follow the same line of reasoning as in Groeneboom et al. (1999). Since we will frequently use results from this paper, we state them for easy reference. First, the tail probabilities of $V_n^J$ have a uniform exponential upper bound.

Lemma A.1 For $J=E,B,W$, let $V_n^J$ be defined by (2.2). Then there exist constants $C_1,C_2>0$, only depending on $f$, such that for all $n\geq 1$, $a\in(0,f(0))$ and $x>0$,
\[
P\bigl( |V_n^J(a)| \geq x \bigr) \leq C_1 \exp(-C_2 x^3).
\]

Properly normalized versions of $V_n^J(a)$ converge in distribution to $\xi(c)$, defined in (1.3). To be more precise, for $a\in(0,f(0))$, define
\[
\varphi_1(a) = |f'(g(a))|^{2/3} (4a)^{-1/3} > 0, \tag{A.15}
\]
\[
\varphi_2(a) = (4a)^{1/3} |f'(g(a))|^{1/3} > 0, \tag{A.16}
\]
and let $J_n(a) = \{ c : a-\varphi_2(a)cn^{-1/3} \in (0,f(0)) \}$. For $J=E,B,W$ and $c\in J_n(a)$, define
\[
V_{n,a}^J(c) = \varphi_1(a)\, V_n^J\bigl( a-\varphi_2(a)cn^{-1/3} \bigr). \tag{A.17}
\]
Then we have the following property.

Lemma A.2 For $J=E,B,W$, integer $d\geq 1$, $a\in(0,f(0))$ and $(c_1,\ldots,c_d)\in J_n(a)^d$, we have joint distributional convergence of $(V_{n,a}^J(c_1),\ldots,V_{n,a}^J(c_d))$ to the random vector $(\xi(c_1),\ldots,\xi(c_d))$.

Due to the fact that Brownian motion has independent increments, the process $V_n^W$ is mixing.

Lemma A.3 The process $\{ V_n^W(a) : a\in(0,f(0)) \}$ is strong mixing with mixing function
\[
\alpha_n(d) = 12\, e^{-C_3 n d^3},
\]
where the constant $C_3>0$ only depends on $f$.

As a direct consequence of Lemma A.3 we have the following lemma, which is a slight extension of Lemma 4.1 in Groeneboom et al. (1999).

Lemma A.4 Let $l,m\geq 0$ be fixed integers such that $l+m>0$ and let $h$ be a continuous function. Define
\[
c_h = 2\int_0^1 (4f(x))^{(2l+2m+1)/3}\, |f'(x)|^{(4-4l-4m)/3}\, h(f(x))^2\,dx.
\]
Then,
\[
\mathrm{var}\Bigl( n^{1/6}\int_0^{f(0)} V_n^W(a)^l\, |V_n^W(a)|^m\, h(a)\,da \Bigr)
\to c_h \int_0^\infty \mathrm{cov}\bigl( \xi(0)^l|\xi(0)|^m,\, \xi(c)^l|\xi(c)|^m \bigr)\,dc,
\]
as $n\to\infty$.

Proof: The proof runs along the lines of the proof of Lemma 4.1 in Groeneboom et al. (1999). We first have that
\[
\mathrm{var}\Bigl( n^{1/6}\int_0^{f(0)} V_n^W(a)^l\, |V_n^W(a)|^m\, h(a)\,da \Bigr)
= 2\int_0^{f(0)} \int_0^{n^{1/3}\varphi_2(a)^{-1}(a\wedge(f(0)-a))} (4a)^{(2l+2m+1)/3}\, |g'(a)|^{(4(l+m)-1)/3}
\]
\[
\times\, h(a)\, h\bigl( a-\varphi_2(a)n^{-1/3}c \bigr)\,
\mathrm{cov}\bigl( V_{n,a}^W(0)^l\, |V_{n,a}^W(0)|^m,\; V_{n,a}^W(c)^l\, |V_{n,a}^W(c)|^m \bigr)\,dc\,da.
\]
According to Lemma A.1, for $a$ and $c$ fixed, the sequence $V_{n,a}^W(c)^l |V_{n,a}^W(c)|^m$ is uniformly integrable. Hence by Lemma A.2 the moments of
$\bigl( V_{n,a}^W(0)^l |V_{n,a}^W(0)|^m,\, V_{n,a}^W(c)^l |V_{n,a}^W(c)|^m \bigr)$
converge to the corresponding moments of $(\xi(0)^l|\xi(0)|^m, \xi(c)^l|\xi(c)|^m)$. Again Lemma A.1 and the fact that $l+m>0$ yield that $E|V_{n,a}^W(0)|^{3(l+m)} < C$ and $E|V_{n,a}^W(c)|^{3(l+m)} < C$, where $C>0$ does not depend on $n$, $a$ and $c$. Together with Lemma A.3 and Lemma 3.2 in Groeneboom et al. (1999), this yields that
\[
\bigl| \mathrm{cov}\bigl( V_{n,a}^W(0)^l |V_{n,a}^W(0)|^m,\; V_{n,a}^W(c)^l |V_{n,a}^W(c)|^m \bigr) \bigr| \leq D_1 e^{-D_2 c^3},
\]
where $D_1$ and $D_2$ do not depend on $n$, $a$ and $c$. It follows by dominated convergence that
\[
\mathrm{var}\Bigl( n^{1/6}\int_0^{f(0)} V_n^W(a)^l\, |V_n^W(a)|^m\, h(a)\,da \Bigr)
\to c_h \int_0^\infty \mathrm{cov}\bigl( \xi(0)^l|\xi(0)|^m,\, \xi(c)^l|\xi(c)|^m \bigr)\,dc,
\]
using that the process $\xi$ is stationary, where
\[
c_h = 2\int_0^{f(0)} (4a)^{(2l+2m+1)/3}\, |g'(a)|^{(4l+4m-1)/3}\, h(a)^2\,da
= 2\int_0^1 (4f(x))^{(2l+2m+1)/3}\, |f'(x)|^{(4-4l-4m)/3}\, h(f(x))^2\,dx.
\]
This proves the lemma. □

Proof of Theorem 2.1: Write $T_{n,k} = n^{1/6}\int_0^{f(0)} W_n^k(a)\,da$ and define
\[
W_n^k(a) = \frac{|V_n^W(a)|^k - E|V_n^W(a)|^k}{|g'(a)|^{k-1}},
\qquad
L_n = n^{-1/3}(\log n)^3, \qquad M_n = n^{-1/3}\log n,
\]
\[
N_n = \Bigl[ \frac{f(0)}{L_n+M_n} \Bigr] = \Bigl[ \frac{f(0)\, n^{1/3}}{\log n + (\log n)^3} \Bigr],
\]

where $[x]$ denotes the integer part of $x$. We divide the interval $(0,f(0))$ into $2N_n+1$ blocks of alternating length:
\[
A_j = \bigl( (j-1)(L_n+M_n),\; (j-1)(L_n+M_n)+L_n \bigr],
\]
\[
B_j = \bigl( (j-1)(L_n+M_n)+L_n,\; j(L_n+M_n) \bigr],
\]
where $j=1,\ldots,N_n$. Now write $T_{n,k} = S_{n,k} + S_{n,k}' + R_{n,k}$, where
\[
S_{n,k} = n^{1/6}\sum_{j=1}^{N_n} \int_{A_j} W_n^k(a)\,da, \qquad
S_{n,k}' = n^{1/6}\sum_{j=1}^{N_n} \int_{B_j} W_n^k(a)\,da, \qquad
R_{n,k} = n^{1/6}\int_{N_n(L_n+M_n)}^{f(0)} W_n^k(a)\,da.
\]
From here on, the proof is completely the same as the proof of Theorem 4.1 in Groeneboom et al. (1999). Therefore we omit all specific details and only give a brief outline of the argument. Lemmas A.1 and A.3 imply that all moments of $W_n^k(a)$ are bounded uniformly in $a$, and that
\[
\bigl| E\, W_n^k(a) W_n^k(b) \bigr| \leq D_1 \exp\bigl( -D_2\, n|b-a|^3 \bigr).
\]
This is used to ensure that $E R_{n,k}^2 \to 0$ and that the contribution of the small blocks is negligible: $E(S_{n,k}')^2 \to 0$. We then only have to consider the contribution over the big blocks. When we denote
\[
Y_j = n^{1/6}\int_{A_j} W_n^k(a)\,da \qquad\text{and}\qquad \sigma_n^2 = \mathrm{var}\Bigl( \sum_{j=1}^{N_n} Y_j \Bigr),
\]
one finds that
\[
\biggl| E\exp\Bigl( \frac{iu}{\sigma_n}\sum_{j=1}^{N_n} Y_j \Bigr) - \prod_{j=1}^{N_n} E\exp\Bigl( \frac{iu}{\sigma_n} Y_j \Bigr) \biggr|
\leq 4(N_n-1)\exp(-C_3 n M_n^3),
\]
where $C_3>0$ only depends on $f$. This means that we can apply the central limit theorem to independent copies of $Y_j$. Hence, asymptotic normality of $S_{n,k}$ follows if we show that the contribution of the big blocks satisfies the Lindeberg condition, i.e., for each $\varepsilon>0$,
\[
\frac{1}{\sigma_n^2}\sum_{j=1}^{N_n} E\, Y_j^2\, 1_{\{|Y_j|>\varepsilon\sigma_n\}} \to 0, \qquad n\to\infty. \tag{A.18}
\]
By using the uniform boundedness of the moments of $W_n^k(a)$, we have that
\[
\frac{1}{\sigma_n^2}\sum_{j=1}^{N_n} E\, Y_j^2\, 1_{\{|Y_j|>\varepsilon\sigma_n\}}
\leq \frac{1}{\varepsilon\sigma_n^3}\, N_n \sup_{1\leq j\leq N_n} E|Y_j|^3
= O\bigl( \sigma_n^{-3}\, n^{-1/6} (\log n)^6 \bigr).
\]

Similar computations as in the proof of Theorem 4.1 in Groeneboom et al. (1999) yield that $\sigma_n^2 = \mathrm{var}(T_{n,k}) + o(1)$. Application of Lemma A.4, with $l=0$, $m=k$ and $h(a) = 1/|g'(a)|^{k-1}$, implies that $\sigma_n^2 \to \sigma^2$, which proves (A.18). □

In order to prove Lemma 2.1, we first prove the following lemma.

Lemma A.5 Let $V_n^W$ be defined by (2.2) and let $V(0)$ be defined by (1.2). Then for $k\geq 1$, and for all $a$ such that
\[
F(g(a))\bigl( 1-F(g(a)) \bigr) \geq n^{-1/3}\log n, \tag{A.19}
\]
we have
\[
E|V_n^W(a)|^k = E|V(0)|^k\, \frac{(4a)^{k/3}}{|f'(g(a))|^{2k/3}} + O\bigl( n^{-1/3}(\log n)^{k+3} \bigr),
\]
where the term $O(n^{-1/3}(\log n)^{k+3})$ is uniform in all $a$ satisfying (A.19).

Proof: The proof relies on the proof of Corollary 3.2 in Groeneboom et al. (1999). There it is shown that, if we define
\[
H_n(y) = n^{1/3}\Bigl( H\bigl( F(g(a)) + n^{-1/3}y \bigr) - g(a) \Bigr),
\]
with $H$ being the inverse of $F$, and
\[
V_{n,b} = \sup\bigl\{ y\in[-n^{1/3}F(g(a)),\, n^{1/3}(1-F(g(a)))] : W(y)-by^2 \text{ is maximal} \bigr\},
\]
with $b = |f'(g(a))|/(2a^2)$, then for the event $A_n = \{ |V_n^W(a)|\leq\log n,\; |H_n(V_{n,b})|\leq\log n \}$, one has that $P(A_n^c)$ is of the order $O(e^{-C(\log n)^3})$, which then implies that
\[
\sup_{a\in(0,f(0))} E\bigl| V_n^W(a) - H_n(V_{n,b}) \bigr| = O\bigl( n^{-1/3}(\log n)^4 \bigr).
\]
Similarly, together with an application of the mean value theorem, this yields
\[
\sup_{a\in(0,f(0))} E\bigl|\, |V_n^W(a)|^k - |H_n(V_{n,b})|^k \,\bigr| = O\bigl( n^{-1/3}(\log n)^{3+k} \bigr). \tag{A.20}
\]
Note that by definition, the argmax $V_{n,b}$ closely resembles the argmax $V_b(0)$, where
\[
V_b(c) = \mathop{\mathrm{argmax}}_{t\in\mathbb{R}} \bigl\{ W(t)-b(t-c)^2 \bigr\}. \tag{A.21}
\]
Therefore we write
\[
E|H_n(V_{n,b})|^k = E|H_n(V_b(0))|^k + E\bigl( |H_n(V_{n,b})|^k - |H_n(V_b(0))|^k \bigr). \tag{A.22}
\]
Since by Brownian scaling $V_b(c)$ has the same distribution as $b^{-2/3}V(cb^{2/3})$, where $V$ is defined in (1.2), together with the conditions on $f$, we find that
\[
E|H_n(V_b(0))|^k = a^{-k} E|V_b(0)|^k + O(n^{-1/3})
= \frac{(4a)^{k/3}}{|f'(g(a))|^{2k/3}}\, E|V(0)|^k + O(n^{-1/3}).
\]
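The Brownian scaling step used above can be spelled out. The following short derivation is our own sketch (using $W(su) \stackrel{d}{=} s^{1/2}W(u)$ for $s>0$ and the notation of (A.21)), not part of the original proof:

```latex
% Substituting t = su in (A.21) and using W(su) =_d s^{1/2} W(u):
V_b(c) = \operatorname*{argmax}_{t\in\mathbb{R}} \{ W(t) - b(t-c)^2 \}
       \stackrel{d}{=} s \operatorname*{argmax}_{u\in\mathbb{R}}
         \{ s^{1/2} W(u) - b s^2 (u - c/s)^2 \}
% Dividing the objective by s^{1/2} leaves the argmax unchanged:
       = s \operatorname*{argmax}_{u\in\mathbb{R}}
         \{ W(u) - b s^{3/2} (u - c/s)^2 \}.
% Choosing s = b^{-2/3} gives b s^{3/2} = 1 and c/s = c b^{2/3}, hence
V_b(c) \stackrel{d}{=} b^{-2/3}\, V(c\, b^{2/3}).
```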

As in the proof of Corollary 3.2 in Groeneboom et al. (1999), $V_{n,b}$ can only be different from $V_b(0)$ with probability of the order $O(e^{-\frac23(\log n)^3})$. Hence, from (A.22) we conclude that
\[
E|H_n(V_{n,b})|^k = \frac{(4a)^{k/3}}{|f'(g(a))|^{2k/3}}\, E|V(0)|^k + O(n^{-1/3}).
\]
Together with (A.20) this proves the lemma. □

Proof of Lemma 2.1: The result immediately follows from Lemma A.5. The values of $a$ for which condition (A.19) does not hold give a contribution of the order $O(n^{-1/3}\log n)$ to the integral $\int E|V_n^W(a)|^k\,da$, and finally,
\[
\int_0^{f(0)} \frac{(4a)^{k/3}}{|f'(g(a))|^{2k/3}}\, \frac{da}{|g'(a)|^{k-1}}
= \int_0^1 (4f(x))^{k/3}\, |f'(x)|^{k/3}\,dx. \qquad\Box
\]

Proof of Lemma 2.2: The proof of the first statement relies on the proof of Corollary 3.3 in Groeneboom et al. (1999). There it is shown that, if for $a$ belonging to the set
\[
J_n = \bigl\{ a : \text{both } a \text{ and } a(1-\xi_n n^{-1/2}) \in (0,f(0)) \bigr\},
\]
we define
\[
V_n^B(a,\xi_n) = V_n^B\bigl( a(1-n^{-1/2}\xi_n) \bigr) + n^{1/3}\Bigl( g\bigl( a(1-n^{-1/2}\xi_n) \bigr) - g(a) \Bigr),
\]
then for the event $A_n = \{ |\xi_n|\leq n^{1/6},\; |V_n^W(a)|\leq\log n,\; |V_n^B(a,\xi_n)|\leq\log n \}$, one has that $P(A_n^c)$ is of the order $O(e^{-C(\log n)^3})$, which then implies that
\[
\int_{a\in J_n} E\bigl| V_n^B(a,\xi_n) - V_n^W(a) \bigr|\,da = O\bigl( n^{-1/3}(\log n)^3 \bigr).
\]
Hence, by using the same method as in the proof of Lemma 2.1, we obtain
\[
\int_{a\in J_n} E\bigl|\, |V_n^B(a,\xi_n)|^k - |V_n^W(a)|^k \,\bigr|\,da = O\bigl( n^{-1/3}(\log n)^{k+2} \bigr).
\]
From Lemma A.1 it also follows that $E|V_n^B(a)|^k = O(1)$ and $E|V_n^W(a)|^k = O(1)$, uniformly with respect to $n$ and $a\in(0,f(0))$. Hence the contribution of the integrals over $[0,f(0)]\setminus J_n$ is negligible, and it remains to show that
\[
n^{1/6}\int_{a\in J_n} \bigl( |V_n^B(a,\xi_n)|^k - |V_n^B(a)|^k \bigr)\,da = o_p(1). \tag{A.23}
\]
For $k=1$, this is shown in the proof of Corollary 3.3 in Groeneboom et al. (1999), so we may assume that $k>1$. Completely similar to the proof in the case $k=1$, we first obtain
\[
n^{1/6}\int_{a\in J_n} \bigl( |V_n^B(a,\xi_n)|^k - |V_n^B(a)|^k \bigr)\,da
= n^{1/6}\int_0^{f(0)} \bigl( |V_n^B(a) - ag'(a)\xi_n n^{-1/6}|^k - |V_n^B(a)|^k \bigr)\,da + O_p(n^{-1/3}).
\]

Let $\varepsilon>0$ and write $\Delta_n(a) = ag'(a)\xi_n n^{-1/6}$. Then we can write
\[
n^{1/6}\int_0^{f(0)} \bigl( |V_n^B(a)-\Delta_n(a)|^k - |V_n^B(a)|^k \bigr)\,da
= n^{1/6}\int_0^{f(0)} \bigl( |V_n^B(a)-\Delta_n(a)|^k - |V_n^B(a)|^k \bigr)\, 1_{[0,\varepsilon]}\bigl( |V_n^B(a)| \bigr)\,da \tag{A.24}
\]
\[
+\; n^{1/6}\int_0^{f(0)} \bigl( |V_n^B(a)-\Delta_n(a)|^k - |V_n^B(a)|^k \bigr)\, 1_{(\varepsilon,\infty)}\bigl( |V_n^B(a)| \bigr)\,da. \tag{A.25}
\]
First consider the term (A.24) and distinguish between:
1. $|V_n^B(a)| < 2|\Delta_n(a)|$,
2. $|V_n^B(a)| \geq 2|\Delta_n(a)|$.
In case 1,
\[
\bigl|\, |V_n^B(a)-\Delta_n(a)|^k - |V_n^B(a)|^k \,\bigr|
\leq 3^k |\Delta_n(a)|^k + 2^k |\Delta_n(a)|^k
\leq (3^k+2^k)\, |ag'(a)\xi_n|^k\, n^{-k/6}.
\]
In case 2, note that
\[
\bigl|\, |V_n^B(a)-\Delta_n(a)|^k - |V_n^B(a)|^k \,\bigr| = k|\theta|^{k-1} |\Delta_n(a)|,
\]
where $\theta$ is between $|V_n^B(a)-\Delta_n(a)|$ and $|V_n^B(a)|$, so that on the event $\{|V_n^B(a)|\leq\varepsilon\}$ we have $|\theta|\leq \tfrac32\varepsilon$. Using that $\xi_n$ and $V_n^B$ are independent, the expectation of (A.24) is bounded from above by
\[
C_1 \varepsilon^{k-1}\, E|\xi_n| \int_0^{f(0)} a|g'(a)|\, P\bigl( |V_n^B(a)|\leq\varepsilon \bigr)\,da + O_p(n^{-(k-1)/6}),
\]
where $C_1>0$ only depends on $f$ and $k$. Hence, since $k>1$, we find that
\[
\limsup_{n\to\infty}\; n^{1/6}\int_0^{f(0)} E\bigl|\, |V_n^B(a)-ag'(a)\xi_n n^{-1/6}|^k - |V_n^B(a)|^k \,\bigr|\, 1_{[0,\varepsilon]}\bigl( |V_n^B(a)| \bigr)\,da \tag{A.26}
\]
is bounded from above by $C_2\varepsilon^{k-1}$, where $C_2>0$ only depends on $f$ and $k$. Letting $\varepsilon\downarrow 0$ and using that $k>1$ then yields that (A.24) tends to zero.

The term (A.25) is equal to
\[
\int_0^{f(0)} \frac{2\xi_n ag'(a)V_n^B(a) + (ag'(a)\xi_n)^2 n^{-1/6}}{|V_n^B(a)-\Delta_n(a)| + |V_n^B(a)|}\;
k\theta(a)^{k-1}\, 1_{(\varepsilon,\infty)}\bigl( |V_n^B(a)| \bigr)\,da, \tag{A.27}
\]
where $\theta(a)$ is between $|V_n^B(a)-\Delta_n(a)|$ and $|V_n^B(a)|$. Note that for $|V_n^B(a)|>\varepsilon$,
\[
\Bigl| \frac{2V_n^B(a)}{|V_n^B(a)-\Delta_n(a)| + |V_n^B(a)|} - \frac{V_n^B(a)}{|V_n^B(a)|} \Bigr|
\leq \frac{|ag'(a)n^{-1/6}\xi_n|}{\varepsilon} = O_p(n^{-1/6}),
\]
uniformly in $a\in(0,f(0))$, so that (A.27) is equal to
\[
k\xi_n \int_0^{f(0)} ag'(a)V_n^B(a)\, |V_n^B(a)|^{k-2}\, 1_{(\varepsilon,\infty)}\bigl( |V_n^B(a)| \bigr)\,da
\]
\[
+\; k\xi_n \int_0^{f(0)} ag'(a)\, \frac{V_n^B(a)}{|V_n^B(a)|} \bigl( |V_n^B(a)|^{k-1} - \theta(a)^{k-1} \bigr)\, 1_{(\varepsilon,\infty)}\bigl( |V_n^B(a)| \bigr)\,da + O_p(n^{-1/6}).
\]

We have that
\[
\bigl|\, |V_n^B(a)|^{k-1} - \theta(a)^{k-1} \,\bigr|
\leq |V_n^B(a)|^{k-1}\, \Bigl| \Bigl( 1 - \frac{|\Delta_n(a)|}{|V_n^B(a)|} \Bigr)^{k-1} - 1 \Bigr| = O_p(n^{-1/6}),
\]
where the big $O$-term is uniform in $a$. This means that (A.27) is equal to
\[
k\xi_n \int_0^{f(0)} ag'(a)V_n^B(a)\, |V_n^B(a)|^{k-2}\,da \tag{A.28}
\]
\[
+\; k\xi_n \int_0^{f(0)} ag'(a)\,\mathrm{sign}\bigl( V_n^B(a) \bigr)\, |V_n^B(a)|^{k-1}\, 1_{[0,\varepsilon)}\bigl( |V_n^B(a)| \bigr)\,da + O_p(n^{-1/6}). \tag{A.29}
\]
The integral in (A.29) is of the order $O(\varepsilon^{k-1})$, whereas $E\xi_n^2 = 1$. Since $k>1$, this means that after letting $\varepsilon\downarrow 0$, (A.29) tends to zero. Finally, let $S_n^B(a) = ag'(a)V_n^B(a)|V_n^B(a)|^{k-2}$ and write
\[
E\Bigl( \xi_n \int_0^{f(0)} S_n^B(a)\,da \Bigr)^2
= \mathrm{var}\Bigl( \int_0^{f(0)} S_n^B(a)\,da \Bigr) + \Bigl( E\int_0^{f(0)} S_n^B(a)\,da \Bigr)^2.
\]
Then, since according to Lemma A.1 all moments of $S_n^B(a)$ are bounded uniformly in $a$, we find by dominated convergence and Lemma A.2 that
\[
\lim_{n\to\infty} E\int_0^{f(0)} S_n^B(a)\,da
= \int_0^{f(0)} ag'(a)\, \varphi_1(a)^{-k}\, E\bigl( \xi(0)|\xi(0)|^{k-2} \bigr)\,da = 0,
\]
because the distribution of $\xi(0)$ is symmetric. Applying Lemma A.4 with $l=1$, $m=k-2$ and $h(a) = ag'(a)$, we obtain
\[
\mathrm{var}\Bigl( \int_0^{f(0)} ag'(a)V_n^B(a)|V_n^B(a)|^{k-2}\,da \Bigr) = O(n^{-1/3}).
\]
We conclude that (A.27) tends to zero in probability. This proves the first statement of the lemma.

The proof of the second statement relies on the proof of Corollary 3.1 in Groeneboom et al. (1999). There it is shown that for the event $A_n = \{ |V_n^B(a)|<\log n,\; |V_n^E(a)|<\log n \}$ one has that $P(A_n^c)$ is of the order $O(e^{-C(\log n)^3})$. Furthermore, if $K_n = \{ \sup_t |E_n(t)-B_n(F(t))| \leq n^{-1/2}(\log n)^2 \}$, then $P(K_n)\to 1$ and
\[
E\bigl| V_n^E(a) - V_n^B(a) \bigr|\, 1_{A_n} 1_{K_n} = O\bigl( n^{-1/3}(\log n)^3 \bigr), \tag{A.30}
\]
uniformly in $a\in(0,f(0))$. By the mean value theorem, together with (A.30), we now have that
\[
E\bigl|\, |V_n^E(a)|^k - |V_n^B(a)|^k \,\bigr|\, 1_{K_n}
\leq k(\log n)^{k-1}\, E\bigl| V_n^E(a) - V_n^B(a) \bigr|\, 1_{A_n} 1_{K_n} + 2n^{k/3} P(A_n^c)
\]
\[
= O\bigl( n^{-1/3}(\log n)^{k+2} \bigr) + O\bigl( n^{k/3} e^{-C(\log n)^3} \bigr).
\]
This proves the lemma. □

Before proving Lemma 5.1, we first prove the following lemma.

Lemma A.6 Let $k\geq 2.5$ and $z_n = 1/(2nf(0))$. Then there exist $0<a_1<b_1<a_2<b_2<\infty$, such that for $i=1,2$,
\[
\liminf_{n\to\infty} P\Bigl( n\int_0^{z_n} |\hat f_n(x)-f(x)|^k\,dx \in [a_i,b_i] \Bigr) > 0.
\]

Proof: Consider the event $A_n = \{ X_i \geq z_n, \text{ for all } i=1,2,\ldots,n \}$. Then it follows that $P(A_n) \to 1/\sqrt e > 1/2$. Since on the event $A_n$ the estimator $\hat f_n$ is constant on the interval $[0,z_n]$, for any $a_i>0$ we have
\[
P\Bigl( n\int_0^{z_n} |\hat f_n(x)-f(x)|^k\,dx \in [a_i,b_i] \Bigr)
\geq P\Bigl( \Bigl( n\int_0^{z_n} |\hat f_n(0)-f(x)|^k\,dx \Bigr) 1_{A_n} \in [a_i,b_i] \Bigr)
\]
\[
= P\Bigl( \Bigl( \frac{|\hat f_n(0)-f(0)|^k}{2f(0)} + R_n \Bigr) 1_{A_n} \in [a_i,b_i] \Bigr),
\]
where
\[
R_n = n\int_0^{z_n} k\theta_n(x)^{k-1} \bigl( |\hat f_n(0)-f(x)| - |\hat f_n(0)-f(0)| \bigr)\,dx,
\]
with $\theta_n(x)$ between $|\hat f_n(0)-f(x)|$ and $|\hat f_n(0)-f(0)|$. Using (4.7), we obtain that $R_n$ is of the order $O_p(n^{-1})$ and therefore
\[
\frac{|\hat f_n(0)-f(0)|^k}{2f(0)} + R_n \to \frac{f(0)^{k-1}}{2}\, \Bigl| \sup_{1\leq j<\infty} \frac{j}{\Gamma_j} - 1 \Bigr|^k
\]
in distribution. Now choose $0<a_1<b_1<a_2<b_2<\infty$, such that for $i=1,2$,
\[
P\biggl( \frac{f(0)^{k-1}}{2}\, \Bigl| \sup_{1\leq j<\infty} \frac{j}{\Gamma_j} - 1 \Bigr|^k \in [a_i,b_i] \biggr) > 1 - 1/\sqrt e.
\]
Then, for $i=1,2$, we find
\[
P\Bigl( n\int_0^{z_n} |\hat f_n(x)-f(x)|^k\,dx \in [a_i,b_i] \Bigr)
\geq P\Bigl( \frac{|\hat f_n(0)-f(0)|^k}{2f(0)} + R_n \in [a_i,b_i] \Bigr) - P(A_n^c),
\]
which converges to a positive value. □

Proof of Lemma 5.1: Take $0<a_1<b_1<a_2<b_2<\infty$ as in Lemma A.6, and let $A_{ni}$ be the event
\[
A_{ni} = \Bigl\{ n\int_0^{z_n} |\hat f_n(x)-f(x)|^k\,dx \in [a_i,b_i] \Bigr\}.
\]
Then
\[
n^{k/3}\, E\int_0^{z_n} |\hat f_n(x)-f(x)|^k\,dx
\geq n^{k/3}\, E\Bigl( \int_0^{z_n} |\hat f_n(x)-f(x)|^k\,dx \Bigr) 1_{A_{n1}}
\geq a_1 n^{(k-3)/3}\, P(A_{n1}).
\]
Since, according to Lemma A.6, $P(A_{n1})$ tends to a positive constant, this proves (i).
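The limit $P(A_n)\to 1/\sqrt e$ in the proof of Lemma A.6 above comes from $P(X_i < z_n) = F(z_n) \approx f(0)z_n = 1/(2n)$, so that $P(A_n) = (1-F(z_n))^n \approx (1-1/(2n))^n \to e^{-1/2}$. A quick numerical sanity check of this last limit (illustrative only, not part of the proof):

```python
import math

# (1 - 1/(2n))^n increases towards exp(-1/2) ~ 0.6065 as n grows,
# matching P(A_n) -> 1/sqrt(e) in the proof of Lemma A.6.
for n in (10, 10**3, 10**6):
    p = (1 - 1 / (2 * n)) ** n
    print(n, p, math.exp(-0.5) - p)
```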

For (ii), write $X_n = n\int_0^{z_n} |\hat f_n(x)-f(x)|^k\,dx$, and define $B = \{ EX_n \geq (a_2+b_1)/2 \}$. Then
\[
\mathrm{var}(X_n) \geq E(X_n-EX_n)^2 1_{A_{n1}} 1_B + E(X_n-EX_n)^2 1_{A_{n2}} 1_{B^c}
\geq \tfrac14 (a_2-b_1)^2 P(A_{n1}) 1_B + \tfrac14 (a_2-b_1)^2 P(A_{n2}) 1_{B^c}
\]
\[
\geq \tfrac14 (a_2-b_1)^2 \min\bigl( P(A_{n1}), P(A_{n2}) \bigr).
\]
Hence, according to Lemma A.6,
\[
\liminf_{n\to\infty} \mathrm{var}\Bigl( n^{(2k+1)/6}\int_0^{z_n} |\hat f_n(x)-f(x)|^k\,dx \Bigr)
\geq \liminf_{n\to\infty}\; n^{(2k-5)/3}\, \tfrac14 (a_2-b_1)^2 \min\bigl( P(A_{n1}), P(A_{n2}) \bigr) = \infty. \qquad\Box
\]

References

Durot, C. (2002). Sharp asymptotics for isotonic regression. Probab. Theory Related Fields 122, no. 2, p. 222-240.

Grenander, U. (1956). On the theory of mortality measurements, Part II. Skand. Akt. Tid. 39, p. 125-153.

Groeneboom, P. (1985). Estimating a monotone density. In Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer (edited by L. Le Cam and R. Olshen) 2, p. 539-555.

Groeneboom, P., Hooghiemstra, G. and Lopuhaä, H.P. (1999). Asymptotic normality of the L1-error of the Grenander estimator. Ann. Statist. 27, no. 4, p. 1316-1347.

Groeneboom, P. and Wellner, J.A. (1992). Information Bounds and Nonparametric Maximum Likelihood Estimation. Birkhäuser Verlag.

Kim, J. and Pollard, D. (1990). Cube root asymptotics. Ann. Statist. 18, no. 1, p. 191-219.

Komlós, J., Major, P. and Tusnády, G. (1975). An approximation of partial sums of independent rv's, and the sample DF. I. Z. Wahrsch. Verw. Gebiete 32, p. 111-131.

Kulikov, V.N. and Lopuhaä, H.P. (2002). The behavior of the NPMLE of a decreasing density near the boundaries of the support. Tentatively accepted by Annals of Statistics.

Kulikov, V.N. and Lopuhaä, H.P. (2004). Testing for a monotone density using Lk-distances between the empirical distribution function and its concave majorant. Report 2004-28, Eurandom. Submitted. (http://www.eurandom.nl/reports/reports%202004/28vkreport.pdf)

Pollard, D. (1984). Convergence of Stochastic Processes. Springer-Verlag.

Prakasa Rao, B.L.S. (1969). Estimation of a unimodal density. Sankhya Ser. A 31, p. 23-36.

Sun, J. and Woodroofe, M. (1996).
Adaptive smoothing for a penalized NPMLE of a non-increasing density. J. Statist. Planning and Inference 52, p. 143-159.

Woodroofe, M. and Sun, J. (1993). A penalized maximum likelihood estimate of f(0+) when f is non-increasing. Statist. Sinica 3, p. 501-515.