On convex least squares estimation when the truth is linear

Size: px

Start display at page:

Download "On convex least squares estimation when the truth is linear"

Emerald Brown
5 years ago
Views:

1 Electronic Journal of Statistics Vol ISSN: DOI: 1.114/15-EJS198 On convex least squares estimation when the truth is linear Yining Chen and Jon A. Wellner London School of Economics and Political Science and University of Washington y.chen11@lse.ac.uk; jaw@stat.washington.edu Abstract: We prove that the convex least squares estimator LSE attains a n 1/ pointwise rate of convergence in any region where the truth is linear. In addition, the asymptotic distribution can be characterized by a modified invelope process. Analogous results hold when one uses the derivative of the convex LSE to perform derivative estimation. These asymptotic results facilitate a new consistent testing procedure on the linearity against a convex alternative. Moreover, we show that the convex LSE adapts to the optimal rate at the boundary points of the region where the truth is linear, up to a log-log factor. These conclusions are valid in the context of both density estimation and regression function estimation. MSC 1 subject classifications: Primary 6E, 6G7, 6G8, 6G1, 6G; secondary 6G15. Keywords and phrases: Adaptive estimation, convexity, density estimation, least squares, regression function estimation, shape constraint. Received December 14. Contents 1 Introduction Estimation of a density function A special case Rate of convergence Asymptotic distribution Testing against a general convex density More general settings Adaptation at the boundary points Estimation of a regression function Basic properties Rate of convergence and asymptotic distribution Appendices: proofs Appendix I: existence of the limit process Appendix II: pointwise adaptation for cases B and C Acknowledgments References

2 17 Y. Chen and J. A. Wellner 1. Introduction Shape-constrained estimation has received much attention recently. The attraction is the prospect of obtaining automatic nonparametric estimators with no smoothing parameters to choose. Convexity is among the popular shape constraints that are of both mathematical and practical interest. Groeneboom, Jongbloed and Wellner 1b show that under the convexity constraint, the least squares estimator LSE can be used to estimate both a density and a regression function. For density estimation, they showed that the LSE ˆf n of the true convex density f converges pointwise at a n /5 rate under certain assumptions. The corresponding asymptotic distribution can be characterized via a so-called invelope function investigated by Groeneboom, Jongbloed and Wellner 1a. In the regression setting, similar results hold for the LSE ˆr n of the true regression function r. However, in the development of their pointwise asymptotic theory, it is required that f or r has positive second derivative in a neighborhood of the point to be estimated. This assumption excludes certain convex functions that may be of practical value. Two further scenarios of interest are given below: Scenario 1. At the point x,thek-th derivative f k x = or r k x = for k =, 3,...,s 1andf s x > or r s x >, where s is an integer greater than one; Scenario. There exists some region [a, b] onwhichf or r is linear. Scenario 1 can be handled using techniques developed in Balabdaoui, Rufibach and Wellner 9. The aim of this manuscript is to provide theory in the setting of Scenario. We prove that for estimation of a convex density when Scenario holds, at any fixed point x a, b, the LSE ˆf n x converges pointwise to f x at a n 1/ rate. Its left or right derivative ˆf nx converges to f x at the same rate. The corresponding asymptotic distributions are characterized using { a modified invelope process. More generally, we show that the processes n ˆfn x f x : x [a+δ, b δ] } and { n ˆf nx f x : x [a+δ, b δ] } both converge weakly for any δ>. We remark that unlike the case of Groeneboom, Jongbloed and Wellner 1b, these processes are not local, because intuitively speaking, the linear behavior of the true function is non-local, which requires linearity on an interval with positive length. In addition, there does not exist a universal distribution of ˆf n on a, b, i.e. the pointwise limit distributions at different points are in general different. In addition, we show that the derived asymptotic processes can be used in a more practical setting where one would like to perform tests on the linearity against a convex alternative. Moreover, we study the adaptation of the LSE ˆf n at the boundary points of the linear region e.g. a and b. Note that the difficulty level of estimating f a and f b depends on the behavior of f outside [a, b]. Nevertheless, we show that ˆf n a or ˆf n b converges to f a or f b at the minimax optimal rate up to a negligible factor of log log n. Last but not least, we establish the analogous rate and asymptotic distribution results for the LSE ˆr n in the regression setting.

3 Convex least squares estimation 173 Our study yields a better understanding of the adaptation of the LSE in terms of pointwise convergence under the convexity constraint. It is also one of the first attempts to quantify the behavior of the convex LSE at non-smooth points. When the truth is linear, the minimax optimal n 1/ pointwise rate is indeed achieved by the LSE on a, b. The optimal rate at the boundary points a and b is also achievable by the LSE up to a log-log factor. These results can also be applied to the case where the true function consists of multiple linear components. Furthermore, our results can be viewed as an intermediate stage for the development of theory under misspecification. Note that linearity is regarded as the boundary case of convexity: if a function is non-convex, then its projection to the class of convex functions will have linear components. We conjecture that the LSE in these misspecified regions converges at an n 1/ rate, with the asymptotic distribution characterized by a more restricted version of the invelope process. More broadly, we expect that this type of behavior will be seen in situations of other shape restrictions, such as log-concavity for d =1 Balabdaoui, Rufibach and Wellner, 9 andk-monotonicity Balabdaoui and Wellner, 7. The LSE of a convex density function was first studied by Groeneboom, Jongbloed and Wellner 1b, where its consistency and some asymptotic distribution theory were provided. On the other hand, the idea of using the LSE for convex regression function estimation dates back to Hildreth Its consistency was proved by Hanson and Pledger 1976, with some rate results giveninmammen1991. In this manuscript, for the sake of mathematical convenience, we shall focus on the non-discrete version discussed by Balabdaoui and Rufibach 8. See Groeneboom, Jongbloed and Wellner 8 for the computational aspects of all the above-mentioned LSEs. There are studies similar to ours regarding other shape restrictions. See Remark. of Groeneboom 1985 and Carolan and Dykstra 1999 in the context of decreasing density function estimation a.k.a. Grenander estimator when the truth is flat and Balabdaoui 14 with regard to discrete log-concave distribution estimation when the true distribution is geometric. In addition, Groeneboom and Pyke 1983 studied the Grenander estimator s global behavior in the L norm under the uniform distribution and gave a connection to statistical problems involving combination of p-values and two-sample rank statistics, while Cator 11 studied the adaptivity of the LSE in monotone regression or density function estimation. For estimation under misspecification of various shape constraints, we point readers to Jankowski 14, Cule and Samworth 1, Dümbgen, Samworth and Schuhmacher 11, Chen and Samworth 13, and Balabdaoui et.al. 13. More recent developments on global rates of the shape-constrained methods can be found in Guntuboyina and Sen 15, Doss and Wellner 16, and Kim and Samworth 14. See also Meyer 13 and Chen and Samworth 16 where an additive structure is imposed in shape-constrained estimation in the multidimensional setting. The rest of the paper is organized as follows: in Section, we study the behavior of the LSE for density estimation. In particular, for notational convenience, we first focus on a special case where the true density function f is taken to be

4 174 Y. Chen and J. A. Wellner triangular. The convergence rate and asymptotic distribution are given in Section.1.1 and Section.1.. Section.1.3 demonstrates the practical use of our asymptotic results, where a new consistent testing procedure on the linearity against a convex alternative is proposed. More general cases are handled later in Section. based on the ideas illustrated in Section.1. Section.3 discusses the adaptation of the LSE at the boundary points. Analogous results with regard to regression function estimation are presented in Section 3. Some proofs, mainly on the existence and uniqueness of a limit process and the adaptation of the LSE, are deferred to the appendices.. Estimation of a density function Given n independent and identically distributed IID observations from a density function f :[, [,. Let F be the corresponding distribution function DF. In this section, we denote the convex cone of all non-negative continuous convex and integrable functions on [, byk. TheLSEoff is given by ˆf n =argmin gt dt gtdf n t, g K where F n is the empirical distribution function of the observations. Furthermore, we denote the DF of ˆf n by ˆF n. Throughout the manuscript, without specifying otherwise, the derivative of a convex function can be interpreted as either its left derivative or its right derivative..1. A special case To motivate the discussion, as well as for notational convenience, we take f t = 1 t1 [,1] t in Section.1.1 and Section.1.. Extensions to other convex density functions are presented in Section.. Consistency of ˆf n over, in this setting can be found in Groeneboom, Jongbloed and Wellner 1b. Balabdaoui 7 has shown how ˆf n can be used to provide consistent estimators of f and f. Here we concentrate on the LSE s rate of convergence and asymptotic distribution Rate of convergence The following proposition shows that the convergence rate at any interior point where f is linear is n 1/, thus achieving the optimal rate given in Example 1 of Cai and Low 15. Theorem.1 Pointwise rate of convergence. Suppose f t = 1 t1 [,1] t. Then for any fixed x, 1, ˆf n x f x = O p n 1/.

5 Convex least squares estimation 175 Fig 1. The case of c>: A illustrates the situation of k, where ˆF nx F x cx ; B illustrates the situation of k>, where ˆF nx F x < c1 x. In both scenarios, the dashed curve represents the supporting hyperplane of the LSE, while the thick solid line represents the true density f. Proof of Theorem.1 A key ingredient of this proof is the version of Marshall s lemma in this setting Dümbgen, Rufibach and Wellner, 7, Theorem 1, which states that ˆF n F F n F,.1 where is the uniform norm. Let c = ˆf n x f x. Two cases are considered here: a c>andb c<. In the first case, because ˆf n is convex, one can find a supporting hyperplane of ˆf n passing through x,f x +c such that ˆf n y ky x +f x +c, for any y [,, where k is a negative slope. If k, then ˆF n x F x cx >. Otherwise, ˆFn x F x < c1 x <. Figure 1 explains the above inequalities graphically. Consequently, ˆF n x F x c minx, 1 x. In the second case, if we can find t<x such that ˆf n and f intersect at the point t, t, then ˆF n t F t 1 t c t x t ; ˆFn x ˆF n t F x F t 1 c x t <. Otherwise, by taking t =, we can still verify that the above two inequalities hold true. Figure illustrates these findings graphically.

6 176 Y. Chen and J. A. Wellner Fig. The case of c<: A illustrates the inequalities when ˆf n and f has an intersection on [,x ; the situation where ˆf n and f has no intersection on [,x is represented in B. In both scenarios, the dotted curve represents the LSE and the thick solid line represents the true density f. Therefore, ˆF n F max { ˆF n t F t, ˆFn x F x ˆFn t F t } c t max x t,x t c x 4, where the last inequality uses the fact that for any t [,x, t /x t isan increasing function of t while x t is a decreasing function of t. By.1, we have that O p n 1/ =4 F n F ˆF n F min x, x, x c. 4 It then follows that c = O p n 1/, as desired. Corollary. Uniform rate of convergence. For any <δ 1/, sup ˆf n x f x = O p n 1/. x [δ,1 δ] Proof of Corollary. A closer look at the proof of Theorem.1 reveals that we have O p n 1/ =4 F n F min x, x, x 4 simultaneously for every x [δ, 1 δ]. Therefore, O p n 1/ = inf x, min x, x x [δ,1 δ] 4 sup x [δ,1 δ] ˆf n x f x ˆf n x f x. Consequently, sup x [δ,1 δ] ˆf n x f x is O p n 1/.

7 Convex least squares estimation 177 Let ˆf n and ˆf + n denote respectively the left and right derivatives of ˆf n.the same convergence rate also applies to these derivative estimators. Corollary.3 Uniform rate of convergence: derivatives. For any <δ 1/, sup max ˆf n x f x, ˆf n + x f x = O p n 1/. x [δ,1 δ] Proof of Corollary.3 By the convexity of ˆf n, sup max ˆf n x f x, ˆf n + x f x x [δ,1 δ] max ˆf n δ f δ, ˆf + 1 δ f 1 δ δ max ˆfn δ ˆf n δ/ f δ+f δ/, ˆfn 1 δ/ ˆf n 1 δ f 1 δ/ + f 1 δ = O p n 1/, where the final equation follows from Corollary Asymptotic distribution To study the asymptotic distribution of ˆf n, we start by characterizing the limit distribution. Theorem.4 Characterization of the limit process. Let Xt =UF t and Y t = t Xsds for any t, whereu is a standard Brownian bridge process on [, 1]. Then almost surely a.s., there exists a uniquely defined random continuously differentiable function H on [, 1] satisfying the following conditions: 1 Ht Y t for every t [, 1]; H has convex second derivative on, 1; 3 H = Y and H = X; 4 H1 = Y 1 and H 1 = X1; 5 Ht Y t dh 3 t =. The above claim also holds if we instead consider a standard Brownian motion Xt on [, 1] and replace X, Y, H T by X,Ỹ, H T,whereỸt = t Xsds, and where H is different from H. Figure 3 shows a typical realization of Xt, Y t, H t andht. A detailed construction of the above limit invelope process can be found in Appendix I. Note that our process is defined on a compact interval, so is technically different from the process presented in Groeneboom, Jongbloed and Wellner 1a which is defined on the whole real line. As a result, extra conditions regarding

8 178 Y. Chen and J. A. Wellner Fig 3. A typical realization of Xt, Y t, H t and Ht in Theorem.4 with Xt = UF t and F the distribution function with triangular density f t =1 t1 [,1] t: in A, Xt and H t are plotted in solid and dash-dotted curves respectively; in B, the corresponding Y t is plotted in solid curve, while the invelope process Ht is illustrated in dash-dotted curve. the behavior of H and H at the boundary points are imposed here to ensure its uniqueness. Other characterization of the limit process is also possible. A slight variant is given below, which proof can also be found in Appendix I. Corollary.5 Different characterization. Conditions 3 and 4 in the statement of Theorem.4 can be replaced respectively by 3 lim t + H t =, and 4 lim t 1 H t =. Now we are in the position to state our main result of this section. Theorem.6 Asymptotic distribution. Suppose f t = 1 t1 [,1] t. Then for any δ>, the process ˆfn x f n x ˆf nx f x H x H 3 x in C[δ, 1 δ] D[δ, 1 δ], where C is the space of continuous functions equipped with the uniform norm, D is the Skorokhod space, and H is the invelope process defined in Theorem.4. In particular, for any x, 1, n ˆfn x f x, ˆf nx f x T d H x,h 3 x T..

9 Convex least squares estimation 179 Proof of Theorem.6 Before proceeding to the proof, we first define the following processes on [, : X n t =n 1/ F n t F t ; t Y n t =n 1/ F n s F s ds; t { s } Ĥ n t =n 1/ ˆfn v f v dv ds..3 Furthermore, define the set of knots of a convex function f on, 1 as { Sf = t, 1 : ft < 1 ft δ+ft + δ, δ, min1 t, t }..4 We remark that the above definition of knots can be easily extended to convex functions with a different domain. By Lemma. of Groeneboom, Jongbloed and Wellner 1b, we have that Ĥ n t Y n t fort [, 1], with equality if t S ˆf n. In addition, Ĥnt Y n tdĥ3 n t =. Now define the space E m of vector-valued functions m 3 as E m = C[, 1] C[, 1] C[1/m, 1 1/m] D[1/m, 1 1/m] C[, 1] D[, 1]. and endow E m with the product topology induced by the uniform norm on C i.e. the space of continuous functions and Skorokhod metric on D i.e. the Skorokhod space. Let E m be supported by the stochastic process Z n = T Ĥn, Ĥ n, Ĥ n, Ĥ3 n,y n,x n. Corollary.3 entails the tightness of Ĥ n 3 in D[1/m, 1 1/m], while Corollary. entails the tightness of Ĥ n in C[1/m, 1 1/m]. In addition, both Ĥn and Ĥ n are tight in C[, 1] via an easy application of Marshall s lemma. Since X n converges to a Brownian bridge, it is tight in D[, 1] as well. Finally, the same is also true for Y n. Note that E m is separable. For any subsequence of Z n, by Prohorov s theorem, we can construct a further subsequence Z nj such that {Z nj } j converges weakly in E m to some Z = H,H,H,H3,Y,X T. In addition, it follows from a diagonalization argument that a further extraction can make {Z nj } j converge weakly to Z in every E m. Using Skorokhod s representation theorem, we can assume without loss of generality that for almost every ω in the sample space, Z nj ω Z ω. Moreover, since X has a continuous

10 18 Y. Chen and J. A. Wellner path a.s., the convergence of X nj can be strengthened to X nj ω Xω, where is the uniform norm. In the rest of the proof, ω is suppressed for notational convenience, so depending on the context, Z nj or Z can either mean a random variable or a particular realization. In the following, we shall verify that H satisfies all the conditions listed in the statement of Theorem.4. 1 The fulfillment of this condition follows from the fact that inf Ĥn t Y n t for all n N. t [,1] Since f is linear on [, 1], Ĥ n is convex on [, 1] for every n N. The pointwise limit of convex functions is still convex, so H is convex on [1/m, 1 1/m]. The condition is then satisfied by letting m. 3 This condition always holds in view of our construction of Ĥn in.3. 4 We consider two cases: a if H 1 1/m as m, then the conditions are satisfied in view of Corollary.5; b otherwise, it must be the case that H 1 is bounded from above. Note that H 1 is also bounded from below a.s., which can be proved by using Corollary.3 and the fact that Ĥ n is convex. Denote by τ nj theknotof ˆf nj closest to 1. Then by Lemma. of Groeneboom, Jongbloed and Wellner 1b, Ĥ nj τ nj =Y nj τ nj andĥ n j τ nj =X nj τ nj. Since t =1isaknot of f, consistency of ˆfnj allows us to see that lim j τ nj = 1. Because H 1 is finite and both Y tandxt are sample continuous processes, taking τ nj 1 yields H 1 = Y 1 and H 1 = X1. Note that this argument remains valid even if τ nj > 1, because in this scenario, Ĥ n j is linear and bounded on [ τ nj,τ nj ]. 5 It follows from 1/m 1/m Ĥnj t Y nj t dĥ3 n j t =that 1/m 1/m H t Y t dh 3 t =. Since this holds for any m, one necessarily has that H t Y t dh 3 t =. Consequently, in view of Theorem.4, the limit Z is the same for any subsequences of Z n in E m.fixanym > 1/δ. It follows that the full sequence {Z n } n converges weakly in E m and has the limit H, H,H,H 3,Y,X T. This, together with the fact that H 3 is continuous at any fixed x, 1 with probability one which can be proved using Conditions 1 and 5 of H, yields.. It can be inferred from Corollary.5 and Theorem.6 that both ˆf n and ˆf n 1 do not converge to the truth at a n 1/ rate. In fact, Balabdaoui 7

11 Convex least squares estimation 181 proved that ˆf n is an inconsistent estimator of f. Nevertheless, the following proposition shows that ˆf n is at most O p 1. For the case of the maximum likelihood estimator of a k monotone density, we refer readers to Gao and Wellner 9 for a similar result. Proposition.7 Behavior at zero. ˆfn = O p 1. Proof of Proposition.7 Let τ n be the left-most point in S ˆf n. Since ˆf n is finite for every n, the linearity of ˆf n on [,τ n ] means that ˆf n t 1 t/τ n ˆf n for every t. By Corollary.1 of Groeneboom, Jongbloed and Wellner 1b, F n τ n = τn ˆf n tdt τn 1 t/τ n ˆf n dt = τ n ˆfn /. It follows from Theorem 9.1. of Shorack and Wellner 1986 that ˆf n F n τ n /τ n sup t>.1.3. Testing against a general convex density F n t F t O p 1. F t t The asymptotic results established in Section.1. can be used to test the linearity of a density function against a general convex alternative. To illustrate the main idea, in this section, we focus on the problem of testing H : f t = 1 t1 [,1] t against H 1 : f K and f t 1 t1 [,1] t for some t,. The test we propose is free of tuning parameters. Since the triangular distribution is frequently used in practice, and is closely related to the uniform distribution e.g. the minimum of two independent U[, 1] is triangular, our test could be a valuable addition to practitioners toolkit. For other tests on the linearity in the regression setting, we point readers to Meyer 3 and Sen and Meyer 13. Our test is based on the statistic T n := { n sup 1 t1[,1] t ˆf n t }. t [, The behavior of T n under H is established below. Proposition.8 Behavior of T n under H. Write T := inf t [,1] H t, d where H is the invelope process defined in Theorem.4. ThenT n T.Moreover, P T = 1.

12 18 Y. Chen and J. A. Wellner Table 1 Estimated upper-quantiles of T based on 1 5 simulations α 1%.5% 5% 1% % t α Fig 4. The estimated density function of T based on 1 5 simulations. Here we used the kernel density estimator with the boundary correction at zero. Proof of Proposition.8 The first part follows directly from Theorem.6 and Corollary.5. For the second part, note that if T<, then inf t [,1] H t >, which would imply H 1 H >. But this would violate the characterization of H i.e. H 1 H = X1 X =, since X is a Brownian bridge. Consequently, P T = 1. The quantiles and the density function of T can be approximated numerically using Monte Carlo methods. They are given in Table 1 and Figure 4. Herewe denote the upper α quantile of T by t α. Our numerical evidence also suggests that the density function of T appears to be log-concave. For related work on log-concavity of Chernoff s distribution i.e. a different random variable also associated with a Brownian motion, see Balabdaoui and Wellner 14. For a test of size α, 1, we propose to reject H if T n >t α. According to Proposition.8, this test is asymptotically of size α. In the following, we prove that our test is also consistent, i.e., its power goes to one as n. Theorem.9 Consistency of the test. Under H 1, for any α>, P T n > t α 1 as n. Proof of Theorem.9 Suppose that f t 1 t1 [,1] t for some t,. First, we show that there exists some t, 1 such that f t < 1 t. Suppose the conclusion fails, then f t 1 t1 [,1] t for all t>. Hence f tdt

13 Convex least squares estimation t1 [,1] tdt = 1 with strict inequality if f t > 1 t1 [,1] t for some t>. But then f is not a density and we conclude that f t = 1 t1 [,1] t for every t,. This contradiction yields the conclusion. Next, it follows from Theorem 3.1 of Groeneboom, Jongbloed and Wellner 1b thatthelse ˆf n is consistent at t, i.e., ˆf n t f t a.s.. Therefore, almost surely, T n / n>1 t ˆf n t = 1 t f t +f t ˆf n t 1 t f t >. It follows that P T n >t α 1... More general settings The aim of this subsection is to extend the conclusions presented in Section.1 to more general convex densities. We assume that f is positive and linear on a, b for some a<b, where the open interval a, b is picked as the largest interval on which f remains linear. More precisely, it means that there does not exist a bigger open interval a,b i.e. a, b a,b, on which f is linear. For the sake of notational convenience, we suppress the dependence of H on a, b and F in the following two theorems. Their proofs are similar to those given in Section.1, so are omitted for brevity. Theorem.1 Characterization of the limit process. Let Xt =UF t and Y t = t Xsds for any t>. Then a.s., there exists a uniquely defined random continuously differentiable function H on [a, b] satisfying the following conditions: 1 H t Y t for every t [a, b]; H has convex second derivative on a, b; 3 H a =Y a and H a =Xa; 4 H b =Y b and H b =Xb; 5 b a H t Y t dh 3 t =. Theorem.11 Rate and asymptotic distribution. For any <δ b a/, sup x [a+δ,b δ] ˆf n x f x, ˆf nx f x = O p n 1/. Moreover, n ˆfn x f x ˆf nx f x H x x H 3 in C[a + δ, b δ] D[a + δ, b δ], where H is the invelope process defined in Theorem.1.

14 184 Y. Chen and J. A. Wellner Fig 5. Typical f in different cases are illustrated, where we set α =in B and C..3. Adaptation at the boundary points In this subsection, we study the pointwise convergence rate of the convex LSE ˆf n at the boundary points of the region where f is linear. Examples of such points include a and b given in Section.. To begin our discussion, we assume that x, is such a boundary point in the interior of the support i.e. f x >. Here again f is a convex and decreasing density function on [,. Three cases are under investigation as below. These cases are illustrated in Figure 5. A f t =f x +K 1 t x +K t x 1 [x, t for every t in a fixed small neighborhood of x,withk 1 + K < andk > ; B f t =f x +K 1 t x +K t x α 1 [x, t for every t in a fixed small neighborhood of x,withk 1 <, K > andα>1; C f t =f x +K 1 t x +K x t α 1 [,xt for every t in a fixed small neighborhood of x,withk 1 <, K > andα>1. Note that in all cases above, only the behavior of f in a small neighborhood of x is relevant. As pointed out in Example of Cai and Low 15, the minimax optimal convergence rate at x is n 1/3 in A. Furthermore, in B and C, Example 4 of Cai and Low 15 suggests that the optimal rate at x is n α/α+1. In the following, we prove that the convex LSE automatically adapts to optimal rates, up to a factor of log log n. Though almost negligible, the factor of log log n here indicates that there might be room for improvement. These results should be viewed as a first step for the investigation of the adaptation of the convex LSE. One could also compare our results with Cator 11, where the adaptation of the LSE in the context of decreasing density function estimation was tackled. Theorem.1 Adaptation at the boundary points: I. In the case of A, ˆf n x f x = O p n 1/3 log log n. Moreover, min ˆfn x f x, = O p n 1/.

15 Convex least squares estimation 185 Proof of Theorem.1 Suppose that for some fixed δ>, A holds for every t [x δ, x +δ]. Our first aim is to show that inf min ˆfn t f t, = O p n 1/..5 t [x δ,x +δ] Suppose that c := ˆf n t f t <. If we can find x δ t<x such that ˆf n and f intersect at the point t, f t, then ˆFn t ˆF n x δ F t F x δ 1 c t x + δ ; x t ˆFn x ˆF n t F x F t 1 c x t <. Otherwise, by taking t = x δ, we can still verify the above two inequalities. Therefore, O p n 1/ = ˆF n F max { ˆFn t F t ˆFn x δ F x δ, ˆFn x F x ˆFn t F t } c { t max x + δ },x t x t c δ 4, where the first equation follows from Marshall s lemma see.1. Consequently,.5 holds true. As a remark, the above arguments are essentially the same as those illustrated in the proof of Theorem.1, where one only relies on the fact that f is linear on [x δ, x ] i.e. one-sided linearity. In the rest of the proof, it suffices to only consider the situation of ˆf n x f x >. Recall that to handle this scenario, the proof of Theorem.1 makes use of the fact that the triangular density is linear on the whole [, 1] i.e. two-sided linearity. This is no longer true here since f is not linear on [x δ, x + δ]. In the following, we deploy a different strategy to establish the rate. Let τn =max{t S ˆf n,t < x } and τ n + =min{t S ˆf n,t x },where S is defined in.4. We consider three different cases separately. a max τ n + x,x τn >δ.thenf and ˆf n are linear on either [x δ, x ]or [x,x +δ]. It follows from the line of reasoning as in the proof of Theorem.1 that max ˆfn x f x, = O p n 1/. b max τ n + x,x τn δ and τ + n τn >n 1/3. Note that being in the set S ˆf n implies that F n τn = ˆF n τn andf n τ n + = ˆF n τ n + forτn and τ n +.Since ˆf n is linear on [τn,τ n + ], F n τ n + F τ n + F n τn F τn.6

16 186 Y. Chen and J. A. Wellner = τ + n τ n ˆfn t f t dt ˆfn x f x τ n + τn / +min ˆfn τn f τn, ˆf n τ n + f τ n +, τ n + τn. By the law of the iterated logarithm for local empirical processes cf. Lemma of Csörgő and Horvath 1993, the term displayed in.6 isatmost τ + n τn O p n 1/ log log n.inviewof.5, rearranging the terms in the above inequality yields ˆf n x f x τ n + τn 1/ O p n 1/ log log n + O p n 1/ O p n 1/3 log log n. c τ n + τn n 1/3. Now we define τ n ++ =max{t S ˆf n,t < x +n 1/3 } and τ n +++ =min{t S ˆf n,t x +n 1/3 }. Here the existence of τ n ++ is guaranteed by the condition that τ n + τn n 1/3. Furthermore, we note that in our definitions τ n + and τ n ++ might not be distinct. Within this setting, three further scenarios are to be dealt with. c1 τ n +++ x >δ. Since both f and ˆf n are linear on [τ n ++,τ n +++ ], one can apply a strategy similar to that used in the proof of Theorem.1 to show that ˆf n τ n ++ f τ n ++ = O p n 1/. Consequently, τ ++ ˆf n x f x = ˆf n τ n ++ f τ n ++ n ˆf n t f t dt = O p n 1/3, where we have used the facts that τ ++ n x x < n 1/3 and sup ˆf nt f t = O p 1. t [x,x +δ] Here the second fact can be derived by invoking K 1 +o p 1 inf ˆf nt sup ˆf nt K 1 +K +o p 1, t [x δ,x +δ] t [x δ,x +δ] which follows easily from consistency of ˆf n in estimating f at the points x ± δ. c 3n 1/3 <τ n +++ x δ.then τ n +++ τ n ++ >n 1/3. Using essentially the same argument as in b, we see that and hence, max ˆfn τ ++ n f τ ++ n, = O p n 1/3 log log n, ˆf n τ ++ n f τ ++ n = O p n 1/3 log log n.

17 Convex least squares estimation 187 We then apply the argument presented in c1 to derive ˆf n x f x = O p n 1/3 log log n. c3 τ +++ n x 3n 1/3.Thenn 1/3 τ +++ n τ + n n 1/3.Byproceeding as in the proof of Lemma 4.3 of Groeneboom, Jongbloed and Wellner 1b, we are able to verify that inf ˆf n t f t = O p n 1/3. t [τ n +,τ n +++ ] Here we invoked Lemma A.1 of Balabdaoui and Wellner 7 with k =1andd = to verify the above claim. See also Kim and Pollard 199. Finally, we can argue as in c1 to show that ˆf n x f x = O p n 1/3. The proof is complete by taking into account all the above cases. We remark that in our proof the worst case scenarios that drive the convergence rate are b and c. Theorem.13 Adaptation at the boundary points: II. In the case of B or C, ˆf n x f x = O p n α/α+1 log log n. Moreover, min ˆfn x f x, = O p n 1/. The proof of Theorem.13 is of more technical nature and is deferred to Appendix II. 3. Estimation of a regression function Changing notation slightly from the previous section, we now assume that we are given pairs {X n,i,y n,i :i =1,...,n} for n =1,,... with Y n,i = r X n,i +ɛ n,i, for i =1,...,n, where r :[, 1] R is a convex function, and where {ɛ n,i : i =1,...,n} is a triangular array of IID random variables satisfying i Eɛ 1,1 =andσ = Eɛ 1,1 <. To simplify our analysis, the following fixed design is considered: ii X n,i = i/n +1fori =1,...,n. The LSE of r proposed by Balabdaoui and Rufibach 8 is ˆr n =argmin g K 1 gt dt 1 n n gx n,i Y n,i, 3.1 i=1

18 188 Y. Chen and J. A. Wellner where, in this section, K denotes the set of all continuous convex functions on [, 1]. We note that the above estimator is slightly different from the more classical LSE that minimizes 1 n n i=1 gx n,i 1 n n i=1 gx n,iy n,i over K. Since its criterion function has a completely discrete nature, different techniques are needed to prove analogous results. We will not pursue this direction in the manuscript Basic properties In this subsection, we list some basic properties of ˆr n given as 3.1. For any t [, 1], define R t = ˆR n t = t t R n t = 1 n r sds; ˆr n sds; n Y n,i 1 [Xn,i, t. i=1 Proposition 3.1. The LSE ˆr n exists and is unique. Proposition 3.. Let Sˆr n denote the set of knots of ˆr n. The following properties hold: a t ˆRn s R n s { =, for all t Sˆrn {, 1}, ds, for all t [, 1]; b ˆR n t =R n t for any t Sˆr n {, 1}; c { t ˆRn s R n s ds } dˆr nt =. Proposition 3.3 Marshall s lemma. Proof of Propositions 3.1, 3. and 3.3 Note that we can rewrite ˆr n as 1 argmin gt dt g K ˆR n R R n R. 3. gtdr n t. From this perspective, it is easy to check that Proposition 3.1, Proposition 3. and Proposition 3.3 follow from, respectively, slight modifications of Lemma.1, Lemma. of Groeneboom, Jongbloed and Wellner 1b and Theorem 1 of Dümbgen, Rufibach and Wellner 7. We remark that this version of Marshall s lemma in the regression setting serves as an important tool to establish consistency and the rate. In particular, in

19 Convex least squares estimation 189 the following, we show that 3. easily yields consistency of ˆR n, and consistency of ˆr n follow from this together with convexity of ˆr n. Proposition 3.4 Consistency. Under Assumptions i ii, for any <δ 1/, sup ˆr n x r x a.s., as n. x [δ,1 δ] Proof of Proposition 3.4 The proof replies on the following intermediate result: suppose there are differentiable functions G,G 1,...,allare[, 1] R and have convex first derivative, then for any fixed <δ 1/, G k G implies sup t [δ,1 δ] G k t G t. To see this, we first show that lim sup k sup t [δ,1 δ] Note that for any t [δ, 1 δ] andany<ɛ δ, G kt G t Gk t + ɛ G k t max ɛ It then follows that lim sup k sup t [δ,1 δ] sup t [δ,1 δ] {G kt G t} G t + ɛ G t ɛ {G kt G t}. 3.3 G t, G kt G k t ɛ ɛ Our claim 3.3 can be verified by letting ɛ. Secondly, we show that lim inf k G t, G t G t ɛ ɛ inf t [,1] {G kt G t}. We prove this by contradiction. Suppose that lim inf k inf t [,1] {G kt G t} = M G t. G t. for some M>. By extracting subsequences if necessary, we can assume that inf t [,1] {G k t G t} M as k.inviewof3.3, it follows from the convexity of G k and G that one can find an interval I k of positive length δ which can depend on M such that inf t Ik {G k t G t} M/ for every k>k,wherek is a sufficiently large integer. This implies that G k G sup t I k G k t G t Mδ/4 >,

20 19 Y. Chen and J. A. Wellner for every k>k, which contradicts the fact that G k G ask. Combining these two parts together completes the proof of the intermediate result. a.s. Since R n R by empirical process theory, Proposition 3.3 entails that ˆR a.s. n R. Consistency of ˆr n then follows straightforwardly from the above intermediate result. 3.. Rate of convergence and asymptotic distribution In the following, we assume that r that is linear on a, b, 1. Moreover, a, b is largest in the sense that one can not find a bigger open interval a,b on which r remains linear. Theorem 3.5 Rate and asymptotic distribution. Under Assumptions i ii, for any <δ b a/, sup x [a+δ,b δ] ˆr n x r x, ˆr nx r x = O p n 1/. Moreover, n ˆrn x r x ˆr nx r x σ b a 1/ H x a b a b a 1/ H3 x a b a in C[a + δ, b δ] D[a + δ, b δ], where H is the invelope process defined in the second part of Theorem.4. The proof of Theorem 3.5 is very similar to what has already been shown in Section.1, so is omitted for the sake of brevity. In presence of the linearity of r on a, b, the limit distribution of the process nˆrn r ona, b does not depend on r a orr b. In addition, the above theorem continues to hold if we weaken Assumption ii to: 1 n ii sup t [,1] n i=1 1 [X n,i, t t = o n 1/. Theoretical results in the random design are also possible, where for instance, we can assume that {X n,i,i =1,...,n} are IID uniform random variables on [, 1]. In this case, Theorem 3.5 is still valid, while a process different from H is required to characterize the limit distribution. This follows from the fact that in the random design nr n R can converge to a Gaussian process that is not a Brownian motion. 4. Appendices: proofs 4.1. Appendix I: existence of the limit process Recall that f t = 1 t1 [,1] t, Xt =UF t, and Y t = t Xsds for t. In this section, we show the existence and uniqueness of the invelope

21 Convex least squares estimation 191 process H. The case of H can be handled using essentially the same arguments, so is omitted here. Lemmas are needed to prove Theorem.4. Lemma 4.1. Let the functional φg be defined as φg = 1 g t dt gt dxt for functions in the set G k = {g :[, 1] R,g is convex,g = g1 = k}. Then with probability one, the problem of minimizing φg over G k has a unique solution. Proof of Lemma 4.1 We consider this optimization problem in the metric space L.First,weshow that if it exists, the minimizer must be in the subset G k,m = {g :[, 1] R,g is convex,g = g1 = k, inf gt M} [,1] for some <M<. To verify this, we need the following result sup gt dxt <, a.s. 4.1 g G 1,1 Let W t be a standard Brownian motion. We note that gt dxt has the same distribution as g 1 1 t dw t W 1 gtf tdt. Using the entropy bound of G 1,1 in L Theorem.7.1 of Guntuboyina and Sen 13 and Dudley s theorem cf. Theorem.6.1 of Dudley 1999, we can establish that H = { h :[, 1] R ht =g 1 1 t,g G 1,1 } is a GC-set. As g 1 1 t dw t is an isonormal Gaussian process indexed by H, wehavethata.s.sup g G1,1 1 g 1 1 t dw t <. Furthermore, it is easy to check that W 1 < a.s. and sup g G1,1 1 gtf tdt. So our claim of 4.1 holds. Now for sufficiently large M with M>k, gt dxt M sup 1 gt dxt. sup g G k,m g G 1,1 Thus, for any g G k with inf [,1] g = M, gtdxt isatmostom. Furthermore, 1 1 g t dt is of OM. Since φ at the minimizer could at most

22 19 Y. Chen and J. A. Wellner be as large as φ =, we conclude that it suffices to only consider functions in G k,m for some sufficiently large M. Note that the functional φ is continuous cf. Dudley s theorem and strictly convex. Moreover, for g 1,g G k,m,if g1 t g t dt =,theng1 = g on [, 1]. Since G k,m is compact in L, the existence and uniqueness follow from a standard convex analysis argument in a Hilbert space. As a remark, it can be seen from the proof of Lemma 4.1 that for a given ω Ω from the sample space which determines the value of Xt, if the function φ has a unique minimizer over G 1 which happens a.s., it also admits a unique minimizer over G k for any k>1. Lemma 4.. Almost surely, Y t does not have parabolic tangents at either t =or t =1. Proof of Lemma 4. First, consider the case of t =. Theorem 1 of Lachal 1997 saysthat lim sup t + t W sds /3t3 log log1/t =1, a.s., where W is a standard Brownian motion. From this, it follows that lim sup t + Y t t = lim sup {W s s s s W 1}ds /3t3 log log1/t t + /3t3 log log1/t t W sds /3t3 log log1/t =, a.s. = lim sup t + thanks to the scaling properties of Brownian motion W at = d aw t. This implies that Y t does not have a parabolic tangent at t =. Y 1 Y t Second, consider the case of t = 1. Note that lim t 1 1 t = X1 = and lim sup Y 1 Y t 1 { t 1 1 t = lim sup t W 1 W F s } ds W 11 t 3 /3 t 1 1 t { = lim t W 1 W F t } ds t 1 1 t a.s. t = lim W s ds, t + t where F t =t t. Therefore, to prove that Y t does not have a parabolic tangent at t = 1, it suffices to show that lim sup t + t W s ds =, t a.s.

23 Convex least squares estimation 193 Denote by Zt = t W s ds. For any <t 1 <t < 1, we argue that the random variable Zt Zt 1 t t 1 W t 1 follows a distribution of N,t 4 /6 t 1t +4t 3 1t /3 t 4 1/. This is because Zt Zt 1 t t 1 W t 1 = = = t t 1 t t 1 t t 1 {W s W t 1}ds W s W t 1 ds = s t u 1 dsdw u = s t t 1 t t 1 1 s s t 1 dw uds t udw u, where we invoked the stochastic Fubini s theorem in the last line, and thus, { } Var Zt Zt 1 t t 1 W t 1 = t t 1 t u du = t 4 /6 t 1t +4t 3 1t /3 t 4 1/. Now setting t 1 =1/ andt i+1 = t i for every i N. It is easy to check that the collection of random variables { } Zt i 1 Zt i t i 1 t i W t i i=1 is mutually independent, so lim sup i where we made use of the fact that lim i Zt i 1 Zt i t i 1 t i W t i t i 1 t 4 i 1 /6 t i t i 1 +4t3 i t i 1/3 t 4 i / t i 1 =, a.s., 4. =1/ 6. Assume that there exists some K> such that Zt Kt for all sufficiently small t>. But it follows from 4. that a.s. one can find a subsequence of N denoted by {i j } j=1 satisfying Zt ij 1 Zt ij t ij 1 t ij W t i j K +1, for j =1,,... Consequently, Zt ij 1 t i j 1 t i j 1 K +1+ Zt i j t i j 1 + W t i j t ij 1 t ij t i j 1

24 194 Y. Chen and J. A. Wellner K +1 Kt ij W t i j tij t ij 1 t ij a.s. K +1, tij t ij as j. The last step is due to t ij + and lim sup s + W s /s 1/4 =a.s. which is a direct application of the law of the iterated logarithm. The proof is completed by contradiction. Now denote by f k the unique function which minimizes φg overg k.let H k be the second order integral satisfying H k = Y, H k 1 = Y 1 and H k = f k. Lemma 4.3. Almost surely, for every k N, f k and H k has the following properties: i H k t Y t for every t [, 1]; ii Hk t Y t df k t =, where the derivative can be interpreted as either the left or the right derivative; iii H k t =Y t and H k t =Xt for any t Sf k, wheres is the set of knots; iv t f ksds Xt X and t f ksds X1 Xt for any t Sf k ; v f k is a continuous function on [, 1]. Proof of Lemma 4.3 To show i, ii, iii and iv, one may refer to Lemma. and Corollary.1 of Groeneboom, Jongbloed and Wellner 1a and use a similar functional derivative argument. For v, we note that since f k is convex, discontinuity can only happen at t = or t = 1. In the following, we show that it is impossible at t =. Suppose that f k is discontinuous at zero. Consider the class of functions g δ t =max1 t/δ,. Then f k t+ɛg δ t 1,1 t+k1 {,1} t G k for every ɛ,k lim inf t + f k t]. By considering the functional derivative of φg and using integration by parts, we obtain that for any δ>, kδ/ g δ tf k tdt g δ tdxt =Y δ/δ, which implies that kδ / Y δ for every δ>. However, this contradicts Lemma 4., which says that a.s. Y does not have a parabolic tangent at t =. Consequently, f k is continuous at t =. The same argument can also be applied to show the continuity of f k at t =1. Lemma 4.4. Fix any t, 1. Denotebyτ k the right-most knot of f k on,t]. If such knot does not exist, then set τ k =. Similarly, denote by τ + k the left-most knot of f k on [t,1, and set τ + k =1if such knot does not exist. Then, for almost every ω Ω, thereexistsk>such that for every k K, τ k and τ + k 1. Here we suppressed the dependence of K and f k as well as τ k and τ + k onω via X in the notation.

25 Convex least squares estimation 195 Proof of Lemma 4.4 Recall that Xt is said to be sample-bounded on [, 1] if sup [,1] Xt is finite for almost all ω Ω. Since both Xt andy t are sample bounded. Without loss of generality, in the following, wee can fix ω Ω and assume that sup [,1] Xt < and sup [,1] Y t <. First, we show the existence of at least one knot on, 1. Note that the cubic polynomial P k with P k = Y =, P k 1 = Y 1, P k = P k 1 = k can be expressed as P k s = k s + Y 1 k s, Therefore, take for instance s =.5 and consider the event P k.5 Y.5. This event can be reexpressed as 1 Y 1 Y k, which will eventually become false as k. This is due to the fact that Y t is sample bounded. In view of i and v of Lemma 4.3, we conclude that f k has at least one knot in the open interval, 1 for sufficiently large k. Next, take k large enough so that f k has one knot in, 1, which we denote by τ k. By iii of Lemma 4.3, H k τ k =Yτ k andh k τ k=xτ k. Without loss of generality, we may assume that τ k >t. Now the cubic polynomial P k with P k = Y =, P k τ k =Yτ k, P k = k and P k τ k=xτ k can be expressed as P k s =a k1 s 3 + a k s + a k3 s, where a k1 = Xτ k τ k Y τ k τ 3 k k 4τ k ; a k = k ; a k3 = 3Y τ k Xτ k kτ k τ k 4. By taking, say for example, s = t/, it can then be verified that the event P k t/ Y t/, which is equivalent to Xτ k t 3 16τ k Y τ kt 3 16τ 3 k + 3Y τ kt 4τ k Xτ kt 4 kt 3τ k t τ k, will eventually stop happening as k. This is due to the sample boundedness of both Xt andy t. Consequently, τ k for sufficiently large k. Furthermore, using essentially the same argument, one can also show that τ + k 1for large k, which completes the proof of this lemma.

26 196 Y. Chen and J. A. Wellner Lemma 4.5. For almost every ω Ω which determines Xt and f k, we can find an M>such that inf inf f k > M. k N [,1] Proof of Lemma 4.5 Fix any δ, 1/]. In view of Lemma 4.4, we may assume that there exist knots τ k and τ + k on,δ]and[1 δ, 1 respectively for sufficiently large k. If inf [,1] f k for every k>k, then we are done. Otherwise, we focus on those inf [,1] f k < and find <t k,1 <t k, < 1withf k t k,1 =f k t k, =. Note that the existence of t k,1 and t k, are guaranteed by Lemma 4.3 v. In the following, we take δ =1/1 and consider two scenarios. a t k, t k,1 < δ. Leta k,1 f k t k,1 anda k, f k t k,, where is the subgradient operator. Then Lemma 4.3v implies that a k,1 < and a k, >. Since both a k,1 s t k,1 anda k, s t k, can be regarded as supporting hyperplanes of f k due to its convexity, it follows that f k s max a k,1 s t k,1,a k, s t k,.setc k = a k,1 a k, t k, t k,1 /a k, a k,1 i.e. C k is the value of the above hyperplanes at their intersection, which is negative, so inf [,1] f k C k.now sup Xt t [,1] τ + Xτ + k Xτ k =H kτ + k H kτ k = k τ + k τ k δ δ δ δ inf u [,1 4δ] max a k,1 s t k,1,a k, s t k, ds τ k f k s ds max a k,1 s t k,1,a k, s t k, ds δc k max a k,1 s t k,1,a k, s t k,, ds δ max a k,1 s t k,1, a k, s t k,, ds δc k δ { ak,1 u + a k, 1 4δ u } / C k t k, t k 1 / δc k 1 4δ C k δc k δc k = 4δ 1 4δ + δ C k C k. Consequently, a.s., sup k N C k <. b t k, t k,1 δ. Now consider f k+ =maxf k,. It follows that φf k φf k+ = 1 min,f k t dt min,f k t dxt

27 Convex least squares estimation 197 Let M k = inf [,1] f k. By the convexity of f k, we see that the first term is no smaller than δmk /3. For the second term, we can again use Dudley s theorem and the fact that the following class { G unit = g :[, 1] [ 1, ] x 1 <x 1 } s.t. gs is convex on [x 1,x ]; gs =, s [,x 1 ] [x, 1] has entropy of order η 1/ in L to argue that sup gdxt <, g G unit [,1] a.s. Therefore, the second term is at most OM k. Then we can use the argument in the proof of Lemma 4.1 to establish that a.s. lim sup k M k <. Lemma 4.6. For any fixed t, 1 and almost every ω Ω, sup k f k t, sup k f k t and sup k f + k t are bounded. Proof of Lemma 4.6 LetΔ=mint, 1 t/. In view of Lemma 4.4, for sufficiently large k, we can assume that f k has at least one knot in,t Δ], and one knot in [t +Δ, 1. Denote these two points by τ k and τ + k respectively. By the convexity of f k, there exists some c R such that f k s cs t+f k t for every s [, 1]. It follows that τ + sup Xt Xτ + k Xτ k = k t [,1] τ k [τ k,t Δ] [t+δ,τ + k ] M k +Δf k t, inf [,1] f k f k s ds ds + t+δ t Δ { cs t+fk t } ds where M k = inf [,1] f k. By Lemma 4.5, we see that for almost every ω Ω, lim sup k f k t is bounded. Combining this with the lower bound we established previously entails the boundedness of {f k t} k. Next, note that both {f k t Δ} k and {f k t+δ} k are bounded. The boundedness of {f k t} k and {f + k t} k immediately follows from the convexity of f k by utilizing [ fk t f k t Δ f k t, f ] kt +Δ f k t. Δ Δ Lemma 4.7. For almost every ω Ω, we have that both } and {sup t [,1] H k t are bounded. k { } sup t [,1] H k t k

28 198 Y. Chen and J. A. Wellner Proof of Lemma 4.7 By Lemma 4.3iv, for any t [, 1], min, inf inf f k t inf f k k N [,1] [,1] t f k sds = X1 X 1 t inf [,1] f k X1 X min f k sds t f k sds, inf k N inf [,1] f k { Consequently, Lemma 4.5 entails the boundedness of sup t [,1] } t f ksds. k Furthermore, Lemma 4.4 says that one can always find a knot τ k, 1 with Xτ k =H } k τ k for all sufficiently large k. Thus, the boundedness of {sup t [,1] H k t follows from the fact that k τ sup H kt sup t [,1] t [,1] Xτ k k t f k sds + f k sds t sup Xt + sup f k sds. t [,1] t [,1] { } Finally, one can derive the sample boundedness of sup t [,1] H k t by using k the equality H k t =Y + t H k sds. Lemma 4.8. For almost every ω Ω, both{h k } k and {H k } k are uniformly equicontinuous on [, 1]. In fact, they are uniformly Hölder continuous with exponent less than 1/4. Proof of Lemma 4.8 Here we only show that the family {H k } k is uniformly equicontinuous. Fix any <δ<1. For any t 1 t 1witht t 1 <δ, by the convexity of f k, t δ 1 H kt H kt 1 = f k sds max f k sds, f k sds. 1 δ t 1 In the following, we shall focus on δ f ksds. The term 1 δ f ksds can be handled in exactly the same fashion. By Lemma 4.4, Sf k isnon-emptyfor every k>k,wherek is a sufficiently large positive integer. Furthermore, in view of Lemma 4.5, we can assume that inf k N inf [,1] f k M for some M>. Three scenarios are discussed in the following. Here we fix α, 1/. a There exists at least one knot τ k Sf k withτ k [δ, δ α ]. Then δ f k sds τk f k sds + Mδ α Xτ k X + Mδ α M δ α + Mδ α,.

29 Convex least squares estimation 199 for some M > whichistheα-hölder constant of this particular realization of Xt. The last line follows from Lemma 4.3iv and the fact that Xt isα-hölder-continuous, so Xτ k X M τ k α M δ α α = M δ α. b Sf k,δ α ]=. Letτ k Sf k betheleft-mostknotin[δ α, 1. Then f k is linear on [,τ k ]andf k τ k M. Note that M τ k α Xτ k X so f k M τ α 1 k δ τk f k sds = f k + f k τ k τ k f k Mτ k /, + M M δ α α + M. It now follows that f k sds M δ α α + Mδ M + Mδ α. We remark that since f k = k, the above conclusion also implies that Sf k,δ α ] for all sufficiently large k. c Sf k [δ, δ α =, but Sf k,δ]. Letτ k be the right-most knot in,δ]andτ + k be the left-most knot in [δα, 1. As a convention, we set τ + k = 1 if such a knot does not exist. Note that f k is linear on [τ k,τ+ k ], so we can use essentially the same argument as above to see that δ f τ k sds k M + Mδ α. Finally, δ δ f k sds Xτ k X + f k sds τ k M δ α +M + Mδ α =3M + Mδ α. To finish the proof, we shall apply Lemma 4.5 to verify that H k t H k t 1 Mδ. In addition, combining results from all the scenarios listed above, we see that {H k } k is uniformly Hölder continuous with exponent α < 1/4. Proof of Theorem.4 For every m N with m 3, define the following norms H m = sup Ht + sup H t + sup H t. t [,1] t [,1] t [1/m,1 1/m] First, we show the existence of such a function for almost all ω Ωby construction. Fix ω thus we focus on a particular realization of Xt but suppress its dependence on ω in the notation. Let H k be the function satisfying H k = f k, H k = and H k 1 = Y 1. We claim that the sequence H k admits a convergent subsequence in the topology induced by the norm m.

STATISTICAL INFERENCE UNDER ORDER RESTRICTIONS LIMIT THEORY FOR THE GRENANDER ESTIMATOR UNDER ALTERNATIVE HYPOTHESES

STATISTICAL INFERENCE UNDER ORDER RESTRICTIONS LIMIT THEORY FOR THE GRENANDER ESTIMATOR UNDER ALTERNATIVE HYPOTHESES By Jon A. Wellner University of Washington 1. Limit theory for the Grenander estimator.