Asymptotically optimal estimation of smooth functionals for interval censoring, case 2

Ronald Geskus [1] and Piet Groeneboom [2]

Municipal Health Service, Dept. of Public Health and Environment, Nieuwe Achtergracht 100, 1018 WT Amsterdam, and Faculty of Information Technology and Systems, Delft University of Technology, Mekelweg 4, 2628 CD Delft

Abstract

For a version of the interval censoring model, case 2, in which the observation intervals are allowed to be arbitrarily small, we consider estimation of functionals that are differentiable along Hellinger differentiable paths. The asymptotic information lower bound for such functionals can be represented as the squared L_2-norm of the canonical gradient in the observation space. This canonical gradient has an implicit expression as a solution of an integral equation that does not belong to one of the standard types. We study an extended version of the integral equation that can also be used for discrete distribution functions like the nonparametric maximum likelihood estimator (NPMLE), and derive the asymptotic normality and efficiency of the NPMLE from properties of the solutions of the integral equations.

1. rgeskus@gggd.amsterdam.nl
2. p.groeneboom@twi.tudelft.nl

AMS 1991 subject classifications. 60F17, 62E20, 62G05, 62G20, 45A05.
Key words and phrases. Nonparametric maximum likelihood, empirical processes, asymptotic distributions, asymptotic efficiency, integral equations.

1 Introduction

In the interval censoring problem one wants to obtain information on some distribution F, often representing an event time distribution, based on a sample of random intervals J_1, ..., J_n in which unobservable X_1, ..., X_n ~ F are known to be contained. In case 1, we have a sample of observation times T_i and we know whether X_i is smaller or larger than the corresponding observation time T_i. More formally: we observe (T_1, Δ_1), ..., (T_n, Δ_n), with Δ_i = 1_{X_i ≤ T_i}. Case 2 is usually denoted as the situation with two observation times (U_i, V_i) and the information whether X_i is left of U_i, between U_i and V_i, or right of V_i.

For case 1, also denoted by current status data, quite a lot is known. It was already shown in Ayer et al. (1955) and van Eeden (1956) that there exists a one-step procedure for calculation of the nonparametric maximum likelihood estimator (NPMLE) F̂_n of the distribution function F, based on isotonic regression theory. The asymptotic distribution of F̂_n(t_0), for fixed t_0 ∈ IR, is derived in part II, chapter 5 of Groeneboom and Wellner (1992); n^{1/3} is the obtained convergence rate. The same chapter discusses convergence properties of the NPMLE μ(F̂_n) of the mean μ(F), and it is shown that, under some extra conditions,

√n (μ(F̂_n) − μ(F)) →_D N(0, ∫ F(x)[1 − F(x)] / g(x) dx),   as n → ∞,

with g(x) the density of the distribution of the observation times, and →_D denoting convergence in distribution. Huang and Wellner (1995) prove a similar result for functionals that are linear in F: K(F) = ∫ c dF. Then an extra factor c′(x)^2 appears in the integral formula for the limit variance. Groeneboom's proof for the mean uses the convergence rate of the supremum distance between the NPMLE and the underlying distribution function, which is replaced by an easier argument based on L_2-distance properties of F̂_n and smoothness of c′(x)/g(x) in Huang and Wellner's proof.

In case 2 one may expect better estimation results than in case 1, since one has more information on the location of the X_i's. However, both theoretical and practical aspects of the problem are more complicated. Only iterative procedures are available for computation of the NPMLE F̂_n of F. The iterative convex minorant algorithm, as introduced by Groeneboom in part II of Groeneboom and Wellner (1992), converges quickly in computer experiments, and a slight modification of this algorithm is shown to converge in Jongbloed (1998). When considering the asymptotic distribution of the NPMLE F̂_n(t_0), a distinction should be made. If the relative amount of mass of the (U, V)-distribution near the diagonal point (t_0, t_0), compared to the amount of mass of F near t_0, is very small, we are more in a case 1-type situation and we still have an n^{1/3} convergence rate (Wellner (1995)). The limit distribution of F̂_n(t_0) has been established in Groeneboom (1996) for the situation that U and V remain bounded away from each other. If the observation time distribution has sufficient mass along the diagonal, the convergence rate increases to (n log n)^{1/3} (see Groeneboom and Wellner (1992)). There is a conjecture on the asymptotic distribution of F̂_n(t_0) in this situation, but the proof is still incomplete.

Another estimator is F_n^{(1)}(t_0), which is obtained by doing one step in the iterative convex minorant algorithm, with the true underlying distribution F as starting value. Of course, this procedure, which does not even lead to an estimator in the strict sense, has no practical value. However, it may be relevant for theoretical purposes, for the asymptotic distribution of F_n^{(1)}(t_0) is known, and it is conjectured to have the same asymptotic distribution as the NPMLE.
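For readers who want to experiment with the observation scheme just described, the following sketch simulates a case 2 sample (U_i, V_i, Δ_i, Γ_i). The uniform event time distribution and the uniform distribution of (U, V) on the upper triangle are illustrative assumptions of this sketch, not choices made in the paper up to this point.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_case2(n, rng):
    """Simulate case 2 interval-censored data.

    X_i ~ F (here uniform on [0, 1], an illustrative assumption) is never
    observed; instead we see two observation times U_i < V_i together with
    delta_i = 1{X_i <= U_i} and gamma_i = 1{U_i < X_i <= V_i}.
    """
    x = rng.uniform(0.0, 1.0, n)                 # unobservable event times
    a = rng.uniform(0.0, 1.0, (n, 2))
    u, v = a.min(axis=1), a.max(axis=1)          # (U, V) uniform on the upper triangle
    delta = (x <= u).astype(int)
    gamma = ((u < x) & (x <= v)).astype(int)
    return u, v, delta, gamma

u, v, delta, gamma = simulate_case2(1000, rng)
print(delta.mean(), gamma.mean())                # rough check: P(X <= U) and P(U < X <= V)
```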

Contrary to F(t_0), for case 1 as well as case 2, the mean μ(F) is a smooth linear functional. This means that it is differentiable along Hellinger differentiable paths of distributions. For any functional having this property, one can derive a Cramér–Rao-type information lower bound, giving the best possible limit variance that can be attained under √n convergence rate. See e.g. van der Vaart (1991), part I of Groeneboom and Wellner (1992) or Bickel et al. (1993) for the general theory and the application to case 1. For case 1, an explicit expression of the information lower bound can be derived and is given by

∫ c′(x) φ(x) dx,   with φ(x) being the function   φ(x) = c′(x) F(x)[1 − F(x)] / g(x).

So this lower bound is attained by the NPMLE. The function φ appearing in the information lower bound has an analogue in case 2. However, contrary to case 1, an explicit formula for φ is unknown, and unlikely to exist, except for some very special choices of the distributions of X and (U, V) (see Geskus (1997)). Nevertheless it can be proved, without knowing φ explicitly, that the NPMLE K(F̂_n), estimating the smooth functional K(F), shows asymptotically optimal behavior in case 2 as well, using properties of the integral equation for φ.

Just as in the problem of estimation of F(t_0), two situations can be distinguished, showing different behavior. In Geskus and Groeneboom (1996) and Geskus and Groeneboom (1997), together from now on denoted as GG, the simplest case is treated, with the (U, V)-distribution having no mass along the diagonal. Smoothness properties of φ are derived from the structure of the integral equation and are sufficient to give the proof. The technical report Geskus and Groeneboom (1995) contains both papers. See also Geskus (1997), which contains a rather extensive treatment of the application of the general lower bound theory to case 1 and this simple version of case 2.

In the present paper, we treat the situation where the observation times can be arbitrarily close. In the long run we will have observations for which we get very close to observing X itself, a situation that never occurs in case 1 and the case 2 treated in GG. Now the analysis is much more complicated, since we really have to deal with the singularity of the integrand of the integral equation on the diagonal.

Our general outline is as follows. In section 2, after a specification of the problem, the relevant lower bound calculations are given. In section 3 we show asymptotic efficiency of the NPMLE; here we benefit from some recent results on the Hellinger distance between the NPMLE and the underlying distribution function in van de Geer (1996). Section 3 also contains our main result, Theorem 3.2, showing that the NPMLE K(F̂_n) of a smooth functional K(F) of the underlying distribution function F converges at √n-rate, is asymptotically normal and has an asymptotic variance that is equal to the information lower bound. In section 4 we present some simulation results showing that the variance of the NPMLE is already very close to the information lower bound for reasonable sample sizes. In this section we also show some pictures of the solution of the integral equation and its derivative for the case that the distributions are uniform (F uniform and the distribution of (U, V) uniform on the upper triangle of the unit square) and the smooth functional corresponds to the mean. These graphs are compared to graphs of a corresponding solution of the integral equation when F is replaced by the NPMLE (and the equation is transformed into another equation in an inverse scale).
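The explicit case 1 bound quoted above can be evaluated numerically. The sketch below does this for the mean (c(x) = x, so c′ ≡ 1) with F and the observation time density g both uniform on [0, 1]; these concrete choices are assumptions of the illustration only.

```python
import numpy as np

def case1_lower_bound(c_prime, F, g, grid):
    """Case 1 information lower bound: integral of c'(x)^2 F(x)(1 - F(x)) / g(x) dx,
    approximated by a Riemann sum over an equispaced grid."""
    integrand = c_prime(grid) ** 2 * F(grid) * (1.0 - F(grid)) / g(grid)
    return integrand.mean() * (grid[-1] - grid[0])

grid = np.linspace(1e-6, 1.0 - 1e-6, 100_000)
bound = case1_lower_bound(lambda x: np.ones_like(x),   # c(x) = x, so c'(x) = 1 (the mean)
                          lambda x: x,                 # F uniform on [0, 1]
                          lambda x: np.ones_like(x),   # g uniform on [0, 1]
                          grid)
print(bound)   # close to 1/6 = integral of x(1 - x) over [0, 1]
```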

Finally, in the appendix, some technical results are proved that are needed in section 3.

Of course, situations with more than two observation times for each unobservable event time may occur as well. This is usually denoted as the case k situation. However, only the two observation times immediately around the event time give relevant information, so it very much resembles case 2. We will only consider case 2. See also GG, where case k is briefly discussed.

Unless otherwise stated, all norms in this paper should be read as L_2-norms. The dominating measure varies, but is denoted by a subscript, or should be clear from the context. The L_2^0-space denotes the subspace of functions that integrate to zero.

2 The model; lower bound considerations

We will start with a brief formulation of the problem and a list of assumptions that are needed in order to perform the lower bound calculations. Moreover, a key equation on which our analysis will be based is given. A more elaborate introduction, including a derivation of this equation, can be found in GG and Geskus (1997).

2.1 The model; general lower bound theory for smooth functionals

Let F be a distribution function. We are interested in estimation of some functional K(F). However, instead of a sample X_1, ..., X_n ~ F, we are only able to observe the sample (U_1, V_1, Δ_1, Γ_1), ..., (U_n, V_n, Δ_n, Γ_n) with Δ_i = 1_{X_i ≤ U_i} and Γ_i = 1_{U_i < X_i ≤ V_i}. The following model assumptions are made:

(M1) Let K > 0 and let S be a bounded interval ⊂ IR. F is contained in the class

F_S := { F : support(F) ⊂ S, F absolutely continuous, sup_x f(x) ≤ K }.

F is the distribution on which we want to obtain information; however, we do not observe X_i ~ F directly.

(M2) Instead, we observe the pairs (U_i, V_i), with distribution function H. H is contained in H, the collection of all two-dimensional distributions on {(u, v) : u < v}, absolutely continuous with respect to two-dimensional Lebesgue measure and such that (U_i, V_i) is independent of X_i. Let h denote the density of (U_i, V_i), with marginal densities and distribution functions h_1, H_1 and h_2, H_2 for U_i and V_i respectively.

(M3) If both H_1 and H_2 put zero mass on some set A, then F ∈ F_S has zero mass on A as well, so F ≪ H_1 + H_2. This means that F does not have mass on sets in which no observations can occur.

Typically, under (M1), F has support [0, M], and H has its mass on the triangle {(u, v) : 0 ≤ u < v ≤ M}. From now on we will restrict attention to this typical situation. More general choices of the support and the situation with H_1 + H_2 having larger support than F have been treated in Geskus (1997), but are similar in essence. Without condition (M3), the functionals in which we are interested are not well-defined.

Moreover, if F has probability mass on a region in which no observations can occur, one cannot estimate the structure of F on this region consistently. In many survival studies, events do occur outside the domain of observation, making estimation of functionals like the mean an impossible task.

The observable random vectors (U_i, V_i, Δ_i, Γ_i) have density

q_{F,H}(u, v, δ, γ) = h(u, v) F(u)^δ (F(v) − F(u))^γ (1 − F(v))^{1−δ−γ}

with respect to λ_2 × ν_2, where ν_2 denotes counting measure on the set {(0, 1), (1, 0), (0, 0)}. The letter M in conditions (M1) to (M3) stands for model. With respect to the functional to be estimated we assume:

(F1) K is differentiable along Hellinger differentiable paths of distributions from F_S. The canonical gradient for this functional is denoted by κ_F.

What functionals satisfy this pathwise differentiability requirement? An important class of functionals are the functionals that are linear in F,

K(F) = ∫ c(x) dF(x).

All moment functionals F ↦ ∫ x^k dF(x) belong to this class. Estimation of the distribution function at a fixed point concerns a linear functional as well: for K(F) = F(t_0) we have c(x) = 1_{[0,t_0]}(x). In Bickel et al. (1993), proposition A.5.2, it is shown that linear functionals on F_S with sup_{F ∈ F_S} E_F c(X)^2 < ∞ are pathwise differentiable at any F ∈ F_S, with canonical gradient

κ_F(x) = c(x) − ∫ c(x) dF(x).

So if we were able to observe the X_i's directly, we would obtain the information lower bound ‖c(X) − E_F(c(X))‖^2_F. For nonlinear functionals, there is no general method that immediately establishes pathwise differentiability and supplies the formula for the canonical gradient. An example of a nonlinear functional to which our theory can be applied is

K(F) = ∫ F^2(x) w(x) dx.

If the function w is bounded, a slight extension of the proof in Bickel et al. (1993), as given in Geskus (1997), shows that this functional has canonical gradient

κ_F(x) = 2 ∫_{s=x}^M F(s) w(s) ds − 2 ∫_{y=0}^M ∫_{s=y}^M F(s) w(s) ds dF(y).

In order to show that the NPMLE K(F̂_n) asymptotically attains the lower bound, we have to make the following extra assumption:

(F2) K(G) − K(F) = ∫ κ_F(x) d(G − F)(x) + O(‖G − F‖^2_λ),

for all distribution functions G with support contained in [0, M], with λ denoting Lebesgue measure on IR.
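As a small illustration of the displays above, the following sketch evaluates the observation density q_{F,H} and the canonical gradient κ_F(x) = c(x) − ∫ c dF of a linear functional. The uniform F and the triangular density h(u, v) = 2 are assumptions made only for this example.

```python
import numpy as np

def q_FH(u, v, delta, gamma, F, h):
    """Observation density q_{F,H}(u, v, delta, gamma)
    = h(u,v) * F(u)^delta * (F(v) - F(u))^gamma * (1 - F(v))^(1 - delta - gamma)."""
    return h(u, v) * F(u) ** delta * (F(v) - F(u)) ** gamma * (1.0 - F(v)) ** (1 - delta - gamma)

def kappa_linear(c, F_sample, x):
    """Canonical gradient of K(F) = int c dF in the model without censoring:
    kappa_F(x) = c(x) - int c dF, the integral approximated from a Monte Carlo
    sample drawn from F."""
    return c(x) - np.mean(c(F_sample))

# illustrative assumptions: F uniform on [0, 1], h(u, v) = 2 on {0 < u < v < 1}
F = lambda x: np.clip(x, 0.0, 1.0)
h = lambda u, v: np.where(u < v, 2.0, 0.0)
print(q_FH(0.3, 0.7, 0, 1, F, h))          # h * (F(0.7) - F(0.3)) = 2 * 0.4

rng = np.random.default_rng(1)
print(kappa_linear(lambda x: x, rng.uniform(size=100_000), 0.25))   # about 0.25 - 0.5
```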

For linear functionals (F2) holds without the O-term. However, the functional K(F) = ∫ F^2(x) w(x) dx also satisfies (F2).

In the interval censoring model we do not observe X ~ F directly, and the estimand K(F) is only implicitly defined as a functional Θ(Q_{F,H}) on the class of probability measures on the observation space, with H acting as a nuisance parameter. What about differentiability and information lower bounds in this model? The score operator L_1, relating the censoring model to the, unattainable, model without censoring is in our situation:

[L_1 a](u, v, δ, γ) = E[a(X) | U = u, V = v, Δ = δ, Γ = γ]
  = δ ∫_0^u a dF / F(u) + γ ∫_u^v a dF / (F(v) − F(u)) + (1 − δ − γ) ∫_v^M a dF / (1 − F(v))   a.e. [Q_{F,H}].   (2.1)

This operator may be defined on L_2(F), with range in L_2(Q_{F,H}). However, since it relates scores, our main interest lies in the domain L_2^0(F). Then its range is contained in L_2^0(Q_{F,H}). The adjoint of L_1 on L_2^0(Q_{F,H}) can be written as [L_1^* b](x) = E[b(U, V, Δ, Γ) | X = x] and we get

[L_1^* b](x) = ∫_{u=x}^M ∫_{v=u}^M b(u, v, 1, 0) h(u, v) dv du + ∫_{u=0}^x ∫_{v=x}^M b(u, v, 0, 1) h(u, v) dv du
  + ∫_{u=0}^x ∫_{v=u}^x b(u, v, 0, 0) h(u, v) dv du   a.e.-[F].   (2.2)

Now we have pathwise differentiability of Θ(Q_{F,H}) if and only if κ_F ∈ R(L_1^*) and if this holds, then the canonical gradient is the unique element θ_{F,H} in R(L_1)‾ ⊂ L_2^0(Q_{F,H}) satisfying

L_1^* θ_{F,H} = κ_F.   (2.3)

(See van der Vaart (1991).) Many functionals that are pathwise differentiable in the model without censoring lose this property in the interval censoring model. Due to the smoothness of the adjoint operator, any functional K with a canonical gradient that is not a.e. equal to a continuous function cannot be obtained under L_1^*. So not all linear functionals remain pathwise differentiable. For example, K(F) = F(t_0), with canonical gradient 1_{[0,t_0]}(·) − F(t_0), is discontinuous at t_0, and therefore does not belong to the range of L_1^*. This corresponds with F(t_0) not being estimable at √n-rate. However, functionals with a canonical gradient that is sufficiently smooth will be shown to remain differentiable under censoring. Hence for these functionals the information lower bound theory holds.
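A direct transcription of (2.1): the score operator replaces a(X) by its conditional expectation given the censored observation. The sketch below approximates the three conditional means by numerical integration; the concrete F, its density f, and the choice of the score a are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad

def score_operator_L1(a, F, f, M=1.0):
    """Return the function [L_1 a](u, v, delta, gamma) of equation (2.1):
    the conditional expectation of a(X) given the censored observation."""
    def L1a(u, v, delta, gamma):
        if delta == 1:                                   # X <= u
            num, _ = quad(lambda x: a(x) * f(x), 0.0, u)
            return num / F(u)
        if gamma == 1:                                   # u < X <= v
            num, _ = quad(lambda x: a(x) * f(x), u, v)
            return num / (F(v) - F(u))
        num, _ = quad(lambda x: a(x) * f(x), v, M)       # X > v
        return num / (1.0 - F(v))
    return L1a

# illustrative assumptions: F uniform on [0, 1], a(x) = x - 1/2 (a mean-zero score)
L1a = score_operator_L1(lambda x: x - 0.5, lambda x: x, lambda x: 1.0)
print(L1a(0.3, 0.7, 0, 1))   # E[X - 1/2 | 0.3 < X <= 0.7] = 0.0 for the uniform
```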

If the canonical gradients θ = θ_{F,H} and κ_F satisfy some extra conditions, the information lower bound ‖θ‖^2_{Q_{F,H}} has an alternative formulation.

Theorem 2.1 Let θ be contained in R(L_1), say θ = L_1 a for some a ∈ L_2^0(F). Assume that the function x ↦ κ_F(x) is differentiable with bounded derivative. Then we have

‖θ‖^2_{Q_{F,H}} = ⟨a, κ_F⟩_F = ∫_0^M κ_F′(x) φ(x) dx,

with φ(x) = ∫_x^M a(t) dF(t).

Proof: See theorem 3.3 in Geskus and Groeneboom (1995).

This theorem holds more generally. However, in the interval censoring model, both case 1 and case 2, we have the extra property that the function

φ(x) := ∫_x^M a(t) dF(t),   with a ∈ L_2^0(F),

also appears explicitly in the score operator L_1. Therefore it will play an important role. It will be called the integrated score function. From its definition we know that φ satisfies φ(0) = φ(M) = 0 and that φ is continuous for F ∈ F_S.

In section 2.2 we will pay attention to the structure of the lower bound. Section 3 will be devoted to showing that the NPMLE Θ̂_n of Θ(Q_{F,H}) satisfies

√n (Θ̂_n − Θ(Q_{F,H})) →_D N(0, ‖θ‖^2_{Q_{F,H}}).

2.2 Lower bounds for interval censoring case 2

We restrict ourselves to the case θ ∈ R(L_1). So the case θ ∈ R(L_1)‾ \ R(L_1) will not be considered. Solvability of the equation

κ_F(x) = [L_1^* L_1 a](x)   a.e. [F]   (2.4)

in the variable a ∈ L_2^0(F) will be investigated. The support of F may consist of a finite number of disjoint intervals. However, (2.4) is not defined on intervals where F does not put mass, and these intervals do not play any further role. So without loss of generality we may assume the support of F to consist of one interval [0, M]. By the structure of the score operator L_1 this can be reformulated as an equation in φ. If we suppose equation (2.4) to hold for all x ∈ [0, M], taking derivatives on both sides yields the following integral equation:

φ(x) + d_F(x) [ ∫_{t=0}^x (φ(x) − φ(t))/(F(x) − F(t)) h(t, x) dt − ∫_{t=x}^M (φ(t) − φ(x))/(F(t) − F(x)) h(x, t) dt ] = k(x) d_F(x),   (2.5)

with d_F(x) being the function

d_F(x) = F(x)[1 − F(x)] / ( h_1(x)[1 − F(x)] + h_2(x) F(x) ),

writing k(x) instead of κ_F′(x). Although k may depend on F, we do not explicitly express this dependence. This is done, since in proving asymptotic efficiency of the NPMLE we have to consider (2.5) for convex combinations (1 − α)F_0 + α F̂_n of the unknown (continuous) distribution F_0 ∈ F_S and the NPMLE F̂_n of F_0, which is purely discrete.
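Equation (2.5) can be explored numerically by collocation on a grid. In the sketch below the difference quotients near the diagonal are cut off at a small ε, in the spirit of the desingularized equation studied in Lemma 2.1 below; the uniform F, the density h(u, v) = 2 on the upper triangle, k ≡ 1 (the mean), the grid size and the cutoff are all assumptions of this sketch, chosen to match the example announced for section 4.

```python
import numpy as np

def solve_phi_uniform(N=400, eps=1e-3):
    """Crude collocation solver for equation (2.5) with F uniform on [0, 1],
    h(u, v) = 2 on {u < v}, and k(x) = 1 (the mean).  Near-diagonal difference
    quotients are cut off at eps, mimicking the desingularized equation (2.11)."""
    x = (np.arange(N) + 0.5) / N          # interior grid of midpoints, F(x) = x here
    dx = 1.0 / N
    h1 = 2.0 * (1.0 - x)                  # marginal density of U
    h2 = 2.0 * x                          # marginal density of V
    dF = x * (1.0 - x) / (h1 * (1.0 - x) + h2 * x)

    # weights w[i, j] = h * dx / max(|F(x_i) - F(x_j)|, eps); both integrals in (2.5)
    # contribute terms of the form w_ij (phi_i - phi_j), so the system is linear in phi
    diff = np.abs(x[:, None] - x[None, :])
    w = 2.0 * dx / np.maximum(diff, eps)
    np.fill_diagonal(w, 0.0)

    A = np.diag(1.0 + dF * w.sum(axis=1)) - dF[:, None] * w
    b = dF                                # right-hand side k(x) d_F(x) with k = 1
    return x, np.linalg.solve(A, b)

x, phi = solve_phi_uniform()
print(phi[0], phi[len(phi) // 2], phi[-1])   # phi should vanish near 0 and 1
print(phi.mean())   # with k = 1 this approximates the lower bound of Theorem 2.1 for the mean
```

With k ≡ 1 the grid average of φ approximates ∫_0^1 k(x) φ(x) dx, the alternative form of the information lower bound given in Theorem 2.1, under the illustrative uniform assumptions above.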

Solvability and structure of the solution to (2.5) will be investigated for such combinations, with k still determined by the underlying distribution F_0 (so k = κ_{F_0}′). Apart from the model conditions (M1) to (M3), some extra conditions will have to be introduced in order to make the proofs in this section possible. For the distributions we assume

(D1) h_1(x) + h_2(x) > 0 for all x ∈ [0, M].

(D2) h(u, v) is continuous. The partial derivatives ∂_{1,t}(x) = ∂/∂x h(x, t) and ∂_{2,t}(x) = ∂/∂x h(t, x) exist, except for at most a finite number of points x, where left and right derivatives with respect to x do exist for each t. The derivatives are bounded, uniformly in t and x.

(D3) F is a nondegenerate distribution function with at most finitely many points of jump x_i ∈ (0, M). Let D = {x_0 = 0, x_1, ..., x_m, x_{m+1} = M} denote the ordered set of jump points of F, augmented with the endpoints of the interval [0, M]. We assume that F is differentiable between jumps, except for at most a finite number of points, where left and right derivatives exist. Everywhere outside D, the derivative is bounded and ≥ c for some c > 0 (so we assume f(x) ≥ c for all x ∈ [0, M]). The set of points of jump may be empty. Note that if F has jumps, we assume that F has derivative ≥ c also on the (non-empty) intervals (0, x_1) and (x_m, M) (where we allow x_1 = x_m, though).

For the functional we should have

(F3) k is differentiable, except for at most a finite number of points x, where left and right derivatives exist. The derivative is bounded, uniformly in x.

Note that the letter D in conditions (D1) to (D3) stands for distribution and the letter F in (F1) to (F3) for functional. Of course, (D2) implies continuity of h_1 and h_2. (D1) is the equivalent of g > 0 in case 1 and is needed. It implies that d_F is bounded.

In case 1, the function φ has an explicit representation of the form

φ(x) = k(x) F(x)[1 − F(x)] / g(x),

whereas in case 2, φ can only be expressed implicitly as a solution to (2.5). If (2.5) is solvable, its solution φ can be shown to contain a factor F(1 − F), just as in case 1. The structure of d_F already suggests this factor to be present. Validity of the factorization is shown by inserting φ = F(1 − F) ξ in (2.5). Some reordering yields an integral equation in ξ, which will be shown to be solvable. This ξ-equation has the following form:

ξ(x) + c_F(x) [ ∫_{t=0}^x (ξ(x) − ξ(t))/(F(x) − F(t)) h̃(t, x) dt − ∫_{t=x}^M (ξ(t) − ξ(x))/(F(t) − F(x)) h̃(x, t) dt ] = k(x) c_F(x),   (2.6)

with c_F(x) given by

c_F(x)^{-1} = ∫_{t=0}^x [1 − F(t)] h(t, x) dt + ∫_{t=x}^M F(t) h(x, t) dt = h_2(x) E[1 − F(U) | V = x] + h_1(x) E[F(V) | U = x]   (2.7)

and

h̃(t, x) = F(t)[1 − F(t)] h(t, x)   if t ≤ x,
h̃(x, t) = F(t)[1 − F(t)] h(x, t)   if x ≤ t.

This equation is similar in structure to the φ-equation. So the lemmas and theorems in the remainder of this section apply to both the φ-equation (2.5) and the ξ-equation (2.6). Most of the proofs will only be given for the φ-equation.

Unlike the situation treated in GG, we now assume that the observation density does have mass along the diagonal. This has the consequence that the integral equation may no longer be a Fredholm integral equation. However, we first consider a desingularized integral equation, to which the theory on Fredholm integral equations of the second kind can be applied (Kress (1989)).

If F has jumps, the solution of the integral equation will in general also have jumps. However, the key observation in analyzing the integral equation and in proving the efficiency of the NPMLE is that, even when F has discontinuities, we can make a change of scale in such a way that the solution of the integral equation can be extended to a Lipschitz function in the transformed scale.

We first introduce some notation. Let G(t) = F^{-1}(t), t ∈ [0, 1], with a derivative g which exists except for at most a finite number of points, where, however, G has left and right derivatives. Furthermore, let

k̄(t) = k(G(t)),   H̄(t, u) = H(G(t), G(u)),   and likewise   h̄(t, u) = h(G(t), G(u)),   (2.8)

and let d̄_F be defined by

d̄_F(t) = t(1 − t) / ( (1 − t) h̄_1(t) + t h̄_2(t) ),   (2.9)

where h̄_i = h_i ∘ G, i = 1, 2. Note that, if F has jumps, d̄_F ≠ d_F ∘ G. Also note that k̄, d̄_F and h̄ are continuous. In a similar way, we define

c̄_F(t)^{-1} = ∫_0^t (1 − s) h̄(s, t) dG(s) + ∫_t^1 s h̄(t, s) dG(s)

and

h̄̃(t, u) = t(1 − t) h̄(t, u)   if t ≤ u,
h̄̃(u, t) = t(1 − t) h̄(u, t)   if u ≤ t.   (2.10)

We have the following lemma.

Lemma 2.1 (i) The integral equation

φ̄_ε(t) = d̄_F(t) [ k̄(t) − ∫_0^t (φ̄_ε(t) − φ̄_ε(t′))/((t − t′) ∨ ε) h̄(t′, t) dG(t′) + ∫_t^1 (φ̄_ε(u) − φ̄_ε(t))/((u − t) ∨ ε) h̄(t, u) dG(u) ]   (2.11)

has a unique continuous solution φ̄_ε, satisfying

inf_{x ∈ [0,M]} d_F(x) k(x) ≤ φ̄_ε(t) ≤ sup_{x ∈ [0,M]} d_F(x) k(x),   (2.12)

for all t ∈ [0, 1] and ε > 0. For points t in the range of F, say t = F(x), we have φ̄_ε(t) = φ_ε(x).

(ii) The integral equation

ξ̄_ε(t) = c̄_F(t) [ k̄(t) − ∫_0^t (ξ̄_ε(t) − ξ̄_ε(t′))/((t − t′) ∨ ε) h̄̃(t′, t) dG(t′) + ∫_t^1 (ξ̄_ε(u) − ξ̄_ε(t))/((u − t) ∨ ε) h̄̃(t, u) dG(u) ]   (2.13)

has a unique continuous solution ξ̄_ε, satisfying

inf_{x ∈ [0,M]} c_F(x) k(x) ≤ ξ̄_ε(t) ≤ sup_{x ∈ [0,M]} c_F(x) k(x),   (2.14)

for all t ∈ [0, 1] and ε > 0. For points t in the range of F, say t = F(x), we have ξ̄_ε(t) = ξ_ε(x).

Proof:

ad (i): By the Fredholm theory, as used e.g. in Geskus and Groeneboom (1996), theorem 5, p. 82, the φ̄_ε-equation (2.11) can be shown to have a unique continuous solution, for each ε > 0. Note that the integration in (2.11) is only with respect to dG(t′) and dG(u) and therefore only involves values belonging to the range of F. So for points t in the range of F we have φ̄_ε(t) = φ_ε(G(t)). Let m_0 = arg min φ̄_ε and s_0 = arg max φ̄_ε. We have

φ̄_ε(s_0) ≤ d̄_F(s_0) k̄(s_0) ≤ sup_{x ∈ [0,M]} d_F(x) k(x),

since φ̄_ε(s_0) ≥ φ̄_ε(t), t ∈ [0, 1], and similarly

φ̄_ε(m_0) ≥ d̄_F(m_0) k̄(m_0) ≥ inf_{x ∈ [0,M]} d_F(x) k(x),

since φ̄_ε(m_0) ≤ φ̄_ε(t), t ∈ [0, 1]. Hence we have (2.12).

ad (ii): The argument is completely similar to the argument given for (i).

The following lemma is the crux of the proof of the existence of the solution to the original integral equation.

Lemma 2.2 The functions φ̄_ε are Lipschitz on [0, 1], uniformly in ε > 0.

Proof: We will use similar notation as in Lemma 2.1. Let x_1, ..., x_m be the points of jump of F and let x_0 = 0, x_{m+1} = M. Furthermore, let τ_i = F(x_i), i = 0, ..., m+1. For i = 0, ..., m, the interval [τ_i, τ_{i+1}] can be divided into two parts:

(1) the interval [τ_i, τ_i′), where τ_i′ = F(x_{i+1}−). The interval [τ_i, τ_i′) corresponds to the interval [x_i, x_{i+1}) in the original scale. The function G is strictly increasing and differentiable on the interval (τ_i, τ_i′), and is right and left differentiable at τ_i and τ_i′ respectively.

(2) the interval [τ_i′, τ_{i+1}]. This interval corresponds to the jump of F at x_{i+1}. Here the function G is constant, again having right and left derivatives at the respective endpoints.

11 If i = m, the second interval only consists of the point 1. Let D = τ,..., τ m+1 τ,..., τ m discontinuity points of k (t), d F (t), 1 t (t) = t h(t, t ) for t t, and 2 u(t) = t h(u, t) for t u. Then φ ɛ (t) is differentiable for t / D, and has left and right derivatives for t D. Using k(t) t φ ɛ(t) φ ɛ(t ) (t t ) ɛ h(t, t) dg(t ) + 1 t φ ɛ(u) φ ɛ(t) (u t) ɛ = φ ɛ (t)/ d F (t) = ξ ɛ (t)[(1 t) h 1 (t) + t h 2 (t)], and using left or right dervatives when t D, we have: φ ɛ (t) = d F (t) ξ ɛ (t) [(1 t) h 1 (t) + t h 2 (t)] + d F (t) k (t) t φ ɛ(t) φ ɛ(t ) (t t ) ɛ 1 t h(t, t) dg(t ) φ ɛ(u) φ ɛ(t) (u t) ɛ + t d F (t) φ ɛ (t) t :t t t t φ ɛ(t) φ ɛ(t ) >ɛ d F (t) φ ɛ (t) ɛ 1 t t ɛ + (t t ) 2 d H(t, t) h(t, u) dg(u) h(t, t u) dg(u) φ ɛ (t) u t φ ɛ(u) φ ɛ(t) u:u t>ɛ t+ɛ h(t, t)g(t ) dt + t d H(t, u) (u t) 2 h(t, u)g(u) du. (2.15) Note that H(t, t u) = h(t, u)g(t) and similarly for the other partial derivative of H. Moving the terms containing φ ɛ to the left-hand side of (2.15), shows that φ ɛ (t) has a finite upper bound, using Lemma 2.1. Moreover, φ ɛ is piecewise continuous on the closed intervals from one point in D to the subsequent one. So φ ɛ attains a maximum value, which may be a right or left derivative. The rest of the proof is devoted to showing that this maximum value is uniform in ɛ. def Let M ɛ = sup t [,1] φ ɛ (t) and suppose that φ ɛ attains its supremum at a point s. Note that M ɛ, since φ ɛ () = φ ɛ (1) = and φ ɛ is continuous. Then, if < t < s ɛ, φ ɛ (s) s t Likewise, if 1 > t > s + ɛ, we get φ ɛ(s) φ ɛ(t) (s t) 2 φ ɛ(s) t s s t φ ɛ (s) φ ɛ (u) du φ ɛ(t) φ ɛ(s) (t s) 2. (s t) 2. So these parts work in the opposite direction, and are harmless in (2.15). Now let K ɛ (t) be defined by K ɛ (t) def = d F (t) k (t) + d F (t) ξ ɛ (t) [(1 t) h 1 (t) + t h 2 (t)] t d F (t) φ ɛ(t) φ ɛ(t ) (t t ) ɛ 1 t h(t, t) dg(t ) t φ ɛ(u) φ ɛ(t) (u t) ɛ h(t, t u) dg(u) 11

12 and let C ɛ (t) be defined by C ɛ (t) def = 1 + d F (t) ɛ 1 t Then we have implying t ɛ h(t, t)g(t ) dt + In a similar way, if m ɛ def = inf t [,1] φ ɛ(t), we get t+ɛ t h(t, u)g(u) du, t [, 1]. (2.16) φ ɛ(s) C ɛ (s) K ɛ (s), (2.17) M ɛ sup K ɛ (t)/c ɛ (t). (2.18) t [,1] m ɛ Let the function A δ be defined by t A δ (t) def = d F (t) h(t t, t) dg(t) + t δ Fix δ > such that, for all t [, 1], inf K ɛ(t)/c ɛ (t). (2.19) t [,1] t+δ Note that δ > can be chosen independently of ɛ >, since t t h(t, u) dg(u), t [, 1] A δ (t)/c ɛ (t) 1 2. (2.2) lim C ɛ (t) = d F (t) h(t, t)g(t), t (, 1). ɛ Then we get from (2.2), for each t [, 1], by applying the mean value theorem on the ratios φ ɛ (t) φ ɛ (t )/(t t ) and φ ɛ (u) φ ɛ (t)/(u t), t d F (t) φ ɛ(t) φ ɛ(t ) h(t t+δ t, t) dg(t ) + φ ɛ(u) φ ɛ(t) h(t, t u) dg(u) /C ɛ (t) t δ (t t ) ɛ A δ (t) maxm ɛ, m ɛ /C ɛ (t) 1 2 maxm ɛ, m ɛ. Defining B δ (t) by B δ (t) we get, for t [, 1], t (u t) ɛ def = d F (t) k (t) + d F (t) [(1 t) h 1 (t) + t h 2 (t)] sup + 2 d F (t) δ sup d F (t ) k(t ) t [,1] sup t [,t] d F (t) k (t) + d F (t) ξ ɛ (t) [(1 t) h 1 (t) + t h 2 (t)] + d F (t) t δ φ ɛ(t) + φ ɛ(t ) t t t h(t, t) dg(t ) + d F (t) k (t) + d F (t) ξ ɛ (t) [(1 t) h 1 (t) + t h 2 (t)] + 2 d F (t) δ B δ (t) c, sup φ ɛ (t ) t [,1] t δ t h(t, t) dg(t ) + 12 t [,1] t h(t, t) + sup 1 t+δ 1 t+δ c F (t ) k(t ) u [t,1] φ ɛ(t) + φ ɛ(u) u t t h(t, u) dg(u) h(t, t u) h(t, t u) dg(u) (2.21)

13 for some constant c, independent of ɛ and t. Hence, for each t [, 1], implying φ ɛ (t) A δ(t)/c ɛ (t) + B δ (t)/c ɛ (t) 1 2 maxm ɛ, m ɛ + B δ (t)/c ɛ (t), 1 2 maxm ɛ, m ɛ sup B δ (t)/c ɛ (t) sup c/c ɛ (t) c, (2.22) t [,1] t [,1] for some constant c independent of ɛ. Hence φ ɛ (t) is bounded on [, 1], uniformly in ɛ and t, implying that φ ɛ is Lipschitz, uniformly in ɛ >. We now have the following theorem. Theorem 2.2 Let G(t) = F 1 (t), t [, 1], with a derivative g which exists except for at most a finite number of points, where G has left and right derivatives. Furthermore, let k(t) = k(g(t)), H(t, u) = H(G(t), G(u)), h(t, u) = h(g(t), G(u)), and let df be defined by where h i = h i G, i = 1, 2. Then (i) The integral equation t φ(t) = d F (t) k(t) d F (t) def = t(1 t) (1 t) h 1 (t) + t h 2 (t), (2.23) 1 φ(t) φ(t ) t t d H(t, t) + t has a unique solution which is Lipschitz on [, 1]. φ(u) φ(t) u t d H(t, u), t [, 1], (2.24) (ii) The Lipschitz norm in (i) has the following upper bound. Let C(t) be defined by Moreover, let A δ (t) and B δ (t) be defined by t A δ (t) def = d F (t) h(t t, t) dg(t ) + t δ and C(t) def = d F (t)g(t) h(t, t). (2.25) t+δ t t h(t, u) dg(u), (2.26) B δ (t) def = d F (t) k (t) + d F (t) [(1 t) h 1 (t) + t h 2 (t)] sup + 2 d F (t) δ sup d F (t ) k(t ) t [,1] sup t [,t] t [,1] t h(t, t) + sup c F (t ) k(t ) u [t,1] t h(t, u) (2.27) At the points in D = discontinuity points of g(t), augmented with and 1 discontinuity points of k (t), d F (t), 1 t (t) = h(t, t t ) for t t, and 2 u (t) = t h(u, t) for t u, 13

A_δ and B_δ have two versions, one corresponding to taking left derivatives and one corresponding to taking right derivatives. Then there exists a δ > 0 such that

sup_{t ∈ [0,1]} A_δ(t)/C(t) ≤ 1/2,

and we have

|φ̄(u) − φ̄(t)| ≤ c (u − t),   0 ≤ t < u ≤ 1,   (2.28)

where c is given by

c = 2 sup_{t ∈ [0,1]} B_δ(t)/C(t).   (2.29)

(iii) The integral equation (2.5) has a unique solution φ.

Proof:

ad (i): By the preceding two lemmas, the set {φ̄_ε : ε ≤ ε_0} (for some ε_0 > 0) is bounded and equicontinuous. Hence, by the Arzelà–Ascoli theorem, each sequence φ̄_{ε_n}, ε_n ↓ 0, has a subsequence (φ̄_{ε_m}), converging in the supremum metric to a continuous function φ̄ on [0, 1]. By Lebesgue's dominated convergence theorem we get, for such a subsequence (φ̄_{ε_m}),

φ̄(x) = lim_{m→∞} φ̄_{ε_m}(x) = d̄_F(x) [ k̄(x) − ∫_0^x (φ̄(x) − φ̄(t))/(x − t) h̄(t, x) dG(t) + ∫_x^1 (φ̄(t) − φ̄(x))/(t − x) h̄(x, t) dG(t) ].   (2.30)

Uniqueness of the solution follows in the same way as in lemma 2.1.

ad (ii): It was shown in (2.22) in the proof of Lemma 2.2 that

sup_{t ∈ [0,1]} |φ̄_ε′(t)| ≤ 2 sup_{t ∈ [0,1]} B_δ(t)/C_ε(t),

where C_ε is defined by (2.16). But since

lim_{ε ↓ 0} C_ε(t) = d̄_F(t) h̄(t, t) g(t),

for t ∈ [0, 1], (2.28) now follows.

ad (iii): We define φ by φ(x) = φ̄(F(x)). If t = F(x), we get, by a change of variables,

φ(x) = φ̄(t) = d̄_F(t) [ k̄(t) − ∫_0^t (φ̄(t) − φ̄(t′))/(t − t′) dH̄(t′, t) + ∫_t^1 (φ̄(u) − φ̄(t))/(u − t) dH̄(t, u) ]
  = d_F(x) [ k(x) − ∫_0^x (φ(x) − φ(x′))/(F(x) − F(x′)) dH(x′, x) + ∫_x^M (φ(y) − φ(x))/(F(y) − F(x)) dH(x, y) ],

and hence φ satisfies the original integral equation. Uniqueness of φ follows from uniqueness of φ̄ (since a solution φ conversely defines a solution φ̄ on the inverse scale).

Remark. The same arguments can be applied to prove existence of a solution to the ξ-equation. Hence φ can be written as φ = F(1 − F) ξ.

Solvability of κ_F = L_1^* L_1 a can now immediately be seen.

Corollary 2.1 The equation κ_F = L_1^* L_1 a is solvable.

Proof: By the Lipschitz property of φ̄ we have, for any 0 ≤ x < y ≤ M,

|φ(y) − φ(x)| / (F(y) − F(x)) = |φ̄(F(y)) − φ̄(F(x))| / (F(y) − F(x)) ≤ K,

for some constant K. Thus the Radon–Nikodym derivative dφ/dF is a.e.-[F] bounded by K. For the canonical gradient we get, if t < u,

θ_F(t, u, δ, γ) = [L_1 a](t, u, δ, γ) = −δ φ(t)/F(t) − γ (φ(u) − φ(t))/(F(u) − F(t)) + (1 − δ − γ) φ(u)/(1 − F(u)).   (2.31)

3 Asymptotic efficiency of the NPMLE

In this section we will denote the unknown distribution function of the unobservable random variables X_i by F_0. As in section 2, we will assume that F_0 is continuous. Let F̂_n be the NPMLE of F_0, based on the sample of observations (U_1, V_1, Δ_1, Γ_1), ..., (U_n, V_n, Δ_n, Γ_n). It is obtained by maximizing the likelihood

∏_{i=1}^n F(U_i)^{Δ_i} (F(V_i) − F(U_i))^{Γ_i} (1 − F(V_i))^{1−Δ_i−Γ_i} h(U_i, V_i)   (3.1)

over the class of piecewise constant right-continuous (sub-)distribution functions on [0, M], having jumps only at a subset of the points U_i and V_i, i = 1, ..., n. The properties of the function thus obtained are discussed in Groeneboom and Wellner (1992) and GG. A rather important property of the NPMLE is that it does not depend on h(U_i, V_i), so we do not have to perform any preliminary density estimation or bandwidth choice. The fact that we do not have to solve a bandwidth problem is one of the great advantages of the nonparametric maximum likelihood approach in the present problem. By the restriction that F̂_n only has mass at the observation times, also made in Groeneboom and Wellner (1992), we get a piecewise constant function. Let x_0 = 0, x_{m+1} = M and let x_1 < ... < x_m be the points of jump of F̂_n in the interval (0, M). Then F̂_n satisfies the following properties:

Proposition 3.1 Any function σ that is constant on the same intervals J_i = [x_{i−1}, x_i) as F̂_n satisfies

∫ σ(t) 1_{t ∈ J_i} [ δ/F̂_n(t) − γ/(F̂_n(u) − F̂_n(t)) ] dQ_n(t, u, δ, γ) + ∫ σ(u) 1_{u ∈ J_i} [ γ/(F̂_n(u) − F̂_n(t)) − (1 − δ − γ)/(1 − F̂_n(u)) ] dQ_n(t, u, δ, γ) = 0

for i = 2, ..., m.

Proof: See Groeneboom and Wellner (1992), part II, proposition 1.3, and Geskus and Groeneboom (1997), corollary 1.
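The authors compute F̂_n with the iterative convex minorant algorithm; as a simple, slower stand-in for experimentation, the sketch below maximizes the h-free part of the likelihood (3.1) by Turnbull-type self-consistency (EM) steps, restricting the mass to the observation times as in the text. The EM route and the default M = 1 are assumptions of this illustration, not the paper's algorithm.

```python
import numpy as np

def npmle_case2(u, v, delta, gamma, M=1.0, iters=500):
    """Self-consistency (EM) iterations for the NPMLE of F from case 2 data.
    Mass is restricted to the observation times (plus M)."""
    support = np.unique(np.concatenate([u, v, [M]]))
    # each observation carries the interval (0, U], (U, V] or (V, M]
    left = np.where(delta == 1, 0.0, np.where(gamma == 1, u, v))
    right = np.where(delta == 1, u, np.where(gamma == 1, v, M))
    # A[i, j] = 1 if support point j lies in the interval implied by observation i
    A = (support[None, :] > left[:, None]) & (support[None, :] <= right[:, None])
    p = np.full(len(support), 1.0 / len(support))
    for _ in range(iters):
        w = A * p[None, :]
        w /= w.sum(axis=1, keepdims=True)      # E-step: conditional mass within each interval
        p = w.mean(axis=0)                     # M-step: average the conditional masses
    return support, np.cumsum(p)

# usage with the simulated sample from the introduction sketch:
# support, Fhat = npmle_case2(u, v, delta, gamma)
# masses = np.diff(np.concatenate([[0.0], Fhat]))
# mean_hat = np.sum(masses * support)          # K(Fhat) for the mean
```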

Proposition 3.2 Prob{ lim_{n→∞} ‖F̂_n − F_0‖_∞ = 0 } = 1.

Proof: See Groeneboom and Wellner (1992), part II, sections 4.1 (case 1) and 4.3 (case 2).

Proposition 3.3 ‖F̂_n − F_0‖_{H_i} = O_p(n^{−1/3} (log n)^{1/6}) as n → ∞, for i = 1, 2.

Proof: See Geskus and Groeneboom (1997), corollary 2, p. 29, and van de Geer (1996), example 3.2.

The following result is needed in the proof of Lemma 3.1.

Proposition 3.4 lim_{n→∞} Pr{ F̂_n is defective } = 0.

Proof: See Geskus and Groeneboom (1997), proposition 1, p. 26. Although the conditions on H are different there, the proof is the same, since the difference in conditions has no bearing on this particular property.

In addition to the smoothness conditions (D1) to (D3), given in section 2, we assume

(D4) h(t, t) = lim_{u ↓ t} h(t, u) ≥ c > 0, for all t ∈ (0, M) and some c > 0.

Like in GG, our definition of the canonical gradient θ will be extended to piecewise constant distribution functions F with finitely many discontinuities, based on the solution φ_F of a discrete version of the integral equation (2.5). (In order to stress dependence on F, we will write φ_F instead of φ.) However, since F(v) − F(u) no longer remains bounded away from zero on the region where H puts mass, we have to use an approach different from the one in GG. On one hand, the quotient

(φ_F(v) − φ_F(u)) / (F(v) − F(u)),

for u and v in the same interval of constancy of F, can only be defined correctly if φ_F is constant on the same interval. On the other hand, d_F, h and κ_F in general are not constant on these intervals, making a completely discrete version of the integral equation impossible. Therefore, instead of one function φ_F we now need a pair of functions (φ_F, ψ_F), satisfying

φ_F(x) = d_F(x) [ k(x) − ∫_0^x r_F(t, x) h(t, x) dt + ∫_x^M r_F(x, t) h(x, t) dt ],   (3.2)

where r_F(t, u) is defined by

r_F(t, u) = (φ_F(u) − φ_F(t)) / (F(u) − F(t))   if F(t) < F(u),
r_F(t, u) = (ψ_F(u) − ψ_F(t)) / (F_0(u) − F_0(t))   if F(t) = F(u), t < u,   (3.3)

and where φ_F is constant on the same intervals as F.
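Returning to Propositions 3.2 and 3.3 above for a moment: their consistency statements can be eyeballed numerically with the simulation and NPMLE sketches given earlier (the hypothetical helpers simulate_case2 and npmle_case2, in which F_0 is uniform on [0, 1] by assumption).

```python
import numpy as np

def sup_distance_check(ns=(100, 400, 1600), seed=4):
    """Empirical look at Proposition 3.2: sup |Fhat_n - F_0| for growing n,
    using the simulate_case2 and npmle_case2 sketches (F_0(x) = x there)."""
    rng = np.random.default_rng(seed)
    out = {}
    for n in ns:
        u, v, d, g = simulate_case2(n, rng)
        s, Fhat = npmle_case2(u, v, d, g)
        out[n] = np.max(np.abs(Fhat - s))      # distance to F_0 at the support points
    return out

# print(sup_distance_check())   # the sup distance should decrease with n
```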

Since φ_F is constant on the same intervals as F, the only real integral part is the ψ_F-part; the remaining part of the integral can be written as a summation. The key to the proof of the existence of a solution pair (φ_F, ψ_F), and also to the other proofs in this section, is a representation of the equation for φ_F on an inverse scale and the construction of a continuous extension of the equation for φ_F on this inverse scale (similar techniques were used in section 2). Using a similar notation as in section 2, we denote by G the inverse of F, where, for purely discrete distribution functions F, we take the right-continuous version of the inverse, defined by

G(t) = inf{ x ∈ [0, M] : F(x) > t },   t ≥ 0.

Furthermore, we define k̄_F = k ∘ G, h̄_{1,F} = h_1 ∘ G, h̄_{2,F} = h_2 ∘ G and

d̄_F(t) = t(1 − t) / ( (1 − t) h̄_{1,F}(t) + t h̄_{2,F}(t) ),

and likewise H̄(t, u) = H(G(t), G(u)), 0 ≤ t ≤ u ≤ 1. For part (iii) of theorem 3.1 we will also need the following notation:

Δ_i(g) := ∫_{x_i}^{x_{i+1}} g(t) dt,   (3.4)

Δ_{ij}(h) := ∫_{u=x_i}^{x_{i+1}} ∫_{v=x_j}^{x_{j+1}} h(u, v) dv du,   (3.5)

d_i := z_i(1 − z_i) / ( Δ_i(h_1)(1 − z_i) + Δ_i(h_2) z_i ).   (3.6)

The following theorem shows the existence of the solution pair. Moreover it gives a uniform Lipschitz condition for the functions φ̄_F and ψ_F, which will be a crucial tool in showing the Donsker property for θ̄_F.

Theorem 3.1 Let the following conditions on F_0, H and κ_{F_0} be satisfied: (M1) to (M3); (D1) to (D4); (F1) to (F3). Furthermore, let F_{[0,M]} be the set of discrete non-defective distribution functions on [0, M] with finitely many points of jump, contained in (0, M). Then there exists an ε > 0 such that, for F ∈ F̃, where F̃ is defined by

F̃ = { F ∈ F_{[0,M]} : sup_{x ∈ [0,M]} |F(x) − F_0(x)| ≤ ε },

(i) There exists a unique Lipschitz function φ̄_F : [0, 1] → IR such that, for t ∈ [0, 1] \ D,

φ̄_F(t) = d̄_F(t) [ k̄_F(t) − ∫_{[0,t)} (φ̄_F(t) − φ̄_F(t′))/(t − t′) dH̄(t′, t) + ∫_{(t,1]} (φ̄_F(u) − φ̄_F(t))/(u − t) dH̄(t, u) ],   (3.7)

where D is the (finite) set of discontinuities of the right-continuous inverse G = F^{-1} in (0, 1), augmented with 0 and 1. The function φ̄_F is Lipschitz, uniformly for F ∈ F̃.

(ii) There exists a pair (φ_F, ψ_F), solving the integral equation (3.2), where φ_F is absolutely continuous with respect to F and the function ψ_F is Lipschitz on each interval between jumps of F, uniformly for F ∈ F̃, with a Lipschitz norm not depending on the interval.

(iii) Let z_i = F(x_i) and y_i = φ_F(x_i), i = 1, ..., m. Then, using the definitions (3.4) to (3.6), we have that the vector y = (y_1, ..., y_m) is the unique solution of the set of linear equations

y_i [ 1/d_i + Σ_{j<i} Δ_{ji}(h)/(z_i − z_j) + Σ_{j>i} Δ_{ij}(h)/(z_j − z_i) ] = Δ_i(k) + Σ_{j<i} Δ_{ji}(h)/(z_i − z_j) y_j + Σ_{j>i} Δ_{ij}(h)/(z_j − z_i) y_j,   i = 1, ..., m.   (3.8)

Theorem 3.1 will be proved by approximating the purely discrete distribution function F by the functions F_α = (1 − α)F_0 + α F and by studying the behavior of the corresponding functions φ̄_{F_α}, as α ↑ 1. The rather technical proof is given in the Appendix.

By Theorem 3.1, the definition of the function θ_F can be extended to piecewise constant distribution functions F ∈ F̃ by defining

θ̄_F(t, u, δ, γ) = −δ φ_F(t)/F(t) − γ r_F(t, u) + (1 − δ − γ) φ_F(u)/(1 − F(u)),   (3.9)

where φ_F and ψ_F solve equation (3.2), and where φ_F(t)/F(t) and φ_F(u)/(1 − F(u)) are defined to be zero if F(t) = 0 or if 1 − F(u) = 0, respectively. Note that θ̄_F no longer has an interpretation as canonical gradient. In the sequel we will write Q_F instead of Q_{F,H}. We are now ready to formulate our main result.

Theorem 3.2 Let the conditions of Theorem 3.1 be satisfied. Then

√n (K(F̂_n) − K(F_0)) →_D N(0, ‖θ_{F_0}‖^2_{Q_{F_0}})   as n → ∞.   (3.10)

Proof: Note that it is sufficient to show the following:

√n (K(F̂_n) − K(F_0)) = √n ∫ θ_{F_0} d(Q_n − Q_{F_0}) + o_p(1).   (3.11)

Moreover, using the uniform consistency of F̂_n (see Proposition 3.2), we may assume that F̂_n ∈ F̃, for all large n, where F̃ is defined as in Theorem 3.1. The proof consists of the following steps.

I. By conditions (D1) and (F2), and Proposition 3.3 we have

√n (K(F̂_n) − K(F_0)) = √n ∫ κ_{F_0} d(F̂_n − F_0) + o_p(1).

II. In lemma 3.1 the following will be shown:

∫ κ_{F_0} d(F − F_0) = −∫ θ̄_F dQ_{F_0},   if F ∈ F̃.

III. Unlike the situation in GG, φ_{F̂_n} is constant on the same intervals as F̂_n. Since γ = 0 if F̂_n(u) = F̂_n(t), Proposition 3.1 can be used to obtain ∫ θ̄_{F̂_n} dQ_n = 0, yielding

−√n ∫ θ̄_{F̂_n} dQ_{F_0} = √n ∫ θ̄_{F̂_n} d(Q_n − Q_{F_0}).

IV. This is further split into

√n ∫ θ̄_{F̂_n} d(Q_n − Q_{F_0}) = √n ∫ θ_{F_0} d(Q_n − Q_{F_0}) + √n ∫ (θ̄_{F̂_n} − θ_{F_0}) d(Q_n − Q_{F_0}).

The last term will be shown to be o_p(1) (Theorem 3.3).
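Theorem 3.2 lends itself to an informal Monte Carlo check under the illustrative uniform setup of the earlier sketches: compare n times the sampling variance of K(F̂_n), for K the mean, with the lower bound ∫ k(x) φ(x) dx computed from the integral-equation sketch after (2.5). All names below (simulate_case2, npmle_case2, phi, x) refer to those hypothetical sketches, not to anything defined in the paper.

```python
import numpy as np

def mc_variance_of_mean(n=500, reps=200, seed=2):
    """Monte Carlo estimate of n * Var(K(Fhat_n)) for K(F) the mean,
    using simulate_case2 and npmle_case2 from the earlier sketches."""
    rng = np.random.default_rng(seed)
    est = np.empty(reps)
    for r in range(reps):
        u, v, d, g = simulate_case2(n, rng)
        s, Fhat = npmle_case2(u, v, d, g)
        masses = np.diff(np.concatenate([[0.0], Fhat]))
        est[r] = np.sum(masses * s)
    return n * est.var()

# compare with the lower bound phi.mean() computed in the sketch after equation (2.5):
# print(mc_variance_of_mean(), phi.mean())
```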

Lemma 3.1 Let F̃ be defined as in Theorem 3.1. Then we have, under the conditions of Theorem 3.1, for all F ∈ F̃,

∫ κ_{F_0} d(F − F_0) = −∫ θ̄_F dQ_{F_0}.

Proof: Let, for any distribution function F, L_F : L_2(F) → L_2(Q_F) denote the conditional expectation operator

[L_F a](u, v, δ, γ) = δ ∫_0^u a dF / F(u) + γ ∫_u^v a dF / (F(v) − F(u)) + (1 − δ − γ) ∫_v^M a dF / (1 − F(v))   a.e. [Q_F],

with adjoint L^*, given by the conditional expectation

[L^* b](x) = E[b(U, V, Δ, Γ) | X = x]   a.e.-F.

Since the adjoint is an expectation, conditionally on the value of the random variable X ~ F, its structure does not depend on F. F only determines where it has to be defined (the a.e.-F part). Still, a ∈ L_2^0(F) implies L_F(a) ∈ L_2^0(Q_F). The ratios occurring in θ̄_F, F ∈ F̃, are bounded, since, by Theorem 3.1, φ̄_F and ψ_F are Lipschitz functions, if F ∈ F̃. Hence θ̄_F ∈ L_2(Q_F), for F ∈ F̃. Let 1 ∈ L_2(F) denote the constant function 1(x) ≡ 1, x ∈ IR. Under L_F this transforms into the constant function 1(t, u, δ, γ) ≡ 1 on L_2(Q_F). Now we have

∫ θ̄_F dQ_{F_0} = ⟨θ̄_F, 1⟩_{Q_{F_0}} = ⟨θ̄_F, L_{F_0}(1)⟩_{Q_{F_0}} = ⟨L^*(θ̄_F), 1⟩_{F_0} = ∫ L^*(θ̄_F) dF_0.

If we can prove L^*(θ̄_F) = κ_{F_0} − ∫ κ_{F_0} dF a.e.-F_0, we are done. This is shown as follows. Recall that the integral equation was obtained by taking derivatives on both sides of the equation κ_{F_0}(x) = [L^* θ_{F_0}](x) for all x ∈ [0, M]. Now we will go the other way, integrate, but replace θ_{F_0} by θ̄_F, obtaining

[L^* θ̄_F](x) = κ_{F_0}(x) + C   for all x ∈ [0, M].

For the constant C we have, using that F is non-defective,

C = ∫ C dF = ∫ L^*(θ̄_F) dF − ∫ κ_{F_0} dF.

It is easily shown that θ̄_F is contained in L_2^0(Q_F). (However, it is not contained in R(L_F), because of the part ψ_F of the solution pair (φ_F, ψ_F).) Now we have

⟨L^*(θ̄_F), 1⟩_F = ⟨θ̄_F, L_F(1)⟩_{Q_F} = ⟨θ̄_F, 1⟩_{Q_F} = 0.

Hence C = −∫ κ_{F_0} dF, which proves the claim.

The hard part of the proof of Theorem 3.2 is to show that

√n ∫ (θ̄_{F̂_n} − θ_{F_0}) d(Q_n − Q_{F_0}) = o_p(1).   (3.12)

We will prove the Donsker property for the middle part γ r_F of θ̄_F and only indicate the very similar (simpler) proofs for the two other parts of θ̄_F at the end of the proof of Theorem 3.3. Since the proof is rather involved, we first sketch the general ideas.

We start by defining a neighborhood F_n, shrinking with n, such that the probability that F̂_n belongs to F_n tends to 1, as n → ∞. Let, for F ∈ F̃,

q_F(t, u, δ, γ) = δ F(t) + γ {F(u) − F(t)} + (1 − γ − δ) {1 − F(u)}   (3.13)

and

q_{F_0}(t, u, δ, γ) = δ F_0(t) + γ {F_0(u) − F_0(t)} + (1 − δ − γ) {1 − F_0(u)}.   (3.14)

It is proved in van de Geer (1996) that

h^2(q_{F̂_n}, q_{F_0}) = O_p(n^{−2/3} (log n)^{1/3}),   (3.15)

where h(q_F, q_{F_0}) is the Hellinger distance between the densities q_F and q_{F_0} w.r.t. the product of the measure induced by H and counting measure on {0, 1}^2 \ {(1, 1)}. The result (3.15) has already been used above, since Proposition 3.3 is based on it.
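The squared Hellinger distance appearing in (3.15) can be approximated directly from (3.13)–(3.14): sum over the three cells of (δ, γ) and integrate over (u, v) against H. The sketch below does this by Monte Carlo; the ½ normalization h²(p, q) = ½ ∫ (√p − √q)² dμ, and the concrete F, F_0 and H, are assumptions of this illustration.

```python
import numpy as np

def hellinger_sq(F, F0, u, v):
    """Monte Carlo approximation of h^2(q_F, q_F0) for case 2, with (u, v) a sample
    from H.  q_F(t, u, delta, gamma) = delta*F(t) + gamma*(F(u)-F(t))
    + (1-delta-gamma)*(1-F(u)), as in (3.13)-(3.14)."""
    qF = np.stack([F(u), F(v) - F(u), 1.0 - F(v)])        # the three (delta, gamma) cells
    qF0 = np.stack([F0(u), F0(v) - F0(u), 1.0 - F0(v)])
    return 0.5 * np.mean(np.sum((np.sqrt(qF) - np.sqrt(qF0)) ** 2, axis=0))

rng = np.random.default_rng(3)
a = rng.uniform(size=(100_000, 2))
u, v = a.min(axis=1), a.max(axis=1)                       # (U, V) uniform on the triangle
F0 = lambda x: x                                          # true F_0 uniform (illustration)
F = lambda x: x + 0.05 * np.sin(2 * np.pi * x)            # a nearby distribution function
print(hellinger_sq(F, F0, u, v))
```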

Now, if F_n is the set of distribution functions F ∈ F̃ satisfying

h^2(q_F, q_{F_0}) ≤ n^{−2/3} log n,   (3.16)

we have:

Pr{ F̂_n ∈ F_n } → 1,   as n → ∞.

In fact, the upper bound n^{−2/3} log n, defining the class F_n, could be replaced by c_n n^{−2/3} (log n)^{1/3}, where we only need c_n → ∞, as n → ∞, but we are a little bit wasteful with our powers of log n in an attempt to avoid an accumulation of constants in the upper bounds.

We need to study properties of the empirical integrals Q_n(θ̄_F − θ_{F_0})^2 and Q_n(θ̄_F − θ̄_G)^2, for F, G ∈ F_n. The denominators in θ̄_F, θ_{F_0} and θ̄_G can be arbitrarily close to zero. If F ∈ F̃, then F(u) − F(t) will be zero on a region of positive Lebesgue measure, in which case we get for the middle part γ r_F of θ̄_F:

γ r_F(t, u) = γ {ψ_F(u) − ψ_F(t)} / {F_0(u) − F_0(t)}.

We will face these difficulties by considering three regions of integration:

C_{n,η}(F) = { w : q_F(w) > η q_{F_0}(w), q_{F_0}(w) > n^{−1/3} },   (3.17)

D_η(F) = { w : q_F(w) ≤ η q_{F_0}(w) },   (3.18)

and

C_n(F) = { w : q_{F_0}(w) ≤ n^{−1/3} },   (3.19)

for some η ∈ (0, 1), where the elements w of the sets defined above are of the form w = (t, u, δ, γ). On the region C_{n,η}(F), θ̄_F has a behavior which is comparable to the behavior of θ_{F_0}; on the other regions we just use the uniform boundedness of θ̄_F and the fact that the integrals over these regions become sufficiently small.

For the entropy calculations we shall use ratios r_{F_k,G_k,φ̄_k} of the form

{ φ̄_k(G_k(u)) − φ̄_k(F_k(t)) } / { G_k(u) − F_k(t) },

where F_k and G_k are distribution functions such that F_k ≤ F ≤ G_k ((F_k, G_k) is a bracket for F) and where φ̄_k is a Lipschitz function approximating φ̄_F. In this way the good behavior of the ratios r_F on the region C_{n,η}(F) is preserved on the same region by the approximating ratio r_{F_k,G_k,φ̄_k}. Next we will apply the chaining lemma, using a chain with a kind of funnel structure, preserving this good behavior on the region C_{n,η}(F). Note that the approximating ratios are outside the original class of ratios r_F. In dealing with the region D_η(F), defined by (3.18), we will need the functions

g_F = (q_F − q_{F_0}) / (q_F + q_{F_0}),   (3.20)

for which Lemma A.1 in van de Geer (1996) holds. This lemma, specialized to our situation, is given below for easy reference.

22 Lemma 3.2 Let, for F F n, S n (F ) and G n be defined by S n (F ) = q F >σ n g 2 F dq n 1/2 and G n = g F 1 qf>σn : F F n, where (we take) σ n = n 1/3. Let (ρ n ) n 1 be a non-decreasing sequence of real numbers 1. Then, given < ν < 2 and < C <, there exists an < L < depending on ν and C, such that for τ n n 1 ν 2+ν 2+ν ρn, for all n, we have S n (F ) lim sup n P r h(q F, q F ) (Lτ n ) 8 for some F F n lim sup n 4P r sup δ> ( δ ρ n ) ν H(δ, Gn, Q n ) > C where H(δ, G n, Q n ) denotes the δ-entropy of G n for the L 2 -distance w.r.t. Q n., We will also need the following two lemmas. Lemma 3.3 (i) Let the function a n be defined by a n = 1 qf >n 1/3 / q2 F. Then Q n a n = O p (log n) (3.21) (ii) Let the function b n be defined by b n = 1 qf n 1/3. Then Q n b n = O p (n 2/3 ) (3.22) Proof: Both statements are simple consequences of the Markov inequality: P r Q n a n > c log n (c log n) 1 Q F a n = (c log n) 1 q F >n 1/3 q 2 F dq F = c 1 O(1), and, likewise, P r Q n b n > c n 2/3 c 1 n 2/3 Q F b n = c 1 n 2/3 q F n 1/3 dq F = c 1 O(1). 22

23 Lemma 3.4 Let, for η (, 1), the set D η (F ) be defined by (3.18). Then sup Q n D η (F ) = O p (n 2/3 log n). (3.23) F F n Proof: We have: Q n D η (F ) (1 η) 2 qf q F q F D η(f ) 2 dqn 4(1 η) 2 Q n qf q F q F +q F 2, (3.24) where we use in the first step that q F q F q F = q F 1 q F (1 η)q F, if q F η q F. Furthermore, by Lemmas 3.2 and 3.3, qf q 2 sup F F F n q F >n 1/3 q F +q dqn F = sup gf 2 dq n = O p (n 2/3 log n). (3.25) F F n q F >n 1/3 Relation (3.25) follows from Lemma 3.2, by taking ν = 1, τ n = n 1/3 (log n) 1/2 and ρ n = log n. The metric entropy H(δ, G n, Q n ) of the class G n, with the L 2 -distance with respect to Q n, is then O p (δ 1 (log n) 1/2 ) uniformly in δ >. This is seen by first noting that q F q F q F + q F = 2q F q F + q F 1, and next that, for two distribution functions F 1 and F 2, q F1 q F2 q F1 + q F q F2 + q F q F 1 q F2. q F By the results of Birman and Solomjak (1967) or Ball and Pajor (199), applied on classes of uniformly bounded monotone functions, the δ-entropy of the class of functions q F for the L 2 -distance w.r.t. the probability measure Q n, defined by Q n (B) = B q F >n 1/3 q 2 F dq n / q F >n 1/3 q 2 F dq n, is O(δ 1 ), implying H(δ, G n, Q n ) = O p (δ 1 (log n) 1/2 ), using part (i) of Lemma 3.3. The entropy results needed here can also be found in van der Vaart and Wellner (1996), Theorem 2.7.5, p. 159; Theorem 2.6.9, p. 142 and Example , p. 149, where, in fact, log N(ɛ, F, L 2 (Q)) K/ɛ follows from Theorem by taking F in the latter to be the class of indicators 1 [,t] : t IR, with V = 2. Since, by g F 1 and part (ii) of Lemma 3.3, qf q 2 sup F F F n q F n 1/3 q F +q dqn F dq n O p (n 2/3 ), q F n 1/3 23

24 we get, in combination with (3.25), sup F F n The result now follows from (3.24) and (3.26). qf q F q F +q F 2 dqn = O p (n 2/3 log n). (3.26) From now on we will concentrate on the behavior of the middle part of θ F. Consider triples (F k, G k, φ k ), where F k and G k are distribution functions belonging to F and φ k belongs to a uniform class of Lipschitz functions on [, 1], with the same uniform Lipschitz norm c Lip and upper bound as the functions φ F, F F, of Theorem 3.1. For these triples we define r Fk,G k, φ k (t, u) = φ k (G k (u)) φ k (F k (t)) G k (u) F k (t), if G k (u) > F k (t), (3.27) and, for pairs (t, u) such that F k (t) = G k (u), we define r Fk,G k, φ k (t, u) =. Moreover, we define the semi-metric d n ((F k, G k, φ k ), (F l, G l, φ l )) = 2 max φ k (t) φ 1 l (t) t [,1] F (u) F (t)>n 1/3 F (u) F γ dq (t) 2 n 1/2 2 + Fk (t) F l (t) F (u) F (t) γ dqn + F (u) F (t)>n 1/3 F (u) F (t)>n 1/3 We now have the following result. 1/2 Gk (u) G l (u) F (u) F (t) 2 γ dqn 1/2 (3.28) Lemma 3.5 Let, for distribution functions F and G such that F G, the set C n,η (F, G) be defined by C n,η (F, G) = (t, u) : F (u) F (t) > n 1/3, G(u) F (t) ηf (u) F (t). (3.29) Then we have for all pairs of distribution functions (F k, G k ) and (F l, G l ) such that F k G k and F l G l, Q n (r Fk,G k, φ k r Fl,G l, φ l ) 2 γ1 Cn,η(F k,g k ) C n,η(f l,g l ) C 2 d n ((F k, G k, φ k ), (F l, G l, φ l )) 2, where r F,G, φ is defined by (3.27), and where C > is a constant, only depending on η (, 1) and the Lipschitz norm c Lip, corresponding to the uniform Lipschitz class of functions φ F, F F. Proof: If G k (u) F k (t) > and G l (u) F l (t) >, we have the decomposition = r Fk,G k, φ k (t, u) r Fl,G l, φ l (t, u)γ φk (G k (u)) φ k (F k (t)) G k (u) F k (t) = φ k (G k (u)) φ k (F k (t)) G k (u) F k (t) φ l (G l (u)) φ l (G l (t)) G l (u) F l (t) γ G l (u) F l (t) G k (u) (F k (t) + φ k (G k (u)) φ k (F k (t)) φ l (G l (u)) φ l (F l (t)) 24 γ G l (u) F l (t) γ G l (u) F l (t).

25 Hence Q n (r Fk,G k, φ k r Fl,G l, φ l ) 2 γ1 Cn,η(F k,g k ) C n,η(f l,g l ) c 2 Lip η 2 +η 2 c 2 Lip η 2 +2η 2 +4c 2 Lip η 2 C 2 +C 2 C n,η(f k,g k ) C n,η(f l,g l ) C n,η(f k,g k ) C n,η(f l,g l ) C n,η(f k,g k ) C n,η(f l,g l ) C n,η(f k,g k ) C n,η(f l,g l ) C n,η(f k,g k ) C n,η(f l,g l ) C n,η(f k,g k ) C n,η(f l,g l ) C n,η(f k,g k ) C n,η(f l,g l ) G l (u) F l (t) (G k (u) F k (t)) 2 (F (u) F (t)) 2 γ dq n φ k (G k (u)) φ k (F k (t)) ( φ l (G l (u)) φ l (F l (t))) 2 (F (u) F (t) 2 γ dq n G l (u) F l (t) (G k (u) (F k (t)) 2 (F (u) F (t)) 2 γ dq n φ k (G l (u)) φ k (F l (t)) ( φ l (G l (u)) φ l (F l (t))) 2 (F (u) F (t)) 2 γ dq n (G k (u) G l (u)) 2 +(F k (t) F l (t)) 2 (F (u) F (t)) 2 γ dq n (G k (u) G l (u)) 2 +(F k (t) F l (t)) 2 (F (u) F (t)) 2 γ dq n ( φ k (G l (u)) φ l (G l (u))) 2 +( φ k (F l (t)) φ l (F l (t))) 2 (F (u) F (t)) 2 γ dq n C 2 d n ((F k, G k, φ k ), (F l, G l, φ l )) 2, (3.3) where C > is a constant, only depending on η and the Lipschitz norm c Lip, corresponding to the uniform Lipschitz class of functions φ F, F F. Note that we can also apply Lemma 3.5 in the approximation of a ratio r F, by making the identification r F = r F,F, and by noting that C φf n,η (F, F ) = C n,η (F ). For the set of functions F n we now have the following theorem which finishes the proof of Theorem 3.2. Theorem 3.3 Let F n be the set of distribution functions F F, defined by (3.16). Then we have, under the conditions of Theorem 3.2, for each ɛ >, P r sup n(q n Q F )( θ F θ F ) > ɛ, as n. (3.31) F F n Proof: We will (using the notation of Pollard (1984), page 15) denote the empirical process n(q n Q F ) by E n and the symmetrized empirical process E n by En. Fix an (arbitrary) ɛ >. By the symmetrization lemma we have P r E n (r F r F )γ > ɛ for some F F n 4P r En (r F r F )γ > 1 4 ɛ for some F F n. Let ɛ > and η (, 1) be fixed in the following. We are going to show that P r En(r F r F )γ > 1 4 ɛ for some F F n ξ n, as n, (3.32) for all ξ n = ((T 1, U 1, δ 1, γ 1 ),..., (T n, U n, δ n, γ n )), 25


More information

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure? MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due 9/5). Prove that every countable set A is measurable and µ(a) = 0. 2 (Bonus). Let A consist of points (x, y) such that either x or y is

More information

3 Integration and Expectation

3 Integration and Expectation 3 Integration and Expectation 3.1 Construction of the Lebesgue Integral Let (, F, µ) be a measure space (not necessarily a probability space). Our objective will be to define the Lebesgue integral R fdµ

More information

INVERSE FUNCTION THEOREM and SURFACES IN R n

INVERSE FUNCTION THEOREM and SURFACES IN R n INVERSE FUNCTION THEOREM and SURFACES IN R n Let f C k (U; R n ), with U R n open. Assume df(a) GL(R n ), where a U. The Inverse Function Theorem says there is an open neighborhood V U of a in R n so that

More information

APPLICATIONS OF DIFFERENTIABILITY IN R n.

APPLICATIONS OF DIFFERENTIABILITY IN R n. APPLICATIONS OF DIFFERENTIABILITY IN R n. MATANIA BEN-ARTZI April 2015 Functions here are defined on a subset T R n and take values in R m, where m can be smaller, equal or greater than n. The (open) ball

More information

1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989),

1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989), Real Analysis 2, Math 651, Spring 2005 April 26, 2005 1 Real Analysis 2, Math 651, Spring 2005 Krzysztof Chris Ciesielski 1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations

Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research

More information

Lebesgue measure and integration

Lebesgue measure and integration Chapter 4 Lebesgue measure and integration If you look back at what you have learned in your earlier mathematics courses, you will definitely recall a lot about area and volume from the simple formulas

More information

NONPARAMETRIC CONFIDENCE INTERVALS FOR MONOTONE FUNCTIONS. By Piet Groeneboom and Geurt Jongbloed Delft University of Technology

NONPARAMETRIC CONFIDENCE INTERVALS FOR MONOTONE FUNCTIONS. By Piet Groeneboom and Geurt Jongbloed Delft University of Technology NONPARAMETRIC CONFIDENCE INTERVALS FOR MONOTONE FUNCTIONS By Piet Groeneboom and Geurt Jongbloed Delft University of Technology We study nonparametric isotonic confidence intervals for monotone functions.

More information

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University February 7, 2007 2 Contents 1 Metric Spaces 1 1.1 Basic definitions...........................

More information

Equilibria with a nontrivial nodal set and the dynamics of parabolic equations on symmetric domains

Equilibria with a nontrivial nodal set and the dynamics of parabolic equations on symmetric domains Equilibria with a nontrivial nodal set and the dynamics of parabolic equations on symmetric domains J. Földes Department of Mathematics, Univerité Libre de Bruxelles 1050 Brussels, Belgium P. Poláčik School

More information

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure? MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due ). Show that the open disk x 2 + y 2 < 1 is a countable union of planar elementary sets. Show that the closed disk x 2 + y 2 1 is a countable

More information

Real Analysis Qualifying Exam May 14th 2016

Real Analysis Qualifying Exam May 14th 2016 Real Analysis Qualifying Exam May 4th 26 Solve 8 out of 2 problems. () Prove the Banach contraction principle: Let T be a mapping from a complete metric space X into itself such that d(tx,ty) apple qd(x,

More information

Green s Functions and Distributions

Green s Functions and Distributions CHAPTER 9 Green s Functions and Distributions 9.1. Boundary Value Problems We would like to study, and solve if possible, boundary value problems such as the following: (1.1) u = f in U u = g on U, where

More information

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability... Functional Analysis Franck Sueur 2018-2019 Contents 1 Metric spaces 1 1.1 Definitions........................................ 1 1.2 Completeness...................................... 3 1.3 Compactness......................................

More information

An elementary proof of the weak convergence of empirical processes

An elementary proof of the weak convergence of empirical processes An elementary proof of the weak convergence of empirical processes Dragan Radulović Department of Mathematics, Florida Atlantic University Marten Wegkamp Department of Mathematics & Department of Statistical

More information

1. Bounded linear maps. A linear map T : E F of real Banach

1. Bounded linear maps. A linear map T : E F of real Banach DIFFERENTIABLE MAPS 1. Bounded linear maps. A linear map T : E F of real Banach spaces E, F is bounded if M > 0 so that for all v E: T v M v. If v r T v C for some positive constants r, C, then T is bounded:

More information

2 A Model, Harmonic Map, Problem

2 A Model, Harmonic Map, Problem ELLIPTIC SYSTEMS JOHN E. HUTCHINSON Department of Mathematics School of Mathematical Sciences, A.N.U. 1 Introduction Elliptic equations model the behaviour of scalar quantities u, such as temperature or

More information

MATHS 730 FC Lecture Notes March 5, Introduction

MATHS 730 FC Lecture Notes March 5, Introduction 1 INTRODUCTION MATHS 730 FC Lecture Notes March 5, 2014 1 Introduction Definition. If A, B are sets and there exists a bijection A B, they have the same cardinality, which we write as A, #A. If there exists

More information

t y n (s) ds. t y(s) ds, x(t) = x(0) +

t y n (s) ds. t y(s) ds, x(t) = x(0) + 1 Appendix Definition (Closed Linear Operator) (1) The graph G(T ) of a linear operator T on the domain D(T ) X into Y is the set (x, T x) : x D(T )} in the product space X Y. Then T is closed if its graph

More information

2 Statement of the problem and assumptions

2 Statement of the problem and assumptions Mathematical Notes, 25, vol. 78, no. 4, pp. 466 48. Existence Theorem for Optimal Control Problems on an Infinite Time Interval A.V. Dmitruk and N.V. Kuz kina We consider an optimal control problem on

More information

Brownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539

Brownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539 Brownian motion Samy Tindel Purdue University Probability Theory 2 - MA 539 Mostly taken from Brownian Motion and Stochastic Calculus by I. Karatzas and S. Shreve Samy T. Brownian motion Probability Theory

More information

The Arzelà-Ascoli Theorem

The Arzelà-Ascoli Theorem John Nachbar Washington University March 27, 2016 The Arzelà-Ascoli Theorem The Arzelà-Ascoli Theorem gives sufficient conditions for compactness in certain function spaces. Among other things, it helps

More information

Lecture Notes on PDEs

Lecture Notes on PDEs Lecture Notes on PDEs Alberto Bressan February 26, 2012 1 Elliptic equations Let IR n be a bounded open set Given measurable functions a ij, b i, c : IR, consider the linear, second order differential

More information

Efficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models

Efficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models NIH Talk, September 03 Efficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models Eric Slud, Math Dept, Univ of Maryland Ongoing joint project with Ilia

More information

1 Local Asymptotic Normality of Ranks and Covariates in Transformation Models

1 Local Asymptotic Normality of Ranks and Covariates in Transformation Models Draft: February 17, 1998 1 Local Asymptotic Normality of Ranks and Covariates in Transformation Models P.J. Bickel 1 and Y. Ritov 2 1.1 Introduction Le Cam and Yang (1988) addressed broadly the following

More information

A LOCALIZATION PROPERTY AT THE BOUNDARY FOR MONGE-AMPERE EQUATION

A LOCALIZATION PROPERTY AT THE BOUNDARY FOR MONGE-AMPERE EQUATION A LOCALIZATION PROPERTY AT THE BOUNDARY FOR MONGE-AMPERE EQUATION O. SAVIN. Introduction In this paper we study the geometry of the sections for solutions to the Monge- Ampere equation det D 2 u = f, u

More information

Math 118B Solutions. Charles Martin. March 6, d i (x i, y i ) + d i (y i, z i ) = d(x, y) + d(y, z). i=1

Math 118B Solutions. Charles Martin. March 6, d i (x i, y i ) + d i (y i, z i ) = d(x, y) + d(y, z). i=1 Math 8B Solutions Charles Martin March 6, Homework Problems. Let (X i, d i ), i n, be finitely many metric spaces. Construct a metric on the product space X = X X n. Proof. Denote points in X as x = (x,

More information

1 Glivenko-Cantelli type theorems

1 Glivenko-Cantelli type theorems STA79 Lecture Spring Semester Glivenko-Cantelli type theorems Given i.i.d. observations X,..., X n with unknown distribution function F (t, consider the empirical (sample CDF ˆF n (t = I [Xi t]. n Then

More information

L p Spaces and Convexity

L p Spaces and Convexity L p Spaces and Convexity These notes largely follow the treatments in Royden, Real Analysis, and Rudin, Real & Complex Analysis. 1. Convex functions Let I R be an interval. For I open, we say a function

More information

Preservation Theorems for Glivenko-Cantelli and Uniform Glivenko-Cantelli Classes

Preservation Theorems for Glivenko-Cantelli and Uniform Glivenko-Cantelli Classes Preservation Theorems for Glivenko-Cantelli and Uniform Glivenko-Cantelli Classes This is page 5 Printer: Opaque this Aad van der Vaart and Jon A. Wellner ABSTRACT We show that the P Glivenko property

More information

Economics 204 Summer/Fall 2011 Lecture 5 Friday July 29, 2011

Economics 204 Summer/Fall 2011 Lecture 5 Friday July 29, 2011 Economics 204 Summer/Fall 2011 Lecture 5 Friday July 29, 2011 Section 2.6 (cont.) Properties of Real Functions Here we first study properties of functions from R to R, making use of the additional structure

More information

On a Class of Multidimensional Optimal Transportation Problems

On a Class of Multidimensional Optimal Transportation Problems Journal of Convex Analysis Volume 10 (2003), No. 2, 517 529 On a Class of Multidimensional Optimal Transportation Problems G. Carlier Université Bordeaux 1, MAB, UMR CNRS 5466, France and Université Bordeaux

More information

OPTIMALITY CONDITIONS AND ERROR ANALYSIS OF SEMILINEAR ELLIPTIC CONTROL PROBLEMS WITH L 1 COST FUNCTIONAL

OPTIMALITY CONDITIONS AND ERROR ANALYSIS OF SEMILINEAR ELLIPTIC CONTROL PROBLEMS WITH L 1 COST FUNCTIONAL OPTIMALITY CONDITIONS AND ERROR ANALYSIS OF SEMILINEAR ELLIPTIC CONTROL PROBLEMS WITH L 1 COST FUNCTIONAL EDUARDO CASAS, ROLAND HERZOG, AND GERD WACHSMUTH Abstract. Semilinear elliptic optimal control

More information

Statistical inference on Lévy processes

Statistical inference on Lévy processes Alberto Coca Cabrero University of Cambridge - CCA Supervisors: Dr. Richard Nickl and Professor L.C.G.Rogers Funded by Fundación Mutua Madrileña and EPSRC MASDOC/CCA student workshop 2013 26th March Outline

More information

ASYMPTOTIC EQUIVALENCE OF DENSITY ESTIMATION AND GAUSSIAN WHITE NOISE. By Michael Nussbaum Weierstrass Institute, Berlin

ASYMPTOTIC EQUIVALENCE OF DENSITY ESTIMATION AND GAUSSIAN WHITE NOISE. By Michael Nussbaum Weierstrass Institute, Berlin The Annals of Statistics 1996, Vol. 4, No. 6, 399 430 ASYMPTOTIC EQUIVALENCE OF DENSITY ESTIMATION AND GAUSSIAN WHITE NOISE By Michael Nussbaum Weierstrass Institute, Berlin Signal recovery in Gaussian

More information

Recall that if X is a compact metric space, C(X), the space of continuous (real-valued) functions on X, is a Banach space with the norm

Recall that if X is a compact metric space, C(X), the space of continuous (real-valued) functions on X, is a Banach space with the norm Chapter 13 Radon Measures Recall that if X is a compact metric space, C(X), the space of continuous (real-valued) functions on X, is a Banach space with the norm (13.1) f = sup x X f(x). We want to identify

More information

at time t, in dimension d. The index i varies in a countable set I. We call configuration the family, denoted generically by Φ: U (x i (t) x j (t))

at time t, in dimension d. The index i varies in a countable set I. We call configuration the family, denoted generically by Φ: U (x i (t) x j (t)) Notations In this chapter we investigate infinite systems of interacting particles subject to Newtonian dynamics Each particle is characterized by its position an velocity x i t, v i t R d R d at time

More information

P(E t, Ω)dt, (2) 4t has an advantage with respect. to the compactly supported mollifiers, i.e., the function W (t)f satisfies a semigroup law:

P(E t, Ω)dt, (2) 4t has an advantage with respect. to the compactly supported mollifiers, i.e., the function W (t)f satisfies a semigroup law: Introduction Functions of bounded variation, usually denoted by BV, have had and have an important role in several problems of calculus of variations. The main features that make BV functions suitable

More information

An invariance result for Hammersley s process with sources and sinks

An invariance result for Hammersley s process with sources and sinks An invariance result for Hammersley s process with sources and sinks Piet Groeneboom Delft University of Technology, Vrije Universiteit, Amsterdam, and University of Washington, Seattle March 31, 26 Abstract

More information

Real Analysis, 2nd Edition, G.B.Folland Elements of Functional Analysis

Real Analysis, 2nd Edition, G.B.Folland Elements of Functional Analysis Real Analysis, 2nd Edition, G.B.Folland Chapter 5 Elements of Functional Analysis Yung-Hsiang Huang 5.1 Normed Vector Spaces 1. Note for any x, y X and a, b K, x+y x + y and by ax b y x + b a x. 2. It

More information

MATH MEASURE THEORY AND FOURIER ANALYSIS. Contents

MATH MEASURE THEORY AND FOURIER ANALYSIS. Contents MATH 3969 - MEASURE THEORY AND FOURIER ANALYSIS ANDREW TULLOCH Contents 1. Measure Theory 2 1.1. Properties of Measures 3 1.2. Constructing σ-algebras and measures 3 1.3. Properties of the Lebesgue measure

More information

Introduction to Real Analysis Alternative Chapter 1

Introduction to Real Analysis Alternative Chapter 1 Christopher Heil Introduction to Real Analysis Alternative Chapter 1 A Primer on Norms and Banach Spaces Last Updated: March 10, 2018 c 2018 by Christopher Heil Chapter 1 A Primer on Norms and Banach Spaces

More information

X n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2)

X n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2) 14:17 11/16/2 TOPIC. Convergence in distribution and related notions. This section studies the notion of the so-called convergence in distribution of real random variables. This is the kind of convergence

More information

56 4 Integration against rough paths

56 4 Integration against rough paths 56 4 Integration against rough paths comes to the definition of a rough integral we typically take W = LV, W ; although other choices can be useful see e.g. remark 4.11. In the context of rough differential

More information

Analysis II: The Implicit and Inverse Function Theorems

Analysis II: The Implicit and Inverse Function Theorems Analysis II: The Implicit and Inverse Function Theorems Jesse Ratzkin November 17, 2009 Let f : R n R m be C 1. When is the zero set Z = {x R n : f(x) = 0} the graph of another function? When is Z nicely

More information

Lecture 4. f X T, (x t, ) = f X,T (x, t ) f T (t )

Lecture 4. f X T, (x t, ) = f X,T (x, t ) f T (t ) LECURE NOES 21 Lecture 4 7. Sufficient statistics Consider the usual statistical setup: the data is X and the paramter is. o gain information about the parameter we study various functions of the data

More information

MINIMAL GRAPHS PART I: EXISTENCE OF LIPSCHITZ WEAK SOLUTIONS TO THE DIRICHLET PROBLEM WITH C 2 BOUNDARY DATA

MINIMAL GRAPHS PART I: EXISTENCE OF LIPSCHITZ WEAK SOLUTIONS TO THE DIRICHLET PROBLEM WITH C 2 BOUNDARY DATA MINIMAL GRAPHS PART I: EXISTENCE OF LIPSCHITZ WEAK SOLUTIONS TO THE DIRICHLET PROBLEM WITH C 2 BOUNDARY DATA SPENCER HUGHES In these notes we prove that for any given smooth function on the boundary of

More information

Asymptotic normality of the L k -error of the Grenander estimator

Asymptotic normality of the L k -error of the Grenander estimator Asymptotic normality of the L k -error of the Grenander estimator Vladimir N. Kulikov Hendrik P. Lopuhaä 1.7.24 Abstract: We investigate the limit behavior of the L k -distance between a decreasing density

More information

On continuous time contract theory

On continuous time contract theory Ecole Polytechnique, France Journée de rentrée du CMAP, 3 octobre, 218 Outline 1 2 Semimartingale measures on the canonical space Random horizon 2nd order backward SDEs (Static) Principal-Agent Problem

More information

Some lecture notes for Math 6050E: PDEs, Fall 2016

Some lecture notes for Math 6050E: PDEs, Fall 2016 Some lecture notes for Math 65E: PDEs, Fall 216 Tianling Jin December 1, 216 1 Variational methods We discuss an example of the use of variational methods in obtaining existence of solutions. Theorem 1.1.

More information

REVIEW OF DIFFERENTIAL CALCULUS

REVIEW OF DIFFERENTIAL CALCULUS REVIEW OF DIFFERENTIAL CALCULUS DONU ARAPURA 1. Limits and continuity To simplify the statements, we will often stick to two variables, but everything holds with any number of variables. Let f(x, y) be

More information

Score Statistics for Current Status Data: Comparisons with Likelihood Ratio and Wald Statistics

Score Statistics for Current Status Data: Comparisons with Likelihood Ratio and Wald Statistics Score Statistics for Current Status Data: Comparisons with Likelihood Ratio and Wald Statistics Moulinath Banerjee 1 and Jon A. Wellner 2 1 Department of Statistics, Department of Statistics, 439, West

More information

S chauder Theory. x 2. = log( x 1 + x 2 ) + 1 ( x 1 + x 2 ) 2. ( 5) x 1 + x 2 x 1 + x 2. 2 = 2 x 1. x 1 x 2. 1 x 1.

S chauder Theory. x 2. = log( x 1 + x 2 ) + 1 ( x 1 + x 2 ) 2. ( 5) x 1 + x 2 x 1 + x 2. 2 = 2 x 1. x 1 x 2. 1 x 1. Sep. 1 9 Intuitively, the solution u to the Poisson equation S chauder Theory u = f 1 should have better regularity than the right hand side f. In particular one expects u to be twice more differentiable

More information

ON THE COMPOSITION OF DIFFERENTIABLE FUNCTIONS

ON THE COMPOSITION OF DIFFERENTIABLE FUNCTIONS ON THE COMPOSITION OF DIFFERENTIABLE FUNCTIONS M. BACHIR AND G. LANCIEN Abstract. We prove that a Banach space X has the Schur property if and only if every X-valued weakly differentiable function is Fréchet

More information

Introductory Analysis I Fall 2014 Homework #9 Due: Wednesday, November 19

Introductory Analysis I Fall 2014 Homework #9 Due: Wednesday, November 19 Introductory Analysis I Fall 204 Homework #9 Due: Wednesday, November 9 Here is an easy one, to serve as warmup Assume M is a compact metric space and N is a metric space Assume that f n : M N for each

More information

Examples of Dual Spaces from Measure Theory

Examples of Dual Spaces from Measure Theory Chapter 9 Examples of Dual Spaces from Measure Theory We have seen that L (, A, µ) is a Banach space for any measure space (, A, µ). We will extend that concept in the following section to identify an

More information

A note on L convergence of Neumann series approximation in missing data problems

A note on L convergence of Neumann series approximation in missing data problems A note on L convergence of Neumann series approximation in missing data problems Hua Yun Chen Division of Epidemiology & Biostatistics School of Public Health University of Illinois at Chicago 1603 West

More information

On some weighted fractional porous media equations

On some weighted fractional porous media equations On some weighted fractional porous media equations Gabriele Grillo Politecnico di Milano September 16 th, 2015 Anacapri Joint works with M. Muratori and F. Punzo Gabriele Grillo Weighted Fractional PME

More information

An introduction to Mathematical Theory of Control

An introduction to Mathematical Theory of Control An introduction to Mathematical Theory of Control Vasile Staicu University of Aveiro UNICA, May 2018 Vasile Staicu (University of Aveiro) An introduction to Mathematical Theory of Control UNICA, May 2018

More information

Statistical Properties of Numerical Derivatives

Statistical Properties of Numerical Derivatives Statistical Properties of Numerical Derivatives Han Hong, Aprajit Mahajan, and Denis Nekipelov Stanford University and UC Berkeley November 2010 1 / 63 Motivation Introduction Many models have objective

More information

7 Complete metric spaces and function spaces

7 Complete metric spaces and function spaces 7 Complete metric spaces and function spaces 7.1 Completeness Let (X, d) be a metric space. Definition 7.1. A sequence (x n ) n N in X is a Cauchy sequence if for any ɛ > 0, there is N N such that n, m

More information

Regularity for Poisson Equation

Regularity for Poisson Equation Regularity for Poisson Equation OcMountain Daylight Time. 4, 20 Intuitively, the solution u to the Poisson equation u= f () should have better regularity than the right hand side f. In particular one expects

More information

Defining the Integral

Defining the Integral Defining the Integral In these notes we provide a careful definition of the Lebesgue integral and we prove each of the three main convergence theorems. For the duration of these notes, let (, M, µ) be

More information

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 18.466 Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 1. MLEs in exponential families Let f(x,θ) for x X and θ Θ be a likelihood function, that is, for present purposes,

More information

Chapter 4. Inverse Function Theorem. 4.1 The Inverse Function Theorem

Chapter 4. Inverse Function Theorem. 4.1 The Inverse Function Theorem Chapter 4 Inverse Function Theorem d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d dd d d d d This chapter

More information

Asymptotic statistics using the Functional Delta Method

Asymptotic statistics using the Functional Delta Method Quantiles, Order Statistics and L-Statsitics TU Kaiserslautern 15. Februar 2015 Motivation Functional The delta method introduced in chapter 3 is an useful technique to turn the weak convergence of random

More information

Lecture notes on Ordinary Differential Equations. S. Sivaji Ganesh Department of Mathematics Indian Institute of Technology Bombay

Lecture notes on Ordinary Differential Equations. S. Sivaji Ganesh Department of Mathematics Indian Institute of Technology Bombay Lecture notes on Ordinary Differential Equations S. Ganesh Department of Mathematics Indian Institute of Technology Bombay May 20, 2016 ii IIT Bombay Contents I Ordinary Differential Equations 1 1 Initial

More information

Real Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi

Real Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi Real Analysis Math 3AH Rudin, Chapter # Dominique Abdi.. If r is rational (r 0) and x is irrational, prove that r + x and rx are irrational. Solution. Assume the contrary, that r+x and rx are rational.

More information

Concentration behavior of the penalized least squares estimator

Concentration behavior of the penalized least squares estimator Concentration behavior of the penalized least squares estimator Penalized least squares behavior arxiv:1511.08698v2 [math.st] 19 Oct 2016 Alan Muro and Sara van de Geer {muro,geer}@stat.math.ethz.ch Seminar

More information

Riemann integral and volume are generalized to unbounded functions and sets. is an admissible set, and its volume is a Riemann integral, 1l E,

Riemann integral and volume are generalized to unbounded functions and sets. is an admissible set, and its volume is a Riemann integral, 1l E, Tel Aviv University, 26 Analysis-III 9 9 Improper integral 9a Introduction....................... 9 9b Positive integrands................... 9c Special functions gamma and beta......... 4 9d Change of

More information

CHAPTER 6. Differentiation

CHAPTER 6. Differentiation CHPTER 6 Differentiation The generalization from elementary calculus of differentiation in measure theory is less obvious than that of integration, and the methods of treating it are somewhat involved.

More information

Integration on Measure Spaces

Integration on Measure Spaces Chapter 3 Integration on Measure Spaces In this chapter we introduce the general notion of a measure on a space X, define the class of measurable functions, and define the integral, first on a class of

More information

L p Functions. Given a measure space (X, µ) and a real number p [1, ), recall that the L p -norm of a measurable function f : X R is defined by

L p Functions. Given a measure space (X, µ) and a real number p [1, ), recall that the L p -norm of a measurable function f : X R is defined by L p Functions Given a measure space (, µ) and a real number p [, ), recall that the L p -norm of a measurable function f : R is defined by f p = ( ) /p f p dµ Note that the L p -norm of a function f may

More information

University of Sheffield. School of Mathematics and Statistics. Metric Spaces MAS331/6352

University of Sheffield. School of Mathematics and Statistics. Metric Spaces MAS331/6352 University of Sheffield School of Mathematics and Statistics Metric Spaces MAS331/635 017 18 1 Metric spaces This course is entitled Metric spaces. You may be wondering what such a thing should be. We

More information