Maximum Likelihood Estimation under Shape Constraints


Hanna K. Jankowski

June 2-3, 2009

Contents

1  Introduction
2  The MLE of a decreasing density
   2.1  Finding the Estimator
   2.2  Consistency
   2.3  Inconsistency at Zero
   2.4  Local Asymptotic Theory
        2.4.1  Why Do We See a Cube Root Rate?
        2.4.2  Proofs and the Switching Relation
   2.5  Some Global Asymptotic Results
   2.6  Exercises
3  The Nonparametric MLE for Current Status Data
   3.1  Finding the Estimator
        3.1.1  A Maximal Intersection Approach
        3.1.2  A Convex Minorant Approach
   3.2  Consistency and Asymptotics
   3.3  Exercises
4  Disclaimer

1 Introduction

The estimation of functions such as densities, hazard rates, cumulative distributions or regression functions plays a large role in statistical inference. Often, it is quite natural to assume that the function of interest has a certain shape. For example, in economics it is often assumed that the functional form of the model is convex (Chak et al., 2005). In survival analysis, we may assume that the hazard function is bathtub, or U-shaped: that is, it is first decreasing and then increasing. Heuristically, bathtub-shaped hazards correspond to lifetime distributions with high initial hazard (infant mortality), lower and often rather constant hazard during the middle of life, and then increasing hazard of failure (wear-out) as aging proceeds (see, for example, Jankowski and Wellner (2009) and the references therein). As yet another example, functions such as the cumulative distribution function are naturally shape constrained to be increasing.¹

Many different shape assumptions may be made. Here, we restrict ourselves to a discussion of the k-monotone family of density functions (in fact, we will shortly narrow our focus even further). A density f is k-monotone if

    (-1)^m f^{(m)} is decreasing and convex                              (1.1)

for all m \in \{0, \ldots, k-2\} when k \ge 2, and if it is decreasing when k = 1. Note that for k = 2, we are looking at the family of convex decreasing densities. If a density satisfies (1.1) for all m, then it is said to be completely monotone.

The study of decreasing densities began with the seminal work of Grenander (1956), who studied the maximum likelihood estimator (MLE) of a decreasing density and of a decreasing hazard function (motivated by problems of survival analysis in actuarial science). Consistency of the MLE was shown in Marshall and Proschan (1965), and Prakasa Rao (1970) later established the local asymptotic distribution theory at a fixed point. Notably, convergence here occurs at rate n^{1/3}. The estimators for k = 2 were considered in Groeneboom et al. (2001b,a), and the general k \ge 1 case was studied by Balabdaoui and Wellner (2007) with convergence rates of n^{k/(2k+1)}, although several open problems still remain here. For the completely monotone setting we currently know that the MLE exists and is consistent (Jewell, 1982).

One interesting fact about the k-monotone family of densities is their characterization via mixtures.

Proposition 1.1. A density f is k-monotone (completely monotone) if and only if it can be represented as a scale mixture of Beta(1,k) (exponential) densities:

    f(x) = \int_0^\infty y^{-1} \bigl(1 - x/(ky)\bigr)_+^{k-1} \, d\mu(y),    1 \le k < \infty,
    f(x) = \int_0^\infty y^{-1} \exp(-x/y) \, d\mu(y),                        k = \infty,

for some probability measure \mu.

¹ Throughout, we say "decreasing" in lieu of "non-increasing", and "strictly decreasing" for "decreasing". A similar language will be used for increasing, positive and negative functions or variables.

For a proof of this fact see Williamson (1956), Lévy (1962) and Gneiting (1999) when k < \infty, and Widder (1941), Gneiting (1998) and Feller (1971) when k = \infty. We note that appropriately modified versions of these characterizations hold true for other functions, and not just densities.

In this short course we focus on two specific examples from the k = 1 family. We will study the nonparametric maximum likelihood estimator for:

1. The MLE of a decreasing density on [0, \infty).
2. The MLE of the cumulative distribution function for current status (interval censoring, case 1) data.

2 The MLE of a decreasing density

By Proposition 1.1, every decreasing density is a mixture of uniform densities. That is, for some probability measure \mu,

    f(x) = \int_0^\infty y^{-1} I_{[0,y]}(x) \, d\mu(y).                  (2.1)

Suppose that we observe n independent realizations of a random variable with a decreasing density on [0, \infty). The likelihood is then

    L(f) = \prod_{i=1}^n f(x_i),

and we assume (without loss of generality) that the observed x_i values are ordered. Without any shape restrictions on the density, the MLE of f would be

    \tilde{f}(x) = \frac{1}{n} \sum_{i=1}^n \delta_{x_i}(x),

where \delta_y is the Dirac delta function. This estimator is weakly consistent and converges at rate n^{1/2}; however, it is not even a density!

2.1 Finding the Estimator

Our problem is then to maximize

    L(f) = \prod_{i=1}^n f(x_i),

subject to \int_0^\infty f(x) \, dx = 1 and f being decreasing. This may be accomplished in a series of steps.

Figure 1: MLE via least concave majorants, for the uniform (top) and exponential (bottom) distributions. On the left is shown the empirical cdf along with its least concave majorant. On the right is the resulting density estimate (black) along with the true density (red).

STEP 1: Note that the likelihood only looks at the values of the function f at the points x_1 \le x_2 \le \ldots \le x_n. Consider any decreasing function f over the interval (x_{i-1}, x_i] (where we take x_0 = 0). The likelihood increases if the value of f(x_i) increases, and the value of f on (x_{i-1}, x_i) has no impact on the likelihood.

Therefore, if f maximizes L, it must be constant on (x_{i-1}, x_i]. We have thus learned that the MLE is a left-continuous, piecewise constant function with jumps only at the observed data points.

STEP 2: Let y_i = f(x_i) for i = 1, \ldots, n. By the above step, to find the MLE we need to find the values y_1, \ldots, y_n which maximize

    L = \prod_{i=1}^n y_i,

subject to y_1 \ge y_2 \ge \ldots \ge y_n \ge 0 and \sum_{i=1}^n y_i (x_i - x_{i-1}) = 1. We have therefore reduced the question from an infinite-dimensional problem to an n-dimensional one. Moreover, since the function l = \log L is continuous and concave in y, a unique solution exists.

STEP 3: To find the constrained maximum of L (or, equivalently, of l = \log L), one may first try to use the Lagrange method. Unfortunately, this approach will not work directly, as the answer typically lies on the boundary of the domain: the solution lies on the boundary if y_{i+1} = y_i for some i. We can still learn something quite useful about the MLE via the Lagrange approach, when we constrain the problem to lie in the interior of the domain. Suppose that we know a priori that the solution is made up of only p unique values of y. That is,

    y_1 = y_2 = \ldots = y_{i_1} = \phi_1,
    y_{i_1+1} = y_{i_1+2} = \ldots = y_{i_2} = \phi_2,
    \ldots
    y_{i_{p-1}+1} = y_{i_{p-1}+2} = \ldots = y_{i_p} = \phi_p,

where p and the locations i_1, \ldots, i_p are known ahead of time, and only the values of \phi_1, \ldots, \phi_p are unknown. We assume that i_p = n and that \phi_1 > \phi_2 > \ldots > \phi_p. Since we have constrained the problem to lie away from the boundary, we may now use a Lagrange approach to find the values of \phi_j for j = 1, \ldots, p. The reduced problem is to maximize

    L = \prod_{j=1}^p \phi_j^{(i_j - i_{j-1})}

(taking i_0 = 0), subject to \sum_{j=1}^p \phi_j (x_{i_j} - x_{i_{j-1}}) = 1. This gives the answer

    \phi_j = \frac{1}{n} \cdot \frac{i_j - i_{j-1}}{x_{i_j} - x_{i_{j-1}}}.
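The Lagrange computation behind this display (Exercise 1 below asks you to verify it) can be sketched as follows; the shorthand n_j and \Delta_j for the block counts and widths is mine:

```latex
% Sketch of STEP 3, with n_j = i_j - i_{j-1} and \Delta_j = x_{i_j} - x_{i_{j-1}}.
\log L = \sum_{j=1}^p n_j \log \phi_j, \qquad
\text{constraint: } \sum_{j=1}^p \phi_j \Delta_j = 1.
% Stationarity of \sum_j n_j \log\phi_j - \lambda\bigl(\sum_j \phi_j\Delta_j - 1\bigr):
\frac{n_j}{\phi_j} = \lambda \Delta_j
  \;\Longrightarrow\; \phi_j = \frac{n_j}{\lambda \Delta_j}.
% Substituting into the constraint gives \sum_j n_j/\lambda = 1, and since
% \sum_j n_j = i_p = n, we get \lambda = n. Hence
\phi_j = \frac{1}{n}\,\frac{n_j}{\Delta_j}
       = \frac{1}{n}\,\frac{i_j - i_{j-1}}{x_{i_j} - x_{i_{j-1}}}.
```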

Thus, we have reduced the problem to finding the values of i_1, \ldots, i_p. Note that part of finding the correct values of i_1, \ldots, i_p is the stipulation that the \phi values are strictly decreasing.

STEP 4: Suppose that we do know the values of i_1, \ldots, i_p, and let \hat{F}_n denote the cdf associated with the MLE, found by integrating the density \hat{f}. A simple calculation shows that

    \hat{F}_n(x_{i_j}) = i_j / n                                          (2.2)

for j = 1, \ldots, p. Let F_n denote the empirical cumulative distribution function of the data. That is,

    F_n(x) = \frac{1}{n} \sum_{i=1}^n I(x_i \le x).

We may re-write (2.2) as

    \hat{F}_n(x_{i_j}) = F_n(x_{i_j}),    j = 1, \ldots, p.

We also know that \hat{F}_n is a concave function, as the MLE must be decreasing. We will next prove that

    \hat{F}_n(x) \ge F_n(x).                                              (2.3)

Proof of (2.3). As \hat{F}_n is piecewise linear, it is enough to show that (2.3) holds at all data points. To wit, suppose that there exists some x_v such that \hat{F}_n(x_v) < F_n(x_v), and let \hat{f} be the density of \hat{F}_n as given above. Then x_v falls in some interval (x_{i_{j-1}}, x_{i_j}). Let m = i_{j-1} and \bar{m} = i_j, and write \delta_1 = x_v - x_m and \delta_2 = x_{\bar{m}} - x_v. Since \hat{F}_n is linear on [x_m, x_{\bar{m}}] with \hat{F}_n(x_m) = m/n and \hat{F}_n(x_{\bar{m}}) = \bar{m}/n, while F_n(x_v) = v/n, we have

    \hat{F}_n(x_v) - F_n(x_v)
      = \frac{m}{n} + \frac{\delta_1}{\delta_1 + \delta_2} \cdot \frac{\bar{m} - m}{n} - \frac{v}{n}
      = \frac{1}{n(\delta_1 + \delta_2)} \bigl( \delta_1 (\bar{m} - v) - \delta_2 (v - m) \bigr).

By assumption this is negative, so that

    \frac{v - m}{\delta_1} - \frac{\bar{m} - v}{\delta_2} > 0.             (2.4)

Now let \bar{f} be another decreasing density, equal to \hat{f} except on the interval (x_m, x_{\bar{m}}]: we define \bar{f} to be equal to \hat{f} + \epsilon/\delta_1 on (x_m, x_v], and \hat{f} - \epsilon/\delta_2 on (x_v, x_{\bar{m}}], so that \bar{f} still integrates to one.

To finish, consider the difference of the likelihoods at \bar{f} and \hat{f}. Since \hat{f} is constant on (x_m, x_{\bar{m}}], with v - m observations in (x_m, x_v] and \bar{m} - v observations in (x_v, x_{\bar{m}}], this difference is equal to a positive constant times

    (\hat{f} + \epsilon/\delta_1)^{v-m} (\hat{f} - \epsilon/\delta_2)^{\bar{m}-v} - \hat{f}^{\,\bar{m}-m}
      = \epsilon \, \hat{f}^{\,\bar{m}-m-1} \Bigl( \frac{v-m}{\delta_1} - \frac{\bar{m}-v}{\delta_2} \Bigr) + O(\epsilon^2),

which is strictly positive by (2.4). Therefore, for \epsilon small enough, we have found a new decreasing density with a larger likelihood. As this is impossible, it must be that (2.3) holds.

STEP 5: Let us collect what we have so far. We know that \hat{F}_n(x) \ge F_n(x) for all x, with equality at the special locations x_{i_j} for j = 1, \ldots, p. Moreover, \hat{F}_n(x) is concave by definition. Thus, it is a concave majorant of F_n. Since it is linear in between the touchpoints, it must be the least concave majorant.

Theorem 2.1. The maximum likelihood estimator of a decreasing density on [0, \infty) is the left derivative of the least concave majorant of the empirical cumulative distribution function F_n. Here, we define \hat{f}_n(0) = \hat{f}_n(0+).

This result gives us an easy graphical representation of the MLE. Some examples are shown in Figure 1. Note that the left derivative of the piecewise linear majorant of F_n is exactly what is needed to recover the left-continuous MLE.
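Theorem 2.1 translates directly into a small algorithm. The sketch below is my own minimal R reconstruction (not the grenander.r file mentioned in the exercises): it computes the least concave majorant of the empirical cdf with a stack-based scan and returns its left-derivative slopes. It assumes distinct observations.

```r
## Minimal sketch of the Grenander estimator: left derivative of the
## least concave majorant (LCM) of the empirical cdf.
grenander <- function(x) {
  x <- sort(x)
  n <- length(x)
  px <- c(0, x)                 # x-coordinates of the cdf vertices
  py <- c(0, seq_len(n)) / n    # y-coordinates: 0, 1/n, ..., 1
  keep <- 1                     # stack of vertex indices on the LCM
  for (i in 2:(n + 1)) {
    keep <- c(keep, i)
    ## pop the middle vertex while the last two slopes fail to decrease
    while (length(keep) >= 3) {
      k  <- length(keep)
      s1 <- (py[keep[k - 1]] - py[keep[k - 2]]) / (px[keep[k - 1]] - px[keep[k - 2]])
      s2 <- (py[keep[k]] - py[keep[k - 1]]) / (px[keep[k]] - px[keep[k - 1]])
      if (s2 >= s1) keep <- keep[-(k - 1)] else break
    }
  }
  ## the MLE equals the LCM slope on each segment between knots
  list(knots = px[keep], fhat = diff(py[keep]) / diff(px[keep]))
}

## example: estimate a standard exponential density
set.seed(1)
est <- grenander(rexp(100))
plot(stepfun(est$knots[-1], c(est$fhat, 0), right = TRUE), do.points = FALSE)
curve(dexp(x), add = TRUE, col = "red")
```

The while-loop enforces that consecutive slopes strictly decrease, which is exactly concavity of the piecewise linear majorant.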

2.2 Consistency

Figure 2: The MLE (black) and the true density (red) for the uniform and exponential densities.

Figure 2 shows examples of the MLE for the exponential and uniform distributions. In general, the estimator appears to have really nice behaviour (especially considering the flexibility of the nonparametric MLE). The only problems appear near zero. Indeed, as we will see later, the Grenander estimator is not consistent at zero.

Theorem 2.2. Suppose that X_1, X_2, \ldots are IID random variables with a decreasing density f_0 on [0, \infty). Then the MLE \hat{f}_n is uniformly consistent on closed intervals bounded away from zero: that is, for each c > 0, we have

    \sup_{c \le x < \infty} |\hat{f}_n(x) - f_0(x)| \to 0

almost surely.

The first proof of this fact appeared in Marshall and Proschan (1965). Here, we present a proof based on a general method first developed in Jewell (1982).

Proof. The proof relies on convex optimization, and we must first set up the appropriate notation. Notice that the log-likelihood, normalized by n, may be written as

    l(f) = \frac{1}{n} \sum_{i=1}^n \log f(x_i) = \int \log f(x) \, dF_n(x).

The MLE is the maximizer of l(f) over the space K_1, the space of positive, decreasing functions on [0, \infty) which integrate to one (that is, the space of decreasing densities). Equivalently, we may minimize -l(f) over K_1. It turns out that it is a little easier to relax the minimization. Let K denote the space of positive and decreasing functions on [0, \infty). Next, define the function

    \phi(f) = -\int \log f(x) \, dF_n(x) + \int f(x) \, dx.              (2.5)

It turns out that the function which minimizes \phi(f) over K has integral equal to one, and is therefore the MLE. We will prove this fact in the first step below. The remaining steps prove the entire result.

STEP 1: Consider minimizing the function \phi over K, and let \hat{f} denote its minimizer. We know that for any other function g \in K, \phi(g) \ge \phi(\hat{f}). Let \epsilon > 0. If \hat{f} + \epsilon g \in K for all small \epsilon, then \phi(\hat{f} + \epsilon g) \ge \phi(\hat{f}), and hence the directional derivative

    \partial_g \phi(\hat{f}) = \lim_{\epsilon \downarrow 0} \frac{\phi(\hat{f} + \epsilon g) - \phi(\hat{f})}{\epsilon}
                             = -\int \frac{g(x)}{\hat{f}(x)} \, dF_n(x) + \int g(x) \, dx

is non-negative; if \hat{f} \pm \epsilon g \in K, then \partial_g \phi(\hat{f}) = 0. Choosing g = \hat{f}, we see that (1 \pm \epsilon)\hat{f} \in K for all \epsilon sufficiently small, and therefore \partial_{\hat{f}} \phi(\hat{f}) = 0. This implies that

    0 = -\int \frac{\hat{f}(x)}{\hat{f}(x)} \, dF_n(x) + \int \hat{f}(x) \, dx = -1 + \int \hat{f}(x) \, dx,

or \int \hat{f}(x) \, dx = 1. It follows that to find the MLE we may simply minimize \phi(f) over K.

STEP 2: Recall that f_0 is the true density of the observed data, and let F_0(x) = \int_0^x f_0(u) \, du. It is well known that \sup_x |F_n(x) - F_0(x)| \to 0 almost surely. Let \Omega_0 \subset \Omega denote the event on which this convergence occurs, and note that P(\Omega_0) = 1.

STEP 3: Recall the characterization (2.1) of a decreasing density. It follows that

    f(x) = \int_0^\infty y^{-1} I_{[0,y]}(x) \, d\mu(y) = \int_x^\infty y^{-1} \, d\mu(y) \le \frac{1}{x}.

This tells us that any decreasing density has an upper bound at each x > 0. Fix \omega \in \Omega_0 (for the remainder of the proof!), and consider the sequence \{\hat{f}_n\}. For each x > 0, we can find a subsequence \{n_k\} such that \hat{f}_{n_k}(x) converges (since \hat{f}_n(x) is bounded above by 1/x). Using a standard diagonalization procedure, we conclude that there exists a further subsequence, which we denote for simplicity as \{\hat{f}_n\}, and a function \hat{f}, such that \hat{f}_n(x) \to \hat{f}(x) for all x > 0. Note that \hat{f} must also be positive and decreasing. Moreover, by Fatou's lemma, it satisfies

    \int \hat{f}(x) \, dx \le 1.                                          (2.6)

STEP 4: Choosing g = f_0 in the directional derivative defined in STEP 1 above, we find that (since \hat{f}_n + \epsilon f_0 \in K for any \epsilon > 0, and hence \partial_{f_0} \phi(\hat{f}_n) \ge 0)

    \int \frac{f_0(x)}{\hat{f}_n(x)} \, dF_n(x) \le \int f_0(x) \, dx = 1.    (2.7)

STEP 5: Fix \delta > 0, and let \eta_\delta = F_0^{-1}(1 - \delta). We may assume that \delta is small enough that \delta < \eta_\delta. From (2.7), it follows that there exists a finite \tau_\delta > 0 such that \hat{f}_n(\eta_\delta) \ge \tau_\delta for all sufficiently large n. This implies that \sup_{\delta \le x \le \eta_\delta} |1/\hat{f}_n(x) - 1/\hat{f}(x)| \to 0 as n \to \infty. Thus, by applying the general form of the dominated convergence theorem and (2.7), we find that

    \int_\delta^{\eta_\delta} \frac{f_0(x)}{\hat{f}(x)} \, dF_0(x) \le 1.

Since \delta was chosen arbitrarily small, we may let \delta \to 0 to obtain

    \int \frac{f_0(x)}{\hat{f}(x)} \, dF_0(x) = \int \frac{f_0^2(x)}{\hat{f}(x)} \, dx \le 1.    (2.8)

STEP 6: Finally, fix \epsilon > 0. Using (2.6) and (2.8), we have that

    0 \le \int_\epsilon^{1/\epsilon} \frac{(f_0(x) - \hat{f}(x))^2}{\hat{f}(x)} \, dx
        = \int_\epsilon^{1/\epsilon} \frac{f_0^2(x)}{\hat{f}(x)} \, dx - 2 \int_\epsilon^{1/\epsilon} f_0(x) \, dx + \int_\epsilon^{1/\epsilon} \hat{f}(x) \, dx
       \le 2 - 2 \int_\epsilon^{1/\epsilon} f_0(x) \, dx.

Letting \epsilon \to 0, the right-hand side converges to zero, and therefore \hat{f}(x) = f_0(x) for all x > 0. We have thus shown that for any \omega \in \Omega_0, where P(\Omega_0) = 1, every subsequence of \{\hat{f}_n\} has a further subsequence which converges to f_0 pointwise on (0, \infty); hence the full sequence converges pointwise. Since the functions in question are decreasing densities, we immediately also obtain uniform convergence over [c, \infty). This implies that \sup_{x \in [c, \infty)} |\hat{f}_n(x) - f_0(x)| \to 0 on \Omega_0, and completes the proof.

Remark 2.3. The approach taken in STEP 1 above to show that \int \hat{f}(x) \, dx = 1 may be extended by considering other functions g. That is, we can plug different choices of g into the inequality \partial_g \phi(\hat{f}) \ge 0, as long as \hat{f} + \epsilon g \in K. This yields a characterization of the MLE

in terms of a series of inequalities. For the Grenander estimator, nothing beats the least concave majorant characterization. However, for estimators under other constraints (e.g. k = 2), these characterizations turn out to be the best, and only, descriptions available of the estimators.

2.3 Inconsistency at Zero

Many shape-constrained estimators of the type we are considering here are inconsistent at x = 0 (or at other endpoints). For the Grenander estimator, this problem was first studied in Woodroofe and Sun (1993) when f_0(0) is bounded. We stipulate here that f_0(0) = f_0(0+), and use the same convention for the MLE itself. The results of Woodroofe and Sun (1993) were later extended to other situations in Balabdaoui et al. (2009), where different behaviour near zero of the true density is considered. For the k = 2 case, see Groeneboom et al. (2001b), where the estimator of a convex density is shown to be inconsistent at zero, and Jankowski and Wellner (2009), where the estimator of a convex hazard function over [0, T] is shown to be inconsistent at both 0 and T. Woodroofe and Sun (1993) also propose an alternative to the Grenander estimator near zero which is consistent, and a similar approach is taken in Balabdaoui (2007) for the general k-monotone density estimator.

Why is this problem even interesting? As one example, suppose that the times between failures of a system are IID random variables Y_1, \ldots, Y_n with common distribution G and finite positive mean \mu_0. Fix a time t, and let N = N_t denote the largest n such that Y_1 + \ldots + Y_n \le t (if the Y_i were exponential then N_t would be the Poisson process). The variable N is the number of failures observed prior to time t. Suppose that we observe only X = t - (Y_1 + \ldots + Y_N), the time since the last breakdown. Then, for large t, we may approximate the density of X as

    f(x) = \frac{1}{\mu_0} \bigl[ 1 - G(x-) \bigr],

which is decreasing since G is a cdf. Also, f(0) = 1/\mu_0, which is the natural quantity of interest in this problem. Other examples are given in Balabdaoui (2007), Balabdaoui et al. (2009) and Woodroofe and Sun (1993).

Theorem 2.4. Suppose that 0 < f_0(0) < \infty, and let N(t) denote a rate one Poisson process. Then

    \frac{\hat{f}_n(0)}{f_0(0)} \Rightarrow \sup_{t > 0} \frac{N(t)}{t} \stackrel{d}{=} \frac{1}{U},

where U is a uniform random variable on the unit interval.

Sketch of Proof. By the characterization of the MLE via the least concave majorant, we know that

    \hat{f}_n(0) = \sup_{t > 0} \frac{F_n(t)}{t} = \sup_{t > 0} \frac{n F_n(t/n)}{t}.

Note that n F_n(t/n) is a Binomial random variable with parameters n and F_0(t/n) \approx n^{-1} f_0(0) t, which converges to a Poisson random variable with mean f_0(0) t. This proves the first part of the result. The fact that the supremum of N(t)/t has the same distribution as the reciprocal of a uniform random variable was shown, for example, in Pyke (1959).
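Theorem 2.4 is easy to probe numerically. The sketch below is my own illustration (not code from the course website): it uses the identity \hat{f}_n(0) = \max_i (i/n)/x_{(i)} and compares the simulated quantiles of \hat{f}_n(0)/f_0(0) with those of 1/U, whose p-quantile is 1/(1-p).

```r
## Monte Carlo check of Theorem 2.4: for standard exponential data
## (so f_0(0) = 1), fhat_n(0) = sup_t F_n(t)/t = max_i (i/n)/x_(i)
## should be approximately distributed as 1/U, U ~ Uniform(0,1).
set.seed(2)
n <- 10000
ratio <- replicate(2000, {
  x <- sort(rexp(n))
  max(seq_len(n) / n / x)
})
rbind(simulated = quantile(ratio, c(.5, .75, .9)),
      limit     = 1 / (1 - c(.5, .75, .9)))   # quantiles of 1/U
```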

2.4 Local Asymptotic Theory

We now know that the MLE is a consistent estimator (except at zero). Next, we wish to know the rate of convergence. For the Grenander estimator, the local (and global) rate of convergence is n^{1/3}, which is much slower than the n^{1/2} rate we typically see with parametric estimators. In a sense, this is the price one pays for the flexibility of nonparametric estimation. The local n^{1/3} rate of convergence is only seen if the true density is locally strictly decreasing. In regions where the true density is flat, we obtain the usual n^{1/2} rate. The two main results are as follows.

Theorem 2.5. Let X_1, \ldots, X_n be independent observations from a decreasing density f_0 on [0, \infty). Fix a point x_0 \in (0, \infty) at which f_0 is differentiable with f_0'(x_0) < 0. Then

    n^{1/3} \left| \tfrac{1}{2} f_0(x_0) f_0'(x_0) \right|^{-1/3} \bigl( \hat{f}_n(x_0) - f_0(x_0) \bigr) \Rightarrow 2Z,

where Z is the location of the maximum of the process \{B(t) - t^2, t \in R\}, and B(t) is a standard two-sided Brownian motion on R with B(0) = 0.

The above result was first shown in Prakasa Rao (1970). The proof we present here is based on the approach via the switching relation developed by Piet Groeneboom (see Groeneboom (1985), as well as Groeneboom (1983) and Groeneboom (1986)). The distribution of Z is known as Chernoff's distribution (after Chernoff (1964), who first showed how it arises as the limit in mode estimation). Chernoff shows that the density of Z may be written as

    f_Z(z) = \frac{1}{2} u_x(z^2, z) \, u_x(z^2, -z),

where u is the solution to the heat equation u_t = u_{xx}/2 for x < t^2, with boundary conditions u(x, t) = 1 for x \ge t^2 and u(x, t) \to 0 as x \to -\infty. In Groeneboom (1989), Groeneboom (who won the Rollo Davidson prize in 1985) derived the density of Z in terms of the Airy function Ai. He showed that g(z) = u_x(z^2, z) is the function with Fourier transform

    \hat{g}(s) = 2^{1/3} / Ai(i 2^{-1/3} s),    s \in R.

Exact quantiles for this distribution may be found in Groeneboom and Wellner (2001).

Remark 2.6. In fact, a stronger result than that stated in Theorem 2.5 holds. Under the same conditions, one can show that

    n^{1/3} \bigl( \hat{f}_n(x_0 + n^{-1/3} x) - f_0(x_0) \bigr) \Rightarrow V(x),

where V(x) is the left derivative at x of the least concave majorant of the process \{ \sqrt{f_0(x_0)} B(t) + (f_0'(x_0)/2) t^2, t \in R \}. This result provides one reason why we see the n^{1/3} rate of convergence: in a neighbourhood of x_0, there are roughly n^{1/3} touchpoints between F_n and its least concave majorant. Therefore, to get a non-trivial limit, we need to rescale space on the order of n^{-1/3}.

Theorem 2.7. Suppose that X_1, \ldots, X_n are IID Uniform[0,1]. For any x_0 \in (0, 1), we have that

    \sqrt{n} \bigl( \hat{f}_n(x_0) - 1 \bigr) \Rightarrow W,

where W is the left derivative at x_0 of the least concave majorant of the standard Brownian bridge.

This result was proved in Groeneboom (1986), and extended to densities with locally flat regions in Carolan and Dykstra (1999).

2.4.1 Why Do We See a Cube Root Rate?

There are many other examples in statistics where an argmax estimator converges at a rate slower than the more typical n^{1/2}. One of the simplest illustrations of what is going on is the following. Suppose that we are interested in finding the value \theta_0 which maximizes

    P[\theta - 1, \theta + 1] = E_P\bigl[ I(\theta - 1 \le X \le \theta + 1) \bigr].

To do this, we use the estimator \hat\theta_n which maximizes the empirical version

    P_n[\theta - 1, \theta + 1] = \frac{1}{n} \sum_{i=1}^n I(\theta - 1 \le X_i \le \theta + 1).

If P has a smooth density p, then P[\theta - 1, \theta + 1] is parabolic near \theta_0, since (assuming wlog that \theta_0 < \theta)

    P[\theta_0 - 1, \theta_0 + 1] - P[\theta - 1, \theta + 1]
      = \int_{-1+\theta_0}^{-1+\theta} p(x) \, dx - \int_{1+\theta_0}^{1+\theta} p(x) \, dx
      \approx p(-1+\theta_0)(\theta - \theta_0) + p'(-1+\theta_0)(\theta - \theta_0)^2/2
              - p(1+\theta_0)(\theta - \theta_0) - p'(1+\theta_0)(\theta - \theta_0)^2/2.

Since \theta_0 maximizes P[\theta - 1, \theta + 1], we have that p(1+\theta_0) = p(-1+\theta_0). Also, p must be increasing at -1+\theta_0 and decreasing at 1+\theta_0. Therefore,

    P[\theta_0 - 1, \theta_0 + 1] - P[\theta - 1, \theta + 1] \approx c (\theta - \theta_0)^2,

for some positive constant c. The function above describes the deterministic trend of \hat\theta_n about \theta_0. The random perturbation is equal to

    (P_n - P)[\theta - 1, \theta + 1] - (P_n - P)[\theta_0 - 1, \theta_0 + 1],

which is a centered average and therefore approximately normal, with variance

    \sigma^2 = \frac{1}{n} \mathrm{var}\bigl( I[\theta - 1 \le X \le \theta + 1] - I[\theta_0 - 1 \le X \le \theta_0 + 1] \bigr)
             = \frac{1}{n} \mathrm{var}\bigl( I[\theta_0 - 1 \le X \le \theta - 1] - I[\theta_0 + 1 \le X \le \theta + 1] \bigr)
           \le \frac{1}{n} \Bigl\{ P[\theta_0 + 1, \theta + 1] + P[\theta_0 - 1, \theta - 1] \Bigr\}
             = \frac{1}{n} \Bigl\{ \int_{\theta_0+1}^{\theta+1} p(x) \, dx + \int_{\theta_0-1}^{\theta-1} p(x) \, dx \Bigr\}
         \approx \frac{c'}{n} (\theta - \theta_0),

for some other positive constant c'. It follows that the random perturbation term is of the order O_p(n^{-1/2} |\theta - \theta_0|^{1/2}). In order for \hat\theta_n to be the maximum, the random perturbation term and the deterministic term must be of the same order. That is,

    random term + deterministic term = P_n[\theta - 1, \theta + 1] - P_n[\theta_0 - 1, \theta_0 + 1],

which must be non-negative for \theta = \hat\theta_n, since \hat\theta_n is the maximizer. Therefore,

    O_p\bigl( n^{-1/2} |\theta - \theta_0|^{1/2} \bigr) \ge c (\theta - \theta_0)^2,

which rearranges to |\theta - \theta_0| = O_p(n^{-1/3}), giving the n^{1/3} rate.
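This heuristic is easy to check by simulation; the sketch below is my own illustration (not part of the original notes), using N(0,1) data so that \theta_0 = 0. The log-log slope of the mean absolute error against n should be roughly -1/3.

```r
## Numerical check of the cube-root heuristic: theta_hat maximizes
## P_n[theta - 1, theta + 1] for N(0,1) data, where theta_0 = 0.
set.seed(11)
avg_err <- function(n, reps = 100) {
  mean(replicate(reps, {
    x <- rnorm(n)
    ## the empirical count changes only at x_i - 1 and x_i + 1,
    ## so it suffices to search over these candidate values
    cand <- c(x - 1, x + 1)
    cnt  <- vapply(cand, function(th) sum(x >= th - 1 & x <= th + 1),
                   integer(1))
    abs(cand[which.max(cnt)])
  }))
}
ns <- c(100, 400, 1600)
e  <- vapply(ns, avg_err, numeric(1))
coef(lm(log(e) ~ log(ns)))[2]   # roughly -1/3
```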

At the heart of the argument is the fact that the function of interest, g(x, \theta) = I[\theta - 1 \le x \le \theta + 1], is not smooth. If we had instead considered g(x, \theta) = (x - \theta)^2, the maximization would yield the typical n^{1/2} rate of convergence.

The above example is taken from Kim and Pollard (1990), who study many examples of cube-root asymptotics and develop a general theory for the problem. Note that the exposition given above is not rigorous. To make the argument formal, one would also need to prove these results uniformly in \theta, so that we can simply plug in the random \theta = \hat\theta_n. This may be done using empirical process theory. The same argument as above, but with some nice pictures, may also be found in Maathuis (2007).

For the Grenander estimator \hat{f}_n, instead of maximizing the function P_n[\theta - 1, \theta + 1] we minimize \phi(f) as defined in (2.5). At first glance, \phi is a smooth function, quite different from g(x, \theta) = I[\theta - 1 \le x \le \theta + 1]. However, we must not forget the restriction to the class of decreasing densities. Recall that by (2.1), the density function may be written as

    f(x) = \int_0^\infty y^{-1} I_{[0,y]}(x) \, d\mu(y).

Thus, at the root of the problem, we are working with the functions g(x, y) = I_{[0,y]}(x), and we therefore expect a similar behaviour as in the example considered above.

2.4.2 Proofs and the Switching Relation

Define the process

    \hat{s}_n(a) = \sup\bigl\{ x \ge 0 : F_n(x) - ax = \sup_z \bigl( F_n(z) - az \bigr) \bigr\}.

To simplify notation, we will also write this as \hat{s}_n(a) = \mathrm{argmax}_x \{ F_n(x) - ax \}. The switching relation is the name given to the following set equality:

    \{ \hat{s}_n(a) < x \} = \{ \hat{f}_n(x) < a \}.                      (2.9)

The relation was first noted in Groeneboom (1983). For a proof using convex analysis, as well as more background on the relation, see Balabdaoui et al. (2009) and van der Vaart and van der Laan (2006). For a visual justification, see Figure 3. The advantage of the switching relation is that it shows that the least concave majorant is a continuous mapping of F_n, since the argmax is a continuous map (see e.g. Ferger (2004)). Thus, to find the limits, we need only understand the underlying limiting process, and then apply the continuous mapping theorem.

Figure 3: Imagine dropping a line with slope a from infinity. The line will make contact with the graph of F_n at some point. In the picture, this is the red line, with the least concave majorant denoted in black; the black dots denote the points of the empirical cdf. Let x^* = \hat{s}_n(a), with a being the slope of the red line. Note that the value of \hat{f}_n(x) for any x > x^* must be smaller than a.

Proof of Theorem 2.5. By the switching relation, we have that

    P\bigl( n^{1/3} ( \hat{f}_n(x_0) - f_0(x_0) ) < t \bigr)
      = P\bigl( \hat{f}_n(x_0) < f_0(x_0) + n^{-1/3} t \bigr)
      = P\bigl( \hat{s}_n(f_0(x_0) + n^{-1/3} t) < x_0 \bigr)
      = P\bigl( n^{1/3} ( \hat{s}_n(f_0(x_0) + n^{-1/3} t) - x_0 ) < 0 \bigr),    (2.10)

where we have added the additional n^{1/3} scaling. We next examine the term n^{1/3}(\hat{s}_n(f_0(x_0) + n^{-1/3} t) - x_0) more closely. This is equal to

    n^{1/3} \bigl( \mathrm{argmax}_x \{ F_n(x) - (f_0(x_0) + n^{-1/3} t) x \} - x_0 \bigr)
      = \mathrm{argmax}_{h \ge -n^{1/3} x_0} \{ F_n(x_0 + n^{-1/3} h) - (f_0(x_0) + n^{-1/3} t)(x_0 + n^{-1/3} h) \}
      = \mathrm{argmax}_{h \ge -n^{1/3} x_0} \{ F_n(x_0 + n^{-1/3} h) - F_n(x_0) - n^{-1/3} (f_0(x_0) + n^{-1/3} t) h \}
      = \mathrm{argmax}_{h \ge -n^{1/3} x_0} \{ n^{2/3} ( F_n(x_0 + n^{-1/3} h) - F_n(x_0) - n^{-1/3} (f_0(x_0) + n^{-1/3} t) h ) \}
      = \mathrm{argmax}_{h \ge -n^{1/3} x_0} V_n(h),

where

    V_n(h) = n^{2/3} \bigl( F_n(x_0 + n^{-1/3} h) - F_n(x_0) - n^{-1/3} f_0(x_0) h \bigr) - th
           = A_n(h) + B_n(h) - th,

letting

    A_n(h) = n^{2/3} \bigl( F_n(x_0 + n^{-1/3} h) - F_n(x_0) - ( F_0(x_0 + n^{-1/3} h) - F_0(x_0) ) \bigr),
    B_n(h) = n^{2/3} \bigl( F_0(x_0 + n^{-1/3} h) - F_0(x_0) - n^{-1/3} f_0(x_0) h \bigr).

The limit of B_n is easy: by a Taylor expansion,

    B_n(h) = n^{2/3} \bigl( F_0(x_0 + n^{-1/3} h) - F_0(x_0) - n^{-1/3} f_0(x_0) h \bigr) \to \frac{1}{2} f_0'(x_0) h^2.

The limit of A_n is also not hard to see. Let U_n be the empirical process \sqrt{n}(P_n - P) when P is the uniform distribution on the unit interval, let U be the standard Brownian bridge, and let B denote a standard Brownian motion. We then have

    A_n(h) \stackrel{d}{=} n^{1/6} \bigl( U_n(F_0(x_0 + n^{-1/3} h)) - U_n(F_0(x_0)) \bigr)
           \approx n^{1/6} \bigl( U(F_0(x_0 + n^{-1/3} h)) - U(F_0(x_0)) \bigr)
           \stackrel{d}{=} n^{1/6} \bigl( B(F_0(x_0 + n^{-1/3} h)) - B(F_0(x_0)) - B(1)( F_0(x_0 + n^{-1/3} h) - F_0(x_0) ) \bigr)
           \approx n^{1/6} \bigl( B(F_0(x_0 + n^{-1/3} h)) - B(F_0(x_0)) \bigr) - n^{-1/6} f_0(x_0) B(1) h
           \stackrel{d}{=} n^{1/6} B\bigl( F_0(x_0 + n^{-1/3} h) - F_0(x_0) \bigr) - n^{-1/6} f_0(x_0) B(1) h
           \approx n^{1/6} B\bigl( f_0(x_0) n^{-1/3} h \bigr) - n^{-1/6} f_0(x_0) B(1) h
           \stackrel{d}{=} \sqrt{f_0(x_0)} B(h) - n^{-1/6} f_0(x_0) B(1) h
           \to \sqrt{f_0(x_0)} B(h).

It follows, by the continuous mapping theorem, that

    \mathrm{argmax}_{h \ge -n^{1/3} x_0} V_n(h) \Rightarrow \mathrm{argmax}_{h \in R} \Bigl\{ \sqrt{f_0(x_0)} B(h) + \frac{1}{2} f_0'(x_0) h^2 - th \Bigr\}.

Let a = f_0(x_0), b = |f_0'(x_0)|/2 and c^3 = a b^{-2}. Note that, since f_0 is strictly decreasing at x_0, b is strictly positive.

Some careful calculation shows that

    \mathrm{argmax}_h \{ \sqrt{a} B(h) - b h^2 - t h \}
      = c \, \mathrm{argmax}_h \{ \sqrt{a} B(ch) - b c^2 h^2 - t c h \}
      \stackrel{d}{=} c \, \mathrm{argmax}_h \{ \sqrt{ac} B(h) - b c^2 h^2 - t c h \}
      = c \, \mathrm{argmax}_h \{ (a^2 b^{-1})^{1/3} ( B(h) - h^2 - \tilde{t} h ) \},

where \tilde{t} = t (ab)^{-1/3}, and where we used that \sqrt{a} B(ch) \stackrel{d}{=} \sqrt{ac} B(h) and that b c^2 = \sqrt{ac} = (a^2 b^{-1})^{1/3} by the choice c^3 = a b^{-2}. Continuing with the calculation, we find that the last line above is equal to

    c \, \mathrm{argmax}_h \{ B(h) - h^2 - \tilde{t} h \}
      = c \, \mathrm{argmax}_h \{ B(h) - (h + \tilde{t}/2)^2 \}
      = c \bigl( \mathrm{argmax}_h \{ B(h - \tilde{t}/2) - h^2 \} - \tilde{t}/2 \bigr)
      \stackrel{d}{=} c \bigl( \mathrm{argmax}_h \{ B(h) - h^2 \} - \tilde{t}/2 \bigr).

Plugging this result back into (2.10), we find that

    P\bigl( n^{1/3} ( \hat{f}_n(x_0) - f_0(x_0) ) < t \bigr)
      \to P\bigl( \mathrm{argmax}_h \{ B(h) - h^2 \} - \tilde{t}/2 < 0 \bigr)
      = P\bigl( 2 (ab)^{1/3} \, \mathrm{argmax}_h \{ B(h) - h^2 \} < t \bigr).

Since the process B(h) - h^2 has a unique maximum almost surely, and 2(ab)^{1/3} = 2 | f_0(x_0) f_0'(x_0) / 2 |^{1/3}, this proves the claim.
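The limit random variable Z = argmax\{B(h) - h^2\} is easy to simulate crudely; the sketch below is my own illustration, approximating the two-sided Brownian motion on a grid over [-3, 3] (the truncation is harmless since the tails of Z decay very quickly).

```r
## Crude Monte Carlo for Chernoff's distribution Z = argmax_t {B(t) - t^2},
## with B a standard two-sided Brownian motion approximated on a grid.
set.seed(3)
h  <- 0.001; n1 <- 3000
tt <- h * (-n1:n1)
rZ <- replicate(2000, {
  ## two independent Brownian halves, pinned together at zero
  B <- c(rev(cumsum(rnorm(n1, sd = sqrt(h)))), 0,
         cumsum(rnorm(n1, sd = sqrt(h))))
  tt[which.max(B - tt^2)]
})
mean(rZ)        # approximately 0, by symmetry
mean(abs(rZ))   # approximates E|Z|, the constant entering Theorem 2.8 below
```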

2.5 Some Global Asymptotic Results

The previous section looked at what happens to the estimator at a fixed point x_0. In addition to these results, there are also a number of global theorems, which give information about convergence of the estimator at more than just one point. In particular, we have the following theorem, proved in Groeneboom (1985).

Theorem 2.8. Let f_0 be a decreasing density with support [0, T], with a bounded, continuous second derivative, such that f_0'(x) < 0 for all x \in (0, T). Then

    n^{1/6} \Bigl\{ n^{1/3} \int_0^T | \hat{f}_n(x) - f_0(x) | \, dx - C \Bigr\} \Rightarrow N(0, \sigma^2),

where \sigma^2 \approx 0.17 and

    C = 2 E|Z| \int_0^T \bigl| f_0'(x) f_0(x) / 2 \bigr|^{1/3} dx,

with Z equal to the location of the maximum of \{B(h) - h^2\} as above, and E|Z| \approx 0.41.

The assumption that the true density is strictly decreasing is key to this result. For densities with regions of constancy, we observe a local rate of convergence of n^{1/2}. The following theorem, established in Groeneboom and Pyke (1983), shows the corresponding global result for the uniform distribution.

Theorem 2.9. Let \hat{f}_n be the Grenander estimator of a decreasing density when the data come from a uniform distribution on the unit interval. Then

    \frac{1}{\sqrt{3 \log n}} \Bigl( n \int_0^1 ( \hat{f}_n(x) - 1 )^2 dx - \log n \Bigr) \Rightarrow N(0, 1).

2.6 Exercises

1. Show that the constrained maximization solution in STEP 3 of Section 2.1 is correct.

2. I have posted R code on the website which finds the Grenander estimator (MLE) of a decreasing density (see grenander.r). Pick your favourite decreasing density and compare how well the MLE performs for different sample sizes n. How does this relate to the theory presented in the lectures?

3. The alternative characterization mentioned in Remark 2.3 is stated as follows. The function \hat{f}_n is the Grenander estimator if and only if it satisfies

       \int_0^y \frac{1}{\hat{f}_n(x)} \, dF_n(x) \le y    for all y \ge 0,

   with equality at all points of discontinuity of \hat{f}_n.

   (a) Prove the necessity of these conditions. To do this, consider the directional derivative \partial_g \phi(\hat{f}_n) for g(x) = I_{[0,y]}(x), and follow the approach described in Remark 2.3.
   (b) How does the choice of g above relate to (2.1)?

4. In Section 2.4.1, we gave a reason for the n^{1/3} rate of convergence of the estimator of the location of the maximum of the function P[\theta - 1, \theta + 1]. Repeat the argument to show that the estimator of the location of the minimum of P(X - \theta)^2 = E_P[(X - \theta)^2] must exhibit n^{1/2} convergence.

5. Use the switching relation to prove Theorem 2.7.

3 The Nonparametric MLE for Current Status Data

Medical studies, reliability theory and actuarial science often deal with the estimation of lifetime distributions. In these situations, it is typical to observe censored data of a variety of types. Here, we discuss one specific case, known as current status, or interval censoring case 1, data.

In Hoel and Walberg (1972), the time to onset of lung cancer in 144 male RFM mice was studied (lung tumors are predominantly non-lethal in RFM mice). The mice were separated into two groups: mice subject to a conventional environment (96 subjects) and mice subject to a germ-free environment (48 subjects). Each mouse was sacrificed at a random time, and at this time the presence or absence of a tumor was determined. This data set is also given in Sun (2006, page 6). These types of experiments are called tumorigenicity studies, and are designed to determine whether some agent or environment accelerates the time until tumor onset. In this case, we are interested in estimating the time until the onset of the tumor for both groups. However, we do not observe the exact time of onset for the subjects. We only observe whether or not the tumor was present at a random time.

The observed data can be described as follows. Let X denote the failure time, with distribution F. In our example, X is the onset time of the lung tumor. Our goal is to estimate F. Let T denote the random observation time, with distribution G, and assume that T is independent of X. In our example, T is the time that a mouse is sacrificed. We observe n IID copies of (T, \Delta), where \Delta = I(X \le T). This type of data is called current status data, as we only observe the current status of each subject at the observation time, and not the actual time of failure. It is also a special type of interval censoring: for each subject, we observe only whether the failure time fell into the (random) interval [0, T] or (T, \infty) (see e.g. Figure 4). Specifically, this type of interval censoring is called interval censoring case 1, as there is exactly one observation time per subject. One can also consider interval censoring case k, where there are k observation times per subject, but this is beyond the scope of this course.

Figure 4: An example of current status data with n = 4. We observe t = (1, 2.5, 3, 5) and \delta = (1, 0, 1, 0).

3.1 Finding the Estimator

The likelihood for current status data may be written as

    L(F) = \prod_{i=1}^n F(t_i)^{\delta_i} (1 - F(t_i))^{1 - \delta_i} g(t_i)
         \propto \prod_{i=1}^n F(t_i)^{\delta_i} (1 - F(t_i))^{1 - \delta_i}.

Without loss of generality, we assume that the observation times have been ordered: t_1 \le t_2 \le \ldots \le t_n. Our goal is to maximize L(F), or equivalently to maximize the log-likelihood

    l(F) = \sum_{i=1}^n \delta_i \log F(t_i) + (1 - \delta_i) \log(1 - F(t_i)),

over the space of distribution functions. Note that the likelihood depends only on the values of F at the observation times t_i, so that one can change F away from these points and obtain the same likelihood. This means that the cdf F is not uniquely determined by the maximization problem. To get around this, we make the following standard additional assumptions.

1. We assume that the MLE is a piecewise constant function, with jumps only at the observation points.
2. If \hat{F}(t_n) < 1, we do not specify the location of the remaining mass.

Under these conventions, the nonparametric MLE is uniquely determined. Let y_i = F(t_i). We may now re-write the maximization problem as the n-dimensional problem of maximizing

    l(y) = \sum_{i=1}^n \delta_i \log y_i + (1 - \delta_i) \log(1 - y_i)    (3.1)

subject to 0 \le y_1 \le y_2 \le \ldots \le y_n \le 1.

Suppose that \delta_n = 1. Then we must have that y_n = 1, since this maximizes the corresponding term in the likelihood without placing any restrictions on y_{n-1}, say. A similar behaviour occurs if \delta_1 = 0, when we must have y_1 = 0. For this reason, we also add the following assumption.

3. We assume that \delta_1 = 1 and \delta_n = 0. If this is not the case, then let i_1 = \min\{i : \delta_i = 1\} and i_n = \max\{i : \delta_i = 0\}, and set \hat{y}_i = 0 for all i < i_1 and \hat{y}_i = 1 for all i > i_n for the MLE.

3.1.1 A Maximal Intersection Approach

One way to think about the maximization problem is in terms of the observed intervals. Let us denote these as R_1, \ldots, R_n. That is, in the example where n = 4, t = (1, 2.5, 3, 5) and \delta = (1, 0, 1, 0), we observe the intervals R_1 = [0, 1], R_2 = (2.5, \infty), R_3 = [0, 3] and R_4 = (5, \infty). These intervals are shown in Figure 4. The likelihood may be re-written as

    L(F) = \prod_{i=1}^n P(R_i).

It is clear that the MLE must place all of its mass on the intervals R_i. However, we can reduce these sets even further. A set I is a maximal intersection if there exists a subset \beta \subset \{1, \ldots, n\} such that I = \cap_{i \in \beta} R_i \ne \emptyset, but \cap_{i \in \beta'} R_i = \emptyset for all strict supersets \beta', that is, \beta \subsetneq \beta' \subset \{1, \ldots, n\}. Alternatively, for each x \in R_+, we could define the height map h(x) = \#\{i : x \in R_i\}. Then the maximal intersections are the sets formed by the local maxima of the height map. In our example, the maximal intersections are I_1 = [0, 1], I_2 = (2.5, 3] and I_3 = (5, \infty). (These are the intervals shaded in grey on the x-axis in Figure 4.)

Now, suppose that the MLE puts some mass on the set R_3 \setminus I_1. Since this set shows up only once in the likelihood, while the interval I_1 shows up twice, we increase the likelihood by shifting that mass onto the interval I_1. This implies that the MLE will put mass only on the maximal intersections observed. This property was first observed in Peto (1973) and Turnbull (1976). Denote the maximal intersections as I_j, for j = 1, \ldots, m, and denote the mass placed in I_j as \alpha_j. We may then re-write the log-likelihood in terms of the

masses \alpha_j as

    l(\alpha) = \sum_{i=1}^n \log P(R_i)
              = \sum_{i=1}^n \log \Bigl( \sum_{j=1}^m \alpha_j I(I_j \subset R_i) \Bigr),

which needs to be maximized over all \alpha \ge 0 such that \alpha_1 + \ldots + \alpha_m = 1. Note that we have made the assumption that the cdf is piecewise constant with jumps at the observation times t_i only. This corresponds to the assumption that the mass \alpha_j is placed at the right-most point of the maximal intersection I_j.

In the example considered above, we find that

    l(\alpha) = \log \alpha_1 + \log(\alpha_2 + \alpha_3) + \log(\alpha_1 + \alpha_2) + \log \alpha_3,

which is maximized at the boundary point \alpha = (0.5, 0, 0.5). This corresponds to putting mass 1/2 at the point t_1 = 1 and the remaining mass beyond t_4 = 5.
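As a quick sanity check (my own illustration, not part of the original notes), one can maximize l(\alpha) numerically over the simplex with a crude grid search:

```r
## Numerical check that l(alpha) from the n = 4 example is maximized at
## the boundary point alpha = (0.5, 0, 0.5). The simplex is parametrized
## by (a1, a2), with a3 = 1 - a1 - a2.
ll <- function(a) log(a[1]) + log(a[2] + a[3]) + log(a[1] + a[2]) + log(a[3])
g  <- expand.grid(a1 = seq(0.01, 0.98, by = 0.01),
                  a2 = seq(0.00, 0.98, by = 0.01))
g  <- g[g$a1 + g$a2 <= 0.99, ]           # keep a3 >= 0.01
v  <- apply(g, 1, function(p) ll(c(p[1], p[2], 1 - p[1] - p[2])))
g[which.max(v), ]                        # approximately (0.5, 0)
```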

The approach to solving the problem using maximal intersections extends nicely to other forms of interval censoring; see, for example, the discussion in Maathuis (2007). However, in the special case of current status data, there is a better way of finding the estimator.

3.1.2 A Convex Minorant Approach

We now return to the formulation of the likelihood problem given in (3.1). The following proposition gives a characterization of the MLE; a proof is given, for example, in Groeneboom and Wellner (1992). Recall that the shape-constrained MLE may often be found in terms of a set of equalities (and inequalities). The reason for this is that maximizing the likelihood is actually a constrained convex optimization problem, and these inequalities may be seen as a special case of Fenchel duality. A discussion of this link appears in Robertson et al. (1988, Chapter 6).

Proposition 3.1. The vector \hat{y} maximizes l(y) given in (3.1) subject to 0 \le y_1 \le \ldots \le y_n \le 1 if and only if the following conditions hold.

(C1) For all j = 1, \ldots, n,

    \sum_{i=j}^n \Bigl( \frac{\delta_i}{\hat{y}_i} - \frac{1 - \delta_i}{1 - \hat{y}_i} \Bigr) \le 0,

with equality if \hat{y}_j > \hat{y}_{j-1}.

(C2)

    \sum_{i=1}^n \hat{y}_i \Bigl( \frac{\delta_i}{\hat{y}_i} - \frac{1 - \delta_i}{1 - \hat{y}_i} \Bigr) = 0.

Moreover, the \hat{y} which satisfies the equations above is unique.

A little bit of calculation allows us to re-write this characterization as follows.

Corollary 3.2. The vector \hat{y} is the MLE if and only if

    \sum_{i=1}^{j-1} ( \hat{y}_i - \delta_i ) \le 0    for all j = 1, \ldots, n,

with equality holding if \hat{y}_j > \hat{y}_{j-1}.

The incredible thing about this corollary is that it allows us to again characterize the MLE in terms of convex minorants (last time we had concave majorants...). Indeed, the vector \hat{y} is the vector of left derivatives, at j = 1, \ldots, n, of the greatest convex minorant of the graph of \{(j, \sum_{i=1}^j \delta_i) : j = 0, \ldots, n\}. This is easy to see: define \Delta(j) = \sum_{i=1}^j \delta_i and G(j) = \sum_{i=1}^j \hat{y}_i, with G linear in between integer values of j. Clearly, G is convex and a minorant of \Delta. That it is the greatest such minorant follows from the linearity of G between points of touch with \Delta. Also, the left derivative of G at j is its slope there, given by G(j) - G(j-1) = \hat{y}_j.

The graphical characterization gives us a very simple way of finding the MLE. The example considered previously in Figure 4 is shown using this approach in Figure 5.

Figure 5: Finding the MLE using the greatest convex minorant approach.
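In R, the slopes of the greatest convex minorant of the cumulative-sum diagram are exactly what stats::isoreg computes, since the isotonic least-squares fit and the GCM slopes coincide (this is the isotonic-regression connection discussed at the end of this section). A minimal sketch, for the Figure 4 example (this is my own illustration, not the posted status.r):

```r
## MLE for the current status example of Figure 4, via the greatest
## convex minorant of the cusum diagram: isoreg() returns the isotonic
## (monotone increasing) least-squares fit of delta on t, whose fitted
## values are the GCM slopes.
t <- c(1, 2.5, 3, 5)
d <- c(1, 0, 1, 0)
yhat <- isoreg(t, d)$yf   # MLE of F at the observation times
yhat                      # 0.5 0.5 0.5 0.5, i.e. Fhat = 1/2 on [t_1, t_4]
```

This matches the maximal-intersection solution above: mass 1/2 at t_1 = 1 and mass 1/2 beyond t_4 = 5.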

Proof of Corollary 3.2. Let \tau and \eta be two successive jump points of \hat{y}. That is,

    \hat{y}_{\tau-1} < \hat{y}_\tau = \hat{y}_{\tau+1} = \ldots = \hat{y}_{\eta-1} < \hat{y}_\eta.

By (C1), we also know that, for \tau < j \le \eta,

    \sum_{i=\tau}^{j-1} \Bigl( \frac{\delta_i}{\hat{y}_i} - \frac{1 - \delta_i}{1 - \hat{y}_i} \Bigr) \ge 0,

with strict inequality for j = \tau + 1, \ldots, \eta - 1, and equality for j = \eta (this is the difference of the sums in (C1) starting at i = \tau and at i = j, and the sum starting at a jump point vanishes). We multiply the above display by \hat{y}_i (1 - \hat{y}_i), which is constant for \tau \le i \le \eta - 1, and obtain

    \sum_{i=\tau}^{j-1} \bigl( \delta_i (1 - \hat{y}_i) - (1 - \delta_i) \hat{y}_i \bigr) = \sum_{i=\tau}^{j-1} ( \delta_i - \hat{y}_i ) \ge 0,

with equality at j = \eta. Piecing together the blocks of constancy proves the result.

Recall the tumorigenicity study of RFM mice (taken from Sun (2006)) considered at the beginning of this section. The nonparametric MLE of the survival functions for both groups is shown in Figure 6 below.

Figure 6: Survival functions resulting from the nonparametric MLE for the RFM mice data set. The germ-free mice are shown in blue (mean = 689.5), and the conventional environment in red (mean > 79.28).

The convex minorant characterization may also be arrived at by viewing the MLE as an isotonic regression. That is, one may show that the MLE is also the minimizer of the least-squares criterion

    \sum_{i=1}^n ( \delta_i - y_i )^2

over all y subject to our monotonicity conditions. For more information on isotonic regression and the link to convex minorants, we refer to Robertson et al. (1988, Theorem 1.2.1).

3.2 Consistency and Asymptotics

Figure 7: Here X comes from an exponential distribution with mean one, and T from an exponential with mean 0.5. The plots show the MLE together with the true cdf F (red) for increasing sample sizes n.

Figure 7 gives an example of the MLE as the sample size increases. From the figure, it appears that the estimator is consistent for the true distribution F; a picture of this kind is easy to reproduce, as in the sketch below.
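```r
## A sketch reproducing the flavour of Figure 7 (my own illustration,
## not the posted status.r): X ~ Exponential(mean 1), T ~ Exponential
## (mean 0.5), with the MLE computed via isotonic regression as in
## Section 3.1.2.
set.seed(4)
n <- 1000
x <- rexp(n)                      # failure times (never observed)
t <- rexp(n, rate = 2)            # observation times, mean 0.5
d <- as.numeric(x <= t)           # current status indicators
o <- order(t); t <- t[o]; d <- d[o]
Fhat <- isoreg(t, d)$yf           # MLE of F at the ordered times
plot(t, Fhat, type = "s", ylim = c(0, 1), xlab = "x", ylab = "F(x)")
curve(pexp(x), add = TRUE, col = "red")   # true cdf of X
```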

However, there is one condition we need for this to hold.

Theorem 3.3. Let F denote the true distribution of X and G the distribution of the observation times T. Suppose also that P_F \ll P_G. Then the nonparametric MLE \hat{F}_n of F is consistent. That is,

    \sup_{t \in R} | \hat{F}_n(t) - F(t) | \to 0

almost surely.

Recall that P_F \ll P_G means that the probability measure P_F induced by F is absolutely continuous with respect to the probability measure P_G induced by G. That is, if there is a set A such that P_G(A) = 0, then we must also have that P_F(A) = 0. It is easy to see why this assumption is necessary: if G misses a region of positive probability under F, then there is no way for the estimator to learn anything about F there. Notice also that the MLE is now consistent everywhere (unlike the Grenander estimator of a decreasing density, which is inconsistent at zero). This is because we have more control over cumulative distribution functions than over densities. For a proof of Theorem 3.3, see Groeneboom and Wellner (1992); the proof follows a method similar to that shown in the proof of Theorem 2.2.

Next, we consider the global and local convergence rates of the estimators. Again, we find that these are of order n^{1/3}, as is true for all monotone nonparametric maximum likelihood estimators.

Theorem 3.4. Let t_0 be such that 0 < F(t_0), G(t_0) < 1, and let F and G be differentiable at t_0, with strictly positive derivatives f(t_0) and g(t_0), respectively. Then

    n^{1/3} \bigl( \hat{F}_n(t_0) - F(t_0) \bigr) \Rightarrow 2cZ,

where

    c = \Bigl( \frac{F(t_0)(1 - F(t_0)) f(t_0)}{2 g(t_0)} \Bigr)^{1/3},

and Z is the location of the maximum of the process \{B(t) - t^2, t \in R\}, with B(t) a standard two-sided Brownian motion on R such that B(0) = 0.

Theorem 3.5.

    \int | \hat{F}_n(x) - F(x) | \, dG(x) = O_p(n^{-1/3}).

Theorem 3.4 is taken from Groeneboom and Wellner (1992, page 89), and Theorem 3.5 is taken from van der Vaart and Wellner (1996, page 322).

For current status data, there is no known switching relation. Instead, we use the characterization of the MLE via the inequalities in Proposition 3.1 to prove these results. The idea is to use this characterization as an operator which uniquely finds the MLE as a function of some underlying process. The operator is then shown to be continuous, and the limit of the underlying process is found. The details of the proofs are given in the references. It is also possible to state stronger results for \hat{F}_n(t_0), similar to that of Remark 2.6 (see e.g. Theorem 4.2 in Maathuis (2007)). However, the statement of Theorem 3.4 is particularly nice, in that the constant c is revealing in its dependence on g(t_0). We already know that we

need P_F \ll P_G for consistency to hold, as otherwise G undersamples regions of F. On a local level, it would be optimal for G also not to oversample relative to F. We see this in the term f(t_0)/g(t_0) which appears in the constant c. The dependence of c on F(t_0)(1 - F(t_0)) is natural and to be expected.

3.3 Exercises

1. Prove the necessity of the conditions (C1) and (C2) of Proposition 3.1.

   (a) Define the directional derivative of the likelihood function as

           \partial_y l(\hat{y}) = \lim_{\epsilon \downarrow 0} \epsilon^{-1} \bigl( l(\hat{y} + \epsilon y) - l(\hat{y}) \bigr),

       and show that it is equal to

           \partial_y l(\hat{y}) = \sum_{i=1}^n y_i \Bigl( \frac{\delta_i}{\hat{y}_i} - \frac{1 - \delta_i}{1 - \hat{y}_i} \Bigr).

   (b) Define Y = \{ y \in R^n : 0 \le y_1 \le \ldots \le y_n \le 1 \}. Show that if \hat{y} + \epsilon y \in Y for all small \epsilon > 0, then \partial_y l(\hat{y}) \le 0, and that if \hat{y} \pm \epsilon y \in Y, then \partial_y l(\hat{y}) = 0.

   (c) Define e^{(i)} = (0, \ldots, 0, 1, \ldots, 1), with a total of i - 1 zeros in the vector. Now consider the choices y = \hat{y} and y = e^{(i)}. Plugging these into part (b) above should yield conditions (C1) and (C2).

   (d) Recall the characterization of decreasing densities given in (2.1). Write a similar characterization for all increasing functions. Can you relate the choice of the vectors e^{(i)} to this characterization?

2. Posted on the website is R code to find the nonparametric MLE of current status data using the graphical representation (see status.r). I have also posted the data set hepa.txt, which includes Hepatitis A data taken from Keiding (1991). The data consist of a cross-sectional survey of people of various ages who had their blood tested for the presence of Hepatitis A antibodies. The data give the age (age), the number of people with the antibodies present (npos), and the total number tested at a given age (total). Find the nonparametric MLE of the cdf of the random age at which one develops the Hepatitis A antibodies.

3. For the RFM mice example, we found the MLE of the cdf, and used this to calculate the means for the two groups (CE = conventional environment, GF = germ-free) using

       \mu_{CE} = \int_0^\infty (1 - \hat{F}_{CE}(x)) \, dx,
       \mu_{GF} = \int_0^\infty (1 - \hat{F}_{GF}(x)) \, dx.

See Figure 6 for the exact values.

   (a) We state that \mu_{CE} > 79.28. Why not report an equality?
   (b) What does the assumption that the MLE \hat{F}_n is piecewise constant, with all mass placed at the right-most point of each maximal intersection, imply about \hat{\mu}_n calculated as \int_0^\infty (1 - \hat{F}_n(x)) \, dx?
   (c) Is \hat{\mu}_n a consistent estimator of \int_0^\infty (1 - F(x)) \, dx?

4. In Figure 7 we showed an example of the nonparametric MLE for current status data. In the example, both X and T are exponential, with means 1 and 0.5, respectively. Recall that we say that X is stochastically greater than T (and write T \le_{st} X) if F_T(x) \ge F_X(x) for all x. Would this additional information have any impact on the nonparametric MLE?

4 Disclaimer

In this short course I have tried to give a flavour of the work done in shape-constrained nonparametric likelihood estimation. The notes are, as expected, far from complete. Hopefully, the references given will provide a good start for those interested in different aspects of the theory presented here. In particular, I recommend Groeneboom and Wellner (1992) for the more theoretical aspects of the results given in Section 3, and Maathuis (2007) for accessible lecture notes on interval censoring and the nonparametric MLE.

References

Balabdaoui, F. (2007). Consistent estimation of a convex density at the origin. Math. Methods Statist.

Balabdaoui, F., Jankowski, H., Pavlides, M., Seregin, A. and Wellner, J. (2009). On the Grenander estimator at zero. Tech. Rep. 554, University of Washington.

Balabdaoui, F. and Wellner, J. A. (2007). Estimation of a k-monotone density: limit distribution theory and the spline connection. Ann. Statist.

Carolan, C. and Dykstra, R. (1999). Asymptotic behavior of the Grenander estimator at density flat regions. Canad. J. Statist.

Chak, P. M., Madras, N. and Smith, B. (2005). Semi-nonparametric estimation with Bernstein polynomials. Economics Letters.

Chernoff, H. (1964). Estimation of the mode. Ann. Inst. Statist. Math.

Feller, W. (1971). An Introduction to Probability Theory and Its Applications, Vol. II. Second edition, John Wiley & Sons, New York.

Ferger, D. (2004). A continuous mapping theorem for the argmax-functional in the non-unique case. Statist. Neerlandica.

Gneiting, T. (1998). On the Bernstein-Hausdorff-Widder conditions for completely monotone functions. Exposition. Math.

Gneiting, T. (1999). Radial positive definite functions generated by Euclid's hat. J. Multivariate Anal.

Grenander, U. (1956). On the theory of mortality measurement. II. Skand. Aktuarietidskr. (1957).

Groeneboom, P. (1983). The concave majorant of Brownian motion. Ann. Probab.

Groeneboom, P. (1985). Estimating a monotone density. In Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, Vol. II (Berkeley, Calif., 1983). Wadsworth Statist./Probab. Ser., Wadsworth, Belmont, CA.

Groeneboom, P. (1986). Some current developments in density estimation. In Mathematics and Computer Science (Amsterdam, 1983), vol. 1 of CWI Monogr. North-Holland, Amsterdam.

Groeneboom, P. (1989). Brownian motion with a parabolic drift and Airy functions. Probab. Theory Related Fields.

Groeneboom, P., Jongbloed, G. and Wellner, J. A. (2001a). A canonical process for estimation of convex functions: the "invelope" of integrated Brownian motion + t^4. Ann. Statist.

Groeneboom, P., Jongbloed, G. and Wellner, J. A. (2001b). Estimation of a convex function: characterizations and asymptotic theory. Ann. Statist.

Groeneboom, P. and Pyke, R. (1983). Asymptotic normality of statistics based on the convex minorants of empirical distribution functions. Ann. Probab.

Groeneboom, P. and Wellner, J. A. (1992). Information Bounds and Nonparametric Maximum Likelihood Estimation, vol. 19 of DMV Seminar. Birkhäuser Verlag, Basel.

Groeneboom, P. and Wellner, J. A. (2001). Computing Chernoff's distribution. J. Comput. Graph. Statist.

Hoel, D. and Walberg, H. (1972). Statistical analysis of survival experiments. Journal of the National Cancer Institute.

Jankowski, H. and Wellner, J. (2009). Nonparametric estimation of a convex bathtub-shaped hazard function. Bernoulli. To appear (available online).

Jewell, N. P. (1982). Mixtures of exponential distributions. Ann. Statist.

Keiding, N. (1991). Age-specific incidence and prevalence: a statistical perspective. J. Roy. Statist. Soc. Ser. A. With discussion.

Kim, J. and Pollard, D. (1990). Cube root asymptotics. Ann. Statist.

Lévy, P. (1962). Extensions d'un théorème de D. Dugué et M. Girault. Z. Wahrscheinlichkeitstheorie Verw. Gebiete.

Maathuis, M. (2007). Survival analysis for interval censored data: lecture notes. Available online: maathuis/teaching/.

Marshall, A. W. and Proschan, F. (1965). Maximum likelihood estimation for distributions with monotone failure rate. Ann. Math. Statist.

Peto, R. (1973). Experimental survival curves for interval censored data. Applied Statistics.

Prakasa Rao, B. L. S. (1970). Estimation for distributions with monotone failure rate. Ann. Math. Statist.

Pyke, R. (1959). The supremum and infimum of the Poisson process. Ann. Math. Statist.

Robertson, T., Wright, F. T. and Dykstra, R. L. (1988). Order Restricted Statistical Inference. Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, Chichester.

Sun, J. (2006). The Statistical Analysis of Interval-Censored Failure Time Data. Statistics for Biology and Health, Springer, New York.

Turnbull, B. W. (1976). The empirical distribution function with arbitrarily grouped, censored and truncated data. J. Roy. Statist. Soc. Ser. B.


Nonparametric estimation of. s concave and log-concave densities: alternatives to maximum likelihood Nonparametric estimation of s concave and log-concave densities: alternatives to maximum likelihood Jon A. Wellner University of Washington, Seattle Cowles Foundation Seminar, Yale University November

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research

More information

Likelihood Based Inference for Monotone Response Models

Likelihood Based Inference for Monotone Response Models Likelihood Based Inference for Monotone Response Models Moulinath Banerjee University of Michigan September 11, 2006 Abstract The behavior of maximum likelihood estimates (MLEs) and the likelihood ratio

More information

Nonparametric estimation for current status data with competing risks

Nonparametric estimation for current status data with competing risks Nonparametric estimation for current status data with competing risks Marloes Henriëtte Maathuis A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

Nonparametric estimation: s concave and log-concave densities: alternatives to maximum likelihood

Nonparametric estimation: s concave and log-concave densities: alternatives to maximum likelihood Nonparametric estimation: s concave and log-concave densities: alternatives to maximum likelihood Jon A. Wellner University of Washington, Seattle Statistics Seminar, York October 15, 2015 Statistics Seminar,

More information

Legendre-Fenchel transforms in a nutshell

Legendre-Fenchel transforms in a nutshell 1 2 3 Legendre-Fenchel transforms in a nutshell Hugo Touchette School of Mathematical Sciences, Queen Mary, University of London, London E1 4NS, UK Started: July 11, 2005; last compiled: August 14, 2007

More information

DR.RUPNATHJI( DR.RUPAK NATH )

DR.RUPNATHJI( DR.RUPAK NATH ) Contents 1 Sets 1 2 The Real Numbers 9 3 Sequences 29 4 Series 59 5 Functions 81 6 Power Series 105 7 The elementary functions 111 Chapter 1 Sets It is very convenient to introduce some notation and terminology

More information

Les deux complices. Piet Groeneboom, Delft University. Juin 28, 2018

Les deux complices. Piet Groeneboom, Delft University. Juin 28, 2018 Les deux complices Piet Groeneboom, Delft University Juin 28, 2018 Monsieur Teste (Paul Valéry) Je me suis rarement perdu de vue Je me suis détesté je me suis adoré Puis, nous avons vieilli ensemble (La

More information

HYPOTHESIS TESTING: FREQUENTIST APPROACH.

HYPOTHESIS TESTING: FREQUENTIST APPROACH. HYPOTHESIS TESTING: FREQUENTIST APPROACH. These notes summarize the lectures on (the frequentist approach to) hypothesis testing. You should be familiar with the standard hypothesis testing from previous

More information

Optimization and Optimal Control in Banach Spaces

Optimization and Optimal Control in Banach Spaces Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,

More information

2tdt 1 y = t2 + C y = which implies C = 1 and the solution is y = 1

2tdt 1 y = t2 + C y = which implies C = 1 and the solution is y = 1 Lectures - Week 11 General First Order ODEs & Numerical Methods for IVPs In general, nonlinear problems are much more difficult to solve than linear ones. Unfortunately many phenomena exhibit nonlinear

More information

Statistics 3858 : Maximum Likelihood Estimators

Statistics 3858 : Maximum Likelihood Estimators Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

An exponential family of distributions is a parametric statistical model having densities with respect to some positive measure λ of the form.

An exponential family of distributions is a parametric statistical model having densities with respect to some positive measure λ of the form. Stat 8112 Lecture Notes Asymptotics of Exponential Families Charles J. Geyer January 23, 2013 1 Exponential Families An exponential family of distributions is a parametric statistical model having densities

More information

11 Survival Analysis and Empirical Likelihood

11 Survival Analysis and Empirical Likelihood 11 Survival Analysis and Empirical Likelihood The first paper of empirical likelihood is actually about confidence intervals with the Kaplan-Meier estimator (Thomas and Grunkmeier 1979), i.e. deals with

More information

Fourth Week: Lectures 10-12

Fourth Week: Lectures 10-12 Fourth Week: Lectures 10-12 Lecture 10 The fact that a power series p of positive radius of convergence defines a function inside its disc of convergence via substitution is something that we cannot ignore

More information

Asymptotics for posterior hazards

Asymptotics for posterior hazards Asymptotics for posterior hazards Igor Prünster University of Turin, Collegio Carlo Alberto and ICER Joint work with P. Di Biasi and G. Peccati Workshop on Limit Theorems and Applications Paris, 16th January

More information

The strictly 1/2-stable example

The strictly 1/2-stable example The strictly 1/2-stable example 1 Direct approach: building a Lévy pure jump process on R Bert Fristedt provided key mathematical facts for this example. A pure jump Lévy process X is a Lévy process such

More information

EC9A0: Pre-sessional Advanced Mathematics Course. Lecture Notes: Unconstrained Optimisation By Pablo F. Beker 1

EC9A0: Pre-sessional Advanced Mathematics Course. Lecture Notes: Unconstrained Optimisation By Pablo F. Beker 1 EC9A0: Pre-sessional Advanced Mathematics Course Lecture Notes: Unconstrained Optimisation By Pablo F. Beker 1 1 Infimum and Supremum Definition 1. Fix a set Y R. A number α R is an upper bound of Y if

More information

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,

More information

Constrained Optimization and Lagrangian Duality

Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

More information

STAT 512 sp 2018 Summary Sheet

STAT 512 sp 2018 Summary Sheet STAT 5 sp 08 Summary Sheet Karl B. Gregory Spring 08. Transformations of a random variable Let X be a rv with support X and let g be a function mapping X to Y with inverse mapping g (A = {x X : g(x A}

More information

MATH 6605: SUMMARY LECTURE NOTES

MATH 6605: SUMMARY LECTURE NOTES MATH 6605: SUMMARY LECTURE NOTES These notes summarize the lectures on weak convergence of stochastic processes. If you see any typos, please let me know. 1. Construction of Stochastic rocesses A stochastic

More information

NONPARAMETRIC CONFIDENCE INTERVALS FOR MONOTONE FUNCTIONS. By Piet Groeneboom and Geurt Jongbloed Delft University of Technology

NONPARAMETRIC CONFIDENCE INTERVALS FOR MONOTONE FUNCTIONS. By Piet Groeneboom and Geurt Jongbloed Delft University of Technology NONPARAMETRIC CONFIDENCE INTERVALS FOR MONOTONE FUNCTIONS By Piet Groeneboom and Geurt Jongbloed Delft University of Technology We study nonparametric isotonic confidence intervals for monotone functions.

More information

Independence of some multiple Poisson stochastic integrals with variable-sign kernels

Independence of some multiple Poisson stochastic integrals with variable-sign kernels Independence of some multiple Poisson stochastic integrals with variable-sign kernels Nicolas Privault Division of Mathematical Sciences School of Physical and Mathematical Sciences Nanyang Technological

More information

EMPIRICAL ENVELOPE MLE AND LR TESTS. Mai Zhou University of Kentucky

EMPIRICAL ENVELOPE MLE AND LR TESTS. Mai Zhou University of Kentucky EMPIRICAL ENVELOPE MLE AND LR TESTS Mai Zhou University of Kentucky Summary We study in this paper some nonparametric inference problems where the nonparametric maximum likelihood estimator (NPMLE) are

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

MATH 117 LECTURE NOTES

MATH 117 LECTURE NOTES MATH 117 LECTURE NOTES XIN ZHOU Abstract. This is the set of lecture notes for Math 117 during Fall quarter of 2017 at UC Santa Barbara. The lectures follow closely the textbook [1]. Contents 1. The set

More information

Generalized continuous isotonic regression

Generalized continuous isotonic regression Generalized continuous isotonic regression Piet Groeneboom, Geurt Jongbloed To cite this version: Piet Groeneboom, Geurt Jongbloed. Generalized continuous isotonic regression. Statistics and Probability

More information

STAT 830 Hypothesis Testing

STAT 830 Hypothesis Testing STAT 830 Hypothesis Testing Hypothesis testing is a statistical problem where you must choose, on the basis of data X, between two alternatives. We formalize this as the problem of choosing between two

More information

STAT Sample Problem: General Asymptotic Results

STAT Sample Problem: General Asymptotic Results STAT331 1-Sample Problem: General Asymptotic Results In this unit we will consider the 1-sample problem and prove the consistency and asymptotic normality of the Nelson-Aalen estimator of the cumulative

More information

MATH Max-min Theory Fall 2016

MATH Max-min Theory Fall 2016 MATH 20550 Max-min Theory Fall 2016 1. Definitions and main theorems Max-min theory starts with a function f of a vector variable x and a subset D of the domain of f. So far when we have worked with functions

More information

7 Influence Functions

7 Influence Functions 7 Influence Functions The influence function is used to approximate the standard error of a plug-in estimator. The formal definition is as follows. 7.1 Definition. The Gâteaux derivative of T at F in the

More information

A Probabilistic Upper Bound on Differential Entropy

A Probabilistic Upper Bound on Differential Entropy A Probabilistic Upper Bound on Differential Entropy Joseph DeStefano, Qifeng Lu and Erik Learned-Miller Abstract The differential entrops a quantity employed ubiquitousln communications, statistical learning,

More information

arxiv: v3 [math.st] 29 Nov 2018

arxiv: v3 [math.st] 29 Nov 2018 A unified study of nonparametric inference for monotone functions arxiv:186.1928v3 math.st] 29 Nov 218 Ted Westling Center for Causal Inference University of Pennsylvania tgwest@pennmedicine.upenn.edu

More information

P n. This is called the law of large numbers but it comes in two forms: Strong and Weak.

P n. This is called the law of large numbers but it comes in two forms: Strong and Weak. Large Sample Theory Large Sample Theory is a name given to the search for approximations to the behaviour of statistical procedures which are derived by computing limits as the sample size, n, tends to

More information

STAT 830 Hypothesis Testing

STAT 830 Hypothesis Testing STAT 830 Hypothesis Testing Richard Lockhart Simon Fraser University STAT 830 Fall 2018 Richard Lockhart (Simon Fraser University) STAT 830 Hypothesis Testing STAT 830 Fall 2018 1 / 30 Purposes of These

More information

Empirical likelihood ratio with arbitrarily censored/truncated data by EM algorithm

Empirical likelihood ratio with arbitrarily censored/truncated data by EM algorithm Empirical likelihood ratio with arbitrarily censored/truncated data by EM algorithm Mai Zhou 1 University of Kentucky, Lexington, KY 40506 USA Summary. Empirical likelihood ratio method (Thomas and Grunkmier

More information

Product measure and Fubini s theorem

Product measure and Fubini s theorem Chapter 7 Product measure and Fubini s theorem This is based on [Billingsley, Section 18]. 1. Product spaces Suppose (Ω 1, F 1 ) and (Ω 2, F 2 ) are two probability spaces. In a product space Ω = Ω 1 Ω

More information

41903: Introduction to Nonparametrics

41903: Introduction to Nonparametrics 41903: Notes 5 Introduction Nonparametrics fundamentally about fitting flexible models: want model that is flexible enough to accommodate important patterns but not so flexible it overspecializes to specific

More information

LECTURE-13 : GENERALIZED CAUCHY S THEOREM

LECTURE-13 : GENERALIZED CAUCHY S THEOREM LECTURE-3 : GENERALIZED CAUCHY S THEOREM VED V. DATAR The aim of this lecture to prove a general form of Cauchy s theorem applicable to multiply connected domains. We end with computations of some real

More information

Lecture 4 September 15

Lecture 4 September 15 IFT 6269: Probabilistic Graphical Models Fall 2017 Lecture 4 September 15 Lecturer: Simon Lacoste-Julien Scribe: Philippe Brouillard & Tristan Deleu 4.1 Maximum Likelihood principle Given a parametric

More information

with Current Status Data

with Current Status Data Estimation and Testing with Current Status Data Jon A. Wellner University of Washington Estimation and Testing p. 1/4 joint work with Moulinath Banerjee, University of Michigan Talk at Université Paul

More information

The properties of L p -GMM estimators

The properties of L p -GMM estimators The properties of L p -GMM estimators Robert de Jong and Chirok Han Michigan State University February 2000 Abstract This paper considers Generalized Method of Moment-type estimators for which a criterion

More information

ISOTONIC MAXIMUM LIKELIHOOD ESTIMATION FOR THE CHANGE POINT OF A HAZARD RATE

ISOTONIC MAXIMUM LIKELIHOOD ESTIMATION FOR THE CHANGE POINT OF A HAZARD RATE Sankhyā : The Indian Journal of Statistics 1997, Volume 59, Series A, Pt. 3, pp. 392-407 ISOTONIC MAXIMUM LIKELIHOOD ESTIMATION FOR THE CHANGE POINT OF A HAZARD RATE By S.N. JOSHI Indian Statistical Institute,

More information

Chapter 3a Topics in differentiation. Problems in differentiation. Problems in differentiation. LC Abueg: mathematical economics

Chapter 3a Topics in differentiation. Problems in differentiation. Problems in differentiation. LC Abueg: mathematical economics Chapter 3a Topics in differentiation Lectures in Mathematical Economics L Cagandahan Abueg De La Salle University School of Economics Problems in differentiation Problems in differentiation Problem 1.

More information

Statistical inference on Lévy processes

Statistical inference on Lévy processes Alberto Coca Cabrero University of Cambridge - CCA Supervisors: Dr. Richard Nickl and Professor L.C.G.Rogers Funded by Fundación Mutua Madrileña and EPSRC MASDOC/CCA student workshop 2013 26th March Outline

More information

Empirical Processes: General Weak Convergence Theory

Empirical Processes: General Weak Convergence Theory Empirical Processes: General Weak Convergence Theory Moulinath Banerjee May 18, 2010 1 Extended Weak Convergence The lack of measurability of the empirical process with respect to the sigma-field generated

More information

A note on the asymptotic distribution of Berk-Jones type statistics under the null hypothesis

A note on the asymptotic distribution of Berk-Jones type statistics under the null hypothesis A note on the asymptotic distribution of Berk-Jones type statistics under the null hypothesis Jon A. Wellner and Vladimir Koltchinskii Abstract. Proofs are given of the limiting null distributions of the

More information

Peter Hoff Minimax estimation October 31, Motivation and definition. 2 Least favorable prior 3. 3 Least favorable prior sequence 11

Peter Hoff Minimax estimation October 31, Motivation and definition. 2 Least favorable prior 3. 3 Least favorable prior sequence 11 Contents 1 Motivation and definition 1 2 Least favorable prior 3 3 Least favorable prior sequence 11 4 Nonparametric problems 15 5 Minimax and admissibility 18 6 Superefficiency and sparsity 19 Most of

More information

and Comparison with NPMLE

and Comparison with NPMLE NONPARAMETRIC BAYES ESTIMATOR OF SURVIVAL FUNCTIONS FOR DOUBLY/INTERVAL CENSORED DATA and Comparison with NPMLE Mai Zhou Department of Statistics, University of Kentucky, Lexington, KY 40506 USA http://ms.uky.edu/

More information

(Y; I[X Y ]), where I[A] is the indicator function of the set A. Examples of the current status data are mentioned in Ayer et al. (1955), Keiding (199

(Y; I[X Y ]), where I[A] is the indicator function of the set A. Examples of the current status data are mentioned in Ayer et al. (1955), Keiding (199 CONSISTENCY OF THE GMLE WITH MIXED CASE INTERVAL-CENSORED DATA By Anton Schick and Qiqing Yu Binghamton University April 1997. Revised December 1997, Revised July 1998 Abstract. In this paper we consider

More information

B553 Lecture 1: Calculus Review

B553 Lecture 1: Calculus Review B553 Lecture 1: Calculus Review Kris Hauser January 10, 2012 This course requires a familiarity with basic calculus, some multivariate calculus, linear algebra, and some basic notions of metric topology.

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 24 Paper 153 A Note on Empirical Likelihood Inference of Residual Life Regression Ying Qing Chen Yichuan

More information

Lecture 16: Sample quantiles and their asymptotic properties

Lecture 16: Sample quantiles and their asymptotic properties Lecture 16: Sample quantiles and their asymptotic properties Estimation of quantiles (percentiles Suppose that X 1,...,X n are i.i.d. random variables from an unknown nonparametric F For p (0,1, G 1 (p

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Invariant HPD credible sets and MAP estimators

Invariant HPD credible sets and MAP estimators Bayesian Analysis (007), Number 4, pp. 681 69 Invariant HPD credible sets and MAP estimators Pierre Druilhet and Jean-Michel Marin Abstract. MAP estimators and HPD credible sets are often criticized in

More information

Legendre-Fenchel transforms in a nutshell

Legendre-Fenchel transforms in a nutshell 1 2 3 Legendre-Fenchel transforms in a nutshell Hugo Touchette School of Mathematical Sciences, Queen Mary, University of London, London E1 4NS, UK Started: July 11, 2005; last compiled: October 16, 2014

More information

Estimation of a discrete monotone distribution

Estimation of a discrete monotone distribution Electronic Journal of Statistics Vol. 3 (009) 1567 1605 ISSN: 1935-754 DOI: 10.114/09-EJS56 Estimation of a discrete monotone distribution Hanna K. Jankowski Department of Mathematics and Statistics York

More information

Lecture 2 Machine Learning Review

Lecture 2 Machine Learning Review Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things

More information

Lecture 21. Hypothesis Testing II

Lecture 21. Hypothesis Testing II Lecture 21. Hypothesis Testing II December 7, 2011 In the previous lecture, we dened a few key concepts of hypothesis testing and introduced the framework for parametric hypothesis testing. In the parametric

More information

STAT 7032 Probability Spring Wlodek Bryc

STAT 7032 Probability Spring Wlodek Bryc STAT 7032 Probability Spring 2018 Wlodek Bryc Created: Friday, Jan 2, 2014 Revised for Spring 2018 Printed: January 9, 2018 File: Grad-Prob-2018.TEX Department of Mathematical Sciences, University of Cincinnati,

More information

ST745: Survival Analysis: Nonparametric methods

ST745: Survival Analysis: Nonparametric methods ST745: Survival Analysis: Nonparametric methods Eric B. Laber Department of Statistics, North Carolina State University February 5, 2015 The KM estimator is used ubiquitously in medical studies to estimate

More information

We are going to discuss what it means for a sequence to converge in three stages: First, we define what it means for a sequence to converge to zero

We are going to discuss what it means for a sequence to converge in three stages: First, we define what it means for a sequence to converge to zero Chapter Limits of Sequences Calculus Student: lim s n = 0 means the s n are getting closer and closer to zero but never gets there. Instructor: ARGHHHHH! Exercise. Think of a better response for the instructor.

More information

Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach

Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach By Shiqing Ling Department of Mathematics Hong Kong University of Science and Technology Let {y t : t = 0, ±1, ±2,

More information

Metric spaces and metrizability

Metric spaces and metrizability 1 Motivation Metric spaces and metrizability By this point in the course, this section should not need much in the way of motivation. From the very beginning, we have talked about R n usual and how relatively

More information

Optimality, Duality, Complementarity for Constrained Optimization

Optimality, Duality, Complementarity for Constrained Optimization Optimality, Duality, Complementarity for Constrained Optimization Stephen Wright University of Wisconsin-Madison May 2014 Wright (UW-Madison) Optimality, Duality, Complementarity May 2014 1 / 41 Linear

More information

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

Monte Carlo Studies. The response in a Monte Carlo study is a random variable. Monte Carlo Studies The response in a Monte Carlo study is a random variable. The response in a Monte Carlo study has a variance that comes from the variance of the stochastic elements in the data-generating

More information

POISSON PROCESSES 1. THE LAW OF SMALL NUMBERS

POISSON PROCESSES 1. THE LAW OF SMALL NUMBERS POISSON PROCESSES 1. THE LAW OF SMALL NUMBERS 1.1. The Rutherford-Chadwick-Ellis Experiment. About 90 years ago Ernest Rutherford and his collaborators at the Cavendish Laboratory in Cambridge conducted

More information

Notes on uniform convergence

Notes on uniform convergence Notes on uniform convergence Erik Wahlén erik.wahlen@math.lu.se January 17, 2012 1 Numerical sequences We begin by recalling some properties of numerical sequences. By a numerical sequence we simply mean

More information