STATISTICAL INFERENCE UNDER ORDER RESTRICTIONS LIMIT THEORY FOR THE GRENANDER ESTIMATOR UNDER ALTERNATIVE HYPOTHESES


By Jon A. Wellner
University of Washington

Supported in part by NSF grant DMS and by NI-AID grant 2R01 AI.
AMS 2000 subject classifications: Primary 62G10, 62G20; secondary 62G30.
Keywords and phrases: monotone density, degenerate mixing, discontinuity, shape constrained, Brownian bridge.

1. Limit theory for the Grenander estimator.

Our goal in this section is to reprove some limit theory for the Grenander estimator. In particular we want to study limit distributions for $r_n(\hat f_n(x_0) - f(x_0))$ under the following hypotheses:

Case 1. $f(x) = 1_{[0,1]}(x)$; uniform density (or degenerate mixing distribution).
Case 2. At a point $x_0$ with $f'(x_0) < 0$.
Case 3. At a point $x_0$ with $f^{(j)}(x_0) = 0$, $j = 1, \ldots, k-1$, $f^{(k)}(x_0) \neq 0$.
Case 4. At a point $x_0 \in (a, b)$ with $f(x)$ constant on $(a, b)$.
Case 5. At a point $x_0$ where $f$ is discontinuous.

1.1. Case 1: the Grenander estimator under sampling from a uniform density.

Suppose that $X_1, \ldots, X_n$ are i.i.d. with a monotone density $f$ on $(0, \infty)$. Then the Grenander estimator $\hat f_n$ of $f$ is the left-derivative of the least concave majorant of the empirical distribution function $F_n$. When $f$ is the uniform density $f(x) = 1_{[0,1]}(x)$, corresponding to a degenerate mixing distribution $G$ in the representation of $f$ as a scale mixture of uniform densities, Groeneboom and Pyke (1983) and Groeneboom (1985) showed that for each $0 < x_0 < 1$
\[ \sqrt{n}\,(\hat f_n(x_0) - f(x_0)) \to_d S(x_0) \tag{1} \]
where $S$ is the left derivative of the least concave majorant $C$ of a standard Brownian bridge process $U$ on $[0, 1]$. The properties of the limiting processes $C$ and $S$ have been studied by Groeneboom (1983), Pitman (1983), Bass (1984), Çinlar (1992), and Carolan and Dykstra (2001); see also Carolan and Dykstra (1999).

Here is a short proof of (1). Let $X_1, \ldots, X_n$ be i.i.d. Uniform(0, 1) with distribution function $F(x) = 0 \vee (x \wedge 1)$ and empirical distribution function $F_n$. Define the process
\[ \hat s_n(a) \equiv \mathrm{argmax}_s \{F_n(s) - as\} \quad \text{for } a > 0. \]
Then for fixed $x_0 \in (0, 1)$ and $a > 0$ the

Grenander estimator $\hat f_n$ satisfies the switching relation $[\hat f_n(x_0) \le a] = [\hat s_n(a) \le x_0]$, and hence
\begin{align*}
P(n^{1/2}(\hat f_n(x_0) - f(x_0)) \le t)
 &= P(\hat s_n(f(x_0) + t n^{-1/2}) \le x_0) \tag{2} \\
 &= P(\mathrm{argmax}_h \{F_n(x_0 + h) - (f(x_0) + t n^{-1/2})(x_0 + h)\} \le 0) \\
 &= P(\mathrm{argmax}_h Z_n(h) \le 0)
\end{align*}
where, since $f(x_0) = 1$ implies that $x_0 f(x_0) = x_0 = F(x_0)$,
\begin{align*}
Z_n(h) &\equiv n^{1/2}\{F_n(x_0 + h) - F(x_0) - h f(x_0) - t(x_0 + h) n^{-1/2}\} \\
 &= n^{1/2}\{F_n(x_0 + h) - F(x_0 + h)\} + n^{1/2}(F(x_0 + h) - F(x_0) - h f(x_0)) - t(x_0 + h) \\
 &= n^{1/2}\{F_n(x_0 + h) - F(x_0 + h)\} - t(x_0 + h) \\
 &\Rightarrow U(x_0 + h) - t(x_0 + h)
\end{align*}
where $U$ denotes a Brownian bridge process. Thus by the continuous mapping theorem it follows that the right side of (2) converges to
\[ P(\mathrm{argmax}_h \{U(x_0 + h) - t(x_0 + h)\} \le 0) = P(\mathrm{argmax}_s \{U(s) - ts\} \le x_0) = P(S(x_0) \le t) \]
by switching again, where $S$ is the slope of the least concave majorant of $U$. This one-dimensional convergence in distribution extends straightforwardly to convergence of finite-dimensional distributions, and by monotonicity to convergence in the Skorokhod topology on $D[a, 1-a]$ for each fixed $a \in (0, 1/2)$.

1.2. Case 2: $f$ satisfies $f'(x_0) < 0$, $f(x_0) > 0$, and $f'$ is continuous at $x_0$.

This is the classical case in which Prakasa Rao (1969) established the following limit theorem for the Grenander estimator $\hat f_n$.

Proposition. If $f'(x_0) < 0$, $f(x_0) > 0$, and $f'$ is continuous at $x_0$, then
\[ n^{1/3}(\hat f_n(x_0) - f(x_0)) \to_d (|f'(x_0)| f(x_0)/2)^{1/3}\, S(0) \]
where $S(0)$ is the slope at zero of the least concave majorant of $W(t) - t^2$ and $W$ is a standard two-sided Brownian motion process starting from 0.

Informal proof. We will prove that
\[ n^{1/3}(\hat f_n(x_0 + n^{-1/3} t) - f(x_0)) \to_d S_{a,b}(t) \tag{3} \]

where $S_{a,b}$ is the slope process of the least concave majorant of $aW(t) - bt^2$, $W$ is a two-sided Brownian motion process starting from 0, $a = \sqrt{f(x_0)}$, and $b = |f'(x_0)|/2$. (This can be strengthened to process convergence in the Skorokhod topology and in $L_p[-K, K]$ for $p \ge 1$ and every $K > 0$, but we will not do this here.)

To prove (3), we use switching twice together with the argmax continuous mapping theorem as in the tightness proof. For $s, t \in \mathbb{R}$,
\begin{align*}
& P(n^{1/3}(\hat f_n(x_0 + n^{-1/3} s) - f(x_0)) \le t) \tag{4} \\
&= P(\hat f_n(x_0 + n^{-1/3} s) \le f(x_0) + t n^{-1/3}) \\
&= P(\hat s_n(f(x_0) + t n^{-1/3}) \le x_0 + n^{-1/3} s) \\
&= P(\mathrm{argmax}_v \{F_n(v) - (f(x_0) + n^{-1/3} t) v\} \le x_0 + n^{-1/3} s) \\
&= P(\mathrm{argmax}_h \{F_n(x_0 + n^{-1/3} h) - (f(x_0) + n^{-1/3} t)(x_0 + n^{-1/3} h)\} \le s)
   \quad \text{by letting } v = x_0 + n^{-1/3} h \\
&= P(\mathrm{argmax}_h \{F_n(x_0 + n^{-1/3} h) - F_n(x_0) - (F(x_0 + n^{-1/3} h) - F(x_0)) \\
&\qquad\qquad + F(x_0 + n^{-1/3} h) - F(x_0) - f(x_0) n^{-1/3} h - n^{-1/3} n^{-1/3} t h\} \le s).
\end{align*}
Now the stochastic term in (4) satisfies
\begin{align*}
& n^{2/3}\{F_n(x_0 + n^{-1/3} h) - F_n(x_0) - (F(x_0 + n^{-1/3} h) - F(x_0))\} \\
&\stackrel{d}{=} n^{2/3 - 1/2}\{U_n(F(x_0 + n^{-1/3} h)) - U_n(F(x_0))\} \\
&= n^{1/6}\{U(F(x_0 + n^{-1/3} h)) - U(F(x_0))\} + o_p(1) \quad \text{by KMT} \\
&\stackrel{d}{=} n^{1/6}\, W(f(x_0) n^{-1/3} h) + o_p(1) \\
&\stackrel{d}{=} \sqrt{f(x_0)}\, W(h) + o_p(1)
\end{align*}
where $W$ is a standard two-sided Brownian motion process starting from 0. On the other hand, with $\delta_n \equiv n^{-1/3}$,
\[ n^{2/3}\left(F(x_0 + n^{-1/3} h) - F(x_0) - f(x_0) n^{-1/3} h\right)
   = \delta_n^{-2}\left(F(x_0 + \delta_n h) - F(x_0) - f(x_0)\delta_n h\right) \to -b h^2 \]
with $b = |f'(x_0)|/2$ by our hypothesis, while
\[ n^{2/3}\, n^{-1/3}\, n^{-1/3}\, t h = n^0\, t h = t h. \]
Thus it follows that the last probability above converges to
\[ P(\mathrm{argmax}_h \{\sqrt{f(x_0)}\, W(h) - b h^2 - t h\} \le s) = P(S(s) \le t) \]
by switching again

where $S(s)$ is the slope at $s$ of the least concave majorant of $\sqrt{f(x_0)}\, W(h) - b h^2$.

1.3. Case 3: $f$ satisfies $f^{(j)}(x_0) = 0$, $j = 1, \ldots, k-1$, $f^{(k)}(x_0) \neq 0$.

Now suppose that for some $1 < p < \infty$ and $A \in (0, \infty)$ we have
\[ F(t + s\delta) - F(t) - f(t) s\delta \sim -\delta^{p+1} A |s|^{p+1} \quad \text{as } \delta \to 0. \]
When $f^{(j)}(x_0) = 0$ for $j = 1, \ldots, p-1$ and $f^{(p)}(x_0) \neq 0$, then the last display holds with $A = |f^{(p)}(x_0)|/(p+1)!$. Then Anevski and Hössjer (2002) claim that it follows from Leurgans (1982) and Wright (1981) that
\[ n^{p/(2p+1)}(\hat f_n(x_0) - f(x_0)) \to_d C_p\, S_p(0) \tag{5} \]
where $S_p$ is the process of left derivatives of the least concave majorant of $B(s) - |s|^{p+1}$ and
\[ C_p \equiv f(x_0)^{p/(2p+1)} A^{1/(2p+1)}. \]
(When $p = 1$, $A = |f'(x_0)|/2$ and $C_1 = f(x_0)^{1/3} (|f'(x_0)|/2)^{1/3}$.) In this case the local scale will be $n^{-1/(2p+1)}$. Thus we will study
\[ n^{p/(2p+1)}(\hat f_n(x_0 + n^{-1/(2p+1)} s) - f(x_0)), \quad s \in \mathbb{R}. \]
Now, with $\gamma \equiv p/(2p+1)$,
\begin{align*}
& P(n^{p/(2p+1)}(\hat f_n(x_0 + n^{-1/(2p+1)} s) - f(x_0)) \le t) \tag{6} \\
&= P(\hat f_n(x_0 + n^{-1/(2p+1)} s) \le f(x_0) + t n^{-\gamma}) \\
&= P(\hat s_n(f(x_0) + t n^{-\gamma}) \le x_0 + n^{-1/(2p+1)} s) \\
&= P(\mathrm{argmax}_v \{F_n(v) - (f(x_0) + n^{-\gamma} t) v\} \le x_0 + n^{-1/(2p+1)} s) \\
&= P(\mathrm{argmax}_h \{F_n(x_0 + n^{-1/(2p+1)} h) - (f(x_0) + n^{-\gamma} t)(x_0 + n^{-1/(2p+1)} h)\} \le s)
   \quad \text{by letting } v = x_0 + n^{-1/(2p+1)} h \\
&= P(\mathrm{argmax}_h \{F_n(x_0 + n^{-1/(2p+1)} h) - F_n(x_0) - (F(x_0 + n^{-1/(2p+1)} h) - F(x_0)) \\
&\qquad\qquad + F(x_0 + n^{-1/(2p+1)} h) - F(x_0) - f(x_0) n^{-1/(2p+1)} h - n^{-\gamma} n^{-1/(2p+1)} t h\} \le s).
\end{align*}
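The first steps of displays (2), (4) and (6) all rest on the switching relation between $\hat f_n$ and $\hat s_n$. The following is a minimal sketch, not part of the original notes, that computes the Grenander estimator as the left derivative of the least concave majorant of $F_n$ and checks the relation numerically for a Uniform(0, 1) sample. The function names (`lcm_vertices`, `grenander`, `s_hat`, `switching_agrees`) are illustrative assumptions, and the check is restricted to points $t$ that are not observations and levels $a$ that are not slopes of the majorant, which holds almost surely for fixed $t$ and $a$.

```python
import random


def lcm_vertices(sample):
    """Vertices of the least concave majorant (LCM) of the empirical
    distribution function F_n of a sample on [0, inf)."""
    n = len(sample)
    points = [(0.0, 0.0)] + [(x, (i + 1) / n) for i, x in enumerate(sorted(sample))]
    hull = []
    for px, py in points:
        # Drop the last vertex while it lies on or below the chord joining
        # its predecessor to the new point, so that slopes strictly decrease.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (y2 - y1) * (px - x1) <= (py - y1) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append((px, py))
    return hull


def grenander(hull, t):
    """Left derivative of the LCM at t > 0; zero beyond the largest observation."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 < t <= x2:
            return (y2 - y1) / (x2 - x1)
    return 0.0


def s_hat(sample, a):
    """Largest maximiser of s -> F_n(s) - a*s. For a > 0 the maximum is
    attained at s = 0 or at an order statistic, so only those are searched."""
    n = len(sample)
    best_s, best_val = 0.0, 0.0
    for i, x in enumerate(sorted(sample)):
        val = (i + 1) / n - a * x
        if val >= best_val:
            best_s, best_val = x, val
    return best_s


def switching_agrees(sample, a_values, t_values):
    """True if [f_hat(t) <= a] and [s_hat(a) <= t] coincide on the whole grid."""
    hull = lcm_vertices(sample)
    return all(
        (grenander(hull, t) <= a) == (s_hat(sample, a) <= t)
        for a in a_values
        for t in t_values
    )


random.seed(0)
xs = [random.random() for _ in range(200)]  # Uniform(0, 1), as in Case 1
print(switching_agrees(xs, (0.6, 0.9, 1.1, 1.6), (0.1, 0.3, 0.5, 0.7, 0.9)))
```

The hull scan is linear after sorting; it is one standard way to obtain the majorant and is used here only for illustration.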

Now the stochastic term in (6) satisfies
\begin{align*}
& n^{(p+1)/(2p+1)}\{F_n(x_0 + n^{-1/(2p+1)} h) - F_n(x_0) - (F(x_0 + n^{-1/(2p+1)} h) - F(x_0))\} \\
&\stackrel{d}{=} n^{(p+1)/(2p+1) - 1/2}\{U_n(F(x_0 + n^{-1/(2p+1)} h)) - U_n(F(x_0))\} \\
&= n^{1/(2(2p+1))}\{U(F(x_0 + n^{-1/(2p+1)} h)) - U(F(x_0))\} + o_p(1) \quad \text{by KMT} \\
&\stackrel{d}{=} n^{1/(2(2p+1))}\, W(f(x_0) n^{-1/(2p+1)} h) + o_p(1) \\
&\stackrel{d}{=} \sqrt{f(x_0)}\, W(h) + o_p(1)
\end{align*}
where $W$ is a standard two-sided Brownian motion process starting from 0. On the other hand, with $\delta_n \equiv n^{-1/(2p+1)}$,
\[ n^{(p+1)/(2p+1)}\left(F(x_0 + n^{-1/(2p+1)} h) - F(x_0) - f(x_0) n^{-1/(2p+1)} h\right)
   = \delta_n^{-(p+1)}\left(F(x_0 + \delta_n h) - F(x_0) - f(x_0)\delta_n h\right) \to -A |h|^{p+1} \]
with $A > 0$ by our hypothesis, while
\[ n^{(p+1)/(2p+1)}\, n^{-\gamma}\, n^{-1/(2p+1)}\, t h = n^0\, t h = t h. \]
Thus it follows that the last probability above converges to
\[ P(\mathrm{argmax}_h \{\sqrt{f(x_0)}\, B(h) - A |h|^{p+1} - t h\} \le s) = P(S_p(s) \le t) \]
by switching again, where $S_p(s)$ is the slope at $s$ of the least concave majorant of $\sqrt{f(x_0)}\, B(h) - A |h|^{p+1}$.

Now the final result (5) follows via Brownian scaling. Letting $H_n^{loc}$ denote the localized integral of $\hat f_n$, and $Y_n^{loc}$ the process
\begin{align*}
Y_n^{loc}(h) &= n^{(p+1)/(2p+1)}\{F_n(x_0 + n^{-1/(2p+1)} h) - F_n(x_0) - (F(x_0 + n^{-1/(2p+1)} h) - F(x_0))\} \\
 &\quad + n^{(p+1)/(2p+1)}\{F(x_0 + n^{-1/(2p+1)} h) - F(x_0) - f(x_0) n^{-1/(2p+1)} h\} \\
 &\Rightarrow a W(h) - b |h|^{p+1}
\end{align*}

where $a \equiv \sqrt{f(x_0)}$ and $b = A$, we know that $H_n^{loc}(h) \ge Y_n^{loc}(h)$ and satisfies an integral condition giving equality at touch points. Thus we want to find constants $k_1$ and $k_2$ so that
\[ k_1 Y_n^{loc}(k_2 h) \Rightarrow k_1 (a W(k_2 h) - b |k_2 h|^{p+1})
   \stackrel{d}{=} k_1 a \sqrt{k_2}\, W(h) - k_1 b k_2^{p+1} |h|^{p+1} = W(h) - |h|^{p+1}. \]
The last equality holds if $a k_1 \sqrt{k_2} = 1$ and $b k_1 k_2^{p+1} = 1$. But these relations hold if and only if
\[ k_2 = (a^2/b^2)^{1/(2p+1)} \quad \text{and} \quad k_1 = \frac{1}{a\sqrt{k_2}} = \frac{1}{a}\left(\frac{b}{a}\right)^{1/(2p+1)}. \]
Then we have
\[ k_1 k_2\, (H_n^{loc})'(k_2 s) \to_d \text{slope at } s \text{ of the least concave majorant of } h \mapsto W(h) - |h|^{p+1}, \]
where
\[ k_1 k_2 = \frac{1}{a}\sqrt{k_2} = \frac{1}{a}\left(\frac{a}{b}\right)^{1/(2p+1)} = a^{-1 + 1/(2p+1)}\, b^{-1/(2p+1)} = a^{-2p/(2p+1)}\, b^{-1/(2p+1)}, \]
so that
\[ (k_1 k_2)^{-1} = a^{2p/(2p+1)}\, b^{1/(2p+1)} = f(x_0)^{p/(2p+1)} A^{1/(2p+1)}, \]
in agreement with the claims of Anevski and Hössjer (2002).

Example. Consider the monotone decreasing density function $f$ given by
\[ f(x) = \begin{cases} \tfrac12 (1 + (x-1)^2), & 0 \le x \le 1, \\ \tfrac12 (1 - (x-1)^2), & 1 < x \le 2. \end{cases} \]
Thus
\[ F(x) = \begin{cases} \tfrac12 x + \tfrac16 [1 - (1-x)^3], & 0 \le x \le 1, \\ \tfrac23 + \tfrac12 (x-1) - \tfrac16 (x-1)^3, & 1 < x \le 2. \end{cases} \]
Then $f'(x)$ is given by $x - 1$ for $0 \le x < 1$ and by $1 - x$ for $1 < x < 2$. Thus $f'(1) = 0$, and $f''(x)$ is $1$ for $0 < x < 1$ and $-1$ for $1 < x < 2$. Since $F(1 + s\delta) - F(1) - f(1) s\delta = -\tfrac16 \delta^3 |s|^3$, the hypothesis holds with $p = 2$ and $A = 1/6$. In this case the rate of convergence of the Grenander estimator at 1 is $n^{2/5}$ while the appropriate local scale is $n^{-1/5}$.
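The expansion behind this example is easy to confirm numerically. The sketch below is an illustration added here, with hypothetical helper names `f`, `F` and `scaled_remainder`. It evaluates $F(1 + s\delta) - F(1) - f(1)s\delta$ divided by $\delta^3 |s|^3$ for a few values of $s$. The ratio sits at $-1/6$, so the remainder is of exact order $\delta^{p+1}$ with $p + 1 = 3$, which is the exponent consistent with the stated rate $n^{2/5}$ and local scale $n^{-1/5}$.

```python
def f(x):
    """Density of the example: quadratic pieces glued at x = 1."""
    if x <= 1:
        return 0.5 * (1 + (x - 1) ** 2)
    return 0.5 * (1 - (x - 1) ** 2)


def F(x):
    """Distribution function obtained by integrating f piecewise."""
    if x <= 1:
        return 0.5 * x + (1 - (1 - x) ** 3) / 6
    return 2 / 3 + 0.5 * (x - 1) - (x - 1) ** 3 / 6


def scaled_remainder(s, delta, x0=1.0):
    """(F(x0 + s*delta) - F(x0) - f(x0)*s*delta) / (delta^3 |s|^3)."""
    remainder = F(x0 + s * delta) - F(x0) - f(x0) * s * delta
    return remainder / (delta ** 3 * abs(s) ** 3)


for s in (-2.0, -1.0, 1.0, 2.0):
    print(s, scaled_remainder(s, 0.01))

p = 2
print("rate exponent:", p / (2 * p + 1), "local scale exponent:", -1 / (2 * p + 1))
```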

1.4. Case 4: $f$ constant on $(a, b)$ with $x_0 \in (a, b)$.

Now suppose that:
(i) $f$ is monotone nonincreasing on $[0, \infty)$.
(ii) $f(x_0) > 0$.
(iii) $f$ is constant on some open interval $(a, b)$ containing $x_0$.
Let $[a, b]$ be the closure of the longest open interval containing $x_0$ over which $f$ is constant and equal to $f(x_0)$, and write $p \equiv F(b) - F(a) = f(x_0)(b - a)$.

Proposition (Carolan and Dykstra (1999)). If (i)-(iii) hold, then
\[ \sqrt{n}\,(\hat f_n(x_0) - f(x_0)) \to_d \frac{f(x_0)}{\sqrt{p}} \left\{ \sqrt{1-p}\; Z + S\!\left(\frac{x_0 - a}{b - a}\right) \right\} \]
where $S$ is as in subsection 1.1 and $Z \sim N(0, 1)$ is independent of $S$.

Proof. Again we use the switching relation:
\begin{align*}
P(\sqrt{n}\,(\hat f_n(x_0) - f(x_0)) \le t)
 &= P(\hat f_n(x_0) \le f(x_0) + n^{-1/2} t) \\
 &= P(\hat s_n(f(x_0) + n^{-1/2} t) \le x_0) \\
 &= P(\mathrm{argmax}_s \{F_n(s) - (f(x_0) + n^{-1/2} t) s\} \le x_0) \\
 &= P(\mathrm{argmax}_h \{F_n(x_0 + h) - (f(x_0) + n^{-1/2} t)(x_0 + h)\} \le 0) \\
 &= P(\mathrm{argmax}_h \{F_n(x_0 + h) - F_n(x_0) - (F(x_0 + h) - F(x_0)) \\
 &\qquad\qquad + F(x_0 + h) - F(x_0) - (f(x_0) + n^{-1/2} t) h\} \le 0).
\end{align*}
Note that, by definition of the interval $[a, b]$ and concavity of $F$,
\[ F(x_0 + h) - F(x_0) - h f(x_0) \begin{cases} = 0, & \text{if } a \le x_0 + h \le b, \\ < 0, & \text{if } h < a - x_0 \text{ or } h > b - x_0. \end{cases} \]
Thus since
\[ \sqrt{n}\,\{F_n(x_0 + h) - F_n(x_0) - (F(x_0 + h) - F(x_0))\} \Rightarrow U(F(x_0 + h)) - U(F(x_0)), \]
and
\[ \sqrt{n}\,(F(x_0 + h) - F(x_0) - f(x_0) h) \to \begin{cases} 0, & \text{if } a - x_0 \le h \le b - x_0, \\ -\infty, & \text{if } h < a - x_0 \text{ or } h > b - x_0, \end{cases} \]

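it follows that the probability in the last display converges to
\begin{align*}
& P(\mathrm{argmax}_{a - x_0 \le h \le b - x_0} \{U(F(x_0 + h)) - U(F(x_0)) - t h\} \le 0) \tag{7} \\
&= P(\mathrm{argmax}_{a \le s \le b} \{U(F(s)) - U(F(x_0)) - t(s - x_0)\} \le x_0)
   \quad \text{by letting } s = x_0 + h,\ h = s - x_0, \\
&= P(\mathrm{argmax}_{a \le s \le b} \{U(F(s)) - U(F(a)) - t s\} \le x_0) \\
&= P(R(x_0) \le t)
\end{align*}
where $R(x_0)$ is the slope at $x_0$ of the least concave majorant of $s \mapsto U(F(s)) - U(F(a))$ on $[a, b]$. Now
\begin{align*}
U(F(s)) - U(F(a))
 &= U(F(s)) - U(F(a)) - \frac{F(s) - F(a)}{F(b) - F(a)}\,(U(F(b)) - U(F(a))) \\
 &\quad + \frac{F(s) - F(a)}{F(b) - F(a)}\,(U(F(b)) - U(F(a))) \\
 &\equiv V(s) + \frac{s - a}{b - a}\, W(b)
\end{align*}
where $W(s) \equiv U(F(s)) - U(F(a))$ and
\[ V(s) \equiv W(s) - \frac{F(s) - F(a)}{F(b) - F(a)}\,(U(F(b)) - U(F(a))) = W(s) - \frac{s - a}{b - a}\, W(b) \]
has $V(a) = V(b) = 0$. Furthermore,
\begin{align*}
\mathrm{Cov}[V(r), V(s)]
 &= \mathrm{Cov}[W(r), W(s)] - \frac{r - a}{b - a}\,\mathrm{Cov}[W(b), W(s)] - \frac{s - a}{b - a}\,\mathrm{Cov}[W(r), W(b)]
    + \frac{r - a}{b - a}\,\frac{s - a}{b - a}\,\mathrm{Var}[W(b)] \\
 &= f(x_0)(b - a)\left\{ \frac{r - a}{b - a} \wedge \frac{s - a}{b - a} - \frac{r - a}{b - a}\,\frac{s - a}{b - a} \right\}
\end{align*}
after some tedious calculation. Thus
\[ V(s) \stackrel{d}{=} \sqrt{f(x_0)(b - a)}\; U\!\left(\frac{s - a}{b - a}\right) = \sqrt{p}\; U\!\left(\frac{s - a}{b - a}\right). \]
Furthermore,
\[ \mathrm{Cov}[V(s), W(b)] = 0, \qquad \mathrm{Var}[W(b)] = p(1 - p) = f(x_0)(b - a) - f^2(x_0)(b - a)^2, \]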

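so $V$ and $W(b)$ are independent, and $W(b) \stackrel{d}{=} \sqrt{p(1-p)}\, Z$ where $Z \sim N(0, 1)$. Putting these facts together yields
\[ U(F(s)) - U(F(a)) \stackrel{d}{=} \sqrt{p}\; \tilde U\!\left(\frac{s - a}{b - a}\right) + \sqrt{p(1-p)}\; \frac{s - a}{b - a}\, Z \]
where $\tilde U$ and $Z$ are independent and $\tilde U \stackrel{d}{=} U$. Plugging this back into (7) shows that the limiting distribution can be written as
\begin{align*}
& P\left(\mathrm{argmax}_{a \le s \le b} \left\{ \sqrt{p}\; U\!\left(\frac{s - a}{b - a}\right) + \sqrt{p(1-p)}\; \frac{s - a}{b - a}\, Z - t s \right\} \le x_0 \right) \\
&= P\left(\mathrm{argmax}_{a \le s \le b} \left\{ \sqrt{p}\; U\!\left(\frac{s - a}{b - a}\right) - \left(t - \frac{\sqrt{p(1-p)}\, Z}{b - a}\right) s \right\} \le x_0 \right) \\
&= P\left(R^{\#}(x_0) \le t - \frac{\sqrt{p(1-p)}}{b - a}\, Z\right) \\
&= P\left(\frac{\sqrt{p}}{b - a}\left[ S\!\left(\frac{x_0 - a}{b - a}\right) + \sqrt{1 - p}\; Z \right] \le t\right)
\end{align*}
where $R^{\#}(x_0)$ is the slope at $x_0$ of the least concave majorant of $\sqrt{p}\, U((s - a)/(b - a))$ and $S(t)$ is the slope at $t$ of the least concave majorant of $U$. Since $\sqrt{p}/(b - a) = f(x_0)/\sqrt{p}$, this result agrees with that of Carolan and Dykstra (1999).

1.5. Case 5: Suppose that $f(x_0-) > f(x_0+)$.

Now suppose that $x_0$ is a discontinuity point of $f$: $f(x_0-) > f(x_0+) > 0$. Let $d \equiv f(x_0-) - f(x_0+) > 0$ and set $\bar f(x_0) \equiv (f(x_0-) + f(x_0+))/2$. In this case Anevski and Hössjer (2002) show that
\[ P(\hat f_n(x_0) - \bar f(x_0) \le t) \to P(\mathrm{argmax}_h \{N_0(h) - \rho_{t + d/2,\, t - d/2}(h)\} \le 0) \]
for each $|t| \le d/2$ for which the limit distribution is continuous (at $t$), where $N_0$ is a two-sided, centered Poisson process to be defined below, and
\[ \rho_{B,C}(h) \equiv \begin{cases} B h, & h \ge 0, \\ C h, & h < 0. \end{cases} \]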

Here is the argument (again using switching):
\begin{align*}
P(\hat f_n(x_0) - \bar f(x_0) \le t)
 &= P(\hat s_n(\bar f(x_0) + t) \le x_0) \\
 &= P(\mathrm{argmax}_s \{F_n(s) - (\bar f(x_0) + t) s\} \le x_0) \\
 &= P(\mathrm{argmax}_h \{F_n(x_0 + n^{-1} h) - (\bar f(x_0) + t)(x_0 + n^{-1} h)\} \le 0)
    \quad \text{letting } s = x_0 + n^{-1} h, \text{ so that } h = n(s - x_0), \\
 &= P\big(\mathrm{argmax}_h \{ n(F_n(x_0 + n^{-1} h) - F_n(x_0) - (F(x_0 + n^{-1} h) - F(x_0))) \tag{8} \\
 &\qquad\qquad + n(F(x_0 + n^{-1} h) - F(x_0) - (\bar f(x_0) + t) n^{-1} h) \} \le 0 \big). \tag{9}
\end{align*}
In (8) of the previous display the stochastic term is
\[ n(F_n(x_0 + n^{-1} h) - F_n(x_0) - (F(x_0 + n^{-1} h) - F(x_0)))
   \Rightarrow \begin{cases} N(f(x_0+) h) - f(x_0+) h, & h \ge 0, \\ -(N(f(x_0-) h) - f(x_0-) h), & h < 0, \end{cases}
   \;\equiv N_0(h) \]
where $N$ is a standard two-sided Poisson process starting from 0 with rate 1. On the other hand the drift term in (9) is
\[ n\{F(x_0 + n^{-1} h) - F(x_0) - (\bar f(x_0) + t) n^{-1} h\}
   \to \begin{cases} -t h + f(x_0+) h - \bar f(x_0) h, & h \ge 0, \\ -t h + f(x_0-) h - \bar f(x_0) h, & h < 0, \end{cases}
   = \begin{cases} -h[d/2 + t], & h \ge 0, \\ h[d/2 - t], & h < 0. \end{cases} \]
Combining the last two displays yields the claimed result. This agrees with the result of Anevski and Hössjer (2002). But we can go a bit further by using the switching relation again. Define the process $M$ by
\[ M(h) = \begin{cases} N_0(h) - (d/2) h, & h \ge 0, \\ N_0(h) + (d/2) h, & h < 0, \end{cases} \;= N_0(h) - (d/2)|h|, \]
and let $R(h)$ denote the process of slopes (left derivatives) of the least concave majorant of the process $M$. Then we see that we have shown
\[ P(\hat f_n(x_0) - \bar f(x_0) \le t) \to P(\mathrm{argmax}_h \{M(h) - t h\} \le 0) = P(R(0) \le t) \]
by switching again.

This means that
\[ \hat f_n(x_0) - \bar f(x_0) \to_d R(0). \]
Using the same reasoning again, we can generalize these relations as follows:

Proposition: $\hat f_n(x_0 + s n^{-1}) - \bar f(x_0) \to_d R(s)$.

2. Lower bound calculations for the Grenander estimator.

2.1. Case 1: the Grenander estimator under sampling from a uniform density.

It is also instructive to do the lower bound calculation in this case. Define a sequence of densities $\{f_n\}$ as follows:
\[ f_n(x) = \begin{cases} 1 + n^{-1/2} c/x_0, & 0 \le x \le x_0, \\ 1 - n^{-1/2} c/(1 - x_0), & x_0 < x \le 1, \end{cases} \]
where
\[ \int_0^1 f_n(x)\,dx = x_0 (1 + n^{-1/2} c/x_0) + (1 - x_0)(1 - n^{-1/2} c/(1 - x_0)) = 1. \]
Then, with $\nu(P) = f(x_0)$, it follows that
\[ \nu(P_n) - \nu(P_0) = n^{-1/2} c/x_0 \]
while, by an easy calculation,
\[ H^2(P_n, P_0) = \frac{c^2}{8n}\left\{ \frac{1}{x_0} + \frac{1}{1 - x_0} \right\} + o(n^{-1}) = \frac{c^2}{8n}\,\frac{1}{x_0(1 - x_0)} + o(n^{-1}). \]
Thus $n H^2(P_n, P_0) \to c^2/(8 x_0 (1 - x_0))$, and we conclude that
\begin{align*}
\sqrt{n}\, \inf_{T_n} \max\{E_{n,P_n}|T_n - \nu(P_n)|,\; E_{n,P_0}|T_n - \nu(P_0)|\}
 &\ge \frac{c}{4 x_0}\left(1 - \frac{n H^2(P_n, P_0)}{n}\right)^{2n} \\
 &\to \frac{c}{4 x_0}\exp\left(-\frac{c^2}{4 x_0 (1 - x_0)}\right).
\end{align*}
We maximize this bound by choosing $c$ so that $c^2 = 2 x_0 (1 - x_0)$, and then the lower bound becomes
\[ \frac{\sqrt{2}}{4}\sqrt{\frac{1 - x_0}{x_0}}\,\exp(-1/2). \]
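The two calculations in this case (the Hellinger expansion and the choice of $c$) can be checked numerically. The sketch below is an added illustration with hypothetical function names. It evaluates $nH^2(P_n, P_0)$ exactly for the two-piece perturbation of the uniform density, compares it with $c^2/(8x_0(1-x_0))$, and locates the maximizer of $c \mapsto (c/(4x_0))\exp(-c^2/(4x_0(1-x_0)))$ by a plain grid search. The value $x_0 = 0.3$ is an arbitrary choice.

```python
import math


def n_hellinger_sq(c, x0, n):
    """n * H^2(P_n, P_0), with H^2 = (1/2) * integral of (sqrt f_n - sqrt f_0)^2,
    for the two-piece perturbation of the Uniform(0, 1) density."""
    eps = c / math.sqrt(n)
    left = x0 * (math.sqrt(1 + eps / x0) - 1) ** 2
    right = (1 - x0) * (math.sqrt(1 - eps / (1 - x0)) - 1) ** 2
    return n * 0.5 * (left + right)


def bound(c, x0):
    """Limiting lower bound as a function of c."""
    return c / (4 * x0) * math.exp(-c * c / (4 * x0 * (1 - x0)))


def closed_form(x0):
    """Maximiser c* and maximum value stated in the text."""
    c_star = math.sqrt(2 * x0 * (1 - x0))
    value = math.sqrt(2) / 4 * math.sqrt((1 - x0) / x0) * math.exp(-0.5)
    return c_star, value


def maximise(x0, grid=20001, c_max=2.0):
    """Grid search for the maximiser of bound(., x0) on [0, c_max]."""
    cs = [c_max * k / (grid - 1) for k in range(grid)]
    c_best = max(cs, key=lambda c: bound(c, x0))
    return c_best, bound(c_best, x0)


x0 = 0.3
print(n_hellinger_sq(1.0, x0, 10 ** 6), 1.0 / (8 * x0 * (1 - x0)))
print(maximise(x0), closed_form(x0))
```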

Note that changing the definition of the perturbations just slightly gives a different result, and combining the two results yields a more symmetric result over a neighborhood. Now let
\[ f_n(x) = \begin{cases} 1 + n^{-1/2} c/x_0, & 0 \le x < x_0, \\ 1 - n^{-1/2} c/(1 - x_0), & x_0 \le x \le 1. \end{cases} \]
Then
\[ \nu(P_n) - \nu(P_0) = -n^{-1/2} c/(1 - x_0), \]
while the Hellinger distance calculation is exactly the same. This results in the lower bound
\[ \frac{\sqrt{2}}{4}\sqrt{\frac{x_0}{1 - x_0}}\,\exp(-1/2), \]
and hence we conclude that
\[ \sqrt{n}\, \inf_{T_n} \sup_{P:\, H(P, P_0) \le b n^{-1/2}} E_{n,P}|T_n - \nu(P)|
   \ge \frac{\sqrt{2}}{4}\, e^{-1/2} \left\{ \sqrt{\frac{x_0}{1 - x_0}} \vee \sqrt{\frac{1 - x_0}{x_0}} \right\}. \]

2.2. Case 2: $f$ satisfies $f'(x_0) < 0$, $f(x_0) > 0$, and $f'$ is continuous at $x_0$.

Suppose that we want to estimate $\nu(P) = f(x_0)$ for a fixed $x_0 \in (0, \infty)$ on the basis of a sample $X_1, \ldots, X_n$ from $P_0 \in \mathcal{P}$. Let $f$ be the density corresponding to $P_0$, and suppose that $f'(x_0) < 0$, $f(x_0) > 0$, and $f'$ is continuous at $x_0$. To apply Proposition 3.1 we need to construct some density $f_n$ that is near $f$ in the sense that, for some constant $A$,
\[ n H^2(P_n, P) \to A \quad \text{and} \quad |\nu(P_n) - \nu(P)| = b_n^{-1} \]
where $b_n \to \infty$. Hence we will try the following choice of $f_n$. For $c > 0$, define
\[ f_n(x) = \begin{cases}
 f(x), & \text{if } x \le x_0 - c n^{-1/3} \text{ or } x > x_0 + c n^{-1/3}, \\
 f(x_0 - c n^{-1/3}), & \text{if } x_0 - c n^{-1/3} < x \le x_0, \\
 f(x_0 + c n^{-1/3}), & \text{if } x_0 < x \le x_0 + c n^{-1/3}.
\end{cases} \]
It is easy to see that
\[ n^{1/3}|\nu(P_n) - \nu(P)| = |n^{1/3}(f(x_0 - c n^{-1/3}) - f(x_0))| \to |f'(x_0)|\, c. \tag{10} \]

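On the other hand we calculate
\begin{align*}
H^2(P_n, P)
 &= \frac12 \int_0^\infty \left[\sqrt{f_n(x)} - \sqrt{f(x)}\right]^2 dx \\
 &= \frac12 \int_0^\infty \frac{[\sqrt{f_n(x)} - \sqrt{f(x)}]^2\,[\sqrt{f_n(x)} + \sqrt{f(x)}]^2}{[\sqrt{f_n(x)} + \sqrt{f(x)}]^2}\,dx \\
 &= \frac12 \int_0^\infty \frac{[f_n(x) - f(x)]^2}{[\sqrt{f_n(x)} + \sqrt{f(x)}]^2}\,dx \\
 &= \frac12 \int_{x_0 - c n^{-1/3}}^{x_0} \frac{[f(x_0 - c n^{-1/3}) - f(x)]^2}{[\sqrt{f_n(x)} + \sqrt{f(x)}]^2}\,dx
  + \frac12 \int_{x_0}^{x_0 + c n^{-1/3}} \frac{[f(x_0 + c n^{-1/3}) - f(x)]^2}{[\sqrt{f_n(x)} + \sqrt{f(x)}]^2}\,dx \\
 &= \frac12 \int_{x_0 - c n^{-1/3}}^{x_0} \frac{[f'(x_n^*)(x_0 - c n^{-1/3} - x)]^2}{[\sqrt{f_n(x)} + \sqrt{f(x)}]^2}\,dx
  + \frac12 \int_{x_0}^{x_0 + c n^{-1/3}} \frac{[f'(x_n^{**})(x_0 + c n^{-1/3} - x)]^2}{[\sqrt{f_n(x)} + \sqrt{f(x)}]^2}\,dx \\
 &\sim \frac12\, \frac{f'(x_0)^2}{(2\sqrt{f(x_0)})^2} \int_{x_0 - c n^{-1/3}}^{x_0} (x_0 - c n^{-1/3} - x)^2\,dx
  + \frac12\, \frac{f'(x_0)^2}{(2\sqrt{f(x_0)})^2} \int_{x_0}^{x_0 + c n^{-1/3}} (x_0 + c n^{-1/3} - x)^2\,dx \\
 &= \frac{f'(x_0)^2}{4 f(x_0)}\,\frac{c^3}{3n},
\end{align*}
where $x_n^*$ and $x_n^{**}$ are intermediate points from the mean value theorem. Now we can combine these two pieces with Proposition 3.1 to find that, for any estimator $T_n$ of $\nu(P) = f(x_0)$ and the loss function $l(x) = |x|$, we have
\begin{align*}
& \inf_{T_n} \max\left\{ E_n\, n^{1/3}|T_n - \nu(P_n)|,\; E_0\, n^{1/3}|T_n - \nu(P_0)| \right\} \\
&\ge \frac14\, |n^{1/3}(\nu(P_n) - \nu(P_0))| \left\{1 - \frac{n H^2(P_n, P_0)}{n}\right\}^{2n} \\
&= \frac14\, |n^{1/3}(f(x_0 - c n^{-1/3}) - f(x_0))| \left\{1 - \frac{n H^2(P_n, P_0)}{n}\right\}^{2n} \\
&\to \frac14\, |f'(x_0)|\, c\, \exp\left(-2\,\frac{f'(x_0)^2}{12 f(x_0)}\, c^3\right)
 = \frac14\, |f'(x_0)|\, c\, \exp\left(-\frac{f'(x_0)^2}{6 f(x_0)}\, c^3\right).
\end{align*}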
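The Hellinger approximation $H^2(P_n, P) \approx f'(x_0)^2 c^3/(12 f(x_0) n)$ can be compared with a direct numerical integration for a concrete density. The sketch below is an added illustration only: the standard exponential density, the point $x_0 = 1$, and all function names are assumptions made here, not part of the notes. It integrates $\tfrac12(\sqrt{f_n} - \sqrt{f})^2$ over the window $[x_0 - cn^{-1/3}, x_0 + cn^{-1/3}]$, where $f_n$ and $f$ differ, using the midpoint rule.

```python
import math


def f(x):
    """Illustrative decreasing density: standard exponential (an assumption)."""
    return math.exp(-x)


def f_n(x, x0, c, n):
    """Local perturbation of f: flat pieces on either side of x0."""
    h = c * n ** (-1.0 / 3.0)
    if x0 - h < x <= x0:
        return f(x0 - h)
    if x0 < x <= x0 + h:
        return f(x0 + h)
    return f(x)


def hellinger_sq(x0, c, n, m=20000):
    """Midpoint-rule value of (1/2) * integral of (sqrt f_n - sqrt f)^2 over
    [x0 - h, x0 + h]; m is even so no midpoint falls on the jump at x0."""
    h = c * n ** (-1.0 / 3.0)
    step = 2 * h / m
    total = 0.0
    for k in range(m):
        x = x0 - h + (k + 0.5) * step
        total += (math.sqrt(f_n(x, x0, c, n)) - math.sqrt(f(x))) ** 2
    return 0.5 * total * step


def predicted(x0, c, n):
    """Leading term f'(x0)^2 c^3 / (12 f(x0) n) for f(x) = exp(-x)."""
    f_prime = -math.exp(-x0)
    return f_prime ** 2 * c ** 3 / (12 * f(x0) * n)


ratio = hellinger_sq(1.0, 1.0, 10 ** 6) / predicted(1.0, 1.0, 10 ** 6)
print(ratio)  # close to 1 if the approximation is accurate at this n
```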

We can choose $c$ to maximize the quantity on the right side. It is easily seen that the maximum is achieved when
\[ c = c_0 \equiv \left(\frac{2 f(x_0)}{f'(x_0)^2}\right)^{1/3}. \]
This yields
\[ \liminf_{n \to \infty} \inf_{T_n} \max\left\{ E_n\, n^{1/3}|T_n - \nu(P_n)|,\; E_0\, n^{1/3}|T_n - \nu(P_0)| \right\}
   \ge \frac{e^{-1/3}}{4}\left(2 |f'(x_0)| f(x_0)\right)^{1/3}. \]
This bound has the appropriate structure in the sense that the (nonparametric) MLE $\hat f_n$ of $f$ converges at rate $n^{1/3}$ and the same constant is involved in its limiting distribution:
\[ n^{1/3}(\hat f_n(x_0) - f(x_0)) \to_d (|f'(x_0)| f(x_0)/2)^{1/3}\,(2Z), \]
with $Z \equiv \mathrm{argmax}_t \{W(t) - t^2\}$, as shown in Section 1 and by Prakasa Rao (1969) and Groeneboom (1985).

2.3. Case 3: $f$ satisfies $f^{(j)}(x_0) = 0$, $j = 1, \ldots, k-1$, $f^{(k)}(x_0) \neq 0$.

Exercise. Not yet carried through!

2.4. Case 4: $f$ constant on $(a, b)$ with $x_0 \in (a, b)$.

It is also interesting to carry through a lower bound calculation in this case. For a fixed density $f$ satisfying (i)-(iii), define a sequence of densities $\{f_n\}$ by
\[ f_n(x) = \begin{cases}
 f(x), & x \le a, \\
 f(x) - c n^{-1/2}, & a < x \le b, \\
 f(x_0) - c n^{-1/2}, & b < x < b_n, \\
 f(x), & x \ge b_n,
\end{cases} \]
where $b_n \equiv \inf\{x : f(x) < f(x_0) - c n^{-1/2}\}$. Note that the interval $(b, b_n)$ may be empty (if $f(b+) < f(b-)$ and $n$ is sufficiently large). Now with $\nu(P) = f(x_0)$ we have $\nu(P_n) = f(x_0) - c n^{-1/2}$, $\nu(P) = f(x_0)$, and
\[ |\nu(P_n) - \nu(P)| = c n^{-1/2}. \]
Furthermore
\begin{align*}
H^2(P_n, P)
 &= \frac12 \int_a^{b_n} \left[\sqrt{f_n(x)} - \sqrt{f(x)}\right]^2 dx \\
 &= \frac12 \int_a^b \left[\sqrt{f(x_0) - c n^{-1/2}} - \sqrt{f(x_0)}\right]^2 dx
  + \frac12 \int_b^{b_n} \left[\sqrt{f(x_0) - c n^{-1/2}} - \sqrt{f(x)}\right]^2 dx \\
 &= \frac{f(x_0)}{2} \int_a^b \left\{\frac{c}{2 f(x_0)}\right\}^2 n^{-1}\,dx + o(n^{-1}) \\
 &= \frac{c^2 p}{8 n f^2(x_0)} + o(n^{-1}).
\end{align*}

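Thus
\[ n H^2(P_n, P) \to \frac{c^2 p}{8 f^2(x_0)}. \]
Plugging this into the general form of the lower bound yields
\begin{align*}
\sqrt{n}\, \inf_{T_n} \max\{E_{n,P_n}|T_n - \nu(P_n)|,\; E_{n,P_0}|T_n - \nu(P_0)|\}
 &\ge \frac{c}{4}\left(1 - \frac{n H^2(P_n, P_0)}{n}\right)^{2n} \\
 &\to \frac{c}{4}\exp\left(-\frac{c^2 p}{4 f^2(x_0)}\right).
\end{align*}
We maximize this bound by choosing $c$ so that $c^2 = 2 f^2(x_0)/p$, and then the lower bound becomes
\[ \frac{\sqrt{2}}{4}\,\frac{f(x_0)}{\sqrt{p}}\,\exp(-1/2). \]
This seems to have the same dependence on population parameters as the result of Carolan and Dykstra (1999).

2.5. Case 5: Suppose that $f(x_0-) > f(x_0+)$.

Lower bound calculation: Define a sequence of densities $\{f_n\}$ as follows:
\[ f_n(x) = \begin{cases}
 f(x), & \text{if } x < x_0, \\
 \bar f(x_0), & \text{if } x = x_0, \\
 \bar f(x_0) + \dfrac{f(x_0 + c n^{-1}) - \bar f(x_0)}{c n^{-1}}\,(x - x_0), & \text{if } x_0 \le x \le x_0 + c n^{-1}, \\
 f(x), & \text{if } x \ge x_0 + c n^{-1},
\end{cases}
\;\equiv\;
\begin{cases}
 f(x), & \text{if } x < x_0, \\
 \bar f(x_0), & \text{if } x = x_0, \\
 a + b(x - x_0), & \text{if } x_0 \le x \le x_0 + c n^{-1}, \\
 f(x), & \text{if } x \ge x_0 + c n^{-1}.
\end{cases} \]
Let $\nu(P) = (f(x_0-) + f(x_0+))/2 \equiv \bar f(x_0)$. Then for the sequence $\{f_n\}$ we have $\nu(P_n) = f(x_0-)$ for every $n$ and hence
\[ \nu(P_n) - \nu(P) = \frac12 (f(x_0-) - f(x_0+)) \equiv d/2. \]
Furthermore
\begin{align*}
H^2(P_n, P)
 &= \frac12 \int \left[\sqrt{f_n} - \sqrt{f}\right]^2 dx
  = \frac12 \int_{x_0}^{x_0 + c n^{-1}} \left[\sqrt{a + b(x - x_0)} - \sqrt{f(x)}\right]^2 dx \\
 &= \frac{1}{12}\, c n^{-1}\, \frac{\left[\sqrt{\bar f(x_0)} - \sqrt{f(x_0+)}\right]^2 \left(3\sqrt{\bar f(x_0)} + \sqrt{f(x_0+)}\right)}{\sqrt{\bar f(x_0)} + \sqrt{f(x_0+)}} + o(n^{-1}) \\
 &\equiv \frac{c r^2}{n} + o(n^{-1}).
\end{align*}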
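The closed form for the constant $r^2$ comes from integrating the squared difference between the square root of a linear interpolant and a constant. A minimal numerical check of that integral, added here as an illustration, is sketched below. It assumes that $f$ can be treated as the constant $f(x_0+)$ on the shrinking interval (which is what the $o(n^{-1})$ term absorbs), rescales the interval to $[0, 1]$, and uses hypothetical argument names `start` for $\bar f(x_0)$ and `end` for $f(x_0+)$.

```python
import math


def r_squared_closed(start, end):
    """(1/12) (sqrt start - sqrt end)^2 (3 sqrt start + sqrt end) / (sqrt start + sqrt end)."""
    rs, re = math.sqrt(start), math.sqrt(end)
    return (rs - re) ** 2 * (3 * rs + re) / (12 * (rs + re))


def r_squared_numeric(start, end, m=100000):
    """Midpoint rule for (1/2) * int_0^1 (sqrt(start + (end - start) v) - sqrt(end))^2 dv,
    i.e. the Hellinger integral after the substitution x = x0 + (c/n) v."""
    total = 0.0
    for k in range(m):
        v = (k + 0.5) / m
        total += (math.sqrt(start + (end - start) * v) - math.sqrt(end)) ** 2
    return 0.5 * total / m


print(r_squared_closed(1.0, 0.6), r_squared_numeric(1.0, 0.6))
```

Multiplying either value by $c/n$ gives the leading term of $H^2(P_n, P)$ under the stated assumption.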

Thus $n H^2(P_n, P) \to c r^2$. Thus plugging into the general lower bound result,
\begin{align*}
\inf_{T_n} \max\{E_{n,P_n}|T_n - \nu(P_n)|,\; E_{n,P_0}|T_n - \nu(P_0)|\}
 &\ge \frac14\,\frac{d}{2}\,(1 - H^2(P_n, P))^{2n} \\
 &= \frac{d}{8}\left(1 - \frac{n H^2(P_n, P_0)}{n}\right)^{2n} \\
 &\to \frac{d}{8}\exp(-2 c r^2).
\end{align*}
This holds for every $c$; choosing $c = 1/(2 r^2)$ yields the lower bound of $d/(8e)$. [Can we compare this to the limiting behavior of the MLE?]

3. Groeneboom's lower bound.

The following proposition is due to Groeneboom (1996); see also Jongbloed (2000).

Proposition 3.1 (Groeneboom, 1996). Let $\mathcal{P}$ be a set of probability measures on a measurable space $(\mathcal{X}, \mathcal{A})$, and let $\nu$ be a real-valued function defined on $\mathcal{P}$. Moreover, let $l : [0, \infty) \to [0, \infty)$ be an increasing convex loss function with $l(0) = 0$. Then, for any $P_1, P_2 \in \mathcal{P}$ such that $H(P_1, P_2) < 1$ and with
\[ E_{n,i} f(X_1, \ldots, X_n) = E_{n,i} f(X) = \int f(x)\,dP_i^n(x) \equiv \int f(x_1, \ldots, x_n)\,dP_i(x_1) \cdots dP_i(x_n), \]
for $i = 1, 2$, it follows that
\[ \inf_{T_n} \max\{E_{n,1}\, l(|T_n - \nu(P_1)|),\; E_{n,2}\, l(|T_n - \nu(P_2)|)\}
   \ge l\left(\frac14\, |\nu(P_1) - \nu(P_2)|\, \{1 - H^2(P_1, P_2)\}^{2n}\right). \tag{11} \]

First we need two lemmas.

Lemma 3.2. Let $P$, $Q$ be two probability measures on a measurable space $(\mathcal{X}, \mathcal{A})$ with densities $p$, $q$ with respect to a $\sigma$-finite dominating measure $\mu$. Then
\[ (1 - H^2(P, Q))^2 \le 1 - \left\{1 - \int (p \wedge q)\,d\mu\right\}^2 \le 2 \int (p \wedge q)\,d\mu. \]

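Proof. The second inequality is trivial. To prove the first inequality, note that, since $1 - H^2(P, Q) = \int \sqrt{pq}\,d\mu$ and $1 - \int (p \wedge q)\,d\mu = \frac12 \int |p - q|\,d\mu$ (by the exercise referenced in these notes),
\begin{align*}
(1 - H^2(P, Q))^2 + \left(1 - \int p \wedge q\,d\mu\right)^2
 &= \left(\int \sqrt{pq}\,d\mu\right)^2 + \left(\frac12 \int |p - q|\,d\mu\right)^2 \\
 &= \left(\int \sqrt{pq}\,d\mu\right)^2 + \frac14 \left(\int |\sqrt{p} - \sqrt{q}|\,|\sqrt{p} + \sqrt{q}|\,d\mu\right)^2 \\
 &\le \left(\int \sqrt{pq}\,d\mu\right)^2 + \frac14 \int (\sqrt{p} - \sqrt{q})^2\,d\mu \int (\sqrt{p} + \sqrt{q})^2\,d\mu \\
 &= \left(\int \sqrt{pq}\,d\mu\right)^2 + \left(1 - \int \sqrt{pq}\,d\mu\right)\left(1 + \int \sqrt{pq}\,d\mu\right) \\
 &= 1.
\end{align*}

Lemma 3.3. If $P$ and $Q$ are two probability measures on a measurable space $(\mathcal{X}, \mathcal{A})$ with densities $p$ and $q$ with respect to a $\sigma$-finite dominating measure $\mu$, and $P^n$ and $Q^n$ denote the corresponding product measures on $(\mathcal{X}^n, \mathcal{A}^n)$ (of $X_1, \ldots, X_n$ i.i.d. as $P$ or $Q$ respectively), then, with $\rho(P, Q) \equiv \int \sqrt{pq}\,d\mu$,
\[ \rho(P^n, Q^n) = \rho(P, Q)^n. \tag{12} \]

Proof. Note that
\begin{align*}
\rho(P^n, Q^n)
 &= \int \cdots \int \sqrt{\prod_{i=1}^n p(x_i)}\,\sqrt{\prod_{i=1}^n q(x_i)}\;d\mu(x_1) \cdots d\mu(x_n) \\
 &= \int \cdots \int \sqrt{p(x_1) q(x_1)} \cdots \sqrt{p(x_n) q(x_n)}\;d\mu(x_1) \cdots d\mu(x_n) \\
 &= \int \sqrt{p(x_1) q(x_1)}\,d\mu(x_1) \cdots \int \sqrt{p(x_n) q(x_n)}\,d\mu(x_n) \\
 &= \rho(P, Q) \cdots \rho(P, Q) = \rho(P, Q)^n.
\end{align*}

Remark 3.4. Note that (12) implies that
\[ H^2(P^n, Q^n) = 1 - \rho(P^n, Q^n) = 1 - \rho(P, Q)^n = 1 - (1 - H^2(P, Q))^n \]
by using the exercise (chapter 2, page 10) twice.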
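For distributions on a finite set both lemmas can be checked by brute force. The sketch below is an added illustration: the two probability vectors are arbitrary choices, and the helper names are hypothetical. It builds the $n$-fold product measures explicitly, confirms the product rule (12) for the affinity $\rho$, and evaluates the chain of inequalities of Lemma 3.2, read here as $(1 - H^2)^2 \le 1 - \{1 - \int p \wedge q\,d\mu\}^2 \le 2\int p \wedge q\,d\mu$ with $H^2 = 1 - \rho$.

```python
import itertools
import math


def affinity(p, q):
    """Hellinger affinity rho(P, Q) = sum of sqrt(p_i * q_i)."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def overlap(p, q):
    """Integral of the pointwise minimum of the two densities."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))


def product_pmf(p, n):
    """Probability vector of the n-fold product measure, indexed by all
    n-tuples of outcomes in the order generated by itertools.product."""
    return [
        math.prod(p[i] for i in idx)
        for idx in itertools.product(range(len(p)), repeat=n)
    ]


def lemma_3_2_holds(p, q, tol=1e-12):
    """Check (1 - H^2)^2 <= 1 - (1 - overlap)^2 <= 2 * overlap."""
    rho = affinity(p, q)  # equals 1 - H^2
    middle = 1 - (1 - overlap(p, q)) ** 2
    return rho ** 2 <= middle + tol and middle <= 2 * overlap(p, q) + tol


p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
n = 4
pn, qn = product_pmf(p, n), product_pmf(q, n)
print(affinity(pn, qn), affinity(p, q) ** n)
print(lemma_3_2_holds(p, q), lemma_3_2_holds(pn, qn))
```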

With these two lemmas in hand we can prove Groeneboom's lower bound lemma.

Proof. By Jensen's inequality
\[ E_{n,i}\, l(|T_n - \nu(P_i)|) \ge l(E_{n,i}|T_n - \nu(P_i)|), \quad i = 1, 2, \]
and hence the left side of (11) is bounded below by
\[ l\left(\inf_{T_n} \max\{E_{n,1}|T_n - \nu(P_1)|,\; E_{n,2}|T_n - \nu(P_2)|\}\right). \]
Thus it suffices to prove the proposition for $l(x) = x$. Let $p_1 \equiv dP_1/d(P_1 + P_2)$, $p_2 = dP_2/d(P_1 + P_2)$, and $\mu = P_1 + P_2$ (or let $p_i$ be the density of $P_i$ with respect to some other convenient dominating measure $\mu$, $i = 1, 2$). Now
\begin{align*}
& \max\{E_{n,1}|T_n - \nu(P_1)|,\; E_{n,2}|T_n - \nu(P_2)|\} \\
&\ge \frac12 \{E_{n,1}|T_n - \nu(P_1)| + E_{n,2}|T_n - \nu(P_2)|\} \\
&= \frac12 \left\{ \int |T_n(x) - \nu(P_1)| \prod_{i=1}^n p_1(x_i)\,d\mu(x_1) \cdots d\mu(x_n)
   + \int |T_n(x) - \nu(P_2)| \prod_{i=1}^n p_2(x_i)\,d\mu(x_1) \cdots d\mu(x_n) \right\} \\
&\ge \frac12 \int \left[ |T_n(x) - \nu(P_1)| + |T_n(x) - \nu(P_2)| \right]
   \left( \prod_{i=1}^n p_1(x_i) \wedge \prod_{i=1}^n p_2(x_i) \right) d\mu(x_1) \cdots d\mu(x_n) \\
&\ge \frac12\, |\nu(P_1) - \nu(P_2)| \int \left( \prod_{i=1}^n p_1(x_i) \wedge \prod_{i=1}^n p_2(x_i) \right) d\mu(x_1) \cdots d\mu(x_n) \\
&\ge \frac14\, |\nu(P_1) - \nu(P_2)|\, \{1 - H^2(P_1^n, P_2^n)\}^2 \quad \text{by Lemma 3.2} \\
&= \frac14\, |\nu(P_1) - \nu(P_2)|\, \{1 - H^2(P_1, P_2)\}^{2n} \quad \text{by Lemma 3.3.}
\end{align*}

4. Details for Case 2 proof.

Proposition 2. (Rate of convergence). Suppose that $f'(t_0) < 0$ and that $f'$ is continuous in a neighborhood of $t_0$. Then, for any fixed $K > 0$ the Grenander estimator $\hat f_n$ satisfies
\[ \sup_{|t| \le K} |n^{1/3}(\hat f_n(t_0 + n^{-1/3} t) - f(t_0))| = O_p(1). \tag{13} \]

First proof: switching + general rate result. This proof will first establish just
\[ n^{1/3}(\hat f_n(t_0) - f(t_0)) = O_p(1). \tag{14} \]

We first use the switching relationship as in the consistency proof to write
\[ P(\hat f_n(t_0) - f(t_0) \le x) = P(V_n(f(t_0) + x) \le t_0) \]
where, by the change of variables $s = t_0 + u$ in the definition of $V_n$, we have
\begin{align*}
V_n(f(t_0) + x) - t_0
 &= \mathrm{argmax}_u \{F_n(t_0 + u) - (f(t_0) + x)(t_0 + u)\} \\
 &= \mathrm{argmax}_u \{F_n(t_0 + u) - F_n(t_0) - (F(t_0 + u) - F(t_0)) \\
 &\qquad\qquad + (F(t_0 + u) - F(t_0) - f(t_0) u) - x u\} \\
 &= \mathrm{argmax}_u\, M_n(u) \equiv \hat u_n(x)
\end{align*}
where
\[ M_n(u) \equiv (\mathbb{P}_n - P)(1_{[0, t_0 + u]} - 1_{[0, t_0]}) + (F(t_0 + u) - F(t_0) - f(t_0) u) - x u. \]
Taking $x = -K n^{-1/3}$ for $K > 0$, we see that a natural choice for $M$ is
\[ M(u) = F(t_0 + u) - F(t_0) - f(t_0) u = \tfrac12 f'(t_0)(1 + o(1))\, u^2 \quad \text{as } u \to 0. \]
Since the bracketing entropy integral of the class of functions $\mathcal{M}_\delta = \{m_u \equiv 1_{[0, t_0 + u]} - 1_{[0, t_0]} : |u| \le \delta\}$ is bounded uniformly in $\delta$, and the envelopes $M_\delta$ satisfy $P M_\delta^2 \lesssim \delta$, it follows that $\phi_n(\delta) = \sqrt{\delta} + \delta K n^{1/6}$ (with the second term coming from the term $xu$ in the definition of $M_n$) works in the rate theorem of van der Vaart and Wellner (1996). We conclude that $n^{1/3}\, \hat u_n(-K n^{-1/3}) = O_p(1)$. Since the process $n^{2/3}(M_n(n^{-1/3} t) - M_n(0))$, $|t| \le M$, converges weakly for every $M > 0$ in the space $\ell^\infty([-M, M])$ to the process $\sqrt{f(t_0)}\, W(t) + f'(t_0) t^2/2 + K t$, where $W$ is a standard two-sided Brownian motion process starting from 0, it follows from the argmax continuous mapping theorem that
\[ n^{1/3}\, \hat u_n(-K n^{-1/3}) \to_d \mathrm{argmax}_t \{\sqrt{f(t_0)}\, W(t) + f'(t_0) t^2/2 + K t\}. \]
Then by the portmanteau theorem followed by the fact that
\[ \mathrm{argmax}_t \{a W(t) - b t^2 - c t\} \stackrel{d}{=} (a/b)^{2/3}\, \mathrm{argmax}_t \{W(t) - t^2\} - c/(2b) \]

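(see Exercise yy), applied with $a = \sqrt{f(t_0)}$ and $b = |f'(t_0)|/2$,
\begin{align*}
& \limsup_{n \to \infty} P(n^{1/3}(\hat f_n(t_0) - f(t_0)) \le -K) \\
&= \limsup_{n \to \infty} P(n^{1/3}\, \hat u_n(-K n^{-1/3}) \le 0) \\
&\le P(\mathrm{argmax}_t \{\sqrt{f(t_0)}\, W(t) + f'(t_0) t^2/2 + K t\} \le 0) \\
&= P\left( \left(\frac{2\sqrt{f(t_0)}}{|f'(t_0)|}\right)^{2/3} \mathrm{argmax}_t (W(t) - t^2) + K/|f'(t_0)| \le 0 \right) \\
&= P\left( \left(\frac{2\sqrt{f(t_0)}}{|f'(t_0)|}\right)^{2/3} \mathrm{argmax}_t (W(t) - t^2) \le -K/|f'(t_0)| \right) \\
&= P\left( (4 f(t_0) |f'(t_0)|)^{1/3}\, \mathrm{argmax}_t (W(t) - t^2) \le -K \right)
\end{align*}
where the right side can be made arbitrarily small by choosing $K$ large. Similarly,
\begin{align*}
& \limsup_{n \to \infty} P(n^{1/3}(\hat f_n(t_0) - f(t_0)) > K) \\
&= \limsup_{n \to \infty} P(n^{1/3}\, \hat u_n(K n^{-1/3}) > 0) \\
&\le \limsup_{n \to \infty} P(n^{1/3}\, \hat u_n(K n^{-1/3}) \ge -K/(2|f'(t_0)|)) \\
&\le P(\mathrm{argmax}_t \{\sqrt{f(t_0)}\, W(t) + f'(t_0) t^2/2 - K t\} \ge -K/(2|f'(t_0)|)) \\
&= P\left( \left(\frac{2\sqrt{f(t_0)}}{|f'(t_0)|}\right)^{2/3} \mathrm{argmax}_t (W(t) - t^2) - K/|f'(t_0)| \ge -K/(2|f'(t_0)|) \right) \\
&= P\left( \left(\frac{2\sqrt{f(t_0)}}{|f'(t_0)|}\right)^{2/3} \mathrm{argmax}_t (W(t) - t^2) \ge K/(2|f'(t_0)|) \right) \\
&= P\left( (4 f(t_0) |f'(t_0)|)^{1/3}\, \mathrm{argmax}_t (W(t) - t^2) \ge K/2 \right)
\end{align*}
where the right side can be made arbitrarily small by choosing $K$ large. Thus (14) is proved.

To prove (13), we note that since $\hat f_n(t) \le \hat f_n(t_0 - M n^{-1/3})$ and $\hat f_n(t) \ge \hat f_n(t_0 + M n^{-1/3})$ for $t \in [t_0 - M n^{-1/3}, t_0 + M n^{-1/3}]$, it suffices to show that for every $\epsilon > 0$ we can choose a $K = K_\epsilon$ so that
\[ \limsup_{n \to \infty} P(n^{1/3}(\hat f_n(t_0 - M n^{-1/3}) - f(t_0)) \ge K) \le \epsilon, \tag{15} \]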

and
\[ \limsup_{n \to \infty} P(n^{1/3}(f(t_0) - \hat f_n(t_0 + M n^{-1/3})) \ge K) \le \epsilon. \tag{16} \]
We will prove (15); the proof of (16) is similar. To see that (15) holds, we proceed as before via switching to write
\begin{align*}
P(n^{1/3}(\hat f_n(t_0 - M n^{-1/3}) - f(t_0)) \ge K)
 &= P(\hat f_n(t_0 - M n^{-1/3}) \ge f(t_0) + K n^{-1/3}) \\
 &= P(V_n(f(t_0) + K n^{-1/3}) \ge t_0 - M n^{-1/3}) \\
 &= P(\hat u_n(K n^{-1/3}) \ge -M n^{-1/3}),
\end{align*}
where $n^{1/3}\, \hat u_n(K n^{-1/3}) = O_p(1)$ and by the argmax theorem
\begin{align*}
n^{1/3}\, \hat u_n(K n^{-1/3})
 &\to_d \mathrm{argmax}_t \{\sqrt{f(t_0)}\, W(t) + f'(t_0) t^2/2 - K t\} \\
 &\stackrel{d}{=} \left(\frac{2\sqrt{f(t_0)}}{|f'(t_0)|}\right)^{2/3} \mathrm{argmax}_t \{W(t) - t^2\} - K/|f'(t_0)|.
\end{align*}
Thus by another application of the portmanteau theorem
\begin{align*}
& \limsup_{n \to \infty} P(n^{1/3}(\hat f_n(t_0 - M n^{-1/3}) - f(t_0)) \ge K) \\
&= \limsup_{n \to \infty} P(n^{1/3}\, \hat u_n(K n^{-1/3}) \ge -M) \\
&\le P\left( \left(\frac{2\sqrt{f(t_0)}}{|f'(t_0)|}\right)^{2/3} \mathrm{argmax}_t \{W(t) - t^2\} - K/|f'(t_0)| \ge -M \right) \\
&= P\left( \left(\frac{2\sqrt{f(t_0)}}{|f'(t_0)|}\right)^{2/3} \mathrm{argmax}_t \{W(t) - t^2\} \ge K/|f'(t_0)| - M \right)
\end{align*}
where the right side can be made arbitrarily small by choosing $K$ sufficiently large.

Second proof: mid-point property. Let $0 < t_1 < \ldots < t_m$ be the points of change of slope of $\hat F_n$, the least concave majorant of $F_n$: thus $\hat F_n$ satisfies $\hat F_n(x) \ge F_n(x)$ for all $x \ge 0$, and $\hat F_n(x) = F_n(x)$ if $\hat f_n(x-) > \hat f_n(x+)$. Then $\hat F_n$ has the following "mid-point property":
\[ \hat F_n(\bar t_k) = \frac12 (\hat F_n(t_k) + \hat F_n(t_{k-1})) = \frac12 (F_n(t_k) + F_n(t_{k-1})) \]

where $\bar t_k \equiv (t_k + t_{k-1})/2$. Since $\hat F_n(\bar t_k) \ge F_n(\bar t_k)$, this yields, with $\tau_n^+ \equiv \min\{t_k : t_k > t_0\}$, $\tau_n^- \equiv \max\{t_k : t_k < t_0\}$, and $\bar\tau_n \equiv (\tau_n^+ + \tau_n^-)/2$,
\[ \frac12 (F_n(\tau_n^+) + F_n(\tau_n^-)) \ge F_n(\bar\tau_n), \]
or, equivalently,
\[ F_n(\tau_n^+) - F_n(\bar\tau_n) - (F_n(\bar\tau_n) - F_n(\tau_n^-)) \ge 0. \]
By an empirical process argument (see e.g. Kim and Pollard (1990), Lemma 4.1, page 201), for each fixed $\epsilon > 0$ we can replace $F_n$ by $F$ at the price of a term of order $\epsilon(\tau_n^+ - \tau_n^-)^2 + O_p(n^{-2/3})$. Thus we find that
\[ F(\tau_n^+) - F(\bar\tau_n) - (F(\bar\tau_n) - F(\tau_n^-)) + \epsilon(\tau_n^+ - \tau_n^-)^2 + O_p(n^{-2/3}) \ge 0. \]
Rewriting this in terms of the indicators of the intervals involved we find that
\[ \int_{\tau_n^-}^{\tau_n^+} \left\{ 1_{[\bar\tau_n, \tau_n^+]}(s) - 1_{[\tau_n^-, \bar\tau_n)}(s) \right\} f_0(s)\,ds + \epsilon(\tau_n^+ - \tau_n^-)^2 + O_p(n^{-2/3}) \ge 0. \]
But, upon expanding $f_0$ about $\bar\tau_n$, and using $f_0'(\bar\tau_n) \to_p f_0'(t_0) < 0$, we deduce that
\[ \frac14 f_0'(\bar\tau_n)(\tau_n^+ - \tau_n^-)^2 + O_p(n^{-2/3}) \ge 0, \]
and this implies that $\tau_n^+ - \tau_n^- = O_p(n^{-1/3})$.

5. Acknowledgements.

Thanks to Hanna Jankowski for reading through earlier versions of these notes.

REFERENCES

Anevski, D. and Hössjer, O. (2002). Monotone regression and density function estimation at a point of discontinuity. J. Nonparametr. Stat.
Bass, R. F. (1984). Markov processes and convex minorants. In Seminar on probability, XVIII, Lecture Notes in Math. Springer, Berlin.
Carolan, C. and Dykstra, R. (1999). Asymptotic behavior of the Grenander estimator at density flat regions. Canad. J. Statist.
Carolan, C. and Dykstra, R. (2001). Marginal densities of the least concave majorant of Brownian motion. Ann. Statist.
Çinlar, E. (1992). Sunset over Brownistan. Stochastic Process. Appl.
Groeneboom, P. (1983). The concave majorant of Brownian motion. Ann. Probab.

Groeneboom, P. (1985). Estimating a monotone density. In Proceedings of the Berkeley conference in honor of Jerzy Neyman and Jack Kiefer, Vol. II. Wadsworth Statist./Probab. Ser., Wadsworth, Belmont, CA.
Groeneboom, P. (1996). Lectures on inverse problems. In Lectures on probability theory and statistics (Saint-Flour, 1994), Lecture Notes in Math. Springer, Berlin.
Groeneboom, P. and Pyke, R. (1983). Asymptotic normality of statistics based on the convex minorants of empirical distribution functions. Ann. Probab.
Jongbloed, G. (2000). Minimax lower bounds and moduli of continuity. Statist. Probab. Lett.
Leurgans, S. (1982). Asymptotic distributions of slope-of-greatest-convex-minorant estimators. Ann. Statist.
Pitman, J. W. (1983). Remarks on the convex minorant of Brownian motion. In Seminar on stochastic processes, 1982 (Evanston, Ill., 1982), vol. 5 of Progr. Probab. Statist. Birkhäuser Boston, Boston, MA.
Prakasa Rao, B. L. S. (1969). Estimation of a unimodal density. Sankhyā Ser. A.
Wright, F. T. (1981). The asymptotic behavior of monotone regression estimates. Ann. Statist.

Jon A. Wellner
University of Washington
Department of Statistics
Seattle, WA
E-mail: jaw@stat.washington.edu
