A Law of the Iterated Logarithm for Grenander's Estimator
Jon A. Wellner, University of Washington, Seattle
Probability Seminar, October 24, 2016
Based on joint work with: Lutz Dümbgen and Malcolm Wolff
Outline
1. Introduction: classical LILs.
2. Grenander's estimator: the MLE of a decreasing density.
3. Chernoff's distribution and tail behavior.
4. A LIL for Grenander's estimator.
5. Summary; further questions and open problems.
1. Introduction: Classical LILs

Setting: $X_1, X_2, \ldots, X_n, \ldots$ i.i.d. random variables with $\mu = E(X_1) = 0$, $\mathrm{Var}(X_1) = \sigma^2$, and $S_n \equiv \sum_{i=1}^n X_i = n \bar{X}_n$. Then, with $Z \sim N(0,1)$,
$$ \frac{S_n}{\sqrt{n}} = \sqrt{n}(\bar{X}_n - \mu) \to_d \sigma Z. $$

Theorem (Hartman and Wintner (1941)).
$$ \limsup_{n\to\infty} \frac{S_n}{\sqrt{2n\log\log n}} = +\sigma \quad \text{a.s.}, \qquad \liminf_{n\to\infty} \frac{S_n}{\sqrt{2n\log\log n}} = -\sigma \quad \text{a.s.} $$

Moreover, Strassen (1964) showed that, almost surely,
$$ \frac{S_n}{\sqrt{2n\log\log n}} \ \text{clusters on } [-\sigma, \sigma] \quad \ldots \text{ and } \ldots $$
with
$$ S_n(t) \equiv \frac{S_k}{\sqrt{2n\log\log n}} \ \text{at } t = k/n, \quad \text{linearly interpolated on } k/n \le t \le (k+1)/n, \ k = 0, 1, \ldots, n-1, $$
we have $S_n \rightsquigarrow \sigma K$ a.s. (the sequence is a.s. relatively compact in $C[0,1]$ with limit set $\sigma K$), where
$$ K \equiv \Big\{ f \in C[0,1] : f(t) = \int_0^t \dot{f}(s)\,ds \ \text{for some } \dot{f} \ \text{with } \int_0^1 \dot{f}^2(s)\,ds \le 1 \Big\}. $$

Corollary.
$$ \limsup_{n\to\infty} \frac{S_n}{\sqrt{2n\log\log n}} = \limsup_{n\to\infty} S_n(1) = \sigma \sup_{f \in K} f(1) = \sigma \cdot 1, $$
where the supremum in the last line is achieved at $f(t) = t$, $0 \le t \le 1$.
Connection with exponential bounds: Suppose that $X_i \sim N(0,1)$ for $1 \le i \le n$. Then $n^{-1/2} S_n \sim N(0,1)$, $\sigma = 1$, and
$$ P\big(n^{-1/2} S_n > x\big) = 1 - \Phi(x) \le x^{-1}\varphi(x) = \frac{1}{\sqrt{2\pi}}\,\frac{1}{x}\, e^{-x^2/2}. $$
Thus with $b_n \equiv (2\log\log n)^{1/2}$ and $\delta > 0$,
$$ P\big(n^{-1/2} S_n > (1+\delta)(2\log\log n)^{1/2}\big) \le \frac{1}{\sqrt{2\pi}}\,\frac{1}{(1+\delta)b_n}\,\exp\big(-(1+\delta)^2 \log\log n\big) \le \frac{C}{b_n}\,(\log n)^{-(1+\delta)^2} \le \frac{C}{b_{n_k}}\,(k\log\alpha)^{-(1+\delta)^2}, \quad n_k \equiv \alpha^k, \ \alpha > 1. $$
Therefore, by the Borel-Cantelli lemma,
$$ P\big(n_k^{-1/2} S_{n_k} > (1+\delta)(2\log\log n_k)^{1/2} \ \text{i.o.}\big) = 0. $$
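The Mills-ratio bound $1 - \Phi(x) \le x^{-1}\varphi(x)$ used above is easy to check numerically. A small pure-Python sketch (my own code, not from the talk), using `math.erfc` for the normal tail:

```python
import math

def normal_tail(x):
    """1 - Phi(x) for a standard normal, via the complementary
    error function: 1 - Phi(x) = erfc(x / sqrt(2)) / 2."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def mills_bound(x):
    """Upper bound x^{-1} * phi(x), valid for x > 0."""
    return math.exp(-x * x / 2.0) / (x * math.sqrt(2.0 * math.pi))

# The bound holds for every x > 0, and is tight for large x:
# the ratio normal_tail(x) / mills_bound(x) tends to 1.
for x in [0.5, 1.0, 2.0, 4.0, 8.0]:
    assert normal_tail(x) <= mills_bound(x)
```

The tightness for large $x$ is exactly what makes the bound sharp enough to identify the LIL constant via Borel-Cantelli.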
2. Grenander's Estimator

$X_1, X_2, \ldots, X_n$ are i.i.d. with density $f$ on $\mathbb{R}^+ \equiv [0, \infty)$. Grenander's estimator $\hat{f}_n$ of $f$ is the Maximum Likelihood Estimator of $f$ over the class of decreasing densities on $\mathbb{R}^+$: if
$$ \mathbb{L}_n(g) \equiv n^{-1}\sum_{i=1}^n \log g(X_i) = \mathbb{P}_n(\log g), $$
then $\hat{f}_n$ satisfies
$$ \mathbb{L}_n(\hat{f}_n) = \max\{\mathbb{L}_n(g) : g \ \text{a monotone decreasing density on } \mathbb{R}^+\}. $$
Grenander (1956) showed that the maximum likelihood estimator $\hat{f}_n$ of $f$ is the left derivative of the least concave majorant of the empirical distribution function $\mathbb{F}_n$ of $X_1, \ldots, X_n$ i.i.d. $F$.
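Grenander's characterization translates directly into a short algorithm: sort the data, form the empirical CDF, take the upper convex hull of its graph (the least concave majorant), and read off the left slopes. A minimal numpy sketch (the function name and interface are mine, not from the talk):

```python
import numpy as np

def grenander(x):
    """Grenander estimator: left derivative of the least concave
    majorant (LCM) of the empirical CDF of a sample x from a
    decreasing density on [0, infinity).

    Returns (knots, slopes): the estimate equals slopes[j] on
    (knots[j], knots[j+1]] and 0 beyond the last knot.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    s = np.concatenate(([0.0], x))   # CDF jump locations, starting at 0
    F = np.arange(n + 1) / n         # empirical CDF values at those points
    # Upper-hull sweep: keep points where the chord slopes stay
    # decreasing; a kept middle point lying on/below the new chord
    # violates concavity and is popped.
    hull = [0]
    for j in range(1, n + 1):
        while len(hull) >= 2:
            i1, i2 = hull[-2], hull[-1]
            if (F[i2] - F[i1]) * (s[j] - s[i2]) <= (F[j] - F[i2]) * (s[i2] - s[i1]):
                hull.pop()
            else:
                break
        hull.append(j)
    hull = np.array(hull)
    knots = s[hull]
    slopes = np.diff(F[hull]) / np.diff(knots)
    return knots, slopes
```

On simulated Exp(1) data the returned slopes are non-increasing and the resulting step function integrates to one, as a decreasing density estimate must.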
[Figures: the least concave majorant of the empirical distribution function, and the resulting Grenander estimator plotted against the Exp(1) density, for samples of size n = 10 and n = 40.]
How to study $\hat{f}_n$? Groeneboom's switching relation!

Switching for $\hat{f}_n$: define, for $a > 0$,
$$ \hat{s}_n(a) \equiv \mathrm{argmax}_{s \ge 0}\{\mathbb{F}_n(s) - as\} = \sup\Big\{s \ge 0 : \mathbb{F}_n(s) - as = \sup_{z \ge 0}\big(\mathbb{F}_n(z) - az\big)\Big\}. $$
Then for each fixed $t \in (0, \infty)$ and $a > 0$,
$$ \{\hat{f}_n(t) < a\} = \{\hat{s}_n(a) < t\}. $$
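The switching relation holds pathwise, so it can be verified on simulated data. The sketch below (my own code, not from the talk) computes both sides from the same sample: $\hat f_n$ as the left derivative of the least concave majorant of $\mathbb{F}_n$, and $\hat s_n(a)$ as the largest maximizer of $\mathbb{F}_n(s) - as$, which is attained at a jump point of $\mathbb{F}_n$:

```python
import numpy as np

rng = np.random.RandomState(1)
x = np.sort(rng.exponential(size=50))  # sample from a decreasing density
n = len(x)
s = np.concatenate(([0.0], x))         # empirical CDF jump locations
F = np.arange(n + 1) / n               # empirical CDF values at those points

# Least concave majorant of the empirical CDF via an upper-hull sweep.
hull = [0]
for j in range(1, n + 1):
    while len(hull) >= 2 and \
          (F[hull[-1]] - F[hull[-2]]) * (s[j] - s[hull[-1]]) <= \
          (F[j] - F[hull[-1]]) * (s[hull[-1]] - s[hull[-2]]):
        hull.pop()
    hull.append(j)
knots = s[hull]
slopes = np.diff(F[hull]) / np.diff(knots)  # Grenander estimate between knots

def f_hat(t):
    """Left derivative of the LCM at t > 0 (0 beyond the last knot)."""
    if t > knots[-1]:
        return 0.0
    i = np.searchsorted(knots, t, side='left')  # knots[i-1] < t <= knots[i]
    return slopes[max(i - 1, 0)]

def s_hat(a):
    """Largest maximizer of F_n(s) - a*s, attained at a jump point."""
    vals = F - a * s
    return s[np.flatnonzero(vals == vals.max())[-1]]

# Check {f_hat(t) < a} = {s_hat(a) < t} on a random grid of (t, a).
for _ in range(500):
    t, a = rng.uniform(0.01, 5.0), rng.uniform(0.01, 3.0)
    assert (f_hat(t) < a) == (s_hat(a) < t)
```

The check uses random levels $a$, so ties between $a$ and an exact LCM slope (where the conventions in the definitions matter) occur with probability zero.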
[Figure: illustration of the switching relation, showing $\hat{s}_n(a)$ and $t$.]
Theorem (Prakasa Rao (1969), Groeneboom (1985)). If $f(x) > 0$, $f'(x) < 0$, and $f'$ is continuous at $x$, then with $c \equiv (4f(x)/(f'(x))^2)^{1/3}$ and $C \equiv (2^{-1} f(x)|f'(x)|)^{1/3}$,
$$ S_n(x, t) \equiv n^{1/3}\big(\hat{f}_n(x + n^{-1/3} c\, t) - f(x)\big) \to_d C\, S(t), $$
where $W$ is a standard two-sided Brownian motion starting at $0$, $\mathcal{C}$ is the least concave majorant of $W(t) - t^2$, and $S$ is the left derivative of $\mathcal{C}$.
In particular, with $t = 0$,
$$ n^{1/3}\big(\hat{f}_n(x) - f(x)\big) \to_d C\, S(0) \stackrel{d}{=} C \cdot 2Z, $$
where the distribution of $Z \equiv \mathrm{argmax}_h\{W(h) - h^2\}$ is Chernoff's distribution.

Proof: By the switching relation $\{\hat{f}_n(t) < a\} = \{\hat{s}_n(a) < t\}$:
$$ P\big(n^{1/3}(\hat{f}_n(x_0 + n^{-1/3}t) - f(x_0)) < y\big) = P\big(\hat{f}_n(x_0 + n^{-1/3}t) < f(x_0) + y n^{-1/3}\big) $$
$$ = P\big(\hat{s}_n(f(x_0) + y n^{-1/3}) < x_0 + n^{-1/3} t\big) = P\big(\mathrm{argmax}_v\{\mathbb{F}_n(v) - (f(x_0) + n^{-1/3} y)v\} < x_0 + n^{-1/3} t\big). $$
Now we change variables $v = x_0 + n^{-1/3} h$ in the argument of $\mathbb{F}_n$, center and scale, to find that the right side in the last display equals
$$ P\big(\mathrm{argmax}_h\{\mathbb{F}_n(x_0 + n^{-1/3}h) - (f(x_0) + n^{-1/3}y)(x_0 + n^{-1/3}h)\} < t\big) $$
$$ = P\big(\mathrm{argmax}_h\{\mathbb{F}_n(x_0 + n^{-1/3}h) - \mathbb{F}_n(x_0) - (F(x_0 + n^{-1/3}h) - F(x_0)) \qquad (1) $$
$$ \qquad\qquad + F(x_0 + n^{-1/3}h) - F(x_0) - f(x_0)n^{-1/3}h - n^{-2/3}yh\} < t\big). $$
Let $\mathbb{G}_n(t) \equiv n^{-1}\sum_{i=1}^n 1\{\xi_i \le t\}$ and $\mathbb{U}_n(t) \equiv \sqrt{n}(\mathbb{G}_n(t) - t)$ with $\xi_i$ i.i.d. Uniform$[0,1]$, so that $\mathbb{F}_n \stackrel{d}{=} \mathbb{G}_n(F)$. Then the stochastic term in (1) satisfies, with $W$ a two-sided standard Brownian motion,
$$ n^{2/3}\big\{\mathbb{F}_n(x_0 + n^{-1/3}h) - \mathbb{F}_n(x_0) - (F(x_0 + n^{-1/3}h) - F(x_0))\big\} $$
$$ \stackrel{d}{=} n^{2/3 - 1/2}\big\{\mathbb{U}_n(F(x_0 + n^{-1/3}h)) - \mathbb{U}_n(F(x_0))\big\} $$
$$ = n^{1/6}\big\{\mathbb{U}(F(x_0 + n^{-1/3}h)) - \mathbb{U}(F(x_0))\big\} + o_p(1) \quad \text{by KMT} $$
$$ \stackrel{d}{=} n^{1/6}\, W\big(f(x_0) n^{-1/3} h\big) + o_p(1) \stackrel{d}{=} \sqrt{f(x_0)}\, W(h) + o_p(1), $$
where $W$ is a standard two-sided Brownian motion process starting from $0$. On the other hand, with $\delta_n \equiv n^{-1/3}$,
$$ n^{2/3}\big(F(x_0 + n^{-1/3}h) - F(x_0) - f(x_0) n^{-1/3} h\big) = \delta_n^{-2}\big(F(x_0 + \delta_n h) - F(x_0) - f(x_0)\delta_n h\big) \to -b h^2 $$
with $b = |f'(x_0)|/2$ by our hypotheses, while $n^{2/3}\, n^{-1/3}\, n^{-1/3} h = n^0 h = h$.
Thus it follows that the last probability above converges to
$$ P\big(\mathrm{argmax}_h\{a W(h) - b h^2 - y h\} < t\big) = P\big(S_{a,b}(t) < y\big) \quad \text{by switching again} $$
$$ = P\big((a/b)^{2/3}\, \mathrm{argmax}_h\{W(h) - h^2\} - (2b)^{-1} y < t\big), \quad \text{by (2) below}, $$
where $S_{a,b}(t)$ is the slope at $t$ of the least concave majorant of
$$ a W(h) - b h^2 = \sqrt{f(x_0)}\, W(h) - |f'(x_0)|\, h^2/2 \stackrel{d}{=} \big(2^{-1} f(x_0)\, |f'(x_0)|\big)^{1/3}\, S(t/c_0), \quad c_0 = (a/b)^{2/3}, $$
and where we used
$$ \mathrm{argmax}_h\{a W(h) - b h^2 - y h\} \stackrel{d}{=} \Big(\frac{a}{b}\Big)^{2/3} \mathrm{argmax}_h\{W(h) - h^2\} - \frac{y}{2b}. \qquad (2) $$
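The scaling identity (2) follows from the shift and scaling properties of two-sided Brownian motion; a sketch of the two steps (my own write-up of the stated identity):

```latex
% Step 1 (shift): with h = u - y/(2b), completing the square gives
%   aW(h) - bh^2 - yh = a\{W(u - \tfrac{y}{2b}) - W(-\tfrac{y}{2b})\} - bu^2 + \text{const},
% and u \mapsto W(u - c) - W(-c) is again a standard two-sided BM, so
\operatorname*{argmax}_h \{aW(h) - bh^2 - yh\}
  \stackrel{d}{=} \operatorname*{argmax}_u \{aW(u) - bu^2\} - \frac{y}{2b}.
% Step 2 (scaling): put u = sv with s = (a/b)^{2/3}; then
% aW(sv) \stackrel{d}{=} a\sqrt{s}\,W(v), and bs^2 = a\sqrt{s} since s^{3/2} = a/b, so
\operatorname*{argmax}_u \{aW(u) - bu^2\}
  = s \operatorname*{argmax}_v \{a\sqrt{s}\,(W(v) - v^2)\}
  \stackrel{d}{=} (a/b)^{2/3} \operatorname*{argmax}_v \{W(v) - v^2\}.
```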
3. Chernoff's Distribution

First appearance of $Z$: Chernoff (1964), "Estimation of the mode": $X_1, \ldots, X_n$ i.i.d. with density $f$ and distribution function $F$. Fix $a > 0$; let $\hat{x}_a$ be the center of the interval of length $2a$ containing the most observations, and $x_a$ the center of the interval of length $2a$ maximizing $F(x + a) - F(x - a)$. Chernoff showed:
$$ n^{1/3}(\hat{x}_a - x_a) \to_d \Big(\frac{8 f(x_a + a)}{c^2}\Big)^{1/3} Z, \quad \text{where } c \equiv f'(x_a - a) - f'(x_a + a), $$
and
$$ f_Z(z) = \tfrac{1}{2}\, g(z)\, g(-z), \quad \text{where } g(t) \equiv \lim_{x \searrow t^2} u_x(t, x), $$
where $u(t, x) \equiv P_{(t,x)}\big(W(z) > z^2 \ \text{for some } z \ge t\big)$ is a solution to the backward heat equation
$$ \frac{\partial}{\partial t} u(t, x) = -\frac{1}{2}\,\frac{\partial^2}{\partial x^2} u(t, x) $$
under the boundary conditions
$$ u(t, t^2) = \lim_{x \searrow t^2} u(t, x) = 1, \qquad \lim_{x \to -\infty} u(t, x) = 0. $$
Groeneboom (1989) showed that $g$ has Fourier transform given by
$$ \hat{g}(\lambda) = \int e^{i\lambda s} g(s)\,ds = \frac{2^{1/3}}{\mathrm{Ai}\big(i\, 2^{-1/3} \lambda\big)}; \qquad (3) $$
Groeneboom (1989) also showed that
$$ f_Z(z) \sim \frac{4^{4/3}\, z}{2\,\mathrm{Ai}'(a_1)}\, \exp\Big(-\frac{2}{3}z^3 + 2^{1/3} a_1 z\Big), \qquad P(Z > z) \sim \frac{4^{4/3}}{4\,\mathrm{Ai}'(a_1)\, z}\, \exp\Big(-\frac{2}{3}z^3 + 2^{1/3} a_1 z\Big) $$
as $z \to \infty$, where $a_1 \approx -2.3381$ is the largest zero of the Airy function $\mathrm{Ai}$ on the negative real axis and $\mathrm{Ai}'(a_1) \approx 0.7012$.
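The Airy constants above are easy to check numerically. A pure-Python sketch (my own code, not from the talk) evaluates $\mathrm{Ai}$ via its Maclaurin series, which converges quickly for moderate arguments, and locates the largest negative zero $a_1$ by bisection:

```python
import math

def airy_ai(x, terms=40):
    """Airy function Ai(x) via its Maclaurin series
    Ai(x) = c1*f(x) - c2*g(x); accurate for moderate |x|."""
    c1 = 3.0 ** (-2.0 / 3.0) / math.gamma(2.0 / 3.0)   # Ai(0)
    c2 = 3.0 ** (-1.0 / 3.0) / math.gamma(1.0 / 3.0)   # -Ai'(0)
    f_term, g_term = 1.0, x
    f_sum, g_sum = f_term, g_term
    for k in range(terms):
        f_term *= x ** 3 / ((3 * k + 2) * (3 * k + 3))
        g_term *= x ** 3 / ((3 * k + 3) * (3 * k + 4))
        f_sum += f_term
        g_sum += g_term
    return c1 * f_sum - c2 * g_sum

def largest_airy_zero():
    """Largest (least negative) zero a_1 of Ai, by bisection on
    [-2.5, -2.2], where Ai changes sign exactly once."""
    lo, hi = -2.5, -2.2
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if airy_ai(mid) > 0:   # Ai > 0 to the right of a_1
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

a1 = largest_airy_zero()   # about -2.33811
```

A central finite difference of `airy_ai` at `a1` then recovers the value $\mathrm{Ai}'(a_1) \approx 0.701$ appearing in the tail constants.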
[Figure: Chernoff's density $f_Z$.]
4. A LIL for Grenander's Estimator

The tail behavior of $Z$ given in the last display leads naturally to the following conjecture concerning a LIL for the Grenander estimator:
$$ \limsup_{n\to\infty} \frac{n^{1/3}\big(\hat{f}_n(x_0) - f(x_0)\big)}{((3/2)\log\log n)^{1/3}} \stackrel{a.s.}{=} \Big|\tfrac{1}{2} f(x_0) f'(x_0)\Big|^{1/3} \cdot 2; $$
note that if $\delta > 0$ and $n_k \equiv \alpha^k$ with $\alpha > 1$, then
$$ \exp\Big(-\tfrac{2}{3}\big[(1+\delta)((3/2)\log\log n_k)^{1/3}\big]^3\Big) = \exp\big(-(1+\delta)^3 \log\log n_k\big) = (\log n_k)^{-(1+\delta)^3} = (k\log\alpha)^{-(1+\delta)^3}. $$
Equivalently,
$$ \limsup_{n\to\infty} \frac{n^{1/3}\big(\hat{f}_n(x_0) - f(x_0)\big)}{(2\log\log n)^{1/3}} \stackrel{a.s.}{=} \Big|\tfrac{1}{2} f(x_0) f'(x_0)\Big|^{1/3} \cdot 2 \cdot \Big(\frac{3}{4}\Big)^{1/3}. $$
How to prove this conjecture?

Step 1: Switching!
Step 2: Localize; use a functional LIL for the local empirical process: Deheuvels and Mason (1994); Mason (2004).
Step 3: Find the set of limit points via Steps 1 and 2; study these via properties of the natural Strassen limit sets, analogous to the distributional equivalents for Brownian motion.
Step 4: Solve the resulting variational problem over the set of limit points expressed in terms of the Strassen limit sets.
Step 1: Switching. Let $r_n \equiv (n^{-1}\, 2\log\log n)^{1/3}$. We want to find a number $y_0$ such that
$$ P\big(r_n^{-1}(\hat{f}_n(x_0) - f(x_0)) > y \ \text{i.o.}\big) = \begin{cases} 0, & \text{if } y > y_0, \\ 1, & \text{if } y < y_0. \end{cases} $$
Now $\{\hat{f}_n(x_0) > a\} = \{\hat{s}_n(a) > x_0\}$, so
$$ \{\hat{f}_n(x_0) > f(x_0) + r_n y \ \text{i.o.}\} = \{\hat{s}_n(f(x_0) + r_n y) > x_0 \ \text{i.o.}\}. \qquad (4) $$
By letting $s = x_0 + r_n h$ in the definition of $\hat{s}_n$ we see that
$$ \hat{s}_n(f(x_0) + r_n y) - x_0 = r_n\, \mathrm{argmax}_h\{\mathbb{F}_n(x_0 + r_n h) - (f(x_0) + r_n y)(x_0 + r_n h)\}, $$
and hence the right side of (4) can be rewritten as $\{\hat{h}_n > 0 \ \text{i.o.}\}$, where
$$ \hat{h}_n = \mathrm{argmax}_h\{\mathbb{F}_n(x_0 + r_n h) - (f(x_0) + r_n y)(x_0 + r_n h)\} $$
$$ = \mathrm{argmax}_h\big\{ r_n^{-2}\big[\mathbb{F}_n(x_0 + r_n h) - \mathbb{F}_n(x_0) - (F(x_0 + r_n h) - F(x_0))\big] + r_n^{-2}\big[F(x_0 + r_n h) - F(x_0) - f(x_0) r_n h\big] - yh \big\}. \qquad (5) $$
The deterministic (drift) term on the right side converges to $f'(x_0) h^2/2$ (negative, since $f'(x_0) < 0$) as $n \to \infty$.

Step 2: LIL for the local empirical process. By a theorem of Mason (1988), the sequence of functions
$$ \big\{ h \mapsto r_n^{-2}\big[\mathbb{F}_n(x_0 + r_n h) - \mathbb{F}_n(x_0) - (F(x_0 + r_n h) - F(x_0))\big] \big\} $$
is almost surely relatively compact with limit set $\{g(f(x_0)\,\cdot\,) : g \in \mathcal{G}\}$, where the two-sided Strassen limit set $\mathcal{G}$ is given by
$$ \mathcal{G} = \Big\{ g : \mathbb{R} \to \mathbb{R} \ \Big|\ g(t) = \int_0^t \dot{g}(s)\,ds, \ t \in \mathbb{R}, \ \int_{-\infty}^{\infty} \dot{g}^2(s)\,ds \le 1 \Big\}. $$
Proof: Reduce to the local empirical process of $\xi_1, \ldots, \xi_n$ i.i.d. Uniform$(0,1)$. As in Mason (1988), take $k_n \equiv n r_n = n^{2/3}(2\log\log n)^{1/3}$, so that $n^{-1} k_n = r_n \to 0$.
Thus the processes involved in the argmax in (5) are almost surely relatively compact with limit set
$$ \{ g(f(x_0)h) + 2^{-1} f'(x_0) h^2 - yh : g \in \mathcal{G} \}. \qquad (6) $$

Step 3: Strassen set equivalences.

Lemma 1. Let $c > 0$ and $d \in \mathbb{R}$. Then
$$ \{ t \mapsto g(ct + d) - g(d) : g \in \mathcal{G} \} = \sqrt{c}\, \mathcal{G}. $$
[This is analogous to $W(ct + d) - W(d) \stackrel{d}{=} \sqrt{c}\, W(t)$.]
Lemma 2. Let $\alpha, \beta$ be positive constants and $\gamma \in \mathbb{R}$. Then
$$ \big\{ \mathrm{argmax}_h\{\alpha g(h) - \beta h^2 - \gamma h\} : g \in \mathcal{G} \big\} = \big\{ (\alpha/\beta)^{2/3}\, \mathrm{argmax}_h\{g(h) - h^2\} - \gamma/(2\beta) : g \in \mathcal{G} \big\}. $$
[This is analogous to $\mathrm{argmax}_h\{\alpha W(h) - \beta h^2 - \gamma h\} \stackrel{d}{=} (\alpha/\beta)^{2/3}\, \mathrm{argmax}_h\{W(h) - h^2\} - \gamma/(2\beta)$.]

By Lemmas 1 and 2, with $a = \sqrt{f(x_0)}$ and $b = |f'(x_0)|/2$,
$$ \{ g(f(x_0)h) + 2^{-1} f'(x_0) h^2 - yh : g \in \mathcal{G} \} = \{ a g(h) - b h^2 - yh : g \in \mathcal{G} \}, $$
and hence the corresponding argmax set is
$$ \{ (a/b)^{2/3}\, \mathrm{argmax}_h\{g(h) - h^2\} - y/(2b) : g \in \mathcal{G} \}. $$
Hence with $T_g \equiv \mathrm{argmax}_h\{g(h) - h^2\}$,
$$ \{\hat{h}_n > 0 \ \text{i.o.}\} \stackrel{a.s.}{=} \Big\{ \Big(\frac{a}{b}\Big)^{2/3} \sup_{g\in\mathcal{G}} T_g > \frac{y}{2b} \Big\} = \Big\{ 2b\Big(\frac{a}{b}\Big)^{2/3} \sup_{g\in\mathcal{G}} T_g > y \Big\}, $$
so that
$$ y_0 = 2b\,(a/b)^{2/3} \sup_{g\in\mathcal{G}} T_g = \big|2^{-1} f(x_0) f'(x_0)\big|^{1/3} \cdot 2\, \sup_{g\in\mathcal{G}} T_g. $$
It remains only to show that
$$ \sup_{g\in\mathcal{G}} T_g = (3/4)^{1/3} \approx 0.90856. $$
This follows from Lemma 3:
Lemma 3. Let $t_0$ be an arbitrary positive number and let $\dot{g} \in L^1([0, t_0])$ be an arbitrary function satisfying
$$ \int_t^{t_0} \dot{g}(s)\,ds \ge t_0^2 - t^2 \quad \text{for } 0 \le t \le t_0. $$
Then, with $\dot{g}_0(u) \equiv 2u$, $0 \le u \le t_0$,
$$ \int_0^{t_0} \dot{g}(u)^2\,du \ge \int_0^{t_0} \dot{g}_0(u)^2\,du = \int_0^{t_0} (2u)^2\,du = \frac{4 t_0^3}{3}. $$
[Figure: the extremal function $g_0(t)$ and $g_0(t) - t^2$.]
Intuition: A few pictures and trial functions $\dot{g}$ quickly lead to
$$ \dot{g}_0(s) = \begin{cases} 2s, & 0 \le s \le t_0, \\ 0, & \text{otherwise}. \end{cases} $$
For this $\dot{g}_0$ we have
$$ g_0(t) = t^2\, 1_{[0, t_0]}(t) + t_0^2\, 1_{(t_0, \infty)}(t) + 0 \cdot 1_{(-\infty, 0)}(t). $$
Note that
$$ \int \dot{g}_0^2(s)\,ds = \int_0^{t_0} (2s)^2\,ds = \frac{4 t_0^3}{3}, $$
and that
$$ g_0(t) - t^2 = \begin{cases} -t^2, & t < 0, \\ 0, & 0 \le t \le t_0, \\ t_0^2 - t^2, & t > t_0. \end{cases} $$
Thus $\mathrm{argmax}_t(g_0(t) - t^2) = t_0$ (the largest maximizer), while $\sup_t (g_0(t) - t^2) = 0$ is achieved for all $0 \le t \le t_0$. To force $g_0 \in \mathcal{G}$ we simply require that $\int_0^{t_0} \dot{g}_0^2(s)\,ds = 1$, and this yields $4t_0^3/3 = 1$, i.e. $t_0 = (3/4)^{1/3}$.
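A quick numeric sanity check of the extremal function (my own sketch): with $\dot g_0(u) = 2u$ on $[0, t_0]$, the constraint $\int \dot g_0^2 = 1$ forces $4t_0^3/3 = 1$, i.e. $t_0 = (3/4)^{1/3} \approx 0.90856$.

```python
t0 = (3.0 / 4.0) ** (1.0 / 3.0)   # the claimed maximizing location

# Midpoint-rule check that the energy of g0, the integral of (2u)^2
# over [0, t0], equals 4*t0^3/3 = 1, so g0 lies on the boundary of G.
m = 100_000
integral = sum((2 * (j + 0.5) * t0 / m) ** 2 for j in range(m)) * (t0 / m)

print(round(t0, 5))        # 0.90856
print(round(integral, 6))  # 1.0
```

So the extremal $g_0$ uses its entire "energy budget" on $[0, t_0]$, which is what makes $T_{g_0} = (3/4)^{1/3}$ the supremum over $\mathcal{G}$.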
6. Open Questions and Further Problems

For the mode estimator $M(\hat{f}_n)$ of a log-concave density with mode $m \equiv M(f)$ and $f''(m) < 0$, we know that
$$ n^{1/5}\big(M(\hat{f}_n) - M(f)\big) \to_d C_2\, M(H^{(2)}), $$
where $H^{(2)}$ is the convex function given by the second derivative of the invelope process $H$ of $Y(t) \equiv t^4 + \int_0^t W(s)\,ds$.

Do we have a LIL for $M(\hat{f}_n)$:
$$ \limsup_{n\to\infty} \frac{n^{1/5}\big(M(\hat{f}_n) - M(f)\big)}{(2\log\log n)^{1/5}} \stackrel{a.s.}{=} C_2' < \infty\ ? $$

With definitions as in the last problem, do we have
$$ P\big(|M(H^{(2)})| > t\big) \le K_1 \exp(-K_2 t^5) \quad \text{for all } t \ge 0 $$
for some $K_1, K_2 < \infty$? An answer to the previous problem would start to give information about $K_2$!
Are the touch points of $H$ and $Y$ isolated? (See Groeneboom, Jongbloed, and Wellner (2001); Groeneboom and Jongbloed (2015), page 320.)

Balabdaoui and Wellner (2014) show that Chernoff's distribution is log-concave. Is it strongly log-concave?

Is there a Berry-Esseen type result for Grenander's estimator? That is, for what sequences $r_n$ do we have
$$ \sup_{t\in\mathbb{R}} \Big| P\big(n^{1/3}(\hat{f}_n(x) - f(x))/C \le t\big) - P(2Z \le t) \Big| \le K\, r_n $$
for some $K$ finite?
References:

Balabdaoui, F. and Wellner, J. A. (2014). Chernoff's density is log-concave. Bernoulli 20, 231-244.

Balabdaoui, F., Rufibach, K., and Wellner, J. A. (2009). Limit distribution theory for maximum likelihood estimation of a log-concave density. Ann. Statist. 37, 1299-1331.

Chernoff, H. (1964). Estimation of the mode. Ann. Inst. Statist. Math. 16, 31-41.

Dümbgen, L., Wellner, J. A., and Wolff, M. (2016). A law of the iterated logarithm for Grenander's estimator. Stoch. Proc. Applic. xx, yy-zz. To appear. arXiv:1502.00320v1.

Groeneboom, P. (1989). Brownian motion with a parabolic drift and Airy functions. Prob. Theor. Rel. Fields 81, 79-109.

Groeneboom, P., Jongbloed, G., and Wellner, J. A. (2001). A canonical process for estimation of convex functions: the "invelope" of integrated Brownian motion + t^4. Ann. Statist. 29, 1620-1652.

Mason, D. (2004). A uniform functional law of the logarithm for the local empirical process. Ann. Prob. 32, 1391-1418.