Nonparametric Estimation under Shape Restrictions
Jon A. Wellner, University of Washington, Seattle
Statistical Seminar, Fréjus, France, August 30 - September 3, 2010
Outline: Five Lectures on Shape Restrictions
L1: Monotone functions: maximum likelihood and least squares
L2: Optimality of the MLE of a monotone density (and comparisons?)
L3: Estimation of convex and k-monotone density functions
L4: Estimation of log-concave densities: d = 1 and beyond
L5: More on higher dimensions and some open problems
Statistical Seminar, Fréjus 2.1
Outline: Lecture 2
A: Local asymptotic minimax lower bounds
B: Lower bounds for estimation of a monotone density: several scenarios
C: Global lower bounds and upper bounds (briefly)
D: Lower bounds for estimation of a convex density
E: Lower bounds for estimation of a log-concave density
A. Local asymptotic minimax lower bounds

Proposition. (Two-point lower bound) Let $\mathcal{P}$ be a set of probability measures on a measurable space $(\mathcal{X}, \mathcal{A})$, and let $\nu$ be a real-valued function defined on $\mathcal{P}$. Moreover, let $l : [0,\infty) \to [0,\infty)$ be an increasing convex loss function with $l(0) = 0$. Write, for $i = 1, 2$,
$$E_{n,i} f(X_1, \ldots, X_n) = \int f(x_1, \ldots, x_n)\, dP_i(x_1) \cdots dP_i(x_n) = \int f(x)\, dP_i^n(x).$$
Then for any $P_1, P_2 \in \mathcal{P}$ such that $H(P_1, P_2) < 1$,
$$\inf_{T_n} \max\big\{ E_{n,1}\, l(|T_n - \nu(P_1)|),\; E_{n,2}\, l(|T_n - \nu(P_2)|) \big\} \ge l\Big( \tfrac{1}{4}\, |\nu(P_1) - \nu(P_2)|\, \{1 - H^2(P_1, P_2)\}^{2n} \Big). \qquad (1)$$
A. Local asymptotic minimax lower bounds

Proof. By Jensen's inequality,
$$E_{n,i}\, l(|T_n - \nu(P_i)|) \ge l\big( E_{n,i} |T_n - \nu(P_i)| \big), \qquad i = 1, 2,$$
and hence the left side of (1) is bounded below by
$$l\Big( \inf_{T_n} \max\big\{ E_{n,1}|T_n - \nu(P_1)|,\; E_{n,2}|T_n - \nu(P_2)| \big\} \Big).$$
Thus it suffices to prove the proposition for $l(x) = x$. Let $p_1 \equiv dP_1/d(P_1 + P_2)$, $p_2 \equiv dP_2/d(P_1 + P_2)$, and $\mu \equiv P_1 + P_2$ (or let $p_i$ be the density of $P_i$ with respect to some other convenient dominating measure $\mu$, $i = 1, 2$).
A. Local asymptotic minimax lower bounds

Two Facts:

Fact 1: Suppose $P, Q$ are absolutely continuous with respect to $\mu$, and
$$H^2(P, Q) \equiv \frac12 \int (\sqrt{p} - \sqrt{q})^2\, d\mu = 1 - \int \sqrt{pq}\, d\mu \equiv 1 - \rho(P, Q).$$
Then
$$(1 - H^2(P, Q))^2 = \Big\{ \int \sqrt{pq}\, d\mu \Big\}^2 \le 2 \int (p \wedge q)\, d\mu.$$

Fact 2: If $P$ and $Q$ are two probability measures on a measurable space $(\mathcal{X}, \mathcal{A})$ and $P^n$ and $Q^n$ denote the corresponding product measures on $(\mathcal{X}^n, \mathcal{A}^n)$ (of $X_1, \ldots, X_n$ i.i.d. as $P$ or $Q$ respectively), then $\rho(P, Q) \equiv \int \sqrt{pq}\, d\mu$ satisfies
$$\rho(P^n, Q^n) = \rho(P, Q)^n. \qquad (2)$$

Exercise. Prove Fact 1.
Exercise. Prove Fact 2.
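Both facts are easy to check numerically. The following sketch verifies them for a pair of three-point distributions (the probability vectors are arbitrary test values, not from the slides):

```python
import numpy as np

# Numerical sanity check of Facts 1 and 2 for a pair of three-point
# distributions (the probability vectors are arbitrary test values).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

rho = np.sum(np.sqrt(p * q))                         # Hellinger affinity
H2 = 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)    # squared Hellinger distance

# Fact 1: H^2 = 1 - rho and (1 - H^2)^2 = rho^2 <= 2 * sum of min(p, q)
assert abs(H2 - (1.0 - rho)) < 1e-12
assert (1.0 - H2) ** 2 <= 2.0 * np.sum(np.minimum(p, q)) + 1e-12

# Fact 2: the affinity of the n-fold products is rho^n; for n = 3 the
# product measure puts mass p_i p_j p_k on the triple (i, j, k).
pn = np.einsum('i,j,k->ijk', p, p, p).ravel()
qn = np.einsum('i,j,k->ijk', q, q, q).ravel()
rho_n = np.sum(np.sqrt(pn * qn))
print(rho, rho_n, rho ** 3)
```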
A. Local asymptotic minimax lower bounds

$$\begin{aligned}
\max&\big\{ E_{n,1}|T_n - \nu(P_1)|,\; E_{n,2}|T_n - \nu(P_2)| \big\}\\
&\ge \frac12 \big\{ E_{n,1}|T_n - \nu(P_1)| + E_{n,2}|T_n - \nu(P_2)| \big\}\\
&= \frac12 \int \Big[ |T_n(x) - \nu(P_1)| \prod_{i=1}^n p_1(x_i) + |T_n(x) - \nu(P_2)| \prod_{i=1}^n p_2(x_i) \Big]\, d\mu(x_1) \cdots d\mu(x_n)\\
&\ge \frac12 \int \big[ |T_n(x) - \nu(P_1)| + |T_n(x) - \nu(P_2)| \big] \Big[ \prod_{i=1}^n p_1(x_i) \wedge \prod_{i=1}^n p_2(x_i) \Big]\, d\mu(x_1) \cdots d\mu(x_n)\\
&\ge \frac12 |\nu(P_1) - \nu(P_2)| \int \prod_{i=1}^n p_1(x_i) \wedge \prod_{i=1}^n p_2(x_i)\, d\mu(x_1) \cdots d\mu(x_n)\\
&\ge \frac14 |\nu(P_1) - \nu(P_2)|\, \{1 - H^2(P_1^n, P_2^n)\}^2 \qquad \text{by Fact 1}\\
&= \frac14 |\nu(P_1) - \nu(P_2)|\, \{1 - H^2(P_1, P_2)\}^{2n} \qquad \text{by Fact 2.} \qquad \Box
\end{aligned}$$
Several scenarios, estimation of $f(x_0)$:
S1: $f(x_0) > 0$, $f'(x_0) < 0$.
S2: $x_0 \in (a, b)$ with $f$ constant on $(a, b)$. In particular, $f(x) = 1_{[0,1]}(x)$, $x_0 \in (0, 1)$.
S3: $f$ discontinuous at $x_0$.
S4: $f^{(j)}(x_0) = 0$ for $j = 1, \ldots, k - 1$, $f^{(k)}(x_0) \neq 0$.
S1: $f_0(x_0) > 0$, $f_0'(x_0) < 0$. Suppose that we want to estimate $\nu(f) = f(x_0)$ for a fixed $x_0$. Let $f_0$ be the density corresponding to $P_0$, and suppose that $f_0'(x_0) < 0$. To apply our two-point lower bound Proposition we need to construct a sequence of densities $f_n$ that are near $f_0$ in the sense that $nH^2(f_n, f_0) \le A$ for some constant $A$, and $|\nu(f_n) - \nu(f_0)| = b_n^{-1}$ where $b_n \to \infty$. Hence we will try the following choice of $f_n$: for $c > 0$, define
$$f_n(x) = \begin{cases} f_0(x), & x \le x_0 - cn^{-1/3} \text{ or } x > x_0 + cn^{-1/3},\\ f_0(x_0 - cn^{-1/3}), & x_0 - cn^{-1/3} < x \le x_0,\\ f_0(x_0 + cn^{-1/3}), & x_0 < x \le x_0 + cn^{-1/3}. \end{cases}$$
It is easy to see that
$$n^{1/3} |\nu(f_n) - \nu(f_0)| = n^{1/3} \big( f_0(x_0 - cn^{-1/3}) - f_0(x_0) \big) \to |f_0'(x_0)|\, c. \qquad (3)$$
On the other hand some calculation shows that
$$\begin{aligned}
H^2(f_n, f_0) &= \frac12 \int_0^\infty \big[ \sqrt{f_n(x)} - \sqrt{f_0(x)} \big]^2\, dx = \frac12 \int_0^\infty \frac{[f_n(x) - f_0(x)]^2}{[\sqrt{f_n(x)} + \sqrt{f_0(x)}]^2}\, dx\\
&= \frac12 \int_{x_0 - cn^{-1/3}}^{x_0 + cn^{-1/3}} \frac{[f_n(x) - f_0(x)]^2}{[\sqrt{f_n(x)} + \sqrt{f_0(x)}]^2}\, dx \sim \frac{f_0'(x_0)^2\, c^3}{12\, f_0(x_0)\, n}.
\end{aligned}$$
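This Hellinger calculation can be checked by direct quadrature. The sketch below uses the assumed test density $f_0(x) = 2(1-x)$ on $[0,1]$ with $x_0 = 0.5$ (so $f_0(x_0) = 1$, $f_0'(x_0) = -2$); this example is mine, not from the slides:

```python
import numpy as np

# Check Scenario S1: n H^2(f_n, f_0) should approach
# f_0'(x_0)^2 c^3 / (12 f_0(x_0)) for the step perturbation f_n.
f0 = lambda x: 2.0 * (1.0 - x)   # assumed test density on [0, 1]
x0, c = 0.5, 1.0

def trap(y, x):
    """Simple trapezoid rule."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def nH2(n, m=20001):
    d = c * n ** (-1.0 / 3.0)
    xl = np.linspace(x0 - d, x0, m)    # f_n is flat at f_0(x_0 - d) here
    xr = np.linspace(x0, x0 + d, m)    # and flat at f_0(x_0 + d) here
    gl = (np.sqrt(f0(x0 - d)) - np.sqrt(f0(xl))) ** 2
    gr = (np.sqrt(f0(x0 + d)) - np.sqrt(f0(xr))) ** 2
    return n * 0.5 * (trap(gl, xl) + trap(gr, xr))

limit = (-2.0) ** 2 * c ** 3 / (12.0 * 1.0)   # = c^3 / 3
print(nH2(10 ** 6), limit)
```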
Now we can combine these two pieces with our two-point lower bound Proposition to find that, for any estimator $T_n$ of $\nu(f) = f(x_0)$ and the loss function $l(x) = |x|$ we have
$$\begin{aligned}
\inf_{T_n} \max&\big\{ E_n\, n^{1/3} |T_n - \nu(f_n)|,\; E_0\, n^{1/3} |T_n - \nu(f_0)| \big\}\\
&\ge \frac14\, n^{1/3} (\nu(f_n) - \nu(f_0)) \Big\{ 1 - \frac{nH^2(f_n, f_0)}{n} \Big\}^{2n}\\
&= \frac14\, n^{1/3} \big( f_0(x_0 - cn^{-1/3}) - f_0(x_0) \big) \Big\{ 1 - \frac{nH^2(f_n, f_0)}{n} \Big\}^{2n}\\
&\to \frac14\, |f_0'(x_0)|\, c\, \exp\Big( -2\, \frac{f_0'(x_0)^2\, c^3}{12\, f_0(x_0)} \Big) = \frac14\, |f_0'(x_0)|\, c\, \exp\Big( -\frac{f_0'(x_0)^2\, c^3}{6\, f_0(x_0)} \Big).
\end{aligned}$$
We now choose $c$ to maximize the quantity on the right side. It is easily seen that the maximum is achieved when
$$c = c_0 \equiv \Big( \frac{2 f_0(x_0)}{f_0'(x_0)^2} \Big)^{1/3}.$$
This yields
$$\liminf_{n\to\infty} \inf_{T_n} \max\big\{ E_n\, n^{1/3}|T_n - \nu(f_n)|,\; E_0\, n^{1/3}|T_n - \nu(f_0)| \big\} \ge \frac{e^{-1/3}}{4} \big( 2 f_0(x_0)\, |f_0'(x_0)| \big)^{1/3}.$$
This lower bound has the appropriate structure in the sense that the (nonparametric) MLE of $f$, $\hat f_n(x_0)$, converges at rate $n^{1/3}$, with the same dependence on $f_0(x_0)$ and $f_0'(x_0)$ as the MLE's limit.
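A quick numerical confirmation of this optimization step, with $f_0(x_0)$ and $|f_0'(x_0)|$ set to arbitrary test values:

```python
import numpy as np

# c -> (1/4)|f_0'| c exp(-f_0'^2 c^3 / (6 f_0)) is maximized at
# c_0 = (2 f_0 / f_0'^2)^{1/3}, with maximum (e^{-1/3}/4)(2 f_0 |f_0'|)^{1/3}.
f0, df0 = 1.5, 2.0   # arbitrary test values for f_0(x_0), |f_0'(x_0)|

def bound(c):
    return 0.25 * df0 * c * np.exp(-df0 ** 2 * c ** 3 / (6.0 * f0))

c_grid = np.linspace(1e-3, 5.0, 500001)
c_star = c_grid[np.argmax(bound(c_grid))]
c0 = (2.0 * f0 / df0 ** 2) ** (1.0 / 3.0)
peak = np.exp(-1.0 / 3.0) / 4.0 * (2.0 * f0 * df0) ** (1.0 / 3.0)

print(c_star, c0)            # grid maximizer agrees with the closed form
print(bound(c0), peak)       # and reproduces the stated maximum
print(np.exp(-1.0 / 3.0) * 4.0 ** (-2.0 / 3.0))   # the constant 0.284356...
```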
Furthermore, note that for $n$ sufficiently large,
$$\sup_{f : H(f, f_0) \le C n^{-1/2}} E_f\, n^{1/3} |T_n - \nu(f)| \ge \max\big\{ E_n\, n^{1/3}|T_n - \nu(f_n)|,\; E_0\, n^{1/3}|T_n - \nu(f_0)| \big\}$$
if $C^2 \ge 2A \equiv 2 f_0'(x_0)^2 c_0^3/(12 f_0(x_0))$, and hence we conclude that
$$\liminf_{n\to\infty} \inf_{T_n} \sup_{f : H(f, f_0) \le C n^{-1/2}} E_f\, n^{1/3} |T_n - \nu(f)| \ge \frac{e^{-1/3}}{4} \big( 2 f_0(x_0)\, |f_0'(x_0)| \big)^{1/3} = \frac{e^{-1/3}}{4}\, 2^{2/3} \Big( \tfrac12 f_0(x_0)\, |f_0'(x_0)| \Big)^{1/3}$$
for all $C$ sufficiently large. Comparison of $E|\mathbb{S}(0)|$ with $e^{-1/3} 4^{-2/3} = 0.284356$? From Groeneboom and Wellner (2001), $E|\mathbb{S}(0)| = 2E|Z| = 2(0.41273655) = 0.825473$.
S2: $x_0 \in (a, b)$ with $f_0(x) = f_0(x_0) > 0$ for all $x \in (a, b)$. To apply our two-point lower bound Proposition we again need to construct a sequence of densities $f_n$ that are near $f_0$ in the sense that $nH^2(f_n, f_0) \le A$ for some constant $A$, and $|\nu(f_n) - \nu(f_0)| = b_n^{-1}$ where $b_n \to \infty$. In this scenario we define a sequence of densities $\{f_n\}$ by
$$f_n(x) = \begin{cases} f_0(x), & x \le a_n,\\[2pt] f_0(x) + \dfrac{c}{\sqrt n}\, \dfrac{b-a}{x_0 - a}, & a_n < x \le x_0,\\[2pt] f_0(x) - \dfrac{c}{\sqrt n}\, \dfrac{b-a}{b - x_0}, & x_0 < x \le b_n,\\[2pt] f_0(x), & x > b_n, \end{cases}$$
where
$$a_n \equiv \sup\{ x : f_0(x) \ge f_0(x_0) + c n^{-1/2} (b-a)/(x_0 - a) \},$$
$$b_n \equiv \inf\{ x : f_0(x) < f_0(x_0) - c n^{-1/2} (b-a)/(b - x_0) \}.$$
The intervals $(a_n, a)$ and $(b, b_n)$ may be empty if $f_0(a-) > f_0(a+)$ and/or $f_0(b+) < f_0(b-)$ and $n$ is large.
It is easy to see that
$$\sqrt{n}\, |\nu(f_n) - \nu(f_0)| = \sqrt{n}\, |f_n(x_0) - f_0(x_0)| = c\, \frac{b-a}{x_0 - a}. \qquad (4)$$
On the other hand some calculation shows that
$$H^2(f_n, f_0) \le \frac{c^2 (b-a)^2}{4 n f_0(x_0)} \Big\{ \frac{1}{x_0 - a} + \frac{1}{b - x_0} \Big\} = \frac{c^2 (b-a)^3}{4 n f_0(x_0) (x_0 - a)(b - x_0)}.$$
Combining these two pieces with the two-point lower bound Proposition we find that, in Scenario 2, for any estimator $T_n$ of $\nu(f) = f(x_0)$ and the loss function $l(x) = |x|$ we have
$$\begin{aligned}
\inf_{T_n} \max&\big\{ E_n\, \sqrt{n} |T_n - \nu(f_n)|,\; E_0\, \sqrt{n} |T_n - \nu(f_0)| \big\}\\
&\ge \frac14\, \sqrt{n} (\nu(f_n) - \nu(f_0)) \Big\{ 1 - \frac{n H^2(f_n, f_0)}{n} \Big\}^{2n} = \frac14\, c\, \frac{b-a}{x_0-a} \Big\{ 1 - \frac{n H^2(f_n, f_0)}{n} \Big\}^{2n}\\
&\to \frac14\, c\, \frac{b-a}{x_0-a}\, \exp\Big( - \frac{c^2 (b-a)^3}{2 f_0(x_0)(x_0-a)(b-x_0)} \Big) \equiv A\, c\, \exp(-B c^2).
\end{aligned}$$
We now choose $c$ to maximize the quantity on the right side. It is easily seen that the maximum is achieved when $c = c_0 \equiv 1/\sqrt{2B}$, with
$$A c_0 \exp(-B c_0^2) = A c_0\, e^{-1/2} \qquad \text{and} \qquad c_0 = \Big( \frac{f_0(x_0)(x_0-a)(b-x_0)}{(b-a)^3} \Big)^{1/2}.$$
This yields
$$\liminf_{n\to\infty} \inf_{T_n} \max\big\{ E_n\, \sqrt{n}|T_n - \nu(f_n)|,\; E_0\, \sqrt{n}|T_n - \nu(f_0)| \big\} \ge \frac{e^{-1/2}}{4} \sqrt{\frac{f_0(x_0)}{b-a}} \sqrt{\frac{b-x_0}{x_0-a}}.$$
Repeating this argument with the right-continuous version of the sequence $\{f_n\}$ yields a similar bound, but with the factor $(b-x_0)/(x_0-a)$ replaced by $(x_0-a)/(b-x_0)$.
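The closed-form maximizer and maximum can again be confirmed numerically; the values of $a, b, x_0, f_0(x_0)$ below are arbitrary test numbers:

```python
import numpy as np

# A c exp(-B c^2) is maximized at c_0 = 1/sqrt(2B); with
# A = (b-a)/(4(x_0-a)) and B = (b-a)^3/(2 f_0(x_0)(x_0-a)(b-x_0)) the
# maximum is (e^{-1/2}/4) sqrt(f_0(x_0)/(b-a)) sqrt((b-x_0)/(x_0-a)).
a, b, x0, f0 = 0.0, 1.0, 0.3, 1.2   # arbitrary test numbers
A = (b - a) / (4.0 * (x0 - a))
B = (b - a) ** 3 / (2.0 * f0 * (x0 - a) * (b - x0))

c_grid = np.linspace(1e-4, 2.0, 200001)
vals = A * c_grid * np.exp(-B * c_grid ** 2)
c_star = c_grid[np.argmax(vals)]
c0 = 1.0 / np.sqrt(2.0 * B)
peak = np.exp(-0.5) / 4.0 * np.sqrt(f0 / (b - a)) * np.sqrt((b - x0) / (x0 - a))
print(c_star, c0, vals.max(), peak)
```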
Taking the maximum of the two lower bounds yields the last display with the right side replaced by
$$\frac{e^{-1/2}}{4} \sqrt{\frac{f_0(x_0)}{b-a}}\, \max\Big\{ \sqrt{\frac{b-x_0}{x_0-a}},\; \sqrt{\frac{x_0-a}{b-x_0}} \Big\} \ge \frac{e^{-1/2}}{4} \sqrt{\frac{f_0(x_0)}{b-a}} \Big\{ \frac{b-x_0}{b-a} \sqrt{\frac{b-x_0}{x_0-a}} + \frac{x_0-a}{b-a} \sqrt{\frac{x_0-a}{b-x_0}} \Big\}.$$
This lower bound has the appropriate structure in the sense that the MLE of $f$, $\hat f_n(x_0)$, converges at rate $n^{1/2}$ and the limiting behavior of the MLE has exactly the same dependence on $f_0(x_0)$, $b-a$, $x_0-a$, and $b-x_0$.
Theorem. (Carolan and Dykstra, 1999) If $f_0$ is decreasing with $f_0$ constant on $(a,b)$, the maximal open interval containing $x_0$, then, with $p \equiv f_0(x_0)(b-a) = P_0(a < X < b)$,
$$\sqrt{n}\, (\hat f_n(x_0) - f_0(x_0)) \to_d \sqrt{\frac{f_0(x_0)}{b-a}} \Big\{ \sqrt{p}\, Z + \mathbb{S}\Big( \frac{x_0-a}{b-a} \Big) \Big\},$$
where $Z \sim N(0,1)$ and $\mathbb{S}$ is the process of left-derivatives of the least concave majorant $\widehat{\mathbb{U}}$ of a Brownian bridge process $\mathbb{U}$ independent of $Z$. Note that by using Groeneboom (1983),
$$E\Big| \sqrt{\frac{f_0(x_0)}{b-a}} \Big\{ \sqrt{p}\, Z + \mathbb{S}\Big(\frac{x_0-a}{b-a}\Big) \Big\} \Big| \ge \sqrt{\frac{f_0(x_0)}{b-a}}\; E\Big| \mathbb{S}\Big(\frac{x_0-a}{b-a}\Big) \Big| = \sqrt{\frac{f_0(x_0)}{b-a}}\, \sqrt{\frac{2}{\pi}}\, \frac{1}{b-a} \Big\{ \frac{(b-x_0)^{3/2}}{(x_0-a)^{1/2}} + \frac{(x_0-a)^{3/2}}{(b-x_0)^{1/2}} \Big\}.$$
S3: $f_0(x_0-) > f_0(x_0+)$. In this case we consider estimation of the functional $\nu(f) \equiv (f(x_0+) + f(x_0-))/2$. To apply our two-point lower bound Proposition, consider the following choice of $f_n$: for $c > 0$, define
$$\bar f_n(x) = \begin{cases} f_0(x), & x \le x_0 \text{ or } x > b_n,\\[2pt] f_0(x_0-) + (x - x_0)\, \dfrac{f_0(b_n) - f_0(x_0-)}{c/n}, & x_0 < x \le b_n, \end{cases}$$
where $b_n \equiv x_0 + c/n$. Then define $f_n \equiv \bar f_n / \int_0^\infty \bar f_n(y)\, dy$. In this case
$$\nu(f_n) - \nu(f_0) = \bar f_n(x_0)(1 + o(1)) - \frac{f_0(x_0+) + f_0(x_0-)}{2} = \frac12\, (f_0(x_0-) - f_0(x_0+)) + o(1) \equiv d + o(1).$$
Some calculation shows that
$$H^2(f_n, f_0) = \frac{c\, r^2}{n}\, (1 + o(1)), \qquad \text{where} \qquad r^2 \equiv \frac{\{\sqrt{f_0(x_0-)} - \sqrt{f_0(x_0+)}\}^2\, \{3\sqrt{f_0(x_0-)} + \sqrt{f_0(x_0+)}\}}{12\, \{\sqrt{f_0(x_0-)} + \sqrt{f_0(x_0+)}\}}.$$
Combining these pieces with the two-point lower bound yields
$$\begin{aligned}
\inf_{T_n} \max\big\{ E_n |T_n - \nu(f_n)|,\; E_0 |T_n - \nu(f_0)| \big\} &\ge \frac14\, |\nu(f_n) - \nu(f_0)| \Big\{ 1 - \frac{n H^2(f_n, f_0)}{n} \Big\}^{2n}\\
&= \frac18\, (f_0(x_0-) - f_0(x_0+))(1 + o(1)) \Big\{ 1 - \frac{c r^2 (1 + o(1))}{n} \Big\}^{2n}\\
&\to \frac{d}{4}\, \exp(-2 c r^2) = \frac{d}{4e} \qquad \text{by choosing } c = 1/(2r^2).
\end{aligned}$$
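The constant $r^2$ can be checked by direct quadrature: on $(x_0, x_0 + c/n]$ the perturbation decreases linearly from $p = f_0(x_0-)$ to $q = f_0(x_0+)$ while $f_0$ itself stays approximately equal to $q$ there. The values $(p, q) = (4, 1)$ below are arbitrary test numbers, and the local variation of $f_0$ is ignored, as in the slides' $o(1)$ terms:

```python
import numpy as np

# Quadrature check of r^2 = (sqrt(p)-sqrt(q))^2 (3 sqrt(p)+sqrt(q))
#                           / (12 (sqrt(p)+sqrt(q))).
p, q = 4.0, 1.0                   # arbitrary test values with p > q > 0
a, b = np.sqrt(p), np.sqrt(q)

s = np.linspace(0.0, 1.0, 200001)
g = (np.sqrt(p + s * (q - p)) - np.sqrt(q)) ** 2
E = float(np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(s)))   # int_0^1 (...) ds

r2_quad = 0.5 * E   # n H^2 / c = (1/2) * E over a stretch of length c/n
r2_formula = (a - b) ** 2 * (3.0 * a + b) / (12.0 * (a + b))
print(r2_quad, r2_formula)
```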
This corresponds to the following theorem for the MLE $\hat f_n$:

Theorem. (Anevski and Hössjer, 2002; W, 2007) If $x_0$ is a discontinuity point of $f_0$, $d \equiv (f_0(x_0-) - f_0(x_0+))/2$ with $f_0(x_0+) > 0$, and $f_0(x_0) = (f_0(x_0-) + f_0(x_0+))/2$, then
$$\hat f_n(x_0) - f_0(x_0) \to_d \mathbb{R}(0),$$
where $h \mapsto \mathbb{R}(h)$ is the process of left-derivatives of the least concave majorant $\widehat{\mathbb{M}}$ of the process $\mathbb{M}$ defined by
$$\mathbb{M}(h) \equiv \mathbb{N}_0(h) - d|h| \equiv \begin{cases} \mathbb{N}(f_0(x_0+)h) - f_0(x_0+)h - dh, & h \ge 0,\\ \mathbb{N}(f_0(x_0-)h) - f_0(x_0-)h + dh, & h < 0, \end{cases}$$
where $\mathbb{N}$ is a standard (rate 1) two-sided Poisson process on $\mathbb{R}$.
S4: $f_0(x_0) > 0$, $f_0^{(j)}(x_0) = 0$ for $j = 1, 2, \ldots, p-1$, and $f_0^{(p)}(x_0) \neq 0$. In this case, consider the perturbation $f_\epsilon$ of $f_0$ given for $\epsilon > 0$ by
$$f_\epsilon(x) = \begin{cases} f_0(x), & x \le x_0 - \epsilon \text{ or } x > x_0 + \epsilon,\\ f_0(x_0 - \epsilon), & x_0 - \epsilon < x \le x_0,\\ f_0(x_0 + \epsilon), & x_0 < x \le x_0 + \epsilon. \end{cases}$$
Then for $\nu(f) = f(x_0)$,
$$|\nu(f_\epsilon) - \nu(f_0)| \sim \frac{|f_0^{(p)}(x_0)|}{p!}\, \epsilon^p, \qquad H^2(f_\epsilon, f_0) \le A_p\, \frac{f_0^{(p)}(x_0)^2}{f_0(x_0)}\, \epsilon^{2p+1} \equiv B_p\, \epsilon^{2p+1},$$
where
$$A_p \equiv \frac{p^2}{2\, (p!)^2\, (2p^2 + 3p + 1)}.$$
Choosing $\epsilon = c n^{-1/(2p+1)}$, plugging into our two-point bound, and optimizing with respect to $c$ yields
$$\begin{aligned}
\inf_{T_n} \max&\big\{ n^{p/(2p+1)} E_n |T_n - \nu(f_n)|,\; n^{p/(2p+1)} E_0 |T_n - \nu(f_0)| \big\}\\
&\ge \frac14\, n^{p/(2p+1)} |\nu(f_n) - \nu(f_0)| \Big\{ 1 - \frac{n H^2(f_n, f_0)}{n} \Big\}^{2n}\\
&\to \frac14\, \frac{|f_0^{(p)}(x_0)|}{p!}\, c^p\, \exp\big( -2 B_p c^{2p+1} \big) = D_p \Big( |f_0^{(p)}(x_0)|\, f_0(x_0)^p \Big)^{1/(2p+1)},
\end{aligned}$$
taking
$$c = \Big( \frac{p}{2(2p+1) B_p} \Big)^{1/(2p+1)}, \qquad \text{with} \qquad D_p \equiv \frac{1}{4\, p!} \Big( \frac{p}{2(2p+1) A_p} \Big)^{p/(2p+1)} \exp\Big( -\frac{p}{2p+1} \Big).$$
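Two quick numerical checks for Scenario S4: the integral identity behind $A_p$, and the location of the maximizing $c$ (with $B_p$ set to an arbitrary test value):

```python
import numpy as np

# (i) int_0^1 (1 - u^p)^2 du = 2 p^2 / ((p+1)(2p+1)), which is where the
# factor 2p^2 + 3p + 1 = (p+1)(2p+1) in A_p comes from.
for p in (1, 2, 3):
    u = np.linspace(0.0, 1.0, 200001)
    g = (1.0 - u ** p) ** 2
    integral = float(np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(u)))
    closed = 2.0 * p ** 2 / ((p + 1) * (2 * p + 1))
    assert abs(integral - closed) < 1e-8

# (ii) c^p exp(-2 B_p c^{2p+1}) is maximized at
# c_0 = (p / (2 (2p+1) B_p))^{1/(2p+1)}; B_p = 0.7 is an arbitrary value.
p, Bp = 2, 0.7
c_grid = np.linspace(1e-4, 3.0, 300001)
vals = c_grid ** p * np.exp(-2.0 * Bp * c_grid ** (2 * p + 1))
c_star = c_grid[np.argmax(vals)]
c0 = (p / (2.0 * (2 * p + 1) * Bp)) ** (1.0 / (2 * p + 1))
print(c_star, c0)
```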
The resulting lower bound corresponds to the following theorem for $\hat f_n$:

Theorem. (Wright (1981); Leurgans (1982); Anevski and Hössjer (2002)) Suppose that $f_0^{(j)}(x_0) = 0$ for $j = 1, \ldots, p-1$, $f_0^{(p)}(x_0) \neq 0$, and $f_0^{(p)}$ is continuous at $x_0$. Then
$$n^{p/(2p+1)} \big( \hat f_n(x_0 + n^{-1/(2p+1)} t) - f_0(x_0) \big) \to_d C_p\, \mathbb{S}_p(t),$$
where $\mathbb{S}_p$ is the process given by the left-derivatives of the least concave majorant $\widehat{\mathbb{Y}}_p$ of $\mathbb{Y}_p(t) \equiv W(t) - |t|^{p+1}$, and where
$$C_p = \big( f_0(x_0)^p\, |f_0^{(p)}(x_0)| / (p+1)! \big)^{1/(2p+1)}.$$
In particular,
$$n^{p/(2p+1)} \big( \hat f_n(x_0) - f_0(x_0) \big) \to_d C_p\, \mathbb{S}_p(0).$$
Proof. Switching + (argmax-)continuous mapping theorem.
Summary: The MLE $\hat f_n$ is locally adaptive to $f_0$, at least in scenarios S1-S4.
S1: rate $n^{1/3}$; localization $n^{-1/3}$; constants agree with minimax lower bound.
S2: rate $n^{1/2}$; localization $n^0 = 1$, none; constants agree with minimax bound.
S3: rate $n^0 = 1$; localization $n^{-1}$; constants agree(?).
S4: rate $n^{p/(2p+1)}$; localization $n^{-1/(2p+1)}$; constants agree.
C: Global lower and upper bounds (briefly)

Birgé (1986, 1989) expresses the global optimality of $\hat f_n$ in terms of its $L_1$ risk as follows.

Lower bound: Birgé (1987). Let $\mathcal{F}_M$ denote the class of all decreasing densities $f$ on $[0,1]$ satisfying $f \le M$ with $M > 1$, and let the minimax risk for $\mathcal{F}_M$ with respect to the $L_1$ metric $d_1(f,g) \equiv \int |f(x) - g(x)|\, dx$, based on $n$ observations, be
$$R_M(d_1, n) \equiv \inf_{\hat f_n} \sup_{f \in \mathcal{F}_M} E_f\, d_1(\hat f_n, f).$$
Then there is an absolute constant $C$ such that
$$R_M(d_1, n) \ge C \Big( \frac{\log M}{n} \Big)^{1/3}.$$

Upper bound, Grenander: Birgé (1989). Let $\hat f_n$ denote the Grenander estimator of $f \in \mathcal{F}_M$. Then
$$\sup_{f \in \mathcal{F}_M} E_f\, d_1(\hat f_n, f) \le 4.75 \Big( \frac{\log M}{n} \Big)^{1/3}.$$
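The Grenander estimator itself is simple to compute: it is the left derivative of the least concave majorant of the empirical distribution function. A minimal sketch via an upper convex hull scan (function and variable names are my own):

```python
import numpy as np

def grenander(x):
    """Grenander estimator: left derivative of the least concave majorant
    of the empirical distribution function of the sample x. Returns
    breakpoints bx and decreasing density values f_hat, with f_hat[m]
    the estimate on the interval (bx[m], bx[m+1]]."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    px = np.concatenate(([0.0], x))      # ECDF vertices (0, 0), (x_(i), i/n)
    py = np.arange(n + 1) / n
    hull = [0]
    for i in range(1, n + 1):
        # pop the last vertex while it lies on or below the chord to i
        while len(hull) >= 2:
            j, k = hull[-2], hull[-1]
            if (py[k] - py[j]) * (px[i] - px[j]) <= (py[i] - py[j]) * (px[k] - px[j]):
                hull.pop()
            else:
                break
        hull.append(i)
    bx = px[hull]
    f_hat = np.diff(py[hull]) / np.diff(bx)   # slopes of the majorant
    return bx, f_hat

rng = np.random.default_rng(0)
bx, f_hat = grenander(rng.exponential(size=1000))
assert np.all(np.diff(f_hat) <= 0)                     # decreasing
assert abs(np.sum(f_hat * np.diff(bx)) - 1.0) < 1e-9   # integrates to one
```

Note that the estimator involves no tuning parameter at all, which is what makes the match with Birgé's $(\log M / n)^{1/3}$ benchmark remarkable.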
C: Global lower and upper bounds (briefly)

Birgé's bounds are complemented by the remarkable results of Groeneboom (1985) and Groeneboom, Hooghiemstra, and Lopuhaä (1999). Set
$$V(t) \equiv \sup\{ s : W(s) - (s - t)^2 \text{ is maximal} \},$$
where $W$ is a standard two-sided Brownian motion process starting from 0.

Theorem. (Groeneboom (1985), GHL (1999)) Suppose that $f$ is a decreasing density on $[0,1]$ satisfying:
A1. $0 < f(1) \le f(y) \le f(x) \le f(0) < \infty$ for $0 \le x \le y \le 1$.
A2. $0 < \inf_{0<x<1} |f'(x)| \le \sup_{0<x<1} |f'(x)| < \infty$.
A3. $\sup_{0<x<1} |f''(x)| < \infty$.
Then, with $\mu \equiv 2 E|V(0)| \int_0^1 \big| \tfrac12 f'(x) f(x) \big|^{1/3}\, dx$,
$$n^{1/6} \Big\{ n^{1/3} \int_0^1 |\hat f_n(x) - f(x)|\, dx - \mu \Big\} \to_d \sigma Z \sim N(0, \sigma^2),$$
where $\sigma^2 \equiv 8 \int_0^\infty \mathrm{Cov}\big( |V(0)|,\, |V(t) - t| \big)\, dt$.
D: Lower bounds: convex decreasing density

Now consider estimation of a convex decreasing density $f$ on $[0, \infty)$. (Original motivation: Hampel's (1987) bird-migration problem.) Since $f'$ exists almost everywhere, we are now interested in estimation of $\nu_1(f) = f(x_0)$ and $\nu_2(f) = f'(x_0)$. We let $\mathcal{D}_2$ denote the class of all convex decreasing densities on $\mathbb{R}^+$. Note that every $f \in \mathcal{D}_2$ can be written as a scale mixture of the triangular (or Beta(1,2)) density: if $f \in \mathcal{D}_2$, then
$$f(x) = \int_0^\infty 2 y^{-1} (1 - x/y)_+\, dG(y)$$
for some (mixing) distribution $G$ on $[0, \infty)$. This corresponds to the fact that every monotone decreasing density $f \in \mathcal{D} \equiv \mathcal{D}_1$ can be written as a scale mixture of the Uniform(0,1) (or Beta(1,1)) density: if $f \in \mathcal{D}_1$, then
$$f(x) = \int_0^\infty y^{-1}\, 1_{[0,y]}(x)\, dG(y)$$
for some distribution $G$ on $[0, \infty)$.
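A numerical illustration of the Beta(1,2) mixture representation, with the hypothetical choice $G = \mathrm{Uniform}(1, 2)$ for the mixing distribution (my example, not from the slides):

```python
import numpy as np

# With G = Uniform(1, 2), f(x) = int 2 y^{-1} (1 - x/y)_+ dG(y) should be
# a convex decreasing density on [0, 2]; equivalently X = Y * V with
# Y ~ G and V ~ Beta(1, 2) has density f.
x = np.linspace(0.0, 2.0, 2001)
y = np.linspace(1.0, 2.0, 4001)
dy = y[1] - y[0]

fy = 2.0 / y * np.clip(1.0 - x[:, None] / y[None, :], 0.0, None)
f = dy * (fy.sum(axis=1) - 0.5 * (fy[:, 0] + fy[:, -1]))   # trapezoid in y

dx = x[1] - x[0]
total = dx * (f.sum() - 0.5 * (f[0] + f[-1]))
assert abs(total - 1.0) < 1e-3          # integrates to one
assert np.all(np.diff(f) <= 1e-9)       # decreasing
assert np.all(np.diff(f, 2) >= -1e-6)   # convex

# sampling check: V = 1 - sqrt(U) has the Beta(1, 2) density 2(1 - v)
rng = np.random.default_rng(1)
Y = rng.uniform(1.0, 2.0, size=10 ** 6)
V = 1.0 - np.sqrt(rng.uniform(size=10 ** 6))
print((Y * V).mean())   # E[YV] = E[Y] E[V] = (3/2)(1/3) = 1/2
```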
D: Lower bounds: convex decreasing density

Scenario 1: Suppose that $f_0 \in \mathcal{D}_2$ and $x_0 \in (0, \infty)$ satisfy $f_0(x_0) > 0$, $f_0^{(2)}(x_0) > 0$, and $f_0^{(2)}$ is continuous at $x_0$. To establish lower bounds, consider the perturbations $\bar f_\epsilon$ of $f_0$ given by
$$\bar f_\epsilon(x) = \begin{cases} f_0(x_0 - \epsilon c_\epsilon) + (x - x_0 + \epsilon c_\epsilon)\, f_0'(x_0 - \epsilon c_\epsilon), & x \in (x_0 - \epsilon c_\epsilon,\, x_0 - \epsilon),\\ f_0(x_0 + \epsilon) + (x - x_0 - \epsilon)\, f_0'(x_0 + \epsilon), & x \in (x_0 - \epsilon,\, x_0 + \epsilon),\\ f_0(x), & \text{elsewhere;} \end{cases}$$
here $c_\epsilon$ is chosen so that $\bar f_\epsilon$ is continuous at $x_0 - \epsilon$. Now define $f_\epsilon$ by
$$f_\epsilon(x) = \bar f_\epsilon(x) + \tau_\epsilon\, (x_0 - \epsilon - x)\, 1_{[0, x_0 - \epsilon]}(x)$$
with $\tau_\epsilon$ chosen so that $f_\epsilon$ integrates to 1.
D: Lower bounds: convex decreasing density

Now
$$|\nu_1(f_\epsilon) - \nu_1(f_0)| = |f_\epsilon(x_0) - f_0(x_0)| = \tfrac12\, f_0^{(2)}(x_0)\, \epsilon^2\, (1 + o(1)),$$
$$|\nu_2(f_\epsilon) - \nu_2(f_0)| = |f_\epsilon'(x_0) - f_0'(x_0)| = f_0^{(2)}(x_0)\, \epsilon\, (1 + o(1)),$$
and some further computation (Jongbloed (1995), (2000)) shows that
$$H^2(f_\epsilon, f_0) = \frac{2\, f_0^{(2)}(x_0)^2}{5\, f_0(x_0)}\, \epsilon^5\, (1 + o(1)).$$
Thus taking $\epsilon \equiv \epsilon_n = c n^{-1/5}$, writing $f_n$ for $f_{\epsilon_n}$, and using our two-point lower bound proposition yields
$$\liminf_{n\to\infty} \inf_{T_n} \max\big\{ E_n\, n^{2/5} |T_n - \nu_1(f_n)|,\; E_0\, n^{2/5} |T_n - \nu_1(f_0)| \big\} \ge \frac14 \Big( \frac{f_0^2(x_0)\, f_0^{(2)}(x_0)}{8^2 \cdot 2 e^2} \Big)^{1/5},$$
and ...
D: Lower bounds: convex decreasing density

$$\liminf_{n\to\infty} \inf_{T_n} \max\big\{ E_n\, n^{1/5} |T_n - \nu_2(f_n)|,\; E_0\, n^{1/5} |T_n - \nu_2(f_0)| \big\} \ge \frac14 \Big( \frac{f_0(x_0)\, f_0^{(2)}(x_0)^3}{4 e} \Big)^{1/5}.$$
We will see tomorrow that the MLE achieves these rates and that the limiting distributions involve exactly these constants.

Other scenarios?
S2: $f_0$ triangular on $[0,1]$? (Degenerate mixing distribution at 1.)
S3: $x_0 \in (a, b)$ where $f_0$ is linear on $(a, b)$?
S4: $x_0$ a bend or kink point for $f_0$: $f_0'(x_0-) < f_0'(x_0+)$?
S5: ?
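As a closing sanity check on the two convex-case constants of Scenario 1: combining the bias terms $\tfrac12 f'' c^2$ and $f'' c$ with $nH^2 \to (2 f''^2/(5 f_0))\, c^5$ and maximizing over $c$ should reproduce them exactly ($f_0$ and $f''$ below are arbitrary test values):

```python
import numpy as np

# Maximize the two-point bounds (1/8) f'' c^2 exp(-(4 f''^2 / (5 f_0)) c^5)
# and (1/4) f'' c exp(-(4 f''^2 / (5 f_0)) c^5) over c and compare with
# (1/4)(f_0^2 f'' / (8^2 2 e^2))^{1/5} and (1/4)(f_0 f''^3 / (4 e))^{1/5}.
f0, d2f0 = 1.3, 0.8   # arbitrary test values for f_0(x_0), f_0''(x_0)

c = np.linspace(1e-4, 5.0, 500001)
decay = np.exp(-(4.0 * d2f0 ** 2 / (5.0 * f0)) * c ** 5)
b1 = np.max(0.125 * d2f0 * c ** 2 * decay)
b2 = np.max(0.25 * d2f0 * c * decay)

t1 = 0.25 * (f0 ** 2 * d2f0 / (8.0 ** 2 * 2.0 * np.e ** 2)) ** 0.2
t2 = 0.25 * (f0 * d2f0 ** 3 / (4.0 * np.e)) ** 0.2
print(b1, t1)
print(b2, t2)
```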