Nonparametric estimation under Shape Restrictions


Nonparametric estimation under Shape Restrictions

Jon A. Wellner
University of Washington, Seattle

Statistical Seminar, Fréjus, France, August 30 - September 3, 2010

Outline: Five Lectures on Shape Restrictions

L1: Monotone functions: maximum likelihood and least squares
L2: Optimality of the MLE of a monotone density (and comparisons?)
L3: Estimation of convex and k-monotone density functions
L4: Estimation of log-concave densities: d = 1 and beyond
L5: More on higher dimensions and some open problems

Outline: Lecture 2

A: Local asymptotic minimax lower bounds
B: Lower bounds for estimation of a monotone density; several scenarios
C: Global lower bounds and upper bounds (briefly)
D: Lower bounds for estimation of a convex density
E: Lower bounds for estimation of a log-concave density

A. Local asymptotic minimax lower bounds

Proposition. (Two-point lower bound) Let $\mathcal{P}$ be a set of probability measures on a measurable space $(\mathcal{X}, \mathcal{A})$, and let $\nu$ be a real-valued function defined on $\mathcal{P}$. Moreover, let $l : [0, \infty) \to [0, \infty)$ be an increasing convex loss function with $l(0) = 0$. Write, for $i = 1, 2$,

$$E_{n,i} f(X_1, \ldots, X_n) = \int f(x_1, \ldots, x_n)\, dP_i(x_1) \cdots dP_i(x_n), \qquad E_{n,i} f(X) = \int f(x)\, dP_i^n(x).$$

Then, for any $P_1, P_2 \in \mathcal{P}$ such that $H(P_1, P_2) < 1$, it follows that

$$\inf_{T_n} \max\left\{ E_{n,1}\, l(|T_n - \nu(P_1)|),\; E_{n,2}\, l(|T_n - \nu(P_2)|) \right\} \ge l\left( \tfrac{1}{4}\, |\nu(P_1) - \nu(P_2)|\, \{1 - H^2(P_1, P_2)\}^{2n} \right). \tag{1}$$

Proof. By Jensen's inequality,

$$E_{n,i}\, l(|T_n - \nu(P_i)|) \ge l(E_{n,i} |T_n - \nu(P_i)|), \qquad i = 1, 2,$$

and hence the left side of (1) is bounded below by

$$l\left( \inf_{T_n} \max\{ E_{n,1} |T_n - \nu(P_1)|,\; E_{n,2} |T_n - \nu(P_2)| \} \right).$$

Thus it suffices to prove the proposition for $l(x) = x$. Let $p_1 \equiv dP_1/d(P_1 + P_2)$, $p_2 \equiv dP_2/d(P_1 + P_2)$, and $\mu \equiv P_1 + P_2$ (or let $p_i$ be the density of $P_i$ with respect to some other convenient dominating measure $\mu$, $i = 1, 2$).

Two Facts:

Fact 1: Suppose $P, Q$ are absolutely continuous with respect to $\mu$, and set

$$H^2(P, Q) \equiv \frac{1}{2} \int \{\sqrt{p} - \sqrt{q}\}^2\, d\mu = 1 - \int \sqrt{pq}\, d\mu \equiv 1 - \rho(P, Q).$$

Then

$$(1 - H^2(P, Q))^2 \le 1 - \left\{ \frac{1}{2} \int |p - q|\, d\mu \right\}^2 \le 2 \int (p \wedge q)\, d\mu.$$

Fact 2: If $P$ and $Q$ are two probability measures on a measurable space $(\mathcal{X}, \mathcal{A})$ and $P^n$ and $Q^n$ denote the corresponding product measures on $(\mathcal{X}^n, \mathcal{A}^n)$ (of $X_1, \ldots, X_n$ i.i.d. as $P$ or $Q$ respectively), then $\rho(P, Q) \equiv \int \sqrt{pq}\, d\mu$ satisfies

$$\rho(P^n, Q^n) = \rho(P, Q)^n. \tag{2}$$

Exercise. Prove Fact 1.
Exercise. Prove Fact 2.
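A quick numerical sanity check of these two facts (my addition, not part of the original slides; the two Bernoulli distributions are arbitrary test choices):

```python
# Check Fact 1 and Fact 2 for P = Bernoulli(0.5), Q = Bernoulli(0.3).
import numpy as np

p = np.array([0.5, 0.5])   # density of P on {0, 1} w.r.t. counting measure
q = np.array([0.7, 0.3])   # density of Q on {0, 1}

rho = np.sum(np.sqrt(p * q))        # affinity rho(P, Q)
H2 = 1.0 - rho                      # squared Hellinger distance H^2(P, Q)
tv = 0.5 * np.sum(np.abs(p - q))    # (1/2) int |p - q| d(mu)
overlap = np.sum(np.minimum(p, q))  # int (p ^ q) d(mu)

# Fact 1: (1 - H^2)^2 <= 1 - tv^2 <= 2 * overlap
assert (1.0 - H2) ** 2 <= 1.0 - tv ** 2 <= 2.0 * overlap

# Fact 2: rho(P^n, Q^n) = rho(P, Q)^n; brute force over {0, 1}^3 for n = 3
rho_3 = sum(np.sqrt(p[i] * q[i] * p[j] * q[j] * p[k] * q[k])
            for i in range(2) for j in range(2) for k in range(2))
print(rho_3, rho ** 3)   # the two printed values coincide
```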

$$\begin{aligned} \max&\left\{ E_{n,1} |T_n - \nu(P_1)|,\; E_{n,2} |T_n - \nu(P_2)| \right\} \\ &\ge \frac{1}{2} \left\{ E_{n,1} |T_n - \nu(P_1)| + E_{n,2} |T_n - \nu(P_2)| \right\} \\ &= \frac{1}{2} \int \left[ |T_n(x) - \nu(P_1)| \prod_{i=1}^n p_1(x_i) + |T_n(x) - \nu(P_2)| \prod_{i=1}^n p_2(x_i) \right] d\mu(x_1) \cdots d\mu(x_n) \\ &\ge \frac{1}{2} \int \left[ |T_n(x) - \nu(P_1)| + |T_n(x) - \nu(P_2)| \right] \left( \prod_{i=1}^n p_1(x_i) \right) \wedge \left( \prod_{i=1}^n p_2(x_i) \right) d\mu(x_1) \cdots d\mu(x_n) \\ &\ge \frac{1}{2} |\nu(P_1) - \nu(P_2)| \int \left( \prod_{i=1}^n p_1(x_i) \right) \wedge \left( \prod_{i=1}^n p_2(x_i) \right) d\mu(x_1) \cdots d\mu(x_n) \\ &\ge \frac{1}{4} |\nu(P_1) - \nu(P_2)|\, \{1 - H^2(P_1^n, P_2^n)\}^2 \qquad \text{by Fact 1} \\ &= \frac{1}{4} |\nu(P_1) - \nu(P_2)|\, \{1 - H^2(P_1, P_2)\}^{2n} \qquad \text{by Fact 2.} \end{aligned}$$

Several scenarios, estimation of $f(x_0)$:

S1: $f(x_0) > 0$, $f'(x_0) < 0$.
S2: $x_0 \in (a, b)$ with $f$ constant on $(a, b)$; in particular, $f(x) = 1_{[0,1]}(x)$, $x_0 \in (0, 1)$.
S3: $f$ is discontinuous at $x_0$.
S4: $f^{(j)}(x_0) = 0$ for $j = 1, \ldots, k - 1$, $f^{(k)}(x_0) \ne 0$.

[Four figures illustrating scenarios S1-S4.]

S1: $f_0(x_0) > 0$, $f_0'(x_0) < 0$. Suppose that we want to estimate $\nu(f) = f(x_0)$ for a fixed $x_0$. Let $f_0$ be the density corresponding to $P_0$, and suppose that $f_0'(x_0) < 0$. To apply our two-point lower bound Proposition we need to construct a sequence of densities $f_n$ that are near $f_0$ in the sense that $nH^2(f_n, f_0) \le A$ for some constant $A$, and $|\nu(f_n) - \nu(f_0)| = b_n^{-1}$ where $b_n \to \infty$. Hence we will try the following choice of $f_n$: for $c > 0$, define

$$f_n(x) = \begin{cases} f_0(x), & x \le x_0 - cn^{-1/3} \text{ or } x > x_0 + cn^{-1/3}, \\ f_0(x_0 - cn^{-1/3}), & x_0 - cn^{-1/3} < x \le x_0, \\ f_0(x_0 + cn^{-1/3}), & x_0 < x \le x_0 + cn^{-1/3}. \end{cases}$$

[Two figures: the perturbation f_n of f_0 in scenario S1.]

It is easy to see that

$$n^{1/3} |\nu(f_n) - \nu(f_0)| = n^{1/3} \left( f_0(x_0 - cn^{-1/3}) - f_0(x_0) \right) \to |f_0'(x_0)|\, c. \tag{3}$$

On the other hand, some calculation shows that

$$\begin{aligned} H^2(f_n, f_0) &= \frac{1}{2} \int_0^\infty \left[ \sqrt{f_n(x)} - \sqrt{f_0(x)} \right]^2 dx = \frac{1}{2} \int_0^\infty \frac{[f_n(x) - f_0(x)]^2}{[\sqrt{f_n(x)} + \sqrt{f_0(x)}]^2}\, dx \\ &= \frac{1}{2} \int_{x_0 - cn^{-1/3}}^{x_0 + cn^{-1/3}} \frac{[f_n(x) - f_0(x)]^2}{[\sqrt{f_n(x)} + \sqrt{f_0(x)}]^2}\, dx \sim \frac{f_0'(x_0)^2\, c^3}{12 f_0(x_0)\, n}. \end{aligned}$$
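These asymptotics are easy to check numerically; here is a small sketch (my addition; $f_0 = $ Exp(1), $x_0 = 1$, $c = 1$ are arbitrary test choices):

```python
# Check (3) and n H^2(f_n, f_0) -> f_0'(x_0)^2 c^3 / (12 f_0(x_0)) for S1.
import numpy as np
from scipy.integrate import quad

f0 = lambda x: np.exp(-x)          # f_0(1) = e^{-1}, f_0'(1) = -e^{-1}
x0, c = 1.0, 1.0

def fn(x, n):
    eps = c * n ** (-1.0 / 3.0)
    if x0 - eps < x <= x0:
        return f0(x0 - eps)
    if x0 < x <= x0 + eps:
        return f0(x0 + eps)
    return f0(x)

for n in [10**3, 10**5, 10**7]:
    eps = c * n ** (-1.0 / 3.0)
    H2, _ = quad(lambda x: 0.5 * (np.sqrt(fn(x, n)) - np.sqrt(f0(x))) ** 2,
                 0.0, 50.0, points=[x0 - eps, x0, x0 + eps])
    print(n ** (1.0 / 3.0) * (f0(x0 - eps) - f0(x0)),   # -> e^{-1} ~ 0.368
          n * H2)                                       # -> e^{-1}/12 ~ 0.031
```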

Now we can combine these two pieces with our two-point lower bound Proposition to find that, for any estimator $T_n$ of $\nu(f) = f(x_0)$ and the loss function $l(x) = |x|$, we have

$$\begin{aligned} \inf_{T_n} \max &\left\{ E_n\, n^{1/3} |T_n - \nu(f_n)|,\; E_0\, n^{1/3} |T_n - \nu(f_0)| \right\} \\ &\ge \frac{1}{4} n^{1/3} (\nu(f_n) - \nu(f_0)) \left\{ 1 - \frac{nH^2(f_n, f_0)}{n} \right\}^{2n} \\ &= \frac{1}{4} n^{1/3} \left( f_0(x_0 - cn^{-1/3}) - f_0(x_0) \right) \left\{ 1 - \frac{nH^2(f_n, f_0)}{n} \right\}^{2n} \\ &\to \frac{1}{4} |f_0'(x_0)|\, c \exp\left( -2\, \frac{f_0'(x_0)^2}{12 f_0(x_0)}\, c^3 \right) = \frac{1}{4} |f_0'(x_0)|\, c \exp\left( - \frac{f_0'(x_0)^2}{6 f_0(x_0)}\, c^3 \right). \end{aligned}$$

We now choose $c$ to maximize the quantity on the right side. It is easily seen that the maximum is achieved when

$$c = c_0 \equiv \left( \frac{2 f_0(x_0)}{f_0'(x_0)^2} \right)^{1/3}.$$

This yields

$$\liminf_{n \to \infty}\, \inf_{T_n} \max\left\{ E_n\, n^{1/3} |T_n - \nu(f_n)|,\; E_0\, n^{1/3} |T_n - \nu(f_0)| \right\} \ge \frac{e^{-1/3}}{4} \left( 2 f_0(x_0)\, |f_0'(x_0)| \right)^{1/3}.$$

This lower bound has the appropriate structure: the (nonparametric) MLE $\hat{f}_n(x_0)$ converges at rate $n^{1/3}$, and its limiting distribution has the same dependence on $f_0(x_0)$ and $f_0'(x_0)$.
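For completeness, the calculus behind $c_0$ (my addition): writing the right side as $\frac14 |f_0'(x_0)|\, c\, e^{-Kc^3}$ with $K \equiv f_0'(x_0)^2 / (6 f_0(x_0))$,

$$\frac{d}{dc}\left[ c\, e^{-Kc^3} \right] = (1 - 3Kc^3)\, e^{-Kc^3} = 0 \iff c^3 = \frac{1}{3K} = \frac{2 f_0(x_0)}{f_0'(x_0)^2},$$

and substituting $c_0$ back in gives $\frac14 |f_0'(x_0)|\, c_0\, e^{-1/3} = \frac{e^{-1/3}}{4} (2 f_0(x_0) |f_0'(x_0)|)^{1/3}$.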

Furthermore, note that for $n$ sufficiently large,

$$\sup_{f : H(f, f_0) \le Cn^{-1/2}} E_f\, n^{1/3} |T_n - \nu(f)| \ge \max\left\{ E_n\, n^{1/3} |T_n - \nu(f_n)|,\; E_0\, n^{1/3} |T_n - \nu(f_0)| \right\}$$

if $C^2 > 2A \equiv 2 f_0'(x_0)^2 c_0^3 / (12 f_0(x_0))$, and hence we conclude that

$$\liminf_{n \to \infty}\, \inf_{T_n} \sup_{f : H(f, f_0) \le Cn^{-1/2}} E_f\, n^{1/3} |T_n - \nu(f)| \ge \frac{e^{-1/3}}{4} \left( 2 f_0(x_0) |f_0'(x_0)| \right)^{1/3} = \frac{e^{-1/3}}{4}\, 2^{2/3} \left( \frac{1}{2} f_0(x_0) |f_0'(x_0)| \right)^{1/3}$$

for all $C$ sufficiently large. Comparison of $E|\mathbb{S}(0)|$ with $\frac{e^{-1/3}\, 2^{2/3}}{4} = 0.284356$? From Groeneboom and Wellner (2001), $E|\mathbb{S}(0)| = 2E|Z| = 2(0.41273655) = 0.825473$.

S2: $x_0 \in (a, b)$ with $f_0(x) = f_0(x_0) > 0$ for all $x \in (a, b)$. To apply our two-point lower bound Proposition we again need to construct a sequence of densities $f_n$ that are near $f_0$ in the sense that $nH^2(f_n, f_0) \le A$ for some constant $A$, and $|\nu(f_n) - \nu(f_0)| = b_n^{-1}$ where $b_n \to \infty$. In this scenario we define a sequence of densities $\{f_n\}$ by

$$f_n(x) = \begin{cases} f_0(x), & x \le a_n, \\ f_0(x) + \dfrac{c}{\sqrt{n}}\, \dfrac{b - a}{x_0 - a}, & a_n < x \le x_0, \\ f_0(x) - \dfrac{c}{\sqrt{n}}\, \dfrac{b - a}{b - x_0}, & x_0 < x < b_n, \\ f_0(x), & x \ge b_n, \end{cases}$$

where

$$a_n \equiv \sup\{ x : f_0(x) \ge f_0(x_0) + cn^{-1/2} (b - a)/(x_0 - a) \},$$
$$b_n \equiv \inf\{ x : f_0(x) < f_0(x_0) - cn^{-1/2} (b - a)/(b - x_0) \}.$$

The intervals $(a_n, a)$ and $(b, b_n)$ may be empty if $f(a-) > f(a+)$ and/or $f(b+) < f(b-)$ and $n$ is large.

[Figure: the perturbation f_n of f_0 in scenario S2.]

It is easy to see that

$$\sqrt{n}\, |\nu(f_n) - \nu(f_0)| = \sqrt{n}\, |f_n(x_0) - f_0(x_0)| = c\, \frac{b - a}{x_0 - a}. \tag{4}$$

On the other hand, some calculation shows that

$$H^2(f_n, f_0) \le \frac{c^2 (b - a)^2}{4n f_0(x_0)} \left\{ \frac{1}{x_0 - a} + \frac{1}{b - x_0} \right\} = \frac{c^2 (b - a)^3}{4n f_0(x_0)(x_0 - a)(b - x_0)}.$$

Combining these two pieces with the two-point lower bound Proposition we find that, in scenario 2, for any estimator $T_n$ of $\nu(f) = f(x_0)$ and the loss function $l(x) = |x|$ we have

$$\begin{aligned} \inf_{T_n} \max &\left\{ E_n \sqrt{n}\, |T_n - \nu(f_n)|,\; E_0 \sqrt{n}\, |T_n - \nu(f_0)| \right\} \\ &\ge \frac{1}{4} \sqrt{n}\, (\nu(f_n) - \nu(f_0)) \left\{ 1 - \frac{nH^2(f_n, f_0)}{n} \right\}^{2n} = \frac{1}{4}\, c\, \frac{b - a}{x_0 - a} \left\{ 1 - \frac{nH^2(f_n, f_0)}{n} \right\}^{2n} \\ &\ge \frac{1}{4}\, c\, \frac{b - a}{x_0 - a} \exp\left( - \frac{c^2 (b - a)^3}{2 f_0(x_0)(x_0 - a)(b - x_0)} \right) (1 + o(1)) \\ &\equiv A c \exp(-B c^2)(1 + o(1)). \end{aligned}$$

We now choose $c$ to maximize the quantity on the right side. It is easily seen that the maximum is achieved when $c = c_0 \equiv 1/\sqrt{2B}$, with

$$A c_0 \exp(-B c_0^2) = A c_0\, e^{-1/2} \qquad \text{and} \qquad c_0 = \left( \frac{f_0(x_0)(x_0 - a)(b - x_0)}{(b - a)^3} \right)^{1/2}.$$

This yields

$$\liminf_{n \to \infty}\, \inf_{T_n} \max\left\{ E_n \sqrt{n}\, |T_n - \nu(f_n)|,\; E_0 \sqrt{n}\, |T_n - \nu(f_0)| \right\} \ge \frac{e^{-1/2}}{4} \sqrt{\frac{f_0(x_0)}{b - a}} \sqrt{\frac{b - x_0}{x_0 - a}}.$$

Repeating this argument with the right-continuous version of the sequence $\{f_n\}$ yields a similar bound, but with the factor $(b - x_0)/(x_0 - a)$ replaced by $(x_0 - a)/(b - x_0)$.

Taking the maximum of the two lower bounds yields the last display with the right side replaced by

$$\frac{e^{-1/2}}{4} \sqrt{\frac{f_0(x_0)}{b - a}}\, \max\left\{ \sqrt{\frac{b - x_0}{x_0 - a}},\; \sqrt{\frac{x_0 - a}{b - x_0}} \right\} \ge \frac{e^{-1/2}}{4} \sqrt{\frac{f_0(x_0)}{b - a}} \left\{ \sqrt{\frac{b - x_0}{x_0 - a}}\, \frac{x_0 - a}{b - a} + \sqrt{\frac{x_0 - a}{b - x_0}}\, \frac{b - x_0}{b - a} \right\}.$$

This lower bound has the appropriate structure in the sense that the MLE $\hat{f}_n(x_0)$ converges at rate $n^{1/2}$ and the limiting behavior of the MLE has exactly the same dependence on $f_0(x_0)$, $b - a$, $x_0 - a$, and $b - x_0$.

Theorem. (Carolan and Dykstra, 1999) If $f_0$ is decreasing with $f_0$ constant on $(a, b)$, the maximal open interval containing $x_0$, then, with $p \equiv f_0(x_0)(b - a) = P_0(a < X < b)$,

$$\sqrt{n}\, (\hat{f}_n(x_0) - f_0(x_0)) \to_d \sqrt{\frac{f_0(x_0)}{b - a}} \left\{ \sqrt{p}\, Z + \mathbb{S}\left( \frac{x_0 - a}{b - a} \right) \right\}$$

where $Z \sim N(0, 1)$ and $\mathbb{S}$ is the process of left-derivatives of the least concave majorant $\hat{\mathbb{U}}$ of a Brownian bridge process $\mathbb{U}$ independent of $Z$. Note that by using Groeneboom (1983),

$$\begin{aligned} E \left| \sqrt{\frac{f_0(x_0)}{b - a}} \left\{ \sqrt{p}\, Z + \mathbb{S}\left( \frac{x_0 - a}{b - a} \right) \right\} \right| &\ge \sqrt{\frac{f_0(x_0)}{b - a}}\, E \left| \mathbb{S}\left( \frac{x_0 - a}{b - a} \right) \right| \\ &= \sqrt{\frac{f_0(x_0)}{b - a}} \sqrt{\frac{2}{\pi}}\; \frac{(b - x_0)^{3/2} (x_0 - a)^{-1/2} + (x_0 - a)^{3/2} (b - x_0)^{-1/2}}{b - a}. \end{aligned}$$
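The slope process $\mathbb{S}$ is easy to simulate. The following Monte Carlo sketch (my addition) approximates $E|\mathbb{S}(t)|$ by computing the least concave majorant of a discretized Brownian bridge; the second printed value is the closed form from the display above (with $a = 0$, $b = 1$, $t = x_0$), for comparison. Grid size and replication count are arbitrary simulation choices.

```python
# Estimate E|S(t)| for the LCM slope process of a Brownian bridge.
import numpy as np

rng = np.random.default_rng(0)
m, reps, t = 1000, 1000, 0.3
grid = np.linspace(0.0, 1.0, m + 1)

def lcm_left_slope(x, y, t):
    """Left derivative at t of the least concave majorant of points (x, y)."""
    hull = [0]                      # vertex indices of the upper (concave) hull
    for i in range(1, len(x)):
        # pop the last vertex while it lies below the chord to point i
        while len(hull) >= 2 and \
              (y[hull[-1]] - y[hull[-2]]) * (x[i] - x[hull[-2]]) <= \
              (y[i] - y[hull[-2]]) * (x[hull[-1]] - x[hull[-2]]):
            hull.pop()
        hull.append(i)
    hx = x[hull]
    r = np.searchsorted(hx, t)      # hull segment (hx[r-1], hx[r]] contains t
    j, k = hull[r - 1], hull[r]
    return (y[k] - y[j]) / (x[k] - x[j])

vals = np.empty(reps)
for b in range(reps):
    W = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, m ** -0.5, m))))
    U = W - grid * W[-1]            # Brownian bridge on [0, 1]
    vals[b] = abs(lcm_left_slope(grid, U, t))

print(np.mean(vals))                                             # simulated E|S(t)|
print(np.sqrt(2.0 / np.pi) * (t**2 + (1 - t)**2) / np.sqrt(t * (1 - t)))
```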

S3: $f_0(x_0-) > f_0(x_0+)$. In this case we consider estimation of the functional $\nu(f) = (f(x_0+) + f(x_0-))/2 \equiv \bar{f}(x_0)$. To apply our two-point lower bound Proposition, consider the following choice of $f_n$: for $c > 0$, define

$$\tilde{f}_n(x) = \begin{cases} f_0(x), & x \le x_0 \text{ or } x > b_n, \\ f_0(x_0-) + (x - x_0)\, \dfrac{f_0(b_n) - f_0(x_0-)}{c/n}, & x_0 < x \le b_n, \end{cases}$$

where $b_n \equiv x_0 + c/n$. Then define $f_n = \tilde{f}_n / \int_0^\infty \tilde{f}_n(y)\, dy$. In this case

$$\nu(f_n) - \nu(f_0) = \bar{f}_n(x_0) - \bar{f}_0(x_0) = \frac{f_0(x_0-)}{1 + o(1)} - \frac{f_0(x_0+) + f_0(x_0-)}{2} = \frac{1}{2} (f_0(x_0-) - f_0(x_0+)) + o(1) \equiv d + o(1).$$

[Figure: the perturbation f_n of f_0 in scenario S3.]

Some calculation shows that

$$H^2(f_n, f_0) = \frac{c r^2 (1 + o(1))}{n}, \qquad \text{where} \qquad r^2 \equiv \frac{\{\sqrt{f_0(x_0-)} - \sqrt{f_0(x_0+)}\}^2 \{3\sqrt{f_0(x_0-)} + \sqrt{f_0(x_0+)}\}}{12 \{\sqrt{f_0(x_0-)} + \sqrt{f_0(x_0+)}\}}.$$

Combining these pieces with the two-point lower bound yields

$$\begin{aligned} \inf_{T_n} \max &\left\{ E_n |T_n - \nu(f_n)|,\; E_0 |T_n - \nu(f_0)| \right\} \\ &\ge \frac{1}{4} |\nu(f_n) - \nu(f_0)| \left\{ 1 - \frac{nH^2(f_n, f_0)}{n} \right\}^{2n} = \frac{1}{8} (f_0(x_0-) - f_0(x_0+))(1 + o(1)) \left\{ 1 - \frac{c r^2 (1 + o(1))}{n} \right\}^{2n} \\ &\to \frac{d}{4} \exp(-2 c r^2) = \frac{d}{4e} \qquad \text{by choosing } c = 1/(2r^2). \end{aligned}$$

This corresponds to the following theorem for the MLE $\hat{f}_n$:

Theorem. (Anevski and Hössjer, 2002; W, 2007) If $x_0$ is a discontinuity point of $f_0$, $d \equiv (f_0(x_0-) - f_0(x_0+))/2$ with $f_0(x_0+) > 0$, and $\bar{f}(x_0) \equiv (f_0(x_0-) + f_0(x_0+))/2$, then

$$\hat{f}_n(x_0) - \bar{f}(x_0) \to_d \mathbb{R}(0)$$

where $h \mapsto \mathbb{R}(h)$ is the process of left-derivatives of the least concave majorant $\hat{\mathbb{M}}$ of the process $\mathbb{M}$ defined by

$$\mathbb{M}(h) = \mathbb{N}_0(h) - d|h| \equiv \begin{cases} \mathbb{N}(f_0(x_0+)h) - f_0(x_0+)h - dh, & h \ge 0, \\ \mathbb{N}(f_0(x_0-)h) - f_0(x_0-)h + dh, & h < 0, \end{cases}$$

where $\mathbb{N}$ is a standard (rate 1) two-sided Poisson process on $\mathbb{R}$.


S4: $f_0(x_0) > 0$, $f_0^{(j)}(x_0) = 0$ for $j = 1, 2, \ldots, p - 1$, and $f_0^{(p)}(x_0) \ne 0$. In this case, consider the perturbation $f_\epsilon$ of $f_0$ given for $\epsilon > 0$ by

$$f_\epsilon(x) = \begin{cases} f_0(x), & x \le x_0 - \epsilon \text{ or } x > x_0 + \epsilon, \\ f_0(x_0 - \epsilon), & x_0 - \epsilon < x \le x_0, \\ f_0(x_0 + \epsilon), & x_0 < x \le x_0 + \epsilon. \end{cases}$$

Then for $\nu(f) = f(x_0)$,

$$|\nu(f_\epsilon) - \nu(f_0)| \sim \frac{|f_0^{(p)}(x_0)|}{p!}\, \epsilon^p,$$

$$H^2(f_\epsilon, f_0) \le A_p\, \frac{f_0^{(p)}(x_0)^2}{f_0(x_0)}\, \epsilon^{2p+1} \equiv B_p\, \epsilon^{2p+1}, \qquad A_p \equiv \frac{p^2}{2 (p!)^2 (2p^2 + 3p + 1)}.$$
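A sketch of where $A_p$ comes from (my addition), using $f_0(x) \approx f_0(x_0) + \frac{f_0^{(p)}(x_0)}{p!}(x - x_0)^p$ near $x_0$ and $H^2(f_\epsilon, f_0) \approx \frac{1}{8 f_0(x_0)} \int (f_\epsilon - f_0)^2\, dx$:

$$\int_{x_0 - \epsilon}^{x_0} [f_\epsilon(x) - f_0(x)]^2\, dx \approx \frac{f_0^{(p)}(x_0)^2}{(p!)^2} \int_0^\epsilon (\epsilon^p - u^p)^2\, du = \frac{f_0^{(p)}(x_0)^2}{(p!)^2} \cdot \frac{2p^2\, \epsilon^{2p+1}}{(p+1)(2p+1)}.$$

The interval $(x_0, x_0 + \epsilon]$ contributes the same amount, and since $(p+1)(2p+1) = 2p^2 + 3p + 1$, collecting the factors gives $A_p = \frac{p^2}{2(p!)^2(2p^2 + 3p + 1)}$; for $p = 1$ this is $1/12$, matching the S1 calculation.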

[Figure: the perturbation f_ε of f_0 in scenario S4.]

Choosing $\epsilon = cn^{-1/(2p+1)}$, plugging into our two-point bound, and optimizing with respect to $c$ yields

$$\begin{aligned} \inf_{T_n} \max &\left\{ n^{p/(2p+1)} E_n |T_n - \nu(f_n)|,\; n^{p/(2p+1)} E_0 |T_n - \nu(f_0)| \right\} \\ &\ge \frac{1}{4} n^{p/(2p+1)} |\nu(f_n) - \nu(f_0)| \left\{ 1 - \frac{nH^2(f_n, f_0)}{n} \right\}^{2n} \\ &\to \frac{1}{4}\, \frac{|f_0^{(p)}(x_0)|}{p!}\, c^p \exp\left( -2 B_p c^{2p+1} \right) = D_p \left( f_0(x_0)^p\, |f_0^{(p)}(x_0)| \right)^{1/(2p+1)} \end{aligned}$$

taking

$$c = \left( \frac{p}{2(2p+1) B_p} \right)^{1/(2p+1)}, \qquad \text{with} \qquad D_p \equiv \frac{1}{4\, p!} \left( \frac{p}{2(2p+1) A_p} \right)^{p/(2p+1)} \exp(-p/(2p+1)).$$
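The optimization, spelled out (my addition): maximizing $g(c) = c^p e^{-2B_p c^{2p+1}}$ over $c > 0$,

$$g'(c) = \left( p - 2(2p+1) B_p c^{2p+1} \right) c^{p-1} e^{-2 B_p c^{2p+1}} = 0 \iff c^{2p+1} = \frac{p}{2(2p+1) B_p},$$

so that $e^{-2B_p c_0^{2p+1}} = e^{-p/(2p+1)}$; substituting $B_p = A_p f_0^{(p)}(x_0)^2 / f_0(x_0)$ then yields the factor $(f_0(x_0)^p |f_0^{(p)}(x_0)|)^{1/(2p+1)}$ and the constant $D_p$. For $p = 1$ this reduces to the S1 choice $c_0^3 = 2 f_0(x_0)/f_0'(x_0)^2$.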

The resulting lower bound corresponds to the following theorem for $\hat{f}_n$:

Theorem. (Wright (1981); Leurgans (1982); Anevski and Hössjer (2002)) Suppose that $f_0^{(j)}(x_0) = 0$ for $j = 1, \ldots, p - 1$, $f_0^{(p)}(x_0) \ne 0$, and $f_0^{(p)}$ is continuous at $x_0$. Then

$$n^{p/(2p+1)} \left( \hat{f}_n(x_0 + n^{-1/(2p+1)} t) - f_0(x_0) \right) \to_d C_p\, \mathbb{S}_p(t)$$

where $\mathbb{S}_p$ is the process given by the left-derivatives of the least concave majorant $\hat{Y}_p$ of $Y_p(t) \equiv W(t) - |t|^{p+1}$, and where

$$C_p = \left( f_0(x_0)^p\, |f_0^{(p)}(x_0)| / (p + 1)! \right)^{1/(2p+1)}.$$

In particular,

$$n^{p/(2p+1)} \left( \hat{f}_n(x_0) - f_0(x_0) \right) \to_d C_p\, \mathbb{S}_p(0).$$

Proof. Switching + (argmax-)continuous mapping theorem.

Summary: The MLE $\hat{f}_n$ is locally adaptive to $f_0$, at least in scenarios S1-S4.

S1: rate $n^{1/3}$; localization $n^{-1/3}$; constants agree with minimax lower bound.
S2: rate $n^{1/2}$; localization $n^0 = 1$ (none); constants agree with minimax bound.
S3: rate $n^0 = 1$; localization $n^{-1}$; constants agree(?).
S4: rate $n^{p/(2p+1)}$; localization $n^{-1/(2p+1)}$; constants agree.

C: Global lower and upper bounds (briefly)

Birgé (1986, 1989) expresses the global optimality of $\hat{f}_n$ in terms of its $L_1$ risk as follows.

Lower bound: Birgé (1987). Let $\mathcal{F}_M$ denote the class of all decreasing densities $f$ on $[0, 1]$ satisfying $f \le M$ with $M > 1$, and let the minimax risk for $\mathcal{F}_M$ with respect to the $L_1$ metric $d_1(f, g) \equiv \int |f(x) - g(x)|\, dx$ based on $n$ observations be

$$R_M(d_1, n) \equiv \inf_{\hat{f}_n} \sup_{f \in \mathcal{F}_M} E_f\, d_1(\hat{f}_n, f).$$

Then there is an absolute constant $C$ such that

$$R_M(d_1, n) \ge C \left( \frac{\log M}{n} \right)^{1/3}.$$

Upper bound, Grenander: Birgé (1989). Let $\hat{f}_n$ denote the Grenander estimator of $f \in \mathcal{F}_M$. Then

$$\sup_{f \in \mathcal{F}_M} E_f\, d_1(\hat{f}_n, f) \le 4.75 \left( \frac{\log M}{n} \right)^{1/3}.$$
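Birgé's $n^{-1/3}$ rate is visible in simulations. Below is a minimal Monte Carlo sketch (my addition, not from the lectures): it computes the Grenander estimator as the slope of the least concave majorant of the empirical distribution function and its exact $L_1$ error for $f(x) = 2(1 - x)$ on $[0, 1]$ (so $M = 2$). Sample sizes and replication counts are arbitrary choices.

```python
# Grenander estimator via the least concave majorant of the empirical CDF,
# with its exact L1 error against f(x) = 2(1 - x) on [0, 1].
import numpy as np

rng = np.random.default_rng(1)

def grenander(x):
    """Knots kx and slopes of the LCM of the empirical CDF of the data x;
    the estimator equals slopes[i] on the interval (kx[i], kx[i+1]]."""
    n = len(x)
    px = np.concatenate(([0.0], np.sort(x)))
    py = np.arange(n + 1) / n
    hull = [0]
    for i in range(1, n + 1):
        # pop the last vertex while it lies below the chord to point i
        while len(hull) >= 2 and \
              (py[hull[-1]] - py[hull[-2]]) * (px[i] - px[hull[-2]]) <= \
              (py[i] - py[hull[-2]]) * (px[hull[-1]] - px[hull[-2]]):
            hull.pop()
        hull.append(i)
    idx = np.array(hull)
    kx = px[idx]
    return kx, np.diff(py[idx]) / np.diff(kx)

def seg_l1(h, a, b):
    """Exact value of int_a^b |h - 2(1 - x)| dx for a constant level h."""
    F = lambda u: h * u - (2.0 * u - u * u)     # antiderivative of h - f
    xs = 1.0 - h / 2.0                          # crossing point: h = f(xs)
    if a < xs < b:
        return abs(F(xs) - F(a)) + abs(F(b) - F(xs))
    return abs(F(b) - F(a))

def l1_error(n):
    x = 1.0 - np.sqrt(1.0 - rng.random(n))      # inverse-CDF sampling from f
    kx, slopes = grenander(x)
    err = sum(seg_l1(h, a, b) for h, a, b in zip(slopes, kx[:-1], kx[1:]))
    return err + (1.0 - kx[-1]) ** 2            # fhat = 0 beyond max(x)

for n in [100, 1000, 10000]:
    print(n, np.mean([l1_error(n) for _ in range(50)]),
          4.75 * (np.log(2.0) / n) ** (1.0 / 3.0))   # Birge's upper bound
```

The empirical risk should decay roughly like $n^{-1/3}$, comfortably below the $4.75 (\log M / n)^{1/3}$ bound.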

Birgé's bounds are complemented by the remarkable results of Groeneboom (1985) and Groeneboom, Hooghiemstra, and Lopuhaä (1999). Set

$$V(t) \equiv \sup\{ s : W(s) - (s - t)^2 \text{ is maximal} \}$$

where $W$ is a standard two-sided Brownian motion process starting from 0.

Theorem. (Groeneboom (1985), GHL (1999)) Suppose that $f$ is a decreasing density on $[0, 1]$ satisfying:

A1. $0 < f(1) \le f(y) \le f(x) \le f(0) < \infty$ for $0 \le x \le y \le 1$.
A2. $0 < \inf_{0 < x < 1} |f'(x)| \le \sup_{0 < x < 1} |f'(x)| < \infty$.
A3. $\sup_{0 < x < 1} |f''(x)| < \infty$.

Then, with $\mu \equiv 2 E|V(0)| \int_0^1 |\tfrac{1}{2} f'(x) f(x)|^{1/3}\, dx$,

$$n^{1/6} \left\{ n^{1/3} \int_0^1 |\hat{f}_n(x) - f(x)|\, dx - \mu \right\} \to_d \sigma Z \sim N(0, \sigma^2)$$

where $\sigma^2 \equiv 8 \int_0^\infty \mathrm{Cov}(|V(0)|,\, |V(t)| - t)\, dt$.

D: Lower bounds: convex decreasing density

Now consider estimation of a convex decreasing density $f$ on $[0, \infty)$. (Original motivation: Hampel's (1987) bird-migration problem.) Since $f'$ exists almost everywhere, we are now interested in estimation of $\nu_1(f) = f(x_0)$ and $\nu_2(f) = f'(x_0)$. We let $\mathcal{D}_2$ denote the class of all convex decreasing densities on $\mathbb{R}^+$. Note that every $f \in \mathcal{D}_2$ can be written as a scale mixture of the triangular (or Beta(1, 2)) density: if $f \in \mathcal{D}_2$, then

$$f(x) = \int_0^\infty 2 y^{-1} (1 - x/y)_+\, dG(y)$$

for some (mixing) distribution $G$ on $[0, \infty)$. This corresponds to the fact that a monotone decreasing density $f \in \mathcal{D} \equiv \mathcal{D}_1$ can be written as a scale mixture of the Uniform(0, 1) (or Beta(1, 1)) density: if $f \in \mathcal{D}_1$, then

$$f(x) = \int_0^\infty y^{-1} 1_{[0, y]}(x)\, dG(y)$$

for some distribution $G$ on $[0, \infty)$.
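As a concrete check of both representations (my addition): take $f(x) = e^{-x}$, which lies in both $\mathcal{D}_1$ and $\mathcal{D}_2$. For $\mathcal{D}_1$ take $dG(y) = y e^{-y}\, dy$ (a Gamma(2, 1) mixing distribution), and for $\mathcal{D}_2$ take $dG(y) = \frac{1}{2} y^2 e^{-y}\, dy$ (Gamma(3, 1)):

$$\int_0^\infty y^{-1} 1_{[0, y]}(x)\, y e^{-y}\, dy = \int_x^\infty e^{-y}\, dy = e^{-x}, \qquad \int_0^\infty \frac{2}{y}\left( 1 - \frac{x}{y} \right)_+ \frac{y^2 e^{-y}}{2}\, dy = \int_x^\infty (y - x) e^{-y}\, dy = e^{-x}.$$

In general one can recover $G$ from $f$ by $dG(y) = -y\, f'(y)\, dy$ in $\mathcal{D}_1$ and $dG(y) = \tfrac{1}{2} y^2\, df'(y)$ in $\mathcal{D}_2$.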

Scenario 1: Suppose that $f_0 \in \mathcal{D}_2$ and $x_0 \in (0, \infty)$ satisfy $f_0(x_0) > 0$, $f_0''(x_0) > 0$, and $f_0''$ is continuous at $x_0$. To establish lower bounds, consider the perturbations $\tilde{f}_\epsilon$ of $f_0$ given by

$$\tilde{f}_\epsilon(x) = \begin{cases} f_0(x_0 - \epsilon c_\epsilon) + (x - x_0 + \epsilon c_\epsilon)\, f_0'(x_0 - \epsilon c_\epsilon), & x \in (x_0 - \epsilon c_\epsilon, x_0 - \epsilon), \\ f_0(x_0 + \epsilon) + (x - x_0 - \epsilon)\, f_0'(x_0 + \epsilon), & x \in (x_0 - \epsilon, x_0 + \epsilon), \\ f_0(x), & \text{elsewhere;} \end{cases}$$

here $c_\epsilon$ is chosen so that $\tilde{f}_\epsilon$ is continuous at $x_0 - \epsilon$. Now define $f_\epsilon$ by

$$f_\epsilon(x) = \tilde{f}_\epsilon(x) + \tau_\epsilon (x_0 - \epsilon - x)\, 1_{[0, x_0 - \epsilon]}(x)$$

with $\tau_\epsilon$ chosen so that $f_\epsilon$ integrates to 1.

[Two figures: the perturbation f_ε of f_0 for a convex decreasing density.]

Now

$$|\nu_1(f_\epsilon) - \nu_1(f_0)| = |f_\epsilon(x_0) - f_0(x_0)| \sim \tfrac{1}{2} f_0^{(2)}(x_0)\, \epsilon^2 (1 + o(1)),$$
$$|\nu_2(f_\epsilon) - \nu_2(f_0)| = |f_\epsilon'(x_0) - f_0'(x_0)| \sim f_0^{(2)}(x_0)\, \epsilon\, (1 + o(1)),$$

and some further computation (Jongbloed (1995), (2000)) shows that

$$H^2(f_\epsilon, f_0) = \frac{2 f_0^{(2)}(x_0)^2}{5 f_0(x_0)}\, \epsilon^5 (1 + o(1)).$$

Thus taking $\epsilon \equiv \epsilon_n = cn^{-1/5}$, writing $f_n$ for $f_{\epsilon_n}$, and using our two-point lower bound proposition yields

$$\liminf_{n \to \infty}\, \inf_{T_n} \max\left\{ E_n\, n^{2/5} |T_n - \nu_1(f_n)|,\; E_0\, n^{2/5} |T_n - \nu_1(f_0)| \right\} \ge \frac{1}{4} \left( \frac{f_0^2(x_0)\, f_0^{(2)}(x_0)}{8^2 \cdot 2\, e^2} \right)^{1/5}.$$
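Here is the optimization behind this constant as I reconstruct it (my addition): with $\epsilon_n = cn^{-1/5}$, the displays above give $n^{2/5} |\nu_1(f_n) - \nu_1(f_0)| \to \frac{1}{2} f_0^{(2)}(x_0) c^2$ and $2nH^2(f_n, f_0) \to \frac{4 f_0^{(2)}(x_0)^2}{5 f_0(x_0)} c^5$, so the two-point bound behaves like

$$\frac{f_0^{(2)}(x_0)}{8}\, c^2 \exp\left( - \frac{4 f_0^{(2)}(x_0)^2}{5 f_0(x_0)}\, c^5 \right),$$

which is maximized at $c^5 = f_0(x_0)/(2 f_0^{(2)}(x_0)^2)$, with maximal value $\frac{1}{8} \left( f_0^2(x_0) f_0^{(2)}(x_0)/(4e^2) \right)^{1/5} = \frac{1}{4} \left( f_0^2(x_0) f_0^{(2)}(x_0)/(8^2 \cdot 2\, e^2) \right)^{1/5}$. The bound for $\nu_2$ below follows in the same way.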

Similarly,

$$\liminf_{n \to \infty}\, \inf_{T_n} \max\left\{ E_n\, n^{1/5} |T_n - \nu_2(f_n)|,\; E_0\, n^{1/5} |T_n - \nu_2(f_0)| \right\} \ge \frac{1}{4} \left( \frac{f_0(x_0)\, f_0^{(2)}(x_0)^3}{4 e} \right)^{1/5}.$$

We will see tomorrow that the MLE achieves these rates and that the limiting distributions involve exactly these constants.

Other Scenarios?

S2: $f_0$ triangular on $[0, 1]$? (Degenerate mixing distribution at 1.)
S3: $x_0 \in (a, b)$ where $f_0$ is linear on $(a, b)$?
S4: $x_0$ a bend or kink point of $f_0$: $f_0'(x_0-) < f_0'(x_0+)$?
S5: ?