Lower Bounds for the Integrated Risk in Nonparametric Density and Regression Estimation

By Mark G. Low

Technical Report No. 223
October 1989

Department of Statistics
University of California
Berkeley, California


Abstract

Lower bounds for the minimax risk are found for density and regression functions satisfying a uniform Lipschitz condition. The measure of loss is integrated squared error or squared Hellinger distance. Ratios of known upper bounds to these lower bounds are shown to be as small as two in a specific example.

0. Introduction

Lower bounds for the asymptotic minimax risk under squared error loss have recently been found for a variety of nonparametric problems. The pointwise estimation of regression functions or densities has received particular attention. For a large class of models, the ratio of the maximum risk of the best linear estimator to the minimax risk has been shown to be close to one. (See Brown and Farrell (1987), Brown and Low (1988), Donoho and Liu (1987, 1988).)

When interest focuses on estimating an entire density or regression function, the loss functions typically considered are L1 or integrated squared error. The L1 approach to density estimation has been summarized by Devroye and Györfi (1985). For integrated squared error loss, Stone (1982) established best asymptotic rates for nonparametric regression and density estimation. For certain ellipsoidal parameter spaces, Efroimovich and Pinsker (1982) and Nussbaum (1985) found first order asymptotically optimal solutions.

In this paper we consider density or regression functions satisfying a uniform Lipschitz condition

(0.1)  |f(x) - f(y)| ≤ M |x - y|.

We denote by R(M) the set of functions defined on [-1/2, 1/2] which satisfy (0.1). The subset of all density functions which belong to R(M) will be written D(M). These parameter spaces are not ellipsoidal.

In Section 1 we derive lower bounds for the minimax risk over the classes D(M) and R(M) for density estimation and nonparametric regression respectively. For density estimation we consider both integrated squared error loss and squared Hellinger distance loss. For nonparametric regression we consider only integrated squared error. We have restricted attention to densities and regression functions satisfying (0.1) so that we can compare these lower bounds to known upper bounds. This is done in Section 2.
We should point out, however, that the method we use to construct lower bounds can be applied in a variety of other contexts, even if sometimes more ingenuity is needed. Essential to our arguments is a knowledge of good lower bounds for the corresponding pointwise estimation problem. The proof of Theorem 1 then shows how to connect the pointwise estimation problem to the global estimation problem. The results in this paper should therefore be understood as part of an ongoing effort to find general techniques for bounding the minimax risk in nonparametric problems. See for example Donoho and Johnstone (1989). The contribution of this paper is to show how to connect local problems to global problems.
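The tent-function perturbations used to prove the lower bounds in Section 1 can be previewed numerically. The following sketch is not part of the paper; the values of M, theta, and the grid are illustrative assumptions. It perturbs the uniform density on [-1/2, 1/2] by a single tent and checks both the Lipschitz condition (0.1) and that the result is a valid density:

```python
import numpy as np

# A single tent perturbation of the uniform density on [-1/2, 1/2].
# M (the Lipschitz constant) and theta (the tent height) are
# illustrative choices, not values taken from the paper.
M, theta = 4.0, 0.5
x = np.linspace(-0.5, 0.5, 100001)
dx = x[1] - x[0]

tent = np.maximum(theta - M * np.abs(x), 0.0)   # slope +-M, integral theta^2 / M
C = 1 + theta**2 / M                            # exact normalizing constant
f = (1 + tent) / C                              # candidate density

integral = ((f[:-1] + f[1:]) / 2).sum() * dx    # trapezoidal rule
slope = np.abs(np.diff(f)).max() / dx           # empirical Lipschitz constant

assert abs(integral - 1) < 1e-9                 # integrates to one
assert slope <= M                               # satisfies (0.1)
print("density in D(M):", round(integral, 6), round(slope, 3))
```

Dividing by the normalizer C ≥ 1 only shrinks the slope, which is why the perturbed function remains in D(M).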

1. Lower Bounds

a) Local

We consider two nonparametric statistical problems. In density estimation we observe i.i.d. random variables X1, ..., Xn, each with density f ∈ D(M). In the regression problem we observe Yi ~ N(r(yi), 1), independent, with the yi equally spaced, i = 1, ..., n, where r ∈ R(M). Estimators of f or r will be written f̂n or r̂n, where the subscript n indicates that f̂n and r̂n are functions of X1, ..., Xn and Y1, ..., Yn respectively.

Before establishing lower bounds for estimating an entire density or regression function we need corresponding results for the pointwise problem. This local problem has recently been addressed by Donoho and Liu (1987, 1988). In particular, the proposition given below is essentially contained in these papers. For this reason we give only a brief outline of a proof here.

In both the pointwise and the global estimation problems the lower bounds are expressed in terms of the minimax risk for the bounded normal mean problem. Let ρ(d, σ²) be the minimax risk of estimating θ from one observation from the family {N(θ, σ²) : |θ| ≤ d}.

We write f_{θn} and s_{θn} for densities and regression functions on the interval [-1/2, 1/2] defined by

  f_{θn}(x) = [1 + g_{θn}(x)] (C_n(θ))^{-1},  s_{θn}(x) = g_{θn}(x),

where

(1.1)  g_{θn}(x) = θ g(x / (D n^{-1/3})),  with  g(x) = (1 - |x|)_+,

(1.2)  C_n(θ) = 1 + ∫ g_{θn}(x) dx = 1 + θ D n^{-1/3},

and

(1.3)  d = (2/3)^{1/2} M D^{3/2}.

Proposition a) If X1, ..., Xn are i.i.d. random variables with common density f_{θn}, then

(1.4)  lim sup_{n→∞} n^{2/3} inf_{θ̂n} sup_{|θ| ≤ M D n^{-1/3}} E_{f_{θn}} (θ̂n - θ)² = M^{2/3} (3/2)^{2/3} d^{-2/3} ρ(d, 1).

b) If Y1, ..., Yn are independent N(s_{θn}(yi), 1), then

(1.5)  lim sup_{n→∞} n^{2/3} inf_{θ̂n} sup_{|θ| ≤ M D n^{-1/3}} E (θ̂n - θ)² = M^{2/3} (3/2)^{2/3} d^{-2/3} ρ(d, 1).

Proof

Let θ = (3/(2D))^{1/2} n^{-1/3} v and let P_v be the probability measure with density ∏_{j=1}^{n} f_{θn}(x_j) with respect to Lebesgue measure on R^n. Straightforward calculations show that the experiments (P_v : |v| ≤ (2/3)^{1/2} M D^{3/2}) converge to the experiment (N(v, 1) : |v| ≤ (2/3)^{1/2} M D^{3/2}). Similar calculations may be found in Donoho and Liu (1988). Hence

  lim_{n→∞} n^{2/3} inf_{θ̂} sup_{|θ| ≤ M D n^{-1/3}} E_{f_{θn}} (θ̂ - θ)² = (3/(2D)) ρ(d, 1),

with d as defined in (1.3). Since

  D^{-1} = (2/3)^{1/3} M^{2/3} d^{-2/3},

these two facts taken together yield (1.4). (1.5) is established in a similar way.

b) Global

To obtain our global lower bounds we construct perturbations of a fixed density f0 and a fixed regression function s0. We choose

  f0 : [0, 1] → R, f0(x) ≡ 1  and  s0 : [0, 1] → R, s0(x) ≡ 0.

Let D > 0 and ψn(D) = [n^{1/3} / (2D)], where [·] denotes the greatest integer function. Our subfamilies of interest are

  f_{θ^n}(x) = [f0(x) + Σ_{i=1}^{ψn} g_{θi n}(x - (2i - 1) D n^{-1/3})] (C_n(θ^n))^{-1},

where C_n(θ^n) = 1 + Σ_{i=1}^{ψn} θi D n^{-1/3} and

  θ^n = (θ1, ..., θ_{ψn}),  |θi| ≤ M D n^{-1/3},  i = 1, ..., ψn,

and

  s_{θ^n}(x) = s0(x) + Σ_{i=1}^{ψn} g_{θi n}(x - (2i - 1) D n^{-1/3}) (C_n(θi))^{-1},

where g_{θn} and C_n(θ) are defined in (1.1) and (1.2).

Lower bounds for the global problem will be written in terms of sup_d d^{-2/3} ρ(d, 1). This is facilitated by a recent study of m(q) = sup_d d^{2q-2} ρ(d, 1) by Donoho and Liu (1987). They found

  m(2/3) = sup_d d^{-2/3} ρ(d, 1) = 0.450.

Theorem 1 Let X1, ..., Xn be i.i.d. random variables with density f ∈ D(M). Then

a)

(1.6)  lim sup_{n→∞} n^{2/3} inf_{f̂n} sup_{f ∈ D(M)} E_f [ ∫ (f(x) - f̂n(x))² dx ] ≥ (1/12)^{1/3} M^{2/3} sup_d d^{-2/3} ρ(d, 1) = (1/12)^{1/3} M^{2/3} · 0.45;

b)

(1.7)  lim sup_{n→∞} n^{2/3} inf_{f̂n} sup_{f ∈ D(M)} E_f [ ∫ (√f(x) - √f̂n(x))² dx ] ≥ (1/4) (1/12)^{1/3} M^{2/3} sup_d d^{-2/3} ρ(d, 1).

Proof

a) Let

  R_i(θ^n, f̂n) = ∫_{(2i-2)D n^{-1/3}}^{2iD n^{-1/3}} (f_{θ^n}(x) - f̂n(x))² dx

and θ^n(i) = (0, ..., 0, θi, 0, ..., 0). Then

  inf_{f̂n} sup_{θ^n} E [ Σ_{i=1}^{ψn} R_i(θ^n, f̂n) ] ≥ Σ_{i=1}^{ψn} inf_{f̂n} sup_{θi} E R_i(θ^n(i), f̂n),

since C_n(θ^n) - 1 → 0 as n → ∞, uniformly for all θ^n such that |θi| ≤ M D n^{-1/3}. Hence

  inf_{f̂n} sup_{f ∈ D(M)} E_f [ ∫ (f(x) - f̂n(x))² dx ] ≥ Σ_{i=1}^{ψn} inf_{θ̂} sup_{|θi| ≤ M D n^{-1/3}} E [ (θ̂ - θi)² ] ∫ g(x / (D n^{-1/3}))² dx,

where the inner integral equals (2/3) D n^{-1/3}. By the Proposition, with d as defined in (1.3),

  inf_{θ̂} sup_{|θi| ≤ M D n^{-1/3}} E (θ̂ - θi)² = (3/(2D)) n^{-2/3} ρ(d, 1) (1 + o(1)),

so that

  n^{2/3} inf_{f̂n} sup_{f ∈ D(M)} E_f [ ∫ (f(x) - f̂n(x))² dx ] ≥ n^{2/3} · (n^{1/3}/(2D)) · (2/3) D n^{-1/3} · (3/(2D)) n^{-2/3} ρ(d, 1) (1 + o(1)) = (1/(2D)) ρ(d, 1) (1 + o(1)) = (1/12)^{1/3} M^{2/3} d^{-2/3} ρ(d, 1) (1 + o(1)).

Now (1.6) follows on taking the supremum over d.

(1.7) follows from the observation √(1 + θ) = 1 + θ/2 + O(θ²) and the analysis given in a).

Theorem 2 Let Yi = r(yi) + εi, where the εi are i.i.d. N(0, 1) random variables, r ∈ R(M), and the yi are equally spaced in the interval [0, 1]. Then

  lim sup_{n→∞} n^{2/3} inf_{r̂n} sup_{r ∈ R(M)} E [ ∫ (r̂n(x) - r(x))² dx ] ≥ (1/12)^{1/3} M^{2/3} sup_d d^{-2/3} ρ(d, 1).

Proof Replace f by r and D(M) by R(M) in the proof of Theorem 1.

2. Upper Bounds

Throughout this section minimax risk refers to an integrated squared error loss. An obvious but crude upper bound for the minimax risk can be given in terms of the asymptotic linear minimax risk for the pointwise estimation problem under squared error loss. Let D(M, 1) be the class of densities such that f ∈ D(M) and f(x0) ≤ 1. Then clearly the asymptotic minimax risk of the best linear estimator for estimating f(x0), when f is assumed to belong to D(M, 1), is an upper bound for the minimax risk under integrated squared error loss over the class D(M). Similarly, in the regression context, the asymptotic linear minimax risk over the class R(M) for the pointwise problem is an upper bound for the minimax risk of the global problem. For the classes D(M, 1) and R(M), the asymptotically best linear estimators for the pointwise estimation of density and regression functions can be found in Sacks and Ylvisaker (1978, 1981). Alternatively, the hardest linear subfamily methodology of Donoho and Liu (1987, 1988) also yields these best linear estimators.
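The hardest-linear-subfamily computation can be illustrated numerically. The following is a sketch under the reconstruction above, using the fact that for the bounded normal mean the affine minimax risk is d²/(1+d²), which may be substituted for ρ(d, 1) in the Proposition's constant:

```python
import numpy as np

# Affine minimax risk for a normal mean restricted to [-d, d] is d^2/(1+d^2)
# (Donoho and Liu).  Substituting it for rho(d,1) in the pointwise constant
# M^{2/3} (3/2)^{2/3} d^{-2/3} rho(d,1) and maximizing over d recovers the
# linear pointwise constant (1/3)^{1/3} M^{2/3} of Section 2 (M^{2/3} factored out).
d = np.linspace(0.01, 20, 200001)
envelope = d**(-2/3) * d**2 / (1 + d**2)
i = envelope.argmax()

print(round(d[i], 3))                          # hardest d, close to sqrt(2)
print(round((3/2)**(2/3) * envelope[i], 4))    # close to (1/3)^{1/3} = 0.6934
```

The maximum occurs near d = √2, where the envelope equals 2^{2/3}/3 ≈ 0.529; this exceeds the minimax value 0.450 quoted in Section 1, as it must, since linear rules form a subclass of all rules.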

The asymptotic minimax risk of these best linear estimators over the classes D(M, 1) and R(M), for the density and regression problems respectively, is the same and is given by

(2.1)  B_L(M) n^{-2/3},  where  B_L(M) = (1/3)^{1/3} M^{2/3}.

The ratio of these upper bounds to the lower bounds given in Theorem 1 and Theorem 2 is

(2.2)  (1/3)^{1/3} M^{2/3} / [ (1/12)^{1/3} M^{2/3} · 0.45 ] = 4^{1/3} / 0.45 ≈ 3.5.

The upper bounds given in (2.1) are quite conservative. For the class of densities

(2.3)  EP(B) = { f : [0, 1] → R, f ≥ 0, ∫ f = 1, ∫ f'² ≤ B² },

Efroimovich and Pinsker (1982) found the asymptotic minimax risk. Nussbaum (1985) did likewise for the regression problem with N(B) = { f : [0, 1] → R, ∫ f'² ≤ B² }. Since D(M, 1) ⊂ EP(M) and R(M) ⊂ N(M), these numbers are also upper bounds for the classes D(M) and R(M). The ratios of these upper bounds to our lower bounds are the same and equal

  [ 3/(4π²) ]^{1/3} / [ (1/12)^{1/3} · 0.45 ] ≈ 2.2.

Moreover, we believe the upper bounds derived from Efroimovich and Pinsker are an overestimate of the minimax risk.

Conjecture Let N_n = [n^{1/3} / (2D)] and let f̂n be the kernel estimator with kernel g and bandwidth D n^{-1/3},

  f̂n(x) = (n^{1/3} / (nD)) Σ_{j=1}^{n} g( (X_j - x) / (D n^{-1/3}) )  for  D n^{-1/3} ≤ x ≤ 1 - D n^{-1/3},

modified to be constant near the boundary:

  f̂n(x) = f̂n(D n^{-1/3})  for  0 ≤ x < D n^{-1/3},
  f̂n(x) = f̂n(1 - D n^{-1/3})  for  1 - D n^{-1/3} < x ≤ 1.

The worst case for f̂n over D(M) should be attained asymptotically at sawtooth densities of the form

  f_n(x) = 1 + Σ_{i=1}^{N_n} (-1)^i M D n^{-1/3} g( (x - (2i - 1) D n^{-1/3}) / (D n^{-1/3}) ).

Then we conjecture that

(2.4)  lim_{n→∞} n^{2/3} sup_{f ∈ D(M)} E_f [ ∫_0^1 (f̂n(x) - f(x))² dx ] = lim_{n→∞} n^{2/3} E_{f_n} [ ∫_0^1 (f̂n(x) - f_n(x))² dx ].

Calculation of the right hand side of (2.4) is straightforward and yields

(2.5)  M² D² / 63 + 2 / (3D).

(2.5) takes its minimum value of (1/21)^{1/3} M^{2/3} when D = (21)^{1/3} M^{-2/3}. This gives a ratio

  (1/21)^{1/3} / [ (1/12)^{1/3} · 0.45 ] = 1.84.

Acknowledgements. This work is based on part of the author's doctoral dissertation, which was written at Cornell University under the supervision of Professor L.D. Brown. His encouragement is gratefully acknowledged.
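The ratio computations in this section reduce to elementary arithmetic. A short script (using the constants as reconstructed above, with M = 1 in the conjectured risk) reproduces them:

```python
import numpy as np

m23 = 0.45                        # sup_d d^{-2/3} rho(d,1), Donoho and Liu (1987)
lower = (1/12)**(1/3) * m23       # lower-bound constant of Theorems 1 and 2 (M^{2/3} factored out)

ratio_linear = (1/3)**(1/3) / lower              # crude linear bound (2.1) vs lower bound
ratio_ep = (3 / (4*np.pi**2))**(1/3) / lower     # Efroimovich-Pinsker bound vs lower bound
print(round(ratio_linear, 2), round(ratio_ep, 2))   # roughly 3.5 and 2.2

# conjectured risk (2.5) with M = 1: minimize D^2/63 + 2/(3D) over D > 0
D = np.linspace(0.5, 6.0, 500001)
risk = D**2 / 63 + 2 / (3*D)
Dstar, vmin = D[risk.argmin()], risk.min()
print(round(Dstar, 3), round(vmin, 4))   # minimizer 21^{1/3} ~ 2.759, minimum 21^{-1/3} ~ 0.3625
print(round(vmin / lower, 2))            # the ratio 1.84
```

A grid search suffices here because the conjectured risk is strictly convex for D > 0, so the discretized minimizer converges to the analytic one, D³ = 21.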

References

Brown, L.D. and Farrell, R.H. (1987). A lower bound for the risk in estimating the value of a probability density. Cornell Statistics Center preprint series.

Brown, L.D. and Low, M.G. (1988). Information inequality bounds on the minimax risk (with an application to nonparametric regression). Cornell Statistics Center preprint series.

Casella, G. and Strawderman, W.E. (1981). Estimating a bounded normal mean. Ann. Stat. 9, 870-878.

Devroye, L. and Györfi, L. (1985). Nonparametric Density Estimation: The L1 View. Wiley, New York.

Donoho, D.L. and Johnstone, I. (1989). Minimax risk over lp-balls. Technical Report, University of California, Berkeley.

Donoho, D.L. and Liu, R.C. (1987). On the minimax estimation of linear functionals. Technical Report, University of California, Berkeley.

Donoho, D.L. and Liu, R.C. (1988). Geometrizing rates of convergence, III. Technical Report, University of California, Berkeley.

Efroimovich, S.Y. and Pinsker, M.S. (1982). Estimation of square-integrable probability density of a random variable. Problems of Information Transmission 18 (1983), 175-189.

Nussbaum, M. (1985). Spline smoothing in regression models and asymptotic efficiency in L2. Ann. Stat. 13, 984-997.

Sacks, J. and Ylvisaker, D. (1978). Linear estimation for approximately linear models. Ann. Stat. 6, 1122-1137.

Sacks, J. and Ylvisaker, D. (1981). Asymptotically optimum kernels for density estimation at a point. Ann. Stat. 9, 334-346.

Stone, C. (1982). Optimal global rates of convergence for nonparametric regression. Ann. Stat. 10, 1040-1053.

Department of Statistics
University of California
Berkeley, California