Applications of the van Trees inequality to non-parametric estimation.


Brno-06, Lecture 2, 16.05.06. D/Stat/Brno-06/2.tex, www.mast.queensu.ca/~blevit/

Regular non-parametric problems. As an example of such problems we consider the asymptotic optimality of the empirical cumulative distribution function.

Let $X_1, \dots, X_n$ be i.i.d. with c.d.f. $F(x)$ and p.d.f. $f(x)$. A typical non-parametric problem, and a classical one, is to estimate the cdf $F(x)$, for any given $x$. It is called a regular non-parametric problem, since in many ways it is similar to estimating an unknown parameter in a regular parametric model. In such problems (a) the rate of convergence is typically $1/\sqrt{n}$, and (b) there exist unbiased or asymptotically unbiased estimators. Here $F(x)$ itself can be seen as the unknown parameter of interest, although it is now a functional parameter, or an infinite-dimensional parameter.

The ecdf. Define the indicator
$$\mathbf{1}(x) = \begin{cases} 1, & x \ge 0, \\ 0, & x < 0, \end{cases}$$
and the empirical cdf
$$\hat F_n(x) = \frac{1}{n}\,\#\{X_i : X_i \le x\} = \frac{1}{n}\sum_{i=1}^n \mathbf{1}(x - X_i).$$
We have
$$\mathbf{1}(x - X_i) = \begin{cases} 1 & \text{with probability } F(x), \\ 0 & \text{with probability } 1 - F(x), \end{cases}$$
so that
$$E^X_F\big(\hat F_n(x) - F(x)\big)^2 = \frac{F(x)(1 - F(x))}{n},$$
or equivalently $n\,E^X_F\big(\hat F_n(x) - F(x)\big)^2 = F(x)(1 - F(x))$.

In the case of a real unknown parameter $\theta$, we used intervals $(a, b)$ to introduce the notion of locally asymptotically minimax estimates. What plays the role of such intervals in a non-parametric setting? We need a notion of open vicinities $V$ in our case. The easiest way to achieve this is to use a metric, or a distance function, on the set of all distributions. This can be done in many different ways. We choose the classical distance in variation metric
$$\varrho(F_1, F_2) = \int |f_1(x) - f_2(x)|\,dx.$$
The measure of closeness thus defined satisfies all the axioms of a distance:

positivity: $\varrho(F_1, F_2) \ge 0$;
symmetry: $\varrho(F_1, F_2) = \varrho(F_2, F_1)$;
the triangle inequality: $\varrho(F_1, F_2) \le \varrho(F_1, F_3) + \varrho(F_3, F_2)$.

A set $V$ is called open if for any $F_0 \in V$ there exists $\varepsilon > 0$ such that $\{F : \varrho(F, F_0) < \varepsilon\} \subset V$.

Now for any subset $V$,
$$\sup_{F \in V} n\,E^X_F\big(\hat F_n(x) - F(x)\big)^2 = \sup_{F \in V} F(x)(1 - F(x)). \qquad (1)$$
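As a quick numerical illustration (not part of the original notes), the identity $n\,E(\hat F_n(x) - F(x))^2 = F(x)(1 - F(x))$ can be checked by simulation. A minimal sketch; the uniform distribution, the sample size, the point $x$, and the number of replications are all arbitrary choices:

```python
import numpy as np

# Monte Carlo check that n * E(F_n(x) - F(x))^2 = F(x)(1 - F(x))
# for the empirical cdf. For U(0,1), F(x) = x on [0, 1].
rng = np.random.default_rng(0)
n, reps, x = 200, 20000, 0.3
F_x = x  # true cdf of U(0,1) at x

samples = rng.uniform(size=(reps, n))
F_n = (samples <= x).mean(axis=1)           # ecdf at x, one value per replication
scaled_mse = n * ((F_n - F_x) ** 2).mean()  # should be close to F(x)(1 - F(x)) = 0.21
```

Here `scaled_mse` matches $F(x)(1 - F(x)) = 0.3 \cdot 0.7 = 0.21$ up to Monte Carlo error, with no $n \to \infty$ asymptotics needed: the identity is exact for every $n$.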

Theorem 1. For any open subset $V$ and any sequence of estimators $\bar F_n(x)$,
$$\liminf_{n \to \infty}\, \sup_{F \in V} n\,E^X_F\big(\bar F_n(x) - F(x)\big)^2 \ge \sup_{F \in V} F(x)(1 - F(x)). \qquad (2)$$

Corollary. According to (1)-(2), the empirical cdf $\hat F_n(x)$ is locally asymptotically minimax.

Proof of Theorem 1. Let $V$ be an arbitrary vicinity and let $F_0(x)$ be an arbitrary element of $V$. Denote the corresponding pdf by $f_0(x)$. Let us introduce a one-dimensional parametric sub-family of distributions $F(x \mid \theta)$ as follows:
$$f(y \mid \theta) = f_0(y)\big(1 + \theta(\mathbf{1}(x - y) - F_0(x))\big), \qquad |\theta| < 1/2. \qquad (3)$$
We mention some properties of this family.

1. $f(\cdot \mid \theta)$ is a probability density for all $|\theta| < 1/2$. Indeed, since
$$|\theta(\mathbf{1}(x - y) - F_0(x))| \le |\theta|\,|\mathbf{1}(x - y) - F_0(x)| \le \tfrac12\big(\mathbf{1}(x - y) + F_0(x)\big) \le \tfrac12(1 + 1) = 1,$$
we have
$$1 + \theta(\mathbf{1}(x - y) - F_0(x)) \ge 1 - 1 = 0.$$
Moreover,
$$\int f(y \mid \theta)\,dy = \int f_0(y)\,dy + \theta \int \big(\mathbf{1}(x - y) - F_0(x)\big) f_0(y)\,dy = 1 + \theta\big(F_0(x) - F_0(x)\big) = 1.$$

2. $$\varrho(F_\theta, F_0) = \int |f(y \mid \theta) - f_0(y)|\,dy = \int f_0(y)\,\big|\theta(\mathbf{1}(x - y) - F_0(x))\big|\,dy \le |\theta| \int 2 f_0(y)\,dy = 2|\theta|.$$

Corollary. There is $\delta > 0$ such that $F_\theta(x) \in V$ for all $|\theta| < \delta$.

3. $\psi(\theta) := F(x \mid \theta)$ is a continuously differentiable function and $\psi'(\theta) = F_0(x)(1 - F_0(x))$. Indeed,
$$\psi(\theta) = \int \mathbf{1}(x - y) f(y \mid \theta)\,dy = \int \mathbf{1}(x - y) f_0(y)\big(1 + \theta(\mathbf{1}(x - y) - F_0(x))\big)\,dy$$
$$= F_0(x) + \theta \int \mathbf{1}(x - y)\big(\mathbf{1}(x - y) - F_0(x)\big) f_0(y)\,dy = F_0(x) + \theta \int \big(\mathbf{1}(x - y) - F_0(x)\big)^2 f_0(y)\,dy$$
$$= F_0(x) + \theta F_0(x)(1 - F_0(x)),$$
where we used $\mathbf{1}(x - y)^2 = \mathbf{1}(x - y)$ and $\int (\mathbf{1}(x - y) - F_0(x)) f_0(y)\,dy = 0$.

4. $I(\theta)$ is continuous and $I(0) = F_0(x)(1 - F_0(x))$. Indeed,
$$\log f(y \mid \theta) = \log f_0(y) + \log\big(1 + \theta(\mathbf{1}(x - y) - F_0(x))\big).$$

Thus
$$\frac{\partial}{\partial\theta} \log f(y \mid \theta) = \frac{\mathbf{1}(x - y) - F_0(x)}{1 + \theta(\mathbf{1}(x - y) - F_0(x))},$$
$$I(\theta) = \int \Big(\frac{\partial}{\partial\theta} \log f(y \mid \theta)\Big)^2 f(y \mid \theta)\,dy = \int \frac{\big(\mathbf{1}(x - y) - F_0(x)\big)^2}{\big(1 + \theta(\mathbf{1}(x - y) - F_0(x))\big)^2}\, f_0(y)\big(1 + \theta(\mathbf{1}(x - y) - F_0(x))\big)\,dy$$
$$= \int \frac{\big(\mathbf{1}(x - y) - F_0(x)\big)^2}{1 + \theta(\mathbf{1}(x - y) - F_0(x))}\, f_0(y)\,dy.$$
It is easy to see now that $I(\theta)$ is continuous and
$$I(0) = \int \big(\mathbf{1}(x - y) - F_0(x)\big)^2 f_0(y)\,dy = \operatorname{Var} \mathbf{1}(x - X_1) = F_0(x)(1 - F_0(x)).$$

Combining these properties with an asymptotic version of the van Trees inequality we obtain, for any estimator $\bar F_n(x)$ of $F(x)$,
$$\liminf_{n\to\infty} \sup_{F \in V} n\,E^X_F\big(\bar F_n(x) - F(x)\big)^2 \ge \liminf_{n\to\infty} \sup_{|\theta| \le \delta} n\,E^X_\theta\big(\bar F_n(x) - F(x \mid \theta)\big)^2$$
$$= \liminf_{n\to\infty} \sup_{|\theta| \le \delta} n\,E^X_\theta\big(\bar\psi_n(x) - \psi(\theta)\big)^2 \ge \frac{(\psi'(0))^2}{I(0)} = \frac{\big(F_0(x)(1 - F_0(x))\big)^2}{F_0(x)(1 - F_0(x))} = F_0(x)(1 - F_0(x)).$$
Since $F_0$ is an arbitrary element of $V$, we also obtain
$$\liminf_{n\to\infty} \sup_{F \in V} n\,E^X_F\big(\bar F_n(x) - F(x)\big)^2 \ge \sup_{F \in V} F(x)(1 - F(x)).$$

Remark. How was the form of the family (3) determined? Let us look at an arbitrary sub-family
$$f(y \mid \theta) = f_0(y)\big(1 + \theta h(y)\big), \qquad (3')$$
where $h(y)$ is any bounded function such that $\int f_0(y) h(y)\,dy = 0$. Then (3') determines a family of distributions, for all sufficiently small $\theta$. Similarly to the above calculations we then find that
$$\psi'(0) = \int \big(\mathbf{1}(x - y) - F_0(x)\big) h(y) f_0(y)\,dy \quad \text{and} \quad I(0) = \int h^2(y) f_0(y)\,dy.$$
Therefore the resulting lower bound would be
$$\frac{(\psi'(0))^2}{I(0)} = \frac{\Big(\int \big(\mathbf{1}(x - y) - F_0(x)\big) h(y) f_0(y)\,dy\Big)^2}{\int h^2(y) f_0(y)\,dy} \;\to\; \max.$$
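The algebraic properties of the hardest subfamily (3) can also be verified numerically. A minimal sketch: it takes $f_0$ to be the $U(0,1)$ density (so $F_0(x) = x$), which, together with the values of $x$, $\theta$, and the grid size, is an arbitrary illustrative choice:

```python
import numpy as np

# Check, for f(y|theta) = f0(y)(1 + theta(1(x - y) - F0(x))) with f0 = U(0,1):
# (i)  it integrates to 1,
# (ii) psi(theta) = F(x|theta) = F0(x) + theta * F0(x)(1 - F0(x)),
# (iii) I(0) = F0(x)(1 - F0(x)).
x, theta, N = 0.3, 0.2, 200_000
y = (np.arange(N) + 0.5) / N            # midpoint grid on (0, 1); f0(y) = 1 there
score = (y <= x).astype(float) - x      # 1(x - y) - F0(x), the score at theta = 0
f_theta = 1.0 + theta * score           # f(y|theta) with f0 = 1

total_mass = f_theta.mean()             # Riemann sum of f(y|theta); should be 1
psi_theta = ((y <= x) * f_theta).mean() # F(x|theta) = 0.3 + 0.2 * 0.3 * 0.7 = 0.342
I_0 = (score**2).mean()                 # Fisher information at 0: 0.3 * 0.7 = 0.21
```

The same script with `score` replaced by any other centered bounded `h(y)` gives, per the Cauchy-Schwarz remark above, a strictly smaller ratio $(\psi'(0))^2 / I(0)$ unless `h` is proportional to the score.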

Now, by the Cauchy-Schwarz inequality,
$$\Big(\int \big(\mathbf{1}(x - y) - F_0(x)\big) f_0(y) h(y)\,dy\Big)^2 \le \int h^2(y) f_0(y)\,dy \int \big(\mathbf{1}(x - y) - F_0(x)\big)^2 f_0(y)\,dy.$$
Thus the best possible lower bound is the one we have obtained before, and it can be achieved if and only if $h(y) = \mathrm{const} \cdot (\mathbf{1}(x - y) - F_0(x))$. That is exactly the choice we have used, the choice of the constant being insignificant. Thus the family used in the proof is the hardest parametric subfamily. Accordingly, this method of proving asymptotic optimality is called the method of the hardest parametric subfamily.

Singular non-parametric problems. As a typical example of such problems, we will consider kernel density estimation, in the setting
$$X = (X_1, \dots, X_n) \quad \text{i.i.d.} \quad f(x).$$
The classical kernel density estimators are
$$f_n(x) = \frac{1}{nh} \sum_{i=1}^n K\Big(\frac{x - X_i}{h}\Big).$$
The function $K(x)$ is called the kernel function. The parameter $h$ is called the bandwidth: $h = h_n \to 0$.

How good is this estimate? Does it converge to the true density? It is clear that as $n \to \infty$, $h$ should go to $0$. We will assume that $K$ is an arbitrary symmetric function, $K(x) = K(-x)$, and that
$$\int K(x)\,dx = 1, \quad |K(x)| \le c < \infty, \quad K(x) = 0 \text{ if } |x| \ge 1 \ \text{(bounded support)},$$
and that
$$\int K(x)\, x^2\,dx \le c, \qquad \int K^2(x)\,dx \le c.$$
Note that since $K(x)$ is symmetric, $\int x K(x)\,dx = 0$.

We will also assume that
$$|f(x)| \le C, \quad |f'(x)| \le C, \quad |f''(x)| \le C.$$

The variance-bias decomposition of the mean square error:
$$E\big(f_n(x) - f(x)\big)^2 = \operatorname{Var}(f_n(x)) + \big(E f_n(x) - f(x)\big)^2,$$
$$E f_n(x) = \frac{1}{h}\, E\, K\Big(\frac{x - X_1}{h}\Big).$$

$$\operatorname{Var}(f_n(x)) = \frac{1}{nh^2} \operatorname{Var} K\Big(\frac{x - X_1}{h}\Big) \le \frac{1}{nh^2}\, E\, K^2\Big(\frac{x - X_1}{h}\Big).$$
Let us analyze these expectations using the Taylor expansion:
$$\frac{1}{h}\, E\, K\Big(\frac{x - X_1}{h}\Big) = \frac{1}{h} \int K\Big(\frac{x - y}{h}\Big) f(y)\,dy = \int K(z) f(x + hz)\,dz$$
$$= \int K(z) \Big(f(x) + f'(x)\,hz + \tfrac12 f''(x + \vartheta hz)(hz)^2\Big)\,dz = f(x) + \frac{h^2}{2} \int K(z)\, z^2 f''(x + \vartheta hz)\,dz = f(x) + O(h^2). \qquad (4)$$
Thus
$$E f_n(x) = f(x) + O(h^2).$$
Similarly,
$$\frac{1}{h}\, E\, K^2\Big(\frac{x - X_1}{h}\Big) = \frac{1}{h} \int K^2\Big(\frac{x - y}{h}\Big) f(y)\,dy = \int K^2(z) f(x + hz)\,dz = f(x) \int K^2(z)\,dz + O(h^2) \le C. \qquad (5)$$
Therefore
$$\operatorname{Var}(f_n(x)) \le \frac{1}{nh^2}\, E\, K^2\Big(\frac{x - X_1}{h}\Big) \le \frac{C}{nh}.$$
Combining these estimates together shows that
$$E\big(f_n(x) - f(x)\big)^2 \le \frac{C}{nh} + C h^4 \le C\Big(\frac{1}{nh} + h^4\Big). \qquad (6)$$
If we choose $h$ too small, then the variance term becomes large (under-smoothing). If we choose $h$ large, then the bias becomes large (over-smoothing). So we have to strike a balance, by minimizing the r.h.s. If we choose a bandwidth $h$ such that
$$\frac{1}{nh} = h^4, \quad \text{i.e.} \quad h^5 = \frac{1}{n}, \quad h = \frac{1}{n^{1/5}},$$
then
$$E\big(f_n(x) - f(x)\big)^2 \le C\Big(\frac{1}{n \cdot n^{-1/5}} + \frac{1}{n^{4/5}}\Big) = C\Big(\frac{1}{n^{4/5}} + \frac{1}{n^{4/5}}\Big) = \frac{2C}{n^{4/5}}.$$
If we minimize the r.h.s. of (6) with respect to $h$, we obtain the same rate, with a somewhat better constant. Note, however, that even with the best balance possible, the bias and variance terms are of the same order. In other words, the resulting estimator is asymptotically biased! Note also that the resulting rate of convergence is slower than in the case of the cdf, and this clearly shows in simulations!

Let
$$\mathcal{F} = \Big\{f(x) : \sup_x |f(x)| \le C,\ \sup_x |f'(x)| \le C,\ \sup_x |f''(x)| \le C\Big\}.$$

Theorem 2. There exist constants $0 < d < D < \infty$ and an estimator $\hat f_n(x)$ such that for any real $x$,
$$\sup_{f \in \mathcal{F}} E_f\big(\hat f_n(x) - f(x)\big)^2 \le \frac{D}{n^{4/5}}.$$
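The estimator and the balanced bandwidth $h = n^{-1/5}$ can be sketched in a few lines. The Epanechnikov kernel and the standard normal target below are illustrative choices, not prescribed by the notes; the kernel satisfies the assumptions above (symmetric, bounded support on $[-1,1]$, integrates to 1):

```python
import numpy as np

rng = np.random.default_rng(1)

def K(u):
    # Epanechnikov kernel: 0.75 (1 - u^2) on [-1, 1], zero outside
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1)

def kde(x, sample, h):
    # f_n(x) = (1/(n h)) * sum_i K((x - X_i) / h)
    return K((x - sample) / h).mean() / h

n = 4000
h = n ** (-1 / 5)                  # balanced bandwidth: h^5 = 1/n
sample = rng.normal(size=n)        # target density: standard normal, f(0) = 1/sqrt(2 pi)
f_hat = kde(0.0, sample, h)        # point estimate of f(0) ~ 0.3989
```

With this bandwidth both the squared bias, of order $h^4 = n^{-4/5}$, and the variance, of order $1/(nh) = n^{-4/5}$, contribute at the same $n^{-4/5}$ rate, which is why the estimator is asymptotically biased even at the optimal balance.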

On the other hand, for any estimator $\bar f_n(x)$ of $f(x)$ and any real $x$,
$$\sup_{f \in \mathcal{F}} E_f\big(\bar f_n(x) - f(x)\big)^2 \ge \frac{d}{n^{4/5}}.$$

Proof of the lower bound. Let $x$ be fixed. Choose $f_0 \in \mathcal{F}$ such that $f_0(x) > 0$ and, for some $\varepsilon > 0$,
$$\sup_y |f_0(y)| \le C - \varepsilon, \quad \sup_y |f_0'(y)| \le C - \varepsilon, \quad \sup_y |f_0''(y)| \le C - \varepsilon. \qquad (7)$$
Also choose a symmetric density $K(x)$ satisfying all the properties assumed above and additionally
$$K(0) > 0, \quad |K'(y)| \le c, \quad |K''(y)| \le c.$$
Let us choose a one-dimensional parametric sub-family $f(x \mid \theta)$:
$$f(y \mid \theta) = f_0(y)\Big(1 + \theta\Big(K\Big(\frac{x - y}{h}\Big) - a_h\Big)\Big), \qquad a_h := \int K\Big(\frac{x - u}{h}\Big) f_0(u)\,du,$$
where, according to (4), $a_h = h\big(f_0(x) + O(h^2)\big) = O(h)$; the centering constant $a_h$ makes $f(\cdot \mid \theta)$ integrate to 1. Here $h = h_n \to 0$ will be chosen later. Note that since we want to estimate $f(x)$, our target functional is
$$\psi(\theta) := f(x \mid \theta) = f_0(x)\big(1 + \theta(K(0) - a_h)\big).$$
Let us note the following properties of the parametric family $f(y \mid \theta)$.

1. If $\delta$ is a sufficiently small number, then $f(\cdot \mid \theta) \in \mathcal{F}$ for all $|\theta| \le \delta h^2$. Indeed, the most severe restriction on $\theta$ stems from the requirement
$$\Big|\frac{\partial^2}{\partial y^2} f(y \mid \theta)\Big| \le C. \qquad (8)$$
Since
$$\frac{\partial^2}{\partial y^2} f(y \mid \theta) = f_0''(y) + \theta\Big(f_0''(y)\Big(K\Big(\frac{x - y}{h}\Big) - a_h\Big) - \frac{2}{h}\, f_0'(y)\, K'\Big(\frac{x - y}{h}\Big) + \frac{1}{h^2}\, f_0(y)\, K''\Big(\frac{x - y}{h}\Big)\Big),$$
and $|\theta|/h^2 \le \delta$, the required property (8) easily follows from (7). Note that only a very small piece of the above family fits into the family $\mathcal{F}$. The size of this family will feature prominently in the resulting lower bound.

2. According to (4), for all sufficiently small $h$,
$$\psi'(\theta) = f_0(x)\big(K(0) - a_h\big) = f_0(x)\big(K(0) + O(h)\big) \ge K(0) f_0(x)/2 > 0.$$

3. Barring the technicalities, we have
$$\log f(y \mid \theta) = \log f_0(y) + \log\Big(1 + \theta\Big(K\Big(\frac{x - y}{h}\Big) - a_h\Big)\Big) \approx \log f_0(y) + \theta\Big(K\Big(\frac{x - y}{h}\Big) - a_h\Big)$$

and, according to (4)-(5),
$$I(\theta) = \int \Big(\frac{\partial}{\partial\theta} \log f(y \mid \theta)\Big)^2 f(y \mid \theta)\,dy \approx \int K^2\Big(\frac{x - y}{h}\Big) f_0(y)\,dy = h\Big(f_0(x) \int K^2(z)\,dz + O(h^2)\Big) \le c_1 h.$$

Combining all the above properties of the thus chosen family $f(y \mid \theta)$ with the van Trees lower bound (with a prior density $\lambda(\theta)$ concentrated on $|\theta| \le \delta h^2$), we obtain for an arbitrary estimator $\bar f_n(x)$:
$$\sup_{f \in \mathcal{F}} E^X_f\big(\bar f_n(x) - f(x)\big)^2 \ge \sup_{|\theta| \le \delta h^2} E^X_\theta\big(\bar f_n(x) - f(x \mid \theta)\big)^2 = \sup_{|\theta| \le \delta h^2} E^X_\theta\big(\bar\psi_n - \psi(\theta)\big)^2$$
$$\ge \frac{\Big(\int \psi'(\theta)\,\lambda(\theta)\,d\theta\Big)^2}{n \int I(\theta)\,\lambda(\theta)\,d\theta + \dfrac{\pi^2}{\delta^2 h^4}} \ge \frac{\big(f_0(x) K(0)/2\big)^2}{c_1 n h + \dfrac{\pi^2}{\delta^2 h^4}} \ge \frac{c_2}{nh + \dfrac{1}{h^4}}.$$
Here again we have to balance the two terms appearing in the denominator, a problem which is similar to, but slightly different from, (6). We can simply choose $h$ such that
$$nh = \frac{1}{h^4}, \quad h^5 = \frac{1}{n}, \quad h = \frac{1}{n^{1/5}}, \quad \text{so that} \quad nh + \frac{1}{h^4} = 2 n^{4/5}.$$
Thus it follows that, for some $d > 0$,
$$\sup_{f \in \mathcal{F}} E^X_f\big(\bar f_n(x) - f(x)\big)^2 \ge \frac{c_2}{2 n^{4/5}} = \frac{d}{n^{4/5}}.$$