Spread, estimators and nuisance parameters


Bernoulli 3(3), 1997, 323–328

Spread, estimators and nuisance parameters

EDWIN R. VAN DEN HEUVEL and CHRIS A.J. KLAASSEN
Department of Mathematics and Institute for Business and Industrial Statistics, University of Amsterdam, Plantage Muidergracht 24, 1018 TV Amsterdam, The Netherlands

A general spread inequality for arbitrary estimators of a one-dimensional parameter is given. This finite-sample inequality yields bounds on the distribution of estimators in the presence of finite- or infinite-dimensional nuisance parameters.

Keywords: finite-sample inequality; nuisance parameters; spread

Consider an arbitrary family $\mathcal{P}$ of probability distributions $P$ on a measurable space $(\mathcal{X}, \mathcal{A})$ and assume that this family is dominated by a $\sigma$-finite measure $\mu$ on $(\mathcal{X}, \mathcal{A})$. Let $\nu$ be a functional from $\mathcal{P}$ to the real line, and let $X$ be a random variable which takes values in $\mathcal{X}$ and has its probability distribution $P$ in $\mathcal{P}$. We are interested in estimation of $\nu(P)$ by an estimator $T = t(X)$ based on $X$.

We will consider the distribution of $c(P)(T - \nu(P))$ under $P$, where $c$ is a functional from $\mathcal{P}$ to $(0, \infty)$. This distribution may be anything and therefore very little can be said about it. On the other hand, it cannot be arbitrarily much concentrated for several possible $P \in \mathcal{P}$ simultaneously. To make this claim more precise, we will consider an average distribution function, where the average is taken over the family $\mathcal{P}$ of probability distributions,

$$G(y) = \int_{\mathcal{P}} P(c(P)(T - \nu(P)) \le y)\, d\tilde{W}(P), \qquad y \in \mathbb{R}, \qquad (1)$$

with the weight function $\tilde{W}$ a probability measure on the measurable space $(\mathcal{P}, \mathscr{P})$. As stated in our Theorem 1 below, the distribution function $G$ in (1) is at least as spread out as a certain distribution function $K$, notation $G \ge_1 K$. This means that any two quantiles of $G$ are at least as far apart as the corresponding quantiles of $K$, i.e.

$$G^{-1}(u) - G^{-1}(v) \ge K^{-1}(u) - K^{-1}(v), \qquad 0 < v \le u < 1, \qquad (2)$$

where the quantile function $F^{-1}$ is defined by $F^{-1}(t) = \inf\{y : F(y) \ge t\}$. The partial ordering of one-dimensional distribution functions by spread has been introduced by Bickel and Lehmann (1979). The distribution function $K$ depends only on $\mathcal{P}$, $\tilde{W}$, $\nu(\cdot)$ and $c(\cdot)$, but not on $t(\cdot)$. Therefore, $K$ is a bound on the average distribution of an arbitrary estimator $T$ according to the ordering of spread. This so-called spread inequality (2) may be used to derive local asymptotic minimax

To whom correspondence should be addressed. e-mail: vdheuvel@fwi.uva.nl

1350–7265 © 1997 Chapman & Hall

results; see Remark 1 below. Bootstrapping both the distribution of an estimator and the bound $K$, the performance of this estimator may be evaluated. For estimators of location this topic has been studied by Venetiaan (1994). The spread inequality also sharpens global Cramér–Rao inequalities; see Corollary 2.1 of Klaassen (1984) and formulae (2.4.2) and (2.4.29) of van den Heuvel (1996).

To prove the general spread inequality we will rewrite the distribution function $G$ in (1). To that end, assume that the functional $\nu$ is $(\mathscr{P}, \mathcal{B})$-measurable, with $\mathcal{B}$ the Borel $\sigma$-field on $\mathbb{R}$, and $c$ is $(\mathscr{P}, \mathcal{U})$-measurable, with $\mathcal{U}$ an arbitrary $\sigma$-field on $(0, \infty)$, and define the probability measure $W$ on the measurable space $(\mathbb{R} \times (0, \infty), \mathcal{B} \times \mathcal{U})$ by

$$W(B \times U) = \tilde{W}((\nu(P), c(P)) \in B \times U), \qquad \text{for every } B \in \mathcal{B},\ U \in \mathcal{U}.$$

Then the distribution function $G$ may be rewritten as

$$G(y) = \int_{\mathbb{R} \times (0,\infty)} \int_{\mathcal{P}_{\theta,z}} P(z(T - \theta) \le y)\, d\tilde{W}(P \mid (\nu(P), c(P)) = (\theta, z))\, dW(\theta, z), \qquad (3)$$

where $\tilde{W}(\cdot \mid (\nu(P), c(P)) = (\theta, z))$ is the conditional probability measure of $\tilde{W}$ given $(\nu(P), c(P)) = (\theta, z)$ and $\mathcal{P}_{\theta,z}$ is the set of probability measures such that $(\nu(P), c(P)) = (\theta, z)$. If we assume that $\tilde{W}(\cdot \mid \nu(P), c(P))$ is a regular version of the conditional probability measure we may define the probability measure $P_{\theta,z}$ on $(\mathcal{X}, \mathcal{A})$ by

$$P_{\theta,z}(A) = \int_{\mathcal{P}_{\theta,z}} P(A)\, d\tilde{W}(P \mid (\nu(P), c(P)) = (\theta, z)), \qquad A \in \mathcal{A},\ \theta \in \nu(\mathcal{P}),\ z \in c(\mathcal{P}). \qquad (4)$$

Note that $P_{\theta,z}$ is absolutely continuous with respect to $\mu$, and denote its density by $p(\cdot \mid \theta, z)$. The distribution function $G$ may be written as

$$G(y) = \int_{\mathbb{R} \times (0,\infty)} P_{\theta,z}(z(T - \theta) \le y)\, dW(\theta, z). \qquad (5)$$

If $(W, Z)$ is a random vector with probability distribution $W$, then $G$ may be viewed as the distribution of $Z(t(X) - W)$, where the distribution of $X$ given $(W, Z) = (\theta, z)$ is defined in (4).

Consequently, the estimation problem is completely described by the joint distribution of $(X, W, Z)$. To obtain a bound on the distribution function $G$ in (5) it suffices to show that $G$ has a density $g$ of the form

$$g(y) = E\, S\, 1_{(y,\infty)}(Z(t(X) - W)), \qquad y \in \mathbb{R}, \qquad (6)$$

where $S$ is a random variable based on the random vector $(X, W, Z)$. In fact, $S$ is the score statistic defined in (11) below. The bound $K$ for distribution functions $G$ satisfying (6) is defined by its inverse

$$K^{-1}(u) = \int_{1/2}^{u} \left( \int_{1-s}^{1} H^{-1}(t)\, dt \right)^{-1} ds, \qquad 0 < u < 1, \qquad (7)$$

with $H$ the distribution function of the score statistic $S$. The validity of this lower bound is shown by rewriting (6) as

$$g(G^{-1}(s)) = \int_0^1 \psi(t)\, H^{-1}(t)\, dt, \qquad 0 < s < 1, \qquad (8)$$

with

$$\psi(t) = E\big(1_{(G^{-1}(s),\infty)}(Z(t(X) - W)) \mid S = H^{-1}(t)\big), \qquad 0 < t < 1,$$

and by minimizing the right-hand side of (8) over all (critical) functions $\psi$, $0 \le \psi \le 1$, satisfying $1 - s = \int_0^1 \psi(t)\, dt$.

To obtain relation (6) a global $L_1$-differentiability condition on the density of $(X, W, Z)$ suffices. The random vector $(X, W, Z)$ has density $f(x, \theta, z) = p(x \mid \theta, z)\, w(\theta, z)$ on $\mathcal{X} \times \mathbb{R} \times (0, \infty)$ with respect to the measure $\mu \times \text{Lebesgue} \times \lambda$ if $W$ has a density $w$ with respect to the $\sigma$-finite measure $\text{Lebesgue} \times \lambda$ on the measurable space $(\mathbb{R} \times (0, \infty), \mathcal{B} \times \mathcal{U})$. If there exists a function $h \in L_1(f)$ such that

$$\int_{\mathcal{X} \times \mathbb{R} \times (0,\infty)} \big| f(x, \theta + \varepsilon/z, z) - f(x, \theta, z) - \varepsilon\, h(x, \theta, z) f(x, \theta, z) \big|\; d(\mu \times \text{Lebesgue} \times \lambda)(x, \theta, z) = o(\varepsilon) \qquad (9)$$

holds, then relation (6) is valid with $S = h(X, W, Z)$. In our Theorem 1 we will give sufficient conditions for relation (9), and the proof of this theorem shows how (9) implies (6) and hence our spread inequality.

Theorem 1. Let $\tilde{W}$ be a probability measure on the measurable space $(\mathcal{P}, \mathscr{P})$, $\nu$ a functional from $\mathcal{P}$ to $\mathbb{R}$ and $c$ a functional from $\mathcal{P}$ to $(0, \infty)$. Let $\tilde{W}(\cdot \mid (\nu(P), c(P)))$ be a regular version of the conditional probability measure given $(\nu(P), c(P))$, define the probability measure $W$ on the measurable space $(\mathbb{R} \times (0,\infty), \mathcal{B} \times \mathcal{U})$ by $W(B \times U) = \tilde{W}((\nu(P), c(P)) \in B \times U)$, $B \in \mathcal{B}$, $U \in \mathcal{U}$, and assume that $W$ has density $w$ with respect to the $\sigma$-finite measure $\text{Lebesgue} \times \lambda$ on $(\mathbb{R} \times (0,\infty), \mathcal{B} \times \mathcal{U})$. Furthermore, let the probability measure $P_{\theta,z}$ be defined by (4) and assume that it has a density $p(\cdot \mid \theta, z)$ with respect to a $\sigma$-finite measure $\mu$ on $(\mathcal{X}, \mathcal{A})$. If the function $\theta \mapsto f(x, \theta, z) = p(x \mid \theta, z)\, w(\theta, z)$ is absolutely continuous with respect to Lebesgue measure on $\mathbb{R}$ with Radon–Nikodym derivative $\dot{f}(x, \theta, z)$, for $\mu \times \lambda$-almost all $(x, z) \in \mathcal{X} \times (0, \infty)$, if $\dot{f}(x, \theta, z)$ is $\mu \times \text{Lebesgue} \times \lambda$-measurable and if

$$\int_{\mathcal{X} \times \mathbb{R} \times (0,\infty)} z^{-1} |\dot{f}(x, \theta, z)|\, d\mu(x)\, d\theta\, d\lambda(z) < \infty \qquad (10)$$

holds, then the distribution function $G$ in (1) has density $g$ satisfying (6) with $S$ equal to

$$S = Z^{-1} \frac{\dot{f}}{f}(X, W, Z). \qquad (11)$$

This implies that $G$ is at least as spread out as $K$ (see (2)), where $K$ is defined in (7) and $H$ is the distribution function of $S$ in (11).

Proof. Since $\theta \mapsto f(x, \theta, z)$ is absolutely continuous on $\mathbb{R}$ for $\mu \times \lambda$-almost all $(x, z) \in \mathcal{X} \times (0, \infty)$, we obtain

$$\limsup_{\varepsilon \to 0} \int \big| \varepsilon^{-1}\{ f(x, \theta + \varepsilon/z, z) - f(x, \theta, z) \} \big|\, d\mu(x)\, d\theta\, d\lambda(z)$$

$$\le \limsup_{\varepsilon \to 0} \int \varepsilon^{-1} \int_{\theta}^{\theta + \varepsilon/z} |\dot{f}(x, t, z)|\, dt\, d\mu(x)\, d\theta\, d\lambda(z) \qquad (12)$$
$$= \int z^{-1} \int_{\mathbb{R}} |\dot{f}(x, t, z)|\, dt\, d\mu(x)\, d\lambda(z) < \infty,$$

in view of (10). From

$$1 - G(y + \delta) = \int 1_{(y,\infty)}(z(t(x) - \theta))\, f(x, \theta - \delta/z, z)\, d\mu(x)\, d\theta\, d\lambda(z), \qquad y \in \mathbb{R},$$

inequality (12) and Vitali's theorem it follows that $G$ is differentiable with derivative $g$ satisfying

$$g(y) = \lim_{\delta \to 0} \int 1_{(y,\infty)}(z(t(x) - \theta))\, \delta^{-1}\{ f(x, \theta, z) - f(x, \theta - \delta/z, z) \}\, d\mu(x)\, d\theta\, d\lambda(z)$$
$$= \int 1_{(y,\infty)}(z(t(x) - \theta))\, z^{-1} \dot{f}(x, \theta, z)\, d\mu(x)\, d\theta\, d\lambda(z).$$

By Lemma 1.2.2 of Klaassen (1981), we obtain that $\{\theta \in \mathbb{R} : f(x, \theta, z) = 0,\ \dot{f}(x, \theta, z) \ne 0\}$ is a Lebesgue null set, for $\mu \times \lambda$-almost all $(x, z) \in \mathcal{X} \times (0, \infty)$. This yields relation (6) with $S$ given in (11). Now following the proof of Theorem 1.1 of Klaassen (1989a) from here on we obtain our result, via an argument as in the Neyman–Pearson lemma, as indicated below formula (7) above. □

Theorem 1 is a generalization of Theorem 1.1 of Klaassen (1989a). The requirement of the latter that $p(x \mid \theta, z)$ and $w(\theta, z)$ should both be absolutely continuous in $\theta$ is replaced here by the slightly weaker condition that $p(x \mid \theta, z)\, w(\theta, z)$ is absolutely continuous in $\theta$. Taking $\mathcal{P}$ to be one-dimensional parametric with parameter $\theta$ and taking the random variable $c(P) = Z$ to be degenerate at the constant $a$ we obtain Theorem 1.1 of Klaassen (1989a). Our formulation is also more general in the sense that models with nuisance parameters are incorporated. In fact, both parametric models $\mathcal{P} = \{P_{\theta,\eta} : \theta \in \Theta,\ \eta \in H\}$, $\Theta \subset \mathbb{R}$, $H \subset \mathbb{R}^k$, and semiparametric models $\mathcal{P} = \{P_{\theta,F} : \theta \in \Theta,\ F \in \mathcal{F}\}$, $\Theta \subset \mathbb{R}$, for $\mathcal{F}$ a set of distribution functions $F$, are included. This means that a spread inequality for arbitrary estimators of $\theta$ in the presence of finite- or infinite-dimensional (unknown) nuisance parameters is contained in Theorem 1. We will illustrate this via a parametric model $\mathcal{P} = \{P_{\theta,\eta} : \theta \in \Theta,\ \eta \in H\}$ in the following example.

Example 1. Let $X_1, X_2, \ldots, X_n$ be independent identically normally distributed random variables with mean $\theta$ and standard deviation $\eta$.

Our model $\mathcal{P}$ is parametric with parameter space $\mathbb{R} \times (0, \infty)$, hence weight functions $\tilde{W}$ on $\mathcal{P}$ can be defined via distributions $W$ on $\mathbb{R} \times (0, \infty)$. Here we choose $\nu(P) = \theta$ and $c(P) = \sqrt{n}/\eta = z$. The problem is estimating the parameter $\theta$ by an estimator $T = t(X_1, X_2, \ldots, X_n)$ in the presence of the nuisance parameter $\eta$. This problem is determined by $(X, W, Z)$, where the conditional distribution of $X = (X_1, X_2, \ldots, X_n)^T$ given $(W, Z) = (\theta, z)$ has density

$$p(x \mid \theta, z) = \prod_{i=1}^{n} \frac{z}{\sqrt{n}}\, \phi\!\left(\frac{z}{\sqrt{n}}(x_i - \theta)\right), \qquad x = (x_1, x_2, \ldots, x_n)^T \in \mathbb{R}^n,$$

with $\phi$ the standard normal density. Let the distribution $W_n = W$ of $(W, Z)$ be such that the corresponding weight function in $(\theta, \eta)$ is independent of the sample size $n$, with the conditional distribution of $W$ given $\eta$ normal with mean zero and variance $\eta^2 \sigma^2$. More specifically, let $W_n$ have density

$$w_n(\theta, z) = \sigma^{-1} z^{-1}\, \phi(\sigma^{-1} n^{-1/2} z \theta)\, w(n^{1/2} z^{-1})$$

with respect to Lebesgue measure on $\mathbb{R} \times (0, \infty)$, with $w$ a fixed density on $(0, \infty)$. Now, the score statistic $S$ in (11) becomes

$$S = \frac{Z}{n} \sum_{i=1}^{n} (X_i - W) - \frac{Z}{\sigma^2 n}\, W.$$

The conditional distribution of $S$ given $Z = z$ is equal to a normal distribution with mean zero and variance $1 + \sigma^{-2} n^{-1}$, and hence $H(y) = \Phi((1 + \sigma^{-2} n^{-1})^{-1/2}\, y)$. Furthermore,

$$\int_{1-s}^{1} H^{-1}(t)\, dt = (1 + \sigma^{-2} n^{-1})^{1/2}\, \phi(\Phi^{-1}(s))$$

and therefore $K(y) = \Phi((1 + \sigma^{-2} n^{-1})^{1/2}\, y)$. Consequently, Theorem 1 yields

$$G(\cdot) \ge_1 \Phi((1 + \sigma^{-2} n^{-1})^{1/2}\, \cdot\,), \qquad (13)$$

where

$$G(y) = \int_0^{\infty} \int_{-\infty}^{\infty} P_{\theta,z}\big(z(t(X_1, X_2, \ldots, X_n) - \theta) \le y\big)\, \frac{1}{\sigma z}\, \phi\!\left(\frac{z\theta}{\sqrt{n}\,\sigma}\right) w\!\left(\frac{\sqrt{n}}{z}\right) d\theta\, dz. \qquad (14)$$

Note that $T_n = t(X_1, X_2, \ldots, X_n) = (n + \sigma^{-2})^{-1} \sum_{i=1}^{n} X_i$ is optimal, in the sense that equality is attained in (13). Consequently, our spread inequality (2) (see also Theorem 1) is sharp in this case. However, the conditional distribution of $Z(T_n - W)$ given $(W, Z) = (\theta, z)$ is normal with mean $-z(1 + \sigma^2 n)^{-1}\theta$ and variance $(1 + \sigma^{-2} n^{-1})^{-2}$. This means that the optimal estimator $T_n$ is biased, but asymptotically consistent. The biasedness is caused by the weight function or prior $w_n(\theta, z) = \sigma^{-1} z^{-1} \phi(\sigma^{-1} n^{-1/2} z \theta)\, w(n^{1/2} z^{-1})$, which is not uninformative. If we let $\sigma$ tend to infinity then the prior tends to the uninformative prior, i.e. uninformative with respect to $\theta$. Furthermore, $T_n = (n + \sigma^{-2})^{-1} \sum_{i=1}^{n} X_i$ converges to the sample mean $n^{-1} \sum_{i=1}^{n} X_i$ as $\sigma \to \infty$. Indeed, the sample mean normed by $Z$, i.e. $Z(n^{-1} \sum_{i=1}^{n} X_i - W)$, is standard normal, as is the bound $K$ in the limit.

Remark 1. Local asymptotic lower bounds may be obtained from Theorem 1 by choosing the weight function appropriately and by subsequently taking limits for sample size tending to infinity. For parametric models of $n$ observations $\mathcal{P}_n = \{P^{(n)}_{\theta,\eta} : \theta \in \Theta,\ \eta \in H\}$, $\Theta, H \subset \mathbb{R}$, we first identify weight functions $\tilde{W}_n$ on $\mathcal{P}_n$ via distributions $W_n$ on $\Theta \times H$, as in Example 1. Let $W_n$ have density

$$a_n^2\, w(a_n(\theta - \theta_0),\, a_n(\eta - \eta_0)), \qquad (15)$$

where $w$ is a density on $\mathbb{R}^2$, $a_n = c(P^{(n)}_{\theta,\eta})$ is independent of $\theta$ and $\eta$, and $a_n \to \infty$ as $n \to \infty$. Under the same kind of conditions as in Theorem 4.1 of Klaassen (1989b) this yields a local asymptotic spread inequality at the parameter point $(\theta_0, \eta_0)$. The corresponding asymptotic

bound is of the form (7), where $H$ is the convolution of the limit distribution function of the efficient score function with respect to the parameter of interest $\theta$ and the nuisance parameter $\eta$ with another distribution function. Choosing $w$ appropriately and taking suitable limits we obtain a bound $K$ which in local asymptotic normality situations is the same as in the convolution theorem and the local asymptotic minimax theorem. For semiparametric models $\mathcal{P}_n = \{P^{(n)}_{\theta,\eta} : \theta \in \Theta,\ \eta \in H\}$, $\Theta \subset \mathbb{R}$, $H$ infinite-dimensional, an asymptotic spread inequality at $(\theta_0, \eta_0)$ may be obtained by deriving spread inequalities for two-dimensional submodels through $(\theta_0, \eta_0)$ and subsequently maximizing these bounds over all possible submodels. For more details and examples of local asymptotic spread inequalities for parametric and semiparametric models see Sections 2.5, 2.6 and 4.2 of van den Heuvel (1996), and for a review of the theory of efficient estimation in semiparametric models based on the convolution theorem see Bickel et al. (1993).

Acknowledgements

We would like to thank the referees for their helpful comments. The research of the first author was financed by the Netherlands Organization for the Advancement of Scientific Research (NWO).

References

Bickel, P.J. and Lehmann, E.L. (1979) Descriptive statistics for nonparametric models. IV. Spread. In J. Jurečková (ed.), Contributions to Statistics: Jaroslav Hájek Memorial Volume, pp. 33–40. Prague: Academia.

Bickel, P.J., Klaassen, C.A.J., Ritov, Y. and Wellner, J.A. (1993) Efficient and Adaptive Estimation for Semiparametric Models. Baltimore, MD: Johns Hopkins University Press.

Klaassen, C.A.J. (1981) Statistical Performance of Location Estimators. Mathematical Centre Tract 133. Amsterdam: Mathematisch Centrum.

Klaassen, C.A.J. (1984) Location estimators and spread. Ann. Statist., 12, 311–321.

Klaassen, C.A.J. (1989a) Estimators and spread. Ann. Statist., 17, 859–867.

Klaassen, C.A.J. (1989b) The asymptotic spread of estimators. J. Statist. Plann. Inference, 23, 267–285.

van den Heuvel, E.R. (1996) Bounds for statistical estimation in semiparametric models. Ph.D. thesis, University of Amsterdam.

Venetiaan, S.A. (1994) Bootstrap bounds. Ph.D. thesis, University of Amsterdam.

Received November 1994 and revised September 1996
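Example 1 lends itself to a quick numerical check. The following minimal Monte Carlo sketch (assuming `numpy`; the prior density $w$ for $\eta$ and the values of $n$ and $\sigma$ are arbitrary illustrative choices, not prescribed by the article) simulates $(W, Z, X)$ as in the example and verifies two of its claims: the normalized error $Z(T_n - W)$ of the optimal estimator $T_n$ has variance $(1 + \sigma^{-2} n^{-1})^{-1}$, the variance of the bound $K$, while the sample mean's normalized error is standard normal and therefore at least as spread out as $K$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, reps = 10, 2.0, 200_000   # arbitrary sample size, prior scale, replications

# Nuisance parameter eta drawn from an arbitrary fixed density w on (0, inf);
# here log-uniform on [0.5, 2] purely for illustration.
eta = np.exp(rng.uniform(np.log(0.5), np.log(2.0), reps))
z = np.sqrt(n) / eta                          # Z = c(P) = sqrt(n)/eta
w_loc = rng.normal(0.0, eta * sigma)          # W | eta ~ N(0, eta^2 sigma^2)
x = rng.normal(w_loc[:, None], eta[:, None], (reps, n))  # X_i | (W, eta) ~ N(W, eta^2)

t_n = x.sum(axis=1) / (n + sigma ** -2)       # optimal estimator T_n of Example 1
err_opt = z * (t_n - w_loc)                   # normalized error Z(T_n - W)

target_var = 1.0 / (1.0 + 1.0 / (sigma ** 2 * n))   # variance of the bound K
print(err_opt.var(), target_var)              # should nearly agree: equality in (13)

# Spread inequality for another estimator: quantiles of the sample mean's G
# must be at least as far apart as those of K (10%-90% interquantile range).
err_mean = z * (x.mean(axis=1) - w_loc)       # exactly standard normal
g_range = np.quantile(err_mean, 0.9) - np.quantile(err_mean, 0.1)
k_range = np.sqrt(target_var) * 2.5631        # Phi^{-1}(0.9) - Phi^{-1}(0.1)
print(g_range, k_range)
```

With these choices $K$ has variance $1/(1 + 1/40) = 40/41 \approx 0.976$, so the simulated variance of $Z(T_n - W)$ should match it to within Monte Carlo error, and `g_range` should exceed `k_range`, illustrating the ordering in (2).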