Commentary on Basu (1956)

Robert Serfling
University of Texas at Dallas
March 2010

Prepared for the forthcoming Selected Works of Debabrata Basu (Anirban DasGupta, Ed.), Springer Series on Selected Works in Probability and Statistics.

Author's address: Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas 75083-0688, USA. Email: serfling@utdallas.edu. Website: www.utdallas.edu/~serfling. Support by NSF Grant DMS-0805786 and NSA Grant H98230-08-1-0106 is gratefully acknowledged.

Asymptotic relative efficiency of estimators

For statistical estimation problems, it is typical and even desirable that more than one reasonable estimator can arise for consideration. One natural and time-honored approach for choosing an estimator is simply to compare the sample sizes at which the competing estimators meet a given standard of performance. This depends upon the chosen measure of performance and upon the particular population distribution $F$.

For example, we might compare the sample mean versus the sample median for location estimation. Consider a distribution function $F$ with density function $f$ symmetric about an unknown point $\theta$ to be estimated. For $\{X_1, \dots, X_n\}$ a sample from $F$, put $\bar X_n = n^{-1}\sum_{i=1}^n X_i$ and $\mathrm{Med}_n = \mathrm{median}\{X_1, \dots, X_n\}$. Each of $\bar X_n$ and $\mathrm{Med}_n$ is a consistent estimator of $\theta$, in the sense of convergence in probability to $\theta$ as the sample size $n \to \infty$. To choose between these estimators we need further information about their performance. In this regard, one key aspect is efficiency, which answers:

Question A. How concentrated about $\theta$ is the sampling distribution of $\hat\theta$?

Criteria for asymptotic relative efficiency

Variance as a measure of performance

A simple and natural criterion relative to the above question is the variance of the sampling distribution: the smaller this variance, the more efficient is that estimator.
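As a concrete finite-sample illustration of this criterion, consider the following minimal simulation sketch (added here for concreteness, not part of the original commentary; it assumes numpy is available, and the helper name `sampling_sd` is ours): under a normal population the mean is the more concentrated estimator, while under a heavy-tailed population the ordering reverses.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 100, 20_000

def sampling_sd(draw):
    """Monte Carlo standard deviations of the sample mean and sample median,
    over `reps` samples of size n generated by `draw`."""
    samples = draw((reps, n))
    return samples.mean(axis=1).std(), np.median(samples, axis=1).std()

# Normal population centered at theta = 0: the mean is more concentrated
# (the sd ratio mean/median is about sqrt(2/pi), roughly 0.80).
print(sampling_sd(lambda size: rng.normal(0.0, 1.0, size)))

# Heavy-tailed population (Student t with 3 df): now the median wins.
print(sampling_sd(lambda size: rng.standard_t(3, size)))
```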

In this regard, let us consider large-sample sampling distributions.

For $\bar X_n$, the classical central limit theorem tells us: if $F$ has finite variance $\sigma_F^2$, then the sampling distribution of $\bar X_n$ is approximately $N(\theta, \sigma_F^2/n)$, i.e., Normal with mean $\theta$ and variance $\sigma_F^2/n$. For $\mathrm{Med}_n$, a similar classical result [10] tells us: if the density $f$ is continuous and positive at $\theta$, then the sampling distribution of $\mathrm{Med}_n$ is approximately $N(\theta, 1/(4[f(\theta)]^2 n))$. On this basis, we consider $\bar X_n$ and $\mathrm{Med}_n$ to perform equivalently at respective sample sizes $n_1$ and $n_2$ if

$$\frac{\sigma_F^2}{n_1} = \frac{1}{4[f(\theta)]^2\, n_2}.$$

Keeping in mind that these sampling distributions are only approximations assuming that $n_1$ and $n_2$ are large, we define the asymptotic relative efficiency (ARE) of Med to $\bar X$ as the large-sample limit of the ratio $n_1/n_2$, i.e.,

$$\mathrm{ARE}(\mathrm{Med}, \bar X, F) = 4[f(\theta)]^2 \sigma_F^2. \qquad (1)$$

For example, at $F = N(\theta, \sigma^2)$ we have $f(\theta) = 1/\sqrt{2\pi\sigma^2}$, so (1) evaluates to $2/\pi \approx 0.64$: at the normal model the median requires roughly $1.57$ times as many observations as the mean.

More generally, for any parameter $\eta$ of a distribution $F$, and for estimators $\hat\eta^{(1)}$ and $\hat\eta^{(2)}$ which are approximately $N(\eta, V_1(F)/n)$ and $N(\eta, V_2(F)/n)$, respectively, the ARE of $\hat\eta^{(2)}$ to $\hat\eta^{(1)}$ is given by

$$\mathrm{ARE}(\hat\eta^{(2)}, \hat\eta^{(1)}, F) = \frac{V_1(F)}{V_2(F)}. \qquad (2)$$

Interpretation: if $\hat\eta^{(2)}$ is used with a sample of size $n$, the number of observations needed for $\hat\eta^{(1)}$ to perform equivalently is $\mathrm{ARE}(\hat\eta^{(2)}, \hat\eta^{(1)}, F) \times n$.

In view of the asymptotic normal distribution underlying the above formulation of ARE in estimation, we may also characterize the ARE given by (2) as the limiting ratio of sample sizes at which the lengths of the associated confidence intervals at approximate level $100(1-\alpha)\%$,

$$\hat\eta^{(i)} \pm \Phi^{-1}\!\left(1 - \frac{\alpha}{2}\right)\sqrt{\frac{V_i(F)}{n_i}}, \quad i = 1, 2,$$

converge to 0 at the same rate, holding fixed the coverage probability $1 - \alpha$. (In practice, of course, consistent estimates of $V_i(F)$, $i = 1, 2$, are used in forming the CI.) The treatment of ARE for consistent asymptotically normal estimators using the variance criterion had long been well established by the 1950s; see [1] for a string of references.

Probability concentration as a measure

Instead of comparison of asymptotic variance parameters as a criterion, one may quite naturally compare the probability concentrations of the estimators in any $\varepsilon$-neighborhood of the target parameter $\eta$: $P(|\hat\eta^{(i)} - \eta| > \varepsilon)$, $i = 1, 2$.
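Such probabilities typically vanish at an exponential rate. As a standard worked illustration (added here, not from the original commentary): for the sample mean $\bar X_n$ of an i.i.d. sample from $N(\eta, 1)$, we have exactly $P(|\bar X_n - \eta| > \varepsilon) = 2\,\Phi(-\varepsilon\sqrt n)$, whence the Gaussian tail approximation gives

$$\log P\left(|\bar X_n - \eta| > \varepsilon\right) \sim -\frac{n\,\varepsilon^2}{2}, \quad n \to \infty,$$

so in this case the decay exponent is $\varepsilon^2/2$, for every $\eta$.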

When we have

$$\log P\left(|\hat\eta_n^{(i)} - \eta| > \varepsilon\right) \sim -n\,\gamma^{(i)}(\varepsilon, \eta), \quad i = 1, 2,$$

as is typical, then the ratio of sample sizes $n_1/n_2$ at which these concentration probabilities converge to 0 at the same rate is given by $\gamma^{(2)}(\varepsilon, \eta)/\gamma^{(1)}(\varepsilon, \eta)$, which then represents another ARE measure for the efficiency of estimator $\hat\eta_n^{(2)}$ relative to $\hat\eta_n^{(1)}$. This entails approximation of the sampling distribution in the tails. Accordingly, instead of central limit theory, the relevant tool is large deviation theory, which is rather more formidable.

In the context of hypothesis testing, Chernoff [3] argued that when the sample size approaches infinity it is appropriate to minimize both Type I and Type II error probabilities, rather than minimizing one with the other held fixed. He developed an ARE index essentially based on tail probability approximations. See also [10, §1.15.4] for general discussion.

How compatible are these two criteria?

Those who have been fortunate enough to observe D. Basu in action, as I was when we were colleagues at Florida State University in the early 1970s, know his talent for inquiring into the boundaries of any good idea. Relative to the present context, when the variance and probability concentration criteria were just becoming established in the 1950s, stemming from somewhat differing orientations, it was Basu who thought of exploring their compatibility. Basu [2] provides an example in which not only do the variance-based and concentration-based measures disagree on which estimator is better, but they do so in the most extreme sense: one ARE is infinite at every choice of $F$ in a given class $\Omega$, while the other ARE is zero for every such $F$. Basu's construction is straightforward and worth discussing, so we briefly examine some details.

For $X_1, \dots, X_n$ an i.i.d. sample from $N(\mu, 1)$, put

$$\bar X_n = n^{-1}\sum_{i=1}^n X_i \quad \text{and} \quad S_n = \sum_{i=1}^n (X_i - \bar X_n)^2.$$

Basu defines the estimation sequences for $\mu$ given by $T = \{t_n\}$ and $T' = \{t'_n\}$, with

$$t_n = (1 - H_n)\bar X_n + nH_n \quad \text{and} \quad t'_n = \bar X_{[\sqrt n\,]},$$

where $H_n = 1$ if $S_n > a_n$ and $0$ otherwise, and $a_n$ satisfies $P(S_n > a_n) = 1/n$. He shows that $\sqrt n\,(t_n - \mu) \xrightarrow{d} N(0, 1)$. Since also $n^{1/4}(t'_n - \mu) \xrightarrow{d} N(0, 1)$, it follows that the ARE according to (2) is given by

$$\mathrm{ARE}(t'_n, t_n, N(\mu, 1)) = \lim_{n\to\infty} \frac{n^{-1}}{n^{-1/2}} = 0. \qquad (3)$$

He also shows that the corresponding ARE based on concentration probabilities, for any fixed choice of $\varepsilon$, is given by

$$\lim_{n\to\infty} \frac{n^{-1}}{o(n^{-1})} = \infty. \qquad (4)$$

An immediate observation about this example is that it is not pathological. Rather, it employs ordinary ingredients characteristic of typical application contexts.
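Indeed, the construction is easy to simulate. The following minimal Monte Carlo sketch (ours, not from Basu's paper; it assumes numpy and scipy are available, and computes $a_n$ from the exact $\chi^2_{n-1}$ law of $S_n$) exhibits both phenomena at once: each normalized estimator is approximately $N(0,1)$, reproducing the rates behind (3), while the $\varepsilon$-exceedance probability of $t_n$ is of order $1/n$ and that of $t'_n$ is far smaller, reproducing (4).

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
mu, n, reps = 0.0, 400, 20_000

# Under N(mu, 1), S_n ~ chi-square(n - 1), so this a_n gives P(S_n > a_n) = 1/n.
a_n = chi2.ppf(1 - 1 / n, df=n - 1)

x = rng.normal(mu, 1.0, size=(reps, n))
xbar = x.mean(axis=1)
s_n = ((x - xbar[:, None]) ** 2).sum(axis=1)
h_n = s_n > a_n                                  # H_n = 1 with probability 1/n

t_n = np.where(h_n, float(n), xbar)              # t_n = (1 - H_n) xbar + n H_n
t_pr = x[:, : int(np.sqrt(n))].mean(axis=1)      # t'_n = mean of first [sqrt(n)] obs

# Central 95% ranges are both close to (-1.96, 1.96), but at different rates:
# sqrt(n) for t_n versus only n^(1/4) for t'_n.
print(np.quantile(np.sqrt(n) * (t_n - mu), [0.025, 0.975]))
print(np.quantile(n ** 0.25 * (t_pr - mu), [0.025, 0.975]))

# Tail behavior reverses the ranking: for eps = 1, the exceedance probability
# of t_n is about 1/n = 0.0025, while that of t'_n (a Gaussian tail at
# sqrt(20) standard deviations) is negligible.
eps = 1.0
print(np.mean(np.abs(t_n - mu) > eps), np.mean(np.abs(t_pr - mu) > eps))
```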

Another important aspect is that the disagreement between the two notions of ARE is as extreme as possible. Not merely differing with respect to whether the ARE is $< 1$ or $> 1$, here one version is infinite at every choice of $F$ in the class $\Omega = \{N(\mu, 1) : -\infty < \mu < \infty\}$, while the other version is zero for every such $F$. The details of proof yield the interesting corollary that (4) also gives the concentration-probability ARE of $t_n$ versus simply $\bar X_n$. Thus the estimator which is optimal under the variance ARE criterion is infinitely nonoptimal under the concentration-probability ARE criterion.

A slight variation on Basu's $\{t_n\}$ provides an example of a superefficient estimator, similar to that of J. L. Hodges (see Le Cam [5]) discussed in Lehmann and Casella [6]. Put $t^*_n = A(1 - H_n)\bar X_n + nH_n$ for some constant $A \neq 1$. Then, since $t^*_n - \mu = A(\bar X_n - \mu) + (A - 1)\mu$ apart from the rare event $H_n = 1$, we have

$$\sqrt n\,(t^*_n - \mu) \xrightarrow{d} N(0, A^2) + \lim_{n\to\infty} \sqrt n\,\mu(A - 1),$$

i.e., $\sqrt n\,(t^*_n - \mu)$ converges to $\pm\infty$ if $\mu \neq 0$ and otherwise converges in distribution to $N(0, A^2)$. Therefore, in the case that $\mu = 0$ and $|A| < 1$, the estimator $t^*_n$ outperforms the optimal estimator. See Lehmann and Casella [6] for useful discussion of superefficiency.

We see that the content of Basu's example, like all of his contributions to statistical thinking, reflects great ingenuity and insight applied very productively to useful purposes.

Subsequent developments

The impact of Basu [2] thus has been to motivate stronger interest in large deviation (LD) approaches to ARE. For example, Bahadur [1] follows up with a deep discussion of this approach along with many constructive ideas. Quite a variety of LD and related moderate deviation approaches are discussed in Serfling [10, Chap. 10]. More recently, Puhalskii and Spokoiny [9] provide an extensive treatment of the LD approach in statistical inference. For convenient elementary overviews of ARE in estimation and testing, see DasGupta [4], Serfling [11], and Nikitin [8], for example.

Acknowledgment

Very helpful suggestions of Anirban DasGupta are greatly appreciated and have been used to improve the manuscript. Also, support by NSF Grant DMS-0805786 and NSA Grant H98230-08-1-0106 is gratefully acknowledged.

References

[1] Bahadur, R. R. (1960). Asymptotic relative efficiency of tests and estimates. Sankhyā 22, 229–252.

[2] Basu, D. (1956). On the concept of asymptotic relative efficiency. Sankhyā 17, 193–196.

[3] Chernoff, H. (1952). A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics 23, 493–507.

[4] DasGupta, A. (1998). Asymptotic relative efficiency. In Encyclopedia of Biostatistics, Vol. 1 (P. Armitage and T. Colton, Eds.), 210–215. John Wiley, New York.

[5] Le Cam, L. (1953). On some asymptotic properties of maximum likelihood estimates and related Bayes estimates. University of California Publications in Statistics 1, 277–330.

[6] Lehmann, E. L. and Casella, G. (1998). Theory of Point Estimation, 2nd edition. Springer, New York.

[7] Nikitin, Y. (1995). Asymptotic Efficiency of Nonparametric Tests. Cambridge University Press, Cambridge.

[8] Nikitin, Y. (2010). Asymptotic relative efficiency in testing. In International Encyclopedia of Statistical Science (M. Lovric, Ed.). Springer.

[9] Puhalskii, A. and Spokoiny, V. (1998). On large-deviation efficiency in statistical inference. Bernoulli 4, 203–272.

[10] Serfling, R. (1980). Approximation Theorems of Mathematical Statistics. John Wiley & Sons, New York.

[11] Serfling, R. (2010). Asymptotic relative efficiency in estimation. In International Encyclopedia of Statistical Science (M. Lovric, Ed.). Springer.