
The Evil of Superefficiency
P. Stoica, B. Ottersten
To appear as a Fast Communication in Signal Processing
IR-S3-SB-9633
ROYAL INSTITUTE OF TECHNOLOGY, Department of Signals, Sensors & Systems, Signal Processing, S-100 44 STOCKHOLM
KUNGL TEKNISKA HÖGSKOLAN, Institutionen för Signaler, Sensorer & System, Signalbehandling, 100 44 STOCKHOLM

The Evil of Superefficiency

P. Stoica, Systems and Control Group, Uppsala University, P.O. Box 27, S-751 03 Uppsala, Sweden
B. Ottersten¹, Department of Signals, Sensors and Systems, Royal Institute of Technology, S-100 44 Stockholm, Sweden

Abstract: We discuss the intriguing notion of statistical superefficiency in a straightforward manner, with a minimum of formality. We point out that for any given parameter estimator there exist other estimators which have a strictly lower asymptotic variance and hence are statistically more efficient than the former. In particular, if the former estimator was statistically efficient (in the sense that its asymptotic variance was equal to the Cramer-Rao bound), then the latter estimators could be called "superefficient". Among other things, the phenomenon of superefficiency implies that asymptotically there exists no uniformly minimum-variance parameter estimator.

Key words: Efficiency, Minimum Variance, Cramer-Rao Bound

1 Introductory Remarks and Finite-Sample Superefficiency

The notion of statistical efficiency plays a central role in the theory of parameter estimation. Usually an estimator $\hat{\theta}_N$ of a (true and unknown) parameter vector $\theta$ is called statistically efficient if its mean square error (MSE) matrix attains the unbiased Cramer-Rao bound (CRB):

$$ C_{\hat{\theta}_N} \stackrel{\rm def}{=} E\{ (\hat{\theta}_N - \theta)(\hat{\theta}_N - \theta)^T \} = C_{\rm CRB} \qquad (1) $$

¹ Corresponding author. Currently visiting at ArrayComm Inc., 3141 Zanker Rd, San Jose, CA 95134, Phone (408) 952-1854, Email otterste@s3.kth.se.

Hereafter $E\{\cdot\}$ stands for the expectation operator and $N$ denotes the number of data samples.

Remark: As a historical aside, we remark that what is now known as the CRB inequality was apparently discovered, for the single-parameter case, by Doob [1] and rediscovered in a neater manner by Frechet [2]. Darmois [3], Cramer [4] and Rao [5] presented generalizations of the CRB inequality to the multi-parameter case. In particular the CRB derivation by Cramer in [4] is a masterpiece that is worth reading even nowadays.

It is well known that in finite samples (i.e. for $N < \infty$) parameter estimators which satisfy (1) exist only under very restrictive conditions (see, e.g., [6] and the references therein). At any rate, whenever those conditions are satisfied, the maximum likelihood estimator (MLE) satisfies (1). Nevertheless, even in such cases it is misleading to call the MLE "statistically efficient", since parameter estimators with lower MSE may well exist. The latter estimators, which must necessarily be biased, might be said to be "superefficient" with respect to the MLE.

A first example of an estimator statistically more efficient than the MLE was given by Stein (see [6] and the references therein). Let $\{y_t\}_{t=1}^N$ denote a sequence of independent and identically distributed Gaussian random vectors with mean $\theta$. Also let

$$ \hat{\theta} = \frac{1}{N} \sum_{t=1}^N y_t . \qquad (2) $$

It is well known that $\hat{\theta}$ above is the MLE of $\theta$. The covariance matrix of the estimation errors in $\hat{\theta}$ is readily shown to equal the CRB, and hence $\hat{\theta}$ can be said to be "statistically efficient" according to the usual terminology. Stein's estimator of $\theta$ is given by

$$ \tilde{\theta} = \left( 1 - \frac{n-2}{N \|\hat{\theta}\|^2} \right) \hat{\theta} \qquad \text{for } n = \dim(\theta) \ge 3 , \qquad (3) $$

where $\|\cdot\|$ denotes the vector Euclidean norm. By a somewhat involved calculation (see, e.g., [6]) it is possible to show that $\tilde{\theta}$ has a lower MSE than $\hat{\theta}$ and hence than the CRB. More exactly,

$$ E\|\hat{\theta} - \theta\|^2 - E\|\tilde{\theta} - \theta\|^2 = \frac{(n-2)^2}{N^2} \, E\|\hat{\theta}\|^{-2} . \qquad (4) $$

A simpler example of an estimator more efficient than the MLE was presented in [7].
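As a numerical illustration (not from the paper), Stein's shrinkage of the sample mean in (2)-(3) can be checked by Monte Carlo simulation. The choices below (n = 5, N = 10, true mean zero, identity covariance) are illustrative assumptions only:

```python
import numpy as np

# Monte Carlo sketch of the Stein example (2)-(4): for i.i.d. N(theta, I)
# vectors with n = dim(theta) >= 3, the shrinkage estimator (3) has a lower
# MSE than the sample-mean MLE (2). All sizes here are illustrative.
rng = np.random.default_rng(0)
n, N, trials = 5, 10, 200_000
theta = np.zeros(n)                          # true mean (illustrative choice)

y = rng.standard_normal((trials, N, n)) + theta
theta_ml = y.mean(axis=1)                    # MLE, eq. (2)

shrink = 1.0 - (n - 2) / (N * np.sum(theta_ml**2, axis=1, keepdims=True))
theta_st = shrink * theta_ml                 # Stein's estimator, eq. (3)

mse_ml = np.mean(np.sum((theta_ml - theta)**2, axis=1))   # ~ n/N = 0.5 (= trace of CRB)
mse_st = np.mean(np.sum((theta_st - theta)**2, axis=1))   # strictly smaller
print(mse_ml, mse_st)
```

At the true value chosen here the improvement is largest; the inequality holds for every true mean, but the gap shrinks as the mean moves away from the origin.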
Consider the scenario above with $\theta = 0$ and $n = 1$, and let $\sigma^2$ denote

the variance of $\{y_t\}$. It is well known that the MLE of $\sigma^2$ is given by

$$ \hat{\sigma}^2 = \frac{1}{N} \sum_{t=1}^N y_t^2 \qquad (5) $$

and also that its MSE (or variance) is equal to $C_{\rm CRB} = 2\sigma^4/N$. Let

$$ \tilde{\sigma}^2 = \frac{1}{N+2} \sum_{t=1}^N y_t^2 . \qquad (6) $$

A straightforward calculation (see, e.g., [7]) shows that the MSE of (6) is $2\sigma^4/(N+2)$, which is always less than the MSE of the MLE and hence than $C_{\rm CRB}$.

The conclusion of the previous discussion about the finite-sample case is that calling an estimator, such as the MLE, "statistically efficient" whenever it achieves $C_{\rm CRB}$ is not fully justified, as more efficient estimators (that is, estimators with MSE less than $C_{\rm CRB}$) may exist. In the finite-sample case, discussed so far, the existence of such "superefficient" estimators is easily understood and hence accepted. These estimators trade bias for variance in such a way that their MSE becomes lower than $C_{\rm CRB}$. In the asymptotic case, which is discussed in the next section, the existence of superefficient estimators is more intriguing and hence more difficult to accept. At an intuitive level the existence of such estimators should apparently be related to the discussion on biased estimation in the previous paragraphs. More exactly, even though asymptotically superefficient estimators that are asymptotically unbiased do exist (as we will see shortly), such estimators must be biased for $N < \infty$. Indeed, under regularity conditions the covariance matrix of an unbiased estimator must satisfy the CRB inequality for all values of $N$ (including $N \to \infty$, after proper normalization), and hence such an estimator cannot be "superefficient" (i.e. it cannot have an asymptotic covariance matrix less than the asymptotic CRB).

The phenomenon of statistical superefficiency has apparently been ignored in the engineering literature (with the notable exception of [8]). The discussion of asymptotic statistical superefficiency in this note is based on [6] and is meant to introduce the concept and its main consequences to the signal processing community.
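The MSE comparison between the divisors N and N + 2 in (5)-(6) is easy to verify numerically; the sketch below (not from the paper) uses an illustrative sample size and unit variance:

```python
import numpy as np

# Monte Carlo check of (5)-(6): for zero-mean Gaussian data, dividing the
# sum of squares by N + 2 instead of N lowers the MSE of the variance
# estimate from 2*sigma^4/N to 2*sigma^4/(N + 2). N, sigma^2 and the trial
# count are illustrative choices.
rng = np.random.default_rng(1)
N, sigma2, trials = 10, 1.0, 400_000

s = np.sum(sigma2 * rng.standard_normal((trials, N))**2, axis=1)  # sum of y_t^2

mse_mle    = np.mean((s / N       - sigma2)**2)   # theory: 2/10  = 0.2000
mse_biased = np.mean((s / (N + 2) - sigma2)**2)   # theory: 2/12 ~ 0.1667
print(mse_mle, mse_biased)
```

Here N + 2 is exactly the divisor that minimizes the MSE over all estimators of the form c * sum(y_t^2), which is why the improvement is uniform in sigma^2.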

2 Asymptotic superefficiency

Let $\Theta$ denote the compact set of possible parameter values, and let $\theta \in \Theta$ be the true (and unknown) parameter vector. Furthermore, let $\hat{\theta}_N$ denote the MLE of $\theta$. Under weak regularity conditions the normalized estimation error $\sqrt{N}(\hat{\theta}_N - \theta)$ converges in distribution to a Gaussian random vector with zero mean and covariance matrix equal to the asymptotic normalized CRB, that is,

$$ \sqrt{N}(\hat{\theta}_N - \theta) \xrightarrow{d} N(0, \bar{C}_{\rm CRB}) \qquad (7) $$

where $\bar{C}_{\rm CRB} = \lim_{N\to\infty} N C_{\rm CRB}$ (with $C_{\rm CRB}$ as defined before). The statistical and signal processing literature contains many examples of cases where (7) holds true.² Let $\tilde{\theta}_N$ denote any other estimator of $\theta$ which is such that, similarly to (7),

$$ \sqrt{N}(\tilde{\theta}_N - \theta) \xrightarrow{d} N(0, C_{\tilde{\theta}_N}) . \qquad (8) $$

Since both $\hat{\theta}_N$ and $\tilde{\theta}_N$ above are asymptotically unbiased and consistent (we assume that the matrices $\bar{C}_{\rm CRB}$ and $C_{\tilde{\theta}_N}$ are finite), one might think that the following inequality (which is reminiscent of the CRB inequality for unbiased estimators) should hold:

$$ C_{\tilde{\theta}_N} \ge \bar{C}_{\rm CRB} \quad \text{for any } \theta \in \Theta . \qquad (9) $$

(Note that both sides in (9) usually depend on $\theta$, but to simplify the notation we have written $C_{\tilde{\theta}_N}$ in lieu of $C_{\tilde{\theta}_N}(\theta)$, etc.) Fisher himself conjectured that (9) should hold true, and hence that the MLE should asymptotically be the minimum-variance estimator in $\Theta$. However, (9) does not hold. In fact, for any given parameter estimator (MLE included) there exist other estimators which have a strictly lower asymptotic variance at some points in $\Theta$, and which are thus statistically more efficient. In particular, if the former estimator is asymptotically statistically efficient (such as is the MLE), in the sense that the covariance matrix of its asymptotic distribution attains $\bar{C}_{\rm CRB}$, then the latter estimators can be called asymptotically statistically superefficient.

² The normalization may be different from that used in (7); for instance, in sinusoidal parameter estimation problems the normalizing factor may be $N^{3/2}$, but this variation is of minor importance for the discussion that follows.
We also stress that the following discussion does not depend on the assumption that the asymptotic distribution is Gaussian, as in (7).
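The normalization in (7) can be made concrete with the variance MLE of Section 1: the variance of the scaled error sqrt(N)*(estimate - truth) settles at the asymptotic CRB regardless of N. This is a sketch under illustrative choices of N, sigma^2 and trial count, not an example from the paper:

```python
import numpy as np

# Empirical look at (7) for the variance MLE (5): the variance of
# sqrt(N)*(sigma2_hat - sigma2) stays near the asymptotic CRB 2*sigma^4
# for every sample size. sigma2_hat = (1/N) * sum(y_t^2) has the same
# distribution as sigma2 * chi2_N / N, which we sample directly.
rng = np.random.default_rng(2)
sigma2, trials = 1.0, 100_000

norm_var = {}
for N in (50, 200, 800):
    sigma2_hat = sigma2 * rng.chisquare(N, trials) / N
    norm_var[N] = np.var(np.sqrt(N) * (sigma2_hat - sigma2))
print(norm_var)   # each entry ~ 2*sigma^4 = 2
```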

The phenomenon of superefficiency was discovered by Hodges and presented as a counterexample to Fisher's conjecture on (9) (see, e.g., [6] and the references therein). It implies that asymptotically there exists no uniformly minimum-variance parameter estimator. This strong negative result is sometimes said to be due to the "evil of superefficiency" [6] (which has inspired the title of this note).

To motivate the claims above we will make use of a generic example patterned after [6]. (We stress that what follows is just an example of how to obtain a superefficient estimator; no "general rules" for deriving such estimators appear to be available.) Let $\theta_g$ denote a fixed point in $\Theta$, and let $\hat{\theta}_N$ be any given estimator of $\theta \in \Theta$ satisfying (7). Consider the following estimator associated with $\hat{\theta}_N$:

$$ \check{\theta}_N = \begin{cases} \hat{\theta}_N & \text{if } \|\hat{\theta}_N - \theta_g\| > N^{-1/4} \\ \theta_g & \text{if } \|\hat{\theta}_N - \theta_g\| \le N^{-1/4} \end{cases} \qquad (10) $$

The asymptotic distribution of $\check{\theta}_N$ is readily derived. Let $\theta \ne \theta_g$ (we assume that the difference $(\theta - \theta_g)$ does not depend on $N$, a rather reasonable condition). Then

$$ \text{prob}(\|\check{\theta}_N - \hat{\theta}_N\| > 0) = \text{prob}(\|\hat{\theta}_N - \theta_g\| \le N^{-1/4}) = \text{prob}(\|N^{1/2}(\hat{\theta}_N - \theta) + N^{1/2}(\theta - \theta_g)\| \le N^{1/4}) \to 0 \quad \text{as } N \to \infty , \qquad (11) $$

which implies that $(\check{\theta}_N - \hat{\theta}_N)$ converges in probability to zero:

$$ (\check{\theta}_N - \hat{\theta}_N) \xrightarrow{p} 0 \quad \text{as } N \to \infty . \qquad (12) $$

It follows from (12) and some standard stochastic convergence results (see, e.g., Prop. 6.3.3 in [9]) that $\check{\theta}_N$ and $\hat{\theta}_N$ have the same asymptotic distribution for $\theta \ne \theta_g$. In particular,

$$ C_{\check{\theta}_N} = C_{\hat{\theta}_N} \quad \text{at } \theta \ne \theta_g . \qquad (13) $$

Next, let $\theta = \theta_g$. Then

$$ \text{prob}(\|\check{\theta}_N - \theta\| > 0) = \text{prob}(\|N^{1/2}(\hat{\theta}_N - \theta)\| > N^{1/4}) \to 0 \quad \text{as } N \to \infty \qquad (14) $$

and hence, in this case, $\check{\theta}_N$ converges both in distribution and in probability to $\theta$ (see Prop. 6.3.5 in [9]). Consequently, the asymptotic variance of $\check{\theta}_N$ is zero at $\theta = \theta_g$. In conclusion, $\check{\theta}_N$ has the same asymptotic variance as $\hat{\theta}_N$ at all points $\theta \ne \theta_g$ in $\Theta$, and a strictly smaller asymptotic variance at $\theta = \theta_g$. The proof that for any estimator $\hat{\theta}_N$ one can obtain an estimator $\check{\theta}_N$ with lower asymptotic variance and the same type of asymptotic law is thus concluded.

It is interesting to note that the subsets of $\Theta$ on which one can asymptotically beat the CRB can be shown to have zero measure (a result attributed to LeCam; see [8] and the references therein). In view of this fact we may wonder whether superefficiency has any practical relevance. The theoretical importance of this concept definitely dominates its practical importance. However, this does not mean that statistical superefficiency has no potential practical relevance. Consider, for instance, the problem of detecting a signal of unknown amplitude $\theta$ in noisy measurements. A usual way to solve this problem is to obtain an estimate of $\theta$ (let us say $\hat{\theta}_N$) as well as of the standard deviation of $\hat{\theta}_N$ (say $\hat{\sigma}_N$). Then one decides that the signal is not present (i.e. $\theta = 0$) if $|\hat{\theta}_N| \le \mu \hat{\sigma}_N$ (for some constant $\mu$), and that it is present (i.e. $\theta \ne 0$) otherwise. In general $\hat{\sigma}_N$ is on the order of $N^{-1/2}$, and the aforementioned rule can be shown to be inconsistent. A simple idea to obtain a consistent detection rule would be to make use of the "superefficient" estimate in (10) with $\theta_g = 0$. Based on $\check{\theta}_N$ we simply decide that the signal is not present (is present) whenever $\check{\theta}_N = 0$ ($|\check{\theta}_N| > 0$). Under the hypothesis that $\theta = 0$ we have (cf. (14))

$$ \text{prob}(|\check{\theta}_N| > 0) \to 0 \quad \text{as } N \to \infty \qquad (15) $$

and under the assumption that $\theta \ne 0$ (and constant) we get (cf. (11))

$$ \text{prob}(\check{\theta}_N = 0) = \text{prob}(|\hat{\theta}_N| \le N^{-1/4}) \to 0 \quad \text{as } N \to \infty , \qquad (16) $$

which proves the consistency of the detection rule based on $\check{\theta}_N$.
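The behaviour of Hodges' construction (10) is easy to see numerically for the mean of Gaussian data with theta_g = 0. The sketch below (illustrative N, true values and trial count, not from the paper) shows the normalized variance matching the MLE's away from theta_g and collapsing at theta_g:

```python
import numpy as np

# Numerical sketch of Hodges' estimator (10) for the mean of N(theta, 1)
# data with theta_g = 0. The sample mean of N such observations has the
# same distribution as theta + Z/sqrt(N), which we sample directly.
# N * var should be ~ 1 (the CRB) for the MLE at both points, and nearly 0
# for the Hodges estimator at theta = theta_g. All sizes are illustrative.
rng = np.random.default_rng(3)
N, trials, theta_g = 400, 100_000, 0.0

nvar = {}
for theta in (0.0, 1.0):
    theta_hat = theta + rng.standard_normal(trials) / np.sqrt(N)
    theta_chk = theta_hat.copy()                       # eq. (10):
    theta_chk[np.abs(theta_hat - theta_g) <= N**-0.25] = theta_g
    nvar[theta] = (N * np.var(theta_hat), N * np.var(theta_chk))
print(nvar)
```

Note the zero-measure caveat above: the improvement is confined to the single point theta_g, which is why the construction has little practical value as an estimator.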
Observe that, in terms of the original estimate $\hat{\theta}_N$, the detection rule based on $\check{\theta}_N$ amounts to comparing $\hat{\theta}_N$ with a threshold on the order of $N^{-1/4}$ (in lieu of the order $N^{-1/2}$ mentioned above).
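The consistency of this detection rule can be checked by simulation: with theta_g = 0, declare "signal present" iff |theta_hat_N| > N^(-1/4), and watch both error probabilities shrink with N. The amplitude 0.5 and the sample sizes below are illustrative assumptions:

```python
import numpy as np

# Sketch of the N^(-1/4)-threshold detection rule based on (10) with
# theta_g = 0. The sample mean of N unit-variance observations is simulated
# directly as theta + Z/sqrt(N). Both error probabilities vanish as N grows.
rng = np.random.default_rng(4)
trials, amp = 50_000, 0.5                 # amp: signal amplitude when present

err = {}
for N in (100, 1600):
    thr = N**-0.25
    noise_only  = rng.standard_normal(trials) / np.sqrt(N)
    with_signal = amp + rng.standard_normal(trials) / np.sqrt(N)
    p_fa   = np.mean(np.abs(noise_only)  > thr)    # false alarm (theta = 0)
    p_miss = np.mean(np.abs(with_signal) <= thr)   # miss (theta = amp != 0)
    err[N] = (p_fa, p_miss)
print(err)
```

A fixed mu * N^(-1/2) threshold would instead keep the false-alarm probability at a constant level for all N, which is the inconsistency noted above.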

3 Concluding Remarks

Owing to the existence of superefficiency we should in principle avoid saying that an estimator which asymptotically achieves the (asymptotic) CRB is (uniformly) asymptotically efficient. However, there exists a considerable statistical and signal processing literature making use of such phrasing. It may therefore be confusing and difficult to attempt changing the terminology. A solution to this dilemma is provided by [6]. It is shown there that there exist parameter estimators which, under weak conditions, are asymptotically minimax optimal in the sense that they yield the lowest possible value of the following loss function:

$$ \lim_{N\to\infty} \sup_{\theta \in \Theta} E\|\hat{\theta}_N - \theta\|^2 \qquad (17) $$

By an abuse of terminology such estimators can be called "asymptotically statistically efficient" [6]. Under regularity conditions the MLE is such an estimator. Hence we can continue saying that the MLE is "asymptotically statistically efficient", but at the same time we should be aware that such a statement is not true in Fisher's sense of possessing minimum asymptotic variance at each point in the parameter set.

References

[1] J.L. Doob, Statistical estimation, Trans. American Math. Soc., 39 (1936) 410-421.
[2] M. Frechet, Sur l'extension de certaines evaluations statistiques au cas de petits echantillons, Revue Inst. Int. de Stat., 11 (1943) 182-205.
[3] G. Darmois, Sur les limites de la dispersion de certaines estimations, Revue Inst. Int. de Stat., 13 (1945) 9-15.
[4] H. Cramer, A contribution to the theory of statistical estimation, Skand. Aktuarietidskrift, 29 (1946) 85-94.
[5] C.R. Rao, Minimum variance and the estimation of several parameters, Proc. Cambridge Phil. Soc., 43 (1946) 280-283.
[6] I.A. Ibragimov and R.Z. Has'minskii, Statistical Estimation - Asymptotic Theory (Springer-Verlag, New York, 1981).
[7] P. Stoica and R. Moses, On biased estimators and the unbiased Cramer-Rao bound, Signal Processing, 29 (1991) 344-350.

[8] G.R. Benitz, Asymptotic results for maximum likelihood estimation with an array of sensors, IEEE Trans. Info. Theory, 39 (1993) 1374-1385.
[9] P. Brockwell and R. Davis, Time Series - Theory and Methods, 2nd edition (Springer-Verlag, New York, 1991).