On the Efficiency of Maximum-Likelihood Estimators of Misspecified Models

2017 25th European Signal Processing Conference (EUSIPCO)

Mahamadou Lamine Diong, Eric Chaumette and François Vincent
University of Toulouse - ISAE-Supaero, Toulouse, France
Emails: mouhamadou.diong@isae.fr, eric.chaumette@isae.fr, francois.vincent@isae.fr

Abstract: The key results on maximum-likelihood (ML) estimation of misspecified models have been introduced by statisticians (P.J. Huber, H. Akaike, H. White, Q. H. Vuong) resorting to a general probabilistic formalism somewhat difficult to rephrase into the formalism widespread in the signal processing literature. In particular, Vuong proposed two misspecified Cramér-Rao bounds (CRBs) to address, respectively, the situation where the true parametric probability model is known, or not known. In this communication, derivations of the existing results on the accuracy of ML estimation of misspecified models are outlined in an easily comprehensible manner. Simple alternative derivations of these two misspecified CRBs, based on the seminal work of Barankin which underlies all the lower bounds introduced in deterministic estimation, are provided. Since two distinct CRBs exist when the true parametric probability model is known, a quasi-efficiency denomination is introduced.

I. INTRODUCTION

Since its introduction by R.A. Fisher in deterministic estimation [1][2], the method of maximum likelihood (ML) estimation has become one of the most widely used methods of estimation. The ongoing success of ML estimators (MLEs) originates from the fact that, under reasonably general conditions on the probabilistic observation model [1][2], the MLEs are, in the limit of large sample support, Gaussian distributed and consistent. Additionally, if the observation model is Gaussian, some additional asymptotic regions of operation yielding, for a subset of MLEs, Gaussian distributed and consistent estimates have also been identified at finite sample support [3][4].
This work has been partially supported by the DGA/MRIS (2015.60.0090.00.470.75.01).

However, a fundamental assumption underlying the above classical results on the properties of MLEs is that the probability distribution which determines the behavior of the observations is known to lie within a specified parametric family of probability distributions (the probability model). In other words, the probability model is assumed to be correctly specified. Actually, in many if not most circumstances, a certain amount of mismatch between the true probability distribution of the observations and the probability model that we assume is present. As a consequence, it is natural to investigate what happens to the properties of the MLE if the probability model is misspecified, i.e. not correctly specified. Huber [5] explored in detail the performance of MLEs under very general assumptions on misspecification, proved consistency and asymptotic normality, and derived the MLE's asymptotic covariance, which is often referred to as the Huber's sandwich covariance in the literature. However, Huber did not explicitly discuss the information theoretic interpretation of this limit. This interpretation has been emphasized by Akaike [6], who observed that when the true distribution is unknown, the MLE is a natural estimator for the parameters which minimize the Kullback-Leibler information criterion (KLIC) between the true and the assumed probability model. Then, White [7] provided simple conditions under which the MLE is a strongly consistent estimator for the parameter vector which minimizes the KLIC. While not as general as Huber's conditions, White's conditions are however sufficiently general to have broad applicability. Lastly, Q. H. Vuong [8] proposed two misspecified Cramér-Rao bounds (CRBs) to address, respectively, the situation where the true parametric probability model is known, or not known, under a general probabilistic formalism involving regular and semi-regular parametric models. Therefore, the purpose of this communication is twofold.
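Firstly, in order to foster the understanding of the works of Huber, Akaike and White on misspecified MLEs [5][6][7], derivations of the key results are outlined in an easily comprehensible manner. Secondly, following the lead of Barankin's seminal work in deterministic estimation [9], simple alternative derivations of the two misspecified CRBs introduced by Vuong are put forward. As a by-product, the misspecified CRB proposed by Vuong [8, Theorem 4.1] when the true parametric probability model is unknown is a least-upper CRB in the Barankin sense, which coincides with the Huber's sandwich covariance and the so called misspecified CRB under ML constraints in [10, (42)] or misspecified CRB for misspecified-unbiased estimators in [11, (5)]. Last, since two distinct misspecified CRBs exist when the true parametric probability model is known, a quasi-efficiency denomination is introduced.

ISBN 978-0-9928626-7-1, EURASIP 2017

A. Notations and assumptions

Let $x_l$ be an $M$-dimensional complex random vector representing the outcome of a random experiment (i.e. the observation vector) whose probability density function (p.d.f.) is known to belong to a family $\mathcal{P}$. A structure $S$ is a set of hypotheses which implies a unique p.d.f. in $\mathcal{P}$ for $x_l$. Such p.d.f. is indicated with $p(x_l; S)$. The set of all the a priori possible structures is called a probability model [2]. We assume that the p.d.f. of the random vector $x_l$ has a parametric representation, i.e. we assume that every structure $S$ is parameterized by a $P$-dimensional vector $\psi$, that is $p(x_l; \psi) \triangleq p(x_l; S(\psi))$, and that the model is described by a compact subspace $U \subset \mathbb{R}^P$. In the following we consider $L$ i.i.d. observations $\{x_l\}_{l=1}^{L}$ for which the true parametric p.d.f., denoted $p_\delta(x_l) \triangleq p(x_l; \delta)$, $\delta \in U_\delta \subset \mathbb{R}^{P_\delta}$, and the assumed parametric p.d.f., denoted $f_\theta(x_l) \triangleq f(x_l; \theta)$, $\theta \in U_\theta \subset \mathbb{R}^{P_\theta}$, belong to two generally different families of p.d.f.'s, $\mathcal{P}_\delta$ and $\mathcal{F}_\theta$. Let us denote:

$$E_\theta[g(x)] \triangleq E_{f_\theta}[g(x)] = \int g(x)\, f_\theta(x)\, dx, \qquad E_\delta[g(x)] \triangleq E_{p_\delta}[g(x)] = \int g(x)\, p_\delta(x)\, dx,$$

where $x^T = (x_1^T, \ldots, x_L^T)$, $f_\theta(x) = \prod_{l=1}^{L} f_\theta(x_l)$ and $p_\delta(x) = \prod_{l=1}^{L} p_\delta(x_l)$. If the true model is unknown or not needed, i.e. we do not have, or do not need, prior information on the particular parameterization of the true distribution, we refer to $p_\delta(x)$ and $\mathcal{P}_\delta$ only as $p(x) = \prod_{l=1}^{L} p(x_l)$ and $\mathcal{P}$, respectively, and we denote:

$$E_p[g(x)] = \int g(x)\, p(x)\, dx.$$

II. ML ESTIMATION OF MISSPECIFIED MODELS

As mentioned in the introduction, several authors [5][6][7] have contributed to show that, under mild regularity conditions given in [7] and summarized in [11, Section II.A], the misspecified MLE (MMLE) defined as:

$$\hat{\theta}(x) = \arg\max_{\theta}\{f_\theta(x)\} = \arg\max_{\theta}\{\ln f_\theta(x)/L\} \tag{1a}$$

is, in the limit of large sample support, a strongly consistent estimator for the parameter vector $\theta_f$ which minimizes the KLIC:

$$\hat{\theta}(x) \overset{a.s.}{\to} \theta_f = \arg\min_{\theta}\{D(p\,\|\,f_\theta)\}, \qquad D(p\,\|\,f_\theta) = E_p\!\left[\ln\frac{p(x_l)}{f_\theta(x_l)}\right]. \tag{1b}$$

The value $\theta_f$ is called the pseudo-true parameter of $\theta$ for the model $f_\theta(x_l)$ when $p(x_l)$ is the true p.d.f. Indeed, as noticed in [6], since

$$\frac{\ln f_\theta(x)}{L} = \frac{1}{L}\sum_{l=1}^{L} \ln f_\theta(x_l) \overset{a.s.}{\to} E_p[\ln f_\theta(x_l)] \quad \text{(strong law of large numbers)},$$

$\hat{\theta}(x)$ is in general a natural estimator of:

$$\theta_f = \arg\max_{\theta}\{E_p[\ln f_\theta(x_l)]\} = \arg\min_{\theta}\left\{E_p\!\left[\ln\frac{p(x_l)}{f_\theta(x_l)}\right]\right\},$$

which can be proved to be strongly consistent under mild regularity conditions given in [7]. Therefore, the gradient of the ML objective function (1a) can be well approximated via a first order Taylor series expansion about $\theta_f$:

$$\hat{\theta}(x) - \theta_f \overset{a.s.}{\simeq} -\left(\frac{\partial^2 \ln f(x;\theta_f)}{\partial\theta\,\partial\theta^T}\right)^{-1} \frac{\partial \ln f(x;\theta_f)}{\partial\theta}, \tag{2a}$$

which, in the limit of large sample support, yields:

$$\hat{\theta}(x) - \theta_f \overset{a.s.}{\simeq} -W_{\theta_f}^{-1}\, \frac{1}{L}\frac{\partial \ln f(x;\theta_f)}{\partial\theta}, \qquad W_{\theta_f} = E_p\!\left[\frac{\partial^2 \ln f(x_l;\theta_f)}{\partial\theta\,\partial\theta^T}\right], \tag{2b}$$

following from a similar argument given by Cramér [1, pp. 500-537]. Hence $\hat{\theta}(x)$ is asymptotically normal [7, Th. 3.2]:

$$\hat{\theta}(x) \overset{A}{\sim} \mathcal{N}(m_{\hat\theta}, C_{\hat\theta}) \tag{3a}$$

$$m_{\hat\theta} \simeq \theta_f - W_{\theta_f}^{-1}\, E_p\!\left[\frac{\partial \ln f(x_l;\theta_f)}{\partial\theta}\right] \tag{3b}$$

$$C_{\hat\theta} \simeq \frac{1}{L}\, W_{\theta_f}^{-1}\, C_\zeta\, W_{\theta_f}^{-1} \tag{3c}$$

where $C_{\hat\theta} = E_p[(\hat{\theta}(x) - m_{\hat\theta})(\hat{\theta}(x) - m_{\hat\theta})^T]$, $m_{\hat\theta} = E_p[\hat{\theta}(x)]$, and $C_\zeta$ is the covariance matrix of the score $\zeta \triangleq \partial \ln f(x_l;\theta_f)/\partial\theta$:

$$C_\zeta = E_p\!\left[\left(\frac{\partial \ln f(x_l;\theta_f)}{\partial\theta} - E_p\!\left[\frac{\partial \ln f(x_l;\theta_f)}{\partial\theta}\right]\right)\left(\frac{\partial \ln f(x_l;\theta_f)}{\partial\theta^T} - E_p\!\left[\frac{\partial \ln f(x_l;\theta_f)}{\partial\theta^T}\right]\right)\right]. \tag{3d}$$

In the particular case of the MMLE (1a)-(1b), and under the regularity conditions summarized in [11, Section II.A], $\theta_f$ is an interior point of $U_\theta$, i.e. a local minimum of the divergence $D(p\,\|\,f_\theta)$, which satisfies:

$$\theta_f = \arg\min_{\theta}\{D(p\,\|\,f_\theta)\} = \arg\max_{\theta}\{E_p[\ln f_\theta(x_l)]\} = \arg_{\theta}\left\{E_p\!\left[\frac{\partial \ln f(x_l;\theta)}{\partial\theta}\right] = 0\right\}. \tag{4a}$$

Then:

$$E_p\!\left[\frac{\partial \ln f(x_l;\theta_f)}{\partial\theta}\right] = 0, \qquad m_{\hat\theta} = E_p[\hat{\theta}(x)] = \theta_f, \qquad C_{\hat\theta} = E_p[(\hat{\theta}(x) - \theta_f)(\hat{\theta}(x) - \theta_f)^T] \tag{4b}$$

and the asymptotic covariance matrix (3c) can be further simplified and reduces to the Huber's sandwich covariance:

$$C_H = \frac{1}{L}\, W_{\theta_f}^{-1}\, E_p\!\left[\frac{\partial \ln f(x_l;\theta_f)}{\partial\theta}\frac{\partial \ln f(x_l;\theta_f)}{\partial\theta^T}\right] W_{\theta_f}^{-1}. \tag{4c}$$

Last, since any covariance matrix satisfies the covariance inequality [2], that is, $\forall \eta(x)$:

$$C_{\hat\theta} \geq E_p[(\hat{\theta}(x) - m_{\hat\theta})\,\eta(x)^T]\; E_p[\eta(x)\,\eta(x)^T]^{-1}\; E_p[\eta(x)\,(\hat{\theta}(x) - m_{\hat\theta})^T], \tag{5a}$$

consequently, according to (3c), almost surely, $\forall \eta(x)$:

$$C_H \geq E_p[(\hat{\theta}(x) - \theta_f)\,\eta(x)^T]\; E_p[\eta(x)\,\eta(x)^T]^{-1}\; E_p[\eta(x)\,(\hat{\theta}(x) - \theta_f)^T], \tag{5b}$$

which is referred to as the Huber's sandwich covariance inequality in the literature on misspecified models.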
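The results (1a)-(4c) can be checked numerically on a small case. The following is a minimal sketch, assuming a scalar Gaussian model whose mean is misspecified and whose variance is estimated (the same kind of model as the illustrative example of Section V); the function names, parameter values and the use of the Python standard library are illustrative assumptions, not part of the original derivation. It compares the empirical mean and variance of the MMLE with the pseudo-true parameter (1b) and the sandwich covariance (4c), and also reports the MSE about the true variance, which should be close to the sandwich covariance plus the squared bias.

```python
import random
import statistics


def mmle_variance(samples, assumed_mean):
    """MMLE (1a) of the variance under the assumed model N(assumed_mean, theta)."""
    return sum((x - assumed_mean) ** 2 for x in samples) / len(samples)


def sandwich_variance(true_mean, true_var, assumed_mean, n_obs):
    """Closed-form pseudo-true parameter (1b) and sandwich covariance (4c)."""
    d2 = (true_mean - assumed_mean) ** 2
    theta_f = true_var + d2                       # minimizer of the KLIC
    w = -1.0 / (2.0 * theta_f ** 2)               # W: expected Hessian of ln f at theta_f
    # E_p[(d ln f / d theta)^2] at theta_f, using Var[(x - m)^2] = 2 s^4 + 4 s^2 d^2
    score_sq = (2.0 * true_var ** 2 + 4.0 * true_var * d2) / (4.0 * theta_f ** 4)
    return theta_f, score_sq / (w * w * n_obs)    # (1/L) W^-1 E[score^2] W^-1


def simulate(true_mean=1.0, true_var=2.0, assumed_mean=0.0,
             n_obs=200, n_trials=2000, seed=0):
    """Monte Carlo mean, variance and MSE (about the true variance) of the MMLE."""
    rng = random.Random(seed)
    sd = true_var ** 0.5
    estimates = [
        mmle_variance([rng.gauss(true_mean, sd) for _ in range(n_obs)], assumed_mean)
        for _ in range(n_trials)
    ]
    mean_est = statistics.fmean(estimates)
    var_est = statistics.pvariance(estimates, mu=mean_est)
    mse_true = statistics.fmean((e - true_var) ** 2 for e in estimates)
    return mean_est, var_est, mse_true


if __name__ == "__main__":
    theta_f, c_h = sandwich_variance(1.0, 2.0, 0.0, 200)
    mean_est, var_est, mse_true = simulate()
    print("pseudo-true parameter:", theta_f, "empirical mean:", mean_est)
    print("sandwich covariance:", c_h, "empirical variance:", var_est)
    print("sandwich + squared bias:", c_h + (theta_f - 2.0) ** 2, "empirical MSE:", mse_true)
```

With these assumed settings the MMLE concentrates around the pseudo-true value rather than the true variance, which is the behavior described by (1b) and (4b). The closed-form moments are specific to this Gaussian case; for another model, the expectations in (4c) would have to be derived or estimated separately.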

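III. CRB FOR MISSPECIFIED MODELS IN THE BARANKIN SENSE

When the true parametric model $p(x;\delta)$ is known and assumed to be correctly specified, all the lower bounds introduced in deterministic estimation [12] have been derived from the seminal work of Barankin. In [9], Barankin established the general form of the greatest lower bound on any $s$th absolute central moment ($s > 1$) of a uniformly unbiased estimator with respect to $p(x;\delta)$, generalizing the earlier works of Cramér, Rao and Bhattacharyya on locally unbiased estimators. Barankin showed [9, Section 6], among other things, that the definition of the CRB can be generalized to any absolute moment as the limiting form of the Hammersley-Chapman-Robbins bound (HaChRB). The general results introduced by Barankin require not only the knowledge of the parameterization of the true distribution $p(x;\delta)$, but also the formulation of a uniform unbiasedness constraint of the form:

$$E_\delta[\hat{g}(x)] = g(\delta), \quad \forall \delta \in U_\delta, \tag{6a}$$

if one wants to derive lower bounds on the MSE of an unbiased estimate $\hat{g}(x)$ of the vector $g(\delta)$ of functions of $\delta$. Since the MMLE $\hat{\theta}(x)$ (1a)-(1b) is, in the limit of large sample support, an unbiased estimate of $\theta_f$ (4b), and since there exists an implicit relationship between $\theta_f$ and $\delta$ (4a):

$$\theta_f(\delta) = \arg_{\theta}\left\{E_\delta\!\left[\frac{\partial \ln f(x_l;\theta)}{\partial\theta}\right] = 0\right\}, \tag{6b}$$

the form of (6a) of interest is therefore:

$$E_\delta[\hat{\theta}(x)] = \theta_f(\delta), \quad \forall \delta \in U_\delta. \tag{6c}$$

Then, according to [9] and in the particular case of the MSE (absolute moment of order 2), the misspecified CRB (MCRB) for unbiased estimates of the pseudo-true parameter $\theta_f$ is defined, for any selected value $\delta$, as the limiting form of the HaChRB obtained from the covariance inequality (5a) where [13]:

$$\eta(x)^T = \left(1,\; \frac{p(x;\delta + u_1 d\delta)}{p(x;\delta)},\; \ldots,\; \frac{p(x;\delta + u_{P_\delta} d\delta)}{p(x;\delta)}\right),$$

$$E_\delta[\hat{\theta}(x)\,\eta(x)^T] = \left(\theta_f,\; \theta_f(\delta + u_1 d\delta),\; \ldots,\; \theta_f(\delta + u_{P_\delta} d\delta)\right), \qquad \theta_f = \theta_f(\delta),$$

$u_p$ is the $p$th column of the identity matrix $I_{P_\delta}$ and $d\delta \to 0$, leading to $C_{\hat\theta} \geq \mathrm{MCRB}_{\theta_f}(\delta)$ where:

$$\mathrm{MCRB}_{\theta_f}(\delta) = \frac{\partial \theta_f(\delta)}{\partial \delta^T}\; E_\delta\!\left[\frac{\partial \ln p(x;\delta)}{\partial\delta}\frac{\partial \ln p(x;\delta)}{\partial\delta^T}\right]^{-1} \left(\frac{\partial \theta_f(\delta)}{\partial \delta^T}\right)^{T}. \tag{7}$$

$\partial\theta_f(\delta)/\partial\delta^T$ can be easily obtained using the following implicit function theorem [14, Theorem 9.28]. Let $h(\theta,\delta) = (h_1(\theta,\delta), \ldots, h_{P_\theta}(\theta,\delta))^T$ be a function of $\mathbb{R}^{P_\theta} \times \mathbb{R}^{P_\delta} \to \mathbb{R}^{P_\theta}$.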
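Let us assume the following:
A1) $h_p(\theta,\delta)$, for $p = 1, \ldots, P_\theta$, are differentiable functions on a neighborhood of the point $(\theta^0, \delta^0)$ in $\mathbb{R}^{P_\theta} \times \mathbb{R}^{P_\delta}$,
A2) $h(\theta^0, \delta^0) = 0$,
A3) the $P_\theta \times P_\theta$ Jacobian matrix of $h(\theta,\delta)$ with respect to $\theta$ is nonsingular at $(\theta^0, \delta^0)$.
Then there is a neighborhood $\Delta$ of the point $\delta^0$ in $\mathbb{R}^{P_\delta}$, there is a neighborhood $\Theta$ of the point $\theta^0$ in $\mathbb{R}^{P_\theta}$, and there is a unique mapping $\phi : \Delta \to \Theta$ such that $\theta^0 = \phi(\delta^0)$ and $h(\phi(\delta), \delta) = 0$ for all $\delta$ in $\Delta$. Furthermore, $\phi(\delta)$ is differentiable at $\delta^0$ and satisfies:

$$\phi(\delta) - \phi(\delta^0) = -\left(\frac{\partial h(\theta^0,\delta^0)}{\partial\theta^T}\right)^{-1} \frac{\partial h(\theta^0,\delta^0)}{\partial\delta^T}\,(\delta - \delta^0) + o(\|\delta - \delta^0\|).$$

In the case addressed (6b): $h(\theta,\delta) = E_\delta[\partial \ln f(x_l;\theta)/\partial\theta]$ and $\phi(\delta) = \theta_f(\delta)$. Then:

$$\frac{\partial \theta_f(\delta)}{\partial\delta^T} = -W_{\theta_f}^{-1}\, E_\delta\!\left[\frac{\partial \ln f(x_l;\theta_f)}{\partial\theta}\frac{\partial \ln p(x_l;\delta)}{\partial\delta^T}\right]$$

and (7) becomes:

$$\mathrm{MCRB}_{\theta_f}(\delta) = \frac{1}{L}\, W_{\theta_f}^{-1}\, E_\delta\!\left[\frac{\partial \ln f(x_l;\theta_f)}{\partial\theta}\frac{\partial \ln p(x_l;\delta)}{\partial\delta^T}\right] E_\delta\!\left[\frac{\partial \ln p(x_l;\delta)}{\partial\delta}\frac{\partial \ln p(x_l;\delta)}{\partial\delta^T}\right]^{-1} E_\delta\!\left[\frac{\partial \ln p(x_l;\delta)}{\partial\delta}\frac{\partial \ln f(x_l;\theta_f)}{\partial\theta^T}\right] W_{\theta_f}^{-1}. \tag{8}$$

Clearly, the Barankin approach provides a simpler alternative derivation of (8) than the one earlier proposed by Vuong [8, Theorem 3.1] under a general probabilistic formalism involving regular and semi-regular parametric models, which is somewhat difficult to follow. Moreover, by the covariance inequality:

$$E_\delta\!\left[\frac{\partial \ln f(x_l;\theta_f)}{\partial\theta}\frac{\partial \ln f(x_l;\theta_f)}{\partial\theta^T}\right] \geq E_\delta\!\left[\frac{\partial \ln f(x_l;\theta_f)}{\partial\theta}\frac{\partial \ln p(x_l;\delta)}{\partial\delta^T}\right] E_\delta\!\left[\frac{\partial \ln p(x_l;\delta)}{\partial\delta}\frac{\partial \ln p(x_l;\delta)}{\partial\delta^T}\right]^{-1} E_\delta\!\left[\frac{\partial \ln p(x_l;\delta)}{\partial\delta}\frac{\partial \ln f(x_l;\theta_f)}{\partial\theta^T}\right] \tag{9a}$$

with equality iff [2]:

$$\frac{\partial \ln f(x_l;\theta_f)}{\partial\theta} = E_\delta\!\left[\frac{\partial \ln f(x_l;\theta_f)}{\partial\theta}\frac{\partial \ln p(x_l;\delta)}{\partial\delta^T}\right] E_\delta\!\left[\frac{\partial \ln p(x_l;\delta)}{\partial\delta}\frac{\partial \ln p(x_l;\delta)}{\partial\delta^T}\right]^{-1} \frac{\partial \ln p(x_l;\delta)}{\partial\delta}. \tag{9b}$$

Therefore, when $p(x;\delta)$ is known, one can assert that in most cases:

$$C_{\hat\theta} \simeq C_H > \mathrm{MCRB}_{\theta_f}(\delta), \tag{10}$$

in other words, in most cases, $\hat{\theta}$ is no longer an efficient estimator of $\theta_f$, in comparison with the correctly specified case [1]. Furthermore, if $p(x;\delta)$ and $f(x;\theta)$ share the same parameterization, i.e. $p(x;\theta)$ and $f(x;\theta)$, then the MSE of $\hat{\theta}$ is:

$$\mathrm{MSE}[\hat{\theta}] = E_p[(\hat{\theta}(x) - \theta)(\hat{\theta}(x) - \theta)^T] \tag{11a}$$

and, in the limit of large sample support, is given by:

$$\mathrm{MSE}[\hat{\theta}] \simeq C_H + (\theta_f - \theta)(\theta_f - \theta)^T. \tag{11b}$$

Therefore, when the true parametric model is known, one can assert that in most cases the MMLE is not a consistent estimator of $\theta$, and whenever it is consistent, it is not an efficient estimator of $\theta$, which contrasts with the behavior of MLEs since, if the MLEs are consistent, then they are also asymptotically efficient [3].

IV. LEAST-UPPER CRB FOR MISSPECIFIED MODELS IN THE BARANKIN SENSE

If the true model is unknown, i.e. we do not have prior information on the particular parameterization of the true distribution $p(x_l)$, the formulation of uniform unbiasedness (6c) is no longer possible. However, the Barankin approach can still be used by building on Vuong's work, where it is shown [8, Theorem 4.1] that, under mild regularity conditions summarized in [11, Section II.A], the following surrogate parametric model:

$$p(x_l) \to p(x_l;\theta) = \frac{p(x_l)}{c(\theta)}\left(1 + \exp\!\left(1 - \frac{f(x_l;\theta_f)}{f(x_l;\theta)}\right)\right), \tag{12}$$

where $c(\theta)$ is a normalizing constant, is a locally least favorable true parametric model in the MSE sense. Indeed, the minimization of the KLIC $D(p\,\|\,f_\theta)$ (1b) at the vicinity of $\theta_f$ yields a locally unbiased estimator of $\theta_f$, allowing for the derivation of the MCRB (8) associated with $p(x_l;\theta)$, which satisfies (9b) at $\theta_f$ since [8, (A.62)]:

$$\frac{\partial \ln f(x_l;\theta_f)}{\partial\theta} = \alpha\, \frac{\partial \ln p(x_l;\theta_f)}{\partial\theta}, \qquad \alpha = 2.$$

Thus the MCRB associated with $p(x_l;\theta)$ coincides at $\theta_f$ with the Huber's sandwich covariance $C_H$ (4c). Therefore, by reference to (5b), the Huber's sandwich covariance appears to be the least-upper MCRB (LUMCRB) for locally unbiased estimates of the pseudo-true parameter $\theta_f$:

$$\mathrm{LUMCRB}_{\theta_f} = C_H \tag{13}$$

both in Vuong and Barankin senses.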
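The gap between the MCRB (8) and the sandwich covariance (4c) can be made concrete on a scalar case. The sketch below assumes a Gaussian true model with known mean and unknown variance, and an assumed Gaussian model with a wrong mean, as in the illustrative example of Section V; the function names and numerical settings are hypothetical, and the closed-form moments used are specific to this Gaussian case. It evaluates the Jacobian given by the implicit function theorem, the MCRB (8) and the sandwich covariance, and checks the cross-moment of the two scores by Monte Carlo.

```python
import random


def misspecified_bounds(true_mean, true_var, assumed_mean, n_obs):
    """Closed-form MCRB (8) and sandwich covariance (4c) for the pseudo-true variance."""
    d2 = (true_mean - assumed_mean) ** 2
    theta_f = true_var + d2                    # pseudo-true parameter
    w = -1.0 / (2.0 * theta_f ** 2)            # W: expected Hessian of ln f at theta_f
    cross = 1.0 / (2.0 * theta_f ** 2)         # E[(d ln f/d theta)(d ln p/d delta)]
    fisher = 1.0 / (2.0 * true_var ** 2)       # per-observation Fisher information of p
    jacobian = -cross / w                      # d theta_f / d delta (implicit function theorem)
    mcrb = jacobian * jacobian / (n_obs * fisher)
    score_sq = (2.0 * true_var ** 2 + 4.0 * true_var * d2) / (4.0 * theta_f ** 4)
    sandwich = score_sq / (w * w * n_obs)
    return mcrb, sandwich


def cross_moment_mc(true_mean, true_var, assumed_mean, n_draws=200_000, seed=1):
    """Monte Carlo estimate of E[(d ln f/d theta)(d ln p/d delta)] at (theta_f, delta)."""
    rng = random.Random(seed)
    theta_f = true_var + (true_mean - assumed_mean) ** 2
    sd = true_var ** 0.5
    total = 0.0
    for _ in range(n_draws):
        x = rng.gauss(true_mean, sd)
        score_f = -0.5 / theta_f + (x - assumed_mean) ** 2 / (2.0 * theta_f ** 2)
        score_p = -0.5 / true_var + (x - true_mean) ** 2 / (2.0 * true_var ** 2)
        total += score_f * score_p
    return total / n_draws


if __name__ == "__main__":
    mcrb, sandwich = misspecified_bounds(1.0, 2.0, 0.0, 200)
    print("MCRB:", mcrb, "sandwich covariance:", sandwich)
    print("cross-moment (Monte Carlo):", cross_moment_mc(1.0, 2.0, 0.0))
```

In this assumed setting the two bounds coincide when the assumed mean equals the true mean, and the sandwich covariance exceeds the MCRB as soon as the mean is misspecified, which is the situation described by (10).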
Another noteworthy point, stressed in [10, Section VII.C], is that since, in the limit of large sample support, the MMLE satisfies [10, (57)]:

$$E_p\!\left[(\hat{\theta}(x) - \theta_f)\,\frac{\partial \ln f(x;\theta_f)}{\partial\theta^T}\right] = -W_{\theta_f}^{-1}\, E_p\!\left[\frac{\partial \ln f(x_l;\theta_f)}{\partial\theta}\frac{\partial \ln f(x_l;\theta_f)}{\partial\theta^T}\right],$$

$C_H$ (4c) is also obtained from (5b) where the score function $\eta(x)$ is defined as $\eta(x) \triangleq \partial \ln f(x;\theta_f)/\partial\theta$. Finally, the so called MCRB under ML constraints [10, (42)] and the so called MCRB for misspecified-unbiased estimators [8, Theorem 4.1][11, (5)] appear to be, in the Barankin sense, the LUMCRB (13).

A. Quasi-efficiency

Interestingly enough, if the true parametric model $p(x_l;\delta)$ is known, the parametric model (12) can still be defined as:

$$p(x_l;\theta) = \frac{p(x_l;\delta)}{c(\theta)}\left(1 + \exp\!\left(1 - \frac{f(x_l;\theta_f)}{f(x_l;\theta)}\right)\right),$$

where $\theta_f = \theta_f(\delta)$, and all the results mentioned above still hold as well. Then the covariance matrix of a locally unbiased estimator of $\theta_f$ may be either equal to the MCRB or to the LUMCRB. In the Barankin sense, the former case defines an efficient estimator. Hence the need to introduce a new denomination for the latter case. We propose to call such an estimator a quasi-efficient estimator. Finally, one can assert that in most cases the MMLE is not a consistent estimator of $\theta$, and whenever it is consistent, it is only a quasi-efficient estimator of $\theta$. As expected, both the MCRB and the LUMCRB reduce to the usual CRB when the model is known to be correctly specified, i.e. $f(x_l;\theta) \triangleq p(x_l;\theta)$, since then $\theta_f = \theta$ and $p(x_l;\delta) = p(x_l;\theta)$, leading to:

$$\mathrm{MCRB} = \mathrm{LUMCRB} = \frac{1}{L}\, E_\theta\!\left[\frac{\partial \ln p(x_l;\theta)}{\partial\theta}\frac{\partial \ln p(x_l;\theta)}{\partial\theta^T}\right]^{-1} = \mathrm{CRB}(\theta),$$

and, in the limit of large sample support, any quasi-efficient estimator becomes an efficient estimator. And last but not least, since the MMLE's asymptotic covariance matrix $C_H$ is available (4c), the derivation of additional lower bounds via the Huber's sandwich inequality (5b) may seem questionable. It is probably the reason why misspecified lower bounds have received little consideration in the literature [8][15], except very recently [10][11].

V. AN ILLUSTRATIVE EXAMPLE

We revisit the problem of the estimation of the variance of Gaussian data in the presence of a misspecified mean value proposed in [11, Section III]. Let us assume to have a set of $L$ i.i.d.
scalar observations $x = (x_1, \ldots, x_L)^T$, distributed according to a Gaussian p.d.f. with a known mean value $m_x$ and an unknown variance $\sigma_x^2$, i.e. $p(x;\theta) = p_{\mathcal{N}}(x; m_x 1_L, \theta I_L)$ with $\theta \triangleq \sigma_x^2$. Suppose now that the assumed Gaussian p.d.f. is $f(x;\theta) = p_{\mathcal{N}}(x; m 1_L, \theta I_L)$, so we misspecify the mean value. Then (1a)-(1b) become [11, Section III]:

$$\hat{\theta}(x) = \frac{1}{L}\sum_{l=1}^{L}(x_l - m)^2, \qquad \theta_f = \theta + (m_x - m)^2,$$

where $E_p[\hat{\theta}(x)] = \theta_f$, and the LUMCRB is given by [11, (22)]:

$$\mathrm{LUMCRB}_{\theta_f} = \frac{2\theta^2}{L} + \frac{4\theta\,(m_x - m)^2}{L} + (m_x - m)^4. \tag{14}$$

According to (7), since $\partial\theta_f(\theta)/\partial\theta = 1$, the MCRB is simply:

$$\mathrm{MCRB}_{\theta_f}(\theta) = \frac{1}{E_p\!\left[\left(\frac{\partial \ln p(x;\theta)}{\partial\theta}\right)^2\right]} = \mathrm{CRB}(\theta) = \frac{2\theta^2}{L}, \tag{15}$$

which exemplifies that the MMLE is only a quasi-efficient estimator of $\theta_f$, as predicted in most cases when the true parametric model is known (10).

REFERENCES

[1] H. Cramér, Mathematical Methods of Statistics. Princeton, NJ: Princeton Univ. Press, 1946.
[2] E. L. Lehmann and G. Casella, Theory of Point Estimation (2nd ed.). Springer, 1998.
[3] S. Haykin, J. Litva, and T. J. Shepherd, Radar Array Processing, Chapter 4. Springer-Verlag, 1993.
[4] A. Renaux, P. Forster, E. Chaumette, and P. Larzabal, "On the high SNR CML estimator full statistical characterization," IEEE Trans. on SP, 54(12): 4840-4843, 2006.
[5] P. J. Huber, "The behavior of maximum likelihood estimates under nonstandard conditions," in Proceedings of the Fifth Berkeley Symposium in Mathematical Statistics and Probability, 1967.
[6] H. Akaike, "Information Theory and an Extension of the Maximum Likelihood Principle," in Proceeding of IEEE ISIT, 1973.
[7] H. White, "Maximum likelihood estimation of misspecified models," Econometrica, vol. 50, pp. 1-25, Jan. 1982.
[8] Q. H. Vuong, "Cramér-Rao bounds for misspecified models," working paper 652, Div. of the Humanities and Social Sci., Caltech, Pasadena, USA, 1986.
[9] E. W. Barankin, "Locally best unbiased estimates," Ann. Math. Stat., 20(4): 477-501, 1949.
[10] C. D. Richmond and L. L. Horowitz, "Parameter Bounds on Estimation Accuracy Under Model Misspecification," IEEE Trans. on SP, 63(9): 2263-2278, 2015.
[11] S. Fortunati, F. Gini, and M. S. Greco, "The misspecified CRB and its application to the scatter matrix estimation in complex elliptically symmetric distributions," IEEE Trans. Signal Process., 64(9): 2387-2399, 2016.
[12] K. Todros and J. Tabrikian, "General Classes of Performance Lower Bounds for Parameter Estimation-Part I: Non-Bayesian Bounds for Unbiased Estimators," IEEE Trans. on IT, 56(10): 5064-5082, 2010.
[13] R. McAulay and E. M. Hofstetter, "Barankin Bounds on parameter estimation," IEEE Trans. on IT, 17(6): 669-676, 1971.
[14] W. Rudin, Principles of Mathematical Analysis. New York: McGraw-Hill, 1976.
[15] T. B. Fomby and R. C. Hill, Maximum-Likelihood Estimation of Misspecified Models: Twenty Years Later. Oxford, U.K.: Elsevier Ltd, 2003.