Measure-Transformed Quasi Maximum Likelihood Estimation

Koby Todros and Alfred O. Hero

Abstract
In this paper, we consider the problem of estimating a deterministic vector parameter when the likelihood function is unknown or not expressible. We develop an estimator, called the measure-transformed quasi maximum likelihood estimator (MT-QMLE), that minimizes the empirical Kullback-Leibler divergence between the transformed probability measure of the data and a hypothesized Gaussian probability distribution. By judicious choice of the transform we show that the proposed estimator can gain sensitivity to higher-order statistical information and resilience to outliers. Under some regularity conditions we show that the MT-QMLE is consistent, asymptotically normal and unbiased. Furthermore, we derive a necessary and sufficient condition for its asymptotic efficiency. The MT-QMLE requires complete knowledge, and in some cases only partial knowledge, of the mean and covariance functions that characterize the hypothesized Gaussian probability measure. When these quantities are completely unknown, a plug-in version of the MT-QMLE is developed that is based on their empirical estimates. The MT-QMLE and its plug-in version are applied to source localization problems in simulation examples that illustrate their sensitivity to higher-order statistical information and resilience to outliers.

Index Terms
Higher-order statistics, parameter estimation, probability measure transform, robust estimation, source localization.

I. INTRODUCTION

Classical multivariate estimation [1], [2] deals with the problem of estimating a deterministic vector parameter using a sequence of multivariate samples from an underlying probability distribution. When the probability distribution is known to lie within a specified parametric family of probability measures, parameter estimation techniques such as the method of maximum likelihood [3] can be implemented that utilize complete statistical information. In many practical scenarios this knowledge is unavailable, and therefore one must resort to other methods that require only partial statistical information.

One of the most popular techniques of this kind is the method of moments [3], [4], which is based on fitting multivariate cumulants of certain orders to their empirical estimates. Usually, first and second-order cumulants, i.e., the mean vector and covariance matrix, are used. Their popularity arises from the fact that they are easy to manipulate, their sample estimates have simple implementations, and the performance analysis of the resulting estimators is often tractable. In some circumstances, first and second-order cumulants may be non-informative, such as for certain types of non-Gaussian data. In order to overcome this limitation, higher-order cumulants may be incorporated. However, unlike first and second-order cumulants, higher-order cumulants involve complicated tensor analysis [5], and their empirical estimates are highly non-robust to outliers and have increased computational and sample complexity.

Based on the observation that the mean vector and the covariance matrix are the gradient vector and Hessian matrix of the cumulant generating function evaluated at the origin, another framework that does not require knowledge of the likelihood function has been proposed in [6]. This method operates by fitting the gradient or Hessian of the cumulant generating function, evaluated at some off-origin points, to their empirical estimates. These points are called processing points and can be judiciously chosen to improve estimation performance. Similarly to the mean vector and the covariance matrix, the resulting cumulant-related quantities have appealing decomposability properties and their sample estimates have simple implementations. Additionally, they have the key advantage that they involve higher-order statistical information. This approach has been successfully applied to signal gain estimation [6] and to autoregression parameter estimation [7], where a data-driven procedure for optimal selection of the processing points has been devised that minimizes an empirical estimate of the asymptotic mean-squared-error (MSE). Other applications can be found in [8]-[13].

Recently, we have shown that the off-origin gradient vector and Hessian matrix of the cumulant generating function may be viewed as the standard mean vector and covariance matrix under some transformed probability measure [14]-[16]. Hence, a wider class of estimators than the one proposed in [6] can be obtained by considering other measure-transformed mean vectors and covariance matrices. These estimators can gain sensitivity to higher-order statistical information and resilience to outliers, and yet retain the computational advantages of first and second-order methods of moments.

In this paper, we use this measure transformation approach to develop a new estimator that operates by jointly fitting a measure-transformed mean vector and covariance matrix to their empirical estimates. The proposed transform is structured by a non-negative function, called the MT-function, and maps the probability distribution into a set of new probability measures on the observation space. By modifying the MT-function, classes of measure transformations can be obtained that have different useful properties.
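As an illustrative aside (not part of the original development), the following minimal Python sketch shows why the off-origin gradient of the cumulant generating function is computationally attractive: for real-valued data its empirical estimate is just an exponentially weighted sample mean. The function name and the processing point are illustrative assumptions.

```python
import numpy as np

def cgf_gradient_estimate(X, t):
    """Empirical gradient of the cumulant generating function K(t) = log E[exp(t'x)]
    of a real random vector, evaluated at an off-origin processing point t.

    X : (N, p) array of i.i.d. samples; t : (p,) processing point.
    The gradient equals E[x exp(t'x)] / E[exp(t'x)], estimated here by an
    exponentially weighted sample mean.
    """
    w = np.exp(X @ t)                        # tilting weights exp(t'x_n)
    return (w[:, None] * X).sum(axis=0) / w.sum()

# toy usage on skewed data; t = 0 recovers the ordinary sample mean
rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=(1000, 2))
print(cgf_gradient_estimate(X, np.array([0.1, -0.2])))
```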

Under the proposed transform we define the measure-transformed (MT) mean vector and covariance matrix, derive their strongly consistent estimates, and study their sensitivity to higher-order statistical information and resilience to outliers. Similarly to the quasi maximum likelihood estimator (QMLE) [17], [18], the proposed estimator, called the MT-QMLE, minimizes the empirical Kullback-Leibler divergence [19] between the transformed probability measure and a hypothesized probability distribution. In this paper, the hypothesized distribution is a Gaussian probability measure with MT-mean vector and MT-covariance matrix parameterized by the unknown parameter to be estimated. Under some regularity conditions we show that the MT-QMLE is consistent, asymptotically normal and unbiased. Additionally, a necessary and sufficient condition for asymptotic efficiency is derived. Furthermore, a data-driven procedure for optimal selection of the MT-function within some parametric class of functions is developed that minimizes an empirical estimate of the asymptotic MSE.

The MT-QMLE requires complete knowledge, and in some cases only partial knowledge, of the MT-mean and MT-covariance functions that characterize the hypothesized Gaussian probability measure. When these quantities are completely unknown, a plug-in version of the MT-QMLE is developed that is based on their empirical estimates.

We illustrate the MT-QMLE for the problem of angle of arrival (AOA) based source localization [20] in heavy-tailed compound Gaussian noise [21] that produces outliers. By specifying the MT-function within the family of zero-centered Gaussian functions parameterized by a scale parameter, we show that, unlike the sample covariance matrix (SCM) based estimator [22], the MT-QMLE is resilient to outliers. Moreover, we show that the MT-QMLE performs similarly to the maximum likelihood estimator (MLE) that, unlike the MT-QMLE, requires complete knowledge of the likelihood function. The plug-in MT-QMLE is illustrated for the problem of received signal strength (RSS) based source localization [23] in heavy-tailed compound Gaussian noise. By specifying the MT-function within the family of Gaussian functions parameterized by location and scale parameters, we show that, unlike the sample mean based estimator, which is the MLE under the assumption of isotropic Gaussian noise, the MT-QMLE is resilient to outliers and performs similarly to the standard MLE.

The paper is organized as follows. In Section II, we define the MT-mean and MT-covariance and derive their empirical estimates. In Section III, we use these quantities to construct the MT-QMLE. The plug-in MT-QMLE is developed in Section IV. The proposed estimators are applied to source localization problems in Section V. In Section VI, the main points of this contribution are summarized. The proofs of the propositions and theorems stated throughout the paper are given in the Appendix.

II. MEASURE-TRANSFORMED MEAN AND COVARIANCE

In this section, we develop a transform on the probability measure of a random vector whose probability distribution is parametrized by the vector parameter to be estimated. Under the proposed transform, we define the measure-transformed mean vector and covariance matrix, derive their strongly consistent estimates, and establish their sensitivity to higher-order statistical information and resilience to outliers. These quantities will be used in the following section to construct the measure-transformed quasi maximum likelihood estimator.

A. Probability measure transform

We define the measure space $(\mathcal{X}, \mathcal{S}_{\mathcal{X}}, P_{\mathbf{x};\boldsymbol{\theta}})$, where $\mathcal{X} \subseteq \mathbb{C}^{p}$ is the observation space of a random vector $\mathbf{X}$, $\mathcal{S}_{\mathcal{X}}$ is a $\sigma$-algebra over $\mathcal{X}$, $P_{\mathbf{x};\boldsymbol{\theta}}$ is a probability measure on $\mathcal{S}_{\mathcal{X}}$ that belongs to some unknown parametric family of probability measures $\{P_{\mathbf{x};\boldsymbol{\vartheta}} : \boldsymbol{\vartheta} \in \Theta\}$, $\boldsymbol{\theta} \in \Theta$ is the unknown vector parameter to be estimated, and $\Theta \subseteq \mathbb{R}^{m}$ denotes the parameter space. Let $g: \mathcal{X} \to \mathbb{C}$ denote an integrable scalar function on $\mathcal{X}$. The expectation of $g(\mathbf{X})$ under $P_{\mathbf{x};\boldsymbol{\theta}}$ is defined as
$$\mathrm{E}[g(\mathbf{X}); P_{\mathbf{x};\boldsymbol{\theta}}] \triangleq \int_{\mathcal{X}} g(\mathbf{x})\, dP_{\mathbf{x};\boldsymbol{\theta}}(\mathbf{x}), \qquad (1)$$
where $\mathbf{x} \in \mathcal{X}$. The empirical probability measure $\hat{P}_{\mathbf{x}}$, given a sequence of samples $\mathbf{X}_{n}$, $n = 1, \ldots, N$ from $P_{\mathbf{x};\boldsymbol{\theta}}$, is specified by
$$\hat{P}_{\mathbf{x}}(A) = \frac{1}{N} \sum_{n=1}^{N} \delta_{\mathbf{X}_{n}}(A), \qquad (2)$$
where $A \in \mathcal{S}_{\mathcal{X}}$ and $\delta_{\mathbf{X}_{n}}(\cdot)$ is the Dirac probability measure at $\mathbf{X}_{n}$ [24].

Definition 1. Given a non-negative function $u: \mathbb{C}^{p} \to \mathbb{R}_{+}$ satisfying
$$0 < \mathrm{E}[u(\mathbf{X}); P_{\mathbf{x};\boldsymbol{\theta}}] < \infty, \qquad (3)$$
a transform on $P_{\mathbf{x};\boldsymbol{\theta}}$ is defined via the relation
$$Q_{\mathbf{x};\boldsymbol{\theta}}(A) \triangleq T_{u}[P_{\mathbf{x};\boldsymbol{\theta}}](A) = \int_{A} \varphi_{u}(\mathbf{x}; \boldsymbol{\theta})\, dP_{\mathbf{x};\boldsymbol{\theta}}(\mathbf{x}), \qquad (4)$$
where $A \in \mathcal{S}_{\mathcal{X}}$ and
$$\varphi_{u}(\mathbf{x}; \boldsymbol{\theta}) \triangleq \frac{u(\mathbf{x})}{\mathrm{E}[u(\mathbf{X}); P_{\mathbf{x};\boldsymbol{\theta}}]}. \qquad (5)$$
The function $u(\cdot)$ is called the MT-function.

Proposition 1 (Properties of the transform). Let $Q_{\mathbf{x};\boldsymbol{\theta}}$ be defined by relation (4). Then:

1) $Q_{\mathbf{x};\boldsymbol{\theta}}$ is a probability measure on $\mathcal{S}_{\mathcal{X}}$.
2) $Q_{\mathbf{x};\boldsymbol{\theta}}$ is absolutely continuous w.r.t. $P_{\mathbf{x};\boldsymbol{\theta}}$, with Radon-Nikodym derivative [24]
$$\frac{dQ_{\mathbf{x};\boldsymbol{\theta}}(\mathbf{x})}{dP_{\mathbf{x};\boldsymbol{\theta}}(\mathbf{x})} = \varphi_{u}(\mathbf{x}; \boldsymbol{\theta}). \qquad (6)$$
[A proof is given in Appendix A]

The probability measure $Q_{\mathbf{x};\boldsymbol{\theta}}$ is said to be generated by the MT-function $u(\cdot)$. By modifying $u(\cdot)$, such that the condition (3) is satisfied, virtually any probability measure on $\mathcal{S}_{\mathcal{X}}$ can be obtained.

B. The MT-mean and MT-covariance

According to (6), the mean vector and covariance matrix of $\mathbf{X}$ under $Q_{\mathbf{x};\boldsymbol{\theta}}$ are given by
$$\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta}) \triangleq \mathrm{E}[\mathbf{X}\, \varphi_{u}(\mathbf{X}; \boldsymbol{\theta}); P_{\mathbf{x};\boldsymbol{\theta}}] \qquad (7)$$
and
$$\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta}) \triangleq \mathrm{E}\big[(\mathbf{X} - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta}))(\mathbf{X} - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta}))^{H}\, \varphi_{u}(\mathbf{X}; \boldsymbol{\theta}); P_{\mathbf{x};\boldsymbol{\theta}}\big], \qquad (8)$$
respectively. Equations (7) and (8) imply that $\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta})$ and $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta})$ are weighted mean and covariance of $\mathbf{X}$ under $P_{\mathbf{x};\boldsymbol{\theta}}$, with the weighting function $\varphi_{u}(\cdot; \boldsymbol{\theta})$ defined in (5). Hence, for any non-constant analytic MT-function $u(\cdot)$, which has a convergent Taylor series expansion, the MT-mean and MT-covariance involve higher-order statistical moments of $P_{\mathbf{x};\boldsymbol{\theta}}$. By modifying the MT-function $u(\cdot)$, such that the condition (3) is satisfied, the MT-mean and MT-covariance under $Q_{\mathbf{x};\boldsymbol{\theta}}$ are modified. In particular, by choosing $u(\cdot)$ to be any non-zero constant valued function we have $Q_{\mathbf{x};\boldsymbol{\theta}} = P_{\mathbf{x};\boldsymbol{\theta}}$, for which the standard mean vector $\boldsymbol{\mu}_{\mathbf{x}}(\boldsymbol{\theta})$ and covariance matrix $\boldsymbol{\Sigma}_{\mathbf{x}}(\boldsymbol{\theta})$ are obtained.

C. The empirical MT-mean and MT-covariance

According to (7) and (8), the MT-mean and MT-covariance can be estimated using only samples from the distribution $P_{\mathbf{x};\boldsymbol{\theta}}$. In the following proposition, strongly consistent estimates of $\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta})$ and $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta})$ are constructed based on $N$ i.i.d. samples of $\mathbf{X}$.

Proposition 2 (Strongly consistent estimates of the MT-mean and MT-covariance). Let $\mathbf{X}_{n}$, $n = 1, \ldots, N$ denote a sequence of i.i.d. samples from $P_{\mathbf{x};\boldsymbol{\theta}}$. Define the empirical mean and covariance estimates
$$\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)} \triangleq \sum_{n=1}^{N} \mathbf{X}_{n}\, \hat{\varphi}_{u}(\mathbf{X}_{n}) \qquad (9)$$
and
$$\hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)} \triangleq \sum_{n=1}^{N} (\mathbf{X}_{n} - \hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)})(\mathbf{X}_{n} - \hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)})^{H}\, \hat{\varphi}_{u}(\mathbf{X}_{n}), \qquad (10)$$
respectively, where
$$\hat{\varphi}_{u}(\mathbf{X}_{n}) \triangleq \frac{u(\mathbf{X}_{n})}{\sum_{n=1}^{N} u(\mathbf{X}_{n})}. \qquad (11)$$
If $\mathrm{E}[\|\mathbf{X}\|_{2}^{2}\, u(\mathbf{X}); P_{\mathbf{x};\boldsymbol{\theta}}] < \infty$, then $\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)} \to \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta})$ and $\hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)} \to \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta})$ w.p. 1 as $N \to \infty$, where w.p. 1 denotes convergence with probability 1 [25].
[The proof is very similar to the proof of Proposition 2 in [26] and is therefore omitted]

Note that for $u(\mathbf{x}) \equiv 1$ the estimators $\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}$ and $\frac{N}{N-1}\hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}$ reduce to the standard unbiased sample mean vector and sample covariance matrix (SCM), respectively. Also notice that $\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}$ and $\hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}$ can be written as statistical functionals $\boldsymbol{\mu}^{(u)}[\cdot]$ and $\boldsymbol{\Sigma}^{(u)}[\cdot]$ of the empirical probability measure $\hat{P}_{\mathbf{x}}$ defined in (2), i.e.,
$$\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)} = \frac{\mathrm{E}[\mathbf{X}\, u(\mathbf{X}); \hat{P}_{\mathbf{x}}]}{\mathrm{E}[u(\mathbf{X}); \hat{P}_{\mathbf{x}}]} \triangleq \boldsymbol{\mu}^{(u)}[\hat{P}_{\mathbf{x}}] \qquad (12)$$
and
$$\hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)} = \frac{\mathrm{E}\big[(\mathbf{X} - \boldsymbol{\mu}^{(u)}[\hat{P}_{\mathbf{x}}])(\mathbf{X} - \boldsymbol{\mu}^{(u)}[\hat{P}_{\mathbf{x}}])^{H}\, u(\mathbf{X}); \hat{P}_{\mathbf{x}}\big]}{\mathrm{E}[u(\mathbf{X}); \hat{P}_{\mathbf{x}}]} \triangleq \boldsymbol{\Sigma}^{(u)}[\hat{P}_{\mathbf{x}}]. \qquad (13)$$
By (5), (7), (8), (12) and (13), when $\hat{P}_{\mathbf{x}}$ is replaced by $P_{\mathbf{x};\boldsymbol{\theta}}$ we have $\boldsymbol{\mu}^{(u)}[P_{\mathbf{x};\boldsymbol{\theta}}] = \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta})$ and $\boldsymbol{\Sigma}^{(u)}[P_{\mathbf{x};\boldsymbol{\theta}}] = \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta})$, which implies that $\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}$ and $\hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}$ are Fisher consistent [27].

D. Robustness to outliers

Here, we study the robustness of the empirical MT-mean and MT-covariance to outliers using their vector and matrix valued influence functions [28]. Define the probability measure $P_{\epsilon} \triangleq (1 - \epsilon) P_{\mathbf{x};\boldsymbol{\theta}} + \epsilon\, \delta_{\mathbf{y}}$, where $0 \leq \epsilon \leq 1$, $\mathbf{y} \in \mathbb{C}^{p}$, and $\delta_{\mathbf{y}}$ is the Dirac probability measure at $\mathbf{y}$. The influence function of a Fisher consistent estimator with statistical functional $H[\cdot]$ at probability distribution $P_{\mathbf{x};\boldsymbol{\theta}}$ is defined as [28]
$$\mathrm{IF}_{H}(\mathbf{y}; P_{\mathbf{x};\boldsymbol{\theta}}) \triangleq \lim_{\epsilon \to 0} \frac{H[P_{\epsilon}] - H[P_{\mathbf{x};\boldsymbol{\theta}}]}{\epsilon} = \frac{\partial H[P_{\epsilon}]}{\partial \epsilon}\bigg|_{\epsilon = 0}. \qquad (14)$$
The influence function describes the effect on the estimator of an infinitesimal contamination at the point $\mathbf{y}$. An estimator is said to be B-robust if its influence function is bounded [28].
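As a concrete illustration of the estimators (9)-(11), the following sketch (illustrative code, not from the paper) computes the empirical MT-mean and MT-covariance for complex-valued samples; the zero-centred Gaussian MT-function and its width parameter are assumptions made for the example, anticipating the family used in Section V.

```python
import numpy as np

def mt_mean_cov(X, u):
    """Empirical MT-mean (9) and MT-covariance (10): a weighted sample mean
    and covariance with normalized weights from (11).

    X : (N, p) array of i.i.d. (possibly complex) samples.
    u : MT-function mapping a sample to a non-negative weight.
    """
    w = np.array([u(x) for x in X], dtype=float)
    w = w / w.sum()                              # normalized weights, eq. (11)
    mu = (w[:, None] * X).sum(axis=0)            # empirical MT-mean, eq. (9)
    Xc = X - mu
    Sigma = (w[:, None] * Xc).T @ Xc.conj()      # empirical MT-covariance, eq. (10)
    return mu, Sigma

# example with an illustrative zero-centred Gaussian MT-function
def u_gauss(x, omega=2.0):
    return float(np.exp(-np.linalg.norm(x) ** 2 / omega ** 2))

rng = np.random.default_rng(0)
X = (rng.standard_normal((500, 3)) + 1j * rng.standard_normal((500, 3))) / np.sqrt(2)
mu_hat, Sigma_hat = mt_mean_cov(X, u_gauss)
```

With u set to a constant, the weights become uniform and the two quantities reduce to the ordinary sample mean and (biased) sample covariance, as noted above.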

Using (12)-(14), one can verify that the influence functions of $\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}$ and $\hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}$ are given by
$$\mathrm{IF}_{\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}}(\mathbf{y}; P_{\mathbf{x};\boldsymbol{\theta}}) = \frac{u(\mathbf{y})\,(\mathbf{y} - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta}))}{\mathrm{E}[u(\mathbf{X}); P_{\mathbf{x};\boldsymbol{\theta}}]} \qquad (15)$$
and
$$\mathrm{IF}_{\hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}}(\mathbf{y}; P_{\mathbf{x};\boldsymbol{\theta}}) = \frac{u(\mathbf{y})\,\big[(\mathbf{y} - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta}))(\mathbf{y} - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta}))^{H} - \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta})\big]}{\mathrm{E}[u(\mathbf{X}); P_{\mathbf{x};\boldsymbol{\theta}}]}, \qquad (16)$$
respectively. The following proposition states a sufficient condition for boundedness of (15) and (16). This condition is satisfied by the Gaussian MT-functions proposed in Section V.

Proposition 3. The influence functions (15) and (16) are bounded if the MT-function $u(\mathbf{y})$ and the product $u(\mathbf{y})\|\mathbf{y}\|_{2}^{2}$ are bounded over $\mathbb{C}^{p}$.
[The proof is similar to the proof of Proposition 3 in [26] and is therefore omitted]

III. THE MEASURE-TRANSFORMED QUASI MAXIMUM LIKELIHOOD ESTIMATOR

In this section we develop an estimator for $\boldsymbol{\theta}$ that minimizes the empirical Kullback-Leibler divergence between the transformed probability measure $Q_{\mathbf{x};\boldsymbol{\theta}}$ and a complex circular Gaussian probability distribution [29] $\Phi_{\mathbf{x};\boldsymbol{\vartheta}}$, $\boldsymbol{\vartheta} \in \Theta$, with MT-mean $\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ and MT-covariance $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$. Regularity conditions for consistency, asymptotic normality and unbiasedness are derived. Additionally, we provide a closed-form expression for the asymptotic MSE and obtain a necessary and sufficient condition for asymptotic efficiency. Optimal selection of the MT-function out of some parametric class of functions is also discussed.

A. The MT-QMLE

The Kullback-Leibler divergence (KLD) between $Q_{\mathbf{x};\boldsymbol{\theta}}$ and $\Phi_{\mathbf{x};\boldsymbol{\vartheta}}$ is defined as [19]
$$D_{\mathrm{KL}}\big[Q_{\mathbf{x};\boldsymbol{\theta}} \,\|\, \Phi_{\mathbf{x};\boldsymbol{\vartheta}}\big] \triangleq \mathrm{E}\left[\log\frac{q(\mathbf{X}; \boldsymbol{\theta})}{\phi(\mathbf{X}; \boldsymbol{\vartheta})}; Q_{\mathbf{x};\boldsymbol{\theta}}\right], \qquad (17)$$
where $q(\mathbf{x}; \boldsymbol{\theta})$ and
$$\phi(\mathbf{x}; \boldsymbol{\vartheta}) \triangleq \det^{-1}\big(\pi\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big)\, \exp\Big(-\big(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big)^{H}\big(\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big)^{-1}\big(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big)\Big) \qquad (18)$$
are the density functions of $Q_{\mathbf{x};\boldsymbol{\theta}}$ and $\Phi_{\mathbf{x};\boldsymbol{\vartheta}}$, respectively, w.r.t. a dominating $\sigma$-finite measure on $\mathcal{S}_{\mathcal{X}}$. According to (6), $D_{\mathrm{KL}}[Q_{\mathbf{x};\boldsymbol{\theta}} \,\|\, \Phi_{\mathbf{x};\boldsymbol{\vartheta}}]$ can be estimated using only samples from $P_{\mathbf{x};\boldsymbol{\theta}}$. Therefore, similarly to (9) and (10), an empirical estimate of (17) given a sequence of samples $\mathbf{X}_{n}$, $n = 1, \ldots, N$ from $P_{\mathbf{x};\boldsymbol{\theta}}$ is defined as
$$\hat{D}_{\mathrm{KL}}\big[Q_{\mathbf{x};\boldsymbol{\theta}} \,\|\, \Phi_{\mathbf{x};\boldsymbol{\vartheta}}\big] \triangleq \sum_{n=1}^{N} \hat{\varphi}_{u}(\mathbf{X}_{n})\, \log\frac{q(\mathbf{X}_{n}; \boldsymbol{\theta})}{\phi(\mathbf{X}_{n}; \boldsymbol{\vartheta})}, \qquad (19)$$
where $\hat{\varphi}_{u}(\cdot)$ is defined in (11).

The proposed estimator of $\boldsymbol{\theta}$ is obtained by minimization of (19) w.r.t. $\boldsymbol{\vartheta}$, which by (9), (10) and (18) amounts to maximization of the following objective function:
$$J_{u}(\boldsymbol{\vartheta}) \triangleq -D_{\mathrm{LD}}\big[\hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)} \,\|\, \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big] - \big\|\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)} - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big\|^{2}_{(\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}))^{-1}}, \qquad (20)$$
where $D_{\mathrm{LD}}[\mathbf{A}\,\|\,\mathbf{B}] \triangleq \mathrm{tr}(\mathbf{A}\mathbf{B}^{-1}) - \log\det(\mathbf{A}\mathbf{B}^{-1}) - p$ is the log-determinant divergence [30] between positive definite matrices $\mathbf{A}, \mathbf{B} \in \mathbb{C}^{p \times p}$, and $\|\mathbf{a}\|_{\mathbf{C}} \triangleq \sqrt{\mathbf{a}^{H}\mathbf{C}\mathbf{a}}$ denotes the weighted $\ell_{2}$-norm of a vector $\mathbf{a} \in \mathbb{C}^{p}$ with positive-definite weighting matrix $\mathbf{C} \in \mathbb{C}^{p \times p}$. Hence, the proposed MT-QMLE is given by
$$\hat{\boldsymbol{\theta}}_{u} = \arg\max_{\boldsymbol{\vartheta} \in \Theta} J_{u}(\boldsymbol{\vartheta}). \qquad (21)$$
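The following sketch (illustrative only) evaluates the objective (20) and maximizes it over a one-dimensional grid of candidate parameter values. The callables mu_model and Sigma_model stand for the hypothesized MT-mean and MT-covariance functions, which Section III assumes are known up to the parameter; they are not part of the paper's notation.

```python
import numpy as np

def log_det_div(A, B):
    """Log-determinant divergence D_LD[A || B] = tr(A B^{-1}) - log det(A B^{-1}) - p."""
    M = A @ np.linalg.inv(B)
    return float(np.real(np.trace(M)) - np.linalg.slogdet(M)[1] - A.shape[0])

def mt_qmle_objective(theta, mu_hat, Sigma_hat, mu_model, Sigma_model):
    """Objective (20): minus the LD divergence minus the weighted-norm fitting term."""
    S = Sigma_model(theta)
    d = mu_hat - mu_model(theta)
    quad = float(np.real(d.conj() @ np.linalg.solve(S, d)))
    return -log_det_div(Sigma_hat, S) - quad

def mt_qmle(grid, mu_hat, Sigma_hat, mu_model, Sigma_model):
    """Grid-search maximization of (20), i.e., the MT-QMLE (21) for a scalar parameter."""
    vals = [mt_qmle_objective(t, mu_hat, Sigma_hat, mu_model, Sigma_model)
            for t in grid]
    return grid[int(np.argmax(vals))]
```

A grid search is used here only for transparency; any numerical maximizer could replace it.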

B. Asymptotic performance analysis

Here, we study the asymptotic performance of the estimator (21). For simplicity, we assume a sequence of i.i.d. samples $\mathbf{X}_{n}$, $n = 1, \ldots, N$ from $P_{\mathbf{x};\boldsymbol{\theta}}$.

Theorem 1 (Strong consistency of $\hat{\boldsymbol{\theta}}_{u}$). Assume that the following conditions are satisfied:
(A-1) The parameter space $\Theta$ is compact.
(A-2) $\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta}) \neq \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ or $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta}) \neq \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ for all $\boldsymbol{\vartheta} \neq \boldsymbol{\theta}$.
(A-3) $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ is non-singular for all $\boldsymbol{\vartheta} \in \Theta$.
(A-4) $\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ and $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ are continuous over $\Theta$.
(A-5) $\mathrm{E}[\|\mathbf{X}\|_{2}^{2}\, u(\mathbf{X}); P_{\mathbf{x};\boldsymbol{\theta}}] < \infty$.
Then,
$$\hat{\boldsymbol{\theta}}_{u} \to \boldsymbol{\theta} \;\; \text{w.p. 1 as } N \to \infty. \qquad (22)$$
[A proof is given in Appendix B]

Theorem 2 (Asymptotic normality and unbiasedness of $\hat{\boldsymbol{\theta}}_{u}$). Assume that the following conditions are satisfied:
(B-1) $\hat{\boldsymbol{\theta}}_{u} \xrightarrow{P} \boldsymbol{\theta}$ as $N \to \infty$, where $\xrightarrow{P}$ denotes convergence in probability [25].
(B-2) $\boldsymbol{\theta}$ lies in the interior of $\Theta$, which is assumed to be compact.
(B-3) $\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ and $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ are twice continuously differentiable in $\Theta$.
(B-4) $\mathrm{E}[u^{2}(\mathbf{X}); P_{\mathbf{x};\boldsymbol{\theta}}] < \infty$ and $\mathrm{E}[\|\mathbf{X}\|_{2}^{4}\, u^{2}(\mathbf{X}); P_{\mathbf{x};\boldsymbol{\theta}}] < \infty$.
Then,
$$\hat{\boldsymbol{\theta}}_{u} - \boldsymbol{\theta} \xrightarrow{D} \mathcal{N}\big(\mathbf{0}, \mathbf{C}_{u}(\boldsymbol{\theta})\big) \;\; \text{as } N \to \infty, \qquad (23)$$
where $\xrightarrow{D}$ denotes convergence in distribution [25]. The asymptotic MSE takes the form
$$\mathbf{C}_{u}(\boldsymbol{\theta}) = N^{-1}\, \mathbf{F}_{u}^{-1}(\boldsymbol{\theta})\, \mathbf{G}_{u}(\boldsymbol{\theta})\, \mathbf{F}_{u}^{-1}(\boldsymbol{\theta}), \qquad (24)$$
where
$$\mathbf{G}_{u}(\boldsymbol{\vartheta}) \triangleq \mathrm{E}\big[u^{2}(\mathbf{X})\, \boldsymbol{\psi}_{u}(\mathbf{X}; \boldsymbol{\vartheta})\, \boldsymbol{\psi}_{u}^{T}(\mathbf{X}; \boldsymbol{\vartheta}); P_{\mathbf{x};\boldsymbol{\theta}}\big], \qquad (25)$$
$$\boldsymbol{\psi}_{u}(\mathbf{x}; \boldsymbol{\vartheta}) \triangleq \nabla_{\boldsymbol{\vartheta}} \log \phi(\mathbf{x}; \boldsymbol{\vartheta}), \qquad (26)$$
$$\mathbf{F}_{u}(\boldsymbol{\vartheta}) \triangleq \mathrm{E}\big[u(\mathbf{X})\, \boldsymbol{\Upsilon}_{u}(\mathbf{X}; \boldsymbol{\vartheta}); P_{\mathbf{x};\boldsymbol{\theta}}\big], \qquad (27)$$
$$\boldsymbol{\Upsilon}_{u}(\mathbf{x}; \boldsymbol{\vartheta}) \triangleq -\nabla^{2}_{\boldsymbol{\vartheta}} \log \phi(\mathbf{x}; \boldsymbol{\vartheta}), \qquad (28)$$
and it is assumed that $\mathbf{F}_{u}(\boldsymbol{\theta})$ is non-singular.
[A proof is given in Appendix C]

The following proposition relates the asymptotic MSE (24) of $\hat{\boldsymbol{\theta}}_{u}$ to the Cramér-Rao lower bound (CRLB) [31], [32].

Proposition 4 (Relation to the CRLB). Let
$$f(\mathbf{x}; \boldsymbol{\vartheta}) \triangleq dP_{\mathbf{x};\boldsymbol{\vartheta}}(\mathbf{x}) / d\lambda(\mathbf{x}) \qquad (29)$$
denote the density function of $P_{\mathbf{x};\boldsymbol{\vartheta}}$ w.r.t. a $\sigma$-finite dominating measure $\lambda$ on $\mathcal{S}_{\mathcal{X}}$, and let
$$\boldsymbol{\psi}(\mathbf{x}; \boldsymbol{\vartheta}) \triangleq \nabla_{\boldsymbol{\vartheta}} \log f(\mathbf{x}; \boldsymbol{\vartheta}). \qquad (30)$$
Assume that the following conditions are satisfied:
(C-1) For any $\boldsymbol{\vartheta} \in \Theta$, the partial derivatives $\partial f(\mathbf{x}; \boldsymbol{\vartheta}) / \partial \vartheta_{k}$, $k = 1, \ldots, m$, exist a.e., where $\vartheta_{k}$, $k = 1, \ldots, m$, denote the entries of the vector $\boldsymbol{\vartheta}$.
(C-2) For any $\boldsymbol{\vartheta} \in \Theta$, $v_{k}(\mathbf{x}; \boldsymbol{\vartheta}) \triangleq \frac{\partial \log \phi(\mathbf{x}; \boldsymbol{\vartheta})}{\partial \vartheta_{k}}\, f(\mathbf{x}; \boldsymbol{\vartheta})\, u(\mathbf{x}) \in \mathcal{L}^{1}(\mathcal{X}, \lambda)$, $k = 1, \ldots, m$, where $\mathcal{L}^{1}(\mathcal{X}, \lambda)$ denotes the space of absolutely integrable functions, defined on $\mathcal{X}$, w.r.t. the measure $\lambda$.
(C-3) There exist dominating functions $r_{k,j}(\mathbf{x}) \in \mathcal{L}^{1}(\mathcal{X}, \lambda)$, $k, j = 1, \ldots, m$, such that $|\partial v_{k}(\mathbf{x}; \boldsymbol{\vartheta}) / \partial \vartheta_{j}| \leq r_{k,j}(\mathbf{x})$ a.e.
(C-4) The matrix $\mathbf{G}_{u}(\boldsymbol{\theta})$ in (25) and the Fisher information matrix
$$\mathbf{I}_{\mathrm{FIM}}(\boldsymbol{\theta}) \triangleq \mathrm{E}\big[\boldsymbol{\psi}(\mathbf{X}; \boldsymbol{\theta})\, \boldsymbol{\psi}^{T}(\mathbf{X}; \boldsymbol{\theta}); P_{\mathbf{x};\boldsymbol{\theta}}\big]$$
are non-singular.
Then,
$$\mathbf{C}_{u}(\boldsymbol{\theta}) \succeq N^{-1}\, \mathbf{I}_{\mathrm{FIM}}^{-1}(\boldsymbol{\theta}),$$
where equality holds if and only if
$$\boldsymbol{\psi}(\mathbf{x}; \boldsymbol{\theta}) = \mathbf{I}_{\mathrm{FIM}}(\boldsymbol{\theta})\, \mathbf{F}_{u}^{-1}(\boldsymbol{\theta})\, \boldsymbol{\psi}_{u}(\mathbf{x}; \boldsymbol{\theta})\, u(\mathbf{x}) \;\; \text{w.p. 1.} \qquad (31)$$
[A proof is given in Appendix D]

One can verify that when $P_{\mathbf{x};\boldsymbol{\theta}}$ is a Gaussian measure, the condition (31) is satisfied only for non-zero constant valued MT-functions, for which $Q_{\mathbf{x};\boldsymbol{\theta}} = P_{\mathbf{x};\boldsymbol{\theta}}$ and the resulting MT-mean and MT-covariance (7), (8) only involve first and second-order moments. This implies that in the Gaussian case, non-constant MT-functions will always lead to asymptotic performance degradation. In the non-Gaussian case, however, as will be shown in the sequel, there are cases where the MT-function should deviate from a constant value in order to decrease the asymptotic MSE. This deviation results in weighted mean and covariance that involve higher-order moments. By solving the differential equation (31) one can obtain the MT-function for which the resulting MT-QMLE is asymptotically efficient. Unfortunately, in the non-Gaussian case the solution is highly cumbersome and requires knowledge of the likelihood function $f(\cdot; \boldsymbol{\theta})$, which is practically unavailable under the considered circumstances. Therefore, as described in Subsection III-C, we resort to another technique for optimal choice of the MT-function that is based on the following empirical estimate of the asymptotic MSE.

Theorem 3 (Empirical asymptotic MSE). Define the empirical asymptotic MSE
$$\hat{\mathbf{C}}_{u}(\hat{\boldsymbol{\theta}}_{u}) \triangleq N^{-1}\, \hat{\mathbf{F}}_{u}^{-1}(\hat{\boldsymbol{\theta}}_{u})\, \hat{\mathbf{G}}_{u}(\hat{\boldsymbol{\theta}}_{u})\, \hat{\mathbf{F}}_{u}^{-1}(\hat{\boldsymbol{\theta}}_{u}), \qquad (32)$$
where
$$\hat{\mathbf{G}}_{u}(\boldsymbol{\vartheta}) \triangleq N^{-1} \sum_{n=1}^{N} u^{2}(\mathbf{X}_{n})\, \boldsymbol{\psi}_{u}(\mathbf{X}_{n}; \boldsymbol{\vartheta})\, \boldsymbol{\psi}_{u}^{T}(\mathbf{X}_{n}; \boldsymbol{\vartheta}) \qquad (33)$$
and
$$\hat{\mathbf{F}}_{u}(\boldsymbol{\vartheta}) \triangleq N^{-1} \sum_{n=1}^{N} u(\mathbf{X}_{n})\, \boldsymbol{\Upsilon}_{u}(\mathbf{X}_{n}; \boldsymbol{\vartheta}). \qquad (34)$$
Furthermore, assume that the following conditions are satisfied:
(D-1) $\hat{\boldsymbol{\theta}}_{u} \xrightarrow{P} \boldsymbol{\theta}$ as $N \to \infty$.
(D-2) $\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ and $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ are twice continuously differentiable in $\Theta$, which is assumed to be compact.
(D-3) $\mathrm{E}[u^{2}(\mathbf{X}); P_{\mathbf{x};\boldsymbol{\theta}}] < \infty$ and $\mathrm{E}[\|\mathbf{X}\|_{2}^{4}\, u^{2}(\mathbf{X}); P_{\mathbf{x};\boldsymbol{\theta}}] < \infty$.
Then,
$$\hat{\mathbf{C}}_{u}(\hat{\boldsymbol{\theta}}_{u}) \xrightarrow{P} \mathbf{C}_{u}(\boldsymbol{\theta}) \;\; \text{as } N \to \infty. \qquad (35)$$
[A proof is given in Appendix E]
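The sandwich estimate (32)-(34) is simple to compute once the score and curvature of the hypothesized Gaussian log-density are available. The sketch below (an illustrative implementation for a scalar parameter, not code from the paper) approximates them by central finite differences; mu_model and Sigma_model are placeholder callables, and the sign convention of the curvature does not matter since it enters the sandwich squared.

```python
import numpy as np

def log_gauss(x, mu, Sigma):
    """Log-density of the hypothesized circular complex Gaussian (18), up to a constant."""
    d = x - mu
    return float(np.real(-np.linalg.slogdet(Sigma)[1]
                         - d.conj() @ np.linalg.solve(Sigma, d)))

def empirical_asymptotic_mse(X, u, theta_hat, mu_model, Sigma_model, h=1e-4):
    """Empirical asymptotic MSE (32)-(34) for a scalar parameter, with the score (26)
    and second derivative of log phi approximated by central finite differences
    (an assumption of this sketch)."""
    def lphi(x, t):
        return log_gauss(x, mu_model(t), Sigma_model(t))
    N = len(X)
    G = F = 0.0
    for x in X:
        w = u(x)
        psi = (lphi(x, theta_hat + h) - lphi(x, theta_hat - h)) / (2.0 * h)
        curv = (lphi(x, theta_hat + h) - 2.0 * lphi(x, theta_hat)
                + lphi(x, theta_hat - h)) / h ** 2
        G += (w ** 2) * psi ** 2
        F += w * curv
    G, F = G / N, F / N
    return G / (N * F ** 2)          # N^{-1} F^{-1} G F^{-1} for a scalar parameter
```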

C. Optimal choice of the MT-function

We propose to specify the MT-function within some parametric family $\{u(\cdot; \boldsymbol{\omega}),\, \boldsymbol{\omega} \in \mathbb{C}^{r}\}$ that satisfies the conditions stated in Definition 1 and Theorem 3. For example, in order to gain robustness against outliers, the Gaussian family of functions, which satisfies the conditions in Proposition 3, is a logical choice. An optimal choice of the MT-function parameter $\boldsymbol{\omega}$ is the one that minimizes the empirical asymptotic MSE (32), constructed from the same sequence of samples used for obtaining the MT-QMLE (21).

IV. THE PLUG-IN MEASURE-TRANSFORMED QUASI MAXIMUM LIKELIHOOD ESTIMATOR

The MT-QMLE (21) requires knowledge of the MT-mean and MT-covariance functions $\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ and $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$. When these quantities are unknown and only their empirical estimates $\tilde{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ and $\tilde{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$, based on $M$ samples from $P_{\mathbf{x};\boldsymbol{\vartheta}}$, are available for any $\boldsymbol{\vartheta} \in \Theta$, we propose the following plug-in version of the MT-QMLE:
$$\tilde{\boldsymbol{\theta}}_{u} = \arg\max_{\boldsymbol{\vartheta} \in \Theta} \tilde{J}_{u}(\boldsymbol{\vartheta}), \qquad (36)$$
where
$$\tilde{J}_{u}(\boldsymbol{\vartheta}) \triangleq -D_{\mathrm{LD}}\big[\hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)} \,\|\, \tilde{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big] - \big\|\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)} - \tilde{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big\|^{2}_{(\tilde{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}))^{-1}}. \qquad (37)$$
The following theorem states an asymptotic relation between the MT-QMLE and its plug-in version.

Theorem 4 (Relation to the MT-QMLE). Assume that the estimators $\tilde{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ and $\tilde{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ are uniformly consistent, i.e.,
$$\sup_{\boldsymbol{\vartheta} \in \Theta}\big\|\tilde{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}) - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big\| \xrightarrow{P} 0 \;\; \text{and} \;\; \sup_{\boldsymbol{\vartheta} \in \Theta}\big\|\tilde{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}) - \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big\| \xrightarrow{P} 0 \;\; \text{as } M \to \infty. \qquad (38)$$
Furthermore, assume that $J_{u}(\boldsymbol{\vartheta})$ (20) and $\tilde{J}_{u}(\boldsymbol{\vartheta})$ (37) have unique global maxima w.p. 1. Then,
$$\tilde{\boldsymbol{\theta}}_{u} \xrightarrow{P} \hat{\boldsymbol{\theta}}_{u} \;\; \text{as } M \to \infty. \qquad (39)$$
[A proof is given in Appendix F]

Therefore, when the conditions stated in Theorem 4 are satisfied and $M$ is sufficiently large, the plug-in MT-QMLE (36) attains the MSE performance of the MT-QMLE (21). Hence, when both $M$ and the number of samples $N$ from $P_{\mathbf{x};\boldsymbol{\theta}}$ are sufficiently large, an empirical estimate of the asymptotic MSE of $\tilde{\boldsymbol{\theta}}_{u}$ can be obtained using (32) after replacing $\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ and $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ in (18) with their sample estimates $\tilde{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ and $\tilde{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ and recalculating the terms comprising (32). Similarly to the MT-QMLE, this empirical estimate can be used for optimal choice of the MT-function, as sketched below.
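A minimal sketch of the data-driven selection loop of Subsection III-C is given next. All callables (make_u, estimate, emp_amse) are hypothetical placeholders standing in for the chosen MT-function family, the (plug-in) MT-QMLE and the empirical asymptotic MSE (32); they are not APIs defined in the paper.

```python
def select_mt_parameter(omegas, X, make_u, estimate, emp_amse):
    """For each candidate omega, compute the estimate and score it by the
    empirical asymptotic MSE; keep the minimizer (Subsection III-C)."""
    best = None
    for omega in omegas:
        u = make_u(omega)
        theta = estimate(X, u)            # MT-QMLE (21) or plug-in MT-QMLE (36)
        score = emp_amse(X, u, theta)     # empirical asymptotic MSE (32)
        if best is None or score < best[0]:
            best = (score, omega, theta)
    return best[1], best[2]               # selected omega and its estimate
```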

V. APPLICATIONS

A. Robust angle of arrival based source localization

We illustrate the use of the proposed MT-QMLE (21) for robust AOA based source localization in heavy-tailed compound Gaussian noise that produces outliers. We consider a uniform linear array (ULA) of $p$ sensors with half wavelength spacing that receives a signal generated by a narrowband far-field point source with azimuthal AOA $\theta$. Under this model the array output satisfies [20]:
$$\mathbf{X}_{n} = S_{n}\, \mathbf{a}(\theta) + \mathbf{W}_{n}, \quad n = 1, \ldots, N, \qquad (40)$$
where $n$ is a discrete time index, $\mathbf{X}_{n} \in \mathbb{C}^{p}$ is the vector of received signals, $S_{n} \in \mathbb{C}$ is the emitted signal, $\mathbf{a}(\theta) \triangleq [1, \exp(-i\pi\sin\theta), \ldots, \exp(-i\pi(p-1)\sin\theta)]^{T}$ is the steering vector of the array toward direction $\theta$, and $\mathbf{W}_{n} \in \mathbb{C}^{p}$ is an additive noise. We assume that the following conditions are satisfied: 1) $\theta \in \Theta = [-\pi/2 + \Delta,\, \pi/2 - \Delta]$, where $\Delta$ is a small positive constant, 2) the emitted signal is symmetrically distributed about the origin, 3) $S_{n}$ and $\mathbf{W}_{n}$ are statistically independent and first-order stationary, and 4) the noise component is compound Gaussian with stochastic representation [21]:
$$\mathbf{W}_{n} = \nu_{n} \mathbf{Z}_{n}, \qquad (41)$$
where $\nu_{n} \in \mathbb{R}_{++}$ is a first-order stationary process, called the texture component, and $\mathbf{Z}_{n} \in \mathbb{C}^{p}$ is a proper-complex wide-sense stationary Gaussian process with zero mean and scaled unit covariance $\sigma_{Z}^{2}\mathbf{I}$. The processes $\nu_{n}$ and $\mathbf{Z}_{n}$ are also assumed to be statistically independent.

In order to gain robustness against outliers, as well as sensitivity to higher-order moments, we specify the MT-function in the zero-centred Gaussian family of functions parametrized by a width parameter $\omega$, i.e.,
$$u_{\mathrm{G}}(\mathbf{x}; \omega) = (\pi\omega^{2})^{-p}\, \exp\big(-\|\mathbf{x}\|^{2}/\omega^{2}\big), \quad \omega \in \mathbb{R}_{++}. \qquad (42)$$
One can verify that the MT-function (42) satisfies the B-robustness conditions stated in Proposition 3. Using (7), (8) and (40)-(42), it can be shown that the MT-mean and MT-covariance under $Q^{(u_{\mathrm{G}})}_{\mathbf{x};\vartheta}$ are given by
$$\boldsymbol{\mu}^{(u_{\mathrm{G}})}_{\mathbf{x}}(\vartheta; \omega) = \mathbf{0} \qquad (43)$$
and
$$\boldsymbol{\Sigma}^{(u_{\mathrm{G}})}_{\mathbf{x}}(\vartheta; \omega) = r_{S}(\omega)\, \mathbf{a}(\vartheta)\mathbf{a}^{H}(\vartheta) + r_{W}(\omega)\, \mathbf{I}, \qquad (44)$$
respectively, where $r_{S}(\omega)$ and $r_{W}(\omega)$ are some strictly positive functions of $\omega$.
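For reference, a minimal simulator of the model (40)-(41) is sketched below (illustrative code, not from the paper); the half-wavelength steering phase and the way the signal and texture sequences are supplied are assumptions of this sketch.

```python
import numpy as np

def steering(theta, p):
    """ULA steering vector with half-wavelength spacing, theta in radians."""
    return np.exp(-1j * np.pi * np.arange(p) * np.sin(theta))

def simulate_snapshots(theta, s, nu, p, sigma_z=1.0, rng=None):
    """Snapshots from (40)-(41): X_n = s_n a(theta) + nu_n Z_n, with Z_n a proper
    complex Gaussian vector with covariance sigma_z^2 I.
    s : (N,) emitted signal samples; nu : (N,) texture samples."""
    rng = np.random.default_rng() if rng is None else rng
    N = len(s)
    z = rng.standard_normal((N, p)) + 1j * rng.standard_normal((N, p))
    z *= sigma_z / np.sqrt(2.0)                      # proper complex Gaussian noise
    return np.asarray(s)[:, None] * steering(theta, p)[None, :] \
        + np.asarray(nu)[:, None] * z
```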

problem:
$$\hat{\theta}_{u_{\mathrm{G}}}(\omega) = \arg\max_{\vartheta \in \Theta}\; \mathbf{a}^{H}(\vartheta)\, \hat{\mathbf{C}}^{(u_{\mathrm{G}})}(\omega)\, \mathbf{a}(\vartheta),$$
where $\hat{\mathbf{C}}^{(u_{\mathrm{G}})}(\omega) \triangleq \hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u_{\mathrm{G}})}(\omega) + \hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u_{\mathrm{G}})}(\omega)\, \hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u_{\mathrm{G}})H}(\omega)$. Under the considered settings, it can be shown that the conditions stated in Theorems 1-3 are satisfied. The resulting asymptotic MSE, given in closed form in (45), depends on the signal and texture distributions and on the width parameter $\omega$ through the function $h(S, \nu, \omega)$ defined in (46). Its consistent empirical estimate (47) is obtained by specializing (32) to the present scalar-parameter setting, and involves only the snapshots, the Gaussian MT-function (42), and the first and second derivatives of the steering vector, $\dot{\mathbf{a}}(\vartheta) \triangleq d\mathbf{a}(\vartheta)/d\vartheta$ and $\ddot{\mathbf{a}}(\vartheta) \triangleq d^{2}\mathbf{a}(\vartheta)/d\vartheta^{2}$.

In the following simulation examples, we consider a BPSK signal impinging on a 4-element ULA with half wavelength spacing from DOA $\theta = 30°$. We consider an $\epsilon$-contaminated Gaussian noise model [21] under which the texture component $\nu$ of the compound Gaussian noise (41) is a binary random variable with $\nu = 1$ w.p. $1 - \epsilon$ and $\nu = c$ w.p. $\epsilon$. The parameters $\epsilon$ and $c$ that control the heaviness of the noise tails were set to 0.2 and 0, respectively. We define the signal-to-noise ratio (SNR) as $\mathrm{SNR} \triangleq 10\log_{10}\big(\sigma_{S}^{2} / (\sigma_{Z}^{2}((1-\epsilon) + \epsilon c^{2}))\big)$.

In the first simulation example, we compared the asymptotic MSE (45) to its empirical estimate (47) as a function of $\omega$, obtained from a single realization of N = 00 i.i.d. snapshots with SNR = 25 [dB]. Observing Fig. 1, one sees that due to the consistency of (47) the compared quantities are close. In the second simulation example, we compared the empirical, asymptotic (45) and empirical asymptotic (47) MSEs of the MT-QMLE to the empirical MSEs obtained by the SCM-based estimator [22] and the MLE. The optimal Gaussian MT-function parameter $\omega_{\mathrm{opt}}$ was obtained by minimizing (47) over the interval [1, 30]. Here, N = 00 snapshots were used with averaging over 00 Monte-Carlo simulations. The SNR is used to index the performances as depicted in Fig. 2. One sees that the MT-QMLE outperforms the non-robust SCM-based estimator and performs similarly to the MLE that, unlike the MT-QMLE, requires complete knowledge of the likelihood function. Additionally, one can notice that the empirical, asymptotic and empirical asymptotic MSEs of the MT-QMLE are nearly identical.
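A compact sketch of the resulting AOA estimator is given below (illustrative code, not from the paper): with the zero-centred Gaussian MT-function (42), the MT-QMLE reduces to maximizing the quadratic form stated above over a grid of candidate angles.

```python
import numpy as np

def aoa_mt_qmle(X, omega, grid, p):
    """AOA MT-QMLE sketch: maximize a^H(theta) C(omega) a(theta) over the grid,
    where C(omega) is the empirical MT-covariance plus the MT-mean outer product."""
    w = np.exp(-np.linalg.norm(X, axis=1) ** 2 / omega ** 2)
    w = w / w.sum()                                    # normalized MT weights, eq. (11)
    mu = (w[:, None] * X).sum(axis=0)                  # empirical MT-mean
    Xc = X - mu
    C = (w[:, None] * Xc).T @ Xc.conj() + np.outer(mu, mu.conj())

    def steer(t):
        return np.exp(-1j * np.pi * np.arange(p) * np.sin(t))

    vals = [float(np.real(steer(t).conj() @ C @ steer(t))) for t in grid]
    return grid[int(np.argmax(vals))]
```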

In order to evaluate the performance of the proposed estimator under the nominal Gaussian noise case, we set the contamination parameter $\epsilon$ to zero and repeated the simulation examples described above under the same settings, with the exception that different SNR levels, ranging from 0 to 20 [dB], were considered. Observing Fig. 3, one sees that, again, the asymptotic MSE and its consistent empirical estimate behave similarly as the width parameter $\omega$ of the Gaussian MT-function varies. Furthermore, observing Fig. 4, one can notice that the proposed MT-QMLE performs similarly to the standard SCM based estimator when the noise is normally distributed.

Fig. 1. AOA based source localization in $\epsilon$-contaminated Gaussian noise: Asymptotic RMSE (45) and its empirical estimate (47) versus the width parameter $\omega$ of the Gaussian MT-function (42). The true angle of arrival, number of snapshots and signal-to-noise ratio were set to $\theta = 30°$, N = 00 and SNR = 25 [dB], respectively. Notice that due to the consistency of (47) the compared quantities are close.

B. Robust received signal strength based source localization

Here, we illustrate the use of the plug-in MT-QMLE (36) for robust received signal strength (RSS) based source localization in heavy-tailed non-Gaussian noise. For simplicity, we consider an array of $p$ anchor sensors located on a circle with radius $r_{a}$ [meters] that receive the power of a source located on a circle with known radius $r_{s}$ [meters]. Under the log-distance path loss model [23], the array output satisfies
$$\mathbf{X}_{n} = \mathbf{a}(\theta) + \mathbf{W}_{n}, \quad n = 1, \ldots, N, \qquad (48)$$

Fig. 2. AOA based source localization in $\epsilon$-contaminated Gaussian noise: The empirical, asymptotic (45) and empirical asymptotic (47) RMSEs of the MT-QMLE versus SNR as compared to the empirical RMSEs of the SCM-based estimator and the MLE. The true angle of arrival and number of snapshots were set to $\theta = 30°$ and N = 00, respectively. Notice that the MT-QMLE outperforms the non-robust SCM-based estimator and performs similarly to the MLE that, unlike the MT-QMLE, requires complete statistical information.

Fig. 3. AOA based source localization in Gaussian noise: Asymptotic RMSE (45) and its empirical estimate (47) versus the width parameter $\omega$ of the Gaussian MT-function (42). The true angle of arrival, number of snapshots and signal-to-noise ratio were set to $\theta = 30°$, N = 00 and SNR = 0 [dB], respectively. Notice that due to the consistency of (47) the compared quantities are close.

Fig. 4. AOA based source localization in Gaussian noise: The empirical, asymptotic (45) and empirical asymptotic (47) RMSEs of the MT-QMLE versus SNR as compared to the empirical RMSEs of the SCM-based estimator and the MLE. The true angle of arrival and number of snapshots were set to $\theta = 30°$ and N = 00, respectively. Notice that the MT-QMLE performs similarly to the SCM-based estimator when the noise is normally distributed.

where $n$ is a discrete time index, $\mathbf{X}_{n} \in \mathbb{R}^{p}$ is a vector of received signals,
$$[\mathbf{a}(\theta)]_{k} \triangleq P_{0} - 10\gamma\,\log_{10}\big(d_{k}(\theta)/d_{0}\big), \quad k = 1, \ldots, p, \qquad (49)$$
$P_{0}$ is the reference power in [dB] at reference distance $d_{0}$ [meters], $\gamma$ is the path loss exponent,
$$d_{k}(\theta) \triangleq \|\mathbf{z}_{s}(\theta) - \mathbf{z}_{a}(\phi_{k})\|_{2} \qquad (50)$$
is the distance in [meters] between the source and the $k$-th anchor,
$$\mathbf{z}_{s}(\theta) \triangleq r_{s}\,[\cos\theta,\, \sin\theta]^{T} \qquad (51)$$
is the source coordinates vector, $\theta \in \Theta = [0,\, 2\pi - \Delta]$ is the unknown angle to be estimated, $\Delta$ is a small positive constant,
$$\mathbf{z}_{a}(\phi_{k}) \triangleq r_{a}\,[\cos\phi_{k},\, \sin\phi_{k}]^{T} \qquad (52)$$
is the vector of coordinates of the $k$-th anchor sensor, and $\mathbf{W}_{n} \in \mathbb{R}^{p}$ is an unobserved additive i.i.d. noise process with unknown distribution. In this case, the MT-mean $\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ and MT-covariance $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ are unknown, and therefore the MT-QMLE (21) cannot be implemented.
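A small sketch of the noise-free RSS model (49)-(52) is given below (illustrative code); the standard log-distance form with a minus 10-gamma-log10 term and the way the anchor angles are supplied are assumptions of this sketch rather than details stated explicitly in the text.

```python
import numpy as np

def rss_model(theta, anchor_angles, P0, gamma, d0, r_s, r_a):
    """Noise-free RSS vector a(theta) under the log-distance path loss model.
    anchor_angles holds the known angular positions phi_k of the anchors."""
    zs = r_s * np.array([np.cos(theta), np.sin(theta)])           # source, eq. (51)
    za = r_a * np.stack([np.cos(anchor_angles),
                         np.sin(anchor_angles)], axis=1)          # anchors, eq. (52)
    d = np.linalg.norm(zs[None, :] - za, axis=1)                  # distances, eq. (50)
    return P0 - 10.0 * gamma * np.log10(d / d0)                   # eq. (49)
```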

We assume a training sequence $\mathbf{W}'_{m}$, $m = 1, \ldots, M$ of i.i.d. samples from the unknown noise distribution $P_{\mathbf{w}}$ that is independent of the observation sequence $\mathbf{X}_{n}$, $n = 1, \ldots, N$. Using the training sequence we obtain empirical estimates of $\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ and $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ for any value of $\boldsymbol{\vartheta} \in \Theta$ and implement the plug-in version of the MT-QMLE (36). By (7), (8) and (48) it follows that the MT-mean and MT-covariance of $\mathbf{X}$ under $Q_{\mathbf{x};\boldsymbol{\vartheta}}$ are given by
$$\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}) = \mathbf{a}(\boldsymbol{\vartheta}) + \boldsymbol{\mu}_{\mathbf{w}}^{(v)}(\boldsymbol{\vartheta}) \qquad (53)$$
and
$$\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}) = \boldsymbol{\Sigma}_{\mathbf{w}}^{(v)}(\boldsymbol{\vartheta}), \qquad (54)$$
respectively, where the parametric MT-function $v(\cdot; \cdot)$ is defined as
$$v(\mathbf{W}; \boldsymbol{\vartheta}) \triangleq u(\mathbf{a}(\boldsymbol{\vartheta}) + \mathbf{W}). \qquad (55)$$
Based on relations (53) and (54) we define the following empirical estimates of $\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$ and $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$:
$$\tilde{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}) \triangleq \mathbf{a}(\boldsymbol{\vartheta}) + \hat{\boldsymbol{\mu}}_{\mathbf{w}}^{(v)}(\boldsymbol{\vartheta}) \qquad (56)$$
and
$$\tilde{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}) \triangleq \hat{\boldsymbol{\Sigma}}_{\mathbf{w}}^{(v)}(\boldsymbol{\vartheta}), \qquad (57)$$
where $\hat{\boldsymbol{\mu}}_{\mathbf{w}}^{(v)}(\boldsymbol{\vartheta})$ and $\hat{\boldsymbol{\Sigma}}_{\mathbf{w}}^{(v)}(\boldsymbol{\vartheta})$ are obtained using (9) and (10) with the training noise sequence $\mathbf{W}'_{m}$, $m = 1, \ldots, M$ and the parametric MT-function (55). The plug-in MT-QMLE is obtained by substituting (56) and (57) into (37), followed by the maximization step (36).

In order to test the RSS-based localization performance under heavy-tailed non-Gaussian noise that produces outliers, we assume a real compound Gaussian noise with stochastic representation [21]
$$\mathbf{W}_{n} = \nu_{n} \mathbf{Z}_{n}, \qquad (58)$$
where $\nu_{n} \in \mathbb{R}_{++}$ is a first-order stationary process, called the texture component, and $\mathbf{Z}_{n} \in \mathbb{R}^{p}$ is a real wide-sense stationary Gaussian process, independent of $\nu_{n}$, with zero mean and scaled unit covariance $\sigma_{Z}^{2}\mathbf{I}$. Furthermore, in order to gain robustness against outliers and sensitivity to higher-order statistical information, we consider the following Gaussian MT-function, which satisfies the B-robustness conditions stated in Proposition 3:
$$u_{\mathrm{G}}(\mathbf{x}; \boldsymbol{\omega}) = (2\pi\tau^{2})^{-p/2}\, \exp\big(-\|\mathbf{x} - \mathbf{a}(t)\|^{2}/(2\tau^{2})\big), \qquad (59)$$
where $\boldsymbol{\omega} \triangleq [t, \tau]^{T}$, $t$ is a location parameter and $\tau$ is a scale parameter.
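A minimal sketch of the plug-in estimates (56)-(57) is given below (illustrative code, not from the paper); the callables a and u stand for the RSS model and the chosen MT-function, and the real-valued noise of this application is assumed.

```python
import numpy as np

def plugin_mt_moments(theta, W_train, a, u):
    """Plug-in estimates (56)-(57): with the parametric MT-function
    v(w; theta) = u(a(theta) + w) from (55), the hypothesized MT-mean and
    MT-covariance are estimated from a noise-only training sequence.

    W_train : (M, p) real array of noise training samples."""
    shifted = a(theta) + W_train                     # a(theta) + W'_m, as in (55)
    w = np.array([u(x) for x in shifted], dtype=float)
    w = w / w.sum()                                  # normalized weights, eq. (11)
    mu_w = (w[:, None] * W_train).sum(axis=0)        # empirical MT-mean of the noise
    Wc = W_train - mu_w
    Sigma_w = (w[:, None] * Wc).T @ Wc               # real-valued noise, no conjugate
    return a(theta) + mu_w, Sigma_w                  # (56) and (57)
```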

Under the choice of the Gaussian MT-function (59), it can be shown that the empirical estimates of $\boldsymbol{\mu}_{\mathbf{x}}^{(u_{\mathrm{G}})}(\boldsymbol{\vartheta}; \boldsymbol{\omega})$ and $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u_{\mathrm{G}})}(\boldsymbol{\vartheta}; \boldsymbol{\omega})$, defined in (56) and (57), are uniformly consistent. Therefore, by Theorem 4, the plug-in MT-QMLE attains the asymptotic MSE (24) of the MT-QMLE when the noise training sequence length $M$ is sufficiently large. A closed-form expression for the resulting asymptotic MSE is given in Appendix G. Moreover, an empirical estimate of the asymptotic MSE is obtained using (32) after replacing $\boldsymbol{\mu}_{\mathbf{x}}^{(u_{\mathrm{G}})}(\boldsymbol{\vartheta}; \boldsymbol{\omega})$ and $\boldsymbol{\Sigma}_{\mathbf{x}}^{(u_{\mathrm{G}})}(\boldsymbol{\vartheta}; \boldsymbol{\omega})$ with their sample estimates (56), (57). A closed-form expression for the empirical asymptotic MSE of the plug-in MT-QMLE is given in Appendix H.

In the following simulation examples, we consider p = 4 sensors located on a circle with radius $r_{a}$ = 5 [meters] that receive the power of a source located on a wider circle with radius $r_{s}$ = 50 [meters]. The angle $\theta$ was set to 180°, the reference power and distance were set to $P_{0}$ = [dB] and $d_{0}$ = 1 [meter], and the path loss exponent was set to $\gamma$ = 5. We consider an $\epsilon$-contaminated Gaussian noise model [21] under which the texture component of the compound Gaussian noise (58) is a binary random variable with $\nu = 1$ w.p. $1 - \epsilon$ and $\nu = c$ w.p. $\epsilon$. The parameters $\epsilon$ and $c$ that control the heaviness of the noise tails were set to 0.2 and 50, respectively. The length of the noise training sequence was set to M = 000.

In the first simulation example, we compared the asymptotic MSE (86) to its empirical estimate (8) as a function of the location and width parameters $t$ and $\tau$ of the Gaussian MT-function (59). The empirical estimate was obtained from a single realization of N = 00 i.i.d. snapshots with noise variance $\sigma_{W}^{2}$ = [dB]. Observing Fig. 5, one sees that the compared quantities are close. In the second simulation example, we compared the empirical, asymptotic (86) and empirical asymptotic (8) MSEs of the plug-in MT-QMLE to the empirical MSEs obtained by the MLE and the standard sample mean based estimator, which is the MLE under the assumption of spatially white Gaussian noise. The optimal Gaussian MT-function parameters $t_{\mathrm{opt}}$ and $\tau_{\mathrm{opt}}$ were obtained by minimizing the empirical asymptotic MSE (8) over $[0, 2\pi] \times [2, 15]$. Here, N = 00 snapshots were used with averaging over 00 Monte-Carlo simulations. The noise variance $\sigma_{W}^{2}$ is used to index the performances as depicted in Fig. 6. One sees that the plug-in MT-QMLE outperforms the non-robust sample-mean based estimator and performs similarly to the MLE that, unlike the plug-in MT-QMLE, requires complete knowledge of the likelihood function. Additionally, one can notice that the empirical, asymptotic and empirical asymptotic MSEs of the plug-in MT-QMLE are nearly identical.

In order to evaluate the performance of the proposed estimator under the nominal Gaussian noise case, we set the contamination parameter $\epsilon$ to zero and repeated the simulation examples described above under the same settings, with the exception that different SNR levels, ranging from 0 to 20 [dB], were considered. Observing Fig. 7, one sees that, again, the asymptotic MSE and its consistent empirical estimate behave similarly as the location and width parameters of the Gaussian MT-function vary. Furthermore, observing Fig. 8, one can notice that the proposed MT-QMLE performs similarly to the standard sample mean based estimator when the noise is normally distributed.

Fig. 5. RSS based source localization in $\epsilon$-contaminated Gaussian noise: (a) The asymptotic RMSE (86) of the plug-in MT-QMLE and (b) its empirical estimate (8) versus the location and width parameters $(t, \tau)$ of the Gaussian MT-function (59). The true angle, the number of observations, the noise training sequence length and the noise variance were set to $\theta$ = 180°, N = 00, M = 000, and $\sigma_{W}^{2}$ = [dB], respectively. Notice that due to the consistency of (8) the compared quantities are close.

VI. CONCLUSION

In this paper a new multivariate estimator, called the MT-QMLE, was derived by applying a transform to the probability distribution of the data. A plug-in version of the MT-QMLE was also developed that is based on empirical estimates of the MT-mean and MT-covariance functions that characterize the hypothesized Gaussian probability measure. By specifying the MT-function in the Gaussian family, the proposed MT-QMLE and its plug-in version were applied to robust AOA and RSS based source localization problems. Exploration of other MT-functions may result in additional estimators in this class that have different useful properties.

APPENDIX

A. Proof of Proposition 1:

Property 1: Since $\varphi_{u}(\mathbf{x}; \boldsymbol{\theta})$ is non-negative, then by Corollary in [25] $Q_{\mathbf{x};\boldsymbol{\theta}}$ is a measure on $\mathcal{S}_{\mathcal{X}}$. Furthermore, $Q_{\mathbf{x};\boldsymbol{\theta}}(\mathcal{X}) = 1$, so that $Q_{\mathbf{x};\boldsymbol{\theta}}$ is a probability measure on $\mathcal{S}_{\mathcal{X}}$.

Fig. 6. RSS based source localization in $\epsilon$-contaminated Gaussian noise: The empirical, asymptotic (86) and empirical asymptotic (8) RMSEs of the plug-in MT-QMLE versus the noise variance $\sigma_{W}^{2}$ as compared to the empirical RMSEs of the sample-mean based estimator and the MLE. The true angle, the number of observations and the noise training sequence length were set to $\theta$ = 180°, N = 00 and M = 000, respectively. Notice that the plug-in MT-QMLE outperforms the non-robust sample-mean based estimator and performs similarly to the MLE that, unlike the plug-in MT-QMLE, requires complete statistical information.

Fig. 7. RSS based source localization in Gaussian noise: (a) The asymptotic RMSE (86) of the plug-in MT-QMLE and (b) its empirical estimate (8) versus the location and width parameters $(t, \tau)$ of the Gaussian MT-function (59). The true angle, the number of observations, the noise training sequence length and the noise variance were set to $\theta$ = 180°, N = 00, M = 000, and $\sigma_{W}^{2}$ = 5 [dB], respectively. Notice that due to the consistency of (8) the compared quantities are close.

Fig. 8. RSS based source localization in Gaussian noise: The empirical, asymptotic (86) and empirical asymptotic (8) RMSEs of the plug-in MT-QMLE versus the noise variance $\sigma_{W}^{2}$ as compared to the empirical RMSEs of the sample-mean based estimator and the MLE. The true angle, the number of observations and the noise training sequence length were set to $\theta$ = 180°, N = 00 and M = 000, respectively. Notice that the plug-in MT-QMLE performs similarly to the sample mean based estimator when the noise is normally distributed.

Property 2: Follows from the definitions in [25].

B. Proof of Theorem 1:

Define the deterministic quantity
$$\bar{J}_{u}(\boldsymbol{\vartheta}) \triangleq -D_{\mathrm{LD}}\big[\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta}) \,\|\, \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big] - \big\|\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta}) - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big\|^{2}_{(\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}))^{-1}}. \qquad (60)$$
According to Lemmas 1 and 2, stated below, $\bar{J}_{u}(\boldsymbol{\vartheta})$ is uniquely maximized at $\boldsymbol{\vartheta} = \boldsymbol{\theta}$ and the random objective function $J_{u}(\boldsymbol{\vartheta})$ defined in (20) converges uniformly w.p. 1 to $\bar{J}_{u}(\boldsymbol{\vartheta})$ as $N \to \infty$. Furthermore, by assumptions A-3 and A-4, $J_{u}(\boldsymbol{\vartheta})$ is continuous over the compact parameter space $\Theta$ w.p. 1. Therefore, by Theorem 3.4 in [33] we conclude that (22) holds.

Lemma 1. Let $\bar{J}_{u}(\boldsymbol{\vartheta})$ be defined by relation (60). If conditions A-1-A-4 are satisfied, then $\bar{J}_{u}(\boldsymbol{\vartheta})$ is uniquely maximized at $\boldsymbol{\vartheta} = \boldsymbol{\theta}$.

Proof. By (60) and assumptions A-3 and A-4, $\bar{J}_{u}(\boldsymbol{\vartheta})$ is continuous over the compact set $\Theta$. Therefore, by the extreme-value theorem [36] it must have a global maximum in $\Theta$. We now show that the global maximum is uniquely attained at $\boldsymbol{\vartheta} = \boldsymbol{\theta}$. According to (60),
$$\bar{J}_{u}(\boldsymbol{\theta}) - \bar{J}_{u}(\boldsymbol{\vartheta}) = D_{\mathrm{LD}}\big[\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta}) \,\|\, \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big] + \big\|\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta}) - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big\|^{2}_{(\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}))^{-1}}. \qquad (61)$$
Notice that the log-determinant divergence between positive-definite matrices $\mathbf{A}$, $\mathbf{B}$ satisfies $D_{\mathrm{LD}}[\mathbf{A}\,\|\,\mathbf{B}] \geq 0$ with equality if and only if $\mathbf{A} = \mathbf{B}$ [30]. Also notice that the weighted $\ell_{2}$-norm of a vector $\mathbf{a}$ with positive-definite weighting matrix $\mathbf{C}$ satisfies $\|\mathbf{a}\|_{\mathbf{C}} \geq 0$ with equality if and only if $\mathbf{a} = \mathbf{0}$. Therefore, by assumptions A-2 and A-3 we conclude that the difference in (61) is strictly positive for all $\boldsymbol{\vartheta} \neq \boldsymbol{\theta}$, which implies that $\bar{J}_{u}(\boldsymbol{\vartheta})$ is uniquely maximized at $\boldsymbol{\vartheta} = \boldsymbol{\theta}$.

Lemma 2. Let $J_{u}(\boldsymbol{\vartheta})$ and $\bar{J}_{u}(\boldsymbol{\vartheta})$ be defined by relations (20) and (60), respectively. If conditions A-1 and A-3-A-5 are satisfied, then
$$\sup_{\boldsymbol{\vartheta} \in \Theta}\big|J_{u}(\boldsymbol{\vartheta}) - \bar{J}_{u}(\boldsymbol{\vartheta})\big| \to 0 \;\; \text{w.p. 1 as } N \to \infty. \qquad (62)$$
Proof. Using (20), (60), the Cauchy-Schwarz inequality and its matrix version [34], one can verify that $\sup_{\boldsymbol{\vartheta} \in \Theta}|J_{u}(\boldsymbol{\vartheta}) - \bar{J}_{u}(\boldsymbol{\vartheta})|$ is upper bounded w.p. 1 by a sum of terms, each of which is a continuous function of the differences $\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)} - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta})$ and $\hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)} - \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta})$ scaled by $\sup_{\boldsymbol{\vartheta}}\|(\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}))^{-1}\|_{\mathrm{Fro}}$ and $\sup_{\boldsymbol{\vartheta}}\|\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\|$, and which vanishes when the differences are zero. Notice that by assumptions A-3 and A-4 and the compactness of $\Theta$, the terms $\sup_{\boldsymbol{\vartheta}}\|(\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}))^{-1}\|_{\mathrm{Fro}}$ and $\sup_{\boldsymbol{\vartheta}}\|\boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\|$ are finite. Also notice that by assumption A-5 and Proposition 2, $\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)} \to \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta})$ and $\hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)} \to \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\theta})$ w.p. 1 as $N \to \infty$, and that the operators $\|\cdot\|_{\mathrm{Fro}}$, $\|\cdot\|$ and $\log\det[\cdot]$ define real continuous mappings of $\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}$ and $\hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}$. Hence, by the Mann-Wald theorem [35] this upper bound converges to zero w.p. 1 as $N \to \infty$, and therefore the relation (62) must hold.

C. Proof of Theorem 2:

By assumptions B-1 and B-2, the estimator $\hat{\boldsymbol{\theta}}_{u}$ is weakly consistent and the true parameter vector $\boldsymbol{\theta}$ lies in the interior of $\Theta$, which is assumed to be compact. Therefore, we conclude that $\hat{\boldsymbol{\theta}}_{u}$ lies in an open neighbourhood $\mathcal{U}$ of $\boldsymbol{\theta}$ with sufficiently high probability as $N$ gets large.

Hence, $\hat{\boldsymbol{\theta}}_{u}$ is a local maximum point of the objective function $J_{u}(\boldsymbol{\vartheta})$ (20), whose gradient satisfies
$$\nabla_{\boldsymbol{\vartheta}} J_{u}(\hat{\boldsymbol{\theta}}_{u}) \sum_{n=1}^{N} u(\mathbf{X}_{n}) = \sum_{n=1}^{N} u(\mathbf{X}_{n})\, \boldsymbol{\psi}_{u}(\mathbf{X}_{n}; \hat{\boldsymbol{\theta}}_{u}) = \mathbf{0}. \qquad (63)$$
By (18) and assumption B-3, the vector function $\boldsymbol{\psi}_{u}(\mathbf{x}; \boldsymbol{\vartheta})$ defined in (26) is continuous over $\Theta$ w.p. 1. Using (63) and the mean-value theorem [37] applied to each entry of $\boldsymbol{\psi}_{u}(\mathbf{x}; \boldsymbol{\vartheta})$, we conclude that
$$\mathbf{0} = \frac{1}{\sqrt{N}}\sum_{n=1}^{N} u(\mathbf{X}_{n})\, \boldsymbol{\psi}_{u}(\mathbf{X}_{n}, \hat{\boldsymbol{\theta}}_{u}) = \frac{1}{\sqrt{N}}\sum_{n=1}^{N} u(\mathbf{X}_{n})\, \boldsymbol{\psi}_{u}(\mathbf{X}_{n}, \boldsymbol{\theta}) - \hat{\mathbf{F}}_{u}(\bar{\boldsymbol{\theta}})\, \sqrt{N}\,(\hat{\boldsymbol{\theta}}_{u} - \boldsymbol{\theta}), \qquad (64)$$
where $\hat{\mathbf{F}}_{u}(\boldsymbol{\vartheta})$, defined in (34), is an unbiased estimator of the symmetric matrix function $\mathbf{F}_{u}(\boldsymbol{\vartheta})$ (27), $\boldsymbol{\Upsilon}_{u}(\mathbf{x}; \boldsymbol{\vartheta})$ is defined in (28) and $\bar{\boldsymbol{\theta}}$ lies on the line segment connecting $\hat{\boldsymbol{\theta}}_{u}$ and $\boldsymbol{\theta}$. Since $\hat{\boldsymbol{\theta}}_{u}$ is weakly consistent, $\bar{\boldsymbol{\theta}}$ must be weakly consistent as well. Therefore, by Lemma 3, stated below, $\hat{\mathbf{F}}_{u}(\bar{\boldsymbol{\theta}}) \xrightarrow{P} \mathbf{F}_{u}(\boldsymbol{\theta})$ as $N \to \infty$, where $\mathbf{F}_{u}(\boldsymbol{\vartheta})$ is non-singular at $\boldsymbol{\vartheta} = \boldsymbol{\theta}$ by assumption. Hence, by the Mann-Wald theorem [35],
$$\hat{\mathbf{F}}_{u}^{-1}(\bar{\boldsymbol{\theta}}) \xrightarrow{P} \mathbf{F}_{u}^{-1}(\boldsymbol{\theta}) \;\; \text{as } N \to \infty, \qquad (65)$$
which implies that $\hat{\mathbf{F}}_{u}(\bar{\boldsymbol{\theta}})$ is invertible with sufficiently high probability as $N$ gets large. Therefore, by (64), the equality
$$\sqrt{N}\,(\hat{\boldsymbol{\theta}}_{u} - \boldsymbol{\theta}) = \hat{\mathbf{F}}_{u}^{-1}(\bar{\boldsymbol{\theta}})\, \frac{1}{\sqrt{N}}\sum_{n=1}^{N} u(\mathbf{X}_{n})\, \boldsymbol{\psi}_{u}(\mathbf{X}_{n}, \boldsymbol{\theta}) \qquad (66)$$
holds with sufficiently high probability as $N$ gets large. Furthermore, by Lemma 5, stated below,
$$\frac{1}{\sqrt{N}}\sum_{n=1}^{N} u(\mathbf{X}_{n})\, \boldsymbol{\psi}_{u}(\mathbf{X}_{n}, \boldsymbol{\theta}) \xrightarrow{D} \mathcal{N}\big(\mathbf{0}, \mathbf{G}_{u}(\boldsymbol{\theta})\big) \;\; \text{as } N \to \infty, \qquad (67)$$
where $\mathbf{G}_{u}(\boldsymbol{\vartheta})$ is defined in (25). Thus, by (65)-(67) and Slutsky's theorem [25], the relation (23) holds.

Lemma 3. Let $\mathbf{F}_{u}(\boldsymbol{\vartheta})$ and $\hat{\mathbf{F}}_{u}(\boldsymbol{\vartheta})$ be the matrix functions defined in (27) and (34), respectively. Furthermore, let $\bar{\boldsymbol{\theta}}$ denote an estimator of $\boldsymbol{\theta}$. If $\bar{\boldsymbol{\theta}} \xrightarrow{P} \boldsymbol{\theta}$ as $N \to \infty$ and conditions B-3 and B-4 are satisfied, then $\hat{\mathbf{F}}_{u}(\bar{\boldsymbol{\theta}}) \xrightarrow{P} \mathbf{F}_{u}(\boldsymbol{\theta})$ as $N \to \infty$.

Proof. By the triangle inequality,
$$\big\|\hat{\mathbf{F}}_{u}(\bar{\boldsymbol{\theta}}) - \mathbf{F}_{u}(\boldsymbol{\theta})\big\| \leq \big\|\hat{\mathbf{F}}_{u}(\bar{\boldsymbol{\theta}}) - \mathbf{F}_{u}(\bar{\boldsymbol{\theta}})\big\| + \big\|\mathbf{F}_{u}(\bar{\boldsymbol{\theta}}) - \mathbf{F}_{u}(\boldsymbol{\theta})\big\|. \qquad (68)$$
Therefore, it suffices to prove that
$$\big\|\hat{\mathbf{F}}_{u}(\bar{\boldsymbol{\theta}}) - \mathbf{F}_{u}(\bar{\boldsymbol{\theta}})\big\| \xrightarrow{P} 0 \;\; \text{as } N \to \infty \qquad (69)$$
and
$$\big\|\mathbf{F}_{u}(\bar{\boldsymbol{\theta}}) - \mathbf{F}_{u}(\boldsymbol{\theta})\big\| \xrightarrow{P} 0 \;\; \text{as } N \to \infty. \qquad (70)$$

24 24 and kf u ( ) F u ( )k P! 0 as N!1. (70) By Lemma 4, stated below, F u (#) is continuous in. Therefore, since P! as N!1, by the Mann-Wald Theorem [35] we conclude that (70) holds. We further conclude by Lemma 4 that kˆf u ( ) F u ( ) ksup ˆF u (#) F u (#) P! 0, as N!1. Lemma 4. Let F u (#) and ˆF u (#) be the matrix functions defined in (27) and (34), respectively. If conditions B-3 and B-4 are satisfied, then F u (#) is continuous in and sup as N!1. ˆF u (#) F u (#) Proof. Notice that the samples n, n =1,...,N are i.i.d., the parameter space is compact and by (18) and assumption B-3 the matrix function u (; #) defined in (28) is continuous in w.p. 1. Furthermore, using (18), (28), the triangle inequality, the Cauchy-Schwartz inequality, assumption B-3, and the compactness of it can be shown that there exists a positive scalar C such that [ u (; #)] k,j 1+kk 2 C w.p. 1. 8k, j =1,...,p. Therefore, by (27), (34), assumption B-4 and the uniform weak law of large numbers [39] we conclude that F u (#) is continuous in and sup as N!1. ˆF u (#) F u (#) P! 0 P! 0 Lemma 5. Given a sequence n, n =1,...,N of i.i.d. samples from P ; 1 p N N n=1 u ( n ) u ( n, ) D! N (0, G u ( )) as N!1, where G u ( ) and u (, ) are defined in (25) and (26), respectively. Proof. Since n, n =1,...,N are i.i.d. random vectors and the functions u ( ) and u (, ) are real the products u ( n ) u ( n, ) n =1,...,N are real and i.i.d. Furthermore, by Lemmas 6 and 7, stated below, the definition of G u ( ) in (25) and the compactness of we have that E[u () u (, );P ; ]= 0 and cov [u () u (, );P ; ] = G u ( ) is finite. Therefore by the central limit theorem [38] 1 NP we conclude that p N u ( n ) u ( n, ) is asymptotically normal with zero mean and covariance G u ( ). n=1 Lemma 6. The random vector function u (, #) defined in (26) satisfies E[u () u (, #);P ;# ]=0. (71)

Proof. Notice that
$$\mathrm{E}[u(\mathbf{X})\, \boldsymbol{\psi}_{u}(\mathbf{X}, \boldsymbol{\vartheta}); P_{\mathbf{x};\boldsymbol{\vartheta}}] = \mathrm{E}[\boldsymbol{\psi}_{u}(\mathbf{X}, \boldsymbol{\vartheta})\, \varphi_{u}(\mathbf{X}; \boldsymbol{\vartheta}); P_{\mathbf{x};\boldsymbol{\vartheta}}]\, \mathrm{E}[u(\mathbf{X}); P_{\mathbf{x};\boldsymbol{\vartheta}}], \qquad (72)$$
where $\varphi_{u}(\cdot; \cdot)$ is defined in (5) and the expectation $\mathrm{E}[u(\mathbf{X}); P_{\mathbf{x};\boldsymbol{\vartheta}}]$ is finite by assumption (3). Also notice that by (18) and (26), the $k$-th entry of the vector function $\boldsymbol{\psi}_{u}(\mathbf{x}, \boldsymbol{\vartheta})$ is given by
$$[\boldsymbol{\psi}_{u}(\mathbf{x}, \boldsymbol{\vartheta})]_{k} = \frac{\partial \log\phi(\mathbf{x}; \boldsymbol{\vartheta})}{\partial \vartheta_{k}} = -\mathrm{tr}\Big(\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)-1}(\boldsymbol{\vartheta})\, \frac{\partial\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})}{\partial\vartheta_{k}}\Big) + 2\,\mathrm{Re}\Big\{\frac{\partial\boldsymbol{\mu}_{\mathbf{x}}^{(u)H}(\boldsymbol{\vartheta})}{\partial\vartheta_{k}}\, \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)-1}(\boldsymbol{\vartheta})\,\big(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big)\Big\} + \big(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big)^{H} \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)-1}(\boldsymbol{\vartheta})\, \frac{\partial\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})}{\partial\vartheta_{k}}\, \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)-1}(\boldsymbol{\vartheta})\,\big(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big). \qquad (73)$$
Therefore, by (7), (8), (72) and (73), the relation (71) is easily verified.

Lemma 7. Let $\mathbf{G}_{u}(\boldsymbol{\vartheta})$ and $\hat{\mathbf{G}}_{u}(\boldsymbol{\vartheta})$ be the matrix functions defined in (25) and (33), respectively. If conditions B-3 and B-4 are satisfied, then $\mathbf{G}_{u}(\boldsymbol{\vartheta})$ is continuous in $\Theta$ and $\sup_{\boldsymbol{\vartheta} \in \Theta}\|\hat{\mathbf{G}}_{u}(\boldsymbol{\vartheta}) - \mathbf{G}_{u}(\boldsymbol{\vartheta})\| \xrightarrow{P} 0$ as $N \to \infty$.

Proof. Notice that the samples $\mathbf{X}_{n}$, $n = 1, \ldots, N$ are i.i.d., the parameter space $\Theta$ is compact, and by (18) and assumption B-3 the vector function $\boldsymbol{\psi}_{u}(\mathbf{x}; \boldsymbol{\vartheta})$ defined in (26) is continuous in $\Theta$ w.p. 1. Using (18), (26), the triangle inequality, the Cauchy-Schwarz inequality, assumption B-3 and the compactness of $\Theta$, it can be shown that there exists a positive scalar $C$ such that
$$\big|[\boldsymbol{\psi}_{u}(\mathbf{X}; \boldsymbol{\vartheta})\, \boldsymbol{\psi}_{u}^{T}(\mathbf{X}; \boldsymbol{\vartheta})]_{k,j}\big| \leq \big(1 + \|\mathbf{X}\|_{2} + \|\mathbf{X}\|_{2}^{2} + \|\mathbf{X}\|_{2}^{3} + \|\mathbf{X}\|_{2}^{4}\big)\, C \;\; \text{for all } k, j = 1, \ldots, m. \qquad (74)$$
Therefore, by (25), (33), assumption B-4 and the uniform weak law of large numbers [39], we conclude that $\mathbf{G}_{u}(\boldsymbol{\vartheta})$ is continuous in $\Theta$ and $\sup_{\boldsymbol{\vartheta} \in \Theta}\|\hat{\mathbf{G}}_{u}(\boldsymbol{\vartheta}) - \mathbf{G}_{u}(\boldsymbol{\vartheta})\| \xrightarrow{P} 0$ as $N \to \infty$.

D. Proof of Proposition 4:

Define $\boldsymbol{\eta}(\mathbf{X}; \boldsymbol{\theta}) \triangleq u(\mathbf{X})\, \boldsymbol{\psi}_{u}(\mathbf{X}; \boldsymbol{\theta})$. Using the definitions of $\mathbf{C}_{u}(\boldsymbol{\theta})$, $\mathbf{G}_{u}(\boldsymbol{\theta})$ and $\mathbf{F}_{u}(\boldsymbol{\theta})$ in (24), (25) and (27), respectively, Identity 1 stated below, the symmetry of $\mathbf{F}_{u}(\boldsymbol{\theta})$, the non-singularity of $\mathbf{G}_{u}(\boldsymbol{\theta})$ and the covariance inequality [3], one can verify that
$$\mathbf{C}_{u}^{-1}(\boldsymbol{\theta}) = N\, \mathrm{E}\big[\boldsymbol{\psi}(\mathbf{X}; \boldsymbol{\theta})\, \boldsymbol{\eta}^{T}(\mathbf{X}; \boldsymbol{\theta}); P_{\mathbf{x};\boldsymbol{\theta}}\big]\, \mathrm{E}^{-1}\big[\boldsymbol{\eta}(\mathbf{X}; \boldsymbol{\theta})\, \boldsymbol{\eta}^{T}(\mathbf{X}; \boldsymbol{\theta}); P_{\mathbf{x};\boldsymbol{\theta}}\big]\, \mathrm{E}\big[\boldsymbol{\eta}(\mathbf{X}; \boldsymbol{\theta})\, \boldsymbol{\psi}^{T}(\mathbf{X}; \boldsymbol{\theta}); P_{\mathbf{x};\boldsymbol{\theta}}\big] \preceq N\, \mathrm{E}\big[\boldsymbol{\psi}(\mathbf{X}; \boldsymbol{\theta})\, \boldsymbol{\psi}^{T}(\mathbf{X}; \boldsymbol{\theta}); P_{\mathbf{x};\boldsymbol{\theta}}\big] \triangleq N\, \mathbf{I}_{\mathrm{FIM}}(\boldsymbol{\theta}), \qquad (75)$$
where equality holds if and only if the relation (31) holds. The proof is complete by taking the inverse of (75).

Identity 1. $\mathbf{F}_{u}(\boldsymbol{\theta}) = \mathrm{E}\big[u(\mathbf{X})\, \boldsymbol{\psi}_{u}(\mathbf{X}; \boldsymbol{\theta})\, \boldsymbol{\psi}^{T}(\mathbf{X}; \boldsymbol{\theta}); P_{\mathbf{x};\boldsymbol{\theta}}\big]$.

Proof. According to Lemma 6, stated in Appendix C, $\mathrm{E}[u(\mathbf{X})\, \boldsymbol{\psi}_{u}(\mathbf{X}, \boldsymbol{\vartheta}); P_{\mathbf{x};\boldsymbol{\vartheta}}] = \mathbf{0}$. Therefore, by (1) and (29),
$$\mathbf{0} = \nabla_{\boldsymbol{\vartheta}} \int_{\mathcal{X}} u(\mathbf{x})\, \nabla_{\boldsymbol{\vartheta}}^{T}\log\phi(\mathbf{x}; \boldsymbol{\vartheta})\, f(\mathbf{x}; \boldsymbol{\vartheta})\, d\lambda(\mathbf{x}) = \int_{\mathcal{X}} u(\mathbf{x})\, \nabla^{2}_{\boldsymbol{\vartheta}}\log\phi(\mathbf{x}; \boldsymbol{\vartheta})\, f(\mathbf{x}; \boldsymbol{\vartheta})\, d\lambda(\mathbf{x}) + \int_{\mathcal{X}} u(\mathbf{x})\, \nabla_{\boldsymbol{\vartheta}}\log\phi(\mathbf{x}; \boldsymbol{\vartheta})\, \nabla_{\boldsymbol{\vartheta}}^{T}\log f(\mathbf{x}; \boldsymbol{\vartheta})\, f(\mathbf{x}; \boldsymbol{\vartheta})\, d\lambda(\mathbf{x}). \qquad (76)$$
The second equality in (76) stems from assumptions C-1-C-3 and Theorem 2.40 in [40], according to which integration and differentiation can be interchanged. Therefore, by (1), (27), (29), (30) and (76),
$$\mathrm{E}\big[u(\mathbf{X})\, \boldsymbol{\psi}_{u}(\mathbf{X}; \boldsymbol{\vartheta})\, \boldsymbol{\psi}^{T}(\mathbf{X}; \boldsymbol{\vartheta}); P_{\mathbf{x};\boldsymbol{\vartheta}}\big] = \mathrm{E}\big[u(\mathbf{X})\, \boldsymbol{\Upsilon}_{u}(\mathbf{X}; \boldsymbol{\vartheta}); P_{\mathbf{x};\boldsymbol{\vartheta}}\big] \triangleq \mathbf{F}_{u}(\boldsymbol{\theta}). \qquad (77)$$

E. Proof of Theorem 3:

Under assumption D-1, $\hat{\boldsymbol{\theta}}_{u} \xrightarrow{P} \boldsymbol{\theta}$ as $N \to \infty$. Therefore, by Lemma 3, stated in Appendix C, and Lemma 8, stated below, which require conditions D-2 and D-3 to be satisfied, we have that $\hat{\mathbf{F}}_{u}(\hat{\boldsymbol{\theta}}_{u}) \xrightarrow{P} \mathbf{F}_{u}(\boldsymbol{\theta})$ as $N \to \infty$ and $\hat{\mathbf{G}}_{u}(\hat{\boldsymbol{\theta}}_{u}) \xrightarrow{P} \mathbf{G}_{u}(\boldsymbol{\theta})$ as $N \to \infty$. Hence, by the Mann-Wald theorem [35], we conclude that (35) is satisfied.

Lemma 8. Let $\mathbf{G}_{u}(\boldsymbol{\vartheta})$ and $\hat{\mathbf{G}}_{u}(\boldsymbol{\vartheta})$ be the matrix functions defined in (25) and (33), respectively. Furthermore, let $\bar{\boldsymbol{\theta}}$ denote an estimator of $\boldsymbol{\theta}$. If $\bar{\boldsymbol{\theta}} \xrightarrow{P} \boldsymbol{\theta}$ as $N \to \infty$ and conditions D-2 and D-3 are satisfied, then $\hat{\mathbf{G}}_{u}(\bar{\boldsymbol{\theta}}) \xrightarrow{P} \mathbf{G}_{u}(\boldsymbol{\theta})$ as $N \to \infty$.

Proof. The proof is similar to that of Lemma 3 stated in Appendix C. Simply replace $\mathbf{F}_{u}(\boldsymbol{\vartheta})$ and $\hat{\mathbf{F}}_{u}(\boldsymbol{\vartheta})$ with $\mathbf{G}_{u}(\boldsymbol{\vartheta})$ and $\hat{\mathbf{G}}_{u}(\boldsymbol{\vartheta})$, and use Lemma 7 instead of Lemma 4.

F. Proof of Theorem 4:

First, we show that the objective function $\tilde{J}_{u}(\boldsymbol{\vartheta})$ (37) converges uniformly in probability to the objective function $J_{u}(\boldsymbol{\vartheta})$ (20), i.e., we show that $\sup_{\boldsymbol{\vartheta} \in \Theta}|\tilde{J}_{u}(\boldsymbol{\vartheta}) - J_{u}(\boldsymbol{\vartheta})| \xrightarrow{P} 0$ as $M \to \infty$.

Using (10), (20), (37), the triangle inequality, the Cauchy-Schwarz inequality and its matrix extension [34], one can verify that
$$\sup_{\boldsymbol{\vartheta} \in \Theta}\big|\tilde{J}_{u}(\boldsymbol{\vartheta}) - J_{u}(\boldsymbol{\vartheta})\big| \leq \big\|\hat{\mathbf{C}}\big\|_{\mathrm{Fro}}\, \sup_{\boldsymbol{\vartheta}} B_{1}(\boldsymbol{\vartheta}) + \sup_{\boldsymbol{\vartheta}} B_{2}(\boldsymbol{\vartheta}) + \sup_{\boldsymbol{\vartheta}} B_{3}(\boldsymbol{\vartheta}) + \sup_{\boldsymbol{\vartheta}} B_{4}(\boldsymbol{\vartheta}), \qquad (78)$$
where $\hat{\mathbf{C}} \triangleq \hat{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)} + \hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}\hat{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)H}$ and the non-negative terms $B_{1}(\boldsymbol{\vartheta}), \ldots, B_{4}(\boldsymbol{\vartheta})$, defined in (79)-(82), involve the Frobenius norm of $(\tilde{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}))^{-1} - (\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}))^{-1}$, the difference $\log\det\tilde{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}) - \log\det\boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$, and weighted norms of $\tilde{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}) - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})$.

Notice that for any $k = 1, \ldots, 4$ and any $\epsilon > 0$,
$$\Pr\Big[\sup_{\boldsymbol{\vartheta}} B_{k}(\boldsymbol{\vartheta}) > \epsilon\Big] \leq \Pr\Big[\sup_{\boldsymbol{\vartheta}} B_{k}(\boldsymbol{\vartheta}) > \epsilon,\; \sup_{\boldsymbol{\vartheta}}\big\|\tilde{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}) - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big\| \leq \delta_{1},\; \sup_{\boldsymbol{\vartheta}}\big\|\tilde{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}) - \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big\| \leq \delta_{2}\Big] + \Pr\Big[\sup_{\boldsymbol{\vartheta}}\big\|\tilde{\boldsymbol{\mu}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}) - \boldsymbol{\mu}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big\| > \delta_{1}\Big] + \Pr\Big[\sup_{\boldsymbol{\vartheta}}\big\|\tilde{\boldsymbol{\Sigma}}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta}) - \boldsymbol{\Sigma}_{\mathbf{x}}^{(u)}(\boldsymbol{\vartheta})\big\| > \delta_{2}\Big], \qquad (83)$$
where $\epsilon$, $\delta_{1}$ and $\delta_{2}$ are positive constants. By (38), the last two probabilities tend to zero as $M \to \infty$. Moreover, by (79)-(82), the first probability tends to zero as $\delta_{1}, \delta_{2} \to 0$. Hence, by (83) we conclude that $\sup_{\boldsymbol{\vartheta}} B_{k}(\boldsymbol{\vartheta}) \xrightarrow{P} 0$ as $M \to \infty$ for all $k = 1, \ldots, 4$, and therefore, by (78),
$$\sup_{\boldsymbol{\vartheta} \in \Theta}\big|\tilde{J}_{u}(\boldsymbol{\vartheta}) - J_{u}(\boldsymbol{\vartheta})\big| \xrightarrow{P} 0 \;\; \text{as } M \to \infty. \qquad (84)$$

We now use this result to prove the statement of the theorem. Notice that for any $\epsilon > 0$ and $\delta > 0$,
$$\Pr\big[\|\tilde{\boldsymbol{\theta}}_{u} - \hat{\boldsymbol{\theta}}_{u}\| > \epsilon\big] = \Pr\Big[\|\tilde{\boldsymbol{\theta}}_{u} - \hat{\boldsymbol{\theta}}_{u}\| > \epsilon,\; \sup_{\boldsymbol{\vartheta}}\big|\tilde{J}_{u}(\boldsymbol{\vartheta}) - J_{u}(\boldsymbol{\vartheta})\big| > \delta\Big] + \Pr\Big[\|\tilde{\boldsymbol{\theta}}_{u} - \hat{\boldsymbol{\theta}}_{u}\| > \epsilon,\; \sup_{\boldsymbol{\vartheta}}\big|\tilde{J}_{u}(\boldsymbol{\vartheta}) - J_{u}(\boldsymbol{\vartheta})\big| \leq \delta\Big] \leq \Pr\Big[\sup_{\boldsymbol{\vartheta}}\big|\tilde{J}_{u}(\boldsymbol{\vartheta}) - J_{u}(\boldsymbol{\vartheta})\big| > \delta\Big] + \Pr\Big[\|\tilde{\boldsymbol{\theta}}_{u} - \hat{\boldsymbol{\theta}}_{u}\| > \epsilon,\; \sup_{\boldsymbol{\vartheta}}\big|\tilde{J}_{u}(\boldsymbol{\vartheta}) - J_{u}(\boldsymbol{\vartheta})\big| \leq \delta\Big]. \qquad (85)$$
By (84), $\Pr[\sup_{\boldsymbol{\vartheta}}|\tilde{J}_{u}(\boldsymbol{\vartheta}) - J_{u}(\boldsymbol{\vartheta})| > \delta] \to 0$ as $M \to \infty$. Furthermore, by (21) and (36), when $J_{u}(\boldsymbol{\vartheta})$ and $\tilde{J}_{u}(\boldsymbol{\vartheta})$ have unique global maxima w.p. 1,
$$\Pr\Big[\|\tilde{\boldsymbol{\theta}}_{u} - \hat{\boldsymbol{\theta}}_{u}\| > \epsilon,\; \sup_{\boldsymbol{\vartheta}}\big|\tilde{J}_{u}(\boldsymbol{\vartheta}) - J_{u}(\boldsymbol{\vartheta})\big| \leq \delta\Big] \to 0 \;\; \text{as } \delta \to 0.$$
Therefore, by (85) we conclude that $\tilde{\boldsymbol{\theta}}_{u} \xrightarrow{P} \hat{\boldsymbol{\theta}}_{u}$ as $M \to \infty$.

G. Asymptotic MSE of the plug-in MT-QMLE in the problem of RSS based source localization

According to (24), (53)-(55), (58) and (59), the asymptotic MSE of the plug-in MT-QMLE is given by
$$C_{u_{\mathrm{G}}}(\theta; \boldsymbol{\omega}) = \frac{1}{N}\, \frac{G(\theta; \boldsymbol{\omega})}{F^{2}(\theta; \boldsymbol{\omega})}, \qquad (86)$$
where $G(\theta; \boldsymbol{\omega}) = B(\theta; \boldsymbol{\omega})\, \mathrm{E}[h(\nu; \theta, \boldsymbol{\omega}); P_{\nu}]$ and $F(\theta; \boldsymbol{\omega}) = A(\theta; \boldsymbol{\omega})\, \mathrm{E}[g(\nu; \theta, \boldsymbol{\omega}); P_{\nu}]$. The scalar functions $A(\boldsymbol{\vartheta}; \boldsymbol{\omega})$ and $B(\boldsymbol{\vartheta}; \boldsymbol{\omega})$, defined in (87) and (88), are built from the hypothesized MT-mean (53), the MT-covariance (54), their derivatives w.r.t. $\boldsymbol{\vartheta}$, and auxiliary terms $B_{1}(\boldsymbol{\vartheta}, \boldsymbol{\omega}), \ldots, B_{5}(\boldsymbol{\vartheta}, \boldsymbol{\omega})$, with $B(\boldsymbol{\vartheta}; \boldsymbol{\omega}) \triangleq 0.25\,B_{1}(\boldsymbol{\vartheta}, \boldsymbol{\omega}) - 0.5\,B_{2}(\boldsymbol{\vartheta}, \boldsymbol{\omega}) + 0.25\,B_{3}(\boldsymbol{\vartheta}, \boldsymbol{\omega}) + B_{4}(\boldsymbol{\vartheta}, \boldsymbol{\omega}) - B_{5}(\boldsymbol{\vartheta}, \boldsymbol{\omega})$ (88).


More information

DOA AND POLARIZATION ACCURACY STUDY FOR AN IMPERFECT DUAL-POLARIZED ANTENNA ARRAY. Miriam Häge, Marc Oispuu

DOA AND POLARIZATION ACCURACY STUDY FOR AN IMPERFECT DUAL-POLARIZED ANTENNA ARRAY. Miriam Häge, Marc Oispuu 19th European Signal Processing Conference (EUSIPCO 211) Barcelona, Spain, August 29 - September 2, 211 DOA AND POLARIZATION ACCURACY STUDY FOR AN IMPERFECT DUAL-POLARIZED ANTENNA ARRAY Miriam Häge, Marc

More information

Review Quiz. 1. Prove that in a one-dimensional canonical exponential family, the complete and sufficient statistic achieves the

Review Quiz. 1. Prove that in a one-dimensional canonical exponential family, the complete and sufficient statistic achieves the Review Quiz 1. Prove that in a one-dimensional canonical exponential family, the complete and sufficient statistic achieves the Cramér Rao lower bound (CRLB). That is, if where { } and are scalars, then

More information

ASYMPTOTIC PERFORMANCE ANALYSIS OF DOA ESTIMATION METHOD FOR AN INCOHERENTLY DISTRIBUTED SOURCE. 2πd λ. E[ϱ(θ, t)ϱ (θ,τ)] = γ(θ; µ)δ(θ θ )δ t,τ, (2)

ASYMPTOTIC PERFORMANCE ANALYSIS OF DOA ESTIMATION METHOD FOR AN INCOHERENTLY DISTRIBUTED SOURCE. 2πd λ. E[ϱ(θ, t)ϱ (θ,τ)] = γ(θ; µ)δ(θ θ )δ t,τ, (2) ASYMPTOTIC PERFORMANCE ANALYSIS OF DOA ESTIMATION METHOD FOR AN INCOHERENTLY DISTRIBUTED SOURCE Jooshik Lee and Doo Whan Sang LG Electronics, Inc. Seoul, Korea Jingon Joung School of EECS, KAIST Daejeon,

More information

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Nonconcave Penalized Likelihood with A Diverging Number of Parameters Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

ECE 275A Homework 7 Solutions

ECE 275A Homework 7 Solutions ECE 275A Homework 7 Solutions Solutions 1. For the same specification as in Homework Problem 6.11 we want to determine an estimator for θ using the Method of Moments (MOM). In general, the MOM estimator

More information

Parametric Inference on Strong Dependence

Parametric Inference on Strong Dependence Parametric Inference on Strong Dependence Peter M. Robinson London School of Economics Based on joint work with Javier Hualde: Javier Hualde and Peter M. Robinson: Gaussian Pseudo-Maximum Likelihood Estimation

More information

The Canonical Gaussian Measure on R

The Canonical Gaussian Measure on R The Canonical Gaussian Measure on R 1. Introduction The main goal of this course is to study Gaussian measures. The simplest example of a Gaussian measure is the canonical Gaussian measure P on R where

More information

SELECTIVE ANGLE MEASUREMENTS FOR A 3D-AOA INSTRUMENTAL VARIABLE TMA ALGORITHM

SELECTIVE ANGLE MEASUREMENTS FOR A 3D-AOA INSTRUMENTAL VARIABLE TMA ALGORITHM SELECTIVE ANGLE MEASUREMENTS FOR A 3D-AOA INSTRUMENTAL VARIABLE TMA ALGORITHM Kutluyıl Doğançay Reza Arablouei School of Engineering, University of South Australia, Mawson Lakes, SA 595, Australia ABSTRACT

More information

Lecture 8: Information Theory and Statistics

Lecture 8: Information Theory and Statistics Lecture 8: Information Theory and Statistics Part II: Hypothesis Testing and I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 23, 2015 1 / 50 I-Hsiang

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Chapter 8 Maximum Likelihood Estimation 8. Consistency If X is a random variable (or vector) with density or mass function f θ (x) that depends on a parameter θ, then the function f θ (X) viewed as a function

More information

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider

More information

BTRY 4090: Spring 2009 Theory of Statistics

BTRY 4090: Spring 2009 Theory of Statistics BTRY 4090: Spring 2009 Theory of Statistics Guozhang Wang September 25, 2010 1 Review of Probability We begin with a real example of using probability to solve computationally intensive (or infeasible)

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

10-704: Information Processing and Learning Fall Lecture 24: Dec 7

10-704: Information Processing and Learning Fall Lecture 24: Dec 7 0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 24: Dec 7 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy of

More information

5682 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 12, DECEMBER /$ IEEE

5682 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 12, DECEMBER /$ IEEE 5682 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 12, DECEMBER 2009 Hyperplane-Based Vector Quantization for Distributed Estimation in Wireless Sensor Networks Jun Fang, Member, IEEE, and Hongbin

More information

ENEE 621 SPRING 2016 DETECTION AND ESTIMATION THEORY THE PARAMETER ESTIMATION PROBLEM

ENEE 621 SPRING 2016 DETECTION AND ESTIMATION THEORY THE PARAMETER ESTIMATION PROBLEM c 2007-2016 by Armand M. Makowski 1 ENEE 621 SPRING 2016 DETECTION AND ESTIMATION THEORY THE PARAMETER ESTIMATION PROBLEM 1 The basic setting Throughout, p, q and k are positive integers. The setup With

More information

Asymptotic inference for a nonstationary double ar(1) model

Asymptotic inference for a nonstationary double ar(1) model Asymptotic inference for a nonstationary double ar() model By SHIQING LING and DONG LI Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong maling@ust.hk malidong@ust.hk

More information

MEASURE TRANSFORMED QUASI SCORE TEST WITH APPLICATION TO LOCATION MISMATCH DETECTION. Koby Todros. Ben-Gurion University of the Negev

MEASURE TRANSFORMED QUASI SCORE TEST WITH APPLICATION TO LOCATION MISMATCH DETECTION. Koby Todros. Ben-Gurion University of the Negev MEASURE TRASFORME QUASI SCORE TEST WITH ALICATIO TO LOCATIO MISMATCH ETECTIO Koby Todros Ben-Gurion University of the egev ABSTRACT In this paper, we develop a generalization of the Gaussian quasi score

More information

Estimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators

Estimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators Estimation theory Parametric estimation Properties of estimators Minimum variance estimator Cramer-Rao bound Maximum likelihood estimators Confidence intervals Bayesian estimation 1 Random Variables Let

More information

Complexity of two and multi-stage stochastic programming problems

Complexity of two and multi-stage stochastic programming problems Complexity of two and multi-stage stochastic programming problems A. Shapiro School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0205, USA The concept

More information

Chapter 4: Asymptotic Properties of the MLE

Chapter 4: Asymptotic Properties of the MLE Chapter 4: Asymptotic Properties of the MLE Daniel O. Scharfstein 09/19/13 1 / 1 Maximum Likelihood Maximum likelihood is the most powerful tool for estimation. In this part of the course, we will consider

More information

Latent Variable Models and EM algorithm

Latent Variable Models and EM algorithm Latent Variable Models and EM algorithm SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic 3.1 Clustering and Mixture Modelling K-means and hierarchical clustering are non-probabilistic

More information

THERE is considerable literature about second-order statistics-based

THERE is considerable literature about second-order statistics-based IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 52, NO. 5, MAY 2004 1235 Asymptotically Minimum Variance Second-Order Estimation for Noncircular Signals Application to DOA Estimation Jean-Pierre Delmas, Member,

More information

Real-Valued Khatri-Rao Subspace Approaches on the ULA and a New Nested Array

Real-Valued Khatri-Rao Subspace Approaches on the ULA and a New Nested Array Real-Valued Khatri-Rao Subspace Approaches on the ULA and a New Nested Array Huiping Duan, Tiantian Tuo, Jun Fang and Bing Zeng arxiv:1511.06828v1 [cs.it] 21 Nov 2015 Abstract In underdetermined direction-of-arrival

More information

Section 9: Generalized method of moments

Section 9: Generalized method of moments 1 Section 9: Generalized method of moments In this section, we revisit unbiased estimating functions to study a more general framework for estimating parameters. Let X n =(X 1,...,X n ), where the X i

More information

Statistics. Lecture 2 August 7, 2000 Frank Porter Caltech. The Fundamentals; Point Estimation. Maximum Likelihood, Least Squares and All That

Statistics. Lecture 2 August 7, 2000 Frank Porter Caltech. The Fundamentals; Point Estimation. Maximum Likelihood, Least Squares and All That Statistics Lecture 2 August 7, 2000 Frank Porter Caltech The plan for these lectures: The Fundamentals; Point Estimation Maximum Likelihood, Least Squares and All That What is a Confidence Interval? Interval

More information

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

Monte Carlo Studies. The response in a Monte Carlo study is a random variable. Monte Carlo Studies The response in a Monte Carlo study is a random variable. The response in a Monte Carlo study has a variance that comes from the variance of the stochastic elements in the data-generating

More information

Density estimators for the convolution of discrete and continuous random variables

Density estimators for the convolution of discrete and continuous random variables Density estimators for the convolution of discrete and continuous random variables Ursula U Müller Texas A&M University Anton Schick Binghamton University Wolfgang Wefelmeyer Universität zu Köln Abstract

More information

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Free Probability, Sample Covariance Matrices and Stochastic Eigen-Inference

Free Probability, Sample Covariance Matrices and Stochastic Eigen-Inference Free Probability, Sample Covariance Matrices and Stochastic Eigen-Inference Alan Edelman Department of Mathematics, Computer Science and AI Laboratories. E-mail: edelman@math.mit.edu N. Raj Rao Deparment

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Chapter 7 Maximum Likelihood Estimation 7. Consistency If X is a random variable (or vector) with density or mass function f θ (x) that depends on a parameter θ, then the function f θ (X) viewed as a function

More information

1254 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 53, NO. 4, APRIL On the Virtual Array Concept for Higher Order Array Processing

1254 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 53, NO. 4, APRIL On the Virtual Array Concept for Higher Order Array Processing 1254 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 53, NO. 4, APRIL 2005 On the Virtual Array Concept for Higher Order Array Processing Pascal Chevalier, Laurent Albera, Anne Ferréol, and Pierre Comon,

More information

Statistical Data Analysis

Statistical Data Analysis DS-GA 0 Lecture notes 8 Fall 016 1 Descriptive statistics Statistical Data Analysis In this section we consider the problem of analyzing a set of data. We describe several techniques for visualizing the

More information

Lecture 10 Maximum Likelihood Asymptotics under Non-standard Conditions: A Heuristic Introduction to Sandwiches

Lecture 10 Maximum Likelihood Asymptotics under Non-standard Conditions: A Heuristic Introduction to Sandwiches University of Illinois Department of Economics Spring 2017 Econ 574 Roger Koenker Lecture 10 Maximum Likelihood Asymptotics under Non-standard Conditions: A Heuristic Introduction to Sandwiches Ref: Huber,

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

Lecture 32: Asymptotic confidence sets and likelihoods

Lecture 32: Asymptotic confidence sets and likelihoods Lecture 32: Asymptotic confidence sets and likelihoods Asymptotic criterion In some problems, especially in nonparametric problems, it is difficult to find a reasonable confidence set with a given confidence

More information

hundred samples per signal. To counter these problems, Mathur et al. [11] propose to initialize each stage of the algorithm by aweight vector found by

hundred samples per signal. To counter these problems, Mathur et al. [11] propose to initialize each stage of the algorithm by aweight vector found by Direction-of-Arrival Estimation for Constant Modulus Signals Amir Leshem Λ and Alle-Jan van der Veen Λ Abstract In many cases where direction finding is of interest, the signals impinging on an antenna

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Variations. ECE 6540, Lecture 10 Maximum Likelihood Estimation

Variations. ECE 6540, Lecture 10 Maximum Likelihood Estimation Variations ECE 6540, Lecture 10 Last Time BLUE (Best Linear Unbiased Estimator) Formulation Advantages Disadvantages 2 The BLUE A simplification Assume the estimator is a linear system For a single parameter

More information

Unit Roots in White Noise?!

Unit Roots in White Noise?! Unit Roots in White Noise?! A.Onatski and H. Uhlig September 26, 2008 Abstract We show that the empirical distribution of the roots of the vector auto-regression of order n fitted to T observations of

More information

Robust Subspace DOA Estimation for Wireless Communications

Robust Subspace DOA Estimation for Wireless Communications Robust Subspace DOA Estimation for Wireless Communications Samuli Visuri Hannu Oja ¾ Visa Koivunen Laboratory of Signal Processing Computer Technology Helsinki Univ. of Technology P.O. Box 3, FIN-25 HUT

More information

Economics 583: Econometric Theory I A Primer on Asymptotics

Economics 583: Econometric Theory I A Primer on Asymptotics Economics 583: Econometric Theory I A Primer on Asymptotics Eric Zivot January 14, 2013 The two main concepts in asymptotic theory that we will use are Consistency Asymptotic Normality Intuition consistency:

More information

Chapter 4: Asymptotic Properties of the MLE (Part 2)

Chapter 4: Asymptotic Properties of the MLE (Part 2) Chapter 4: Asymptotic Properties of the MLE (Part 2) Daniel O. Scharfstein 09/24/13 1 / 1 Example Let {(R i, X i ) : i = 1,..., n} be an i.i.d. sample of n random vectors (R, X ). Here R is a response

More information

Estimation Theory Fredrik Rusek. Chapters

Estimation Theory Fredrik Rusek. Chapters Estimation Theory Fredrik Rusek Chapters 3.5-3.10 Recap We deal with unbiased estimators of deterministic parameters Performance of an estimator is measured by the variance of the estimate (due to the

More information

FAST AND ACCURATE DIRECTION-OF-ARRIVAL ESTIMATION FOR A SINGLE SOURCE

FAST AND ACCURATE DIRECTION-OF-ARRIVAL ESTIMATION FOR A SINGLE SOURCE Progress In Electromagnetics Research C, Vol. 6, 13 20, 2009 FAST AND ACCURATE DIRECTION-OF-ARRIVAL ESTIMATION FOR A SINGLE SOURCE Y. Wu School of Computer Science and Engineering Wuhan Institute of Technology

More information

B y t = γ 0 + Γ 1 y t + ε t B(L) y t = γ 0 + ε t ε t iid (0, D) D is diagonal

B y t = γ 0 + Γ 1 y t + ε t B(L) y t = γ 0 + ε t ε t iid (0, D) D is diagonal Structural VAR Modeling for I(1) Data that is Not Cointegrated Assume y t =(y 1t,y 2t ) 0 be I(1) and not cointegrated. That is, y 1t and y 2t are both I(1) and there is no linear combination of y 1t and

More information

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.

More information

Applications and fundamental results on random Vandermon

Applications and fundamental results on random Vandermon Applications and fundamental results on random Vandermonde matrices May 2008 Some important concepts from classical probability Random variables are functions (i.e. they commute w.r.t. multiplication)

More information

l(y j ) = 0 for all y j (1)

l(y j ) = 0 for all y j (1) Problem 1. The closed linear span of a subset {y j } of a normed vector space is defined as the intersection of all closed subspaces containing all y j and thus the smallest such subspace. 1 Show that

More information

DOA Estimation of Quasi-Stationary Signals Using a Partly-Calibrated Uniform Linear Array with Fewer Sensors than Sources

DOA Estimation of Quasi-Stationary Signals Using a Partly-Calibrated Uniform Linear Array with Fewer Sensors than Sources Progress In Electromagnetics Research M, Vol. 63, 185 193, 218 DOA Estimation of Quasi-Stationary Signals Using a Partly-Calibrated Uniform Linear Array with Fewer Sensors than Sources Kai-Chieh Hsu and

More information

Basic concepts in estimation

Basic concepts in estimation Basic concepts in estimation Random and nonrandom parameters Definitions of estimates ML Maimum Lielihood MAP Maimum A Posteriori LS Least Squares MMS Minimum Mean square rror Measures of quality of estimates

More information

Lecture 1: Introduction

Lecture 1: Introduction Principles of Statistics Part II - Michaelmas 208 Lecturer: Quentin Berthet Lecture : Introduction This course is concerned with presenting some of the mathematical principles of statistical theory. One

More information

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu

More information

Motivation CRLB and Regularity Conditions CRLLB A New Bound Examples Conclusions

Motivation CRLB and Regularity Conditions CRLLB A New Bound Examples Conclusions A Survey of Some Recent Results on the CRLB for Parameter Estimation and its Extension Yaakov 1 1 Electrical and Computer Engineering Department University of Connecticut, Storrs, CT A Survey of Some Recent

More information

Gaussian distributions and processes have long been accepted as useful tools for stochastic

Gaussian distributions and processes have long been accepted as useful tools for stochastic Chapter 3 Alpha-Stable Random Variables and Processes Gaussian distributions and processes have long been accepted as useful tools for stochastic modeling. In this section, we introduce a statistical model

More information

Consistency of the maximum likelihood estimator for general hidden Markov models

Consistency of the maximum likelihood estimator for general hidden Markov models Consistency of the maximum likelihood estimator for general hidden Markov models Jimmy Olsson Centre for Mathematical Sciences Lund University Nordstat 2012 Umeå, Sweden Collaborators Hidden Markov models

More information

On the smallest eigenvalues of covariance matrices of multivariate spatial processes

On the smallest eigenvalues of covariance matrices of multivariate spatial processes On the smallest eigenvalues of covariance matrices of multivariate spatial processes François Bachoc, Reinhard Furrer Toulouse Mathematics Institute, University Paul Sabatier, France Institute of Mathematics

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2 MA 575 Linear Models: Cedric E Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2 1 Revision: Probability Theory 11 Random Variables A real-valued random variable is

More information

Lecture 4: Types of errors. Bayesian regression models. Logistic regression

Lecture 4: Types of errors. Bayesian regression models. Logistic regression Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture

More information

Notes on Asymptotic Theory: Convergence in Probability and Distribution Introduction to Econometric Theory Econ. 770

Notes on Asymptotic Theory: Convergence in Probability and Distribution Introduction to Econometric Theory Econ. 770 Notes on Asymptotic Theory: Convergence in Probability and Distribution Introduction to Econometric Theory Econ. 770 Jonathan B. Hill Dept. of Economics University of North Carolina - Chapel Hill November

More information

Estimation of Dynamic Regression Models

Estimation of Dynamic Regression Models University of Pavia 2007 Estimation of Dynamic Regression Models Eduardo Rossi University of Pavia Factorization of the density DGP: D t (x t χ t 1, d t ; Ψ) x t represent all the variables in the economy.

More information

Convergence of Square Root Ensemble Kalman Filters in the Large Ensemble Limit

Convergence of Square Root Ensemble Kalman Filters in the Large Ensemble Limit Convergence of Square Root Ensemble Kalman Filters in the Large Ensemble Limit Evan Kwiatkowski, Jan Mandel University of Colorado Denver December 11, 2014 OUTLINE 2 Data Assimilation Bayesian Estimation

More information

LAN property for sde s with additive fractional noise and continuous time observation

LAN property for sde s with additive fractional noise and continuous time observation LAN property for sde s with additive fractional noise and continuous time observation Eulalia Nualart (Universitat Pompeu Fabra, Barcelona) joint work with Samy Tindel (Purdue University) Vlad s 6th birthday,

More information

The Tuning of Robust Controllers for Stable Systems in the CD-Algebra: The Case of Sinusoidal and Polynomial Signals

The Tuning of Robust Controllers for Stable Systems in the CD-Algebra: The Case of Sinusoidal and Polynomial Signals The Tuning of Robust Controllers for Stable Systems in the CD-Algebra: The Case of Sinusoidal and Polynomial Signals Timo Hämäläinen Seppo Pohjolainen Tampere University of Technology Department of Mathematics

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

Part II Probability and Measure

Part II Probability and Measure Part II Probability and Measure Theorems Based on lectures by J. Miller Notes taken by Dexter Chua Michaelmas 2016 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

Long-Run Covariability

Long-Run Covariability Long-Run Covariability Ulrich K. Müller and Mark W. Watson Princeton University October 2016 Motivation Study the long-run covariability/relationship between economic variables great ratios, long-run Phillips

More information

Linear Regression and Its Applications

Linear Regression and Its Applications Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start

More information